
Department of Naval Architecture,
Ocean and Environmental Engineering

Faculty of Engineering
University of Trieste

SHIP DESIGN
A Rational Approach

Giorgio Trincas

2009


Table of contents

1 Design Science
  1.1 Contents of Design Science
  1.2 Design and Technical Knowledge
  1.3 Design and Science
    1.3.1 Knowledge and Science
    1.3.2 Engineering and Science
    1.3.3 Development of Design Knowledge
  1.4 Areas of Design Science
    1.4.1 Theory of Technical Systems
    1.4.2 Theory of Design Process
  Bibliography

2 Standard Design Processes
  2.1 Design Models
    2.1.1 Descriptive Models
    2.1.2 Prescriptive Models
    2.1.3 Hybrid Models
  2.2 Design Approaches
  2.3 Systems Engineering
    2.3.1 Goals
    2.3.2 Process Description
    2.3.3 Synergy with Information Technology
    2.3.4 Critique of Systems Engineering
  2.4 Concurrent Engineering
    2.4.1 Basic principles and benefits
    2.4.2 Information Flow
    2.4.3 Concurrent Engineering Environment
  2.5 Quality Function Deployment
  2.6 Design Evaluation Methods
  2.7 Decision–Based Design
    2.7.1 Basic Principles
    2.7.2 Design Types and Structures
  2.8 Decision Support Systems
    2.8.1 Design Time Line
    2.8.2 Designing for an Original Product
    2.8.3 Metadesign
    2.8.4 Axioms of Decision–Based Design
  Bibliography

3 Design As a Multicriterial Decision–Making Process
  3.1 Decision Making Process
    3.1.1 Decision Making in Technical Systems Design
  3.2 Basic Concepts of Multicriterial Decision Making
    3.2.1 What is Multicriterial Decision Making?
    3.2.2 Individual Decision Making
    3.2.3 Group Decision Making
    3.2.4 Elements of MCDM
  3.3 Multicriterial Decision–Making Theory
    3.3.1 Properties of Attributes/Objectives
    3.3.2 Typology of MCDM Models
  3.4 Nondominance and Pareto Optimality
  3.5 Theory of the Displaced Ideal
    3.5.1 Measurement of Preferences
    3.5.2 Traditional Utility Approach
    3.5.3 Ideal Point
    3.5.4 Key Concepts and Notation
    3.5.5 Fuzziness and Precision
    3.5.6 Membership Functions
    3.5.7 Multiattribute Dependency
    3.5.8 Composite Membership Functions
  3.6 Multiattribute Selection
  3.7 Multiattribute Utility Theory
    3.7.1 Additive Utility Function
    3.7.2 Risk and Utility Function
  3.8 Multiattribute Concept Design
    3.8.1 Basic Concepts
    3.8.2 Concept Design Process Description
  3.9 Multiobjective Design
    3.9.1 Genetic Algorithm
    3.9.2 Goal Programming
    3.9.3 Compromise Programming
    3.9.4 Physical Programming
  3.10 Advanced Decision Support Systems
    3.10.1 Distributed Decision Support Systems
    3.10.2 Artificial Intelligence
  Bibliography

4 Multiattribute Solution Methods
  4.1 Decision Matrix
  4.2 Measuring Attribute Importance
    4.2.1 Mean of the Normalized Values
    4.2.2 Eigenvector Method
    4.2.3 Weighted Least Square Method
    4.2.4 Entropy Method
  4.3 Selection Models
  4.4 Methods with No Preference Information
    4.4.1 Dominance Method
    4.4.2 Maximin Method
    4.4.3 Maximax Method
  4.5 Selection Methods with Information on Attributes
  4.6 Methods with Threshold on Attributes
    4.6.1 Conjunctive Method
    4.6.2 Disjunctive Method
  4.7 Methods with Ordinal Information
    4.7.1 Lexicographic Method
    4.7.2 Elimination by Aspects
    4.7.3 Permutation Method
  4.8 Methods with Cardinal Information
    4.8.1 Analytical Hierarchy Process
    4.8.2 Simple Additive Weighting Method
    4.8.3 Hierarchical Additive Weighting Method
    4.8.4 Linear Assignment Method
    4.8.5 ELECTRE Method
    4.8.6 TOPSIS Method
  4.9 MAUT Method of Group Decision
  4.10 Methods for Trade–offs
    4.10.1 Marginal Rate of Substitution
    4.10.2 Indifference Curves
    4.10.3 Indifference curves in SAW and TOPSIS
    4.10.4 Hierarchical Trade–Offs
  Bibliography

5 Optimization Methods
  5.1 Mathematical Modelling
  5.2 Historical Development
  5.3 Statement of an Optimization Problem
    5.3.1 Definitions
    5.3.2 Design Optimization
    5.3.3 Graphical Optimization
  5.4 Classical Optimization Techniques
    5.4.1 Single Variable Optimization
    5.4.2 Multivariable Optimization without Constraints
    5.4.3 Multivariable Optimization with Equality Constraints
    5.4.4 Multivariable Optimization with Inequality Constraints
  5.5 Classification of Optimization Techniques
  5.6 Linear Programming
    5.6.1 Graphical Representation
    5.6.2 Standard Form of a Linear Programming Problem
    5.6.3 Definitions and Theorems
    5.6.4 Solution of a System of Linear Simultaneous Equations
    5.6.5 Why the Simplex Method?
    5.6.6 Simplex Algorithm
    5.6.7 Phases of the Simplex Method
  5.7 NLP: One–Dimensional Minimization Methods
    5.7.1 Elimination Methods
  5.8 NLP: Unconstrained Optimization Methods
    5.8.1 Direct Search Methods
    5.8.2 Descent Methods
  5.9 NLP: Constrained Optimization Methods
    5.9.1 Characteristics of a Constrained Problem
    5.9.2 Direct Methods
    5.9.3 Indirect Methods
  Bibliography

6 Design of Experiments
  6.1 General
    6.1.1 Basic Principles
    6.1.2 Guidelines for Designing Experiments
    6.1.3 Statistical Techniques in Design Experiments
  6.2 Simple Comparative Experiments
    6.2.1 Basic Concept of Probability
    6.2.2 Basic Statistical Concepts
    6.2.3 Sampling and Sampling Distributions
    6.2.4 Inferences about the Differences in Means, Randomized Designs
  6.3 Experiments with a Single Factor
    6.3.1 Analysis of Variance
    6.3.2 Fixed Effects Model
    6.3.3 Model Adequacy Checking
    6.3.4 Random Effects Model
  6.4 Response Surface Methodology
    6.4.1 Approximating Response Functions
    6.4.2 Sequential Nature of RSM
    6.4.3 Objectives and Product Quality Improvement
  6.5 Building Empirical Models
    6.5.1 Linear Regression Models
    6.5.2 Parameters Estimation in Linear Regression Models
    6.5.3 Model Adequacy Checking
    6.5.4 Fitting a Second-Order Model
    6.5.5 Transformation of the Response Variable
  6.6 Response Surface Methods and Designs
    6.6.1 Steepest Ascent Method
    6.6.2 Analysis of a Second-Order Model
    6.6.3 Experimental Designs for Fitting Response Surfaces
  Bibliography

7 Fuzzy Sets and Fuzzy Logic
  7.1 Preview
    7.1.1 Types of Uncertainty
    7.1.2 Crisp Sets
    7.1.3 Fuzzy Control
  7.2 Basics of Fuzzy Logic
    7.2.1 Membership Function
    7.2.2 Formulations of Membership Functions
    7.2.3 Fuzzy partitioning
    7.2.4 Properties of Fuzzy Sets
    7.2.5 Extension Principle
    7.2.6 Operations on Fuzzy Sets
    7.2.7 Elementhood and Subsethood
    7.2.8 Fuzzy Numbers
    7.2.9 Operations on Fuzzy Numbers
  7.3 Fuzzy SMART
    7.3.1 Screening Phase
    7.3.2 Categorization of a Range
    7.3.3 Assessing the Alternatives: Direct Rating
    7.3.4 Criterion Weights and Aggregation
    7.3.5 Sensitivity Analysis via Fuzzy SMART
  7.4 Additive and Multiplicative AHP
    7.4.1 Pairwise Comparisons
    7.4.2 Calculation of Impact Grades and Scores
    7.4.3 Criterion Weights and Aggregation
    7.4.4 Fuzzy Extension
    7.4.5 Original AHP
  7.5 ELECTRE Systems
    7.5.1 Discrimination Thresholds
  7.6 Fuzzy Multiobjective Optimization
    7.6.1 Ideal and Nadir Values
    7.6.2 Weighted Cebycev–Norm Distance Functions
    7.6.3 Weighted Degrees of Satisfaction
    7.6.4 Numerical Example
    7.6.5 Design of a Gearbox
  Bibliography

8 Engineering Economics
  8.1 Engineering Economics and Ship Design
    8.1.1 Criteria for Optimizing Ship Design
    8.1.2 Operating Economics
  8.2 Time–Value of Money
  8.3 Cash Flows
    8.3.1 Cash Flow Profile
    8.3.2 Interest Relationships
  8.4 Financial Factors
    8.4.1 Taxes
    8.4.2 Leverage
    8.4.3 Practical Cash Flows
    8.4.4 Depreciation
    8.4.5 Inflation
    8.4.6 Escalation Rate
  8.5 Economic Criteria
    8.5.1 Set of Economic Criteria
    8.5.2 Definition of the Economic Criteria
    8.5.3 Choice of the Economic Criteria in the Marine Field
  8.6 Ship Costs
    8.6.1 Building Cost
    8.6.2 Operating Costs
    8.6.3 Other Decision Factors
  8.7 Ship Leg Economics
    8.7.1 Inventory Costs
    8.7.2 Economic Cost of Transport
    8.7.3 Effects on NPV
  Bibliography

Theory of Design and Multicriterial Decision Making

Foreword

Eliminate the specialist man

(Oscar Niemeyer)

The role of naval architects is both wide and focused. It obviously depends, to some extent, on the segment of the shipbuilding or shipping industry in which they choose to work. Whatever that segment is, however, it is usually a leadership role. That they are able to fulfill this leadership role reflects both the useful breadth and the specialization of the education and experience gained in the industry.

Naval architects are found in many positions in the marine industry. They can be found in the following categories: shipowners, design companies, shipbuilders, government (department of transportation, navy, research centers), classification societies, universities, independent research centers, and marine equipment manufacturers. Typical positions of naval architects are: shipowner's technical/design manager, design agent executive, shipyard executive, chief naval architect, project manager, technical project manager, technical manager, and ship manager. It can be seen that the role of naval architects offers many interesting challenges and opportunities for a satisfying and rewarding career in the marine industry.

A naval architect needs to be educated in all the topics required in the design and construction of ships and other marine products. In addition, the ship designer must have a basic understanding of most of the engineering discipline topics as well as of engineering economics. The educational requirements for naval architects can be obtained by looking at the course curricula of the various universities that offer degrees in naval architecture. Although there are some differences, the traditional naval architecture topics include: theoretical naval architecture, hydrodynamics, marine structures, materials, welding, mechanics, ship motions, ship design theory and practice, shipbuilding practice, planning and scheduling, engineering economics, statistics, probability and risk, product modelling practice, computer–based tools, marine environment, marine industry, ship acquisition, shipowner's requirements, regulatory and classification requirements, contracts and specifications, cost estimating, human factors, safety, composites, corrosion and preservation, and marine engineering considerations.

The State-of-the-Art

Ship design was probably one of the first technological areas to benefit from a scientific approach as well as from the application of mathematics to modelling and problem solving. The history of the study of mechanics of materials, the theory of elasticity, hydrostatics and hydrodynamics, and the study of engines all form part of the rich heritage of humanity's quest for knowledge, which has been, and still is, applied to the design and manufacturing of ships.

In contrast with traditional ship design, which often assumes a closed system in isolation from external influences, this course aims to encourage students, as future naval architects, to identify and understand the impacts their design decisions have as a whole on the technical and economic performance of the industrial product as well as on the shipping market where their ships will operate.

Ship design is an activity in which a clear view of the functional requirements and an equally clear view of the constraints on feasible solutions are both essential. The imperative requirements of payload, speed, and size of a ship are challenged by the problems induced by an often hostile operating environment and by the complex demands of the required economics of operation, safety, and so on. This notion of requirements frequently conflicting to some degree with each other applies to the design of a great many ships and offshore vessels. Those not actively involved in design take all this for granted and expect increasingly sophisticated products, with longer life and greater reliability than their predecessors, at a reasonable cost. It would be difficult to achieve all this even in a vacuum, but designers have to operate in the real world of competition, which sets high standards and demands the shortest possible time from conceptual idea to product development.

Modern strategies and tactics should aim not only to simplify the product and its production, but also to make all kinds of processes more effective and economical. The main objective, the optimization of the relationship between costs and benefits, should be reached through rational organization and the introduction of scientific knowledge into the design process, thus increasing net efficiency.

Since the shipbuilding industry began exploiting modern computer-aided techniques (surface modelling, solid modelling, automatic mesh generation, direct interfaces between models and their analysis, numerically controlled production facilities), great benefits have accrued. If design practices are well founded and competently managed, better ships are delivered to the customers, more quickly than before. The synergy between new technology, advanced design tools and short lead–times is achievable provided a rational design strategy is exploited. It creates a reward considerably larger than the sum of the individual benefits of small changes in design practices or islands of automation in the design office.

The Way Ahead

There is much to be done: a deeper understanding of all aspects of the design process and how they interact is still needed. In many design departments the impressive potential of computer science, scientific knowledge and successful practices is still a long way from being applied as part of a comprehensive and rational design methodology. There are too many companies where drawing boards, computers, and mathematical models are still used in a time–consuming sequential mode. The requirements and wishes of the customer are frequently translated into specifications in departments separated from the design headquarters, and no central database is generally accessible. To improve this outdated way of designing, the following are required: robust and user–friendly computer codes, a proper understanding of all the processes involved in design, sound information management and retrieval, efficient linking of design procedures to a comprehensive database, and updated decision–making support systems to facilitate sound decision-making early in the design process.

Designing ships requires a broad spectrum of specializations and skills. Beyond naval architects, they include mechanical, electrical, nautical, aeronautical, and production engineers, computer-systems experts, mathematicians, physicists, and market analysts. The research topics extend from hydrodynamics to quality assurance, from efficient materials selection to optimization in the face of many conflicting requirements. The result has to be the long–term emergence of an overall theory of design. The more immediate outcomes will be a deeper understanding of the design process, specific decision–making procedures, faster and more reliable computer–based design tools, and improved interfaces with existing methods and techniques. As these are addressed, one can begin to perceive areas that are common to a variety of engineering applications, and thus the beginning of a general approach to design.

The Course Themes

The overall objective of this course is to provide students with the opportunity to deepen their understanding of ship design by developing the concept and preliminary design of a modern vessel based on the paradigm of decision-based design. Students will be introduced to techniques for integrating science–based knowledge in structuring a rational design process. They will be provided with the tools needed to incorporate technical issues and engineering economics simultaneously into ship design. Their mathematical and statistical knowledge from traditional engineering curricula will be supplemented with the advanced techniques needed to develop technical and economic modelling of ship properties. By the end of the course, students should have internalized the meaning of the multicriterial approach from different perspectives, having explored how these various perspectives affect their own selection of the 'best possible' design. The course also enables students to identify, formulate, and negotiate robust solutions, and to gain a deeper understanding of issues typically associated with decision-based engineering. By developing and exercising students in structuring and solving problems, the course empowers future designers to become agents of innovation.

The tutor will use cognitive mapping to keep the course individually responsive to each student. The learning essay is an instrument to help develop and assess critical thinking skills. The course introduces students to strategy and tactics for developing solutions through multicriterial decision–making techniques. Because the ability to manage imprecision and uncertainty is critical in the design process, robustness techniques are provided so that design is always considered together with risk.

The course initially provides fundamentals of design theory and practice that are further explored with respect to ship design in the remainder of the course. In order to prepare students for the design project, the second part of the course focuses on concept ship design in the context of multicriterial decision-making. Decision–based design is the paradigm of naval architecture design used throughout the course, and is rooted in the belief that the principal role of a naval architect, when designing a ship or an offshore vessel, is to make decisions. The main emphasis is devoted to a structured, multiattribute approach to making robust decisions from the early design phases onwards. This approach examines potential solutions in conceptual form in order to select a subset of feasible alternatives for development into non-dominated solutions, which are most likely to succeed. The selection of the 'best possible' design ties the issues of fundamentals in ship design, engineering economics, and decision-making explored in the first part of the course to specific design problems by incorporating shipowner requirements as decision criteria.
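
The non-dominance idea mentioned above can be made concrete with a small sketch. The following Python fragment is an illustration added here, not taken from the course notes; the attribute values and the convention that every attribute is to be maximized are assumptions.

    def dominates(a, b):
        """True if design 'a' scores at least as well as 'b' on every attribute
        and strictly better on at least one (all attributes to be maximized)."""
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    def nondominated(designs):
        """Keep only the designs that no other design dominates."""
        return [d for d in designs
                if not any(dominates(other, d) for other in designs if other is not d)]

    # Hypothetical alternatives scored on three normalized attributes
    alternatives = [(0.8, 0.6, 0.5), (0.7, 0.7, 0.7), (0.6, 0.5, 0.4)]
    print(nondominated(alternatives))   # the third design is dominated and drops out

Any alternative that survives such a filter remains a candidate for further development; choosing among the survivors then requires preference information, which is the subject of the multiattribute methods listed in the table of contents.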

Having been introduced to the set of multiattribute decision-making (MADM) techniques, the students will submit a revised project outline after reflecting on their initially proposed outline and receiving feedback from the tutor.

Last but not least, the course integrates the tutor's experience in decision-based and robust design, industry–academic interfaces, professionalism, and alternative paradigms in ship design learning.

Finally, the purpose of this tutorial is threefold:

• to provide naval architecture students with the opportunity to gain a basic understanding of the fundamentals of ship design;

• to assist members of design teams in making better design decisions by providing basic knowledge in a single, easily accessible source;

• to serve as a handbook when they enter the marine industry.

Chapter 1

Design Science

The subject of this tutorial is design science, which is often regarded as a new branch of science aiming to make the design process rational. Design has been formally acknowledged as a separate activity since the beginning of the industrial revolution, at first in production and manufacturing. Design is not a body of knowledge; it is a highly manipulative activity in which the design team has to continuously and simultaneously integrate the existing body of knowledge and balance the many factors that influence the achievement of the expected outcome.

The term 'design science' is to be understood as a systemic activity, which should contain and integrate the existing body of knowledge about and for designing, i.e. the scientifically and logically related knowledge about engineering design. Design science must therefore explain the causal connections and laws related to the design of technical systems. The knowledge system must be fixed in the forms of its terminology, taxonomy, relationships (including inputs, throughputs and outputs), theories and hypotheses, so that it can serve as a basis for consciously guided design activity.

Both the terms design and designing can be used, even though the comprehensive term designing is thought to designate the entirety of all design activities. The advantage of the word designing lies in its general intelligibility. In addition, this word is widely used worldwide, as it is understood in the defined context in Germanic, Romance and Slavic languages.

The purpose of design science is to develop coherent and comprehensive knowledge about engineering design. Engineering is a much misused word. It can be used to describe a profession, the process of developing a design into working instructions, and a type of manufacturing. One of the earliest definitions of engineering is that 'engineering is the art of directing the great sources of power in nature for the use and convenience of man'. Another definition, offered by Erichsen (1989), is that 'designers create and engineers analyze', so that design is seen as a part of engineering, where some engineers design and some analyze the designs of others; in other terms, engineering develops and documents the design to enable its manufacture.

1.1 Contents of Design Science

The objects of design science are the technical product to be designed and the design process itself. The reasons why design science is so necessary today can be found especially in the new situation of the marketplace:

• the demand for technical products increases constantly with time, not only with respect to quality and quantity, but also with respect to shortened times from ordering to delivery;

• the pressure of competition continues to grow dramatically in connection with political developments (e.g., reduction of trade barriers, internationalization);

• laws, regulations, standards, and codes with respect to quality assurance, safety, and environment are being reinforced and enhanced continuously.

Nowadays technical products must be able to perform the desired task with a demanded group of properties, denoted by the term affordability: performance, operability, life–cycle, safety, reliability, maintainability, and so on. The time scale of planning, manufacturing, and delivery must be suitable. At the same time, cash flow and financing must be considered. All these requirements force industrial companies, and especially their design departments, to utilize new methods and procedures in order to remain competitive.

The main question of rational design is which organization, skills, and knowledge base are needed and adequate so that the product and/or process is suitable for the intended purpose. To this end, design science should explain the complete structure of the design process and the related theories. Its contents should comprehend the engineering disciplines, making explicit the connections of the individual knowledge elements. Besides design science, terms may be found such as 'design philosophy', 'design theory', 'scientific design', 'design instructions' or 'design studies'. These names are not suitable to indicate the total knowledge of design, as they were chosen to designate only a part of the design knowledge.

Design science needs not only systematic descriptions (declarative knowledge, descriptive statements belonging to theory); it also needs methodology, instructions for the practical activity (procedural or prescriptive knowledge), and/or (deterministic and stochastic) algorithms and techniques.

Historically, the way towards design science did not lead straight to a unified whole, but first to different separated activities. The state–of–the–art in these activities has turned out very differently, but in their final effect they have all aimed at improving engineering practice. Developments have always taken place simultaneously in a dual direction, from practice towards abstraction and from theory towards application, without bringing these directions to a useful convergence. As a result, no comprehensive design science has emerged, even though three classes of activities may be envisaged by which improvement in design science can be reached:

• practice: rationalization in engineering practice, i.e. in a design team or in a project;

• science: answering scientific questions via research projects;

• education: improvement of design teaching in the universities.

The term design science will be accepted with difficulty in the many companies where, at best, the design department is responsible only for the last phases of the design process, that is, the functional and detail phases. Many authors have tried to define the terms design, engineering design, and designing; among others:

• engineering design is the process of applying various techniques and scientific principles for the purpose of defining a device, a process, or a system in sufficient detail to permit its physical realization (Taylor and Acosta, 1959);

• engineering design is a purposeful activity directed toward the goal of fulfilling human needs, particularly those which can be met by the technology factors of our culture, and toward decision making in the face of uncertainty, with high penalty for error (Asimov, 1962);

• designing means to find a technically perfect, economically favorable and aesthetically satisfactory solution for a given task (Kesselring, 1964);

• design is a goal-directed problem-solving activity (Archer, 1964);

• design is a creative activity, bringing into being something new and useful that has not existed previously (Reswick, 1965);

• design is the imaginative jump from present facts to future possibilities (Page, 1966);

• design is the activity involved with actually constructing the system, i.e., given a specification of the system, it is mapped into its physical realization; the design task, however, extends throughout a system life cycle, from the initial commitment to build a new system to its final full-scale production (Katz, 1985);

• design is the creation of a synthesized solution in the form of products that satisfy perceived needs through mapping between the functional requirements in the functional domain and the design parameters of the physical domain, through proper selection of the design parameters that satisfy the functional requirements (Suh, 1989);

• designing is the chain of events that begins with the sponsor's wish and moves through the actions of designers, manufacturers, distributors and consumers to the ultimate effects of a newly designed thing upon the world (Jones, 1992).

The nature of design is reflected in many other statements, which at best capture only part of the truth. Design is still considered a predominantly creative activity, founded on knowledge and experience, devoted to determining the functional and structural construction, and to creating documents and drawings that are ready for manufacturing. It is therefore compulsory to look for innovative approaches to design, eliminating iterative methodology wherever possible to save design time and effort.

One should recognize, as a premise, that the task of design science itself cannot be limited only to the delivery of technical-scientific tools (statistical databases, numerical codes, etc.) for improving the most important technical properties. Design science should also offer complete knowledge about the design process, i.e. information about models, procedures and methods with which all properties of the technical system can be established and realized in optimal quality, and also about how they can be implemented to make the design process economical and quick.

The following partial goals of design science can be derived from the above:

• optimizing technical properties of the industrial product to be designed;

• reducing the design lead–time and costs of the design process by controlling the responsibilities and downstream commitments;

• increasing attractiveness of design work for talented engineers, also by decreasing the proportion of routine work;

• decreasing the risks for the company and design team;

• shortening the complete education (maturity time to competence) of designers;

• creating the knowledge–base for computer application in designing by collecting and storing all available information (including experience) into relational databases;

• making the design process scientifically transparent.

Engineering designing is executed in two well characterized, but not strictly separated, phases of laying out and detailing. Despite the reference to those two phases, the largest advantages from design science are foreseen in the most important design stages, that is, in establishing and formulating the design specifications, and in the concept design where innovations might and should be generated. It is during the very initial design phases that the most important innovative, creative and holistic thoughts can bring the greatest benefits for the product to be designed.

It must also be clear that many psychological barriers are present, which work against acceptance of the newer research and design methods in engineering practice. Modification of the knowledge and provision of instructions and education is only one of the necessary steps for acceptance; others must also be fulfilled.

1.2 Design and Technical Knowledge

Since designing can also be considered a process of information transformation, an inclusive and huge amount of technical knowledge is needed. This information consists of:

• basic knowledge, to a different extent for the various design phases, taken from individual engineering sciences such as hydrodynamics, strength, materials, and manufacturing technology, as well as mathematics, physics, and so forth;

• knowledge of mathematical models and modelling techniques;

• knowledge of the product with regard to functions, operation mode, maintenance, etc., and related guideline (heuristic) values and constraints;

• knowledge of typical components, semi-finished parts, and purchased subsystems from manufacturers and suppliers;

• knowledge of organization management and engineering economics;

• knowledge of the working tools which can support the design tasks, including libraries, handbooks, and computer packages;

• knowledge of standards, regulations, rules, and patents.

There is no single source in the current literature from which to obtain all this information. The most inclusive answer, available in some engineering handbooks, generally presents only a fraction of the required knowledge. Even recommended handbooks, such as those published in the last decade by Watson (1998) and Schneekluth & Bertram (2000) in the ship design area, practically omit discussion of the design process.

Technical knowledge in its highest form, that is, in the theory of design, is process knowledge. A survey of the literature shows how little of this knowledge is contained even in specialized publications about designing. Even successful designers are not conscious enough of why they develop designs the way they do. They probably learned it by following a mentor early in their careers. On the other hand, academics who teach ship design generally do not document the different approaches, and do not even try to provide their students with the capability to understand which strategy is better at different design stages. Another problem is that only a part of this knowledge exists in written form. In general, experience internalized by individual designers is not available to the scientific community. In teaching and subsequent practice, only some of the enumerated elements are transmitted to future designers, without organizing them into a knowledge system. Only in the last decade has the construction of such a knowledge system started.

Fortunately, there has been considerable research into design in all disciplines over the past few decades. Outcomes have been translated into computer codes and procedures, and validated in connection with design knowledge. For instance, hydrodynamics and strength numerical codes have been developed into reliable tools as part of the basic/preliminary design process. Visual/graphical representations within the framework of technical drawings have also reached a high technical level, especially as wire-frame, surface and solid-based modelling in computer packages.

It is perhaps astonishing to see how much the historical evolution of designing has crystallized into a situation which still influences the common idea of designing. Some technical experts and scientists, even in universities, have considered design to be simply a more or less sophisticated computation activity. This can include complex operations such as computational fluid dynamics (CFD), finite element methods (FEM), structural optimization and reliability, statistical processing, and decision theory applications. For others, the task of technical drawing characterizes designing.

The central core of design work, that is, the prior thinking out of the technical product and process, has been reduced to merely 'getting an idea' or an inspiration. In this respect, most engineers believe that these ideas, as well as other decisions, can be obtained with just a portion of 'intuition' and 'creativity', without fully understanding their implications. More often, design is still held to be an 'art form' rather than a scientific activity. Talent certainly influences successful designing in several respects, especially with reference to efficiency and effectiveness, but a more important factor is the 'working method'.

Only with the movement towards rationalizing design in the 1950s and 1960s was the activity of designers explained to be cognitive and technical–scientific. This was rediscovered in the late 1980s in the USA, with practically no reference to previous European and Soviet Union research work. Design needed technical knowledge, and a new task was to find, explore and process this knowledge into a theory. The discipline known under the terms 'methodical or systematic designing' has provided considerable preparatory work during the last forty years.

Nevertheless, practising engineers do not yet consider design a science. The impression is that engineers and workers can eliminate or compensate for all errors during the subsequent development and production of the product. In practice, designers do not find the time and motivation to place research about design within their own activity, and to understand and introduce it into their own work. It is therefore useful to discuss some classical concepts commonly related to the definition of designing, such as creativity, intuition, and innovation.

Design and Intuition

For many designers the term intuition is still a keyword. An appraisal of the results of intuitive thinking shows that one can often achieve brilliant ideas quickly. On the other hand, intuition shows itself to be extremely unreliable, both in timing and in the practical applicability of the idea. In any case, an intuitive idea must be examined (logically, systematically, analytically) for the possibility of its realization. Frequently, new ideas must be rejected, since analyses of the definitive results of solutions that have emerged from intuition show that almost nothing remains of the original idea after the many necessary improvements and corrections.

Nevertheless, engineers think intuitively and are generally inclined to behave so (Hubka & Eder, 1992). Leadership personnel use intuition mainly in situations where the problem must be treated quickly, where the design has to progress on the basis of partial information, or where new solutions for organizational concepts, marketing strategies or technical innovations are to be found. Rich experience is a prerequisite for any intuition. Lack of experience, and thus a small basis for comparison, can only lead to a risky guessing game.

Design and Feeling

If one observes experienced designers at work, it is impressive to see how often they make appropriate quantitative decisions initially without any calculation. The subsequent computation, in orders of magnitude and with refined processes, often confirms the correctness of the estimate. This ability is called feeling for design, and it is often regarded as a talent. It is obvious that this feeling is based on experience and knowledge, rather than on the idea that the designer was 'born with' something innate.

The products of feeling for design are quantified statements, usually even depending on several quantities. The dependencies are not always quantitatively available, be it in the form of an exact or approximate derived formula, or of a quantified experience value (a heuristic 'rule of thumb'). Often only qualitative statements are possible: larger or smaller, sharper or blunter. However, designers can exploit only that knowledge that they have stored and internalized.

Rules, which are presented as formulae, heuristic advice or guidelines, or were gained intuitively through experience, begin subconsciously to influence the thinking processes after frequent, repeated use. In this way, the developed 'feeling for design' delivers quick decisions. These decisions are of limited validity, as the experience must remain within a conventional range and can break down if this range is exceeded. The danger exists that a designer is not aware of exceeding the conventional range limits, and underestimates the need to check his/her 'design feeling'.

Design and Creativity

The combination of the two qualities of intuition and feeling can lead to the idea that designers should be creative. The word 'creative' means capable of creation, inventiveness, and imagination, in addition to habitual skill and knowledge.

Creativity does not 'just occur'. Recent research in psychology has shown that creativity, the generation of novel ideas, occurs as a result of a natural tension between the scientific and intuitive mental modes. The scientific mode (systematic, analytical) can recognize that a problem exists and can analyze its nature. The intuitive mode (erratic, non-calculable) can yield a sense of dissatisfaction, which motivates the interaction with the scientific mode to attempt to solve the problem. The oscillatory interplay between intuitive and systematic thinking modes can result in creative proposals for original solutions. Creativity occurs as a direct result of using a systematic approach to searching for solutions. Of course, science–based design methodology may offer the appropriate methods to support creativity.

Neither a rigid systematic approach to designing nor a fully intuitive mode of working is adequate by itself to make good products likely. A flexible combination of systematic work and freedom of action, adapted to the specific design problem, and performed by well–educated designers using efficient tools in a suitable technical–scientific environment, is likely to increase the chances of getting successful designs.

A design team that is effective and creative in problem solving must simultaneously show:

• knowledge of products and physical principles, including adequate knowledge derived from experience, heuristics and feelings;

• knowledge of processes, especially knowledge about design and problem–solving processes;

• an open-minded attitude, e.g. willingness to accept ideas and suggestions;

• the ability to communicate the generated proposals in useful and attractive forms.

Design, Innovation and Invention

Today the word innovation is a much abused term, used as a slogan by politicians and managers. It not only carries an implied positive objective (analogous to creativity) but also the danger of elevating the search for the new and original to the supreme goal of design activity. The goal of designing must always and only be to achieve the 'best possible' solution under the given conditions.


If a new idea, not deducible from the current state–of–the–art, is devised for improving an industrial product, this is called an invention. Inventions are normally patentable, i.e. they can be protected against copying by granting a patent to the inventors.

The combination of inventing with designing has a varied character. Designers should examine solutions with regard to the possibility of patent applications as inventions. It has to be emphasized that invention should not be a primary target for designers. Higher on the scale of striving for competitiveness should be an optimal solution that contains an invention.

Design and Heuristics

As an adjective, the word heuristic refers to an instruction or guideline which is not necessarily based on science. In this sense, a heuristic is simply a ‘rule of thumb’, derived from experience, which could lead to an acceptable result with good probability.

Using this interpretation, Koen (1985) has proposed the hypothesis that in engineering all theories and instructions are to be regarded as heuristics, for two main reasons: (i) science serves in engineering only to formulate heuristics for the realization of technical systems; and (ii) the form in which the results of science are usually expressed in engineering is not directly suitable for the explorative ways of design. Koen has even stated that the application of these heuristics is the only approach useful to engineers. Koen’s interpretation of heuristics is useful for enhancing the humility of engineers, but is useless as a prescription of how one can accomplish design more effectively and efficiently.

More strictly, heuristics was defined by Klaus (1965) as the ‘science of the methods and rules of discovery and invention’. The heuristic method is a particular form of the trial–and–error method. It differs from the deductive method in that it works with conjectures, working hypotheses, prediction models, etc. Heuristic methods can be simulated on computers. The best–known proponent of modern heuristics is Polya (1980), who is regarded as the re-discoverer of heuristics.

Interaction between heuristics and design methodology provides systematic heuristics. Muller (1980) has discussed the following assumptions as a result of his observations on the design methods of experienced designers, systems analysis and heuristic methods:

• innovative design is a problem–solving activity;

• each design activity can be divided into a finite number of subtasks, which can be solved simultaneously or sequentially;

• each subtask needs a different method, which must be tailored correspondingly;

• each design team has its own and peculiar heuristic method.


Design and Education

If one wants to rationalize the engineering design process through an innovative approach, two axioms must be established:

• design is a rational activity, which can be decomposed into smaller design phases;

• the design process can be conceived and structured in a very general form, even though it depends on the product to be designed.

It is therefore mandatory to reject completely some widespread and wrong opinions, namely:

• design is an art and only especially talented persons can execute it;

• design is not a generalizable activity, but is always bound to the particular product to be designed.

Design education is closely related to the first axiom: design is teachable, provided a theory (design science) and the right educational methods are made available. The engineering design community is still discussing whether design should be taught primarily by establishing a foundation of theory or by engaging students in loosely supervised practice. For the broader activity of product design and development, both approaches must be rejected when taken to their extremes. Theory without practice is ineffective because there are many nuances, exceptions and subtleties to be learned in practical settings, and because some necessary tasks simply lack any theoretical basis. Practice without theory fails to exploit the knowledge that successful product development professionals and researchers have accumulated over time. Although there are still strong defenders of both extremes, it is likely that over time the theoretical approach will prevail.

One reason why the theory of design has developed so slowly is that most engineers did not, and still do not, receive formal education in design. This is confirmed by the fact that design theory and practice are insufficiently represented in, or even absent from, current curricula. Therefore, it must be one of the goals of design science to propose suitable models and didactic instruments, which consider all elements (teaching, theory, practice, etc.) and integrate them into a total system.

The main task remains, however, to elaborate ‘design science’ in general and particularly for students, and to develop the necessary didactic principles, tools, procedures and teaching materials. The assignment of this task to design science could be disputed. The task is so special that it can be solved only through cooperation between theorists and design specialists. This cooperative work requires specialized organizations combining engineering and education elements. Regrettably, there is nothing similar in Italy to, for instance, the American Society for Engineering Education (ASEE) in the USA, the Internationale Gesellschaft für Ingenieurpädagogik (IGIP) in Germany, and the Société Européenne pour la Formation des Ingénieurs (SEFI) in France.

Designing is also a skill, i.e. it needs both knowledge and abilities, attainable through work, exercise and training. If little or no relevant and organized knowledge is presented, the learning time will be long: hence, education and training are inseparable. Not every industrial company puts good instructors in charge of a new generation of designers, and the best designers are not always good instructors. The time to maturity and capability of design engineers is the problematic parameter; it is generally estimated at about ten post–graduate years. Such a long time span is still needed on average to gain and internalize the knowledge that cannot be drawn from a designer’s own experience and study.

In any case, it is necessary to educate students in the application of commercial packages. In ship design, computer–based packages are available for design synthesis (ASSET, SDS), analysis (FEM, CFD), surface modelling (Rhinoceros), CAD (AutoCad, Fastship, MacSurf, AVEBA–TRIBON, NAPA, GHS, FORAN) and CAE (TRIBON, CALMA). However, although some people claim that computers eliminate the need for a design process, an efficient and rational design process remains mandatory.

In short, education and learning goals within design science should be the acquisition of:

• knowledge: technical system theory, specialized knowledge about design activities, theory of design processes, theory of decision making;

• ability: methodical procedures, mathematical modelling, transfer of know-how into databases, application of statistical and numerical codes;

• skill: sketching, drawing by hand, drawing by computer (CAGD), computing via CAD–CAM tools.

1.3 Design and Science

The introduction of design science into the design process does not mean either immediate successes or double–digit percentage savings. It is indisputable that design science will bring a long–term improvement both in the procedures of the design process and in the quality of industrial products. But, as for any science or theory, one cannot expect that it can be immediately applied to the real problems of design engineers. On the one hand, improvements must be derived from science and adapted to practice; on the other hand, where improvements originate from practice, they must be inserted and absorbed into the science. Only afterwards may technical knowledge, presented in a new order, totality and form, become a striking and productive tool for designers.

Extensive knowledge exists which can be denoted as traditional design knowledge. Design science does not aim to be a competing discipline, but to provide a framework discipline, which allows and encourages the inclusion of, transfer of, or reference to relevant knowledge selected and revised from the existing global knowledge, combined with the necessary ‘completions’. The new integrated knowledge system must form a logical and organized whole, in which the individual elements mutually and synergistically fertilize each other. It will fulfill the objective of delivering relevant information for designing in suitable form.

1.3.1 Knowledge and Science

As in all areas of human activity, in science and technology too, knowledge is power. Therefore, relevant knowledge and experience from the individual areas have to be collected and organized, and thereby abstracted into general rules, models and laws. That happened also when the misleading title ‘ship theory’, as the science dealing with the phenomena and forms of floating objects, appeared as a collective term around the year 1800 with Russell and other scientists; the overall structure of knowledge was then simplified and made clearer by bringing together the existing areas of statics and structures.

In order to arrive at a more rational and effective design process, it is compulsory to develop design research, that is, suitably structured knowledge built by gathering and interpreting existing data, developing decision–making support systems, etc. Three forms of design research can be envisaged (Eder, 1994):

• research into design, e.g. various kinds of protocols, decision–making techniques, etc.;

• research for design, e.g. computer–aided tools, databases, modelling techniques, design of experiments;

• research through design, e.g. abstraction from observations during design, synthesis from intermediate and final solutions.

Design science is not autonomous in its development, and has to adapt and include knowledge from other disciplines and branches of science. Important knowledge areas are systematic heuristics as part of technology, mathematics, computer science, and management. The search for general laws in the engineering sciences, and also in the engineering design field, has always considered mathematics as its most important tool. The increasing demands made on decision–making and modelling have always led to more intensive exploitation of the classical as well as new methods of mathematics, such as statistics, probability theory, fuzzy sets, cluster analysis, response surface methods, decision–making theory, optimization, artificial intelligence, and others.

The design process must be organized and directed in its progress. Management methods are often discussed in relation to engineering design management and leadership. The methods of systems engineering (SE), concurrent engineering (CE), quality function deployment (QFD), total quality management (TQM), product development strategies (PDS), etc., also belong to the boundary region between management and design. Team work and team building (Shuster, 1990) are important areas for current management techniques, including control of conflict, mutual support, interchange and stimulation of ideas, and so on.

1.3.2 Engineering and Science

The engineering sciences are committed to organizing technical knowledge as expediently as possible, and strive for a complete and suitable form. A simplistic and misleading statement that ‘engineering is applied science’ is frequently heard in this context. Although most engineers quote technology as the object of the engineering sciences, the latter comprise three basic elements, namely products, processes, and materials, which can be explored independently of each other but form an inseparable unit.

The class of engineering sciences is perhaps broader than the average understanding of this term, which normally limits itself to applied mathematics, mechanics, fluid dynamics, electronics and similar areas. It concerns knowledge that makes possible the treatment of technical products and processes in all of their life phases, so that design work is facilitated. The importance of engineering sciences and the proportion of technical disciplines increase continuously with the rising number of scientific specialties. In the course of technical work, phenomena and object properties must be evaluated, modelled, simulated, investigated experimentally, etc.

A list of knowledge issues for research and development in engineering design consists of:

• case–based reasoning: adapting the lessons learned from a previous design problem into a computerized knowledge–base for establishing new guidelines for future projects;

• collaborative design: allowing several designers to interact with a design database to modify design issues that are their responsibility;

• decision–making support systems: mainly multicriterial decision–making systems to help in the design evaluation and selection process;

• information delivery systems for design: not just information storage and retrieval, but also interpreting and presenting it in design–suitable forms;

• knowledge–based design tools: capturing knowledge, especially unstructured experiences and information, for databases, artificial intelligence, etc.;

• modelling: response surface methodology, neural networks, kriging, etc. (a minimal surrogate–model sketch follows this list);

• optimization techniques;

• simulation: computational investigations of the behavior of technical systems, e.g. computational fluid dynamics, finite element methods, etc.;

• virtual reality: visually walking through an environment that has only been built as a computer model.
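
By way of illustration only, the following minimal Python sketch fits a quadratic response surface (a simple metamodel of the kind mentioned in the modelling item above) to a handful of sampled design evaluations. The design variables, the sample values and the quadratic form are assumptions introduced here for illustration, not data from the text.

import numpy as np

# Illustrative samples: two design variables (say, a slenderness ratio and a block
# coefficient) and one observed response (say, required power from a costly analysis).
X = np.array([[6.0, 0.60], [6.5, 0.62], [7.0, 0.65], [7.5, 0.70],
              [8.0, 0.72], [8.5, 0.75], [6.8, 0.68], [7.8, 0.66]])
y = np.array([9800., 9500., 9300., 9400., 9600., 9900., 9350., 9450.])

def quad_terms(x):
    """Basis of a full quadratic response surface in two variables."""
    x1, x2 = x
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

A = np.array([quad_terms(x) for x in X])
coeff, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit of the metamodel

def predict(x):
    """Cheap surrogate evaluation standing in for the expensive analysis."""
    return float(np.dot(quad_terms(x), coeff))

print(predict([7.2, 0.67]))   # quick estimate for an unsampled candidate design

In practice kriging or a neural network would play the same role; the point of the sketch is only that a cheap, fitted model can replace repeated calls to an expensive analysis during early design exploration.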

The general theory of technical systems (Hubka, 1984) should act as the fundamental discipline that determines the problems and their interrelated structures. All hierarchically lower product theories, e.g. the technical process sciences, mediate technical knowledge about individual processes and types of processes.

1.3.3 Development of Design Knowledge

The need for improvement efforts concerning designing has been driven by several factors: (i) unusual pressure towards better performance and very demanding targets in high–tech industry; (ii) the availability of fast and powerful computers together with the development of numerical codes; (iii) a shortage of skilled people in some areas; and (iv) time pressure.

Up to the mid–sixties there were only some isolated groups or individual experts who proposed certain solutions for the improvement of design work. The next period, especially from the seventies until today, can be labelled as the prime time for the development of design science. Increasing numbers of research contracts, the exchange of opinions through international conferences and the founding of ‘Institutes for Design Technology’ enlarged the qualitative capacity to attack design problems scientifically.


Factors mainly affecting the development of design knowledge are the degree of industrial development, the levels of education, the extent of research and the size of the individual countries. The corresponding picture is quite complicated because of many factors, namely:

• The level of industrialization plays a large role when a need for rationalizing the design process is recognized. It is surprising how sparsely the research projects for design knowledge began to work and how weak the interest in the existing knowledge was in many countries (especially in the USA). A possible explanation for this could be that where the economic power is strong, a persistent inertia exists, which decreases research efforts for the purpose of design improvement.

• University graduate engineers are more open to design knowledge than graduates of the lower engineering schools or industrially trained designers. It may be that university engineers are less represented in the embodiment and detailed design process, and that they especially handle the more abstract conceptual, analytic and numerical tasks. In comparison, the trained designers dedicate themselves mainly to the more concrete tasks, where results from older expertise appear more important.

• The cultural traditions also play a decisive role: understanding about goals and means for design knowledge in the countries of the European continent is different from that in the UK or the USA.

• The size and financial power of a country appear not to play a relevant role. The results achieved in Germany or in the Scandinavian countries towards design science are incomparable in this respect with those in the USA, Russia and even Italy. By contrast, the output of results from USA computer science as a whole far outweighs any others.

To obtain a more precise picture of the situation, one would have to observe the state–of–the–art and/or the development in research, in engineering design practice, and in design education. In rough terms the present state of design knowledge can be described as follows:

• much knowledge has been accumulated, mainly as ‘islands of knowledge’, because too little synthesis was pursued; relationships between single elements of design knowledge have been inadequately explored.

• some individual areas were and are not explored sufficiently, which is a disadvantage, since the mutual relationships do not emerge in the knowledge system.

• more care has recently been taken of design methodology.

As a result, some goals crystallized either into partial tasks or even into principles for design methodology; for example:

• the task of obtaining a rational formulation of design strategy and design procedures;

• the task of obtaining a clear separation of individual activities, especially those requiring a high level of specialization (CFD, FEM, DoE, etc.);

• the goal of generating as many feasible designs as possible and selecting the best of them in the concept design phase.


1.4 Areas of Design Science

To establish the technical knowledge, the structure of ‘design science’ is categorized and ordered into partial and specialized areas, namely the theory of technical systems and the theory of design processes. To enhance understanding of the technical system, it is advisable to explore the relationships among these areas and with regard to exogenous factors.

1.4.1 Theory of Technical Systems

A partial area of ‘design science’ is the theory of technical systems (TTS), which describes and explains the technical system to be designed from all viewpoints important for designing. The corresponding descriptive statements primarily concern the transformation process and the effects of technical product systems on the performance of an industrial product, on the different ways of modelling its characteristics, and on the structure of the building components.

Table 1.1 summarizes the partial areas of the theory of technical systems, with their thematic questions, and some of those engineering sciences which deal with detailed investigations of relevant properties.

Areas of TTS         | Transformation Process Systems | Technical Product Systems            | Properties of Technical Systems
---------------------|--------------------------------|--------------------------------------|---------------------------------
Goals: reply to      | Task:                          | - operation mode                     | Classes:
specific questions   | - modelling                    | - prototyping                        | - relationships among properties
                     | - technology                   | - parts                              | - design characteristics
                     | - environment                  | - design                             | - evaluation of properties
                     | - organization                 |                                      |
Items of analysis    | - Physics                      | - Cybernetics                        | - Mechanics
                     | - Electrical Engineering       | - System Technology                  | - Strength (of materials)
                     | - Manufacturing Technology     | - Building Technology                | - Industrial Design
                     | - Process Knowledge            | - Instrumentation Engineering        | - Ergonomics
                     | - ...                          | - Branch Knowledge about TS families | - Aesthetics
                     |                                | - ...                                | - Hydromechanics
                     |                                |                                      | - Vibrations
                     |                                |                                      | - ...

Table 1.1. Areas of the theory of technical systems

The theory of technical systems comprehends the available knowledge areas, which substantially consider the individual families of technical systems. The existing knowledge systems about the objects of designing only deal with some specialties, and in any case they cover only partially the information needed by designers. Thus, the theory of technical systems is an area of ‘design science’ aimed at providing the required knowledge in a form suitable for designing.


Blanchard and Fabrycky (1978) identify three basic elements in a technical system:

• components, which are the operating parts of a technical system; each system component may assume a variety of values, as set by evaluations subject to constraints;

• attributes, which are the properties characterizing the components of a technical system;

• relationships, which are the links between components and attributes.

Theory of Properties

One of the most important parts of the theory of technical systems is the theory of properties, where the properties of a technical system are all those features which allow a substantial evaluation of its technical and economic characteristics from the concept design phase onwards. As a consequence, concept design can be seen as a search for feasible design properties. The actual properties of a feasible design are measurable quantities (attributes). Designers must establish and evaluate these properties, subject to requirements and constraints, accurately and reliably from the very early design stages.

The value of each property (size, dynamic behavior, cost, etc.) represents its quantitative measure for all feasible designs. The total value of a design is composed of the values of multiple attributes, to enable an overall judgment such as a utility value or the required freight rate. The value scale forms a sequence of continuous or discrete total values. Continuous scales can be absolute, with a defined zero point, or relative, with an arbitrary zero point or none. Discrete scales can be selected values from a continuous scale (ranking), or values merely belonging to a set (Pareto set).
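
To make the notions of total value and of a Pareto set concrete, the following minimal sketch scores a few candidate designs by a weighted combination of normalized attribute values and keeps only the non-dominated ones. The attribute names, weights and numbers are invented for illustration; real criteria and preferences are design-specific.

# Illustrative candidates: attribute values already mapped to 0-1 value scales
# (higher is better for every attribute).
candidates = {
    "A": {"speed": 0.80, "deadweight": 0.60, "cost": 0.70},
    "B": {"speed": 0.70, "deadweight": 0.75, "cost": 0.65},
    "C": {"speed": 0.60, "deadweight": 0.55, "cost": 0.60},
}
weights = {"speed": 0.4, "deadweight": 0.3, "cost": 0.3}   # assumed preferences

def total_value(attrs):
    """Weighted combination of single attribute values (overall judgment)."""
    return sum(weights[k] * v for k, v in attrs.items())

def dominates(a, b):
    """a dominates b if it is at least as good everywhere and better somewhere."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)

pareto = [n for n, a in candidates.items()
          if not any(dominates(b, a) for m, b in candidates.items() if m != n)]
ranking = sorted(candidates, key=lambda n: total_value(candidates[n]), reverse=True)

print("Pareto set:", pareto)              # non-dominated alternatives
print("Ranking by total value:", ranking)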

A complete and general list of all the properties necessary to describe the requirements for a complex technical system might require some hundreds of attributes, which generally represent an impractical check list, useless for design. The way out lies in identifying a general set of classes of properties which can be concretized into individual domains and subclasses, down to individual properties defined for a particular subsystem. Four collective classes of properties can be distinguished to provide complete coverage of the requirements:

1. design–related classes, the so–called internal properties;

2. classes that refer to economic properties, safety and ergonomic properties, aesthetic properties, and compliance with laws and rules;

3. classes that directly cover the manufacturing properties;

4. classes related to the purpose of a technical system during its life–cycle time.

The boundaries between these classes are not well defined; each property can affect one or several classes. The typical relationships between one property and other properties represent the sum of knowledge needed to be able to design for that property.


Quality of Technical Systems

Every technical system has to be evaluated from the viewpoint of its quality, which can be heavily influenced from the early design phases onwards (quality assurance).

The German standard DIN 55350 understands quality as the entirety of a technical product’s properties which refers to its ability to fulfill given requirements. This definition implies a total judgment (total value, value vector) about a set of attributes that describe how suitable the technical system is to fulfill all partial requirements. This total value is composed of the individual values of all relevant properties by a weighted combination of them, where the synergistic influence of one property on another should be taken into account.

According to the groups of properties used to determine quality, different kinds of quality can be categorized, such as structural quality, manufacturing quality, and reliability quality. There are three partial areas within the development of an industrial product which govern its quality:

• Quality of design, which mostly affects the quality of technical systems. It can be assured through a rational, scientific and systematic design process, which uses theoretical evaluation, experimental tests and validation at suitable milestones during the design work and after its conclusion, based on the compiled documents (‘design audit’): this quality absolutely demands an objective set of technical knowledge (‘design science’).

• Quality of manufacturing, which is measured on the produced components of the technical system. It is known as quality of conformity to the manufacturing specifications, i.e. to the detail drawings. After production of the components, measurement of this quality aims at rejecting the unfit ones with the help of statistical processes (‘quality control’).

• Quality of application, which can be evaluated only when the technical system is employed. This also includes secondary processes such as maintenance, repair, refitting, upgrading, etc.

Knowledge systems have been developed for each of these three partial areas. The relevant standards for the recognition and verification of quality assurance schemes are ISO 9000, ISO 9001, and ISO 9002.

Models of Technical Systems

In engineering practice, at the preliminary and functional design phases a set of drawings of the future technical system is executed, which must contain all the information necessary for its manufacturing. Design practice uses many kinds of representations (isometric or perspective projection, graphic representation) and models (communication models for data transmission, experimental models, mathematical models, etc.). Practitioners are often not conscious that they are really dealing with a model of a technical system.

For design, models serve very different purposes such as:

• prediction of the properties of a technical system;

• optimization of a subsystem;

• management of the manufacturing planning.


Cybernetics tries to specify the term ‘model’ accurately and distinguishes two fundamental classes of models, both needed in design:

• Behavior models, which model the properties of a technical system via numerical simulation of functioning, computation of dynamic responses, and analyses of experiments; they are more useful in the preliminary and contractual design phases. Meaningful and simple models (metamodels) of technical systems should guarantee a lack of ambiguity of interpretation. These models represent either a single attribute (e.g. lightship weight) or all the single properties belonging to a class (e.g. seakeeping behavior).

• Structure models, which model the structure of a technical system in detail and assembly drawings; they are mostly used in embodiment design.

An important set of models is the class of ideal models, which aims to create a meaningful original which can be seen as a utopia. The idealization results in an ideal design (zenith), in which all the essential functional properties of the technical system are included at their highest value. It may serve as a reference scale for evaluating, ranking and selecting competing alternatives.
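
As a sketch of how the zenith might be used as a reference scale, the following code ranks a few candidates by their normalized distance from the ideal design, the closest alternative ranking first. The attributes, their values and the root-mean-square distance metric are illustrative assumptions, not a prescription from the text.

attrs = ["speed_kn", "deadweight_t", "building_cost_MEuro"]
better_high = {"speed_kn": True, "deadweight_t": True, "building_cost_MEuro": False}

candidates = {
    "A": {"speed_kn": 21.0, "deadweight_t": 52000., "building_cost_MEuro": 48.},
    "B": {"speed_kn": 19.5, "deadweight_t": 56000., "building_cost_MEuro": 45.},
    "C": {"speed_kn": 20.0, "deadweight_t": 50000., "building_cost_MEuro": 50.},
}

# The ideal design (zenith) collects the best value of each attribute over all candidates.
ideal = {a: (max if better_high[a] else min)(c[a] for c in candidates.values()) for a in attrs}
worst = {a: (min if better_high[a] else max)(c[a] for c in candidates.values()) for a in attrs}

def distance_to_ideal(c):
    """RMS distance on attributes normalized between worst (1) and ideal (0)."""
    d2 = 0.0
    for a in attrs:
        span = ideal[a] - worst[a]
        z = 0.0 if span == 0 else (ideal[a] - c[a]) / span
        d2 += z * z
    return (d2 / len(attrs)) ** 0.5

for name in sorted(candidates, key=lambda n: distance_to_ideal(candidates[n])):
    print(name, round(distance_to_ideal(candidates[name]), 3))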

1.4.2 Theory of Design Process

The relevance of the design process has to be emphasized everywhere, since it happens in many countries, Italy included, that top management does not yet recognize the broad importance and scope of design for the success of products and of the company. Incidentally, this is reflected in the relatively low salaries offered to designers.

The scope of the theory of design processes is to explore the design process as generally as possible, and to organize, store and reference the complete process knowledge for and about design. This means that, in contrast to the present condition of empiricism, the relationships among the individual knowledge elements should be explicitly recognized and explained.

The contents of the theory of design process can be outlined by the following themes:

• task and structure of the design process;

• computer–aided tools in the design process;

• procedures for solving the design tasks;

• factors affecting the efficiency and effectiveness of the design process;

• subjective decision–making preferences in the selection activity.

The design process has to be well documented so that every designer may understand the design rationale, thus reducing the risk of imprecise and uncertain decisions. A record of the main decisions taken has to be created for future reference and for educating young designers. Documented design processes have usually been developed over time by trial–and–error, searching for the so–called optimal solution by an evolutionary approach.

In many industrial activities such as shipbuilding, designers are under increasing pressure to produce ‘right first time’ designs in a very short time. Therefore, skill is required in applying the correct decision–making operations (problem solving) at the right time during the product development process. At the concept design phase, the problem–solving activity consists of structuring the design problem, generating alternative solutions, evaluating and selecting the ‘best possible’ design among candidate solutions, and communicating it to the preliminary design level by delivering top–level specifications.

Due to the complexity of technical systems, designers must use decomposition: the design process is subdivided into a set of design activities that permit evaluation of individual properties before they are recombined into an overall evaluation of the design. The basis for decomposition and for identification of design characteristics is obtained from the ‘product design specification’.

It is important to note that companies and industries, at varying levels of maturity, will have very different views of what their design strategy is and therefore of which evaluation methods and approaches they will use to support their strategy. For example, a company may be prepared to adopt a high–risk strategy to introduce a novel product. However, when one considers the many millions of euros, and the many hundreds of people, required to design, develop, test and manufacture a new complex technical system, it is clear that the consequences of failure are great. To reduce risk levels, several evaluation methods have been developed over the years to accommodate the needs of the different design strategies employed by companies and decision makers.

Characterization and Evaluation in the Design Process

In general, the design process is still recursive (problem solving calling problem solving for a sub–problem, or a design process calling a second design process for a less complex subsystem, going downwards in a hierarchy of complexity) and also iterative (exploring forward into more advanced design stages and backwards for review, expansion, completion and correction).

The design process is to some extent idiosyncratic, depending on the experience background of the design team. It may concern a novel product, with little or nothing taken over from a previous product; then risk tends to be relatively large, especially if untried principles are used. The product may be more or less an adaptation of an existing one to new requirements, that is, a kind of redesign. It can be a renewal of a previous product, using modified principles, performance values, new subsystems, etc. Design can result in an alteration or updating of existing products. Many design projects relate to variants, by scaling up or down with appropriate adaptation of size relationships.

A typical evolution of a technical product from a management viewpoint consists of identifying criteria, conceptualizing, evaluating with accuracy, embodiment, and detailing. Identifying the criteria leads to the design specifications. Concept design uses knowledge–based metamodels to evaluate the technical and economic properties of the product to be designed. Accurate evaluation in preliminary design and contractual design requires precise definition of the product geometry, usage of numerical codes, experimental analysis, etc. Embodiment develops the functional design up to the final improvement of the product. The final stage of detailing produces the complete manufacturing information: typically detail drawings, parts lists, and instructions for assembly.

The decision process may be applied for the evaluation, improvement and optimization of a candidate design, but its main purpose is selection. Decision–making theory has been developed for that purpose, to make decisions more rational, provided that the criteria and attribute functions can be formulated in mathematical terms. Evaluation needs the previous identification of criteria and of targets for acceptable performance related to the properties. Some evaluation criteria will be absolutely objective, containing numbers or mathematical relationships, but others will be heavily dependent on the designers’ subjective preferences. Selection includes comparison of the candidate alternatives with respect to the ideal design to establish their potential quality.

Goals of the Design Process                | Design Process and Methodology | Design Information | Engineering Design Domain
-------------------------------------------|--------------------------------|--------------------|---------------------------
Quality of the technical system            | decisive                       | decisive           | moderate
  to be designed                           |                                |                    |
Design time                                | decisive                       | decisive           | decisive
Design efficiency                          | decisive                       | decisive           | moderate
Reduction of risk for the designers        | decisive                       | decisive           | decisive
Ratio of skilled to routine work           | decisive                       | moderate           | low
Maturation time for designers              | decisive                       | decisive           | moderate
Design costs                               | decisive                       | decisive           | decisive
Team work                                  | decisive                       | moderate           | decisive

Table 1.2. Relevance of areas on goals of the design process

The design process must be managed and made more reliable and predictable in the timing and quality of its outcomes, trying to shorten the time to economic break–even and profit; that requires:

• generating and evaluating alternative solutions to select the ‘preferred design’ among feasible, non–dominated alternatives;

• improving the preferred solution before the basic design phase through sensitivity analyses, so that as many potential faults as possible can be avoided (see the sketch after this list);

• a scientific knowledge–base about similar products that have been designed, about available principles of operation, but also about failures that have occurred, and their causes;

• the best possible knowledge about the design process;


• auditing the design work with respect to all properties and accepted design specifications by validating the models, verifying I/O data and checking manufacturing information.
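
A one-at-a-time sensitivity check of the kind mentioned in the second item of the list above might look like the following sketch. The response function, the design variables and the five-per-cent perturbation are illustrative stand-ins for a real evaluation model.

def required_power(design):
    """Stand-in response model; a real study would call a proper evaluation code."""
    L, B, Cb, V = design["L"], design["B"], design["Cb"], design["V"]
    # crude illustrative proxy: power grows with hull volume and with speed cubed
    return 0.006 * (L * B * Cb) * V**3

preferred = {"L": 180.0, "B": 30.0, "Cb": 0.70, "V": 21.0}
base = required_power(preferred)

# Perturb each variable by +/-5% in turn and record the relative change in response.
for var in preferred:
    for factor in (0.95, 1.05):
        trial = dict(preferred, **{var: preferred[var] * factor})
        change = (required_power(trial) - base) / base
        print(f"{var} x{factor:.2f}: {change:+.1%} power")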

Furthermore, rationalizing the design process also implies that the product and its manufacturing process should be designed concurrently. For a novel product, concurrency usually needs to wait until at least the concept design phase is substantially completed. During embodiment and detailing of the product, concurrent engineering of the manufacturing process can proceed at its best. It is necessary, however, to ‘freeze’ the design development of the product before the manufacturing process is completed; otherwise, an expensive redesign situation can occur.

A possible list of the partial areas of the design process and the corresponding effects on different goals is shown in Table 1.2. It is evident that the relationship between goals and partial areas of the design process is strongest as regards design time and reduction of risk for the designers. Of course, the evaluations in this judgement matrix may be modified according to individual experience.

Transformation Technologies in the Design Process

To realize a rational transformation in the design process, the technology of the design process should be ascertained. It should make possible not only the transformation of the information (from rough specifications to a representation and modelling of the industrial product ready for manufacturing), but at the same time also its optimization. To guarantee that the process progresses in an effective way, all the technical and organizational elements as well as their mutual relationships must be standardized.

An analysis of present design processes shows at least three typical classes of transformation technologies:

• traditional, which is still predominantly used in heuristic engineering design practice;

• methodical, if the process is guided by systematic and rational methods;

• mixed, if both intuitive and methodical classes of procedures are used simultaneously.

The tasks of the individual designers are obviously different within these types of design process and technologies. Many parameters of the design process are influenced by the design technology; in particular, the quality of the design and product, the time to complete the design work, and especially the transparency of the design process. This latter factor can permit or hinder such procedures as team cooperation, design audits, and product liability litigation.

Among the available procedural models, three classes can be envisaged according to the type of guidelines:

• strictly algorithmic instructions, i.e. rigidly regulated procedures;

• heuristic instructions, i.e. relatively flexible procedures;

• relatively vague instructions, with no clear references, i.e. fairly free procedures, guided only by some relevant principles.


To make the area of methodology clearer, one should distinguish between design strategy, which should determine the general direction of the design process, and design tactics, which treat the methods and working principles of the individual design steps. In the context of systematic instructions for procedure, the question has emerged whether the whole design process is algorithmically solvable. Whatever the reply, procedural models of design have emerged and continue to emerge, as ‘flexible algorithms’, not only in design strategy but also in design tactics.

Risk Analysis

A major development is the use of statistics and probability for risk assessment in all aspects of design and operation.

Though relatively new to marine applications, risk analysis and other risk techniques have been used in other industries for more than 50 years. Risk analysis obtained its impetus from the start of entirely new industries such as nuclear power generation and the USA space program. More recently it has been applied to the protection of the environment and to safety in the operation of all types of technical products. All these cases shared the same problem, in that there were no historical data on which to base design decisions, or on which to predict the performance of equipment relative to its safe operation.

The old way to design for uncertainty and to eliminate the risk of failure was to apply safety margins to the derived requirements. The problem is that safety margins are built on experience; where there is no experience, large safety margins have to be applied, which wastes resources and may be cost prohibitive. To find a way out of this situation, designers turned to the application of probability, which is the foundation of risk analysis.

Techniques such as fault tree analysis were developed to break down the problem into parts that could be analyzed and assigned individual probability levels, which would then be combined into an overall risk assessment. After its initial development, the application of risk analysis expanded into industries where the rate of new technology development was high, or where the risk of catastrophic or very serious outcomes was present. In some cases it was only brought into use after significant accidents had occurred, such as the ‘Exxon Valdez’ cargo oil spill in Alaska.
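
A minimal numerical sketch of the fault-tree idea follows; the events, the gate structure and the probabilities are invented for illustration only. Assuming independent basic events, AND gates combine probabilities by multiplication and OR gates by the complement of the product of complements.

def and_gate(*p):
    """Top event needs all inputs to occur (independent events assumed)."""
    out = 1.0
    for q in p:
        out *= q
    return out

def or_gate(*p):
    """Top event occurs if any input occurs (independent events assumed)."""
    out = 1.0
    for q in p:
        out *= (1.0 - q)
    return 1.0 - out

# Hypothetical annual probabilities of basic events
p_pump_failure   = 1.0e-2
p_standby_fails  = 5.0e-2
p_valve_jams     = 2.0e-3
p_operator_error = 1.0e-2

loss_of_cooling  = and_gate(p_pump_failure, p_standby_fails)   # both pumps out
spurious_release = or_gate(p_valve_jams, p_operator_error)     # either cause
top_event        = or_gate(loss_of_cooling, spurious_release)

print(f"Top event probability per year: {top_event:.2e}")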

In the case of ship design, the risk in question can be the risk of not achieving the contract speed or deadweight on trials, the risk of structural failure, the risk of an oil spill, the risk of collision in a crowded sea lane, and so on. Probabilistic approaches in ship design now cover subdivision, damage stability, oil outflow, reliability-based structural design, machinery monitoring and control, maintenance and operation.

The global marine industry was introduced to risk analysis and management through the activities of IMO. The UK Marine Safety Agency developed a risk analysis and mitigation approach, the Formal Safety Assessment (FSA), which is a broad–brush approach to identifying major risk areas, analyzing them in turn, developing ways to mitigate the risk, performing a cost–benefit analysis for the proposed solutions, and then deciding on an approach.
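
The cost-benefit step of an FSA-type study can be sketched as follows. The risk control options, their costs and the assumed risk reductions are purely hypothetical, and the simple cost-per-unit-of-risk-averted ratio stands in for whatever acceptance criterion a real assessment would adopt.

# Hypothetical risk control options: implementation cost over the ship's life (MEuro)
# and the reduction in expected annual loss (MEuro/year) each is assumed to achieve.
options = {
    "double-hull fuel tanks": {"cost": 3.0, "risk_reduction_per_year": 0.25},
    "enhanced bridge alarms": {"cost": 0.4, "risk_reduction_per_year": 0.05},
    "extra tug escort":       {"cost": 6.0, "risk_reduction_per_year": 0.30},
}
ship_life_years = 25

for name, o in options.items():
    benefit = o["risk_reduction_per_year"] * ship_life_years
    ratio = o["cost"] / benefit            # cost per unit of risk averted
    verdict = "worthwhile" if ratio < 1.0 else "questionable"
    print(f"{name}: cost/benefit = {ratio:.2f} -> {verdict}")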


Designers and Leadership

What are engineering designers? They are educated engineers, open to new ideas, able to think ahead, and capable of transforming a concept for an industrial product into a technical system to be manufactured. In process terms, the designers transform information from requirements and constraints into the description and representation of a product which is capable of fulfilling the given demands, including requirements from the life cycle. Designers include all team members who cooperate in and contribute to the transformation in the design process; not only development engineers and layout engineers but also decision makers and draftsmen.

The design process demands from the designers a large number of different activities which support design but cannot be allocated directly to the real design process. They range from design planning to supplying information, from interaction with production planning to experimental analyses. Besides the design team directly active in the transformation process, the complete design system also contains several further staff members who fulfill the general functions necessary for the progress of the design process: management, administrative processing, archiving, etc. The historical individuality of designers meant, and sometimes still means, that the predominant share of design was accomplished by the designers themselves. That is why the great relevance of design for an industrial company has to rest with the designers. This necessity is, however, in evident contrast with many situations where designers are generally held in low respect.

The leadership strategy in the design process will hardly be found in a management manual. Design leaders, up to the top position, must always be branch experts who understand the work and are ready to take part in discussion, for otherwise they can hardly gain the necessary recognition from their team members. All activities of the design leaders, such as determining tasks and assignments, planning, working methods, coordination, organization and others, must be conducted and influenced by conscious leadership tactics. Continuing education of the staff members should not be forgotten, both in branch knowledge and in design process knowledge.

That is why ‘design science’ has the task of supporting the design managers with related knowledge and of making available for design processes the particular theories of specialization, organization and planning, in order to solve often conflicting problems; among others:

• creative–innovative tasks which can be very complex;

• design work often carried out and released under time pressure;

• simultaneous processing of several tasks which are in different states of design maturity;

• shortage of qualified members in some design areas;

• management of even capable staff members who are often individualists and not suited to group work;

• psychological barriers up to refusal of new methods and techniques/technologies;

• space for personality development, career prospects and chances for promotion (for example, advancement without forced transfer into company management).


Bibliography

[1] Archer, M.: Systematic Method for Designers, Council for Industrial Design, London, 1964.

[2] Asimov, M.: Introduction to Design, Prentice–Hall, Englewood Cliffs, 1962.

[3] Dieter, G.E.: Engineering Design, 2nd edition, McGraw–Hill, New York, 1991.

[4] Eder, W.E.: Developments in Education for Engineering Design - Some Results of 15 Years of WDK Activity in the Context of Design Research, Journal of Engineering Design, Carfax, 1994, Vol. 5, no. 2, pp. 135–144.

[5] Hubka, V.: Theory of Technical Systems, 2nd edition, Springer–Verlag, Berlin–Heidelberg, 1984.

[6] Hubka, V., Eder, W.E.: Introduction to Design Science, Springer–Verlag, Berlin–Heidelberg & New York, 1992.

[7] Jones, F.: Design Methods - Seeds of Human Futures, Van Nostrand Reinhold, New York, 1992.

[8] Katz, R.H.: Information Management for Engineering Design Applications, Springer–Verlag, New York, 1985.

[9] Kesselring, F.: Technical–Economic Designing, VDI–Zeitschrift, 1964, Vol. 106, no. 30, pp. 1530–1532.

[10] Klaus, G.: Dictionary of Cybernetics, Fischer, Frankfurt, 1969.

[11] Koen, B.V.: Definition of the Engineering Method, ASEE, Washington, 1985.

[12] Muller, J.: Working Methods of the Engineering Sciences - Systematics, Heuristics, Creativity, Springer–Verlag, Berlin–Heidelberg, 1990.

[13] National Science Foundation: Program Announcement Design Theory and Methodology, Nr. OMB 3145–0058, 1985.

[14] Page, J.K.: Contribution to Building for People, Conference Report, Ministry of Public Buildings and Works, London, 1966.

[15] Polya, G.: School of Thinking - about Solving Mathematical Problems, Francke, Bern, 1980.

[16] Reswick, J.B.: Prospectus for an Engineering Design Center, Case Institute of Technology, Cleveland, OH, 1965.

[17] Schneekluth, H., Bertram, V.: Ship Design for Efficiency and Economy, 2nd edition, Butterworth–Heinemann, Oxford, 1998.

[18] Shuster, H.D.: Teaming for Quality Improvement, Prentice–Hall, Englewood Cliffs, 1990.

[19] Suh, N.P.: Principles of Design, Oxford University Press, London, 1989.

[20] Taylor, E.S., Acosta, A.J.: MIT Report, MIT Press, Cambridge, 1991.

[21] Watson, D.G.M.: Practical Ship Design, Elsevier Ocean Engineering Book Series, Vol. 1, Elsevier, Oxford, 1998.


Chapter 2

Standard Design Processes

Nowadays, there is still a general lack of understanding of the essence of the design process. Design is not a body of knowledge, but the activity in which the design team integrates the existing bodies of knowledge, while continuously and simultaneously paying attention to and balancing the many factors that influence the design outcome.

The term design process, interchangeable with design methodology, refers to procedures developed to perform design activities adequately. These procedures are structured, that is, they are a step–by–step description and provide a template for the key information and decision–making. Documented design processes have been developed over time by trial–and–error following an evolutionary approach, with a few exceptions based on more rational design theories.

In general, because of the complexity of modern technical systems and more stringent requirements, the traditional design methods are no longer suited to yield competitive products. This holds above all for ships, which are among the most complex products, as is evident from the number of individual parts required for different products (see Table 2.1).

Product Type      | Number of Parts
------------------|----------------
Aircraft carrier  | 2,500,000
Submarine         | 1,000,000
VLCC              |   250,000
Boeing 777        |   100,000
Fighter aircraft  |    15,000
Automobile        |     1,000

Table 2.1. Number of unique parts in technical products

In today’s competitive market, there is an ever–increasing need to develop and produce technical systems that fulfill a number of customer requirements, are reliable and of high quality, and are cost effective. Even though it is well known that the early design phases are of utmost importance for the lifetime success of a technical product, until recently the focus was on engineering improvements, such as product planning, parts planning, process planning, production planning, automated manufacturing, reduction of labor costs, etc. (Elvekrok, 1997). Therefore, a rational paradigm for design is needed to increase both the efficiency and effectiveness of the design process. Efficiency is intended as a measure of the quickness in generating reliable information, which can be used by designers in the decision–making process. Effectiveness denotes a measure of the quality of design decisions (accuracy, comprehensiveness).

The primary objective of the design effort, besides creating the information needed to build the technical system, is to satisfy the customer’s requirements at minimum cost. The life–cycle cost of an industrial product includes the design, construction, and operating costs. For designs that incorporate new technologies, research and development costs must also be included.

The demand for innovation in product development requires innovation in the design process, achievable by considering some basic fundamentals:

• design is the primary driver of cost, quality, and time:

– concept design influences over 70% of the total life–cycle cost of a technical system;
– too much is spent too late;
– more focus is needed on the concept and preliminary design phases;

• recognize the need to leverage the power of design earlier, broader and deeper:

– design improvements are marginal if they only address single components and properties, and not the life–cycle processes;
– it is a mistake to focus only on reduction of labor and material costs;
– the major cost battleground is overhead, which must be attacked aggressively;

• multidisciplinary teams are the key to drive the overall design process:

– only multidisciplinary teams can manage the multiple, often conflicting properties of a complex technical system;
– customers have to participate in decision–making from the earliest design stages.

Since Wallace and Hales (1987) documented an actual industrial design process in explicit detail, nothing similar has been published in the open literature. Even though Andreasen (1987) highlighted some improvements and positive changes in designers’ attitude because they had started to rely on the basic methods of design science (for example, using the ‘concurrent engineering’ approach), analysis of designers’ behavior along the design process shows widespread situations of poor training and cognitive limitations. In general, designers:

• develop the functional aspects of the design in stages throughout the design process rather than during an initial functional development phase;

• use functional relationships that are not always quantitative from the very beginning of the design process;

• make decisions based on heuristic reasoning instead of rational decision making;

• use their individual knowledge to influence the generation and evaluation of different solutions, thus avoiding the use of domain–independent procedures;

• select the final design in a purely deterministic way, without due consideration of imprecision and external noise.


2.1 Design Models

Nowadays two major streams of development in design theory can be envisaged, i.e. the pursuit of a rational theory of design and the development and application of computer–based tools. From a theoretical viewpoint, there are many theoretical models for the design process developed competitively by practising designers and researchers, which can be grouped into two basic approaches and a third one derived from their merger:

• descriptive models, which build the design process from observation of the professional practice of designers;

• prescriptive models, which aim to structure a rational design process based on a knowledge–base;

• hybrid models, which combine descriptive and prescriptive models; they promise to be more flexible than the prescriptive models while fixing the sequence of the main design phases.

2.1.1 Descriptive Models

The basic activities of the descriptive models are:

• problem definition: the problem statement is developed to identify the design attributes and constraints according to customer expectations; a useful investigation is conducted to clarify the advantages and disadvantages of the various subsystem options to be studied during the conceptual design phase;

• concept design: the problem statement is used to develop a baseline system meeting customer demands under the given constraints; decisions made during this design phase are strategic and influence the whole product life cycle as regards performance and cost;

• preliminary/contractual design: the previous conceptual baseline is used as the starting point for a detailed design description and drawings that define an integrated engineering system by means of high–tech tools and a knowledge–base;

• functional design: the final design is refined as to its functionality through optimization of total engineering system performance and cost; a specification is provided with supporting drawings and procurement documents to purchase long lead–time equipment;

• detail design: all the details of the final design are specified, and the manufacturing drawings and documentation are prepared.

The descriptive models exemplify the way design is performed sequentially by a design team, but do not indicate how to arrive at the ‘best possible’ solution by applying decision–making techniques. They closely resemble the traditional approach to design, relying on decomposition, evaluation and iteration. These models are usually no more than guidelines because of the predominant influence of the subjective preferences of designers on the process. Their validation is difficult, as most engineers often do not have the required skills.


2.1.2 Prescriptive Models

In contrast to descriptive models, prescriptive models encompass a decision–based design approach, which is founded on a general design methodology containing four basic processes (Jones, 1963):

• problem definition: defining all the requirements for the technical system and reducing these to a complete set of performance specifications; to this end, technical systems are decomposed or partitioned into subsystems to determine their size, properties, etc.; another crucial objective is to develop the orderly design schedules and plans;

• analysis: finding possible solutions for each individual performance specification and integrating subsystems to arrive at alternative design solutions with feasible properties;

• synthesis: selecting the preferred solution among a number of alternative solutions with feasible properties;

• evaluation: assessing the degree to which the preferred solution fulfills the stated requirements.

Prescriptive models of the design process have been developed by Hubka (1982) and Pahl and Beitz (1984) in harmony with the 'systems engineering' approach. Commentaries on the development, implementation, and application of related guidelines have been made by Beitz (1987) and Cross (1989).

De Boer (1989) deems that a basic three–step pattern (diverging, systemizing and converging) can generally be recognized in each phase and subphase of a prescriptive model. The first step in design is usually divergent in nature: the design team generates a large number of candidate alternatives, which are then analyzed and synthesized into forms that may represent feasible solutions. Finally, the refined solutions are further analyzed, synthesized, and evaluated, leading in general to a set of satisficing designs. As the number of acceptable solutions is decreased, the selection activity is characterized by convergence. This pattern is the one that best corresponds to the logic of concept design.

2.1.3 Hybrid Models

There is no doubt that the design of complex technical products will continue to require the individual expertise, perception, and judgement of the members of the design teams. However, the designers have to be organized as a network of experts based upon their technical speciality or particular role. Furthermore, information technology and computer–aided support to decision making should be disseminated across design subteams to facilitate their critical communication and negotiation about the design process. This strategy results in hybrid models of the design process as a combination of prescriptive and descriptive models, supported by cross–functional design subteams.

Although it is considered a means of achieving an efficient and effective concept design (Parsons et al., 1999), the hybrid model approach is also suitable for the preliminary and contractual design phases, when the naval architects, strength and vibration experts, and marine engineers in dedicated teams can be assigned specific tasks. It is the responsibility of each subteam job leader to decide when and how to carry out more accurate computations and evaluations. Nevertheless, conflicts arise when design subteams disagree on the importance of the individual characteristics of their own functional parts with respect to the features of the entire technical system. As a result, trade–offs are often resolved in ways that do not yield the best overall system. Therefore, hybrid models also aim to overcome these problems by addressing three key problems in design management:

• planning: design tasks cannot be sequenced in rigid detail, as happens in the design spiral approach; project management has to be flexible and adaptable to specific customer requirements and the external environment; parallelism is a key concept in design planning;

• integration: decisions that a design subteam makes affect previous decisions made by another design subteam; this permanent need for re–evaluation as a result of changes in the design interfaces can lead to lengthy cycles of iteration and change; moreover, design is highly influenced by the integration of marketing, procurement, production, finance, and human resources;

• ranking: designers do not share a common language or common preferences; hence mathematical tools are necessary for comparing the importance of different attributes and ranking overall scores (a minimal sketch of one such tool is given below).

This fosters a decision–based design approach, allowing the design to proceed concurrently and deferring detailed specifications until trade–offs are more fully understood.
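One family of mathematical tools suited to the ranking problem mentioned above is pairwise comparison of attribute importance, as popularized by the Analytic Hierarchy Process. The short sketch below, in Python and with invented attribute names and judgment values, derives normalized weights from a pairwise comparison matrix using the common geometric–mean approximation; it is intended purely as an illustration of the kind of tool involved, not as part of the hybrid model itself.

# Sketch: deriving attribute weights from pairwise importance judgments
# (geometric-mean approximation used in the Analytic Hierarchy Process).
# Attribute names and judgment values are hypothetical.
import math

attributes = ["speed", "cargo capacity", "building cost"]

# comparison[i][j] = how many times attribute i is judged more important than j
comparison = [
    [1.0, 2.0, 0.5],   # speed vs. (speed, cargo capacity, building cost)
    [0.5, 1.0, 0.25],  # cargo capacity
    [2.0, 4.0, 1.0],   # building cost
]

# Geometric mean of each row, then normalization to obtain the weights.
row_means = [math.prod(row) ** (1.0 / len(row)) for row in comparison]
total = sum(row_means)
weights = [m / total for m in row_means]

for name, w in zip(attributes, weights):
    print(f"{name:15s} weight = {w:.3f}")

For a perfectly consistent judgment matrix such as this one, the geometric–mean approximation coincides with the exact eigenvector weights.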

2.2 Design Approaches

As the design process progresses and decisions accumulate, freedom to make changes is reduced; at the same time, the knowledge about the product under design increases. This increase in knowledge is characterized by a transformation of soft (vague) information into hard (more precise) information. Soft information refers to the heuristic and uncertain information that stems from approximate predictions of product properties, also based on the designer's subjective judgment and experience, whereas hard information is generally based on much more accurate and science–based evaluations. Given this nature of design information, what a rational design approach has to facilitate is knowing more about the design properties early on, that is, a faster increase in the ratio of hard–to–soft information. This relative improvement in the quality of information is expected to lead to equivalent or better designs that are completed in less time and at less cost than those developed using a traditional serial process. Therefore, it is worth discussing how the most relevant design approaches satisfy these expectations.

The design spiral has been and still is the preferred approach to describe the ship design process. It is focused on a series of activities that converge, as efficiently as possible, on a single design solution for a specific project. This approach often involves making decisions based on incomplete information and/or compromise. Thus, it requires significant rework (iterations) to reach an acceptable design and offers no way of knowing whether the result is a good solution other than experience. Figure 2.1 shows how the ship designers move through the design process in a serial sequence of steps, each dealing with a particular analysis or synthesis task.

After all the steps have been completed, the design is unlikely to be balanced, or even feasible. Thus a second cycle begins and all the steps are repeated in the same sequence. Typically, a number of cycles (design iterations) are required to arrive at a satisfactory solution. Anyone who has ever participated in a ship design knows that in this approach the steps often will not be performed in a prescribed, hierarchical order; instead the design team will jump from one spot to another on the spiral, as knowledge is gained and problems are encountered.

Figure 2.1. The design spiral

The design bounding approach is an alternative design process that uses the concept of design space. It considers a number of alternative solutions within ranges of values of the independent variables and parameters, which bound the design space. While it involves performing the design calculations for every design combination, the need for iteration is avoided. This approach exploits the fact that all design team members have access over a network to a single 3D computer model of the ship, which can only be updated with the approval of the design team leader.
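A minimal sketch of the bounding idea is given below, assuming hypothetical ranges for two independent variables and a simple block–coefficient estimate of displacement; every combination inside the bounded design space is evaluated exactly once and then filtered against an arbitrary feasibility band, so no iteration loop appears.

# Sketch of the design bounding idea: evaluate every combination of the
# independent variables inside the bounded design space, then filter.
# Ranges, the displacement estimate and the feasibility band are hypothetical.
import itertools

lengths = [170.0, 180.0, 190.0, 200.0]        # length between perpendiculars [m]
block_coeffs = [0.60, 0.65, 0.70, 0.75]       # block coefficient [-]
B_OVER_L, T_OVER_B, RHO = 0.15, 0.35, 1.025   # assumed ratios and sea-water density

designs = []
for L, cb in itertools.product(lengths, block_coeffs):
    B = B_OVER_L * L                      # beam from an assumed B/L ratio
    T = T_OVER_B * B                      # draught from an assumed T/B ratio
    displacement = RHO * cb * L * B * T   # tonnes, simple block estimate
    designs.append({"L": L, "Cb": cb, "displacement": displacement})

# Keep only the combinations that satisfy a (hypothetical) displacement target.
feasible = [d for d in designs if 35000.0 <= d["displacement"] <= 55000.0]
print(f"{len(feasible)} feasible combinations out of {len(designs)}")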

It is because of the iterative nature of many design approaches that the approach of least commitment should be followed. That is, progressing from step to step in the process, no irreversible decision should be taken until it is necessary. This principle of least commitment provides maximum flexibility in each step and the assurance that more alternatives remain available as long as possible, before selecting the best alternative. The policy of least commitment has been shown to result in more efficient design, primarily due to the reduced requirement for iteration.

The decision–based design approach, credited to Toyota, has been borne out by evidence as the best design methodology, since it provides better designs in shorter time. It is founded on the axiom that the principal role of designers is to make decisions. It encompasses the 'systems engineering' paradigm and embodies the ideas of the 'concurrent engineering' approach for the product lifetime.

The general application of decision–based design is particularly suitable in the early stages of the design process, although the tools developed and employed (multicriterial decision–making techniques) can be useful even in later design phases. The motivation for fostering this paradigm in the early design phases is that it offers the greatest potential to affect the design process, product performance and total cost. The initial stages of design dramatically affect the final outcomes. Indeed, the possibilities for influencing total lifetime product cost are very high at the very beginning of the design process and decrease during the following design stages, process development and manufacturing. Hence, to solve the conflicting issues of economic viability, the technical and economic design properties of an industrial product are to be considered simultaneously from concept design onwards.

Independent of the paradigms or methods used to plan, establish goals and model technical systems, designers are and will continue to be involved in two primary activities, namely, performing simulations and making decisions – two activities that are central to increasing the efficiency and effectiveness of designers themselves and the processes they use.

Before discussing the multicriterial decision–making approach and summarizing some suitable supporting mathematical tools, the evolution of design paradigms is illustrated, including new processes, new management practices and new tools.

2.3 Systems Engineering

Design is a process of synthesis and integration covering many disciplines. Because of the extent of the required knowledge, traditional design is accomplished by dividing the overall product into manageable subsystems and solving for them separately. Therefore, systems engineering1 has been developed to ensure that the isolated specialist solutions are integrated. It focuses on the relationships between the different subsystems and disciplines involved in their design and on the integration of them all.

The technical literature on the subject has expanded in recent decades, even though it traces its origin back to the end of the Second World War, while the earliest books go back to the 1960's. Systems engineering received its impetus from the defense industries of a number of countries. The US Navy defined systems engineering as 'the application of science and art to integrate the interdependent aspects of a ship design to form an optimum whole which meets operational requirements within technical and programmatic constraints'.

1The term ‘system’ stems from the Greek word ‘systema’, which means ‘organized whole’.


Systems engineering developed for two reasons. The first is that engineers had become so specialized (Taylorism) that someone needed to take responsibility for the overall technical system. As to ships, the naval architects have always had this responsibility and still maintain it in many shipbuilding countries, even if in the United States they allowed this responsibility to be taken away from them decades ago. The second reason is that some industrial products (aircraft, ships, cars, and so forth) had become so complex that a rational way to manage the design became essential.

Increased system complexity placed greater emphasis on the definition of requirements for individual system elements as well as on the definition of the interfaces between subsystems. Increased system size and complexity forced an expansion of the engineering workforce required to develop the system as well as increased specialization within that workforce. Collectively, these trends inevitably forced the managers of complex systems to expand and formalize their development procedures and processes under the systems engineering umbrella.

2.3.1 Goals

Systems engineering is a formal process for the design of complex systems to meet technical performance and achievable objectives within cost and schedule constraints. It concerns the engineering processes and techniques aimed at transforming a set of operational requirements into a defined system configuration through a top–down iterative process of requirements analysis, functional analysis, design synthesis, system documentation, manufacturing, trials and validation. Therefore, it overlaps and interfaces with both design and project management. Systems engineering does not comprise the design process itself, but is only its organization and management.

In recent years some proponents of systems engineering have pushed for its use almost as if it were a design approach. Nevertheless, while overall design has always considered both the design of individual components and their integration, systems engineering does not encompass the design process. In other terms, while design is a decision–making process characterized by the selection of the design variables and parameters as well as of the preferred product, systems engineering is only the organization and management of the design process. Systems engineering cannot be considered intrinsically an engineering discipline in the same way as civil engineering, ship engineering, mechanical engineering, and other design specialty areas.

The systems engineering process involves both technical and management aspects. Its principal objective is to achieve the optimum balance of all system elements so as to optimize overall system effectiveness, albeit at the expense of subsystem optimization. According to the 'International Council on Systems Engineering', this methodology focuses on defining customer needs and required functionality early in the development cycle, documenting requirements, and then proceeding with design and system validation. It integrates technical disciplines and ensures the compatibility of all physical, functional, and program interfaces. These disciplines include reliability, maintainability, supportability, safety, manning, human factors, survivability, test engineering and production engineering. During technical product development, the systems engineering process gives great weight to customer needs, characterizing and managing technical risk, technology transfer from the scientific community into application development, system test and evaluation, system production, and life–cycle support considerations.

The objectives of the systems engineering process are the following:

• ensure that the product design reflects the requirements for all system elements: hardware, software, personnel, facilities, and procedural data;

• integrate the technical efforts of the design specialists to produce a balanced design;

• provide a framework for production engineering analysis and production/manufacturing documentation;

• ensure that life–cycle cost is fully considered in all phases of the design process.

2.3.2 Process Description

The systems engineering methodology is a collection of processes that surround and enhance the fundamental process by complementing or focusing on particular aspects of it. The fundamental process is iterative and increases in detail in each phase of the system development. It is followed at the total system level by those with overall responsibility for system integration while, at the same time, it is being followed by the developers of individual subsystems and components. The principal steps in the process are shown in Figure 2.2 and briefly discussed in the following.

Figure 2.2. The systems engineering process


Initial Requirements. Initial requirements are contained in an 'initial draft system requirements document'. They consist of objectives, constraints, and relevant measures of effectiveness for the technical system. Generally they come from the customer.

Functional Analysis. Functional analysis is a method for analyzing the initial top–level requirements for the technical system and dividing them into discrete tasks, which define the essential functions that the technical system must perform. It consists of two activities: identification of system functions and allocation of system requirements. It is performed in parallel with the second step in the fundamental process, design synthesis, since there must be interactions between the two activities. Functional analysis starts with the identification of the top–level system functions and then progressively allocates the functions to lower levels in the system. Each function is described in terms of inputs, outputs, and interface requirements. Functional flow block diagrams are used to depict the serial relationship of all the functions to be performed at one level, that is, the time–phased sequence of the functional events. For some time–critical functions, time line analysis is used to support the functional analysis and design requirements development.

Requirements Allocation. Requirements allocation proceeds after the system functions have been identified in sufficient detail and candidate design concepts have been synthesized. It defines the performance requirements for each functional block and allocates the functional performance requirements to individual system elements. The performance requirements are stated in terms of design constraints and requirements for aspects such as reliability, safety, operability, and maintainability. Requirements allocation decomposes the system level requirements to the point where a specific hardware item, software routine, or trained crew member will fulfill the needed functional/performance requirements. The end result of requirements allocation is the technical system specification. Design constraints such as dimensions, weight, and electric power are defined and documented in the 'requirements allocation sheet', along with all functional and technical interface requirements. The personnel requirements for all tasks are defined. Some performance requirements or design constraints can be allocated to lower levels of the system.
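The two activities just described lend themselves to a simple data representation. The sketch below is one possible, purely hypothetical, way of recording a functional breakdown together with the requirements allocated to each function block; the ship functions and allocated values are invented for the example and are not taken from any actual requirements allocation sheet.

# Sketch: representing a functional breakdown with allocated requirements.
# Function names, inputs/outputs and allocated values are hypothetical.
from dataclasses import dataclass, field
from typing import List, Dict

@dataclass
class FunctionBlock:
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    requirements: Dict[str, str] = field(default_factory=dict)  # allocation sheet entries
    children: List["FunctionBlock"] = field(default_factory=list)

    def show(self, level: int = 0) -> None:
        indent = "  " * level
        print(f"{indent}{self.name}  req={self.requirements}")
        for child in self.children:
            child.show(level + 1)

# Top-level function with two lower-level functions and allocated requirements.
propel = FunctionBlock(
    name="Provide propulsion",
    inputs=["fuel", "commands"],
    outputs=["thrust"],
    requirements={"service speed": ">= 21 kn"},
    children=[
        FunctionBlock("Generate power", ["fuel"], ["shaft power"],
                      {"installed power": "<= 25 MW"}),
        FunctionBlock("Convert power to thrust", ["shaft power"], ["thrust"],
                      {"propulsive efficiency": ">= 0.68"}),
    ],
)
propel.show()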

Design Synthesis. Design synthesis provides the engineers' response to the requirements outputs of functional analysis. Its goal is the creation of a design concept that best meets the stated system requirements. Inputs from all engineering specialties that significantly affect the outcome are utilized. Several design solutions are typically synthesized and assessed. Two tools are used to document the resulting candidate design solutions, that is, the overall configuration, internal arrangement of system elements, and principal attributes: the 'schematic block diagrams' and the 'concept description sheet'. As the concepts that survive the screening process are developed further, 'schematic block diagrams' are developed in greater detail and are used to develop 'interface control documents'. For attractive design concepts, physical and numerical models are developed later in the synthesis process. The concept description sheet is the initial version of the 'concept design report', a technical report that documents the completed concept design. This report includes drawings and technical data such as weights, material element lists, etc. The results of system analysis for the concept, described in the following, are also typically included in the report.


System Analysis. Once a design concept has been synthesized, its overall performance, costs and risks are analyzed. As the design development proceeds, the number of attributes and the level of detail of the analysis increase. Early phase analysis typically consists of quick assessments using empirical data based on past designs and reflects many simplifying assumptions. In the later stages of the design process, much more sophisticated modelling and simulation is done, coupled with physical model tests in some cases. The aspects of performance with major effects on mission effectiveness are identified and analyzed individually. Development, production and operation costs are typically analyzed for each option being considered. Risk is assessed using standard procedures.

Evaluation and Decision. Trade–off studies are an essential part of the systems engineering process. Once several feasible design concepts have been generated, a selection process must be activated by means of a standard methodology in which seven steps can be envisaged (a minimal numerical sketch of steps 4 to 6 is given after the list), namely:

1. Define the goals and requirements to be met by the candidate designs (functional analysis).

2. Identify the design candidates and discard the unfeasible solutions (design synthesis).

3. Formulate selection criteria (attributes) and, if possible, define threshold and goal values for each one (minimum acceptable and desired values, respectively).

4. Weight the attributes. Assign numerical weights to each attribute according to its perceived contribution to overall performance. Mathematical techniques can be used to translate the subjective preferences into weights.

5. Formulate utility functions, which translate diverse attributes to a common scale, for example, comparing speed vs. endurance, or cargo capacity vs. on–off–load times for a ro–ro ship.

6. Evaluate the alternatives. Estimate overall performance and other required attributes such as risk (system analysis). Then score the overall technical capability with respect to cost. Calculate the cost/capability ratio for each alternative.

7. Perform sensitivity analysis. Assess the sensitivity of the resulting overall score to changes in attributes, weights, and utility functions. This enables a more informed judgment to be made as to whether one alternative is clearly preferred over the others.
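A minimal numerical sketch of steps 4 to 6 follows. The two candidate designs, attribute values, threshold/goal ranges, weights and linear utility functions are invented for illustration; an actual trade–off study would rest on validated system analysis data and agreed utility curves.

# Sketch of steps 4-6 of the trade-off methodology: weights, linear utility
# functions, weighted overall capability and cost/capability ratio.
# Candidate data, weights and utility ranges are hypothetical.

weights = {"speed": 0.4, "endurance": 0.35, "cargo": 0.25}        # step 4
utility_ranges = {"speed": (18.0, 24.0),        # threshold and goal values (step 3)
                  "endurance": (4000.0, 8000.0),
                  "cargo": (900.0, 1500.0)}

candidates = {
    "design A": {"speed": 22.0, "endurance": 6000.0, "cargo": 1100.0, "cost": 155.0},
    "design B": {"speed": 20.0, "endurance": 7500.0, "cargo": 1400.0, "cost": 140.0},
}

def utility(attr: str, value: float) -> float:
    """Linear utility between the threshold (0) and the goal (1) - step 5."""
    lo, hi = utility_ranges[attr]
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

for name, data in candidates.items():                             # step 6
    capability = sum(w * utility(a, data[a]) for a, w in weights.items())
    print(f"{name}: capability = {capability:.3f}, "
          f"cost/capability = {data['cost'] / capability:.1f}")

Repeating the final loop with perturbed weights or utility ranges provides the sensitivity check of step 7.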

System Documentation. The system design must be documented as it evolves. Traditionally, this has been done on paper by means of documents such as specifications, drawings, technical reports, and tables of data. Today, it is increasingly done using integrated design systems that produce the desired documentation in electronic form. In the future, 'smart product models' will contain all the necessary design documentation.

2.3.3 Synergy with Information Technology

In the 'information age', systems engineering and computer technology operate in an industrial world requiring synergistic activities. Systems engineering may allow designers to find solutions to design problems by relying on databases and decision–making support systems available on computers. When the information technology (IT) of an industrial company is adapted to the principles of systems engineering, information may be provided to designers in a synergistic way, almost instantly, in a quantity and quality not previously considered possible.

Designers are still involved primarily with the unstructured or partially structured parts of problems. Nevertheless, project managers and designers alike are discovering that the simple and dichotomous subdivision that separates, say, mechanical engineering, electrical engineering and naval architecture is more a historical tradition than a technical necessity. It may be a traditional and convenient means of structuring administrative entities and budgets, but it is dramatically inefficient for organizing design teams.

The combination of systems engineering and information technology makes it possible to plan and control the manufacturing process as a whole. A corollary effect of the advent of systems engineering is the blurring of the lines that have separated the traditional disciplines in both academic institutions and industry.

On the other hand, in the decades since computers became the universal tool of engineers and scientists, dramatic changes have been observed in the computers themselves and in the way of using them, which have paved the way to new related fields of research in science and technology. Designers are on the eve of being able to use computers not just as fast and accurate devices, but as partners in the design process.

2.3.4 Critique of Systems Engineering

An area of potential confusion in discussing systems engineering is its scope as a discipline in relation to activities such as the design process, project management, systems management, design management, engineering design, etc. Many engineering design texts (Erichsen, 1989; Sage, 1992) present many confusing overlaps among design, design management and systems engineering, even though they emphasize the creative challenges and techniques of the design synthesis task in the initial design phases.

The differentiation between design and project management is ambiguous. Again there are significant overlaps, and many of the techniques of systems engineering are also claimed in the project management literature. Some authors state that systems engineering provides the creative heart of project management by defining the technical and work deliverables, i.e. the requirements, the design development and all the tasks necessary to build and test the industrial product. Under this view project management becomes the set of activities associated with implementing and controlling the 'product and process blueprint' which systems engineering has provided. Thus contract management, scheduling techniques, cost control, quality assurance, etc., are activities which have no meaning without the foundation of systems engineering. An alternative, but related, view is not to see project management and systems engineering as separate disciplines, but to see project management simply as the larger canvas which must include systems engineering.

So far, the idea that systems engineering may cope with the overall design of complex products has been criticized by many experts. This paradigm is no more than a framework for thinking about the engineering process, which needs tailoring to be applicable to a particular product and project (Van Griethuysen, 2000). It is evident that many industrial products have always been designed and produced using some kind of systems engineering. It is also true that much of the naval architecture and marine engineering concerned with design management is an example of systems engineering. Thus, it is not so much a question of whether systems engineering can cope with the unstructured design of industrial products, but more a question of whether in its current 'born–again' form it has anything to offer beyond the current understanding of engineering management.

Although existing systems engineering texts – often with a bias towards software/computer systems – appear not to offer anything new for the overall system design process of industrial products, there are techniques and insights to be learnt in the area of process/information systems. As engineering products become more influenced by software systems, these methods need to be added to the design management 'toolkit'.

However, the relevance of systems engineering should be advocated with due caution because ofthe following reasons:

• The current language of systems engineering has to some extent been abused by engineering communities working in particular industrial sectors. What is presented as 'general' is in fact often 'partial', especially with respect to 'systems design'. It would undoubtedly be helpful to its wider acceptance if systems engineering publications and courses used more significant examples from a wider product base, and paid due attention to the physical aspects of the design of complex products.

• Systems engineering can be harmful if procedures are applied across products in an inappropriate or disproportionate manner; see, for example, the over–elaboration of requirements in computer databases under the banner of 'requirements engineering', without progressive design modelling to establish feasibility in terms of cost and operability.

• Whilst the concept of systems engineering, as a set of knowledge, methods and techniques which can be applied to different product areas, is a valuable one, the further step of defining systems engineering as an independent branch, or even an overall design paradigm, is highly questionable.

2.4 Concurrent Engineering

Today designers are compelled more than ever to develop strategies that allow a reduction of design time without loss of quality. According to Kusiak (1993), an appropriate paradigm for this comprehensive perspective is concurrent engineering (CE), a term that originated with the US Defense Department. At its outset, CE was the concurrent design of a technical system and its manufacturing processes. The main objective of concurrent engineering is to shorten the time from order to delivery of a new industrial product at lowest cost and highest quality. Figure 2.3 schematically shows the CE approach, where all main activities are carried out through parallelism and bi–directional integration.


Figure 2.3. Concurrent product development process

The concept first gained considerable attention during the 1980's, when the United States automotive industry realized that it needed to shorten the time for developing and marketing new models in competition with the Japanese industry. Concurrent engineering has been widely accepted as an effective engineering practice for decreasing product development time, improving quality, and decreasing manufacturing costs. Since it aims to consider all elements of the product life cycle from the outset, the CE approach increases the complexity of the design process.

In the past there has been widespread emphasis on work specialization, and the result has often been a stovepipe organizational structure with inadequate communication and transfer of information. To counter this trend, concurrent engineering aims to totally integrate the development of product and process design using cross–functional, empowered teams. The essential tenets of concurrent engineering are customer focus, life–cycle emphasis, and the acceptance of design commitment by all team members.

Concurrent engineering, like systems engineering, is more a matter of approach and philosophy than an engineering discipline. It represents a common sense approach to industrial product development in which all elements of the product's life cycle from conception to delivery are integrated in a single continuous feedback–driven design process. There are several other terms for concurrent engineering, among others 'simultaneous engineering', 'unified life–cycle engineering', 'producibility engineering', 'concurrent design', 'co–operative product development', etc.

A generally accepted definition of CE was prepared by Winner et al. (1988): Concurrent engineering is a systematic approach to the integrated design of industrial products and their related processes, including manufacturing and support. This approach is intended to cause developers, from the outset, to consider all elements of the product from conception through disposal, including quality, costs, schedule, and user requirements. This definition may be regarded as operationally oriented, implementing concurrent engineering at a low level.

Alternatively, Shina (1991) stated that “... concurrent engineering is the earliest possible integration of the overall company's knowledge, resources, and experiences in design, development, marketing, manufacturing and sales into creating successful new products, with high quality and low cost, while meeting customer expectations. The most important result of applying concurrent engineering is the shortening of the product design and development process from a serial to a parallel one.”


This definition applies more to an overall design strategy at company level. The different perspectives of the two definitions complement each other and state that concurrent engineering must be applied both to the company's overall product development and design strategy and at the operational level. The most practical definition of CE is probably: concurrent engineering is systems engineering performed by cross–functional teams.

By considering the different definitions and semantics it is possible to designate some characteristics of concurrent engineering; the most significant may be: design, integration, parallelism, product, and process. Implicit in these characteristics is that organization, communication, and requirements must be managed. Compared to traditional engineering design, in which analysis of the product plays the central role, the synthesis of the process is the dominant feature in concurrent (parallel and integrated) engineering.

Concurrent engineering is not new; as a concept it has now been around for over two decades, if one starts counting from the publication of Pennel and Winner's (1989) definition. Many of its techniques and tools have been around worldwide for a long time, but this approach packaged them into an integrated philosophy. Its implementation, therefore, goes to the very structure of an organization and its management philosophy. Implementation of concurrent engineering requires moving from:

• department focus to customer focus;

• individual interests to team interests;

• dictated decisions to consensus decisions.

Experience has shown that concurrent engineering cannot be implemented gradually and gracefully; an all–or–nothing approach is required. Such changes are clearly difficult to implement and require the expenditure of time and money, but they bring potential long–term benefits. Perhaps an even greater challenge is changing the culture of the organization. Managers and workers at all levels may be fearful of giving up some individual authority, but they must recognize that change is necessary in order to remain competitive in a global economy.

2.4.1 Basic principles and benefits

The basic principles of concurrent engineering require process orientation, a team approach, empowerment, open communication, and customer satisfaction. Concurrent engineering is characterized by a focus on the customer's requirements and priorities, a conviction that quality is the result of improving a process, and a philosophy that improvement of the processes of design, production and support is a never–ending responsibility of the entire company.

In concurrent engineering the design problem is approached by defining a multi–disciplinary design and focusing on aspects such as the functional requirements, production, quality assurance and economic efficiency of the engineering product. Generally, the term concurrent engineering is connected with consideration of how the product will be manufactured, but it may also be used to describe the consideration of economy in its overall development.


As some analysts state, concurrent engineering has shifted companies from a manufacturing environment to a design environment. It changed the way designers work, as they are compelled to interact with greater numbers of people and gain knowledge from other disciplines and organizations. Throughout all of these changes, the designers have to be the key actors in the concurrent engineering process and the design process has to drive the overall manufacturing process.

The primary benefit of concurrent engineering is improved design quality and production productivity. This can lead to increased market share achieved by:

• understanding the customer requirements and the cost implied;

• appraising one’s own products with respect to those of the competitors;

• minimizing the time and cost from concept design through production and delivery.

A design team that employs concurrent engineering principles has to include experts in requirements analysis, cost analysis (acquisition and operation), production engineering, the 'ilities' (reliability, maintainability, availability), material procurement, tests and trials, and marketing.

A basic premise is that the design team has many customers. As to ships, these include the shipbuilder's and shipowner's staffs. Experts on crew training and logistics are also customers, particularly if the design includes new technologies. These different groups view the ship design from different perspectives, have different goals and objectives, and bring different experiences and expertise to the design team. Hence, early involvement of all these different customers will produce a better product. Expressions such as 'integrated product teams' and 'integrated product and process development' are now widely discussed. Coupling process and product is also worthy of note, since it recognizes that if an enterprise hopes to improve the product, it must first examine and improve the processes used to design and build it.

In general, the expected benefits of the concurrent engineering approach are (Winner et al., 1988):

• improvement in the quality of designs, which may result in dramatic reductions of engineering change orders (greater than 50%);

• reduction of product development time by 40–60% with respect to serial design processes;

• reduction of manufacturing costs by 30–40% when multidisciplinary design teams integrate product and process designs;

• reduction of scrap and rework costs by 75% through product and process design optimization.

Although concurrent engineering can be implemented in many ways, its basic elements are:

• reliance on multidisciplinary teams to integrate the design, manufacturing and support processes of a product;

• use of computer–aided design, engineering and manufacturing methods to support design integration through shared process models and databases;

• use of a variety of analytical, numerical, and experimental methods to optimize a product's design, manufacturing and support processes.


Of course, such a strategy is simple to state on paper; it is quite another matter to implement it in practice, especially when one recognizes the increasing complexity of modern products and the use of geographically distributed and multidisciplinary teams. This situation demands the use of information technology to assist in the control of the concurrent processes and to ensure that a common database of information can be shared by all those involved in the product development process.

The application of concurrent engineering has several implications for ship designers. In the past, ship designs were generally developed by a stove–piped design organization without the direct, early participation of the shipbuilder, shipowner, operators and maintainers. Nor were specialists in unique but important disciplines such as manning, cost, safety, reliability, and risk analyses involved from the outset. When these and other groups did get involved, after the design was largely complete, it was generally in a review–and–comment mode. By this time, changes would be difficult to incorporate without extra costs.

A customer's representative should be a design team member. The basic premise of concurrent engineering is that it is better to make design decisions (at all levels) based on real–time feedback from all who have an interest in designing, producing, marketing, operating, and servicing the final product.

2.4.2 Information Flow

In CE the early design stages are especially significant because major design decisions are made there, with far–reaching effects on the engineering product being designed. Portions of a serial versus a concurrent engineering process of design are illustrated in Figure 2.4.

Figure 2.4. A comparison of serial and concurrent engineering


To achieve high–quality designs, the information flow in concurrent engineering between design engineering, manufacturing, marketing, and others has to be transferred early, and decisions are to be based on both upstream and downstream considerations (bi–directional). By contrast, in a serial approach information flows in one direction only.

It has been recognized that, in an engineering design, most of the changes occurring in early design stages will lead to a high–quality design with significantly reduced cycle time (Sullivan, 1986). On the contrary, if most of the changes happen in late design stages, e.g. re–design, the cost of making changes will dramatically increase, since design freedom is highly limited in these stages. Figure 2.5 compares the traditional serial design approach and the concurrent engineering design approach with respect to a design time line (Kirby and Mavris, 2001). From this figure, one can see that the cost of design changes increases during engineering design and increases exponentially when the changes are necessary during manufacturing.

Figure 2.5. Serial approach vs. concurrent engineering approach

Therefore, as many changes as possible should be completed early in the design time line. To prevent costly re–designs, as much knowledge as possible should be made available at the early stage of a design and the requisite changes should be accomplished before the cost is locked in. This paradigm shift of bringing knowledge to the early design stages to increase design freedom and reduce cost is illustrated in Figure 2.6.

It is absolutely evident that as the design process progresses and decisions are made, freedom to make changes is reduced and knowledge about the object of design increases. A product of, and a clear motivation for, concurrent engineering is to anticipate the knowledge curve, thereby increasing the ratio of hard–to–soft information that is available in the early design phases. This relative improvement in the quality of information should lead to products that are completed in less time and at less cost than those designed using a traditional serial process.


Figure 2.6. Cost–knowledge–freedom relations

Therefore, as briefly stated before, more and more attention is paid to the conceptual and preliminary design stages to increase the probability of choosing a design that will be successful. The decisions made during these design stages, including identifying the customer's requirements, determining the attributes of interest, and selecting analysis tools (design mathematical models), play a critical role in the design process. They are the guidance and basis that subsequent design decisions rely upon, and have an important impact on the final design solution. Therefore, these decisions in the early design stage need to be made rationally based upon decision–based design.

2.4.3 Concurrent Engineering Environment

To keep the different development processes in balance, a dynamic concurrent engineering environment must be introduced. It ensures that the different conditions for concurrent engineering are arranged, systematized and controlled.

Managing Sources of Change

The main reason to introduce a concurrent engineering environment is to manage change in organization and product development. Baker and Carter (1992) outline five sources of change, namely:

• Technology. Both the technology in a product and the technology to produce it become more complex. New technology is a source of change, and it is important to have a plan for introducing new technology and managing the changes allowed by the technology.


• Tools. The tools to design and produce a technical system change over time with the technology. The sources of change in using advanced tools may be the degree of automation in production, the integration of product development processes, and flexibility in the production process.

• Tasks. The variation and complexity of the tasks are sources of change. If the tasks differ from time to time, the task itself becomes a source of change.

• Talent. Each individual worker/technician/engineer may have a special talent for new ideas, which may be a source of change. In addition, the degree of change is also influenced by the individual's ability to manage external or organizational changes.

• Time. The time to product delivery is important to stay in the market. Therefore, it is necessary to search for improvements which contribute to reduced overall development and manufacturing time.

The outlined sources of change are dependent on each other; for example, decisions about technologies impact the choice of design and development tools. Moreover, some aspects may be internally managed by the company while others are influenced by the company's external interactions.

To yield the desired changes, the sources of change are translated into resources by four interconnected activities:

• Organization. This includes both the organizational boundaries, such as disciplines and departments, and physical location; typical for concurrent engineering is the design team. The organization may, therefore, be divided into managers and product design teams. The managers must establish the overall strategy and are responsible for providing a concurrent engineering environment. They must establish the product development teams by giving them the authority and responsibility to make decisions. In addition, they must provide team training and support the teams' professional and technical needs.

• Communication Network. The main purpose of the communication network is sharing information. It transfers to the members involved in the design process the overall information related to a product. In large projects with many people or different co–operating companies, establishing one development team may not be possible. In such projects internet communication technologies are required for effective information exchange and sharing. However, the most effective way of communicating may still be person–to–person.

• Requirements. The customer requirements are the overall target of an industrial product. The most important things to specify in the concept design phase are the required properties and the constraints of the product. The quality function deployment method described in the following identifies the requirements and solutions in concurrent engineering.

• Integrated Product Development. This activity links all the tasks in the development process together, including considerations about support, operation and maintenance throughout the lifetime. All tasks are executed in parallel and across disciplines.


Targets

Concurrent engineering may benefit the customer in two ways. According to Blankenburg (1994), the advantages lie either in the process, which means that the customer gets the product faster and cheaper, or in the product, which means that the customer gets a better product. Therefore, process and product are the targets of the concurrent engineering process.

Process. The process includes the procedures, methods, techniques, etc., to design and produce a product. Most of the literature and definitions of concurrent engineering focus on processes, and this indicates a belief that improved productivity and quality in the processes also result in improved quality of the products.

Product. The outcome of the product development process is the technical system. In addition, the outcome may also consist of other services which secure support over its lifetime (user instructions, refitting, etc.). The main goal of the product is to satisfy the quality and functionality required by the customers.

Mechanisms

Although a considerable number of studies have been devoted to design decomposition as a means of reducing the complexity of a large–scale design problem, only recently has due attention been given to the computational framework for dynamic and systematic design integration in a computer network–oriented design environment. On the basis of the integrated product development model, Blankenburg (1994) introduces three mechanisms of concurrent engineering: integration, prescience and parallelism. These mechanisms are necessary to accomplish the activities in the product development process according to the concurrent engineering concepts.

Integration. It is important to secure all available, relevant information and knowledge about a product during product design and manufacturing, and to ensure they are taken into consideration. No single discipline or department alone has the information or knowledge necessary to consider all the elements influencing a product during its lifetime. Therefore, the knowledge and information from different disciplines and departments must be integrated. With regard to the integrated product development model, a vertical integration between market, product and production ensures that the information and knowledge from the different departments are taken into consideration. To consider elements from the different phases of the product development process, a horizontal integration is necessary. Horizontal integration includes in the early design phases considerations from late phases, such as manufacturing and operation. This may be done by including people from late phases, for example manufacturing and operation, in the concept design phase. The advantage of integration is that all the special information and knowledge that usually belongs to a particular discipline, department or phase is shared and taken into consideration before a decision is made. This ensures that decisions are made in co-operation and across disciplines and organizational boundaries.

Prescience. Prescience aims to search for and identify forthcoming activities of high uncertainty, as well as to execute parts of these activities in search of information that reduces the uncertainty in preceding activities. Prescience ensures short feedback times and iterations instead of long iterations from late to early activities.

Parallelism. A way to shorten the time of the product development process is to execute activities in parallel, i.e. at the same time, independent of the function or the phase to which they belong. However, the extent and contents of the different activities influence the dependencies and impose some restrictions on parallelism. The restrictions may be divided into three groups (a minimal scheduling sketch follows the list):

• resource dependencies, which are restricted by the resources available, e.g. the quantity and quality of people, hardware and software;

• precedence dependencies, which are caused by natural limitations;

• information dependencies, which occur when one activity depends on the information output of another activity that is still to be executed.
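To make the effect of precedence and information dependencies more concrete, the sketch below groups a set of invented development activities into successive 'waves'; all activities in a wave could, in principle, run in parallel because their predecessors have already been completed. It is only an illustration of the scheduling restriction, not a prescribed planning tool.

# Sketch: grouping activities into parallel execution waves subject to
# precedence/information dependencies. Activities and dependencies are hypothetical.
dependencies = {
    "hull form definition": [],
    "resistance estimate": ["hull form definition"],
    "general arrangement": ["hull form definition"],
    "propulsion sizing": ["resistance estimate"],
    "weight estimate": ["general arrangement", "propulsion sizing"],
}

completed = set()
wave = 1
while len(completed) < len(dependencies):
    # Activities whose dependencies are all completed can run in parallel now.
    ready = [a for a, deps in dependencies.items()
             if a not in completed and all(d in completed for d in deps)]
    if not ready:
        raise ValueError("circular dependency detected")
    print(f"wave {wave}: {', '.join(ready)}")
    completed.update(ready)
    wave += 1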

Concurrent Engineering Matrix

The targets (product, process) and mechanisms (integration, prescience, parallelism) of concurrent engineering influence each other according to the arrangement of a two–dimensional matrix, as shown in Figure 2.7. The matrix shows that combining integration and prescience increases the quality and minimizes the uncertainty of the product and of the manufacturing processes. Further, a combination of prescience and parallelism reduces the lead–time and controls uncertainty. These combinations distinguish concurrent engineering from the traditional, serial approach to product development and design.

Figure 2.7. The concurrent engineering matrix

The challenge of a concurrent engineering environment may be summarized as a combination of mechanisms (integration, prescience, parallelism) which concurrently advance the targets (product and processes), all supported and arranged by the five sources of change and the four main activities of concurrent engineering.


2.5 Quality Function Deployment

Engineering systems have become increasingly complex to design and build, while the demand for quality and effective development at lower cost and in shorter time continues. Today, many companies are facing rapid change, stimulated by technological innovation and new customer demands. They realize that if they can deliver new products earlier than their competitors, they have a good chance of obtaining a major advantage in the market. Thus, designers are attempting to shorten the duration of new product development through the use of concurrent engineering concepts and good time estimation techniques.

However, many new industrial products with short development times are not successful. This is mostly because the design teams did not focus on actual customer demands and expectations from the very first design phases. To prevent unsuccessful products, quality should be addressed well before the functional and embodiment design phases to avoid developing products with low customer satisfaction. Quality is the measure of how well a product satisfies a customer at a reasonable cost during its lifetime (Priest and Sanchez, 2001).

There are different groups of properties used to determine quality. Setting quality concerns three partial areas during the comprehensive development of a technical system:

• quality of design, which has the largest influence on the overall quality; it can be ensuredthrough a methodical and transparent design process, beginning with the definition of needsand extending through requirement analysis, design synthesis, design evaluation, functionalanalysis and system validation; this quality absolutely demands an objective set of technicalknowledge (design science) as well as special techniques to put quality under risk control(robustness analysis);

• quality of manufacturing, which is measured on the produced components of the technical system; it is known as quality of conformity to the manufacturing specifications, i.e. to the detail drawings;

• quality of application, which appears only when the technical system is employed; this includes also the secondary processes, such as maintenance, repair, refit, upgrading, etc.

Knowledge systems have been developed for each of these three partial areas. The relevant standards for recognition and verification of quality assurance schemes are ISO 9000, ISO 9001, and ISO 9002.

Companies employ different design strategies to suit quality assurance. Well–established large companies are generally more likely to adopt low–risk strategies because of the large losses that could accrue from the failure of a product in the market place. At the same time, these companies are very aware of the need to ensure customer satisfaction via product quality and to compete in global markets with less and less time available for product development. The challenges are significant and have led to the development of complete methodologies aimed at ensuring that design teams produce customer–driven designs that are 'right first time' and delivered very quickly.


One such methodology is known as quality function deployment (QFD), which stands for:

• quality: meeting customer requirements;

• function: focusing attention on what must be done;

• deployment: indicating who will do what and when, perhaps even how.

Quality function deployment is a method employed to facilitate the translation of a prioritized set of customer demands into a set of project targets by means of applied statistics. In particular, it facilitates identification of a set of system–level requirements during conceptual design. It is also applicable to all project phases during product, parts, process and production planning.

The QFD–method was originally developed and implemented at the Kobe Shipyards of Mitsubishi Heavy Industries in the late sixties to support the design process for large ships. During the 1970's, Toyota and its suppliers further developed QFD in order to address design problems associated with automobile manufacturing. Toyota was able to reduce start–up and pre–production costs by 60% from 1977 to 1984 through the use of QFD. During the 1980's, many US–based companies began employing QFD. It is believed that there are now over 150 major corporations using QFD in the United States, including Motorola, Compaq, Hewlett–Packard, Xerox, AT&T, NASA, Eastman Kodak, Goodyear, Ford, General Motors, and the housing industry.

The philosophy behind QFD is that product design must reflect customer requirements from the start of the project and of product development, and that it requires multidisciplinary coordination. To involve all related actors, such as marketing experts, engineers and manufacturers, it is usual to establish a project team to carry out the QFD–analysis. The aim of the project team is to integrate all necessary aspects of product development. In addition, by exposing interdisciplinary and functional relationships, the QFD method allows weighting of criticality and stimulates team work.

The QFD process is relatively simple to outline on paper but requires significant commitment to achieve in practice. It aims to identify and record customer requirements and then translate these into design requirements and product component characteristics. Basically, the translation involves restating the often vague customer requirements as specific design targets and engineering characteristics. As a consequence, it requires the identification of operating requirements and manufacturing procedures that ensure that the customer viewpoint is maintained throughout the design, manufacturing and test process. If successfully applied, the result should be a deeper understanding of customer needs coupled with better organized and more efficient projects. Additionally, there should be a smoother introduction to production with fewer design changes and, of course, an enhanced quality accordingly.

Quality function deployment has also been defined as a system for designing a product based on customer demands and involving all members of the producer or supplier organization. It is a planning and decision–making tool. The method is based on a matrix transformation (Fig. 2.8) which is shaped like a house and is, therefore, also referred to as the house of quality (HoQ). The process involves constructing one or more matrices or quality tables (Cohen, 1995).


House of Quality

Figure 2.8 shows the principles of the house of quality, which form the map of the quality function deployment analysis. The left part of the HoQ contains the whats, that is the attributes identified to better describe customer demands. The top part of the HoQ identifies the designer's technical response relative to the characteristics (attributes) that must be incorporated into the design process to respond to the customer requirements. This constitutes the how, a set of design characteristics (technical measures of performance). The inter–relationships among attributes are identified (correlation matrix). The center part of the HoQ conveys the strength or impact of the proposed technical response on the identified requirement (relationship matrix).

Figure 2.8. House of quality

The structure of a house of quality (HoQ) depends on the objective, development stage, and scope of the QFD project, and thus different HoQ charts have different components. However, there is a set of standard components that includes the following:

• marketing and technical benchmarking of data from customer and technical competitive analysis;

• customer requirements (attributes) and their relative importance;

• design characteristics (product specifications);

• relationship matrix between customer requirements and design attributes;

• correlation matrix among design attributes;

• computed absolute/relative importance ratings of design attributes.


Mapping the house

There are no general rules to follow in deriving the house of quality, and the design team must customize its own house according to the needs and purpose of the analysis. Some general guidelines are outlined in the following.

Step 1. The analysis starts by identifying the customer's requirements, such as wants and needs, likes and dislikes, termed the whats, and bringing out the so–called customer attributes (CA). The customer is defined as any user of the design (shipowner, ship operator, shipbuilder, etc.). The needs and desires of these customers are identified, based on consensus, including a prioritization of relative importance, which is a weighting of the benefits the customer expects to obtain by fulfilment of the CAs. A benchmarking of the product's competitiveness with regard to the competitors may also be assessed.

Step 2. The next step makes a list and description of the hows, also called design characteristics (DC), which affect one or more CAs. The DCs must be measurable and controllable by the project team and should ensure that the CAs are satisfied. There must be at least one how for each what, and there may be more. Also, each how will typically influence more than one what. The relationships between the CAs and the DCs are marked in the relationship matrix. If a DC has no relationship to any CA, it satisfies no customer needs and may be discarded.

Step 3. The hows and whats are then correlated by means of the what–how relationship matrix, which is the core matrix of the QFD and links the CAs and DCs. Once the CAs and DCs are linked, objective measures of customer satisfaction should be added, preferably stated in units of measurement. Each DC may be marked with a positive '+' or negative '-' sign: a positive sign means that increasing the value of the measured attribute will benefit the user, a negative sign the opposite. The hows associated with each what are noted in the appropriate boxes of the matrix and the strength of each association is estimated. By this means, the relative benefits of each how can be expressed numerically, that is, the hows can be weighted.

Step 4. This step fills the roof matrix, which exposes how the hows are correlated with one another; these relationships are also rated. This is done in the roof of the HoQ. Changes may lead to improved or impaired quality and affect different characteristics. The challenge is to balance the matrix according to the optimal benefit for the customer. By exposing these interactions, the roof matrix may contain the most important information of the QFD–analysis.

Step 5. This step sets the priorities for improving components by assigning attribute weights to actual components. The components to consider may be added according to the design team's own priorities and may include technical items, the attributes' importance and estimated costs.

Step 6. The intention of this step is to help the design team set the targets. These are entered at the bottom of the house. The targets are determined by looking at the relative importance as provided by the users.


Application of HoQ

The QFD team employs HoQ analysis to understand which design characteristics maximize customer requirements and how much these characteristics have to be improved to achieve preference over the competitors. To answer the first question, the relative importance of the DCs is calculated taking into account the importance of the customer attributes (CA), as shown in Figure 2.9.

Figure 2.9. A house of quality chart

To answer the second question, traditional QFD finds a proper design strategy to improve customer perception using trial–and–error methods. In the example given in the figure, the matrix has four CAs and five DCs. It may be observed that CA4 has the maximum importance level and CA1 the minimum one. So, the design team will at first try to improve CA4 in order to increase total customer satisfaction. CA4 is affected strongly by DC2, even though DC2 is not the most important design characteristic. The design team may nevertheless prefer to improve DC3 first, because this strategy can improve three attributes at the same time. The cost of improvement is another criterion for the selection of some DCs vis–a–vis their level of improvement. In the end, the customer decision depends on the total cost, and thus the product development team has to control this design variable. The design team has many possible choices for improvement, with various effects on customer satisfaction and on the total cost of the product. Therefore, many types of quantitative models, especially optimization models, have been developed to help QFD teams.
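The computation of the absolute and relative importance ratings of the DCs mentioned above can be sketched as follows. This is a minimal sketch, assuming the common 0–1–3–9 convention for relationship strengths; the CA importances and matrix entries are illustrative and are not the values of Figure 2.9.

```python
# Minimal sketch of the HoQ importance computation: the absolute importance of each
# design characteristic (DC) is the sum of the relationship strengths in its column,
# weighted by the importance of the customer attributes (CA). Values are illustrative.
import numpy as np

ca_importance = np.array([2, 3, 4, 5])        # relative importance of CA1..CA4 (assumed)

# Relationship matrix (rows = CA1..CA4, columns = DC1..DC5), using the common
# 0/1/3/9 scale for no/weak/medium/strong relationships (an assumption).
R = np.array([
    [9, 0, 3, 0, 1],
    [0, 3, 9, 1, 0],
    [1, 0, 3, 9, 3],
    [0, 9, 3, 0, 9],
])

absolute = ca_importance @ R                  # absolute importance of each DC
relative = absolute / absolute.sum()          # normalized (relative) ratings

for j, (a, r) in enumerate(zip(absolute, relative), start=1):
    print(f"DC{j}: absolute = {a}, relative = {r:.2f}")
```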

Due to the inherent differences between market analysis and design strategy, it is assumed that the QFD process is performed in two phases:

1. setting the target value related to each attribute according to the customer requirements;

2. determination of the design attributes to maximize performance achievements and to minimize product cost.

In the second phase, the goals set in the first phase are actually achieved. This approach is consistent with the inherent characteristic of QFD of determining a successful design strategy.


Linking the houses

The QFD–analysis not only captures the customer's requirements for the product, but is also applicable during later phases of the project, mainly those concerned with detailed design. Figure 2.10 shows the QFD matrix chain.

To carry the HoQ into succeeding phases, the whats of the preceding house are transformed into the hows of the next phase. In each successive matrix, correlations can be identified and the impacts of these correlations can be judged. By this multi–step process, the customers' desires can be linked to system features and the relative importance of various system features can be assessed. This knowledge can be used to influence the allocation of design resources and the numerous trade–off decisions that must be made during design development.

Figure 2.10. The QFD matrix chain
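A minimal numerical sketch of this chaining is given below: the importances computed for the hows of one house become the what importances of the following house. All matrices and weights are illustrative assumptions.

```python
# Minimal sketch of chaining HoQ matrices: the importance ratings obtained for the
# hows of one house are reused as the what importances of the next house.
# All numbers are illustrative assumptions.
import numpy as np

def how_importance(what_importance, relationship):
    """Importance of each how = weighted column sum of the what-how relationship matrix."""
    return np.asarray(what_importance) @ np.asarray(relationship)

# House 1: customer attributes -> design characteristics
R1 = [[9, 3, 0],
      [1, 9, 3],
      [0, 3, 9]]
dc = how_importance([5, 3, 2], R1)

# House 2: design characteristics -> part/process characteristics
R2 = [[9, 1, 0, 3],
      [3, 9, 1, 0],
      [0, 3, 9, 1]]
parts = how_importance(dc, R2)

print("DC importances:  ", dc)        # [48 48 27]
print("Part importances:", parts)     # [576 561 291 171]
```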

2.6 Design Evaluation Methods

Decisions must be made at every stage of the design development process when selecting among the technical alternatives that are capable of meeting the functional requirements.

Traditionally, it has been assumed that the technical requirements are mutually compatible. In this case a few feasible alternatives can be developed, selection attributes (or an objective function) established, the criteria applied and a basis ship selected. In this serial process no real decision–making is involved.

On the contrary, when one wants to consider the real design situation, where the criteria governing a selection are in conflict, the decision–making process is as important as the quantitative outcomes upon which decisions are based. Multicriterial decision–making (MCDM) methods are designed to address this kind of problem (see Chapter 3). The underlying methodologies are still under development and not yet widely diffused in the shipbuilding community.

As a transition between the traditional design approach and a rational one based on MCDM methods, a number of design evaluation methods and tools have been developed to enhance individual and group evaluation activity in the design process. They are:


• Controlled Convergence Method

• Weighted Attributes/Objectives Method

• Systematic Method

• Probabilistic Method

Each of the above methods employs decomposition to enable design evaluation, and assumes the availability of soft and hard information about the design being evaluated. Therefore, each method fits with a specific design phase, as reported in Table 2.2.

Method                             Design Phase

Controlled Convergence             Concept
Weighted Attributes/Objectives     Basic
Systematic                         Preliminary
Probabilistic Design               Concept/Basic

Table 2.2. Evaluation methods

The increasing complexity of technical systems requires the application of rigorous, team–based and objective evaluation, which demands a clear knowledge of the pros and cons of each support technique and of how and when it can be effectively applied within the design process. It is also worth noting that, with the exception of the 'probabilistic design' option, all these methods assume deterministic evaluations. This means that there is an assumption of certainty about the specific values of the design properties. The more detailed approaches are best applied within subsystems, with increasing support from computer–aided packages.

Controlled Convergence Method

The controlled convergence method was developed by Pugh (1980) and reflects the fact that the attention of the design team may initially be divergent, generating many alternatives, and then convergent towards one design solution. This evaluation method is a non–numeric and iterative tool for concept selection, which has the joint goals of both narrowing and improving the choice of feasible concepts. It therefore seeks to identify specific strengths and weaknesses of alternative design concepts. The method encourages a cyclic process of expanding the number of concept designs and eliminating unfeasible alternatives before converging on the 'preferred solution'.

With reference to Table 2.3, the controlled convergence method uses a simple matrix to compare feasible concept designs against a set of pre–defined attributes, which should be driven by a clear understanding of customer requirements and normative rules. The list of attributes is the vertical axis of the matrix, whereas the concept designs form the horizontal axis.

To compare concept designs, one of them is assumed as the 'datum concept' ('◦'). It does not matter which one, but it can help if it is a technical system that already exists. Each concept is compared with the datum for each attribute. If the concept is better than the datum in a property, the corresponding attribute is marked as '+'; if it is worse, it is marked as '-'; if it is similar to or the same as the datum, it is marked as 'n' (neutral). For each design, the total numbers of '+', '-', and 'n' scores are added up and the '-' total is subtracted from the '+' total. Each concept will now have a net score, and it is possible to rank the competing designs in preferential order.

Attributes           Design 1   Design 2   Design 3   Design 4   Design 5   Design 6

Stability               ◦          +          +          -          -          n
Power                   ◦          -          +          +          -          -
Endurance               ◦          +          +          -          +          +
Payload                 ◦          +          -          +          -          +
Seakeeping              ◦          n          +          -          n          -
Manoeuvring             ◦          -          n          n          +          +
Vibration               ◦          n          +          n          -          +
Acquisition Cost        ◦          -          -          +          -          +
Operating Cost          ◦          +          +          -          +          n

Net Score               ◦          1          4         -1         -2          3

Rank                    4          3          1          5          6          2

Table 2.3. Evaluation in controlled convergence method

The design team might eventually repeat the cycle, taking one of the stronger candidates as the new datum, while increasing the level of detail of the evaluation attributes and adding new attribute(s).
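The scoring used in Table 2.3 can be reproduced with a few lines of code. This is a minimal sketch: the comparison marks are those of the table, and the datum (Design 1) is conventionally assigned a net score of zero.

```python
# Minimal sketch of the controlled convergence scoring of Table 2.3: each design is
# compared with the datum attribute by attribute ('+', '-', 'n'), and its net score
# is the number of '+' marks minus the number of '-' marks.
comparisons = {                       # marks against the datum, attribute by attribute
    "Design 2": "+-++n-n-+",
    "Design 3": "+++-+n+-+",
    "Design 4": "-+-+-nn+-",
    "Design 5": "--+-n+--+",
    "Design 6": "n-++-+++n",
}

net = {"Design 1 (datum)": 0}         # the datum scores zero by definition
net.update({d: m.count("+") - m.count("-") for d, m in comparisons.items()})

# Rank the designs by descending net score (reproduces the Rank row of Table 2.3).
for rank, (design, score) in enumerate(
        sorted(net.items(), key=lambda kv: kv[1], reverse=True), start=1):
    print(f"{rank}. {design}: net score {score:+d}")
```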

Weighted Attributes/Objectives Method

The weighted attributes method is a very straightforward evaluation method. It applies weight factors according to the relative importance given to each attribute/objective. This method has been applied mainly during the concept design phase, when an appropriate amount of information is available. It differs from the 'controlled convergence method' in that cardinal scales (ratings) are used to evaluate the degree of match between the design outcomes and the specifications. Weight factors are applied to each attribute/objective to reflect the relative importance of each design characteristic to the overall quality of the design. When each rating is multiplied by the corresponding weight factor, a weighted score results for each attribute/objective. The sum of the weighted scores yields a total weighted score, providing the means of comparing the overall performance of each design. A scale of 1 through 10, or alternatively of 0 through 1, is generally used to rate each design against the design attributes.

It may turn out that even the best ranked design concept still possesses relative weaknesses in some important design characteristics. While keeping that design for further development and analyses, it is important to remember that these weaknesses may and must be eliminated in further phases of the design process.
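A minimal sketch of the total weighted score computation follows; the attributes, weights and ratings are illustrative assumptions only.

```python
# Minimal sketch of the weighted attributes/objectives method: each design is rated
# 1-10 against each attribute, the ratings are multiplied by the attribute weights
# and summed into a total weighted score. All numbers are illustrative.
weights = {"Stability": 0.20, "Power": 0.15, "Payload": 0.30,
           "Seakeeping": 0.15, "Acquisition cost": 0.20}

ratings = {   # rating (1-10) of each design against each attribute
    "Design A": {"Stability": 7, "Power": 6, "Payload": 8, "Seakeeping": 5, "Acquisition cost": 6},
    "Design B": {"Stability": 8, "Power": 5, "Payload": 6, "Seakeeping": 7, "Acquisition cost": 8},
}

for design, r in ratings.items():
    total = sum(weights[a] * r[a] for a in weights)
    print(f"{design}: total weighted score = {total:.2f}")
```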


Systematic Method

The systematic method is a particular evaluation method developed by Pahl and Beitz (1984). It is very similar to the 'weighted attributes method', although it has generally been used at the preliminary design phase, when much more hard information is available. This is reflected in a growing number of more detailed design characteristics that may be used as a measure of the design outcomes of product subsystems. Since at the preliminary and contract design phases enough information is available in the design process, via direct computations and experimental analyses, to obtain more accurate values for most of the design properties, the systematic method is used for the weighted optimization of specific subsystems. Once again, when the value of each objective is multiplied by its weight factor, a weighted score results. The sum of these weighted scores provides the relative ranking of each subsystem variant.

Probabilistic Design Method

This method is attributable to Siddall (1983) and is important in that it reflects the uncertainty of evaluation at the very initial stages of the design process. It is flexible enough to deal with uncertainty, which is an all–pervading and dominant characteristic of engineering practice. A probabilistic design can be defined as a design in which the design team codifies uncertainty by probability distributions. Evaluating this uncertainty is a design decision.

Figure 2.11. Value and probability density curves for engine power

An important feature of the probabilistic design method is its use of the 'value probability distribution' (corporate utility) of each design as the decision criterion when the design characteristics are random variables. The probability density curves reflect the uncertainty in the minds of the designers. A simple graph, with a utility scale on the y axis and a design characteristic value scale on the x axis, is used to reflect the customer view of the importance of achieving particular values for a design characteristic. In the example shown in Figure 2.11, the value curve (utility curve) shows that there is a preference for a lower value of power; indeed, as the power increases beyond 15 MW the utility starts to drop significantly. It is worth noting that this technique is useful in determining target values for the product design specifications when the design is in the conceptual phase. Members of the customer staff may be asked to sketch these utility curves for a range of design properties of the potential product, giving the designers a clear indication of the design targets they should try to achieve.

Once the utility curve (value curve) is available, the design team superimposes the evaluation of each design attribute value for each design option. In Figure 2.11 this is illustrated for two competing designs. In Design 1 there is a greater probability that the power will be 20 MW, but there is a very small probability that it could be as low as 10 MW or as high as 32 MW. A higher average and a wider spread are indicated for Design 2. In this case, Design 1 is clearly preferred, since it would result in higher utility for the customer and there is more confidence in the design achieving an acceptable power level. As the design progresses and more specific information becomes available, the probability density curve may reduce to a single deterministic value.
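The comparison sketched in Figure 2.11 can be reproduced numerically by weighting the utility curve with each design's probability density and integrating. The following is a minimal sketch; the utility curve and the distribution parameters are illustrative assumptions, not the actual curves of the figure.

```python
# Minimal sketch of the probabilistic comparison of Figure 2.11: the expected
# (corporate) utility of each design is the integral of the utility curve weighted
# by that design's probability density for engine power. All curves are illustrative.
import numpy as np

power = np.linspace(5.0, 40.0, 500)                  # engine power [MW]

# Illustrative utility curve: high preference for low power, dropping beyond ~15 MW.
utility = 1.0 / (1.0 + np.exp((power - 22.0) / 4.0))

def normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

designs = {"Design 1": (20.0, 3.0),                  # assumed (mean, spread) in MW
           "Design 2": (26.0, 5.0)}                  # higher average, wider spread

step = power[1] - power[0]
for name, (mean, std) in designs.items():
    expected = np.sum(utility * normal_pdf(power, mean, std)) * step
    print(f"{name}: expected utility = {expected:.3f}")   # Design 1 scores higher
```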

Unfortunately, almost no technique has succeeded in combining the multicriterial decision–making approach with the probabilistic design method. Bandte (2000) has tried to overcome this deficiency by generating a multivariate probability distribution that serves, in conjunction with a criterion value range of interest, as an applicable objective function for multicriterial optimization and product selection.

2.7 Decision–Based Design

Any discussion about designing the technical systems of tomorrow, using approaches based on systems thinking and information technology, must include concurrent engineering design for the life cycle. Further, while the targets of concurrent engineering are clear, there is no generally accepted model of the design process able to combine concurrency and rational design over the lifetime of an industrial product. It is unlikely that one model will emerge as the ultimate model of design for all industrial products and processes. Therefore, only a paradigm, such as decision–based design, can be advocated, aiming to make rational, value–based decisions in a realistic design environment.

Decision–based design is a term coined to emphasize that the main role of designers is to make decisions (Mistree et al., 1990). Therefore, design methods are to be based on a paradigm that springs from the perspective of the decisions made by designers, as opposed to design that is simply assisted by the use of optimization methods or specific analysis tools. In decision–based design, decisions serve as markers to identify the progression of a design from conception through delivery.


2.7.1 Basic Principles

Some basic principles from a decision–based design perspective are as follows:

• design involves a series of decisions which may be serial or concurrent;

• design implies hierarchical decision–making and interaction among decisions;

• design productivity can be increased by combining the usage of prescriptive models (analysis, synthesis, and evaluation) with more powerful and capable computers for processing numerical information;

• life–cycle considerations that affect design can be modelled in upstream design decisions;

• techniques that support the design team's decision–making should be:

– process–based and discipline–independent;
– suitable for solving uncertain, imprecise, and ambiguous problems;
– suitable to facilitate self–learning.

Decision–based design is the decision–making paradigm that translates information into knowledge, provided design decisions are governed by the following main properties:

• decisions on design are ruled by multiple measures of merit and performance;

• decisions involve hard and/or soft information that comes from different sources and disciplines;

• none of the decisions may yield a singular, unique (ideal) solution, since every decision is less than optimal.

2.7.2 Design Types and Structures

Three different types of design, namely original, adaptive and variant, may be distinguished according to the amount of originality included (Pahl and Beitz, 1984):

• Original Design: original solution principles are used to design an innovative product; for example, in shipbuilding an original design occurs only when the well–known and abused 'basis ship' design procedure cannot be employed.

• Adaptive Design: an existing design is adapted to different conditions or tasks; thus, the solution principles remain the same but the technical product will be sufficiently different to meet the new targets derived from the specifications; the 'basis ship' approach is an example of adaptive design.

• Variant Design: only the size and/or arrangement of subsystems of an existing technical product are varied, so that the solution principles are not changed.

The type of design influences the type of tools and the amount of design interaction required. As shown in Figure 2.12, a variant design is an integral part of an adaptive design, which in itself can be viewed as a subset of an original design. Whether the design process is classified as original, adaptive or variant greatly depends on the perspective chosen. The application of steam power to ships, which occurred during the industrial revolution, generated original design principles for providing waterborne transportation. Clearly, this represented a discontinuity in the development of design solutions for naval and merchant ships. However, if the design procedure is classified based on the functional requirements of the entire product, simultaneous designing in all three categories is possible.

The capability to structure the design process using a set of decision entities is one of the main features of decision–based design. Modelling processes (e.g. design, manufacturing, maintenance) helps designers to identify the right problem at the right level and to structure each process so as to ensure the 'best possible' outcome. Without modelling the design process, it is impossible to provide suitable guidance for improving the efficiency and effectiveness of a design team. By focusing upon decisions, a means should be provided for creating models of decision–based processes, supported by a computer–based 'decision support system' (Bras et al., 1990).

Figure 2.12. Design types

In industrial engineering there is an increasing awareness that the decisions made by designers could be the key element in the development of design methods that facilitate design for the life cycle and promote concurrency in the process (Suh, 1984; Whitney et al., 1988; Hills and Buxton, 1989; Mistree et al., 1991; Zanic et al., 1992).

The starting point for representing a designer's perception of the real world and of the design environment is a heterarchical set of activities arranged without a discernible pattern and with no activity being dominant. Typically, the heterarchical set associated with a product lifetime includes market analysis, design, manufacturing, maintenance of the product and its subsequent scrapping. In decision–based design this heterarchical set embodies decisions or sets of decisions (decision entities) that characterize the designer's judgment. A hierarchical set of activities, on the other hand, characterizes the sequence of decision entities involved, and hence heavily influences the design product. Knowledge and information entities may link the decision entities in both heterarchical and hierarchical representations. In a heterarchical structure there are connections between nodes, but the structure is recursive, without a permanent uppermost node or well–identified starting point (Fig. 2.13).

A design process starts when the first step is taken to extract a hierarchy from a heterarchy, that is, when the dominant node is chosen. In practice, transforming a decision heterarchy into a decision hierarchy requires identifying a correct starting point that leads to a plan of action which is both viable and cost–effective.


Figure 2.13. Heterarchical and hierarchical sets

With knowledge brought forward in the design time line, designers are able to make more rational decisions. The Integrated Product Process Development (IPPD) approach, illustrated in Figure 2.14, encourages moving information forward in the design process. IPPD is concerned with upfront activities in the early design phases and allows the designers to decompose the product and process design trade iteration through a system's life cycle (Marx et al., 1994). The implementation of IPPD reorders decision making and brings downstream and global issues to bear earlier, in concert with conceptual and detailed planning (DoD, 1996); thus it can allow the design team to make better decisions in the early design stages.

Figure 2.14. Hierarchical process flow for technical system integration


2.8 Decision Support Systems

To provide support for selection in designing technical systems, computer–aided decision support systems (DSS) are very effective. They assist decision makers in considering the implications of various courses of action and can help reduce the risk of human errors.

The concept of DSS was introduced, from a theoretical viewpoint, in the late 1960's. In general, decision support systems can be defined as computer systems that provide information from databases and mathematical models, analyze it using decision–making techniques according to customer specifications, and finally present the results in a format that users can readily understand and use. Thus, the basic target of a DSS is to provide the necessary information to the decision makers in order to help them gain a better understanding of the decision environment and select among feasible design alternatives.

A typical structure of a decision support system includes three main components:

• a mathematical design model rooted in knowledge–based systems;

• a multicriterial decision–making shell for implementing decision support tools;

• a set of user–friendly interfaces connecting evaluation modules and selection procedures.

A decision support system is aimed at carrying out any of the different types of design, namely original, adaptive and variant. It requires the implementation of two design phases, namely a metadesign phase and a computer–based design phase. Metadesign is accomplished by partitioning a design problem into its elemental entities and then devising a plan of action by establishing hierarchical sets. Multiple attributes and multiple objectives, quantified using analysis–based hard information and insight–based soft information, respectively, can be modelled, providing domain–specific mathematical models (metamodels) to reduce uncertainty in design decision–making.
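As a purely illustrative sketch of such a metamodel, the snippet below fits a quadratic surrogate to a handful of 'expensive' evaluations of a design attribute, so that the decision–making shell can query it cheaply; the sample data are assumptions, not values from the text.

```python
# Minimal sketch of a metamodel: a quadratic surrogate fitted by least squares to a
# few "expensive" evaluations of a design attribute, then queried cheaply.
# The sample data are illustrative assumptions.
import numpy as np

speeds = np.array([16.0, 18.0, 20.0, 22.0, 24.0])    # design speed [kn]
power  = np.array([7.2, 9.8, 13.1, 17.3, 22.6])      # "computed" propulsion power [MW]

metamodel = np.poly1d(np.polyfit(speeds, power, deg=2))   # quadratic response surface

print("Predicted power at 21 kn:", round(metamodel(21.0), 2), "MW")
```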

Overall design and manufacturing processes may be modelled via DSS using entities such as phases, events, tasks and decisions. Formulation and solution of a decision support system provide a means for allowing different types of decisions:

• Heuristics: decisions made on the basis of a knowledge base and rules of thumb;

• Selection: decisions based on multiple attributes, weighted according to preferences, for the 'best possible' design among nondominated alternatives (Kuppuraju et al., 1985; Trincas et al., 1994);

• Robustness: managing the risk and uncertainty related to exogenous design parameters (Allen et al., 1989; Grubisic et al., 1997);

• Compromise: improvement of the 'best possible' design through further optimization of subsystems (Lyon and Mistree, 1985).

Applications of decision support systems include the design of ships, aircraft, mechanisms, thermal energy systems, etc. They have also been developed for hierarchical design, where selection–compromise, compromise–compromise and selection–selection decisions may be coupled (Bascaran et al., 1989).


2.8.1 Design Time Line

An industrial product life–cycle has a beginning and an end, with certain specific events occurring at approximately predictable points during this lifetime. Time in its development processes may be modelled using event–based time rather than physical time. As noticed earlier, the principal target of the design process is to convert information that characterizes the needs and requirements for a technical system into knowledge about the product itself. From the standpoint of the information necessary for making decisions in each of the design phases, what is important is that:

• the types of decisions being made (e.g., selection, compromise, robustness analysis) are the same in the initial design phases of all technical systems;

• the quantity of hard information with respect to soft information increases as the knowledge about the product increases.

In decision support systems (see Figure 2.15, which provides an example incorporating designing for concept and designing for manufacturing), the ratio of hard–to–soft information available is a key factor in determining the nature of the support that a design team needs as soon as a solution is sought. Hence, it is mandatory to define any of the design processes in terms of phases (e.g., designing for concept and designing for manufacturing) and identifiable milestones or events (e.g., economic viability, preliminary synthesis, detailed analysis).

Figure 2.15. A typical design time line

Using the hard–to–soft relationship makes it intuitively possible to categorize computer–based tools for design; for example, tools used to provide support for decision–making activities form one category, while analytical, numerical, and statistical codes that facilitate evaluation of an engineering product's performance form another category.


The simplified time line for an original design (Fig. 2.15) shows how, in the designing–for–concept phase, a net as wide as practicable is cast in order to generate as many feasible solutions as possible and then to select the 'compromise' concept which best satisfies the functional specifications. In designing for manufacturing, the goal is to ensure that the product can be manufactured cost–effectively. Even if it is not explicitly shown in Figure 2.15, in practice iteration between events and phases will occur.

Event: Conceptual Design

  Feasibility                      Generate a large number of feasible concepts
                                   (two/three decks, single/twin–screw, diesel/diesel–electric)

  Decision via Initial Selection   Generate and select the 'best possible design' among
                                   non–dominated solutions, subject to multiple constraints

  Engineering                      Functional feasibility of the 'preferred' concepts,
                                   given basic requirements

  Decision via Selection           Select the 'robust design' for manufacturing development
                                   (establish the cost–effectiveness and manufacturability;
                                   develop top–level specifications)

Event: Preliminary Design

  Decision via Compromise DSP      Improve the functional effectiveness of the 'robust design'
                                   through modification (establish and accept a 'satisficing' design)

  Contract Assignment

Event: Contractual Design

  Engineering                      Based on the information provided in preliminary design, check
                                   the functionality of the improved design, subject to a
                                   comprehensive set of functional requirements for subsystems,
                                   and develop detailed information on acquisition cost

  Decision via Refined             Improve, through modification, the functional and
  Compromise DSP                   cost–effectiveness of the final design (refine the compromise
                                   DSP by including information on costs and manufacturability;
                                   establish and accept the improved design)

Event: Functional Design .....

Event: Detail Design .....

Table 2.4. Flow of designing for an original concept


2.8.2 Designing for an Original Product

One possible scenario for accomplishing an original design, from concept through preliminary design, for a ro–ro vessel is shown in Table 2.4. Provided the economic viability of the project has been established, the first task is the generation of a large number of feasible designs. Techniques that foster an original product include brainstorming to identify and agree upon the selection of attributes and constraints. At this stage, the technical and economic information on the feasible alternatives should be sufficient to rank candidate designs and to arrive at the selection of the 'robust solution' via a multicriterial approach.

The key design phase, that is, the concept design, is a three–step process:

• in the first step, the available soft information on attributes is used to evaluate the feasible solutions; an initial selection is accomplished by solving for nondominated designs (a minimal screening sketch follows this list);

• in the second step, the amount of hard information is increased by windowing the design space, in order to reduce the uncertainty about the attribute values of the remaining nondominated designs;

• finally, the 'robust design' is selected for further development, which results in a robust product that fulfills the functional requirements, is cost–effective and can be manufactured.
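A minimal sketch of the nondominated screening mentioned in the first step is given below; the attribute values are illustrative, and all attributes are assumed to be expressed so that larger is better.

```python
# Minimal sketch of screening feasible concepts for nondominated (Pareto) designs,
# assuming every attribute is expressed so that larger values are better.
# Attribute values are illustrative assumptions.
def dominates(a, b):
    """True if design a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated(designs):
    return [name for name, attrs in designs.items()
            if not any(dominates(other, attrs)
                       for other_name, other in designs.items() if other_name != name)]

# (payload index, speed index, economy index) for a handful of feasible concepts
designs = {"A": (0.8, 0.6, 0.5), "B": (0.7, 0.7, 0.6),
           "C": (0.6, 0.5, 0.4), "D": (0.9, 0.4, 0.3)}
print(nondominated(designs))    # ['A', 'B', 'D']  (C is dominated by both A and B)
```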

In preliminary design the robust solution is improved through sub–optimization of various attributes. This is achieved by formulating and solving a compromise decision support problem.

In contractual design the final design is completely reviewed, subject to a comprehensive and stringent set of requirements (final specifications), thus ensuring functional feasibility and cost–effectiveness.

Designing generally involves costly iterations. Ideally they should be avoided or at least accomplished as rapidly as possible in a decision–based design environment. Iteration costs can also be reduced by evaluating the need for iteration at clearly defined points (phases and events). The events are used to model the design process by means of a time line, thus arriving at a metadesign.

2.8.3 Metadesign

The specific activities performed by design teams change as the design process evolves. In the concept design phase, a mathematical design model of the technical system is needed to evaluate its required properties. The model is built using representations of subsystems or clusters of subsystems through tuned metamodels. Later on, in the preliminary design stage, within the bounds of the top–level specifications, the design teams can arrange and rearrange the essential functional components of the product, before the design is frozen and changes can be made only with great difficulty. Therefore, it is necessary to develop methods for dividing the technical system design into subsystems, solving them and then synthesizing the solutions into a metadesign for the entire technical system.

For metadesign to represent dynamic partitioning and planning, the connotation placed on the term 'meta' can have three meanings:

• after: meta–'x' occurs after 'x'; thus 'x' is a prerequisite of meta–'x';


• change: meta–‘x’ indicates that ‘x’ changes and is a general name of that change;

• above: meta–'x' is superior to 'x' in the sense that it is more highly organized, of a higher quality, or viewed from an enlarged perspective.

This third meaning is the most suitable for design purposes. This notion of 'higher' has also been employed in terms like metaknowledge, metadomain, metamodelling, etc.

In a metadesign, the design problem may be divided into subproblems in the early stages of the project either by decomposing or by partitioning and planning, which are not synonyms. For the further design phases, and particularly in the context of the 'decision support systems', the differences in the meanings of these terms are essential to distinguish between two modes of approach to designing, that is, conventional design and metadesign. In particular:

• Decomposing is the process of dividing the system into its smallest elements; it is especially appropriate when design synthesis is based on the principle of repeated analysis of components. In adaptive and variant design, decomposition is important and the reverse of the decomposition process, that is synthesis, is exploited. On the contrary, in designing original products, which initially are vaguely specified, the use of decomposition is precluded. Partitioning and planning are then required, since subsystems cannot be defined a priori.

• Partitioning is the process of dividing the functions, processes and structures that comprise the technical system into subsystems, sub–subsystems, etc. In partitioning, a design team is guided by knowledge of the technical system, by considerations of the requirements the system must fulfill and by the tasks that must be performed by the fully functional system. Partitioning a design problem yields a grouping of interrelated decisions and also provides knowledge and information that can be used for planning. In the DSS technique the product being designed is partitioned into its subsystems and the process of design is partitioned into decisions using generic, discipline–independent models (Miller, 1987).

• Planning allows information about organizational resources and time constraints to be added to the decisions identified in the partitioning phase. These decisions are organized into a decision plan, that is, a plan of action for implementing the decision–based design process.

Metadesign is, therefore, a metalevel process of designing industrial products that includes partitioning the product for function, dividing the design process into a set of decisions and planning the sequence in which these decisions will be made. Metadesign is particularly useful in the design of technical systems in which concurrency among disciplines is required or in which some degree of concurrency in analysis and synthesis is sought.

2.8.4 Axioms of Decision–Based Design

Metadesign is based on the primary axioms of decision–based design. They map the particular design tasks to characteristic decisions and provide a domain–independent framework for the representation and processing of domain–relevant design information (Kamal, 1990).


Axiom 1. Existence of Decisions in DSS

The application of the decision support systems results in the identification of relevant decisions associated with the technical system and its relevant subsystems.

Axiom 2. Type of Decisions in DSS

All decisions identified in the decision support systems are categorized as selection, compromise, or a combination of these. Selection and compromise are referred to as primary decisions. All other decisions, which are represented as a combination of these, are identified as derived decisions. Primary and derived decisions are resolved using specialized tools.

Selection Decision

The selection decision is the process of making a choice among a number of feasible alternatives, taking into account a number of measures of merit. These measures, called attributes, represent the functional requirements and may not all be of equal importance. Attributes may be quantified using precise and/or vague information. The emphasis in selection is on the acceptance of certain alternatives while others are discarded. The goal of selection in design is to reduce the alternatives to a realistic and manageable number.

Keywords and Descriptors

Table 2.5 summarizes the keywords and descriptors associated with the selection and compromise decision support problems.

DSP           Keywords     Descriptors

Selection     Given        Candidate Alternatives
              Identify     Attributes' Relative Importance
              Rate         Alternatives vs. Attributes
              Rank         Order of Preference

Compromise    Given        Information
              Find         Attribute Values (MADM)
                           Deviation Variables (MODM)
              Satisfy      System Constraints
                           Targets (goals, attributes)
                           Bounds
              Minimize     Distance from Ideal
                           Deviation Function

Table 2.5. Keywords and Descriptors in Decision Support Problems


Compromise Decision

Similarly, the compromise decision requires finding the best combination of design variables in order to improve the 'best possible solution' with respect to multiple constraints and attributes. The emphasis in compromise is on modification and change, by making appropriate trade–offs based on criteria relevant to the feasibility and performance of the technical system.
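A compromise decision of this kind can be sketched as the minimization of the weighted deviation of the attributes from their targets, subject to bounds on the design variables. The following is a minimal sketch: the attribute models, targets, weights and bounds are illustrative assumptions, and a coarse grid search stands in for a formal optimizer.

```python
# Minimal sketch of a compromise decision: choose the design variables, within their
# bounds, that minimize the weighted deviation of the attributes from their targets.
# Attribute models, targets, weights and bounds are illustrative assumptions.
import itertools

targets = {"speed": 22.0, "deadweight": 9000.0}        # goal values
weights = {"speed": 0.6, "deadweight": 0.4}            # relative importance

def attributes(length, power):
    """Crude illustrative attribute models as functions of the design variables."""
    return {"speed": 10.0 + 0.04 * power + 0.02 * length,        # [kn]
            "deadweight": 45.0 * length + 0.5 * power}           # [t]

def deviation(length, power):
    a = attributes(length, power)
    return sum(weights[k] * abs(a[k] - targets[k]) / targets[k] for k in targets)

# Bounded design space: length 150-200 m, power 150-300 (expressed in hundreds of kW).
grid = itertools.product(range(150, 201, 5), range(150, 301, 10))
best = min(grid, key=lambda lp: deviation(*lp))
print("Best compromise (length, power):", best,
      "with deviation", round(deviation(*best), 4))
```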

Keywords are the tasks that classify domain–relevant information and identify the related relationships. They embody in themselves the domain–independent 'procedural knowledge' for decision support problems. Procedural knowledge is knowledge about the process, i.e., knowledge about how to represent and process domain information for design synthesis. The keyword 'given' is a heading under which the background or known information is grouped.

Descriptors are objects organized under the relevant keywords within the decision support problem formulation. They, too, help to transform the problem from its discipline–specific description to a discipline–independent representation. In other terms, they represent 'declarative knowledge' (Rich, 1983), which is knowledge about the product, i.e., the representation of problem–relevant information and background knowledge about the domain.

Within the DSS, the nature of decision support problems is further qualified by means of two axioms:

Axiom 3. Domain–Independence of DSS Descriptors and Keywords

The descriptors and keywords used to model decision support problems need to be domain–independent with respect to processes (e.g., design, manufacturing, maintenance) and disciplines (e.g., hydrodynamics, structural mechanics, engineering management).

Axiom 4. Domain–Independence of DSS Techniques

The techniques used to actually provide decision support need to be domain–independent with respect to processes and disciplines. This axiom may seem self–evident, as many solution techniques (e.g., linear programming, nonlinear optimization and expert systems) are applicable to problems from different domains. However, this condition supplements the previous axiom by stating that decision support models using domain–independent techniques should be solved in a domain–independent manner.


Bibliography

[1] Allen, J.K., Simovic, G., Mistree, F.: Selection Under Uncertain Conditions: A Marine Application, Proceedings, Fourth International Symposium on 'Practical Design of Ships and Mobile Units', PRADS'89, Bulgarian Ship Hydrodynamics Centre, Varna, 1989, Vol. 2, pp. 80.1–80.8.

[2] Andreasen, M.M.: Design Strategy, Proceedings, International Conference on Engineering Design, ICED '87, The American Society of Mechanical Engineers, 1987, Vol. 1, pp. 171–178.

[3] Bandte, O.: A Probabilistic Multi–Criteria Decision Making Technique for Conceptual and Preliminary Aerospace Systems Design, Ph.D. Thesis, Georgia Institute of Technology, 2000.

[4] Bascaran, E., Bannerot, R.B., Mistree, F.: Hierarchical Selection Decision Support Problems in Conceptual Design, Engineering Optimization, 1989, Vol. 14, pp. 207–238.

[5] Bashir, H., Thomson, V.: Project Estimation from Feasibility Study and Completion: A Quantitative Methodology, Concurrent Engineering: Research and Application, 2001, Vol. 9, no. 4.

[6] Beitz, W.: General Approach of Systematic Design – Application of VDI–Guideline 2221, Proceedings, International Conference on Engineering Design, ICED '87, The American Society of Mechanical Engineers, 1987, Vol. 1, pp. 15–20.

[7] Blanchard, B.S., Fabrycky, W.J.: Systems Engineering and Analysis, Prentice–Hall, New York, 1998.

[8] Bras, B., Smith, W.F., Mistree, F.: The Development of a Design Guidance System for the Early Stages of Design, Proceedings, CFD and CAD in Ship Design, Van Oortmerssen Ed., Elsevier Science Publishers B.V., Wageningen, 1990, pp. 221–231.

[9] Cohen, L.: Quality Function Deployment: How to Make QFD Work for You, Addison–Wesley Publishing Company, New York, 1995.

[10] Cross, N.: Engineering Design Methods, John Wiley & Sons, Chichester, 1989.

[11] De Boer, S.J.: Decision Methods and Techniques in Methodical Engineering Design, Academisch Boeken Centrum, De Lier, The Netherlands, 1989.

[12] DoD: DoD Guide to Integrated Product and Process Development, Systems Engineering, Office of the Under Secretary of Defense (Acquisition and Technology), Washington D.C., 1996.

[13] Elvekrok, D.R.: Concurrent Engineering in Ship Design, Journal of Ship Production, Vol. 13, no. 4, 1997, pp. 258–269.

[14] Evans, J.H.: Basic Design Concepts, ASNE Journal, 1959.

[15] Hills, W., Buxton, I.L.: Integrated Ship Design and Production During the Pre–Construction Phase, Transactions RINA, 1989, Vol. 131, pp. 189–210.

[16] Hubka, V.: Principles of Engineering Design, Eder Ed., Butterworth, London, 1982.

[17] Jones, J.C.: A Method of Systematic Design, in 'Developments in Design Methodology', John Wiley & Sons, Chichester, 1963.

[18] Kamal, S.Z., Karandikar, H.M., Mistree, F., Muster, D.: Knowledge Representation for Discipline–Independent Decision Making, Proceedings, Expert Systems in Computer–Aided Design, Gero Ed., Elsevier Science Publishers B.V., Amsterdam, 1987, pp. 289–321.


[19] Kirby, D., Mavris, D.N.: A Method for Technology Selection Based on Benefit, Available Schedule and Budget Resources, Proceedings, 2000 World Aviation Congress and Exposition, SAE AIAA 2000–01–5563, 2000.

[20] Kuppuraju, N., Ittimakin, P., Mistree, F.: Design through Selection – A Method that Works, Design Studies, 1985, Vol. 6, no. 2, pp. 91–106.

[21] Kusiak, A.: Concurrent Engineering: Automation, Tools and Techniques, John Wiley & Sons, 1993.

[22] Lyon, T.D., Mistree, F.: A Computer–Based Method for the Preliminary Design of Ships, Journal of Ship Research, 1985, Vol. 29, no. 4, pp. 251–269.

[23] Marx, W.J., Mavris, D.N., Schrage, D.P.: Integrating Design and Manufacturing for the High Speed Civil Transport, Proceedings, 19th ICAS Congress / AIAA Aircraft Systems Conference, 94–10.8.4, International Council of the Aeronautical Sciences, 1994.

[24] Miller, J.G.: Living Systems, McGraw–Hill, New York, 1978.

[25] Mistree, F., Smith, W.F., Bras, B.A., Allen, J.K., Muster, D.: Decision–Based Design: A Contemporary Paradigm for Ship Design, Transactions SNAME, 1990, Vol. 98, pp. 565–597.

[26] Pahl, G., Beitz, W.: Engineering Design, Wallace Ed., Pomerans Trans., The Design Council, Springer–Verlag, London/Berlin, 1984.

[27] Pennel, J.P., Winner, R.I.: Concurrent Engineering Practices & Prospects, Proceedings, IEEE Global Telecommunications Conference and Exhibition, Part I, 1989, pp. 647–655.

[28] Priest, J.W., Sanchez, J.M.: Product Development and Design for Manufacturing, Marcel Dekker Inc., New York, 2001, pp. 15–36.

[29] Rogan, J.E., Cralley, W.E.: Meta–Design – An Approach to the Development of Design Methodologies, IDA Paper no. P–2152, Institute for Defense Analyses, Alexandria, Virginia, 1990.

[30] Simon, H.A.: The Sciences of the Artificial, The MIT Press, Cambridge, Massachusetts, 1982.

[31] Shina, S.G.: Concurrent Engineering and Design for Manufacture of Electronics Products, Van Nostrand Reinhold, New York, 1991.

[32] Suh, N.P.: Development of the Science Base for the Manufacturing Field through the Axiomatic Approach, Robotics and Computer Integrated Manufacturing, 1984, Vol. 1, no. 3/4, pp. 397–415.

[33] Sullivan, L.P.: Quality Function Deployment, Quality Progress, 1986, Vol. 10.

[34] Trincas, G., Zanic, V., Grubisic, I.: Comprehensive Concept Design of Fast Ro–Ro Ships by Multiattribute Decision Making, Proceedings, 5th International Marine Design Conference, IMDC'94, Delft, 1994, pp. 403–418.

[35] Wallace, K.M., Hales, C.: Detailed Analysis of an Engineering Design Project, Proceedings, International Conference on Engineering Design, ICED '87, The American Society of Mechanical Engineers, 1987, Vol. 1, pp. 94–101.

[36] Whitney, D.E., Nevins, J.L., De Fazio, T.L., Gustavson, R.E., Metzinger, R.W., Rourke, J.M., Selzer, D.S.: The Strategic Approach to Product Design, Proceedings, Design and Analysis of Manufacturing Systems, National Academy Press, Washington D.C., 1988.

[37] Winner, R.I., Pennell, I.P., Bertrand, H.E., Slusarczuk, M.M.G.: The Role of Concurrent Engineering in Weapons System Acquisition, IDA Report R–338, Institute for Defense Analyses, Alexandria, Virginia, 1988.


Chapter 3

Design As a Multicriterial Decision–Making Process

Although design is a purposeful activity directed toward the goal of fulfilling market and/or human needs, particularly those which can be met by the technological factors of one culture (Asimov, 1962), and even though the identification of design variables, parameters and constraints, as well as the selection of the 'best design solution', represent a decision–making process (Hazelrigg, 1996), efforts to rationalize the design process remained taboo for a long time. In fact, designing was regarded substantially as an intuitive and creative activity for which talent was necessary, rather than as rational and science–based work.

For more than three centuries, engineering design all over the world has been based on the Newtonian concepts of reductionism and mechanism, considering closed systems in equilibrium, isolated from their environments. Only during the Second World War did particular conditions and demands (i.e. shortage of materials) lead to designing being observed more closely, and certain insights were made useful for rationalization. After the Second World War the markets were hungry for all kinds of products, and their quality played a secondary role. Only in the late fifties, and more especially in the sixties, did a new situation emerge which brought increasing and broader demands for higher product quality. In addition, the opening of the world markets drove ever increasing international competition, which exploded in the nineties with globalization.

In the past sixty years, there has been a revolution in the way engineers view many of their problems and, even more recently, in the way designing is being taught at some universities. The pressure on the quality of products has led to searching for new knowledge about designing. The fundamental reason for this change can be attributed to two separate events: a new emphasis on systems engineering and the pervasive diffusion of computer science. In their synergistic coupling, they have irreversibly changed the world view of engineering and engineering education, and provided the foundation for developing systematic methods for rational, science–based approaches to the design of large–scale, fuzzily–defined, trans–disciplinary technical systems open to external environments.


In the sixties the research efforts were mainly devoted to design methodology. If one analyzes the status of design knowledge as it existed then, practically no references can be found to the working methodology of the designer. The phases of the design process, with respect to the designed product and also with respect to the design process itself, raised further questions which had to be answered to arrive at a definition of a rational design process. The knowledge and insights collected and organized into this system helped not only to find the necessary information efficiently, but also to discover the gaps and to orient the research in these areas.

3.1 Decision Making Process

Design concerns the use of available information to make intelligent decisions leading to optimal solutions which satisfy the customer’s requirements. Problem definition, for example, involves deciding what the customer requirements are, and how to define the constraints and targets. Other design activities such as generation of alternative concepts, technology infusion, and concept selection heavily rely on, or are purely, decision–making processes. In addition, the selection of each design parameter, the basic element of the design process, is itself a decision. Therefore, one can state with confidence that design is a decision–making process.

The theory of design makes it possible to help decision makers in identifying which design variables are needed to satisfy the functional requirements of an industrial product, in deciding why a design is better than the others, in understanding whether the ‘preferred solution’ is a robust design, and so forth. These and similar goals form a decision–making problem in systems engineering. The close relation between design and decision making can be seen from the following statements: ”A decision–making problem exists when and only when there is an objective to be reached, alternative methods of proceeding, and a variety of factors that are relevant to the evaluation of the alternatives or their probability of success” (Dixon, 1966). ”Decision making is the study of identifying and choosing alternatives based on the values and preferences of the decision maker(s). Making a decision implies that there are alternative choices to be considered, and in such a case we want not only to identify as many of these alternatives as possible but to choose the one that best fits with our goals, objectives, desires, values, and so on” (Harris, 1980).

Decision making can be briefly defined as the cognitive process, based on explicit assumptions, which leads to the selection of a course of action among alternatives up to a final choice. Structured rational decision making is an important part of all science–based professions, where specialists apply their knowledge in a given area to make decisions.

In general, the performance attributes of the design solution are needed to meet some functional requirements and constraints. For example, to design a large merchant ship, multiple requirements, such as requirements on hydrodynamics, propulsion, structure and noise, need to be satisfied. Usually, the design that best satisfies one individual requirement does not have the best performance on other requirements. That is, typically there is no design that has the best performance on all the requirements. As a result, trade–offs need to be made when the requirements are taken into account simultaneously. This usually involves decision–making activities, such as determining the preference information of the customer, establishing the decision rules for evaluating the alternatives, and selecting the ‘best solution’ among the alternatives. Sen and Yang (1998) point out that decision making in engineering design ”can be helpfully visualized as a collection of activities that relate to choice in the context of competing technical or functional requirements”. Dieter (2000) also argued that ”decision making is essentially part of the design process and the fundamental structure in engineering design”.

3.1.1 Decision Making in Technical Systems Design

According to Baker et al. (2001), decision making should start with the identification of the decision maker(s) in the decision, reducing the possible disagreement about problem definition, requirements, goals and criteria. Figure 3.1 shows a possible decision–making process at the concept design phase, which can be divided into a set of steps.

Figure 3.1. A decision–making cycle at concept design level


In the intelligence phase, the goal is to define the problem, collect the necessary information, determine criteria and establish goals. The decision makers (design team) must translate the problem into a clear, concise problem statement agreed upon by the customer(s). Even if it can sometimes be a long iterative process to come to such an agreement, it is a crucial and necessary point before proceeding ahead. The following step is to determine the criteria, which spell out what the solution to the problem must or cannot do. To establish goals, the decision makers should answer such questions as: what is more important? which are the attributes (objectives)? maximizing product performance or minimizing its cost? maximizing profit or market share? what about minimizing risks? A clear understanding of the crucial goals in a decision situation must be gained before design evaluations are accomplished. In mathematical form, the goals are objectives, whereas the requirements are constraints.

When the design problem is clearly stated and pertinent criteria and goals are established, the next step (design phase) is to generate the design alternatives after formulation of an adequate mathematical design model. Any alternative must meet sets of criteria for design and selection. The infeasible solutions must be deleted from further consideration, thus obtaining the explicit list of feasible alternatives. Often a careful examination and analysis of outcomes can reveal design alternatives that were not obvious at the outset. Therefore, ‘modelling and solving’ form the heart of most textbooks on decision analysis. Although the idea of modelling is critical in decision making, design problems are generally decomposed to understand their structures as well as to measure values and related uncertainties. Indeed, decomposition may be seen as a cornerstone of decision analysis (Clemen, 1996). The first level of decomposition calls for structuring the problem into smaller and more manageable subproblems. Subsequent decomposition by the decision maker may entail careful consideration of elements of uncertainty in different parts of the problem or careful thought about different aspects of the objectives.

Modelling may be performed in several ways. Influence diagrams or decision trees may be used to create a representation of the decision problem. Probability theory is used to build models of the uncertainty inherent in the problem. Hierarchical and network models are used to understand the relationships among multiple attributes (objectives), and utility functions or metamodels are assessed in order to model the way in which decision makers value different outcomes and trade off competing attributes (objectives).

Every correct method for decision making needs, as input data, the evaluation of the alternatives against the criteria. Depending on the criterion, the assessment may be objective, with respect to some commonly shared and understood scale of measurement, or it may be subjective, reflecting the judgement of the decision maker. After the evaluations, the selected decision–making tool can be applied to rank the alternatives or to choose a subset of the most promising alternatives.

Decision analysis (choice phase) is typically an iterative process. Once the ‘best alternative’ has been designed, sensitivity analysis is performed. If a ‘preferred solution’ or a ‘compromise solution’ has been selected, the next step is to improve it by developing the basic design.


The arrows in Figure 3.1 show that the decision maker(s) may return even to the intelligence phase. It may be necessary to refine the definition of the attributes (objectives) or include attributes (objectives) that were not previously included in the mathematical design model. New alternatives may be identified, the design model structure may change, and the models of uncertainty and preferences may need to be refined. The term decision–making cycle best describes the overall process, which may go through several iterations before a satisfactory solution is found.

In this iterative process, the decision maker’s perception of the problem might change, beliefs about the likelihood of various uncertain eventualities might develop and change, and inter- and intra–attribute preferences not previously considered might mature as more time is spent in reflection. Decision making not only provides a structured way to think about decisions, but also, more fundamentally, provides a structure within which a decision maker can develop preferences and feelings, those subjective judgements that are critical for a good solution.

Figure 3.2 illustrates the main categories involved in decision–making activities. Most of these expressions and their usage will be explained at length throughout this textbook. Here, the complexity of the decision–making process is considered mainly with reference to multiutility and multicriterial concepts.

Figure 3.2. Activities and categories associated with decision making

There are strong connections between design and decision making. Problem definition, for example, involves deciding what the customer requirements are and how to define constraints and targets. Other design phases, such as alternative generation, design space exploration, and concept selection, rely heavily on, or are purely, decision–making processes (Li et al., 2004). Furthermore, the selection of design parameters, which is a basic design task, is itself a decision.


3.2 Basic Concepts of Multicriterial Decision Making

Design is a decision–making process: decision making permeates the whole design process and is at the core of all design activities. In modern design of technical systems, more and more attention is paid to the conceptual and preliminary design phases so as to increase the odds of choosing a design that will ultimately be successful at the completion of the design process. Therefore, decisions made during these early design stages play a critical role in determining the success of a design.

3.2.1 What is Multicriterial Decision Making?

Worldwide experience indicates that successful innovative products presuppose significant innovation in design strategy. Many design techniques have been introduced over the course of decades to invent or produce the best product possible. But, whereas inventiveness seeks many possible answers and analysis seeks one actual answer, decision making seeks to choose the ‘best possible solution’. Such a solution can be difficult to obtain, particularly when the decision is based on several criteria. Indeed, decisions in design are multidimensional in nature; hence, they rest on multicriterial decisions. There is no doubt that the branch of decision–making theory which is called multicriterial decision making best respects the very character of a rational design process.

Almost every design problem in modern engineering design inherently has multiple criteria which need to be satisfied. It is often the case that good values of some criteria inevitably go with poor values of others, so that the best design is always a compromise in some sense. In order to find the best compromise design solution, designers are required to take all the metrics of interest into account concurrently when making decisions. For example, when designing a merchant ship, designers will have to consider reducing cost, increasing performance and minimizing motions. As a result, a trade–off has to be made, and compromise becomes an essential part of the multicriterial decision–making process.

Since complex technical systems deal with interacting disciplines and technologies, the decision makers dealing with design problems are involved in balancing the multiple, potentially conflicting attributes/objectives, transforming a large amount of customer–supplied guidelines into a solidly defined set of requirement definitions. Thus, one could state with confidence that modern design is a multicriterial decision–making (MCDM) process.

Typically, in order to solve an MCDM problem, some necessary factors need to be known beforehand: (i) the well defined, measurable criteria, (ii) the preference information on the criteria, (iii) the design alternatives, and (iv) a disciplined, repeatable, transparent decision–making method. The criteria can be thought of as the measures of performance of an alternative, such as speed and payload of a ship concept, and can be identified by analyzing the customer’s requirements. The criteria need to be well defined so that the customer’s requirements can be fully represented. The alternatives are the candidates among which the ‘best solution’ is selected. They may be concepts that already exist, or they may need to be generated in the design process. Since the criteria do not have the same priority to the customer, the preference information on the criteria should be defined. Relative weights, which are assigned beforehand or calculated, are a popular way to represent the preference information, even though there are other ways to represent the customer’s preference. A set of appropriate alternatives has a critical impact on the final solution, because the final solution is one of the elements of this set.

The decision–making method is usually a systematic process which employs some decision rules and algorithms to formulate the decision problem and provide guidance to the decision maker(s) to reach the final decisions. Different decision–making methods have their own advantages and disadvantages, and each is suitable for solving a particular type of decision problem, so the selection of an appropriate method should be carefully carried out before the decision–making process proceeds.

Multicriterial decision–making (MCDM) methods apply to problems where a decision maker is selecting or ranking a finite number of alternatives which are measured by often conflicting attributes. Multiple criteria pervade all that people do and include such public policy tasks as determining a country’s policy, developing a national energy plan, as well as planning national defense expenditures, in addition to such public/private company tasks as product development, pricing decisions, and research project selection. All have a common thread, i.e. multiple conflicting targets.

MCDM usually refers to the set of methods enabling a decision maker to make decisions in the presence of multiple, often conflicting, criteria. It is an excellent tool for multiattribute selection and multiobjective optimization of industrial products. MCDM as a discipline, and its application, have grown significantly with the development of computer science, as most of the methods are complex combinations of higher mathematics.

Design is a decision–making process, so that there is a close relation between design and decision making, requiring the choice of the strategy which best satisfies the decision maker’s goals. The latter are customer–supplied guidelines for the design team. The peculiarity of multicriterial decision making is to shift attention towards the definition and selection of the criteria (attributes, objectives, constraints), which can only be defined and described through the identification of needs and requirements (specifications) from the customer.

Even if designs may be managed by means of MCDM techniques in widely different ways, they share the following common characteristics:

• Problem statement. Problem formulation is based on identifying the true needs of the customer and formulating them in a set of targets (attributes, objectives) for the design solution. The problem statement has to express as specifically as possible what is intended to be accomplished to achieve the established goals. Design specifications are a major component of the problem statement. It is widely accepted that a good problem statement plays an important role in determining the success of the final solution.


• Resolution of conflict among multiple criteria. The problem definition yields a set of attributes/objectives (criteria) on which the design team should base its design decisions. Criteria play the essential role in the decision–making process, an alternative solution being deemed successful when the customer’s desired levels are met. Multiple criteria usually conflict with each other. MCDM allows managing these conflicts since it is a conflict–resolution approach.

• Normalization of attribute values. Each objective/attribute has a different unit of measurement. In a technical system selection case, fuel consumption is expressed in tons per mile, comfort is measured by specialized indexes in a non–numerical way, cost is indicated by monetary units, etc. Hence, a normalization of the criteria values may be essential to obtain comparable scales (a computational sketch of normalization and weighted scoring is given after this list).

• Selection/Optimization. Solutions to design problems are found either by selecting the best solution among a previously defined finite number of alternatives or by optimizing the ‘best possible solution’. At first, the MCDM selection process involves searching for an alternative that is the ‘best possible solution’ or the ‘preferred solution’ over all criteria. Then the ‘preferred solution’ can be improved by means of an MCDM optimization process.
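
As a minimal illustration of the last two characteristics, the following Python sketch normalizes heterogeneous attribute values onto a common [0, 1] scale and ranks a small set of alternatives by a simple weighted sum. All alternative names, figures and weights are hypothetical, and the weighted sum is only one of many possible aggregation rules, not a method prescribed by this text.

# Illustrative sketch: min–max normalization and weighted–sum ranking of a
# finite set of alternatives (all names and numbers are hypothetical).

alternatives = {
    "Ship A": {"cost": 62.0, "speed": 21.0, "comfort": 6.5},
    "Ship B": {"cost": 55.0, "speed": 19.0, "comfort": 7.5},
    "Ship C": {"cost": 70.0, "speed": 23.0, "comfort": 8.0},
}
sense = {"cost": "min", "speed": "max", "comfort": "max"}   # direction of preference
weights = {"cost": 0.5, "speed": 0.3, "comfort": 0.2}       # assumed relative weights

def normalize(column, direction):
    """Map the raw values of one attribute onto [0, 1], larger meaning better."""
    lo, hi = min(column.values()), max(column.values())
    span = (hi - lo) or 1.0
    if direction == "max":
        return {name: (value - lo) / span for name, value in column.items()}
    return {name: (hi - value) / span for name, value in column.items()}

# Normalize each attribute column, then aggregate the scores with the weights.
normalized = {c: normalize({a: alternatives[a][c] for a in alternatives}, sense[c])
              for c in weights}
scores = {a: sum(weights[c] * normalized[c][a] for c in weights) for a in alternatives}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")

In practice the weights would come from the preference elicitation discussed above, and a different aggregation rule (for instance an outranking method) may be more appropriate when the criteria are not commensurable.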

Studies dealing with the identification of decision alternatives focus on the question of how the ‘complete solution’ of a decision problem with multiple attributes/objectives can be described and characterized. This ‘complete solution’ consists of the set of functionally–efficient decision alternatives and/or the set of efficient vectors of objective values. For linear decision problems, like multiattribute decision making, efforts have been made to identify functionally–efficient facets of the set of alternatives by an assigned set of preference weights given to the attributes. Extensions are concerned with the question of to what extent the available computational techniques which have already been applied to linear problem formulations are useful and/or must be modified for the determination of the set of efficient points in nonlinear problems.

Apart from dealing with degeneracies of the set of alternatives, the decision–making process is concerned with the question of how unessential attributes or objective functions can be identified and eliminated ex–ante in order to simplify the decision problem.

Decision making about a problem may be partitioned by means of the following double dichotomy:

1. Is it a problem under certainty or uncertainty? If it is in the uncertainty category, then one has to assume that to each action there is a well–defined probability distribution over the possible resulting consequences.

2. Is it a single or multiple attribute problem? That is, can the outcome be adequately described in terms of a single descriptor or a single aggregate measure like cost, or is more than one attribute needed? In the former case, the decision can be made simply by determining the alternative with the best value of the single attribute or aggregate measure.

The most general case to consider is when a decision problem is both uncertain and multidimensional. It can be labelled as x̃, where the tilde represents uncertainty and the boldface x represents a vector in contrast to a scalar. One can distinguish among the cases exhibited in Figure 3.3.


When the problem is both certain and unidimensional, the analysis is clear, at least conceptually: the decision maker merely chooses the feasible alternative that maximizes the given single objective measure. Of course, in practice, if the alternatives are numerous and the constraints are given in the form of a set of mathematical relations, the decision maker might be hard pressed to find the optimum and may need to employ the entire range of mathematical programming techniques.

Figure 3.3. Double dichotomy of decision problems

There are many MCDM methods available in the literature. As each has its own characteristics, there are many ways to classify them. One classification is according to the type of data they use, which can be deterministic, stochastic, or fuzzy. Another way to classify MCDM methods is according to the number of decision makers involved in the decision–making process: there can be only one decision maker, or a group of decision makers.

Since its early development a few decades ago, multicriterial decision making has reached maturity, but not in all respects. Too large a part of research in this field still concentrates on algorithms rather than problems, even though more attention is being paid to the adaptation of tools to problems instead of the other way round. MCDM is coupled more and more with decision support tools, using results of research in human sciences and organization theory, as far as they are concerned with the study of decisions by either individuals or groups.

3.2.2 Individual Decision Making

To generate and select solutions for multicriterial decision problems involving only one decision maker, one frequently assumes some decision rule which serves as the decision maker’s guiding principle. One can distinguish between multiattribute decision problems and multiobjective programming problems. The former are concerned with the task of ordering a finite number of decision alternatives, each of which is explicitly described in terms of different attributes, which have to be taken into account simultaneously. The crux of the problem is in obtaining information on the decision maker’s preferences. This can be achieved in many different ways. The spectrum ranges from directly asking the decision maker for preference statements on the basis of strong orders over preference functions, to the attempt to decompose a cardinal utility function with respect to its arguments in order to be able to measure the effects of isolated changes of individual attributes. By contrast, multiobjective decision problems are usually characterized by the fact that several objective functions are to be optimized with respect to an infinite convex set (implicitly described by a set of constraints) of decision alternatives.


In a relatively large number of procedures, a linear or locally linear approximating utility function is assumed. An optimal solution is then detected gradually by asking the decision maker for certain values of the objectives, for weights given to the objectives or for marginal rates of substitution between pairs of objectives.

In recent years, a large part of research has been devoted to sensitivity analysis (robustness), that is, to ascertaining how sensitive a given problem solution is to unpredictable changes in some parameters. This question is important not only because of uncertainty with respect to the tools and their effectiveness, but also because of uncertainty about the ‘rightness’ of the statements on the decision makers’ preferences.

3.2.3 Group Decision Making

With the complexity of design problems increasing, decision making is almost an impossible task for the individual decision maker to manage. Group decision is usually understood as aggregating different individual preferences on a given set of alternatives into a single collective preference. It is assumed that the individuals participating in making a group decision face the problem and are all interested in finding a solution. A group decision situation involves multiple decision makers, each with different skills, experience and knowledge relating to different aspects (criteria) of the problem. In a correct method for synthesizing group decisions, the competence of the different decision makers in the different professional fields also has to be taken into account, as sketched below.
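
One minimal way to make the last point concrete, assuming purely for illustration that each decision maker supplies normalized scores for the alternatives and that his or her competence in the relevant field can be expressed as a weight, is to aggregate the individual evaluations into a competence–weighted group score. The names, scores and weights below are invented, and this simple rule is only one of many possible group aggregation schemes.

# Illustrative sketch: competence–weighted aggregation of individual evaluations
# into a single group score (all names, scores and weights are hypothetical).

individual_scores = {
    "hydrodynamicist":     {"Ship A": 0.8, "Ship B": 0.6, "Ship C": 0.7},
    "structural engineer": {"Ship A": 0.5, "Ship B": 0.9, "Ship C": 0.6},
    "economist":           {"Ship A": 0.7, "Ship B": 0.4, "Ship C": 0.9},
}
# Assumed competence weights of the decision makers (normalized to sum to 1).
competence = {"hydrodynamicist": 0.40, "structural engineer": 0.35, "economist": 0.25}

def group_score(alternative):
    """Weighted average of the individual scores given to one alternative."""
    return sum(competence[dm] * scores[alternative]
               for dm, scores in individual_scores.items())

alternatives = ["Ship A", "Ship B", "Ship C"]
for alternative in sorted(alternatives, key=group_score, reverse=True):
    print(f"{alternative}: {group_score(alternative):.3f}")

More sophisticated schemes replace the weighted average with an axiomatically founded group preference function or with an outranking procedure, as discussed below.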

Decision making in groups is sometimes examined separately as process and outcome. This has led to a series of different methodical approaches. One approach has tried to apply the concepts which have proven to be successful in dealing with multicriterial problems with one decision maker to problems involving a multiplicity of decision makers, using the same analytical tools. Problems on preference structures have been considered within the framework of multiattribute utility theory (MAUT). Among others, the following questions have been dealt with: which axioms allow the aggregation of the individual utility functions into a group preference function? how to solve the conflict between Pareto–optimal and fair utility distributions among the group members? what forms of group preference functions may be contemplated?

Another approach has chosen a completely different starting point. Partly based on game and bargaining theoretic approaches, the conditions are examined under which the former can be applied to multiobjective decision problems in groups. In comparison with the first methodical approach, the game and bargaining theoretic approaches generally possess a greater formal elegance, having their basis in utility theory as well as in other axiomatic viewpoints. Critics point out that the axiomatic foundation has a large influence on the determination of the optimal solution, which consequently entails a loss of the flexibility required for practical applications. However, this critique is counterbalanced by the presence of a great number of such approaches which are able to deal adequately with real decision behavior as observed in groups and organizations.


Individual and group decision making are interrelated and can be approached from the same methodological viewpoint.

3.2.4 Elements of MCDM

By MCDM one usually refers to a set of methods enabling a user to aggregate several evaluation criteria in order to select one or several ‘actions’ (projects, solutions, etc.). But these expressions also refer to the activity of supporting decisions for a well–defined decision maker (individual, group, company, ...).

Set of Methods

The main available methods in decision–making theory stem from very different horizons:

• Utility theory, born in the eighteenth century with the first works of Bernoulli, was concerned at first with modelling the preferences of an individual decision maker who must choose among alternatives with risky outcomes. Multiattribute utility theory (MAUT) is a development of utility theory.

• Theory of social welfare was also born in the eighteenth century with the works of the Marquis de Condorcet, who was interested in the problem of aggregating individual preferences into a unique collective ranking. Some methods issuing from this field of research use developments in linear programming; others are at the origin of important concepts in multicriterial decision making, such as the outranking relation.

• Multiattribute decision making approaches are more suitable for design problems, which often involve a conflict resolution process, focusing on problems where the number of criteria and alternatives is finite; the analytical and synthesis tools in concept design must allow for this.

• Operational research and mathematical programming involve the design of the ‘best alternative’ by considering the trade–offs for an infinite number of feasible alternatives within a set of interacting constraints. They have always had to handle the difficult question of choosing a particular objective function while leaving some aspects of the preference in the set of constraints. Many important concepts and methods have been developed in this field; among others, the goal programming approach, methods to find the set of efficient solutions, interactive methods to find a compromise solution, etc.

• Data analysis and multidimensional scaling have recently been concerned with the analysis of qualitative and often ordinal data. Regression methods such as response surface methodology have been proposed in order to estimate the parameters of a model (additive value function) consistent with some holistic ranking of alternatives.


Modelling Decision–Making Activities

Roy and Vincke (1980) define decision making as ”the activity of a person who relies on clearly explicit but more or less completely formalized models, in order to get answers to the questions posed in a decision–making process”. This definition refers to a very broad conception of decision making compared with classical operational research, whose aim is to find the optimal solution: it implies analytical approaches or mathematical models. From a practical viewpoint, decision making leads to modelling activities at three levels:

1. Nature of the decision and choice of a problem formulation

What are the alternatives or, more generally, the actions (alternatives are mutually exclusive, actions are not necessarily exclusive)? While identifying a set of alternatives, the decision maker has to choose a problem formulation which might be:

• choice of one and only one alternative;

• choice of all good alternatives;

• choice of some of the best alternatives.

2. Definition of a set of criteria

If the choice of a single criterion is too difficult or arbitrary to make, one has to use several and often conflicting criteria. The concept of a consistent family of criteria gives conditions to respect in the choice of a set of criteria.

3. Choice of an approach in aggregating the criteria

In order to aggregate the criteria, one can choose one among the following approaches:

• Aggregation of criteria into a single one called value/utility function

A utility function is the name often given to a multicriterial utility function; this model consists of aggregating the n criteria into a function U(g1, g2, . . . , gn), which represents an overall criterion. In utility theory, the distinction is made between a value function, when no risky outcomes are taken into account, and a utility function, which allows the comparison of risky outcomes through the computation of an expected utility.

• Aggregation models in an outranking relation

These models aggregate the criteria into a partial binary relation (outranking relation) which is ‘more complete’ than the dominance relation.

Dominance relation and efficient set. They are interesting concepts when, but only when, the problem formulation is to select one and only one alternative.

Concepts for building outranking relations are:

– concordance: it generalizes the concept of majority rule;
– nondiscordance: it is used to reject a situation in which a outranks b whenever there exists a criterion for which b is ‘much better’ than a;
– cardinal outranking relations: they use the concept of trade–off ratio.


• Interactive and local aggregation of criteria to find a compromise solution

Even though this method was first proposed in the context of multiobjective linear programming, using the notion of ideal point (Zeleny, 1982), it is better suited to MADM. Each coordinate of this point equals the maximum value which can be obtained on the corresponding attribute without considering the other criteria (a computational sketch of the ideal point and of a distance–based compromise is given after this list), i.e.

g∗ is such that g∗i = max_{a∈A} gi(a)   for all i

The interaction process can rely on the following phases (Roy, 1975):

– Search for candidate designs for a compromise solution: considering the information available on the preferences of the decision maker, the model searches for one or more alternatives which could appear as possible compromise solution(s);

– Communication to the decision maker: these solutions are shown to the decision maker, together with all the information which seems useful to him/her, such as the values of these solutions on the different criteria;

– Reaction of the decision maker: some solutions can be judged satisfactory and then the procedure stops; otherwise, information on the decision maker’s preferences is obtained; the type of information differs from one method to the other (holistic judgement, aspiration levels, new constraints which modify the ideal point, etc.).
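
The following sketch, with purely hypothetical data, shows how the ideal point can be computed from a finite set of alternatives and how the alternative closest to it can be proposed as a candidate compromise solution. The weighted Euclidean distance used here is an assumption made for illustration; other distance measures, and the iterative displacements of the ideal discussed in Section 3.5, are equally possible.

# Illustrative sketch: ideal point and a distance–based candidate compromise
# solution for a finite set of alternatives (all names and numbers are
# hypothetical); every criterion is expressed so that larger values are better.

alternatives = {
    "Ship A": {"speed": 0.9, "payload": 0.6, "economy": 0.5},
    "Ship B": {"speed": 0.7, "payload": 0.8, "economy": 0.7},
    "Ship C": {"speed": 0.5, "payload": 0.9, "economy": 0.9},
}
weights = {"speed": 0.4, "payload": 0.3, "economy": 0.3}   # assumed preferences

# Ideal point: the best attainable value of each criterion taken separately.
ideal = {c: max(values[c] for values in alternatives.values()) for c in weights}

def distance_to_ideal(values):
    """Weighted Euclidean distance between an alternative and the ideal point."""
    return sum(weights[c] * (ideal[c] - values[c]) ** 2 for c in weights) ** 0.5

compromise = min(alternatives, key=lambda a: distance_to_ideal(alternatives[a]))
print("Ideal point:", ideal)
print("Candidate compromise solution:", compromise)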

3.3 Multicriterial Decision–Making Theory

Almost every design problem in modern engineering design has multiple criteria which need to be satisfied. It is often the case that good values of some criteria inevitably go with poor values of others, so that the best design is always a compromise in some sense. In order to find the best compromise design solution, designers are required to take all the metrics of interest into account concurrently when making decisions. For example, when designing a large merchant ship, designers will have to consider reducing cost, increasing performance and minimizing risks. As a result, a trade–off will have to be made.

In a multicriterial decision–making process there is a decision maker, or a group of decision makers, who make the decisions, a set of attributes (objectives) that are to be pursued and a set of alternatives from which one is to be selected. In a decision situation the decision makers have to manage goals, criteria, objectives, attributes, constraints and targets, in addition to decision variables. Although goals, criteria, objectives, and targets have essentially similar dictionary meanings, it is useful to distinguish them in a decision–making context. For example, while criteria typically describe the standards of judgement or rules to evaluate feasibility, in MCDM they simply indicate attributes, objectives and constraints. These terms are individually defined in the Appendix.


In designing, multicriterial considerations arise as soon as both economic and technical factors are present in the design evaluation and selection. In the framework of prescriptive design models (that is, models directed toward helping the decision maker to make better decisions), a set of multicriterial decision–making methodologies, e.g. sequential linear programming, weighted criteria methods, goal programming, fuzzy outranking, etc., was developed. They allow a number of often conflicting criteria, both of technical and economic nature, to be handled simultaneously within the design process. MCDM techniques enable the design team either to generate and select the ‘best possible’ design, or to evaluate the merit index of alternative designs, or to optimize some features of a robust design.

The discipline of multicriterial decision making can be broadly grouped into two classes: multiattribute decision making (MADM) and multiobjective decision making (MODM):

• Multiattribute decision making includes methods that deal with selection of the ‘best possible design’ among a discrete set of alternatives, which are described in terms of prioritized attributes of those alternatives. MADM problems involve analysis of a finite and generally small set of discrete and predetermined alternatives. Assessment of alternatives and selection of the ‘best possible design’ is done via straightforward evaluation. The increased speed of computers provides the opportunity to model a complex design problem as a multiple evaluation process by intentionally creating a large number of design variants. Most of the techniques available for dealing with multiple attribute problems require information about (i) the decision maker’s preferences among values of a given attribute (intra–attribute preferences) and (ii) the decision maker’s preferences across attributes (inter–attribute preferences). The multiple attribute techniques either directly ask the decision maker for an assessment of the strengths of these preferences or they infer them from his/her past choices, while all attributes are evaluated simultaneously.

• Multiobjective decision making relates to techniques that synthesize a set of alternatives which optimize or ‘best satisfy’ the set of mathematically prescribed objectives (or goals) and constraint functions of the decision maker(s). MODM problems involve the design of the ‘best alternative’ by considering the trade–offs within a set of interacting design constraints. They assume continuous solution spaces, i.e. the number of alternatives is effectively infinite and the trade–offs among design objectives are typically described by continuous functions. Multicriterial optimization problems fall under the heading of MODM. That is, optimization will be performed to maximize or minimize the associated objectives, and the final selected solution is a design with the best values of the objectives. Each optimization problem can be divided into two parts: the set of functions to be optimized (minimized or maximized), i.e. the objectives, and the set of functions to be satisfied in terms of their predetermined values, i.e. the constraints. In general, the objectives are often conflicting, so the optimal solution is usually a compromise concept that can best satisfy the different objectives simultaneously. The inverse mapping implied in this design class is entangled with complex mathematical problems. These have led to different methods tailored to the characteristics of the objective and constraint functions of the problem at hand. Long experience with MODM has shown that during the design process a number of design alternatives should be investigated, each requiring execution of nonlinear programming modules with sophisticated convergence checks, linearization techniques, etc. A general multiobjective optimization problem is to find the vector of design variables x = (x1, x2, . . . , xn)T which minimizes/maximizes a vector of objective functions f(x) = (f1(x), f2(x), . . . , fk(x))T over the feasible design space X; this statement is written out explicitly below. ‘Multicriterial optimization problems’ (Stadler, 1988) fall under the heading of MCDM.
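
Written out in full, and under the common assumption (made here only for completeness) that the feasible design space X is described by inequality and equality constraint functions, the general MODM statement takes the form

\begin{aligned}
\min_{\mathbf{x}} \; & \mathbf{f}(\mathbf{x}) = \big( f_1(\mathbf{x}),\, f_2(\mathbf{x}),\, \dots,\, f_k(\mathbf{x}) \big)^{T} \\
\text{subject to} \quad & c_j(\mathbf{x}) \le 0 \,, \qquad j = 1, \dots, m \\
& h_l(\mathbf{x}) = 0 \,, \qquad l = 1, \dots, p \\
& \mathbf{x} = (x_1, x_2, \dots, x_n)^{T} \in X
\end{aligned}

where maximization of an objective is obtained by minimizing its negative; the constraint functions c_j and h_l and the counters m and p are generic symbols introduced here for illustration and do not refer to quantities defined elsewhere in this text.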

In actual practice this classification is well fitted to the two facets of design problem solving: MADM is for design selection of the best alternative among a finite number of solutions, whereas MODM is for design optimization of the best alternative.

In this respect, the MADM method is best conceived for the concept design phase, whose decisions (top–level specifications) become constraints for the subsequent design phases. On the contrary, the MODM method, mostly based on goal programming, is more oriented to support decision making in basic design, since it presupposes some development of drawings and details (Mistree et al., 1991; Sen, 1992; Ray and Sha, 1994; Lee and Kim, 1996). Although the goal programming technique provides a way of striving towards several objectives simultaneously, the sequential nature of its procedure implies that the various objectives have to be ranked in a strict hierarchy (Smith, 1992), thus losing the possibility of really considering all objectives simultaneously.

The main distinctions between MADM and MODM are enumerated in Table 3.1 according to Yoon and Hwang (1981).

Elements        MADM                MODM

Criteria        Attributes          Objectives/goals
Objectives      Implicit            Explicit
Attributes      Explicit            Implicit
Alternatives    Finite number       Infinite number
Application     Design selection    Design optimization

Table 3.1. MADM vs. MODM

Nevertheless, the MADM and MODM approaches should not be thought of as alternative methodologies to each other, but as complementary within a rational design strategy. When dealing with multiattribute and multiobjective decisions, a combination of methods is often more effective than a single technique.

3.3.1 Properties of Attributes/Objectives

Understanding attributes (objectives) is an important part of structuring the decision–making process. The importance of identifying fundamental attributes has to be stressed. Fundamental attributes are organized into a hierarchy in which the lower levels of the hierarchy explain what is meant by the higher levels.

To provide the means to measure accomplishment of the fundamental attributes, the concept of attribute scales is introduced. Some attribute scales are easily defined; others are more difficult. For example, there is no obvious way to measure risks related to aesthetic aspects.

To adequately represent the targets, it is important in any decision problem that the set of attributes have appropriate characteristics: to be complete, so that they cover all the important aspects of the problem; to be minimal, so that the number of attributes is kept as small as possible; to be decomposable, so that the evaluation process can be simplified by breaking it down into parts; to be workable, so that they can be meaningfully used in the analysis; and to be non–redundant, so that double counting of their impacts can be avoided. An encapsulation of the essential criteria follows:

Completeness. A set of attributes is complete if it includes all relevant aspects of a decision problem and is adequate in providing the decision maker with a clear picture about the degree to which the overall goal is met. This condition should be satisfied when the lowest–level objectives in a hierarchy include all areas of concern in the problem at hand and when the individual attributes associated with each of the lowest–level objectives in this hierarchy satisfy the comprehensiveness criterion. The fact that important attributes are missing can be indicated by reluctance of the decision maker to accept the results of an analysis or simply the feeling that something is missing. If the results ‘just don’t feel right’, the decision maker has to ask himself/herself what is wrong with the alternatives that the analysis suggests should be preferred. Careful thought should reveal the missing attributes.

Minimum Size. At the same time, the set of attributes (objectives) should be as small as possible. Too many attributes can be cumbersome and hard to grasp. Furthermore, each objective should differentiate the available alternatives. If all the alternatives are equivalent with regard to a particular attribute, then that attribute will not be of any help in the decision–making process. In some problems, it is possible to combine attributes and thus reduce the dimensionality. The decision makers often want to fulfill conflicting attributes/objectives and, since this is an ideal which cannot be achieved, they must engage in vexing trade–offs or make use of multicriterial methods.

Non–redundancy. The final set of fundamental attributes should not be redundant. That is, the same attributes should not be repeated in the hierarchy, and the attributes should not be closely related. One way in which redundancies enter a set of attributes is when some attributes require variables that are inputs to a system while others require variables that are outputs. One example of such a problem is the evaluation of space vehicles. An input might be ‘weight’ and an output might be the ‘thrust’ required to break out of the earth’s gravitational field. Weight may only be important because of its implications for thrust.

Decomposability. As far as possible, the set of attributes should be decomposable. A formal decision analysis requires the possibility of quantifying both the decision makers’ preferences and their judgments about uncertain events. For a problem with n attributes, this means assessing an n–attribute utility function as well as joint probability distributions for the relevant uncertainties. Because of the complexity involved, these tasks will be extremely difficult for decision problems in which the dimensionality n is even modestly high, unless the set of attributes is decomposable.


Workability. Attribute scales must be workable, that is, they should provide an easy way to measure the performance of the alternatives or the outcomes on the fundamental attributes. The attributes must be meaningful to the decision makers, so that they can understand the implications for the design alternatives. The decision makers must be aware of the many non–technical problems that may render a set of attributes non–workable.

3.3.2 Typology of MCDM Models

Quite naturally, different researchers have proposed different decision–making typologies, which reflect their own biases. So, any typology reflects an individual interpretation of the world of MCDM models. The main dimensions of a possible typology are:

• the nature of outcomes: deterministic versus stochastic.

• the nature of the alternative–generating mechanism, i.e. whether the constraints limiting the alternatives are explicit or implicit.

These dimensions are indicated in Table 3.2. The left–hand column includes the implicit constraint models. When the constraints are implicit, or explicit and non–mathematical, the alternatives must be explicit. One of a list of alternatives is then selected.

                 Implicit Constraints            Explicit Constraints
                 (Explicit Solutions)            (Implicit Solutions)

Deterministic    Choosing among deterministic    Deterministic mathematical
Outcomes         discrete alternatives           programming

Stochastic       Stochastic decision             Stochastic mathematical
Outcomes         analysis                        programming

Table 3.2. A multicriterial decision method typology

The decision analysis problem is included in the implicit constraint category. When the constraints are explicit and mathematical, the alternative solutions are implicit and may be infinite in number if the design space is continuous and consists of more than one solution. Problems in the explicit constraint category are generally regarded as mathematical programming problems involving multiple criteria.

More dimensions may be added to this typology. In addition to implicit constraints versus explicit constraints, and deterministic outcomes versus stochastic outcomes, other dimensions can be identified as well. The number of decision makers may be classified as a dimension: one decision maker versus two or more decision makers. One may also classify the number of objectives, the nature of the utility function considered, as well as the number of solutions found (one solution versus all nondominated solutions). Only two dimensions have been chosen here because they seem to be the most significant factors.


3.4 Nondominance and Pareto Optimality

Since good values of some criteria inevitably go with poor values of others, the goal of MCDM is to find the ‘best compromise’ solution which has the best overall performance in satisfying all the attributes. This ‘best compromise’ solution can be obtained from a set of design alternatives referred to as the efficiency frontier or Pareto–optimal set. All these solution sets consist of points having a simple and highly desirable property, i.e. nondominance.

A point within such a set is nondominated in that no other point is feasible at which the same or better performance could be achieved with respect to all criteria, with at least one being strictly better.

The nondominance solution concept, originating with Pareto (1906), has been one of the cornerstones of traditional economic theory. It is usually stated as the Pareto principle or Pareto optimality principle: a solution B is dominated by a solution A if, by moving from B to A, at least one attribute (objective function) is improved while no other is worsened. A design solution is nondominated if there is no other feasible solution which would improve at least one attribute (objective function) and not worsen any other.

The definition of Pareto optimality indicates that there is no other feasible solution in the design space which has the same or better performance than the Pareto–optimal solution considering all criteria, and that the Pareto–optimal solution does not necessarily have the best performance in all criteria (Zeleny, 1982). It is clear that a Pareto–optimal solution is a nondominated solution, which is achieved when no criterion can be improved without simultaneous detriment to at least one other criterion. The locus of the Pareto–optimal solutions is known as the Pareto frontier. A two–dimensional Pareto frontier is illustrated in Figure 3.4 for ‘smaller is better’ criteria.

Figure 3.4. Two–dimensional Pareto frontier

It is useful to express nondominance in terms of a simple vector comparison. Let x and y be two vectors of n components, x1, . . . , xn and y1, . . . , yn, respectively. Thus

x = {x1, . . . , xn} and y = {y1, . . . , yn}


One can say that x dominates y if xi ≥ yi (i = 1, . . . , n) and xi > yi for at least one i, and one may compare x and y directly and say that x dominates y if x ≥ y and x ≠ y.

Assume that x belongs to a set of feasible solutions or feasible design alternatives, designated X. Then x is nondominated in X if there exists no other y in X such that y ≥ x and y ≠ x.

The set of all nondominated solutions in X is designated N. The main property of N is that for every dominated solution (i.e., a feasible solution not in N) the decision maker can find a solution in N at which no vector components are smaller and at least one is larger. Figure 3.5 provides some graphic explanation of the above concepts. The feasible set X, the shaded area in the two–dimensional space of points x = {x1, x2}, consists of feasible combinations of x1 and x2.

Figure 3.5. Set of nondominated solutions

Observe that the point x in X is dominated by all points in the shaded subregion of X, indicating that the levels of both components can be increased simultaneously. Only for points in N does this subregion of improvement extend beyond the boundaries of X into the infeasible region. Thus the points in N are the only points satisfying the given definitions, and they make up the heavy boundary of X. All other points of X are dominated.

The set of nondominated solutions is often referred to in the literature as the ‘efficient set’, the ‘admissible set’, the ‘noninferior set’, the ‘Pareto–optimal set’, etc. The term ‘nondominated’ should be preferred because of its clear, unambiguous meaning and because it best describes what such points really are: not dominated by other points.
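
For a finite set of alternatives, the set N can be obtained by a direct pairwise comparison of the points. The following sketch filters the nondominated points out of a small hypothetical discrete set, taking every component as ‘larger is better’; the coordinates are invented for illustration only.

# Illustrative sketch: extracting the nondominated set N from a finite set X of
# alternatives; every vector component is taken as 'larger is better'.

X = {
    "alt 1": (0.2, 0.9),
    "alt 2": (0.5, 0.8),
    "alt 3": (0.4, 0.4),   # dominated, e.g. by alt 2
    "alt 4": (0.7, 0.6),
    "alt 5": (0.9, 0.3),
}

def dominates(x, y):
    """True if x is at least as good as y everywhere and strictly better somewhere."""
    return all(xi >= yi for xi, yi in zip(x, y)) and any(xi > yi for xi, yi in zip(x, y))

# A point belongs to N if no other point of X dominates it.
N = {name for name, point in X.items()
     if not any(dominates(other, point) for other in X.values())}

print("Nondominated set N:", sorted(N))

The same pairwise test underlies the discrete example of Figure 3.7a discussed below; for continuous feasible sets, N is instead characterized analytically or approximated by multiobjective programming techniques.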

Finding N on X is one of the major tasks of multiattribute methods and multiobjective programming. At this point some comments should be made about the usefulness of nondominated solutions. Among the advantages of the dominance concept the following are relevant:

1. Multiple attributes/objectives are often incommensurate, both quantitative and qualitative, and carry different weights of importance. This leads to a complex problem of trade–off evaluation using the decision maker’s utility or preference function. Reliable construction of a utility function may, however, be too complex, unrealistic, or impractical. The set of nondominated solutions then provides a meaningful step forward under such conditions of relative ignorance.


2. If more is always preferable to less, then any solution which maximizes the utility function of a rational decision maker must be nondominated: if more is preferred to less then only higher or equal utility may be derived from the increased levels of corresponding attributes or criteria of choice. Such a utility function is said to be nondecreasing in its arguments; that is

U (x1 + ∆1, x2 + ∆2) ≥ U (x1, x2) for ∆1,∆2 ≥ 0

Thus, regardless of the specific mathematical form of U, one knows that its maximum will be reached at a nondominated point.

3. If N consists of only a relatively small number of solutions or alternatives of choice, thereis no need to search for the decision maker’s utility function. Consequently it makes senseto explore X and characterize its N before engaging in the assessment of U . It is not wiseto gather and process all the information needed for utility assessment without finding theapproximate size of N first. It is even possible that an alternative will emerge, such asshown in Figure 3.6.

Figure 3.6. Conflict–free solution

Observe that N consists of a single point only, that such a point will always be the choice under the assumption of nondecreasing utility functions, and that an assessment of U for this particular X would be largely redundant.

4. The set of nondominated alternatives can be useful in dealing with more complicated types of X; for example, discrete point sets or nonconvex sets of feasible alternatives. Figure 3.7a shows seven distinct alternatives. The nondominated ones are indicated by heavy dots. Observe that only points 3 and 6 are dominated by some other available points, while the nondominated set comprises points 1, 2, 4, 5, and 7. In Figure 3.7b observe that the nondominated boundary is not necessarily continuous, especially in the presence of gaps in X (nonconvex cases of X). Both these cases are more difficult to handle analytically.

Figure 3.7. Nondominance on a discrete point set (a) and on a nonconvex set (b)

A careful review of the figures displaying two–dimensional nondominated sets would reveal that a nondominated solution is a feasible solution for which an increase in value of any one criterion can be achieved only at the expense of a decrease in value of at least one other criterion. This definition leads naturally to the concept of value trade–offs: how much achievement with respect to criterion 1 is the decision maker willing to sacrifice in order to gain a particular achievement with respect to criterion 2? The nondominated boundary is sometimes characterized as the 'trade–off curve'.

3.5 Theory of the Displaced Ideal

The importance of creating new design alternatives, and the major properties they should have, must always be emphasized. All design alternatives should be technologically feasible and as close as possible to the ideal alternative (point x∗).

The theory of the displaced ideal has evolved from ideas that were floating around MCDM circles for some years. Its main concept, the ideal solution, has been disguised under many different labels, and the exposition of this concept has often been indirect, through a large variety of working papers, theses, and articles. The idea seems to possess the exciting and elegant quality of a paradigm. It seems that the appearance of the concept of the ideal solution is due to parallel searches in the early sixties for an approach to multiobjective conflict resolution. The idea was temporarily abandoned in favor of the nondominated solutions concept.

The concept of the displaced ideal was briefly introduced by Geoffrion (1967) as the 'perfect solution'. It was originally conceived as a technical artifact, a fixed point of reference, facilitating the choice of a compromise solution. The first fully operational use of the concept occurs in the linear multiprogramming methodology of Saska (1968). The ideal solution soon became known under the term 'movable target'. Zeleny (1974) introduced the concept of the compromise set and developed the method of the displaced ideal. Sequential displacements of the ideal solution also form the basis for the evolutive target procedure introduced by Roy (1977).

The concept appears to be general enough to encompass problems involving multiple decision makers as well. Some initial thoughts on this possibility are advanced by Yu (1973), who uses the term 'utopia point'. It is a widespread opinion that the concept of the ideal solution and its displacement represents more than a convenient technical tool. It is a hypothesis about the rationale underlying human decision–making processes. As such it deserves a full axiomatic development, empirical testing, and interdisciplinary cross validation.

3.5.1 Measurement of Preferences

The first important prerequisite is to acquire some understanding of the scales of measurement of human preferences, utilities, and subjective probabilities. All three notions are crucial for the so–called von Neumann–Morgenstern utility theory and its major normative dictum: maximize expected utility. The above 'golden rule' stands rather firmly at the core of modern decision analysis and its most lively derivative, multiattribute utility theory (MAUT), as it is discussed in Section 3.8.

Two basic ways of measuring preferences can be expressed by the notions of ordinal and cardinal scales. Ordinal scales are purely relational; designs are rank–ordered, and no other meaningful numerical properties can be assigned to them. One can say only that design A is preferred to B, that A is equal to B, or that B is preferred to A, but one cannot say by how much; the intensity of preference is not apparent from ordinal scales.

Ordinal scales can be expressed through numerical or verbal rankings, i.e., 1, 2, 3, 4, etc., or 'bad', 'average', 'good', 'excellent', etc. A special case of an ordinal scale would be a boolean variable, i.e., assigning 1 to preference and 0 otherwise. Ordinal numbers are those for which the intervals (differences) between them are meaningless. That is, if the difference 7−5 cannot be meaningfully compared with 4−2, then all algebraic manipulations of such numbers are meaningless as well, and the numbers can be replaced by an ordinal ranking.

Cardinal scales do assign meaningful numerical values (numbers, intervals, ratios, etc.) to the designs in question. Intervals or differences between cardinal numbers are meaningful; for example, 7−5 = 4−2, and addition, subtraction and multiplication by a constant are allowable operations.

Cardinal scales can be further divided into interval and ratio scales. Interval scales are characterized by the allowance of an arbitrary zero point, so that only addition, subtraction and multiplication by a constant are well defined. The Fahrenheit and Celsius scales of measuring temperature are typical examples. Ratio scales are characterized by a nonarbitrary zero point, as for example in the Kelvin temperature scale. Here ratios between scale values are also meaningful, i.e., the ratios of individual scale values have meaning.

Observe that cardinal ordering would be meaningless unless the interval 0 to 1 were specified. Without such reference points, or anchor points, the decision maker could not make any sense out of the intensities of preference. Although anchor points can be chosen arbitrarily, there are some choices that are better than others. And often an anchor point is implied uniquely and unequivocally by a given physical situation. It will be argued later that, especially in decision making
in designing and related assessment of preference, reference designs are not selected arbitrarily but are characterized by distinct desirable properties. Even ordinal scales can be anchored, i.e., furnished with convenient reference points.

Anchored scales are important because decision makers usually express their preferences only with respect to a given reference point (or points). The choice of appropriate anchors will influence the intensity or even the rank order of preferences. It is not sufficient to ask, do you prefer A to B? One must know with respect to what. What is the point of reference, the framework of inquiry? Is it point C? Or is it D? That makes a difference for the choice of A versus B.

3.5.2 Traditional Utility Approach

Some relevant utility theory concepts are briefly anticipated to understand better the advantages of the ideal point concept. Consider the preference space in Figure 3.8. Both axes x and y may represent a number of things: attribute scores, criteria levels, preferences of two different individuals, and so on. Maximum utility is achieved at M, the 'point of bliss'. Obviously, M is preferred to all points on lower indifference curves, that is, M > In > . . . > I2 > I1; toward points on the same curve, like A and B, the decision maker is assumed to be indifferent, that is, A ≈ B, where the symbol ≈ indicates indifference. In the absence of any availability constraints (or production–possibility boundary), point M would always be the choice; no conflict is present and no decision making is needed.

Morgenstern (1972) criticizes the indifference–curve analysis as introduced above. If x and y denote respective amounts of goods in one's possession, then one could move from B to M directly by disposing of (freely or at cost) excess amounts of x and y. One cannot similarly go from A to M. It is thus difficult to maintain indifference between A and B; actually B > A. Similarly, F > A, A > E, etc. It turns out that indifference curves seem to be valid only in the shaded subregion of Figure 3.8.

Figure 3.8. Unconstrained utility space

Most utility theory assumes that all alternatives are comparable, in the sense that given any two alternatives, either one is strictly preferred to the other or the two are seen as being preferentially equivalent. If the decision maker is presumed not to be able to express the intensity of his/her preference, as is assumed in the ordinal utility model, then the notion of indifference, which is the extreme and most precise expression of preference intensity (i.e., the one of intensity zero), also becomes difficult or impossible to assess explicitly.

If the decision maker does not strictly prefer one alternative to another, the absence of strict preference should not imply indifference. As Roy (1977) emphasizes, certain pairs of alternatives are noncomparable because the decision maker (i) does not know how to, (ii) does not want to, or (iii) is not able to compare them. To confound such noncomparability with indifference represents a considerable simplification of the decision–making process. In Figure 3.8 the decision maker is not expected to be able to state how much G is preferred to C (only that G > C), and yet he/she is assumed to be quite capable of stating how much C is preferred to E. The decision maker is assumed to be able to determine the indifference between C and E with absolute precision.

3.5.3 Ideal Point

Coombs (1958) assumes that there is an ideal level of attributes for candidate designs and that the decision maker's utility decreases monotonically on both sides of this ideal point. He shows that probabilities of choice depend on whether compared alternatives lie on the same side of the ideal point, or whether some lie on one side of the ideal and some on the other.

Figure 3.9. Constrained utility space

In technologically constrained situations, as in Figure 3.9, attainment of M becomes an unrealistic goal over a given planning time horizon. The set of available alternatives is much too limited by the production–possibility boundary P. Conflict between what is preferable (the ends) and what is possible (the means) is thus established, and a decision–making process may take place.

Because M is not a clearly defined point or a crisply delineated region but rather a fuzzy cloud of preferred levels of attributes, the conflict is perceived by decision makers only as a fuzzy sense of conflict.

As the decision maker attempts to grasp the extent of the emerging conflict between means and ends, he/she explores the limits attainable with each important attribute. The highest achievable scores with all currently considered attributes form a composite, an ideal alternative x∗. Figure 3.9 shows both M and x∗. Whereas M is almost always too difficult to identify, x∗ is easier to conceptualize because all its characteristics can be directly experienced and derived from the existing alternative choices. These individual attribute maxima can be found, quantified, and made fully operational. Point x∗ serves as a good temporary approximation of M in decision making.

The general infeasibility or nonavailability of x∗ creates a predecision conflict and thus generates the impulse to move as closely as possible toward it. Because of the conflict experienced, the decision maker starts searching for new alternatives, preferably those which are the closest to the ideal one.

It should be noted that if such an ideal alternative is created, that is, if point x∗ becomes feasible, then there is no need for further continuation of the decision process. Conflict will have been dissolved, and x∗ will automatically be selected since it is unquestionably the best of the currently available choices, provided that the set of alternatives is technologically closed.

Note that in contrast to the relative stability of M, the ideal point x∗ can be and is frequently displaced. It is responsive to changes in the available set of alternatives, objectives, evaluations, measurements, and even errors. It responds to new technological advances, inventions, and discoveries of oversights. It becomes a moving target, a point of reference which provides an anchor for human adaptivity, intransitivity, and dynamic adjustment of preferences.

3.5.4 Key Concepts and Notation

Recall that X = {x^1, x^2, . . . , x^m} denotes the set of feasible alternatives and that each alternative is characterized by n attributes. For example, the kth design alternative can be written as

x^k = (x^k_1, x^k_2, . . . , x^k_n)     k = 1, 2, . . . , m

The individual x^k_i designate the level of attribute i attained by alternative k, where i = 1, . . . , n and k = 1, . . . , m.

Thus, x^k is simply a vector of n numbers, assigned to each alternative x^k and summarizing the available information about x^k in terms of incommensurable, quantitative and qualitative, objective and subjective, attributes and criteria. It has thus been established what is often called a 'multiattribute alternative' in decision theory.

Look now at the ith attribute in isolation. The set X generates m numbers, a vector

x_i = (x^1_i, x^2_i, . . . , x^m_i)

representing the currently achievable scores or levels of the ith attribute. Their simplest interpretation occurs when it is assumed that more is always preferred to less (or vice versa). Because

min_k x^k_i = − max_k (−x^k_i)     k = 1, 2, . . . , m

i.e., finding the minimum of the m numbers is identical to finding the maximum of these numbers taken with negative signs, one shall agree to treat both cases as maximization.

There are, of course, situations when the extremal achievable scores of an attribute are not desirable. That is, there is an ideal value, and desirability decreases monotonically on both sides of this ideal point. Note that if this ideal point happens to lie outside the feasible set (i.e., it is not currently achievable), then the assumption of straightforward maximization again applies.

Among all achievable scores for any ith attribute, see vector x_i, there is at least one extreme or 'ideal' value that is preferred to all others. It can be called an 'anchor value', denoted x^*_i, and written as

x^*_i = max_k x^k_i     i = 1, 2, . . . , n

with the understanding that the above 'max' operation is only a simplification, since both maximum and ideal values are included in the concept of an anchor value.

The set of all such 'anchor values' is called the 'ideal alternative' or the 'ideal', denoted as

x^* = (x^*_1, . . . , x^*_n)

The 'ideal' plays a prominent role in decision making. Suppose, for example, that there exists x^k in X such that x^k ≡ x^*; the ideal is then attainable by the choice of x^k. There is no decision to be made. Any conceivable (but rational) utility function defined over an n–tuple of numbers (x_1, . . . , x_n) would attain its maximum value at x^* and consequently at x^k. The ideal is, however, not feasible in general, or, if feasible, it soon becomes infeasible as soon as the decision maker raises the aspiration level for just one x_i.

At this point the axiom of choice can be stated: alternatives that are closer to the ideal are preferred to those that are farther away. To be as close as possible to the perceived ideal is the rationale of human choice.

The fuzzy language employed in the axiom of choice ('as close as possible', 'closer', 'farther', etc.) reflects the reality of the fuzziness of human thought, perception, and preferences. It is actually more precise than the artificial precision and rigor of mathematical formalism. Before engaging in further elaboration of the axiom, it is proper to clarify a few minor points.

It is quite obvious that 'preference' can be expressed as an 'as far as possible' concept as well, employing an anti–ideal as the point of reference. It can be shown that the two concepts are closely interrelated and complementary.

3.5.5 Fuzziness and Precision

It is straightforward to explore the case of a single attribute first, mainly to emphasize its inclusion as a special case of the displaced ideal theory. Given that the anchor value of a single attribute has been successfully located, the decision problem is trivial: choose the anchor value. Construction of a utility function seems superfluous. Neither the choice nor the ordinal order would be affected.

In order to express the intensities of preference for all alternatives (especially if a selection of multiple alternatives is intended) and to demonstrate the use of the axiom of choice in this special case, a cardinal analysis is essential.

Since the ideal point and the anchor value are now identical, the alternatives close to x^*_i are preferred to those farther away. Consider the following: three different alternatives are to be evaluated with respect to a single, simple attribute, say 'euro return'. For example, a three–dimensional vector of returns might describe the alternatives (5, 10, 100). Obviously the first two values are quite far from 100, with 10 being a little closer than 5. Observe that 100 is the anchor value and, in this case, the ideal. Assume that the lucrative third alternative has turned out to be infeasible and was replaced by a new alternative, thus generating a modified vector (5, 10, 11). This change in the anchor value has also caused 10 to be much closer to the ideal than 5. The difference between 5 and 10 has changed from negligible to substantial.

There are two important points made by this example: the levels of preference change with the situation, and they are expressed in fuzzy terms.

It is suitable, therefore, to employ the linguistic approach developed by Zadeh (1973). The essence of the linguistic approach is best captured in Zadeh's principle of incompatibility. He states that ... as the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes until a threshold is reached beyond which precision and significance can no longer coexist (Zadeh, 1973).

The complexity of human preferences is unquestionable, and it is amplified further by the dominant role of judgment, perception, and emotions. In contrast, creating units of measurement for preferences may allow for precise mathematical treatment but diminishes understanding of human preferences. The key elements in human thinking are not numbers but labels of fuzzy sets, i.e., classes of objects in which the transition from membership to nonmembership is gradual rather than abrupt.

For example, to designate a color by a natural linguistic label, such as 'red', is often much less precise than to apply the numerical value of the appropriate wavelength. Yet it is far more significant and useful in human affairs. Similarly, people tend to assign a linguistic rather than a numerical value to the intensity of their choice preferences. In order to amplify the relationship between fuzziness and precision in human deliberations, the example of labelling colors is elaborated in the following short digression.

Fuzzy linguistic labels, rather than 'precise' numerical measurements, are often successfully used by large numbers of people with no apparent handicap or uncertainty. A typical example concerns our way of defining colors.

It would be most precise, a scientist might argue, to associate each color with its particular wavelength, as registered by the human retina, and measure it in angstroms to as many decimal places as desired. Yet nobody has ever suggested such a thing, even though the wavelengths of colors are in principle measurable.

The reason for using linguistic labels for designating color is the need for a system which is acceptable and usable in science, sufficiently broad for art and industry, and sufficiently familiar to be understood by the public. Such diverse needs are well met by the Munsell color system. According to this system, each color can be described in terms of three basic attributes: hue, lightness, and saturation. Hue names can be used as both nouns and adjectives: 'red', 'reddish orange', 'orange', 'orange yellow', 'yellow', etc. The hues include black, gray, and white. The terms 'light', 'medium', and 'dark' designate decreasing degrees of lightness. The adverb 'very' then extends the lightness scale from 'very light' to 'very dark'. Finally, the increasing degrees of color saturation are labelled with the adjectives 'grayish', 'moderate', 'strong', and 'vivid'. Additional adjectives cover combinations of lightness and saturation: 'brilliant' for light and strong, 'pale' for light and grayish, and 'deep' for dark and strong.

Combining the agreed upon linguistic labels, one can specify about 267 visually distinguishable colors, for example, vivid purple, brilliant purple, very light purple, very pale purple, very deep purple, very dark purple, but also dark grayish purple, very light purplish gray, and strong purplish pink. Most of these colors can be recognized and their differences remembered by scientists, artists, professionals, and the public alike.

The Munsell system also fixes the boundaries of each color name. These boundaries are then translated into numerical scales of hue, lightness, and saturation, and each color can thus be expressed as accurately as desired.

Definition. A fuzzy subset A of a set of objects U is characterized by a membership function f_A which associates with each element x of U a number f_A(x) in the interval [0, 1], representing the grade of membership of x in A.

This definition will be used to exemplify the meaning of 'as close as possible' in the axiom of choice. Consider the vector x_i of available scores of the ith attribute over m alternatives. The degree of closeness of x^k_i to x^*_i is defined as

d (x^k_i, x^*_i) = U^k_i

where U^k_i = 1 if x^k_i = x^*_i and otherwise 0 ≤ U^k_i ≤ 1.

3.5.6 Membership Functions

Essentially the ith attribute's scores are now viewed as a fuzzy set, defined as the following set of pairs

{x^k_i, U^k_i}     i = 1, . . . , n ;  k = 1, . . . , m

where U^k_i is a membership function mapping the scores of the ith attribute into the interval [0, 1].

For example, the scores generated by available alternatives might be labelled with respect to the ideal as 'close', 'not close', 'very close', 'not very close', 'distant', 'not distant', 'not very distant', 'not close and not distant', etc.

The membership function of a fuzzy set can be defined by a fuzzy recognition algorithm, a procedure suggested by Zadeh (1973). At this stage it is enough to simply introduce a few plausible functions yielding the degree of closeness to x^*_i for individual alternatives:

1. If x^*_i is a maximum, then

U^k_i = x^k_i / x^*_i

2. If x^*_i is a minimum, then

U^k_i = x^*_i / x^k_i

3. If x^*_i is a feasible attribute value such that x^*_i is preferred to all x^k_i smaller and larger than x^*_i, then

U^k_i = [ (1/2) (x^k_i / x^*_i + x^*_i / x^k_i) ]^(-1)

4. If the most distant feasible score is labelled by zero regardless of its actual closeness to x^*_i, one can define

x_{i*} = min_k x^k_i

and write U^k_i as

U^k_i = (x^k_i − x_{i*}) / (x^*_i − x_{i*})

The above four functions U^k_i indicate that x^j is preferred to x^k when U^k_i < U^j_i.
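
The four fuzzy recognition algorithms above are easy to evaluate numerically. The short Python sketch below is added only for illustration; the function names and the score vector are hypothetical.

def closeness_max(x, x_star):
    # type 1: the anchor value x* is a maximum
    return x / x_star

def closeness_min(x, x_star):
    # type 2: the anchor value x* is a minimum
    return x_star / x

def closeness_interior(x, x_star):
    # type 3: a feasible ideal x* preferred to all smaller and larger scores
    return 1.0 / (0.5 * (x / x_star + x_star / x))

def closeness_range(x, x_star, x_anti):
    # type 4: the most distant feasible score x_anti is labelled by zero
    return (x - x_anti) / (x_star - x_anti)

scores = [5.0, 12.0, 20.0, 36.0, 44.0, 67.0, 80.0, 91.0, 99.0]   # hypothetical attribute scores
x_star, x_anti = max(scores), min(scores)
ideal_mid = 44.0                                                 # hypothetical interior ideal for type 3
print([round(closeness_max(x, x_star), 3) for x in scores])
print([round(closeness_interior(x, ideal_mid), 3) for x in scores])
print([round(closeness_range(x, x_star, x_anti), 3) for x in scores])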

To gain a proper numerical grasp of the functions U^k_i introduced so far, evaluate a simple vector of ten numbers with respect to their distances from different anchor values x^*_i (and x_{i*}), as shown in Figure 3.10.

Figure 3.10. A simple fuzzy recognition algorithm

That implies that preference ordering among available alternatives is transitive with respect to a single attribute.

3.5.7 Multiattribute Dependency

Degrees of closeness U^k_i are not of great value in the case of a single attribute. The transitivity of preferences is preserved along a single dimension, and the ordinal ranking of alternatives is not influenced by changes and adjustments in degrees of closeness.

Alternatives are usually characterized by multiple attributes, i.e., by vectors x^k = (x^k_1, x^k_2, . . . , x^k_n), k = 1, . . . , m. Independent attributes can be represented as in the table reported in Figure 3.11.

Figure 3.11. Matrix of alternatives and attributes

In each column the decision maker locates an anchor and then transforms the scores into the corresponding degrees of closeness, i.e., all x^k_i's would be changed into U^k_i's according to a particular membership function, for example one of the four function types (fuzzy recognition algorithms) indicating the closeness of an attribute score to the ideal attribute value. A question now arises: how close is the kth alternative to the 'anchor value' along the ith attribute? There are n such questions for each alternative. If the decision maker were to assume independence among the individual columns of the given table, this approach would be quite straightforward. There is, however, usually some interdependence among the attributes, in the sense that the value of, say, U^k_1 restricts or even determines the possible values of U^k_2, U^k_3, etc.

Assume that attributes are generally dependent on each other in a complex, dynamic, and highly subjective way. This subjective nature of attribute dependency makes an interaction between decision maker and model almost mandatory. To this end, some traditional notions of attribute dependency are now reviewed briefly, as they can be derived from the multiattribute utility literature.

Most theories of multiattribute utility first define strict independence conditions for a decision maker's preferences for different levels of a given set of attributes while the levels of the remaining attributes are held fixed. It is often assumed that when the levels of the other attributes shift, the initially derived preferences stay unaffected. The two basic types of attribute dependency are value dependency and preferential dependency:

• Value dependency. A set of attributes is value–dependent if the measurement of numerical scores (either objective or subjective) with respect to one attribute implies or restricts a particular attainment of scores by all other attributes of the set. Typical examples are water temperature and water density, cost and price, and size and weight.

• Preferential dependency. A set of attributes is preferentially–dependent on other attributes if preferences within the set depend on the levels at which the scores of the other attributes are fixed. For example, the preference for speed in a car depends on safety; the preference for size in a ship depends on harbors' capability; etc.

These two essential types of attribute dependency form a base for an array of more specific technical derivatives of dependency conditions. Note that value dependency and preferential dependency are themselves interdependent. That is, the scores of the attributes cannot be fixed at any particular level without simultaneously fixing all value–dependent attributes as well.

Preferential changes are thus induced in response to different subsets of the value–dependent set, and consequently they are extremely difficult to trace. Similar interdependence exists across the alternatives. The problem lies in the proper specification of attributes and in the increase of their number; such composite attributes are often difficult to quantify and even to conceptualize.

Traditionally, dependency has been treated as separable from a particular set of feasible alternatives. Thus, if the intensity of preference for a given level of one attribute systematically changes with respect to all achievable levels along the second attribute, then all the conditional or parametric preferential functions must be assessed a priori.

Focus on X, the set of all initially feasible alternatives. Each alternative k induces a particular vector x^k consisting of the scores attained with respect to all salient attributes. In this sense one can say that all attribute scores are fixed for a given alternative. That is, x^k_1 comes only with x^k_2 and not with any other value. The two scores x^k_1 and x^k_2 are not separable, and they both characterize a particular alternative in a vector sense. Consequently, the value dependency, as defined earlier, does not require any special attention.

Instead of making an a priori assessment of attribute dependency, its impact is implicitly incorporated into the dynamic process of partial decision making. As an alternative, say the kth, is removed from further consideration, the set of n attribute scores (x^k_1, . . . , x^k_n) is removed as well. The initial evaluation is performed on a more or less complete set X, and the attribute interaction demonstrates itself only as the alternatives (and the appropriate attribute scores) are being progressively removed. The impact of removing an alternative k is essentially twofold:

• the variety and contrast of the currently achievable attribute scores are reduced;

• the ideal alternative can be displaced if the removed alternative contained at least one attribute anchor value.

Consequently, the removal of any alternative affects the ranking of the remaining alternatives in terms of their closeness to the ideal. It also affects the discriminatory power of the attributes and thus their relative importance as well. Finally, if the ideal is displaced, the actual distances of the remaining alternatives must also be recomputed. As some attribute scores become unavailable, the preferences for the remaining levels have to be interactively reassessed.

Attribute levels do not increase or decrease by themselves, by decree or by an analyst's fixations. There is always an underlying alternative or set of alternatives being made available or unavailable. No significant understanding of preferences, their intransitivities and reversals, can be achieved without analyzing the dynamics of the set of feasible alternatives.

The simple notion of anchor dependency is introduced to reflect the conditions of choice discussed above: a set of attributes is anchor–dependent if the degrees of closeness computed within the set depend on the corresponding anchor values as well as on the degrees of closeness associated with other attributes in the set.

Then, all degrees of closeness shall be interactively adjusted each time an ideal value of an attribute is displaced. The question, how close is alternative k to the ideal?, can be viewed as a composite question, a collection of constituent questions: how close is alternative k to the ith attribute anchor value? The answer to the composite question can be derived from the answers to its constituent questions. The multiattribute nature of this dependency, i.e., the manner in which the constituent questions are combined to form a composite question, is explored next.

3.5.8 Composite Membership Functions

Answers to both the constituent and the composite questions represent the grade of membership of the alternative k in the fuzzy set 'as close as possible', expressed either numerically or linguistically. This answering thus corresponds to assigning a value to the membership function. The answer set may be the unit interval [0, 1] or a countable set of linguistic labels defined over [0, 1].

Assume that the set of feasible alternatives X has been mapped through the U^k_i's into a 'distance space', where U^k_i represents the degree of closeness of x^k_i to x^*_i. Denote the space of all the U^k_i's generated by X as D.

Note also that the ideal alternative is now translated into a unitary vector U^* = (U^*_1, . . . , U^*_n), because if

x^k_i = x^*_i     then     U^k_i = U^*_i = 1

To determine the degree of closeness of any x^k to x^* in terms of U^k and U^*, an appropriate family of distance membership functions is defined as follows

L_p (w, k) = [ Σ_{i=1}^{n} w_i^p (1 − U^k_i)^p ]^{1/p}

where w = (w_1, . . . , w_n) is a vector of attribute preference levels w_i, and the power p represents the distance parameter, 1 ≤ p ≤ ∞. Thus L_p (w, k) evaluates the distance between the ideal alternative with membership grade U^* and the actual vector of degrees of closeness induced by an alternative with membership grade U^k.

Observe that for p = 1, and assuming Σ_i w_i = 1, one can write L_p (w, k) as

L_1 (w, k) = 1 − Σ_{i=1}^{n} w_i U^k_i

Similarly, for p = 2, one obtains

L_2 (w, k) = [ Σ_{i=1}^{n} w_i^2 (1 − U^k_i)^2 ]^{1/2}

and for p = ∞

L_∞ (w, k) = max_i { w_i (1 − U^k_i) }

In order to appreciate the numerical differences between L_1 (w, k), L_2 (w, k), and L_∞ (w, k), consider ten alternatives evaluated with respect to two attributes. Numerical values are given in the table reported in Figure 3.12.

Figure 3.12. Distance metrics (ten alternatives vs. two attributes)

Observe that x^*_1 = x^10_1 = 99 and x^*_2 = x^8_2 = 15. Therefore, x^* = (99, 15) is the (infeasible) ideal point.

The following formulae

U^k_i = x^k_i / x^*_i     and     U^k_i = [ (1/2) (x^k_i / x^*_i + x^*_i / x^k_i) ]^(-1)

have been used for transforming attributes 1 and 2 into the distances from their respective anchor points, x^*_1 = 99 and x^*_2 = 15. For example

U^6_1 = x^6_1 / x^*_1 = 36/99 = 0.3636

and

U^6_2 = [ (1/2) (x^6_2 / x^*_2 + x^*_2 / x^6_2) ]^(-1) = [ (1/2) (10/15 + 15/10) ]^(-1) = 0.9230

Note that only x^8, x^9, and x^10 are nondominated by any other alternative in the set of ten. Applying the three measures of distance, the decision maker derives the closeness of each x^k to x^*. Both attributes are assumed to be equally important, that is, w_1 = w_2 = 0.5.

For example,

L_1 (w, 6) = 1 − (w_1 U^6_1 + w_2 U^6_2) = 1 − (0.5 × 0.363 + 0.5 × 0.923) = 0.357

L_2 (w, 6) = [ w_1^2 (1 − U^6_1)^2 + w_2^2 (1 − U^6_2)^2 ]^{1/2} = [ 0.25 (1 − 0.363)^2 + 0.25 (1 − 0.923)^2 ]^{1/2} = 0.320

L_∞ (w, 6) = max { w_1 (1 − U^6_1) ; w_2 (1 − U^6_2) } = max { 0.5 (1 − 0.363) ; 0.5 (1 − 0.923) } = 0.318

Observe that x^10 is the closest to x^* with respect to L_1, while x^9 is the closest with respect to L_2 and L_∞. Both compromise solutions x^9 and x^10 are encircled in the above table.
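
The values quoted above for alternative 6 can be reproduced with the following Python sketch (added for illustration only; it uses just the scores x^6 = (36, 10), the ideal x^* = (99, 15) and the weights given in the text):

def L_p(U, w, p):
    # family of distance membership functions for a finite distance parameter p
    return sum((wi * (1.0 - ui)) ** p for wi, ui in zip(w, U)) ** (1.0 / p)

def L_inf(U, w):
    # limiting case p = infinity: only the largest weighted deviation matters
    return max(wi * (1.0 - ui) for wi, ui in zip(w, U))

x6, x_star, w = (36.0, 10.0), (99.0, 15.0), (0.5, 0.5)
U6_1 = x6[0] / x_star[0]                                        # type 1 membership function
U6_2 = 1.0 / (0.5 * (x6[1] / x_star[1] + x_star[1] / x6[1]))    # type 3 membership function
U6 = (U6_1, U6_2)
print(round(L_p(U6, w, 1), 3), round(L_p(U6, w, 2), 3), round(L_inf(U6, w), 3))
# prints approximately 0.357 0.320 0.318, as in the text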

Compromise Solution

Thus the closest alternatives to the ideal can be defined as those minimizing L_p (w, k) with respect to some p. If

min_k L_p (w, k)

is achieved at x^{k(p)}, then x^{k(p)} is called the compromise alternative with respect to p. Let C denote the set of all such compromise alternatives for p = 1, . . . , ∞.

A number of interesting and useful properties are typical of compromise solutions:

• For 1 ≤ p < ∞, there is no x^k in X such that U^k_i ≥ U^{k(p)}_i for all i's and U^k ≠ U^{k(p)}; hence x^{k(p)} is nondominated. For p = ∞ it can only be demonstrated that at least one x^{k(∞)} is nondominated.

• For 1 < p < ∞, x^{k(p)} (and U^{k(p)}) is the unique minimum of L_p (w, k) on X.

It can be shown that L_p (w, k) is a strictly increasing function of

L′_p (w, k) = Σ_{i=1}^{n} w_i^p (1 − U^k_i)^p

and thus x^{k(p)} minimizes L_p if and only if it minimizes L′_p. Note that L′_p (w, k) is a strictly convex function and thus it gives a unique minimal point on X for 1 < p < ∞.

It is important to realize that the membership functions L_p and L′_p are not independent of a positive linear transformation of the individual degrees of closeness (Yu, 1973). For example, let d^k_i = α_i U^k_i, with α_i > 0. Then

d^*_i = α_i U^*_i = α_i

and

L_p (w, k) = [ Σ_{i=1}^{n} w_i^p (d^*_i − d^k_i)^p ]^{1/p}

transforms into

L_p (w, k) = [ Σ_{i=1}^{n} w_i^p (α_i − α_i U^k_i)^p ]^{1/p} = [ Σ_{i=1}^{n} w_i^p α_i^p (1 − U^k_i)^p ]^{1/p}

Thus changing the scale of the degrees of closeness has the same effect as changing the preference levels w_i in L_p and L′_p.

The above observation is potentially very important. It suggests that the degrees of closeness are interrelated with the weights of importance. It seems that their compounding effect must be clearly understood to avoid 'double weighting'. Decision makers should concentrate on manipulating either U^k_i or w_i, only exceptionally on both. The assignment of a particular set (U^1_i, . . . , U^m_i) already implicitly contains and reflects the importance of the ith attribute. It is necessary to understand how much the U^k_i reflect the underlying objective measurements and how much they are products of a subjective reinterpretation. Otherwise, additional weighting by w_i could only obfuscate the problem.

Before exploring the problem of weights in greater detail (see Chapter 4), it is advisable to gain some understanding of the distance parameter p. So far only the cases p = 1, 2, ∞ have been considered. Because the power 1/p may be disregarded, use L′_p and substitute ν_i = 1 − U^k_i

L′_p (w, k) = Σ_{i=1}^{n} w_i^p ν_i^{p−1} (1 − U^k_i)

Observe that as p increases, more and more weight is given to the largest deviation (1 − U^k_i). Ultimately the largest deviation completely dominates, as when p = ∞ in L_∞ and L′_∞. It can be concluded that p weights the individual deviations according to their magnitudes and across the attributes, while w_i weights deviations according to the attributes and irrespective of their magnitudes.
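
A small numerical check of this behaviour can be made with the Python sketch below (added for illustration; the deviations are made-up values of (1 − U^k_i)):

deviations = [0.60, 0.30, 0.10]                 # hypothetical values of (1 - U_i^k)
weights = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
for p in (1, 2, 4, 8, 100):
    lp = sum((w * d) ** p for w, d in zip(weights, deviations)) ** (1.0 / p)
    print(p, round(lp, 4))
# as p grows, L_p approaches the largest weighted deviation:
print(max(w * d for w, d in zip(weights, deviations)))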

The compromise with respect to p then indicates a particular form of conflict resolution between the available alternatives and the infeasible ideal. Observe that for p = 1 the minimization of L_1 (w, k) reflects the decision maker's extreme disregard for the individual deviation magnitudes; it is their total sum that matters. On the other hand, for p = ∞ one tries to minimize the maximum of the individual deviations. All attributes are thus considered to be of comparable importance, and the compromise deviations are equalized as much as possible.

Figure 3.13. Typical compromise solutions and a compromise set

What about the cases of 0 < p < 1? Because the values of U^k_i are normalized between zero and 1, observe that the emphasis is reversed: as p changes from 1 to zero, the smallest deviation is given relatively larger and larger weight in the total sum, while the larger deviations are adjusted relatively slightly. Figure 3.13 shows typical compromise solutions.

Nondominated Solutions

It has become an accepted belief that nondominated solutions provide a good general starting point (or sometimes even the endpoint) of rational decision analysis. So far, the concept of nondominance has not been used explicitly; now its general usefulness will actually be disputed and its inferiority to the concept of compromise solutions will be discussed. If there is no j and x^j in X such that U^j_i ≥ U^k_i for all i's and U^j ≠ U^k, then k represents a nondominated alternative x^k, which generates a nondominated outcome U^k in the above sense. That is, x^k is nondominated if, and only if, there is no other feasible alternative generating an outcome which dominates it. It may be concluded that a good decision must yield a nondominated solution, and many authors actually start their procedures by eliminating all dominated x^j from X.

At least two objections can be raised against such a conceptual framework:

• If more than one alternative is required for a solution, then the second and subsequent choices are not necessarily nondominated. The concept of nondominated solutions is fully viable if and only if a single solution is required.

• If a ranking of alternatives is desired, then the set of all nondominated solutions does not provide a good basis for the ranking. Even if only a single solution is the target, subsequent rankings of alternatives serve as an important intermediate orientation tool, helping the decision maker to explicate preferences.

The above points are of course only additional to such obstacles as computational difficulties, nondominated sets that are too large, and nonlinearity gaps. Yet they are much more important, since they do not allow the concept to be generalized. These objections, however, do not dispose of the fact that a single or the first selection is always to be nondominated. It is only the tendency to work exclusively with nondominated solutions which is questionable.

Figure 3.14. The problem of the second best

In Figure 3.14 the shaded boundary of D, denoted N, represents the set of all nondominated solutions. Recall that all compromise solutions, denoted C, are nondominated by definition. Since C is always smaller than or equal to N, the selection of a single solution is thus greatly simplified. If the decision maker is concerned about the second best alternative (after the ideal point), with distance d^{k(2)} from the ideal, it can be assumed that the kth alternative is the next closest to the ideal. Observe that even if the solution with d^k is obviously dominated by the one with d^{k(2)}, its initial omission could significantly distort the final choice of the second best. Correct ranking of alternatives, even if only partial, provides the essential information for the intermediate as well as the final stages of a decision process.

Anti–Ideal

A concept similar to the ideal alternative, its mirror image, the anti–ideal, can be defined on any properly bounded set of feasible alternatives.

Among all achievable scores for any ith attribute, there is at least one extreme value which is the least preferred in relation to all remaining values. Define

x_{i*} = min_k x^k_i     i = 1, . . . , n

and the collection of all such minima, the anti–ideal alternative, as

x_* = (x_{1*}, . . . , x_{n*})

The anti–ideal might be either infeasible or feasible; in either case it could serve as a point of reference during the process of decision making. The question is, do humans strive to be as close as possible to the ideal or as far away as possible from the anti–ideal? The answer is: both.

Since all alternatives are compared with the ideal (rather than directly among themselves), it is obvious that the ideal's usefulness will depend on its discriminatory power, i.e., how well it aids the decision maker in distinguishing among the alternatives.

Return to the simple example of three alternatives, evaluated along a single dimension, generating a vector of scores (5, 10, 11). The task is to choose among the first two alternatives, 5 and 10, using the third one, 11, as the ideal. To transform the scores into the corresponding degrees of closeness it will be assumed that the simple membership function x^k_i / x^*_i provides a good approximation. The ideal will be displaced farther and farther away from the two values in question, as in Table 3.3.

no.     vector            x^k_i / x^*_i          x^*_i − x^k_i
1       (5, 10, 11)       (0.45, 0.90, 1)        (6, 1, 0)
2       (5, 10, 20)       (0.25, 0.50, 1)        (15, 10, 0)
3       (5, 10, 100)      (0.05, 0.10, 1)        (95, 90, 0)
4       (5, 10, 500)      (0.01, 0.02, 1)        (495, 490, 0)
5       (5, 10, 1000)     (0.005, 0.01, 1)       (995, 990, 0)
...     ...               ...                    ...
∞       (5, 10, ∞)        (0, 0, 1)              (∞, ∞, 0)

Table 3.3. Discriminatory power of the ideal

Observe that, in the last two columns, the discriminatory power of the ideal diminishes as its value approaches large numbers. Under such conditions a decision maker might attempt to use the anti–ideal, since its discriminatory power would still be preserved.
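
The diminishing discrimination displayed in Table 3.3 is easy to verify; the short Python sketch below (added only as an illustration) recomputes the degrees of closeness of the scores 5 and 10 as the ideal is displaced to ever larger values.

for ideal in (11.0, 20.0, 100.0, 500.0, 1000.0):
    closeness = [round(x / ideal, 3) for x in (5.0, 10.0, ideal)]
    print(ideal, closeness)
# as the ideal grows, the closeness of 5 and of 10 both tend to zero and their
# difference vanishes, i.e. the ideal loses its discriminatory power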

Naturally, the compromise set based on the ideal is not identical with the compromise set based on the anti–ideal. This fact can be used in further reducing the set of available solutions by considering the intersection of the two compromises. This possibility is illustrated in Figure 3.15.

Figure 3.15. Ideal and anti–ideal

3.6 Multiattribute Selection

Multiattribute decision–making methods are developed to handle selection problems at the concept design level. In this class of problems, the 'best possible solution' is determined from a finite and usually small set of alternatives. The selection is performed based on the evaluation of the attributes and their preference information.

Consider a multiattribute decision–making problem which has m attributes and n alternatives. Let X1, . . . , Xm and A1, . . . , An denote the attributes and design alternatives, respectively. A standard feature of multiattribute decision–making methodology is the decision table, as shown in Figure 3.16. In the table each row belongs to an attribute and each column describes the performance of an alternative. The score aij describes the performance of alternative Aj against attribute Xi.

Figure 3.16. The decision table

As shown in the decision table, weights w1, . . . , wm are assigned to the attributes. Weight wi reflects the relative importance of attribute Xi to the decision, and is assumed to be positive. The weights of the attributes are usually determined on a subjective basis. They represent the opinion of a single decision maker, or synthesize the opinions of a group of experts by means of a group decision technique.

The values x1, . . . , xn are the final scores of the alternatives. Usually, a higher score for an alternative means a better performance, so the alternative with the highest score is the best of the alternatives.

Multiattribute decision–making techniques partially or completely rank the alternatives: a single most preferred alternative can be identified, or a short list of a limited number of alternatives can be selected for subsequent detailed appraisal.

Besides some monetary–based and elementary methods, the two main families of multiattribute selection techniques are those based on multiattribute utility theory (MAUT) and on the outranking methods.

The family of MAUT methods consists of aggregating the different attributes into a function, which has to be maximized; in doing so, the mathematical conditions of aggregation are examined. This theory allows complete compensation between attributes, i.e. the gain on one attribute can compensate the loss on another (Keeney and Raiffa, 1976).

The concept of outranking was originally proposed by Roy (1968). The basic idea is as follows. Alternative Ai outranks Aj if on a great part of the attributes Ai performs at least as well as Aj (concordance condition), while its worse performance is still acceptable on the other attributes (non–discordance condition). After having determined for each pair of alternatives whether one alternative outranks another, these pairwise outranking assessments can be combined into a partial or complete ranking. Contrary to the MAUT methods, where the alternative with the best value of the aggregated function can be obtained and considered as the best one, a partial ranking of an outranking method might not render the best alternative directly. A subset of alternatives can be determined such that any alternative not in the subset is outranked by at least one member of the subset. The aim is to make this subset as small as possible. This subset of alternatives can be considered as a short list, within which a good compromise alternative should be found by further considerations or methods.
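
A much simplified sketch of the concordance/non–discordance test is given below in Python (added only for illustration; it is not Roy's actual procedure, and the alternatives, weights and thresholds are hypothetical, with performances assumed already scaled to [0, 1]):

def concordance(a_i, a_j, weights):
    # share of total weight carried by attributes on which i performs at least as well as j
    return sum(w for pi, pj, w in zip(a_i, a_j, weights) if pi >= pj) / sum(weights)

def outranks(a_i, a_j, weights, c_min=0.7, d_max=0.3):
    # i outranks j if the concordance is high enough and no single disadvantage is too large
    worst_disadvantage = max(max(pj - pi, 0.0) for pi, pj in zip(a_i, a_j))
    return concordance(a_i, a_j, weights) >= c_min and worst_disadvantage <= d_max

A = {"A1": (0.9, 0.6, 0.8), "A2": (0.7, 0.7, 0.5), "A3": (0.4, 0.9, 0.6)}
w = (0.5, 0.3, 0.2)
for i in A:
    for j in A:
        if i != j and outranks(A[i], A[j], w):
            print(i, "outranks", j)            # here: A1 outranks A2 and A3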

3.7 Multiattribute Utility Theory

Utility theory, which originated in economics with the first works of Bernoulli in the eighteenth century, was concerned at first with modelling the preferences of a decision maker who has to select among alternatives with risky outcomes. Utility is an abstract variable indicating goal attainment. It can also be considered as a 'measure of satisfaction or value which the decision maker associates with each outcome' (Dieter, 2000). Diminishing marginal utility, which is represented by a utility function (Fig. 3.17), indicates that a decision maker's valuation of a risky venture is not the expected return of that venture, but rather the expected utility from that venture.

The basic reason for using a utility function as a preference model in decision making is to capture the decision maker's attitudes about achievable targets and risk. Accomplishing high performance and minimizing exposure to risk of an industrial product are two of the fundamental conflicting goals that decision makers face. There are many other trade–offs that designers make in their
decisions; for example, cost versus safety of an industrial product. When purchasing industrial products, owners consider not only reliability and life span but also price, maintenance costs, operating expenses, and so on. Understanding trade–offs in detail, however, may be critical for a design team. It is suitable to model preference trade–offs between conflicting attributes using utility functions. A utility function represents a mapping of the decision maker's preferences onto a mathematical function, thus allowing the preference information to be expressed numerically. For a decision–making problem with multiple attributes, a utility function is assigned to each attribute to reflect the decision maker's preference information. Usually, a more preferred performance value of the attribute obtains a higher utility value. For example, if cost is identified as an attribute, its associated utility function would have higher utility values for lower cost values.

Figure 3.17. Marginal utility

A relatively straightforward way of dealing with conflicting attributes is to create an additive preference model; that is, to calculate a utility score for each attribute, then to weight them appropriately according to the relative importance assigned to each one, and hence to obtain a function which expresses utility as a mathematical function of the decision–making criteria. Thus, the first task is identifying attributes, constructing their hierarchies, and creating useful attribute scales. With attribute scales specified, the matter of understanding trade–offs may be dealt with. But the simple additive form comes with limitations, so it may be advisable to construct more elaborate preference models that are less limiting.

To overcome the aforementioned limitations, multiattribute utility theory (MAUT) provides a formal basis for describing or prescribing choices between alternative solutions whose properties are characterized by a large number of attributes. It evaluates utility functions intended to accurately express a decision maker's outcome preferences in terms of multiple attributes. MAUT grew out of unidimensional utility theory and its central dogma of 'rational' behavior: if an appropriate utility is assigned to each possible outcome and the expected utility of each alternative is calculated, then the best course of action for any decision maker is the alternative with the highest expected utility.

MAUT does not tend to replace unidimensional utility functions defined over single attributes. Rather, it reduces the complex problem of assessing a multiattribute utility function into one of assessing a series of functions. Such individually estimated component functions are then glued together again; the glue is known as 'value trade–offs'. Determining the trade–off often requires
the subjective judgment of the decision maker, who must reflect deeply on the question: howmuch achievement in terms of a given attribute is he/she willing to give up in return for im-proved, specific achievement in another objective?

The main purpose of MAUT is to establish a superattribute, to maximize the overall utility , asthe criterion for selecting a project. Objectives would be the attributes of the available alterna-tives, and a utility function would be constructed on their basis.

The main concept od multiattribute utility theory is that there is a single cardinal dimensionalvalue which can be used for ranking. MAUT combines a class of measurement models and scalingprocedures; for example, MAUT can be used to analyze preferences between alternative solutionsdescribed by attributes like cost, comfort, safety, and performance. MAUT may also be appliedas a decision aiding technology for decomposing a complex evaluation task into a set of simplersubtasks; for example, the decision maker might be asked to assess the utility of each alternativewith respect to each attribute and to assign importance weights to each attribute. Then anappropriate combination rule is used to aggregate utility across attributes. This theory allowscomplete compensation between attributes, i.e. the gain on one criterion can compensate the loston another (Keeney and Raiffa, 1976).

The family of MAUT methods consists of aggregating different utility functions into one function to be maximized. Utility functions can be applied to transform the performance values of the alternatives against diverse attributes, both factual (objective, quantitative) and judgemental (subjective, qualitative), onto a common, dimensionless scale. In practice, the interval [0, 1] is used for this purpose, so that a more preferred performance obtains a higher utility value. A good example is an attribute reflecting the target of cost minimization: the associated utility function must yield higher utility values for lower cost values.

It is advisable to perform some normalization on each non-negative row of the decision matrix. The entries in a row can be divided by the sum of the entries in the row, by the maximal element in the row, or by a chosen value greater than any entry in the row. These normalizations can also be formalized as applications of utility functions.
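As an illustration, the sketch below (in Python, with purely hypothetical numbers) applies the three row normalizations just mentioned to one row of a decision matrix; each can be read as a simple utility-style rescaling onto a common scale.

import numpy as np

row = np.array([12.0, 8.0, 20.0, 16.0])   # hypothetical attribute values of four alternatives

by_sum   = row / row.sum()     # divide by the sum of the entries in the row
by_max   = row / row.max()     # divide by the maximal element in the row
by_value = row / 25.0          # divide by a chosen value greater than any entry

print(by_sum, by_max, by_value)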

3.7.1 Additive Utility Function

The essential problem in multicriterial decision making is deciding how best to trade off increased value on one attribute for lower value on another. Making these trade–offs is a subjective matter and requires the decision maker's judgment. If there is a large number of alternatives and only a few attributes, it is preferable to attempt an explicit assessment of the overall utility function U(acquisition cost, horsepower, miles per ton, depreciation percent, maintenance costs, comfort) of the applicable multiple attributes.

One of the most common assumptions is that the function U is additive. That means it can be written as follows

U = w_1 u_1(cost) + w_2 u_2(hp) + w_3 u_3(nm/t) + w_4 u_4(depr.) + w_5 u_5(maint. costs) + w_6 u_6(comfort)

The preference model which can address the aforementioned problems is called the additive utility function. The most comprehensive discussion, and the only one that covers swing weights, is that by von Winterfeldt and Edwards (1986). Swing weighting considers differences in attribute scales, where the input is admittedly an approximation; it can be used in virtually any weight-assessment situation (Clemen, 1996). Keeney (1980) and Keeney and Raiffa (1993) have devoted much effort to this preference model. Edwards and Barron (1994) discuss some heuristic approaches to assessing weights, including the use of only rank–order information about the attributes.

The basic idea of creating an additive utility function, which has to be maximized, has been widely applied. Other decision-aiding techniques also use the additive utility function implicitly or explicitly, including the Analytic Hierarchy Process (Saaty, 1980) and goal programming with non–preemptive weights (Winston, 1987). For all of these alternative models, extreme care must be exercised in making the judgments on which the additive utility function is based.

To build an additive utility function properly, two problems must be dealt with satisfactorily. The first has to do with comparing the attribute levels (numerical scores) of the available alternatives, which requires a quantitative model of preferences for each alternative that reflects the comparisons. The second concerns how the attributes compare to each other in terms of importance: as with the scores, numerical weights must be assessed for each attribute.

Additivity and the determination of the individual U's and w's require independence of the attributes. For example, a shipowner's preferences regarding the price of a ship should not be affected by changes in its fuel mileage; in practice, however, the two attributes, cost and mileage, are not independent, because it is their relationship that matters. The use of an additive utility function would then be difficult to substantiate over the given set of attributes. One has to redefine the objectives, i.e., combine acquisition cost and fuel mileage into an overall ship cost over the lifetime period. This could be achieved by adding acquisition cost, fuel costs, and maintenance costs, all properly discounted over the lifetime. A typical ship would then be characterized by the triplet of total cost, horsepower, and comfort. The decision maker must, of course, test whether total cost, horsepower, and comfort are mutually independent attributes. If they are not, one has to resort to further combinations of attributes.

Consequently, one of the most important tasks of MAUT is to verify the independence of the attributes. After independent attributes suitable for analysis have been established, all the individual single–attribute utility functions must be constructed.

Similarly, the scaling factors wi must be determined. These 'weights', unfortunately, do not measure the relative importance of each attribute as such; their heuristic interpretation, and thus the intuitive appeal of the additive utility decomposition, is not among the strongest aspects of MAUT. Rather, these constants reflect the relative importance of each attribute as it changes from its worst available to its best available value.

After ascertaining that the sum of the 'weights' is equal to 1, as it must be for the additive decomposition of U, the decision maker will finally be able to estimate the utility function. This overall utility function U is then used for evaluating all available alternatives by simply substituting their attribute levels into the above formula and searching for the highest value of U.

The additive utility function assumes that marginal utility functions U1(x1), U2(x2), . . . , Um(xm) are available for the m different attributes x1 through xm. In particular, it is assumed that each marginal utility function assigns the values 0 and 1 to the worst and best levels of that particular attribute, respectively. The additive utility function is simply a weighted average of these marginal utility functions, where the decision maker must assign weighting factors which reflect the relative contribution of each attribute to the overall value. For a design solution that has numerical scores x1, . . . , xm on the m attributes, the utility of this alternative may be calculated by aggregating the weights and values according to the additive combination rule

U(x_1, \ldots, x_m) = w_1 U_1(x_1) + \ldots + w_m U_m(x_m) = \sum_{i=1}^{m} w_i\, U_i(x_i)    (3.1)

where the weights w_1, . . . , w_m associated with the attributes reflect the relative importance of each attribute on a 0–1 point scale, with 0 points assigned to the least important attribute and 1 point to the most important one. The final relative weights are computed by normalizing the sum of the points to one.

When one plugs in the worst level x_i^- for each attribute, the marginal utility functions assign 0 to each attribute [U_i(x_i^-) = 0], and so the overall utility is also 0. If one plugs in the best possible value x_i^+ for each attribute, the marginal utility functions are equal to 1 [U_i(x_i^+) = 1], and so the overall utility becomes

U(x_1^+, \ldots, x_m^+) = w_1 U_1(x_1^+) + \ldots + w_m U_m(x_m^+) = w_1 + \ldots + w_m = 1    (3.2)
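A minimal numerical sketch of the additive combination rule of equation (3.1) is given below; the three attributes, their marginal utilities and the weights are purely illustrative assumptions, not values taken from the text.

def additive_utility(scores, marginal_utilities, weights):
    # scores[i] is the level of attribute i; marginal_utilities[i] maps it onto [0, 1];
    # the weights must sum to one, as required by equation (3.2)
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * u(x) for x, u, w in zip(scores, marginal_utilities, weights))

# hypothetical marginal utilities scaled so that the worst level maps to 0 and the best to 1
u_cost    = lambda c: (30.0 - c) / (30.0 - 20.0)          # total cost in M$ (best 20, worst 30)
u_power   = lambda p: (p - 8000.0) / (12000.0 - 8000.0)   # installed power in kW (best 12000, worst 8000)
u_comfort = lambda s: s / 10.0                            # comfort score on a 0-10 scale

U = additive_utility([24.0, 10500.0, 7.0],
                     [u_cost, u_power, u_comfort],
                     [0.5, 0.3, 0.2])
print(U)   # overall utility in [0, 1]; the alternative with the highest U is preferred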

The multiattribute utility function given by equation (3.1) is based on two assumptions which have been verified to be appropriate for many realistic decision–making problems (Keeney and Raiffa, 1993). They are:

• the utility functions of all the attributes are mutually independent;

• the relative weight of an attribute can be determined regardless of the relative weights of the other attributes.

3.7.2 Risk and Utility Function

MAUT is primarily concerned with the independence of attributes, which allows the evaluation of multiattribute alternatives to be decomposed into unidimensional attribute evaluations. Although there are no unidimensional decision problems as such, there are many situations where one is searching for the alternative which maximizes or minimizes a single measure of merit: required freight rate, comfort response of a passenger ship, etc. It is around situations of this type that unidimensional utility theory has evolved. One notices that examples of one–attribute problems come across as being a bit forced: it is quite rare, and therefore difficult to imagine, that a comparison of decision alternatives proceeds in so simple–minded a fashion. Nearly always, there are multiple criteria to be taken into account.

The 'riskless decomposition' is, however, only a first step in MAUT. If the alternatives become risky, the decomposition over attributes is closely linked to the decomposition over uncertain events. It is therefore useful to distinguish between riskless and risky decisions. In the former case, the decision maker acts with perfect information and is thus able to specify with complete certainty the properties which will result from any combination of independent variables. In the latter case, the decision maker has only partial information and is assumed only to be able to assign subjective probabilities to each of the possible properties. It can be argued that no decision is truly riskless, since one never acts with perfect information, but for many purposes the riskless choice assumption provides a reasonable approximation to the situation actually confronting the decision maker.

By fitting curves through the individually assessed points, obtained by assuming different values of the probability p which reflect the actual attitude of the decision maker, one can gain some idea of the shape and a possible functional form of the utility function. If such a curve lies above the straight line connecting the endpoints of a given interval of values, it is said to be concave (the curve opens downward) over that interval (Fig. 3.18). Concave utility functions reflect a decision maker's aversion to risk; straight lines, i.e., linear utility functions, define risk neutrality or indifference; while convex utility functions (opening upward, lying everywhere below the straight line) define risk propensity, or risk seeking.

Figure 3.18. Utility functions

A utility function may display all three basic attitudes toward risk over nonoverlapping subregions of the possible attribute levels. Although most decision makers are not risk–neutral, it is often reasonable for them to assume that their utility curve is nearly linear for a particular decision, say, within a range of attribute levels perceived as safe. Keep in mind that the utility function is only a model of a decision maker's attitude toward risk.
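The sketch below illustrates the three risk attitudes of Figure 3.18 with a one-parameter exponential utility over a normalized attribute range [0, 1]; the functional form and its parameter are assumptions chosen only for illustration.

import math

def utility(x, r):
    # r > 0: concave (risk averse); r < 0: convex (risk seeking); r ~ 0: linear (risk neutral)
    if abs(r) < 1e-9:
        return x
    return (1.0 - math.exp(-r * x)) / (1.0 - math.exp(-r))

for x in (0.25, 0.50, 0.75):
    print(x, utility(x, 2.0), utility(x, 0.0), utility(x, -2.0))
# the risk-averse curve lies above the straight line u = x, the risk-seeking one below it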

Finally, one should consider that some inputs of multiattribute decision models are inherently subjective. The weights of the attributes and the scores of the alternatives against the subjective criteria always contain some uncertainty. It is therefore an important question how sensitive the final ranking, or the ranking values of the alternatives, is to changes in the input parameters of the decision model.

3.8 Multiattribute Concept Design

A very competitive market for high–tech technical systems pushes designers to improve design methods, especially in concept design, where the main characteristics of the industrial product, which affect performance and total cost over its life span, are determined. Certain controlling factors, such as main dimensions and geometric characteristics, technical performance, etc., are not expected to vary substantially in the subsequent design phases. The possibility of influencing the total lifetime cost of a technical system is therefore very high during concept design and decreases during the following design phases, process development and manufacturing.

Therefore, the design process has to address simultaneously a number of often conflicting goals of both technical and economic nature. Classical single-objective optimization schemes, best illustrated by the design spiral, involve one criterion at a time, dealing with the others through purely heuristic preferences. On the other hand, some multicriterial optimization methods use hybrid solvers inadequate to capture the actual nature of design. In particular, Mistree et al. (1991), Sen (1992), and Ray and Sha (1994) utilize procedures that require predefined preference information on the attributes and apply it only to rank feasible designs. To circumvent this limitation, the powerful concept of nondominated (Pareto–optimal) designs is introduced into concept design, as fully illustrated by Zanic et al. (1992).

Experience with different multicriterial design procedures has indicated that the multiattribute decision-making (MADM) method is the most suitable for practical application to concept ship design. It treats the design as a whole, requiring only a simple evaluation and selection procedure. It deliberately does not attempt to provide an optimal solution automatically, also because that would not decrease the risk associated with the uncertainties intrinsic to the design process. On the contrary, it is conceived so as to drive the selection of the 'best possible solution' also on the basis of the aspiration levels required by the decision–maker.

Figure 3.19. Framework of multiattribute concept design

A framework of multiattribute concept design suitable for robust design simulation is illustrated in Figure 3.19, where the communication between the design process and the external environment is emphasized.

In this method a large number of feasible designs is created by multiple executions of the design model with sets of design parameters generated by an adaptive Monte Carlo method. Constraints of min-max, crisp or fuzzy type may be applied to any criterion value generated within the design model. A design is feasible, i.e. belongs to the feasible region of the selection problem, if it meets the given set of requirements and all constraints are within acceptable limits. It is probable that some feasible designs will be superseded by other designs in every respect: if there exists a design A that is better than design B in all relevant attributes, then design B is dominated and is therefore discarded from further consideration. Hence, among all feasible designs only the nondominated ones are retained, as shown in Figure 3.20, which presents an illustration of a two-attribute space.

These designs are optimal in the Pareto sense, i.e. each nondominated design is better in at least one attribute value than any other design. This approach makes it relatively easy to search the multidimensional, highly constrained subspace of feasible designs. At the same time the number of evaluations is high, which requires composing an efficient and reliable design model. The end product of the multicriterial concept design procedure is a hyper–surface of nondominated designs, from which the selection of the 'preferred design' is performed only after a sufficient number of nondominated designs has been generated. The procedure is implemented in a concept design shell capable of searching the design space and monitoring the process.
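The following sketch shows a straightforward implementation of the dominance filtering described above for a handful of hypothetical feasible designs, assuming that all attributes are of the 'more is better' type.

def dominates(a, b):
    # True if design a is at least as good as b in every attribute and strictly
    # better in at least one (Pareto dominance, all attributes to be maximized)
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated(designs):
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other is not d)]

feasible = [(0.8, 0.3), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]   # hypothetical attribute values in [0, 1]
print(nondominated(feasible))   # (0.5, 0.5) is dominated by (0.6, 0.6) and is discarded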

Figure 3.20. Projection of two–dimensional attribute space

The MADM design procedure is illustrated in Figure 3.21, where the two main tasks, i.e. design modelling and the design selection procedure, are highlighted in bold.

Figure 3.21. The multiattribute design procedure

Primary design attributes will be transformed into objectives in the process of decision making at the basic design level.

3.8.1 Basic Concepts

Identification of design problems implies specification of the design criteria in terms of the design variables.

Design criteria, as measures of design effectiveness, can be divided into two broad groups:

• Criteria with an a priori given aspiration level form design constraints (hard constraints) on the design variables. They are used to distinguish between feasible and infeasible designs.

• Criteria that can be used as design performance measures are called design attributes (soft constraints). They can be transformed into design goals if the direction of quality improvement is specified (minimization or maximization of an objective function).

In the frame of the MADM approach, each design can be represented as a point in the design space X spanned by the NV design variables. It can also be considered as a point in the attribute space Y spanned by the NA design attributes. Constraints in the X space bound the subspace of feasible designs (Fig. 3.22).

The evaluation process is a mapping (f : X → Y) from the X space onto the Y space, i.e. the calculation of attribute function values for given values of the design variables. The design process is the inverse mapping (f−1 : Y → X) from the Y space to the X space, i.e. the identification of the most appropriate values of the design variables for given aspiration levels of the attributes.

When a preference among attributes is established (level, ordinal, cardinal, etc.), techniques of multicriterial decision making (lexicographic ordering, MAUT, goal seeking, etc.) can be applied to evaluate the feasible solutions. The outcome of the concept design process may be an optimal, efficient, satisficing or preferred solution (Trincas et al., 1987). An optimal solution is rarely attainable in multicriterial problems: it is reached only if all attribute functions reach their extreme targets simultaneously. The ideal solution (utopia, zenith), usually infeasible, is the point defined by the favorable extremes of the attribute values. The anti–ideal point (nadir) is the most unfavorable combination of attribute levels. Nondominated, efficient solutions are of primary importance, since they correspond to designs which are better than any other feasible design in at least one attribute; the set of nondominated solutions is characterized by the fact that there exists no solution in which an increase in any criterion would not cause a decrease in at least one other criterion. Satisficing solutions correspond to all designs that completely satisfy the aspiration levels of the decision maker. Finally, the preferred solution is the one selected among the nondominated (efficient) designs once the design team's preferences are completely established.

Figure 3.22. Concept design mapping

It is worth noting that the selection of nondominated solutions usually does not require any preference information across attributes or on their numerical scores. While evaluation is a straightforward procedure, the inverse mapping implied by the design process is entangled with many mathematical problems.

However, the increased speed of workstations provides the opportunity to model the complex design problem as a multiple evaluation process, by intentionally creating a discrete number of feasible designs through multiple executions of a design model. The mathematical design model is driven by an adaptive Monte Carlo method which guides the random generation of a large set of design alternatives. Constraints of min-max, crisp or fuzzy type may be applied to any attribute value generated within the mathematical model. A design is feasible if it complies with all crisp constraints. Feasible designs are tested for dominance, based on membership grade functions for intra–attribute preference and on the design team's subjective preference across attributes (inter–attribute preference).

Among the feasible designs only the nondominated ones are retained. This approach makes it possible to search the multidimensional, highly constrained subspace of nondominated designs with little difficulty. If a sufficient density of nondominated points is generated, one may obtain a 'discrete' inversion of the Y-on-X mapping for the most important part of the design space. Therefore, it is possible to replace the optimization–oriented MODM approach with the much simpler MADM approach, which implies only mathematically easier procedures for evaluation and selection. Details of the outlined procedure are available in the specialized literature (Zanic et al., 1992, 1997; Grubisic et al., 1997).

The present increased interest in simulation, random generation and Monte Carlo methods shows that simple approaches to complex decision-making problems are at hand in many engineering disciplines. Thanks to hardware development, increased computational speed makes MADM methods feasible for many otherwise mathematically cumbersome problems. These so-called 'last resort methods' seem to have the same potential as analytical methods when implemented in parallel computing environments. The situation resembles the rapid replacement of complex analytical methods in the mechanics of continua (e.g. structural analysis) by the simple finite element method as soon as the solution of large systems of linear equations became possible in reasonable computing time. In the sequel the basic steps of the proposed design process are given.

3.8.2 Concept Design Process Description

The concept design process is divided into two phases, that is, the phase of design point generation in the design space and the phase of design selection in the attribute space, which may require the introduction of metrics. Both are included in a shell which drives the main activities, i.e. problem formulation, solution generation, dominance filtering, etc. The process of design selection is basically interactive, since the designer may change and refine his/her preferences (sensitivity study).

Information on the synthesis shell, on the graphic representation of the design and attribute spaces, as well as on the organization of the data structure, has been extensively detailed by Trincas et al. (1994). The MADM shell consists of the following main functions:

• define min-max design subspace;

• generate sample designs via an adaptive Monte Carlo method;

• evaluate feasibility of generated designs subject to specified crisp constraints;

• define intra-attribute fuzzy functions and inter-attribute preference matrix;

• transform design attributes to membership grade via fuzzy functions;

• define the dominance structure for filtering nondominated solutions by building metrics of the values of design variables and attributes;

• refine designs around specific nondominated designs to establish the robustness of the preferred solutions.

Main steps of design generation in the design space

The logical flow of the generation activity is:

• Determination of the ranges of the design variables defined by min-max and linear constraints only. They are determined via a series of simple linear programming problems with maximization/minimization of each design variable. Errors in the definition of the linear constraints are spotted here.

• Random generation of points in the X space as defined in the first step. A standard random number generator is used.

• Evaluation of design feasibility. The analytical model for the calculation of constraint values is used first to filter feasible designs. All points infeasible with respect to the set of linear constraints are immediately discarded.

• Evaluation of the attribute functions, i.e. the Y-space image of the design

yi = yi(x) ; i = 1, . . . , NA

• Transformation of the attribute evaluation functions yi(x) through the membership grade functions Ui(yi), whose range is the interval [0, 1]:

yi = Ui [yi(x)] ; i = 1, . . . , NA

• The simplest 'more is better' dominance structure (Pareto dominance) can be built using the yi(x) values. Filtering of the nondominated designs among the feasible ones can now be performed with a very efficient dominance algorithm.

• Control of the number of nondominated designs with respect to the total number of feasible designs, based on the given resolution in the X and Y spaces. If the prescribed density is not achieved, a more efficient method of design generation is needed, since the designer is principally interested in efficient, nondominated designs. The formula for the expected number of nondominated points ND as a function of the number of feasible points NF and the number of attributes NA reads (Calpine and Golding, 1976)

ND = 1 + \ln(NF) + \frac{\ln^2(NF)}{2!} + \ldots + \frac{\ln^{NA−3}(NF)}{(NA−3)!} + 0.5572\,\frac{\ln^{NA−2}(NF)}{(NA−2)!} + \frac{\ln^{NA−1}(NF)}{(NA−1)!}

A small numerical sketch evaluating this estimate is given after this list.

• Random generation of new design points in mini-hypercubes of decreased range around the nondominated solutions, hence yielding the 'chain generation' of a great number of nondominated points or, alternatively, a discrete approximation of the nondominated hypersurface in the Y space. With proper convergence checks in the X and Y spaces, the number of random points in the 'minicube'-based design process is much smaller than the number of points yielded by the crude generation in the primary screening of the design space (see Fig. 3.22).

• Random generation of designs around the extreme points for all attributes. In this manner the extremes of the nondominated subspace are obtained more accurately.
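The following sketch simply evaluates the estimate of the expected number of nondominated points quoted in the list above, keeping the coefficient 0.5572 as printed; the chosen values of NF and NA are arbitrary.

import math

def expected_nondominated(NF, NA):
    # valid for NA >= 3, following the formula quoted above
    L = math.log(NF)
    nd = sum(L**k / math.factorial(k) for k in range(NA - 2))   # terms 1 + ln + ... + ln^(NA-3)/(NA-3)!
    nd += 0.5572 * L**(NA - 2) / math.factorial(NA - 2)
    nd += L**(NA - 1) / math.factorial(NA - 1)
    return nd

print(expected_nondominated(NF=10000, NA=5))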

This procedure is developed in the design space, and no intricacies connected with normalization or with inter–attribute and intra–attribute relationships are involved. Moreover, the process can be automated and executed off-line.

The result of this process is a set of ND nondominated designs defined by two matrices:

- design matrix X(NV, ND): the X–space coordinates of the nondominated designs

- decision matrix Y(NA, ND): the Y–space coordinates of the nondominated designs

The decision matrix plays the most important role in design selection in the Y space, while the design matrix gives the X-space counterpart of any selected design.

Main steps of design selection in the attribute space

The logical flow of the selection procedure is:

• Interactive definition of the preference information regarding the relationships between attributes or, alternatively, of the preference information on the design alternatives. The selection among the ND designs is then very fast.

• Extraction of the weight factors from the subjective preference matrix (Saaty method, least-squares method or entropy method).

• Selection of attribute value type (direct, membership grade formulation).

• Normalization of the attribute values (vectorial or linear scale)

y_i^j = \frac{y_i^j}{\sqrt{\sum_{k=1}^{NA} (y_k^j)^2}} ;   i = 1, . . . , NA ;   j = 1, . . . , ND

y_i^j = \frac{y_i^j − y_{i,min}^*}{y_{i,max}^* − y_{i,min}^*}   for maximization

y_i^j = \frac{y_{i,max}^* − y_i^j}{y_{i,max}^* − y_{i,min}^*}   for minimization

• Calculation of the L1, L2 (Euclidean) and L∞ (Chebyshev) norms with respect to the ideal point y* or to a prescribed goal, for all nondominated designs y^j, with given attribute weights w:

L_p(w, y^j) = \|y^j − y^*\|_{w,p} = \left[ \sum_{i=1}^{NA} w_i\, |y_i^j − y_i^*|^p \right]^{1/p} ;   j = 1, . . . , ND ;   p = 1, 2, ∞

A small numerical sketch of this calculation is given after this list.

• Stratification of the set of nondominated solutions into layers according to the value function (i.e. the L1, L2 or L∞ norm, or another). The stratified X or Y space can be used for

1. graphic presentation;

2. experiments in interpolation (i.e., a design variable as a function of the design attribute values in a specified limited stratum).

• Extraction of the 'preferred solutions' according to the given preference structure. In this way designs of minimal distance from the ideal point or from other prescribed goals are obtained and displayed.

• Random generation of designs around all 'preferred solutions'. In this manner a more accurate value of the best possible design is obtained in a few steps.
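As announced in the norm-calculation step above, the following sketch computes the weighted L1, L2 and L∞ distances of one nondominated design from the ideal point; the normalized attribute values and the weights are hypothetical.

import numpy as np

def lp_distance(y, y_ideal, w, p):
    # weighted Lp distance as in the formula above: weights enter linearly inside the sum
    dev = np.abs(y - y_ideal)
    if p == np.inf:
        return np.max(w * dev)
    return (np.sum(w * dev**p))**(1.0 / p)

y_ideal = np.array([1.0, 1.0, 1.0])     # ideal point after normalization
design  = np.array([0.7, 0.9, 0.6])     # one nondominated design (hypothetical)
w       = np.array([0.5, 0.3, 0.2])     # attribute weights (hypothetical)

for p in (1, 2, np.inf):
    print(p, lp_distance(design, y_ideal, w, p))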

3.9 Multiobjective Design

Multiobjective decision making (MODM) methods are suited to handling the MCDM problems arising in basic design, where the 'preferred design' is the final outcome of the concept design. That is, optimization will be performed to maximize or minimize the objectives associated with the subsystems of the complex technical system. In general, these objectives are often conflicting, so the optimal solution is usually a 'compromise design' that aims to satisfy the different objectives simultaneously as well as possible.

Figure 3.23 lists some MODM methods that are capable of dealing with this class of problems. These MODM methods are classified into different groups mainly 'based on the types of preference information and timing for eliciting preference information' (Sen and Yang, 1998).

Figure 3.23. Classification of MODM methods

A decision tree for MODM technique selection was also developed by Sen and Yang (1998), as illustrated in Figure 3.24. By using this figure, the decision maker can construct a choice rule to select a method by examining the decision rule or the computational procedure of the candidate methods.

Figure 3.24. Decision tree for MODM technique selection

3.9.1 Genetic Algorithm

The genetic algorithm (GA) is a type of evolutionary algorithm used in computing to find approximate solutions to optimization and search problems. The basis of the GA is an adaptive heuristic global search algorithm originating from the evolutionary ideas of natural selection and genetics. The GA technique consists of a structured random algorithm that reproduces Darwin's evolutionary process of survival of the fittest in natural systems. By mimicking this process, the genetic algorithm is able to evolve solutions to realistic problems by performing an intelligent exploitation of a random search within a defined design space. It has been demonstrated that the GA is capable of efficiently finding the global optimum of a MODM problem.

The GA has five major steps: initialization, evaluation, selection, crossover and mutation. Figure 3.25 depicts the steps of the genetic algorithm.

Figure 3.25. Genetic algorithm

The GA starts with the creation of an initial population chosen randomly from the design space defined by the independent variables. The individuals of the population are usually encoded as binary strings of 0s and 1s (called chromosomes). The individuals are then evaluated and a fitness value is assigned to each of them. Once evaluated, the 'parents' of each generation are stochastically selected based on their fitness; this step is known as 'selection'. After the new population is established, the genetic material of the parents is combined to create children by performing a crossover operation (structural exchange of characters). The crossover is accomplished by randomly selecting a splice point in the binary string and then swapping bits between the parents at the splice. Once the crossover is done, the mutation operation is applied.

In the process of mutation, the value of a bit is changed (0 changes to 1 and vice versa) with a specified mutation probability. Thus a new offspring population is established and its fitness is evaluated again. If the best individuals and fitness values have been obtained, the algorithm stops; otherwise, the process is repeated for several cycles until sufficiently fit solutions are selected.
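A compact sketch of the five GA steps follows, applied to a toy single-variable problem; the population size, probabilities, chromosome length and fitness function are all illustrative assumptions.

import random

def decode(bits, lo=0.0, hi=10.0):
    # map a binary chromosome to a real design variable in [lo, hi]
    return lo + int("".join(map(str, bits)), 2) / (2**len(bits) - 1) * (hi - lo)

def fitness(bits):
    x = decode(bits)
    return -(x - 7.3)**2          # toy objective with its maximum at x = 7.3

def evolve(pop_size=30, n_bits=16, generations=60, p_cross=0.9, p_mut=0.02):
    # initialization: random population of binary chromosomes
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # selection: binary tournament based on fitness
        parents = [max(random.sample(pop, 2), key=fitness) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < p_cross:                 # one-point crossover at a random splice
                cut = random.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a[:], b[:]]
        for child in children:                            # bit-flip mutation
            for i in range(n_bits):
                if random.random() < p_mut:
                    child[i] = 1 - child[i]
        pop = children
    return max(pop, key=fitness)

best = evolve()
print(decode(best), fitness(best))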

This approach requires the specification of the objectives to be optimized and the identification of the parameters, with their variation ranges, whose values solve the design problem.

3.9.2 Goal Programming

Goal programming (GP) grew out of the need to deal with otherwise unsolvable linear programming problems. The term 'goal programming' was used by its developer (Ignizio, 1983) to indicate the search for an 'optimal' program for a mathematical model that is composed solely of goals. Such problems arise because in real life decisions often have to be taken in circumstances where the decision maker sets goals which are not necessarily attainable, but which serve nevertheless as a standard to aspire to or as a reminder of long-term aims. The objective then becomes the attainment of a solution which comes as close as possible to the indicated goals. However, 'closeness' to goals is a vague concept. Goal programming gets around this problem by an ordinal ranking of the goals, so that lower-priority goals are attended to only after the higher-priority goals have either been satisfied or the solution has reached the point beyond which no further improvements are possible or desirable.

The general form of a linear programming problem may be written as

maximize    Z = f(x) = \sum_{j=1}^{n} c_j x_j

subject to  \sum_{j=1}^{n} a_{ij} x_j ≤ b_i ,   i = 1, 2, \ldots, k

            \sum_{j=1}^{n} a_{ij} x_j ≥ b_i ,   i = k+1, k+2, \ldots, l

            \sum_{j=1}^{n} a_{ij} x_j = b_i ,   i = l+1, l+2, \ldots, m

            x_j ≥ 0 ,   j = 1, 2, \ldots, n

where x_j are the independent variables defining the problem and

c_j       profit associated with each unit of x_j
a_{ij}    amount of resource i used (i.e. steel, foreign exchange) or contribution to goal i (i.e. cost savings) associated with each unit of x_j
b_i       total amount of resource i available, or target value for goal i

It is quite simple to conceive of a set of constraints that are incompatible and hence do not define a feasible region. For example, the target value of savings may not be attainable with the amount of steel or foreign exchange available. Under such circumstances, the view may be taken that the objective of the decision maker is simply to satisfy a set of constraints.

The goal programming formulation would then be

minimize    Z = \sum_{k=1}^{p} \sum_{i=1}^{m} P_k \left( w_{ki}^- d_i^- + w_{ki}^+ d_i^+ \right)

subject to  \sum_{j=1}^{n} a_{ij} x_j + d_i^- − d_i^+ = b_i ,   i = 1, 2, \ldots, m

            x_j ≥ 0 ,   j = 1, 2, \ldots, n

            d_i^-, d_i^+ ≥ 0 ,   i = 1, 2, \ldots, m

            P_k ≫ P_{k+1} ,   k = 1, 2, \ldots, p−1

where

p                    number of priority levels
P_k                  preemptive weights, or priority weights
w_{ki}^-, w_{ki}^+   weights for d_i^- and d_i^+, respectively
d_i^-, d_i^+         deviation variables representing the negative and positive deviations from b_i

Therefore, the goal programming formulation is one in which the weighted sum of the deviations from the target values or goals, b_i, is minimized, according to some externally imposed priority ranking of the goals.
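The sketch below sets up a small, non-preemptive (weighted) goal programming problem as an ordinary linear programme and solves it with a generic LP solver; the two goals, their targets and the weights are hypothetical, and preemptive priorities would instead be handled by solving the priority levels sequentially.

import numpy as np
from scipy.optimize import linprog

# variables: x1, x2, d1-, d1+, d2-, d2+
c = np.array([0, 0, 3, 3, 1, 1])          # weights act on the deviation variables only
A_eq = np.array([[2, 1,  1, -1,  0,  0],  # 2 x1 +  x2 + d1- - d1+ = 40  (hypothetical goal 1)
                 [1, 3,  0,  0,  1, -1]]) #   x1 + 3 x2 + d2- - d2+ = 30  (hypothetical goal 2)
b_eq = np.array([40, 30])
bounds = [(0, None)] * 6                  # all variables non-negative

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
x1, x2, d1m, d1p, d2m, d2p = res.x
print(x1, x2, d1m + d1p, d2m + d2p)       # decision variables and residual deviations per goal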

The units in which the goals b_i are expressed would, in general, be incommensurate. In goal programming, however, incommensurate goals are quite acceptable, and indeed are the norm. The only requirement is that goals at the same priority level be expressed in commensurate units.

It is quite conceivable that, when several different commensurate deviation (slack) variables are to be treated at the same priority level, the decision maker may wish to give somewhat greater importance to a few of these deviation variables. That is, even at the same priority level, some goals may be more important than others. This can be conveniently handled by multiplying the appropriate deviation variables by suitable weighting factors.

The priority ordering of the goals reflects the attitude of a particular decision maker. One does not, however, need to adhere to a rigid priority ordering, although in many decision situations high- and low-priority goals are easily discernible. Preemptive weights determine the hierarchy of goals: goals at higher priority levels are satisfied first, and only then may the lower-priority goals be considered. Lower-priority goals cannot alter the goal attainment achieved at higher priority levels.

Depending on the nature of the goals and on whether the decision maker wishes to meet a goal exactly, or to overachieve or underachieve it, several alternative formulations can be envisaged.

Minimization of the sum of deviations (d_i^- + d_i^+)

This will ensure that goal i is met exactly, if possible. For example, the transport task performed by the replacement vessels must be exactly equal to the transport task previously performed by the scrapped vessels. Hence the net transportation task performed by scrapped and replacement vessels should be zero, if the former is taken to be negative and the latter positive. To achieve this goal, therefore, the sum of the positive and negative deviations from the goal of zero net transportation should be minimized.

Minimization of d_i^-

This will ensure that d_i^- is driven to zero if possible. Since d_i^- represents a shortfall in meeting goal i, this minimization implies that the goal should be achieved or overachieved. The solution will try to ensure that

\sum_{j=1}^{n} a_{ij} x_j ≥ b_i

Lower target values for any achievement index, such as cost savings, are goals of this type.

Minimization of d_i^+

This has the opposite effect to that of the above formulation. The solution will try to ensure that

\sum_{j=1}^{n} a_{ij} x_j ≤ b_i

That is, a solution that underachieves goal i will be preferred. Limitations on the use of scarce and expensive resources, such as foreign exchange, will result in goals of this type.

Very large or very small b_i values

Using a very large b_i value and minimizing d_i^- is equivalent to ensuring a large value of

\sum_{j=1}^{n} a_{ij} x_j

This, therefore, effectively amounts to an attempt to achieve as high a value of this goal as possible. Obviously, using a very small b_i value and minimizing d_i^+ will have the opposite effect.

3.9.3 Compromise Programming

Compromise programming handles constraints and bounds separately from the system goals, contrary to goal programming, where everything is converted into goals. Compromise programming can be viewed as an effort to approach or emulate the ideal solution as closely as possible. Observe that the ideal solution is the situation in which every objective attains its largest possible outcome. In a technical sense, the decision maker measures the 'goodness' of any compromise by its closeness to the ideal or by its remoteness from the anti–ideal. Thus, the notion of distance and its measurement cannot be avoided in decision making.

One of the best–known concepts of distance is the Pythagorean theorem, for measuring the distance between two points whose coordinates are known. That is, given points x^1 ≡ (x_1^1, x_2^1) and x^2 ≡ (x_1^2, x_2^2) in a plane, the distance d between them is found to be

d = \sqrt{(x_1^1 − x_1^2)^2 + (x_2^1 − x_2^2)^2}

But instead of measuring the distance between any two points, one may be interested in comparing the distances of various points from one reference point, the ideal point x^*. That is, various points x^k are compared in terms of their distance from the point x^*. In the two–dimensional case

d = \sqrt{(x_1^* − x_1^k)^2 + (x_2^* − x_2^k)^2}

Observe that the geometric concept of distance here is very simple: the differences between the coordinates of the ideal point and the corresponding coordinates of a given point are computed and raised to the second power. These second powers are then added, and the square root of the total is taken. The concept is readily generalizable to higher dimensions. If there are n objectives (measured along n coordinates) characterizing the points being compared, the distance formula becomes

d = \left[ \sum_{i=1}^{n} (x_i^* − x_i^k)^2 \right]^{1/2}    (3.3)

In the above formula, the deviations (x_i^* − x_i^k) are raised to the power p = 2; in general, they could be raised to any power p = 1, . . . , ∞ before being summed. Also, the different deviations, corresponding to the different objectives i, can be weighted according to their contribution to the total sum by weights w_i. A generalized family of distance measures, dependent on the power p, can be expressed as follows

d_p = \left[ \sum_{i=1}^{n} w_i^p\, (x_i^* − x_i^k)^p \right]^{1/p}    (3.4)

where wi > 0 and p ranges from 1 to ∞.

For p = ∞, the above expression reduces to

d_∞ = \max_i \{ w_i (x_i^* − x_i^k) \}    (3.5)

Consider the points (8, 6) and (4, 3). Their distance can be compared numerically for different levels of p; w_i is assumed to be equal to 1 for all i, that is, the weights can be ignored.

In Table 3.4 observe the effect of increasing p on the relative contribution of the individual deviations: the larger the p, the greater the emphasis given to the largest of the deviations in forming the total. Ultimately, for p = ∞, the largest of the deviations completely dominates the distance determination. One can also see why the values p = 1, 2 and ∞ are strategically important: p = 1 implies the 'longest' distance between the two points in a geometric sense, since one has to go over the full extent of all deviations; this measure d1 is therefore often referred to as a 'city block' measure of distance. The shortest distance between any two points is a straight line, and this is achieved for p = 2. For p > 2 one has to consider distances that are based on even 'shorter' measures of distance than a straight line. How do decision makers interpret them, and why are they interested in them?

 p    (x_1^1 − x_1^2)^p    (x_2^1 − x_2^2)^p    total sum      d_p
 1            4                    3                 7         7.000
 2           16                    9                25         5.000
 3           64                   27                91         4.498
 5         1024                  243              1267         4.174
 ...        ...                  ...               ...          ...
 ∞          4^∞                  3^∞                ∞          4*

Table 3.4. Measurements of distance
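A few lines of code are enough to reproduce the distances of Table 3.4 for the points (8, 6) and (4, 3) with unit weights:

def dp(a, b, p):
    # distance of equation (3.4) with unit weights; p = inf gives the largest deviation
    devs = [abs(x - y) for x, y in zip(a, b)]
    return max(devs) if p == float("inf") else sum(d**p for d in devs)**(1.0 / p)

for p in (1, 2, 3, 5, float("inf")):
    print(p, dp((8, 6), (4, 3), p))
# 1 -> 7.0, 2 -> 5.0, 3 -> 4.498..., 5 -> 4.174..., inf -> 4.0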

Recall that distance is employed as a proxy measure for human preference and not only as a purely geometric concept. In multicriterial decision making, distance is used as a measure of resemblance, similarity, or proximity with respect to individual coordinates, dimensions, and attributes. Thus decision makers cannot narrow their attention to only one p, or even to the interval of geometrically intuitive measures 1 ≤ p ≤ 2.

The concepts of the ideal solution and of a set of compromise solutions were studied in Section 3.6. These concepts are fully applicable to the problems of mathematical programming, where a feasible set X is described indirectly through a set of functions serving as constraints.

Given a set of decision variables x = (x_1, . . . , x_n) and a set of constraint functions g_r(x) ≤ b_r, r = 1, . . . , m, the feasible set X is the set of decision variables x which satisfy the constraints, X = {x | g_r(x) ≤ b_r}. Let f_1, f_2, . . . , f_l denote the objective functions defined on X; that is, f_i(x) is the value achieved at x from X with respect to the i-th objective function (i = 1, . . . , l). Observe that f = (f_1, . . . , f_l) maps the n–dimensional set X into its l–dimensional image f(X) = Y. It is often useful to explain some concepts in terms of the objective (attribute) space Y rather than the decision space X.

It is straightforward to demonstrate the mapping from X to Y on a simple numerical and geometric example of linear multiobjective programming:

maximize    f_1(x) = 3 x_1 + x_2
            f_2(x) = x_1 + 2 x_2

subject to  x_1 + x_2 ≤ 7
            x_1 ≤ 5
            x_2 ≤ 5
            x_1, x_2 ≥ 0

In Figure 3.26 observe that f2 is maximized at point B = (2, 5), achieving the value f2(B) = 12; function f1 reaches its maximum at point C = (5, 2), with the value f1(C) = 17. The heavy boundary N of X denotes the set of nondominated solutions, including both corner points B and C. All other solutions in X are inferior to those in N.

Figure 3.26. Set of nondominated solutions N and the ideal x∗ in the decision space

The ideal solution, although infeasible, is x* = (4.4, 3.8); it would provide the maxima of both objective functions. Check that f1(x*) = 17 and f2(x*) = 12. Next, the points 0, A, B, C, D, and x* are translated into the corresponding value space Y = f(X), consisting of the points y = f(x) for all x from X.
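The corner points and the ideal point of this example can be checked numerically by maximizing f1 and f2 separately with a linear-programming solver, as in the sketch below.

import numpy as np
from scipy.optimize import linprog

A_ub = np.array([[1, 1], [1, 0], [0, 1]])   # x1 + x2 <= 7, x1 <= 5, x2 <= 5
b_ub = np.array([7, 5, 5])
bounds = [(0, None), (0, None)]

for c in ([-3, -1], [-1, -2]):              # linprog minimizes, so the objectives are negated
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    print(res.x, -res.fun)
# f1 is maximal at C = (5, 2) with f1 = 17; f2 is maximal at B = (2, 5) with f2 = 12,
# so the (infeasible) ideal point in the objective space is y* = (17, 12)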

In Figure 3.27, obtained by mapping Figure 3.26, y* = f(x*); that is, f(4.4, 3.8) = (17, 12). All other points of the polyhedron X are similarly translated. Observe that a point ȳ from Y is nondominated if there is no other y in Y such that y ≥ ȳ and y ≠ ȳ.

Any point y from Y is a compromise solution if it minimizes

d_p = \left[ \sum_{i=1}^{l} w_i^p\, (y_i^* − y_i)^p \right]^{1/p}

for some choice of weights w_i > 0, \sum w_i = 1, and 1 ≤ p ≤ ∞. It can be shown that each compromise solution satisfying these conditions is nondominated. For 1 < p < ∞ the compromise solutions are also unique.

Figure 3.27. Set of nondominated solutions f(N) and the ideal y∗ in the objective space

3.9.4 Physical Programming

Physical programming (PP) addresses the inherent multicriterial nature of design problems, since it provides a flexible approach to obtaining a satisficing solution while taking the designers' preferences into account deterministically. It captures the designers' physical preferences in forming the aggregate multiobjective function. The physical programming method (Messac, 1996) eliminates the need for weight setting in multicriterial optimization. It places the design process in a flexible and natural framework and completely removes the iterative adjustment of weights, providing the means for a direct expression of the designers' preferences.

In the physical programming method the design team specifies quantitative ranges of different degrees of desirability for each design objective, using three different classes. Each class comprises two cases, defined as soft (S) and crisp (C). As depicted in Figure 3.28, the soft preference classes with respect to each objective are referred to as follows:

• class 1-S: smaller–is–better;

• class 2-S: larger–is–better;

• class 3-S: centre–is–better.

For each criterion, a class function ḡi is established that constitutes a component of the multiobjective preference function to be minimized. A lower value of the class function is considered better than a higher value; the ideal value of the class function is zero. The preceding classification offers significantly more flexibility than the typical weighted-criterion approach, provided the proper shape of the soft curves can be determined.

Figure 3.28. Class–function classification

Class Functions

Physical programming exploits information which is already available to designers, who must know the desired features of the 'best solution'. Physical programming class functions are used as indicators of aspiration levels (Trincas, 2002).

Figure 3.29 depicts the qualitative meaning of each class. The value of the design metric under consideration, gi, is on the horizontal axis, and the function that will be minimized for that design metric, ḡi, hereby called the class function, is on the vertical axis. Each class comprises two cases, hard and soft, referring to the sharpness of the preference. All soft class functions become constituent components of the multiobjective function.

Under conventional design optimization approaches (for example, the weighted-sum approach), a design metric to which class 1-S or 2-S applies would generally become part of the multiobjective function with a multiplicative weight, while all the hard classes would simply become constraints. Handling the cases of class 3-S is a more difficult matter. One approach would be to use a positive or negative weight, depending on whether the current value of the pertaining design metric lies to the right or to the left of the most desired value during optimization. Choosing the right associated weights would involve significant trial and error. Physical programming removes this trial and error entirely by using 3-S class functions which essentially adapt to the current region in objective space during optimization. The shape of the class function depends on the stated preference of the decision maker.

Figure 3.29. Class–function ranges for the ith objective

Physical Programming Lexicon

As mentioned previously, physical programming allows designers to express preferences for each criterion with more specificity and flexibility than by simply using the terms minimize, maximize, greater than, less than, or equal to, as would be done in a conventional mathematical programming formalism. Physical programming circumvents the limitations of such a problem formulation framework by employing a new, expansive and flexible lexicon. The PP lexicon comprises terms that characterize the degree of desirability of six ranges for each generic criterion of classes 1-S and 2-S, and of ten ranges for class 3-S.

To illustrate the physical programming lexicon, consider the case of class 1-S, shown in Figure 3.29. The ranges are defined as follows, in order of decreasing preference:

• highly desirable (gi ≤ gi1): a range over which every value of the objective is considered ideal;

• desirable (gi1 ≤ gi ≤ gi2): a range that is desirable;

• tolerable (gi2 ≤ gi ≤ gi3): a range that is tolerable;

• undesirable (gi3 ≤ gi ≤ gi4): a range that is undesirable;

• highly undesirable (gi4 ≤ gi ≤ gi5): a range that is highly undesirable;

• unacceptable (gi ≥ gi5): this range is treated as infeasible.

These terms form the backbone of physical programming and bring new flexibility, rigor, and deliberateness to the design process. The shape of the class functions depends on the numerical values of the range targets. The targets gi1 through gi5 are physically meaningful values that are provided by the designer to quantify the preference associated with the i-th design objective. Further insight into these ranges can be gained by examining the generic shapes of the class functions (Fig. 3.29). Since the curve in the highly desirable range is nearly flat, any two points in that range are of a nearly equivalent desirability level.
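As a small illustration of the class 1-S lexicon, the sketch below classifies an objective value into one of the six ranges, given hypothetical targets gi1 < gi2 < ... < gi5.

def classify_1s(g, targets):
    # targets = (g_i1, g_i2, g_i3, g_i4, g_i5), strictly increasing, for a smaller-is-better metric
    labels = ["highly desirable", "desirable", "tolerable",
              "undesirable", "highly undesirable"]
    for label, upper in zip(labels, targets):
        if g <= upper:
            return label
    return "unacceptable"

targets = (10.0, 15.0, 20.0, 30.0, 40.0)   # hypothetical range boundaries, e.g. for a cost metric
for g in (8.0, 18.0, 35.0, 55.0):
    print(g, classify_1s(g, targets))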

The class functions map the design objectives into nondimensional, strictly positive real numbers. This mapping, in effect, transforms design objectives with different units and physical meanings onto a dimensionless scale through a unimodal function. Figure 3.29 illustrates the mathematical nature of the class functions and shows how they allow designers to express the preference ranges. Consider the class function 1-S, for which six ranges are defined. The designer specifies the parameters gi1 through gi5. When the value of the design objective, gi, is less than gi1 (highly desirable range), the class function expresses explicit indifference to further minimization. When, on the other hand, the value of the metric, gi, is between gi4 and gi5 (highly undesirable range), the value of the class function is large, calling for further, significant minimization of the class function. The behavior of the other class functions is indicated in Figure 3.29. Stated simply, the value of the class function for each design objective governs the optimization path in the objective space for that design metric.

In the case of class 1-S, the class function will seek to minimize its criterion only until the target value gi1 is reached. The decision maker can easily preclude the possibility of obtaining dominated solutions by setting gi1 to a value outside the feasible space, in order to exclude solutions in the ideal range. A similar discussion applies to the cases of class 2-S.

In cases where the designers only have to care about staying within some limits, the hard option applies. For a hard criterion, only two ranges are defined, that is, feasible and infeasible. It should be noted that all of the soft class functions become constituent components of the multiobjective function to be minimized, and that all of the hard class functions simply appear as crisp constraints, as will be seen in the linear physical programming (LPP) model. The quantity on the vertical axis, ḡi, is what will be minimized. LPP's effectiveness comes from its ability to shape the class function to suit the typically complex texture of the design performance. It should be emphasized that while choosing weights is difficult and undesirable, choosing target values is a welcome option: weights are physically meaningless, while targets are physically meaningful.

Mathematical Model

Since the weakest link in the MODM process is the development of the multiobjective function, making this step an effective phase of the design optimization process is a critical objective of the physical programming method. Another objective of PP is to simplify the application of computational optimization. To this end, physical programming brings user–friendliness to optimization, since it removes the weighting process and frees the decision maker from the details and intricacies of numerical optimization. A short description of the mathematical procedure for applying physical programming follows.

The implementation of physical programming defines the path from the design variables to the actual multiobjective function to be minimized via a nonlinear programming code. As stated before, the basic design process starts by choosing the design variables, which are then mapped into the design objectives, gi. Numerically, the goodness achieved by a design objective depends on the value achieved, on the class type assigned to the objective, and on the aspiration values associated with it (gi1 to gi5). The sum of all the class functions, which represent mappings of the design objectives, constitutes the multiobjective function (Messac, 1996).

Intra–objective preference

The intra–objective preference function is aimed at expressing the desiderata of the design team with respect to each objective. Some useful relationships resulting from the class–function properties are provided here (see Fig. 3.29).

• The value of the class function is always the same at each range boundary, for any class type; only the location of the boundary changes from objective to objective. As a consequence, as one travels across a given region (say, desirable), the change in the class function is always of the same magnitude. This behavior of the class function values at the boundaries is the critical factor that gives each range type the same numerical consequence for different criteria. This common behavior has a normalizing effect and results in favorable numerical conditioning properties.


The improvement that takes place as one travels across the kth region reads \bar g_k and is expressed by the relation
$$\bar g_k = \bar g_i(g^{+}_{ik}) = \bar g_i(g^{-}_{ik}) \,;\qquad 2 \le k \le 5 \,;\qquad \bar g_1 \approx 0 \tag{3.6}$$
where i and k denote a generic objective number and range intersection, while the superscripts '+' and '-' refer to class 1-S and class 2-S, respectively.

• The independence of the class function from both the considered objective and the region type determines the following condition
$$\bar g_k = \beta\,(n_{sc} - 1)\,\bar g_{k-1} \,;\qquad 2 \le k \le 5 \,;\qquad \beta > 1 \tag{3.7}$$
where n_{sc} denotes the number of soft criteria and β is a convexity parameter. To apply equation (3.7), a small positive number (say, 0.1) needs to be assigned to \bar g_2.

• Satisfying the convexity requirement yields
$$\tilde g^{+}_{ik} = g^{+}_{ik} - g^{+}_{i(k-1)} \,;\qquad \tilde g^{-}_{ik} = g^{-}_{ik} - g^{-}_{i(k-1)} \,;\qquad 2 \le k \le 5 \tag{3.8}$$
$$\bar g_1 > \bar g_i\,[\,g_{i1}\,] \tag{3.9}$$
where $\tilde g^{\pm}_{ik}$ is the length of the kth range of the ith objective.

The magnitude of the slope of the class function of the ith objective changes from range to range and takes the form
$$s^{\pm}_{ik} = \left(\frac{\partial \bar g^{\pm}_i}{\partial g_i}\right)_{g_i = g_{ik}}\!; \qquad s^{\pm}_{i(k-1)} = \left(\frac{\partial \bar g^{\pm}_i}{\partial g_i}\right)_{g_i = g_{i(k-1)}}\!; \qquad 2 \le k \le 5 \tag{3.10}$$

Rather than to the goal programming method, physical programming offers conceptual similarities to fuzzy optimization, where membership grade functions play a role similar to that of the class functions in physical programming.

Formulation of generic preference

Following Messac (1996), the class function in the ranges k > 1 can be expressed by means of a very flexible spline as
$$\bar g_{ik}(g_i) = T_0(\zeta_{ik})\,\bar g_{k-1} + T_1(\zeta_{ik})\,\bar g_{k} + \bar T_0(\zeta_{ik},\lambda_{ik})\,s_{i(k-1)} + \bar T_1(\zeta_{ik},\lambda_{ik})\,s_{ik}\,; \qquad k = 2,\dots,5 \tag{3.11}$$
where
$$T_0 = 0.5\,\zeta^4 - 0.5\,(\zeta-1)^4 - 2\,\zeta + 1.5$$
$$T_1 = -0.5\,\zeta^4 + 0.5\,(\zeta-1)^4 + 2\,\zeta - 0.5$$
$$\bar T_0 = \lambda\left[\,0.125\,\zeta^4 - 0.375\,(\zeta-1)^4 - 0.5\,\zeta + 1.5\,\right]$$
$$\bar T_1 = \lambda\left[\,0.375\,\zeta^4 - 0.125\,(\zeta-1)^4 - 0.5\,\zeta + 0.125\,\right]$$


$$\zeta_{ik} = \frac{g_i - g_{i(k-1)}}{g_{ik} - g_{i(k-1)}}\,; \qquad 0 \le \zeta_{ik} \le 1\,; \qquad \lambda_{ik} = g_{ik} - g_{i(k-1)}$$

For range 1, the class-function expression is given by an exponential function that reads
$$\bar g_{i1} = \bar g_1\,\exp\!\left[\frac{s_{i1}}{\bar g_1}\,(g_i - g_{i1})\right] \tag{3.12}$$

The quantities s_{ik} and s_{i(k-1)} are exactly the weights in the linear programming model of the class functions. In effect, equation (3.11) states that so long as all these weights are positive, the class function will be convex. The important point is to observe that convexity can always be satisfied by increasing the magnitude of the convexity parameter β through an iterative procedure.

Multiobjective preference function

Once the decision maker decides to which class the objective belongs and defines the range-target values, the intra-objective preference specification is complete. As to inter-objective preference, the worst objective is always treated first.

Finally, the physical programming mathematical model associated with the objective class functions takes the form of a multiobjective preference function
$$\min_{\mathbf{x}}\; G(\mathbf{x}) = \log\left\{\frac{1}{n_{sc}} \sum_{i=1}^{n_{sc}} \bar g_i\left[\,g_i(\mathbf{x})\,\right]\right\} \tag{3.13}$$
subject to
$$g_i(\mathbf{x}) \le g_{i5} \qquad \text{(for class 1-S)}$$
$$g_i(\mathbf{x}) \ge g_{i5} \qquad \text{(for class 2-S)}$$
$$g_{i5L} \le g_i(\mathbf{x}) \le g_{i5R} \qquad \text{(for class 3-S)}$$
while the crisp classes are treated as
$$g_i(\mathbf{x}) \le g_{i,\max} \qquad \text{(for class 1-C)}$$
$$g_i(\mathbf{x}) \ge g_{i,\min} \qquad \text{(for class 2-C)}$$
$$g_i(\mathbf{x}) = g_{i\nu} \qquad \text{(for class 3-C)}$$
$$x_{j,\min} < x_j < x_{j,\max} \qquad \text{(for variable constraints)}$$
where the subscript 'j' denotes a generic design variable, while $x_{j,\min}$, $x_{j,\max}$, $g_{i,\min}$, and $g_{i,\max}$ represent minimum and maximum values of the independent variables and of the crisp constraints, respectively, and the $g_{i\nu}$'s define the equality constraints.


Using the logarithmic operator in forming G(x) has the effect of mapping a domain that spans several orders of magnitude onto one that typically involves only one order of magnitude. The multiobjective function in equation (3.13) is the actual function that nonlinear programming codes minimize, with possible minor reassignments.
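As an illustration of how the aggregate preference function of equation (3.13) might be evaluated in practice, the following Python sketch uses a simplified piecewise-linear stand-in for the spline class functions of equation (3.11); the function names, the boundary values and the example targets are assumptions introduced only for this example, not part of the original formulation.

import numpy as np

def class_function_1s(g, targets, gbar=None):
    # Simplified piecewise-linear stand-in for a class 1-S (smaller-is-better)
    # class function: 'targets' are the range boundaries g_i1..g_i5, 'gbar' the
    # common class-function values at those boundaries (cf. Eqs. 3.6-3.7).
    targets = np.asarray(targets, dtype=float)
    if gbar is None:
        gbar = 0.1 * 4.0 ** np.arange(len(targets))   # geometric growth across ranges
    if g <= targets[0]:
        return 0.0                                    # ideal range contributes ~0
    return float(np.interp(g, targets, gbar))         # beyond g_i5 the value is clipped here

def preference_function(g_values, all_targets):
    # Aggregate preference G(x) of Eq. (3.13): logarithm of the average class function.
    nsc = len(g_values)
    total = sum(class_function_1s(g, t) for g, t in zip(g_values, all_targets))
    return np.log10(total / nsc)

# Two hypothetical soft criteria with their range boundaries g_i1..g_i5
targets = [[0.60, 0.65, 0.70, 0.75, 0.80],
           [10.0, 12.0, 15.0, 20.0, 30.0]]
print(preference_function([0.68, 14.0], targets))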

3.10 Advanced Decision Support Systems

Decision Support Systems (DSS) were originally developed at the beginning of the 1970s to aid managers in their decision-making processes. Various DSSs have been proposed over the past decades, mainly aimed at easing the decision makers' tasks in the decision-making process. They cover a wide variety of systems, tools and technologies, and integrate them into a computer system to facilitate decision making. Several definitions were given to the term in the early days after it emerged. Keen and Scott-Morton (1978) proposed the following classic definition: DSSs combine the intellectual abilities of humans with the abilities of computer systems in order to improve the quality of the decisions made; DSSs are computer-based systems that are used in order to support decision makers in ill-structured problems. This definition was extended by Sage (1991) and Adelman (1992) to the following final formulation: Decision Support Systems are interactive computer-based systems (software), which use analytical methods such as decision analysis, optimization algorithms, etc., in order to develop appropriate models that will support decision makers in the formulation of alternative solutions, the resolution of the reactions amongst them, their representation, and finally in the choice of the most appropriate solution to be implemented.

Therefore, a DSS is a computer-based information system that uses data and multicriterial decision-making (MADM and MODM) models to organize information for decision situations and to interact with decision makers so as to expand their horizons. It greatly alleviates the decision maker's burden in dealing with problems which are semi-structured or ill-structured, and supports all the phases of a decision-making process. In addition, such systems are able to store and process a large amount of knowledge at much higher speed than the human mind, and can therefore considerably improve the quality of decision making.

3.10.1 Distributed Decision Support Systems

In today's engineering design, decision makers seldom make decisions alone, since decision-making problems are becoming more and more complicated. This complexity inspires the idea of decomposing the complex decision-making problem into partial problems and handling each of them by a different group of experts. This motivation has resulted in the emergence of Distributed Decision Support Systems (DDSS), specific DSSs devised to handle Distributed Decision Making (DDM) situations. DDM is defined as a decision-making process in which the participating people own different specialized knowledge, execute different specialized tasks, and communicate with each other through a computer environment which aims at supporting the entire process (Chi and Turban, 1995).


With the development of IT, the utilization of DDM has expanded dramatically. Information technologies provide on-line and real-time capabilities through which DDM can be carried out easily and efficiently, because they offer immediate response and easy information exchange. Most current information systems provide such capabilities and can be characterized as distributed on-line systems. More recently, web-based DSSs are viewed as clients linked to a server hosting the DSS application; they have great potential to inspire new distributed, cooperative or collaborative decision support strategies impacting the very core structures of DSSs.

3.10.2 Artificial Intelligence

After the abacus, one of the earliest calculating devices, appeared in antiquity, the ability to mechanize the algebraic process intrigued humans until great progress was made in the nineteenth century, when Charles Babbage designed his mechanical computing engines. Just after the Second World War the digital computer was rapidly employed in many areas and alleviated some of the onerous and tedious work that people were engaged in. At almost the same time, researchers began making efforts to create machines with some sort of intelligence. In 1950, Alan Turing, widely regarded as a founding figure of Artificial Intelligence (AI), presented the famous Turing test to show that it is possible for a machine to think like a human being. Artificial intelligence became an area of computer science that focuses on making intelligent machines, especially intelligent computer programs, that can engage in behaviors that humans consider intelligent. Today, with the rapid upgrading of computers and sixty years of research, AI has been utilized in various fields, such as decision making, game playing, computer vision, speech recognition, expert systems and so on.

Expert systems (ES) are viewed as the best-known application field of artificial intelligence. ESs are problem-solving programs that combine the knowledge of human experts and mimic the way human experts reason. The goal of an expert system is to emulate the problem-solving process of an expert whose knowledge was used in developing the system.

Figure 3.30. Typical structure of an expert system

Figure 3.30 presents the typical structure of an expert system, which consists of three modules: user interface, inference engine, and knowledge base. The operation procedure starts with the user querying a task through the user interface. After receiving the query from the user, the inference engine manipulates and uses the information in the knowledge base to form a line of reasoning.


The response is then provided by the expert system via the user interface. Further input information may be requested from the user until the system reaches a desired solution.

The user interface allows the user to interact with the expert system to accomplish a certain task. It manages the interaction between the system and the users, which can take place through menus, natural language or any other type of data exchange. A user can be (i) an expert who maintains and develops the system, (ii) an engineer who employs the system to solve his/her specific problem, or (iii) a technician who is trained in the problem-solving procedure.

The inference engine is the control mechanism that applies the information present in the knowledge base to task-specific data in order to arrive at a decision. It organizes and controls the steps taken to solve the problem. The most widely used problem-solving formalism at this point is IF-THEN rules, and the expert systems that use such rules for reasoning are called rule-based systems. In rule-based systems, inference engines utilize the idea that if the condition holds then the conclusion holds to form a line of reasoning. There are a few techniques for drawing inferences from a knowledge base, such as forward chaining, backward chaining and tree search. Forward chaining starts from a set of conditions and moves towards a conclusion, while backward chaining starts from the conclusion and tries to find a path that leads to it. Tree search is applied when the knowledge base is represented by a tree, and the reasoning process is performed by checking the nodes around the initial node until a terminal node is found.
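To make the rule-based reasoning just described more concrete, the following minimal sketch implements naive forward chaining over IF-THEN rules; the rules and facts, which mimic a small machinery-diagnosis case, are invented purely for illustration and are not taken from any expert system mentioned in the text.

# Each rule is (set_of_conditions, conclusion): IF all conditions hold THEN the conclusion holds.
rules = [
    ({"high vibration", "high shaft speed"}, "propeller cavitation suspected"),
    ({"propeller cavitation suspected"}, "reduce shaft speed"),
]

def forward_chain(facts, rules):
    # Repeatedly fire rules whose conditions are satisfied until no new fact is derived.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)      # the rule fires; its conclusion becomes a new fact
                changed = True
    return facts

print(forward_chain({"high vibration", "high shaft speed"}, rules))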

The knowledge base is the core of the advisory system. Its main purpose is to provide the connections between ideas, concepts, and statistical probabilities that allow the inference engine to perform an accurate evaluation of a problem. The knowledge base stores facts and rules, which include both factual and heuristic knowledge and support the judgment and reasoning of the inference engine.


Appendix

Terminology for MCDM Problems

There are many terms commonly used in the MCDM literature, such as alternatives, criteria, attributes, objectives, goals, decision matrix and so on. There are no universal definitions of these terms, since some authors make distinctions in their usage, while many use them interchangeably.

Alternatives

Alternatives are the finite set of different solutions which are available to the decision maker.

Criteria

Criteria are a measure of effectiveness of performance. They are the basis by which the performance of a design is evaluated. Criteria may be hard (constraints) or soft (attributes), according to the requirements analysis and the actual problem setting.

Attributes

Attributes are generally referred to as designed-to criteria that describe the performance, properties, and the like, of a technical system (size, weight, range, speed, payload, reliability, cost, etc.). They provide a means of evaluating the levels of aspiration achieved on various targets; that is why they are often referred to as soft constraints. Each design alternative can be characterized by a number of attributes chosen by the decision maker. Although most criteria are structured on a single level, sometimes, when there is a large number of them, the structure is based on a hierarchical composition.

Decision Variables

A decision variable is one of the specific choices made by a decision maker. For example, the weight of an industrial product is a decision variable.

Constraints

Constraints are temporarily fixed requirements on attributes and decision variables which cannot be violated in a given problem formulation; that is, upper and lower bounds cannot be exceeded, and strict requirements must be satisfied precisely. Constraints divide all possible solutions (combinations of variables) into two groups: feasible and infeasible. They are crude yes-or-no requirements, which can be either satisfied or not satisfied.

Decision Weights

Most decision-making methods assign weights of importance to the criteria. To represent the relative importance of the attributes, the weighting vector is given as w = (w_1, w_2, ..., w_n). The attribute weights are cardinal weights that represent the decision maker's absolute preference.

Objectives

Objectives are unbounded, directionally specified (maximization/minimization) requirements which are to be pursued to the greatest extent possible. It is very likely that objectives will conflict with each other, in that improved achievement of one objective can only be accomplished at the expense of another. They generally indicate the desired direction of change, i.e. the direction in which to strive to do better as perceived by the decision maker. No particular value of an objective is set a priori as a reference point; only its maximum/minimum is sought within the limits of feasibility determined by constraints and goals.


Goals

Goals (synonymous with targets) are useful for clearly identifying a level of achievement to strive toward, to overcome, or not to exceed. They are often referred to as hard constraints because they are fixed so as to limit and restrict the set of feasible solutions among the generated alternatives. In many cases, the terms objective and goal are used interchangeably. Goals are temporarily fixed requirements which are to be satisfied as closely as possible in a given problem formulation; that is, upper and lower bounds as well as fixed requirements are to be approached as closely as possible. Goals allow for fine tuning through their control over the degree of satisfaction.

Decision matrix

Common to many MCDM techniques is the concept of decision matrix, also called comparison matrix, goal decision matrix, or project impact matrix. It concisely indicates both the set of alternatives and the set of attributes being considered in a given problem. A decision matrix D is an (m × n) matrix in which the element x_{ij} represents the 'score' or 'performance rating' of the alternative A_i, i = 1, 2, ..., m, with respect to the attribute x_j, j = 1, 2, ..., n. Hence an alternative is denoted by a row vector
$$\mathbf{x}_i = (x_{i1}, x_{i2}, \dots, x_{in})$$
whereas a column vector of attributes
$$\mathbf{x}_j = (x_{1j}, x_{2j}, \dots, x_{mj})^T$$
shows the contrast of each alternative with respect to attribute x_j.

When expressed in numerical terms, the element x_{ij} is commonly termed the jth attribute value for alternative i.

Classification of MCDM Solutions

MCDM problems do not always have a unique solution. Depending on their nature, different names are given to different types of solutions.

Optimal solution

An optimal solution to an MCDM problem is one which results in the maximum value of each of the attribute or objective functions simultaneously. That is, x* is an optimal solution to the problem iff x* ∈ X and f(x*) ≥ f(x) for all x ∈ X.

Since it is in the nature of MCDM criteria to conflict with each other, usually there is no optimal solution to an MCDM problem.

Ideal solution

The concept of the ideal solution is essential to the multicriterial decision-making approach. An ideal solution may also be indicated as optimal solution, superior solution, or utopia.

In a MADM problem the ideal solution A* to the decision problem is a hypothetical alternative whose Cartesian product combines the best achievements of all attributes given in the decision matrix. Formally,
$$A^{*} = (x^{*}_1, x^{*}_2, \dots, x^{*}_j, \dots, x^{*}_n)$$
$$x^{*}_j = \max_{i}\; U_j(x_{ij})\,, \qquad i = 1, 2, \dots, m$$
where U(·) indicates the value/utility function or the membership grade of the jth attribute.


In a MODM problem,
$$\max_{\mathbf{x}\in X}\; [\,f_1(\mathbf{x}), f_2(\mathbf{x}), \dots, f_k(\mathbf{x})\,]\,, \qquad X = \{\,\mathbf{x} \mid g_i(\mathbf{x}) \le 0,\; i = 1, 2, \dots, m\,\}$$
the ideal solution is the one that optimizes each objective function simultaneously, i.e.
$$\max_{\mathbf{x}\in X}\; f_j(\mathbf{x})\,, \qquad j = 1, 2, \dots, k$$
An ideal solution can then be defined as
$$A^{*} = (f^{*}_1, f^{*}_2, \dots, f^{*}_j, \dots, f^{*}_k)$$
where f*_j is the best feasible value attainable for the jth attribute (objective) function when it is optimized individually. The ideal solution itself is generally infeasible; if it were not, then there would be no conflict among the objectives.

Though an ideal solution does not actually exist, the concept of an ideal solution is essential in the development of MCDM methods. For example, a compromise model is based on the idea of obtaining the 'best possible solution', which is the one closest to the ideal solution.

It should be noted that an ideal solution in MADM problems is driven by the existing solutions. On the other hand, in a MODM environment the objective ideal is the best solution that any alternative could possibly obtain. Hence locating an ideal solution is one of the topics in MADM study if a decision maker uses nonmonotonic value/utility functions or membership grade functions.

Nondominated solution

This solution is named differently by different disciplines: non-inferior solution and efficient solution in MCDM, a set of admissible alternatives in statistical decision theory, and Pareto-optimal solution in economics.

A feasible solution x* in MCDM is called a nondominated solution if and only if there exists no other feasible solution that will yield an improvement in one attribute without causing a degradation in at least another attribute. In other words, a nondominated solution is achieved when no attribute can be improved without simultaneous detriment to at least one other attribute.

The nondominated-solution concept is well suited to the second-level screening process in MADM. However, the generation of a large number of nondominated solutions significantly reduces its effectiveness in screening the feasible solutions of MODM problems; rather, the nondominated concept is used as a sufficient condition on the final solution.

Satisficing solution

A satisficing solution (Simon, 1955) is a reduced subset of feasible solutions in which each alternative exceeds all the aspiration levels of each attribute. Satisficing solutions need not be nondominated. A satisficing solution may well be used as the final solution, though satisficing is often utilized for screening out infeasible solutions. Whether a solution is satisficing depends on the level of knowledge and ability of the decision maker.

Preferred solution

The preferred solution, which is a nondominated solution, is the solution most satisfactory to the decision maker. Under this view, MCDM methods can be regarded as decision aids for reaching the preferred solution, on condition that the subjective preferences of the decision maker are respected.


Bibliography

[1] Adelman, L.: Evaluating Decision Support and Expert Systems, John Wiley & Sons, New York, 1992.

[2] Asimow, M.: Introduction to Design, Prentice–Hall, Englewood Cliffs, 1962.

[3] Box, G.E.P., Draper, N.R.: Empirical Model-Building and Response Surfaces, John Wiley & Sons, New York, 1987.

[4] Calpine, H.C., Golding, A.: Some Properties of Pareto–Optimal Choices in Decision Problems, OMEGA, 1976, Vol. 4, no. 1, pp. 141–147.

[5] Clemen, R.T.: Making Hard Decisions – An Introduction to Decision Analysis, 2nd Edition, Duxbury Press, Pacific Grove, 1995.

[6] Coombs, C.H.: On the Use of Inconsistency of Preferences in Psychological Measurement, Journal of Experimental Psychology, Vol. 55, 1958, pp. 1–7.

[7] Dieter, G.E.: Engineering Design, McGraw–Hill, Boston, 2000.

[8] Edwards, W., Barron, F.H.: SMARTS and SMARTER: Improved Simple Methods for Multiattribute Utility Measurement, Organizational Behavior and Human Decision Processes, 1994, Vol. 60, pp. 306–325.

[9] Geoffrion, A.M.: Solving Bicriterion Mathematical Programs, Operations Research, Vol. 15, no. 1, 1967, pp. 39–54.

[10] Grubisic, I., Zanic, V., Trincas, G.: Sensitivity of Multiattribute Design to Economic Environment: Shortsea Ro-Ro Vessels, Proceedings, 6th International Marine Design Conference, IMDC'97, Newcastle-upon-Tyne, 1997, pp. 201–216.

[11] Hazelrigg, G.A.: Systems Engineering: An Approach to Information–Based Design, Prentice–Hall, Upper Saddle River, 1996.

[12] Hey, J.D.: Uncertainty in Microeconomics, Martin Robertson, Oxford, 1979.

[13] Hwang, C.L., Yoon, K.: Multiple Attribute Decision Making: Methods and Applications – A State-of-the-Art Survey, Springer–Verlag, Berlin–Heidelberg, 1981.

[14] Ignizio, J.P.: Generalized Goal Programming: An Overview, Computers & Operations Research, Vol. 5, no. 3, 1983, pp. 179–197.

[15] Keen, P.G.W., Scott–Morton, M.S.: Decision Support Systems: An Organizational Perspective, Addison–Wesley, Reading, MA, 1978.

[16] Keeney, R.: Siting Energy Facilities, Academic Press, New York, 1980.

[17] Keeney, R., Raiffa, H.: Decisions with Multiple Objectives: Preferences and Value Tradeoffs, John Wiley & Sons, New York, 1976.

[18] Kesselring, F.: Technical–Economic Designing, VDI–Zeitschrift, 1964, Vol. 106, no. 30, pp. 1530–1532.

[19] Lee, D., Kim, S.Y.: Techno-Economic Optimization of an LNG Carrier with Multicriteria in Preliminary Design Stage, Journal of Ship Production, 1996, Vol. 12, no. 3, pp. 141–152.

[20] Li, Y., Mavris, D.N., De Laurentis, D.A.: The Investigation of a Decision–Making Technique Using the Loss Function, Proceedings, 4th AIAA Aviation Technology, Integration, and Operations (ATIO) Forum, AIAA-2004-6205, 2004.

[21] Messac, A.: Physical Programming: Effective Optimization for Design, AIAA Journal, Vol. 34, no. 1, 1996, pp. 149–158.

[22] Mistree, F., Smith, W.F., Kamal, S.Z., Bras, B.A.: Designing Decisions: Axioms, Models and Marine Applications, Proceedings, 4th International Marine Systems Design Conference, IMSDC'91, SNAJ, Kobe, 1991, pp. 1–24.

[23] Morgenstern, O.: Thirteen Critical Points in Contemporary Economic Theory – An Interpretation, Journal of Economic Literature, Vol. 10, 1972, pp. 1163–1189.

[24] Moskowitz, H., Wright, G.P.: Operations Research Techniques for Management, Prentice–Hall, 1979.

[25] Neumann, J. von, Morgenstern, O.: Theory of Games and Economic Behavior, Princeton University Press, Princeton, 1944.

[26] Pareto, V.: Manuale di Economia Politica, con una Introduzione alla Scienza Sociale, Società Editrice Libraria, Milano, 1906.

[27] Ray, T., Sha, O.P.: Multicriteria Optimization Model for a Containership Design, Marine Technology, Vol. 29, no. 4, 1994, pp. 258–268.

[28] Roy, B.: Classement et choix en presence de points de vue multiple (la methode ELECTRE), RAIRO, Vol. 2, 1968, pp. 57–75.

[29] Roy, B.: Vers une methodologie generale d'aide a la decision, METRA, Vol. XIV, no. 3, 1975, pp. 459–497.

[30] Roy, B.: A Conceptual Framework for a Prescriptive Theory of 'Decision Aid', in 'Multiple Criteria Decision Making', TIMS Studies in the Management Sciences, Starr, Zeleny eds., North–Holland Publishing, Vol. 6, 1977, pp. 179–210.

[31] Roy, B.: Partial Preference Analysis and Decision–Aid: the Fuzzy Outranking Concept, in 'Conflicting Objectives in Decisions', John Wiley & Sons, New York, 1994.

[32] Roy, B., Vincke, P.: Pseudo-merites et systemes relationnels de preferences – nouveaux concepts et nouveaux resultats en vue de l'aide a la decision, Universite de Paris–Dauphine, Cahiers du LAMSADE, no. 28, 1980.

[33] Saaty, T.L.: A Scaling Method for Priorities in Hierarchical Structures, Journal of Mathematical Psychology, 1977, Vol. 15, no. 3, pp. 234–281.

[34] Saaty, T.L.: The Analytic Hierarchy Process, McGraw–Hill, New York, 1980.

[35] Sage, A.P.: Decision Support Systems Engineering, John Wiley & Sons, New York, 1991.

[36] Sen, P.: Marine Design: The Multiple Criteria Approach, Transactions RINA, Vol. 122, 1992.

[37] Sen, P., Yang, J.B.: Multiple Criteria Decision Support in Engineering Design, Springer–Verlag, London, 1998.

[38] Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication, The University of Illinois Press, Urbana, Ill., 1947.

[39] Simon, H.A.: A Behavioral Model of Rational Choice, Quarterly Journal of Economics, 1955, Vol. 69, no. 1, pp. 99–114.

[40] Stadler, W.E.: Fundamentals of Multicriteria Optimization, in 'Multicriteria Optimization in Engineering and in the Sciences', Plenum Press, New York, 1988.

[41] Starr, M.K., Greenwood, L.H.: Normative Generation of Alternatives with Multiple Criteria Evaluation, in 'Multiple Criteria Decision Making', Starr, Zeleny eds., North–Holland, New York, 1977, pp. 111–128.

[42] Trincas, G., Zanic, V., Grubisic, I.: Optimization Procedure for Preliminary Design of Fishing Vessels, Proceedings, 2nd Symposium on 'Technics and Technology in Fishing Vessels', Ancona, 1989, pp. 22–31.

[43] Trincas, G.: Addressing Robust Concept Ship Design by Physical Programming (invited paper), Proceedings, Fourth International Conference on Marine Industry, Barudov, Bogdanov eds., Varna, Vol. II, 2002, pp. 29–38.

[44] Trincas, G., Zanic, V., Grubisic, I.: Comprehensive Concept Design of Fast Ro–Ro Ships by Multiattribute Decision-Making, Proceedings, 5th International Marine Design Conference, IMDC'94, Delft, 1994, pp. 403–418.

[45] Winterfeldt, D. von, Edwards, W.: Decision Analysis and Behavioral Research, Cambridge University Press, Cambridge, 1986.

[46] Winston, W.: Operations Research: Applications and Algorithms, PWS–Kent, Boston.

[47] Yu, P.L.: A Class of Solutions for Group Decision Problems, Management Science, 1973, Vol. 19, no. 8, pp. 836–946.

[48] Yu, P.L., Zeleny, M.: The Set of All Nondominated Solutions in Linear Cases and a Multicriteria Simplex Method, Journal of Mathematical Analysis and Applications, 1975, Vol. 49, pp. 430–468.

[49] Zadeh, L.A.: Outline of a New Approach to the Analysis of Complex Systems and Decision Processes, in 'Multiple Criteria Decision Making', Cochrane, Zeleny eds., University of South Carolina Press, Columbia, 1973, pp. 686–725.

[50] Zeleny, M.: A Concept of Compromise Solutions and the Method of the Displaced Ideal, Computers & Operations Research, Vol. 1, no. 4, 1974, pp. 479–496.

[51] Zeleny, M.: Linear Multiobjective Programming, Springer–Verlag, Berlin–Heidelberg, 1974.

[52] Zeleny, M.: Multiple Criteria Decision Making, McGraw–Hill, New York, 1982.

[53] Zanic, V., Grubisic, I., Trincas, G.: Multiattribute Decision-Making System Based on Random Generation of Nondominated Solutions: An Application to Fishing Vessel Design, in 'Practical Design of Ships and Mobile Units', PRADS'92, Caldwell, Ward eds., Elsevier Applied Science, Vol. 2, 1992, pp. 1443–1460.

[54] Zanic, V., Grubisic, I., Trincas, G.: Mathematical Models for Ship Concept Design, Proceedings, Eighth Congress of the International Maritime Association of the Mediterranean, IMAM'97, Istanbul, 1997, Vol. 1, pp. 5.1-7–5.1-16.


Chapter 4

Multiattribute Solution Methods

A complex technical system requires complex decisions. To assist in this difficult task, decision support methods have been developed. They try to supersede heuristic choices based on experience or intuition and to allow decisions based on scientifically grounded arguments. A decision support method tries to model the preference system in the mind of the decision maker.

Among decision support methods, multiattribute decision making (MADM) deals with the methodologies for selecting from among different design alternatives associated with multiple attributes. Although only the last decades have seen the effort to introduce the concept of multiple criteria into the normative decision-making process, studies on multiple criteria have a long tradition. Since MADM has found acceptance in the business sector, management science, economics, psychometrics, marketing, applied statistics, decision theory, and so on, the discipline has created several selection methodologies. Consequently, each area has developed methods for its own particular applications, mostly to explain, rationalize, understand, or predict decision behavior, but not to guide the decision-making process.

Multiattribute decision-making techniques can partially or completely rank the alternatives: a single most 'preferred alternative' can be identified, or a short list of a limited number of 'best possible alternatives' can be selected for subsequent detailed appraisal.

In discrete-alternative multiattribute decision-making problems, an integrated procedure for decision making can be the following:

1. identification of necessary attributes for the problem;

2. elicitation of weights to attributes by individual;

3. allocation of weights to attributes by group consensus;

4. ranking alternatives by individual;

5. screening alternatives by group for the final decision;

6. choosing the most preferred alternative.


Multiattribute decision making is defined in a narrow sense as a decision aid to help a decision maker identify the 'best possible alternative' that maximizes his/her satisfaction with respect to more than one attribute.

Various approaches have been developed. Some of the methods, particularly those from the psychology literature, are oriented toward describing the process by which such decisions are made, in order to better understand them and to predict actions and choices in future decision situations. Other approaches, particularly those from the management science literature, are directed toward providing decision makers with practical techniques which can be used to improve their decision making. Intermediate methods are more suitable in the engineering field, where models are structured in terms of the designer's actual choice behavior but are then used normatively. MADM methods apply to problems where a decision maker is choosing or ranking a finite number of alternatives which are measured by two or more relevant attributes.

Most multiattribute decision-making techniques for problem solving on discrete alternatives focus on value evaluation, such as setting standards for evaluation attributes, assigning a weight to each attribute, grading each alternative under individual or group criteria, synthesizing utilities, and ranking alternatives. These techniques usually assume that the set of attributes is predefined, or that some kind of agreement exists before the MADM solving procedure starts.

The decision maker's representation of the attributes and constraints serves as input to his/her actual model of the conceptual design process. The MADM modelling for outranking and selection starts at the stage where the decision situation has been formulated and the nondominated alternatives have been identified. The general concepts of dominance structures and nondominated solutions play an important role in describing the decision problems and the decision maker's revealed preferences. Usually, there exist a number of Pareto-optimal solutions, which are considered as candidates for the final decision-making solution. The nondominated designs, together with their attribute levels, form the decision matrix, which is the concise expression of a MADM problem.

4.1 Decision Matrix

A multiattribute decision-making problem can generally be characterized by a decision matrix which indicates both the set of alternatives and the set of attributes being considered in a given problem. The decision matrix summarizes the 'raw' data available to the decision maker at the start of the analysis. A decision matrix has a row corresponding to each alternative being considered and a column corresponding to each attribute being considered. A problem with a total of m alternatives characterized by n attributes is described by an m × n matrix A, as shown in Figure 4.1. Each element of the matrix is the 'score' or 'performance rating' of that row's alternative with respect to that column's attribute, and can be stated either numerically or verbally. When expressed in numerical terms, the element a_{ij} is commonly termed the jth attribute value for the ith alternative.


Figure 4.1. The comparison matrix

The square comparison matrix A is a 'reciprocal matrix' with all positive elements. Its elements a_{ij} compare the alternatives i and attributes j of a decision problem. These elements are said to be consistent if they respect the transitivity rule
$$a_{ij} = a_{ik} \cdot a_{kj} \tag{4.1}$$
together with the reciprocity rule
$$a_{ij} = \frac{1}{a_{ji}} \tag{4.2}$$
where j > k > i are any alternatives of the comparison matrix¹.
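As a small illustration of rules (4.1) and (4.2), the sketch below checks reciprocity and transitivity of a positive pairwise comparison matrix; the tolerance and the example judgements are arbitrary assumptions introduced only for this example.

import numpy as np

def is_consistent(A, tol=1e-6):
    # Check the reciprocity rule a_ij = 1/a_ji and the transitivity rule a_ij = a_ik * a_kj.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    reciprocal = np.allclose(A * A.T, np.ones((n, n)), atol=tol)
    transitive = all(abs(A[i, j] - A[i, k] * A[k, j]) < tol
                     for i in range(n) for j in range(n) for k in range(n))
    return reciprocal and transitive

w = np.array([0.5, 0.3, 0.2])
A = np.outer(w, 1.0 / w)        # perfectly consistent matrix built from the ratios w_i / w_j
print(is_consistent(A))         # True
A[0, 1] = 3.0                   # perturb one judgement: reciprocity is now violated
print(is_consistent(A))         # False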

The multiattribute decision-making approaches can be viewed as alternative methods for combining the information in a problem's decision matrix with additional information from the decision maker in order to determine a final ranking, screening, or selection from the alternatives. Besides the information contained in the decision matrix, all but the simplest MADM techniques require additional information from the decision maker in order to arrive at a final ranking, screening, or selection. For example, the decision matrix provides no information about the relative importance of the different attributes to the decision maker, nor about any minimum acceptable, maximum acceptable, or target values for particular attributes.

It is important that the decision matrix include only those attributes which vary significantly among one or more alternatives and for which the decision maker considers this variation to be important. Some attributes may be important as threshold criteria (constraints), in that alternatives are excluded from further consideration if they do not meet the threshold requirement. But with respect to such constraints, variation among alternatives that all pass the screening requirement is irrelevant. In such cases these attributes should not be included in the decision matrix after the screening stage of the analysis.

¹ A comparison matrix is reciprocal because its lower triangular part is the reciprocal of its upper triangular part and all the elements of the principal diagonal are equal to 1. Therefore a transitivity test on one of the two parts of the matrix is sufficient; hence, for each element a_{ij}, a number j − (i + 1) of equations (4.1) have to be respected.


To overcome these limitations, research is still dealing with the model choice problem, that is, with the systematic analysis of decision procedures to establish whether one method of multicriterial decision making makes more sense than another for a specific problem.

4.2 Measuring Attribute Importance

One crucial problem in MADM is to assess the relative importance, or weights, of the different attributes, which should be consistent with the design strategy. If the decision problem entails several conflicting attributes, experience has shown that not all of them are of equal importance to the customer. The importance of preferences among attributes should not be underestimated, since a multiattribute decision-making procedure can produce different designs depending on the way attribute information is processed. Weights may represent the opinion of a single decision maker or synthesize the opinions of a group of experts using a group decision technique. In other terms, which attribute is more important than another is often not for the individual decision maker to decide, but must be part of a comprehensive analysis by a group of decision makers.

Designers may determine the weighting coefficients on the basis of their perception, which implies that they are subject to incomplete information and subjective judgement.

In the case of n attributes, the vector of attribute weights reads
$$\mathbf{w} = \{\,w_1, w_2, \dots, w_j, \dots, w_n\,\}^T$$
where
$$\sum_{j=1}^{n} w_j = 1$$

Typically, the method used for assigning weights to attributes depends primarily on the nature of the problem and on the available information. Several approaches have been proposed to determine weights. They may be distinguished into two types, i.e. subjective weights and objective weights.

The subjective weights are supposed to convey the strategic task as perceived by the decision maker. They form the inter-attribute preferences and correspond to situations where the data of the decision matrix, in terms of a set of alternative solutions and the values of the associated decision attributes, are unknown.

The objective weights are derived from a computational analysis of the attribute space. Therefore they are intrinsic properties of the attribute space (intra-attribute preferences). When the decision matrix information is available, methods such as the 'entropy method' and the 'linear programming technique' can be used to assess the weights of the attributes.

Apart from the oldest method, named mean of the normalized values, four methods are available for assessing cardinal weights in the MADM environment:

• the eigenvector method;


• the weighted least-square method;
• the entropy method;
• the LINMAP method.

Among these four methods, the entropy and LINMAP methods both require the decision matrix to be part of the input. However, at the design stage, the requirement is to use the weights to find the best alternative, and not to choose the best one from an enumeration of a set of alternatives. Therefore, these methods cannot be used in conjunction with the MODM environment.

The weighted least-square and eigenvector methods are based on the so-called fundamental scale concept of the MADM environment. They can be used only if the information about the relative importance of each attribute over another is known. This information is represented in a square matrix, termed the pairwise comparison matrix.

4.2.1 Mean of the Normalized Values

This is the oldest method and is based on three steps to derive the priorities as follows:

• sum of the elements of column j;
• normalization of column j;
• mean of row i.

The 'mean of the normalized values' calculates exact priorities only for consistent matrices. In the case of inconsistent matrices, this method cannot be mathematically justified; no theory is known for inconsistent matrices.
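A minimal sketch of the 'mean of the normalized values' procedure listed above is given below, applied to a hypothetical pairwise comparison matrix; the example matrix is an assumption made only for illustration.

import numpy as np

def mean_of_normalized_values(A):
    # (i) sum each column, (ii) normalize each column, (iii) average each row.
    A = np.asarray(A, dtype=float)
    normalized = A / A.sum(axis=0)
    return normalized.mean(axis=1)

# Consistent 3x3 comparison matrix built from the ratios 0.5 : 0.3 : 0.2
A = [[1.0, 5/3, 5/2],
     [3/5, 1.0, 3/2],
     [2/5, 2/3, 1.0]]
print(mean_of_normalized_values(A))    # approximately [0.5, 0.3, 0.2]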

4.2.2 Eigenvector Method

The decision maker is supposed to judge the relative importance of n attributes. Saaty (1977) introduced a method of scaling ratios using the principal eigenvalue of a positive pairwise comparison matrix, which has to respect the principle of consistency. Let the positive pairwise comparison matrix A be

$$A = \begin{array}{c} A_1 \\ A_2 \\ \vdots \\ A_m \end{array}
\left[\begin{array}{cccc}
a_{11} & a_{12} & \dots & a_{1n}\\
a_{21} & a_{22} & \dots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{array}\right]
= \left[\begin{array}{cccc}
w_1/w_1 & w_1/w_2 & \dots & w_1/w_n\\
w_2/w_1 & w_2/w_2 & \dots & w_2/w_n\\
\vdots & \vdots & & \vdots\\
w_m/w_1 & w_m/w_2 & \dots & w_m/w_n
\end{array}\right] \tag{4.3}$$
where A_1, A_2, ..., A_m are the design alternatives among which the decision maker has to choose, and X_1, X_2, ..., X_n are the attributes with which the performance of each alternative is measured, while a_{ij} is the rating of design A_i with respect to attribute X_j, and w_j denotes the weight of attribute X_j. The number of independent pairwise evaluations is n(n − 1)/2.


Multiplying A by the desired priority vector w yields
$$A \cdot \mathbf{w} =
\left[\begin{array}{cccc}
w_1/w_1 & w_1/w_2 & \dots & w_1/w_n\\
w_2/w_1 & w_2/w_2 & \dots & w_2/w_n\\
\vdots & \vdots & & \vdots\\
w_m/w_1 & w_m/w_2 & \dots & w_m/w_n
\end{array}\right]
\cdot
\left[\begin{array}{c} w_1\\ w_2\\ \vdots\\ w_n \end{array}\right]
= \lambda \left[\begin{array}{c} w_1\\ w_2\\ \vdots\\ w_n \end{array}\right]
= \lambda \cdot \mathbf{w}$$
or
$$(A - \lambda\,I) \cdot \mathbf{w} = 0 \tag{4.4}$$
where λ is the eigenvalue associated with the eigenvector w, while I is the unit matrix; the eigenvector may be normalized and used as the vector of relative attribute weights.

Due to the consistency property of equation (4.2), the system of homogeneous linear equations (4.4) has only trivial solutions unless some imprecision in these evaluations is allowed.

In general, as the precise value of w_i/w_j is difficult to assess, the decision maker's evaluations cannot be so accurate as to satisfy the transitivity rule (4.1) completely. It is known that, in any matrix, small perturbations in the coefficients imply small perturbations in the eigenvalues. If A′ is defined as the decision maker's estimate of A and w′ denotes the corresponding weight vector, then w′ is determined as the eigenvector corresponding to the maximum eigenvalue λ_max of A′ according to
$$A' \cdot \mathbf{w}' = \lambda_{\max}\,\mathbf{w}' \tag{4.5}$$
The vector w′ can thus be obtained by solving the system of linear equations (4.5). This principal eigenvector of the pairwise comparison matrix is then normalized so that the elements of the final vector of weights w′ sum to 1.

In summary, the eigenvector method calculates a vector of cardinal weights which is derived from the principal eigenvector of the pairwise comparison matrix and which is normalized to sum to one. The method relies on the fact that slight perturbations of a consistent matrix induce slight perturbations of the eigenvalue and of the corresponding eigenvector.
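The computation can be condensed into a few lines, as sketched below: the principal eigenvector of a (possibly slightly inconsistent) pairwise comparison matrix is extracted and normalized to sum to one. The example judgements are hypothetical and serve only to exercise the routine.

import numpy as np

def eigenvector_weights(A):
    # Weights as the normalized principal eigenvector of the pairwise comparison matrix A.
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)            # index of the maximum eigenvalue lambda_max
    w = np.abs(eigvecs[:, k].real)
    return w / w.sum()                     # normalize so that the weights sum to 1

# Slightly inconsistent judgements on three attributes
A = [[1.0, 3.0, 5.0],
     [1/3, 1.0, 2.0],
     [1/5, 1/2, 1.0]]
print(eigenvector_weights(A))              # approximately [0.65, 0.23, 0.12]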

4.2.3 Weighted Least Square Method

Chu et al. (1979) proposed the weighted least-square method to obtain the weights. This method involves the solution of a set of simultaneous linear algebraic equations and is conceptually easier to understand than Saaty's eigenvector method.

To determine the weights, suppose the decision maker gives his/her pairwise comparisons between the elements a_{ij} of Saaty's matrix A in equation (4.3).


If a_{ij} = w_i/w_j are the elements of a pairwise comparison matrix, the weights can be obtained by solving the constrained optimization problem (nonlinear programming model)
$$\text{Minimize} \quad z = \sum_{i=1}^{n} \sum_{j=1}^{n} (w_j\,a_{ij} - w_i)^2
\qquad \text{subject to} \quad \sum_{i=1}^{n} w_i = 1 \tag{4.6}$$

where a_{ij} denotes the relative weight of attribute A_i with respect to attribute A_j. An additional constraint for model (4.6) is that w_i > 0. However, it is assumed that the above problem can be solved to obtain w_i > 0 without this constraint.

Equation (4.6) is a nonlinear programming model. In order to minimize z, the Lagrangian function is formed,
$$L = \sum_{i=1}^{n} \sum_{j=1}^{n} (w_j\,a_{ij} - w_i)^2 + 2\,\lambda \left( \sum_{i=1}^{n} w_i - 1 \right) \tag{4.7}$$
where λ is the Lagrangian multiplier.

Differentiating equation (4.7) with respect to w_l and λ, the following set of (n + 1) nonhomogeneous linear equations in (n + 1) unknowns is obtained
$$\sum_{i=1}^{n} (w_l\,a_{il} - w_i)\,a_{il} - \sum_{j=1}^{n} (w_j\,a_{lj} - w_l) + \lambda = 0\,, \qquad l = 1, 2, \dots, n \tag{4.8}$$
which, together with the normalization condition on the weights, provides the n weights w_i and the Lagrangian multiplier λ.

For example, for n = 2 the equations are (recall that a_{ii} = 1, ∀ i)
$$(1 + a_{21}^2)\,w_1 - (a_{12} + a_{21})\,w_2 + \lambda = 0$$
$$-(a_{21} + a_{12})\,w_1 + (1 + a_{12}^2)\,w_2 + \lambda = 0$$
$$w_1 + w_2 = 1$$
Given the coefficients a_{ij}, the above equations can be solved for w_1, w_2, and λ.
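For the two-attribute case just written out, the linear system can be solved directly, as in the sketch below; the judgement a12 = 3 (hence a21 = 1/3 by the reciprocity rule) is a hypothetical value chosen only to demonstrate the computation.

import numpy as np

a12 = 3.0            # hypothetical judgement: attribute 1 is three times as important as attribute 2
a21 = 1.0 / a12      # reciprocity rule

# Unknowns ordered as [w1, w2, lambda]
M = np.array([[1.0 + a21**2, -(a12 + a21), 1.0],
              [-(a21 + a12), 1.0 + a12**2, 1.0],
              [1.0,          1.0,          0.0]])
b = np.array([0.0, 0.0, 1.0])

w1, w2, lam = np.linalg.solve(M, b)
print(w1, w2)        # w1 = 0.75, w2 = 0.25 for a12 = 3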

The main disadvantage of the weighted least-square method is probably the fact that the theory behind it assumes that the weights are known exactly. It is important to remain aware of this potential problem, and to use the weighted least-square method only when the weights can be estimated precisely relative to one another.

4.2.4 Entropy Method

The entropy method can be used for evaluating the weights when the data of the decision matrix are known with some uncertainty. Entropy analysis indicates the discriminating ability of a certain attribute in a given design space.


If the performance of all competing alternatives with respect to a certain performance attribute have similar scores, then this attribute does not have any relevance in the comparative analysis and can be eliminated (Hwang and Yoon, 1981). On the other hand, if the attribute outcomes of the alternative solutions are very different, then the attribute has an important discriminating ability. In other terms, the more distinct and differentiated the scores are, i.e. the larger the contrast intensity of an attribute is, the greater is the amount of 'decision information' contained in and transmitted by the attribute. Weights derived from entropy analysis reflect these strengths. Therefore, the entropy idea is particularly useful to investigate contrasts between sets of data, which can be pictured as a set of solution alternatives in the decision matrix.

Entropy is the most fundamental concept in information theory as well as in statistical mechanics, since it has many properties that agree with the intuitive notion of what a measure of information should be. It measures the uncertainty associated with random phenomena in terms of expected information content (Shannon and Weaver, 1947). This uncertainty is represented by a discrete probability distribution p_j, in agreement with the notion that a broad distribution represents more uncertainty than a sharply peaked one.

The measure of uncertainty E in a probability distribution (p_1, p_2, ..., p_n), associated with the n possible outcomes of a certain attribute, is given by Shannon (1948) as
$$E(p_1, p_2, \dots, p_n) = -k \sum_{j=1}^{n} p_j \ln p_j$$

where k is a positive constant and ln denotes the natural logarithm. Since the terms 'entropy' and 'uncertainty' are considered synonymous in statistical mechanics, E is called the entropy of the probability distribution p_j, since it depends only on the individual probabilities. Note that the larger the entropy is, the less information is transmitted by the jth attribute. E(p_1, ..., p_n) takes its maximum value when all scores have the same probability p_j = 1/n.

The entries of the decision matrix with m alternatives and n decision attributes can be represented by a probability distribution p_{ij}, where i = 1, 2, ..., m counts the alternatives and j = 1, 2, ..., n the attributes. Each entry p_{ij} carries a certain information content, which can be measured by means of the entropy value. Therefore, if the decision matrix D of m alternatives and n attributes is
$$D = \left[\begin{array}{cccc}
x_{11} & x_{12} & \dots & x_{1n}\\
x_{21} & x_{22} & \dots & x_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
x_{m1} & x_{m2} & \dots & x_{mn}
\end{array}\right]$$
a probability value p_{ij} for each entry of the decision matrix can be simply determined by normalizing the attribute values over the m design alternatives, that is
$$p_{ij} = \frac{x_{ij}}{\displaystyle\sum_{i=1}^{m} x_{ij}}\,, \qquad \forall\, i, j \tag{4.9}$$


Based on this, the matrix of the p_{ij} is formed as
$$[\,p_{ij}\,] = \left[\begin{array}{cccc}
p_{11} & p_{12} & \dots & p_{1n}\\
p_{21} & p_{22} & \dots & p_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
p_{m1} & p_{m2} & \dots & p_{mn}
\end{array}\right]$$
Then the entropy E_j of a decision attribute j over the m design alternatives is determined as
$$E_j = -k \sum_{i=1}^{m} p_{ij} \ln p_{ij}\,, \qquad \forall\, j \in \{1, \dots, n\} \tag{4.10}$$
where k denotes a constant with the value 1/ln m (the reciprocal of the maximum entropy), which guarantees that 0 ≤ E_j ≤ 1.

The weight related to entropy E_j is then
$$w_j = 1 - \frac{E_j}{\displaystyle\sum_{j=1}^{n} E_j} \tag{4.11}$$

Zeleny (1974) mentioned that a weight assigned to an attribute is directly related to the average intrinsic information generated by a given set of alternatives over that attribute, as well as to its subjective assessment. Based on this, the degree of diversification d_j of the information provided by the outcomes of attribute j can be defined as
$$d_j = 1 - E_j\,, \qquad \forall\, j \tag{4.12}$$
According to Hwang and Yoon (1981), if the decision maker has no reason to prefer one attribute to another, the principle of insufficient reason (Starr and Greenwood, 1977) suggests that each attribute should be equally preferred. Then the best weight set w associated with the n decision attributes that the decision maker can expect, instead of equal weights, has elements
$$w_j = \frac{d_j}{\displaystyle\sum_{j=1}^{n} d_j}\,, \qquad \forall\, j \tag{4.13}$$

If the decision maker has an a priori subjective weight λ_j, then the overall importance weight can be adapted using the set of calculated weights w_j. The new weight w°_j can be formulated as follows
$$w^{\circ}_j = \frac{w_j\,\lambda_j}{\displaystyle\sum_{j=1}^{n} w_j\,\lambda_j}\,, \qquad \forall\, j \tag{4.14}$$

It may be concluded that the most important attribute is always the one having both w_j and λ_j at the highest possible levels.
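The whole entropy procedure of equations (4.9) to (4.14) can be sketched compactly as follows; the small decision matrix and the prior weights are invented purely for illustration.

import numpy as np

def entropy_weights(X, prior=None):
    # Objective weights from the entropy of each attribute column of the decision
    # matrix X (m alternatives x n attributes, all entries positive), Eqs. (4.9)-(4.14).
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    p = X / X.sum(axis=0)                      # Eq. (4.9): column-wise normalization
    k = 1.0 / np.log(m)                        # ensures 0 <= E_j <= 1
    E = -k * (p * np.log(p)).sum(axis=0)       # Eq. (4.10): entropy of each attribute
    d = 1.0 - E                                # Eq. (4.12): degree of diversification
    w = d / d.sum()                            # Eq. (4.13): objective weights
    if prior is not None:                      # Eq. (4.14): blend with subjective weights
        prior = np.asarray(prior, dtype=float)
        w = w * prior / (w * prior).sum()
    return w

# Hypothetical decision matrix: 4 alternatives x 3 benefit-type attributes
X = [[25.0, 0.80, 12.0],
     [27.0, 0.78, 15.0],
     [24.0, 0.82, 11.0],
     [26.0, 0.79, 20.0]]
print(entropy_weights(X))
print(entropy_weights(X, prior=[0.5, 0.3, 0.2]))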


4.3 Selection Models

The multiattribute decision-making methods can address three types of problems: screening alternatives, ranking alternatives, and selecting the final 'best' alternative. Note that if a method generates a cardinal ranking of the alternatives, then it can be used for screening and ranking as well as for selecting. In cases where the initial number of alternatives is large, 'narrowing the field' through the use of simple screening methods first will reduce the computational and information burdens of the subsequent ranking or selection analysis. An instance where the prior use of simple screening methods is a must is when there exist minimum requirements with respect to one or more attributes.

Most of the effective techniques available for dealing with MADM require information about the relative importance of the different values of the same attribute (intra-attribute preference) and about the decision maker's preference across attributes (inter-attribute preference).

There are two major approaches to multiattribute information processing, namely compensatory models and non-compensatory models. In the compensatory case, the decision maker may deem that high performance relative to one attribute can at least partially compensate for low performance relative to another attribute, particularly if an initial screening analysis has eliminated the alternatives which fail to meet any minimum performance requirement.

Non–compensatory Models

The non-compensatory models do not permit trade-offs between attributes, since compensation is not allowed: a decrease or unfavorable value in one attribute cannot be offset by an advantage or favorable value in some other attribute. Hence comparisons are made on an attribute-by-attribute basis. The MADM methods which belong to this class are credited for their simplicity, which matches the behavioral process of a decision maker whose knowledge is limited. They include methods such as dominance, maximin, maximax, the conjunctive constraint method, the disjunctive constraint method, the lexicographic method, lexicographic semiordering, and elimination by aspects.

Compensatory Models

The compensatory models incorporate trade-offs between high and low scores of attributes, assigning a number to each multidimensional representation. In many cases, the decision maker may deem that high performance relative to one attribute can at least partially compensate for low performance relative to another attribute. With compensatory models a single number is usually assigned to each multidimensional property characterizing an alternative design. Based upon the method of evaluating this number, these models can be further divided into three subgroups (Yoon and Hwang, 1981), that is, scoring, compromising and concordance models.

Scoring Models. These models select the alternative which has the highest score (or the maximum utility), reducing the decision problem to assessing the appropriate multiattribute utility function for the relevant decision situation.


Simple additive weighting, hierarchical additive weighting and interactive simple additive weighting belong to this category.

Compromising Models. These models identify the alternative which is closest to the ideal solution. TOPSIS, LINMAP and nonmetric MRS methods belong to this category. Especially when a decision maker uses a square utility function, identification of the ideal solution is assisted by LINMAP procedures.

Concordance Models. These models identify a set of preference rankings which best satisfy a given concordance measure. The permutation method, the linear assignment method, and the ELECTRE method are classifiable in this class.

Compensatory methods, in order to accommodate trade-offs of low versus high performance among attributes, generally either require that the attributes all be measured in commensurate units, or incorporate procedures for normalizing data which are not initially commensurate, so as to facilitate attribute trade-off analysis.

Overview of MADM methods

The degree of evaluation accuracy also varies, since the different types of preference information on attributes may be listed in ascending order of complexity, i.e. threshold value, ordinal, cardinal, and marginal rate of substitution. Methods for MADM are classified based upon the different forms of preference information available to the decision maker; they require different amounts and types of information about the attributes and alternatives, beyond the basic data included in the decision matrix.

A three-stage taxonomy of MADM methods is shown in Table 4.1 according to:

• kind of information (on attribute or alternative or neither) essential to the decision maker;

• salient feature of the information needed;

• major classes of methods in any combination from previous stages.

Some methods (dominance, maximin, and maximax methods) require no additional information besides the basic decision matrix data. Other methods (additive weighting, TOPSIS, ELECTRE) require cardinal attribute importance 'weights' and cardinal performance ratings of the alternatives with respect to the attributes. Methods requiring this additional information place heavier demands on the decision maker (in terms of time and information searching required), but in turn they are able to combine, evaluate, and trade–off the decision matrix data in more sophisticated ways than the simpler methods.

Before going into the actual review, some key concepts and notations will be defined with the purpose of establishing a unified notation for the most used terms. Some supporting techniques for MADM, such as transformation of attributes and assessment of attribute weights, are also discussed, together with the computational procedure of each method.


Information from        Salient Feature                    Major Classes
the Decision Maker      of Information                     of Methods
---------------------------------------------------------------------------------------------
No Preference                                              Dominance
Information                                                Maximin
                                                           Maximax
---------------------------------------------------------------------------------------------
                        Threshold                          Conjunctive Method
                                                           Disjunctive Method
                        Ordinal                            Lexicographic Method
Information on                                             Elimination by Aspects
Attributes                                                 Permutation Method
                        Cardinal                           Analytical Hierarchy Process (AHP)
                                                           Simple Additive Weighting Method (SAW)
                                                           Hierarchical Additive Weighting Method
                                                           Linear Assignment Method
                                                           ELECTRE Method
                                                           TOPSIS Method
                        Marginal Rate of Substitution      Hierarchical Trade–offs
---------------------------------------------------------------------------------------------
Information on          Pairwise Preference                LINMAP
Alternatives                                               Interactive SAW Method
                        Pairwise Proximity                 Multidimensional Scaling with Ideal Point
                                                           Marginal Rate of Substitution with Ideal Point

Table 4.1. Classes of methods for multiattribute decision making

4.4 Methods with No Preference Information

There are some classical decision rules, such as dominance, maximin and maximax, which are still fit for MADM since they do not require any preference information from the decision maker. However, the right selection among these methods for the specific situation is important.

4.4.1 Dominance Method

The use of the dominance rule in decision–making is quite common in procedures related to MADM. Assume that a number of feasible designs, satisfying the same set of requirements and for the same selection of attributes, are generated. It is probable that some of them will be superseded by other designs in every respect. This means that if there exists a feasible design x′ that in each relevant attribute is better than or equal to a design x′′, that is

x′i ≥ x′′i ,   ∀ i

then design x′ dominates x′′.


Therefore, x′′ is not a competitor in further selection since a better design, x′, has been found. Equivalently, an alternative design x′′ is dominated if another design x′ outperforms it with respect to at least one attribute and performs at least as well with respect to the remaining attributes.

The subset of all designs from the set M of feasible designs which are nondominated by any other member of M is the Pareto set. Testing the feasible designs for dominance and filtering out the dominated ones yields this set of nondominated designs, so that the number of alternatives can be reduced before the selection process. This method does not require any assumption or any transformation of attributes.

Application of the dominance rule takes the following procedure. Compare the first two alternatives and, if one is dominated by the other, discard the dominated one. Next compare the undiscarded alternative with the third alternative and discard the dominated alternative. Then introduce the fourth alternative, and so on. After (m − 1) stages the nondominated set is determined. It usually has multiple elements in it; hence the dominance method is mainly used for initial filtering. The concept of dominance exploits only the ordinal character, and not the cardinal character, of the attribute values. Also observe that dominance does not require comparison between different attributes of two competing designs.

With the dominance method, alternatives are screened so that all dominated alternatives are discarded. The screening power of this method tends to decrease as the number of independent attributes becomes larger.
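To illustrate the screening just described, a minimal Python sketch of dominance filtering follows; it assumes all attributes are of benefit type (the larger the better), and the function names dominates and pareto_filter, as well as the sample data, are hypothetical and introduced only for illustration.

def dominates(a, b):
    """True if design a is at least as good as b in every attribute
    and strictly better in at least one."""
    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))

def pareto_filter(designs):
    """Return the nondominated subset of a list of attribute vectors."""
    nondominated = []
    for i, x in enumerate(designs):
        if not any(dominates(y, x) for j, y in enumerate(designs) if j != i):
            nondominated.append(x)
    return nondominated

# Four hypothetical designs rated on three benefit attributes
designs = [(0.7, 0.9, 0.4), (0.6, 0.8, 0.3), (0.5, 0.95, 0.6), (0.7, 0.9, 0.4)]
print(pareto_filter(designs))   # the second design is dominated by the first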

The following important characteristics of working with nondominated designs are worth mentioning:

• subjectivity does not influence the selection of nondominated designs once the decision maker defines the direction of design improvement (i.e. the larger the attribute value the better, or vice versa);

• the process of design generation is additive; that is, it is possible to generate more designs at any time and test them against an already existing set of nondominated designs;

• selecting the final 'preferred design' is performed by a separate procedure, and only after a sufficient number of nondominated designs has been generated.

Calpine and Golding (1976) derived the formula for the expected average number of nondominated solutions when m alternatives are compared with respect to n attributes. Consider first the very special case in which all the elements in the decision matrix are random numbers uniformly distributed over the range 0 to m. Attention is first focussed on the final nth column. Arrange the rows so that the elements in the nth column are in decreasing order of magnitude. By the randomness of the elements, the probability of an arbitrarily selected row being the rth in the order is 1/m. Let p(m, n) be the probability that a row, arbitrarily chosen from m rows (alternatives), is nondominated with respect to n attributes. Consider the rth row. The ordering ensures that this row is not dominated by any row below it and also that it exceeds no row above it in the nth attribute. Hence a necessary and sufficient condition for the rth row to be nondominated is that it is nondominated among the first r candidates with respect to the first (n − 1) attributes. Thus the probability of a row being the rth and nondominated is p(r, n − 1)/m.


Figure 4.2. Expected number of nondominated alternatives

The probability of an arbitrarily selected row being nondominated is

p(m, n) = (1/m) Σ_{r=1}^{m} p(r, n − 1) = [p(m, n − 1) + (m − 1)·p(m − 1, n)] / m     (4.15)

Then the expected average number of nondominated alternatives, a(m, n), is

a(m, n) = m·p(m, n) = a(m, n − 1)/m + a(m − 1, n)

As a(m, 1) = a(1, n) = 1, the number a(m, n) can be calculated recursively. A good approximation of a(m, n) is given by

a(m, n) ≈ 1 + ln m + (ln m)^2/2! + . . . + (ln m)^{n−3}/(n − 3)! + γ (ln m)^{n−2}/(n − 2)! + (ln m)^{n−1}/(n − 1)!     (4.16)

where γ is Euler's constant, approximately equal to 0.5772.

Some typical results are shown in Figure 4.2, where the expected number of nondominated alternatives is given in terms of the number of attributes for different sets of feasible alternatives. It indicates that the number of nondominated alternatives for a few attributes, i.e. n = 4, will be reduced to 8, 20, and 80 for m = 10, 100, and 1000, respectively. However, the number of nondominated alternatives for a large number of attributes, i.e. n = 8, will still be very large, i.e. 10, 90, and 900, respectively, for m = 10, 100, and 1000.
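The recursion a(m, 1) = a(1, n) = 1, a(m, n) = a(m, n − 1)/m + a(m − 1, n) can be evaluated directly, as in the following minimal Python sketch; the function name expected_nondominated is hypothetical, and the printed values are intended only as an order-of-magnitude check against the curves of Figure 4.2.

def expected_nondominated(m, n):
    """Expected number of nondominated alternatives among m alternatives
    with n attributes, from the recursion a(m,n) = a(m,n-1)/m + a(m-1,n)."""
    a = [[1.0] * (n + 1) for _ in range(m + 1)]   # a[i][1] = a[1][j] = 1 by definition
    for i in range(2, m + 1):
        for j in range(2, n + 1):
            a[i][j] = a[i][j - 1] / i + a[i - 1][j]
    return a[m][n]

for m in (10, 100, 1000):
    print(m, round(expected_nondominated(m, 4), 1), round(expected_nondominated(m, 8), 1))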


4.4.2 Maximin Method

The principle underlying the maximin method is that 'a chain is only as strong as its weakest link'. Effectively, the method gives each alternative a score equal to the strength of its weakest link, where the 'links' are the attributes. Thus, it requires that performance with respect to all attributes be measured in commensurate units, or else be normalized prior to applying the method. Moreover, the maximin method can be used only when the attributes have a high degree of comparability.

The maximin method is based upon a strategy that maximizes the minimum possible gain; that is, it tries to avoid the worst possible performance by maximizing the minimal performance among attributes. The alternative for which the score of its weakest attribute is the highest is preferred.

In situations where the overall performance of an alternative is determined by its lowest attribute score, the decision maker should examine the attribute values for each alternative, note the lowest value for each alternative, and then select the alternative with the most acceptable value in its lowest attribute. This method belongs to the class of non–compensatory techniques. Its mirror image, which minimizes the maximum possible loss, is the minimax method.

This method utilizes only a small part of the available information in making a final choice, i.e. only one attribute per alternative. Under this procedure only the single weakest attribute represents the related alternative design; all other (n − 1) attributes for a particular alternative are ignored. Thus, even if an alternative is clearly superior in all but one attribute which is below average, another alternative with only average scores on all attributes would be chosen over it. If these lowest attribute values come from different attributes, as they often do, the decision maker may be basing his/her final choice on single values of attributes that differ from alternative to alternative. Therefore, the maximin method can be used only when all attributes are measured on a common scale, which can be a limitation (Linkov et al., 2004).

The alternative, A+, is selected such that

A+ = { Ai | max_i min_j xij } ,   i = 1, 2, . . . , m ;  j = 1, 2, . . . , n     (4.17)

where all the xij's are on a common scale.

One way of making a common scale is to use the degree of closeness to the ideal solution (Zeleny, 1974), defined as the ratio of an attribute value to the most preferable attribute value xj^max = max {x1j, x2j, . . . , xmj}, that is

rij = xij / xj^max     (4.18)

provided that attribute j is a benefit criterion (i.e., the larger xj, the more preference).

A more complicated form of rij is

rij = (xij − xj^min) / (xj^max − xj^min)     (4.19)

where xj^min = min_i xij , i = 1, 2, . . . , m.


Then the maximin procedure selects the alternative design as

A+ = max_i min_j rij ,   i = 1, 2, . . . , m ;  j = 1, 2, . . . , n     (4.20)

Note that in the case of a cost criterion, rij has to be computed as

rij = (1/xij) / max_i (1/xij) = min_i xij / xij = xj^min / xij     (4.21)

or

rij = (xj^max − xij) / (xj^max − xj^min)     (4.22)

The maximin method has some shortcomings, so that its applicability is relatively limited. In general, the method can be applied whenever the decision maker has a pessimistic outlook on the decision–making situation and the attributes are truly of equal importance. The maximin and its reverse, the minimax procedure, are used in game theory (von Neumann and Morgenstern, 1947).
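A minimal Python sketch of the maximin selection is given below, assuming a decision matrix of benefit attributes normalized with equation (4.18); the function names normalize_benefit and maximin, as well as the sample data, are hypothetical and serve only to illustrate the procedure.

def normalize_benefit(X):
    """Linear scale transformation of eq. (4.18): rij = xij / max_i xij (benefit attributes)."""
    n = len(X[0])
    col_max = [max(row[j] for row in X) for j in range(n)]
    return [[row[j] / col_max[j] for j in range(n)] for row in X]

def maximin(X):
    """Return the index of the alternative whose weakest normalized attribute is highest."""
    R = normalize_benefit(X)
    return max(range(len(R)), key=lambda i: min(R[i]))

# Three hypothetical designs rated on speed, range and payload (all benefit-type)
X = [[21.0, 4500.0, 1200.0],
     [19.5, 6000.0, 1500.0],
     [23.0, 4000.0,  900.0]]
print(maximin(X))   # index of the design with the best worst-case normalized score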

4.4.3 Maximax Method

In contrast to the maximin method, the maximax method selects the alternative that maximizes the maximum outcome among attributes for every alternative. Extending the 'chain' analogy used in describing the maximin method, maximax performs as if one were comparing alternative chains in search of the best single link. The score of each alternative is equal to the performance of its strongest attribute. Like the maximin method, maximax requires that all attributes be commensurate or pre–normalized.

In this case the highest attribute value for each alternative is identified; then these maximum values are compared in order to select the alternative with the largest such value. This method is also called an 'optimistic decision criterion'.

Note that in this procedure only the single strongest attribute represents the whole alternative design; all other (n − 1) attributes for the particular alternative are ignored. Therefore, as with the maximin method, the maximax method can be used only when all attributes are measured on a common scale; see equations (4.18) through (4.22).

The alternative, A+, is selected such that

A+ = { Ai | max_i max_j rij } ,   i = 1, 2, . . . , m ;  j = 1, 2, . . . , n     (4.23)

The comparability assumptions and incompleteness properties of the maximax method do not make it a very useful technique for general decision making. However, just like the maximin method, the maximax method may be suitable in some specific decision–making situations.

Both the maximin procedure and the maximax procedure use what could be called a specialized degenerate weighting, which may be different for each alternative (Moskowitz and Wright, 1979): the maximin method assigns a weight of 1 to the worst attribute value and a weight of 0 to all others; the maximax method assigns a weight of 1 to the best attribute value and a weight of 0 to all others.

The Hurwicz procedure (Hey, 1979) is an amalgamation of the above two, in that it takes into account both the worst and the best, thus selecting A+ such that

A+ = { Ai | max_i [ α min_j rij + (1 − α) max_j rij ] }     (4.24)

The weight α is referred to as the pessimism–optimism index; it is supposed to vary (over 0 ≤ α ≤ 1) among the individual decision makers: the higher α, the more pessimistic the individual decision maker. As is apparent, the extreme case α = 1 gives the maximin, while α = 0 gives the maximax.

Although such a procedure might seem useful in a single instance, it is clearly inadequate when considering the whole multiple attribute problem, since it would judge the whole design on the basis of a single attribute.
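Building on the normalization sketch above, the maximax and Hurwicz rules differ from maximin only in how the normalized scores are aggregated; the functions maximax and hurwicz below are again hypothetical illustrations of equations (4.23) and (4.24), not a prescribed implementation.

def maximax(R):
    """Eq. (4.23): pick the alternative whose strongest normalized attribute is highest."""
    return max(range(len(R)), key=lambda i: max(R[i]))

def hurwicz(R, alpha=0.5):
    """Eq. (4.24): blend of maximin (alpha = 1) and maximax (alpha = 0)."""
    score = lambda row: alpha * min(row) + (1.0 - alpha) * max(row)
    return max(range(len(R)), key=lambda i: score(R[i]))

# R is assumed to be a normalized decision matrix, e.g. R = normalize_benefit(X) from the previous sketch
R = [[0.91, 0.75, 0.80],
     [0.85, 1.00, 1.00],
     [1.00, 0.67, 0.60]]
print(maximax(R), hurwicz(R, alpha=0.7))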

4.5 Selection Methods with Information on Attributes

There is a large variety of multicriteria techniques to aid selection in conditions of multiple attributes and/or alternatives. Usually the information on attributes is less demanding to assess than that on alternatives. The majority of MADM methods require preference information to process inter–attribute and intra–attribute comparisons.

The information can be expressed in various ways:

• threshold value of each attribute (conjunctive method; disjunctive method);

• relative importance of each attribute by ordinal preference (lexicographic ordering; elimination by aspects; permutation method);

• relative importance of each attribute by cardinal preference (linear assignment method; simple additive weighting method; hierarchical additive weighting method; ELECTRE method; TOPSIS method);

• marginal rate of substitution (MRS) between attributes (marginal rate of substitution; indifference curves; hierarchical trade–offs).

Threshold values or ordinal preference information are utilized in non–compensatory models, whereas cardinal preference or the marginal rate of substitution is needed in compensatory models.

4.6 Methods with Threshold on Attributes

These methods require satisfactory rather than best performance in each attribute. To obtain a feasible solution the decision maker sets up the minimal threshold values he/she will accept for each attribute. Any candidate design which has an attribute value less than the threshold value will be rejected. This procedure is called the conjunctive method (Dawes, 1964) or the satisficing method (Simon, 1955). On the other hand, if evaluation of an alternative solution is based upon the greatest value of only one attribute, the procedure is called the disjunctive method.

Any alternative that does not meet the conjunctive or disjunctive rules is deleted from further consideration. These screening rules can be used to select a subset of alternatives for analysis by other, more complex decision–making tools. Screening by conjunctive and disjunctive rules can also be applied in the determination of requirements for the decision–making process.

4.6.1 Conjunctive Method

Consider, for example, the position of a CFD consultant in a ship design company. His/her effectiveness as an expert will be limited by the lesser of his/her abilities in hydrodynamics and numerical computation; he/she cannot compensate for an insufficient knowledge of hydrodynamics by an excellent command of numerical computation, or vice versa. The company will reject the candidates who do not possess the required standard level of knowledge in both fields.

The conjunctive method is purely a screening method. The requirement embodied by the conjunctive screening approach is that, in order to be acceptable, an alternative must exceed given performance thresholds (cut–off values) for all attributes. The cut–off values given by the decision maker play the key role in eliminating the non–contender alternatives. Hence, by increasing the minimal threshold levels in an iterative way, the decision maker can sometimes narrow down the alternatives to a single choice. The attributes, and thus the thresholds, need not be measured in commensurable units.

An alternative Ai is classified as feasible only if

xij ≥ xj^o ,   ∀ j = 1, 2, . . . , n     (4.25)

where xj^o is the cut–off value of xj.

The conjunctive method is not usually used for selection of alternatives but rather for dichotomizing them into feasible/unfeasible categories. Dawes (1964) developed a way to set up the standards if the decision maker wants to dichotomize the alternatives.

Consider a set of n equally weighted independent attributes. Let

r    -  the proportion of alternatives which are rejected;
p_c  -  the probability that a randomly chosen alternative yields outcomes above the conjunctive cut–off level.

Then

r = 1 − p_c^n     (4.26)

since the probability of being rejected is equal to one minus the probability of passing on all attributes.


From equation (4.26) it can be derived that

p_c = (1 − r)^{1/n}     (4.27)

which indicates that the decision maker must choose a cut–off level for each attribute such that a fraction p_c of the candidate designs will place above this score.

The conjunctive method does not require the attribute information to be in numerical form; information on the relative importance of each attribute is not needed. The method belongs to the class of non–compensatory models: if the decision maker simply uses minimum cut–off values for each attribute, then none of the alternative solutions gets credited for especially good attribute values. Crediting alternatives for especially high values requires other methods, to be discussed later.

4.6.2 Disjunctive Method

The disjunctive method is also a pure screening method. It is the complement of the conjunctive method, substituting 'or' in place of 'and'. That is, to pass the disjunctive screening test, an alternative design must exceed the given performance threshold for at least one attribute. Like the conjunctive method, the disjunctive method does not require that attributes be measured on a common scale.

An alternative Ai is classified as feasible only if

xij ≥ xj^o ,   for j = 1 or 2 or . . . or n     (4.28)

where xj^o is a desirable level of xj.

For the disjunctive method, the probability of being rejected is equal to the probability of failing on all attributes

r = (1 − p_d)^n     (4.29)

where r is the proportion of alternatives which are rejected, and p_d is the probability that a randomly chosen alternative scores above the disjunctive cut–off level. From equation (4.29), one obtains

p_d = 1 − r^{1/n}     (4.30)

Like the conjunctive method, the disjunctive method does not require the attribute information to be in numerical form and does not require information on the relative importance of the attributes.
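A minimal Python sketch of the two screening rules follows, assuming benefit-type attributes and user-supplied cut–off values; conjunctive_screen and disjunctive_screen are hypothetical names, and the comment recalls the kind of cut–off calibration suggested by equations (4.26) through (4.30).

def conjunctive_screen(X, cutoffs):
    """Keep alternative i only if xij >= cutoff_j for ALL attributes (eq. 4.25)."""
    return [i for i, row in enumerate(X)
            if all(x >= c for x, c in zip(row, cutoffs))]

def disjunctive_screen(X, cutoffs):
    """Keep alternative i if xij >= cutoff_j for AT LEAST ONE attribute (eq. 4.28)."""
    return [i for i, row in enumerate(X)
            if any(x >= c for x, c in zip(row, cutoffs))]

# Calibration hint from eq. (4.27): to reject about 90 % of candidates with n = 5
# equally weighted independent attributes, set each cut-off so that a fraction
# p_c = (1 - 0.9)**(1/5), roughly 0.63, of the candidates passes it.
X = [[21.0, 4500.0], [19.5, 6000.0], [23.0, 4000.0]]
cutoffs = [20.0, 4200.0]
print(conjunctive_screen(X, cutoffs), disjunctive_screen(X, cutoffs))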

4.7 Methods with Ordinal Information

The most important information needed for the lexicographic method, elimination by aspects, and the permutation method is the ordinal inter–attribute preference information. The relative importance among attributes determined by ordinal preference is less demanding for the decision maker to assess than that by cardinal preference.


The permutation method was originally developed for cases where cardinal preferences of attributes are given, but it is better suited to cases where only ordinal preferences are available. The method identifies the best ordering of the alternative rankings.

4.7.1 Lexicographic Method

The lexicographic ordering is more widely adopted in practice than it deserves to be. This method is simple and easy to handle. The term 'lexicography' reflects the similarity between this method and the method by which words are ordered in a lexicon, since it ranks attributes according to importance. The values of each successive attribute are compared across alternatives.

In some decision problems a single attribute may be predominant. One way of treating these situations is to compare the alternatives by ranking the most important attributes in the order of their importance. The alternative with the best performance score on the most important attribute is preferred and the decision process ends. However, if multiple alternatives have the highest value on the specified attribute, then the attribute ranked second in importance is compared across all alternatives. If alternatives are tied again, the performance on the next most important attribute will be compared, and so on, till a unique alternative is found. The process continues sequentially until a single alternative is chosen or until all attributes have been considered.

The method requires the decision maker to rank the attributes in order of importance. Let the subscripts of the attributes indicate not only the components of the attribute vector but also the priorities of the attributes, i.e., let x1 be the most important attribute to the decision maker, x2 the second most important one, and so on. Then the alternative(s) A1 is (are) selected such that

A1 = { Ai | max_i xi1 } ,   i = 1, 2, . . . , m     (4.31)

If this set {A1} has a single element, then this element is the most preferred alternative. If there are multiple alternatives with maximal scores, consider

A2 = { Ai | max_i xi2 } ,   i ∈ {A1}     (4.32)

If this set {A2} has a single element, then stop and select this alternative. If not, consider

A3 = { Ai | max_i xi3 } ,   i ∈ {A2}     (4.33)

Continue this process until either some set {Ak} with a single element is found, which is then the most preferred alternative, or all n attributes have been considered; in the latter case, if the remaining set still contains more than one element, those alternatives are regarded as equivalent.

When applied to general decision making, the lexicographic method requires information on the preference among attribute values and on the order in which the attributes should be considered. In both cases, it needs only ordering or ranking information and not (necessarily) numerical values. Because of its limited information requirements, lexicography has received serious consideration as a decision technique in a number of MADM problems.


The lexicographic method, like the maximin and maximax methods, utilizes only a small part of the available information in making a final choice. But lexicography is somewhat more demanding of information than 'maximin' and 'maximax', because it requires a ranking of the importance of the attributes, whereas 'maximin' and 'maximax' do not. However, lexicography does not require comparability across attributes, as the 'maximin' and 'maximax' methods do.

The lexicographic semiordering, described by Tversky (1969), is closely related to the lexicographic ordering. In most cases it makes sense to allow ranges of imperfect discrimination, so that one alternative is not judged better just because it has a slightly higher value on one attribute. In a lexicographic semiordering, a second attribute is considered not only in cases where the values of several alternatives on the most important attribute are equal, but also in cases where the differences between the values on the most important attribute are negligible, keeping more alternatives in the decision–making process. This same process may then be used for further attributes if more than one alternative still remains. Thus a consideration of whether differences are significant is imposed upon lexicographic ordering.
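The following minimal Python sketch illustrates both variants: with zero tolerances it reproduces the plain lexicographic ordering, while nonzero tolerances give a simple form of lexicographic semiordering; the function lexicographic and its tolerance handling are hypothetical illustrations, not a prescribed implementation.

def lexicographic(X, order, tolerances=None):
    """X: decision matrix (rows = alternatives); order: attribute indices by
    decreasing importance; tolerances: per-attribute indifference thresholds."""
    if tolerances is None:
        tolerances = [0.0] * len(order)
    candidates = list(range(len(X)))
    for j, tol in zip(order, tolerances):
        best = max(X[i][j] for i in candidates)
        # keep alternatives whose value lies within the indifference band of the best
        candidates = [i for i in candidates if X[i][j] >= best - tol]
        if len(candidates) == 1:
            break
    return candidates   # one alternative, or a set of equivalent ones

X = [[21.0, 4500.0, 1200.0],
     [20.8, 6000.0, 1500.0],
     [23.0, 4000.0,  900.0]]
print(lexicographic(X, order=[0, 1, 2]))                          # strict lexicographic
print(lexicographic(X, order=[0, 1, 2], tolerances=[2.5, 0, 0]))  # semiordering on the first attribute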

4.7.2 Elimination by Aspects

Elimination by aspects (EBA) is a formalization of the well–known heuristic 'process of elimination'. It is a discrete model of probabilistic choice worked out by Tversky (1971), which supposes that decision makers follow a particular heuristic during a process of sequential selection. Like the lexicographic method, EBA examines one attribute at a time, starting with the attributes deemed most important, to make comparisons among alternatives. However, it differs slightly since it eliminates alternatives which do not satisfy some minimum performance, and it proceeds until all alternatives except one have been eliminated, although adjustment of the performance threshold may be required in some cases in order to achieve a unique solution. Another difference is that the attributes are not ordered in terms of importance, but in terms of their discrimination power in a probabilistic mode.

Tversky has formalized the decision process mathematically with the introduction of selection probability as a theoretical concept in the analysis of choice. The model proposed by Tversky postulates that the utility assigned to the candidate designs is deterministic, but that the decision rules applied by the decision makers are intrinsically probabilistic. Each alternative is viewed as a set of attributes which could represent values along some fixed quantitative or qualitative scales (e.g., price, quality, comfort). Since the model describes selection as an elimination process governed by successive choices of attributes instead of cut–offs, it is called elimination by aspects.

After a set of attributes is selected, the EBA heuristic rule focuses first on the most important attribute and searches for a clear winner. If a winner emerges, the process stops. If a winner does not emerge, attention focuses on the second attribute, and so forth. The decision maker, as in the conjunctive method, is assumed to have minimum cut–offs for each attribute. When an attribute is selected, all design alternatives not passing the cut–off on that attribute are eliminated. The process stops when all but one alternative are eliminated.


In the EBA model, individual choices are described as the result of a stochastic process involving a successive elimination of the design alternatives:

• the attributes common to all alternatives are eliminated because they cannot discriminate between solutions during the selection process;

• all the alternatives that do not possess a randomly selected attribute are eliminated; the higher the utility of an attribute, the larger the probability that it is selected;

• if the remaining solutions still have distinguishing attributes, the decision maker goes back to the first step; otherwise, if all solutions have the same attributes, the procedure ends: if only one alternative remains, it is selected by the decision maker; otherwise, all the remaining alternatives have the same probability of being selected.

The EBA is similar in some respects to the lexicographic method and to the conjunctive method. It differs from them in that, due to its probabilistic nature, the criteria for elimination (i.e. the selected attributes) and the order in which they are applied vary from one design situation to another and are not determined in advance. In particular, it differs from the conjunctive method in that the number of criteria for elimination varies. If an attribute which belongs only to a single alternative is chosen at the first stage, then EBA needs only one attribute. Elimination by aspects has some advantages: it is relatively easy to apply, it involves no numerical computations, and it is easy to explain and justify in terms of a priority ordering defined on the attributes. The major flaw in the logic of elimination by aspects lies in the noncompensatory nature of the selection process. Although each selected attribute is desirable, it might lead to the elimination of alternatives that are better than those which are retained. In general, the strategy of EBA cannot be defended as a rational procedure of choice. On the other hand, there may be many decision situations in which it provides a good approximation to much more complicated compensatory models and could thus serve as a useful simplification procedure.
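As a rough illustration of the stochastic elimination process described above, the following Python sketch repeatedly draws an attribute with probability proportional to its weight (used here as a stand-in for the utility of the corresponding aspect) and removes the alternatives that fail that attribute's cut–off; the function eliminate_by_aspects, the weights and the cut–offs are all hypothetical illustrative choices rather than Tversky's exact model.

import random

def eliminate_by_aspects(X, weights, cutoffs, rng=random.Random(1)):
    """Probabilistic elimination: aspects (attributes) are drawn with probability
    proportional to their weight; alternatives below the drawn cut-off are dropped."""
    candidates = list(range(len(X)))
    attributes = list(range(len(weights)))
    while len(candidates) > 1 and attributes:
        j = rng.choices(attributes, weights=[weights[a] for a in attributes], k=1)[0]
        attributes.remove(j)
        survivors = [i for i in candidates if X[i][j] >= cutoffs[j]]
        if survivors:                       # never eliminate every remaining alternative
            candidates = survivors
    return candidates                       # a single index, or ties left to chance

X = [[21.0, 4500.0, 1200.0],
     [19.5, 6000.0, 1500.0],
     [23.0, 4000.0,  900.0]]
print(eliminate_by_aspects(X, weights=[0.5, 0.3, 0.2], cutoffs=[20.0, 4200.0, 1000.0]))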

4.7.3 Permutation Method

The permutation method (Paelinck, 1976) aims to identify the dominating design, that is, the best ordering of the alternative rankings among candidate designs, by measuring the level of concordance and discordance of the complete preference order. It uses Jacquet–Lagrèze's successive permutations of all possible rankings of the alternative designs against all others. The method was originally developed to treat cases where cardinal preferences (i.e., a set of weights) of the attributes are given, but it is rather designed for ordinal preferences. With m alternatives, m! permutation rankings are available.

Suppose a number of alternatives (Ai, i = 1, 2, . . . , m) have to be evaluated according to n attributes (xj, j = 1, 2, . . . , n).


The problem can be stated in a decision matrix D as

              x1    x2    . . .   xn
        A1  | x11   x12   . . .   x1n |
D =     A2  | x21   x22   . . .   x2n |
        ... |  .     .    . . .    .  |
        Am  | xm1   xm2   . . .   xmn |

Assume that a set of cardinal weights wj, j = 1, 2, . . . , n, with Σ wj = 1, is given for the corresponding attributes.

Suppose that the problem is to rank three alternatives: A1, A2 and A3. Then six permutations of the ranking of the alternatives exist (m! = 3! = 6). They are:

P1 = (A1, A2, A3) P4 = (A2, A3, A1)

P2 = (A1, A3, A2) P5 = (A3, A1, A2)

P3 = (A2, A1, A3) P6 = (A3, A2, A1)

Assume that the testing order of the alternatives is P5 = (A3, A1, A2). Then the set of concordance partial orders is {A3 ≥ A1, A3 ≥ A2, A1 ≥ A2} and the set of discordance partial orders is {A3 ≤ A1, A3 ≤ A2, A1 ≤ A2}. If the partial ranking Ak ≥ Al appears in the hypothesized ranking, the fact that xkj ≥ xlj will be rated wj, while xkh ≤ xlh will be rated −wh. The evaluation criterion of the chosen hypothesis (ranking of the alternatives) is the algebraic sum of the wj's corresponding to the element–by–element consistency. Consider the ith permutation

Pi = (. . . ,Ak, . . . ,Al, . . .) , i = 1, 2, . . . ,m!

where Ak is ranked higher than Al.

Then the evaluation criterion of Pi, i.e. Ri, is given by

Ri = Σ_{j∈Ckl} wj − Σ_{j∈Dkl} wj ,   i = 1, 2, . . . , m!     (4.34)

where

Ckl = { j | xkj ≥ xlj } ,   k, l = 1, 2, . . . , m ,  k ≠ l
Dkl = { j | xkj ≤ xlj } ,   k, l = 1, 2, . . . , m ,  k ≠ l

The concordance set Ckl is the subset of all criteria for which xkj ≥ xlj, and the discordance set Dkl is the subset of all criteria for which xkj ≤ xlj.

The permutation method is useful owing to its flexibility with regard to ordinal and cardinal rankings. A possible drawback of this method is the fact that, in the absence of a clearly dominant alternative, rather complicated conditions for the values of the weights may arise, particularly because numerical statements about ordinal weights are not easy to interpret. Moreover, the number of permutations grows drastically with the number of alternatives.
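A minimal Python sketch of the permutation method for a small number of alternatives is shown below; it evaluates every permutation with the criterion of equation (4.34), summed over all ordered pairs implied by the hypothesized ranking, and permutation_method as well as the small decision matrix and weights are illustrative assumptions only.

from itertools import permutations

def permutation_method(X, w):
    """Return the permutation of alternative indices with the highest evaluation
    criterion R (eq. 4.34), scoring +wj / -wj for each consistent / inconsistent pair."""
    m = len(X)
    best_perm, best_R = None, float("-inf")
    for perm in permutations(range(m)):
        R = 0.0
        for a in range(m):
            for b in range(a + 1, m):
                k, l = perm[a], perm[b]          # Ak is ranked above Al in this hypothesis
                for j, wj in enumerate(w):
                    R += wj if X[k][j] >= X[l][j] else -wj
        if R > best_R:
            best_perm, best_R = perm, R
    return best_perm, best_R

X = [[0.7, 0.5, 0.9],
     [0.8, 0.6, 0.4],
     [0.6, 0.9, 0.7]]
w = [0.5, 0.3, 0.2]
print(permutation_method(X, w))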


4.8 Methods with Cardinal Information

The methods in this class require the decision maker's cardinal preferences over the attributes. This is the most common way of expressing inter–attribute preference information. All of them involve implicit trade–offs, but their evaluation principles are quite different:

• select an alternative in problems that have a hierarchical structure of attributes (AHP);

• select the alternative which has the largest utility (simple additive weighting, hierarchical additive weighting);

• arrange a set of overall preference rankings which best satisfies a given concordance measure (linear assignment method, ELECTRE);

• select the alternative which has the largest relative closeness to the ideal solution and the largest relative distance from the anti–ideal solution (TOPSIS).

Each multiattribute decision process with cardinal information is based on four steps:

• determine the target, the attributes and the alternatives;

• allocate weights to the attributes representing their relative importance, and deliver weights to the alternatives representing their effects on these attributes;

• process the scores to determine a ranking of the solutions;

• analyze the sensitivity of the results by systematically varying the weights allocated to the different attributes, to assess the uncertainty of the derivation.

4.8.1 Analytical Hierarchy Process

The Analytical Hierarchy Process (AHP) method, originally developed by Saaty (1980), deals with the study of how to derive relative scales using judgement and data from a standard scale. It needs information from the decision maker on a cardinal scale. AHP helps capture both subjective preferences and objective evaluation measures, providing a useful mechanism for checking the consistency of the evaluation measures and alternatives generated by the design team, thus reducing bias in decision making.

AHP is a powerful and flexible decision–making process to help decision makers set priorities and make the best decisions when both qualitative and quantitative aspects of a decision need to be considered. This mathematically based method is intended to solve selection problems that have a hierarchical structure of attributes. Hence decision makers can make use of their level of expertise and apply judgement to the attributes deemed important to achieve the goals.

By reducing complex decisions to a series of one–on–one (pairwise) comparisons and then synthesizing the results, AHP helps decision makers arrive at the best possible decisions. Attributes in one level are compared in terms of relative importance with respect to an element in the immediately higher level, treating the pairwise comparison with the eigenvector method as outlined in Sen and Yang (1998).


The method is developed through two steps. The first step is for the decision maker to decompose the decision problem into its constituent parts, progressing from the general to the specific. Since the decision maker is assumed to be consistent in making evaluations about any one pair of attributes, and since all attributes always rank equally when compared to themselves, one has aij = 1/aji and aii = 1. This means that it is necessary to make only m(m − 1)/2 comparisons to establish the full set of pairwise judgements for m attributes. The entries aij, i, j = 1, . . . , m, can be arranged in a pairwise comparison matrix A of size m×m.

The second step is to estimate the set of weights that is most consistent with the imprecise evaluations expressed in the comparison matrix. Note that while there is complete consistency in the (reciprocal) evaluations made about any one pair, consistency of the evaluations between pairs, i.e. aij·ajk = aik for all i, j, k (the transitivity rule), is not guaranteed. Thus the task is to search for an m–vector of weights such that the m×m matrix of ratios wi/wj provides the best fit to the evaluations recorded in the pairwise comparison matrix.

To assess the scale ratio wi/wj, Saaty (1977) gives a nine–point scale expressing the intensity of the preference for one attribute over another, as shown in Table 4.2. If attribute Ai is deemed more important than attribute Aj, the corresponding index value is assigned to aij and its reciprocal to aji. Saaty's original method to compute the weights is based on matrix algebra and determines them as the elements of the eigenvector associated with the maximum eigenvalue of the matrix.

Intensity of    Verbal Judgement
Importance      of Preference                      Explanation
---------------------------------------------------------------------------------------------
1               Equal importance or preference     Two attributes contribute equally
                                                   to the goal
3               Moderate preference of one         Experience and evaluation slightly
                attribute over another             favor one attribute over another
5               Essential or strong preference     Experience and evaluation strongly
                                                   favor one attribute over another
7               Very strong or demonstrated        An attribute is strongly favored and
                preference                         its dominance is demonstrated in practice
9               Extreme preference                 The evidence favoring one attribute over
                                                   another is of the highest possible order
                                                   of affirmation
2, 4, 6, 8      Intermediate values                When compromise is needed between the
                                                   two adjacent evaluations

Table 4.2. Pairwise comparison scale of attributes in AHP

Similarly to the calculation of the weights for the attributes, AHP also uses the technique based on pairwise comparisons to determine the relative performance scores of the decision matrix for each of the alternatives on each subjective evaluation. These scores use the same set of nine index assessments as before, and the same techniques can be used as in computing the weights of the attributes.

In spite of its qualities, the AHP method has several disadvantages. First, it requires the attributes to be independent with respect to their preferences, which is rarely the case in design selection processes. Second, all attributes and alternatives are compared with each other (at a given level), which may cause a logical conflict of the kind: A > B and B > C, but C > A. The likelihood of such conflicts occurring in the hierarchy trees increases dramatically with the number of alternatives and attributes. Moreover, a number of specialists have voiced concerns about AHP, including the potential internal inconsistency and the questionable theoretical foundation of the rigid 1–9 scale, as well as the phenomenon of rank reversal possibly arising when a new alternative is introduced. At the same time, there have also been attempts to derive similar methods that retain the strengths of AHP while avoiding some of the criticisms.
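A minimal Python/NumPy sketch of the weight-estimation step follows: given a reciprocal pairwise comparison matrix, the weights are taken as the normalized principal eigenvector, and a consistency index is derived from the maximum eigenvalue; the function ahp_weights and the sample matrix are hypothetical illustrations, not the prescribed implementation.

import numpy as np

def ahp_weights(A):
    """Weights from the principal eigenvector of a reciprocal comparison matrix A,
    plus the consistency index CI = (lambda_max - m)/(m - 1)."""
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                      # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                                  # normalize the weights to sum to 1
    ci = (eigvals[k].real - m) / (m - 1)
    return w, ci

# Hypothetical comparisons of three attributes on the 1-9 scale of Table 4.2
A = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w, ci = ahp_weights(A)
print(np.round(w, 3), round(ci, 4))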

4.8.2 Simple Additive Weighting Method

The simple additive weighting method (Klee, 1971) is one of the best known and most widely used MADM methods.

To reflect his/her marginal worth assessments within attributes, the decision maker assigns to each attribute the importance weights which become the coefficients of the elements in the decision matrix, thus making a numerical scaling of intra–attribute values. He/she can then obtain a total score for each alternative simply by multiplying the scale rating for each attribute value by the importance weight assigned to the attribute and then summing these products over all attributes. After the total scores are computed for each alternative, the solution with the highest score (the highest weighted average) is the one suggested to the decision maker. Although this technique is easy to apply, it runs the risk of ignoring interactions among the attributes.

Mathematically, the simple additive weighting method can be stated as follows. Suppose the decision maker assigns a set of importance weights w = (w1, w2, . . . , wn) to the attributes. Then the most preferred alternative, A∗, is selected such that

A∗ = { Ai | max_i Σ_{j=1}^{n} wj xij / Σ_{j=1}^{n} wj }     (4.35)

where xij is the outcome of the ith alternative with respect to the jth attribute on a numerically comparable scale. Usually the weights are normalized so that Σ wj = 1.

The simple additive weighting method uses all n attribute values of an alternative and relies on the regular arithmetical operations of multiplication and addition; therefore, the attribute values must be both numerical and comparable. Further, it is also necessary to find a reasonable basis on which to form the weights reflecting the importance of each of the attributes.

When weights are assigned and attribute values are numerical and comparable, some arbitrary assumptions still remain. It can happen that a low outcome multiplied by a high weight yields about the same product as a high attribute value multiplied by a low weight. This identity then implies that the two attributes just 'offset each other', that is, both make the same contribution to the weighted average. Thus there exist some difficulties in interpreting the output of the multiplication of attribute values by weights.

Attributes often cannot be considered separately and then added together; because of the complementarities between the various attributes, the approach of weighted averages may give misleading results. But when the attributes can in fact be considered separately (i.e., when there are essentially no important complementarities), the simple additive weighting method can be a very powerful tool in MADM. This method leads to a unique choice since a single number is arrived at for each alternative, and since these numbers will usually be different. For this reason, and because it has some intuitive appeal, it is frequently used.
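A minimal Python sketch of the simple additive weighting rule of equation (4.35) is given below, assuming the decision matrix has already been transformed to a comparable (e.g. normalized) scale; simple_additive_weighting is a hypothetical function name used only for illustration.

def simple_additive_weighting(R, w):
    """Eq. (4.35): weighted-average score per alternative on a comparable scale R;
    returns (scores, index of the best alternative)."""
    total_w = sum(w)
    scores = [sum(wj * rij for wj, rij in zip(w, row)) / total_w for row in R]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return scores, best

# Normalized decision matrix (rows = alternatives) and attribute weights
R = [[0.91, 0.75, 0.80],
     [0.85, 1.00, 1.00],
     [1.00, 0.67, 0.60]]
w = [0.5, 0.3, 0.2]
scores, best = simple_additive_weighting(R, w)
print([round(s, 3) for s in scores], best)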

The utility function used in the case of uncertainty can equally be used in the case of certainty, where it is called a value function V(x1, x2, . . . , xn). A value function satisfies the following property

V(x1, x2, . . . , xn) ≥ V(x′1, x′2, . . . , x′n)  ⇐⇒  (x1, x2, . . . , xn) ≥ (x′1, x′2, . . . , x′n)

For independent attributes, a value function takes the form

V(x1, x2, . . . , xn) = Σ_{j=1}^{n} wj vj(xj) = Σ_{j=1}^{n} wj rj

where vj(·) is the value function for the jth attribute and rj is the jth attribute transformed onto the comparable scale. A utility function can be a value function, but a value function is not necessarily a utility function; that is

U(x1, x2, . . . , xn) =⇒ V(x1, x2, . . . , xn)

Hence a valid additive utility function can be substituted for the simple additive weighting function.

In the simple additive weighting method, it is assumed that the utility (score, value) of the multiple attributes can be separated into utilities for each of the individual attributes. When the attributes in question are complementary (that is, excellence with respect to one attribute enhances the utility of excellence with respect to another), or substitutes (that is, excellence with respect to one attribute reduces the utility gain associated with excellence with respect to other attributes), it is hard to expect the attributes to take the separable additive form. Then the overall score or performance can be expressed in a quasi–additive or multilinear form (Keeney and Raiffa, 1976). But theory, simulation computations, and experience all suggest that the simple additive weighting method yields extremely close approximations to very much more complicated nonlinear forms, while remaining far easier to use.

4.8.3 Hierarchical Additive Weighting Method

In the simple additive weighting method, the weighted average (or priority value) for the alternative Ai is given by

Σ_{j=1}^{n} wj xij / Σ_{j=1}^{n} wj

where it is generally imposed that Σ wj = 1 and the attribute values xij are on a ratio scale.

If one interprets the normalized value xij as the subscore of the ith alternative with regard to the jth attribute (Klee, 1971), then the vector xj = (x1j, x2j, . . . , xmj) may indicate the contribution or importance of the Ai's for the jth attribute, whereas the weight vector w still represents the importance of the considered attributes for the decision problem.

In fact, the more sophisticated hierarchical additive weighting recognizes that attributes may simply be means towards higher–level targets. Hence, the decision maker assigns preferences to the higher–level targets and then assesses the capability of each of the attributes in attaining these higher–level targets. In this way he/she infers the inter–attribute weighting from his/her direct assessment of the higher–level targets. Such an approach matches Saaty's hierarchical structures (Saaty, 1977).

Consider, for example, a ship selection problem which can be represented as a hierarchy with three levels. In Figure 4.3 the first hierarchy level has a single attribute, say, the life–time effectiveness of the ship. Its priority (weight) value is assumed to be equal to unity. The second hierarchy level has six attributes: maximum speed, range, maximum payload, acquisition cost, reliability, and maneuverability. Their weights are derived from the various weight assessing methods with respect to the attribute of the first level. The third hierarchy level has the four candidate ships considered. In this level weights should be derived with respect to each attribute of the second level. The problem is to determine the priorities of the different ships on life–time effectiveness through the intermediate second level.

Figure 4.3. A hierarchy for priorities in ship concept design

To this end, it is essential to structure a formal hierarchy (Saaty, 1977) in terms of partially ordered sets reflecting the decision maker's intuitive understanding of the design concept. The hierarchy of priorities has various levels: the top level consists of a single element, and each element of a given level dominates (serves as a property for) some or all of the elements in the level immediately below. The pairwise comparison matrix approach may then be applied to compare elements in a single level with respect to a property from the adjacent higher level. The process is repeated up the hierarchy, and the problem is to compose the resulting weights (obtained by either the eigenvector method or the least weighted method) in such a way as to obtain one overall weighting vector of the impact of the lowest elements on the top element of the hierarchy, by successive weighting and composition.
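The weighting-and-composition step can be illustrated by a minimal Python sketch in which the local priorities of the candidate ships with respect to each second-level attribute are collected column-wise and combined with the attribute weights; all names and numbers below are hypothetical.

# Local priorities of 4 hypothetical candidate ships (rows) with respect to
# 3 second-level attributes (columns); each column sums to 1.
S = [[0.40, 0.10, 0.30],
     [0.30, 0.20, 0.30],
     [0.20, 0.30, 0.20],
     [0.10, 0.40, 0.20]]
# Weights of the second-level attributes with respect to the top-level target
w = [0.5, 0.3, 0.2]

# Composition: overall priority of ship i = sum_j w_j * S[i][j]
overall = [sum(wj * sij for wj, sij in zip(w, row)) for row in S]
print([round(p, 3) for p in overall])   # the overall priorities sum to 1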

4.8.4 Linear Assignment Method

Bernardo and Blin (1977) developed the linear assignment method, which gives an overall preference ranking of the alternatives based on a set of attribute–wise rankings and a set of attribute weights. It features a linear compensatory process for attribute interaction and combination. In the process only ordinal data, rather than cardinal data, are used as input. This weaker information requirement is attractive in that scaling the qualitative attributes is not needed.

This method requires, in addition to the decision matrix data, cardinal importance weights for each attribute and rankings of the alternatives with respect to each attribute. The primary use of the additional information is to enable compensatory rather than noncompensatory analysis, that is, to allow good performance on one attribute to compensate for low performance on another.

The linear assignment method is a special type of linear programming problem. Besides being able to determine the best alternative, the method has certain unique advantages in application. For data collection, all that is required is the attribute–wise rankings. Thus the tedious requirements of the existing compensatory models are eliminated; i.e., the rather lengthy procedures of trade–off analysis are not required. The procedure also eliminates the obvious difficulties encountered in constructing appropriate interval–scaled indices of attributes, as required for regression analysis to be applicable. Even though a lengthy data gathering effort is eliminated, the method does satisfy the compensatory hypothesis, whereas other procedures which rely on minimal data do not: for instance, the elimination by aspects approach and the lexicographic method are not truly compensatory.

A compensatory model is devised from this simple approach. Let us define a product–attribute matrix π as a square (m×m) non–negative matrix whose element πik represents the frequency (or number) of attribute–wise rankings in which Ai holds the kth rank. It is understood that πik measures the contribution of Ai to the overall ranking if Ai is assigned to the kth overall rank: the larger πik, the more concordance in assigning Ai to the kth overall rank. Hence the problem is to find, for each k (k = 1, 2, . . . , m), the Ai which maximizes Σ πik. This is an m! comparison problem; a linear programming model is suggested for the case of large m.

Let us define a permutation matrix P as an (m×m) square matrix whose element Pik = 1 if Ai is assigned to overall rank k, and Pik = 0 otherwise. The linear assignment method can then be written in the following linear programming form


Maximize     Π = Σ_{i=1}^{m} Σ_{k=1}^{m} πik Pik

subject to   Σ_{k=1}^{m} Pik = 1 ,   i = 1, 2, . . . , m

             Σ_{i=1}^{m} Pik = 1 ,   k = 1, 2, . . . , m

             Pik ≥ 0 ,   ∀ i, k                                              (4.36)

Recall that Pik = 1 if alternative i is assigned rank k. Of course, alternative i can be assigned to only one rank; therefore, the first constraint equation in (4.36) holds. Likewise, a given rank k can have only one alternative assigned to it; therefore, the second constraint equation in (4.36) holds.

Let the optimal permutation matrix, the solution of the above LP problem, be P∗. Then the optimal ordering can be obtained by multiplying the attribute–wise preference matrix A by P∗.
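Since the LP of equation (4.36) is a standard assignment problem, a minimal sketch can delegate its solution to SciPy's linear_sum_assignment routine (assuming SciPy is available); here each occurrence in the attribute-wise rankings is weighted by the attribute's cardinal weight, which is one common way of filling the π matrix, and build_pi and linear_assignment_ranking are hypothetical names.

import numpy as np
from scipy.optimize import linear_sum_assignment

def build_pi(rankings, weights):
    """pi[i][k] = total weight of the attributes under which alternative i is ranked k-th.
    rankings[j] lists the alternatives of attribute j from best to worst."""
    m = len(rankings[0])
    pi = np.zeros((m, m))
    for ranking, w in zip(rankings, weights):
        for k, i in enumerate(ranking):
            pi[i, k] += w
    return pi

def linear_assignment_ranking(rankings, weights):
    """Solve eq. (4.36): assign each alternative to one overall rank, maximizing total concordance."""
    pi = build_pi(rankings, weights)
    rows, cols = linear_sum_assignment(pi, maximize=True)
    order = [None] * len(rows)
    for i, k in zip(rows, cols):
        order[k] = i                      # alternative i occupies overall rank k
    return order                          # list of alternative indices, best rank first

# Hypothetical attribute-wise rankings of 3 alternatives under 3 weighted attributes
rankings = [[0, 1, 2],     # attribute 1: A0 best, then A1, then A2
            [1, 0, 2],     # attribute 2
            [2, 1, 0]]     # attribute 3
weights = [0.5, 0.3, 0.2]
print(linear_assignment_ranking(rankings, weights))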

4.8.5 ELECTRE Method

The ELECTRE (Elimination et Choice Translating Reality) method was originally proposed by Benayoun et al. (1966). Since then, Roy (1973) and Nijkamp and van Delft (1977) have developed it to its present state.

This method uses the concept of an outranking aggregate relationship built from pairwise comparisons of the alternatives on each attribute. The outranking relationship of two alternatives, denoted Ak → Al, states that even though the kth alternative does not dominate the lth alternative quantitatively, the decision maker may still accept the risk of regarding Ak as almost surely better than Al (Roy, 1973). Through sequential assessments of the outranking relationships of all alternatives, the dominated alternatives can be eliminated.

The ELECTRE method sets the criteria for the assessment of the outranking relationships by eliciting, for each pair of alternatives, a concordance index and a discordance index. The former represents the sum of all the weights of those attributes for which the performance score of alternative Ak is at least as high as that of Al; the latter is the counterpart of the concordance index. Finally, a binary outranking relation between the alternatives is yielded as the final result.

The ELECTRE method is applied in situations where the less favored alternatives should be eliminated and a leading set of alternatives should be produced; this holds particularly in cases with a large number of alternatives and only a few attributes involved. The outranking procedure takes the following steps:


Step 1. Normalize the decision matrix

During this step the attribute scales xij are transformed into comparable scales. Each value rij in the normalized decision matrix R

        | r11   r12   . . .   r1n |
R =     | r21   r22   . . .   r2n |     (4.37)
        | ...   ...   . . .   ... |
        | rm1   rm2   . . .   rmn |

can be calculated as the normalized preference measure of the ith alternative in terms of the jth attribute as

rij = xij / sqrt( Σ_{i=1}^{m} xij^2 )

so that all attributes have the same unit length of vector.

Step 2. Weight the normalized decision matrix

The weighted normalized decision matrix V is calculated by multiplying each column of the matrix R by its associated weight wj, as determined off–line by the decision maker; it is therefore equal to

            | v11   . . .   v1j   . . .   v1n |     | w1·r11   . . .   wj·r1j   . . .   wn·r1n |
V = R·w =   | v21   . . .   v2j   . . .   v2n |  =  | w1·r21   . . .   wj·r2j   . . .   wn·r2n |     (4.38)
            | ...           ...           ... |     |   ...              ...             ...  |
            | vm1   . . .   vmj   . . .   vmn |     | w1·rm1   . . .   wj·rmj   . . .   wn·rmn |

where w is the diagonal matrix of the attribute weights

w = diag (w1, w2, . . . , wn)    with    Σ_{j=1}^{n} wj = 1

Step 3. Define the concordance and discordance set

For each pair of alternatives k and l (k, l = 1, 2, . . . , m and k ≠ l), the set of decision attributes J = { j | j = 1, 2, . . . , n } is divided into two distinct subsets. The concordance set Ckl of the two alternatives Ak and Al is defined as the set of all attributes for which Ak is preferable to Al; that is

Ckl = { j | xkj ≥ xlj } ,   j = 1, 2, . . . , n     (4.39)


On the other hand, the complementary set is called the discordance set, which is

Dkl = {j | xkj < xlj} = J − Ckl (4.40)

Step 4. Calculate the concordance and discordance matrices

The relative value of the elements in the concordance set is calculated by means of the concordance index, which is equal to the sum of all the weights associated with the attributes contained in the concordance set. Therefore, the concordance index ckl between Ak and Al is defined as

ckl = Σ_{j∈Ckl} wj / Σ_{j=1}^{n} wj

which for a normalized weight set reduces to

ckl = Σ_{j∈Ckl} wj ,    with  0 ≤ ckl ≤ 1     (4.41)

The concordance index reflects the relative importance of Ak with respect to Al: a higher value of ckl indicates that Ak is preferable to Al as far as the concordance criteria are concerned. The successive values of the concordance indices ckl (k, l = 1, 2, . . . , m and k ≠ l) form the concordance matrix C of (m×m) terms

        | −      c12    c13    . . .   c1(m−1)    c1m |
        | c21    −      c23    . . .   c2(m−1)    c2m |
C =     | ...    ...    ...    . . .     ...      ... |
        | cm1    cm2    cm3    . . .   cm(m−1)    −   |

which is generally not symmetric.

So far, no attention has been paid to the degree to which the properties of a certain alternative Ak are worse than the properties of a competing alternative Al. Therefore a second index, called the discordance index, has to be defined as

dkl = max_{j∈Dkl} | vkj − vlj | / max_{j∈J} | vkj − vlj |     (4.42)

where 0 ≤ dkl ≤ 1 and the terms vkj, vlj denote the weighted normalized values for the jth attribute. A higher value of dkl implies that, as far as the discordance criteria are concerned, Ak is less preferable than Al, and a lower value of dkl implies that Ak is more preferable than Al. The discordance indices form the discordance matrix Dx of (m×m) terms, which is generally an asymmetric matrix.


\[
D = \begin{bmatrix}
- & d_{12} & d_{13} & \dots & d_{1(m-1)} & d_{1m} \\
d_{21} & - & d_{23} & \dots & d_{2(m-1)} & d_{2m} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
d_{m1} & d_{m2} & d_{m3} & \dots & d_{m(m-1)} & -
\end{bmatrix}
\]

It should be noticed that the information contained in the concordance matrix $C$ is considerably different from that contained in the discordance matrix $D$, so that their information contents are complementary. In other terms, the concordance matrix describes differences among weights, whereas differences among attribute values are represented by means of the discordance matrix.

Step 5. Determine the concordance and discordance dominance matrices

These matrices can be calculated with the help of a threshold value for the concordance index and for the discordance index, respectively. That means that, as far as concordance is concerned, $A_k$ will only have a chance of dominating $A_l$ if its corresponding concordance index $c_{kl}$ exceeds at least a certain threshold value $\bar{c}$, i.e.

\[
c_{kl} \ge \bar{c}
\]

This cut-off value can be determined, for example, as the average concordance index

\[
\bar{c} = \frac{1}{m\,(m-1)} \sum_{k=1}^{m} \sum_{\substack{l=1 \\ l \neq k}}^{m} c_{kl} \tag{4.43}
\]

On the basis of the threshold value, the Boolean concordance dominance matrix $F$ can be constructed, whose elements are determined as

\[
\left.
\begin{aligned}
f_{kl} &= 1 \quad\text{if } c_{kl} \ge \bar{c} \\
f_{kl} &= 0 \quad\text{if } c_{kl} < \bar{c}
\end{aligned}
\;\right\} \tag{4.44}
\]

Each unit element of the matrix $F$ then represents the dominance of one alternative with respect to another one.

The discordance dominance matrix $G$ is constructed in a way analogous to the $F$ matrix, on the basis of a threshold value $\bar{d}$ for the discordance indices, calculated as

\[
\bar{d} = \frac{1}{m\,(m-1)} \sum_{k=1}^{m} \sum_{\substack{l=1 \\ l \neq k}}^{m} d_{kl} \tag{4.45}
\]

The unit elements of the Boolean matrix $G$, defined as

\[
\left.
\begin{aligned}
g_{kl} &= 1 \quad\text{if } d_{kl} \le \bar{d} \\
g_{kl} &= 0 \quad\text{if } d_{kl} > \bar{d}
\end{aligned}
\;\right\} \tag{4.46}
\]

represent the dominance relationships between any two alternatives.


Step 6. Determine the aggregate dominance matrix

The next step is to calculate the intersection of the concordance dominance matrix $F$ and the discordance dominance matrix $G$. The resulting matrix, called the aggregate dominance matrix $E$, is defined by means of its typical elements $e_{kl}$ as follows

\[
e_{kl} = f_{kl}\cdot g_{kl} \tag{4.47}
\]

Step 7. Eliminate the less favorable alternatives

The aggregate dominance matrix $E$ gives the partial preference ordering of the alternatives. If $e_{kl} = 1$, the alternative $A_k$ is preferable to the alternative $A_l$ for both the concordance and the discordance criteria, but $A_k$ still has the chance of being dominated by the other alternatives. Hence, according to the ELECTRE procedure, the condition that $A_k$ is not dominated is

\[
\left.
\begin{aligned}
e_{kl} &= 1 \quad\text{for at least one } l\,,\; l = 1, 2, \dots, m \,;\; k \neq l \\
e_{ik} &= 0 \quad\text{for all } i\,,\; i = 1, 2, \dots, m \,;\; i \neq k \,,\; i \neq l
\end{aligned}
\;\right\} \tag{4.48}
\]

This condition appears difficult to apply, but the dominated alternatives can be easily identified in the aggregate dominance matrix. If any column of the $E$ matrix has at least one element equal to 1, then this column is ‘ELECTREcally’ dominated by the corresponding row(s). Hence any column which has an element equal to 1 is simply eliminated.

A weak point of the ELECTRE method is the use of the threshold values $\bar{c}$ and $\bar{d}$. These values are rather arbitrary, although their impact on the final solution may be significant. For example, if the decision maker takes a cut-off value of $\bar{c} = 1$ and $\bar{d} = 0$ for complete dominance, then it is rather difficult to eliminate any of the alternatives. By relaxing the threshold values ($\bar{c} = 1$ lowered; $\bar{d} = 0$ increased) the number of nondominated solutions can be reduced.

Nijkamp and van Delft (1977) have introduced the net dominance relationships for the complementary analysis of the ELECTRE method. First they define the net concordance dominance value $c_k$, which measures the degree to which the total dominance of the alternative $A_k$ exceeds the degree to which all competing alternatives dominate $A_k$, i.e.

\[
c_k = \sum_{\substack{l=1 \\ l \neq k}}^{m} c_{kl} - \sum_{\substack{l=1 \\ l \neq k}}^{m} c_{lk} \tag{4.49}
\]

Similarly, the net discordance dominance value dk is defined as

\[
d_k = \sum_{\substack{l=1 \\ l \neq k}}^{m} d_{kl} - \sum_{\substack{l=1 \\ l \neq k}}^{m} d_{lk} \tag{4.50}
\]

Obviously $A_k$ has a higher chance of being accepted the higher $c_k$ and the lower $d_k$. Hence the final selection should satisfy the condition that its net concordance dominance value should be at a maximum and its net discordance dominance value at a minimum. If one of these conditions is not satisfied, a certain trade-off between the values of $c_k$ and $d_k$ has to be carried out. The procedure


is to rank the alternatives according to their net concordance and net discordance dominance values. The alternative that scores on average as the highest one can be selected as the final solution.

The ELECTRE method should be considered one of the best ranking methods because of its simple logic and its full utilization of the information contained in the decision matrix.
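As a minimal numerical sketch of Steps 1 to 7 above (not part of the original text), the fragment below runs the outranking procedure with the average-value thresholds (4.43) and (4.45); the decision matrix, the weights and all variable names are illustrative assumptions, and every attribute is treated as a benefit attribute.

```python
import numpy as np

def electre(x, w):
    """Sketch of the ELECTRE outranking steps for a decision matrix x (m x n)
    and normalized weights w, all attributes being benefits."""
    m, n = x.shape
    r = x / np.sqrt((x ** 2).sum(axis=0))          # Step 1: vector normalization
    v = r * w                                      # Step 2: weighted normalized matrix (4.38)

    c = np.zeros((m, m))                           # concordance indices (4.41)
    d = np.zeros((m, m))                           # discordance indices (4.42)
    for k in range(m):
        for l in range(m):
            if k == l:
                continue
            conc = x[k] >= x[l]                    # Step 3: concordance set C_kl (4.39)
            c[k, l] = w[conc].sum()
            denom = np.abs(v[k] - v[l]).max()
            disc = np.abs(v[k] - v[l])[~conc].max() if (~conc).any() else 0.0
            d[k, l] = disc / denom if denom > 0 else 0.0

    off = ~np.eye(m, dtype=bool)
    c_bar = c[off].mean()                          # threshold (4.43)
    d_bar = d[off].mean()                          # threshold (4.45)
    f = (c >= c_bar) & off                         # concordance dominance matrix F (4.44)
    g = (d <= d_bar) & off                         # discordance dominance matrix G (4.46)
    e = f & g                                      # aggregate dominance matrix E (4.47)

    dominated = e.any(axis=0)                      # Step 7: a column containing a 1 is eliminated
    return e.astype(int), np.flatnonzero(~dominated)

# Hypothetical example: 4 alternatives scored on 3 benefit attributes
x = np.array([[7.0, 9.0, 9.0],
              [8.0, 7.0, 8.0],
              [9.0, 6.0, 8.0],
              [6.0, 7.0, 8.0]])
w = np.array([0.5, 0.3, 0.2])
E, kept = electre(x, w)
print("aggregate dominance matrix:\n", E)
print("non-dominated alternatives (0-based indices):", kept)
```

With stricter thresholds fewer entries of $E$ equal 1, so fewer alternatives are eliminated, which is exactly the sensitivity to $\bar{c}$ and $\bar{d}$ discussed above.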

4.8.6 TOPSIS Method

The Technique for Order Preference by Similarity to the Ideal Solution (TOPSIS), initially presented by Yoon and Hwang (1985), is an alternative to the ELECTRE method. It is one of the compromising methods among the compensatory techniques, which utilizes preference information provided in the form of weights $w_j$ for each attribute. TOPSIS is attractive in that limited subjective input is needed from decision makers. It originates from the concept of displaced ideal (Zeleny, 1974), according to which the selected alternative should have the shortest distance from the ideal solution and the farthest from the anti-ideal solution. Commonly used metrics ($L_1$, $L_2$ and $L_\infty$) are considered to measure the distance from the zenith and nadir points, on whose relative closeness and remoteness, respectively, the preferred solution is adopted.

Then it is easy to locate the ideal solution, which is composed of all the best attribute values attainable, and the anti-ideal solution, composed of all the worst attribute values attainable. Sometimes the chosen alternative, which has the minimum Euclidean distance from the ideal solution, also has a shorter distance to the anti-ideal solution with respect to the other alternatives. Figure 4.4 shows an example where an alternative $A_1$ has shorter distances (both to the ideal solution $A^*$ and to the anti-ideal solution $A^-$) than $A_2$. In this case it is very difficult to justify the selection of $A_1$.

Figure 4.4. Euclidean distances to the ideal and anti–ideal points in 2D space

This method again requires a decision matrix but also needs relative weights to represent preference information. TOPSIS also assumes that each attribute is monotonically increasing or decreasing.


The Algorithm

The TOPSIS method starts from the following decision matrix, whose rows refer to the $m$ alternatives $A_1, \dots, A_m$ and whose columns to the $n$ attributes $X_1, \dots, X_n$

\[
D = \begin{bmatrix}
x_{11} & x_{12} & \dots & x_{1j} & \dots & x_{1n} \\
x_{21} & x_{22} & \dots & x_{2j} & \dots & x_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{i1} & x_{i2} & \dots & x_{ij} & \dots & x_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{m1} & x_{m2} & \dots & x_{mj} & \dots & x_{mn}
\end{bmatrix}
\]

where xij denotes the performance measure of the ith alternative in terms of the jth attribute.

TOPSIS takes cardinal preference information on the attributes; that is, a set of weights for the attributes is required. The solution depends upon the weighting scheme given by the decision maker. Reliable methods for weight assessment have appeared (Chu et al., 1979; Saaty, 1977; Zeleny, 1974), which enhance the usage of this method. TOPSIS assumes each attribute in the decision matrix to take either monotonically increasing or monotonically decreasing utility. In other words, the larger the attribute outcomes, the greater the preference for the ‘benefit’ attributes and the less the preference for the ‘cost’ attributes. Further, any attribute which is expressed in a non-numerical way should be quantified through an appropriate scaling technique.

The process of TOPSIS includes a series of six successive steps as follows.

Step 1. Construct the normalized decision matrix

This process standardizes the various dimensional attributes into non-dimensional attributes, which allows comparison across the attributes. One way is to divide the outcome of each attribute by the norm of the total outcome vector of the criterion at hand, also called the Euclidean length of a vector. An element $r_{ij}$ of the normalized decision matrix $R$ can be calculated as

\[
r_{ij} = \frac{x_{ij}}{\sqrt{\displaystyle\sum_{i=1}^{m} x_{ij}^2}}
\]

obtained over all existing solutions; consequently, each attribute has the same unit length of vector.

Step 2. Form the weighted normalized decision matrix

A set of weights $w = (w_1, w_2, \dots, w_j, \dots, w_n)$, with $\sum w_j = 1$, is accommodated to the decision matrix in this step. This matrix can be calculated by multiplying each column of the matrix $R$ by its associated weight $w_j$. Therefore, the weighted normalized decision matrix $V$ is generated as follows


\[
V = R\cdot w =
\begin{bmatrix}
v_{11} & v_{12} & \dots & v_{1j} & \dots & v_{1n} \\
\vdots & \vdots & & \vdots & & \vdots \\
v_{i1} & v_{i2} & \dots & v_{ij} & \dots & v_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
v_{m1} & v_{m2} & \dots & v_{mj} & \dots & v_{mn}
\end{bmatrix}
=
\begin{bmatrix}
w_1 r_{11} & w_2 r_{12} & \dots & w_j r_{1j} & \dots & w_n r_{1n} \\
\vdots & \vdots & & \vdots & & \vdots \\
w_1 r_{i1} & w_2 r_{i2} & \dots & w_j r_{ij} & \dots & w_n r_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
w_1 r_{m1} & w_2 r_{m2} & \dots & w_j r_{mj} & \dots & w_n r_{mn}
\end{bmatrix}
\]

Step 3. Identify the ideal and anti–ideal solutions

The ideal solution $A^*$ and the anti-ideal solution, denoted as $A^-$, are the collections of the best and the worst values of the attributes, defined respectively as

\[
A^* = \{ (\max_i v_{ij} \mid j \in J),\; (\min_i v_{ij} \mid j \in J')\,,\; i = 1, 2, \dots, m \} = \{ v_1^*, v_2^*, \dots, v_j^*, \dots, v_n^* \} \tag{4.51}
\]

\[
A^- = \{ (\min_i v_{ij} \mid j \in J),\; (\max_i v_{ij} \mid j \in J')\,,\; i = 1, 2, \dots, m \} = \{ v_1^-, v_2^-, \dots, v_j^-, \dots, v_n^- \} \tag{4.52}
\]

where

\[
\begin{aligned}
J &= \{ j = 1, 2, \dots, n \mid j \text{ associated with benefit criteria} \} \\
J' &= \{ j = 1, 2, \dots, n \mid j \text{ associated with cost criteria} \}
\end{aligned}
\]

Then it is obvious that the previously created alternatives $A^*$ and $A^-$ represent the most preferable alternative, i.e. the ideal solution, and the least preferable alternative, or anti-ideal solution, respectively.

Step 4. Develop the separation measure over each attribute to both zenith and nadir

The separation distances of each alternative from the ideal solution and the anti-ideal solution are measured by the $n$-dimensional Euclidean metric. That means $S_{i*}$ is the distance (in a Euclidean sense) of each alternative from the ideal solution, defined as

\[
S_{i*} = \sqrt{\sum_{j=1}^{n} (v_{ij} - v_j^*)^2} \,, \qquad i = 1, 2, \dots, m \tag{4.53}
\]

Similarly, the separation from the anti-ideal solution is given by

\[
S_{i-} = \sqrt{\sum_{j=1}^{n} (v_{ij} - v_j^-)^2} \,, \qquad i = 1, 2, \dots, m \tag{4.54}
\]


Step 5. Determine the relative closeness to the ideal solution

The relative closeness of an alternative $A_i$ to the ideal solution $A^*$ is then found for each design as

\[
C_{i*} = \frac{S_{i-}}{S_{i*} + S_{i-}} \,, \qquad 0 < C_{i*} < 1 \,,\quad i = 1, 2, \dots, m \tag{4.55}
\]

Apparently, an alternative $A_i$ is closer to the ideal solution $A^*$ as $C_{i*}$ approaches 1. Thus $C_{i*} = 1$ if $A_i = A^*$, and $C_{i*} = 0$ if $A_i = A^-$.

Step 6. Rank the preference order among alternatives

Now a preference order can be ranked according to the descending order of $C_{i*}$. Therefore, the best alternative is the one with the largest value of $C_{i*}$, that is, with the shortest distance to the ideal solution and the largest distance from the anti-ideal solution.

Unfortunately, TOPSIS suffers from two weaknesses. Firstly, the separation between the alternatives and the ideal and anti-ideal points is defined via a Euclidean distance measurement. This metric is highly sensitive to the subjective weights used to build the weighted normalized decision matrix, and the sensitivity increases further for higher-dimension decision spaces. The second problem concerns the fact that the distance definition automatically assumes that the attributes can be directly compensated by each other in a simple manner. This may lead the method to select designs with strange balances between attributes, possibly leading to extreme solutions.
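For concreteness, a minimal sketch of the six TOPSIS steps is given below; the decision matrix, the weights, the attribute interpretation and the choice of which attribute is a cost criterion are invented for illustration.

```python
import numpy as np

def topsis(x, w, cost=()):
    """Sketch of the TOPSIS steps for a decision matrix x (m x n), weights w
    summing to one, and the column indices of cost attributes in `cost`."""
    r = x / np.sqrt((x ** 2).sum(axis=0))             # Step 1: normalize by Euclidean length
    v = r * w                                         # Step 2: weighted normalized matrix
    ideal = v.max(axis=0)                             # Step 3: ideal solution A*
    anti = v.min(axis=0)                              #         anti-ideal solution A-
    for j in cost:                                    # for cost attributes the best value is the minimum
        ideal[j], anti[j] = anti[j], ideal[j]
    s_star = np.sqrt(((v - ideal) ** 2).sum(axis=1))  # Step 4: separation from A*  (4.53)
    s_minus = np.sqrt(((v - anti) ** 2).sum(axis=1))  #         separation from A-  (4.54)
    c_star = s_minus / (s_star + s_minus)             # Step 5: relative closeness   (4.55)
    return c_star, np.argsort(-c_star)                # Step 6: descending ranking

# Hypothetical ship alternatives scored on (speed, payload, building cost);
# the third attribute is treated as a cost criterion.
x = np.array([[21.0, 5200.0, 48.0],
              [23.0, 4800.0, 55.0],
              [20.0, 5600.0, 50.0]])
w = np.array([0.4, 0.4, 0.2])
closeness, ranking = topsis(x, w, cost=(2,))
print("relative closeness:", np.round(closeness, 3))
print("preference order (best first):", ranking)
```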

SAW reviewed through TOPSIS

Probably the best known and most widely used MADM method is the Simple Additive Weighting (SAW) method. This method is so simple that some decision makers are reluctant to accept the solution. The SAW method is re-examined here through the concept of TOPSIS.

SAW chooses the alternative which has the maximum weighted average outcome, that is, it selects $A^+$ such that

\[
A^+ = \left\{ A_i \;\Big|\; \max_i \sum_{j=1}^{n} w_j\, r_{ij} \Big/ \sum_{j=1}^{n} w_j \right\}
\]

where $\sum w_j = 1$ and $r_{ij}$ is the normalized outcome of $A_i$ with respect to the $j$th benefit criterion (a cost criterion is converted to a benefit by taking the reciprocal before normalization).

The selected alternative $A^+$ can be rewritten as

\[
A^+ = \left\{ A_i \;\Big|\; \max_i \sum_{j=1}^{n} v_{ij} \right\}
\]


Let the separation measure in TOPSIS be defined by the city block distance (Dasarathy, 1976) instead of the Euclidean distance; then the separation between $A_i$ and $A_k$ can be written as

\[
S_{ik} = \sum_{j=1}^{n} |v_{ij} - v_{kj}| \,, \qquad i, k = 1, 2, \dots, m \,;\; i \neq k
\]

This city block distance measure has the following useful relationship for the separation measures to both the ideal and the anti-ideal solution (Yoon, 1980)

\[
S_{i*} + S_{i-} = S_{*-} = K
\]

where K is a positive constant.

This relationship states that any alternative which has the shortest distance to the ideal solution is guaranteed to have the longest distance to the anti-ideal solution. This is not true for the Euclidean distance measure (see Fig. 4.4). Now the relative closeness to the ideal solution can be simplified as

\[
C_{i*} = \frac{S_{i-}}{S_{*-}} \,, \qquad i = 1, 2, \dots, m
\]

Under the hypothesis that the chosen alternative $A^+$ can be described as

\[
A^+ = \{ A_i \mid \max_i C_{i*} \}
\]

it can be proved that

\[
A^+ = \left\{ A_i \;\Big|\; \max_i \sum_{j=1}^{n} v_{ij} \right\} = \{ A_i \mid \max_i C_{i*} \}
\]

so that it can be concluded that the result of SAW is a special case of TOPSIS using the city block distance.
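The equivalence can be checked numerically with a few lines of code; the weighted normalized matrix below is an arbitrary illustrative assumption.

```python
import numpy as np

# Hypothetical weighted normalized matrix V (rows = alternatives)
v = np.array([[0.12, 0.30, 0.08],
              [0.20, 0.22, 0.10],
              [0.16, 0.26, 0.06]])

ideal, anti = v.max(axis=0), v.min(axis=0)
s_star = np.abs(v - ideal).sum(axis=1)     # city block separation from the ideal
s_minus = np.abs(v - anti).sum(axis=1)     # city block separation from the anti-ideal

print(s_star + s_minus)                    # the same constant K = S*- for every alternative
saw_score = v.sum(axis=1)                  # SAW: sum of weighted normalized outcomes
closeness = s_minus / (s_star + s_minus)   # TOPSIS closeness with the city block metric
print(np.argsort(-saw_score), np.argsort(-closeness))   # identical rankings
```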

4.9 MAUT Method of Group Decision

There are several approaches to extend the basic multiattribute decision making techniques to the case of group decisions; among others, the method developed by Csaki et al. (1995) is presented here.

Consider a decision problem with $r$ group members (decision makers) $D_1, \dots, D_r$, $n$ design alternatives $A_1, \dots, A_n$, and $m$ attributes $X_1, \dots, X_m$. In the case of a factual attribute the evaluation scores must be identical for any alternative and any decision maker, while subjective attributes can be evaluated differently by each decision maker. Denote the result of the evaluation of the decision maker $D_k$ for alternative $A_j$ on the attribute $X_i$ by $a_{ij}^k$. Assume that any problem arising from the different dimensions of the attributes has already been settled, and that the $a_{ij}^k$ values are the results of proper transformations.


The individual preferences over the attributes are expressed as weights. Let the weights of importance $w_i^k \ge 0$ be assigned to attribute $X_i$ by the decision maker $D_k$, $i = 1, \dots, m$; $k = 1, \dots, r$.

The different knowledge and priority of the group members are expressed by voting powers, both for weighting the attributes and for scoring the alternatives against the attributes. For factual attributes, only the preference weights given by the decision makers will be revised at each attribute by the voting powers for weighting. However, in the case of subjective attributes, not only the weights but also the $a_{ij}^k$ values will be modified by the voting powers for scoring.

Let $V(w)_i^k$ denote the voting power assigned to $D_k$ for weighting on attribute $X_i$, and $V(q)_i^k$ the voting power assigned to $D_k$ for scoring on attribute $X_i$. The method of calculating the group utility (group ranking value) of alternative $A_j$ is as follows:

• For each attribute $X_i$, the individual weights of importance of the attributes are aggregated into the group weight $W_i$ as

\[
W_i = \frac{\displaystyle\sum_{k=1}^{r} w_i^k\, V(w)_i^k}{\displaystyle\sum_{k=1}^{r} V(w)_i^k} \,, \qquad i = 1, \dots, m \tag{4.56}
\]

• The group scoring $Q_{ij}$ of alternative $A_j$ against attribute $X_i$ is

\[
Q_{ij} = \frac{\displaystyle\sum_{k=1}^{r} a_{ij}^k\, V(q)_i^k}{\displaystyle\sum_{k=1}^{r} V(q)_i^k} \,, \qquad i = 1, \dots, m \,;\; j = 1, \dots, n \tag{4.57}
\]

• The group utility $U_j$ of $A_j$ is determined as the weighted algebraic mean of the aggregated scoring values with the aggregated weights

\[
U_j = \frac{\displaystyle\sum_{i=1}^{m} W_i\, Q_{ij}}{\displaystyle\sum_{i=1}^{m} W_i} \,, \qquad j = 1, \dots, n \tag{4.58}
\]

The best alternative of the group decision is the one associated with the highest group utility. A correct group utility function for cardinal ranking must satisfy the axioms given in Keeney (1976).
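A compact sketch of the aggregation rules (4.56)–(4.58) follows; the number of decision makers, the scores, the weights and the voting powers are all hypothetical.

```python
import numpy as np

def group_utility(a, w, vw, vq):
    """Group utilities U_j from individual scores a[k, i, j], individual weights
    w[k, i], voting powers for weighting vw[k, i] and for scoring vq[k, i]."""
    W = (w * vw).sum(axis=0) / vw.sum(axis=0)                          # group weights W_i   (4.56)
    Q = (a * vq[:, :, None]).sum(axis=0) / vq.sum(axis=0)[:, None]     # group scores Q_ij   (4.57)
    U = (W[:, None] * Q).sum(axis=0) / W.sum()                         # group utilities U_j (4.58)
    return U

# Hypothetical case: r = 2 decision makers, m = 3 attributes, n = 2 alternatives
a = np.array([[[0.7, 0.5], [0.6, 0.8], [0.9, 0.4]],      # scores given by D1
              [[0.6, 0.6], [0.5, 0.9], [0.8, 0.5]]])     # scores given by D2
w = np.array([[0.5, 0.3, 0.2],                           # attribute weights of D1
              [0.4, 0.4, 0.2]])                          # attribute weights of D2
vw = np.array([[1.0, 1.0, 1.0], [2.0, 1.0, 1.0]])        # voting powers for weighting
vq = np.array([[1.0, 2.0, 1.0], [1.0, 1.0, 2.0]])        # voting powers for scoring
print(np.round(group_utility(a, w, vw, vq), 3))          # pick the alternative with the largest U_j
```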


4.10 Methods for Trade–offs

A shipowner sometimes trades in a second-hand ship plus some amount of money for a new ship, based upon his/her acceptance of the market offer. The procedure behind this commercial transaction applies to multiple attribute decision making situations. If he/she can settle for a lower value on one attribute (i.e. reduce an amount of his/her own capital), how much can he/she expect to get for the improved value of another attribute (i.e. long-term profit)? Another specific example in choosing a ship is that if the shipowner is willing to lower the range of a ship, how much passenger space can he/she get if the other properties remain the same?

Most MADM methods, except the noncompensatory models, deal with trade-offs implicitly or explicitly. A trade-off is the ratio of the change in one attribute that exactly offsets a change in another attribute.

Here two methods are discussed where trade-off information is explicitly utilized. The marginal rate of substitution (MRS) and indifference curves are the two basic terms describing trade-off information.

4.10.1 Marginal Rate of Substitution

Suppose that in a ship selection problem, where two attributes $x_1$ (range) and $x_2$ (payload) are specified as desirable attributes while the other attributes remain equal, the decision maker is asked: if $x_2$ is increased by $\Delta$ units, how much does $x_1$ have to decrease in order for the decision maker to remain indifferent? Clearly, in many cases, the answer will depend on the levels $\bar{x}_1$ of $x_1$ and $\bar{x}_2$ of $x_2$. If, at a point $(\bar{x}_1, \bar{x}_2)$, the decision maker is willing to give up $\lambda\Delta$ units of $x_1$ for $\Delta$ units of $x_2$, then it is said that the marginal rate of substitution (MRS) of $x_1$ for $x_2$ at $(\bar{x}_1, \bar{x}_2)$ is $\lambda$. In other words, $\lambda$ is the amount of $x_1$ the decision maker is willing to pay for a unit of $x_2$, given that he/she presently has $\bar{x}_1$ of $x_1$ and $\bar{x}_2$ of $x_2$ (Fig. 4.5). The marginal rate of substitution is the rate at which one attribute can be used to replace another.

Making trade-offs among three attributes is usually more difficult than making trade-offs between two attributes. Hence only pairs of attributes are usually considered at a time.

It should be noted that when two attributes are independent of each other (noncompensatory), trade-offs between these attributes are not relevant. In this case it is not possible to get a higher value on one attribute even though the decision maker is willing to give up a great outcome of another attribute.

The marginal rate of substitution usually depends on the levels of $x_1$ and $x_2$, that is, on $(\bar{x}_1, \bar{x}_2)$. For example, suppose the substitution rate at $(\bar{x}_1, \bar{x}_2)$, the point $b$ in Figure 4.5, is $\lambda_b$. If $x_1$ is held fixed, one might find that the substitution rates increase with a decrease in $x_2$ and decrease with an increase in $x_2$, as shown at points $a$ and $c$ in Figure 4.5 for $x_1$ (range) and $x_2$ (payload). The changes in the substitution rates mean that the more of $x_2$ the decision maker has, the less of $x_1$ he/she would be willing to give up to gain additional amounts of $x_2$, and the sacrifice of


$x_1$ is less at point $c$ than at point $a$. This implies that the MRS at which the decision maker would give up $x_1$ for gaining $x_2$ decreases as the level of $x_2$ increases, i.e. the marginal rate of substitution diminishes.

Figure 4.5. The marginal rate of substitution as a function of x1 and x2
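The notion can be made concrete by estimating $\lambda$ numerically from an assumed value function $f(x_1, x_2)$: the MRS at a point is the ratio of the partial sensitivities, $(\partial f/\partial x_2)/(\partial f/\partial x_1)$. The value function below is purely hypothetical and chosen only so that the diminishing MRS is visible.

```python
import numpy as np

def mrs(f, x1, x2, h=1e-5):
    """Finite-difference estimate of the marginal rate of substitution of x1 for x2,
    i.e. (df/dx2) / (df/dx1) at the point (x1, x2)."""
    df_dx1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    df_dx2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return df_dx2 / df_dx1

# Hypothetical value function over range x1 and payload x2 (both desirable)
value = lambda x1, x2: np.sqrt(x1) + np.log(x2)

for x2 in (40.0, 80.0, 160.0):                     # increasing payload level
    print(x2, round(mrs(value, 4000.0, x2), 2))    # the MRS of x1 for x2 diminishes
```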

4.10.2 Indifference Curves

Consider again the ship selection problem with $x_1$ (fuel consumption) and $x_2$ (payload expressed by the volume per passenger). Consider $A_1$ as a reference point, a ship whose fuel consumption is 26 quintals per day and whose individual cabin space is 81 ft³. $A_1$ can then be expressed as the point (26, 81) in Figure 4.6. The indifference curve would require a new alternative, say $A_2$ (20, 95), that the decision maker would deem equivalent in preference to $A_1$. By obtaining a number of such points, it would be possible to trace out a curve of indifference through $A_1$. The indifference curve is, then, the locus of all attribute values indifferent to a reference point. The decision maker can draw any number of indifference curves with different reference points.

The indifference curve can be thought of as the locus of a set of alternatives among which the decision maker is indifferent. It is particularly useful because it divides the set of all attribute values into (i) those indifferent to the reference point, (ii) those preferred to the reference point, and (iii) those to which the reference point is preferred. It is well known that any point on the preferred side of the indifference curve is preferred to any point on the curve or on the non-preferred side of the curve. Hence, if the decision maker is asked to compare $A_1$ with any other alternative, he/she can immediately indicate a choice. However, if several points are given on the preferred side of the indifference curve, nobody can say which is the most preferred. It would be


necessary to draw new indifference curves. See the preference relationships $A_1 \sim A_2 \succ A_3 \succ A_4$ in Figure 4.6.

Figure 4.6. A set of indifference curves

Three major properties are assumed for indifference curves (MacCrimmon and Toda, 1969). The first property is non-intersection, as opposed to intersection, which would imply an intransitivity of preference and whose occurrence would generally indicate a rushed consideration of preferences. The second property relates to the desirability of the attributes considered: if the decision maker assumes both attributes are desirable, then in order to get more of one attribute he/she would be willing to give up some amount of a second attribute; this leads to a negatively sloped indifference curve. The third property is an empirical matter, in the sense that the indifference curves are assumed to be convex to the preference origin. This implies that the marginal rate of substitution diminishes.

Note that the slope of the indifference curves in Figure 4.6 gets steeper as the curves are moved down (from right to left). It can also be observed that the MRS at $(\bar{x}_1, \bar{x}_2)$ is the negative reciprocal of the slope of the indifference curve at $(\bar{x}_1, \bar{x}_2)$. Thus, if indifference curves are drawn, the decision maker can directly calculate the marginal rate of substitution.

MacCrimmon and Toda (1969) suggest some effective methods for obtaining indifference curves. One of their methods uses three types of structured procedures: (i) generating points by fixing only one attribute; (ii) generating points by fixing both attributes, but fixing them one at a time; and (iii) generating points by fixing both attributes simultaneously.


4.10.3 Indifference curves in SAW and TOPSIS

The two different separation measures of TOPSIS (city block distance and Euclidean distance) can be contrasted by means of the concept of trade-offs. Mathematically, if an indifference curve passing through a point $(v_1, v_2)$ is given by

\[
f(v_1, v_2) = c \tag{4.59}
\]

where $f$ is a value function and $c$ is a constant, then the marginal rate of substitution $\lambda$ at $(v_1, v_2)$ can be obtained as

\[
\lambda = \left( -\frac{dv_1}{dv_2} \right)_{(v_1, v_2)} = \left( \frac{\partial f/\partial v_2}{\partial f/\partial v_1} \right)_{(v_1, v_2)} \tag{4.60}
\]

The Simple Additive Weighting method, or the TOPSIS method with the city block distance measure, has the value function

\[
f(v_1, v_2) = v_1 + v_2 \tag{4.61}
\]

The MRS is then given by $\lambda = 1$ (actually $\lambda = w_2/w_1$ in the $x_1$ and $x_2$ space). This implies that the MRS in SAW is constant between attributes, and the indifference curves are straight lines with slope $-1$. A constant MRS is a special, rare case of MRS, which implies that the local MRS is also the global MRS.

TOPSIS with the Euclidean distance measure has the value function

\[
f(v_1, v_2) = \frac{S_{i-}}{S_{i-} + S_{i*}} = c =
\frac{\sqrt{(v_1 - v_1^-)^2 + (v_2 - v_2^-)^2}}
{\sqrt{(v_1 - v_1^-)^2 + (v_2 - v_2^-)^2} + \sqrt{(v_1 - v_1^*)^2 + (v_2 - v_2^*)^2}} \tag{4.62}
\]

The MRS is now calculated as

\[
\lambda = \frac{S_{i-}^2\,(v_2^* - v_2) + S_{i*}^2\,(v_2 - v_2^-)}{S_{i-}^2\,(v_1^* - v_1) + S_{i*}^2\,(v_1 - v_1^-)} \tag{4.63}
\]

It is evident that the marginal rate of substitution depends on the levels of $v_1$ and $v_2$, except at the point where the distances to the ideal and anti-ideal solutions are equal, i.e. when $S_{i*} = S_{i-}$. In this case

\[
\lambda = \frac{v_2^* - v_2^-}{v_1^* - v_1^-} \tag{4.64}
\]

and it is not easy to illustrate the general shapes of the indifference curves.

If the value function is rewritten as

\[
f(v_1, v_2) = c\, S_{i*} - (1 - c)\, S_{i-} = 0 \tag{4.65}
\]

where $0 < c < 1$, this expression indicates a variation of a hyperbola whose weighted distances from the ideal and anti-ideal points differ by zero.
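Evaluating these expressions at a few trial points makes the contrast visible: the additive value function (4.61) has a constant MRS, while the Euclidean TOPSIS value function gives an MRS (4.63) that changes with the level of the attributes. The ideal and anti-ideal coordinates below are assumed for illustration only.

```python
import numpy as np

v_star = np.array([1.0, 1.0])      # assumed ideal point (v1*, v2*)
v_anti = np.array([0.0, 0.0])      # assumed anti-ideal point (v1-, v2-)

def mrs_saw(v):
    return 1.0                                        # constant MRS of eq. (4.61)

def mrs_topsis(v):
    s_minus2 = ((v - v_anti) ** 2).sum()              # S_i- squared
    s_star2 = ((v - v_star) ** 2).sum()               # S_i* squared
    num = s_minus2 * (v_star[1] - v[1]) + s_star2 * (v[1] - v_anti[1])
    den = s_minus2 * (v_star[0] - v[0]) + s_star2 * (v[0] - v_anti[0])
    return num / den                                  # eq. (4.63)

for v in (np.array([0.2, 0.4]), np.array([0.5, 0.5]), np.array([0.4, 0.8])):
    print(v, mrs_saw(v), round(mrs_topsis(v), 3))     # the TOPSIS MRS varies with the level
```

At the point equidistant from the ideal and anti-ideal solutions the computed value reduces to (4.64), as stated above.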


Some typical indifference curves are shown in Figure 4.7. Any curve with $c \ge 0.5$ is convex to the preference origin, which indicates the property of the diminishing MRS observed in most indifference curves (MacCrimmon and Toda, 1969), whereas indifference curves with $c \le 0.5$ are concave to the preference origin.

Figure 4.7. Typical indifference curves observed in TOPSIS

This is an unusual case of indifference curves, but it may be interpreted as a risk-prone attitude resulting from a pessimistic situation: when a decision maker recognizes that his/her solution is closer to the anti-ideal than to the ideal one, he/she is inclined to take the one which has the best attribute together with the other, worst attribute. This approach can be viewed, therefore, as an amalgamation of optimistic and pessimistic decision methods, as represented by the Hurwicz rule (Hey, 1979).

4.10.4 Hierarchical Trade–Offs

When interdependency exists among attributes, the consideration of trade-offs allows the decision maker to make the alternatives much more comparable than they are initially. That is, he/she can make alternatives equivalent for all attributes except one through trade-offs, and then evaluate the alternatives by the attribute values of the remaining one (MacCrimmon, 1969; MacCrimmon and Wehrung, 1977).

The simplest way to deal with trade-offs on $n$ attributes is to ignore all but two attributes; then attributes are discarded one by one through the trade-offs between natural combinations of two attributes. The indifference curves easily facilitate this equalization process. Suppose the alternatives are located on the indifference curves. One attribute level can be easily driven to the same level, and the corresponding modified value of the other attribute is read. The attribute which is driven to the same level (called the base level) is no longer necessary for further consideration. If this procedure can be carried through for pairs of the remaining $(n-2)$ attributes, the decision


maker will have a new set of $n/2$ attributes. Similarly, if these composite attributes also have pairs of natural combinations, he/she can consider the trade-offs among the pairs and use the indifference curves he/she obtains to scale a new higher-order composite attribute. The decision maker can continue this hierarchical combination until he/she obtains two high-order composite attributes for which he/she again forms the trade-off. In the end, all the attributes might be incorporated.

To select the preferred alternative with this approach, the decision maker must be able to locate it in the final composite space. This can be done by ensuring that each alternative is on an indifference curve in the initial space; thus, the combination of values defining an alternative will be one of the scale values for the new attribute. By including these combinations on an indifference curve at each step of the way, the decision maker can ensure that the alternatives will be representable in the highest-order space finally considered.

The use of this method requires that the attributes be independent among the initial classes. That is, while the trade-off between any initial pair can be nonconstant and highly interrelated, this trade-off cannot depend on the level of the other attributes. This restriction suggests that a useful way to form the initial pairs is by grouping attributes that seem relatively independent from the other ones.

A drawback of the hierarchical trade-off analysis with two attributes at a time may be its slowness in reducing attributes. MacCrimmon and Wehrung (1977) propose lexicographic trade-offs for eliminating this difficulty. If the most important class identified by the lexicographic method has more than one attribute, the decision maker forms trade-offs among these attributes. The second most important class of attributes is considered only if there are several alternatives having equally preferred attribute values in the most important class. This extended lexicography overcomes the noncompensatory character of the standard lexicography by considering trade-offs within a class.

Trade-off information is more useful when designing multiple attribute alternatives than when choosing among final versions of them.


Bibliography

[1] Benayoun, R., Roy, B. and Sussman, N.: Manual de Reference du Programme Electre, Note deSynthese et Formation, no. 25, Direction Scientifique SEMA, Paris, 1966.

[2] Bernardo, J.J. and Blin, J.M.: A Programming Model of Consumer Choice among Multi–AttributeBrands, Journal of Consumer Research, Vol. 4, no. 2, 1977, pp. 111–118.

[3] Calpine, H.C. and Golding, A.: Some Properties of Pareto–Optimal Choices in Decision Problems,OMEGA, Vol. 4, no. 1, 1976, pp. 141–147.

[4] Chu, A.T.W., Kalaba, R.E. and Spingarn, K.: A Comparison of Two Methods for Determining theWeights of Belonging to Fuzzy Sets, Journal of Optimization Theory and Applications, Vol. 27,no. 4, 1979, pp. 531–538.

[5] Dasarathy, E.V.: SMART: Similarity Measure Anchored Ranking Technique for the Analysis ofMultidimensional Data Analysis, IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC–6,no. 10, 1976, pp. 708–711.

[6] Dawes, R.M.: Social Selection Based on Multidimensional Criteria, Journal of Abnormal and SocialPsychology, Vol. 68, no. 1, 1964, pp. 104–109.

[7] Hey, J.D.: Uncertainty in Microeconomics, Martin Robertson, Oxford, 1979.

[8] Hwang, C.L. and Yoon, K.: Multiple Attribute Decision Making; Methods and Application - AState-of-the-Art Survey , Springer–Verlag, Berlin–Heidelberg, 1981.

[9] Keeney, R.L.: Decisions with Multiple Objectives: Preference and Value Tradeoffs, John Wiley, NewYork, 1976.

[10] Keeney, R.L. and Raiffa, H.: A Group Preference Axiomatization with Cardinal Utility , ManagementScience, Vol. 23, 1976, pp. 140–145.

[11] Klee, A.J.: The Role of Decision Models in the Evaluation of Competing Environmental HealthAlternatives, Management Science, Vol. 18, no. 2, 1971, pp. B52–B67.

[12] Linkov, I., Varghese, A., Jamil, S., Seager, T.P. and Bridges, T.: Multicriteria Decision Analysis: A Framework for Structuring Remedial Decisions at Contaminated Sites, in ‘Comparative Risk Assessment and Environmental Decision Making’, Linkov and Ramadan eds., Springer, New York, 2004, pp. 15–54.

[13] MacCrimmon, K.R. and Toda, M.: The Experimental Determination of Indifference Curves, TheReview of Economic Studies, Vol. 36, no. 4, 1969, pp. 433–450.

[14] MacCrimmon, K.R. and Wehrung, D.A.: Trade–off Analysis; the Indifference and Preferred Propor-tions Approaches, in ‘Conflicting Objectives in Decisions, Bell et al. eds., John Wiley, New York, 1977.

[15] Moskowitz, H. and Wright, G.P.: Operation Research Techniques for Management , Prentice–Hall,1979.

[16] Neumann, von J. and Morgenstern, O.: Theory of Games and Economic Behavior , PrincetonUniversity Press, Princeton, 1947.


[17] Nijkamp, P. and Delft, van A.: Multi–Criteria Analysis and Regional Decision–Making, Martinus Nijhoff Social Sciences Division, Leiden, 1977.

[18] Roy, B.: A Conceptual Framework for a Prescriptive Theory of ‘Decision–Aid’ , in Multiple CriteriaDecision Making, Cochrane & Zeleny eds., 1973, pp. 179–201.

[19] Saaty, T.L.: A Scaling Method for Priorities in Hierarchical Structures, Journal of MathematicalPsychology, Vol. 15, no. 3, 1977, pp. 234–281.

[20] Sen, P. and Yang, J.B.: Multiple Criteria Decision Support in Engineering Design, Springer–Verlag, Berlin–Heidelberg, 1998.

[21] Shannon, C.E.: A Mathematical Theory of Communication, Bell System Technical Journal, Vol. 27,1948, pp. 379–423.

[22] Shannon, C.E. and Weaver, W.: The Mathematical Theory of Communication, The University of Illinois Press, Urbana, Ill., 1947.

[23] Simon, H.A.: A Behavioral Model of Rational Choice, Quarterly Journal of Economics, Vol. 69,no. 1, 1955, pp. 99–114.

[24] Starr, M.K. and Greenwood, L.H.: Normative Generation of Alternatives with Multiple CriteriaEvaluation, in Multiple Criteria Decision Making, Starr & Zeleny eds., North Holland, New York,1977, pp. 111–128.

[25] Tversky, A.: Intransitivity of Preferences, Psychological Review, Vol. 76, no. 1, 1969, pp. 31–48.

[26] Tversky, A.: Elimination by Aspects: A Probabilistic Theory of Choice, Michigan MathematicalPsychology Program MMPM 71–12, The University of Michigan, Ann Arbor, 1971.

[27] Yoon, K.: Systems Selection by Multiple Attribute Decision Making , Ph.D. Dissertation, KansasState University, 1980.

[28] Yoon, K. and Hwang, C.L.: Multiple Attribute Decision Making: An Introduction, Sage, ThousandOaks, 1985.

[29] Zeleny, M.: Linear Multiobjective Programming , Springer–Verlag, Berlin/Heidelberg, 1974.


Chapter 5

Optimization Methods

The subject which started as operations research during the Second World War has grown theoretically and also in its applications to a variety of problems in different fields, such as engineering, economics and management. In its more comprehensive sense, which includes data collection, mathematical modelling, solution of mathematical problems and improvement through feedback of results, the subject has come to be known as systems analysis. The mathematical contents of systems analysis concerned with the optimization of objectives may be grouped under the heading optimization methods, which form the subject matter of these notes.

This chapter is an elementary mathematical introduction to classical optimization techniques, linear and nonlinear programming, and direct search methods. Most parts of the chapter can be studied independently of each other. A knowledge of algebra (including matrices), calculus and geometry is assumed.

In their application to real life, problems in systems analysis and operations research usually involve a large number of variables, parameters, equations and constraints. The problems generally involve too much numerical work to be handled other than by a digital computer. For this reason the methods of solution are computer oriented. The criterion of suitability of a method is often the economy and efficiency with which it can be programmed on the computer.


5.1 Mathematical Modelling

Optimization is the act of achieving the best result under given conditions. In the design, construction and maintenance of any technical system, engineers have to take many technological and managerial decisions at several stages. The ultimate goal of any such decision is either to minimize the effort required or to maximize the desired benefit. Since the effort required or the benefit desired in any practical situation can be expressed as a function of certain decision variables, optimization can be defined as the process of finding the conditions that give the maximum or minimum value of a function. It can be seen from Figure 5.1 that if a point $x^*$ corresponds to the minimum value of a function $f(x)$, the same point also corresponds to the maximum value of the negative of the function, $-f(x)$. Thus, without loss of generality, optimization can be taken to mean minimization, since the maximum of a function can be found by seeking the minimum of the negative of the same function.

Figure 5.1. Minimum of f(x) is same as maximum of −f(x)

A vector of objective functions is denoted as $\mathbf{f}(\mathbf{x})$ with components $f_i(\mathbf{x})$, $i = 1, \dots, n$. The standard form of an optimization problem assumes that all the objective functions are to be minimized. If there is a problem where some function $f(\mathbf{x})$ is to be maximized instead, it can be transformed to the standard form by minimizing $-f(\mathbf{x})$.

Design problems are translated into mathematical optimization problems by identifying the following elements of the mathematical model for all technical systems. The first is to specify the design variables, which the design team can change in order to optimize its design. The second is to define the objective functions, which are figures of merit to be minimized or maximized. The third is to identify the constraint functions, which specify limits that must be satisfied by the design variables.

If an $n$-dimensional Cartesian space with each coordinate axis representing a design variable $x_i$ is considered, this space is called the design space. Each point in the design space is called a design point, which represents a feasible or non-dominated solution to the design problem.

Solving a problem with a multiobjective function is much more complicated than solving a problem with a single objective function. The solution to an optimization problem with a single


objective is usually a single design point. On the contrary, when there are multiple objectives, the solution is usually a subspace of designs. This subspace is characterized by the condition (called Pareto optimality) that no objective function can be improved without some deterioration in another objective function. Because of the complexity associated with multiple objective functions, for the time being the focus will be put mostly on problems with a single objective.

The optimum searching methods are also known as mathematical programming techniques and are generally studied as a part of operations research. Operations research is a branch of mathematics which is concerned with the application of scientific methods and techniques to decision-making problems and with establishing the best or optimal solutions. There is no single method available for solving all kinds of optimization problems efficiently. Hence a number of optimization methods have been developed for solving different types of optimization problems. These methods can be broadly divided into three categories:

1. Mathematical programming techniques

• Calculus methods

• Calculus of variations

• Geometric programming

• Linear programming

• Integer programming

• Nonlinear programming

• Quadratic programming

• Stochastic programming

• Multiobjective programming

• Dynamic programming

• Theory of games

• Network methods

2. Stochastic process techniques

• Statistical decision theory

• Markov processes

• Queueing theory

• Simulation methods

• Reliability theory

3. Statistical methods

• Regression analysis

• Cluster analysis, pattern recognition

• Design of experiments

• Factor analysis


The mathematical programming techniques are useful for finding the minimum of a function of several variables, possibly under a prescribed set of constraints. The stochastic process techniques can be used to solve problems which are described by a set of random variables having known probability distributions. The statistical methods enable the decision maker to analyze experimental data and databases in order to build empirical models which should provide the most accurate representation of a physical phenomenon. This chapter essentially deals with the theory of the mathematical programming techniques that are suitable for the solution of engineering problems.

5.2 Historical Development

The existence of optimization methods can be traced back to the days of Newton, Lagrange and Cauchy. The development of differential calculus methods for optimization was possible because of the contributions of Newton and Leibniz. The foundations of the calculus of variations were laid by Bernoulli, Euler, Lagrange and Weierstrass. Cauchy made the first application of the steepest descent method to solve unconstrained minimization problems. The method of optimization for constrained problems, which involves the addition of unknown multipliers, became known by the name of its inventor, Lagrange. In spite of these early contributions, very little progress was made until the middle of the twentieth century, when high-speed digital computers made the implementation of optimization procedures possible and stimulated further research on new methods. Spectacular advances followed, producing a massive literature on optimization techniques and the emergence of several new areas in optimization theory.

It is interesting to note that the major developments in the area of numerical methods of unconstrained optimization were made in the United Kingdom only in the 1960s. The development of the simplex method by Dantzig in 1947 for linear programming problems and the statement of the principle of optimality by Bellman in 1957 for dynamic programming paved the way for the development of the methods of constrained optimization. The work by Kuhn and Tucker in 1951 on the necessary and sufficient conditions for the optimal solution of programming problems laid the foundations for a great deal of later research in nonlinear programming. Although no single technique has been refined to be universally applicable to nonlinear programming problems, the works by Carroll, and by Fiacco and McCormick as well, made many difficult problems feasible to solve by using the well-known techniques of unconstrained optimization. Geometric programming was developed in the 1960s by Duffin, Zener and Peterson. Gomory did pioneering work in integer programming, which is one of the most exciting areas of optimization, since most of the real-world applications fall under this category of problems. Dantzig, and Charnes and Cooper as well, developed stochastic programming techniques and solved some optimization problems by assuming the design parameters to be independent and normally distributed. The desire to optimize more than one objective or goal while satisfying the physical constraints led to the development of multiobjective programming methods. Goal programming is a well-known technique for solving specific types of multiobjective optimization problems. It was originally proposed for linear


problems by Charnes and Cooper. Network analysis methods are essentially management control techniques and were developed during the 1950s. The foundations of the theory of games were laid by von Neumann in 1928, and since then the theory has been applied to solve several mathematical economics and military problems.

Except for industrial engineering problems, there are very few linear problems in engineering design. The bulk of research work has gone into developing nonlinear techniques, and literally dozens of numerical techniques have been developed. However, none has been as successful as in the linear case, which guarantees a global optimum in a finite number of steps. Despite this, there are a number of good nonlinear methods that work successfully in most applications (Fletcher, 1970; Siddall, 1972).

5.3 Statement of an Optimization Problem

Optimization problems can be formulated using widely varying notation, which can inhibit effective communication about mathematical properties, algorithms and software. For this reason, there is a tendency to adopt a standard form of optimization formulation. Unfortunately, this tendency has not yet been fully realized, and the standard form described in this section is not universal, although it is quite common in engineering optimization textbooks.

The standard formulation of a mathematical programming problem for a single objective function can be stated as

\[
\begin{aligned}
\text{Find } & \mathbf{x} = \{x_1, x_2, \dots, x_n\} \text{ which minimizes } f(\mathbf{x}) \\
\text{subject to the constraints } & g_j(\mathbf{x}) \le 0 \,, \quad j = 1, 2, \dots, m \\
& h_k(\mathbf{x}) = 0 \,, \quad k = 1, 2, \dots, p \\
\text{with } & \mathbf{x}^L \le \mathbf{x} \le \mathbf{x}^U
\end{aligned} \tag{5.1}
\]

where $\mathbf{x}$ is the $n$-dimensional vector of design variables, whereas $f(\mathbf{x})$, defined on a subset of the $n$-dimensional real space $R^n$, is called the objective function, and $g_j$ and $h_k$ are the sets of algebraic inequality and equality constraints, respectively. The vectors $\mathbf{x}^L$ and $\mathbf{x}^U$ denote the lower limit vector and the upper limit vector, respectively.

The number of variables $n$ and the numbers of constraints $m$ and/or $p$ need not be related in any way. The problem stated in equation (5.1) is called a constrained optimization problem¹.

When an optimization problem does not involve any constraints, it is called an unconstrained optimization problem and can be stated simply as

\[
\text{Find } \mathbf{x} = \{x_1, x_2, \dots, x_n\} \text{ which minimizes } f(\mathbf{x}) \tag{5.2}
\]

¹In mathematical programming problems the equality constraints $h_k(\mathbf{x}) = 0$, $k = 1, 2, \dots, p$, are often neglected, for simplicity, in the statement of a constrained optimization problem, although several methods are available for handling problems with equality constraints.
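As a minimal sketch (not from the text), the standard form (5.1) maps almost term by term onto a general-purpose nonlinear programming solver; the example assumes SciPy is available and uses an invented two-variable problem with one inequality constraint, one equality constraint and simple bounds.

```python
import numpy as np
from scipy.optimize import minimize

# Invented problem in the standard form (5.1):
#   minimize   f(x) = (x1 - 3)^2 + (x2 - 2)^2
#   subject to g(x) = x1 + x2 - 4 <= 0,   h(x) = x1 - 2 x2 = 0,   0 <= x1, x2 <= 5
f = lambda x: (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2
g = lambda x: x[0] + x[1] - 4.0
h = lambda x: x[0] - 2.0 * x[1]

res = minimize(
    f,
    x0=np.array([1.0, 1.0]),                                  # starting design point
    method="SLSQP",
    bounds=[(0.0, 5.0), (0.0, 5.0)],                          # x^L <= x <= x^U
    constraints=[{"type": "ineq", "fun": lambda x: -g(x)},    # SciPy expects c(x) >= 0
                 {"type": "eq", "fun": h}],
)
print(res.x, res.fun)   # optimum design vector and objective value
```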


5.3.1 Definitions

It is worthwhile to establish a common language, at least for the most important definitions in mathematical programming problems. They are: the design vector, distinguishing between variables and parameters; single and multiobjective functions; design constraints (functional and geometric, active and inactive); feasible solution and feasible domain; simple and composite constraint surfaces; free and bound design points.

Design vector

Any engineering system or component is described by a set of quantities, some of which are viewed as variables during the design process. In general, certain quantities are fixed from the outset and these are called preassigned parameters. The vector of design variables is denoted as $\mathbf{x}$ with components $x_1, x_2, \dots, x_n$, denoting the $n$ design variables. In general, vectors are denoted by bold characters, whereas their components have the same character in regular font, with subscripts indicating the component number. Design variables can be real, integer or binary; they can be continuous or discrete.

When optimization problems are solved numerically, there is a substantial advantage in scaling all quantities to avoid ill-conditioned problems. It is customary to scale all design variables so that they are all of order 1. Besides improving the numerical conditioning of the problem, this practice also creates unit-independent design variables, which is often an advantage.

Design constraints

In many practical problems, the design variables have to satisfy certain specified functional and other requirements. The restrictions that must be satisfied in order to produce an acceptable design are collectively called design constraints. The constraints which represent limitations on the behavior or performance of the technical system are termed behavior or functional constraints. The constraints which represent physical limitations on the design variables are known as geometric constraints.

In engineering applications most constraints are inequality constraints; however, occasionally equality constraints are also used. As in the case of design variables, it is worthwhile to transform the constraints into non-dimensional forms of similar magnitudes.

A constraint which is satisfied with a margin is called inactive. A constraint with a positive value (under the $g_j(\mathbf{x}) \le 0$ convention) is called violated. When $g_j(\mathbf{x}) = 0$, the constraint is active. A design point which satisfies all the constraints is called feasible, while a design point which violates even a single constraint is called infeasible. The collection of all feasible points is called the feasible domain, or occasionally the constraint set.
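A small helper makes these definitions operational; the two constraint functions and the trial design points below are hypothetical and follow the $g_j(\mathbf{x}) \le 0$ convention of (5.1).

```python
def classify(gs, x, tol=1e-9):
    """Classify each inequality constraint g_j(x) <= 0 at the design point x
    and report whether the point is feasible."""
    status = []
    for g in gs:
        val = g(x)
        if val > tol:
            status.append("violated")
        elif abs(val) <= tol:
            status.append("active")
        else:
            status.append("inactive")
    return status, "violated" not in status

# Hypothetical constraints on a two-variable design
gs = [lambda x: x[0] + x[1] - 4.0,       # g1(x) <= 0
      lambda x: 1.0 - x[0]]              # g2(x) <= 0
print(classify(gs, [2.0, 2.0]))          # (['active', 'inactive'], True): feasible point on g1
print(classify(gs, [0.5, 1.0]))          # (['inactive', 'violated'], False): infeasible point
```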


Constraint surface

For illustration, consider an optimization problem with only inequality constraints $g_j(\mathbf{x}) \le 0$. The set of values of $\mathbf{x}$ that satisfy the equation $g_j(\mathbf{x}) = 0$ forms a hypersurface in the design space, which is called a constraint surface. The constraint surface divides the design space into two regions: one in which $g_j(\mathbf{x}) < 0$ and the other in which $g_j(\mathbf{x}) > 0$. Thus, the points lying on the hypersurface satisfy the constraint $g_j(\mathbf{x})$ critically, whereas the points lying in the region where $g_j(\mathbf{x}) < 0$ are feasible. The set of all the constraint surfaces $g_j(\mathbf{x}) = 0$, $j = 1, 2, \dots, m$, which separates the feasible region, is called the composite constraint surface.

Figure 5.2 shows a hypothetical two-dimensional design space where the infeasible region is indicated by hatched lines. A design point which lies on one or more constraint surfaces is called a bound point, and the associated constraint is called an active constraint. The design points which do not lie on any constraint surface are known as free points. Depending upon whether a particular design point belongs to the feasible or infeasible region, it can be identified as one of the following four types:

• free feasible point,
• free infeasible point,
• bound feasible point,
• bound infeasible point.

Figure 5.2. Constraint surfaces in a 2D design space

Objective function

The concept design procedures aim to find the ‘best possible’ design, not one which merely satisfies the functional, geometric and other requirements of the problem. The criterion with respect to which a subsystem of the design is optimized, when expressed as a function of the design variables, is known as the criterion, merit or objective function of the mathematical model. The choice of the objective function is governed by the nature of the problem. For example, the objective function for minimization may generally be taken as the steel weight in ship, aircraft and aerospace structural


design problems. The maximization of mechanical efficiency is the obvious choice of an objective in mechanical engineering systems design.

However, there may be cases where optimization with respect to a single objective leads to results which are not satisfactory with respect to another criterion. For example, in propeller design the geometry established for maximizing efficiency might not correspond to the one that would minimize the induced pressure forces. Similarly, in statically indeterminate structures, the fully stressed design may not correspond to the minimum weight design; again, it may not be the cheapest one.

In many situations, it could be feasible or even necessary to identify more than one criterion to be satisfied simultaneously. For example, a gear pair may have to be designed for minimum weight and maximum efficiency while transmitting a specified horsepower. With multiple objectives there arises the possibility of conflict, and one simple way to handle this issue is to assign subjective preference weights and to take the actual objective function as a linear combination of the conflicting multiple objective functions. Thus, if $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ are the two objective functions, it is possible to formulate the objective function for optimization as

\[
f(\mathbf{x}) = w_1\, f_1(\mathbf{x}) + w_2\, f_2(\mathbf{x}) \qquad\text{with } w_1 + w_2 = 1 \tag{5.3}
\]
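In code, the scalarization (5.3) is a one-liner that turns two conflicting merit functions into a single objective for any of the single-objective methods discussed in this chapter; the two objective functions and the weights below are illustrative assumptions.

```python
# Hypothetical conflicting objectives for a design x = (x1, x2):
f1 = lambda x: x[0] ** 2 + x[1] ** 2           # e.g. a weight-like measure
f2 = lambda x: (x[0] - 4.0) ** 2 + x[1] ** 2   # e.g. an efficiency-loss measure

def weighted_objective(x, w1=0.7, w2=0.3):
    """Scalarized objective of eq. (5.3); the weights satisfy w1 + w2 = 1."""
    return w1 * f1(x) + w2 * f2(x)

print(weighted_objective([1.0, 0.5]))   # the value handed to a single-objective optimizer
```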

Objective function surfaces

The locus of all points satisfying $f(\mathbf{x}) = c = \text{constant}$ forms a hypersurface in the design space, and to each value of the constant $c$ there corresponds a different member of a family of surfaces. These surfaces, called the objective function surfaces, are shown in a hypothetical two-dimensional design space in Figure 5.3.

Figure 5.3. Contours of the objective function

Once the objective function surfaces are drawn along with the constraint surfaces, the optimum point can be determined without much difficulty. But the main problem is that, as the number of design variables exceeds two or three, the constraint and objective function surfaces become too complex even to visualize, and the problem has to be solved purely as a mathematical problem.


5.3.2 Design Optimization

A design model at the preliminary design stage is called an optimization model, where the ‘best’ design selected is called the optimal design and the criterion used is called the objective of the model. Some optimization models will be studied later. What follows is just a discussion of the way design optimization models can be used in practice.

Optimal Design Concept

Today design is still the ultimate expression of the science of engineering. From the early days of engineering, the goal has been to improve the design so as to achieve the best way of satisfying the original need, within the available means.

The design process can be organized in many ways, but it is clear that there are certain elements in the process that any description must contain: a recognition of need, a phase of generation, and a selection of alternatives. Traditionally, the improvement of the ‘best’ alternative is the phase of design optimization. In a traditional description of the design phases, recognition of the original need is followed by a technical statement of the problem (problem definition), the generation of one or more physical configurations (synthesis), the study of the candidates’ performance using engineering science (analysis), and the improvement of the ‘best’ alternative (optimization). The process concludes with experimental validation of the prototype against the original need.

Such a sequential description, though perhaps useful for educational purposes, cannot describe reality adequately, since the question of how a ‘best’ design is improved within the available means is pervasive, influencing all phases where decisions are made.

So what is design optimization? One may recognize that a rigorous definition of ‘design optimization’ can be reached if the following questions are answered:

1. How to describe different designs?
2. What is the criterion for enhancing the ‘best’ design?
3. What are the ‘available means’?

The first question is addressed by describing a design as a system defined by design variables and parameters. The second question requires decision-making models, where the idea of ‘best possible’ design is introduced and the criterion for an optimal design is called an objective.

Designers are left with the last question on the ‘available means’, by which decision makers signify a set of requirements that must be satisfied by any acceptable design. These design requirements may not be uniquely defined but are under the same limitations as the choice of problem objective and variables. In addition, the choices of design requirements that must be satisfied are intimately related to the choice of objective function and design variables.

To summarize, informally, but rigorously, it can be said that design optimization involves:

1. the selection of a set of variables to describe the design alternatives;

2. the selection of an objective, expressed in terms of the design variables, to be minimized or maximized;


3. the determination of a set of constraints, expressed in terms of the design variables, which must be satisfied by any acceptable design;

4. the determination of a set of values for the design variables, which minimize (or maximize) the objective, while satisfying all the constraints. A minimal numerical sketch of these four elements is given below.
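The following sketch maps the four elements onto a small numerical example in Python with SciPy; the objective and the constraint are hypothetical placeholders, not a ship design model:

```python
# A minimal sketch of the four elements listed above, using SciPy.
# The objective and constraint are hypothetical placeholders.
import numpy as np
from scipy.optimize import minimize

# 1. design variables: x = [x1, x2]
x0 = np.array([1.0, 1.0])          # starting guess

# 2. objective expressed in terms of the design variables (to be minimized)
def objective(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2

# 3. constraints that any acceptable design must satisfy (here x1 + x2 - 1 >= 0)
constraints = [{"type": "ineq", "fun": lambda x: x[0] + x[1] - 1.0}]

# 4. values of the design variables that minimize the objective within the constraints
result = minimize(objective, x0, constraints=constraints, method="SLSQP")
print(result.x, result.fun)        # about [0.667, 0.333] and f = 0.667
```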

Optimal Product Development

The motivation for using design optimization models is improvement of the ‘most preferred’ design selected at the concept design stage, which represents a compromise of many different requirements. Clearly, if this attempt is successful, substantial cost savings will be realized. Such optimization studies may provide the competitive edge in product design.

In the case of product development, a new original design may be represented by its model. Design alternatives can be generated by manipulating the values of the design variables. Also, changes in design parameters can show the effect of environmental changes on a particular design. The objective criterion will help select the best of all preferred alternatives. Consequently, a preliminary design is developed. How good it is depends on the model used. Many details must be left out because of modelling difficulties. But with accumulated experience, reliable elaborate models can be constructed and design costs may be drastically reduced.

In the case of product enhancement, an existing design can be described by a model. At this design stage engineering designers should not be interested in drastic design changes that might result from a full-scale optimization study, but in relatively small design changes that might improve the performance of the product. In such circumstances, the model can be used to predict the effect of the changes. Design cost and cycle time will be reduced. Sometimes this type of model use is called a sensitivity study, to be distinguished from a complete optimization study.

5.3.3 Graphical Optimization

When the design problem can be formulated in terms of only two design variables, graphical methods can be profitably used to solve the problem and gain understanding of the nature of the design space. As visualizing the design space is a powerful tool for understanding the trade-offs associated with a design problem, graphical methods are often used even when the number of design variables is greater than two. In that case, one looks at special forms of the design problem with some of the design variables frozen and two allowed to vary.

The design space could be some part of the earth’s surface that would represent the objective function. Mountain peaks would be maxima, and valley bottoms would be minima. An equality constraint would be a road one must stay on. An inequality constraint could be a barrier with a no-trespassing sign. In fact, some optimization jargon comes from topography. Much can be gained by this visualization and used to describe features of the design space. One should keep in mind, however, that certain unexpected complexities may arise in problems with dimensions higher than two, which may not be immediately evident from the three-dimensional image.


Interior and boundary optima

A problem such as

minimize f = f(x)
subject to g1(x) ≤ 0
           g2(x) ≤ 0          (5.4)

can be represented by a two-dimensional picture, as in Figure 5.4.

Figure 5.4. One–dimensional representation

If the functions behave as shown in the figure, the problem is restated simply as

minimize f = f(x)
subject to xL ≤ x ≤ xU          (5.5)

The function f(x) has a unique minimum x∗, an interior minimum, lying well within the range [xL, xU]. The point x∗ may also be called an unconstrained minimum, in the sense that the constraints do not influence its location, that is, g1 and g2 are both inactive. It is possible, however, that problem (5.4) may result in any of the three situations shown in Figure 5.5. Therefore, if x∗ is the minimum of the unconstrained function f(x), the solution to problem (5.4) is generally given by the middle element of the set (xL, x∗, xU) ranked in increasing order of magnitude, with x∗ being the unconstrained optimum. In cases (b) and (c), where x∗ = xL and x∗ = xU respectively, the optima are boundary optima because they occur at the boundary of the feasible region.
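For a unimodal f(x) this bounding rule amounts to clipping the unconstrained minimizer to the feasible interval; a tiny illustrative sketch (the numerical values are hypothetical):

```python
# Sketch of the bounding rule just described: for a single-variable problem with a
# unimodal objective, the constrained minimizer is the middle element of
# {xL, x_hat, xU}, where x_hat is the unconstrained minimizer.

def constrained_minimizer(x_hat, x_low, x_up):
    # median of the three values = clip x_hat to the feasible interval
    return min(max(x_hat, x_low), x_up)

print(constrained_minimizer(2.0, 0.0, 5.0))   # interior optimum: 2.0
print(constrained_minimizer(-1.0, 0.0, 5.0))  # boundary optimum at xL: 0.0
print(constrained_minimizer(7.0, 0.0, 5.0))   # boundary optimum at xU: 5.0
```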


Figure 5.5. Possible bounding of minimum

In two-dimensional problems the situation becomes more complicated. A function f(x1,x2) is represented by a surface, so the feasible domain would be defined by the intersection of surfaces. It is obviously difficult to draw such pictures, so a representation using the orthogonal projection common in engineering design drawings may be more helpful. Figure 5.6 shows a map of the dependence of the vertical acceleration in a cabin of a ro-ro/pax vessel on two hull form geometrical variables. The regions where the acceleration values are higher are dark, while the regions where the acceleration values are lower are light. The curved lines separating bands of shading are called function contours. Along these lines the vertical acceleration is constant, so that they are akin to isotherms or isobars on a weather map.


[Figure 5.6 shows contours of av(rms) in sea state 6, obtained by a weighted least-squares fit, plotted against the hull form parameters L/∇1/3 and L1/2.]

Figure 5.6. Vertical accelerations for a family of ro-ro vessels

5.4 Classical Optimization Techniques

The classical optimization techniques are useful in finding the optimum of continuous and differentiable functions. These methods are analytical and make use of the techniques of differential calculus in locating the optimum points. Since some practical problems involve objective functions that are not continuous and/or differentiable, the classical optimization techniques have limited scope in practical applications. However, a study of the calculus methods of optimization forms a basis for developing most of the numerical techniques of optimization.

What follows presents the necessary and sufficient conditions for locating the optimum of a single-variable function, a multivariable function without constraints, and a multivariable function with equality and inequality constraints. The application of differential calculus will be considered in the unconstrained optimization of single and multivariable functions. The methods of direct substitution, constrained variation and Lagrange multipliers will be discussed for the minimization of a function of several variables subject to equality constraints. The application of the Kuhn–Tucker necessary conditions to the solution of a general nonlinear optimization problem with inequality constraints is illustrated. The convex programming problem is also defined, for which the Kuhn–Tucker conditions are both necessary and sufficient.


5.4.1 Single Variable Optimization

It occurs rather rarely in practice that the optimum value of a function of just one variable is required. However, several numerical techniques for nonlinear programming make use of single-variable optimization as part of their computation strategy.

A function of one variable f(x) is said to have a relative or local minimum at x = x∗ if f(x∗) ≤ f(x∗+h) for all sufficiently small positive and negative values of h. Similarly, a point x∗ is called a relative or local maximum if f(x∗) ≥ f(x∗+h) for all values of h sufficiently close to zero.

A function f(x) is said to have a global or absolute minimum at x∗ if f(x∗) ≤ f(x) for all x in the domain over which f(x) is defined. Similarly, a point x∗ will be a global maximum of f(x) if f(x∗) ≥ f(x) for all x in the domain. Figure 5.7 shows the difference between the relative and the global optimum points.

Figure 5.7. Relative and global minima

A single-variable optimization problem is one in which the value x = x∗ is to be found in the interval [a,b] such that x∗ minimizes f(x). The following two theorems provide the necessary and sufficient conditions for the relative minimum of a function of a single variable.

Necessary Condition (theorem 1). If a function f(x) is defined in the interval a ≤ x ≤ b and has a relative minimum at x = x∗, where a < x∗ < b, and if the derivative df(x)/dx = f′(x) exists as a finite number at x = x∗, then f′(x∗) = 0.

Discussion

1. This theorem can be proved even if x∗ is a relative maximum.

2. The theorem does not say what happens if a minimum or maximum occurs at a point x∗ where the derivative fails to exist (Fig. 5.8). If f′(x∗) does not exist, the theorem is not applicable.

3. The theorem does not say what happens if a minimum or maximum occurs at an end point of the interval of definition of the function.


4. The theorem does not say that the function will necessarily have a minimum or maximum at every point where the derivative is zero; it may happen that such a point is neither a minimum nor a maximum. In general, a point x∗ at which f′(x∗) = 0 is called a stationary point.

Figure 5.8. Derivative undefined at x∗

If the function f(x) possesses continuous derivatives of every order in the neighborhood of x = x∗, the following theorem provides the sufficient condition for the minimum or maximum value of the function.

Sufficient Condition (theorem 2). Let f′(x∗) = f′′(x∗) = ... = f(n−1)(x∗) = 0, but f(n)(x∗) ≠ 0. Then f(x∗) is (i) a minimum value of f(x) if f(n)(x∗) > 0 and n is even, (ii) a maximum value of f(x) if f(n)(x∗) < 0 and n is even, (iii) neither a maximum nor a minimum if n is odd.

In the latter case the point x∗ is called an inflection point .
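The two theorems can be applied mechanically with a computer algebra system. The following sketch uses SymPy on a hypothetical function (not an example from the text):

```python
# Illustrative check of theorems 1 and 2 with SymPy on a hypothetical function.
import sympy as sp

x = sp.symbols('x')
f = x**4 - 4*x**3                         # hypothetical single-variable function

stationary = sp.solve(sp.diff(f, x), x)   # points where f'(x) = 0
for xs in stationary:
    n = 1
    d = sp.diff(f, x, n)
    # find the order n of the first non-vanishing derivative at xs
    while sp.simplify(d.subs(x, xs)) == 0:
        n += 1
        d = sp.diff(f, x, n)
    value = d.subs(x, xs)
    if n % 2 == 0:
        kind = 'minimum' if value > 0 else 'maximum'
    else:
        kind = 'inflection point'
    print(xs, n, kind)   # x = 0 is an inflection point (n = 3), x = 3 a minimum (n = 2)
```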

5.4.2 Multivariable Optimization without Constraints

To discuss the necessary and sufficient conditions for the minimum or maximum of a function of multiple variables without any constraints, it is necessary to formulate the Taylor series expansion of a multivariable function, which in turn requires the definition of the rth differential of that function.

Definition of the rth differential of a multivariable function

If all partial derivatives of the function f through order r ≥ 1 exist and are continuous at a point x∗, then the polynomial

d^r f(x∗) = Σ_{i=1}^{n} Σ_{j=1}^{n} ··· Σ_{k=1}^{n} hi hj ··· hk ∂^r f(x∗)/(∂xi ∂xj ··· ∂xk)          (5.6)

is called the rth differential of f at x∗. Notice that there are r summations and one hi is associated with each summation in equation (5.6).


For example, when r = 2 and n = 3, one has

d²f(x∗) = d²f(x1∗,x2∗,x3∗) = Σ_{i=1}^{3} Σ_{j=1}^{3} hi hj ∂²f(x∗)/(∂xi ∂xj)

        = h1² ∂²f(x∗)/∂x1² + h2² ∂²f(x∗)/∂x2² + h3² ∂²f(x∗)/∂x3²
          + 2 h1 h2 ∂²f(x∗)/(∂x1 ∂x2) + 2 h1 h3 ∂²f(x∗)/(∂x1 ∂x3) + 2 h2 h3 ∂²f(x∗)/(∂x2 ∂x3)

Taylor’s series expansion

The Taylor series expansion of a multivariable function f(x) about a point x∗ is a multiple series expansion given by

f(x) = f(x∗) + df(x∗) + (1/2!) d²f(x∗) + (1/3!) d³f(x∗) + ... + (1/n!) d^n f(x∗) + Rn(x∗,h)          (5.7)

where the last term, called the remainder, is given by

Rn(x∗,h) = [1/(n+1)!] d^(n+1) f(x∗ + θ h)

where 0 < θ < 1 and h = x − x∗.

Necessary Condition (theorem 3). If f(x) has an extreme point (maximum or minimum) at x = x∗, and if the first partial derivatives of f(x) exist at x∗, then

∂f/∂x1 (x∗) = ∂f/∂x2 (x∗) = ... = ∂f/∂xn (x∗) = 0          (5.8)

Sufficient Condition (theorem 4). A sufficient condition for a stationary point x∗ to be an extreme point is that the matrix of second partial derivatives (Hessian matrix) of f(x) evaluated at x∗ is (i) positive definite when x∗ is a minimum point, and (ii) negative definite when x∗ is a maximum point.

Note: A matrix A is positive definite if all its eigenvalues are positive, i.e. all the values of λ which satisfy the determinant equation

| A − λ I | = 0

are positive.
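In practice the definiteness test of theorem 4 is carried out numerically on the eigenvalues, as in the following sketch with a hypothetical Hessian:

```python
# Numerical illustration of theorem 4 and the note above: test the definiteness of a
# Hessian through its eigenvalues. The Hessian below is a hypothetical example.
import numpy as np

H = np.array([[4.0, 1.0],
              [1.0, 3.0]])           # Hessian of a hypothetical f at a stationary point

eigenvalues = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix
print(eigenvalues)

if np.all(eigenvalues > 0):
    print("H is positive definite -> the stationary point is a minimum")
elif np.all(eigenvalues < 0):
    print("H is negative definite -> the stationary point is a maximum")
else:
    print("H is indefinite or semidefinite -> further examination needed")
```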


Saddle Point

In the case of a function of two variables, f(x,y), the Hessian matrix may be neither positive nor negative definite at a point (x∗,y∗) at which

∂f/∂x = ∂f/∂y = 0

In such a case, the point (x∗,y∗) is called a saddle point. The characteristic of a saddle point is that it corresponds to a relative minimum or maximum of f(x,y) with respect to one variable, say x (the other variable being fixed at y = y∗), and a relative maximum or minimum of f(x,y) with respect to the second variable y (the other variable being fixed at x = x∗).

As an example, consider the function f(x,y) = x² − y². For this function,

∂f/∂x = 2x   and   ∂f/∂y = −2y

These first derivatives are zero at x∗ = 0 and y∗ = 0. Since the Hessian matrix of f at (x∗,y∗) is neither positive definite nor negative definite, the point (x∗ = 0, y∗ = 0) is a saddle point. The function is shown in Figure 5.9. It can be seen that f(x,y∗) = f(x,0) has a relative minimum and f(x∗,y) = f(0,y) has a relative maximum at the saddle point (x∗,y∗).

Figure 5.9. Saddle point of the function f(x,y) = x2 − y2

Saddle points may exist for functions of more than two variables too. The characteristic of the saddle point stated above still holds, provided that x and y are interpreted as vectors in multidimensional cases.

The saddle point may be particularly tricky to rule out because it appears to be a minimum if one approaches it from only certain directions, yet both ascending and descending directions lead away from it. All points at which the gradient is zero are collectively called stationary points.
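A quick numerical look at the example above (illustrative only) confirms the saddle behaviour of f(x,y) = x² − y²:

```python
# Along y = 0 the function has a minimum at x = 0, along x = 0 it has a maximum,
# and the Hessian has eigenvalues of opposite sign.
import numpy as np

f = lambda x, y: x**2 - y**2
t = np.linspace(-1.0, 1.0, 5)

print([f(xi, 0.0) for xi in t])   # [1, 0.25, 0, 0.25, 1]    -> minimum along x
print([f(0.0, yi) for yi in t])   # [-1, -0.25, 0, -0.25, -1] -> maximum along y

H = np.array([[2.0, 0.0],
              [0.0, -2.0]])        # Hessian of f (constant)
print(np.linalg.eigvalsh(H))       # one positive and one negative eigenvalue
```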


5.4.3 Multivariable Optimization with Equality Constraints

The optimization of continuous functions subject to equality constraints can be stated as

minimize f = f(x)
subject to gj(x) = 0 ,   j = 1,2,...,m          (5.9)

where m is less than or equal to the number of variables n; otherwise (if m > n) the problem becomes overdefined and, in general, there will be no solution. There are several methods available for the solution of this problem. The methods of direct substitution, constrained variation and Lagrange multipliers are discussed below.

Method of Direct Substitution

In the unconstrained problem there are n independent variables and the objective function can be evaluated for any set of n numbers. However, in the constrained problem at least one independent variable loses its arbitrariness with the addition of each equality constraint. Thus a problem with m constraints in n variables will have only (n − m) independent variables. If the values of any set of (n − m) variables are selected, the values of the remaining variables are determined by the m equality constraints.

Thus it is theoretically possible to solve simultaneously the m equality constraints and express m of the variables in terms of the remaining (n − m) variables. When these expressions are substituted into the original objective function, there results a new objective function involving only (n − m) variables. The new objective function is not subject to any constraint, and hence its optimum can be found by using the unconstrained optimization techniques discussed above.

This method of direct substitution, although it appears simple in theory, is not convenient from a practical point of view. The reason is that the constraint equations are nonlinear for most practical problems, and it often becomes impossible to solve them and express any m variables in terms of the remaining (n − m) variables. However, the method of direct substitution can prove to be very simple and direct for solving simple problems.

Method of Constrained Variation

The basic idea used in the method of constrained variation is to find a closed-form expression for the first-order differential of the objective function f at all points at which the constraints gj(x) = 0, j = 1,2,...,m, are satisfied. The desired optimum points are then obtained by setting the differential df equal to zero.

Simple Problem

Before presenting the general method, its salient features will be indicated through the following simple problem with n = 2 and m = 1.


Consider the problem

minimize f(x1,x2)
subject to g(x1,x2) = 0          (5.10)

Let the constraint equation g(x1,x2) = 0 be solved to obtain x2 as

x2 = h (x1) (5.11)

By substituting equation (5.11), the objective function becomes a function of only one variable, f = f [x1, h(x1)]. A necessary condition for f to have a minimum at some point (x1∗,x2∗) is that the total derivative of f(x1,x2) with respect to x1 be zero at (x1∗,x2∗). The total differential of f(x1,x2) may be written as

df(x1,x2) = ∂f/∂x1 dx1 + ∂f/∂x2 dx2

and the total derivative with respect to x1 as

df(x1,x2)/dx1 = ∂f(x1,x2)/∂x1 + ∂f(x1,x2)/∂x2 · dx2/dx1

When this is equated to zero, the following relation is obtained

df = ∂f/∂x1 dx1 + ∂f/∂x2 dx2 = 0          (5.12)

Since g(x1∗,x2∗) = 0 at the minimum point, any variations dx1 and dx2 taken about the point (x1∗,x2∗) are called admissible variations provided they satisfy the relation

g (x∗1 + dx1,x∗2 + dx2) = 0 (5.13)

The Taylor series expansion of the function in equation (5.13) about the point (x1∗,x2∗) gives

g(x1∗ + dx1, x2∗ + dx2) ≃ g(x1∗,x2∗) + ∂g(x1∗,x2∗)/∂x1 dx1 + ∂g(x1∗,x2∗)/∂x2 dx2 = 0          (5.14)

where dx1 and dx2 are assumed to be small.

Since g(x1∗,x2∗) = 0, equation (5.14) reduces to

dg = [ ∂g/∂x1 dx1 + ∂g/∂x2 dx2 ]_(x1∗,x2∗) = 0          (5.15)

Thus equation (5.15) has to be satisfied by all admissible variations, as shown in Figure 5.10, where PQ denotes the constraint curve, at each point of which the second of equations (5.10) is satisfied. If A is the base point (x1∗,x2∗), the variations in x1 and x2 leading to the points B and C are called admissible variations. On the other hand, the variations in x1 and x2 representing the point D are not admissible, since the point D does not lie on the constraint curve.


Figure 5.10. Variations about the base point

Thus any set of variations (dx1,dx2) that does not satisfy equation (5.15) leads to points like D, which do not satisfy the constraint equation (5.10).

Assuming that ∂g/∂x2 ≠ 0, equation (5.15) can be rewritten as

dx2 = − [ (∂g/∂x1) / (∂g/∂x2) ]_(x1∗,x2∗) dx1          (5.16)

This relation indicates that, once the variation dx1 in x1 is chosen arbitrarily, the variation dx2 in x2 is automatically decided in order to have dx1 and dx2 as a set of admissible variations. By substituting equation (5.16) in equation (5.12), one obtains

df = [ ∂f/∂x1 − (∂g/∂x1)/(∂g/∂x2) ∂f/∂x2 ]_(x1∗,x2∗) dx1 = 0          (5.17)

Note that equation (5.17) has to be satisfied for all values of dx1. Since dx1 can be chosen arbitrarily, equation (5.17) leads to

( ∂f/∂x1 ∂g/∂x2 − ∂f/∂x2 ∂g/∂x1 )_(x1∗,x2∗) = 0          (5.18)

Equation (5.18) gives a necessary condition in order to have (x1∗,x2∗) as an extreme point (minimum or maximum).

General problem: necessary conditions

The procedure indicated above can be generalized to the case of a problem in n variables with m constraints. In this case, each constraint equation gj(x) = 0, j = 1,2,...,m, gives rise to a linear equation in the variations dxi, i = 1,2,...,n. In all there will be m linear equations in n variations. Hence any m variations can be expressed in terms of the remaining (n − m) variations. These expressions can be used to express the differentiated objective function df in terms of the (n − m) independent variations. By letting the coefficients of the independent variations vanish in the equation df = 0, one obtains the necessary conditions for the constrained optimum of the given function. The equations involved in this procedure are given below in detail.


The differential of the objective function is given by

df = ∂f/∂x1 (x∗) dx1 + ∂f/∂x2 (x∗) dx2 + ... + ∂f/∂xn (x∗) dxn          (5.19)

where x∗ represents the extreme point and (dx1,dx2,...,dxn) indicates the set of admissible infinitesimal variations about the point x∗.

The following holds

gj (x∗) = 0 , j = 1,2, . . . ,m (5.20)

since the given constraints are satisfied at the extreme point x∗, and

gj(x∗ + dx) ≃ gj(x∗) + Σ_{i=1}^{n} ∂gj/∂xi (x∗) dxi = 0 ,   j = 1,2,...,m          (5.21)

since dx = {dx1,dx2,...,dxn} is a vector of admissible variations.

Equations (5.20) and (5.21) lead to

∂g1/∂x1 dx1 + ∂g1/∂x2 dx2 + ... + ∂g1/∂xn dxn = 0
∂g2/∂x1 dx1 + ∂g2/∂x2 dx2 + ... + ∂g2/∂xn dxn = 0
  ...
∂gm/∂x1 dx1 + ∂gm/∂x2 dx2 + ... + ∂gm/∂xn dxn = 0          (5.22)

where all the partial derivatives are assumed to have been evaluated at the extreme point x∗. Any set of variations dxi, i = 1,2,...,n, not satisfying equations (5.22) will not be of interest here, since they do not satisfy the imposed constraints. Equations (5.22) can be solved to express any m variations, say the first m, in terms of the remaining variations. For this, equations (5.22) are rewritten as

∂g1/∂x1 dx1 + ∂g1/∂x2 dx2 + ... + ∂g1/∂xm dxm = − ∂g1/∂xm+1 dxm+1 − ∂g1/∂xm+2 dxm+2 − ... − ∂g1/∂xn dxn = h1

∂g2/∂x1 dx1 + ∂g2/∂x2 dx2 + ... + ∂g2/∂xm dxm = − ∂g2/∂xm+1 dxm+1 − ∂g2/∂xm+2 dxm+2 − ... − ∂g2/∂xn dxn = h2

  ...

∂gm/∂x1 dx1 + ∂gm/∂x2 dx2 + ... + ∂gm/∂xm dxm = − ∂gm/∂xm+1 dxm+1 − ∂gm/∂xm+2 dxm+2 − ... − ∂gm/∂xn dxn = hm          (5.23)


where the terms containing the independent variations dxm+1, dxm+2, ..., dxn are placed on the right-hand side. Thus, for any arbitrarily chosen values of dxm+1, dxm+2, ..., dxn, the values of the dependent variations are given by equations (5.23), which can be solved using Cramer's rule.
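Numerically, equations (5.23) are just an m×m linear system for the dependent variations once the independent ones are chosen. The sketch below uses a hypothetical constraint Jacobian (the numbers are made up for illustration):

```python
# Sketch of equations (5.23): with the independent variations chosen, the dependent
# variations follow from an m x m linear system (Cramer's rule or any linear solver).
import numpy as np

m, n = 2, 4                           # two constraints, four variables
J = np.array([[1.0, 2.0, 0.5, -1.0],  # rows: dg_j/dx_i at the extreme point (made-up)
              [0.0, 1.0, 1.0,  2.0]])

dx_indep = np.array([0.01, -0.02])    # arbitrarily chosen dx3, dx4

# right-hand sides h_j = - sum over independent variables of (dg_j/dx_i) dx_i
h = -J[:, m:] @ dx_indep
# dependent variations dx1, dx2 from the m x m subsystem
dx_dep = np.linalg.solve(J[:, :m], h)

print(dx_dep)
# check: every constraint differential dg_j vanishes for the combined variation
print(J @ np.concatenate([dx_dep, dx_indep]))   # ~ [0, 0]
```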

General Problem: sufficient conditions

By eliminating the first m variables, using the m equality constraints (which is at least theoretically possible), the objective function f can be made to depend only on the remaining variables xm+1, xm+2, ..., xn. Then the Taylor series expansion of f, in terms of these variables, about the extreme point x∗ gives

f(x∗ + dx) ≃ f(x∗) + Σ_{i=m+1}^{n} (∂f/∂xi)_g dxi + (1/2!) Σ_{i=m+1}^{n} Σ_{j=m+1}^{n} (∂²f/(∂xi ∂xj))_g dxi dxj          (5.24)

where (∂f/∂xi)_g denotes the partial derivative of f with respect to xi, holding all the other variables xm+1, xm+2, ..., xi−1, xi+1, ..., xn constant, while x1, x2, ..., xm are allowed to change so that the constraints gj(x∗ + dx) = 0, j = 1,2,...,m, are satisfied; the second derivative (∂²f/(∂xi ∂xj))_g has a similar meaning.

Method of Lagrange Multipliers

In the method of direct substitution, m variables were eliminated from the objective function with the help of the m equality constraints. In the method of constrained variation, m variations were eliminated from the differential of the objective function. Thus both methods were based on the principle of eliminating m variables by making use of the constraints and then solving the problem in terms of the remaining (n − m) decision variables.

In the Lagrange multiplier method, on the contrary, one additional variable is introduced into the problem for each constraint. Thus, if the original problem has n variables and m equality constraints, m additional variables are added so that the final number of unknowns becomes (n + m). Of course, there are some simplifications afforded by the addition of the new variables. The basic features of the method will initially be given for a simple problem of two variables with one constraint. The extension of the method to a general problem of n variables with m constraints follows.

Simple Problem

Consider the optimization problem

minimize f(x1,x2)
subject to g(x1,x2) = 0          (5.25)

which was examined in discussing the method of constrained variation, where the necessary condition for the existence of an extreme point at x = x∗ was found (see equation (5.18)) to be


( ∂f/∂x1 − (∂f/∂x2)/(∂g/∂x2) · ∂g/∂x1 )_(x1∗,x2∗) = 0          (5.26)

where all quantities are evaluated at (x1∗,x2∗).

By defining a quantity λ, called the Lagrange multiplier, as

λ = − ( (∂f/∂x2)/(∂g/∂x2) )_(x1∗,x2∗)          (5.27)

equation (5.26) can be rewritten as

( ∂f/∂x1 + λ ∂g/∂x1 )_(x1∗,x2∗) = 0          (5.28)

whereas equation (5.27) can be rewritten with some rearrangement as

( ∂f/∂x2 + λ ∂g/∂x2 )_(x1∗,x2∗) = 0          (5.29)

In addition, the constraint equation has to be satisfied at the extreme point, i.e.

g(x1,x2) |_(x1∗,x2∗) = 0          (5.30)

Thus equations (5.28) through (5.30) represent the necessary conditions for the point (x1∗,x2∗) to be an extreme point.

Notice that the partial derivative (∂g/∂x2)|_(x1∗,x2∗) has to be nonzero in order to be able to define λ by equation (5.27). This is because the variation dx2 was expressed in terms of dx1 by equation (5.16) in the derivation of equation (5.26). On the other hand, if one chooses to express dx1 in terms of dx2, one obtains the requirement that (∂g/∂x1)|_(x1∗,x2∗) be nonzero to define λ. Thus the derivation of the necessary conditions by the method of Lagrange multipliers requires only that at least one of the partial derivatives of g(x1,x2) be nonzero at an extreme point.

The necessary conditions given by equations (5.28) to (5.30) can also be generated by constructing a function L, known as the Lagrange function, as

L(x1,x2,λ) = f(x1,x2) + λ·g (x1,x2) (5.31)

If the partial derivatives of the Lagrange function L(x1,x2,λ) with respect to each of its arguments are set equal to zero, the necessary conditions given by equations (5.28) through (5.30) are obtained as


∂L/∂x1 (x1,x2,λ) = ∂f/∂x1 (x1,x2) + λ ∂g/∂x1 (x1,x2) = 0

∂L/∂x2 (x1,x2,λ) = ∂f/∂x2 (x1,x2) + λ ∂g/∂x2 (x1,x2) = 0

∂L/∂λ (x1,x2,λ) = g(x1,x2) = 0          (5.32)

which are to be satisfied at an extreme point (x1∗,x2∗). The sufficient conditions to be satisfied will be given later.
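Equations (5.32) can be solved symbolically. The following SymPy sketch applies them to a hypothetical problem (minimize x1² + x2² subject to x1 + x2 − 2 = 0), which is not an example from the text:

```python
# Illustrative solution of equations (5.32) with SymPy for a hypothetical problem.
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda')
f = x1**2 + x2**2
g = x1 + x2 - 2
L = f + lam * g                      # Lagrange function, equation (5.31)

stationarity = [sp.diff(L, v) for v in (x1, x2, lam)]   # equations (5.32)
solution = sp.solve(stationarity, (x1, x2, lam), dict=True)
print(solution)    # [{x1: 1, x2: 1, lambda: -2}]
```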

General Problem

The equations derived above can be extended to the case of a general problem with n variables and m equality constraints. The result can be stated in the form of a theorem as follows.

Necessary condition (theorem 5). A necessary condition for a function f(x) subject to the constraints gj(x) = 0, j = 1,2,...,m, to have a relative minimum at a point x∗ is that the first partial derivatives of the Lagrange function L = L(x1,x2,...,xn; λ1,λ2,...,λm) with respect to each of its arguments be zero.

The following theorem gives the sufficient condition for f(x) to have a constrained relative minimum at x∗.

Sufficient condition (theorem 6). A sufficient condition for f(x) to have a relative minimum at x∗ is that the quadratic form Q, defined by

Q = Σ_{i=1}^{n} Σ_{j=1}^{n} ∂²L/(∂xi ∂xj) dxi dxj          (5.33)

evaluated at x = x∗, be positive definite for all values of dxi for which the constraints are satisfied.

Discussion

1. If the quadratic form Q = Σ_{i=1}^{n} Σ_{j=1}^{n} [∂²L(x∗,λ)/(∂xi ∂xj)] dxi dxj at an extreme point of f(x) is negative for all choices of the admissible variations dxi, x∗ will be a constrained maximum of f(x).

2. It has been shown by Hancock (1960) that a necessary condition for the quadratic form Q to be positive (negative) definite for all admissible variations dx is that each root zi of the polynomial defined by the following determinant equation be positive (negative)


| L11 − z    L12        L13    ...    L1n        g11    g21    ...    gm1 |
| L21        L22 − z    L23    ...    L2n        g12    g22    ...    gm2 |
|   ...                                                                   |
| Ln1        Ln2        Ln3    ...    Lnn − z    g1n    g2n    ...    gmn |
| g11        g12        g13    ...    g1n        0      0      ...    0   |
| g21        g22        g23    ...    g2n        0      0      ...    0   |
|   ...                                                                   |
| gm1        gm2        gm3    ...    gmn        0      0      ...    0   |  = 0          (5.34)

where

Lij = ( ∂²L/(∂xi ∂xj) )_(x∗, λ∗)          (5.35)

and

gij = ( ∂gi/∂xj )_(x∗)          (5.36)

3. Equation (5.34), on expansion, leads to an (n − m)th order polynomial in z. If some of the roots of this polynomial are positive while the others are negative, the point x∗ is not an extreme point.

5.4.4 Multivariable Optimization with Inequality Constraints

In constrained optimization problems with inequality constraints,

minimize f(x)
subject to gj(x) ≤ 0 ,   j = 1,2,...,m          (5.37)

the latter can be transformed into equality constraints by adding non-negative slack variables yj, as

gj(x) + yj² = 0 ,   j = 1,2,...,m          (5.38)

where the values of the slack variables are yet unknown.

The problem is now in a form suitable for the application of one of the methods discussed in the foregoing, that is

minimize f(x)
subject to Gj(x,y) = gj(x) + yj² = 0 ,   j = 1,2,...,m          (5.39)

where y = {y1,y2, . . . ,ym} is the vector of the slack variables.


This problem can be conveniently solved by the method of Lagrange multipliers. For this, the Lagrange function L is constructed as

L(x,y,λ) = f(x) + Σ_{j=1}^{m} λj Gj(x,y)          (5.40)

where λ = {λ1,λ2,...,λm} is the vector of Lagrange multipliers.

The stationary points of the Lagrange function can be found by solving the following equations (necessary conditions)

∂L/∂xi (x,y,λ) = ∂f/∂xi (x) + Σ_{j=1}^{m} λj ∂gj/∂xi (x) = 0 ,   i = 1,2,...,n          (5.41)

∂L/∂λj (x,y,λ) = Gj(x,y) = gj(x) + yj² = 0 ,   j = 1,2,...,m          (5.42)

∂L/∂yj (x,y,λ) = 2 λj yj = 0 ,   j = 1,2,...,m          (5.43)

Equations (5.42) ensure that the constraints gj(x) ≤ 0, j = 1,2,...,m, are satisfied, while equations (5.43) imply that either λj = 0 or yj = 0. If λj = 0, the constraint is inactive² and hence can be ignored. On the other hand, if yj = 0, the constraint is active (gj = 0) at the optimum point. Consider the division of the constraints into two subsets J1 and J2, which together represent the total set of constraints. Let the set J1 indicate the indices of those constraints which are active at the optimum point and let J2 include the indices of all the inactive constraints.

Thus for j ∈ J1, yj = 0 (active constraints), whereas for j ∈ J2, λj = 0 (inactive constraints), and equation (5.41) can be simplified as

∂f/∂xi + Σ_{j∈J1} λj ∂gj/∂xi = 0 ,   i = 1,2,...,n          (5.44)

Similarly, constraint equations (5.42) can be written as

gj(x) = 0 ,   j ∈ J1          (5.45)

gj(x) + yj² = 0 ,   j ∈ J2          (5.46)

Equations (5.44) through (5.46) represent [n + p + (m − p)] = (n + m) equations in the (n + m) unknowns xi (i = 1,2,...,n), λj (j ∈ J1) and yj (j ∈ J2), where p denotes the number of active constraints.

² Those constraints which are satisfied with the equality sign, gj = 0, at the optimum point are called active constraints, while those satisfied with strict inequality, gj < 0, are termed inactive constraints.


Assuming that the first p constraints are active, equations (5.44) can be expressed as

− ∂f/∂xi = λ1 ∂g1/∂xi + λ2 ∂g2/∂xi + ... + λp ∂gp/∂xi ,   i = 1,2,...,n          (5.47)

or written collectively as

−∇f = λ1 ·∇g1 + λ2 ·∇g2 + . . . + λp ·∇gp (5.48)

where ∇f and ∇gj are the gradients of the objective function and of the jth constraint, given respectively by

∇f = {∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn}   and   ∇gj = {∂gj/∂x1, ∂gj/∂x2, ..., ∂gj/∂xn}

Thus the negative of the gradient of the objective function can be expressed as a linear combination of the gradients of the active constraints at the optimum point. Further, it can be shown that, in the case of a minimization problem, the λj's (j ∈ J1) must be positive.

For simplicity of illustration, suppose that only two constraints are active (p = 2) at the optimum point. Then equation (5.48) reduces to

−∇f = λ1 ·∇g1 + λ2 ·∇g2 (5.49)

Let S be a feasible direction 3 at the optimum point.

By pre-multiplying both sides of equation (5.49) by Sᵀ, the following equation is obtained

−Sᵀ·∇f = λ1 Sᵀ·∇g1 + λ2 Sᵀ·∇g2          (5.50)

Since S is a feasible direction, it should satisfy the relations

Sᵀ·∇g1 < 0
Sᵀ·∇g2 < 0          (5.51)

³ A vector S is called a feasible direction from a point x if at least a small step can be taken along it without immediately leaving the feasible region. Thus, for problems with sufficiently smooth constraint surfaces, a vector S satisfying the relation

Sᵀ·∇gj < 0

can be called a feasible direction. On the other hand, if the constraint is either linear or concave, as shown in Figures 5.11(b) and 5.11(c), any vector satisfying Sᵀ·∇gj ≤ 0 can be called a feasible direction. The geometric interpretation of a feasible direction is that the vector S makes an obtuse angle with all the constraint normals, except that for linear or outward-curving (concave) constraints the angle may go to 90° at the optimum point.


Thus, if λ1 > 0 and λ2 > 0, the quantity Sᵀ·∇f can be seen to be always positive. As ∇f indicates the gradient direction, along which the value of the function increases at the maximum rate, Sᵀ·∇f represents the component of the increment of f along the direction S. If Sᵀ·∇f > 0, the function value increases as one moves along the direction S. Hence, if λ1 and λ2 are positive, one will not be able to find any direction in the feasible domain along which the function value can be further decreased. Since the point at which equation (5.51) is valid is assumed to be the optimum, λ1 and λ2 have to be positive. This reasoning can be extended to cases where more than two constraints are active. By proceeding in a similar manner, one can show that the λj's have to be negative for a maximization problem.

Figure 5.11. Feasible direction S

Kuhn–Tucker Conditions

When the set of active constraints is known, the conditions to be satisfied at a constrained minimum point x∗ for the problem stated in equation (5.37) can be expressed as

∂f/∂xi + Σ_{j∈J1} λj ∂gj/∂xi = 0 ,   i = 1,2,...,n

λj > 0 ,   j ∈ J1          (5.52)

where ∂f/∂xi and ∂gj/∂xi are the components of the gradients of f and gj with respect to x. These conditions are called the Kuhn–Tucker conditions (1951), after the mathematicians who developed them.

They are the necessary conditions to be satisfied at a relative constrained minimum of f(x). These conditions are, in general, not sufficient to ensure a relative minimum. However, there is a class of problems, called convex programming problems, for which the Kuhn–Tucker conditions are necessary and sufficient for a global minimum.


If the set of active constraints is not known, the Kuhn–Tucker conditions can be stated as follows

∂f/∂xi + Σ_{j=1}^{m} λj ∂gj/∂xi = 0 ,   i = 1,2,...,n

λj gj = 0 ,   j = 1,2,...,m

gj ≤ 0 ,   j = 1,2,...,m

λj ≥ 0 ,   j = 1,2,...,m          (5.53)

Note that if the problem is one of maximization, or if the constraints are of the type gj ≥ 0, then the λj have to be non-positive in equations (5.53). On the other hand, if the problem is one of maximization with constraints in the form gj ≥ 0, then the λj have to be non-negative in equations (5.53).
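Conditions (5.53) can be verified numerically at a candidate point. The following sketch uses a hypothetical problem (not taken from the text):

```python
# Numerical check of the Kuhn-Tucker conditions (5.53) for a hypothetical problem:
# minimize f = (x1 - 2)**2 + (x2 - 1)**2
# subject to g1 = x1 + x2 - 2 <= 0 and g2 = -x1 <= 0.
import numpy as np

x = np.array([1.5, 0.5])            # candidate optimum (constraint g1 active)
lam = np.array([1.0, 0.0])          # candidate multipliers

grad_f = np.array([2 * (x[0] - 2), 2 * (x[1] - 1)])          # [-1, -1]
grad_g = np.array([[1.0, 1.0],                               # grad g1
                   [-1.0, 0.0]])                             # grad g2
g = np.array([x[0] + x[1] - 2.0, -x[0]])                     # constraint values

stationarity = grad_f + grad_g.T @ lam       # should be ~ [0, 0]
print(stationarity)
print(lam * g)                               # complementary slackness, ~ [0, 0]
print(np.all(g <= 1e-12), np.all(lam >= 0))  # feasibility and multiplier signs
```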

Convex Programming

Any optimization problem stated in the form of equation (5.37) is called a convex programming problem, provided the objective function f(x) and the constraint functions gj(x) are general (smooth) convex functions. The definitions and properties related to convex functions are given in Appendix A.

Suppose that f(x) and gj(x), j = 1,2,...,m, are convex functions. The Lagrange function of equations (5.39) can be written as

L(x,y,λ) = f(x) + Σ_{j=1}^{m} λj [gj(x) + yj²]          (5.54)

If λj > 0, then λj gj(x) is convex and, since λj yj = 0 from equation (5.43), L(x,y,λ) will be a convex function.

It has been established that a necessary condition for f(x) to have a relative minimum at x∗ is that L(x,y,λ) have a stationary point at x∗. However, if L(x,y,λ) is a convex function, its derivative vanishes at only one point, and hence this point must be an absolute minimum for the function f(x). Thus the Kuhn–Tucker conditions are both necessary and sufficient for an absolute minimum of f(x) at x∗.

To conclude, the following remarks are in order:

• If the given optimization problem is known to be a convex programming problem, there will be no relative minima or saddle points, and hence the extreme point found by applying the Kuhn–Tucker conditions is guaranteed to be an absolute minimum of f(x). However, it is often very difficult to ascertain whether the objective and constraint functions involved in a practical engineering problem are convex.

• The Kuhn–Tucker conditions derived above are based on the development given for equality constraints. One of the requirements for these conditions is that at least one of the Jacobians composed of the m constraints and m of the (n + m) variables (x1,...,xn; y1,...,ym) be nonzero. This requirement is implied in the above derivation.


Appendix A

Convex and Concave Functions

1. Convex Function

A function f(x) is said to be convex if, for any pair of points

x1 = {x1(1), x2(1), ..., xn(1)}   and   x2 = {x1(2), x2(2), ..., xn(2)}

the following holds

f[λ x2 + (1 − λ) x1] ≤ λ f(x2) + (1 − λ) f(x1)   for all λ , 0 ≤ λ ≤ 1          (5.55)

that is, if the line segment connecting any two points of the graph lies entirely above or on the graph of f(x).

Figures 5.12(a) and 5.13(a) illustrate a convex function in one and two dimensions, respectively. It can be seen that a convex function always bends upwards, and hence it is apparent that the local minimum of a convex function is also a global minimum.

Figure 5.12. Functions of one variable: (a) convex function; (b) concave function

2. Concave Function

A function f(x) is called a concave function if, for any two points x1 and x2, the following holds

f [λx2 + (1− λ)x1] ≥ λ f(x2) + (1− λ) f(x1) for all λ , 0 ≤ λ ≤ 1 (5.56)

that is, if the line segment joining any two points lies entirely below or on the graph of the function between the two points x1 and x2.

Figures 5.12(b) and 5.13(b) illustrate a concave function in one and two dimensions, respectively. It can be seen that a concave function bends downwards, and hence the local maximum will also be a global maximum. It can be seen that the negative of a convex function is a concave function, and vice versa. Also note that the sum of convex functions is a convex function and the sum of concave functions is a concave function.

A function f(x) is strictly convex or concave if strict inequality holds in equation (5.55) or (5.56), respectively, for any x1 ≠ x2. A linear function is both convex and concave, since it satisfies both inequalities (5.55) and (5.56). A function may be convex within a region and concave elsewhere.

Figure 5.13. Functions of two variables: (a) convex function; (b) concave function

It is important to note that the convexity or concavity of a function is defined only when its domain is a convex set. Convex sets are a special class of sets of points in the Euclidean space E^n, which play an important role in optimization theory.

Testing for convexity and concavity

In addition to the above definition, the following equivalent relations can be used to identify a convex function.

Theorem 7. A function f(x) is convex if, for any pair of points x1 and x2

f(x2) ≥ f(x1) +∇fT (x1)·(x2 − x1)

If f(x) is concave, the opposite type of inequality holds.

Theorem 8. A function f(x) is convex if the Hessian matrix H(x) = [∂²f(x)/(∂xi ∂xj)] is positive semidefinite.

If H(x) is positive definite, the function f(x) is strictly convex. It can also be proved that, if f(x) is concave, its Hessian matrix is negative semidefinite.

Theorem 9. Any local minimum of a convex function f(x) is a global minimum.

It means that a convex function cannot have a local minimum which is not also a global minimum.
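Theorem 8 suggests a simple numerical convexity test: sample the Hessian over the domain of interest and inspect its eigenvalues. The function below is a hypothetical example:

```python
# Numerical illustration of Theorem 8: check convexity through the Hessian eigenvalues
# sampled over a region of the domain.
import numpy as np

def hessian(x1, x2):
    # Hessian of f(x1, x2) = x1**4 + x2**2 + x1*x2 (computed by hand)
    return np.array([[12.0 * x1**2, 1.0],
                     [1.0,          2.0]])

convex_everywhere = True
for x1 in np.linspace(-2.0, 2.0, 21):
    for x2 in np.linspace(-2.0, 2.0, 21):
        if np.min(np.linalg.eigvalsh(hessian(x1, x2))) < -1e-12:
            convex_everywhere = False

print(convex_everywhere)   # False: near x1 = 0 the Hessian is not positive semidefinite
```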


5.5 Classification of Optimization Techniques

The various techniques available for the solution of optimization problems are grouped under the heading of mathematical programming techniques. They are generally studied as part of operations research, the branch of mathematics concerned with the application of scientific methods and techniques to decision-making problems and with establishing the best or optimal solutions.

The classical methods of differential calculus can be used to find unconstrained maxima and minima of a function of several variables. These methods assume that the function is twice differentiable with respect to the design variables and that the derivatives are continuous. When the problem is one of minimization or maximization of an integral, the methods of calculus of variations can be used to solve it. For problems with equality constraints, the Lagrange multiplier method is frequently used. But this method, in general, leads to a set of nonlinear simultaneous equations which may be difficult to solve.

Classification of optimization problems can be based on the nature of the expressions for the objective function and the constraints (see Section 5.1). This classification is extremely useful from the computational viewpoint, since many methods have been developed solely for the efficient solution of a particular class of problems. These are all numerical methods in which an approximate solution is sought by proceeding in an iterative manner, starting from a guess solution. Thus the first task of a designer is to investigate the class of the problem encountered. This will, in many cases, dictate the types of solution procedures to adopt in solving the problem.

Geometric Programming

Geometric programming (GP) is an optimization technique applicable to programming problems involving functions of a special mathematical form called posynomials, an adaptation of the word polynomial to the case where all coefficients are positive. A function f(x) is called a posynomial if it can be expressed as the sum of positive terms, each of which is a power function

f(x) = c1 x1^a11 x2^a12 ··· xn^a1n + ... + cN x1^aN1 x2^aN2 ··· xn^aNn          (5.57)

where the ci and aij are constants with ci > 0, xj > 0 and aij ∈ R. Note, however, that the power-function exponents, which must be non-negative integers for polynomials, can be any real number for posynomials.

A geometric programming problem is one in which the objective function and constraints are expressed as posynomials in x. Thus the GP method is applicable to an optimization problem of the type (Duffin et al., 1967):


Find x which minimizes f(x) = Σ_{i=1}^{No} ci Π_{j=1}^{n} xj^pij ,   ci > 0 , xj > 0

subject to gj(x) = Σ_{i=1}^{Nj} aij [ Π_{k=1}^{n} xk^qik ] ≤ 0 ,   aij > 0 ,  j = 1,2,...,m          (5.58)

where No and Nj denote the number of posynomial terms in the objective and in the jth constraint function, respectively.

Linear Programming

If the objective function and all the constraints in equation (5.1) are linear functions of the design variables, the general problem of mathematical programming reduces to linear programming (LP). A linear programming problem is stated in the following standard form (Dantzig, 1963):

Find x which minimizes f(x) = Σ_{i=1}^{n} ci xi

subject to Σ_{k=1}^{n} ajk xk = bj ,   j = 1,2,...,m

xi ≥ 0 ,   i = 1,2,...,n          (5.59)

where ci, ajk and bj are constants.

Integer Programming

If some or all of the design variables x1,x2,...,xn of an optimization problem are restricted to take on only integer (or discrete) values, the problem is called an integer programming problem (Hu, 1969). On the other hand, if all the design variables are permitted to take any real values, the optimization problem is called a real-valued programming problem.

Strictly speaking, if in an LP problem one restricts the design vector x to non-negative integers, the problem becomes nonlinear. But it is more realistic to call it an integer linear programming problem, because the form of the constraints and of the objective function remains linear if the restrictions on x are ignored.

A systematic method for handling the integer programming problem consists in ignoring the restrictions on x, solving it as an ordinary LP problem, and then introducing additional constraints one by one to cut out the region near the solution point until an integer solution is obtained. Theoretically the method converges, but in practice the number of iterations may be very large. Also, the method increases the number of constraints, and even an originally small-sized problem may become very large. When the answers are in the neighborhood of large integers, rounding off the continuous LP solution gives a satisfactory result; but if the answer is in the neighborhood of small integers, such rounding off may lead to a totally wrong answer.


Nonlinear Programming Problem

If any of the functions among the objective and constraint functions in equation (5.1) is nonlinear, the problem is said to be a nonlinear programming (NLP) problem. This is the most general programming problem, which encompasses any optimization problem; all other problems can be considered as special cases of the nonlinear programming problem.

In general, nonlinear programming presents much greater mathematical difficulties than linear programming. Even the case in which all the constraints are linear and only the objective function is nonlinear is often quite complicated.

Quadratic Programming

The quadratic programming (QP) problem is the simplest case of nonlinear programming problem, in which the objective function is quadratic and the constraints are linear. It is usually formulated as follows:

Find x which minimizes f(x) = c + Σ_{i=1}^{n} qi xi + Σ_{i=1}^{n} Σ_{j=1}^{n} rij xi xj

subject to Σ_{i=1}^{n} aij xi = bj ,   j = 1,2,...,m

xi ≥ 0 ,   i = 1,2,...,n          (5.60)

where c, qi, rij , aij and bj are constants.

Stochastic Programming

A stochastic programming problem is an optimization problem in which some of the design variables and/or preassigned parameters are described by probabilistic (nondeterministic or stochastic) distributions (Sengupta, 1972).

Multiobjective Programming

A multiobjective programming problem can be stated as follows:

Find x which minimizes f1(x), f2(x), ..., fk(x)

subject to gj(x) ≤ 0 ,   j = 1,2,...,m          (5.61)

where f1, f2, ... , fk denote the objective functions to be minimized simultaneously.


Theory of Games

When two or more candidate designs compete for the achievement of conflicting goals, a competitive situation exists. Generally, in such problems the losses of one candidate signify the gains of the others. Naturally, the objective function depends on a set of controlled as well as uncontrolled variables, where the uncontrolled variables depend on the strategy of the competitor.

Dynamic Programming

The method of dynamic programming (DP) was developed in the fifties through the work of Bellman, whose name remains the one most closely associated with the field. The essential feature of the method is that a multivariate optimization problem is decomposed into a series of stages, optimization being done at each stage with respect to one variable only. Bellman gave it the rather non-descriptive name dynamic programming, whereas a more significant name would be recursive optimization.

Both discrete and continuous problems are amenable to this method, and deterministic as well as stochastic models can be handled. The complexity, however, increases tremendously with the number of constraints: a single-constraint problem is relatively simple, but a problem with more than two constraints can be formidable.

Network Methods

Networks are familiar diagrams in electrical theory, even though they are more easily visualized in transportation systems such as roads, railways or pipelines. Networks present a large variety of intricate mathematical problems that challenge mathematicians. Many problems, particularly those which involve sequential operations or different but related states or stages, are conveniently described as networks. Sometimes a problem with no such apparent structure assumes a mathematical form which is best understood and solved by interpreting it as a network.

A network, in its most generalized and abstract sense, is called a graph. In recent decades graph theory has found more and more applications in diverse areas. In the field of operations research, graph theory plays a particularly important role, as quite often the problem of finding an optimal solution can be looked upon as the problem of choosing the best sequence of operations out of a finite number of alternatives, which can be represented as a graph.

The critical path method (CPM) and the programme evaluation and review technique (PERT) are network methods which are useful in planning, scheduling and controlling a project. They belong to the network methods because in both of them the various operations necessary to complete the project, and the order in which the operations are to be performed, are shown in a graph called a network. CPM is useful for projects in which the durations of the various operations are known exactly, whereas PERT is designed to deal with projects in which there is uncertainty regarding the durations of the various operations.


5.6 Linear Programming

Linear programming is an optimization method applicable to problems in which the objective function and the constraints are all linear functions of the decision variables. The constraint equations in a linear programming problem may be in the form of equalities or inequalities.

The linear programming type of optimization problem was first recognized in the 1930s by economists while developing methods for the optimal allocation of resources. During the Second World War the United States Air Force sought more effective procedures for allocating resources and turned to linear programming. Dantzig (1947), who was a member of the Air Force group, formulated the general linear programming problem and devised the simplex method of solution. This was a significant step in bringing linear programming into wider use.

Afterwards, much progress has been made in the theoretical development and in the practicalapplications of linear programming. Among all the works, the theoretical contributions madeby Kuhn and Tucker had a major impact in the development of the duality theory in linearprogramming. The works of Charnes and Cooper were responsible for paving the path to theindustrial applications of linear programming; their number has been so large that it is possiblehere to describe only some of them.

One of the early industrial applications of linear programming has been made in the petroleumrefineries. In general, an oil refinery has a choice of buying crude oil from several different sourceswith differing compositions and at differing prices. It can manufacture different products likeaviation fuel, diesel fuel and gasoline in varying quantities. The constraints may be due to therestrictions on quantity of the crude oil available from a particular source, the capacity of therefinery to produce a particular product, etc. A mix of the purchased crude oil and the manufac-tured products is sought that gives the maximum profit.

In food processing industry, linear programming has been used to determine the optimal shippingplan for the distribution of a particular product from the different manufacturing plants to thevarious warehouses. In the iron and steel industry, the linear programming was used to decide thetypes of products to be made in their rolling mills to maximize the profit. Metal working industriesuse linear programming for shop loading and for determining the choice between producing andbuying a part. The optimal routing of aircraft and ships as well as an optimal fleet can also bedecided by using linear programming.

5.6.1 Graphical Representation

The concept of linear programming can be grasped in a preliminary way by observing a graphical solution when the number of variables is three or less. One can graph the set of feasible solutions together with the level sets of the objective function; then it is usually a trivial matter to write down the optimal solution.


To illustrate, consider the following problem:

maximize    f = 2x1 + x2

subject to  g1 : x1 + 2x2 ≤ 8          g2 : 2x1 ≥ 1
            g3 : x1 − x2 ≤ 3/2         g4 : 2x2 ≥ 1
            x1 ≥ 0 ,  x2 ≥ 0                                              (5.62)

Each constraint (including the nonnegativity constraints on the variables) defines a half-plane. These half-planes can be determined by first graphing the equation obtained by replacing the inequality with an equality. The geometric representation of the feasible space and the contours of f are shown in Figure 5.14. Since the objective function is linear, its contours are parallel straight lines; once any contour line is drawn, it is clear that the optimum is the vertex of the feasible region through which the line representing the largest value of the objective function can pass. In this case the optimal solution is x∗1 = 11/3 ≈ 3.67, x∗2 = 13/6 ≈ 2.17, with f∗ = 9.5.

Figure 5.14. The set of feasible solutions together with level sets of the objective function
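A quick numerical check of this small problem can be made with an off-the-shelf LP solver; the sketch below assumes SciPy is available and recasts the maximization and the '≥' constraints in the minimization/'≤' form expected by linprog.

# Sketch: solving problem (5.62) numerically (SciPy assumed available).
# linprog minimizes, so the objective 2*x1 + x2 is negated for maximization.
from scipy.optimize import linprog

c = [-2.0, -1.0]                       # minimize -(2*x1 + x2)
A_ub = [[1.0, 2.0],                    # g1:  x1 + 2*x2 <= 8
        [-2.0, 0.0],                   # g2:  2*x1 >= 1   ->  -2*x1 <= -1
        [1.0, -1.0],                   # g3:  x1 - x2 <= 3/2
        [0.0, -2.0]]                   # g4:  2*x2 >= 1   ->  -2*x2 <= -1
b_ub = [8.0, -1.0, 1.5, -1.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)                 # vertex (11/3, 13/6) and f* = 9.5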

Some immediate general remarks can be made based on this example. In a linear model, the objective and constraint functions are always monotonic. If equalities exist, one can assume, without loss of generality, that they have been eliminated, explicitly or implicitly, so that the resulting reduced problem will be monotonic. From the first monotonicity principle, there will always be at least one active constraint, identified possibly with the aid of dominance. Subsequent elimination of active constraints will always yield a new monotonic problem. The process will continue as long as activity can be proven, until no variables remain in the objective. The solution reached will usually be at a vertex of the feasible space, which is the intersection of as many active constraint surfaces as there are variables. The only other possibility is to have variables left in the constraints that do not appear in the objective function. In this case, the second monotonicity principle would indicate the existence of an infinite number of optima along the edge or face whose normal matches that of the objective function gradient. The limiting optimal values will be at the corners of the optimal face, which correspond to upper and lower bounds on the variable not appearing in the objective function.

5.6.2 Standard Form of a Linear Programming Problem

In linear programming the objective is always to maximize or to minimize some linear function of the decision variables. The general linear programming problem can be stated in the following standard form:

Scalar form

minimize    f(x1, x2, ..., xn) = c1 x1 + c2 x2 + ... + cn xn

subject to  a11 x1 + a12 x2 + ... + a1n xn = b1
            a21 x1 + a22 x2 + ... + a2n xn = b2
               ...
            am1 x1 + am2 x2 + ... + amn xn = bm

            xj ≥ 0 ,  j = 1,2,...,n                                        (5.63)

where cj , bi and aij (i = 1,2,...,m; j = 1,2,...,n) are known constants, and the xj are the decision variables.

The linear programming problem in scalar form may also be stated compactly by using the summation sign as

minimize    f(x1, x2, ..., xn) = Σ_{j=1}^{n} cj xj

subject to  Σ_{j=1}^{n} aij xj = bi ,   i = 1,2,...,m

            xj ≥ 0 ,   j = 1,2,...,n                                       (5.64)

Matrix form

minimize    cT x

subject to  a x = b

            x ≥ 0                                                          (5.65)

where x = {x1, x2, ..., xn}, b = {b1, b2, ..., bm}, c = {c1, c2, ..., cn}, and

        | a11  a12  ...  a1n |
  a  =  | a21  a22  ...  a2n |
        |  :    :          : |
        | am1  am2  ...  amn |

and the superscript T is used to indicate the transpose.


The characteristics of the linear programming problem stated in the standard form are:

• the objective function is of the minimization type;
• all the constraints are of the equality type;
• all the decision variables are nonnegative.

It is shown below that any linear programming problem can be put in the standard form by the use of the following transformations:

1. In LP problems, as in any optimization problem, the maximization of a function f(x1,x2,...,xn) is equivalent to the minimization of the negative of the same function. Consequently, the objective function can be stated in the minimization form in any linear programming problem.

2. In most engineering optimization problems the decision variables represent some physical dimensions, and hence the variables xj have to be nonnegative. However, when a variable is unrestricted in sign (i.e., it can take a positive, negative or zero value), it can be written as the difference of two nonnegative variables. Thus, if xj is unrestricted in sign, it can be written as xj = x′j − x′′j, where

   x′j ≥ 0   and   x′′j ≥ 0

   It can be noticed that xj will be negative, zero or positive depending on whether x′′j is greater than, equal to or less than x′j.

3. If a constraint appears in the form of a 'less than' type of inequality,

   ak1 x1 + ak2 x2 + ... + akn xn ≤ bk

   it can be converted into the equality form by adding a nonnegative slack variable xn+1 as follows

   ak1 x1 + ak2 x2 + ... + akn xn + xn+1 = bk

   Similarly, if the constraint is in the form of a 'greater than' type of inequality,

   ak1 x1 + ak2 x2 + ... + akn xn ≥ bk

   it can be converted into the equality form by subtracting a variable,

   ak1 x1 + ak2 x2 + ... + akn xn − xn+1 = bk

   where xn+1 is a nonnegative variable known as the surplus variable.

A set of specific values {x1,x2,...,xn} for the decision variables is called a solution. A solution is called feasible if it satisfies all the constraints. One should assume that m < n, for if m > n there would be (m − n) redundant equations which could be eliminated. The case n = m is of no interest, for then there is either a unique solution x which satisfies equations (5.63), in which case there can be no optimization, or no feasible solution, in which case at least one constraint is contradicted. The case m < n corresponds to an underdetermined set of linear equations which, if consistent, has an infinite number of solutions. The problem of linear programming is to find, among these solutions, one that satisfies equations (5.64) or (5.65) and yields the minimum of f.
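The transformations listed above are easy to mechanize. The following sketch (NumPy assumed available; the helper name and the small data are hypothetical) appends slack and surplus columns so that '≤' and '≥' constraints take the equality form of the standard problem.

# Sketch of transformation 3: converting inequalities into equalities by appending
# slack/surplus variables (the original variables are assumed already nonnegative).
import numpy as np

def to_standard_form(A, b, senses):
    """A x (<=, >=, =) b  ->  [A | S] x' = b with x' >= 0."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    extra = []
    for i, s in enumerate(senses):
        if s == "<=":                       # add a nonnegative slack variable
            col = np.zeros(m); col[i] = 1.0; extra.append(col)
        elif s == ">=":                     # subtract a nonnegative surplus variable
            col = np.zeros(m); col[i] = -1.0; extra.append(col)
    A_std = np.hstack([A] + [c.reshape(m, 1) for c in extra]) if extra else A
    return A_std, np.asarray(b, dtype=float)

A_std, b_std = to_standard_form([[1, 2], [1, -1]], [8, 1.5], ["<=", "<="])
print(A_std)     # two slack columns appended to the constraint matrix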


5.6.3 Definitions and Theorems

The geometrical characteristics of linear programming problems can be proved mathematically. Some of the more powerful methods for solving linear programming problems take advantage of these characteristics. The terminology used in linear programming and some of the most important related theorems are considered below.

1. Point in n–dimensional space

A point x in an n–dimensional space is characterized by an ordered set of n values or coordinates (x1,x2,...,xn). The coordinates of x are also called the components of x.

2. Line segment in n-dimensions

If the coordinates of two points A and B are given by x(1)j and x(2)j (j = 1,2,...,n), the line segment L joining these points is the collection of points x(λ) whose coordinates are given by

   xj = λ x(1)j + (1 − λ) x(2)j ,   j = 1,2,...,n ,   0 ≤ λ ≤ 1

Thus

   L = { x | x = λ x(1) + (1 − λ) x(2) }                                   (5.66)

In one dimension, for example, it is easy to see from Figure 5.15 that the definition is in accordance with experience

x(λ) − x(1) = λ (x(2) − x(1)) , 0 ≤ λ ≤ 1 (5.67)

whence

x(λ) = (1− λ) x(1) + λx(2) (5.68)

Figure 5.15. A line segment

3. Hyperplane

In n–dimensional space, the set of points whose coordinates satisfy a linear equation

a1x1 + a2x2 + . . . + anxn = aTx = b (5.69)

is called a hyperplane.

Thus, the hyperplane H is given by

H(a,b) = {x | aTx = b} (5.70)

A hyperplane has (n−1) dimensions in an n–dimensional space (En). For example, in a three–dimensional space it is a plane, and in a two–dimensional space it is a line. The set of points whose coordinates satisfy a linear inequality like a1x1 + . . . + anxn ≤ b is called a closed half–space, closed due to the inclusion of the equality sign in the above inequality.


A hyperplane partitions En into two closed half–spaces so that

H+ = {x |aTx ≥ b} (5.71)

and

H− = {x |aTx ≤ b} (5.72)

This is illustrated in Figure 5.16 in the case of a two–dimensional space (E2).

Figure 5.16. Hyperplane in two dimensions

4. Convex set

A convex set is a set of points such that, if x(1) and x(2) are any two points in the set, the line segment joining them is also in the set.

If S denotes the convex set, it can be defined mathematically as follows:

if x(1) ,x(2) ∈ S , then x ∈ S

where

x = αx(2) + (1− α)x(1) , 0 ≤ α ≤ 1

By convention, a set containing only one point is considered convex. Some examples of convex sets in two dimensions are shown shaded in Figure 5.17.

Figure 5.17. Convex sets

On the other hand, the sets depicted by the shaded regions in Figure 5.18 are not convex. The L–shaped zone, for example, is not a convex set because it is possible to find two points a and b in the set such that not all points on the line joining them belong to the set.


Figure 5.18. Nonconvex sets

5. Convex polyhedron

A convex polyhedron is a set of points common to one or more half–spaces.

In particular, a convex polygon is the intersection of one or more half–planes. Thus, Figure 5.19(a) shows a 2D convex polygon, while Figure 5.19(b) represents a 3D convex polyhedron.

6. Vertex (extreme point)

A vertex is a point in the convex set which does not lie on a line segment joining two other points of the set.

Thus, for example, every point on the circumference of a circle and each corner point of a polygon can be called a vertex or extreme point.

Figure 5.19. Convex polyhedron

7. Feasible solution

In a linear programming problem, any solution which satisfies the constraints

ax = b for x ≥ 0 (5.73)

is called a feasible solution.

8. Basic solution

It is a solution in which (n−m) variables are set equal to zero.

The basic solution can be obtained by setting (n − m) variables to zero and solving the constraint equations (5.73) simultaneously.


9. Basis

The set of variables not set equal to zero to obtain the basic solution is the basis.

10. Basic feasible solution

It is a basic solution which satisfies the nonnegativity conditions of equation (5.65)

11. Non–degenerate basic feasible solution

It is a basic feasible solution which has exactly m positive xi.

12. Optimal solution

An optimal solution is a feasible solution which optimizes the objective function.

13. Optimal basic solution

It is a basic feasible solution for which the objective function is optimal .

Basic theorems in linear programming.

Theorem 10. The intersection of any number of convex sets is also convex .

Physically, the theorem states that if there are a number of convex sets represented by R1, R2, ..., then the set of points R common to all these sets will also be convex. Figure 5.20 illustrates the meaning of this theorem for the case of two convex sets.

Figure 5.20. Intersection of two convex sets

Theorem 11. The feasible region of a linear programming problem is convex .

Theorem 12. Any local minimum solution is global for a linear programming problem.

Theorem 13. Every basic feasible solution is an extreme point of the convex set of feasible solutions.

Theorem 14. Let S be a closed, bounded convex polyhedron with xi, i = 1,2,...,p, as the set of its extreme points. Then any vector x ∈ S can be written as

   x = Σ_{i=1}^{p} λi xi   with   λi ≥ 0   and   Σ_{i=1}^{p} λi = 1

Theorem 15. Let S be a closed convex polyhedron. Then the minimum of a linear function over S is attained at an extreme point of S.

5.6.4 Solution of a System of Linear Simultaneous Equations

Before studying the most general method of solving a linear programming problem, it is useful to review the methods of solving a system of linear equations. Hence some of the elementary concepts of linear equations are reviewed below.

Particular Case

Consider the following square system of n equations in n unknowns

a11 x1 + a12 x2 + ... + a1n xn = b1        (E1)
a21 x1 + a22 x2 + ... + a2n xn = b2        (E2)
a31 x1 + a32 x2 + ... + a3n xn = b3        (E3)
   ...
an1 x1 + an2 x2 + ... + ann xn = bn        (En)
                                                                           (5.74)

or in matricial form

AX = B (5.75)

It is assumed that the reader is familiar with the definition of the inverse of a square matrix; recall that the inverse of A, denoted A−1, is defined only when the determinant of A, denoted |A|, is nonzero. When |A| = 0 the matrix A is said to be singular, and when |A| ≠ 0 it is called nonsingular.

Assuming that this set of equations possesses a unique solution, one way of solving the system consists of reducing the equations to a form known as the canonical form.

It is well known from elementary algebra that the solution of equations (5.74) is not altered under the following elementary operations: (i) any equation Er is replaced by the equation k Er, where k is a nonzero constant, and (ii) any equation Er is replaced by the equation Er + k Es, where Es is any other equation of the system. By making use of these elementary operations, the system of equations (5.74) can be reduced to a convenient equivalent form as follows. Let us select some variable xi and try to eliminate it from all the equations except the jth one (for which aji is nonzero). This can be accomplished by dividing the jth equation by aji and subtracting aki times the result from each of the other equations, k = 1,2,...,j−1,j+1,...,n.


The resulting system of equations can be written as

a′11 x1 + a′12 x2 + ... + a′1,i−1 xi−1 + 0·xi + a′1,i+1 xi+1 + ... + a′1n xn = b′1
a′21 x1 + a′22 x2 + ... + a′2,i−1 xi−1 + 0·xi + a′2,i+1 xi+1 + ... + a′2n xn = b′2
   ...
a′j−1,1 x1 + ...       + a′j−1,i−1 xi−1 + 0·xi + a′j−1,i+1 xi+1 + ... + a′j−1,n xn = b′j−1
a′j1 x1 + a′j2 x2 + ... + a′j,i−1 xi−1 + 1·xi + a′j,i+1 xi+1 + ... + a′jn xn = b′j
a′j+1,1 x1 + ...       + a′j+1,i−1 xi−1 + 0·xi + a′j+1,i+1 xi+1 + ... + a′j+1,n xn = b′j+1
   ...
a′n1 x1 + a′n2 x2 + ... + a′n,i−1 xi−1 + 0·xi + a′n,i+1 xi+1 + ... + a′nn xn = b′n
                                                                           (5.76)

where the primes indicate that the a′ij and b′j are changed from the original system. This procedure of eliminating a particular variable from all but one equation is called a pivot operation. The system of equations (5.76) produced by the pivot operation has exactly the same solution as the original set of equations (5.74). That is, the x which satisfies equations (5.74) satisfies equations (5.76) and vice versa.

In the next step, if one takes the system of equations (5.76) and performs a new pivot operation by eliminating xs, s ≠ i, in all the equations except the tth equation, t ≠ j, the zeroes and the 1 in the ith column will not be disturbed. These pivot operations can be repeated, using a different variable and a different equation each time, until the system of equations (5.74) is reduced to the form

1·x1 + 0·x2 + 0·x3 + ... + 0·xn = b′′1
0·x1 + 1·x2 + 0·x3 + ... + 0·xn = b′′2
0·x1 + 0·x2 + 1·x3 + ... + 0·xn = b′′3
   ...
0·x1 + 0·x2 + 0·x3 + ... + 1·xn = b′′n
                                                                           (5.77)

The system of equations (5.77) is said to be in canonical form and has been obtained after carrying out n pivot operations. From the canonical form the solution vector can be directly obtained as

   xi = b′′i ,   i = 1,2,...,n                                             (5.78)

Since the set of equations (5.77) has been obtained from equations (5.74) only through elementary operations, the system of equations (5.77) is equivalent to the system of equations (5.74). Thus the solution given in equation (5.78) is the desired solution of equations (5.74).
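The pivot operation just described can be sketched in a few lines of Python (NumPy assumed available); the routine below reduces a small square system to the canonical form (5.77) by n successive pivots, assuming the diagonal pivots are nonzero (no row interchanges are attempted).

# Sketch of n successive pivot operations producing the canonical form (5.77).
import numpy as np

def to_canonical(A, b):
    T = np.hstack([np.asarray(A, float), np.asarray(b, float).reshape(-1, 1)])
    n = T.shape[0]
    for j in range(n):                      # pivot on element (j, j), assumed nonzero
        T[j] /= T[j, j]                     # scale the pivot row
        for k in range(n):                  # eliminate x_j from every other equation
            if k != j:
                T[k] -= T[k, j] * T[j]
    return T[:, -1]                         # solution x_i = b_i''

x = to_canonical([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0])
print(x)                                    # [1., 3.]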

Pivotal Reduction of a General System of Equations

Instead of a square system, let us consider a system of m equations in n variables, with n > m


a11 x1 + a12 x2 + ... + a1n xn = b1
a21 x1 + a22 x2 + ... + a2n xn = b2
   ...
am1 x1 + am2 x2 + ... + amn xn = bm
                                                                           (5.79)

This system of equations is assumed to be consistent, so that it has at least one solution. The solution vectors x which satisfy the system are not evident from equations (5.79). However, it is possible to reduce this system to an equivalent canonical system from which at least one solution can be readily deduced. If pivot operations with respect to any m variables, say x1, x2, ..., xm, are carried out, the resulting set of equations can be written as follows

1·x1 + 0·x2 + ... + 0·xm + a′′1,m+1 xm+1 + ... + a′′1n xn = b′′1
0·x1 + 1·x2 + ... + 0·xm + a′′2,m+1 xm+1 + ... + a′′2n xn = b′′2
   ...
0·x1 + 0·x2 + ... + 1·xm + a′′m,m+1 xm+1 + ... + a′′mn xn = b′′m
                                                                           (5.80)

One special solution which can always be deduced from the system of equations (5.80) is

   xi = b′′i ,   i = 1,2,...,m
   xi = 0 ,      i = m+1, m+2,...,n                                        (5.81)

This solution is called a basic solution, since the solution vector contains no more than m nonzero terms. The pivotal variables xi (i = 1,2,...,m) are called basic variables, whereas the remaining variables xi (i = m+1, m+2,...,n) are called nonpivotal, independent, or nonbasic variables. Of course, the basic solution is not the only solution, but it is the one most readily deduced from equations (5.80). If all the b′′i in the solution given by equations (5.81) are nonnegative, the solution satisfies all the constraints in equations (5.63), and hence it can be called a basic feasible solution.

It is possible to obtain the other basic solutions from the canonical system of equations (5.80). One can perform an additional pivot operation on the system after it is in canonical form, using a′′pq (which is nonzero) as the pivot term, with q > m and any row p (among 1, 2, ..., m). The new system will still be in canonical form, but with xq as the pivotal variable in place of xp. The variable xp, which was a basic variable in the original canonical form, will no longer be a basic variable in the new canonical form. This new canonical system yields a new basic solution (which may or may not be feasible) similar to that of equations (5.81). It is to be noted that the values of all the basic variables change, in general, as one goes from one basic solution to another, but only one zero variable (which is nonbasic in the original canonical form) becomes nonzero (i.e., basic in the new canonical system) and vice versa.
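The following sketch (NumPy assumed available, with a small hypothetical system) makes the idea of basic solutions concrete: it tries every choice of m basic variables, solves for them with the nonbasic variables set to zero, and labels each basic solution feasible or infeasible.

# Sketch: enumerating the basic solutions of an underdetermined system a x = b (m < n).
import itertools
import numpy as np

a = np.array([[1.0, 2.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])        # m = 2 equations, n = 4 variables
b = np.array([8.0, 10.0])
m, n = a.shape

for basic in itertools.combinations(range(n), m):
    B = a[:, basic]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                            # singular basis: no basic solution
    x = np.zeros(n)
    x[list(basic)] = np.linalg.solve(B, b)  # nonbasic variables stay at zero
    tag = "feasible" if np.all(x >= -1e-12) else "infeasible"
    print(basic, np.round(x, 3), tag)       # at most C(n, m) = 6 basic solutions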


5.6.5 Why the Simplex Method?

Given a system in canonical form corresponding to a basic solution, it has been shown how to move to a neighbouring basic solution by a pivot operation. Thus, one way to find the optimal solution of a linear programming problem is to generate all the basic solutions and pick the one which is feasible and corresponds to the optimal value of the objective function. This can be done because the optimal solution, if one exists, always occurs at an extreme point or vertex of the feasible domain. If there are m equality constraints in n variables, with n > m, a basic solution can be obtained by setting (n − m) of the variables equal to zero. The number of basic solutions to be inspected is thus equal to the number of ways in which m variables can be selected from a group of n variables, i.e. the binomial coefficient

   n! / [(n − m)! m!]

For example, if n = 10 and m = 5 there are 252 basic solutions, and if n = 20 and m = 10 one gets 184 756 basic solutions. Usually one does not have to inspect all these basic solutions, since many of them will be infeasible. However, for large n and m this is still a very large number to inspect one by one, so the LP problem cannot be solved by exhaustive enumeration; the difficulty arises because the classical analysis tools are not well suited to handling inequalities. Hence, what one really needs is a computational scheme that examines a sequence of basic feasible solutions, each of which corresponds to a lower value of the objective function f, until a minimum is reached. Numerical methods which compute the solution for given numerical values of aij, bi and cj, for a finite number of variables and constraints, have been developed. The most general and widely used of these methods is the simplex method.

The simplex method of Dantzig provides an algorithm4 for obtaining a basic feasible solution; if the solution is not optimal, the method provides for finding a neighbouring basic feasible solution which has a lower or equal value of f. The process is repeated until, in a finite number of steps, an optimum is found.

The simplex method of Dantzig provides an algorithm4 for obtaining a basic feasible solution; ifthe solution is not optimal, the method provides for finding a neighboring basic feasible solutionwhich has a lower or equal value of f . The process is repeated until, in a finite number of steps,an optimum is found.

The first step involved in the simplex method is to construct an auxiliary problem by introducing certain variables, known as artificial variables, into the standard form of the linear programming problem. The primary aim of adding the artificial variables is to bring the resulting auxiliary problem into a canonical form from which its basic feasible solution can be obtained immediately. Starting from this canonical form, the optimal solution of the original linear programming problem is sought in two phases. The first phase is intended to find a basic feasible solution of the original linear programming problem. It consists of a sequence of pivot operations which produces a succession of different canonical forms from which the optimal solution of the auxiliary problem can be found. This also enables one to find a basic feasible solution, if one exists, of the original linear programming problem. The second phase is intended to find the optimal solution of the original linear programming problem. It consists of a second sequence of pivot operations which enables one to move from one basic feasible solution to the next of the original linear programming problem.

4 An algorithm is a rule of procedure usually involving the repetitive application of an operation. The word is derived from the Arabic Al Khwarizmi (after the Arab mathematician of the same name, about 825 AD), which in Old French became algorismus and in Middle English algorism.


In this process, the optimal solution of the problem, if one exists, will be identified. The sequence of different canonical forms that is necessary in both phases of the simplex method is generated according to the simplex algorithm described below. That is, the simplex algorithm forms the kernel of the simplex method.

5.6.6 Simplex Algorithm

The starting point of the simplex algorithm is always a set of equations which includes the objective function along with the equality constraints of the problem in canonical form. Thus the objective of the simplex algorithm is to find the vector x ≥ 0 which minimizes the function f(x) and satisfies the system of equations

1·x1 + 0·x2 + ... + 0·xm + a′′1,m+1 xm+1 + ... + a′′1n xn        = b′′1
0·x1 + 1·x2 + ... + 0·xm + a′′2,m+1 xm+1 + ... + a′′2n xn        = b′′2
   ...
0·x1 + 0·x2 + ... + 1·xm + a′′m,m+1 xm+1 + ... + a′′mn xn        = b′′m
0·x1 + 0·x2 + ... + 0·xm − f + c′′m+1 xm+1 + ... + c′′n xn       = −f′′o
                                                                           (5.82)

where a′′ij , c′′j , b′′i and f′′o are constants. Notice that (−f) is treated as a basic variable in the canonical form of equations (5.82). The basic solution which can be readily deduced from equations (5.82) is

   xi = b′′i ,   i = 1,2,...,m
   f  = f′′o                                                               (5.83)
   xi = 0 ,      i = m+1, m+2,...,n

If this basic solution is also feasible, the values of xi (i = 1,2,...,n) are nonnegative and hence

   b′′i ≥ 0 ,   i = 1,2,...,m

In the first phase of the simplex method, the basic solution corresponding to the canonical form obtained after the introduction of the artificial variables must be feasible for the auxiliary problem. As stated earlier, the second phase of the simplex method starts with a basic feasible solution of the original linear programming problem. Hence the initial canonical form at the start of the simplex algorithm is always a basic feasible solution.

It is known from theorem 15 that the optimal solution of a LP problem lies at one of the basic feasible solutions. Since the simplex algorithm is intended to move from one basic feasible solution to another through pivot operations, it must be verified that the present basic feasible solution is not the optimal solution before moving to the next basic feasible solution. By merely glancing at the numbers c′′j (j = m+1, m+2,...,n) it is possible to ascertain whether the present basic feasible solution is optimal or not.


Identifying an optimal point (theorem 16)

A basic feasible solution is an optimal solution, with a minimum objective function value of f′′o , if all the cost coefficients c′′j (j = m+1, m+2,...,n) in equations (5.82) are nonnegative.

A glance over the c′′j can also show whether there are multiple optima. Let all c′′j > 0 for j = m+1, m+2,...,n with j ≠ k, and let c′′k = 0 for some nonbasic variable xk. Then, if the constraints allow that variable to be made positive (from its present value of zero), no change in f results, and there are multiple optima. It is possible, however, that the variable may not be allowed by the constraints to become positive; this may occur in the case of degenerate solutions.

Thus, as a corollary to the above discussion, one can state that a basic feasible solution is the unique optimal feasible solution if c′′j > 0 for all nonbasic variables xj (j = m+1, m+2,...,n).

If, after testing for optimality, the current basic feasible solution is found to be non-optimal, an improved basic solution is obtained from the present canonical form as follows.

Improving a non-optimal basic feasible solution

From the last row of equations (5.82), the objective function can be written as

   f = f′′o + Σ_{i=1}^{m} c′′i xi + Σ_{j=m+1}^{n} c′′j xj = f′′o            (5.84)

for the solution given by equations (5.83).

If at least one c′′j is negative, the value of f can be reduced by making the corresponding xj > 0. In other words, the nonbasic variable xj for which the cost coefficient c′′j is negative is to be made a basic variable in order to reduce the value of the objective function. At the same time, due to the pivot operation, one of the current basic variables will become nonbasic, and hence the values of the new basic variables are to be adjusted in order to bring the value of f below f′′o . If more than one c′′j is negative, the index s of the nonbasic variable xs which is to be made basic is chosen such that

   c′′s = minimum ( c′′j < 0 )                                             (5.85)

Although this may not lead to the greatest possible decrease in f (since it may not be possible to increase xs very far), it is intuitively at least a good rule for choosing the variable to become basic. It is the one generally used in practice because it is simple and it usually leads to fewer iterations than just choosing any c′′j < 0. If there is a tie in applying (5.85), i.e. if more than one c′′j has the same minimum value, one of them is selected as c′′s arbitrarily.

Having decided on the variable xs to become basic, it has to be increased from zero, holding all other nonbasic variables at zero and observing the effect on the current basic variables. By equations (5.82), these are related as


x1 = b′′1 − a′′1s xs ,   b′′1 ≥ 0
x2 = b′′2 − a′′2s xs ,   b′′2 ≥ 0
   ...
xm = b′′m − a′′ms xs ,   b′′m ≥ 0                                          (5.86)

f = f′′o + c′′s xs ,   c′′s < 0                                            (5.87)

Since c′′s < 0, equation (5.87) suggests that the value of xs should be made as large as possible in order to reduce the value of f as much as possible. However, in the process of increasing the value of xs, some of the variables xi (i = 1,2,...,m) in equations (5.86) may become negative. It can be seen that if all the coefficients a′′is ≤ 0, then xs can be made infinitely large without making any xi < 0. In such a case the minimum value of f is minus infinity and the linear programming problem is said to have an unbounded solution.

On the other hand, if at least one a′′is is positive, the maximum value that xs can take without making any xi negative is (b′′i /a′′is). If more than one a′′is > 0, the largest value x∗s that xs can take is given by the minimum of the ratios (b′′i /a′′is) for which a′′is > 0. Thus

   x∗s = b′′r / a′′rs = min_{a′′is > 0} ( b′′i / a′′is )                    (5.88)

The choice of r in the case of a tie, assuming that all b′′i > 0, is arbitrary. If any b′′i for which a′′is > 0 is zero in equations (5.86), then xs cannot be increased by any amount. Such a solution is called a degenerate solution.

In the case of a non–degenerate basic feasible solution, a new basic feasible solution can be constructed with a lower value of the objective function as follows. By substituting the value of x∗s given by equation (5.88) into equations (5.86) and (5.87), one obtains

   xs = x∗s
   xi = b′′i − a′′is x∗s ,   i = 1,2,...,m and i ≠ r
   xr = 0
   xj = 0 ,   j = m+1, m+2,...,n and j ≠ s                                 (5.89)

   f = f′′o + c′′s x∗s ≤ f′′o                                               (5.90)

which can readily be seen to be a feasible solution different from the previous one. Since a′′rs > 0 in equation (5.88), a single pivot operation on the element a′′rs in the system of equations (5.82) leads to a new canonical form from which the basic feasible solution of equations (5.89) can easily be deduced. Also, equation (5.90) shows that this basic feasible solution corresponds to a lower objective function value than that of equations (5.83). This basic feasible solution can again be tested for optimality by checking whether all c′′j ≥ 0 in the new canonical form. If the solution is not optimal, the whole procedure of moving to another basic feasible solution from the present one has to be repeated. In the simplex algorithm this procedure is repeated iteratively until the algorithm finds either (i) a class of feasible solutions for which f → −∞, or (ii) an optimal basic feasible solution with all c′′j ≥ 0. Since there is only a finite number of ways to choose a set of m basic variables out of n variables, the iterative process of the simplex algorithm terminates in a finite number of cycles. The iterative process of the simplex algorithm is shown as a flowchart in Figure 5.21.

Figure 5.21. Searching the optimal solution by the simplex algorithm
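The procedure summarized in Figure 5.21 can be sketched compactly for a tableau already in the canonical form (5.82). The Python fragment below (NumPy assumed available) applies the entering-variable rule (5.85), the ratio test (5.88) and the pivot operation until all reduced costs are nonnegative; the small example tableau is a simplified, slack-only variant of problem (5.62), included purely for illustration.

# Sketch of the simplex algorithm on a canonical tableau:
# rows = constraints, last row = reduced costs c_j'' with the running objective value.
import numpy as np

def simplex_canonical(T, basis):
    """T: (m+1) x (n+1) tableau in canonical form; basis: indices of the m basic variables."""
    T = np.asarray(T, dtype=float)
    m = T.shape[0] - 1
    while True:
        costs = T[-1, :-1]
        s = int(np.argmin(costs))           # entering variable: most negative c_j'', rule (5.85)
        if costs[s] >= 0:
            return T, basis                 # optimal: all reduced costs nonnegative
        col = T[:m, s]
        if np.all(col <= 0):
            raise ValueError("unbounded problem (f -> -infinity)")
        ratios = np.where(col > 0, T[:m, -1] / np.where(col > 0, col, 1), np.inf)
        r = int(np.argmin(ratios))          # leaving row by the ratio test (5.88)
        T[r] /= T[r, s]                     # pivot on a_rs''
        for k in range(m + 1):
            if k != r:
                T[k] -= T[k, s] * T[r]
        basis[r] = s

# Hypothetical example: minimize -2*x1 - x2 subject to x1 + 2*x2 <= 8, x1 - x2 <= 1.5,
# with slack variables x3, x4 providing the starting basis.
T0 = [[1, 2, 1, 0, 8.0],
      [1, -1, 0, 1, 1.5],
      [-2, -1, 0, 0, 0.0]]
T, basis = simplex_canonical(T0, [2, 3])
print(basis, T[:2, -1], -T[-1, -1])         # basic columns, their values, and f_min = -9.5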

5.6.7 Phases of the Simplex Method

The problem is to find nonnegative values for the variables x1, x2,...,xn which satisfy the constraint equations

a11 x1 + a12 x2 + ... + a1n xn = b1
a21 x1 + a22 x2 + ... + a2n xn = b2
   ...
am1 x1 + am2 x2 + ... + amn xn = bm
                                                                           (5.91)


and minimize the objective function given by

f = c1x1 + c2x2 + . . . + cnxn (5.92)

The two–phase simplex method can be used to solve this problem. The difficulties encountered in solving this problem are:

• an initial feasible canonical form may not be readily available; this is the case when the linear programming problem does not have slack variables for some of the equations, or when the slack variables have negative coefficients;

• the problem may have redundancies and/or inconsistencies, and may not be solvable in nonnegative numbers.

The first phase of the simplex method uses the simplex algorithm itself to find whether the linear programming problem has a feasible solution. If a feasible solution exists, it provides a basic feasible solution in canonical form, ready to initiate the second phase of the method.

The second phase, in turn, uses the simplex algorithm to find whether the problem has a bounded optimum. If a bounded optimum exists, it finds the basic feasible solution which is optimal.

The simplex method is described in the following steps

1. Arrange the original system of equations (5.91) so that all constant terms bi are positive or zero by changing, where necessary, the signs on both sides of any of the equations.

2. Introduce into this system a basic set of artificial variables y1, y2,...,ym, where each yi ≥ 0, so that it becomes

   a11 x1 + a12 x2 + ... + a1n xn + y1 = b1
   a21 x1 + a22 x2 + ... + a2n xn + y2 = b2
      ...
   am1 x1 + am2 x2 + ... + amn xn + ym = bm                                (5.93)

with bi ≥ 0.

The objective function of equation (5.92) can be written as

c1x1 + c2x2 + . . . + cnxn + (−f) = 0 (5.94)

3. First phase of the method. Define a quantity w as the sum of the artificial variables

   w = y1 + y2 + ... + ym                                                  (5.95)

   and use the simplex algorithm to find xi ≥ 0 (i = 1,2,...,n) and yi ≥ 0 (i = 1,2,...,m) which minimize w and satisfy equations (5.93) and (5.94).


Consequently, consider the array

a11 x1 + a12 x2 + ... + a1n xn + y1                 = b1
a21 x1 + a22 x2 + ... + a2n xn      + y2            = b2
   ...
am1 x1 + am2 x2 + ... + amn xn           + ym       = bm
c1 x1  + c2 x2  + ... + cn xn                 + (−f) = 0
                       y1 + y2 + ... + ym     + (−w) = 0
                                                                           (5.96)

This array is not in canonical form; however, it can be rewritten as a canonical system with basic variables y1, y2,...,ym, −f and −w by subtracting the sum of the first m equations from the last one, to obtain the new system

a11 x1 + a12 x2 + ... + a1n xn + y1                 = b1
a21 x1 + a22 x2 + ... + a2n xn      + y2            = b2
   ...
am1 x1 + am2 x2 + ... + amn xn           + ym       = bm
c1 x1  + c2 x2  + ... + cn xn                 + (−f) = 0
d1 x1  + d2 x2  + ... + dn xn                 + (−w) = −wo
                                                                           (5.97)

where

di = −(a1i + a2i + . . . + ami) for i = 1,2, . . . ,n (5.98)

and

−wo = −(b1 + b2 + . . . + bm) (5.99)

Equations (5.97) provide the initial basic feasible solution that is necessary for starting the first phase.

4. The quantity w is called the infeasibility form and has the property that if, as a result of the first phase, the minimum of w is greater than zero, then no feasible solution exists for the original linear programming problem stated in equations (5.93) and (5.94), and the procedure is terminated. On the other hand, if the minimum of w is zero, the resulting array will be in canonical form; the second phase is then initiated by eliminating the w equation from the array, as well as the columns corresponding to each of the artificial variables y1, y2,...,ym.

5. Second phase of the method. Apply the simplex algorithm to the adjusted canonical system at the end of the first phase to obtain a solution, if a finite one exists, which optimizes the value of f.

The flowchart for the two-phase simplex method is given in Figures 5.22 and 5.23, which are to be read as a single sequence.


Figure 5.22. Flowchart for the two phase simplex method (A)


Figure 5.23. Flowchart for the two phase simplex method (B)
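The construction of the Phase I array (5.97) can also be sketched directly from equations (5.93)-(5.99). The function below (NumPy assumed available; the small data are hypothetical) appends the artificial variables, the cost row and the infeasibility row w in the canonical arrangement used as the starting point of the first phase.

# Sketch: building the Phase I array (5.97) for the two-phase simplex method.
import numpy as np

def phase_one_tableau(A, b, c):
    A = np.array(A, float); b = np.array(b, float); c = np.array(c, float)
    m, n = A.shape
    flip = b < 0                          # step 1: make every right-hand side nonnegative
    A[flip] *= -1.0; b[flip] *= -1.0
    top = np.hstack([A, np.eye(m), b.reshape(-1, 1)])      # constraints + artificials y
    cost = np.concatenate([c, np.zeros(m), [0.0]])         # c^T x + (-f) = 0, eq. (5.94)
    d = -A.sum(axis=0)                                     # d_i as in (5.98)
    w_row = np.concatenate([d, np.zeros(m), [-b.sum()]])   # (5.98) and -w_o of (5.99)
    return np.vstack([top, cost, w_row])

T = phase_one_tableau([[1, 2], [2, 1]], [8, 10], [-2, -1])
print(T)     # last row is the w equation; Phase I minimizes w = y1 + y2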


5.7 NLP: One–Dimensional Minimization Methods

Section 5.4 has shown that if the expressions for the objective function and the constraints are fairly simple in terms of the design variables, the classical methods of optimization can be used to solve the problem. On the other hand, if the optimization problem involves objective functions and/or constraints which are nonlinear, which are not stated as explicit functions of the design variables, or which are too complicated to manipulate, it cannot be solved by the classical analytical methods. Many engineering design problems possess this characteristic, and it then becomes necessary to resort to numerical, nonlinear optimization methods.

The basic philosophy of most of the numerical methods of optimization is to produce a sequence of improved approximations to the optimum according to the following scheme:

1. start with an initial trial point x1;

2. find a suitable direction Si (i = 1 to start with) which points in the direction of the minimum;

3. find an appropriate step length λ∗i for movement along the feasible direction Si;

4. obtain the new approximation xi+1 as

   xi+1 = xi + λ∗i Si                                                      (5.100)

5. test whether xi+1 is optimum; if xi+1 is optimum, stop the procedure; otherwise, set i = i + 1 and repeat from step 2.

The iterative procedure indicated by equation (5.100) is valid for unconstrained as well as constrained optimization problems. The procedure is graphically represented for a hypothetical two–variable problem in Figure 5.24.

Figure 5.24. The iterative process of optimization

From equation (5.100) it can be seen that the efficiency of an optimization method depends on the efficiency with which the quantities λ∗i and Si are determined. The methods of finding the step length λ∗i are considered in this section, whereas the methods of finding Si are considered in the next two sections.


If f(x) is the objective function to be minimized, the problem of finding λ∗i reduces to finding the value λi = λ∗i which minimizes f(xi+1) = f(xi + λi Si) = f(λi) for fixed values of xi and Si. Since f becomes a function of the single variable λi only, the methods of finding λ∗i in equation (5.100) are called one-dimensional minimization methods.

Section 5.4 demonstrated that the differential calculus method of optimization is an analytical approach applicable to continuous, twice-differentiable functions. In that method the calculation of the numerical value of the objective function is virtually the last step of the process: the optimal value of the objective function is calculated after determining the optimal values of the decision variables. In the numerical methods of optimization, on the contrary, an opposite procedure is followed, in that the values of the objective function are first found at various combinations of the decision variables and conclusions are then drawn regarding the optimal solution.

Several methods are available for solving the one–dimensional minimization problem. They can be classified as illustrated in Table 5.1.

Elimination Methods          Interpolation Methods
                             Requiring no derivatives    Requiring derivatives
Unrestricted search
Exhaustive search            - quadratic                 - cubic
Dichotomous search                                       - direct root
Fibonacci method
Golden section method

Table 5.1. One–dimensional numerical minimization methods

The elimination methods can be used for the minimization of even discontinuous functions. The quadratic and the cubic interpolation methods involve polynomial approximations to the given function, while the direct root method interpolates the derivatives of the function linearly.

5.7.1 Elimination Methods

Fibonacci Method

The Fibonacci method can be used to find the minimum of a function of one variable even if the function is not continuous. The method, like many other elimination methods, has the following limitations:

• the initial interval of uncertainty, in which the optimum lies, has to be known;

• the function being optimized has to be unimodal5 in the initial interval of uncertainty;

• the exact optimum cannot be located by this method; only an interval, known as the final interval of uncertainty, will be known, and the final interval can be made as small as desired by making more computations;

• the number of function evaluations to be used in the search, or the resolution required, has to be specified beforehand.


5 A unimodal function is one that has only one peak (maximum or minimum) in a given interval.

This method makes use of the sequence of Fibonacci numbers, {Fn}, for placing the experiments. These numbers are defined as

   F0 = F1 = 1
   Fn = Fn−1 + Fn−2 ,   n = 2,3,4,...                                      (5.101)

yielding the sequence 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

Procedure

Let L0 be the initial interval of uncertainty, defined by a ≤ x ≤ b, and n the total number of experiments to be considered. Define

   L∗2 = (Fn−2 / Fn) L0                                                    (5.102)

and place the first two experiments at the points x1 and x2, which are located at a distance of L∗2 from each end of L0 6. This gives 7

   x1 = a + L∗2
   x2 = b − L∗2 = a + (Fn−1 / Fn) L0                                       (5.103)

Discard some part of the interval by using the unimodality assumption. Then there remains a smaller interval of uncertainty L2 8, given by

   L2 = L0 − L∗2 = L0 ( 1 − Fn−2/Fn ) = (Fn−1 / Fn) L0                     (5.104)

and with one experiment left in it. This experiment will be at a distance of

   L∗2 = (Fn−2 / Fn) L0 = (Fn−2 / Fn−1) L2                                  (5.105)

from one end, and

   L2 − L∗2 = (Fn−3 / Fn) L0 = (Fn−3 / Fn−1) L2                             (5.106)

from the other end.

6 If one experiment is at a distance of (Fn−2/Fn) L0 from one end, it will be at a distance of (Fn−1/Fn) L0 from the other end. Thus L∗2 = (Fn−1/Fn) L0 will also yield the same result as L∗2 = (Fn−2/Fn) L0.

7 It can be seen that L∗2 = (Fn−2/Fn) L0 ≤ L0/2 for n ≥ 2.

8 The symbol Lj is used to denote the interval of uncertainty remaining after conducting j experiments, while the symbol L∗j is used to denote the position of the experiments.


Now place the third experiment in the interval L2 so that the current two experiments are located at a distance of

   L∗3 = (Fn−3 / Fn) L0 = (Fn−3 / Fn−1) L2                                  (5.107)

from each end of the interval L2.

Again the unimodality property allows one to reduce the interval of uncertainty to L3, given by

   L3 = L2 − L∗3 = L2 − (Fn−3 / Fn−1) L2 = (Fn−2 / Fn−1) L2 = (Fn−2 / Fn) L0      (5.108)

This process of discarding a certain interval and placing a new experiment in the remaining interval can be continued, so that the location of the jth experiment and the interval of uncertainty at the end of j experiments are given, respectively, by

   L∗j = ( Fn−j / Fn−(j−2) ) Lj−1                                           (5.109)

   Lj = ( Fn−(j−1) / Fn ) L0                                                (5.110)

The ratio of the interval of uncertainty remaining after conducting j of the n predetermined experiments to the initial interval of uncertainty becomes

   Lj / L0 = Fn−(j−1) / Fn                                                  (5.111)

which for j = n reads

   Ln / L0 = F1 / Fn = 1 / Fn                                               (5.112)

The ratio (Ln/L0) makes it possible to determine n, the required number of experiments, to achieve any desired accuracy in locating the optimum point. Table 5.2 gives the reduction ratio in the interval of uncertainty obtainable for different numbers of experiments.

Position of the Final Experiment

In the Fibonacci method the last experiment has to be placed with some care. From equation (5.109) the following holds:

   L∗n / Ln−1 = F0 / F2 = 1/2   for all n                                   (5.113)

Thus, after conducting (n− 1) experiments and discarding the appropriate interval in each step,the remaining interval will contain one experiment precisely at its centre. However, the finalexperiment, namely, the nth experiment, is also to be placed at the centre of the present intervalof uncertainty.


n     Fn      Ln/L0          n     Fn        Ln/L0
0     1       1.0            11    144       0.006944
1     1       1.0            12    233       0.004292
2     2       0.5            13    377       0.002653
3     3       0.3333         14    610       0.001639
4     5       0.2            15    987       0.001013
5     8       0.125          16    1597      0.0006262
6     13      0.07692        17    2584      0.0003870
7     21      0.04762        18    4181      0.0002392
8     34      0.02941        19    6765      0.0001479
9     55      0.01818        20    10946     0.0000914
10    89      0.01124        ...   ...       ...

Table 5.2. Fibonacci numbers and reduction ratios

That is, the position of the nth experiment is the same as that of the (n−1)th one, and this is true whatever value is chosen for n. Since no new information can be gained by placing the nth experiment exactly there, the nth experiment is placed very close to the remaining valid experiment, as in the dichotomous search method. This enables the final interval of uncertainty to be obtained to within 1/2 Ln−1. The flowchart for implementing the Fibonacci search method is given in Figure 5.25.


Figure 5.25. Implementation of the Fibonacci search method
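A compact implementation of the Fibonacci search described above is sketched below; the bracketing interval, the number of experiments n and the test function are arbitrary choices for illustration.

# Sketch: Fibonacci search for the minimum of a unimodal function on [a, b]
# using n function evaluations, following equations (5.101)-(5.103).
def fibonacci_search(f, a, b, n):
    F = [1, 1]
    while len(F) <= n:
        F.append(F[-1] + F[-2])               # F0 = F1 = 1, Fk = F(k-1) + F(k-2)
    k = n
    x1 = a + F[k - 2] / F[k] * (b - a)        # first two experiments, eq. (5.103)
    x2 = a + F[k - 1] / F[k] * (b - a)
    f1, f2 = f(x1), f(x2)
    while k > 2:                              # one new experiment per pass
        k -= 1
        if f1 > f2:                           # minimum cannot lie in [a, x1): discard it
            a, x1, f1 = x1, x2, f2
            x2 = a + F[k - 1] / F[k] * (b - a)
            f2 = f(x2)
        else:                                 # minimum cannot lie in (x2, b]: discard it
            b, x2, f2 = x2, x1, f1
            x1 = a + F[k - 2] / F[k] * (b - a)
            f1 = f(x1)
    return (a + b) / 2                        # midpoint of the last bracket

print(fibonacci_search(lambda x: (x - 2.0) ** 2, 0.0, 5.0, 15))   # close to 2.0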


5.8 NLP: Unconstrained Optimization Methods

This section deals with the various methods of solving an unconstrained minimization problem. An unconstrained minimization problem is one where the value of the design vector x = {x1,x2,...,xn} is sought that minimizes the objective function f(x). The unconstrained minimization problem can be considered a particular case of the general constrained nonlinear programming problem; its special characteristic is that the solution vector x need not satisfy any constraint. Although a practical design problem is rarely unconstrained, the study of this class of problems is important because:

• there are some design problems that can be treated as unconstrained except very close to the final minimum point;

• some of the most powerful and convenient methods of solving constrained minimization problems involve the transformation of the problem into one of unconstrained minimization;

• the study of the unconstrained minimization techniques provides the basic understanding necessary for the study of the constrained optimization techniques;

• these methods have emerged as powerful solution techniques for certain engineering analysis problems.

For example, the displacement response (linear or nonlinear) of any structure under any specified load condition can be obtained by minimizing its potential energy. Similarly, the eigenvalues and eigenvectors of any discrete system can be found by minimizing the Rayleigh quotient.

As already demonstrated when discussing the classical optimization techniques, a point x∗ will be a relative minimum of f(x) if the necessary conditions

   ∂f(x∗)/∂xi = 0 ,   i = 1,2,...,n                                         (5.114)

are satisfied.

The point x∗ is guaranteed to be a relative minimum if the Hessian matrix is positive definite, i.e.

   J|x∗ = [ ∂²f(x∗) / ∂xi ∂xj ]   is positive definite                      (5.115)

Equations (5.114) and (5.115) can be used to identify the optimum point during numerical computations. While these properties of a minimum are useful in many problems, there are several functions for which equations (5.114) and (5.115) cannot be applied to identify the optimum point. In all such cases only the commonly understood notion of a minimum, namely f(x∗) ≤ f(x) for all x, can be used to identify a minimum point.
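A numerical check of conditions (5.114) and (5.115) can be sketched with finite differences (NumPy assumed available); the test function and the step sizes below are hypothetical choices.

# Sketch: checking the gradient and Hessian conditions at a candidate point x*.
import numpy as np

def gradient(f, x, h=1e-6):
    x = np.asarray(x, float)
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(len(x))])

def hessian(f, x, h=1e-4):
    x = np.asarray(x, float)
    n = len(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 0.5) ** 2      # hypothetical test function
x_star = np.array([1.0, -0.5])
print(np.allclose(gradient(f, x_star), 0.0, atol=1e-5))    # necessary condition (5.114)
print(np.all(np.linalg.eigvalsh(hessian(f, x_star)) > 0))  # sufficient condition (5.115)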

Several methods are available for solving an unconstrained minimization problem. These methods can be classified into two broad categories, direct search methods and descent methods, as shown in Table 5.3.


Direct Search Methods                        Descent Methods
(do not require the derivatives              (require the derivatives
of the function)                             of the function)

Random search method                         Steepest descent method
Univariate method                            Conjugate gradient method (Fletcher–Reeves)
Pattern search method                        Newton method
  - Powell method
  - Hooke and Jeeves method
Rosenbrock method of rotating coordinates    Variable metric method (Davidon–Fletcher–Powell)

Table 5.3. Unconstrained minimization methods

All the unconstrained minimization methods are iterative in nature, and hence they start from an initial trial solution and proceed towards the minimum point in a sequential manner. The general iterative scheme is shown as a flowchart in Figure 5.26.

Figure 5.26. General iterative scheme for optimization

It is important to note that all the unconstrained minimization methods require an initial point x1 to start the iterative procedure, and differ from one another only in the method of generating the new point xi+1 (from xi) and in testing the point xi+1 for optimality.


5.8.1 Direct Search Methods

The direct search methods require only objective function evaluations and do not use the partial derivatives of the function in finding the minimum. Hence they are often called non-gradient methods; they are most suitable for simple problems involving a relatively small number of variables. These methods are, in general, less efficient than the descent methods.

Random Search Methods

The random search methods are based on the use of random numbers in finding the minimum point. Since most computer libraries have random number generators, these methods can be used quite conveniently. They have the following advantages:

• they work even if the objective function is discontinuous and non–differentiable at some of the points;

• they can be used to find the global minimum when the objective function possesses several relative minimum points;

• they are applicable when other methods fail due to local difficulties such as sharply varying functions and shallow regions;

• although they are not very efficient by themselves, they can be used in the early stages of optimization to detect the region where the global minimum is likely to be found; once this region is found, some of the more efficient techniques can be used to find the precise location of the global minimum point.

Random Jumping Method

Let the problem be to find the minimum of f(x) in the n–dimensional hypercube defined by

   li ≤ xi ≤ ui ,   i = 1,2,...,n                                           (5.116)

where li and ui are the lower and upper bounds on the variable xi. In the random jumping method one generates sets of n random numbers (r1, r2,...,rn) that are uniformly distributed between 0 and 1. Each set of these numbers is used to find a point x inside the hypercube defined by inequalities (5.116) as

x = {x1, x2, ..., xn} = {l1 + r1(u1 − l1), l2 + r2(u2 − l2), ..., ln + rn(un − ln)}      (5.117)

and the value of the function is evaluated at this point x. By generating a number of points and evaluating the objective function at each of them, one takes the least value of f(x) found as the desired minimum point.
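A direct transcription of the random jumping method, using equation (5.117), is sketched below (NumPy assumed available; the bounds and the objective function are hypothetical).

# Sketch of the random jumping method: uniform points in the hypercube, keep the best.
import numpy as np

def random_jumping(f, lower, upper, n_points=5000, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    best_x, best_f = None, np.inf
    for _ in range(n_points):
        r = rng.random(lower.size)            # r_i uniform in [0, 1)
        x = lower + r * (upper - lower)       # eq. (5.117)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx            # keep the least value found so far
    return best_x, best_f

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
print(random_jumping(f, [-5, -5], [5, 5]))    # close to (1, -2)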


Although the random jumping method is very simple, it is not practical for problems with many variables and is used only when efficiency is not a consideration.

Random Walk Method

The random walk method is more efficient than the random jumping method. It is based on generating a sequence of improved approximations to the minimum, each derived from the preceding approximation. Thus, if xi is the approximation to the minimum obtained in the (i−1)th step, the new or improved approximation in the ith stage is found from the relation

   xi+1 = xi + λ ui                                                          (5.118)

where λ is a prescribed scalar step length and ui is a unit random vector generated in the ith stage.

Figure 5.27. Flowchart for the random walk method


The detailed procedure of this method is given by the following steps (see the flowchart in Figure5.27):

1. Start with an initial point x1 and a scalar step length λ that is sufficiently large in relation to the final accuracy desired; find the function value f1 = f(x1).

2. Set the iteration number as i = 1.

3. Generate a set of n random numbers and formulate the unit random vector u.

4. Find the new value of the objective function as f = f(x1 + λu).

5. Compare the values of f and f1. If f < f1, set x1 = x1 + λu and f1 = f, and repeat steps 3 through 5. If f ≥ f1, just repeat steps 3 through 5.

6. If a sufficiently large number of iterations, N, fails to produce a better point, reduce the scalar step length λ and go to step 3.

7. If an improved point cannot be generated even after reducing the value of λ below a small number ε, take the current point x1 as the desired optimum and stop the procedure.
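The procedure above translates almost directly into code. The following Python sketch is a minimal version under the same notation (a step length λ, a tolerance ε and a trial budget); the Rosenbrock test function and all numerical settings are illustrative assumptions, not taken from the text.

```python
import numpy as np

def random_walk(f, x1, lam=1.0, eps=1e-4, n_trials=100, seed=0):
    """Random walk method: accept random unit steps of length lam that
    improve f; halve lam when a batch of trials yields no improvement."""
    rng = np.random.default_rng(seed)
    x, fx = np.asarray(x1, float), f(np.asarray(x1, float))
    while lam > eps:
        improved = False
        for _ in range(n_trials):
            u = rng.standard_normal(x.size)
            u /= np.linalg.norm(u)              # unit random vector u
            x_new = x + lam * u                 # equation (5.118)
            f_new = f(x_new)
            if f_new < fx:                      # step 5: keep the better point
                x, fx, improved = x_new, f_new, True
        if not improved:
            lam *= 0.5                          # step 6: reduce the step length
    return x, fx

rosen = lambda x: float((1 - x[0])**2 + 100*(x[1] - x[0]**2)**2)
print(random_walk(rosen, [-1.2, 1.0]))
```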

Univariate Method

In this method, only one variable at a time is changed in an attempt to produce a sequence of improved approximations to the minimum point. Starting at a base point xi in the ith iteration, one fixes the values of (n − 1) variables and varies the remaining variable. Since only one variable is changed, the problem becomes a one-dimensional minimization problem and any of the methods previously discussed can be used to produce a new base point xi+1. The search is now continued in a new direction. This new direction is obtained by changing any one of the (n − 1) variables that were fixed in the previous iteration. In fact, the search procedure is continued by taking each coordinate direction in turn. After all the n directions have been searched sequentially, the first cycle is complete and the entire process of sequential minimization is repeated. This procedure is continued until no further improvement is possible in the objective function in any of the n directions of a cycle. The choice of the direction and the step length in the univariate method for an n-dimensional problem can be summarized in the following procedure (see the flowchart in Figure 5.28):

1. Choose a starting point x1 and set i = 1.

2. Find the search direction Si as

S_i^T = \begin{cases} (1, 0, 0, \ldots, 0) & \text{for } i = 1,\ n+1,\ 2n+1, \ldots \\ (0, 1, 0, \ldots, 0) & \text{for } i = 2,\ n+2,\ 2n+2, \ldots \\ (0, 0, 1, \ldots, 0) & \text{for } i = 3,\ n+3,\ 2n+3, \ldots \\ \qquad \vdots \\ (0, 0, 0, \ldots, 1) & \text{for } i = n,\ 2n,\ 3n, \ldots \end{cases}      (5.119)


Figure 5.28. Flowchart for the univariate method

3. Determine whether λi should be positive or negative. This means, for the current direction Si, find whether the function value decreases in the positive or in the negative direction. For this, one takes a small probe length ε and evaluates fi = f(xi), f+ = f(xi + εSi) and f− = f(xi − εSi). If f+ < fi, Si will be the correct direction for decreasing the value of f, and if f− < fi, −Si will be the correct one. If both f+ and f− are greater than fi, the point xi is taken as the minimum along the direction Si.

4. Find the optimal step length λ∗i such that

f(x_i \pm \lambda_i^* S_i) = \min_{\lambda_i} f(x_i \pm \lambda_i S_i)

where the + or − sign is used depending upon whether Si or −Si is the direction for decreasing the function value.


5. Set xi+1 = xi ± λ∗i Si depending on the direction for decreasing the function value, and fi+1 = f(xi+1).

6. Set the new value of i = i + 1 and go to step 2; continue this procedure until no significant change is achieved in the value of the objective function.

The univariate method is very simple and can be implemented very easily. However, it will not converge rapidly to the optimum solution, as it has a tendency to oscillate with steadily decreasing progress towards the optimum. Hence it is better to stop the computations at some point near the optimum rather than trying to find the precise optimum point.

Theoretically this method can be applied to find the minimum of any function that possesses continuous derivatives. However, if the function has a steep valley, the method may not even converge. For example, consider the contours of a function of two variables with a valley as shown in Figure 5.29. If the univariate search starts at point P, the function value cannot be decreased either in the direction ±S1 or in the direction ±S2. Thus the search comes to a halt and one may be misled into taking the point P, which is certainly not the optimum point, as the optimum point. This situation arises whenever the value of the probe length ε needed for detecting the proper direction (±S1 or ±S2) happens to be smaller than the precision (number of significant figures) with which the computer works.

Figure 5.29. Failure of the univariate method in a steep valley
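A minimal Python sketch of the univariate (cyclic coordinate) search described above is given below; the one-dimensional minimization is delegated to SciPy's minimize_scalar purely for brevity, and the quadratic test function is an arbitrary example, not taken from the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def univariate_search(f, x0, n_cycles=50, tol=1e-8):
    """Cyclic coordinate (univariate) search: minimize along one coordinate
    direction at a time; any 1-D minimizer can play the role of the line
    search (here SciPy's minimize_scalar is used)."""
    x = np.asarray(x0, float)
    n = x.size
    for _ in range(n_cycles):
        f_before = f(x)
        for i in range(n):                      # one cycle over the n directions
            e_i = np.zeros(n); e_i[i] = 1.0     # coordinate direction S_i
            phi = lambda lam: f(x + lam * e_i)  # f restricted to the line
            lam_star = minimize_scalar(phi).x   # optimal step length
            x = x + lam_star * e_i
        if abs(f_before - f(x)) < tol:          # no significant change in a cycle
            break
    return x, f(x)

quad = lambda x: float(x[0]**2 + 4*x[1]**2 + x[0]*x[1])
print(univariate_search(quad, [3.0, -2.0]))
```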

Pattern Search Methods

In the univariate method, one searches for the minimum along directions parallel to the coordinate axes. It is worth noticing that this method may not converge in some cases and, even if it converges, its convergence will be very slow while approaching the optimum point. These problems can be avoided by changing the directions of search in some favorable manner instead of keeping them always parallel to the coordinate axes. To understand this idea, consider the contours of a function shown in Figure 5.30.


Figure 5.30. Lines defined by the alternate points lie in the general direction of the minimum

The points 1, 2, 3, ... indicate the successive points found by the univariate method. It can be noticed that the lines joining the alternate points of the search (like 13, 24, 35, 46, ...) lie in the general direction of the minimum. It can be proved that if the function being minimized is quadratic in two variables, all such lines pass through the minimum. In other words, all lines like 13, 24, 35, 46 move toward the common center of the family of ellipses which are the contours of the quadratic objective function. Unfortunately, this characteristic does not carry through directly to higher dimensions even for quadratic objective functions. However, this idea can still be used to achieve rapid convergence while finding the minimum of an n-variable function.

This is the basic idea used in several direct search methods, which are known collectively as the pattern search methods. Two of the well-known pattern search methods, namely the Hooke and Jeeves method and the Powell method, will be discussed below. In general, a pattern search method takes m univariate steps (m = n if there are n variables in the problem) and then searches for the minimum along the direction Si defined by

Si = xi − xi−m (5.120)

where xi is the point obtained at the end of the m univariate steps and xi−m is the starting point before taking the m univariate steps. The direction defined by equation (5.120) is called a pattern direction and hence the methods that make use of the pattern direction are called pattern search methods. Actually, the directions used prior to taking a move along a pattern direction need not be univariate directions. The general pattern search method is shown in Figure 5.31.

One important point is to be noted while using equation (5.120). If the point xi is already a minimum point on the line Si, no improvement can be achieved even by searching along the pattern direction Si. Hence, whenever the optimal step length λ∗i along the pattern direction Si is found to be zero, the corresponding starting point xi can be taken as the optimum point.

Of course, the other convergence requirements are also to be verified before actually terminating the procedure. In some cases, the direction Si may turn out to be a direction of ascent, in which case the optimum step length will be negative. This situation can be handled by providing an appropriate logic for determining the proper direction, Si or −Si, before proceeding to solve the one-dimensional minimization problem.

Figure 5.31. Flowchart for pattern search method

Hooke and Jeeves Method

The simple and very effective technique called the Hooke and Jeeves direct search method is a sequential technique, each step of which consists of two kinds of moves, one called the exploratory move and the other called the pattern move. The first kind of move explores the local behavior of the objective function, while the second kind of move takes advantage of the pattern direction. The general procedure can be described by the following steps:

1. Select an arbitrary starting point x1 = {x1, x2, ..., xn}, called the initial base point, and small predetermined step lengths ∆xi in each of the coordinate directions ui, i = 1, 2, ..., n. Set k = 1.

2. Compute fk = f(xk). Set i = 1 and yk,0 = xk (the point yk,j indicates the temporary base point obtained from the base point xk by perturbing the jth component of xk), and start the exploratory search as stated in step 3.


3. The variable xi is perturbed about the current temporary base point yk,i−1 to obtain the new temporary base point as

y_{k,i} = \begin{cases} y_{k,i-1} + \Delta x_i\, u_i & \text{if } f^+ = f(y_{k,i-1} + \Delta x_i u_i) < f(y_{k,i-1}) \\ y_{k,i-1} - \Delta x_i\, u_i & \text{if } f^- = f(y_{k,i-1} - \Delta x_i u_i) < f(y_{k,i-1}) < f^+ \\ y_{k,i-1} & \text{if } f(y_{k,i-1}) < \min\,(f^+,\, f^-) \end{cases}

This process of finding the new temporary base point is continued for i = 1, 2, ... until xn is perturbed to find yk,n.

4. If the point yk,n remains the same as xk, reduce the step lengths ∆xi (say, by a factor of two), set i = 1 and go to step 3. If yk,n is different from xk, obtain the new base point as

xk+1 = yk,n

and go to step 5.

5. With the help of the base points xk and xk+1, establish a pattern direction S as

S = xk+1 − xk

and find a point yk+1,0 as

yk+1,0 = xk+1 + λS      (5.121)

where λ is the step length, which can be taken as 1 for simplicity. Alternatively, one can solve a one-dimensional minimization problem in the direction S and use the optimum step length λ∗ in place of λ in equation (5.121).

6. Set k = k + 1, fk = f(yk,0), i = 1, and repeat step 3. If at the end of step 3 f(yk,n) < f(xk), take the new base point as xk+1 = yk,n and go to step 5. On the other hand, if f(yk,n) ≥ f(xk), set xk+1 = xk, reduce the step lengths ∆xi, and go to step 2.

7. The process is assumed to have converged whenever the step lengths fall below a predetermined small quantity ε. Thus the process is terminated if

\max_i\, \Delta x_i < \varepsilon
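A compact Python sketch of the exploratory/pattern logic described in the steps above is shown below; it fixes the pattern step length at λ = 1 and uses a halving factor for the step reduction, both of which are simplifying assumptions made for this example, as is the Rosenbrock test function.

```python
import numpy as np

def hooke_jeeves(f, x0, step=0.5, eps=1e-6, shrink=0.5):
    """Hooke and Jeeves direct search: exploratory moves about a base point,
    followed by a pattern move along x_{k+1} - x_k (step length lambda = 1)."""
    def explore(y, dx):
        y = y.copy()
        for i in range(y.size):                 # perturb one variable at a time
            for trial in (y[i] + dx, y[i] - dx):
                y_try = y.copy(); y_try[i] = trial
                if f(y_try) < f(y):
                    y = y_try
                    break
        return y

    x = np.asarray(x0, float)
    while step > eps:
        y = explore(x, step)                    # exploratory move about x_k
        if f(y) >= f(x):
            step *= shrink                      # no improvement: reduce step lengths
            continue
        while True:
            x_new = y
            y = explore(x_new + (x_new - x), step)   # pattern move + exploration
            if f(y) >= f(x_new):
                x = x_new
                break
            x = x_new
    return x, f(x)

rosen = lambda x: float((1 - x[0])**2 + 100*(x[1] - x[0]**2)**2)
print(hooke_jeeves(rosen, [-1.2, 1.0]))
```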

Powell Method

The Powell method is an extension of the basic pattern search method. It is the most widely accepted direct search method and it can be proved to be a method of conjugate directions. As will be shown later, a conjugate directions method will minimize a quadratic function in a finite number of steps. Powell has suggested some modifications to facilitate convergence when the method is applied to non-quadratic objective functions. The Powell method is a very powerful method and has been proved to be superior to some of the descent methods.

The basic idea of the Powell method can be understood with the help of Figure 5.32.


Figure 5.32. Progress of Powell method

Let the given two-variable function be minimized once along each of the coordinate directions, and then in the corresponding pattern direction. This gives point 4. For the next cycle of minimization, one of the coordinate directions (the x1-direction in the present case) is discarded in favor of the pattern direction. Thus minimization is performed along u2 and S1, and point 6 is obtained. Then a new pattern direction S2 is generated. For the next cycle of minimization, one of the previously used coordinate directions is discarded (the u2-direction in this case) in favor of the newly generated pattern direction. Then, by starting from point 7, minimization is performed along the directions S1 and S2, thereby obtaining points 8 and 9, respectively. For the next cycle of minimization, since there is no coordinate direction left to discard, the whole procedure is restarted by minimizing along directions parallel to the coordinate axes. This procedure is continued until the desired minimum point is found.

The flowchart for the simplest version of the Powell method is given in Figure 5.33. Note that the search will be made sequentially in the directions S_n; S_1, S_2, ..., S_{n-1}, S_n, S_p^{(1)}; S_2, S_3, ..., S_{n-1}, S_n, S_p^{(1)}, S_p^{(2)}; S_3, S_4, ..., S_{n-1}, S_n, S_p^{(1)}, S_p^{(2)}, S_p^{(3)}; ... until the minimum point is found. Here S_i indicates the coordinate direction u_i, while S_p^{(j)} denotes the jth pattern direction. In Figure 5.33 the previous base point is stored as the vector z in block A, and the pattern direction is constructed by subtracting the previous base point from the current one in block B. The pattern direction is then used as a minimization direction in blocks C and D. For the next cycle, the first direction used in the previous cycle is discarded in favor of the newly generated pattern direction: this is achieved by updating the numbers of the search directions as shown in block E. Thus both the points z and x used in block B for the construction of the pattern direction are points that are minima along S_n in the first cycle, the first pattern direction S_p^{(1)} in the second cycle, the second pattern direction S_p^{(2)} in the third cycle, and so forth.

Figure 5.33. Flowchart for the Powell method

Definitions

Conjugate directions: Let A be an n × n symmetric matrix. A set of n vectors (or directions) Si are said to be conjugate (more accurately A-conjugate) if

S_i^T A\, S_j = 0 \quad \text{for all } i \neq j,\ \ i = 1,2,\ldots,n \ \text{ and } \ j = 1,2,\ldots,n      (5.122)

Quadratically convergent method: If a minimization method always locates the minimum of a general quadratic function in no more than a predetermined number of operations, and if the limiting number of operations is directly related to the number of variables n, then the method is said to be quadratically convergent.

Theorem 17. If a quadratic function

Q(x) = \frac{1}{2}\, x^T A\, x + B^T x + C      (5.123)

is minimized sequentially, once along each direction of a set of n linearly independent, A-conjugate directions, the global minimum of Q will be located at or before the nth step regardless of the starting point.

Such a method is known as a quadratically convergent method. The order in which the directions are used is immaterial to this property.
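The quadratic convergence property stated in Theorem 17 is easy to verify numerically. The short Python sketch below minimizes an arbitrary two-variable quadratic exactly along two A-conjugate directions built by an A-orthogonalization step; the matrix A, the vector B and the starting point are illustrative choices, not data from the text.

```python
import numpy as np

# Q(x) = 1/2 x^T A x + B^T x + C  (equation (5.123)); A, B, C are example data.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
B = np.array([-1.0, -2.0])
C = 0.0
Q = lambda x: 0.5 * x @ A @ x + B @ x + C
grad = lambda x: A @ x + B

# Build two A-conjugate directions: S1 arbitrary, S2 obtained by removing
# from a second vector its A-component along S1 (Gram-Schmidt-like step).
S1 = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
S2 = v - (S1 @ A @ v) / (S1 @ A @ S1) * S1
assert abs(S1 @ A @ S2) < 1e-12                 # conjugacy check, eq. (5.122)

x = np.array([5.0, 5.0])                        # arbitrary starting point
for S in (S1, S2):
    lam = -(S @ grad(x)) / (S @ A @ S)          # exact minimizing step length
    x = x + lam * S

print(x, np.linalg.solve(A, -B))                # x equals the true minimizer of Q
```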

5.8.2 Descent Methods

The descent techniques require, in addition to objective function evaluations, the evaluation of first and possibly higher order derivatives of the objective function. Since more information about the function being minimized is used through the use of derivatives, the descent methods are generally more efficient than the direct search techniques. The descent techniques are also known as gradient methods.

Gradient of a Function

The partial derivatives of a function f with respect to each of the n variables are collectively called the gradient of the function, which is denoted by

\nabla f = \begin{Bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \\ \vdots \\ \partial f/\partial x_n \end{Bmatrix}      (5.124)

The gradient is an n-component vector and has a very important property. If one moves along the gradient direction from any point in n-dimensional space, the function value increases at the fastest rate. Hence the gradient direction is called the direction of steepest ascent. Unfortunately, the direction of steepest ascent is a local property and not a global one. This is illustrated in Figure 5.34, where the gradient vectors ∇f evaluated at the points 1, 2, 3 and 4 lie along the directions 11', 22', 33' and 44', respectively. Thus the function value increases at the fastest rate in the direction 11' at point 1, but not at point 2. Similarly, the function value increases at the fastest rate in the direction 22' (33') at point 2 (3), but not at point 3 (4). In other words, the direction of steepest ascent generally varies from point to point, and if one makes infinitely small moves along the direction of steepest ascent, the path will be a curved line like the curve 1-2-3-4 in Figure 5.34.

Since the gradient vector represents the direction of steepest ascent, the negative of the gradient vector denotes the direction of steepest descent. Thus, any method which makes use of the gradient vector can be expected to give the minimum point faster than one which does not make use of the gradient vector. All the descent methods make use of the gradient vector, either directly or indirectly, in finding the search directions. Before considering the descent methods of minimization, it is necessary to state that the gradient vector represents the direction of steepest ascent (Theorem 18).

Figure 5.34. Steepest ascent directions

Evaluation of the gradient

As stated earlier, all the descent methods are based on the use of the gradient in one form or another. Assuming that the function is differentiable, the gradient at any point xm can be evaluated as

\nabla f\,\big|_{x_m} = \nabla f_m = \begin{Bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \\ \vdots \\ \partial f/\partial x_n \end{Bmatrix}_{x_m}

However there are three situations where the evaluation of the gradient poses certain problems:

• the function is differentiable at all the points, but the calculation of the components of the gradient, ∂f/∂xi, is either impractical or impossible;

• the expressions for the partial derivatives ∂f/∂xi can be derived, but they require a large computational time for evaluation;

• the gradient ∇f is not defined at all the points.

In the first case, the forward finite difference formula can be used

\frac{\partial f}{\partial x_i}\bigg|_{x_m} \simeq \frac{f(x_m + \Delta x_i\, u_i) - f(x_m)}{\Delta x_i}\,, \quad i = 1,2,\ldots,n      (5.125)


to approximate the partial derivative ∂f/∂xi at xm. If the function value at the base point xm is known, this formula requires one additional function evaluation to find (∂f/∂xi)|xm. Thus it requires n additional function evaluations to evaluate the approximate gradient ∇f|xm. For better results, one can use the central finite difference formula to find the approximate partial derivative (∂f/∂xi)|xm

\left(\frac{\partial f}{\partial x_i}\right)_{x_m} \simeq \frac{f(x_m + \Delta x_i\, u_i) - f(x_m - \Delta x_i\, u_i)}{2\,\Delta x_i}\,, \quad i = 1,2,\ldots,n      (5.126)

This formula requires two additional function evaluations for each of the partial derivatives. In equations (5.125) and (5.126), ∆xi is a small scalar quantity and ui is a vector of order n whose ith component has a value of one, and all other components have a value of zero. In practical computations, the value of ∆xi has to be chosen with some care. If ∆xi is too small, the difference between the values of the function evaluated at (xm + ∆xi ui) and (xm − ∆xi ui) may be very small and numerical round-off error may predominate. On the other hand, if ∆xi is too large, the truncation error may predominate in the calculation of the gradient.
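Equations (5.125) and (5.126) can be coded in a few lines. The Python sketch below implements both difference formulae; the test function and the step size ∆xi are arbitrary example values, not taken from the text.

```python
import numpy as np

def grad_forward(f, x, dx=1e-6):
    """Forward finite-difference gradient, equation (5.125): n extra evaluations."""
    x = np.asarray(x, float)
    f0 = f(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        x_step = x.copy(); x_step[i] += dx
        g[i] = (f(x_step) - f0) / dx
    return g

def grad_central(f, x, dx=1e-6):
    """Central finite-difference gradient, equation (5.126): 2n extra
    evaluations, but more accurate than the forward formula."""
    x = np.asarray(x, float)
    g = np.zeros_like(x)
    for i in range(x.size):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[i] += dx; x_minus[i] -= dx
        g[i] = (f(x_plus) - f(x_minus)) / (2.0 * dx)
    return g

f = lambda x: float(x[0]**2 + 3*x[0]*x[1] + 2*x[1]**2)
x0 = np.array([1.0, -1.0])
print(grad_forward(f, x0), grad_central(f, x0))   # analytic gradient is [-1, -1]
```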

In the second case also, the use of the finite difference formulae is preferred whenever the exact gradient evaluation requires more computational time than that involved in using equations (5.125) and (5.126).

In the third case, the finite difference formulae cannot be used since the gradient is not defined at all the points. For example, consider the function shown in Figure 5.35. If equation (5.126) is used to evaluate the derivative ∂f/∂x at xm, one obtains a value of α1 for a step size ∆x1 and a value of α2 for a step size ∆x2. Since, in reality, the derivative does not exist at the point xm, the use of finite difference formulae might lead to a complete breakdown of the minimization process. In such cases, the minimization can only be done by one of the direct search techniques discussed earlier.

Figure 5.35. Gradient not defined at xm


Rate of change of a function along a direction

In most of the optimization techniques, one will be interested in finding the rate of change of a function with respect to a parameter λ along some specified direction Si, away from a given point xi. Any point in the specified direction away from the point xi can be expressed as x = xi + λSi. The interest is to find the rate of change of the function along the direction Si (characterized by the parameter λ), that is

\frac{df}{d\lambda} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\,\frac{\partial x_j}{\partial \lambda}      (5.127)

where xj is the jth component of x. But

\frac{\partial x_j}{\partial \lambda} = \frac{\partial}{\partial \lambda}\,(x_{ij} + \lambda\, s_{ij}) = s_{ij}      (5.128)

where xij and sij are the jth components of xi and Si, respectively. Hence

\frac{df}{d\lambda} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\, s_{ij} = \nabla f^T S_i      (5.129)

If λ∗ minimizes f in the direction Si, one obtains

\frac{df}{d\lambda}\bigg|_{\lambda=\lambda^*} = \nabla f\,\big|_{\lambda^*}^{T} S_i = 0      (5.130)

at the point xi + λ∗ Si.

Steepest Descent Method

The use of the negative of the gradient vector as a direction for minimization was first made by Cauchy (1847). In this method, one starts from an initial trial point x1 and moves iteratively towards the optimum point according to the rule

x_{i+1} = x_i + \lambda_i^* S_i = x_i - \lambda_i^*\, \nabla f_i      (5.131)

where λ∗i is the optimal step length along the search direction Si = −∇fi. The flowchart for this method is given in Figure 5.36. The method of steepest descent may appear to be the best unconstrained minimization technique since each one-dimensional search starts in the 'best' direction. However, owing to the fact that the steepest descent direction is a local property, the method is not really effective in most problems.

In two-dimensional problems, the application of the steepest descent method leads to a path made up of parallel and perpendicular segments as shown in Figure 5.37. It can be seen that the path zig-zags in much the same way as the univariate method. In higher dimensions, the path may not be made up of parallel and perpendicular segments and hence the method may have different characteristics than the univariate method. For functions with significant eccentricity, the method settles into a steady n-dimensional zig-zag and the process becomes hopelessly slow.


Figure 5.36. Flowchart for the steepest descent method

On the other hand, if the contours of the objective function are not very much distorted, the method may converge faster, as shown in Figure 5.37.

Figure 5.37. Convergence of the steepest descent method

The following criteria can be used to terminate the iterative process

\left|\,\frac{f(x_{i+1}) - f(x_i)}{f(x_i)}\,\right| \le \varepsilon_1      (5.132)

\left|\,\frac{\partial f}{\partial x_i}\,\right| \le \varepsilon_2\,, \quad i = 1,2,\ldots,n      (5.133)

|\, x_{i+1} - x_i\, | \le \varepsilon_3      (5.134)
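A minimal Python sketch of the steepest descent iteration (5.131), combined with termination tests in the spirit of equations (5.132) and (5.133), is given below; the elliptic test function, the tolerances and the use of SciPy's one-dimensional minimizer for the step length are illustrative assumptions, not taken from the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad_f, x0, eps=1e-6, max_iter=500):
    """Steepest descent (Cauchy): move along S_i = -grad f_i with the optimal
    step length found by a 1-D line search, equation (5.131)."""
    x = np.asarray(x0, float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:            # gradient test, cf. eq. (5.133)
            break
        S = -g                                  # steepest descent direction
        lam = minimize_scalar(lambda t: f(x + t * S)).x
        x_new = x + lam * S
        if abs(f(x_new) - f(x)) <= eps * max(abs(f(x)), 1.0):   # cf. eq. (5.132)
            x = x_new
            break
        x = x_new
    return x, f(x)

f = lambda x: float(x[0]**2 + 10*x[1]**2)
grad_f = lambda x: np.array([2*x[0], 20*x[1]])
print(steepest_descent(f, grad_f, [5.0, 1.0]))
```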


Conjugate Gradient Method

The convergence characteristics of the steepest descent method can be greatly improved by modifying it into a conjugate gradient method known as the Fletcher–Reeves method. It has been shown that any minimization method that makes use of conjugate directions is quadratically convergent. This property of quadratic convergence is very useful because it ensures that the method will minimize a quadratic function in n steps or less. Since any general function can be approximated reasonably well by a quadratic near the optimum point, any quadratically convergent method is expected to find the optimum point in a finite number of iterations.

It has been shown that the Powell conjugate direction method requires n single-variable minimizations per iteration and sets up one new conjugate direction at the end of each iteration. Thus it requires, in general, n² single-variable minimizations to find the minimum of a quadratic function. On the other hand, if one can evaluate the gradients of the objective function, a new conjugate direction can be set up after every one-dimensional minimization, and hence faster convergence can be achieved. The construction of conjugate directions and the development of the conjugate gradient method are given below.

Development of the conjugate gradient method

The procedure used in the development of the conjugate gradient method is analogous to the Gram–Schmidt orthogonalization procedure. It sets up each new search direction as a linear combination of all the previous search directions and the newly determined gradient. The following theorem is important in developing the conjugate gradient method.

Theorem 19. Suppose that the point xi+1 is reached after i steps while minimizing a quadratic function f(x) = ½ xᵀAx + Bᵀx + C. If the search directions used in the minimization process, S1, S2, ..., Si, are mutually conjugate with respect to A, then

S_k^T\, \nabla f_{i+1} = 0 \quad \text{for } k = 1,2,\ldots,i      (5.135)

New algorithm

Consider the development of a new algorithm by modifying the steepest descent method applied to a quadratic function f(x) = ½ xᵀAx + Bᵀx + C, imposing the condition that the successive directions be mutually conjugate. Let x1 be the starting point for the minimization and let the first search direction be the steepest descent direction. Then

S_1 = -\nabla f_1 = -A\,x_1 - B      (5.136)

and

x_2 = x_1 + \lambda_1^*\, S_1      (5.137)

where λ∗1 is the minimizing step length in the direction S1, so that

S_1^T\, \nabla f\,\big|_{x_2} = 0      (5.138)


Equation (5.138) can be expanded as

S_1^T \left[ A\,(x_1 + \lambda_1^* S_1) + B \right] = 0

or

S_1^T A\, x_1 + \lambda_1^*\, S_1^T A\, S_1 + S_1^T B = 0

from which the value of λ∗1 can be obtained as

\lambda_1^* = \frac{-S_1^T (A\,x_1 + B)}{S_1^T A\, S_1} = -\frac{S_1^T \nabla f_1}{S_1^T A\, S_1}      (5.139)

Now express the second search direction as a linear combination of S1 and −∇f2 as

S_2 = -\nabla f_2 + \beta_2\, S_1      (5.140)

where β2 is to be chosen so as to make S1 and S2 conjugate. This requires

S_1^T A\, S_2 = 0      (5.141)

Substituting for S2 from equation (5.140), equation (5.141) becomes

S_1^T A\,(-\nabla f_2 + \beta_2\, S_1) = 0      (5.142)

Since equation (5.137) gives

S_1 = \frac{x_2 - x_1}{\lambda_1^*}      (5.143)

equation (5.142) can be written as

S_1^T A\, S_2 = \frac{(x_2 - x_1)^T}{\lambda_1^*}\, A\,(-\nabla f_2 + \beta_2\, S_1) = 0      (5.144)

The difference of the gradients (∇f2 − ∇f1) is given by

\nabla f_2 - \nabla f_1 = (A\,x_2 + B) - (A\,x_1 + B) = A\,(x_2 - x_1)      (5.145)

With the help of equation (5.145), equation (5.144) can be written as

(\nabla f_2 - \nabla f_1)^T (\nabla f_2 - \beta_2\, S_1) = 0      (5.146)

where the symmetry of the matrix A has been used. Equation (5.146) can be expanded to obtain

\nabla f_2^T \nabla f_2 - \nabla f_1^T \nabla f_2 - \beta_2\, \nabla f_2^T S_1 + \beta_2\, \nabla f_1^T S_1 = 0

Since ∇f1ᵀ∇f2 = −S1ᵀ∇f2 = 0 from equation (5.135), this equation gives the value of β2 as

\beta_2 = \frac{\nabla f_2^T\, \nabla f_2}{\nabla f_1^T\, \nabla f_1}      (5.147)


Next consider the third search direction as a linear combination of S1, S2 and −∇f3 as

S3 = −∇f3 + β3 S2 + δ3 S1 (5.148)

where the values of β3 and δ3 can be found by making S3 conjugate to S1 and S2.

First consider

S_1^T A\, S_3 = -S_1^T A\, \nabla f_3 + \beta_3\, S_1^T A\, S_2 + \delta_3\, S_1^T A\, S_1 = 0      (5.149)

If one assumes that S1 and S2 have already been made conjugate, S1ᵀAS2 = 0, and equation (5.149) gives

\delta_3 = \frac{S_1^T A\, \nabla f_3}{S_1^T A\, S_1}      (5.150)

From equation (5.143), δ3 can be expressed as

\delta_3 = \frac{(x_2 - x_1)^T}{\lambda_1^*} \cdot \frac{A\, \nabla f_3}{S_1^T A\, S_1}      (5.151)

By using equation (5.145), equation (5.151) can be rewritten as

\delta_3 = \frac{1}{\lambda_1^*} \cdot \frac{(\nabla f_2 - \nabla f_1)^T\, \nabla f_3}{S_1^T A\, S_1}      (5.152)

Since S1 = −∇f1 from equation (5.136) and S2 − β2S1 = −∇f2 from equation (5.140), one obtains

\nabla f_2 - \nabla f_1 = -S_2 + S_1\,(1 + \beta_2)      (5.153)

and equation (5.152) gives

\delta_3 = \frac{1}{\lambda_1^*} \cdot \frac{\left[-S_2 + (1 + \beta_2)\, S_1\right]^T \nabla f_3}{S_1^T A\, S_1}      (5.154)

which can be seen to be equal to zero in view of equation (5.135). Therefore equation (5.148) becomes

S_3 = -\nabla f_3 + \beta_3\, S_2      (5.155)

The value of β3 can be found by making S3 conjugate to S2. However, instead of finding the value of a particular β, one can derive a general formula for βi, i = 2, 3, ...

By generalizing equation (5.155), one can express the search direction in the ith step, Si, as a linear combination of −∇fi and Si−1, that is

S_i = -\nabla f_i + \beta_i\, S_{i-1}      (5.156)

where the value of βi can be found by making Si conjugate to Si−1 as

\beta_i = \frac{\nabla f_i^T\, \nabla f_i}{\nabla f_{i-1}^T\, \nabla f_{i-1}}      (5.157)


The search directions that have been considered so far, equation (5.156), are precisely the directions used in the Fletcher–Reeves method.

So far Si and Si−1 have been made conjugate. It will now be shown that Si, given by equation (5.156), will automatically be conjugate to all the previous search directions Sk, k = 1, 2, ..., i−2, provided that S1, S2, ..., Si−1 are conjugate. For this, consider

S_k^T A\, S_i = S_k^T A\,(-\nabla f_i + \beta_i\, S_{i-1}) = -S_k^T A\, \nabla f_i + \beta_i\, S_k^T A\, S_{i-1}\,, \quad k = 1,2,\ldots,i-2      (5.158)

Since SkᵀASi−1 = 0 for k = 1, 2, ..., i−2, one obtains

S_k^T A\, S_i = -S_k^T A\, \nabla f_i\,, \quad k = 1,2,\ldots,i-2      (5.159)

Equations similar to (5.143) and (5.145) can be obtained as

S_k = \frac{x_{k+1} - x_k}{\lambda_k^*}      (5.160)

and

\nabla f_{k+1} - \nabla f_k = A\,(x_{k+1} - x_k)      (5.161)

and equation (5.159) can be written as

S_k^T A\, S_i = -\frac{1}{\lambda_k^*}\,(\nabla f_{k+1} - \nabla f_k)^T\, \nabla f_i = 0\,, \quad k = 1,2,\ldots,i-2      (5.162)

in view of the relation

\nabla f_k^T\, \nabla f_i = 0\,, \quad k = 1,2,\ldots,i-1      (5.163)

The algorithm

The use of equations (5.156) and (5.157) for the minimization of general functions was first suggested by Fletcher and Reeves (1964). Their algorithm can be summarized as follows:

1. Start with an arbitrary initial point x1.

2. Set the first search direction S1 = −∇f(x1) = −∇f1.


Figure 5.38. Flowchart for the Fletcher–Reeves method

3. Find the point x2 according to the relation

x2 = x1 + λ∗1 S1

where λ∗1 is the optimal step length in the direction S1. Set i = 2 and go to the next step.

4. Find ∇fi = ∇f(xi), and set

S_i = -\nabla f_i + \frac{|\nabla f_i|^2}{|\nabla f_{i-1}|^2}\, S_{i-1}      (5.164)


5. Compute the optimum step length λ∗i in the direction Si, and find the new point

xi+1 = xi + λ∗i Si (5.165)

6. Test for the optimality of the point xi+1. If xi+1 is optimum, stop the process. Otherwise, set the value of i = i + 1 and repeat steps 4, 5 and 6 until convergence is achieved.

The flowchart for the Fletcher and Reeves method is shown in Fig. 5.38.

This method was originally proposed as a method for solving systems of linear equations derived from the stationary conditions of a quadratic. Since the directions Si used in this method are A-conjugate, the process should converge in n cycles or less for a quadratic function. However, for ill-conditioned quadratics (whose contours are highly eccentric and distorted), the method may require many more than n cycles for convergence. The reason for this has been found to be the cumulative effect of rounding errors. Since Si is given by equation (5.164), any error resulting from the inaccuracies involved in the determination of λ∗i, and from the round-off error involved in accumulating the successive |∇fi|² Si−1/|∇fi−1|² terms, is carried forward through the vector Si. Thus the search directions Si will be progressively contaminated by these errors. Hence it is necessary, in practice, to restart the method periodically after every, say, m steps by taking the new search direction as the steepest descent direction; that is, after every m steps, Sm+1 is set equal to −∇fm+1 instead of the usual form. Fletcher and Reeves have recommended a value of m = n + 1, where n is the number of design variables.

In spite of this, the Fletcher and Reeves algorithm is vastly superior to the steepest descent method and the pattern search methods, but it turns out to be rather less efficient than the quasi-Newton and the variable metric methods, which will be considered below. It should be borne in mind, however, that all these methods require the storage of an n × n matrix.
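A minimal Python sketch of the Fletcher–Reeves iteration, equations (5.156)-(5.157), with the periodic restart mentioned above, is given below; the Rosenbrock test function, the tolerances and the use of SciPy's scalar minimizer for the line search are illustrative assumptions, not part of the original description.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fletcher_reeves(f, grad_f, x0, eps=1e-8, restart=None, max_iter=200):
    """Fletcher-Reeves conjugate gradient: S_i = -grad f_i + beta_i S_{i-1}
    with beta_i = |grad f_i|^2 / |grad f_{i-1}|^2, restarted with the steepest
    descent direction every 'restart' iterations (n + 1 by default)."""
    x = np.asarray(x0, float)
    restart = restart if restart is not None else x.size + 1
    g = grad_f(x)
    S = -g                                          # first direction: steepest descent
    for i in range(1, max_iter + 1):
        if np.linalg.norm(g) <= eps:
            break
        lam = minimize_scalar(lambda t: f(x + t * S)).x   # optimal step length
        x = x + lam * S
        g_new = grad_f(x)
        if i % restart == 0:
            S = -g_new                              # periodic restart
        else:
            beta = (g_new @ g_new) / (g @ g)        # equation (5.157)
            S = -g_new + beta * S                   # equation (5.156)
        g = g_new
    return x, f(x)

rosen = lambda x: float((1 - x[0])**2 + 100*(x[1] - x[0]**2)**2)
rosen_grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                                 200*(x[1] - x[0]**2)])
print(fletcher_reeves(rosen, rosen_grad, [-1.2, 1.0]))
```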

Variable Metric Method

Significant developments have taken place in the area of the descent techniques with the introduction of the variable metric method by Davidon (1959). This method was extended by Fletcher and Powell (1963), becoming known as the Davidon-Fletcher-Powell method. This method is the best general-purpose unconstrained optimization technique making use of the derivatives. The iterative procedure of this method can be stated as follows:

1. Start with an initial point x1 and an n × n positive definite symmetric matrix H1. Usually H1 is taken as the identity matrix I. Set the iteration number as i = 1.

2. Compute the gradient of the function, ∇fi, at the point xi, and set

Si = −Hi∇fi (5.166)

taking into account that for the first iteration the search direction S1 will be the same as the steepest descent direction, i.e. S1 = −∇f1, if H1 = I.


3. Find the optimal step length λ∗i in the direction Si and set

xi+1 = xi + λ∗i Si (5.167)

4. Test the new point xi+1 for optimality. If xi+1 is optimal, terminate the iterative process; otherwise, go to the next step.

5. Update the H matrix as

Hi+1 = Hi + Mi + Ni (5.168)

where

M_i = \lambda_i^*\, \frac{S_i\, S_i^T}{S_i^T Q_i}      (5.169)

N_i = -\frac{(H_i\, Q_i)(H_i\, Q_i)^T}{Q_i^T H_i\, Q_i}      (5.170)

Qi = ∇f(xi+1)−∇f(xi) = ∇fi+1 −∇fi (5.171)

6. Set the new iteration number i = i + 1, and go to step 2.

This method is very powerful and converges quadratically since it is a conjugate gradient method. It is very stable and continues to progress towards the minimum even while minimizing highly distorted and eccentric functions. The stability of this method can be attributed to the fact that it carries the information obtained in previous iterations through the matrix Hi. It can be shown that Hi will always remain positive definite and will be an approximation to the inverse of the matrix of second partial derivatives of the objective function.
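The update formulae (5.166)-(5.171) lead to the following minimal Python sketch of the Davidon-Fletcher-Powell iteration; the Rosenbrock test function and the line-search routine are illustrative assumptions, not part of the original description.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dfp(f, grad_f, x0, eps=1e-8, max_iter=200):
    """Davidon-Fletcher-Powell variable metric sketch: S_i = -H_i grad f_i,
    with H updated by equations (5.168)-(5.171)."""
    x = np.asarray(x0, float)
    H = np.eye(x.size)                           # H_1 = I, so S_1 is steepest descent
    g = grad_f(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        S = -H @ g                               # equation (5.166)
        lam = minimize_scalar(lambda t: f(x + t * S)).x
        x_new = x + lam * S                      # equation (5.167)
        g_new = grad_f(x_new)
        Q = g_new - g                            # equation (5.171)
        M = lam * np.outer(S, S) / (S @ Q)       # equation (5.169)
        HQ = H @ Q
        N = -np.outer(HQ, HQ) / (Q @ HQ)         # equation (5.170)
        H = H + M + N                            # equation (5.168)
        x, g = x_new, g_new
    return x, f(x)

rosen = lambda x: float((1 - x[0])**2 + 100*(x[1] - x[0]**2)**2)
rosen_grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                                 200*(x[1] - x[0]**2)])
print(dfp(rosen, rosen_grad, [-1.2, 1.0]))
```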


5.9 NLP: Constrained Optimization Methods

This section deals with the optimization techniques that are applicable to the solution of a constrained nonlinear optimization problem.

There are many techniques available for the solution of a constrained nonlinear programming problem; they can be classified into two broad categories, namely, the direct methods and the indirect methods, as shown in Table 5.4.

Direct Methods
(a) Heuristic search methods
    - complex method
(b) Constraint approximation methods
    - cutting plane method
    - approximate programming method
(c) Methods of feasible directions
    - Zoutendijk method
    - Rosen method

Indirect Methods
(a) Transformation of variables
(b) Penalty function methods
    - interior penalty function method
    - exterior penalty function method

Table 5.4. Constrained Optimization Techniques

Direct Methods

In the direct methods, the constraints are handled in an explicit manner, whereas in most of the indirect methods the constrained problem is solved as a sequence of unconstrained minimization problems.

Heuristic Search Methods

The heuristic search methods are mostly intuitive and do not have much theoretical support. The complex method, which can be considered to be similar to the simplex method, may be studied under this category.

Constraint Approximation Methods

In these methods, the nonlinear objective function and the constraints are linearized about some point and the approximating linear programming problem is solved by using linear programming techniques.

The resulting optimum solution is then used to construct a new linear approximation, which will again be solved by using LP techniques. This procedure is continued until the specified convergence criteria are satisfied. There are two methods, namely, the cutting plane method and the approximate programming method, which work on this principle.


Methods of Feasible Directions

The methods of feasible directions are those which produce an improving succession of feasible vectors xi by moving in a succession of usable feasible directions. A feasible direction is one along which at least a small step can be taken without leaving the feasible domain.

A usable feasible direction is a feasible direction along which the objective function value can be reduced at least by a small amount. In the methods of feasible directions each iteration consists of two important steps: the first step finds a usable feasible direction at a specified point, and the second step determines a proper step length along the usable feasible direction found in the first step. The Zoutendijk method of feasible directions and the Rosen gradient projection method can be considered as particular cases of the general methods of feasible directions.

Indirect Methods

Two basic types of indirect optimization methods are dealt with.

Transformation of Variables

Some of the constrained optimization problems have their constraints expressed as simple and explicit functions of the decision variables. In such cases, it may be possible to make a change of variables such that the constraints are automatically satisfied. In some other cases, it may be possible to know, in advance, which constraints will be active at the optimum solution. In these cases, the particular constraint equation, gj(x) = 0, can be used to eliminate some of the variables from the problem. Both these approaches will be discussed under the heading transformation of variables.

Penalty Function Methods

There are two types of penalty function methods, namely, the interior penalty function method and the exterior penalty function method. In both types of methods, the constrained problem is transformed into a sequence of unconstrained minimization problems such that the constrained minimum can be obtained by solving this sequence of unconstrained minimization problems.

In the interior penalty function methods, the sequence of unconstrained minima lies in the feasible region and thus converges to the constrained minimum from the interior of the feasible region. In the exterior methods, the sequence of unconstrained minima lies in the infeasible region and converges to the desired solution from the exterior of the feasible region.

Before discussing the various types of constrained minimization techniques stated above, it is worth examining some of the important characteristics of a constrained problem.


5.9.1 Characteristics of a Constrained Problem

The presence of constraints in a nonlinear programming problem creates additional difficulties in finding the minimum. Several situations can be identified depending on the effect of the constraints on the objective function. The simplest situation is when the constraints do not have any influence on the minimum point. However, it is necessary to proceed with the general assumption that the constraints will have some influence on the optimum point.

A general case would appear as the case shown in Figure 5.39. Here the minimum value of f corresponds to the contour of the lowest value having at least one point in common with the constraint set. If the problem is an LP problem, the optimum point will always be an extreme point.

Figure 5.39. Constrained minimum occurring on a nonlinear constraint

It should be noted that ∇f is not equal to zero at the optimum point x∗, but at least one of the constraints, gj(x), will be zero at x∗. It can be seen from Figure 5.39 that the negative of the gradient must be expressed as

−∇f = λ∇gj , λ > 0

at an optimum point. This condition can easily be identified as a particular case of the Kuhn–Tucker necessary conditions to be satisfied at a constrained optimum point.

Another situation is one where the minimization problem has two or more local minima. If the objective function has two or more unconstrained local minima and if at least one of them is contained in the feasible region, then the constrained problem would have at least two local minima.

In summary, the minimum of a nonlinear programming problem will not be, in general, an extreme point of the feasible region, and may not even be on the boundary. Also, the problem may have local minima even if the corresponding unconstrained problem has no local minima. Further, none of the local minima may correspond to the global minimum of the unconstrained problem. All these characteristics are direct consequences of the introduction of constraints.


5.9.2 Direct Methods

Methods of feasible directions

The methods of feasible directions are based on the same philosophy as the methods of unconstrained minimization, but are constructed to deal with inequality constraints. The basic idea is to choose a starting point satisfying all the constraints and to move to a better point according to the iterative scheme

xi+1 = xi + λSi (5.172)

where xi is the starting point for the ith iteration, Si is the direction of movement, λ is the distance of movement (step length) and xi+1 is the final point obtained at the end of the ith iteration. The value of λ is always chosen so that xi+1 lies in the feasible region. The search direction Si is found such that (i) a small move in that direction violates no constraint, and (ii) the value of the objective function can be reduced in that direction. The new point xi+1 is taken as the starting point for the next iteration and the whole procedure is repeated several times until a point is obtained such that no direction satisfying both (i) and (ii) can be found. In general, such a point denotes the constrained local minimum of the problem. This local minimum need not be a global one unless the problem is a convex programming problem. A direction satisfying property (i) is called feasible, while a direction satisfying both properties (i) and (ii) is called a usable feasible direction. This is the reason why these methods are known as methods of feasible directions. There are many ways of choosing usable feasible directions and hence there are many different methods of feasible directions.

Situations for feasible directions will depend on the geometry of the constraint functions; that is, in Figure 5.40(a) g1 and g2 are convex, in Figure 5.40(b) g1 is convex and g2 is linear, and in Figure 5.40(c) g1 is convex and g2 is concave.

Figure 5.40. Feasible directions S


A vector S will be a usable feasible direction if it satisfies both the relations

\frac{d}{d\lambda}\, f(x_i + \lambda S)\Big|_{\lambda=0} = S^T \nabla f(x_i) \le 0      (5.173)

\frac{d}{d\lambda}\, g_j(x_i + \lambda S)\Big|_{\lambda=0} = S^T \nabla g_j(x_i) \le 0      (5.174)

where the equality sign holds true only if a constraint is linear or strictly concave, as shown in Figures 5.40(b) and 5.40(c).

The geometrical meaning of equation (5.174) is that the vector S must make an obtuse angle with all the constraint normals, except that, for the linear or concave constraints, the angle may go down to 90°. Any feasible direction satisfying the strict inequality sign of equation (5.174) lies at least partly in the feasible region. By moving along such a direction from xi one will be able to find another point xi+1, which also lies in the feasible region.

It is possible to reduce the value of the objective function at least by a small amount by taking a step length λ > 0 along such a direction.

The iterative procedure of the methods of feasible directions is shown graphically in Figure 5.41. Let x1 be the starting feasible point and let the initial usable feasible direction chosen be S1 = −∇f(x1). A step length λ > 0 is taken along the direction S1 so as to minimize f along S1 without violating any of the constraints. This procedure gives x2 as the new point.

Figure 5.41. Iterative procedure of the methods of feasible directions

Proceeding in the direction of the negative gradient of the objective function at x2 violates the constraints. Hence a usable feasible direction S2 is found at the point x2 such that it makes an angle greater than 90° with ∇g2 and an angle less than 90° with ∇f2. Several usable feasible directions can be generated at this point x2. The locally best feasible direction may be selected as the one along which the value of f decreases most rapidly, that is, along which −S2ᵀ∇f(x2) is maximized. This is the feasible direction which makes the smallest angle with −∇f(x2) = −∇f2. By moving along the direction S2 by the maximum possible distance, the point x3 is obtained. A new usable feasible direction S3 is obtained at x3, along which one moves as much as possible to obtain the point x4. At this point, the negative of the gradient of the objective function is −∇f4 and no usable feasible direction can be found. In other words, no feasible direction at x4 makes an angle of less than 90° with −∇f4. Thus the point x4 will be taken as a local minimum. It can be seen that this local minimum is the same as the global minimum of f over the constraint set.

It may not always be possible to obtain the global minimum of f. For example, if the process starts with the point y1 shown in Figure 5.41, the iterative procedure leads to the local minimum y3, which is different from the global minimum x4. This problem of local minima is common to all methods and one can be sure of avoiding them only in the case of convex programming problems.

5.9.3 Indirect Methods

Transformation Techniques

Change of Variables

If the constraints gj(x) are explicit functions of the variables xi and have certain simple forms, it may be possible to make a transformation of the independent variables such that the constraints are automatically satisfied. Thus it may be possible to convert a constrained optimization problem into an unconstrained one by making a change of variables. One of the frequently encountered constraints, which can be satisfied in this way, is the case in which the variable is bounded below and above by certain constants

l_i \le x_i \le u_i      (5.175)

where li and ui are, respectively, the lower and the upper limits on xi. These constraints can be satisfied by transforming the variable xi as

xi = li + (ui − li) sin2 yi (5.176)

where yi is the new variable which can take any value.

In the particular case when the variable xi is restricted to lie in the interval (0, 1), any of the following transformations can be used

x_i = \sin^2 y_i \,; \quad x_i = \cos^2 y_i \,; \quad x_i = \frac{e^{y_i}}{e^{y_i} + e^{-y_i}} \,; \quad x_i = \frac{y_i^2}{1 + y_i^2}

If the variable xi is constrained to take only positive values, the transformation can be

x_i = |\,y_i\,| \,; \quad x_i = y_i^2 \,; \quad x_i = e^{y_i}


On the other hand, if the variable is restricted to take values lying only between −1 and 1, the transformation is given by

x_i = \sin y_i \,; \quad x_i = \cos y_i \,; \quad x_i = \frac{2\, y_i}{1 + y_i^2}

After applying these transformations, the unconstrained minimum of the objective function is sought with respect to the new variables yi.

The following points are to be noted in applying this transformation technique:

• the constraints gj(x) have to be very simple functions of xi;

• for certain constraints it may not be possible to find the necessary transformation;

• if it is not possible to eliminate all the constraints by making a change of variables, it may be better not to use the transformation at all; a partial transformation may, sometimes, produce a distorted objective function which might be more difficult to minimize than the original function.
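As a small illustration of the change of variables, the Python sketch below applies the transformation of equation (5.176) to a simple bound-constrained quadratic and then minimizes the resulting unconstrained function in the new variables y; the objective, the bounds, the starting point and the choice of SciPy's Nelder-Mead routine are arbitrary assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Example (not from the text): minimize f(x) = (x1 - 3)^2 + (x2 + 1)^2
# subject to the bounds 0 <= x_i <= 2, via the transformation of eq. (5.176):
# x_i = l_i + (u_i - l_i) * sin^2(y_i), which satisfies the bounds for any y_i.
l = np.array([0.0, 0.0])
u = np.array([2.0, 2.0])

f = lambda x: (x[0] - 3.0)**2 + (x[1] + 1.0)**2

def x_of_y(y):
    return l + (u - l) * np.sin(y)**2            # equation (5.176)

# Unconstrained minimization in the new variables y (any method can be used).
res = minimize(lambda y: f(x_of_y(y)), x0=np.array([0.3, 0.3]), method='Nelder-Mead')
print(x_of_y(res.x))                             # expected near the bound point [2, 0]
```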

Elimination of Variables

If an optimization problem has m inequality constraints, all of them may not be active at the optimum point (a constraint that is satisfied with the equality sign is called an active constraint). If it is known in advance which constraints are going to be active at the optimum point, those constraint equations can be used to eliminate some of the variables from the problem. Thus if r (< n) specific constraints are known to be active at the optimum point, any r variables can be eliminated from the problem and a new problem is obtained involving only (n−r) variables and (m−r) constraints. This problem will be, in general, much easier to solve than the original problem.

The major drawback of this method is that it will be very difficult to know, beforehand, which of the constraints are going to be active at the optimum point. Thus, in a general problem with m constraints, one needs to check (i) the minimum of f(x) with no constraints (assuming that no constraint is active at the optimum point), (ii) the minimum of f(x) by taking one constraint at a time with the equality sign (assuming that one constraint is active at the optimum point), (iii) the minimum of f(x) by taking all possible combinations of constraints taken two at a time (assuming that two constraints are active at the optimum point), and so on. If any of these solutions satisfies the Kuhn–Tucker necessary conditions, it is likely to be a local minimum of the original optimization problem. It can be seen that, in the absence of prior knowledge about which constraints are going to be active at the optimum point, the number of problems to be solved is given by

1 + m + \frac{m\,(m-1)}{2!} + \frac{m\,(m-1)\,(m-2)}{3!} + \ldots + \frac{m!}{(m-n)!\, n!} = \sum_{k=0}^{n} \frac{m!}{k!\,(m-k)!}

For example, if the original optimization problem has n = 5 variables and m = 10 constraints, the number of problems to be solved will be 638, which can be seen to be very large.


However, in LP problems, it is known that exactly (n−m) variables will be zero at the optimum point. In such cases, it is necessary to solve only n!/[(n−m)! m!] problems to identify the optimum solution. For example, if m = 5 and n = 10, the number of problems to be solved will be 252, which is still a large number in terms of practical computations. Hence this approach is not feasible even for solving LP problems. The simplex method can be seen to be much more efficient than this technique because it moves from one basic feasible solution to an improved neighboring basic feasible solution in a systematic manner.

Penalty Function Methods: Basic Approach

The penalty function methods transform the basic optimization problem into alternative formulations such that numerical solutions are sought by solving a sequence of unconstrained minimization problems. Let the basic optimization problem be of the form

Find x which minimizes f(x)
subject to g_j(x) \le 0\,, \quad j = 1,2,\ldots,m      (5.177)

This problem is converted into an unconstrained minimization problem by constructing a function of the form

\phi_k = \phi(x, r_k) = f(x) + r_k \sum_{j=1}^{m} G_j\left[\,g_j(x)\,\right]      (5.178)

where Gj is some function of the constraint gj and rk is a positive constant known as the penalty parameter. The second term on the right side of equation (5.178) is called the penalty term and its significance will be seen later. If the unconstrained minimization of the φ-function is repeated for a sequence of values of the penalty parameter rk (k = 1, 2, ...), the solution may be brought to converge to that of the original problem stated in equation (5.177). This is the reason why the penalty function methods are also known as sequential unconstrained minimization techniques (SUMT).

The penalty function formulations for inequality constrained problems can be divided into two categories, namely, the interior penalty function method and the exterior penalty function method. In the interior formulations some of the popularly used forms of Gj are given by

G_j = -\frac{1}{g_j(x)}      (5.179)

or

G_j = \log\left[-g_j(x)\right]      (5.180)

In the case of exterior penalty function formulations, some of the commonly used forms of the function Gj are

Gj = max [0, gj(x)] (5.181)


or

Gj = {max [0, gj(x)]}2 (5.182)

In the interior methods, all the unconstrained minima of φk lie in the feasible region and converge to the solution of equation (5.177) as rk is varied in a particular manner. In the exterior methods, all the unconstrained minima of φk lie in the infeasible region and converge to the desired solution from the outside as rk is changed in a specified manner. The convergence of the unconstrained minima of φk is illustrated in Figure 5.42 for the simple problem

Find x = \{x_1\} which minimizes f(x) = \alpha\, x_1
subject to g_1(x) = \beta - x_1 \le 0      (5.183)

It can be seen from Figure 5.42(a) that the unconstrained minima of φ(x, rk) converge to the optimum point x∗ as the parameter rk is increased sequentially. On the other hand, the interior method shown in Figure 5.42(b) gives convergence as the parameter rk is decreased sequentially.

Figure 5.42. Penalty function methods: (a) exterior method; (b) interior method

There are several reasons for the appeal of the penalty function formulations. One main reason, which can be observed from Figure 5.42, is that the sequential nature of the method allows a gradual or sequential approach to criticality of the constraints. In addition, the sequential process permits a graded approximation to be used in the analysis of the system. This means that if the evaluation of f and gj, and hence φ(x, rk), for any specified design vector x is computationally very difficult, one can use coarse approximations during the early stages of optimization (when the unconstrained minima of φk are far away from the optimum) and a finer or more detailed analysis approximation during the final stages of optimization. Another reason is that the algorithms for the unconstrained minimization of rather arbitrary functions have been well studied and generally are quite reliable.

Interior Penalty Function Method

As indicated previously, in the interior penalty function method a new function (the φ–function) is built by adding a penalty term to the objective function. The penalty term is chosen such that its value will be small at points away from the constraint boundaries and will tend to infinity as the constraint boundaries are approached. Hence the value of the φ–function also 'blows up' as the constraint boundaries are approached. This behavior can also be seen from Figure 5.42(b). Thus, once the unconstrained minimization of φ(x, rk) is started from any feasible point x1, the subsequent points generated will always lie within the feasible domain, since the constraint boundaries act as barriers during the minimization process. This is the reason why the interior penalty function methods are also known as 'barrier methods'.

The φ–function defined originally by Carroll (1961) is

\[
\phi(x, r_k) = f(x) - r_k \sum_{j=1}^{m} \frac{1}{g_j(x)} \tag{5.184}
\]

It can be seen that the value of the function φ will always be greater than f, since gj(x) is negative for all feasible points x. If any constraint gj(x) is satisfied critically (with equality sign), the value of φ tends to infinity. It is to be noted that the φ–function of equation (5.184) is not defined if x is infeasible. This introduces a serious shortcoming when using equation (5.184). Since this formulation does not allow any constraint to be violated, it requires a feasible starting point for the search towards the optimum point. However, in many engineering problems it may not be very difficult to find a point satisfying all the constraints, gj(x) ≤ 0, at the expense of a large value of the objective function f(x). If there is any difficulty in finding a feasible starting point, the method described below can be used. Since the initial point, as well as each of the subsequent points generated in this method, lies inside the acceptable region of the design space, the method is classified as an interior penalty function formulation. The iteration procedure of this method is illustrated below.

Iterative Process

1. Start with an initial feasible point x1 satisfying all the constraints with strict inequality sign, that is, gj(x1) < 0 for j = 1, 2, . . . , m, and an initial value of r1 > 0. Set k = 1.

2. Minimize φ(x, rk) by using any of the unconstrained minimization methods and obtain the solution x∗k.


3. Test whether x∗k is the optimum solution of the original problem. If x∗k is found to be optimum, terminate the process; otherwise, go to the next step.

4. Find the value of the next penalty parameter, rk+1, as rk+1 = c·rk, where c < 1.

5. Set the new value of k = k + 1, take the new starting point as x1 = x∗k, and go to step 2.

These steps are shown in the form of a flowchart in Figure 5.43.

Figure 5.43. Flowchart for the interior penalty function method
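As a concrete illustration, the Python sketch below applies the interior penalty (barrier) formulation (5.184) and the iterative process above to the simple one-variable problem (5.183). The values α = 1, β = 2, the starting point, the choices r1 = 1 and c = 0.1, and the use of SciPy's Nelder–Mead routine for the inner unconstrained minimizations are all illustrative assumptions, not prescriptions from the text.

import numpy as np
from scipy.optimize import minimize

# Illustrative data for problem (5.183): minimize f(x) = alpha*x1
# subject to g1(x) = beta - x1 <= 0 (feasible region: x1 >= beta).
alpha, beta = 1.0, 2.0
f = lambda x: alpha * x[0]
g = [lambda x: beta - x[0]]            # list of constraints g_j(x) <= 0

def phi(x, rk):
    """Interior penalty function (5.184); infeasible points are rejected."""
    gx = np.array([gj(x) for gj in g])
    if np.any(gx >= 0.0):              # barrier: phi is not defined outside
        return np.inf
    return f(x) - rk * np.sum(1.0 / gx)

def interior_penalty(x1, r1=1.0, c=0.1, eps1=1e-4, max_outer=30):
    x_old, f_old, rk = np.asarray(x1, float), f(x1), r1
    for _ in range(max_outer):
        res = minimize(phi, x_old, args=(rk,), method="Nelder-Mead")
        x_new, f_new = res.x, f(res.x)
        # convergence test (5.190) on two successive unconstrained minima
        if abs(f_new - f_old) <= eps1 * max(abs(f_new), 1e-30):
            return x_new
        x_old, f_old, rk = x_new, f_new, c * rk   # r_{k+1} = c*r_k, c < 1
    return x_old

print(interior_penalty(x1=[5.0]))      # approaches x* = beta = 2 from inside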

Although the algorithm is straightforward, there are a number of points to be considered in implementing the method. These are:

• the starting feasible point x1 may not be readily available in some cases;

• a suitable value of the initial penalty parameter r1 has to be found;

• a proper value has to be selected for the multiplication factor c;

• suitable convergence criteria have to be chosen to identify the optimum point;

• the constraints have to be normalized so that each one of them varies between −1 and 0 only.


Starting feasible point

In most engineering problems, it is not very difficult to find an initial point x1 satisfying all the constraints, gj(x1) < 0. In most practical problems, a feasible starting point may be found even at the expense of a large value of the objective function. However, there may be some situations where a feasible design point cannot be found so easily. In such cases, the required feasible starting point can be found by using the interior penalty function method itself, as follows:

1. Choose an arbitrary point x1 and evaluate the constraints gj(x) at the point x1. Since the point x1 is arbitrary, it may not satisfy all the constraints with strict inequality sign. If r out of a total of m constraints are violated, renumber the constraints such that the last r constraints become the violated ones; that is,

\[
g_j(x_1) < 0\,, \quad j = 1, 2, \ldots, m - r\,; \qquad g_j(x_1) \ge 0\,, \quad j = m - r + 1,\, m - r + 2, \ldots, m \tag{5.185}
\]

2. Identify the constraint that is violated most at the point x1, that is, find the integer k such that
\[
g_k(x_1) = \max\left[g_j(x_1)\right]\,, \quad j = m - r + 1,\, m - r + 2, \ldots, m \tag{5.186}
\]

3. Now formulate a new optimization problem as
\[
\begin{aligned}
&\text{Find } x \text{ which minimizes } g_k(x) \\
&\text{subject to } g_j(x) \le 0\,, \quad j = 1, 2, \ldots, m - r \\
&\phantom{\text{subject to }} g_j(x) - g_k(x_1) \le 0\,, \quad j = m - r + 1, \ldots, k, \ldots, m
\end{aligned}
\]

4. Solve the optimization problem formulated in step 3 by taking the point x1 as a feasible starting point and using the interior penalty function method. Note that this optimization can be terminated whenever the value of the objective function gk(x) drops below zero. Thus the solution obtained, xM, will satisfy at least one more constraint than did the original point x1.

5. If all the constraints are not satisfied at the point xM, set the new starting point as x1 = xM, renumber the constraints such that the last r constraints are the unsatisfied ones (this value of r will be different from the previous value), and go to step 2.

This procedure is repeated until all the constraints are satisfied and a point x1 = xM is obtained for which gj(x1) < 0, j = 1, 2, . . . , m.

If the constraints are consistent, it should be possible to obtain, by applying the above procedure, a point x1 that satisfies all the constraints. However, there may exist situations in which the solution of the problem formulated in step 3 gives an unconstrained or constrained local minimum of gk(x) that is positive. In such cases, one has to restart the procedure with a new point x1 from step 1 onwards.
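The feasibility search described above can be sketched in Python as follows. Here a general constrained solver (SLSQP) stands in for the inner interior-penalty minimizations mentioned in the text, each round is solved to completion rather than being terminated early when gk(x) drops below zero, and the two example constraints are hypothetical; the sketch illustrates the looping logic only.

import numpy as np
from scipy.optimize import minimize

def find_feasible_point(x0, constraints, max_rounds=20):
    """Search for x with g_j(x) < 0 for all j, starting from an arbitrary x0,
    by repeatedly minimizing the most violated constraint while keeping the
    satisfied ones satisfied and the other violated ones no worse than now."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_rounds):
        g = np.array([gj(x) for gj in constraints])
        if np.all(g < 0.0):
            return x                                   # feasible point found
        k = int(np.argmax(g))                          # most violated constraint
        gk0 = g[k]
        # SciPy 'ineq' constraints are of the form fun(x) >= 0
        cons = [{"type": "ineq", "fun": (lambda xx, j=j: -constraints[j](xx))}
                for j in range(len(constraints)) if g[j] < 0.0]
        cons += [{"type": "ineq",
                  "fun": (lambda xx, j=j, gk0=gk0: gk0 - constraints[j](xx))}
                 for j in range(len(constraints)) if g[j] >= 0.0 and j != k]
        res = minimize(lambda xx: constraints[k](xx), x,
                       method="SLSQP", constraints=cons)
        x = res.x
    raise RuntimeError("no feasible point found; restart from a new x0")

# Hypothetical constraints: two overlapping disks of radius 2
gs = [lambda x: x[0]**2 + x[1]**2 - 4.0,
      lambda x: (x[0] - 1.0)**2 + x[1]**2 - 4.0]
print(find_feasible_point([6.0, 0.0], gs))             # a strictly feasible point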


Initial value of the penalty parameter

Since the unconstrained minimization of φ(x, rk) is to be carried out for a decreasing sequence of rk, it might appear that, by choosing a very small value of r1, one could avoid an excessive number of minimizations of the function φ. However, from a computational point of view, it is easier to minimize the unconstrained function φ(x, rk) if rk is large. This can be seen qualitatively from Figure 5.42(b): as the value of rk becomes smaller, the value of the function φ changes more rapidly in the vicinity of the minimum φ∗k. Since it is easier to find the minimum of a function whose graph is smoother, the unconstrained minimization of φ will be easier if rk is large. However, the minimum of φk, x∗k, will be farther away from the desired minimum x∗ if rk is large. Thus an excessive number of unconstrained minimizations of φ(x, rk) (for several values of rk) is required to reach the point x∗ if r1 is selected to be very large. Hence a 'moderate' value has to be chosen for the initial penalty parameter r1. In practice, a value of r1 that gives a value of φ(x1, r1) approximately equal to 1.1 to 2.0 times the value of f(x1) has been found to be quite satisfactory in achieving rapid convergence of the process. Thus, for any initial feasible starting point x1, the value of r1 can be taken as

\[
r_1 \simeq (0.1 \div 1.0)\; \frac{f(x_1)}{\displaystyle -\sum_{j=1}^{m} \frac{1}{g_j(x_1)}} \tag{5.187}
\]

Subsequent values of the penalty parameter

Once the initial value r1 is chosen, the subsequent values of rk have to be chosen such that

\[
r_{k+1} < r_k \tag{5.188}
\]

For convenience, the values of rk are chosen according to the relation

\[
r_{k+1} = c\, r_k\,, \quad c < 1 \tag{5.189}
\]

The value of c can be taken as 0.1 or 0.2 or 0.5, etc.

Convergence criteria

Since the unconstrained minimization of φ(x, rk) has to be carried out for a decreasing sequence of values of rk, it is necessary to use proper convergence criteria to identify the optimum point and to avoid an unnecessarily large number of unconstrained minimizations. The process can be terminated whenever the following conditions are satisfied:

1. The relative difference between the values of the objective function obtained at the end of any two consecutive unconstrained minimizations falls below a small number ε1, i.e.

\[
\left| \frac{f(x^*_k) - f(x^*_{k-1})}{f(x^*_k)} \right| \le \varepsilon_1 \tag{5.190}
\]


2. The difference between the optimum points x∗k and x∗k−1 becomes very small. This can be judged in several ways; some of them are given below:

\[
\left| (\Delta x)_i \right| \le \varepsilon_2 \tag{5.191}
\]
where \(\Delta x = x^*_k - x^*_{k-1}\) and \((\Delta x)_i\) is the i-th component of the vector \(\Delta x\); or
\[
\max_i \left| (\Delta x)_i \right| \le \varepsilon_3 \tag{5.192}
\]
\[
\| \Delta x \| = \left[ (\Delta x)_1^2 + (\Delta x)_2^2 + \ldots + (\Delta x)_n^2 \right]^{1/2} \le \varepsilon_4 \tag{5.193}
\]

Note that the values of ε1 to ε4 have to be chosen depending on the characteristics of the problem at hand.
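A minimal Python sketch of these stopping tests follows; the tolerance values and the choice to require both tests simultaneously are illustrative assumptions.

import numpy as np

def converged(f_new, f_old, x_new, x_old, eps1=1e-6, eps4=1e-6):
    # relative change of the objective between successive minima, eq. (5.190)
    rel_f = abs(f_new - f_old) / max(abs(f_new), 1e-30)
    # Euclidean norm of the change in the design vector, eq. (5.193)
    dx_norm = np.linalg.norm(np.asarray(x_new) - np.asarray(x_old))
    return (rel_f <= eps1) and (dx_norm <= eps4)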

Normalization of constraints

A structural optimization problem, for example, might have constraints on the deflection, δ, and the stress, σ, as

\[
g_1(x) = \delta(x) - \delta_{\max} \le 0 \tag{5.194}
\]

\[
g_2(x) = \sigma(x) - \sigma_{\max} \le 0 \tag{5.195}
\]

where the maximum allowable values are given by δmax = 0.5 cm and σmax = 3,000 kg/cm². If a design vector x1 gives the values of g1 and g2 as −0.2 and −2,000, the contribution of g1 to the φ–function of equation (5.184) will be much larger than that of g2 (by a factor of the order of 10⁴). This will adversely affect the convergence rate during the minimization of the φ–function. Thus, it is advisable to normalize the constraints so that they vary between −1 and 0 as far as possible. For the constraints given in equations (5.194) and (5.195), the normalization can be done as

\[
g'_1(x) = \frac{g_1(x)}{\delta_{\max}} = \frac{\delta(x)}{\delta_{\max}} - 1 \le 0 \tag{5.196}
\]
\[
g'_2(x) = \frac{g_2(x)}{\sigma_{\max}} = \frac{\sigma(x)}{\sigma_{\max}} - 1 \le 0 \tag{5.197}
\]

If the constraints are not normalized as shown in equations (5.196) and (5.197), the problem can still be solved effectively by defining different penalty parameters for different constraints as

\[
\phi(x, r_k) = f(x) - r_k \sum_{j=1}^{m} \frac{R_j}{g_j(x)} \tag{5.198}
\]

where R1, R2, . . . , Rm are selected such that the contributions of the different gj(x) to the φ–function are approximately the same at the initial point x1. When the unconstrained minimization of φ(x, rk) is carried out for a decreasing sequence of values of rk, the values of R1, R2, . . . , Rm are not altered; however, they are expected to be effective in reducing the disparities between the contributions of the various constraints to the φ–function.
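The normalization of equations (5.196) and (5.197) amounts to a one-line rescaling of each constraint function. A minimal sketch follows; the response functions delta(x) and stress(x) are hypothetical stand-ins for the actual structural analyses.

# Hypothetical response analyses standing in for delta(x) and sigma(x)
delta  = lambda x: 0.3 + 0.01 * x[0]       # deflection in cm
stress = lambda x: 1500.0 + 20.0 * x[0]    # stress in kg/cm^2

delta_max, sigma_max = 0.5, 3000.0         # allowable values

# Normalized constraints (5.196)-(5.197): both vary between -1 and 0
# when the responses range from zero up to their allowable values.
g1 = lambda x: delta(x) / delta_max - 1.0
g2 = lambda x: stress(x) / sigma_max - 1.0

x1 = [5.0]
print(g1(x1), g2(x1))    # comparable orders of magnitude after scaling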


Exterior Penalty Function Method

In the exterior penalty function method, the φ–function is generally taken as

\[
\phi(x, r_k) = f(x) + r_k \sum_{j=1}^{m} \langle g_j(x) \rangle^{q} \tag{5.199}
\]

where rk is a positive penalty parameter, the exponent q is a nonnegative constant, and the bracket function ⟨gj(x)⟩ is defined as

\[
\langle g_j(x) \rangle = \max\left[\, g_j(x),\, 0 \,\right] =
\begin{cases}
g_j(x) & \text{if } g_j(x) > 0 \quad (\text{constraint violated}) \\
0 & \text{if } g_j(x) \le 0 \quad (\text{constraint satisfied})
\end{cases} \tag{5.200}
\]

It can be seen from equation (5.199) that the effect of the second term on the right side is to increase φ(x, rk) in proportion to the q-th power of the amount by which the constraints are violated. Thus there is a penalty for violating the constraints, and the amount of penalty increases at a faster rate than the amount of violation of a constraint (for q > 1). This is the reason why the formulation is called the penalty function method. Usually, the function φ(x, rk) possesses a minimum as a function of x in the infeasible region. The unconstrained minima x∗k converge to the optimal solution of the original problem as k → ∞ and rk → ∞. Thus the unconstrained minima approach the feasible domain gradually and, as k → ∞, x∗k eventually lies in the feasible region. Consider equation (5.199) for various values of q.

• q = 0

Here the φ-function is given by

\[
\phi(x, r_k) = f(x) + r_k \sum_{j=1}^{m} \langle g_j(x) \rangle^{0} =
\begin{cases}
f(x) + m\, r_k & \text{if all } g_j(x) > 0 \\
f(x) & \text{if all } g_j(x) \le 0
\end{cases} \tag{5.201}
\]

This function is discontinuous on the boundary of the acceptable region, as shown in Figure 5.44, and hence it would be very difficult to minimize it.

Figure 5.44. φ–function discontinuous for q = 0


• 0 < q < 1

Here the φ–function is continuous, but the penalty for violating a constraint may be too small. Also, the derivatives of the function are discontinuous along the boundary. Thus it will be difficult to minimize the φ–function. Typical contours of the φ–function are shown in Figure 5.45.

Figure 5.45. Derivatives of φ–function discontinuous for 0 < q < 1


• q = 1

In this case, under certain restrictions, it has been shown that there exists an r0 large enough that the minimum of φ(x, rk) is exactly the constrained minimum of the original problem for all rk > r0. However, the contours of the φ–function look similar to those shown in Figure 5.45 and possess discontinuous first derivatives along the boundary. Hence, in spite of the convenience of choosing a single rk that yields the constrained minimum in one unconstrained minimization, the method is not very attractive from a computational point of view.

• q > 1

In this case the φ–function has continuous first derivatives, as shown in Figure 5.46. These derivatives are given by

\[
\frac{\partial \phi}{\partial x_i} = \frac{\partial f}{\partial x_i} + r_k \sum_{j=1}^{m} q\, \langle g_j(x) \rangle^{q-1}\, \frac{\partial g_j(x)}{\partial x_i} \tag{5.202}
\]

Generally, the value of q is chosen as 2 in practical computations. A value of q > 1 will be assumed in the subsequent discussion of this method.


Figure 5.46. Derivatives of the φ–function continuous for q > 1

Algorithm

The exterior penalty function method can be stated by the following steps:

1. Start from any design x1 and a suitable value of r1. Set k = 1.

2. Find the vector x∗k that minimizes the function
\[
\phi(x, r_k) = f(x) + r_k \sum_{j=1}^{m} \langle g_j(x) \rangle^{q}
\]

3. Test whether the point x∗k satisfies all the constraints. If x∗k is feasible, it is the desired optimum and the procedure is terminated. Otherwise, go to step 4.

4. Choose the next value of the penalty parameter such that it satisfies the relation
\[
r_{k+1} > r_k
\]
set the new value of k as k + 1, and go to step 2.

This procedure is indicated as a flowchart in Figure 5.47, where rk+1 is chosen, for simplicity, according to the relation
\[
\frac{r_{k+1}}{r_k} = c
\]
where c is a constant greater than one.


Figure 5.47. Flowchart for exterior penalty function method
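The following Python sketch mirrors the algorithm above for the same simple problem (5.183), with q = 2 as suggested in the text; the choices of r1, c, the starting design, the feasibility tolerance, and the inner Nelder–Mead solver are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

alpha, beta = 1.0, 2.0                       # problem (5.183)
f = lambda x: alpha * x[0]
g = [lambda x: beta - x[0]]                  # g_j(x) <= 0

def phi(x, rk, q=2):
    # exterior penalty function (5.199) with bracket operator (5.200)
    viol = np.array([max(gj(x), 0.0) for gj in g])
    return f(x) + rk * np.sum(viol ** q)

def exterior_penalty(x1, r1=1.0, c=10.0, tol=1e-6, max_outer=30):
    x, rk = np.asarray(x1, float), r1
    for _ in range(max_outer):
        x = minimize(phi, x, args=(rk,), method="Nelder-Mead").x
        # step 3: practical feasibility check with a small tolerance
        if all(gj(x) <= tol for gj in g):
            return x
        rk *= c                              # step 4: r_{k+1} = c*r_k, c > 1
    return x

print(exterior_penalty(x1=[0.0]))            # approaches x* = beta = 2 from outside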


Bibliography

[1] Carroll, C.W.: The Created Response Surface Technique for Optimizing Nonlinear Restrained Systems, Operations Research, Vol. 9, 1961, pp. 169–184.

[2] Cauchy, A.L.: Méthode générale pour la résolution des systèmes d'équations simultanées, C.R. Acad. Science, Vol. 7, 1847.

[3] Davidon, W.C.: Variable Metric Method for Minimization, Argonne National Laboratory Report No. ANL–5990, 1959.

[4] Duffin, R.J., Peterson, E.L., Zener, C.: Geometric Programming: Theory and Applications, Wiley, New York, 1967.

[5] Dantzig, G.B.: Linear Programming and Extensions, Princeton University Press, 1963.

[6] Fletcher, R., Powell, M.J.D.: A Rapidly Convergent Descent Method for Minimization, Computer Journal, Vol. 6, No. 2, 1963, pp. 163–168.

[7] Fletcher, R., Reeves, C.M.: Function Minimization by Conjugate Gradients, Computer Journal, Vol. 7, No. 2, 1964, pp. 149–154.

[8] Hancock, H.: Theory of Maxima and Minima, Dover, New York, 1960.

[9] Hu, T.C.: Integer Programming and Network Flows, Addison–Wesley, Reading, Massachusetts, 1969.

[10] Kuhn, H.W., Tucker, A.W.: Nonlinear Programming, Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1951.

[11] Sengupta, J.K.: Stochastic Programming: Methods and Applications, North–Holland Publishing Co., Amsterdam, 1972.


Chapter 6

Design of Experiments

Sir Ronald Fisher was the innovator in the use of statistical methods in the design of experiments (DoE). He developed the analysis of variance as the primary method of statistical analysis in experimental design. While Fisher was clearly the pioneer, there have been many other significant contributors to the techniques of experimental design, including Yates, Bose, Kempthorne, Cochran, and Box.

Many of the early applications of experimental design methods were in the agricultural and biological sciences and, as a result, much of the terminology of the field derives from this heritage. The first industrial applications of experimental design began to appear in the 1930s, initially in the British textile and woolen industry. After the Second World War, experimental design methods were introduced to the chemical and process industries in the United States and Western Europe. The semiconductor and electronics industry has also used experimental design methods for many years with considerable success. In recent years, there has been a revival of interest in experimental design in the United States, because many industries discovered that their Asian competitors had been using designed experiments for many years and that this has been an important factor in their competitive success.

The day is approaching when all engineers will receive formal training in experimental design as part of their undergraduate education. The successful integration of experimental design into the engineering profession is expected to become a key factor in the future competitiveness of the European industrial base.

6.1 General

Experiments are performed by investigators in virtually all industrial sectors. Literally, an experiment is a test. A designed experiment is, therefore, a test or series of tests in which purposeful changes are made to the input variables of an industrial process or technical system so that the reasons for changes in the output response may be identified.

The process under study can be represented by the model shown in Figure 6.1. One can usually visualize the process as a combination of machines, methods, engineers, computers, and other resources that transforms some input into an output that has one or more observable responses. Some of the process variables x1, x2, . . . , xp are controllable, whereas other variables z1, z2, . . . , zq are uncontrollable.

Figure 6.1. General model of a process

Experimental design methods play an important role in design and process development, as well as in process troubleshooting to improve manufacturing performance. The objective in many cases may be to develop a robust design and/or a robust process, that is, a process affected minimally by external sources of variability (the noise parameters z's).

Experimental design methods have found broad application in many disciplines. In engineering, experimentation may be viewed as part of the scientific process and as one of the ways the experimenter learns about how technical systems or manufacturing processes work. Generally, the experimenter learns through a series of activities in which he/she makes conjectures about a process or a physical phenomenon, performs experiments to generate data from the process, and then uses the information from the experiment to establish new conjectures, which lead to new experiments, and so on.

Experimental design methods are critically important tools in the engineering world for improving the performance of a manufacturing process. The application of experimental design techniques early in manufacturing process development can result in:

• improved product and process outcomes;
• reduced variability and closer conformance to nominal or target requirements;
• reduced development time;
• reduced overall costs.

Experimental design methods also play a major role in engineering design activities, where new products are developed and existing ones improved. Some applications include:

• evaluation and comparison of conceptual and basic design configurations;
• selection of design parameters so that the technical product will work well under a wide variety of environmental conditions, that is, so that the product is robust;
• determination of key product design variables that impact product performance.


The use of experimental design in these areas can result in products that are easier to manufacture, products with enhanced technical performance and reliability together with lower product cost, and shorter product design and development time.

Finally, the objectives of an experiment may include the following determinations:

• which variables are most influential on the response, y;

• where to set the influential x’s so that y is almost always near the desired nominal value;

• where to set the influential x’s so that variability in y is small;

• where to set the influential x’s so that the effects of the uncontrollable variables z1, z2, . . . , zn are minimized.

6.1.1 Basic Principles

The statistical approach to experimental design is mandatory if one wishes to draw meaningful conclusions from the data collected from a physical experiment or from the numerical analysis of a physical phenomenon or process. In order to perform the designed experiment most efficiently, a scientific approach to planning the experiment must be employed. To reach valid and objective conclusions by means of the statistical design of experiments, the investigator must plan the experiment and collect the data appropriately. When the problem involves data that are subject to experimental errors, statistical methodology is the only objective approach to analysis. Thus, there are two aspects to any experimental problem: the design of the experiment and the statistical analysis of the data. These two subjects are closely related, since the method of analysis depends directly on the design method employed.

There are three basic principles of experimental design, namely replication, randomization, and blocking.

Replication is intended as a repetition of the basic experiment. It has two important properties. First, it allows the experimenter to obtain an estimate of the experimental error. This estimate of error becomes a basic unit of measurement for determining whether observed differences in the data are really statistically different. Second, if the sample mean, ȳ, is used to estimate the effect of a factor in the experiment, then replication permits the experimenter to obtain a more precise estimate of this effect. For example, if σ² is the variance of the data and there are n replicates, then the variance of the sample mean is

\[
\sigma_{\bar y}^{2} = \frac{\sigma^2}{n}
\]

The practical implication of this fact is that, if the experimenter has only n = 1 replicate, he/she would probably be unable to make satisfactory inferences about the effect of the variables, that is, the observed difference could simply be the result of experimental error. On the other hand, if n is reasonably large and the experimental error is sufficiently small, the experimenter can be reasonably confident in his/her conclusions.
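A quick numerical check of the relation σ²ȳ = σ²/n is sketched below; the normal population, its parameters, and the number of repetitions are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(1)
sigma, n, repeats = 2.0, 10, 20000

# draw many samples of size n and look at the spread of their sample means
means = rng.normal(loc=5.0, scale=sigma, size=(repeats, n)).mean(axis=1)

print(means.var())        # empirical variance of the sample mean
print(sigma**2 / n)       # theoretical value sigma^2 / n = 0.4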


Randomization is the cornerstone underlying the use of statistical methods in experimental design. By randomization it is meant that both the allocation of the experimental data and the order in which the individual runs or trials of the experiment are performed are randomly determined. Randomization usually makes valid the assumption, required by statistical methods, that the observations (or errors) are independently distributed random variables. By properly randomizing the experiment, the investigator is also helped in 'averaging out' the effects of extraneous, imprecise, and ambiguous factors that may be present.

Blocking is a technique used to increase the precision of an experiment. A block is a portion of the experimental data that should be more homogeneous than the entire set of data. Blocking involves making comparisons among the conditions of interest in the experiment within each block.

6.1.2 Guidelines for Designing Experiments

To use the statistical approach in designing and analyzing a designed experiment, the investigator must have a clear idea in advance of exactly which phenomenon or process is to be modelled, how the data are to be collected, and at least a qualitative understanding of how these data are to be analyzed. The recommended procedure is outlined in the following steps:

1. Recognition and statement of the problem. This may seem a rather obvious point, but in practice it is often not so simple to develop a clear and generally accepted statement of the problem. Usually, it is important to press for cooperation from all concerned parties. A clear statement of the problem often contributes substantially to a better understanding of the phenomenon or process and to reaching a sound solution.

2. Choice of the variables and levels. The experimenter must choose the variables (factors) to be varied in the experiment, the ranges over which these factors will be varied, and the specific levels at which runs will be made. Thought must also be given to how these factors are to be controlled at the desired values and how they are to be measured. It is important to investigate all variables that may be of importance, particularly when the investigator is in the early stages of experimentation. When the objective is factor screening or process characterization, it is usually best to keep the number of factor levels low (most often two levels are used).

3. Selection of the response. In selecting the response variable, the experimenter should be certain that this variable really provides useful information about the phenomenon under study. Most often, the average or the standard deviation (or both) of the measured characteristic will be the response variable.

4. Choice of the experimental design. If the first three steps are done correctly, this step is relatively easy. Selection of the appropriate design involves consideration of sample size, selection of a suitable run order for the experimental trials, and determination of whether or not blocking or other randomization restrictions are involved, while keeping the experimental objectives in mind. In many engineering experiments, the investigator already knows at the outset that some of the factor levels will result in different values for the response. Consequently, he/she has to identify which factors cause this difference and estimate the magnitude of the response variation.

5. Performing the experiment. Errors in the experimental procedure at this stage will usually destroy experimental validity. It is easy to underestimate the planning aspects of running a designed experiment in a complex research and development environment.

6. Data analysis. Statistical methods should be used to analyze the data so that the results and conclusions are objective in nature. If the experiment has been designed correctly and performed according to the design, the statistical methods required are not elaborate. There are many excellent software packages designed to assist in data analysis, and simple graphical methods play an important role in data interpretation. Residual analysis and model adequacy checking are also important analysis techniques. Notice that statistical methods cannot prove that a factor has a particular effect; they only provide guidelines as to the reliability and validity of the results. Properly applied, statistical methods allow the experimenter to measure the likely error in a conclusion or to assign a level of confidence to a statement. The primary advantage of statistical methods is that they add objectivity to the decision-making process. Statistical techniques coupled with good engineering knowledge will usually lead to sound conclusions.

7. Conclusions and recommendations. Once the data have been analyzed, the experimenter must draw practical conclusions about the results. Graphical methods are often useful at this stage, particularly in presenting the results. Follow-up runs and confirmation testing should also be performed to validate the conclusions from the experiment. Throughout this entire process, it is important to keep in mind that experimentation is an important part of the learning process. This suggests that experimentation is iterative: it is usually a major mistake to design a single, large, comprehensive experiment at the start of a study. A successful experiment requires knowledge of the primary factors, the ranges over which these factors should be varied, the appropriate number of levels to use, and the proper units of measurement for these variables. As an experimental program progresses, the experimenter often drops some input variables, adds others, changes the region of exploration for some factors, or adds new response variables.

6.1.3 Statistical Techniques in Design Experiments

Much of the research in engineering, science, and industry is empirical and makes extensive use of (numerical) experimentation. Statistical methods can greatly increase the efficiency of experiments and often strengthen the conclusions so obtained. The intelligent use of statistical techniques in experimental design requires that the investigator keep the following points in mind:


1. Use non-statistical knowledge of the problem. In some fields there is a large body of physical theory on which to draw in explaining the relationships between factors and responses. This type of non-statistical knowledge is invaluable in choosing factors, determining factor levels, deciding how many replicates to run, interpreting the results of the analysis, and so forth. Using statistics is no substitute for thinking about the problem.

2. Keep the design and analysis as simple as possible. Do not exaggerate the use of complex, sophisticated statistical techniques. Relatively simple design and analysis methods are almost always the best. If the investigator builds the experimental design carefully and correctly, the analysis will almost always be relatively straightforward.

3. Recognize the difference between practical and statistical significance. Just because two experimental conditions produce mean responses that are statistically different, there is no assurance that this difference is large enough to have any practical value.

4. Experiments are usually iterative. Remember that, in most situations, it is unwise to design too comprehensive an experiment at the start of a study. This argues in favor of the iterative or sequential approach. Of course, there are situations where comprehensive experiments are entirely appropriate but, as a general rule, most experiments should be iterative.

6.2 Simple Comparative Experiments

In this section, experiments to compare two conditions or formulations (treatments) are considered; they are often called simple comparative experiments. The discussion leads to a review of several basic statistical concepts, such as random variables, probability distributions, random samples, sampling distributions, and tests of hypotheses.

Start with an example of an experiment performed to determine whether two different formulations of a product give equivalent results. The tension bond strength of portland cement mortar is an important characteristic of the product. An engineer is interested in comparing the strength of a modified formulation, in which polymer latex emulsions have been added during mixing, with the strength of the unmodified mortar. The experimenter has collected 10 observations of strength for the modified formulation and another 10 observations for the unmodified formulation. The two different formulations can be referred to as two treatments, or levels of the factor formulation. The data from this experiment are plotted in Figure 6.2. This display is called a dot diagram.

Figure 6.2. Dot diagram for the tension bond strength


Visual examination of these data gives the immediate impression that the strength of the unmodified formulation is greater than the strength of the modified formulation. This impression is supported by comparing the average responses, ȳ1 = 16.76 kgf/cm² for the modified formulation and ȳ2 = 17.92 kgf/cm² for the unmodified formulation. The average responses in these two samples differ by what seems to be a nontrivial amount.

However, it is not obvious that this difference is large enough to imply that the two formulations are really different. Perhaps this observed difference in average strengths is the result of sampling fluctuation and the two formulations are really identical. Possibly another two samples would give the opposite result, with the strength of the modified formulation exceeding that of the unmodified one. A technique of statistical inference called hypothesis testing, or significance testing, can be used to assist the experimenter in comparing the two formulations. Hypothesis testing allows the comparison of the two formulations to be made on objective terms, with knowledge of the risks associated with reaching a wrong conclusion. To present procedures for hypothesis testing in simple comparative experiments, however, it is first necessary to develop and review some elementary statistical concepts.
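The mechanics of such a comparison can be sketched with a two-sample t-test in Python. The individual observations are not reproduced in the text, so the data below are synthetic numbers generated around the quoted sample means purely for illustration; the equal-variance assumption and the spread used are likewise illustrative.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic strength data (kgf/cm^2): 10 observations per formulation,
# generated around the sample means quoted above; purely illustrative.
modified   = rng.normal(16.76, 0.3, size=10)
unmodified = rng.normal(17.92, 0.3, size=10)

# Two-sample t-test of H0: mu1 = mu2 against H1: mu1 != mu2,
# assuming equal variances in the two populations.
t_stat, p_value = stats.ttest_ind(modified, unmodified, equal_var=True)
print(t_stat, p_value)     # a small p-value argues against H0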

6.2.1 Basic Concept of Probability

Engineers must deal with uncertainty, which is an all-pervading and dominant characteristic of engineering practice. Engineering uncertainty may occur in three basic ways.

Uncertainty occurs when the investigator measures something or makes predictions of dependent variables from measured or computed quantities. Engineers are inclined to assume that this kind of uncertainty does not exist at all. They tend to treat problems of this type as deterministic, and overdesign to compensate for uncertainty. This must be considered a rather crude treatment of this kind of uncertainty. It is necessary to tackle the problem in a more rational way, using the concepts of probability theory. For example, in the case of a towing resistance experiment, naval architects would expect that the measurement could be repeated with very consistent results. This is typical of engineering measurements, in which the discrimination of the instrument's scale corresponds to its inherent variation, so that one gets the impression that the measurement is exact, or almost exact. However, substantial uncertainty about the true value of the towing resistance may really exist. It may be due to a bias in the measurement technique, or to variations in temperature and humidity conditions in the model basin. Furthermore, the equation predicting total resistance at full scale may be empirical, with considerable uncertainty about the 'correct' value of the scaling coefficients. Or the predictive law may be a poor model for the true behavior, to an uncertain degree.

The second basic type of engineering uncertainty occurs when engineers are concerned with an event that may or may not occur, or whose time of occurrence may be uncertain. Typical of this would be hydrodynamic loads on a ship structure, or the frequency with which a seakeeping phenomenon reaches a critical level.


Designers may also be uncertain about the validity of a hypothesis or theory that is to be used to predict the performance of a technical system or subsystem. This is similar to the case mentioned in the discussion of the first type of uncertainty, the question of a poor model for the design analysis. However, designers are concerned here with whether or not a model, which was known to have good validity in certain circumstances, is applicable to a particular design for which the circumstances are uncertain. Here designers are concerned with the validity of one or perhaps several alternative models when the circumstances are well defined or well controlled.

To deal with uncertainty in a rational, ordered manner, engineers must be able, in a sense, to measure it. The term event will be used to designate all the kinds of things engineers are uncertain about. The measure of the degree of uncertainty about the likelihood of an event occurring is probability. In an attempt to measure probability, the experimenter begins by arbitrarily defining its range as 0 to 1, or 0 to 100%. If he/she is certain that an event will not occur, it is said to have a probability of 0; if he/she is certain that an event will occur, it has a probability of 1. In general, engineers are faced with the problem of measuring intermediate values of probability, where the measure is neither obvious from the real situation nor known a priori. Situations in which a priori probabilities may be assumed represent a rather special case, quite unrepresentative of engineering problems.

Designers are concerned with random or unpredictable events, so they have to understand what is really meant by the likelihood of an event occurring. In this respect, there are in fact several meanings. One straightforward meaning uses the frequency concept, which is clear enough for repeatable events with a priori probabilities. Probability can therefore be defined as a measure of the number of times one expects an event to occur. There are other, apparently different, meanings of probability. The first and rather simple meaning, defining probability as the relative frequency of occurrence of an event, should be examined a little more closely because of the difficulty experienced by many investigators in being forced to say what the probability measure tells about the frequency of occurrence of events in a sample. Experience shows that a series of a given number of trials or measurements of an event will not yield exactly the same frequency of occurrence each time. Experience also shows that the variation in frequency becomes smaller as the sample increases. Probability can be defined in this sense more precisely as the limit of the ratio of the number of occurrences to the number of trials, as the number of trials increases without limit.

Probability is also a measure of the risk that an event will or will not occur, a measure of uncertainty about the occurrence of an event, and a measure of the degree of a designer's belief that an event will occur. These are closely related concepts, and represent successive increases in the generality of the concept of probability.

6.2.2 Basic Statistical Concepts

Each observation in an experiment is called a run. Notice that the individual runs differ, so there is fluctuation, or noise, in the results. This noise is usually called experimental error or simply error. It is a statistical error since it arises from variations that are uncontrolled, unpredictable, and generally unavoidable. The presence of error or noise implies that the response variable is a random variable.

Random variables may be discrete or continuous. If the set of all possible values of the random variable is either finite or countably infinite, then the random variable is discrete, whereas if it is an interval, then the random variable is continuous. Discrete random design variables are rather rare in the design of technical components, but are not uncommon in the design of technical systems. Environmental random discrete variables also occur.

In engineering the concern is with the probability of occurrence of events, and the experimenter must relate random variables to events. If the variable is discrete, an event is the occurrence of any given value. The set of events defined by the random variable is predetermined in size, mutually exclusive, and collectively exhaustive. On the other hand, for continuous variables it is not possible to enumerate a finite number of events having a specific value, and therefore an event is defined as the occurrence of a value within a specified interval. A set of events in this case need not be defined. The experimenter may only be interested, for example, in the event that a continuous variable has a value greater than some specification.

Sample of Data

Because of the fundamental importance of the probability density function (PDF) in probabilistic design, it is extremely important to illustrate its concept. The PDF is the basic tool for codifying and communicating uncertainty about the value of a continuously varying variable.

Figure 6.3. Frequency histogram

There are several ways of developing the concept of a density function. Simple graphical methods are often used to assist in analyzing the data from an experiment. The dot diagram, illustrated in Figure 6.2, is a very useful device for displaying a small body of data (say, up to about 20 observations). The dot diagram enables the experimenter to see quickly the general location or central tendency of the observations and their spread.


If the data are fairly numerous, the dots in a dot diagram become difficult to distinguish, and a histogram may be preferable. If a sample of values of a random variable is analyzed to give frequencies in specified intervals, the results can be plotted as a histogram, as shown in Figure 6.3. The histogram shows the central tendency, spread, and general shape of the probabilistic distribution of the data.

The box plot (or box-and-whisker plot) is a very useful way to display data. A box plot displays the minimum, the maximum, the lower and upper quartiles (the 25th and the 75th percentile, respectively), and the median (the 50th percentile) on a rectangular box aligned either horizontally or vertically. The box extends from the lower quartile to the upper quartile, and a line is drawn through the box at the median. Lines (or whiskers) extend from the ends of the box to the minimum and maximum values.

Figure 6.4 presents the box plots for the two samples of tension bond strength in the portland cement mortar experiment. This display clearly reveals the difference in mean strength between the two formulations. It also indicates that both formulations produce reasonably symmetric distributions of strength with similar variability or spread.

Figure 6.4. Box plots

Dot diagrams, histograms, and box plots are useful for summarizing the information in a sample of data. To describe the observations that might occur in a sample more completely, the concept of the probability distribution is used.

Probability Distributions

The probability structure of a random variable, say y, is described by its probability distribution. If the random variable is discrete, the probability distribution of y, say p(y) = f(y), is often called the probability function of y and is represented directly by a probability mass function, as illustrated in Figure 6.5. The function f(y) then represents the actual probability of the value y occurring; it is its height that represents probability.


Figure 6.5. Discrete probability distribution

If y is continuous, the probability distribution of y, say f(y), is often called the probability density function (PDF) of y. Figure 6.6 illustrates a hypothetical continuous probability distribution together with its relationship with the cumulative distribution function F(y). It is the area under the curve f(y) associated with a given interval that represents probability.

Figure 6.6. Continuous probability distribution

The properties of probability distributions may be summarized quantitatively as follows:

\[
\text{if } y \text{ discrete:} \qquad 0 \le p(y_j) \le 1 \;\; \text{for all } y_j\,, \qquad P(y = y_j) = p(y_j) \;\; \text{for all } y_j\,, \qquad \sum_{j=1}^{n} p(y_j) = 1
\]
\[
\text{if } y \text{ continuous:} \qquad f(y) \ge 0\,, \qquad P(a \le y \le b) = \int_a^b f(y)\, \mathrm{d}y\,, \qquad \int_{-\infty}^{\infty} f(y)\, \mathrm{d}y = 1
\]

Characteristic Measures of a Random Variable

The random nature of a variable is commonly represented in a limited way by a central measure and a 'scatter' measure. The central measure may be the mean, the median, or the mode.

The mean value is a weighted average, in which the weighting factors are the probabilities associated with each value. It is a measure of the central tendency or location of a probability distribution. The mean is commonly designated µ and is defined mathematically by

\[
\mu =
\begin{cases}
\displaystyle \int_{-\infty}^{\infty} y\, f(y)\, \mathrm{d}y & \text{if } y \text{ continuous} \\[2ex]
\displaystyle \sum_{j=1}^{n} y_j\, p(y_j) & \text{if } y \text{ discrete}
\end{cases} \tag{6.1}
\]

where n is the set size, yj is the j-th discrete value, and p(yj) is the probability mass function.

The mean may also be called the expected value of y, designated E(y), or the long-run average value of the random variable y; it is defined as

\[
\mu = E(y) =
\begin{cases}
\displaystyle \int_{-\infty}^{\infty} y\, f(y)\, \mathrm{d}y & \text{if } y \text{ continuous} \\[2ex]
\displaystyle \sum_{j=1}^{n} y_j\, p(y_j) & \text{if } y \text{ discrete}
\end{cases} \tag{6.2}
\]

where E denotes the expected value operator.

The median is the value of the random variable for which any other sampled value is equally likely to be above or below it. Mathematically, it is defined by means of the cumulative distribution function as

\[
0.5 = \int_{-\infty}^{y_{0.5}} f(y)\, \mathrm{d}y = F(y_{0.5}) \tag{6.3}
\]

where y0.5 is the median. This can be generalized to give fractiles or percentiles

\[
\xi = \int_{-\infty}^{y_{\xi}} f(y)\, \mathrm{d}y = F(y_{\xi}) \tag{6.4}
\]


Thus y0.25 is the value of y corresponding to a cumulative distribution function value of 0.25, i.e. there is a 25% probability that a sampled y value will be less than y0.25; it is called the 25th percentile.

The mode is the most likely value of the random variable, and corresponds to the maximum of the probability density function, or of the probability mass function for a discrete variable. A density function can be multimodal, usually as a result of combining two different populations.

The spread or dispersion of a probability distribution can be measured by the variance, also called the second central moment, defined as

\[
\sigma^2 =
\begin{cases}
\displaystyle \int_{-\infty}^{\infty} (y - \mu)^2\, f(y)\, \mathrm{d}y & \text{if } y \text{ continuous} \\[2ex]
\displaystyle \sum_{j=1}^{n} (y_j - \mu)^2\, p(y_j) & \text{if } y \text{ discrete}
\end{cases} \tag{6.5}
\]

The square root of the variance is the standard deviation, σ, which is the commonly used characteristic measure of dispersion, or 'width', of the density function.

Notice that the variance can be expressed entirely in terms of expectation, since

σ2 = E[(y − µ)2] (6.6)

Finally, the variance is used so extensively that it is convenient to define a variance operator V such that
\[
V(y) = E[(y - \mu)^2] = \sigma^2 \tag{6.7}
\]
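For a discrete random variable, definitions (6.1), (6.5), and (6.7) reduce to weighted sums; the sketch below uses a fair six-sided die as an arbitrary illustration.

import numpy as np

# A fair six-sided die: values y_j and probabilities p(y_j)
y = np.arange(1, 7)
p = np.full(6, 1.0 / 6.0)

mu = np.sum(y * p)                    # E(y), eqs. (6.1)/(6.2)
var = np.sum((y - mu) ** 2 * p)       # V(y) = E[(y - mu)^2], eqs. (6.5)/(6.7)
sigma = np.sqrt(var)                  # standard deviation

print(mu, var, sigma)                 # 3.5, 2.9166..., 1.7078...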

The concepts of expected value and variance are used extensively in statistics, and it may be helpful to review several elementary results concerning these operators. If y is a random variable with mean µ and variance σ², and c is a constant, then:

• E(c) = c
• E(y) = µ
• E(cy) = c E(y) = c µ
• V(c) = 0
• V(y) = σ²
• V(cy) = c² V(y) = c² σ²

• If there are two random variables, for example y1 with E(y1) = µ1 and V(y1) = σ1², and y2 with E(y2) = µ2 and V(y2) = σ2², then
\[
E(y_1 + y_2) = E(y_1) + E(y_2) = \mu_1 + \mu_2
\]


• It is possible to show that
\[
V(y_1 + y_2) = V(y_1) + V(y_2) + 2\,\mathrm{Cov}(y_1, y_2)
\]
where
\[
\mathrm{Cov}(y_1, y_2) = E[(y_1 - \mu_1)\,(y_2 - \mu_2)]
\]
is the covariance of the random variables y1 and y2. The covariance is a measure of the linear association between y1 and y2. More specifically, it may be shown that if y1 and y2 are independent, then Cov(y1, y2) = 0.

• It may also be shown that
\[
V(y_1 - y_2) = V(y_1) + V(y_2) - 2\,\mathrm{Cov}(y_1, y_2)
\]

• If y1 and y2 are independent, then
\[
V(y_1 - y_2) = V(y_1) + V(y_2) = \sigma_1^2 + \sigma_2^2
\]
and
\[
E(y_1\, y_2) = E(y_1)\, E(y_2) = \mu_1\, \mu_2
\]

• However, note that, in general,
\[
E\!\left(\frac{y_1}{y_2}\right) \ne \frac{E(y_1)}{E(y_2)}
\]
regardless of whether or not y1 and y2 are independent.
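These identities are easy to verify numerically. The sketch below checks V(y1 + y2) = V(y1) + V(y2) + 2 Cov(y1, y2) on synthetic correlated data; the distribution, the correlation structure, and the sample size are arbitrary choices, and the sample versions of the variance and covariance (divisor n) satisfy the identity exactly.

import numpy as np

rng = np.random.default_rng(2)
n = 100_000

y1 = rng.normal(0.0, 1.0, n)
y2 = 0.5 * y1 + rng.normal(0.0, 1.0, n)          # correlated with y1

cov12 = np.mean((y1 - y1.mean()) * (y2 - y2.mean()))
lhs = np.var(y1 + y2)
rhs = np.var(y1) + np.var(y2) + 2.0 * cov12
print(lhs, rhs)                                   # equal up to rounding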

6.2.3 Sampling and Sampling Distributions

Random Sampling, Sample Mean, and Sample Variance

The objective of statistical inference is to draw conclusions about a population using a sample from that population. Most statistical inference methods assume that random samples are used. That is, if the population contains N elements and a sample of n of them is to be selected, and if each of the N!/[(N − n)! n!] possible samples has an equal probability of being chosen, then the procedure employed is called random sampling.

Statistical inference makes considerable use of quantities computed from the observations in the sample. A statistic is defined as any function of the observations in a sample that does not contain unknown parameters. For example, suppose that y1, y2, . . . , yn represents a sample. Then the sample mean

\[
\bar y = \frac{1}{n} \sum_{i=1}^{n} y_i \tag{6.8}
\]


and the sample variance

\[
S^2 = \frac{\displaystyle \sum_{i=1}^{n} (y_i - \bar y)^2}{n - 1} \tag{6.9}
\]

are both statistics. These quantities are measures of the central tendency and dispersion of the sample, respectively. Sometimes S = √S², called the sample standard deviation, is used as a measure of dispersion. Engineers often prefer to use the standard deviation to measure dispersion because its units are the same as those of the variable of interest y.
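In practice these statistics are computed directly; note that the (n − 1) divisor in (6.9) corresponds to ddof=1 in numpy. The data below are arbitrary illustrative numbers.

import numpy as np

y = np.array([2.3, 2.9, 3.1, 2.7, 3.0, 2.5, 2.8, 3.2, 2.6, 2.9])

y_bar = y.mean()                 # sample mean, eq. (6.8)
s2 = y.var(ddof=1)               # sample variance, eq. (6.9): divisor n - 1
s = y.std(ddof=1)                # sample standard deviation S

print(y_bar, s2, s)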

Properties of the Sample Mean and Sample Variance

The sample mean ȳ is a point estimator of the population mean µ, and the sample variance S² is a point estimator of the population variance σ². In general, an estimator of an unknown parameter is a statistic that corresponds to that parameter. Note that a point estimator is a random variable. A particular numerical value of an estimator, computed from sample data, is called an estimate.

There are several properties required of good point estimators. Two of the most important are the following:

1. The point estimator should be unbiased. That is, the long-run average or expected value of the point estimator should be the parameter that is being estimated. Although unbiasedness is desirable, this property alone does not always make an estimator a good one.

2. An unbiased estimator should have minimum variance, i.e. a variance that is smaller than the variance of any other estimator of that parameter.

It may easily be shown that ȳ and S² are unbiased estimators of µ and σ², respectively. First consider ȳ. Using the properties of expectation,

\[
E(\bar y) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu \tag{6.10}
\]

since the expected value of each observation yi is µ. Thus, ȳ is an unbiased estimator of µ.

Now consider the sample variance S²; one has

\[
E(S^2) = E\!\left[\frac{\displaystyle\sum_{i=1}^{n} (y_i - \bar y)^2}{n-1}\right] = \frac{1}{n-1}\, E\!\left[\sum_{i=1}^{n} (y_i - \bar y)^2\right] = \frac{1}{n-1}\, E(SS)
\]
where \(SS = \sum_{i=1}^{n} (y_i - \bar y)^2\) is the corrected sum of squares of the observations yi.


Since

\[
E(SS) = E\!\left[\sum_{i=1}^{n} (y_i - \bar y)^2\right] = E\!\left[\sum_{i=1}^{n} y_i^2 - n\,\bar y^{\,2}\right] = \sum_{i=1}^{n} (\mu^2 + \sigma^2) - n\,(\mu^2 + \sigma^2/n) = (n-1)\,\sigma^2 \tag{6.11}
\]

the expected sample variance reads

\[
E(S^2) = \frac{1}{n-1}\, E(SS) = \sigma^2 \tag{6.12}
\]

and it can be seen that S² is an unbiased estimator of σ².

Degrees of Freedom

The quantity (n − 1) in equation (6.11) is called the number of degrees of freedom of the sum of squares SS. This is a very general result; that is, if y is a random variable with variance σ² and SS = Σ(yi − ȳ)² has ν degrees of freedom, then

\[
E\!\left(\frac{SS}{\nu}\right) = \sigma^2 \tag{6.13}
\]

The number of degrees of freedom of a sum of squares is equal to the number of independent elements in that sum of squares.

Theoretical Continuous Distributions

Theoretical distributions tend to be used because of their convenience, despite rather limited physical justification. There is little need for this in design applications, where a purely numerical representation of distributions can be quite adequate when numerical simulations are used. However, if there is evidence that a theoretical distribution can be fitted well to a random variable, this is useful information, and more confidence can be placed in the small-sample information used to define the distribution parameters.

It is often possible, therefore, to determine the probability distribution of a particular statistic if one knows the probability distribution of the population from which the sample was drawn. The probability distribution of a statistic is called a sampling distribution. Several useful sampling distributions are briefly discussed below.

Gamma Function

It may be recalled that, for a Poisson process with a mean rate of occurrence per unit interval of λ, the probability mass function is

f (y;λ,t) =(λt)y ·e−λt

y!(6.14)


and it gives the probability that y events will occur in a given interval t, where λ is the mean occurrence rate (events per unit interval) and y is the random variable.

Now envisage a situation, still with a Poisson process, in which the experimenter wishes to consider t as the random variable rather than y. To achieve this let y be k, now a constant. It must also be noticed that f(y; λ, t) is a probability, and cannot be converted directly to a density function by simply redefining the variable. The transformation is made by working with the cumulative distribution function of the random variable t.

By definition,

F(t) = P(k occurrences within an interval of length t or less)

The complement to this is

1 − F(t) = P(not having k occurrences in an interval of length t or less)
         = P(0 occurrences in t, or 1 occurrence in t, . . . , or k − 1 occurrences in t)
         = P(0 occurrences in t) + P(1 occurrence in t) + . . . + P(k − 1 occurrences in t)

These probabilities are given by the Poisson distribution; thus

F(t) = 1 − Σ_{y=0}^{k−1} (λt)^y · e^(−λt) / y!

Differentiating this cumulative distribution function with respect to t gives the gamma density function

f(t) = λ (λt)^(k−1) · e^(−λt) / (k − 1)!

whose mean and variance are respectively

µ = k/λ ,    σ² = k/λ²

The gamma density function can also be generalized to noninteger k. It can be shown that it has the form

f(t) = λ (λt)^(k−1) · e^(−λt) / Γ(k)    (6.15)

where Γ(k) is the standard gamma function

Γ(k) = ∫_0^∞ s^(k−1) e^(−s) ds    (6.16)
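The connection between the Poisson process and the gamma distribution can be illustrated numerically. The short Python sketch below (an added illustration, with arbitrary values of λ and k) simulates the waiting time to the k-th event of a Poisson process and checks the mean k/λ and variance k/λ² quoted above.

```python
# A small simulation sketch: the waiting time to the k-th event of a Poisson
# process with rate lambda is the sum of k independent exponential gaps, and
# follows the gamma distribution of equation (6.15).
import random

lam, k, trials = 2.0, 3, 100_000
random.seed(1)

waits = []
for _ in range(trials):
    # time to the k-th event = sum of k independent exponential inter-arrival times
    waits.append(sum(random.expovariate(lam) for _ in range(k)))

mean = sum(waits) / trials
var = sum((w - mean) ** 2 for w in waits) / (trials - 1)
print("simulated mean    :", mean, " theory k/lambda   :", k / lam)      # ~1.5
print("simulated variance:", var,  " theory k/lambda^2 :", k / lam**2)   # ~0.75
```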

Normal Distribution

The normal or Gaussian function is historically the dominant theoretical probability function in the theory of statistics, and has a central role in the theory of statistical inference. Although it has been widely used in scientific work to represent populations arising from natural phenomena, and in error theory, its use in engineering work is much more limited. Its main disadvantages are that it must be symmetrical and that its tails extend to infinity at both ends.


If y is a normal random variable, then the probability density function of y has the form

f(y) = [1/(σ√(2π))] e^(−(y−µ)²/(2σ²))    −∞ < y < ∞    (6.17)

where −∞ < µ < ∞ is the mean of the distribution and σ > 0 is the standard deviation. The normal distribution is shown in Figure 6.7.

Because sample runs that differ as a result of experimental errors are often well described by a normal distribution, the normal distribution plays a central role in the analysis of data from designed experiments. Many important sampling distributions may also be defined in terms of normal random variables. The notation y ∼ N(µ, σ²) is often used to denote that y is normally distributed with mean µ and variance σ².

Figure 6.7. The normal distribution

An important special case of the normal distribution is the standard normal distribution, that is, µ = 0 and σ² = 1. It is evident that if y ∼ N(µ, σ²), then the random variable

z = (y − µ)/σ    (6.18)

follows the standard normal distribution, denoted z ∼ N(0,1). The operation demonstrated in equation (6.18) is often called standardizing the normal random variable y.

The normal distribution may be theoretically justified, in certain rather limited circumstances, by the central limit theorem.

The Central Limit Theorem

If y1, y2, . . . , yn is a sequence of n independent and identically distributed random variables with E(yi) = µ and V(yi) = σ² (both finite) and x = y1 + y2 + . . . + yn is the sum of a large number of independent elements, each of which has a small effect on the sum, then the distribution of the variable

zn = (x − nµ)/√(nσ²)

tends to be normal. In other terms, the statistic zn has an approximate N(0,1) distribution in the sense that, if Fn(z) is the distribution function of zn and Φ(z) is the distribution function of the N(0,1) random variable, then

lim_{n→∞} [Fn(z)/Φ(z)] = 1


This result states essentially that the sum of n independent and identically distributed random variables is approximately normally distributed. In many cases this approximation is good for very small n, say n < 10, whereas in other cases a large n is required, say n > 100. Frequently, the experimenter thinks of the error in an experiment as arising in an additive manner from several independent sources; consequently, the normal distribution becomes a plausible model for the combined experimental error.
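The following minimal Python sketch (illustrative only; it uses uniform random variables, whose mean and variance are 1/2 and 1/12) shows the central limit theorem at work by comparing a few simulated quantiles of zn with those of the N(0,1) distribution.

```python
# A minimal sketch of the central limit theorem: sums of n independent uniform
# random variables, standardized as z_n = (x - n*mu)/sqrt(n*sigma^2), are compared
# with the N(0,1) distribution through a few quantiles.
import math
import random

random.seed(1)
n, trials = 12, 50_000
mu, sigma2 = 0.5, 1.0 / 12.0          # mean and variance of U(0,1)

z = []
for _ in range(trials):
    x = sum(random.random() for _ in range(n))
    z.append((x - n * mu) / math.sqrt(n * sigma2))

z.sort()
for p, z_normal in [(0.05, -1.645), (0.50, 0.0), (0.95, 1.645)]:
    print(f"P = {p:.2f}: simulated quantile {z[int(p * trials)]:+.3f}, N(0,1) quantile {z_normal:+.3f}")
```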

Chi–square Distribution

Many statistical techniques assume that the random variable is normally distributed. An important sampling distribution that can be defined in terms of normal random variables is the chi-square or χ² distribution. If z1, z2, . . . , zk are normally and independently distributed random variables with mean 0 and variance 1, denoted NID(0,1), then the random variable

χ²_k = z1² + z2² + . . . + zk²

follows the chi-square distribution with k degrees of freedom.

The probability density function of chi-square is

f(χ²) = [1/(2^(k/2) Γ(k/2))] (χ²)^((k/2)−1) · e^(−χ²/2)    (6.19)

where Γ is the standard gamma function.

Several chi-square distributions are shown in Figure 6.8.

Figure 6.8. Several chi–square distributions

The distribution is asymmetric, or skewed, with mean and variance given respectively as

µ = k ,    σ² = 2k

As an example of a random variable that follows the chi-square distribution, suppose that y1, y2, . . . , yn is a random sample from an N(µ, σ²) distribution.


Then

SS/σ² = Σ_{i=1}^{n} (yi − ȳ)² / σ² ∼ χ²_{n−1}    (6.20)

that is, a sum of squares of normal random variables, when divided by σ², follows the chi-square distribution.

As many of the statistical techniques involve the computation and manipulation of sums of squares, the result given in equation (6.20) is extremely important and occurs repeatedly.

Examining equation (6.9), the sample variance can be written as

S² = SS/(n − 1)    (6.21)

If the observations in the sample are NID(µ, σ²), then the distribution of S² is [σ²/(n − 1)] χ²_{n−1}. Thus, the sampling distribution of the sample variance is a constant times the chi-square distribution if the population is normally distributed.

t distribution

If z and χ²_k are independent standard normal and chi-square random variables, respectively, then the random variable

t_k = z / √(χ²_k / k)    (6.22)

follows the t distribution (Student's t) with k degrees of freedom, denoted t_k.

Figure 6.9. Several t distributions


The density function of t is

f(t) = { Γ[(k + 1)/2] / [√(kπ) Γ(k/2)] } · 1/[(t²/k) + 1]^((k+1)/2)    −∞ < t < ∞    (6.23)

and the mean and variance of t are µ = 0 and σ² = k/(k − 2) for k > 2, respectively. Several t distributions are shown in Figure 6.9. Note that as k → ∞ the t distribution becomes the standard normal distribution. If y1, y2, . . . , yn is a random sample from the N(µ, σ²) distribution, then the quantity

t = (ȳ − µ) / (S/√n) ∼ t_{n−1}    (6.24)

is distributed as t with (n − 1) degrees of freedom.

F distribution

The final sampling distribution considered is the F distribution (Fisher distribution). If χ²_u and χ²_v are two independent chi-square random variables with u and v degrees of freedom, respectively, then the ratio

F_{u,v} = (χ²_u / u) / (χ²_v / v)    (6.25)

follows the F distribution with u numerator degrees of freedom and v denominator degrees of freedom. The probability density function of F is

h(F) = { Γ[(u + v)/2] (u/v)^(u/2) F^((u/2)−1) } / { Γ(u/2) Γ(v/2) [(u/v) F + 1]^((u+v)/2) }    (6.26)

Several F distributions are shown in Figure 6.10. This distribution is very important in the statistical analysis of designed experiments.

Figure 6.10. Several F distributions
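In practice, the percentage points of these sampling distributions are read from tables or obtained from statistical software. The following sketch, which assumes the SciPy library is available, shows how the upper-tail critical values of the t, chi-square and F distributions used in the following subsections may be obtained.

```python
# A hedged sketch (assuming SciPy is available) of how the percentage points of
# the sampling distributions above are obtained in practice; these upper-tail
# critical values are the quantities t_{alpha,k}, chi2_{alpha,k} and F_{alpha,u,v}
# used in the tests that follow.
from scipy import stats

alpha = 0.05
k, u, v = 10, 4, 20

t_crit    = stats.t.ppf(1 - alpha / 2, k)   # two-sided t critical value, k d.o.f.
chi2_crit = stats.chi2.ppf(1 - alpha, k)    # upper 5% point of chi-square, k d.o.f.
f_crit    = stats.f.ppf(1 - alpha, u, v)    # upper 5% point of F with (u, v) d.o.f.

print(f"t_0.025,10   = {t_crit:.3f}")     # about 2.228
print(f"chi2_0.05,10 = {chi2_crit:.3f}")  # about 18.307
print(f"F_0.05,4,20  = {f_crit:.3f}")     # about 2.866
```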


6.2.4 Inferences about the Differences in Means, Randomized Designs

This subsection discusses how the data from a simple comparative experiment can be analyzed using hypothesis testing and confidence interval procedures for comparing two treatment means. It is assumed that a completely randomized experimental design is used, in which the data are viewed as a random sample from a normal distribution.

Hypothesis Testing

A statistical hypothesis is a statement about the parameters of a probability distribution. This may be stated formally as

H◦ : µ1 = µ2
H1 : µ1 ≠ µ2

where µ1 and µ2 are the mean values of the two treatments. The statement H◦ : µ1 = µ2 is called the null hypothesis and H1 : µ1 ≠ µ2 is called the alternative hypothesis. The alternative hypothesis specified here is called a two–sided alternative hypothesis since it would be true either if µ1 < µ2 or if µ1 > µ2.

To test a hypothesis, a procedure is devised for taking a random sample, computing an appropriate test statistic, and then rejecting or failing to reject the null hypothesis H◦. Part of this procedure is specifying the set of values of the test statistic that leads to rejection of H◦. This set of values is called the critical region or rejection region for the test.

Two kinds of errors may be committed when testing hypotheses. If the null hypothesis is rejected when it is true, then a type I error has occurred. If the null hypothesis is not rejected when it is false, then a type II error has been made. The probabilities of these two errors are given special symbols:

α = P(type I error) = P(reject H◦ | H◦ is true)
β = P(type II error) = P(fail to reject H◦ | H◦ is false)

Sometimes it is more convenient to work with the power of the test, where

Power = 1 − β = P(reject H◦ | H◦ is false)

The general procedure in hypothesis testing is to specify a value of the probability of type I error α, often called the significance level of the test, and then design the test procedure so that the probability of type II error β has a suitably small value.

Suppose that it could be assumed that the variances of the two treatments are identical. Then an appropriate test statistic for comparing two treatment means in the completely randomized design is

t◦ = (ȳ1 − ȳ2) / [ Sp √(1/n1 + 1/n2) ]    (6.27)


where ȳ1 and ȳ2 are the sample means, n1 and n2 are the sample sizes, and Sp² is an estimate of the common variance σ1² = σ2² = σ² computed from

Sp² = [ (n1 − 1) S1² + (n2 − 1) S2² ] / (n1 + n2 − 2)    (6.28)

and S1² and S2² are the two individual sample variances.

To determine whether to reject the hypothesis H◦ : µ1 = µ2, the experimenter would compare t◦ to the t distribution with (n1 + n2 − 2) degrees of freedom. If |t◦| > t_{α/2, n1+n2−2}, where t_{α/2, n1+n2−2} is the upper α/2 percentage point of the t distribution with (n1 + n2 − 2) degrees of freedom, the experimenter would reject H◦ and conclude that the mean values of the two treatments differ.

In some problems, the experimenter may wish to reject the hypothesis H◦ only if one mean is larger than the other. Thus, he/she would specify a one–sided alternative hypothesis H1 : µ1 > µ2 and would reject H◦ only if t◦ > t_{α, n1+n2−2}. If the experimenter wants to reject H◦ only if µ1 is less than µ2, then the alternative hypothesis is H1 : µ1 < µ2 and he/she would reject H◦ if t◦ < −t_{α, n1+n2−2}.
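A minimal Python sketch of the two-sample test of equations (6.27) and (6.28) is given below; the two samples are hypothetical, and SciPy is assumed to be available for the critical value t_{α/2, n1+n2−2}.

```python
# A minimal sketch (with made-up sample data) of the pooled two-sample t test of
# equations (6.27) and (6.28).
import math
from scipy import stats

y1 = [16.85, 16.40, 17.21, 16.35, 16.52]   # hypothetical treatment 1 observations
y2 = [17.50, 17.63, 18.25, 18.00, 17.86]   # hypothetical treatment 2 observations
n1, n2 = len(y1), len(y2)

ybar1, ybar2 = sum(y1) / n1, sum(y2) / n2
s1_sq = sum((y - ybar1) ** 2 for y in y1) / (n1 - 1)
s2_sq = sum((y - ybar2) ** 2 for y in y2) / (n2 - 1)

sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)    # equation (6.28)
t0 = (ybar1 - ybar2) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))      # equation (6.27)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n1 + n2 - 2)
print(f"t0 = {t0:.3f}, critical value = {t_crit:.3f}")
print("reject H0" if abs(t0) > t_crit else "fail to reject H0")
```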

Choice of Sample Size

Selection of an appropriate sample size is one of the most important aspects of any experimental design problem. The choice of sample size and the probability of type II error β are closely connected. Suppose that the experimenter is testing the hypothesis

H◦ : µ1 = µ2
H1 : µ1 ≠ µ2

and that the means are not equal, with true difference δ = µ1 − µ2. Since the hypothesis H◦ : µ1 = µ2 is not true, one is concerned about wrongly failing to reject H◦. The probability of type II error depends on the true difference in means δ. A graph of β versus δ for a particular sample size is called the operating characteristic curve for the test. The β error is also a function of sample size. Generally, for a given value of δ, the β error decreases as the sample size increases. That is, a specified difference in means is easier to detect for larger sample sizes than for smaller ones.

A set of operating characteristic curves for the hypotheses

H◦ : µ1 = µ2
H1 : µ1 ≠ µ2

for the case where the two population variances σ1² and σ2² are unknown but equal (σ1² = σ2² = σ²) and for a level of significance of α = 0.05 is shown in Figure 6.11. The curves also assume that the sample sizes from the two populations are equal, that is, n1 = n2 = n. The parameter on the horizontal axis in Figure 6.11 is

d = |µ1 − µ2| / (2σ) = |δ| / (2σ)


Dividing |δ| by 2σ allows the experimenter to use the same set of curves regardless of the value of the variance (the difference in means is expressed in standard deviation units). Furthermore, the sample size used to construct the curves is actually n* = 2n − 1.

From examining these curves, the following is noted:

• the greater the difference in means, µ1 − µ2, the smaller the probability of type II error for a given sample size and probability α; that is, for a specified sample size and α, the test will detect large differences more easily than small ones;

• as the sample size gets larger, the probability of type II error gets smaller for a given difference in means and α; that is, to detect a specified difference δ, the experimenter may make the test more powerful by increasing the sample size.

Figure 6.11. Operating characteristics curves for the two–sided t-test with α = 0.05

Operating characteristic curves are often helpful in selecting a sample size to use in an experiment, and they play an important role in the choice of sample size in experimental design problems. For a discussion of the use of operating characteristic curves in other simple comparative experiments, see Hines and Montgomery (1990).

Confidence Intervals

Although hypothesis testing is a useful procedure, it sometimes does not tell the entire story. It is often preferable to provide an interval within which the value of the parameter or parameters in question would be expected to lie. These interval statements are called confidence intervals. In many engineering and industrial experiments, the experimenter already knows that the means µ1 and µ2 differ; consequently, hypothesis testing on µ1 = µ2 is of little interest. The experimenter would usually be more interested in a confidence interval on the difference in means µ1 − µ2.


To define a confidence interval, suppose that θ is an unknown parameter. To obtain an interval estimate of θ, the experimenter needs to find two statistics L and U such that the following probability statement holds

P(L ≤ θ ≤ U) = 1 − α    (6.29)

where the interval

L ≤ θ ≤ U    (6.30)

is called a 100 (1 − α) percent confidence interval for the parameter θ. The interpretation of this interval is that if, in repeated random samplings, a large number of such intervals are constructed, 100 (1 − α) percent of them will contain the true value of θ. The statistics L and U are called the lower and upper confidence limits, respectively, and (1 − α) is called the confidence coefficient. If α = 0.05, then equation (6.30) is called a 95 percent confidence interval for θ. Note that confidence intervals have a frequency interpretation; that is, the experimenter does not know whether the statement is true for this specific sample, but he/she does know that the method used to produce the confidence interval yields correct statements 100 (1 − α) percent of the time.

Case where σ1² ≠ σ2²

If the experimenter is testing the hypothesis

H◦ : µ1 = µ2
H1 : µ1 ≠ µ2

and cannot reasonably assume that the variances σ1² and σ2² are equal, then the two–sample t test must be modified slightly. The test statistic becomes

t◦ = (ȳ1 − ȳ2) / √( S1²/n1 + S2²/n2 )    (6.31)

This statistic is not distributed exactly as t. However, the distribution of t◦ is well approximated by t if one uses

ν = ( S1²/n1 + S2²/n2 )² / [ (S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1) ]    (6.32)

as the degrees of freedom.
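The following sketch (again with hypothetical data) applies equations (6.31) and (6.32); note that the approximate degrees of freedom ν need not be an integer.

```python
# A sketch of the unequal-variance test statistic of equation (6.31) with the
# approximate degrees of freedom of equation (6.32); SciPy is assumed available.
import math
from scipy import stats

y1 = [21.8, 23.0, 22.1, 24.5, 22.9, 23.7]   # hypothetical observations, treatment 1
y2 = [20.1, 19.4, 22.3, 21.0]               # hypothetical observations, treatment 2
n1, n2 = len(y1), len(y2)

ybar1, ybar2 = sum(y1) / n1, sum(y2) / n2
s1_sq = sum((y - ybar1) ** 2 for y in y1) / (n1 - 1)
s2_sq = sum((y - ybar2) ** 2 for y in y2) / (n2 - 1)

t0 = (ybar1 - ybar2) / math.sqrt(s1_sq / n1 + s2_sq / n2)            # equation (6.31)
v = (s1_sq / n1 + s2_sq / n2) ** 2 / (
    (s1_sq / n1) ** 2 / (n1 - 1) + (s2_sq / n2) ** 2 / (n2 - 1))     # equation (6.32)

print(f"t0 = {t0:.3f}, approximate degrees of freedom = {v:.1f}")
print("reject H0" if abs(t0) > stats.t.ppf(0.975, v) else "fail to reject H0")
```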


Case where σ1² and σ2² are known

If the variances of both populations are known, then the hypothesis

H◦ : µ1 = µ2
H1 : µ1 ≠ µ2

may be tested using the statistic

Z◦ = (ȳ1 − ȳ2) / √( σ1²/n1 + σ2²/n2 )    (6.33)

If both populations are normal, or if the sample sizes are large enough for the central limit theorem to apply, the distribution of Z◦ is N(0,1) when the null hypothesis is true. Thus, the critical region would be found using the normal distribution rather than the t distribution. Specifically, the experimenter would reject H◦ if |Z◦| > Z_{α/2}, where Z_{α/2} is the upper α/2 percentage point of the standard normal distribution.

Unlike the t test considered previously, the test on means with known variances does not require the assumption of sampling from normal populations. One can use the central limit theorem to justify an approximate normal distribution for the difference in sample means ȳ1 − ȳ2.

The 100 (1 − α) percent confidence interval on µ1 − µ2, where the variances are known, is

ȳ1 − ȳ2 − Z_{α/2} √( σ1²/n1 + σ2²/n2 ) ≤ µ1 − µ2 ≤ ȳ1 − ȳ2 + Z_{α/2} √( σ1²/n1 + σ2²/n2 )    (6.34)

As noted previously, the confidence interval is often a useful supplement to the hypothesis testing procedure.

Comparing a Single Mean to a Specified Value

Some experiments involve comparing only one population mean µ to a specified value, say µ◦. The hypotheses are

H◦ : µ = µ◦
H1 : µ ≠ µ◦

If the population is normal with known variance, or if the population is nonnormal but the sample size is large enough for the central limit theorem to apply, then the hypothesis may be tested using a direct application of the normal distribution. The test statistic is

Z◦ = (ȳ − µ◦) / (σ/√n)    (6.35)

If the hypothesis H◦ : µ = µ◦ is true, then the distribution of Z◦ is N(0,1). Therefore, the decision rule is to reject the null hypothesis if |Z◦| > Z_{α/2}. The value of the


mean µ◦ specified in the null hypothesis is usually determined in one of three ways. It may result from past evidence, knowledge, or experimentation. It may be the result of some theory or model describing the situation under study. Finally, it may be the result of contractual specifications. The 100 (1 − α) percent confidence interval on the true population mean is

ȳ − Z_{α/2} · σ/√n ≤ µ ≤ ȳ + Z_{α/2} · σ/√n    (6.36)

6.3 Experiments with a Single Factor

In the previous section, methods have been discussed for comparing two conditions or treatments. Another way to describe an experiment is to consider it as a single–factor experiment with a levels of the factor (variable), where the factor is the formulation of the phenomenon (property) and the levels are the different formulation methods.

6.3.1 Analysis of Variance

Suppose there are a treatments or different levels of a single factor that the experimenter wishes to compare. The observed response from each of the a treatments is a random variable. Generally, the data would appear as in Table 6.1, where each entry yij represents the jth observation taken under treatment i. There will be, in general, n observations under the ith treatment.

Level   Observations                  Totals   Averages
1       y11   y12   . . .   y1n       y1.      ȳ1.
2       y21   y22   . . .   y2n       y2.      ȳ2.
.       .     .     . . .   .         .        .
a       ya1   ya2   . . .   yan       ya.      ȳa.
                                      y..      ȳ..

Table 6.1. Typical data for a single-factor experiment

It may be useful to describe the observations by means of a linear statistical model

yij = µ + τi + εij        i = 1, 2, . . . , a ;  j = 1, 2, . . . , n    (6.37)

where µ is a parameter common to all treatments called the overall mean, τi is a parameter unique to the ith treatment called the ith treatment effect, and εij is a random error component. The objectives are to test appropriate hypotheses about the treatment effects and to estimate them. For hypothesis testing, the model errors are assumed to be normally and independently distributed random variables with mean zero and variance σ², which is assumed to be constant for all levels of the factor.


This model is called the one-way or single-factor analysis of variance because only one variable is investigated. Furthermore, it is required that the experiment be performed in random order so that the environment in which the treatments are used (often called the experimental units) is as uniform as possible. Thus, the experimental design is a completely randomized design.

Actually, the statistical model, equation (6.37), describes two different situations with respect to the treatment effects. First, the a treatments could have been specifically chosen by the experimenter who wishes to test hypotheses about the treatment means. In this situation the conclusions will apply only to the factor levels considered in the analysis and cannot be extended to similar treatments that were not explicitly considered. The experimenter might also wish to estimate the model parameters (µ, τi, σ²). This is called the fixed effects model.

Alternatively, the a treatments could be a random sample from a larger population of treatments. In this situation the experimenter would like to be able to extend the conclusions (which are based on the sample of treatments) to all treatments in the population, whether or not they were explicitly considered in the analysis. Here the τi are random variables, and knowledge about the particular ones investigated is relatively useless. Instead, the experimenter tests hypotheses about the variability of the τi and tries to estimate this variability. This is called the random effects model or components of variance model.

6.3.2 Fixed Effects Model

In the fixed effects model the treatment effects τi are usually defined as deviations from the overall mean, so that

Σ_{i=1}^{a} τi = 0    (6.38)

Let yi. represent the total of the observations under the ith treatment and ȳi. the average of the observations under the ith treatment (Table 6.1). Similarly, let y.. represent the grand total of all the observations and ȳ.. the grand average of all the observations. Expressed symbolically,

yi. = Σ_{j=1}^{n} yij ,    ȳi. = yi./n ,    i = 1, 2, . . . , a
y.. = Σ_{i=1}^{a} Σ_{j=1}^{n} yij ,    ȳ.. = y../N    (6.39)

where N = an is the total number of observations, and the 'dot' subscript notation implies summation over the subscript that it replaces.

The mean of the ith treatment is E(yij) = µi = µ + τi, i = 1, 2, . . . , a. Thus, the mean of the ith treatment consists of the overall mean plus the ith treatment effect. The experimenter is interested in testing the equality of the a treatment means; that is,


H◦ : µ1 = µ2 = . . . = µa
H1 : µi ≠ µj    for at least one pair (i, j)

Note that if the hypothesis H◦ is true, all treatments have a common mean µ. An equivalent way to write the above hypotheses is in terms of the treatment effects τi, say

H◦ : τ1 = τ2 = . . . = τa = 0
H1 : τi ≠ 0    for at least one i

Thus, it is equivalent to speak of testing the equality of treatment means or testing that the treatment effects (the τi) are zero. The appropriate procedure for testing the equality of a treatment means is the analysis of variance.

Decomposition of the Total Sum of Squares

The term analysis of variance is derived from a partitioning of total variability into its component parts. The total corrected sum of squares

SST = Σ_{i=1}^{a} Σ_{j=1}^{n} (yij − ȳ..)²

is used as a measure of overall variability in the data. Intuitively, this is reasonable since, if the experimenter were to divide SST by the appropriate number of degrees of freedom (in this case, an − 1 = N − 1), he/she would have the sample variance of the y's, which is, of course, a standard measure of variability.

Notice that the total corrected sum of squares SST may be written as

Σ_{i=1}^{a} Σ_{j=1}^{n} (yij − ȳ..)² = Σ_{i=1}^{a} Σ_{j=1}^{n} [ (ȳi. − ȳ..) + (yij − ȳi.) ]²
    = n Σ_{i=1}^{a} (ȳi. − ȳ..)² + Σ_{i=1}^{a} Σ_{j=1}^{n} (yij − ȳi.)² + 2 Σ_{i=1}^{a} Σ_{j=1}^{n} (ȳi. − ȳ..)·(yij − ȳi.)

However, the cross–product term in this expansion is zero, since

Σ_{j=1}^{n} (yij − ȳi.) = yi. − n ȳi. = yi. − n (yi./n) = 0

Therefore, the fundamental analysis of variance identity is obtained as

Σ_{i=1}^{a} Σ_{j=1}^{n} (yij − ȳ..)² = n Σ_{i=1}^{a} (ȳi. − ȳ..)² + Σ_{i=1}^{a} Σ_{j=1}^{n} (yij − ȳi.)²    (6.40)


Equation (6.40) states that the total variability in the data, as measured by the total corrected sum of squares, can be partitioned into a sum of squares of the differences between the treatment averages and the grand average, plus a sum of squares of the differences of observations within treatments from the treatment averages. Now, the difference between the observed treatment averages and the grand average is a measure of the differences between treatment means, whereas the differences of observations within a treatment from the treatment average can be due only to random error. Thus, equation (6.40) can be written symbolically as

SST = SSTreatments + SSE

where SSTreatments is called the sum of squares due to treatments (i.e., between treatments), and SSE is called the sum of squares due to error (i.e., within treatments). There are an = N total observations; thus, SST has (N − 1) degrees of freedom. There are a levels of the factor (and a treatment means), so SSTreatments has (a − 1) degrees of freedom. Finally, within any treatment there are n replicates providing (n − 1) degrees of freedom with which to estimate the experimental error. Since there are a treatments, there are a(n − 1) = an − a = N − a degrees of freedom for error.

It is helpful to examine explicitly the two terms on the right–hand side of the fundamental analysis of variance identity, equation (6.40). From the error sum of squares SSE, combining the a sample variances gives an estimate of the common variance within each of the a treatments as follows:

MSE = SSE / (N − a) = Σ_{i=1}^{a} Σ_{j=1}^{n} (yij − ȳi.)² / Σ_{i=1}^{a} (n − 1)

Similarly, if there were no differences between the a treatment means, one could use the differences of the treatment averages from the grand average to estimate σ². Specifically,

MSTreatments = SSTreatments / (a − 1) = n Σ_{i=1}^{a} (ȳi. − ȳ..)² / (a − 1)

is an estimate of σ² if the treatment means are equal.

It may be observed that the analysis of variance identity, equation (6.40), provides the experimenter with two estimates of σ²: one based on the inherent variability within treatments and one based on the variability between treatments. If there are no differences in the treatment means, these two estimates should be very similar; if they are not, the experimenter should suspect that the observed difference is caused by differences in the treatment means. Although an intuitive argument has been used to develop this result, a somewhat more formal approach can be taken.


The quantities MSTreatments and MSE are called mean squares. Their expected values, obtained by introducing the linear statistical model of equation (6.37) into the previous expressions, are respectively

E(MSTreatments) = σ² + n Σ_{i=1}^{a} τi² / (a − 1)

and

E(MSE) = σ²

Thus MSE = SSE/(N − a) estimates σ², and, if there are no differences in treatment means (which implies that τi = 0), MSTreatments = SSTreatments/(a − 1) also estimates σ². However, note that if the treatment means do differ, the expected value of the treatment mean square is greater than σ².

It seems clear that a test of the hypothesis of no difference in treatment means can be performed by comparing MSTreatments and MSE. It will now be illustrated how this comparison may be made.

Statistical Analysis

As a test of the hypothesis of no difference in treatment means can be performed by comparing MSTreatments and MSE, it is proper to investigate how a formal test of the hypothesis of no differences in treatment means (H◦ : µ1 = µ2 = . . . = µa, or equivalently, H◦ : τ1 = τ2 = . . . = τa = 0) can be performed. Since it has been assumed that the errors εij are normally and independently distributed with mean zero and variance σ², the observations yij are normally and independently distributed with mean µ + τi and variance σ². Thus, SST is a sum of squares of normally distributed random variables; consequently, it can be shown that SST/σ² is chi-square distributed with (N − 1) degrees of freedom. Furthermore, it can be shown that SSE/σ² is chi-square with (N − a) degrees of freedom and that SSTreatments/σ² is chi-square with (a − 1) degrees of freedom if the null hypothesis H◦ : τi = 0 is true. However, the three sums of squares are not all independent, since SSTreatments and SSE add to SST.

However, since according to Cochran's theorem SSTreatments/σ² and SSE/σ² are independently distributed chi-square random variables, if the null hypothesis of no difference in treatment means is true, the statistic ratio

F◦ = [SSTreatments/(a − 1)] / [SSE/(N − a)] = MSTreatments / MSE    (6.41)

is distributed as F with (a − 1) and (N − a) degrees of freedom. Equation (6.41) is the test statistic for the hypothesis of no differences in treatment means.

From the expected mean squares it may be noticed that, in general, MSE is an unbiased estimator of σ². Also, MSTreatments is an unbiased estimator of σ² under the null hypothesis. However, if the


null hypothesis is false, then the expected value of MSTreatments is greater than σ². Therefore, under the alternative hypothesis, the expected value of the numerator of the test statistic in equation (6.41) is greater than the expected value of the denominator, and the experimenter should reject H◦ for values of the test statistic that are too large. This implies an upper–tail, one–tail critical region. The experimenter would reject H◦ if

F◦ > F_{α, a−1, N−a}

where F◦ is computed from equation (6.41).

Computing formulas for the sums of squares may be obtained by rewriting and simplifying the definitions of SSTreatments and SST in equation (6.40). This yields

SST = Σ_{i=1}^{a} Σ_{j=1}^{n} yij² − y..²/N    (6.42)

and

SSTreatments = Σ_{i=1}^{a} yi.²/n − y..²/N    (6.43)

The error sum of squares is obtained by subtraction as

SSE = SST − SSTreatments (6.44)

The test procedure is summarized in Table 6.2, which is called an analysis of variance table.

Source of Variation          Sum of Squares   Degrees of Freedom   Mean Square    F◦
Between treatments           SSTreatments     a − 1                MSTreatments   MSTreatments/MSE
Error (within treatments)    SSE              N − a                MSE
Total                        SST              N − 1

Table 6.2. Analysis of variance table for the single-factor, fixed effects model
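The complete single-factor, fixed effects analysis of Table 6.2 can be reproduced in a few lines. The sketch below (hypothetical, balanced data; SciPy assumed available for the F percentage point) applies equations (6.41)-(6.44).

```python
# A minimal sketch (hypothetical, balanced data) of the analysis of variance of
# Table 6.2, computed with equations (6.42)-(6.44) and the F ratio (6.41).
from scipy import stats

data = [                                  # a = 3 treatments, n = 4 replicates each
    [28.1, 27.5, 28.9, 28.4],
    [30.2, 31.0, 29.8, 30.5],
    [27.9, 28.3, 27.4, 28.0],
]
a, n = len(data), len(data[0])
N = a * n

y_dotdot = sum(sum(row) for row in data)                                     # grand total y..
ss_t = sum(y ** 2 for row in data for y in row) - y_dotdot ** 2 / N          # (6.42)
ss_treat = sum(sum(row) ** 2 for row in data) / n - y_dotdot ** 2 / N        # (6.43)
ss_e = ss_t - ss_treat                                                       # (6.44)

ms_treat, ms_e = ss_treat / (a - 1), ss_e / (N - a)
f0 = ms_treat / ms_e                                                         # (6.41)
f_crit = stats.f.ppf(0.95, a - 1, N - a)

print(f"SS_Treatments = {ss_treat:.3f}, SS_E = {ss_e:.3f}, SS_T = {ss_t:.3f}")
print(f"F0 = {f0:.2f}, F_0.05,{a-1},{N-a} = {f_crit:.2f}")
print("reject H0: treatment means differ" if f0 > f_crit else "fail to reject H0")
```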

Information from Computer Packages

It may be observed that the sums of squares have been defined in terms of averages; that is, for example, from equation (6.40)

SSTreatments = n Σ_{i=1}^{a} (ȳi. − ȳ..)²

but the computing formulas were developed using totals; for example, to compute SSTreatments the analyst would use equation (6.43),


SSTreatments = Σ_{i=1}^{a} yi.²/n − y..²/N

The reason for this is numerical accuracy: the totals yi. and y.. are not subject to rounding error, whereas the averages ȳi. and ȳ.. are. Generally, the experimenter need not be too concerned with the computations, as there are many widely available computer programs for performing them. These computer programs are also helpful in performing many other analyses associated with experimental design (such as residual analysis).

In such programs, the sum of squares labelled 'model' is, for a single–factor design, simply the usual SSTreatments; for a balanced single–factor design the two are always identical.

In addition to the basic analysis of variance, the computer programs display some additional useful information, such as R², R²adj, the standard deviation, t–tests, etc. The computer programs also calculate and display the residuals.

R-square

The quantity R², also called the coefficient of determination, is calculated as

R² = SSR / SST = 1 − SSE / SST

where SSR = SST − SSE is the sum of squares explained by the model. R² is loosely interpreted as the proportion of the variability in a data set that is accounted for ('explained') by the statistical model, here the analysis of variance model. In this definition, the term 'variability' stands for variance or, equivalently, sum of squares. Clearly, 0 ≤ R² ≤ 1, with larger values being more desirable; values approaching 1 indicate a good fit.

R–square measures the relative predictive power of a model: if SSE is much smaller than SST, the model fits well. However, R–square can be a poor measure of goodness–of–fit, especially when it is misused. By definition, R² is the fraction of the total squared error that is explained by the model, but some data contain irreducible error, which places an upper limit on the attainable value of R². Sadly, many practitioners pursue very high–order polynomial models, driving R² towards 1 as the number of model terms approaches the number of observations, in the mistaken but widely held belief that forcing the model to pass close to every point makes it a good model.

It must be noticed that R² does not tell whether:

• the independent variables are a true cause of the changes in the dependent variable;
• omitted–variable bias exists;
• the correct regression was used;
• the most appropriate set of independent variables has been chosen;
• co–linearity is present in the data.


Adjusted R-square

The main drawback of R–square is that it always increases as the number of variables in the model increases. The alternative is to look at the adjusted R–square (R²adj) statistic.

Adjusted R–square is a modification of R² that adjusts for the number of terms in the model. Unlike R², the adjusted R² increases only if a new term improves the model. R²adj can even be negative, and is always less than or equal to R². It is defined as

R²adj = 1 − [SSE/(n − p)] / [SST/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − p)

where p is the total number of parameters in the linear model (including the intercept) and n is the sample size. The adjusted R² statistic will not always increase as variables are added to the model; in fact, if unnecessary terms are added, the value of R²adj will often decrease.
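The two statistics are easily computed from the sums of squares of the fitted model, as in the short sketch below (illustrative numbers only).

```python
# A small sketch of R-square and adjusted R-square as defined above; sse and sst
# would come from the analysis of variance of the fitted model, p is the number
# of model parameters and n the sample size (all values here are illustrative).
def r_square(sse: float, sst: float) -> float:
    return 1.0 - sse / sst

def adjusted_r_square(sse: float, sst: float, n: int, p: int) -> float:
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))

sse, sst, n = 12.5, 98.0, 20
print("R^2:", round(r_square(sse, sst), 3))
# with SSE held fixed, R^2 does not change, but R^2_adj is penalized as p grows
for p in (2, 5, 10):
    print("p =", p, " R^2_adj =", round(adjusted_r_square(sse, sst, n, p), 3))
```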

Standard Deviation

The standard deviation ('std dev') is the square root of the error mean square, and 'CV' is the coefficient of variation, defined as (√MSE / ȳ) · 100. The coefficient of variation measures the unexplained or residual variability in the data as a percentage of the mean of the response variable.

Student t-test

A t-test is any statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true. The t statistic was introduced by William Sealy Gosset for cheaply monitoring the quality of beer brews; 'Student' was his pen name. Today, it is more generally applied to the confidence that can be placed in judgements made from small samples.

Among the most frequently used t tests are:

• a test of the null hypothesis that the means of two normally distributed populations are equal;

• a test of whether the mean of a normally distributed population has a value specified in anull hypothesis;

• a test of whether the slope of a regression line differs significantly from 0.

6.3.3 Model Adequacy Checking

The decomposition of the variability in the observations through the fundamental analysis of variance identity, equation (6.40), is a purely algebraic relationship. However, the use of this partitioning to test formally for no differences in treatment means requires that certain assumptions be satisfied.

Specifically, these assumptions are that the observations are adequately described by the model

yij = µ + τi + εij


and that the errors are normally and independently distributed with mean zero and constant but unknown variance σ². If these assumptions are valid, then the analysis of variance procedure is an exact test of the hypothesis of no difference in treatment means.

In practice, however, these assumptions will usually not hold exactly. Consequently, it is usually unwise to rely on the analysis of variance until the validity of these assumptions has been checked. Violations of the basic assumptions and model adequacy can easily be investigated by examining the residuals. The residual for observation j in treatment i is defined as

eij = yij − ŷij    (6.45)

where ŷij is an estimate of the corresponding observation yij obtained as follows:

ŷij = µ̂ + τ̂i = ȳ.. + (ȳi. − ȳ..) = ȳi.    (6.46)

That is, the residuals for the ith treatment are found by subtracting the treatment average from each observation in that treatment. Model adequacy checking usually consists of plotting the residuals as described below. Such diagnostic checking should be a routine part of every experimental design project. Equation (6.46) gives the intuitively appealing result that the estimate of any observation in the ith treatment is just the corresponding treatment average.

Examination of the residuals should be an automatic part of any analysis of variance. If the model is adequate, the residuals should be structureless; that is, they should contain no obvious patterns. Through a study of the residuals, many types of model inadequacies and violations of the underlying assumptions can be discovered.

Normality Assumption

A check of the normality assumption may be made by plotting a histogram of the residuals. If the NID(0, σ²) assumption on the errors is satisfied, this plot should look like a sample from a normal distribution centered at zero. Unfortunately, with small samples considerable fluctuation often occurs, so the appearance of a moderate departure from normality does not necessarily imply a serious violation of the assumptions. Gross deviations from normality are potentially serious and require further analysis.

Another useful procedure is to construct a normal probability plot of the residuals. A normal probability plot is just a graph of the cumulative distribution of the residuals on normal probability paper, that is, graph paper with the ordinate scaled so that the cumulative normal distribution plots as a straight line. To construct a normal probability plot, arrange the residuals in increasing order and plot the kth of these ordered residuals versus the cumulative probability point Pk = (k − 1/2)/N on normal probability paper. If the underlying error distribution is normal, this plot will resemble a straight line. In visualizing the straight line, place more emphasis on the central values of the plot than on the extremes.
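A normal probability plot of the residuals can be produced directly from the construction just described, as in the following sketch (hypothetical residuals; SciPy and matplotlib are assumed to be available).

```python
# A sketch of the normal probability plot described above: the ordered residuals
# are plotted against the normal quantiles of the cumulative probability points
# P_k = (k - 1/2)/N.
import matplotlib.pyplot as plt
from scipy import stats

residuals = [-1.2, 0.4, 0.1, -0.6, 1.8, 0.9, -0.3, -1.0, 0.5, 0.2]   # hypothetical e_ij
N = len(residuals)

ordered = sorted(residuals)
p_k = [(k - 0.5) / N for k in range(1, N + 1)]
normal_quantiles = [stats.norm.ppf(p) for p in p_k]

plt.plot(normal_quantiles, ordered, "o")
plt.xlabel("standard normal quantile")
plt.ylabel("ordered residual")
plt.title("Normal probability plot of residuals")
plt.show()   # an approximately straight line supports the normality assumption
```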

The normal probability plot is shown in Figure 6.12, with the residuals plotted versus (1 − Pk) × 100 on the right vertical scale. Note that the bottom of this figure also gives a dot diagram of the


residuals. The general impression from examining this display is that the error distribution may be slightly skewed, with the right tail being longer than the left. The tendency of the normal probability plot to bend down slightly on the left side implies that the left tail of the error distribution is somewhat thinner than would be anticipated in a normal distribution; that is, the negative residuals are not quite as large (in absolute value) as expected.

Figure 6.12. Normal probability plot and dot diagram of residuals

In general, moderate departures from normality are of little concern in the fixed effects analysis of variance. An error distribution that has considerably thicker or thinner tails than the normal is of more concern than a skewed distribution. Since the F–test is only slightly affected, it can be said that the analysis of variance (and related procedures such as multiple comparisons) is robust to the normality assumption. Departures from normality usually cause both the true significance level and the power to differ slightly from the foreseen values, with the power generally being lower.

The random effects model is more severely impacted by nonnormality. In particular, the true confidence levels on interval estimates of variance components may differ greatly from the foreseen values.


Analysis of Residuals

If the model is correct and if the assumptions are satisfied, the residuals should be structureless; in particular, they should be unrelated to any other variable, including the response yij. A simple check is to plot the residuals versus the fitted values ŷij. For the single factor model, remember that ŷij = ȳi., the ith treatment average. This plot should not reveal any obvious pattern. Figure 6.13 plots the residuals versus the fitted values for an example experiment; no unusual structure is apparent.

Figure 6.13. Plot of residuals versus fitted values

A defect that occasionally shows up on this plot is nonconstant variance. Sometimes the variance of the observations increases as the magnitude of the observation increases. This would be the case if the error or background noise in the experiment were a constant percentage of the size of the observation. This commonly happens with many measuring instruments, where the error is a percentage of the scale reading. If this were the case, the residuals would get larger as ŷij gets larger, and the plot of residuals versus ŷij would look like an outward-opening funnel or megaphone. Nonconstant variance also arises in cases where the data follow a nonnormal, skewed distribution, because in skewed distributions the variance tends to be a function of the mean.

If the assumption of homogeneity of variances is violated, the F–test is only slightly affected in the balanced fixed effects model. However, in unbalanced designs or in cases where one variance is very much larger than the others, the problem is more serious. For the random effects model, unequal error variances can significantly disturb inferences on variance components even if balanced designs are used.

The usual approach to dealing with nonconstant variance, when it occurs for the above reasons, is to apply a variance–stabilizing transformation and then to run the analysis of variance on the


transformed data. In this approach, it should be noticed that the conclusions of the analysis of variance apply to the transformed populations.

Considerable research has been devoted to the selection of an appropriate transformation. If experimenters know the theoretical distribution of the observations, they may utilize this information in choosing a transformation. For example, if the observations follow the Poisson distribution, then the square root transformation y*ij = √yij or y*ij = √(1 + yij) would be used. If the data follow the log–normal distribution, then the logarithmic transformation y*ij = log yij is appropriate. For binomial data expressed as fractions, the arcsine transformation y*ij = arcsin √yij is useful. When there is no obvious transformation, the experimenter usually empirically seeks a transformation that equalizes the variance regardless of the value of the mean. In factorial experiments another approach is to select a transformation that minimizes the interaction mean square, resulting in an experiment that is easier to interpret. Transformations made for inequality of variance also affect the form of the error distribution; in most cases, the transformation brings the error distribution closer to normal.

6.3.4 Random Effects Model

Experimenters are frequently interested in factors that have a large number of possible levels. If the experimenter randomly selects a of these levels from the population of factor levels, then the factor is said to be random. Because the levels of the factor actually used in the experiment were chosen randomly, inferences are made about the entire population of factor levels. It is assumed that the population of factor levels is either of infinite size or is large enough to be considered infinite. Situations in which the population of factor levels is small enough to employ a finite population approach are not encountered frequently.

The linear statistical model, called the components of variance or random effects model, is

yij = µ + τi + εij        i = 1, 2, . . . , a ;  j = 1, 2, . . . , n    (6.47)

where both τi and εij are now random variables. If τi has variance στ² and is independent of εij, the variance of any observation is

V(yij) = στ² + σ²

where the variances στ² and σ² are called variance components; this is why the model of equation (6.47) is also called the components of variance model.

The sum of squares identity

SST = SSTreatments + SSE    (6.48)

is still valid. That is, the total variability in the observations is partitioned into a component that measures the variation between treatments (SSTreatments) and a component that measures the variation within treatments (SSE).


Testing hypotheses about individual treatment effects is meaningless; instead, the experimenter tests the hypotheses

H◦ : στ² = 0
H1 : στ² > 0

If στ² = 0, all treatments are identical; but if στ² > 0, variability exists between treatments. As for the fixed effects model, SSE/σ² is distributed as chi-square with (N − a) degrees of freedom and, under the null hypothesis, SSTreatments/σ² is distributed as chi-square with (a − 1) degrees of freedom. Both random variables are independent. Thus, under the null hypothesis στ² = 0, the statistic ratio

F◦ = [SSTreatments/(a − 1)] / [SSE/(N − a)] = MSTreatments / MSE    (6.49)

is distributed as F with (a − 1) and (N − a) degrees of freedom. However, the experimenter needs to examine the expected mean squares to fully describe the test procedure. It can be shown that

E(MSTreatments) = σ² + n στ²    (6.50)

E(MSE) = σ²    (6.51)

From the expected mean squares, the experimenter can see that under the hypothesis H◦ both the numerator and denominator of the test statistic, equation (6.49), are unbiased estimators of σ², whereas under the hypothesis H1 the expected value of the numerator is greater than the expected value of the denominator. Therefore, the experimenter should reject H◦ for values of F◦ that are too large. This implies an upper-tail, one-tail critical region, so he/she will reject H◦ if F◦ > F_{α, a−1, N−a}.

The computational procedure and the analysis of variance table for the random effects model are identical to those for the fixed effects model. The conclusions, however, are quite different because they apply to the entire population of treatments.

The experimenter is usually interested in estimating the variance components (σ² and στ²) in the model. The procedure used to estimate σ² and στ² is called the analysis of variance method because it makes use of the lines in the analysis of variance table. The procedure consists of equating the expected mean squares to their observed values in the analysis of variance table and solving for the variance components. In equating observed and expected mean squares in the single–factor random effects model, one obtains

MSTreatments = σ² + n στ²

and

MSE = σ²


Therefore, the estimators of the variance components are

σ̂² = MSE    (6.52)

σ̂τ² = (MSTreatments − MSE) / n    (6.53)

For unequal sample sizes, replace n in equation (6.53) by

n◦ = [1/(a − 1)] · [ Σ_{i=1}^{a} ni − Σ_{i=1}^{a} ni² / Σ_{i=1}^{a} ni ]    (6.54)

The analysis of variance method of variance component estimation does not require the normality assumption. It does yield estimators of σ² and στ² that are best quadratic unbiased (i.e., of all unbiased quadratic functions of the observations, these estimators have minimum variance).
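The analysis of variance method is easily applied once the mean squares are available; the sketch below (hypothetical, balanced data) evaluates equations (6.52) and (6.53).

```python
# A minimal sketch (hypothetical, balanced data) of the analysis of variance
# method for the random effects model: the variance components are estimated
# from equations (6.52) and (6.53).
data = [                                 # a = 4 randomly chosen levels, n = 3 each
    [12.1, 11.8, 12.4],
    [13.5, 13.9, 13.2],
    [11.0, 11.4, 10.8],
    [12.9, 12.6, 13.1],
]
a, n = len(data), len(data[0])
N = a * n

y_dotdot = sum(sum(row) for row in data)
ss_t = sum(y ** 2 for row in data for y in row) - y_dotdot ** 2 / N
ss_treat = sum(sum(row) ** 2 for row in data) / n - y_dotdot ** 2 / N
ss_e = ss_t - ss_treat

ms_treat, ms_e = ss_treat / (a - 1), ss_e / (N - a)
sigma2_hat = ms_e                                   # equation (6.52)
sigma2_tau_hat = (ms_treat - ms_e) / n              # equation (6.53)
print(f"sigma^2 estimate     : {sigma2_hat:.4f}")
print(f"sigma_tau^2 estimate : {sigma2_tau_hat:.4f}")
```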

Occasionally, the analysis of variance method produces a negative estimate of a variance component. Clearly, variance components are by definition nonnegative, so a negative estimate has to be viewed with some concern. One course of action is to accept the estimate and use it as evidence that the true value of the variance component is zero, assuming that sampling variation led to the negative estimate. This has intuitive appeal, but it suffers from some theoretical difficulties; for instance, using zero in place of the negative estimate can disturb the statistical properties of other estimates. Another alternative is to re–estimate the negative variance component using a method that always yields nonnegative estimates. Still another alternative is to consider the negative estimate as evidence that the assumed linear model is incorrect and to reexamine the problem.

6.4 Response Surface Methodology

Response surface methodology (RSM) is a collection of statistical and mathematical techniques useful for developing, improving, and optimizing processes. It also has important applications in the design, development, and formulation of new technical products, as well as in the improvement of existing technical systems.

The origin of RSM is the seminal paper by Box and Wilson (1951), which had a profound impact on industrial applications of experimental design and motivated much of the research in the field. The monograph by Myers (1976) was the first book devoted exclusively to RSM. There are also two other full–length books on the subject: Box and Draper (1987) and Khuri and Cornell (1996). The paper by Myers (1999) on future directions in RSM offers a view of research needs in the field.


The most extensive applications of RSM are in the industrial world, particularly in situations where several input variables potentially influence some performance measure or quality characteristic of the product or process. This performance measure or quality characteristic is called the response, which is typically measured on a continuous scale, although attribute responses and ranks are not unusual. The input variables are sometimes called independent variables, and they are subject to the control of the engineer or scientist, at least for the purposes of a test or an experiment.

Figure 6.14. Response surface (a) and contour plot (b) of a theoretical response surface

Figure 6.14 shows graphically the relationship between the response variable y in an industrial process and the two process variables (or independent variables) ξ1 and ξ2. Note that for each value of ξ1 and ξ2 there is a corresponding value of y, and that one may view these values of the response as a surface lying above the ξ1 − ξ2 plane, as in Figure 6.14(a). It is this graphical perspective of the problem environment that has led to the term response surface methodology. It is also convenient to view the response surface in the two–dimensional plane, as in Figure 6.14(b). In this representation one looks down at the ξ1 − ξ2 plane and connects all points that have the same response value to produce contour lines of constant response. This type of display is called a contour plot.


Clearly, if one could easily construct the graphical displays in Figure 6.14, optimization of this process would be very straightforward. By inspection of the plot, one may note that the response is maximized in the vicinity of ξ1 = 4 and ξ2 = 525. Unfortunately, in most practical situations the true response function in Figure 6.14 is unknown. The field of response surface methodology consists of the experimental strategy for exploring the space of the process or independent variables, empirical statistical modelling to develop an appropriate approximating relationship between the response and the process variables, and optimization methods for finding the levels or values of the process variables ξ1 and ξ2 that produce desirable values of the responses.

6.4.1 Approximating Response Functions

In general, suppose that the scientist or engineer or experimenter is concerned with a product,process, system, or physical phenomenon involving a response y that depends on the controllableinput variables β1,β2, . . . ,βk. The relationship is

y = f (β1,β2,, . . . ,βk) + ε (6.55)

where the form of the true response function f is unknown and perhaps very complicated, andε is a term that represents other sources of variability not accounted for in f . Thus ε includeseffects such as measurement error on the response, other sources of variation that are inherent inthe process or system (background noise), the effect of other variables, and so on. Generally, ε

is treated as a statistical error, often assuming it to have a normal distribution with mean zeroand variance σ2. If the mean of ε is zero, then

E (y) ≡ η = E [f (βl,β2, . . . ,βk)] + E (ε) = f (β1,β2, . . . ,βk) (6.56)

The variables β1,β2,, . . . ,βk in equation (6.56) are usually called the natural variables, becausethey are expressed in the natural units of measurement, such as kilowatts, pascal, or g/kWh. Inmuch RSM work it is convenient to transform the natural variables to coded variables x1,x2, . . . ,xk,which are usually defined to be dimensionless with mean zero and the same spread or standarddeviation. In terms of the coded variables, the true response function (6.56) is written as

η = f (x1,x2, . . . ,xk) (6.57)
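
To make the coding of natural variables concrete, the following short Python sketch (an addition, not part of the original text) maps two hypothetical natural variables, say a time in minutes and a temperature in °C, onto dimensionless coded variables centred at zero; the ranges, data values and function name are illustrative assumptions only.

    import numpy as np

    def code_variables(xi, low, high):
        """Map natural variables xi (one column per factor) to coded variables
        in [-1, +1]: x = (xi - centre) / half_range."""
        xi = np.asarray(xi, dtype=float)
        centre = (np.asarray(high) + np.asarray(low)) / 2.0
        half_range = (np.asarray(high) - np.asarray(low)) / 2.0
        return (xi - centre) / half_range

    # Hypothetical natural levels: time 30-50 min, temperature 150-160 degC
    natural = np.array([[30.0, 150.0],
                        [50.0, 150.0],
                        [30.0, 160.0],
                        [50.0, 160.0],
                        [40.0, 155.0]])
    coded = code_variables(natural, low=[30.0, 150.0], high=[50.0, 160.0])
    print(coded)   # the corners map to +/-1, the centre run maps to (0, 0)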

Because the form of the true response function f is unknown, it is necessary to approximate it. In fact, successful use of RSM is critically dependent upon the experimenter's ability to develop a suitable approximation for f. Usually, a low–order polynomial in some relatively small region of the independent variable space is appropriate. In many cases, either a first–order or a second–order model is used.

For the case of two independent variables, the first–order model in terms of the coded variables is

η = β◦ + β1 x1 + β2 x2 (6.58)


Figure 6.15 shows the three–dimensional response surface and the two–dimensional contour plot for a specific first–order model. In three dimensions, the response surface is a plane lying above the (x1, x2) space. The contour plot shows that the first–order model can be represented as parallel straight lines of constant response in the (x1, x2) plane.

Figure 6.15. Response surface (a) and contour plot (b) for a first–order model

The first–order model is likely to be appropriate when the experimenter is interested in approximating the true response surface over a relatively small region of the independent variable space, in a location where there is little curvature in f. For example, consider a small region around the point A in Figure 6.14(b); the first–order model would likely be appropriate here.

The form of the first–order model in equation (6.58) is sometimes called a main effects model, because it includes only the main effects of the two variables x1 and x2. If there is an interaction between these variables, it can easily be added to the model to obtain the first–order model with interaction, as follows

η = β◦ + β1 x1 + β2 x2 + β12 x1 x2 (6.59)

Figure 6.16 shows the three–dimensional response surface and the contour plot for a specific case where the interaction term β12 x1 x2 introduces curvature into the response function.

Often the curvature in the true response surface is strong enough that the first–order model is inadequate even with the interaction term included. A second–order model will likely be required in these situations. For the case of two variables, the second–order model is

η = β◦ + β1 x1 + β2 x2 + β11 x1^2 + β22 x2^2 + β12 x1 x2 (6.60)

This model would likely be useful as an approximation to the true response surface in a relatively small region around the point B in Figure 6.14(b), where there is substantial curvature in the true response function f.

Figure 6.16. Response surface (a) and contour plot (b) for a first–order model with interaction

Figure 6.17 presents the response surface and contour plot for a special case of the second–order model. Notice the mound–shaped response surface and elliptical contours generated by this model. Such a response surface could arise in approximating a response near a maximum point on the surface.

The second–order model is widely used in response surface methodology mainly because

• it is very flexible since it can take on a wide variety of functional forms, so it will often work well as an approximation to the true response surface;

• it is easy to estimate the coefficients (the β's) in the second–order model, say, by means of the least squares method;


• there is considerable practical experience indicating that second–order models work well in solving real response surface problems.

Figure 6.17. Response surface (a) and contour plot (b) for a second–order model

In general, the first–order model is

η = β◦ + β1 x1 + β2 x2 + . . . + βk xk (6.61)

and the second–order model is

η = β◦ + Σ_{j=1}^{k} βj xj + Σ_{j=1}^{k} βjj xj^2 + ΣΣ_{i<j} βij xi xj (6.62)

Figure 6.18 shows several different response surfaces and contour plots that can be generated by a second–order model. In some situations, approximating polynomials of order greater than two are used.

The general motivation for a polynomial approximation for the true response function f is based on the Taylor series expansion around the point (x10, x20, . . . , xk0).

For example, the first–order model is developed from the first–order Taylor series expansion

f = f(x10, x20, . . . , xk0) + (∂f/∂x1)_{x=x◦} (x1 − x10) + (∂f/∂x2)_{x=x◦} (x2 − x20) + . . . + (∂f/∂xk)_{x=x◦} (xk − xk0) (6.63)


where x refers to the vector of independent variables and x◦ is that vector of variables at the specific point (x10, x20, . . . , xk0). In equation (6.63) only the first–order terms are included in the expansion, thus implying the first–order approximating model in equation (6.61). If one were to include second–order terms in equation (6.63), this would lead to the second–order approximating model in equation (6.62).

Figure 6.18. Examples of types of surfaces defined by the second-order model in two variables

Finally, note that there is a close connection between RSM and linear regression analysis. For example, consider the model

y = β◦ + β1 x1 + β2 x2 + . . . + βk xk + ε

The β's are a set of unknown parameters. To estimate the values of these parameters, the experimenter must collect data on the process or technical system under study. Regression analysis is a branch of statistical model building that uses these data to estimate the β's. Because, in general, polynomial models are linear functions of the unknown β's, one refers to the technique as linear regression analysis. It should also be noticed that it is very important to plan the data collection phase of a response surface study carefully. In fact, special types of experimental designs, called response surface designs, are valuable in this regard.

6.4.2 Sequential Nature of RSM

Most applications of RSM are sequential in nature. That is, at first some ideas are generated concerning which factors or independent variables are likely to be important in the response surface study. This usually leads to an experiment designed to investigate these factors with a view toward eliminating the unimportant ones. These types of experiments are usually called screening experiments. Often at the outset of a response surface study there is a rather long list of variables that could be important in explaining the response. The objective of factor screening is to reduce this list of candidate variables to a relatively small number, so that subsequent experiments will be more efficient and require fewer runs or tests. The screening experiment is referred to as phase zero of a response surface study. The experimenter should never undertake a response surface analysis until a screening experiment has been performed to identify the important factors.

Once the important independent variables are identified, phase one of the response surface study begins. In this phase, the experimenter's objective is to determine if the current levels or settings of the independent variables result in a value of the response that is near the optimum, such as point B in Figure 6.14(b), or if the process is operating in some other region that is remote from the optimum, such as point A in Figure 6.14(b). If the current settings or levels of the independent variables are not consistent with optimum performance, then the experimenter must determine a set of adjustments to the process variables that will move the process toward the optimum. This phase of response surface methodology makes considerable use of the first–order model and an optimization technique called the method of steepest ascent.

Phase two of a response surface study begins when the process is near the optimum. At this point the experimenter usually wants a model that will accurately approximate the true response function within a relatively small region around the optimum. As the true response surface usually exhibits curvature near the optimum (see Figure 6.14), a second–order model (or perhaps some higher–order polynomial) will be used. Once an appropriate approximating model has been obtained, this model may be analyzed to determine the optimum conditions for the process.

Figure 6.19. Region of operability and region of experimentation

This sequential experimental process is usually performed within some region of the independent variable space called the operability region. Suppose one is currently operating at the levels shown as point A in Figure 6.19. Now it is unlikely that one would want to explore the entire region of operability with a single experiment. Instead, one usually defines a smaller region of interest or region of experimentation around the point A within the larger region of operability. Typically, this region of experimentation is either a cuboidal region, as shown around the point A in Figure 6.19, or a spherical region, as shown around point B.

The sequential nature of response surface methodology allows the experimenter to learn about the process or system under study as the investigation proceeds. This ensures that over the course of the RSM application the experimenter will learn the answers to questions such as

• how much replication is necessary;
• the location of the region of the optimum;
• the type of the most appropriate approximating function;
• the proper choice of experimental design;
• whether or not changes in the responses or in any of the process variables are required.

Often, when the experimenter is at a point on the response surface that is remote from the optimum, such as the current operating conditions in Figure 6.14, there is little curvature in the system and the first–order model will be appropriate. The strategic objective of RSM is to lead the experimenter rapidly and efficiently to the general vicinity of the optimum. Once the region of the optimum has been found, a more elaborate model such as the second–order model may be employed, and an analysis may be performed to locate the optimum. From Figure 6.14, one can see that the analysis of a response surface can be thought of as 'climbing a hill' where the top of the hill represents the point of maximum response. If the true optimum is a point of minimum response, then one may think of 'descending into a valley'. The eventual objective of RSM is to determine the optimum operating conditions for the system or to determine a region of the factor space in which operating specifications are satisfied. RSM is not used primarily to gain understanding of the physical mechanism of the system, although RSM may assist in the gaining of such knowledge. Furthermore, note that 'optimum' in RSM is used in a special sense: the 'hill climbing' procedures of RSM guarantee convergence to a local optimum only.

6.4.3 Objectives and Product Quality Improvement

Response surface methodology is useful in the solution of many types of industrial problems. Generally, these problems fall into three categories:

1. Mapping a Response Surface over a Particular Region of Interest. Some changes to normal operating levels might occasionally be necessary. If the true unknown response function has been approximated over a region around the current operating conditions with a suitable fitted response surface (say a second–order surface), then the process engineer and/or the designer can predict in advance the changes in the response that will result from any readjustments to the independent variables.

2. Selecting the Operating Conditions to Achieve Specifications or Customer Requirements. In most response surface problems there are several responses that must in some sense be simultaneously considered. In this case, one way to solve the problem is to obtain response surfaces for all the responses and then superimpose the contours for these responses.


3. Optimizing the Response. In the industrial world, a very important problem is determining the conditions that optimize a process or a subsystem of a technical product. An RSM study that has begun near point A in Figure 6.14(b) would eventually lead the experimenter to the region near point B. A second–order model could then be used to approximate the response in a narrow region around point B, and from examination of this approximating response surface the optimum levels or conditions for the independent variables could be chosen.

During the last 25 years, industrial organizations in the United States and Europe have become quite interested in quality improvement. Statistical methods, including statistical process control and design of experiments, play a key role in this activity. Quality improvement is most effective when it occurs early in the product and process development cycle. Industries such as semiconductors and electronics, aerospace, automotive, biotechnology and pharmaceuticals, medical devices, chemicals, and process industries are all examples where experimental design methodology has resulted in products that are easier to manufacture, have higher reliability, have enhanced product performance, and meet or exceed customer requirements.

RSM is an important branch of experimental design in this respect. It is a critical technology in developing new processes, optimizing their performance, and improving the design of new products. It is often an important concurrent engineering tool, in that product designers, process developers, quality controllers, manufacturing engineers, and operations personnel often work together in a team environment to apply RSM. The objectives of quality improvement, including reduction of variability and improved product and process performance, can often be accomplished directly using RSM.

It is well known that variation in key performance characteristics can result in poor product and process quality. During the 1980s, considerable attention was given to this problem, and methodology was developed for using experimental design, specifically for the following purposes:

• designing products or processes so that they are robust to environmental conditions;
• designing or developing products so that they are robust to component variation;
• minimizing variability in the output response of a product around a target value.

Robust means that the product or process performs consistently on target and is relatively insensitive to factors that are difficult to control. Taguchi (1981, 1983) used the term robust parameter design to describe his approach to this important class of industrial problems. Essentially, robust parameter design methodology seeks to reduce product or process variation by choosing levels of the controllable factors that make the system insensitive (or robust) to changes in a set of uncontrollable factors that represent most of the sources of variability. Taguchi referred to these uncontrollable factors as noise factors. These are environmental factors such as stowage factor levels, changes in prices of materials, fuel cost variability, interest rate on debt, and so on. It is usually assumed that these noise factors are uncontrollable in actual operation, but that they can be controlled during product or process design and development by means of DoE.

Considerable attention has been focused on the methodology advocated by Taguchi, and a number of flaws in his approach have been discovered. However, there are many useful concepts in his philosophy, and it is relatively easy to incorporate these within the framework of response surface methodology. Several attractive alternatives to his robustness studies have been developed that are based on the principles and philosophy of Taguchi while avoiding the flaws and controversy that surround his techniques.

6.5 Building Empirical Models

6.5.1 Linear Regression Models

In the practical application of response surface methodology it is necessary to develop an approximating model for the true response surface, which is typically driven by some physical mechanism. The approximating model is based on observed or computed data from the manufacturing process or technical system and is an empirical model. Multiple regression is a collection of statistical techniques useful for building the types of empirical models required in RSM.

As an example, suppose that the experimenter wishes to develop an empirical model relating the effective lift of an airfoil to the flow speed and the incidence angle. A first–order response surface model that might describe the relationship for an empirical model with two variables is

y = β0 + β1x1 + β2x2 + ε (6.64)

where y represents the lift, x1 represents the flow speed, and x2 denotes the incidence angle. This is a multiple linear regression model with two independent variables. The independent variables are often called predictor variables or regressors. The term 'linear' is used because equation (6.64) is a linear function of the unknown parameters β0, β1 and β2. The model describes a plane in the two–dimensional (x1, x2) space. The parameter β0 fixes the intercept of the plane. The parameters β1 and β2 are sometimes called partial regression coefficients, because β1 measures the expected change in y per unit change in x1 when x2 is held constant, and β2 measures the expected change in y per unit change in x2 when x1 is held constant.

In general, the response variable y may be related to k regressor variables. The model

y = β0 + β1x1 + β2x2 + . . . + βkxk + ε (6.65)

is called a multiple linear regression model with k regressor variables. The parameters βj, j = 0, 1, . . . , k, are called the regression coefficients. This model describes a hyperplane in the k-dimensional space of the regressor variables {xj}. The parameter βj represents the expected change in the response y per unit change in xj when all the remaining independent variables xi (i ≠ j) are held constant.

Models that are more complex in appearance than equation (6.65) may often still be analyzed by multiple linear regression techniques. For example, consider adding an interaction term to the first–order model in two variables, say

y = β0 + β1x1 + β2x2 + β12x1x2 + ε (6.66)


If one lets x3 = x1x2 and β3 = β12, then equation (6.66) can be written as

y = β0 + β1x1 + β2x2 + β3x3 + ε (6.67)

which is a standard multiple linear regression model with three regressors.

As another example, consider the second-order response surface model in two variables:

y = β0 + β1 x1 + β2 x2 + β11 x1^2 + β22 x2^2 + β12 x1 x2 + ε (6.68)

If one lets x3 = x1^2, x4 = x2^2, x5 = x1 x2, β3 = β11, β4 = β22, and β5 = β12, then equation (6.68) becomes

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ε (6.69)

which is a linear regression model.

In general, any regression model that is linear in the parameters (the β-values) is a linear regression model, regardless of the shape of the response surface that it generates.

Methods for estimating the parameters in multiple linear regression models will be illustrated next; this is often called model fitting. Methods for testing hypotheses and constructing confidence intervals for these models will also be discussed, as well as methods for checking the adequacy of the model fit. The focus is primarily on those aspects of regression analysis that are useful in RSM.

6.5.2 Parameters Estimation in Linear Regression Models

The method of least squares is typically used to estimate the regression coefficients in a multiple linear regression model. Suppose that n > k projects on the response variable are available, say y1, y2, . . . , yn. Along with each observed or computed response yi, the experimenter will have a value of each regressor variable; let xij denote the ith level of variable xj. The data matrix will appear as in Table 6.3. The error term ε in the model is assumed to have E(ε) = 0 and V(ε) = σ2, and the errors {εi} are assumed to be uncorrelated random variables.

y       x1      x2      x3      . . .   xk
y1      x11     x12     x13     . . .   x1k
y2      x21     x22     x23     . . .   x2k
...     ...     ...     ...             ...
yn      xn1     xn2     xn3     . . .   xnk

Table 6.3. Data matrix for multiple linear regression

In general, the model equation (6.65) may be written in terms of the regressors in Table 6.3 as

yi = β0 + β1 xi1 + β2 xi2 + . . . + βk xik + εi = β0 + Σ_{j=1}^{k} βj xij + εi ,   i = 1, 2, . . . , n (6.70)

The method of least squares chooses the β's in equation (6.70) so that the sum of the squares of the errors, εi, is minimized. The least squares function is


L = Σ_{i=1}^{n} εi^2 = Σ_{i=1}^{n} [ yi − β0 − Σ_{j=1}^{k} βj xij ]^2 (6.71)

The function L is to be minimized with respect to β0, β1, . . . , βk. The least squares estimators, say b0, b1, . . . , bk, must satisfy the system of equations

(∂L/∂β0)_{b0, b1, ..., bk} = −2 Σ_{i=1}^{n} [ yi − b0 − Σ_{j=1}^{k} bj xij ] = 0

(∂L/∂βj)_{b0, b1, ..., bk} = −2 Σ_{i=1}^{n} [ yi − b0 − Σ_{j=1}^{k} bj xij ] xij = 0 ,   j = 1, 2, . . . , k (6.72)

Simplifying equations (6.72), one obtains

n b0 + b1 Σ_{i=1}^{n} xi1 + b2 Σ_{i=1}^{n} xi2 + . . . + bk Σ_{i=1}^{n} xik = Σ_{i=1}^{n} yi

b0 Σ_{i=1}^{n} xi1 + b1 Σ_{i=1}^{n} xi1^2 + b2 Σ_{i=1}^{n} xi1 xi2 + . . . + bk Σ_{i=1}^{n} xi1 xik = Σ_{i=1}^{n} xi1 yi

. . . . . . . . .

b0 Σ_{i=1}^{n} xik + b1 Σ_{i=1}^{n} xik xi1 + b2 Σ_{i=1}^{n} xik xi2 + . . . + bk Σ_{i=1}^{n} xik^2 = Σ_{i=1}^{n} xik yi (6.73)

These equations are called the least squares normal equations. Note that there are p = k + 1 normal equations, one for each of the unknown regression coefficients. The solution to the normal equations will be the least squares estimators of the regression coefficients β0, β1, . . . , βk.

In scalar notation, the fitted model is

ŷi = b0 + Σ_{j=1}^{k} bj xij ,   i = 1, 2, . . . , n

The difference between the computed value yi and the fitted value ŷi is the residual of the ith project, say

ei = yi − ŷi
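
As an illustration of how the normal equations can be solved in practice, the following Python sketch (an addition, not the author's procedure) builds the model matrix for a first–order model with two regressors, solves (X'X) b = X'y with synthetic data, and forms the fitted values and residuals; all data values are invented for the example.

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic data for a first-order model in two regressors (illustrative only)
    n = 12
    x1 = rng.uniform(-1, 1, n)
    x2 = rng.uniform(-1, 1, n)
    y = 50 + 8*x1 - 3*x2 + rng.normal(0, 1, n)

    # Model matrix with a column of ones for the intercept b0
    X = np.column_stack([np.ones(n), x1, x2])

    # Least squares estimators: solve the normal equations (X'X) b = X'y
    b = np.linalg.solve(X.T @ X, X.T @ y)

    y_hat = X @ b          # fitted values
    e = y - y_hat          # residuals
    print("b =", b)
    print("residual sum of squares =", e @ e)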


6.5.3 Model Adequacy Checking

It is always necessary to (i) examine the fitted model to ensure that it provides an adequate approximation to the true physical phenomenon or process and (ii) verify that none of the least squares regression assumptions are violated. Proceeding with exploration and optimization of a fitted response surface will likely give poor or misleading results unless the model is an adequate fit. Several techniques for checking model adequacy are presented below.

Residual Analysis

The decomposition of the variability in observations through the analysis of variance identity (see equation 6.40) is a purely algebraic relationship. However, the use of the partitioning to test formally for no differences in treatment means requires that certain assumptions be satisfied. Specifically, these assumptions are that the observations are adequately described by the model

yij = µ + τi + εij

and that the errors are normally and independently distributed with mean zero and constant but unknown variance σ2. If these assumptions are valid, then the analysis of variance procedure is an exact test of the hypothesis of no difference in treatment means.

In practice, however, these assumptions will usually not hold exactly. Consequently, it is unwise to rely on the analysis of variance until the validity of these assumptions has been checked. Violations of the basic assumptions and model adequacy can easily be investigated by examination of the residuals. The residual from the least squares fit for the jth observation in the ith treatment, defined as

eij = yij − ŷij (6.74)

plays an important role in judging model adequacy.

In equation (6.74) ŷij is an estimate of the corresponding experimental or computed observation yij, obtained as follows

ŷij = µ̂ + τ̂i = ȳ.. + (ȳi. − ȳ..) = ȳi. (6.75)

Equation (6.75) gives the intuitively appealing result that the estimate of any observation in the ith treatment is just the corresponding treatment mean.

Examination of the residuals should be an automatic part of any analysis of variance. If the model is adequate, the residuals should be structureless; that is, they should contain no obvious patterns. Through a study of residuals, many types of model inadequacies and violations of the underlying assumptions can be discovered.

A check of the normality assumption may be made by constructing a normal probability plot of the residuals, as in Figure 6.20. If the residuals plot approximately along a straight line, then the normality assumption is satisfied. Figure 6.20 reveals no apparent problem with normality. When this plot indicates problems with the normality assumption, the response variable is often transformed as a remedial measure.

Figure 6.20. Normal probability plot of residuals

Figure 6.21 presents a plot of the residuals ei versus the predicted response ŷi. The general impression is that the residuals scatter randomly, suggesting that the variance of the original observations is constant for all values of y. If the variance of the response depends on the mean level of y, then this plot will often exhibit a funnel–shaped pattern. This is also suggestive of the need for a transformation of the response variable y.

Figure 6.21. Plot of residuals versus predicted response ŷi

Scaling Residuals

Standardized and Studentized Residuals

Many response surface analysts prefer to work with scaled residuals, in contrast to the ordinary least squares residuals. These scaled residuals often convey more information than do the ordinary residuals.


One type of scaled residual is the standardized residual

di = ei / σ ,   i = 1, 2, . . . , n (6.76)

where the standard deviation σ = √MSE is generally used in the computation. These standardized residuals have mean zero and approximately unit variance; consequently, they are useful in looking for outliers. Most of the standardized residuals should lie in the interval −3 ≤ di ≤ 3, and any observation with a standardized residual outside this interval is potentially unusual with respect to its observed response. These outliers should be carefully examined, because they may represent something as simple as a data recording error or something of more serious concern, such as a region of the regressor variable space where the fitted model is a poor approximation to the true response surface.

The standardizing process in equation (6.76) scales the residuals by dividing them by their average standard deviation. In some data sets, residuals may have standard deviations that differ greatly. A scaling that takes this into account is presented hereafter.

The vector of fitted values ŷ corresponding to the vector of computed values y is

ŷ = X b = H y (6.77)

The n × n matrix H is usually called the hat matrix because it maps the vector of computed values into the vector of fitted values. The hat matrix and its properties play a central role in regression analysis.

The residuals from the fitted model may be conveniently written in matrix notation as

e = y − ŷ (6.78)

There are several other ways to express the vector of residuals e that will prove useful, including

e = y −Hy = (I−H)y (6.79)

The hat matrix has several useful properties. It is symmetric (H′ = H) and idempotent (HH = H). Similarly, the matrix I − H is symmetric and idempotent.

The covariance matrix of the residuals is

V(e) = V[(I − H) y] = (I − H) V(y) (I − H)′ = σ2 (I − H) (6.80)

because V(y) = σ2 I and the matrix (I − H) is symmetric and idempotent. This matrix is generally not diagonal, so the residuals have different variances and they are correlated.

The variance of the ith residual is

V (ei) = σ2 (1− hii) (6.81)


where hii is the ith diagonal element of H. Because 0 ≤ hii ≤ 1, using the residual mean square MSE to estimate the variance of the residuals actually overestimates V(ei). Furthermore, because hii is a measure of the location of the ith point in x-space, the variance of ei depends upon where the point xi lies. Generally, residuals near the center of the x-space have larger variance than do residuals at more remote locations. Violations of the model assumptions are more likely at remote points, and these violations may be hard to detect from inspection of ei (or di) because their residuals will usually be smaller.

It is therefore recommended to take this inequality of variance into account when scaling the residuals. Instead of ei (or di) it is suggested to plot the studentized residuals

ri = ei / √[ σ2 (1 − hii) ] ,   i = 1, 2, . . . , n (6.82)

with σ2 = MSE.

The studentized residuals have constant variance V(ri) = 1 regardless of the location of xi when the form of the model is correct. In many situations the variance of the residuals stabilizes, particularly for large data sets. In these cases there may be little difference between the standardized and studentized residuals, which then convey equivalent information. However, because any point with a large residual and a large hii is potentially highly influential on the least squares fit, examination of the studentized residuals is generally recommended.
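
A minimal Python sketch of the scaled residuals discussed above may help: it forms the hat matrix, the ordinary residuals, and then the standardized residuals of equation (6.76) and the studentized residuals of equation (6.82). The data are synthetic and the code is illustrative only.

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 12, 2
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, k))])
    y = X @ np.array([50.0, 8.0, -3.0]) + rng.normal(0, 1, n)
    p = X.shape[1]

    H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix, y_hat = H y
    h = np.diag(H)
    e = y - H @ y                              # ordinary residuals
    MSE = (e @ e) / (n - p)                    # estimate of sigma^2

    d = e / np.sqrt(MSE)                       # standardized residuals (6.76)
    r = e / np.sqrt(MSE * (1.0 - h))           # studentized residuals (6.82)
    print(np.round(np.column_stack([h, e, d, r]), 3))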

PRESS Residuals

The prediction error sum of squares (PRESS), proposed by Allen (1971, 1974), provides a useful residual scaling. To calculate PRESS, select a project i, fit the regression model to the remaining n − 1 computations, and use this equation to predict the withheld response yi. Denoting this predicted value by ŷ(i), one may find the prediction error for point i as e(i) = yi − ŷ(i). The prediction error is often called the ith PRESS residual. This procedure is repeated for each project i = 1, 2, . . . , n, producing a set of n PRESS residuals e(1), e(2), . . . , e(n). The PRESS statistic is then defined as the sum of squares of the n PRESS residuals, as in

PRESS = Σ_{i=1}^{n} e(i)^2 = Σ_{i=1}^{n} [ yi − ŷ(i) ]^2 (6.83)

Thus PRESS uses each possible subset of n − 1 observations as an estimation data set, and every computation in turn is used to form a prediction data set. It would initially seem that calculating PRESS requires fitting n different regressions. However, it is possible to calculate PRESS from the results of a single least squares fit to all n observations. It turns out that the ith PRESS residual is

e(i) = ei / (1 − hii) (6.84)


Thus, because PRESS is just the sum of the squares of the PRESS residuals, a simple computing formula is

PRESS = Σ_{i=1}^{n} [ ei / (1 − hii) ]^2 (6.85)

From equation (6.84) it is easy to see that the PRESS residual is just the ordinary residual weighted according to the diagonal elements hii of the hat matrix. Data points for which hii is large will have large PRESS residuals. These computations will generally be high–influence points. Generally, a large difference between the ordinary residual and the PRESS residual indicates a point where the model fits the data well, but where a model built without that point predicts poorly.

The variance of the ith PRESS residual is

V[e(i)] = V[ ei / (1 − hii) ] = σ2 (1 − hii) / (1 − hii)^2 = σ2 / (1 − hii) (6.86)

so that the standardized PRESS residual is

e(i) / √V[e(i)] = [ ei / (1 − hii) ] / √[ σ2 / (1 − hii) ] = ei / √[ σ2 (1 − hii) ]

which, if MSE is used to estimate σ2, is just the studentized residual.

Finally, one may note that PRESS can be used to compute an approximate R2 for prediction, say

R2_prediction = 1 − PRESS / SST (6.87)

This statistic gives some indication of the predictive capability of the regression model: the closer R2_prediction is to unity, the better the model is expected to predict new observations.
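
The PRESS computation from a single fit, equations (6.84), (6.85) and (6.87), can be sketched in a few lines of Python; the data below are synthetic and the snippet is only meant to show the arithmetic.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 12
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, 2))])
    y = X @ np.array([50.0, 8.0, -3.0]) + rng.normal(0, 1, n)

    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y

    press_resid = e / (1.0 - h)                 # PRESS residuals, equation (6.84)
    PRESS = np.sum(press_resid**2)              # equation (6.85)

    SST = np.sum((y - y.mean())**2)             # total (corrected) sum of squares
    R2_pred = 1.0 - PRESS / SST                 # equation (6.87)
    print("PRESS =", PRESS, " R2_prediction =", R2_pred)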

R Student

The studentized residual ri discussed above is often considered an outlier diagnostic. It is customary to use MSE as an estimate of σ2 in computing ri. This is referred to as internal scaling of the residual, because MSE is an internally generated estimate of σ2 obtained from fitting the model to all n projects. Another approach would be to use an estimate of σ2 based on a data set with the ith project removed. Denote the estimate of σ2 so obtained by S(i)^2. It can be shown that

S(i)^2 = [ (n − p) MSE − ei^2/(1 − hii) ] / (n − p − 1) (6.88)

The estimate of σ2 in equation (6.88) is used instead of MSE to produce an externally studentized residual, usually called R-student, given by


ti = ei / √[ S(i)^2 (1 − hii) ] ,   i = 1, 2, . . . , n (6.89)

In many situations, ti will differ little from the studentized residual ri. However, if the ith project is influential, then S(i)^2 can differ significantly from MSE, and thus R-student will be more sensitive to this point. Furthermore, under the standard assumptions, ti has a t_{n−p−1} distribution. Thus R-student offers a more formal procedure for outlier detection via hypothesis testing. However, it is generally accepted that a formal approach is usually not necessary and that only relatively crude cut-off values need be considered. In general, a diagnostic view as opposed to a strict statistical hypothesis-testing view is best. Furthermore, detection of outliers needs to be considered simultaneously with detection of influential observations.

Figure 6.22 is a normal probability plot of the studentized residuals. It conveys exactly the same information as the normal probability plot of the ordinary residuals ei in Figure 6.20. This is because most of the hii-values are similar and there are no unusually large residuals. In some applications, however, the hii can differ considerably, and in those cases plotting the studentized residuals is the best approach.

Figure 6.22. Normal probability plot of studentized residuals

Influence Diagnostics

One may occasionally find that a small subset of the data exerts a disproportionate influence on the fitted regression model. That is, parameter estimates or predictions may depend more on the influential subset than on the majority of the data. The experimenter would like to locate these influential points and assess their impact on the model. If these influential points are 'bad' values, then they should be eliminated. On the other hand, there may be nothing wrong with these points, but if they control key model properties, the experimenter would like to know it, because it could affect the use of the model. Several useful measures of influence are described below.

Leverage Points

The disposition of points in the design space is important in determining model properties. In particular, remote points potentially have disproportionate leverage on the parameter estimates, the predicted values, and the usual summary statistics. The hat matrix H is very useful in identifying influential design points. As noted earlier, H determines the variances and covariances of ŷ and e, because V(ŷ) = σ2 H and V(e) = σ2 (I − H). The elements hij of H may be interpreted as the amount of leverage exerted by yj on ŷi. Thus inspection of the elements of H can reveal points that are potentially influential by virtue of their location in x-space. Attention is usually focused on the diagonal elements hii. Because Σ hii = rank(H) = rank(X) = p, the average size of the diagonal elements of the matrix H is p/n. As a rough guideline, then, if a diagonal element hii is greater than 2p/n, design point i is a high–leverage point.

Influence on Regression Coefficients

The hat diagonals will identify points that are potentially influential due to their location in the design space. It is desirable to consider both the location of the point and the response variable in measuring influence. Cook (1977, 1979) has suggested using a measure of the squared distance between the least squares estimate based on all n points, b, and the estimate obtained by deleting the ith point, say b(i). This distance measure can be expressed in the general form

Di(M, c) = (b(i) − b)′ M (b(i) − b) / c ,   i = 1, 2, . . . , n (6.90)

The usual choices of M and c are M = X′X and c = p·MSE, so that equation (6.90) becomes

Di(M, c) ≡ Di = (b(i) − b)′ X′X (b(i) − b) / (p·MSE) ,   i = 1, 2, . . . , n (6.91)

Points with large values of Di have considerable influence on the least squares estimates b. The magnitude of Di may be assessed by comparing it with Fα, p, n−p. If Di ≃ F0.5, p, n−p, then deleting point i would move b to the boundary of a 50% confidence region for β based on the complete data set.1 This is a large displacement and indicates that the least squares estimate is sensitive to the ith data point. Because F0.5, p, n−p ≃ 1, the experimenter usually considers points for which Di > 1 to be influential. Practical experience has shown that the cut-off value of 1 works well in identifying influential points.

The statistic Di may be rewritten as

Di = (ri^2 / p) · V[ŷ(xi)] / V(ei) = (ri^2 / p) · hii / (1 − hii) ,   i = 1, 2, . . . , n (6.92)

1 The distance measure Di is not an F random variable, but it is compared with an F-value because of the similarity of Di to the normal theory confidence ellipsoid.


Thus it can be noted that, apart from the constant p, Di is the product of the square of the ith studentized residual and the ratio hii/(1 − hii). This ratio can be shown to be the distance from the vector xi to the centroid of the remaining data. Thus Di is made up of a component that reflects how well the model fits the ith computed value yi and a component that measures how far that point is from the rest of the data. Either component (or both) may contribute to a large value of Di.
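
Cook's distance in the form of equation (6.92) can be obtained directly from the studentized residuals and the hat diagonals, as in the following Python sketch with synthetic data (an illustration added here, not part of the original text).

    import numpy as np

    rng = np.random.default_rng(5)
    n = 12
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, 2))])
    y = X @ np.array([50.0, 8.0, -3.0]) + rng.normal(0, 1, n)
    p = X.shape[1]

    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y
    MSE = (e @ e) / (n - p)
    r = e / np.sqrt(MSE * (1.0 - h))            # studentized residuals

    # Cook's distance via equation (6.92); points with D_i > 1 deserve attention
    D = (r**2 / p) * (h / (1.0 - h))
    print(np.round(np.column_stack([h, D]), 3))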

Testing for Lack of Fit

In RSM, the experimenter is usually fitting the regression model to data from a designed experiment. Afterwards, he/she may conduct a formal test for lack of fit of the regression model. For example, consider the data in Figure 6.23. There is some indication that the straight–line fit is not very satisfactory, and it would be helpful to have a statistical test to determine whether systematic curvature is present.

Figure 6.23. Lack of fit of a linear model

The lack–of–fit test requires having true replicates on the response y for at least one set of levels of the regressors x1, x2, . . . , xk. These are not just duplicate readings or measurements of y. For example, suppose that y is product viscosity and there is only one regressor x (temperature). True replication consists of running ni separate experiments (usually in random order) at x = xi and observing viscosity, not just running a single experiment at xi and measuring viscosity ni times. The readings obtained from the latter procedure provide information mostly on the variability of the method of measuring viscosity. The error variance σ2 includes measurement error, variability in the process over time, and variability associated with reaching and maintaining the same temperature level in different experiments. These replicate points are used to obtain a model–independent estimate of σ2.

Suppose that the experimenter has ni observations on the response at the ith level of the regressors xi. Let yij denote the jth observation on the response at xi, for i = 1, 2, . . . , m and j = 1, 2, . . . , ni.


There are n = Σ ni observations altogether. The test procedure involves partitioning the residual sum of squares into two components, say

SSE = SSPE + SSLOF

where SSPE is the sum of squares due to pure error and SSLOF is the sum of squares due to lack of fit.

To develop this partitioning of SSE , note that the (i,j)th residual is

yij − ŷi = (yij − ȳi) + (ȳi − ŷi) (6.93)

where ȳi is the average of the ni observations at xi. Squaring both sides of equation (6.93) and summing over i and j yields

Σ_{i=1}^{m} Σ_{j=1}^{ni} (yij − ŷi)^2 = Σ_{i=1}^{m} Σ_{j=1}^{ni} (yij − ȳi)^2 + Σ_{i=1}^{m} ni (ȳi − ŷi)^2 (6.94)

The left–hand side of equation (6.94) is the usual residual sum of squares. The two components on the right–hand side measure pure error and lack of fit, respectively. One can see that the pure error sum of squares

SSPE = Σ_{i=1}^{m} Σ_{j=1}^{ni} (yij − ȳi)^2 (6.95)

is obtained by computing the corrected sum of squares of the repeat observations at each level of x and then pooling over the m levels of x.

If the assumption of constant variance is satisfied, this is a model–independent measure of pure error, because only the variability of the y's at each xi level is used to compute SSPE. Because there are (ni − 1) degrees of freedom for pure error at each level xi, the total number of degrees of freedom associated with the pure error sum of squares is

Σ_{i=1}^{m} (ni − 1) = n − m (6.96)

The sum of squares for lack of fit

SSLOF = Σ_{i=1}^{m} ni (ȳi − ŷi)^2 (6.97)

is a weighted sum of squared deviations between the mean response ȳi at each xi level and the corresponding fitted value ŷi. If the fitted values ŷi are close to the corresponding average responses ȳi, then there is a strong indication that the regression function is linear. If the ŷi deviate greatly from the ȳi, then it is likely that the regression function is not linear. There are (m − p) degrees of freedom associated with SSLOF, because there are m levels of x but p degrees of freedom are lost because p parameters must be estimated for the model. Computationally, one usually obtains SSLOF by subtracting SSPE from SSE.


The test statistic for lack of fit is

F◦ = [ SSLOF /(m − p) ] / [ SSPE /(n − m) ] = MSLOF / MSPE (6.98)

The expected value of MSPE is σ2, and the expected value of MSLOF is

E(MSLOF) = σ2 + [ Σ_{i=1}^{m} ( E(yi) − β0 − Σ_{j=1}^{k} βj xij )^2 ] / (m − 2) (6.99)

If the true regression function is linear, then E(yi) = β0 + Σ_{j=1}^{k} βj xij and the second term of equation (6.99) is zero, resulting in E(MSLOF) = σ2. However, if the true regression function is not linear, then E(yi) ≠ β0 + Σ_{j=1}^{k} βj xij, and E(MSLOF) > σ2. Furthermore, if the true regression function is linear, then the statistic F◦ follows the F_{m−p, n−m} distribution. Therefore, to test for lack of fit, the experimenter would compute the test statistic F◦ and conclude that the regression function is nonlinear if F◦ > Fα, m−p, n−m.

This test procedure may easily be incorporated into the analysis of variance conducted for significance of regression. If one concludes that the regression function is not linear, then the tentative model must be abandoned and attempts made to find a more appropriate equation. Alternatively, if F◦ does not exceed Fα, m−p, n−m, there is no strong evidence of lack of fit, and MSPE and MSLOF are often combined to estimate σ2.
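
As a sketch of the lack-of-fit computation, the following Python fragment fits a straight line to illustrative data with true replicates at each x level, partitions SSE into SSPE and SSLOF, and forms the statistic F◦ of equation (6.98); the data are invented, and scipy is used only for convenience to obtain the F quantile.

    import numpy as np
    from scipy.stats import f as f_dist

    # Illustrative data: one regressor with true replicates at several x levels
    x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0])
    y = np.array([2.1, 2.3, 3.9, 4.2, 7.8, 8.1, 13.9, 14.3, 22.2, 21.8])
    n, p = len(y), 2                              # straight-line model: p = 2

    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ b
    SSE = np.sum((y - y_hat)**2)

    # Pure error: pool the corrected sums of squares of the replicates at each level (6.95)
    levels = np.unique(x)
    m = len(levels)
    SSPE = sum(np.sum((y[x == xi] - y[x == xi].mean())**2) for xi in levels)
    SSLOF = SSE - SSPE                            # lack-of-fit sum of squares

    F0 = (SSLOF / (m - p)) / (SSPE / (n - m))     # equation (6.98)
    F_crit = f_dist.ppf(0.95, m - p, n - m)
    print("F0 =", round(F0, 2), " F_crit(0.05) =", round(F_crit, 2))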

Ideally, one finds that the F-ratio for lack of fit is not significant and the hypothesis of significance of regression is rejected. Unfortunately, this does not guarantee that the model will be satisfactory as a prediction equation. Unless the variation of the predicted values is large relative to the random error, the model is not estimated with sufficient precision to yield satisfactory predictions. That is, the model may have been fitted to the errors only. Some analytical work has been done on developing criteria for judging the adequacy of the regression model from a prediction point of view; see Box and Wetz (1973), Ellerton (1978), Gunst and Mason (1979), Hill et al. (1978), and Suich and Derringer (1977). Box and Wetz's work suggests that the observed F-ratio must be at least four or five times the critical value from the F-table if the regression model is to be useful as a predictor, that is, if the spread of predicted values is to be large relative to the noise.

A relatively simple measure of potential prediction performance is found by comparing the range of the fitted values ŷ (i.e., ŷmax − ŷmin) with their average standard error. It can be shown that, regardless of the form of the model, the average variance of the fitted values is

V(ŷ) = (1/n) Σ_{i=1}^{n} V[ŷ(xi)] = p·σ2/n (6.100)

where p is the number of parameters in the model. In general, the model is not likely to be a satisfactory predictor unless the range of the fitted values ŷi is large relative to their average estimated standard error √(p·σ̂2/n), where σ̂2 is a model–independent estimate of the error variance.


6.5.4 Fitting a Second-Order Model

Many applications of response surface methodology involve fitting and checking the adequacy of a second-order model. A complete example of this process is presented hereinafter.

Suppose that after a screening experiment involving several factors the two most important variables were selected. Because the experimenter thought that the process was operating in the vicinity of the optimum, he elected to fit a quadratic model relating the response to those two variables. Table 6.4 shows the levels in terms of the coded variables x1 and x2.

Run      x1        x2        y
  4      -1        -1        43
 12       1        -1        78
 11      -1         1        69
  5       1         1        73
  6      -1.414     0        48
  7       1.414     0        78
  1       0        -1.414    65
  3       0         1.414    74
  8       0         0        76
 10       0         0        79
  9       0         0        83
  2       0         0        81

Table 6.4. Example of central composite design

Figure 6.24 shows the experimental design in Table 6.4 graphically.

Figure 6.24. Example of central composite design

This design is called a central composite design, and it is widely used for fitting a second–order response surface. Notice that the design consists of four runs at the corners of a square, plus four runs at the center of this square, plus four axial runs. In terms of the coded variables the corners of the square are (x1, x2) = (−1, −1), (1, −1), (−1, 1), (1, 1); the center points are at (x1, x2) = (0, 0); and the axial runs are at (x1, x2) = (−1.414, 0), (1.414, 0), (0, −1.414), (0, 1.414).
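
The layout of the central composite design can be generated programmatically; the following Python sketch (an illustration added here, not part of the original example) builds the twelve coded runs: four factorial corners, four axial points at ±1.414, and four centre points.

    import numpy as np

    def central_composite_2(alpha=1.414, n_center=4):
        """Two-factor central composite design in coded variables:
        four factorial corners, 2*k axial runs at distance alpha, n_center centre runs."""
        corners = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
        axial = np.array([[-alpha, 0], [alpha, 0], [0, -alpha], [0, alpha]])
        center = np.zeros((n_center, 2))
        return np.vstack([corners, axial, center])

    design = central_composite_2()
    print(design)        # 12 runs, matching the layout of Table 6.4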


The second–order model will be fitted as

y = β0 + β1 x1 + β2 x2 + β11 x1^2 + β22 x2^2 + β12 x1 x2 + ε

using the coded variables.

The matrix X and the vector y for this model are

              x1       x2       x1^2   x2^2   x1x2
        [ 1   -1       -1        1      1      1  ]          [ 43 ]
        [ 1    1       -1        1      1     -1  ]          [ 78 ]
        [ 1   -1        1        1      1     -1  ]          [ 69 ]
        [ 1    1        1        1      1      1  ]          [ 73 ]
  X =   [ 1   -1.414    0        2      0      0  ]   ,  y = [ 48 ]
        [ 1    1.414    0        2      0      0  ]          [ 76 ]
        [ 1    0       -1.414    0      2      0  ]          [ 65 ]
        [ 1    0        1.414    0      2      0  ]          [ 74 ]
        [ 1    0        0        0      0      0  ]          [ 76 ]
        [ 1    0        0        0      0      0  ]          [ 79 ]
        [ 1    0        0        0      0      0  ]          [ 83 ]
        [ 1    0        0        0      0      0  ]          [ 81 ]

Figure 6.25. Response surface (a) and contour plot (b) of predicted response


Notice that the variables associated with each column have been shown above that column in the matrix X. The entries in the columns associated with x1^2 and x2^2 are found by squaring the entries in the columns x1 and x2, respectively, and the entries in the x1x2 column are found by multiplying each entry from x1 by the corresponding entry from x2.

In many response surface problems the experimenter is interested in predicting the response y or estimating the mean response at a particular point in the process variable space. The response surface plots in Figure 6.25 give a graphical display of these quantities. Typically, the variance of the prediction is also of interest, because this is a direct measure of the likely error associated with the point estimate produced by the model.

Plots of √V[ŷ(x◦)], with σ2 estimated by the mean square error MSE, for all values of x◦ in the region of experimentation are presented in Figures 6.26(a) and 6.26(b). Both the response surface in Figure 6.26(a) and the contour plot of constant √V[ŷ(x◦)] in Figure 6.26(b) show that √V[ŷ(x◦)] is the same for all points x◦ that are at the same distance from the center of the design. This is a result of the spacing of the axial runs in the central composite design at 1.414 units from the origin (in the coded variables), and is a design property called rotatability. This is a very important property for a second–order response surface design.

Figure 6.26. Response surface (a) and contour plot (b) of variance of predicted response


6.5.5 Transformation of the Response Variable

It has been noted above that a data transformation can often be used when residual analysis indicates some problem with the underlying model assumptions, such as nonnormality or nonconstant variance of the response variable. Here the use of data transformation is illustrated by considering a 3^3 factorial experiment, taken from Box and Draper (1987), which supports a complete second–order polynomial. Its least squares fit is

ŷ = 550.7 + 660 x1 − 535.9 x2 − 310.8 x3 + 238.7 x1^2 + 275.7 x2^2 − 48.3 x3^2 − 456.5 x1 x2 − 235.7 x1 x3 + 143 x2 x3

The R2 value is 0.975. An analysis of variance is given in Table 6.5. The fit appears to be reasonable, and both the first– and second–order terms appear to be necessary.

Source of Variability     Sum of Squares (×10^−3)   Degrees of Freedom   Mean Square (×10^−3)     F0

First-order terms                14,748.5                    3                 4,916.2           70.0
Second-order terms                4,224.3                    6                   704.1            9.5
Residual                          1,256.6                   17                    73.9

Total                            20,229.4                   26

Table 6.5. Analysis of the variance for a quadratic model

Figure 6.27 is a plot of the residuals versus the predicted cycles to failure ŷ for this model. There is an indication of an outward–opening funnel in this plot, implying a possible inequality of variance.

Figure 6.27. Plot of residuals vs. predicted values for a quadratic model

When a natural log transformation is used for y, the following model is obtained

ln ŷ = 6.33 + 0.82 x1 − 0.63 x2 − 0.38 x3   ⇒   ŷ = e^(6.33 + 0.82 x1 − 0.63 x2 − 0.38 x3)


This model has R2 = 0.963 and only three model terms (apart from the intercept). None of the second–order terms is significant. Here, as in most modelling exercises, simplicity is of vital importance. The elimination of the quadratic and interaction terms through the change in response metric not only allows a better fit than the second–order model in the natural metric, but also makes the effect of the design variables x1, x2 and x3 on the response clear.

Figure 6.28 is a plot of the residuals versus the predicted response for the log model. There is still some indication of inequality of variance, but the log model, overall, is an improvement on the original quadratic fit.

Figure 6.28. Plot of residuals vs. predicted values for a log model

In the previous example, the problem of nonconstant variance of the response variable y in linear regression was illustrated. It was noted that this is a departure from the standard least squares assumptions. This inequality-of-variance problem occurs fairly often in practice, often in conjunction with a nonnormal response variable. Examples include a count of defects in input data, or a response variable that follows some skewed distribution (one tail of the response distribution is longer than the other). It has been illustrated how a transformation of the response variable can be used to stabilize the variance of the response.

Generally, transformations are used for three purposes: stabilizing the response variance, making the distribution of the response variable closer to the normal distribution, and improving the fit of the model to the data. This last objective could include model simplification, say by eliminating interaction or higher–order polynomial terms. Sometimes a transformation will be reasonably effective in accomplishing more than one of these objectives simultaneously.

It is often found that the power family of transformations y* = y^λ is very useful, where λ is the parameter of the transformation to be determined (e.g., λ = 1/2 means use the square root of the original response). Box and Cox (1964) have shown how the transformation parameter λ may be estimated simultaneously with the other model parameters (overall mean and treatment effects). The theory underlying their method uses the method of maximum likelihood. The actual computational procedure consists of performing, for various values of λ, a standard analysis of variance on

y^(λ) = (y^λ − 1) / (λ ẏ^(λ−1))     for λ ≠ 0
y^(λ) = ẏ ln y                      for λ = 0        (6.101)

where ẏ = ln^−1[(1/n) Σ ln y] is the geometric mean of the observations. The maximum likelihood estimate of λ is the value for which the error sum of squares, SSE(λ), is a minimum. This value of λ is usually found by plotting a graph of SSE(λ) versus λ and then reading from the graph the value of λ that minimizes SSE(λ). Usually between 10 and 20 values of λ are sufficient for estimating the optimum value. A second iteration using a finer mesh of values can be performed if a more accurate estimate of λ is necessary.
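
The Box-Cox search can be sketched as a simple grid evaluation of SSE(λ): for each λ the responses are transformed according to equation (6.101), the model is refitted, and the λ with the smallest error sum of squares is retained. The Python fragment below is illustrative only; the data are synthetic and generated so that a value of λ near zero (the log transformation) should be favoured.

    import numpy as np

    def box_cox_sse(y, X, lambdas):
        """For each lambda, transform y per equation (6.101) and return SSE(lambda)
        from a least squares fit of the model matrix X."""
        y = np.asarray(y, dtype=float)
        gm = np.exp(np.mean(np.log(y)))              # geometric mean of the responses
        sse = []
        for lam in lambdas:
            if abs(lam) < 1e-12:
                y_lam = gm * np.log(y)
            else:
                y_lam = (y**lam - 1.0) / (lam * gm**(lam - 1.0))
            b, *_ = np.linalg.lstsq(X, y_lam, rcond=None)
            resid = y_lam - X @ b
            sse.append(resid @ resid)
        return np.array(sse)

    # Illustrative positive responses and a simple first-order model matrix
    rng = np.random.default_rng(6)
    x = rng.uniform(0, 1, 20)
    y = np.exp(1.0 + 2.0*x + rng.normal(0, 0.2, 20))   # roughly log-normal response
    X = np.column_stack([np.ones_like(x), x])

    lambdas = np.linspace(-1, 1, 21)
    sse = box_cox_sse(y, X, lambdas)
    print("lambda minimizing SSE:", lambdas[np.argmin(sse)])   # expect a value near 0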

Notice that the experimenter cannot select a value of λ by directly comparing the error sums of squares from analyses of variance on y^λ, because for each value of λ the error sum of squares is measured on a different scale. Furthermore, a problem arises in y^λ when λ = 0; namely, as λ approaches zero, y^λ approaches unity. That is, when λ = 0, all the response values are a constant. The component (y^λ − 1)/λ in equation (6.101) alleviates this problem, because as λ tends to zero, (y^λ − 1)/λ goes to the limit ln y. The divisor ẏ^(λ−1) in equation (6.101) rescales the responses so that the error sums of squares are directly comparable.

In applying the Box-Cox method, it is recommended to use simple choices for λ, because the practical difference between λ = 0.5 and λ = 0.58 is likely to be small, while the square root transformation (λ = 0.5) is much easier to interpret. Obviously, values of λ close to unity would suggest that no transformation is necessary.

Once a value of λ is selected by the Box-Cox method, the experimenter can analyze the datausing yλ as the response, unless of course λ = 0, in which case he/she can use ln y. It is perfectlyacceptable to use yλ as the actual response, although the model parameter estimates will have ascale difference and origin shift in comparison with the results obtained using yλ (or ln y).

An approximate 100(1 − α)% confidence interval for λ can be found by computing

   SS* = SSE(λ) (1 + t²_{α/2,ν} / ν)                                                  (6.102)

where ν is the number of degrees of freedom, and plotting a line parallel to the λ-axis at height SS* on the graph of SSE(λ) versus λ. Then, by locating the points on the λ-axis where SS* cuts the curve SSE(λ), the experimenter can read confidence limits on λ directly from the graph. If this confidence interval includes the value λ = 1, this implies that the data do not support the need for the transformation.
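
The following short Python sketch (not part of the original text) illustrates the grid search for λ and the graphical confidence interval just described; the data vector y, the one-factor model matrix X, and the grid of trial values are purely illustrative assumptions.

import numpy as np
from scipy.stats import t

# Illustrative data (assumed): response y and model matrix X (intercept + one coded factor).
y = np.array([14.0, 16.0, 13.0, 15.5, 28.0, 31.0, 27.0, 30.0])
X = np.column_stack([np.ones(8), np.repeat([-1.0, 1.0], 4)])

ydot = np.exp(np.mean(np.log(y)))            # geometric mean of the observations

def sse(lam):
    # Scaled Box-Cox transform of equation (6.101), then SSE from a least-squares fit.
    z = ydot * np.log(y) if abs(lam) < 1e-12 else (y**lam - 1.0) / (lam * ydot**(lam - 1.0))
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    r = z - X @ beta
    return float(r @ r)

lams = np.linspace(-1.0, 2.0, 16)            # 10 to 20 trial values are usually enough
sse_vals = np.array([sse(l) for l in lams])
lam_best = lams[np.argmin(sse_vals)]

# Approximate 100(1 - alpha)% confidence interval from equation (6.102)
nu = len(y) - X.shape[1]                     # residual degrees of freedom
ss_star = sse_vals.min() * (1.0 + t.ppf(0.975, nu)**2 / nu)
ci = lams[sse_vals <= ss_star]
print(lam_best, ci.min(), ci.max())

A finer grid around lam_best can be used in a second pass, exactly as suggested above.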


6.6 Response Surface Methods and Designs

The eventual objective of Response Surface Methodology is to determine the optimum operating conditions for the technical system or to determine a region of the variable space in which operating specifications are satisfied. RSM is not used primarily to gain understanding of the physical mechanism of the technical system, although RSM may assist in the gain of such knowledge. Furthermore, note that ‘optimum’ in RSM is used in a special sense. The ‘hill climbing’ procedures of RSM guarantee convergence to a local optimum only.

6.6.1 Steepest Ascent Method

Frequently, the initial estimate of the optimum operating conditions for the technical system will be far from the actual optimum. In such circumstances, the objective of the experimenter is to move rapidly to the general vicinity of the optimum. It is desired to use a simple and economically efficient experimental procedure. When the solution is remote from the optimum, it is usually assumed that a first-order model is an adequate approximation to the true surface in a small region of the x’s.

Figure 6.29. First-order response surface and path of steepest ascent

The method of steepest ascent is a procedure for moving sequentially along the path of steepest ascent, that is, in the direction of the maximum increase in the response. Of course, if minimization is desired, then one is talking about the method of steepest descent. The fitted first-order model is

   ŷ = β0 + Σ_{i=1}^{k} βi xi                                                         (6.103)


and the first-order response surface, that is, the contours of ŷ, is a series of parallel lines such as that shown in Figure 6.29. The direction of steepest ascent is the direction in which ŷ increases most rapidly. This direction is parallel to the normal to the fitted response surface. One usually takes as the path of steepest ascent the line through the center of the region of interest and normal to the fitted surface. Thus, the steps along the path are proportional to the regression coefficients {βi}. The actual step size is determined by the experimenter based on process knowledge or other practical considerations.

Experiments are conducted along the path of steepest ascent until no further increase in response is observed. Then a new first–order model may be fit, a new path of steepest ascent determined, and the procedure continued. Eventually, the experimenter will arrive in the vicinity of the optimum, which is usually indicated by lack of fit of a first-order model. At that time additional experiments are conducted to obtain a more precise estimate of the optimum.
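
As a minimal illustration of how the path is generated in practice, the sketch below computes a few points along the path of steepest ascent from assumed first-order coefficients; the coefficient values and the step-size convention are illustrative assumptions, not taken from the text.

import numpy as np

# Assumed first-order coefficients (beta_1, ..., beta_k) from a fitted model in coded units.
beta = np.array([0.775, 0.325])

# Steps along the path of steepest ascent are proportional to the coefficients.
# A common practical choice is to let the variable with the largest |beta_i|
# move one coded unit per step.
step = beta / np.abs(beta).max()

path = np.array([i * step for i in range(6)])    # successive runs starting at the design centre
print(path)

Runs would then be performed at these coded coordinates (converted to natural units) until the response stops improving.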

6.6.2 Analysis of a Second-Order Model

When the experimenter is relatively close to the optimum, a model of second or higher degree is usually required to approximate the response because of curvature in the true response surface. In most cases, the second-order model

   ŷ = β0 + Σ_{i=1}^{k} βi xi + Σ_{i=1}^{k} βii xi² + Σ Σ_{i<j} βij xi xj             (6.104)

is adequate. Here the scope is to show how to use this fitted model to find the optimum set of feasible intervals (levels) for the x’s and to characterize the nature of the response surface.

Location of the Stationary Point

Suppose the experimenter wishes to find the levels of x1, x2, . . . , xk that optimize the predicted response. This point, if it exists, will be the set of x1, x2, . . . , xk for which the partial derivatives ∂ŷ/∂x1 = ∂ŷ/∂x2 = . . . = ∂ŷ/∂xk = 0. This point, say (x1,0, x2,0, . . . , xk,0), is called the stationary point. The stationary point could represent (i) a point of maximum response, (ii) a point of minimum response, or (iii) a saddle point.

The experimenter may obtain a general solution for the stationary point by writing the second-order model in matrix notation

   ŷ = β0 + x′b + x′Bx                                                                (6.105)

where

   x = (x1, x2, . . . , xk)′ ,     b = (β1, β2, . . . , βk)′

and B is the symmetric (k×k) matrix

        [ β11   β12/2   . . .   β1k/2 ]
   B =  [       β22     . . .   β2k/2 ]
        [                . . .        ]
        [                        βkk  ]


That is, b is a (k×1) vector of the first-order regression coefficients and B is a (k×k) symmetric matrix whose main diagonal elements are the pure quadratic coefficients {βii} and whose off–diagonal elements are one-half the mixed quadratic coefficients {βij, i ≠ j}. The derivative of ŷ with respect to the elements of the vector x, equated to 0, is

   ∂ŷ/∂x = b + 2Bx = 0                                                                (6.106)

The stationary point is the solution to equation (6.106)

   x0 = −(1/2) B⁻¹ b                                                                  (6.107)

Furthermore, by substituting equation (6.107) into equation (6.105), the predicted response at the stationary point can be found as

   ŷ0 = β0 + (1/2) x0′ b                                                              (6.108)

Characterizing the Response Surface

Once the experimenter has found the stationary point, it is usually necessary to characterize the response surface in the immediate vicinity of this point. That means to determine whether the stationary point is a point of maximum or minimum response or a saddle point, and the relative sensitivity of the response to the variables x1, x2, . . . , xk.

The most straightforward way to do this is to examine a contour plot of the fitted model. If there are only two or three independent variables (the x’s), the construction and interpretation of this contour plot is relatively easy. However, even when there are relatively few variables, a more formal analysis can be useful.

Figure 6.30. Canonical form of the second-order model


It is helpful first to transform the model into a new coordinate system with the origin at the stationary point x0 and then to rotate the axes of this system until they are parallel to the principal axes of the fitted response surface. This transformation is shown in Figure 6.30. It can be shown that this results in the fitted model

   ŷ = ŷ0 + λ1 w1² + λ2 w2² + . . . + λk wk²                                          (6.109)

where the {wi} are the transformed independent variables and the {λi} are constants. Equation (6.109) is called the canonical form of the model. Furthermore, the {λi} are just the eigenvalues or characteristic roots of the matrix B.

The nature of the response surface can be determined from the stationary point and the sign and magnitude of the {λi}. Suppose that the stationary point is within the region of exploration for fitting the second-order model. If the {λi} are all positive, then x0 is a point of minimum response; if the {λi} are all negative, then x0 is a point of maximum response; and if the {λi} have different signs, x0 is a saddle point. Furthermore, the surface is steepest in the wi direction for which |λi| is the greatest.
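
The sketch below (an illustration added here, with assumed coefficient values) carries out the computations of equations (6.107)–(6.109): it locates the stationary point, predicts the response there, and inspects the eigenvalues of B to classify the surface.

import numpy as np

# Assumed second-order fit in two coded variables:
#   y-hat = b0 + b'x + x'Bx, with B symmetric (off-diagonal entries are beta_ij / 2).
b0 = 79.94
b  = np.array([0.995, 0.515])
B  = np.array([[-1.376, 0.125],
               [ 0.125, -1.001]])

x0 = -0.5 * np.linalg.solve(B, b)      # stationary point, equation (6.107)
y0 = b0 + 0.5 * x0 @ b                 # predicted response there, equation (6.108)

lam, M = np.linalg.eigh(B)             # eigenvalues lambda_i of B, equation (6.109)
print(x0, y0, lam)
# All lambda_i < 0 -> maximum; all > 0 -> minimum; mixed signs -> saddle point.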

Ridge Systems

It is not unusual to encounter variations of the pure maximum, minimum, or saddle point response surfaces discussed above. Ridge systems, in particular, are fairly common. Consider the canonical form of the second-order model given previously in equation (6.109)

   ŷ = ŷ0 + λ1 w1² + λ2 w2² + . . . + λk wk²

Now suppose that the stationary point x0 is within the region of experimentation; furthermore, let one or more of the λi be very small (e.g., λi ≈ 0). The response variable is then very insensitive to the variables wi multiplied by the small λi. An example is shown in Figure 6.31 for k = 2 variables with λ1 = 0; in practice, λ1 would be close to zero. The canonical model for this response surface is theoretically

   ŷ = ŷ0 + λ2 w2²

with λ2 negative. Notice that the severe elongation in the w1 direction has resulted in a line of centers at ŷ = 70 and the optimum may be taken anywhere along that line. This type of response surface is called a stationary ridge system.

If the stationary point is far outside the region of exploration for fitting the second-order model and one (or more) λi is near zero, then the surface may be a rising ridge. Figure 6.32 illustrates a rising ridge for k = 2 variables with λ1 near zero and λ2 negative. In this type of ridge system, the experimenter cannot draw inferences about the true surface or the stationary point, since x0 is outside the region where the model has been fit. However, further exploration is warranted in the w1 direction. If λ2 had been positive, this system would be called a falling ridge.


Figure 6.31. A stationary ridge system

The distance of the stationary point from the design center is

   d = [ Σ_{i=1}^{k} x²i0 ]^{1/2}                                                     (6.110)

where xi0, i = 1, 2, . . . , k, are the coordinates of the stationary point.

Figure 6.32. A rising ridge system

When interpreting rising (or falling) ridge systems, d will usually be greater than unity; consequently, attempting to draw conclusions about the behavior of the response surface at x0 is risky. As noted above, the best approach is to continue exploration along the ridge in the direction of the optimum. In these cases, another canonical form may be helpful, say

   ŷ = β0 + θ1 w1 + θ2 w2 + . . . + θk wk + λ1 w1² + λ2 w2² + . . . + λk wk²
     = β0 + w′θ + w′Λw                                                                (6.111)

where θ = M′b and Λ = diag(λ1, λ2, . . . , λk). In this canonical form, the λ’s determine the type of fitted surface and the θ’s measure the slopes of the surface at the original origin x = 0 in the directions of the rotated axes w1, w2, . . . , wk.


6.6.3 Experimental Designs for Fitting Response Surfaces

Fitting and analyzing response surfaces is greatly facilitated by the proper choice of an experimental design. Here some desirable features in selecting appropriate designs for fitting response surfaces are recalled:

1. providing a reasonable distribution of data points throughout the region of interest;

2. allowing model adequacy, including lack of fit, to be investigated;

3. allowing experiments to be performed in blocks;

4. allowing designs of higher–order to be built up sequentially;

5. providing an internal estimate of error;

6. not requiring a large number of runs;

7. not requiring too many levels of the independent variables;

8. ensuring simplicity of calculation of the model parameters.

These features are sometimes conflicting, so subjective judgment must often be applied in response surface design selection (see Box and Draper, 1987; Khuri and Cornell, 1987).

Designs for Fitting the First-order Model

Suppose the experimenter wishes to fit the first–order model in k variables

   y = β0 + Σ_{i=1}^{k} βi xi + ε                                                     (6.112)

There is a unique class of designs that minimize the variance of the regression coefficients {βi}. These are the orthogonal first–order designs. A first–order design is orthogonal if the off–diagonal elements of the (x′x) matrix are all zero. This implies that the cross–products of the columns of the x matrix sum to zero.

The class of orthogonal first-order designs includes the 2^k factorial designs and fractional factorial designs of the 2^k series in which main effects are not aliased with each other. In using these designs, the k factors are assumed to be coded to the standardized levels ±1. As an example, suppose the experimenter uses a 2³ design to fit the first-order model

        β0   β1   β2   β3
   x = [ 1   −1   −1   −1
         1    1   −1   −1
         1   −1    1   −1
         1    1    1   −1
         1   −1   −1    1
         1    1   −1    1
         1   −1    1    1
         1    1    1    1 ]


It is easy to verify that the off–diagonal elements of (x′x) are zero for this design.
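
The check is also easy to carry out numerically; the short sketch below builds the 2³ model matrix and confirms that x′x is diagonal (the ordering of the runs is an implementation choice and does not affect the result).

import numpy as np
from itertools import product

# Model matrix of the 2^3 design for the first-order model (intercept plus three coded factors).
runs = np.array(list(product([-1.0, 1.0], repeat=3)))
X = np.column_stack([np.ones(len(runs)), runs])

XtX = X.T @ X
print(XtX)    # 8 times the identity: all off-diagonal elements are zero, so the design is orthogonal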

The 2^k design does not afford an estimate of the experimental error unless some runs are repeated. A common method of including replication in the 2^k design is to augment the design with several observations at the center (the point xi = 0, i = 1, 2, . . . , k). The addition of center points to the 2^k design does not influence the {βi} for i ≥ 1, but the estimate of β0 becomes the grand average of all observations. Furthermore, the addition of center points does not alter the orthogonality property of the design.

Another orthogonal first-order design is the simplex. The simplex is a regularly sided figure with k + 1 vertices in k dimensions. Thus, for k = 2 the simplex design is an equilateral triangle, and for k = 3 it is a regular tetrahedron. Simplex designs in two and three dimensions are shown in Figure 6.33.

Figure 6.33. The simplex design for k = 2 variables (a) and k = 3 variables (b)

Designs for Fitting the Second–Order Model

An experimental design for fitting a second-order model must have at least three levels of each factor. There are many designs that could be used for fitting a second–order model, so to select an appropriate design it is helpful to have a design criterion. For the first–order model, orthogonality is the optimal design property as it minimizes the variance of the regression coefficients. Since orthogonality is also desirable in the second–order case, because it results in a very useful property for the variance of the predicted response V(ŷ), this property is discussed here.

Figure 6.34(a) shows the information surface and contours for a 2² factorial design with four center points used to fit a first-order model, where the information function is defined as the reciprocal of the variance; that is,

   I_x = [V(ŷ)]⁻¹

Notice that the variance contours are concentric circles. An experimental design is said to be rotatable if the variance of the predicted response ŷ at some point x is a function only of the distance of the point from the design center and is not a function of direction. Furthermore, a design with this property will leave the variance of ŷ unchanged when the design is rotated about the center (0, 0, . . . , 0); hence, the name rotatable design.


Figure 6.34. Information surfaces and contours for various designs

Rotatability is a very important property in the selection of a response surface design. Since the purpose of RSM is optimization and the location of the optimum is unknown prior to running the experiment, it makes sense to use a design that provides equal precision of estimation in all directions. Note that any first–order orthogonal design is rotatable.

Figure 6.34(b) shows the information surface and contours for a 3² factorial design used to fit a second–order model. Notice that the 3² design is not rotatable. For this reason, 3^k designs and their fractions are not good choices for second-order response surface designs.


Figure 6.34(c) shows the information surface and contours for a second-order rotatable central composite design consisting of eight points on a circle plus four center points. Recall that the central composite design consists of a 2^k factorial or fractional factorial (coded to the usual ±1 notation) augmented by 2k axial points (±α, 0, 0, . . . , 0), (0, ±α, 0, . . . , 0), (0, 0, ±α, . . . , 0), . . . , (0, 0, 0, . . . , ±α) and nc center points (0, 0, . . . , 0). Central composite designs for k = 2 and k = 3 are shown in Figure 6.35. The central composite design is probably the most widely–used experimental design for fitting a second–order response surface.

Figure 6.35. Central composite design for k = 2 and k = 3

A central composite design is made rotatable by the choice of α. The value of α for rotatability depends on the number of points in the factorial portion of the design; in fact, α = (ni)^{1/4} yields a rotatable central composite design, where ni is the number of points used in the factorial portion of the design. Another useful property of the central composite design is that it may be ‘built up’ from the first–order design (the 2^k design) by adding the axial points and perhaps several center points.
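
As an illustration of this build-up property, the sketch below assembles a rotatable central composite design from its factorial, axial, and center portions; the number of center points is an assumed input.

import numpy as np
from itertools import product

def central_composite(k, n_center):
    # Factorial portion at +-1, axial points at +-alpha, n_center centre runs.
    factorial = np.array(list(product([-1.0, 1.0], repeat=k)))
    alpha = len(factorial) ** 0.25                    # alpha = (n_f)^(1/4) gives rotatability
    axial = np.vstack([alpha * np.eye(k), -alpha * np.eye(k)])
    return np.vstack([factorial, axial, np.zeros((n_center, k))])

D = central_composite(k=2, n_center=5)                # 4 + 4 + 5 = 13 runs, alpha = 1.414
print(D)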

   k               2      3      4      5      5      6      6      7      8
                                               rep./2        rep./2 rep./2 rep./2
   ni              4      8      16     32     16     64     32     64     128
   na              4      6      8      10     10     12     12     14     16
   nc (up)         5      6      7      10     6      15     9      14     20
   nc (orth.)      8      9      12     17     10     24     15     22     33
   N (up)          13     20     31     52     32     91     53     92     164
   N (orth.)       16     23     36     59     36     100    59     100    177
   α               1.414  1.682  2.000  2.378  2.000  2.828  2.378  2.828  3.364

Table 6.6. Orthogonal (orth.) and uniform-precision (up) rotatable central composite designs

Other properties of the central composite design may be controlled by the choice of the number of center points, nc. With proper choice of nc the central composite design may be made orthogonal, or it can be made a uniform–precision design. In a uniform-precision design, the variance of ŷ at the origin is equal to the variance of ŷ at unit distance from the origin. A uniform–precision design affords more protection against bias in the regression coefficients than does an orthogonal design, because of the presence of third–order and higher terms in the true surface. Table 6.6 provides the design parameters for both orthogonal and uniform–precision rotatable central composite designs for various values of k.

A variation of the central composite design is the face-centered central composite design, in which α = 1. This design locates the star or axial points on the centers of the faces of the cube, as shown in Figure 6.36 for k = 3. This variation of the central composite design is sometimes used because it requires only three levels of each factor, and in practice it is frequently difficult to change factor levels. However, face-centered central composite designs are not rotatable, and this is considered a serious disadvantage.

Figure 6.36. A face-centered central composite design for k = 3

Box and Behnken (1960) have proposed some three–level designs for fitting response surfaces. These designs are formed by combining 2^k factorials with incomplete block designs. The resulting designs are usually very efficient in terms of the number of required runs, and they are rotatable (or nearly rotatable). Table 6.7 shows a three-variable Box-Behnken design.

Run 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

x1 -1 -1 1 1 -1 -1 1 1 0 0 0 0 0 0 0

x2 -1 1 -1 1 0 0 0 0 -1 -1 1 1 0 0 0

x3 0 0 0 0 -1 1 -1 1 -1 1 -1 1 0 0 0

Table 6.7. A three-variable Box-Behnken design

The design is shown geometrically in Figure 6.37. Notice that the Box-Behnken design does not contain any points at the vertices of the cubic region created by the upper and lower limits for each variable. This could be advantageous when the points on the corners of the cube represent factor-level combinations that are impossible to test because of constraints of the physical phenomenon.
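
A sketch of how such a design can be generated is given below: each pair of factors is run through a 2² factorial while the remaining factors are held at zero, and center runs are appended; the number of center points is an assumed choice.

import numpy as np
from itertools import combinations, product

def box_behnken(k, n_center=3):
    # A 2^2 factorial on every pair of factors, with the remaining factors held at 0,
    # plus n_center centre runs.
    rows = []
    for i, j in combinations(range(k), 2):
        for a, b in product([-1.0, 1.0], repeat=2):
            run = np.zeros(k)
            run[i], run[j] = a, b
            rows.append(run)
    rows.extend(np.zeros(k) for _ in range(n_center))
    return np.array(rows)

print(box_behnken(3))    # 12 edge-midpoint runs + 3 centre runs, matching Table 6.7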

There are several other rotatable designs that are occasionally useful for problems involving two or three variables. These designs consist of points that are equally spaced on a circle (k = 2) or a sphere (k = 3) and form regular polygons or polyhedrons. Because the design points are equidistant from the origin, these arrangements are often called equiradial designs.

For k = 2, a rotatable equiradial design is obtained by combining n2 ≥ 5 points equally spaced on a circle with n1 ≥ 1 points at the center of the circle. Particularly useful designs for k = 2 are the pentagon and hexagon. These designs are shown in Figures 6.38(a) and 6.38(b). For k = 3, the only equiradial arrangements that contain enough points to allow all parameters in the second-order model to be estimated are the icosahedron (20 points) and the dodecahedron (12 points).

Figure 6.37. Box-Behnken design for three factors

Figure 6.38. Equiradial designs for two variables

Blocking in Response Surface Designs

When using response surface designs, it is often necessary to consider blocking to eliminate nuisance variables. For example, this problem may occur when a second-order design is assembled sequentially from a first-order design. A response surface design is said to block orthogonally if it is divided into blocks such that block effects do not affect the parameter estimates of the response surface model. If a 2^k or 2^(k−p) design is used as a first–order response surface design, the methods of 2^k factorial design may be used to arrange the runs in 2^r blocks. The center points in these designs should be allocated equally among the blocks.

For a second–order design to block orthogonally, two conditions must be satisfied. If there are nb observations in the bth block, then these conditions are:

1. Each block must be a first–order orthogonal design; that is,

      Σ_{u=1}^{nb} xiu xju = 0 ,     i ≠ j = 0, 1, . . . , k ,  for all b

   where xiu and xju are the levels of the ith and jth variables in the uth run of the experiment, with x0u = 1 for all u.


2. The fraction of the total sum of squares for each variable contributed by every block must be equal to the fraction of the total observations that occur in the block; that is

      Σ_{u=1}^{nb} x²iu / Σ_{u=1}^{N} x²iu = nb / N ,     i = 1, 2, . . . , k ,  for all b

where N is the number of runs in the design.

As an example of applying these conditions, consider a rotatable central composite design in k = 2 variables with N = 12 runs. The levels of x1 and x2 for this design may be written in the following design matrix D

           x1        x2
           −1        −1
           +1        −1
           −1        +1       Block 1
           +1        +1
            0         0
            0         0

           +1.414     0
           −1.414     0
            0        +1.414   Block 2
            0        −1.414
            0         0
            0         0

Notice that the design has been arranged in two blocks, with the first block consisting of the factorial portion of the design plus two center points and the second block consisting of the axial points plus two additional center points. It is clear that condition 1 is met; that is, both blocks are first-order orthogonal designs. To investigate condition 2, consider first block 1 and note that

   Σ_{u=1}^{n1} x²1u = Σ_{u=1}^{n1} x²2u = 4 ,     Σ_{u=1}^{N} x²1u = Σ_{u=1}^{N} x²2u = 8      and      n1 = 6

Therefore

   Σ_{u=1}^{n1} x²iu / Σ_{u=1}^{N} x²iu = n1 / N   →   4/8 = 6/12


hence, condition 2 is satisfied in block 1.

For block 2, one has

   Σ_{u=1}^{n2} x²1u = Σ_{u=1}^{n2} x²2u = 4      and      n2 = 6

Therefore

   Σ_{u=1}^{n2} x²iu / Σ_{u=1}^{N} x²iu = n2 / N   →   4/8 = 6/12

Since condition 2 is also satisfied in block 2, this design blocks orthogonally.
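
The two blocking conditions can also be verified numerically; the sketch below reproduces the check for the 12-run design above, printing the within-block cross-products and column sums (condition 1) and each block's share of Σx²iu against its share of the runs (condition 2).

import numpy as np

a = 1.414
block1 = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1], [0, 0], [0, 0]], float)
block2 = np.array([[a, 0], [-a, 0], [0, a], [0, -a], [0, 0], [0, 0]], float)
D = np.vstack([block1, block2])

for blk in (block1, block2):
    # Condition 1: first-order orthogonality within the block
    # (zero cross-product of the two columns and zero column sums against the intercept).
    print((blk[:, 0] * blk[:, 1]).sum(), blk.sum(axis=0))
    # Condition 2: the block's share of sum(x_i^2) equals its share of the runs.
    print((blk**2).sum(axis=0) / (D**2).sum(axis=0), len(blk) / len(D))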

In general, the central composite design can always be constructed to block orthogonally in two blocks, with the first block consisting of nf factorial points plus ncf center points and the second block consisting of the na = 2k axial points plus nca center points. The first condition for orthogonal blocking will always hold regardless of the value used for α in the design. For the second condition to hold,

   Σ_{axial block} x²iu / Σ_{factorial block} x²iu = (na + nca) / (nf + ncf)          (6.113)

The left–hand side of equation (6.113) is 2α²/nf, and after substituting in this quantity, one may solve the equation for the value of α that will result in orthogonal blocking as

   α = [ nf (na + nca) / (2 (nf + ncf)) ]^{1/2}                                       (6.114)

This value of α does not, in general, result in a rotatable design. If the design is also required to be rotatable, then α = (nf)^{1/4} and

   (nf)^{1/2} = nf (na + nca) / (2 (nf + ncf))                                        (6.115)

It is not always possible to find a design that exactly satisfies equation (6.115). For example, if k = 3 then nf = 8 and na = 6, and equation (6.115) reduces to

   (8)^{1/2} = 8 (6 + nca) / (2 (8 + ncf))   →   2.83 = (48 + 8 nca) / (16 + 2 ncf)

It is impossible to find values of nca and ncf that exactly satisfy this last equation. However, note that if ncf = 3 and nca = 2, then the right–hand side is


   (48 + 8·2) / (16 + 2·3) = 2.91

so the design nearly blocks orthogonally. In practice, one could relax somewhat the requirement of either rotatability or orthogonal blocking without any major loss of information.
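
The near-match noted above can be reproduced with a few lines; the sketch below compares the α of equation (6.114) with the rotatable value (nf)^{1/4} for k = 3 and the center-point allocation chosen in the text.

# alpha that gives exact orthogonal blocking, equation (6.114), versus the rotatable value (n_f)^(1/4).
n_f, n_a = 8, 6          # k = 3: full factorial and 2k axial points
n_cf, n_ca = 3, 2        # centre points in the factorial and axial blocks, as chosen in the text

alpha_orth = (n_f * (n_a + n_ca) / (2 * (n_f + n_cf))) ** 0.5
alpha_rot = n_f ** 0.25
print(alpha_orth, alpha_rot)     # about 1.706 versus 1.682, so the design nearly blocks orthogonally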

The central composite design is very versatile in its ability to accommodate blocking. If k is large enough, the factorial portion of the design can be divided into two or more blocks. The number of blocks must be a power of 2, with the axial portion forming a single block. Table 6.8 presents several useful blocking arrangements for the central composite design.

   k                                   2      3      4      5      5      6      6      7      7
                                                            rep./2        rep./2        rep./2
   Factorial block(s)
     ni                                4      8      16     32     16     64     32     128    64
     Number of blocks                  1      2      2      4      1      8      2      16     8
     Points in each block              4      4      8      8      16     8      16     8      8
     Center points in each block       3      2      2      2      6      1      4      1      1
     Total points in each block        7      6      10     10     22     9      20     9      9
   Axial block
     na                                4      6      8      10     10     12     12     14     14
     nca                               3      2      2      4      1      6      2      11     4
     Total points in the axial block   7      8      10     14     11     18     14     25     18
   Total number of points N            14     20     30     54     33     90     54     169    80
   Values of α
     Orthogonal blocking               1.414  1.633  2.000  2.366  2.000  2.828  2.366  3.364  2.828
     Rotatability                      1.414  1.682  2.000  2.378  2.000  2.828  2.378  3.333  2.828

Table 6.8. Some rotatable and near-rotatable central composite designs that block orthogonally

There are two important points about the analysis of variance when the response surface design has been run in blocks. The first concerns the use of center points to calculate an estimate of pure error. Only center points that are run in the same block can be considered to be replicates, so the pure error term can only be calculated within each block. If the variability is consistent across blocks, then these pure error estimates could be pooled. The second point concerns the block effect. If the design blocks orthogonally in m blocks, the sum of squares for blocks is

   SS_Blocks = Σ_{b=1}^{m} B²b / nb − G² / N                                          (6.116)


where Bb is the total of the nb observations in the bth block and G is the grand total of all N observations in all m blocks. When blocks are not exactly orthogonal, the general regression significance test (the ‘extra’ sum of squares method) can be used.


Bibliography

[1] Allen, D.M.: Mean Square Error of Prediction as a Criterion for Selecting Variables, Technometrics, Vol. 13, 1971, pp. 469–475.

[2] Allen, D.M.: The Relationship Between Variable Selection and Data Augmentation and a Method for Prediction, Technometrics, Vol. 16, 1974, pp. 125–127.

[3] Box, G.E.P. and Cox, D.R.: An Analysis of Transformations, Journal of the Royal Statistical Society, Series B, Vol. 26, 1964, pp. 211–243.

[4] Box, G.E.P. and Draper, N.R.: Empirical Model-Building and Response Surfaces, John Wiley & Sons, New York, 1987.

[5] Box, G.E.P. and Wetz, J.M.: Criterion for Judging the Adequacy of Estimation by an Approximation Response Polynomial, Technical Report no. 9, Department of Statistics, University of Wisconsin, Madison, 1973.

[6] Box, G.E.P. and Wilson, K.B.: On the Experimental Attainment of Optimum Conditions, Journal of the Royal Statistical Society, Series B, Vol. 13, 1951, pp. 1–45.

[7] Cook, R.D.: Detection of Influential Observation in Linear Regression, Technometrics, Vol. 18, 1977, pp. 15–17.

[8] Cook, R.D.: Influential Observations in Linear Regression, Journal of the American Statistical Association, Vol. 74, 1979, pp. 169–174.

[9] Gunst, R.F. and Mason, R.L.: Some Considerations in the Evaluation of Alternative Prediction Equations, Technometrics, Vol. 21, 1979, pp. 55–63.

[10] Hill, R.C., Judge, G.G. and Fomby, T.B.: Test the Adequacy of a Regression Model, Technometrics, Vol. 20, 1978, pp. 491–494.

[11] Hines, W.W. and Montgomery, D.C.: Probability and Statistics in Engineering and Management Science, 3rd edition, John Wiley & Sons, New York, 1990.

[12] Khuri, A.I. and Cornell, J.A.: Response Surfaces: Designs and Analyses, 2nd edition, Dekker ed., New York, 1987.

[13] Myers, R.H.: Response Surface Methodology, Allyn and Bacon eds., Boston, 1976.

[14] Myers, R.H.: Response Surface Methodology: Current Status and Future Directions, Journal of Quality Technology, Vol. 31, 1999, pp. 30–44.

[15] Suich, R. and Derringer, G.C.: Is the Regression Equation Adequate - One Criterion, Technometrics, Vol. 19, 1977, pp. 213–216.


Chapter 7

Fuzzy Sets and Fuzzy Logic

Fuzzy set theory was formally introduced by Zadeh (1965), but the basic concepts have been known for a very long time. Kosko (1994) traces the history of fuzzy logic back to the ancient Chinese philosophers who introduced the concept of yin and yang. In this framework every element of a universe of discourse (i.e., the analogue of the ‘environmental’ set in classical, crisp set theory) can belong to a fuzzy set with a membership grade, µ, being any number in [0, 1].

In the mathematical modelling of system properties and behavior the analyst can come across inconveniences. The first is caused by the excessive complexity of the situation being modelled. The second inconvenience consists of the indeterminacy caused by the analyst’s inability to differentiate events in real situations exactly and, hence, to model problem domains in precise terms. This indeterminacy is not an obstacle when using natural language, since its main property is the vagueness of its semantics and, thus, its capability of working with vague notions. Classical mathematics, however, has not coped with such vagueness. A mathematical apparatus capable of describing vague notions could be very useful, since it could help us to overcome the above obstacles in modelling. Moreover, it appears to be necessary for the development of some new branches of science, such as artificial intelligence.

This chapter is devoted to such an apparatus. It is called fuzzy set theory and its fundamental notion is that of a fuzzy set. It can also be understood to be a generalization of the classical set. Using it, it is possible to model vague notions and also imprecise events.

7.1 Preview

Fuzzy set theory is characterized by its capability of handling linguistic variables in a non-analytical environment; this makes it a paradigm very close to the way a human thinks. The ability to summarize information finds its most pronounced manifestation in the use of natural languages. Thus, each word x in a natural language L may be viewed as a summarized description of a fuzzy subset M(x) of a universe of discourse U, with M(x) representing the meaning of x. In this sense, the language as a whole may be regarded as a system for assigning atomic and composite labels (i.e., words, phrases, and sentences) to the fuzzy subsets of U. For example, if the meaning of the noun flower is a fuzzy subset M(flower), and the meaning of the adjective red is a fuzzy subset M(red), then the meaning of the noun phrase red flower is given by the intersection of M(red) and M(flower). If one regards the color of an object as a variable, then its values, ‘red’, ‘blue’, ‘yellow’, ‘green’, etc., may be interpreted as labels of fuzzy subsets of a universe of objects. In this sense, the attribute color is a fuzzy variable, that is, a variable whose values are labels of fuzzy sets. It is important to note that the characterization of a value of the variable by a natural label such as ‘red’ is much less precise than the numerical value of the wavelength of a particular color. More generally, the values may be sentences in a specified language, in which case it may be said that the variable is linguistic. The values of the fuzzy variable height might be expressible as ‘tall’, ‘not tall’, ‘somewhat tall’, ‘very tall’, ‘not very tall’, ‘very very tall’, ‘tall but not very tall’, ‘quite tall’, ‘more or less tall’. Thus, the values in question are sentences formed from the label ‘tall’, the negation ‘not’, the connectives ‘and’ and ‘but’, and the hedges ‘very’, ‘somewhat’, ‘quite’ and ‘more or less’. In this sense, the variable ‘height’ as defined above is a linguistic variable.

The main function of linguistic variables is to provide a systematic means for an approximate characterization of complex or ill-defined phenomena. In essence, by moving away from the use of quantified variables and toward the use of the type of linguistic descriptions employed by human beings, one acquires a capability to deal with systems which are too complex to be susceptible to analysis in conventional mathematical terms.

For the above reason fuzzy logic is frequently used in those problems where it is necessary to mimic the behavior of some human expert. This is one of the main reasons that makes fuzzy logic a useful approach to decision problems. Moreover, fuzzy logic has been extensively and successfully applied to many engineering problems. Among those developments, fuzzy multiattribute decision making techniques present characteristics useful also in approaching decision problems emerging in the design process. These techniques are characterized by handling multiple imprecise (fuzzy) attributes.

Fuzzy–set theory provides a mathematical basis for representing and reasoning with knowledge in uncertain and imprecise problem domains. As compared to crisp requirements (constraints), the fuzzy approach softens the sharp transition from acceptable to unacceptable. The mathematical theory of fuzzy sets (Zadeh, 1965), alternatively referred to as fuzzy logic, is concerned with the degree of truth that the outcome belongs to a particular category, not with the degree of likelihood that the outcome will be observed. Fuzzy logic provides appropriate models for the ability of human beings to categorize things, not by verifying whether they satisfy some unambiguous definitions, but by comparing them with characteristic examples of the categories in question.

Fuzzy–set theory is an important branch of decision–making theory, providing tools to quantify imprecise verbal statements and to classify outcomes of decision-analytical experiments. Usually, when decisions are prepared, a considerable amount of imprecise information with a quantitative connotation is transmitted via natural language. Well–known examples are the frequency indicators like: almost never, rarely, sometimes, often, mostly, and almost always. They are meaningful albeit in a particular context only. Since decisions are invariably made within a given context, graded judgement should also be considered within a particular framework. Mutual understanding of what the context is seems to be possible by common experience and education of human beings. The qualifying terms like almost, rather, somewhat, the so-called hedges, enable us to express degrees of truth in situations where a black–or–white statement would be inadequate.

Although fuzzy-set theory has been criticized for being probability theory in disguise, it is easy to understand that the two theories are concerned with two distinct phenomena: with observations that can be classified in vaguely described (imprecise) categories only, and with experiments such that the outcomes can be classified into well–defined (crisp) categories. In essence, fuzzy–set theory is concerned with the ability of human beings to categorize things and to label the categories via natural language.

The almost ideological debate between the supporters of probability theory and fuzzy–set theory reveals that the conflict has deep roots. Indeed, the fact that fuzzy–set theory models degrees of truth leads to a confrontation with our scientific tradition. Fuzzy logic agrees that an element may with a positive degree of truth belong to a set and with another positive degree of truth to the complement of the set, so violating the law of non–contradiction, which states that a statement cannot be true and not-true at the same time. Fuzzy logic also violates the law of the excluded middle (a statement is either true or false, ‘tertium non datur’). Indeed, the real world is not a world of black-and-white, but it is full of gray shades. Note that probability theory never challenged the traditional bivalent logic. It has its roots in gambling, where the rules and the outcomes are unambiguous.

7.1.1 Types of Uncertainty

Probability theory is a well-established mathematical theory, designed to model precisely described, repetitive experiments with uncertain outcomes. In the last few decades other types of uncertainty have been identified, however, and new mathematical tools are accordingly under study, in attempts to deal with situations which are not or cannot be covered by the classical tools of probability theory. The key notion is that uncertainty is a matter of degree. Thus, events occur with a particular degree of likelihood, elements have properties with a particular degree of truth, and actions can be carried out with a particular degree of ease. Roughly speaking, the following types of uncertainty can be distinguished.

• Randomness occurs when a precisely described experiment such as casting a die on a flat table has several possible outcomes, each with known probability (a perfect die with a homogeneous mass distribution) or with unknown probability (an inhomogeneous die). The outcomes of the experiment (the faces 1, 2, . . ., 6) can unambiguously be observed. The experiment of casting the die can arbitrarily be repeated. Further experimentation will reduce the uncertainty: it will reveal the probability distribution of the outcomes of the die. Probability theory is concerned with the uncertainty of whether the respective outcomes will occur or not, that is, with their degree of likelihood.

• Vagueness or imprecision arises as soon as the outcome of the experiment cannot properly be observed. A typical example is given by the situation arising after the experiment of casting a die with colored faces, under twilight where colors cannot properly be distinguished. There are several possible outcomes, each with a particular degree of truth. Further experimentation will not reduce the uncertainty. Color perception illustrates that vagueness or imprecision may be due to the manner in which our neural system operates.

• Ambiguity arises when a verbal statement has a number of distinct meanings so that only the context may clarify what the speaker really wants to say.

Risk is not a particular type of uncertainty but rather a mixture, where outcomes cannot precisely be classified into a small number of categories so that it is also difficult to specify their probabilities.

7.1.2 Crisp Sets

A set is any well defined collection of objects. An object contained by a set is called a member, or element. For instance, if one considers books, sets might be hard cover, soft cover, large, small, fiction, etc. A particular book could be a member of multiple sets. All members of a set are created as equal members of that set.

Below, capital letters denote sets, while members of a set are written in lowercase. To indicate the universe of discourse, often referred to as the universal set, the symbol U is used. All sets are members of the universal set. Additionally, a set with no elements is called a null, or empty, set and is denoted 0.

If there is an element x of set A, this is written as x ∈ A, while if x is not a member of A, it is written as x ∉ A. There are two methods used to describe the contents of a set, the list method and the rule method. The list method defines the members of a set by listing each object in the set

A = {a1, a2, . . . , an}

The rule method defines the rules that each member must adhere to in order to be considered a member of the set

A = {a | a has properties P1,P2, . . . ,Pn}

When every element in the set A is also a member of set B, then A is a subset of B

A ⊆ B

If every element in A is also in B and every element in B is also in A, then A and B are equal

A = B

390

Page 405: SHIP DESIGN ——————— A Rational Approach Giorgio Trincasunina.stidue.net/Universita' di Trieste/Ingegneria... · 2009. 6. 24. · Department of Naval Architecture, Ocean

7.1 – Preview

If at least one element in A is not in B or at least one element in B is not in A, then A and B are not equal

A ≠ B

The set A is a proper subset of B if A is a subset of B but A and B are not equal, i.e. A ⊆ B and A ≠ B

A ⊂ B

To present the notion that an object is a member of a set either fully or not at all, the function µ is introduced. For every x ∈ U, µA(x) assigns a value that determines the grade of membership of each x in the set A ⊆ U

   µA(x) = 1   if and only if x ∈ A
   µA(x) = 0   if and only if x ∉ A

Therefore, µA maps all elements of the universal set into the set A with values 0 and 1

µA : U → {0,1}

The characteristic function has two possible values to model the idea that the statement x belongs to A is either true or false, for each element in U. Only one of the previous statements holds, that is, the element has respectively either a 0 or a 1 membership grade in the given set.

Figure 7.1. Venn diagrams

Using the given notation, four basic operations that can be used on sets are shown in Figure 7.1 using Venn diagrams and also written in set theoretic notation. The shaded region indicates the result of applying the given function.


The four operations shown in Figure 7.1 are routinely combined to produce more complex functions. Additionally, these examples use only two sets, but union and intersection can be defined for any number of sets. This is due to the properties of the basic operations shown in Table 7.1.

   Property                    Description

   Commutativity               A ∪ B = B ∪ A
                               A ∩ B = B ∩ A
   Associativity               (A ∪ B) ∪ C = A ∪ (B ∪ C)
                               (A ∩ B) ∩ C = A ∩ (B ∩ C)
   Distributivity              A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
                               A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
   Idempotence                 A ∩ A = A
                               A ∪ A = A
   Law of contradiction        A ∩ Ā = 0
   Law of excluded middle      A ∪ Ā = U

Table 7.1. Summary of crisp set properties

Preserving these behaviors is important, as fuzzy sets are a generalization of classic sets and must be able to reproduce their behavior exactly.

7.1.3 Fuzzy Control

Fuzzy–set theory experienced huge resistance from probability theory, but in electrical engineering it is now widely accepted as a suitable model for the verbal classification of observations and control commands. Fuzzy–set theory and fuzzy logic have been successfully applied to industrial control problems, delivering performance levels similar to those obtained by expert human operators.

Fuzzy logic, the name of which appears on Japanese cameras, washing machines, refrigerators, and other domestic appliances, has a certain future in the design of control mechanisms. The first really exciting application of fuzzy logic was realized in 1987, when the Sendai railway started its operations. On a single North-South route of 13.6 km and 16 stations, the train glides more smoothly than any other train because of its sophisticated control system. So, fuzzy logic did not come of age at universities (Kosko, 1994) but in industry and in the commercial market. The debate between fuzzy logic and probability theory will not be solved by theoretical arguments but by the successes in industrial design, development, production, and sales.

Control systems benefit so much from fuzzy logic because they follow the example of the human controllers who categorize their observations (the speed is rather high, low, etc.), whereafter they issue vague commands to the system under control (slow down, or accelerate slightly, etc.). A fuzzy air conditioner, for instance, employs a number of rules of the form


if temperature is cold then motor speed must be fast,

if temperature is just right then motor speed must be medium,

etc. The system obviously checks to which of the categories ‘cold’, ‘cool’, ‘just right’, ‘warm’, or ‘hot’ the temperature belongs, whereafter the motor speed is properly adjusted if it does not sufficiently belong to the required category ‘stop’, ‘slow’, ‘medium’, ‘fast’, or ‘blast’ (Kosko, 1994). The temperature in this example is alternatively referred to as a linguistic variable which can only assume a verbally defined value.
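
A minimal sketch of such a rule-based controller is given below; the triangular membership functions, the temperature breakpoints, and the crisp motor speeds attached to the two rules quoted above are all illustrative assumptions.

def tri(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Assumed temperature categories (deg C) and assumed crisp speeds (rpm) mirroring the two rules above.
temperature_sets = {'cold': (5, 12, 20), 'just right': (16, 21, 26)}
rule_speed = {'cold': 1600, 'just right': 800}       # 'fast' and 'medium', numeric values assumed

def motor_speed(temp):
    # Fire every rule to a degree and defuzzify with a membership-weighted average.
    w = {name: tri(temp, *abc) for name, abc in temperature_sets.items()}
    total = sum(w.values())
    return sum(w[n] * rule_speed[n] for n in w) / total if total else 0.0

print(motor_speed(19.0))    # partly 'cold', partly 'just right' -> a speed between medium and fast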

In general, a control system is characterized by how it transforms input quantities into output quantities. An intelligent control system yields appropriate problem-solving responses when it is faced with information which is usually imprecise. Moreover, an intelligent system learns from past experience, it generalizes from a limited number of experiences which are mostly imprecise, and it creates new input-output relationships.

7.2 Basics of Fuzzy Logic

This section starts from the mathematical model of vagueness and imprecision originally proposed by Zadeh (1965), who suspected that an ever-increasing amount of precision in mathematical modelling would lead to almost insignificant models for control systems. The processing of imprecise information is typically the domain of fuzzy logic.

7.2.1 Membership Function

A fuzzy set A is defined as a set of ordered pairs (x, µA(x)). For each pair, x signifies an element in the fuzzy set, while µA(x) represents the grade of membership x has in A.

Since Zadeh (1965) introduced the notion of fuzzy sets, one of the main difficulties has been with the meaning and measurement of the membership function. Fuzzy sets are totally characterized by their membership functions. For a sound theory of fuzzy sets, a rigorous semantics together with practical elicitation methods for membership functions are necessary.

It is suitable to start with the formal (i.e., mathematical) definition of a membership function. The so–called membership function µA of the fuzzy set A models the idea that the statement x belongs to A is not necessarily true or false only. On the contrary, it denotes the grade of an element x in the set A. The membership function (Fig. 7.2) maps all elements of the universal set U into the set A with values in the continuous normalized range 0 (non–membership) to 1 (full membership), i.e.

µA : U → [0,1]

with 0 and 1 representing the lowest and highest grades (values) of membership, respectively.


Figure 7.2. A membership function

When {µA(x)} contains only the two points 0 and 1, the set A is non–fuzzy (crisp). A fuzzy set is said to be normal if max_x µA(x) = 1. Subnormal fuzzy sets can be normalized by dividing each µA(x) by max_x µA(x).

In fuzzy sets the membership grade does not need to be either 0 or 1, but can be any real number between them. Loosely speaking, this is like excluding the possibility of only black or white, but accommodating all the different shades of gray in between. In fuzzy set theory a set is completely identified by its membership function, just as in crisp set theory. Indeed, while in crisp set theory the membership function can be any function defined on the environment set and having values in {0, 1}, in fuzzy sets the membership function is still defined on the universe of discourse but it assumes values in [0, 1]. Thus, for every element in the universe of discourse the membership function of a fuzzy set gives its membership grade, that is

   0 ≤ µA(x) ≤ 1   for any x ∈ U                                                      (7.1)

where the truth value µA(x) represents the degree of truth, subjectively assigned by a decision maker, of the statement x belongs to A.

Figure 7.3. Characteristic function and membership function

Figure 7.3 illustrates the two concepts, the characteristic function and the membership function. The interval (18, 25) on the scale of temperature is crisp, but the interval of the room temperatures where one feels comfortable is fuzzy. There is a zone of imprecision on both sides. Below 18°C the temperature is chilly, above 25°C the temperature tends to be uncomfortably hot. The form of the membership function depends on the individual, subjective feelings of the decision maker, however.

Since it is not easily acceptable to define a concept on the basis of subjective feelings, attempts have been made to introduce a more objective definition of the truth value. Therefore, the truth value µA(x) is sometimes interpreted as the fraction of a sufficiently large number of referees agreeing with the statement x belongs to A. Thus, it must be assumed that the fuzzy set A, despite the imprecision of its boundaries, can be delineated by subjectively associating a grade of membership (a number between 0 and 1) with each of its elements.

As for crisp sets, a fuzzy set may be defined formally in two ways, each introduced by Zadeh (1965). The list method for a fuzzy set lists the membership grade of each element of a discrete, countable universe of discourse U to the set in question

   A = Σ_{i=1}^{n} µi/xi = {µ1/x1 + . . . + µn/xn}                                    (7.2)

where xi denotes the ith member of U and µi is the membership grade of element xi. The use of the plus symbol to separate individual elements is a departure from standard set theory notation, which uses the comma. The plus sign in (7.2) denotes the union rather than the arithmetic sum.

To describe a fuzzy set on a continuous universe one writes

   A = ∫_U µA(x)/x

where µA(x) is a function that represents the grade of membership of x in A for every element x in U.

For example, if one wishes to represent speeds ‘close to 50’ miles per hour using a fuzzy set with a continuous (non countable) universal set, µA(x) can be defined as

   µA(x) = 1 / [ 1 + (x − 50)² / 50 ]

This function, shown graphically in Figure 7.4, maps every real number into the set of speeds ‘close to 50’ miles per hour (mph). If one were travelling 20 mph, that would be assigned a value of 0.05, while 40 mph gets 0.33 and 50 mph gets 1. Clearly, the closer the number to 50, the higher its membership in the set. The term ‘very close to 50’ would be

   µA(x) = [ 1 / ( 1 + (x − 50)² / 50 ) ]²


Figure 7.4. Continuous definition for ‘close to 50’ miles/hour

Alternatively, if one is dealing with a countable universe, a similar function can be defined in accordance with equation (7.2) as

   A = {0.03/10 + 0.05/20 + 0.11/30 + 0.33/40 + 1.0/50 + 0.33/60 + 0.11/70 + 0.05/80 + 0.03/90}

which consists solely of points when plotted, as shown in Figure 7.5.

Figure 7.5. Countable definition for ‘close to 50’ mph

The concept of a fuzzy relation is closely related to the concept of a fuzzy set. Consider, for instance, the relation R describing rough equality between two elements x and y, and use the symbol µR(x, y) to express the truth value of the expression x ≈ y. For many practical purposes, it could be set

   µR(x, y) = e^{−(x−y)²}

It will be obvious that the relation R is in fact a fuzzy subset of the (x, y)-space and that µR is the corresponding membership function.


7.2.2 Formulations of Membership Functions

Useful formulations of the membership grade functions are shown in Figure 7.6.

Figure 7.6. Main types of membership functions

The type of membership grade function reflects the designer’s intention regarding the value of a specific attribute.

Experience shows that the ‘Nehrling type’ membership grade function is well suited to concept ship design. In this respect four types are possible: attracting, ascending, averting and descending, as shown in Figure 7.7.


Two points on a membership grade curve are important and may be defined as

• y = y1 → the level of an attribute which is 100% satisfactory, i.e. the level that may optimistically be expected to be reached by the best design with respect to the specific attribute;

• y = y1/2 = y1 − d → the level that is only 50% satisfactory, i.e. the level that may be expected in the average design.

By assigning appropriate values to y1 and d, the designers may express the aspiration level for a specific attribute. For some attributes other modifiers to the formulation may be added, such as different exponent values, asymmetric curves, etc.

Figure 7.7. Generalized Nehrling type functions


7.2.3 Fuzzy Partitioning

A number of fuzzy sets A1, . . . , An form a fuzzy partition when

∑_{i=1}^{n} µAi(x) = 1     ∀ x ∈ X

The fuzzy sets Ai must all be subsets of the same universe X and none of the sets may equal ∅ or U. This property is very important in fuzzy inference systems, and most fuzzy sets used form fuzzy partitions.

One method of working with fuzzy sets is to treat them as a collection of crisp sets. This is achieved by using the concept of an α–cut. The formal definition of an α–cut is the crisp set Aα containing all the elements of A with a membership grade ≥ α

Aα = {x ∈ U | µA(x) ≥ α}

The set of all α–cuts of a fuzzy set results in a family of nested crisp subsets of U. A fuzzy set can be completely decomposed into a number of crisp sets by creating α–cuts for each distinct membership value in the set, as shown in Figure 7.8, where each concentric ring shows the members of the crisp set created by performing an α–cut with the value of α given. Note that as α increases, each ring is fully contained by rings corresponding to a lower α. Thus, α–cut sets correspond to discarding those elements of a fuzzy set that are ‘extreme’ in the sense of having ‘low’ membership in the set.

Figure 7.8. Example of an α–cut set

Every property that is valid for a crisp set is also valid for an α–cut set. This means that one method of working with fuzzy sets can be to create the appropriate number of α–cuts and use standard crisp set operations on them (Klir & Folger, 1988). When this processing has finished, the resulting crisp sets can be recombined to create a fuzzy set.


The method used to recombine a decomposed fuzzy set, whether or not operations have been performed on the crisp sets, is to multiply each member of an α–cut set by α and take the union of the resulting sets, i.e.

A = ⋃_α α·Bα
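A minimal sketch of this decomposition and recombination for a finite fuzzy set, assuming the value-to-grade dictionary representation used in the earlier speed example:

    A = {10: 0.03, 20: 0.05, 30: 0.11, 40: 0.33, 50: 1.0,
         60: 0.33, 70: 0.11, 80: 0.05, 90: 0.03}

    def alpha_cut(fuzzy, alpha):
        # crisp set of all elements with membership grade >= alpha
        return {x for x, mu in fuzzy.items() if mu >= alpha}

    def recombine(fuzzy):
        # union over alpha of alpha * (alpha-cut): every element keeps the
        # largest alpha whose cut still contains it
        rebuilt = {}
        for alpha in sorted(set(fuzzy.values())):
            for x in alpha_cut(fuzzy, alpha):
                rebuilt[x] = max(rebuilt.get(x, 0.0), alpha)
        return rebuilt

    print(alpha_cut(A, 0.33))   # the crisp set {40, 50, 60}
    print(recombine(A) == A)    # True: decomposition and recombination restore A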

7.2.4 Properties of Fuzzy Sets

The properties shown in this subsection do not provide a complete coverage of this area. Rather, the selection is designed to show those properties that are required later in this chapter.

The maximum membership value attained by the elements of a fuzzy set is called the height of the fuzzy set

height(A) = max_{x∈X} µA(x)

where X ⊆ U is the set from which the members of A are drawn.

The support of a fuzzy set A is a crisp set defined as

support(A) = {x ∈ X | µA(x) > 0}

which results in a crisp set containing all elements with non-zero membership in A.

The core of a fuzzy set A, itself another crisp subset of X, is a subset of its support

core (A) = {x ∈ X | µA(x) = 1}

Fuzzy sets are generally normalized and convex. A normalized set is one in which at least one membership value reaches the maximum permitted value, i.e. height(A) = 1. A non-normalized set is one whose maximum value does not reach 1. An example of a normalized fuzzy set is shown in Figure 7.9.

Figure 7.9. Normalized fuzzy set


Table 7.2 lists some fuzzy set properties and relations (Klir & Folger, 1988).

Property              Description

Support(A)            The support of a fuzzy set A is the crisp set containing all non–zero members of A

Normalized Set        A set is considered normalized when at least one member attains the highest possible membership value

α–cut                 The crisp set Aα containing all the elements of A with a membership grade ≥ α:
                      Aα = {x ∈ U | µA(x) ≥ α} for 0 ≤ α ≤ 1

Level Set             The set of all α such that a distinct α–cut set is produced:
                      L(A) = {α | µA(x) = α for some x ∈ U}

Convex                A fuzzy set is convex iff each α–cut is convex

Scalar Cardinality    The summation of all membership grades in the fuzzy set A:
                      |A| = ∑_{x∈U} µA(x)

A ⊆ B                 A is a subset of B if the membership grade of every member of A is less than or equal to the membership grade of the same member in B:
                      µA(x) ≤ µB(x) ∀ x ∈ U

A = B                 A is equivalent to B:
                      µA(x) = µB(x) ∀ x ∈ U

A ≠ B                 A is not equivalent to B:
                      µA(x) ≠ µB(x) for at least one x ∈ U

A ⊂ B                 A is a proper subset of B:
                      A ⊆ B and A ≠ B

Table 7.2. Summary of fuzzy set properties
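Most entries of Table 7.2 translate directly into code; a short sketch for finite fuzzy sets (helper names are illustrative):

    def height(A):
        return max(A.values())

    def support(A):
        return {x for x, mu in A.items() if mu > 0}

    def core(A):
        return {x for x, mu in A.items() if mu == 1.0}

    def scalar_cardinality(A):
        return sum(A.values())

    def is_subset(A, B):
        # A is a subset of B if muA(x) <= muB(x) for every x
        return all(mu <= B.get(x, 0.0) for x, mu in A.items())

    A = {30: 0.11, 40: 0.33, 50: 1.0, 60: 0.33, 70: 0.11}
    B = {30: 0.2, 40: 0.5, 50: 1.0, 60: 0.5, 70: 0.2}

    print(height(A), core(A), scalar_cardinality(A))   # 1.0  {50}  1.88
    print(is_subset(A, B))                             # True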

Now, a fuzzy number is a fuzzy set A in the one–dimensional universe of discourse U when the following hypotheses are satisfied:

- the α–level subsets are intervals monotonically shrinking as α ↑ 1;

- there is at least one x ∈ U such that µA(x) = 1.

By the first requirement, the inequality α1 < α2 implies

{x | µA(x) ≥ α1} ⊃ {x | µA(x) ≥ α2}

The second requirement, that the top value of the membership function of A must be 1, seems to be reasonable. If one considers the fuzzy number a, that is, the fuzzy set of numbers which are roughly equal to the crisp number a, obviously the crisp number a belongs to the fuzzy set a so that

µA(a) = 1


In general, a fuzzy number has a membership function which increases monotonically from 0 to 1 on the left–hand side; thereafter, there is a single top or a plateau at the level 1; and finally, the membership function decreases monotonically to 0 on the right–hand side.

The concept of convex fuzzy sets is introduced to define fuzzy numbers. Convex fuzzy sets are characterized by convex α–cuts. More formally, a fuzzy set is considered convex if and only if (Dubois and Prade, 1980)

µA [λx1 + (1− λ)x2] ≥ min [µA(x1),µA(x2)] ∀ x1,x2 ∈ X , ∀λ ∈ [0,1]

7.2.5 Extension Principle

First introduced by Zadeh (1965), the extension principle is one of the most important elements of fuzzy set theory. It provides the framework necessary to extend crisp mathematical concepts into the fuzzy realm. This is accomplished by extending a function f that maps points in the crisp set Ac to the crisp set Bc such that it maps between fuzzy sets A and B. The main provision is to allow a mapping from points in the universe X to the universe Y.

A = {µ1/x1 + . . . + µn/xn}

B = f(A) = f{µ1/x1 + . . . + µn/xn} = {µ1/f(x1) + . . . + µn/f(xn)}

where A and B are fuzzy sets in X and Y respectively.

Using a modified example of speeds ‘close to 50’ miles per hour, and a mapping f(x) = √x,

A = {0.11/30 + 0.33/40 + 1.0/50 + 0.33/60 + 0.11/70}

B = f(A) = √{0.11/30 + 0.33/40 + 1.0/50 + 0.33/60 + 0.11/70}
  = {0.11/√30 + 0.33/√40 + 1.0/√50 + 0.33/√60 + 0.11/√70}
  = {0.11/5.5 + 0.33/6.3 + 1.0/7 + 0.33/7.7 + 0.11/8.4}

If multiple x ∈ A map to the same element y ∈ f(A), the maximum membership value of elements in A is selected for the membership value of y ∈ B. If no elements map to a particular y ∈ B, the membership grade for that element is zero.
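A small sketch of the extension principle for a finite fuzzy set and a crisp mapping f; the rounding used here gives 7.1 where the text rounds √50 to 7:

    import math

    def extend(f, A, ndigits=1):
        # extension principle for a finite fuzzy set A = {x: mu}: B = f(A),
        # taking the maximum grade when several x map to the same y
        B = {}
        for x, mu in A.items():
            y = round(f(x), ndigits)
            B[y] = max(B.get(y, 0.0), mu)
        return B

    A = {30: 0.11, 40: 0.33, 50: 1.0, 60: 0.33, 70: 0.11}
    print(extend(math.sqrt, A))
    # {5.5: 0.11, 6.3: 0.33, 7.1: 1.0, 7.7: 0.33, 8.4: 0.11}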

Additionally, the fuzzy set operations such as complement, union and intersection can all be written using the extension principle. For example, the union of two fuzzy sets will be shown to be representable using the function ‘max’. The extension principle can be used to map from a two-variable input space to the output space using this function, thus implementing the union operation.


7.2.6 Operations on Fuzzy Sets

The three most important operations on any set, whether crisp or fuzzy, are complement, union, and intersection. These three operations are capable of producing more complex operations when used in combination. In classical set theory, these operations can be defined uniquely. In fuzzy set theory these operations are no longer uniquely defined, as membership values are no longer restricted to {0, 1} and can be in the range [0, 1]. Any definition of these operations on fuzzy sets must include the limiting case of crisp sets.

Fuzzy Complement

The least complex of the three operations, the so–called fuzzy complement describes the difference between an object and its opposite. The membership function of the fuzzy complement or negation Ā of the set A is

µĀ(x) = 1 − µA(x)     (7.3)

In the case that 0 < µA(x) < 1 it follows easily that

µA∩Ā(x) = min [µA(x), 1 − µA(x)] < 1

which implies that the law of non–contradiction is violated. To a certain degree, a fuzzy statement can be true and not–true at the same time. In other words, the overlap of A and its complement can be non–empty. Similarly, if 0 < µA(x) < 1, one has

1/2 ≤ µA∪Ā(x) = max [µA(x), 1 − µA(x)] < 1

so that the law of the excluded middle is also violated. A fuzzy statement is true or not true or both to a certain extent only. In other words, the underlap of A and its complement is not necessarily the universe of discourse U. It is easy to verify that

µA∩Ā(x) + µA∪Ā(x) = 1

so that the violations just mentioned are equal, that is, the intersection’s deviation from 0 equals the union’s deviation from 1. In the crisp case, when the truth values are 0 or 1 only, the classical result holds

µA∩Ā(x) = 0 and µA∪Ā(x) = 1

An example to illustrate the above concepts is given by the membership function µA(x) of the fuzzy set A of the ages where human beings are referred to as young, at least in the personal opinion of an anonymous referee. He/she may decide, for instance, that µA(1) = µA(2) = . . . = µA(20) = 1 and that there is a monotonic decrease of µA until µA(40) = µA(41) = . . . = 0.

In addition, the referee may set µA(30) = 1/2 so that a 30-year-old person is as young as he/she is not-young. That person is young and not–young with the same degree of truth 1/2, and he/she is either young or not–young or both with the same degree of truth.
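A quick numerical check of the complement and of the two violated laws, using the grade µA(30) = 1/2 assigned above:

    def complement(mu):
        return 1.0 - mu

    mu_young_30 = 0.5                                       # mu_A(30) = 1/2
    overlap  = min(mu_young_30, complement(mu_young_30))    # A intersect not-A
    underlap = max(mu_young_30, complement(mu_young_30))    # A union not-A

    print(overlap)             # 0.5 > 0 : law of non-contradiction violated
    print(underlap)            # 0.5 < 1 : law of the excluded middle violated
    print(overlap + underlap)  # 1.0     : the two deviations compensate each other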


Of course, one could ask how fuzzy the statement x belongs to A actually is. When µA(x) is close to 0 or 1, the statement is almost crisp, and when µA(x) is close to 1/2 the statement is very fuzzy indeed, but it is possible to be more precise. Consider the distinction between a fuzzy set and its complement. The ratio

µA∩Ā(x) / µA∪Ā(x)

has the minimum value 0 if, and only if,

µA(x) = 0 or 1

and it has the maximum value 1 if, and only if,

µA(x) = µĀ(x) = 1/2

Hence, this so–called ratio of overlap and underlap seems to be an appropriate measure for the degree of fuzziness of the statement x belongs to A.

Fuzzy Union

The concept of union of fuzzy sets can be introduced following the treatment of sets in classical set theory. The operator of fuzzy union takes two sets and returns a single set representing their union. Given the truth values

µA(x) and µB(x)

to represent the degrees of truth that an element x belongs to the respective sets A and B, the truth value

µA∪B (x)

of the statement x belongs to A, to B, or to both cannot be smaller than the maximum of the two original truth values.

In fuzzy-set theory, for each element in U the classic fuzzy union of A and B (Fig. 7.10), denoted A ∪ B, is defined as the smallest fuzzy set containing both A and B.

The membership function for A ∪B is

µA∪B (x) = max [µA(x),µB(x)] x ∈ X (7.4)

When E denotes the empty set and U the universe of discourse one has

µA∪E (x) = max [µA(x),0] = µA(x)

µA∪U (x) = max [µA(x),1] = 1


Figure 7.10. Classic fuzzy union

The property of idempotency is

µA∪A (x) = max [µA(x),µA(x)] = µA(x)

It is easy to verify that the commutativity law holds, because

µA∪B(x) = µB∪A(x)

Similarly, the associative law is defined as

µ(A∪B)∪C(x) = µA∪(B∪C)(x)

and the distributive law is

µA∪(B∩C)(x) = µ(A∪B)∩(A∪C)(x)

The maximum operator for the union and the minimum operator for the intersection of two fuzzy sets are not necessarily interactive. The value

max [µA(x),µB(x)]

remains unchanged under small perturbations of µB(x) when

µA(x) > µB(x)

and a similar thing may happen to

min [µA(x),µB(x)]

when the inequality

µA(x) < µB(x)

holds. It is easy to verify that the above rules also hold when the sets under consideration are crisp. The maximum operator, for instance, also gives the correct answer for the union when µA(x) and µB(x) are 0 or 1 only. This is one of the boundary conditions of fuzzy logic, whereas in the crisp case the operators must coincide with the classical operators.

The notion of a union is like the inflexible connective ‘or’. Thus, if A = {fast ships} and B = {long ships}, then A ∪ B = {fast ‘or’ long ships}. This rigid ‘or’ may be softened by forming the algebraic sum of A and B, denoted as A + B, which is defined as

µA+B(x) = [µA(x) + µB(x)]− µA(x)·µB(x) for each x ∈ X


Fuzzy Intersection

The operation of fuzzy intersection takes two sets and returns a single set representing their intersection. By analogy with the concept of union of fuzzy sets, the truth value

µA∩B (x)

of the statement x belongs to A and to B cannot be greater than the minimum of the original truth values.

Therefore, in fuzzy–set theory, the intersection of A and B (Fig. 7.11), denoted as A ∩ B, is defined as the largest fuzzy set contained in both A and B. The membership function for A ∩ B is

µA∩B (x) = min [µA(x),µB(x)] (7.5)

When E denotes the empty set and U the universe of discourse one has

µA∩E (x) = min [µA(x),0] = 0

µA∩U (x) = min [µA(x),1] = µA(x)

Figure 7.11. Classic fuzzy intersection

The property of idempotency is

µA∩A (x) = min [µA(x),µA(x)] = µA(x)

It is easy to verify that the commutativity law holds, because

µA∩B (x) = µB∩A(x)

Similarly, the associative law gives

µ(A∩B)∩C (x) = µA∩(B∩C) (x)

whereas the distributive law provides

µA∩(B∪C)(x) = µ(A∩B)∪(A∩C)(x)


The minimum operator for the intersection of two fuzzy sets is, like the maximum operator for the union, not necessarily interactive, so the value

max [µA(x),µB(x)]

remains unchanged under small perturbations of µB(x) when

µA(x) > µB(x)

and the same may happen to

min [µA(x),µB(x)]

when the inequality

µA(x) < µB(x)

holds.

The notion of an intersection is like the inflexible connective ‘and’. Thus, if A is a set of fast ships and B is a set of long ships, then A ∩ B is the set of ships which are both fast ‘and’ long. This inflexible ‘and’ may be softened by forming the algebraic product of the fuzzy sets A and B. The membership function for this algebraic product, denoted as AB, is defined as

µAB(x) = µA(x)·µB(x) for each x ∈ X.

To conclude, consider the following example (Fig. 7.12). The fuzzy set A is taken to represent the set of comfortable velocities and B the set of high velocities, so that the truth values µA(x) and µB(x) stand for the degree that the velocity x is felt to be comfortable and high respectively.

Figure 7.12. Comfortable and high velocities

Obviously, the expression

µA∩B(x)

is the degree that the velocity x is felt to be comfortable and high at the same time. The driver might aim at a velocity which is as comfortable as it is high (x = 160 km/h), so that he/she aims at the value of x which maximizes the expression

min [µA(x),µB(x)]

but this is not necessary. He/she may aim at a velocity which is higher than it is comfortable, depending on his/her preference for comfort and high–speed driving. The choice of a particular velocity, in fact a compromise between two conflicting objectives, is usually referred to as defuzzification.


Other Union and Intersection Operators

Although the maximum operator for the union of two fuzzy sets (see formula (7.4)) and the minimum operator for the intersection (see formula (7.5)) are still the most popular ones, there are other operators that satisfy certain desirable properties.

In order to introduce some of these operators, µA(x) is often interpreted as the fraction of a group of decision makers agreeing with the statement that x belongs to A. Similarly, the symbol µB(x) designates the fraction of the referees who agree with the statement that x belongs to B. Now, the truth value

µA∪B(x)

with the maximum operator for the union of A and B, can accordingly be seen to represent the fraction of the decision makers who agree with the statement that x belongs to A, to B, or to both when the referees try to agree as much as possible. This is also shown in Figure 7.13 where the fractions with the respective sizes µA(x) and µB(x) are both positioned at the left–hand side of the interval [0, 1].

Figure 7.13. Maximum agreement between referees

Similarly, the truth value

µA∩B(x)

with the minimum operator for the intersection of A and B, stands for the fraction of the decision makers who agree with the statement x belongs to A and to B, when agreement is pursued as much as possible.

Consider now the case where the decision makers disagree to the maximum extent. Then the truth value of the statement x belongs to A, to B, or to both is given by

min(1,µA(x) + µB(x))

This is illustrated in Figure 7.14, where the fractions with the sizes µA(x) and µB(x) are positioned at the left–hand side and the right–hand side of the interval [0, 1] respectively.

Figure 7.14. Maximum disagreement between referees

By a similar argument the truth value of the statement x belongs to A and to B can be written as

max [0,µA(x) + µB(x)− 1]

In the literature, the operators corresponding to the largest possible disagreement are usually referred to as the bounded–sum operators.


Finally, one can also imagine the decision makers to be independent as much as possible. Then the truth values of the union and the intersection are simply given respectively by

µA(x) + µB(x)− µA(x)·µB(x)

and

µA(x) µB(x)

and it is easy to verify that

max [µA(x),µB(x)] ≤ µA(x) + µB(x)− µA(x)·µB(x) ≤ min [1,µA(x) + µB(x)]

and

min [µA(x),µB(x)] ≥ µA(x)·µB(x) ≥ max [0,µA(x) + µB(x)− 1]

Note that the above operators coincide with the classical operators for union and intersection in the crisp case, when the truth values are 0 or 1 only.

In the applications of fuzzy–set theory there is a clear but insufficiently motivated preference for the maximum and the minimum operator (see the formulas (7.4) and (7.5)) to model the union and the intersection of two fuzzy sets. In general, the question of when to apply which operator has not been solved at all. Although this is an unsatisfactory situation, the popular maximum and minimum operators will be used below and the other ones will be ignored.
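The three families of operators just introduced (maximum/minimum, algebraic, bounded sum) and the two chains of inequalities can be checked numerically; a minimal sketch for a single pair of membership grades:

    def fuzzy_union(a, b):
        # the three union operators discussed above
        return {"max": max(a, b),
                "algebraic": a + b - a * b,
                "bounded": min(1.0, a + b)}

    def fuzzy_intersection(a, b):
        # the three corresponding intersection operators
        return {"min": min(a, b),
                "algebraic": a * b,
                "bounded": max(0.0, a + b - 1.0)}

    a, b = 0.6, 0.7
    u = fuzzy_union(a, b)          # max 0.7, algebraic about 0.88, bounded 1.0
    i = fuzzy_intersection(a, b)   # min 0.6, algebraic 0.42, bounded about 0.3
    assert u["max"] <= u["algebraic"] <= u["bounded"]
    assert i["min"] >= i["algebraic"] >= i["bounded"]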

7.2.7 Elementhood and Subsethood

When the universe of discourse U is the one–dimensional space, the support of a fuzzy set A can be an interval, a collection of disjoint intervals, an infinite sequence, or a finite grid. For ease of exposition, attention here will be limited to fuzzy sets with a finite support. The cardinality M(A) of the fuzzy set A can now be defined by

M(A) = ∑_i µA(xi)

When A happens to be a crisp set, the cardinality so defined stands for the number of elements of A so that it coincides with the classical concept of cardinality.

In the previous subsections the reader was concerned with the degree of truth that a given element x belongs to a given set A or, in other words, with the so–called elementhood E(x,A) of x with respect to A. This concept has been generalized by Kosko (1992) who introduced the so–called subsethood S(B,A) of a set B with respect to A. The subsethood stands for the degree that a given set B is a subset of another set A. Defined as the fraction of B which is contained in A, the subsethood can be written as

S(B,A) = M(B ∩ A) / M(B)


an expression which has the typical ratio form of a conditional probability, namely

P(A|B) = P(B ∩ A) / P(B)

Rewriting the subsethood of B with respect to A in the equivalent form

S(B,A) = ∑_i min [µB(xi), µA(xi)] / ∑_i µB(xi)

one can readily see that

S(B,A) = 0 if, and only if µB(xi)·µA(xi) = 0 ∀ i

S(B,A) = 1 if, and only if µB(xi) ≤ µA(xi) ∀ i

Thus, the subsethood of B with respect to A is 0 if, and only if, the two sets are disjoint, and the subsethood is 1 if, and only if, the set B is fully contained in A. In all other cases, it must be true that

0 < S(B,A) < 1

To a certain degree, the universe of discourse is also contained in any of its subsets. This may lead to an interesting interpretation of the concept of subsethood. Consider the case that A is a crisp subset of the universe of discourse U; then the subsethood of U with respect to A is

S(U,A) = M(U ∩ A)/M(U) = M(A)/M(U) = nA/nU

where nA and nU represent the cardinalities of A and U. Now, if U stands for a set of identical and independent probabilistic experiments and A stands for the subset of successful experiments in U, then the subsethood of U with respect to A represents the relative frequency of the successes.

Using the concept of cardinality for fuzzy sets with a finite support, the degree of fuzziness of the set A can be defined as the ratio

M(A ∩ Ā)/M(A ∪ Ā) = ∑_i min [µA(xi), 1 − µA(xi)] / ∑_i max [µA(xi), 1 − µA(xi)]

This expression equals 0 if, and only if, the set A is crisp. It equals 1 if, and only if,

µA(xi) = µĀ(xi) = 1/2 ∀ i

and it must have a value between 0 and 1 in all other cases. The set A clearly has the maximum degree of fuzziness in the case that each element attains the maximum degree of fuzziness.

Consider now two sets in the universe of discourse with five rank-ordered elements: the fuzzy set A and the crisp set B defined as


A = {0.4,0.7,0.2,0.9,0.3} , B = {1,0,1,1,0}

It is evident that

Ā = {0.6, 0.3, 0.8, 0.1, 0.7}          B̄ = {0, 1, 0, 0, 1}
A ∩ Ā = {0.4, 0.3, 0.2, 0.1, 0.3}      A ∪ Ā = {0.6, 0.7, 0.8, 0.9, 0.7}
B ∩ B̄ = the empty set E = {0, 0, 0, 0, 0}      B ∪ B̄ = the universe of discourse U = {1, 1, 1, 1, 1}
A ∩ B = {0.4, 0, 0.2, 0.9, 0}          A ∪ B = {1, 0.7, 1, 1, 0.3}

Subsethoods can now easily be calculated as

S(B,A) = M(B ∩ A)/M(B) = 1.5/3 = 0.5

S(U,A) = M(U ∩ A)/M(U) = M(A)/M(U) = 2.5/5 = 0.5

The degree of fuzziness of the set A is given by

M(A ∩ Ā)/M(A ∪ Ā) = 1.3/3.7 ≈ 0.35

whereas the set B has the degree of fuzziness 0.
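The five-element example is easy to reproduce; a short sketch recomputing S(B,A), S(U,A) and the degree of fuzziness of A:

    A = [0.4, 0.7, 0.2, 0.9, 0.3]
    B = [1, 0, 1, 1, 0]
    U = [1, 1, 1, 1, 1]

    def subsethood(B, A):
        # S(B, A) = M(B intersect A) / M(B)
        return sum(min(b, a) for b, a in zip(B, A)) / sum(B)

    def fuzziness(A):
        # M(A intersect not-A) / M(A union not-A)
        notA = [1 - a for a in A]
        return (sum(min(a, c) for a, c in zip(A, notA)) /
                sum(max(a, c) for a, c in zip(A, notA)))

    print(round(subsethood(B, A), 2))   # 1.5 / 3   = 0.5
    print(round(subsethood(U, A), 2))   # 2.5 / 5   = 0.5
    print(round(fuzziness(A), 2))       # 1.3 / 3.7 ≈ 0.35
    print(fuzziness(B))                 # 0.0: a crisp set has no fuzziness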

7.2.8 Fuzzy Numbers

In daily conversation qualitative terms are frequently used with a quantitative connotation. Imprecise numbers are also used: the distance between Trieste and Koper is roughly twenty kilometers, the annual number of road victims is roughly seven thousand, etc. Vagueness or imprecision can be observed. One of the interesting features of fuzzy logic is that it can model such imprecise information via the concept of fuzzy numbers and that it can process these numbers via the convenient arithmetic operations introduced by Dubois and Prade (1980).

Fuzzy numbers are one of various ways to express imprecision of design variables, parameters and behavior variables (objective functions and constraints). Indeed, imprecision in the design process is the imprecision of given values which are thought to have deterministic characters rather than stochastic ones. Thus, it is more rational to treat imprecision of design variables and parameters via fuzzy numbers.

There are several types of fuzzy numbers. Among them are the very simple LR-fuzzy numbers, which are formed by three real numbers and two shape functions, as follows

x = (xl,x,xu)


Triangular fuzzy numbers

Triangular fuzzy numbers are the subclass of fuzzy numbers with a triangular membership function. A triangular fuzzy number a is characterized by three parameters: the lower value al, the modal value am, and the upper value au. The interval (al,au) constitutes the basis of the triangle, and am is the position of the top (Fig. 7.15), with al < am < au. The modal value am coincides with the crisp value a.

The length au − al of the basis (the width of the fuzzy number) depends on the actual circumstances. Thus, if one talks about the triangular fuzzy number roughly equal to twenty to designate the distance between Trieste and Koper in kilometers, the width may be ten or twenty percent of the modal value, but if one talks about the result of a scientific experiment with highly accurate equipment, the width may be a few parts per thousand of the modal value only.

Figure 7.15. Triangular fuzzy number

The membership function of the triangular fuzzy number a is defined by

µA(x) = (x − al)/(am − al)     if al ≤ x ≤ am

on the left–hand side, and by

µA(x) = (au − x)/(au − am)     if am ≤ x ≤ au

on the right–hand side, whereas it is 0 elsewhere.

From now on, triangular fuzzy numbers will be denoted as ordered triples, so that they will simply be written as

a = (al,am,au)
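Storing a triangular fuzzy number simply as the triple (al, am, au), the membership function defined above becomes, in a minimal sketch:

    def tri_membership(x, a):
        al, am, au = a
        if al <= x <= am:
            return (x - al) / (am - al)
        if am <= x <= au:
            return (au - x) / (au - am)
        return 0.0

    # 'roughly twenty kilometres' with a width of about 20% of the modal value
    dist = (18.0, 20.0, 22.0)
    print(tri_membership(20.0, dist))   # 1.0 at the modal value
    print(tri_membership(19.0, dist))   # 0.5
    print(tri_membership(25.0, dist))   # 0.0 outside the support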

Trapezoidal fuzzy numbers

Triangular fuzzy numbers are easy to use, but sometimes a more sophisticated model is needed to work with imprecise quantities. Then one can also resort to trapezoidal fuzzy numbers which, having a plateau at the top value 1, are characterized by four parameters. Using the behavior of the α-level sets one can easily define arithmetic operations which (sometimes approximately) preserve the trapezoidal shape of the membership function. Although the class of trapezoidal fuzzy numbers is more general than the class of triangular fuzzy numbers, cognitive economy is an incentive to use triangular fuzzy numbers in real–life applications.


7.2.9 Operations on Fuzzy Numbers

Fuzzy operations are defined using Zadeh’s extension principle, which provides a method for extending non–fuzzy mathematical operations to deal with fuzzy sets and fuzzy numbers.

Addition of triangular fuzzy numbers

A heuristic argument is first presented to make it plausible that the sum of two triangular fuzzy numbers

a = (al,am,au) and b = (bl,bm,bu)

is given by the triangular fuzzy number

a + b = (al + bl,am + bm,au + bu) (7.6)

Let x be a point in the α–level set

[al + α(am − al),au − α(au − am)]

of a. This means that the statement x is roughly equal to a has at least the truth value α. Similarly, let y be a point in the α-level subset

[bl + α(bm − bl),bu − α(bu − bm)]

of b. This means that the statement y is roughly equal to b also has at least the truth value α. Then the statement z = x + y is roughly equal to a + b has at least the truth value α, so that z must be in the α–level set of the sum of a and b. In other words, when x and y vary over the respective α–level sets just mentioned, then z varies over the interval

[(al + bl) + α ((am + bm)− (al + bl)),(au + bu)− α ((au + bu)− (am + bm))]

This must be the α–level set of the sum of a and b. On the other hand, this is precisely the α–level set of the triangular fuzzy number

(al + bl,am + bm,au + bu)

Hence, the membership function of the sum of a and b has a triangular shape. In other words, the addition is exact in the behavior of the parameters and exact in the shape of the membership function.

Multiplication of triangular fuzzy numbers

The product of two fuzzy numbers is not exactly triangular. If the fuzzy numbers a and b are considered again, now under the additional hypothesis that the lower values (and hence the other parameters as well) are positive, then the α–level set of their product is the interval

[(al + α(am − al))× (bl + α(bm − bl)),(au − α(au − am))× (bu − α(bu − bm))]


However, the α–level set of the triangular fuzzy number

(al bl,am bm,au bu)

is given by the interval

[(albl + α(ambm − albl)),(aubu − α(aubu − ambm))]

On the left–hand side (the right-hand side can be analyzed in a similar way) the deviation from the triangular shape is given by

α(am bl + al bm − al bl − am bm) + α²(am − al)(bm − bl)

and the maximum deviation −0.25 (am − al) (bm − bl) is found at α = 0.5.

In what follows, however, the deviation will be ignored, and the product of a and b will simply be written as

a× b = a b = (albl,ambm,aubu) (7.7)
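Formulas (7.6) and (7.7) reduce addition and (approximate) multiplication to component-wise operations on the triples; a minimal sketch, assuming positive parameters for the product as in the text:

    def tri_add(a, b):
        # formula (7.6): exact
        return tuple(x + y for x, y in zip(a, b))

    def tri_mul(a, b):
        # formula (7.7): approximate, deviation from triangularity ignored
        return tuple(x * y for x, y in zip(a, b))

    a = (9.0, 10.0, 18.0)
    b = (1.0, 3.0, 7.0)
    print(tri_add(a, b))   # (10.0, 13.0, 25.0)
    print(tri_mul(a, b))   # (9.0, 30.0, 126.0)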

Addition of arbitrary fuzzy numbers

Consider again two fuzzy numbers a and b, not necessarily with a triangular membership function. Using the analogy with the convolution in probability theory, the membership function of the sum of the two fuzzy numbers is defined by

µa+b(z) = max_{x+y=z} [min(µa(x), µb(y))]     (7.8)

Thus, in order to evaluate the membership function of the sum of a and b at the point z, the union of all pairs (x, y) adding up to z is taken, such that x is roughly equal to a and y roughly equal to b. This idea leads to the so–called extension principle.
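When a and b are only available as discrete value-to-grade pairs, formula (7.8) can be evaluated as a brute-force max-min convolution over all pairs; a small sketch with an assumed discretization:

    def fuzzy_sum(a, b):
        # mu_{a+b}(z) = max over x+y=z of min(mu_a(x), mu_b(y)), formula (7.8)
        out = {}
        for x, mux in a.items():
            for y, muy in b.items():
                z = x + y
                out[z] = max(out.get(z, 0.0), min(mux, muy))
        return out

    # coarse discretizations of two triangular fuzzy numbers
    a = {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.5, 5: 0.0}
    b = {10: 0.0, 11: 1.0, 12: 0.0}
    s = fuzzy_sum(a, b)
    print(s[14])   # 1.0: reached only by x = 3 and y = 11, both with grade 1
    print(s[13])   # 0.5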

Addition of triangular fuzzy numbers

One can now use the extension principle underlying (7.8) to demonstrate that the sum of two triangular fuzzy numbers a and b is also triangular and that it satisfies formula (7.6). The left–hand sides of the respective membership functions are considered and the points

xα ∈ (al,am) and yα ∈ (bl,bm)

are chosen such that

µa(xα) = µb(yα) = α with 0 < α < 1

as well as a point

zα ∈ (al + bl,am + bm)

such that

µa+b(zα) = α


The points xα, yα and zα are unique because of the strictly monotonic behavior of the membership functions in question. Let

z∗ = xα + yα

Two arbitrary points x ∈ (al,am) and y ∈ (bl,bm) are now considered such that x + y = z∗. Obviously, if x < xα, then y > yα, which implies

min [µa(x),µb(y)] < α

This inequality also holds when x > xα and y < yα. Moreover,

min [µa(xα),µb(yα)] = α

whence

max_{x+y=z∗} [min(µa(x), µb(y))] = α

In other words

µa+b(z∗) = α

so that one can write zα = z∗ = xα + yα. The observation that zα is now clearly a linear function of α completes the proof that addition preserves the triangular shape of the membership function.

Multiplication of arbitrary fuzzy numbers

For the product of two arbitrary fuzzy numbers a and b with positive support, the membership function is defined by

µab(z) = max_{xy=z} [min(µa(x), µb(y))]     (7.9)

Multiplication does not preserve the triangular shape of the membership function if the factors are triangular, as was seen in formula (7.7).

Functions of arbitrary fuzzy numbers

In general, the membership function of any function f(a,b) of two fuzzy numbers a and b can now be defined by

µf(a,b)(z) = max_{f(x,y)=z} [min(µa(x), µb(y))]

This illustrates the extension principle: the union of the pairs (x,y) is considered such that f(x,y) = z, where x is roughly equal to a and y roughly equal to b, in order to evaluate the membership function of f(a,b) at the point z.


In what follows, a simple set of arithmetic rules will be introduced to operate with triangular fuzzy numbers. In fact, the reader will only operate on the lower, the modal, and the upper values characterizing them.

Subtraction of triangular fuzzy numbers

The difference between two triangular fuzzy numbers a = (al,am,au) and b = (bl,bm,bu) is given by the triangular fuzzy number

a− b = (al − bu,am − bm,au − bl) (7.10)

an assertion which can easily be verified by inspection of the behavior of the α–level sets. Note that the solution of the equation

b + x = a

is given by

(al − bl,am − bm,au − bu)

which only represents a triangular fuzzy number if the three parameters have an increasing order. In fuzzy arithmetic, there is clearly a distinction between the implicit and the explicit solution of an equation. In order to clarify the issue, consider the fuzzy numbers a = (9,10,18) and b = (1,3,7). Then

a− b = (2,7,17)

and the solution x to the above equation would be given by (8, 7, 11). The last–named triple does not represent a fuzzy number, however. This has important implications.

The sum of a triangular fuzzy number and its fuzzy opposite number is given by

(al − au,am − am,au − al) = (al − au,0,au − al)

which can be taken to represent roughly zero. In general, two triangular fuzzy numbers with opposite modal values could be opposite fuzzy numbers if their sum is roughly zero in the actual context. The equation

(al,am,au) + (xl,xm,xu) = (0,0,0)

with exactly zero on the right–hand side, has no solution, however, because the parameters of the triple

(−al,− am,− au)

do not have the increasing order which is required for fuzzy numbers. In general, a triangular fuzzy number does not therefore have a proper opposite number.
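The subtraction rule (7.10) and the distinction between the explicit difference and the implicit solution of b + x = a can be checked on the numbers used above:

    def tri_sub(a, b):
        # formula (7.10)
        al, am, au = a
        bl, bm, bu = b
        return (al - bu, am - bm, au - bl)

    def is_triangular(t):
        # a triple only represents a triangular fuzzy number if it is increasing
        return t[0] <= t[1] <= t[2]

    a = (9, 10, 18)
    b = (1, 3, 7)
    print(tri_sub(a, b))                        # (2, 7, 17)
    implicit = (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    print(implicit, is_triangular(implicit))    # (8, 7, 11) False: not a fuzzy number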

Division of triangular fuzzy numbers

Although division does not preserve triangularity, the ratio of two triangular fuzzy numbers a

and b can be written in the form of the triangular fuzzy number


a/b = (al/bu, am/bm, au/bl)

provided that the lower values of a and b are positive.

The solution of the equation

b x = a

is approximately given by

(al/bl, am/bm, au/bu)

which indeed stands for a triangular fuzzy number if the three parameters have an increasing order. Again, there is a distinction between the implicit and the explicit solution of an equation, and there are important implications.

The product of a triangular fuzzy number and its fuzzy inverse is given by

(al, am, au) × (1/au, 1/am, 1/al) = (al/au, am/am, au/al) = (al/au, 1, au/al)

which can be taken to stand for roughly one. In general, two triangular fuzzy numbers with inverse modal values could be fuzzy inverse numbers if their product is roughly 1 in the actual context. The equation

(al,am,au)× (xl,xm,xu) = (1,1,1)

with exactly one on the right–hand side, does not have a fuzzy solution because the parameters of the triple

(1/au, 1/am, 1/al)

do not have the increasing order which is required for fuzzy numbers. In general, a triangular fuzzy number does not therefore have a proper inverse.

Maximum of triangular fuzzy numbers

The maximum of two triangular fuzzy numbers is not necessarily triangular, but one can write

max [(al,am,au),(bl,bm,bu)] = [max (al,bl), max (am,bm), max (au,bu)] (7.11)

The membership function of the maximum (the thick line) is shown in Figure 7.16. The correctness can be verified via inspection of the behavior of the α–level sets.


Figure 7.16. The maximum of two triangular fuzzy numbers

Exponentials, logarithms, and inverses

Functions of a single triangular fuzzy variable can also be defined via the extension principle. This leads to simple operations like the calculation of exponentials which can be written in the form

exp [(al,am,au)] = [exp(al), exp(am), exp(au)] (7.12)

whereby it is tacitly assumed that the deviations from triangularity may be ignored. Under the condition al > 0 one can also write

ln (al,am,au) = [ln(al), ln(am), ln(au)]     (7.13)

and

1/(al, am, au) = (1/au, 1/am, 1/al)

whereby it is also tacitly ignored that triangularity is not exactly preserved.

7.3 Fuzzy SMART

The simple multiattribute rating technique (SMART, Von Winterfeldt and Edwards, 1986) is a method for multicriterial decision making (MCDM) whereby the decision maker evaluates a finite number of decision alternatives under a finite number of performance criteria. The purpose of the analysis is to rank the alternatives in a subjective order of preference and, if possible, to rate the overall performance of the alternatives via the proper assignment of numerical grades or scores. SMART is first presented in its deterministic form, regardless of the vagueness of human preferential judgement, and thereafter a fuzzy variant is discussed which can easily be used for a sensitivity analysis of the results. As a vehicle for discussion an example is employed to illustrate the applications of MCDM: the evaluation and the selection of a class of vessels.

7.3.1 Screening Phase

MADM starts with the so–called screening phase which proceeds via several categorizations. What is the objective of the decision process? Who is the decision maker or what is the composition of the decision–making group? What are the performance criteria to be used in order to judge the alternatives? Which alternatives are feasible or not totally unfeasible?


Throughout the decision process new alternatives may appear, new criteria may emerge, old ones may be dropped, and the decision–making group may change. Many decision problems are not clear–cut, and the decision makers have to find their way in the jungle of conflicting objectives.

The result of the screening phase is the so–called design matrix which exhibits the performance of the alternatives. Under the so–called quantitative or measurable criteria the performance is recorded in the original physical or monetary units. Under the qualitative criteria it can only be expressed in verbal terms. Table 7.3 shows such a possible performance tableau for the vessel selection example. The tacit assumption is that the alternatives are in principle acceptable for the decision makers and that a weak performance under some attributes (criteria) can be compensated by an excellent performance under some of the remaining ones. In other words, the decision makers are in principle prepared to trade off possible deficiencies of the alternatives under some attributes against possible benefits elsewhere in the performance matrix. The alternatives which do not appear in the matrix have been dropped from consideration because their performance under at least one of the attributes was beyond certain limits (crisp constraints).

Attribute A1 A2 A3 A4

Table 7.3. Decision matrix of four vessels under seven criteria

The importance of the tableau cannot be overestimated. In many situations, once the data are on the table, the preferred alternative clearly emerges and the decision problem can easily be solved. It is left to the decision makers to arrive at a compromise solution.

Given the performance matrix, the next question is how to select the attributes which are really relevant. The number of attributes might be too large, and they are not independent. For example, the acquisition cost, the fuel consumption, and the costs for maintenance are closely related. Hence, the decision maker could take the estimated annual expenditures or just the building cost to represent the costs in the selection problem. Similarly, low accelerations and the absence of noise and vibrations contribute to the comfort on board, which could be the real attribute. Nevertheless, measurable criteria usually help the decision makers to remain down to earth so that they are not swept away by the nice design of a general arrangement plan, for instance.

Finally, the decision makers have to convert the data of the design matrix into subjective values expressing their preferential judgement. For the qualitative criteria they usually have an arithmetic scale only to express their assessment of the performance. The seven–point scale 1, . . . ,7 which is well–known in the behavioral sciences, and the scale 4, . . . ,10 which can easily be used for the same purposes, will extensively be discussed in the subsections to follow. Under the quantitative


criteria the conversion is also non-trivial. To this end, a simple and straightforward conversion procedure is proposed below, which derives many arguments from the behavioral sciences and from psycho–physics.

7.3.2 Categorization of a Range

Consider the subjective evaluation of vessels in the vessel selection problem, first under the building cost criterion, thereafter under the operability criterion and the maximum speed criterion. This will enable the decision maker to illustrate not only the subdivision of the ranges of acceptable performance data but also the generation of judgemental categories (cost categories, . . . ). For the time being, the problem is considered from the viewpoint of a single decision maker only.

Vessels under the cost criterion

Usually, low costs are important for a decision maker so that he/she should carefully consider the building cost and possibly the annual manning costs. The building cost as such, however, cannot tell whether a given vessel would be more or less acceptable. That depends on the context of the decision problem, that is, on the spending power of the decision maker and on the alternative vessels which he/she seriously has at disposal. In what follows it will be assumed that the acceptable costs are anchored between a minimum cost Cmin to be paid anyway for the type or class of vessels which the decision maker seriously considers and a maximum cost Cmax which he/she cannot or does not really want to exceed. Furthermore, it is assumed that the decision maker will intuitively subdivide the cost range (Cmin,Cmax) into a number of subintervals which are felt to be subjectively equal. The grid points Cmin, Cmin + e0, Cmin + e1, . . . , are taken to denote the cost levels which demarcate these subintervals. The cost increments e0, e1, e2, . . . represent the echelons of the so–called category scale under construction. In order to model the requirement that the subintervals must subjectively be equal, one should recall Weber's law, stating that the just noticeable difference ∆s in stimulus intensity must be proportional to the actual stimulus intensity itself. The just noticeable difference is the smallest possible step when the decision maker moves from Cmin to Cmax, which is assumed to be practically the step carried out in the construction of the model. Thus, taking the cost increment above Cmin as the stimulus intensity, i.e. assuming that the decision maker is not really sensitive to the cost as such but to the excess above the minimum cost to be paid anyway for the vessels under consideration, it is set

eν − eν−1 = ε·eν−1 ν = 1,2, . . .

which yields

eν = (1 + ε) eν−1 = (1 + ε)² eν−2 = . . . = (1 + ε)^ν e0

Obviously, the echelons constitute a sequence with geometric progression. The initial step is e0 and (1 + ε) is the progression factor. The integer–valued parameter ν is chosen to designate the order of magnitude of the echelons.


The number of subintervals is rather small because human beings have the linguistic ability to use a small number of verbal terms or labels in order to categorize the costs (cognitive economy). The following qualifications are commonly used as category labels here: ‘cheap’, ‘cheap – somewhat more expensive’, ‘somewhat more expensive’, ‘somewhat more – more expensive’, ‘more expensive’, ‘more – much more expensive’, ‘much more expensive’.

Thus, there are four major, linguistically distinct categories: cheap, somewhat more, more, and much more expensive vessels. Moreover, there are three so–called threshold categories between them which can be used when the decision maker hesitates between the neighboring qualifications. Now it is necessary to link the cost categories with the cost levels Cmin + e0, Cmin + e1, . . .

The next subsection will show that human beings follow a uniform pattern in many unrelated areas when they subdivide a particular range into subjectively equal subintervals. They demarcate the subintervals by a geometric sequence of six to nine grid points corresponding to major and threshold echelons, and the progression factor is roughly 2. Sometimes there is a geometric sequence with grid points corresponding to major echelons only, and the progression factor is roughly 4.

Take, for instance, the range between MU 20,000 and MU 40,000, where MU denotes the monetary unit, for small to mid–size vessels. The length of the range is MU 20,000. Hence, setting the cost level Cmin + e6 at Cmax one has

e6 = Cmax − Cmin

e0 (1 + ε)^6 = 20,000 and (1 + ε) = 2 ⇒ e0 = 20,000/64 ≈ 300

Now, the cost levels are associated with the cost categories as follows:

C0 = Cmin + e0     MU 20,300     cheap vessels
C1 = Cmin + e1     MU 20,600     cheap – somewhat more expensive vessels
C2 = Cmin + e2     MU 21,200     somewhat more expensive vessels
C3 = Cmin + e3     MU 22,500     somewhat more – more expensive vessels
C4 = Cmin + e4     MU 25,000     more expensive vessels
C5 = Cmin + e5     MU 30,000     more – much more expensive vessels
C6 = Cmin + e6     MU 40,000     much more expensive vessels

Thus, the cost range (Cmin,Cmax) has been ‘covered’ by the grid with the geometric sequence of points

Cν = Cmin + (Cmax − Cmin) × 2^ν/64     ν = 0, 1, . . . , 6     (7.14)

In what follows Cν is taken to stand for the ν-th cost category and the integer–valued parameter ν for its order of magnitude, which is given by

ν = log2 [(Cν − Cmin)/(Cmax − Cmin) × 64]     (7.15)
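Formulas (7.14) and (7.15) are easy to tabulate; a short sketch reproduces the cost levels of the MU 20,000-40,000 example up to the rounding used in the text:

    import math

    C_MIN, C_MAX = 20_000.0, 40_000.0

    def cost_level(nu, cmin=C_MIN, cmax=C_MAX):
        # formula (7.14)
        return cmin + (cmax - cmin) * 2 ** nu / 64.0

    def order_of_magnitude(c, cmin=C_MIN, cmax=C_MAX):
        # formula (7.15)
        return math.log2((c - cmin) / (cmax - cmin) * 64.0)

    for nu in range(7):
        print(nu, cost_level(nu))
    # 20312.5, 20625.0, 21250.0, 22500.0, 25000.0, 30000.0, 40000.0
    # (the text rounds the first three levels to MU 20,300, 20,600 and 21,200)

    print(order_of_magnitude(25_000.0))   # 4.0, i.e. 'more expensive vessels'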


Categorization of the costs means that each cost in or slightly outside the range (Cmin, Cmax) is supposed to ‘belong’ to a particular category, namely the category represented by the nearest Cν. Of course, categorization can more appropriately be modelled via fuzzy–set theory. This will be considered in the subsection 8.3.5. The vessels of the category C0 are referred to as the cheap ones within the given context, and the vessels of the categories C2, C4, and C6 as the somewhat more, more, and much more expensive ones. At the odd–numbered grid points C1, C3, and C5, the decision maker hesitates between two adjacent gradations of expensiveness. If necessary, one can also introduce the category C8 of vastly more expensive vessels which are situated beyond the range, as well as the category C7 if the decision maker hesitates between much more and vastly more expensiveness. The even–numbered grid points are the so-called major grid points designating the major gradations of expensiveness. They constitute a geometric sequence in the range (Cmin, Cmax) with progression factor 4. If the decision maker also takes into account the odd-numbered grid points corresponding to hesitations, he/she has a geometric sequence of major and threshold gradations with progression factor 2.

Figure 7.17. Categorization of a cost range

The crucial assumption here is that the decision maker considers the costs from the so–called desired target Cmin at the lower end of the range of acceptable costs. From this viewpoint he/she looks at less favorable alternatives. That is the reason why the above categorization, in principle an asymmetric subdivision of the range under consideration, has an orientation from the lower end. The upward direction is typically the line of sight of the decision maker under the cost criterion. Figure 7.17 shows the concave form of the relationship between the cost echelons on the interval (Cmin, Cmax) and their order of magnitude.

Suppose that the costs of the vessels Aj and Ak belong to the categories represented by Cνj and Cνk, respectively. The relative preference for Aj with respect to Ak is expressed by the inverse ratio of the cost increments above Cmin, which can be written as

(Cνk − Cmin)/(Cνj − Cmin) = 2^(νk−νj)     (7.16)


By this definition, a vessel in the cost category C0 is 4 times more desirable than a vessel in the category C2. The first–named vessel is said to be somewhat cheaper, the last–named vessel is somewhat more expensive. Hence, assuming that the decision maker also has a limited number of labels to express relative preference in comparative judgement, we identify the ratio 4:1 with weak preference. Similarly, he/she identifies the ratio 16:1 with definite preference (the first–named vessel is cheaper, the last–named vessel more expensive), and a ratio of 64:1 with strong preference (the first–named vessel is much cheaper, the last–named vessel much more expensive). The relative preference depends strongly on Cmin and weakly on Cmax. When Cmax increases, two costs which initially belong to different cost categories will tend to belong to the same one.

Vessels under the operability criterion

Numerical data to estimate the operability of vessels are usually available. Suppose that the decision maker only considers vessels with an operability of at least Omin = 95%, so that he/she is restricted to the interval (Omin, Omax) with Omax usually set to 100%. Following the mode of operation just described, the decision maker obtains the major grid points (the major categories of operability)

O0 = Omax − e0 = 99.9 %     operable vessels
O2 = Omax − e2 = 99.7 %     somewhat operable vessels
O4 = Omax − e4 = 98.7 %     less operable vessels
O6 = Omax − e6 = 95.0 %     much less operable vessels

because e0 = (100− 95)/64 ≈ 0.08. In general one can write

Oν = Omax − (Omax − Omin) × 2^ν/64     ν = 0, 1, . . . , 6

The alternatives are compared with respect to the desired target, which is here taken to be at the upper end Omax of the range of acceptable operabilities. The relative preference is inversely proportional to the distance from the target. If one takes the symbols Oνj and Oνk to denote the operability of the alternative vessels Aj and Ak respectively, then the inverse ratio

(Omax − Oνk)/(Omax − Oνj) = 2^(νk−νj)

represents the relative preference for Aj with respect to Ak under the operability criterion. The qualification ‘somewhat more operable’ implies that the inverse ratio of the distances to the respective target is 4:1; the qualification ‘more operable’ implies that the inverse ratio is 16:1, etc. The relationship between the order of magnitude ν and the operability category Oν takes the explicit form

ν = log2 [(Omax − Oν)/(Omax − Omin) × 64]     (7.17)

The typical relationship between the echelons on the dimension of operability and their orders of magnitude is shown in Figure 7.18.


Figure 7.18. Categorization of an operability range

Vessels under the maximum speed criterion

It may happen that the categorization starts not from the desired target at one end of the range, but from the opposite end point because the desired target is hazy. An example is given by the categorization of the maximum velocities. The range of acceptable maximum velocities has a clear lower end point Vmin at 14.0 kn. Even if the shipowner should consistently prefer higher maxima to lower ones, the desired target is difficult to specify. Set Vmax at 22.0 kn. It seems to be reasonable to choose the orientation from Vmin so that the following major grid points exist:

V0 = Vmin + e0 = 14.1 kn     slow vessels
V2 = Vmin + e2 = 14.5 kn     somewhat faster vessels
V4 = Vmin + e4 = 16.0 kn     faster vessels
V6 = Vmin + e6 = 22.0 kn     much faster vessels

In general one has

Vν = Vmin + (Vmax − Vmin) × 2^ν/64     ν = 0, 1, . . . , 6

where the order of magnitude ν and the category Vν are connected by the relation

ν = log2 [(Vν − Vmin)/(Vmax − Vmin) × 64]     (7.18)

The relative maximum speed of two alternative vessels Aj and Ak with maximum speeds Vνj and Vνk respectively, is given by the ratio

eνj/eνk = (Vνj − Vmin)/(Vνk − Vmin) = 2^(νj−νk)     (7.19)

not by the inverse ratio of the echelons as in the previous cases. The choice of the orientation is left to the decision maker. What matters is his/her perspective on the decision problem.


7.3.3 Assessing the Alternatives: Direct Rating

When decision makers judge the performance of the alternatives, they frequently express their judgement by choosing an appropriate value between a predetermined lower limit for the worst alternative and a predetermined upper limit for the best alternative. In schools and universities this direct–rating procedure is known as the assignment of grades expressing the performance of the pupils or students on a category scale with equidistant steps, between 1 and 5, between 1 and 10, or between 1 and 100. The upper limit varies from country to country. Because everybody has once been subject to his/her teacher’s judgement the grades have a strong qualitative connotation which can successfully be used in MADM. Concentrating on the scale between 1 and 10, suppose that a unit step difference represents an order of magnitude difference in performance. A student who scores 9 is an order of magnitude better than a pupil scoring 8, etc. A unit step difference designates a performance ratio 2. Returning to the vessel selection problem, the following grades are assigned

10     excellent     order of magnitude ν = 0
 8     good          order of magnitude ν = 2
 6     fair          order of magnitude ν = 4
 4     poor          order of magnitude ν = 6

according to the major gradations of expensiveness and operability (in pass–or–fail decisions at schools the grades 1, 2, and 3 are normally used for a very poor performance that cannot be compensated elsewhere so that they are mostly ignored here). Now, considering two alternative vessels Aj and Ak under the building cost criterion with the respective grades

gj = 10− νj and gk = 10− νk

assigned to them, the inverse ratio is taken

eνk/eνj = (Cνk − Cmin)/(Cνj − Cmin) = 2^(gj−gk)

to stand for their relative expensiveness. The relative operability of the two alternatives is scored in a similar way. Figure 7.19 and Figure 7.20 illustrate the relationship between judgemental categories, orders of magnitude, and grades.

For maximum speed a somewhat different assignment of grades is suitable. One takes

gj = 4 + νj and gk = 4 + νk

since he/she does not have an orientation from the desired target but from the opposite end of the range. Thus, the relative maximum speed of the two alternatives is expressed by the ratio

eνj/eνk = (Vνj − Vmin)/(Vνk − Vmin) = 2^(gj−gk)

not by the inverse ratio of maximum velocities above Vmin.


Figure 7.19. Categorization of a cost range

Figure 7.20. Categorization of an operability range

Because a decision maker works in fact with differences of grades only, there is an additive degree of freedom in the grades which enables him/her to replace the scale 4, ..., 10 by the scale 1, ..., 7, a scale which is well known in the behavioral sciences. Similarly, one can convert the qualitative scale ranging from − − − to +++ into the quantitative scale 4, ..., 10. Thus, the decision maker has a variety of scales to express his judgement, but there is a uniform approach to analyze the responses.

The direct-rating procedure is illustrated via the vessel selection problem, under the assumption that the decision maker considers acquisition cost in the range between MU 20,000 and MU 40,000, operability between 95% and 100%, and maximum speed between 14 kn and 22 kn; within these ranges the judgement is as shown in Table 7.4.


Cost [MU]   Operability [%]   Speed [kn]   Performance      Grade   Qual. scale
20,300      99.9              22.0         Excellent         10     +++
20,600      99.8              18.0         Good/Excellent     9     ++
21,200      99.7              16.0         Good               8     +
22,500      99.2              15.0         Fair/Good          7     0
25,000      98.7              14.5         Fair               6     −
30,000      97.5              14.3         Poor/Fair          5     − −
40,000      95.0              14.1         Poor               4     − − −

Table 7.4. Assignment of grades with predetermined range

Such an assignment of grades is feasible when the performance of the alternatives can be expressed in physical or monetary units on a one-dimensional scale. A direct-rating procedure is also used, however, when the performance can only be expressed in qualitative terms. In the evaluation of vessels under the criterion of comfort, for instance, the decision maker is asked to rate the vessels straightaway. First, he/she has to determine the endpoints, being requested to identify the worst and the best alternative and to assign proper grades to them. Thereafter he/she can interpolate the remaining alternatives between the endpoints. This procedure also explains the name of the considered method: the simple direct-rating technique SMART to assess a number of alternatives under a multiple set of attributes or criteria.
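The grade assignment of Table 7.4 can be sketched in a few lines of Python. The function below is only an illustration of the categorization just described; rounding the order of magnitude to the nearest integer is an assumption, and maximum speed, with its reversed orientation, would use g = 4 + ν instead.

```python
import math

# Illustrative grade assignment for the ranges of Table 7.4
# (cost MU 20,000-40,000, operability 95-100%); rounding nu is an assumption.
def grade_from_target(value, target, range_width):
    nu = round(math.log2(abs(value - target) / range_width * 64))
    return 10 - nu                               # g = 10 - nu, measured from the desired target

print(grade_from_target(25_000, target=20_000, range_width=20_000))  # cost MU 25,000     -> 6 (fair)
print(grade_from_target(98.7,   target=100.0,  range_width=5.0))     # operability 98.7 % -> 6 (fair)
# For maximum speed the orientation is reversed and g = 4 + nu is used instead.
```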

7.3.4 Criterion Weights and Aggregation

Some notation and terminology are introduced first. Consider a finite number of alternatives Aj (j = 1, ..., n), under a finite number of performance criteria Ci (i = 1, ..., m), with the respective criterion weights ci (i = 1, ..., m). Furthermore, it is assumed that the criterion weights are normalized so that they sum up to 1. The decision maker assessed the alternatives under each of the criteria separately. Moreover, the decision maker expressed his/her judgement of alternative Aj under criterion Ci by the assignment of the grade gij, which from now on will be called the impact grade. So far, the decision maker has been working on different dimensions: building cost, operability, and maximum speed. Judgemental statements like 'somewhat more expensive' and 'somewhat more operable' cannot be aggregated, however, unless a transition is made to the new, common dimension of desirability or preference intensity. That is the reason why the expression 'somewhat more operable', for instance, is taken to stand for 'somewhat more desirable' or 'weakly preferred' under the operability criterion. Similarly, it is assumed that the expression 'somewhat more expensive' stands for 'somewhat less desirable' under the building cost attribute, etc.

In order to judge the overall performance of the alternatives under all criteria simultaneously we calculate the final grades sj of the respective alternatives Aj according to the so-called arithmetic-mean aggregation rule

\[
s_j = \sum_{i=1}^{m} c_i\, g_{ij}\,, \qquad j = 1, \ldots, n \qquad (7.20)
\]
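As a small illustration of the arithmetic-mean aggregation rule (7.20), the sketch below combines hypothetical impact grades with hypothetical normalized weights (the elicitation of the weights is discussed next).

```python
# Sketch of the arithmetic-mean aggregation rule (7.20); weights and grades are hypothetical.
weights = [0.4, 0.35, 0.25]            # c_i, normalized criterion weights
impact_grades = {                      # g_ij under the three criteria
    "Des1": [8, 6, 7],
    "Des2": [6, 9, 7],
    "Des3": [7, 7, 8],
}

final_grades = {a: sum(c * g for c, g in zip(weights, grades))
                for a, grades in impact_grades.items()}
print(final_grades)                    # the highest final grade designates the preferred alternative
```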


The highest final grade is supposed to designate the preferred alternative. Let us first discuss the significance of the criterion weights, however, as well as their elicitation.

We start from the assumption that criteria have particular weights in the mind of the decision maker. They could depend on the manner in which the performance of the alternatives under each of the criteria individually has been recorded, that is, on the units of performance measurement. They could also depend on the aggregation procedure generating the final grades which express the performance of the alternatives under all criteria simultaneously. Usually, however, decision makers ignore these issues. They are prepared to estimate the weights of the criteria or their relative importance (weight ratios), regardless of how the performance of the alternatives has been measured and regardless of the aggregation procedure, so that they seem to supply meaningless information. Many decision makers want to be consistent over a coherent collection of decision problems. Moreover, there is a good deal of distributed decision making in large organizations. The evaluation of a number of decision alternatives is entrusted to a committee, the criteria are suggested in vague verbal terms or firmly prescribed by those who established the committee, but the choice of the attribute weights and the final aggregation are felt to be the responsibility of administrators or design leaders at higher levels in the hierarchy.

In SMART the decision maker may ignore the units of performance measurement because the grades do not depend on them. Under the building cost criterion, for instance, he/she may replace US dollars by Euros or any other currency, but this does not affect the orders of magnitude of the cost categories. Lootsma (1993, 1996) presented a more detailed discussion of the issue in a study on SMART and the Multiplicative AHP, a multiplicative variant of the Analytic Hierarchy Process (Saaty, 1980). The relative importance of the criteria appeared to be a meaningful concept, even in isolation from immediate context (see also Section 5.3). In what follows we shall work with the arithmetic-mean aggregation rule without further discussion.

We now concern ourselves with the numerical scale to quantify the relative importance (the weight ratio) of any two criteria. The first thing we want to establish is the range of possible values for the relative importance. Equal importance of two criteria is expressed by the ratio 1:1 of the criterion weights, but how do we express much more or vastly more importance? In order to answer the question we carry out an imaginary experiment: we ask the decision maker to consider two alternatives Aj and Ak and two criteria such that his/her preference for Aj over Ak under the first criterion Cf is roughly equal to his/her preference for Ak over Aj under the second criterion Cs. Moreover, we suppose that the situation is extreme: the impact grades assigned to the two alternatives are 6 units apart under each of the two criteria. We can accordingly write the impact grades assigned to Aj as

gfj   and   gsj

and the impact grades assigned to Ak as

gfk = gfj − 6   and   gsk = gsj + 6

This means that the decision maker has a strong preference for Aj over Ak under the first criterion Cf and an equally strong but opposite preference under the second criterion Cs. If the two criteria are felt to be equally important, the final grades of the two alternatives will be equal, so that the decision maker is indifferent between the two alternatives under the two criteria simultaneously. However, if the final grades are 5 units apart, the ratio ω of the corresponding criterion weights has to satisfy the relation

\[
\left\{\frac{\omega}{\omega+1}\, g_{fj} + \frac{1}{\omega+1}\, g_{sj}\right\} - \left\{\frac{\omega}{\omega+1}\,(g_{fj}-6) + \frac{1}{\omega+1}\,(g_{sj}+6)\right\} = 5 \qquad (7.21)
\]

whence ω = 11, since the left-hand side reduces to 6(ω − 1)/(ω + 1). Such a ratio implies that the strong preference for Aj over Ak under Cf almost completely wipes out the equally strong but opposite preference under Cs.

If the impact grades of the two alternatives are 8 units apart under each of the two criteria and if the final grades are 7 units apart, the ratio ω has to satisfy the relation

\[
\left\{\frac{\omega}{\omega+1}\, g_{fj} + \frac{1}{\omega+1}\, g_{sj}\right\} - \left\{\frac{\omega}{\omega+1}\,(g_{fj}-8) + \frac{1}{\omega+1}\,(g_{sj}+8)\right\} = 7 \qquad (7.22)
\]

which yields ω = 15.
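The two weight ratios can be checked numerically: under the normalized weights ω/(ω + 1) and 1/(ω + 1), impact grades d units apart lead to a final-grade difference of d(ω − 1)/(ω + 1). The small sketch below evaluates this; the unperturbed grades are arbitrary.

```python
# Numerical check of relations (7.21)-(7.22); the unperturbed grades gf, gs are arbitrary.
def final_grade_gap(omega, d, gf=8.0, gs=8.0):
    wf, ws = omega / (omega + 1), 1 / (omega + 1)   # normalized criterion weights with ratio omega
    s_j = wf * gf + ws * gs
    s_k = wf * (gf - d) + ws * (gs + d)
    return s_j - s_k

print(final_grade_gap(11, 6))   # -> 5 (up to floating point), relation (7.21)
print(final_grade_gap(15, 8))   # -> 7 (up to floating point), relation (7.22)
```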

In addition, two new assumptions are now introduced: (i) the number of gradations to express relative preference for the alternatives equals the number of gradations for the relative importance of the criteria, and (ii) the numerical values associated with the gradations of relative importance constitute a sequence with geometric progression. In the extreme case of formula (7.21), where a much higher preference under the first criterion is practically wiped out by a much higher preference under the second criterion, we accordingly refer to the relative importance of the first criterion with respect to the second one as much higher. Similarly, in the extreme case of formula (7.22), the relative importance of the first criterion with respect to the second one is vastly higher. So, a ratio of 16:1 may be taken to stand for vastly more importance. This is also confirmed by other imaginary experiments where a decision maker is supposed to have a very strong preference for Aj under the first of three criteria and a definite preference for Ak under the second and the third criterion.

A simple geometric sequence of values, with echelons corresponding to equal, somewhat more, more, much more, and vastly more importance, and 'covering' the range of values between 1/16 and 16, is given by the sequence 1/16, 1/8, 1/4, 1/2, 1, 2, 4, 8, 16 with progression factor 2. Hence, we obtain the following geometric scale for the major gradations of relative importance:

16   Cf vastly more important than Cs
 8   Cf much more important than Cs
 4   Cf more important than Cs
 2   Cf somewhat more important than Cs
 1   Cf as important as Cs

and if we allow for threshold gradations to express hesitations between two adjacent qualifications in the above list, we have a geometric sequence with progression factor √2.

We can now ask the decision maker to express the importance of the criteria in grades on the scale 4, 5, ..., 10 (grades lower than 4 are possible, but they practically eliminate the corresponding criteria). A difference of 6 units has to represent the ratio 8. This can be achieved via the progression factor √2. Taking hi to stand for the grade assigned to criterion Ci, we estimate the ratio of the weights of two criteria Ci1 and Ci2 by

\[
(\sqrt{2})^{\,h_{i_1} - h_{i_2}}
\]

A non-normalized weight of Ci is accordingly given by (√2)^hi.

The normalization thereafter, whereby we eliminate the additive degree of freedom in the grades, yields the desired weight

\[
c_i = \frac{(\sqrt{2})^{\,h_i}}{\sum_{i=1}^{m} (\sqrt{2})^{\,h_i}} \qquad (7.23)
\]

of criterion Ci. The criterion weights clearly sum up to 1.
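A minimal sketch of formula (7.23) follows; the importance grades below are hypothetical, and a two-grade gap doubles the resulting weight because the progression factor is √2.

```python
import math

# Sketch of formula (7.23): criterion weights from importance grades (hypothetical values).
def criterion_weights(h):
    raw = [math.sqrt(2) ** hi for hi in h]     # non-normalized weights (sqrt 2)^h_i
    total = sum(raw)
    return [r / total for r in raw]

h = [10, 8, 8, 6]                              # hypothetical importance grades on the 4..10 scale
c = criterion_weights(h)
print([round(ci, 3) for ci in c])              # the weights sum up to 1
print(round(c[0] / c[1], 2))                   # 2.0: a 2-grade gap doubles the weight
```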

7.3.5 Sensitivity Analysis via Fuzzy SMART

The SMART version described so far has a particular drawback: the decision maker is supposed to choose one grade only, for the assessment of an alternative under a given criterion and also for the assessment of the importance of the criteria. Usually, however, he/she realizes that several grades (not only the integer-valued ones) are more or less appropriate. There are several reasons for this.

1. Even if the performance of the alternatives can be expressed on a numerical scale, in physical or monetary units, the data may be imprecise: building cost is negotiable, maximum speed may be measured under ideal circumstances, and so forth.

2. The gradations of human judgement (excellent, good, fair, poor), to be used if the performance of the alternatives can only be expressed in verbal terms under qualitative criteria such as design, ambiance, comfort, ..., are vaguely defined.

3. The upper and the lower ends of the ranges of acceptable performance data urge the decision maker to concentrate his/her attention on the alternatives which are in principle acceptable, but they are only vaguely known.

4. The relative importance of the criteria, usually expressed in verbal terms only (equally important, somewhat more, more, much more important), has an imprecise meaning.

The decision maker could accordingly model his/her judgement by assigning truth values to several (integer and non-integer) grades in order to express how well they represent his/her preference. An alternative mode of operation would be to choose the most appropriate grade as well as right-hand and left-hand spreads indicating how far his/her judgement extends.

Let us further explore the last-named approach. We then model the decision maker's judgement of the performance of alternative Aj under criterion Ci by a fuzzy number, in particular by the triangular fuzzy number

gij = (gijl, gijm, giju)


Of course, the decision maker is supposed to supply not only the modal value gijm, but also the lower value gijl and the upper value giju. Similarly, we model the importance of criterion Ci by the triangular fuzzy number

hi = (hil, him, hiu)

again under the assumption that the decision maker will be prepared to supply not only the modal value him, but also the lower value hil and the upper value hiu. Using the arithmetic operations of subsection 7.3.4 we find a non-normalized weight of criterion Ci of the form

\[
\left((\sqrt{2})^{\,h_{il}},\; (\sqrt{2})^{\,h_{im}},\; (\sqrt{2})^{\,h_{iu}}\right)
\]

and we obtain normalized weights of the criteria when we divide the lower, modal, and upper values by the sum of the modal values. This procedure (allowed because the grades hil, him, hiu, i = 1, ..., m, are supposed to have an additive degree of freedom) guarantees that the modal values are properly normalized in the sense that they sum up to 1.

This approach does not seem to be practical. We ask the decision maker to supply much more information than in the crisp case, whereas the added value of the analysis does not proportionally increase. However, we can simplify the procedure considerably. In order to get a rough idea of how the decision maker's imprecision affects the final grades of the alternatives, he/she can be asked to specify a uniform right-hand spread and left-hand spread σ (= upper − modal value = modal − lower value) which he/she almost never exceeds in the actual decision problem. A reasonable value seems to be given by σ = 1, so that not only a given integer grade g but also the non-integer grades between g − 1 and g + 1 are more or less prototypical. We can now write

gij = (gijm − σ,gijm,gijm + σ)

hi = (him − σ,him,him + σ)

Normalized criterion weights are accordingly given by

\[
c_i = c_{im} \times \left((\sqrt{2})^{-\sigma},\; 1,\; (\sqrt{2})^{\sigma}\right)
\]

where cim stands for the normalized weight of criterion Ci in the crisp case, written as

\[
c_{im} = \frac{(\sqrt{2})^{\,h_{im}}}{\sum_{i=1}^{m} (\sqrt{2})^{\,h_{im}}} \qquad (7.24)
\]

Ignoring the bounds on the impact grades we could write the fuzzy final grades of the alternatives in the form

\[
s_j = \sum_i c_i\, g_{ij} = \sum_i c_{im} \times \left((\sqrt{2})^{-\sigma},\, 1,\, (\sqrt{2})^{\sigma}\right) \times \left(g_{ijm} - \sigma,\; g_{ijm},\; g_{ijm} + \sigma\right)
\]

but since the final grades must be between 4 and 10 we have the approximate result

\[
s_{jm} = \sum_i c_{im}\, g_{ijm} \qquad (7.25)
\]
\[
s_{jl} = \max\!\left(4,\; (\sqrt{2})^{-\sigma}\,(s_{jm} - \sigma)\right) \qquad (7.26)
\]
\[
s_{ju} = \min\!\left(10,\; (\sqrt{2})^{\sigma}\,(s_{jm} + \sigma)\right) \qquad (7.27)
\]

These formulas can now be used for a sensitivity analysis of the results in Table 7.5.

Attribute        Weight   Des1   Des2   Des3   Des4
Building cost
Maximum speed
Acceleration
Cargo volume
Operability
Ambiance
Final scores

Table 7.5. Final design matrix for a vessel selection problem

With σ = 1 the fuzzy final grades of the alternative vessels are as follows:

Des1   (4.0, 6.6, 10)
Des2   (4.0, 7.0, 10)
Des3   (4.0, 7.0, 10)
Des4   (5.0, 6.0, 9)

The 50%-level sets of the fuzzy final grades of Des1 and Des2 are (5.3, 8.3) and (5.5, 8.5) respectively. The ratio of the overlap and the underlap of the two sets is 0.88, a rather high value which does not yet allow us to drop Des3 and Des4. In general, we take the above ratio of overlap and underlap as a measure for the difference between two triangular fuzzy numbers. It may be equal to one, even when the modal values do not coincide, and it is zero as soon as the area where the two triangles overlap is below the 50% level.
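The 50%-level comparison can be sketched as follows; interpreting the 'underlap' as the length of the interval spanned by the two level sets is an assumption, but it reproduces the ratio 0.88 quoted above.

```python
# Sketch of the 50%-level comparison of two triangular fuzzy final grades (Des1 and Des2 above).
def alpha_cut(tri, alpha=0.5):
    lo, mo, up = tri
    return lo + alpha * (mo - lo), up - alpha * (up - mo)

def overlap_underlap_ratio(a, b, alpha=0.5):
    (al, au), (bl, bu) = alpha_cut(a, alpha), alpha_cut(b, alpha)
    overlap = max(0.0, min(au, bu) - max(al, bl))    # length of the intersection
    underlap = max(au, bu) - min(al, bl)             # length of the spanned interval (assumption)
    return overlap / underlap

des1, des2 = (4.0, 6.6, 10.0), (4.0, 7.0, 10.0)
print(alpha_cut(des1), alpha_cut(des2))              # (5.3, 8.3) and (5.5, 8.5)
print(round(overlap_underlap_ratio(des1, des2), 2))  # 0.88
```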

In the crisp version of SMART described in the previous sections and in the present fuzzy extension, the calculations are very simple indeed, and they should be. The leading objective of MADM is to structure a decision problem. First, there is the screening phase where the decision makers, the alternatives, and the criteria are selected. Next, the performance matrix is drawn up. If the information so obtained does not readily bring up a preferred alternative, a full-scale MADM problem arises. By an appropriate choice of the ranges representing the context of the actual decision problem the decision maker can easily assign impact grades to the alternatives under the respective criteria, in order to express how well the alternatives perform within the corresponding ranges. The final grades designate a possibly preferred alternative. For a sensitivity analysis the decision maker may use a fuzzy extension of the method.


7.4 Additive and Multiplicative AHP

The Analytic Hierarchy Process (AHP) of Saaty (1980) is a widely used method for MADM, presumably because it elicits preference information from the decision makers in a manner which they find easy to understand. The basic step is the pairwise comparison of two so-called stimuli, two alternatives under a given criterion, for instance, or two criteria. The decision maker is requested to state whether he/she is indifferent between the two stimuli or whether he/she has a weak, strict, strong, or very strong preference for one of them. The original AHP has been criticized in the literature because the algorithmic steps do not properly take into account that the method is based upon ratio information. The shortcomings can easily be avoided in the Additive and the Multiplicative AHP to be discussed below. The Additive AHP is the SMART procedure with pairwise comparisons on the basis of difference information. The Multiplicative AHP with pairwise comparisons on the basis of ratio information is a variant of the original AHP. There is a logarithmic relationship between the Additive AHP (SMART) and the Multiplicative AHP. Both versions can easily be fuzzified. The reasons why we deviate from the original AHP will be explained at the end of this subsection.

7.4.1 Pairwise Comparisons

First, the assessment of the alternatives under the respective criteria is considered. In the basic pairwise-comparison step of the AHP, two alternatives Aj and Ak are presented to the decision maker, whereafter he/she is requested to judge them under a particular criterion. The underlying assumptions are: (i) under the given criterion the two alternatives have subjective values Vj and Vk for the decision maker, and (ii) the judgemental statement whereby he/she expresses his/her relative preference for Aj with respect to Ak provides an estimate of the ratio Vj/Vk. For reasons of simplicity we immediately illustrate the basic step via the subjective evaluation of vessels, first under the cost criterion and thereafter under the operability criterion. Finally, the subjective evaluation under qualitative criteria is briefly discussed.

Vessels under the cost criterion

We assume again that the decision maker is only prepared to consider alternative vessels with building costs between a lower bound Cmin, the cost to be paid anyway for the vessels which he/she seriously has in mind, and an upper bound Cmax, the cost that he/she cannot or does not really want to exceed. In order to model the relative preference for alternative Aj with respect to Ak we categorize the costs which are in principle acceptable. We first 'cover' the range (Cmin, Cmax) by the grid with the geometric sequence of points

\[
C_\nu = C_{\min} + (C_{\max} - C_{\min}) \times \frac{2^{\nu}}{64}\,, \qquad \nu = 0, 1, \ldots, 6
\]

Just like in SMART we take Cν to stand for the ν-th cost category and the integer-valued parameter ν for its order of magnitude, which is given by

\[
\nu = \log_2\!\left(\frac{C_\nu - C_{\min}}{C_{\max} - C_{\min}} \times 64\right) \qquad (7.28)
\]

The vessels of the category C0 are the cheap ones within the given context. The vessels of the categories C2, C4, and C6 are somewhat more, more, and much more expensive. At the odd-numbered grid points C1, C3, and C5 the decision maker hesitates between two adjacent gradations of expensiveness. Sometimes we also introduce the category C8 of the vastly more expensive vessels which are situated beyond the range, as well as the category C7 if the decision maker hesitates between much more and vastly more expensiveness. The even-numbered grid points are the so-called major grid points designating the major gradations of expensiveness. They constitute a geometric sequence in the range (Cmin, Cmax) with progression factor 4. If we also take into account the odd-numbered grid points corresponding to hesitations, we have a geometric sequence of major and threshold gradations with progression factor 2.

Suppose that the costs of the vessels Aj and Ak belong to the categories represented by Cνj and Cνk respectively. We express the relative preference for Aj with respect to Ak by the inverse ratio of the cost increments above the desired target Cmin, so that it can be written as

\[
O_{jk} = \frac{C_{\nu_k} - C_{\min}}{C_{\nu_j} - C_{\min}} = 2^{\,\nu_k - \nu_j} \qquad (7.29)
\]

A vessel of the category C2 is somewhat more expensive than a vessel in the category C0. In other words, there is a weak preference for the vessel in the category C0: it is 4 times more desirable than a vessel in the category C2. On the basis of such considerations, weak preference is identified with the ratio 4:1. Similarly, definite preference is identified with the ratio 16:1, and strong preference with the ratio 64:1.

If the cost category around Cν is represented by the grade g = 10 − ν, the relative preference for the vessel Aj with respect to Ak can be expressed by the difference of grades

\[
q_{jk} = \log_2 O_{jk} = \log_2\!\left(\frac{C_{\nu_k} - C_{\min}}{C_{\nu_j} - C_{\min}}\right) = \nu_k - \nu_j = g_j - g_k \qquad (7.30)
\]

The major gradations of the decision maker's comparative judgement are now put on a numerical scale in two different ways. We assign scale values either to the relative preferences themselves or to the logarithms of the relative preferences. The assignment is shown in Table 7.6, where the relative preferences are given scale values, in real magnitudes as ratios of subjective values and in logarithmic form as differences of grades. The reader can easily complete the assignment of values to the threshold gradations between the major ones.


Comparative judgement of        Relative preference for          Rel. preference Ojk   Diff. of grades
Aj with respect to Ak           Aj w.r.t. Ak in words            in real magnitudes    qjk = log2 Ojk

Aj much less expensive          strong preference for Aj                 64                   6
Aj less expensive               strict, definite pref. for Aj            16                   4
Aj somewhat less expensive      weak preference for Aj                    4                   2
Aj as expensive as Ak           indifference                              1                   0
Ak somewhat less expensive      weak preference for Ak                   1/4                 -2
Ak less expensive               strict, definite pref. for Ak            1/16                -4
Ak much less expensive          strong preference for Ak                 1/64                -6

Table 7.6. Comparative judgement under the acquisition cost criterion

There are now two different ways to collect the preference information from the decision maker:

• He/she can be asked to consider the axis corresponding to the building cost criterion and to specify the endpoints of the range of acceptable costs. Next, we identify the judgemental categories on the range, the corresponding orders of magnitude, and the corresponding grades. Thereafter, we can immediately express his/her relative preference Ojk for Aj with respect to Ak under the cost criterion by

\[
O_{jk} = 2^{\,q_{jk}} = 2^{\,g_j - g_k} \qquad (7.31)
\]

• If the decision maker is unable or unwilling to specify the endpoints of the range of acceptable costs, we can ask him/her to express his/her comparative judgement directly in words, that is, to state whether he/she is indifferent between the two alternatives under the given criterion, or whether he/she has a weak, a definite, or a strong preference for one of the two. Thereafter, we set the numerical estimate rjk of his/her relative preference for Aj with respect to Ak under the building cost criterion to the appropriate value as shown in Table 7.6.

To illustrate matters, it is supposed that a decision maker considers two alternative vessels Aj and Ak of MU 25,000 and MU 30,000, respectively. The decision maker is not prepared to specify the endpoints of the range of acceptable costs. Nevertheless, it remains necessary to keep a somewhat holistic view on the alternatives A1, ..., An: the two alternatives Aj and Ak cannot reasonably be judged in isolation from the context of the selection problem. Hence, the decision maker first partitions the set of alternatives into three categories: the vessels which are 'good' because their costs are roughly below MU 22,000; the vessels which are 'bad' because their costs are roughly beyond MU 30,000; and the intermediate category with costs between the two thresholds just mentioned. Since the vessels Aj and Ak are both contained in the intermediate category and are not very close, the decision maker first declares Ak to be somewhat more expensive than Aj. We model his/her relative preference for Aj with respect to Ak by setting Ojk = 4. The decision maker also feels that his/her relative preference could be expressed by a difference of grades which is equal to 1, so that we would obtain Ojk = 2. In order to solve the conflict the decision maker reconsiders his/her previous judgemental statements. He/she specifies the interval between MU 20,000 and MU 40,000 as the range of acceptable costs. Here, the two vessels have the respective grades 6 and 5, so that the relative preference for Aj with respect to Ak can now be modelled by setting Ojk = 2.
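The grades 6 and 5 and the resulting relative preference Ojk = 2 of this example follow directly from relation (7.28) and formula (7.31), as the short sketch below shows.

```python
import math

# Sketch of the example above: cost range MU 20,000-40,000, relation (7.28) and formula (7.31).
def cost_grade(cost, c_min=20_000.0, c_max=40_000.0):
    nu = math.log2((cost - c_min) / (c_max - c_min) * 64)   # order of magnitude, relation (7.28)
    return 10 - round(nu)                                   # grade g = 10 - nu

g_j, g_k = cost_grade(25_000), cost_grade(30_000)
print(g_j, g_k)           # 6 and 5
print(2 ** (g_j - g_k))   # relative preference O_jk = 2, formula (7.31)
```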


Vessels under the operability criterion

Let us again suppose that the decision maker only considers vessels with an operability of at least Omin, so that he/she is restricted to the interval (Omin, Omax), with Omax usually set to 100%. We cover the given range by the grid with the geometric sequence of points

\[
O_\nu = O_{\max} - (O_{\max} - O_{\min}) \times \frac{2^{\nu}}{64}\,, \qquad \nu = 0, 1, \ldots, 6
\]

The alternatives are again compared with respect to the desired target. If we take the symbols Oνj and Oνk to denote the operability of the alternative vessels Aj and Ak respectively, then the inverse ratio

\[
O_{jk} = \frac{O_{\max} - O_{\nu_k}}{O_{\max} - O_{\nu_j}} = 2^{\,\nu_k - \nu_j} \qquad (7.32)
\]

of the distances with respect to the desired target Omax represents the relative preference for Aj with respect to Ak under the operability criterion. The qualification 'somewhat more operable' implies that the inverse ratio of the distances to the desired target is 4:1, etc. Representing the operability category around Oν by the grade g = 10 − ν, we can also express the relative preference for the vessel Aj with respect to Ak by the difference of grades

\[
q_{jk} = \log_2 O_{jk} = \log_2\!\left(\frac{O_{\max} - O_{\nu_k}}{O_{\max} - O_{\nu_j}}\right) = \nu_k - \nu_j = g_j - g_k \qquad (7.33)
\]

The assignment of numerical values to the major gradations of comparative judgement is shown in Table 7.7.

Comparative judgement of        Linguistic preference for        Rel. preference Ojk   Diff. of grades
Aj with respect to Ak           Aj with respect to Ak            in real magnitudes    qjk = log2 Ojk

Aj much more operable           strong preference for Aj                 64                   6
Aj more operable                strict, definite pref. for Aj            16                   4
Aj somewhat more operable       weak preference for Aj                    4                   2
Aj as operable as Ak            indifference                              1                   0
Ak somewhat more operable       weak preference for Ak                   1/4                 -2
Ak more operable                strict, definite pref. for Ak            1/16                -4
Ak much more operable           strong preference for Ak                 1/64                -6

Table 7.7. Comparative judgement under the operability criterion

The elicitation of preference information can now again be carried out in two different ways because operability is expressed on a one-dimensional scale. We can ask the decision maker to specify the endpoints of the range of acceptable operabilities, whereafter we calculate the grades to be assigned to the alternative vessels under the operability criterion. This yields the corresponding difference of grades and the relative preference in its real magnitude. If the decision maker is unable or unwilling to specify the requested endpoints, however, we can use his/her comparative judgement directly. Thus, somewhat more operability yields the relative preference 4:1 and the difference of grades 2, more operability yields the relative preference 16:1 and the difference of grades 4, etc.

The above procedure whereby we assign numerical values to the relative preferences themselves or to the logarithms of the relative preferences is similar to the mode of operation in acoustics, where ratios of sound intensities are encoded either in real magnitudes or logarithmically on the decibel scale. The elicitation of preferential information from the decision maker seems to proceed in two different ways. Ratio information is obtained on a scale with geometric progression if the decision maker is asked to formulate his/her relative preferences (Multiplicative AHP), and difference information is obtained on an arithmetic scale if one asks the decision maker to express his/her judgement via a difference of grades (Additive AHP, SMART with pairwise comparisons). For the decision maker these are alternative ways of saying the same thing, however.

7.4.2 Calculation of Impact Grades and Scores

In a method of pairwise comparisons we seem to collect much more information than we need. With n alternatives the decision maker may carry out n(n − 1)/2 basic experiments in order to fill the upper or the lower triangle in the matrix {Ojk} of pairwise comparisons, whereas (n − 1) properly chosen experiments would be sufficient (A1 versus A2, A2 versus A3, etc.). The redundancy is usually beneficial, however, since it enables us to smooth the results of the analysis. The Additive and the Multiplicative AHP can easily analyze incomplete pairwise comparisons, which occur when the decision maker does not carry out the maximum number of basic experiments. In addition, they can easily be used in groups of decision makers who individually do not even carry out the minimum number of experiments.

Incomplete pairwise comparisons in a group of decision makers

Let us first consider a group of decision makers who are requested to assess the alternatives Aj and Ak under a particular criterion. We shall be assuming that these alternatives have the same subjective values Vj and Vk for all decision makers. Moreover, the decision makers are supposed to estimate the ratio Vj/Vk via their judgemental statements. These are strong assumptions, but they are not unreasonable since many decisions are made within an organizational framework where the members have common values.

The verbal comparative judgement given by decision maker d is converted into the numerical value rjkd according to the rules of the pairwise comparisons, so that

qjkd = log2 Ojkd

Next, the vector V of subjective values is approximated via logarithmic regression. Introducing the set Djk to denote the set of decision makers who actually expressed their opinion about the two alternatives under consideration, the vector V of subjective values is approximated via the unconstrained minimization of the sum of squares

\[
\sum_{j<k} \sum_{d \in D_{jk}} \left(\log_2 O_{jkd} - \log_2 v_j + \log_2 v_k\right)^2 \qquad (7.34)
\]

Introducing the new variables wj = log2 vj, expression (7.34) can be rewritten as

\[
\sum_{j<k} \sum_{d \in D_{jk}} \left(q_{jkd} - w_j + w_k\right)^2 \qquad (7.35)
\]

So, it does not matter which type of information we collect from the decision makers. The sum of squares (7.35) is minimized regardless of whether one has ratio or difference information. Since (7.35) is a convex quadratic function, we can easily find an optimal solution by solving the associated set of normal equations, which is obtained by setting the first-order derivatives of (7.35) to zero. Using the property

qjkd = −qkjd   for any j and k

one finds the associated set of normal equations from

\[
\sum_{k=1,\,k \neq j}^{n} \sum_{d \in D_{jk}} \left(q_{jkd} - w_j + w_k\right) = 0\,, \qquad j = 1, \ldots, n
\]

so that the normal equations themselves take the form

\[
w_j \sum_{k=1,\,k \neq j}^{n} N_{jk} \;-\; \sum_{k=1,\,k \neq j}^{n} N_{jk}\, w_k \;=\; \sum_{k=1,\,k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkd}\,, \qquad j = 1, \ldots, n \qquad (7.36)
\]

where Njk denotes the cardinality of the set Djk. The normal equations are dependent: they sum up to the zero equation (see the example below). There is at least one additive degree of freedom in the unconstrained minima of the function (7.35) because there are only differences of variables in the sum of squares. Hence, there is at least one multiplicative degree of freedom in the unconstrained minima of the function (7.34). In other words, we can only draw conclusions from differences wj − wk and from ratios vj/vk. Note that the decision makers have to judge more pairs if there are two or more degrees of freedom.
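In practice the minimization of (7.35) can be carried out with an ordinary least-squares routine, as in the sketch below; the judgements are hypothetical, and the additive degree of freedom is removed by pinning the first variable to zero.

```python
import numpy as np

# Least-squares estimation of w_j from incomplete pairwise comparisons, expression (7.35).
# Each triple (j, k, q) is a (hypothetical) judgement 'g_j - g_k is about q'.
judgements = [(0, 1, +2), (0, 1, +4), (1, 2, +2), (0, 2, +6), (0, 2, +4)]
n = 3

rows, rhs = [], []
for j, k, q in judgements:
    row = np.zeros(n)
    row[j], row[k] = 1.0, -1.0      # residual q_jkd - w_j + w_k
    rows.append(row)
    rhs.append(float(q))

rows.append(np.eye(n)[0])           # pin the first variable to zero (additive degree of freedom)
rhs.append(0.0)

w, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
print(w)                            # only differences w_j - w_k are meaningful
```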

Let us illustrate the foregoing results with the pairwise comparisons of the three alternative Ro-Ro designs Des001, Des002, and Des003 under the criterion of comfort. There are four decision makers who do not necessarily carry out all possible pairwise comparisons. Table 7.8 shows their verbal judgemental statements in logarithmic form: equally comfortable 0, somewhat more comfortable ±2, more comfortable ±4, much more comfortable ±6. Obviously, the cells of the pairwise-comparison tableau are not completely filled with four entries each, and there is no information in the cells on the main diagonal. The solution to the associated normal equations yields SMART impact grades between 4 and 10 with an additive degree of freedom and AHP impact scores summing up to 1 with a multiplicative degree of freedom.


Ship      Ro-Ro        Ro-Ro     Ro-Ro        Solution of   SMART    AHP
Design    Des001       Des002    Des003       normal eq.    grades   scores

Des001    empty        -1, -3    -2, +2, -1   0             6.0      0.203
Des002    +1, +3       empty     -3           1.18182       7.2      0.461
Des003    +2, -2, +1   +3        empty        0.72727       6.7      0.336

Table 7.8. Pairwise-comparison tableau under the criterion of comfort in logarithmic form

The normal equations corresponding to the pairwise comparisons in Table 7.8 can be written in the explicit form

\[
\begin{aligned}
 5w_1 - 2w_2 - 3w_3 &= -5\\
-2w_1 + 3w_2 -  w_3 &= +1\\
-3w_1 -  w_2 + 4w_3 &= +4
\end{aligned}
\]

The first equation originates from the first row in the pairwise-comparison tableau; the coefficient 5 on the main diagonal stands for the total number of entries, the coefficients 2 and 3 for the number of elements in the second and the third cell, whereas the right-hand side element −5 represents the sum of the entries. The remaining equations are built up in a similar way. Since the equations are dependent (they sum to the zero equation) we drop one of them, and since the solutions have an additive degree of freedom we can arbitrarily choose one of the variables. Thus, setting w1 = 0 and dropping the first equation we obtain the solution exhibited in Table 7.8. The additive degree of freedom, designated by the symbol θ, is used to shift the solution so that SMART impact grades are obtained

\[
g_j = w_j + \theta\,, \qquad j = 1, \ldots, n \qquad (7.37)
\]

which are nicely but somewhat arbitrarily situated between 4 and 10. Next, we compute

\[
v_j = 2^{w_j}\,, \qquad j = 1, \ldots, n \qquad (7.38)
\]

and these values are normalized in order to obtain AHP impact scores aj by setting

\[
a_j = \beta\, v_j\,, \qquad j = 1, \ldots, n \qquad (7.39)
\]

where β stands for the normalization factor which guarantees that the impact scores sum up to 1. Hence

\[
a_j = \frac{2^{w_j}}{\sum_{j=1}^{n} 2^{w_j}} \qquad (7.40)
\]

Note that both the shift and the normalization are cosmetic operations carried out in order to present the results in a more or less attractive way. It will be clear from Table 7.8 that the AHP impact scores (Multiplicative AHP) suggest more distinction between the alternatives than the SMART impact grades (Additive AHP). They are connected by the logarithmic relationship

\[
\frac{a_j}{a_k} = 2^{\,g_j - g_k} \qquad (7.41)
\]

This relationship depends neither on the choice of the shift constant θ nor on the choice of the normalization factor β. Throughout this subsection we will see that the really interesting, uniquely determined information consists of ratios of scores and differences of grades.

Complete pairwise comparisons by a single decision maker

The above general result can be simplified in special cases. Let us consider one single decision maker who expressed his opinion about all possible pairs of alternatives under the given criterion, so that

Njk = 1   for all j ≠ k

The normal equations (7.36) take the simple form

\[
(n-1)\, w_j - \sum_{k=1,\,k \neq j}^{n} w_k = \sum_{k=1,\,k \neq j}^{n} q_{jk}
\]

or, equivalently,

\[
n\, w_j - \sum_{k=1}^{n} w_k = \sum_{k=1}^{n} q_{jk}
\]

if we take qjj = 0 for all j. The additive degree of freedom is used in the solutions of this set of equations to set the sum of the variables to zero, which yields

\[
w_j = \frac{1}{n} \sum_{k=1}^{n} q_{jk} \qquad (7.42)
\]

This means that wj is the arithmetic mean of the j-th row in the matrix of pairwise comparisons in logarithmic form, at least under the tacit assumption that the elements on the main diagonal are set to 0, so that we have indeed n elements in each row. By a proper shift of the wj we can generate impact grades gj which are situated between 4 and 10.

The AHP impact scores can also be computed directly when the pairwise comparisons are recorded as ratio estimates Ojk. By equations (7.38) and (7.42) it must be true that

\[
v_j = \sqrt[n]{\prod_{k=1}^{n} r_{jk}} \qquad (7.43)
\]

at least under the assumption that the main diagonal elements are available. We set rjj = 1 for any j. So, vj is the geometric mean of the j-th row in the matrix of pairwise comparisons in real magnitudes. The AHP impact scores aj are obtained by normalization, again a cosmetic operation to make sure that the scores add up to 1.


An illustrative example is given by the complete set of pairwise comparisons of three vessels under the criterion of operability. There is one single decision maker who judges all possible pairs, and the logarithms of his/her verbal judgement are shown in Table 7.9. The solution to the associated normal equations yields SMART impact grades between 4 and 10 with an additive degree of freedom and AHP impact scores summing up to 1 with a multiplicative degree of freedom.

Ship      Ro-Ro    Ro-Ro    Ro-Ro    Arithmetic   SMART    AHP
Design    Des001   Des002   Des003   row means    grades   scores

Des001     0       -3       -1       -1.33333     4.7      0.091
Des002    +3        0       +2        1.66667     7.7      0.727
Des003    +1       -2        0       -0.33333     5.7      0.182

Table 7.9. Pairwise-comparison tableau under the criterion of operability in logarithmic form
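The calculations behind Table 7.9 can be reproduced in a few lines; the shift constant below is arbitrary (it is only chosen so that the grades coincide with those of the table), and the scores follow from (7.43) with the subsequent normalization.

```python
import numpy as np

# Reproduction of Table 7.9: complete pairwise comparisons by a single decision maker,
# in logarithmic form (rows and columns: Des001, Des002, Des003).
Q = np.array([[ 0, -3, -1],
              [+3,  0, +2],
              [+1, -2,  0]], dtype=float)

w = Q.mean(axis=1)          # arithmetic row means, formula (7.42)
grades = w + 6.0333         # arbitrary shift into the 4..10 band (chosen to match the table)
v = 2.0 ** w                # geometric row means of the ratios, formula (7.43)
scores = v / v.sum()        # normalized AHP impact scores

print(np.round(w, 5))       # [-1.33333  1.66667 -0.33333]
print(np.round(grades, 1))  # [4.7 7.7 5.7]
print(np.round(scores, 3))  # [0.091 0.727 0.182]
```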

Complete pairwise comparisons in a group of decision makers

When all decision makers in a group of size G assess all possible pairs of alternatives under the given criterion, we can simplify formula (7.36) because Njk = G for all j ≠ k. The normal equations are now given by

\[
(n-1)\,G\, w_j - G \sum_{k=1,\,k \neq j}^{n} w_k = \sum_{k=1,\,k \neq j}^{n} \sum_{d=1}^{G} q_{jkd}\,, \qquad j = 1, \ldots, n
\]

so that they can be rewritten as

\[
nG\, w_j - G \sum_{k=1}^{n} w_k = \sum_{k=1}^{n} \sum_{d=1}^{G} q_{jkd}\,, \qquad j = 1, \ldots, n
\]

if we take qjjd = 0 for all j and d. We again use the additive degree of freedom in the solutions to set the sum of the variables to zero, whence

\[
w_j = \frac{1}{nG} \sum_{k=1}^{n} \sum_{d=1}^{G} q_{jkd} \qquad (7.44)
\]

The wj can equivalently be calculated in two different ways:

• First, all entries in a cell are replaced by their arithmetic mean, so that we have a group opinion about each pair of alternatives. Thereafter we calculate the arithmetic row means of the matrix of group opinions. It is tacitly assumed that there are zeroes in the cells on the main diagonal.

• First, the arithmetic row means are calculated in the pairwise-comparison matrices of the individual group members separately, with zeroes on the main diagonals, so that one obtains the impact grades assigned to the alternatives by each group member. Thereafter we calculate the arithmetic means of the impact grades.


The AHP impact scores under the given criterion can be found in a similar way. On the basis of equation (7.44) a solution of the logarithmic regression problem (7.34) is given by

\[
v_j = \sqrt[nG]{\prod_{k=1}^{n} \prod_{d=1}^{G} O_{jkd}} \qquad (7.45)
\]

a formula which shows that the vj can be obtained by geometric-mean calculations. The results do not depend on the order of the operations. We can first calculate the group opinions about each pair of alternatives and the scores thereafter, and vice versa, under the assumption that there are ones on the main diagonals of all matrices.

The SMART impact grades gj and the AHP impact scores aj can finally be obtained by a proper shift of the wj and a proper normalization of the vj respectively.

Power games in groups of decision makers

The results of this subsection have been generalized by Barzilai and Lootsma (1997) so that they can be used in groups of decision makers who have widely varying power positions. The crucial step is the assignment of power coefficients pd to the respective group members. These coefficients, normalized so that they sum up to 1, stand for the relative power of the decision makers. Impact grades and scores of the alternatives are obtained by the unconstrained minimization of

\[
\sum_{j<k} \sum_{d \in D_{jk}} \left(q_{jkd} - w_j + w_k\right)^2 p_d \qquad (7.46)
\]

clearly a generalization of expression (7.35) in the sense that each term in the sum of squares is weighted with the relative power of the decision maker who expressed the corresponding comparative judgement. When the pairwise comparisons are complete, a solution is given by

\[
w_j = \frac{1}{n} \sum_{k=1}^{n} \sum_{d=1}^{G} q_{jkd}\, p_d \qquad (7.47)
\]

7.4.3 Criterion Weights and Aggregation

Let us first introduce some notation. There are m criteria C1, ..., Cm. Suppose that we have obtained the SMART impact grades gij and the AHP impact scores aij of the respective alternatives Aj under criterion Ci via the solution (wi1, ..., win) of a set of normal equations. Thus

\[
g_{ij} = w_{ij} + \theta_i\,, \qquad a_{ij} = \beta_i\, 2^{w_{ij}}\,, \qquad j = 1, \ldots, n
\]

where θi and βi stand for the shift constant and the normalization factor under the i-th criterion. Differences of impact grades and ratios of impact scores are connected by the logarithmic relationship

\[
\frac{a_{ij}}{a_{ik}} = 2^{\,g_{ij} - g_{ik}} \qquad (7.48)
\]


We also assume that there are criterion weights ci expressing the relative importance of the respective criteria. They may have been obtained via the direct-rating procedure of subsection 7.3.4, but they may also be generated via the method of pairwise comparisons to be described in the present subsection.

Aggregation via arithmetic and geometric means

On the basis of equations (7.39) and (7.48) one can easily derive

\[
\prod_{i=1}^{m} \left(\frac{a_{ij}}{a_{ik}}\right)^{c_i} = 2^{\,\Delta_{jk}} \qquad (7.49)
\]

where

\[
\Delta_{jk} = s_j - s_k = \sum_{i=1}^{m} c_i\, g_{ij} - \sum_{i=1}^{m} c_i\, g_{ik} \qquad (7.50)
\]

The symbols sj and sk, clearly obtained via a so-called arithmetic-mean aggregation rule, stand for the final SMART grades. They are not unique. There is an additive degree of freedom θi under each criterion as well as a general degree of freedom η, so that the final grade sj is generally given by

\[
s_j = \eta + \sum_{i=1}^{m} c_i\,(w_{ij} + \theta_i) \qquad (7.51)
\]

with arbitrary shift constants η, θ1, ..., θm. The formulas (7.49) and (7.50) suggest that the difference of the final grades is the logarithm of a ratio of final scores according to the Multiplicative AHP. We therefore take

\[
t_j = \alpha \prod_{i=1}^{m} a_{ij}^{\,c_i} = \alpha \prod_{i=1}^{m} (\beta_i\, v_{ij})^{c_i} \qquad (7.52)
\]

to represent the final AHP score of Aj. The multiplicative factor α is used for cosmetic purposes only, to make sure that the final scores sum up to 1. The βi stand for arbitrary normalization factors under the respective criteria. Obviously, the final AHP scores are calculated via a geometric-mean aggregation rule and, moreover,

\[
\frac{t_j}{t_k} = 2^{\,s_j - s_k} \qquad (7.53)
\]

regardless of the choice of the shift constants and the normalization factors. Hence, even the final SMART grades and the final AHP scores satisfy the logarithmic relationship that we found for the impact grades and the impact scores of the alternatives under each of the criteria separately (see equations (7.41) and (7.48)).
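A toy check of relation (7.53) is sketched below, with hypothetical weights and impact grades; the impact scores are taken as 2^gij up to normalization, in line with (7.48).

```python
import numpy as np

# Toy check of the logarithmic relationship (7.53) between final grades and final scores.
c = np.array([0.4, 0.35, 0.25])          # normalized criterion weights (hypothetical)
g = np.array([[8, 6, 7],                 # impact grades g_ij: rows = criteria, columns = alternatives
              [6, 9, 7],
              [7, 7, 8]], dtype=float)

s = c @ g                                # final SMART grades, arithmetic-mean rule (7.50)
t = np.prod((2.0 ** g) ** c[:, None], axis=0)
t = t / t.sum()                          # final AHP scores, geometric-mean rule (7.52)

print(np.allclose(t[0] / t[1], 2 ** (s[0] - s[1])))   # True, relation (7.53)
```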

Aggregation of complete pairwise comparisons

When the pairwise comparisons are complete, in the sense that each decision maker judged every pair of alternatives under each criterion, we can simplify the calculations considerably. The symbol qijkd is taken to stand for the difference of grades assigned to the alternatives Aj and Ak under criterion Ci by decision maker d, and rijkd for the corresponding ratio of subjective values. It follows easily from (7.44) and (7.45) that the final SMART grades and the final AHP scores, if one ignores the shift constants and the normalization factors, can be written as

\[
s_j = \frac{1}{nG} \sum_{i=1}^{m} \sum_{k=1}^{n} \sum_{d=1}^{G} c_i\, q_{ijkd} \qquad (7.54)
\]
\[
t_j = \sqrt[nG]{\prod_{i=1}^{m} \prod_{k=1}^{n} \prod_{d=1}^{G} O_{ijkd}^{\,c_i}} \qquad (7.55)
\]

Computationally, this implies that we can operate in any order without affecting the final results of the analysis. One can aggregate, first, over the decision makers so that we obtain the group opinion about every pair of alternatives under each criterion, thereafter over the criteria so that we obtain an aggregate pairwise-comparison matrix, and finally compute the row means of the aggregate pairwise-comparison matrix to obtain the final grades and scores. We can also change the order of the operations, however, in order to check the correctness of the computational results.

Interpretation of a ratio of criterion weights

Shift constants and normalization factors do not affect ratios of criterion weights. We show this for the normalization factors only. In doing so we also find an interpretation for ratios of criterion weights in terms of substitution rates, as one might expect since these ratios are traditionally linked with the trade-offs between gains and losses during a move along indifference curves. On the basis of formula (7.52) we can generally write the final AHP score of an alternative A as a function of the approximations to the subjective values of A under the respective criteria. Thus, the final AHP score is given by

\[
t(A) = t(v_1, \ldots, v_m) = \alpha \prod_{i=1}^{m} (\beta_i\, v_i)^{c_i} \qquad (7.56)
\]

The arbitrary multiplicative factors α, β1, ..., βm appear in formula (7.56) for normalization purposes, but one can equivalently say that they appear because the units of performance measurement were not specified. The first-order partial derivatives of t take the form

\[
\frac{\partial t}{\partial v_i} = \frac{c_i}{v_i} \times t
\]

whence

\[
\frac{1}{v_{i_1}}\frac{\partial t}{\partial v_{i_2}} \bigg/ \frac{1}{v_{i_2}}\frac{\partial t}{\partial v_{i_1}} = \frac{c_{i_2}}{c_{i_1}} \qquad (7.57)
\]

for arbitrary i1 and i2, and regardless of the factors α, β1, ..., βm. We can now study the behavior of t as a function of vi1 and vi2 along a contour or indifference curve. In a first-order approximation, when we make just a small step, such a move proceeds in a direction which is orthogonal to the gradient of t, that is, in the direction

\[
\left(\frac{\partial t}{\partial v_{i_2}},\; -\frac{\partial t}{\partial v_{i_1}}\right)
\]


This is shown in Figure 7.21. The reader has to remember that in a first-order approximation, a move along an indifference curve proceeds in a direction which is orthogonal to the gradient in the point where the move starts. The marginal substitution rate is the ratio of the components of that direction. The relative substitution rate is the ratio of the components divided by the coordinates of the point of departure.

Figure 7.21. Indifference curve of a final AHP score function

Traditionally (Keeney and Raiffa, 1976) the ratio

\[
\frac{\partial t}{\partial v_{i_2}} \bigg/ \frac{\partial t}{\partial v_{i_1}}
\]

has been defined as the marginal trade-off or the marginal substitution rate between the two criteria under consideration. Under the geometric-mean aggregation rule, with the function t defined by formula (7.56), this ratio is not constant along an indifference curve. Lootsma and Schuyt (1997) therefore introduced the relative substitution rate, which is based upon the observation that human beings generally perceive relative gains and losses, that is, gains and losses in relation to the levels from which the move starts. Thus, when a small step is made along an indifference curve, the relative gain (or loss) in the vi1 direction and the corresponding relative loss (or gain) in the vi2 direction are proportional to

\[
\frac{1}{v_{i_1}}\frac{\partial t}{\partial v_{i_2}} \qquad \text{and} \qquad \frac{1}{v_{i_2}}\frac{\partial t}{\partial v_{i_1}}
\]

respectively. Under the geometric-mean aggregation rule (7.56) the substitution rate between relative gains and losses is a constant, not only along an indifference curve, but over the entire (vi1, vi2) space. It depends neither on the units of measurement nor on the values of the remaining variables vi, i ≠ i1, i2. Thus, we can meaningfully use the concept of the relative importance of the criteria, even in the absence of immediate context, when the alternatives are not yet available, for instance.

Pairwise comparison of the criteria

The pairwise comparison of two criteria proceeds almost in the same way as the pairwise comparison of two alternatives under a particular criterion. It is also closely related to the procedure described in subsection 7.3.4. In the basic experiment a pair (Cf, Cs) of criteria is presented to the decision maker, whereafter he/she is requested to state whether they are equally important for him/her or whether one of the two is somewhat more, more, or much more important than the other. By the information so obtained we estimate the relative importance of the first criterion Cf with respect to the second criterion Cs, that is, the ratio of the associated criterion weights. By the arguments just mentioned, the geometric-mean aggregation rule enables us to avoid the pitfalls mentioned in subsection 7.3.4.

The assignment of numerical values to the gradations of relative importance is shown in Table 7.10, where the relative importance of the criteria is expressed in the form of a difference of grades. The scale 1, 2, 4, 8, 16 derived in subsection 7.3.4 is used, a scale with progression factor √2 if one considers the major as well as the threshold gradations of relative importance. For reasons of simplicity the difference of grades qfsd is usually recorded to represent the comparative judgement of decision maker d about the pair (Cf, Cs).

Comparative judgement of Cf              Difference of
with respect to Cs                       grades qfsd

Cf vastly more important than Cs               8
Cf much more important than Cs                 6
Cf more important than Cs                      4
Cf somewhat more important than Cs             2
Cf as important as Cs                          0
Cf somewhat less important than Cs            -2
Cf less important than Cs                     -4
Cf much less important than Cs                -6
Cf vastly less important than Cs              -8

Table 7.10. Comparative judgement of the criteria

The criterion weights are also computed, just like the impact scores of the alternatives, via the solution of a regression problem, i.e. the unconstrained minimization of the sum of squares

\[
\sum_{f<s} \sum_{d \in D_{fs}} \left(q_{fsd} - w_f + w_s\right)^2 \qquad (7.58)
\]

The associated normal equations do not have a unique solution. There is an additive degree of freedom which can be used to obtain the normalized criterion weights

\[
c_i = \frac{(\sqrt{2})^{\,w_i}}{\sum_{i=1}^{m} (\sqrt{2})^{\,w_i}} \qquad (7.59)
\]

In fact, the calculation of the impact scores of the alternatives and the calculation of the criterion weights differ with respect to the progression factor only. It is 2 for the alternatives (formula (7.40)) and √2 for the criteria (formula (7.59)), because human beings categorize long ranges on the dimensions of time, sound, and light, but evidently a short range on the dimension of the importance of the criteria.


7.4.4 Fuzzy Extension

A fuzzy extension of MADM methods is desirable because the problem data and the judgemental statements of the decision makers are usually imprecise. This has been explained in subsection 7.3.5, so that it is possible to proceed immediately to the design of a fuzzy extension of the Additive and the Multiplicative AHP. The method starts from the subjective evaluation of n alternatives under an unspecified criterion, typically the point of departure of subsection 7.3.2. It is supposed that both the estimated ratios Ojkd and the associated differences of grades qjkd can be modelled as fuzzy numbers with triangular membership functions. Thus, in what follows the estimated ratio

\[
r_{jkd} = (O_{jkdl},\; O_{jkdm},\; O_{jkdu})
\]

is used, as well as the difference of grades

\[
q_{jkd} = (q_{jkdl},\; q_{jkdm},\; q_{jkdu}) = (\log_2 O_{jkdl},\; \log_2 O_{jkdm},\; \log_2 O_{jkdu})
\]

whereby the decision maker d expresses his/her comparative judgement of Vj/Vk, the ratio of the subjective values of the alternatives Aj and Ak. The crucial algorithmic step in the Additive and the Multiplicative AHP is the solution of the system of normal equations (7.36), and this system can easily be fuzzified. The right-hand side now has fuzzy elements, but the coefficient matrix remains crisp. The variables are also fuzzy, and it is assumed that they can be written as fuzzy numbers

wj = (wjl, wjm, wju)

with triangular membership functions. Hence, one now has to deal with the fuzzy normal equations

\[
w_j \sum_{k=1,\,k \neq j}^{n} N_{jk} - \sum_{k=1,\,k \neq j}^{n} N_{jk}\, w_k = \sum_{k=1,\,k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkd}\,, \qquad j = 1, \ldots, n
\]

It follows easily that the modal values wjm have to satisfy the equations

\[
w_{jm} \sum_{k=1,\,k \neq j}^{n} N_{jk} - \sum_{k=1,\,k \neq j}^{n} N_{jk}\, w_{km} = \sum_{k=1,\,k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkdm}\,, \qquad j = 1, \ldots, n \qquad (7.60)
\]

These are precisely the normal equations (7.36). The lower values wjl and the upper values wju, however, cannot be solved separately; they jointly have to satisfy the equations

\[
w_{jl} \sum_{k=1,\,k \neq j}^{n} N_{jk} - \sum_{k=1,\,k \neq j}^{n} N_{jk}\, w_{ku} = \sum_{k=1,\,k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkdl}\,, \qquad j = 1, \ldots, n \qquad (7.61)
\]
\[
w_{ju} \sum_{k=1,\,k \neq j}^{n} N_{jk} - \sum_{k=1,\,k \neq j}^{n} N_{jk}\, w_{kl} = \sum_{k=1,\,k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkdu}\,, \qquad j = 1, \ldots, n \qquad (7.62)
\]

The equations in the system (7.60) sum up to the zero equation, so that they have an additive degree of freedom. Because qjkdl = −qkjdu and Njk = Nkj, it can be shown that the equations (7.61) and (7.62) also sum to the zero equation.


In general, the right–hand side elements of equations (7.61) and (7.62) sum up to zero because

\sum_{j=1}^{n} \sum_{k=1, k \neq j}^{n} \sum_{d \in D_{jk}} q_{jkdl} \;-\; \sum_{j=1}^{n} \sum_{k=1, k \neq j}^{n} \sum_{d \in D_{jk}} q_{kjdl} \;=\; 0

This result is not satisfactory, however. The systems (7.60), (7.61) and (7.62) have at least one degree of freedom each, and these degrees are independent so that one cannot identify a fuzzy solution unless some additional assumptions are introduced. For a fuzzy version the decision makers would have to supply much more information (the lower and the upper values of the differences of grades) than for a crisp version (the modal values only), but one has the impression that they are mostly unable or unwilling to provide the additional data. It is therefore necessary to simplify the procedure. In order to get a rough idea of how the imprecision of the decision makers affects the final grades and scores of the alternatives, one has to ask them to specify a uniform right–hand and left–hand spread σ which they almost never exceed in the actual decision problem. Now it can be written

qjkdu − qjkdm = qjkdm − qjkdl = σ

Thus, the triangular membership functions of the differences of grades are isosceles and they have the same basis length. If it is assumed that the variables have a similar form, albeit with spreads which still have to be determined, one can write the following equation

wju − wjm = wjm − wjl = τj

Subtracting the equations (7.61) from (7.62) we find that the spreads τj satisfy the system

τ_j \sum_{k=1, k \neq j}^{n} N_{jk} \;+\; \sum_{k=1, k \neq j}^{n} N_{jk}\, τ_k \;=\; \sum_{k=1, k \neq j}^{n} \sum_{d \in D_{jk}} σ \;=\; σ \sum_{k=1, k \neq j}^{n} N_{jk} \,, \qquad j = 1, \ldots, n        (7.63)

If it is moreover assumed that the variables have equal spreads τ (a reasonable assumption because the differences of grades have equal spreads as well), the system (7.63) yields the surprisingly simple result τ = σ/2. This enables one to carry out a sensitivity analysis of the final grades and scores in a manner which parallels the procedure for SMART (see subsection 7.4.3). The fuzzy impact grades can be written as

g_{ij} = [\, g_{ijm} - σ/2,\; g_{ijm},\; g_{ijm} + σ/2 \,]

and the fuzzy criterion weights have the form

c_i = c_{im}\, \bigl[\, (\sqrt{2})^{-σ/2},\; 1,\; (\sqrt{2})^{σ/2} \,\bigr]

where cim is the normalized weight of criterion Ci in the crisp case. The fuzzy final grades of the alternatives can now be written as

s_j = \sum_i c_i\, g_{ij} = \sum_i c_{im} \times \bigl( (\sqrt{2})^{-σ/2},\, 1,\, (\sqrt{2})^{σ/2} \bigr) \times \bigl[\, g_{ijm} - σ/2,\; g_{ijm},\; g_{ijm} + σ/2 \,\bigr]


Usually, all grades must be between 4 and 10, and this implies that the modal, the lower, and the upper values of the fuzzy final grades are approximately given by

s_{jm} = \sum_i c_{im}\, g_{ijm}                                                            (7.64)

s_{jl} = \max\bigl[\, 4,\; (\sqrt{2})^{-σ/2}\, (s_{jm} - σ/2) \,\bigr]                      (7.65)

s_{ju} = \min\bigl[\, 10,\; (\sqrt{2})^{σ/2}\, (s_{jm} + σ/2) \,\bigr]                      (7.66)
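The following minimal Python sketch applies formulas (7.64)–(7.66) to hypothetical crisp grades, weights, and spread σ; all numerical values are assumptions chosen only to show the mechanics of the sensitivity analysis.

```python
import numpy as np

# Hypothetical crisp data: modal impact grades g_ijm (criteria in rows, alternatives in
# columns) on the 4..10 scale, normalized criterion weights c_im, and a uniform spread sigma.
g_m = np.array([[7.0, 8.5, 6.0],
                [9.0, 6.5, 7.5]])
c_m = np.array([0.6, 0.4])
sigma = 1.0

s_m = c_m @ g_m                                                           # formula (7.64)
s_l = np.maximum(4.0, np.sqrt(2.0) ** (-sigma / 2) * (s_m - sigma / 2))   # formula (7.65)
s_u = np.minimum(10.0, np.sqrt(2.0) ** (sigma / 2) * (s_m + sigma / 2))   # formula (7.66)

for j, (lo, mo, up) in enumerate(zip(s_l, s_m, s_u), start=1):
    print(f"A{j}: ({lo:.2f}, {mo:.2f}, {up:.2f})")
```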

7.4.5 Original AHP

The original AHP of Saaty (1977, 1980) has been criticized for various reasons:

• for the so–called fundamental scale to quantify verbal comparative judgement;
• because it estimates the impact scores of the alternatives by the Perron–Frobenius eigenvector of the pairwise–comparison matrix;
• because it calculates the final scores of the alternatives via the arithmetic–mean aggregation rule.

The original AHP is based upon ratio information so that the proposed algorithmic operations are inappropriate. The critical comments and the supporting evidence are briefly summarized below.

The fundamental scale

For a decision maker there is no difference between the original AHP and the Additive or the Multiplicative AHP as far as the input of the judgemental statements is concerned. Two stimuli are presented to him/her, whereafter he/she is invited to answer the same questions regardless of the method to be used. The difference is in the subsequent quantification of the answers. Table 7.11 shows how the major gradations of the decision maker's comparative preferential judgement are encoded in numerical scales.

The scale of the Multiplicative AHP, based upon psycho–physical arguments, is clearly much longer than the fundamental scale of the original AHP. Several numerical studies, however, show that the final scores are almost insensitive to the length of the scale (Lootsma, 1993). The fundamental scale, neither arithmetic nor geometric, introduces fractions which are not necessarily present in the decision maker's mind. Consider three alternatives Aj, Ak, and Al, for instance, and suppose that the decision maker has a weak preference for Aj with respect to Ak (fundamental–scale value 3) and a weak preference for Ak with respect to Al (the same fundamental–scale value 3). He/she might logically have a strict preference for Aj with respect to Al (transitivity of preference, weak × weak = strict), certainly not the very strong or absolute preference which is suggested by the product of the scale values for weak preference (3 × 3 = 9). Frictions to such an extent do not occur under the Additive AHP (2 + 2 = 4, addition of logarithms) or the Multiplicative AHP (4 × 4 = 16).


Comparative preferential judgement        Original AHP          Additive AHP     Multiplicative AHP
of Aj with respect to Ak                  estimated ratio of    difference of    estimated ratio of
                                          subjective values     grades           subjective values

Very strong preference for Aj                   9                    8                 256
Strong preference for Aj                        7                    6                  64
Strict (definite) preference for Aj             5                    4                  16
Weak preference for Aj                          3                    2                   4
Indifference between Aj and Ak                  1                    0                   1
Weak preference for Ak                         1/3                  -2                  1/4
Strict (definite) preference for Ak            1/5                  -4                  1/16
Strong preference for Ak                       1/7                  -6                  1/64
Very strong preference for Ak                  1/9                  -8                  1/256

Table 7.11. Scales of comparisons in the original, the additive and the multiplicative AHP

The Perron-Frobenius eigenvector

It is well–known that Saaty (1977, 1980), considering the positive and reciprocal matrix R with complete pairwise comparisons under a given criterion (one single decision maker), proposed to estimate the impact scores of the alternatives by the components of the eigenvector corresponding to the largest eigenvalue (real and positive by the theorem of Perron and Frobenius). At an early stage, this proposal has been criticized by Johnson et al. (1979), Cogger and Yu (1983), and Takeda et al. (1987). The key issue is the so–called right–left asymmetry: which eigenvector should be used to produce the impact scores of the alternatives under a given criterion, the left or the right eigenvector? Johnson et al. (1979) considered the pairwise–comparison matrix

 1      3     1/3    1/2
1/3     1     1/6     2
 3      6      1     1/2
 2     1/2     2      1

The Perron–Frobenius right eigenvector, positive and normalized in the sense that the components sum up to 1, is given by (0.184, 0.152, 0.436, 0.227), so that it provides the rank order A3 > A4 > A1 > A2. The element Ojk in R tells us how strongly the decision maker prefers alternative Aj over Ak, and the components of the right eigenvector stand for the ‘degree of satisfaction’ with the alternatives. If one rephrases the questions submitted to the decision maker, that is, if one asks how strongly he/she dislikes Aj with respect to Ak, one would logically obtain the transpose of R. The right eigenvector of the transpose of R is the left eigenvector of R itself, and it appears to be given by (0.248, 0.338, 0.105, 0.259). The components represent the ‘degree of dissatisfaction’ with the alternatives. This leads to the rank order A3 > A1 > A4 > A2, and accordingly to an interchange of the positions of A1 and A4. Note that such a rank reversal does not occur when one uses the geometric row means to compute the impact scores. The geometric column means are the inverses of the corresponding geometric row means.
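The right–left asymmetry can be checked numerically along the lines described above. The sketch below (Python with NumPy) computes the Perron–Frobenius right and left eigenvectors and the geometric row means; the matrix R is the reconstruction printed above, so the computed components should be compared with the values quoted from Johnson et al. (1979) rather than taken as authoritative.

```python
import numpy as np

# Reciprocal pairwise-comparison matrix as reconstructed above (Johnson et al., 1979).
R = np.array([[1.0, 3.0, 1/3, 1/2],
              [1/3, 1.0, 1/6, 2.0],
              [3.0, 6.0, 1.0, 1/2],
              [2.0, 1/2, 2.0, 1.0]])

def perron_vector(M):
    """Normalized eigenvector belonging to the largest (Perron-Frobenius) eigenvalue."""
    values, vectors = np.linalg.eig(M)
    v = np.abs(np.real(vectors[:, np.argmax(np.real(values))]))
    return v / v.sum()

right = perron_vector(R)          # 'degree of satisfaction' with the alternatives
left = perron_vector(R.T)         # left eigenvector of R: 'degree of dissatisfaction'
geometric = np.prod(R, axis=1) ** (1.0 / len(R))
geometric = geometric / geometric.sum()   # geometric row means: no right-left asymmetry

print("right eigenvector  :", np.round(right, 3))
print("left eigenvector   :", np.round(left, 3))
print("geometric row means:", np.round(geometric, 3))
```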


Saaty and Vargas (1984) tried to settle the matter and to show that the right eigenvector should always be used because it properly captures relative dominance. The evidence is not convincing, however. The basic step in the argumentation is that the (j,k)th element in the mth power of a pairwise–comparison matrix represents the total intensity of all walks of length m from a node j to a node k. The intensity of an m–walk from j to k is the product of the intensities of the arcs in the walk. In terms of the AHP, the total intensity would be a sum of products of preference ratios.

Aggregation of ratio and difference information

In recent years Barzilai et al. (1994, 1997) analyzed the AHP under the requirement that the results of the aggregation step should not depend on the order of the computations. The final scores should be the same, regardless of whether one combines first the pairwise–comparison matrices into an aggregate matrix, or whether one computes first the impact scores from the separate pairwise–comparison matrices. The original AHP is based upon ratio information so that it has a multiplicative structure. The geometric–mean aggregation rule appears to be the only rule here which satisfies certain consistency axioms (see formula (7.45)). For a variant which is based upon difference information so that it has an additive structure, the arithmetic–mean aggregation rule appears to be the only rule which is compatible with the corresponding consistency axioms (see formula (7.44)).

The original AHP follows an inappropriate sequence of operations, particularly when it is used in group decision making: geometric–mean computations to synthesize the pairwise comparisons expressed by the individual members into group pairwise comparisons, eigenvector calculations to compute the impact scores of the alternatives, and arithmetic–mean calculations to combine the impact scores into final scores. The dependence on the order of the computations is avoided: in the Multiplicative AHP by using a sequence of geometric–mean calculations (formula (7.55)) and in the Additive AHP by using a sequence of arithmetic–mean computations (formula (7.54)).

Numerical experiments

How do the shortcomings of the original AHP emerge? How do the decision makers notice that there is an inappropriate sequence of operations in it? Not by inspection of the final scores, because the AHP is remarkably robust: the final scores are almost insensitive to the choice of the algorithmic operations.

The vivid discussions about the shortcomings of the original AHP were triggered by Belton and Gear (1983), who studied the behavior of the method on an artificial numerical example. They noted that the addition of a copy of an alternative may change the rank order of the final scores in a set of consistently assessed alternatives, even when the criteria and the criterion weights remain the same. This prompted Dyer (1990) to state that the rankings provided by the original AHP are arbitrary. It is easy to verify, however, that the rank reversal disappears as soon as the arithmetic–mean aggregation is replaced by the geometric–mean aggregation. In the Additive and the Multiplicative AHP the addition or the deletion of an alternative, whether it is a copy


of another alternative or not, preserves the rank order of the remaining alternatives. These two variants have an even stronger property: the Additive AHP preserves the difference between any two final grades; the Multiplicative AHP preserves the ratio of any two final scores.

Matters may be clarified by considering here the example of Belton and Gear (1983). Table 7.12 shows the original data. There were initially three alternatives, A1, A2, and A3, and three criteria with equal weights. Under each criterion the pairwise comparisons were consistent in the multiplicative sense that Ojk × Okl = Ojl for any triple of elements in the pairwise–comparison matrix (hence the somewhat unusual entries 8/9 and 9/8). The final scores appear to be (0.45, 0.47, 0.08), so that A2 turns out to be the preferred alternative. When a copy A4 of A2 is added to the set, however, the original AHP yields the final scores (0.37, 0.29, 0.06, 0.29), so that it designates A1 to be the leading alternative.

Criterion   Design     A1      A2      A3      A4

            A1          1      1/9      1      1/9
   C1       A2          9       1       9       1
            A3          1      1/9      1      1/9
            A4          9       1       9       1

            A1          1       9       9       9
   C2       A2         1/9      1       1       1
            A3         1/9      1       1       1
            A4         1/9      1       1       1

            A1          1      8/9      8      8/9
   C3       A2         9/8      1       9       1
            A3         1/8     1/9      1      1/9
            A4         9/8      1       9       1

Table 7.12. Comparison in AHP

Table 7.13 shows that the computations in the Additive and the Multiplicative AHP are much simpler because one can use the aggregate matrix to find the final grades and scores. First, note that the pairwise–comparison matrices are consistent in the additive sense, so that the aggregate matrix is also consistent. One therefore needs the top row only; the remaining rows do not provide any additional information. It follows easily now that the final grades of A1 and A2 have the difference -0.333, the final grades of A1 and A3 have the difference 5, etc. Hence, the alternatives A1, A2, and A3 have the normalized final scores (0.436, 0.550, 0.014). When A4 is added to the set, the normalized final scores are (0.281, 0.355, 0.009, 0.355). No rank reversal! The ratio of any two final scores is preserved.

The aggregate matrix is a powerful instrument for the analysis via the Additive and the Multiplicative AHP, but it does not make sense in the original AHP. With the data of Table 7.12 it would not even be reciprocal, so that the theorem of Perron and Frobenius does not apply.


Criterion   Design     A1       A2       A3       A4

            A1          0       -8        0       -8
   C1       A2          8        0        8        0
            A3          0       -8        0       -8
            A4          8        0        8        0

            A1          0        8        8        8
   C2       A2         -8        0        0        0
            A3         -8        0        0        0
            A4         -8        0        0        0

            A1          0       -1        7       -1
   C3       A2          1        0        8        0
            A3         -7       -8        0       -8
            A4          1        0        8        0

            A1          0     -0.333      5     -0.333
Aggregate   A2        0.333      0      5.333      0
            A3         -5     -5.333      0     -5.333
            A4        0.333      0      5.333      0

Table 7.13. Comparison with rescaled data in Additive AHP
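The computation sketched in the preceding paragraphs can be reproduced from the top row of the aggregate matrix in Table 7.13. The minimal Python sketch below converts the differences of grades into Multiplicative–AHP scores with the progression factor 2 and normalizes them, first over A1–A3 and then over A1–A4, reproducing the final scores quoted above with no rank reversal.

```python
import numpy as np

# Top row of the aggregate matrix in Table 7.13: differences of grades q_1k of A1 versus A1..A4.
q_row = np.array([0.0, -0.333, 5.0, -0.333])

# Final grades relative to A1 (g_k - g_1 = -q_1k) and, in the Multiplicative AHP,
# final scores proportional to 2**grade (progression factor 2 for the alternatives).
scores = 2.0 ** (-q_row)

print("A1..A3:", np.round(scores[:3] / scores[:3].sum(), 3))   # (0.436, 0.550, 0.014)
print("A1..A4:", np.round(scores / scores.sum(), 3))           # (0.281, 0.355, 0.009, 0.355)
# The ratio of any two final scores is the same in both runs: no rank reversal.
```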

The key question is the following one, however. Given the assumption that the criteria and the criterion weights do not change, could one legitimately expect rank reversal by the addition or the deletion of copies of an alternative? We do not think so. Rank reversal cannot logically be expected under such circumstances, so that the example of Belton and Gear (1983) provides the strong warning that the original AHP should be used with considerable caution.

7.5 ELECTRE Systems

The ELECTRE systems are central to the French school in MADM where a complete or incomplete rank order of the alternatives is built up via outranking relations under the individual criteria. In the pairwise comparison step the outranking relation between two alternatives under a given criterion is established by inspection of the difference between the physical or monetary values expressing the performance of the respective alternatives. The key question is to find certain discrimination thresholds to categorize the differences. The indifference, preference, and veto thresholds in ELECTRE III constitute the basis for two fuzzy concepts: the degree of concordance (the degree of agreement or harmony with the statement that the first alternative in the pair is at least as good as the second), and the degree of discordance (the degree of disagreement with the above statement). A so–called distillation procedure will eventually produce a not necessarily complete rank order of the alternatives. The French school is based upon the idea of constructivism which implies that a coherent system of preferences and values is not necessarily present in the decision maker's mind at the beginning of the decision process. It may be constructed,


however, by the decision maker and the analyst together in the course of the process. It will be shown that the elicitation of the discrimination thresholds can be simplified considerably when the performance of the alternatives is expressed in SMART grades.

7.5.1 Discrimination Thresholds

In the series of ELECTRE systems (the acronym stands for ELimination Et Choix Traduisant la REalité) ELECTRE III was the first with fuzzy concepts incorporated in it. Lootsma and Schuyt (1997) extensively used it for a comparative study involving the AHP and SMART as well. For the basic concepts of the French school, the reader can refer to Roy (1985) and Roy and Bouyssou (1993). The establishment of an outranking relation proceeds via a pairwise comparison of two alternatives Aj and Ak under some criterion. It is assumed that the performance of the alternatives is expressed in physical or monetary values φj = φ(Aj) and φk = φ(Ak). Increasing values are supposed to generate an increasing strength of preference. When φj < φk, φj is taken to stand for the reference value and the strength of preference for Ak is categorized with respect to Aj by inspection of certain inequalities. If

φk ≥ φj + νj(φj) (7.67)

where νj(φj) is the so–called veto threshold, then the preference for Ak over Aj is predominant. It cannot be reversed by an excellent performance of Aj under the remaining criteria. In general, the veto threshold is a positive and non–decreasing function of the physical or monetary value of the corresponding alternative. If

φj + pj(φj) ≤ φk ≤ φj + νj(φj) (7.68)

where pj(φj) is the so–called preference threshold, then Ak is strictly preferred over Aj. One can also say that Ak is situated in the strict–preference zone at the right–hand side of Aj. In general, the preference threshold is also a positive and non–decreasing function of the physical or monetary value of the corresponding alternative. If

φj + qj(φj) ≤ φk ≤ φj + pj(φj) (7.69)

where qj(φj) is the so–called indifference threshold, then Ak is weakly preferred over Aj. In other words, Ak is situated in the weak–preference zone at the right–hand side of Aj. The indifference threshold is also positive and non–decreasing. Finally, if

φj ≤ φk ≤ φj + qj(φj) (7.70)

then the decision maker is indifferent between the two alternatives.

When φk ≤ φj the position of the two alternatives is interchanged. Ak is taken as the reference alternative and the strength of preference for Aj is judged on the basis of the veto, preference, and indifference thresholds at the right–hand side of Ak.

Using this information the analyst can construct the degree of concordance or harmony h(Aj,Ak) with the statement that Aj outranks Ak. This is a function of the difference φ(Ak) − φ(Aj).


Figure 7.22 shows that it has the form of a membership function. It illustrates the degree of concordance or harmony with the judgemental statement that the alternative Aj outranks (is at least as good as) Ak. The degree of concordance decreases linearly from the top level as soon as φk has passed the indifference threshold, and it arrives at the bottom level as soon as φk has reached the preference threshold; that is, it decreases linearly from 1 to 0 over the interval between the indifference threshold and the preference threshold.

Figure 7.22. Degree of concordance or harmony

On the basis of similar arguments the analyst can construct the degree of discordance d(Aj,Ak). The function is shown in graphical form in Figure 7.23: it depicts the degree of discordance with the judgemental statement that the alternative Aj outranks (is at least as good as) Ak. The degree of discordance increases linearly from the bottom level as soon as φk has passed the preference threshold, and it arrives at the top level as soon as φk has reached the veto threshold.

Figure 7.23. Degree of discordance

In the formulas (7.67) through (7.70), thresholds are found only on the right–hand side of the reference alternative Aj. If an indifference threshold is now considered on the left–hand side, it should clearly have the property that the decision maker is indifferent between Aj and Ak if

φj − q′j(φj) ≤ φk ≤ φj (7.71)

As soon as the variable φk coincides with the point where the transition from indifference to weak preference occurs, it has to satisfy the equality

φk = φj − q′j(φj) (7.72)

but at such a coincidence it must also be true that

φj = φk + qk(φk) (7.73)

This implies that the indifference thresholds on the right–hand side and the left–hand side are not independent (Roy, 1985). The same conclusion can be drawn, on similar grounds, for the preference and the veto thresholds.


If the indifference thresholds on the right–hand side can be written in the form

qj(φj) = q × φj for any j = 1, . . . ,n

where q is a proportionality factor which does not depend on the alternatives (it is typical for the criterion under consideration only), then relations (7.72) and (7.73) yield

q′_j(φ_j) = \frac{q}{1 + q}\, φ_j    for any j = 1, . . . , n

One does not have to specify the thresholds around φj in relation to φj only. If the indifference threshold on the right–hand side has the more general form

qj(φj) = q × (φj − φmin) (7.74)

where φmin is the lower endpoint of the range of acceptable performance data under the given criterion, then

q′_j(φ_j) = \frac{q}{1 + q}\, (φ_j − φ_{min})                                               (7.75)

Similarly, if the indifference threshold on the right–hand side is proportional to the deviation from the upper endpoint φmax of the range of acceptable performance data so that

qj(φj) = q × (φmax − φj) (7.76)

then one can easily derive

q′_j(φ_j) = \frac{q}{1 − q}\, (φ_{max} − φ_j)                                               (7.77)

In order to choose the indifference thresholds with the above formulas, there is an opportunity to merge the ideas of the AHP, SMART, and ELECTRE. In order to anchor the discrimination thresholds in the model of the decision problem, they are linked to the endpoints of the range of acceptable performance data. Recall that the relative preference was identified with the ratio of the deviations from the non–desired endpoint φmin of the range of acceptable performance data (the maximum speed of vessels with φmin = 14 kn, for instance) or with the inverse ratio of the deviations from the desired target φmax (vessels under the operability criterion with φmax = 100%, for instance). Furthermore, the ratio 4:1 was identified with weak preference and the ratio 16:1 with strict or definite preference. There is a hesitation between indifference and weak preference when the ratio is roughly 2:1. This observation can now fruitfully be employed. Using the formulas (7.74) and (7.75) with q = 1, indifference thresholds are obtained at the points φj − (φj − φmin)/2 and φj + (φj − φmin). The position of these thresholds is shown in Figure 7.24.


Figure 7.24. Indifference thresholds around φj with respect to the non–desired endpoint of the range of acceptable performance data (maximum speed of vessels, for instance)

Similarly, using the formulas (7.76) and (7.77) with q = 1/2, the indifference thresholds are found at φj − (φmax − φj) and φj + (φmax − φj)/2 (see Figure 7.25).
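A minimal sketch of these threshold formulas is given below (Python); the function names and the numerical inputs (a 20 kn vessel on the maximum–speed criterion with φmin = 14 kn, and 80% operability with φmax = 100%) are assumptions introduced only for illustration.

```python
def indifference_zone_from_min(phi, phi_min, q=1.0):
    """Indifference zone around phi when the thresholds are proportional to the deviation
    from the non-desired endpoint phi_min, formulas (7.74) and (7.75)."""
    right = q * (phi - phi_min)
    left = q / (1.0 + q) * (phi - phi_min)
    return phi - left, phi + right

def indifference_zone_from_max(phi, phi_max, q=0.5):
    """Indifference zone around phi when the thresholds are proportional to the deviation
    from the desired target phi_max, formulas (7.76) and (7.77)."""
    right = q * (phi_max - phi)
    left = q / (1.0 - q) * (phi_max - phi)
    return phi - left, phi + right

# Hypothetical illustrations: a 20 kn vessel with phi_min = 14 kn, and 80 % operability
# with phi_max = 100 %.
print(indifference_zone_from_min(20.0, 14.0, q=1.0))    # (17.0, 26.0)
print(indifference_zone_from_max(80.0, 100.0, q=0.5))   # (60.0, 90.0)
```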

The preference thresholds may be found in the same way. Recall that the transition from weak to strict or definite preference occurs when the ratio of the deviations from the desired or the opposite endpoint of the range equals 8:1. In the next subsection the reader will see that the discrimination thresholds can be identified in a much simpler way as soon as the physical or monetary values of the alternatives have been converted into grades on an arithmetic scale.

Figure 7.25. Indifference thresholds around φj with respect to the desired target at the end of the range of acceptable performance data (operability of vessels, for instance)

Under a qualitative criterion the performance of the alternatives cannot be expressed in physical or monetary values, but it can usually be expressed on a numerical scale such as 1, 2, . . . , 7, or on the SMART scale 4, 5, . . . , 10. The subsequent identification of the discrimination thresholds will be explained in detail in the next subsection. The decision maker may also come to the conclusion that the two alternatives are incomparable under the given criterion, so that an outranking relation cannot be established.

The aggregation procedure of ELECTRE III is now briefly sketched. It is much more complicated than the aggregation procedure of the AHP and SMART, so that a complete description is omitted. First some additional notation is needed. The degree of concordance and the degree of discordance of alternative Aj versus Ak under the ith criterion will be denoted by hi(Aj,Ak) and di(Aj,Ak), respectively. There is also an importance factor (sometimes referred to as the voting power) ki associated with the ith criterion. The degree of global concordance or harmony with the statement that Aj outranks Ak is defined as

H(A_j, A_k) = \frac{\sum_{i=1}^{m} k_i\, h_i(A_j, A_k)}{\sum_{i=1}^{m} k_i}


The discordance information is used to weaken the degree of global concordance, but one only employs the degree of discordance under the criteria such that

di(Aj ,Ak) > H(Aj ,Ak)

The degree of credibility that alternative Aj outranks Ak is now given by

H(A_j, A_k) \times \prod_i \frac{1 − d_i(A_j, A_k)}{1 − H(A_j, A_k)}

where the product runs over the criteria just mentioned. If such criteria cannot be found, the degree of credibility is set to the degree of global concordance. Obviously, the degree of credibility equals zero if there is a criterion where the veto threshold at the right–hand side of Aj has been passed by Ak, so that the degree of discordance equals one. The matrix of the degrees of credibility is finally used to rank the alternatives. The rank order may be incomplete, which implies that for some pairs of alternatives there is not enough evidence to establish a preference relation.
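The aggregation step just described can be sketched in a few lines of Python. The function below is only an illustration of the credibility formula as summarized in the text (not a full ELECTRE III implementation, which also requires the distillation procedure); the numerical inputs are hypothetical.

```python
import numpy as np

def credibility(h, d, k):
    """Degree of credibility that Aj outranks Ak, from the per-criterion degrees of
    concordance h_i, degrees of discordance d_i, and importance factors k_i."""
    h, d, k = map(np.asarray, (h, d, k))
    H = float(np.dot(k, h) / k.sum())       # degree of global concordance
    strong = d > H                           # only criteria with d_i > H weaken H
    if not strong.any():
        return H
    return H * float(np.prod((1.0 - d[strong]) / (1.0 - H)))

# Hypothetical illustration with three criteria; the third criterion is strongly discordant.
print(round(credibility(h=[1.0, 0.6, 1.0], d=[0.0, 0.0, 0.95], k=[2.0, 1.0, 1.0]), 3))  # 0.45
```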

ELECTRE is clearly full of fuzzy information but it is difficult to analyze.

7.6 Fuzzy Multiobjective Optimization

Many features of real–life single–objective optimization problems are imprecise. The values of the coefficients are sometimes merely prototypical, the requirement that the constraints must be satisfied may be somewhat relaxed, and the decision makers are not always very satisfied with the value attained by the objective function. Multiobjective optimization introduces a new feature: the degrees of satisfaction with the objective–function values play a major role because they enable the decision makers to control the convergence towards an acceptable compromise solution. Since the objective functions have different weights for the decision maker, we also have to control the computational process via weighted degrees of satisfaction.

Multiobjective optimization has two subfields: (i) the identification of the nondominated solutions, and (ii) the selection of a nondominated solution where the objective–function values are felt to be in a proper balance. The first–named subfield can be studied in the splendid isolation of mathematical research. From the early days in multiobjective optimization it has attracted Yu and Zeleny (1975), who provided a substantial contribution to the characterization of nondominated (efficient, Pareto–optimal) solutions. The second subfield, however, straddles the boundary between mathematics and other disciplines because human subjectivity is an integral part of the selection process. Certain parameters (weights, targets, desired levels) are adjusted on the basis of new preference information, whereafter the computations proceed in a somewhat modified direction. Several methods optimize a so–called scalarizing function, that is, a particular combination of the objective functions which also contains a set of weights to control the computational process. It is not always clear, however, how the weights could be used in order to obtain a rapid ‘convergence’ towards an acceptable compromise solution.


Therefore a fuzzy concept is introduced: the degree of satisfaction with an objective function, a quantity between 0 and 1 expressing the position of the objective–function value between the ideal and the nadir value. In order to arrive at a non–dominated solution where the objective functions are reasonably balanced, a weighted geometric mean of the degrees of satisfaction is maximized. Numerically, this is equivalent to the maximization of a weighted geometric mean of the deviations from the nadir values. The composite function so obtained has a particular property: the relative substitution rate between any two function values along an indifference curve is equal to the ratio of the corresponding weights, regardless of the performance of the alternatives under the remaining objectives and regardless of the units of performance measurement. Some numerical experiments, which are illustrated below, will demonstrate that the minimization of the weighted Chebyshev–norm distance and the maximization of the weighted degrees of satisfaction produce roughly the same nondominated solution, although deviations from the ideal values are minimized in the first–named approach whereas deviations from the nadir values are maximized in the second approach. Thus, there are at least two approaches which seem to process the concept of the relative importance of the objective functions in a usable manner.

7.6.1 Ideal and Nadir Values

The multiobjective optimization problem is concerned with maximizing the concave objective functions

fi(x) , i = 1, . . . ,p

over the set C of points satisfying the constraints

gi(x) ≥ 0 , i = 1, . . . ,m

with concave constraint functions gi defined on the n–dimensional vector space En so that C is a convex subset of En. In addition, it is assumed that C is closed and bounded so that there is a maximum solution for each objective function separately. For ease of exposition it is assumed that each objective function has a unique maximum solution over C. Such a point is accordingly nondominated because a deviation from it will reduce at least one of the objective functions. The maximum solution of fi will be denoted by x^i.

It is customary in multiobjective optimization to consider not only the n–dimensional decision space En of x–vectors but also the p–dimensional objective space Ep of z–vectors which contains the set f(C) of vectors z = f(x), x ∈ C. The symbol f obviously denotes the mapping from En into Ep with the components fi, i = 1, . . . , p. The original problem can now equivalently be restated as the problem of maximizing the components of z subject to the constraint z ∈ f(C). In the linear case, when the problem functions f1, . . . , fp, g1, . . . , gm are linear, the sets C and f(C) are both convex polytopes. When the problem functions are concave and non–linear, however, one cannot in general guarantee that f(C) is convex.

It is useful in multiobjective optimization to calculate the so–called single–objective maximum solutions x^i because the ideal values


z_i^{max} = f_i(x^i) \,, \qquad i = 1, \ldots, p                                            (7.78)

show the decision maker how far one could go with each objective function separately. The decision maker may even decide, before the analysis is continued, to relax certain constraints when some of the ideal values are still rather low, or to introduce new constraints guaranteeing that some objective functions will remain above a certain level, whereafter the ideal values are recalculated. The ideal values are unique, even if the single–objective maximum solutions x^i are not. The ideal vector z^max with the components z_i^{max}, i = 1, . . . , p, is normally outside f(C). Otherwise there would not be a real problem for the decision maker. An indication of the worst possible outcome for the respective objective functions is given by the so–called nadir values

z_i^{min} = \min_{j=1,\ldots,p} \bigl[ f_i(x^j) \bigr] \,, \qquad i = 1, \ldots, p          (7.79)

By the assumed uniqueness of the single–objective maximum solutions it must be true that the nadir values are also unique and that

z_i^{max} > z_i^{min} \,, \qquad i = 1, \ldots, p

The nadir vector z^min with the components z_i^{min}, i = 1, . . . , p, does not necessarily belong to f(C), so that it may be too pessimistic about the possible variations of the corresponding objective function. Figure 7.26 sketches the position of the ideal and the nadir vector in a problem with two objective functions. The objective space is accordingly two–dimensional, and the directions of optimization are parallel to the coordinate directions.

Figure 7.26. Ideal and nadir vector in a two–dimensional objective space

7.6.2 Weighted Chebyshev–Norm Distance Functions

The leading idea in the original method of multiobjective optimization is to solve the problem of minimizing the weighted Chebyshev–norm distance function

\max_{i=1,\ldots,p} \; w_i \{ z_i^{max} − f_i(x) \}                                         (7.80)

over the constraint set C, with weight coefficients wi.


This problem can easily be rewritten as the problem of minimizing a new variable y subject to the constraints

y ≥ w_i \{ z_i^{max} − f_i(x) \} \,, \qquad i = 1, \ldots, p \,, \quad x ∈ C               (7.81)
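A minimal sketch of this reformulation is given below (Python with SciPy), using as an assumed instance the ellipsoid example that is treated analytically in subsection 7.6.4 (objectives f_i(x) = x_i, ideal values 1/√a_i, weights w_i = ρ_i √a_i); the coefficients and weights are illustrative choices only.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed instance (see subsection 7.6.4): maximize f_i(x) = x_i over the ellipsoid
# a1*x1^2 + a2*x2^2 + a3*x3^2 <= 1, ideal values z_max_i = 1/sqrt(a_i), nadir at the origin.
a = np.array([1.0, 4.0, 9.0])
z_max = 1.0 / np.sqrt(a)
w = np.array([0.6, 0.2, 0.2]) * np.sqrt(a)        # weights w_i = rho_i * sqrt(a_i)

constraints = [
    # y >= w_i * (z_max_i - f_i(x)), rewritten as g(v) >= 0 with v = (x1, x2, x3, y)
    *({"type": "ineq", "fun": (lambda v, i=i: v[3] - w[i] * (z_max[i] - v[i]))}
      for i in range(3)),
    # feasibility of x: 1 - sum a_i x_i^2 >= 0
    {"type": "ineq", "fun": lambda v: 1.0 - np.dot(a, v[:3] ** 2)},
]

res = minimize(lambda v: v[3], x0=np.array([0.1, 0.1, 0.1, 1.0]),
               constraints=constraints, method="SLSQP")
x, y = res.x[:3], res.x[3]
print("x =", np.round(x, 3), " y =", round(y, 4))
print("relative deviations:", np.round((z_max - x) / z_max, 2))   # roughly (0.19, 0.58, 0.58)
```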

The idea to minimize the distance function has been generalized in the reference–point method of Wierzbicki (1980). If the jth and the kth constraint of (7.81) are active at a point (x, y) minimizing y over the set defined by (7.81), then

w_j \{ z_j^{max} − f_j(x) \} = w_k \{ z_k^{max} − f_k(x) \} = y                             (7.82)

and this leads to

\frac{z_j^{max} − f_j(x)}{z_k^{max} − f_k(x)} = \frac{w_k}{w_j}

Hence, the deviations of the jth and the kth objective–function values from the corresponding ideal values are inversely proportional to the weights. Practical experience has shown that this is an attractive property when attempts are made to control the approach towards an acceptable compromise solution, although a three–dimensional linear example is sufficient to show that a point x minimizing the distance function (7.80) over C is not necessarily efficient and not necessarily unique.

In order to avoid the danger of generating solutions which are not efficient, Wierzbicki (1980) proposed to add a perturbative term of the form

\sum_{i=1}^{p} ε_i \{ z_i^{max} − f_i(x) \}

to the distance function (7.80), with small positive numbers εi. It is beyond the scope of the present volume, however, to discuss the choice of these numbers and the usefulness of the above perturbation.

The decision maker can express his/her reluctance to deviate from the components of the ideal vector by a proper choice of the weights wi. This feature has extensively been explored by Kok et al. (1985) and Kok (1986) in experiments with a long–term energy–planning model for the national economy of The Netherlands. In order to explain the choice of the weights, we first rewrite them in the form

w_i = \frac{ρ_i}{z_i^{max} − z_i^{min}}

The decision maker is then requested to estimate the ρi via pairwise comparisons, in fact a procedure which is highly similar to the mode of operation described in Chapter 5. In the basic step, the jth and the kth objective function are presented to the decision maker, whereafter he/she is asked to specify the ratio which is acceptable for the deviations from the ideal vector. Thus, the decision maker is supposed to estimate the acceptable ratio of ρj and ρk. In principle, these questions can be rather precise. The analyst can ask him/her whether a 10% deviation from the


ideal value z_j^{max} in the direction of the nadir value z_j^{min} is equivalent to a 10% deviation from z_k^{max} in the direction of z_k^{min}. If the answer is ‘yes’ and if the decision maker declares that he/she is indifferent between 25% deviations in both directions and also between 50% deviations, then the ratio ρj/ρk can reasonably be estimated by the value of 1. The analyst can also vary the percentages: if the decision maker is indifferent between a 10% deviation from the ideal value z_j^{max} in the direction of the nadir value z_j^{min} and a 50% deviation from z_k^{max} in the direction of z_k^{min}, then the ratio ρj/ρk can be estimated by the value of 5. The analyst clearly takes the inverse ratio of the deviations because a higher weight corresponds to a higher reluctance to deviate from the ideal value. Finally, when all or almost all pairs of objective functions have been considered, the matrix R of ratio estimates provides the analyst with the information to calculate a set of values for the ρi. A detailed description of the procedure may be found in Section 5.2. When the pairwise comparisons are complete, the weights can be estimated by the geometric row means of R. The weights are not unique since the analyst collected ratio information only. There is a multiplicative degree of freedom which can be used to normalize them in the sense that they add up to 1 or to 100%.

Substituting these weights into (7.82), the analyst finds that the relative deviations from the ideal values at the minimizing point satisfy

\frac{z_j^{max} − f_j(x)}{z_j^{max} − z_j^{min}} \Bigm/ \frac{z_k^{max} − f_k(x)}{z_k^{max} − z_k^{min}} = \frac{ρ_k}{ρ_j}                   (7.83)

7.6.3 Weighted Degrees of Satisfaction

For each feasible solution x there is a vector (f1(x), . . . , fp(x)) of objective–function values expressing the performance of x under the respective objectives. The degree of satisfaction µi(x) with the solution x under the ith objective is defined by

µ_i(x) = \frac{f_i(x) − z_i^{min}}{z_i^{max} − z_i^{min}}

an expression which increases monotonically from zero to one when fi(x) increases from the nadir value to the ideal value. If the degree of satisfaction is defined to be zero below the nadir value and one above the ideal value, it has the form of a membership function. The global degree of satisfaction is now deemed to be given by the weighted geometric mean

\prod_{i=1}^{p} µ_i(x)^{c_i} = \prod_{i=1}^{p} \left( \frac{f_i(x) − z_i^{min}}{z_i^{max} − z_i^{min}} \right)^{c_i}                        (7.84)

where the ci, i = 1, . . . , p, stand for normalized weights assigned to the objective functions. The problem is to maximize function (7.84) over the set

\{ x \mid f_i(x) ≥ z_i^{min},\; i = 1, \ldots, p,\; x ∈ C \}                                (7.85)
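A minimal sketch of this maximization is given below (Python with SciPy); it exploits the fact, discussed next, that the logarithm of (7.84) can be maximized instead. The instance (objectives f_i(x) = x_i over an ellipsoid, nadir at the origin) and the weights are assumptions borrowed from the numerical example of subsection 7.6.4.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed instance (see subsection 7.6.4): objectives f_i(x) = x_i over an ellipsoid,
# nadir vector at the origin and ideal values 1/sqrt(a_i); c holds the normalized weights.
a = np.array([1.0, 4.0, 9.0])
z_min = np.zeros(3)
z_max = 1.0 / np.sqrt(a)
c = np.array([0.6, 0.2, 0.2])

def neg_log_satisfaction(x):
    mu = (x - z_min) / (z_max - z_min)        # degrees of satisfaction mu_i(x)
    return -np.dot(c, np.log(mu))             # maximize (7.84) <=> minimize -log of it

res = minimize(neg_log_satisfaction, x0=np.full(3, 0.1),
               bounds=[(1e-6, None)] * 3,      # keep mu_i > 0 so the logarithm is defined
               constraints=[{"type": "ineq", "fun": lambda x: 1.0 - np.dot(a, x ** 2)}],
               method="SLSQP")
print("x =", np.round(res.x, 3))
print("relative deviations 1 - mu_i:", np.round(1.0 - res.x / z_max, 2))  # ~ (0.23, 0.55, 0.55)
```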

The logarithm of (7.84) is concave over the set (7.85) so that any local maximum of (7.84) is also a global maximum. Moreover, the function (7.84) depends monotonically on the objective–function values, so that any maximum solution is non–dominated (efficient, Pareto–optimal). In order to


analyze the role of the weights, the function (7.84) is somewhat generalized and the geometric mean of the deviations from the nadir values is considered, which is defined by

F(z) = \prod_{i=1}^{p} β_i\, (z_i − z_i^{min})^{c_i}                                        (7.86)

where the βi represent arbitrary positive factors which are due to the choice of the units of performance measurement. The first–order partial derivatives of F are given by

\frac{∂F}{∂z_i} = \frac{c_i}{z_i − z_i^{min}} \times F

whence

\frac{1}{z_j − z_j^{min}}\, \frac{∂F}{∂z_k} \Bigm/ \frac{1}{z_k − z_k^{min}}\, \frac{∂F}{∂z_j} = \frac{c_k}{c_j}                            (7.87)

for arbitrary j and k, and regardless of the factors β1, . . . , βp. It is now possible to study the behavior of F along a contour or indifference curve in the (zj, zk) space (see also subsection 5.3 where the behavior of the geometric–mean aggregation rule was studied). In a first–order approximation a move towards alternative points for which the decision maker is indifferent (in the sense that they have the same global degree of satisfaction) proceeds in a direction which is orthogonal to the gradient of F, that is, in the direction

\left\{ \frac{∂F}{∂z_k} \,, \; −\frac{∂F}{∂z_j} \right\}

The left–hand side of (7.87) is now defined as the relative substitution rate, since it is based on the observation that human beings generally perceive relative gains and losses, that is, gains and losses in relation to the level from which the move starts. Thus, when a small step is made along the indifference curve, the relative gain (or loss) in the zj–direction is proportional to

\frac{1}{z_j − z_j^{min}}\, \frac{∂F}{∂z_k}

and the corresponding relative loss (or gain) in the zk–direction is proportional to

\frac{1}{z_k − z_k^{min}}\, \frac{∂F}{∂z_j}

For a function F of the form (7.86) the substitution rate between the relative gains and losses in the (zj, zk) space (the left–hand side of (7.87)) is a constant which does not depend on the values of the remaining variables. Since it is also a dimensionless quantity which does not depend on the units of measurement either, it can meaningfully be referred to as a model for the relative importance of the objective functions. In fact, formula (7.87) presents an inverse proportionality: if ck > cj then a larger step in the zj–direction is compensated by a smaller step in the zk–direction. This is just what one may expect when objective functions have different weights in the decision maker's mind.


It is easy to see that the maximization of a weighted arithmetic mean of the objective functions such as

\sum_{i=1}^{p} c_i\, f_i(x)

although it seems to be a popular method for solving multiobjective optimization problems, has the disadvantage that the maximum solution (a non–dominated solution) depends strongly on the units of performance measurement. The decision makers who choose the weights ci are not always aware of this. Hence, their information is meaningless if a weighted arithmetic mean of the objective functions is used as a scalarizing function. One could avoid the dependence on the units of performance measurement by the maximization of the function

\sum_{i=1}^{p} c_i\, \frac{f_i(x) − z_i^{min}}{z_i^{max} − z_i^{min}}

but it is unclear why one should ever employ the ideal and the nadir values in this function, and not in the weighted Chebyshev–norm distance function which enables the user to control the deviations from the ideal values.

7.6.4 Numerical Example

In order to illustrate the two methods, minimization of the weighted Chebyshev–norm distance function and maximization of the weighted degrees of satisfaction, a simple numerical example is first considered: the multiobjective optimization problem of maximizing the objective functions x1, x2, and x3 subject to the constraint

a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 ≤ 1

with positive coefficients in the left–hand side. The single–objective maximum solutions are given by the points

\left( \frac{1}{\sqrt{a_1}},\, 0,\, 0 \right), \qquad \left( 0,\, \frac{1}{\sqrt{a_2}},\, 0 \right), \qquad \left( 0,\, 0,\, \frac{1}{\sqrt{a_3}} \right)

respectively.

The ideal vector is given by

\left( \frac{1}{\sqrt{a_1}},\; \frac{1}{\sqrt{a_2}},\; \frac{1}{\sqrt{a_3}} \right)

and the nadir vector is the origin (0, 0, 0). The problem of finding a feasible solution where the weighted Chebyshev–norm distance from the ideal vector is minimized can be formulated here as the problem of minimizing the variable y subject to the constraints

y ≥ w_i \left( \frac{1}{\sqrt{a_i}} − x_i \right) \,, \qquad i = 1, 2, 3

a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 ≤ 1


With the weights rewritten as w_i = ρ_i \sqrt{a_i}, the problem is to minimize y subject to

y ≥ ρ_i (1 − x_i \sqrt{a_i}) \,, \qquad i = 1, 2, 3

a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 ≤ 1

Using the Kuhn–Tucker conditions for optimality one can easily show that all constraints are active at an optimal solution (x, y) if the weights are not too small. Then

x_i = \frac{1}{\sqrt{a_i}} \left( 1 − \frac{y}{ρ_i} \right)

whereas y can be solved from the quadratic equation

\sum_{i=1}^{3} \left( 1 − \frac{y}{ρ_i} \right)^2 = 1                                       (7.88)

The relative deviations from the ideal values (relative in the sense that they are given as fractions of the deviations between the corresponding ideal and nadir values) are given by

\left[ \frac{1}{\sqrt{a_i}} − \frac{1}{\sqrt{a_i}} \left( 1 − \frac{y}{ρ_i} \right) \right] \Bigm/ \frac{1}{\sqrt{a_i}} = \frac{y}{ρ_i} \,, \qquad i = 1, 2, 3

These deviations do not depend on the coefficients ai. If one of the weights is so small that equation (7.88) does not have a real solution, the corresponding component of x must be set to zero, whereafter the remaining components can be calculated from a reduced set of Kuhn–Tucker conditions.

The maximization of the weighted degrees of satisfaction

\prod_{i=1}^{3} x_i^{c_i}

subject to the constraint

a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 ≤ 1

with normalized weights ci yields as the unique solution

x_i = \sqrt{\frac{c_i}{a_i}} \,, \qquad i = 1, 2, 3

so that the relative deviations from the ideal values are given by

\left( \frac{1}{\sqrt{a_i}} − \frac{1}{\sqrt{a_i}} \sqrt{c_i} \right) \Bigm/ \frac{1}{\sqrt{a_i}} = 1 − \sqrt{c_i} \,, \qquad i = 1, 2, 3

The relative deviations for a few sets of weights are now calculated in order to illustrate how the two methods control the computational process. Taking ρ1 = c1 = 0.6 and ρ2 = ρ3 = c2 = c3 = 0.2, the following results are obtained:


minimization of the weighted Chebyshev–norm distance      0.19   0.58   0.58
maximization of the weighted degrees of satisfaction      0.23   0.55   0.55

Note that formula (7.83) is satisfied by the relative deviations from the ideal values when the weighted Chebyshev–norm distance function is minimized: the desired ratios of the deviations (the inverted ratios of the corresponding weights) are indeed produced by the procedure. The maximization of the weighted degrees of satisfaction yields roughly the same ratios.
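These two rows can be reproduced directly from the closed–form expressions derived above: y is the smaller root of the quadratic equation (7.88) and the deviations are y/ρ_i for the Chebyshev approach and 1 − √c_i for the degrees of satisfaction. A minimal Python sketch:

```python
import numpy as np

rho = np.array([0.6, 0.2, 0.2])    # first set of weights, rho_i = c_i

# Chebyshev approach: y is the smaller root of (7.88), written as A*y^2 + B*y + C = 0.
A = np.sum(1.0 / rho ** 2)
B = -2.0 * np.sum(1.0 / rho)
C = len(rho) - 1.0
y = (-B - np.sqrt(B ** 2 - 4.0 * A * C)) / (2.0 * A)
print("Chebyshev-norm deviations y/rho_i  :", np.round(y / rho, 2))          # (0.19, 0.58, 0.58)

# Degrees-of-satisfaction approach: deviations are 1 - sqrt(c_i).
print("satisfaction deviations 1 - sqrt(c):", np.round(1.0 - np.sqrt(rho), 2))  # (0.23, 0.55, 0.55)
```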

A similar pattern of results is obtained for the two methods when ρ1 = ρ2 = c1 = c2 = 0.4 and ρ3 = c3 = 0.2:

minimization of the weighted Chebyshev–norm distance      0.33   0.33   0.66
maximization of the weighted degrees of satisfaction      0.37   0.37   0.55

By a more ‘extreme’ choice of the weights, however, larger discrepancies are found between the two methods. If ρ1 = c1 = 0.7, ρ2 = c2 = 0.2, and ρ3 = c3 = 0.1, for instance, one finds the following results:

minimization of the weighted Chebyshev–norm distance      0.14   0.49   0.98
maximization of the weighted degrees of satisfaction      0.16   0.55   0.68

Nevertheless, the results seem to be close enough to speculate that the two methods make the elusive concept of the relative importance of the objective functions operational. This is confirmed by the real–life example of the next subsection.

7.6.5 Design of a Gearbox

Multiobjective optimization has a firm place in mechanical engineering. Lootsma et al. (1995) used the experience of Athan (1994) with the gearbox design problem (Osyczka, 1984) in order to analyze the behavior of the two methods which are considered in this subsection.

The key issues are briefly summarized. Multi–speed gearboxes are used in automobiles, machine tools, and other machines to provide a range of output rotational velocities for any input rotational velocity via combinations of mating gears mounted upon parallel shafts. Generally, each shaft carries more than one gear. The operator ‘changes the gears’ by disengaging one set of gears and engaging another. The design of a gearbox is, first, concerned with important layout decisions regarding the number of shafts, the distances between them, and the number as well as the placement of the gears on the shafts. Next, the transmission ratios are determined by the choice of the diameters of the gears, whereas strength considerations determine the number of teeth on each gear, the modules of the gear wheels, and the tooth widths. Osyczka (1984) studied the layout design problem separately, and he proposed to model it as a multiobjective optimization problem with four objectives, one of them pertaining to the dynamic performance


of the gearbox, the others to its weight and dimensions. Athan (1994) used the same separation, added some details to clarify Osyczka's problem formulation, and solved the problem with a variety of multiobjective optimization techniques.

Gearbox design generally lies within the realm of mixed continuous–discrete optimization. Gear teeth must be specified in integer numbers, and the related design variables are restricted to a finite set of integer values. Because of the difficulties encountered in mixed continuous–discrete optimization, designers have often solved the gearbox design problem as a problem with continuous variables, whereafter they rounded off the resulting values to the nearest allowable values. This mode of operation does not necessarily produce an optimal solution to the original mixed problem. In the present chapter, however, only the results of the continuous problem are shown and the integrality requirements are ignored. Following the model reduction principles of Papalambros and Wilde (1988), Athan (1994) reformulated the original problem as a non–linear optimization problem with 14 variables and 43 constraints. As mentioned before, there were four objectives, the minimization of

1. the volume of the material to be used for the gears,

2. the maximum peripheral velocity, pertinent to vibrations and noise,

3. the width of the gearbox,

4. the distance between the shafts.

The objective functions, not necessarily convex, are to be minimized (not maximized, so that one has to use the formulas of the preceding subsections with some care) over a constraint set which is not necessarily convex either. The results to be reported here have been obtained with the NLPQL subroutine which is based upon a method using quadratic approximations to the functions in the problem to be solved. A priori knowledge of the problem, common sense, and four single–objective optimization runs generated the ideal vector

(9.6,10.5,226,284)

as well as the nadir vector

(13.3,27.6,473,471).

Run     ρ1      ρ2      ρ3      ρ4      f1      f2      f3      f4

 1     0.55    0.15    0.15    0.15     9.7    16.6    314     351
 2     0.15    0.55    0.15    0.15    11.2    12.6    339     369
 3     0.15    0.15    0.55    0.15     9.9    19.8    263     386
 4     0.15    0.15    0.15    0.55    10.1    17.9    377     315

Table 7.14. Gearbox design problem: minimization of the weighted Chebyshev–norm distance

Table 7.14 exhibits the objective–function values obtained in four runs, where the weighted Chebyshev–norm distance from the ideal vector is minimized for given sets of weights.


Obviously, run i gives the highest priority to the ith objective function (ρi = 0.55) and equal priorities (ρj = 0.15, j ≠ i) to the remaining ones.

Maximization of the weighted degrees of satisfaction (the weighted geometric mean of the deviations from the nadir vector) with the same weights produced the results displayed in Table 7.15.

Run     c1      c2      c3      c4      f1      f2      f3      f4

 1     0.55    0.15    0.15    0.15     9.7    15.7    313     355
 2     0.15    0.55    0.15    0.15    10.0    12.6    355     343
 3     0.15    0.15    0.55    0.15     9.7    16.8    261     393
 4     0.15    0.15    0.15    0.55     9.8    14.4    385     315

Table 7.15. Gearbox-design problem: maximization of the weighted degrees of satisfaction

At first sight, the results of the two methods are highly similar. Let us examine them in more detail, however.

In run 1, the relative deviations from the ideal values generated by the two methods can be summarized as follows:

minimization of the weighted Chebyshev–norm distance      ≈ 0.03   0.36   0.36   0.36
maximization of the weighted degrees of satisfaction      ≈ 0.03   0.30   0.35   0.38

The three entries 0.36, for instance, related to the second, the third, and the fourth objective function respectively, represent the relative deviations

\frac{16.6 − 10.5}{27.6 − 10.5} \,, \qquad \frac{314 − 226}{473 − 226} \,, \qquad \text{and} \qquad \frac{351 − 284}{471 − 284}

with 16.6, 10.5, and 27.6 standing for the computed, the ideal, and the nadir value of the second objective function, etc. Obviously, the relations (7.83) are satisfied by these objective functions when the Chebyshev–norm distance is minimized (recall that ρ2 = ρ3 = ρ4 in run 1), but the ratio 0.36/0.03 is ‘better’ than the desired ratio 0.55/0.15, in the sense that the first objective is closer to the ideal value than it was requested, possibly at the expense of the other objective functions. Maximization of the weighted degrees of satisfaction yields practically the same results.
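The run 1 entries can be recomputed directly from the ideal and nadir vectors and the first rows of Tables 7.14 and 7.15, as in the minimal Python sketch below.

```python
import numpy as np

# Ideal and nadir vectors of the gearbox problem, and the run-1 rows of Tables 7.14 and 7.15.
z_max = np.array([9.6, 10.5, 226.0, 284.0])
z_min = np.array([13.3, 27.6, 473.0, 471.0])
run1 = {"Chebyshev   ": np.array([9.7, 16.6, 314.0, 351.0]),
        "satisfaction": np.array([9.7, 15.7, 313.0, 355.0])}

for label, f in run1.items():
    # relative deviation of f_i from the ideal value, as a fraction of the ideal-nadir gap
    print(label, np.round((f - z_max) / (z_min - z_max), 2))
```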

In run 2, the relative deviations generated by the two methods are as follows:

minimization of the weighted Chebyshev–norm distance      0.43   0.12   0.46   0.45
maximization of the weighted degrees of satisfaction      0.10   0.13   0.52   0.62

Note that the relations (7.83) are indeed satisfied when the Chebyshev–norm distance is minimized (0.43/0.12 ≈ 0.55/0.15 and 0.43 ≈ 0.46 ≈ 0.45). Nevertheless, maximization of the weighted degrees of satisfaction gives a ‘better’ result, possibly at the expense of the fourth objective.


The relative deviations generated by the two methods in run 3 are

minimization of the weighted Chebyshev–norm distance       0.09   0.54   0.15   0.55
maximization of the weighted degrees of satisfaction      ≈ 0.03   0.37   0.14   0.58

They show that the maximization of the weighted degrees of satisfaction yields slightly better results than the minimization of the weighted Chebyshev–norm distance. The relations (7.83) are not satisfied by the first objective function.

Finally, run 4 generates the relative deviations:

minimization of the weighted Chebyshev–norm distance       0.14   0.43   0.61   0.16
maximization of the weighted degrees of satisfaction      ≈ 0.05   0.23   0.64   0.17

Maximization of the weighted degrees of satisfaction seems again to do somewhat better than minimization of the weighted Chebyshev–norm distance. The relations (7.83) are not satisfied by the first and the second objective function.

Obviously, as soon as the ideal and the nadir vector are known, the choice of the weights enables the decision maker to home in towards a compromise solution where the deviations from the ideal vector are simply related to what is intuitively known as the relative importance of the objectives.

The assumptions underlying the above analysis can be somewhat relaxed. So far, we have extensively used the ideal and the nadir vector because this is the initial information which the decision maker can (and normally will) collect when the problem under consideration is new. At a later stage he/she may choose a reference point on the basis of his/her intuition and knowledge. Minimization of the weighted Chebyshev–norm distance from an infeasible reference point as well as maximization of the weighted degrees of satisfaction from a feasible reference point may lead to further improvements of the decision maker's guess at an acceptable compromise solution. Similarly, although convenient convexity conditions were assumed, the crucial requirements are differentiability (in order to optimize via gradient methods) and the ability to find global optima (in order to find the ideal and the nadir values). Moreover, the decision maker may need other scalarizing functions in order to find non–dominated solutions when the constraint set is non–convex.


Bibliography

[1] Athan, T.W.: A Quasi–Monte Carlo Method for Multicriteria Optimization, Ph.D. Thesis, Mechanical Engineering and Applied Mechanics, The University of Michigan, Ann Arbor, 1994.

[2] Bellman, R., and Giertz, M.: On the Analytic Formalism of the Theory of Fuzzy Sets, Information Sciences, Vol. 5, 1973, pp. 149–156.

[3] Cooke, R.W.: Experts in Uncertainty, Oxford University Press, New York, 1991.

[4] Dubois, D., and Prade, H.: Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.

[5] Dubois, D., and Prade, H.: Possibility Theory: An Approach to Computerized Processing of Uncertainty, Plenum Press, New York, 1988.

[6] Fodor, J., and Roubens, M.: Fuzzy Preference Modelling and Multi–Criteria Decision Support, Kluwer, Dordrecht, The Netherlands, 1994.

[7] Gaines, B.R.: Precise Past, Fuzzy Future, International Journal of Man–Machine Studies, Vol. 19, 1983, pp. 117–134.

[8] Kay, P., and McDaniel, C.K.: The Linguistic Significance of the Meaning of Basic Color Terms, Language, Vol. 54, 1978, pp. 610–646.

[9] Kok, M.: Conflict Analysis via Multiple Objective Programming, Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 1986.

[10] Kok, M., and Lootsma, F.A.: Pairwise–Comparison Methods in Multiple Objective Programming, with Applications in a Long-Term Energy-Planning Model, European Journal of Operational Research, Vol. 22, 1985, pp. 44–55.

[11] Kosko, B.: Fuzzy Thinking, the New Science of Fuzzy Logic, Hyperion, New York, 1994.

[12] Kosko, B.: Neural Networks and Fuzzy Systems, Prentice Hall, Englewood Cliffs, New Jersey, 1992.

[13] Lootsma, F.A.: Optimization with Multiple Objectives, in ‘Mathematical Programming: Recent Developments and Applications’, KTK Scientific Publishers, Tokyo, 1989, pp. 333–364.

[14] Lootsma, F.A., Athan, T.W., and Papalambros, P.Y.: Controlling the Search for a Compromise Solution in Multi–Objective Optimization, Engineering Optimization, Vol. 25, 1995, pp. 65–81.

[15] Lootsma, F.A., and Schuyt, H.: The Multiplicative AHP, SMART, and ELECTRE in a Common Context, Journal of Multi-Criteria Decision Analysis, Vol. 6, 1997.

[16] McNeill, D. and Freiberger, P.: Fuzzy Logic, Touchstone, New York, 1993.

[17] Osyczka, A.: Multicriterion Optimization in Engineering, Wiley, New York, 1984.

[18] Papalambros, P.Y., and Wilde, D.J.: Principles of Optimal Design, Cambridge University Press, Cambridge, UK, 1988.

[19] Rosch, E.: Principles of Categorization, in Rosch and Lloyd (eds.), ‘Cognition and Categorization’, Lawrence Erlbaum, Hillsdale, 1978, pp. 27–48.

[20] Roy, B.: Methodologie Multicritere d’Aide a la Decision, Economica, Collection Gestion, Paris, 1985.


[21] Roy, B., and Bouyssou, D.: Aide Multicritere a la Decision: Methodes et Cas, Economica, Collection Gestion, Paris, 1993.

[22] Roy, B., and Vanderpooten, D.: The European School of MCDA: Emergence, Basic Features and Current Works, Journal of Multi–Criteria Decision Analysis, Vol. 5, pp. 22–37.

[23] Saaty, T.L.: The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation, McGraw-Hill, New York, 1980.

[24] Smithson, M.: Fuzzy Set Analysis for Behavioral and Social Sciences, Springer, New York, 1987.

[25] Wierzbicki, A.P.: A Mathematical Basis for Satisficing Decision Making, WP-80-90, IIASA, Laxenburg, 1980.

[26] Winterfeldt, D., and Edwards, W.: Decision Analysis and Behavioral Research, Cambridge University Press, Cambridge, UK, 1986.

[27] Yu, P.L., and Zeleny, M.: The Set of All Nondominated Solutions in Linear Cases and a Multicriteria Simplex Method, Journal of Mathematical Analysis and Applications, Vol. 49, 1975, pp. 430–468.

[28] Zadeh, L.A.: Fuzzy Sets, Information and Control, Vol. 8, 1965, pp. 338–353.

[29] Zimmermann, H.J.: Fuzzy Set Theory and Its Applications, Third edition, Kluwer Academic Publishers, Boston/Dordrecht/London, 1996.


Chapter 8

Engineering Economics

Economics is an important aspect of human activities and of engineering as well. It deals with the use of scarce resources: e.g. materials, human skill, energy, machinery, and last but not least capital. Thus economics is not just about money; nevertheless, money is necessary as a common monetary unit in which to prepare analyses of alternative designs.

This chapter is primarily concerned with the principles of engineering economics, which can be defined as the scientific tool that supports rational design decisions. Several available economic measures of merit are illustrated, and rational methods are emphasized for selecting criteria suitable for different economic scenarios. The importance of stipulating reasonable rates of interest is stressed and, in this respect, the influence of financial factors is explained.

Engineering economics is a powerful but often neglected tool in ship design. Every design decision should consider how the decision would affect the overall economics of the ship in question. Engineering economics provides a means to evaluate economic investment among a large set of alternative designs from the conceptual design stage onwards, by studying the differences in cash flow that would result from each alternative solution. It offers a criterion which takes all aspects of the alternative designs into account and by which they may be ranked.

Whereas there are no good reasons to suppose that economists can design ships, this does not matter very much: they do not have to design. But the reverse argument does not apply. Naval architects who are concerned with the design of merchant and offshore ships have to deal with economics in some form.

When discussing their designs with a prospective shipowner in the hope of a sale, naval architects should not talk primarily about technical issues. They have, rather, to talk about economics, finding out what the owner's needs are, expressed in functional terms, and which economic


measure of merit is most convincing in his/her eyes, and they must be ready to deal with certain necessary details (depreciation plan, tax rates, freight rates, interest rate, charter rates, etc.).

Engineering economics is closely akin to systems engineering, an organized approach to decision making. This is a systematic way of attacking a problem, using the following discrete steps:
• clearly define the objective in functional terms;
• indicate clearly under which constraints the system is to operate (e.g., flag of registry, classification society requirements, port and canal limits, labor union agreements, loading and unloading facilities, etc.);
• define the economic measure(s) of merit to be used in choosing among alternative designs, and which affecting values (e.g., tax rates) are to be assumed;
• predict the quantitative value(s) of the measure(s) of merit likely to be attained by each of the alternative strategies;
• append a summary of any important influencing considerations that cannot be reduced to monetary terms (e.g., political implications).

Until relatively recent years naval architects and marine engineers were not instructed in practical economics as part of their formal education. As an unfortunate result, most of the big and important studies bearing on ship design, as well as on the design of offshore platforms, were (and often still are) made by accountants. Accounting is an admirable and necessary profession. Those reared in its complexities are not, however, ideally suited to the task of analyzing alternative design proposals, at least not by themselves. Of course, accountants do not pretend to understand the technical matters involved in ship design. More than that, they are trained to look back, not ahead, and they allow the arbitrary strictures of book-keeping rules to distort their thinking. Three examples:
• accountants tend to ignore lost–opportunity costs because they are not entered in the books;
• accountants tend to treat imaginary depreciation costs as though they actually exist;
• accountants normally accept money at face value just as though inflation did not exist.

Wise ship design decisions require teamwork between engineers, business managers, and operating personnel. Three additional observations are pertinent at this point:
• Decisions are between alternative designs. In making comparisons, designers must concentrate their attention on assessing those cost factors that could be different between the alternatives.
• Since much guesswork is involved in predicting future conditions, cost projections are bound to be crude.
• Most engineering decisions should be made on the basis of simple economic analysis. Prudent business managers usually select their options on straightforward economics.

8.1 Engineering Economics and Ship Design

Ship design involves countless decisions where many factors must be weighed before reaching a decision. Naval architects and marine engineers have traditionally slighted or


misused economics as a tool in ship design. When making a decision a designer should be sure that, in privileging one sub–system, others are not overly degraded. To avoid such sub–optimization the overall economics of the entire system should be analyzed. The actual outcome of an optimal design is not only an ideal hull form with low resistance or minimum fuel consumption, but a ship that carries the desired cargo at minimum cost. The ‘best possible’ ship can only be identified by comparing alternative designs in economic terms.

The technical and economic capabilities of a ship are mutually dependent, and any attempt to design a ship without due recognition of this interdependence cannot be expected to provide the ‘best possible’ solution. Naval architects must aspire to produce designs which offer the ‘best possible’ solution to the shipowners’ requirements. It follows, therefore, that they must be able to make valid economic analyses and estimates of both building and operating costs of ships.

Engineering success of a technical system depends substantially on economic success. Every design decision should consider how that decision would affect its overall economics. A constant guiding principle in decision making is the analysis of costs and economic benefits; these comprise the primary role of engineering economics in all engineering disciplines. In the design of a product, process, or system, the engineer makes a multitude of choices of configurations, subsystems, components, and materials; economic factors are central to engineering decision making in designing a product or process (Hazelberg, 1994). A quantitative understanding of the economic implications in the design and operation of a product or process is indispensable.

Engineering economics should concentrate on the difference between alternatives. The corresponding differences in cash flows and in the selected economic criteria must be predicted as a result of decision making. Related to the above is the rule that lost–opportunity costs must be given as much emphasis as real costs. This is one of the major points of difference between engineers and accountants. Lost–opportunity costs never show up in the books, and so are ignored by accountants. Another difference to keep in mind is that accountants focus on past results whereas engineers should look ahead.

Hence ship designers have to reclaim their role in technical–economic analysis. ”Engineers should provide the economic analyses that compare the profitability potential of each alternative ...” (Benford, 1970). Although that is true in many countries, regrettably this statement still remains a desideratum in the Italian shipping and shipbuilding industry. There are many reasons for this, of a cultural, political, and psychological nature. Not least among them, however, is the fact that ship designers are often not able to combine decision making with engineering economics.

Since Napier (1865) tried to apply cost studies to the determination of ship characteristics, only slight progress has been made in engineering economics for ship design. For decades neither the basic textbooks of the subject nor the periodical literature normally available to practising engineers, managers and accountants appeared to give clear guidelines. This is not to suggest that no one has ever thought or written about the economics of ship


design. Many authors have tackled this problem, from Bergings (1871) to Marther (1963). But few of them provided or discussed, at least implicitly, what the criteria for comparing ship designs should be. The notable paper by Benford (1963) marked a turning point since it implies or advocates specific criteria.

So it was only one century after Napier that rigorous economic evaluations found serious application to ships, for three principal reasons:
• The risk of making wrong decisions in ship design has increased continuously with the expansion in ship sizes and types, together with the development of novel ship concepts. Until recently, the decision depended more on whether to build rather than what to build, as each succeeding ship design was usually a modification of a baseline ship.
• It is axiomatic that a ship design must be the ‘best possible’ for her service, but optimization of single technical criteria is not enough. It is widely recognized that the main criterion must be of an economic nature, giving full weight to the simultaneous influence of technical factors in its evaluation. The optimal design is that which is most profitable to the customer.
• There has been increasing complexity in the financial conditions surrounding ship procurement. Once new ships were largely financed out of retained profits, but now cheap loans, accelerated depreciation, hidden subsidies and tax relief all add greatly to the difficulties of estimating ship profitability. However, most design decisions should be made on the basis of simple economic analysis. In short, particularly at the initial design stages, economic evaluation should not be adulterated with confusing financial intricacies.

The principles of engineering economics are straightforward, and designers should not find any difficulty in making the detailed calculations, especially since computer programs are available. Discussion will be substantially confined to the economic evaluations encompassing the decision-making process in conceptual and basic design of merchant ships. The related principles, however, are easily adaptable to offshore platforms and navy ships. While many of the techniques available from engineering economics may be used by shipowner management, here the primary purpose is to assist decisions in concept design. There are two fundamental principles that should guide every decision in ship design:
• a merchant ship is a capital investment that earns its returns as a socially useful instrument of transport;
• the best measure of engineering success is profitability; and the only meaningful measure of profitability is the returned profit, or required interest rate (after tax).

The required interest rate should be some logical measure of the decision–maker's time–value of money. In the case of a government–owned ship it might reflect the current rate of interest paid on government bonds.

8.1.1 Criteria for Optimizing Ship Design

In the context of ships, any criterion for determining the optimum investment involves the answers to a set of inquiries as to the extent to which economic outcomes will differ


as a result of the investment. Thus, the following issues are of importance:

1. What will be the gross benefits over the ship's life? In the simplest case this is the gross earnings of the ship. But where the ship is operating as part of a liner service there may be some effects on the earnings of other ships in the same ownership, and these must be taken into account. In other terms, the needed figure is the difference between what the revenue would be with the investment and what it would have been without it.

2. What is the cost of the ship? This can be divided into two parts: acquisition cost and operating costs. Acquisition cost is conventionally referred to as the capital cost, though it may include some elements which an accountant would not normally recognize as capital (e.g. any special training that may be required by the crew, or the cost of having senior officers standing by during the building period). Operating costs can be divided into the accounting headings of fuel, wages, stores, port fees, and so forth, though some, like the increase in management costs and commissions to brokers, may be external to the ship. Because capital cost has been included in acquisition cost it would be double–counting to include any part of depreciation under this heading.

3. What is the life of the ship, either to scrapping or to second–hand sale? This will depend largely on the physical characteristics of the ship, the work she has to undertake and the policies of the owners. Because second–hand values are usually based on the estimated profitability of the remaining ship's life, policies preferring one to the other will not make very much difference to the final answer except where the ship is highly specialized. But, since the second–hand sale of a highly specialized ship is unlikely, it may be assumed that all ships are retained until scrapping.

4. What is the distribution of estimated revenues over the estimated life? In the years of the quadrennial classification surveys the ship will be out of service for increasing periods. In those years, therefore, her earnings will be reduced. The distribution of earnings throughout the ship's life will not be constant. In addition, any rising or falling trend in the supply–demand position for the type of ship under consideration may affect the freight rate at which she trades and, if she is a liner, the load factor at which she operates.

5. What is the distribution of the estimated operating costs over the estimated life? Again, because of the quadrennial surveys these costs will be greater in some years than in others. Moreover, there may be rising or falling trends in the costs of operation. Methods are required by which such trends may be brought into the calculation.

6. What is the scrap (or second–hand) value at the end of the ship's life? Because scrap values are easier to estimate than second–hand values, this is a further reason for assuming that the ship will be retained in the same ownership until scrapping.

The answers to these questions can be stated in terms of time and money. To this end, a table can be drawn up in which each row represents a year, starting with the year of construction. In the first column one can place all the earnings of the ship; these may be regarded as positive components of cash flow. In the second column one can place all the costs of the ship, capital and operating, against the years in which they are paid. It is the cash movements that one is estimating; costs must, therefore, be entered in the years of payment. This is particularly important where tax payments are concerned. By subtracting column two from column one,


one arrives at column three: the net cash flow for each year in which receipts and payments caused by the ship will take place. In some years, certainly the year of construction, this figure will be negative, until the break–even point is reached. In the remaining years the net cash flow should be positive if the ship is to have any chance of being an economic success.

Now it remains to relate the single net cash flows to one another. This can be done by recognizing that the present value of a sum of money accruing in the future is less than that of an equal sum of money accruing now. This leads to the necessity of ‘discounting the future’. Concepts such as compound interest will serve to calculate the discounted cash flow. Then an economic criterion can be selected, e.g. either net present value, or required freight rate, or another.
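A minimal Python sketch of the three-column table just described follows; the figures are invented and the function name is ours. Each year's earnings and costs (capital and operating, entered in the years of payment) give a net cash flow, and the discounted sum of these net flows gives the net present value at a stipulated discount rate.

```python
def net_present_value(earnings, costs, i):
    """Discount each year's net cash flow (year 0 = construction) at rate i and sum."""
    npv = 0.0
    for year, (e, c) in enumerate(zip(earnings, costs)):
        net = e - c                      # column three: net cash flow of that year
        npv += net / (1 + i) ** year     # discount back to time zero
    return npv

# Hypothetical example: acquisition cost 20 paid in year 0, then earnings of 8
# against operating costs of 2 in each of five years of service, discounted at 10%.
earnings = [0, 8, 8, 8, 8, 8]
costs    = [20, 2, 2, 2, 2, 2]
print(round(net_present_value(earnings, costs, 0.10), 2))   # positive, so worthwhile
```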

The different ways of financing the design may include some with deferred payments and interest charges. All these payments are to be included, as and when they are expected to occur. It may then happen that none of the early years of the ship's life have any negative cash flows at all.

8.1.2 Operating Economics

The shipowner’s responsibilities for the various items of expenditure are illustrated in Figure 8.1. Capital charges cover items such as loan interest, repayments, and profit, all related to the capital investment in the ship. The full calculation of effective capital charges can be complex. Voyage costs cover fuel, port and canal dues, and sometimes cargo handling charges. Daily running costs are those incurred on a day–in, day–out basis whether the ship is at sea or in port; these include crew wages and benefits, victualling, ship upkeep, stores, insurance, equipment hire and administration. Voyage costs vary considerably from trade to trade, while daily running costs are largely a function of ship type, size and flag.

Figure 8.1. Division of responsibility for operating costs

The type of charter and the division of responsibility for cost and ship's time between shipowner and charterer can influence some features of the design of the ship and its


equipment. With bareboat charters shorter than the life of the ship, the charterer has less incentive than an owner–operator to reduce fuel consumption, while time in port is more significant for owners of owner–operated or voyage–chartered ships than for time–chartered ships. Owner–operators may thus be expected to be more forward–looking in fitting fuel–saving devices or better equipment to keep port turnrounds short, e.g. bow thrusters or more elaborate cargo handling equipment. Owner–operators often have the highest standards of equipment and maintenance.

From estimates of the components of ship operating costs and the corresponding transport performance, it is possible to calculate freighting costs for a variety of ships. For ships relying on shore gear with a constant handling rate, time in port is roughly proportional to size, unlike tankers, where time in port is almost independent of size. Thus big ships are only economic where handling rates are commensurate with the size of ship. Shore costs per ton may increase with ship size, as deeper dredging, more powerful tugs, faster cargo handling gear, and bigger stockyards are required.

Item                        Liner Shipping                         Bulk Shipping

Ship Size (deadweight)      Small–Medium                           Medium–Large
                            (5000–25000 multi-deck,                (15000–550000)
                            5000–50000 unit load)

Ship Speed                  Medium–Fast                            Medium–Low (12–17 knots)

Area of Operation           Specific trade routes                  Worldwide

Type of Carrier             Common                                 Contract

Organisation/Ownership      Conference of liner members            Independent or industrial carrier

Assignments                 Large number of small parcels          Small number of large parcels

Nature of Cargo             Heterogeneous (general)                Homogeneous (bulk)

Freight Rates               Administered                           Negotiated
                            (level set to cover costs)             (set by supply & demand)

Competition                 Market shares,                         Price and delivery
                            quality of service,
                            non–conference lines

Scheduled Service           Yes (constant speed ship)              No (constant power ship)

Mass or Volume Limited      Volume                                 Usually mass except certain
                                                                   cargoes and SBT tankers

Ports Serviced              Range of ports near                    Usually one port each end near
                            major cities                           producing/consuming plant

Days at Sea per Year        180–240 (multi-deck),                  240–330
                            200–280 (unit load)

Own Cargo Handling Gear     Yes (multi-deck),                      Usually none except tankers
                            Sometimes (unit load)                  and smaller bulk carriers

Table 8.1. Some differences between liner and bulk shipping

Actual bulk cargo freight rates are regularly published in the shipping press, ship brokers' reports, etc. They vary with supply and demand, and can be regarded as oscillating about a


level of freighting cost which gives the average efficient operator an acceptable rate of return in the long run. However, over–supply of ships leading to long periods of low freight rates can occur owing to, for example, very attractive shipbuilding loan terms. Table 8.1 indicates that different economic factors apply to liner as opposed to bulk shipping.

8.2 Time–Value of Money

Money has not only a nominal value, expressed in some monetary unit, but also a time value. In its most familiar form the time value appears in a bank account, which earns interest over time. Exactly the same holds for investments. If a return is yielded on an investment and is reinvested, a return is yielded on a return, which means that the value of a return depends on the year in which that return is earned.

If designers want to make proper decisions, first they have to recognize that a given amount of cash changing hands today is more important than the same amount of cash changing hands in the future. In simpler terms, a sum of money in hand today need not be spent, but could be put to work and allowed to generate rent money (i.e., interest). A fundamental concept of economics can then be introduced: one must consider not only how much money flows in or out of a company, but also when. Designers must assign some time–value to money. They also have to consider relative risks, recognizing that expectations may or may not be fulfilled. Riskier proposals naturally place greater emphasis on the time–value of money.

The quantitative recognition of the time–value of money is handled by means of standard compound interest formulae. Interest relationships make allowance for the time–value of money and the life of the investment, and may be used to convert an investment (e.g. the cost of a ship) into an annual amount which, when added to the annual operating costs, may be used to determine the level of income necessary to give any required rate of return.

The interest can be thought of in three distinct forms:
1. Simple interest, as on savings deposits in banks.
2. Compound interest, when a present sum is converted into a future sum and vice versa.
3. Returned interest, which is a measure of gains from risk capital invested in a profitable company. This is called by various names, including internally generated interest, interest rate of return, profit or simply yield. It is one good measure of profitability, expressing the benefits of an investment as equivalent to returns from a bank at the derived rate of interest. Most countries impose a tax on business incomes, so the analyst must differentiate between rates before and after tax.

The interest is calculated by exactly the same mathematical expressions in all three cases. Even though compound interest is the most suitable for engineering economics in design decision–making, what is more important is that, in deciding between alternative designs, one must consider for each solution not only the cash flows but also their timing.


Alternatively, where annual cash flows are known, the relationships can convert them into present worths, which may be added together to give the net present value (NPV), for comparison with the amount of the investment. The future cash flows are discounted (the inverse of ‘compounded’), hence the common name of discounted cash flow (DCF) calculations. For an investment to be worthwhile, the present worth of the cash flows of income minus expenditure should be greater than the investment; taking inflows as positive and outflows as negative, NPV should be positive. Cash flow means money moving in and out of the company's bank account.

Before considering how to integrate the related economic factors into the technical design of ships, the methods of making the economic calculations which can be used to evaluate alternative designs of freight–earning vessels must be taken into consideration. Engineering economics calculations need to take account of performance over long periods.

8.3 Cash Flows

An investment project can be described by the amount and timing of expected costs and benefits over the planning horizon. The terms costs and benefits represent disbursements and receipts, respectively. The term net cash flow is used to denote the receipts less the disbursements that occur at the same point in time. The stream of disbursements and receipts for an investment project over the planning horizon is said to be the cash flow profile of the project.

To facilitate the description of project cash flows, they are classified in two categories: (i) discrete–time cash flows, and (ii) continuous–time cash flows. The discrete–time cash flows are those in which the cash flow occurs at the end of, at the start of, or within discrete time periods. The continuous flows are those in which money flows at a given rate continuously throughout a given time period. The following notation will be adopted:
• Fn = discrete payment occurring at period n;
• Ft = continuous payment occurring at time t.

If Fn < 0, Fn represents a net disbursement (cash outflow). If Fn > 0, Fn represents a netreceipt (cash inflow). The same can be said for Ft.

8.3.1 Cash Flow Profile

Cash flow diagrams are an important convention that engineering economists and designers should use in decision–making. They are simple schematics showing how much money is being spent or earned year by year. In them, the horizontal scale represents future time, generally divided into years. The vertical scale shows annual amounts of cash inflows (upward–pointing arrows) or outflows (downward–pointing arrows). When cash flow estimation is repeated over the project life of a ship, the result is a series of net cash flows. If the sum of the series is positive, the flow is from the ship to the shipping company; if it is negative, the flow is from the shipping company to the ship. This series is often called the cash flow


series for the ship project, and the shipowner decides whether to undertake the project on the basis of the estimated cash flows.

Part of the convention is that the definition of the cash flow series is simplified by assuming that all the cash flows occur on the last day of each year. This assumption simplifies the mathematical formulation. Although in fact cash may change hands almost continuously, any errors that result from this simplifying assumption are likely to be common to all alternatives under study, and so should have little effect on the decision.

To help visualize the amount and timing of cash entering or leaving the organization, the so-called cash flow diagram is frequently used (see Figure 8.2), where time is represented on the horizontal scale, whereas annual cash amounts are shown on the vertical scale.

Figure 8.2. Cash flow vs. time

Zero on the time scale can be arbitrarily selected. It may mean ‘now’, ‘time of decision’, ‘time when the ship goes into service’, etc. Cash flows may be represented by bars or by arrows. Figure 8.3 shows a typical irregular cash flow pattern, in which receipts during a period of time are shown by an upward arrow and disbursements during the period are shown by a downward arrow. The diagrams are drawn from the perspective of a lender or an investor. A borrower, on the other hand, would picture the arrows reversed, but the method of analysis would be exactly the same.

Figure 8.3. Representation of cash flows

Ships have long economic lives, usually at least twenty years. It is therefore justifiable to treat cash flows on an annual basis. For shorter–term studies, briefer time periods can be used, perhaps months. The basic principles and mathematics remain the same.


The basic relationships shown below use the following nomenclature (standard notation of the American Society for Engineering Education), where capital letters are used for absolute values, and lower case for fractional values:

P   present amount, principal, present worth, or present value
A   annual return (e.g. income minus expenditure) or annual repayment (e.g. principal plus interest)
F   future amount
N   number of years (e.g. life of ship or period of loan)
i   interest or discount rate per year, decimal fraction (percentage rate/100)

All of the following basic interest relationships apply to cash flow patterns illustrated below.

8.3.2 Interest Relationships

Single Series

The first basic interest relationship is the single–investment, single–payment pattern shown in Figure 8.4.

Figure 8.4. Cash flow diagram for a single payment

Knowing the initial amount, P, and wanting to find the future amount, F, multiply P by the so–called single payment compound amount factor, usually shortened to compound amount factor. If the time period is but a single year, the future amount, F, equals the initial amount, P, plus the interest due, which is i·P; in short

F = P + i·P = P (1 + i)

If the time period, N, is some integer greater than one, then the balance of the account will have compounded annually as a function of that number of years, leading to the general expression for the total repayment by the end of N periods

F = P (1 + i)^N = (CA − i − N) P

The factor (1 + i)^N is called the single–payment compound amount factor and is available in tables indexed by i and N. It is abbreviated CA and, when associated with a given interest rate and number of years, the combination is indicated by the convention

(CA − i − N) = (1 + i)^N

The reciprocal of the compound amount factor is the single–payment present worth factor. It is often shortened to present worth factor, indicated by the convention (PW − i − N). It is the multiplier to convert a future sum into a present sum. This being the case, the abbreviation PW can be taken to mean present worth or present value. It is also called the discount factor. The terms are used interchangeably.


Reversing the process, if a single future amount F is desired from a deposit at interest i, compounded periodically, the equivalent present value can be found by multiplying the desired amount by the reciprocal of the compound amount factor

P = \frac{F}{(1+i)^N} = (PW - i - N)\, F

The ‘present worth’ of F, which includes accumulated interest, is exactly the same as P, i.e. they are effectively equivalent.
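The two single-payment factors can be coded directly from the definitions above. A minimal Python sketch follows (function names and figures are ours, not from the text): a present sum is carried forward and then discounted back.

```python
def compound_amount_factor(i, N):
    """(CA - i - N): converts a present sum P into the future sum F."""
    return (1 + i) ** N

def present_worth_factor(i, N):
    """(PW - i - N): converts a future sum F into its present worth P."""
    return 1.0 / (1 + i) ** N

# Example: 100 units deposited for 10 years at 8% interest
F = 100 * compound_amount_factor(0.08, 10)    # about 215.9
P = F * present_worth_factor(0.08, 10)        # back to 100
print(round(F, 1), round(P, 1))
```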

Uniform Series

In many economic projections, decision makers assume uniform annual cash flows, even though that uniformity will not really occur. Again, any errors that result from this assumption are likely to be the same for all design alternatives.

The interest relationship applies to a single initial amount, P, balanced against uniform annual amounts, A, as shown in Figure 8.5.

Figure 8.5. Single investment, uniform annual returns

If the uniform annual amounts, A, are known and the decision maker wants to find their present worth, P, he/she can use the expression

P = \frac{(1+i)^N - 1}{i\,(1+i)^N}\, A = (SPW - i - N)\, A

The factor (SPW − i − N) is called the series present worth factor, which is the multiplier to convert a number of regular annual payments into a present sum. It is also called the annuity factor.

This relationship is useful for situations in which the size of future uniform annual returns from an investment can be predicted and the decision maker wants to find out how much he/she can afford to put into that investment.

Note that the series present worth factor is numerically equal to the sum of the individual annual present worth factors over the life of the investment; it is thus very useful for dealing with uniform cash flows, which can be assumed for many marine problems, at least in preliminary evaluations.

Again reversing the approach, suppose one wants to convert a present sum of money into an equivalent amount repaid uniformly over a number of time periods, usually annual. The capital recovery factor, CR, enables an initial capital investment (say, in a ship) to be recovered as


an annual capital charge, which includes both principal and interest. CR is the ratio between this uniform annual amount, A, and the principal, P, i.e. A = CR·P. It can be shown from compound interest relationships and the sum of geometric progressions that

CR = \frac{A}{P} = \frac{i\,(1+i)^N}{(1+i)^N - 1}

When associated with a given interest rate per compounding period, i, and number of compounding periods, N, the capital recovery factor is (CR − i − N).
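As a brief illustration, the uniform-series pair can be coded as follows; this is a minimal Python sketch (figures are invented) showing that the series present worth factor and the capital recovery factor are reciprocals.

```python
def series_present_worth_factor(i, N):
    """(SPW - i - N): converts N equal annual amounts A into a present sum P."""
    return ((1 + i) ** N - 1) / (i * (1 + i) ** N)

def capital_recovery_factor(i, N):
    """(CR - i - N): spreads a present sum P over N equal annual amounts A."""
    return i * (1 + i) ** N / ((1 + i) ** N - 1)

# Example: recovering a hypothetical ship price of 40 million over 20 years at 8%
A = 40e6 * capital_recovery_factor(0.08, 20)      # annual capital charge
P = A * series_present_worth_factor(0.08, 20)     # back to 40 million
print(round(A), round(P))
```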

Uniform Annual Deposits, Single Withdrawal

The third pair of interest relationships applies to the cash flow pattern shown in Figure 8.6, in which annual amounts are matched against a single future amount, F.

A quirk of this pattern is that at the end of the final year there are arrows pointing in opposite directions. This is done to simplify the calculations. Of course, in real life the net amount paid would not be F, but F minus A. Another possibility is that within a business setting the annual amounts would actually comprise continual cash deposits during the year. Nevertheless, one may assume single year–end amounts.

Figure 8.6. Uniform annual deposits, single withdrawal

If the uniform annual amounts, A, are known, and it is desired to find the equivalent single future amount, F, that can be withdrawn, multiply A by the series compound amount factor, SCA

F = (SCA− i−N) A

Conversely, if the analyst wants to build up the future amount, F, and wants to find the corresponding uniform annual amounts to be deposited, A, he/she will multiply that future amount by what is called the sinking fund factor, SF

A = (SF − i−N) F

Of course, the sinking fund factor is the reciprocal of the series compound amount factor

(SF - i - N) = \frac{1}{(SCA - i - N)} = \frac{i}{(1+i)^N - 1}
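The sinking-fund pair can likewise be sketched in a few lines of Python (figures invented, names ours):

```python
def series_compound_amount_factor(i, N):
    """(SCA - i - N): future amount F built up by N equal year-end deposits A."""
    return ((1 + i) ** N - 1) / i

def sinking_fund_factor(i, N):
    """(SF - i - N): annual deposit A needed to accumulate a future amount F."""
    return i / ((1 + i) ** N - 1)

# Example: annual deposit needed to accumulate 10 million in 15 years at 6%
A = 10e6 * sinking_fund_factor(0.06, 15)
F = A * series_compound_amount_factor(0.06, 15)   # back to 10 million
print(round(A), round(F))
```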


Gradient Series

Many engineering economic problems, particularly those related to equipment maintenance, involve cash flows that increase by a fixed amount, g, each year. The gradient factors can be used to convert such gradient series into present amounts and equivalent annual series.

The present worth of such a cash flow can be found with a year–by–year analysis, as shown in Figure 8.7.

Figure 8.7. Gradient series pattern

A more sophisticated way is to first find the equivalent uniform annual amount, A, by means of the following formula

A = A_1 + \frac{g}{i} - \frac{N\,g}{i}\,(SF - i - N)

Consider the series

Fn = (n− 1) g , n = 1,2, . . . ,N

The gradient g can be either positive or negative. If g > 0, the series is called an increasing gradient series. If g < 0, one has a decreasing gradient series. The single–payment present–worth factor can be applied to each term of the series to obtain the expression

P = \sum_{n=1}^{N} \frac{(n-1)\,g}{(1+i)^n}

Alternatively, the present value of the series can be found using the series present worth factor based on the same values of i and N

P = (SPW − i−N) A

If the pattern shows a uniform downward slope, then the equivalent uniform annual amount will be

A = A_1 - \frac{g}{i} + \frac{N\,g}{i}\,(SF - i - N)
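A short numerical check of these gradient relations is easy to sketch in Python (figures invented): the present worth of the pure gradient series F_n = (n − 1) g computed term by term agrees with first converting it to an equivalent uniform annual amount and then applying the series present worth factor.

```python
def sinking_fund_factor(i, N):
    return i / ((1 + i) ** N - 1)

def series_present_worth_factor(i, N):
    return ((1 + i) ** N - 1) / (i * (1 + i) ** N)

def gradient_present_worth_direct(g, i, N):
    """Sum of (n - 1) g discounted year by year."""
    return sum((n - 1) * g / (1 + i) ** n for n in range(1, N + 1))

def gradient_present_worth_via_annuity(g, i, N, A1=0.0):
    """Equivalent uniform amount A = A1 + g/i - (N g / i)(SF - i - N), then SPW."""
    A = A1 + g / i - (N * g / i) * sinking_fund_factor(i, N)
    return A * series_present_worth_factor(i, N)

g, i, N = 100.0, 0.08, 5
print(round(gradient_present_worth_direct(g, i, N), 1))        # about 737.2
print(round(gradient_present_worth_via_annuity(g, i, N), 1))   # same value
```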


Random Series

To find the present worth of an irregular cash flow (Fig. 8.8), the analyst must discount each amount individually to time zero and then find the cumulative present value. This cumulative amount can be converted to an equivalent uniform annual amount using the capital recovery factor.

Figure 8.8. Random series pattern
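The procedure just described for a random series can be sketched in Python as follows (cash flow figures are invented):

```python
def present_worth_of_series(flows, i):
    """flows[n] is the net cash flow at the end of year n+1; discount each to time zero."""
    return sum(F / (1 + i) ** (n + 1) for n, F in enumerate(flows))

def capital_recovery_factor(i, N):
    return i * (1 + i) ** N / ((1 + i) ** N - 1)

flows = [3.0, -1.5, 4.0, 2.5, 5.0]     # irregular yearly amounts
i = 0.10
PW = present_worth_of_series(flows, i)
A = PW * capital_recovery_factor(i, len(flows))   # equivalent uniform annual amount
print(round(PW, 2), round(A, 2))
```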

Stepped Cash Flows

Another common variation involves cash flows that remain uniform for some number of years (or other compounding periods) but then suddenly exhibit a step up or down, or perhaps several such steps. In real life this might come about because of, for example, the peculiarities of the tax laws.

One way to solve this problem would be to analyze the cash flow year–by–year in a table, but there are easier ways. Perhaps, with reference to Figure 8.9, the easiest way is to:

- find the present worth of A2 for N years;

- add the present worth of ∆A for Q years.

Figure 8.9. Stepped patterns

In short

PW = (SPW − i−N)A2 + (SPW − i−Q) ∆A


The analytical technique developed above can be applied to cash flows that involve more than the two levels of income shown, as well as to negative cash flows or combinations of positive and negative flows.
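The stepped-pattern shortcut can be verified numerically with a minimal Python sketch (figures invented): an annual amount A1 over the first Q years that steps down to A2 afterwards is valued as A2 over the full life N plus the increment ∆A = A1 − A2 over Q years, and the result matches a year-by-year sum.

```python
def series_present_worth_factor(i, N):
    return ((1 + i) ** N - 1) / (i * (1 + i) ** N)

def stepped_present_worth(A1, A2, Q, N, i):
    """PW = (SPW - i - N) A2 + (SPW - i - Q) dA, with dA = A1 - A2."""
    dA = A1 - A2
    return (series_present_worth_factor(i, N) * A2
            + series_present_worth_factor(i, Q) * dA)

A1, A2, Q, N, i = 8.0, 5.0, 6, 20, 0.09
direct = sum((A1 if n <= Q else A2) / (1 + i) ** n for n in range(1, N + 1))
print(round(stepped_present_worth(A1, A2, Q, N, i), 3), round(direct, 3))  # equal
```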

8.4 Financial Factors

The overall profitability and the extent of own capital required are affected by loan finance and tax considerations. Clearly the availability of advantageous terms may be an important factor in the choice of a particular ship. However, within one particular design solution, these considerations do not usually affect the order of merit; that is, initial technical decisions on, for example, the type of main engine can often be made without considering detailed fiscal considerations, using only, for example, a discount rate which takes account of typical conditions. Of course, once a project has reached the detailed investigation stage, such aspects will need to be considered explicitly, on a year–by–year basis, examining cash flow projections in more detail.

8.4.1 Taxes

Taxation represents one of the most important aspects of investment projects. To omit it would be to falsify the picture. Not only is tax important as such, but so are the various allowances which may be made against taxable income. Therefore, naval architects involved in the design of a merchant ship should have at least a rough idea of the applicable tax structure. In many cases a proper recognition of the tax law will have a major impact on design decisions. In other cases, taxes can be ignored. In any event, a naval architect should understand enough about the subject to discuss it intelligently with business managers. Tax laws are written by politicians who are swayed by pressures coming from many directions, and they change over time. As a result tax laws are almost always complex, and continually changing. Thus, most large companies employ experts whose careers are devoted to understanding the tax laws and finding ways to minimize their impact. No attempt is made here to explain all the complexity of current tax laws; but some simple tax concepts are outlined and their effects on cash flow explained.

When incomes are known, the impact of the corporate profits tax is usually neutral; that is, analyzing before–tax returns will point to the same optimum as that indicated by after–tax returns. This is fairly obvious from the equation relating the capital recovery factor before and after tax

CR' = CR\,(1 - t) + \frac{t}{N}

where t is the tax rate.

Since the tax rate t and N are presumably the same for every alternative, the maximum values of CR′ and CR are of necessity tied to the same design. This might lead the designer


to conclude that taxes have no influence on technical decisions. That would be true were the level of income independent of taxes. Such is seldom the case, however, because free market conditions make freight rates sensitive to taxes. Shipowners base their prices on the attainment of a reasonable level of profitability. When taxes are raised, prices must also be raised to reflect the added burden. The true impact of the tax is perhaps best illustrated in conceptual design, which involves a proposal for a new ship whose advantages must be weighed against competitive ships. In some cases a ship will have a radically greater first cost but lower operating costs (or more income) than the other. The time–value of money thus becomes supremely important in the comparison. Furthermore, to make the conclusions as general as possible (and in recognition of free market conditions), most conceptual designs use the RFR criterion or something equivalent. The decision–maker must therefore select his/her stipulated yield with great care; he/she must also recognize the effect of the taxes on the yearly revenue required to attain the stipulated yield.

The present tax is basically structured so that the corporation tax is levied at a particular rate on the before–tax cash flow. This tax base is broadly: Income − Operating Expenses − Interest on Loans − Depreciation Allowances.

Cash Flows Before and After Tax

Tax is assessed after the shipping company's annual accounts are made up and is thus paid 1–2 years in arrears of the corresponding cash flows. Annual income can therefore be divided according to the bar diagram illustrated in Figure 8.10. It shows how annual revenues are treated when figuring corporate income taxes. It is assumed here that all factors remain constant over the N years of the design's economic life. This is what economists call a heroic assumption, but it is frequently good enough for design studies.

Figure 8.10. Distribution of annual income

The bar diagram shows that the annual cash flow after tax, A′, is related to the cash flow before tax, A, by the simple expression

A' = A\,(1 - t) + \frac{t \cdot P}{N} \qquad (8.1)

or, turning it around


A = \frac{A' - t \cdot P / N}{1 - t} \qquad (8.2)
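Relations (8.1) and (8.2) are easy to sketch in Python; the figures below are invented and serve only to show that the two relations are inverses of each other.

```python
def after_tax_cash_flow(A, t, P, N):
    """Relation (8.1): A' = A (1 - t) + t P / N."""
    return A * (1 - t) + t * P / N

def before_tax_cash_flow(A_prime, t, P, N):
    """Relation (8.2): A = (A' - t P / N) / (1 - t)."""
    return (A_prime - t * P / N) / (1 - t)

# Illustrative figures: before-tax cash flow 5, tax rate 40%, investment 30, life 20 years
A, t, P, N = 5.0, 0.40, 30.0, 20
A_prime = after_tax_cash_flow(A, t, P, N)
print(round(A_prime, 2), round(before_tax_cash_flow(A_prime, t, P, N), 2))   # 3.6 and 5.0
```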

It is important to note that all rational measures of merit are based on after–tax cash flows, not profits. In short, decision makers should not use profits to measure profitability, but use cash flows instead. Profits are misleading because they are polluted with depreciation, an expense that is misallocated in time.

The return after tax, which includes the depreciation provision, is the shipowner's disposable income, available for repayment of loan principal, dividends, fleet replacement or any other permissible use. Dividends, however, are paid to shareholders without further deduction of tax; allowance is made in the shareholder's own tax liability for the amount already paid under corporation tax (tax credit).

Depreciation allowances are usually based on historic costs (i.e., face–value units) rather than on replacement costs. Thus the standard tax shield for depreciation (t·P/N) must be discounted twice to find its present worth in constant–value terms. If one assumes that the before–tax cash flows, A, will remain uniform in constant–value monetary units, then one must recognize that the after–tax cash flows will drop somewhat over the years. This is due to the diminishing value of the depreciation allowances. The constant–value present worth of the after–tax cash flow can be found as follows

PW = (SPW - r - N) \cdot A\,(1 - t) + (SPW - i - N) \cdot \frac{t \cdot P}{N}

or, by subtracting the investment, one can find the net present value

NPV = (SPW - r - N) \cdot A\,(1 - t) + (SPW - i - N) \cdot \frac{t \cdot P}{N} - P

where

A   annual cash flow before tax (in constant–value monetary units)
t   tax rate
P   initial investment
N   economic life
r   discount rate applied to constant–value amounts (i.e., true time–value of money)
i   discount rate applied to face–value amounts [i = (1 + d)(1 + r) − 1]
d   general rate of inflation
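A minimal Python sketch of this constant-value NPV follows, with invented figures. It assumes, as read here from the "discounted twice" remark, that the fixed face-value depreciation shield t·P/N is discounted at the face-value rate i = (1 + d)(1 + r) − 1, while the constant-value cash flow A(1 − t) is discounted at the true rate r.

```python
def series_present_worth_factor(rate, N):
    return ((1 + rate) ** N - 1) / (rate * (1 + rate) ** N)

def npv_constant_value(A, t, P, N, r, d):
    """Constant-value NPV: cash flow discounted at r, depreciation shield at i."""
    i = (1 + d) * (1 + r) - 1
    return (series_present_worth_factor(r, N) * A * (1 - t)
            + series_present_worth_factor(i, N) * t * P / N
            - P)

# Illustrative figures: A = 6 per year, tax 35%, investment 30, life 20 years,
# true time-value of money 8%, general inflation 4%
print(round(npv_constant_value(6.0, 0.35, 30.0, 20, 0.08, 0.04), 2))
```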

Fast Write–off

So far, the assumption has been made that the ship's tax life coincides with its economic life. This is not always the case, because owners are sometimes permitted to base depreciation on a shorter period. This is called fast write–off and is advantageous to the investor, because it provides a more favorable after–tax cash flow pattern. Over the life of the ship


the same total taxes must be paid, but their worst impact is delayed.

Some countries allow shipowners the freedom to depreciate their ships as fast as they like. In that setting the owner can make the depreciation allocation equal to the cash flow before tax. That will reduce the tax base to zero, and no taxes need be paid during the early years of the ship's life. After that, of course, the depreciation tax shield will be gone, and higher taxes will come. Again, however, the total tax bill over the ship's life will remain the same, unless the ship is sold before the end of its expected life.

More typically, the owner will not be given a free hand in depreciating the ship. Rather, the tax life, that is, the depreciation period, will be set at some period appreciably shorter than the expected economic life. This will result in cash flow projections that feature uniform annual amounts with a step down after the depreciable life is reached.

To handle such a situation, first give separate attention to two distinct time periods. The first of these comprises the years during which depreciation allowances are in effect, the final such year being identified as Q. The second time period follows Q and extends to the final year of the ship's life, designated as N. Assuming straight–line depreciation, the cash flows before tax, A, and after tax, A′, will be related as shown in Figure 8.11.

Figure 8.11. Cash flow for fast write–off

Now, recalling how stepped cash flows were handled above, the present worth of the above can be found as follows

PW = A\,(1 - t) \cdot (SPW - i' - N) + \frac{t \cdot P}{Q} \cdot (SPW - i' - Q)
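A minimal Python sketch (invented figures) compares the after-tax present worth with straight-line depreciation over the full life N against a fast write-off over a shorter tax life Q, following the stepped-cash-flow treatment above; the shorter write-off gives the higher present worth, as the text argues.

```python
def series_present_worth_factor(i, N):
    return ((1 + i) ** N - 1) / (i * (1 + i) ** N)

def pw_after_tax(A, t, P, N, Q, i):
    """Cash flow A(1 - t) over N years plus a depreciation shield t P / Q over Q years."""
    return (A * (1 - t) * series_present_worth_factor(i, N)
            + (t * P / Q) * series_present_worth_factor(i, Q))

A, t, P, N, i = 6.0, 0.40, 30.0, 20, 0.08
print(round(pw_after_tax(A, t, P, N, 20, i), 2))   # depreciation over the full life
print(round(pw_after_tax(A, t, P, N, 8, i), 2))    # fast write-off over 8 years: higher
```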

8.4.2 Leverage

Various ways are examined in which a shipowner may go into debt in order to expand the scope of operations. Note that the interest payments incurred may reduce the tax base, so they must be recognized in assessing after–tax cash flows.

Increasingly complicated loan arrangements can be considered. In fact, there are times when a naval architect will want to apply simple schemes, and there will be times in the design process when he/she will want to apply complex schemes. In general, in the conceptual design stage, when hundreds or thousands of alternatives are under consideration, designers should be satisfied to use the simplest schemes. At the other end of the scale, when the choice has been narrowed down to half a dozen, the naval architect, the shipowner, or the business manager can apply many more realistic assumptions if considered necessary.


In general, the more realistic and complex assumptions will slightly reduce the impact of the income tax. In the early design stages, when assuming simple loan plans, the naval architect may recognize this effect by adding a small increment to the actual tax rate or to the interest rate. The same thought applies to assumptions regarding tax depreciation plans. By using such adjustments, the ‘best possible’ design as indicated by the simple assumptions will closely approach the ‘best possible’ design as indicated by the more realistic and elaborate assumptions.

Many, if not most, business managers have ambitions beyond the reach of their equity capital. This leads them to leverage up their operations by obtaining a loan from a bank. The same is true of individuals who want to own a yacht. It is also often true of governments, which sell bonds so as to finance a share of current expenditures. In nearly every case the lender requires repayment of the loan within a given time at a given interest rate. Typically, the repayments are made in periodic bits and pieces comprising both interest and some reduction of the debt itself. In short, the periodic payments are determined by multiplying the amount of the loan, PB, by the capital recovery factor appropriate to the loan period, H, and the agreed–upon interest rate, iB. The typical repayment period is monthly, but for ship design studies one may generally assume annual repayments, AB; in short

AB = PB (CR− iB −H)
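A minimal Python sketch of this repayment scheme follows; the 80%-for-eight-years-at-5% terms are those quoted a little further on, while the contract price is a hypothetical figure of ours. Each equal annual instalment contains interest on the declining balance plus some repayment of principal, and the balance falls to zero at the end of the loan period.

```python
def capital_recovery_factor(i, N):
    return i * (1 + i) ** N / ((1 + i) ** N - 1)

def loan_schedule(PB, iB, H):
    """Equal annual instalments AB = PB (CR - iB - H), split into interest and principal."""
    AB = PB * capital_recovery_factor(iB, H)
    balance, rows = PB, []
    for year in range(1, H + 1):
        interest = iB * balance
        principal = AB - interest
        balance -= principal
        rows.append((year, round(interest, 2), round(principal, 2), round(balance, 2)))
    return AB, rows

# Hypothetical contract price of 40 million; credit for 80% over eight years at 5%
AB, rows = loan_schedule(0.80 * 40e6, 0.05, 8)
print(round(AB))     # annual repayment
print(rows[-1])      # final year: balance reduced to (approximately) zero
```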

As an alternative to applying to a bank, managers may choose to raise capital by selling bonds. As far as we need be concerned here, the effect is the same: the debt must be repaid at some agreed–upon rate of interest.

To stimulate their shipbuilding industries, many countries throughout the world offer loans for ship purchase, subsidized from central sources at below–market rates of interest. The loans reduce the effective cost of the ship, and encourage owners to place orders. Credit terms officially available are broadly similar in each major country: for ships, typically 80% of the contract price for eight years at 5% interest. For offshore mobile units, typically 85% loans are available but only for five years. Loans for second–hand vessels are usually made on normal commercial terms. Interest payments can be deducted before tax liability is calculated on profits earned during the ship's life.

Various initial and legal fees are charged in addition to loan repayments and interest, usually about 1% of the total loan. Generally the credit is advanced to the owner as building instalments become due, so that interest becomes payable before the ship is delivered unless arrangements are made to defer it. Repayment is usually in equal amounts at six–monthly intervals after delivery, plus interest on the declining balance. Although favorable credit terms are an important marketing factor for shipbuilders (and once contributed to the world over–supply of ships), they do not usually affect the order of merit between technical alternatives.

Figure 8.12 shows that as the credit proportion approaches 100 percent, the IRR on the shipowner's diminished equity capital approaches infinity, but NPV or RFR continue to


give meaningful results. While this might suggest that shipowners should borrow nearly 100% of their capital needs, in fact this is risky, as in adverse market conditions prior charges such as loan servicing would be excessive, which could force the owner into liquidation through insufficient cash flow. An appropriate balance of own capital (e.g. shareholders' or equity funds) and debt (loans or credit) is necessary for financial stability.


Figure 8.12. Effect of borrowed capital on return

Some shipowners like to maintain a debt–equity ratio of about 60:40, buying in their own stock when investment opportunities are too limited to maintain that ratio. Another tempering factor is that the shipowner does not have to pay any tax on that part of his/her gross income that he/she turns over to the bank or bond-holder for interest. This, in effect, cuts his/her cost of borrowing roughly in half.

8.4.3 Practical Cash Flows

Although it is possible to make good use of the uniform cash flow relationships in preliminary calculations and obtain results of about the correct order of magnitude, cash flows in most practical cases of ship investment are not uniform.

The most important reasons for these irregular cash flows are:

- loans for less than the life of the ship;

- differing relative rates of growth in main items of income and expenditure;

- tax allowances for (capital) depreciation and loan interest;

- subsidies.

Other variations occur but, although altering the absolute values in the economic calculations, are unlikely to change significantly the relative values (‘ranking’) between alternative designs, as they tend to affect all designs in a similar manner. The variations would have to be taken into account where the differences in the designs affect one particular factor, e.g. different scrap values between steel, aluminium and GRP hulls. These variations are:

- scrap value;

- irregular pattern of building installments;

- special surveys or major overhauls involving appreciable cost and time out of service;

- general decrease of speed with increasing age;

- long–term charters less than ship’s life.

Although corrections may be applied to the uniform cash flow cases to cater for some of the items quoted, the more general procedure is to make complete year-by-year calculations.


A table is constructed to show, for each year of life, the items of income and expenditure generating a before–tax cash flow. After making allowances for tax, the after–tax cash flows are multiplied by each year’s present worth factor, and summed up to give the discounted cash flow over the ship’s life and a resulting NPV.
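A minimal Python sketch of such a year-by-year table is given below; the revenue, cost and depreciation figures are hypothetical, and the tax treatment is deliberately simplified (straight-line depreciation as the only tax shield).

def npv_year_by_year(P, revenues, expenses, depreciation, t, i):
    # Discount the after-tax cash flow of each year and subtract the investment P.
    npv = -P
    for n, (rev, exp, dep) in enumerate(zip(revenues, expenses, depreciation), start=1):
        before_tax = rev - exp
        tax = t * (before_tax - dep)          # depreciation shields taxable income
        after_tax = before_tax - tax
        npv += after_tax / (1.0 + i) ** n     # present worth factor (PW - i - n)
    return npv

N = 20
print(npv_year_by_year(P=30e6,
                       revenues=[8.0e6] * N,
                       expenses=[3.5e6] * N,
                       depreciation=[30e6 / N] * N,
                       t=0.35, i=0.08))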

Cash Flows for Equal Periods

The bar diagram shown in Figure 8.13 explains the cash flow before and after tax when the bank loan period is assumed to be the same as the ship’s economic life (H = N). Straight–line depreciation is also assumed, with depreciation period equal to economic life (Q = N). A final assumption is that the before–tax cash flow, A, remains constant. For many design studies these assumptions are reasonable.

Figure 8.13. Cash flow for equal periods

In analyzing the cash–flow distribution shown in Figure 8.13 a further simplifying assumption is used, which involves substituting a uniform annual value of the interest payments, I_B, for the actual, ever–diminishing values. Figure 8.14 shows the real distribution between principal and interest payments as well as the simplification of uniform I_B.

Figure 8.14. Distribution between principal and interest payments

Cash Flows for Differing Periods

Shorter Pay–Back Period

Benford (1965) explains how to analyze returns before and after tax when the pay–back period to the bank differs from the economic life of the ship. It is assumed that the debt is repaid in equal annual installments, together with straight-line depreciation and zero scrap value. The expressions relating conditions before and after tax now become

A′ = A (1 − t) + t·P/N + t·I_B

and

CR′ = CR (1 − t) + t/N + t·I_B/P            (8.3)

where I_B is the annual interest paid to the bank.

Further, the residual annual return to the owner, A◦, will be

A◦ = A′ − A_B

where the annual return to the bank is found by means of the appropriate capital recovery factor as

A_B = P_B (CR − i_B − H)

The annual interest paid to the bank will diminish from year to year, but for design purposes one can safely assume that it will be constant and equal to the annual return to the bank minus a uniform annual payback of the initial loan

I_B = A_B − P_B/H

The annual interest paid to the bank can also be written as

I_B = [(CR − i_B − H) − 1/H]·P_B

All Periods Differing

Finally, it is appropriate to analyze cash flow before and after tax when the period of the bank loan, the depreciation period, and the economic life of the ship are all different. Initially it is assumed that the loan period, H, is shorter than the depreciation period, Q, which in turn is shorter than the economic life, N. The cash flow diagram would then contain three segments, as shown in Figure 8.15.

During the loan period (0 − H) the cash flow before and after tax would be as developed before, except that care must be taken to identify the differing time periods H, Q, and N. The cash flow after tax will be

A′ = A (1 − t) + t·I_B + t·P/Q


Figure 8.15. Cash flow for differing time periods

During the residual depreciation period (H − Q) the interest payments would no longer be a factor, so the only tax shield would be the depreciation allocation, i.e.

A′ = A (1 − t) + t·P/Q

During the remaining period (Q − N) there would be no tax shields at all, so

A′ = A (1− t)

Applying the techniques introduced when discussing cash flow diagrams, one can find the present worth of this cash flow as follows

PW = A (1 − t)·(SPW − i − N) + (t·P/Q)·(SPW − i − Q) + t·I_B·(SPW − i − H)

Thus, if there are uniform cash flows before tax and a stepped pattern of cash flows after tax, the analyst can find the present worth of the after–tax cash flows by means of that relatively simple equation.
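The stepped expression above translates directly into code. In the sketch below the loan period H, depreciation period Q, economic life N, and the uniform annual interest I_B are all assumed values chosen only for illustration.

def spw(i, n):
    # series present worth factor (SPW - i - n)
    return (1.0 - (1.0 + i) ** -n) / i

A, P, t, i = 6.0e6, 30.0e6, 0.35, 0.08   # before-tax cash flow, investment, tax rate, discount rate
H, Q, N = 8, 15, 20                      # loan period < depreciation period < economic life
I_B = 1.0e6                              # assumed uniform annual interest paid to the bank

PW = (A * (1 - t) * spw(i, N)            # taxed operating cash flow over the whole life
      + (t * P / Q) * spw(i, Q)          # depreciation tax shield during the depreciation period
      + t * I_B * spw(i, H))             # interest tax shield during the loan period
print(f"Present worth of after-tax cash flows = {PW:,.0f}")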

8.4.4 Depreciation

When a shipping company makes a major investment, it exchanges a large amount of cash for a physical asset of equal value. In its annual report it takes credit for that ship and shows no sudden drop in company net worth. Over the years, however, as the ship becomes less valuable, its contribution to the company’s worth declines; that is, it depreciates.

Depreciation is of special significance when computing income taxes, for it is an accounting expense that reduces taxable income and hence reduces taxes, but it does not represent a cash flow during this accounting period. Instead, the cash flow may have occurred at time 0 when the ship was purchased, or it may be spread over a loan repayment period that is different from the lifetime used for depreciation.

Depreciation is not an actual cost or expenditure of cash, but a bookkeeping transaction used both for tax and for accounting purposes. For accounting purposes, depreciation is used to assess the ‘profit’ available for distribution to shareholders after applying a rate on fixed assets that maintains capital intact in money terms. The calculation of depreciation for tax purposes is nearly always different, and as it affects the actual cash flows and final net income, it is the aspect considered here.

Traditionally, depreciation (or capital) allowances have been calculated either as ‘straight line’ (annual allowance = ship cost/ship life) or ‘declining (or reducing) balance’ (annual allowance = percentage of residual value of ship each year), or other variants which, in effect, write off the initial cost over the expected life of the investment. In many cases, ‘cost’ may be acquisition cost minus expected residual value, e.g. assumed scrap value.

When using the basic interest relationships, e.g. CR, it is not necessary to add any further amounts for depreciation. The use of CR recovers the capital invested over the life of the ship, plus the required rate of return. However, depreciation affects the amount of tax payable by a shipping company. Regularly occurring expenses, such as operating costs, may be deducted in full before tax is levied, but purchase of a ship is treated on a different basis by means of depreciation allowances, strictly called ‘capital allowances’.

Straight–Line Depreciation

In its simplest form, the ship is assumed to lose the same amount of value every year until the end of her economic life. This is called straight–line depreciation (Fig. 8.16).

Figure 8.16. Straight–line depreciation

The straight–line method spreads the amount evenly over the depreciation lifetime of N years. If the initial value or cost is P and the residual or salvage or scrap value is S, the annual depreciation D_n is found as

D_n = (P − S)/N ,   n = 1, 2, . . . , N

In most cases one is justified in ignoring the disposal value. Although it is hard to predict, the scrap value is typically less than 5% of the initial investment; and, being many years off, it has little impact on overall economics.


Declining Balance

The declining–balance method allocates each year a given fraction of the book balance at the end of the previous year. If the declining balance is R% per annum, the N–th year depreciation allowance is 100·R·(1 − R)^{N−1}, where the declining balance rate R is given by

R = 1 − (S/P)^{1/N}

Such a method can be used for accounting purposes, and some countries’ tax authorities use variants of it. For example, the declining balance method was instituted in 1984 for British shipowners for tax purposes. Following a transition period, the system adopted a declining balance rate of 25%. Thus the first year allowance is 25%, the second 18.75%, the third 14.06%, the fourth 10.55%, etc. It thus takes eight years to accumulate to 90%, a typical amount allowing 10% scrap value.
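The two allowance patterns can be tabulated side by side in a short Python sketch; the first cost, scrap value and lifetime below are hypothetical.

P, S, N = 25.0e6, 2.5e6, 20            # first cost, scrap value, depreciation life (assumed)
R = 1.0 - (S / P) ** (1.0 / N)         # declining-balance rate leaving book value S after N years

straight_line = [(P - S) / N] * N      # equal allowance every year

declining, book = [], P
for _ in range(N):
    allowance = R * book               # fixed fraction of the remaining book balance
    declining.append(allowance)
    book -= allowance

print(round(sum(straight_line)), round(sum(declining)))   # both write off P - S in total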

Until 1984 British shipowners were allowed to depreciate their ships for tax purposes at any rate they liked, with 100% first year allowances and ‘free depreciation’. In practice, this meant writing the ship off as fast as profits permitted, i.e. extinguishing all liability for tax until the depreciation allowance had been exhausted. If there were profits from other ships in the fleet, or other activities of the business, it was possible to write off the entire cost of a new ship against tax liability on these other profits in the first year. From then on, tax was paid on the full profit. This could be called the ‘full depreciation’ or ‘full tax’ position. Any unused allowance (e.g., because of insufficient profits) can be carried forward and used in subsequent years.

A more general case for economic studies was to assume that depreciation could only be allowed against the profits of the particular ship being studied. This is equivalent to a newcomer to shipping, so it can be called the ‘new entry’ position. At typical freight rates, it then takes some 6 to 12 years before tax becomes payable; considering the time value of money, this is not worth as much as writing off in one year, but is better than writing off over, say, 20 years. In all cases, tax balancing charges are usually levied if the disposal value of a ship exceeds its written–down value for tax purposes, i.e. tax allowances have been granted on the full cost of the ship, but the disposal income needs to be set against this, so it is potentially taxable.

The 100% allowance system may encourage the leasing of expensive ships, whereby a financial institution such as a bank actually owns the ship. The ship may then be bareboat chartered to a ship operator at a slightly lower rate than would otherwise be possible.

Accelerated depreciation and other complexities

Nearly every maritime country gives special tax treatment to ships, usually including some form of accelerated depreciation. Some tax laws recognize that straight–line depreciation is based on an unrealistic assessment of actual resale values of physical assets. This leads to various depreciation schemes that feature a large allocation during the first year of the asset’s life and diminishing allocations thereafter. These declining amounts may continue over the entire economic life, or they may lead to complete write–off in some shorter period. One may thus find accelerated depreciation combined with fast write–off. In any event, the total taxes over the asset’s life will once more be the same. The primary advantage of such schemes is to offer the company a more favorable earlier distribution of after–tax cash flows.

When dealing with shipowners, naval architects will likely have to talk to accountants who know all the tax rules and want to apply them to the design analysis. Naval architects must of course pay attention to the skills of these people. But they must also realize that they are usually safe in applying massive amounts of simplifying assumptions, at least in the initial design stages.

It should be known that some managers use the simplest sort of analysis in choosing projects and in deciding whether or not to go ahead with them. This is so even though they intend to use every possible tax–reducing trick if the project does indeed come to fruition. This suggests the wisdom of using simple methods, for example straight–line depreciation, in the conceptual design stage when hundreds or thousands of alternatives are under consideration, but then, having narrowed the choice down to half a dozen alternatives, letting the accountants adjust the chosen few to satisfy their needs.

Starting with gross simplifications enables looking ahead to the effect of the more elaborate tax schemes by recognizing that their net effect is to produce some modest increase in present values of future incomes. This may be taken into account by assuming a slightly lower tax rate. Alternatively, future cash flows can be discounted at a slightly lower interest rate.

8.4.5 Inflation

The scope here is to explain how to analyze monetary inflation, particularly how it may influence decision–making in ship design. In general, inflation has a trivial impact on rational design decisions. However, there may be special situations in which inflation should not be overlooked. Shipowners who expect more inflation than that expected by their bankers are likely to favor going into debt, confident of their ability to pay off with the easier money of the future. Offsetting this, however, is the government’s insistence on allowing tax–depreciation credits based only on historic costs rather than constant-value monetary units.

If it can be assumed that a shipowner is free to raise freight rates commensurate with any future inflation in operating costs, then all financial and economic factors will float upward on the same uniform tide. If that occurs, the ‘best possible’ ship based on no inflation will also be the ‘best possible’ ship in which inflation is taken into account.

Inflation needs concern the design team only when it becomes apparent that rates of inflation are not the same for every factor in the economic structure. Long–term cash flows cannot be analyzed without first adjusting each year’s figure according to the purchasing power of the monetary unit relative to any convenient base year.

Three basic approaches can be envisaged for calculating equivalence values in an inflationary environment that allow for the simultaneous consideration of earning power and changes in purchasing power. The three approaches are consistent and, if applied properly, should result in identical solutions. The first approach assumes that cash flow is estimated in terms of actual monetary units, whereas the second uses the concept of constant monetary units. The third approach uses a combination of actual and constant monetary units.

To develop the relationship between actual monetary unit analysis and constant monetary unit analysis, it is appropriate to give a precise definition of several inflation related terms (Thesen and Fabrycky, 1989):

• Actual monetary units represent the out-of-pocket monetary units received or expended at any point in time. Other names for them are current, future, inflated, and nominal monetary units.

• Constant monetary units represent the hypothetical purchasing power of future receipts and disbursements in terms of the purchasing power of monetary units in some base year. It is assumed that the base year, the beginning of the investment, is always time zero unless specified otherwise. Other names are real, deflated, and today’s monetary units.

• Market interest rate, i, represents the opportunity to earn as reflected by the actual rates of interest available in the financial market. The interest rates used previously are actually market interest rates. When the rate of inflation increases, there is a corresponding upward movement in market interest rates. Thus, the market interest rates include the effects of both the earning power and the purchasing power of money. Other names are combined interest rate, minimum attractive rate of return, and inflation–adjusted discount rate.

• Inflation-free interest rate, i′, represents the earning power of money isolated from the effect of inflation. This interest rate is not quoted by financial institutions and other investors and is therefore not generally known to the public. This rate can be computed, however, if the market interest rate and inflation rate are known. Naturally, if there is no inflation in an economy, i and i′ should be identical. Other names are real interest rate, true interest rate, and constant monetary unit interest rate.

• General inflation rate, d, represents the average annual percentage increase in the prices of goods and services. The market interest rate is expected to respond to this general inflation rate.

The problem is then which is the best way to correct a misleading ‘future value’ into a reliable ‘current value’. There are two alternative methods. Both are based on the same principles and, if correctly carried out, should produce the same final outcome and resulting design decision. One way is to prepare a year–by–year table in which all cash flows are entered in current values. The analyst is then in a position to apply standard interest relationships to find the present value or equivalent uniform annual cost of this current–value cash flow in the usual way.


The other approach, as might be guessed, is to start with face-value monetary units and apply a discount rate that has built into it adjustments for both inflation and the time–value of money. This method can be handled by simple algebraic procedures and does not require the time–consuming, error-prone, year–by–year tabular approach described previously. It allows one to find the present worth (corrected for inflation) of a future cash flow that is subject to predictably changing monetary values.

The task, now, is to derive values of i for any given set of assumptions as to the rate of inflation and the time-value of money. Note that i incorporates both the time–value of money and inflation.

One way is to start with the simple case in which some cost factor is floating up right along with the general inflation rate, d. That being the case, although it appears to be increasing in face–value terms, it is really holding steady in real purchasing power. That is, it is always the same in current–value monetary units; so one can ignore inflation and say

i = r

Next, examine the case where a given cost factor remains fixed in face–value monetary units during a period of general inflation. One example might be straight–line depreciation. Another would be a fixed–level charter fee. In any given year

A_FV = A_0

Correcting for inflation

A_CV = A_FV/(1 + d)^N = A_0/(1 + d)^N

and correcting to present worth

PW = A_CV/(1 + r)^N = A_0/[(1 + r)^N·(1 + d)^N]

That is, double discounting is employed, once for the time-value of money, and again for the declining real value of the monetary unit. In short, where costs remain fixed in face–value terms one may use

i = (1 + r)·(1 + d)− 1

Finally, consider the case of a cost factor that changes at an annual rate, x, that differs from general inflation. In face–value terms

A_FV = A_0 (1 + x)^N

Correcting for inflation and discounting to present worth, as before, gives PW = A_0 (1 + x)^N/[(1 + r)^N·(1 + d)^N], so that the combined discount rate to apply to the base-year amount is

i = (1 + r)·(1 + d)/(1 + x) − 1

This final expression may, in extreme cases, produce a negative interest rate (equivalent to paying the bank to guard cash). This will lead to a present worth exceeding the future amount.
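The three cases can be compared numerically in a short sketch; the rates below (real rate r, general inflation d, item escalation x) are assumed for illustration, and the combined-rate formula for the escalating item follows the same double-discounting argument outlined above.

r, d, x, N = 0.06, 0.04, 0.02, 10      # assumed real rate, inflation, escalation, years
A0 = 1.0e6                             # base-year amount

i_tracking = r                                  # cost floating up with general inflation
i_fixed    = (1 + r) * (1 + d) - 1              # cost fixed in face-value terms
i_escal    = (1 + r) * (1 + d) / (1 + x) - 1    # cost escalating at rate x

for label, i in [("tracking", i_tracking), ("fixed", i_fixed), ("escalating", i_escal)]:
    print(label, round(A0 / (1 + i) ** N))      # present worth of the year-N amount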


Non-Annual Compounding

In most ship design studies engineers usually assume annual compounding when weighting the time–value of money. There may be instances, however, when other compounding periods should be recognized. It may be recalled that the standard interest formulas are applicable to any combination of compounding periods and interest rate per compounding period.

Clearly, when changing the frequency of compounding the analyst also changes the weight given to the time–value of money. In order to make a valid comparison between debts involving differing compounding periods, the analyst needs an algebraic tool that will assign to each repayment plan a measure that is independent of the frequency of compounding.

The usual approach to this operation is based on what is generally called the effective interest rate, r_1. This is an artificial interest rate per year that ascribes the same time–value to money as some nominal annual rate, r_M, with M compounding periods per year.

For example, suppose one loan plan is based on quarterly compounding at one interest rate, whereas another is based on monthly compounding at a somewhat lower rate. It is not possible to tell, by looking at the numbers, which is more desirable. If both nominal annual rates are converted to their corresponding effective rates, however, those values will tell which is the better deal. To convert from a nominal annual rate, r_M, to the effective rate, r_1, the following equation is used:

r_1 = (1 + r_M/M)^M − 1
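The comparison described above can be made explicit in a few lines of Python; the two nominal rates are hypothetical.

def effective_rate(r_nominal, m):
    # effective annual rate r1 for a nominal rate compounded m times per year
    return (1.0 + r_nominal / m) ** m - 1.0

quarterly = effective_rate(0.080, 4)    # 8.0% nominal, quarterly compounding
monthly   = effective_rate(0.078, 12)   # 7.8% nominal, monthly compounding
print(quarterly, monthly)               # the lower effective rate is the better deal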

8.4.6 Escalation Rate

Escalation rate represents a specific inflation rate sometimes applicable in contracts. It is quite understandable how increasing costs reduce the profitability of an investment. During the years of industrial expansion of the post–war period, freight rates for a given ship and cargo have generally followed a broad escalation, as nearly every cost item rose with inflation and with oil prices, particularly in the 1970s, although the underlying trend has often been obscured by market fluctuations, increasing ships’ efficiency and reductions arising from the economies of scale as larger ships have been introduced.

Voyage charter rates do not include escalation clauses, nor do the majority of time-charter rates, which cover short and medium periods, i.e. they remain fixed for the duration of the charter. However, escalation clauses covering increases in certain operating costs are sometimes included in the few long–term charters. Liner conference freight rates have been adjusted regularly over the years as elements of running costs have increased, particularly bunker costs.

In the majority of economic studies concerned with actual ships, it is suggested that money terms are used throughout (i.e. the actual cash amounts moving through the company’s bank account, including escalation), as this is the form usually used by shipowners in evaluating projects, whose cash flows from charter income and loan repayments are expressed in money terms. Use of money terms also forces attention on differential escalation rates (if all costs and income rose at equal rates, it would be easy to work in real terms), on second–hand values (ships are often sold long before the end of their physical life), and on likely rates of return, both before and after tax. It also makes hindcasting easier, i.e. checking on the results of previous evaluations. General forecasts of inflation, plus analysis of past data, can be used to assist in estimating escalation rates.

It is equally possible to work in real terms, i.e. in money of constant purchasing power, but adjustments need to be made when some costs may be quoted in money terms, e.g. progress payments when building ships, while others may be estimated in real terms, e.g. crew costs.

8.5 Economic Criteria

So far, this chapter has dealt with the basic principles of engineering economics. It has shown how to assess the relative values of cash exchanges that occur at different times, and how to analyze the impact of taxes and interest payments on cash flows. Now comes the critical question of how to apply all of the foregoing to decision making in ship design.

It should be stressed that there is no universally accepted technique for weighting the relative merits of alternative designs. Business managers, for example, may agree that the aim in designing a merchant ship should be to maximize its profitability as an investment. But they may fail to agree on how to measure profitability. Likewise, officials who are responsible for designing non–commercial vessels, such as for military or service functions, have a hard time agreeing on how to go about deciding between alternative designs. The truth of the matter is that there are good arguments in favor of each of several economic measures of merit, and the designer should understand how to handle each of them.

8.5.1 Set of Economic Criteria

The most widely used techniques are those associated with discounted cash flows. The magnitude and timing of cash payments in and out are estimated over the vessel’s life for acquisition costs, operating costs, and revenue generating potential.

The time value of money is recognized by discounting future cash flows at the operator’s cost of capital, i.e. multiplying each year’s net cash flow by (1 + i)^{−N}, where N is the number of years from project start, and i is the discount (or interest) rate as a decimal fraction. Cash flow calculations may be made in ‘money terms’ (the actual amount in money of the day) or in ‘real terms’ (money of constant purchasing power). In the former case inflation has to be allowed for; in the latter case the interest rate will be in real terms, which is approximately the rate in money terms minus the rate of inflation.


Table 8.2 identifies thirteen measures of merit, each based on sound economic principles. Each is of potential value in marine design, and several have strong supporters. They are placed in three categories depending on whether the analyst wants to assign, versus derive, a level of income and assign, versus derive, an interest rate.

There are only three primary economic criteria: the other ten are each closely related to one of those three. Here the discussion is confined to the four most important measures of merit. These are the three primary criteria shown in the middle column of Table 8.2 (net present value, yield, and average annual cost) plus required freight rate. All assume uniform annual costs and revenues, although levels may vary between alternatives. The last is not stated explicitly in Table 8.2; but in structuring a problem to find its minimum value, it is generally implied that the present worths of income and expenditure are equal so that their ‘net present value’ is zero.

Required Assumptions            Primary Measure    Surrogates or
Revenue     Interest Rate       of Merit           Derivatives

yes         yes                 NPV                NPVI, AAB, AABI
yes         no                  IRR                CR, CR′, PBP
no          yes                 AAC                LCC, CC, RFR, ECT

Table 8.2. Three major categories of economic criteria

The numerical results will be different in each case depending on what criterion is being calculated, but if used to compare alternative ship designs, all would indicate the same optimal design if data are consistent, e.g. rates of return are commensurate with freight rates.

Marine literature contains many studies based on questionable logic. Perhaps the most common variety tries to minimize the unit cost of service. That is, someone looks for the alternative that minimizes the cost to the shipowner. This is technically called the fully distributed cost. It is something like the required freight rate, but ignores corporate income taxes and applies a rock–bottom interest rate to total capital. By ignoring taxes and minimizing the time–value of money, this criterion is almost always misleading.

8.5.2 Definition of the Economic Criteria

The most popular economic criteria for marine problems, which are the most rational measures of economic merit to optimize the design from the shipowner’s point of view, can first be considered under a set of simplifying assumptions:

- all annual incomes and expenses remain uniform in constant–value terms;

- the investment is made in a single lump-sum payment upon delivery of the ship;

- no bank loans or tax credits are involved;

- the tax life equals the economic life;

- straight–line depreciation is used in figuring tax;


- the scrap value is zero.

Most initial ship design economic studies will probably not be afflicted with complex cash flow patterns, but will rather consist of a single investment, at year zero, and uniform after–tax returns. In the sequel the different economic criteria are illustrated.

Net Present Value

The net present value (NPV) criterion is by far the most popular and easily understood of all the economic measures of merit in use today among business managers.

It requires an estimate of future revenues and it assigns an interest rate for discounting future, usually after–tax, cash flows. The discount rate is usually taken as the minimum rate of return acceptable to the decision maker. As implied by its name, NPV is simply the present value of the projected cash flow including the investments.

If the building cost of a ship is known, together with the minimum required rate of return on the capital invested (discount rate), all the annual operating costs, the cargo quantity transported each year and the corresponding freight rate (i.e. annual revenues), one can calculate the present worth of each item of income and expenditure and add them to find NPV.

The general form of NPV for freight earning ships is defined by the difference between the present value of cash receipts over the project life and the present value of all cash expenses

NPV = Σ_{n=0}^{N} [ PW_{annual payload quantity × freight rate} − PW_{shipbuilding cost} − PW_{annual operating costs} ]

In the simple cash–flow pattern shown in Figure 8.5, if the cash flows after tax have a uniform level, A′, over the ship’s life, and P represents a single lump investment, the net present value is found by subtracting the investment from the present value of the future cash flows; in short

NPV = A′ (SPW − i′ −N)− P

where

A′ : uniform annual after–tax cash flow = A (1 − t) + t·P/N
A : uniform annual cash flow before tax
t : corporate income tax rate
(SPW − i′ − N) : series present worth factor for an owner’s stipulated minimum acceptable after–tax rate of return i′ and a period of N years

If cash flows are not uniform, the present worth of each annual cash flow after tax can be calculated for each of the N years of the ship’s life.

The net present value may be regarded as an instantaneous capital gain if positive (or loss, if negative), or as a discounted profit, or the sum for which the total project could be sold at its start. Consequently designs with the highest NPVs are sought. Of course, when the NPV is negative, the project would be rejected.

The NPV economic criterion has two inherent weaknesses: it tends to favor massive investments and it can be misleading if alternatives have different lives. The first weakness may be overcome by using the net present value per monetary unit of investment. This normalized quantity is called the net present value index (NPVI) or profitability index (Benford, 1981)

NPVI = NPV/P

If alternatives have different lives, NPV tends to favor the longer lived. That distortion can be eliminated by multiplying each NPV by a capital recovery factor based on the same discount rate, but appropriate to the individual life expectancies. This criterion is called the average annual benefit

AAB = (CR − i′ − N)·NPV

If one takes the AAB per monetary unit invested, that will eliminate both weaknesses in the use of NPV. This third variation of NPV is called the average annual benefit index

AABI = AAB/P
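Under the simplifying assumptions of this section (single investment P, uniform after-tax cash flow A′), NPV and its derived criteria reduce to a few lines of code; all the figures below are hypothetical.

def spw(i, n):
    return (1.0 - (1.0 + i) ** -n) / i          # series present worth factor

def cr(i, n):
    return 1.0 / spw(i, n)                      # capital recovery factor

P, A_prime, i_min, N = 30.0e6, 4.5e6, 0.09, 20  # investment, after-tax cash flow, discount rate, life

NPV  = A_prime * spw(i_min, N) - P
NPVI = NPV / P                                  # net present value index
AAB  = cr(i_min, N) * NPV                       # average annual benefit
AABI = AAB / P                                  # average annual benefit index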

Internal Rate of Return

It is important to notice that NPV is found by discounting future cash flows at the decision maker’s minimum acceptable interest rate. Because the predicted value of an acceptable project must always be positive, the actual expected interest rate will be something higher than the minimum rate used in the computations. Instead of applying that minimum acceptable rate, the decision maker could look at the expected cash flow pattern and derive the interest rate implied.

There is some interest rate that will make the NPV of a cash flow equal to zero. It is the internal rate of return (IRR), which is another time–discounted measure of investment worth. It is feasible to use IRR in cases where the freight rate or income is known. Designs are preferred that offer the highest IRR, which also goes by various names, including discounted cash flow rate of return (DCF), yield, equivalent interest rate of return, profitability index, marginal efficiency of capital, equivalent return on investment, and others.

It is derived iteratively, or using a ‘goal seek’ function, by finding the interest rate that will make the present worth of the future after-tax cash flows equal to the present worth of the investment. In short, IRR is that interest rate that leads to an NPV of zero. In other words, it is the maximum rate of interest at which the shipowner could finance his/her ship on normal bank overdraft terms or any equivalent (i.e. where interest is charged only on the outstanding balance of the loan) in the absence of uncertainty.


In simple patterns, however, the yield can be easily found. First find the expected after–tax capital recovery factor (i.e., the ratio of A′ to P), then go to the interest tables and find the interest rate that corresponds to that combination of CR′ and the ship’s economic life, N. There are also various extensions to the basic method to cater for special situations.
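For the simple pattern of a single investment followed by uniform after-tax cash flows, the iterative search for IRR can be carried out by bisection, as in the hypothetical sketch below.

def npv(rate, P, A, N):
    # NPV of an investment P followed by N uniform annual cash flows A
    return sum(A / (1.0 + rate) ** n for n in range(1, N + 1)) - P

def irr(P, A, N, lo=1e-6, hi=1.0):
    # bisection: NPV decreases as the discount rate increases
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if npv(mid, P, A, N) > 0.0:
            lo = mid        # still profitable, the rate can be pushed higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(irr(P=30.0e6, A=4.5e6, N=20))   # yield implied by the assumed cash flows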

IRR avoids the shortcomings of NPV in that it does not give unfair advantage to larger investments or those with longer lives. Nevertheless, some advocates of NPV point to cases where IRR may be misleading. This is particularly true where the attainable yield differs markedly from the company’s actual value of money. Another of its shortcomings may show up if the analyst is faced with a cash flow pattern that shows a year–by–year mix of money coming in or out. That being the case, it may turn out that there is more than one interest rate that will bring the net present value down to zero. Fortunately, most ship economic studies involve simple cash–flow patterns in which that dilemma does not arise.

When revenues are predictable, decision makers should take advantage of that knowledge to estimate the equivalent interest rate of return. That frees them of the necessity to stipulate an interest rate, which is a tricky question. Nevertheless, the yield must be compared with some interest rate. The naval architect has to compare the estimated yield for every alternative design and select the highest. Whether the figure is high enough to justify the risk of investment is strictly the manager’s concern.

When properly applied, IRR will usually produce exactly the same answers as the net present value method; and it requires exactly the same information. The choice between the two methods is, therefore, largely a matter of personal preference.

If all design alternatives have equal lives, then an examination of the relationship between capital recovery factor and interest rate will show that the alternative with the highest value of CR′ will automatically have the highest yield. Moreover, the alternative with the highest capital recovery factor before tax, CR, will normally enjoy the highest capital recovery factor after tax, CR′. This means that CR may be a surrogate for the yield.

A surrogate for IRR is the pay–back period, PBP, the number of years required to regain the initial investment:

PBP = P/A′

As is evident, PBP is the reciprocal of the capital recovery factor after tax, CR′. As such, it shares both the strengths and weaknesses of that criterion.

Average Annual Cost

The next economic criterion is useful in designing ships that are not expected to generate income: naval vessels, patrol vessels, dredgers, yachts, etc. Now the cash flow pattern will feature only money flowing out. When that is the case, a logical and popular measure of merit is the so–called average annual cost (AAC) criterion. The AAC measure of merit may be applied also to merchant ship designs where all alternatives would happen to have equal incomes, which includes the possibility of that being zero.

Whereas in using NPV or IRR the analyst seeks the alternative promising the highest values, in using AAC the lowest values are desired.

The simplest case would have a single initial investment, P, at time zero, and uniform annual operating expenses, Y, for N years thereafter. The problem reduces to finding, among a set of alternatives, the design solution with the lowest value of AAC, which would be found by converting the initial investment, P, to a uniform annual amount, which would be added to the annual operating costs, Y

AAC = CR·P + Y = (CR− i−N)·P + Y


where

Y : annual operating costs
CR : capital recovery factor corresponding to the life of the investment, N, and the owner’s stipulated before-tax interest rate of return, i.

The term (CR − i − N)·P is called the annual cost of capital recovery, ACCR. Note that it is based on before–tax interest rates in case revenues and taxes are both involved.

The interest rate should be some logical measure of the decision maker’s time-value of money. In the case of a government–owned ship it might reflect the current rate of interest paid on government bonds. For more complex cash flows, simply discount everything back to year zero (including P), then multiply the total figure by the capital recovery factor based on the same interest rate for a number of years equal to the ship’s life span. That will produce the average annual cost.
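A sketch of that more general procedure: discount every cost to year zero and spread the total with the capital recovery factor. The cost stream, the overhaul and the rates below are assumed values.

def capital_recovery(i, n):
    return i / (1.0 - (1.0 + i) ** -n)

i, N = 0.07, 25
P = 40.0e6                                 # investment at year zero
annual_costs = [3.0e6] * N
annual_costs[9] += 2.0e6                   # e.g. a special survey in year 10

PW_costs = P + sum(c / (1.0 + i) ** (n + 1) for n, c in enumerate(annual_costs))
AAC = capital_recovery(i, N) * PW_costs    # average annual cost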

Required Freight Rate

If two competitive designs promise the same average annual cost, but one promises to be more productive than the other, this difference is quantified by relating the AAC to productivity. In the case of merchant ships this is done by dividing the average annual cost by the annual transport capacity. This gives the required freight rate (RFR), which economists call a ‘shadow price’. It represents the necessary minimum income per unit of capacity (e.g. passengers or cargo) to cover all operating costs while providing the required rate of return on capital invested in the ship. The same concept could be applied to other measures of productivity such as cars per year and/or passengers per year for a ro–ro vessel, tons of fish per year for a trawler, and so forth.

If the acquisition cost of a ship, P, the required rate of return, i, all the operating costs, Y, and the annual cargo quantity transported, C, are known, the level of freight rate can be found which produces equal present values of income and expenditure, i.e. zero NPV. This criterion is more suitable when revenues are unknown but will vary between alternatives because of differences in transport capability. In general

RFR = AAC/C = (ACCR + Y)/C = (CR·P + Y)/C

where the annual cargo capacity can be expressed in any convenient unit, and Y = Y_r + Y_v, Y_r and Y_v being the annual running costs and the voyage costs, respectively.

So RFR can be regarded as a calculated freighting cost, which can then be compared with the actual freighting price, i.e. market freight rates. For service vessels (e.g. offshore vehicles like crane barges, pipe–laying ships, etc.), RFR may be calculated in the form of a necessary daily hire rate.
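Using the relationship RFR = (CR·P + Y)/C, a hypothetical calculation might look as follows; all the figures are assumed.

def capital_recovery(i, n):
    return i / (1.0 - (1.0 + i) ** -n)

P = 35.0e6       # acquisition cost
i = 0.10         # required before-tax rate of return
N = 20           # economic life, years
Y = 4.0e6        # annual running plus voyage costs, Y = Yr + Yv
C = 600_000      # annual cargo transported, tonnes

ACCR = capital_recovery(i, N) * P      # annual cost of capital recovery
RFR = (ACCR + Y) / C                   # required freight rate per tonne of cargo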

In other terms, it is the rate the shipowner must charge the customer if he/she is to earn a reasonable return on investment. The theory is that the owner who can enter a given trade route with a ship offering the lowest RFR will best be able to compete.

This is a valuable criterion, much used in the marine industry for studying the feasibility of new ship concepts or optimizing the details of any particular concept. It provides the money amount the shipowner must charge a customer if the shipowner wants to earn a reasonable after–tax return on his/her investment. The theory behind RFR is that the best ship for any given service is the one that provides that service at minimum cost to the customer. Implicit here are the assumptions that (i) free market forces predominate in the trade, and (ii) all competitors operate within the same frame of capital and operating costs.

A key step in finding RFR is to convert the initial investment to an equivalent uniform annual negative cash flow before tax. These annual amounts must be large enough to pay the income tax, and return the original investment to the owner at the specified level of interest. In short, a suitable value for the capital recovery factor before tax must be found.

To show the truth of the above assertion, recall the basic relationship (8.2) between cash flows before and after tax

A′ = A (1 − t) + t·P/N

To make this non–dimensional, divide through by the initial investment P

A′/P = A (1 − t)/P + t/N

But A′/P = CR′ and A/P = CR

which leads to

CR′ = CR (1 − t) + t/N

Then solving for CR

CR = (CR′ − t/N)/(1 − t)

Since t and N are the same for all design alternatives, CR will vary directly with CR′ which will, in turn, vary with the yield, i′. This is a simple way of converting an after–tax interest rate to a before–tax capital recovery factor. It assumes an all–equity investment and a tax depreciation period equal to the ship’s economic life.
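The conversion from a stipulated after-tax yield i′ to the before-tax capital recovery factor can be written directly; the yield, tax rate and life below are assumptions.

def capital_recovery(i, n):
    return i / (1.0 - (1.0 + i) ** -n)

i_prime, t, N = 0.09, 0.35, 20                  # after-tax yield, tax rate, economic life

CR_after  = capital_recovery(i_prime, N)        # CR' corresponding to the yield
CR_before = (CR_after - t / N) / (1.0 - t)      # CR = (CR' - t/N) / (1 - t)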

For non-uniform cash flows, an initial freight rate has to be assumed so that an initial NPV can be calculated as above. This NPV is unlikely to be zero, so an iterative procedure has to be used to find the exact freight rate which gives zero NPV.

If the ship is in a fixed, single–cargo trade, then the annual transport capacity, C, is fairly easy to estimate. If the ship is in, say, a round–the–world voyage with many ports of call and many classes of cargo, RFR may prove too cumbersome to be practical.


Payback Period

A popular rule-of-thumb for evaluating projects is to determine the number of periods needed to recover the original investment. The payback period (PBP) is defined as the number of periods it will take to recover the initial investment outlay.

Assuming uniform annual returns, the payback period is given as

PBP = P/A′

This is the reciprocal of CR′ and so incorporates all that criterion’s strengths and weaknesses. Obviously, the most serious deficiencies of the payback period are that it fails to consider the time value of money and that it fails to consider the consequences of the investment after the payback period.

As a modification of the conventional payback period, one may incorporate the time value of money. The method is to determine the length of time required for the project’s equivalent receipts to exceed the equivalent capital outlays.

Mathematically, the discounted payback period Q is the smallest n that satisfies the expression

Σ_{n=0}^{Q} F_n/(1 + i)^n ≥ 0            (8.4)

Clearly, the payback period analysis is simple to apply and, in some cases, may give answers approximately equivalent to those provided by more sophisticated methods. Many authors have tried to show an equivalence between the payback period and other criteria, such as IRR, under special circumstances. For example, the payback period may be interpreted as an indirect, though quick, measure of merit. With a uniform stream of receipts, the reciprocal of the payback period is the IRR for a project of infinite life and is a good approximation to this rate for a long–lived project.
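Expression (8.4) gives the discounted payback period directly once the cash flow stream is written out; the cash flows and discount rate below are hypothetical, with F[0] taken as the (negative) investment outlay.

def discounted_payback(F, i):
    # smallest n for which the cumulative discounted cash flow becomes non-negative
    cumulative = 0.0
    for n, f in enumerate(F):
        cumulative += f / (1.0 + i) ** n
        if cumulative >= 0.0:
            return n
    return None                        # never pays back within the horizon considered

F = [-30.0e6] + [4.5e6] * 20
print(discounted_payback(F, 0.08))     # discounted payback period Q in years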

There are many reasons why the payback period measure is so popular in business. One reason is that the payback period can function like many other rules-of-thumb to shortcut the process of generating information and then evaluating it. Payback reduces the information search by focusing on the time when the firm expects to ‘be made whole again’. Hence, it allows the decision maker to judge whether the life of the project past the break-even point is sufficient to make the undertaking worthwhile.

In summary, the payback period gives some measure of the rate at which a project will recover its initial outlay. This information is not available from either the NPV or the IRR. The payback period should not be used as a direct figure of merit, but rather as a constraint: no project may be accepted unless its payback period is shorter than some specified period of time.


8.5.3 Choice of the Economic Criteria in the Marine Field

The following are just general recommendations. When revenue is either unknown or zero, assume a reasonable interest rate and convert all costs to discounted cash flows. When revenue is predictable, use the equivalent yield as the economic criterion. If revenues are the same for all alternatives, seek the one with the lowest average annual cost. If transport capabilities vary, divide the average annual cost by the annual tons of cargo (number of units, number of passengers) moved. This gives the freight rate required to return the stipulated yield; and the best ship for the trade is the one with the lowest required freight rate.

In finding the average annual cost or required freight rate, one must not overlook the corporate profit tax. Where revenues are predictable, engineers need worry less about the tax. Under most normal circumstances the ‘best possible’ ship before tax will also be the optimum ship after tax, assuming all alternative designs charge the same freight rate.

The NPV economic criterion is widely used, especially where investment funds are limited, but it is best used in those cases in which income can be predicted reasonably confidently, e.g. long-term time–charters. It has the computational merit of being a single calculation not requiring an iterative solution. A drawback to its use is interpretation of the results. The differences between investments are absolute, not relative, and this can make comparison of widely different alternatives difficult. This may be partially overcome by the net present value index (NPVI) introduced by Benford (1970), which can be used to compare investments differing greatly in absolute size, e.g. coastal tankers versus very large crude oil carriers.

Alternatively a profitability index may be calculated as the ratio between the NPV of cash inflows and the NPV of cash outflows. There still remains the problem of comparison when NPVs are close to zero or negative, and of forecasting income in a fluctuating business like shipping. NPVI used as a measure of merit is analogous to IRR, since it is effectively a ‘profit’ divided by the first cost.

The RFR measure of merit is useful in the many cases where incomes are unknown. In an internationally competitive business like shipping, rates of return oscillate about a long-term trend, and over a ship’s life it is not unreasonable to expect that freight rates will provide a return on an efficient ship tending to the average trend. If this did not occur, shipowners would not reinvest in new tonnage, demand would ultimately exceed supply and produce its own correction in the form of higher freight rates, unless there is too much non-commercially run tonnage available (e.g. state supported fleets). Freight rates do not remain permanently in peaks or troughs. RFR is particularly useful when comparing alternative ship sizes, as a single freight rate cannot be expected to apply to all sizes - the market ensures that economies of scale are eventually passed on to the consumer. RFR can be compared with predicted market rates to see if the results appear realistic. Low discount rates may lead to over-design, e.g. ships faster than is ‘economic’, since capital cost is being assessed more ‘cheaply’ than operating costs. High discount rates may result in required freight rates so high as to be unattainable under normal market conditions, so the design is likely to be uncompetitive in the sense of being unable to find business. RFR has different units according to the type of vessel and duty, e.g.


passenger ships        cost per passenger-mile
ro-ro ships            cost per vehicle-mile
cargo carrier ships    cost per tonne-mile
container ships        cost per TEU-mile
patrol craft           cost per day

The AAC concept is analogous to RFR for comparing alternatives which have equal annual transport capability, or equal annual performance capability for those vessels which do not generate an income. In using AAC, all costs are discounted to year zero, to give a present value of costs which can be converted to an equivalent annual amount via the capital recovery factor. This criterion can also be used for items of equipment which do not affect a ship’s earning potential.

The IRR economic criterion gives a more recognizable comparison between widely different alternatives, especially where funds available for investment are relatively unrestricted. It is a useful method for additional pieces of equipment, especially those not significantly affecting a ship’s income, where it can be measured against some target rate of return for the degree of risk involved. Like NPV, there is the problem of forecasting income, but in addition, IRR is not related to the absolute amount of the investment. IRR is, however, not the same as the profit on historic capital shown in a company’s accounts, but is more like the rate of return on a fixed interest rate investment like a government stock. In general, the design with maximum CR will be that with the highest IRR, if lives are equal. In theory, there will be multiple solutions to the calculation of IRR where cash flows alternate in sign, but this is not often a problem in marine work (Sloggett, 1984).

The incremental rate of return is a variant that calculates the IRR on an additional investment, e.g. an extra piece of equipment on a ship, or on the difference between two projects’ cash flows, to show whether the rate of return on this ‘incremental’ investment is at least as high as that on the basic ship. In this case, only the cash flows and extra first cost associated with the ‘increment’ are used in calculating the rate of return, so simplifying the appraisal, as δA′/δP → CR′ → i′.

Permissible cost can be used when assessing newbuilding prospects or the purchase of second-hand ships, comparing this price against current ship prices and expected freight rates. It can also be used to assess new items of machinery or equipment, whose operational costs and savings can be estimated.

Figure 8.17 shows the normal circumstances under which one of the criteria may be selected for ships, according to the amount of information known. The designer’s task is primarily that of selecting the best alternative, leaving to management the problem of whether to invest at all and, if so, when. In the marine field, where it is not always possible to predict income over the life of a ship, the preference should be for required freight rate as the most useful long–run economic criterion in establishing the most economic vessel design. In the case of closely competing alternatives, a range of assumed freight rates may then be taken, so that NPVs and IRRs can be calculated to see whether the order of merit of the alternative designs indicated by RFR is changed. Where equipment, rather than the entire ship, is being considered, income may take the form of cost savings, and IRR is a useful criterion, especially where ship performance (speed, payload, port time, etc.) is not significantly affected.

Figure 8.17. Decision chart for selecting the economic criterion

The criterion of payback period, PBP, is still sometimes used in industry. This is the number of years it takes the net revenue (income − expenditure) to accumulate to the level where it equals (‘pays back’) the investment. While the payback period for uniform cash flows, P/A, is numerically equal to the series present worth factor SPW, the corresponding value of i should still be calculated for the appropriate N. A variant calculates the number of years before the discounted net revenue equals the investment. This is analogous to rate of return, but solving for N instead of i. Payback period should not be used for non-uniform cash flows, as all variation in income and expenditure for years beyond the payback period is completely ignored, taking little account of cost escalation or change in performance with time. Its use as a primary criterion is therefore not recommended, but it can be presented as a supplementary result or a simple shorthand for results derived more rigorously, especially if the result is attractively small.

Even if non–economic factors are the primary reason for purchasing a ship in the first place, e.g. national prestige, technical and economic criteria still have their place in assisting the selection of the best of the alternative ship designs, machinery and equipment.


8.6 Ship Costs

The cost of any ship is a function of different kinds of variables: technical, physical, managerial, political. Its complete estimation calls for professional guidance from a range of disciplines, some of which are quite remote from naval architecture: accountancy, planning and production control, trade union agreements, shipyard management, insurance and many others. But at the initial design stages naval architects aim at nothing more than first approximations of ship costs, which can be obtained fairly quickly, but which are nevertheless related, to some extent, to the physical features of the ships under consideration. The reasons for costing from the conceptual design stage onwards are to get an idea of the capital investment involved and to see how the cost might be affected by altering any of the principal variables.

Life–cycle costs, including both building and operating costs, are among the most important parameters influencing the choice between competing vessels. Cost is concerned with how much money the shipbuilder will pay for shipyard labor to build the ship, for subcontractors to assist, for all materials and equipment contained in the completed vessel, and for miscellaneous services and establishment charges. That is why cost estimates with a good level of accuracy are desirable from the initial stages of the design process. This may not be easy, as the costing data necessary for the calculations are usually not readily available. Indeed, shipowners and shipbuilders are characteristically reluctant to share cost information with one another.

The design team is generally concerned with evaluating alternative solutions in conceptual design. The alternatives will usually differ not only in performance, but also in their first costs and operating costs. It is, therefore, useful to obtain quickly an indication of relative costs, before developing more detailed studies which may involve work by other organizations. Cost estimates may be broadly divided into three main categories:

1. Conceptual design cost estimate, for the selection process.
2. Basic design cost estimate, associated with detailed exploration of robust alternatives.
3. Fully detailed cost estimate, usually for tendering purposes.

The expected level of accuracy increases with detail, as does the amount of data and effort required. Only the first category is considered here, because cost estimating at this level is more likely to be applied by ship operators, consultants, equipment suppliers, regulatory bodies, researchers, etc., rather than at the more detailed levels, which are largely the preserve of professional cost estimators, e.g. in shipbuilding companies.

At the conceptual design stage it is not possible to suggest more than very simple cost estimating relationships for approximate estimates; nevertheless, these can still be useful in establishing the potential feasibility of a project, and in ranking the principal alternatives for more detailed study.

In the ship design context, the need to estimate the principal costs to carry out an economic evaluation concerns the following components of an economic model (a minimal data-structure sketch follows the list):

• Building cost
  – Structural hull
  – Outfitting
  – Machinery

• Voyage costs
  – Fuel consumed in transit
  – Fuel consumed on duty
  – Fuel price
  – Other consumables, e.g. lube oil
  – Port charges
  – Payload handling charges
  – Other shore/base costs
  – Cost escalation

• Manning costs
  – Crew
  – Upkeep
  – Insurance
  – Stores
  – Overheads/Administration
  – Manning (per running hour and total time)
  – Cost escalation

• Financial factors
  – Internal rate of return
  – Expected economic life
  – Financing/loan terms
  – Fiscal factors (tax, subsidy)
  – Exchange rates
  – Residual value

• Payload/Revenue/Effectiveness
  – Sea time per transit (mission/duty)
  – Port/base time
  – Non-operational/off-hire time
  – Voyages/missions per year (with seasonal variations if appropriate)
  – Payload/duty performed per voyage/mission
  – Freight rate
  – Total revenue
  – Revenue escalation
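
The sketch below shows one possible way to organize these components as simple containers in code; the names and default values are the author of this sketch's own choices, not part of any standard model.

```python
# Minimal sketch (names and groupings are assumptions) of the cost components listed
# above, organised as simple containers a conceptual-design economic model could fill.
from dataclasses import dataclass, field

@dataclass
class BuildingCost:
    structural_hull: float = 0.0
    outfitting: float = 0.0
    machinery: float = 0.0

@dataclass
class VoyageCosts:
    fuel_in_transit: float = 0.0
    fuel_on_duty: float = 0.0
    other_consumables: float = 0.0
    port_charges: float = 0.0
    payload_handling: float = 0.0

@dataclass
class ManningCosts:
    crew: float = 0.0
    upkeep: float = 0.0
    insurance: float = 0.0
    stores: float = 0.0
    overheads: float = 0.0

@dataclass
class EconomicModel:
    building: BuildingCost = field(default_factory=BuildingCost)
    voyage: VoyageCosts = field(default_factory=VoyageCosts)
    manning: ManningCosts = field(default_factory=ManningCosts)
    interest_rate: float = 0.08          # financial factors (placeholder values)
    economic_life_years: int = 20
    annual_cargo_tons: float = 0.0       # payload/revenue side
    freight_rate_per_ton: float = 0.0
```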

What follows is mainly an explanation of how one can structure a procedure for estimating the costs of alternative design concepts. Naval architects need to complement what is explained here with appropriate real-life data collected from various sources. Nevertheless, the following notes will assist in producing approximate estimates; they are not a substitute for more detailed methods or more accurate data where these are available. The contributions by Erichsen (1972), Carreyette (1978) and Kerlen (1985) may also usefully be consulted for methods and data.


8.6.1 Building Cost

Analyses of engineering economic cost always involve an estimate of invested costs. Indeed, the construction cost is usually the single largest, hence most important, factor entering into the analysis. Although shipbuilding costs may be estimated for several different reasons, the main scope here is to help make rational decisions at the conceptual design stage.

Naval architects normally want to predict the economics to assist comparison of large numbers of alternative designs (Buxton, 1987) as well as to establish the design variables and parameters of the most efficient vessel (synthesis or optimization). This means that the estimating methods should be relatively simple, provided basic data are available. The alternatives under consideration usually exist only as virtual concepts about which few details have been established. This, too, suggests that the techniques must be relatively simple. Moreover, the estimating methods should strive to emphasize differences in costs between the competing alternatives.

At the simplest level, the first cost of a ship is influenced mainly by her type, size and speed, hence power. Where the range of possible specifications is small, e.g. in straightforward vessels such as tankers, size alone is often a fair guide to approximate first cost. Maritime journals such as Fairplay and Lloyd's Ship Manager include published prices of recent contracts, and graphs can be plotted to give an indication of expected prices, at least when market conditions are reasonably stable. Such graphs may indicate whether a simple cost relationship of the form

P = k (L^α · B^β · D^γ)    or    P = k (L·B·D)^x

may be derived. The slope of such a curve, if plotted on log–log paper, is given by x, typically about 0.7; that is, cost increases less rapidly than size, as would be expected. Regression analysis can be used where there are more variables, e.g. speed.
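
As a hedged illustration, the sketch below fits k and x by ordinary least squares on the logarithms of price and L·B·D; the four reference ships and their prices are invented for the example.

```python
# Illustrative sketch (data invented): fitting P = k*(L*B*D)**x by linear regression
# of log P on log (L*B*D), so that x is the slope on log-log paper.
import math

lbd    = [50e3, 80e3, 120e3, 200e3]          # L*B*D in m^3 for four reference ships
prices = [22e6, 30e6, 40e6, 57e6]            # contract prices in euro (hypothetical)

xs = [math.log(v) for v in lbd]
ys = [math.log(p) for p in prices]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
k = math.exp(y_bar - slope * x_bar)

print(f"x = {slope:.2f}, k = {k:.0f}")       # x comes out near 0.7 with these data
new_price = k * (150e3) ** slope             # estimate for a ship with L*B*D = 150e3 m^3
```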

Another simple technical characteristic to use as a basis for estimating building cost is the lightship weight, WLS. An accurate mass estimate is fundamental, even at the initial stages of the design. Aeronautical engineers have concluded that the cost of almost any kind of vehicle could be approximated by means of the simple expression

P = k (WLS)^0.87

Again, such a rough approach has its limitations, but it can be useful in situations where returned costs are rare, such as for newly developing kinds of marine vehicles.

Care needs to be taken to keep the data as consistent as possible: untypical ships must be eliminated, data from the same time periods should be used, and a relatively stable currency should be adopted. Cost per ton lightship may also be used, with typical prices around 4200 to 5800 euros per ton for deep–sea container and ro–ro vessels. Bulk carriers would be 80% of this, VLCCs 75%, and products/chemical tankers 110%.

Where the alternatives differ in other respects, e.g. speed, machinery type, hull material, number of decks, etc., a more detailed process is required, unless the cost of the differences can be easily identified and simply added to the basic price. When shipyard cost estimators prepare a bid for a proposed ship, they look at costs based on technical characteristics. But rather than basing their work on a single characteristic, they look at one part of the ship at a time and try to predict both material and labor costs for building each part. Typically, they may make individual estimates for about 200 physical components of the finished ship. Most of their unit costs are based on weights, which can be fairly accurately predicted during the bidding phase.

In conceptual and basic design work, however, not enough is known about the ship to go into such detail, so simplification is needed. Before the lines plan and any drawings have been prepared, the alternative designs are in the form of concepts about which very little is known: the principal characteristics, power, and a general weight breakdown. The total lightship mass can be divided into hull, machinery and outfitting.

An approximate first estimate of hull cost is possible through the cubic number (CN), of machinery cost through power (usually PB), and of outfitting cost, including additional equipment, through a corrected cubic number (CNc). This leads to the following expression for first cost:

P = C1 (CN)^α + C2 (PB)^β + C3 (CNc)^γ        (8.5)

where C1, C2 and C3 are coefficients, and α, β and γ are exponents, all of which are derived from previous similar ships.
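
The sketch below simply evaluates Eq. (8.5); the coefficients and exponents shown are placeholders, since in practice they would be regressed from previous similar ships.

```python
# Hedged sketch of Eq. (8.5); all coefficients, exponents and inputs are placeholders.
def first_cost(CN, PB, CNc, C1, alpha, C2, beta, C3, gamma):
    """P = C1*CN^alpha + C2*PB^beta + C3*CNc^gamma  (hull + machinery + outfit)."""
    return C1 * CN ** alpha + C2 * PB ** beta + C3 * CNc ** gamma

# Hypothetical coefficients, for illustration only:
P = first_cost(CN=45_000, PB=12_000, CNc=50_000,
               C1=900.0, alpha=0.80, C2=1800.0, beta=0.70, C3=350.0, gamma=0.85)
print(f"Estimated first cost: {P/1e6:.1f} M(euro)")
```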

Again, such simple methods become inaccurate unless narrowly confined. Confidence can be increased by applying techniques that are considerably more accurate and yet require no more knowledge about the alternative ships than what is implied above: main dimensions, power, geometrical coefficients. To do this the naval architect can break the ship down into three major parts, namely structural hull, outfitting plus hull engineering, and machinery. In addition, expenses can be divided between material, labor and overhead. The labor rate should include allowances for other indirect costs. Normally, material and labor costs for each of the three major components are estimated, to which overhead is applied as a single, overall cost.

The total building cost of a ship may be divided into about eight principal groups, as indicated in Table 8.3, which gives an indication of the breakdown of shipbuilding costs into those categories for four types of ship.

The building cost is divided into material and labor for each of hull, machinery and outfitting. For these estimates, detailed data are required from shipyards, or from machinery manufacturers for the relevant acquisition costs. Such data are not normally fully available from shipyards.


Item                                          Cargo Liner     Bulk Carrier    Tanker            Ro-Ro Vessel
                                              12-20000 dwt    25-50000 dwt    200-300000 dwt    4500-6500 dwt

 1. Steel work materials                           10              13              20                12
 2. Steel work labor                               12              13              14                11
 3. Outfitting materials and subcontractors        19              16              18                21
 4. Outfitting labor                                7               7               8                10
 5. Main propulsion machinery                      13              12               8                10
 6. Other machinery                                 7               7               8                10
 7. Machinery installation labor                    3               3               3                 3
 8. Overheads                                      18              18              18                18
 9. Appended costs                                 11              13              13                 9
10. Total building cost                           100             100             100               100

 1. Plates, sections and welding materials.
 2. Direct labor only, excluding overheads.
 3. Semi-fabricated materials, e.g. timber and piping; items of equipment like hatch covers, winches, anchors and galley gear; and equipment subcontractors, such as insulation and ventilation. Electrical equipment outside the machinery space.
 4. Shipyard outfitting trades only, including electrical, excluding overheads.
 5. Slow-speed diesel or equivalent, e.g. boilers, turbines, gearing, condenser.
 6. Auxiliary machinery, generators, shafting, pumps, piping, controls in machinery space.
 7. Shipyard trades only.
 8. Variable overheads, e.g. social security and holiday expenses, supervision and power supplies, and fixed overheads like plant maintenance.
 9. Classification society costs, design costs, towing tank costs.
10. Profit not included, so percentages of the selling price would be slightly lower.

Table 8.3. Approximate percentage breakdown of shipbuilding costs

Hull Structure Material Cost

The floating steel mass is taken from the lightship estimate. More detailed calculations are possible through the application of classification societies' rules and use of the midship section method. The scrap percentage (typically 10%, but 20% or more for small vessels) is added to give the steel mass in tons. The corresponding average price per ton of steel material can usually be obtained from a steel maker, e.g. the British Steel Corporation, which publishes a price list for each main type of steel: heavy plates, sections, etc. Current prices for mild steel are 400 euros per ton. Extra may have to be added for high–tensile steel, or for a preponderance of very thin or very thick plates, etc.


Hull Structure Labor Cost

Man–hours are the basis of all direct labor costs; once they are estimated, it is only necessary to apply wage rates, overheads and profit to arrive at the total labor cost. At the simplest level, the steelwork labor cost can be estimated from

Ch = steelwork tons × (man–hours / ton of steel) × (wage rate / man–hour)

Man–hours per ton depend not only on the general level of productivity in a particular shipyard, but also on the size and type of ship. Large vessels, such as tankers, have a greater steel mass per unit area of structure, i.e. thicker plating, as well as more repetitive components, than smaller ships. Man–hours per ton for complex zones, e.g. hull ends and superstructures, can easily amount to two or three times that for parallel mid-body construction. As a first approximation, Carreyette (1978) suggests that steelwork man–hours may be estimated from:

Rh = 227 · Ws^0.85 · L^(1/3) / CB

where Ws is the net steel mass in tonnes and CB is the block coefficient at laden summer draught.

In labor–intensive activities such as shipbuilding, it seems to be a natural law that as the ship size or the number of ships being produced increases, the rate Rh of man–hours required per tonne decreases asymptotically to some fairly constant rate. This suggests that man–hours per ton can vary from below 50 for large ships to over 200 for small ships. Substantially higher figures are appropriate for warships and offshore marine vehicles.

Wage rates per man–hour, excluding overheads, vary from yard to yard and from country to country. In Italy the rate at present is approximately 17 euros per man–hour, but allowance should be made for inflation if delivery dates are a long way ahead.
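
The sketch below implements the simplest-level steelwork labour relation given above (tonnage times man-hours per tonne times wage rate); the tonnage and productivity figures are illustrative values chosen within the ranges quoted in the text.

```python
# Minimal sketch of the simplest-level steelwork labour estimate:
# cost = steel tonnage x man-hours per tonne x wage rate per man-hour.
def steelwork_labour_cost(steel_tons, man_hours_per_ton, wage_rate_per_hour):
    """Direct steelwork labour cost in euro, excluding overheads."""
    return steel_tons * man_hours_per_ton * wage_rate_per_hour

# e.g. a mid-size cargo ship: 4000 t of steel, ~90 mh/t, 17 euro per man-hour
cost = steelwork_labour_cost(4000, 90, 17.0)
print(f"Steelwork labour cost = {cost/1e6:.1f} M(euro)")
```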

Outfitting Material and Labor Cost

The outfitting cost of a ship can vary markedly with ship type and specification, for example with variations in cargo handling gear, accommodation and equipment. For passenger ships the most significant component of outfitting mass is the accommodation weight. At the simplest level, a cost per ton of outfitting mass could be assumed for material plus labor, say around 9000 to 12000 euro per ton for fairly straightforward ships.

At a slightly more detailed level, material and subcontractors' costs could be separated into a small number of items where information can often be obtained from manufacturers, e.g. hatch covers and cargo handling equipment, plus an aggregation of the remaining items based on their total mass, say around 6000 to 9000 euro per ton.

Wage rates for outfitting workers are generally similar to those for steelworkers. Carreyette (1978) suggests that outfitting labor (in man–hours, Ho) can be based on the outfitting mass Wo as follows:

Ho = 2980 · Wo^(2/3)

whereas the outfitting material cost can be estimated from:

Co = ko · Wo^0.95

where ko is about 11000–14000 euro per ton.

Machinery and Labor Cost

The largest part of the machinery mass is that of the main propulsion units, namely the main engines, gearboxes and propulsors. It can be calculated reliably as a function of installed power and engine speed, using data from relevant databases. Economic studies comparing alternative machinery are quite common. In general, each different type of machinery has a different first cost, both for the basic prime mover and as installed as a complete system. Detailed estimates of the purchase costs of main engines, gearboxes and waterjets, based on installed power (P in kW), are summarized in Table 8.4, indicating that the cost per unit power falls with increasing power.

Approximately 80% of the total machinery cost is contributed by the ten most significant items of equipment. Derated versions (e.g. to reduce specific fuel consumption) are almost the same price as the maximum-rating model, despite the lower output. It should be noted that these costs are per unit at 'Maximum Continuous Rating' (MCR).

Type of machinery     Cost

Diesel engines        Cd  = 0.524 P
Gas turbines          Cgt = 0.70 P − 6·10^−6 P^2
Gearboxes             Cgb = 114 + 0.043 P − 6·10^−7 P^2
Waterjets             Cwj = 0.936 P^0.84

Table 8.4. Cost of propulsion units

Because different machinery types may require different installed powers to achieve the same ship speed (different transmission or propulsive efficiencies and service ratings), the ratios of absolute costs may not be the same as the relative costs. For twin–screw propulsion, about 15% can be added for diesel or gas turbine plants. For electric transmission, compared with gearing, 15 to 20% can be added.

Broad corrections may be applied for major changes, such as:
• ship type
• machinery aft or midships
• propeller type and revolutions
• steam conditions and number of boilers
• differences in major auxiliaries
• alternative fuels, e.g. marine diesel oil


Beyond this level, a more detailed specification and quotations from subcontractors would be required for a full cost estimate.

The purchase cost of the remaining items of machinery, such as generators, together with the overall labor cost for the installation of machinery, is generally of the order of 40% of the propulsion machinery cost. Assuming no subcontracting, the total cost of machinery installation labor may be calculated through the following expression:

CMl = 1200 · P^0.82
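
The sketch below combines the diesel-engine relation of Table 8.4 with the 40% allowance and the installation-labour formula above. The monetary units of the relations are not stated in the text; here, as an assumption, the Table 8.4 result is read as thousands of euro and CMl as euro, and these should be adjusted to whatever the source data actually use.

```python
# Hedged sketch: diesel relation from Table 8.4, ~40% allowance for remaining items
# plus installation, and the separate labour formula C_Ml = 1200*P**0.82.
# Units are assumptions (Table 8.4 as thousands of euro, C_Ml as euro).
def diesel_cost(P):
    """Purchase cost of a diesel prime mover of P kW at MCR (thousands of euro, assumed)."""
    return 0.524 * P

def total_machinery_cost(P, twin_screw=False):
    prime_mover = diesel_cost(P) * 1e3          # euro
    if twin_screw:
        prime_mover *= 1.15                     # ~15% addition for twin-screw plants
    return 1.40 * prime_mover                   # +40% for remaining items and installation

P = 12_000                                      # installed power in kW (illustrative)
installation_labour = 1200.0 * P ** 0.82        # C_Ml, a check on the labour share alone
print(f"Machinery: {total_machinery_cost(P)/1e6:.1f} M(euro), "
      f"installation labour: {installation_labour/1e6:.1f} M(euro)")
```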

Overheads

Overhead costs (sometimes called establishment charges) are costs which are necessary to a shipyard, but which cannot be allocated to any particular ship under construction. They include salaries for administration staff, managers and watchmen, as well as bills for training, electricity and power supplies, capital charges on plant, insurance, real estate taxes, maintenance, research & development, and marketing.

The usual estimating technique is to express overheads as a percentage of the total direct labor costs as calculated previously, typically about 10 to 25%.

Something else to note is that what is usually called material costs should more correctly be called costs for goods and outside services. Many shipyards, for example, use subcontractors to do the joiner work on the deck covering. Consulting service bills would come in this category, too.

Naval architects will seldom be called upon to perform detailed estimates of the overhead costs to be assigned to a ship being bid. They should, nevertheless, have some understanding of the difficulties involved. To begin with, there are two basic kinds of overhead: fixed overheads, those that remain much the same regardless of how busy the yard may be; and variable overheads, those that vary with the level of activity within the yard.

This leads to the conclusion that overhead costs taken as a percentage of labor costs will require a prediction of what other work may be under way in the yard while the proposed ship is being built. Clearly, such estimates are outside the naval architect's knowledge and are the project manager's responsibility. It is enough to know that overhead costs, as a fraction of labor cost, will drop if the shipbuilding company is in a period of prosperity, with several contracts on hand.

Profit

In a shipyard, it is the job of management, not of the cost estimator, to decide on an appropriate profit margin to add to the estimated building cost. The decision will be influenced by the experience of the shipbuilding company with the type of work in question (and the associated uncertainty of the cost estimate), the yard's order book, the state of the shipbuilding market and competition, and the standing of the customer. A figure of about 10% of estimated costs is aimed at, but rarely achieved in the present competitive world of shipbuilding.

In simple cost estimates, it is possible to aggregate overheads and profit together by adding about 30 to 35% to the sum of the steelwork, outfitting and machinery costs.
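
A minimal sketch of this aggregation, with purely illustrative direct-cost inputs:

```python
# Minimal sketch (inputs illustrative): direct costs for steelwork, outfit and
# machinery, plus ~30-35% on top for overheads and profit combined.
def building_cost(steel_material, steel_labour, outfit, machinery, oh_profit=0.33):
    direct = steel_material + steel_labour + outfit + machinery
    return direct * (1.0 + oh_profit)

price = building_cost(steel_material=3.0e6, steel_labour=4.0e6,
                      outfit=6.0e6, machinery=8.0e6)
print(f"Estimated price: {price/1e6:.1f} M(euro)")   # about 27.9 M(euro)
```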

Appended Costs

Appended costs include classification society fees and similar costs that the shipyard normally passes on to the owner without mark–up for profit. They also include tug and dry–dock charges based upon standard rates that already include profit.

Total First Cost and Selling Price

The total price estimated from summing the above items can then be compared with current market prices to assess whether the results appear to be reasonable. However, over recent years many shipyards have quoted prices below cost (dumping) to obtain work, assisted in some cases by subsidies.

Duplicate Cost Savings

Small reductions in cost are possible for the production of multiple ships. It is estimated that by doubling the number of sister ships produced, their average cost can be reduced by about 3 to 5%, i.e. the slope parameter of the progress curve (or learning curve) is about 0.965. This means that the average cost of each of N ships is N^(−0.035) times the cost of one ship (Couch, 1963).
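
A worked check of the progress-curve relation, using the exponent −0.035 quoted above and an invented single-ship cost:

```python
# Worked example: average cost of N sister ships = (cost of one ship) * N**-0.035.
def average_cost(first_ship_cost, n_ships, exponent=-0.035):
    return first_ship_cost * n_ships ** exponent

for n in (1, 2, 4, 8):
    print(n, f"{average_cost(30.0e6, n)/1e6:.2f} M(euro) average per ship")
```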

Indeed, there are two reasons for costs going down. The first is the matter of non-recurring costs, i.e. costs required to build the first ship but which need not be repeated for follow-on ships; examples are basic design, engineering, plan approval, and the preparation of numerical control data for fabrication. The second category consists primarily of labor learning: the increased efficiency workers acquire through repetitive work. There are also savings in material costs because suppliers, too, may experience savings. The cost of labor on repeated ships falls faster than material costs.

8.6.2 Operating Costs

It is necessary to provide an understanding of the various components that make up the annual costs of operating a ship. Unfortunately, there is no practical way to present a tidy handbook of actual quantitative values, but there are a number of useful references (Benford, 1975; Reifs and Benford, 1975) that present some data, which however quickly become outdated.

The various components of operating costs are divided into two main categories, namely manning costs and voyage costs. The first includes costs that are constant regardless of whether the ship operates or not. The second category includes those costs that occur only when the ship actually operates and therefore increase with increased ship operation; these are highly dependent on the route and the ship's operating profile. Some of these costs can be independent of the route on which the ship operates but highly dependent on the ship's building cost.

For the prediction of operating costs, the main characteristics of the ship's operation must be defined. A basic step is to project the times involved in a typical round-trip voyage, sometimes called a proforma voyage or cycle. Typically, such an imaginary, representative voyage would include distance(s) and operating speed(s), estimated times for proceeding through a harbor, down a river and out to the open sea, perhaps some time in passing through a canal, then more time in the open sea, followed by time in sheltered waters, manoeuvring time, time to unload cargo, time to shift to another pier, time to load cargo, and then perhaps a mirror image of all of the foregoing until a complete round trip is completed and the ship is once more loaded at the first port and ready to leave. Factored into this must be some reasonable allowances for port and canal queuing delays and for speed losses in fog or heavy weather. Time may also be lost in taking on bunkers or pumping out holding tanks. If the ship is not designed to be route-specific, some basic assumptions will have to be made.

The total time for the voyage cycle, when divided into the estimated operating days per year (typically 340–350), gives the estimated total number of round trips per year.

These scheduling calculations serve other purposes as well. In bulk ships, where deadweight is critical, they are used to establish the weight of fuel that must be aboard when the ship reaches that point in her voyage where draft is most limited. In this phase of the work, one should give thought to the relative benefits of taking on bunkers for a round trip versus only enough for one leg. And one must of course add some prudent margin (often 20 to 25%) for bad weather or other kinds of delays.

The days-per-round-trip estimate can also be used to establish the weight of other non-payload parts of the total deadweight that are a function of days away from port: fresh water, stores, and supplies. Finally, all this may lead to that critical number: the annual cargo or passenger transport capacity. That estimate of actual annual transport achievement should be tempered by some realistic assumptions as to the probable amounts available to be carried on each leg of the voyage. In the bulk trades, that might amount to 100% use one way, and return in ballast. In the liner trades, one might typically assume 85% full outbound and 65% inbound, but this varies greatly depending on trade and route.
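
An illustrative sketch of the scheduling step described above; the route figures, delay margin and operating days are invented for the example.

```python
# Illustrative sketch (route figures invented): proforma round-trip voyage ->
# round trips per year -> annual cargo transport capacity.
def round_trips_per_year(sea_days, port_days, delay_margin=0.20, operating_days=345):
    cycle = (sea_days + port_days) * (1.0 + delay_margin)  # weather/queuing allowance
    return operating_days / cycle

def annual_capacity(cargo_per_leg, trips, legs_loaded=1):
    return cargo_per_leg * trips * legs_loaded             # tons per year

rt = round_trips_per_year(sea_days=20.0, port_days=6.0)
print(f"{rt:.1f} round trips/year, "
      f"{annual_capacity(60_000, rt):,.0f} t/year")        # laden one way, ballast return
```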

In more advanced studies, the naval architect may need to make adjustments for changes in the minimum allowable freeboard brought on by geographic or seasonal requirements. Shallow water and ice operations may also be a factor.


Manning Costs

The breakdown of running costs discussed in the sequel represents standard accounting practice in an Italian shipping line. Perhaps the first thing that should be said about these accounting practices is that they can be misleading. As an example, the maintenance and repair category includes only money paid to outside entities, usually repair yards. Maintenance or repairs carried out by the ship's crew are charged to wages, and the materials used are charged to stores and supplies.


Table 8.5 shows manning costs broken down by major category for four types of vessel.

Items                        Bulk Carrier    Container    Tanker       Ro-Ro
                             120000 DWT      2000 TEU     30000 DWT    1000 DWT

Crew                              47             48           46           23
Upkeep                            30             27           32           28
Insurance                         16             19           14            6
Stores & supplies                  7              6            8           10
Miscellaneous                      7              6            8           10
Total                            100            100          100          100

Approx. % of total cost,
including capital, fuel,          18             14           24           21
port & cargo handling

Table 8.5. Percentage breakdown of daily manning costs

Crew Costs

Crew costs are calculated directly using the required crew size and breakdown and the relevant charges. They include not only wages, but also victualling, leave and reliefs, training, benefits, and travel.

Crew numbers usually vary between one and two dozen, depending on union agreements and on the shipowner's willingness to invest in automated equipment, more reliable components, and minimum-maintenance equipment. With rational schemes for reducing personnel, new complements are nearly independent of ship size and power.

Crew costs include:
• direct costs (wages for the crew, paid vacation, travel, overtime, food, pension insurance);
• indirect costs (health insurance, employment agency fees, trade union fees, training and education, sick leave, working clothes, etc.).

In passenger ships the average wage rate will decrease with increasing passenger capacity, because most of the additional crew members will be at the lower end of the wage range. A default value may be used in conceptual design, as this does not vary significantly with ship size or capacity.

In addition to direct daily wages there are many benefits paid to seafarers. In some instances there may be rotation schemes under which crew members are on year-round salary, with vacation times that may amount to as much as a day ashore for every day aboard. There are sick benefits, payroll taxes, and repatriation costs (travel between home and ship when rotating on or off). These are major increments that must not be overlooked.

For general studies, not specific to any owner, it is necessary to set up a wage and benefit equation that recognizes that total costs are not directly proportional to crew numbers, because automation and other crew reduction factors tend to eliminate people at the lower end of the pay scale. The general equation for the crew cost, Cc, may take the form

Cc = f1 · Nc^0.8 + f2 · Nc

where Nc is the number in the crew, and f1 and f2 are coefficients that vary with time, flag, and labor contract. As a first estimate, the average annual crew cost may be assumed to be 40000 euro per crew member.
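
The sketch below evaluates the crew-cost relation; the coefficients f1 and f2 shown are placeholders chosen only so that the per-person result lands in the same neighbourhood as the 40000 euro default quoted above.

```python
# Sketch of C_c = f1*Nc**0.8 + f2*Nc; f1 and f2 are placeholder coefficients
# (they vary with time, flag and labour contract).
def annual_crew_cost(Nc, f1=60_000.0, f2=15_000.0):
    return f1 * Nc ** 0.8 + f2 * Nc

# Cross-check against the flat default of ~40000 euro per crew member:
for Nc in (15, 20, 25):
    print(Nc, f"{annual_crew_cost(Nc)/Nc:,.0f} euro per person")
```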

The cost of victuals is a function of the number of people aboard and of the operating days per year. Compared to wages, these costs are modest, and most owners consider the money well spent as a key element in attracting and retaining good seafarers.

Upkeep Costs

Upkeep includes maintenance, repair, stores and lubricating oil, while miscellaneous includes administration, equipment hire, etc.

Maintenance and repair (M & R) costs comprise direct and indirect costs. Direct costs include the price of the work in ship maintenance interventions and the cost of the material expended. Indirect maintenance costs refer to all other expenses of the ship when it is not in operation.

Expenditure on maintenance and repairs depends on the class requirements, related to the quality of the ship arrangement, on the freight market, the ship's age, type and size, the voyage patterns, and the shipowner's strategy. Prediction of costs as a function of some of these parameters is precarious because of the state of the repair shipyard market; they are typically taken as 4% of acquisition cost. Analysis of actual M & R costs suggests that they are roughly proportional to ship size and that they increase with age in real terms (i.e. over and above inflation) at about 3 to 5% per annum. Insurance depends on a number of factors: ship type, size and value, plus the shipowner's record. As a proportion of first cost, the annual total premiums covering all categories of insurance carried vary between about 1 and 3%.

Annual costs for M & R can be estimated in two parts: hull maintenance and repair will be roughly proportional to the cubic number raised to the two-thirds power, while machinery maintenance and repair costs will be roughly proportional to the horsepower, also raised to the two-thirds power. A refinement of this approach is embodied in the following approximation

CM&R = k + f3 (L·B·D)^0.685 + f4 · MCR + f5 · MCR^0.6

where MCR is the main engine's maximum continuous rating in kW, f3, f4 and f5 are coefficients that vary with the kind of ship, the owner's policy, and so forth, and k is a fixed amount regardless of hull size and engine power. Where data are available from different time periods, escalation rates may be used to adjust them to a common basis. Such rates may also be used to estimate future cash flows if calculations are being made in money terms.
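
The sketch below evaluates the M & R approximation; k and the f-coefficients are placeholders that would in practice be fitted to an owner's returned costs.

```python
# Sketch of the M&R approximation; k, f3, f4 and f5 are placeholder coefficients.
def annual_mr_cost(L, B, D, MCR, k=50_000.0, f3=120.0, f4=8.0, f5=900.0):
    """C_M&R = k + f3*(L*B*D)**0.685 + f4*MCR + f5*MCR**0.6 (euro per year, assumed)."""
    return k + f3 * (L * B * D) ** 0.685 + f4 * MCR + f5 * MCR ** 0.6

print(f"{annual_mr_cost(L=180.0, B=30.0, D=16.0, MCR=12_000)/1e6:.2f} M(euro)/year")
```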


A large share of upkeep costs is related to lubricating oils. The total consumption of lubricating oils, and consequently the expenditure on lubes, depends to a large extent on the size, age and technology of the ship's equipment and machinery, as well as on the efficiency of maintenance.

Costs for provisions depend on the crew numbers, costs for operating supplies depend on the ship's deadweight, while other management expenses depend on the organization and size of the shipping company.

Insurance Costs

Risk insurance costs are divided into two groups:
• insurance of hull, equipment and machinery (H & M);
• insurance of cargo, crew and indemnity (protection and indemnity, i.e. P & I).

H & M insurance covers the ship's hull, equipment and machinery, in which the shipowner has a direct insurable interest.

P & I insurance covers claims from third parties in cases such as, for example, oil pollution of the sea, or claims in cases of injuries to employees. These risks do not tend to be covered by direct placement on the insurance market; more usually, this business is covered by mutual insurance between shipowners under the auspices of the P & I clubs.

Annual insurance expenses are calculated directly as a percentage of the vessel's building cost. Protection and indemnity insurance, protecting the shipowner against lawsuits and usually based on the 'gross tonnage' of the shipowner's fleet, may add an annual cost of about 1% of the acquisition cost, although even higher values can occur in less developed markets.

The annual cost of hull and machinery insurance is based on the ship's insured value and size. Underwriters use a 'formula deadweight', which is effectively the 'cubic number'. Typical figures might be one percent of the first cost. First cost is a rather illogical basis for fixing insurance premiums, but the marine insurance business is marked by such irrational practices.

Further Costs

Other manning costs include the expenses for operating supplies (spares, paints, chemical substances, stores, ...), provisions, administration and other expenses related to the general management of the ship.

The annual cost of stores and supplies would consist of three parts. The first would be proportional to the ship's size (mooring lines, for example). The second would be proportional to the main engine power (machinery replacement parts, for example). The third would be proportional to the number of crew members aboard (paint and cleaning compound, for example).

531

Page 546: SHIP DESIGN ——————— A Rational Approach Giorgio Trincasunina.stidue.net/Universita' di Trieste/Ingegneria... · 2009. 6. 24. · Department of Naval Architecture, Ocean

8 – Engineering Economics

A final annual cost category covers overhead and miscellaneous expenses. This would have to absorb a prorated share of the costs associated with maintaining one or more offices ashore. Shore staffs may number anywhere from what can be counted on one hand to bureaucracies bordering on civil service multitudes.

Voyage Costs

The features of the ship's employment define the voyage range, the time required for loading and unloading, and the annual utilization of the ship. Calculation of these features is the basis for the calculation of the voyage costs and the cargo handling costs. Depending on the features of the ship's employment, the share of each particular group of costs varies.

Fuel Costs

The specific fuel consumption at each speed is used to calculate the annual expenses for fuel. The effect of reduced-power operation (manoeuvring, port restrictions, etc.) should not be neglected, owing to the increase in specific fuel consumption, especially if gas turbines are used. These calculations also account for auxiliary fuel consumption. Fuel costs are directly derived by using actual fuel prices.

Although fuel prices vary throughout the world, such differences are often small enough to ignore in feasibility studies. The prices published in journals such as 'The Motor Ship', 'Lloyd's Ship Manager' or 'Lloyd's List' may be taken as a good guide. After several years when heavy fuel prices were in the range 140 to 190 dollars per ton (diesel oil about 220 to 300), they fell dramatically in 1986 to about half those levels, and have grown dramatically once again during recent years, up to 750 dollars per ton. Until some degree of stability emerges in fuel prices, it would be wise to investigate a range of prices in economic evaluations.

Port Costs

Port charges can be significant yet at the same time difficult to model. They tend to be high in the case of high-speed vessels. Port costs include a mixture of expenses, such as charges for entering the port, lighthouse dues, pilotage fees, tug services, mooring costs, port agency fees, customs duty, etc. Some port costs are on a per-use basis; others are on a per-day basis. Port charges may be based on the size of the ship; pilotage may be based on the ship's draft.

Actual charges per call vary enormously from one port to another around the world and are hardly comparable. The lowest costs per gross or net (registered) ton are usually found for large ships in ports with few facilities, e.g. tanker loading jetty ports, while the highest tend to apply to small ships in ports with an extensive range of facilities. Many ports now charge on a gross ton basis rather than net. Some investigations indicate that quoted charges can be very high and may even lead to total expenses significantly higher than the fuel cost. However, this may not be the case in reality, and operators will often make special arrangements with port authorities, leading to major reductions in the charges actually paid. This situation makes port charges difficult to model.

Canal dues must be added where applicable. They are standardized according to the ship's tonnage and draughts, although the rules for measuring NT are different for the Suez and Panama Canals. Dues per transit per NT are approximately euro 1.83 laden and euro 1.46 in ballast for Panama. For Suez there is a sliding scale based on Suez net tons and Special Drawing Rights, which ranges for laden ships from about euro 6 for small ships up to 5000 tons, to about euro 4 at 20000 tons, to about euro 2 for the largest ships; ballast rates are 80% of the laden rates.

Cargo Handling Costs

Another important cost is that of cargo handling, which may or may not be included in the contract, depending on the trade. If it is included, it logically would be treated as a voyage cost. Associated with this may be brokerage fees, cargo damage claims, hold cleaning, rain tents and other miscellaneous cargo-related expenses. In some conceptual designs cargo handling costs will be the same for all alternatives, in which case they can be ignored.

Cargo handling costs vary widely between ports, especially for break-bulk general cargo. For the latter, loading or discharging in a port with low labor costs (e.g. in the Far East) may cost as little as euro 8-10 per ton of cargo ship-to-quay or vice versa, rising to as much as euro 60-80 in high-cost areas such as North America. A realistic average to use for conceptual design will depend on the range of ports served, and also on the range of cargoes carried: low stowage factor cargoes such as steel cost less to handle than high stowage factor cargoes such as wool.

Unit load cargo handling costs are more uniform throughout the world. A container can vary between about euro 100 and euro 240 ship-to-quay, or vice versa, i.e. about euro 10 to euro 20 per ton of average cargo (multiplied by two for loading and discharging). Stuffing and stripping the container itself will cost extra, but this is not included in the sea freight charge.

Bulk cargo handling costs are not usually paid by the shipowner. However, loading costs are usually small for cargoes such as coal or grain (which are often sold f.o.b.), say 60 euro cents per ton, while discharging is more expensive, around euro 2 to euro 4 per ton for mineral or granular type cargoes. Liquid cargo handling costs are largely pumping costs which are absorbed by the ship (discharging) or the shore terminal (loading).

Capital Charges

Capital charges to cover the investment and a return on capital are normally the most significant component of running costs, owing to factors such as high initial cost and the possibly high interest rate required to account for the risk of investment in unproven designs. They are around 30 to 50% of operating costs, excluding cargo costs, if a good rate of return is to be achieved. At their simplest, they are calculated as a direct proportion of the first cost via the capital recovery factor, modified to account for taxation.

In more complex situations, where taxation and loans arise, the processes outlined above are required to incorporate the acquisition cost into the economic calculations. In poor markets, shipowners will accept freight rates making no contribution to capital charges; but this cannot be sustained indefinitely, especially if there are loans outstanding on the ship.

Freight Rates

All of the categories mentioned previously are items of expenditure. Income is generated from the product of the cargo carried per annum and the average freight rate. Freight rates, especially in the bulk trades, vary widely with supply and demand. Past and present rates for particular cargoes and trade routes are published in 'Shipping Statistics' and in the shipping press, from the trends of which an assessment can be made of possible future long-term levels (unless RFR is the criterion). Some realistic escalation rate should also be applied as, in the long run, freight rates increase with inflation. Such references often also give freight rates dating back several years, which can help in estimating possible escalation rates.

Cargo liner freight rates are not usually published, varying widely between routes and different types of cargo. However, shipowners and cargo agents are usually willing to provide some current freight rates for particular cargo liner services quay-to-quay. By selecting an 'average' cargo and allowing for the stowage factor if weight/measurement rates are quoted, a reasonable estimate can be made. On some routes, especially short sea, 'freight all kinds' rates are quoted for containers, i.e. a rate per box irrespective of contents. Liner freight rates on a route do not fluctuate as widely as bulk rates, but remain constant for some months before any percentage change (overall or for special factors like bunker charges) is applied. The German cargo liner freight index can be used to give some guidance on escalation.

For all freight rates, the shipowner does not receive the full revenue. For bulk cargoes, brokers' fees will amount to typically 2.5 to 5% of the gross freight, while for liner cargoes within Conferences, rebates of typically 10% are granted to shippers who use only Conference ships.

8.6.3 Other Decision Factors

In addition to factors which can be quite readily incorporated in economic models, there are frequently other factors which influence both the decision as to whether to invest in a vessel or not, and the decision as to what the characteristics of the 'best' vessel are. Some factors may be more applicable to state–owned vessels than others:
• maintain market share;
• minimize risk to company survival;
• maximize foreign exchange earnings;
• enhance company image or prestige;
• utilize currently favorable tax allowances (although this can usually be evaluated directly in economic terms);
• maintain employment (may include operating staff or construction personnel).

However, even if the overall decision to acquire a ship is taken on such grounds, the selection of the 'most efficient' design features is still likely to be made on basically economic grounds, e.g. the choice of machinery. These considerations raise the question of multicriterial decision-making, where techniques are being developed to weigh up attributes which can be quantified but are measured in incommensurate units. Among other applications, the comparison of multi-role vessels can be addressed.

8.7 Ship Leg Economics

The overall economics of the sea leg simply involves adding the inventory cost of the goods in transit to the costs of the ship itself. This can best be understood by imagining that the shipowner buys the cargo as it is loaded on board and sells it as soon as it is discharged. The components of this inventory cost are the value of the cargo (what the owner paid for it), his time-value of money (expressed as an interest rate), the time the goods are in transit (the period during which the investment is tied up), and the corporate income tax.

8.7.1 Inventory Costs

Where both ship and cargo are owned by the same entity, the cost of the goods, both in transit and in storage, represents money tied up, i.e. an investment that should be earning returns. This cost is referred to as the inventory cost.

The same principles can be applied to the case where ownership is divided. This can be justified on the basis of the cargo owner being willing to pay somewhat higher freight rates to the shipowner who provides the better service. Thus, if completely free market conditions obtain, the team of owners (ship and cargo) would tend to make the same design decisions as would be made by an individual who owned both.

The inventory cost passed along to the customer for one complete ship load can be designated as I. If the tax rate is t, the government takes t·I, leaving the shipowner I(1 − t). This amount must cover the owner's cost of the capital tied up in inventory during the voyage, so

I (1 − t) = i·v·DWTt·d / 365

where
i       annual interest rate appropriate to the owner's time value of money
v       value of the cargo per ton (or other unit) as loaded
DWTt    cargo deadweight carried in one trip
d       days in transit


Transposing (1 − t) yields

I = i·v·DWTt·d / [365 (1 − t)]

The inventory cost per voyage can be converted into an annual cost, Ia, by multiplying by the number of cargo legs per year

Ia = I·RT = i·v·DWTt·d·RT / [365 (1 − t)]

where RT is the number of round trips per year, assuming a one-way trade route.

If the annual transport capacity in tons (or other units) is

C = DWTt·RT

the annual inventory cost is

Ia = i·v·d·C / [365 (1 − t)]

This can be converted into a unit inventory cost per ton of cargo delivered, CI, as

CI = Ia / C = i·v·d / [365 (1 − t)]

8.7.2 Economic Cost of Transport

The equation for the unit inventory cost pertains to the economics of the cargo alone. The combined economics of ship and cargo can be considered by adding the unit inventory cost to the required freight rate, yielding the economic cost of transport:

ECT = (CR·P + Y) / C + i·v·d / [365 (1 − t)]
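
A short sketch of the two relations above; the symbols follow the text (CR = capital recovery factor, P = first cost, C = annual cargo, i = interest rate, v = cargo value per ton, d = days in transit, t = tax rate), Y is taken here as the annual operating costs, and all input values are illustrative only.

```python
# Sketch of the unit inventory cost C_I and the economic cost of transport ECT.
def unit_inventory_cost(i, v, d, t):
    return i * v * d / (365.0 * (1.0 - t))

def economic_cost_of_transport(CR, P, Y, C, i, v, d, t):
    return (CR * P + Y) / C + unit_inventory_cost(i, v, d, t)

CI  = unit_inventory_cost(i=0.10, v=500.0, d=12.0, t=0.3)
ECT = economic_cost_of_transport(CR=0.12, P=30.0e6, Y=4.0e6, C=600_000.0,
                                 i=0.10, v=500.0, d=12.0, t=0.3)
print(f"C_I = {CI:.2f} euro/t, ECT = {ECT:.2f} euro/t")
```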

As common sense dictates, recognition of inventory costs will invariably tend to favor higher speeds. The days in transit, d, must of course vary inversely with sea speed, and so increasing speed will always decrease CI. The net result is that the ship speed selected to minimize ECT will always be higher than that selected to minimize RFR.

In the bulk trades this impact of inventory costs is so trivial that it can be safely ignored. In the liner trades high-speed ships have demonstrated a marked capacity to attract high-value, high-paying cargo. That correlation of high-value cargo and high-speed ships may be explained by the machinations of conference rate-setting practices as much as by the inventory value of the goods in transit.


8.7.3 Effects on NPV

To this point only the effect of inventory charges on the unit cost of transport has been considered, whereas other measures of merit, such as NPV, may be preferred.

If the shipowner were to buy the cargo as it comes aboard, the inventory cost could be treated more or less like working capital; I can, in short, be treated as a non-depreciable addition to the initial investment. Since NPV is generally found by determining the discounted present value of all the future cash flows and then subtracting the initial investment, the ship's net present value would thereby be reduced by the value of one ship load of cargo, that is, v·DWTt. This would be tempered, however, by a factor equivalent to the fraction of time there is cargo in transit. In summary:

∆NPV = − v·DWTt·d·RT / 365

where d·RT/365 is the fraction of the year during which cargo is in transit.


Bibliography

[1] Benford, H.: Investment Returns Before and After Tax, The Engineering Economist, ASEE, 1965.

[2] Benford, H.: Measures of Merit for Ship Design, University of Michigan, Ann Arbor, 1969.

[3] Benford, H.: Bulk Cargo Inventory Costs and Their Effect on the Design of Ships and Terminals, Marine Technology, Vol. 18, No. 4, 1981, pp. 344–349.

[4] Buxton, I.L.: Engineering Economics Applied to Ship Design, Transactions RINA, Vol. 114, 1972, pp. 409–428.

[5] Buxton, I.L.: Engineering Economics and Ship Design, British Maritime Technology, Tyne and Wear, 1987.

[6] Carreyette, J.: Preliminary Ship Cost Estimation, Transactions RINA, Vol. 120, 1978, pp. 235–258.

[7] Couch, J.C.: The Cost Savings of Multiple Ship Production, International Shipbuilding Progress, 1963.

[8] Erichsen, S.: Optimising Containerships and their Terminals, SNAME Spring Meeting, 1972.

[9] Goss, R.O.: Economic Criteria for Optimal Ship Design, Transactions RINA, Vol. 107, 1965, pp. 581–600.

[10] Grant, E.L. and Ireson, W.G.: Principles of Engineering Economy, Ronald Press, New York, 1960.

[11] Herbert, R.N.: Design of the SCA Special Ships, Marine Technology, 1971.

[12] Kerlen, H.: How Does Hull Form Influence the Building Cost of Cargo Vessels, Proceedings, Second International Marine Systems Design Conference, Danish Technical University, Lyngby, 1985.

[13] Napier, J.R.: On the Most Profitable Speed for a Fully Laden Cargo Steamer for a Given Voyage, The Philosophical Society of Glasgow, Glasgow, 1865.

[14] Thuesen, G.J. and Fabrycky, W.J.: Engineering Economy, Prentice-Hall, Englewood Cliffs, N.J., 1989.

[15] Volker, H.: Economic Calculations in Ship Design, International Shipbuilding Progress, Vol. 14, No. 150, 1967.
