
Machine learning in quay wall design


DEGREE PROJECT IN THE BUILT ENVIRONMENT, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2021

Machine learning in quay wall design

JORIS LANGEVELD

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ARCHITECTURE AND THE BUILT ENVIRONMENT


Machine learning in quay wall design

by Joris Langeveld

Master Thesis, 2021
KTH Royal Institute of Technology
School of Architecture and the Built Environment
Department of Civil and Architectural Engineering
Division of Soil and Rock Mechanics

Examiner: Prof. Stefan Larsson, KTH Royal Institute of Technology, Stockholm, Sweden
Supervisors: Prof. Stefan Larsson, KTH Royal Institute of Technology, Stockholm, Sweden
Ir. Thom Olsthoorn, Movares, Utrecht, Netherlands


Abstract

Today we live in a world where technology is changing the projects around us at a rapid pace, and it is believed that companies will have to change their operations to maintain an edge. Movares, a Dutch engineering consultancy firm, recognizes the importance of digital transformation. Its goal is to apply digital transformation to its day-to-day operations, enabling engineers to focus on innovation. One of these operations concerns the optimization of quay wall designs.

In this thesis, an automated optimization routine for the design of quay walls is suggested. The design of quay walls is influenced by site-specific variables and design variables. Currently at Movares, the design variables are determined based on engineering judgement, a combination of norms, experience, and data. The lack of an integral analysis of the design variables makes it difficult to judge the efficiency of the design in terms of costs. Furthermore, the speed of this trial-and-error based approach is limited by the designer interacting with the analysis software.

The automated optimization routines suggested in this thesis try to pose a solution to these problems. In an automated routine, the task of the engineer shifts from evaluating results to formulating an optimization problem. The engineer operates at a higher level, and an algorithm is responsible for evaluating the intermediary results. The proposed routines can best be described as a databased, or data-driven, routine and a hybrid routine. The databased routine bases its evaluation solely on data and relies on statistical tools to extract insight. For the design of quay walls, this data includes soil properties, soil geometry and design variables. The hybrid optimization routine combines the use of data with a theory-based model. A theory-based model, in contrast to databased models, is based on scientific understanding of a system or process, e.g., determining slope stability with Bishop's method, or soil behavior under cyclic loading.

From the work in this thesis, it is shown that hybrid optimization routines were able to identify the optimum with respect to costs within an acceptable timeframe. With the use of machine learning techniques, the total computation time was significantly reduced compared to uninformed sampling techniques.


Sammanfattning

We live today in a world where technology develops rapidly. Companies must change and develop their operations in order to stay at the technological forefront. At the Dutch consultancy firm Movares, the importance of digital development has been recognized. The goal is to apply digital tools in the day-to-day work to enable a focus on innovation. One of these applications is the design of quay walls.

In this degree project, an automated optimization routine for the design of quay walls is proposed. Today, the design is mainly influenced by site-specific variables that are determined based on engineering judgement, which in turn is based on norms, design models, material properties and experience. The lack of an integrated analysis of the design variables makes it difficult to judge the efficiency of the design process with respect to costs. In addition, the speed of this trial-and-error process is limited by human interaction with the software.

The automated optimization routine proposed in this degree project attempts to solve these problems. In an automated routine, the engineer's task shifts from evaluating results to formulating optimization problems. The engineer works at a higher level and an algorithm is responsible for evaluating the intermediate results. The proposed routines can be described as a databased, or data-driven model-free, routine and a hybrid routine. The databased routine bases its evaluation solely on data and relies on statistical tools to extract insight.

For the design of quay walls, this data includes soil properties, soil geometry and design variables. The model-free routine is best described as a databased model that bases its evaluation solely on data. The hybrid optimization routine combines the use of data with a theory-based model. A theory-based model, in contrast to databased models, is based on scientific understanding of a system or process, e.g., determination of stability with Bishop's simplified method or soil behavior under cyclic loading.

The work in this study shows that hybrid optimization routines can identify the optimum with respect to costs within a reasonable timeframe. With the use of machine learning techniques, the total computation time was significantly reduced compared to uninformed sampling techniques.


Acknowledgements

This thesis concludes my master's studies in Civil and Architectural Engineering at KTH, Stockholm. The thesis was conducted at the Dutch engineering firm Movares in Utrecht.

First, I would like to thank my industrial supervisor Ir. Thom Olsthoorn for his continuous support and clever insights in times of feeling stuck. His help, on both the programming side and the geotechnical side, proved to be invaluable. I would also like to thank my examiner, Prof. Stefan Larsson from the Division of Soil and Rock Mechanics at KTH, for his guidance and help on the geotechnical side.

Finally, I would like to thank the constructive hydraulic engineering team at Movares for their supportive feedback and their openness. I really enjoyed my time at the company thanks to their pleasant work environment. A special thanks to Martijn v.d. Elzen for taking me on board this project and involving me in the team.

Sassenheim, July 2021

Joris Langeveld


Nomenclature

Greek symbols

Symbol   Description                    Unit
y        Objective / target variable    -
x        Design variable                -
x*       Optimal design variable        -
θ        Gradient                       -
λ        Learning rate                  -
ε        i.i.d. error                   -
φ        Friction angle                 deg
δ        Wall friction angle            deg
δ        Horizontal displacement        mm
σ        Stress                         MPa
σ        Standard deviation             -
σ²       Variance                       -
μ        Mean                           -
ρ        Density                        kg/m³
γ        Specific weight                kN/m³

Latin symbols

Symbol   Description                    Unit
M        Moment                         kNm
V        Shear force                    kN
N        Normal force                   kN
K        Earth pressure coefficient     -
c        Cohesion                       kPa

Subscripts

Subscript   Description
i           Instance
max         Maximum
min         Minimum
a           Active
p           Passive
ed          External design
rd          Resistance design
w           Weight
b           Bias


Abbreviations

EC Eurocode

AI Artificial intelligence

TPE Tree-Structured Parzen estimator

ML Machine learning

ANN Artificial neural net

MLP Multilayered perceptron

KNN K-nearest neighbor

CPT Cone penetration test

GWL Ground water level

RSE Root square error

RMSE Root mean square error


Table of Contents

1 Introduction
1.1 Background
1.2 Aim and Objectives
1.3 Approach
1.4 Outline

2 Literature study
2.1 Optimization theory
2.1.1 Domain space
2.2 Machine learning concepts
2.3 Machine learning algorithms
2.3.1 K-nearest neighbor (KNN)
2.3.2 Multilayer perceptron (MLP)
2.3.3 Decision tree
2.4 Hyperparameter search
2.4.1 Gradient descent
2.4.2 Bayesian model-based optimization

3 Method
3.1 Case study Delftse Schie
3.2 D-Sheet Piling
3.3 Modeling of the objective function
3.4 Experimental setup
3.4.1 Hybrid optimization routine
3.4.2 Databased optimization routine
3.4.3 Input

4 Results
4.1 Hybrid optimization kg's of steel
4.2 Hybrid optimization displacement and moment capacity
4.3 Hybrid optimization displacement, moment capacity and stability
4.4 Hybrid optimization displacement, moment capacity, stability and free height
4.5 Result hybrid optimization routine

5 Discussion and further work
5.1 Considerations databased optimization
5.2 Discussion
5.3 Future work

6 Bibliography

Appendix A
A.1 CPT case study
A.2 Soil parameters case study

Appendix B
Random grid search
TPE


1 Introduction

1.1 Background

Quay walls are used alongside waterways, canals and harbors, where they act as earth retaining structures, provide a place for ships to berth and facilitate the transshipment of goods [1]. One type of quay wall design commonly used in the Netherlands is a sheet pile wall. A sheet pile wall is an earth retaining structure consisting of a barrier of interlocking steel profiles providing horizontal stability to the watersides. There are two main types of sheet pile wall designs: anchored sheet pile walls and cantilevered sheet pile walls. A cantilevered sheet pile wall derives its support completely from the surrounding soil, whereas an anchored sheet pile wall derives its support partly from the surrounding soil and partly from an anchor. Cantilevered walls are often used when the height of the backfill is below five meters; when the height exceeds five meters, it is usually more economical to choose an anchored wall [2].

Even though these designs are common in the Netherlands, at Movares (a large Dutch engineering firm) they are designed with a trial-and-error-based approach. Design variables, e.g., what type of sheet piling to use and the dimensions of the sheet piling, are chosen by the engineer beforehand; these decisions are based on engineering judgement. Depending on the results, the process may involve one or more iterative steps to optimize in terms of material usage or to minimize deformations. However, this process is time-consuming and ends when time and/or money runs out, which may lead to suboptimal designs. Furthermore, the initial design choices might not have driven the design into the most effective part of the design space; without a global image of the design space it is easy to mistake a local optimum for a global optimum. There may be other designs that are also suitable solutions for the structural problem, so it becomes a matter of exploitation versus exploration.

Movares is seeking to develop tooling for automating the optimization process in the design of quay walls. They want to achieve this by incorporating Machine learning (ML) into the design process. The hope is that by automating the optimization process valuable insights can be obtained in the early stages of a project. Because of high uncertainties during these early stages, it is paramount this happens with little allocation of resources.

A recent literature study [3] on the use of Artificial intelligence (AI) and ML methods in civil engineering showed that a lot of research has already been done in the realms of structural health monitoring (SHM) and damage detection, either vision-based or data-driven. The authors conclude that AI methods in civil engineering are no longer in their initial phase, and that further studies should propose alternative solutions for the adopted AI methods in which optimal parameters for the algorithms are considered to improve accuracy. Future studies should also provide a clear presentation of the process by which the optimal algorithmic parameters are chosen (i.e., training, validation, and testing), as this is essential to ensure the best performance of any AI-based methodology.

However, applications in the realm of structural optimization still seem to be rare in the open literature; few studies have been found. In a recent application in the field of structural optimization, the authors used deep learning (DL), a form of ML, for the structural optimization of a 10-bar truss system [4]. The same approach was then extended to more complex structures with similar results. The authors emphasize the importance of optimal algorithmic parameters as a condition for successfully integrating DL into structural engineering. In [5] the author compares ML techniques for the optimization of steel endplates. An optimization tool was used to collect data from one thousand optimized connections; this data was then used to train various ML models, and their performance was compared. On a test case, the results show the ML algorithm was able to save material. The author concludes by expressing the possibilities of AI in the construction industry: it could release engineers from reinventing the wheel and reduce design time.


1.2 Aim and Objectives

The design of quay walls is influenced by project-specific variables, e.g., soil parameters, water levels, etc., and design variables, e.g., type of sheet pile and sheet pile length. Currently, design variables are often determined based on engineering judgement, which means the effectiveness of the design depends on the experience of the engineer. However, the lack of an integral analysis makes it difficult to judge the efficiency of the design in terms of costs compared to other designs.

This research aims to design an automated optimization routine that can aid engineers in optimizing their designs in a computationally efficient way. The goal is to move from a heuristic way of designing to a more efficient one. Optimization is expressed in terms of material usage, specifically kilograms of steel, which relates directly to costs.

The main objectives of this study are to:

• Design an automated routine for finding the optimal (set of) design variable(s) in a design space.

• Generate insight into how changes in the design variables influence the design and how the design variables relate to each other.

• Limit computation time (effective use of resources).

• Provide clear insight into the process by which optimal algorithmic parameters are chosen.

1.3 Approach

The first step in optimization is building a quantitative model that expresses the relationship between the design variables and the feature of interest. In civil engineering, we usually have a good understanding of these relationships; they are based on physics and mechanics and form the basis of every design tool [6]. However, as our understanding of these physical processes grows, so does the complexity of our design tools. It is not uncommon for some of these calculations to run for several hours. A slight adjustment of a single variable requires a new calculation, making iterative optimization ineffective. In contrast, an ML model can draw from previous knowledge [7]. ML models rely on data to construct a functional map between the input and output. Such models are referred to as data-driven or databased methods.

In this study, two methods are suggested: a hybrid optimization routine, shown in Figure 1, and a databased optimization routine, shown in Figure 2. The hybrid optimization routine is a combination of an ML algorithm for sampling design variables and a theory-based design tool for analyzing the results. The data-driven optimization routine utilizes the same technique for sampling design variables, but the theory-based design tool is replaced by a databased ML model.

Figure 1 Hybrid optimization routine

Figure 2 Databased optimization routine


Both methods rely on hyperparameter search. Hyperparameter search is ML lingo for the fine-tuning of the hyperparameters of an ML model, e.g., learning rate, number of layers, number of neurons, etc. In recent years multiple developments have been made on this topic, moving from uninformed search methods such as grid search and random search to more informed methods based on ML techniques [8, 9, 10]. The idea is that these ML techniques can be used not only for fine-tuning the ML model parameters in the second method, but also for sampling the design variables.

A case study was conducted on a recent quay wall project. To perform the study within a limited amount of time, some simplifications and assumptions were made:

• The quay wall consisted of a cantilever sheet pile wall (common in this area of the Netherlands).
• The structural analysis was carried out in 2D.
• Dynamic assessment of the cantilever sheet pile wall was not performed.
• The following design variables were considered:
  - Length of the sheet pile
  - Steel quality
  - Type of the sheet pile
  - Embedment depth

1.4 Outline

Chapter 2 provides the relevant background material on optimization theory and the different ML techniques applied in this study. The importance of hyperparameters in ML models is explained and two methods for determining hyperparameters are discussed. One based on gradient descent, and one based on Bayesian statistics.

In Chapter 3 the methods and the experimental setup used for testing the optimization routine are presented. Two routines are proposed: one based on a combination of theory and data, and one based on data alone. The first routine uses an unsupervised ML classifier in combination with a theory-based model. The second routine uses a supervised ML model trained on sample data. A case study is selected for comparison; furthermore, the theory-based software and project-specific input are described.

Chapter 4 is concerned with presenting the results of the study. Four different objective functions have been tested; a correlation study has been done to measure the effectiveness of the objective function compared to the target variable.

Finally, in Chapter 5 the results of the study are discussed and conclusions are drawn. Improvements to the study are proposed and recommendations for future research are made.


2 Literature study

2.1 Optimization theory

Optimization is an essential aspect in many fields of study and is well established in fields such as aerospace engineering and energy engineering [11, 12]. The goal of optimization is to find the best solution to a problem by minimizing a certain cost function. Optimization problems are often formulated in terms of cost efficiency, material usage, environmental impact, or a combination of these. With increasing attention to environmental issues and the need to move towards a sustainable society, reducing environmental impact has become another significant objective because of the considerable CO2 emissions of the civil engineering industry [13].

Applications within structural optimization often consist of large-scale problems, with complex and sometimes conflicting relationships between the design variables and the target variable (the variable for which one is optimizing), making it very difficult to find suitable design variables by hand. However, thanks to the rapid growth of computational power in recent years, it is now possible to solve these problems.

In mathematical terms an optimization problem can be described as:

\[ x^* = \underset{x}{\arg\min}\, f(x) \tag{1} \]

where

\( x^* \) = the optimal (set of) design variable(s)

\( f(x) \) = the objective function

There exist two important types of sets in optimization theory: convex (or concave) sets and non-convex (or non-concave) sets. For a set to be convex, any two points in the set must be connectable by a straight line that does not cross any of the boundaries of the set [14], as shown in Figure 3. The main difference is that optimization techniques on a convex set are guaranteed to reach an optimum (if certain restrictions are respected). For a non-convex set this is not always true.

In practice, most optimization problems cannot be described by continuous, smooth mathematical functions, and therefore often result in non-convex optimization problems. In these cases, it is not always clear whether a global optimum or a local optimum has been reached. One solution in such a case is to run several optimizations, each with a different starting point inside the design space [6].
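
As a simple illustration of this multi-start idea, the sketch below (not taken from the thesis) restarts a local optimizer from several random points and keeps the best result; the objective function, the starting range and the use of SciPy are assumptions made purely for illustration.

```python
# Multi-start local optimization sketch: several random starting points
# reduce the risk of mistaking a local optimum for the global one.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Hypothetical non-convex objective with several local minima
    return np.sin(3.0 * x[0]) + (x[0] - 1.0) ** 2

rng = np.random.default_rng(0)
starts = rng.uniform(-3.0, 3.0, size=10)           # different starting points
results = [minimize(f, x0=[s]) for s in starts]    # one local search per start
best = min(results, key=lambda r: r.fun)           # keep the best optimum found
print(best.x, best.fun)
```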

A problem with two or more objective functions is called a multi-objective problem. If these objectives conflict, they cannot be minimized by the same set of design variables. Instead, several sets are identified, called Pareto optimal solutions. Each of the sets minimizes one of the conflicting objective functions, and it is not possible to decrease one of the objective values without increasing the other(s) [8]. For example, extensive surveying of the soil at a construction site would allow an engineer to design with a smaller safety margin, reducing costs. However, the extensive surveying is also costly, so in this example we try to balance costs and uncertainty.

Figure 3 Convex (left) set vs. non-convex set (right) [14]


2.1.1 Domain space

The set {x_1, …, x_n} of design variables x ∈ ℝⁿ is called the domain space [14], e.g., length of the sheet pile, type of the sheet pile and steel quality. The domain space of even a small optimization problem can quickly grow very large, because the number of combinations multiplies with every added variable. To limit the extent of the domain space, we can specify constraints. Constraints can be either equality or inequality constraints, and they can be divided into two categories: linear constraints and nonlinear constraints.

Linear constraint:

\[ x_{min,i} < x_i < x_{max,i} \]

Nonlinear constraint:

\[ x_i^2 \le 1.0 \]

The boundaries created by these constraints mark the feasible region, this is the region in which our optimum lies. From points within this region we can identify the directions that optimize our objective function [14].

After having determined the domain space and feasible region, the next step is determining the distribution(s) from which we sample our set of design variables. A common approach is to sample from a uniform distribution, in which every design variable value has the same chance of being sampled. This is a good approach if it is not evident where an optimum might be, usually when the optimization problem is very complex. However, if we have some understanding of the optimization problem (domain knowledge), it might be more efficient to target our sampling around specific points in the domain space. For instance, if we know that the length of a sheet pile wall becomes cost-ineffective beyond a certain point and that the length can never be lower than zero, we could choose to sample from a log-normal distribution, shown in Figure 4. Or, if we know a certain design variable is often centered around a particular value, we might want to sample from a normal distribution, shown in Figure 5.

In probability theory, such information is called a prior probability distribution, or a prior Pr(x) [15]. Not only does this information help with efficient sampling, it is also used by the algorithm in section 2.4.2, which is based on probabilistic modeling. The algorithm fits the design variables to a distribution; by introducing the distribution beforehand, we can reduce the initiation phase of the algorithm and improve prediction accuracy.

Figure 4 Log-normal distribution

Figure 5 Normal distribution
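
A minimal sketch of how such priors could be expressed in code is given below; the distributions and parameter values (a 12 m centre, for instance) are illustrative assumptions, not values from the thesis.

```python
# Sampling sheet pile lengths from different prior distributions.
import numpy as np

rng = np.random.default_rng(42)

# Uniform prior: no preference within the feasible range
length_uniform = rng.uniform(8.0, 16.0, size=1000)

# Normal prior: lengths believed to centre around a particular value
length_normal = rng.normal(loc=12.0, scale=1.5, size=1000)

# Log-normal prior: strictly positive, long tail discouraging very long piles
length_lognormal = rng.lognormal(mean=np.log(12.0), sigma=0.2, size=1000)
```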


2.2 Machine learning concepts

This section provides a short summary of some of the concepts and general terminology in ML that are referenced in this study. ML models rely on data to construct a functional map between the input and output. The model can then be extrapolated on unseen data to predict output. In ML input is often referred to as features and output is referred to as labels.

2.2.1 Supervised learning

Supervised learning is one of three forms of ML. In supervised ML, features (input) and labels (output) have to be collected beforehand; these are combined into a data set called the training data. The training data is fed through the model in an attempt to match the input to the output by adjusting model parameters. Model parameters are the parameters that are adjusted during the training of the model; for example, in an MLP these include the weights and biases. The trained predictive model can then be used to extrapolate to unseen data, and its performance on unseen data determines the quality of the model [16].

2.2.2 Clustering

Clustering is a form of unsupervised learning where we only have features (input) and no labels (output); there is no supervisor. The goal is to group or cluster features together based on similar statistical properties such as distance or correlation. Based on the statistical properties of the input data, various ML techniques can be used to find and exploit regularities in the input space [16].

2.2.3 Mixture models

Mixture models are a form of clustering; in mixture models we have a set of observations, but we do not know beforehand to which distribution (class) each observation belongs. In some cases, we might not even know how many distributions are present. For simplicity, let's assume we have two classes and we know the prior distribution beforehand. If we assume the data is equally separated, the priors are P(a) = P(b) = 0.5, and the steps of the algorithm look like this [17]:

1. Start by assuming random values for the means and variances (μ_a, σ_a²), (μ_b, σ_b²).

2. For each data point x_i calculate the conditional probabilities P(a|x_i), P(b|x_i) and assign x_i to the most probable class.

3. Adjust the Gaussians (μ_a, σ_a²), (μ_b, σ_b²) to fit the newly assigned points.

4. Iterate until convergence.
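
The sketch below illustrates these steps for two one-dimensional Gaussian classes with equal priors; the synthetic data, the number of iterations and the hard assignment in step 2 follow the listed procedure, and all numbers are placeholders rather than thesis data.

```python
# Minimal sketch of the two-class mixture-model steps listed above.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(2.0, 1.0, 200), rng.normal(7.0, 1.5, 200)])

# 1. Start from rough guesses for the means and variances
mu = np.array([1.0, 8.0])
sigma = np.array([1.0, 1.0])

for _ in range(50):                                  # 4. iterate until convergence
    # 2. P(a|x_i), P(b|x_i) with equal priors; assign each point to the most probable class
    lik = np.vstack([norm.pdf(x, mu[k], sigma[k]) for k in range(2)])
    assign = lik.argmax(axis=0)
    # 3. Refit each Gaussian to its newly assigned points
    for k in range(2):
        pts = x[assign == k]
        if pts.size > 1:
            mu[k], sigma[k] = pts.mean(), pts.std()

print(mu, sigma)
```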

2.2.4 Hyperparameters

Apart from the model parameters previously discussed, many ML algorithms also have model hyperparameters, or hyperparameters for short. Unlike model parameters, these hyperparameters cannot be inferred from the training data and have to be found manually, often through an iterative process. The quality of the model is highly dependent on these parameters [18].

2.2.5 Deep learning

Deep learning is a form of an MLP, explained in detail in section 2.3.2; the term is used to describe an MLP with more than one hidden layer, although there is no clear border. Some consider an MLP with over 100 hidden layers a deep learner; usually more than three to four hidden layers is considered deep, although it also depends on the problem and the model architecture [19].


2.3 Machine learning algorithms

An ML algorithm or model is a mathematical model that extracts insight from data in the context of a problem or task. An ML algorithm usually refers to the specific ML technique, and an ML model refers to an initialized ML model ready to be used. These words are sometimes used interchangeably, but here algorithm refers to the technique and model refers to an initialized model. ML algorithms can be divided into three classes: supervised, unsupervised and reinforcement learning. Unsupervised learning is mostly concerned with clustering data; an example of unsupervised learning is an algorithm that can classify soils based on their soil properties. Reinforcement learning is mostly concerned with a decision-making process based on a system's state, involving a flow in time or events that happen in sequences [16]. A theoretical example of reinforcement learning related to geotechnical engineering would be a self-advancing TBM which constantly monitors states such as hydraulic pressure and drill resistance to be able to make informed decisions and guide itself. The model works with an incentive or reward function; the objective for the model is to maximize this reward by choosing the right chain of actions based on its operating state. In this example, the reward for the TBM would be to advance as fast as possible, and penalties could be added to limit actions that cause downtime.

This study focuses on supervised learning, where the idea is to use data from a theory-based model to train an ML model, replacing the computationally expensive theory-based model. In [5] the author compares ML techniques for the optimization of steel endplates. An optimization tool was used to collect data from a thousand optimized connections; this data was then used to test various ML algorithms. Amongst the top performers were a KNN algorithm and an MLP algorithm. In this study, three ML algorithms will be tested: a KNN algorithm, an MLP algorithm and a tree-based algorithm. The advantage of using KNN is that it is a well-established algorithm, it is easy to grasp and it has no model parameters to train. In contrast, the MLP algorithm is more complex and has many hyperparameters; however, in combination with hyperparameter search good results have been achieved [9]. Finally, the tree-based algorithm is chosen because it requires little to no data preparation and has few model parameters, therefore achieving good out-of-the-box results.

2.3.1 K-nearest neighbor (KNN)

K-nearest neighbor is a well-established algorithm in the ML community, with its origins dating back to 1951. It was first developed by Evelyn Fix and Joseph Hodges in [20]; the essence is discriminating between two distributions F, G based on the value of a random variable x ∈ X belonging to either F or G. As an example, F could be the class sand and G could be the class clay, with X a series of input data, e.g., cohesion, soil friction angle and density. A strength of the algorithm is that it has no model parameters to train. The process of classifying data is as follows: first, the input data is fed into the algorithm and the distance to the other datapoints is calculated. The distance can be calculated in multiple ways; the most common are:

Euclidean distance:

\[ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{2} \]

Manhattan distance:

\[ d(x, y) = \sum_{i=1}^{n} |x_i - y_i| \tag{3} \]

Hamming distance:

\[ d(x, y) = \sum_{i=1}^{n} |x_i - y_i|, \qquad |x_i - y_i| = \begin{cases} 0 & \text{if } x_i = y_i \\ 1 & \text{if } x_i \neq y_i \end{cases} \tag{4} \]

After calculating the distance between the existing datapoints and the new input vector, the new input vector is classified. To classify the new input vector, a majority vote is issued between the k nearest datapoints; for example, if 3 of the 5 nearest datapoints are blue and only 2 of the 5 are red, the point is classified as blue.
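
A minimal sketch of this procedure is shown below, using the Euclidean distance of Eq. (2) and a majority vote over the k nearest points; the soil-classification data is invented purely for illustration.

```python
# K-nearest-neighbour sketch: classify a soil sample as 'sand' or 'clay'
# from (cohesion, friction angle) features.
import numpy as np
from collections import Counter

X_train = np.array([[1.0, 32.0], [0.5, 35.0], [15.0, 22.0], [12.0, 20.0], [2.0, 30.0]])
y_train = np.array(["sand", "sand", "clay", "clay", "sand"])

def knn_predict(x_new, k=3):
    dist = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))   # Euclidean distance, Eq. (2)
    nearest = y_train[np.argsort(dist)[:k]]                # labels of the k closest points
    return Counter(nearest).most_common(1)[0][0]           # majority vote

print(knn_predict(np.array([3.0, 29.0])))                  # -> 'sand'
```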

2.3.2 Multilayer perceptron (MLP)

A multilayer perceptron (MLP) is a model consisting of a network of simple interconnected perceptrons, or nodes. The model is a (non-)linear mapping between an input vector and an output vector. The nodes are connected by edges that carry a certain weight; inside each neuron the weighted inputs are summed, and a bias is added. The bias serves to ensure that a zero-valued input can still produce a signal. This summed value is called the activation potential, and it is fed through an activation function. The activation function determines the final output of each node [21]. It is the superposition of many simple nonlinear activation functions that enables the multilayer perceptron to approximate non-linearity in a model. If the activation function were linear, the multilayer perceptron would only be able to model linear functions.

The output of a node is scaled by the connecting weight and fed forward as an input to the nodes in the next layer of the network. This implies a direction of information processing; hence the MLP is known as a feed-forward neural network. The architecture of a multilayer perceptron is variable, but in general it will consist of several layers of neurons. The input layer plays no computational role but merely serves to pass the input vector to the network. An MLP may have one or more hidden layers, but it has only one input and one output layer. An MLP is fully connected, or dense, when each node is connected to every node in the next and previous layer [21]. Figure 6 shows a simple example of a fully connected network with a single hidden layer.

Figure 6 Structure of a single layer dense neural net, inspired by [22]


Figure 7 Inner workings of a single neuron, inspired by [22]

Figure 7 shows the workings of a single neuron: the neuron takes the input features x_1 to x_n and multiplies them with the weights on the edges w_1 to w_n; next a bias b is added, and the total sum of these values is called the activation potential. The activation potential is then transferred through an activation function. The purpose of the activation function is to introduce non-linearity into the network; if the activation functions were linear, F_{W,b} would just be linear regression.

Activation potential:

\[ a = x \cdot w' + b \]

The MLP can be described by the mapping:

\[ Y = F_{W,b}(X) + \epsilon \]

where \( F_{W,b} \) is the neural network and \( \epsilon \) is the i.i.d. error between \( Y \) and \( F_{W,b}(X) \), or \( \hat{Y} \):

\[ \hat{Y} := F_{W,b}(X) \]
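
The following sketch mirrors these equations with NumPy: a single neuron computing its activation potential and activation, followed by a one-hidden-layer forward pass. The weights and the choice of ReLU as activation function are illustrative assumptions, not choices prescribed by the thesis.

```python
# Single neuron and one-hidden-layer MLP forward pass, following Figure 7.
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

x = np.array([0.5, -1.2, 3.0])          # input features x_1 ... x_n

# Single neuron: activation potential = x . w + b, then the activation function
w = np.array([0.1, -0.4, 0.25])
b = 0.05
neuron_out = relu(x @ w + b)

# Fully connected MLP with one hidden layer of 4 neurons and a single output
W1, b1 = np.random.randn(3, 4), np.zeros(4)
W2, b2 = np.random.randn(4, 1), np.zeros(1)
y_hat = relu(x @ W1 + b1) @ W2 + b2      # F_{W,b}(x)
```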

2.3.3 Decision tree

A decision tree is an ML algorithm often described by analogy to a tree because it has leaves and branches spreading out like a tree. It is based on logical reasoning, splitting samples at hard boundary values to classify them. Decision trees are among the most used ML techniques in practice; they are relatively easy to understand and quick to deploy [23]. Some of the main advantages of using decision trees in many classification and prediction applications are:

• Easy to interpret and explain, in contrast to other algorithms which are often considered black-box models.

• Decision trees require relatively little effort from users in terms of data preparation. Outliers do not influence results as heavily as for neural nets or regression models.

• Little to no data normalization is required. Features can range from binary to infinity without affecting the model.

• Nonlinear relationships between parameters do not affect tree performance.


Highly nonlinear relationships between variables will cause simple regression models to fail their validity checks and thus make such models invalid. Decision trees, however, do not require any assumption of linearity in the data, so we can use them in scenarios where we know the parameters are nonlinearly related [23].

Decision trees can be used for both classification and regression. In a regression tree we are looking for a continuous numerical value; in this case we group data that is close together in the same leaf and use the average of these datapoints as our prediction. For a perfect model we could give each datapoint its own leaf, which would result in a tree with perfect prediction rates on the training data. However, we are often interested in a model with some generality to it, because we want to make predictions on unseen data. This problem is called the bias-variance trade-off: our perfect model would have high variance on unseen data. It is a well-known problem in ML, and there are several techniques to counteract it. A single decision tree oftentimes performs well in training but fails to generalize. A technique to prevent this problem is called random forest; it involves combining the predictions of multiple decision trees. Instead of training each of the trees on the entire dataset, as we would do for a single decision tree, we split the data into random samples. By sampling only a small part of the data and training the trees separately, the model is more robust and better able to predict outside of the training data.
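
A possible sketch of such a random forest regressor, assuming scikit-learn and synthetic placeholder data (neither is taken from the thesis), is shown below.

```python
# Random forest regression sketch: each tree is trained on a bootstrap sample.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 4))                   # e.g. soil and design variables
y = X[:, 0] * 10.0 + np.sin(X[:, 1])       # synthetic target, e.g. displacement

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(model.score(X_te, y_te))             # R^2 on unseen data
```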

2.4 Hyperparameter search

Hyperparameter search is ML lingo for the fine-tuning process of finding the best model hyperparameters, e.g., the learning rate, the number of layers, the loss function or the value of k in the KNN algorithm. These model hyperparameters, unlike model parameters, cannot be inferred from the training data and must be determined in an iterative manner [18]. While rough guidelines exist for the ranges and boundaries of these hyperparameters, these are by no means all-encompassing. Well-accepted methods to find these hyperparameters include inefficient exhaustive stochastic searches. Since each iteration involves training the ML model with the chosen set of hyperparameters and then evaluating its performance, even for small models this process can easily take up days. Hyperparameter search is often regarded as one of the most cumbersome tasks in machine learning projects [9].

Apart from the quality of the relationship between the input and the feature of interest, the performance of an ML model largely depends on the fitness of its hyperparameters. Recent developments have made the community shift away from exhaustive, uninformed search methods like grid search and random search to more informed methods of searching, e.g., gradient descent and ML-aided search techniques [9]. The intent of these algorithms is to search for hyperparameters in a more efficient manner.

2.4.1 Gradient descent

One of the techniques often used in optimization is gradient descent. Gradient descent is based on the principle that if we move in the direction of the negative gradient, we are moving towards a minimum [14]. The idea behind gradient descent is to start with a stochastically determined point in the design space and move it iteratively in the direction of steepest descent. This is the direction of the largest negative gradient vector, −∇C. The learning rate λ determines how large the update or moving step is. If λ is too small, the algorithm might converge very slowly. Large λ values can cause issues with convergence or even make the algorithm divergent.

The step size is determined according to the formula:

\[ \theta_{new} = \theta_{old} - \lambda \cdot f'(\theta_{old}) \tag{5} \]


For example, if the objective function has only a single independent variable v, say f(v) = v², its gradient is the derivative 2v. It is a differentiable convex function, and we can find a minimum by analytical means. In practice, however, we might not always know the function beforehand, making it impossible to differentiate analytically, and the gradient is often approximated with numerical methods [24].

Figure 8 Gradient descent, z = x² + y²
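
A minimal sketch of gradient descent on the function of Figure 8 is given below; the starting point and learning rate are arbitrary choices for illustration.

```python
# Gradient descent on z = x^2 + y^2, whose gradient is (2x, 2y).
import numpy as np

theta = np.array([4.0, -3.0])          # arbitrary starting point
lam = 0.1                              # learning rate (lambda)

for _ in range(100):
    grad = 2.0 * theta                 # gradient of x^2 + y^2
    theta = theta - lam * grad         # step in the direction of steepest descent

print(theta)                           # converges towards the minimum at (0, 0)
```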

2.4.2 Bayesian model-based optimization

In [8] the authors state the importance of hyperparameter optimization; they reference other papers that achieve high performance due to concerted hyperparameter search rather than innovative modeling or groundbreaking ML algorithms. The authors introduce two new sequential hyperparameter search algorithms based on Bayesian models. Studies were conducted on a set consisting of a 32-dimensional search space with a mix of categorical parameters, continuous parameters, and stepped parameters. Both algorithms performed significantly better compared to manual search and brute force. One of the algorithms, the tree-structured Parzen estimator or TPE, is based on an ML technique called mixture models; mixture models are a form of unsupervised learning used for clustering data. Mixture models are a form of soft clustering: instead of classifying objects based on a hard boundary, they assign probabilities to an object belonging to a certain class. The main advantage over other clustering algorithms is that mixture models include information about the variance and covariance in the data.

The TPE algorithm performs the following steps [8]:

1. Initialize algorithm by random sampling from the domain space and compute the objective function 𝑦 for the selected samples.

2. Split the results into two groups, an upper quantile and a lower quantile, defined by a boundary y*, where y denotes the value of the objective function, e.g., the score, costs, or validation results.

3. Model the likelihood p(x|y) for x being in each of these groups according to Eq. (6).

4. Model the prior 𝑃𝑟(𝑦) using the fact that 𝑃𝑟(𝑦 < 𝑦∗) = 𝛾

5. Find the hyperparameters that perform best on the surrogate model using expected improvement EI.

6. Apply these hyperparameters to the true objective function.

7. Update the surrogate model incorporating the new results.

8. Repeat steps 2–7 until the maximum number of iterations or a certain time is reached.


The algorithm models p(x|y) as:

\[ p(x \mid y) = \begin{cases} l(x) & \text{if } y < y^* \\ g(x) & \text{if } y \ge y^* \end{cases} \tag{6} \]

The expected improvement:

\[ EI_{y^*}(x) = \int_{-\infty}^{y^*} (y^* - y)\, p(y \mid x)\, dy \tag{7} \]

which after some mathematical derivation can be reduced to:

\[ EI_{y^*}(x) = \frac{\gamma\, y^*\, l(x) - l(x) \int_{-\infty}^{y^*} p(y)\, dy}{\gamma\, l(x) + (1 - \gamma)\, g(x)} \;\propto\; \left(\gamma + \frac{g(x)}{l(x)}\,(1 - \gamma)\right)^{-1} \]

The expected improvement is proportional to the right-hand term: maximizing the likelihood of x under l(x) while minimizing the likelihood under g(x) increases EI_{y^*}(x).
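
As an illustration, the sketch below runs a TPE-based hyperparameter search with the Optuna library. This is only one possible implementation (the thesis does not prescribe a specific package), and the KNN model, the placeholder data and the search range for k are assumptions made for the example.

```python
# TPE-based hyperparameter search sketch: tune k of a KNN regressor.
import numpy as np
import optuna
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(200)

def objective(trial):
    k = trial.suggest_int("n_neighbors", 1, 30)           # hyperparameter being searched
    model = KNeighborsRegressor(n_neighbors=k)
    # The split into l(x)/g(x) around y* is handled internally by the TPE sampler
    return -cross_val_score(model, X, y, cv=5).mean()     # value to minimise

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```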


3 Method

3.1 Case study Delftse Schie

A case study was used to compare the results of the proposed automated optimization routines to a conventional calculation. This specific project was chosen because its design is representative of many river/channel bank projects throughout the Netherlands, favored for its simple design, fast construction and long life span. Because of its relatively simple design, we can focus on the performance of the optimization routine.

The case study concerns the maintenance of the waterway profile and watersides of waterway 1, the Delftse Schie. Waterway 1 is managed and maintained by the province of Zuid-Holland. The trajectory consists of the Haagse Trekvliet, the Delftse Schie and the Rijn-Schie channel. It is an important route for shipping traffic between the port city of Rotterdam and The Hague. Furthermore, the channel is used by various industries situated alongside it, which all make frequent use of it. Currently, the channel is in poor condition, and several measures have to be taken to ensure the safety of its users and the surroundings for present and future usage. One of these measures involves replacing the old sheet piling along the watersides at location 6, shown in Figure 10, by installing new sheet piling [25].

The watersides at location 6 will be replaced by a quay wall consisting of a sheet pile wall below the water table and an inclined basalt stone slope (3:1) above the water table, according to Figure 10. The sheet pile wall is a free-standing cantilevered retaining wall with a retaining height of around 4.0 meters. It was calculated with three construction stages: first the installation of the sheet pile with a temporary support structure, next the removal of the support structure, and in the final stage the application of a surcharge load from the nearby road. The water levels were kept constant at -0.43 m NAP throughout the stages and in summer and winter. The reference period for the structure is 100 years.

The calculations involved overall stability assessment, displacements, and internal structural capacity in accordance with EC7 and the Dutch national annexes. The conventionally calculated sheet pile profile is of type AZ 12-700 with an embedment depth of 12.0 meters and a total length of 12.3 meters. This resulted in a maximum displacement at the top of the sheet pile wall of 22 mm and a moment capacity ratio of 0.54.

Moment capacity ratio:

\[ U.C. = \frac{M_{Ed}}{M_{Rd}} \]

Figure 9 Section detail case study Delftse Schie [25]

Figure 10 Overview location 6 case study Delftse Schie


3.2 D-Sheet Piling

The software used to analyze the case study is a tool called D-Sheet Piling, a design tool developed by Deltares. The majority of all Dutch quay walls have been designed with tools from Deltares. Features of the software include arbitrary soil profiles, varying surcharge loads, support structures such as anchors and struts, internal forces, and overall stability.

D-Sheet Piling models the sheet piling as an elasto-plastic beam on a foundation of a discrete number of uncoupled elasto-plastic springs representing the soil. These simplifications transform the interaction between the soil and the sheet pile into a one-dimensional model. A compressive normal force will introduce additional bending; the user can introduce normal forces, and D-Sheet Piling will calculate the additional moments and displacements that follow from the inputted normal force.

The soil layers can be defined by hand, or they can be interpreted and generated automatically from a CPT, optionally in combination with a non-horizontal surface level. D-Sheet Piling models the stiffness of the soil as a series of discrete, independently acting, multi-linear springs, forming an elastic foundation for a beam (which is used to model the retaining structure). Furthermore, the active and passive earth pressures σ_a and σ_p (Figure 11) depend on deformation, with the effects of loading and unloading included.

Figure 11 Relation between lateral earth pressure and deformation [26]

Overall stability is checked using Bishop's method in accordance with Eurocode 7 [27]. With Bishop's method the slip surface is divided into equal slices; each slice is subjected to a vertical force W and a normal force N, while interslice shear forces are neglected. The safety factor appears on both sides of the equation: first a safety factor is assumed, and the governing safety factor is then converged upon through an iterative procedure.

Figure 12 Bishop’s method of slices [28]


3.3 Modeling of the objective function

For the algorithm to be able to judge the quality of a design, we need to model an objective function [6]. The objective function is the function we wish to optimize; it consists of a target to maximize or minimize and a constraint or set of constraints imposed on the target. The quality of the optimization routine depends on how well the objective function represents our optimization problem. An objective function follows from a concise specification of the optimization problem.

In our case we can define our optimization problem like this:

Find the design variables sheet pile type and length to:

minimize weight

such that:

displacement < displacement_max
M_Ed < M_Rd
V_Ed < V_Rd
stability = satisfied

For our optimization problem, we have formulated the above constraints on the results from our design tool: displacement, structural integrity, and stability. Constraints can either serve as strict boundaries that a set of design variables has to adhere to, or they can serve as a limit which, if exceeded, adds a certain penalty to the objective function [6]. When using a strict limit we would disqualify a lot of iterations: if any one of our constraints (M_Ed, V_Ed, stability, etc.) is exceeded, our objective function would not return a value. We would know the suggested set of design variables is not viable, but we would be unable to convey information about other sets of design variables in its proximity.

Therefore, a penalty is introduced, transforming our constrained optimization problem into an unconstrained optimization problem [29]. A loss function describes how this penalty is applied to the objective function. It could either be added directly, or we might add more weight depending on how much a certain constraint is exceeded.

A widely adopted loss function is the quadratic loss function, which weighs the distance of the exceeded value ŷ from the optimum y and penalizes values that are farther from the optimum more heavily than values that are close to it. As an example, if the displacement constraint is 50 mm and the displacement of a design for a given set of design variables is 53 mm, the loss would be (50 − 53)² = 9.

Quadratic loss function:

\[ L(y, \hat{y}) = (y - \hat{y})^2 \]
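
A sketch of how such a penalised objective function could look in code is given below; the constraint limits, the field names and the fixed stability penalty are illustrative assumptions, not values from the case study.

```python
# Penalised objective function sketch: minimise weight subject to soft constraints.
def penalised_objective(results, displacement_max=50.0):
    score = results["weight"]                              # kg of steel to minimise

    # Quadratic penalty for an exceeded displacement constraint
    if results["displacement"] > displacement_max:
        score += (displacement_max - results["displacement"]) ** 2

    # Quadratic penalties for exceeded structural capacity
    if results["M_Ed"] > results["M_Rd"]:
        score += (results["M_Rd"] - results["M_Ed"]) ** 2
    if results["V_Ed"] > results["V_Rd"]:
        score += (results["V_Rd"] - results["V_Ed"]) ** 2

    # Large fixed penalty if overall stability is not satisfied
    if not results["stability_satisfied"]:
        score += 1.0e6

    return score
```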


3.4 Experimental setup

In this section the experimental setup is discussed; the important choices and considerations that were made and the input used in the experiment are addressed. Two optimization routines were tested: one based on a databased ML model and the other based on a hybrid model. To evaluate the routines, we compared their results to the results of the case study presented in section 3.1.

The main objective for the optimization routine was to reduce the weight of the design while respecting certain constraints, e.g., displacements, moment capacity and stability. The first step was to design an objective function that could capture these relationships. We proposed four different objective functions, each with a different penalty function, trying to capture the relationships in a different way. We checked their performance using a correlation study. The best performing function was used in both proposed optimization routines.

3.4.1 Hybrid optimization routine

For the hybrid optimization routine, shown in Figure 13, we combined an ML technique for sampling the design variables with a theory-based design tool as the objective function. In this routine we still used our computationally expensive objective function to calculate the design and obtain results; however, we used an ML technique to shape a prediction surface of the design space. From this computationally inexpensive prediction surface, the algorithm samples promising sets of design variables as the next input for the design tool. With each iteration, the prediction surface better resembles the actual shape of the design space. The project-specific parameters remained constant throughout the process. The ML technique that was used is described in detail in section 2.4.2; in short, the concept is to combine spatial distance with spatial correlation: based on their distance, sample points are assumed to share statistical properties.

Apart from ML-based sampling, we first considered uninformed sampling methods; these trials were used to establish a baseline for comparison and to judge the effectiveness of the objective function. The two methods considered were grid-based random sampling and random sampling. A total of 100 trials was conducted; this number was chosen because it gave a good approximation of the shape of the design space in relation to its size. This baseline then served as input for a non-linear approximation of the actual design space, used to visualize the performance of the sampling by the ML algorithm.

As discussed previously in section 3.3, the performance of the optimization routine also depends on the quality of the objective function; therefore, different tests were conducted with different approaches to modeling the cost function.

Figure 13 Hybrid optimization routine


3.4.2 Databased optimization routine

In this section the approach we took for the databased routine is explained. However, the databased routine was not tested due to difficulties during implementation. These difficulties are discussed in section 5.1.

For the databased optimization routine, we used the same ML technique for sampling the design variables, but we replaced the theory-based design tool with a supervised ML model. We approximated the computationally expensive design tool with a prediction made by our ML model. Apart from faster optimization, this could enable a more complete search of the design space.

The first step involved training the ML model (the prediction model) on a dataset of example calculations. Here it is important to consider the contribution of project-specific data to the predictive power of the model, since in this case the data itself builds the predictive model. To ensure generality of the predictive model, the training data should preferably be sampled from a wide variety of projects.

Next is the structure of the data: the number of input values must remain constant for our ML techniques, yet the number of soil layers varies between projects. One method would be to convert an image of a CPT into an input feature by vectorizing it; however, this adds a lot of extra complexity and makes the model very large. A much simpler way is to reserve slots for input values and leave them at zero if unused; the model should then figure out their use.
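A minimal sketch of this placeholder-slot idea is shown below; the maximum number of layers and the three parameters per layer are assumptions chosen for illustration only.

```python
import numpy as np

MAX_LAYERS = 12   # assumed upper bound on the number of soil layers

def layers_to_fixed_input(layers):
    """Pad a variable-length list of (thickness, saturated weight, friction angle)
    tuples with zeros so the model always receives the same number of input values."""
    x = np.zeros((MAX_LAYERS, 3))
    for i, layer in enumerate(layers[:MAX_LAYERS]):
        x[i] = layer
    return x.ravel()

# A profile with only 3 layers still yields a fixed-length vector of 36 values
features = layers_to_fixed_input([(2.0, 17.0, 30.0), (4.5, 19.0, 25.0), (6.0, 20.0, 32.5)])
```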

After having determined the shape and structure of the input, we can train the model. Training is done by presenting the model with the input data and the matching output; the model is then adjusted for each batch by backpropagating the prediction error through the network.

Finally, the model hyperparameters are adjusted in an iterative manner until a satisfactory accuracy is reached. This involves rerunning the previous step and creating a new model each time. This step can be very time consuming but only has to be performed once.

For the predictive model the following ML techniques are considered (a minimal instantiation sketch follows the list):

KNN
MLP
Random forest
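A minimal sketch of instantiating these candidates with scikit-learn is shown below; the hyperparameter values are placeholders, not the tuned settings of Table 3, and X, y stand for the assembled training inputs and outputs.

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor

candidates = {
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "MLP": MLPRegressor(hidden_layer_sizes=(64, 64), activation="tanh",
                        solver="adam", learning_rate_init=0.001, max_iter=500),
    "Random forest": RandomForestRegressor(n_estimators=100, max_depth=10),
}

# Fit each candidate on the example calculations and keep the most accurate one, e.g.:
# fitted = {name: model.fit(X, y) for name, model in candidates.items()}
```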

Figure 14 Databased optimization routine


3.4.3 Input

In the experiment there are several categories of input. First, the project-specific input: this input varies per project but remains static throughout the optimization process.

Table 1 Project specific input

Category          Parameter                      Symbol      Value
Partial factors   Reliability class (RC1 – RC3)              RC2
                  Reference period (50 – 100)                100 years
Soil profiles     CPT                            CPT         Appendix A.1
                  Ground water level             GWL         -0.43 m
Soil material     Unsaturated soil weight        γ_unsat     Appendix A.2
                  Saturated soil weight          γ_sat       Appendix A.2
                  Cohesion                       c           Appendix A.2
                  Friction angle                 φ           Appendix A.2
Loads             Basalt slope                   F           12 kN
                  Outer load                     q           20 kN/m

Next, we have the set of design variables; these are the parameters that are free to change each iteration. For simplification, we only included the most common sheet pile types (AZ types), and the length was limited from 8 to 18 meters with intervals of 0.1 meter.

Table 2 Design variables

Design variable      (start; stop; step)
Sheet pile length    (8.0; 16.0; 0.1) m
Sheet pile type      All AZ profiles
Steel quality        S240

Finally, we have the model hyperparameters. These parameters are tuned only once, when the ML model is trained on our data. Some of these values depend on the size of the model, and there is no consensus on the best ranges and boundaries; rough guidelines exist and were followed. A sketch of expressing these ranges as a search space follows the table.

Table 3 Model hyperparameters

Model           Hyperparameter                    Range / options
KNN             Number of neighbors (k)           –
MLP             Number of layers (l)              –
                Number of neurons per layer (n)   –
                Activation function               tanh, ReLU, …, Sigmoid
                Loss function                     MSE
                Optimizer                         SGD, RMSprop, …, Adam
                Learning rate                     (0.0001, 1)
                Dropout                           (0.1; 0.5; 0.05)
                Batch size                        (5; 20; 5)
Random forest   Number of estimators              (100; 1000; 100)
                Max depth                         (1; 30; 2)
                Min samples split                 (1; 40; 1)
                Min samples leaf                  (1; 20; 1)
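As an illustration, the MLP rows of Table 3 could be expressed as an Optuna-style search space as sketched below; the bounds the table leaves open (number of layers, neurons per layer) are filled with assumed placeholder values.

```python
def mlp_search_space(trial):
    """Hypothetical search-space definition mirroring the MLP rows of Table 3."""
    return {
        "n_layers": trial.suggest_int("n_layers", 1, 5),        # assumed bounds
        "n_neurons": trial.suggest_int("n_neurons", 8, 256),    # assumed bounds
        "activation": trial.suggest_categorical("activation", ["tanh", "relu", "logistic"]),
        "optimizer": trial.suggest_categorical("optimizer", ["sgd", "rmsprop", "adam"]),
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1.0, log=True),
        "dropout": trial.suggest_float("dropout", 0.1, 0.5, step=0.05),
        "batch_size": trial.suggest_int("batch_size", 5, 20, step=5),
    }
```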


4 Results

In this chapter, the results of the study are summarized, and a short explanation is given of how these results were obtained. As discussed in section 3.3, the performance of the optimization routine is highly dependent on how well the objective function represents the design problem. Therefore, four different objective functions were tested, each with a different design approach. The first takes a direct approach by optimizing for the weight of the sheet pile with the use of a penalty function. The second is an objective function involving a quadratic loss function on displacement and moment capacity. The last two functions evolved from the results of the second; they include a factor for the stability and for the free height.

For each test, after having defined the optimization problem, the first step was to run the objective function with randomly generated design variables on the case study for a total of 100 trials. This was done to gain insight into the performance of the objective function; it allowed us to easily spot extrema and the distributions of viable versus failed designs. The number of trials was chosen based on the dimensions of our design space and gave good visual guidance.

Furthermore, a correlation study was conducted to give an indication of the influence of the design variables on the objective function and, most importantly, to analyze the effectiveness of the objective function in optimizing the weight of the sheet pile wall. Based on the results of each test, one of the objective functions was chosen to be used in the final optimization routine in conjunction with the ML algorithm.

4.1 Hybrid optimization kg’s of steel

The first approach was to directly optimize for the weight of the sheet pile with the use of a penalty function. If any one of the design constraints was exceeded, an arbitrary penalty was added to the total weight. Two design variables, length and type, were chosen randomly by the algorithm in each iteration. The sheet pile wall was then tested for displacement, stability and several cross-section design checks, including moment capacity, shear force capacity and buckling. If a single design check failed, the sheet pile wall with its respective design variables was marked as failed.

Objective function: optimize for kg's of steel
    if unstable:
        penalty += …
    if displacement > displacement_max:
        penalty += …
    for each check (E_d, R_d) in [(M_Ed, M_Rd), (V_Ed, V_Rd), (N_Ed, N_Rd)]:
        if E_d > R_d:
            penalty += …
    return (kg's steel + penalty)

Pseudo code 1 Objective function optimized for kg’s steel
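A runnable sketch of this objective function is shown below; the results dictionary and the penalty magnitude are hypothetical stand-ins for the output of the design tool.

```python
PENALTY = 1.0e4   # arbitrary penalty per violated check (kg), illustrative only

def objective_weight(results, displacement_max=50.0):
    """Mirror of Pseudo code 1: steel weight plus penalties for violated design checks."""
    penalty = 0.0
    if not results["stable"]:
        penalty += PENALTY
    if results["displacement"] > displacement_max:
        penalty += PENALTY
    # cross-section checks: moment, shear force and normal force (buckling)
    for demand, capacity in [(results["M_Ed"], results["M_Rd"]),
                             (results["V_Ed"], results["V_Rd"]),
                             (results["N_Ed"], results["N_Rd"])]:
        if demand > capacity:
            penalty += PENALTY
    return results["steel_weight"] + penalty
```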


Figure 15 Design space optimization kg's steel (random grid search, 100 trials)

By plotting the design variables against the value from the objective function, we obtained the design space shown in Figure 15. To plot against numerical values, the type of sheet pile was represented by its section modulus w; furthermore, the axes were normalized. We can see the lowest point of the plane consists of failed designs; implementing this objective function would mean the algorithm optimizes towards a point where the design is not viable. Therefore, the function is not a good representation of the optimization problem.

To judge the effectiveness of the loss function and gain insight into the relations between design variables, a correlation table was generated. Values above 0.7 indicate a high correlation, values between 0.4 and 0.7 are considered moderately correlated, and values below 0.4 are considered weakly correlated [30]. The same scale applies to negative correlations, where an increase of one variable leads to a decrease of the other and vice versa.
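Such a correlation table can be produced with pandas, as sketched below; the column names and the example trial values are assumptions for illustration.

```python
import pandas as pd

# each trial: (sheet pile length, section modulus, weight, loss value) -- example values
trials = [(12.0, 1200.0, 1220.0, 0.35), (10.8, 1300.0, 1170.0, 0.28), (13.5, 1100.0, 1350.0, 0.55)]
df = pd.DataFrame(trials, columns=["length", "w_section", "weight", "loss"])

corr = df.corr(method="pearson")   # pairwise Pearson correlation coefficients
print(corr.round(2))
```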

From the correlation table shown in Figure 16 we can see the loss value and the weight are highly correlated, which is desirable. However, in this case, with knowledge of the design space, we can see the penalty function is ineffective in penalizing failed designs, since the loss value is simply the weight with the penalty added. Furthermore, there is a high correlation between the length of the sheet pile and the weight, indicating the length is more important in decreasing weight than the type of sheet pile represented by w.

Figure 16 Correlation table: Optimization kg's steel



4.2 Hybrid optimization displacement and moment capacity

The second approach was to optimize for displacement and moment capacity, with the idea that utilizing the full capacity of a design is most efficient. The maximum displacement, displacement_max, was set at 50 mm; the design moment resistance M_Rd depended on the chosen sheet pile. In this case a quadratic loss function was used, which penalized designs not utilizing their full capacity based on the distance from the optimum: the farther away from the optimum, the higher the penalty. A similar approach was used as in section 4.1: 100 trials were conducted, and if a design check was exceeded the design was marked as failed.

Objective function:
    U.C. displacement = abs(displacement / displacement_max)
    loss += (1.0 - U.C. displacement)²      # quadratic loss function
    U.C. moment capacity = abs(M_Ed / M_Rd)
    loss += (1.0 - U.C. moment capacity)²   # quadratic loss function
    return (loss)

Pseudo code 2 Objective function optimize displacement and moment capacity

From the design space in Figure 17 we see this objective function was better able to separate the failed designs from the sufficient designs. In this case the length of the sheet pile creates the boundary between failed and sufficient designs. Secondly, in contrast to the previous objective function, there were no failed designs located at the optimum. This means the optimization algorithm would converge upon a robust optimum, where a small change in the design variables does not cause a significant change in the target variable, in this case the loss value. However, we do see a high level of variance in the failed designs, especially at the most unlikely solutions where the sheet pile length is below eight meters. This is concerning, since the algorithm could mistake these points for the real optimum.

Figure 17 Design space optimization for displacement and moment capacity (Random grid search, 100 trials)



Figure 18 Correlation table: Optimization displacement and moment cap. (Only significant designs)

In this case, only the samples/designs marked as sufficient were considered for creating the correlation table. This was done as we are only interested in reducing weight for sufficient designs. The failed designs convey information, but they are not considered as a possible optimum.

From the correlation table in Figure 18 we can see there is a moderate correlation (0.60) between the loss value and the weight. In this instance, while there was a good separation of failed and sufficient designs, the moderate correlation meant a heavier sufficient design could achieve a lower loss value than a lighter sufficient design. Furthermore, there is a weak correlation (-0.12) between the length and the loss value, while the length is a significant factor in the weight of the sheet pile. These results indicated the objective function is not suitable for the optimization problem.

4.3 Hybrid optimization displacement, moment capacity and stability

The next approach was to once more use displacement and moment capacity with quadratic loss functions, but to combine them with a penalty function for stability. If one or more of the construction stages was unstable, a penalty was added based on the total number of construction stages and a certain factor. This was done to provide more information to the algorithm, thereby increasing the range of the loss function, with the hope of providing more separation between the values.

Objective function:
    for stage in total construction stages:
        if stage == unstable:
            loss += factor * (1 / total construction stages)
    U.C. displacement = abs(displacement / displacement_max)
    loss += (1.0 - U.C. displacement)²      # quadratic loss function
    U.C. moment capacity = abs(M_Ed / M_Rd)
    loss += (1.0 - U.C. moment capacity)²   # quadratic loss function
    return (loss)

Pseudo code 3 Objective function optimize displacement, moment capacity and stability


Figure 19 Design space optimization displacement, moment capacity and stability (Random grid search, 100 trials)

The results in Figure 19 show a significant gap between the loss values of failed designs and of sufficient designs, indicating the function's effectiveness in separating good sets of design variables from poor sets. Additionally, we can see the length of the sheet pile played a significant role in the magnitude of the loss value. However, the results show only small differences in the loss value among the sufficient designs; these design configurations lie on a nearly flat plane, which makes predicting new configurations difficult.

Figure 20 Correlation table: Optimization of displacement moment cap. and stability (Only significant designs)

From the correlation table in Figure 20 we can see the weight and the loss value have a moderate to almost high correlation (0.68). Based on these results the objective function seems promising. Correlations between the section modulus w and the length are similar to those of the previous objective function. There is still a weak correlation (-0.15) between the length and the loss value, which meant there was still room for improvement by putting more emphasis on the length of the sheet pile in the objective function.



4.4 Hybrid optimization displacement, moment capacity, stability and free height

While the previous approach proved good at splitting the failed designs from the sufficient designs, there was little difference between the loss values of the sufficient designs, making it hard to identify an optimum. Furthermore, a heavier design was sometimes favored over a lighter design. This was an indication that utilizing the full capacity is not necessarily always the best design strategy: a longer sheet pile in some cases resulted in higher utilization of the moment capacity, which led to a lower loss value but at the cost of adding more weight. Therefore, in the final approach, an extra factor was included for the length of the sheet pile in relation to the free height, in the hope of increasing the correlation between the loss value and the weight.

Objective function:
    for stage in total construction stages:
        if stage == unstable:
            loss += factor * (1 / total construction stages)
    U.C. displacement = abs(displacement / displacement_max)
    loss += (1.0 - U.C. displacement)²
    U.C. moment capacity = abs(M_Ed / M_Rd)
    loss += (1.0 - U.C. moment capacity)²
    U.C. length = (length / free height * 2)
    loss += (1.0 - U.C. length)
    return (loss)

Pseudo code 4 Objective function optimize displacement., moment capacity, stability, and free height
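A runnable sketch of this final objective function is given below; the results dictionary, the free height and the stability factor are hypothetical stand-ins, and the length term follows the pseudocode literally.

```python
def objective_final(results, displacement_max=50.0, free_height=4.0, factor=10.0):
    """Mirror of Pseudo code 4; `results` is a hypothetical design tool output."""
    loss = 0.0
    # stability penalty per unstable construction stage
    n_stages = len(results["stages"])
    loss += sum(factor * (1.0 / n_stages) for stable in results["stages"] if not stable)
    # quadratic loss on the displacement and moment capacity unity checks
    uc_disp = abs(results["displacement"] / displacement_max)
    loss += (1.0 - uc_disp) ** 2
    uc_moment = abs(results["M_Ed"] / results["M_Rd"])
    loss += (1.0 - uc_moment) ** 2
    # extra term relating sheet pile length to the free height (as written in Pseudo code 4)
    uc_length = results["length"] / free_height * 2
    loss += 1.0 - uc_length
    return loss

# example call with illustrative values
loss = objective_final({"stages": [True, True, False], "displacement": 14.9,
                        "M_Ed": 250.0, "M_Rd": 400.0, "length": 10.9})
```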

Figure 21 Design space optimization displ., moment cap., stab. and free height (Random grid search, 100 trials)



Looking at the design space in Figure 21, we again see a significant gap between the loss values of failed designs and sufficient designs. In contrast to the previous objective function, we also see significant differences in the loss value between the sufficient designs. If we look at the correlation table in Figure 22, the correlation between the loss value and the weight is much higher than in the previous case (0.68 in section 4.3). Together with the fact that the length was highly correlated with the loss value, this objective function was considered the best performing one.

Figure 22 Correlation table: Optimization of displ., moment cap., stab. and free height (Only significant designs)

4.5 Result hybrid optimization routine

The function from section 4.4 was considered the best performing objective function, both in terms of correlation to weight and by inspection of the design space. This objective function was therefore used in the hybrid optimization routine, where the design variable sampling was conducted by the TPE algorithm. The order and location of sampling are shown in Figure 23; the full details of each trial can be found in Appendix B. The grey dots indicate the initialization phase where random sampling is used; from trial 10 onwards the algorithm starts predicting sets of design variables.

Figure 23 Design space optimization displ., moment cap., stab. and free height (TPE, 30 trials)



Figure 24 Overview of the loss error for different sampling methods

From the results in Figure 24 we can see the absolute error is slightly in favor of the TPE algorithm, and the elapsed search time is highly in favor of the TPE algorithm. Both the random grid search and the random search took around 27 minutes to complete, while the TPE algorithm took roughly 7.5 minutes. Furthermore, we can see the overall error was much lower for the TPE algorithm compared to the random search and the grid search.

While our intent is to optimize for weight, we do this indirectly by minimizing a loss function, and the two are not perfectly correlated. Therefore, if we look at the performance in terms of weight in Table 4, we see the TPE sampling performed slightly better than the random sampling. However, the suggested optima are very close, and there is only a slight difference in the length. If we consider the elapsed search time, the results of the TPE algorithm are promising.

If we compare the results of the optimization routine to the conventional method, we see a slight gain in terms of weight of 11.48 % at the cost of an increase in displacement of 2.75 %. In this situation there is not much to gain from the optimization, although it is interesting to see how close the results are.

Table 4 Comparison of the optimization routine and different sampling methods

Method                 Type          Length (m)   Weight (kg/m)   Displacement (mm)
Conventional           AZ 12 - 700   12.3         1220            14.5
Random grid search     AZ 13 - 700   10.8         1170            14.7
TPE                    AZ 12*        10.9         1080            14.9
Conventional vs. TPE                              -11.48 %        +2.75 %


5 Discussion and further work

5.1 Considerations databased optimization

Apart from the hybrid optimization routine, a databased optimization routine was suggested; in this routine the theory-based model is replaced by a supervised ML model. The ML model relies solely on data to model a mapping between the input and the output, which can be used to make predictions on unseen data. Due to the limited amount of time and the complications of implementing such a model, it was not achieved in this study. The pitfalls that were encountered and possible techniques to overcome these challenges in future research are discussed in this section.

The first challenge was to collect all the relevant data from different data sources and data structures and to unify the data into a single database. This required significant insight into data management, which fell outside the scope of this study.

The next challenge was to shape the data such that it can be used as input for an algorithm; the hardest part is dealing with varying input sizes. As an example, one sample might have 12 soil layers while another sample has only 3; the ML techniques suggested in this study can only handle input of a fixed size.

The following techniques could be used as a solution, in increasing complexity:

Use placeholders for empty input.
Convert the soil data image into a vector, as shown in Figure 25.
Use convolution layers to extract features from the soil data image.

The easiest way is to use placeholder values that contain the input if it exists and contain zeros otherwise. The downside is that such a model needs more training to detect the underlying data structure: it is difficult to detect a signal from an input that is zero most of the time and suddenly jumps into action.

Next is the vectorization of the soil data: we could use a single column of pixels containing information about the height and the color of each layer. Finally, we could use convolutional layers to filter useful features from the soil data into a feature map. Features, or feature maps, can be thought of as the distance between soil layers or their color; in contrast to vectorization, effective convolution can significantly reduce the input size.

Figure 25 Example vectorization of soil data
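A minimal sketch of the vectorization idea is shown below; the vertical resolution, depth range and soil type encoding are assumptions for illustration.

```python
import numpy as np

N_PIXELS = 100     # vertical resolution of the pixel column (assumed)
MAX_DEPTH = 20.0   # depth in metres represented by the column (assumed)

def soil_profile_to_vector(layers):
    """layers: list of (top depth, bottom depth, soil type id) tuples.
    Returns a fixed-length column encoding the soil type over depth."""
    column = np.zeros(N_PIXELS)
    for top, bottom, soil_id in layers:
        i0 = int(top / MAX_DEPTH * N_PIXELS)
        i1 = int(bottom / MAX_DEPTH * N_PIXELS)
        column[i0:i1] = soil_id
    return column

vector = soil_profile_to_vector([(0.0, 2.0, 1), (2.0, 6.5, 2), (6.5, 20.0, 3)])
```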


5.2 Discussion

In this study, an automated optimization routine for the design of quay walls was proposed and tested. The aim was to develop an informed optimization routine in which an ML algorithm proposes design variables that converge a cost function closer towards an optimum with each iteration. The tool should assist engineers in making informed design decisions by generating insight into the boundaries and shape of the design space and by helping to localize interesting extrema. Ideally, this should be achieved within a limited timeframe and with a sparse allocation of resources.

The focus of our optimization was on reducing the weight of a sheet pile while respecting design constraints such as displacement, moment capacity and stability. This was achieved by designing various objective functions for the optimization problem and selecting the most promising one. The design constraints acted not as hard boundaries but rather added penalties when violated; this was important to allow the algorithm to learn from all trials, even trials where the design did not satisfy the checks.

Results showed the hybrid optimization routine with the TPE algorithm was able to converge on an optimum and slightly improved upon the conventionally suggested design variables. This result was achieved within 30 trials and a total run time of 7.5 minutes, which was significantly faster than the random searches and can probably match up against conventional optimization, especially considering that the routine can run autonomously. As a side note, each trial of the TPE algorithm took longer than a random iteration, because after each iteration the algorithm updates its predictive model and calculates the expected improvement for each set of parameters. At higher dimensions, there might therefore be a trade-off in favor of random search methods. The same applies when the calculation model is inexpensive; in such cases it might be faster to search in a stochastic manner.

Nevertheless, these results indicate strong support for the potential of ML techniques within structural optimization, especially for the implementation of hybrid models where conventional theory is combined with the use of data.

A note of caution that arose during one of the discussions is the uncertainty regarding the variance in geotechnical parameters such as soil layers, soil parameters and water levels: is it responsible to search for tiny margins amidst high levels of uncertainty?

In response, the design checks are in accordance with EC7, which includes high levels of safety, but the ultimate responsibility always lies with the engineer. In addition, in such cases one could use the TPE algorithm to locate the optimum first and then use an extensive grid search to verify whether that optimum is robust.

The next steps for successful adoption of the optimization routine would be to generate more insight into the decision-making process by including more information about each run. Currently, the tool is mainly focused on identifying extrema, but little insight is generated into the data points surrounding the optimum. Furthermore, the routine should be extended to more test cases to ensure the objective function is representative of other projects with different geotechnical parameters.


5.3 Future work

The results are promising and show that ML algorithms can enhance the optimization routine by searching the design space in an informed manner. For such a routine to be fully implemented and adopted in the future, several steps have to be taken:

Test the fitness of the objective function on different case studies to show whether the objective function is generalizable. It would be interesting to see if different soil configurations and load configurations influence the results.

Include a graphical user interface to make it user-friendly and generate insight as to how the end results were obtained.

Gradually add more design variables to the objective function and study their influence with the use of correlation and feature importance studies, pruning the less influential ones.

Conduct a Pareto optimization, for instance minimizing both weight and CO2 emissions, and compare the results against single-target optimization to study the trade-offs.

Contrary to initial beliefs, the design space for our two-design-variable model was convex, and therefore a gradient descent algorithm might be preferable, although it is not clear how such an algorithm would perform in higher-dimensional design spaces.


6 Bibliography

[1] G.J. de Gijt and M.L. Broeken, in Handbook Quay Walls, Rotterdam, CRC Press/Balkema, 2014, pp. 5-8.

[2] B.M. Das, in Principles of Foundation Engineering, Thomson, 2011, pp. 413-415.

[3] H. Salehi and R. Burgueño, "Emerging artificial intelligence methods in structural engineering," Engineering Structures, vol. 171, pp. 170-189, 2018.

[4] L.C. Nguyen and H. Nguyen-Xuan, "Deep learning for computational structural optimization," ISA Transactions, vol. 103, pp. 177-191, 2020.

[5] L. Greco, "Machine Learning and Optimization techniques for Steel Connections," 2018.

[6] A.R. Parkinson, R.J. Balling and J.D. Hedengren, in Optimization Methods for Engineering Design Applications and Theory, Brigham Young University, 2013, pp. 3-9,18.

[7] B.H. Topping, in Optimization and Artificial Intelligence in Civil and Structural Engineering, Springer, 1992, pp. 217-218.

[8] J. Bergstra, R. Bardenet, Y. Bengio and B. Kégl, "Algorithms for Hyper-Parameter Optimization," in Advances in Neural Information Processing Systems, 2011.

[9] J. Bergstra, D. Yamins and D.D. Cox, "Making a Science of Model Search: Hyperparameter Optimization," in International Conference on Machine Learning, Atlanta, 2013.

[10] K. Chandra et al., "Gradient descent the ultimate optimizer," 2019.

[11] O. Dababneh, T. Kipouros and J. Whidborne, "Application of an Efficient Gradient-Based Optimization Strategy for Aircraft Wing Structures," Aerospace, vol. 5, no. 1, p. 3, 2018.

[12] I. Dincer, M.A. Rosen and A. Pouria, "Optimization of Energy Systems," 2017.

[13] S. Choi, B. Oh and H. Park, "Design technology based on resizing method for reduction of costs and carbon dioxide emissions," Energy and Buildings, vol. 138, pp. 612-620, 2017.

[14] J.A. Goulet, in Probabilistic Machine Learning for Civil Engineers, The MIT press, 2020, pp. 47-49.

[15] F.M. Dekking, C. Kraaikamp, H. P. Lopuhaa and L. E. Meester, in A Modern Introduction to Probability and Statistics, Springer, 2005, pp. 26-27.

[16] E. Alpaydin, in Introduction to Machine Learning, MIT, 2020, pp. 4-12.

[17] T. Hastie, R. Tibshirani and J. H. Friedman, in The Elements of Statistical Learning, Springer, 2009, pp. 274-275.

[18] J. Brownlee, "machinelearningmastery," [Online]. Available: https://machinelearningmastery.com/difference-between-a-parameter-and-a-hyperparameter/. [Accessed 15 May 2021].

[19] IBM Watson, "ibm," [Online]. Available: https://www.ibm.com/cloud/learn/deep-learning. [Accessed 23 2021 2021].


[20] E. Fix and J.L. Hodges, "Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties," USAF School of Aviation Medicine, Randolph Field, Texas, 1951.

[21] M.W. Gardner and S.R. Dorling, "Artificial neural networks (the multilayer perceptron) a review of applications in the atmospheric sciences," Atmospheric Environment, vol. 32, no. 14–15, pp. 2627-2636, 1998.

[22] M.A. Nielsen, in Neural Networks and Deep Learning, Determination Press, 2015.

[23] V. Kotu and B. Deshpande, "Decision trees," in Predictive Analytics and Data Mining, Morgan Kaufmann, 2015, p. 87.

[24] M. Stojiljković, "realpython," 27 Jan 2021. [Online]. Available: https://realpython.com/gradient-descent-algorithm-python/. [Accessed 20 Feb 2021].

[25] Movares, "Vaarwegtraject 1 Delftse Schie (Definitief ontwerp)," Movares, Utrecht, 2020.

[26] W. Bjureland, "AF2609 Foundation Engineering 2019, Sheet pile walls", [lecture notes], 2019.

[27] NEN, "NEN 9997-1+C1 Geotechnical design of structures," NEN, 2016.

[28] A. Verruijt and S. van Baars, in Soil Mechanics, 2007, pp. 289-299.

[29] F.Y. Cheng and Y. Gu, in Computational Mechanics in Structural Engineering, Elsevier, 1999, p. 311.

[30] D.E. Hinkle, W. Wiersma and S. G. Jurs, in Applied statistics for the behavioral sciences, Cengage learning Inc., 2003, p. 523.


Appendix A

A.1 CPT case study

A.2 Soil parameters case study


Appendix B

Random grid search trial details (figures)

TPE trial details (figures)


TRITA-ABE-MBT-21582