



Comparative Analysis of Image Processing Algorithms for Airport Security

Sasha Callaway          Jeffrey Cheng          Alexander Contratti
[email protected]    [email protected]    [email protected]

David Fu          Harshika Gelivi          Jacob Wachulec
[email protected]    [email protected]    [email protected]

Sonia Purohit*[email protected]

New Jersey's Governor's School of Engineering and Technology
July 18, 2020

*Corresponding Author

Abstract—In the current age of increased air travel, safe airport security is imperative. While correctly detecting potential threats is a top priority, it is also important that these security checks are performed as efficiently as possible to maintain the flow of travel. Current security methods utilize machine learning algorithms to analyze and detect threats on millimeter wave scans of individuals. This paper implemented and compared various classification algorithms on a dataset of millimeter wave scans to determine the most effective algorithm at detecting threats. This paper also examined different data preprocessing and model optimization techniques and their effect on the performance of the models. It was concluded that the convolutional neural network and long short-term memory network, trained on a specific zone of the body, were the most effective models for detecting potential threats.

I. INTRODUCTION

A. TSA Airport Scans

The Transportation Security Administration (TSA) was created to protect public transportation in the United States of America using a variety of different technologies [1]. One essential technology utilizes a comprehensive scan of the human body to reveal any threats. As those who pose a threat to national security become more advanced in their attempts to bypass these technologies, machine learning is an effective tool to ensure the safety of public transportation. The use of machine learning aims to automate the process of checking scans, reducing the need for manual human checks and enabling greater accuracy in detecting threats. The need for an accurate and efficient way to classify a scan as containing a threat or not is clear. A study by ProPublica showed that Germany (which uses the same millimeter-wave scanners as the TSA) had a false positive rate of 54 percent. This high rate was cited as being a result of irregularity in clothes and sweat [2]. Using a machine learning algorithm can account for these discrepancies. In addition, a better detection algorithm would more accurately identify threats, eliminating the additional resources spent on people who are falsely flagged.

This paper analyzes various ways of classifying TSA images as containing a security threat using machine learning algorithms and optimization methods, with the goal of finding the most effective method of doing so.

II. BACKGROUND

A. Machine Learning

Machine learning is a category of artificial intelligence algorithms which have the ability to improve through data input. These algorithms process patterns in data and create corresponding alterations within their hidden workings to effectively "learn" from the data being inputted and become better at performing a specified task [3]. Certain machine learning algorithms are built to mimic the human mind, such as neural networks, which are known for processing information with neurons like the brain. Regardless of what the algorithm is used for and its complexity, it uses a learning method that can typically fall into one of the following categories [4]:

1) Supervised machine learning uses a training dataset to learn and apply information to a testing set. The algorithm is able to learn by finding patterns and repetitions in the data that it looks for in future datasets. These algorithms are used to make predictions and, when trained with labels, can adjust their processes to become more accurate [5].



2) Unsupervised machine learning algorithms use unclassified and unlabeled data to learn how to process hidden patterns in data. Rather than choosing one correct output, these algorithms are used to find and describe these patterns [5].

3) Reinforcement machine learning algorithms have a direct link to their environment, allowing them to collect data based on their interactions. Through a process of trial and error, the algorithms learn the ideal behavior for certain situations through a "reinforcement signal" [4].

The advancement of machine learning has allowed for various software innovations to become possible, most of which are used to mimic human abilities such as image recognition or text recognition. Various forms of machine learning are used collectively to implement these technologies into education, safety, and everyday lives.

B. Classification Models

1) Logistic Regression: Logistic regression is a classifying algorithm that takes certain input values and sets the output as a binary value [6]. Therefore, this algorithm is suitable for identifying whether a body scan contains a threat. The logistic regression algorithm works by first flattening the input, converting the image from a 3-dimensional array to a single-dimension array. The array is then passed into the activation function, commonly a sigmoid function. The function then outputs a value between 0 and 1, which represents the probability that the input would be classified positively.

g(z) = e^(b0 + b1*z) / (1 + e^(b0 + b1*z))    (1)

The equation above is a sigmoid function; once the coefficients b0 and b1 are determined through algorithmic optimization, the probability for an input z can be found from said equation [7]. The output from this equation classifies the input: if the value of g(z) is greater than 0.5, the input belongs in the positive class; otherwise, it belongs in the negative class. The closer the output of the function is to either one or zero, the more confident the model is in its classification.

Fig. 1. A logistic function [8].
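To illustrate this decision rule, the short sketch below (an illustration, not code from this project) applies a sigmoid with placeholder coefficients and thresholds the output at 0.5:

import numpy as np

def sigmoid(z, b0=0.0, b1=1.0):
    # g(z) = e^(b0 + b1*z) / (1 + e^(b0 + b1*z)), equivalently 1 / (1 + e^-(b0 + b1*z))
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * z)))

score = sigmoid(2.3)              # probability that the input belongs to the positive class
label = 1 if score > 0.5 else 0   # 1 = threat present, 0 = no threat
print(score, label)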

2) Neural Networks: Neural networks are the basis for a majority of machine learning algorithms. This specific type of algorithm aids a computer in analyzing training examples, which helps systems classify their data through pattern recognition. Neural networks contain layers of nodes, or neurons, that are connected to other layers with data passing between them. The data that each node receives is coupled with coefficients known as weights, and then a bias is added to the said nodes. This works to change the significance of the input by either increasing or decreasing it [9]. The resulting values are passed through an activation function to determine what information should be transferred further into the network of nodes. The neural network as a whole contains nodes that make up an input layer, hidden layers, and an output layer, as shown in Fig. 2. While the input layer takes in the initial data for the network, the output layer gives results of the processed data and transfers that information out of the network. The hidden layers contain the nodes that perform the computations that process the data from the input layer to the output layer [10]. In this research, the neural networks considered will be feed-forward, meaning that the information only moves in one direction, from the input layer through the hidden layers to the output layer [9].

Since the neural network has no prior information on the input data, the weights and biases that are applied at each node begin as random values. The training algorithm tests these random values with training data and computes a loss value that reflects how well the neural network model represents the data. A larger loss means a worse model, so training works to tune the weights and biases to minimize loss. Then, when new images are fed into the model, the adjusted weights and biases will be applied in an attempt to properly classify the images [9].

Fig. 2. A neural network with hidden layers and weights applied [11]
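As a minimal illustration of these ideas (not the networks trained in this paper), the sketch below performs one forward pass through a single dense layer: a weighted sum plus a bias at each node, followed by an activation function.

import numpy as np

x = np.array([0.2, 0.7, 0.1])        # input values for one example
W = np.random.randn(3, 4) * 0.1      # weights start as small random values
b = np.random.randn(4) * 0.1         # biases also start as random values

z = x @ W + b                        # weighted sum plus bias at each of the 4 nodes
a = np.maximum(z, 0)                 # ReLU activation decides what is passed onward
print(a)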

3) Convolutional Neural Networks: Convolutional neural networks (CNN) are a neural network subtype designed for image processing. They are based on the scientific understanding of human neurons in the visual cortex, where each neuron is only responsible for a specific part of the input image [12]. The algorithm is able to learn the features of an image by identifying and assigning importance to various objects and image aspects. The identification of features is accomplished by sliding a kernel (a matrix of learnable weights) across the image and performing matrix multiplication with the pixel values to produce a new value for the resulting matrix. The image undergoes many such kernels, each producing a result that is passed under more kernels [13]. Convolving an image results in reduced dimensionality when the kernel is larger than a pixel. To keep the dimensions of the matrix constant, some models employ same padding, surrounding the image in layers of zeros so that the convolved image has the same dimensions as the input. Not using a layer of zeros is called valid padding [14]. Another parameter of convolution is the stride size. The stride is the number of pixels the kernel travels horizontally and vertically when sliding over. Larger strides result in smaller outputs.

Fig. 3. A 3x3 kernel being applied to a 5x5 image that is padded with a layer of zeros. The output is smaller because the kernel uses strides of 2 [15]
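The arithmetic behind Fig. 3 can be reproduced in a few lines; the sketch below (illustrative only) zero-pads a 5x5 image and slides a 3x3 kernel with a stride of 2, multiplying and summing at each position.

import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.ones((3, 3))                           # 3x3 kernel of (here fixed) weights
stride = 2

padded = np.pad(image, 1)                          # surround the image with a layer of zeros
out_size = (padded.shape[0] - 3) // stride + 1     # larger strides give smaller outputs
output = np.zeros((out_size, out_size))

for i in range(out_size):
    for j in range(out_size):
        patch = padded[i * stride:i * stride + 3, j * stride:j * stride + 3]
        output[i, j] = np.sum(patch * kernel)      # element-wise multiply, then sum
print(output)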

In between each layer of kernels is a pooling layer. This project's CNN algorithm used max pooling, which passes a kernel over the image and takes only the highest value inside. Max pooling boosts the model's performance by extracting prominent features, eliminating unimportant noise, and reducing necessary computational power. Another type of pooling is average pooling, which takes the average of the values inside the kernel. Because average pooling merely suppresses noise and does not extract the most important features, max pooling more often leads to better results [12]. After several layers of convolution and max pooling, the model has learned the features by producing many filtered images, each with a different aspect highlighted by different combinations of kernels. These images are flattened into a string of values and passed through a standard neural network that classifies the image.
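A similarly small sketch of 2x2 max pooling (again illustrative, not this project's code): each non-overlapping 2x2 block of the feature map is replaced by its largest value, halving both dimensions.

import numpy as np

feature_map = np.array([[1., 3., 2., 0.],
                        [5., 6., 1., 2.],
                        [7., 2., 9., 4.],
                        [0., 1., 3., 8.]])

# group the 4x4 map into 2x2 blocks and keep only the maximum of each block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6. 2.]
                #  [7. 9.]]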

CNNs perform better than flattening the image into a long string of pixel values because they are able to capture spatial dependencies in the image. Flattening the pixel values stops the model from recognizing lines and edges, whereas the CNN's layers of kernels are able to train the model to recognize and assign importance to edges and shapes in the image. Because it is able to identify important features independently, a CNN eliminates a lot of the preprocessing necessary in other types of models, making it ideal for image analysis [12].

Fig. 4. A CNN applied to the classification of handwritten digits [12]

C. Long Short-Term Memory Networks

CNNs analyze one image at a time, applying convolutions and tuning weights and biases accordingly. However, when a sequence of images is provided, there may be certain relationships between consecutive images that CNNs are not designed to detect. For example, several images showing the progression of a jump always have a specific order. The person starts on the ground, moves upward, and then drops back to the ground. CNNs would analyze each frame individually and fail to consider the progressive movement of the human holistically. To allow neural networks to remember features from past images while working through training images, recurrent neural networks (RNN) were developed. Since then, a specific type of RNN called a long short-term memory (LSTM) network has been developed, and it has seen incredible success in its multitude of applications, including movement recognition, finance, and more [16].

Fig. 5. A diagram of three iterations of LSTM training [16]

The foundation of a basic LSTM is three gates that allow it to learn from both previous states and the current image the training algorithm is analyzing. It uses the information from these two items and appropriately modifies a cell state that is passed from iteration to iteration of training. This cell state is a matrix that acts as the LSTM's "memory," in that the LSTM uses the cell state to forget, learn, and remember all information needed for proper image classification [16].

The first gate determines what should be removed from the cell state, in other words, what should be forgotten. The second gate is made of two layers and determines what the cell state should learn. The first layer returns what cell state information should be modified, and the second layer returns the information to update the cell state with. The third gate is responsible for filtering the current cell state to pass into the next cell state, allowing future iterations to account for the past. Each gate for each iteration of training uses this filtered cell state and the corresponding training image as input and applies each gate's set of weights, biases, and activation functions to produce the proper output [16].
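For reference, the three gates described above are usually written as follows (the standard textbook formulation, not notation taken from this paper's figure; sigma is the sigmoid function, [h_{t-1}, x_t] is the previous output concatenated with the current input, each gate has its own weights W and biases b, and * denotes element-wise multiplication):

f_t = sigma(W_f · [h_{t-1}, x_t] + b_f)              (forget gate)
i_t = sigma(W_i · [h_{t-1}, x_t] + b_i)
C~_t = tanh(W_C · [h_{t-1}, x_t] + b_C)              (the two layers of the second gate)
C_t = f_t * C_{t-1} + i_t * C~_t                     (updated cell state)
o_t = sigma(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)                                (filtered output passed onward)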

Fig. 5 shows three iterations of training, where the cell state is passed from iteration to iteration as x and h. The four yellow rectangles are four activation functions that make up the three gates, with the middle two rectangles representing the two layers of the second gate [16].

D. Data Preprocessing

The success of training depends on the design of the neural network, but also on the quality of the training data. Higher quality training data, such as clearer pictures and centered features, leads to a better model. Therefore, there exist many algorithms to optimize the given data for more effective training.

1) Data Augmentation: Data augmentation is a technique implemented within machine learning to increase the amount of data available to a model. Data augmentation encompasses a variety of methods to alter original data. These alterations may include rotating, cropping, flipping, scaling, and blurring of images. Data augmentation is able to transform original data for the model to be trained and tested. This creates a diverse dataset which gives the model more data to train upon, bettering its accuracy. The ability to generate a more expansive dataset, especially when a wide selection of data is not available, highlights data augmentation's significance. Data augmentation not only increases the amount of data, but also provides varied data which will assist the model in completing its task with data which is different from the original set. Applying data augmentation to the machine learning models theoretically should increase the performance as they are exposed to these modified samples [9].

E. Model Optimization

When training a supervised learning model, the process will yield a training accuracy and a testing accuracy. The testing accuracy shows the model's predictive ability on novel testing data, and it is generally the target value for optimization. To train a neural network, several parameters must be set, including the number of layers in the neural network, the number of neurons in each layer, and the number of training epochs [17]. It is difficult to predetermine the optimal values for these parameters, so the values of these parameters are mostly estimates for the first round of training. Thus, the first training attempt commonly yields a low training accuracy and/or testing accuracy. To optimize the model, the appropriate strategy must be chosen, tested, and adjusted based on the results of previous training iterations [18]. Poor training results can be classified into two main categories, underfitting and overfitting, and the appropriate optimization strategy can be selected based on which category the training results fall under [19].

1) Underfitting: Underfitting occurs when the structure of the neural network is not complex enough or trains for too few epochs to properly learn the patterns presented in the training data. It can be identified when training results in both a low training accuracy and a low testing accuracy. There is no defined way to calculate the exact solution for underfitting, but there are several approaches to work toward optimal results [19].

Increasing the training epochs is one of the simplest solutions to underfitting. The number of epochs refers to the number of times the training will iterate over the training data. A low training and testing accuracy can occur simply because the model did not have enough epochs to find an optimal loss. If the training accuracy stops increasing early on, however, increasing epochs often will not help [19].

Supplying more data for model training is another potential and rather intuitive fix for underfitting. However, this solution is not always viable because more training data may not always be available and it may not fix the root of the problem. Additionally, increasing the data so that there is an extreme imbalance between cases might harm the model. For example, providing a large amount of dog pictures to a model that differentiates between dogs and cats may result in a poor model, as the dataset does not have enough images of cats to create an unbiased model [20].

Increasing the number of layers or neurons in each layer is often the most effective solution for underfitting. As mentioned before, underfitting can occur when the model is not complex enough to learn the patterns presented in the training data. Thus, increasing the number of layers or the number of neurons in every layer increases the complexity of the model and the likelihood that the model will be able to learn the patterns presented in the data. As of right now, there are no concrete ways to predetermine either of these quantities, so the best solution is often just to train many models with different values and see which model performs the best [19].

When working with CNNs, the kernel size is another parameter to be taken into account. Just as it is difficult to predetermine the number of layers and number of neurons in each layer, the optimal kernel size is also difficult to predetermine. However, it is common that the kernel sizes of earlier convolutional layers are greater than those of later convolutional layers [21].

2) Overfitting: Overfitting can occur when the model is too complex or the training algorithm runs for too many epochs. The algorithm learns the random noise in the training data that does not contribute to the pattern the model is supposed to learn. Then, the algorithm uses this extraneous noise to classify new images and returns poor results. Overfitting can be identified by a high training accuracy but a low testing accuracy [17].

Overfitting occurs at an unnecessarily large number of epochs, so the most sensible solution would be to lower the number of epochs training runs for. Lowering the number of epochs is called early stopping, and it is similar to how increasing the number of epochs can help with underfitting. If the testing accuracy peaks in the middle of training and ends lower than the peak, early stopping will often solve the problem. While this solution is sensible, it will not fix all overfitting problems, for which there are other potential solutions [22].

A set of weights that are significantly larger than the rest is a common sign of overfitting. This is because these weights have a disproportionately large effect on the final classification, meaning the model is too reliant on these few aspects of the training data and will not perform well on new images. Dropout is performed by turning off certain chains of neurons at random throughout training. By ignoring weights at random, the network will have difficulty relying on any weight too much and will avoid overtraining. Out of all the strategies for fighting overfitting, dropout is the most advocated for [23].

Regularization has a similar goal as dropout in that it aims to prevent any weights from becoming too large. It does this by placing larger costs on larger weights. In turn, the training algorithm will gravitate away from large weights because its purpose is to minimize the cost. The two forms of regularization are L1 regularization and L2 regularization. The main difference between these two methodologies is that L1 regularization places a cost that is proportional to the weights, while L2 regularization places a cost that is proportional to the square of the weights [23].

Underfitting occurs when the model is not complex enough to learn the patterns, and overfitting can occur when the model is too complex. An overly complex model has too many parameters, and thus learns irrelevant details, as mentioned previously. As a result, lowering the number of layers or the number of neurons in each layer can help with overfitting. This strategy of adjusting the architecture of the neural network is often the solution when all other strategies fail [24].
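As a concrete illustration of these two techniques (a generic Keras sketch, not the exact models described later in this paper), dropout and L2 regularization are each added per layer:

import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(100,),
                 kernel_regularizer=regularizers.l2(0.01)),  # cost grows with the square of the weights
    layers.Dropout(0.5),                                     # randomly ignore half the neurons each step
    layers.Dense(1, activation='sigmoid'),
])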

F. Genetic Algorithms

Genetic algorithms are based on the fundamental ideas of evolution put forth by Charles Darwin. Using the concept of natural selection, genetic algorithms run through different combinations of parameters (neurons, layers, etc.) to optimize the run time of the learning process [8]. Genetic algorithms can be applied to many different aspects of a neural network, but for the purposes of this research, the algorithm will be used to directly enhance the performance of the neural network in terms of its efficiency or results. There are five phases to the process of implementing genetic algorithms [25].

1. Initial Population: Every possible combination, each of which is a potential solution to the algorithm, is listed as an individual that belongs to a larger population. The parameters of the individual are denoted as genes that are represented through a string of binary values.

2. Fitness Function: Fitness scores are assigned to each individual through a fitness function based on accuracy. These scores, essentially values between 0 and 1, designate the probability that the individual will be selected for the next generation of testing.

3. Selection: Based on the fitness scores of the individuals, two pairs of individuals, known as parents, are created and are passed on for reproduction. The higher the fitness, the more likely an individual is to be selected to become a parent. Those that are not chosen will be killed off as non-potential solutions.

4. Crossover: The parents are mated at a random point in the genes. This point is known as the crossover point. The crossover process is the core of genetic algorithms as it ends up creating new combinations, or offspring, which can be added to the original population. The algorithm is able to sort through the variety of these combinations before settling on the most efficient one.

5. Mutation: Just as mutations are present in evolution, parts of the binary string may be flipped; however, the probability of this occurring is kept very low.

Several iterations of parents and offspring are created, introducing new generations. The algorithm terminates when it determines that the new population created is no longer changing with any new offspring produced. With this, the genetic algorithm can then output a solution that can be run with the training model.
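The five phases can be demonstrated on a toy problem, here maximizing the number of ones in a bit string; this generic sketch is separate from the Keras-based implementation in Listing 6.

import random

def fitness(individual):                                     # phase 2: score each individual
    return sum(individual)

population = [[random.randint(0, 1) for _ in range(8)]       # phase 1: initial population
              for _ in range(6)]

for generation in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:2]                                 # phase 3: selection of the fittest
    point = random.randrange(1, 8)                           # phase 4: crossover point
    child = parents[0][:point] + parents[1][point:]
    if random.random() < 0.05:                               # phase 5: mutation, kept rare
        flip = random.randrange(8)
        child[flip] = 1 - child[flip]
    population = population[:-1] + [child]                   # offspring replaces the weakest individual

print(max(population, key=fitness))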

G. Python Programming

To build the machine learning models, Python was chosen as the programming language. Although Python is often slower than C++ [26], another popular language for machine learning, it is suitable for image classification purposes because of its many open source libraries. These provide pre-written functions to aid with data analysis and machine learning. The NumPy library provides easy ways to create, manipulate, store, and access large matrices [27]. TensorFlow is an end-to-end, easy-to-use Python library for training neural networks [28], made even more user-friendly with Keras, an application programming interface (API) that uses TensorFlow as a backend to aid in building models [29]. These libraries make the implementation of machine learning algorithms much easier and allow for rapid prototyping.

III. EXPERIMENTAL PROCEDURE

A. Dataset

The dataset comes from an online competition conducted by Kaggle in partnership with the TSA [30]. It contains 1247 .aps files, which show full body millimeter wave scans from 16 different angles. Each of the 16 frames is 512 pixels wide and 660 pixels tall. The majority of the examples contain a threat in one of 17 zones, which were predefined by the competition. The competition dataset also includes a .csv file containing the identification numbers for all the .aps files and labels for each of the 17 zones. A one indicates that there is a threat present in the zone, and a zero indicates that there is not [31].
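A sketch of how such a label file could be read is shown below; the file name, and the assumption that each row pairs a "scanid_ZoneN" identifier with a 0/1 label, are illustrative rather than taken from the project's code.

import pandas as pd

# Hypothetical file and column names; assumes one row per scan-and-zone pair.
labels = pd.read_csv('stage1_labels.csv')
labels[['scan_id', 'zone']] = labels['Id'].str.split('_', expand=True)

# 0/1 labels for zone 11 only, keyed by scan id
zone11 = labels[labels['zone'] == 'Zone11'].set_index('scan_id')['Probability']
print(zone11.head())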

When training the models, different size training sets with different ratios of positive to negative examples were used, depending on which produced the best results. These are further discussed in each algorithm's respective section. However, for the purposes of evaluating the models, it was important to control for testing. Therefore, each model was tested on the same set of 100 images, which were kept separate from the other 1147 training images.

Fig. 6. An example of a millimeter wave scan. This is taken from the internet. The competition scans cannot be publicly shared for privacy reasons [32]

B. Evaluation Metrics

When evaluating the data, it was important to note the severe consequences of the algorithm missing a threat. As such, it was important to use metrics other than accuracy for evaluation. Other metrics utilized were precision, recall, and F-1 score. Precision is the ratio of true positives to predicted positives, or the proportion of predicted threats that were correct [33]. This was important because too many false flags defeats the purpose of implementing a machine learning algorithm for detecting threats.

precision = true positives / (true positives + false positives)    (2)

Recall is the ratio of true positives to actual positives, or the proportion of existing threats that were detected [33]. This was important because missing a potential threat could have disastrous results in the real world.

recall = true positives / (true positives + false negatives)    (3)

Finally, the F-1 score is a metric that combines precision and recall into one number between zero and one. It is the harmonic mean of precision and recall, which is better than a plain average because it heavily penalizes low values in either metric. If either precision or recall has a value of zero, then the F-1 score will be zero no matter how high the other is [33].

F1 = (2 * precision * recall) / (precision + recall)    (4)
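These three metrics reduce to a few lines of code; the helper below (not part of the paper's implementation) computes them from raw counts.

def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1(p, r):
    # harmonic mean: zero whenever either precision or recall is zero
    return 2 * p * r / (p + r) if (p + r) else 0.0

p, r = precision(tp=8, fp=0), recall(tp=8, fn=1)
print(p, r, f1(p, r))   # 1.0, 0.889, 0.941 -- consistent with the CNN/LSTM results reported later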

C. Preprocessing

1) Zone and Perspective Analysis: The data files for the machine learning algorithms were given in an .aps format, which, in this case, is an animation of 16 different angles of a TSA body scan (reference Fig. 7). In order to display the images by themselves, a few separate functions are required. The competition discloses distinct regions of the body, as shown below.

Fig. 7. Body zones, as provided by competition

Different methods were utilized when deciding how to train the model to detect threats on the body; algorithms could focus on the whole body or distinct zones. This research also accounted for the different perspectives of the file, which do not encompass all of the same zones. The algorithms presented in this paper differ based on the method used to approach the data.

2) Zone and Perspective Selection: To focus on specific perspectives and regions of the body, it was necessary to slice the image data. A .csv file was created that contained pixel coordinates which bounded each of the 17 regions. After the program was given the target zones, it used the coordinates in the .csv file to find the coordinates of a bounding region that encompassed all the target zones. Then, each image was read in from the .aps files and saved in a NumPy array. The body's regions of interest were obtained by slicing the NumPy array according to the all-encompassing bounding region. For all the algorithms except the LSTM, different perspectives of the same image were put side by side as one large image, and the combined image was fed into the model as one training input.
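A minimal sketch of this slicing step is shown below; the variable names, array layout, and bounding-box values are illustrative, not the project's actual code.

import numpy as np

scan = np.zeros((16, 660, 512))                    # assumed layout: 16 perspectives, 660x512 pixels each
minX, maxX, minY, maxY = 200, 330, 400, 525        # bounding region from the .csv (placeholder values)
perspectives = [11, 12, 13, 14, 15, 0]             # e.g. the six views used for zone 11

crops = [scan[p, minY:maxY, minX:maxX] for p in perspectives]
combined = np.concatenate(crops, axis=1)           # perspectives placed side by side as one input
print(combined.shape)                              # (125, 780): height by width times number of views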

3) Applied Data Augmentation: Data augmentation was implemented with the ImageDataGenerator class. The ImageDataGenerator class, provided by the Keras library, is compatible with TensorFlow. The ImageDataGenerator class takes images within the dataset and randomly applies modifications to the images based on the specified parameters [34]. The data augmentation parameters that were decided upon were image rotation and shifting of the width and height. These newly modified images were fitted to the models and tested with the evaluation metrics. Considering the fact that the images the model uses are generally standardized, only slight data augmentation parameters were used in order to prevent adverse effects on the results [35]. The rotation values ranged from 5 to 20 degrees and the shift values ranged from 5% to 20% of the original image.

D. Algorithms

1) Logistic Regression: A logistic regression algorithm was implemented using a Sequential model in the Keras library. Logistic regression outputs a binary value of 1 or 0, which makes it suitable for this type of classification of containing a threat or not. The image was flattened into a one-dimensional array and passed through one Dense layer with a sigmoid activation. The algorithm was run using 40 epochs, trained on 218 images, and tested on 100 images.

Listing 1. Snippet of Baseline Logistic Regression Code

import tensorflow as tf
from tensorflow import keras

model = keras.models.Sequential()
model.add(keras.layers.Flatten(
    input_shape=(maxX - minX, maxY - minY, numPerspectives)))
model.add(keras.layers.Dense(1, activation='sigmoid'))

Data augmentation was used on a similar logistic regression algorithm. There were several basic parameters that could be used to alter the dataset. The parameters that were selected were rotation, width shift, and height shift. The precise values for each of these were found by guessing and checking to determine the most effective changes to the data. The logistic regression with data augmentation was run using 40 epochs, trained on 218 images, and tested on 100 images.

Listing 2. Snippet of Baseline Logistic Regression Code with Data Augmentation

import tensorflow as tf
from tensorflow import keras
from keras.preprocessing.image import ImageDataGenerator

model = keras.models.Sequential()
model.add(keras.layers.Flatten(
    input_shape=(maxX - minX, maxY - minY, numPerspectives)))
model.add(keras.layers.Dense(1, activation='sigmoid'))

train_datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.05,
    height_shift_range=0.05
)

2) Convolutional Neural Network: The CNN was implemented with a Sequential model in Keras. The model consists of three 2D convolutional layers with same padding and max pooling, with the resulting channels being flattened and passed through three dense layers and one activation layer.

The convolution layers had, from first to last, kernels of size 4x4, 3x3, and 3x3. All layers had 2x2 max pooling. Because max pooling reduced the dimensions of the image, it was possible to increase the number of output channels with each subsequent layer. From first to last, the numbers of output channels were 32, 64, and 128. In addition, each layer of convolution went through batch normalization after convolving. Batch normalization rescales the activations in each batch to zero mean and unit variance before applying a learned scale and shift, keeping the values passed between layers in a consistent range [36]. This resulted in improved performance compared to trials without batch normalization.

Listing 3. Snippet of Baseline Zone-Specific CNN Code

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models

# Perspectives 11-15, 0
sides = 6

# Image dimensions in pixels
width = 130
height = 125

# Building the model
model = models.Sequential()

# Convolutional layers
model.add(layers.Conv2D(32, (4, 4),
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(0.01),
    input_shape=(width * sides, height, 1)))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3),
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3),
    activation='relu',
    kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Flatten())

# Decision network
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(.5))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dropout(.5))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy'])

# cnn_train and train_labels are the preprocessed training images and labels
history = model.fit(cnn_train, train_labels,
    batch_size=2, epochs=40,
    validation_batch_size=2)

All convolution and dense layers contained a rectified linear unit ("ReLU") activation function. The ReLU is a piecewise function that returns the input if it is positive; otherwise it returns zero. ReLU activation functions are popular for neural networks because they often lead to better performance [37] [38]. The final, single-node dense layer had a sigmoid activation function, which returns values between zero and one. It effectively performed logistic regression on the values in the hidden dense layer to output a probability.

To combat overfitting, the model utilized L2 regularization. The proportional constant in the model was 0.01. Another measure taken against overfitting was adding dropout layers with a rate of 0.5 between the dense layers in the decision network. While training the model, it was predicted that recall could be improved by reducing overfitting. To accomplish this, the L2 regularization term was increased by factors of five and ten. However, this had an adverse effect on testing results and no significant effect on training results. Because of this, the model with the regularization term of 0.01 was found to have the optimal results.

The model was trained on a set of 348 images of zone 11 with a 2:1 ratio of negative to positive examples. Only 348 images were used because the entire training set of 1147 contained only 116 examples where a threat was present in zone 11. After initial training on a set of 232 images with a 1:1 positive to negative split, the model performed poorly. After training on 348 images with a 2:1 split, the algorithm performed much better.

The CNN was trained for 40 epochs with binary cross-entropy as its cost function and the "adam" optimizer. Both of these functions came from the Keras library. Initially, when training the model with the default learning rate of 0.001, the cost did not converge and fluctuated back and forth between various values. To address this, the learning rate was lowered to 0.0001, which resulted in convergence and good testing results.

3) Long Short Term Memory Network: To train an LSTM, the model had to be able to take in a series of images as a single training item. This differed from the CNN, which only took one image for one training item. A time distributed layer allowed several different perspectives of the same scan to be passed into the convolutional layers as one input item, representing the rotation of a scan. The final model consisted of three convolutional layers, each followed by a batch normalization layer and a max pooling layer. The data was then flattened and fed into an LSTM layer with 32 units to account for previous cell states. For the final image classification, the result from the LSTM layer was passed through three hidden layers with dropout layers following the first two. The output layer, just as with the CNN, consisted of one neuron with a sigmoid activation function to characterize the binary classification [39].

Listing 4. Building the Zone-Specific LSTM

from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

def buildCNN(shape):
    momentum = .9
    model = Sequential()

    # first convolutional layer
    model.add(layers.Conv2D(32, (3, 3),
        input_shape=shape,
        padding='same',
        activation='relu'))
    model.add(layers.BatchNormalization(momentum=momentum))
    model.add(layers.MaxPool2D())

    # second convolutional layer
    model.add(layers.Conv2D(64, (3, 3),
        padding='same',
        activation='relu'))
    model.add(layers.BatchNormalization(momentum=momentum))
    model.add(layers.MaxPool2D())

    # third convolutional layer
    model.add(layers.Conv2D(128, (3, 3),
        padding='same',
        activation='relu'))
    model.add(layers.BatchNormalization(momentum=momentum))
    model.add(layers.MaxPool2D())

    # flatten data for LSTM layer
    model.add(layers.Flatten())
    return model

def action_model(numOutputNeurons=1):
    # numPerspectives, maxX, minX, maxY, minY come from the zone selection step
    shape = (numPerspectives, maxX - minX, maxY - minY, 1)
    CNN = buildCNN(shape[1:])
    model = Sequential()
    model.add(layers.TimeDistributed(CNN, input_shape=shape))
    model.add(layers.LSTM(32))

    # first hidden layer
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dropout(.5))

    # second hidden layer
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dropout(.5))

    # third hidden layer
    model.add(layers.Dense(32, activation='relu'))
    model.add(layers.Dense(numOutputNeurons, activation='sigmoid'))
    return model

The initial architecture of the LSTM differed significantly from the final architecture and yielded much poorer results. The model had difficulty increasing the training accuracy, signifying underfitting. Since the loss reached a minimum at which it stopped decreasing at an early epoch, the only solutions were to simplify the input data and begin guessing at different architectures. To simplify the input data and reduce the required complexity of the LSTM, only the thigh region was fed as training input. This meant the LSTM only had to analyze the leg instead of the entire body, reducing the input data's complexity to alleviate underfitting.

The underfitting persisted after modifying the input data, so the number of hidden layers and the number of neurons in each hidden layer were increased to increase complexity. The focus on a specific region of the body meant the structure of the neural network did not have to be modified drastically, allowing for reasonable guessing. Eventually, an appropriate complexity was found, upon which the training accuracy increased consistently throughout epochs. The testing accuracy, however, saw less success, meaning overfitting was occurring. To combat the overfitting, two dropout layers were added after the first two hidden layers, arriving at the final architecture.

4) Genetic Algorithm: Running the genetic algorithm requires a series of parents and children to create a population of different combinations. The algorithm is given a set of variables assigned to probabilities that will be applied to the population through different generations. The probability that a population member would be kept after a generation was set to 40%, the probability that a combination would be rejected was set to 10%, and the probability that an individual would mutate was set to 20%. These percentages were determined by previous research that has tested these values [40]. A function was used to create a population and generate the fitness function, which was based on the accuracy of the individual being tested. Separate functions were also created to breed, mutate, and evolve the population. As the genetic algorithm passed through different generations, the different metrics were logged in a file to reference later. The parameters that were tested in this genetic algorithm were the number of neurons, the number of layers, the activation function, and the optimizer; these parameters were then passed into the function that created the model for the neural network and its hidden layers. These were the parameters chosen because they are the foundation for the hidden layers that train and test the data.

Listing 5. Creating neural network model with genetic algorithm parameters

from keras.models import Sequential
from keras.layers import Dense, Flatten

def compile_model(network, nb_classes, input_shape):
    # access network parameters
    nb_layers = network['nb_layers']
    nb_neurons = network['nb_neurons']
    activation = network['activation']
    optimizer = network['optimizer']

    model = Sequential()

    # add layers
    model.add(Flatten(input_shape=input_shape))
    for i in range(nb_layers):
        model.add(Dense(nb_neurons, activation=activation))

    # output layer
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy', precision_m,
                           recall_m, f1_m])  # custom metric functions defined elsewhere
    return model

In testing the combinations, the parameters were given base choices to work with. The possible numbers of neurons were set to 32, 64, and 128. The number of layers was set between 1 and 4. The activation layers were assigned as either ReLU, ELU, or tanh. The Exponential Linear Unit (ELU) works the same as the ReLU for positive numbers but produces an exponential output for negative numbers [41]. The tanh function, which is just a scaled sigmoid function, bounds its output between -1 and 1 while being differentiable [42]. The optimizer was left as only "adam" as it is able to achieve fast results by maintaining the learning rate as it applies to each weight and parameter in the neural network [43]. The code for the genetic algorithm itself was based on the algorithm proposed by Matt Harvey, the founder of Coastline Automation, and changes were made to incorporate the specific TSA data [44] (see Listing 6 in the appendix). The validity of the genetic algorithm was then tested based on the metrics that the neural network outputted. The neural network itself was only trained on zone 11 to keep the neural network simple while placing emphasis on testing the results of the genetic algorithm.

IV. RESULTS AND ANALYSIS

Fig. 8. Bar graph with results from all algorithms.


A. Logistic Regression Results

As shown in Fig. 8, the logistic regression algorithm without data augmentation had a test accuracy of 71.00%, a precision of 100.00%, a recall value of 63.75%, and an F-1 score of 77.86%. The logistic regression algorithm when used with data augmentation returned better results, with a test accuracy of 86.00%, a precision of 97.14%, a recall of 85.00%, and an F-1 score of 90.67%. The logistic regression with data augmentation performed significantly better than the algorithm without data augmentation, which can be explained by the data that the algorithms receive. Data augmentation increased the diversity of the training data, which led to an increase in accuracy on test data. In this case, the image scans were very consistent; however, the data augmentation provided enough diversity in the scans of the body for it to result in a better precision, recall, and F-1 score than the logistic regression without data augmentation.

B. Convolutional Neural Network Results

The CNN trained for 40 epochs on a training set of 348 images with a 2:1 ratio of negative to positive examples. As shown in Fig. 8, the CNN had a test accuracy of 99.0%, a precision of 100%, a recall of 88.89%, and an F-1 score of 94.12%. These results confirm the prediction that the CNN would perform better than the logistic regression model. As mentioned in the Background section, logistic regression is unable to capture the spatial dependencies (shapes and lines in the image) that CNNs can. However, it was unexpected that the CNN performed just as well as the LSTM since, theoretically, the LSTM would be able to capture the temporal dependencies of the video, which the CNN cannot. This is discussed further in the LSTM results.

C. Long Short-Term Memory Network Results

The LSTM performed with an accuracy of 99%, a precision of 100%, a recall of 88.89%, and an F-1 score of 94.12%. This exactly matches the result of the CNN, despite the LSTM's ability to analyze sequences of images. The lack of difference between the results of the two models can be attributed to the small testing set. To keep results consistent, the same testing set of 100 images was used for all algorithms. Both the CNN and LSTM missed exactly one image, meaning the only better result would be perfect percentages. Thus, better results with the LSTM were nearly impossible, and any difference between the CNN and LSTM could not be highlighted with merely 100 images. To determine whether the LSTM is truly superior, a larger testing set would be needed.

D. Genetic Algorithm Results

The genetic algorithm ran through many populations before settling on the last individual tested, with the following parameters: 128 neurons, 4 layers, tanh activation, and the "adam" optimizer. The reasoning behind why this combination was selected by the algorithm can be explained with previous research. Out of the options passed in for the number of neurons, 128 was reported to give the best results for the neural network. It has been reported that too few neurons can prevent the model from learning the classification patterns, while an increasing number of neurons leads to a logarithmic growth in accuracy. At 128 neurons, the growth starts to subside and the accuracy does not change much with an increase in neurons [44]. Increasing the number of layers increases the depth of the neural network, which explains why 4 layers was chosen: it is the highest number of layers proposed, which allows the neural network to compute its results more effectively [45]. The tanh activation function was chosen as the tanh function works best when classifying between two outputs, as opposed to having an output with no bounds, which would be created with the ReLU and the ELU [46]. With these parameters passed into the plain neural network, the accuracy was reported to be 81%, the precision was reported to be 43.3%, the recall was 65%, and the F-1 score was reported to be 52%. Compared to the other models run, these results were low, as the plain neural network was not specialized for images in the way the CNN is. While the neural network was limited, the 81% accuracy indicates that the genetic algorithm provided reasonable parameter choices and worked as intended.

V. CONCLUSIONS

The goal of the project was to determine the optimal algorithm for detecting threats in airport scans. This research shows that the zone-specific CNN and LSTM would be the best choices for implementation. These algorithms performed better than logistic regression and genetic algorithms and were extremely successful on the testing set, as seen in Fig. 8. Each model only misclassified one image out of a test set of 100. Most notably, they both had a precision of 100%, meaning no false positives. If implemented, these would be a drastic improvement over the current false positive rate of 54%. Based on these results, it is recommended that the TSA implement a CNN or LSTM trained on specific zones in order to improve the efficiency and efficacy of passenger screening.

A. Future Work

1) Greater Data Acquisition: More data is necessary to properly evaluate and compare the algorithms. As discussed earlier, it is believed that the lack of difference in performance between the CNN and LSTM is due to the small testing size, but more data would be necessary to verify this hypothesis. More data would also be necessary to better evaluate the recall of these two algorithms, since the testing set only included nine positive examples. This is especially important because of the severe consequences of missing potential threats in airport security. Although the purpose of the project was fulfilled in that the best models were identified, further work is necessary to produce conclusive and meaningful results.

2) Extended Zone and Perspective Analysis: Considering the limited time frame available to complete the research, it was not possible to implement as many approaches as hoped for. However, in the future, the dataset would be analyzed using different body regions and zones. The logistic regression models were trained and tested based on a full body view, primarily from angle 0, a front-facing view. However, the convolutional neural network, long short-term memory network, and genetic algorithm focused on zone 11 (reference Fig. 7). This project was more of a proof of concept, as none of the models were able to identify the specific zone of a threat when analyzing the entire body. Future analysis would include a full analysis of the body by region by all the algorithms. These regions could be the given 17, or there could be three regions consisting of the torso, arms, and legs. This would be accomplished by dividing the image into its zones and training a separate model for each region.

3) Combination of Classification Techniques: While the classification techniques worked well individually, it may be possible to combine several of them in the same model to generate better results. One proposed combination would use a genetic algorithm to determine the optimal number of neurons, layers, and activation functions in a combined CNN and LSTM, in addition to applying data augmentation. One challenge of doing this, however, would be the data augmentation itself: since the data is already highly normalized, determining the right augmentation parameters would be difficult. If the proper parameters and values were found (such as rotation degrees, width shift percentages, and height shift percentages), the combination of these approaches would, in theory, result in a more effective model. A possible starting point is sketched below.
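The following purely illustrative sketch shows a Keras augmentation pipeline using the parameters mentioned above, together with a hyperparameter search space that a genetic algorithm such as the one in the Appendix could evolve over. All specific values are assumptions, not tuned settings.

# Sketch of the proposed combination: an augmentation pipeline plus a search
# space for the genetic algorithm (see Appendix, Listing 6). Values are guesses.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=5,         # small rotations; scans are already well aligned
    width_shift_range=0.05,   # modest shifts, since the data is normalized
    height_shift_range=0.05,
)

# Hyperparameter choices a GA could search when evolving a combined CNN-LSTM.
nn_param_choices = {
    "nb_neurons": [64, 128, 256],
    "nb_layers": [1, 2, 3, 4],
    "activation": ["relu", "elu", "tanh"],
    "optimizer": ["adam", "rmsprop"],
}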

4) Applications: As the TSA runs its security checks, it seeks to implement machine learning into its processes on a regular basis [48]. As many new systems adopt machine learning, they demonstrate the value of becoming both efficient and effective. Understanding which type of model to use and which preprocessing techniques to adopt can help produce more accurate results. In a potentially life-threatening situation, the results have shown that applying the CNN to security checks can accurately and efficiently detect threats. This would allow the security check process to run faster and more smoothly, with less human error present. Implementing the CNN into security scans can bring the nation to a safer state, as people can rely on the system that detects threats.

APPENDIX

Listing 6. Optimizer from genetic algorithm [47]

from functools import reduce
from operator import add
import random
from network import Network


class Optimizer():
    def __init__(self, nn_param_choices, retain=0.4,
                 random_select=0.1, mutate_chance=0.2):
        self.mutate_chance = mutate_chance
        self.random_select = random_select
        self.retain = retain
        self.nn_param_choices = nn_param_choices

    def create_population(self, count):
        # Build an initial population of randomly parameterized networks.
        pop = []
        for _ in range(0, count):
            network = Network(self.nn_param_choices)
            network.create_random()
            pop.append(network)
        return pop

    @staticmethod
    def fitness(network):
        # Fitness is the network's accuracy.
        return network.accuracy

    def grade(self, pop):
        # Average fitness across the population.
        summed = reduce(add, (self.fitness(network) for network in pop))
        return summed / float(len(pop))

    def breed(self, mother, father):
        # Produce two children whose parameters come from either parent.
        children = []
        for _ in range(2):
            child = {}
            for param in self.nn_param_choices:
                child[param] = random.choice(
                    [mother.network[param], father.network[param]])
            network = Network(self.nn_param_choices)
            network.create_set(child)
            if self.mutate_chance > random.random():
                network = self.mutate(network)
            children.append(network)
        return children

    def mutate(self, network):
        # Randomly change one parameter of the network.
        mutation = random.choice(list(self.nn_param_choices.keys()))
        network.network[mutation] = random.choice(
            self.nn_param_choices[mutation])
        return network

    def evolve(self, pop):
        # Rank by fitness, keep the top performers plus a few random
        # survivors, then breed children to refill the population.
        graded = [(self.fitness(network), network) for network in pop]
        graded = [x[1] for x in sorted(graded, key=lambda x: x[0],
                                       reverse=True)]
        retain_length = int(len(graded) * self.retain)
        parents = graded[:retain_length]
        for individual in graded[retain_length:]:
            if self.random_select > random.random():
                parents.append(individual)
        parents_length = len(parents)
        desired_length = len(pop) - parents_length
        children = []
        while len(children) < desired_length:
            male = random.randint(0, parents_length - 1)
            female = random.randint(0, parents_length - 1)
            if male != female:
                male = parents[male]
                female = parents[female]
                babies = self.breed(male, female)
                for baby in babies:
                    if len(children) < desired_length:
                        children.append(baby)
        parents.extend(children)
        return parents
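The listing above defines only the optimizer. For completeness, a hedged usage sketch (adapted from, but not identical to, [47]) of how it might drive an evolutionary search is shown below; it assumes the Network class from [47], whose train method sets the accuracy attribute used as fitness, and the dataset identifier is hypothetical.

# Usage sketch: evolving a population of networks with the Optimizer above.
nn_param_choices = {
    "nb_neurons": [64, 128, 256, 512],
    "nb_layers": [1, 2, 3, 4],
    "activation": ["relu", "elu", "tanh"],
    "optimizer": ["adam", "rmsprop"],
}

optimizer = Optimizer(nn_param_choices)
population = optimizer.create_population(count=10)

for generation in range(5):              # number of generations is illustrative
    for network in population:
        network.train("threat_scans")     # hypothetical dataset identifier
    print("Average accuracy:", optimizer.grade(population))
    if generation < 4:
        population = optimizer.evolve(population)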

ACKNOWLEDGEMENTS

The authors of this paper gratefully acknowledge the following: Project mentor Sonia Purohit, for her valuable knowledge of machine learning and image processing; Residential Teaching Assistant and Project Liaison Steven Chen, for his invaluable assistance; Director Dean Jean Patrick Antoine and Director Emeritus Dean Ilene Rosen, for their management and guidance; Head Counselor Rajas Karajgikar, for his continual guidance and support; Research Coordinator Benjamin Lee, for his assistance in conducting proper research; Rutgers University, Rutgers School of Engineering, and the State of New Jersey, for the chance to advance knowledge, explore engineering, and open up new opportunities; Lockheed Martin and the NJ Space Grant Consortium, for funding our scientific endeavours; and lastly, NJ GSET Alumni, for their continued participation and support.

REFERENCES

[1] “Mission.” Transportation Security Administration.
[2] M. Grabell and C. Salewski. “Sweating Bullets: Body Scanners Can See Perspiration as a Potential Weapon.” ProPublica.
[3] “What Is Machine Learning? A Definition.” Expert System, Expert System Team, 29 May 2020.
[4] K. Hao. “What Is Machine Learning?” MIT Technology Review, 2 Apr. 2020.
[5] D. Soni. “Supervised vs. Unsupervised Learning.” Medium, Towards Data Science, 16 July 2019.
[6] S. Swaminathan. “Logistic Regression - Detailed Overview.” Medium, Towards Data Science, 18 Jan. 2019.
[7] J. Brownlee. “Logistic Regression Tutorial for Machine Learning.” Machine Learning Mastery, 12 Aug. 2019.
[8] N. Pramanik, Ph.D. “Genetic Algorithm - Explained Step by Step with Example.” Medium, Towards Data Science, 29 Sept. 2019.
[9] L. Hardesty. “Explained: Neural Networks.” MIT News, 14 Apr. 2017.
[10] “A Beginner’s Guide to Neural Networks and Deep Learning.” Pathmind.
[11] J.B. Ahire. “The Artificial Neural Networks Handbook: Part 1.” Medium, Coinmonks, 29 Aug. 2018.
[12] S. Saha. “A Comprehensive Guide to Convolutional Neural Networks - the ELI5 Way.” Medium, Towards Data Science, 17 Dec. 2018.
[13] Ujjwalkarn. “An Intuitive Explanation of Convolutional Neural Networks.” The Data Science Blog, 29 May 2017.
[14] “What Is the Difference between ’SAME’ and ’VALID’ Padding in Tf.nn.max pool of Tensorflow?” Pico.
[15] “Convolution Arithmetic Tutorial.” Theano 1.0.0 Documentation, 21 Nov. 2017.
[16] “Understanding LSTM Networks.” Colah’s Blog, 27 Aug. 2015.
[17] “Basic Classification: Classify Images of Clothing: TensorFlow Core.” TensorFlow.
[18] J. Brownlee. “How to Configure the Number of Layers and Nodes in a Neural Network.” Machine Learning Mastery, 6 Aug. 2019.
[19] A. Sharma. “Tackling Underfitting And Overfitting Problems In Data Science.” Analytics India Magazine, 24 June 2020.
[20] J. Brownlee. “Impact of Dataset Size on Deep Learning Model Skill And Performance Estimates.” Machine Learning Mastery, 6 Aug. 2019.
[21] A. Chazareix. “About Convolutional Layer and Convolution Kernel.” Sicara.
[22] N. Schluter. “Don’t Overfit! - How to Prevent Overfitting in Your Deep Learning Models.” Medium, Towards Data Science, 5 June 2019.
[23] “Overfit and Underfit: TensorFlow Core.” TensorFlow.
[24] “5 Techniques to Prevent Overfitting in Neural Networks.” KDnuggets.
[25] V. Mallawaarachchi. “Introduction to Genetic Algorithms - Including Example Code.” Medium, Towards Data Science, 1 Mar. 2020.
[26] Quora. “Computer Programming Languages: Why C Runs So Much Faster Than Python.” HuffPost, 6 Sept. 2017.
[27] “ECOSYSTEM.” NumPy.
[28] “Why TensorFlow.” TensorFlow.
[29] A. Nain. “TensorFlow or Keras? Which One Should I Learn?” Medium, Imploding Gradients, 18 July 2018.
[30] “Passenger Screening Algorithm Challenge.” (Overview) Kaggle.
[31] “Passenger Screening Algorithm Challenge.” (Data) Kaggle.
[32] “Manchester Airport Body Scanners in All Three Terminals.” BBC News, BBC, 14 Oct. 2010.
[33] W. Koehrsen. “Beyond Accuracy: Precision and Recall.” Medium, Towards Data Science, 10 Mar. 2018.
[34] “tf.keras.preprocessing.image.ImageDataGenerator.” TensorFlow.
[35] F. Chollet. “Building Powerful Image Classification Models Using Very Little Data.” The Keras Blog, 5 June 2016.
[36] F. D. “Batch Normalization in Neural Networks.” Medium, Towards Data Science, 25 Oct. 2017.
[37] J. Brownlee. “A Gentle Introduction to the Rectified Linear Unit (ReLU).” Machine Learning Mastery, 6 Aug. 2019.
[38] “Alpha.” Exponential Linear Unit (ELU) Layer - MATLAB.
[39] P. Ferlet. “Training Neural Network with Image Sequence, an Example with Video as Input.” Medium, Smile Innovation, 5 Dec. 2019.
[40] Harvitronix. “Harvitronix/Neural-Network-Genetic-Algorithm.” GitHub.
[41] A. Sharma. “Understanding Activation Functions in Neural Networks.” Medium, The Theory Of Everything, 30 Mar. 2017.
[42] J. Brownlee. “Gentle Introduction to the Adam Optimization Algorithm for Deep Learning.” Machine Learning Mastery, 13 Nov. 2019.
[43] M. Harvey. “Let’s Evolve a Neural Network with a Genetic Algorithm - Code Included.” Medium, Coastline Automation, 7 Apr. 2017.
[44] “How Does the Number of Hidden Neurons Affect a Neural Network’s Performance.” 3 Jan. 2016.
[45] J. Brownlee. “How to Control Neural Network Model Capacity With Nodes and Layers.” Machine Learning Mastery, 10 Jan. 2020.
[46] S. Sharma. “Activation Functions in Neural Networks.” Medium, Towards Data Science, 14 Feb. 2019.
[47] Harvitronix. “Harvitronix/Neural-Network-Genetic-Algorithm.” GitHub, 20 Sept. 2017.
[48] “Security Screening.” Transportation Security Administration.
