Equine Gait Data Analysis using Signal Transforms as a
Preprocessor to Back Propagation Networks
by
Edwin H. Cheung
A Thesis
presented to
The University of Guelph
In partial fulfilment of requirements
for the degree of
Master of Science
in
School of Computer Science
Guelph, Ontario, Canada
© Edwin H. Cheung, April, 2014
ABSTRACT
Equine Gait Data Analysis using Signal Transforms as a Preprocessor to
Back Propagation Networks
Edwin H. Cheung
University of Guelph, 2014
Advisor:
Dr. David Calvert
This thesis examines the use of Back Propagation networks in the analysis of equine
gait data. Back Propagation networks are capable of classifying non-linear data sets, but
are not usually built to handle time series data. By using Fourier and wavelet transforms
as a pre-processor, the Back Propagation network is able to overcome this hurdle. It
was then able to analyze and classify gait, shoeing, and direction in the gait data quite
accurately and effectively. Several methods proved to be more effective than others.
Acknowledgements
I would like to thank everyone who has supported me to complete this thesis. My
parents for allowing me to pursue this dream. Lifelong friends who have pushed me to
finish this degree. My fellow housemates for always being there when needed. Jenna
Stephens for being priceless, exuberant, and accepting. Finally, credit to Dr. David
Calvert, who has been everything an advisor should be – and more.
Contents
Abstract i
Acknowledgements iii
Contents iv
List of Figures v
List of Tables vi
1 Introduction 1
2 Literature Review 5
2.1 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Architecture of Artificial Neural Networks . . . . . . . . . . . . . . 6
2.1.2 Training of Supervised Artificial Neural Networks . . . . . . . . . . 8
2.1.3 Classifying vs Clustering . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.4 Testing of Supervised Artificial Neural Networks . . . . . . . . . . 9
2.1.5 Back Propagation Artificial Neural Network . . . . . . . . . . . . . 10
2.2 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Signal Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.4 Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.5 Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3 Methodology 29
3.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.1 Origins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.2 Computable characteristics . . . . . . . . . . . . . . . . . . . . . . 30
3.1.3 Naming Convention . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.1 Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . 36
3.2.2 Discrete Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . 37
3.3 Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4 Data Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.1 Data Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4 Results and Discussions 48
4.1 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2 Dimensionality Reduction Techniques . . . . . . . . . . . . . . . . . . . . 49
4.2.1 Fourier vs Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2.2 Additional Fourier Coefficient Analysis . . . . . . . . . . . . . . . . 53
4.2.3 Mother Wavelets: Haar vs DB4 . . . . . . . . . . . . . . . . . . . . 55
4.3 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3.1 Gait, Shoe, and Turn . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3.2 Merged Characteristics Used to Enhance Results . . . . . . . . . . 59
4.4 Data Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4.1 Combined Data streams . . . . . . . . . . . . . . . . . . . . . . . . 67
4.5 Final Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5 Conclusion 70
5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2.1 Data Size, Variance and Types . . . . . . . . . . . . . . . . . . . . 71
5.2.2 Data Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
A Results of Artificial Neural Networks 74
A.1 Note of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
A.2 Gait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
A.3 Shoe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
A.4 Turn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
A.5 Merged Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
B Supplementary Results of Artificial Neural Networks 79
B.1 Note of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
B.2 Gait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
B.3 Shoe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
B.4 Turn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
B.5 Merged . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
C Standard Deviation of Results 84
C.1 Note of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
C.2 Gait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
C.3 Shoe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
C.4 Turn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
C.5 Merged . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Bibliography 89
List of Figures
2.1 3 Layer (a,b,c) Back Propagation Artificial Neural Network with two weight layers (V,W) . . . 11
2.2 Fourier Series Approximation through different K values (Red) of a Square Wave (Blue) . . . 18
2.3 Fourier Transform of a f(x), shown in red, on the time domain. The component waves, in blue, are then plotted along the frequency domain as peaks as the result of the Fourier Transform . . . 20
2.4 Results of the Short Time Fourier Transform, and the Wavelet Transform, and their difference in time resolution . . . 23
2.5 Visual representation of both the Daubechies 4, aka db4 Wavelet (Left), and the Haar Wavelet (Right) [1][2] . . . 24
2.6 A simplified visual representation of the scaling and shifting techniques used in Continuous Wavelet Transform [3] . . . 25
2.7 A simplified visual representation of the Approximation and Details results from Decomposition Filters used in a Discrete Wavelet Transform . . . 27
2.8 Resulting Coefficients from a Discrete Wavelet Transform . . . 28
3.1 Right fore hoof with location of Strain gauge (G1-G5) . . . . . . . . . . . 31
3.2 Sample Data from a Sensor divided into Data Fragments . . . . . . . . . . 33
3.3 Sample Data from a Sensor divided into Labeled Data Fragments . . . . . 35
3.4 Sample data process for the Accelerometers in X-Axis (A-X) data stream 41
3.5 Sample data process for the Strain Gauge1 (S-G1) data stream . . . . . . 42
3.6 Sample data process for the Accelerometer Combined (AC) data stream . 43
3.7 Sample data process for the Strain Combined Gauge1 (SC-G1) data stream 44
3.8 Output Layer of the ANN for the Shoe Characteristic . . . . . . . . . . . 45
3.9 Output Layer of the ANN for the Merged Characteristic . . . . . . . . . . 46
4.1 Comparison of Average Accuracies of different dimensionality reduction techniques over characteristics . . . 51
4.2 Comparison of Average Accuracies of different Characteristics over Fourier Transform dimensionality reduction techniques . . . 54
4.3 Comparison of Average Accuracies of different Characteristics over Fourier Transform dimensionality reduction techniques including Supplementary Fourier Transforms . . . 55
4.4 Comparison of Average Accuracies of non-combined data streams for each characteristic regardless of dimensionality reduction techniques . . . 58
4.5 Average Max Accuracies by characteristics using the Fourier-8 Data Reduction Technique . . . 62
4.6 Average Max Accuracies by characteristics using the Fourier-16 Data Reduction Technique . . . 63
4.7 Average Max Accuracies by characteristics using the Fourier-32 Data Reduction Technique . . . 64
4.8 Average Max Accuracies by characteristics using the Wavelet-DB4 Data Reduction Technique . . . 65
4.9 Average Max Accuracies by characteristics using the Wavelet-Haar Data Reduction Technique . . . 66
List of Tables
3.1 Breakdown of Data based on the Shoe Characteristic . . . . . . . . . . . . 33
3.2 Breakdown of Data based on the Gait Characteristic . . . . . . . . . . . . 33
3.3 Breakdown of Data based on the Direction Characteristic . . . . . . . . . 33
3.4 List of Data Streams Used . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5 List of Dimensionality Reduction Techniques Used, (a), and Characteristics that were assessed, (b) . . . 47
4.1 Average accuracy using various Data Streams with dimensionality reduction techniques analyzing characteristics . . . 49
4.2 Average accuracy obtained by Data Streams using various dimensionality reduction techniques analyzing characteristics . . . 50
4.3 Student's t-test values for accuracies obtained by using Fourier and wavelet transforms . . . 53
4.4 Efficiency Score based on Accuracy of ANN and Number of Inputs used . . . 55
4.5 Number of ANNs using Wavelet Transforms which obtained the Highest Average Accuracy and Average Difference between Accuracy sorted by Mother Wavelets and characteristics . . . 56
4.6 Average Accuracy and Average Difference between Accuracy sorted by Mother Wavelets and Data Stream Using Wavelet Transforms . . . 56
4.7 Student's t-test values for accuracies obtained by using Wavelet-Haar and Wavelet-DB4 transforms . . . 57
4.8 Range of Average Accuracies over Data Streams found in ANN using different Dimensionality Reduction techniques for each Characteristic . . . 60
4.9 Difference of Average Max Accuracy between combined and single data streams over Wavelet Transforms for the Merged Characteristic . . . 68
4.10 Accuracies using combined Strain Gauge 4 (SC-G4) data stream with Wavelet Data Reduction Techniques to classify the Gait, Shoe, and Turn characteristics . . . 69
A.1 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Gait Characteristic . . . 75
A.2 Average Maximum Accuracy of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Gait Characteristic . . . 75
A.3 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Shoe Characteristic . . . 76
A.4 Average Maximum Accuracy of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Shoe Characteristic . . . 76
A.5 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Turn Characteristic . . . 77
A.6 Average Maximum Accuracy of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Turn Characteristic . . . 77
A.7 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze all of the characteristics merged . . . 78
A.8 Average Maximum Accuracy of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze all of the characteristics merged . . . 78
B.1 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Gait Characteristic . . . 80
B.2 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Shoe Characteristic . . . 81
B.3 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Turn Characteristic . . . 82
B.4 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Merged Characteristic . . . 83
C.1 Standard Deviations of ANNs analyzing the Gait Characteristic using the Fourier Data Reduction Technique . . . 85
C.2 Standard Deviations of ANNs analyzing the Gait Characteristic using the Wavelet Data Reduction Technique . . . 85
C.3 Standard Deviations of ANNs analyzing the Shoe Characteristic using the Fourier Data Reduction Technique . . . 86
C.4 Standard Deviations of ANNs analyzing the Shoe Characteristic using the Wavelet Data Reduction Technique . . . 86
C.5 Standard Deviations of ANNs analyzing the Turn Characteristic using the Fourier Data Reduction Technique . . . 87
C.6 Standard Deviations of ANNs analyzing the Turn Characteristic using the Wavelet Data Reduction Technique . . . 87
C.7 Standard Deviations of ANNs analyzing all of the characteristics merged using the Fourier Data Reduction Technique . . . 88
C.8 Standard Deviations of ANNs analyzing all of the characteristics merged using the Wavelet Data Reduction Technique . . . 88
Chapter 1
Introduction
The purpose of this work is to analyze the strain on horses' hooves using machine
intelligence techniques. The intention is to determine whether there is sufficient information
in the strain and accelerometer data to identify several characteristics about the horse.
The data is collected using strain gauges and accelerometers placed on the horse's hoof.
It is then preprocessed using Fourier and wavelet transforms in order to reduce the size
of the data set. The processed data is then classified using a Back Propagation neural
network. The goal of analyzing this data is to be able to detect lameness in the horse.
Lameness in horses, a specific term describing a subject's inability to travel in a
normal manner upon all four limbs, is a serious and costly condition. It is also one of
the most frequent health issues amongst horses [5]. A wide variety of causes can lead
to lameness, making it fairly challenging to diagnose [6]. As this is
an initial analysis of the data, there is currently not a sufficient amount of data collected
representing lame horse gait. The analysis will therefore focus on the characteristics
which are represented in the data, namely shoeing, gait, and direction.
Opinions vary from veterinarian to veterinarian on lameness evaluations when done
purely subjectively [7]. Several objective lameness detection techniques have been
explored [8] [9] [10], giving veterinarians an important diagnostic tool.
While this thesis does not directly look into lameness, it does provide a technique to
identify several different characteristics, including gait, which may serve as a valuable
tool for lameness detection [8] [10]. This thesis is an extension of previous work which
analyzed data from the same origins using similar techniques but sampled at a much
lower frequency [11]. The results from classifying the low frequency data were quite
positive. It was possible to classify gait, shoeing, direction, the surface being walked upon,
and whether a rider was present. With the higher frequency data, this thesis looks for a method
to reduce the size of the data before attempting to classify it.
In this study, a simple Back Propagation Artificial Neural Network is used to classify
a time series data set. The data being analyzed are accelerometer and strain gauge
measurements from several horses as they move around a track. In order to present
the entirety of the run to the ANN at once, several signal transforms are applied to
the data stream as a preprocessor. The signal transform is used as a dimensionality
reduction technique: it effectively obtains the general characteristics of the data set
and summarizes them using a smaller number of variables. This thesis compares several
dimensionality reduction techniques which are used to preprocess data which is then
classified by the ANN. Fourier and wavelet analysis are the dimensionality reduction
techniques which are compared. It is demonstrated that when wavelet analysis is used
to preprocess this data set, the ANN produces more accurate classification results than
when Fourier analysis is used. This work also explores the effectiveness of different
configurations of data streams and analyzes their effect on the ANN's accuracy. It
demonstrates the usefulness of signal transforms as a preprocessor and their effectiveness
in analyzing equine gait data using a Back Propagation Artificial Neural Network.
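As a rough illustration of this preprocessing idea (a hypothetical sketch, not the exact configuration used in this thesis; the 1024-sample test signal and the choice of 16 coefficients are illustrative), a long time series can be summarized by a small number of low-frequency Fourier magnitudes:

```python
import numpy as np

def fourier_reduce(signal, n_coeffs):
    """Reduce a 1-D time series to its n_coeffs lowest-frequency
    Fourier magnitudes, producing a short fixed-length feature vector."""
    spectrum = np.fft.rfft(signal)       # FFT of a real-valued input
    return np.abs(spectrum[:n_coeffs])   # keep only low-frequency magnitudes

# A 1024-sample signal with components at 3 Hz and 7 Hz (per unit interval)
t = np.linspace(0, 1, 1024, endpoint=False)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)
features = fourier_reduce(signal, 16)
print(features.shape)   # (16,)
```

However many samples the run contains, the ANN always receives the same small, fixed number of inputs.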
Artificial Neural Networks (ANNs) are a type of Machine Learning approach, and some
of these systems can be used to solve non-linear problems. Artificial Neural Networks
model the structure and functionality found in biological neural networks. The Back
Propagation network used in this work is a non-linear classifier. These systems generally
require a large amount of representative data to be able to either classify or cluster data
accurately. ANNs are capable of approximating a non-linear function in their output.
They are also useful for tasks such as function approximation or regression analysis, which
includes Time Series Prediction and Time Series Analysis.
A time series is a sequence of data points that are related through time. Time Series
Prediction is the attempt to accurately estimate the data that follows in a sequence,
while Time Series Analysis is used to extract useful information about a time series. Many ANN
architectures are not built to receive input data in a sequence, so time series data
must be preprocessed before being served to the ANN as inputs. A common approach
is to employ a sliding window when analyzing a time series data set.
The sliding window technique, or windowing, is designed to use a specific sequence of
the data as simultaneous inputs to the network [14]. The ANN can then process the
sequence of inputs and analyze that particular subset of data. This process then repeats
itself as the window "slides" down to the following subset of data. However, this
technique will only "learn" knowledge from a specific subset of data, learning knowledge
intra-subset instead of inter-subset. This is due to the fact that the ANN cannot incorporate
any data outside of the window while it is learning. In using this technique, the ANN
is not provided a "context" for the entire data set. The window the ANN is currently
using does not offer insight into where that window is in relation to the overall data. The
context only extends to the size of the input window and the number of elements from
the data set it processes.
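The windowing technique described above can be sketched minimally as follows (the window width and step size are illustrative choices):

```python
def sliding_windows(series, width, step=1):
    """Split a time series into fixed-width windows. Each window becomes an
    independent input pattern; the network sees no context outside it."""
    return [series[i:i + width]
            for i in range(0, len(series) - width + 1, step)]

data = list(range(10))
windows = sliding_windows(data, width=4, step=2)
print(windows)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Note that each window overlaps its neighbours only by construction; nothing ties the patterns together once they reach the network, which is precisely the loss of inter-subset context discussed above.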
In order to address this problem, several ANNs, such as impulse response filters and
Elman Networks, are built to incorporate the memory of what has already been
learned as an input to the next sequence [15]. These are known as recurrent
networks. They are built with a varying depth, which controls the duration of the memory,
and a varying resolution, which controls how much of that memory influences
future analysis. Previous work [11] used a moving window in the analysis of low frequency
horse gait data. This is not viable with data sampled at a higher
frequency because the moving window becomes too large and computationally expensive
to implement. This work uses a different approach to manage the indefinite length
of time series data: instead, the data is preprocessed to create a smaller, finite-length
input for use in a non-recurrent network.
This thesis will demonstrate that the high frequency horse strain data contains useful
information about the gait and the state of the horse. It also demonstrates that the
Fourier and wavelet decompositions can be used to reduce the size of the input stream
without removing the characteristics of the gait from the data.
In this thesis, Chapter two examines several of the techniques in detail. Chapter
three outlines the steps and procedures for the experiment. Chapter four details the
results and provides an in-depth analysis of what they might indicate. Lastly, Chapter five
discusses future work and some changes that could be made to further this research.
Chapter 2
Literature Review
The methods that are used in this thesis are summarized in this chapter. Artificial
Neural Networks are examined, giving the reader insight into the architecture, the mechanics
of both retaining and applying knowledge, and the metrics used to grade these systems.
It also includes specifics on the Back Propagation Neural Network, a particular Neural
Network which is used extensively in this thesis. This chapter also provides a summary
of a number of Dimensionality Reduction techniques, their usefulness and implications in
Feature Selection and Extraction. Two particular Signal Transforms are also examined
in detail in this chapter, as both the Fourier Transform and Wavelet Transform are used
heavily in this thesis.
2.1 Artificial Neural Networks
Artificial Neural Networks (henceforth known as ANNs) are a type of computational
model used for machine learning and as a data mining technique. The ANN architecture
was inspired by its namesake, the neuron, which plays a large role in the brain. Using a
series of interconnected neurons, called nodes, a network is formed. The Back Propagation
network used in this work excels at non-linear classification and is used extensively
on problems that are complex to solve using more traditional learning techniques. ANNs
are applied to data sets where either the desired output is known, a supervised network, or
the desired output is not known, an unsupervised network. Supervised networks are
presented with data that consists of input-output pairs, while unsupervised networks do
not require the output component of the pair. The result from a supervised network
is a classification of the data, while an unsupervised network produces a clustering of
the data.
2.1.1 Architecture of Artificial Neural Networks
An Artificial Neural Network is built mainly from layers of interconnected nodes. These
connections, which mimic synapses in the biological nervous system, are called adaptive
weighted connections (otherwise known as weights). Using these weights, the ANN is able
to transfer information stored in different layers. These weights are not only responsible
for transferring data through the network; their values also adapt during the training
phase (see 2.1.2) and are used for activation during the testing phase (see 2.1.4).
There are many variations to ANNs, but the architecture used in this work can be
described using three components:
1. Input Layer
2. Hidden Layer
3. Output Layer
The input layer component acts as a receptor for the input data, allowing it to
be passed into the network. These nodes form the input layer and are connected to
the nodes in the hidden layer. The hidden layer receives the results of the input layer
once they have been passed through a layer of weighted connections. The output layer
is responsible for producing the results the network has concluded, given
the activations of the hidden layer passed through a second set of weighted
connections. The results obtained from the output layer are then compared with
the desired output contained in the data.
Different architectures of ANN exist which may have only one or two of the layers
listed above, as different problem sets may require different types of ANNs. The most
notable difference is between classification and clustering (see section 2.1.3) and the use
of an output layer. Since unsupervised networks only receive the input component of
the data set, they use only an input layer and a hidden layer. The hidden layer is used
to represent the clusters in an unsupervised network. In a supervised network,
all three components are present, as the network is presented data that consists of both
input and output pairs.
Note that this layer structure is not used in all ANNs. It is common to refer
to layers when describing a series of nodes within the same hierarchical level in the
network. Any number of layers can be present in various types of ANNs. An increased
number of layers may result in a more complex ANN. It is also possible for these layers
to be connected in different patterns. Some ANNs have layers that are recursively
connected, with weights connecting back to layers that occur earlier in the
network. Some ANNs have layers that are connected to themselves. The ANN examined
in this thesis is a simple feed-forward network, namely one that feeds the data forward
from the input layer through a hidden layer, ending at the output layer.
While ANNs are able to handle many different types of data, they work best with
continuous or discrete numeric values between -1 and +1 or 0 and +1. Data fed
into the input nodes are usually normalized to the range of -1 to +1 in order to improve
performance.
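The input normalization just described can be sketched with a simple min-max rescaling (the helper name is hypothetical; the [-1, +1] target range follows the text):

```python
def normalize(values, lo=-1.0, hi=1.0):
    """Linearly rescale raw sensor values into [lo, hi]
    before feeding them to the input layer."""
    v_min, v_max = min(values), max(values)
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]

print(normalize([10.0, 15.0, 20.0]))  # [-1.0, 0.0, 1.0]
```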
2.1.2 Training of Supervised Artificial Neural Networks
In order for a supervised ANN to operate in any meaningful way, the network must first
be trained to "learn" the data set. This phase, also called the training phase, allows
the ANN to analyze data containing the input-output pairs. The network
then makes a prediction of the desired output using the input data it received. It then
compares its own predicted output with the desired output from the data. The network
will then make adjustments to its weights between the layers in order to attempt to
produce a more accurate result when the input is later presented to the ANN. By doing
so, the network is able to retain the knowledge of the data in its weights, a process
known as learning. There are many types of learning in the world of ANNs, as many
rules, algorithms, and formulas have been developed, each with its own advantages and
disadvantages. The learning rule used in this thesis is the Back Propagation rule
(see section 2.1.5).
The knowledge of the data which the network has extracted is represented by its
weights. With the distribution of these weights, or knowledge, across the entire network,
ANNs are said to have a distributed representation of their knowledge. With so many
weights in a single network, it can be difficult to extract the knowledge that the network
has obtained without examining the context of all of the weights.
As with any machine learning algorithm, the quality of the data it learns from
greatly dictates the effectiveness of the network. While a Back Propagation ANN normally
requires a long training time, its ability to learn complex and non-linear data is one
attractive feature. Even though the rate at which a network learns can be adjusted, the
Back Propagation network generally learns from the training data repetitively, slowly
gaining knowledge from each presentation of the data set. In order to combat this
slowness of Back Propagation, some networks employ the use of a momentum value.
Momentum is used to combat high-frequency oscillations in the weight values
which hinder the learning process. Due to the incremental manner in which weights are
adjusted, the Back Propagation network can learn patterns that are most representative
of the data and is not overly affected by outliers or small variances in the data. The
more representative of the problem the data is, the more easily the network can
distinguish the patterns within it accurately. Over-learning occurs when the network is
trained too much: it starts memorizing the data and can lose the ability to generalize
about the data set overall.
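A momentum-augmented weight update of the kind described above can be sketched as follows (a generic sketch; the learning rate and momentum values are illustrative, not the settings used in this thesis):

```python
def update_weight(w, grad, prev_delta, lr=0.1, momentum=0.9):
    """One gradient-descent weight update with momentum. Carrying a fraction
    of the previous change forward damps oscillation in the weight value."""
    delta = -lr * grad + momentum * prev_delta
    return w + delta, delta

# Two successive steps against a constant gradient of 1.0:
w, d = update_weight(0.5, grad=1.0, prev_delta=0.0)  # d = -0.1
w, d = update_weight(w, grad=1.0, prev_delta=d)      # d ≈ -0.19, momentum adds on
```

When successive gradients point the same way, momentum accelerates progress; when they alternate sign, the accumulated term cancels part of each swing.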
2.1.3 Classifying vs Clustering
Clustering involves analyzing features in order to group them and to find relationships
within the dataset. These clusters are not labeled and therefore not classified.
A commonly used technique is to label clusters once they have been created. The
labels are taken from classified data points. Classification generally requires feedback
during training which indicates the quality of the results and is therefore referred to as
Supervised Learning. Clustering does not require feedback and is called Unsupervised
Learning. Clustering is an extremely useful tool, and several techniques, such as self-organizing
maps and k-means clustering, are popular.
2.1.4 Testing of Supervised Artificial Neural Networks
Once the network has been trained it is then graded for its performance during the
testing phase. The ANN is given another set of data which it processes, and the results
from the network’s output layer are then compared to the desired output from the
dataset. Using this comparison, the network is graded on its accuracy. There are
many metrics by which this accuracy can be measured; it may be as simple as the percentage
of the output data that is classified correctly. When analyzing continuous data
sets, such as predicting a particular numerical value, a Mean Squared Error is often used
to represent the performance of the ANN.
As with the training data, the testing data requires the expected or desired output
in order to grade the ANN’s accuracy. The testing data can be a subset of the training
data but in order to fully demonstrate the ANN’s ability to extrapolate useful knowledge,
most studies that use ANNs will use separate training and testing sets. A larger subset
is often used solely for the training phase, and a smaller subset that the ANN has not
been exposed to during training is used in the testing phase.
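The holdout partitioning described above can be sketched as follows (the 75/25 split and the shuffling seed are illustrative choices, not taken from this thesis):

```python
import random

def train_test_split(patterns, test_fraction=0.25, seed=0):
    """Hold out a disjoint test set so that measured accuracy reflects
    generalization rather than memorization of the training data."""
    rng = random.Random(seed)
    shuffled = patterns[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 75 25
```

The essential property is that the two subsets are disjoint: a pattern the network has seen during training never contributes to its test-phase grade.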
2.1.5 Back Propagation Artificial Neural Network
The Back propagation Artificial Neural Network is a feedforward network. It is built
from a number of hidden layer and an input and output layer. This network also
employs the learning rule which its name stems from. A visual representation of the
Back Propagation ANN using the notation found in this section can be found in Figure
2.1.
Initially, randomized values between [−1, +1] are assigned to every weight in the
network. During the training phase, inputs are processed through the hidden and output
layer (given a single hidden layer ANN) as follows:

b_i = S( Σ_{h=1}^{H} a_h V_{hi} + θ_i )    (2.1)

c_j = S( Σ_{i=1}^{I} b_i W_{ij} + τ_j )    (2.2)

where:

S(x) = (1 + e^{−x})^{−1}    (2.3)
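The forward pass of equations 2.1–2.3 can be sketched as follows (a minimal Python illustration using list-based weight matrices; the variable names and layer sizes are chosen for this sketch, not taken from the thesis):

```python
import math

def sigmoid(x):
    # equation 2.3: S(x) = (1 + e^(-x))^(-1)
    return 1.0 / (1.0 + math.exp(-x))

def forward(a, V, theta, W, tau):
    """Forward pass of equations 2.1 and 2.2 for a single hidden layer.
    a: input activations; V, W: weight matrices; theta, tau: thresholds."""
    # hidden layer: b_i = S(sum_h a_h * V[h][i] + theta_i)
    b = [sigmoid(sum(a[h] * V[h][i] for h in range(len(a))) + theta[i])
         for i in range(len(theta))]
    # output layer: c_j = S(sum_i b_i * W[i][j] + tau_j)
    c = [sigmoid(sum(b[i] * W[i][j] for i in range(len(b))) + tau[j])
         for j in range(len(tau))]
    return b, c
```

With all weights and thresholds at zero, every activation is S(0) = 0.5, which is a convenient sanity check.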
Figure 2.1: 3 Layer (a, b, c) Back Propagation Artificial Neural Network with two weight layers (V, W)
where a, b, and c are the three layers in the network: the input, hidden, and output
layers respectively, while a_h, b_i, and c_j correspond to the hth, ith, and jth node
in that particular layer. The activation values of these nodes are calculated by equations
2.1 and 2.2. Equation 2.3, also called the sigmoid function, is used in calculating the
activation values for nodes in these layers.

V and W represent the weights in the network, with V being the weights
connecting a and b, and W being the weights connecting b and c. More specifically, V_{hi}
is the weight that connects nodes a_h and b_i, while W_{ij} is the weight that
connects nodes b_i and c_j. θ_i and τ_j are threshold values, which exist for each node
in b and c respectively. These threshold values allow a particular node's value to be
either amplified or reduced, and are adjusted during the learning segment of the training
phase. Once an estimate has been made in the output layer via an input-output pair,
an error calculation occurs:
d_j = c_j (1 − c_j)(c^k_j − c_j)    (2.4)
where d_j is the error between the desired output c^k_j (given to the network as part of
the input-output pair) and the actual output c_j that the network arrived at. Using the
error d_j, the network is able to propagate back into the hidden layer and calculate the
error, e_i, within it:
e_i = b_i (1 − b_i) ( Σ_{j=1}^{q} W_{ij} d_j )    (2.5)
Once this has been calculated, the weights are adjusted using the error. This learning is
done via the following:
ΔW_{ij} = α b_i d_j + γ ΔW_{ij}(n)    (2.6)

ΔV_{hi} = β a_h e_i + γ ΔV_{hi}(n)    (2.7)
where d_j and e_i are the errors calculated in equations 2.4 and 2.5. α and β are two
constants set prior to the training phase and are known as the learning rates. The learning
rate dictates how greatly the ANN adjusts its weights for a single data record;
with a higher learning rate, the system learns more quickly and requires fewer repetitions. γ
is the momentum value, used in combination with ΔW_{ij}(n) and ΔV_{hi}(n), the previous weight
changes. By being able to "remember" previous weight changes, this momentum
value filters the fluctuating nature of weight adjustments.
During learning, the threshold values are also adjusted as follows:

Δτ_j = α d_j    (2.8)

Δθ_i = β e_i    (2.9)
The network performs the error calculation and weight adjustment for each set of
weights, propagating backwards through the network until there are no more hidden
layers to update. The network then repeats the training phase with different input-output
pairs until the error is sufficiently low or another stopping criterion is met.
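The full training step — the forward pass, the error calculations of equations 2.4 and 2.5, and the momentum updates of equations 2.6–2.9 — can be sketched as one function (an illustrative Python sketch; the particular learning-rate and momentum settings are arbitrary example values, not the thesis's):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(a, target, V, theta, W, tau, dV, dW,
               alpha=0.5, beta=0.5, gamma=0.9):
    """One Back Propagation pass for a single input-output pair.
    dV and dW hold the previous weight changes (the momentum terms)."""
    # forward pass (equations 2.1 and 2.2)
    b = [sigmoid(sum(a[h] * V[h][i] for h in range(len(a))) + theta[i])
         for i in range(len(theta))]
    c = [sigmoid(sum(b[i] * W[i][j] for i in range(len(b))) + tau[j])
         for j in range(len(tau))]
    # output-layer error (equation 2.4)
    d = [c[j] * (1 - c[j]) * (target[j] - c[j]) for j in range(len(c))]
    # hidden-layer error propagated back through W (equation 2.5)
    e = [b[i] * (1 - b[i]) * sum(W[i][j] * d[j] for j in range(len(d)))
         for i in range(len(b))]
    # weight updates with momentum (equations 2.6 and 2.7)
    for i in range(len(b)):
        for j in range(len(c)):
            dW[i][j] = alpha * b[i] * d[j] + gamma * dW[i][j]
            W[i][j] += dW[i][j]
    for h in range(len(a)):
        for i in range(len(b)):
            dV[h][i] = beta * a[h] * e[i] + gamma * dV[h][i]
            V[h][i] += dV[h][i]
    # threshold updates (equations 2.8 and 2.9)
    for j in range(len(c)):
        tau[j] += alpha * d[j]
    for i in range(len(b)):
        theta[i] += beta * e[i]
    return c
```

Repeatedly presenting the same input-output pair drives the output toward its target, which is the behaviour the stopping criterion monitors.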
2.2 Dimensionality Reduction
Dimensionality Reduction is an aspect of machine learning that involves the transformation
of data consisting of a large number of channels or dimensions into a lower
dimensional "description" [16]. This transformation is usually necessary due to a large
amount of data. It can also be used when it is unclear whether a particular stream of data
is relevant to the problem being studied. The goal of Dimensionality Reduction is to
retain the relevant characteristics of the data while reducing the number of variables to
a smaller and more manageable size. There are several benefits to this process.
Firstly, smaller data sets are easier to process and require less overhead and
computing time. Solutions are easier and faster to achieve, which allows for faster model
development and testing. This causes a significant reduction in the time needed to both
test and train models. Since most models rely on multiple passes or epochs through the
dataset when learning, the model can therefore save a large amount of training time
with a reduced dataset.
Secondly, removing noise or misleading data may increase the accuracy of the model.
Dimensionality reduction at its conceptual core requires parts of the data to be
omitted. When processing any data captured in real life, "outliers" or "noise" are
bound to be included in the data set. If these are lost in the process of dimensionality
reduction, the model is able to concentrate on the core of the data and emphasize
the features relevant to solving the problem.
Third, dimensionality reduction allows for the use of simpler models. If a model
does not need to learn or filter out "outliers" in the data, it can learn more rapidly. This
also helps reduce the complexity of the model. The broader generalization of the data
that is learned allows for simpler representation and interpretation by the model,
which in turn leads to smaller overhead and computing time in future
analysis [17][18].
2.2.1 Feature Selection
One of the main approaches to dimensionality reduction is Feature Selection.
Feature selection involves selecting a subset of the original data and
omitting the rest. This subset is then used as a representation of the original
data in the model. A central assumption of feature selection is that
the data contains some amount of "irrelevant" data that does not provide any insight into
the problem being studied; the problem would not benefit from the inclusion
of this "irrelevant" data, and training should improve if it is removed. However, not all
unnecessary data is unnecessary because of its relevancy to the problem. Data streams whose relevant
features are also present in other data streams can be considered "redundant" and
could likewise be omitted from the data used by the model. Not only is feature
selection useful as a dimensionality reduction technique, it also provides an indication
of what type of model would be needed for the reduced data.
In order to obtain "useful" data to construct a model, feature selection employs one
of the following three approaches [19]:
1. Wrapper
2. Embedded
3. Filter
Wrapper methods are most easily described as taking a "brute force" approach to
optimization. Subsets of the original data are scored against the model that is to be
used after feature selection. The subset of data that obtains the fewest errors,
or highest accuracy, within the model is chosen as the result of this method. This
is a simple method which generally yields the best results with respect to the model. It
is often criticized for its exponentially large computational cost, due to the large number
of combinations of subsets that may exist in a data set. There are several variations
of the wrapper technique that allow for similar results while using fewer computational
resources. Such variations include using a stepwise technique to either include or remove
a particular variable from the data set based on the impact that variable has on the accuracy
of the model. Since wrapper methods are often specifically optimized for their particular
model, the results are usually incompatible with a different model or dataset [20].
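The stepwise variation above can be sketched as a greedy forward selection (an illustrative Python sketch; the `score` callable stands in for whatever train-and-evaluate routine a study would actually run against its model):

```python
def forward_select(features, score, k):
    """Greedy stepwise wrapper: repeatedly add the single feature that most
    improves the downstream model's score, stopping when nothing improves it.
    `score(subset)` represents training and evaluating the model on a subset."""
    selected, remaining = [], list(features)
    current = score(selected)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        best_score = score(selected + [best])
        if best_score <= current:
            break  # no remaining feature improves the model
        selected.append(best)
        remaining.remove(best)
        current = best_score
    return selected
```

This evaluates O(n·k) subsets instead of all 2^n, which is the computational saving the stepwise variation trades against exhaustive search.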
An embedded algorithm builds the feature selection method into the model
being constructed, or into the learning algorithm that the model employs [21][22].
Such algorithms include Recursive Feature Elimination [23], used with a Support
Vector Machine, and the LASSO method [24], used with a Linear Regression Model.
Filter approaches, instead of using a score based on a particular model's errors or
accuracy on the dataset, use other metrics to rank subsets of the data. These measures
include point-wise mutual information [25], Karl Pearson's product-moment correlation,
built on Francis Galton's work [26][27], and Markov Blanket filters [28]. This approach
concentrates on the data itself, which produces a feature selection
result that is independent of any particular model; the reduced data set can therefore
be used with a variety of models [29].
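A minimal filter-style ranking using the Pearson product-moment correlation might look like this (an illustrative sketch, not code from the thesis):

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_features(features, target):
    """Filter approach: rank feature names by |correlation| with the target,
    without ever consulting the downstream model."""
    return sorted(features, key=lambda name: -abs(pearson(features[name], target)))
```

Because the ranking never consults the model, the same reduced feature set can be reused across different classifiers.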
2.2.2 Feature Extraction
Another approach to dimensionality reduction is feature extraction. Feature
extraction employs the concept that if the data contains a large amount of redundancy,
a set of characteristics or "features" may be used to describe the data without
losing the relevant original information [30].

There are many different algorithms and approaches to feature extraction, and each may
produce different results depending on the data's characteristics. Some of the most
common methods for feature extraction are Principal Component Analysis, Factor Analysis,
Projection Pursuit, and Independent Component Analysis [31][32].

Some feature extraction methods are tailored to a particular type of data;
this thesis will look more in depth at a family of signal transforms.
2.2.3 Signal Transform
Signals are sometimes described in the world of communications systems and electronic
engineering as "a function that conveys information about the behavior of a system or
attributes of some phenomenon" [33]. Signals are usually conveyed over space or another
variable such as time, and are commonly used to transfer or encode information
that needs to be expressed in such a form. Signal processing is based upon the concept
that there is an input signal which in turn produces an output signal [33].

A signal transform is a series of mathematical transformations applied to an
information signal, mostly used to either improve the signal or extract significant data from
it. While there are numerous types of signal transforms available,
this study will use two: the Fourier transform [34] and the wavelet transform [35].
2.2.4 Fourier Transform
The Fourier transform is closely related to the Fourier series (see Figure 2.2). The
Fourier series describes complex periodic signals as a sum of an infinite number of
simpler component waves, namely sines and cosines. The Fourier transform takes this
concept, generalizes it, and uses it to decompose signals in order to obtain their
component sine and cosine waves [36].
The Fourier series can be described mathematically as the following [37]:

g(x) = (1/2) a_0 + Σ_{n=1}^{∞} a_n cos(nπx/l) + Σ_{n=1}^{∞} b_n sin(nπx/l)    (2.10)
where g(x) is the approximation of the true function (the signal to be transformed),
f(x). As n → ∞, the Fourier series representation g(x) approaches an exact
representation of the original signal f(x). Due to its cyclic nature, the Fourier series
represents this approximation using a periodic function, whose most basic characteristic
is that g(x) repeats exactly one period later, so that g(x) = g(x + 2l) = ... = g(x + 2nl).
The Fourier series is built from the Fourier series coefficients, a_n and b_n;
a_0 is a special Fourier series coefficient, a constant that determines the
"average value" of f(x) and therefore where g(x) should be based. These Fourier series
coefficients can be mathematically represented as [38][39]:

Figure 2.2: Fourier Series Approximation with different K values (red) of a square wave (blue)
a_n = (1/l) ∫_{−l}^{l} g(x) cos(nπx/l) dx,   n = 0, 1, 2, ...    (2.11)

b_n = (1/l) ∫_{−l}^{l} g(x) sin(nπx/l) dx,   n = 1, 2, ...    (2.12)
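Equations 2.11 and 2.12 can be checked numerically; the sketch below approximates the integrals with a midpoint rule for a square wave, whose odd sine coefficients are known to be 4/(nπ). The sample count and the choice l = 1 are assumptions of this example:

```python
import math

def fourier_coeffs(f, l, n, samples=20000):
    """Approximate equations 2.11 and 2.12 by a midpoint rule over [-l, l]."""
    dx = 2.0 * l / samples
    a = b = 0.0
    for k in range(samples):
        x = -l + (k + 0.5) * dx  # midpoint of each subinterval
        a += f(x) * math.cos(n * math.pi * x / l) * dx
        b += f(x) * math.sin(n * math.pi * x / l) * dx
    return a / l, b / l

# square wave of period 2l; the classic result is b_n = 4/(n*pi) for odd n,
# with all a_n = 0 (the wave is odd) and b_n = 0 for even n
square = lambda x: 1.0 if x >= 0 else -1.0
```

The recovered coefficients are the component-wave amplitudes that Figure 2.2 shows being summed into a square wave.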
The Fourier series coefficients can also be written as complex numbers comprising
both a real and an imaginary component. Representing the coefficients in
complex form decreases the computing power and time required to generate them.
Consider the following representation of the Fourier series [37][39]:

g(x) = Σ_{n=−∞}^{∞} C_n e^{inπx/l}    (2.13)

where:

C_n = (1/2l) ∫_{−l}^{l} g(x) e^{−inπx/l} dx    (2.14)
where C_n is the nth Fourier coefficient of g(x). The Fourier transform takes this
series, generalizes it, and replaces the discrete C_n with a continuous function [40]. The
Fourier transform is finally represented as the following [39]:

F(g(x)) = G(f) = ∫_{−∞}^{∞} g(x) e^{−2πixf} dx    (2.15)
Figure 2.3: Fourier Transform of f(x), shown in red, on the time domain. The component waves, in blue, are then plotted along the frequency domain as peaks as the result of the Fourier Transform
where F(g(x)) represents the Fourier transform of g(x), with x once again representing
time (seconds) and f representing frequency (Hz).

The result of a Fourier transform (see Figure 2.3) returns the component waves
that make up g(x), as frequencies and their respective amplitudes. To reverse
the transform, F⁻¹(G(f)) is used to represent the inverse Fourier transform [40][39]:

F⁻¹(G(f)) = g(x) = ∫_{−∞}^{∞} G(f) e^{2πixf} df    (2.16)
The discrete Fourier transform uses a discrete function, and in turn is able to
generate a series of complex coefficients from a data set. Discrete Fourier transforms
are used for their ability to examine the "periodic-ness" of the data set [41]. Due to the
nature of the transform, the information contained in the lower-level Fourier coefficients,
in both the complex and sine/cosine representations, is of higher relevance than the
information contained in higher-level coefficients.
Transforming the original signal f(x) into its Fourier representation g(x) allows
f(x) to be represented by a series of complex coefficients C_n extracted from g(x),
where n = {0, ..., m}. Since a discrete Fourier transform generates the same number of
C_n as the series of real numbers it transforms, on its own it would not function
well as a dimensionality reduction technique. However, the
coefficients used to represent the original signal f(x) are in order of relevancy, so
their number can be reduced simply by excluding the less
relevant coefficients. The variable m represents a general "cutoff" point, an
arbitrary point beyond which the less relevant coefficients are excluded, decreasing the number of
coefficients, or "features", needed, while still retaining the maximum likeness
to f(x) for the number of coefficients used. This cutoff technique has also been used
in past studies [42][43].
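The cutoff idea can be illustrated with a naive discrete Fourier transform: keep only the m lowest-frequency coefficients (plus their conjugate partners, so a real signal stays real) and invert. This is an illustrative O(N²) Python sketch; an FFT library would compute the same coefficients far faster:

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform, normalized by N."""
    N = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * cmath.pi * n * k / N)
                for k in range(N)) / N
            for n in range(N)]

def truncated_reconstruct(coeffs, m):
    """Inverse DFT using only the m lowest-frequency coefficients
    and their conjugate partners at the top of the spectrum."""
    N = len(coeffs)
    keep = set(range(m)) | {N - n for n in range(1, m)}
    return [sum(coeffs[n] * cmath.exp(2j * cmath.pi * n * k / N)
                for n in keep).real
            for k in range(N)]
```

A signal whose energy sits below the cutoff is reconstructed essentially exactly from the retained coefficients, which is what makes the truncation usable as feature extraction.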
2.2.5 Wavelet Transform
The Fourier transform has been the staple of many applications in the field of Signal
Processing. By representing a signal in terms of amplitude and frequency, Fourier
analysis gives users a quick snapshot of the characteristics of a signal [44]. However,
sometimes more information is required than the Fourier transform can recapture from
the original signal: a Fourier transform can only provide information on a frequency
scale but not on a time scale, or simply, it does not have a time resolution.
In order to solve this problem, another technique, the Short Time Fourier
Transform, was developed. It overcomes the lack of a time resolution
by windowing the original signal f(x). Windowing involves sampling a small
section of f(x), calculating the Fourier transform of that particular section, and
sliding the window along the signal until it eventually covers the entirety of f(x). Results
from the Short Time Fourier Transform are on a frequency, amplitude, and now a time
scale; with this added dimension, the Short Time Fourier Transform becomes a three
dimensional transform. Unfortunately, the Short Time Fourier Transform suffers from
a static window size. If the size of the window, or resolution, is too small, it generates
poor frequency resolution; if the resolution is too large, it results in
poor time resolution [45].
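The windowing just described can be sketched as a magnitude-only Short Time Fourier Transform with a static window (an illustrative Python sketch; the window and hop sizes are arbitrary example parameters):

```python
import cmath

def stft(signal, window, hop):
    """Short Time Fourier Transform with a static window size: slide the
    window along the signal and take a DFT magnitude spectrum per section."""
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        section = signal[start:start + window]
        frames.append([abs(sum(section[k] * cmath.exp(-2j * cmath.pi * n * k / window)
                               for k in range(window))) / window
                       for n in range(window // 2)])
    return frames  # one magnitude spectrum per window position
```

Every frame uses the same `window` length, which is exactly the fixed time/frequency resolution trade-off the text criticizes.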
The wavelet transform is a type of signal transformation which provides information
on both time and frequency simultaneously: a time-frequency representation [46].
While the Fourier transform breaks down the signal f(x) into sine and
cosine waves, the wavelet transform breaks f(x) down into "wavelets" by using scaled
(y = g(2x)) and shifted (y = g(x + 2)) versions of a "mother wavelet". It also uses
a variation of the windowing technique, implementing a dynamically sized window
instead of a static one, see Figure 2.4.
Figure 2.4: Results of the Short Time Fourier Transform and the Wavelet Transform, and their difference in time resolution

A wavelet is defined as a wave-like signal, but unlike the sine or cosine wave, it is a
function of finite length that both begins and ends at zero amplitude. In other words,
a wavelet has compact support, since it has a value of zero outside of this
finite interval. A wavelet must also oscillate around this central amplitude, meaning the
average value of the wavelet in the time domain must be zero. There are some specifically
named wavelets, such as the Haar or Daubechies 4 (db4) wavelets (see Figure 2.5), that are
widely used in wavelet transforms. These are known as "mother wavelets" because each is
the sole wavelet on which the wavelet transform is based; the transform uses
the "mother wavelet" to obtain the different values used for calculation throughout
the transform.
Figure 2.5: Visual representation of both the Daubechies 4 (db4) wavelet (left) and the Haar wavelet (right) [1][2]

By using these "mother wavelets" as a basis for the wavelet transform, the Continuous
Wavelet Transform is defined as the following [47]:

CWT_x^ψ(τ, s) = (1/√s) ∫_{−∞}^{∞} x(t) ψ*((t − τ)/s) dt    (2.17)
where τ is the translation, or the location of the window, s represents the scale of the
current transform, and the mother wavelet is denoted ψ*((t − τ)/s). The mother wavelet
is scaled by s and shifted by τ because the wavelet transform does
not utilize the exact "mother wavelet" to break down f(x); the Continuous Wavelet
Transform uses varied versions of the prototype wavelet in order to break down the
signal. A single calculation is done for each window position τ, and these calculations are
repeated until the "mother wavelet" has been shifted to the end of the signal. This
process is then repeated for every value of s, which scales the "mother wavelet" (see Figure 2.6).
Each computation at each scale s results in a Wavelet Coefficient on the time-scale
plane, which indicates that particular window's likeness to the version of the "mother
wavelet" used at that time. Note that the size of the window is inversely related to s.
While useful for calculating every possible variation of the "mother wavelet" at
each scale s, the Continuous Wavelet Transform is both a time- and resource-consuming
technique. Its results also generate a very large amount of data, which is not
desirable in a feature extraction technique.

Figure 2.6: A simplified visual representation of the scaling and shifting techniques used in the Continuous Wavelet Transform [3]

By limiting s to a certain subset of values, the Discrete Wavelet Transform offers an alternate
method that reduces both the resource intensity and the amount of data generated. Using a
"two channel subband coder" developed by Mallat [48], the signal f(x) is decomposed
by both a low-pass wavelet filter, LPF, and a high-pass wavelet filter, HPF, see Figure
2.7. These decomposition filters are derived from the specific "mother wavelet" in use;
each acts as a low-pass or high-pass filter by attenuating signal components with
frequencies beyond its cutoff point. Since each of these two filters halves the band limit of
its output, the results can then be decimated, omitting every second coefficient, by Nyquist's rule.
Nyquist's rule states that a function can be perfectly reconstructed if the sampling rate is no less
than twice the band limit. Since each filter effectively halves the band limit, the sampling rate
can likewise be halved, allowing every other data point to be omitted.
For example, if f(x) spans a frequency band of [0, π] and contains 512 sample
points, then after passing f(x) through the LPF, the coefficients would contain the frequency band
spanning [0, π/2], while the results of the HPF would contain the rest of the
frequency band, namely [π/2, π]. The result of the LPF and the subsequent decimation
is labeled cA1, the 1st-level Approximation coefficients. Similarly, the
result of the HPF along with the decimation is called cD1, the 1st-level Detail coefficients.
The names approximation and detail come from the concept that the lower part of
the frequency band contains a smoother, more approximate version of f(x),
while the details of f(x) are left in the higher frequency band.
After the 1st level of decomposition, cA1 is passed into another LPF and
HPF pair, while cD1 is kept for future use. This process repeats until a certain level of
decomposition is reached, at which point the Discrete Wavelet Transform coefficients are
represented by the set of detail coefficients cD1, cD2, ..., cDn and the final-level approximation
coefficient cAn, see Figure 2.8 [49].
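The filter-and-decimate cascade can be sketched with the Haar wavelet, whose low-pass and high-pass filters reduce to scaled sums and differences of neighbouring samples (an illustrative Python sketch, not the implementation used in this thesis):

```python
import math

def haar_dwt(signal, levels):
    """Mallat-style cascade with the Haar wavelet: at each level the current
    approximation is low-pass and high-pass filtered, then decimated by two.
    Returns [cD1, cD2, ..., cDn, cAn]."""
    s = 1.0 / math.sqrt(2.0)  # normalization so signal energy is preserved
    details, cA = [], list(signal)
    for _ in range(levels):
        low = [s * (cA[2 * k] + cA[2 * k + 1]) for k in range(len(cA) // 2)]
        high = [s * (cA[2 * k] - cA[2 * k + 1]) for k in range(len(cA) // 2)]
        details.append(high)  # the detail coefficients cD are kept
        cA = low              # the approximation is decomposed further
    return details + [cA]
```

A perfectly smooth signal produces zero detail coefficients at every level, with all of its energy concentrated in the final approximation, which is the behaviour Figure 2.8 depicts.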
Using these sets of Discrete Wavelet Transform coefficients, g(x) may be
reassembled, but in order to minimize the number of coefficients used while still retaining
likeness to f(x), several techniques have been suggested in the past. Marghitu and
Nalluri [50] suggested using a summation of the squared coefficients as a
representation, as this still retains the distribution of power over the levels for future
"models". Paya et al. [51] demonstrated a method using a threshold
value that resets any coefficient value below it to zero; this threshold value was
set to the reference signal's largest value. They were then able to
obtain the 10 most dominant features, or coefficients, and process them in the model
using their wavelet number and time, along with their amplitude. Tamura et al.
[52] found that using specific nth-order coefficients was sufficient for their
model.

Figure 2.7: A simplified visual representation of the Approximation and Detail results from the decomposition filters used in a Discrete Wavelet Transform
By using some variation of the wavelet coefficients to represent g(x), information about
f(x) may be kept in a small number of coefficients derived from the Discrete Wavelet
Transform, making it a valuable candidate for feature extraction.
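The Marghitu and Nalluri style of reduction, for instance, collapses each level to a single energy value (a one-line illustrative sketch):

```python
def energy_features(coeff_sets):
    """Collapse each DWT level [cD1, cD2, ..., cDn, cAn] to the sum of its
    squared coefficients, preserving the distribution of power over levels."""
    return [sum(c * c for c in level) for level in coeff_sets]
```

However many samples a level contains, the feature vector length equals the number of decomposition levels plus one, which is the dimensionality reduction being sought.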
Figure 2.8: Resulting Coefficients from a Discrete Wavelet Transform
Chapter 3
Methodology
In this chapter, the series of steps taken to approach this study is examined. The
data is described: its origins, naming convention, and how specific characteristics
were chosen from it. The preprocessors used in this thesis, as well as the details
and decisions made while building them, are presented. This chapter also describes
the details of the Artificial Neural Network that this thesis utilizes,
and lists several constants and parameters. Lastly, a few minor
procedural notes are presented, as well as a summary outlining the study.
3.1 Data
3.1.1 Origins
The data used in this study was obtained over the course of two months using IOTech
equipment in Ontario, Canada. Five horses were included in the study. Each was given
four test runs of one minute each. Each of these runs was recorded with 18 sensors: 3
accelerometers and 15 strain gauges, sampled at 20,000 Hz and 5,000 Hz respectively. Attached to
the right fore hoof, the accelerometers were responsible for measuring the acceleration of
the hoof in the X, Y, and Z axes, in units of G, where G is defined as a single unit
of the earth's gravitational pull, or simply G = 9.81 m/s².
Three gauge sensors were placed at the wall of each of the medial (inside) heel, toe,
and lateral (outside) heel of the right fore hoof. A fourth sensor was placed between the
medial heel and toe, and a fifth between the toe and lateral heel. Each
sensor measured micro-strain, where one micro-strain is equivalent to a deformation of one
part per million (10⁻⁶), at its respective location in three vectors (R1/R2/R3).
The gauges were labeled Gauge1, Gauge2, Gauge3, Gauge4, and Gauge5, representing
the sensors placed at the medial heel, between the medial heel and toe, at the toe, between the toe
and lateral heel, and at the lateral heel respectively. Figure 3.1 visualizes this arrangement.
Of the five horses studied, four were described as "pacers" while the last
was a "trotter". These are both two-beat alternating gaits. A "pace" describes legs
on the same side of the subject moving together: both legs on the same side
are in the air at the same time, and on the ground at the same time. A "trot"
describes diagonal legs moving together: the right fore limb and left hind limb are
in the air together, and vice versa.
Each horse proceeded through the course in a counterclockwise fashion, and each
run was broken down into six parts: three straight portions alternating with three
portions where the horse was making a left turn. This was the case in all four runs.
Three runs were done shod, while the last was barefoot.
3.1.2 Computable Characteristics
This study looked for several characteristics that it hoped to classify. The
desirable characteristics were:

• consistently observable throughout the duration of the run
• differentiable between runs
• observable from the data given by the accelerometers and the strain gauges
• able to garner enough training and testing sets to thoroughly train and test the classifier

Figure 3.1: Right fore hoof with locations of the strain gauges (G1-G5)
After the data was analyzed with these attributes in mind, three characteristics were
chosen for this study:

1. Shoe, whether the horse was shod or barefoot
2. Gait, whether the horse was pacing or trotting
3. Direction, whether the horse was running straight or making a left turn

The data was divided in such a way that each data fragment represents the
three characteristics listed above. Data fragments were divided so that they
would only contain one of the two possible states for each of the three characteristics.
For example, a data fragment would only contain information where the horse was
making a left turn or moving straight, not both. This also applied to both the shoe and
gait characteristics. In order to preserve consistency, records in a data fragment also
had to be recorded consecutively and could only contain information from one of the
eighteen sensors described in section 3.1.1. Figure 3.2 visualizes data from a single
sensor during the course of a run by a sample horse. Six data fragments are obtained
from this run: the fragments in red represent data fragments from sections of the run where a
left turn was occurring, and the fragments in blue represent data fragments where the
subject was moving straight. Since the states of both the Gait and Shoe characteristics
are consistent within a run, no further division of data fragments is necessary.
For this study, each of the five horses' four runs was divided into six data fragments.
The numbers of data fragments for each state of the three characteristics
can be found in Tables 3.1, 3.2, and 3.3. The number of data fragments was determined
by calculating the combinations of sensors, runs, parts of runs, and horses
that represent that state. For example, in calculating the Trot state of the Gait
characteristic, 18 sensors measured each of the 6 parts of 4 runs of 1 horse, which results
in 432 (18 × 6 × 4 × 1) data fragments.

In total, the dataset was split into 2145^i separate data fragments (5 horses × 18
measurements per horse × 4 runs per horse × 6 parts per run).
^i There are 15 fewer data fragments because there was no Dorsal Strain Gauge data for the horse "Art Affair" during the last left turn of its 4th run
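These fragment counts can be cross-checked with the arithmetic described above (a quick Python sketch; the 15 missing fragments from footnote i are subtracted from the states they belong to):

```python
# sensors x runs x parts-per-run x horses, per state
sensors, runs, parts, horses, missing = 18, 4, 6, 5, 15

total = sensors * runs * parts * horses - missing     # all fragments
trot = sensors * runs * parts * 1                     # 1 trotting horse
pace = sensors * runs * parts * 4 - missing           # 4 pacing horses
shod = sensors * 3 * parts * horses                   # 3 shod runs
barefoot = sensors * 1 * parts * horses - missing     # 1 barefoot run
straight = sensors * runs * 3 * horses                # 3 straight parts
left_turn = sensors * runs * 3 * horses - missing     # 3 left-turn parts
```

The totals reproduce the values in Tables 3.1 through 3.3.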
Figure 3.2: Sample Data from a Sensor divided into Data Fragments
Table 3.1: Breakdown of Data based on the Shoe Characteristic

State      No. of Sensors   No. of Runs   Parts of Run (/6)   No. of Horses   No. of Fragments
Shod       18               3             6                   5               1620
Barefoot   18               1             6                   5               525^i

Table 3.2: Breakdown of Data based on the Gait Characteristic

State      No. of Sensors   No. of Runs   Parts of Run (/6)   No. of Horses   No. of Fragments
Pace       18               4             6                   4               1713^i
Trot       18               4             6                   1               432

Table 3.3: Breakdown of Data based on the Direction Characteristic

State      No. of Sensors   No. of Runs   Parts of Run (/6)   No. of Horses   No. of Fragments
Straight   18               4             3                   5               1080
Left Turn  18               4             3                   5               1065^i
3.1.3 Naming Convention
In order to discuss a specific data fragment, this study will reference to its characteristics
as described above. The following naming convention was employed:
R#SHHD_#.SensorType

R# = Run Number (1, 2, 3, or 4)
S = Accelerometer/Strain Gauge (A/S)
HH = Name of the Horse (e.g. Art Affair was labeled "AR")
D = Direction (L/S)
# = Part of run (1, 2, or 3)
SensorType = Which Accelerometer or Strain Gauge ("X" or "Gauge5R3")
For example, R2SNIS_3.Gauge3R1.txt describes the data found in:

R2 = Run Number 2
S = Strain Gauge
NI = Horse "Nicklers"
S = Straight
3 = Part 3 of Run
Gauge3R1 = Strain Gauge 3 in Vector 1
Figure 3.3 is an extension of Figure 3.2 that includes labels from the naming
convention. It displays the data captured by the accelerometer in the X axis
during the 3rd run of horse "Nicklers", and shows how the resulting data fragments
from this run were labeled.
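A parser for this convention might look like the following (a hypothetical sketch: the regular expression, the group names, and the assumption that an underscore separates the direction letter from the part number are mine, not the thesis's):

```python
import re

# hypothetical pattern for names like "R2SNIS_3.Gauge3R1.txt"
PATTERN = re.compile(
    r"R(?P<run>[1-4])"        # run number
    r"(?P<sensor>[AS])"       # accelerometer or strain gauge
    r"(?P<horse>[A-Z]{2})"    # two-letter horse code
    r"(?P<direction>[LS])"    # left turn or straight
    r"_(?P<part>[1-3])"       # part of the run
    r"\.(?P<channel>\w+)")    # sensor channel, e.g. Gauge3R1

def parse_fragment_name(name):
    m = PATTERN.match(name)
    return m.groupdict() if m else None
```

Parsing the worked example from the text recovers the same six fields listed above.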
Figure 3.3: Sample Data from a Sensor divided into Labeled Data Fragments
3.2 Dimensionality Reduction
With 20,000 records per second recorded by the accelerometers (5,000 per second for the strain
gauges), most data fragments contain anywhere from 100,000 to 600,000 samples.
Using any single data fragment directly as input to a feed-forward Artificial Neural
Network would be computationally expensive. A typical feed-forward network with 16
inputs, 20 hidden nodes, and 2 output nodes requires 16 × 20 + 20 × 2 = 360 weights;
a network with 600,000 input nodes, 1,000,000 hidden nodes, and 2 output nodes would
require roughly 6 × 10¹¹ weights, over a billion times as many.
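The weight counts for fully connected layers can be checked directly (a quick Python sketch; threshold values are not counted):

```python
def weight_count(layers):
    """Number of weights in a fully connected feed-forward network,
    given the node count of each layer in order."""
    return sum(layers[k] * layers[k + 1] for k in range(len(layers) - 1))

small = weight_count([16, 20, 2])              # modest network
large = weight_count([600_000, 1_000_000, 2])  # raw-fragment network
```

The ratio between the two makes the case for reducing the fragments before they reach the network.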
In order to run an efficient and accurate model, it is imperative that the data be
reduced to a more manageable size. The danger of reducing such a large dataset is
that the reduced set will not accurately represent the original data with
respect to the characteristics being examined; in other words, the desirable
characteristics of the dataset may be lost in reducing it. In order to preserve the
data's characteristics, several reduction methods were examined to determine which
performed best when the reduced data was classified.
Due to the rhythmic, cyclic, and repeated motions of gait, the data received at each
gauge or accelerometer is highly repetitive: each waveform over a period of time
resembles the wave previous to it. By breaking the data down into these separate waves,
it is hoped that they will both represent the overall data and provide the
model with a smaller number of inputs. Techniques to accomplish this have existed
as signal transforms for many years. Since a signal is often a sequence of nearly
identical waves following each other, it can often be represented by a particular wave
segment. Signal transformation is most commonly used for filtering and manipulating
a signal in order to improve it or extract particular information from it.
3.2.1 Discrete Fourier Transform
The Discrete Fourier Transform, see 2.2.4, was first selected due to its popularity,
relative simplicity and the speed of the algorithm. The data was first treated to a discrete
Fourier transform by using the Fast Fourier Transform algorithm via the FFTW3 library
[53]. By processing a single data fragment using the Fourier transform, this resulted in
a set of Fourier coe�cients. Since lower frequency coe�cients generated by the Fourier
transform are more representative of the data than their higher frequency counterparts
it is feasible to use these lower frequency coe�cients as a representation the original
data. A subset of m lowest frequency coe�cients were used where:
m = 8, 16, 32ii
ii Since every fast Fourier transform in the data produced a “0” for its second coefficient, the imaginary component of A0 (see 2.11 and 2.13), this coefficient was omitted from the Fourier transform
These subsets were then used as the Fourier representation of that particular data fragment, and each value in each subset was used as an input to the ANN. Larger numbers of coefficients were also tried but were found to provide insignificant improvement to the results; see section 4.2.2.
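The coefficient selection above can be sketched as follows. The thesis used FFTW3; here a direct O(n²) DFT is written out only for clarity, and the function name is illustrative. Note how dropping the always-zero imaginary part of A0 yields m − 1 network inputs (7, 15, or 31):

```python
import cmath

def lowfreq_dft_features(fragment, m=8):
    # Interleave the real/imaginary parts of the lowest-frequency DFT
    # coefficients, keep the first m values, then drop Im(A0), which is
    # always 0 for real-valued input -> m - 1 network inputs.
    n = len(fragment)
    values = []
    k = 0
    while len(values) < m:
        a_k = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, x in enumerate(fragment))
        values.extend([a_k.real, a_k.imag])
        k += 1
    values = values[:m]
    del values[1]   # Im(A0) == 0, omitted per the thesis footnote
    return values
```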
3.2.2 Discrete Wavelet Transform
The Wavelet Transform (see 2.2.5) was also selected in this study in order to overcome the Discrete Fourier Transform's lack of time resolution. A one dimensional Discrete Wavelet Transform was performed on each individual data fragment, decomposed over thirteen levels, using the PyWavelets library [54]. Both a “Haar” and a “DB4” mother wavelet were examined. The transform produced thirteen levels of wavelet detail coefficients ([cD1, cD2, ..., cD13]) and a single set of wavelet approximation coefficients (cA1). For each of the thirteen levels, the sum of the squared detail coefficients was calculated. These thirteen summations represent the power distribution over the levels of wavelet coefficients, and they were used as the wavelet representation of that particular data fragment.
While many methods could be used to find an appropriate representation of these wavelet coefficients, the summation of the squares over each level, as suggested by Marghitu and Nalluri [50], was chosen because their study was also based on gait and gait-like data. All thirteen summation values were used directly as inputs to the ANN.
representation of the data. However, the transformations will still be referred to by the numbers m, rather than m − 1.
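The per-level energy computation described above can be sketched without a wavelet library by hand-coding the orthonormal Haar filter bank (the thesis used PyWavelets; this illustrative version assumes the fragment length supports the requested number of levels):

```python
def haar_detail_energies(signal, levels=13):
    # One-dimensional Haar DWT: at each level the approximation is split
    # into a coarser approximation (low-pass) and detail coefficients
    # (high-pass); the sum of squared details is that level's energy.
    s = 2 ** -0.5
    approx = list(signal)
    energies = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        nxt, details = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            nxt.append((a + b) * s)       # low-pass (approximation)
            details.append((a - b) * s)   # high-pass (detail)
        energies.append(sum(d * d for d in details))
        approx = nxt
    return energies
```

With PyWavelets, the equivalent is `coeffs = pywt.wavedec(fragment, 'haar', level=13)`, then summing the squares of each detail array in `coeffs[1:]`.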
3.3 Artificial Neural Network
Following the dimensionality reduction procedures, the transformed data streams were used, along with their characteristic (Gait, Shoe, Turn), as input-output pairs for an Artificial Neural Network. This was constructed as a three layer Back Propagation network with a learning rate of 0.05 and a momentum of 0.80. The network used the results obtained from one of the dimensionality reduction techniques in section 3.2 as its input. The network therefore had either 7, 15, or 31 input nodes (see footnote ii in section 3.2.1) for data preprocessed with the Fourier transform, and 13 input nodes for data preprocessed with the wavelet transform. The respective gait characteristics (described in section 3.1.2) served as the desired output for the network.
After the training phase of each epoch, the ANN immediately entered a testing phase in which its accuracy was calculated in order to prevent the network from being over-trained. In the testing phase, accuracy was defined as the percentage of records the ANN classified correctly out of the total number of records given. The network used 67% of the data set during the training phase and the remaining 33% during the testing phase.
This was done for 10,000 epochs, an arbitrary number chosen to give every ANN more than enough epochs to reach its maximum accuracy rate. Since the ANNs were tested at the end of each epoch, only the highest accuracy achieved over the course of the 10,000 epochs was used in the analysis. This prevents regression, as some ANNs over-trained after reaching their highest accuracy within very few epochs. Using this method bases each ANN's result on its best accuracy, not on the accuracy at an arbitrary number of epochs.
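The best-accuracy tracking described above can be sketched as follows (the `net` object and its `train_one_epoch`/`classify` methods are hypothetical stand-ins; the thesis's own implementation is not shown):

```python
def best_test_accuracy(net, train_set, test_set, epochs=10000):
    # Track the network's best test accuracy over all epochs, as in the
    # thesis, rather than the accuracy at an arbitrary final epoch.
    best, best_epoch = 0.0, 0
    for epoch in range(1, epochs + 1):
        net.train_one_epoch(train_set)          # hypothetical API
        correct = sum(net.classify(x) == y for x, y in test_set)
        acc = correct / len(test_set)
        if acc > best:
            best, best_epoch = acc, epoch
    return best, best_epoch
```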
The hidden layer comprised 3n/2 nodes, where n is the number of input nodes. This number was selected to fit criteria suggested by studies such as Blum [55] and Linoff and Berry [56]. The input data set was normalized to [−1, 1]. Each output characteristic was represented by two nodes, one for each state of the characteristic; the node with the larger value was considered “activated”. For example, the activation of one output node would indicate that the subject was shod, while the activation of the other would indicate that the subject was barefoot. The results from each ANN were then compiled, and the accuracy and the epoch needed to arrive at that accuracy were recorded.
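The [−1, 1] input normalization is a standard min-max rescaling; a minimal sketch (whether scaling was applied per input feature or per fragment is not specified in the text, so this helper is illustrative):

```python
def normalize(values, lo=-1.0, hi=1.0):
    # Min-max rescale a list of input values into [lo, hi].
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0   # guard against constant inputs
    return [lo + (hi - lo) * (v - vmin) / span for v in values]
```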
3.4 Data Procedures
3.4.1 Data Streams
In this study a data stream refers to the sensor from which the data was measured; each stream is the set of data fragments captured by that particular sensor. The data used in this study comprises eight “primary” data streams: three accelerometer measurements (A-X, A-Y, A-Z) and five strain gauges (S-G1, S-G2, S-G3, S-G4, S-G5). Each of the five strain gauges recorded three vector measurements, which the primary strain gauge data streams treat as three separate data fragments. For example, while the accelerometer in the X direction provided only 120 data fragments for the analysis of the “Gait” characteristic (4 Runs × 5 Horses × 6 Parts of Run), Strain Gauge1 contains 360 data fragments (3 Vectors × 4 Runs × 5 Horses × 6 Parts of Run); see Figures 3.4 and 3.5.
Another set of “secondary” or “combined” data streams was created for this study: a combined accelerometer (AC) and five combined strain gauges (SC-G1, SC-G2, SC-G3, SC-G4, SC-G5). These streams combine similar data fragments from the same time frame into one input-output pair: the combined accelerometer uses the data fragments from the three axes (X, Y, Z) as one record in the training/testing data set, while a combined strain gauge uses the data fragments from the three vector measurements (R1, R2, R3) as one record. Figures 3.6 and 3.7 visualize this process, showing that an input-output pair comprises three separate data fragments for these “combined” data streams. Using a “combined” data stream requires the ANN to triple the number of nodes in the input layer. These streams were created to allow the ANN to utilize similar data streams that may improve its performance.
One last “Merged” characteristic was also added: a combination of the three characteristics gait, shoe, and direction. For this merged characteristic, the ANN expands from 2 output nodes to 6, representing the three characteristic state pairs. As with the original ANNs, only one node within each characteristic state pair may be considered “activated”. The ANN is considered accurate for a particular input-output pair only if it activates all three of the correct characteristic states, leaving the other three nodes inactive. Figures 3.8 and 3.9 show the expanded output layer for a normal characteristic versus the merged characteristic. This configuration provides a more difficult test for each of the ANNs, and becomes useful when determining which data streams obtain the most accurate results.
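The merged accuracy rule above can be sketched as a check over the three output-node pairs (a minimal sketch; names are illustrative):

```python
def merged_correct(outputs, targets):
    # outputs/targets: 6 values = 3 characteristic state pairs
    # (gait, shoe, turn). A prediction counts as correct only if,
    # within every pair, the larger ("activated") output node matches
    # the target's active node.
    for i in range(0, 6, 2):
        predicted = 0 if outputs[i] > outputs[i + 1] else 1
        actual = 0 if targets[i] > targets[i + 1] else 1
        if predicted != actual:
            return False
    return True
```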
Figure 3.4: Sample data process for the Accelerometers in X-Axis (A-X) data stream
Figure 3.5: Sample data process for the Strain Gauge1 (S-G1) data stream
Figure 3.6: Sample data process for the Accelerometer Combined (AC) data stream
Figure 3.7: Sample data process for the Strain Combined Gauge1 (SC-G1) data stream
Figure 3.8: Output Layer of the ANN for the Shoe Characteristic
Figure 3.9: Output Layer of the ANN for the Merged Characteristic
3.5 Summary
For this study, 14 data streams were created from a group of 2,145 data fragments, and each stream was used individually. Five different types of data reduction were applied to each of the 14 data streams. Using these compressed data streams as input to the ANN, along with one of the four corresponding characteristics as the desired output, each network was tested for accuracy. Over all combinations of data stream, dimensionality reduction technique, and characteristic, 280 (5 × 14 × 4) ANNs were assembled and their accuracies noted. The data streams are listed in Table 3.4, and the dimensionality reduction techniques and characteristics in Table 3.5.
Table 3.4: List of Data Streams Used
“Primary” Data Streams         “Combined” Data Streams
Accelerometers in X (A-X)
Accelerometers in Y (A-Y)      Accelerometers Combined (AC)
Accelerometers in Z (A-Z)
Strain Gauge1 (S-G1)           Strain Gauge1 Combined (SC-G1)
Strain Gauge2 (S-G2)           Strain Gauge2 Combined (SC-G2)
Strain Gauge3 (S-G3)           Strain Gauge3 Combined (SC-G3)
Strain Gauge4 (S-G4)           Strain Gauge4 Combined (SC-G4)
Strain Gauge5 (S-G5)           Strain Gauge5 Combined (SC-G5)
Table 3.5: List of Dimensionality Reduction Techniques Used, (a), and Characteristics that were assessed, (b).

(a) Dimensionality Reduction     (b) Characteristics
Fourier - 8 Coefficients         Gait
Fourier - 16 Coefficients        Shoe
Fourier - 32 Coefficients        Direction
Wavelet - Haar                   All Three Merged
Wavelet - db4
Chapter 4
Results and Discussions
This chapter begins with a brief summary of the results of the various tests. The results obtained using the different dimensionality reduction techniques are then compared and analyzed. The characteristics of the data are used to determine the difficulty of accurately classifying each characteristic, and the data streams are investigated to determine whether particular streams result in more accurate classification. Finally, the chapter suggests the configuration found to classify the data most accurately.
4.1 Summary of Results
For the tests, the 280 Back Propagation networks were each run 20 times. Since the networks are stochastic, the values of the initial weights were randomly determined, as was the separation of records into training and testing sets. The results are documented in Appendix A. For each run, the maximum accuracy reached by the ANN and the epoch at which it first achieved that accuracy were recorded. Accuracy is determined during the testing phase using the testing data set: the ANN classifies the inputs based on the knowledge it has obtained
during the training phase. The number of records the ANN classified correctly during this phase was divided by the total number of records in the phase to obtain the accuracy of the network. A result of 0.0 means the ANN classified none of the records correctly, while 1.0 means it classified all of them correctly. Standard deviations for each of the groups are also recorded in Appendix C.
An average was calculated from the maximum accuracies of the 20 iterations; this average is the metric used when determining the performance of a particular ANN. Table 4.1 displays the average accuracy, taken across the 14 data streams, for each dimensionality reduction technique and characteristic. Table 4.2 displays the average accuracy, taken across the dimensionality reduction techniques, for each data stream and characteristic. A summary of the data streams, dimensionality reduction techniques, and characteristics can be found in Tables 3.4 and 3.5.
Table 4.1: Average accuracy using various data streams with dimensionality reduction techniques analyzing characteristics

                          Characteristic
DR Technique    Gait         Shoe         Turn         Merged
Fourier-08      0.831324669  0.787970598  0.710622214  0.445948054
Fourier-16      0.815858711  0.786629026  0.702716079  0.422603818
Fourier-32      0.818038289  0.786007455  0.673884739  0.409961739
Wavelet-DB4     0.967225936  0.90546562   0.866772589  0.731500993
Wavelet-Haar    0.958902184  0.876826711  0.84202675   0.683527318
4.2 Dimensionality Reduction Techniques
4.2.1 Fourier vs Wavelets
The results in Table 4.1 indicate that, for each dimensionality reduction technique, higher accuracy is achieved when using wavelet transforms than when using Fourier transforms. A graph comparing each of the dimensionality reduction techniques against the
Table 4.2: Average accuracy obtained by data streams using various dimensionality reduction techniques analyzing characteristics

                          Characteristic
Data Stream     Gait         Shoe         Turn         Merged
A-X             0.8801028    0.844159189  0.7370732    0.48756093
A-Y             0.883440326  0.8477536    0.7107317    0.50219511
A-Z             0.881643147  0.856739463  0.7295123    0.5107317
AC              0.894223505  0.876765105  0.75195122   0.56512199
S-G1            0.868507432  0.799223495  0.71344263   0.48147542
S-G2            0.851251137  0.785246     0.75409831   0.48737706
S-G3            0.859275284  0.786367558  0.6476229    0.41622956
S-G4            0.856686884  0.812424537  0.76172132   0.51090162
S-G5            0.860828337  0.7891286    0.7831967    0.51983612
SC-G1           0.889345368  0.842362042  0.79097567   0.61951224
SC-G2           0.884980789  0.825160495  0.81487804   0.60658539
SC-G3           0.893966653  0.822849832  0.72146344   0.5385366
SC-G4           0.896020621  0.862644442  0.84853663   0.65365853
SC-G5           0.895507126  0.849293989  0.86365858   0.64219511
characteristics can be found in Figure 4.1. ANNs that used a wavelet transform consistently achieved higher accuracy than ANNs that used the Fourier transform.
Figure 4.1: Comparison of Average Accuracies of different dimensionality reduction techniques over characteristics
This result is believed to be due to the attributes of the two transforms: the Fourier transform does not incorporate time resolution, while the wavelet transform does. With real-world data derived from an imperfect source such as an animal, there are likely to be fluctuations in the data as time progresses. The characteristics of the data collected at the beginning of a run can be expected to differ from those at the end; factors such as the horse warming up, or getting tired, contribute to this difference. By incorporating time resolution, the wavelet transform can adjust to these differences in the data. The Fourier transform is forced to transform the data as a whole and is unable to adjust to changes in the data which occur through time.
By modelling the original data better, the wavelet transform allowed the ANNs to identify correlations more easily and thus be more accurate. As a result of these experiments, the wavelet transform is identified as the more successful dimensionality reduction technique in terms of producing accurate ANNs.
A standard Student's t-test (two-sample, assuming unequal variances) was performed on the two groups of accuracies: those from ANNs using the Fourier transform and those from ANNs using the wavelet transform. The resulting p-values, shown in Table 4.3, are all below the pre-determined significance level of 0.001. A p-value below the significance level indicates that the null hypothesis can be rejected at that level of confidence; here, with the level set to 0.001, the null hypothesis, H0 : µ1 = µ2, can be rejected in favour of the alternative, Ha : µ1 ≠ µ2, with 99.9% confidence. It is therefore statistically very unlikely that the confidence intervals of the two group means overlap. In short, it is highly likely that the two sets of results are significantly different from each other.
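The statistic behind these values is Welch's two-sample t; a minimal sketch (converting t to a p-value additionally requires the t-distribution CDF, e.g. scipy.stats.ttest_ind with equal_var=False):

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    # Two-sample t statistic assuming unequal variances (Welch's test):
    # t = (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b),
    # where var is the sample variance (n - 1 denominator).
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)
    return (mean(sample_a) - mean(sample_b)) / (va / na + vb / nb) ** 0.5
```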
Table 4.3: Student's t-test values for accuracies obtained by using Fourier and wavelet transforms

                          Characteristic
Data Stream     Gait         Shoe         Turn         Merged
A-X             2.36298e-35  5.87038e-15  5.48691e-05  4.10575e-07
A-Y             1.61843e-28  3.64691e-05  5.77122e-27  3.48164e-26
A-Z             3.18551e-32  1.64214e-11  1.10984e-11  1.13491e-22
AC              7.64366e-44  4.12852e-19  1.98393e-18  1.12083e-33
S-G1            1.40112e-35  2.45202e-24  1.45634e-23  4.16257e-31
S-G2            1.54525e-30  2.6781e-18   1.22485e-38  9.15695e-32
S-G3            1.64911e-38  1.11382e-25  4.45357e-50  4.4876e-47
S-G4            1.40235e-33  9.11687e-18  1.36406e-39  2.03756e-28
S-G5            2.86371e-26  2.13113e-11  9.39852e-23  1.72443e-26
SC-G1           1.37974e-34  1.52812e-26  1.49676e-26  2.43568e-43
SC-G2           1.20566e-34  7.11246e-20  7.85118e-42  1.56008e-55
SC-G3           5.53116e-25  8.16644e-21  8.82545e-34  8.39959e-38
SC-G4           2.04738e-34  1.72288e-30  2.44867e-43  3.45588e-60
SC-G5           4.409e-33    2.81711e-34  1.28151e-27  1.12841e-52
4.2.2 Additional Fourier Coefficient Analysis
It can be argued that the Fourier results were due to not using enough coefficients to accurately represent the data stream. While this is possible, only a limited number of coefficients can be used as ANN inputs before the technique is deemed too computationally expensive to function as dimensionality reduction. Figure 4.2 displays the accuracies of the Fourier analysis as the number of coefficients increases. While the Gait and Shoe characteristics gain accuracy when more coefficients are used as inputs, the gain is small compared to the computational resources consumed each time the number of Fourier coefficients doubles.
Supplementary analysis was done to support this claim: Fourier-64 and Fourier-128 were implemented as data reduction techniques, and each of the ANNs using those techniques was run. ANNs using 8-13 inputs required only approximately 10 minutes, while the Fourier-64 technique took an average of an hour to complete.
Figure 4.2: Comparison of Average Accuracies of different Characteristics over Fourier Transform dimensionality reduction techniques
When using the Fourier-128 technique, the ANNs required up to three hours. The accuracy results of these supplementary ANNs can be found in Appendix B.
Figure 4.3 extends Figure 4.2 by adding the results from the supplementary Fourier-64 and Fourier-128 experiments. Based on these results, increasing the number of Fourier coefficients used as inputs did not significantly improve the accuracy. The computational resources required for
such a task were high given such a small increase in accuracy. Table 4.4 shows a metric of accuracy per number of inputs: the accuracy of each dimensionality reduction technique divided by the number of inputs the ANN employed. Depending on the number of inputs used, a technique may be computationally cheaper by using fewer inputs while still obtaining accuracy similar to that of a more expensive technique. While the Fourier-08 transform might be more efficient if computational resources were an immense restriction, the wavelet transforms still present the higher overall accuracy, with an efficiency score that is near that of the Fourier-08.

Figure 4.3: Comparison of Average Accuracies of different Characteristics over Fourier Transform dimensionality reduction techniques including Supplementary Fourier Transforms
Table 4.4: Efficiency Score based on Accuracy of ANN and Number of Inputs used

                          Characteristic
DR Technique    Gait         Shoe         Turn         Merged
Fourier-08      0.116734518  0.111695557  0.097211296  0.060332832
Fourier-16      0.054468919  0.05258853   0.045753864  0.027728577
Fourier-32      0.026520379  0.025338507  0.021499787  0.013096777
Fourier-64      0.012905452  0.012444275  0.010492719  0.00652655
Fourier-128     0.006445499  0.006137613  0.005117233  0.003181044
Wavelet-DB4     0.072869135  0.063198279  0.067307795  0.04851819
Wavelet-Haar    0.071921436  0.06103972   0.064958432  0.044053627
4.2.3 Mother Wavelets: Haar vs DB4
The DB4 wavelet and the Haar wavelet were used as the mother wavelets for the wavelet transform. Table 4.5 shows the number of times an ANN obtained the highest average accuracy using each of the wavelets for preprocessing, along with the average difference between the two accuracies. Table 4.6 shows the average accuracy and average difference by data stream and mother wavelet.
Table 4.5: Number of ANNs using Wavelet Transforms which obtained the Highest Average Accuracy and Average Difference between Accuracy sorted by Mother Wavelets and characteristics

                Wavelet Transform
Characteristic  DB4   Haar  Δ Accuracy
Gait            10    6     0.008323752
Shoe            13    1     0.02863891
Turn            11    3     0.024745839
Merged          14    0     0.047973675
Table 4.6: Average Accuracy and Average Difference between Accuracy sorted by Mother Wavelets and Data Stream Using Wavelet Transforms

                Wavelet Transform
Data Stream     DB4          Haar         Δ Accuracy
A-X             0.79119063   0.768003802  0.023186828
A-Y             0.84677471   0.806964064  0.039810646
A-Z             0.858825466  0.801797136  0.05702833
AC              0.914762539  0.863093716  0.051668822
S-G1            0.805985792  0.778456677  0.027529115
S-G2            0.815444384  0.783450183  0.031994201
S-G3            0.784717474  0.770874718  0.013842755
S-G4            0.849881369  0.799029361  0.050852009
S-G5            0.796408559  0.78272765   0.013680909
SC-G1           0.920025679  0.913462775  0.006562904
SC-G2           0.936713706  0.918870305  0.017843401
SC-G3           0.893036014  0.875786294  0.01724972
SC-G4           0.973571909  0.969383814  0.004188095
SC-G5           0.961039754  0.932589874  0.02844988
Comparing these two transforms directly, the results suggest that using the DB4 mother wavelet most often resulted in more accurate classification by the ANN. A Student's t-test was then performed on the accuracies obtained with the DB4 wavelet versus those obtained with the Haar wavelet; Table 4.7 displays these t-test results.
Using a significance level of p = 0.05 (95%), there is no statistical evidence that the means of the two samples differ significantly across all of the possible
Table 4.7: Student's t-test values for accuracies obtained by using Wavelet-Haar and Wavelet-DB4 transforms

                          Characteristic
Data Stream     Gait         Shoe         Turn         Merged
A-X             0.339680455  0.009768709  0.649663505  0.019779411
A-Y             0.328289413  0.094128691  0.00194188   0.00020807
A-Z             0.005621079  0.256328821  0.000262607  4.47466e-06
AC              0.020537525  0.000711237  0.00430746   0.000104489
S-G1            5.62816e-05  0.000149404  0.97340923   0.001747309
S-G2            0.151671582  0.044723102  6.50605e-06  0.000191347
S-G3            0.158336705  6.11864e-06  0.999995882  0.125999151
S-G4            0.002550454  1.75181e-08  0.000273349  4.67397e-07
S-G5            0.552556776  0.04259035   0.004528171  0.275032293
SC-G1           0.765957446  0.006413291  0.677020257  0.756449585
SC-G2           1.000000000  0.009302044  0.36382045   0.10419116
SC-G3           0.834862234  0.406943561  0.087887825  0.214078641
SC-G4           0.330564931  0.651486607  0.079419143  0.276332993
SC-G5           0.043640834  0.078540882  0.006660337  0.008960876
configurations. While some combinations of data stream and characteristic produced p-values below 0.05, this was not true for every combination. In particular, the data stream SC-G4, which will be examined in more depth later, produced p-values above 0.05, so the difference there is not statistically significant. It can then be concluded that the null hypothesis, H0 : µ1 = µ2, cannot be rejected, due to the lack of statistical evidence. Since the two samples have not been shown to differ significantly from one another, both techniques will be considered when discussing the highest-accuracy configuration.
4.3 Characteristics
4.3.1 Gait, Shoe, and Turn
For some characteristics, the choice of dimensionality reduction technique contributes less to the accuracy of the ANN than the data stream being transformed does: some data streams do not fully represent the absence or presence of a particular characteristic. Figure 4.4 displays the three characteristics (Gait, Shoe, and Turn) over the eight “primary” data streams: the accelerometers in the X, Y, and Z axes, and strain Gauges 1 through 5. This subsection looks at the effect of different data streams on the accuracy of the ANNs based on sensor location. Results from the “combined” data streams are not considered here, as they are simply combinations of these eight “primary” data streams.
Figure 4.4: Comparison of Average Accuracies of non-combined data streams for each characteristic regardless of dimensionality reduction technique
For the Turn characteristic, the X axis of the accelerometer performed best overall, likely because turning is indicative of motion in the X axis. Strain Gauge 3 did not perform as well as the other strain gauges. This can be explained by noting that Strain Gauge 3 was placed at the middle of the hoof, where the variation in strain between turning and moving straight can be minimal. The results suggest that, using only this gauge, the ANN was unable to distinguish a left turn from straight movement. It is also notable that strain gauges located on the lateral (outside) part of the hoof (S-G4 and S-G5) resulted in higher accuracies than strain gauges on the medial (inside) part of the hoof (S-G1 and S-G2). This pattern is explained by the lateral part of the hoof experiencing more variation in strain during turns compared to straight movement, a variation that is smaller on the medial part of the hoof.
4.3.2 Merged Characteristic Used to Enhance Results
The merged characteristic was the final configuration the ANNs were used to classify. It is a combination of the Gait, Shoe, and Turn characteristics. Combining all three made the classification task more difficult, as the ANN needed to classify all three characteristics correctly, producing three correct outputs, in order to be deemed accurate.
The merged characteristic was designed as a general overview test of the ANN's ability to classify all three characteristics at once. Instead of averaging the accuracies from the three individual characteristics, the merged configuration's score is used as the metric of the ANN's overall ability to accurately classify this data.
The ANN is only deemed accurate if it identifies all three output characteristic state pairs correctly; if fewer than all three are correct, the output is ranked incorrect. Any inaccuracy thus has a strong negative effect on the overall performance of the ANN, causing the merged characteristic to produce lower accuracies than the individual characteristics. This was designed to increase the range of accuracies obtained from the ANNs: with a wider range of results, it becomes easier to identify the combinations of data streams and data reduction techniques that increase the accuracy of the ANN.
Table 4.8 is a variant of Table 4.1: instead of displaying the average maximum accuracies over each data stream, it displays both the minimum and maximum accuracies as well as the range between them.
Table 4.8: Range of Average Accuracies over Data Streams found in ANN using different Dimensionality Reduction techniques for each Characteristic

                           Fourier                               Wavelet
Characteristic      08          16          32          DB4         Haar
Gait    Max   0.89345311  0.83183579  0.83697053  1.00000000  1.00000000
        Min   0.80802421  0.79845968  0.79204121  0.91415026  0.90595353
        Range 0.08542889  0.03337611  0.04492932  0.08584974  0.09404647
Shoe    Max   0.84338905  0.84082163  0.83568689  0.96148905  0.96534016
        Min   0.75323558  0.75280411  0.75496111  0.82312342  0.80543584
        Range 0.09015347  0.08801753  0.08072579  0.13836563  0.15990432
Turn    Max   0.83414645  0.792683    0.76829275  0.98780495  0.97926825
        Min   0.54836065  0.57090155  0.55819655  0.6939024   0.70853655
        Range 0.2857858   0.22178145  0.2100962   0.29390255  0.2707317
Merged  Max   0.52804875  0.4646342   0.48155745  0.94756095  0.93292685
        Min   0.33524595  0.31967225  0.31024595  0.5658537   0.5085365
        Range 0.1928028   0.14496195  0.1713115   0.38170725  0.42439035
The merged characteristic, however, did not greatly increase the range for any of the Fourier techniques; see Figures 4.5, 4.6, and 4.7. Given the earlier Fourier results in section 4.2.1, this is likely due to the Fourier transform performing poorly on this data set. As the inaccuracies generated by the Fourier transform compound upon each other, the overall accuracy is lowered for every data stream, and these uniformly low accuracies condense the range: instead of producing results over a wider range, the Fourier transform produces only low-accuracy results.
With the wavelet transforms, the merged characteristic did increase the range of classification accuracies. Data streams that had obtained lower accuracies on the three non-merged characteristics resulted in even lower accuracies under the merged configuration, while data streams that had produced higher-accuracy classifications on the non-merged characteristics suffered only slightly. The range of accuracies therefore increased under the merged configuration compared to the non-merged characteristics; see Figures 4.8 and 4.9 for a graph of this pattern. The wider range makes it more apparent which data streams are effective inputs for accurate classification. With the non-merged classifications it was difficult to separate the results, as accuracies differed by only a few percent. Merging enhances the results by widening the range of output values, which allows this study to easily identify the most accurate ANNs.
Figure 4.5: Average Max Accuracies by Characteristic using the Fourier-8 Data Reduction Technique
Figure 4.6: Average Max Accuracies by Characteristic using the Fourier-16 Data Reduction Technique
Figure 4.7: Average Max Accuracies by Characteristic using the Fourier-32 Data Reduction Technique
Figure 4.8: Average Max Accuracies by Characteristic using the Wavelet-DB4 Data Reduction Technique
Figure 4.9: Average Max Accuracies by Characteristic using the Wavelet-Haar Data Reduction Technique
4.4 Data Streams
4.4.1 Combined Data streams
In section 4.3.1, it was determined that for particular characteristics some data streams resulted in more accurate classifications than others. That section did not include the combined data streams described in section 3.4.1: AC, SC-G1, SC-G2, SC-G3, SC-G4, and SC-G5. This section explores the significance of these data streams; several results from the previous sections are relevant here, namely:
1. Section 4.2.1: the Wavelet Transform was a superior dimensionality reduction technique compared to the Fourier Transform for this work,
2. Section 4.3.2: the merged configuration is a better metric of an ANN's accuracy for the three characteristics than the averaged accuracy of each,
3. Section 4.3.2: using the merged configuration increased the range of accuracy results when using the Wavelet Transforms.
This subsection therefore focuses on the Wavelet Transform results evaluated with the merged characteristic. When using the wavelet transforms there were only small differences between the results of different data streams when classifying the three core characteristics. By using the merged characteristic it became more apparent which streams better represent the presence of the characteristics. These results help determine which data streams are best to use when classifying each of the core characteristics.
Table 4.9 displays each of the eight "primary" data streams along with the six "combined" data streams, and the difference between their average maximum accuracies for the merged characteristic using the Wavelet Transforms. For example, the most accurate "primary" accelerometer data stream under the Wavelet-DB4 technique was A-Z, with a score of 0.71097565, while the "combined" accelerometer stream, AC, obtained 0.82804885. This yields a difference of 0.1170732, positive because AC had the higher accuracy.

Table 4.9: Difference of Average Max Accuracy between combined and single data streams over Wavelet Transforms for the Merged Characteristic

Data Streams        Wavelet-DB4   Wavelet-Haar
A-X                 0.5658537     0.5085365
A-Y                 0.6743903     0.592683
A-Z                 0.71097565    0.59634145
AC                  0.82804885    0.735366
Δ AC and Max A      0.1170732     0.13902455
S-G1                0.61680335    0.5713114
SC-G1               0.8402439     0.8390245
Δ SC-G1 and S-G1    0.22344055    0.2677131
S-G2                0.63237705    0.5786884
SC-G2               0.8743903     0.847561
Δ SC-G2 and S-G2    0.24201325    0.2688726
S-G3                0.5659836     0.55000005
SC-G3               0.7975612     0.7695122
Δ SC-G3 and S-G3    0.2315776     0.21951215
S-G4                0.6754098     0.59467205
SC-G4               0.94756095    0.93292685
Δ SC-G4 and S-G4    0.27215115    0.3382548
S-G5                0.60409835    0.58934435
SC-G5               0.9073169     0.8634147
Δ SC-G5 and S-G5    0.30321855    0.27407035
The absence of negative deltas in this table demonstrates that the "combined" data streams outperform the "primary" data streams. This is likely because a "combined" data stream allows several vectors to be compared at once during classification. As Table 4.9 suggests, using the "combined" accelerometer stream instead of any individual "primary" accelerometer provides much greater accuracy in the ANNs, while SC-G4 leads to the best accuracy among the strain gauges.
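The delta computed for each combined stream in Table 4.9 amounts to a few lines of arithmetic; the sketch below reproduces it using the Wavelet-DB4 accelerometer values from the table:

```python
# Wavelet-DB4 accuracies for the "primary" accelerometer streams (Table 4.9).
primary = {"A-X": 0.5658537, "A-Y": 0.6743903, "A-Z": 0.71097565}
combined_ac = 0.82804885   # "combined" accelerometer stream AC

best_primary = max(primary.values())   # A-Z is the best single stream
delta = combined_ac - best_primary     # positive: the combined stream wins
print(round(delta, 7))
```

A positive delta for every combined stream is what the table reports, which is why the combined streams are preferred as ANN input.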
4.5 Final Configuration
Based on the previous experiments, it was found that using the "combined" Strain Gauge 4 (SC-G4) data stream, along with a Wavelet data reduction technique, leads to the highest accuracy. Table 4.10 presents the accuracies of the characteristics using these settings.
Table 4.10: Accuracies using combined Strain Gauge 4 (SC-G4) data stream with Wavelet Data Reduction Techniques to classify the Gait, Shoe, and Turn characteristics

DR Technique   Gait         Shoe          Turn
Wavelet-DB4    0.99871632   0.960205421   0.98780495
Wavelet-Haar   1.00000000   0.965340158   0.97926825
Chapter 5
Conclusion
In this chapter, the conclusions reached from the results of the study are examined. A summary of these results is reiterated, along with the conjectures that best explain them. After that, a series of suggestions for future work is presented: reasons why a larger data set, which would increase the variance and types of data available, should be collected; suggestions on different data characteristics that could be examined; and, lastly, different methods that may improve the performance of the ANNs.
5.1 Summary
In this study, the periodic patterns found in horse gait data were classified using a Back Propagation ANN after applying several types of signal transforms. Data from both accelerometers and strain gauges were examined. These data streams were preprocessed using both the Fourier Transform and the Wavelet Transform to reduce the number of inputs to the classifier. The coefficients of these transforms were then used as input to a Back Propagation Artificial Neural Network, which classified the data streams into several characteristics.
Results obtained from these ANNs were generally accurate. It was found that ANNs using Wavelet Transforms to preprocess the data performed more accurately than ANNs using the Fourier Transforms. Particular data streams were more effective as input than others. The merged characteristic, which is a combination of the other characteristics, was used as an overall classifier for testing purposes; it, too, could be classified accurately by some of the best-performing data streams and data reduction techniques. Using the merged characteristic as the classification metric, it was also apparent that combined data streams produced more accurate results than any single data stream.
Chapter 1 stated that this thesis would show that dimensionality reduction techniques can be used to preprocess data for classification by an ANN and still produce accurate classifications. It also stated that the thesis would explore the effectiveness of the different techniques and data streams used, analyzing their effect on the ANN's accuracy. It has been demonstrated that the Fourier and wavelet signal transforms can be used to reduce the size of the input data to a Back Propagation Neural Network. It has also been demonstrated that this reduced data is representative of the original data and produces successful classifications on equine gait data.
5.2 Future Work
5.2.1 Data Size, Variance and Types
This study only utilized data from five horses, four of which were "pacers" while one was a "trotter". In future work involving gait analysis, a larger number of subjects with equal numbers of "pacers" and "trotters" would be preferred. With this approach, a more generalized pattern might be recognized between the two gaits.
Similarly, it would be interesting to study other gait characteristics, expanding from two-beat gaits to four-beat gaits. The methods used in this study are a good starting point for the analysis of such gaits. A potential difficulty is that data was collected and analyzed only from the right fore hoof; it might be necessary to instrument all four limbs of the subject. With the increased variance in the types of gait, additional classes could then be included in the "Gait" characteristic.
Further expanding the types of data collected, instead of only analyzing the strain on the hoof wall, strain gauges could be used to measure the strain exerted upon the limbs and joints. This data might provide another dimension, as the strain upon the hoof can be completely different from that measured higher up in the limbs. This becomes exceptionally useful when analyzing the medical ramifications discussed in section 5.2.2.
Another type of data that could be collected is an indication of the track condition during each run. Differences in track condition could produce differences in strain gauge measurements even when the subject performs runs with otherwise identical characteristics. Being able to measure this variable could lead to better accuracy in the ANNs, given their reliance on the strain gauge measurements.
5.2.2 Data Characteristics
Another applicable direction would be applying the results of this study to the field of veterinary medicine. This would rely on the collection of the limb or joint strain data mentioned in section 5.2.1. Analyzing such data might help in diagnosing muscle, tendon or joint injuries. This might require an ANN tailored to a particular subject, as these measurements might vary too widely between subjects to provide useful results. Analyzing this data over the development of a subject's lameness (the inability to travel in a regular manner with all four feet) might contribute to a technique for detecting injury, disease or overworking of the subject. With this knowledge, future horses might be diagnosed at an earlier stage, when they show the same early signs that other subjects exhibited before acquiring such complications. Getting these subjects medical help earlier can contribute to a speedier recovery, or even avoid the complications entirely. This will, however, require a large number of data sets, as different injuries and diseases present with different symptoms. Multiple cases across multiple horses, perhaps with different gaits, will be needed to provide any applicable and useful knowledge for the ANN to learn.
A more ambitious trait to classify is each subject's gait movement in relation to its breed. It would be interesting to analyze the correlation between subtle gait traits and a specific breed of horse, or a particular characteristic that is desirable in a certain breed. Being able to detect these traits might assist in estimating breeding outcomes more accurately.
5.2.3 Methods
A general wavelet and Fourier transform were examined in this study. The data streams were transformed using the Haar and DB4 wavelets, and the sums of the coefficients by level were used as input to the ANN. Repeating this study with another mother wavelet, or with another method of converting the wavelet coefficients into a suitable number of ANN inputs, could also be examined.
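A minimal sketch of this per-level reduction, assuming a power-of-two signal length and using the Haar wavelet. This is an illustrative reimplementation of the idea, not the thesis code:

```python
import numpy as np

def haar_level_sums(signal, levels):
    """Multilevel Haar DWT; returns the sum of detail coefficients at each
    level, plus the sum of the final approximation, one number per level."""
    a = np.asarray(signal, dtype=float)
    sums = []
    for _ in range(levels):
        approx = (a[0::2] + a[1::2]) / np.sqrt(2.0)   # low-pass: pair averages
        detail = (a[0::2] - a[1::2]) / np.sqrt(2.0)   # high-pass: pair differences
        sums.append(detail.sum())                     # one feature per level
        a = approx
    sums.append(a.sum())                              # coarsest approximation
    return sums

features = haar_level_sums([4.0, 2.0, 6.0, 8.0], levels=2)
print(features)  # three ANN inputs instead of four raw samples
```

However the coefficients are condensed (sums, energies, or selected coefficients), the goal is the same: a fixed, small input vector for the Back Propagation network.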
Appendix A
Results of Artificial Neural
Networks
A.1 Note of Results
Each combination of Data Reduction Technique, Data Stream, and Characteristic was classified by an ANN multiple times, each run with a different random initialization. This Appendix presents results as average values of both the maximum accuracy obtained and the epoch at which the ANN reached this maximum.
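The averaging described here can be sketched as follows, with illustrative run results rather than the thesis data:

```python
# Each run records its best accuracy and the epoch it occurred at;
# the appendix tables report the averages over runs (values illustrative).
runs = [(0.98, 110), (0.99, 95), (0.97, 120)]  # (max accuracy, epoch) per run

avg_acc = sum(acc for acc, _ in runs) / len(runs)
avg_epoch = sum(ep for _, ep in runs) / len(runs)
print(round(avg_acc, 4), round(avg_epoch, 1))
```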
A.2 Gait
Table A.1: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Gait Characteristic

                Fourier-08          Fourier-16          Fourier-32
Data Streams    Accuracy     Epoch  Accuracy     Epoch  Accuracy     Epoch
A-X             0.70454555   4248   0.69924250   4562   0.75378785   2565
A-Y             0.77575745   5591   0.80151500   7082   0.82803015   6094
A-Z             0.81818175   5839   0.80833330   2458   0.81742415   1900
AC              0.85227255   3242   0.83333335   2065   0.81515150   2194
S-G1            0.85564100   2554   0.89512815   1583   0.88256410   2728
S-G2            0.84948715   3657   0.87846145   1591   0.88794875   1211
S-G3            0.84948715   3721   0.84333325   2729   0.86769230   800
S-G4            0.83641010   3937   0.87666655   1949   0.87846155   938
S-G5            0.84461550   2373   0.89128195   1272   0.90999995   714
SC-G1           0.84035100   1464   0.82368420   713    0.86666670   667
SC-G2           0.90789475   626    0.78245615   123    0.82368425   1087
SC-G3           0.93421045   1050   0.84473685   1189   0.83771930   758
SC-G4           0.83596495   175    0.78859645   51     0.85263160   142
SC-G5           0.88508775   313    0.83596490   148    0.89736840   555
Table A.2: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Gait Characteristic

                Wavelet-DB4         Wavelet-Haar
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.97803010   725    0.96969675   986
A-Y             0.96515135   539    0.96590900   387
A-Z             0.98484825   110    0.96666645   1678
AC              0.99772720   106    0.98712100   68
S-G1            0.96000005   1280   0.96564120   1864
S-G2            0.96076930   2220   0.94717945   4015
S-G3            0.97205130   1118   0.96717955   2175
S-G4            0.97384625   2605   0.94923085   2176
S-G5            0.95974355   1651   0.95230775   2831
SC-G1           0.99912280   19     1.00000000   19
SC-G2           1.00000000   11     1.00000000   20
SC-G3           0.99473680   14     0.99122800   29
SC-G4           1.00000000   7      1.00000000   13
SC-G5           0.99035085   54     0.99122805   28
A.3 Shoe
Table A.3: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Shoe Characteristic

                Fourier-08          Fourier-16          Fourier-32
Data Streams    Accuracy     Epoch  Accuracy     Epoch  Accuracy     Epoch
A-X             0.63306425   3584   0.63387075   6849   0.69435465   4911
A-Y             0.66935450   5962   0.69999970   4373   0.73145140   4847
A-Z             0.79919370   3826   0.82661310   2167   0.86935510   1114
AC              0.83629065   3142   0.86129060   1550   0.86935510   1134
S-G1            0.76436455   4538   0.82182320   1667   0.83701650   1730
S-G2            0.78618795   3168   0.83397790   1970   0.85248620   852
S-G3            0.72817670   5499   0.79447520   1263   0.81215470   2011
S-G4            0.79585645   3239   0.84447520   1918   0.86381215   1474
S-G5            0.79005535   2892   0.85082870   1437   0.88066310   1035
SC-G1           0.85327860   832    0.83196715   1131   0.88688525   562
SC-G2           0.87459015   211    0.87540975   387    0.89754105   554
SC-G3           0.82950820   847    0.82377040   556    0.85819675   144
SC-G4           0.91229510   677    0.87868850   375    0.88032780   113
SC-G5           0.90983615   2623   0.89918035   434    0.90409835   864
Table A.4: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Shoe Characteristic

                Wavelet-DB4         Wavelet-Haar
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.93387110   1397   0.91612920   2376
A-Y             0.91209695   3637   0.87983895   4984
A-Z             0.95483880   840    0.91854855   1441
AC              0.99435485   112    0.96854845   392
S-G1            0.92265210   4563   0.89917130   4409
S-G2            0.86602205   4284   0.80718220   6298
S-G3            0.89917130   2213   0.88535905   3843
S-G4            0.93121545   1985   0.90441990   3318
S-G5            0.89309400   2267   0.86022100   3796
SC-G1           0.97213125   695    0.95819685   590
SC-G2           0.94754100   214    0.93032785   418
SC-G3           0.98196735   91     0.96721325   336
SC-G4           0.98852475   49     0.98524600   317
SC-G5           0.97950830   128    0.97541015   64
A.4 Turn
Table A.5: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Turn Characteristic

                Fourier-08          Fourier-16          Fourier-32
Data Streams    Accuracy     Epoch  Accuracy     Epoch  Accuracy     Epoch
A-X             0.78536595   5951   0.77195130   4512   0.72560980   3772
A-Y             0.61829265   6522   0.63048775   4532   0.62195120   3319
A-Z             0.69878065   2083   0.71097555   1948   0.65487820   689
AC              0.72195135   2049   0.71707315   1655   0.63536585   655
S-G1            0.65122955   1061   0.67499995   662    0.67459005   238
S-G2            0.68524580   669    0.69836055   161    0.66311475   198
S-G3            0.54836065   1426   0.57090155   527    0.55819655   2296
S-G4            0.70942620   124    0.68032795   637    0.67827875   289
S-G5            0.74713110   439    0.75245905   306    0.75532790   149
SC-G1           0.74634155   32     0.71707330   50     0.68658540   30
SC-G2           0.76585375   251    0.71829275   169    0.67804880   343
SC-G3           0.65975600   43     0.63658550   63     0.60121950   23
SC-G4           0.77682935   124    0.76585375   39     0.73292685   208
SC-G5           0.83414645   114    0.79268300   61     0.76829275   135
Table A.6: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Turn Characteristic

                Wavelet-DB4         Wavelet-Haar
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.69390240   2115   0.70853655   1979
A-Y             0.87317075   865    0.80975615   1322
A-Z             0.82317090   1440   0.75975620   1569
AC              0.86951220   234    0.81585355   812
S-G1            0.78401645   1977   0.78237715   4744
S-G2            0.88565575   2693   0.83811470   3522
S-G3            0.77909840   2489   0.78155735   2779
S-G4            0.88934415   1473   0.85122955   1394
S-G5            0.84426220   1015   0.81680325   1548
SC-G1           0.89634150   141    0.90853660   328
SC-G2           0.95975605   124    0.95243885   690
SC-G3           0.86829275   356    0.84146345   897
SC-G4           0.98780495   21     0.97926825   37
SC-G5           0.98048780   49     0.94268290   200
A.5 Merged Characteristics
Table A.7: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze all of the characteristics merged

                Fourier-08          Fourier-16          Fourier-32
Data Streams    Accuracy     Epoch  Accuracy     Epoch  Accuracy     Epoch
A-X             0.47439010   3568   0.45243900   2655   0.43658535   1335
A-Y             0.39878045   4030   0.42926830   3101   0.41585350   2047
A-Z             0.43292685   1135   0.42195110   723    0.39146345   393
AC              0.43902445   819    0.43048775   641    0.39268290   337
S-G1            0.40286890   141    0.40737700   50     0.40901645   126
S-G2            0.41680335   257    0.40655745   39     0.40245905   42
S-G3            0.33524595   585    0.31967225   125    0.31024595   23
S-G4            0.45778690   76     0.42581975   59     0.40081960   373
S-G5            0.45983610   135    0.46434435   252    0.48155745   1243
SC-G1           0.49878050   53     0.46463420   187    0.45487810   31
SC-G2           0.48414630   38     0.41585375   17     0.41097560   41
SC-G3           0.39756100   39     0.36707305   5      0.36097555   39
SC-G4           0.51707315   214    0.44634135   20     0.42439035   12
SC-G5           0.52804875   227    0.46463415   87     0.44756105   101
Table A.8: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze all of the characteristics merged

                Wavelet-DB4         Wavelet-Haar
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.56585370   977    0.50853650   1661
A-Y             0.67439030   1069   0.59268300   1787
A-Z             0.71097565   1312   0.59634145   1244
AC              0.82804885   886    0.73536600   963
S-G1            0.61680335   1406   0.57131140   3194
S-G2            0.63237705   1331   0.57868840   1553
S-G3            0.56598360   562    0.55000005   797
S-G4            0.67540980   1452   0.59467205   1901
S-G5            0.60409835   643    0.58934435   2161
SC-G1           0.84024390   899    0.83902450   1519
SC-G2           0.87439030   183    0.84756100   553
SC-G3           0.79756120   915    0.76951220   360
SC-G4           0.94756095   607    0.93292685   323
SC-G5           0.90731690   379    0.86341470   551
Appendix B
Supplementary Results of
Artificial Neural Networks
B.1 Note of Results
Each combination of Fourier Data Reduction Technique, Data Stream, and Characteristic was classified by an ANN multiple times, each run with a different random initialization. This Appendix presents results as average values of both the maximum accuracy obtained and the epoch at which the ANN reached this maximum.
B.2 Gait
Table B.1: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Gait Characteristic

                Fourier-64          Fourier-128
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.8390244    16     0.8195122    1
A-Y             0.8243904    44     0.8243904    1
A-Z             0.8146344    1      0.8195122    1
AC              0.8390246    268    0.8292684    12
S-G1            0.8213114    1370   0.8295082    5
S-G2            0.8016394    46     0.822951     1
S-G3            0.790164     163    0.8245902    1
S-G4            0.804918     1      0.8213116    1
S-G5            0.842623     359    0.8934426    310
SC-G1           0.804878     11     0.8048782    1535
SC-G2           0.785366     1      0.785366     1
SC-G3           0.7951222    7      0.7902442    1702
SC-G4           0.8146344    499    0.8000002    27
SC-G5           0.8048782    2390   0.7951222    401
B.3 Shoe
Table B.2: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Shoe Characteristic

                Fourier-64          Fourier-128
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.8243902    17     0.8146342    2
A-Y             0.8585364    328    0.8146342    9
A-Z             0.8390244    87     0.8048782    1
AC              0.8195122    55     0.8          10
S-G1            0.762295     2      0.7721312    48
S-G2            0.7606558    4      0.760656     3
S-G3            0.7606558    4      0.7573772    3
S-G4            0.7672132    62     0.7688526    10
S-G5            0.7786886    311    0.8000002    23
SC-G1           0.7463416    2      0.7609758    77
SC-G2           0.7707318    19     0.7658538    65
SC-G3           0.7414634    1      0.7512194    21
SC-G4           0.7512196    4      0.7414634    1
SC-G5           0.7951222    8      0.8          7
B.4 Turn
Table B.3: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Turn Characteristic

                Fourier-64          Fourier-128
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.687805     3390   0.619512     415
A-Y             0.6390244    2012   0.6243902    305
A-Z             0.6341466    315    0.5756098    35
AC              0.6487806    1003   0.609756     73
S-G1            0.6836064    79     0.7          789
S-G2            0.657377     286    0.6426232    20
S-G3            0.514754     2007   0.519672     46
S-G4            0.6770492    56     0.6934426    29
S-G5            0.780328     375    0.752459     27
SC-G1           0.6731706    10     0.7170732    169
SC-G2           0.6829268    345    0.604878     1757
SC-G3           0.6048782    4      0.614634     230
SC-G4           0.6731708    108    0.726829     1947
SC-G5           0.6975608    134    0.6975612    497
B.5 Merged
Table B.4: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Merged Characteristic

                Fourier-64          Fourier-128
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.5439025    2568   0.4829267    214.5
A-Y             0.5268294    1223   0.5          163
A-Z             0.5146342    180    0.490244     26
AC              0.5341463    551.5  0.509756     48.5
S-G1            0.554918     52     0.5557377    1140
S-G2            0.5262294    155    0.5131149    278
S-G3            0.4139342    1004.5 0.4254097    24
S-G4            0.5336066    79.5   0.5557376    33.5
S-G5            0.6475411    817.5  0.6418032    66.5
SC-G1           0.5634145    11     0.5756098    1013
SC-G2           0.5439025    199    0.504878     917
SC-G3           0.4926829    28.5   0.4926828    131
SC-G4           0.5487806    59     0.5804876    977
SC-G5           0.5609756    93     0.5487806    317.5
Appendix C
Standard Deviation of Results
C.1 Note of Results
Each combination of Data Reduction Technique, Data Stream, and Characteristic was classified by an ANN multiple times. This Appendix documents the standard deviation of each group of results.
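A sketch of the reported quantity, with illustrative accuracies; the sample standard deviation is assumed here, since the variant is not specified:

```python
import statistics

# Illustrative accuracies from repeated runs of one configuration.
accuracies = [0.93, 0.95, 0.91, 0.94, 0.92]
spread = statistics.stdev(accuracies)   # sample standard deviation
print(round(spread, 4))
```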
C.2 Gait
Table C.1: Standard Deviations of ANNs analyzing the Gait Characteristic using the Fourier Data Reduction Technique

Data Stream   Fourier-08    Fourier-16    Fourier-32    Fourier-64    Fourier-128
A-X           0.043570101   0.047721676   0.048814275   0.039629453   0.058941793
A-Y           0.045094107   0.049351359   0.049317956   0.035846296   0.0520832
A-Z           0.038425269   0.037993951   0.031705085   0.039629453   0.050222414
AC            0.038425283   0.035805767   0.031860605   0.047794832   0.046277193
S-G1          0.030675335   0.032604703   0.028622641   0.040621318   0.033355391
S-G2          0.031240397   0.031584053   0.030772196   0.06503879    0.033436183
S-G3          0.032712969   0.033881104   0.042034965   0.073970631   0.028204432
S-G4          0.033310548   0.033853718   0.029975659   0.037166419   0.031703339
S-G5          0.030259945   0.031162901   0.036904688   0.028582906   0.022596844
SC-G1         0.042108406   0.051412034   0.052238745   0.043630717   0.046277509
SC-G2         0.054189644   0.049584594   0.039856712   0.039024625   0.039024625
SC-G3         0.058263418   0.053578067   0.045385405   0.03308456    0.033084589
SC-G4         0.042536694   0.042883871   0.046284414   0.033084677   0.028443761
SC-G5         0.045385488   0.054523186   0.045675084   0.026718272   0.0292684
Table C.2: Standard Deviations of ANNs analyzing the Gait Characteristic using the Wavelet Data Reduction Technique

Data Stream   Wavelet-DB4   Wavelet-Haar
A-X           0.023600536   0.03149653
A-Y           0.027711338   0.026681321
A-Z           0.026804459   0.030540132
AC            0.01176516    0.016338906
S-G1          0.015815619   0.022656051
S-G2          0.02232508    0.02528065
S-G3          0.016153135   0.019589826
S-G4          0.024235552   0.021963733
S-G5          0.0281904     0.023375761
SC-G1         0.01176516    0.013825858
SC-G2         0.00000000    0.00000000
SC-G3         0.019969694   0.016539351
SC-G4         0.005446211   0.00000000
SC-G5         0.009943375   0.018063206
C.3 Shoe
Table C.3: Standard Deviations of ANNs analyzing the Shoe Characteristic using the Fourier Data Reduction Technique

Data Stream   Fourier-08    Fourier-16    Fourier-32    Fourier-64    Fourier-128
A-X           0.05755193    0.052834571   0.051444054   0.047294295   0.045237099
A-Y           0.049016241   0.047061042   0.051795121   0.047294439   0.070012861
A-Z           0.056976423   0.048065766   0.049351348   0.036504037   0.051161277
AC            0.055004526   0.040715679   0.042768484   0.042525808   0.056467537
S-G1          0.023542362   0.020581234   0.02385635    0.005184238   0.00803102
S-G2          0.022754448   0.023668512   0.022308345   0.010874278   0.01311475
S-G3          0.020234734   0.022132504   0.021090486   0.010874278   0.009835933
S-G4          0.026056316   0.024296957   0.020662567   0.014291384   0.008031387
S-G5          0.019225458   0.015709371   0.025014184   0.007331172   0.011118362
SC-G1         0.04439441    0.048065836   0.048814253   0.054755788   0.047294295
SC-G2         0.045349249   0.037027519   0.048168502   0.045237423   0.071692373
SC-G3         0.046955975   0.052396382   0.050080734   0.062849348   0.067943425
SC-G4         0.048339466   0.044690335   0.052080871   0.071359986   0.062849348
SC-G5         0.043266486   0.041038188   0.031912125   0.042525808   0.028443383
Table C.4: Standard Deviations of ANNs analyzing the Shoe Characteristic using the Wavelet Data Reduction Technique

Data Stream   Wavelet-DB4   Wavelet-Haar
A-X           0.042613977   0.047513901
A-Y           0.038638865   0.041912211
A-Z           0.056453365   0.057063087
AC            0.029160119   0.04653296
S-G1          0.022005869   0.024297029
S-G2          0.022811607   0.020680531
S-G3          0.016279438   0.020752383
S-G4          0.020472582   0.027495126
S-G5          0.024849911   0.025624284
SC-G1         0.041477496   0.038253331
SC-G2         0.033866143   0.046070155
SC-G3         0.064798082   0.042960497
SC-G4         0.034683666   0.032928831
SC-G5         0.032928843   0.039063185
C.4 Turn
Table C.5: Standard Deviations of ANNs analyzing the Turn Characteristic using the Fourier Data Reduction Technique

Data Stream   Fourier-08    Fourier-16    Fourier-32    Fourier-64    Fourier-128
A-X           0.043946685   0.061990769   0.042652706   0.04729448    0.011948611
A-Y           0.077214195   0.062835657   0.067756851   0.062469317   0.058941793
A-Z           0.079609963   0.058994174   0.062940505   0.068986115   0.045237099
AC            0.070357835   0.070474869   0.054824572   0.054755877   0.0487806
S-G1          0.023391526   0.028308971   0.026502561   0.02959913    0.012267425
S-G2          0.028302267   0.028713687   0.032877563   0.04734261    0.049344046
S-G3          0.029474812   0.036879622   0.028629242   0.018976684   0.032207959
S-G4          0.028374555   0.026220084   0.02623429    0.033835575   0.034620871
S-G5          0.029838876   0.027838311   0.03030938    0.014102234   0.020343968
SC-G1         0.052521954   0.056424378   0.065957572   0.061703073   0.028443383
SC-G2         0.040594112   0.049016464   0.053608746   0.032357323   0.047294584
SC-G3         0.046070224   0.068072225   0.042883942   0.024873331   0.062469505
SC-G4         0.050408625   0.042108464   0.05128372    0.042525716   0.042525808
SC-G5         0.052834806   0.04357004    0.043494447   0.052538151   0.045237099
Table C.6: Standard Deviations of ANNs analyzing the Turn Characteristic using the Wavelet Data Reduction Technique

Data Stream   Wavelet-DB4   Wavelet-Haar
A-X           0.064313069   0.0854407
A-Y           0.056628361   0.051443967
A-Z           0.048271144   0.040553423
AC            0.03944101    0.058798475
S-G1          0.03469531    0.042052677
S-G2          0.025507948   0.026502541
S-G3          0.025905992   0.03425262
S-G4          0.025236319   0.030186241
S-G5          0.02391876    0.024789965
SC-G1         0.052772169   0.057178599
SC-G2         0.024825693   0.025416087
SC-G3         0.046461976   0.054159114
SC-G4         0.014523364   0.019212795
SC-G5         0.018691057   0.053269533
C.5 Merged
Table C.7: Standard Deviations of ANNs analyzing all of the characteristics merged using the Fourier Data Reduction Technique

Data Stream   Fourier-08    Fourier-16    Fourier-32    Fourier-64    Fourier-128
A-X           0.053854203   0.045675059   0.043984267   0.011948611   0.03902445
A-Y           0.062229546   0.058263439   0.058938353   0.043630606   0.062849363
A-Z           0.065657023   0.051922391   0.045458072   0.047294295   0.03308456
AC            0.053021601   0.049948845   0.048305235   0.047294584   0.066169001
S-G1          0.038198572   0.031406815   0.030820712   0.021374394   0.006134073
S-G2          0.025898725   0.025711376   0.0369804     0.02283374    0.02995987
S-G3          0.029682371   0.01760856    0.02718207    0.047057961   0.026736812
S-G4          0.030504956   0.026523549   0.030260091   0.040687371   0.037740476
S-G5          0.026269729   0.031270197   0.031631313   0.018976961   0.033355568
SC-G1         0.057265076   0.065001163   0.052616007   0.042525854   0.060535146
SC-G2         0.046320019   0.051219427   0.05700542    0.03308456    0.058941678
SC-G3         0.043758852   0.057552132   0.05382339    0.0425259     0.044708089
SC-G4         0.049584626   0.05709202    0.06469627    0.039629453   0.064345937
SC-G5         0.045675072   0.044059111   0.05017929    0.054755877   0.050222667
Table C.8: Standard Deviations of ANNs analyzing all of the characteristics merged using the Wavelet Data Reduction Technique

Data Stream   Wavelet-DB4   Wavelet-Haar
A-X           0.06423613    0.077064675
A-Y           0.04428295    0.061403191
A-Z           0.06280936    0.058348364
AC            0.050961328   0.071242282
S-G1          0.040931337   0.03613024
S-G2          0.032985064   0.03883136
S-G3          0.027182018   0.035026377
S-G4          0.038523426   0.037475497
S-G5          0.034149256   0.040294406
SC-G1         0.065657014   0.057264812
SC-G2         0.04222549    0.053945709
SC-G3         0.070099861   0.075922832
SC-G4         0.041794128   0.048509409
SC-G5         0.05079916    0.043570037
Bibliography
[1] Filip Wasilewski. Daubechies 4 wavelet (db4) properties, filters and functions, February 2014. URL http://wavelets.pybytes.com/wavelet/db4/. From Wavelet Properties Browser.
[2] Filip Wasilewski. Haar wavelet (haar) properties, filters and functions, February 2014. URL http://wavelets.pybytes.com/wavelet/haar/. From Wavelet Properties Browser.
[3] Joshua Altmann. 3 - wavelet basics, February 2014. URL http://www.wavelet.org/tutorial/wbasic.htm. From Wavelets Tutorial.
[4] D.L. Poole, A.K. Mackworth, and R. Goebel. Computational Intelligence: A Logical Approach. Oxford University Press, 1998. ISBN 9780195102703. URL http://books.google.ca/books?id=RCaOtmXvbCUC.
[5] John B Kaneene, Whitney A Ross, and RoseAnn Miller. The michigan equine monitoring system. ii. frequencies and impact of selected health problems. Preventive veterinary medicine, 29(4):277–292, 1997.
[6] Geraint Wyn-Jones et al. Equine lameness. Blackwell Scientific Publications, 1988.
[7] M Hewetson, RM Christley, ID Hunt, and LC Voute. Investigations of the reliability
of observational gait analysis for the assessment of lameness in horses. Veterinary
record: journal of the British Veterinary Association, 158(25), 2006.
[8] KG Keegan, DA Wilson, DJ Wilson, B Smith, EM Gaughan, RS Pleasant, JD Lillich, J Kramer, RD Howard, C Bacon-Miller, et al. Evaluation of mild lameness in horses trotting on a treadmill by clinicians and interns or residents and correlation of their assessments with kinematic gait analysis. American journal of veterinary research, 59(11):1370–1377, 1998.
[9] T Pfau, JJ Robilliard, R Weller, K Jespers, E Eliashar, and AM Wilson. Assessment of mild hindlimb lameness during over ground locomotion using linear discriminant analysis of inertial sensor data. Equine veterinary journal, 39(5):407–413, 2007.
[10] Akikazu Ishihara, Stephen M Reed, Paivi J Rajala-Schultz, James T Robertson, and Alicia L Bertone. Use of kinetic gait analysis for detection, quantification, and differentiation of hind limb lameness and spinal ataxia in horses. Journal of the American Veterinary Medical Association, 234(5):644–651, 2009.
[11] Ellen Bajcar, David Calvert, and Jeff Thomason. Analysis of equine gaitprint and other gait characteristics using self-organizing maps (som). In Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on, volume 1. IEEE, 2004.
[12] P. Simon. Too Big to Ignore: The Business Case for Big Data. Wiley and SAS Business Series. Wiley, 2013. ISBN 9781118642108. URL http://books.google.ca/books?id=Dn-Gdoh66sgC.
[13] Stanley Smith Stevens. On the theory of scales of measurement, 1946.
[14] Terrence J Sejnowski and Charles R Rosenberg. Parallel networks that learn to
pronounce english text. Complex systems, 1(1):145–168, 1987.
[15] Timo Koskela, Mikko Lehtokangas, Jukka Saarinen, and Kimmo Kaski. Time series
prediction with multilayer perceptron, fir and elman neural networks. In Proceedings
of the World Congress on Neural Networks, pages 491–496, 1996.
[16] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000. doi: 10.1126/science.290.5500.2323. URL http://www.sciencemag.org/content/290/5500/2323.abstract.
[17] Yvan Saeys, Inaki Inza, and Pedro Larranaga. A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19):2507–2517, 2007. doi: 10.1093/bioinformatics/btm344. URL http://bioinformatics.oxfordjournals.org/content/23/19/2507.abstract.
[18] Josef Kittler. Feature selection and extraction. Handbook of pattern recognition and
image processing, pages 59–83, 1986.
[19] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. The Journal of Machine Learning Research, 3:1157–1182, 2003.
[20] Ron Kohavi and George H John. Wrappers for feature subset selection. Artificial
intelligence, 97(1):273–324, 1997.
[21] Avrim L Blum and Pat Langley. Selection of relevant features and examples in
machine learning. Artificial intelligence, 97(1):245–271, 1997.
[22] Thomas Navin Lal, Olivier Chapelle, Jason Weston, and André Elisseeff. Embedded methods. In Feature extraction, pages 137–165. Springer, 2006.
[23] Rick Archibald and George Fann. Feature selection and classification of hyperspec-
tral images with support vector machines. Geoscience and Remote Sensing Letters,
IEEE, 4(4):674–677, 2007.
[24] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
[25] Yiming Yang and Jan O Pedersen. A comparative study on feature selection in text
categorization. In ICML, volume 97, pages 412–420, 1997.
[26] Michael J. Walk and Andre A. Rupp. Pearson Product-Moment Correlation Coefficient, pages 1023–1027. SAGE Publications, Inc., 2010. doi: http://dx.doi.org/10.4135/9781412961288. URL http://dx.doi.org/10.4135/9781412961288.
[27] Stephen M. Stigler. Francis Galton's account of the invention of correlation. Statistical Science, 4(2):73–79, 05 1989. doi: 10.1214/ss/1177012580. URL http://dx.doi.org/10.1214/ss/1177012580.
[28] Daphne Koller and Mehran Sahami. Toward optimal feature selection. 1996.
[29] Lei Yu and Huan Liu. Feature selection for high-dimensional data: A fast
correlation-based filter solution. In ICML, volume 3, pages 856–863, 2003.
[30] Isabelle Guyon, Steve Gunn, Masoud Nikravesh, and L Zadeh. Feature extraction: foundations and applications. Springer, 2006.
[31] Imola K Fodor. A survey of dimension reduction techniques, 2002.
[32] Padraig Cunningham. Dimension reduction. In Machine learning techniques for
multimedia, pages 91–112. Springer, 2008.
[33] Roland Priemer. Introductory signal processing, volume 6. World Scientific, 1991.
[34] Wenxin Li, David Zhang, and Zhuoqun Xu. Palmprint identification by Fourier transform. International Journal of Pattern Recognition and Artificial Intelligence, 16(04):417–432, 2002.
[35] Jing Lin and Liangsheng Qu. Feature extraction based on Morlet wavelet and its application for mechanical fault diagnosis. Journal of Sound and Vibration, 234(1):135–148, 2000.
[36] H.C. Taneja. Advanced Engineering Mathematics, volume 2. I.K. International Publishing House Pvt. Limited, 2007. ISBN 9788189866563. URL http://books.google.ca/books?id=X-RFRHxMzvYC.
[37] Eric W. Weisstein. Fourier series, February 2014. URL http://mathworld.wolfram.com/FourierSeries.html. From MathWorld–A Wolfram Web Resource.
[38] Eric W. Weisstein. Generalized Fourier series, February 2014. URL http://mathworld.wolfram.com/GeneralizedFourierSeries.html. From MathWorld–A Wolfram Web Resource.
[39] Mizan Rahman. Applications of Fourier Transforms to Generalized Functions. WIT
Press, 2011.
[40] Eric W. Weisstein. Fourier transform, February 2014. URL http://mathworld.wolfram.com/FourierTransform.html. From MathWorld–A Wolfram Web Resource.
[41] Eric W. Weisstein. Discrete Fourier transform, February 2014. URL http://mathworld.wolfram.com/DiscreteFourierTransform.html. From MathWorld–A Wolfram Web Resource.
[42] Tom Chau. A review of analytical techniques for gait data. part 2: neural network
and wavelet methods. Gait & Posture, 13(2):102–120, 2001.
[43] JG Barton and A Lees. An application of neural networks for distinguishing gait
patterns on the basis of hip-knee joint angle diagrams. Gait & Posture, 5(1):28–33,
1997.
[44] James S Walker. Fourier analysis and wavelet analysis. Notices of the AMS, 44(6):
658–670, 1997.
[45] J Allen. Short-term spectral analysis, and modification by discrete Fourier transform. IEEE Transactions on Acoustics, Speech and Signal Processing, 25:235–238, 1977.
[46] I. Daubechies. The wavelet transform, time-frequency localization and signal anal-
ysis. Information Theory, IEEE Transactions on, 36(5):961–1005, Sep 1990. ISSN
0018-9448. doi: 10.1109/18.57199.
[47] Gerald Kaiser. A friendly guide to wavelets. Springer, 2010.
[48] Stephane G Mallat. A theory for multiresolution signal decomposition: the wavelet
representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on,
11(7):674–693, 1989.
[49] D Lee Fugal. Conceptual wavelets in digital signal processing: an in-depth, practical
approach for the non-mathematician. Space & Signals Technical Pub., 2009.
[50] Dan B. Marghitu and Prasad Nalluri. An analysis of greyhound gait using wavelets. Journal of Electromyography and Kinesiology, 7(3):203–212, 1997. ISSN 1050-6411. doi: http://dx.doi.org/10.1016/S1050-6411(96)00035-1. URL http://www.sciencedirect.com/science/article/pii/S1050641196000351.
[51] B.A. Paya, I.I. Esat, and M.N.M. Badi. Artificial neural network based fault diagnostics of rotating machinery using wavelet transforms as a preprocessor. Mechanical Systems and Signal Processing, 11(5):751–765, 1997. ISSN 0888-3270. doi: http://dx.doi.org/10.1006/mssp.1997.0090. URL http://www.sciencedirect.com/science/article/pii/S088832709790090X.
[52] T Tamura, M Sekine, M Ogawa, T Togawa, and Y Fukui. Classification of acceler-
ation waveforms during walking by wavelet transform. Methods of information in
medicine, 36(4-5):356–359, 1997.
[53] Matteo Frigo and Steven G Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216–231, 2005.
[54] Filip Wasilewski. PyWavelets - discrete wavelet transform in Python, February 2014. URL http://www.pybytes.com/pywavelets/.
[55] Adam Blum. Neural networks in C++: an object-oriented framework for building
connectionist systems. John Wiley & Sons, Inc., 1992.
[56] Gordon S Lino↵ and Michael JA Berry. Data mining techniques: for marketing,
sales, and customer relationship management. John Wiley & Sons, 2011.