Equine Gait Data Analysis using Signal Transforms as a
Preprocessor to Back Propagation Networks
by
Edwin H. Cheung
A Thesis
presented to
The University of Guelph
In partial fulfilment of requirements
for the degree of
Master of Science
in
School of Computer Science
Guelph, Ontario, Canada
© Edwin H. Cheung, April, 2014
ABSTRACT
Equine Gait Data Analysis using Signal Transforms as a Preprocessor to
Back Propagation Networks
Edwin H. Cheung
University of Guelph, 2014
Advisor:
Dr. David Calvert
This thesis examines the use of Back Propagation networks in the analysis of equine
gait data. Back Propagation networks are capable of classifying non-linear data sets, but
are not usually built to handle time series data. By using Fourier and wavelet transforms
as a pre-processor, the Back Propagation network is able to overcome this hurdle. It
was then able to analyze and classify gait, shoeing, and direction in the gait data quite
accurately and effectively. Several methods proved to be more effective than others.
Acknowledgements
I would like to thank everyone who has supported me to complete this thesis. My
parents for allowing me to pursue this dream. Lifelong friends who have pushed me to
finish this degree. My fellow housemates for always being there when needed. Jenna
Stephens for being priceless, exuberant, and accepting. Finally, credit to Dr. David
Calvert, who has been everything an advisor should be – and more.
Contents
Abstract i
Acknowledgements iii
Contents iv
List of Figures v
List of Tables vi
1 Introduction 1
2 Literature Review 5
2.1 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Architecture of Artificial Neural Networks . . . . . . . . . . . . . . 6
2.1.2 Training of Supervised Artificial Neural Networks . . . . . . . . . . 8
2.1.3 Classifying vs Clustering . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.4 Testing of Supervised Artificial Neural Networks . . . . . . . . . . 9
2.1.5 Back Propagation Artificial Neural Network . . . . . . . . . . . . . 10
2.2 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Signal Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.4 Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.5 Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3 Methodology 29
3.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.1 Origins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.2 Computable characteristics . . . . . . . . . . . . . . . . . . . . . . 30
3.1.3 Naming Convention . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.1 Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . 36
3.2.2 Discrete Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . 37
3.3 Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4 Data Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.1 Data Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4 Results and Discussions 48
4.1 Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2 Dimensionality Reduction Techniques . . . . . . . . . . . . . . . . . . . . 49
4.2.1 Fourier vs Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2.2 Additional Fourier Coefficient Analysis . . . . . . . . . . . . . . . . 53
4.2.3 Mother Wavelets: Haar vs DB4 . . . . . . . . . . . . . . . . . . . . 55
4.3 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3.1 Gait, Shoe, and Turn . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3.2 Merged Characteristics Used to Enhance Results . . . . . . . . . . 59
4.4 Data Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4.1 Combined Data streams . . . . . . . . . . . . . . . . . . . . . . . . 67
4.5 Final Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5 Conclusion 70
5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2.1 Data Size, Variance and Types . . . . . . . . . . . . . . . . . . . . 71
5.2.2 Data Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
A Results of Artificial Neural Networks 74
A.1 Note of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
A.2 Gait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
A.3 Shoe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
A.4 Turn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
A.5 Merged Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
B Supplementary Results of Artificial Neural Networks 79
B.1 Note of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
B.2 Gait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
B.3 Shoe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
B.4 Turn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
B.5 Merged . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
C Standard Deviation of Results 84
C.1 Note of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
C.2 Gait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
C.3 Shoe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
C.4 Turn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
C.5 Merged . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Bibliography 89
List of Figures
2.1 3 Layer (a,b,c) Back Propagation Artificial Neural Network with two weight layers (V,W) . . . 11
2.2 Fourier Series Approximation through different K values (Red) of a Square Wave (Blue) . . . 18
2.3 Fourier Transform of a f(x), shown in red, on the time domain. The component waves, in blue, are then plotted along the frequency domain as peaks as the result of the Fourier Transform . . . 20
2.4 Results of the Short Time Fourier Transform, and the Wavelet Transform, and their difference in time resolution . . . 23
2.5 Visual representation of both the Daubechies 4, aka db4 Wavelet (Left), and the Haar Wavelet (Right) [1][2] . . . 24
2.6 A simplified visual representation of the scaling and shifting techniques used in Continuous Wavelet Transform [3] . . . 25
2.7 A simplified visual representation of the Approximation and Details results from Decomposition Filters used in a Discrete Wavelet Transform . . . 27
2.8 Resulting Coefficients from a Discrete Wavelet Transform . . . 28
3.1 Right fore hoof with location of Strain gauge (G1-G5) . . . . . . . . . . . 31
3.2 Sample Data from a Sensor divided into Data Fragments . . . . . . . . . . 33
3.3 Sample Data from a Sensor divided into Labeled Data Fragments . . . . . 35
3.4 Sample data process for the Accelerometers in X-Axis (A-X) data stream 41
3.5 Sample data process for the Strain Gauge1 (S-G1) data stream . . . . . . 42
3.6 Sample data process for the Accelerometer Combined (AC) data stream . 43
3.7 Sample data process for the Strain Combined Gauge1 (SC-G1) data stream 44
3.8 Output Layer of the ANN for the Shoe Characteristic . . . . . . . . . . . 45
3.9 Output Layer of the ANN for the Merged Characteristic . . . . . . . . . . 46
4.1 Comparison of Average Accuracies of different dimensionality reduction techniques over characteristics . . . 51
4.2 Comparison of Average Accuracies of different Characteristics over Fourier Transform dimensionality reduction techniques . . . 54
4.3 Comparison of Average Accuracies of different Characteristics over Fourier Transform dimensionality reduction techniques including Supplementary Fourier Transforms . . . 55
4.4 Comparison of Average Accuracies of non-combined data streams for each characteristic regardless of dimensionality reduction techniques . . . 58
4.5 Average Max Accuracies by characteristics using the Fourier-8 Data Reduction Technique . . . 62
4.6 Average Max Accuracies by characteristics using the Fourier-16 Data Reduction Technique . . . 63
4.7 Average Max Accuracies by characteristics using the Fourier-32 Data Reduction Technique . . . 64
4.8 Average Max Accuracies by characteristics using the Wavelet-DB4 Data Reduction Technique . . . 65
4.9 Average Max Accuracies by characteristics using the Wavelet-Haar Data Reduction Technique . . . 66
List of Tables
3.1 Breakdown of Data based on the Shoe Characteristic . . . . . . . . . . . . 33
3.2 Breakdown of Data based on the Gait Characteristic . . . . . . . . . . . . 33
3.3 Breakdown of Data based on the Direction Characteristic . . . . . . . . . 33
3.4 List of Data Streams Used . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5 List of Dimensionality Reduction Techniques Used, (a), and Characteristics that were assessed, (b) . . . 47
4.1 Average accuracy using various Data Streams with dimensionality reduction techniques analyzing characteristics . . . 49
4.2 Average accuracy obtained by Data Streams using various dimensionality reduction techniques analyzing characteristics . . . 50
4.3 Student's t-test values for accuracies obtained by using Fourier and wavelet transforms . . . 53
4.4 Efficiency Score based on Accuracy of ANN and Number of Inputs used . . . 55
4.5 Number of ANNs using Wavelet Transforms which obtained the Highest Average Accuracy and Average Difference between Accuracy sorted by Mother Wavelets and characteristics . . . 56
4.6 Average Accuracy and Average Difference between Accuracy sorted by Mother Wavelets and Data Stream Using Wavelet Transforms . . . 56
4.7 Student's t-test values for accuracies obtained by using Wavelet-Haar and Wavelet-DB4 transforms . . . 57
4.8 Range of Average Accuracies over Data Streams found in ANN using different Dimensionality Reduction techniques for each Characteristic . . . 60
4.9 Difference of Average Max Accuracy between combined and single data streams over Wavelet Transforms for the Merged Characteristic . . . 68
4.10 Accuracies using combined Strain Gauge 4 (SC-G4) data stream with Wavelet Data Reduction Techniques to classify the Gait, Shoe, and Turn characteristics . . . 69
A.1 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Gait Characteristic . . . 75
A.2 Average Maximum Accuracy of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Gait Characteristic . . . 75
A.3 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Shoe Characteristic . . . 76
A.4 Average Maximum Accuracy of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Shoe Characteristic . . . 76
A.5 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Turn Characteristic . . . 77
A.6 Average Maximum Accuracy of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Turn Characteristic . . . 77
A.7 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze all of the characteristics merged . . . 78
A.8 Average Maximum Accuracy of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze all of the characteristics merged . . . 78
B.1 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Gait Characteristic . . . 80
B.2 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Shoe Characteristic . . . 81
B.3 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Turn Characteristic . . . 82
B.4 Average Maximum Accuracy of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Merged Characteristic . . . 83
C.1 Standard Deviations of ANNs analyzing the Gait Characteristic using the Fourier Data Reduction Technique . . . 85
C.2 Standard Deviations of ANNs analyzing the Gait Characteristic using the Wavelet Data Reduction Technique . . . 85
C.3 Standard Deviations of ANNs analyzing the Shoe Characteristic using the Fourier Data Reduction Technique . . . 86
C.4 Standard Deviations of ANNs analyzing the Shoe Characteristic using the Wavelet Data Reduction Technique . . . 86
C.5 Standard Deviations of ANNs analyzing the Turn Characteristic using the Fourier Data Reduction Technique . . . 87
C.6 Standard Deviations of ANNs analyzing the Turn Characteristic using the Wavelet Data Reduction Technique . . . 87
C.7 Standard Deviations of ANNs analyzing all of the characteristics merged using the Fourier Data Reduction Technique . . . 88
C.8 Standard Deviations of ANNs analyzing all of the characteristics merged using the Wavelet Data Reduction Technique . . . 88
Chapter 1
Introduction
The purpose of this work is to analyze the strain on horses' hooves using machine
intelligence techniques. The intention is to determine whether there is sufficient information
in the strain and accelerometer data to identify several characteristics about the horse.
The data is collected using strain gauges and accelerometers placed on the horse's hoof.
It is then preprocessed using Fourier and wavelet transforms in order to reduce the size
of the data set. The processed data is then classified using a Back Propagation neural
network. The goal of analyzing this data is to be able to detect lameness in the horse.
Lameness in horses, a specific term describing a subject's inability to travel in a
normal manner upon all four limbs, is a serious and costly condition. It is also one of
the most frequent health issues amongst horses [5]. A wide variety of causes can lead
to lameness, making it fairly challenging to diagnose [6]. As this is
an initial analysis of the data, there is currently not a sufficient amount of data collected
representing lame horse gait. The analysis will therefore focus on the characteristics
which are represented in the data, namely shoeing, gait, and direction.
Opinions vary from veterinarian to veterinarian on lameness evaluations when done
purely subjectively [7]. Several objective lameness detection techniques have been
explored [8] [9] [10], giving veterinarians an important diagnostic tool.
While this thesis does not directly look into lameness, it does provide a technique to
identify several different characteristics, including gait, which may serve as a valuable
tool for lameness detection [8] [10]. This thesis is an extension of previous work which
analyzed data from the same origins using similar techniques but sampled at a much
lower frequency [11]. The results from classifying the low frequency data were quite
positive. It was possible to classify gait, shoeing, direction, the surface being walked upon,
and whether a rider was present. With the higher frequency data, this thesis looks for a method
to reduce the size of the data before attempting to classify it.
In this study, a simple Back Propagation Artificial Neural Network is used to classify
a time series data set. The data being analyzed are accelerometer and strain gauge
measurements from several horses as they move around a track. In order to present
the entirety of the run to the ANN at once, several signal transforms are applied to
the data stream as a preprocessor. The signal transform is used as a dimensionality
reduction technique: it effectively obtains the general characteristics of the data set
and summarizes them using a smaller number of variables. This thesis compares several
dimensionality reduction techniques which are used to preprocess data which is then
classified by the ANN. Fourier and wavelet analysis are the dimensionality reduction
techniques which are compared. It is demonstrated that when wavelet analysis is used
to preprocess this data set, the ANN produces more accurate classification results than
when Fourier analysis is used. This work also explores the effectiveness of different
configurations of data streams and analyzes their effect on the ANN's accuracy. It
demonstrates the usefulness of signal transforms as a preprocessor and their effectiveness
in analyzing equine gait data using a Back Propagation Artificial Neural Network.
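As a rough illustration of this preprocessing idea (a hypothetical sketch, not the exact configuration used in this thesis; the 1024-sample test signal and the choice of 16 coefficients are illustrative), a long time series can be summarized by a small number of low-frequency Fourier magnitudes:

```python
import numpy as np

def fourier_reduce(signal, n_coeffs):
    """Reduce a 1-D time series to its n_coeffs lowest-frequency
    Fourier magnitudes, producing a short fixed-length feature vector."""
    spectrum = np.fft.rfft(signal)       # FFT of a real-valued input
    return np.abs(spectrum[:n_coeffs])   # keep only low-frequency magnitudes

# A 1024-sample signal with components at 3 Hz and 7 Hz (per unit interval)
t = np.linspace(0, 1, 1024, endpoint=False)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)
features = fourier_reduce(signal, 16)
print(features.shape)   # (16,)
```

However many samples the run contains, the ANN always receives the same small, fixed number of inputs.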
Artificial Neural Networks (ANNs) are a type of Machine Learning approach, and some
of these systems can be used to solve non-linear problems. Artificial Neural Networks
model the structure and functionality found in biological neural networks. The Back
Propagation network used in this work is a non-linear classifier. These systems generally
require a large amount of representative data to be able to either classify or cluster data
accurately. ANNs are capable of approximating a non-linear function in their output.
They are also useful for tasks such as function approximation or regression analysis, which
includes Time Series Prediction and Time Series Analysis.
A time series is a sequence of data points that are related through time. Time Series
Prediction is the attempt to accurately estimate the data that follows in a sequence,
while Time Series Analysis is used to extract useful information about a time series. Many ANN
architectures are not built to receive input data in a sequence, so time series data
must be preprocessed before being served to the ANN as inputs. A common approach
is to employ a sliding window when analyzing a time series data set.
The sliding window technique, or windowing, is designed to use a specific sequence of
the data as simultaneous inputs to the network [14]. The ANN can then process the
sequence of inputs and analyze that particular subset of data. This process then repeats
itself as the window "slides" down to the following subset of data. However, this
technique will only "learn" knowledge from a specific subset of data, learning knowledge
intra-subset instead of inter-subset. This is due to the fact that the ANN cannot incorporate
any data outside of the window while it is learning. In using this technique, the ANN
is not provided a "context" for the entire data set. The window the ANN is currently
using does not offer insight into where that window is in relation to the overall data. The
context only extends to the size of the input window and the number of elements from
the data set it processes.
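The windowing technique described above can be sketched minimally as follows (the window width and step size are illustrative choices):

```python
def sliding_windows(series, width, step=1):
    """Split a time series into fixed-width windows. Each window becomes an
    independent input pattern; the network sees no context outside it."""
    return [series[i:i + width]
            for i in range(0, len(series) - width + 1, step)]

data = list(range(10))
windows = sliding_windows(data, width=4, step=2)
print(windows)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Note that each window overlaps its neighbours only by construction; nothing ties the patterns together once they reach the network, which is precisely the loss of inter-subset context discussed above.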
In order to address this problem, several ANNs, such as impulse response filters and
Elman Networks, are built to incorporate the memory of what has already been
learned as an input to the next sequence [15]. These are known as recurrent
networks. They are built with a varying depth, which controls the duration of the memory,
and a varying resolution, which controls how much of that memory influences
future analysis. Previous work [11] used a moving window in the analysis of low frequency
horse gait data. This is not viable with data sampled at a higher
frequency because the moving window becomes too large and computationally expensive
to implement. This work uses a different approach to manage the indefinite length
of time series data: instead, the data is preprocessed to create a smaller, finite-length
input for use in a non-recurrent network.
This thesis will demonstrate that the high frequency horse strain data contains useful
information about the gait and the state of the horse. It also demonstrates that the
Fourier and wavelet decompositions can be used to reduce the size of the input stream
without removing the characteristics of the gait from the data.
In this thesis, Chapter two examines several of the techniques in detail. Chapter
three outlines the steps and procedures for the experiment. Chapter four details the
results and provides an in-depth analysis of what they might indicate. Lastly, Chapter five
discusses future work and some changes that could be made to further this research.
Chapter 2
Literature Review
The methods that are used in this thesis are summarized in this chapter. Artificial
Neural Networks are examined, giving the reader insight into the architecture, the mechanics
of both retaining and applying knowledge, and the metrics used to grade these systems.
It also includes specifics on the Back Propagation Neural Network, a particular Neural
Network which is used extensively in this thesis. This chapter also provides a summary
of a number of Dimensionality Reduction techniques, their usefulness and implications in
Feature Selection and Extraction. Two particular Signal Transforms are also examined
in detail in this chapter, as both the Fourier Transform and Wavelet Transform are used
heavily in this thesis.
2.1 Artificial Neural Networks
Artificial Neural Networks (henceforth known as ANNs) are a type of computational
model used for machine learning and as a data mining technique. The ANN architecture
was inspired by its namesake, the neuron, which plays a large role in the brain. Using a
series of interconnected neurons, called nodes, a network is formed. The Back Propagation
network used in this work excels at non-linear classification and is used extensively
on problems that are complex to solve using more traditional learning techniques. ANNs
are applied to data sets where either the desired output is known, a supervised network, or
the desired output is not known, an unsupervised network. Supervised networks are
presented with data that consists of input-output pairs, while unsupervised networks do
not require the output component of the pair. The result from a supervised network
is a classification of the data, while an unsupervised network produces a clustering of
the data.
2.1.1 Architecture of Artificial Neural Networks
An Artificial Neural Network is built mainly from layers of interconnected nodes. These
connections, which mimic synapses in the biological nervous system, are called adaptive
weighted connections (otherwise known as weights). Using these weights, the ANN is able
to transfer information stored in different layers. These weights are not only responsible
for transferring data through the network; their values also adapt during the training
phase (see 2.1.2) and are used for activation during the testing phase (see 2.1.4).
There are many variations to ANNs, but the architecture used in this work can be
described using three components:
1. Input Layer
2. Hidden Layer
3. Output Layer
The input layer component acts as a receptor for the input data, allowing it to
be passed into the network. These nodes form the input layer and are connected to
the nodes in the hidden layer. The hidden layer receives the results of the input layer
once they have been passed through a layer of weighted connections. The output layer
is responsible for producing the results the network has concluded, given
the activations of the hidden layer passed through a second set of weighted
connections. The results obtained from the output layer are then compared with
the desired output contained in the data.
Different architectures of ANN exist which may have only one or two of the layers
listed above, as different problem sets may require different types of ANNs. The most
notable difference is between classification and clustering (see section 2.1.3) and the use
of an output layer. Since unsupervised networks only receive the input component of
the data set, they use only an input layer and a hidden layer. The hidden layer is used
to represent the clusters in an unsupervised network. In a supervised network,
all three components are present, as the network is presented data that consists of both
input and output pairs.
Note that this layer structure is not used in all ANNs. It is common to refer
to layers when describing a series of nodes within the same hierarchical level in the
network. Any number of layers can be present in various types of ANNs. An increased
number of layers may result in a more complex ANN. It is also possible for these layers
to be connected in different patterns. Some ANNs have layers that are recursively
connected, with weights connecting back to layers that occur earlier in the
network. Some ANNs have layers that are connected to themselves. The ANN examined
in this thesis is a simple feed-forward network, namely one that feeds the data forward
from the input layer through a hidden layer, ending at the output layer.
While ANNs are able to handle many different types of data, they work best with
continuous or discrete numeric values between -1 and +1 or 0 and +1. Data fed
into the input nodes are usually normalized to the range of -1 to +1 in order to improve
performance.
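The input normalization just described can be sketched with a simple min-max rescaling (the helper name is hypothetical; the [-1, +1] target range follows the text):

```python
def normalize(values, lo=-1.0, hi=1.0):
    """Linearly rescale raw sensor values into [lo, hi]
    before feeding them to the input layer."""
    v_min, v_max = min(values), max(values)
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]

print(normalize([10.0, 15.0, 20.0]))  # [-1.0, 0.0, 1.0]
```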
2.1.2 Training of Supervised Artificial Neural Networks
In order for a supervised ANN to operate in any meaningful way, the network must first
be trained to "learn" the data set. This phase, also called the training phase, allows
the ANN to analyze data containing the input-output pairs. The network
then makes a prediction of the desired output using the input data it received. It then
compares its own predicted output with the desired output from the data. The network
will then make adjustments to its weights between the layers in order to attempt to
produce a more accurate result when the input is later presented to the ANN. By doing
so, the network is able to retain the knowledge of the data in its weights, a process
known as learning. There are many types of learning in the world of ANNs, as many
rules, algorithms, and formulas have been developed, each with its own advantages and
disadvantages. The learning rule used in this thesis is the Back Propagation rule
(see section 2.1.5).
The knowledge of the data which the network has extracted is represented by its
weights. With the distribution of these weights, or knowledge, across the entire network,
ANNs are said to have a distributed representation of their knowledge. With so many
weights in a single network, it can be difficult to extract the knowledge that the network
has obtained without examining the context of all of the weights.
As with any machine learning algorithm, the quality of the data it learns from
greatly dictates the effectiveness of the network. While a Back Propagation ANN normally
requires a long training time, its ability to learn complex and non-linear data is one
attractive feature. Even though the rate at which a network learns can be adjusted, the
Back Propagation network generally learns from the training data repetitively, slowly
gaining knowledge from each presentation of the data set. In order to combat this
slowness of Back Propagation, some networks employ the use of a momentum value.
Momentum is used to combat high-frequency oscillations in the weight values
which hinder the learning process. Due to the incremental manner in which weights are
adjusted, the Back Propagation network can learn patterns that are most representative
of the data and is not overly affected by outliers or small variances in the data. The
more representative of the problem the data is, the more easily the network can
distinguish the patterns within it accurately. Over-learning occurs when the network is
trained too much: it starts memorizing the data and can lose the ability to generalize
about the data set overall.
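A momentum-augmented weight update of the kind described above can be sketched as follows (a generic sketch; the learning rate and momentum values are illustrative, not the settings used in this thesis):

```python
def update_weight(w, grad, prev_delta, lr=0.1, momentum=0.9):
    """One gradient-descent weight update with momentum. Carrying a fraction
    of the previous change forward damps oscillation in the weight value."""
    delta = -lr * grad + momentum * prev_delta
    return w + delta, delta

# Two successive steps against a constant gradient of 1.0:
w, d = update_weight(0.5, grad=1.0, prev_delta=0.0)  # d = -0.1
w, d = update_weight(w, grad=1.0, prev_delta=d)      # d ≈ -0.19, momentum adds on
```

When successive gradients point the same way, momentum accelerates progress; when they alternate sign, the accumulated term cancels part of each swing.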
2.1.3 Classifying vs Clustering
Clustering involves analyzing features in order to group them and to find relationships
within the dataset. These clusters are not labeled and therefore not classified.
A commonly used technique is to label clusters once they have been created. The
labels are taken from classified data points. Classification generally requires feedback
during training which indicates the quality of the results and is therefore referred to as
Supervised Learning. Clustering does not require feedback and is called Unsupervised
Learning. Clustering is an extremely useful tool, and several techniques, such as self-organizing
maps and k-means clustering, are popular.
2.1.4 Testing of Supervised Artificial Neural Networks
Once the network has been trained it is then graded for its performance during the
testing phase. The ANN is given another set of data which it processes, and the results
from the network’s output layer are then compared to the desired output from the
dataset. Using this comparison, the network is graded on its accuracy. There are
many metrics by which this accuracy can be measured; it may be as simple as the percentage
of the output data that is classified correctly. When analyzing continuous data
sets, such as predicting a particular numerical value, a Mean Squared Error is often used
to represent the performance of the ANN.
As with the training data, the testing data requires the expected or desired output
in order to grade the ANN’s accuracy. The testing data can be a subset of the training
data but in order to fully demonstrate the ANN’s ability to extrapolate useful knowledge,
most studies that use ANNs will use separate training and testing sets. A larger subset
is often used solely for the training phase, and a smaller subset that the ANN has not
been exposed to during training is used in the testing phase.
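The holdout partitioning described above can be sketched as follows (the 75/25 split and the shuffling seed are illustrative choices, not taken from this thesis):

```python
import random

def train_test_split(patterns, test_fraction=0.25, seed=0):
    """Hold out a disjoint test set so that measured accuracy reflects
    generalization rather than memorization of the training data."""
    rng = random.Random(seed)
    shuffled = patterns[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 75 25
```

The essential property is that the two subsets are disjoint: a pattern the network has seen during training never contributes to its test-phase grade.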
2.1.5 Back Propagation Artificial Neural Network
The Back propagation Artificial Neural Network is a feedforward network. It is built
from a number of hidden layer and an input and output layer. This network also
employs the learning rule which its name stems from. A visual representation of the
Back Propagation ANN using the notation found in this section can be found in Figure
2.1.
Initially, randomized values between [−1, +1] are assigned to every weight in the
network. During the training phase, inputs are processed through the hidden and output
layer (given a single hidden layer ANN) as follows:

b_i = S( Σ_{h=1}^{H} a_h V_{hi} + θ_i )    (2.1)

c_j = S( Σ_{i=1}^{I} b_i W_{ij} + τ_j )    (2.2)

where:

S(x) = (1 + e^{−x})^{−1}    (2.3)
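The forward pass of equations 2.1–2.3 can be sketched as follows (a minimal Python illustration using list-based weight matrices; the variable names and layer sizes are chosen for this sketch, not taken from the thesis):

```python
import math

def sigmoid(x):
    # equation 2.3: S(x) = (1 + e^(-x))^(-1)
    return 1.0 / (1.0 + math.exp(-x))

def forward(a, V, theta, W, tau):
    """Forward pass of equations 2.1 and 2.2 for a single hidden layer.
    a: input activations; V, W: weight matrices; theta, tau: thresholds."""
    # hidden layer: b_i = S(sum_h a_h * V[h][i] + theta_i)
    b = [sigmoid(sum(a[h] * V[h][i] for h in range(len(a))) + theta[i])
         for i in range(len(theta))]
    # output layer: c_j = S(sum_i b_i * W[i][j] + tau_j)
    c = [sigmoid(sum(b[i] * W[i][j] for i in range(len(b))) + tau[j])
         for j in range(len(tau))]
    return b, c
```

With all weights and thresholds at zero, every activation is S(0) = 0.5, which is a convenient sanity check.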
Figure 2.1: 3 Layer (a, b, c) Back Propagation Artificial Neural Network with two weight layers (V, W)
where a, b, and c are the three layers in the network: the input, hidden, and output
layers respectively, while a_h, b_i, and c_j correspond to the hth, ith, and jth node
in that particular layer. The activation values of these nodes are calculated by equations
2.1 and 2.2. Equation 2.3, also called the sigmoid function, is used in calculating the
activation values for nodes in these layers.

V and W represent the weights in the network, with V being the weights
connecting a and b, and W being the weights connecting b and c. More specifically, V_{hi}
is the weight that connects nodes a_h and b_i, while W_{ij} is the weight that
connects nodes b_i and c_j. θ_i and τ_j are threshold values, which exist for each node
in b and c respectively. These threshold values allow a particular node's value to be
either amplified or reduced, and are adjusted during the learning segment of the training
phase. Once an estimate has been made in the output layer via an input-output pair,
an error calculation occurs:
d_j = c_j (1 − c_j)(c^k_j − c_j)    (2.4)
where d_j is the error between the desired output c^k_j (given to the network as part of
the input-output pair) and the actual output c_j that the network arrived at. Using the
error d_j, the network is able to propagate back into the hidden layer and calculate the
error, e_i, within it:
e_i = b_i (1 − b_i) ( Σ_{j=1}^{q} W_{ij} d_j )    (2.5)
Once this has been calculated, the weights are adjusted using the error. This learning is
done via the following:
ΔW_{ij} = α b_i d_j + γ ΔW_{ij}(n)    (2.6)

ΔV_{hi} = β a_h e_i + γ ΔV_{hi}(n)    (2.7)
where d_j and e_i are the errors calculated in equations 2.4 and 2.5. α and β are two
constants set prior to the training phase and are known as the learning rates. The learning
rate dictates how greatly the ANN adjusts its weights for a single data record;
with a higher learning rate, the system learns more quickly and requires fewer repetitions. γ
is the momentum value, used in combination with ΔW_{ij}(n) and ΔV_{hi}(n), the previous weight
changes. By being able to "remember" previous weight changes, this momentum
value filters the fluctuating nature of weight adjustments.
During learning, the threshold values are also adjusted as follows:

Δτ_j = α d_j    (2.8)

Δθ_i = β e_i    (2.9)
The network performs the error calculation and weight adjustment for each set of
weights, propagating backwards through the network until there are no more hidden
layers to update. The network then repeats the training phase with different input-output
pairs until the error is sufficiently low or another stopping criterion is met.
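The full training step — the forward pass, the error calculations of equations 2.4 and 2.5, and the momentum updates of equations 2.6–2.9 — can be sketched as one function (an illustrative Python sketch; the particular learning-rate and momentum settings are arbitrary example values, not the thesis's):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(a, target, V, theta, W, tau, dV, dW,
               alpha=0.5, beta=0.5, gamma=0.9):
    """One Back Propagation pass for a single input-output pair.
    dV and dW hold the previous weight changes (the momentum terms)."""
    # forward pass (equations 2.1 and 2.2)
    b = [sigmoid(sum(a[h] * V[h][i] for h in range(len(a))) + theta[i])
         for i in range(len(theta))]
    c = [sigmoid(sum(b[i] * W[i][j] for i in range(len(b))) + tau[j])
         for j in range(len(tau))]
    # output-layer error (equation 2.4)
    d = [c[j] * (1 - c[j]) * (target[j] - c[j]) for j in range(len(c))]
    # hidden-layer error propagated back through W (equation 2.5)
    e = [b[i] * (1 - b[i]) * sum(W[i][j] * d[j] for j in range(len(d)))
         for i in range(len(b))]
    # weight updates with momentum (equations 2.6 and 2.7)
    for i in range(len(b)):
        for j in range(len(c)):
            dW[i][j] = alpha * b[i] * d[j] + gamma * dW[i][j]
            W[i][j] += dW[i][j]
    for h in range(len(a)):
        for i in range(len(b)):
            dV[h][i] = beta * a[h] * e[i] + gamma * dV[h][i]
            V[h][i] += dV[h][i]
    # threshold updates (equations 2.8 and 2.9)
    for j in range(len(c)):
        tau[j] += alpha * d[j]
    for i in range(len(b)):
        theta[i] += beta * e[i]
    return c
```

Repeatedly presenting the same input-output pair drives the output toward its target, which is the behaviour the stopping criterion monitors.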
2.2 Dimensionality Reduction
Dimensionality Reduction is an aspect of machine learning that involves the transformation
of data consisting of a large number of channels or dimensions into a lower
dimensional "description" [16]. This transformation is usually necessary due to a large
amount of data. It can also be used when it is unclear whether a particular stream of data
is relevant to the problem being studied. The goal of Dimensionality Reduction is to
retain the relevant characteristics of the data while reducing the number of variables to
a smaller and more manageable size. There are several benefits to this process.
Firstly, smaller data sets are easier to process and require less overhead and
computing time. Solutions are easier and faster to achieve, which allows for faster model
development and testing. This causes a significant reduction in the time needed to both
test and train models. Since most models rely on multiple passes or epochs through the
dataset when learning, the model can therefore save a large amount of training time
with a reduced dataset.
Secondly, removing noise or misleading data may increase the accuracy of the model.
Dimensionality reduction at its conceptual core requires parts of the data to be
omitted. When processing any data captured in real life, "outliers" or "noise" are
bound to be included in the data set. If these are lost in the process of dimensionality
reduction, the model is able to concentrate on the core of the data and emphasize
the features relevant to solving the problem.
Third, dimensionality reduction allows for the use of simpler models. If a model
does not need to learn or filter out "outliers" in the data, it can learn more rapidly. This
also helps reduce the complexity of the model. The broader generalization of the data
that is learned allows for simpler representation and interpretation by the model,
which in turn leads to smaller overhead and computing time in future
analysis [17][18].
2.2.1 Feature Selection
One of the main approaches to dimensionality reduction is Feature Selection.
Feature selection involves selecting a subset of the original data and
omitting the rest. This subset is then used as a representation of the original
data in the model. A central assumption of feature selection is that
the data contains some amount of "irrelevant" data that does not provide any insight into
the problem being studied; the problem would not benefit from the inclusion
of this "irrelevant" data, and training should improve if it is removed. However, not all
unnecessary data is unnecessary because of its relevancy to the problem. Data streams whose relevant
features are also present in other data streams can be considered "redundant" and
could likewise be omitted from the data used by the model. Not only is feature
selection useful as a dimensionality reduction technique, it also provides an indication
of what type of model would be needed for the reduced data.
In order to obtain "useful" data to construct a model, feature selection employs one
of the following three approaches [19]:
1. Wrapper
2. Embedded
3. Filter
Wrapper methods are most easily described as taking a "brute force" approach to
optimization. Subsets of the original data are scored against the model that is to be
used after feature selection. The subset of data that obtains the fewest errors,
or highest accuracy, within the model is chosen as the result of this method. This
is a simple method which generally yields the best results with respect to the model. It
is often criticized for its exponentially large computational cost, due to the large number
of combinations of subsets that may exist in a data set. There are several variations
of the wrapper technique that allow for similar results while using fewer computational
resources. Such variations include using a stepwise technique to either include or remove
a particular variable from the data set based on the impact that variable has on the accuracy
of the model. Since wrapper methods are often specifically optimized for their particular
model, the results are usually incompatible with a different model or dataset [20].
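The stepwise variation above can be sketched as a greedy forward selection (an illustrative Python sketch; the `score` callable stands in for whatever train-and-evaluate routine a study would actually run against its model):

```python
def forward_select(features, score, k):
    """Greedy stepwise wrapper: repeatedly add the single feature that most
    improves the downstream model's score, stopping when nothing improves it.
    `score(subset)` represents training and evaluating the model on a subset."""
    selected, remaining = [], list(features)
    current = score(selected)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        best_score = score(selected + [best])
        if best_score <= current:
            break  # no remaining feature improves the model
        selected.append(best)
        remaining.remove(best)
        current = best_score
    return selected
```

This evaluates O(n·k) subsets instead of all 2^n, which is the computational saving the stepwise variation trades against exhaustive search.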
An embedded algorithm builds the feature selection method into the model
being constructed, or into the learning algorithm that the model employs [21][22].
Such algorithms include Recursive Feature Elimination [23], used with a Support
Vector Machine, and the LASSO method [24], used with a Linear Regression Model.
Filter approaches, instead of using a score based on a particular model's errors or
accuracy on the dataset, use other metrics to rank subsets of the data. These measures
include point-wise mutual information [25], Karl Pearson's product-moment correlation,
built on Francis Galton's work [26][27], and Markov Blanket filters [28]. This approach
concentrates on the data itself, which produces a feature selection
result that is independent of any particular model; the reduced data set can therefore
be used with a variety of models [29].
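A minimal filter-style ranking using the Pearson product-moment correlation might look like this (an illustrative sketch, not code from the thesis):

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_features(features, target):
    """Filter approach: rank feature names by |correlation| with the target,
    without ever consulting the downstream model."""
    return sorted(features, key=lambda name: -abs(pearson(features[name], target)))
```

Because the ranking never consults the model, the same reduced feature set can be reused across different classifiers.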
2.2.2 Feature Extraction
Another approach to dimensionality reduction is feature extraction. Feature
extraction employs the concept that if the data contains a large amount of redundancy,
a set of characteristics or "features" may be used to describe the data without
losing the relevant original information [30].

There are many different algorithms and approaches to feature extraction, and each may
produce different results depending on the data's characteristics. Some of the most
common methods for feature extraction are Principal Component Analysis, Factor Analysis,
Projection Pursuit, and Independent Component Analysis [31][32].

Some feature extraction methods are tailored to a particular type of data;
this thesis will look more in depth at a family of signal transforms.
2.2.3 Signal Transform
Signals are sometimes described in the world of communications systems and electronic
engineering as "a function that conveys information about the behavior of a system or
attributes of some phenomenon" [33]. Signals are usually conveyed over space or another
variable such as time, and are commonly used to transfer or encode information
that needs to be expressed in such a form. Signal processing is based upon the concept
that there is an input signal which in turn produces an output signal [33].

A signal transform is a series of mathematical transformations applied to an
information signal, mostly used to either improve the signal or extract significant data from
it. While there are numerous types of signal transforms available,
this study will use two: the Fourier transform [34] and the wavelet transform [35].
2.2.4 Fourier Transform
The Fourier transform is closely related to the Fourier series (see Figure 2.2). The
Fourier series describes complex periodic signals as a sum of an infinite number of
simpler component waves, namely sines and cosines. The Fourier transform takes this
concept, generalizes it, and uses it to decompose signals in order to obtain their
component sine and cosine waves [36].
The Fourier series can be described mathematically as the following [37]:

g(x) = (1/2) a_0 + Σ_{n=1}^{∞} a_n cos(nπx/l) + Σ_{n=1}^{∞} b_n sin(nπx/l)    (2.10)
where g(x) is the approximation of the true function (the signal to be transformed),
f(x). As n → ∞, the Fourier series representation g(x) approaches an exact
representation of the original signal f(x). Due to its cyclic nature, the Fourier series
represents this approximation using a periodic function, whose most basic characteristic
is that g(x) repeats exactly one period later, so that g(x) = g(x + 2l) = ... = g(x + 2nl).
The Fourier series is built from the Fourier series coefficients, a_n and b_n;
a_0 is a special Fourier series coefficient, a constant that determines the
"average value" of f(x) and therefore where g(x) should be based. These Fourier series
coefficients can be mathematically represented as [38][39]:

Figure 2.2: Fourier Series Approximation with different K values (red) of a square wave (blue)
a_n = (1/l) ∫_{−l}^{l} g(x) cos(nπx/l) dx,   n = 0, 1, 2, ...    (2.11)

b_n = (1/l) ∫_{−l}^{l} g(x) sin(nπx/l) dx,   n = 1, 2, ...    (2.12)
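Equations 2.11 and 2.12 can be checked numerically; the sketch below approximates the integrals with a midpoint rule for a square wave, whose odd sine coefficients are known to be 4/(nπ). The sample count and the choice l = 1 are assumptions of this example:

```python
import math

def fourier_coeffs(f, l, n, samples=20000):
    """Approximate equations 2.11 and 2.12 by a midpoint rule over [-l, l]."""
    dx = 2.0 * l / samples
    a = b = 0.0
    for k in range(samples):
        x = -l + (k + 0.5) * dx  # midpoint of each subinterval
        a += f(x) * math.cos(n * math.pi * x / l) * dx
        b += f(x) * math.sin(n * math.pi * x / l) * dx
    return a / l, b / l

# square wave of period 2l; the classic result is b_n = 4/(n*pi) for odd n,
# with all a_n = 0 (the wave is odd) and b_n = 0 for even n
square = lambda x: 1.0 if x >= 0 else -1.0
```

The recovered coefficients are the component-wave amplitudes that Figure 2.2 shows being summed into a square wave.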
The Fourier series coefficients can also be written as complex numbers comprising
both a real and an imaginary component. Representing the coefficients in
complex form decreases the computing power and time required to generate them.
Consider the following representation of the Fourier series [37][39]:

g(x) = Σ_{n=−∞}^{∞} C_n e^{inπx/l}    (2.13)

where:

C_n = (1/2l) ∫_{−l}^{l} g(x) e^{−inπx/l} dx    (2.14)
where C_n is the nth Fourier coefficient of g(x). The Fourier transform takes this
series, generalizes it, and replaces the discrete C_n with a continuous function [40]. The
Fourier transform is finally represented as the following [39]:

F(g(x)) = G(f) = ∫_{−∞}^{∞} g(x) e^{−2πixf} dx    (2.15)
Figure 2.3: Fourier Transform of f(x), shown in red, on the time domain. The component waves, in blue, are then plotted along the frequency domain as peaks as the result of the Fourier Transform
where F(g(x)) represents the Fourier transform of g(x), with x once again representing
time (seconds) and f representing frequency (Hz).

The result of a Fourier transform (see Figure 2.3) returns the component waves
that make up g(x), as frequencies and their respective amplitudes. To reverse
the transform, F⁻¹(G(f)) is used to represent the inverse Fourier transform [40][39]:

F⁻¹(G(f)) = g(x) = ∫_{−∞}^{∞} G(f) e^{2πixf} df    (2.16)
The discrete Fourier transform uses a discrete function, and in turn is able to
generate a series of complex coefficients from a data set. Discrete Fourier transforms
are used for their ability to examine the "periodic-ness" of the data set [41]. Due to the
nature of the transform, the information contained in the lower-level Fourier coefficients,
in both the complex and sine/cosine representations, is of higher relevance than the
information contained in higher-level coefficients.
Transforming the original signal f(x) into its Fourier representation g(x) allows
f(x) to be represented by a series of complex coefficients C_n extracted from g(x),
where n = {0, ..., m}. Since a discrete Fourier transform generates the same number of
C_n as the series of real numbers it transforms, on its own it would not function
well as a dimensionality reduction technique. However, the
coefficients used to represent the original signal f(x) are in order of relevancy, so
their number can be reduced simply by excluding the less
relevant coefficients. The variable m represents a general "cutoff" point, an
arbitrary point beyond which the less relevant coefficients are excluded, decreasing the number of
coefficients, or "features", needed, while still retaining the maximum likeness
to f(x) for the number of coefficients used. This cutoff technique has also been used
in past studies [42][43].
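The cutoff idea can be illustrated with a naive discrete Fourier transform: keep only the m lowest-frequency coefficients (plus their conjugate partners, so a real signal stays real) and invert. This is an illustrative O(N²) Python sketch; an FFT library would compute the same coefficients far faster:

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform, normalized by N."""
    N = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * cmath.pi * n * k / N)
                for k in range(N)) / N
            for n in range(N)]

def truncated_reconstruct(coeffs, m):
    """Inverse DFT using only the m lowest-frequency coefficients
    and their conjugate partners at the top of the spectrum."""
    N = len(coeffs)
    keep = set(range(m)) | {N - n for n in range(1, m)}
    return [sum(coeffs[n] * cmath.exp(2j * cmath.pi * n * k / N)
                for n in keep).real
            for k in range(N)]
```

A signal whose energy sits below the cutoff is reconstructed essentially exactly from the retained coefficients, which is what makes the truncation usable as feature extraction.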
2.2.5 Wavelet Transform
The Fourier transform has been the staple of many applications in the field of Signal
Processing. By representing a signal in terms of amplitude and frequency, Fourier
analysis gives users a quick snapshot of the characteristics of a signal [44]. However,
sometimes more information is required than the Fourier transform can recapture from
the original signal: a Fourier transform can only provide information on a frequency
scale but not on a time scale, or simply, it does not have a time resolution.
In order to solve this problem, another technique, the Short Time Fourier
Transform, was developed. It overcomes the lack of a time resolution
by windowing the original signal f(x). Windowing involves sampling a small
section of f(x), calculating the Fourier transform of that particular section, and
sliding the window along the signal until it eventually covers the entirety of f(x). Results
from the Short Time Fourier Transform are on a frequency, amplitude, and now a time
scale; with this added dimension, the Short Time Fourier Transform becomes a three
dimensional transform. Unfortunately, the Short Time Fourier Transform suffers from
a static window size. If the size of the window, or resolution, is too small, it generates
poor frequency resolution; if the resolution is too large, it results in
poor time resolution [45].
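The windowing just described can be sketched as a magnitude-only Short Time Fourier Transform with a static window (an illustrative Python sketch; the window and hop sizes are arbitrary example parameters):

```python
import cmath

def stft(signal, window, hop):
    """Short Time Fourier Transform with a static window size: slide the
    window along the signal and take a DFT magnitude spectrum per section."""
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        section = signal[start:start + window]
        frames.append([abs(sum(section[k] * cmath.exp(-2j * cmath.pi * n * k / window)
                               for k in range(window))) / window
                       for n in range(window // 2)])
    return frames  # one magnitude spectrum per window position
```

Every frame uses the same `window` length, which is exactly the fixed time/frequency resolution trade-off the text criticizes.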
The wavelet transform is a type of signal transformation which provides information
on both time and frequency simultaneously: a time-frequency representation [46].
While the Fourier transform breaks down the signal f(x) into sine and
cosine waves, the wavelet transform breaks f(x) down into "wavelets" by using scaled
(y = g(2x)) and shifted (y = g(x + 2)) versions of a "mother wavelet". It also uses
a variation of the windowing technique, implementing a dynamically sized window
instead of a static one, see Figure 2.4.
Figure 2.4: Results of the Short Time Fourier Transform and the Wavelet Transform, and their difference in time resolution

A wavelet is defined as a wave-like signal, but unlike the sine or cosine wave, it is a
function of finite length that both begins and ends at zero amplitude. In other words,
a wavelet has compact support, since it has a value of zero outside of this
finite interval. A wavelet must also oscillate around this central amplitude, meaning the
average value of the wavelet in the time domain must be zero. There are some specifically
named wavelets, such as the Haar or Daubechies 4 (db4) wavelets (see Figure 2.5), that are
widely used in wavelet transforms. These are known as "mother wavelets" because each is
the sole wavelet on which the wavelet transform is based; the transform uses
the "mother wavelet" to obtain the different values used for calculation throughout
the transform.
Figure 2.5: Visual representation of both the Daubechies 4 (db4) wavelet (left) and the Haar wavelet (right) [1][2]

By using these "mother wavelets" as a basis for the wavelet transform, the Continuous
Wavelet Transform is defined as the following [47]:

CWT_x^ψ(τ, s) = (1/√s) ∫_{−∞}^{∞} x(t) ψ*((t − τ)/s) dt    (2.17)
where τ is the translation, or the location of the window, s represents the scale of the
current transform, and the mother wavelet is denoted ψ*((t − τ)/s). The mother wavelet
is scaled by s and shifted by τ because the wavelet transform does
not utilize the exact "mother wavelet" to break down f(x); the Continuous Wavelet
Transform uses varied versions of the prototype wavelet in order to break down the
signal. A single calculation is done for each window position τ, and these calculations are
repeated until the "mother wavelet" has been shifted to the end of the signal. This
process is then repeated for every value of s, which scales the "mother wavelet" (see Figure 2.6).
Each computation at each scale s results in a Wavelet Coefficient on the time-scale
plane, which indicates that particular window's likeness to the version of the "mother
wavelet" used at that time. Note that the size of the window is inversely related to s.
While useful for calculating every possible variation of the "mother wavelet" at
each scale s, the Continuous Wavelet Transform is both a time- and resource-consuming
technique. Its results also generate a very large amount of data, which is not
desirable in a feature extraction technique.

Figure 2.6: A simplified visual representation of the scaling and shifting techniques used in the Continuous Wavelet Transform [3]

By limiting s to a certain subset of values, the Discrete Wavelet Transform offers an alternate
method that reduces both the resource intensity and the amount of data generated. Using a
"two channel subband coder" developed by Mallat [48], the signal f(x) is decomposed
by both a low-pass wavelet filter, LPF, and a high-pass wavelet filter, HPF, see Figure
2.7. These decomposition filters are derived from the specific "mother wavelet" in use;
each acts as a low-pass or high-pass filter by attenuating signal components with
frequencies beyond its cutoff point. Since each of these two filters halves the band limit of
its output, the results can then be decimated, omitting every second coefficient, by Nyquist's rule.
Nyquist's rule states that a function can be perfectly reconstructed if the sampling rate is no less
than twice the band limit. Since each filter effectively halves the band limit, the sampling rate
can likewise be halved, allowing every other data point to be omitted.
For example, if f(x) spans a frequency band of [0, π] and contains 512 sample
points, then after passing f(x) through the LPF, the coefficients would contain the frequency band
spanning [0, π/2], while the results of the HPF would contain the rest of the
frequency band, namely [π/2, π]. The result of the LPF and the subsequent decimation
is labeled cA1, the 1st-level Approximation coefficients. Similarly, the
result of the HPF along with the decimation is called cD1, the 1st-level Detail coefficients.
The names approximation and detail come from the concept that the lower part of
the frequency band contains a smoother, more approximate version of f(x),
while the details of f(x) are left in the higher frequency band.
After the 1st level of decomposition, cA1 is passed into another LPF and
HPF pair, while cD1 is kept for future use. This process repeats until a certain level of
decomposition is reached, at which point the Discrete Wavelet Transform coefficients are
represented by the set of detail coefficients cD1, cD2, ..., cDn and the final-level approximation
coefficient cAn, see Figure 2.8 [49].
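The filter-and-decimate cascade can be sketched with the Haar wavelet, whose low-pass and high-pass filters reduce to scaled sums and differences of neighbouring samples (an illustrative Python sketch, not the implementation used in this thesis):

```python
import math

def haar_dwt(signal, levels):
    """Mallat-style cascade with the Haar wavelet: at each level the current
    approximation is low-pass and high-pass filtered, then decimated by two.
    Returns [cD1, cD2, ..., cDn, cAn]."""
    s = 1.0 / math.sqrt(2.0)  # normalization so signal energy is preserved
    details, cA = [], list(signal)
    for _ in range(levels):
        low = [s * (cA[2 * k] + cA[2 * k + 1]) for k in range(len(cA) // 2)]
        high = [s * (cA[2 * k] - cA[2 * k + 1]) for k in range(len(cA) // 2)]
        details.append(high)  # the detail coefficients cD are kept
        cA = low              # the approximation is decomposed further
    return details + [cA]
```

A perfectly smooth signal produces zero detail coefficients at every level, with all of its energy concentrated in the final approximation, which is the behaviour Figure 2.8 depicts.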
Using these sets of Discrete Wavelet Transform coefficients, g(x) may be
reassembled, but in order to minimize the number of coefficients used while still retaining
likeness to f(x), several techniques have been suggested in the past. Marghitu and
Nalluri [50] suggested using a summation of the squared coefficients as a
representation, as this still retains the distribution of power over the levels for future
"models". Paya et al. [51] demonstrated a method using a threshold
value that resets any coefficient value below it to zero; this threshold value was
set to the reference signal's largest value. They were then able to
obtain the 10 most dominant features, or coefficients, and process them in the model
using their wavelet number and time, along with their amplitude. Tamura et al.
[52] found that using specific nth-order coefficients was sufficient for their
model.

Figure 2.7: A simplified visual representation of the Approximation and Detail results from the decomposition filters used in a Discrete Wavelet Transform
By using some variation of the wavelet coefficients to represent g(x), information about
f(x) may be kept in a small number of coefficients derived from the Discrete Wavelet
Transform, making it a valuable candidate for feature extraction.
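The Marghitu and Nalluri style of reduction, for instance, collapses each level to a single energy value (a one-line illustrative sketch):

```python
def energy_features(coeff_sets):
    """Collapse each DWT level [cD1, cD2, ..., cDn, cAn] to the sum of its
    squared coefficients, preserving the distribution of power over levels."""
    return [sum(c * c for c in level) for level in coeff_sets]
```

However many samples a level contains, the feature vector length equals the number of decomposition levels plus one, which is the dimensionality reduction being sought.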
Figure 2.8: Resulting Coefficients from a Discrete Wavelet Transform
Chapter 3
Methodology
In this chapter, the series of steps taken to approach this study is examined. The
data is described: its origins, naming convention, and how specific characteristics
were chosen from it. The preprocessors used in this thesis, as well as the details
and decisions made while building them, are presented. This chapter also describes
the details of the Artificial Neural Network that this thesis utilizes,
and lists several constants and parameters. Lastly, a few minor
procedural notes are presented, as well as a summary outlining the study.
3.1 Data
3.1.1 Origins
The data used in this study was obtained over the course of two months using IOTech
equipment in Ontario, Canada. Five horses were included in the study. Each was given
four test runs of one minute each. Each of these runs was recorded with 18 sensors: 3
accelerometers and 15 strain gauges, sampled at 20,000 Hz and 5,000 Hz respectively. Attached to
the right fore hoof, the accelerometers were responsible for measuring the acceleration of
the hoof in the X, Y, and Z axes, in units of G, where G is defined as a single unit
of the earth's gravitational pull, or simply G = 9.81 m/s².
Three gauge sensors were placed at the wall of each of the medial (inside) heel, toe,
and lateral (outside) heel of the right fore hoof. A fourth sensor was placed between the
medial heel and toe, and a fifth between the toe and lateral heel. Each
sensor measured micro-strain, where one micro-strain is equivalent to a deformation of one
part per million (10⁻⁶), at its respective location in three vectors (R1/R2/R3).
The gauges were labeled Gauge1, Gauge2, Gauge3, Gauge4, and Gauge5, representing
the sensors placed at the medial heel, between the medial heel and toe, at the toe, between the toe
and lateral heel, and at the lateral heel respectively. Figure 3.1 visualizes this arrangement.
Of the five horses studied, four were described as "pacers" while the last
was a "trotter". These are both two-beat alternating gaits. A "pace" describes legs
on the same side of the subject moving together: both legs on the same side
are in the air at the same time, and on the ground at the same time. A "trot"
describes diagonal legs moving together: the right fore limb and left hind limb are
in the air together, and vice versa.
Each horse proceeded through the course in a counterclockwise fashion, and each
run was broken down into six parts: three straight portions alternating with three
portions where the horse was making a left turn. This was the case in all four runs.
Three runs were done shod, while the last was barefoot.
3.1.2 Computable Characteristics
This study looked for several characteristics that it hoped to classify. The
desirable characteristics were:

• consistently observable throughout the duration of the run
• differentiable between runs
• observable from the data given by the accelerometers and the strain gauges
• able to garner enough training and testing sets to thoroughly train and test the classifier

Figure 3.1: Right fore hoof with locations of the strain gauges (G1-G5)
After the data was analyzed with these attributes in mind, three characteristics were
chosen for this study:

1. Shoe, whether the horse was shod or barefoot
2. Gait, whether the horse was pacing or trotting
3. Direction, whether the horse was running straight or making a left turn

The data was divided in such a way that each data fragment represents the
three characteristics listed above. Data fragments were divided so that they
would only contain one of the two possible states for each of the three characteristics.
For example, a data fragment would only contain information where the horse was
making a left turn or moving straight, not both. This also applied to both the shoe and
gait characteristics. In order to preserve consistency, records in a data fragment also
had to be recorded consecutively and could only contain information from one of the
eighteen sensors described in section 3.1.1. Figure 3.2 visualizes data from a single
sensor during the course of a run by a sample horse. Six data fragments are obtained
from this run: the fragments in red represent data fragments from sections of the run where a
left turn was occurring, and the fragments in blue represent data fragments where the
subject was moving straight. Since the states of both the Gait and Shoe characteristics
are consistent within a run, no further division of data fragments is necessary.
For this study, each of the five horses' four runs was divided into six data fragments.
The numbers of data fragments for each state of the three characteristics
can be found in Tables 3.1, 3.2, and 3.3. The number of data fragments was determined
by calculating the combinations of sensors, runs, parts of runs, and horses
that represent that state. For example, in calculating the Trot state of the Gait
characteristic, 18 sensors measured each of the 6 parts of 4 runs of 1 horse, which results
in 432 (18 × 6 × 4 × 1) data fragments.

In total, the dataset was split into 2145^i separate data fragments (5 horses × 18
measurements per horse × 4 runs per horse × 6 parts per run).
^i There are 15 fewer data fragments because there was no Dorsal Strain Gauge data for the horse "Art Affair" during the last left turn of its 4th run
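These fragment counts can be cross-checked with the arithmetic described above (a quick Python sketch; the 15 missing fragments from footnote i are subtracted from the states they belong to):

```python
# sensors x runs x parts-per-run x horses, per state
sensors, runs, parts, horses, missing = 18, 4, 6, 5, 15

total = sensors * runs * parts * horses - missing     # all fragments
trot = sensors * runs * parts * 1                     # 1 trotting horse
pace = sensors * runs * parts * 4 - missing           # 4 pacing horses
shod = sensors * 3 * parts * horses                   # 3 shod runs
barefoot = sensors * 1 * parts * horses - missing     # 1 barefoot run
straight = sensors * runs * 3 * horses                # 3 straight parts
left_turn = sensors * runs * 3 * horses - missing     # 3 left-turn parts
```

The totals reproduce the values in Tables 3.1 through 3.3.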
Figure 3.2: Sample Data from a Sensor divided into Data Fragments
Table 3.1: Breakdown of Data based on the Shoe Characteristic

State      No. of Sensors   No. of Runs   Parts of Run (/6)   No. of Horses   No. of Fragments
Shod       18               3             6                   5               1620
Barefoot   18               1             6                   5               525^i

Table 3.2: Breakdown of Data based on the Gait Characteristic

State      No. of Sensors   No. of Runs   Parts of Run (/6)   No. of Horses   No. of Fragments
Pace       18               4             6                   4               1713^i
Trot       18               4             6                   1               432

Table 3.3: Breakdown of Data based on the Direction Characteristic

State      No. of Sensors   No. of Runs   Parts of Run (/6)   No. of Horses   No. of Fragments
Straight   18               4             3                   5               1080
Left Turn  18               4             3                   5               1065^i
3.1.3 Naming Convention
In order to discuss a specific data fragment, this study will reference to its characteristics
as described above. The following naming convention was employed:
R#SHHD_#.SensorType

R# = Run Number (1, 2, 3, or 4)
S = Accelerometer/Strain Gauge (A/S)
HH = Name of the Horse (e.g. Art Affair was labeled "AR")
D = Direction (L/S)
# = Part of run (1, 2, or 3)
SensorType = Which Accelerometer or Strain Gauge ("X" or "Gauge5R3")
For example, R2SNIS_3.Gauge3R1.txt describes the data found in:

R2 = Run Number 2
S = Strain Gauge
NI = Horse "Nicklers"
S = Straight
3 = Part 3 of Run
Gauge3R1 = Strain Gauge 3 in Vector 1
Figure 3.3 is an extension of Figure 3.2 that includes labels from the naming
convention. It displays the data captured by the accelerometer in the X axis
during the 3rd run of horse "Nicklers", and shows how the resulting data fragments
from this run were labeled.
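A parser for this convention might look like the following (a hypothetical sketch: the regular expression, the group names, and the assumption that an underscore separates the direction letter from the part number are mine, not the thesis's):

```python
import re

# hypothetical pattern for names like "R2SNIS_3.Gauge3R1.txt"
PATTERN = re.compile(
    r"R(?P<run>[1-4])"        # run number
    r"(?P<sensor>[AS])"       # accelerometer or strain gauge
    r"(?P<horse>[A-Z]{2})"    # two-letter horse code
    r"(?P<direction>[LS])"    # left turn or straight
    r"_(?P<part>[1-3])"       # part of the run
    r"\.(?P<channel>\w+)")    # sensor channel, e.g. Gauge3R1

def parse_fragment_name(name):
    m = PATTERN.match(name)
    return m.groupdict() if m else None
```

Parsing the worked example from the text recovers the same six fields listed above.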
Figure 3.3: Sample Data from a Sensor divided into Labeled Data Fragments
3.2 Dimensionality Reduction
With 20,000 records per second recorded by the accelerometers (5,000 per second for the strain
gauges), most data fragments contain anywhere from 100,000 to 600,000 samples.
Using any single data fragment directly as input to a feed-forward Artificial Neural
Network would be computationally expensive. A typical feed-forward network with 16
inputs, 20 hidden nodes, and 2 output nodes requires 16 × 20 + 20 × 2 = 360 weights;
a network with 600,000 input nodes, 1,000,000 hidden nodes, and 2 output nodes would
require roughly 6 × 10¹¹ weights, over a billion times as many.
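The weight counts for fully connected layers can be checked directly (a quick Python sketch; threshold values are not counted):

```python
def weight_count(layers):
    """Number of weights in a fully connected feed-forward network,
    given the node count of each layer in order."""
    return sum(layers[k] * layers[k + 1] for k in range(len(layers) - 1))

small = weight_count([16, 20, 2])              # modest network
large = weight_count([600_000, 1_000_000, 2])  # raw-fragment network
```

The ratio between the two makes the case for reducing the fragments before they reach the network.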
In order to run an efficient and accurate model, it is imperative that the data be
reduced to a more manageable size. The danger of reducing such a large dataset is
that the reduced set will not accurately represent the original data with
respect to the characteristics being examined; in other words, the desirable
characteristics of the dataset may be lost in reducing it. In order to preserve the
data's characteristics, several reduction methods were examined to determine which
performed best when the reduced data was classified.
Due to the rhythmic, cyclic, and repeated motions of gait, the data received at each
gauge or accelerometer is highly repetitive: each waveform over a period of time
resembles the wave previous to it. By breaking the data down into these separate waves,
it is hoped that they will both represent the overall data and provide the
model with a smaller number of inputs. Techniques to accomplish this have existed
as signal transforms for many years. Since a signal is often a sequence of nearly
identical waves following each other, it can often be represented by a particular wave
segment. Signal transformation is most commonly used for filtering and manipulating
a signal in order to improve it or extract particular information from it.
3.2.1 Discrete Fourier Transform
The Discrete Fourier Transform, see 2.2.4, was first selected due to its popularity,
relative simplicity and the speed of the algorithm. The data was first treated to a discrete
Fourier transform by using the Fast Fourier Transform algorithm via the FFTW3 library
[53]. By processing a single data fragment using the Fourier transform, this resulted in
a set of Fourier coe�cients. Since lower frequency coe�cients generated by the Fourier
transform are more representative of the data than their higher frequency counterparts
it is feasible to use these lower frequency coe�cients as a representation the original
data. A subset of m lowest frequency coe�cients were used where:
m = 8, 16, 32ii
ii Since every fast Fourier transform in the data produced a “0” for its second coefficient, the imaginary component of A0 (see 2.11 and 2.13), this coefficient was omitted from the Fourier transform
These subsets were then used as the Fourier representation of that particular data fragment, and each value in each subset was used as an input to the ANN. Larger numbers of coefficients were also tried but were found to provide insignificant improvement to the results; see section 4.2.2.
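The coefficient selection above can be sketched as follows. The thesis used FFTW3; here a direct O(n²) DFT is written out only for clarity, and the function name is illustrative. Note how dropping the always-zero imaginary part of A0 yields m − 1 network inputs (7, 15, or 31):

```python
import cmath

def lowfreq_dft_features(fragment, m=8):
    # Interleave the real/imaginary parts of the lowest-frequency DFT
    # coefficients, keep the first m values, then drop Im(A0), which is
    # always 0 for real-valued input -> m - 1 network inputs.
    n = len(fragment)
    values = []
    k = 0
    while len(values) < m:
        a_k = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, x in enumerate(fragment))
        values.extend([a_k.real, a_k.imag])
        k += 1
    values = values[:m]
    del values[1]   # Im(A0) == 0, omitted per the thesis footnote
    return values
```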
3.2.2 Discrete Wavelet Transform
The Wavelet Transform (see 2.2.5) was also selected in this study in order to overcome the Discrete Fourier Transform's lack of time resolution. A one dimensional Discrete Wavelet Transform was performed on each individual data fragment, decomposed over thirteen levels, using the PyWavelets library [54]. Both a “Haar” and a “DB4” mother wavelet were examined. The transform produced thirteen levels of wavelet detail coefficients ([cD1, cD2, ..., cD13]) and a single set of wavelet approximation coefficients (cA1). For each of the thirteen levels, the sum of the squared detail coefficients was calculated. These thirteen summations represent the power distribution over the levels of wavelet coefficients, and they were used as the wavelet representation of that particular data fragment.
While many methods could be used to find an appropriate representation of these wavelet coefficients, the summation of the squares over each level, as suggested by Marghitu and Nalluri [50], was chosen because their study was also based on gait and gait-like data. All thirteen summation values were used directly as inputs to the ANN.
representation of the data. However, the transformations will still be referred to by the numbers m, rather than m − 1.
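The per-level energy computation described above can be sketched without a wavelet library by hand-coding the orthonormal Haar filter bank (the thesis used PyWavelets; this illustrative version assumes the fragment length supports the requested number of levels):

```python
def haar_detail_energies(signal, levels=13):
    # One-dimensional Haar DWT: at each level the approximation is split
    # into a coarser approximation (low-pass) and detail coefficients
    # (high-pass); the sum of squared details is that level's energy.
    s = 2 ** -0.5
    approx = list(signal)
    energies = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        nxt, details = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            nxt.append((a + b) * s)       # low-pass (approximation)
            details.append((a - b) * s)   # high-pass (detail)
        energies.append(sum(d * d for d in details))
        approx = nxt
    return energies
```

With PyWavelets, the equivalent is `coeffs = pywt.wavedec(fragment, 'haar', level=13)`, then summing the squares of each detail array in `coeffs[1:]`.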
3.3 Artificial Neural Network
Following the dimensionality reduction procedures, the transformed data streams were used, along with their characteristic (Gait, Shoe, Turn), as input-output pairs for an Artificial Neural Network. This was constructed as a three layer Back Propagation network with a learning rate of 0.05 and a momentum of 0.80. The network used the results obtained from one of the dimensionality reduction techniques in section 3.2 as its input. The network therefore had either 7, 15, or 31 input nodes (see footnote ii in section 3.2.1) for data preprocessed with the Fourier transform, and 13 input nodes for data preprocessed with the wavelet transform. The respective gait characteristics (described in section 3.1.2) served as the desired output for the network.
After the training phase of each epoch, the ANN immediately entered a testing phase in which its accuracy was calculated in order to prevent the network from being over-trained. In the testing phase, accuracy was defined as the percentage of records the ANN classified correctly out of the total number of records given. The network used 67% of the data set during the training phase and the remaining 33% during the testing phase.
This was done for 10,000 epochs, an arbitrary number chosen to give every ANN more than enough epochs to reach its maximum accuracy rate. Since the ANNs were tested at the end of each epoch, only the highest accuracy achieved over the course of the 10,000 epochs was used in the analysis. This prevents regression, as some ANNs over-trained after reaching their highest accuracy within very few epochs. Using this method bases each ANN's result on its best accuracy, not on the accuracy at an arbitrary number of epochs.
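The best-accuracy tracking described above can be sketched as follows (the `net` object and its `train_one_epoch`/`classify` methods are hypothetical stand-ins; the thesis's own implementation is not shown):

```python
def best_test_accuracy(net, train_set, test_set, epochs=10000):
    # Track the network's best test accuracy over all epochs, as in the
    # thesis, rather than the accuracy at an arbitrary final epoch.
    best, best_epoch = 0.0, 0
    for epoch in range(1, epochs + 1):
        net.train_one_epoch(train_set)          # hypothetical API
        correct = sum(net.classify(x) == y for x, y in test_set)
        acc = correct / len(test_set)
        if acc > best:
            best, best_epoch = acc, epoch
    return best, best_epoch
```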
The hidden layer comprised 3n/2 nodes, where n is the number of input nodes. This number was selected to fit criteria suggested by studies such as Blum [55] and Linoff and Berry [56]. The input data set was normalized to [−1, 1]. Each output characteristic was represented by two nodes, one for each state of the characteristic; the node with the larger value was considered “activated”. For example, the activation of one output node would indicate that the subject was shod, while the activation of the other would indicate that the subject was barefoot. The results from each ANN were then compiled, and the accuracy and the epoch needed to arrive at that accuracy were recorded.
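The [−1, 1] input normalization is a standard min-max rescaling; a minimal sketch (whether scaling was applied per input feature or per fragment is not specified in the text, so this helper is illustrative):

```python
def normalize(values, lo=-1.0, hi=1.0):
    # Min-max rescale a list of input values into [lo, hi].
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0   # guard against constant inputs
    return [lo + (hi - lo) * (v - vmin) / span for v in values]
```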
3.4 Data Procedures
3.4.1 Data Streams
In this study a data stream refers to the sensor from which the data was measured; each stream is the set of data fragments captured by that particular sensor. The data used in this study comprises eight “primary” data streams: three accelerometer measurements (A-X, A-Y, A-Z) and five strain gauges (S-G1, S-G2, S-G3, S-G4, S-G5). Each of the five strain gauges recorded three vector measurements, which the primary strain gauge data streams treat as three separate data fragments. For example, while the accelerometer in the X direction provided only 120 data fragments for the analysis of the “Gait” characteristic (4 Runs × 5 Horses × 6 Parts of Run), Strain Gauge1 contains 360 data fragments (3 Vectors × 4 Runs × 5 Horses × 6 Parts of Run); see Figures 3.4 and 3.5.
Another set of “secondary” or “combined” data streams was created for this study: a combined accelerometer (AC) and five combined strain gauges (SC-G1, SC-G2, SC-G3, SC-G4, SC-G5). These streams combine similar data fragments from the same time frame into one input-output pair: the combined accelerometer uses the data fragments from the three axes (X, Y, Z) as one record in the training/testing data set, while a combined strain gauge uses the data fragments from the three vector measurements (R1, R2, R3) as one record. Figures 3.6 and 3.7 visualize this process, showing that an input-output pair comprises three separate data fragments for these “combined” data streams. Using a “combined” data stream requires the ANN to triple the number of nodes in the input layer. These streams were created to allow the ANN to utilize similar data streams that may improve its performance.
One last “Merged” characteristic was also added: a combination of the three characteristics gait, shoe, and direction. For this merged characteristic, the ANN expands from 2 output nodes to 6, representing the three characteristic state pairs. As with the original ANNs, only one node within each characteristic state pair may be considered “activated”. The ANN is considered accurate for a particular input-output pair only if it activates all three of the correct characteristic states, leaving the other three nodes inactive. Figures 3.8 and 3.9 show the expanded output layer for a normal characteristic versus the merged characteristic. This configuration provides a more difficult test for each of the ANNs, and becomes useful when determining which data streams obtain the most accurate results.
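The merged accuracy rule above can be sketched as a check over the three output-node pairs (a minimal sketch; names are illustrative):

```python
def merged_correct(outputs, targets):
    # outputs/targets: 6 values = 3 characteristic state pairs
    # (gait, shoe, turn). A prediction counts as correct only if,
    # within every pair, the larger ("activated") output node matches
    # the target's active node.
    for i in range(0, 6, 2):
        predicted = 0 if outputs[i] > outputs[i + 1] else 1
        actual = 0 if targets[i] > targets[i + 1] else 1
        if predicted != actual:
            return False
    return True
```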
Figure 3.4: Sample data process for the Accelerometers in X-Axis (A-X) data stream
Figure 3.5: Sample data process for the Strain Gauge1 (S-G1) data stream
Figure 3.6: Sample data process for the Accelerometer Combined (AC) data stream
Figure 3.7: Sample data process for the Strain Combined Gauge1 (SC-G1) data stream
Figure 3.8: Output Layer of the ANN for the Shoe Characteristic
Figure 3.9: Output Layer of the ANN for the Merged Characteristic
3.5 Summary
For this study, 14 data streams were created from a group of 2,145 data fragments, and each stream was used individually. Five different types of data reduction were applied to each of the 14 data streams. Using these compressed data streams as input to the ANN, along with one of the four corresponding characteristics as the desired output, each network was tested for accuracy. Over all combinations of data stream, dimensionality reduction technique, and characteristic, 280 (5 × 14 × 4) ANNs were assembled and their accuracies noted. The data streams are listed in Table 3.4, and the dimensionality reduction techniques and characteristics in Table 3.5.
Table 3.4: List of Data Streams Used
“Primary” Data Streams         “Combined” Data Streams
Accelerometers in X (A-X)
Accelerometers in Y (A-Y)      Accelerometers Combined (AC)
Accelerometers in Z (A-Z)
Strain Gauge1 (S-G1)           Strain Gauge1 Combined (SC-G1)
Strain Gauge2 (S-G2)           Strain Gauge2 Combined (SC-G2)
Strain Gauge3 (S-G3)           Strain Gauge3 Combined (SC-G3)
Strain Gauge4 (S-G4)           Strain Gauge4 Combined (SC-G4)
Strain Gauge5 (S-G5)           Strain Gauge5 Combined (SC-G5)
Table 3.5: List of Dimensionality Reduction Techniques Used, (a), and Characteristics that were assessed, (b).

(a) Dimensionality Reduction     (b) Characteristics
Fourier - 8 Coefficients         Gait
Fourier - 16 Coefficients        Shoe
Fourier - 32 Coefficients        Direction
Wavelet - Haar                   All Three Merged
Wavelet - db4
Chapter 4
Results and Discussions
This chapter begins with a brief summary of the results of the various tests. The results obtained using the different dimensionality reduction techniques are then compared and analyzed. The characteristics of the data are used to determine the difficulty of accurately classifying each characteristic, and the data streams are investigated to determine whether particular streams result in more accurate classification. Finally, the chapter suggests the configuration found to classify the data most accurately.
4.1 Summary of Results
For the tests, the 280 Back Propagation networks were each run 20 times. Since the networks are stochastic, the values of the initial weights were randomly determined, as was the separation of records into training and testing sets. The results are documented in Appendix A. For each run, the maximum accuracy reached by the ANN and the epoch at which it first achieved that accuracy were recorded. Accuracy is determined during the testing phase using the testing data set: the ANN classifies the inputs based on the knowledge it has obtained
during the training phase. The number of records the ANN classified correctly during this phase was divided by the total number of records in the phase to obtain the accuracy of the network. A result of 0.0 means the ANN classified none of the records correctly, while 1.0 means it classified all of them correctly. Standard deviations for each of the groups are also recorded in Appendix C.
An average was calculated from the maximum accuracies of the 20 iterations; this average is the metric used when determining the performance of a particular ANN. Table 4.1 displays the average accuracy, taken across the 14 data streams, for each dimensionality reduction technique and characteristic. Table 4.2 displays the average accuracy, taken across the dimensionality reduction techniques, for each data stream and characteristic. A summary of the data streams, dimensionality reduction techniques, and characteristics can be found in Tables 3.4 and 3.5.
Table 4.1: Average accuracy using various data streams with dimensionality reduction techniques analyzing characteristics

                          Characteristic
DR Technique    Gait         Shoe         Turn         Merged
Fourier-08      0.831324669  0.787970598  0.710622214  0.445948054
Fourier-16      0.815858711  0.786629026  0.702716079  0.422603818
Fourier-32      0.818038289  0.786007455  0.673884739  0.409961739
Wavelet-DB4     0.967225936  0.90546562   0.866772589  0.731500993
Wavelet-Haar    0.958902184  0.876826711  0.84202675   0.683527318
4.2 Dimensionality Reduction Techniques
4.2.1 Fourier vs Wavelets
The results in Table 4.1 indicate that, for each dimensionality reduction technique, higher accuracy is achieved when using wavelet transforms than when using Fourier transforms. A graph comparing each of the dimensionality reduction techniques against the
Table 4.2: Average accuracy obtained by data streams using various dimensionality reduction techniques analyzing characteristics

                          Characteristic
Data Stream     Gait         Shoe         Turn         Merged
A-X             0.8801028    0.844159189  0.7370732    0.48756093
A-Y             0.883440326  0.8477536    0.7107317    0.50219511
A-Z             0.881643147  0.856739463  0.7295123    0.5107317
AC              0.894223505  0.876765105  0.75195122   0.56512199
S-G1            0.868507432  0.799223495  0.71344263   0.48147542
S-G2            0.851251137  0.785246     0.75409831   0.48737706
S-G3            0.859275284  0.786367558  0.6476229    0.41622956
S-G4            0.856686884  0.812424537  0.76172132   0.51090162
S-G5            0.860828337  0.7891286    0.7831967    0.51983612
SC-G1           0.889345368  0.842362042  0.79097567   0.61951224
SC-G2           0.884980789  0.825160495  0.81487804   0.60658539
SC-G3           0.893966653  0.822849832  0.72146344   0.5385366
SC-G4           0.896020621  0.862644442  0.84853663   0.65365853
SC-G5           0.895507126  0.849293989  0.86365858   0.64219511
characteristics can be found in Figure 4.1. ANNs that used a wavelet transform consistently achieved higher accuracy than ANNs that used the Fourier transform.
Figure 4.1: Comparison of Average Accuracies of different dimensionality reduction techniques over characteristics
This result is believed to be due to the attributes of the two transforms: the Fourier transform does not incorporate time resolution, while the wavelet transform does. With real-world data derived from an imperfect source such as an animal, there are likely to be fluctuations in the data as time progresses. The characteristics of the data collected at the beginning of a run can be expected to differ from those at the end; factors such as the horse warming up, or getting tired, contribute to this difference. By incorporating time resolution, the wavelet transform can adjust to these differences in the data. The Fourier transform is forced to transform the data as a whole and is unable to adjust to changes in the data which occur through time.
By modelling the original data better, the wavelet transform allowed the ANNs to identify correlations more easily and thus be more accurate. As a result of these experiments, the wavelet transform is identified as the more successful dimensionality reduction technique in terms of producing accurate ANNs.
A standard Student's t-test (two-sample, assuming unequal variances) was performed on the two groups of accuracies: those from ANNs using the Fourier transform and those from ANNs using the wavelet transform. The resulting p-values, shown in Table 4.3, are all below the pre-determined significance level of 0.001. A p-value below the significance level indicates that the null hypothesis can be rejected at that level of confidence; here, with the level set to 0.001, the null hypothesis, H0 : µ1 = µ2, can be rejected in favour of the alternative, Ha : µ1 ≠ µ2, with 99.9% confidence. It is therefore statistically very unlikely that the confidence intervals of the two group means overlap. In short, it is highly likely that the two sets of results are significantly different from each other.
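The statistic behind these values is Welch's two-sample t; a minimal sketch (converting t to a p-value additionally requires the t-distribution CDF, e.g. scipy.stats.ttest_ind with equal_var=False):

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    # Two-sample t statistic assuming unequal variances (Welch's test):
    # t = (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b),
    # where var is the sample variance (n - 1 denominator).
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)
    return (mean(sample_a) - mean(sample_b)) / (va / na + vb / nb) ** 0.5
```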
Table 4.3: Student's t-test values for accuracies obtained by using Fourier and wavelet transforms

                          Characteristic
Data Stream     Gait         Shoe         Turn         Merged
A-X             2.36298e-35  5.87038e-15  5.48691e-05  4.10575e-07
A-Y             1.61843e-28  3.64691e-05  5.77122e-27  3.48164e-26
A-Z             3.18551e-32  1.64214e-11  1.10984e-11  1.13491e-22
AC              7.64366e-44  4.12852e-19  1.98393e-18  1.12083e-33
S-G1            1.40112e-35  2.45202e-24  1.45634e-23  4.16257e-31
S-G2            1.54525e-30  2.6781e-18   1.22485e-38  9.15695e-32
S-G3            1.64911e-38  1.11382e-25  4.45357e-50  4.4876e-47
S-G4            1.40235e-33  9.11687e-18  1.36406e-39  2.03756e-28
S-G5            2.86371e-26  2.13113e-11  9.39852e-23  1.72443e-26
SC-G1           1.37974e-34  1.52812e-26  1.49676e-26  2.43568e-43
SC-G2           1.20566e-34  7.11246e-20  7.85118e-42  1.56008e-55
SC-G3           5.53116e-25  8.16644e-21  8.82545e-34  8.39959e-38
SC-G4           2.04738e-34  1.72288e-30  2.44867e-43  3.45588e-60
SC-G5           4.409e-33    2.81711e-34  1.28151e-27  1.12841e-52
4.2.2 Additional Fourier Coefficient Analysis
It can be argued that the Fourier results were due to not using enough coefficients to accurately represent the data stream. While this is possible, only a limited number of coefficients can be used as ANN inputs before the technique is deemed too computationally expensive to function as dimensionality reduction. Figure 4.2 displays the accuracies of the Fourier analysis as the number of coefficients increases. While the Gait and Shoe characteristics gain accuracy when more coefficients are used as inputs, the gain is small compared to the computational resources consumed each time the number of Fourier coefficients doubles.
Supplementary analysis was done to support this claim: Fourier-64 and Fourier-128 were implemented as data reduction techniques, and each of the ANNs using those techniques was run. ANNs using 8-13 inputs required only approximately 10 minutes, while the Fourier-64 technique took an average of an hour to complete.
Figure 4.2: Comparison of Average Accuracies of different Characteristics over Fourier Transform dimensionality reduction techniques
When using the Fourier-128 technique, the ANNs required up to three hours. The accuracy results of these supplementary ANNs can be found in Appendix B.
Figure 4.3 extends Figure 4.2 by adding the results from the supplementary Fourier-64 and Fourier-128 experiments. Based on these results, increasing the number of Fourier coefficients used as inputs did not significantly improve the accuracy. The computational resources required for
such a task were high given such a small increase in accuracy. Table 4.4 shows a metric of accuracy per number of inputs: the accuracy of each dimensionality reduction technique divided by the number of inputs the ANN employed. Depending on the number of inputs used, a technique may be computationally cheaper by using fewer inputs while still obtaining accuracy similar to that of a more expensive technique. While the Fourier-08 transform might be more efficient if computational resources were an immense restriction, the wavelet transforms still present the higher overall accuracy, with an efficiency score that is near that of the Fourier-08.

Figure 4.3: Comparison of Average Accuracies of different Characteristics over Fourier Transform dimensionality reduction techniques including Supplementary Fourier Transforms
Table 4.4: Efficiency Score based on Accuracy of ANN and Number of Inputs used

                          Characteristic
DR Technique    Gait         Shoe         Turn         Merged
Fourier-08      0.116734518  0.111695557  0.097211296  0.060332832
Fourier-16      0.054468919  0.05258853   0.045753864  0.027728577
Fourier-32      0.026520379  0.025338507  0.021499787  0.013096777
Fourier-64      0.012905452  0.012444275  0.010492719  0.00652655
Fourier-128     0.006445499  0.006137613  0.005117233  0.003181044
Wavelet-DB4     0.072869135  0.063198279  0.067307795  0.04851819
Wavelet-Haar    0.071921436  0.06103972   0.064958432  0.044053627
4.2.3 Mother Wavelets: Haar vs DB4
The DB4 wavelet and the Haar wavelet were used as the mother wavelets for the wavelet transform. Table 4.5 shows the number of times an ANN obtained the highest average accuracy using each of the wavelets for preprocessing, along with the average difference between the two accuracies. Table 4.6 shows the average accuracy and average difference by data stream and mother wavelet.
Table 4.5: Number of ANNs using Wavelet Transforms which obtained the Highest Average Accuracy and Average Difference between Accuracy sorted by Mother Wavelets and characteristics

                Wavelet Transform
Characteristic  DB4   Haar  Δ Accuracy
Gait            10    6     0.008323752
Shoe            13    1     0.02863891
Turn            11    3     0.024745839
Merged          14    0     0.047973675
Table 4.6: Average Accuracy and Average Difference between Accuracy sorted by Mother Wavelets and Data Stream Using Wavelet Transforms

                Wavelet Transform
Data Stream     DB4          Haar         Δ Accuracy
A-X             0.79119063   0.768003802  0.023186828
A-Y             0.84677471   0.806964064  0.039810646
A-Z             0.858825466  0.801797136  0.05702833
AC              0.914762539  0.863093716  0.051668822
S-G1            0.805985792  0.778456677  0.027529115
S-G2            0.815444384  0.783450183  0.031994201
S-G3            0.784717474  0.770874718  0.013842755
S-G4            0.849881369  0.799029361  0.050852009
S-G5            0.796408559  0.78272765   0.013680909
SC-G1           0.920025679  0.913462775  0.006562904
SC-G2           0.936713706  0.918870305  0.017843401
SC-G3           0.893036014  0.875786294  0.01724972
SC-G4           0.973571909  0.969383814  0.004188095
SC-G5           0.961039754  0.932589874  0.02844988
Comparing these two transforms directly, the results suggest that using the DB4 mother wavelet most often resulted in more accurate classification by the ANN. A Student's t-test was then performed on the accuracies obtained with the DB4 wavelet versus those obtained with the Haar wavelet; Table 4.7 displays these t-test results.
Using a significance level of p = 0.05 (95%), there is no statistical evidence that the means of the two samples differ significantly across all of the possible
Table 4.7: Student's t-test values for accuracies obtained by using Wavelet-Haar and Wavelet-DB4 transforms

                          Characteristic
Data Stream     Gait         Shoe         Turn         Merged
A-X             0.339680455  0.009768709  0.649663505  0.019779411
A-Y             0.328289413  0.094128691  0.00194188   0.00020807
A-Z             0.005621079  0.256328821  0.000262607  4.47466e-06
AC              0.020537525  0.000711237  0.00430746   0.000104489
S-G1            5.62816e-05  0.000149404  0.97340923   0.001747309
S-G2            0.151671582  0.044723102  6.50605e-06  0.000191347
S-G3            0.158336705  6.11864e-06  0.999995882  0.125999151
S-G4            0.002550454  1.75181e-08  0.000273349  4.67397e-07
S-G5            0.552556776  0.04259035   0.004528171  0.275032293
SC-G1           0.765957446  0.006413291  0.677020257  0.756449585
SC-G2           1.000000000  0.009302044  0.36382045   0.10419116
SC-G3           0.834862234  0.406943561  0.087887825  0.214078641
SC-G4           0.330564931  0.651486607  0.079419143  0.276332993
SC-G5           0.043640834  0.078540882  0.006660337  0.008960876
configurations. While some combinations of data stream and characteristic produced p-values below 0.05, this was not true for every combination. In particular, the data stream SC-G4, which will be examined in more depth later, produced p-values above 0.05, so the difference there is not statistically significant. It can then be concluded that the null hypothesis, H0 : µ1 = µ2, cannot be rejected, due to the lack of statistical evidence. Since the two samples have not been shown to differ significantly from one another, both techniques will be considered when discussing the highest-accuracy configuration.
4.3 Characteristics
4.3.1 Gait, Shoe, and Turn
For some characteristics, the choice of dimensionality reduction technique contributes less to the accuracy of the ANN than the data stream being transformed does: some data streams do not fully represent the absence or presence of a particular characteristic. Figure 4.4 displays the three characteristics (Gait, Shoe, and Turn) over the eight “primary” data streams: the accelerometers in the X, Y, and Z axes, and strain Gauges 1 through 5. This subsection looks at the effect of different data streams on the accuracy of the ANNs based on sensor location. Results from the “combined” data streams are not considered here, as they are simply combinations of these eight “primary” data streams.
Figure 4.4: Comparison of Average Accuracies of non-combined data streams for each characteristic regardless of dimensionality reduction technique
For the Turn characteristic, the X axis of the accelerometer performed best overall, likely because turning is indicative of motion in the X axis. Strain Gauge 3 did not perform as well as the other strain gauges. This can be explained by noting that Strain Gauge 3 was placed at the middle of the hoof, where the variation in strain between turning and moving straight can be minimal. The results suggest that, using only this gauge, the ANN was unable to distinguish a left turn from straight movement. It is also notable that strain gauges located on the lateral (outside) part of the hoof (S-G4 and S-G5) resulted in higher accuracies than strain gauges on the medial (inside) part of the hoof (S-G1 and S-G2). This pattern is explained by the lateral part of the hoof experiencing more variation in strain during turns compared to straight movement, a variation that is smaller on the medial part of the hoof.
4.3.2 Merged Characteristic Used to Enhance Results
The merged characteristic was the final configuration the ANNs were used to classify. It is a combination of the Gait, Shoe, and Turn characteristics. Combining all three made the classification task more difficult, as the ANN needed to classify all three characteristics correctly, producing three correct outputs, in order to be deemed accurate.
The merged characteristic was designed as a general overview test of the ANN's ability to classify all three characteristics at once. Instead of averaging the accuracies from the three individual characteristics, the merged configuration's score is used as the metric of the ANN's overall ability to accurately classify this data.
The ANN is only deemed accurate if it identifies all three output characteristic state pairs correctly; if fewer than all three are correct, the output is ranked incorrect. Any inaccuracy thus has a strong negative effect on the overall performance of the ANN, causing the merged characteristic to produce lower accuracies than the individual characteristics. This was designed to increase the range of accuracies obtained from the ANNs: with a wider range of results, it becomes easier to identify the combinations of data streams and data reduction techniques that increase the accuracy of the ANN.
Table 4.8 is a variant of Table 4.1: instead of displaying the average maximum accuracies over each data stream, it displays both the minimum and maximum accuracies as well as the range between them.
Table 4.8: Range of Average Accuracies over Data Streams found in ANN using different Dimensionality Reduction techniques for each Characteristic

                           Fourier                               Wavelet
Characteristic      08          16          32          DB4         Haar
Gait    Max   0.89345311  0.83183579  0.83697053  1.00000000  1.00000000
        Min   0.80802421  0.79845968  0.79204121  0.91415026  0.90595353
        Range 0.08542889  0.03337611  0.04492932  0.08584974  0.09404647
Shoe    Max   0.84338905  0.84082163  0.83568689  0.96148905  0.96534016
        Min   0.75323558  0.75280411  0.75496111  0.82312342  0.80543584
        Range 0.09015347  0.08801753  0.08072579  0.13836563  0.15990432
Turn    Max   0.83414645  0.792683    0.76829275  0.98780495  0.97926825
        Min   0.54836065  0.57090155  0.55819655  0.6939024   0.70853655
        Range 0.2857858   0.22178145  0.2100962   0.29390255  0.2707317
Merged  Max   0.52804875  0.4646342   0.48155745  0.94756095  0.93292685
        Min   0.33524595  0.31967225  0.31024595  0.5658537   0.5085365
        Range 0.1928028   0.14496195  0.1713115   0.38170725  0.42439035
The merged characteristic, however, did not greatly increase the range for any of the Fourier techniques; see Figures 4.5, 4.6, and 4.7. Given the earlier Fourier results in section 4.2.1, this is likely due to the Fourier transform performing poorly on this data set. As the inaccuracies generated by the Fourier transform compound upon each other, the overall accuracy is lowered for every data stream, and these uniformly low accuracies condense the range: instead of producing results over a wider range, the Fourier transform produces only low-accuracy results.
With the wavelet transforms, the merged characteristic did increase the range of classification accuracies. Data streams that had obtained lower accuracies on the three non-merged characteristics resulted in even lower accuracies under the merged configuration, while data streams that had produced higher-accuracy classifications on the non-merged characteristics suffered only slightly. The range of accuracies therefore increased under the merged configuration compared to the non-merged characteristics; see Figures 4.8 and 4.9 for a graph of this pattern. The wider range makes it more apparent which data streams are effective inputs for accurate classification. With the non-merged classifications it was difficult to separate the results, as accuracies differed by only a few percent. Merging enhances the results by widening the range of output values, which allows this study to easily identify the most accurate ANNs.
Figure 4.5: Average Max Accuracies by Characteristic using the Fourier-8 Data Reduction Technique
Figure 4.6: Average Max Accuracies by Characteristic using the Fourier-16 Data Reduction Technique
Figure 4.7: Average Max Accuracies by Characteristic using the Fourier-32 Data Reduction Technique
Figure 4.8: Average Max Accuracies by Characteristic using the Wavelet-DB4 Data Reduction Technique
Figure 4.9: Average Max Accuracies by Characteristic using the Wavelet-Haar Data Reduction Technique
4.4 Data Streams
4.4.1 Combined Data streams
In section 4.3.1, it was determined that for particular characteristics some data streams resulted in more accurate classifications than others. That section did not include the combined data streams described in section 3.4.1: AC, SC-G1, SC-G2, SC-G3, SC-G4, and SC-G5. This section explores the significance of these data streams; several results from the previous sections are relevant here, namely:
1. Section 4.2.1: the Wavelet Transform was a superior dimensionality reduction technique compared to the Fourier Transform for this work,
2. Section 4.3.2: the merged configuration is a better metric of an ANN's accuracy for the three characteristics than the averaged accuracy of each,
3. Section 4.3.2: using the merged configuration increased the range of accuracy results when using the Wavelet Transforms.
This subsection therefore focuses on the Wavelet Transform results evaluated with the merged characteristic. When using the wavelet transforms there were only small differences between the results of different data streams when classifying the three core characteristics. By using the merged characteristic it became more apparent which streams better represent the presence of the characteristics. These results help determine which data streams are best to use when classifying each of the core characteristics.
Table 4.9 displays each of the eight "primary" data streams along with the six "combined" data streams, and the difference between their average maximum accuracies for the merged characteristic using the Wavelet Transforms. For example, the most accurate "primary" accelerometer data stream under the Wavelet-DB4 technique was A-Z, with a score of 0.71097565, while the "combined" accelerometer stream, AC, obtained 0.82804885. This yields a difference of 0.1170732, positive because AC had the higher accuracy.

Table 4.9: Difference of Average Max Accuracy between combined and single data streams over Wavelet Transforms for the Merged Characteristic

Data Streams        Wavelet-DB4   Wavelet-Haar
A-X                 0.5658537     0.5085365
A-Y                 0.6743903     0.592683
A-Z                 0.71097565    0.59634145
AC                  0.82804885    0.735366
Δ AC and Max A      0.1170732     0.13902455
S-G1                0.61680335    0.5713114
SC-G1               0.8402439     0.8390245
Δ SC-G1 and S-G1    0.22344055    0.2677131
S-G2                0.63237705    0.5786884
SC-G2               0.8743903     0.847561
Δ SC-G2 and S-G2    0.24201325    0.2688726
S-G3                0.5659836     0.55000005
SC-G3               0.7975612     0.7695122
Δ SC-G3 and S-G3    0.2315776     0.21951215
S-G4                0.6754098     0.59467205
SC-G4               0.94756095    0.93292685
Δ SC-G4 and S-G4    0.27215115    0.3382548
S-G5                0.60409835    0.58934435
SC-G5               0.9073169     0.8634147
Δ SC-G5 and S-G5    0.30321855    0.27407035
The absence of negative deltas in this table demonstrates that the "combined" data streams outperform the "primary" data streams. This is likely because a "combined" data stream allows several vectors to be compared at once during classification. As Table 4.9 suggests, using the "combined" accelerometer stream instead of any individual "primary" accelerometer provides much greater accuracy in the ANNs, while SC-G4 leads to the best accuracy among the strain gauges.
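The delta computed for each combined stream in Table 4.9 amounts to a few lines of arithmetic; the sketch below reproduces it using the Wavelet-DB4 accelerometer values from the table:

```python
# Wavelet-DB4 accuracies for the "primary" accelerometer streams (Table 4.9).
primary = {"A-X": 0.5658537, "A-Y": 0.6743903, "A-Z": 0.71097565}
combined_ac = 0.82804885   # "combined" accelerometer stream AC

best_primary = max(primary.values())   # A-Z is the best single stream
delta = combined_ac - best_primary     # positive: the combined stream wins
print(round(delta, 7))
```

A positive delta for every combined stream is what the table reports, which is why the combined streams are preferred as ANN input.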
4.5 Final Configuration
Based on the previous experiments, it was found that using the "combined" Strain Gauge 4 (SC-G4) data stream, along with a Wavelet data reduction technique, leads to the highest accuracy. Table 4.10 presents the accuracies of the characteristics using these settings.
Table 4.10: Accuracies using combined Strain Gauge 4 (SC-G4) data stream with Wavelet Data Reduction Techniques to classify the Gait, Shoe, and Turn characteristics

DR Technique   Gait         Shoe          Turn
Wavelet-DB4    0.99871632   0.960205421   0.98780495
Wavelet-Haar   1.00000000   0.965340158   0.97926825
Chapter 5
Conclusion
In this chapter, the conclusions reached from the results of the study are examined. A summary of these results is reiterated, along with the conjectures that best explain them. After that, a series of suggestions for future work is presented: reasons why a larger data set, which would increase the variance and types of data available, should be collected; suggestions on different data characteristics that could be examined; and, lastly, different methods that may improve the performance of the ANNs.
5.1 Summary
In this study, the periodic patterns found in horse gait data were classified using a Back Propagation ANN after applying several types of signal transforms. Data from both accelerometers and strain gauges were examined. These data streams were preprocessed using both the Fourier Transform and the Wavelet Transform to reduce the number of inputs to the classifier. The coefficients of these transforms were then used as input to a Back Propagation Artificial Neural Network, which classified the data streams into several characteristics.
Results obtained from these ANNs were generally accurate. It was found that ANNs using Wavelet Transforms to preprocess the data performed more accurately than ANNs using the Fourier Transforms. Particular data streams were more effective as input than others. The merged characteristic, which is a combination of the other characteristics, was used as an overall classifier for testing purposes; it, too, could be classified accurately by some of the best-performing data streams and data reduction techniques. Using the merged characteristic as the classification metric, it was also apparent that combined data streams produced more accurate results than any single data stream.
Chapter 1 stated that this thesis would show that dimensionality reduction techniques can be used to preprocess data for classification by an ANN and still produce accurate classifications. It also stated that the thesis would explore the effectiveness of the different techniques and data streams used, analyzing their effect on the ANN's accuracy. It has been demonstrated that the Fourier and wavelet signal transforms can be used to reduce the size of the input data to a Back Propagation Neural Network. It has also been demonstrated that this reduced data is representative of the original data and produces successful classifications on equine gait data.
5.2 Future Work
5.2.1 Data Size, Variance and Types
This study only utilized data from five horses, four of which were "pacers" while one was a "trotter". In future work involving gait analysis, a larger number of subjects with equal numbers of "pacers" and "trotters" would be preferred. With this approach, a more generalized pattern might be recognized between the two gaits.
Similarly, it would be interesting to study other gait characteristics, expanding from two-beat gaits to four-beat gaits. The methods used in this study are a good starting point for the analysis of such gaits. A potential difficulty is that data was collected and analyzed only from the right fore hoof; it might be necessary to instrument all four limbs of the subject. With the increased variance in the types of gait, additional classes could then be included in the "Gait" characteristic.
Further expanding the types of data collected, instead of only analyzing the strain on the hoof wall, strain gauges could be used to measure the strain exerted upon the limbs and joints. This data might provide another dimension, as the strain upon the hoof can be completely different from that measured higher up in the limbs. This becomes exceptionally useful when analyzing the medical ramifications discussed in section 5.2.2.
Another type of data that could be collected is an indication of the track condition during each run. Differences in track condition could produce differences in strain gauge measurements even when the subject performs runs with otherwise identical characteristics. Being able to measure this variable could lead to better accuracy in the ANNs, given their reliance on the strain gauge measurements.
5.2.2 Data Characteristics
Another applicable direction would be applying the results of this study to the field of veterinary medicine. This would rely on the collection of the limb or joint strain data mentioned in section 5.2.1. Analyzing such data might help in diagnosing muscle, tendon or joint injuries. This might require an ANN tailored to a particular subject, as these measurements might vary too widely between subjects to provide useful results. Analyzing this data over the development of a subject's lameness (the inability to travel in a regular manner with all four feet) might contribute to a technique for detecting injury, disease or overworking of the subject. With this knowledge, future horses might be diagnosed at an earlier stage, when they show the same early signs that other subjects exhibited before acquiring such complications. Getting these subjects medical help earlier can contribute to a speedier recovery, or even avoid the complications entirely. This will, however, require a large number of data sets, as different injuries and diseases present with different symptoms. Multiple cases across multiple horses, perhaps with different gaits, will be needed to provide any applicable and useful knowledge for the ANN to learn.
A more ambitious trait to classify is each subject's gait movement in relation to its breed. It would be interesting to analyze the correlation between subtle gait traits and a specific breed of horse, or a particular characteristic that is desirable in a certain breed. Being able to detect these traits might assist in estimating breeding outcomes more accurately.
5.2.3 Methods
A general wavelet and Fourier transform were examined in this study. The data streams were transformed using the Haar and DB4 wavelets, and the sums of the coefficients by level were used as input to the ANN. Repeating this study with another mother wavelet, or with another method of converting the wavelet coefficients into a suitable number of ANN inputs, could also be examined.
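A minimal sketch of this per-level reduction, assuming a power-of-two signal length and using the Haar wavelet. This is an illustrative reimplementation of the idea, not the thesis code:

```python
import numpy as np

def haar_level_sums(signal, levels):
    """Multilevel Haar DWT; returns the sum of detail coefficients at each
    level, plus the sum of the final approximation, one number per level."""
    a = np.asarray(signal, dtype=float)
    sums = []
    for _ in range(levels):
        approx = (a[0::2] + a[1::2]) / np.sqrt(2.0)   # low-pass: pair averages
        detail = (a[0::2] - a[1::2]) / np.sqrt(2.0)   # high-pass: pair differences
        sums.append(detail.sum())                     # one feature per level
        a = approx
    sums.append(a.sum())                              # coarsest approximation
    return sums

features = haar_level_sums([4.0, 2.0, 6.0, 8.0], levels=2)
print(features)  # three ANN inputs instead of four raw samples
```

However the coefficients are condensed (sums, energies, or selected coefficients), the goal is the same: a fixed, small input vector for the Back Propagation network.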
Appendix A
Results of Artificial Neural
Networks
A.1 Note of Results
Each combination of Data Reduction Technique, Data Stream, and Characteristic was classified by an ANN multiple times, each run with a different random initialization. This Appendix presents results as average values of both the maximum accuracy obtained and the epoch at which the ANN reached this maximum.
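The averaging described here can be sketched as follows, with illustrative run results rather than the thesis data:

```python
# Each run records its best accuracy and the epoch it occurred at;
# the appendix tables report the averages over runs (values illustrative).
runs = [(0.98, 110), (0.99, 95), (0.97, 120)]  # (max accuracy, epoch) per run

avg_acc = sum(acc for acc, _ in runs) / len(runs)
avg_epoch = sum(ep for _, ep in runs) / len(runs)
print(round(avg_acc, 4), round(avg_epoch, 1))
```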
A.2 Gait
Table A.1: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Gait Characteristic

                Fourier-08          Fourier-16          Fourier-32
Data Streams    Accuracy     Epoch  Accuracy     Epoch  Accuracy     Epoch
A-X             0.70454555   4248   0.69924250   4562   0.75378785   2565
A-Y             0.77575745   5591   0.80151500   7082   0.82803015   6094
A-Z             0.81818175   5839   0.80833330   2458   0.81742415   1900
AC              0.85227255   3242   0.83333335   2065   0.81515150   2194
S-G1            0.85564100   2554   0.89512815   1583   0.88256410   2728
S-G2            0.84948715   3657   0.87846145   1591   0.88794875   1211
S-G3            0.84948715   3721   0.84333325   2729   0.86769230   800
S-G4            0.83641010   3937   0.87666655   1949   0.87846155   938
S-G5            0.84461550   2373   0.89128195   1272   0.90999995   714
SC-G1           0.84035100   1464   0.82368420   713    0.86666670   667
SC-G2           0.90789475   626    0.78245615   123    0.82368425   1087
SC-G3           0.93421045   1050   0.84473685   1189   0.83771930   758
SC-G4           0.83596495   175    0.78859645   51     0.85263160   142
SC-G5           0.88508775   313    0.83596490   148    0.89736840   555
Table A.2: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Gait Characteristic

                Wavelet-DB4         Wavelet-Haar
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.97803010   725    0.96969675   986
A-Y             0.96515135   539    0.96590900   387
A-Z             0.98484825   110    0.96666645   1678
AC              0.99772720   106    0.98712100   68
S-G1            0.96000005   1280   0.96564120   1864
S-G2            0.96076930   2220   0.94717945   4015
S-G3            0.97205130   1118   0.96717955   2175
S-G4            0.97384625   2605   0.94923085   2176
S-G5            0.95974355   1651   0.95230775   2831
SC-G1           0.99912280   19     1.00000000   19
SC-G2           1.00000000   11     1.00000000   20
SC-G3           0.99473680   14     0.99122800   29
SC-G4           1.00000000   7      1.00000000   13
SC-G5           0.99035085   54     0.99122805   28
A.3 Shoe
Table A.3: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Shoe Characteristic

                Fourier-08          Fourier-16          Fourier-32
Data Streams    Accuracy     Epoch  Accuracy     Epoch  Accuracy     Epoch
A-X             0.63306425   3584   0.63387075   6849   0.69435465   4911
A-Y             0.66935450   5962   0.69999970   4373   0.73145140   4847
A-Z             0.79919370   3826   0.82661310   2167   0.86935510   1114
AC              0.83629065   3142   0.86129060   1550   0.86935510   1134
S-G1            0.76436455   4538   0.82182320   1667   0.83701650   1730
S-G2            0.78618795   3168   0.83397790   1970   0.85248620   852
S-G3            0.72817670   5499   0.79447520   1263   0.81215470   2011
S-G4            0.79585645   3239   0.84447520   1918   0.86381215   1474
S-G5            0.79005535   2892   0.85082870   1437   0.88066310   1035
SC-G1           0.85327860   832    0.83196715   1131   0.88688525   562
SC-G2           0.87459015   211    0.87540975   387    0.89754105   554
SC-G3           0.82950820   847    0.82377040   556    0.85819675   144
SC-G4           0.91229510   677    0.87868850   375    0.88032780   113
SC-G5           0.90983615   2623   0.89918035   434    0.90409835   864
Table A.4: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Shoe Characteristic

                Wavelet-DB4         Wavelet-Haar
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.93387110   1397   0.91612920   2376
A-Y             0.91209695   3637   0.87983895   4984
A-Z             0.95483880   840    0.91854855   1441
AC              0.99435485   112    0.96854845   392
S-G1            0.92265210   4563   0.89917130   4409
S-G2            0.86602205   4284   0.80718220   6298
S-G3            0.89917130   2213   0.88535905   3843
S-G4            0.93121545   1985   0.90441990   3318
S-G5            0.89309400   2267   0.86022100   3796
SC-G1           0.97213125   695    0.95819685   590
SC-G2           0.94754100   214    0.93032785   418
SC-G3           0.98196735   91     0.96721325   336
SC-G4           0.98852475   49     0.98524600   317
SC-G5           0.97950830   128    0.97541015   64
A.4 Turn
Table A.5: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Turn Characteristic

                Fourier-08          Fourier-16          Fourier-32
Data Streams    Accuracy     Epoch  Accuracy     Epoch  Accuracy     Epoch
A-X             0.78536595   5951   0.77195130   4512   0.72560980   3772
A-Y             0.61829265   6522   0.63048775   4532   0.62195120   3319
A-Z             0.69878065   2083   0.71097555   1948   0.65487820   689
AC              0.72195135   2049   0.71707315   1655   0.63536585   655
S-G1            0.65122955   1061   0.67499995   662    0.67459005   238
S-G2            0.68524580   669    0.69836055   161    0.66311475   198
S-G3            0.54836065   1426   0.57090155   527    0.55819655   2296
S-G4            0.70942620   124    0.68032795   637    0.67827875   289
S-G5            0.74713110   439    0.75245905   306    0.75532790   149
SC-G1           0.74634155   32     0.71707330   50     0.68658540   30
SC-G2           0.76585375   251    0.71829275   169    0.67804880   343
SC-G3           0.65975600   43     0.63658550   63     0.60121950   23
SC-G4           0.77682935   124    0.76585375   39     0.73292685   208
SC-G5           0.83414645   114    0.79268300   61     0.76829275   135
Table A.6: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze the Turn Characteristic

                Wavelet-DB4         Wavelet-Haar
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.69390240   2115   0.70853655   1979
A-Y             0.87317075   865    0.80975615   1322
A-Z             0.82317090   1440   0.75975620   1569
AC              0.86951220   234    0.81585355   812
S-G1            0.78401645   1977   0.78237715   4744
S-G2            0.88565575   2693   0.83811470   3522
S-G3            0.77909840   2489   0.78155735   2779
S-G4            0.88934415   1473   0.85122955   1394
S-G5            0.84426220   1015   0.81680325   1548
SC-G1           0.89634150   141    0.90853660   328
SC-G2           0.95975605   124    0.95243885   690
SC-G3           0.86829275   356    0.84146345   897
SC-G4           0.98780495   21     0.97926825   37
SC-G5           0.98048780   49     0.94268290   200
A.5 Merged Characteristics
Table A.7: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze all of the characteristics merged

                Fourier-08          Fourier-16          Fourier-32
Data Streams    Accuracy     Epoch  Accuracy     Epoch  Accuracy     Epoch
A-X             0.47439010   3568   0.45243900   2655   0.43658535   1335
A-Y             0.39878045   4030   0.42926830   3101   0.41585350   2047
A-Z             0.43292685   1135   0.42195110   723    0.39146345   393
AC              0.43902445   819    0.43048775   641    0.39268290   337
S-G1            0.40286890   141    0.40737700   50     0.40901645   126
S-G2            0.41680335   257    0.40655745   39     0.40245905   42
S-G3            0.33524595   585    0.31967225   125    0.31024595   23
S-G4            0.45778690   76     0.42581975   59     0.40081960   373
S-G5            0.45983610   135    0.46434435   252    0.48155745   1243
SC-G1           0.49878050   53     0.46463420   187    0.45487810   31
SC-G2           0.48414630   38     0.41585375   17     0.41097560   41
SC-G3           0.39756100   39     0.36707305   5      0.36097555   39
SC-G4           0.51707315   214    0.44634135   20     0.42439035   12
SC-G5           0.52804875   227    0.46463415   87     0.44756105   101
Table A.8: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Wavelet Dimensionality Reductions to analyze all of the characteristics merged

                Wavelet-DB4         Wavelet-Haar
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.56585370   977    0.50853650   1661
A-Y             0.67439030   1069   0.59268300   1787
A-Z             0.71097565   1312   0.59634145   1244
AC              0.82804885   886    0.73536600   963
S-G1            0.61680335   1406   0.57131140   3194
S-G2            0.63237705   1331   0.57868840   1553
S-G3            0.56598360   562    0.55000005   797
S-G4            0.67540980   1452   0.59467205   1901
S-G5            0.60409835   643    0.58934435   2161
SC-G1           0.84024390   899    0.83902450   1519
SC-G2           0.87439030   183    0.84756100   553
SC-G3           0.79756120   915    0.76951220   360
SC-G4           0.94756095   607    0.93292685   323
SC-G5           0.90731690   379    0.86341470   551
Appendix B
Supplementary Results of
Artificial Neural Networks
B.1 Note of Results
Each combination of Fourier Data Reduction Technique, Data Stream, and Characteristic was classified by an ANN multiple times, each run with a different random initialization. This Appendix presents results as average values of both the maximum accuracy obtained and the epoch at which the ANN reached this maximum.
B.2 Gait
Table B.1: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Gait Characteristic

                Fourier-64          Fourier-128
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.8390244    16     0.8195122    1
A-Y             0.8243904    44     0.8243904    1
A-Z             0.8146344    1      0.8195122    1
AC              0.8390246    268    0.8292684    12
S-G1            0.8213114    1370   0.8295082    5
S-G2            0.8016394    46     0.822951     1
S-G3            0.790164     163    0.8245902    1
S-G4            0.804918     1      0.8213116    1
S-G5            0.842623     359    0.8934426    310
SC-G1           0.804878     11     0.8048782    1535
SC-G2           0.785366     1      0.785366     1
SC-G3           0.7951222    7      0.7902442    1702
SC-G4           0.8146344    499    0.8000002    27
SC-G5           0.8048782    2390   0.7951222    401
B.3 Shoe
Table B.2: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Shoe Characteristic

                Fourier-64          Fourier-128
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.8243902    17     0.8146342    2
A-Y             0.8585364    328    0.8146342    9
A-Z             0.8390244    87     0.8048782    1
AC              0.8195122    55     0.8          10
S-G1            0.762295     2      0.7721312    48
S-G2            0.7606558    4      0.760656     3
S-G3            0.7606558    4      0.7573772    3
S-G4            0.7672132    62     0.7688526    10
S-G5            0.7786886    311    0.8000002    23
SC-G1           0.7463416    2      0.7609758    77
SC-G2           0.7707318    19     0.7658538    65
SC-G3           0.7414634    1      0.7512194    21
SC-G4           0.7512196    4      0.7414634    1
SC-G5           0.7951222    8      0.8          7
B.4 Turn
Table B.3: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Turn Characteristic

                Fourier-64          Fourier-128
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.687805     3390   0.619512     415
A-Y             0.6390244    2012   0.6243902    305
A-Z             0.6341466    315    0.5756098    35
AC              0.6487806    1003   0.609756     73
S-G1            0.6836064    79     0.7          789
S-G2            0.657377     286    0.6426232    20
S-G3            0.514754     2007   0.519672     46
S-G4            0.6770492    56     0.6934426    29
S-G5            0.780328     375    0.752459     27
SC-G1           0.6731706    10     0.7170732    169
SC-G2           0.6829268    345    0.604878     1757
SC-G3           0.6048782    4      0.614634     230
SC-G4           0.6731708    108    0.726829     1947
SC-G5           0.6975608    134    0.6975612    497
B.5 Merged
Table B.4: Average Maximum Accuracy Of ANNs and the Average Epoch needed using Fourier Dimensionality Reductions to analyze the Merged Characteristic

                Fourier-64          Fourier-128
Data Streams    Accuracy     Epoch  Accuracy     Epoch
A-X             0.5439025    2568   0.4829267    214.5
A-Y             0.5268294    1223   0.5          163
A-Z             0.5146342    180    0.490244     26
AC              0.5341463    551.5  0.509756     48.5
S-G1            0.554918     52     0.5557377    1140
S-G2            0.5262294    155    0.5131149    278
S-G3            0.4139342    1004.5 0.4254097    24
S-G4            0.5336066    79.5   0.5557376    33.5
S-G5            0.6475411    817.5  0.6418032    66.5
SC-G1           0.5634145    11     0.5756098    1013
SC-G2           0.5439025    199    0.504878     917
SC-G3           0.4926829    28.5   0.4926828    131
SC-G4           0.5487806    59     0.5804876    977
SC-G5           0.5609756    93     0.5487806    317.5
Appendix C
Standard Deviation of Results
C.1 Note of Results
Each combination of Data Reduction Technique, Data Stream, and Characteristic was classified by an ANN multiple times. This Appendix documents the standard deviation of each group of results.
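A sketch of the reported quantity, with illustrative accuracies; the sample standard deviation is assumed here, since the variant is not specified:

```python
import statistics

# Illustrative accuracies from repeated runs of one configuration.
accuracies = [0.93, 0.95, 0.91, 0.94, 0.92]
spread = statistics.stdev(accuracies)   # sample standard deviation
print(round(spread, 4))
```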
C.2 Gait
Table C.1: Standard Deviations of ANNs analyzing the Gait Characteristic using the Fourier Data Reduction Technique

Data Stream   Fourier-08    Fourier-16    Fourier-32    Fourier-64    Fourier-128
A-X           0.043570101   0.047721676   0.048814275   0.039629453   0.058941793
A-Y           0.045094107   0.049351359   0.049317956   0.035846296   0.0520832
A-Z           0.038425269   0.037993951   0.031705085   0.039629453   0.050222414
AC            0.038425283   0.035805767   0.031860605   0.047794832   0.046277193
S-G1          0.030675335   0.032604703   0.028622641   0.040621318   0.033355391
S-G2          0.031240397   0.031584053   0.030772196   0.06503879    0.033436183
S-G3          0.032712969   0.033881104   0.042034965   0.073970631   0.028204432
S-G4          0.033310548   0.033853718   0.029975659   0.037166419   0.031703339
S-G5          0.030259945   0.031162901   0.036904688   0.028582906   0.022596844
SC-G1         0.042108406   0.051412034   0.052238745   0.043630717   0.046277509
SC-G2         0.054189644   0.049584594   0.039856712   0.039024625   0.039024625
SC-G3         0.058263418   0.053578067   0.045385405   0.03308456    0.033084589
SC-G4         0.042536694   0.042883871   0.046284414   0.033084677   0.028443761
SC-G5         0.045385488   0.054523186   0.045675084   0.026718272   0.0292684
Table C.2: Standard Deviations of ANNs analyzing the Gait Characteristic using the Wavelet Data Reduction Technique

Data Stream   Wavelet-DB4   Wavelet-Haar
A-X           0.023600536   0.03149653
A-Y           0.027711338   0.026681321
A-Z           0.026804459   0.030540132
AC            0.01176516    0.016338906
S-G1          0.015815619   0.022656051
S-G2          0.02232508    0.02528065
S-G3          0.016153135   0.019589826
S-G4          0.024235552   0.021963733
S-G5          0.0281904     0.023375761
SC-G1         0.01176516    0.013825858
SC-G2         0.00000000    0.00000000
SC-G3         0.019969694   0.016539351
SC-G4         0.005446211   0.00000000
SC-G5         0.009943375   0.018063206
C.3 Shoe
Table C.3: Standard Deviations of ANNs analyzing the Shoe Characteristic using the Fourier Data Reduction Technique

Data Stream   Fourier-08    Fourier-16    Fourier-32    Fourier-64    Fourier-128
A-X           0.05755193    0.052834571   0.051444054   0.047294295   0.045237099
A-Y           0.049016241   0.047061042   0.051795121   0.047294439   0.070012861
A-Z           0.056976423   0.048065766   0.049351348   0.036504037   0.051161277
AC            0.055004526   0.040715679   0.042768484   0.042525808   0.056467537
S-G1          0.023542362   0.020581234   0.02385635    0.005184238   0.00803102
S-G2          0.022754448   0.023668512   0.022308345   0.010874278   0.01311475
S-G3          0.020234734   0.022132504   0.021090486   0.010874278   0.009835933
S-G4          0.026056316   0.024296957   0.020662567   0.014291384   0.008031387
S-G5          0.019225458   0.015709371   0.025014184   0.007331172   0.011118362
SC-G1         0.04439441    0.048065836   0.048814253   0.054755788   0.047294295
SC-G2         0.045349249   0.037027519   0.048168502   0.045237423   0.071692373
SC-G3         0.046955975   0.052396382   0.050080734   0.062849348   0.067943425
SC-G4         0.048339466   0.044690335   0.052080871   0.071359986   0.062849348
SC-G5         0.043266486   0.041038188   0.031912125   0.042525808   0.028443383
Table C.4: Standard Deviations of ANNs analyzing the Shoe Characteristic using the Wavelet Data Reduction Technique

Data Stream   Wavelet-DB4   Wavelet-Haar
A-X           0.042613977   0.047513901
A-Y           0.038638865   0.041912211
A-Z           0.056453365   0.057063087
AC            0.029160119   0.04653296
S-G1          0.022005869   0.024297029
S-G2          0.022811607   0.020680531
S-G3          0.016279438   0.020752383
S-G4          0.020472582   0.027495126
S-G5          0.024849911   0.025624284
SC-G1         0.041477496   0.038253331
SC-G2         0.033866143   0.046070155
SC-G3         0.064798082   0.042960497
SC-G4         0.034683666   0.032928831
SC-G5         0.032928843   0.039063185
C.4 Turn
Table C.5: Standard Deviations of ANNs analyzing the Turn Characteristic using the Fourier Data Reduction Technique

Data Stream   Fourier-08    Fourier-16    Fourier-32    Fourier-64    Fourier-128
A-X           0.043946685   0.061990769   0.042652706   0.04729448    0.011948611
A-Y           0.077214195   0.062835657   0.067756851   0.062469317   0.058941793
A-Z           0.079609963   0.058994174   0.062940505   0.068986115   0.045237099
AC            0.070357835   0.070474869   0.054824572   0.054755877   0.0487806
S-G1          0.023391526   0.028308971   0.026502561   0.02959913    0.012267425
S-G2          0.028302267   0.028713687   0.032877563   0.04734261    0.049344046
S-G3          0.029474812   0.036879622   0.028629242   0.018976684   0.032207959
S-G4          0.028374555   0.026220084   0.02623429    0.033835575   0.034620871
S-G5          0.029838876   0.027838311   0.03030938    0.014102234   0.020343968
SC-G1         0.052521954   0.056424378   0.065957572   0.061703073   0.028443383
SC-G2         0.040594112   0.049016464   0.053608746   0.032357323   0.047294584
SC-G3         0.046070224   0.068072225   0.042883942   0.024873331   0.062469505
SC-G4         0.050408625   0.042108464   0.05128372    0.042525716   0.042525808
SC-G5         0.052834806   0.04357004    0.043494447   0.052538151   0.045237099
Table C.6: Standard Deviations of ANNs analyzing the Turn Characteristic using the Wavelet Data Reduction Technique

Data Stream   Wavelet-DB4   Wavelet-Haar
A-X           0.064313069   0.0854407
A-Y           0.056628361   0.051443967
A-Z           0.048271144   0.040553423
AC            0.03944101    0.058798475
S-G1          0.03469531    0.042052677
S-G2          0.025507948   0.026502541
S-G3          0.025905992   0.03425262
S-G4          0.025236319   0.030186241
S-G5          0.02391876    0.024789965
SC-G1         0.052772169   0.057178599
SC-G2         0.024825693   0.025416087
SC-G3         0.046461976   0.054159114
SC-G4         0.014523364   0.019212795
SC-G5         0.018691057   0.053269533
C.5 Merged
Table C.7: Standard Deviations of ANNs analyzing all of the characteristics merged using the Fourier Data Reduction Technique

Data Stream   Fourier-08    Fourier-16    Fourier-32    Fourier-64    Fourier-128
A-X           0.053854203   0.045675059   0.043984267   0.011948611   0.03902445
A-Y           0.062229546   0.058263439   0.058938353   0.043630606   0.062849363
A-Z           0.065657023   0.051922391   0.045458072   0.047294295   0.03308456
AC            0.053021601   0.049948845   0.048305235   0.047294584   0.066169001
S-G1          0.038198572   0.031406815   0.030820712   0.021374394   0.006134073
S-G2          0.025898725   0.025711376   0.0369804     0.02283374    0.02995987
S-G3          0.029682371   0.01760856    0.02718207    0.047057961   0.026736812
S-G4          0.030504956   0.026523549   0.030260091   0.040687371   0.037740476
S-G5          0.026269729   0.031270197   0.031631313   0.018976961   0.033355568
SC-G1         0.057265076   0.065001163   0.052616007   0.042525854   0.060535146
SC-G2         0.046320019   0.051219427   0.05700542    0.03308456    0.058941678
SC-G3         0.043758852   0.057552132   0.05382339    0.0425259     0.044708089
SC-G4         0.049584626   0.05709202    0.06469627    0.039629453   0.064345937
SC-G5         0.045675072   0.044059111   0.05017929    0.054755877   0.050222667
Table C.8: Standard Deviations of ANNs analyzing all of the characteristics merged using the Wavelet Data Reduction Technique

Data Stream   Wavelet-DB4   Wavelet-Haar
A-X           0.06423613    0.077064675
A-Y           0.04428295    0.061403191
A-Z           0.06280936    0.058348364
AC            0.050961328   0.071242282
S-G1          0.040931337   0.03613024
S-G2          0.032985064   0.03883136
S-G3          0.027182018   0.035026377
S-G4          0.038523426   0.037475497
S-G5          0.034149256   0.040294406
SC-G1         0.065657014   0.057264812
SC-G2         0.04222549    0.053945709
SC-G3         0.070099861   0.075922832
SC-G4         0.041794128   0.048509409
SC-G5         0.05079916    0.043570037
Bibliography
[1] Filip Wasilewski. Daubechies 4 wavelet (db4) properties, filters and functions, February 2014. URL http://wavelets.pybytes.com/wavelet/db4/. From Wavelet Properties Browser.
[2] Filip Wasilewski. Haar wavelet (haar) properties, filters and functions, February 2014. URL http://wavelets.pybytes.com/wavelet/haar/. From Wavelet Properties Browser.
[3] Joshua Altmann. 3 - wavelet basics, February 2014. URL http://www.wavelet.org/tutorial/wbasic.htm. From Wavelets Tutorial.
[4] D.L. Poole, A.K. Mackworth, and R. Goebel. Computational Intelligence: A Logical Approach. Oxford University Press, 1998. ISBN 9780195102703. URL http://books.google.ca/books?id=RCaOtmXvbCUC.
[5] John B Kaneene, Whitney A Ross, and RoseAnn Miller. The michigan equine monitoring system. ii. frequencies and impact of selected health problems. Preventive veterinary medicine, 29(4):277–292, 1997.
[6] Geraint Wyn-Jones et al. Equine lameness. Blackwell Scientific Publications, 1988.
[7] M Hewetson, RM Christley, ID Hunt, and LC Voute. Investigations of the reliability
of observational gait analysis for the assessment of lameness in horses. Veterinary
record: journal of the British Veterinary Association, 158(25), 2006.
[8] KG Keegan, DA Wilson, DJ Wilson, B Smith, EM Gaughan, RS Pleasant, JD Lillich, J Kramer, RD Howard, C Bacon-Miller, et al. Evaluation of mild lameness in horses trotting on a treadmill by clinicians and interns or residents and correlation of their assessments with kinematic gait analysis. American journal of veterinary research, 59(11):1370–1377, 1998.
[9] T Pfau, JJ Robilliard, R Weller, K Jespers, E Eliashar, and AM Wilson. Assessment of mild hindlimb lameness during over ground locomotion using linear discriminant analysis of inertial sensor data. Equine veterinary journal, 39(5):407–413, 2007.
[10] Akikazu Ishihara, Stephen M Reed, Paivi J Rajala-Schultz, James T Robertson, and Alicia L Bertone. Use of kinetic gait analysis for detection, quantification, and differentiation of hind limb lameness and spinal ataxia in horses. Journal of the American Veterinary Medical Association, 234(5):644–651, 2009.
[11] Ellen Bajcar, David Calvert, and Jeff Thomason. Analysis of equine gaitprint and other gait characteristics using self-organizing maps (som). In Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on, volume 1. IEEE, 2004.
[12] P. Simon. Too Big to Ignore: The Business Case for Big Data. Wiley and SAS Business Series. Wiley, 2013. ISBN 9781118642108. URL http://books.google.ca/books?id=Dn-Gdoh66sgC.
[13] Stanley Smith Stevens. On the theory of scales of measurement, 1946.
[14] Terrence J Sejnowski and Charles R Rosenberg. Parallel networks that learn to
pronounce english text. Complex systems, 1(1):145–168, 1987.
[15] Timo Koskela, Mikko Lehtokangas, Jukka Saarinen, and Kimmo Kaski. Time series
prediction with multilayer perceptron, fir and elman neural networks. In Proceedings
of the World Congress on Neural Networks, pages 491–496, 1996.
[16] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000. doi: 10.1126/science.290.5500.2323. URL http://www.sciencemag.org/content/290/5500/2323.abstract.
[17] Yvan Saeys, Inaki Inza, and Pedro Larranaga. A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19):2507–2517, 2007. doi: 10.1093/bioinformatics/btm344. URL http://bioinformatics.oxfordjournals.org/content/23/19/2507.abstract.
[18] Josef Kittler. Feature selection and extraction. Handbook of pattern recognition and
image processing, pages 59–83, 1986.
[19] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. The Journal of Machine Learning Research, 3:1157–1182, 2003.
[20] Ron Kohavi and George H John. Wrappers for feature subset selection. Artificial
intelligence, 97(1):273–324, 1997.
[21] Avrim L Blum and Pat Langley. Selection of relevant features and examples in
machine learning. Artificial intelligence, 97(1):245–271, 1997.
[22] Thomas Navin Lal, Olivier Chapelle, Jason Weston, and André Elisseeff. Embedded methods. In Feature extraction, pages 137–165. Springer, 2006.
[23] Rick Archibald and George Fann. Feature selection and classification of hyperspec-
tral images with support vector machines. Geoscience and Remote Sensing Letters,
IEEE, 4(4):674–677, 2007.
[24] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
[25] Yiming Yang and Jan O Pedersen. A comparative study on feature selection in text
categorization. In ICML, volume 97, pages 412–420, 1997.
[26] Michael J. Walk and Andre A. Rupp. Pearson Product-Moment Correlation Coefficient, pages 1023–1027. SAGE Publications, Inc., 2010. doi: http://dx.doi.org/10.4135/9781412961288. URL http://dx.doi.org/10.4135/9781412961288.
[27] Stephen M. Stigler. Francis Galton's account of the invention of correlation. Statistical Science, 4(2):73–79, 05 1989. doi: 10.1214/ss/1177012580. URL http://dx.doi.org/10.1214/ss/1177012580.
[28] Daphne Koller and Mehran Sahami. Toward optimal feature selection. 1996.
[29] Lei Yu and Huan Liu. Feature selection for high-dimensional data: A fast
correlation-based filter solution. In ICML, volume 3, pages 856–863, 2003.
[30] Isabelle Guyon, Steve Gunn, Masoud Nikravesh, and L Zadeh. Feature extraction: foundations and applications. Springer, 2006.
[31] Imola K Fodor. A survey of dimension reduction techniques, 2002.
[32] Padraig Cunningham. Dimension reduction. In Machine learning techniques for
multimedia, pages 91–112. Springer, 2008.
[33] Roland Priemer. Introductory signal processing, volume 6. World Scientific, 1991.
[34] Wenxin Li, David Zhang, and Zhuoqun Xu. Palmprint identification by Fourier transform. International Journal of Pattern Recognition and Artificial Intelligence, 16(04):417–432, 2002.
[35] Jing Lin and Liangsheng Qu. Feature extraction based on Morlet wavelet and its application for mechanical fault diagnosis. Journal of Sound and Vibration, 234(1):135–148, 2000.
[36] H.C. Taneja. Advanced Engineering Mathematics, volume 2. I.K. International Publishing House Pvt. Limited, 2007. ISBN 9788189866563. URL http://books.google.ca/books?id=X-RFRHxMzvYC.
[37] Eric W. Weisstein. Fourier series, February 2014. URL http://mathworld.wolfram.com/FourierSeries.html. From MathWorld–A Wolfram Web Resource.
[38] Eric W. Weisstein. Generalized Fourier series, February 2014. URL http://mathworld.wolfram.com/GeneralizedFourierSeries.html. From MathWorld–A Wolfram Web Resource.
[39] Mizan Rahman. Applications of Fourier Transforms to Generalized Functions. WIT
Press, 2011.
[40] Eric W. Weisstein. Fourier transform, February 2014. URL http://mathworld.wolfram.com/FourierTransform.html. From MathWorld–A Wolfram Web Resource.
[41] Eric W. Weisstein. Discrete Fourier transform, February 2014. URL http://mathworld.wolfram.com/DiscreteFourierTransform.html. From MathWorld–A Wolfram Web Resource.
[42] Tom Chau. A review of analytical techniques for gait data. part 2: neural network
and wavelet methods. Gait & Posture, 13(2):102–120, 2001.
[43] JG Barton and A Lees. An application of neural networks for distinguishing gait
patterns on the basis of hip-knee joint angle diagrams. Gait & Posture, 5(1):28–33,
1997.
[44] James S Walker. Fourier analysis and wavelet analysis. Notices of the AMS, 44(6):
658–670, 1997.
[45] J Allen. Short-term spectral analysis, and modification by discrete Fourier transform. IEEE Transactions on Acoustics, Speech and Signal Processing, 25:235–238, 1977.
[46] I. Daubechies. The wavelet transform, time-frequency localization and signal anal-
ysis. Information Theory, IEEE Transactions on, 36(5):961–1005, Sep 1990. ISSN
0018-9448. doi: 10.1109/18.57199.
[47] Gerald Kaiser. A friendly guide to wavelets. Springer, 2010.
[48] Stephane G Mallat. A theory for multiresolution signal decomposition: the wavelet
representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on,
11(7):674–693, 1989.
[49] D Lee Fugal. Conceptual wavelets in digital signal processing: an in-depth, practical
approach for the non-mathematician. Space & Signals Technical Pub., 2009.
[50] Dan B. Marghitu and Prasad Nalluri. An analysis of greyhound gait using wavelets. Journal of Electromyography and Kinesiology, 7(3):203–212, 1997. ISSN 1050-6411. doi: http://dx.doi.org/10.1016/S1050-6411(96)00035-1. URL http://www.sciencedirect.com/science/article/pii/S1050641196000351.
[51] B.A. Paya, I.I. Esat, and M.N.M. Badi. Artificial neural network based fault diagnostics of rotating machinery using wavelet transforms as a preprocessor. Mechanical Systems and Signal Processing, 11(5):751–765, 1997. ISSN 0888-3270. doi: http://dx.doi.org/10.1006/mssp.1997.0090. URL http://www.sciencedirect.com/science/article/pii/S088832709790090X.
[52] T Tamura, M Sekine, M Ogawa, T Togawa, and Y Fukui. Classification of acceler-
ation waveforms during walking by wavelet transform. Methods of information in
medicine, 36(4-5):356–359, 1997.
[53] Matteo Frigo and Steven G Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216–231, 2005.
[54] Filip Wasilewski. PyWavelets - discrete wavelet transform in Python, February 2014. URL http://www.pybytes.com/pywavelets/.
[55] Adam Blum. Neural networks in C++: an object-oriented framework for building
connectionist systems. John Wiley & Sons, Inc., 1992.
[56] Gordon S Lino↵ and Michael JA Berry. Data mining techniques: for marketing,
sales, and customer relationship management. John Wiley & Sons, 2011.