
Quaternion-Valued Nonlinear Adaptive Filters

Prepared by Che Ahmad Bukhari bin Che Ujang

Supervised by Prof. Danilo P. Mandic

A thesis submitted in fulfilment of requirements for the degree of Doctor of Philosophy and Diploma of Imperial College London

Communications and Signal Processing Group

Department of Electrical and Electronic Engineering

Imperial College London

2012


Abstract

Advances in vector sensor technology have created a need for adaptive nonlinear signal processing in the quaternion domain. The main concern of this thesis is the analyticity of quaternion-valued nonlinear functions. The Cauchy-Riemann-Fueter (CRF) conditions determine analyticity in the quaternion domain, but they have proved too strict to be of practical use. To circumvent this problem, split-quaternion nonlinear functions, which are analytic componentwise, are commonly employed. However, these functions do not fully capture the correlations between dimensions and are not well suited to real-world applications. To address this, the thesis proposes the use of fully quaternion nonlinear functions in the derivation of a new class of algorithms that takes into account the non-commutativity of the quaternion product. These fully quaternion functions satisfy the local analyticity condition (LAC), which guarantees the first-order differentiability of the function. This provides a unifying framework for the derivation of gradient-based learning algorithms in the quaternion domain, which are shown to have the same generic form as their real- and complex-valued counterparts. Unlike existing approaches, the new class of algorithms is suitable for processing signals with strong component correlations, and it is further extended to the recurrent neural network (RNN) architecture. Novel algorithms are also derived to reduce the computational complexity of quaternion-valued adaptive filters, and these can readily be extended to incorporate nonlinear functions. A rigorous mathematical analysis provides a basis for understanding the convergence and steady-state performance of the proposed algorithms. Simulations over a range of synthetic and real-world signals support the approach taken in the thesis.


Acknowledgement

Firstly, I would like to express my deepest gratitude to my supervisor, Professor Danilo Mandic, for his guidance and patience. His utmost dedication to his work has been a great motivation for me to complete my studies. Throughout the years, Danilo has supported me in my research and has been patient with my shortcomings. He has given me ample opportunity to grow as a researcher and it was a privilege working under him.

I would like to thank Dr. Clive Cheong Took for mentoring me throughout my studies. Working alongside Clive for all these years has taught me so many things about academia and life. Clive is one of the best researchers I have ever had the opportunity to know and it has been an honour working with him.

I am greatly indebted to my parents, Che Ujang Che Daud and Noraini Mat Noor, and my siblings, Che Adam Rashid Che Ujang and Che Roselind Che Ujang, for their continuous and unwavering support throughout. I would also like to thank my girlfriend, Nik Nabilah Nik Amiruddin, for her warm love during the cold months.

I am thankful to all my friends and colleagues in the EEE department, especially Zaid Omar, Mossaber Ahmed, Naveed Ur Rehman, Ammar Hassan, Bruce Leow, Hana Fedora Abdul Aziz, Hussein al-Khattab, Lila Izhar, Pradeep Loganathan, Andy Khong, David Looney, Cheolsoo Park and Xia Yi Li.

I would like to extend my gratitude to the Malaysia Ministry of Higher Education (MOHE) and Universiti Putra Malaysia (UPM) for giving me the opportunity to further my studies at Imperial College London. Last but not least, I would like to thank God, as without His blessings nothing is possible.


Dedicated to Che Ujang Che Daud, Noraini Mat Noor, Che Adam Rashid Che Ujang, Che Roselind Che Ujang and Nik Nabilah Nik Amiruddin


List of Publications

The following publications support the material given in this thesis.

Journal Publications:

1. B. Che Ujang, C. Cheong Took and D. P. Mandic, “Quaternion valued nonlinear adaptive filtering”, IEEE Transactions on Neural Networks, vol. 22, no. 8, pp. 1193-1206, 2011.

2. B. Che Ujang, C. Cheong Took and D. P. Mandic, “Split quaternion nonlinear adaptive filtering”, Neural Networks, vol. 23, no. 3, pp. 426-434, 2010.

3. B. Che Ujang, C. Cheong Took and D. P. Mandic, “Identification of improper quaternion processes by fractional tap-length adaptive filters”, submitted to IEEE Transactions on Neural Networks and Learning Systems (special issue on learning in nonstationary and evolving environments).

Conference Publications:

1. B. Che Ujang, C. Cheong Took and D. P. Mandic, “On quaternion analyticity: enabling quaternion-valued adaptive filtering”, In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 25-30, 2012, Kyoto, Japan.

2. B. Che Ujang, C. Cheong Took and D. P. Mandic, “Identification of improper processes by variable tap-length complex valued adaptive filters”, In Proceedings of International Joint Conference on Neural Networks (IJCNN), pp. 1-6, July 18-23, 2010, Barcelona, Spain.

3. B. Che Ujang, C. Cheong Took and D. P. Mandic, “A split quaternion nonlinear adaptive filter”, In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1745-1748, April 19-24, 2009, Taipei, Taiwan.


Contents

Abstract

Acknowledgement

List of Publications

Contents

List of Figures

List of Tables

Statement of Originality

Abbreviations

Mathematical Notations

Chapter 1. Introduction

1.1 Overview

1.2 Motivations and Aims

1.3 Organisation of the Thesis

Chapter 2. Background Theory

2.1 Adaptive Systems Configuration

2.2 Quaternion Algebra

2.3 Augmented Quaternion Statistics

2.3.1 C^η-circular Quaternion Random Variables

2.3.2 H-circular Quaternion Random Variables

2.3.3 Augmented Second-Order Statistics of Quaternion Random Vectors

2.4 Analyticity in H

2.5 Review of Nonlinear Functions

2.6 Quaternion-valued Adaptive Filtering


2.6.1 Derivation of Quaternion Least Mean Square (QLMS)

2.6.2 Analysis of Quaternion Least Mean Square (QLMS)

2.7 Introduction to Quaternion Kalman Filtering

Chapter 3. A Class of Split Quaternion Nonlinear Adaptive Filters

3.1 Introduction

3.2 Derivation of Split Quaternion Algorithms

3.2.1 Derivation of Quaternion-valued Finite Impulse Response Algorithm

3.2.2 Derivation of the Split Quaternion Adaptive Filtering Algorithm (SQAFA)

3.2.3 Derivation of Adaptive Amplitude Split Quaternion Adaptive Filtering Algorithm (AASQAFA)

3.2.4 Convergence Analysis of SQAFA and AASQAFA

3.3 Simulations

3.3.1 Four-dimensional Saito’s Chaotic Circuit

3.3.2 Wind Forecasting

3.4 Discussion

3.5 Summary

Chapter 4. A Class of Quaternion Valued Nonlinear Adaptive Filters

4.1 Introduction

4.2 Fully Quaternion Functions in H

4.2.1 Quaternion Exponential Function

4.2.2 Local Analyticity of the Quaternion tanh Function

4.3 Derivation of Fully Quaternion Algorithms

4.3.1 Derivation of Quaternion Nonlinear Gradient Descent (QNGD)

4.3.2 Augmented Quaternion Nonlinear Gradient Descent (AQNGD)

4.3.3 Convergence Analysis of QNGD and AQNGD

4.4 Simulations

4.4.1 Linear AR(4)

4.4.2 Four-dimensional Saito’s Chaotic Circuit

4.4.3 Wind Forecasting

4.5 Discussion

4.6 Summary

Chapter 5. Enabling Quaternion Valued Recurrent Neural Networks

5.1 Introduction

5.2 Analysis of Quaternion-Valued Functions

5.3 FCRNN Algorithms in H


5.3.1 Derivation of the Split Quaternion-valued RTRL

5.3.2 Derivation of the Quaternion-Valued RTRL

5.4 Simulations

5.4.1 Three-dimensional Lorenz Chaotic Signal

5.4.2 Motion Estimation

5.5 Summary

Chapter 6. Identification of Improper Quaternion Processes by Fractional Tap-Length Algorithms

6.1 Introduction

6.2 Model Order Identification

6.2.1 Filter Weight Updates

6.2.2 Tap Length Adaptation

6.3 Steady-State Analysis of FT Based Algorithms

6.4 Simulations

6.4.1 Optimal Tap-Length

6.4.2 Modelling of Quaternion-Valued Systems

6.4.3 Nonstationary Systems

6.5 Summary

Chapter 7. Conclusions and Future Works

7.1 Conclusions

7.2 Future Works

Bibliography

Appendix A. Derivation of QLMS

Appendix B. Derivation of QMLP-FIR

Appendix C. Convergence of SQAFA

Appendix D. Convergence of AASQAFA

Appendix E. Analyticity of the exponential function e^q

Appendix F. Local Analyticity of tanh(q)

Appendix G. A Local Derivative of tanh(q)

Appendix H. Derivation of Split QRTRL

Appendix I. Derivation of QRTRL


List of Figures

2.1 Adaptive Systems

2.2 Linear adaptive finite impulse response (FIR) filter

3.1 Nonlinear adaptive finite impulse response (FIR) filter

3.2 Left: The 4D Saito Signal. Right: The 3D wind signal.

3.3 The performance of SQAFA, AASQAFA and QMLP on the prediction of 4D Saito’s Chaotic Signal.

3.4 The performance of SQAFA, AASQAFA and QMLP on the prediction of 3D wind signal.

3.5 The performance of SQAFA, QFIR, CNGD and NGD on the prediction of 3D wind signal.

3.6 Prediction gain of AASQAFA for the varying initial amplitude λ(0) and step size ρ.

4.1 Learning curves for QMLP-FIR, AASQAFA, QNGD and AQNGD on the prediction of linear AR(4) signal (4.39) driven by H-circular white Gaussian noise.

4.2 [...] prediction of linear AR(4) signal (4.39) driven by C^i-circular white Gaussian noise.

4.3 [...] prediction of linear AR(4) signal (4.39) driven by noncircular white Gaussian noise.

4.4 Noncircular signals used in simulations. Left: The 4D Saito Signal. Right: The 3D wind signal.


4.5 The performance of AQNGD, QNGD, AASQAFA and QMLP-FIR on the prediction of the noncircular 4D Saito signal.

4.6 [...] prediction of the noncircular 4D Saito signal over a range of filter lengths.

4.7 [...] prediction of a 3D wind signal.

4.8 The performance of QNGD, QMLP and NGD on the prediction of a 3D wind signal.

4.9 Prediction gains of QNGD for tan(q), sin(q), arctan(q), arcsin(q), sinh(q), arctanh(q) and arcsinh(q) for the prediction of 3D wind signal.

5.1 A fully connected recurrent neural network (FCRNN).

5.2 Phase space of Lorenz signal

5.3 The performance of QRTRL, split QRTRL and RTRL on the prediction of motion data

6.1 Hybrid filter structure.

6.2 The steady-state MSE for the processes W1 and W2 with respect to tap-length.

6.3 The evolution of the optimal filter length parameter p and mixing parameter λ for the modelling of the linear system W1.

6.4 The evolution of the optimal filter length parameter p and mixing parameter λ for the modelling of the widely linear system W2.

6.5 The steady-state MSE for the process linear noncircular W1 with respect to tap-length.

6.6 The evolution of the optimal filter length parameter p for the modelling of the system W1 for the intervals 1 ≤ n ≤ 3000, W2 for 3001 ≤ n ≤ 6000 and noncircular W1 for 6001 ≤ n ≤ 9000

6.7 The evolution of the mixing parameter λ for the modelling of the system W1 for the intervals 1 ≤ n ≤ 3000, W2 for 3001 ≤ n ≤ 6000 and noncircular W1 for 6001 ≤ n ≤ 9000


List of Tables

3.1 Computational complexities of the algorithms

4.1 Classes of Quaternion White Gaussian Noise

4.2 Prediction Gain $R_p$ for a Linear AR(4) Process With Varying Degree of Noncircularity

4.3 Computational complexities of the algorithms considered

5.1 Correlation Coefficients Between Lorenz Attractors

6.1 Noncircular Quaternion White Gaussian Noise


Statement of Originality

This research is believed to be an original contribution by the author to the field of quaternion domain signal processing. Any ideas or quotations from the work of other people are fully acknowledged according to the standard referencing style practised in the field.

As far as the author is aware, the following aspects of the thesis are original contributions:

• Chapter 3: A Class of Split Quaternion Nonlinear Adaptive Filters

• Chapter 4: A Class of Quaternion Valued Nonlinear Adaptive Filters

• Chapter 5: Enabling Quaternion Valued Recurrent Neural Networks

• Chapter 6: Identification of Improper Quaternion Processes by Fractional Tap-Length Algorithms


Abbreviations

LMS: Least Mean Square

CLMS: Complex Least Mean Square

NN: Neural Network

RNN: Recurrent Neural Network

BP: Backpropagation

RBP: Recurrent Backpropagation

CBP: Complex Backpropagation

CRTRL: Complex Real Time Recurrent Learning

RTRL: Real Time Recurrent Learning

3DV-BP: Three-Dimensional Vector Backpropagation

VP-BP: Vector Product Backpropagation

QMLP: Quaternion valued Multilayer Perceptron

QLMS: Quaternion Least Mean Square

FIR: Finite Impulse Response

NGD: Nonlinear Gradient Descent

SQAFA: Split Quaternion Adaptive Filtering Algorithm

AASQAFA: Adaptive Amplitude Split Quaternion Adaptive Filtering Algorithm

MSE: Mean Squared Error

FT: Fractional Tap-Length

QWGN: Quaternion White Gaussian Noise


QNGD: Quaternion Nonlinear Gradient Descent

AQNGD: Augmented Quaternion Nonlinear Gradient Descent

WL-QLMS: Widely Linear Quaternion Least Mean Square

AR: Autoregressive

QRTRL: Quaternion Real Time Recurrent Learning

QMLP-FIR: QMLP rederived for FIR

IIR: Infinite Impulse Response

CR: Cauchy-Riemann

GCR: Generalized Cauchy Riemann

CRF: Cauchy-Riemann-Fueter

ETF: Elementary Transcendental Function

LAC: Local Analyticity Condition

WGN: White Gaussian Noise

CWGN: Complex White Gaussian Noise

BPTT: Backpropagation Through Time

Pdf: Probability distribution function

CC-QLMS: Collaborative Combination Quaternion Least Mean Square

KF: Kalman Filter

EKF: Extended Kalman Filter

QKF: Quaternion Kalman Filter

QEKF: Quaternion Extended Kalman Filter


Mathematical Notations

$x$   lower case denotes scalar

$\mathbf{x}$   boldface lower case denotes vector

$\mathbf{x}^a$   augmented vector

$\mathbb{R}$   real field

$\mathbb{C}$   complex field

$\mathbb{H}$   quaternion field

$\mathbb{R}^n$   $n$-dimensional real vector space

$[\cdot]^T$   transpose operation

$[\cdot]^H$   Hermitian operation

$[\cdot]^*$   conjugate operation

$\|\cdot\|_2$   Euclidean norm

$[\cdot]^{\imath,\jmath,\kappa}$   $\imath$, $\jmath$ and $\kappa$ involutions

$\Phi$   nonlinear activation function

$O(\cdot)$   order of computational complexity

$R_p$   prediction gain

$\varepsilon$   white Gaussian noise

$\Phi_s$   split quaternion function

$\Phi$   locally analytic quaternion function

$\mu$   learning rate

$\mathbf{R}$   correlation matrix

$\mathbf{P}$   cross correlation matrix

$\Psi$   sensitivity

$\Upsilon$   conjugate sensitivity

$e$   exponential function

$\nabla$   gradient

$q_{a,b,c,d}$   real, $\imath$, $\jmath$ and $\kappa$ parts of the quaternion vector

$[\cdot]'$   first order derivative

$\Re$   real part of the variable

$\Im$   imaginary parts of the variable

$Q$   quaternion field

$\delta$   Dirac delta function

$\lfloor\cdot\rfloor$   floor operator

$C_{qq}$   covariance matrix

$P_{qq}$   pseudocovariance matrix

$C_{q\imath}$   $\imath$-covariance matrix

$C_{q\jmath}$   $\jmath$-covariance matrix

$C_{q\kappa}$   $\kappa$-covariance matrix

$\Im_{\imath,\jmath,\kappa}$   $\imath$, $\jmath$ and $\kappa$ parts of the variable

$\triangleq$   equality in terms of probability distribution

$\forall$   for all

$\in$   is an element of

$\{\cdot\}$   the set of

$\approx$   approximation

$\sigma$   variance

$\to$   approaches


Chapter 1

Introduction

Section 1.1 presents an overview of the research topic. Section 1.2 elaborates on the motivations and aims of the research, and Section 1.3 outlines the organisation of the thesis.

1.1 Overview

Neural networks (NNs) are central to nonlinear adaptive filtering due to their universal approximation capabilities [1]. This property derives from the choice of nonlinear activation function. The original condition placed on such a function is that it needs to be continuous discriminatory [1]. Funahashi proved that sigmoidal functions also fall within the class of continuous discriminatory functions [2]. One of the earliest gradient descent algorithms to apply sigmoidal functions is the Backpropagation (BP) algorithm, which trains feedforward NNs layer by layer [3]. The BP algorithm performs reasonably well but requires a large amount of training data and takes a long time to converge. This has led to the development of training algorithms for recurrent neural networks (RNNs), which possess the attractive ability to deal with time-varying input through their natural temporal operation. Among the first RNN training algorithms is Backpropagation Through Time (BPTT), which unfolds the RNN into a multilayer feedforward network with one layer for each time step. BPTT offers generality but requires a large amount of memory for long training sequences [4]. Recurrent Backpropagation (RBP) avoids this memory requirement, at the expense of being complicated and unstable [5]. In 1989, Williams and Zipser proposed an online gradient descent learning algorithm for RNNs called the Real Time Recurrent Learning (RTRL) algorithm [6], which became hugely popular due to its simple weight update and fast direct-gradient calculation.

As multidimensional data representations became more prominent, the Complex Least Mean Square (CLMS) algorithm became the first extension of adaptive filtering algorithms to the complex domain $\mathbb{C}$ [7]. The ability of CLMS to process two-dimensional signals in $\mathbb{C}$ led to improved results over conventional processing in the real domain $\mathbb{R}$. Unlike linear adaptive filters, nonlinear adaptive filtering algorithms faced a major obstacle in finding suitable analytic complex-valued nonlinear activation functions. This is a direct consequence of Liouville's theorem, which states that a bounded entire function in $\mathbb{C}$ must be constant, and which therefore rules out many of the bounded nonlinear activation functions that are suitable in $\mathbb{R}$. To circumvent this, different classes of split-complex functions, which are analytic componentwise, were proposed [8–10]. These split-complex functions were utilised in the Split Complex Backpropagation (CBP) algorithm [8] and later extended to the Split Complex Real Time Recurrent Learning (Split CRTRL) algorithm [11]. These split-complex algorithms have been shown to yield reasonable performance provided that there is no strong coupling between the real and imaginary parts of the complex signal. Kim and Adali later showed that a class of complex elementary transcendental functions (ETFs), based on the entire complex exponential function, is suitable for complex-valued nonlinear adaptive filtering applications [12]. These ETFs satisfy the Cauchy-Riemann (CR) conditions, which establishes their analyticity in $\mathbb{C}$, and were later implemented in the design of the Fully Complex Real Time Recurrent Learning (Fully CRTRL) algorithm [13]. The Fully CRTRL exploits the correlation between the real and imaginary parts, resulting in improved performance.
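The trade-off between boundedness and analyticity described above can be illustrated numerically. The following is a minimal Python sketch and is not taken from the cited works: the helper name `split_tanh` and the test point `z` are illustrative assumptions. It compares the fully complex tanh, which is unbounded near its poles on the imaginary axis, with a split-complex tanh that applies the real tanh to the real and imaginary parts separately and is therefore bounded.

```python
import cmath
import math


def split_tanh(z):
    """Split-complex tanh (illustrative helper): the real tanh is applied
    separately to the real and imaginary parts of z."""
    return complex(math.tanh(z.real), math.tanh(z.imag))


# A point close to the pole of the complex tanh at z = i*pi/2 (assumed test point).
z = complex(0.0, math.pi / 2 - 1e-6)

fully = cmath.tanh(z)   # fully complex tanh: magnitude grows without bound near the pole
split = split_tanh(z)   # split-complex tanh: each part lies in (-1, 1)

print(abs(fully), abs(split))
```

The split function stays bounded at every point, but its real part depends only on the real part of `z` and its imaginary part only on the imaginary part, so it does not in general satisfy the Cauchy-Riemann conditions.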

Further advances in sensor technology have highlighted the demand for higher-dimensional adaptive signal processing algorithms that process multidimensional data efficiently. One approach is to represent multidimensional signals as vectors in $\mathbb{R}^n$ and to use general split functions. One of the first multidimensional learning algorithms to utilise this approach is the Three-Dimensional Vector Backpropagation (3DV-BP) algorithm for NNs [14]. The 3DV-BP utilises a matrix operation which does not take advantage of the couplings between the dimensions. Improvements include the Vector Product Backpropagation (VP-BP) algorithm, which addresses this issue through the use of vector products [15]. A major drawback of the VP-BP is that the algorithm may be unable to update the weights in the presence of a non-zero error, because the underlying vector space does not form a division algebra. Furthermore, the universal function approximation capabilities of both algorithms have not been established, as no density theorem has been proven for real vector spaces [16, pp. 67-71]. This led to the development of learning algorithms in other multidimensional spaces.

A natural platform for processing three- and four-dimensional signals is the quaternion domain $\mathbb{H}$. Quaternions were first conceived by W. Hamilton in 1843, when he was faced with the problem of extending complex numbers to higher dimensions [17]. Frobenius later showed that quaternions form the highest-dimensional associative division algebra over the real numbers, which makes them more attractive to work with than other hypercomplex spaces [18, 19]. Recently, quaternions have experienced a resurgence and have proved popular across many areas of engineering, such as molecular modelling [20], computer graphics [21] and robotics [22]. In statistical signal processing, quaternions have been employed in adaptive filtering, including Kalman filtering [23] and stochastic gradient algorithms such as the Quaternion Least Mean Square (QLMS) [24].

However, quaternions are still relatively underexplored in nonlinear adaptive filtering, mainly due to the lack of analytic nonlinear functions in $\mathbb{H}$. The very stringent Cauchy-Riemann-Fueter (CRF) conditions [25] ensure that the only globally analytic quaternion-valued functions are linear functions and constants. By analogy with the split-complex approach in $\mathbb{C}$, a split-quaternion function, in which each component is treated as a separate real channel and passed through a smooth real nonlinearity, was proposed and employed in the training of the Quaternion Multilayer Perceptron (QMLP) [16]. The QMLP training algorithm exhibited enhanced performance over vector-based algorithms owing to the power of processing in $\mathbb{H}$. Despite this gain, the training of the QMLP suffers when there are strong correlations between the dimensions of the signal. Furthermore, the QMLP training algorithm does not take into account the non-commutativity of the quaternion product in its derivation.
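The two properties discussed above, the non-commutativity of the quaternion product and the componentwise nature of a split-quaternion nonlinearity, can be made concrete with a short Python sketch. It is an illustration only: the tuple representation `(a, b, c, d)` for a + bı + cȷ + dκ and the function names `qmul` and `split_tanh` are assumptions made here, not notation from the thesis.

```python
import math


def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d),
    meaning a + b*i + c*j + d*k (illustrative representation)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def split_tanh(q):
    """Split-quaternion nonlinearity: a real tanh applied to each of the
    four components independently, ignoring any coupling between them."""
    return tuple(math.tanh(c) for c in q)


i = (0.0, 1.0, 0.0, 0.0)
j = (0.0, 0.0, 1.0, 0.0)

ij = qmul(i, j)   # equals k
ji = qmul(j, i)   # equals -k, so the product does not commute

activated = split_tanh((0.5, -2.0, 0.0, 3.0))
print(ij, ji, activated)
```

Because `qmul(i, j)` and `qmul(j, i)` differ in sign, the order of factors has to be tracked in any gradient derivation involving quaternion products, whereas `split_tanh` never mixes the four components.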

1.2 Motivations and Aims

The main aim of this research is to extend complex-valued nonlinear adaptive filtering algorithms to the quaternion domain $\mathbb{H}$. Due to the non-commutativity of the quaternion product [26], the derivation of these algorithms needs to take this property into account explicitly. The main concern lies in the analytic properties of the nonlinear activation functions. The lack of suitable analytic quaternion nonlinear functions [25] does not permit a direct generalisation of nonlinear signal processing to $\mathbb{H}$. Previous approaches, such as the training algorithm for the Quaternion Multilayer Perceptron (QMLP) [16], were based on split-quaternion functions, whereby the processing is performed componentwise, making them unsuitable for real-world signals. The recently introduced local analyticity condition (LAC) [27] provides an alternative to the strict Cauchy-Riemann-Fueter (CRF) conditions. A nonlinear quaternion-valued function that satisfies the LAC is locally analytic, that is, it is only guaranteed to be first-order differentiable at the current operating point. Based on these results, several novel finite impulse response (FIR) nonlinear quaternion adaptive filtering algorithms are derived by employing these locally analytic functions. The derived algorithms are then extended to the recurrent neural network (RNN) architecture.

Another objective is to extend existing real- and complex-valued algorithms to $\mathbb{H}$, so that tools available in the real domain $\mathbb{R}$ and the complex domain $\mathbb{C}$ also become accessible in $\mathbb{H}$. The algorithm under consideration is the fractional tap-length (FT) algorithm [28], which was devised for real-valued filters and has recently been extended to complex-valued filters [29]. This extension opens up the possibility of new applications in $\mathbb{H}$.


1.3 Organisation of the Thesis

The thesis is organised as follows. Chapter 2 introduces the fundamental concepts behind the algorithms derived subsequently. Chapter 3 presents the derivation of quaternion-valued nonlinear adaptive filtering algorithms that employ split quaternion functions and take into account the non-commutativity of the quaternion product. Chapter 4 introduces a class of locally analytic quaternion nonlinear functions, which are then used in the derivation of a class of fully quaternion nonlinear adaptive filtering algorithms. Chapter 5 extends the algorithms derived in Chapter 4 to the recurrent neural network (RNN) architecture. Chapter 6 provides an analysis of the fractional tap-length (FT) algorithm extended to the quaternion domain $\mathbb{H}$. Conclusions and future work are given in Chapter 7.


Chapter 2

Background Theory

This chapter begins with an introduction to adaptive systems in Section 2.1. Section 2.2 introduces the basic quaternion operators. Section 2.3 presents the fundamentals of second-order statistics in the quaternion domain $\mathbb{H}$. This is followed by a discussion of the analyticity conditions in $\mathbb{H}$ in Section 2.4. The fundamental characteristics of activation functions for gradient descent algorithms are detailed in Section 2.5. Section 2.6 presents the derivation and performance analysis of one of the earliest quaternion-valued adaptive filtering algorithms, the Quaternion Least Mean Square (QLMS). An introduction to quaternion Kalman filtering is given in Section 2.7.

2.1 Adaptive Systems Configuration

The basic structure of an adaptive filter is shown in Figure 2.1, where $x(n)$ is the input signal, $y(n)$ is the filter output and $d(n)$ is the desired output. The instantaneous error $e(n)$ is defined as the difference between the desired signal and the filter output, that is, $e(n) = d(n) - y(n)$. The filter output is given by $y(n) = \mathbf{w}^T(n)\mathbf{x}(n)$, where $\mathbf{x}(n) = [x(n-1), \ldots, x(n-p)]^T$ is the input vector, $p$ is the filter length, $(\cdot)^T$ is the transpose operator, and $\mathbf{w}(n) = [w_1(n), \ldots, w_p(n)]^T$ is the vector of filter weights. The adaptive filter adjusts its weights $\mathbf{w}(n)$ using an algorithm which optimises a cost function $J(n)$, usually defined in terms of the squared instantaneous error $e^2(n)$.


Figure 2.1: Adaptive Systems

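Wiener theory states that the optimal coefficients of an adaptive filter are found by minimising the expectation of the squared-error cost function $J(n) = e^2(n)$ [30]. Assuming that $e(n)$, $d(n)$ and $\mathbf{x}(n)$ are wide-sense stationary with zero mean, it can be shown that

$$
\begin{aligned}
E\{J(n)\} &= E\{(d(n) - y(n))^2\} \\
&= E\{d^2(n) + \mathbf{w}^T(n)\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{w}(n) - 2d(n)\mathbf{x}^T(n)\mathbf{w}(n)\} \\
&= E\{d^2(n)\} + \mathbf{w}^T(n)E\{\mathbf{x}(n)\mathbf{x}^T(n)\}\mathbf{w}(n) - 2E\{d(n)\mathbf{x}^T(n)\}\mathbf{w}(n) \\
&= E\{d^2(n)\} + \mathbf{w}^T(n)\mathbf{R}\mathbf{w}(n) - 2\mathbf{P}^T\mathbf{w}(n)
\end{aligned}
\tag{2.1}
$$

where $\mathbf{R} = E\{\mathbf{x}(n)\mathbf{x}^T(n)\}$ is the input correlation matrix and $\mathbf{P} = E\{d(n)\mathbf{x}(n)\}$ is the cross correlation vector between the desired signal and the input signal.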

In order to find the optimal weight vector $\mathbf{w}_{opt}$, differentiate (2.1) with respect to $\mathbf{w}$ to obtain the gradient

$$
\nabla_{\mathbf{w}} J = 2\mathbf{R}\mathbf{w} - 2\mathbf{P} \tag{2.2}
$$

Setting this gradient to zero yields the Wiener-Hopf equation $\mathbf{R}\mathbf{w}_{opt} = \mathbf{P}$, so that the optimal weight vector is

$$
\mathbf{w}_{opt} = \mathbf{R}^{-1}\mathbf{P} \tag{2.3}
$$

In practice, an exact measurement of the gradient vector in (2.2) is not available, since it would require prior knowledge of R and P. Therefore, the gradient vector has to be estimated from the available data, and the weights are made adaptive through a gradient descent update specified by

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\nabla_{\mathbf{w}}J(n) \tag{2.4}$$

where µ is the real-valued learning rate.

Consider the instantaneous estimates of the input correlation matrix R(n) and the cross-correlation vector P(n), given by

$$\mathbf{R}(n) = \mathbf{x}(n)\mathbf{x}^T(n); \qquad \mathbf{P}(n) = d(n)\mathbf{x}(n) \tag{2.5}$$

Substituting the instantaneous estimates (2.5) into the gradient in (2.2) gives the instantaneous gradient

$$\nabla_{\mathbf{w}}J(n) = 2\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{w}(n) - 2d(n)\mathbf{x}(n) \tag{2.6}$$

Substituting the instantaneous gradient (2.6) into the gradient descent update defined in (2.4) yields the final weight update

$$
\begin{aligned}
\mathbf{w}(n+1) &= \mathbf{w}(n) - \mu\big(2\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{w}(n) - 2d(n)\mathbf{x}(n)\big) \\
&= \mathbf{w}(n) + 2\mu\,\mathbf{x}(n)\big(d(n) - \mathbf{x}^T(n)\mathbf{w}(n)\big) \\
&= \mathbf{w}(n) + \mu e(n)\mathbf{x}(n)
\end{aligned}
\tag{2.7}
$$

where the factor 2 is absorbed into µ.

The weight update derived in (2.7) is that of the Least Mean Square (LMS) algorithm. Since (2.7) is a stochastic approximation of steepest descent on the cost in (2.1), the LMS recursion converges in the mean towards the optimal Wiener solution in (2.3), provided that the learning rate µ is sufficiently small.
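The LMS recursion in (2.7) can be illustrated with a minimal sketch for noise-free system identification. The function name `lms_identify`, the test system, the uniform input and the values of `mu` and `n_samples` are assumptions made for illustration only and are not taken from the thesis; the tap-delay line here also includes the current input sample.

```python
import random

def lms_identify(w_true, mu=0.05, n_samples=2000, seed=0):
    """Identify an FIR system with the LMS update w(n+1) = w(n) + mu*e(n)*x(n)."""
    rng = random.Random(seed)
    p = len(w_true)
    w = [0.0] * p                       # adaptive weights, initialised to zero
    x = [0.0] * p                       # tap-delay line (input vector)
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0)] + x[:-1]            # shift in a new sample
        d = sum(wt * xi for wt, xi in zip(w_true, x))    # noise-free desired signal
        y = sum(wi * xi for wi, xi in zip(w, x))         # filter output y(n)
        e = d - y                                        # instantaneous error e(n)
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]   # LMS weight update
    return w

w_hat = lms_identify([0.5, -0.3, 0.1])
print([round(v, 3) for v in w_hat])
```

In this noise-free setting the estimated weights should approach the coefficients of the assumed system, which here plays the role of the Wiener solution.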

2.2 Quaternion Algebra

Over the years, quaternions have been applied in various scientific fields, ranging from computer graphics [21] to wind modelling [31]. Quaternions were first conceived by W. Hamilton in 1843 [17] and were proven to form the highest-dimensional associative division algebra [19], making them attractive to work with. The dilemma of modelling in the quaternion domain versus modelling in R4 has long been present [32–34], and quaternion-based nonlinear adaptive filtering is still in its infancy.

A basic quaternion variable q ∈ H is defined as having a scalar part and a vector part, and can be represented as

$$q = [q_a, \mathbf{q}] = q_a + q_b\imath + q_c\jmath + q_d\kappa \tag{2.8}$$

where $q_a, q_b, q_c, q_d \in \mathbb{R}$ and ı, ȷ, κ are both imaginary units and orthogonal unit vectors. The relationships between these imaginary units and orthogonal unit vectors are

$$\imath\jmath = \kappa; \quad \jmath\kappa = \imath; \quad \kappa\imath = \jmath; \quad \imath\jmath\kappa = \imath^2 = \jmath^2 = \kappa^2 = -1 \tag{2.9}$$

The addition and subtraction operations in quaternion algebra are defined componentwise, similarly to real and complex algebra, and are given by

$$w \pm x = [w_a \pm x_a, \mathbf{w} \pm \mathbf{x}] = (w_a \pm x_a) + (w_b \pm x_b)\imath + (w_c \pm x_c)\jmath + (w_d \pm x_d)\kappa \tag{2.10}$$

The quaternion product is non-commutative and is given by

$$wx = [w_a, \mathbf{w}][x_a, \mathbf{x}] = [w_a x_a - \mathbf{w}\cdot\mathbf{x},\; w_a\mathbf{x} + x_a\mathbf{w} + \mathbf{w}\times\mathbf{x}] \tag{2.11}$$

where the symbols “·” and “×” denote respectively the dot product and the cross product. The non-commutativity of the quaternion product arises from the presence of the cross product. Quaternions form a division algebra, as the product of two non-zero quaternions can never be zero.
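The product in (2.11) can be written out componentwise. The sketch below stores a quaternion as a tuple `(a, b, c, d)`; this representation and the helper name `qmul` are assumptions made for illustration.

```python
def qmul(w, x):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples, as in (2.11)."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (
        wa * xa - wb * xb - wc * xc - wd * xd,   # scalar part: wa*xa - w.x
        wa * xb + wb * xa + wc * xd - wd * xc,   # i component
        wa * xc - wb * xd + wc * xa + wd * xb,   # j component
        wa * xd + wb * xc - wc * xb + wd * xa,   # k component
    )

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, j))   # (0, 0, 0, 1), i.e. ij = k
print(qmul(j, i))   # (0, 0, 0, -1), i.e. ji = -k
```

The two printed results differ in sign, which is the non-commutativity introduced by the cross-product term.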

Due to the inherent non-commutativity of the product, there are two definitions of the quaternion division operator, which are

$$\text{Right division}: \frac{w}{x} = wx^{-1}; \qquad \text{Left division}: \frac{w}{x} = x^{-1}w \tag{2.12}$$

In general $x^{-1}w \neq wx^{-1}$, so the left division and the right division are not equivalent. For clarity, the default definition of quaternion division used throughout this thesis is the right division.
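The difference between the two divisions can be checked numerically. The sketch below assumes the standard identity $x^{-1} = x^*/\|x\|^2$ for the inverse of a non-zero quaternion; the helper names `qmul` and `qinv` and the tuple representation are illustrative choices.

```python
def qmul(w, x):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (
        wa * xa - wb * xb - wc * xc - wd * xd,
        wa * xb + wb * xa + wc * xd - wd * xc,
        wa * xc - wb * xd + wc * xa + wd * xb,
        wa * xd + wb * xc - wc * xb + wd * xa,
    )

def qinv(x):
    """Inverse x^{-1} = x* / |x|^2 of a non-zero quaternion."""
    a, b, c, d = x
    n2 = a * a + b * b + c * c + d * d
    return (a / n2, -b / n2, -c / n2, -d / n2)

w = (1.0, 2.0, 0.0, 0.0)    # w = 1 + 2i
x = (0.0, 0.0, 1.0, 0.0)    # x = j, so that x^{-1} = -j
right = qmul(w, qinv(x))    # right division w x^{-1}
left = qmul(qinv(x), w)     # left division x^{-1} w
print(right, left)
```

For this choice of w and x the two results differ only in the sign of the κ component.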

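Similar to the complex case, the conjugate of a quaternion q is

$$q^* = [q_a, \mathbf{q}]^* = [q_a, -\mathbf{q}] = q_a - q_b\imath - q_c\jmath - q_d\kappa \tag{2.13}$$

and its squared norm is

$$\|q\|_2^2 = qq^* = q^*q = q_a^2 + q_b^2 + q_c^2 + q_d^2 \tag{2.14}$$

Other operators important to this work are the three quaternion involutions (self-inverse mappings) given by

$$
\begin{aligned}
q^{\imath} &= -\imath q\imath = q_a + q_b\imath - q_c\jmath - q_d\kappa \\
q^{\jmath} &= -\jmath q\jmath = q_a - q_b\imath + q_c\jmath - q_d\kappa \\
q^{\kappa} &= -\kappa q\kappa = q_a - q_b\imath - q_c\jmath + q_d\kappa
\end{aligned}
\tag{2.15}
$$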

From this point onwards, all quantities are treated as quaternion valued, unless stated

otherwise.
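As a small numerical check of the conjugate, the norm and the involutions, the following sketch evaluates $-\eta q \eta$ directly with the Hamilton product. The tuple representation and the helper names `qmul`, `qconj` and `involution` are assumptions made for illustration.

```python
def qmul(w, x):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (
        wa * xa - wb * xb - wc * xc - wd * xd,
        wa * xb + wb * xa + wc * xd - wd * xc,
        wa * xc - wb * xd + wc * xa + wd * xb,
        wa * xd + wb * xc - wc * xb + wd * xa,
    )

def qconj(q):
    """Quaternion conjugate: negate the vector part."""
    return (q[0], -q[1], -q[2], -q[3])

def involution(q, eta):
    """Involution q^eta = -eta q eta for an imaginary unit eta."""
    neg_eta = tuple(-c for c in eta)
    return qmul(qmul(neg_eta, q), eta)

I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
q = (1.0, 2.0, 3.0, 4.0)
print(involution(q, I))      # keeps the scalar and i parts, negates j and k
print(qmul(q, qconj(q)))     # squared norm in the scalar part, zero vector part
```

Applying an involution twice returns the original quaternion, which is the self-inverse property mentioned above.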


2.3 Augmented Quaternion Statistics

The concept of augmented statistics in division algebra was first introduced to define

the notion of second-order noncircularity, or improperness, for complex random normal

vectors [35], and was subsequently extended to non-normal vectors [36]. In the complex domain C, the second-order properness of a complex random vector can be fully characterised by its covariance $\mathbf{C}_{zz}$ and pseudocovariance $\mathbf{P}_{zz}$, defined as [35]

$$\mathbf{C}_{zz} = E\{\mathbf{z}\mathbf{z}^H\}; \qquad \mathbf{P}_{zz} = E\{\mathbf{z}\mathbf{z}^T\} \tag{2.16}$$

where $(\cdot)^H$ and $(\cdot)^T$ denote respectively the Hermitian and transpose operators, and z = x + yı where x and y are real-valued. A complex random vector is termed “circular” if its probability distribution is rotation-invariant. In the second-order sense, this implies that the real-valued vectors x and y satisfy

$$\mathbf{C}_{xx} = \mathbf{C}_{yy}; \qquad \mathbf{C}_{xy} = -\mathbf{C}_{xy}^T \tag{2.17}$$

resulting in a vanishing pseudocovariance $\mathbf{P}_{zz}$ [37]. In the scalar case, this reduces to the requirement that the real and imaginary components have equal variance and are uncorrelated [38,39].
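The scalar case can be illustrated with sample estimates of the covariance and pseudocovariance in (2.16). The Gaussian test signals, the sample size, the seed and the function name `second_order_stats` below are assumptions chosen for illustration.

```python
import random

def second_order_stats(z):
    """Sample covariance E{z z*} and pseudocovariance E{z z} of a zero-mean scalar sequence."""
    n = len(z)
    cov = sum(v * v.conjugate() for v in z) / n
    pcov = sum(v * v for v in z) / n
    return cov, pcov

rng = random.Random(1)
N = 20000
# Equal-variance, uncorrelated real and imaginary parts: second-order circular
circular = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
# Unequal variances in the two components: noncircular
noncircular = [complex(rng.gauss(0, 1), 0.2 * rng.gauss(0, 1)) for _ in range(N)]

c_cov, c_pcov = second_order_stats(circular)
n_cov, n_pcov = second_order_stats(noncircular)
print(abs(c_pcov) / abs(c_cov))   # close to zero for the circular signal
print(abs(n_pcov) / abs(n_cov))   # clearly non-zero for the noncircular signal
```

The ratio of pseudocovariance to covariance serves here as a simple, informal indicator of noncircularity.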

2.3.1 Cη-circular Quaternion Random Variables

The concept of augmented statistics was extended to the quaternion domain in [40], albeit with the restriction of a single rotation axis, either ı, ȷ or κ. A quaternion random variable q that obeys this condition is said to be Cη-circular, and is defined as

$$q \triangleq qe^{\eta\theta}, \quad \forall\theta \tag{2.18}$$

for one and only one pure imaginary unit η, where η ∈ {ı, ȷ, κ}. The symbol ≜ denotes equality in terms of the probability distribution function (pdf) and the symbol θ represents the angle of rotation.


2.3.2 H-circular Quaternion Random Variables

The restriction of a single rotation axis for Cη-circular random variables has proven too rigid in practical scenarios, and a generalisation in which the pdf is circular for any two arbitrary axes of rotation was introduced in [41]. A quaternion random variable q that satisfies this condition is said to be H-circular, or Q-proper, and is defined as

$$q \triangleq qe^{\eta\theta}, \quad \forall\theta \tag{2.19}$$

for all the pure imaginary units η ∈ {ı, ȷ, κ}. An H-circular quaternion random variable is circular in all its dimensions, meaning that the scatterplot of any two components of {1, ı, ȷ, κ} is circular. A Q-proper (second-order circular) random variable q is defined as one that has equal powers in all the components $q_a$, $q_b$, $q_c$ and $q_d$.

2.3.3 Augmented Second-Order Statistics of Quaternion Random Vectors

Similarly to the complex case, the covariance alone is in general not sufficient to describe the complete second-order information within a quaternion random vector. To provide a generic framework for the second-order statistical modelling of quaternion vectors, that is, to deal with Q-improper signals, complementary covariance matrices (pseudocovariances) need to be employed. These complementary covariance matrices are termed the ı-covariance $\mathbf{C}_{q^{\imath}}$, ȷ-covariance $\mathbf{C}_{q^{\jmath}}$ and κ-covariance $\mathbf{C}_{q^{\kappa}}$, and are given by [42,43]

$$\mathbf{C}_{q^{\imath}} = E\{\mathbf{q}\mathbf{q}^{\imath H}\}; \qquad \mathbf{C}_{q^{\jmath}} = E\{\mathbf{q}\mathbf{q}^{\jmath H}\}; \qquad \mathbf{C}_{q^{\kappa}} = E\{\mathbf{q}\mathbf{q}^{\kappa H}\} \tag{2.20}$$

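Thus, the complete second-order characteristics of the quaternion random vector are described by the augmented covariance matrix $\mathbf{C}^a_q$ of an augmented vector $\mathbf{q}^a = [\mathbf{q}^T\; \mathbf{q}^{\imath T}\; \mathbf{q}^{\jmath T}\; \mathbf{q}^{\kappa T}]^T$, given by¹

$$
\mathbf{C}^a_q = E\{\mathbf{q}^a\mathbf{q}^{aH}\} =
\begin{bmatrix}
\mathbf{C}_{qq} & \mathbf{C}_{q^{\imath}} & \mathbf{C}_{q^{\jmath}} & \mathbf{C}_{q^{\kappa}} \\
\mathbf{C}^H_{q^{\imath}} & \mathbf{C}_{q^{\imath}q^{\imath}} & \mathbf{C}_{q^{\imath}q^{\jmath}} & \mathbf{C}_{q^{\imath}q^{\kappa}} \\
\mathbf{C}^H_{q^{\jmath}} & \mathbf{C}_{q^{\jmath}q^{\imath}} & \mathbf{C}_{q^{\jmath}q^{\jmath}} & \mathbf{C}_{q^{\jmath}q^{\kappa}} \\
\mathbf{C}^H_{q^{\kappa}} & \mathbf{C}_{q^{\kappa}q^{\imath}} & \mathbf{C}_{q^{\kappa}q^{\jmath}} & \mathbf{C}_{q^{\kappa}q^{\kappa}}
\end{bmatrix}
\tag{2.21}
$$

where the submatrices in (2.21) are calculated according to²

$$
\begin{aligned}
\mathbf{C}_{\delta} &= E\{\mathbf{q}\boldsymbol{\delta}^H\}, & \boldsymbol{\delta} &\in \{\mathbf{q}^{\imath}, \mathbf{q}^{\jmath}, \mathbf{q}^{\kappa}\} \\
\mathbf{C}_{\alpha\beta} &= E\{\boldsymbol{\alpha}\boldsymbol{\beta}^H\}, & \boldsymbol{\alpha}, \boldsymbol{\beta} &\in \{\mathbf{q}, \mathbf{q}^{\imath}, \mathbf{q}^{\jmath}, \mathbf{q}^{\kappa}\}
\end{aligned}
\tag{2.22}
$$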

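A quaternion random vector q is said to be Cı-circular when the ȷ-covariance $\mathbf{C}_{q^{\jmath}}$ and κ-covariance $\mathbf{C}_{q^{\kappa}}$ vanish [43]. Similar definitions hold for Cȷ-circular and Cκ-circular quaternion random vectors. The semi-widely linear model, based on the statistics of Cη-circularity, is described in [43]. On the other hand, an H-circular quaternion random vector q has the property that it is not correlated with its quaternion involutions $\mathbf{q}^{\imath}$, $\mathbf{q}^{\jmath}$ and $\mathbf{q}^{\kappa}$, that is

$$E\{\mathbf{q}\mathbf{q}^{\imath H}\} = \mathbf{0}; \qquad E\{\mathbf{q}\mathbf{q}^{\jmath H}\} = \mathbf{0}; \qquad E\{\mathbf{q}\mathbf{q}^{\kappa H}\} = \mathbf{0} \tag{2.23}$$

yielding the augmented covariance matrix $\mathbf{C}^a_q$ in (2.21) of an H-circular random vector in the form³

$$
\mathbf{C}^a_q = E\{\mathbf{q}^a\mathbf{q}^{aH}\} =
\begin{bmatrix}
\mathbf{C}_{qq} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{C}^{\imath}_{qq} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{C}^{\jmath}_{qq} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{C}^{\kappa}_{qq}
\end{bmatrix}
\tag{2.24}
$$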

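¹ As long as the covariance matrix $\mathbf{C}_{qq}$ is nonsingular, it follows immediately that the other covariance matrices $\mathbf{C}_{q^{\imath}q^{\imath}}$, $\mathbf{C}_{q^{\jmath}q^{\jmath}}$ and $\mathbf{C}_{q^{\kappa}q^{\kappa}}$ have inverses. Therefore, the augmented covariance matrix $\mathbf{C}^a_q$ is full rank and hence nonsingular.

² The matrices $\mathbf{C}_{q^{\eta}q^{\eta}}$ are an involution of $\mathbf{C}_{qq}$ over η and can therefore be simplified to $\mathbf{C}^{\eta}_{qq}$, where η ∈ {ı, ȷ, κ} [43].

³ Any other basis comprising four combinations out of $\{\mathbf{q}, \mathbf{q}^{\imath}, \mathbf{q}^{\jmath}, \mathbf{q}^{\kappa}\}$ and their conjugates is equally valid. The basis proposed in [42] and used here, $\mathbf{q}^a = [\mathbf{q}^T\; \mathbf{q}^{\imath T}\; \mathbf{q}^{\jmath T}\; \mathbf{q}^{\kappa T}]^T$, provides the most convenient representation, as shown in the augmented covariance structure for H-circular signals in (2.21) and (2.24).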


To exploit the complete second-order statistics of quaternion-valued signals, a filtering model similar to the widely linear model in C needs to be considered [38,44]. The quaternion widely linear model is based on the augmented basis that builds the matrix $\mathbf{C}^a_q$ in (2.21), and can be described by [42,43,45,46]

$$y = \mathbf{w}^{aT}\mathbf{x}^a = \mathbf{g}^T\mathbf{x} + \mathbf{h}^T\mathbf{x}^{\imath} + \mathbf{u}^T\mathbf{x}^{\jmath} + \mathbf{v}^T\mathbf{x}^{\kappa} \tag{2.25}$$

where g, h, u and v are the weight vectors, x is the input signal, $\mathbf{x}^{\imath}$, $\mathbf{x}^{\jmath}$ and $\mathbf{x}^{\kappa}$ are respectively its ı, ȷ and κ involutions, $\mathbf{w}^a = [\mathbf{g}^T\; \mathbf{h}^T\; \mathbf{u}^T\; \mathbf{v}^T]^T$ is the augmented weight vector, and $\mathbf{x}^a = [\mathbf{x}^T\; \mathbf{x}^{\imath T}\; \mathbf{x}^{\jmath T}\; \mathbf{x}^{\kappa T}]^T$ is the augmented random input vector. Another benefit of the quaternion widely linear model is the possibility to determine the degree of properness of quaternion random vectors [47,48].
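The role of the complementary covariances can be illustrated for scalar quaternion samples by estimating $E\{q(q^{\eta})^*\}$ for each axis. The sketch below is illustrative only: the Gaussian test signals, the seed and the helper names are assumptions, and the involution is implemented through the sign pattern of the components.

```python
import random

def qmul(w, x):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (
        wa * xa - wb * xb - wc * xc - wd * xd,
        wa * xb + wb * xa + wc * xd - wd * xc,
        wa * xc - wb * xd + wc * xa + wd * xb,
        wa * xd + wb * xc - wc * xb + wd * xa,
    )

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def involution(q, axis):
    """Involution about axis 1 (i), 2 (j) or 3 (k): keep scalar and that axis, negate the rest."""
    return tuple(c if idx in (0, axis) else -c for idx, c in enumerate(q))

def complementary_cov(samples, axis):
    """Sample estimate of E{q (q^eta)*} for scalar quaternion samples."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for q in samples:
        term = qmul(q, qconj(involution(q, axis)))
        acc = [a + t for a, t in zip(acc, term)]
    return tuple(a / len(samples) for a in acc)

rng = random.Random(7)
N = 20000
# Independent components with equal power (assumed Q-proper test signal)
proper = [tuple(rng.gauss(0, 1) for _ in range(4)) for _ in range(N)]
# Same samples with the j and k components scaled down (unequal powers)
improper = [(a, b, 0.1 * c, 0.1 * d) for a, b, c, d in proper]

for axis in (1, 2, 3):
    print(axis, complementary_cov(proper, axis), complementary_cov(improper, axis))
```

For the equal-power signal all three estimates are close to zero, whereas for the signal with unequal component powers the ı-covariance is clearly non-zero, so that the standard covariance alone would not describe it.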

2.4 Analyticity in H

In the complex domain C, the notion of analyticity conforms with holomorphy, harmonicity and conformality, so that one notion implies the others. However, this is not the case in the quaternion domain H, due to the non-commutativity of the product, and each of the notions mentioned above needs to be re-evaluated in H. The notion of interest to this thesis is holomorphy, which means the existence of the derivative of the function. In order to keep the terminology consistent with the existing literature in R and C, the term analyticity is adopted to denote the existence of the derivative of the function.

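The analyticity of a complex function f(z) = u(x, y) + v(x, y)ı is governed by the Cauchy-Riemann (CR) equations given by

$$\frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y}; \qquad \frac{\partial v(x, y)}{\partial x} = -\frac{\partial u(x, y)}{\partial y} \tag{2.26}$$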

For a complex function f(z) to be analytic in C, the derivatives along the real and imaginary axes have to be equal, that is

$$\frac{\partial f(z)}{\partial x} + \frac{\partial f(z)}{\partial y}\imath = 0 \;\Leftrightarrow\; \frac{\partial f(z)}{\partial z^*} = 0 \tag{2.27}$$

where z = x + yı.

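By extension, one of the first definitions of analyticity in H is described by the Generalized Cauchy-Riemann (GCR) conditions. Due to the non-commutative nature of the quaternion product, there exist two definitions of the GCR conditions, which are given by [49]

$$\text{Right GCR}: \frac{\partial f(q)}{\partial q_a} = -\frac{\partial f(q)}{\partial q_b}\imath = -\frac{\partial f(q)}{\partial q_c}\jmath = -\frac{\partial f(q)}{\partial q_d}\kappa \tag{2.28}$$

$$\text{Left GCR}: \frac{\partial f(q)}{\partial q_a} = -\imath\frac{\partial f(q)}{\partial q_b} = -\jmath\frac{\partial f(q)}{\partial q_c} = -\kappa\frac{\partial f(q)}{\partial q_d} \tag{2.29}$$

where $q = q_a + q_b\imath + q_c\jmath + q_d\kappa$.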

These two definitions of quaternion analyticity create ambiguity as to which condition to use when determining the analyticity of a function. The derivative obtained through the left GCR is called the left derivative, and that obtained through the right GCR is the right derivative. The GCR conditions are only satisfied by a special form of quaternion linear functions and by constants, which makes them too prohibitive for practical applications such as neural networks, where nonlinear neuron models are typically involved. The restrictive nature of the GCR conditions arises from the fact that they were initially proposed for a four-dimensional domain, with Clifford algebra as their basis, making them unsuitable for applications in H [50].
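As a worked illustration (not taken from the source), the left GCR condition (2.29) can be checked by hand for two simple functions, using only the multiplication rules in (2.9) and the ı-involution in (2.15). The partial derivatives of $q^2$ follow from differentiating the product $q\,q$ with the order of the factors preserved.

```latex
% f(q) = q : every term of the left GCR (2.29) equals 1, so the condition holds
\frac{\partial q}{\partial q_a} = 1, \qquad
-\imath\frac{\partial q}{\partial q_b} = -\imath\,\imath = 1, \qquad
-\jmath\frac{\partial q}{\partial q_c} = -\jmath\,\jmath = 1, \qquad
-\kappa\frac{\partial q}{\partial q_d} = -\kappa\,\kappa = 1

% f(q) = q^2 : the first equality of (2.29) already fails for general q
\frac{\partial q^2}{\partial q_a} = 2q, \qquad
-\imath\frac{\partial q^2}{\partial q_b} = -\imath(\imath q + q\imath)
  = q - \imath q\imath = q + q^{\imath}
```

For $f(q) = q^2$ the first equality therefore requires $q = q^{\imath}$, that is $q_c = q_d = 0$, so even this simple nonlinear function does not satisfy the left GCR on the whole of H.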

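To circumvent this issue, Fueter further relaxed these conditions by redefining them based on a quaternion basis, resulting in the left and right Cauchy-Riemann-Fueter (CRF) conditions given by [25]

$$\text{Right CRF}: \frac{\partial f(q)}{\partial q_a} + \frac{\partial f(q)}{\partial q_b}\imath + \frac{\partial f(q)}{\partial q_c}\jmath + \frac{\partial f(q)}{\partial q_d}\kappa = 0 \tag{2.30}$$

$$\text{Left CRF}: \frac{\partial f(q)}{\partial q_a} + \imath\frac{\partial f(q)}{\partial q_b} + \jmath\frac{\partial f(q)}{\partial q_c} + \kappa\frac{\partial f(q)}{\partial q_d} = 0 \tag{2.31}$$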

Unlike the GCR conditions, the CRF conditions are defined by a single quaternion partial differential equation, which leads to close analogues of Cauchy’s theorem, Cauchy’s integral formula and the Laurent expansion [51]. Furthermore, the CRF conditions provide a generalization of the GCR conditions by permitting the canonical complex variable limits as solutions. The canonical complex limits refer to functions of a complex variable involving only a single imaginary unit, which are

$$q_{\imath} = q_a + q_b\imath; \qquad q_{\jmath} = q_a + q_c\jmath; \qquad q_{\kappa} = q_a + q_d\kappa \tag{2.32}$$

However, similar to the GCR conditions, the notion of analyticity is still ambiguous, as there exist both a left derivative and a right derivative. It can be shown that only linear quaternion functions and constants satisfy the CRF conditions [25], limiting the scope for nonlinear adaptive filtering in H, which requires differentiable nonlinear functions.

2.5 Review of Nonlinear Functions

The choice of nonlinear function has a key influence on the performance of nonlinear adaptive filters. The fundamentals of choosing a suitable activation function date back to Hilbert’s 13th problem, which questions the possibility of expressing a general algebraic equation of a high degree by using sums and compositions of single-variable functions. Kolmogorov showed that the conjecture of Hilbert’s 13th problem was incorrect and provided a general representation theorem stating that any real-valued continuous function f can be represented as

$$f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1}\Phi_q\Big(\sum_{p=1}^{n}\psi_{pq}(x_p)\Big) \tag{2.33}$$

where $\Phi_q$ and $\psi_{pq}$ are nonlinear continuous functions of one variable. Equation (2.33) shows that any continuous function of several variables can be represented using nonlinear continuous functions of a single variable.

Kolmogorov’s theorem provided the existence proof for the approximation capabilities of neural networks (NN). Based on this, the first proof of the universal approximation capabilities of NNs considers approximations of the form

$$f(\mathbf{x}) \approx \sum_{i=1}^{N} w_i\,\sigma(\mathbf{a}_i^T\mathbf{x} + b_i) \tag{2.34}$$

where finite sums of this form, with parameters $\mathbf{a}_i$, $w_i$ and $b_i$, are dense in the space of continuous functions defined on $[0, 1]^n$ and σ is a discriminatory function [1]. It was concluded that any bounded and measurable sigmoidal function is a discriminatory function [1, 2].

For gradient-descent learning algorithms, the sigmoidal functions should be differentiable and bounded. To emphasise the differentiability aspect of the nonlinear function, the weight update of the Nonlinear Gradient Descent (NGD) algorithm is presented as [52]

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu e(n)\Phi'\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}(n) \tag{2.35}$$

where e(n) is the error, w(n) is the adaptive weight vector, x(n) is the filter input and $\Phi'(\cdot)$ is the first-order derivative of the nonlinear function.

The differentiability of the nonlinear function proved to be problematic in the complex domain C. This is a consequence of Liouville’s theorem, which states that a bounded entire function in C must be constant. To resolve this conflict, split-complex functions which operate componentwise are employed, defined by

$$\Phi_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) = \Phi_r\big(\Re(\mathbf{w}^T(n)\mathbf{x}(n))\big) + \Phi_i\big(\Im(\mathbf{w}^T(n)\mathbf{x}(n))\big)\imath \tag{2.36}$$

where $\Phi_r$ and $\Phi_i$ are real-valued sigmoidal functions. The symbols $\Re(\cdot)$ and $\Im(\cdot)$ correspond to the real and imaginary components respectively.

The properties of suitable split-complex functions for gradient descent adaptive filtering applications are specified below [9]:

a) f(z) = u(x, y) + v(x, y)ı is nonlinear in x and y;

b) f(z) has no singularities and is bounded for all values of z;

c) The partial derivatives $\frac{\partial u}{\partial x}$, $\frac{\partial v}{\partial y}$, $\frac{\partial v}{\partial x}$ and $\frac{\partial u}{\partial y}$ are continuous and bounded;

d) $\frac{\partial u}{\partial x}\frac{\partial v}{\partial y} \neq \frac{\partial v}{\partial x}\frac{\partial u}{\partial y}$, to avoid the error gradient becoming zero for any non-zero input, ensuring continuous learning.

Despite the strict conditions imposed on the split-complex functions, these functions do not give accurate gradient estimates as they do not satisfy the Cauchy-Riemann (CR) conditions. Furthermore, split-complex functions perform poorly for signals that have high correlation between the two dimensions. With this motivation, Kim and Adali proposed the use of a class of complex elementary transcendental functions (ETF) derivable from the entire complex exponential function $e^z$ [12]. These fully complex functions satisfy the CR conditions and the properties specified in [9], justifying their suitability for gradient-descent adaptive filtering in C.

The situation in H has proved to be more difficult than in C. In H, there is no known differentiable nonlinear function, as analyticity is dictated by the strict Cauchy-Riemann-Fueter (CRF) conditions. The CRF conditions are only satisfied by constants and linear functions, which hinders the growth of nonlinear adaptive filtering in H.

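In order to circumvent the issue of analyticity, it was proposed to apply split-quaternion functions. The split-quaternion function, which operates componentwise, is given as

$$
\begin{aligned}
\Phi_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) ={}& \Phi_a\big(\Re(\mathbf{w}^T(n)\mathbf{x}(n))\big) + \Phi_b\big(\Im_{\imath}(\mathbf{w}^T(n)\mathbf{x}(n))\big)\imath \\
&+ \Phi_c\big(\Im_{\jmath}(\mathbf{w}^T(n)\mathbf{x}(n))\big)\jmath + \Phi_d\big(\Im_{\kappa}(\mathbf{w}^T(n)\mathbf{x}(n))\big)\kappa
\end{aligned}
\tag{2.37}
$$

where $\Phi_a$, $\Phi_b$, $\Phi_c$ and $\Phi_d$ are real-valued sigmoidal functions. The symbols $\Im_{\imath}(\cdot)$, $\Im_{\jmath}(\cdot)$ and $\Im_{\kappa}(\cdot)$ correspond to the ı, ȷ and κ components respectively.

Similar to the case in C, the main problem inherent to the split-quaternion function is its inadequacy in processing signals that have strong correlations between the four dimensions.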
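The componentwise nature of a split-quaternion function can be made concrete with a short sketch. The choice of tanh for all four real-valued nonlinearities and the function name `split_tanh` are assumptions made for illustration; the text only requires real-valued sigmoidal functions.

```python
import math

def split_tanh(q):
    """Split-quaternion activation: a real tanh applied to each component of q = (a, b, c, d)."""
    return tuple(math.tanh(c) for c in q)

q1 = (0.5, -1.0, 2.0, 0.1)
q2 = (3.0, -1.0, 2.0, 0.1)     # only the real part differs from q1
out1, out2 = split_tanh(q1), split_tanh(q2)
print(out1)
print(out2)
```

Changing one input component alters only the matching output component, so no interaction between the four dimensions passes through the nonlinearity itself.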

2.6 Quaternion-valued Adaptive Filtering

The cost function in quaternion-valued adaptive filtering is usually given by a real function of quaternion variables such as

$$E(n) = e_a^2(n) + e_b^2(n) + e_c^2(n) + e_d^2(n) = e(n)e^*(n) \tag{2.38}$$


Figure 2.2: Linear adaptive finite impulse response (FIR) filter

where the terms $e_a(n)$, $e_b(n)$, $e_c(n)$ and $e_d(n)$ denote respectively the error components in the real, ı, ȷ and κ parts.

Based on (2.38), the derivation of one of the earliest quaternion-valued adaptive filtering algorithms, the Quaternion Least Mean Square (QLMS) [24], is provided in the following subsection.

2.6.1 Derivation of Quaternion Least Mean Square (QLMS)

The Quaternion Least Mean Square (QLMS) is derived based on the finite impulse response (FIR) architecture. The basic structure of an FIR filter is depicted in Figure 2.2, with the output y(n) and conjugate output y*(n) of the filter given by

$$y(n) = \mathbf{w}^T(n)\mathbf{x}(n); \qquad y^*(n) = \mathbf{x}^H(n)\mathbf{w}^*(n) \tag{2.39}$$

where w(n) and x(n) correspond to the adaptive weight vector and the filter input.

The QLMS is made adaptive according to a gradient descent update of the coefficients, given by

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\nabla_{\mathbf{w}}E(n) \tag{2.40}$$

where µ is the real-valued learning rate.

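From (2.38), the gradient $\nabla_{\mathbf{w}}E(n)$ is derived to be [24]

$$
\begin{aligned}
\nabla_{\mathbf{w}}E(n) &= e(n)\nabla_{\mathbf{w}}e^*(n) + \nabla_{\mathbf{w}}e(n)\,e^*(n) \\
&= e(n)\big(\nabla_{\mathbf{w}}d^*(n) - \nabla_{\mathbf{w}}y^*(n)\big) + \big(\nabla_{\mathbf{w}}d(n) - \nabla_{\mathbf{w}}y(n)\big)e^*(n) \\
&= -\big(e(n)\nabla_{\mathbf{w}}y^*(n) + \nabla_{\mathbf{w}}y(n)\,e^*(n)\big)
\end{aligned}
\tag{2.41}
$$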


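The terms $\nabla_{\mathbf{w}}y(n)$ and $\nabla_{\mathbf{w}}y^*(n)$ in (2.41) are defined as

$$\nabla_{\mathbf{w}}y(n) = \nabla_{\mathbf{w}_a}y(n) + \nabla_{\mathbf{w}_b}y(n)\imath + \nabla_{\mathbf{w}_c}y(n)\jmath + \nabla_{\mathbf{w}_d}y(n)\kappa \tag{2.42}$$

$$\nabla_{\mathbf{w}}y^*(n) = \nabla_{\mathbf{w}_a}y^*(n) + \nabla_{\mathbf{w}_b}y^*(n)\imath + \nabla_{\mathbf{w}_c}y^*(n)\jmath + \nabla_{\mathbf{w}_d}y^*(n)\kappa \tag{2.43}$$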

The gradients $\nabla_{\mathbf{w}}y(n)$ in (2.42) and $\nabla_{\mathbf{w}}y^*(n)$ in (2.43) are derived in [24]. For convenience, the full derivation of the gradients is provided in Appendix A. The final gradients are given by

$$\nabla_{\mathbf{w}}y(n) = -2\mathbf{x}^*(n); \qquad \nabla_{\mathbf{w}}y^*(n) = 4\mathbf{x}^*(n) \tag{2.44}$$

Substituting (2.44) into the error gradient in (2.41) gives the final QLMS weight update of [24]

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\big(2e(n)\mathbf{x}^*(n) - \mathbf{x}^*(n)e^*(n)\big) \tag{2.45}$$

where the factor 2 is absorbed into µ.

For the sake of comparison, the weight update of the Complex Least Mean Square (CLMS) is reproduced here and is given by [7]

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu e(n)\mathbf{x}^*(n) \tag{2.46}$$

Comparing the weight update structure of the QLMS (2.45) with that of the CLMS (2.46) shows that the QLMS is not a simple extension of the CLMS. The extra term in the QLMS weight update is needed to capture the additional statistical information available in the quaternion domain H.
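A single iteration of the update in (2.45) can be sketched as follows. The tuple representation of quaternions, the helper names, the two-tap example data and the value of `mu` are all assumptions made for illustration; the sketch only shows the order of the quaternion products in the two terms of the update.

```python
def qmul(w, x):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (
        wa * xa - wb * xb - wc * xc - wd * xd,
        wa * xb + wb * xa + wc * xd - wd * xc,
        wa * xc - wb * xd + wc * xa + wd * xb,
        wa * xd + wb * xc - wc * xb + wd * xa,
    )

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def qadd(p, q):
    return tuple(a + b for a, b in zip(p, q))

def qsub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def qscale(s, q):
    return tuple(s * c for c in q)

def qnorm2(q):
    return sum(c * c for c in q)

def qlms_step(w, x, d, mu):
    """One QLMS iteration: returns the updated weights and the a priori error e(n)."""
    y = (0.0, 0.0, 0.0, 0.0)
    for wk, xk in zip(w, x):
        y = qadd(y, qmul(wk, xk))                 # y(n) = w^T(n) x(n)
    e = qsub(d, y)                                # e(n) = d(n) - y(n)
    w_new = []
    for wk, xk in zip(w, x):
        xc = qconj(xk)
        # update direction 2 e(n) x*(n) - x*(n) e*(n); the product order matters
        direction = qsub(qscale(2.0, qmul(e, xc)), qmul(xc, qconj(e)))
        w_new.append(qadd(wk, qscale(mu, direction)))
    return w_new, e

w = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
x = [(0.5, -0.2, 0.1, 0.3), (-0.4, 0.6, 0.2, -0.1)]
d = (1.0, -0.5, 0.25, 0.75)
w1, e0 = qlms_step(w, x, d, mu=0.1)
_, e1 = qlms_step(w1, x, d, mu=0.1)     # error for the same input after the update
print(qnorm2(e0), qnorm2(e1))
```

For this small step size, re-evaluating the error on the same input after the update gives a smaller squared error than before the update.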

2.6.2 Analysis of Quaternion Least Mean Square (QLMS)

In order to analyse the performance of the QLMS, the standard assumption in adaptive filtering is made, which is

$$d(n) = \mathbf{w}_{opt}^T\mathbf{x}(n) \tag{2.47}$$

where $\mathbf{w}_{opt}$ is the optimal weight vector specified by the Wiener-Hopf equation in (2.3).

Following the standard analysis of convergence in the mean [30], the weight error vector v(n) is defined as

$$\mathbf{v}(n) = \mathbf{w}(n) - \mathbf{w}_{opt} \tag{2.48}$$

The error e(n) is then rewritten as

$$
\begin{aligned}
e(n) &= d(n) - y(n) \\
&= \mathbf{w}_{opt}^T\mathbf{x}(n) - \mathbf{w}^T(n)\mathbf{x}(n) \\
&= -\mathbf{v}^T(n)\mathbf{x}(n)
\end{aligned}
\tag{2.49}
$$

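The QLMS analysis is based on the following two observations, considered separately:

a) $y^*(n) = y(n)$ when $\Im\{y(n)\} = 0$;

b) $y^*(n) = -y(n)$ when $\Re\{y(n)\} = 0$.

From the weight update in (2.45) and under situation (a), the real part of the weight update is calculated to be

$$
\begin{aligned}
\Re\{\mathbf{w}(n+1)\} &= \Re\{\mathbf{w}(n)\} + \Re\{\mu(2e(n)\mathbf{x}^*(n) - \mathbf{x}^*(n)e^*(n))\} \\
&= \Re\{\mathbf{w}(n)\} - 2\Re\{\mu\mathbf{v}^T(n)\mathbf{x}(n)\mathbf{x}^*(n)\} + \Re\{\mu\mathbf{v}^T(n)\mathbf{x}(n)\mathbf{x}(n)\} \\
&= \Re\{\mathbf{w}(n)\} - 2\Re\{\mu(\mathbf{v}^T(n)\mathbf{x}(n)\mathbf{x}^H(n))^T\} + \Re\{\mu(\mathbf{v}^T(n)\mathbf{x}(n)\mathbf{x}^T(n))^T\}
\end{aligned}
\tag{2.50}
$$

Next, subtracting $\mathbf{w}_{opt}$ from both sides of (2.50) yields

$$
\Re\{\mathbf{v}(n+1)\} = \Re\{\mathbf{v}(n)\} - 2\Re\{\mu(\mathbf{v}^T(n)\underbrace{\mathbf{x}(n)\mathbf{x}^H(n)}_{\text{Covariance}})^T\} + \Re\{\mu(\mathbf{v}^T(n)\underbrace{\mathbf{x}(n)\mathbf{x}^T(n)}_{\text{Pseudocovariance}})^T\}
\tag{2.51}
$$

The recursion for the weight error vector v(n) in (2.51) shows that the QLMS considers both the covariance $\mathbf{C}_{qq}$ and the pseudocovariance $\mathbf{P}_{qq}$ in its weight updates.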

Similarly, considering the vector part of the QLMS weight update in situation (b) leads to

$$
\Im\{\mathbf{v}(n+1)\} = \Im\{\mathbf{v}(n)\} - 2\Im\{\mu(\mathbf{v}^T(n)\underbrace{\mathbf{x}(n)\mathbf{x}^H(n)}_{\text{Covariance}})^T\} - \Im\{\mu(\mathbf{v}^T(n)\underbrace{\mathbf{x}(n)\mathbf{x}^T(n)}_{\text{Pseudocovariance}})^T\}
\tag{2.52}
$$

which shows that the covariance $\mathbf{C}_{qq}$ and the pseudocovariance $\mathbf{P}_{qq}$ are still involved in the weight updates of the QLMS. This indicates that the use of “augmented statistics”, as in the complex domain, is inherent to this class of algorithms.

2.7 Introduction to Quaternion Kalman Filtering

The Kalman filter (KF) algorithm operates in the state space, as opposed to the Wiener filter which minimises a specified cost function. The versatility of the KF stems from its flexible process and measurement state models, which can be modified according to the application at hand. The Quaternion Kalman Filter (QKF) algorithm was first derived for attitude control utilizing the q-method based approach [23]. For simplicity, the QKF derived in this section is based on the basic model given by

$$\text{Process State}: \mathbf{x}(n+1) = \mathbf{F}(n)\mathbf{x}(n) + \boldsymbol{\varepsilon}_1(n) \tag{2.53}$$

$$\text{Measurement State}: \mathbf{y}(n+1) = \mathbf{H}(n)\mathbf{x}(n) + \boldsymbol{\varepsilon}_2(n) \tag{2.54}$$

where x is the M × 1 state vector, F is the M × M transition matrix, y is the N × 1 observable output vector and H is the N × M measurement matrix. Both $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ are i.i.d. quadruply white quaternion Gaussian noise (QWGN) vectors, of dimension M × 1 and N × 1 respectively.

The basic operation of the QKF is divided into two distinct steps:

a) the time update step which predicts the a priori state vector x−;

b) the measurement update which corrects the a priori prediction x− upon receiving

the observable output y.

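Firstly, consider the time update stage, where the M × 1 a priori state vector $\mathbf{x}^-$ is predicted by

$$\mathbf{x}^-(n) = \mathbf{F}(n)\mathbf{x}(n-1) \tag{2.55}$$

After that, the M × M a priori estimated error covariance matrix $\mathbf{P}^-$ is calculated to be

$$\mathbf{P}^-(n) = \mathbf{F}(n)\mathbf{P}(n-1)\mathbf{F}^H(n) + \mathbf{Q}_1(n) \tag{2.56}$$

where $\mathbf{Q}_1$ is the M × M covariance matrix of the process noise $\boldsymbol{\varepsilon}_1$.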

Proceeding to the measurement update stage, the previously estimated a priori error covariance matrix $\mathbf{P}^-$ is used to calculate the M × N Kalman gain matrix K according to

$$\mathbf{K}(n) = \mathbf{P}^-(n)\mathbf{H}^H(n)\big[\mathbf{H}(n)\mathbf{P}^-(n)\mathbf{H}^H(n) + \mathbf{Q}_2(n)\big]^{-1} \tag{2.57}$$

where the symbol $(\cdot)^{-1}$ denotes the matrix inverse and $\mathbf{Q}_2(n)$ is the N × N covariance matrix of the measurement noise $\boldsymbol{\varepsilon}_2$.

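Next, the N × 1 innovations vector $\boldsymbol{\alpha}_i$ is defined as

$$\boldsymbol{\alpha}_i(n) = \mathbf{y}(n) - \mathbf{H}(n)\mathbf{x}^-(n) \tag{2.58}$$

Utilizing the Kalman gain K and the innovations vector $\boldsymbol{\alpha}_i$, the M × 1 estimated a posteriori state vector x is corrected by

$$\mathbf{x}(n) = \mathbf{x}^-(n) + \mathbf{K}(n)\boldsymbol{\alpha}_i(n) \tag{2.59}$$

Finally, the M × M estimated a posteriori error covariance P is updated according to

$$\mathbf{P}(n) = \big(\mathbf{I} - \mathbf{K}(n)\mathbf{H}(n)\big)\mathbf{P}^-(n) \tag{2.60}$$

where I is the M × M identity matrix.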

These two update stages are computed at every iteration of the QKF algorithm. Comparing (2.59) with the QLMS weight update (2.45), it can be seen that the Kalman gain K plays a role similar to that of the learning rate µ.
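The sequence of operations in (2.55) to (2.60) can be traced in a minimal sketch. To stay self-contained it is restricted to a real-valued scalar state (M = N = 1), which is an assumption made for illustration and not the quaternion-valued filter itself; the function name `kalman_step`, the noise levels and the constant-state example are likewise hypothetical.

```python
import random

def kalman_step(x_prev, p_prev, y, f=1.0, h=1.0, q1=0.0, q2=0.01):
    """One Kalman iteration for a real scalar state, mirroring (2.55)-(2.60)."""
    x_prior = f * x_prev                        # (2.55) a priori state
    p_prior = f * p_prev * f + q1               # (2.56) a priori error covariance
    k = p_prior * h / (h * p_prior * h + q2)    # (2.57) Kalman gain
    innovation = y - h * x_prior                # (2.58) innovation
    x_post = x_prior + k * innovation           # (2.59) a posteriori state
    p_post = (1.0 - k * h) * p_prior            # (2.60) a posteriori error covariance
    return x_post, p_post

rng = random.Random(0)
x_hat, p = 0.0, 1.0                      # initial state estimate and error covariance
for _ in range(200):
    y = 1.0 + rng.gauss(0.0, 0.1)        # noisy measurement of a constant state equal to 1
    x_hat, p = kalman_step(x_hat, p, y)
print(round(x_hat, 2), p)
```

The gain k shrinks as the error covariance decreases, so later measurements change the estimate less, in the same way that a decaying learning rate would.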


In order to model nonlinear dynamics effectively, the Extended Kalman Filter (EKF) is proposed. The Quaternion Extended Kalman Filter (QEKF) is derived by considering a simple nonlinear state space given by [53]

$$\text{Process State}: \mathbf{x}(n+1) = \Phi_P\big(\mathbf{F}(n)\mathbf{x}(n)\big) + \boldsymbol{\varepsilon}_1(n) \tag{2.61}$$

$$\text{Measurement State}: \mathbf{y}(n+1) = \Phi_M\big(\mathbf{H}(n)\mathbf{x}(n)\big) + \boldsymbol{\varepsilon}_2(n) \tag{2.62}$$

where $\Phi_P(\cdot)$ and $\Phi_M(\cdot)$ are the nonlinear functions of the process and measurement states respectively.

The QEKF is derived in a similar fashion to the QKF, resulting in similar expressions. One major difference is that the QEKF requires the computation of the derivatives of the quaternion nonlinear functions, $\Phi_P'(\cdot)$ and $\Phi_M'(\cdot)$, which is problematic.

Despite the enhancement provided by the QEKF, the QKF is a more favourable approach to attitude estimation. This is because the QEKF is sensitive to initial conditions and to biases in the estimation errors [23]. Furthermore, the derivatives $\Phi_P'(\cdot)$ and $\Phi_M'(\cdot)$ are unstable as they do not fulfil the Cauchy-Riemann-Fueter (CRF) conditions.


Chapter 3

A Class of Split Quaternion

Nonlinear Adaptive Filters

This chapter proposes a class of split quaternion learning algorithms for the training of nonlinear finite impulse response (FIR) adaptive filters for the processing of three- and four-dimensional signals. Higher-dimensional signals can be represented as quaternion vectors, similarly to the Quaternion Kalman Filter approach [23]. The derivations of these algorithms explicitly take into consideration the non-commutativity of the quaternion product. The additional information obtained by this method provides improved performance in the processing of hypercomplex processes. A rigorous analysis of the convergence of the proposed algorithms is also provided. Simulation results on both benchmark and real-world signals justify the proposed approach.

3.1 Introduction

The introduction of Quaternion Multilayer Perceptron (QMLP) has opened up many ap-

plications in the quaternion domain H such as polarized signal classification [54] and

controlling the attitude of a rigid body [55]. Despite reaping the benefits of processing in

H, the performance of the QMLP can still be improved upon. This is because the QMLP

did not explicitly consider the non-commutativity of quaternion product in its derivation.


The aim of this chapter is to introduce a quaternion-valued nonlinear finite impulse response (FIR) adaptive filter suitable for the processing of nonlinear signals. The learning algorithms introduced, the Split Quaternion Nonlinear Adaptive Filtering Algorithm (SQAFA) and the Adaptive Amplitude Split Quaternion Nonlinear Adaptive Filtering Algorithm (AASQAFA), are derived rigorously in order to explicitly address the non-commutativity of the quaternion product and to compensate for the large dynamical range of the quaternion signal.

The chapter is organised as follows. In Section 3.2, the proposed SQAFA and AASQAFA are derived, followed by an analysis of their convergence properties. Section 3.3 compares the performance of the SQAFA and AASQAFA algorithms against the QMLP, the QMLP for FIR (QMLP-FIR) and the corresponding complex and multidimensional real-valued algorithms, through simulations on both benchmark and real-world multidimensional data. Section 3.4 provides further elaboration of the results obtained. Finally, the chapter concludes in Section 3.5.

3.2 Derivation of Split Quaternion Algorithms

The cost function in quaternion-valued adaptive filtering is given by

$$E(n) = e_a^2(n) + e_b^2(n) + e_c^2(n) + e_d^2(n) \tag{3.1}$$

$$\phantom{E(n)} = e(n)e^*(n) \tag{3.2}$$

where the terms $e_a(n)$, $e_b(n)$, $e_c(n)$ and $e_d(n)$ denote respectively the error components in the real, ı, ȷ and κ parts.

The current quaternion-valued nonlinear adaptive filtering algorithms utilise split quaternion functions in order to circumvent the strict Cauchy-Riemann-Fueter (CRF) analyticity conditions. The output of a split quaternion function is given as

$$\Phi_s(q) = \Phi_a(q_a) + \Phi_b(q_b)\imath + \Phi_c(q_c)\jmath + \Phi_d(q_d)\kappa \tag{3.3}$$


Figure 3.1: Nonlinear adaptive finite impulse response (FIR) filter

where $\Phi_s(\cdot)$ denotes the split quaternion nonlinear function, $\Phi_a(\cdot)$ is a real-valued nonlinear activation function applied to the real part of q, $\Phi_b(\cdot)$ to the ı part, $\Phi_c(\cdot)$ to the ȷ part and $\Phi_d(\cdot)$ to the κ part.

The derivative of the split quaternion nonlinear function, $\Phi_s'$, is defined to be

$$\Phi_s'(q) = \Phi_a'(q_a) + \Phi_b'(q_b)\imath + \Phi_c'(q_c)\jmath + \Phi_d'(q_d)\kappa \tag{3.4}$$

where the derivatives $\Phi_a'(\cdot)$, $\Phi_b'(\cdot)$, $\Phi_c'(\cdot)$ and $\Phi_d'(\cdot)$ are real-valued derivatives defined componentwise.

The algorithms derived below are based on the split quaternion nonlinear functions.

3.2.1 Derivation of Quaternion-valued Finite Impulse Response algorithm

In order to perform a fair comparison with the proposed algorithms, the quaternion-valued nonlinear algorithm under consideration needs to have the same nonlinear FIR architecture, shown in Figure 3.1. Therefore, the QMLP is derived for the nonlinear FIR architecture and named the QMLP for Finite Impulse Response (QMLP-FIR) filter.

The output of the QMLP-FIR algorithm is given by

$$
\begin{aligned}
y(n) &= \Phi_s(net(n)) \\
&= \Phi_a(net_a(n)) + \Phi_b(net_b(n))\imath + \Phi_c(net_c(n))\jmath + \Phi_d(net_d(n))\kappa \\
&= y_a(n) + y_b(n)\imath + y_c(n)\jmath + y_d(n)\kappa
\end{aligned}
\tag{3.5}
$$

where net is defined as $net(n) = \mathbf{w}^T(n)\mathbf{x}(n)$, with w(n) and x(n) denoting the adaptive weight vector and the filter input. The symbols $(\cdot)^T$ and $(\cdot)^*$ denote the transpose and quaternion conjugate operators. The terms $y_a(n)$, $y_b(n)$, $y_c(n)$ and $y_d(n)$ are the componentwise outputs of the filter.

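The terms $net_a$, $net_b$, $net_c$ and $net_d$ are real-valued and defined by

$$
\begin{aligned}
net_a(n) &= \Re\{\mathbf{w}^T(n)\mathbf{x}(n)\}; & net_b(n) &= \Im_{\imath}\{\mathbf{w}^T(n)\mathbf{x}(n)\} \\
net_c(n) &= \Im_{\jmath}\{\mathbf{w}^T(n)\mathbf{x}(n)\}; & net_d(n) &= \Im_{\kappa}\{\mathbf{w}^T(n)\mathbf{x}(n)\}
\end{aligned}
\tag{3.6}
$$

where the symbols $\Re(\cdot)$, $\Im_{\imath}(\cdot)$, $\Im_{\jmath}(\cdot)$ and $\Im_{\kappa}(\cdot)$ correspond to the real, ı, ȷ and κ components respectively. The full expressions of these terms are presented in Appendix B.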

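The QMLP-FIR then minimises the cost function (3.1) through a gradient descent weight update specified by

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\nabla_{\mathbf{w}}E(n) \tag{3.7}$$

where µ is the real-valued learning rate and the gradient $\nabla_{\mathbf{w}}E(n)$ is given by

$$
\begin{aligned}
\nabla_{\mathbf{w}}E(n) &= \frac{\partial e_a^2(n)}{\partial\mathbf{w}} + \frac{\partial e_b^2(n)}{\partial\mathbf{w}} + \frac{\partial e_c^2(n)}{\partial\mathbf{w}} + \frac{\partial e_d^2(n)}{\partial\mathbf{w}} \\
&= -2e_a(n)\frac{\partial y_a(n)}{\partial\mathbf{w}} - 2e_b(n)\frac{\partial y_b(n)}{\partial\mathbf{w}} - 2e_c(n)\frac{\partial y_c(n)}{\partial\mathbf{w}} - 2e_d(n)\frac{\partial y_d(n)}{\partial\mathbf{w}}
\end{aligned}
\tag{3.8}
$$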

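The term $\frac{\partial y_a(n)}{\partial\mathbf{w}}$ is calculated by differentiating $y_a$ with respect to w, which yields

$$
\begin{aligned}
\frac{\partial y_a(n)}{\partial\mathbf{w}} &= \frac{\partial y_a(n)}{\partial\mathbf{w}_a} + \frac{\partial y_a(n)}{\partial\mathbf{w}_b}\imath + \frac{\partial y_a(n)}{\partial\mathbf{w}_c}\jmath + \frac{\partial y_a(n)}{\partial\mathbf{w}_d}\kappa \\
&= \Phi_a'(net_a(n))\mathbf{x}_a(n) - \Phi_a'(net_a(n))\mathbf{x}_b(n)\imath - \Phi_a'(net_a(n))\mathbf{x}_c(n)\jmath - \Phi_a'(net_a(n))\mathbf{x}_d(n)\kappa \\
&= \Phi_a'(net_a(n))\mathbf{x}^*(n)
\end{aligned}
\tag{3.9}
$$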

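The expressions for the remaining terms $\frac{\partial y_b}{\partial\mathbf{w}}$, $\frac{\partial y_c}{\partial\mathbf{w}}$ and $\frac{\partial y_d}{\partial\mathbf{w}}$ can be calculated similarly and are derived in Appendix B. Substituting all these terms into (3.8) results in the final weight update

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\big(e(n)\cdot\Phi_s'(net(n))\,\mathbf{x}^*(n)\big) \tag{3.10}$$

where the factor 2 is absorbed into µ.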
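One iteration of a QMLP-FIR style update can be sketched as below. The sketch rests on two assumptions that are not stated in the text above: the symbol “·” in (3.10) is read as the componentwise product of the error and the split derivative, which is the reading suggested by (3.8) and (3.9), and tanh is used for all four component nonlinearities. The helper names and example data are hypothetical.

```python
import math

def qmul(w, x):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (
        wa * xa - wb * xb - wc * xc - wd * xd,
        wa * xb + wb * xa + wc * xd - wd * xc,
        wa * xc - wb * xd + wc * xa + wd * xb,
        wa * xd + wb * xc - wc * xb + wd * xa,
    )

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def qmlp_fir_step(w, x, d, mu):
    """One weight update; returns the new weights and the cost E(n) before the update."""
    net = (0.0, 0.0, 0.0, 0.0)
    for wk, xk in zip(w, x):
        net = tuple(a + b for a, b in zip(net, qmul(wk, xk)))   # net(n) = w^T(n) x(n)
    y = tuple(math.tanh(c) for c in net)                        # split activation
    e = tuple(dc - yc for dc, yc in zip(d, y))                  # componentwise error
    # componentwise product of the error with the split derivative 1 - tanh^2
    scaled = tuple(ec * (1.0 - yc * yc) for ec, yc in zip(e, y))
    w_new = []
    for wk, xk in zip(w, x):
        step = qmul(scaled, qconj(xk))                          # (e . Phi'_s) x*
        w_new.append(tuple(wc + mu * sc for wc, sc in zip(wk, step)))
    cost = sum(ec * ec for ec in e)
    return w_new, cost

w = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
x = [(0.5, -0.2, 0.1, 0.3), (-0.4, 0.6, 0.2, -0.1)]
d = (0.5, -0.3, 0.2, 0.1)
w1, cost0 = qmlp_fir_step(w, x, d, mu=0.1)
_, cost1 = qmlp_fir_step(w1, x, d, mu=0.1)
print(cost0, cost1)
```

With this small step size the cost evaluated on the same input decreases after one update, as expected from a gradient descent step.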

3.2.2 Derivation of the Split Quaternion Adaptive Filtering Algorithm (SQAFA)

For convenience, the derivation of the SQAFA considers the cost function in (3.2), which explicitly takes into consideration the non-commutative nature of the quaternion product¹. The cost function in (3.2) can be rewritten as

$$
\begin{aligned}
E(n) &= \big(d(n) - y(n)\big)\big(d^*(n) - y^*(n)\big) \\
&= d(n)d^*(n) - d(n)y^*(n) - y(n)d^*(n) + y(n)y^*(n)
\end{aligned}
\tag{3.11}
$$

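Taking the error gradient $\nabla_{\mathbf{w}}E(n)$ of (3.11) gives

$$\nabla_{\mathbf{w}}E(n) = -d(n)\nabla_{\mathbf{w}}y^*(n) - \nabla_{\mathbf{w}}y(n)\,d^*(n) + y(n)\nabla_{\mathbf{w}}y^*(n) + \nabla_{\mathbf{w}}y(n)\,y^*(n) \tag{3.12}$$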

The error gradient $\nabla_{\mathbf{w}}E(n)$ in (3.12) explicitly considers the non-commutativity of quaternion algebra. To simplify the derivation of the SQAFA, the odd-symmetry property of elementary transcendental functions (ETF) is applied, given by

$$
\Phi_s^{\prime *}\big(net(n)\big) = \Phi_a'\big(net_a(n)\big) - \Phi_b'\big(net_b(n)\big)\imath - \Phi_c'\big(net_c(n)\big)\jmath - \Phi_d'\big(net_d(n)\big)\kappa = \Phi_s'\big(net^*(n)\big)
\tag{3.13}
$$

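Applying the property in (3.13), the derivations to determine $\nabla_w y(n)$ and $\nabla_w y^*(n)$ can be simplified, resulting in (the derivation is similar to that in Appendix A)

\[
\nabla_w y(n) = \Phi'_s\big(\mathrm{net}(n)\big)\big(-2x^*(n)\big); \qquad \nabla_w y^*(n) = \Phi'^{*}_s\big(\mathrm{net}(n)\big)\big(4x^*(n)\big) \tag{3.14}
\]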

Substituting these gradients into the error gradient $\nabla_w E(n)$ in (3.12) gives the final SQAFA weight update of

\[
w(n+1) = w(n) + \mu\Big(2e(n)\Phi'_s\big(\mathrm{net}^*(n)\big)x^*(n) - \Phi'_s\big(\mathrm{net}(n)\big)x^*(n)e^*(n)\Big) \tag{3.15}
\]
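As an illustration only, a single iteration of the update in (3.15) can be sketched in Python as follows. The sketch assumes that quaternions are stored as 4-tuples, that the split nonlinearity is tanh applied componentwise, that the products between e(n), Φ′s and x∗(n) are Hamilton products, and that Φ′s(net∗(n)) is obtained as the conjugate of Φ′s(net(n)), as stated in (3.13). The helper names (qmul, qconj, qadd, qscale, sqafa_step) are hypothetical and are not taken from the original work.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qconj(q):
    """Quaternion conjugate."""
    return (q[0], -q[1], -q[2], -q[3])

def qadd(p, q):
    """Componentwise sum of two quaternions."""
    return tuple(u + v for u, v in zip(p, q))

def qscale(s, q):
    """Product of a real scalar s and a quaternion q."""
    return tuple(s * v for v in q)

def sqafa_step(w, x, d, mu):
    """One weight update following (3.15); w and x are lists of quaternions."""
    # net(n) = w^T(n) x(n), accumulated with Hamilton products
    net = (0.0, 0.0, 0.0, 0.0)
    for wi, xi in zip(w, x):
        net = qadd(net, qmul(wi, xi))
    y = tuple(math.tanh(v) for v in net)                 # split tanh output
    e = qadd(d, qscale(-1.0, y))                         # e(n) = d(n) - y(n)
    dphi = tuple(1.0 - math.tanh(v) ** 2 for v in net)   # Phi'_s(net(n))
    dphi_c = qconj(dphi)                                 # Phi'_s(net*(n)), assumed via (3.13)
    w_new = []
    for wi, xi in zip(w, x):
        xc = qconj(xi)
        t1 = qscale(2.0, qmul(qmul(e, dphi_c), xc))      # 2 e Phi'_s(net*) x*
        t2 = qmul(qmul(dphi, xc), qconj(e))              # Phi'_s(net) x* e*
        w_new.append(qadd(wi, qscale(mu, qadd(t1, qscale(-1.0, t2)))))
    return w_new, y, e

# Example call with two taps and an arbitrary desired sample
w0 = [(0.1, 0.0, 0.2, 0.0), (0.0, -0.1, 0.0, 0.05)]
x0 = [(0.5, 0.1, -0.2, 0.3), (0.2, 0.0, 0.4, -0.1)]
w1, y1, e1 = sqafa_step(w0, x0, (0.5, -0.2, 0.1, 0.3), 0.01)
```

The order of the factors in the two update terms is kept exactly as in (3.15), since the quaternion product does not commute.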

¹In the quaternion domain H, due to the non-commutativity of the quaternion product, $\nabla_w\big(e(n)e^*(n)\big) \neq \nabla_w\big(e^*(n)e(n)\big)$. The gradient $\nabla_w\big(e(n)e^*(n)\big)$ is chosen as it is quaternion-valued.


3.2.3 Derivation of Adaptive Amplitude Split Quaternion Adaptive Filtering Algorithm (AASQAFA)

Architectures with fixed nonlinearities are not suitable for real-world signals with large

dynamical range. One method of addressing the large dynamics of the signal is through

the implementation of an adaptive slope of the activation function. However, the adaptive

slope of the activation function is interchangeable with the time varying step size of the

learning algorithm, rendering it less effective [56]. To overcome this, a trainable amplitude

of the activation function is implemented [57]. This trainable amplitude was applied to

nonlinear FIR adaptive filter in R [58] and then extended to the recurrent neural network

(RNN) for processing in the complex domain C [59] which yielded superior performance

compared to their counterparts with fixed nonlinearities. Motivated by this, a trainable

amplitude activation function shall be incorporated into the SQAFA, termed the Adaptive

Amplitude Split Quaternion Adaptive Filtering Algorithm (AASQAFA).

The adaptive amplitude of nonlinearity is defined as [57]

\[
\Phi_s\big(w^T(n)x(n)\big) = \lambda(n)\cdot\bar{\Phi}_s\big(w^T(n)x(n)\big) \tag{3.16}
\]

where λ(n) denotes the time varying amplitude and $\bar{\Phi}_s(\cdot)$ the real nonlinearity with unit amplitude applied componentwise.

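In the context of “split quaternion” filtering, this can be formulated as

\[
\begin{aligned}
\Phi_s\big(\mathrm{net}(n)\big) &= \lambda_a(n)\Phi_a\big(\mathrm{net}_a(n)\big) + \lambda_b(n)\Phi_b\big(\mathrm{net}_b(n)\big)\imath + \lambda_c(n)\Phi_c\big(\mathrm{net}_c(n)\big)\jmath + \lambda_d(n)\Phi_d\big(\mathrm{net}_d(n)\big)\kappa \\
&= y_a(n) + y_b(n)\imath + y_c(n)\jmath + y_d(n)\kappa
\end{aligned}
\tag{3.17}
\]

where λa(n) is the amplitude of the nonlinearity for the real part of the quaternion, λb(n) for the ı part, λc(n) for the ȷ part and λd(n) for the κ part.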

The update of the adaptive amplitude is derived based on

λ(n + 1) = λ(n)− ρ∇λE(n) (3.18)


where ρ is a real-valued learning rate.

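The error gradient $\nabla_\lambda E(n)$ is given as

\[
\nabla_\lambda E(n) = \frac{\partial E(n)}{\partial\lambda(n)} = \frac{\partial\big[e(n)e^*(n)\big]}{\partial\lambda(n)} = e(n)\frac{\partial e^*(n)}{\partial\lambda(n)} + \frac{\partial e(n)}{\partial\lambda(n)}e^*(n) \tag{3.19}
\]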

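From (3.17), since each dimension is treated separately, it is convenient to define the corresponding component-wise errors as

\[
\begin{aligned}
e_a(n) &= d_a(n) - \lambda_a(n)\Phi_a\big(\mathrm{net}_a(n)\big); & e_b(n) &= d_b(n) - \lambda_b(n)\Phi_b\big(\mathrm{net}_b(n)\big) \\
e_c(n) &= d_c(n) - \lambda_c(n)\Phi_c\big(\mathrm{net}_c(n)\big); & e_d(n) &= d_d(n) - \lambda_d(n)\Phi_d\big(\mathrm{net}_d(n)\big)
\end{aligned}
\tag{3.20}
\]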

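As the adaptive amplitude is applied component-wise, the error gradient $\nabla_\lambda E(n)$ for each dimension can be optimised separately. For instance, the error gradient with respect to λa, $\nabla_{\lambda_a}E(n)$, is given as

\[
\nabla_{\lambda_a}E(n) = e_a(n)\frac{\partial e_a^*(n)}{\partial\lambda_a(n)} + \frac{\partial e_a(n)}{\partial\lambda_a(n)}e_a^*(n) = -2e_a(n)\Phi_a\big(\mathrm{net}_a(n)\big) \tag{3.21}
\]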

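Similar expressions are obtained for the other three dimensions. Finally, the updates for the amplitudes of all the four nonlinearities are given by

\[
\begin{aligned}
\lambda_a(n+1) &= \lambda_a(n) + \rho\,e_a(n)\Phi_a\big(\mathrm{net}_a(n)\big); & \lambda_b(n+1) &= \lambda_b(n) + \rho\,e_b(n)\Phi_b\big(\mathrm{net}_b(n)\big) \\
\lambda_c(n+1) &= \lambda_c(n) + \rho\,e_c(n)\Phi_c\big(\mathrm{net}_c(n)\big); & \lambda_d(n+1) &= \lambda_d(n) + \rho\,e_d(n)\Phi_d\big(\mathrm{net}_d(n)\big)
\end{aligned}
\tag{3.22}
\]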

Although λ is quaternion-valued, it is derived componentwise because the split-quaternion function processes each component separately. In order to derive λ as a whole, an analytic quaternion function needs to be implemented.
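A minimal Python sketch of the componentwise amplitude update in (3.20) and (3.22) is given below for illustration. It assumes tanh as the unit-amplitude nonlinearity in every dimension and stores the four components as 4-tuples; the function name aasqafa_amplitude_step is hypothetical.

```python
import math

def aasqafa_amplitude_step(lam, net, d, rho):
    """Update the four amplitudes following (3.22).

    lam, net, d are 4-tuples holding the real, i, j and k components;
    rho is the real-valued learning rate of the amplitude.
    """
    phi = tuple(math.tanh(v) for v in net)               # unit-amplitude outputs
    y = tuple(l * p for l, p in zip(lam, phi))           # y_k = lambda_k * Phi_k(net_k)
    e = tuple(dk - yk for dk, yk in zip(d, y))           # component-wise errors (3.20)
    lam_new = tuple(l + rho * ek * pk                    # amplitude update (3.22)
                    for l, ek, pk in zip(lam, e, phi))
    return lam_new, y, e

# Toy example: the desired signal is generated with a true amplitude of 2
# and a fixed net input, so each amplitude should drift from 1 towards 2.
net_fixed = (0.5, -0.7, 0.9, 1.1)
target = tuple(2.0 * math.tanh(v) for v in net_fixed)
lam_est = (1.0, 1.0, 1.0, 1.0)
for _ in range(500):
    lam_est, _, _ = aasqafa_amplitude_step(lam_est, net_fixed, target, 0.4)
print(lam_est)
```

In this toy setting the net input is held fixed, which is a simplification: in the full algorithm the weights are adapted at the same time as the amplitudes.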

3.2.4 Convergence Analysis of SQAFA and AASQAFA

The convergence analysis of the proposed algorithms is based upon the relationship between the a priori and a posteriori errors, and on deriving the stepsize bound which ensures convergence. Following the approach from [60] and [52], consider the first-order Taylor series expansion

\[
\|\bar{e}(n)\|_2^2 = \|e(n)\|_2^2 + \Delta w^H(n)\frac{\partial\|e(n)\|_2^2}{\partial w(n)} \tag{3.23}
\]

where $\bar{e}(n)$, $e(n)$, $\Delta w^H(n)$ and $\partial\|e(n)\|_2^2/\partial w(n)$ are respectively the a posteriori error, the a priori error, the Hermitian of the weight update and the error gradient. The a posteriori output error $\bar{e}$ and the a priori output error $e$ are defined as²

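\[
\bar{e}(n) = d(n) - \Phi_s\big(w^T(n+1)x(n)\big) + \bar{\varepsilon}(n); \qquad e(n) = d(n) - \Phi_s\big(w^T(n)x(n)\big) + \varepsilon(n) \tag{3.24}
\]

The symbols $\bar{\varepsilon}$ and $\varepsilon$ denote quaternion quadruply white Gaussian noise (QWGN) defined as

\[
\varepsilon(n) = \varepsilon_a(n) + \varepsilon_b(n)\imath + \varepsilon_c(n)\jmath + \varepsilon_d(n)\kappa \tag{3.25}
\]

where εa, εb, εc and εd are realisations of real-valued white Gaussian noises (WGN), independent and identically distributed (i.i.d.).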

For the filter to converge, the a priori and the a posteriori errors need to satisfy

\[
\|\bar{e}(n)\|_2^2 < \|e(n)\|_2^2 \tag{3.26}
\]

In the following analysis, three standard assumptions are made:

a) the learning rate µ is small to ensure the deterministic behaviour of the ensemble average learning curves;

b) at convergence, e(n) is statistically independent of x(n) [30];

c) both the a posteriori output error $\bar{e}(n)$ and the a priori output error $e(n)$ are Gaussian.

Applying those assumptions, the final sufficient condition for the convergence of the stepsize µ of SQAFA becomes (full derivation is given in Appendix C)

\[
0 < \mu < \frac{1}{10\,E\big\{x^T(n)x^*(n)\,\|\Phi'\big(w^T(n)x(n)\big)\|_2^2\big\}} \tag{3.27}
\]

²The term $w^T(n)x(n)$ is maintained instead of net(n) throughout the derivation in this subsection to explicitly show the difference between the a posteriori error $\bar{e}(n)$ and the a priori error $e(n)$.

In the case of AASQAFA, each parameter λ controls the amplitude of the nonlinearity in its respective dimension; hence, the convergence analysis is conducted separately for each dimension. In order for AASQAFA to converge, λa(n), λb(n), λc(n) and λd(n) must each converge. The analysis is first illustrated for λa(n).

To understand the convergence property of the AASQAFA, the scalar component of its output, ya in (3.17), is considered first. This modifies the scalar component of the a priori error $e_a(n)$ and of the a posteriori error $\bar{e}_a(n)$ of (3.24) to be

\[
e_a(n) = d_a(n) - \lambda_a(n)\Phi_a\big(w^T(n)x(n)\big) + \varepsilon(n); \qquad \bar{e}_a(n) = d_a(n) - \lambda_a(n)\Phi_a\big(w^T(n+1)x(n)\big) + \bar{\varepsilon}(n) \tag{3.28}
\]

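Similarly, using the same procedure as for the convergence of SQAFA, the bounds on λa(n), λb(n), λc(n) and λd(n) can be found as (derivations are given in Appendix D)

\[
\begin{aligned}
0 &< \lambda_a^2(n) < \frac{1}{2\mu\,E\big\{x^T(n)x^*(n)\,\|\Phi'_a\big(w^T(n)x(n)\big)\|_2^2\big\}} \\
0 &< \lambda_b^2(n) < \frac{1}{2\mu\,E\big\{x^T(n)x^*(n)\,\|\Phi'_b\big(w^T(n)x(n)\big)\|_2^2\big\}} \\
0 &< \lambda_c^2(n) < \frac{1}{2\mu\,E\big\{x^T(n)x^*(n)\,\|\Phi'_c\big(w^T(n)x(n)\big)\|_2^2\big\}} \\
0 &< \lambda_d^2(n) < \frac{1}{2\mu\,E\big\{x^T(n)x^*(n)\,\|\Phi'_d\big(w^T(n)x(n)\big)\|_2^2\big\}}
\end{aligned}
\tag{3.29}
\]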

3.3 Simulations

Simulations were performed in an M-step prediction setting and provide a comprehensive comparison between the nonlinear FIR filters trained with SQAFA, AASQAFA, QMLP-FIR, Complex Nonlinear Gradient Descent (CNGD) [38], real-valued Nonlinear Gradient Descent (NGD) [38] and the training algorithm for the Quaternion valued Multilayer Perceptron (QMLP) [16]. The SQAFA, AASQAFA, QMLP-FIR, CNGD and NGD were implemented with a filter length p whereas the QMLP had one hidden layer comprising p inputs, three hidden neurons and one output neuron. The nonlinear function was the tanh function applied component-wise. The original QMLP applied the unipolar logistic function as the nonlinearity, whereas the QMLP algorithm implemented in our simulations applied the bipolar tanh function, which was better suited to the dynamic range of the data. This is justified by [61], which proves that interchanging the nonlinearity would not lead to a significant deviation in performance. In the experiments, the amplitudes of input signals in each dimension were scaled to within the range [-0.8, 0.8]. The step size of the adaptive amplitude was chosen to be ρ=0.4 with an initial amplitude λ(0)=1 for all experiments. A total of 20 independent simulation trials were conducted and averaged. These values were chosen to ensure optimal performance of the algorithms considered.

Figure 3.2: Left: The 4D Saito Signal. Right: The 3D wind signal. [Panel (a), 4D Saito Signal: components X1, Y1, X2 and Y2 against time (samples). Panel (b), Wind Signal: East, North and Vertical directions (m/s) against time (samples).]

Figure 3.3: The performance of SQAFA, AASQAFA and QMLP on the prediction of 4D Saito’s Chaotic Signal. [Panel (a): dependence of prediction gain (dB) on prediction horizon M and filter length p. Panel (b): dependence of prediction gain (dB) on stepsize µ and filter length p.]

The standard prediction gain Rp was used as a quantitative measure of performance, defined as [62]

\[
R_p = 10\log_{10}\frac{\sigma_x^2}{\sigma_e^2} \tag{3.30}
\]

where $\sigma_x^2$ and $\sigma_e^2$ denote the estimated variance of the input and error respectively. The variances were estimated according to

\[
\sigma_x^2 = E\{x_a^2 + x_b^2 + x_c^2 + x_d^2\}; \qquad \sigma_e^2 = E\{e_a^2 + e_b^2 + e_c^2 + e_d^2\} \tag{3.31}
\]

where E{·} denotes the statistical expectation operator, $x_a^2$, $x_b^2$, $x_c^2$, $x_d^2$ are the corresponding squared components of the input signal, and similarly $e_a^2$, $e_b^2$, $e_c^2$, $e_d^2$ are the squared error components. All these values were measured at the steady-state.
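For illustration, the prediction gain in (3.30) and (3.31) can be estimated from recorded samples as in the Python sketch below. It assumes that the statistical expectation is replaced by a sample mean over the steady-state samples supplied by the caller, and that quaternion samples are stored as 4-tuples; the function name prediction_gain is hypothetical.

```python
import math

def prediction_gain(x, e):
    """Estimate R_p in dB from quaternion input samples x and error samples e.

    x and e are sequences of 4-tuples (real, i, j, k components). The
    expectation in (3.31) is approximated by the sample mean (an assumption).
    """
    var_x = sum(sum(c * c for c in q) for q in x) / len(x)
    var_e = sum(sum(c * c for c in q) for q in e) / len(e)
    return 10.0 * math.log10(var_x / var_e)

# Toy example: an error ten times smaller in amplitude than the input
# corresponds to a power ratio of 100.
x_samples = [(1.0, 1.0, 1.0, 1.0)] * 8
e_samples = [(0.1, 0.1, 0.1, 0.1)] * 8
rp = prediction_gain(x_samples, e_samples)
print(rp)
```

Only samples taken after the initial transient should be passed in, since the values in the text were measured at the steady state.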

Two quaternion valued processes were considered: the synthetic benchmark four-

dimensional Saito’s Chaotic Signal [63] and the real-world three-dimensional wind field

(pure quaternion).

Figure 3.4: The performance of SQAFA, AASQAFA and QMLP on the prediction of 3D wind signal. [Panel (a): dependence of prediction gain (dB) on M and p. Panel (b): dependence of prediction gain (dB) on µ and p.]

3.3.1 Four-dimensional Saito’s Chaotic Circuit

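The Saito’s chaotic circuit is governed by four state variables x1, y1, x2, y2 and five parameters η, α1, α2, β1, β2, and is given by [63]

\[
\begin{bmatrix} \dfrac{\partial x_1}{\partial\tau} \\[2mm] \dfrac{\partial y_1}{\partial\tau} \end{bmatrix}
=
\begin{bmatrix} -1 & 1 \\ -\alpha_1 & -\alpha_1\beta_1 \end{bmatrix}
\begin{bmatrix} x_1 - \eta\rho_1 h(z) \\ y_1 - \eta\dfrac{\rho_1}{\beta_1}h(z) \end{bmatrix}
\tag{3.32}
\]

\[
\begin{bmatrix} \dfrac{\partial x_2}{\partial\tau} \\[2mm] \dfrac{\partial y_2}{\partial\tau} \end{bmatrix}
=
\begin{bmatrix} -1 & 1 \\ -\alpha_2 & -\alpha_2\beta_2 \end{bmatrix}
\begin{bmatrix} x_2 - \eta\rho_2 h(z) \\ y_2 - \eta\dfrac{\rho_2}{\beta_2}h(z) \end{bmatrix}
\tag{3.33}
\]

where τ is the time constant of the chaotic circuit and h(z) is the normalized hysteresis value which is given as [63]

\[
h(z) = \begin{cases} 1, & z \geq -1 \\ -1, & z \leq 1 \end{cases} \tag{3.34}
\]

The symbols z, ρ1 and ρ2 are given as

\[
z = x_1 + x_2; \qquad \rho_1 = \frac{\beta_1}{1-\beta_1}; \qquad \rho_2 = \frac{\beta_2}{1-\beta_2} \tag{3.35}
\]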

The Saito’s chaotic signal used is generated with the following standard parameters: η=1.3, α1=7.5, α2=15, β1=0.16 and β2=0.097. As chaotic signals are sensitive to initial conditions, these values ensure that the Saito’s chaotic signal exhibits chaotic behaviour. Figure 3.2(a) shows the 4D Saito’s signal dimension-wise.

Figure 3.3 illustrates the performance of the algorithms considered as a function of the prediction horizon M (with $\mu = 10^{-2}$), and as a function of stepsize µ (with the prediction horizon M=1). From Figure 3.3, it can be seen that AASQAFA and SQAFA have similar performance and that both outperform the QMLP.

3.3.2 Wind Forecasting

In the next simulation, a three-dimensional wind field was used as an input.³ The wind data was initially sampled at 50 Hz, but resampled at 5 Hz for simulation purposes. Figure 3.2(b) shows the three-dimensional wind data dimension-wise.

Figure 3.4 depicts the performance of SQAFA, AASQAFA and QMLP as a function

of the prediction horizon M and stepsize µ. The prediction gain for SQAFA was better

than that of QMLP in both case studies (varying learning rate and prediction horizon),

thus indicating the benefits of fully exploiting the quaternion algebra. The performance of

AASQAFA was superior to that of SQAFA, due to its adaptive amplitude which follows

the dynamics of the wind signal more closely.

Figure 3.5 shows the comparison between SQAFA, the learning algorithm for QMLP applied to the FIR filter (QMLP-FIR), CNGD, and NGD as a function of prediction horizon M and stepsize µ. The performance gain for SQAFA was higher than that for QMLP-FIR, followed by those of the NGD algorithm and CNGD algorithm. When using the same FIR architecture, the SQAFA has an improved performance over the QMLP-FIR, highlighting the advantage of taking into consideration the non-commutativity of quaternion algebra. Moreover, both quaternion-based algorithms proved to be better than their complex- and real-valued counterparts.

³The wind data is obtained from Prof. K. Aihara and his team at the Institute for Industrial Science, University of Tokyo, in an urban environment.

Figure 3.5: The performance of SQAFA, QMLP-FIR, CNGD and NGD on the prediction of 3D wind signal. [Panel (a): dependence of prediction gain (dB) on M and p. Panel (b): dependence of prediction gain (dB) on µ and p.]

3.4 Discussion

The performance of the SQAFA was generally better than that of QMLP, as it takes into account more complete information about the statistics of the multidimensional signal. The AASQAFA, on the other hand, outperformed SQAFA due to its ability to better track the dynamics of the signal. The QMLP was less affected by the length of the prediction horizon and the filter length as compared to the SQAFA and AASQAFA. The deterioration of the QMLP prediction gain Rp with the increase of prediction horizon M is almost negligible due to the structural richness of the multilayer neural network (NN) compared to the single layer FIR architecture of SQAFA and AASQAFA. The H domain algorithms outperformed the algorithms in the complex domain C and the real domain R, indicating that quaternion-based signal processing is a better choice for the processing of three-dimensional and hypercomplex processes.

Algorithms      Additions    Multiplications
1× QMLP         96p+168      108p+216
1× SQAFA        54p+15       68p+24
1× AASQAFA      54p+19       68p+36
1× QMLP-FIR     28p+15       36p+20
2× CNGD         16p+4        24p+8
4× NGD          8p+4         12p+4

Table 3.1: Computational complexities of the algorithms

Another aspect that needs to be addressed is the computational complexity of the algorithms, which is summarised in Table 3.1. The computational complexities of AASQAFA and SQAFA are both O(68p), and that of QMLP is O(108p); when p = 1, the computational complexity of QMLP is more than twice that of SQAFA and AASQAFA. Since the computational complexities of the SQAFA and AASQAFA are similar, AASQAFA is the preferable choice due to its superior performance. The computational complexity of the QMLP-FIR is O(36p), that of the CNGD is O(24p), and that of the NGD is O(12p). The computational complexity of SQAFA and AASQAFA is less than two times that of QMLP-FIR, nearly three times that of CNGD and almost seven times that of NGD. Hence, there is a trade-off between a higher computational complexity and an improvement in performance.
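As a small worked example, the operation counts of Table 3.1 can be evaluated for a given filter length p with the Python sketch below. The coefficients are copied from Table 3.1; the dictionary layout and the function name operation_counts are hypothetical.

```python
# Coefficients from Table 3.1, stored as
# (additions per tap, additions offset, multiplications per tap, multiplications offset)
TABLE_3_1 = {
    "QMLP":     (96, 168, 108, 216),
    "SQAFA":    (54, 15, 68, 24),
    "AASQAFA":  (54, 19, 68, 36),
    "QMLP-FIR": (28, 15, 36, 20),
    "CNGD":     (16, 4, 24, 8),
    "NGD":      (8, 4, 12, 4),
}

def operation_counts(name, p):
    """Return (additions, multiplications) for algorithm `name` and filter length p."""
    add_slope, add_offset, mul_slope, mul_offset = TABLE_3_1[name]
    return add_slope * p + add_offset, mul_slope * p + mul_offset

# Counts for p = 1, the case discussed in the text
for name in TABLE_3_1:
    additions, multiplications = operation_counts(name, 1)
    print(f"{name:9s} additions={additions:4d} multiplications={multiplications:4d}")
```

For p = 1 this gives 324 multiplications for QMLP against 92 for SQAFA, in line with the statement that QMLP costs more than twice as much as SQAFA in that case.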

The QMLP utilising the split-quaternion function was proven to be a universal approximator in [16]. Specifically, it was shown that a universal approximator for quaternion functions must be in the form of [16]

\[
f(x) \approx \sum_{i=1}^{N} C_i\,\Phi_s\big(w^T(n)x(n) + \theta(n)\big) \tag{3.36}
\]

where f(x) is a quaternion-valued function to be approximated, Ci is a quaternion-valued variable, Φs(·) is a split quaternion sigmoidal function, w(n) is the quaternion weight vector, x(n) is the quaternion input vector and θ(n) is the quaternion-valued bias term.

Figure 3.6: Prediction gain of AASQAFA for the varying initial amplitude λ(0) and stepsize ρ. [Surface plot: dependence of AASQAFA prediction gain (dB) on parameters ρ and λ(0).]

Equation (3.36) conforms to the earlier findings of [1], who stated that any continuous

function can be approximated by the superposition of N sigmoidal functions. In the

context of the SQAFA and AASQAFA, N = 1, and therefore if SQAFA and AASQAFA

are extended to a neural network architecture, their approximation capabilities become

those of a universal approximator.

Figure 3.6 illustrates the dependence of the prediction gain of AASQAFA on the

initial amplitude λ(0) and step size ρ. It is shown that AASQAFA is robust to the initial

state λ(0) and the learning rate ρ for the realistic range of 0 < λ(0) < 3 and 0.1 ≤ ρ ≤ 1.5.

In summary, the advantages of SQAFA and AASQAFA are:

a) Taking into account the non-commutativity of the quaternion product leads to more efficient use of the available statistics and improved performance;

b) AASQAFA caters for the changes in dynamical range of the signals, resulting in a performance enhancement;

c) AASQAFA is robust to the choice of initial amplitude λ(0) and learning rate ρ.

3.5 Summary

A class of stochastic gradient algorithms (SQAFA and AASQAFA) for the training of

quaternion valued nonlinear adaptive finite impulse response (FIR) filters has been pro-

posed. The learning algorithm for the training of QMLP proved inadequate for modelling

the hypercomplex processes considered (four-dimensional Saito’s chaotic signal and three-

dimensional wind signal) due to the strong coupling between each dimension. Furthermore,

multiple univariate NGD and a pair of complex NGD (CNGD) were also considered, but

yielded poorer performance compared to both the QMLP and the SQAFA algorithms. The

split-quaternion nonlinear function was next employed, as there are no known analytic ex-

tensions of elementary transcendental functions from C to H, due to the violation of the

Cauchy-Riemann-Fueter (CRF) conditions. The derivations of the SQAFA and AASQAFA

have taken into account the non-commutativity of the quaternion product, and have been

simplified by making use of the odd-symmetry property of elementary transcendental func-

tions applied component-wise. A rigorous stability analysis has provided the range of the

stepsizes for SQAFA and AASQAFA, and has established the relationship between the

adaptive amplitude and the stepsize of the AASQAFA. The proposed algorithms (SQAFA

and AASQAFA) have been shown to exhibit excellent performance on the prediction of

quaternion valued real-world vector fields. The AASQAFA achieved better performance

due to its enhanced ability to track the time varying dynamics of the input signals.


Chapter 4

A Class of Quaternion Valued

Nonlinear Adaptive Filters

In the previous chapter, it has been shown that considering the non-commutativity of

quaternion algebra in deriving a new class of algorithms leads to an improved performance.

However, the nonlinearity used is the split-quaternion nonlinearity which does not take

into full consideration the available correlations between the dimensions. The use of the split-quaternion functions was necessary as there are no globally analytic quaternion nonlinear functions, as dictated by the Cauchy-Riemann-Fueter (CRF) equations.

This chapter aims to propose a class of nonlinear quaternion-valued adaptive fil-

tering algorithms based on locally analytic nonlinear activation functions. To circumvent

the stringent standard analyticity conditions of CRF which are prohibitive to the develop-

ment of nonlinear adaptive quaternion-valued estimation models, the fact that stochastic

gradient learning algorithms require only local analyticity at the operating point in the

estimation space is exploited. It is shown that the quaternion-valued exponential function

is locally analytic, and since local analyticity extends to polynomials, products and ratios,

it is shown that a class of transcendental nonlinear functions can serve as activation func-

tions in nonlinear and neural adaptive models. This provides a unifying framework for

the derivation of gradient based learning algorithms in the quaternion domain H, and the

derived algorithms are shown to have the same generic form as their real- and complex-valued counterparts. To make such models second-order optimal for the generality of quaternion signals (both circular and noncircular), recent developments in augmented quaternion statistics are employed to introduce widely linear versions of the proposed nonlinear adaptive quaternion valued filters. This allows the second-order information in the data, contained in both the covariance and the pseudocovariances, to be fully exploited, catering rigorously for second-order noncircularity (improperness) and the corresponding power mismatch in the signal components. Simulations over a range of circular and noncircular synthetic processes and a real world three-dimensional noncircular wind signal support the approach.

4.1 Introduction

Although quaternion nonlinear functions have been implemented, for example in the Quaternion Independent Component Analysis (ICA) algorithm [64], the analyticity of such functions has not been rigorously examined. The very stringent Cauchy-Riemann-Fueter (CRF)

conditions [25] ensure that the only globally analytic quaternion-valued functions are linear

functions and constants. This is a serious obstacle as the CRF conditions prevent us from

choosing the standard nonlinear activation functions (tanh, logistic) as the nonlinearities

in nonlinear quaternion-valued adaptive estimation.

It is important to notice that most practical gradient based learning algorithms [14–

16] only require local analyticity at a point. In analogy to the complex domain C, where

so called fully complex nonlinearities (elementary transcendental functions) provide means

for generic extensions of real neural networks [12, 38], our aim is to show that the class

of elementary transcendental functions, such as tanh are locally analytic in H and thus

permit generalisation of neural networks (NN) to the quaternion domain. This is not

possible to achieve using the standard Cauchy-Riemann-Fueter (CRF) conditions [25],

which are too restrictive. To this end, recent results on local analyticity [27] are exploited and, through a cumbersome derivation, the possibility of building generic quaternion-valued nonlinear adaptive filters for the most commonly used activation functions, such as tanh, is shown analytically. The derivation involves proving local analyticity of exponential functions and their ratios, thus enabling the local analyticity for transcendental nonlinear activation functions in H. Based on this set of results, the nonlinear adaptive filtering and neural network paradigm in H is then established as a natural generalisation, in a similar way to that in R and C [38, 39, 65–69].

In this work, a class of fully quaternion locally analytic nonlinear functions suitable

for quaternion-valued nonlinear adaptive filtering is introduced. It is also shown that

full second-order statistical information in the quaternion domain can be exploited by

combining the proposed nonlinear models with so called augmented quaternion statistics

and the widely linear model [42, 43]. For simplicity, the analysis and derivations are

provided for a single nonlinear perceptron and its widely linear counterpart.

This chapter is organised as follows. Section 4.2 reviews the local analyticity con-

dition (LAC) followed by the analysis of the quaternion exponential function and quater-

nion tanh function. Section 4.3 derives the proposed learning algorithms and their widely

linear counterparts followed by their convergence analyses. Section 4.4 compares the per-

formances of the proposed algorithms against the existing algorithms of the kind. The

results are discussed in Section 4.5 whereas the chapter concludes in Section 4.6.

4.2 Fully Quaternion Functions in H

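Analyticity in H is governed by the Cauchy-Riemann-Fueter (CRF) conditions given by [25]

\[
\frac{\partial f(q)}{\partial q_a} + \frac{\partial f(q)}{\partial q_b}\imath + \frac{\partial f(q)}{\partial q_c}\jmath + \frac{\partial f(q)}{\partial q_d}\kappa = 0 \;\Leftrightarrow\; \frac{\partial f(q)}{\partial q^*} = 0 \tag{4.1}
\]

The CRF conditions are too strict and are only satisfied by linear quaternion functions and constants [25], prohibiting the development of quaternion-valued nonlinear signal processing.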

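To relax the CRF, a “local” analyticity condition was proposed in [27], by using a complex representation of a quaternion to give

\[
\frac{\partial f}{\partial q_a} = -\frac{\partial f}{\partial\alpha}\zeta \tag{4.2}
\]

where ζ and α are given by

\[
\zeta = \frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}; \qquad \alpha = \sqrt{q_b^2 + q_c^2 + q_d^2} \tag{4.3}
\]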

The term “local” here refers to the fact that this representation uses an “imaginary” unit ζ

which depends on the values of qb, qc and qd [27]. The local analyticity condition only

guarantees the first-order differentiability of the single variable quaternion functions at the

current operating point. This is perfectly adequate for quaternion valued gradient descent

adaptive filtering algorithms, as they only require the information about the gradient

value at a point. Furthermore, the local analyticity condition has only a single definition

for analyticity eliminating the ambiguity of having left and right derivatives previously

suffered by the CRF conditions. An attractive aspect of the quaternion function satisfying

the local analyticity condition is that it is also a solution to the Fueter third-order analytic

conditions [27].

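To provide a rigorous basis for nonlinear quaternion-valued adaptive filtering, the fully quaternion nonlinearities in H are identified. A function that satisfies the local analyticity condition in (4.2) is termed a ‘fully quaternion nonlinearity’, in the sense of local analyticity. Due to the “local” nature of the first-order differentiability, the quaternionic derivative at a point is dependent on the direction of the ζ-plane. The analyticity of a function at a given point is evaluated by analysing the local derivative within the ζ-plane (with ζ fixed) to obtain the relationship [27]

\[
\begin{aligned}
\frac{\partial f}{\partial\alpha} &= \frac{\partial q_b}{\partial\alpha}\frac{\partial f}{\partial q_b} + \frac{\partial q_c}{\partial\alpha}\frac{\partial f}{\partial q_c} + \frac{\partial q_d}{\partial\alpha}\frac{\partial f}{\partial q_d} \\
\alpha\frac{\partial f}{\partial\alpha} &= q_b\frac{\partial f}{\partial q_b} + q_c\frac{\partial f}{\partial q_c} + q_d\frac{\partial f}{\partial q_d}
\end{aligned}
\tag{4.4}
\]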

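Based on this relationship, along with ζ and α in (4.3), the right hand side of the analyticity condition in (4.2) is expanded along the orthogonal-axis vectors ı, ȷ and κ as

\[
-\left(\frac{\partial f}{\partial\alpha}\right)\big(\zeta\big) = -\left(\frac{q_b}{\alpha}\frac{\partial f}{\partial q_b} + \frac{q_c}{\alpha}\frac{\partial f}{\partial q_c} + \frac{q_d}{\alpha}\frac{\partial f}{\partial q_d}\right)\left(\frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}\right) \tag{4.5}
\]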

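By analogy with C, this yields the characteristics of a fully quaternion locally analytic nonlinearity suitable for gradient based learning, given by

a) $f(q) = u(q_a, \alpha) + v(q_a, \alpha)\zeta$ is nonlinear in $q_a$ and $\alpha$;

b) f(q) has no singularities and is always bounded for all values of q;

c) The partial derivatives $\partial u/\partial q_a$, $\partial v/\partial\alpha$, $\partial v/\partial q_a$ and $\partial u/\partial\alpha$ are continuous and bounded;

d) $\dfrac{\partial u}{\partial q_a}\dfrac{\partial v}{\partial\alpha} \neq \dfrac{\partial v}{\partial q_a}\dfrac{\partial u}{\partial\alpha}$ to ensure continuous learning.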

The next subsection focuses on the analyticity of the quaternion exponential function eq,

as it serves as a building block to construct transcendental nonlinear quaternion functions,

typically used as nonlinear activation functions.

4.2.1 Quaternion Exponential Function

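The notion of the exponential function in H is not straightforward. Due to the non-commutativity of the quaternion product, there exist several definitions of the quaternion exponential [70]; for convenience, the following exponential function is considered ([71], p. 9)

\[
e^q = e^{q_a + q_b\imath + q_c\jmath + q_d\kappa} = e^{q_a}e^{q_b\imath + q_c\jmath + q_d\kappa} \tag{4.6}
\]

Expanding the term $e^q$ using the Euler formula leads to

\[
\begin{aligned}
e^q &= e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big) \\
&= e^{q_a}\left(\cos(\alpha) + \frac{q_b\sin(\alpha)\imath}{\alpha} + \frac{q_c\sin(\alpha)\jmath}{\alpha} + \frac{q_d\sin(\alpha)\kappa}{\alpha}\right)
\end{aligned}
\tag{4.7}
\]

where α and ζ are defined in (4.3).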

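To examine whether such a quaternion exponential function satisfies the analyticity condition in (4.2), (4.7) is differentiated with respect to $q_a$ to give the left hand side of (4.2), that is

\[
\frac{\partial e^q}{\partial q_a} = e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big) \tag{4.8}
\]

Next, (4.7) is differentiated with respect to α to obtain the right hand side of (4.2) as

\[
-\frac{\partial e^q}{\partial\alpha}\zeta = -\left(\frac{q_b}{\alpha}\frac{\partial e^q}{\partial q_b} + \frac{q_c}{\alpha}\frac{\partial e^q}{\partial q_c} + \frac{q_d}{\alpha}\frac{\partial e^q}{\partial q_d}\right)\left(\frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}\right) \tag{4.9}
\]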


The result of such differentiation is given by (see Appendix E for a full derivation)

\[
-\frac{\partial e^q}{\partial\alpha}\zeta = e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big) \tag{4.10}
\]

Therefore, this quaternion exponential function satisfies the analyticity condition in (4.2), giving the local derivative of the exponential function as

\[
\frac{\partial e^q}{\partial q} = e^q \tag{4.11}
\]

Observe that, as desired, this result represents a generic extension of the real and complex

derivatives of an exponential. In addition, as gradient based learning algorithms are local,

this result provides a basis for introducing other nonlinearities, such as the elementary

transcendental functions (ETF), as a vehicle for a class of fully quaternion nonlinear

adaptive filters.
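The Euler form (4.7) and the derivative result (4.8) can be checked numerically with the short Python sketch below, offered as an illustration rather than as part of the derivation. It assumes quaternions stored as 4-tuples and uses a central finite difference in the real part qa; the function names qexp and real_direction_derivative are hypothetical.

```python
import cmath
import math

def qexp(q):
    """Quaternion exponential via the Euler form (4.7); q = (qa, qb, qc, qd)."""
    qa, qb, qc, qd = q
    alpha = math.sqrt(qb * qb + qc * qc + qd * qd)
    scale = math.exp(qa)
    if alpha == 0.0:
        # purely real argument: zeta is undefined, but sin(alpha) = 0
        return (scale, 0.0, 0.0, 0.0)
    s = scale * math.sin(alpha) / alpha
    return (scale * math.cos(alpha), s * qb, s * qc, s * qd)

def real_direction_derivative(f, q, h=1e-6):
    """Central finite difference of f with respect to the real part qa."""
    q_plus = (q[0] + h,) + tuple(q[1:])
    q_minus = (q[0] - h,) + tuple(q[1:])
    return tuple((u - v) / (2.0 * h) for u, v in zip(f(q_plus), f(q_minus)))

q0 = (0.3, 0.5, -0.2, 0.4)
value = qexp(q0)
slope = real_direction_derivative(qexp, q0)   # numerically close to qexp(q0), cf. (4.8)

# With qc = qd = 0 the result should reduce to the complex exponential
z = cmath.exp(complex(0.3, 0.7))
reduced = qexp((0.3, 0.7, 0.0, 0.0))
print(value, slope, reduced, z)
```

The finite difference only probes the left hand side of (4.2); the equality with the right hand side is the subject of (4.9) and (4.10).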

4.2.2 Local Analyticity of the Quaternion tanh Function

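Similarly to the complex domain, tanh(q) in H can be defined as

\[
\tanh(q) = \frac{e^{2q} - 1}{e^{2q} + 1} \tag{4.12}
\]

Proceeding in a similar manner as when addressing the analyticity of $e^q$, tanh(q) is first expanded using the Euler formula in (4.7), leading to

\[
\begin{aligned}
\tanh(q) &= \frac{e^{2q_a}\cos(2\alpha) - 1 + e^{2q_a}\sin(2\alpha)\zeta}{e^{2q_a}\cos(2\alpha) + 1 + e^{2q_a}\sin(2\alpha)\zeta} \\
&= \frac{e^{4q_a}\big(\cos^2(2\alpha) + \sin^2(2\alpha)\big) - 1 + 2e^{2q_a}\sin(2\alpha)\zeta}{e^{4q_a}\big(\cos^2(2\alpha) + \sin^2(2\alpha)\big) + 1 + 2e^{2q_a}\cos(2\alpha)} \\
&= \frac{e^{4q_a} - 1 + 2e^{2q_a}\sin(2\alpha)\zeta}{e^{4q_a} + 1 + 2e^{2q_a}\cos(2\alpha)}
\end{aligned}
\tag{4.13}
\]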

To prove the local analyticity, the left hand side of (4.2) is obtained by differentiating (4.13) with respect to $q_a$, and the right hand side of (4.2) is obtained by differentiating (4.13) with respect to α, resulting in (a detailed derivation is given in Appendix F)

\[
\begin{aligned}
\frac{\partial\tanh(q)}{\partial q_a} &= \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2} + \frac{4e^{2q_a}\sin(2\alpha) - 4e^{6q_a}\sin(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\zeta && (4.14) \\
&= -\frac{\partial\tanh(q)}{\partial\alpha}\zeta && (4.15)
\end{aligned}
\]

thus illustrating that tanh(q) is a locally analytic quaternion function.

The expression for a local derivative of tanh(q) is obtained analogously to the complex case; sech(q) is first defined as

\[
\mathrm{sech}(q) = \frac{2}{e^{q} + e^{-q}} \tag{4.16}
\]

Expanding (4.16) into its Euler form and then squaring (the full derivation can be found in Appendix G) results in

\[
\mathrm{sech}^2(q) = \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2} + \frac{-4e^{6q_a}\sin(2\alpha) + 4e^{2q_a}\sin(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\zeta \tag{4.17}
\]

A comparison of the expression for $\mathrm{sech}^2(q)$ in (4.17) with $\partial\tanh(q)/\partial q_a = -(\partial\tanh(q)/\partial\alpha)\zeta$ in (4.14) shows that they are equivalent; therefore, a generic extension of the real and complex tanh function has been introduced to the quaternion domain, whose derivative is

\[
\frac{\partial\tanh(q)}{\partial q} = \mathrm{sech}^2(q) \tag{4.18}
\]
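The closed forms (4.13) and (4.17) can be evaluated and compared numerically with the Python sketch below, given for illustration only. It assumes quaternions stored as 4-tuples and does not guard against the points where the common denominator vanishes (qa = 0 with cos(2α) = −1); the function names qtanh and qsech2 and the underscore-prefixed helpers are hypothetical.

```python
import cmath
import math

def _parts(q):
    """Return alpha, E = exp(2 qa) and the common denominator used in (4.13) and (4.17)."""
    qa, qb, qc, qd = q
    alpha = math.sqrt(qb * qb + qc * qc + qd * qd)
    E = math.exp(2.0 * qa)
    denom = E * E + 2.0 * E * math.cos(2.0 * alpha) + 1.0
    return alpha, E, denom

def _assemble(u, v, q, alpha):
    """Return u + v*zeta with zeta = (qb i + qc j + qd k) / alpha."""
    if alpha == 0.0:
        return (u, 0.0, 0.0, 0.0)
    return (u, v * q[1] / alpha, v * q[2] / alpha, v * q[3] / alpha)

def qtanh(q):
    """Quaternion tanh from the closed form (4.13)."""
    alpha, E, denom = _parts(q)
    u = (E * E - 1.0) / denom
    v = 2.0 * E * math.sin(2.0 * alpha) / denom
    return _assemble(u, v, q, alpha)

def qsech2(q):
    """Quaternion sech^2 from the closed form (4.17)."""
    alpha, E, denom = _parts(q)
    c, s = math.cos(2.0 * alpha), math.sin(2.0 * alpha)
    u = (4.0 * E**3 * c + 8.0 * E**2 + 4.0 * E * c) / denom**2
    v = (4.0 * E * s - 4.0 * E**3 * s) / denom**2
    return _assemble(u, v, q, alpha)

# Central finite difference of tanh(q) in the real part qa, to compare with (4.14)
q0 = (0.2, 0.3, -0.1, 0.25)
h = 1e-6
fd = tuple((a - b) / (2.0 * h)
           for a, b in zip(qtanh((q0[0] + h,) + q0[1:]), qtanh((q0[0] - h,) + q0[1:])))
print(fd, qsech2(q0))

# With qc = qd = 0 both functions should reduce to their complex counterparts
z = complex(0.3, 0.4)
print(qtanh((0.3, 0.4, 0.0, 0.0)), cmath.tanh(z), qsech2((0.3, 0.4, 0.0, 0.0)), 1.0 / cmath.cosh(z) ** 2)
```

The finite-difference values agree with qsech2 to within the accuracy of the difference scheme, which is a numerical counterpart of the comparison made between (4.14) and (4.17).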

4.3 Derivation of Fully Quaternion Algorithms

Similar to the class of algorithms derived in Section 3.2, the cost function that the quaternion-valued adaptive filtering algorithms minimise is given by

\[
E(n) = e(n)e^*(n) \tag{4.19}
\]

where e(n) = d(n) − y(n), with d(n) and y(n) denoting respectively the desired signal and the output signal. The symbol (·)∗ is the conjugate operator.


4.3.1 Derivation of Quaternion Nonlinear Gradient Descent (QNGD)

To introduce the Quaternion Nonlinear Gradient Descent (QNGD) algorithm for the finite impulse response (FIR) filter that employs a fully quaternion nonlinear activation function, consider the output y(n) and its conjugate y∗(n) given by

\[
y(n) = \Phi\big(w^T(n)x(n)\big) = \Phi\big(\mathrm{net}(n)\big); \qquad y^*(n) = \Phi\big(x^H(n)w^*(n)\big) = \Phi\big(\mathrm{net}^*(n)\big) \tag{4.20}
\]

where (·)T is the transpose operator, (·)H is the Hermitian, and Φ(·) is the fully quaternion nonlinearity such as the tanh(q) introduced in Section 4.2.2. Proceeding similarly to Section 3.2, the cost function (4.19) is expressed as

\[
\begin{aligned}
E(n) &= \big(d(n) - y(n)\big)\big(d^*(n) - y^*(n)\big) \\
&= d(n)d^*(n) - d(n)y^*(n) - y(n)d^*(n) + y(n)y^*(n)
\end{aligned}
\tag{4.21}
\]

The error gradient ∇wE(n) of QNGD is then calculated as

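$$
\nabla_{\mathbf{w}}E(n) = -d(n)\nabla_{\mathbf{w}}y^*(n) - \nabla_{\mathbf{w}}y(n)\,d^*(n) + y(n)\nabla_{\mathbf{w}}y^*(n) + \nabla_{\mathbf{w}}y(n)\,y^*(n) \tag{4.22}
$$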

To simplify the derivation of QNGD further, the odd-symmetry property of locally analytic quaternion elementary transcendental functions (ETF) is applied, which is given by

$$
\Phi'^{*}\big(\mathrm{net}(n)\big) = \Phi'\big(\mathrm{net}^*(n)\big) \tag{4.23}
$$

Exercising (4.23), the expressions for ∇wy(n) and ∇wy∗(n) are given by (similar to the

derivations in Appendix A)

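$$
\nabla_{\mathbf{w}}y(n) = -\Phi'\big(\mathrm{net}(n)\big)\,2\mathbf{x}^*(n); \qquad \nabla_{\mathbf{w}}y^*(n) = \Phi'^{*}\big(\mathrm{net}(n)\big)\,4\mathbf{x}^*(n) \tag{4.24}
$$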

Substitute the terms ∇wy∗(n) and ∇wy(n) into (4.22) to obtain the QNGD weight update

in the form

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\Big(2e(n)\Phi'^{*}\big(\mathrm{net}(n)\big)\mathbf{x}^*(n) - \Phi'\big(\mathrm{net}(n)\big)\mathbf{x}^*(n)e^*(n)\Big) \tag{4.25}
$$

where Φ′(·) is the local derivative of the fully quaternion function and µ is the real-valued learning rate. Notice that the common factor of 2 is absorbed into µ.
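One iteration of the update (4.25) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used for the simulations: quaternions are stored as NumPy arrays `[a, b, c, d]`, the helper names `qmul`, `qconj`, `lift`, `qtanh`, `qsech2` and `qngd_step` are hypothetical, and the nonlinearity is evaluated by treating q as a complex number in the plane spanned by 1 and its own unit pure part ζ.

```python
import cmath
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def qconj(q):
    """Quaternion conjugate."""
    return np.asarray(q) * np.array([1.0, -1.0, -1.0, -1.0])

def lift(f, q):
    """Apply a complex function f in the plane spanned by 1 and zeta = v/|v|."""
    q = np.asarray(q, dtype=float)
    alpha = np.linalg.norm(q[1:])
    w = f(complex(q[0], alpha))
    if alpha == 0.0:
        return np.array([w.real, 0.0, 0.0, 0.0])
    return np.concatenate(([w.real], w.imag * q[1:] / alpha))

def qtanh(q):
    return lift(cmath.tanh, q)

def qsech2(q):
    return lift(lambda z: 1.0 / cmath.cosh(z) ** 2, q)

def qngd_step(w, x, d, mu):
    """One QNGD iteration following (4.25).

    w, x: arrays of shape (p, 4) holding the weights and the tap inputs;
    d: desired sample, shape (4,); mu: real-valued learning rate.
    """
    net = sum(qmul(wk, xk) for wk, xk in zip(w, x))   # net(n) = w^T(n) x(n)
    y = qtanh(net)
    e = d - y
    dphi = qsech2(net)                                # local derivative (4.18)
    w_new = np.array([
        wk + mu * (2 * qmul(qmul(e, qconj(dphi)), qconj(xk))
                   - qmul(qmul(dphi, qconj(xk)), qconj(e)))
        for wk, xk in zip(w, x)
    ])
    return w_new, y, e

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = 0.1 * rng.standard_normal((3, 4))
    x = rng.standard_normal((3, 4))
    d = np.array([0.2, -0.1, 0.05, 0.0])
    w, y, e = qngd_step(w, x, d, mu=5e-3)
    print("output:", y, "error:", e)
```

The order of the factors inside `qngd_step` follows (4.25) literally, since the quaternion product is non-commutative and the two terms cannot be merged.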

4.3.2 Augmented Quaternion Nonlinear Gradient Descent (AQNGD)

The QNGD is now extended to fully capture the second-order statistics of the signal by

incorporating the quaternion widely linear model [42, 43, 45] into its derivation, resulting

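in the Augmented Quaternion Nonlinear Gradient Descent (AQNGD) algorithm¹. The output y(n) of AQNGD is defined as

$$
y(n) = \Phi\big(\mathbf{g}^T(n)\mathbf{x}(n) + \mathbf{h}^T(n)\mathbf{x}^{\imath}(n) + \mathbf{u}^T(n)\mathbf{x}^{\jmath}(n) + \mathbf{v}^T(n)\mathbf{x}^{\kappa}(n)\big) = \Phi\big(\mathbf{w}^{aT}(n)\mathbf{x}^a(n)\big) = \Phi\big(\mathrm{net}^a(n)\big) \tag{4.26}
$$

where g, h, u and v are the weight vectors, x is the input signal, $\mathbf{x}^{\imath}$, $\mathbf{x}^{\jmath}$ and $\mathbf{x}^{\kappa}$ are respectively its ı-, ȷ- and κ-involutions, $\mathbf{w}^a = [\mathbf{g}^T\ \mathbf{h}^T\ \mathbf{u}^T\ \mathbf{v}^T]^T$ is the augmented weight vector, and $\mathbf{x}^a = [\mathbf{x}^T\ \mathbf{x}^{\imath T}\ \mathbf{x}^{\jmath T}\ \mathbf{x}^{\kappa T}]^T$ is the augmented random input vector.

The conjugate output y∗(n) is then given as

$$
y^*(n) = \Phi\big(\mathrm{net}^{a*}(n)\big) \tag{4.27}
$$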

The weight updates of the AQNGD are made gradient adaptive according to

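$$
\begin{aligned}
\mathbf{g}(n+1) &= \mathbf{g}(n) - \mu\nabla_{\mathbf{g}}E(n); & \mathbf{h}(n+1) &= \mathbf{h}(n) - \mu\nabla_{\mathbf{h}}E(n) \\
\mathbf{u}(n+1) &= \mathbf{u}(n) - \mu\nabla_{\mathbf{u}}E(n); & \mathbf{v}(n+1) &= \mathbf{v}(n) - \mu\nabla_{\mathbf{v}}E(n)
\end{aligned} \tag{4.28}
$$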

The error gradient ∇wE(n) in (4.22) is equivalent to ∇gE(n), hence

$$
\mathbf{g}(n+1) = \mathbf{g}(n) + \mu\Big(2e(n)\Phi'^{*}\big(\mathrm{net}^a(n)\big)\mathbf{x}^*(n) - \Phi'\big(\mathrm{net}^a(n)\big)\mathbf{x}^*(n)e^*(n)\Big) \tag{4.29}
$$

The error gradient ∇hE(n) is given by

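$$
\nabla_{\mathbf{h}}E(n) = -d(n)\nabla_{\mathbf{h}}y^*(n) - \nabla_{\mathbf{h}}y(n)\,d^*(n) + y(n)\nabla_{\mathbf{h}}y^*(n) + \nabla_{\mathbf{h}}y(n)\,y^*(n) \tag{4.30}
$$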

1A comprehensive account of widely linear modelling in the complex domain C is given in [38].


In the same manner, the terms ∇hy(n) and ∇hy∗(n) are calculated as

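$$
\nabla_{\mathbf{h}}y(n) = -\Phi'\big(\mathrm{net}^a(n)\big)\,2\mathbf{x}^{\imath *}(n); \qquad \nabla_{\mathbf{h}}y^*(n) = \Phi'^{*}\big(\mathrm{net}^a(n)\big)\,4\mathbf{x}^{\imath *}(n) \tag{4.31}
$$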

Substituting ∇hy(n) and ∇hy∗(n) into the error gradient ∇hE(n) in (4.30) yields

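$$
\mathbf{h}(n+1) = \mathbf{h}(n) + \mu\Big(2e(n)\Phi'^{*}\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\imath *}(n) - \Phi'\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\imath *}(n)e^*(n)\Big) \tag{4.32}
$$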

Proceeding in a similar manner, the weight updates for u(n) and v(n) are found to be

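$$
\begin{aligned}
\mathbf{u}(n+1) &= \mathbf{u}(n) + \mu\Big(2e(n)\Phi'^{*}\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\jmath *}(n) - \Phi'\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\jmath *}(n)e^*(n)\Big) \\
\mathbf{v}(n+1) &= \mathbf{v}(n) + \mu\Big(2e(n)\Phi'^{*}\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\kappa *}(n) - \Phi'\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\kappa *}(n)e^*(n)\Big)
\end{aligned} \tag{4.33}
$$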

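For convenience, the final weight update of the AQNGD can be written in an augmented form as²

$$
\mathbf{w}^a(n+1) = \mathbf{w}^a(n) + \mu\Big(2e(n)\Phi'^{*}\big(\mathrm{net}^a(n)\big)\mathbf{x}^{a*}(n) - \Phi'\big(\mathrm{net}^a(n)\big)\mathbf{x}^{a*}(n)e^*(n)\Big) \tag{4.34}
$$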
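The augmented input vector used in (4.26) and (4.34) can be sketched as follows. This is a minimal illustration that assumes the standard definition of the quaternion involution, q^η = −η q η for η ∈ {ı, ȷ, κ}, and stores quaternions as NumPy arrays `[a, b, c, d]`; the names `qmul`, `involution` and `augment` are hypothetical.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

# Imaginary units as quaternions
I = np.array([0.0, 1.0, 0.0, 0.0])
J = np.array([0.0, 0.0, 1.0, 0.0])
K = np.array([0.0, 0.0, 0.0, 1.0])

def involution(q, eta):
    """Quaternion involution q^eta = -eta q eta (assumed definition)."""
    return -qmul(qmul(eta, q), eta)

def augment(x):
    """Stack x with its three involutions: (p, 4) -> (4p, 4)."""
    blocks = [np.asarray(x, dtype=float)]
    for eta in (I, J, K):
        blocks.append(np.array([involution(q, eta) for q in x]))
    return np.vstack(blocks)

if __name__ == "__main__":
    q = np.array([1.0, 2.0, 3.0, 4.0])
    for name, eta in (("i", I), ("j", J), ("k", K)):
        print(f"q^{name} =", involution(q, eta))
    print("augmented shape:", augment(np.ones((3, 4))).shape)
```

Each involution keeps the real part and one imaginary part and flips the sign of the other two, so the augmented vector is four times as long as x while every element keeps its modulus.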

4.3.3 Convergence Analysis of QNGD and AQNGD

Similar to the convergence analysis in Chapter 3, three widely used general assumptions

are made [52]

a) the learning rate µ is sufficiently small to ensure the deterministic behaviour of the

ensemble average;

b) at convergence, the a priori output error e(n) is statistically independent of the input

vector x(n), that is E{e(n)x(n)} = 0;

c) both the a posteriori output error e(n) and a priori output error e(n) are Gaussian.

²The QNGD could also be readily extended to incorporate the semi-widely linear model [43]; however, this is beyond the scope of this work.


Applying these assumptions and proceeding as in Appendix C, the final sufficient condition for the convergence of QNGD becomes

$$
0 < \mu < \frac{1}{10\,E\big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\,\big\|\Phi'\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\big\}} \tag{4.35}
$$

whereas the condition for AQNGD is

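$$
0 < \mu < \frac{1}{10\,E\big\{\mathbf{x}^{aT}(n)\mathbf{x}^{a*}(n)\,\big\|\Phi'\big(\mathbf{w}^{aT}(n)\mathbf{x}^a(n)\big)\big\|_2^2\big\}} \tag{4.36}
$$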

Both the upper bounds of (4.35) and (4.36) are governed by the expected value of the

random input vector and the gradient of the fully quaternion nonlinearity. Note that the

upper bound of µ for the AQNGD in (4.36) is smaller than that of QNGD in (4.35), due to the larger size of the augmented input vector $\mathbf{x}^a(n)$. The QNGD therefore admits a larger stepsize µ than the AQNGD, which allows a faster convergence of the QNGD.
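The effect of the augmented input on the bound can be illustrated with sample averages. The sketch below is an illustration only: it assumes that the expectation in (4.35) and (4.36) is replaced by a sample mean, and, purely for the sake of comparison, that the term ‖Φ′(·)‖² takes the same values for both algorithms (in practice it depends on the respective weights). The names `mu_upper_bound` and `augment` are hypothetical.

```python
import numpy as np

def mu_upper_bound(x, dphi_sq):
    """Sample estimate of 1 / (10 E{x^T x^* ||Phi'||^2}).

    x: array of shape (n_samples, taps, 4), quaternion taps as [a, b, c, d];
    dphi_sq: array of shape (n_samples,) with the values of ||Phi'||^2.
    """
    energy = np.sum(x ** 2, axis=(1, 2))   # x^T(n) x^*(n): sum of squared moduli
    return 1.0 / (10.0 * np.mean(energy * dphi_sq))

def augment(x):
    """Append the i-, j- and k-involutions along the tap axis."""
    signs = np.array([[1, 1, 1, 1],
                      [1, 1, -1, -1],     # i-involution
                      [1, -1, 1, -1],     # j-involution
                      [1, -1, -1, 1]],    # k-involution
                     dtype=float)
    return np.concatenate([x * s for s in signs], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((1000, 3, 4))
    dphi_sq = rng.uniform(0.5, 1.0, size=1000)   # placeholder values (assumption)
    print("QNGD bound :", mu_upper_bound(x, dphi_sq))
    print("AQNGD bound:", mu_upper_bound(augment(x), dphi_sq))
```

Because involutions preserve the modulus, the energy of the augmented vector is four times that of x, so under the equal-derivative assumption the AQNGD bound is one quarter of the QNGD bound.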

4.4 Simulations

A comprehensive comparison of the performances is provided between the training al-

gorithm for the feedforward Quaternion Multilayer Perceptron (QMLP) [16, 72] and the

nonlinear FIR filters trained with the QMLP learning algorithm (QMLP-FIR), Adaptive

Amplitude Split Quaternion Adaptive Filtering Algorithm (AASQAFA), real-valued Non-

linear Gradient Descent (NGD) [52] and the proposed algorithms based on fully quaternion

nonlinear functions, QNGD and AQNGD. The QMLP-FIR, AASQAFA, NGD, QNGD

and AQNGD were implemented with a filter length p, whereas the QMLP had one hidden layer and comprised p input neurons, three hidden neurons and one output neuron. The

tanh(q) nonlinear activation function was used for all the algorithms. The performance

was measured using the prediction gain Rp defined as [52]

$$
R_p = 10\log_{10}\frac{\sigma_x^2}{\sigma_e^2} \tag{4.37}
$$

where $\sigma_x^2$ and $\sigma_e^2$ denote respectively the estimated variances of the input and the error.
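The prediction gain (4.37) can be computed as in the following sketch. It assumes that the variance of a quaternion-valued signal is estimated as the mean squared modulus of its deviation from the sample mean, which is one common convention and is not stated explicitly in the text; the function name `prediction_gain` is hypothetical.

```python
import numpy as np

def quaternion_variance(s):
    """Mean squared modulus about the sample mean; s has shape (n, 4)."""
    centred = s - s.mean(axis=0)
    return np.mean(np.sum(centred ** 2, axis=1))

def prediction_gain(x, e):
    """R_p = 10 log10(var(x) / var(e)) in dB, following (4.37)."""
    return 10.0 * np.log10(quaternion_variance(x) / quaternion_variance(e))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((1000, 4))
    e = 0.1 * x                      # an error ten times smaller in amplitude
    print(prediction_gain(x, e))     # a 10x amplitude reduction corresponds to 20 dB
```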

[Plot: error 10 log10 E(n) against the number of iterations (n); curves for AASQAFA, AQNGD, QNGD and QMLP-FIR]

Figure 4.1: Learning curves for QMLP-FIR, AASQAFA, QNGD and AQNGD on the prediction of linear AR (4) signal (4.39) driven by H-circular white Gaussian noise.

WGN   H-circular   Cı-circular                  Noncircular
εa    N(0, 1)      N(0, 1)                      N(0, 1)
εb    N(0, 1)      N(0, 1)                      −0.6εa + N(0, 1)
εc    N(0, 1)      0.4εa + 0.8εb + N(0, 1)      0.8εb + N(0, 1)
εd    N(0, 1)      0.8εa − 0.4εb + N(0, 1)      0.8εa − 0.4εb + N(0, 1)

Table 4.1: Classes of Quaternion White Gaussian Noise

The three quaternion valued processes considered were the synthetic linear AR (4)

process [38] with a varying degree of circularity, the noncircular chaotic four-dimensional

Saito signal [63], and the real-world three-dimensional wind field.

4.4.1 Linear AR (4)

For this experiment, the input tap length was chosen to be p = 3, the prediction horizon M = 1 and the learning rate µ = 5 × 10^{-3}.

In the first set of simulations, the performances of AQNGD, QNGD, AASQAFA and QMLP-FIR were analysed for a linear AR (4) process with a varying degree of circularity of the driving quaternion quadruply white Gaussian noise (QWGN) ε(n). The QWGN is described by

$$
\varepsilon(n) = \varepsilon_a(n) + \varepsilon_b(n)\imath + \varepsilon_c(n)\jmath + \varepsilon_d(n)\kappa \tag{4.38}
$$

where εa, εb, εc and εd are realisations of real-valued white Gaussian noises (WGN). The properties of the noises used to generate the different classes of QWGN are shown in Table 4.1. Note that the properties of the Cȷ-circular and Cκ-circular noises are similar to those of the Cı-circular input noise, and their descriptions are omitted due to space limitations.

A total of 100 independent simulation trials were conducted and averaged for the

linear AR (4) process given by

r(n) = 1.79r(n − 1)− 1.85r(n − 2) + 1.27r(n − 3)− 0.41r(n − 4) + ε(n) (4.39)
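The generation of the driving noise classes of Table 4.1 and of the AR (4) process (4.39) can be sketched as follows. This is an illustrative sketch: the function names `qwgn` and `ar4`, the class labels and the zero initial conditions are assumptions, and each N(0, 1) entry of Table 4.1 is taken to be an independent draw.

```python
import numpy as np

def qwgn(n, kind, rng):
    """Quaternion WGN of shape (n, 4) for one of the classes in Table 4.1."""
    g = rng.standard_normal((n, 4))        # independent N(0, 1) draws
    ea = g[:, 0]
    if kind == "H":                        # H-circular
        eb, ec, ed = g[:, 1], g[:, 2], g[:, 3]
    elif kind == "Ci":                     # C-circular with respect to i
        eb = g[:, 1]
        ec = 0.4 * ea + 0.8 * eb + g[:, 2]
        ed = 0.8 * ea - 0.4 * eb + g[:, 3]
    elif kind == "noncircular":
        eb = -0.6 * ea + g[:, 1]
        ec = 0.8 * eb + g[:, 2]
        ed = 0.8 * ea - 0.4 * eb + g[:, 3]
    else:
        raise ValueError(f"unknown noise class: {kind}")
    return np.stack([ea, eb, ec, ed], axis=1)

def ar4(eps):
    """AR(4) process (4.39) driven by eps, with zero initial conditions."""
    coeffs = (1.79, -1.85, 1.27, -0.41)
    r = np.zeros_like(eps)
    for n in range(len(eps)):
        r[n] = eps[n]
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                r[n] += a * r[n - k]       # real coefficients act on all four components
    return r

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = qwgn(6000, "noncircular", rng)
    r = ar4(eps)
    print("signal shape:", r.shape)
    print("corr(eps_a, eps_b):", np.corrcoef(eps[:, 0], eps[:, 1])[0, 1])
```

The printed correlation between the first two noise components is nonzero for the noncircular class and close to zero for the H-circular class, which is what distinguishes the columns of Table 4.1.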

Figure 4.1 shows the learning curves for an H-circular quaternion white Gaussian

noise as the driving noise of the linear AR (4) process. Observe that the proposed AQNGD

and QNGD had the fastest convergence, followed by the AASQAFA and QMLP-FIR. It

can be seen that the steady-state performances for AQNGD, QNGD and AASQAFA were

similar due to the matched power of the components of the H-circular linear AR (4) signal.

Figure 4.2 depicts the learning curves for the input Cı-circular white Gaussian noise³ for all of the algorithms considered. Similar to the previous case, the AQNGD and QNGD had the fastest convergence, and as desired, the steady-state results for AQNGD and QNGD were equivalent. In the case of Cȷ- and Cκ-circular white Gaussian noises, similar performances were obtained and are omitted in this work for conciseness.

Figure 4.3 shows learning curves for all the algorithms considered using a noncircu-

lar white Gaussian noise as the input; the AQNGD and QNGD had improved performances

over the AASQAFA and QMLP-FIR. It can also be seen that the steady-state error of AQNGD was lower than that of QNGD, as the AQNGD was designed to cater for any noncircular autoregressive (AR) type of process.

³The notion of Cη circularity refers to only having a pair of axes exhibiting complex circularity.

[Plot: error 10 log10 E(n) against the number of iterations (n); curves for QMLP-FIR, AASQAFA, QNGD and AQNGD]

Figure 4.2: Learning curves for QMLP-FIR, AASQAFA, QNGD and AQNGD on the prediction of linear AR (4) signal (4.39) driven by Cı-circular white Gaussian noise.

Table 4.2 compares prediction gains Rp of the AQNGD, QNGD, AASQAFA and

QMLP-FIR for the prediction of linear AR (4) process with varying classes of input circularity, with µ = 10^{-2}. The prediction gain was obtained from an average of 100 Monte Carlo trials. In all the cases, the proposed algorithms, AQNGD and QNGD, had better performance than the AASQAFA and QMLP-FIR, illustrating the power of the fully

quaternion function over the split-quaternion function. Also from Table 4.2, the use of

the quaternion widely linear model for noncircular data is fully justified, as indicated by

a higher prediction gain of AQNGD over the QNGD for noncircular sources.

4.4.2 Four-dimensional Saito’s Chaotic Circuit

The Saito chaotic signal was initialised with the following parameters: η=1.3, α1=7.5,

α2=15, β1=0.16 and β2=0.097, and is noncircular, as shown dimension-wise in Figure

4.4(a). These values would guarantee the chaotic behaviour of the Saito’s chaotic signal.

[Plot: error 10 log10 E(n) against the number of iterations (n); curves for QMLP-FIR, AASQAFA, QNGD and AQNGD]

Figure 4.3: Learning curves for QMLP-FIR, AASQAFA, QNGD and AQNGD on the prediction of linear AR (4) signal (4.39) driven by noncircular white Gaussian noise.

Figure 4.5 depicts the performances of the algorithms considered in terms of prediction horizon M (with fixed stepsize µ = 10^{-2}) and stepsize µ (with fixed prediction horizon M = 1). Observe that the AQNGD outperformed all the other algorithms by a margin greater than 2 dB. For all the cases, increasing the stepsize led to a higher prediction gain provided that the upper bounds for QNGD in (4.35) and AQNGD in (4.36) were satisfied.

Figure 4.6 illustrates the dependence of the prediction gain on filter length p for all algorithms with a fixed prediction horizon M = 1 and stepsize µ = 10^{-2}. Observe that the prediction gain for the AQNGD was the largest, followed closely by the QNGD. However, increasing the filter length above p = 80 taps led to a significant performance degradation of the AQNGD, whereas the performance of the QNGD remained almost constant for higher filter lengths p. This is because increasing the filter length proportionally increases the value of the term $\mathbf{x}^{aT}(n)\mathbf{x}^{a*}(n)$, which controls the maximum allowable µ, thus violating the upper bound of µ for AQNGD specified in (4.36). However, this value is still within the upper bound of µ for QNGD given in (4.35).

Algorithms   H-circular   Cı-circular   Cȷ-circular   Cκ-circular   Noncircular
AQNGD        20.22 dB     20.93 dB      20.91 dB      20.88 dB      21.58 dB
QNGD         19.46 dB     20.04 dB      19.99 dB      20.01 dB      20.45 dB
AASQAFA      18.09 dB     15.75 dB      15.35 dB      15.66 dB      17.01 dB
QMLP-FIR     16.58 dB     18.11 dB      18.11 dB      18.05 dB      18.04 dB

Table 4.2: Prediction Gain Rp for a Linear AR (4) Process With Varying Degree of Noncircularity

4.4.3 Wind Forecasting

In this set of simulations, a single realization of three-dimensional wind field was used

as the input4. Figure 4.4(b) shows the wind field signal dimension-wise, and Figure 4.7

illustrates the performances of AQNGD, QNGD, AASQAFA and QMLP-FIR as a function

of prediction horizon M and stepsize µ. The performance of AQNGD was better than that

of QNGD; this was closely followed by AASQAFA, whereas the performance of the QMLP-

FIR was the lowest.

Figure 4.8 shows a comparison of the proposed QNGD with the existing QMLP

and three real-valued NGD filters as a function of prediction horizon M with a fixed stepsize µ = 10^{-2}. From Figure 4.8, observe that the QNGD outperformed the other algorithms

considered. Also observe that QMLP prediction gain was almost constant with the increase

of the prediction horizon due to the structural richness of the feedforward multilayer neural

network, which conforms to our earlier studies in Chapter 3.

4.5 Discussion

The performances of the filters that use the proposed locally analytic fully quaternion activation functions were generally better than those of the existing AASQAFA and QMLP-FIR. The widely linear version (AQNGD) outperformed the QNGD, due to the implementation of the quaternion widely linear model that fully captures the second-order statistics of quaternion signals.

⁴The wind data were sampled at 32 Hz and recorded by the 3D WindMaster anemometer provided by Gill Instruments.

[Plots: (a) 4D Saito signal, components X1, Y1, X2 and Y2 against time (samples); (b) 3D wind signal, East, North and Vertical directions (m/s) against time (samples)]

Figure 4.4: Noncircular signals used in simulations. Left: The 4D Saito Signal. Right: The 3D wind signal.

In order to create a class of fully quaternion function that is suitable for quaternion-

valued adaptive filtering, it is essential to examine the possibility of employing other fully

complex transcendental functions [12] as locally analytic fully quaternion functions. In

Section 4.2, the exponential function eq is established to be locally analytic and, given

that summations and products of analytic functions are analytic as well as quotients

(provided the denominator does not vanish), the tanh(q) function is also locally analytic

because it can be expressed in terms of eq as

$$
\tanh(q) = \frac{\sinh(q)}{\cosh(q)} = \frac{e^{q} - e^{-q}}{e^{q} + e^{-q}} = \frac{e^{2q} - 1}{e^{2q} + 1} \tag{4.40}
$$

This was verified by a rigorous derivation given in Appendix E, Appendix F and Appendix G. By continuity, the other quaternion transcendental functions are also locally analytic. In the complex domain, it has been shown in [61] that the performances based on a set of fully analytic transcendental functions were similar. In the same spirit, Figure 4.9 confirms by simulations that the other elementary transcendental functions give similar performance to that of the locally analytic function tanh(q). It is therefore shown that the fully complex transcendental activation functions from C can be extended to fully quaternion functions in H; this is consistent with the observations in [61].

[Plots: (a) Dependence of the prediction gain (dB) on M and p; (b) Dependence of the prediction gain (dB) on µ and p; surfaces for QNGD, AQNGD, AASQAFA and QMLP-FIR]

Figure 4.5: The performance of AQNGD, QNGD, AASQAFA and QMLP-FIR on the prediction of the noncircular 4D Saito signal.

[Plot: prediction gain (dB) against filter length p; curves for AQNGD, QNGD, QMLP-FIR and AASQAFA]

Figure 4.6: The performance of AQNGD, QNGD, AASQAFA and QMLP-FIR on the prediction of the noncircular 4D Saito signal over a range of filter lengths.

For convenience, the class of locally analytic fully quaternion functions and their derivatives is given below

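$$
\begin{aligned}
\tanh(q) &: & \frac{\partial \tanh(q)}{\partial q} &= \mathrm{sech}^2(q) & (4.41) \\
\tan(q) &: & \frac{\partial \tan(q)}{\partial q} &= \sec^2(q) & (4.42) \\
\sin(q) &: & \frac{\partial \sin(q)}{\partial q} &= \cos(q) & (4.43) \\
\arctan(q) &: & \frac{\partial \arctan(q)}{\partial q} &= (1 + q^2)^{-1} & (4.44) \\
\arcsin(q) &: & \frac{\partial \arcsin(q)}{\partial q} &= (1 - q^2)^{-1/2} & (4.45) \\
\sinh(q) &: & \frac{\partial \sinh(q)}{\partial q} &= \cosh(q) & (4.46) \\
\mathrm{arctanh}(q) &: & \frac{\partial\, \mathrm{arctanh}(q)}{\partial q} &= (1 - q^2)^{-1} & (4.47) \\
\mathrm{arcsinh}(q) &: & \frac{\partial\, \mathrm{arcsinh}(q)}{\partial q} &= (1 + q^2)^{-1/2} & (4.48)
\end{aligned}
$$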
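Several of these function and derivative pairs can be checked numerically in the same way as tanh(q). The sketch below is illustrative only: it assumes that each function is evaluated in the plane spanned by 1 and the unit pure part of q, where it reduces to its complex counterpart from Python's `cmath`, and it compares a central finite difference with the tabulated derivative for a subset of the entries. The dictionary `PAIRS` and the function `derivative_errors` are hypothetical names.

```python
import cmath

# (function, tabulated derivative) pairs for a subset of (4.41) - (4.47)
PAIRS = {
    "tanh":    (cmath.tanh,  lambda z: 1.0 / cmath.cosh(z) ** 2),
    "tan":     (cmath.tan,   lambda z: 1.0 / cmath.cos(z) ** 2),
    "sin":     (cmath.sin,   cmath.cos),
    "arctan":  (cmath.atan,  lambda z: 1.0 / (1.0 + z * z)),
    "arcsin":  (cmath.asin,  lambda z: (1.0 - z * z) ** -0.5),
    "sinh":    (cmath.sinh,  cmath.cosh),
    "arctanh": (cmath.atanh, lambda z: 1.0 / (1.0 - z * z)),
}

def derivative_errors(z, h=1e-6):
    """Absolute gap between a central difference and the tabulated derivative."""
    return {
        name: abs((f(z + h) - f(z - h)) / (2 * h) - df(z))
        for name, (f, df) in PAIRS.items()
    }

if __name__ == "__main__":
    # Test point chosen away from the poles and branch cuts of all functions
    for name, err in derivative_errors(complex(0.2, 0.3)).items():
        print(f"{name:8s} error = {err:.2e}")
```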

[Plots: (a) Dependence of the prediction gain (dB) on M and p; (b) Dependence of the prediction gain (dB) on µ and p; surfaces for AQNGD, AASQAFA, QNGD and QMLP-FIR]

Figure 4.7: The performance of AQNGD, QNGD, AASQAFA and QMLP-FIR on the prediction of a 3D wind signal.

Another factor to consider is the computational complexity of the algorithms, summarised in Table 4.3. The computational complexity of both the AASQAFA and the QNGD is O(68p); the NGD has the lowest computational complexity of O(9p) and the AQNGD has the highest computational complexity of O(272p). The computational complexity of the QMLP-FIR is O(36p) and that of the QMLP is O(108p). The QNGD algorithm thus represents an improvement over our previously proposed algorithm AASQAFA in terms of performance and simplicity, while maintaining similar computational complexity.

Algorithms      Multiplications   Additions
1× QMLP-FIR     36p+20            28p+15
1× AASQAFA      68p+36            54p+19
1× QMLP         108p+216          96p+168
3× NGD          9p+3              6p+3
1× QNGD         68p+36            54p+24
1× AQNGD        272p+144          208p+38

Table 4.3: Computational complexities of the algorithms considered
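The operation counts of Table 4.3 can be evaluated for a concrete filter length as in the sketch below. It is a convenience illustration only: the dictionary `COSTS` simply encodes the table, the leading multiplier is read as the number of filters run in parallel (three real-valued NGD filters for a three-dimensional signal), and the function names are hypothetical.

```python
# name: (copies, mult_slope, mult_const, add_slope, add_const), from Table 4.3
COSTS = {
    "QMLP-FIR": (1, 36, 20, 28, 15),
    "AASQAFA":  (1, 68, 36, 54, 19),
    "QMLP":     (1, 108, 216, 96, 168),
    "NGD":      (3, 9, 3, 6, 3),
    "QNGD":     (1, 68, 36, 54, 24),
    "AQNGD":    (1, 272, 144, 208, 38),
}

def multiplications(name, p):
    """Total multiplications per iteration for filter length p."""
    copies, slope, const, _, _ = COSTS[name]
    return copies * (slope * p + const)

def additions(name, p):
    """Total additions per iteration for filter length p."""
    copies, _, _, slope, const = COSTS[name]
    return copies * (slope * p + const)

if __name__ == "__main__":
    p = 3
    for name in COSTS:
        print(f"{name:9s} mult = {multiplications(name, p):4d}  add = {additions(name, p):4d}")
```

For any p, the AQNGD needs exactly four times as many multiplications as the QNGD, reflecting the fourfold length of the augmented input vector.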

[Plot: dependence of the prediction gain (dB) on prediction horizon M and filter length p; surfaces for QNGD, NGD and QMLP]

Figure 4.8: The performance of QNGD, QMLP and NGD on the prediction of a 3D wind signal.

In summary, the advantages of the proposed class of QNGD and AQNGD algorithms, based on fully quaternion locally analytic nonlinearities, are

a) The performances of algorithms based on fully quaternion locally analytic functions,

QNGD and AQNGD, were better compared to those based on the split quaternion

functions, AASQAFA and QMLP-FIR, as the fully quaternion nonlinearities (4.41)

- (4.48) operate directly in the quaternion domain instead of the channelwise pro-

cessing in R;

b) The widely linear model enables the AQNGD to fully capture the quaternion second-order statistics, making it suitable for noncircular (improper) signals, and hence offers a further performance enhancement over the standard linear model employed in QNGD, AASQAFA and QMLP-FIR;

c) The fully quaternion based QNGD is a reasonable choice as it allows for a trade-off

between performance and computational complexity.

[Plot: dependence of the prediction gain (dB) on the type of nonlinear quaternion function]

Figure 4.9: Prediction gains of QNGD for tan(q), sin(q), arctan(q), arcsin(q), sinh(q), arctanh(q) and arcsinh(q) for the prediction of 3D wind signal.

4.6 Summary

A class of quaternion-valued nonlinear functions suitable for stochastic gradient based

training of quaternion valued nonlinear adaptive filters has been proposed. The existing

learning algorithms either completely neglect the non-commutativity of quaternions,

thus proving inadequate for the modelling of three- and four-dimensional processes, or are

unable to provide an accurate estimate due to the use of the split-quaternion function that

applies real nonlinearities component-wise. A class of fully quaternion activation functions

has been derived according to the local analyticity condition (LAC) which enables the

extension of fully complex nonlinear activation functions to the quaternion domain H.

The proposed fully quaternion algorithms (QNGD and AQNGD) have been shown to

exhibit excellent performance on the prediction of four-dimensional synthetic and three-

dimensional real-world vector signals. The widely linear AQNGD has been shown to

achieve enhanced performance due to the utilisation of the quaternion widely linear model


and the associated augmented quaternion statistics, which fully captures the second-order

information within quaternion-valued signals and enables the processing of both second-order circular (proper) and noncircular (improper) processes. Simulations over a range of

noncircular synthetic signals and real world three-dimensional wind recordings illustrate

the benefits of the proposed approach.


Chapter 5

Enabling Quaternion Valued

Recurrent Neural Networks

In the previous chapter, it was proven that the fully quaternion functions are suitable

for gradient-descent nonlinear quaternion-valued adaptive filtering applications. The fully

quaternion functions fulfil the local analyticity condition (LAC) guaranteeing first-order

differentiability of these functions.

This chapter aims to introduce an extension of the previously proposed fully

quaternion algorithms to the quaternion-valued recurrent neural networks (RNN). The

strict Cauchy-Riemann-Fueter (CRF) analyticity conditions establish that only linear

quaternion-valued functions are analytic, prohibiting the development of quaternion-

valued nonlinear adaptive filters for the recurrent neural network architecture (RNN).

In this work, the fact that gradient based learning requires only local analyticity is exploited, and the local analyticity condition (LAC) is used to introduce quaternion-valued nonlinear feedback adaptive filters. The introduced class of algorithms makes full use of quaternion algebra and provides generic extensions of the corresponding real and complex solutions. Simulations in the prediction setting support the analysis presented.


5.1 Introduction

Quaternion-valued nonlinear filtering algorithms make use of elementary transcendental

functions (ETF) [12], which do not satisfy the Cauchy-Riemann-Fueter (CRF) conditions;

in fact these strict conditions are only met by linear quaternion-valued functions and con-

stants. The local analyticity condition (LAC) [27] is adopted to circumvent the analyticity

problem of the CRF. It treats the quaternion variable similarly to a complex variable and

can only guarantee the first-order differentiability of single variable quaternion functions

at a point. Notice, however, that for most gradient based learning algorithms the first

order derivative is adequate, enabling the derivation of nonlinear algorithms as shown in

Chapter 4.

This class of nonlinear algorithms in Chapter 4 was based on the feedforward ar-

chitecture and requires a long filter length for the modelling of systems with long term

correlations. For such a case, the infinite impulse response (IIR) architecture is more ap-

propriate due to the feedback as these can model long term correlations with a small-scale

model. For completeness, the aim is to investigate the suitability of LAC in the derivation

of gradient-based learning algorithms for feedback architectures and to provide a building block for recurrent neural networks in the context of quaternion-valued signal processing.

The LAC will allow the use of the ‘fully’ rather than the ‘split’ quaternion functions. Sim-

ilarly to the complex domain C [12,38], these ‘fully’ quaternion functions permit rigorous

treatment of the cross-information across the data channels, in contrast to the componen-

twise operation of the ‘split’ quaternion functions. The use of the recurrent architecture

and the fully quaternion functions will thus enhance the generality of the existing class of

quaternion-valued adaptive filtering algorithms [24,73].

This chapter is organised as follows. Section 5.2 presents an analysis that highlights the differences between split and fully quaternion functions. In Section 5.3, the quaternion-valued recurrent neural network (RNN) algorithms are derived. The proposed algorithms are supported by simulations on the synthetic three-dimensional Lorenz attractor and three-dimensional motion data in Section 5.4. The chapter concludes in Section 5.5.


5.2 Analysis of Quaternion-Valued Functions

This section shows that the fully quaternion functions are better at capturing the cross-correlations between the dimensions than the split-quaternion functions. Consider the split quaternion function $e^q$, as it serves as a building block to construct other quaternion elementary transcendental functions, that is

$$
e^{q} = e^{q_a} + e^{q_b}\imath + e^{q_c}\jmath + e^{q_d}\kappa = y_a + y_b\imath + y_c\jmath + y_d\kappa \tag{5.1}
$$

where $y_a$, $y_b$, $y_c$ and $y_d$ are the real-valued elements of the componentwise quaternion-valued output.

From (5.1), it is clear that each component of the output depends only on the corresponding input component of the same dimension, that is

$$
E\{y_a\} = E\{e^{q_a}\}; \quad E\{y_b\} = E\{e^{q_b}\}; \quad E\{y_c\} = E\{e^{q_c}\}; \quad E\{y_d\} = E\{e^{q_d}\} \tag{5.2}
$$

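Now consider a "fully" $e^q$ function that gives the output

$$
\begin{aligned}
e^{q} &= e^{q_a}e^{q_b\imath + q_c\jmath + q_d\kappa} \\
&= e^{q_a}\left(\cos\sqrt{q_b^2 + q_c^2 + q_d^2} + \frac{q_b\sin\sqrt{q_b^2 + q_c^2 + q_d^2}}{\sqrt{q_b^2 + q_c^2 + q_d^2}}\,\imath + \frac{q_c\sin\sqrt{q_b^2 + q_c^2 + q_d^2}}{\sqrt{q_b^2 + q_c^2 + q_d^2}}\,\jmath + \frac{q_d\sin\sqrt{q_b^2 + q_c^2 + q_d^2}}{\sqrt{q_b^2 + q_c^2 + q_d^2}}\,\kappa\right) \\
&= y_a + y_b\imath + y_c\jmath + y_d\kappa
\end{aligned} \tag{5.3}
$$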

Examining the output ya componentwise shows that

$$
E\{y_a\} = E\left\{e^{q_a}\cos\sqrt{q_b^2 + q_c^2 + q_d^2}\right\} \tag{5.4}
$$

represents a nonlinear combination of all the input components $q_a$, $q_b$, $q_c$, $q_d$, and therefore accounts for the internal couplings. This holds true for the other output components, which are given by

$$
\begin{aligned}
E\{y_b\} &= E\left\{\frac{e^{q_a}q_b\sin\sqrt{q_b^2 + q_c^2 + q_d^2}}{\sqrt{q_b^2 + q_c^2 + q_d^2}}\right\} \\
E\{y_c\} &= E\left\{\frac{e^{q_a}q_c\sin\sqrt{q_b^2 + q_c^2 + q_d^2}}{\sqrt{q_b^2 + q_c^2 + q_d^2}}\right\} \\
E\{y_d\} &= E\left\{\frac{e^{q_a}q_d\sin\sqrt{q_b^2 + q_c^2 + q_d^2}}{\sqrt{q_b^2 + q_c^2 + q_d^2}}\right\}
\end{aligned} \tag{5.5}
$$
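The difference between the split exponential (5.1) and the fully quaternion exponential (5.3) can be seen directly in a short numerical sketch. Quaternions are stored as NumPy arrays `[qa, qb, qc, qd]`; the function names `split_exp` and `full_exp` are illustrative and not taken from the text.

```python
import numpy as np

def split_exp(q):
    """Split exponential (5.1): the real exponential applied to each component."""
    return np.exp(np.asarray(q, dtype=float))

def full_exp(q):
    """Fully quaternion exponential (5.3)."""
    q = np.asarray(q, dtype=float)
    alpha = np.linalg.norm(q[1:])               # sqrt(qb^2 + qc^2 + qd^2)
    if alpha == 0.0:
        return np.array([np.exp(q[0]), 0.0, 0.0, 0.0])
    vector_part = np.sin(alpha) * q[1:] / alpha
    return np.exp(q[0]) * np.concatenate(([np.cos(alpha)], vector_part))

if __name__ == "__main__":
    q1 = np.array([0.1, 0.2, 0.3, 0.4])
    q2 = np.array([0.1, 0.9, 0.3, 0.4])         # only the i-component differs
    print("split, real part:", split_exp(q1)[0], split_exp(q2)[0])
    print("full,  real part:", full_exp(q1)[0], full_exp(q2)[0])
```

Changing only the ı-component of the input leaves the real part of the split output untouched, whereas the real part of the fully quaternion output changes, which is the coupling expressed by (5.4).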

5.3 FCRNN Algorithms in H

The fully connected recurrent neural network (FCRNN) consists of N neurons and p

external inputs as illustrated in Figure 5.1. The network has two distinct layers consisting

of a feedback layer and a layer of processing elements. In order to make these terms

consistent with past recurrent neural network (RNN) literature, yl(n) is chosen to denote

the quaternion-valued output of each neuron, l = 1, . . . , N at time index n and s(n) the

(1 × p) external quaternion-valued input vector. The overall input to the network z(n) represents the concatenation of vectors y(n), s(n) and the bias input (1 + ı + ȷ + κ), and is given by

$$
\begin{aligned}
\mathbf{z}(n) &= [s(n-1), \ldots, s(n-p),\ 1 + \imath + \jmath + \kappa,\ y_1(n-1), \ldots, y_N(n-1)]^T \\
&= z_l^a + z_l^b\imath + z_l^c\jmath + z_l^d\kappa
\end{aligned} \tag{5.6}
$$

where $z_l^a$, $z_l^b$, $z_l^c$ and $z_l^d$ are the real-valued input components corresponding to the lth element of the input vector z(n).
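The assembly of the network input in (5.6) can be sketched as follows. This is an illustration under simple assumptions: quaternions are stored as rows `[a, b, c, d]` of a NumPy array, and the function name `build_input` and its argument names are hypothetical.

```python
import numpy as np

def build_input(s_hist, y_prev):
    """Assemble z(n) as in (5.6).

    s_hist: array of shape (p, 4) holding s(n-1), ..., s(n-p);
    y_prev: array of shape (N, 4) holding y_1(n-1), ..., y_N(n-1).
    Returns an array of shape (p + 1 + N, 4).
    """
    bias = np.ones((1, 4))                     # the bias input 1 + i + j + k
    return np.vstack([s_hist, bias, y_prev])

if __name__ == "__main__":
    p, N = 3, 2
    rng = np.random.default_rng(0)
    z = build_input(rng.standard_normal((p, 4)), np.zeros((N, 4)))
    print(z.shape)                             # p + 1 + N rows, 4 real components each
```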

A quaternion-valued weight matrix of the network is denoted by W, where for the lth neuron, we have $\mathbf{w}_l = [w_{l,1}, \ldots, w_{l,p+F+1}]^T$. In the following subsections, only the output from the first neuron (recurrent perceptron) $y_1(n)$ is considered, resulting in the cost function

$$
\begin{aligned}
E(n) &= \big(e_1^a(n)\big)^2 + \big(e_1^b(n)\big)^2 + \big(e_1^c(n)\big)^2 + \big(e_1^d(n)\big)^2 & (5.7) \\
&= e_1(n)e_1^*(n) & (5.8)
\end{aligned}
$$

where the error $e_1(n) = d(n) - y_1(n)$, with d(n) being the desired signal. The terms $e_1^a$, $e_1^b$, $e_1^c$ and $e_1^d$ denote the error components in the real, ı, ȷ and κ parts, respectively. This terminology is used throughout this chapter.

Figure 5.1: A fully connected recurrent neural network (FCRNN).

5.3.1 Derivation of the Split Quaternion-valued RTRL

The split Quaternion-Valued Real-Time Recurrent Learning (Split QRTRL) algorithm for

FCRNN utilises the split-quaternion function, whose output at the lth neuron yl(n) is

given by

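$$
y_l(n) = \Phi_s\big(\mathbf{w}_l^T(n)\mathbf{z}(n)\big) = \Phi_a\big(\mathrm{net}_l^a(n)\big) + \Phi_b\big(\mathrm{net}_l^b(n)\big)\imath + \Phi_c\big(\mathrm{net}_l^c(n)\big)\jmath + \Phi_d\big(\mathrm{net}_l^d(n)\big)\kappa \tag{5.9}
$$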


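where Φs(·) denotes the "split" quaternion nonlinearity, Φa is a real-valued nonlinear activation function applied to the real part of $\mathrm{net}_l$, Φb to the ı part, Φc to the ȷ part and Φd to the κ part. The terms $\mathrm{net}_l^a$, $\mathrm{net}_l^b$, $\mathrm{net}_l^c$ and $\mathrm{net}_l^d$ are given by

$$
\begin{aligned}
\mathrm{net}_l^a(n) &= \Re\{\mathbf{w}_l^T(n)\mathbf{z}(n)\}; & \mathrm{net}_l^b(n) &= \Im_{\imath}\{\mathbf{w}_l^T(n)\mathbf{z}(n)\} \\
\mathrm{net}_l^c(n) &= \Im_{\jmath}\{\mathbf{w}_l^T(n)\mathbf{z}(n)\}; & \mathrm{net}_l^d(n) &= \Im_{\kappa}\{\mathbf{w}_l^T(n)\mathbf{z}(n)\}
\end{aligned} \tag{5.10}
$$

where the symbols $\Re(\cdot)$, $\Im_{\imath}(\cdot)$, $\Im_{\jmath}(\cdot)$ and $\Im_{\kappa}(\cdot)$ correspond to the real, ı, ȷ and κ components respectively. The full expansion of the terms is given in Appendix H.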

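The Split QRTRL then minimises the cost function (5.7) through a gradient descent weight update specified by $w_{s,t}(n+1) = w_{s,t}(n) - \mu\nabla_{w_{s,t}}E(n)$, where µ is the real-valued learning rate and the gradient $\nabla_{w_{s,t}}E(n)$ is given by

$$
\nabla_{w_{s,t}}E(n) = \frac{\partial E(n)}{\partial w_{s,t}^a(n)} + \frac{\partial E(n)}{\partial w_{s,t}^b(n)}\imath + \frac{\partial E(n)}{\partial w_{s,t}^c(n)}\jmath + \frac{\partial E(n)}{\partial w_{s,t}^d(n)}\kappa \tag{5.11}
$$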

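Expanding the term $\frac{\partial E}{\partial w_{s,t}^a}$ in (5.11) gives

$$
\frac{\partial E(n)}{\partial w_{s,t}^a(n)} = -e_1^a(n)\frac{\partial y_l^a(n)}{\partial w_{s,t}^a(n)} - e_1^b(n)\frac{\partial y_l^b(n)}{\partial w_{s,t}^a(n)} - e_1^c(n)\frac{\partial y_l^c(n)}{\partial w_{s,t}^a(n)} - e_1^d(n)\frac{\partial y_l^d(n)}{\partial w_{s,t}^a(n)} \tag{5.12}
$$

where the terms $\frac{\partial y_l^a}{\partial w_{s,t}^a}$, $\frac{\partial y_l^b}{\partial w_{s,t}^a}$, $\frac{\partial y_l^c}{\partial w_{s,t}^a}$ and $\frac{\partial y_l^d}{\partial w_{s,t}^a}$ represent the real-valued sensitivities of the network.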

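For convenience, the sensitivity terms in (5.12) are denoted by $\Psi_{s,t}^{l,(\eta a)} = \frac{\partial y_l^{\eta}}{\partial w_{s,t}^a}$, where η ∈ {a, b, c, d}, resulting in

$$
\frac{\partial E(n)}{\partial w_{s,t}^a(n)} = -e_1^a(n)\Psi_{s,t}^{l,(aa)}(n) - e_1^b(n)\Psi_{s,t}^{l,(ba)}(n) - e_1^c(n)\Psi_{s,t}^{l,(ca)}(n) - e_1^d(n)\Psi_{s,t}^{l,(da)}(n) \tag{5.13}
$$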

In order to make further calculations feasible, a small stepsize is assumed so that [52,73]

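$$
\begin{aligned}
\mathbf{w}(n) &\approx \mathbf{w}(n-1) \approx \cdots \approx \mathbf{w}(n-M) \\
\frac{\partial y(n)}{\partial \mathbf{w}(n)} &\approx \frac{\partial y(n)}{\partial \mathbf{w}(n-1)} \approx \cdots \approx \frac{\partial y(n)}{\partial \mathbf{w}(n-M)}
\end{aligned} \tag{5.14}
$$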


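The sensitivity $\Psi_{s,t}^{l,(aa)}$ is first calculated by differentiating $y_l^a$ with respect to $w_{s,t}^a$ and applying the assumptions in (5.14) to yield

$$
\begin{aligned}
\Psi_{s,t}^{l,(aa)}(n) &= \frac{\partial y_l^a(n)}{\partial \mathrm{net}_l^a(n)}\frac{\partial \mathrm{net}_l^a(n)}{\partial w_{s,t}^a(n)} \\
&= \Phi_s'\big(\mathrm{net}_l^a(n)\big)\left(\delta_{sl}z_l^a(n) + \sum_{q=1}^{N}\frac{\partial y_l(n-1)}{\partial w_{s,t}^a(n)}\right) \\
&= \Phi_s'\big(\mathrm{net}_l^a(n)\big)\Bigg(\delta_{sl}z_l^a(n) + \sum_{q=1}^{N}\Big[w_{l,p+1+q}^a(n)\Psi_{s,t}^{q,(aa)}(n-1) - w_{l,p+1+q}^b(n)\Psi_{s,t}^{q,(ba)}(n-1) \\
&\qquad\qquad - w_{l,p+1+q}^c(n)\Psi_{s,t}^{q,(ca)}(n-1) - w_{l,p+1+q}^d(n)\Psi_{s,t}^{q,(da)}(n-1)\Big]\Bigg)
\end{aligned} \tag{5.15}
$$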

The other 15 sensitivities are derived in a similar manner (the derivation is given in Appendix H). Following a similar approach to [11], the compact solution is then obtained by grouping these 16 sensitivity terms together to yield

$$
\boldsymbol{\Psi}_{s,t}^{l}(n) = \boldsymbol{\Phi}_s'(n)\left(\sum_{q=1}^{N}\mathbf{W}(n)\boldsymbol{\Psi}_{s,t}^{q}(n-1) + \delta_{sl}\mathbf{z}_{split}(n)\right) \tag{5.16}
$$

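where $\delta_{sl}$ is the Kronecker delta. The real-valued matrices are given by (the time index n has been dropped due to space restrictions)

$$
\boldsymbol{\Psi}_{s,t}^{l} =
\begin{bmatrix}
\Psi_{s,t}^{l,(aa)} & \Psi_{s,t}^{l,(ab)} & \Psi_{s,t}^{l,(ac)} & \Psi_{s,t}^{l,(ad)} \\
\Psi_{s,t}^{l,(ba)} & \Psi_{s,t}^{l,(bb)} & \Psi_{s,t}^{l,(bc)} & \Psi_{s,t}^{l,(bd)} \\
\Psi_{s,t}^{l,(ca)} & \Psi_{s,t}^{l,(cb)} & \Psi_{s,t}^{l,(cc)} & \Psi_{s,t}^{l,(cd)} \\
\Psi_{s,t}^{l,(da)} & \Psi_{s,t}^{l,(db)} & \Psi_{s,t}^{l,(dc)} & \Psi_{s,t}^{l,(dd)}
\end{bmatrix}
\qquad
\boldsymbol{\Phi}_s' =
\begin{bmatrix}
\Phi_a'(\mathrm{net}_l^a) & 0 & 0 & 0 \\
0 & \Phi_b'(\mathrm{net}_l^b) & 0 & 0 \\
0 & 0 & \Phi_c'(\mathrm{net}_l^c) & 0 \\
0 & 0 & 0 & \Phi_d'(\mathrm{net}_l^d)
\end{bmatrix} \tag{5.17}
$$

$$
\mathbf{W} =
\begin{bmatrix}
w_{l,p+1+q}^a & -w_{l,p+1+q}^b & -w_{l,p+1+q}^c & -w_{l,p+1+q}^d \\
w_{l,p+1+q}^a & w_{l,p+1+q}^b & w_{l,p+1+q}^c & -w_{l,p+1+q}^d \\
w_{l,p+1+q}^a & -w_{l,p+1+q}^b & w_{l,p+1+q}^c & w_{l,p+1+q}^d \\
w_{l,p+1+q}^a & w_{l,p+1+q}^b & -w_{l,p+1+q}^c & w_{l,p+1+q}^d
\end{bmatrix}
\qquad
\mathbf{z}_{split} =
\begin{bmatrix}
z_l^a & z_l^b & z_l^c & z_l^d \\
-z_l^b & z_l^a & -z_l^d & z_l^c \\
-z_l^c & z_l^d & z_l^a & -z_l^b \\
-z_l^d & -z_l^c & z_l^b & z_l^a
\end{bmatrix} \tag{5.18}
$$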

5.3.2 Derivation of the Quaternion-Valued RTRL

For the fully Quaternion-Valued RTRL, the output $y_l(n)$ is given by

$$
y_l(n) = \Phi\big(\mathbf{w}_l^T(n)\mathbf{z}(n)\big) = \Phi\big(\mathrm{net}_l(n)\big) \tag{5.19}
$$

Based on the cost function of (5.8), the gradient ∇wE(n) of QRTRL shall be expressed as

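$$
\nabla_{\mathbf{w}}E(n) = e_1(n)\nabla_{\mathbf{w}}e_1^*(n) + \nabla_{\mathbf{w}}e_1(n)\,e_1^*(n) = -e_1(n)\boldsymbol{\Upsilon}(n) - \boldsymbol{\Psi}(n)e_1^*(n) \tag{5.20}
$$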

where Υ(n) and Ψ(n) are the conjugate sensitivities and sensitivities respectively, defined

by

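$$
\boldsymbol{\Psi}(n) = \left[\frac{\partial y_1(n)}{\partial w_{1,1}(n)}, \cdots, \frac{\partial y_l(n)}{\partial w_{N,N+p+1}(n)}\right]; \qquad
\boldsymbol{\Upsilon}(n) = \left[\frac{\partial y_1^*(n)}{\partial w_{1,1}(n)}, \cdots, \frac{\partial y_l^*(n)}{\partial w_{N,N+p+1}(n)}\right] \tag{5.21}
$$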

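The sensitivity $\Psi_{s,t}^{l}$ is calculated by differentiating $y_l$ in (5.19) with respect to $w_{s,t}$, resulting in

$$
\Psi_{s,t}^{l}(n) = \frac{\partial y_l(n)}{\partial w_{s,t}^a(n)} + \frac{\partial y_l(n)}{\partial w_{s,t}^b(n)}\imath + \frac{\partial y_l(n)}{\partial w_{s,t}^c(n)}\jmath + \frac{\partial y_l(n)}{\partial w_{s,t}^d(n)}\kappa \tag{5.22}
$$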

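To find the term $\frac{\partial y_l}{\partial w_{s,t}^a}$ in (5.22), differentiate $y_l$ with respect to $w_{s,t}^a$ to yield

$$
\begin{aligned}
\frac{\partial y_l(n)}{\partial w_{s,t}^a(n)} &= \frac{\partial y_l(n)}{\partial \mathrm{net}_l(n)}\frac{\partial \mathrm{net}_l(n)}{\partial w_{s,t}^a(n)} \\
&= \Phi'\big(\mathrm{net}_l(n)\big)\left(\delta_{sl}\big(z_l^a(n) + z_l^b(n)\imath + z_l^c(n)\jmath + z_l^d(n)\kappa\big) + \sum_{q=1}^{N}w_{s,t}(n)\frac{\partial y_q(n-1)}{\partial w_{s,t}^a(n)}\right)
\end{aligned} \tag{5.23}
$$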

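The terms $\partial y_l / \partial w^{b}_{s,t}$, $\partial y_l / \partial w^{c}_{s,t}$ and $\partial y_l / \partial w^{d}_{s,t}$ are found in the same manner as $\partial y_l / \partial w^{a}_{s,t}$. From Appendix I, the final expression for the sensitivity $\Psi^{l}_{s,t}(n)$ is given by

$$\Psi^{l}_{s,t}(n) = \Phi'\big(net_l(n)\big)\left(-2\delta_{sl}\,z_l^{*}(n) + \sum_{q=1}^{N} w_{s,t}(n)\,\Psi^{q}_{s,t}(n-1)\right) \tag{5.24}$$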

Similarly, the expression for the conjugate sensitivity Υls,t becomes

$$\Upsilon^{l}_{s,t}(n) = \Phi'^{*}\big(net_l(n)\big)\left(4\delta_{sl}\,z_l^{*}(n) + \sum_{q=1}^{N} \Upsilon^{q}_{s,t}(n-1)\,w^{*}_{s,t}(n)\right) \tag{5.25}$$

It is clear that only two quaternion-valued sensitivities Υ and Ψ in (5.24) - (5.25) are

needed to govern the system in contrast with the 16 real-valued sensitivities in the split-

quaternion case shown in (5.18), which results in a reduced computational complexity.

5.4 Simulations

The quaternion function tanh(q) was chosen as the nonlinear activation function, and the initial values of Ψ(n) and Υ(n) were set to zero for both algorithms. Both networks used an input tap length of p = 3 and N = 2 output neurons, and their performance was assessed in a prediction setting. For simulation purposes, the three-dimensional Lorenz chaotic signal [52] and three-dimensional real-world Tai Chi motion recorded from 3D inertial motion sensors were considered.
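As an illustration of the activation function, the following is a minimal sketch of one way to evaluate tanh(q) numerically. It assumes that the quaternion tanh is defined through its power series, so that on the plane spanned by 1 and the unit pure quaternion v/|v| it coincides with the complex tanh; the function name `qtanh` and the array layout [a, b, c, d] are illustrative choices, not notation from the thesis.

```python
import numpy as np

def qtanh(q):
    """tanh of a quaternion q = [a, b, c, d] (assumed power-series definition)."""
    q = np.asarray(q, dtype=float)
    a, v = q[0], q[1:]
    theta = np.linalg.norm(v)            # magnitude of the vector part
    z = np.tanh(complex(a, theta))       # complex tanh on the plane {1, v/|v|}
    if theta == 0.0:
        # purely real quaternion: result is the real tanh
        return np.array([z.real, 0.0, 0.0, 0.0])
    # map the complex result back: real part stays, imaginary part goes along v/|v|
    return np.concatenate(([z.real], z.imag * v / theta))
```

Under this definition the function is odd, reduces to the real tanh for real arguments, and reduces to the complex tanh when only one imaginary component is non-zero.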

Figure 5.2: Phase space of Lorenz signal. a) Phase space for Lorenz signal with M=1; b) phase space for Lorenz signal with M=10. Each row shows, on X, Y and Z axes, the Lorenz attractor alongside the split QRTRL and QRTRL reconstructions.

Dimension   Split QRTRL M=1   QRTRL M=1   Split QRTRL M=10   QRTRL M=10
X           0.884             0.935       0.360              0.636
Y           0.867             0.919       0.351              0.412
Z           0.588             0.719       0.001              0.034

Table 5.1: Correlation Coefficients Between Lorenz Attractors

5.4.1 Three-dimensional Lorenz Chaotic Signal

The Lorenz attractor is a three-dimensional system originally used to model atmospheric turbulence [74] but is also now used to model lasers, dynamos and waterwheels [75]. For this experiment, the learning rate is set to $\mu = 5 \times 10^{-4}$ for both algorithms.

The Lorenz attractor is governed by the coupled ordinary differential equations

$$\frac{dx}{dt} = \alpha(y - x); \qquad \frac{dy}{dt} = x(\rho - z) - y; \qquad \frac{dz}{dt} = xy - \beta z \tag{5.26}$$

where α, ρ and β > 0. The parameters of the Lorenz system were chosen to be α = 10, ρ = 28 and β = 8/3. These values ensure the existence of the Lorenz attractor.
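The system in (5.26) can be simulated numerically to produce a test signal. The sketch below uses a fixed-step fourth-order Runge-Kutta integrator with the parameter values quoted above; the step size `dt`, the initial state `s0` and the function names are assumptions made for illustration, since the integration scheme used in the experiments is not stated here.

```python
import numpy as np

def lorenz_rhs(s, alpha=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (5.26)."""
    x, y, z = s
    return np.array([alpha * (y - x), x * (rho - z) - y, x * y - beta * z])

def simulate_lorenz(n_steps, dt=0.01, s0=(1.0, 1.0, 1.0)):
    """Integrate the Lorenz system with classical RK4 (dt and s0 are assumed values)."""
    traj = np.empty((n_steps, 3))
    s = np.array(s0, dtype=float)
    for n in range(n_steps):
        traj[n] = s
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj
```

The three columns of the returned array can be placed in the ı, ȷ and κ components of a pure quaternion to form a three-dimensional input signal.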

Figure 5.3: The performance of QRTRL, split QRTRL and RTRL on the prediction of motion data. Panels show the X-, Y- and Z-components against the number of iterations (n); curves: actual, split QRTRL, QRTRL and RTRL.

Figure 5.2a shows the original Lorenz attractor and its reconstruction in the phase space by both algorithms for one-step-ahead prediction (M = 1). Although both algorithms were able to reconstruct the attractor, the QRTRL produced a more accurate replica of the attractor than the split QRTRL.

Figure 5.2b depicts the Lorenz attractor and the reconstructed attractor for both algorithms for ten-step-ahead prediction (M = 10). It is apparent that the output of the QRTRL still resembles the original Lorenz attractor, and that the QRTRL therefore outperforms the split QRTRL.

Table 5.1 shows the correlation coefficients between the original Lorenz attractor and the reconstructed attractor for the split QRTRL and QRTRL algorithms for M = 1 and M = 10. The larger values obtained by the QRTRL in all three dimensions, for both one- and ten-step-ahead prediction, indicate that its reconstructed attractor was closer to the original Lorenz attractor than that of the split QRTRL. This illustrates the advantage of using a fully quaternion function over a split-quaternion function.
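A per-dimension comparison of the kind reported in Table 5.1 can be sketched as follows. The text does not state which correlation measure was used, so the Pearson correlation coefficient is assumed here; `componentwise_correlation` is a hypothetical helper name.

```python
import numpy as np

def componentwise_correlation(actual, predicted):
    """Pearson correlation between matching columns (e.g. X, Y, Z) of two arrays.

    Both inputs have shape (n_samples, n_dims). Pearson correlation is an
    assumption; the measure behind Table 5.1 is not specified in the text.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.array([
        np.corrcoef(actual[:, i], predicted[:, i])[0, 1]
        for i in range(actual.shape[1])
    ])
```

A value close to 1 in a given column indicates that the prediction tracks the original signal closely in that dimension.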

5.4.2 Motion Estimation

Five 3D gyroscopes were placed on the left arm, left hand, right arm, right hand and waist of an athlete performing Tai Chi movements, and the 3D motion data were recorded using the Xsens MTx 3DOF Orientation Tracker. The movement of the left arm was used as a pure quaternion input for this simulation. For a fair comparison, the performance of three parallel real-valued FCRNNs trained with RTRL is also considered [52]. The learning rate was set to $\mu = 1 \times 10^{-3}$ for the quaternion-valued algorithms and $\mu = 1 \times 10^{-2}$ for the RTRL, since the latter performed poorly at smaller learning rates.

Figure 5.3 shows the componentwise performance of the one-step-ahead prediction (M = 1) of the Tai Chi motion using the split QRTRL, QRTRL and RTRL algorithms. It can be seen that the algorithms performed similarly in the X-component. However, the QRTRL performed better than the split QRTRL and RTRL in the Y- and Z-components. The performance of the split QRTRL and RTRL was similar in all three dimensions.

5.5 Summary

A class of quaternion-valued learning algorithms for recurrent neural networks based on the local gradient has been introduced. The superior performance of the fully quaternion algorithm (QRTRL) compared to the split-quaternion algorithm (split QRTRL) stems from the fact that QRTRL accounts for the interchannel couplings, whereas the split QRTRL does not. The componentwise channel processing performed by both split QRTRL and RTRL explains their similar performance. Simulations on the chaotic Lorenz attractor and real-world three-dimensional motion data illustrate the advantages of the proposed approach. The same framework can be used to introduce any other nonlinear gradient-based learning algorithm and, by removing the nonlinearity, learning algorithms for IIR filters are obtained.


Chapter 6

Identification of Improper Quaternion Processes by Fractional Tap-Length Algorithms

In the previous chapter, it was established that locally analytic functions are suitable for the training of quaternion-valued recurrent neural networks (RNNs). Owing to its inherent ability to better capture the cross-correlations between dimensions, the algorithm based on the fully quaternion function outperformed its split-quaternion counterpart.

This chapter aims to extend the fractional tap-length (FT) adaptive filtering paradigm from the real to the quaternion domain, enabling data-adaptive optimal modelling and identification. This is achieved by combining the FT length optimisation with the recently introduced strictly linear and widely linear quaternion-valued adaptive filtering algorithms, the Quaternion Least Mean Square (QLMS) and Widely Linear Quaternion Least Mean Square (WL-QLMS). A collaborative combination of QLMS and WL-QLMS (CC-QLMS) is shown both to identify the type of process (second-order circular or noncircular) and to track its optimal parameters. Further insight into these algorithms is provided by establishing a relationship between the steady-state error and the tap-length. This is supported by simulations on model order selection and identification of second-order circular (proper) and noncircular (improper) quaternion-valued systems.

6.1 Introduction

A convenient and rigorous method to identify the model order of a quaternion-valued system is to combine quaternion-valued adaptive filters with a variable tap-length algorithm, optimised for the optimal filter length [28,76]. The variable tap-length algorithm considered in this work is the fractional tap-length (FT) algorithm, chosen for its simplicity and robustness [28]. The FT algorithm was designed specifically for real-valued filters and was recently extended to widely linear complex-valued filters [29].

To this end, the FT algorithm is extended to the quaternion domain by considering the second-order augmented quaternion statistics of the signal. The quaternion-valued algorithms considered are the recently introduced Quaternion Least Mean Square (QLMS) [24] and Widely Linear Quaternion Least Mean Square (WL-QLMS) [45] algorithms, which when combined provide the necessary tools to identify the model order of general quaternion-valued systems. The WL-QLMS is based on the widely linear model, which is able to capture the full second-order statistics of the quaternion signal, characterised by the standard covariance matrix Cq and three complementary covariance matrices termed the ı-covariance Cqı, ȷ-covariance Cqȷ and κ-covariance Cqκ [42, 43]. The collaborative combination of QLMS and WL-QLMS (CC-QLMS) provides a more flexible tool for the modelling of general quaternion-valued systems. Furthermore, the evolution of the convex mixing parameter illustrates the degree of properness of a given quaternion-valued system.

This chapter is organised as follows. Section 6.2 shall describe the workings of

the proposed model order identification algorithms. This is followed by the steady-state

analysis in Section 6.3. In Section 6.4, simulations supporting the proposed approach are

presented. The chapter concludes in Section 6.5.


6.2 Model Order Identification

The proposed algorithms, FT-QLMS, FT-WLQLMS and FT-CCQLMS, comprise two parts: the finite impulse response (FIR) filter weight update, which optimises the adaptive weight coefficients, followed by the fractional tap-length (FT) algorithm, which adapts the tap-length of the filter to an optimal length. The filter weight algorithms are reviewed first, followed by an illustration of ways to exploit the FT algorithm within quaternion-valued adaptive systems.

6.2.1 Filter Weight Updates

The quaternion-valued filter weight updates are based on optimising a real-valued cost function of quaternion variables, given by

$$E(n) = e_a^{2}(n) + e_b^{2}(n) + e_c^{2}(n) + e_d^{2}(n) = e(n)e^{*}(n) \tag{6.1}$$

where the error e(n) = d(n) − y(n), with d(n) and y(n) denoting respectively the desired signal and the output signal. The terms ea(n), eb(n), ec(n) and ed(n) denote respectively the error components in the real, ı, ȷ and κ parts.

The QLMS is based on gradient-descent and is described by [24] (derivation is

provided in Appendix A)

$$
\begin{aligned}
e_l(n) &= d(n) - y_l(n) \\
y_l(n) &= w^{T}(n)\,x(n) \\
w(n+1) &= w(n) + \mu\big(2e_l(n)x^{*}(n) - x^{*}(n)e_l^{*}(n)\big)
\end{aligned}
\tag{6.2}
$$

where w(n) is the weight vector, x(n) is the filter input, el(n) is the QLMS error, yl(n)

is the QLMS output, symbol (·)∗ denotes the quaternion conjugate operator, and µ is a

real-valued learning rate.
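The update in (6.2) can be sketched in a few lines once a quaternion product is available. In the sketch below a quaternion is stored as a length-4 array [a, b, c, d], and the helper names `qmul`, `qconj`, `qlms` and `demo`, as well as the noise-free system identification set-up and the values of `taps`, `mu` and `seed`, are assumptions made purely for illustration.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored along the last axis."""
    a1, b1, c1, d1 = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    a2, b2, c2, d2 = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
                     a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
                     a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
                     a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2], axis=-1)

def qconj(q):
    """Quaternion conjugate: negate the three imaginary components."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qlms(x, d, taps, mu):
    """QLMS of (6.2); x and d have shape (n_samples, 4)."""
    n = len(x)
    w = np.zeros((taps, 4))
    err = np.zeros((n, 4))
    for k in range(taps - 1, n):
        xk = x[k - taps + 1:k + 1][::-1]       # tap-delay input, newest sample first
        y = qmul(w, xk).sum(axis=0)            # y = w^T x, products taken as w_i * x_i
        e = d[k] - y
        xc = qconj(xk)
        w = w + mu * (2.0 * qmul(e, xc) - qmul(xc, qconj(e)))
        err[k] = e
    return w, err

def demo(n=3000, taps=3, mu=0.005, seed=0):
    """Identify a hypothetical noise-free quaternion FIR system."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 4))
    w_true = rng.standard_normal((taps, 4))
    d = np.zeros((n, 4))
    for k in range(taps - 1, n):
        d[k] = qmul(w_true, x[k - taps + 1:k + 1][::-1]).sum(axis=0)
    w, err = qlms(x, d, taps, mu)
    return w, w_true, err
```

Because the quaternion product does not commute, the order of the factors in `qmul(e, xc)` and `qmul(xc, qconj(e))` follows (6.2) exactly and should not be swapped.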

Figure 6.1: Hybrid filter structure.

The WL-QLMS utilises the widely linear model and is given by [45] (the derivation is similar to that of the Augmented Quaternion Nonlinear Gradient Descent (AQNGD) algorithm in Section 4.3.2)

$$
\begin{aligned}
e_w(n) &= d(n) - y_w(n) \\
y_w(n) &= w^{aT}(n)\,x^{a}(n) \\
w^{a}(n+1) &= w^{a}(n) + \mu\big(2e_w(n)x^{a*}(n) - x^{a*}(n)e_w^{*}(n)\big)
\end{aligned}
\tag{6.3}
$$

where ew(n) is the WL-QLMS error, yw(n) is the WL-QLMS output, wa(n) is the aug-

mented weight vector and xa(n) is the augmented filter input.

The collaborative filter shown in Figure 6.1, consists of two independent subfilters

sharing the common filter input x(n) and desired signal d(n). Similar to [77], the convex

combination of the output of the QLMS and WL-QLMS (CC-QLMS) forms the overall

output ycc(n) given by

$$y_{cc}(n) = \lambda(n)\,y_l(n) + \big(1 - \lambda(n)\big)\,y_w(n) \tag{6.4}$$

where λ(n) is the real-valued convex mixing parameter. The update of the convex mixing

parameter λ(n) is governed by

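$$\lambda(n+1) = \lambda(n) - \mu_\lambda \nabla_\lambda E(n) \tag{6.5}$$

where µλ and ∇λE(n) represent the real-valued step size and the error gradient, respectively.

The error gradient ∇λE(n) can be evaluated as

$$
\begin{aligned}
\nabla_\lambda E(n) &= e_{cc}(n)\frac{\partial e_{cc}^{*}(n)}{\partial \lambda(n)} + \frac{\partial e_{cc}(n)}{\partial \lambda(n)}\,e_{cc}^{*}(n) \\
&= e_{cc}(n)\big(y_l(n) - y_w(n)\big)^{*} + \big(y_l(n) - y_w(n)\big)e_{cc}^{*}(n) \\
&= 2\Re\big\{e_{cc}(n)\big(y_l(n) - y_w(n)\big)^{*}\big\}
\end{aligned}
\tag{6.6}
$$

where ecc(n) = d(n) − ycc(n) is the error of the CC-QLMS algorithm and ℜ{·} denotes the scalar (real) part of its argument.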

This yields the final update of the convex mixing parameter λ(n) in the form

$$\lambda(n+1) = \lambda(n) - \mu_\lambda\,\Re\big\{e_{cc}(n)\big(y_l(n) - y_w(n)\big)^{*}\big\} \tag{6.7}$$

where the factor 2 is absorbed into the learning rate µλ.

Due to the convex nature of the CC-QLMS, and given that the mixing parameter λ(n) lies within [0, 1], the CC-QLMS converges as long as one of the subfilters converges [78]. The value of λ(n) is hard-limited to this interval whenever an update gives λ > 1 or λ < 0.

6.2.2 Tap Length Adaptation

The tap-length adaptation is governed by the FT algorithm given by [28]

$$\eta_f(n+1) = \big(\eta_f(n) - \alpha\big) - \gamma\left[E^{(p)}_{p}(n) - E^{(p)}_{p-\Delta}(n)\right] \tag{6.8}$$

where ηf is the pseudo fractional tap-length, which can take only positive real values, and α and γ are the leakage factor and tap-length learning rate, small positive real values that satisfy α ≪ γ. The symbols $E^{(p)}_{p}(n)$ and $E^{(p)}_{p-\Delta}(n)$ denote respectively the instantaneous square errors for the tap-lengths p and p − ∆, the symbol p denotes the “true” tap-length at discrete time instant ‘n’, and ∆ is a positive integer such that min{p(n) − ∆} > 0.

The instantaneous square output errors for filters of lengths p and p − ∆ are given by

$$E^{(p)}_{p}(n) = \big(e^{(p)}_{p}(n)\big)\big(e^{(p)}_{p}(n)\big)^{*}; \qquad E^{(p)}_{p-\Delta}(n) = \big(e^{(p)}_{p-\Delta}(n)\big)\big(e^{(p)}_{p-\Delta}(n)\big)^{*} \tag{6.9}$$

based on the errors $e^{(p)}_{p}(n)$ and $e^{(p)}_{p-\Delta}(n)$.

These errors can be shown to be

$$e^{(p)}_{q}(n) = d(n) - y^{(p)}_{q}(n) = d(n) - w^{(p)T}_{q}(n)\,x^{(p)}_{q}(n) \tag{6.10}$$

where 1 ≤ q ≤ p, while $w^{(p)}_{q}(n)$ and $x^{(p)}_{q}(n)$ are vectors consisting of the first q coefficients of $w^{(p)}(n)$ and $x^{(p)}(n)$, respectively.

To calculate the optimal tap length, the tap-length parameter p(n) is made adaptive

according to [28]

$$
p(n+1) =
\begin{cases}
\lfloor \eta_f(n) \rfloor, & |p(n) - \eta_f(n)| \ge \delta \\
p(n), & \text{otherwise}
\end{cases}
\tag{6.11}
$$

where δ is a predefined integer threshold and ⌊·⌋ denotes the floor operator.
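One iteration of the tap-length adaptation in (6.8) and (6.11) can be sketched as below. The function takes the two instantaneous square errors as already-computed real numbers; the name `ft_step` and the default parameter values (taken from the simulation settings quoted later in the chapter) are illustrative assumptions.

```python
import math

def ft_step(eta_f, p, E_p, E_p_delta, alpha=0.03, gamma=1.0, delta=1):
    """One fractional tap-length update.

    eta_f      : pseudo fractional tap-length eta_f(n)
    p          : current integer tap-length p(n)
    E_p        : instantaneous square error of the length-p filter
    E_p_delta  : instantaneous square error of the length-(p - Delta) filter
    Returns (eta_f(n+1), p(n+1)).
    """
    # (6.11): the integer tap-length follows eta_f(n) once they differ by delta or more
    if abs(p - eta_f) >= delta:
        p_next = math.floor(eta_f)
    else:
        p_next = p
    # (6.8): leakage pulls eta_f down; a smaller error at length p pushes it up
    eta_next = (eta_f - alpha) - gamma * (E_p - E_p_delta)
    return eta_next, p_next
```

For example, when the longer filter gives the smaller error (E_p < E_p_delta), the bracketed term in (6.8) is negative and the pseudo tap-length grows, which is the mechanism that lengthens an undermodelled filter.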

The operations of the proposed algorithms are summarised in Algorithm 1.

Algorithm 1

Filter Weight Algorithms

Initialisation values: λ(0) = 0.5, ηf(0) = p(0)

CC-QLMS, QLMS: $w(n+1) = w(n) + \mu\big(2e_l(n)x^{*}(n) - x^{*}(n)e_l^{*}(n)\big)$

CC-QLMS, WL-QLMS: $w^{a}(n+1) = w^{a}(n) + \mu\big(2e_w(n)x^{a*}(n) - x^{a*}(n)e_w^{*}(n)\big)$

CC-QLMS: $\lambda(n+1) = \lambda(n) - \mu_\lambda\,\Re\big\{e_{cc}(n)\big(y_l(n) - y_w(n)\big)^{*}\big\}$

Fractional Tap-Length Algorithm

$\eta_f(n+1) = \big(\eta_f(n) - \alpha\big) - \gamma\left[E^{(p)}_{p}(n) - E^{(p)}_{p-\Delta}(n)\right]$

$p(n+1) = \begin{cases} \lfloor \eta_f(n) \rfloor, & |p(n) - \eta_f(n)| \ge \delta \\ p(n), & \text{otherwise} \end{cases}$

6.3 Steady-State Analysis of FT Based Algorithms

This section provides a steady-state analysis of the FT-QLMS, FT-WLQLMS and FT-CCQLMS algorithms for two models of teaching signals: linear and widely linear. First consider the case of a widely linear teaching signal and the FT-WLQLMS algorithm. The desired (teaching) signal d(n) is defined as

$$d(n) = g^{oT}_{L_{opt}}x_{L_{opt}}(n) + h^{oT}_{L_{opt}}x^{\imath}_{L_{opt}}(n) + u^{oT}_{L_{opt}}x^{\jmath}_{L_{opt}}(n) + v^{oT}_{L_{opt}}x^{\kappa}_{L_{opt}}(n) + v(n) \tag{6.12}$$

where $g^{o}_{L_{opt}}$, $h^{o}_{L_{opt}}$, $u^{o}_{L_{opt}}$ and $v^{o}_{L_{opt}}$ are the optimal weight coefficients, of the optimal tap-length, of the widely linear model, and v(n) is an H-circular quaternion white Gaussian noise. The symbols (·)ı, (·)ȷ and (·)κ denote the ı, ȷ and κ involutions respectively.
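The involutions used in the widely linear model can be sketched numerically. The sketch assumes the standard definitions q^ı = −ı q ı, q^ȷ = −ȷ q ȷ and q^κ = −κ q κ, which amount to sign changes on two of the imaginary components; the helper names and the stacking order [x, x^ı, x^ȷ, x^κ] of the augmented input are illustrative assumptions.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored along the last axis."""
    a1, b1, c1, d1 = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    a2, b2, c2, d2 = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
                     a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
                     a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
                     a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2], axis=-1)

# Sign patterns of the three involutions on [a, b, c, d]
_SIGNS = {"i": np.array([1.0, 1.0, -1.0, -1.0]),
          "j": np.array([1.0, -1.0, 1.0, -1.0]),
          "k": np.array([1.0, -1.0, -1.0, 1.0])}

def involution(q, axis):
    """Involution of q about the imaginary unit named by axis ('i', 'j' or 'k')."""
    return np.asarray(q, dtype=float) * _SIGNS[axis]

def augmented_input(x):
    """Stack x with its three involutions; x has shape (taps, 4)."""
    return np.concatenate([x, involution(x, "i"),
                           involution(x, "j"), involution(x, "k")], axis=0)
```

With this layout a widely linear filter of tap-length p operates on 4p quaternion inputs, which corresponds to the standard part plus the three augmented parts of the model.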

The output of the FT-WLQLMS algorithm is given as

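$$y_w(n) = \underbrace{g^{T}(n)x(n)}_{\text{standard part}} + \underbrace{h^{T}(n)x^{\imath}(n) + u^{T}(n)x^{\jmath}(n) + v^{T}(n)x^{\kappa}(n)}_{\text{augmented part}} \tag{6.13}$$

The output error e(n) = d(n) − yw(n) is expressed in terms of the optimal tap weights by subtracting (6.13) from (6.12), resulting in

$$
\begin{aligned}
e(n) ={}& g^{oT}x_{L_{opt}}(n) + h^{oT}x^{\imath}_{L_{opt}}(n) + u^{oT}x^{\jmath}_{L_{opt}}(n) + v^{oT}x^{\kappa}_{L_{opt}}(n) + v(n) \\
& - g^{T}(n)x(n) - h^{T}(n)x^{\imath}(n) - u^{T}(n)x^{\jmath}(n) - v^{T}(n)x^{\kappa}(n)
\end{aligned}
\tag{6.14}
$$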

Proceeding in a manner similar to the analysis in [79], the optimal coefficients of the weight

vectors can be split into three parts

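$$
g^{o}_{L_{opt}} = \begin{bmatrix} g'^{o} \\ g''^{o} \\ g'''^{o} \end{bmatrix} \quad
h^{o}_{L_{opt}} = \begin{bmatrix} h'^{o} \\ h''^{o} \\ h'''^{o} \end{bmatrix} \quad
u^{o}_{L_{opt}} = \begin{bmatrix} u'^{o} \\ u''^{o} \\ u'''^{o} \end{bmatrix} \quad
v^{o}_{L_{opt}} = \begin{bmatrix} v'^{o} \\ v''^{o} \\ v'''^{o} \end{bmatrix}
\tag{6.15}
$$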

where g′o, h′o, u′o, v′o are the coefficients modelled by tap-length 1:p −∆, g′′o, h′′o, u′′o,

v′′o are the coefficients modelled by the tap-length p−∆+ 1 : p, and g′′′o, h′′′o, u′′′o, v′′′o

are the undermodelled coefficients.

For convenience, the coefficient weight error vectors of the FT-WLQLMS are denoted as

g(n) = go−gp(n); h(n) = ho−hp(n); u(n) = uo−up(n); v(n) = vo−vp(n) (6.16)


where gp(n), hp(n), up(n) and vp(n) are the FT-WLQLMS weight vectors of length p.

Similar to (6.15), the weight error vectors can also be split up into three parts

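$$
g(n) = \begin{bmatrix} g'(n) \\ g''(n) \\ g'''(n) \end{bmatrix} \quad
h(n) = \begin{bmatrix} h'(n) \\ h''(n) \\ h'''(n) \end{bmatrix} \quad
u(n) = \begin{bmatrix} u'(n) \\ u''(n) \\ u'''(n) \end{bmatrix} \quad
v(n) = \begin{bmatrix} v'(n) \\ v''(n) \\ v'''(n) \end{bmatrix}
\tag{6.17}
$$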

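The errors $e^{(p)}_{p}(n)$ and $e^{(p)}_{p-\Delta}(n)$ are rewritten as (the time index ‘n’ has been dropped due to space limitations)

$$
e^{(p)}_{p} =
\begin{bmatrix} g' \\ g'' \\ g'''^{o} \end{bmatrix}^{T}
\begin{bmatrix} x' \\ x'' \\ x''' \end{bmatrix}
+
\begin{bmatrix} h' \\ h'' \\ h'''^{o} \end{bmatrix}^{T}
\begin{bmatrix} x'^{\imath} \\ x''^{\imath} \\ x'''^{\imath} \end{bmatrix}
+
\begin{bmatrix} u' \\ u'' \\ u'''^{o} \end{bmatrix}^{T}
\begin{bmatrix} x'^{\jmath} \\ x''^{\jmath} \\ x'''^{\jmath} \end{bmatrix}
+
\begin{bmatrix} v' \\ v'' \\ v'''^{o} \end{bmatrix}^{T}
\begin{bmatrix} x'^{\kappa} \\ x''^{\kappa} \\ x'''^{\kappa} \end{bmatrix}
+ v
\tag{6.18}
$$

$$
e^{(p)}_{p-\Delta} =
\begin{bmatrix} g' \\ g''^{o} \\ g'''^{o} \end{bmatrix}^{T}
\begin{bmatrix} x' \\ x'' \\ x''' \end{bmatrix}
+
\begin{bmatrix} h' \\ h''^{o} \\ h'''^{o} \end{bmatrix}^{T}
\begin{bmatrix} x'^{\imath} \\ x''^{\imath} \\ x'''^{\imath} \end{bmatrix}
+
\begin{bmatrix} u' \\ u''^{o} \\ u'''^{o} \end{bmatrix}^{T}
\begin{bmatrix} x'^{\jmath} \\ x''^{\jmath} \\ x'''^{\jmath} \end{bmatrix}
+
\begin{bmatrix} v' \\ v''^{o} \\ v'''^{o} \end{bmatrix}^{T}
\begin{bmatrix} x'^{\kappa} \\ x''^{\kappa} \\ x'''^{\kappa} \end{bmatrix}
+ v
\tag{6.19}
$$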

In order to ensure mathematical tractability, the following assumptions are enforced [79]:

a) both the input signal x(n) and the noise v(n) are i.i.d. zero mean white jointly

Gaussian with the respective variances σ2x and σ2v ;

b) at the steady state, the input signal x(n) is independent of the weight vectors;

c) the tap-length parameter has converged at steady-state, hence E{ηf (n + 1)} =

E{ηf (n)}, leading to the undermodelled error vectors vanishing.

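Applying the statistical expectation operator to the steady-state MSE in (6.8) yields

$$E\left\{E^{(p)}_{p}(n) - E^{(p)}_{p-\Delta}(n)\right\} < \left|\frac{\alpha}{\gamma}\right| \tag{6.20}$$

Following the definitions of $E^{(p)}_{p}$ and $E^{(p)}_{p-\Delta}$ in (6.9), expanding (6.20) gives

$$
\begin{aligned}
E\big\{ & \|g''^{T}(n)x''(n)\|_2^2 + \|h''^{T}(n)x''^{\imath}(n)\|_2^2 + \|u''^{T}(n)x''^{\jmath}(n)\|_2^2 + \|v''^{T}(n)x''^{\kappa}(n)\|_2^2 \\
& - \|g''^{oT}(n)x''(n)\|_2^2 - \|h''^{oT}(n)x''^{\imath}(n)\|_2^2 - \|u''^{oT}(n)x''^{\jmath}(n)\|_2^2 - \|v''^{oT}(n)x''^{\kappa}(n)\|_2^2 \big\} < \left|\frac{\alpha}{\gamma}\right|
\end{aligned}
\tag{6.21}
$$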

Remark#1: The FT-WLQLMS incorporates the errors from both the standard and augmented parts of the quaternion widely linear model in adapting the tap-length, thus ensuring efficient modelling of widely linear quaternion-valued systems.

To obtain the steady state of the FT-QLMS algorithm, the augmented part in (6.13) is set to zero, which gives

$$y_l(n) = w^{T}(n)x(n) \tag{6.22}$$

Proceeding in a similar fashion to FT-WLQLMS, the final steady-state performance is shown to be

$$E\left\{\|w''^{T}(n)x''(n)\|_2^2 - \|g''^{oT}(n)x''(n)\|_2^2\right\} < \left|\frac{\alpha}{\gamma}\right| \tag{6.23}$$

Remark#2: The FT-QLMS considers the error from only the standard part of the quaternion widely linear model in adapting the tap-length, which is insufficient for the modelling of widely linear quaternion-valued systems.

Next, the steady state of the FT-CCQLMS algorithm is derived. Consider the output of FT-CCQLMS given by

$$y_{cc}(n) = \lambda(n)w^{T}(n)x(n) + \big(1 - \lambda(n)\big)\big(g^{T}(n)x(n) + h^{T}(n)x^{\imath}(n) + u^{T}(n)x^{\jmath}(n) + v^{T}(n)x^{\kappa}(n)\big) \tag{6.24}$$

Following the same steps as for FT-WLQLMS and FT-QLMS, the final steady state is given by

$$
\begin{aligned}
E\Big\{ & \big(1 - \lambda(n)\big)\big(\|g''^{T}(n)x''(n)\|_2^2 + \|h''^{T}(n)x''^{\imath}(n)\|_2^2 + \|u''^{T}(n)x''^{\jmath}(n)\|_2^2 + \|v''^{T}(n)x''^{\kappa}(n)\|_2^2\big) \\
& + \lambda(n)\|w''^{T}(n)x''(n)\|_2^2 - \|g''^{oT}(n)x''(n)\|_2^2 - \|h''^{oT}(n)x''^{\imath}(n)\|_2^2 - \|u''^{oT}(n)x''^{\jmath}(n)\|_2^2 \\
& - \|v''^{oT}(n)x''^{\kappa}(n)\|_2^2 \Big\} < \left|\frac{\alpha}{\gamma}\right|
\end{aligned}
\tag{6.25}
$$

For optimal processing, λ → 0, which simplifies (6.25) to the steady state of FT-WLQLMS in (6.21).

Remark#3: As λ → 0, the FT-CCQLMS performance will be similar to that of the FT-WLQLMS for the processing of widely linear systems.

Next, consider a linear model given by

$$d(n) = w^{oT}_{L_{opt}}x_{L_{opt}}(n) + v(n) \tag{6.26}$$

This results in similar steady-state expressions for FT-QLMS and FT-WLQLMS, given by

$$\text{FT-QLMS}: \quad E\left\{\|w''^{T}(n)x''(n)\|_2^2 - \|w''^{oT}(n)x''(n)\|_2^2\right\} < \left|\frac{\alpha}{\gamma}\right| \tag{6.27}$$

$$\text{FT-WLQLMS}: \quad E\left\{\|g''^{T}(n)x''(n)\|_2^2 - \|w''^{oT}(n)x''(n)\|_2^2\right\} < \left|\frac{\alpha}{\gamma}\right| \tag{6.28}$$

Remark#4: Both the FT-QLMS and FT-WLQLMS take into account the error from the standard part of the quaternion linear model in adapting their tap-lengths, and are therefore suitable for the modelling of linear quaternion-valued systems.

Similarly, the steady-state expression for FT-CCQLMS becomes

$$
\begin{aligned}
E\Big\{ & \lambda(n)\|w''^{T}(n)x''(n)\|_2^2 + \big(1 - \lambda(n)\big)\big(\|g''^{T}(n)x''(n)\|_2^2 + \|h''^{T}(n)x''^{\imath}(n)\|_2^2 + \|u''^{T}(n)x''^{\jmath}(n)\|_2^2 \\
& + \|v''^{T}(n)x''^{\kappa}(n)\|_2^2\big) - \|w''^{oT}(n)x''(n)\|_2^2 \Big\} < \left|\frac{\alpha}{\gamma}\right|
\end{aligned}
\tag{6.29}
$$

For optimal processing of the linear model, λ → 1, resulting in an expression similar to the steady state of FT-QLMS.

Remark#5: The FT-CCQLMS will have similar expressions to the FT-QLMS for the modelling of strictly linear quaternion-valued systems when λ → 1.

Figure 6.2: The steady-state MSE for the processes W1 and W2 with respect to tap-length. a) The steady-state MSE for the process W1; b) the steady-state MSE for the process W2. Axes: tap length p (horizontal), steady-state MSE (vertical); curves: QLMS, WL-QLMS and CC-QLMS.

6.4 Simulations

Simulations were conducted in the system identification setting and the performances of FT-QLMS, FT-WLQLMS and FT-CCQLMS were evaluated for a range of systems, with the quaternion quadruply circular white Gaussian noise (QWGN) serving as a driving input, given by

$$\varepsilon(n) = \varepsilon_a(n) + \varepsilon_b(n)\imath + \varepsilon_c(n)\jmath + \varepsilon_d(n)\kappa \tag{6.30}$$

where εa, εb, εc and εd are realisations of real-valued white Gaussian noises (WGN).

Figure 6.3: The evolution of the optimal filter length parameter p and mixing parameter λ for the modelling of the linear system W1. a) Modelling of the linear system W1 (tap length p against iteration; curves: FT-QLMS, FT-WLQLMS and FT-CCQLMS); b) mixing parameter λ of the linear system W1.

The QWGN was first passed through a filter defined by H(n) = 0.35ε(n) + ε(n − 1) + 0.35ε(n − 2) to illustrate a severe condition. The output of H(n) was then fed to the systems defined by

$$W_1(n) = 1.79W_1(n-1) - 1.85W_1(n-2) + 1.27W_1(n-3) - 0.41W_1(n-4) + \varepsilon(n) \tag{6.31}$$

$$
\begin{aligned}
W_2(n) ={}& 1.79W_2(n-1) - 1.85W_2(n-2) + 1.27W_2(n-3) - 0.41W_2(n-4) + \varepsilon(n) \\
& + 0.5\varepsilon^{*}(n) + 0.9\varepsilon^{*}(n-1)
\end{aligned}
\tag{6.32}
$$

where W1 is a linear AR(4) system [38] and W2 is a widely linear AR(4) system [29].

System W2 is constructed by combining W1 with the augmented part of W given by [80]

W (n) = eıW (n− 1) + ε(n) + 0.5ε∗(n) + 0.9ε∗(n− 1) (6.33)

where ı is the imaginary unit.
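The two test systems in (6.31) and (6.32) can be generated with a short script. In this sketch quaternion samples are stored as rows [a, b, c, d], so the real AR coefficients act on each component separately and conjugation flips the sign of the three imaginary columns. It is assumed that the driving term ε(n) in the recursions is the output of the colouring filter H; the function names and the seed are illustrative.

```python
import numpy as np

AR_COEFFS = (1.79, -1.85, 1.27, -0.41)   # AR(4) coefficients of (6.31)-(6.32)

def qconj(q):
    """Quaternion conjugate: negate the three imaginary components."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def coloured_qwgn(n, rng):
    """Circular QWGN passed through H: 0.35 e(n) + e(n-1) + 0.35 e(n-2)."""
    eps = rng.standard_normal((n, 4))    # four independent unit-variance WGN channels
    out = 0.35 * eps
    out[1:] += eps[:-1]
    out[2:] += 0.35 * eps[:-2]
    return out

def simulate_system(n, widely_linear=False, seed=0):
    """Generate W1 (widely_linear=False) or W2 (widely_linear=True)."""
    rng = np.random.default_rng(seed)
    eps = coloured_qwgn(n, rng)
    W = np.zeros((n, 4))
    for k in range(n):
        acc = eps[k].copy()
        if widely_linear:
            # augmented (conjugate) driving terms of (6.32)
            acc += 0.5 * qconj(eps[k])
            if k >= 1:
                acc += 0.9 * qconj(eps[k - 1])
        for j, a in enumerate(AR_COEFFS, start=1):
            if k - j >= 0:
                acc += a * W[k - j]
        W[k] = acc
    return W
```

Calling `simulate_system(5000)` and `simulate_system(5000, widely_linear=True)` with the same seed gives a linear and a widely linear realisation driven by the same noise, which is convenient when comparing the tap-lengths selected for the two cases.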

Figure 6.4: The evolution of the optimal filter length parameter p and mixing parameter λ for the modelling of the widely linear system W2. a) Modelling of the widely linear system W2 (tap length p against iteration; curves: FT-QLMS, FT-WLQLMS and FT-CCQLMS); b) mixing parameter λ of the widely linear system W2.

6.4.1 Optimal Tap-Length

The optimal tap-lengths for both systems were determined by the steady-state MSE esti-

mated by [28]

ε(n) = λcε(n− 1) + (1− λc)E(n) (6.34)

where ε is the estimated steady-state MSE and λc = 0.9.
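The recursive estimate in (6.34) is a first-order exponential smoother of the instantaneous cost E(n). A minimal sketch follows; the zero initial value of the estimate and the function name `smoothed_mse` are assumptions, as the initialisation is not specified in the text.

```python
import numpy as np

def smoothed_mse(E, lam_c=0.9):
    """Recursive steady-state MSE estimate of (6.34).

    E      : sequence of instantaneous square errors E(n)
    lam_c  : smoothing factor (0.9 in the text)
    The estimate is started from zero, which is an assumed initial condition.
    """
    est = np.zeros(len(E))
    prev = 0.0
    for n, En in enumerate(E):
        prev = lam_c * prev + (1.0 - lam_c) * En
        est[n] = prev
    return est
```

With λc = 0.9 the estimate effectively averages over roughly the last ten samples, so the value reached after convergence of the filter can be read off as the steady-state MSE for a given tap-length.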

Figure 6.2 depicts the steady-state MSE for both the linear system W1 and the widely linear system W2, using the QLMS, WL-QLMS and CC-QLMS algorithms with $\mu = 10^{-3}$. From Figure 6.2a, it can be seen that all three MSE curves were monotonically non-increasing functions of the tap-length, and as such the optimal tap-length for all algorithms was found to be po = {15, 16}. Figure 6.2b shows that the MSE curve for QLMS does not converge asymptotically, indicating the inability of the strictly linear QLMS to model the widely linear system W2 according to [28]. On the other hand, the MSE curves for the WL-QLMS and CC-QLMS converged, indicating their ability to model W2, for which the optimal tap-length was found to be po = {21, 22}. The optimal tap-lengths for both systems were not a single integer owing to the use of feedforward filters, which can only approximate the response of the autoregressive (AR) feedback system.

Figure 6.5: The steady-state MSE for the linear noncircular process W1 with respect to tap-length. Axes: tap length p (horizontal), steady-state MSE (vertical); curves: QLMS, WL-QLMS and CC-QLMS.

6.4.2 Modelling of Quaternion-Valued Systems

Figure 6.3 depicts the evolution of the optimal tap length parameter p for the FT-QLMS, FT-WLQLMS and FT-CCQLMS algorithms when employed for the modelling of the linear AR(4) system W1, along with the evolution of the mixing parameter λ of FT-CCQLMS. These algorithms were initialised with the following parameters: α = 0.03, γ = 1, δ = 1, ∆ = 4, $\mu = 1 \times 10^{-5}$, $\mu_l = 5 \times 10^{-4}$, the initial mixing parameter λ(0) = 0.5 and the initial tap length p(0) = 10. From Figure 6.3a, it is evident that the performances of all three algorithms were similar, as they converged to the optimal tap-length after around the same number of iterations. This conforms with Remark 4 and Remark 5, which justify their similar performances. Figure 6.3b shows that the mixing parameter λ → 1 for the FT-CCQLMS when modelling the linear system W1, which corroborates Remark 5. This is because the QLMS subfilter converges faster than the WL-QLMS, leading the CC-QLMS to favour the QLMS.

WGN   Noncircular
εa    N(0, 1)
εb    −0.6εa + N(0, 1)
εc    0.8εb + N(0, 1)
εd    0.8εa − 0.4εb + N(0, 1)

Table 6.1: Noncircular Quaternion White Gaussian Noise

Similarly, Figure 6.4 shows the modelling of the widely linear system W2. It can be seen from Figure 6.4a that the FT-QLMS was unable to model the widely linear system W2, whereas FT-WLQLMS and FT-CCQLMS converged to the optimal tap-length. This is justified by Remark 1, Remark 2 and Remark 3 in the previous section. Figure 6.4b illustrates that λ → 0 for the modelling of the widely linear system, conforming to Remark 3. This is due to the better performance of the WL-QLMS subfilter, which dominates the CC-QLMS algorithm.

6.4.3 Nonstationary Systems

A system consisting of three separate subsystems was considered. The first subsystem is the linear system W1 for the interval 1 ≤ n ≤ 3000, followed by the widely linear system W2 for 3001 ≤ n ≤ 6000. The third subsystem is a linear noncircular W1 for the interval 6001 ≤ n ≤ 9000. The linear noncircular W1 is the system W1 in (6.31) fed with the noncircular QWGN as the driving input. The construction of the noncircular QWGN is described in Table 4.1. For clarity, the characteristics of the noncircular QWGN are reproduced in Table 6.1.
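The construction in Table 6.1 can be sketched as follows, with each N(0, 1) entry drawn as an independent unit-variance Gaussian sequence. The function name, the seed and the row layout [εa, εb, εc, εd] are illustrative assumptions.

```python
import numpy as np

def noncircular_qwgn(n, seed=0):
    """Noncircular quaternion WGN built as in Table 6.1.

    Returns an array of shape (n, 4) with columns [eps_a, eps_b, eps_c, eps_d].
    """
    rng = np.random.default_rng(seed)
    eps_a = rng.standard_normal(n)
    eps_b = -0.6 * eps_a + rng.standard_normal(n)
    eps_c = 0.8 * eps_b + rng.standard_normal(n)
    eps_d = 0.8 * eps_a - 0.4 * eps_b + rng.standard_normal(n)
    return np.stack([eps_a, eps_b, eps_c, eps_d], axis=1)
```

The components are correlated by construction (for instance εa and εb are negatively correlated), which is what makes the resulting quaternion noise noncircular.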

Figure 6.5 shows the steady-state MSE for the noncircular linear system W1 using the QLMS, WL-QLMS and CC-QLMS algorithms with $\mu = 10^{-4}$. All three MSE curves were monotonically non-increasing functions of the tap-length, and the optimal tap length was found to be po = {21, 22}. The CC-QLMS curve exhibits an error spike at the tap length p = 4. This is because p = 3 is a local minimum from which the CC-QLMS struggles to escape, which is supported by the reduced slope of the QLMS and WL-QLMS curves for 3 ≤ p ≤ 5.

Figure 6.6: The evolution of the optimal filter length parameter p for the modelling of the system W1 for the intervals 1 ≤ n ≤ 3000, W2 for 3001 ≤ n ≤ 6000 and noncircular W1 for 6001 ≤ n ≤ 9000. Axes: iteration (horizontal), tap length p (vertical); curves: FT-QLMS, FT-WLQLMS and FT-CCQLMS.

Figure 6.6 shows the evolution of the optimal filter length parameter p for the FT-QLMS, FT-WLQLMS and FT-CCQLMS employed for the modelling of the system W1 (interval 1 ≤ n ≤ 3000), W2 (interval 3001 ≤ n ≤ 6000) and noncircular W1 (interval 6001 ≤ n ≤ 9000). These algorithms were initialised as follows: α = 0.03, γ = 1, δ = 1, ∆ = 4, $\mu = 1 \times 10^{-5}$, $\mu_l = 5 \times 10^{-4}$, the initial mixing parameter λ(0) = 0.5 and the initial tap length p(0) = 25. From the figure, the FT-WLQLMS was able to converge to the optimal tap-length of the system W1 for the interval 1 ≤ n ≤ 3000 and to adapt to the system W2 for 3001 ≤ n ≤ 6000, but was unable to model efficiently the noncircular W1 for the interval 6001 ≤ n ≤ 9000. The FT-QLMS was incapable of adapting to the system W2 during the interval 3001 ≤ n ≤ 6000 but was able to model W1 and the noncircular W1. The FT-CCQLMS was able to model all three systems owing to the robust mixing parameter λ.

Figure 6.7 depicts the evolution of the mixing parameter λ of FT-CCQLMS for the modelling of subsystems W1, W2 and noncircular W1. For the linear system W1, the parameter λ → 1 over the interval 1 ≤ n ≤ 3000, making FT-QLMS dominant over FT-WLQLMS. For the widely linear system W2 in the interval 3001 ≤ n ≤ 6000, the parameter λ → 0, making FT-WLQLMS dominant. For the noncircular linear system W1 in the interval 6001 ≤ n ≤ 9000, the parameter λ → 1, favouring the linear model of FT-QLMS. This is due to the noncircular input signal being quadruply white, which has a low magnitude of properness profile [64]. This corroborates earlier findings in [81,82].

Figure 6.7: The evolution of the mixing parameter λ for the modelling of the system W1 for the intervals 1 ≤ n ≤ 3000, W2 for 3001 ≤ n ≤ 6000 and noncircular W1 for 6001 ≤ n ≤ 9000. Axes: iteration (horizontal), mixing parameter λ (vertical).

6.5 Summary

The fractional tap-length (FT) algorithm has been successfully extended to quaternion-valued adaptive filters trained by the Quaternion Least Mean Square (QLMS) and Widely Linear Quaternion Least Mean Square (WL-QLMS) algorithms, which have demonstrated their capabilities in model order selection. The collaborative combination FT-CCQLMS has been shown to model efficiently both widely linear and strictly linear quaternion-valued systems owing to its robust mixing ability. The relationship between the steady-state error and tap-length has been established, giving mathematical support for the modelling capabilities of all algorithms. Simulations on model order selection and the identification of quaternion-valued systems support the approach. The results can be readily extended to incorporate nonlinear quaternion-valued algorithms.


Chapter 7

Conclusions and Future Works

Section 7.1 of this chapter presents the conclusions of the thesis. Section 7.2 provides

suggestions for future works in the field.

7.1 Conclusions

This thesis has proposed a novel class of quaternion-valued adaptive filtering algorithms that improve over existing algorithms. The findings of this thesis are summarised as follows:

a) employ a new cost function which takes into account the non-commutative nature of

quaternion product resulting in a class of nonlinear split-quaternion adaptive filtering

algorithms;

b) introduce a new class of nonlinear quaternion-valued adaptive filtering algorithms

utilizing the locally analytic nonlinear quaternion functions based on the Local An-

alyticity Condition (LAC);

c) extend the locally analytic nonlinear quaternion functions to the recurrent neural

networks (RNN) architecture catering for long-term correlations of the signal;

d) provide a tool to minimize computational complexity and enable system modelling in

the quaternion domain H through the usage of fractional tap-length (FT) algorithm.


The first contribution is the derivation of algorithms that take into consideration the non-commutativity of the quaternion product. Due to the restrictive nature of the Cauchy-Riemann-Fueter (CRF) conditions, componentwise analytic split-quaternion functions are utilised. The simulation results, which improve on real- and complex-valued algorithms of the same nature, highlight the benefits of processing in the quaternion domain H. The higher performance over previous nonlinear quaternion-valued algorithms demonstrates the significance of accounting for the non-commutativity. These derived algorithms then serve as a basis for the algorithms that follow.

The second contribution is the proposal of a class of locally analytic functions that bypass the strict CRF conditions. This CRF restriction is the sole reason prohibiting further developments of nonlinear quaternion-valued algorithms. Gradient descent based quaternion-valued algorithms require a first-order derivative, which current nonlinear quaternion functions fail to provide. In that respect, the local analyticity condition (LAC) is chosen as an alternative definition of analyticity in H. Nonlinear quaternion functions satisfying the LAC are called locally analytic functions; the LAC guarantees their first-order differentiability, which makes them suitable for gradient descent based algorithms. One convenient aspect of these functions is that they enable a generic extension of the complex-valued elementary transcendental functions (ETF) to the quaternion domain H. Building on the non-commutative split-quaternion algorithms proposed previously, a new class of quaternion-valued adaptive filtering algorithms utilising these functions is introduced. The improved performance of these algorithms illustrates their advantage over split-quaternion based algorithms.

The third contribution illustrates the versatility of the proposed locally analytic functions. Building on the previous fully quaternion algorithms for the finite impulse response (FIR) architecture, these functions are implemented in the recurrent neural network (RNN) architecture. Their ability to better capture the cross-correlations between dimensions and to provide a better estimate of the gradient has led to better performance than the split-quaternion counterpart. Furthermore, the fully quaternion RNN algorithm has significantly lower computational complexity than the split-quaternion RNN, making it very attractive. The flexibility of the locally analytic functions will hopefully draw more researchers to fully quaternion based algorithms and their practical applications.

The fourth contribution tackles the issue of extending the fractional tap-length (FT) algorithm to the quaternion domain H. The FT algorithms combined with quaternion-valued adaptive filtering algorithms have demonstrated excellent abilities in model order selection and in reducing the computational complexity incurred when processing in H. It is established that the collaborative combination algorithm is able to model quaternion-valued systems in their generality, both widely linear and strictly linear, owing to the flexible adaptive convex mixing parameter. The results obtained could be easily extended to nonlinear learning algorithms of the same architecture. This could further open up applications in the quaternion domain H.

Overall, this thesis has opened up new possibilities for nonlinear quaternion-valued adaptive filtering algorithms. A class of nonlinear quaternion-valued adaptive filtering algorithms that takes the non-commutativity of quaternion algebra into consideration is proposed. This class of algorithms is then improved upon by utilising locally analytic functions, and the resulting algorithms are extended to the recurrent neural network (RNN) architecture. The fractional tap-length algorithms, previously available only in the complex domain, are then extended to the quaternion domain H, enabling the modelling of both widely linear and strictly linear quaternion-valued systems. The performance of all of the proposed algorithms has been supported through rigorous mathematical analysis and simulations on real and synthetic quaternion-valued signals.

7.2 Future Works

Following the studies of this thesis, several future directions in this research area are proposed

to further improve upon existing algorithms.

Despite the flexibility of the proposed locally analytic functions, the functions are not suitable for algorithms that are based on second-order derivatives, such as the Newton method. These second-order derivative algorithms guarantee faster convergence at the price of increased sensitivity to initial values. Proposing an analytic quaternion function that guarantees the existence of its second-order derivative will enable the extension of these classes of algorithms to the quaternion domain H. This is crucial to the development of nonlinear quaternion signal processing in H as a whole.

Another improvement is to extend the quaternion-valued fractional tap-length (FT) algorithm to encompass architectures with feedback, such as the infinite impulse response (IIR) architecture. One setback is that the proposed FT algorithm can only model moving average (MA) systems and provides only an approximation for autoregressive (AR) and autoregressive moving average (ARMA) systems. The main obstacle in extending the FT algorithm to the IIR architecture is in determining the order of the feedback. The current FT algorithm cost function is not suited for this task and needs to be modified. Therefore, a new cost function that considers the impact of the feedback order needs to be constructed, and the performance of the resulting algorithm in H should be analysed.

An interesting research area is to extend the usage of the locally analytic functions to noise cancellation in optical communication systems. Linear polarization effects such as Polarization Mode Dispersion (PMD) and Polarization Dependent Loss (PDL) are a major source of degradation at high bit rates. These effects can be modelled using a quaternion transfer function, enabling the processing to be done in H instead of the complex domain C.




Appendix A

Derivation of QLMS

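To calculate $\nabla_{\mathbf{w}} y(n)$ and $\nabla_{\mathbf{w}} y^*(n)$, the terms $\mathbf{w}^T(n)\mathbf{x}(n)$ and $\mathbf{x}^H(n)\mathbf{w}^*(n)$ are first expanded componentwise, with the rows holding the real, $\imath$, $\jmath$ and $\kappa$ parts (due to space limitation, the time index "n" has been dropped):

$$
\mathbf{w}^T\mathbf{x} =
\begin{bmatrix}
\mathbf{w}_a^T\mathbf{x}_a - \mathbf{w}_b^T\mathbf{x}_b - \mathbf{w}_c^T\mathbf{x}_c - \mathbf{w}_d^T\mathbf{x}_d \\
\mathbf{w}_a^T\mathbf{x}_b + \mathbf{w}_b^T\mathbf{x}_a + \mathbf{w}_c^T\mathbf{x}_d - \mathbf{w}_d^T\mathbf{x}_c \\
\mathbf{w}_a^T\mathbf{x}_c + \mathbf{w}_c^T\mathbf{x}_a + \mathbf{w}_d^T\mathbf{x}_b - \mathbf{w}_b^T\mathbf{x}_d \\
\mathbf{w}_a^T\mathbf{x}_d + \mathbf{w}_d^T\mathbf{x}_a + \mathbf{w}_b^T\mathbf{x}_c - \mathbf{w}_c^T\mathbf{x}_b
\end{bmatrix}
\tag{A.1}
$$

$$
\mathbf{x}^H\mathbf{w}^* =
\begin{bmatrix}
\mathbf{w}_a^T\mathbf{x}_a - \mathbf{w}_b^T\mathbf{x}_b - \mathbf{w}_c^T\mathbf{x}_c - \mathbf{w}_d^T\mathbf{x}_d \\
-\mathbf{w}_a^T\mathbf{x}_b - \mathbf{w}_b^T\mathbf{x}_a - \mathbf{w}_c^T\mathbf{x}_d + \mathbf{w}_d^T\mathbf{x}_c \\
-\mathbf{w}_a^T\mathbf{x}_c - \mathbf{w}_c^T\mathbf{x}_a - \mathbf{w}_d^T\mathbf{x}_b + \mathbf{w}_b^T\mathbf{x}_d \\
-\mathbf{w}_a^T\mathbf{x}_d - \mathbf{w}_d^T\mathbf{x}_a - \mathbf{w}_b^T\mathbf{x}_c + \mathbf{w}_c^T\mathbf{x}_b
\end{bmatrix}
\tag{A.2}
$$

and the gradients $\nabla_{\mathbf{w}} y(n)$ and $\nabla_{\mathbf{w}} y^*(n)$ are defined as:

$$
\nabla_{\mathbf{w}} y(n) = \nabla_{\mathbf{w}_a} y(n) + \nabla_{\mathbf{w}_b} y(n)\,\imath + \nabla_{\mathbf{w}_c} y(n)\,\jmath + \nabla_{\mathbf{w}_d} y(n)\,\kappa
\tag{A.3}
$$

$$
\nabla_{\mathbf{w}} y^*(n) = \nabla_{\mathbf{w}_a} y^*(n) + \nabla_{\mathbf{w}_b} y^*(n)\,\imath + \nabla_{\mathbf{w}_c} y^*(n)\,\jmath + \nabla_{\mathbf{w}_d} y^*(n)\,\kappa
\tag{A.4}
$$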

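Based on the expansions (A.1) and (A.2), the derivatives in (A.3) can be computed as:

$$
\begin{aligned}
\nabla_{\mathbf{w}_a} y(n) &= \mathbf{x}_a(n) + \mathbf{x}_b(n)\imath + \mathbf{x}_c(n)\jmath + \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_b} y(n)\,\imath &= \big(-\mathbf{x}_b(n) + \mathbf{x}_a(n)\imath - \mathbf{x}_d(n)\jmath + \mathbf{x}_c(n)\kappa\big)\imath \\
&= -\mathbf{x}_a(n) - \mathbf{x}_b(n)\imath + \mathbf{x}_c(n)\jmath + \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_c} y(n)\,\jmath &= \big(-\mathbf{x}_c(n) + \mathbf{x}_d(n)\imath + \mathbf{x}_a(n)\jmath - \mathbf{x}_b(n)\kappa\big)\jmath \\
&= -\mathbf{x}_a(n) + \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath + \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_d} y(n)\,\kappa &= \big(-\mathbf{x}_d(n) - \mathbf{x}_c(n)\imath + \mathbf{x}_b(n)\jmath + \mathbf{x}_a(n)\kappa\big)\kappa \\
&= -\mathbf{x}_a(n) + \mathbf{x}_b(n)\imath + \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa
\end{aligned}
\tag{A.5}
$$

Similarly to the above, the derivatives in (A.4) are obtained as

$$
\begin{aligned}
\nabla_{\mathbf{w}_a} y^*(n) &= \mathbf{x}_a(n) - \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_b} y^*(n)\,\imath &= \big(-\mathbf{x}_b(n) - \mathbf{x}_a(n)\imath + \mathbf{x}_d(n)\jmath - \mathbf{x}_c(n)\kappa\big)\imath \\
&= \mathbf{x}_a(n) - \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_c} y^*(n)\,\jmath &= \big(-\mathbf{x}_c(n) - \mathbf{x}_d(n)\imath - \mathbf{x}_a(n)\jmath + \mathbf{x}_b(n)\kappa\big)\jmath \\
&= \mathbf{x}_a(n) - \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_d} y^*(n)\,\kappa &= \big(-\mathbf{x}_d(n) + \mathbf{x}_c(n)\imath - \mathbf{x}_b(n)\jmath - \mathbf{x}_a(n)\kappa\big)\kappa \\
&= \mathbf{x}_a(n) - \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa
\end{aligned}
\tag{A.6}
$$

Substituting (A.5) into the gradient $\nabla_{\mathbf{w}} y(n)$ in (A.3) and (A.6) into $\nabla_{\mathbf{w}} y^*(n)$ in (A.4) yields

$$
\nabla_{\mathbf{w}} y(n) = -2\mathbf{x}^*(n); \qquad \nabla_{\mathbf{w}} y^*(n) = 4\mathbf{x}^*(n)
\tag{A.7}
$$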

which is employed in the derivation of the QLMS.
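The result (A.7) can be checked numerically for a single tap. The following is a minimal sketch, assuming NumPy and a quaternion stored as the array [a, b, c, d]; the helper names `qmul`, `qconj` and `quaternion_gradient` are illustrative and do not come from the thesis. The real partial derivatives are approximated by central differences and combined as in (A.3) and (A.4), with each partial multiplied on the right by its imaginary unit.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as arrays [a, b, c, d]."""
    pa, pb, pc, pd = p
    qa, qb, qc, qd = q
    return np.array([
        pa * qa - pb * qb - pc * qc - pd * qd,
        pa * qb + pb * qa + pc * qd - pd * qc,
        pa * qc - pb * qd + pc * qa + pd * qb,
        pa * qd + pb * qc - pc * qb + pd * qa,
    ])

def qconj(q):
    """Quaternion conjugate: negate the three imaginary parts."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

# Rows are the units 1, i, j, k written as quaternions.
UNITS = np.eye(4)

def quaternion_gradient(f, w, h=1e-6):
    """Sum over m of (df/dw_m) * unit_m, as in (A.3) and (A.4).

    Each real partial derivative df/dw_m is itself quaternion valued and is
    estimated here by a central difference (exact up to rounding for the
    linear maps used below).
    """
    total = np.zeros(4)
    for m in range(4):
        step = np.zeros(4)
        step[m] = h
        partial = (f(w + step) - f(w - step)) / (2.0 * h)
        total += qmul(partial, UNITS[m])
    return total

rng = np.random.default_rng(0)
w = rng.standard_normal(4)  # single quaternion weight (hypothetical test value)
x = rng.standard_normal(4)  # single quaternion input (hypothetical test value)

grad_y = quaternion_gradient(lambda wq: qmul(wq, x), w)              # y = w x
grad_y_conj = quaternion_gradient(lambda wq: qconj(qmul(wq, x)), w)  # y* = (w x)*

print(grad_y, -2.0 * qconj(x))
print(grad_y_conj, 4.0 * qconj(x))
```

Because the output is linear in the weight, the two printed pairs should agree to rounding error, which illustrates the asymmetry between the two gradients caused by the non-commutative product.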

The derivation for the other nonlinear quaternion algorithms, SQAFA, AASQAFA,

QNGD and AQNGD, follows a similar approach.


Appendix B

Derivation of QMLP-FIR

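Before proceeding, the componentwise output $\mathrm{net}(n) = \mathbf{w}^T(n)\mathbf{x}(n)$ is given as (the time index "n" is dropped due to space limitation)

$$
\begin{bmatrix}
\mathrm{net}_a \\ \mathrm{net}_b \\ \mathrm{net}_c \\ \mathrm{net}_d
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{w}_a^T\mathbf{x}_a - \mathbf{w}_b^T\mathbf{x}_b - \mathbf{w}_c^T\mathbf{x}_c - \mathbf{w}_d^T\mathbf{x}_d \\
\mathbf{w}_a^T\mathbf{x}_b + \mathbf{w}_b^T\mathbf{x}_a + \mathbf{w}_c^T\mathbf{x}_d - \mathbf{w}_d^T\mathbf{x}_c \\
\mathbf{w}_a^T\mathbf{x}_c + \mathbf{w}_c^T\mathbf{x}_a + \mathbf{w}_d^T\mathbf{x}_b - \mathbf{w}_b^T\mathbf{x}_d \\
\mathbf{w}_a^T\mathbf{x}_d + \mathbf{w}_d^T\mathbf{x}_a + \mathbf{w}_b^T\mathbf{x}_c - \mathbf{w}_c^T\mathbf{x}_b
\end{bmatrix}
\tag{B.1}
$$

For clarity, the gradient of the QMLP-FIR is shown to be

$$
\nabla_{\mathbf{w}} E(n) = -2e_a(n)\frac{\partial y_a(n)}{\partial \mathbf{w}} - 2e_b(n)\frac{\partial y_b(n)}{\partial \mathbf{w}} - 2e_c(n)\frac{\partial y_c(n)}{\partial \mathbf{w}} - 2e_d(n)\frac{\partial y_d(n)}{\partial \mathbf{w}}
\tag{B.2}
$$

From Section 3.2.1, the term $\partial y_a(n)/\partial \mathbf{w}$ is calculated to be

$$
\frac{\partial y_a(n)}{\partial \mathbf{w}} = \Phi'_a(\mathrm{net}_a(n))\,\mathbf{x}^*(n)
\tag{B.3}
$$

Similarly, the other terms $\partial y_b(n)/\partial \mathbf{w}$, $\partial y_c(n)/\partial \mathbf{w}$ and $\partial y_d(n)/\partial \mathbf{w}$ are derived to be

$$
\begin{aligned}
\frac{\partial y_b(n)}{\partial \mathbf{w}}
&= \Phi'_b(\mathrm{net}_b(n))\mathbf{x}_b(n) + \Phi'_b(\mathrm{net}_b(n))\mathbf{x}_a(n)\imath + \Phi'_b(\mathrm{net}_b(n))\mathbf{x}_d(n)\jmath - \Phi'_b(\mathrm{net}_b(n))\mathbf{x}_c(n)\kappa \\
&= \Phi'_b(\mathrm{net}_b(n))\,\imath\,\big(-\mathbf{x}_b(n)\imath + \mathbf{x}_a(n) - \mathbf{x}_d(n)\kappa - \mathbf{x}_c(n)\jmath\big) \\
&= \Phi'_b(\mathrm{net}_b(n))\,\imath\,\mathbf{x}^*(n) \\
\frac{\partial y_c(n)}{\partial \mathbf{w}}
&= \Phi'_c(\mathrm{net}_c(n))\mathbf{x}_c(n) - \Phi'_c(\mathrm{net}_c(n))\mathbf{x}_d(n)\imath + \Phi'_c(\mathrm{net}_c(n))\mathbf{x}_a(n)\jmath + \Phi'_c(\mathrm{net}_c(n))\mathbf{x}_b(n)\kappa \\
&= \Phi'_c(\mathrm{net}_c(n))\,\jmath\,\big(-\mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa + \mathbf{x}_a(n) - \mathbf{x}_b(n)\imath\big) \\
&= \Phi'_c(\mathrm{net}_c(n))\,\jmath\,\mathbf{x}^*(n) \\
\frac{\partial y_d(n)}{\partial \mathbf{w}}
&= \Phi'_d(\mathrm{net}_d(n))\mathbf{x}_d(n) + \Phi'_d(\mathrm{net}_d(n))\mathbf{x}_c(n)\imath - \Phi'_d(\mathrm{net}_d(n))\mathbf{x}_b(n)\jmath + \Phi'_d(\mathrm{net}_d(n))\mathbf{x}_a(n)\kappa \\
&= \Phi'_d(\mathrm{net}_d(n))\,\kappa\,\big(-\mathbf{x}_d(n)\kappa - \mathbf{x}_c(n)\jmath - \mathbf{x}_b(n)\imath + \mathbf{x}_a(n)\big) \\
&= \Phi'_d(\mathrm{net}_d(n))\,\kappa\,\mathbf{x}^*(n)
\end{aligned}
\tag{B.4}
$$

Substituting the terms defined in (B.3) and (B.4) into the gradient $\nabla_{\mathbf{w}} E(n)$ in (B.2) yields

$$
\begin{aligned}
\nabla_{\mathbf{w}} E(n)
&= -2e_a(n)\Phi'_a(\mathrm{net}_a(n))\mathbf{x}^*(n) - 2e_b(n)\Phi'_b(\mathrm{net}_b(n))\,\imath\,\mathbf{x}^*(n) - 2e_c(n)\Phi'_c(\mathrm{net}_c(n))\,\jmath\,\mathbf{x}^*(n) \\
&\quad - 2e_d(n)\Phi'_d(\mathrm{net}_d(n))\,\kappa\,\mathbf{x}^*(n) \\
&= -2\,e(n)\,.\,\Phi'_s(\mathrm{net}(n))\,\mathbf{x}^*(n)
\end{aligned}
\tag{B.5}
$$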

where “.” denotes the dot product.
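As a sanity check on (B.5), the analytic gradient can be compared with finite differences of the real-valued cost. This is only a sketch under stated assumptions: the split nonlinearity is taken to be a componentwise tanh, the cost is $E = e_a^2 + e_b^2 + e_c^2 + e_d^2$, and the names `qmul`, `qconj`, `cost` and `analytic_gradient` are hypothetical helpers, not code from the thesis.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product; quaternions live on the last axis as [a, b, c, d]."""
    pa, pb, pc, pd = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    qa, qb, qc, qd = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        pa * qa - pb * qb - pc * qc - pd * qd,
        pa * qb + pb * qa + pc * qd - pd * qc,
        pa * qc - pb * qd + pc * qa + pd * qb,
        pa * qd + pb * qc - pc * qb + pd * qa,
    ], axis=-1)

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def cost(W, X, d):
    """E = ||d - tanh_split(w^T x)||^2 for an N-tap filter (W, X: shape (N, 4))."""
    net = qmul(W, X).sum(axis=0)   # w^T x, as in (B.1)
    e = d - np.tanh(net)           # split (componentwise) nonlinearity
    return np.sum(e ** 2)

def analytic_gradient(W, X, d):
    """Gradient from (B.5): -2 (e . Phi'_s(net)) x^*, one quaternion per tap."""
    net = qmul(W, X).sum(axis=0)
    e = d - np.tanh(net)
    g = e * (1.0 - np.tanh(net) ** 2)   # componentwise product e . Phi'_s
    return -2.0 * qmul(g, qconj(X))

def numeric_gradient(W, X, d, h=1e-6):
    """Real partial derivatives dE/dW[n, m] by central differences."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = h
        grad[idx] = (cost(W + step, X, d) - cost(W - step, X, d)) / (2.0 * h)
    return grad

rng = np.random.default_rng(1)
W = 0.3 * rng.standard_normal((3, 4))   # 3 hypothetical quaternion taps
X = rng.standard_normal((3, 4))
d = rng.standard_normal(4)

print(np.max(np.abs(analytic_gradient(W, X, d) - numeric_gradient(W, X, d))))
```

The four components of each quaternion in the analytic gradient correspond to the real partial derivatives of $E$ with respect to $w_a$, $w_b$, $w_c$ and $w_d$ of that tap, which is why the two arrays can be compared entry by entry.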


Appendix C

Convergence of SQAFA

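The convergence criterion employed in this work is given by

$$
E\{\|\bar{e}(n)\|_2^2\} \le E\{\|e(n)\|_2^2\}
\tag{C.1}
$$

where $\bar{e}(n)$ and $e(n)$ are respectively the a posteriori and the a priori output errors, given by

$$
\bar{e}(n) = d(n) - \Phi_s\big(\mathbf{w}^T(n+1)\mathbf{x}(n)\big) + \varepsilon(n); \qquad
e(n) = d(n) - \Phi_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) + \varepsilon(n)
\tag{C.2}
$$

where the quaternion-valued term $\varepsilon(n)$ is expressed componentwise as

$$
\varepsilon(n) = \varepsilon_a(n) + \varepsilon_b(n)\imath + \varepsilon_c(n)\jmath + \varepsilon_d(n)\kappa
\tag{C.3}
$$

The terms $\bar{e}$ and $e$ in (C.2) can be related by the first-order Taylor series expansion as

$$
\|\bar{e}(n)\|_2^2 = \|e(n)\|_2^2 + \Delta\mathbf{w}^H(n)\frac{\partial \|e(n)\|_2^2}{\partial \mathbf{w}^*(n)}
\tag{C.4}
$$

where $\partial \|e(n)\|_2^2 / \partial \mathbf{w}^*(n)$ is effectively the error gradient of the cost function.

The term $\|e(n)\|_2^2$ is first evaluated as

$$
\begin{aligned}
\|e(n)\|_2^2 &= \big(d(n) - y(n) + \varepsilon(n)\big)\big(d^*(n) - y^*(n) + \varepsilon^*(n)\big) \\
&= d(n)d^*(n) - d(n)y^*(n) + d(n)\varepsilon^*(n) - y(n)d^*(n) + y(n)y^*(n) - y(n)\varepsilon^*(n) + \varepsilon(n)d^*(n) \\
&\quad - \varepsilon(n)y^*(n) + \varepsilon(n)\varepsilon^*(n)
\end{aligned}
\tag{C.5}
$$

Then, the error gradient $\partial \|e(n)\|_2^2 / \partial \mathbf{w}^*(n)$ can be calculated as

$$
\begin{aligned}
\frac{\partial \|e(n)\|_2^2}{\partial \mathbf{w}^*(n)}
&= -d(n)\nabla_{\mathbf{w}} y^*(n) - \nabla_{\mathbf{w}} y(n)d^*(n) + y(n)\nabla_{\mathbf{w}} y^*(n) + \nabla_{\mathbf{w}} y(n)y^*(n) - \nabla_{\mathbf{w}} y(n)\varepsilon^*(n) \\
&\quad - \varepsilon(n)\nabla_{\mathbf{w}} y^*(n) \\
&= \big(-d(n) + y(n) - \varepsilon(n)\big)\nabla_{\mathbf{w}} y^*(n) + \nabla_{\mathbf{w}} y(n)\big(-d^*(n) + y^*(n) - \varepsilon^*(n)\big) \\
&= -e(n)\nabla_{\mathbf{w}} y^*(n) - \nabla_{\mathbf{w}} y(n)e^*(n) \\
&= -\Big[4e(n)\Phi'_s\big(\mathbf{x}^H(n)\mathbf{w}^*(n)\big)\mathbf{x}^*(n) - 2\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}^*(n)e^*(n)\Big]
\end{aligned}
\tag{C.6}
$$

The term $\Delta\mathbf{w}^H(n) = -\mu\big(\partial \|e(n)\|_2^2 / \partial \mathbf{w}^*(n)\big)^H$, where $\partial \|e(n)\|_2^2 / \partial \mathbf{w}^*(n)$ is given in (C.6), can be calculated as

$$
\Delta\mathbf{w}^H = \mu\Big[2\mathbf{x}^T(n)\Phi'^{*}_s\big(\mathbf{x}^H(n)\mathbf{w}^*(n)\big)e^*(n) - e(n)\mathbf{x}^T(n)\Phi'^{*}_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\Big]
\tag{C.7}
$$

Substituting (C.6) and (C.7) into the Taylor series expansion (C.4) and applying the expectation operator to both sides yields

$$
\begin{aligned}
E\{\|\bar{e}(n)\|_2^2\} = E\Big\{ \|e(n)\|_2^2 - \mu\Big(
&\Big[2\mathbf{x}^T(n)\Phi'^{*}_s\big(\mathbf{x}^H(n)\mathbf{w}^*(n)\big)e^*(n) - e(n)\mathbf{x}^T(n)\Phi'^{*}_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\Big] \\
&\Big[4e(n)\Phi'_s\big(\mathbf{x}^H(n)\mathbf{w}^*(n)\big)\mathbf{x}^*(n) - 2\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}^*(n)e^*(n)\Big]\Big)\Big\}
\end{aligned}
\tag{C.8}
$$

Applying the assumptions of small $\mu$ and statistical independence between $e(n)$ and $\mathbf{x}(n)$, followed by the factorization of the term $\|e(n)\|_2^2$, gives

$$
\begin{aligned}
E\{\|\bar{e}(n)\|_2^2\}
&= E\Big\{\|e(n)\|_2^2\Big[1 - 10\mu\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big]\Big\} \\
&= E\{\|e(n)\|_2^2\}\,E\Big\{\Big[1 - 10\mu\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big]\Big\}
\end{aligned}
\tag{C.9}
$$

The two terms can be separated since they are independent of each other, which follows from the assumed statistical independence between $e(n)$ and $\mathbf{x}(n)$. Therefore, the condition for convergence in (C.1) is satisfied for

$$
0 < 10\mu\,E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\} < 1
\tag{C.10}
$$

Solving for $\mu$, we obtain the range of the stepsize for which SQAFA converges

$$
0 < \mu < \frac{1}{10\,E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\}}
\tag{C.11}
$$

The ranges of the stepsize for QNGD and AQNGD are derived in the same manner.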
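To make the bound (C.11) concrete, the expectation can be replaced by a sample average over recorded input vectors. The sketch below is illustrative only: it assumes a componentwise tanh as the split nonlinearity, a fixed weight vector, and a hypothetical helper name `sqafa_stepsize_bound`; none of these choices is prescribed by the derivation above.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product; quaternions live on the last axis as [a, b, c, d]."""
    pa, pb, pc, pd = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    qa, qb, qc, qd = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        pa * qa - pb * qb - pc * qc - pd * qd,
        pa * qb + pb * qa + pc * qd - pd * qc,
        pa * qc - pb * qd + pc * qa + pd * qb,
        pa * qd + pb * qc - pc * qb + pd * qa,
    ], axis=-1)

def sqafa_stepsize_bound(X, W):
    """Sample estimate of the upper bound on mu in (C.11).

    X: inputs of shape (T, N, 4), i.e. T time steps of N quaternion taps.
    W: weights of shape (N, 4), held fixed for the estimate (an assumption).
    """
    net = qmul(W, X).sum(axis=1)              # w^T x per time step, shape (T, 4)
    dphi = 1.0 - np.tanh(net) ** 2            # Phi'_s for a split tanh
    dphi_norm2 = np.sum(dphi ** 2, axis=1)    # ||Phi'_s(w^T x)||_2^2
    x_energy = np.sum(X ** 2, axis=(1, 2))    # x^T x^* = sum of |x_i|^2
    return 1.0 / (10.0 * np.mean(x_energy * dphi_norm2))

# Hypothetical data: all-ones inputs and zero weights, so that net = 0,
# Phi' = 1 in every component, ||Phi'||^2 = 4 and x^T x^* = 8.
X_demo = np.ones((3, 2, 4))
W_demo = np.zeros((2, 4))
mu_max = sqafa_stepsize_bound(X_demo, W_demo)
print(mu_max)
```

For this toy input the estimate reduces to $1/(10 \cdot 8 \cdot 4)$; with real data the bound tightens as the input power grows and relaxes as the nonlinearity saturates.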


Appendix D

Convergence of AASQAFA

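Similar to the convergence of SQAFA, the convergence criterion employed is

$$
E\{\|\bar{e}(n)\|_2^2\} \le E\{\|e(n)\|_2^2\}
\tag{D.1}
$$

The a priori error in the real part, $e_a(n)$, and the a posteriori error in the real part, $\bar{e}_a(n)$, are given by

$$
e_a(n) = d_a(n) - \lambda_a(n)\Phi_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) + \varepsilon(n); \qquad
\bar{e}_a(n) = d_a(n) - \lambda_a(n)\Phi_a\big(\mathbf{w}^T(n+1)\mathbf{x}(n)\big) + \varepsilon(n)
\tag{D.2}
$$

where the quaternion-valued term $\varepsilon(n)$ is expressed componentwise as

$$
\varepsilon(n) = \varepsilon_a(n) + \varepsilon_b(n)\imath + \varepsilon_c(n)\jmath + \varepsilon_d(n)\kappa
\tag{D.3}
$$

Since $\lambda_a$ corresponds to the real part of a quaternion quantity, we shall consider only the real part of the Taylor series expansion. From (C.4) we have

$$
\|\bar{e}_a(n)\|_2^2 = \|e_a(n)\|_2^2 + \Delta_a\mathbf{w}(n)\frac{\partial \|e_a(n)\|_2^2}{\partial \mathbf{w}(n)}
\tag{D.4}
$$

where the term $\Delta_a\mathbf{w}^H(n)$ refers to the Hermitian of the weight update in the real part.

The term $\partial \|e_a(n)\|_2^2 / \partial \mathbf{w}(n)$ is equivalent to $\nabla_{\mathbf{w}} E_a(n)$ and is given by

$$
\nabla_{\mathbf{w}} E_a = e_a(n)\frac{\partial e_a^*(n)}{\partial \mathbf{w}(n)} + \frac{\partial e_a(n)}{\partial \mathbf{w}(n)}e_a^*(n) = 2e_a(n)\frac{\partial e_a(n)}{\partial \mathbf{w}(n)}
\tag{D.5}
$$

From the previous results, the term $\Re\{\mathbf{w}^T(n)\mathbf{x}(n)\} = \mathrm{net}_a(n)$ is given as

$$
\mathrm{net}_a(n) = \mathbf{w}_a^T(n)\mathbf{x}_a(n) - \mathbf{w}_b^T(n)\mathbf{x}_b(n) - \mathbf{w}_c^T(n)\mathbf{x}_c(n) - \mathbf{w}_d^T(n)\mathbf{x}_d(n)
\tag{D.6}
$$

Using (D.6), the real part of the nonlinear function $\Phi_a(\cdot)$ can be expanded into

$$
\Phi_a\big(\mathrm{net}_a(n)\big) = \Phi_a\big(\mathbf{w}_a^T(n)\mathbf{x}_a(n) - \mathbf{w}_b^T(n)\mathbf{x}_b(n) - \mathbf{w}_c^T(n)\mathbf{x}_c(n) - \mathbf{w}_d^T(n)\mathbf{x}_d(n)\big)
\tag{D.7}
$$

Substituting the expression for the real part of the nonlinear function (D.7) into the a priori error (D.2) and then differentiating with respect to $\mathbf{w}(n)$ gives

$$
\begin{aligned}
\frac{\partial e_a(n)}{\partial \mathbf{w}(n)}
&= -\lambda_a(n)\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big(\mathbf{x}_a(n) - \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa\big) \\
&= -\lambda_a(n)\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}^*(n)
\end{aligned}
\tag{D.8}
$$

Substituting (D.8) into the error gradient $\nabla_{\mathbf{w}} E_a(n)$ in (D.5) gives

$$
\nabla_{\mathbf{w}} E_a(n) = -2e_a(n)\lambda_a(n)\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}^*(n)
\tag{D.9}
$$

The term $\Delta_a\mathbf{w}(n)$ is obtained from (D.9) and is given by

$$
\Delta_a\mathbf{w}(n) = \mu\Big(\lambda_a(n)\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}^T(n)e_a(n)\Big)
\tag{D.10}
$$

Substituting the error gradient $\nabla_{\mathbf{w}} E_a(n)$ from (D.9) and $\Delta_a\mathbf{w}(n)$ from (D.10) into the real part of the Taylor series expansion (D.4) and applying the expectation operator yields

$$
E\{\|\bar{e}_a(n)\|_2^2\} = E\Big\{\|e_a(n)\|_2^2 - \Big[2\mu\lambda_a^2(n)\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\,\|e_a(n)\|_2^2\Big]\Big\}
\tag{D.11}
$$

Now, in order to satisfy the convergence condition (D.1), (D.11) becomes

$$
0 < E\Big\{1 - 2\mu\lambda_a^2(n)\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\} < 1
\tag{D.12}
$$

Solving for $\lambda_a(n)$ gives the stability bounds on the adaptive amplitude parameter, in the form

$$
0 < \lambda_a^2(n) < \frac{1}{2\mu\,E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\}}
\tag{D.13}
$$

which also reveals the relationship between the amplitude of the quaternion nonlinearity and the stepsize parameter.

Similarly, using the same procedure, the bounds on $\lambda_b(n)$, $\lambda_c(n)$ and $\lambda_d(n)$ can be found as

$$
0 < \lambda_b^2(n) < \frac{1}{2\mu\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_b\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2}
\tag{D.14}
$$

$$
0 < \lambda_c^2(n) < \frac{1}{2\mu\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_c\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2}
\tag{D.15}
$$

$$
0 < \lambda_d^2(n) < \frac{1}{2\mu\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_d\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2}
\tag{D.16}
$$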
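The bounds (D.13) to (D.16) can be evaluated from data in the same spirit. The following sketch makes several assumptions that are not part of the derivation: a componentwise tanh nonlinearity, a fixed weight vector, and sample averages used for all four components (the form of (D.13)); the function name `amplitude_bounds` is hypothetical.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product; quaternions live on the last axis as [a, b, c, d]."""
    pa, pb, pc, pd = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    qa, qb, qc, qd = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        pa * qa - pb * qb - pc * qc - pd * qd,
        pa * qb + pb * qa + pc * qd - pd * qc,
        pa * qc - pb * qd + pc * qa + pd * qb,
        pa * qd + pb * qc - pc * qb + pd * qa,
    ], axis=-1)

def amplitude_bounds(X, W, mu):
    """Sample estimates of the upper bounds on lambda_a^2 ... lambda_d^2.

    X: inputs of shape (T, N, 4); W: weights of shape (N, 4); mu: stepsize.
    Returns an array [bound_a, bound_b, bound_c, bound_d].
    """
    net = qmul(W, X).sum(axis=1)                 # (T, 4): net_a ... net_d
    dphi2 = (1.0 - np.tanh(net) ** 2) ** 2       # squared slope per component
    x_energy = np.sum(X ** 2, axis=(1, 2))       # x^T x^* per time step
    return 1.0 / (2.0 * mu * np.mean(x_energy[:, None] * dphi2, axis=0))

# Hypothetical data: zero weights give unit slopes, and x^T x^* = 8.
X_demo = np.ones((3, 2, 4))
W_demo = np.zeros((2, 4))
bounds = amplitude_bounds(X_demo, W_demo, mu=0.01)
print(bounds)
```

Halving the stepsize doubles every bound on $\lambda^2$, which is the trade-off between the amplitude of the nonlinearity and the stepsize noted after (D.13).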


Appendix E

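Analyticity of the exponential function $e^q$

The quaternion exponential function $e^q$ in its Euler form is given by

$$
e^q = e^{q_a}\left(\cos(\alpha) + \frac{q_b\sin(\alpha)\imath}{\alpha} + \frac{q_c\sin(\alpha)\jmath}{\alpha} + \frac{q_d\sin(\alpha)\kappa}{\alpha}\right)
\tag{E.1}
$$

The derivative to be evaluated is defined as

$$
-\frac{\partial e^q}{\partial \alpha}\zeta = -\left(\frac{q_b}{\alpha}\frac{\partial e^q}{\partial q_b} + \frac{q_c}{\alpha}\frac{\partial e^q}{\partial q_c} + \frac{q_d}{\alpha}\frac{\partial e^q}{\partial q_d}\right)\left(\frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}\right)
\tag{E.2}
$$

To calculate the term $-\frac{\partial e^q}{\partial \alpha}\zeta$ in (E.2), the terms $\frac{\partial e^q}{\partial q_b}$, $\frac{\partial e^q}{\partial q_c}$ and $\frac{\partial e^q}{\partial q_d}$ are first evaluated. The term $\frac{\partial e^q}{\partial q_b}$ is derived by differentiating (E.1) with respect to $q_b$ to yield

$$
\begin{aligned}
\frac{\partial e^q}{\partial q_b}
&= e^{q_a}\frac{\partial}{\partial q_b}\left(\cos(\alpha) + \frac{q_b\sin(\alpha)\imath}{\alpha} + \frac{q_c\sin(\alpha)\jmath}{\alpha} + \frac{q_d\sin(\alpha)\kappa}{\alpha}\right) \\
&= e^{q_a}\left(-\frac{q_b\sin(\alpha)}{\alpha} + \frac{q_b^2\cos(\alpha)\imath}{\alpha^2} + \frac{(q_c^2 + q_d^2)\sin(\alpha)\imath}{\alpha^3}
+ \frac{q_bq_c\cos(\alpha)\jmath}{\alpha^2} - \frac{q_bq_c\sin(\alpha)\jmath}{\alpha^3} + \frac{q_bq_d\cos(\alpha)\kappa}{\alpha^2} - \frac{q_bq_d\sin(\alpha)\kappa}{\alpha^3}\right)
\end{aligned}
\tag{E.3}
$$

Proceeding in the same manner, the terms $\frac{\partial e^q}{\partial q_c}$ and $\frac{\partial e^q}{\partial q_d}$ are calculated as

$$
\frac{\partial e^q}{\partial q_c}
= e^{q_a}\left(-\frac{q_c\sin(\alpha)}{\alpha} + \frac{q_bq_c\cos(\alpha)\imath}{\alpha^2} - \frac{q_bq_c\sin(\alpha)\imath}{\alpha^3}
+ \frac{q_c^2\cos(\alpha)\jmath}{\alpha^2} + \frac{(q_b^2 + q_d^2)\sin(\alpha)\jmath}{\alpha^3} + \frac{q_cq_d\cos(\alpha)\kappa}{\alpha^2} - \frac{q_cq_d\sin(\alpha)\kappa}{\alpha^3}\right)
\tag{E.4}
$$

$$
\frac{\partial e^q}{\partial q_d}
= e^{q_a}\left(-\frac{q_d\sin(\alpha)}{\alpha} + \frac{q_bq_d\cos(\alpha)\imath}{\alpha^2} - \frac{q_bq_d\sin(\alpha)\imath}{\alpha^3}
+ \frac{q_cq_d\cos(\alpha)\jmath}{\alpha^2} - \frac{q_cq_d\sin(\alpha)\jmath}{\alpha^3} + \frac{q_d^2\cos(\alpha)\kappa}{\alpha^2} + \frac{(q_b^2 + q_c^2)\sin(\alpha)\kappa}{\alpha^3}\right)
\tag{E.5}
$$

Substituting the terms defined in (E.3), (E.4) and (E.5) into the analyticity condition specified in (E.2) yields

$$
-\frac{\partial e^q}{\partial \alpha}\zeta
= e^{q_a}\left(-\frac{\sin(\alpha)}{\alpha^2}\big(q_b^2 + q_c^2 + q_d^2\big)
+ \frac{q_b\big(q_b^2 + q_c^2 + q_d^2\big)\cos(\alpha)}{\alpha^3}\imath
+ \frac{q_c\big(q_b^2 + q_c^2 + q_d^2\big)\cos(\alpha)}{\alpha^3}\jmath
+ \frac{q_d\big(q_b^2 + q_c^2 + q_d^2\big)\cos(\alpha)}{\alpha^3}\kappa\right)\big(-\zeta\big)
\tag{E.6}
$$

From Section 4.3, $\zeta$ and $\alpha$ are given by

$$
\zeta = \frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}; \qquad \alpha = \sqrt{q_b^2 + q_c^2 + q_d^2}
\tag{E.7}
$$

(E.6) is simplified further by substituting the definitions of $\zeta$ and $\alpha$ in (E.7) to give

$$
\begin{aligned}
-\frac{\partial e^q}{\partial \alpha}\zeta
&= e^{q_a}\left(-\sin(\alpha) + \frac{q_b\cos(\alpha)\imath}{\alpha} + \frac{q_c\cos(\alpha)\jmath}{\alpha} + \frac{q_d\cos(\alpha)\kappa}{\alpha}\right)\big(-\zeta\big) \\
&= e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big)
\end{aligned}
\tag{E.8}
$$

The right-hand side of (E.8) is $e^q$ itself, which equals $\partial e^q / \partial q_a$; the exponential function therefore satisfies the local analyticity condition.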
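The identity established in (E.8) lends itself to a quick numerical check. The sketch below, assuming NumPy and a quaternion stored as [q_a, q_b, q_c, q_d], approximates both sides of the condition with central differences; `qexp`, `lac_lhs` and `lac_rhs` are hypothetical helper names, and the test point is arbitrary (it only needs a non-zero vector part so that $\zeta$ is defined).

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as arrays [a, b, c, d]."""
    pa, pb, pc, pd = p
    qa, qb, qc, qd = q
    return np.array([
        pa * qa - pb * qb - pc * qc - pd * qd,
        pa * qb + pb * qa + pc * qd - pd * qc,
        pa * qc - pb * qd + pc * qa + pd * qb,
        pa * qd + pb * qc - pc * qb + pd * qa,
    ])

def qexp(q):
    """Quaternion exponential in the Euler form (E.1)."""
    alpha = np.linalg.norm(q[1:])
    zeta = q[1:] / alpha
    return np.exp(q[0]) * np.concatenate(([np.cos(alpha)], np.sin(alpha) * zeta))

def lac_lhs(f, q, h=1e-5):
    """Left-hand side of the condition: the partial derivative with respect to q_a."""
    step = np.array([h, 0.0, 0.0, 0.0])
    return (f(q + step) - f(q - step)) / (2.0 * h)

def lac_rhs(f, q, h=1e-5):
    """Right-hand side: -(sum_m (q_m/alpha) df/dq_m) zeta, as in (E.2).

    The weighted sum of partials is the directional derivative of f along the
    unit pure quaternion zeta, estimated here by a central difference.
    """
    alpha = np.linalg.norm(q[1:])
    zeta = np.concatenate(([0.0], q[1:] / alpha))
    directional = (f(q + h * zeta) - f(q - h * zeta)) / (2.0 * h)
    return -qmul(directional, zeta)

q0 = np.array([0.3, 0.2, -0.4, 0.1])   # arbitrary test point
lhs = lac_lhs(qexp, q0)
rhs = lac_rhs(qexp, q0)
print(lhs, rhs, qexp(q0))
```

All three printed quaternions should coincide up to the finite-difference error, reflecting that the derivative of $e^q$ along the real direction and the rotated derivative along $\zeta$ both return $e^q$.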


Appendix F

Local Analyticity of tanh(q)

From Section 4.3.2, the Euler expression for tanh(q) is derived to be

tanh(q) =e4qa − 1 + 2e2qa sin(2α)ζ

e4qa + 1 + 2e2qa cos(2α)(F.1)

To examine the local analyticity of tanh(q), the quaternion local analyticity condition of

tanh(q) is given as

∂ tanh(q)

∂qa= −

(qbα

∂ tanh(q)

∂qb+qcα

∂ tanh(q)

∂qc+qdα

∂ tanh(q)

∂qd

)(qbı+ qc+ qdκ

α

)

(F.2)

Similarly to the case of quaternion exponential functions, the term ∂ tanh(q)∂qa

can be obtained

by differentiating (F.1) with respect to qa, to give

∂ tanh(q)

∂qa=

∂

∂qa

(e4qa − 1

e4qa + 2e2qa cos(2α) + 1+

2e2qa sin(2α)ζ

e4qa + 2e2qa cos(2α) + 1

)

=4e6qa cos(2α) + 8e4qa + 4e2qa cos(2α)


)2 +

(4e2qa sin(2α) − 4e6qa sin(2α)

)


)2 ζ (F.3)

In order to determine the remaining terms in (F.2), define

u = 2e2qa sin(2α); v = e4qa + 2e2qa cos(2α) + 1 (F.4)

F. Local Analyticity of tanh(q) 137

Furthermore, ζ and α are defined by

ζ =qbı+ qc+ qdκ

α; α =

√

q2b + q2c + q2d (F.5)

Substitute u and v into (F.1) and expand ζ according to (F.5) to yield

tanh(q) =e4qa − 1 + uζ

v=

e4qa − 1

v+uqbı

vα+uqc

vα+uqdκ

vα(F.6)

Proceeding in a manner similar to when determining the analyticity of eq, the term ∂ tanh(q)∂qb

is obtained by differentiating (F.2) with respect to qb, resulting in

∂ tanh(q)

∂qb=

∂

∂qb

(e4qa − 1

v+uqbı

vα+uqc

vα+uqdκ

vα

)

=

(e4qa − 1

)(4e2qaqb sin(2α)

)

v2+

(vα

)(∂uqb∂qb

)−

(uqb

)(∂vα∂qb

)

(vα)2

ı+

(vα

)(∂uqc∂qb

)−

(uqc

)(∂vα∂qb

)

(vα)2

+

(vα

)(∂uqd∂qb

)−

(uqd

)(∂vα∂qb

)

(vα)2

κ

=

(e4qa − 1

)(4e2qaqb sin(2α)

)

v2+

(vαu+ v4e2qaq2b cos(2α) −

uvq2b

α + uq2b4e2qa sin(2α)

(vα

)2

)

ı

+

(v4e2qaqbqc cos(2α) −

uvqbqcα + uqbqc4e

2qa sin(2α)(vα

)2

)

+

(v4e2qaqbqd cos(2α)−

uvqbqdα + uqbqd4e

2qa sin(2α)(vα

)2

)

κ (F.7)


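Noticing that $u$, $v$ and $\alpha$ are functions of the variables $q_b$, $q_c$ and $q_d$, the terms $\partial\tanh(q)/\partial q_c$ and $\partial\tanh(q)/\partial q_d$ become

$$\begin{aligned}
\frac{\partial \tanh(q)}{\partial q_c} &= \frac{(e^{4q_a}-1)\big(4e^{2q_a}q_c\sin(2\alpha)\big)}{v^2\alpha} + \left(\frac{4ve^{2q_a}q_bq_c\cos(2\alpha) - \frac{uvq_bq_c}{\alpha} + 4uq_bq_ce^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\imath\\
&\quad + \left(\frac{v\alpha u + 4ve^{2q_a}q_c^2\cos(2\alpha) - \frac{uvq_c^2}{\alpha} + 4uq_c^2e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\jmath\\
&\quad + \left(\frac{4ve^{2q_a}q_cq_d\cos(2\alpha) - \frac{uvq_cq_d}{\alpha} + 4uq_cq_de^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\kappa
\end{aligned}\tag{F.8}$$

$$\begin{aligned}
\frac{\partial \tanh(q)}{\partial q_d} &= \frac{(e^{4q_a}-1)\big(4e^{2q_a}q_d\sin(2\alpha)\big)}{v^2\alpha} + \left(\frac{4ve^{2q_a}q_bq_d\cos(2\alpha) - \frac{uvq_bq_d}{\alpha} + 4uq_bq_de^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\imath\\
&\quad + \left(\frac{4ve^{2q_a}q_cq_d\cos(2\alpha) - \frac{uvq_cq_d}{\alpha} + 4uq_cq_de^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\jmath\\
&\quad + \left(\frac{v\alpha u + 4ve^{2q_a}q_d^2\cos(2\alpha) - \frac{uvq_d^2}{\alpha} + 4uq_d^2e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\kappa
\end{aligned}\tag{F.9}$$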
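The partial derivative in (F.7) can be compared against a finite difference of the closed form of $\tanh(q)$. This is a minimal sketch under stated assumptions: the test point and step size are arbitrary, the helper names are illustrative, and the real part carries the factor $1/\alpha$ that arises from $\partial\alpha/\partial q_b = q_b/\alpha$. By symmetry, the same check applies to (F.8) and (F.9).

```python
import math

def tanh_closed(q):
    """Closed form of tanh(q) in terms of u, v and alpha: (real, i, j, k)."""
    qa, qb, qc, qd = q
    alpha = math.sqrt(qb * qb + qc * qc + qd * qd)  # assumed nonzero
    e2, e4 = math.exp(2 * qa), math.exp(4 * qa)
    u = 2 * e2 * math.sin(2 * alpha)
    v = e4 + 2 * e2 * math.cos(2 * alpha) + 1
    return ((e4 - 1) / v, u * qb / (v * alpha), u * qc / (v * alpha), u * qd / (v * alpha))

def dtanh_dqb(q):
    """Final line of (F.7), component by component."""
    qa, qb, qc, qd = q
    alpha = math.sqrt(qb * qb + qc * qc + qd * qd)
    e2, e4 = math.exp(2 * qa), math.exp(4 * qa)
    c, s = math.cos(2 * alpha), math.sin(2 * alpha)
    u = 2 * e2 * s
    v = e4 + 2 * e2 * c + 1
    va2 = (v * alpha) ** 2

    def cross(x):
        # Numerator terms shared by the i, j and k parts, with q_b * x.
        return 4 * v * e2 * qb * x * c - u * v * qb * x / alpha + 4 * u * qb * x * e2 * s

    re = (e4 - 1) * 4 * e2 * qb * s / (v * v * alpha)
    return (re, (v * alpha * u + cross(qb)) / va2, cross(qc) / va2, cross(qd) / va2)

q = (0.3, -0.2, 0.5, 0.1)  # arbitrary test point (assumption)
h = 1e-6
q_plus = (q[0], q[1] + h, q[2], q[3])
q_minus = (q[0], q[1] - h, q[2], q[3])
numeric = [(p - m) / (2 * h) for p, m in zip(tanh_closed(q_plus), tanh_closed(q_minus))]
max_err_b = max(abs(a - b) for a, b in zip(numeric, dtanh_dqb(q)))
print(max_err_b)
```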

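Substituting (F.7), (F.8) and (F.9) into the right-hand side of (F.2) yields

$$\begin{aligned}
-\frac{\partial \tanh(q)}{\partial \alpha}\zeta = \Bigg(&\frac{(e^{4q_a}-1)\big(4e^{2q_a}\sin(2\alpha)(q_b^2+q_c^2+q_d^2)\big)}{(v\alpha)^2} + \frac{4vq_be^{2q_a}\cos(2\alpha) + 4uq_be^{2q_a}\sin(2\alpha)}{v^2\alpha}\,\imath\\
&+ \frac{4vq_ce^{2q_a}\cos(2\alpha) + 4uq_ce^{2q_a}\sin(2\alpha)}{v^2\alpha}\,\jmath + \frac{4vq_de^{2q_a}\cos(2\alpha) + 4uq_de^{2q_a}\sin(2\alpha)}{v^2\alpha}\,\kappa\Bigg)\big(-\zeta\big)
\end{aligned}\tag{F.10}$$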

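Next, the terms $u$ and $v$ from (F.4) are expanded to give

$$\begin{aligned}
-\frac{\partial \tanh(q)}{\partial \alpha}\zeta = \Bigg(&\frac{(e^{4q_a}-1)\big(4e^{2q_a}\sin(2\alpha)(q_b^2+q_c^2+q_d^2)\big)}{\big((e^{4q_a}+2e^{2q_a}\cos(2\alpha)+1)\alpha\big)^2}\\
&+ \frac{4q_be^{6q_a}\cos(2\alpha) + 4q_be^{2q_a}\cos(2\alpha) + 8q_be^{4q_a}\big(\cos^2(2\alpha)+\sin^2(2\alpha)\big)}{\big(e^{4q_a}+2e^{2q_a}\cos(2\alpha)+1\big)^2\alpha}\,\imath\\
&+ \frac{4q_ce^{6q_a}\cos(2\alpha) + 4q_ce^{2q_a}\cos(2\alpha) + 8q_ce^{4q_a}\big(\cos^2(2\alpha)+\sin^2(2\alpha)\big)}{\big(e^{4q_a}+2e^{2q_a}\cos(2\alpha)+1\big)^2\alpha}\,\jmath\\
&+ \frac{4q_de^{6q_a}\cos(2\alpha) + 4q_de^{2q_a}\cos(2\alpha) + 8q_de^{4q_a}\big(\cos^2(2\alpha)+\sin^2(2\alpha)\big)}{\big(e^{4q_a}+2e^{2q_a}\cos(2\alpha)+1\big)^2\alpha}\,\kappa\Bigg)\big(-\zeta\big)
\end{aligned}\tag{F.11}$$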


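Simplifying (F.11) further by employing $\sin^2(2\alpha) + \cos^2(2\alpha) = 1$ and $q_b^2 + q_c^2 + q_d^2 = \alpha^2$ yields

$$\begin{aligned}
-\frac{\partial \tanh(q)}{\partial \alpha}\zeta = \Bigg(&\frac{4e^{6q_a}\sin(2\alpha) - 4e^{2q_a}\sin(2\alpha)}{\big(e^{4q_a}+2e^{2q_a}\cos(2\alpha)+1\big)^2}\\
&+ \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a}+2e^{2q_a}\cos(2\alpha)+1\big)^2}\left(\frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}\right)\Bigg)\big(-\zeta\big)
\end{aligned}\tag{F.12}$$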

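Further substituting $\zeta$ and $\alpha$ from (F.5), and using $\zeta^2 = -1$, gives

$$\begin{aligned}
-\frac{\partial \tanh(q)}{\partial \alpha}\zeta &= \left(\frac{4e^{6q_a}\sin(2\alpha) - 4e^{2q_a}\sin(2\alpha)}{\big(e^{4q_a}+2e^{2q_a}\cos(2\alpha)+1\big)^2} + \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a}+2e^{2q_a}\cos(2\alpha)+1\big)^2}\,\zeta\right)\big(-\zeta\big)\\
&= \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a}+2e^{2q_a}\cos(2\alpha)+1\big)^2} + \frac{4e^{2q_a}\sin(2\alpha) - 4e^{6q_a}\sin(2\alpha)}{\big(e^{4q_a}+2e^{2q_a}\cos(2\alpha)+1\big)^2}\,\zeta
\end{aligned}\tag{F.13}$$

The right-hand side of (F.13) is identical to (F.3), so $\tanh(q)$ satisfies the local analyticity condition (F.2).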
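The local analyticity condition (F.2) can also be tested directly by finite differences, without using the intermediate expressions (F.7)-(F.12). The sketch below is illustrative only: it assumes an arbitrary test point with nonzero vector part, represents quaternions as 4-tuples, and uses hypothetical helper names (`qmul`, `partial`).

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def tanh_closed(q):
    """Closed form of tanh(q) as in (F.1)."""
    qa, qb, qc, qd = q
    alpha = math.sqrt(qb * qb + qc * qc + qd * qd)  # assumed nonzero
    e2, e4 = math.exp(2 * qa), math.exp(4 * qa)
    v = e4 + 2 * e2 * math.cos(2 * alpha) + 1
    s = 2 * e2 * math.sin(2 * alpha) / (v * alpha)
    return ((e4 - 1) / v, s * qb, s * qc, s * qd)

def partial(f, q, idx, h=1e-6):
    """Central-difference partial derivative of f w.r.t. component idx."""
    qp, qm = list(q), list(q)
    qp[idx] += h
    qm[idx] -= h
    return tuple((a - b) / (2 * h) for a, b in zip(f(qp), f(qm)))

q = (0.3, -0.2, 0.5, 0.1)  # arbitrary test point (assumption)
alpha = math.sqrt(sum(x * x for x in q[1:]))
zeta = (0.0, q[1] / alpha, q[2] / alpha, q[3] / alpha)

lhs = partial(tanh_closed, q, 0)                        # d tanh / d q_a
grads = [partial(tanh_closed, q, i) for i in (1, 2, 3)]  # d tanh / d q_b, q_c, q_d
# Directional derivative along alpha: sum of (q_x / alpha) * d tanh / d q_x.
d_alpha = tuple(sum(q[i + 1] / alpha * grads[i][k] for i in range(3)) for k in range(4))
rhs = tuple(-x for x in qmul(d_alpha, zeta))            # right-hand side of (F.2)

lac_err = max(abs(a - b) for a, b in zip(lhs, rhs))
print(lac_err)
```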

Appendix G

A Local Derivative of tanh(q)

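$\operatorname{sech}(q)$ is first expanded using Euler's formula to give

$$\begin{aligned}
\operatorname{sech}(q) &= \frac{2}{e^{q} + e^{-q}}\\
&= \frac{2}{e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big) + e^{-q_a}\big(\cos(\alpha) - \sin(\alpha)\zeta\big)}\\
&= \frac{2e^{3q_a}\big(\cos(\alpha) - \sin(\alpha)\zeta\big) + 2e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big)}{e^{4q_a} + 2e^{2q_a}\big(\cos^2(\alpha) - \sin^2(\alpha)\big) + 1}
\end{aligned}\tag{G.1}$$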

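Applying the identity $\cos^2(\alpha) - \sin^2(\alpha) = \cos(2\alpha)$ to the denominator gives

$$\operatorname{sech}(q) = \frac{2e^{3q_a}\big(\cos(\alpha) - \sin(\alpha)\zeta\big) + 2e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big)}{e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1}\tag{G.2}$$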

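Squaring (G.2) results in

$$\begin{aligned}
\operatorname{sech}^2(q) &= \frac{4e^{6q_a}\big(\cos^2(\alpha) - \sin^2(\alpha)\big) + 4e^{4q_a}\big(2\cos^2(\alpha) + 2\sin^2(\alpha)\big) + 4e^{2q_a}\big(\cos^2(\alpha) - \sin^2(\alpha)\big)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\\
&\quad + \frac{-8e^{6q_a}\sin(\alpha)\cos(\alpha) + 8e^{2q_a}\sin(\alpha)\cos(\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\,\zeta
\end{aligned}\tag{G.3}$$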

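and substituting $2\sin(\alpha)\cos(\alpha) = \sin(2\alpha)$, together with $\cos^2(\alpha) - \sin^2(\alpha) = \cos(2\alpha)$ and $\cos^2(\alpha) + \sin^2(\alpha) = 1$, yields

$$\operatorname{sech}^2(q) = \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2} + \frac{-4e^{6q_a}\sin(2\alpha) + 4e^{2q_a}\sin(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\,\zeta\tag{G.4}$$

This expression is identical to $\partial\tanh(q)/\partial q_a$ in (F.3).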
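The closed form (G.4) can be compared with $\operatorname{sech}^2(q)$ evaluated directly from quaternion arithmetic. The following is a minimal sketch assuming a test point with nonzero vector part; `qexp`, `qinv` and `qmul` are illustrative helpers, not routines from the thesis.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def qexp(q):
    """e^q = e^{q_a}(cos(alpha) + sin(alpha) zeta), alpha assumed nonzero."""
    qa, qb, qc, qd = q
    alpha = math.sqrt(qb * qb + qc * qc + qd * qd)
    k = math.exp(qa) * math.sin(alpha) / alpha
    return (math.exp(qa) * math.cos(alpha), k * qb, k * qc, k * qd)

def qinv(q):
    """Quaternion inverse: conjugate divided by squared norm."""
    n2 = sum(x * x for x in q)
    return (q[0] / n2, -q[1] / n2, -q[2] / n2, -q[3] / n2)

def sech2_direct(q):
    """(2 / (e^q + e^{-q}))^2 computed with quaternion arithmetic."""
    ep = qexp(q)
    em = qexp(tuple(-x for x in q))
    sech = tuple(2 * x for x in qinv(tuple(a + b for a, b in zip(ep, em))))
    return qmul(sech, sech)

def sech2_closed(q):
    """Right-hand side of (G.4)."""
    qa, qb, qc, qd = q
    alpha = math.sqrt(qb * qb + qc * qc + qd * qd)
    e2, e4, e6 = math.exp(2 * qa), math.exp(4 * qa), math.exp(6 * qa)
    c, s = math.cos(2 * alpha), math.sin(2 * alpha)
    den = (e4 + 2 * e2 * c + 1) ** 2
    re = (4 * e6 * c + 8 * e4 + 4 * e2 * c) / den
    im = (-4 * e6 * s + 4 * e2 * s) / (den * alpha)
    return (re, im * qb, im * qc, im * qd)

q = (0.3, -0.2, 0.5, 0.1)  # arbitrary test point (assumption)
sech_err = max(abs(a - b) for a, b in zip(sech2_direct(q), sech2_closed(q)))
print(sech_err)
```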

Appendix H

Derivation of Split QRTRL

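The term $\mathbf{w}_l^T(n)\mathbf{z}(n) = \mathrm{net}_l$ is expanded into its componentwise terms, given by (the time index "n" is dropped due to space limitations)

$$\begin{bmatrix}\mathrm{net}^a_l\\ \mathrm{net}^b_l\\ \mathrm{net}^c_l\\ \mathrm{net}^d_l\end{bmatrix} =
\begin{bmatrix}
(\mathbf{w}^a_l)^T\mathbf{z}^a - (\mathbf{w}^b_l)^T\mathbf{z}^b - (\mathbf{w}^c_l)^T\mathbf{z}^c - (\mathbf{w}^d_l)^T\mathbf{z}^d\\
(\mathbf{w}^a_l)^T\mathbf{z}^b + (\mathbf{w}^b_l)^T\mathbf{z}^a + (\mathbf{w}^c_l)^T\mathbf{z}^d - (\mathbf{w}^d_l)^T\mathbf{z}^c\\
(\mathbf{w}^a_l)^T\mathbf{z}^c + (\mathbf{w}^c_l)^T\mathbf{z}^a + (\mathbf{w}^d_l)^T\mathbf{z}^b - (\mathbf{w}^b_l)^T\mathbf{z}^d\\
(\mathbf{w}^a_l)^T\mathbf{z}^d + (\mathbf{w}^d_l)^T\mathbf{z}^a + (\mathbf{w}^b_l)^T\mathbf{z}^c - (\mathbf{w}^c_l)^T\mathbf{z}^b
\end{bmatrix}\tag{H.1}$$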
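The componentwise expansion in (H.1) is the Hamilton product accumulated over the taps of the weight vector. As a minimal sketch, assuming quaternions are stored as 4-tuples and using illustrative integer test data, the four rows of (H.1) can be compared with the sum of elementwise quaternion products:

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def dot(x, y):
    """Real inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def net_components(w, z):
    """Rows of (H.1); w and z are lists of quaternion 4-tuples."""
    wa, wb, wc, wd = zip(*w)  # split into real component vectors
    za, zb, zc, zd = zip(*z)
    return (dot(wa, za) - dot(wb, zb) - dot(wc, zc) - dot(wd, zd),
            dot(wa, zb) + dot(wb, za) + dot(wc, zd) - dot(wd, zc),
            dot(wa, zc) + dot(wc, za) + dot(wd, zb) - dot(wb, zd),
            dot(wa, zd) + dot(wd, za) + dot(wb, zc) - dot(wc, zb))

# Illustrative two-tap example (values are arbitrary assumptions).
w = [(1, 2, 3, 4), (0, -1, 2, 1)]
z = [(2, 0, -1, 3), (1, 1, 1, -2)]
direct = tuple(sum(c) for c in zip(*(qmul(wi, zi) for wi, zi in zip(w, z))))
print(net_components(w, z) == direct)
```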

The gradient for the split QRTRL is given as

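$$\nabla_{\mathbf{w}}E(n) = \frac{\partial E(n)}{\partial \mathbf{w}^a(n)} + \frac{\partial E(n)}{\partial \mathbf{w}^b(n)}\imath + \frac{\partial E(n)}{\partial \mathbf{w}^c(n)}\jmath + \frac{\partial E(n)}{\partial \mathbf{w}^d(n)}\kappa \tag{H.2}$$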

Expanding the term $\partial E/\partial w^a_{s,t}$ in (H.2) gives

$$\frac{\partial E(n)}{\partial w^a_{s,t}(n)} = -e^a_l(n)\Psi^{l,(aa)}_{s,t}(n) - e^b_l(n)\Psi^{l,(ba)}_{s,t}(n) - e^c_l(n)\Psi^{l,(ca)}_{s,t}(n) - e^d_l(n)\Psi^{l,(da)}_{s,t}(n) \tag{H.3}$$

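From Section 5.3.1, the sensitivity $\Psi^{l,(aa)}_{s,t}$ is given as

$$\begin{aligned}
\Psi^{l,(aa)}_{s,t}(n) = \Phi'_s\big(\mathrm{net}^a_l(n)\big)\Big(\delta_{sl}z^a_l(n) + \sum_{q=1}^{N}\big[&w^a_{l,p+1+q}(n)\Psi^{q,(aa)}_{s,t}(n-1) - w^b_{l,p+1+q}(n)\Psi^{q,(ba)}_{s,t}(n-1)\\
&- w^c_{l,p+1+q}(n)\Psi^{q,(ca)}_{s,t}(n-1) - w^d_{l,p+1+q}(n)\Psi^{q,(da)}_{s,t}(n-1)\big]\Big)
\end{aligned}\tag{H.4}$$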

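Similar to the derivation of $\Psi^{l,(aa)}_{s,t}$, the other three sensitivities in (H.3) are determined to be

$$\begin{aligned}
\Psi^{l,(ba)}_{s,t}(n) = \Phi'_s\big(\mathrm{net}^b_l(n)\big)\Big(\delta_{sl}z^b_l(n) + \sum_{q=1}^{N}\big[&w^a_{l,p+1+q}(n)\Psi^{q,(ba)}_{s,t}(n-1) + w^b_{l,p+1+q}(n)\Psi^{q,(aa)}_{s,t}(n-1)\\
&+ w^c_{l,p+1+q}(n)\Psi^{q,(da)}_{s,t}(n-1) - w^d_{l,p+1+q}(n)\Psi^{q,(ca)}_{s,t}(n-1)\big]\Big)\\
\Psi^{l,(ca)}_{s,t}(n) = \Phi'_s\big(\mathrm{net}^c_l(n)\big)\Big(\delta_{sl}z^c_l(n) + \sum_{q=1}^{N}\big[&w^a_{l,p+1+q}(n)\Psi^{q,(ca)}_{s,t}(n-1) - w^b_{l,p+1+q}(n)\Psi^{q,(da)}_{s,t}(n-1)\\
&+ w^c_{l,p+1+q}(n)\Psi^{q,(aa)}_{s,t}(n-1) + w^d_{l,p+1+q}(n)\Psi^{q,(ba)}_{s,t}(n-1)\big]\Big)\\
\Psi^{l,(da)}_{s,t}(n) = \Phi'_s\big(\mathrm{net}^d_l(n)\big)\Big(\delta_{sl}z^d_l(n) + \sum_{q=1}^{N}\big[&w^a_{l,p+1+q}(n)\Psi^{q,(da)}_{s,t}(n-1) + w^b_{l,p+1+q}(n)\Psi^{q,(ca)}_{s,t}(n-1)\\
&- w^c_{l,p+1+q}(n)\Psi^{q,(ba)}_{s,t}(n-1) + w^d_{l,p+1+q}(n)\Psi^{q,(aa)}_{s,t}(n-1)\big]\Big)
\end{aligned}\tag{H.5}$$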
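The four recursions (H.4) and (H.5) share one structure: the feedback sum is the Hamilton product of each feedback weight with the quaternion formed from the four previous sensitivities. The sketch below illustrates one update step for a fixed neuron $l$ and weight index $(s,t)$. It is illustrative only and rests on several assumptions: the split nonlinearity is taken to be a real-valued tanh applied per component, the function name and argument layout are hypothetical, and the feedback weights $w_{l,p+1+q}$ are passed in directly as a list over $q$.

```python
import math

def split_sensitivities_a(net, z, delta_sl, w_fb, psi_prev):
    """One step of (H.4)-(H.5).

    net      : (net_a, net_b, net_c, net_d) of neuron l at time n
    z        : (z_a, z_b, z_c, z_d), the input element paired with delta_sl
    delta_sl : 1 if s == l, else 0 (Kronecker delta)
    w_fb     : list over q of feedback weights (w_a, w_b, w_c, w_d)
    psi_prev : list over q of (Psi_aa, Psi_ba, Psi_ca, Psi_da) at time n-1
    Returns (Psi_aa, Psi_ba, Psi_ca, Psi_da) at time n.
    """
    def dphi(x):
        # Assumed split nonlinearity: real tanh, so the derivative is 1 - tanh^2.
        return 1.0 - math.tanh(x) ** 2

    acc = [0.0, 0.0, 0.0, 0.0]
    for (wa, wb, wc, wd), (pa, pb, pc, pd) in zip(w_fb, psi_prev):
        acc[0] += wa * pa - wb * pb - wc * pc - wd * pd   # (H.4)
        acc[1] += wa * pb + wb * pa + wc * pd - wd * pc   # (H.5), ba
        acc[2] += wa * pc - wb * pd + wc * pa + wd * pb   # (H.5), ca
        acc[3] += wa * pd + wb * pc - wc * pb + wd * pa   # (H.5), da
    return tuple(dphi(net[i]) * (delta_sl * z[i] + acc[i]) for i in range(4))

# Usage with illustrative values: one feedback neuron, zero net input.
psi = split_sensitivities_a(net=(0.0, 0.0, 0.0, 0.0), z=(1.0, 2.0, 3.0, 4.0),
                            delta_sl=0, w_fb=[(1, 2, 3, 4)], psi_prev=[(2, 0, -1, 3)])
print(psi)
```

The sensitivities with respect to $w^b_{s,t}$, $w^c_{s,t}$ and $w^d_{s,t}$ would follow the same pattern with their own $\delta_{sl}$ terms, which are given in Section 5.3.1 and are not reproduced here.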

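Next, expanding the remaining terms $\partial E/\partial w^b_{s,t}$, $\partial E/\partial w^c_{s,t}$ and $\partial E/\partial w^d_{s,t}$ in (H.2) results in

$$\begin{aligned}
\frac{\partial E(n)}{\partial w^b_{s,t}(n)} &= -e^a_l(n)\Psi^{l,(ab)}_{s,t}(n) - e^b_l(n)\Psi^{l,(bb)}_{s,t}(n) - e^c_l(n)\Psi^{l,(cb)}_{s,t}(n) - e^d_l(n)\Psi^{l,(db)}_{s,t}(n)\\
\frac{\partial E(n)}{\partial w^c_{s,t}(n)} &= -e^a_l(n)\Psi^{l,(ac)}_{s,t}(n) - e^b_l(n)\Psi^{l,(bc)}_{s,t}(n) - e^c_l(n)\Psi^{l,(cc)}_{s,t}(n) - e^d_l(n)\Psi^{l,(dc)}_{s,t}(n)\\
\frac{\partial E(n)}{\partial w^d_{s,t}(n)} &= -e^a_l(n)\Psi^{l,(ad)}_{s,t}(n) - e^b_l(n)\Psi^{l,(bd)}_{s,t}(n) - e^c_l(n)\Psi^{l,(cd)}_{s,t}(n) - e^d_l(n)\Psi^{l,(dd)}_{s,t}(n)
\end{aligned}\tag{H.6}$$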

Since the expressions in (H.6) have the same form as (H.3), the remaining 12 sensitivities take a form similar to (H.5). They are derived in the same manner, and the full expressions are given in Section 5.3.1.

Appendix I

Derivation of QRTRL

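The sensitivity $\Psi^l_{s,t}$ is shown to be

$$\Psi^l_{s,t}(n) = \frac{\partial y_l(n)}{\partial w^a_{s,t}(n)} + \frac{\partial y_l(n)}{\partial w^b_{s,t}(n)}\imath + \frac{\partial y_l(n)}{\partial w^c_{s,t}(n)}\jmath + \frac{\partial y_l(n)}{\partial w^d_{s,t}(n)}\kappa \tag{I.1}$$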

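In order to derive the terms in (I.1), the term $\mathbf{w}_l^T(n)\mathbf{z}(n) = \mathrm{net}_l(n)$ is defined componentwise as (due to space limitations, the time index "n" has been dropped)

$$\mathrm{net}_l =
\begin{bmatrix}
(\mathbf{w}^a_l)^T\mathbf{z}^a - (\mathbf{w}^b_l)^T\mathbf{z}^b - (\mathbf{w}^c_l)^T\mathbf{z}^c - (\mathbf{w}^d_l)^T\mathbf{z}^d\\
(\mathbf{w}^a_l)^T\mathbf{z}^b + (\mathbf{w}^b_l)^T\mathbf{z}^a + (\mathbf{w}^c_l)^T\mathbf{z}^d - (\mathbf{w}^d_l)^T\mathbf{z}^c\\
(\mathbf{w}^a_l)^T\mathbf{z}^c + (\mathbf{w}^c_l)^T\mathbf{z}^a + (\mathbf{w}^d_l)^T\mathbf{z}^b - (\mathbf{w}^b_l)^T\mathbf{z}^d\\
(\mathbf{w}^a_l)^T\mathbf{z}^d + (\mathbf{w}^d_l)^T\mathbf{z}^a + (\mathbf{w}^b_l)^T\mathbf{z}^c - (\mathbf{w}^c_l)^T\mathbf{z}^b
\end{bmatrix}\tag{I.2}$$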

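Utilizing (I.2), the term $\partial y_l/\partial w^a_{s,t}$ is shown to be

$$\frac{\partial y_l(n)}{\partial w^a_{s,t}(n)} = \Phi'\big(\mathrm{net}_l(n)\big)\left(\delta_{sl}\big(z^a_l(n) + z^b_l(n)\imath + z^c_l(n)\jmath + z^d_l(n)\kappa\big) + \sum_{q=1}^{N} w_{s,t}(n)\frac{\partial y_q(n-1)}{\partial w^a_{s,t}(n)}\right)\tag{I.3}$$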

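Similar to the derivation of $\partial y_l/\partial w^a_{s,t}$ in Section 5.3.2, the terms involving $\partial y_l/\partial w^b_{s,t}$, $\partial y_l/\partial w^c_{s,t}$ and $\partial y_l/\partial w^d_{s,t}$ are given by

$$\begin{aligned}
\frac{\partial y_l(n)}{\partial w^b_{s,t}(n)}\imath &= \Phi'\big(\mathrm{net}_l(n)\big)\left(\delta_{sl}\big({-z^a_l(n)} - z^b_l(n)\imath + z^c_l(n)\jmath + z^d_l(n)\kappa\big) + \sum_{q=1}^{N} w_{s,t}(n)\frac{\partial y_q(n-1)}{\partial w^b_{s,t}(n)}\imath\right)\\
\frac{\partial y_l(n)}{\partial w^c_{s,t}(n)}\jmath &= \Phi'\big(\mathrm{net}_l(n)\big)\left(\delta_{sl}\big({-z^a_l(n)} + z^b_l(n)\imath - z^c_l(n)\jmath + z^d_l(n)\kappa\big) + \sum_{q=1}^{N} w_{s,t}(n)\frac{\partial y_q(n-1)}{\partial w^c_{s,t}(n)}\jmath\right)\\
\frac{\partial y_l(n)}{\partial w^d_{s,t}(n)}\kappa &= \Phi'\big(\mathrm{net}_l(n)\big)\left(\delta_{sl}\big({-z^a_l(n)} + z^b_l(n)\imath + z^c_l(n)\jmath - z^d_l(n)\kappa\big) + \sum_{q=1}^{N} w_{s,t}(n)\frac{\partial y_q(n-1)}{\partial w^d_{s,t}(n)}\kappa\right)
\end{aligned}\tag{I.4}$$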

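Adding up these terms to determine the sensitivity $\Psi^l_{s,t}$ gives

$$\begin{aligned}
\Psi^l_{s,t}(n) &= \Phi'\big(\mathrm{net}_l(n)\big)\Bigg({-2}\delta_{sl}\big(z^a_l(n) - z^b_l(n)\imath - z^c_l(n)\jmath - z^d_l(n)\kappa\big)\\
&\qquad + \sum_{q=1}^{N} w_{s,t}(n)\left(\frac{\partial y_q(n-1)}{\partial w^a_{s,t}(n)} + \frac{\partial y_q(n-1)}{\partial w^b_{s,t}(n)}\imath + \frac{\partial y_q(n-1)}{\partial w^c_{s,t}(n)}\jmath + \frac{\partial y_q(n-1)}{\partial w^d_{s,t}(n)}\kappa\right)\Bigg)\\
&= \Phi'\big(\mathrm{net}_l(n)\big)\left({-2}\delta_{sl}z^*_l(n) + \sum_{q=1}^{N} w_{s,t}(n)\Psi^q_{s,t}(n-1)\right)
\end{aligned}\tag{I.5}$$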
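The factor $-2z^*$ in (I.5) follows from the quaternion identity $z + \imath z\imath + \jmath z\jmath + \kappa z\kappa = -2z^*$, since for a single tap $\partial(wz)/\partial w^b = \imath z$ and so on, and each term is post-multiplied by its unit vector. A minimal sketch of this check, assuming a scalar (single-tap) weight and an arbitrary integer test value for $z$:

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

UNITS = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]  # 1, i, j, k

z = (1, 2, 3, 4)  # arbitrary test value (assumption)

# d(w z)/d w^u = u z for each real component of w; then post-multiply by u.
terms = [qmul(qmul(u, z), u) for u in UNITS]
total = tuple(sum(c) for c in zip(*terms))

print(terms[1])  # i z i: sign pattern (-, -, +, +) of the first line of (I.4)
print(total)     # equals -2 * conj(z)
```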

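The conjugate sensitivity $\Upsilon^l_{s,t}$ is derived in a similar fashion. The sensitivity $\Upsilon^l_{s,t}$ is first shown to be

$$\Upsilon^l_{s,t}(n) = \frac{\partial y^*_l(n)}{\partial w^a_{s,t}(n)} + \frac{\partial y^*_l(n)}{\partial w^b_{s,t}(n)}\imath + \frac{\partial y^*_l(n)}{\partial w^c_{s,t}(n)}\jmath + \frac{\partial y^*_l(n)}{\partial w^d_{s,t}(n)}\kappa \tag{I.6}$$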

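Since $y^*_l(n) = \Phi\big(\mathbf{z}^H(n)\mathbf{w}^*(n)\big)$, the term $\mathbf{z}^H(n)\mathbf{w}^*(n) = \mathrm{net}^*_l(n)$ is first defined componentwise as (the time index "n" is dropped due to space limitations)

$$\mathrm{net}^*_l =
\begin{bmatrix}
(\mathbf{w}^a_l)^T\mathbf{z}^a - (\mathbf{w}^b_l)^T\mathbf{z}^b - (\mathbf{w}^c_l)^T\mathbf{z}^c - (\mathbf{w}^d_l)^T\mathbf{z}^d\\
-(\mathbf{w}^a_l)^T\mathbf{z}^b - (\mathbf{w}^b_l)^T\mathbf{z}^a - (\mathbf{w}^c_l)^T\mathbf{z}^d + (\mathbf{w}^d_l)^T\mathbf{z}^c\\
-(\mathbf{w}^a_l)^T\mathbf{z}^c - (\mathbf{w}^c_l)^T\mathbf{z}^a - (\mathbf{w}^d_l)^T\mathbf{z}^b + (\mathbf{w}^b_l)^T\mathbf{z}^d\\
-(\mathbf{w}^a_l)^T\mathbf{z}^d - (\mathbf{w}^d_l)^T\mathbf{z}^a - (\mathbf{w}^b_l)^T\mathbf{z}^c + (\mathbf{w}^c_l)^T\mathbf{z}^b
\end{bmatrix}\tag{I.7}$$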

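Following a method similar to that used to determine the sensitivity $\Psi^l_{s,t}$, the differential terms in (I.6) are derived to be

$$\begin{aligned}
\frac{\partial y^*_l(n)}{\partial w^a_{s,t}(n)} &= \Phi'\big(\mathrm{net}^*_l(n)\big)\left(\delta_{sl}\big(z^a_l(n) - z^b_l(n)\imath - z^c_l(n)\jmath - z^d_l(n)\kappa\big) + \sum_{q=1}^{N}\frac{\partial y^*_q(n-1)}{\partial w^a_{s,t}(n)}w^*_{s,t}(n)\right)\\
\frac{\partial y^*_l(n)}{\partial w^b_{s,t}(n)} &= \Phi'\big(\mathrm{net}^*_l(n)\big)\left(\ldots + \sum_{q=1}^{N}\frac{\partial y^*_q(n-1)}{\partial w^b_{s,t}(n)}w^*_{s,t}(n)\right)\\
\frac{\partial y^*_l(n)}{\partial w^c_{s,t}(n)} &= \Phi'\big(\mathrm{net}^*_l(n)\big)\left(\ldots + \sum_{q=1}^{N}\frac{\partial y^*_q(n-1)}{\partial w^c_{s,t}(n)}w^*_{s,t}(n)\right)\\
\frac{\partial y^*_l(n)}{\partial w^d_{s,t}(n)} &= \Phi'\big(\mathrm{net}^*_l(n)\big)\left(\ldots + \sum_{q=1}^{N}\frac{\partial y^*_q(n-1)}{\partial w^d_{s,t}(n)}w^*_{s,t}(n)\right)
\end{aligned}\tag{I.8}$$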

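Next, these differential terms are added up to yield the conjugate sensitivity $\Upsilon^l_{s,t}$ as

$$\begin{aligned}
\Upsilon^l_{s,t}(n) &= \Phi'\big(\mathrm{net}^*_l(n)\big)\Bigg(4\delta_{sl}\big(z^a_l(n) - z^b_l(n)\imath - z^c_l(n)\jmath - z^d_l(n)\kappa\big)\\
&\qquad + \sum_{q=1}^{N}\left(\frac{\partial y^*_q(n-1)}{\partial w^a_{s,t}(n)} + \frac{\partial y^*_q(n-1)}{\partial w^b_{s,t}(n)}\imath + \frac{\partial y^*_q(n-1)}{\partial w^c_{s,t}(n)}\jmath + \frac{\partial y^*_q(n-1)}{\partial w^d_{s,t}(n)}\kappa\right)w^*_{s,t}(n)\Bigg)\\
&= \Phi'\big(\mathrm{net}^*_l(n)\big)\left(4\delta_{sl}z^*_l(n) + \sum_{q=1}^{N}\Upsilon^q_{s,t}(n-1)w^*_{s,t}(n)\right)
\end{aligned}\tag{I.9}$$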
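The factor $4z^*$ in (I.9) can be illustrated in the same way. For a single tap, $\mathrm{net}^* = (wz)^* = z^*w^*$, so the derivative with respect to each real component of $w$ is $(uz)^*$ for $u \in \{1, \imath, \jmath, \kappa\}$, and post-multiplying by $u$ gives $z^*$ each time. The sketch below assumes a scalar weight and an arbitrary integer test value; the helper names are illustrative.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def conj(q):
    """Quaternion conjugate."""
    return (q[0], -q[1], -q[2], -q[3])

UNITS = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]  # 1, i, j, k

z = (1, 2, 3, 4)  # arbitrary test value (assumption)

# d((w z)^*)/d w^u = (u z)^* ; post-multiplying by u gives conj(z) each time.
conj_terms = [qmul(conj(qmul(u, z)), u) for u in UNITS]
conj_total = tuple(sum(c) for c in zip(*conj_terms))

print(conj_total)  # equals 4 * conj(z)
```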
