
Signal Processing 84 (2004) 351–365
www.elsevier.com/locate/sigpro

Automatic digital modulation recognition using artificial neural network and genetic algorithm

M.L.D. Wong, A.K. Nandi∗

Signal Processing and Communications Group, Department of Electrical Engineering and Electronics, University of Liverpool, Brownlow Hill, Liverpool L69 3GJ, UK

Received 9 August 2001; received in revised form 5 June 2003

Abstract

Automatic recognition of digital modulation signals has seen increasing demand in recent years. The use of artificial neural networks for this purpose has been popular since the late 1990s. Here, we include a variety of modulation types for recognition, e.g. QAM16, V29, V32 and QAM64, through the addition of a newly proposed statistical feature set. Two training algorithms for the multi-layer perceptron (MLP) recogniser are employed in this work: backpropagation with momentum and adaptive learning rate, and resilient backpropagation (RPROP). In particular, the RPROP algorithm is applied for the first time in this area. In conjunction with these algorithms, we use a separate data set as a validation set during the training cycle to improve generalisation. Genetic algorithm (GA) based feature selection is used to select the best feature subset from the combined statistical and spectral feature set. The RPROP MLP recogniser achieves about 99% recognition performance at most SNR values with only six features selected using the GA.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Modulation recognition; Artificial neural networks; Genetic algorithm; Feature subset selection

1. Introduction

Automatic modulation recognition (AMR) has its roots in military communication intelligence applications such as counter channel jamming, spectrum surveillance, threat evaluation, interference identification, etc. Whilst most methods proposed initially were designed for analogue modulations, recent contributions in the subject focus more on digital communication [6,17,23,24]. Primarily, this is due to the increasing usage of digital modulations in many

∗ Corresponding author. Tel.: +44-151-794-4525; fax: +44-151-794-4540.

E-mail address: [email protected] (A.K. Nandi).

novel applications, such as mobile telephony, personal dial-up networks, indoor wireless networks, etc. With the rising developments in software-defined radio (SDR) systems, automatic digital modulation recognition (ADMR) has gained more attention than ever. Such units can act as a front-end to SDR systems before demodulation takes place; thus a single SDR system can robustly handle multiple modulations.

In the early days, modulation recognition relied

heavily on a human operator's interpretation of measured parameters to classify signals. Signal properties such as the IF waveform, signal spectrum, instantaneous amplitude and instantaneous phase are often used in conventional methods. A later form of recogniser consists of demodulator banks, each

0165-1684/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.sigpro.2003.10.019


Nomenclature

s(t)          modulated signal
A_m(t)        message amplitude
f_m(t)        message frequency
φ_m(t)        message phase
f_c           carrier frequency
A_mc, A_ms    QAM amplitude modulators
g_T           pulse shaping function
f_s           sampling frequency
R_b           symbol baud rate
N_s           number of samples
a(i)          instantaneous amplitude
f(i)          instantaneous frequency
φ(i)          instantaneous phase
γ_max         maximum value of the power spectral density of the normalised a(i)
σ_ap          standard deviation (SD) of the absolute value (AV) of the centred non-linear components of φ(i)
σ_dp          SD of the direct value of the centred non-linear components of φ(i)
σ_aa          SD of the AV of the centred normalised a(i)
σ_af          SD of the AV of the centred normalised f(i)
C             number of non-linear components
X_i           signal vector
C             cumulant
y(t)          received signal
ŷ(t)          Hilbert transform of y(t)
H_y           complex envelope of y(t)
R             real part of H_y
I             imaginary part of H_y
y_k(n)        output of neuron k
w_ij          weight value for neuron i from neuron j
u_i           weighted sum of the inputs of neuron i
E             error function
η             learning rate parameter
α             momentum parameter
η⁻, η⁺        update parameters
Δ             next update value of the weight

designed for a particular modulation type. This is considered semi-automatic, since a human operator is still required to 'listen' to the output, and it is impractical for digital communications.

Since the mid-1980s, new classes of modulation recognisers which automatically determine the incoming modulation type have been proposed. Generally, these methods fall into two main categories: decision theoretic and statistical pattern recognition. Decision theoretic approaches use probabilistic and hypothesis testing arguments to formulate the recognition problem. The major drawbacks of this approach are the difficulty of forming the right hypothesis and the careful analysis required to set the correct threshold values. Examples of decision theoretic approaches include Azzouz and Nandi [2], who proposed a global procedure for analogue and digital modulation signals, Swami and Sadler [23], who examined the suitability of cumulants as features for digitally modulated signals, and Donoho and Huo [6], who proposed a new method using the Hellinger representation.

Pattern recognition approaches, however, do not need such careful treatment, although choosing the right feature set is still an issue. Such a classification method can be further divided into two subsystems: the feature extraction subsystem and the recognition subsystem. The feature extraction subsystem is responsible for extracting prominent characteristics, called features, from the raw data. Examples of features used are higher-order cumulants [23], spectral features [20], constellation shape [17], power moments [8], etc.

The second subsystem, the pattern recogniser, is responsible for classifying the incoming signal based on the features extracted. It can be implemented in many ways, e.g. as a K-nearest neighbour classifier (KNN), probabilistic neural network (PNN) or support vector machine (SVM); the works in [15,16,18,22] chose the multi-layer perceptron (MLP) as their modulation recogniser.

Louis and Sehier [22] proposed a hierarchical neural network which uses backpropagation training. They also gave a performance analysis of backpropagation against other algorithms such as cascade correlation,


binary decision tree and KNN. Lu et al. [16] proposed a novel MLP-based modulation recogniser using instantaneous frequency and bandwidth features of signals. In [15], Lu et al. enhanced their techniques through the use of cyclic spectrum features. Nandi and Azzouz [18] proposed MLP neural networks with a spectral feature set for analogue, digital and combined modulation recognition. Their algorithms inspired a couple of commercial products and prototypes; hardware implementations have been reported in a variety of applications, e.g. 4G software radio wireless networks [14], spectrum monitoring hardware [4,5], etc.

The work presented in this paper is a continuation of

the works of Azzouz and Nandi [2,3,18,20], with a focus on digital modulations and artificial neural network (ANN) based recognisers. Here we present the use of two updated ANN training algorithms, which improve the performance and epoch times by a large margin. In addition, further modulation types and a new feature set based on higher-order statistics (HOS) of the signal are studied. The performance of the spectral, statistical and combined feature sets is also reviewed. Furthermore, we investigate the effects of feature subset selection on the combined feature set using a genetic algorithm (GA), which clearly demonstrates a reduction in the feature set with enhanced performance.

This paper is organised as follows: Section 2 states

the problem of automatic recognition of digitally modulated signals. Section 3 briefly discusses the extraction of the different feature sets; the MLP-based recogniser and its training algorithms are discussed in Section 4. Section 5 presents two methods of feature subset selection using GA. Simulation results are shown in Section 6. Finally, conclusions and future plans are presented in Section 7.

2. Problem statement

In digital communication, a modulated signal can be generally represented as

s(t) = A_m g_T(t) \cos(2\pi f_m(t) + \phi_m(t)),    (1)

where A_m, f_m and \phi_m are the message amplitude, message frequency and message phase, respectively, in accordance with the appropriate modulation technique,

and g_T is the pulse shaping function. We consider the four main types of digital modulation techniques, i.e. amplitude shift key (ASK), phase shift key (PSK), frequency shift key (FSK) and quadrature amplitude modulation (QAM).

QAM (e.g. QAM16, V29, V32, QAM64, etc.),

being a newer type among the four, takes a slightly different form:

s_m(t) = A_{mc} g_T(t) \cos(2\pi f_c t) + A_{ms} g_T(t) \sin(2\pi f_c t).    (2)

As seen in Eq. (2), QAM employs two quadrature carriers instead of one. Each carrier is independently modulated by A_{mc} and A_{ms}, where m takes a value from 1 to M, with M corresponding to the number of signal levels. For example, a QAM16 signal is by nature a combination of two ASK4 signals on different quadrature carriers. QAM can also be written as

s_{m_1 m_2}(t) = A_{m_1} g_T(t) \cos(2\pi f_c t + \phi_{m_2}),    (3)

in which form QAM can be seen as a combined digital amplitude and digital phase modulation.

The function of ADMR is to act as an intermediate stage between channel equalisation and demodulation. It is not meant to recover the embedded information in the sampled data. Ten digital modulation types are considered in this paper:

• ASK2
• ASK4
• BPSK
• QPSK
• FSK2
• FSK4
• QAM16
• V29
• V32
• QAM64

3. Feature extraction

All signals are digitally simulated according to Eqs. (1) and (3) in the MATLAB environment. Random integers up to M levels (M = 2, 4, 16, 29, 32, 64) are first generated by a uniform random number generator (URNG). These are then resampled at the baud rate for pulse shaping before passing through the respective modulators. The parameters used for modulation are shown in Table 1.

The simulated signals were also band-limited, and Gaussian noise was added according to the predefined signal-to-noise ratio (SNR), i.e. −5, 0, 5, 10 and 20 dB.


Table 1
Modulation parameters

Parameter                   Value
Sampling frequency, f_s     307.2 kHz
Carrier frequency, f_c      17 kHz
Baud rate, R_b              9600 baud
No. of samples, N_s         4096 samples

Typical pattern recognition systems often reduce the size of a raw data set by extracting some distinct attributes called features. These features, which can be represented as d-dimensional vectors, define a particular pattern.

Azzouz and Nandi [3, Chapter 3] proposed a

spectral-based feature set for digital modulations. These features were demonstrated to be suitable for signals which contain hidden information in either the instantaneous amplitude (OOK, ASK4), instantaneous frequency (M-ary FSKs) or instantaneous phase (BPSK, PSK4).

However, modulation schemes such as QAM

contain information in both amplitude and phase. In this work, two feature sets are used: a spectral feature set (see Section 3.1) and a new feature set based on higher-order cumulants of the signal (see Section 3.2).

3.1. Spectral feature set

The main motivation for this spectral feature set is that the information content of digital modulations is hidden in the signal's instantaneous amplitude, instantaneous phase or instantaneous frequency, while, for the previously proposed modulation types [3, Chapter 3], information is hidden only in a single domain for a particular modulation type. The five features proposed are described below:

• Maximum value of the power spectral density of the normalised-centred instantaneous amplitude

\gamma_{\max} = \max \left| \mathrm{DFT}(a_{cn}(i)) \right|^2 / N_s,    (4)

where N_s is the number of samples, a_{cn}(i) = a_n(i) - 1 and a_n(i) = a(i)/m_a; a(i) is the ith instantaneous amplitude and m_a is its sample mean value.

• Standard deviation of the absolute value of the centred non-linear components of the instantaneous phase

\sigma_{ap} = \sqrt{ \frac{1}{C} \sum_{a_n(i) > a_t} \phi_{NL}^2(i) - \left( \frac{1}{C} \sum_{a_n(i) > a_t} \left| \phi_{NL}(i) \right| \right)^2 },    (5)

where C is the number of samples in \{\phi_{NL}(i)\} for which a_n(i) > a_t, and a_t is the threshold value for a(i) below which the estimation of the instantaneous phase is very noise sensitive.

• Standard deviation of the direct value of the centred non-linear component of the instantaneous phase

\sigma_{dp} = \sqrt{ \frac{1}{C} \sum_{a_n(i) > a_t} \phi_{NL}^2(i) - \left( \frac{1}{C} \sum_{a_n(i) > a_t} \phi_{NL}(i) \right)^2 }.    (6)

• Standard deviation of the absolute value of the normalised-centred instantaneous amplitude

\sigma_{aa} = \sqrt{ \frac{1}{N_s} \sum_{i=1}^{N_s} a_{cn}^2(i) - \left( \frac{1}{N_s} \sum_{i=1}^{N_s} \left| a_{cn}(i) \right| \right)^2 }.    (7)

• Standard deviation of the absolute value of the normalised-centred instantaneous frequency

\sigma_{af} = \sqrt{ \frac{1}{C} \sum_{a_n(i) > a_t} f_N^2(i) - \left( \frac{1}{C} \sum_{a_n(i) > a_t} \left| f_N(i) \right| \right)^2 }.    (8)

Note that the availability of the instantaneous amplitude, phase and frequency depends on the availability of the carrier frequency f_c. In this paper, we assume that f_c is known a priori; in practical applications it would need to be estimated. Besides the frequency estimation (see Appendix B in [5]), other concerns such as channel information are assumed to have been adequately addressed, either with a priori knowledge or through blind estimation algorithms. The above features were used in previous works of Nandi and Azzouz.
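Since the paper's simulations were done in MATLAB but no code is given, the following Python/NumPy sketch illustrates how the two amplitude-based features of Eqs. (4) and (7) might be computed from one realisation; the function name and the use of scipy.signal.hilbert to obtain the instantaneous amplitude are our own assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import hilbert  # analytic-signal construction

def amplitude_features(x):
    """Illustrative sketch of two of the five spectral features.

    x is one real-valued realisation of the modulated signal. The
    phase/frequency features of Eqs. (5), (6) and (8) additionally
    need the carrier frequency f_c and the threshold a_t, omitted here.
    """
    Ns = len(x)
    a = np.abs(hilbert(x))   # instantaneous amplitude a(i)
    a_n = a / a.mean()       # normalised amplitude a_n(i)
    a_cn = a_n - 1.0         # normalised-centred amplitude a_cn(i)

    # Eq. (4): peak of the power spectral density of a_cn(i)
    gamma_max = np.max(np.abs(np.fft.fft(a_cn)) ** 2) / Ns

    # Eq. (7): SD of the absolute value of a_cn(i)
    sigma_aa = np.sqrt(np.mean(a_cn ** 2) - np.mean(np.abs(a_cn)) ** 2)
    return gamma_max, sigma_aa
```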

3.2. Statistical feature set

Present modulation schemes, e.g. QAM16, V29, etc., often contain information in both amplitude and


phase spectra. In communications, these modulations are regarded as complex signals and are often illustrated by their in-phase and quadrature (I–Q) components. HOS provide a good way to obtain features that characterise the two-dimensional probability density function (PDF), i.e. the constellation diagram. Normalised HOS provide invariant properties that are of interest, e.g. amplitude invariance. Moreover, HOS are not affected by additive white Gaussian noise (AWGN), since its cumulants of order greater than two are always zero.

In [23], Swami and Sadler reported a detailed account of ADMR using normalised complex HOS. Here, the authors propose a different scheme using the cumulants of the real and imaginary parts of the analytic signal.

3.2.1. Cumulants
Let X_i be a signal vector, \{x_i^1, x_i^2, \ldots, x_i^N\}, and let \langle \cdot \rangle denote the statistical expectation. The second-, third- and fourth-order cumulants at zero lag are then

C_{X_1,X_2} = \langle X_1, X_2 \rangle = \frac{1}{N} \sum_{n=1}^{N} x_1^n x_2^n,    (9)

C_{X_1,X_2,X_3} = \langle X_1, X_2, X_3 \rangle = \frac{1}{N} \sum_{n=1}^{N} x_1^n x_2^n x_3^n,    (10)

C_{X_1,X_2,X_3,X_4} = \langle X_1, X_2, X_3, X_4 \rangle - \langle X_1, X_2 \rangle \langle X_3, X_4 \rangle - \langle X_1, X_3 \rangle \langle X_2, X_4 \rangle - \langle X_1, X_4 \rangle \langle X_2, X_3 \rangle
                   = \frac{1}{N} \sum_{n=1}^{N} x_1^n x_2^n x_3^n x_4^n - C_{X_1,X_2} C_{X_3,X_4} - C_{X_1,X_3} C_{X_2,X_4} - C_{X_1,X_4} C_{X_2,X_3}.    (11)
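In code, these zero-lag estimators reduce to sample moments of equal-length signal vectors. A minimal Python sketch follows; the helper names c2, c3 and c4 are hypothetical.

```python
import numpy as np

def c2(x1, x2):
    """Second-order zero-lag cumulant, Eq. (9)."""
    return np.mean(x1 * x2)

def c3(x1, x2, x3):
    """Third-order zero-lag cumulant, Eq. (10)."""
    return np.mean(x1 * x2 * x3)

def c4(x1, x2, x3, x4):
    """Fourth-order zero-lag cumulant, Eq. (11): the fourth-order
    moment minus the three pairwise second-order products."""
    return (np.mean(x1 * x2 * x3 * x4)
            - c2(x1, x2) * c2(x3, x4)
            - c2(x1, x3) * c2(x2, x4)
            - c2(x1, x4) * c2(x2, x3))
```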

3.2.2. Feature set
Let H_y be the complex envelope of the sampled signal y(t), which is defined by

H_y = [y(t) + j\hat{y}(t)] \exp(-j 2\pi f_c t),    (12)

where \hat{y}(t) is the Hilbert transform of y(t) and f_c is the carrier frequency.

We then define R to be the real part of H_y, \Re(H_y), and I to be the imaginary part of H_y, \Im(H_y). Thus, we propose the following feature set:

C_{R,R}, C_{R,I}, C_{I,I},
C_{R,R,R}, C_{R,R,I}, C_{R,I,I}, C_{I,I,I},
C_{R,R,R,R}, C_{R,R,R,I}, C_{R,R,I,I}, C_{R,I,I,I}, C_{I,I,I,I},    (13)

which are the second-, third- and fourth-order cumulants and cross cumulants of the real and imaginary parts of the signal. These features are then combined with the original five features to give the final combined feature set.

Each modulation scheme has a unique constellation diagram in its I–Q plane (see Fig. 1). Examples of some HOS features that distinguish among different constellations are shown in Figs. 2 and 3.

In this paper, it is assumed that the phase recovery process is perfect. In practice, the clock recovery process often introduces a phase offset or non-coherency in the recovered signal; prior to the extraction of these features, the phase error must be corrected. This is a part of the authors' ongoing work and will be presented in a forthcoming paper. For dense modulations (M ≥ 64), care must be taken that the sample size is large enough to include every symbol in the constellation map; otherwise, the estimates of the higher-order cumulants will not be accurate.
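Using the cumulant helpers sketched in Section 3.2.1, the complex envelope of Eq. (12) and the 12 features of Eq. (13) might be assembled as follows; the function name is hypothetical, and scipy.signal.hilbert is our choice for forming the analytic signal y(t) + j ŷ(t).

```python
import numpy as np
from scipy.signal import hilbert

def statistical_features(y, fs, fc):
    """Sketch of the 12 statistical features of Eq. (13), assuming
    perfect phase recovery and a carrier fc known a priori.
    Relies on the c2/c3/c4 helpers sketched above."""
    t = np.arange(len(y)) / fs
    env = hilbert(y) * np.exp(-2j * np.pi * fc * t)  # Eq. (12)
    R, I = env.real, env.imag
    return np.array([
        c2(R, R), c2(R, I), c2(I, I),
        c3(R, R, R), c3(R, R, I), c3(R, I, I), c3(I, I, I),
        c4(R, R, R, R), c4(R, R, R, I), c4(R, R, I, I),
        c4(R, I, I, I), c4(I, I, I, I),
    ])
```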

3.3. Data preprocessing and normalisation

Although feature extraction and selection are essentially forms of data preprocessing, we regard any work done prior to feature extraction as data preprocessing. This includes data sampling, pulse shaping, estimation of the carrier frequency f_c, and recovery of the complex envelope through the Hilbert transform.

Normalisation is carried out before feeding the extracted features into the MLP recogniser for training, validation and testing. The features are normalised to ensure that they have zero mean and unit variance. Fig. 4 shows the complete process of the proposed digital modulation recogniser.
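A small sketch of the normalisation step; fitting the statistics on the training split alone and reusing them for the other splits is our assumption, as the paper states only that the features are made zero mean and unit variance.

```python
import numpy as np

def normalise(X_train, *other_sets):
    """Scale feature matrices (rows = realisations, columns = features)
    to zero mean and unit variance, using training-set statistics."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    return [(X - mu) / sigma for X in (X_train, *other_sets)]
```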



Fig. 1. Various constellation signatures: (a) PSK8; (b) QAM16; (c) V29; (d) V32.

4. MLP classifier

ANNs have gained popularity in recent decades due to their ability to learn complex non-linear input-output mappings, their self-adaptivity to environment changes and their fault tolerance. A popular family of ANNs is the feed-forward networks, which include the MLP and the radial basis function (RBF) networks. We have chosen the MLP as the classifier in this work; however, there is nothing to prevent one from choosing other types of neural networks. Ibnkahla [9] surveyed a wide variety of neural networks in digital communications.

The MLP is chosen due to its simplicity and efficient

hardware implementation. In many ways, the MLP can be likened to a posteriori probability estimation and non-linear discriminant analysis in statistical pattern recognition. In fact, the MLP hides the details of the background statistics and mathematics from the user, while at the same time providing an easily understandable mathematical model of a biological concept.

The MLP is a feed-forward structure of interconnected individual non-linear parallel computing units called neurons. Inputs are propagated through the network layer by layer, and the MLP gives a non-linear mapping of the inputs at the output layer. We can write the MLP mathematically as

y_k(n) = \phi \left( \sum_{i=1}^{q} w_{ki} \, \phi \left( \sum_{j=0}^{p} w_{ij} x_j(n) \right) \right),    (14)


Fig. 2. Fourth-order cumulant of the real part of the analytic signals (normalised C_{R,R,R,R}, plotted against sample number) for three different modulation types, QAM16, V29 and V32 (SNR at 20 dB).

Fig. 3. Fourth-order cross cumulant of the analytic signals (normalised C_{R,R,I,I}, plotted against sample number) for three different modulation types, QAM16, V29 and V32 (SNR at 20 dB).


[Figure 4: block diagram of the proposed recogniser. Data preprocessing (carrier frequency estimation, e.g. fractional sampling; recovery of the complex envelope via the Hilbert transform) feeds feature extraction, followed by feature normalisation and the multilayer perceptron neural network recogniser.]

Fig. 4. Proposed digital modulation recogniser.

where n is the sample number, subscript k denotes the output nodes, and subscripts i and j denote the hidden and input nodes, respectively. Note that the activation functions, \phi, are allowed to vary for different layers of neurons.

The recognition basically consists of two phases: training and testing. A paired training input and target output are presented at each training step, and weights are calculated according to the chosen learning algorithm. For a batch training algorithm, weights are updated once every training epoch, meaning a full run through the training samples; for adaptive training, weights are updated after every training sample. We shall discuss learning algorithms in more depth in the next section.
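As an illustration, the forward pass of Eq. (14) for a single hidden layer can be written compactly in Python; the weight-matrix shapes are hypothetical, the bias input x_0 = 1 is assumed to be part of the input vector, and tanh stands in for the unspecified activation \phi.

```python
import numpy as np

def mlp_forward(x, W_hid, W_out, phi=np.tanh):
    """Forward pass of Eq. (14): x has length p+1 (with x[0] = 1 as
    bias), W_hid has shape (q, p+1) and W_out has shape (k, q)."""
    h = phi(W_hid @ x)     # hidden-layer outputs (inner sum)
    return phi(W_out @ h)  # output-layer values y_k(n)
```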

4.1. Learning algorithms

The MLP is a relatively mature branch of ANNs, and there are a number of efficient training algorithms. Azzouz and Nandi [19] adopted the standard backpropagation algorithm (BP). They also considered BP with momentum and an adaptive learning rate to reduce the training time required.

BP algorithms apply the generalised chain rule repetitively to calculate the change of each weight with respect to the error function, E. Common choices for E are the sum squared error (SSE) and the mean squared error (MSE):

\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial y_i} \, \frac{\partial y_i}{\partial u_i} \, \frac{\partial u_i}{\partial w_{ij}},    (15)

where w_{ij} represents the weight value from neuron j to neuron i, y_i is the output and u_i is the weighted sum of the inputs of neuron i.

The weight values are then updated by a simple gradient descent algorithm:

w_{ij}(t+1) = w_{ij}(t) - \eta \, \frac{\partial E}{\partial w_{ij}}(t).    (16)

The learning rate parameter, \eta, is analogous to the step size of a least mean square (LMS) adaptive filter: a higher learning rate means faster convergence, but with a risk of oscillation, while a value that is too small takes too long to achieve convergence. An adaptive learning rate variant of BP addresses this problem by updating the learning rate adaptively; by doing so, one also avoids becoming trapped in a local minimum.

A BP algorithm with momentum adds an extra momentum parameter, \alpha, to the weight changes:

\Delta w_{ij}(t+1) = -\eta \, \frac{\partial E}{\partial w_{ij}}(t) + \alpha \, \Delta w_{ij}(t).    (17)

This takes account of the previous weight change, leading to a more stable algorithm and accelerating convergence in shallow areas of the cost function.
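One update of Eqs. (16) and (17) might be sketched as follows; the values of eta and alpha are purely illustrative, as the paper does not report its settings.

```python
def bp_momentum_step(w, grad, prev_dw, eta=0.01, alpha=0.9):
    """One BP-with-momentum weight update. grad is dE/dw at the
    current step and prev_dw is the previous weight change."""
    dw = -eta * grad + alpha * prev_dw  # Eq. (17)
    return w + dw, dw                   # new weight and its change
```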


In recent years, new algorithms have been proposed for network training. However, some algorithms require much computing power to achieve good training, especially when dealing with a large training set. Although these algorithms require very small numbers of training epochs, the actual training time for an epoch is much longer compared with BP algorithms. An example is the Levenberg-Marquardt (LM) algorithm.

In this paper, we consider the BP algorithms and

the resilient backpropagation algorithm (RPROP) proposed by Riedmiller and Braun [21] in 1993. Basically, unlike BP, RPROP considers only the sign of the derivative as the indication of the direction of the weight update. In doing so, the size of the partial derivative does not influence the weight step.

The following equation shows the adaptation of the update values \Delta_{ij} for the RPROP algorithm. For initialisation, all \Delta_{ij} are set to small positive values:

\Delta_{ij}(t) =
\begin{cases}
\eta^{+} \cdot \Delta_{ij}(t-1) & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t-1) \, \dfrac{\partial E}{\partial w_{ij}}(t) > 0, \\
\eta^{-} \cdot \Delta_{ij}(t-1) & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t-1) \, \dfrac{\partial E}{\partial w_{ij}}(t) < 0, \\
\eta^{0} \cdot \Delta_{ij}(t-1) & \text{otherwise},
\end{cases}    (18)

where \eta^{0} = 1, 0 < \eta^{-} < 1 < \eta^{+}, and \eta^{-}, \eta^{0}, \eta^{+} are known as the update factors. Whenever the derivative of the corresponding weight changes its sign, it implies that the previous update value was too large and has skipped a minimum; the update value is therefore reduced (by \eta^{-}) as shown above. If the derivative retains its sign, however, the update value is increased (by \eta^{+}), which helps to accelerate convergence in shallow areas. To avoid over-acceleration, in the epoch following the application of \eta^{+} the new update value is neither increased nor decreased (\eta^{0}) from the previous one. Note that the values of \Delta_{ij} remain non-negative in every epoch.

This update value adaptation process is then followed by the actual weight update process, which is

governed by the following equations:

\Delta w_{ij}(t) =
\begin{cases}
-\Delta_{ij}(t) & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t) > 0, \\
+\Delta_{ij}(t) & \text{if } \dfrac{\partial E}{\partial w_{ij}}(t) < 0, \\
0 & \text{otherwise},
\end{cases}    (19)

w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t).    (20)
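Putting Eqs. (18)-(20) together, one RPROP iteration over an array of weights might be sketched as below; the update factors and step-size bounds are the values suggested by Riedmiller and Braun [21], not figures from this paper.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One RPROP iteration; w, grad, prev_grad and step are arrays."""
    s = grad * prev_grad                            # sign comparison
    step = np.where(s > 0, step * eta_plus, step)   # same sign: grow
    step = np.where(s < 0, step * eta_minus, step)  # sign flip: shrink
    step = np.clip(step, step_min, step_max)        # keep steps bounded
    w = w - np.sign(grad) * step                    # Eqs. (19)-(20)
    # After a sign flip, zero the stored derivative so the next epoch
    # takes the eta0 = 1 (no change) branch of Eq. (18).
    prev_grad = np.where(s < 0, 0.0, grad)
    return w, prev_grad, step
```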

4.2. Generalisation and early stopping

Early stopping is often used to avoid over-training of a recogniser and to achieve better generalisation. When training a recogniser, the measured error generally decreases as the number of training epochs increases. However, if the error is measured against an independent data set (the validation set) after each training epoch, it first decreases until the point where the network starts to become over-trained, after which it increases with the epoch number.

We can, therefore, stop the training procedure at the smallest error value on the validation set, since this gives the optimal generalisation of the recogniser.
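A generic early-stopping loop under these rules might look like the sketch below; the train_epoch and val_error callables and the patience parameter are hypothetical additions, since the paper simply stops at the validation minimum.

```python
def train_with_early_stopping(train_epoch, val_error,
                              max_epochs=10000, patience=50):
    """Stop when the validation error has not improved for `patience`
    epochs; return the epoch and error of the best model seen."""
    best_err, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch()                      # one pass over training data
        err = val_error()                  # error on the validation set
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch > patience:
            break                          # validation error is rising
    return best_epoch, best_err
```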

5. Feature selection

Pattern recognition applications often face the problem of the curse of dimensionality. The easiest way to reduce the input dimension is to select some inputs and discard the rest. This method is called feature selection.

The advantage of feature selection is that one can

use the least possible number of features without compromising the performance of the MLP recogniser. The effect is seen best when there are highly correlated inputs or when more than one input shares the same information content. Situations like this often occur in pattern recognition problems.

There exist two main paradigms of feature selection,

i.e. the filter-type approach (e.g. [1,13]) and the wrapper-type approach (e.g. [11]). In the filter-type approach, the feature set is evaluated without the aid of the application, in this case the MLP recogniser.


Table 2
An overview of a generalised GA

(1) Initialise a population P_0 of N individuals
(2) Set the generation counter, i = 1
(3) Create an intermediate population, P'_i, through the selection function
(4) Create a current population, P_i, through the reproduction functions
(5) Evaluate the current population
(6) Increment the generation counter, i = i + 1
(7) Repeat from step 3 until the termination condition is reached
(8) Output the best solution found

In other words, the features are selected based on some predefined function of the members of the feature set. An example of such a function is the correlation among the features. The wrapper-type approach, however, uses the performance of the recogniser to evaluate the selected feature subset. This has the advantage of guaranteed performance but often takes a longer time.

There are many methods for feature selection, but

they generally consist of two main parts: a selection phase and an evaluation phase. In pattern recognition, the selection criterion is normally the minimisation of recognition errors. This leaves us with choosing a suitable selection procedure. In this paper, we focus on a stochastic search method using GA. Similar wrapper-type feature selection methods with GA were also proposed in [10,25]. Conventional brute-force searching methods, like forward selection, backward selection, etc., are also available, but are not discussed in this paper. A comparison among these methods can be found in [12].

5.1. Genetic algorithm

The GA is a stochastic optimisation algorithm which adopts Darwin's theory of survival of the fittest. The algorithm runs through iterations of individual selection, reproduction and evaluation, as illustrated in Table 2. In recent years, the GA has found many applications in the engineering sector, such as adaptive filter design, evolutionary control systems, feature extraction, evolutionary neural networks, etc.

Two important issues in GA are the genetic coding used to define the problem and the evaluation function. Without these, the GA is nothing but a meaningless repetition of procedures. The simple GA proposed by Goldberg [7] in his book uses binary coding, but other methods, such as real coding, are sometimes more meaningful for some problems. This is discussed in further detail in Section 5.2.

The evaluation function estimates how good an individual is at surviving the current environment. In function minimisation problems, the individual that gives the smallest output is given the highest score; in this case, the evaluation function can be the reciprocal of the output plus a constant. In the modulation recognition problem, we define the evaluation as the overall performance of the recogniser, which is the average performance over training, validation and testing. Therefore, the GA implemented here is designed to seek the maximum score returned by each individual.
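The loop of Table 2 can be expressed as a short skeleton; all plug-in callables are hypothetical, and a fixed generation cap stands in here for the convergence criterion described in Section 6.3.

```python
def run_ga(init_pop, fitness, select, crossover, mutate, max_gen=100):
    """Skeleton of the generalised GA of Table 2. For the recogniser,
    fitness would be the overall RPROP MLP performance."""
    pop = init_pop()                              # step (1)
    best = max(pop, key=fitness)
    for _ in range(max_gen):                      # steps (2), (6), (7)
        parents = select(pop)                     # step (3), even length
        pop = [mutate(child)                      # step (4)
               for a, b in zip(parents[::2], parents[1::2])
               for child in crossover(a, b)]
        best = max(pop + [best], key=fitness)     # step (5), keep best
    return best                                   # step (8)
```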

5.2. String representation

Each individual solution in GA is represented by a genome string. This string contains specific parameters to solve the problem. In our application, two different methods of coding are investigated, i.e. binary genome coding and list genome coding.

In binary genome coding, each string has a

length N, where N is the total number of input features available. A binary '1' denotes the presence of the feature at the corresponding index number; similarly, a binary '0' denotes its absence. The advantage of this coding is that it searches through the feature subspace dynamically, without a user-defined number of subset features. No constraint is needed with this coding method.

The second coding used is the real-numbered list

genome string. Each genome in this category is of length M, where M is the desired number of features in a feature subset. To initialise, the GA randomly chooses M numbers from a list of integers ranging from 1 to N. However, we do not allow any repetition of the integers, as this would mean that the same feature is selected more than once; therefore a constraint, 1 ≤ f_i ≤ N with all f_i distinct, is applied, where f_i denotes the ith selected feature index. In practice, we randomise the list sequence and choose the first M features in the list, as in the sketch below.
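A sketch of this initialisation; the function name is hypothetical.

```python
import random

def init_list_genome(M, N):
    """Draw M distinct feature indices from 1..N by shuffling the
    index list and keeping the first M entries, which enforces the
    no-repetition constraint."""
    indices = list(range(1, N + 1))
    random.shuffle(indices)
    return indices[:M]
```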

for choosing the number of hidden units in the MLP


recogniser. However, from previous simulations, we learned that the optimum number of hidden units is around ten neurons when using the RPROP training algorithm. Since this is practical and reasonable for a hardware implementation, we shall not complicate our feature selection problem with an extra parameter.

5.3. Basic genetic operators

Basic genetic operators are used for reproduction and selection. The following gives a brief description of each operator:

Crossover: Crossover occurs with a crossover probability of P_c. A point is chosen for two strings at which their genetic information is exchanged. Two-point and multi-point variations of crossover also exist. For our purpose, we use one-point crossover with a typical P_c value of 0.75.

Mutation: Mutation is used to avoid local convergence of the GA. In binary coding, it simply means that the particular bit chosen for mutation is inverted to its complement. In list coding, the chosen index is replaced with a new index without breaking the constraint. Mutation occurs with a typical mutation probability of 0.05; this probability is kept low to prevent unnecessary oscillation. Sketches of both reproduction operators are given below.

Selection: There are several ways to select a new intermediate population. Roulette wheel selection assigns a probability to each string according to its performance, so poor genome strings still have a slight chance of survival. Unlike roulette wheel selection, selection by rank simply orders the individuals according to their performance and selects copies of the best individuals for reproduction.

Other genetic operators, like elitism, niche and diploidy, are often classified as advanced genetic operators. For the purpose of this investigation, we apply only elitism. Elitism comes in various forms; in our application, we require that the best two strings always be included in the new population. This gives a chance to re-evaluate their capabilities and improves GA convergence.
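Illustrative sketches of the two reproduction operators for binary genomes, using the typical probabilities quoted above; the function names are hypothetical, and mutation is shown as an independent per-bit flip, a common implementation of the scheme described above.

```python
import random

def one_point_crossover(a, b, pc=0.75):
    """Exchange the tails of two genome lists after a random point,
    with crossover probability pc."""
    if random.random() < pc:
        point = random.randrange(1, len(a))
        a, b = a[:point] + b[point:], b[:point] + a[point:]
    return a, b

def mutate_binary(genome, pm=0.05):
    """Flip each bit independently with mutation probability pm."""
    return [bit ^ 1 if random.random() < pm else bit for bit in genome]
```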

6. Simulation results

6.1. MLP and BP with momentum and adaptive learning rate

In this section, the performance of the proposed recogniser is investigated with ten different modulation types, i.e. ASK2, ASK4, BPSK, QPSK, FSK2, FSK4, QAM16, V29, V32 and QAM64. Random numbers up to M levels (M = 2, 4, 8, 16, 32, etc.) are first generated using a URNG. These are then re-sampled at the baud rate for pulse shaping. Each modulation type has 3000 realisations of 4096 samples each, which makes a total of 30,000 realisations. These are then equally divided into three data sets for training, validation and testing. The MLP recogniser is allowed to run for up to 10,000 training epochs. However, training is normally stopped by the validation process long before this maximum is reached.

A single hidden layer MLP feed-forward network

was chosen as the recogniser. As described in the previous section, the number of neurons in the hidden layer needs to be determined manually at this stage. Generally, any number in the vicinity of 40 neurons seems to be adequate for reasonable classification. In a noiseless setup, the recogniser can show up to 100% accuracy, as shown in Table 3.

The effect of noise on recogniser performance is

also studied through training and testing at different SNRs. Table 4 shows the performance of the recogniser for various SNR values. Performance is generally good even at low SNR values; for example, at 0 dB SNR a success rate of 98% is recorded.

6.2. MLP and RPROP

Using a similar setup to that of the previous section, the MLP recogniser is trained using the RPROP algorithm as described in Section 4.1. The task is again to recognise 10 different modulation types grouped by SNR values.

Table 5 presents a comparison of performance

and number of epochs. As seen from the results, RPROP reduced the number of training epochs to about one-fifth of that required by BP with momentum and adaptive learning rate.


Table 3
Training performance for BP MLP with 40 hidden layer neurons (SNR = ∞)

Simulated          Deduced modulation type
modulation type    ASK2  ASK4  PSK2  PSK4  FSK2  FSK4  QAM16  V.29  V.32  QAM64
ASK2               1000     0     0     0     0     0      0     0     0      0
ASK4                  0  1000     0     0     0     0      0     0     0      0
BPSK                  0     0  1000     0     0     0      0     0     0      0
QPSK                  0     0     0  1000     0     0      0     0     0      0
FSK2                  0     0     0     0   999     1      0     0     0      0
FSK4                  0     0     0     0     0  1000      0     0     0      0
QAM16                 0     0     0     0     0     0   1000     0     0      0
V.29                  0     0     0     0     0     0      0  1000     0      0
V.32                  0     0     0     0     0     0      0     0  1000      0
QAM64                 0     0     0     0     0     0      0     0     0   1000

All 17 features were used.

Table 4
Performance of BP MLP recogniser with 40 hidden layer neurons at different SNR values

Performance (%)    SNR (dB)
                   −5      0       5       10      20
Training           90.38   98.11   99.32   99.86   100.00
Validation         89.18   98.00   99.35   99.82   100.00
Testing            89.35   97.91   99.21   99.90   99.96
Overall            89.64   98.01   99.33   99.86   99.99

All 17 features were used.

Table 5
Overall performance and number of epochs used for RPROP MLP with 10 hidden layer neurons vs. BP MLP with 10 hidden layer neurons

SNR (dB)    BP                                    RPROP
            Overall perf. (%)   Elapsed epochs    Overall perf. (%)   Elapsed epochs
−5          73.62               441               94.79               490
0           87.02               421               99.07               129
5           99.26               2101              99.53               483
10          89.86               995               99.95               285
20          99.93               1593              99.97               257

All 17 features were presented at the input.

Computationally, the per-epoch time difference between the two algorithms is negligible.

Another point worth noting is that RPROP needs

fewer hidden neurons than conventional BP. Table 6 shows results obtained using only 10 hidden layer neurons, as opposed to the 40 hidden layer neurons in the

previous section. Performance is notably better at the lower SNR values.

RPROP also seems to perform better in the lower SNR

range compared to BP with momentum and adaptive learning rate. The simulations were carried out in the MATLAB environment using the Neural Network Toolbox.


Table 6
Performance of RPROP MLP recogniser with 10 hidden layer neurons at different SNR values

Performance (%)    SNR (dB)
                   −5      0       5       10      20
Training           95.42   99.32   99.58   99.95   99.97
Validation         94.5    98.92   99.45   99.97   99.98
Testing            94.45   98.98   99.55   99.94   99.96
Overall            94.79   99.07   99.52   99.95   99.97

All 17 features were presented.

6.3. Feature selection using GA

Experiments were again carried out in the MATLAB environment. Following the wrapper-type approach, the RPROP MLP recogniser (as in Section 6.2) was chosen as the evaluation function, and the overall performance was returned as the fitness value of the genome string.

Convergence of a GA can be defined in several ways. In our application, performance convergence is chosen: we define convergence in terms of the ratio between the present fitness function value and the average of the fitness function values from the past five generations. The GA is allowed to run until it reaches convergence.

6.3.1. Binary coding
For binary coding, 20 binary genome strings were created randomly with a uniform probability distribution during initialisation. This implies that each bit in the string is independent of the other bits. One-point crossover and single-bit mutation were used as the reproduction operators. The GA was run until it reached convergence. Each selected feature subset was allowed to train for a maximum of 250 epochs, unless stopped earlier by the validation process.

in diPerent SNR environment and their performance.These are then compared with the performance of theMLP recogniser trained using RPROP.Generally, perfomance from reduced feature sets

selected in diPerent SNRs show improvement over theperformance shown by the RPROP MLP recogniser.Moreover, feature subsets contain typically four or "vefeatures less than the original feature set.

6.3.2. Fixed-length list coding
With the list genome encoding method, one has to specify the length of the genome string, which represents the number of features in the reduced feature set. As seen in the previous section, the feature set can typically be reduced to 12 features without any compromise in performance. Therefore, in this section, experiments were carried out to investigate the possibility of an even smaller feature set and the compromise in performance that one might observe.

using only six features selected by the GA and theperformance using the original full feature set. At−5 dB, the recogniser records a performance degra-dation of 2% only. For other SNRs, the diPerence isnegligible.

6.4. Performance comparison

As mentioned in [17], direct comparison with other works is difficult in modulation recognition. This is mainly because there is no single unified data set available; different selections of modulation types lead to different performance. Besides, there are many different kinds of benchmarks used for signal quality, which makes direct numerical comparison difficult.

As for neural network-based modulation recognis-

ers, Louis and Sehier [22] reported a generalisation rate of 90% and an accuracy of 93% on data sets with SNRs of 15-25 dB. However, the performance for lower SNRs is reported to be less than 80% for a fully connected network, and about 90% for a hierarchical network. Lu et al. [16] show an average accuracy of around 90%

with 4096 samples per realisation and SNRs ranging


Table 7
Performance of binary string representation GA RPROP MLP with 10 hidden layer neurons with dynamic feature selection

SNR (dB)   Binary string        Features chosen   Perf. with GA (%)   RPROP perf. (%)
−5         11011111111111100    15                94.37               94.79
0          11110100011101111    12                99.15               99.07
5          11110100010111101    11                99.97               99.53
10         11110100011101111    12                99.97               99.95
20         11011000001110111    11                99.99               99.97

Table 8
Performance of list string representation GA RPROP MLP with 10 hidden layer neurons

SNR (dB)   Genome string           Features chosen   Perf. with GA (%)   RPROP perf. (%)
−5         1, 2, 7, 8, 9, 12       6                 92.93               94.79
0          6, 7, 8, 9, 12, 15      6                 99.02               99.07
5          7, 8, 13, 14, 15, 16    6                 99.34               99.53
10         5, 6, 7, 12, 13, 15     6                 99.83               99.95
20         6, 7, 12, 13, 15, 16    6                 99.80               99.97

Only six features were selected.

between 5 and 25 dB. By increasing the number of samples, an increase of 5% in average performance is achieved. In [15], Lu et al. show through computer simulation an average recognition rate of 83%, which reaches over 90% for SNR values over 20 dB; however, if the SNR is less than 10 dB, the performance drops to less than 70%.

The MLP recogniser proposed in this work shows

a steady performance across different SNR values, e.g. over 90% at all SNR values tested. Through feature selection, the input dimension is reduced to less than half the original size without trading off generalisation ability or accuracy. The proposed recogniser is also fast in terms of training time; if it were known that changes had occurred, for example, the network could easily be retrained.

7. Conclusion

AMR is an important issue in communication intelligence (COMINT) and electronic support measure (ESM) systems. Previous works on ANN recognisers have been encouraging but covered a limited set of modulations; here, additional digital modulations like

QAM16, V29, V32 and QAM64 are included. We propose, in this paper, a combination of a spectral and a new statistics-based feature set with an MLP recogniser implementation. The authors have also compared two ANN training algorithms, BP with momentum and adaptive learning rate, and RPROP; as far as the authors are aware, this is the first application of the RPROP algorithm in ADMR. Simulations show satisfactory results, about 99% accuracy at an SNR value of 0 dB, with good generalisation. RPROP also demonstrated better and faster performance than BP.

The GA was used to perform feature selection to reduce

the input dimension and increase the performance of the RPROP MLP recogniser. Significant improvements can be seen at lower SNR values using the selected feature subset. Using only six features with the list coding method, the RPROP MLP managed to deliver a performance of 99% at 0 dB SNR and 93% at −5 dB SNR.

Acknowledgements

The authors would like to thank the Overseas Research Studentship (ORS) Awards Committee UK and the


University of Liverpool for funding this work. They are also grateful for the comments from the anonymous reviewers of this paper.

References

[1] S. Abe, R. Thawonmas, Y. Kobayashi, Feature selection by analyzing class regions approximated by ellipsoids, IEEE Trans. Systems Man Cybernet. Part C Appl. Rev. 28 (2) (1998) 282–287.

[2] E.E. Azzouz, A.K. Nandi, Procedure for automatic recognition of analogue and digital modulations, IEE Proc. Commun. 143 (5) (1996) 259–266.

[3] E.E. Azzouz, A.K. Nandi, Automatic Modulation Recognition of Communication Signals, Kluwer Academic Publishers, Dordrecht, 1996.

[4] D. Boudreau, C. Dubuc, F. Patenaude, M. Dufour, J. Lodge, R. Inkol, A fast modulation recognition algorithm and its implementation in a spectrum monitoring application, in: MILCOM 2000 Proceedings. 21st Century Military Communications. Architectures and Technologies for Information Superiority, Vol. 2, IEEE Press, New York, 2000, pp. 732–736.

[5] Communications Research Centre Canada, Spectrum Explorer Software, 2002, http://www-ext.crc.ca/spectrum-explorer/spectrum-explorer6/en/index.htm.

[6] D. Donoho, X. Huo, Large-sample modulation classification using Hellinger representation, Signal Process. Adv. Wireless Commun. (1997) 133–137.

[7] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.

[8] A. Hero, H. Hadinejad-Mahram, Digital modulation classification using power moment matrices, in: Proceedings of the IEEE 1998 International Conference on Acoustics, Speech and Signal Processing (ICASSP), Washington, Vol. 6, 1998, pp. 3285–3288.

[9] M. Ibnkahla, Applications of neural networks to digital communications – a survey, Signal Processing 80 (2000) 1185–1215.

[10] L.B. Jack, A.K. Nandi, Genetic algorithms for feature selection in machine condition monitoring with vibration signals, IEE Proc.: Vision Image Signal Process. 147 (3) (2000) 205–212.

[11] R. Kohavi, D. Sommerfield, Feature subset selection using the wrapper method: overfitting and dynamic search space topology, in: Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, 1995, pp. 192–197.

[12] M. Kudo, J. Sklansky, Comparison of algorithms that select features for pattern classifiers, Pattern Recognition 33 (1) (2000) 25–41.

[13] P.L. Lanzi, Fast feature selection with genetic algorithms: a filter approach, in: Proceedings of the IEEE International Conference on Evolutionary Computation, Indianapolis, 1997, pp. 537–540.

[14] K. Nolan, L. Doyle, D. O'Mahony, P. Mackenzie, Modulation scheme recognition techniques for software radio on a general purpose processor platform, in: Proceedings of the First Joint IEI/IEE Symposium on Telecommunication Systems, Dublin, 2001.

[15] L. Mingquan, X. Xianci, L. Lemin, AR modeling based features extraction of multiple signals for modulation recognition, in: Signal Processing Proceedings, Fourth International Conference, Beijing, Vol. 2, 1998, pp. 1385–1388.

[16] L. Mingquan, X. Xianci, L. LeMing, Cyclic spectral features based modulation recognition, in: Communications Technology Proceedings, International Conference, Beijing, Vol. 2, 1996, pp. 792–795.

[17] B.G. Mobasseri, Digital modulation classification using constellation shape, Signal Processing 80 (2000) 251–277.

[18] A.K. Nandi, E.E. Azzouz, Modulation recognition using artificial neural networks, Signal Processing 56 (2) (1997) 165–175.

[19] A.K. Nandi, E.E. Azzouz, Modulation recognition using artificial neural networks, Signal Processing 56 (2) (1997) 165–175.

[20] A.K. Nandi, E.E. Azzouz, Algorithms for automatic modulation recognition of communication signals, IEEE Trans. Commun. 46 (4) (1998) 431–436.

[21] M. Riedmiller, H. Braun, A direct adaptive method for faster backpropagation learning: the RPROP algorithm, in: Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, 1993, pp. 586–591.

[22] C.L.P. Sehier, Automatic modulation recognition with a hierarchical neural network, in: Military Communications (MILCOM) Conference Records 1993: Communications on the Move, Vol. 1, IEEE Press, New York, 1993, pp. 111–115.

[23] A. Swami, B.M. Sadler, Hierarchical digital modulation classification using cumulants, IEEE Trans. Commun. 48 (3) (2000) 416–429.

[24] M.L.D. Wong, A.K. Nandi, Automatic digital modulation recognition using spectral and statistical features with multi-layer perceptrons, in: Proceedings of the International Symposium on Signal Processing and its Applications (ISSPA), Kuala Lumpur, Vol. II, 2001, pp. 390–393.

[25] J. Yang, V. Honavar, Feature subset selection using a genetic algorithm, IEEE Intell. Systems 13 (2) (1998) 44–49.