2013 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 22–25, 2013, SOUTHAMPTON, UK

ROBUSTNESS ENHANCEMENT OF DISTRIBUTION BASED BINARY DISCRIMINATIVE FEATURES FOR MODULATION CLASSIFICATION

Zhechen Zhu∗, Asoke K. Nandi†

Electronic and Computer Engineering, Brunel University

Uxbridge, Middlesex, UB8 3PH, UK
{zhechen.zhu; asoke.nandi}@brunel.ac.uk

Muhammad Waqar Aslam‡

Electrical Engineering and Electronics, The University of Liverpool

Liverpool, L69 3GJ, UK
[email protected]

ABSTRACT

In this paper, we propose distribution based binary discriminative features and a novel feature enhancement process for automatic modulation classification. The new features exploit the signal distribution mismatch between two modulations. Signal distributions on I-Q segments, amplitude, and phase are considered to produce a comprehensive feature set for improved robustness. Logistic regression is used to reduce feature dimension and enhance classification robustness. To accomplish multi-class classification, a class oriented feature space is created for the K-nearest neighbours classifier. The test results show that the proposed method is able to achieve excellent performance in simulated environments.

Index Terms— Modulation classification, feature combination, logistic regression, K-nearest neighbour

1. INTRODUCTION

Automatic Modulation Classification (AMC) is a challenging classification problem in telecommunications engineering. AMC equips a communication system with the ability to recognise automatically the modulation of received signals. It is a crucial component in cognitive radio systems with an adaptive modulation mechanism. Adaptive modulation helps to boost spectrum efficiency by employing suitable modulations according to varying channel conditions. To enhance system performance further, AMC enables the receiver to operate without constant notification of transmitted signal modulations and avoids the wastage of bandwidth. In military electronic warfare, surveillance, and threat analysis, the recovery of intercepted signals is made possible by accurate modulation identification and correct demodulation.

∗Zhechen Zhu would like to thank the School of Engineering and Design, Brunel University, for the financial support.
†Asoke K. Nandi would like to thank TEKES for their award of the Finland Distinguished Professorship.
‡Muhammad Waqar Aslam would like to acknowledge the financial support of the University of Azad Jammu and Kashmir, Pakistan.

Existing AMC solutions can be categorised into two groups: likelihood based approaches and feature based approaches. The likelihood based Maximum Likelihood (ML) classifier [1] promises to provide optimal classification under the strict conditions of perfect channel estimation and compensation. Classification is achieved by finding the modulation candidate which shares the maximum likelihood with the unknown signal. This process requires complex computation because of the logarithm and exponential operations involved in the likelihood function. Several studies have been conducted to improve the computational efficiency of ML classifiers [2, 3]. Notably, recently developed distribution test classifiers [4, 5] show great potential in providing low complexity and accurate classification. However, like most likelihood based classifiers, their performance suffers from model mismatch caused by imperfect channel conditions.

Feature based methods [6] use predefined features in combination with classifiers to achieve AMC. The performance depends on the quality of the features used. However, using machine learning techniques for feature combination and classification can improve the performance substantially. In [7], high order cumulants were used as effective features for M-QAM modulations. The feature selection and combination process is enhanced using Genetic Programming (GP) in [8]. However, it is pointed out in [4] that the cumulant based features are limited by the number of samples (the length of signal intercepted and available for analysis).

The proposed method combines features inspired by the ML based distribution test with an adaptive procedure of feature combination using machine learning techniques. An illustration of the overall process involved in the proposed AMC solution is given in Figure 1. Different from cumulants, the proposed features are expected to be simple to collect (low complexity), to require fewer signal samples, and to provide robustness in different channel conditions. The proposed features are optimized for the binary classification of two modulations. To establish a low complexity classifier with reduced feature dimension, we propose to combine

978-1-4799-1180-6/13/$31.00 © 2013 IEEE

Fig. 1: Proposed solution for AMC with two stages: pre-classification training and the actual classification (testing).

original features into new features, each representing a unique binary modulation combination, to maximise the separation between two modulations using linear binomial logistic regression. The resulting class oriented features are then used to construct a multi-dimensional feature space enabling fast classification using the K-nearest neighbour classifier.

In this paper, we first present the process for extracting distribution based binary discriminative features in Section 2. The combination and enhancement of the extracted features are explained in Section 3. After that, the AMC decision making process is presented in Section 4. The simulation set-up and results are listed in Section 5. The final conclusions are drawn at the end.

2. FEATURE EXTRACTION

The extraction of the proposed distribution based binary discriminative features consists of three steps: establishing a signal model, optimizing sampling locations on signal distributions, and extracting the features. Each step is presented in detail in the following subsections.

2.1. Signal Model

The signals considered are sampled without timing error. The expression for such a signal r(n) is given in Equation (1)

r(n) = α e^{j(2π f_o nT + θ_o)} s(n) + g(n)   (1)

where α is the channel gain, f_o is the frequency offset, θ_o is the carrier phase offset, T is the symbol timing, and g(n) is the Additive White Gaussian Noise (AWGN) with a variance of σ^2 given the definition of SNR.

SNR = 10 log_{10}(α^2 / σ^2)   (2)

In this paper, we consider the three most popular square M-QAM modulations, namely 4-QAM, 16-QAM and 64-QAM. s(n) is the transmitted signal symbol, which can be expressed in complex form as s(n) = s_X(n) + s_Y(n) j, where s_X(n) and s_Y(n) are components drawn randomly, with equal probability, from the modulation centroid components.
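As a concrete illustration (not part of the paper), the signal model of Equation (1) together with the SNR definition of Equation (2) can be sketched in NumPy. The function name and the unit-average-power scaling of the constellation are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_qam_signal(M, N, snr_db, alpha=1.0, f_off=0.0, theta_off=0.0, T=1.0):
    """Generate N samples of square M-QAM passed through the channel of Eq. (1)."""
    # Square M-QAM component levels, e.g. {-3, -1, 1, 3} for 16-QAM.
    side = int(np.sqrt(M))
    levels = np.arange(-(side - 1), side, 2)
    sX = rng.choice(levels, N)
    sY = rng.choice(levels, N)
    s = (sX + 1j * sY) / np.sqrt(2 * np.mean(levels**2))  # unit average power
    # Noise variance from SNR = 10 log10(alpha^2 / sigma^2), Eq. (2).
    sigma2 = alpha**2 / 10**(snr_db / 10)
    n = np.arange(N)
    g = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return alpha * np.exp(1j * (2 * np.pi * f_off * n * T + theta_off)) * s + g

r = generate_qam_signal(16, 512, snr_db=15)
```

Here N = 512 matches the signal length used in the experiments; frequency and phase offsets default to zero as in the AWGN setting.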

We define r_X(n) and r_Y(n) to be the real and imaginary components of the received signal r(n):

r′_X(n) = (r_X(n) − µ_{r_X}) / σ_{r_X}   (3)

r′_Y(n) = (r_Y(n) − µ_{r_Y}) / σ_{r_Y}   (4)

where r′_X(n) and r′_Y(n) are the normalized components of the signal's complex expression on the I-Q plane. µ_{r_X} and µ_{r_Y} are the means of the received signals on the separate dimensions, with corresponding standard deviations σ_{r_X} and σ_{r_Y}.

In this paper, we use the cumulative distributions of signals on the I-Q segments (C^{XY}_M), amplitude (C^A_M) and phase (C^P_M) for analysis. Given an M-QAM signal r(·) of N samples, its distributions on the different signal segments can be collected using the following equations.

C^{XY}_M(x) = (1/2N) Σ_{n=1}^{N} {I(r′_X(n) < x) + I(r′_Y(n) < x)}   (5)

C^A_M(x) = (1/N) Σ_{n=1}^{N} I(|r′(n)| < x)   (6)

C^P_M(x) = (1/N) Σ_{n=1}^{N} I(arg(r′(n)) < x)   (7)

where I(·) is a logic function which returns 1 if the input is true and 0 if the input is false, and arg(·) gives the phase of the complex input.
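The three cumulative distributions of Equations (5)-(7) amount to threshold counting; a minimal NumPy sketch (the function name and vectorized layout are our own choices) might look like:

```python
import numpy as np

def cdf_features(r, x):
    """Empirical cumulative distributions of Eqs. (5)-(7) at thresholds x."""
    r = np.asarray(r)
    x = np.asarray(x)[:, None]      # evaluate every threshold at once
    N = r.size
    rX, rY = r.real, r.imag
    # Eq. (5): I and Q components pooled into one distribution.
    c_xy = ((rX < x).sum(axis=1) + (rY < x).sum(axis=1)) / (2 * N)
    # Eq. (6): amplitude distribution.
    c_a = (np.abs(r) < x).sum(axis=1) / N
    # Eq. (7): phase distribution.
    c_p = (np.angle(r) < x).sum(axis=1) / N
    return c_xy, c_a, c_p
```

The input r is assumed to be the normalized signal r′(·); each returned array has one entry per threshold in x.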

2.2. Sampling Location Optimization

The sampling locations on signal distributions for feature extraction are a crucial part of the proposed AMC solution. The optimization of locations should follow these criteria.
Criterion 1: The sampling locations should provide clear discrimination between two modulations.
Criterion 2: The sampling locations should utilize a wider distribution range to provide more comprehensive information about the modulation distribution.
Criterion 3: The locations should be at a sufficient distance from each other to avoid collecting repetitive and redundant information.

To satisfy the above criteria, we propose to use the local maximums of the distance between the two modulations' cumulative distributions as sampling locations.

Fig. 2: Cumulative distributions of different signal segments from 4-QAM and 16-QAM at an SNR of 15 dB: (a) I-Q, (b) Amplitude, (c) Phase. The solid line gives the absolute difference between the two modulations.

We denote D^∗_{M1M2} to be the distance between the distributions from modulations M1 and M2. "∗" is used as a uniform representation of the different signal segments, including the I-Q segments, amplitude, and phase.

D^∗_{M1M2} = |C^∗_{M1} − C^∗_{M2}|   (8)

The optimized sampling locations should meet the condition that the distance at a location l^∗_{M1M2} is the biggest,

D^∗_{M1M2}(l^∗_{M1M2}) = max(D^∗_{M1M2}(x))   (9)

within the range of

l^∗_{M1M2} − R^∗ ≤ x ≤ l^∗_{M1M2} + R^∗   (10)

where R^∗ is a range parameter. The manually optimized values of R^∗ in our simulation can be found in Table 1.
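A hedged sketch of the sampling location search of Equations (8)-(10): a candidate point is kept only if the distance curve attains its maximum there within the ±R^∗ window. Tie handling between adjacent equal maxima is simplified, and the names are illustrative:

```python
import numpy as np

def sampling_locations(x, c1, c2, R):
    """Pick points where |C1 - C2| is a local maximum within +/- R (Eqs. 8-10)."""
    d = np.abs(np.asarray(c1) - np.asarray(c2))    # distance curve, Eq. (8)
    locs = []
    for i, xi in enumerate(x):
        window = d[(x >= xi - R) & (x <= xi + R)]  # neighbourhood of Eq. (10)
        if d[i] > 0 and d[i] >= window.max():      # local maximum, Eq. (9)
            locs.append(xi)
    return np.array(locs)
```

Here c1 and c2 are the two modulations' cumulative distributions evaluated on a common grid x, as in Fig. 2.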

2.3. Feature Extraction

With the optimized feature sampling locations, the actual extraction process is very simple. In the case where the underlying modulation of the training signals is known, the reference features can be collected directly using the established cumulative distributions C^∗_M.

f^∗_{M1M2}(i) = C^∗_M(l^∗_{M1M2}(i))   (11)

In the case where the signal being treated has an unknown modulation, the features can be extracted using a simple counting measure with the sampling locations as thresholds.

f^∗_{M1M2}(i) = (1/N) Σ_{n=1}^{N} I(r^∗(n) < l^∗_{M1M2}(i))   (12)

Equation (12) can be associated with Equations (5)-(7) to help understand its implementation on the different signal segments.
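The counting measure of Equation (12) is one fraction per threshold; a minimal sketch (illustrative names, with r_seg standing for the pooled I/Q values, amplitudes, or phases of the normalized signal):

```python
import numpy as np

def extract_features(r_seg, locs):
    """Eq. (12): fraction of samples falling below each optimized threshold."""
    r_seg = np.asarray(r_seg)
    return np.array([(r_seg < l).mean() for l in locs])
```

For example, with four samples [0, 1, 2, 3] and thresholds [1.5, 10.0], the features are the empirical CDF values at those thresholds.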

3. FEATURE COMBINATION

Feature combination is a good way to reduce feature dimension and to better utilize all the available features. The combination process requires extra computation, but the trained feature combination can vastly reduce the complexity of the classifier. In AMC, feature combination is a frequently used technique to enable fast processing at the classification stage with low computation cost.

In this research, considering the nature of the distribution based features, we employ linear feature combination for dimension reduction and enhancement. Binomial logistic regression is used to create a linear combination of features which provides separation between two classes. The implementation of binomial logistic regression is mostly standard. The logistic function is given as

p(f^{XY}_{M1M2}, f^A_{M1M2}, f^P_{M1M2}) = 1 / (1 + e^{−g(f^{XY}_{M1M2}, f^A_{M1M2}, f^P_{M1M2})})   (13)

where p(·) = 0 for modulation M1 and p(·) = 1 for modulation M2. The logit function is linked with the original features in the following format

g(f^{XY}_{M1M2}, f^A_{M1M2}, f^P_{M1M2}) = B(0) + Σ_{i=1}^{L^{XY}_{M1M2}} B(i) f^{XY}_{M1M2}(i) + Σ_{i=1}^{L^A_{M1M2}} B(L^{XY}_{M1M2} + i) f^A_{M1M2}(i) + Σ_{i=1}^{L^P_{M1M2}} B(L^{XY}_{M1M2} + L^A_{M1M2} + i) f^P_{M1M2}(i)   (14)

where L^{XY}_{M1M2}, L^A_{M1M2} and L^P_{M1M2} are the total numbers of original features collected from each signal segment.

The maximum likelihood estimates of the coefficients B(·) are found using the Newton-Raphson method after 50 iterations. The coefficients are updated using the following update equation,

B_{t+1}(·) = B_t(·) + J^{−1}(B_t(·)) u(B_t(·))   (15)

where J(B_t(·)) is the observed information matrix and u(B_t(·)) is the score function.
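A compact NumPy sketch of this Newton-Raphson fit, covering Equations (13)-(15); the small ridge term added for numerical safety, the fixed iteration count, and all names are our own assumptions:

```python
import numpy as np

def fit_logistic(X, y, iters=50):
    """Binomial logistic regression fitted by Newton-Raphson, Eq. (15).

    X: (n, d) matrix of original features; y: 0 for modulation M1, 1 for M2.
    Returns coefficients B with B[0] the intercept B(0).
    """
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend the B(0) column
    B = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ B))       # logistic function, Eq. (13)
        u = Xb.T @ (y - p)                      # score function
        W = p * (1 - p)
        J = (Xb * W[:, None]).T @ Xb            # observed information matrix
        # Small ridge term keeps J invertible near convergence (our addition).
        B = B + np.linalg.solve(J + 1e-8 * np.eye(len(B)), u)
    return B

def enhanced_feature(B, x):
    """Combined feature F_{M1M2} of Eq. (16) for one realization x."""
    return B[0] + x @ B[1:]
```

The same coefficient vector B then serves both the logit of Eq. (14) and the combined feature of Eq. (16).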

The resulting coefficients and the original features from the I-Q segment f^{XY}_{M1M2}(·), amplitude f^A_{M1M2}(·) and phase f^P_{M1M2}(·) are combined to create a new feature F_{M1M2} specified for the discrimination of modulations M1 and M2.

F_{M1M2} = B(0) + Σ_{i=1}^{L^{XY}_{M1M2}} B(i) f^{XY}_{M1M2}(i) + Σ_{i=1}^{L^A_{M1M2}} B(L^{XY}_{M1M2} + i) f^A_{M1M2}(i) + Σ_{i=1}^{L^P_{M1M2}} B(L^{XY}_{M1M2} + L^A_{M1M2} + i) f^P_{M1M2}(i)   (16)

In the case where there are more than two modulation candidates, the enhanced features need normalization to create a properly scaled multi-dimensional feature space for classification. The normalization is implemented by updating the trained coefficients B(·) using training signals. With a number of training signal realizations from each modulation, the enhanced feature F_{M1M2}(·) for each signal realization can be calculated using Equation (16). The coefficients are then updated using the following equations.

B′(0) = B(0) − F̄_{M1M2}(·)   (17)

where F̄_{M1M2}(·) is the mean of the training features,

B′(i) = B(i) / std(F_{M1M2}(·)),   i > 0   (18)

where std(F_{M1M2}(·)) gives the standard deviation of the training features.
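Equations (17) and (18), as stated, can be sketched directly (illustrative function name; B holds the trained coefficients with B[0] the intercept, and F_train the enhanced features of the training realizations):

```python
import numpy as np

def normalize_coeffs(B, F_train):
    """Rescale trained coefficients per Eqs. (17)-(18) so that enhanced
    features from different modulation pairs share a comparable scale."""
    Bp = np.array(B, dtype=float)
    Bp[0] = B[0] - np.mean(F_train)     # Eq. (17): centre via the intercept
    Bp[1:] = B[1:] / np.std(F_train)    # Eq. (18): scale the slopes
    return Bp
```

The rescaled coefficients are then used in Equation (16) in place of the raw ones when building the multi-dimensional feature space.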

Fig. 3: Training signal realizations in the enhanced feature space and their distribution projections on each separate feature dimension: (a) three-dimensional feature space (F_{464}, F_{416}, F_{1664}); (b) F_{416} for 4-QAM and 16-QAM discrimination; (c) F_{464} for 4-QAM and 64-QAM discrimination; (d) F_{1664} for 16-QAM and 64-QAM discrimination. The signals consist of 100 realizations from each modulation at an SNR of 15 dB.

Table 1: Signal configuration and simulation parameters

Parameters         | Training               | Testing
Modulations        | 4-QAM, 16-QAM, 64-QAM  | 4-QAM, 16-QAM, 64-QAM
# Realizations     | 100 × 3                | 10,000 × 3
Signal Length (N)  | 512                    | 512
SNR                | 0-20 dB                | 0-20 dB
Phase Offset       | 0-30°                  | 0-30°
R^{XY}             | 0.2                    | N/A
R^A                | 0.15                   | N/A
R^P                | π/10                   | N/A

4. CLASSIFICATION

Once the feature extraction and feature combination are completed, training data is used to establish a reference feature space for the expected testing stage. In this research, 100 realizations of training signals from each modulation candidate are used as reference samples for the K-nearest neighbour classifier. Given an unknown testing signal with extracted features F_{416}, F_{464} and F_{1664}, and a reference point in the feature space with F′_{416}, F′_{464} and F′_{1664}, the following equation is used for the distance calculation between the two,

d = √((F_{416} − F′_{416})^2 + (F_{464} − F′_{464})^2 + (F_{1664} − F′_{1664})^2)   (19)

Figure 3 gives an example of such a feature space. When an incoming signal is to be classified, the 17 nearest signal realizations are found. The signal modulation which has the most instances of appearance among the 17 nearest signal realizations is returned as the classification result.
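A minimal sketch of this K-nearest-neighbour vote using the Euclidean distance of Equation (19), with k = 17 as in the paper (function and variable names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(F_test, F_ref, labels, k=17):
    """Majority vote among the k nearest reference points, distance per Eq. (19)."""
    d = np.linalg.norm(np.asarray(F_ref) - np.asarray(F_test), axis=1)
    nearest = np.argsort(d)[:k]                      # indices of the k closest
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```

F_ref is the (100 × 3 realizations) × 3 matrix of enhanced training features (F_{416}, F_{464}, F_{1664}) and labels the corresponding modulation names.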

5. EXPERIMENT AND RESULTS

To test the performance of the proposed AMC solution, two sets of experiments were conducted in the MATLAB environment. In both experiments, 4-QAM, 16-QAM and 64-QAM signals are generated according to Equation (1). For each channel configuration, a total of 100 realizations, each consisting of 512 signal samples, are generated from each modulation for training purposes. During testing, the number of realizations is increased to 10,000 for each signal modulation. In the sampling location optimization, the 100 signal realizations from the same modulation are combined to create a long signal realization of 51,200 samples. The increased number of samples helps to provide a smoother representation of the signal distribution for analysis. The classification accuracy is calculated from the correct classifications over all signal realizations. The parameters used can be found in Table 1.

Fig. 4: Classification accuracy of different modulation types in the AWGN channel, averaged over 10,000 signal realizations each containing 512 samples.

5.1. AWGN Channel

In the AWGN channel, no phase or frequency offset is considered. SNRs from 0 dB to 20 dB are simulated. The signal length N is set to 512. Figure 4 shows that 4-QAM is easier to classify, and the proposed method is able to achieve 100% accuracy at SNRs above 4 dB. For 16-QAM and 64-QAM, the classification accuracy is similar throughout the SNR range. Perfect classification is achievable at SNRs above 11 dB. The classification results coincide with the feature space resulting from the feature enhancement process. Figure 3 also shows that the feature separations between 4-QAM and 16-QAM, as well as between 4-QAM and 64-QAM, are much clearer than the separation between 16-QAM and 64-QAM.

In Figure 5, a performance comparison with two existing methods is given. The ML classifier [1] gives the best performance at all SNR levels. This is no surprise, as the channel condition is ideal and all signal parameters are assumed to have been estimated. However, the proposed method provides very similar classification accuracy, showing only a slight disadvantage at low SNRs between 0 and 10 dB. Meanwhile, the cumulant based Genetic Programming classifier [9] suffers from the short signal length (N = 512) used and gives much lower accuracy even at high SNR levels.

5.2. Carrier Phase Offset

Another common channel condition is carrier phase offset. In this experiment, we simulated carrier phase offsets of 0° to 30°. Other channel conditions are the same as in the previous experiments, with an SNR of 10 dB and a signal length of 512. Figure 6 shows the resulting classification accuracy of the three classifiers with different degrees of carrier phase offset. The ML classifier achieves the best accuracy with no or little phase

Fig. 5: Averaged classification accuracy in the AWGN channel using different classifiers.

offset. Meanwhile, the Kolmogorov-Smirnov test (KS) classifier [4] is severely affected by the increasing amount of phase offset. Having similar classification accuracy with little carrier phase offset, the proposed method is able to maintain an equal level of performance throughout the tested phase offset range. Consequently, it is able to outperform the ML classifier with phase offsets over 15°, and the KS classifier under all conditions.

Fig. 6: Averaged classification accuracy with carrier phase offset using different classifiers.

6. CONCLUSION

Binary discriminative features are extracted from signal distributions. The extraction points are optimized by finding the local maxima of the distances between two modulations. Binomial logistic regression is used to combine the original features to create smaller feature sets with enhanced classification robustness. The new feature set creates a feature space defined by the modulation candidates, which enables low complexity classification using the K-nearest neighbour classifier. The simulation results suggest that the proposed AMC solution is able to achieve near optimum classification accuracy at SNRs above 10 dB in the AWGN channel. Its performance at lower SNRs is competitive with, and sometimes superior to, state-of-the-art methods. With carrier phase offset, the proposed classifier achieves far superior performance to existing methods, given training under the same channel condition.

7. REFERENCES

[1] W. Wei and J. M. Mendel, "Maximum-Likelihood Classification for Digital Amplitude-Phase Modulations," IEEE Transactions on Communications, vol. 48, no. 2, pp. 189–193, 2000.

[2] M. L. D. Wong and A. K. Nandi, "Semi-blind algorithms for automatic classification of digital modulation schemes," Digital Signal Processing, vol. 18, no. 2, pp. 209–227, 2008.

[3] J. L. Xu, W. Su, and M. Zhou, "Software-Defined Radio Equipped With Rapid Modulation Recognition," IEEE Transactions on Vehicular Technology, vol. 59, no. 4, pp. 1659–1667, 2010.

[4] F. Wang and X. Wang, "Fast and robust modulation classification via Kolmogorov-Smirnov test," IEEE Transactions on Communications, vol. 58, no. 8, pp. 2324–2332, 2010.

[5] P. Urriza, E. Rebeiz, P. Pawelczak, and D. Cabric, "Computationally Efficient Modulation Level Classification Based on Probability Distribution Distance Functions," IEEE Communications Letters, vol. 15, no. 5, pp. 476–478, 2011.

[6] A. K. Nandi and E. E. Azzouz, "Algorithms for automatic modulation recognition of communication signals," IEEE Transactions on Communications, vol. 46, no. 4, pp. 431–436, 1998.

[7] A. Swami and B. M. Sadler, "Hierarchical digital modulation classification using cumulants," IEEE Transactions on Communications, vol. 48, no. 3, pp. 416–429, 2000.

[8] Z. Zhu, M. W. Aslam, and A. K. Nandi, "Augmented Genetic Programming for automatic digital modulation classification," in IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2010, pp. 391–396.

[9] M. W. Aslam, Z. Zhu, and A. K. Nandi, "Automatic Modulation Classification Using Combination of Genetic Programming and KNN," IEEE Transactions on Wireless Communications, vol. 11, no. 8, pp. 2742–2750, 2012.