
Multiple Joint-Variable Domains Recognition of Human Motion

Branka Jokanovic, Moeness Amin, and Baris Erol
Center for Advanced Communications, Villanova University, Villanova, PA 19085, USA

Email: {branka.jokanovic, moeness.amin, berol}@villanova.edu

Abstract—Radar has been successfully employed for classifying human motions in defense, security, and civilian applications, and has emerged to potentially become a technology of choice in the healthcare industry, specifically in what pertains to assisted living. Due to the relationship between Doppler frequency and motion kinematics, the time-frequency domain has traditionally been used to analyze radar signals of human gross-motor activities. Towards improving motion classification, this paper incorporates three domains, namely, the time-frequency, time-range, and range-Doppler domains. Features from each domain are extracted using a deep neural network based on stacked auto-encoders. The final decision is made by combining the classification outcomes. Experimental results demonstrate that certain domains are more favorable than others in recognizing specific motion articulations, thus reinforcing the merits of multi-domain motion classification.

I. INTRODUCTION

The continuous improvement of living and working conditions since World War II has advanced quality of life across many countries. This is especially evident in developed regions, where the senior population is booming [1]. Greater life expectancy has generated new challenges in the health care industry. The number of people with chronic diseases is steadily growing, as is the need for hospitalization. These factors, together with the limited budgets of many patients and the limited number of care units in health care facilities, have spurred interest in the development of new approaches to remote health care services. The main task is to provide professional health monitoring with minimal intrusion on the patient's daily activities.

Human motion recognition (HMR) is one of the key issues in health care monitoring. Injuries and chronic diseases can significantly impact the mobility of patients and their performance of common daily activities such as climbing stairs or bending [2]. A real-time system for HMR would be beneficial, since it could help monitor progress in rehabilitation or quantitatively measure the effect of a prescribed therapy.

Various methods for HMR have been proposed [3]-[6]. They can generally be classified based on the type of sensors they use. Sensors can be wearable or non-wearable, i.e., remote. Wearable sensors are typically attached to the person's body and are simple to use. The main drawback of these methods is that they depend on the user's readiness and awareness to use them in a timely and proper manner.

Remote devices, by contrast, are user-independent. Much of the research on remote methods has been done in the field of computer vision. Camera-based systems, which are ubiquitous in many applications, can provide high-resolution images and capture some of the smallest movements. However, these systems are sensitive to illumination and obstructions. Radar is a remote technology that is robust to these issues. Additionally, the privacy concerns associated with most camera-based systems are nonexistent.

Previous research on HMR using radar can be categorized using different criteria, including:

• the type of model used (parametric or nonparametric) [7], [8],

• the feature extraction method (manual approach, PCA, deep learning) [9], [10],

• classifier type [11], [12].

The time-frequency domain is commonly used when observing human motions. This domain depicts the time-varying velocities of different human body parts. Features extracted in this domain often have physical interpretations, including energy, periodicity, and highest frequencies. Other domains have also been used for the analysis of human motions [13]-[15]. For example, the cadence velocity diagram can be used to extract features based on pseudo-Zernike moments [15]. These features offer translational and scale invariance, making the classification process more robust to differences that exist within the same class.

The aim of this paper is to improve the classification rate of a single domain by combining motion information from multiple domains using a single radar unit and considering different motion orientations with respect to the line of sight. In order to benefit from ensemble classification, the domains in use should provide mutually complementary information. This is the main motivation for choosing the following three domains when observing the input data: the time-frequency domain, the slow time-range domain (i.e., range map), and the integrated slow-time range-Doppler domain [16], [17]. These domains offer different levels of distinction for different human motions. For example, walking and falling are often easily identified in the time-frequency domain, while they can be misclassified based on the range map. The input signal to the classifier is observed in each domain separately, and the classification results are combined using a voting technique. It is shown that the proposed approach of combining all domains provides a significantly higher motion classification success rate compared to any single joint-variable domain.

The paper is organized as follows. Section II describes the domains used in this work. The proposed combination scheme is given in Section III, while the experimental results are shown in Section IV. The conclusion is given in Section V.

978-1-4673-8823-8/17/$31.00 ©2017 IEEE

II. DOMAINS USED TO REPRESENT RADAR SIGNALS

When using an FMCW radar, it is possible to represent the radar signals in different joint-variable domains that include slow time, range, and Doppler. Each domain provides information that may not be present or easily identifiable in the other domains. In this paper, we observe the data in three domains: the time-frequency domain, the range map, and the integrated slow-time range-Doppler domain.

The time-frequency domain has traditionally been used to represent radar returns from human subjects. Due to the relationship between velocity and Doppler frequency, this domain can be used to depict the velocity change of human body parts over time. Typically, the spectrogram is used as the time-frequency signal representation, depicting the distribution of the signal power over time and frequency. The spectrogram for a discrete signal s(n), n = 0, ..., N − 1, is defined as:

SPEC(n, k) = \left| \sum_{m=0}^{N-1} h(m)\, s(n-m)\, e^{-j 2\pi k m / N} \right|^2,   (1)

where h(m) is a window function.
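Eq. (1) can be sketched directly in code. The following is a minimal NumPy illustration (the paper's own processing is done in Matlab, and the function name is ours): for each time index n, the past N samples are windowed by h(m) and the squared magnitude of the N-point DFT is taken.

```python
import numpy as np

def spectrogram(s, h):
    """SPEC(n, k) of Eq. (1): window s(n - m), m = 0..N-1, with h(m)
    and take the squared magnitude of the N-point DFT, where N is the
    window length."""
    N = len(h)
    S = np.zeros((len(s), N))
    for n in range(len(s)):
        # s(n - m) for m = 0..N-1, treating samples before the start as zero
        seg = np.array([s[n - m] if n - m >= 0 else 0.0 for m in range(N)])
        S[n, :] = np.abs(np.fft.fft(h * seg)) ** 2
    return S
```

For a constant unit signal and a rectangular length-4 window, every fully populated row concentrates all power (4² = 16) in the DC bin, as expected for a zero-Doppler return.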

The time-frequency domain contains information about velocities and their changes over time, but it is insensitive to range and its time dependence. The range information is depicted in the range map. The range map can be obtained by applying a fast Fourier transform (FFT) to the input IQ data organized in a two-dimensional array.
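As a rough sketch of this step (NumPy, with an illustrative function name; the actual sample ordering depends on the radar's data format), the IQ stream is arranged with one row per sweep and the FFT is taken over fast time:

```python
import numpy as np

def range_map(iq, n_fast):
    """Arrange the IQ stream as a 2-D array with one row per FMCW sweep
    (slow time) and n_fast fast-time samples per row, then FFT over
    fast time so that each row becomes a range profile."""
    frames = iq.reshape(-1, n_fast)            # slow time x fast time
    return np.abs(np.fft.fft(frames, axis=1))  # slow time x range bins
```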

The integrated slow-time range-Doppler domain provides both velocity and range information [16]. A typical range-Doppler map can be obtained by applying an FFT over slow time. The integrated version observes range-Doppler maps over time and generates a compilation of these images. Fig. 1 illustrates how a representation in this domain is obtained.
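One possible reading of this integration step, sketched in NumPy (the block size and names are ours; details such as windowing and block overlap are omitted), is to form a range-Doppler map per block of sweeps and sum the magnitude maps:

```python
import numpy as np

def integrated_rd_map(range_profiles, n_slow):
    """Split the range profiles (rows = sweeps, columns = range bins)
    into consecutive blocks of n_slow sweeps, form a range-Doppler map
    per block via an FFT over slow time, and sum the magnitude maps."""
    n_blocks = range_profiles.shape[0] // n_slow
    acc = np.zeros((n_slow, range_profiles.shape[1]))
    for b in range(n_blocks):
        block = range_profiles[b * n_slow:(b + 1) * n_slow, :]
        rd = np.abs(np.fft.fftshift(np.fft.fft(block, axis=0), axes=0))
        acc += rd
    return acc  # Doppler bins x range bins, integrated over time
```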

Figs. 2-5 depict the four observed motions in the three aforementioned domains. It can be observed that motions which are easily and visually distinguished in one domain may not be correctly identified in another. This motivates the use of multiple domains, aiming at a reduction of false alarm and missed detection rates.

III. MULTIPLE DOMAIN BASED HMR

HMR using multiple domains is presented in this section. Fig. 6 illustrates the proposed approach. At the input, we have images of the motions in each domain. The images are preprocessed, i.e., denoising is performed. Besides denoising, the DC component is removed from the spectrograms. The next step is feature extraction. There are numerous ways to extract features from radar images. Manual extraction of features has been commonly used for the underlying problem. However, because hand-picked features may not tell the "whole story," and hand-picking is also a tedious task involving the tuning of parameters and thresholds, more automated methods have recently been considered [9]. Deep learning techniques have emerged as a powerful approach for feature extraction [10], [18].

Fig. 1. Illustration of generating the integrated slow-time range-Doppler map.

Fig. 7. Example of an autoencoder with N input units and K hidden units.

In this paper, feature extraction is performed using an unsupervised learning approach. Namely, two stacked sparse auto-encoders are used to obtain the most prominent features of the input data [19]. A sparse auto-encoder is a neural network that attempts to learn a sparse representation of the input.

Fig. 7 shows the structure of a sparse auto-encoder with N input and K hidden units. Connections between units are determined by the weight matrix W and the bias vectors b1 and b2, which are typically represented by units with the "+1" label. These parameters are obtained by minimizing a cost function, defined as follows:



Fig. 2. Falling motion representation in three domains: (a) Time-frequency domain, (b) Range map, (c) Integrated slow-time range-Doppler map.


Fig. 3. Sitting motion representation in three domains: (a) Time-frequency domain, (b) Range map, (c) Integrated slow-time range-Doppler map.


Fig. 4. Bending motion representation in three domains: (a) Time-frequency domain, (b) Range map, (c) Integrated slow-time range-Doppler map.

J(W, b) = E(W, b) + \beta\, D_{KL}(\rho, \hat{\rho}).   (2)

The first term represents the error between the input data x and the auto-encoder output \hat{x}. The input data corresponds to vectorized images of any of the three joint-variable representations considered. For a single training example, this error is formulated as:

E(W, b) = \| x - \hat{x} \|_2^2 = \left\| x - \sigma\!\left( W^{T} \sigma(W x + b_1) + b_2 \right) \right\|_2^2,   (3)

where σ(·) denotes the sigmoid function. The second term in the cost function is responsible for obtaining the sparse representation. Sparsity is promoted using the Kullback-Leibler divergence D_{KL}(\rho, \hat{\rho}), which depends on ρ, a sparsity parameter, and \hat{\rho}_j, the average activation of hidden neuron j. D_{KL}(\rho, \hat{\rho}_j) is defined as:

D_{KL}(\rho, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j}.   (4)

The parameter β determines the importance of the sparsity term. A regularization term can be added in order to prevent the weights from assuming high values. Two sparse auto-encoders are used, since it is more prudent to learn the sparse representation gradually, i.e., using several layers.
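The per-example cost of Eqs. (2)-(4) can be written out as follows. This is a NumPy sketch with tied weights, as in Eq. (3); the values of ρ and β are illustrative, not taken from the paper, and \hat{\rho} is estimated here from a single example rather than averaged over the training set.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_ae_cost(W, b1, b2, x, rho=0.05, beta=3.0):
    """J(W, b) of Eq. (2): the squared reconstruction error of Eq. (3)
    plus beta times the KL sparsity penalty of Eq. (4), summed over the
    hidden units."""
    a = sigmoid(W @ x + b1)               # hidden activations (rho-hat here)
    x_hat = sigmoid(W.T @ a + b2)         # tied-weight reconstruction
    err = np.sum((x - x_hat) ** 2)        # Eq. (3)
    kl = np.sum(rho * np.log(rho / a)
                + (1 - rho) * np.log((1 - rho) / (1 - a)))  # Eq. (4)
    return err + beta * kl                # Eq. (2)
```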

The classification is done using a softmax regression classifier [19]. The output of this classifier is an L-dimensional vector, where L denotes the number of motion classes under consideration. The elements of this vector represent the estimated probabilities that the input data belongs to the given class labels.

After obtaining the results from each classifier, voting is performed. Namely, the final decision is based on the majority of votes. Each classifier is given the same weight, i.e., no classifier is given more or less confidence than the others.
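The equal-weight vote amounts to nothing more than the following (a minimal sketch; the paper does not specify how ties among the per-domain classifiers are broken, so this version simply keeps the first label counted):

```python
from collections import Counter

def majority_vote(decisions):
    """Return the label predicted by the most per-domain classifiers;
    all classifiers carry the same weight."""
    return Counter(decisions).most_common(1)[0][0]
```

For example, `majority_vote(["fall", "fall", "walk"])` returns `"fall"`.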

IV. EXPERIMENTAL RESULTS

Fig. 5. Walking motion representation in three domains: (a) Time-frequency domain, (b) Range map, (c) Integrated slow-time range-Doppler map.

Fig. 6. The scheme for recognizing a motion.

The FMCW radar experiments were conducted in the Radar Imaging Lab at the Center for Advanced Communications, Villanova University. The UWB system used in the experiments, named SDRKIT 2500B, is developed by Ancortek, Inc. The operating parameters of the radar system are:

• transmitting frequency 24 GHz,

• pulse repetition frequency (PRF) 1000 Hz,

• bandwidth 2 GHz, which provides a 0.075 m range resolution.

Three male subjects, all aged 26 years, participated in the experiments. Their physical characteristics are given in Table I. The dataset contains four human motions: walking, falling, sitting, and bending. The subjects were asked to perform each motion starting from a standing position 3.5 m away from the radar. There were 8 trials of each motion per subject, resulting in 96 signals. In order to investigate the robustness of the classifier with respect to the direction angle, 5 trials of each motion were performed at 0 degrees, while the remaining 3 trials were performed at 22.5°, 30°, and 45°, respectively. In this paper, we are only interested in detecting different motion classes, i.e., inter-class variability. Intra-class variability, which can be attributed to the subjects' physical characteristics and direction angles, is not considered.

Each motion was observed over a time span of 4 s, and 64x64 images are populated. It is noted that in continuous monitoring of daily activities, the beginning and end times of a class of specific motions, or of any motion, can be discerned by monitoring the energy levels in the corresponding bands or over the entire signal bandwidth, respectively [20], [21].

TABLE I. PHYSICAL CHARACTERISTICS OF SUBJECTS.

              Subject A   Subject B   Subject C
Height        5'9"        5'9"        5'10"
Weight (lbs)  171         207         194

The above images are used as inputs to the follow-on classification stages. The Matlab Neural Network Toolbox is used for the implementation of the stacked auto-encoders and the softmax regression classifier. The number of neurons in the hidden layer of the first auto-encoder is set to 300, meaning that the network tries to compress 4096 coefficients into 300. These 300 outputs are further compressed by using only 150 neurons in the next hidden layer. The final stage is a softmax regression classifier, which determines the probability that the input data belongs to one of the four possible classes. Half of the dataset is used for training, while the rest is used for testing. Due to the small dataset size, ten different combinations of training and testing sets are chosen and the results are averaged.
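The forward pass of this 4096 → 300 → 150 → 4 pipeline can be sketched as follows (NumPy; only the layer sizes come from the paper, while the weights below are random stand-ins for the trained encoder and softmax parameters, and biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random stand-ins for the trained parameters (illustrative only).
W1 = rng.normal(scale=0.01, size=(300, 4096))   # first encoder
W2 = rng.normal(scale=0.01, size=(150, 300))    # second encoder
W3 = rng.normal(scale=0.01, size=(4, 150))      # softmax layer

def classify(image):
    x = image.reshape(4096)        # vectorized 64x64 domain image
    h1 = sigmoid(W1 @ x)           # 300 learned features
    h2 = sigmoid(W2 @ h1)          # 150 learned features
    z = W3 @ h2                    # softmax scores
    p = np.exp(z - z.max())
    return p / p.sum()             # probabilities over the 4 classes

probs = classify(rng.random((64, 64)))
```

One such pipeline is trained per domain, and the three per-domain decisions are then combined by the vote described in Section III.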

The confusion matrices for each domain are shown in Tables II-IV. The confusion matrix representing the final decision based on the combination of classifiers using the voting technique is shown in Table V. We can see that the proposed approach of combining all domains provides a significantly higher success rate compared to any single domain.


TABLE II. CONFUSION MATRIX FOR THE TIME-FREQUENCY DOMAIN.

Predicted/Actual Class   Fall    Sit     Bend    Walk
Fall                     91.3%   6%      0.7%    2%
Sit                      6%      79.3%   4%      4.7%
Bend                     2.7%    9.3%    95.3%   -
Walk                     -       5.4%    -       93.3%

TABLE III. CONFUSION MATRIX FOR THE RANGE MAP.

Predicted/Actual Class   Fall    Sit     Bend    Walk
Fall                     82%     2.6%    1.3%    4.6%
Sit                      7.3%    90%     11.3%   -
Bend                     1.3%    7.4%    87.4%   -
Walk                     9.4%    -       -       95.4%

V. CONCLUSION

This paper demonstrated that combining the results obtained from multiple joint-variable domains in the HMR process offers an improvement over the use of a single domain. Four human motions were observed, namely, walking, falling, sitting, and bending. Each domain contributes additional information to the classification process, thus rendering the motions more recognizable. We showed that while falling and sitting can be misinterpreted in the time-frequency domain, they can be easily distinguished in the range map. We used a simple voting technique to render the final decision.

Combining joint-variable data representation domains for motion classification was performed using a single radar unit. Extensions to two-radar or multiple-radar networks, where the combination or fusion of information also includes the space variable, can be readily accomplished [22]-[24].

REFERENCES

[1] Aging Statistics, 2016. [Online]. Available: http://www.aoa.acl.gov/aging statistics/index.aspx

[2] A.-K. Seifert, M. G. Amin, and A. M. Zoubir, “New analysis of radar micro-Doppler gait signatures for rehabilitation and assisted living,” in IEEE ICASSP, 2017.

[3] A. I. Cuesta-Vargas, A. Galan-Mercant, and J. M. Williams, “The use of inertial sensors system for human motion analysis,” Physical Therapy Reviews, vol. 15, no. 6, pp. 462–473, 2010.

[4] O. D. Lara and M. A. Labrador, “A survey on human activity recognition using wearable sensors,” IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1192–1209, 2013.

[5] C.-C. Yang and Y.-L. Hsu, “A review of accelerometry-based wearable motion detectors for physical activity monitoring,” Sensors, vol. 10, no. 8, pp. 7772–7788, 2010.

[6] M. G. Amin, Ed., Radar for Indoor Monitoring. Boca Raton, FL: CRC Press, 2017.

[7] P. van Dorp and F. Groen, “Human walking estimation with radar,” IET Radar, Sonar and Navigation, vol. 150, no. 5, pp. 356–365, 2003.

[8] Y. He, P. Molchanov, T. Sakamoto, P. Aubry, F. Le Chevalier, and A. Yarovoy, “Range-Doppler surface: a tool to analyse human target in ultra-wideband radar,” IET Radar, Sonar and Navigation, vol. 9, no. 9, pp. 1240–1250, 2015.

[9] B. G. Mobasseri and M. G. Amin, “A time-frequency classifier for human gait recognition,” in Proc. SPIE, 2009, p. 730628.

TABLE IV. CONFUSION MATRIX FOR THE INTEGRATED SLOW-TIME RANGE-DOPPLER MAP.

Predicted/Actual Class   Fall    Sit     Bend    Walk
Fall                     92.6%   -       -       4%
Sit                      -       90.6%   9.3%    0.7%
Bend                     -       9.4%    89.3%   5.3%
Walk                     7.4%    -       1.4%    90%

TABLE V. CONFUSION MATRIX FOR THE APPROACH BASED ON THE COMBINATION OF CLASSIFIERS.

Predicted/Actual Class   Fall    Sit     Bend    Walk
Fall                     89.7%   -       -       4%
Sit                      -       100%    -       -
Bend                     -       -       100%    -
Walk                     10.3%   -       -       96%

[10] B. Jokanovic, M. Amin, and F. Ahmad, “Radar fall motion detection using deep learning,” in 2016 IEEE Radar Conference (RadarConf), May 2016, pp. 1–6.

[11] Y. Kim and H. Ling, “Human activity classification based on micro-Doppler signatures using a support vector machine,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 5, pp. 1328–1337, 2009.

[12] ——, “Human activity classification based on micro-Doppler signatures using an artificial neural network,” in IEEE APS, 2008, pp. 1–4.

[13] P. Molchanov, J. Astola, K. Egiazarian, and A. Totsky, “Ground moving target classification by using DCT coefficients extracted from micro-Doppler radar signatures and artificial neuron network,” in MRRS Symposium. IEEE, 2011, pp. 173–176.

[14] A. Balleri, K. Chetty, and K. Woodbridge, “Classification of personnel targets by acoustic micro-Doppler signatures,” IET Radar, Sonar and Navigation, vol. 5, no. 9, pp. 943–951, 2011.

[15] C. Clemente, L. Pallotta, A. De Maio, J. J. Soraghan, and A. Farina, “A novel algorithm for radar classification based on Doppler characteristics exploiting orthogonal pseudo-Zernike polynomials,” IEEE Trans. Aerosp. Electron. Syst., vol. 51, no. 1, pp. 417–430, 2015.

[16] D. Tahmoush and J. Silvious, “Time-integrated range-Doppler maps for visualizing and classifying radar data,” in 2011 IEEE RadarCon (RADAR), 2011, pp. 372–374.

[17] B. Erol and M. G. Amin, “Fall motion detection using combined range and Doppler features,” in EUSIPCO 2016, pp. 2075–2080.

[18] Y. Kim and T. Moon, “Human detection and activity classification based on micro-Doppler signatures using deep convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 1, pp. 8–12, 2016.

[19] A. Ng, J. Ngiam, C. Y. Foo, Y. Mai, and C. Suen, Unsupervised Feature Learning and Deep Learning Tutorial, 2013. [Online]. Available: http://deeplearning.stanford.edu/wiki/index.php/UFLDL Tutorial

[20] M. G. Amin, Y. D. Zhang, F. Ahmad, and K. D. Ho, “Radar signal processing for elderly fall detection: The future for in-home monitoring,” IEEE Signal Process. Mag., vol. 33, no. 2, pp. 71–80, 2016.

[21] B. Y. Su, K. Ho, M. J. Rantz, and M. Skubic, “Doppler radar fall activity detection using the wavelet transform,” IEEE Trans. Biomed. Eng., vol. 62, no. 3, pp. 865–875, 2015.

[22] B. Erol, M. Amin, and B. Boashash, “Range-Doppler radar sensor fusion for fall detection,” in IEEE RadarCon, 2017, pp. 1–6.

[23] F. Fioranelli, M. Ritchie, and H. Griffiths, “Aspect angle dependence and multistatic data fusion for micro-Doppler classification of armed/unarmed personnel,” IET Radar, Sonar and Navigation, vol. 9, no. 9, pp. 1231–1239, 2015.

[24] S. Tomii and T. Ohtsuki, “Learning based falling detection using multiple Doppler sensors,” Advances in Internet of Things, vol. 3, 2013.
