

ORIGINAL ARTICLE

A comparative analysis of principal component and independent component techniques for electrocardiograms

M. P. S. Chawla

Received: 7 February 2007 / Accepted: 1 July 2008

© Springer-Verlag London Limited 2008

Abstract Principal component analysis (PCA) is used for ECG data compression, denoising and decorrelation of noisy and useful ECG components or signals. In this study, a comparative analysis of independent component analysis (ICA) and PCA for correction of ECG signals is carried out by removing noise and artifacts from various raw ECG data sets. PCA and ICA scatter plots of various chest and augmented ECG leads and their combinations are plotted to examine the varying orientations of the heart signal. To illustrate qualitatively how ICA recovers the shape of the ECG signals with high fidelity, the corrected source signals and the extracted independent components are plotted. The analysis also investigates whether the difference between the two kurtosis coefficients is positive on each of the respective channels, and whether the resulting signal is super-Gaussian or sub-Gaussian. The efficacy of the combined PCA–ICA algorithm is verified on six channels (V1, V3, V6, AF, AR and AL) of 12-channel ECG data. ICA is used to identify and remove noise and artifacts from the ECG signals, which are then further corrected using statistical measures after ICA processing. PCA scatter plots of various ECG leads give different orientations of the same heart information when different combinations of leads are considered by quadrant analysis. PCA results are also obtained for different combinations of ECG leads to find the correlations between them, and they demonstrate a significant improvement in signal quality, i.e., the signal-to-noise ratio is improved. The noise sensitivity, specificity and accuracy of the PCA method are evaluated by examining the effect of noise, base-line wander and their combinations on the characteristics of the ECG for classification of true and false peaks.

Keywords Electrocardiogram · Correlated · Gaussianity · Scatter plots · Principal component analysis · Feature extraction · Variance estimator · Independent component analysis

1 Introduction

As noise levels and the differences between conditions increase, it becomes possible to assess how well principal component analysis (PCA) [1–4] extracts the appropriate ECG components and statistically separates the conditions under differing noise conditions. Emphasis is therefore required on the choice of appropriate standard statistical models and methods of statistical inference for ECG. Re-sampling methods that use many randomly computer-generated ECG samples can be applied to estimate the characteristics of a distribution and for statistical inference. Statistical analysis has too often meant the manipulation of redundant ECG data by means of judicious methods to solve a problem that has not yet been defined [1, 3, 5]. The foundation of all statistical methodology is probability theory, which progresses from elementary to the most advanced mathematics.

Independent component analysis (ICA) is a signal processing technique that originates from the field of blind source separation and has been widely used in fields such as biomedical signal processing, speech processing and communication [2, 5, 6]. One of the most thoroughly studied and best understood physiological signals is the measured electrical potential related to the beating of the

M. P. S. Chawla (✉)

Department of Electrical Engineering,

Indian Institute of Technology, Roorkee 247667, India

e-mail: [email protected]; [email protected]

123

Neural Comput & Applic

DOI 10.1007/s00521-008-0195-1


heart [1, 2, 4, 5, 7–9]. The ECG relates the ionic current observed on the skin to events that occur in the heart. It is typically measured by placing electrodes on the chest as well as on the arms and/or legs [10–12]. To employ the ECG signal for medical diagnosis, statistical methods such as PCA and ICA can be used to correct the signal by removing some or all sources of noise. To overcome the limitations of conventional ECG filtering methods, investigation is oriented toward reconstruction of the ECG signal based on higher-order statistics [1, 2, 5–7]. In the analysis and feature extraction of electrocardiograms, PCA and ICA techniques have been applied successfully by various researchers to separate artifacts and other disturbances and so enhance the morphological ECG features that carry diagnostic information [3, 11, 13–16].

Compared with PCA, ICA achieves not only decorrelation but also accounts for higher-order statistical independence [1–3, 7, 17]. Statistics is often misunderstood or improperly used because its probabilistic foundation is neglected: when the assumptions of the underlying probabilistic (mathematical) model are grossly violated, the derived inferential methods lead to misleading and irrational conclusions [1, 3, 5, 7, 18]. The techniques generally used in such situations include derivative-based techniques, classical digital filtering, adaptive filtering, wavelets, neural networks, mathematical morphology, genetic algorithms, the Hilbert transform, syntactic methods and zero-crossing-based identification techniques. The basic method of denoising an ECG signal is filtering, but filtering physiological signals is not trivial and is highly subjective, because the information is spread over different frequency bands and different measurement channels [1–3, 5, 6, 15, 19].

In the first stage of the analysis in this paper, an ECG data compression procedure and the detection of ECG segments using PCA are described. PCA is used to exploit the fact that identifying the dependencies of the higher-index coefficients on the first principal component may set a strategy for better ECG interpretation. The independent components (ICs) are extracted using ICA, and various combinations of the leads V1, V3, V6, AF, AR and AL are checked to find the correlations between them for the best feature extraction and classification. The variance is calculated for ten segments of each IC, and the variance of these variances is then computed to reveal the hidden dynamics in the ECG signals, as first discussed in [6] and then reimplemented in [1, 5, 7, 8] as a check of the results presented in [6]. The simulation results are obtained for CSE database ECG signals using Matlab. Depending on the combination of leads chosen, the positions of noise and useful ECG data in the various quadrants of the ICA scatter plots are observed; these could be a good basis for feature selection compared with the PCA scatter plots obtained for the same database [1, 3].
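The segment-variance measure described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python with NumPy rather than the Matlab code of the paper, the function name `variance_of_segment_variances` is hypothetical, and the two test signals are synthetic stand-ins for independent components, not CSE data.

```python
import numpy as np

def variance_of_segment_variances(component, n_segments=10):
    """Split one component into equal segments and summarize how its variance changes."""
    component = np.asarray(component, dtype=float)
    usable = len(component) - len(component) % n_segments  # drop leftover samples
    segments = component[:usable].reshape(n_segments, -1)
    segment_variances = segments.var(axis=1)               # one variance per segment
    return segment_variances, segment_variances.var()      # and the variance of those

rng = np.random.default_rng(0)
steady = rng.normal(0.0, 1.0, 5000)                        # same spread in every segment
bursty = steady * np.repeat([0.2, 3.0] * 5, 500)           # spread alternates by segment

_, steady_score = variance_of_segment_variances(steady)
_, bursty_score = variance_of_segment_variances(bursty)
print(f"steady: {steady_score:.4f}, bursty: {bursty_score:.4f}")
```

A component whose spread changes strongly from segment to segment, as a burst-like artifact would, gets a much larger score than a steady one, which is the kind of contrast the measure is meant to expose.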

2 PCA and ICA in ECG processing

One of the major concerns with high-dimensional ECG datasets is that, in some cases, not all the measured segments or intervals are important for classification and interpretation [1, 3–5, 7, 17]. Classical statistical methods break down partly because of the increase in the number of observations. Wavelets are also now popular in ECG processing: the wavelet approach is an unsupervised method, which allows it to be used in off-line automatic analysis of electrocardiograms [3, 11, 13–16, 39–42]. Moreover, the results are as accurate as those obtained with other methods, but with much less effort. The disadvantage of this method is the need to calculate several approximations to find the best one, although this is fast enough on any computer [1, 5, 7, 8, 10]. PCA is one of the most established techniques in multivariate statistical analysis and has been applied to ECG compression. ICA is a statistical technique that performs blind source separation on linear mixtures of statistically independent ECG sources [20, 21]. PCA and ICA are used because most ECG data contain a large amount of noise and artifacts, as well as redundant information that is unnecessary for diagnostic applications. Figure 1 gives a simple illustration of the PCA and ICA basis vectors as well as of the ECG lead configurations.

3 A comparative survey of PCA and existing methods

Principal component analysis is unsupervised, i.e., blind, and gives linear transformations of the data that maximize the variance along the new variables. A PCA classifier uses features to discriminate between different signal classes and may calculate the probability that a given signal belongs to a given class. The classifier contains prior knowledge about the classes and features, and is often trained using a training set, a particular set of signals whose true class memberships are known.

Compressing ECG records using PCA involves a number of stages. As in the neural-network techniques, a set of difference vectors is produced. Using PCA compression, a recognizable reconstruction of an ECG signal may be achieved by summing the contributions of just the first few basis vectors [2, 22–24]. The eigenvectors and associated eigenvalues are derived from the particular difference set through a linear-algebra decomposition


process [1, 2]. They are inherently optimum in a least-mean-squares sense for representing the ECG record from which they are derived. The sum of all such contributions produces the complete reconstructed ECG signal. Fourier or wavelet transform approaches may also be viewed as transformation techniques, with the basis vectors in these cases being sinusoids and wavelets, respectively. These bases are also orthonormal but are suboptimal because, unlike PCA, they are general purpose and not explicitly matched to the ECG data set that they are used to compress [3, 4].

3.1 PCA whitening

Principal component analysis is extensively used in feature extraction to reduce the dimensionality of the original data by a linear transformation. PCA extracts dominant features (principal components) from a set of multivariate data. The dominant features retain most of the information, both in the sense of maximum variance of the features and in the sense of minimum reconstruction error. PCA is widely used in face recognition, vehicle sound signature recognition, speech recognition, speaker recognition, medical applications, signal noise reduction and active noise control. The whitening step removes the correlation between the observed ECG data. A common method to achieve whitening is the eigenvalue decomposition of the covariance matrix of the mixed ECG signal [1–3, 25].

Before applying ICA, it is better to preprocess the ECG signals with a PCA-whitening method. After this preprocessing, which is a linear transformation of the measured ECG signals, the signals have zero mean and unit variance. This reduces the correlation between several ECG signals or segments as well as the dimension of the ECG data set [9, 11, 17, 19]. Normalization of the ECG data to zero mean and unit variance can considerably improve the visualization of the various segments of an ECG waveform. Visualization plots or scree plots [2, 11] obtained using PCA can show that there is good segment separation of the ECG waveforms.
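As a rough sketch of the whitening step described above, the following Python/NumPy fragment whitens a set of mixed channels through the eigenvalue decomposition of their covariance matrix. The function name `pca_whiten`, the mixing matrix and the Laplace-distributed sources are assumptions made for illustration; they are not the leads or the Matlab code used in the paper.

```python
import numpy as np

def pca_whiten(x):
    """Whiten the rows of x (channels x samples) using the covariance eigen-decomposition."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=1, keepdims=True)           # zero mean per channel
    cov = centered @ centered.T / centered.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)                  # cov = E diag(d) E^T
    whitening = np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T # D^(-1/2) E^T
    return whitening @ centered, whitening

rng = np.random.default_rng(1)
sources = rng.laplace(size=(3, 5000))                       # hypothetical source signals
mixing = np.array([[1.0, 0.6, 0.2],
                   [0.4, 1.0, 0.5],
                   [0.3, 0.2, 1.0]])
leads = mixing @ sources + 0.5                              # correlated, offset channels

white, _ = pca_whiten(leads)
cov_white = white @ white.T / white.shape[1]
print(np.round(cov_white, 3))                               # close to the identity matrix
```

After whitening, the channels are uncorrelated with unit variance, so the remaining work for ICA is to find a rotation.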

3.2 ECG features

Feature extraction refers to a process whereby the input ECG data space is transformed into a feature space which, although it has the same dimensionality as the original ECG data space, can represent the ECG data set accurately with a reduced number of features [1, 8, 10, 12]. The most fundamental types of patient data that are available include ECG parameters that may be measured directly. Computerized ECG data acquisition systems are capable of displaying and storing the measured ECG variables as well as of real-time analysis. ECG signals that are clean and have a high signal-to-noise ratio (SNR) are relatively easy to interpret, both by a computer and by a human healthcare provider. Often, however, ECG signals are corrupted by large amounts of noise and artifacts that make ECG interpretation difficult [2, 3, 5–7]. Noise contamination in ECG signals typically takes the form of motion artifacts caused when the patient moves. Most ECG signals are non-stationary, meaning that they do not follow a statistical distribution around a constant mean value over time. It is therefore necessary to obtain derived features of ECG signals for use with statistical monitoring techniques [1, 7, 26, 27]. In PCA decomposition, the lower

Fig. 1 a Basis for PCA vectors, b ICA basis vectors, c components of an ECG waveform and d representation of ECG leads


index basis vectors represent larger energies in the ECG signals. For example, the first principal component looks somewhat like part of the central QRS complex in a normal ECG. Likewise, other lower-index PC eigenvectors resemble elements of different ECG groups, such as the T wave and P wave [2, 28]. Higher-index eigenvectors tend to look like noise, base-line wander (BLW), etc. Indeed, the noisy sections of the ECG record tend to have higher values of these principal component coefficients [1–3, 7, 29]. It follows that an ECG signal reconstructed without these higher-index coefficients will tend to look like the original ECG but without the noise [2, 3, 17].

3.3 PCA-based ECG data compression

Recently several new approaches, such as the neural network with PCA and multi-resolution wavelet decomposition, have been introduced [23, 24, 30]. However, existing compression strategies have not always been easily accepted by cardiologists. Among the transformation methods, the Karhunen-Loeve transform (KLT) shows the highest compression ratio for multi-lead ECG analysis [3, 17, 22]. It is widely accepted that effective ECG data compression is in acute demand in clinical data processing. Compression of ECG records is very useful for storing large amounts of daily clinical data in a hospital and for supporting clinical services in remote parts of the country by transmitting data over a public line [2, 11, 23, 24, 31]. The conventional methods of data compression are divided into two categories: direct data compression and transformation methods [2, 3, 9, 11]. Direct data compression, mainly polygonal or polynomial approximation and delta coding, is superior to the transformation methods in terms of processing speed and compression ratio. However, a direct method depends on the function employed. Accordingly, reconstruction distortion appears in specific parts where the function does not represent the extent of decomposition. A transformation method, on the other hand, recovers the original ECG signal with a certain degree of error throughout.

Linear PCA can be implemented with powerful, robust techniques such as the singular value decomposition (SVD), which guarantee numerical accuracy and stability. In contrast, the robustness of nonlinear PCA techniques is questionable because of the local minima involved, which generally do not allow the optimal solution to be found. Therefore, when the variables of interest are mostly linearly correlated, linear PCA becomes a highly effective solution [2, 3, 11, 22]. A complete description of the following factors [2, 11, 17] is necessary for efficient data compression without losing morphology:

1. ECG data that is compressed or stored.
2. What can be retrieved, i.e., entire ECGs or selected cycles only after data compression.
3. How retrieved ECG data may reasonably be used.
4. The fidelity with which ECG data can be reproduced should be specified.
5. The ECG compression ratio should be stated, with reference to the original sampling details, that is, sampling rate, bit resolution, and number of bits per sample.

Using PCA compression, a recognizable reconstruction of a given ECG signal may be achieved by summing the contributions of just the first few basis vectors, as these contain most of the energy [2, 3, 9]. The eigenvectors themselves form part of the overhead but need to be stored only once for the whole ECG data set, which may have thousands of samples. The quality of the compression and reconstruction depends on how many of the PCA coefficients are used. Good reconstruction may be achieved using five coefficients, as depicted in the results. Because of the way the eigenvectors are identified in PCA compression, the resultant basis vectors are orthonormal, representing uncorrelated linear variables [3, 11, 25].
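The storage argument above can be made concrete with a small sketch in Python with NumPy, using the SVD route to PCA. Everything specific here is an assumption for illustration: the beats are synthetic Gaussian bumps standing in for P, QRS and T waves, the helper names are hypothetical, and the percent root-mean-square difference (PRD) is used as one common fidelity measure that the text itself does not name.

```python
import numpy as np

def synthetic_beats(n_beats=200, n_samples=250, seed=2):
    """Hypothetical ECG-like beats: three Gaussian bumps with varying amplitudes plus noise."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_samples)
    centers = np.array([0.2, 0.4, 0.7])[:, None]
    widths = np.array([0.03, 0.012, 0.06])[:, None]
    shapes = np.exp(-0.5 * ((t - centers) / widths) ** 2)   # 3 x n_samples
    amplitudes = np.array([0.15, 1.0, 0.3]) * (1 + 0.2 * rng.normal(size=(n_beats, 3)))
    return amplitudes @ shapes + 0.02 * rng.normal(size=(n_beats, n_samples))

def pca_compress(beats, k):
    """Keep k coefficients per beat; the basis and mean beat are stored once."""
    mean_beat = beats.mean(axis=0)
    _, _, vt = np.linalg.svd(beats - mean_beat, full_matrices=False)
    basis = vt[:k]                                          # k orthonormal basis vectors
    coeffs = (beats - mean_beat) @ basis.T                  # k numbers per beat
    return mean_beat, basis, coeffs

beats = synthetic_beats()
mean_beat, basis, coeffs = pca_compress(beats, k=5)
restored = coeffs @ basis + mean_beat

prd = 100 * np.linalg.norm(beats - restored) / np.linalg.norm(beats)
stored = coeffs.size + basis.size + mean_beat.size          # coefficients plus overhead
print(f"PRD = {prd:.2f}%, stored/original = {stored / beats.size:.3f}")
```

The basis vectors and the mean beat are counted as overhead once for the whole set, so the stored fraction shrinks as more beats share the same basis.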

In this paper, PCA is used to exploit the fact that identifying the dependencies of the higher-index coefficients on the first principal component may set a strategy for the ECG analysis [2, 3]. Using the PCA variance estimator [2, 11], the coefficients of the first five principal components have been calculated for each of the ECG signals AF, AR and AL. This could be further facilitated by non-linear PCA compression, which requires only a single coefficient to be stored for each ECG signal or its samples; the other coefficients are inferred using polynomial techniques. Restoring the ECG waveform from a limited number of orthogonal basis vectors corresponds to the orthogonal projection of f onto the subspace H defined by the k eigenvectors [4, 9, 11, 17]. Let A be a pure rotation matrix; it is thus orthonormal, with $A^{T}A = I$, the identity matrix. Any vector x can be recovered from y by using the transformation

$$x = A^{T} y + m_x \qquad (1)$$

Here x can be approximated by using only the first k eigenvectors (components), five in this case, and a matrix $A_k$ constructed from these k eigenvectors; this makes it possible to store the ECG data more efficiently. PCA is also used to align the ECG data with its eigen axes, which removes rotational effects. The eigenvalues can be used for size normalization of an ECG data set. Translational effects are also removed, since the ECG data set is centered around its mean. The PCA-based shape-space model for ECG interpretation is


$$Q = w x + Q_0$$

or

$$x = w^{+}(Q - Q_0) \qquad (2a)$$

where x is the shape-space vector over time and w is the matrix for an ECG data set, obtained after recording from a specimen or from the standard CSE database or any other ECG database. Assuming a projection matrix $P^{(k)}$, the orthogonal projection of f is given by

$$f^{(k)} = P^{(k)} f \qquad (2b)$$

where $P^{(k)} = H (H^{T} H)^{-1} H^{T}$.

The governing equation for PCA-based ECG analysis is

$$Q = Q_0 + V_i\, m \sqrt{\lambda_i} \qquad (3)$$

where

$V_i$ is the ith eigenvector

$\lambda_i$ is the ith eigenvalue, which represents the sample variance of x

m is a scalar varying between certain limits

Q is the original or actual shape of the ECG wave

$Q_0$ is the mean shape of the ECG wave
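The projection of Eq. (2b) and the shape model of Eq. (3) can be checked numerically with a short Python/NumPy sketch. The shape vectors below are random stand-ins, not ECG shapes, and the choices k = 3 and m = 2 are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
shapes = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 8))  # hypothetical shape vectors Q
q0 = shapes.mean(axis=0)                                      # mean shape Q0

eigvals, eigvecs = np.linalg.eigh(np.cov(shapes - q0, rowvar=False))
order = np.argsort(eigvals)[::-1]                             # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 3
h = eigvecs[:, :k]                                            # columns span the subspace H
p_k = h @ np.linalg.inv(h.T @ h) @ h.T                        # P(k) = H (H^T H)^-1 H^T
f = shapes[0] - q0
f_k = p_k @ f                                                 # Eq. (2b): projection of f

# Eq. (3): move away from the mean shape along the first eigenvector,
# by m standard deviations of that mode.
m = 2.0
q_mode = q0 + eigvecs[:, 0] * m * np.sqrt(eigvals[0])

print(np.allclose(p_k @ p_k, p_k), np.allclose(p_k, h @ h.T))
```

Because the eigenvectors are orthonormal, $H^{T}H$ is the identity and the projection matrix reduces to $HH^{T}$, which the last line confirms.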

4 Standard independent component analysis (SICA)

Independent component analysis is a new technique for ECG signals based on high-order statistics, and is used to separate ICs from ECG measurements [1, 5, 6]. Since ICA uses density estimation of a signal, the components with dominant density can be easily found [23, 24]. In the standard, noise-free formulation of the ICA problem for ECG, the observed ECG signals x are assumed to be a linear mixture of an equal number of unknown but statistically independent source signals [1, 5–7, 17].

If the artifacts and noise in the ECG were to be removed using ICA, the artifacts and noise would come from other independent sources, and in such a situation the number of sources would exceed the number of recordings [3, 7, 14]. It is thus important to determine the conditions under which standard ICA can be used to remove artifacts and noise from ECG recordings when the number of sources may exceed the number of recordings [1, 7, 10, 13]. To analyze this, consider the set of ECG recordings to be a vector x and the pure (unknown) signals to be a vector s. Then

$$x = A s \qquad (4)$$

where A is an unknown, invertible, square mixing matrix. The output of the ICA algorithm is an estimate of the un-mixing matrix w, such that

$$s = w x = w A s \qquad (5)$$

It is evident that wA = I, the identity matrix. The estimated ICs will be a mixture of the true independent sources with the elements of w as the scale factors.
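The mixing model of Eqs. (4) and (5) can be tried out on synthetic data. The sketch below is a minimal symmetric FastICA with a tanh nonlinearity written in Python with NumPy; it is an assumed, generic ICA estimator chosen for illustration, not necessarily the algorithm used in the paper, and the three sources and the mixing matrix are hypothetical.

```python
import numpy as np

def fast_ica(x, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh nonlinearity); x is channels x samples."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(x @ x.T / x.shape[1])
    whitening = np.diag(eigvals ** -0.5) @ eigvecs.T        # PCA whitening first
    z = whitening @ x
    n = z.shape[0]
    w = np.linalg.qr(np.random.default_rng(seed).normal(size=(n, n)))[0]
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        w_new = g @ z.T / z.shape[1] - np.diag((1 - g ** 2).mean(axis=1)) @ w
        u, _, vt = np.linalg.svd(w_new)
        w_new = u @ vt                                      # symmetric decorrelation
        converged = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1)) < tol
        w = w_new
        if converged:
            break
    return w @ whitening                                    # un-mixing matrix w

rng = np.random.default_rng(4)
t = np.arange(5000) / 500.0                                 # 5000 samples at 500 Hz
s = np.vstack([rng.laplace(size=t.size),                    # spiky source
               rng.uniform(-1, 1, size=t.size),             # flat noise source
               np.sin(2 * np.pi * 50 * t)])                 # 50 Hz interference
a = np.array([[1.0, 0.5, 0.3],
              [0.4, 1.0, 0.6],
              [0.2, 0.7, 1.0]])                             # mixing matrix A
x = a @ s                                                   # Eq. (4)

w = fast_ica(x)
gain = w @ a                                                # ideally a scaled permutation
print(np.round(gain, 2))
```

In practice the product of the estimated un-mixing matrix and the mixing matrix is not exactly the identity: ICA recovers the sources only up to their order, sign and scale, so each row of the printed matrix should have one dominant entry.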

4.1 ECG correction using statistical measures

Independent component analysis defines a generative model for the observed multivariate data, which is typically given as a large database of samples. ICA cannot recover any information from the mixtures if the ICs have a Gaussian distribution. It may be surprising that the ICs can be estimated from linear mixtures with no more assumptions than their independence. ICA is used to extract the contributions of independent signal sources from their mixtures, and the aim of applying ICA here is to separate the ECG from all of these [5, 13, 14, 32]. The ICA algorithm depicted in Fig. 2 is used to remove noise and artifacts from ECG signals and is a reimplementation of [6]. The proposed method

Fig. 2 Flow chart of PCA–ICA scheme for denoising and correction of ECG signals


analyses the use of PCA for the denoising of ECG signals. ICA has some inherent limitations, because of which identifying the IC of interest becomes difficult and, at times, highly subjective. Band-pass filtering is generally used to attenuate the frequencies related to the above noise sources that lie outside the frequency band occupied by the QRS complex [15, 16, 18, 30, 33].
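Since ICA relies on the components being non-Gaussian, kurtosis is a natural statistic for judging a candidate IC. The sketch below, in Python with NumPy, assumes the common convention of excess kurtosis (zero for a Gaussian), under which positive values indicate a super-Gaussian signal and negative values a sub-Gaussian one. The test signals, the function name and the 0.2 tolerance band around zero are illustrative assumptions.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3, so a Gaussian scores about zero."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered ** 4) / np.mean(centered ** 2) ** 2 - 3.0

rng = np.random.default_rng(5)
signals = {
    "laplace (spiky)": rng.laplace(size=20000),
    "gaussian": rng.normal(size=20000),
    "uniform (flat)": rng.uniform(-1, 1, size=20000),
}
for name, x in signals.items():
    k = excess_kurtosis(x)
    if abs(k) < 0.2:                    # tolerance band chosen for this sketch only
        label = "approximately Gaussian"
    elif k > 0:
        label = "super-Gaussian"
    else:
        label = "sub-Gaussian"
    print(f"{name}: excess kurtosis {k:+.2f} -> {label}")
```

A spiky waveform with long quiet stretches behaves like the super-Gaussian case, whereas a component that is close to Gaussian gives ICA little to work with.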

In the presented PCA–ICA method, the high accuracy achieved in detecting QRS complexes is accompanied by robustness, low computational complexity and dimensionality reduction [2, 3, 11]. At the beginning of the proposed algorithm, the QRS part of the ECG is enhanced by suppressing the level of the P and T waves and of disturbances in the ECG signal. These disturbances result mainly from baseline drift, power-line interference and interference from other physiological sources. In this paper, combinations of leads V1, V3, V6, AF, AR and AL of ECG recordings from the CSE database are used to check the accuracy of the combined PCA–ICA algorithm, and its reliability is evident from the PCA and ICA scatter plots. The aim of this paper is to justify the underlying theory of the use of ICA for separation of the ECG signals and of PCA for ECG data compression.

5 PCA simulations

In this analysis, each ECG file contains 5000 samples at a sampling frequency of 500 Hz. File file-no-01 of the CSE database is used, and the ECG signals are simulated using Matlab software. Out of 12 channels, six channels, viz. V1, V3, V6, AF, AR and AL, are used in the simulations. The common standards for electrocardiography (CSE) database is used because it is widely appreciated for the evaluation of diagnostic ECG analyzers. The CSE database consists of about 1000 multi-lead recordings (12 or 15 leads), which gives a wide range of data for ECG analysis. In this analysis, it was investigated whether flexibility in the selection of ECG segment amplitudes is useful for producing ECG data compression that is acceptable to a cardiologist for diagnostic applications. PCA can be used to determine component significance by decomposing a correlation matrix of an ECG data set. A unique algorithm known as the "PCA variance estimator" is developed by the author, based on the decreasing values of eigenvectors/eigenvalues [2, 3, 7]. Figure 3a–f shows the ECG waveforms for the six leads considered in this analysis.

Fig. 3 a ECG waveform of lead V6, b ECG waveform of lead V3, c ECG waveform of lead V1, d ECG waveform of lead AF, e ECG waveform of lead AR and f ECG waveform of lead AL (each panel plots amplitude in mV against number of points, 0 to 5000)


5.1 PCA scatter plots

Principal component analysis helps in dimensionality reduction by removing factors, i.e., those parts of the ECG data that contribute the least useful diagnostic information. PCA scatter plots are used to indicate the differences in the variances observed in ECG segments before and after cardiac treatment. Good estimates of the source signals are possible because of the ability of the subspace algorithm to track the eigenvalues and eigenvectors in a non-stationary environment. The convergence of the algorithm is illustrated by plotting the canonical angles between the basis vectors in the estimated and theoretical signal subspaces [34, 35].

This section of the paper deals with scatter plots for various lead combinations. PCA scatter plots of various ECG leads give different orientations of the same heart information when considered for different combinations of leads [2, 3, 11]. Case studies have been investigated and results obtained for combinations of the leads or channels V1, V3, V6, AF, AR and AL to find the correlations between them. In a PCA implementation, if the correlation coefficient between ECG data sets decreases, noise is better separated or reduced, which yields more diagnostic information. If, on the other hand, the correlation increases, noise is more predominant, which yields less diagnostic information. Table 1 shows the values of the correlation coefficients of various combinations of correlated ECG leads before PCA whitening. Figure 4a–f shows the scatter plots using PCA whitening for various combinations of leads and indicates the rotation of noise and useful data in the four quadrants.

The correlation coefficients differ between lead combinations and also indicate the noise content remaining in the data as well as the artifacts removed before and after application of PCA.

Correlation coefficient between ECG data sets before

and after PCA rotation indicate the capability of PCA in

identifying and separating noise and artifacts in either all

ECG leads or the specific leads. Based on the combinations

of the leads chosen, the useful ECG data and noise content

take different locations in the four quadrants of the XY or

scatter plot. Accordingly, the values of correlation coeffi-

cients become positive or negative. The orientation of

useful ECG data and noisy data indicates the trend of fil-

tering requirements further for noise reduction and feature

extraction as well the classification.
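The relation between whitening and the correlation coefficient can be illustrated with a minimal sketch. It assumes eigenvalue-based PCA whitening of a two-lead array; the function name `pca_whiten` and the synthetic "leads" are hypothetical stand-ins and are not the data or the program used in this study.

```python
import numpy as np

def pca_whiten(x):
    """Whiten a (channels x samples) array with PCA.

    The returned data have an identity sample covariance matrix.
    """
    xc = x - x.mean(axis=1, keepdims=True)      # remove each lead's mean
    cov = xc @ xc.T / xc.shape[1]               # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigendecomposition
    # Rotate onto the principal axes, then scale each axis to unit variance.
    return (eigvecs / np.sqrt(eigvals)).T @ xc

# Synthetic stand-in for two correlated leads (NOT real ECG data).
rng = np.random.default_rng(0)
source = rng.standard_normal(2000)
lead_a = source + 0.3 * rng.standard_normal(2000)
lead_b = 0.8 * source + 0.3 * rng.standard_normal(2000)
leads = np.vstack([lead_a, lead_b])

r_before = np.corrcoef(leads)[0, 1]             # strongly correlated
whitened = pca_whiten(leads)
r_after = np.corrcoef(whitened)[0, 1]           # numerically zero
```

In such a sketch the two whitened channels are uncorrelated by construction, so a scatter plot of `whitened[0]` against `whitened[1]` shows the rotated and rescaled data cloud.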

5.2 ECG segment classification using PCAVE

Figure 5 shows a newly developed PCA classifier proposed by the author. The developed PCA variance estimator (PCAVE) detects the various ECG segments in the following sequence:

1. Collect n dimensional ECG data set.

2. Mean correct all the ECG sample points, i.e., calculate

the mean and subtract it from each ECG data point.

3. Calculate the variance-covariance matrix of the ECG data set using $R = x \cdot x^{T}$.

4. Decompose the covariance matrix using PCA:

$R = \sum_{i=1}^{N} \lambda_i \, PC_i \, PC_i^{T}$

where $\lambda_i$ is the eigenvalue of each principal component $PC_i$.

5. Determine eigenvalues and eigenvectors of the ECG covariance matrix.

6. Sort the eigenvalues and the corresponding eigenvectors, such that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$.

7. Rotate the principal components using varimax

rotation.

8. Varimax rotation is an orthogonal transform that

rotates the principal components such that the variance

of the factors is maximized. This rotation improves the

ECG interpretability of the principal components.

9. Select the first eigenvectors and generate the ECG data

set in the new representation.

10. Analyze each detected ECG segment or waveform.

11. Study the error signal and calculate reconstruction

error.
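The sequence above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's PCAVE program: the covariance is normalized by the number of samples minus one, the varimax rotation of steps 7 and 8 is omitted, and the name `pca_decompose`, the parameter `n_keep` and the relative reconstruction error are choices made here for illustration only.

```python
import numpy as np

def pca_decompose(data, n_keep):
    """data: (n_samples, n_dims) array; n_keep: leading components retained."""
    mean = data.mean(axis=0)
    centered = data - mean                               # step 2: mean correction
    cov = centered.T @ centered / (len(data) - 1)        # step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)               # steps 4-5: eigendecomposition
    order = np.argsort(eigvals)[::-1]                    # step 6: descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    basis = eigvecs[:, :n_keep]                          # step 9: leading eigenvectors
    scores = centered @ basis                            # data in the new representation
    reconstructed = scores @ basis.T + mean
    error = data - reconstructed                         # step 11: error signal
    pct_variance = 100.0 * eigvals / eigvals.sum()
    rel_error = np.linalg.norm(error) / np.linalg.norm(centered)
    return eigvals, pct_variance, scores, rel_error

# Synthetic data with decreasing variance per dimension (NOT ECG data).
rng = np.random.default_rng(1)
scales = np.array([3.0, 1.0, 0.5, 0.2, 0.1])
data = rng.standard_normal((500, 5)) * scales
eigvals, pct_variance, scores, rel_error = pca_decompose(data, n_keep=2)
```

With this normalization, the squared relative error equals the share of total variance carried by the discarded eigenvalues, which is one way to relate the variance and error columns of a PCA table.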

The performance of the PCA classifiers is evaluated by

computing the percentages of: sensitivity (SE), specificity

(SP) and correct classification (CC) or accuracy.
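The formulas behind these percentages are not stated in this section. A minimal sketch, assuming the usual definitions in terms of true and false positives and negatives, is given below; the counts are hypothetical and serve only as a worked example.

```python
def classification_rates(tp, fp, tn, fn):
    """Return sensitivity, specificity and correct classification in percent."""
    se = 100.0 * tp / (tp + fn)                   # detected true peaks
    sp = 100.0 * tn / (tn + fp)                   # rejected false peaks
    cc = 100.0 * (tp + tn) / (tp + fp + tn + fn)  # overall accuracy
    return se, sp, cc

# Hypothetical counts, not results from this study.
se, sp, cc = classification_rates(tp=95, fp=4, tn=96, fn=5)
```

For these counts the sketch gives SE = 95%, SP = 96% and CC = 95.5%.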

This section of the paper uses PCA to find the variance and error values of each ECG segment present in the raw ECG signal and to identify the various components present in the augmented ECG leads AF, AR and AL. The cases are discussed below.

Case 1 deals with the results of lead AF for eigenvalue and % error estimations. Tables 2 and 3 show the magnitude of the eigenvalues, the nomenclature, the % variance and the % error of the ECG segments for the lead AF.

The eigenvalues of the various ECG segments for the lead AF are arranged in descending order as shown in Table 2. The number of useful eigenvalues found here for the ECG lead AF is five.

Table 1 Correlation coefficients of ECG leads before PCA whitening

S. no.  Correlated data sets  Correlation coefficient
1       V6 and AF              0.19163
2       V6 and AL              0.88174
3       V6 and AR             -0.7531
4       V3 and V1              0.84569
5       V6 and V1             -0.69315
6       AF and AR             -0.6966
7       AF and AL              0.13970
8       AL and AR             -0.80770

Case 2 deals with the results of lead AR for eigenvalue and % error estimations. Tables 4 and 5 show the magnitude of the eigenvalues, the nomenclature, the % variance and the % error of the ECG segments for the lead AR.

Case 3 deals with the results of lead AL for eigenvalue and % error estimations. Tables 6 and 7 show the magnitude of the eigenvalues, the nomenclature, the % variance and the % error of the ECG segments for the lead AL.

Figure 6a–c shows the bar representation of principal components for the leads AF, AR and AL. Figure 6d shows the peaky template of the QRS complex obtained for lead AF as the first principal component using the developed PCAVE.

5.3 Effect of noise on PCA program results

A comparison of point estimates from high-noise versus low-noise CSE ECG recordings indicated that, on average, computer-derived wave onsets and offsets were shifted outward by noise in the cases discussed. However, this shift was significantly smaller for PCA programs than for wavelet transforms. The overhead in PCA increases slightly because of the requirement to store the polynomial's own coefficients, but this is minimal compared to the large details obtained in wavelet transforms. Typically, a 25–30% reduction in the required bit rate for comparable reconstruction quality is achievable using PCA, as compared to wavelet transforms and conventional filtering methods.

5.4 Scree plots

A scree plot is a plot of the eigenvalues versus the principal component number. The scree test uses the eigenvalues found from principal component analysis and draws a straight line through the lowest eigenvalues. The simulation results indicate that PCA can be successfully used to determine the significant principal components when decomposing a correlation matrix between various ECG data sets or leads. Figure 7 shows the scree plots for the combinations of the augmented ECG leads AF, AR and AL.
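The quantities behind a scree plot can be sketched from the tabulated eigenvalue magnitudes of lead AF. The snippet below is illustrative only: the 95% target and the helper `components_for` are hypothetical, and because the tabulated eigenvalues are rounded to four decimals, the percentages it produces only approximate the % variance column of the corresponding table.

```python
import numpy as np

# Eigenvalue magnitudes for lead AF as tabulated (rounded to four decimals).
eigenvalues = np.array([0.0211, 0.0019, 0.0005, 0.0003, 0.0002])

pct = 100.0 * eigenvalues / eigenvalues.sum()   # variance share per component
cumulative = np.cumsum(pct)                     # cumulative variance share

def components_for(cumulative_pct, target):
    """Smallest number of leading components reaching the target (%)."""
    return int(np.searchsorted(cumulative_pct, target) + 1)

# Hypothetical retention criterion, not one stated in the paper.
n_95 = components_for(cumulative, 95.0)
```

Plotting `eigenvalues` against the component index 1 to 5 gives the scree curve; the first two components already account for more than 95% of the total in this rounded example.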

5.5 Limitations of PCA

Some important limitations of the decomposition method as currently employed are discussed here [1–3, 9, 11, 17, 26, 36]. The first is that the PCA approach, applied in the time or frequency domain alone, is most sensitive to certain specific cardiac events and even to certain cardiac cycles. PCA decomposition is generally robust against minor jitter, creating slightly larger ECG components that cover the area of the jitter [16, 19, 32], but it is not optimal if the jitter is substantial or if the event shifts widely on the time-frequency surface because of experimental corrections or manipulations. One possible result of these shifts is the extraction of multiple ECG components, with at least one component for each shifted location. Another possibility is that, with strong jitter, an elongated ECG component covers the area of the jitter. A second important difficulty in the decomposition method is choosing the number of principal components (PCs) to extract, which can be arbitrarily chosen or fixed by a criterion. This could always be a limitation when using PCA, although less information is available to motivate the selection in the joint time-frequency domain than in the time or frequency domain alone.

5.6 Morphological validation of PCA

To further validate the behavior of the proposed PCA decomposition method, two additional manipulations were conducted with the CSE-based ECG data sets [3, 4, 9, 17, 19]. First, a condition difference was incorporated by arbitrarily reducing the amplitude of half the ECG dataset by 50%. Second, after creating the condition difference, noise was added to the ECG signals to check the ability of PCA decomposition to detect the signals under high and low signal-to-noise conditions with differing signal representations. Signal-to-noise levels were adjusted using the signal power measured for each simulated ECG dataset as a whole, rather than measuring signal power and adding noise separately for each trial performed. This was done to avoid adding noise differentially to the simulated conditions, i.e., to ECG signals with no signal or with overlapping simulated ECG signals, or to the different simulations, which had differing numbers of simulated electrodes that carried no ECG signal and possibly only noise or an artifact.
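The two manipulations can be sketched as follows, assuming additive white Gaussian noise (the noise type is not specified in the text) and a dataset stored as a trials-by-samples array. The function `add_noise_at_snr` and the sinusoidal stand-in waveform are hypothetical.

```python
import numpy as np

def add_noise_at_snr(dataset, snr_db, rng):
    """Add noise scaled from the power of the whole dataset, not per trial."""
    signal_power = np.mean(dataset ** 2)              # one figure for all trials
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(dataset.shape) * np.sqrt(noise_power)
    return dataset + noise, noise

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 500, endpoint=False)
trials = np.tile(np.sin(2 * np.pi * 5 * t), (20, 1))  # stand-in waveform, NOT ECG
trials[10:] *= 0.5        # condition difference: half the trials at 50% amplitude

noisy, noise = add_noise_at_snr(trials, snr_db=10.0, rng=rng)
achieved_db = 10.0 * np.log10(np.mean(trials ** 2) / np.mean(noise ** 2))
```

Because the noise level is derived from the dataset as a whole, both conditions receive noise of the same standard deviation, so the reduced-amplitude condition ends up with the lower effective signal-to-noise ratio.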

6 Statistical corrections to ECG data and ICA scatter plots

Adaptive separation methods are generally required since the mixing system and the signal or noise statistics may be time varying. Moreover, real-time computation is desirable in ECG signal processing applications. The separation task at hand is to estimate a separating matrix W or a mixing matrix H so that the original sources are recovered from the noisy ECG mixtures. Prior to separation, the observed ECG signals are typically spatially whitened and the signal powers are normalized to unity [2, 5, 6, 26, 34].
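For reference, the separation task can be written in the standard noiseless form commonly assumed in blind source separation. The symbols s, x, z, y, V, P and D are not used in the original text and are introduced here only for illustration, with the separating matrix written as W; s denotes the sources, x the observed mixtures, V a whitening matrix, P a permutation matrix and D a diagonal scaling matrix.

```latex
\begin{align}
  x(t) &= H\,s(t), \\
  z(t) &= V\,x(t), \qquad E\{z(t)\,z(t)^{T}\} = I, \\
  y(t) &= W\,z(t) \approx P\,D\,s(t).
\end{align}
```

The last line expresses the usual ambiguity of such methods: the sources can be recovered only up to their order and scale.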


In the present paper, the three- or six-lead ECG data are decomposed into statistically independent components (ICs) using the JADE algorithm after identifying noise and artifacts. The noise and artifacts are removed using a cleaning procedure that makes use of the statistical measures kurtosis and variance of variance (Varvar), as first proposed in [6] and reimplemented in [1, 2, 5, 7, 8]. The values of kurtosis and variance of variance are calculated by the algorithm and checked against predefined thresholds. The thresholds Kurtosis = 4.3 and Varvar = 0.4, obtained after initial parameterization [1, 5, 7] of CSE data base files with redundant features removed, show improvement in the thresholds

Fig. 4 a Scatter plot of leads AL and AF before and after PCA whitening. b Scatter plot of leads AF and AR before and after PCA whitening. c Scatter plot of AL and AR before and after PCA whitening. d Scatter plot of V6 and V1 before and after PCA whitening. e Scatter plot of V6 and AF before and after PCA whitening. f Scatter plot of V6 and AL before and after PCA whitening. h Scatter plot of V6 and AR before and after PCA whitening. i Scatter plot of V3 and V1 before and after PCA whitening

Table 2 Eigenvalues and % error of various ECG components for the lead AF

S. no.  Eigenvalue number  Magnitude of eigenvalue  Nomenclature of ECG segment  Error of ECG segments (%)
1       Eigenvalue 1       0.0211                   QRS complex                  0.0400
2       Eigenvalue 2       0.0019                   T wave                       0.0038
3       Eigenvalue 3       0.0005                   P wave                       0.0010
4       Eigenvalue 4       0.0003                   Base-line wander (BLW)       0.0006
5       Eigenvalue 5       0.0002                   Noise                        0.0002

Table 3 Eigenvalues and % variances of components of lead AF

S. no.  Eigenvalue number  Magnitude of eigenvalue  Nomenclature of ECG segment  Total variance of ECG segments (%)
1       Eigenvalue 1       0.0211                   QRS complex                  87.8932
2       Eigenvalue 2       0.0019                   T wave                       7.8456
3       Eigenvalue 3       0.0005                   P wave                       1.9292
4       Eigenvalue 4       0.0003                   BLW                          1.0586
5       Eigenvalue 5       0.0002                   Noise                        0.6448

Table 4 Eigenvalues and % variances of various ECG components for the lead AR

S. no.  Eigenvalue number  Magnitude of eigenvalue  Nomenclature of ECG segment  Total variance of ECG segments (%)
1       Eigenvalue 1       0.0089                   QRS complex                  88.11
2       Eigenvalue 2       0.0005                   T wave                       4.95
3       Eigenvalue 3       0.0004                   P wave                       3.96
4       Eigenvalue 4       0.0002                   Noise                        1.98

Table 5 Eigenvalues and % error of various ECG components for the lead AR

S. no.  Eigenvalue number  Magnitude of eigenvalue  Nomenclature of ECG segment  Error of various ECG segments (%)
1       Eigenvalue 1       0.0089                   QRS complex                  0.0178
2       Eigenvalue 2       0.0005                   T wave                       0.001
3       Eigenvalue 3       0.0004                   P wave                       0.0008
4       Eigenvalue 4       0.0002                   Noise                        0.0004
5       Eigenvalue 5       0.0001                   BLW                          0.0002

Fig. 5 Proposed PCA classifier


obtained as in [6]. The initial source signals are two positively kurtotic signals representing ECG signals and artifacts with frequencies slightly below 0.5 Hz, i.e., the frequency of BLW [1, 2, 5, 7, 34].
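The threshold check described above can be sketched as follows. Several details are assumptions made here and are not taken from the text: kurtosis is computed in its non-excess form (equal to 3 for a Gaussian), Varvar is computed as the variance of block-wise variances of the unit-variance component, the block length of 500 samples corresponds to 1 s at a 500 Hz sampling rate, and a component is flagged when its Varvar exceeds the threshold. Only the threshold values 4.3 and 0.4 come from the paper.

```python
import numpy as np

KURT_THRESHOLD = 4.3     # from the text
VARVAR_THRESHOLD = 0.4   # from the text

def kurtosis(x):
    """Non-excess kurtosis (3 for a Gaussian); assumed convention."""
    xc = x - x.mean()
    return np.mean(xc ** 4) / np.mean(xc ** 2) ** 2

def varvar(x, block_len):
    """Variance of block-wise variances of the normalized component (assumed form)."""
    xn = (x - x.mean()) / x.std()
    n_blocks = len(xn) // block_len
    blocks = xn[: n_blocks * block_len].reshape(n_blocks, block_len)
    return blocks.var(axis=1).var()

def is_noise(x, block_len=500):
    """Flag a component as noise or artifact under the assumed decision rule."""
    low_kurtosis = abs(kurtosis(x)) < KURT_THRESHOLD
    unstable_variance = varvar(x, block_len) > VARVAR_THRESHOLD
    return bool(low_kurtosis or unstable_variance)

# Stand-in components (NOT ECG data): Gaussian noise and a regular pulse train.
rng = np.random.default_rng(3)
gaussian_like = rng.standard_normal(5000)
n = np.arange(5000)
pulse_train = np.where(n % 250 < 5, 1.0, 0.0)

flags = is_noise(gaussian_like), is_noise(pulse_train)
```

In this sketch the Gaussian component is flagged because its kurtosis is close to 3, whereas the peaky, regularly repeating pulse train has a high kurtosis and a stable block variance and is therefore retained.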

6.1 Selecting appropriate nonlinearity

Blind source separation has many important applications in communications and array signal processing. Many widely used methods require prior knowledge of the sign of the kurtosis of the sources and may fail if the mixtures contain both sub- and super-Gaussian signals. Typically, prior information on the sign of the kurtosis is assumed to be available and the nonlinearities are selected accordingly [34, 37]. This assumption is often unreasonable. Here, the zero-memory nonlinearities needed for finding independent sources are selected online by monitoring the statistics of each estimated source signal. Consequently, separation may be achieved even if a change in the sign of the kurtosis occurs. Simulation examples illustrating the ability to adapt to time-varying mixing systems and source distributions of unknown kurtosis are presented using ECG signals. In order to effect a truly blind algorithm, the statistics of each output of the separation system are recursively tracked [34], and an appropriate nonlinearity for each channel is selected from two alternatives depending on whether the source is deemed to have negative or positive kurtosis [35, 38].
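The online selection can be sketched as below. The two nonlinearities are not named in the text, so the pair used here (a hyperbolic tangent for positive kurtosis and a cubic for negative kurtosis) is an assumption, as are the forgetting factor, the function names and the deterministic stand-in signals.

```python
import numpy as np

def track_excess_kurtosis(y, forgetting=0.01):
    """Recursively track the excess kurtosis of a zero-mean output signal.

    Second and fourth moments are updated sample by sample with an
    exponential forgetting factor, so the estimate can follow slow changes.
    """
    m2, m4 = 1.0, 3.0                      # start from Gaussian-like moments
    estimates = np.empty(len(y))
    for n, sample in enumerate(y):
        m2 = (1.0 - forgetting) * m2 + forgetting * sample ** 2
        m4 = (1.0 - forgetting) * m4 + forgetting * sample ** 4
        estimates[n] = m4 / (m2 ** 2 + 1e-12) - 3.0
    return estimates

def select_nonlinearity(excess_kurtosis):
    """Pick one of two zero-memory nonlinearities from the kurtosis sign."""
    if excess_kurtosis > 0:
        return "tanh", np.tanh             # assumed choice for super-Gaussian
    return "cubic", lambda u: u ** 3       # assumed choice for sub-Gaussian

# Deterministic zero-mean stand-in signals (NOT ECG data).
n = np.arange(4000)
spiky = np.where(n % 20 == 0, 1.0, np.where(n % 20 == 10, -1.0, 0.0))
sine = np.sin(2 * np.pi * n / 50)

name_spiky, _ = select_nonlinearity(track_excess_kurtosis(spiky)[-1])
name_sine, _ = select_nonlinearity(track_excess_kurtosis(sine)[-1])
```

The sparse pulse signal is super-Gaussian and the sinusoid is sub-Gaussian, so the sketch selects a different nonlinearity for each channel without any prior information on the sign of the kurtosis.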

Since the signal statistics may be time varying, the algorithm discussed here does not make restrictive assumptions on the form of the probability density

Fig. 6 a Bar representation of principal components of lead AF.

b Bar representation of principal components of lead AR. c Bar

representation of principal components of lead AL. d Peaky template

of QRS complex obtained for lead AF as first principal component

Table 6 Eigenvalues and % variances of components of the lead AL

S. no.  Eigenvalue number  Magnitude of eigenvalue  Nomenclature of ECG segment  Total variance of ECG segments (%)
1       Eigenvalue 1       0.0088                   QRS complex                  87.6345
2       Eigenvalue 2       0.0004                   T wave                       4.6449
3       Eigenvalue 3       0.0003                   P wave                       3.5277
4       Eigenvalue 4       0.0002                   BLW                          2.2125
5       Eigenvalue 5       0.0001                   Noise                        0.9794

Table 7 Eigenvalues and % error of various ECG components for the lead AL

S. no.  Eigenvalue number  Magnitude of eigenvalue  Nomenclature of ECG segment  Error of various ECG segments (%)
1       Eigenvalue 1       0.0088                   QRS complex                  0.0176
2       Eigenvalue 2       0.0004                   T wave                       0.0008
3       Eigenvalue 3       0.0003                   P wave                       0.0006
4       Eigenvalue 4       0.0002                   BLW                          0.0004
5       Eigenvalue 5       0.0001                   Noise                        0.0002

function (pdf), can adapt to changes in the mixing system

and signal statistics, and lends itself to real-time compu-

tation. In many BSS problems, prior information on the

type of pdf and the sign of the kurtosis is not available

[34, 35].

6.2 Interpretation of kurtosis coefficients

Simulated cases illustrating the capability to adapt to time-varying mixing systems and source distributions of unknown kurtosis are presented using CSE data base ECG signals. The sign of the difference of the kurtosis coefficients, (k1 - k2), for the two ECG channels is tracked and presented in the respective graphs for physiological interpretation of electrocardiograms and Gaussianity.

If (k1 - k2) is positive, it means that on the respective channel we have a super-Gaussian signal; otherwise we have a sub-Gaussian signal. The typical number of samples needed to achieve separation is around 300–400 [1, 5, 6, 34, 37]. Here, k1 and k2 are the kurtosis coefficients of the ICs ICA1 and ICA2.

Tables 8, 9 and 10 give the values of the kurtosis and variance of variance coefficients for the combinations of the leads AF, AR and AL.

For leads AF and AR (Table 8), k1 = 23.6985 and k2 = 3.1324, hence (k1 - k2) = (23.6985 - 3.1324) = 20.5661. Since (k1 - k2) is positive, it means that on each of the respective channels we have a super-Gaussian signal.

From Fig. 8b, it is apparent that BLW as well as noise is

removed using statistical measures kurtosis and variance of

variance along with ICA, the concept as proposed first in

[6]. Table 8 indicates that ICA-2 is a noise component having |Kurt| < 4.3 and has been successfully removed using ICA followed by statistical corrections.

Fig. 7 a Scree plot for combination of leads AR and AF. b Scree plot for combination of leads AL and AF. c Scree plot for combination of leads AL and AR

For leads AF and AL (Table 9), k1 = 23.6674 and k2 = 3.1332, hence (k1 - k2) = (23.6674 - 3.1332) = 20.5342. Since (k1 - k2) is positive, it means that on each of the respective channels we have a super-Gaussian signal.

For leads AL and AR (Table 10), k1 = 23.6834 and k2 = 3.1353, hence (k1 - k2) = (23.6834 - 3.1353) = 20.5481. Since (k1 - k2) is positive, it means that on each of the channels we have a super-Gaussian signal.
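The arithmetic of the three cases can be reproduced with a short worked example using the |Kurt| values of Tables 8, 9 and 10 and the kurtosis threshold of 4.3. The dictionary layout and variable names are illustrative choices only.

```python
KURT_THRESHOLD = 4.3   # kurtosis threshold quoted in the text

# |Kurt| values of ICA1 and ICA2 as tabulated for each lead combination.
tabulated = {
    "AF and AR (Table 8)": (23.6985, 3.1324),
    "AF and AL (Table 9)": (23.6674, 3.1332),
    "AL and AR (Table 10)": (23.6834, 3.1353),
}

results = {}
for pair, (k1, k2) in tabulated.items():
    difference = round(k1 - k2, 4)                 # (k1 - k2)
    below_threshold = (abs(k1) < KURT_THRESHOLD,   # True marks a noise component
                       abs(k2) < KURT_THRESHOLD)
    results[pair] = (difference, below_threshold)
```

In all three cases the difference is positive (20.5661, 20.5342 and 20.5481), and only the second component falls below the kurtosis threshold, in agreement with its labeling as noise in the tables.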

In this analysis, ICs are extracted but, as is apparent from the results obtained, they contained an appreciable amount of noise, which was removed further using the statistical measures modulus of kurtosis and variance of variance (Varvar) as proposed in [6]. The procedure is reimplemented here on a different data base at a sampling frequency of 500 Hz, whereas the sampling frequency used in [6] was 256 Hz. After obtaining corrected ECG signals, further investigations can be done for ECG segment analysis and various interval classifications under different conditions. Figure 9a shows source signals AL and AF before and after ICA mixing, whereas Fig. 9b depicts the extracted independent components and reconstructed ECGs for leads AL and AF after statistical corrections. Figure 10a shows the source signals for leads AL and AR before and after ICA mixing, whereas Fig. 10b indicates the extracted independent components and reconstructed clean ECG signals of the leads AL and AR.

6.3 ICA scatter plots

For various lead combinations, ICA scatter plots have been obtained using Matlab software. Based on the combinations of the leads chosen, the noise and useful ECG data can be better oriented in the various quadrants of ICA scatter plots than of PCA scatter plots, which allows better feature selection. PCA scatter plot analysis helps in deciding which combination of leads gives the most useful diagnostic features, whereas ICA scatter plots can actually give the features more prominently after the correction procedure in either specific or all leads. Figure 11a–c gives the scatter plots for various combinations of the leads AF, AR and AL.

7 Discussions

Principal component analysis preprocessing is used for data compression, feature extraction and the decision-making stage for R-peak detection. ICA processing, on the other hand, is used for extraction of ICs and, together with the statistical measures kurtosis and variance of variance, for cleaning of ECG signals. ICA decomposition gives a constant correlation coefficient, whereas PCA decomposition exhibits a varying correlation coefficient.

In the present scheme of analysis, various segments of an ECG data set are detected by applying PCA for data reduction, followed by ICA plus statistical measures for cleaning the ECG, as discussed and originated in [6]. A threshold criterion based on the variance concept is set in the PCA analysis for ECG segment classification. The efficacy of the algorithm lies in the fact that PCA promotes data compression whereas ICA helps in the identification of noise and artifacts. The statistical measures, viz. kurtosis and variance of variance, decide the cleaning of the ECG signals and the identification of useful ECG components. The correction of ECG data after removal of noise and artifacts indicates that the extent of filtering required is reduced after using ICA and the statistical measures modulus of kurtosis and variance of variance (Varvar). PCA-based scatter plot analysis helps in deciding which combination of leads gives the most useful diagnostic features, whereas ICA scatter plots can actually give the features more prominently after the cleaning procedure in any of the leads. Thus, methods like PCA and ICA would help physicians to examine and analyze the ECG signals more accurately, and so to increase the percentage of correct diagnoses. In ICA processing, at times some of the components are more emphasized, or some components are either missing or reduced; therefore, it can be used as a good statistical tool for automated heart monitoring and its variability measures.

Table 8 Values of estimated |Kurt| and Varvar for the two ICA components AF and AR

Index   ICA1     ICA2
|Kurt|  23.6985  3.1324 (Noise)
Varvar  0.0035   0.0050

Table 9 Values of estimated |Kurt| and Varvar for the ICA components for leads AF and AL

Index   ICA1     ICA2
|Kurt|  23.6674  3.1332 (Noise)
Varvar  0.0035   0.0050

Table 10 Values of estimated |Kurt| and Varvar for ICA components of AL and AR

Index   ICA1     ICA2
|Kurt|  23.6834  3.1353 (Noise)
Varvar  0.0035   0.0050

8 Conclusion

Principal component analysis is used in this case for denoising, dimensionality reduction and data compression, whereas ICA is utilized for removal of artifacts and noise. In this paper, a new statistical algorithm based on PCA and ICA together is developed for six leads of 12-channel ECG data. In this analysis, ICs are extracted but, as is evident from the results, they contained an appreciable amount of noise, which was removed using the statistical measures modulus of kurtosis and variance of variance (Varvar) as proposed in [6]. After obtaining the corrected ECG signals, further investigations can be done for ECG segment analysis and different ECG interval classification. Based on the combinations of the leads chosen, the useful ECG data at times give better feature selection using ICA and better

Fig. 8 a Plot of signals AF and AR before and after ICA mixing. b Extracted independent components and reconstructed clean ECG signal for leads AR and AF

Fig. 9 a Source signals AL and AF before and after ICA mixing. b Extracted independent components and reconstructed ECGs for leads AL and AF after statistical corrections

Fig. 10 a Source signals for leads AL and AR before and after ICA mixing. b Extracted independent components and reconstructed clean ECG signals of the leads AL and AR

denoising of ECG data using PCA. Simulation cases illustrating the capability to adapt to time-varying mixing systems and source distributions of unknown kurtosis are presented using CSE data base ECG signals. The sign of the difference of the kurtosis coefficients, (k1 - k2), for the two ECG channels is tracked and presented in the respective graphs for physiological interpretation of electrocardiograms and Gaussianity. The results demonstrate that the integration of PCA and ICA techniques can efficiently remove the noise and artifacts from the ECG signals, even after ECG data reduction, while preserving morphological ECG features. PCA and ICA scatter plots of various ECG leads and their combinations give different orientations of the same heart information, with a higher probability of attaining diagnostic features. Different case studies have been carried out for 12-lead ECG data, and results have been obtained for combinations of the leads V1, V3, V6, AF, AR and AL to find correlations between them. PCA in this case is used for removal of redundant data, denoising and data shrinkage, whereas ICA is used not only for removal of artifacts and noise, but also for feature extraction. For various lead combinations, ICA and PCA scatter plots have been obtained. Based on the combinations of the leads chosen, the position of noise and useful ECG data in the various quadrants can be determined using ICA as well as PCA scatter plots.

9 PCA and ICA: alternative statistical tools for medical researchers

The proposed approach allows for the consideration of priors on the structural nature of the different classes of ECG signals that are to be separated. These investigations are expected to throw some light on the physiological phenomena of the heart and its abnormalities, if any exist in the ECG data sets. The observation made by the author after this comparative study is that these higher-order statistical tools could be effectively utilized by clinicians and the medical research community for morphological analysis and feature extraction of ECG. The PCA and ICA methods demonstrate that their combined application to ECG offers significant advantages over, as well as results comparable to, classical approaches, giving better classification and a basis for extensive feature selection. Abnormal signals were successfully detected by ICA and corrected using statistical parameters, as compared to the original source ECG signals, which may at times reduce the need for the further sophisticated processing generally required in conventional filtering methods. However, the results are not always completely satisfactory, because there are only three measured ECG signals to demix. If there were more measured signals, ICA would be expected to provide still better results. To conclude, these techniques can be well accepted for ECG data compression, denoising and correction of ECG signals, as is done using wavelet analysis and high-order digital filters such as Butterworth filters. Based on the combinations of the leads chosen, the useful ECG data at times give better feature selection using ICA, and better denoising and compression of ECG data using PCA, while maintaining diagnostic morphology.

Acknowledgments The author is thankful to the Department of Electrical Engineering, Indian Institute of Technology, Roorkee, India for providing the required facilities to carry out this research. He is extremely indebted to G.S. Inst. of Tech & Sc., Indore, India and AICTE, GOI for sponsoring his PhD.

References

1. Chawla MPS, Verma HK, Vinod Kumar A (2007) New statistical

PCA–ICA algorithm for location of R-peaks in ECG (Elsevier).

Int J Cardiol (in press)

2. Chawla MPS, Verma HK, Vinod Kumar (2006) ECG modeling

and QRS detection using principal component analysis. In: Pro-

ceedings of IET international conference, paper no. 04,

MEDSIP06, Glasgow, UK

Fig. 11 a ICA scatter plot of leads AL and AF. b ICA scatter plot of

leads AR and AF. c ICA Scatter plot of AL and AR


3. Chawla MPS (2008) PCA–ICA method for detection of QRS

complexes and location of R-peaks in electrocardiograms, DSP,

Elsevier (revised and resubmitted on 30 April 2008)

4. Hao Z, Li-Qing Z (2005) ECG analysis based on PCA and sup-

port vector machines (IEEE). Transactions 0-7803-9422-4/05, pp

743–747

5. Chawla MPS, Verma HK, Vinod Kumar (2007) Artifacts and

noise removal in electrocardiograms using independent compo-

nent analysis (Elsevier). Int J Cardiol (in Press)

6. Taigang He, Gari Clifford, Lionel Tarassenko (2005) Application

of independent component analysis in removing artifacts from the

electrocardiogram. In: Neural computing and applications.

Springer-Verlag London Limited, pp 1–19

7. Chawla MPS (2007) Parameterization and correction of electro-

cardiogram signals using Independent component analysis

(WSPC, Singapore). Int J Mech Med Biol (JMMB) 7(4):355–379

8. Chawla MPS (2007) Parameterization and R-peak error estima-

tions of ECG signals using independent component analysis

(Taylor & Francis). Int J Comp Math Methods Med (CMMM)

8(4):263–285

9. Jalaleddine S, Hutchens C, Strattan R, Coberly W (1990) ECG

data compression techniques: a uniform approach (IEEE). Trans

Biomed Eng BME 37(4):329–343

10. Chawla MPS, Verma HK, Vinod Kumar (2006) Modeling and

feature extraction of ECG using independent component analysis.

In: Proceedings of IET international conference, paper no. 40.

APSCOM06, Hongkong, Oct 31–Nov 2

11. Chawla MPS (2008) Data reduction and removal of base-line

wander using principal component analysis, DSP, Elsevier

(revised and resubmitted on 30 April 2008)

12. Kohler BU, Hennig C, Orglmeister R (2002) The principles of

software QRS detection (IEEE). Eng Med Biol Mag 21:42–57

13. Chawla MPS, Verma HK, Vinod Kumar (2006) Independent

component analysis: a novel technique for removal of artifacts and

base-line wander in ECG. In: Proceedings of national conference,

pp 14–18. CISCON-06, MIT, Manipal, India, November 3–4

14. James CJ, Hesse CW (2005) Independent component analysis for

biomedical signals. Physiol Meas 26:R15–R39

15. Gupta D, James CJ, Gray W (2006) Denoising epileptic EEG

using ICA and phase synchrony. In: Proceedings of IET inter-

national conference, paper no. 085. MEDSIP06, Glasgow, UK,

July, 17–19

16. Chawla MPS, Verma HK, Vinod Kumar (2006) A new approach

to ECG modeling using principal component analysis. In: Pro-

ceedings on national conference, paper no. BM4.NCCCB06,

Engineering college, Kota, India, March 8–10

17. Koutsogiannis GS, Soraghan JJ (2002) Selection of number of

principal components for denoising signals. Electron Lett

38:664–666

18. Gao P, Chang EC, Wyse L (2003) Blind separation of fetal ECG

from single mixture using SVD and ICA. In: Proceedings of

ICICS-PCM 2003, Singapore, pp 15–18

19. Bernat EM, Williams WJ, Gehring WJ (2005) Decomposing ERP

time-frequency energy using PCA. Clin. Neurophysiol.

116:1314–1334

20. Lee TW (1999) Independent component analysis using an

extended infomax algorithm for mixed subgaussian and super-

gaussian sources. Neural Comput 11(2):409–433. doi:10.1162/

089976699300016719

21. Ungureanu M, Bigan C, Strungaru R, Lazarescu V (2004) Inde-

pendent component analysis applied in biomedical signal

processing. Meas Sci Rev 4(sect 2):1–8

22. Draper B, Baek K, Bartlett MS, Beveridge JR (2003) Recog-

nizing faces with PCA and ICA. Comput Vis Image Underst

(Special Issue on Face Recognition) 91(1–2):115–137

23. Dyrholm M. Independent component analysis in a convoluted

world, Kongens, Lyngby, Denmark, IMM-Ph.D-2005-158

24. Enescu M (2002) Adaptive methods for blind equalization and

signal separation in MIMO systems, DSC-Tech, Thesis August,

2002, Helsinki University of Technology, Signal Processing

Laboratory

25. Roweis S, Saul L (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326

26. Ziehe A, Laskov P, Müller K-R, Nolte G (2003) A linear least-squares algorithm for joint diagonalization. In: Proceedings of

international conference on independent component analysis and

blind signal separation, pp 469–474 (ICA-03, Nara, Japan)

27. Hamilton PS, Tompkins WJ (1986) Quantitative investigation of

QRS detection rules using the MIT/BIH arrhythmia database.

IEEE Trans Biomed Eng 1:115–165

28. Gritzali F, Frangakis G, Papakonstantinou G (1989) Detection of

the P and T waves in an ECG. J Comput Biomed Res 22:83–91

29. Friesen GM, Jannett TC, Jadallah MA, Yates SL, Quint SR, Nagle HT (1990) A comparison of the noise sensitivity of nine QRS detection algorithms. IEEE Trans Biomed Eng 37:85–98

30. Azzerboni B, La Foresta F, Mammone N, Morabito FC (2005) A new approach based on wavelet-ICA algorithms for fetal electrocardiogram extraction. In: Proceedings of European symposium on artificial neural networks (ESANN’2005), Bruges, Belgium, pp 193–198

31. Thakor N, Sun Y, Rix H, Caminal P (1993) Multi-wave: a wavelet-based ECG data compression algorithm. IEICE Trans Inf Syst E76-D(12):1462–1469

32. Tarvainen MP, Niskanen J-P, Karjalainen PA, Laitinen T, Lyyra-Laitinen T (2006) Noise sensitivity of a principal component regression based RT interval variability estimation method. In: Proceedings of the 28th annual international conference of the IEEE EMBS, New York City, USA, August 30–September 3, pp 3098–3101

33. James CJ, Lowe D (2003) Extracting multi-source brain activity from a single electromagnetic channel. Artif Intell Med 28:89–104

34. Enescu M, Koivunen V (2000) Recursive estimator for separation of arbitrarily kurtotic sources. IEEE 301–305

35. Cardoso J-F (1998) Blind signal separation: statistical principles. Proc IEEE 86(10):2009–2025. doi:10.1109/5.720250

36. Nagasaka Y, Iwata A (1993) Data compression of long time ECG recording using BP and PCA neural networks. IEICE Trans Inf Syst E76-D(12):1434–1442

37. Amari S-I, Chen TP, Cichocki A (1997) Stability analysis of adaptive blind source separation. Neural Netw 10(8):1345–1351. doi:10.1016/S0893-6080(97)00039-7

38. Amari S-I, Cichocki A (1998) Adaptive blind signal processing-neural network approaches. Proc IEEE 86(10):2026–2048. doi:10.1109/5.720251

39. Fisher AC, Hagan RP, Brown MC, El-Deredy W, Lisboa PJG (2006) ICA-based blind source separation (BSS) recovery of the pattern electroretinogram from single channel records with poor SNR. In: Proceedings of IET international conference, paper no-087 (MEDSIP-06, Glasgow, UK, July 17–19)

40. Belouchrani A, Amin MG (1998) Blind source separation based on time-frequency signal representations. IEEE Trans Signal Process 46(11):2888–2897

41. Zhou SK, Wang JT, Xu JR (1988) The real-time detection of QRS complex using the envelope of ECG. In: Proceedings of the 10th annual international conference on engineering in medicine and biology society, p 38. IEEE, New Orleans, LA

42. Sornmo L, Laguna P (2005) Bioelectrical signal processing in cardiac and neurological applications. Elsevier Academic Press, London
