[IEEE 2005 International Conference on Neural Networks and Brain - Beijing, China (13-15 Oct. 2005)] 2005 International Conference on Neural Networks and Brain - T-weighted Approach

T-weighted Approach for Neural Information Processing in P300 based Brain-Computer Interface

Yang Liu, Zongtan Zhou, Dewen Hu†, and Guohua Dong

Department of Automatic Control, College of Mechatronics and Automation, National University of Defense Technology, Changsha, Hunan, 410073, P.R.C.

† Email: dwhu@nudt.edu.cn

Abstract-A novel feature extraction method based on the t-statistic criterion is proposed for P300 potential detection in brain-computer interface (BCI) applications. After decorrelation by principal component analysis (PCA), an optimally weighted sum of the EEG signal is computed to construct the features. Applied to the P300 speller paradigm of BCI Competition 2003 and BCI Competition III (2005), the method achieved character recognition accuracies of 100% and 90%, respectively; with the latter score, our group took third place for the P300 dataset (dataset II) in BCI Competition III.

Index Terms-T-weight, brain-computer interface (BCI), electroencephalography (EEG), P300 potential

I. INTRODUCTION

The goal of brain-computer interface (BCI) research is to offer a new communication and control channel that does not depend on the brain's normal output pathways of peripheral nerves and muscles [1]. By analyzing the electroencephalographic (EEG) activity recorded from the scalp, a computer can recognize the user's intention and translate it into commands for an output device such as a computer application or a neuroprosthesis [1], [2], [3]. The P300 potential is an event-related potential (ERP) evoked by infrequent stimuli interspersed with routine stimuli, and it usually appears as a peak about 300 ms after the stimulus. It was first introduced into BCI applications by Farwell and Donchin in 1988 to control a mental prosthesis [4], [5].

The main issue for P300 detection in BCI is to extract this weak component, which is submerged in artifacts and ongoing EEG rhythms. An averaging technique is usually employed, in which the responses to a number of stimuli are summed to obtain a clear peak. In the international BCI competitions held in 2003 and 2005, which aimed to validate signal processing and classification methods for brain-computer interfaces, the group of the Wadsworth Center, NYS Department of Health, provided data recorded from their P300-based BCI system. Various methods, such as independent component analysis (ICA), support vector machines (SVM), and the continuous wavelet transform combined with the t-statistic (t-CWT), were proposed, and a few groups obtained correct rates above 90% when all 15 repetitions of the dataset were used.

The ICA-based approach [6] decomposes the multichannel EEG data into independent components (ICs) with the FastICA algorithm in the training phase, manually selects the ICs based on a priori knowledge of the typical spatio-temporal pattern of the P300 potential over the scalp, and finally projects them back onto the scalp. During testing, the trained matrix is applied directly to filter the data. However, the iterative ICA process is time consuming; moreover, the IC selection is rather subjective and subtle, so it is not easy to adjust online to changes in the subject or variations in brain state in practice. The SVM-based method [7] performs well, with high accuracy even when fewer repetitions are used, but its parameter optimization process is likewise unsuitable for online training. The t-CWT based method [8], which takes the maxima of the t-value scalogram of the CWT coefficients as features, is a sound approach and inspired our work.

In our approach, time-domain rather than time-frequency-domain processing is adopted, since the P300 potential is, in essence, a time-domain discriminative component: a signal that peaks at a latency of about 300 ms and persists for some interval. By filtering the data to a narrow frequency band, the useful frequency components corresponding to the P300 features are preserved. We then compute a weighted sum over all spatio-temporal points so as to maximize the global t-statistic, and the weight coefficients turn out to have a rather simple form. The approach is therefore suitable for online implementation; more importantly, every data point of the EEG recording, rather than only a few of them, contributes to the final features, so more discriminative information is used.

We derive the T-weight method theoretically in Section II. In Section III, the method is applied to the recorded EEG datasets of the P300 speller paradigm in BCI trials, and detailed results are presented. Conclusions and discussion are given in Section IV.

0-7803-9422-4/05/$20.00 ©2005 IEEE

II. T-WEIGHT METHOD FOR FEATURE EXTRACTION

Suppose that $x_k$ and $y_k$ are the signals of two classes $\omega_1$ and $\omega_2$ at point $k$, with mean values $\mu_k$ and $m_k$ and variances $\sigma_k^2$ and $\delta_k^2$, respectively. Consider the weighted sum of two points with coefficients $\alpha$ and $\beta$,

$$f_x = \alpha x_1 + \beta x_2, \qquad f_y = \alpha y_1 + \beta y_2, \tag{1}$$

and we expect $f_x$ and $f_y$ to be good features to distinguish the two classes. We choose the two-sample t-statistic as the criterion to evaluate the separability between $f_x$ and $f_y$:

$$T = C\,\frac{\bar f_x - \bar f_y}{\sqrt{n_1 S_x^2 + n_2 S_y^2}}, \tag{2}$$

where $\bar f_x$ and $\bar f_y$ denote the mean values of $f_x$ and $f_y$, $S_x^2$ and $S_y^2$ denote their variances, $n_1$ and $n_2$ are the numbers of samples of the two classes, and $C$ is a constant. Assuming that the signal values at any two points are uncorrelated, we have

$$T = C\,\frac{\alpha(\mu_1 - m_1) + \beta(\mu_2 - m_2)}{\sqrt{n_1(\alpha^2\sigma_1^2 + \beta^2\sigma_2^2) + n_2(\alpha^2\delta_1^2 + \beta^2\delta_2^2)}} \tag{3}$$

$$T = C\,\frac{\alpha\,\Delta\mu_1 + \beta\,\Delta\mu_2}{\sqrt{\alpha^2\,\Delta\sigma_1^2 + \beta^2\,\Delta\sigma_2^2}} \tag{4}$$

where

$$\Delta\mu_k = \mu_k - m_k, \qquad \Delta\sigma_k^2 = n_1\sigma_k^2 + n_2\delta_k^2 \quad (k = 1, 2).$$

At the maximum of $T$, $\alpha$ and $\beta$ should satisfy

$$\frac{\partial T}{\partial \alpha} = 0 \quad\text{and}\quad \frac{\partial T}{\partial \beta} = 0. \tag{5}$$

Writing $D = \alpha^2\Delta\sigma_1^2 + \beta^2\Delta\sigma_2^2$, the two conditions read $\Delta\mu_1 D = \alpha\,\Delta\sigma_1^2(\alpha\Delta\mu_1 + \beta\Delta\mu_2)$ and $\Delta\mu_2 D = \beta\,\Delta\sigma_2^2(\alpha\Delta\mu_1 + \beta\Delta\mu_2)$; dividing one by the other, we obtain

$$\frac{\alpha}{\beta} = \frac{\Delta\mu_1\,\Delta\sigma_2^2}{\Delta\mu_2\,\Delta\sigma_1^2}. \tag{6}$$

Since $T$ is unchanged when $\alpha$ and $\beta$ are scaled by the same positive factor, we may for convenience set

$$\alpha = \frac{\Delta\mu_1}{\Delta\sigma_1^2}, \qquad \beta = \frac{\Delta\mu_2}{\Delta\sigma_2^2}. \tag{7}$$

When extended to the case of more than two points, the conclusion is similar. That is, to maximize $T$, the weight of point $k$ should be

$$w_k = \frac{\Delta\mu_k}{\Delta\sigma_k^2}. \tag{8}$$

III. EXPERIMENTS AND RESULTS

The above approach is applied to the data processing of the P300 speller paradigm in BCI experiments. Both datasets came from the international data analysis competitions on BCI tasks: one is dataset IIb of BCI Competition 2003 (called dataset 2003 here), and the other is dataset II of BCI Competition III (2005), which contains two subjects (called datasets 2005A and 2005B, respectively). Both were provided by the Wadsworth Center, NYS Department of Health.

A. Experiment design and data acquisition

In the P300 speller paradigm, the subject was presented with a 6 by 6 matrix of characters (Fig. 1a). The subject's task was to focus attention on the characters of a word prescribed by the investigator (i.e., one character at a time). All rows and columns of this matrix were successively and randomly intensified at a rate of 5.7 Hz. Row/column intensifications were block randomized in blocks of 12. Sets of 12 intensifications were repeated 15 times for each character; this is called an epoch. 64-channel EEG signals (Fig. 1b) were recorded from the subject and digitized at 240 Hz.

Fig. 1. P300 speller paradigm. (a) Character matrix presented to the subject: the characters are arranged in 6 rows and 6 columns, and each row or column flashes in a random order (cited from [9]). (b) Distribution of the EEG electrodes on the scalp.

Two out of the 12 intensifications of rows or columns contained the desired character (i.e., one particular row and one particular column). The responses evoked by these infrequent stimuli differ from those evoked by the stimuli that do not contain the desired character: they contain a P300 component. Our task is to find the one row among the six rows and the one column among the six columns that are associated with the target character.

Our processing procedure for the P300 speller paradigm is illustrated in Fig. 2: EEG data filtering, averaging, PCA transformation, T-weighted feature summation, and feature classification are applied in sequence to perform the character recognition task, as described in the following subsections.

Fig. 2. Data processing flowchart for the P300 speller paradigm.
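As a quick sanity check of the two-point result in Section II, the sketch below scans directions of $(\alpha, \beta)$ in equation (4) and compares the numerically best ratio with the closed form (6). It assumes, as the derivation does, that the two points are uncorrelated; the values of $\Delta\mu_k$ and $\Delta\sigma_k^2$ are hypothetical, chosen only for illustration, and $C$ is set to 1.

```python
import numpy as np

def t_two_point(alpha, beta, d_mu, d_sigma2, C=1.0):
    """T from equation (4) for weights (alpha, beta), assuming uncorrelated points."""
    num = alpha * d_mu[0] + beta * d_mu[1]
    den = np.sqrt(alpha**2 * d_sigma2[0] + beta**2 * d_sigma2[1])
    return C * num / den

# Hypothetical statistics: mean differences and pooled variance terms per point.
d_mu = np.array([2.0, 0.5])
d_sigma2 = np.array([1.0, 4.0])

# Closed form, equation (7): alpha = d_mu1 / d_sigma1^2, beta = d_mu2 / d_sigma2^2.
alpha_opt, beta_opt = d_mu / d_sigma2

# T is invariant to positive scaling of (alpha, beta), so scanning
# directions on the unit circle is enough to locate the maximum.
angles = np.linspace(0.0, 2.0 * np.pi, 200001)
t_scan = t_two_point(np.cos(angles), np.sin(angles), d_mu, d_sigma2)
best = angles[np.argmax(t_scan)]

ratio_scan = np.cos(best) / np.sin(best)                        # numerically best alpha/beta
ratio_eq6 = (d_mu[0] * d_sigma2[1]) / (d_mu[1] * d_sigma2[0])   # equation (6)
t_opt = t_two_point(alpha_opt, beta_opt, d_mu, d_sigma2)        # T at the weights of (7)
```

With these numbers, (6) gives $\alpha/\beta = 16$, and the weights of (7) reach $T = \sqrt{\Delta\mu_1^2/\Delta\sigma_1^2 + \Delta\mu_2^2/\Delta\sigma_2^2} = \sqrt{4.0625}$, which no scanned direction exceeds.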

B. Preprocessing

1) Low-pass filtering: The raw data are first low-pass filtered. Although the contest winner used a cut-off frequency of 8 Hz, we found 4 Hz to be sufficient.
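A minimal sketch of this step, assuming the raw EEG is stored as a (channels × samples) array sampled at 240 Hz. The paper does not state the filter design; the zero-phase FFT brick-wall filter below is our own choice, used only to keep the example dependency-free. A Butterworth filter applied forward and backward would be a more conventional alternative.

```python
import numpy as np

def lowpass_fft(eeg, fs=240.0, cutoff=4.0):
    """Zero-phase low-pass along the last (time) axis by zeroing FFT bins above `cutoff` Hz.

    eeg: array of shape (..., n_samples), e.g. (channels, samples).
    """
    n = eeg.shape[-1]
    spectrum = np.fft.rfft(eeg, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[..., freqs > cutoff] = 0.0          # discard everything above the cut-off
    return np.fft.irfft(spectrum, n=n, axis=-1)

# Usage on a synthetic two-channel record: the 2 Hz component survives, 20 Hz is removed.
fs = 240.0
t = np.arange(2400) / fs                         # 10 s at 240 Hz
x = np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 20 * t)
filtered = lowpass_fft(np.vstack([x, x]), fs=fs, cutoff=4.0)
```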

2) Block averaging: Starting from the onset of each intensification, we extract a 700 ms segment of the signal as a trial. A block contains 12 trials, in which every row and every column flashes once, and 15 blocks compose an epoch. In time order, the first k blocks are averaged separately for each row or column. Results for different values of k are given below for comparison.
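A sketch of the trial cutting and block averaging, under hypothetical data conventions that are not from the paper: `signal` is a (channels × samples) array at 240 Hz, `onsets` lists the sample index of every intensification in time order, and `codes` gives the row/column index (0-11) of each intensification. Because every block flashes each row and column exactly once, the first k occurrences of a code come from the first k blocks. A 700 ms window at 240 Hz spans 168 samples.

```python
import numpy as np

def average_first_k_blocks(signal, onsets, codes, k, fs=240.0, win_ms=700.0):
    """Average the first k blocks separately for each of the 12 rows/columns.

    signal : (channels, samples) continuous EEG (hypothetical layout)
    onsets : sample index of each intensification, in time order
    codes  : row/column index 0..11 of each intensification (hypothetical coding)
    Returns an array of shape (12, channels, window_samples).
    """
    win = int(round(win_ms * fs / 1000.0))      # 700 ms at 240 Hz -> 168 samples
    onsets = np.asarray(onsets)
    codes = np.asarray(codes)
    averaged = np.zeros((12, signal.shape[0], win))
    for code in range(12):
        # Each block flashes every row/column once, so the first k onsets
        # of this code belong to the first k blocks.
        first_k = onsets[codes == code][:k]
        trials = np.stack([signal[:, s:s + win] for s in first_k])
        averaged[code] = trials.mean(axis=0)
    return averaged
```

Calling it with k = 1, 2, ..., 15 on the same epoch yields the inputs needed to compare error rates against the number of repetitions.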

Fig. 3 shows the averaged response at Cz. Note that the target and non-target trials of all characters are averaged. A peak around 300 ms is clearly observed in the EEG epochs evoked by the target row or column, but not in those without the target. However, when averaging is performed within only one epoch, the P300 peak is much harder to distinguish. Fig. 4 shows the weak target responses among the non-target trials, where the 12 curves correspond to the 12 rows and columns. The 2 target responses are drawn with thick lines; otherwise, it would be difficult to pick them out.

Fig. 3. Averaged response at electrode Cz. Solid line: target; dashed line: non-target; all trials averaged.

Fig. 4. Original signals of an epoch (15 blocks averaged).

C. Feature extraction and classification

1) PCA data transformation: PCA is performed on the 64-channel signals not for dimensionality reduction but for decorrelation, because feature extraction with the T-weight (8) requires the points to be uncorrelated. Each PC channel is then normalized so that all transformed channels have unit variance. The PCA transform matrix applied to the original 64-channel EEG signals is presented in Fig. 5.
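One way to realize this step, sketched under assumptions: the PCA is estimated from a (channels × samples) array of filtered training EEG (the paper does not say exactly which data are used), all components are kept, and each component is scaled to unit variance, which amounts to PCA whitening of the spatial covariance. The function name `pca_whitening` and the small `eps` regularizer are our own.

```python
import numpy as np

def pca_whitening(train, eps=1e-12):
    """PCA on the channel covariance, keeping all components and scaling each to unit variance.

    train : (channels, samples) EEG used to estimate the transform (assumed: filtered training data)
    Returns (W, mean) such that W @ (x - mean) yields uncorrelated, unit-variance PC channels.
    """
    mean = train.mean(axis=1, keepdims=True)
    cov = np.cov(train - mean)                  # (channels, channels); rows are variables
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues of a symmetric matrix
    order = np.argsort(eigvals)[::-1]           # sort PCs by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Row i of W projects onto PC i and divides by that PC's standard deviation.
    W = eigvecs.T / np.sqrt(eigvals + eps)[:, None]
    return W, mean
```

Applying `W @ (x - mean)` to any segment gives the virtual PC channels on which the T-weights are then computed.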

Fig. 5. PCA transform matrix, plotted against the index of PC.

2) Feature classification using T-weighted sum: The T-weight method is now applied to the PCs of the 64 virtual channels. The procedure is as follows: first, compute the weights for the PCs of the training data using (8); second, compute the weighted sums of all training and testing trials and take them as the features; finally, classify every character epoch by selecting the largest feature among the 6 rows and the largest among the 6 columns to determine a character. The computed T-weights are shown in Fig. 6. It can be seen clearly that P300 patterns exist in most PCs, though they differ in intensity and phase.

Fig. 6. T-weight calculated from dataset 2003 (index of PC versus time in ms).

Fig. 7 shows the weighted sums, i.e., the features. The horizontal axis is the index of the semi-character epochs (the six rows or the six columns of one character epoch) and the vertical axis is the feature value. Circles represent target trials and crosses represent non-target trials, and the two classes can be separated very clearly. In each semi-character epoch the largest feature is selected, and the selected row and column of an epoch together determine the final target character.

Results obtained with different numbers of repetitions are compared in Fig. 8. As the number of repetitions increases, the character recognition error decreases rapidly. For dataset 2003, when more than 8 repetitions are used, all 31 test characters are recognized correctly. For datasets 2005A and 2005B, correct rates of 96% and 84%, respectively, are obtained with 15 repetitions.

Fig. 8. Character recognition error rate with different averaging repetitions.
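The feature-extraction and classification steps of subsection C.2 can be sketched as follows. Assumptions not stated in the paper: each averaged, PCA-transformed trial is a (PCs × time) array; `is_target` marks trials of the attended row or column; variances are population variances; and the 12 features of an epoch are ordered as 6 columns followed by 6 rows. All function names are hypothetical.

```python
import numpy as np

def t_weights(trials, is_target):
    """Equation (8): w = d_mu / d_sigma2 at every (PC, time) point.

    trials    : (n_trials, n_pcs, n_times) averaged, PCA-transformed responses
    is_target : boolean array (n_trials,), True for the attended row/column
    """
    target, nontarget = trials[is_target], trials[~is_target]
    n1, n2 = len(target), len(nontarget)
    d_mu = target.mean(axis=0) - nontarget.mean(axis=0)
    d_sigma2 = n1 * target.var(axis=0) + n2 * nontarget.var(axis=0)
    return d_mu / d_sigma2

def features(trials, w):
    """T-weighted sum over all spatio-temporal points; one feature per trial."""
    return np.tensordot(trials, w, axes=([1, 2], [0, 1]))

def pick_character(feat12, matrix):
    """Choose the largest of 6 column features and of 6 row features.

    feat12 : 12 features of one epoch, assumed ordered as columns 0-5, then rows 0-5
    matrix : 6 x 6 grid of characters, indexed as matrix[row][col]
    """
    col = int(np.argmax(feat12[:6]))
    row = int(np.argmax(feat12[6:]))
    return matrix[row][col]
```

Only the training trials are used to compute `w`; the same `w` is then applied to the testing trials, and `pick_character` is called once per character epoch.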

IV. DISCUSSION AND CONCLUSION

The proposed T-weight method for feature extraction was applied to P300 detection in the EEG data analysis competitions and obtained results comparable to the best ones. Furthermore, our approach has advantages over the other approaches in the following aspects: 1) It is a general-purpose approach that hardly depends on a particular application or domain, so little a priori knowledge (such as the exact latency of the P300 peak) is required. 2) The training process is fast because it involves no optimization or iteration, so it is well suited for online training and can be updated periodically to adapt to variations.

It is interesting that the weight in (8), which maximizes the global t-statistic, is similar to the local t-statistic at each point, except that its denominator is the pooled variance rather than the pooled standard deviation (i.e., the denominator is not square-rooted).
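The paper states the result for more than two points without proof. Under the same assumption of uncorrelated points, a short argument (our own sketch, not from the original text) follows from the Cauchy-Schwarz inequality and makes the link to the local t-statistic explicit. Here $\Delta\mu_k$ is the class-mean difference at point $k$, $\Delta\sigma_k^2 = n_1\sigma_k^2 + n_2\delta_k^2$ its pooled variance term as in Section II, and $T_k = C\,\Delta\mu_k/\Delta\sigma_k$ the t-statistic of point $k$ alone.

$$T(\mathbf{w}) = C\,\frac{\sum_k w_k\,\Delta\mu_k}{\sqrt{\sum_k w_k^2\,\Delta\sigma_k^2}} = C\,\frac{\sum_k \left(w_k\,\Delta\sigma_k\right)\left(\Delta\mu_k/\Delta\sigma_k\right)}{\sqrt{\sum_k \left(w_k\,\Delta\sigma_k\right)^2}} \le C\,\sqrt{\sum_k \frac{\Delta\mu_k^2}{\Delta\sigma_k^2}} = \sqrt{\sum_k T_k^2}$$

Equality holds exactly when $w_k\,\Delta\sigma_k$ is a positive multiple of $\Delta\mu_k/\Delta\sigma_k$, that is,

$$w_k \propto \frac{\Delta\mu_k}{\Delta\sigma_k^2} = \frac{T_k}{C\,\Delta\sigma_k},$$

which is (8). The optimal weight is thus the local t-value divided once more by the pooled standard deviation, and the largest attainable global statistic is the root-sum-square of the local t-values.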

Fig. 7. Feature scatters of the training and testing data, where the circles represent the summed feature values of the target trials in each semi-character epoch, and the crosses, located much lower, are those of the non-target trials. (a) Distribution of training features. (b) Distribution of testing features.

It should be emphasized that proper preprocessing is needed to ensure good behavior of the T-weight method. In fact, the optimal weight (8) requires that every two signal points be uncorrelated. To satisfy this condition, the direct approach is to whiten the covariance matrix of the whole spatio-temporal plane, but this is not feasible in practice. First, because the number of samples is small, the estimated covariance matrix is usually singular; second, eigenvalue decomposition of the covariance matrix is time consuming because of the large number of spatio-temporal points. So, as a compromise between the decorrelation requirement and computational cost, we choose PCA. In PCA the dimension equals the number of channels, so the computation is fast and the resulting PCs are uncorrelated in space. Though this is only an approximation, experience shows that it dramatically improves the performance. Another way to solve this problem is to derive the optimal weight without any assumption about distribution and correlation, which is our next work.
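The first obstacle can be illustrated numerically. In this paper's setting, a 700 ms window at 240 Hz has 168 samples per channel, so the spatio-temporal plane has 64 × 168 = 10,752 points and its covariance matrix about 1.16 × 10^8 entries. The sketch below uses smaller, hypothetical sizes and random data: with fewer trials than points, the sample covariance is rank-deficient (rank at most one less than the number of trials), whereas a 64 × 64 channel covariance estimated from many time samples is full rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: far fewer training trials than spatio-temporal points.
n_trials, n_points = 100, 640
X = rng.standard_normal((n_trials, n_points))         # one row per trial

full_cov = np.cov(X, rowvar=False)                    # (640, 640) spatio-temporal covariance
full_rank = np.linalg.matrix_rank(full_cov)           # at most n_trials - 1, so singular

# The PCA compromise only needs a channels x channels covariance, estimated
# from many time samples (here 64 channels and 10 s of data at 240 Hz).
Y = rng.standard_normal((64, 2400))
spatial_rank = np.linalg.matrix_rank(np.cov(Y))       # full rank
```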

For online realization of the P300-based BCI paradigm, this approach can be made even faster. Observing the distribution of the T-weights on the spatio-temporal plane (Fig. 6), we find that the values at most points are near zero, which means there is little discriminative information at these points. Another advantage for online realization is that the T-weights are relatively stable for a specific subject, so we can train them over some period, then set a threshold and use only the points above that threshold. In this way, the weighted sum, and hence the decision, can be computed rather fast.
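A sketch of the thresholding idea, under our own assumptions: the threshold is applied to the absolute weight (since T-weights can be negative) and is chosen here as a quantile, so that a hypothetical fraction `keep_fraction` of the points is retained. The paper does not specify how the threshold is set.

```python
import numpy as np

def sparsify_weights(w, keep_fraction=0.1):
    """Keep only the points whose |weight| reaches a quantile threshold (hypothetical rule).

    Returns flat indices of the kept points and their weights.
    """
    threshold = np.quantile(np.abs(w), 1.0 - keep_fraction)
    mask = np.abs(w) >= threshold
    return np.flatnonzero(mask), w.ravel()[mask.ravel()]

def fast_feature(trial, idx, w_sel):
    """Weighted sum restricted to the selected spatio-temporal points."""
    return float(trial.ravel()[idx] @ w_sel)
```

The selected indices and weights are computed once after training; each new trial then costs only `len(idx)` multiply-adds instead of one per spatio-temporal point.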

ACKNOWLEDGMENT

The work of this paper is supported by the Distinguished Young Scholars Fund of China (60225015), Natural Science Foundation of China (30370416), Ministry of Education of China (TRAPOYT Project), and Specialized Research Fund for the Doctoral Program of Higher Education of China (20049998012).

The P300 datasets and some other materials used in this paper were acquired from the organizers of the BCI competition and can be found via the competition website (http://ida.first.fraunhofer.de/projects/bci/competition-iii/).

REFERENCES

[1] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain-computer interfaces for communication and control," Clin. Neurophysiol., vol. 113, pp. 767-791, 2002.

[2] B. Blankertz, K.-R. Müller, G. Curio, T. M. Vaughan, G. Schalk, J. R. Wolpaw, A. Schlögl, C. Neuper, G. Pfurtscheller, T. Hinterberger, M. Schröder, and N. Birbaumer, "The BCI competition 2003: Progress and perspectives in detection and discrimination of EEG single trials," IEEE Trans. Biomed. Eng., vol. 51, pp. 1044-1051, 2004.

[3] J. R. Wolpaw, N. Birbaumer, W. J. Heetderks, D. J. McFarland, P. H. Peckham, G. Schalk, E. Donchin, L. A. Quatrano, C. J. Robinson, and T. M. Vaughan, "Brain-computer interface technology: A review of the first international meeting," IEEE Trans. Rehab. Eng., vol. 8, no. 2, pp. 164-173, 2000.

[4] L. A. Farwell and E. Donchin, "Talking off the top of your head: Toward a mental prosthesis utilizing event-related brain potentials," Electroencephalography and Clinical Neurophysiology, vol. 70, no. 6, pp. 510-523, 1988.

[5] E. Donchin, K. M. Spencer, and R. Wijesinghe, "The mental prosthesis: Assessing the speed of a P300-based brain-computer interface," IEEE Trans. Rehab. Eng., vol. 8, pp. 174-179, 2000.

[6] N. Xu, X. Gao, B. Hong, X. Miao, S. Gao, and F. Yang, "BCI Competition 2003 - Data set IIb: Enhancing P300 wave detection using ICA-based subspace projections for BCI applications," IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1067-1072, June 2004.

[7] M. Kaper, P. Meinicke, U. Grossekathoefer, T. Lingner, and H. Ritter, "BCI Competition 2003 - Data set IIb: Support vector machines for the P300 speller paradigm," IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1073-1076, June 2004.

[8] V. Bostanov, "BCI Competition 2003 - Data sets Ib and IIb: Feature extraction from event-related brain potentials with the continuous wavelet transform and the t-value scalogram," IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1057-1060, June 2004.

[9] G. Schalk, D. J. McFarland, T. Hinterberger, N. Birbaumer, and J. R. Wolpaw, "BCI2000: A general-purpose brain-computer interface (BCI) system," IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1034-1043, June 2004.
