
BCI-CONTROLLED TONE MATRIX

By Laura Ileana Stoilescu, Tristan Didier

ABSTRACT

The devices typically used for detecting neural signals are, among other things, expensive, difficult to work with and fragile. With the Emotiv EPOC headset we do not have to worry about these aspects, at the price of lower-quality signals. Still, the electroencephalography (EEG) data recorded with this headset makes a large range of applications possible. We demonstrate a brain-controlled tone matrix which works on principles similar to those of the P300 speller: the interface flashes the rows of one column of the matrix, and a P300 brain potential is elicited when the flashed row matches the position at which we want to set the tone. Each position represents a different sound frequency, mapped in increasing order from bottom to top.

EEG signals are transmitted wirelessly from the headset to the processing module in Matlab which selects, adjusts and classifies the data received in order to discriminate the P300 signal. When a certain position triggers a P300, the matrix marks the position as set and generates the associated tone. We discuss the challenges in making our initial application more practical and reliable.

GENERAL TERMS

Algorithm, Design, Implementation, Measurement, Performance

KEYWORDS

EEG, P300, Neural Signal Processing, Brain-Computer Interface, Emotiv, Event-Related Potentials

1. INTRODUCTION

In this paper we present the concept, design, implementation and evaluation of a wave synthesizer inspired by Andre Michelle's Tone Matrix [1], controlled through a brain-computer interface (BCI). The application itself is a graphical interface (GI) through which the user sets the tonality of each sound. In Andre Michelle's version, the interface is a 16-by-16 matrix whose axes are time (horizontal direction) and frequency (vertical direction). We followed the same idea with an 8-by-8 matrix in which the positions are set by running a P300 detection algorithm on the electrical signals acquired from an Emotiv EPOC EEG headset. We used two detection methods: linear discriminant analysis (LDA) applied to data transformed with principal component analysis (PCA), and a multi-class support vector machine (MC-SVM) implemented with LIBSVM 3.0.1.

We demonstrate a brain-controlled input method which works on the same principles as a P300 speller: for each time slot the rows are flashed, and a P300 brain potential is elicited when a flashed row matches the position the user wants to set. After setting a position, the user has the possibility to add another tone to that time slot by choosing another position. The column is changed when the interface has received the classifier's decision, so as soon as we have the selected position, we switch automatically to the next column.

2. BRAIN-COMPUTER INTERFACE

The actual realization of the BCI poses many challenges. The Emotiv headset is not a research-grade EEG headset and, as a result, there is a significant amount of noise in the recorded brain waves, which requires more complex signal processing and machine learning techniques to classify neural events such as the P300. The headset does, however, provide an encrypted wireless interface between the headset and the computer.

2.1. TONE MATRIX DESIGN

The interface was programmed in C++. The SDL library [2] is used to display the matrix, and the fmod library [3] to play the sounds. To communicate with the other parts of the system, we used virtual serial ports [4], together with an additional library for the serial communications [5].

The offline protocol: The aim of the offline part was to log data in order to compare the effectiveness of different classifiers. To do that, the subject was asked to focus on a particular square of the matrix before the flashing started. We arbitrarily chose, as targets, the squares of the diagonal running from the top left corner to the bottom right corner.

The modus operandi is briefly described below.

For each column:

- The target square is colored in red.
- Delay: 3 seconds.
- The target is colored with its initial color.
- Delay: 1 second.
- The blinking is launched. During a "blinking session", each line is flashed three times, in a random order. Each "line-blinking" is composed of an on-slot of 100 ms and an off-slot of 75 ms. During the on-slot, the line is covered with a white rectangle; during the off-slot, the white rectangle is removed.
- When the blinking session is finished, the next column is automatically selected and the same operations are repeated. This is done until all the columns have been swept.

The program sends the following markers through the serial port linked to the Emotiv recording:

y    Start of interface
z    All columns have been swept
a    New column
b    End of column
x1   The flashed line x is a target
x0   The flashed line x is a non-target
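On the Matlab side, these markers might be read along the following lines; this is only a rough sketch, and the port name, baud rate and read loop are assumptions made for illustration.

% Sketch: reading the protocol markers from the virtual serial port.
s = serial('COM3', 'BaudRate', 115200, 'Timeout', 1);   % assumed port settings
fopen(s);
markers = {};
while true
    m = strtrim(fscanf(s));          % one marker per read, e.g. 'a', '41', 'z'
    if isempty(m), continue; end
    markers{end+1} = m;              % keep the marker sequence for epoching
    if strcmp(m, 'z'), break; end    % 'z': all columns have been swept
end
fclose(s);
delete(s);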

The online protocol: The aim of the online part is to use the whole system to "compose" and play a musical loop.

The main differences from the offline part are:

- For each column, the user can select the line he wants. So the markers that the interface sends are now different for each line.

- The Matlab program sends feedback after each blinking session, and this feedback is used to update the matrix. The feedback is just the line number produced by the classifier (a sketch of this feedback path is given at the end of this section).

- The flashing is now totally random, and we switch to the next column only when the classifier has sent a feedback.

So the new modus operandi is:

We sequentially highlight each column as the current one. A "blinking session" is launched. When the Matlab answer is received, the blinking is stopped, the corresponding square is colored in green, and the corresponding sound is played. The next column is automatically selected and we repeat the same operations. This is repeated until all the columns have been swept.

Once all the columns have been swept, the musical-loop corresponding to the selected squares is played, until the user presses the escape key.

The program sends mostly the same markers as in the offline version, excluding, of course, the target/non-target markers, which are now to be decided by the classifier.
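The feedback path from Matlab back to the interface can be sketched as follows; the port name and the plain-text number format are assumptions made for illustration, not the exact protocol used.

% Sketch: sending the classifier's decision (the selected line number, 1-8)
% back to the interface over the feedback virtual serial port.
fb = serial('COM4', 'BaudRate', 115200);   % assumed port settings
fopen(fb);
fprintf(fb, '%d\n', selectedLine);         % the interface stops flashing on reception
fclose(fb);
delete(fb);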

2.2. P300

Using the P300 neural signal, the Tone Matrix recognizes the positions to be set for each time slot. In the next sections we present an overview of the P300 signal and of the wireless Emotiv EPOC EEG headset used by our Tone Matrix application, together with the motivation for the design choices we made.

Event-Related Potentials (ERPs) are EEG components that directly measure the electrical response of the cortex to sensory, affective or cognitive events. They are voltage fluctuations induced within the brain as the sum of a large number of action potentials (APs) that are time-locked to sensory, motor or cognitive events. ERPs have quite small amplitudes relative to the background EEG activity, reaching only 1-30 µV, and therefore often require a synchronous signal-averaging procedure to enhance the evoked potential and suppress the background noise [6].

The elicited ERP comprises two main components: the mismatch negativity (MMN) and the novelty P300, which refers to the component elicited by events about which the subject has not been instructed prior to the experiment. The P300 wave reflects cognitive functions involved in orientation of attention, contextual updating, response modulation and response resolution, and consists of two overlapping subcomponents, P3a and P3b. The two subcomponents are most visible in different areas of the scalp:

- P3a: maximum amplitude over frontal areas
- P3b: maximum amplitude over parietal areas

When somebody concentrates on a task-specific stimulus (e.g., a highlighted row in the Tone Matrix) among a pool of stimuli (e.g., non-highlighted rows), the task-related stimulus will elicit a positive peak with a latency of about 300 ms from stimulus onset in the subject's EEG signal.

A classic example of an experiment driven by P300 signals is the P300 speller: a grid of 6×6 alphanumeric characters is presented to a subject. The subject focuses on a specific character while the rows and columns are randomly flashed. Whenever a row or column containing that specific character flashes, a P300 signal is elicited in the subject's EEG. The speller then predicts the character that the subject intends to select by determining the row and column that correspond to P300 signals in the subject's EEG and taking the letter at their intersection.

Such a signal is shown in Figure 1 after averaging over all trials in a dataset. Initially we considered averaging the target trials and all non-target trials together, independently of the flashed row. This method gave very good results in the offline part of the project (87%-93% accuracy), but in the online part the classifier could not discriminate between target and non-target. This is because in the online part we could not know beforehand which trials were non-target, so we had to average them separately for each line. Doing so, the neighboring lines of the target line also presented a P300, and this effect was no longer dissipated by averaging with all the non-neighboring lines.

By averaging the recordings, the brain activity which is time-locked to the stimulus onset is extracted as the ERP, while the activity that is not time-locked is averaged out. The drawback of averaging is that it leads to a dramatic increase in the time needed to communicate a position, hence the continued effort to develop new features and classifiers that reduce the required number of trials [7].

Figure 1: Trial- and sensor-averaged data showing a P300 ERP (blue) versus brain activity when non-target stimuli were presented (red), on a 10^-6 V voltage scale
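As a rough illustration of the per-line averaging described above, the following Matlab sketch assumes the recordings have already been cut into an array epochs of size nLines x nRepetitions x nChannels x nSamples; the array name and layout are assumptions made for illustration.

% Sketch: per-line trial averaging of epoched data.
lineAvg = squeeze(mean(epochs, 2));    % average the repetitions of each flashed line
erp     = squeeze(mean(lineAvg, 2));   % then average over channels
% 'erp' is nLines x nSamples: one waveform per flashed line; the line whose
% waveform peaks around 300 ms after flash onset is the P300 candidate.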

2.3. WIRELESS EEG HEADSET

We used the Emotiv EPOC headset, which has 14 data-collecting electrodes and two reference electrodes. The sampling rate is 128 Hz; the electrodes are placed roughly according to the international 10-20 system and are labeled as such. The headset transmits encrypted data wirelessly to a Windows-based machine; the wireless chip is proprietary and operates in the same frequency range as 802.11 (2.4 GHz).

The software that comes with the Emotiv headset provides the following detection functionality:

- various facial expressions (Expressive)

- levels of engagement, frustration, meditation, and excitement (Affective)

- subject-specific training and detection of certain cognitive neuro-activities such as push, pull, rotate and lift (Cognitive)

Also built into the headset is a gyroscope that detects changes in the orientation of the subject's head. The headset is not meant to be an extremely reliable device, so it is challenging to extract finer P300 signals from the EEG it produces.

However, this headset can easily be deployed at large scale because of its low price, and it can be extremely useful if we manage to extract meaningful signals (e.g., the P300) from it through smart signal processing and classification algorithms.

Figure 2: Emotiv EPOC electrodes

3. DESIGN

In the design of the application we had to research many aspects in order to obtain a reliable and robust Tone Matrix system.

Since the Emotiv headset is not intended for highly accurate signal detection, each EEG electrode carries more noise than in other headsets (e.g., Biosemi). In such a noisy scenario, the data we use for obtaining the P300 ERP can be completely invalidated. For time-locked events such as the P300, the only real solution is averaging; it is also used for BAEPs (Brainstem Auditory Evoked Potentials), where the averaging is done over 1000-2000 trials.

We studied different solutions for increasing the signal-to-noise ratio (SNR), such as band-pass filtering, de-trending, re-referencing, and electrode and trial averaging. Averaging introduces a delay in the acquisition of a P300 signal because we need several trials before actually detecting the P300.


The band-pass filter is used to remove any noise that is outside the P300 frequency range, which we estimated to lie between 1.75 and 5 Hz. We used a zero-pole-gain (ZPG) design of a 4th-order Butterworth filter to keep the 0.1-9 Hz segment, following the example set by the NeuroPhone application [8]. The ZPG version of the filter proved to be more stable, as seen in Figure 3.

Figure 3: Magnitude response of a ZPG filter design (green) versus standard IIR filter design (blue)
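A minimal sketch of this filter design in Matlab, assuming raw is a channels x samples matrix (the variable names are illustrative):

% Sketch: zero-pole-gain band-pass Butterworth design and zero-phase filtering.
fs = 128;                                             % Emotiv EPOC sampling rate
[z, p, k] = butter(4, [0.1 9] / (fs/2), 'bandpass');  % 4th-order Butterworth, 0.1-9 Hz
[sos, g]  = zp2sos(z, p, k);                          % second-order sections for stability
filtered  = filtfilt(sos, g, raw')';                  % zero-phase filtering of each channel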

Given the strong tendency of the rear sensors to display the P300 signal, the default reference locations at approximately P3/P4 will contain a significant P300 component which is subtracted from all the other channels; in other words, the P300 signal will be larger and more detectable on more channels if a reference away from the parietal area is used. We therefore chose AF3 and AF4 as reference electrodes.
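A one-line sketch of this re-referencing step, assuming data is channels x samples and labels is a cell array of the Emotiv channel names (illustrative names):

% Sketch: re-referencing all channels to the mean of AF3 and AF4.
ref  = mean(data(ismember(labels, {'AF3', 'AF4'}), :), 1);
data = data - repmat(ref, size(data, 1), 1);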

As classification algorithms we used a Bayesian statistical classifier based on PCA and LDA, and a support vector machine (SVM) classifier provided by the LIBSVM Matlab library. Both classifiers gave almost the same results; the difference lies in the processing time, which is roughly 2.5 seconds for the SVM classifier and 6.5 seconds for the optimal Bayes classifier.

3.1 LINEAR DISCRIMINANT ANALYSIS

LDA is a machine learning method for finding a linear combination of features which separates two or more classes of events. The resulting combination may be used as a linear classifier or for dimensionality reduction before classification. LDA projects the data into a space of maximum variance between classes (between-class) and minimum variance inside a class (within-class). The within-class covariance matrix is inverted, and if there is not enough variance on all dimensions this matrix is singular. PCA is used to eliminate the redundant dimensions and obtain a non-singular within-class covariance matrix.

3.2 PRINCIPAL COMPONENT ANALYSIS

Because of the low spatial resolution, EEG channel readings are strongly correlated. This implies that any topographical difference between the two classes would go unobserved. It is therefore important to de-correlate the channels prior to classification.

PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components.

The number of principal components is less than or equal to the number of original variables, meaning that the data is projected into a lower-dimensional space which maximizes the variance. The principal components account for as much of the variability in the data as possible; each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (uncorrelated with) the preceding components.
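A minimal sketch of this PCA-then-LDA pipeline, assuming X is a trials x features matrix, y holds the class labels (1 = target, 0 = non-target) and trainIdx/testIdx are logical index vectors; all names are illustrative, and in older Matlab releases princomp plays the role of pca.

% Sketch: decorrelate and reduce with PCA, then classify with LDA.
[coeff, score, latent] = pca(X);                        % principal components of the features
nComp = find(cumsum(latent) / sum(latent) >= 0.95, 1);  % keep roughly 95% of the variance
Xred  = score(:, 1:nComp);
pred  = classify(Xred(testIdx, :), Xred(trainIdx, :), y(trainIdx), 'linear');  % LDA
acc   = mean(pred == y(testIdx));                       % classification accuracy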

3.3 SUPPORT VECTOR MACHINE CLASSIFICATION

Support Vector Machines (SVMs) are robust two-class classifiers that work by finding "maximum margin" hyper-planes which optimally separate the two classes. The maximum-margin hyper-plane is the one that is furthest away from the nearest data point of each class. Although the original problem may be stated in a finite-dimensional space, it often happens that the sets to discriminate are not linearly separable in that space.

For this reason, it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier in that space.

To keep the computational load reasonable, the mappings used by SVM schemes are designed to ensure that dot products may be computed easily in terms of the variables in the original space, by defining them through a kernel function selected to suit the problem. The hyper-planes in the higher-dimensional space are defined as the sets of points whose inner product with a vector in that space is constant.
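With the LIBSVM Matlab interface mentioned earlier, training and prediction can be sketched as below; the RBF kernel and the parameter values are illustrative assumptions, not the settings reported in this paper.

% Sketch: two-class SVM via LIBSVM's Matlab interface (svmtrain/svmpredict).
model = svmtrain(yTrain, XTrain, '-s 0 -t 2 -c 1 -g 0.05');   % C-SVC with an RBF kernel
[pred, acc, dec] = svmpredict(yTest, XTest, model);           % labels, accuracy, decision values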

The classification method used during training is automatically kept the same for the online testing of the same user.

4. IMPLEMENTATION

In the offline implementation we focus on data pre-processing and on training the classifier. We used one dataset for each of the seven subjects, in which we collected data from all channels using the offline interface described in Section 2.1.

Since synchronization is essential in P300 detection, our greatest challenge was to correctly assign each trial to its corresponding marker. This is because the data was collected by reading the header from the FieldTrip interface, while the markers were sent through a serial port.

The order of operations in the data-collecting and decision algorithms was therefore of major significance. In order to process the data efficiently, Matlab waits for all the rows of a column to be flashed and only then begins to pre-process the data and add it to the training set. The training set is saved and reused for the same user in the online performance.
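Reading the incoming samples through the FieldTrip buffer can be sketched roughly as follows; the buffer address and the sample indices derived from the markers are assumptions made for illustration.

% Sketch: polling the FieldTrip buffer for newly acquired Emotiv samples.
host = 'buffer://localhost:1972';                 % assumed buffer address
hdr  = ft_read_header(host, 'cache', true);       % header holds the sample count so far
dat  = ft_read_data(host, 'header', hdr, ...
                    'begsample', begSample, ...   % epoch boundaries derived from
                    'endsample', endSample);      % the serial-port markers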

The features fed to the classifier were the time-domain amplitude variances of the trial-averaged data.
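A sketch of this feature computation, assuming avgTrials is an nLines x nChannels x nSamples array of trial-averaged data (names are illustrative):

% Sketch: time-domain amplitude variance per channel, for each flashed line.
[nLines, nChan, nSamp] = size(avgTrials);
features = zeros(nLines, nChan);
for li = 1:nLines
    perChannel = squeeze(avgTrials(li, :, :));   % nChannels x nSamples
    features(li, :) = var(perChannel, 0, 2)';    % variance over the time samples
end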

For classifier training we used the same interface as for the online performance, to keep the training conditions as similar as possible to the testing ones. The intended positions were noted before the online testing so that we could evaluate the application.

We chose to play the resulting tune at the end of the matrix evaluation to avoid having auditory P300 artifacts in the collected data [6]. We concluded that the visual feedback of the set position is sufficient.

In our first approach, described in Section 2.2, we averaged the target and non-target trials separately, without taking into account the identifier of the non-target line. The results were very optimistic, but they did not match the online data-collecting method, so we adjusted the approach. The results of this first step are summarized in Table 1.

Dataset      LDA+PCA accuracy [%]    SVM accuracy [%]
Dataset 1    90.265                  92.920
Dataset 2    87.500                  89.930
Dataset 3    91.667                  88.095
Dataset 6    84.210                  87.134
Dataset 7    88.452                  90.909
Average      88.418                  89.797


Table 1: Averaging without discriminating non-target trials

In our second approach we applied the same procedures to the data collected both offline (from an existing dataset acquired either through FieldTrip or with TestBench) and online (from the current data stream). The resulting dataset was named training.

For the offline part we applied a mask to divide training into a training set and a test set. We varied the training-set/total-dataset ratio as described in Table 2 and Table 3.

In the online evaluation we kept the whole training dataset and used it, as the name suggests, as the training set for evaluating the online data. Although the classifications obtained high accuracy scores, the rows chosen as targets were not the real choices.

Because all the processing steps were the same for both scenarios, we tried varying different parameters to see which had a bigger impact on the classification.

Dataset 1
Ratio    LDA classification accuracy [%]    SVM classification accuracy [%]
0.5      98.96                              87.50
0.6      99.12                              87.72
0.7      99.25                              87.97
0.8      100                                87.58
0.9      Not enough variance                87.79

Table 2: Different training-set/dataset ratios for the first dataset

Dataset 2
Ratio    LDA classification accuracy [%]    SVM classification accuracy [%]
0.5      98.96                              87.50
0.6      97.37                              87.72
0.7      99.25                              87.97
0.8      98.04                              87.58
0.9      Not enough variance                87.79

Table 3: Different training-set/dataset ratios for the second dataset

Selecting proper training and test set sizes slightly improves the performance. This effect is not consistent, since the training set and the test set are chosen randomly within these limits by the crossvalind Matlab function.
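The hold-out split can be sketched as follows, where ratio is the training-set/total-dataset ratio from Tables 2 and 3 (variable names are illustrative):

% Sketch: random hold-out split with crossvalind (Bioinformatics Toolbox).
nTrials = size(X, 1);
[trainIdx, testIdx] = crossvalind('HoldOut', nTrials, 1 - ratio);   % hold out (1 - ratio) for testing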

Although these are promising results, visual inspection of each column shows that the data for the rows pointed out as targets are far from the P300 shape.

The performance in the online evaluation is not the desired one because, unlike in a P300 speller application, we use far fewer non-target stimuli: by flashing the rows only, we have a target/non-target ratio of 8/56 instead of 16/128.

Therefore, the oddball is more difficult to detect, since the random character of the presented stimuli is less obvious. A solution is to flash the columns along with the rows and to introduce distracter stimuli, such as flashing diagonals or random points of the matrix.


[Figure: average of all electrodes, column 4, target = 4; averaged amplitude versus time [ms], one trace per flashed line (1-8).]

Table 4: Dataset 1, TDR = 0.7, channel 4

Another explanation for this peculiar behavior might be the serial port used for marker transmission, since it is known to introduce a delay between sending and reception. If the effect of the serial communication is consistent enough for data to be assigned to the wrong marker, then the misbehavior of the classifier is understandable.

[Figure: channel 4, column 4, target = 4; amplitude versus time [ms], one trace per flashed line (1-8).]

Table 5: Dataset 2, TDR = 0.7, channel 4

As suggested in the literature, we tried averaging both over time and over electrodes. The accuracy results stayed at the same levels, but the single resulting channel, plotted over different trials, gives us extra information.

Given that the serial communication introduces a delay in the marker transmission, which is a known fact, the P300 would appear much earlier in the trial, since we lose the data between the real marker transmission and the marker reception.

Indeed, if we consider the figure below, showing the target trials (blue) and the non-target trials (red), we can see that before 300 ms all target trials are rising while the non-target trials are not.

We did not expect down-sampling from 128 Hz to bring an improvement, as this is already a low sampling frequency (e.g., in comparison with Biosemi), but we tried it anyway. Down-sampling from 128 Hz to 32 Hz did not have any influence on the results.
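The down-sampling test itself is a one-liner; here filtered is assumed to be a channels x samples matrix.

% Sketch: down-sampling the filtered data from 128 Hz to 32 Hz (factor 4);
% resample applies an anti-aliasing filter internally.
downsampled = resample(double(filtered'), 1, 4)';   % resample works along columns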

[Figure: average of F3 and F4; amplitude versus time [ms], target versus non-target trials.]

Table 6: Electrode-averaged data over six trials: three target (blue) and three non-target (red). The rising trend of the target trials in the first 300 ms is visible.

CONCLUSIONS

One major drawback of this application is the serial port communication, which introduces a delay large enough to considerably complicate the decision process.

Applying LDA and PCA to each trial is, in terms of time, equivalent to computing a common spatial pattern feature. This explains the slightly better results obtained when using the Bayesian statistical classifier based on PCA and LDA.

The time needed for reading the FieldTrip header is 0.008-0.009 s. The classification of each trial is performed only after all the rows have been flashed for the current column, so we have a quasi-offline decision-making algorithm which does not add time to the online data acquisition. If the assumption that the markers are delayed is correct, we can be certain that the processing part plays no role in the delay, only the serial interface.

BIBLIOGRAPHY

1. Michelle, Andre. Tone Matrix. http://lab.andre-michelle.com/tonematrix. [Online] April 11, 2009.

2. Simple DirectMedia Layer. http://www.libsdl.org/. [Online]

3. fmod interactive audio middleware. http://www.fmod.org/. [Online]

4. Null-modem emulator (com0com). http://com0com.sourceforge.net/. [Online]

5. Serial library for C++. http://www.codeproject.com/KB/system/serial.aspx. [Online]

6. Saeid Sanei, J. A. Chambers. EEG Signal Processing. London: John Wiley & Sons Ltd., 2007. ISBN 978-0-470-02581-9.

7. Adrien Combaz, Nikolay V. Manyakov, Nikolay Chumerin, Johan A. K. Suykens, Marc M. Van Hulle. Feature Extraction and Classification of EEG Signals for Rapid P300 Mind Spelling. Leuven: International Conference on Machine Learning and Applications, 2009. ISBN 978-0-7695-3926-3.

8. Andrew T. Campbell, Tanzeem Choudhury, Shaohan Hu, Hong Lu, Matthew K. Mukerjee, Mashfiqui Rabbi, and Rajeev D. S. Raizada. NeuroPhone: Brain-Mobile Phone Interface using a Wireless EEG Headset. Hanover: MobiHeld, 2010. ACM 978-1-4503-0197-8/10/08.