Final Report for Prosthesis ML - aidenyhlee.com · on assessing the impact of gamification on the usability of the arm-sleeve in terms of precision and signal separation between the

Final Report for Prosthesis ML Aiden Lee

Motivation

According to previous research, the use of a prosthetic arm can help upper limb amputees better perform their daily life activities and live more independently. However, this requires the skill to operate such a device efficiently by controlling the contraction of the appropriate muscles. Due to the lack of pre-prosthesis training and the limited integration of prostheses into daily activities, the rejection rate of prostheses among upper limb amputees remains high, at around 81%. Available training methods for myoelectric prostheses are still limited in efficiency and usability. In the standard pre-prosthesis procedure, therapists encourage amputees to perform specific exercises at home between sessions to activate target muscle zones. However, this kind of training is highly inefficient because patients do not get feedback on their training and progress, which usually results in a loss of motivation to continue. Hence, by the time amputees receive their actual prostheses, they struggle to use them effectively and to incorporate them into their daily life activities.

In our work, we evaluate the potential benefit of gamification on motivation and efficiency during the pre-prosthesis training procedure. To this end, an EMG-sensor arm sleeve with 8 channels was used as a controller for a set of mobile games. The user was asked to perform specific hand contractions in order to interact with the game, while the muscle responses were collected using the EMG-sensor arm sleeve. In total, the following six arm contractions were covered: pronation, supination, contraction, wrist flexion, wrist extension and motionless. The contractions were inferred by a hand-motion estimation model that was designed using the EMG signals collected during our data-collection phase. The impact of gamification was assessed using a list of 26 questions that covered the likeability of the game, the ease of use of the arm sleeve as a controller, and the overall experience. The questions were derived from previous psychological studies. The evaluation was performed with 20 healthy participants in a controlled test environment.

Experimental procedure

The experiment was conducted in a controlled environment and was divided into three core phases: a pre-game, an in-game and a post-game phase.

The pre-game phase aimed at collecting training data for building the motion-sensing model. The in-game phase let the user interact with the game through the arm sleeve, after which the overall user experience was assessed using the designed survey. The post-game phase focused on assessing the impact of gamification on the usability of the arm sleeve in terms of precision and signal separation between the different movements. Figure 1 gives an overview of the different steps covered during each phase.

Pre-game phase

This phase aimed at collecting data in a structured manner and in a controlled environment. The data was then used to train the hand-motion models. To this end, an interface was developed to guide the test subject through the different steps covered in this phase, as can be seen in figure 2. The user was briefed on which arm contraction to perform and was then asked to hold the contraction for 10 seconds. The counter was started by clicking the icon corresponding to the contraction once it had been verified that the test subject was performing it properly.

Figure 2: Test Interface used to monitor and guide the user during the pre-game phase

The phase was divided into 5 stages as follows:

- Stage 1 – Hand Movement Analysis:

The test subject was required to perform a set of six hand motions consecutively that covered pronation, supination, contraction, wrist flexion, wrist extension and motionless arm relaxation. The different arm contractions are displayed in figure 3. Each motion was completed in ten seconds and performed to the maximal angle within a movable range. The aim behind this stage was to collect labeled data in a controlled manner for training the arm-contraction classification model.

- Stage 2 – Finger Movement Analysis:

The test subject was required to successively perform a contraction of all five fingers. Each contraction was completed in ten seconds and performed to the maximal angle within a movable range. This stage was considered for potential future use. The motivation behind this stage was to include finger contractions as a possible extension to the current arm-contraction classification model, and thus, increase the granularity of the prediction model.

Figure 3: Arm-contractions covered in the data collection stage during the pre-game phase

- Stage 3 – Arm Movement Analysis:

The test subject was required to perform a hand contraction at three arm positions. Each contraction was completed in ten seconds and performed to the maximal angle within a movable range. This stage aimed at analyzing the impact of a variation in the arm position on the classification of hand contractions. The findings from this stage would then be integrated in our motion-sensing model to compensate for potential variations.

- Stage 4 – Precision Control:

The test subject was required to reach specific strength levels during hand contraction and asked to hold each level for ten seconds. This stage aimed at assessing how well the test subject can reach and hold specific strength levels for a period of 10 seconds. The collected data would then be compared with the data collected for the same stage of the post-game phase to assess the impact of gamification on precision control.

- Stage 5 – Electrode Separation:

The test subject was required to perform two wrist extensions and wrist flexions in succession. The motivation behind this stage was to assess how well the test subject can separate two opposite arm-contractions. This was assessed by comparing the data collected during the pre-game and the post-game phase. For each phase, the channels that contributed most to each contraction were identified. The signal energy in these channels was then compared between the two phases to identify improvements and investigate the potential impact of gamification on this matter.
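The channel comparison described for stage 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: signal energy is taken to be the sum of squared samples per channel, and `separation_ratio` is a hypothetical metric (the share of total energy carried by the dominant channels) introduced only for this example. The data is synthetic.

```python
import numpy as np


def channel_energy(emg):
    """Signal energy (sum of squared samples) of each channel.

    emg: array of shape (n_samples, n_channels).
    """
    emg = np.asarray(emg, dtype=float)
    return np.sum(emg ** 2, axis=0)


def dominant_channels(emg, top_k=2):
    """Indices of the top_k channels with the highest energy, strongest first."""
    energy = channel_energy(emg)
    return np.argsort(energy)[::-1][:top_k]


def separation_ratio(emg, target_channels):
    """Hypothetical metric: fraction of the total energy on the target channels."""
    energy = channel_energy(emg)
    return float(energy[list(target_channels)].sum() / energy.sum())


# Synthetic recordings with 8 channels; channels 2 and 5 carry the contraction.
rng = np.random.default_rng(0)
pre = rng.normal(0.0, 1.0, size=(1000, 8))
pre[:, [2, 5]] *= 3.0
post = rng.normal(0.0, 1.0, size=(1000, 8))
post[:, [2, 5]] *= 5.0

targets = dominant_channels(pre, top_k=2)
print("dominant channels:", sorted(targets.tolist()))
print("pre-game ratio:", separation_ratio(pre, targets))
print("post-game ratio:", separation_ratio(post, targets))
```

A higher ratio after the game would indicate that more of the signal energy is concentrated on the channels associated with the contraction, which is one possible way to express improved separation.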

In-game phase

During this phase, the test subject had the chance to interact with the mobile game for a period of two minutes using the arm sleeve as a controller. The subject was then presented with a questionnaire of twenty-six questions that were selected based on psychological references. The questionnaire aimed at assessing the likeability of the game and the suitability of a potential prosthesis as a game controller. Figure 4 displays the test setup as well as the user while interacting with the mobile game on a laptop simulator.

The questions are listed in the following:

i. Interest/Enjoyment
1. I enjoyed performing this activity very much
2. This activity was fun to do
3. I would describe this activity as very interesting
4. I thought this activity was quite enjoyable

ii. Perceived Competence
1. I think I am pretty good at this activity
2. I think I did pretty well at this activity, compared to other patients
3. I am satisfied with my performance at doing this activity
4. I was pretty skilled at this activity
5. This was an activity that I couldn’t perform very well

iii. Effort/Importance
1. I put a lot of effort in this activity
2. I did not try very hard to do well at this activity
3. It was important to me to do well during this activity

iv. Pressure/Tension
1. I did not feel nervous at all while performing this activity
2. I was very relaxed in performing this activity
3. I felt pressured while performing this activity

v. Perceived Choice
1. I believe I had choice about performing this activity
2. I felt like it was not my own choice to perform this activity
3. I performed this activity because I wanted to

vi. Value/Usefulness
1. I believe this activity could be of some value to me
2. I think that performing this activity is useful for _______
3. I think this is important to do because _______
4. I would be willing to do this again because it has some value to me
5. I think doing this activity could be beneficial to me
6. I think doing this activity could help me to
7. I think this is an important activity

vii. Closing question
1. I answered these questions honestly

Post-game Phase

In this phase, the test subject was required to repeat stages 4 and 5 of the pre-game phase. This aimed at analyzing the potential impact of gamification on precision control and electrode separation. Please refer to stages 4 and 5 of the pre-game phase for more details.

Algorithmic approach:

In this segment, we describe the approach followed in designing the hand-contraction classification model. As detailed in the previous segment, training data is collected during the pre-game phase from all 8 channels of the sensor sleeve for all hand contractions needed to control the mobile games. The data was sampled at a frequency of 1000 Hz. For our purposes we only considered the high-frequency data. The data is first preprocessed to improve the signal-to-noise ratio and eliminate all faulty measurements. The outcome is then normalized, and specific temporal and spatial features are extracted. The feature set was derived from state-of-the-art research and has been reported to achieve higher accuracy for the task at hand. The features were then used to train a random-forest-based classification model, which was used to infer the hand motions while the sensor arm sleeve was in use. Please refer to the ‘Feature engineering’ chapter for more information. Figure 4 gives an overview of the data processing scheme.

Figure 4: Data processing pipeline used for classifying the hand contractions
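The preprocessing steps named above (removal of faulty measurements, restriction to high-frequency content, normalization, and segmentation for feature extraction) can be sketched as follows. Only the 1000 Hz sampling rate and the 8 channels come from the report. The 20 Hz cutoff, the FFT-based filter, the z-score normalization and the window sizes are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np

FS_HZ = 1000       # sampling frequency stated in the report
CUTOFF_HZ = 20.0   # assumed high-pass cutoff, not given in the report


def drop_faulty(emg):
    """Remove time samples (rows) that contain NaN or infinite readings.

    Simplification: dropping rows ignores the small timing gaps this creates.
    """
    emg = np.asarray(emg, dtype=float)
    return emg[np.all(np.isfinite(emg), axis=1)]


def highpass_fft(emg, fs=FS_HZ, cutoff=CUTOFF_HZ):
    """Zero all frequency bins below the cutoff, channel by channel."""
    spectrum = np.fft.rfft(emg, axis=0)
    freqs = np.fft.rfftfreq(emg.shape[0], d=1.0 / fs)
    spectrum[freqs < cutoff, :] = 0.0
    return np.fft.irfft(spectrum, n=emg.shape[0], axis=0)


def zscore(emg):
    """Normalize each channel to zero mean and unit standard deviation."""
    std = emg.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero on flat channels
    return (emg - emg.mean(axis=0)) / std


def sliding_windows(emg, size=200, step=100):
    """Cut the recording into overlapping windows of shape (size, n_channels)."""
    starts = range(0, emg.shape[0] - size + 1, step)
    return np.stack([emg[s:s + size] for s in starts])


# Synthetic 10-second recording from 8 channels with one corrupted sample.
rng = np.random.default_rng(0)
raw = rng.normal(size=(10 * FS_HZ, 8))
raw[123, 4] = np.nan
windows = sliding_windows(zscore(highpass_fft(drop_faulty(raw))))
print(windows.shape)
```

Each window would then be passed to the feature extraction step, and the resulting feature vectors to the classifier.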

Feature engineering:

A number of different feature extraction techniques have been proposed in research to tackle multi-channel myoelectric pattern recognition. The feature set we selected was reported to achieve state-of-the-art performance in terms of robustness against practical problems such as limb position or electrode shift, as well as in terms of classification accuracy. The implemented EMG feature extraction framework captures the structural characteristics of each of the eight available EMG channels. It also extracts information from a set of possible channel combinations to provide cross-channel patterns and context about underlying muscle synergies. The framework can be divided into two components: a temporal component within each channel and a spatial component that accounts for the cross-channel patterns. To increase robustness, the features are extracted from each channel as well as from a nonlinear/smoothed version of it. The two block diagrams in figures 5 and 6 give an overview of the theoretical approach followed in extracting the features.

Figure 5: Block diagram of the first layer of the feature extraction framework

Figure 6: Block diagram of the second layer of the feature extraction framework
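The exact descriptors of the framework are not reproduced in this report, so the following is only a hedged illustration of the two-component idea. For the temporal component it uses a few common time-domain EMG features that also appear in the longitudinal analysis of this report (mean absolute value, root mean square, waveform length, zero crossings, slope sign changes). For the spatial component it assumes, purely for illustration, that cross-channel information is obtained from pairwise channel differences. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def time_domain_features(window):
    """Per-channel temporal features for one window.

    window: array of shape (n_samples, n_channels).
    Returns a 1-D vector: [MAV, RMS, WL, ZC, SSC], each of length n_channels.
    """
    w = np.asarray(window, dtype=float)
    diff = np.diff(w, axis=0)
    mav = np.mean(np.abs(w), axis=0)                 # mean absolute value
    rms = np.sqrt(np.mean(w ** 2, axis=0))           # root mean square
    wl = np.sum(np.abs(diff), axis=0)                # waveform length
    zc = np.sum(w[:-1] * w[1:] < 0, axis=0)          # zero crossings
    ssc = np.sum(diff[:-1] * diff[1:] < 0, axis=0)   # slope sign changes
    return np.concatenate([mav, rms, wl, zc, ssc])


def pairwise_differences(window):
    """Assumed spatial component: difference signal for every channel pair."""
    w = np.asarray(window, dtype=float)
    pairs = list(combinations(range(w.shape[1]), 2))
    return np.stack([w[:, i] - w[:, j] for i, j in pairs], axis=1)


def extract_features(window):
    """Concatenate within-channel and cross-channel features."""
    return np.concatenate([
        time_domain_features(window),
        time_domain_features(pairwise_differences(window)),
    ])


rng = np.random.default_rng(0)
window = rng.normal(size=(200, 8))       # one 200-sample window, 8 channels
print(extract_features(window).shape)    # 5 features x (8 channels + 28 pairs)
```

With eight channels there are 28 channel pairs, so this sketch yields 5 x (8 + 28) = 180 features per window.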

Analysis

There were 20 participants in the study, 5 of whom were recruited from within the lab group and 15 from outside the lab. The 5 participants recruited from within the lab took part in all 8 phases but did not complete a post-game survey. The 15 participants from outside the lab did all 8 phases and completed the post-game survey as well. Additionally, 4 of the participants (3 from inside the lab group and 1 from outside the lab group) were asked to come in again to perform phases 1-3 a second time. The idea was to capture longitudinal data. Unfortunately, the number of days between trial 1 and trial 2 was not held constant, but in all cases the second trial took place around one month later.

Phase 1

A number of approaches were tried for phase 1 cross-user classification. 5 of the 20 datasets were set aside as the test set, meaning that the training set consisted of 15 people. The images below show sample raw values for the same movement performed by two different users.

As can be seen, the raw values for each channel vary quite widely. As a result, the feature extraction explained previously is important: it provides additional dimensions that the subsequent models can exploit. The actual models were implemented using the standard Python scikit-learn library. We tried various machine learning models, including Random Forests, SVMs, KNNs and AdaBoost. In the end, random forests consistently showed higher cross-validation and test scores. To identify the best set of hyper-parameters, we split the training set into 10 folds and ran a randomized search to estimate the hyper-parameter regions showing the highest cross-validation scores. The hyper-parameters we experimented with were:

min_samples_leaf: [1, 2, 4]
bootstrap: [True, False]
n_estimators: [100, 300, 500]
max_features: ['auto', 'sqrt']
max_depth: [10, 30, 50, 70, 90, None]
min_samples_split: [2, 5, 10]

We ran the randomized search for 15 candidates and found the following hyper-parameters to yield the highest average cross-validation score:

min_samples_leaf: 4
bootstrap: True
n_estimators: 300
max_features: auto
max_depth: 70
min_samples_split: 2

We then ran a grid search across parameters in that region and found that a max_depth of 120 and a min_samples_split of 3 yielded:

Cross-validation Score = 60.4%
Test Score = 35.5%

The following graph shows the progression of the training and cross-validation scores.

The gap between the two curves indicates the degree to which the model overfits.

Phase 2

Using the same hyper-parameters, we ran a Random Forest Classifier on the phase 2 data. The scores achieved were:

Cross-validation Score = 81%
Test Score = 24%

The following graph shows the progression of the training and cross-validation scores.

Phase 3

Using the same hyper-parameters, we ran a Random Forest Classifier on the phase 3 data. The scores achieved were:

Cross-validation Score = 63.3%
Test Score = 25.4%

The following graph shows the progression of the training and cross-validation scores.
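The cross-user evaluation and hyper-parameter search used for phases 1 to 3 can be sketched with scikit-learn as follows. This is a minimal sketch under assumptions, not the original script: `split_by_user` and `randomized_rf_search` are hypothetical helper names, accuracy is assumed as the score, and the search space copies the values listed for phase 1 except for `max_features`. Recent scikit-learn releases no longer accept `'auto'` for random forest classifiers (where it was equivalent to `'sqrt'`), so only `'sqrt'` is kept here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Search space from the report; 'auto' is omitted because newer scikit-learn
# versions reject it (for classifiers it meant the same as 'sqrt').
PARAM_DISTRIBUTIONS = {
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
    "n_estimators": [100, 300, 500],
    "max_features": ["sqrt"],
    "max_depth": [10, 30, 50, 70, 90, None],
    "min_samples_split": [2, 5, 10],
}


def split_by_user(user_ids, test_users):
    """Boolean masks (train, test) that keep each user entirely on one side."""
    user_ids = np.asarray(user_ids)
    test_mask = np.isin(user_ids, list(test_users))
    return ~test_mask, test_mask


def randomized_rf_search(X, y, n_iter=15, cv=10, seed=0):
    """Randomized search over the random forest hyper-parameters.

    Defaults mirror the report: 15 sampled candidates, 10 folds.
    """
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=seed),
        param_distributions=PARAM_DISTRIBUTIONS,
        n_iter=n_iter,
        cv=cv,
        scoring="accuracy",  # assumed metric; higher is better
        random_state=seed,
    )
    search.fit(X, y)
    return search
```

A typical use would be to hold out the feature rows of 5 users with `split_by_user`, call `randomized_rf_search` on the remaining rows, read `search.best_params_` and `search.best_score_` (the mean cross-validation accuracy), and finally evaluate `search.best_estimator_` on the held-out users. Note that plain k-fold cross-validation mixes samples of the same user across folds, which may partly explain why cross-validation scores exceed the scores on unseen users.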

Longitudinal Classification

As mentioned previously, a longitudinal classification exercise was conducted in which, for phases 1-3, recordings were gathered for 4 users on two occasions spaced roughly 4 weeks apart. Limitation: the number of days between the two recordings was not held constant across the 4 users. Note also that a slightly different feature extraction algorithm was used. The extracted features comprised the time-domain and frequency features listed below:

Time Domain Features
- Mean Absolute Value
- Root Mean Square
- Integrated Absolute Value
- Simple Square Integral
- Variance
- Waveform Length
- Average Amplitude Change
- Zero Crossings
- Slope Sign Change

Frequency Features
- Median Frequency
- Weighted Mean Frequency

Many of the features were selected from this paper: https://ieeexplore.ieee.org/document/7748960. The following results were obtained:

Phase 1 Classification

User 20181107T184845
Cross Validation Score is: 0.986029016658
Prediction Score is: 0.067084078712

User 20181018T150232_Kevin
Cross Validation Score is: 0.996059466237
Prediction Score is: 0.28566314659

User 20181023T152814_Brian
Cross Validation Score is: 0.994451405047
Prediction Score is: 0.346023235031

User 20181018T164607_Danyan
Cross Validation Score is: 0.986945636624
Prediction Score is: 0.214272938651

Phase 2 Classification

User 20181107T184845
Cross Validation Score is: 1.0
Prediction Score is: 0.33305509182

User 20181018T150232_Kevin
Cross Validation Score is: 0.998326359833
Prediction Score is: 0.333333333333

User 20181023T152814_Brian
Cross Validation Score is: 0.998329156224
Prediction Score is: 0.324707846411

User 20181018T164607_Danyan
Cross Validation Score is: 0.995829858215
Prediction Score is: 0.33778148457

Phase 3 Classification

User 20181107T184845
Cross Validation Score is: 0.997983870968
Prediction Score is: 0.200803212851

User 20181018T150232_Kevin
Cross Validation Score is: 0.98490945674
Prediction Score is: 0.205025125628

User 20181023T152814_Brian
Cross Validation Score is: 0.998994974874
Prediction Score is: 0.198994974874

User 20181018T164607_Danyan
Cross Validation Score is: 0.969879518072
Prediction Score is: 0.2

Phase 4

In phase 4, the user had to perform contractions and try to hit certain strength intensity values for 10 seconds each. To quantify how well the user was able to do this, we calculated the variance of the recorded, smoothed data around the value that the user was supposed to hit. The greater the variance, the less well the user was able to hold a given strength intensity level.
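The phase 4 measure can be sketched as follows. This is a minimal illustration under assumptions: the input is taken to be a one-dimensional strength-intensity trace (how that trace is derived from the 8 EMG channels is not specified in the report), the smoothing is assumed to be a moving average with a hypothetical width of 50 samples, and the "variance around the target" is computed as the mean squared deviation from the target value.

```python
import numpy as np


def moving_average(signal, width=50):
    """Smooth a 1-D trace with a moving average (assumed smoothing method)."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="valid")


def variance_around_target(signal, target, width=50):
    """Mean squared deviation of the smoothed trace from the target level."""
    smoothed = moving_average(np.asarray(signal, dtype=float), width)
    return float(np.mean((smoothed - target) ** 2))


# Synthetic 10-second traces at 1000 Hz for a target level of 0.5.
rng = np.random.default_rng(0)
steady = 0.5 + rng.normal(0.0, 0.05, size=10_000)
shaky = 0.5 + 0.2 * np.sin(np.linspace(0, 20 * np.pi, 10_000))
print(variance_around_target(steady, 0.5))
print(variance_around_target(shaky, 0.5))
```

Unlike the ordinary variance, which measures spread around the trace's own mean, this quantity also penalizes a trace that is stable but sits at the wrong level.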

After each trial, the user would also fill out a survey regarding the 1942 game played in phase 6. The survey has 30 questions and captures the user’s qualitative experience while playing the 1942 game. From those questions, we calculated the average enjoyment and the average perceived value of playing the game. We then selected the users who scored above average on the sum of those two values and had a score of at least 6 out of 7 in either of those categories.

A one-tailed significance test was then conducted, with the hypothesis being that for the selected users the average variance values post-game would be lower than pre-game, meaning that they were able to hold strength intensities better post-game. This resulted in the following values:

p-value: 0.184
Cohen’s d (measuring effect size): 0.560

If we exclude the one user who showed outlier values (his average variance values were the only ones that were higher post-game), we obtain the following results:

p-value: 0.086
Cohen’s d (measuring effect size): 0.677
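The user selection and the effect size can be sketched as follows. The selection rule follows the description above. The report does not state which variant of Cohen's d was used, so the pooled-standard-deviation form, d = (mean_pre - mean_post) / s_pooled, is an assumption here; the function names and all numbers in the example are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def select_users(scores):
    """Users above the average summed score with at least 6 of 7 in either category.

    scores: dict mapping user id -> (average enjoyment, average perceived value).
    """
    totals = {user: e + v for user, (e, v) in scores.items()}
    average_total = mean(totals.values())
    return [
        user
        for user, (e, v) in scores.items()
        if totals[user] > average_total and max(e, v) >= 6
    ]


def cohens_d(pre, post):
    """Cohen's d with a pooled standard deviation (assumed variant).

    Positive values mean the pre-game values are larger than the post-game ones.
    """
    n_pre, n_post = len(pre), len(post)
    s_pre, s_post = stdev(pre), stdev(post)
    pooled = sqrt(
        ((n_pre - 1) * s_pre ** 2 + (n_post - 1) * s_post ** 2)
        / (n_pre + n_post - 2)
    )
    return (mean(pre) - mean(post)) / pooled


# Made-up survey averages and variance values, for illustration only.
survey = {"a": (6.5, 5.0), "b": (3.0, 4.0), "c": (5.5, 5.5)}
print(select_users(survey))
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

By the common rule of thumb, a d of about 0.5 is read as a medium effect and about 0.8 as a large one.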

Discussion

There are a variety of insights to be drawn from the analysis section above.

Phase 1

As can be seen in the “Learning Curves Random Forest” graph in Analysis->Phase 1, the training error is almost 0%. This suggests that once the model has seen a user’s movements, it can almost perfectly discriminate between the different phase 1 movements when given new examples from the same user. At the same time, the lower cross-validation and test scores show that the model is still overfitting. Looking again at the “Learning Curves Random Forest” graph, it is interesting to note that the cross-validation score curve increases linearly as the number of training examples grows. This suggests that, had more training data been available, the cross-validation score would be even higher than currently obtained, though at some point we would expect the curve to flatten and converge to some value below the training score. Nonetheless, the current results are clearly superior to random guessing and provide a benchmark for subsequent researchers to match and exceed.

Phase 2

Here, the very large gap between the cross-validation score and the test score suggests that, after feature extraction, the training data is distributed more similarly to the samples left out during cross-validation than to the test set. It can be assumed that a different assignment of our users to the training and test sets would produce different results, perhaps with a narrower gap between the cross-validation score and the test score. Currently, our test score is no better than random guessing, which for three movements would yield a 33% chance of being correct.

Phase 3

The same interpretation as for phase 2 also applies to phase 3. Here, though, our model is slightly better than random guessing, which with 5 movements would have a 20% accuracy.

Longitudinal Classification

The results we obtained show that after 4 weeks, it becomes as difficult to classify the movements of a previously seen user as to classify the movements of an unknown user. This result is rather discouraging and suggests that certain variables might change over time, e.g. the exact posture of the user while doing the movements or perhaps even the form of the user’s arm.

Phase 4

Six out of the 20 users fit the criteria described in the Analysis section, namely users who scored above average on the sum of the average enjoyment and average perceived value and had a score of at least 6 out of 7 in either of those categories. For those users, the p-values and Cohen’s d values give tentative support to the hypothesis that playing the game correlates with post-game improvements in the experiments conducted. Of course, the small number of users behind the p-values is a clear limitation, and data from more users would need to be gathered to further validate the correlations derived.
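The chance-level comparisons used in the discussion of phases 1 to 3 can be reproduced with a few lines. The class counts (six movements in phase 1, three in phase 2, five in phase 3) and the test scores are taken from this report; the helper name is hypothetical, and equal class frequencies are assumed.

```python
def margin_over_chance(test_score, n_classes):
    """Difference between a test accuracy and uniform random guessing (1 / n_classes)."""
    return test_score - 1.0 / n_classes


# (number of classes, reported test score) per phase
results = {
    "phase 1": (6, 0.355),
    "phase 2": (3, 0.24),
    "phase 3": (5, 0.254),
}

for phase, (n_classes, score) in results.items():
    margin = margin_over_chance(score, n_classes)
    print(f"{phase}: chance {1.0 / n_classes:.3f}, test {score:.3f}, margin {margin:+.3f}")
```

A positive margin means the cross-user model beats uniform guessing; a negative margin, as for phase 2, means it does not.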

Conclusion

In this study, we set out to investigate different classification models for arm movements performed with the myoelectric sleeve. The vision is to be able to discriminate between any arm movement in any arm position and thereby improve the lives of amputees. The data we collected provides a good basis for future researchers to improve upon the results we obtained using various feature extraction and machine learning approaches. At the same time, we wanted to quantify the impact of gamification on patients’ ability to use the device. The small experiment we conducted in phase 4 provides a first insight into possible results.

Overall, we see three areas in which improvements or extensions to this study could be made. First, gathering more user data would likely increase the accuracies obtained in phases 1-3 and strengthen the statistical evidence from phase 4. This is primarily a time and budget issue. Second, better noise reduction and feature extraction algorithms might be explored to make the same movements appear more similar across users. Furthermore, there is always potential to tune the hyper-parameters of the machine learning model further. Third, gathering data from actual amputees would show whether the models trained here yield similar results on amputees’ data. Ultimately, the goal is to employ this technology with amputees and not healthy participants.
