Supplementary Materials
1. Features Considered for Prediction
All features employed in prediction processes are listed in supplemental table 1.
Supplemental Table 1. List of 93 features used in machine learning based prediction of outcome. MoCA in year 4 was predicted.

Feature #  Abbreviation   Date (Year)  Feature Name
1,2        MDS_UPDRS I    0&1          Movement Disorder Society Unified Parkinson's Disease Rating Scale, Part I
3,4        MDS_UPDRS I-Q  0&1          Movement Disorder Society Unified Parkinson's Disease Rating Scale, Part I Questionnaire
5,6        MDS_UPDRS II   0&1          Movement Disorder Society Unified Parkinson's Disease Rating Scale, Part II
7,8        MDS_UPDRS III  0&1          Movement Disorder Society Unified Parkinson's Disease Rating Scale, Part III
9,10       MoCA           0&1          Montreal Cognitive Assessment
11,12      AGE, GENDER                 Age, Gender
13,14      DD DIAG, SYM                Disease Duration: Diagnosis, Symptom
15,16      REM            0&1          REM Sleep Behavior Disorder Questionnaire
17,18      ESS            0&1          Epworth Sleepiness Scale
19         UPSIT-BL1      0            University of Pennsylvania Smell Identification Test, Booklet 1
20         UPSIT-BL2      0            University of Pennsylvania Smell Identification Test, Booklet 2
21         UPSIT-BL3      0            University of Pennsylvania Smell Identification Test, Booklet 3
22         UPSIT-BL4      0            University of Pennsylvania Smell Identification Test, Booklet 4
23,24      SDMT           0&1          Symbol Digit Modalities Test
25,33      LNS-NUM1       0&1          Letter Number Sequencing, NUM 1
26,34      LNS-NUM2       0&1          Letter Number Sequencing, NUM 2
27,35      LNS-NUM3       0&1          Letter Number Sequencing, NUM 3
28,36      LNS-NUM4       0&1          Letter Number Sequencing, NUM 4
29,37      LNS-NUM5       0&1          Letter Number Sequencing, NUM 5
30,38      LNS-NUM6       0&1          Letter Number Sequencing, NUM 6
31,39      LNS-NUM7       0&1          Letter Number Sequencing, NUM 7
32,40      LNS-Total NUM  0&1          Letter Number Sequencing, total score
41,45      HVLT-R-Q       0&1          Hopkins Verbal Learning Test – Revised (Derived: Total Recall T-Score)
42,46      HVLT-R-R       0&1          Hopkins Verbal Learning Test – Revised (Derived: Delayed Recall T-Score)
43,47      HVLT-R-S       0&1          Hopkins Verbal Learning Test – Revised (Derived: Retention T-Score)
44,48      HVLT-R-T       0&1          Hopkins Verbal Learning Test – Revised (Derived: Recognition Discrimination Index T-Score)
49,50      BJLO           0&1          Benton Judgment of Line Orientation
51,52      STAIA          0&1          State-Trait Anxiety Inventory for Adults
53,54      QUIP           0&1          Questionnaire for Impulsive-Compulsive Disorders
55,56      GDS            0&1          Geriatric Depression Scale
57,58      SCOPA-AUT      0&1          Scales for Outcomes in Parkinson's Disease – Autonomic
59         MS&EADL        0            Modified Schwab & England Activities of Daily Living
60,61      GNE            0&1          General Neurological Exam
62,63      NECN           0&1          Neurological Exam, Cranial Nerves
64,72      VS-I           0&1          Vital Signs: Temperature (in Celsius)
65,73      VS-J           0&1          Vital Signs: Arm used for blood pressure
66,74      VS-K           0&1          Vital Signs: Supine BP, systolic
67,75      VS-L           0&1          Vital Signs: Supine BP, diastolic (before)
68,76      VS-M           0&1          Vital Signs: Supine heart rate (before)
69,77      VS-N           0&1          Vital Signs: Standing BP, systolic
70,78      VS-O           0&1          Vital Signs: Standing BP, diastolic
71,79      VS-P           0&1          Vital Signs: Standing heart rate
80,81      SoE            0&1          Socio-Economics
82         FH-Q1                       Family History: Biological mother with PD
83         FH-Q2                       Family History: Biological father with PD
84         FH-Q3                       Family History: Full siblings
85         FH-Q4                       Family History: Full siblings with PD
86         FH-Q5                       Family History: Maternal grandparents with PD
87         FH-Q6                       Family History: Paternal grandparents with PD
88         FH-Q7                       Family History: Maternal aunts and uncles
89         FH-Q8                       Family History: Maternal aunts and uncles with PD
90         FH-Q9                       Family History: Paternal aunts and uncles
91         FH-Q10                      Family History: Paternal aunts and uncles with PD
92         FH-Q11                      Family History: How many children do you have
93         FH-Q12                      Family History: How many children with PD
In the present work, predictor algorithms were selected from various families of learner and regressor algorithms; in addition, a range of Feature Subset Selector Algorithms (FSSAs) were considered in combination with the learner machines, to optimize prediction of outcome.
2. Predictor machines
Below, we elaborate on the 10 predictor machines used in our present work.
2.1. MLP-BP (Multi-Layer Perceptron-Back propagation)
A multilayer perceptron is a feed-forward artificial neural network model that maps sets of input data onto appropriate outputs. It uses three or more layers of neurons (nodes) with nonlinear activation functions, and is more powerful than the single-layer perceptron in that it can distinguish data that are not linearly separable, i.e. not separable by a hyperplane [1], [2]. A diagram of MLP-BP is shown in Supplemental Figure 1. In this specific work, we used a three-layer neural network, and the number of neurons in each layer was adjusted automatically via Q-learning.
Supplemental Figure 1. Diagram of MLP-BP
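The structure above can be sketched with scikit-learn's MLPRegressor as a stand-in (a minimal illustration on synthetic data; the hidden-layer sizes below are placeholders for the values the paper tuned automatically via Q-learning, which is not reproduced here):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # 200 samples, 5 features
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])   # synthetic target for the demo

# Three-layer feed-forward network trained with back-propagation;
# hidden-layer sizes are placeholders for the Q-learning-tuned values.
mlp = MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu",
                   solver="adam", max_iter=3000, random_state=0)
mlp.fit(X, y)
print(round(mlp.score(X, y), 2))   # coefficient of determination on the training set
```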
2.2. RNN (Recurrent Neural Network)
A recurrent neural network is a deep learning algorithm. The RNN, a neural network fundamentally different from feed-forward architectures, has been investigated for modelling of nonlinear behavior [3], [4]. A diagram of the RNN is shown in Supplemental Figure 2. In this specific work, we used a many-inputs-to-one-output model with a three-layer neural network, and the number of neurons in each layer was adjusted automatically via Q-learning.
Supplemental Figure 2. Diagram of the RNN
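The many-inputs-to-one-output structure can be illustrated in a few lines of NumPy (a sketch with random, untrained weights that shows only the forward pass; the trained network and its Q-learning tuner are not reproduced):

```python
import numpy as np

def rnn_many_to_one(x_seq, Wx, Wh, Wo, bh, bo):
    """Forward pass of a simple many-inputs-to-one-output RNN:
    the hidden state is updated at every time step and only the
    final hidden state is mapped to the single output."""
    h = np.zeros(Wh.shape[0])
    for x_t in x_seq:
        h = np.tanh(Wx @ x_t + Wh @ h + bh)   # recurrent state update
    return Wo @ h + bo                        # read out the last state

rng = np.random.default_rng(1)
n_in, n_hid = 4, 8
Wx = rng.normal(scale=0.5, size=(n_hid, n_in))
Wh = rng.normal(scale=0.5, size=(n_hid, n_hid))
Wo = rng.normal(scale=0.5, size=(1, n_hid))
bh, bo = np.zeros(n_hid), np.zeros(1)

x_seq = rng.normal(size=(6, n_in))            # 6 time steps of 4 inputs each
y_hat = rnn_many_to_one(x_seq, Wx, Wh, Wo, bh, bo)
print(y_hat.shape)
```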
2.3. RBF (Radial Basis Function)
The radial basis function network is a machine learning method for making predictions and forecasting. Radial basis functions are embedded into a two-layer feed-forward neural network: between the input and output layers there is a layer of processing units, called hidden units, which implement the radial basis functions [5]. A diagram of the RBF network is shown in Supplemental Figure 3. In our work, the number of neurons in the hidden layer was adjusted automatically via Q-learning.
Supplemental Figure 3. Diagram of the RBF
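One simple way to realize this two-layer structure is to fix Gaussian hidden units and fit the linear output layer by least squares (a toy sketch on synthetic data, not the paper's implementation; centers and width are arbitrary):

```python
import numpy as np

def rbf_design(X, centers, width):
    # Hidden layer: Gaussian radial basis activations around fixed centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])                       # smooth target to approximate

centers = np.linspace(-1, 1, 10)[:, None]     # 10 hidden units
Phi = rbf_design(X, centers, width=0.3)
# Output layer: linear weights fitted by least squares.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

y_hat = Phi @ w
print(round(float(np.abs(y_hat - y).mean()), 3))   # mean absolute error
```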
2.4. LOLIMOT (Local Linear Model Trees)
The aim of the local linear model trees is to enable fast and easy-to-use nonlinear system identification.
LOLIMOT is a fast incremental construction algorithm for local linear neuro fuzzy models also known as
Takagi-Sugeno fuzzy models. This algorithm shortens the overall modeling development time by
reducing the number of required trial and error steps for identification of patterns. In order to achieve this
goal, the implemented algorithm has to converge in a reliable manner without any random influences. The
network structure of a local linear neuro fuzzy model is depicted in Supplemental Figure 4. Each neuron
realizes a local linear model (LLM) and an associated validity function that determines the region of
validity of the LLM [6], [7]. In this specific work, the number of neurons in the hidden layer was adjusted automatically via Q-learning.
Supplemental Figure 4. Network structure of a local linear neuro fuzzy model with M neurons for nx LLM inputs;
and nz validity function inputs Zi.
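The weighted-sum structure of such a model can be illustrated with a one-dimensional toy example (this sketch shows only the evaluation of fixed local linear models and Gaussian validity functions; the incremental tree-construction step of LOLIMOT is omitted, and all parameter values are made up):

```python
import numpy as np

def local_linear_predict(x, models):
    """Evaluate a local linear neuro-fuzzy model: each neuron holds a
    local linear model (slope a, offset b) and a Gaussian validity
    function centered at c with width s; the output is the validity-
    weighted sum of the local model outputs."""
    x = np.asarray(x, dtype=float)
    phi = np.array([np.exp(-(x - c) ** 2 / (2 * s ** 2))
                    for _, _, c, s in models])
    phi /= phi.sum(axis=0)                     # normalized validity functions
    local = np.array([a * x + b for a, b, _, _ in models])
    return (phi * local).sum(axis=0)

# Two local models: y ~ x on the left half, y ~ 2x - 1 on the right half.
models = [(1.0, 0.0, -0.5, 0.3), (2.0, -1.0, 0.5, 0.3)]
out = local_linear_predict([-0.5, 0.5], models)
print(out)
```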
2.5. DTC (Decision Tree Classification)
The decision tree classification technique is one of the most popular techniques in the emerging field of data mining. There are various methods for constructing a DTC; the Iterative Dichotomiser 3 (ID3) is the basic algorithm [8]. Many classification approaches exist, including instance-based methods, neural networks, Bayesian networks, support vector machines, and decision trees. The DTC classifies samples by sorting them down the tree from the root to some leaf node, which provides the classification of the sample. Each node in the tree specifies a test of some attribute of the sample, and each branch descending from that node corresponds to one of the possible values of this attribute [9]. Supplemental Figure 5 shows a diagram of the decision tree algorithm. In this specific work, the maximum depth was not set, so the algorithm continued until all leaves were pure. To measure the quality of a split, the Gini function was used (the Gini function describes the impurity of each node; each child node is purer than its parent node, so that the Gini function is minimized).
Supplemental Figure 5. An example of the DTC algorithm
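A minimal sketch of this configuration, assuming scikit-learn's DecisionTreeClassifier (the Gini criterion and unlimited depth mirror the settings described above; the iris data set is just a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# As in the text: Gini impurity for split quality, and no maximum depth,
# so splitting continues until all leaves are (nearly) pure.
clf = DecisionTreeClassifier(criterion="gini", max_depth=None, random_state=0)
clf.fit(X, y)
print(clf.get_depth(), round(clf.score(X, y), 3))   # depth and training accuracy
```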
2.6. RFA (Random Forest Algorithm)
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them [10], [11]. A diagram of the RFA is shown in Supplemental Figure 6. The depth of the structure was adjusted automatically via Q-learning; the number of trees and the number of splits were set to 1000 and 5, respectively.
Supplemental Figure 6. A diagram of the RFA algorithm
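A scaled-down sketch with scikit-learn's RandomForestRegressor (synthetic data; fewer trees and a fixed depth are used here so the example runs quickly, whereas the paper used 1000 trees, 5 splits and a Q-learning-tuned depth):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 8))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Each tree is grown on a bootstrap sample, i.e. an independently drawn,
# identically distributed random vector per tree, as described above.
rf = RandomForestRegressor(n_estimators=200, max_depth=5, random_state=0)
rf.fit(X, y)
print(round(rf.score(X, y), 2))
```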
2.7. BRR (Bayesian Ridge Regression)
The Bayesian approach views a parameter estimate as random rather than fixed. It combines prior information about the parameter (the prior distribution) with the observed data (the likelihood function) to obtain the distribution of the parameter given the data, i.e. the posterior distribution [12]. Bayesian ridge regression (BRR) corresponds to the particular case α₁ = · · · = α_m. By regularizing all features identically, BRR is not well suited when only a few features are relevant [13]. The BRR model assumes a normal likelihood density n_n(y | Xβ, σ²I_n) for the data. The conditional random variables β, σ² | λ may be assigned a conjugate multivariate normal (n) inverse-gamma (ig) prior distribution with probability density function defined by [14]:

π(β, σ² | λ) = n(β | 0, σ²λ⁻¹ I_p) ig(σ² | a, b)

and λ is assigned a gamma ga(λ | a_λ, b_λ) prior distribution [15]. In this specific work, the maximum number of iterations was set to 1000, with a stopping condition when β converged to a tolerance of 10⁻³. The initial values of λ and α were set to 10⁻⁶.
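A minimal sketch with scikit-learn's BayesianRidge, whose tolerance and hyper-prior parameters correspond to the 10⁻³ and 10⁻⁶ values reported above (synthetic sparse data, not the paper's pipeline):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 10))
coef = np.zeros(10)
coef[:3] = [2.0, -1.0, 0.5]                    # only 3 relevant features
y = X @ coef + rng.normal(scale=0.1, size=150)

# tol and the gamma hyper-prior parameters mirror the reported settings
# (tolerance 1e-3; initial alpha and lambda values 1e-6).
brr = BayesianRidge(tol=1e-3, alpha_1=1e-6, alpha_2=1e-6,
                    lambda_1=1e-6, lambda_2=1e-6)
brr.fit(X, y)
print(np.round(brr.coef_[:3], 1))   # posterior mean estimate of beta
```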
2.8. PAR (Passive Aggressive Regression)
Passive-aggressive (PAR) algorithms are a family of online algorithms for supervised learning [16]. They are similar to the perceptron in that they do not require a learning rate. This learning method updates the model when a new example is misclassified or its score does not exceed some predefined margin. PAR algorithms have proven a very successful and popular online learning technique for many real-world applications [17]. On each round, the PAR algorithm solves a constrained optimization problem that balances two competing goals: being conservative, in order to retain information acquired on preceding rounds, and being corrective, in order to make a more accurate prediction when presented with the same instance again. PAR algorithms enjoy a certain popularity in the natural language processing community, where they are often used for large-scale batch learning [18]. In this specific work, the maximum number of epochs was set to 1000, with step size 1; iterations stopped if the current loss exceeded the previous loss plus 10⁻³. Furthermore, the value of epsilon was adjusted automatically via Q-learning. The epsilon parameter is a threshold on the difference between the predicted and target values: if the difference fell below epsilon, the model was not updated.
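A minimal sketch with scikit-learn's PassiveAggressiveRegressor (synthetic data; max_iter and tol follow the text, while epsilon is fixed at an arbitrary 0.1 here, whereas the paper tuned it via Q-learning):

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveRegressor

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 6))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0, 1.5])

# epsilon is the no-update margin: examples whose prediction error is
# below epsilon leave the model unchanged ("passive"); larger errors
# trigger an aggressive corrective update.
par = PassiveAggressiveRegressor(max_iter=1000, tol=1e-3, epsilon=0.1,
                                 random_state=0)
par.fit(X, y)
print(round(par.score(X, y), 2))
```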
2.9. Theil-Sen Regression
Theil-Sen regression, a nonparametric estimation technique, uses the median instead of the mean; the median is not sensitive to outliers in the data. Slope estimates b₁ are calculated for all pairwise sets of observations in the data, and the median of all these slopes is taken as the Theil-Sen slope estimator. The y-intercept is obtained by taking the median of all the differences (yᵢ − b₁xᵢ) [19]. One interesting way to characterize the slope of the least-squares regression line is that it is the solution of ρ(x, r(β)) = 0, where ρ is the Pearson correlation coefficient and r(β) are the residuals from a fitted line with slope β. A nonparametric counterpart to this is Theil-Sen regression, which satisfies τ(x, r(β)) = 0, where τ is Kendall's tau, a rank-based alternative to the correlation coefficient. This was proposed by Theil [20]; Sen [21] extended the results and added a confidence interval estimate. The approach is well known in selected fields (e.g. astronomy) and almost completely unknown in others. It has strong resistance to outliers and nearly full efficiency compared to linear regression when the errors are Gaussian. In our work, the maximum number of iterations was set to 1000. Because the number of least-squares solutions can be extremely large, the numbers of samples and subsamples were limited to 20000. The tolerance parameter of the spatial median was adjusted automatically via Q-learning, with initial value 10⁻³.
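The median-of-slopes construction can be written directly (a toy one-dimensional sketch; production implementations such as scikit-learn's TheilSenRegressor use spatial medians and subsampling, as described above):

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Theil-Sen estimator: the slope is the median of all pairwise
    slopes, and the intercept is the median of y - b1 * x."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    b1 = float(np.median(slopes))
    b0 = float(np.median(y - b1 * x))
    return b0, b1

x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[3] = 100.0                        # a gross outlier barely moves the fit
b0, b1 = theil_sen(x, y)
print(round(b0, 1), round(b1, 1))   # close to the true intercept 1 and slope 2
```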
2.10. LassoLars (Least Absolute Shrinkage and Selection Operator – Least Angle Regression)
LASSO is a powerful method that performs two main tasks: regularization and feature selection. The LASSO method puts a constraint on the sum of the absolute values of the model parameters: the sum has to be less than a fixed value (upper bound). To do so, the method applies a shrinking (regularization) process that penalizes the coefficients of the regression variables, shrinking some of them to zero. During the feature selection process, the variables that still have a non-zero coefficient after the shrinking process are selected to be part of the model. The goal of this process is to minimize the prediction error [22]. Least angle regression ("LARS") relates to the classic model-selection method known as forward selection, or "forward stepwise regression", given a collection of possible predictors; forward selection is an aggressive fitting technique that can be overly greedy. LassoLars is the LASSO model fit with least angle regression (LARS): a linear model trained with an L1 prior as regularizer [23]. In this specific work, the maximum number of iterations was set to 1000, and the value of the α parameter (ranging between 0.01 and 1), which multiplies the penalty term, was adjusted automatically via Q-learning.
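A minimal sketch with scikit-learn's LassoLars (synthetic sparse data; α is fixed at an arbitrary mid-range value here, whereas the paper tuned it in [0.01, 1] via Q-learning):

```python
import numpy as np
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 20))
coef = np.zeros(20)
coef[[2, 7, 11]] = [3.0, -2.0, 1.5]                 # sparse ground truth
y = X @ coef + rng.normal(scale=0.1, size=100)

# alpha multiplies the L1 penalty term; the shrinking process drives
# most coefficients to exactly zero, performing feature selection.
model = LassoLars(alpha=0.05, max_iter=1000)
model.fit(X, y)
print(int((model.coef_ != 0).sum()))                # number of selected features
```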
3. Feature Subset Selector Algorithms (FSSAs)
Below we describe the 6 FSSAs used in our present work.
3.1. GA (Genetic Algorithm)
GA is a heuristic solution search technique inspired by natural evolution. It is a robust and flexible approach that can be applied to a wide range of learning and optimization problems, and is particularly suited to problems where traditional optimization techniques break down, either due to the irregular structure of the search space (for example, absence of gradient information) or because the search becomes computationally intractable [24]. The traditional theory of GAs assumes that, at a very general level of description, GAs work by discovering, emphasizing, and recombining good building blocks of solutions in a highly parallel fashion. The idea is that good solutions tend to be made up of good building blocks: combinations of bit values that often confer higher fitness on the strings in which they are present [25]. Regulatory parameters such as Maximum Number of Iterations, Population Size, Crossover Percentage, Number of Offspring (Parents), Mutation Percentage, Number of Mutants, Mutation Rate and Selection Pressure were set to 180, 250, 0.80, 160, 0.30, 60, 0.02 and 8, respectively.
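The discover-emphasize-recombine loop can be sketched for binary feature-subset selection (a toy example: the fitness function, population size and generation count below are made up for illustration; a real FSSA would score each subset by a learner's validation error):

```python
import random
random.seed(0)

N_FEATURES, TARGET = 12, {1, 4, 7}     # toy problem: 3 genuinely useful features

def fitness(mask):
    # Toy fitness: reward covering the useful features, penalize subset size.
    hits = sum(mask[i] for i in TARGET)
    return hits - 0.05 * sum(mask)

def crossover(a, b):
    cut = random.randrange(1, N_FEATURES)       # one-point crossover
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.02):                    # per-bit mutation rate
    return [bit ^ (random.random() < rate) for bit in mask]

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(30)]
for _ in range(60):                             # generations
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]                            # selection
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(20)]
    pop = elite + children

best = max(pop, key=fitness)
print([i for i, bit in enumerate(best) if bit])
```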
3.2. SA (Simulated Annealing)
Simulated annealing is a multivariable optimization technique based on the Monte Carlo method used in statistical mechanical studies of condensed systems; it proceeds by drawing an analogy between energy minimization in physical systems and cost minimization in design applications [26], [27]. Regulatory parameters such as Desired Number of Selected Features, Maximum Number of Iterations, Maximum Number of Sub-iterations, Initial Temperature, and Temperature Reduction Rate were set to 18, 180, 200, 10 and 0.99, respectively.
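The annealing loop can be sketched for feature-subset selection (a toy example: the cost function and sub-iteration count are illustrative, while the initial temperature of 10 and reduction rate of 0.99 mirror the reported settings):

```python
import math, random
random.seed(1)

N, TARGET = 12, {2, 5, 9}      # toy problem: 3 genuinely useful features

def cost(mask):
    # Toy cost: misses of useful features plus a small subset-size penalty.
    return (3 - sum(mask[i] for i in TARGET)) + 0.05 * sum(mask)

state = [random.randint(0, 1) for _ in range(N)]
best = state[:]
temp = 10.0                                   # initial temperature
for _ in range(180):                          # outer iterations
    for _ in range(50):                       # sub-iterations
        cand = state[:]
        cand[random.randrange(N)] ^= 1        # flip one feature in or out
        delta = cost(cand) - cost(state)
        # Accept improvements always; accept worse moves with Boltzmann prob.
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            state = cand
            if cost(state) < cost(best):
                best = state[:]
    temp *= 0.99                              # temperature reduction rate

print([i for i, bit in enumerate(best) if bit])
```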
3.3. DE (Differential Evolution)
The DE algorithm is a heuristic approach with three main advantages: finding the true global minimum regardless of the initial parameter values, fast convergence, and the use of few control parameters. DE is a population-based algorithm that, like genetic algorithms, uses crossover, mutation and selection operators [28], [29]. Regulatory parameters such as Desired Number of Selected Features, Lower Bound of Variables, Upper Bound of Variables, Maximum Number of Iterations, Population Size, Lower Bound of Scaling Factor, Upper Bound of Scaling Factor and Crossover Probability were set to 18, 0, 1, 180, 250, 0.20, 0.80, and 0.20, respectively.
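The mutation-crossover-selection cycle of DE can be sketched on a toy continuous objective (the bounds [0, 1], scaling-factor range [0.2, 0.8] and crossover probability 0.2 mirror the reported settings; the objective itself is made up):

```python
import random
random.seed(2)

def objective(v):                  # toy objective; global minimum at v = 0.7
    return sum((x - 0.7) ** 2 for x in v)

DIM, NP, F_LO, F_HI, CR = 5, 20, 0.2, 0.8, 0.2
pop = [[random.random() for _ in range(DIM)] for _ in range(NP)]

for _ in range(180):                                   # generations
    for i in range(NP):
        a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
        F = random.uniform(F_LO, F_HI)                 # scaling factor
        j_rand = random.randrange(DIM)
        trial = []
        for j in range(DIM):
            if random.random() < CR or j == j_rand:    # crossover
                x = a[j] + F * (b[j] - c[j])           # differential mutation
                trial.append(min(1.0, max(0.0, x)))    # respect the bounds
            else:
                trial.append(pop[i][j])
        if objective(trial) <= objective(pop[i]):      # greedy selection
            pop[i] = trial

best = min(pop, key=objective)
print(round(objective(best), 4))
```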
3.4. ACO (Ant Colony Optimization algorithm)
ACO is an optimization technique introduced in the early 1990s. The inspiring source of ant colony optimization is the foraging behavior of real ant colonies. This behavior is exploited in artificial ant colonies for the search of approximate solutions to discrete optimization problems, to continuous optimization problems, and to important problems in telecommunications, such as routing and load balancing [30], [31]. Regulatory parameters such as Desired Number of Selected Features, Maximum Number of Iterations, Number of Ants (Population Size), Initial Pheromone, Pheromone Exponential Weight, Heuristic Exponential Weight and Evaporation Rate were set to 18, 180, 250, 1, 1, 1 and 0.05, respectively.
3.5. PSO (Particle Swarm Optimization algorithm)
Particle swarm optimization is a heuristic global optimization algorithm based on swarm intelligence; it originates from research on bird flocking and fish schooling behavior. The algorithm is widely used and rapidly developed owing to its easy implementation and the few parameters that require tuning [32], [33]. Regulatory parameters such as Desired Number of Selected Features, Lower Bound of Variables, Upper Bound of Variables, Maximum Number of Iterations, Population Size (Swarm Size), Inertia Weight, Inertia Weight Damping Ratio, Personal Learning Coefficient and Global Learning Coefficient were set to 18, 0, 1, 180, 250, 0.73, 1, 1.50 and 1.50, respectively.
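The velocity-and-position update at the heart of PSO can be sketched as follows (toy objective; the inertia weight 0.73 and learning coefficients 1.5 mirror the reported settings, the rest is illustrative):

```python
import random
random.seed(3)

def objective(v):                 # toy objective; global minimum at v = 0.3
    return sum((x - 0.3) ** 2 for x in v)

DIM, SWARM, W, C1, C2 = 5, 30, 0.73, 1.5, 1.5
pos = [[random.random() for _ in range(DIM)] for _ in range(SWARM)]
vel = [[0.0] * DIM for _ in range(SWARM)]
pbest = [p[:] for p in pos]                     # personal bests
gbest = min(pbest, key=objective)               # global best

for _ in range(180):                            # iterations
    for i in range(SWARM):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (W * vel[i][d]
                         + C1 * r1 * (pbest[i][d] - pos[i][d])   # personal pull
                         + C2 * r2 * (gbest[d] - pos[i][d]))     # social pull
            pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
        if objective(pos[i]) < objective(pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=objective)

print(round(objective(gbest), 4))
```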
3.6. NSGAII (Non-dominated Sorting Genetic Algorithm II)
Multi-objective evolutionary algorithms (EAs) that use non-dominated sorting and sharing have been criticized mainly for (1) their O(MN³) computational complexity (where M is the number of objectives and N is the population size), (2) their non-elitist approach, and (3) the need to specify a sharing parameter. The non-dominated sorting genetic algorithm II (NSGA-II) is a non-dominated sorting-based multi-objective EA (MOEA) that alleviates all three difficulties [34]: a fast non-dominated sorting approach with O(MN²) computational complexity is used, and a selection operator creates a mating pool by combining the parent and offspring populations and selecting the best solutions (with respect to fitness and spread). All the optimizing machines aim to minimize error by selecting the best combination, whereas NSGAII additionally aims to reduce the number of features [34], [35]. Regulatory parameters such as Maximum Number of Iterations, Population Size, Crossover Percentage, Number of Parents (Offspring), Mutation Percentage, Number of Mutants and Mutation Rate were set to 180, 250, 0.7, 176, 100 and 0.10, respectively.
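The non-dominated sorting step that underlies NSGA-II can be sketched directly (a toy example with two minimized objectives, prediction error and number of features; the simple per-front scan below favors clarity over the fast bookkeeping of [34], and the data points are made up):

```python
def dominates(p, q):
    """p dominates q if it is no worse in every objective and strictly
    better in at least one (both objectives are minimized here)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def nondominated_fronts(points):
    """Sort points into successive non-dominated fronts, as in NSGA-II."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Objectives per candidate feature subset: (prediction error, # of features).
pts = [(1.83, 23), (1.60, 6), (1.55, 17), (2.00, 8), (1.55, 9), (1.90, 30)]
print(nondominated_fronts(pts))    # first front = the Pareto-optimal subsets
```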
4. Results of Feature Selection using NSGAII
In the present work, NSGAII searched for the most optimal combinations among the 93 features. It enabled selection of different combinations of features, depending on the number of epochs and on adjustments to select additional features. Like the other FSSAs, this machine was limited by its number of epochs; it therefore discovered only 12 best optimal combinations, although other optimal combinations probably exist. NSGAII is an optimization framework in which the most effective features are selected while the number of features is reduced (penalized), thus favoring combinations with few features; the smallest optimal combination contained 6 features. The different selections of features by NSGAII are shown in Supplemental Table 2.
Supplemental Table 2. Selected features by NSGAII as optimal combinations.
[Table listing the feature numbers in each of the 12 optimal combinations (of 6, 8, 9, 11, 12, 13, 14, 15, 17, 21, 22 and 23 features); the individual cell values are not recoverable from the extracted text. The 6-feature combination comprised features 9, 10, 16, 28, 35 and 51.]
We selected the 184 patients who had all 93 features available and generated 10 different arrangements among these patients. We then chose features according to the combinations selected in Supplemental Table 2. All combinations were applied to LOLIMOT, and Q-learning was utilized; the results are shown in Supplemental Figure 7 (blue line). Supplemental Table 3 shows p-values of the combinations selected by NSGAII versus the best combination result from part A (1.83 ± 0.13).
Supplemental Figure 7. Comparison between the first test and the new test using the expanded subject set (x-axis: selected optimal combination, COM.6 to COM.23; y-axis: absolute error, 1.5 to 1.9; blue: main test, red: additional independent test).
Supplemental Table 3. P-values for combinations selected by NSGAII, relative to best combination from part A of
manuscript.
Combination (# of optimal features)   6      8     9      11    12     13     14     15    17    21    22    23
P-value                               0.017  0.22  0.007  0.09  0.073  0.005  0.045  0.13  0.48  0.25  0.27  0.42
Supplemental Table 4. Selection of patients beyond original 184 patients for additional independent testing
Alternatively, we created additional independent sets, as shown in Supplemental Table 4, of patients having the features listed in Supplemental Table 2. For instance, for the combination with 6 features as selected by NSGAII, 308 patients (206 males, 102 females; average age 67.5 ± 9.9 years, range [39, 91]; average MoCA outcome 26.5 ± 3.5, range [11, 30]) had features 9, 10, 16, 28, 35 and 51 available (selected previously as the most vital features). This allowed additional validation of our work. In the expanded sets (Supplemental Table 4), e.g. 308 total patients for the 6-feature combination, we created a single arrangement for training, training validation and final testing (65% for training, 5% for training validation and 30% for final testing), while ensuring that the new final test set (93 new patients) only included newly added patients, for completely independent testing. For combination 6 with the new set of patients, the mean absolute error obtained by applying the 6 features to LOLIMOT accompanied by Q-learning was about 1.60, approximately similar to the mean absolute error (1.68 ± 0.12) reached in the prior main final test (~30% of the 184 patients who had all features). For the other combinations, training, training validation and final testing could be performed analogously according to the features selected in Supplemental Table 2. Results for these combinations are also shown in Supplemental Figure 7 (red line).
Combination (# of optimal features)   6         8         9         11        12        13        14        15        17        21        22        23
Total patients                        308       234       290       232       286       286       282       282       282       228       228       228
Female #                              102       80        193       153       191       191       188       189       189       151       151       151
Male #                                206       154       97        79        95        95        94        93        93        77        77        77
New patients                          124       50        106       48        102       102       98        98        98        44        44        44
Range: MoCA outcome                   11-30     11-30     11-30     11-30     11-30     11-30     13-30     17-30     13-30     13-30     13-30     13-30
Average: MoCA outcome                 26.5±3.5  26.6±3.5  26.5±3.3  26.5±3.5  26.5±3.4  26.5±3.4  26.6±3.2  26.7±2.9  26.6±3.2  26.6±3.3  26.6±3.3  26.6±3.3
Range: Age                            39-91     39-91     39-91     39-91     39-91     39-91     39-91     39-91     39-91     39-91     39-91     39-91
Average: Age                          67.5±9.9  68.2±9.5  67.6±9.7  68.2±9.5  67.7±9.6  67.7±9.7  67.8±9.7  67.8±9.7  67.8±9.7  68.4±9.5  68.4±9.5  68.4±9.5
Performance for the prior final tests and the new final tests is shown in Supplemental Figure 7. All mean absolute errors for the new patients were less than or equal to the mean absolute errors of the prior final test (the lowering is attributed to the larger training set in the new expanded patient set), indicating that the utilized machines generalize well.
According to Supplemental Table 2, the 5 most used features within all the 12 optimal combinations
were features 9, 10, 16, 28 and 51, namely: (i,ii) MoCA years 0 and 1, (iii) REM (Sleep Behavior
Disorder Questionnaire) year 1, (iv) LNS (Letter Number Sequencing) Number 4 year 0, and (v) STAIA
(State‐Trait Anxiety Inventory for Adults) year 0. As such, these features were the most prominent and
predictive factors. The frequency of usage of each feature within the various optimal combinations is also
shown in Supplemental Figure 8.
Supplemental Figure 8. Number of times (i.e. frequency) of usage of each feature within the optimal
combinations. Features 9, 10, 16, 28 and 51 were the most commonly used.
References
[1] S. Alsmadi, M. Khalil et al., "Back Propagation Algorithm: The Best Algorithm," IJCSNS International Journal of Computer Science and Network Security, vol. 9, no. 4, pp. 378-383, 2009.
[2] D. Rumelhart, G. Hinton and R. Williams, "Learning Representations by Back-Propagating Errors," Nature, vol. 323, pp. 533-536, 1986.
[3] A. Townley, M. Ilchmann and et al, "Existence and Learning of Oscillations in Recurrent Neural Networks," IEEE TRANSACTIONS ON NEURAL NETWORKS, vol. 11, no. 1, pp. 205-214, 2000.
[4] N. Maknickiene, . V. Rutkauskas and et al, "Investigation of financial market prediction by recurrent neural network," Innovative Infotechnologies for Science, Business and Education, vol. 11, no. 2, pp. 3-8, 2011.
[5] Y. Arora, A. Singhal and A. Bansal, "A Study of Applications of RBF Network," International Journal of Computer Applications, vol. 94, no. 2, pp. 17-20, 2014.
[6] O. Nelles, A. Fink and R. Isermann, "Local Linear Model Trees (LOLIMOT) Toolbox for Nonlinear System Identification," science Direct (IFAC System Identification), vol. 33, no. 15, pp. 845-850, 2000.
[7] J. Mart´ınez-Morales and E. Palacios, "Modeling of internal combustion engine emissions by," SciVerse Science Direct, vol. 3, pp. 251-258, 2012.
[8] M. RodneyOD and P. Goodman, "Decision tree design using information theory," Knowledge Acquisition, vol. 2, pp. 1-19, 1990.
[9] S. Chourasia, "Survey paper on improved methods of ID3 decision tree," International Journal of Scientific and Research Publications, vol. 3, no. 12, pp. 1-4, 2013.
[10] L. Breiman, "Random Forests," Machine Learning, vol. 45, p. 5–32, 2001.
[11] A. Jehad, R. Khan and N. Ahmad, "Random Forests and Decision Trees," IJCSI International Journal of Computer Science Issues, vol. 9, no. 5, pp. 272-278, 2012.
[12] A. Efendi, "A simulation study on Bayesian Ridge regression models for several collinearity levels," in AIP Conference Proceedings, 2017.
[13] C. M. Bishop, Pattern Recognition and Machine Learning, New York: Springer, 2006.
[14] G. Karabatsos, "Fast Marginal Likelihood Estimation of the Ridge Parameter(s) in Ridge Regression and Generalized Ridge Regression for Big Data," Statistics-, pp. 1-44, 2015.
[15] D. Denison,, C. Holmes and et al, Bayesian Methods for Nonlinear Classification and Regression, New York: John Wiley and Sons, 2002.
[16] K. Crammer, O. Dekel and et al, "On-line passive-aggressive algorithms," Journal of Machine Learning Research, vol. 7, pp. 551-585, 2006.
[17] J. Lu, P. Zhao and C. H. Steven , "Online Passive Aggressive Active Learning and its," JMLR: Workshop and Conference Proceedings, vol. 39, pp. 266-282, 2014.
[18] M. Blondel, Y. Kubo and N. Ueda, "Online Passive-Aggressive Algorithms for Non-Negative Matrix Factorization and Completion," Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR, vol. 33, pp. 96-104, 2014.
[19] S. H. Shah, A. Rashid, and et al, "A Comparative Study of Ordinary Least Squares Regression and Theil-Sen Regression through Simulation in the Presence of Outliers," Lasbela, U. J.Sci.Techl, vol. V, pp. 137-142, 2016.
[20] H. Theil, "A Rank-Invariant Method of Linear and Polynomial Regression Analysis, I, II, III," Nederl. Akad. Wetensch. Proc., vol. 53, pp. 386-392, 521-525, 1397-1412, 1950.
[21] P. Sen, "Estimates of the Regression Coefficient Based on Kendall's Tau," Journal of the American Statistical Association, vol. 63, no. 324, pp. 1379-1389, 1968.
[22] V. Fonti, "Feature Selection using LASSO," VU Amsterdam, Amsterdam, 2017.
[23] B. Efron, T. Hastie and et al, "Least angle regression," The Annals of Statistics, vol. 32, pp. 407-499, 2004.
[24] J. McCall, "Genetic algorithms for modelling and optimisation," Journal of Computational and Applied Mathematics, vol. 184, p. 205–222, 2004.
[25] M. Mitchell, "Genetic Algorithms: An Overview," Complexity, vol. 1, no. 1, pp. 31-39, 1995.
[26] W. Dolan, P. Cummings and M. LeVan, "Process Optimization via Simulated," AIChE Journal, vol. 35, pp. 725-736, 1989.
[27] S. Kirkpatrick, C. Gelatt and M. Vecchi, "Optimization by Simulated Annealing," Science, New Series, vol. 220, pp. 671-680, 1983.
[28] D. KARABOGA and S. OKDEM, "A Simple and Global Optimization Algorithm for," Turk J Elec Engin, vol. 12, pp. 53-60, 2004.
[29] A. Musrrat, M. Pant and A. Abraham, "Simplex Differential Evolution," Acta Polytechnica Hungarica, vol. 6, pp. 95-115, 2009.
[30] C. Blum, "Ant colony optimization: Introduction and recent trends," Physics of Life Reviews, vol. 2, pp. 353-373, 2005.
[31] P. Sivakumar and K. Elakia, "A Survey of Ant Colony Optimization," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 6, no. 3, pp. 574-578, 2016.
[32] Q. Bai, "Analysis of Particle Swarm Optimization Algorithm," Computer and Information Science, vol. 3, pp. 180-184, 2010.
[33] S. Singh, "A Review on Particle Swarm Optimization Algorithm," International Journal of Scientific & Engineering Research, vol. 5, no. 4, pp. 551-553, 2014.
[34] K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, "A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182-197, 2002.
[35] Y. Yusoff, M. Ngadiman and A. Mohd Zain, "Overview of NSGA-II for Optimizing Machining Process Parameters," Procedia Engineering, vol. 15, p. 3978 – 3983, 2011.