
A functional generalized F-test for signal detection with applications to ERP significance analysis

David Causeur, Ching-Fan Sheu, Emeline Perthame and Flavia Rufini

2019-07-05


Introduction

The present document reproduces some of the data analyses and simulation studies presented in the submitted paper ‘A functional generalized F-test for signal detection with applications to Event-Related Potentials significance analysis’. In the manuscript, the whole dataset is used to illustrate the method, whereas in the following the analyses are restricted to a region of interest consisting of three contiguous channels.

Required packages and homemade functions

Some R functions used below, implementing the functional generalized F-test presented in the paper, are available in the R package ERP, which can be downloaded from CRAN.

The following helper installs the required packages that are not yet available and loads all of them into the current R session (‘fields’ is included in the list because ‘fields::image.plot’ is used in Section 2):

Install_and_Load <- function(packages) {
  # Install the required packages that are not yet installed
  k <- packages[!(packages %in% installed.packages()[, "Package"])]
  if (length(k)) install.packages(k, repos = "https://cran.rstudio.com/")
  # Load every required package
  for (package_name in packages) library(package_name, character.only = TRUE, quietly = TRUE)
}
Install_and_Load(c("ERP", "mnormt", "mvtnorm", "corpcor", "irlba", "fdANOVA", "mgcv", "fields"))

Importing ERP data

An extract of the ERP data introduced in Section 2 of the manuscript is provided in the package ERP:

data(impulsivity)

Each of the 144 rows of ‘impulsivity’ contains an ERP curve (columns 5 to 505) starting at 0 ms and ending at 1000 ms, with one recording every 2 ms. The first 4 columns are experimental covariates: channel (electrode location on the scalp), subject id, impulsivity trait group (High/Low) and response inhibition condition (Success/Failure). An extract of the data is shown below:

head(impulsivity[, 1:10])

   Channel Subject Group Condition     T_0    T_2    T_4    T_6    T_8   T_10
10     FCZ     S11  High   Success  0.0839 -0.026 -0.111 -0.177 -0.227 -0.266
15      CZ     S11  High   Success  0.3311  0.271  0.228  0.206  0.196  0.189
20     CPZ     S11  High   Success  0.7119  0.722  0.741  0.774  0.818  0.868
40     FCZ     S11  High   Failure  0.6886  0.609  0.521  0.425  0.329  0.216
45      CZ     S11  High   Failure -0.0498 -0.160 -0.251 -0.318 -0.376 -0.437
50     CPZ     S11  High   Failure -0.3064 -0.492 -0.656 -0.788 -0.873 -0.939

The sequence of time points is generated using function ‘seq’:

time_pt = seq(0, 1000, 2)   # sequence of time points (1 time point every 2 ms in [0, 1000])
T = length(time_pt)         # number of time points
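As a quick check of the layout described above (curves in columns 5 to 505, one recording every 2 ms), the helper ‘time_to_col’ below maps a time in ms to the corresponding column of ‘impulsivity’. It is a hypothetical helper, not part of package ERP, and it simply assumes the column layout stated in the text.

```r
# Hypothetical helper (not part of package ERP): column of 'impulsivity'
# holding the ERP amplitude at time t_ms, assuming 4 covariate columns
# followed by one column every 2 ms starting at 0 ms.
time_to_col <- function(t_ms, first_col = 5, step_ms = 2) {
  stopifnot(all(t_ms %% step_ms == 0))   # only recorded time points are allowed
  first_col + t_ms / step_ms
}

time_pt <- seq(0, 1000, 2)
length(time_pt)                  # 501 time points
time_to_col(c(0, 300, 1000))     # columns 5, 155 and 505
```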

The main testing issue addressed in the following is the comparison of the ERP curves in the two response inhibition conditions after the onset, which occurs at time 0 ms:

erpdta = impulsivity[, 5:505]     # erpdta contains the ERP curves
covariates = impulsivity[, 1:4]   # contains the experimental covariates of interest

For each of the 3 channels, the dataset contains 48 ERP curves distributed according to the following design:

with(data = covariates, table(Channel, Condition, Group))

, , Group = High

       Condition
Channel Failure Success
    CPZ      12      12
    CZ       12      12
    FCZ      12      12

, , Group = Low

       Condition
Channel Failure Success
    CPZ      12      12
    CZ       12      12
    FCZ      12      12

# Within each channel (here CPZ for example), 12 ERP curves in each Condition x Group

Note finally that the 12 subjects in impulsivity group ‘High’ are not the same as in group ‘Low’:

with(data = covariates, table(Condition, Subject, Group))

, , Group = High

         Subject
Condition S1 S10 S11 S14 S16 S17 S18 S19 S2 S21 S22 S23 S24 S28 S32 S33 S34 S35
  Failure  0   0   3   0   0   3   3   3  3   0   0   0   0   0   3   3   3   3
  Success  0   0   3   0   0   3   3   3  3   0   0   0   0   0   3   3   3   3
         Subject
Condition S36 S4 S5 S6 S8 S9
  Failure   0  3  0  0  3  3
  Success   0  3  0  0  3  3

, , Group = Low

         Subject
Condition S1 S10 S11 S14 S16 S17 S18 S19 S2 S21 S22 S23 S24 S28 S32 S33 S34 S35
  Failure  3   3   0   3   3   0   0   0  0   3   3   3   3   3   0   0   0   0
  Success  3   3   0   3   3   0   0   0  0   3   3   3   3   3   0   0   0   0
         Subject
Condition S36 S4 S5 S6 S8 S9
  Failure   3  0  3  3  0  0
  Success   3  0  3  3  0  0

# Within each channel, 12 subjects in group High, 12 in group Low

Section 2. Functional linear modeling of ERP data

We only show below the histogram and the image plot of the residual time correlation matrix of the linear model introduced in the manuscript for the impulsivity ERP design.

Since the focus of the present ERP significance analysis is on the ‘condition effect’ in each group, we start by selecting the ERP curves in the impulsivity trait group ‘High’:

select = covariates$Group == "High"

The linear model, including the main effects of Subject, Channel and Condition, and the Condition by Channel interaction effect, is defined by means of its design matrix:

design = model.matrix(~ Subject + Channel + Condition + Channel:Condition,
                      data = covariates[select, ])

The function ‘erptest’ implements the ordinary least-squares (OLS) fit of the above linear model for the ERPs at each time point:

erpfit = erptest(erpdta[select, ], design)
# erptest also implements the Benjamini-Hochberg procedure
# to identify significant time intervals (beyond the scope of the paper)

The residuals of the OLS fit at each time point are provided as an output of function ‘erptest’:

erpres = erpfit$residuals
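Because the same design matrix is used at every time point, fitting the model separately at each time point amounts to one multivariate least-squares fit. The sketch below illustrates this on simulated data with base R only; it makes no assumption about how ‘erptest’ is implemented internally, and only shows that a single QR decomposition of the design yields the residuals at all time points at once, checked against column-by-column ‘lm’ fits.

```r
set.seed(1)
n <- 24; T_pts <- 50                      # number of curves and time points (simulated)
grp <- factor(rep(1:2, each = n / 2))
X <- model.matrix(~ grp)                  # design shared by all time points
Y <- matrix(rnorm(n * T_pts), n, T_pts)   # one simulated curve per row

# One QR decomposition of X, applied to every column (time point) of Y
res_all <- qr.resid(qr(X), Y)

# Same residuals, obtained with one OLS fit per time point
res_loop <- sapply(seq_len(T_pts), function(r) residuals(lm(Y[, r] ~ grp)))

max(abs(res_all - res_loop))              # numerically zero
```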

The left plot of Figure 5 in the manuscript is a histogram of the time correlations among residuals (Figure 1 below):

erpcor = cor(erpres)                       # residual time correlation matrix
vcor = erpcor[row(erpcor) > col(erpcor)]   # keeps only the lower-triangular elements
hist(vcor, probability = TRUE, breaks = seq(-1, 1, 0.05), xlim = c(-1, 1), col = "darkgray",
     bty = "l", xlab = "Residual correlations", ylab = "Density",
     main = "Distribution of the residual correlations",
     cex.axis = 1.25, cex.lab = 1.25, cex.main = 1.25)

Figure 1: Histogram of the residual correlations of model (3) in group ‘High’ (x-axis: residual correlations, from -1 to 1; y-axis: density).

The right plot displays the distribution over time of the absolute residual correlations, using the function ‘image.plot’ in package ‘fields’:

fields::image.plot(time_pt, time_pt, abs(erpcor)[, T:1],
                   bty = "l", xlab = "Time (ms)", col = rev(gray.colors(12)), ylab = "Time (ms)",
                   yaxt = "n", main = "Residual correlation pattern over time",
                   cex.axis = 1.25, cex.lab = 1.25, cex.main = 1.25)

Section 3. Functional ANOVA under dependence

Factor modeling of the residual time correlation

In the present section, the factor modeling of the residual time correlation is introduced. First, the residuals of the linear model of Section 2 are obtained using function ‘erptest’:

select = (covariates$Group == "High")
erpfit = erptest(erpdta[select, ], design = design)
erpres = erpfit$residuals

The function ‘nbfactors’ implements the method introduced in the paper, adapted from Friguet et al. (2009), which is based on minimizing the Variance Inflation Criterion:

scaledres = scale(erpres)
nbf = nbfactors(scaledres, diagnostic.plot = TRUE, maxnbfactors = 40, verbose = FALSE)$optimalnbf

The diagnostic plot provides two recommendations: the smaller one, which is also the more conservative, is deduced from an ‘elbow’ rule of thumb, whereas the larger one, the more liberal, is the number of factors minimizing the criterion. In the present section, we choose 14 factors, which corresponds to the more conservative recommendation.

Figure 2: Image plot of the absolute values of the residual correlations (panel title: ‘Residual correlation pattern over time’; axes: time in ms, from 0 to 1000; gray scale from 0 to 1).

Figure 3: Variance Inflation plot to determine the number of factors in the model for the residuals of the linear model in the high impulsivity group (x-axis: number of factors, from 0 to 40; y-axis: Variance Inflation Criterion; the two recommendations marked on the plot are 14 and 27 factors).

Now the factor model parameters for the residual time correlations are estimated by an EM algorithm using function ‘emfa’:

fa = emfa(scaledres, nbf = nbf)   # fits an nbf-factor model for the residual time correlation
Psi = fa$Psi
B = fa$B                          # Psi and B are the factor model parameters
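Under a factor model with q factors, the residual time covariance (a correlation here, since the residuals are scaled) has the form Σ = Ψ + BB′, with Ψ a diagonal matrix of specific variances and B a T × q matrix of loadings. One reason this structure is convenient for generalized least squares is that Σ⁻¹ only requires a q × q inverse, through the Woodbury identity. The sketch below uses arbitrary simulated Ψ and B and base R only; it does not assume that ‘emfa’ or ‘erpFtest’ compute the inverse this way, and it stores Ψ as a vector of diagonal entries, which may differ from the format of ‘fa$Psi’.

```r
set.seed(2)
T_pts <- 200; q <- 5
B   <- matrix(rnorm(T_pts * q, sd = 0.3), T_pts, q)   # loadings (T x q), simulated
psi <- runif(T_pts, 0.2, 1)                           # specific variances (diagonal of Psi)
Sigma <- diag(psi) + tcrossprod(B)                    # Psi + B B'

# Woodbury identity:
# (Psi + BB')^{-1} = Psi^{-1} - Psi^{-1} B (I_q + B' Psi^{-1} B)^{-1} B' Psi^{-1}
woodbury_inverse <- function(psi, B) {
  PinvB <- B / psi                                      # Psi^{-1} B: row r divided by psi[r]
  core  <- solve(diag(ncol(B)) + crossprod(B, PinvB))   # only a q x q inverse
  diag(1 / psi) - PinvB %*% core %*% t(PinvB)
}

Sigma_inv <- woodbury_inverse(psi, B)
max(abs(Sigma_inv - solve(Sigma)))   # numerically zero
```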

Section 5. Application to the impulsivity ERP experiment

The statistical analysis of the impulsivity ERP experiment aims primarily at testing for the condition effect, i.e., successful versus failed response inhibition trials, at different scalp locations in each of the two impulsivity trait groups, while accounting for the subject effect (individual differences). In order to account for all possible effects on the ERP curves, a linear model can be formulated as follows.

Let $Y_{ijklr}$ be the amplitude of the ERP curve observed at time $t_r$, $r = 1, \ldots, T = 501$ (one recording every 2 ms in the interval [0, 1000] ms after the onset), for subject $i = 1, \ldots, 12$, in condition $j = 1, 2$, group $k = 1, 2$ and at channel $l = 1, \ldots, 30$ (whole dataset analysed in the manuscript; $l = 1, 2, 3$ in the extract used here). Then:

$$Y_{ijklr} = \mu_r + \alpha_{i(k)r} + \beta_{jr} + \gamma_{kr} + \delta_{lr} + (\beta\gamma)_{jkr} + (\beta\delta)_{jlr} + (\gamma\delta)_{klr} + (\beta\gamma\delta)_{jklr} + \varepsilon_{ijklr},$$

where $\alpha_{i(k)r}$, $\beta_{jr}$, $\gamma_{kr}$ and $\delta_{lr}$ stand, respectively, for the main effect parameters of subject (nested within group), condition, group and channel, the two-letter effect parameters for the second-order interaction effects, and the three-letter effect parameter for the third-order interaction effect at time $t_r$.

The design matrix of such a linear model can be obtained with the function ‘model.matrix’, using the standard symbolic expression of models in R:

design = model.matrix(~ C(Subject, sum)/Group + Group + Condition + Channel + Channel:Condition +
                        Channel:Group + Condition:Group + Channel:Condition:Group,
                      data = covariates)

In order that all the effect parameters can be viewed as ‘mean’ effect curves over subjects, the ‘Subject’ effect parameters $\alpha_{i(k)r}$ are constrained to sum to 0 by the expression ‘C(Subject,sum)’. Moreover, since the subjects are not the same in the two impulsivity groups, the ‘Subject’ effect is nested within the ‘Group’ effect, which is stated by ‘C(Subject,sum)/Group’.
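The effect of sum-to-zero coding can be seen on a toy three-level factor (purely illustrative): the contrast matrix codes the last level as -1 in every column, so the estimated level effects are deviations from the average of the level means and sum to zero, the last one being minus the sum of the others.

```r
# Toy three-level factor (purely illustrative)
f <- factor(c("a", "b", "c"))

# Sum-to-zero contrasts: last level coded -1 in every column
contr.sum(3)
colSums(contr.sum(3))   # each column sums to zero

# Design matrix with sum-to-zero coding, as produced by C(f, sum) in a formula
X <- model.matrix(~ C(f, sum))

# With one observation per level, the fitted level effects are the deviations
# of the level means from their average
y <- c(1, 4, 7)
fit <- lm(y ~ C(f, sum))
lev_eff <- c(coef(fit)[-1], -sum(coef(fit)[-1]))   # last effect = minus the sum of the others
lev_eff                                            # -3, 0, 3
sum(lev_eff)                                       # 0 (up to rounding)
```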

Testing for the significance of an effect in the present linear model framework consists in comparing the model in which the tested effect is included with the null model, obtained as a submodel of the former by setting the tested effect parameters to zero.
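At a single time point, this comparison is the classical F-test for nested linear models. The sketch below computes it from the residual sums of squares of the full and null fits on simulated data with a hypothetical two-factor design, and checks it against ‘anova’. It is only the pointwise OLS ingredient: the functional generalized F-test of the manuscript additionally accounts for the dependence across time points, which this sketch ignores.

```r
set.seed(3)
dat <- expand.grid(A = factor(1:2), B = factor(1:3), rep = 1:4)   # hypothetical design
dat$y <- rnorm(nrow(dat))
X  <- model.matrix(~ A * B, data = dat)   # full model, with the A:B interaction
X0 <- model.matrix(~ A + B, data = dat)   # null submodel: interaction set to zero

nested_F <- function(y, X, X0) {
  rss  <- sum(qr.resid(qr(X),  y)^2)      # residual sum of squares, full model
  rss0 <- sum(qr.resid(qr(X0), y)^2)      # residual sum of squares, null model
  df1  <- qr(X)$rank - qr(X0)$rank        # number of tested parameters
  df2  <- length(y) - qr(X)$rank          # residual degrees of freedom
  Fstat <- ((rss0 - rss) / df1) / (rss / df2)
  c(F = Fstat, df1 = df1, df2 = df2, p = pf(Fstat, df1, df2, lower.tail = FALSE))
}

nested_F(dat$y, X, X0)
anova(lm(y ~ A + B, data = dat), lm(y ~ A * B, data = dat))   # same F and p-value
```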

In the present situation, the design matrix of the null model is simply obtained as a submatrix of the design matrix ‘design’, by removing the third-order interaction parameters:

design0 = model.matrix(~ C(Subject, sum)/Group + Group + Condition + Channel + Channel:Condition +
                         Channel:Group + Condition:Group,
                       data = covariates)

The function ‘erpFtest’ implements the functional F-test presented in the manuscript, whose distinctive feature is to account for the strong time dependence across the residuals of the model. This is handled by a factor model for the time dependence, whose complexity essentially depends on the number of factors. Therefore, the first step of the testing procedure aims at finding the proper number of factors, between 0, in which case the functional F-test does not account for dependence, and the residual degrees of freedom of the model, in which case the regression factor model is saturated. This can be achieved using the function ‘erpFtest’ with ‘nbf=NULL’ and ‘pvalue="none"’, meaning that no p-value is computed in this first step:

F = erpFtest(erpdta, design, design0, nbf = NULL, pvalue = "none", wantplot = TRUE)

Figure 4: Impulsivity study: Variance Inflation Criterion curve for choosing the number of factors in the generalized functional LRT procedure (x-axis: number of factors, from 0 to 80; y-axis: Variance Inflation Criterion; the values marked on the plot are 10 and 72 factors).

F$nbf # Recommended number of factors

[1] 10

The above command implements the procedure introduced in Friguet et al. (2009) to find the number of factors for testing issues in regression factor models. It produces the plot of a so-called Variance Inflation Criterion against the number of factors, which reaches its minimal value when the proper number of factors is introduced in the model. As is always the case with such issues, whatever the criterion chosen to find the number of factors, the shape of the curve on which the decision is based has to be considered with caution. It is indeed rarely the case that an obvious global minimum appears, with an unambiguous choice of the number of factors. The typical variance inflation curve first decreases rapidly, then more slowly, and finally increases. The decision rule consisting in choosing the value for which the criterion is minimal may lead to keeping a very large number of factors, which exposes the analysis to the risk of overfitting and, subsequently, of a too liberal test, in which the type-I error level is not controlled.

For this reason, a heuristic is implemented in ‘erpFtest’: it identifies the lowest number of factors q for which the decrease in the criterion with respect to the preceding factor model with q − 1 factors does not exceed 5% of the largest value of the criterion. The resulting value is provided in the ‘nbf’ component of the output of ‘erpFtest’. The testing procedure can be continued with this value, which protects against a too liberal decision rule. Alternatively, in order to gain power, one may prefer to determine the number of factors from the curve by a personal rule of thumb, leading to a number of factors between the usually conservative output of ‘erpFtest’ and the usually too liberal minimizer of the criterion. Hereafter, we choose to continue the procedure with the lowest of the recommended numbers of factors.
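The stopping rule described above can be written in a few lines. The function ‘choose_nbf’ below is a hypothetical re-implementation, following one reading of the rule, for illustration only (it is not the code of ‘erpFtest’): given the criterion values for q = 0, 1, ..., Q factors, it returns the smallest q whose decrease from q − 1 does not exceed 5% of the largest criterion value, next to the liberal choice that minimizes the criterion. The criterion values in the example are hypothetical.

```r
# Hypothetical helper (not the internal code of erpFtest)
choose_nbf <- function(crit, threshold = 0.05) {
  q <- seq_along(crit) - 1                 # crit[1] corresponds to q = 0
  decrease <- -diff(crit)                  # crit(q - 1) - crit(q), for q = 1, ..., Q
  small <- which(decrease <= threshold * max(crit))
  conservative <- if (length(small)) q[small[1] + 1] else max(q)
  liberal <- q[which.min(crit)]            # minimizer of the criterion
  c(conservative = conservative, liberal = liberal)
}

# Hypothetical criterion: fast decrease, then slow decrease, then increase
crit <- c(50, 40, 33, 28, 25, 23.5, 22.8, 22.3, 22.0, 21.9, 22.1, 22.6)
choose_nbf(crit)   # conservative = 5, liberal = 9
```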

Once this number of factors is determined, the function ‘erpFtest’ can be used again, this time with a fixed number of factors and either pvalue="Satterthwaite", the default option, or pvalue="MC". The former option returns a p-value based on a Satterthwaite approximation of the null distribution by a $c\chi^2_\nu$ distribution. Using the other option, ‘pvalue="MC"’, the p-value is deduced from a Monte-Carlo approximation of the null distribution. Although this is supposed to be a more accurate approximation of the null distribution than the Satterthwaite approximation, the Monte-Carlo option needs at least 1000 calculations of the functional F-test after random permutations of the rows of the design matrix (‘nsamples=1000’), which can take a long time when the sample size and the number of time frames are large. Moreover, the Monte-Carlo method cannot properly estimate p-values lower than 1/nsamples. The following R chunk is not evaluated, to avoid a too long execution time:

F = erpFtest(erpdta, design, design0, nbf = F$nbf)
F$pval   # p-value of the functional F-test
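The Satterthwaite approximation replaces the null distribution of a positive statistic by a scaled chi-square $c\chi^2_\nu$ with the same first two moments. As a sketch, assuming the null mean m and variance v of the statistic are available (how ‘erpFtest’ obtains them is not shown here), matching m = cν and v = 2c²ν gives c = v/(2m) and ν = 2m²/v. The example applies this to a weighted sum of independent χ²₁ variables, for which m = Σλ and v = 2Σλ², and compares the approximate tail probability with a simulated one; the approximation is not exact.

```r
satterthwaite <- function(m, v) {
  # Moment matching: scale * df = m and 2 * scale^2 * df = v
  c(scale = v / (2 * m), df = 2 * m^2 / v)
}

satterthwaite_pvalue <- function(stat, m, v) {
  s <- satterthwaite(m, v)
  pchisq(stat / s[["scale"]], df = s[["df"]], lower.tail = FALSE)
}

# Weighted sum Q = sum(lambda_i * chi2_1): mean sum(lambda), variance 2 * sum(lambda^2)
lambda <- c(3, 1, 0.5, 0.25)
m <- sum(lambda); v <- 2 * sum(lambda^2)
satterthwaite(m, v)

# Approximate versus simulated upper-tail probability at the value 10
set.seed(4)
Q <- colSums(lambda * matrix(rchisq(4 * 1e5, df = 1), nrow = 4))
c(approx = satterthwaite_pvalue(10, m, v), simulated = mean(Q >= 10))
```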

It turns out that the third-order interaction effect is not significant: the spatial distribution of the condition effect may be considered the same in the two impulsivity groups.

Section 6. Simulation Study.

Impact of the number of factors on the ability to detect the signal

We use a data-driven simulation set-up defined as follows: the design is made of 24 ERP curves, the first 12 forming one experimental group and the remaining 12 another experimental group. The testing issue is the comparison of the means of the two groups:

n = 24
group = factor(rep(c(1, 2), c(n/2, n/2)))
# The methods will all test for the mean difference between
# the group of (n/2) first ERP curves
# and the group of remaining (n/2) ERP curves
simdesign = model.matrix(~ group, data = data.frame(group = group))

The simulated ERP datasets mimic the time dependence estimated from the residuals of the linear model introduced in Section 2 to analyse the ‘condition’ effect in the high impulsivity trait group:

select = (covariates$Group == "High")
design = model.matrix(~ Subject + Channel + Condition + Channel:Condition,
                      data = covariates[select, ])
erpfit = erptest(erpdta[select, ], design = design)
res = erpfit$residuals
Sigma = var(res)

The R script for the simulation study is encapsulated in the function ‘simuldata’:

simuldata = function(n, sigma, group, maxcoef) {
  # Two-group comparison design
  # n: number of curves
  # sigma: variance-covariance matrix (time dependence)
  # group: numeric group indicator; the waveform is added to the curves with group == 1
  # maxcoef: maximal difference between group mean curves
  T = ncol(sigma)
  signal = rep(0, T)
  center = 0.25      # location of the waveform (as a fraction of the whole time frame)
  int.length = 0.4   # duration of the waveform
  start = round(center*T - int.length*T/2)   # starting time point for the waveform
  end = round(center*T + int.length*T/2)     # end time point
  sc = ((start:end) - mean(start:end))/max((start:end) - mean(start:end))   # standardized x-values
  # Waveform in [start, end]: exponentially attenuated sinusoid function
  signal[start:end] = rev(-maxcoef*exp(-maxcoef*(sc + 1)^2)*sin(0.05*sc*180)/
                            max(exp(-maxcoef*(sc + 1)^2)*abs(sin(0.05*sc*180))))
  dta = rmvnorm(n = n, mean = rep(0, T), sigma = sigma)
  dta[group == 1, ] = dta[group == 1, ] +
    matrix(rep(signal, sum(group == 1)), nrow = sum(group == 1), byrow = TRUE)
  return(dta)
}

Now, the true waveform effect curve is defined, with nine values of the maximal amplitude, from 0 to 5:

vmaxcoef = c(0, 5, 10, 25, 50, 100, 200, 300, 500)*1e-02
# All the values for the maximal amplitude of the waveform under
# the non-null hypothesis
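The normalization in ‘simuldata’ divides the waveform by its own largest absolute value, so ‘maxcoef’ is exactly the largest absolute value of the simulated signal. The helper ‘waveform’ below is hypothetical: it only copies the signal-construction lines of ‘simuldata’, so that this property can be checked for all values of ‘vmaxcoef’ without simulating any noise (T = 501 time points, as in the ERP data).

```r
# Hypothetical helper: the signal part of 'simuldata', isolated
waveform <- function(T_pts, maxcoef, center = 0.25, int.length = 0.4) {
  signal <- rep(0, T_pts)
  start <- round(center * T_pts - int.length * T_pts / 2)
  end   <- round(center * T_pts + int.length * T_pts / 2)
  sc <- ((start:end) - mean(start:end)) / max((start:end) - mean(start:end))
  signal[start:end] <- rev(-maxcoef * exp(-maxcoef * (sc + 1)^2) * sin(0.05 * sc * 180) /
                             max(exp(-maxcoef * (sc + 1)^2) * abs(sin(0.05 * sc * 180))))
  signal
}

vmaxcoef <- c(0, 5, 10, 25, 50, 100, 200, 300, 500) * 1e-02
sapply(vmaxcoef, function(m) max(abs(waveform(501, m))))   # equals vmaxcoef
```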

First, ‘nbsimul’ datasets are simulated and stored in a list named ‘ldta’:

nbsimul = 1   # Should be changed to 1000 to get the results shown in the paper
k = 3         # Chosen arbitrarily. Can be changed to fix the signal strength.
maxcoef = vmaxcoef[k]
ldta = lapply(rep(n, nbsimul), simuldata, sigma = Sigma, group = simdesign[, 2], maxcoef = maxcoef)
# ldta is a list of simulated ERP datasets

We only show here how to calculate the detection rate of T-GLS for one particular choice of the number of factors and one particular choice of the maximal amplitude of the true signal (here ‘maxcoef = vmaxcoef[3] = 0.1’). For each dataset, the p-value of T-GLS for the two-group comparison with a given number of factors is approximated by a Monte-Carlo method with 1000 Monte-Carlo samples, implemented in the function ‘erpFtest’:

vnbf = 0:21   # All the possible numbers of factors
nbf = 5       # Just for illustration (change to any value in vnbf)
ltest = lapply(ldta, erpFtest, design = simdesign, nbf = nbf, pvalue = "MC", nsamples = 1000)
# For each dataset in ldta, calculates the p-value of T-GLS with nbf factors
pval = unlist(lapply(ltest, function(x) x$pval))   # extract the p-values
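A Monte-Carlo (permutation) p-value is the proportion of statistics computed on permuted data that are at least as large as the observed one, so with ‘nsamples’ permutations it can only take the values 0, 1/nsamples, 2/nsamples, and so on; this is why p-values below 1/nsamples cannot be estimated. The sketch below illustrates this on a simulated two-group design like ‘simdesign’. As a stand-in statistic it uses the sum over time points of the pointwise two-sample F statistics, which is an assumption for illustration and not the T-GLS statistic computed by ‘erpFtest’.

```r
set.seed(5)
n <- 24; T_pts <- 100
group <- factor(rep(1:2, each = n / 2))
Y <- matrix(rnorm(n * T_pts), n, T_pts)
Y[group == 2, 20:40] <- Y[group == 2, 20:40] + 1   # mean difference on an interval

# Stand-in functional statistic: sum over time points of pointwise F statistics
sum_pointwise_F <- function(Y, group) {
  X    <- model.matrix(~ group)
  rss  <- colSums(qr.resid(qr(X), Y)^2)            # two-group model
  rss0 <- colSums(sweep(Y, 2, colMeans(Y))^2)      # null model: common mean
  sum((rss0 - rss) / (rss / (nrow(Y) - 2)))
}

mc_pvalue <- function(Y, group, nsamples = 1000) {
  obs  <- sum_pointwise_F(Y, group)
  null <- replicate(nsamples, sum_pointwise_F(Y, sample(group)))   # permuted labels
  mean(null >= obs)
}

p <- mc_pvalue(Y, group, nsamples = 1000)
p * 1000   # an integer count: p lies on a grid of step 1/nsamples
```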

Finally, the detection rate of T-GLS is deduced:

mean(pval <= 0.05)   # Detection rate

[1] 0
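The detection rate is the proportion of simulated datasets whose p-value is at most 0.05, i.e. an estimate of the power of the test (or of its type-I error rate when maxcoef = 0). Since each dataset contributes a 0/1 outcome, the estimate has binomial standard error sqrt(p(1 − p)/nbsimul). The hypothetical helper below reports both; it shows why a single simulated dataset only yields a rate of 0 or 1, and how precise the rate becomes with the 1000 replications mentioned above.

```r
# Hypothetical helper: detection rate and its Monte-Carlo standard error
detection_rate <- function(pval, level = 0.05) {
  rate <- mean(pval <= level)
  c(rate = rate, se = sqrt(rate * (1 - rate) / length(pval)))
}

detection_rate(c(0.01, 0.20, 0.04, 0.50))   # rate 0.5, se 0.25

# Largest possible standard error with 1000 replications (rate = 0.5)
sqrt(0.5 * 0.5 / 1000)
```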

Comparison of signal detection methods

For illustration, we show, on the same small number of simulated datasets as above, how to derive the p-values of each signal detection method in the comparison study.

T-OLS and T-GLS with 0 factors

The p-values of T-OLS and T-GLS with 0 factors can be obtained by the same call of ‘erpFtest’ with ‘nbf=0’:

ltest = lapply(ldta, erpFtest, design = simdesign, nbf = 0,
               pvalue = "MC", verbose = FALSE, nsamples = 1000)
# For each dataset in ldta, calculates the p-values of
# T-OLS and T-GLS with nbf=0 factors
pval.Fols = unlist(lapply(ltest, function(x) x$pval.Fols))   # extract the p-values of T-OLS
pval.Fgls = unlist(lapply(ltest, function(x) x$pval))        # extract the p-values of T-GLS
mean(pval.Fols <= 0.05)   # Detection rate of T-OLS

[1] 0

mean(pval.Fgls <= 0.05)   # Detection rate of T-GLS

[1] 0

T-GLS with a data-driven choice of the number of factors

The p-value of T-GLS, where the number of factors is chosen based on the Variance Inflation criterion, is provided by the function ‘erpFtest’ with ‘nbf=NULL’:

ltest = lapply(ldta, erpFtest, design = simdesign, nbf = NULL, pvalue = "MC",
               nsamples = 1000, wantplot = FALSE)
# For each dataset in ldta, calculates the p-values of T-GLS
pval.Fgls = unlist(lapply(ltest, function(x) x$pval))   # extract the p-values of T-GLS
mean(pval.Fgls <= 0.05)

[1] 0

Supporting information - Web appendix 4

Simulations conducted after the manner of Zhang et al. (2019)

To address the impact of (1) a departure from normality of the distribution of the residual errors and (2) a low time dependence across those residuals, we perform here a simulation study similar to that reported in Zhang et al. (2019), “A new test for functional one-way ANOVA with applications to ischemic heart screening”, Computational Statistics, 34(2). In that study, the matrix of error terms is obtained by multiplying a random matrix, whose entries are independently distributed as either normal or (scaled) Student's t random variables with 4 degrees of freedom, by a fixed low-rank matrix. The fixed low-rank matrix depends on a parameter ρ governing the time dependence across residuals (the larger ρ, the weaker the residual correlation).

The simulation study consists of a three-group comparison of curves sampled at either T = 80 or T = 150 time points. The sample sizes for the three groups are 20, 30 and 30, respectively (one of three possibilities in Zhang et al. (2019); the results were very similar when the other possibilities were tried). Another parameter δ governs the amplitude of the group effect.

To introduce uncertainty in the determination of the number of factors for the dependence across residuals, we have added white noise to the simulation model of Zhang et al. (2019), without changing the overall signal-to-noise ratios but with a small change in the amount of time dependence (the variance of the white noise is fixed to 5% of the variance of the whole dependent + white noise). Finally, to investigate how a severe asymmetry in the distribution of the error terms would affect the performance of the methods, we have also considered the case in which residuals follow a (centered and scaled) $\chi^2_1$ distribution, i.e., a chi-square distribution with 1 degree of freedom.
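The mixing step of ‘simuldata.zhang’ combines the dependent errors and independent Gaussian noise as √(1 − w) × dependent part + noise with standard deviation √w times the pointwise standard deviation s of the dependent part, with w = ‘white.noise’ = 0.05. Its variance is (1 − w)s² + ws² = s², so the overall noise level, and hence the signal-to-noise ratio, is unchanged while a fraction w of the variance becomes white noise. The quick numerical check below works at a single time point, with an arbitrary s and a Gaussian stand-in for the dependent part.

```r
set.seed(6)
w <- 0.05; s <- 2; N <- 1e5
dep   <- rnorm(N, sd = s)                                  # stand-in for the dependent errors
mixed <- sqrt(1 - w) * dep + rnorm(N, sd = sqrt(w) * s)    # mixing as in simuldata.zhang
c(var_dependent = var(dep), var_mixed = var(mixed),
  white_share = w * s^2 / var(mixed))                      # both variances close to s^2 = 4
```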

The R script to reproduce the simulation study is encapsulated in the following function ‘simuldata.zhang’, with optional arguments to change the simulation parameters introduced by Zhang et al. (2019):

simuldata.zhang = function(k, n, M, c1, delta, u, rho, randtype = c("gaussian", "student", "chi2"),
                           nb_funct = 11, a = 1.5, white.noise = 0.05) {
  # k: number of groups, n: vector of sample sizes per group, M: number of time points
  # c1: coefficients of the 3rd order polynomial defining the mean curve in group 1
  # delta: scalar governing the group effect
  # u: perturbation vector for the group effect (ci = c1 + (i-1)*delta*u)
  # rho: scalar governing the time dependence
  # randtype: family of distributions for the residuals
  # nb_funct: rank of the matrix used to generate the time dependence
  #   across residuals (set to 11 in Zhang et al., 2019)
  # a: scalar controlling the impact of rho on time dependence
  #   (set to 1.5 in Zhang et al., 2019)
  # white.noise: proportion of variance of the white noise w.r.t. dependent + white
  #   noise (set to 0.05)
  randtype = match.arg(randtype)   # a single family is required by the if() tests below
  time_pt = (1:M)/(M + 1)
  mt = outer(0:3, time_pt, function(k, x) x^k)              # polynomial basis (4 x M)
  mc = rep(1, k) %*% t(c1) + delta*(0:(k - 1)) %*% t(u)     # group coefficients (k x 4)
  mc = mc[rep(1:nrow(mc), times = n), ]                     # one row per curve
  mmu = mc %*% mt                                           # mean curves (sum(n) x M)
  Psi = matrix(1, nrow = nb_funct, ncol = length(time_pt))
  Psi[seq(2, nb_funct, 2), ] = outer(seq(2, nb_funct, 2), time_pt, function(r, t) sqrt(2)*sin(2*pi*r*t))
  Psi[seq(3, nb_funct, 2), ] = outer(seq(3, nb_funct, 2), time_pt, function(r, t) sqrt(2)*cos(2*pi*r*t))
  if (randtype == "gaussian") z = matrix(rnorm(sum(n)*nb_funct), nrow = sum(n), ncol = nb_funct)
  if (randtype == "student") z = matrix(rt(sum(n)*nb_funct, df = 4)/sqrt(2), nrow = sum(n), ncol = nb_funct)
  if (randtype == "chi2") z = matrix((rchisq(sum(n)*nb_funct, df = 1) - 1)/sqrt(2), nrow = sum(n), ncol = nb_funct)
  lambda = a*rho^(1:nb_funct)                               # decreasing eigenvalues
  mb = z*(rep(1, sum(n)) %*% t(sqrt(lambda)))               # random scores
  mv = mb %*% Psi                                           # dependent errors (sum(n) x M)
  meanm = colMeans(mv)
  mvc = mv - rep(1, nrow(mv)) %*% t(meanm)
  rsdv = sqrt(colMeans(mvc^2))                              # pointwise sd of the dependent errors
  # sd repeated column-wise so that each time point gets its own rsdv
  mv = sqrt(1 - white.noise)*mv +
    matrix(rnorm(sum(n)*M, sd = rep(sqrt(white.noise)*rsdv, each = sum(n))), nrow = sum(n), ncol = M)
  mmu + mv
}
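The role of ρ can be read off the eigenvalues λ_r = a ρ^r (r = 1, ..., nb_funct) used in ‘simuldata.zhang’. The first row of ‘Psi’ is constant over time, so its component induces perfectly correlated errors, and its share of the total eigenvalue mass, λ_1/Σλ_r = 1/(1 + ρ + ... + ρ^(nb_funct − 1)), decreases as ρ grows. This gives one way to see why a larger ρ yields a weaker residual correlation. The computation below uses the default values a = 1.5 and nb_funct = 11.

```r
a <- 1.5; nb_funct <- 11
vrho <- c(0.1, 0.3, 0.5, 0.7, 0.9)
share_first <- sapply(vrho, function(rho) {
  lambda <- a * rho^(1:nb_funct)   # eigenvalues as in simuldata.zhang
  lambda[1] / sum(lambda)          # share carried by the constant basis function
})
round(setNames(share_first, vrho), 3)
```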

Most of the arguments of the function ‘simuldata.zhang’ are set to the default values proposed in Zhang et al. (2019). Alternative possibilities presented in Zhang et al. (2019) are also given in comments.

nb_groups = 3   # 3 groups of curves
M = 80          # 80 time points. Alternatively, M = 150
a = 1.5         # scalar controlling the impact of rho on time dependence
nb_funct = 11   # rank of the matrix used to generate the time dependence
vrho = c(0.1, 0.3, 0.5, 0.7, 0.9)   # possible values for rho in the simulation study
c1 = c(1, 2.3, 3.4, 1.5)            # coefficients of the 3rd order polynomial defining
                                    # the mean curve in group 1
u = c(1, 2, 3, 4)/sqrt(30)          # perturbation vector for the group effect (ci = c1 + (i-1)*delta*u)
mdelta = rbind(c(0, 0.03, 0.06, 0.10, 0.13), c(0, 0.05, 0.10, 0.15, 0.20),
               c(0, 0.10, 0.20, 0.30, 0.40), c(0, 0.20, 0.40, 0.60, 0.80),
               c(0, 0.30, 0.60, 0.90, 1.20))
# Possible values for delta depending on rho
# 1st row of mdelta: rho = 0.1, 2nd row: rho = 0.3, ...
n1 = c(20, 30, 30)   # sample sizes per group
n2 = c(40, 30, 70)
n3 = c(80, 70, 100)
n = n1               # Results will only be shown for this choice of sample sizes
vrandtype = c("gaussian", "student", "chi2")   # family of distributions for the residuals
groups = factor(rep(1:nb_groups, n))
simdesign = model.matrix(~ groups, data = data.frame(groups = groups))
# Design matrix of the 1-way ANOVA design
nbsimul = 1   # Number of simulations: set to 1,000 in an actual study

For illustration, a single run of the simulation study, setting randtype = "gaussian", rho = 0.1 and delta = 0.03 in the function ‘simuldata.zhang’, is implemented as follows:

# for (i in 1:length(vrandtype)) {   # Loop over the possible distributions
i = 1                                # For the present demonstration, randtype = "gaussian"
randtype = vrandtype[i]
# for (j in 1:length(vrho)) {        # Loop over the possible values for rho
j = 1                                # For the present demonstration, rho = 0.1
rho = vrho[j]
# for (k in 1:ncol(mdelta)) {        # Loop over the possible values for delta
k = 2                                # For the present demonstration, delta = 0.03
delta = mdelta[j, k]
ldta = lapply(rep(nb_groups, nbsimul), simuldata.zhang,
              n = n1, M = M, c1 = c1, delta = delta, u = u, rho = rho, randtype = randtype)
# ldta is a list of nbsimul datasets
respval = matrix(0, nrow = nbsimul, ncol = 5)
colnames(respval) = c("L2b", "GPF", "Fmaxb", "TRP", "FGLS")
# Initializes an empty matrix of p-values, with fANOVA methods in columns
for (l in 1:nbsimul) {
  res = fanova.tests(t(ldta[[l]]), groups, parallel = FALSE, test = c("L2b", "GPF", "Fmaxb", "TRP"))
  # parallel can be turned to TRUE with nslaves = number of cores
  respval[l, 1] = res$L2b$pvalueL2b
  respval[l, 2] = res$GPF$pvalueGPF
  respval[l, 3] = res$Fmaxb$pvalueFmaxb
  respval[l, 4] = res$TRP$pvalues.anova
  res = erpFtest(ldta[[l]], design = simdesign, nbf = NULL, verbose = FALSE,
                 pvalue = "MC", nsamples = 1000, wantplot = FALSE, nbfmax = 20)
  respval[l, 5] = res$pval
}
# Report the current simulation parameters
print(paste("randtype: ", randtype, ", rho: ", rho, ", delta: ", delta, sep = ""))

[1] "randtype: gaussian, rho: 0.1, delta: 0.03"

print(respval)

      L2b   GPF Fmaxb  TRP  FGLS
[1,] 0.53 0.566 0.343 0.61 0.128

# }
# }
# }
