
| INVESTIGATION

Likelihood-Free Inference in High-Dimensional Models

Athanasios Kousathanas,*,† Christoph Leuenberger,‡ Jonas Helfer,§ Mathieu Quinodoz,**

Matthieu Foll,†† and Daniel Wegmann*,†,1

*Department of Biology and Biochemistry and ‡Department of Mathematics, University of Fribourg, 1700 Fribourg, Switzerland, †Swiss Institute of Bioinformatics, 1700 Fribourg, Switzerland, §Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, **Department of Computational Biology, University of Lausanne, 1200 Lausanne, Switzerland, and ††International Agency for Research on Cancer, 69372 Lyon, France

ABSTRACT Methods that bypass analytical evaluations of the likelihood function have become an indispensable tool for statistical inference in many fields of science. These so-called likelihood-free methods rely on accepting and rejecting simulations based on summary statistics, which limits them to low-dimensional models for which the value of the likelihood is large enough to result in manageable acceptance rates. To get around these issues, we introduce a novel, likelihood-free Markov chain Monte Carlo (MCMC) method combining two key innovations: updating only one parameter per iteration and accepting or rejecting this update based on subsets of statistics approximately sufficient for this parameter. This increases acceptance rates dramatically, rendering this approach suitable even for models of very high dimensionality. We further derive that for linear models, a one-dimensional combination of statistics per parameter is sufficient and can be found empirically with simulations. Finally, we demonstrate that our method readily scales to models of very high dimensionality, using toy models as well as by jointly inferring the effective population size, the distribution of fitness effects (DFE) of segregating mutations, and selection coefficients for each locus from data of a recent experiment on the evolution of drug resistance in influenza.

KEYWORDS approximate Bayesian computation; distribution of fitness effects; hierarchical models; high dimensions; Markov chain Monte Carlo

THE past decade has seen a rise in the application of Bayesian inference algorithms that bypass likelihood calculations with simulations. Indeed, these generally termed likelihood-free or approximate Bayesian computation (ABC) (Beaumont et al. 2002) methods have been applied in a wide range of scientific disciplines, including cosmology (Schafer and Freeman 2012), ecology (Jabot and Chave 2009), protein-network evolution (Ratmann et al. 2007), phylogenetics (Fan and Kubatko 2011), and population genetics (Cornuet et al. 2008). Arguably ABC has had its greatest success in population genetics because inferences in this field are frequently conducted under complex models for which likelihood calculations are intractable, thus necessitating inference through simulations.

Let us consider a model M that depends on n parameters θ, creates data D, and has the posterior distribution

$$p(\theta|D) = \frac{\mathbb{P}(D|\theta)\,p(\theta)}{\int \mathbb{P}(D|\theta)\,p(\theta)\,d\theta},$$

where p(θ) is the prior and ℙ(D|θ) is the likelihood function. ABC methods bypass the evaluation of ℙ(D|θ) by performing simulations with parameter values sampled from p(θ) that generate D, which in turn is summarized by a set of m-dimensional statistics s. The posterior distribution is then evaluated by accepting such simulations that reproduce the statistics calculated from the observed data (s_obs),

$$p(\theta|s) = \frac{\mathbb{P}(s = s_{obs}|\theta)\,p(\theta)}{\int \mathbb{P}(s = s_{obs}|\theta)\,p(\theta)\,d\theta}.$$

However, for models with m ≫ 1 the condition s = s_obs might be too restrictive and require a prohibitively large simulation effort. Therefore, an approximation step can be employed by relaxing the condition s = s_obs to ‖s − s_obs‖ ≤ δ, where

Copyright © 2016 by the Genetics Society of America
doi: 10.1534/genetics.116.187567
Manuscript received January 26, 2016; accepted for publication April 4, 2016; published Early Online April 6, 2016.
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.187567/-/DC1.
1Corresponding author: Department of Biology, University of Fribourg, Chemin du Musée 10, 1200 Fribourg, Switzerland. E-mail: [email protected]

Genetics, Vol. 203, 893–904, June 2016


‖x − y‖ is a distance metric of choice between x and y and δ is a chosen distance (tolerance) below which simulations are accepted. The posterior p(θ|s) is thus approximated by

$$p(\theta|s) = \frac{\mathbb{P}(\|s - s_{obs}\| \le \delta \,|\, \theta)\,p(\theta)}{\int \mathbb{P}(\|s - s_{obs}\| \le \delta \,|\, \theta)\,p(\theta)\,d\theta}.$$
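To make the rejection step concrete, here is a minimal Python sketch of rejection ABC for a one-parameter toy model (the simulator, prior, and tolerance are illustrative assumptions, not part of the original article):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, n=20):
    """Toy simulator: n draws from N(theta, 1); returns the summary statistic (sample mean)."""
    return rng.normal(theta, 1.0, size=n).mean()

# "Observed" data summarized by its statistic
s_obs = simulate(theta=2.0)

def rejection_abc(prior_draw, tolerance, n_sims=100_000):
    """Keep parameter values whose simulated statistic lands within `tolerance` of s_obs."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_draw()
        s = simulate(theta)
        if abs(s - s_obs) <= tolerance:      # ||s - s_obs|| <= delta
            accepted.append(theta)
    return np.array(accepted)

posterior_sample = rejection_abc(lambda: rng.uniform(-10, 10), tolerance=0.1)
print(len(posterior_sample), posterior_sample.mean())
```

Lowering the tolerance tightens the approximation at the cost of a rapidly shrinking acceptance rate, which is exactly the limitation that the MCMC variants discussed next address.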

An important advance in ABC inference was the development of methods coupling ABC with Markov chain Monte Carlo (MCMC) (Marjoram et al. 2003). These methods allow efficient sampling of the parameter space in regions of high likelihood, thus requiring fewer simulations to obtain posterior estimates (Wegmann et al. 2009). The original ABC-MCMC algorithm proposed by Marjoram et al. (2003) is as follows (a minimal code sketch is given after the listing):

1. If now at θ, propose to move to θ′ according to the transition kernel q(θ′|θ).
2. Simulate D using model M with θ′ and calculate summary statistics s for D.
3. If ‖s − s_obs‖ ≤ δ, go to step 4; otherwise go to step 1.
4. Calculate the Metropolis–Hastings ratio
$$h = h(\theta, \theta') = \min\left(1, \frac{p(\theta')\,q(\theta|\theta')}{p(\theta)\,q(\theta'|\theta)}\right).$$
5. Accept θ′ with probability h; otherwise stay at θ. Go to step 1.
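Below is a minimal Python sketch of this ABC-MCMC scheme under assumed settings (toy simulator, uniform prior, symmetric Gaussian proposal); it illustrates the algorithm above and is not the authors' implementation. Because the proposal kernel is symmetric, the Metropolis–Hastings ratio reduces to a prior ratio here.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(theta, n=20):
    """Toy simulator returning a 1-d summary statistic (sample mean of N(theta, 1) draws)."""
    return rng.normal(theta, 1.0, size=n).mean()

def log_prior(theta, lo=-10.0, hi=10.0):
    return 0.0 if lo <= theta <= hi else -np.inf   # uniform prior

s_obs = simulate(theta=2.0)

def abc_mcmc(n_iter=50_000, delta=0.1, proposal_sd=0.5, theta0=0.0):
    chain = np.empty(n_iter)
    theta = theta0
    for t in range(n_iter):
        theta_prop = theta + rng.normal(0.0, proposal_sd)   # step 1: symmetric kernel
        s = simulate(theta_prop)                            # step 2: simulate and summarize
        if abs(s - s_obs) <= delta:                         # step 3: distance check
            # step 4: Metropolis-Hastings ratio; q cancels for a symmetric kernel
            log_h = log_prior(theta_prop) - log_prior(theta)
            if np.log(rng.uniform()) < min(0.0, log_h):     # step 5: accept or reject
                theta = theta_prop
        chain[t] = theta
    return chain

chain = abc_mcmc()
print("posterior mean ~", chain[5000:].mean())
```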

The sampling success of ABC algorithms is given by the likelihood values, which are often very low even for relatively large tolerance values δ. In such situations, the condition ‖s − s_obs‖ ≤ δ will impose a quite rough approximation to the posterior. As a result, the utility of the ABC approaches described above is limited to models of relatively low dimensionality, typically up to 10 parameters (Blum 2010; Fearnhead and Prangle 2012). The same limitation applies to the more recently developed sequential Monte Carlo sampling methods (Sisson et al. 2007; Beaumont et al. 2009). Despite these limitations, ABC has been useful in addressing population genetics problems of low to moderate dimensionality such as the inference of demographic histories (e.g., Wegmann and Excoffier 2010; Brown et al. 2011; Adrion et al. 2014) or selection coefficients of a single locus (e.g., Jensen et al. 2008). However, as more genomic data become available, there is increasing interest in applying ABC to models of higher dimensionality, such as to estimate genome-wide and locus-specific effects jointly.

To our knowledge, to date, three approaches have been suggested to tackle high dimensionality with ABC. The first approach proposes an expectation propagation approximation to factorize the data space (Barthelmé and Chopin 2014), which is an efficient solution for situations with high-dimensional data, but does not directly address the issue of high-dimensional parameter spaces. The second approach consists of first inferring marginal posterior distributions on low-dimensional subsets of the parameter space [either one (Nott et al. 2012) or two dimensions (Li et al. 2015)] and then reconstructing the joint posterior distribution from those. This approach benefits from the lower dimensionality of the statistics space when considering subsets of the parameters individually and hence renders the acceptance criterion meaningful. The third approach achieves the same benefit by formulating the problem using hierarchical models, proposing to estimate the hyperparameters first, and then fixing them when inferring parameters of lower hierarchies individually (Bazin et al. 2010).

Among these, the approach by Bazin et al. (2010) is the most relevant for population genetics problems, since those are frequently specified in a hierarchical fashion by modeling genome-wide effects as hyperparameters and locus-specific effects at lower hierarchies. In this way, Bazin et al. (2010) estimated locus-specific selection coefficients and deme-specific migration rates of an island model from microsatellite data. Furthermore, this approach has inspired the development of similar methods for estimating more complex migration patterns (Aeschbacher et al. 2013) and locus-specific selection from time-series data (Foll et al. 2015). However, this approach and its derivatives will not recover the true joint distribution if parameters are correlated, which is a common feature of such complex models.

Here, we introduce a new ABC algorithm that exploits the reduction of dimensionality of the summary statistics when focusing on subsets of parameters, but couples the parameter updates in an MCMC framework. As we prove below, this coupling ensures that our algorithm converges to the true joint posterior distribution even for models of very high dimensions. We then demonstrate its usefulness by inferring the effective population size jointly with locus-specific selection coefficients and the hierarchical parameters of the distribution of fitness effects (DFE) from allele frequency time-series data.

Theory

Let us define the random variable T_i = T_i(s) as an m_i-dimensional function of s. We call T_i sufficient for the parameter θ_i if the conditional distribution of s given T_i does not depend on θ_i. More precisely, let t_{i,obs} = T_i(s_obs). Then

$$\mathbb{P}(s = s_{obs} \,|\, T_i = t_{i,obs}, \theta) = \frac{\mathbb{P}(s = s_{obs}, T_i = t_{i,obs} \,|\, \theta)}{\mathbb{P}(T_i = t_{i,obs} \,|\, \theta)} = \frac{\mathbb{P}(s = s_{obs} \,|\, \theta)}{\mathbb{P}(T_i = t_{i,obs} \,|\, \theta)} =: g_i(s_{obs}, \theta_{-i}), \tag{1}$$

where θ_{-i} = (θ_1, ..., θ_{i-1}, θ_{i+1}, ..., θ_n) is θ with the ith component omitted.

It is not hard to find examples for parameter-wise sufficient statistics. Most common distributions are members of the exponential family, and for these, the density of s has the form

$$f(s|\theta) = h(s)\exp\!\left[\sum_{k=1}^{K} \eta_k(\theta)\,T_k(s) - A(\theta)\right].$$


For a given parameter θ_i, the vector T_i(s) consisting of only those T_k(s) for which the respective natural parameter function η_k(θ) depends on θ_i is a sufficient statistic for θ_i in the sense of our definition. Some concrete examples of this type are studied below.
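As a concrete illustration (an added example consistent with the normal toy model analyzed below, not text from the original article), write the density of a sample x_1, ..., x_n from N(μ, σ²) in exponential-family form:

$$f(x_1, \ldots, x_n \,|\, \mu, \sigma^2) = h(x)\exp\!\left[\eta_1(\mu,\sigma^2)\,T_1(x) + \eta_2(\mu,\sigma^2)\,T_2(x) - A(\mu,\sigma^2)\right],$$

$$\eta_1 = \frac{\mu}{\sigma^2},\quad T_1(x) = \sum_{i=1}^{n} x_i, \qquad \eta_2 = -\frac{1}{2\sigma^2},\quad T_2(x) = \sum_{i=1}^{n} x_i^2.$$

Only η_1 depends on μ, so T_1 (equivalently the sample mean) is sufficient for μ in the above sense, whereas both natural parameters depend on σ², so the statistics for σ² are {T_1, T_2} (equivalently the sample mean and variance). This is exactly the split of statistics used for toy model 1 below.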

If sufficient statistics T_i can be found for each parameter θ_i and their dimension m_i is substantially smaller than the dimension m of s, then the ABC-MCMC algorithm can be greatly improved with the following algorithm, which we denote ABC with parameter-specific statistics (ABC-PaSS) henceforth (a code sketch follows the listing).

The algorithm starts at time t = 1 and at some initial parameter value θ^(1):

1. Choose an index i = 1, ..., n according to a probability distribution (p_1, ..., p_n) with Σ p_i = 1 and all p_i > 0.
2. At θ = θ^(t), propose θ′ according to the transition kernel q_i(θ′|θ), where θ′ differs from θ only in the ith component: θ′ = (θ_1, ..., θ_{i-1}, θ_i′, θ_{i+1}, ..., θ_n).
3. Simulate D using model M with θ′ and calculate summary statistics s for D. Calculate t_i = T_i(s) and t_{i,obs} = T_i(s_obs).
4. Let δ_i be the tolerance for parameter θ_i. If ‖t_i − t_{i,obs}‖_i ≤ δ_i, go to step 5; otherwise go to step 1.
5. Calculate the Metropolis–Hastings ratio
$$h = h(\theta, \theta') = \min\left(1, \frac{p(\theta')\,q_i(\theta|\theta')}{p(\theta)\,q_i(\theta'|\theta)}\right).$$
6. Accept θ′ with probability h; otherwise stay at θ.
7. Increase t by one, save a new parameter value θ^(t) = θ, and continue at step 1.
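A schematic Python version of ABC-PaSS for the two-parameter normal toy model analyzed below is sketched here (tolerances, kernel widths, and the starting point are arbitrary illustrative choices; the authors' implementation is the C++ ABCtoolbox module described in Materials and Methods). When updating μ, only the sample mean is compared, whereas updating σ² uses both mean and variance.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 10                                    # sample size of the toy normal model
x_obs = rng.normal(0.0, np.sqrt(5.0), N)  # "observed" data with mu = 0, sigma^2 = 5
s_obs = np.array([x_obs.mean(), x_obs.var(ddof=1)])

def simulate_stats(mu, var):
    x = rng.normal(mu, np.sqrt(var), N)
    return np.array([x.mean(), x.var(ddof=1)])

# Parameter-specific statistics: T_mu uses the mean only, T_var uses mean and variance
T = [lambda s: s[:1], lambda s: s]
prior_bounds = [(-10.0, 10.0), (0.1, 15.0)]   # uniform priors for mu and sigma^2
delta = [0.3, 3.0]                            # per-parameter tolerances (assumed)
proposal_sd = [0.5, 1.0]                      # per-parameter kernel widths (assumed)

def in_prior(theta):
    return all(lo <= v <= hi for v, (lo, hi) in zip(theta, prior_bounds))

def abc_pass(n_iter=100_000, theta0=(1.0, 4.0)):
    theta = np.array(theta0, dtype=float)
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        i = rng.integers(2)                               # step 1: pick a parameter
        theta_prop = theta.copy()
        theta_prop[i] += rng.normal(0.0, proposal_sd[i])  # step 2: single-component update
        if in_prior(theta_prop):
            s = simulate_stats(*theta_prop)               # step 3: simulate and summarize
            d = np.linalg.norm(T[i](s) - T[i](s_obs))
            if d <= delta[i]:                             # step 4: parameter-specific tolerance
                theta = theta_prop                        # steps 5-6: flat prior, symmetric kernel -> h = 1
        chain[t] = theta                                  # step 7
    return chain

chain = abc_pass()
print("posterior medians:", np.median(chain[10_000:], axis=0))
```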

Convergence of the MCMC chain is guaranteed by the following:

Theorem 1. For i = 1, ..., n, if δ_i = 0 and T_i is sufficient for parameter θ_i, then the stationary distribution of the Markov chain is p(θ|s = s_obs).

The Proof for Theorem 1 is provided in the Appendix.

It is important to note that the same algorithm can also be applied to groups of parameters, which may be particularly relevant in the case of very high correlations between parameters that may render their individual MCMC updates inefficient. Also, the efficiency of ABC-PaSS can be improved with all previously proposed extensions for ABC-MCMC. To increase acceptance rates and render ABC-PaSS applicable to models with continuous sampling distributions, for instance, the assumption δ_i = 0 must be relaxed to δ_i > 0 in practice. This is commonly done in ABC applications and will lead to an approximation of the posterior distribution p(θ|s = s_obs). Because of the continuity of the summary statistics s and the sufficient statistics T_i, we theoretically recover the true posterior distribution in the limit δ_i → 0. We can also perform an initial calibration ABC step to find an optimal starting position θ^(1) and tolerance δ_i and to adjust the proposal kernel for each parameter (Wegmann et al. 2009).

Materials and Methods

Implementation

We implemented the proposed ABC-PaSS framework into a new version of the software package ABCtoolbox (Wegmann et al. 2010), which will be made available at the authors' website and will be described elsewhere.

Toy model 1: Normal distribution

We performed simulations to assess the performance of ABC-MCMC and ABC-PaSS in estimating θ_1 = μ and θ_2 = σ² for a univariate normal distribution. We used the sample mean x̄ and sample variance S² of samples of size n as statistics. Recall that for noninformative priors the posterior distribution for μ is N(x̄, S²/n) and the posterior distribution for σ² is χ² distributed with n − 1 d.f. As μ and σ² are independent, we get the posterior density

$$p(\mu, \sigma^2) = f_{\bar{x},\, S^2/n}(\mu)\,\cdot\,\frac{n-1}{S^2}\, f_{\chi^2,\, n-1}\!\left(\frac{n-1}{S^2}\,\sigma^2\right).$$

In our simulations the sample size was n = 10 and the true parameters were given by μ = 0 and σ² = 5. We performed 50 MCMC chains per simulation and chose effectively noninformative priors for μ ~ U[−10, 10] and σ² ~ U[0.1, 15]. Our simulations were performed for a wide range of tolerances (from 0.01 to 41) and proposal ranges (from 0.05 to 1.5). We did this exhaustive search to identify the combination of these tuning parameters that allows ABC-MCMC and ABC-PaSS to perform best in estimating μ and σ². We then recorded the minimum total variation distance (L1) between the true and estimated posteriors over these sets of tolerances and ranges and compared it between ABC-MCMC and ABC-PaSS.
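The accuracy measure can be sketched as follows in Python; the grid, kernel smoothing, and the stand-in sample are illustrative assumptions rather than the exact procedure used for the article's figures.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def l1_distance(posterior_sample, true_density, lo, hi, n_grid=1000):
    """L1 distance between a kernel-smoothed posterior sample and a true density on a grid."""
    grid = np.linspace(lo, hi, n_grid)
    dx = grid[1] - grid[0]
    est = gaussian_kde(posterior_sample)(grid)
    tru = true_density(grid)
    est /= est.sum() * dx          # normalize both densities on the grid
    tru /= tru.sum() * dx
    return np.sum(np.abs(est - tru)) * dx

# Example: an idealized "chain" for mu compared against its analytical posterior N(x_bar, S^2/n)
x_bar, S2, n = 0.1, 5.2, 10
mu_chain = np.random.default_rng(0).normal(x_bar, np.sqrt(S2 / n), 5000)
print(l1_distance(mu_chain, lambda g: norm.pdf(g, x_bar, np.sqrt(S2 / n)), -10, 10))
```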

Toy model 2: General linear model

As a second toy model to compare the performance of ABC-MCMC and ABC-PaSS, we considered general linear models (GLMs) with m statistics s being a linear function of n = m parameters θ,

$$s = C\theta + \epsilon, \qquad \epsilon \sim N(0, I),$$

where C is a square design matrix and the vector of errors ε is multivariate normal. Under noninformative priors for the parameters θ, their posterior distribution is multivariate normal

$$\theta|s \sim N\!\left((C'C)^{-1}C's,\; (C'C)^{-1}\right).$$

We set up the design matrices C in a cyclic manner to allow all statistics to have information on all parameters but their contributions to differ for each parameter; namely we set C = B · det(B′B)^{−1/(2n)}, where


$$B = \begin{pmatrix} 1/n & 2/n & 3/n & \cdots & n/n \\ n/n & 1/n & 2/n & \cdots & (n-1)/n \\ \vdots & \vdots & \vdots & \ddots & 2/n \\ 2/n & 3/n & 4/n & \cdots & 1/n \end{pmatrix}.$$

The normalization factor in the definition of C was chosen such that the determinant of the posterior variance is constant and thus the widths of the marginal posteriors are comparable independently of the dimensionality n. We used all statistics for ABC-MCMC and calculated a single linear combination of statistics per parameter for ABC-PaSS according to Theorem 2, using ordinary least squares. For the estimation, we assumed that θ = 0 and the priors are uniform U[−100, 100] for all parameters, which are effectively noninformative. We started the MCMC chains at a normal deviate N(θ, 0.01·I), i.e., around the true values of θ. To ensure fair comparisons between methods, we performed simulations of 50 chains for a variety of tolerances (from 0.01 to 256) and proposal ranges (from 0.1 to 8) to choose the combination of these tuning parameters at which each method performed best. We ran all our MCMC chains for 10^5 iterations per model parameter to account for model complexity.
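The following Python sketch illustrates this setup under assumed simulation sizes: it builds the cyclic matrix B, normalizes it to C, computes the analytical posterior, and fits one linear combination of statistics per parameter by ordinary least squares from simulated (θ, s) pairs, in the spirit of Theorem 2.

```python
import numpy as np

rng = np.random.default_rng(4)

def design_matrix(n):
    """Cyclic matrix B with rows (1/n, 2/n, ..., n/n) shifted by one, normalized to C."""
    base = np.arange(1, n + 1) / n
    B = np.array([np.roll(base, i) for i in range(n)])
    return B * np.linalg.det(B.T @ B) ** (-1.0 / (2 * n))

n = 4
C = design_matrix(n)
theta_true = np.zeros(n)
s_obs = C @ theta_true + rng.normal(size=n)

# Analytical posterior: theta | s ~ N((C'C)^-1 C's, (C'C)^-1)
Sigma_post = np.linalg.inv(C.T @ C)
mean_post = Sigma_post @ C.T @ s_obs

# Empirical linear combinations: regress each theta_i on the statistics s
# using simulations from the prior; the fitted coefficients b_i define t_i(s) = b_i @ s.
n_sim = 20_000
theta_sim = rng.uniform(-100, 100, size=(n_sim, n))
s_sim = theta_sim @ C.T + rng.normal(size=(n_sim, n))
X = np.column_stack([np.ones(n_sim), s_sim])
coef, *_ = np.linalg.lstsq(X, theta_sim, rcond=None)   # one OLS fit per parameter (columns)
b = coef[1:]                                           # drop the intercepts; column i -> parameter i

print("posterior mean:", np.round(mean_post, 3))
print("t_1(s_obs) =", b[:, 0] @ s_obs)
```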

Estimating selection and demography

Model: Consider a vector j of observed allele trajectories (sample allele frequencies) over l = 1, ..., L loci, as is commonly obtained in studies of experimental evolution. We assume these trajectories to be the result of both random drift and selection, parameterized by the effective population size N_e and locus-specific selection coefficients s_l, respectively, under the classic Wright–Fisher model with allelic fitnesses 1 and 1 + s_l. We further assume the locus-specific selection coefficients s_l follow a DFE parameterized as a generalized Pareto distribution (GPD) with mean μ = 0, shape ξ, and scale σ. Our goal is thus to estimate the joint posterior distribution

$$p(N_e, s_1, \ldots, s_L, \xi, \sigma \,|\, j) \propto \left[\prod_{l=1}^{L} \mathbb{P}(j_l \,|\, N_e, s_l)\, p(s_l \,|\, \xi, \sigma)\right] p(N_e)\, p(\xi)\, p(\sigma).$$

To apply our ABC-PaSS framework to this problem, we approximate the likelihood term ℙ(j_l | N_e, s_l) numerically with simulations, while updating the hyperparameters ξ and σ analytically.
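The per-locus simulation step can be sketched as follows; this is an illustrative Python reimplementation of a haploid Wright–Fisher trajectory with selection and binomial sampling at the sequencing step, not the authors' C++ ABCtoolbox module, and the parameter values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)

def wright_fisher_trajectory(Ne, s, x0, n_timepoints, gens_between=13, sample_size=1000):
    """Simulate sampled allele frequencies at successive time points under
    Wright-Fisher drift and selection (allelic fitnesses 1 and 1 + s)."""
    freqs = []
    x = x0
    for _ in range(n_timepoints):
        freqs.append(rng.binomial(sample_size, x) / sample_size)   # sequencing sample
        for _ in range(gens_between):
            x_sel = x * (1 + s) / (x * (1 + s) + (1 - x))          # deterministic selection
            x = rng.binomial(int(Ne), x_sel) / int(Ne)             # binomial drift
    return np.array(freqs)

traj = wright_fisher_trajectory(Ne=300, s=0.05, x0=0.05, n_timepoints=10)
print(np.round(traj, 3))
```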

Summary statistics: To summarize the data j, we used statistics originally proposed by Foll et al. (2015). Specifically, we first calculated for each locus individually a measure of the difference in allele frequency between consecutive time points as

$$Fs' = \frac{1}{t}\,\frac{Fs\left(1 - \frac{1}{2\tilde{n}}\right) - \frac{2}{\tilde{n}}}{\left(1 + \frac{Fs}{4}\right)\left[1 - \frac{1}{n_y}\right]},$$

where

$$Fs = \frac{(x - y)^2}{z(1 - z)},$$

x and y are the minor allele frequencies separated by t generations, z = (x + y)/2, and ñ is the harmonic mean of the sample sizes n_x and n_y. We then summed the Fs′ values of all pairs of consecutive time points with increasing and decreasing allele frequencies into Fs′i and Fs′d, respectively (Foll et al. 2015). Finally, we followed Aeschbacher et al. (2012) and calculated boosted variants of the two statistics to take more complex relationships between parameters and statistics into account. The full set of statistics used per locus was F_l = {Fs′i_l, Fs′d_l, Fs′i_l², Fs′d_l², Fs′i_l × Fs′d_l}.
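An illustrative Python version of these per-locus summaries could look like the sketch below; function and variable names are our own, and loci are assumed to have frequencies strictly between 0 and 1 so that Fs is defined.

```python
import numpy as np
from scipy.stats import hmean

def fs_prime(x, y, n_x, n_y, t):
    """Standardized allele-frequency change between two time points (Foll et al. 2015)."""
    z = (x + y) / 2.0
    fs = (x - y) ** 2 / (z * (1.0 - z))
    n_tilde = hmean([n_x, n_y])
    return (fs * (1.0 - 1.0 / (2.0 * n_tilde)) - 2.0 / n_tilde) / (
        (1.0 + fs / 4.0) * (1.0 - 1.0 / n_y)
    ) / t

def locus_statistics(freqs, sample_sizes, t=13):
    """Per-locus statistics F_l: sums of Fs' over increasing and decreasing steps,
    plus their squares and product as boosted variants."""
    fs_i = fs_d = 0.0
    for (x, y, n_x, n_y) in zip(freqs[:-1], freqs[1:], sample_sizes[:-1], sample_sizes[1:]):
        val = fs_prime(x, y, n_x, n_y, t)
        if y > x:
            fs_i += val
        else:
            fs_d += val
    return np.array([fs_i, fs_d, fs_i**2, fs_d**2, fs_i * fs_d])

print(locus_statistics([0.05, 0.10, 0.08, 0.20], [1000] * 4))
```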

We next calculated parameter-specific linear combinations for N_e and locus-specific s_l following the procedure developed above. To do so, we simulated allele trajectories of a single locus for different values of N_e and s sampled from their prior. We then calculated F_l for each simulation and performed a Box–Cox transformation to linearize the relationships between statistics and parameters (Box and Cox 1964; Wegmann et al. 2009). We then fitted a linear model as outlined in Equation A3 to estimate the coefficients of an approximately sufficient linear combination of F for each parameter N_e and s. This resulted in t_s(F_l) = β_s F_l and t_{N_e}(F_l) = β_{N_e} F_l. To combine information across loci when updating N_e, we then calculated

$$t_{N_e}(F) = \sum_{l=1}^{L} \beta_{N_e} F_l,$$

where F = {F_1, ..., F_L}. In summary, we used the ABC approximation

$$\mathbb{P}(j \,|\, N_e, s) \approx \mathbb{P}\!\left(\,\big|t_{s}(F_l) - t_{s}(F_{l,obs})\big| < \delta_{s_l},\; \big\|t_{N_e}(F) - t_{N_e}(F_{obs})\big\| < \delta_{N_e} \,\middle|\, N_e, s\right).$$
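Fitting these linear combinations from pilot simulations can be sketched as follows (the Box–Cox step is omitted for brevity, and the pilot data here are random placeholders):

```python
import numpy as np

def fit_linear_combinations(params, stats):
    """Regress each parameter on the per-locus statistics via ordinary least squares.
    params: (n_sim, 2) array of (log10_Ne, s) used in the pilot simulations
    stats:  (n_sim, 5) array of per-locus statistics F_l
    Returns one coefficient vector per parameter, defining t_Ne(F_l) and t_s(F_l)."""
    X = np.column_stack([np.ones(len(stats)), stats])
    coef, *_ = np.linalg.lstsq(X, params, rcond=None)
    beta_Ne, beta_s = coef[1:, 0], coef[1:, 1]   # drop the intercepts
    return beta_Ne, beta_s

def t_Ne(all_locus_stats, beta_Ne):
    """Combine information across loci when updating Ne: sum of the locus-wise projections."""
    return sum(F_l @ beta_Ne for F_l in all_locus_stats)

# Hypothetical pilot simulations: 10,000 draws with 5 statistics each
rng = np.random.default_rng(6)
pilot_params = np.column_stack([rng.uniform(1.5, 4.5, 10_000), rng.uniform(0, 1, 10_000)])
pilot_stats = rng.normal(size=(10_000, 5))          # placeholder for simulated F_l
b_Ne, b_s = fit_linear_combinations(pilot_params, pilot_stats)
print(t_Ne(rng.normal(size=(100, 5)), b_Ne))
```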

Simulations and application

We applied our framework to allele frequency data for the whole influenza H1N1 genome obtained in a recently published evolutionary experiment (Foll et al. 2014). In this experiment, influenza A/Brisbane/59/2007 (H1N1) was serially amplified on Madin–Darby canine kidney (MDCK) cells for 12 passages of 72 hr each, corresponding to ~13 generations (doublings). After the three initial passages, samples were passed either in the absence of drug or in the presence of increasing concentrations of the antiviral drug oseltamivir. At the end of each passage, samples were collected for whole-genome high-throughput population sequencing. We obtained the raw data from http://bib.umassmed.edu/influenza/ and, following the original study (Foll et al. 2014), we downsampled it to 1000 haplotypes per time point and filtered it to contain only loci for which sufficient data were available to calculate the Fs′ statistics. Specifically, we included all loci


with an allele frequency ≥ 2% at ≥ 2 time points. There were 86 and 42 such loci for the control and drug-treated experiments, respectively. Further, we restricted our analysis of the data of the drug-treated experiment to the last nine time points during which drug was administered.

We performed all our Wright–Fisher simulations with in-house C++ code implemented as a module of ABCtoolbox. We simulated 13 generations between time points and a sample of size 1000 per time point. We set the prior for N_e uniform on the log10 scale such that log10(N_e) ~ U[1.5, 4.5], and for the parameters of the GPD we set ξ ~ U[−0.2, 1] and log10(σ) ~ U[−2.5, −0.5]. For the simulations where no DFE was assumed, we set the prior of s ~ U[0, 1].

As above, we ran all our ABC-PaSS chains for 10^5 iterations per model parameter to account for model complexity. To ensure fast convergence, the ABC-PaSS implementation benefited from an initial calibration step we originally developed for ABC-MCMC and implemented in ABCtoolbox (Wegmann et al. 2009). Specifically, we first generated 10,000 simulations with values drawn randomly from the prior. For each parameter, we then selected the 1% subset of these simulations with the smallest distances to the observed data based on the linear combination specific for that parameter. These accepted simulations were used to calibrate three important metrics prior to the MCMC run: First, we set the parameter-specific tolerances δ_i to the largest distance among the accepted simulations. Second, we set the width of the parameter-specific proposal kernel to half of the standard deviation of the accepted parameter values. Third, we chose the starting value of the chain for each parameter as the accepted simulation with smallest distance. Each chain was then run for 1000 iterations, and new starting values were chosen randomly among the accepted calibration simulations for those parameters for which no update was accepted. This was repeated until all parameters were updated at least once.
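A schematic Python version of this calibration step is given below; the prior sampler, simulator, and per-parameter projections are random stand-ins, so only the calibration logic (tolerance, kernel width, and starting value per parameter) mirrors the procedure described above.

```python
import numpy as np

rng = np.random.default_rng(7)
n_params, n_stats, n_pilot = 3, 5, 10_000

# Stand-ins for the real prior sampler, simulator, and parameter-specific projections
def draw_prior(size):
    return rng.uniform(0, 1, size=(size, n_params))

W = rng.normal(size=(n_params, n_stats))
def simulate_stats(theta):
    return theta @ W + rng.normal(size=(len(theta), n_stats))

projections = rng.normal(size=(n_params, n_stats))   # one linear combination per parameter

theta_pilot = draw_prior(n_pilot)
stats_pilot = simulate_stats(theta_pilot)
stats_obs = simulate_stats(draw_prior(1))[0]

tolerances, proposal_sd, start = [], [], []
for i in range(n_params):
    # distance of each pilot simulation to the observed data, in parameter i's projection
    d = np.abs(stats_pilot @ projections[i] - stats_obs @ projections[i])
    keep = np.argsort(d)[: n_pilot // 100]               # retain the closest 1%
    tolerances.append(d[keep].max())                     # tolerance = largest accepted distance
    proposal_sd.append(0.5 * theta_pilot[keep, i].std()) # kernel width = half the SD of accepted values
    start.append(theta_pilot[keep[0], i])                # start at the closest simulation

print(np.round(tolerances, 3), np.round(proposal_sd, 3), np.round(start, 3))
```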

Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

Results

Toy model 1: Normal distribution

We first compared the performance of ABC-PaSS and ABC-MCMC under a simple model: the normal distribution with parameters mean (μ) and variance (σ²). Given a sample of size n, the sample mean (x̄) is a sufficient statistic for μ, while both x̄ and the sample variance (S²) are sufficient for σ² (Casella and Berger 2002). For ABC-MCMC, we used both x̄ and S² as statistics. For ABC-PaSS, we used only x̄ when updating μ and both x̄ and S² when updating σ².

We then compared the accuracy between the two algorithms by calculating the total variation distance between the inferred and the true posteriors (L1 distance from the kernel-smoothed posterior based on 10,000 samples). We computed L1 under a wide range of tolerances to find the tolerance for which each algorithm had the best performance (i.e., minimum L1). As shown in Figure 1, A and C, ABC-PaSS produced a more accurate estimation for μ than ABC-MCMC. The two algorithms had similar performance when estimating σ² (Figure 1, B and D).

The normal distribution toy model, although simple, is quite illustrative of the nature of the improvement in performance by using ABC-PaSS over ABC-MCMC. Indeed, our results demonstrate that the slight reduction of the summary statistics space by ignoring a single uninformative statistic when updating μ already results in a noticeable improvement in estimation accuracy. This improvement would not be possible to attain with classic dimension reduction techniques, such as partial least squares (PLS), since the information contained in x̄ and S² is irreducible under ABC-MCMC.

Toy model 2: GLM

We expect our approach to be particularly powerful for models of the exponential family, for which a small number of summary statistics per parameter are sufficient, regardless of sample size. To illustrate this, we next compared the performance of ABC-MCMC and ABC-PaSS under GLM models of increasing dimensionality n. For all models, we constructed the design matrix C such that all statistics are informative for all parameters, while retaining the total information on the

Figure 1 Performance to infer parameters of a normal distribution. Shown is the average over 50 chains of the L1 distance between the true and estimated posterior distributions for μ (A) and σ² (B) for different tolerances for ABC-MCMC (blue) and ABC-PaSS (red). The dashed horizontal line is the L1 distance between the prior and the true posterior distribution. (C and D) The estimated posterior distribution for μ (C) and σ² (D) using the tolerance that led to the minimum L1 distance from the true posterior (black). The dashed vertical line indicates the true values of the parameters.


individual parameters regardless of dimensionality (see Materials and Methods). For a GLM, a single linear function is a sufficient statistic for each associated parameter, and this function can easily be learned from a set of simulations, using standard regression approaches (see Theorem 2 in the Appendix). Therefore, for ABC-MCMC, we used all statistics s, while for ABC-PaSS, we employed Theorem 2 and used a single linear combination of statistics t_i per parameter θ_i. As above, we assessed performance of ABC-MCMC and ABC-PaSS by calculating the total variation distance (L1) between the inferred and the true posterior distribution. We calculated L1 for several tolerances to find the tolerance where L1 was minimal for each algorithm (see Figure 2A for examples with n = 2 and n = 4). Since in ABC-MCMC distances are calculated in the multidimensional statistics space, the optimal tolerance increased with higher dimensionality. This is not the case for ABC-PaSS, because distances are always calculated in one dimension only (Figure 2A).

We found that ABC-MCMC performance was good for low n, but worsened rapidly with increasing number of parameters, as expected from the corresponding increase in the dimensionality of statistics space (Figure 2B). For a GLM with 32 parameters, approximate posteriors obtained with ABC-MCMC differed only little from the prior (Figure 2B). In contrast, performance of ABC-PaSS was unaffected by dimensionality and was better than that of ABC-MCMC even in low dimensions (Figure 2B). These results support that by considering low-dimensional parameter-specific summary statistics under our framework, ABC inference remains feasible even under models of very high dimensionality, for which current ABC algorithms are not capable of producing meaningful estimates.

Application: Inference of natural selection and demography

One of the major research problems in modern population genetics is the inference of natural selection and demographic history, ideally jointly (Crisci et al. 2012; Bank et al. 2014). One way to gain insight into these processes is by investigating how they affect allele frequency trajectories through time in populations, for instance under experimental evolution. Several methods have thus been developed to analyze allele trajectory data to infer both locus-specific selection coefficients (s) and the effective population size (N_e). The modeling framework of these methods assumes Wright–Fisher (WF) population dynamics in a hidden Markov setting to evaluate the likelihood of the parameters N_e and s given the observed allele trajectories (Bollback et al. 2008; Malaspinas et al. 2012). In this setting, likelihood calculations are feasible, but very time-consuming, especially when considering many loci at the genome-wide scale (Foll et al. 2015).

To speed up calculations, Foll et al. (2015) developed an ABC method (WF-ABC), adopting the hierarchical ABC framework of Bazin et al. (2010). Specifically, WF-ABC first estimates N_e based on statistics that are functions of all loci and then infers s for each locus individually under the inferred value of N_e. While WF-ABC easily scales to genome-wide data, it suffers from the unrealistic assumption of complete neutrality when inferring N_e, which potentially leads to biases in the inference.

Here we show that by employing ABC-PaSS, N_e and locus-specific selection coefficients can be inferred jointly, which is not possible with ABC-MCMC due to the high dimensionality of the summary statistics, which is a direct function of the number of loci considered.

Finding sufficient statistics: All ABC algorithms, including the ABC-PaSS algorithm introduced here, require that statistics are sufficient for estimating the parameters of a given model. As mentioned above, parameter-wise sufficient statistics as required by ABC-PaSS are trivial to find for distributions of the exponential family. Since many population genetics models do not follow such distributions, sufficient statistics are known for only the simplest models. The number of haplotypes segregating in a sample, for example, is a sufficient statistic for estimating the population-scaled mutation rate under Wright–Fisher equilibrium assumptions (Durrett 2008).

Figure 2 Performance to infer parameters of GLM models. (A) The average L1 distance between the true and estimated posterior distributions for different tolerances for ABC-MCMC (blue) and ABC-PaSS (red). Solid and dashed lines are for a GLM with two and four parameters, respectively. (B) The minimum L1 distance from the true posterior over different tolerances for increasing numbers of parameters. (A and B) The dashed line is the L1 distance between the prior and the posterior distribution.

For more realistic models involving multiple populations or population size changes, only approximately sufficient statistics can be found. Choosing such statistics is not trivial, however, as too few statistics are insufficient to summarize the data while too many statistics can create an excessively large statistics space that worsens the approximation of the posterior (Beaumont et al. 2002; Wegmann et al. 2009; Csilléry et al. 2010). Often, such statistics are thus found


empirically by applying dimensionality reduction techniques to a larger set of statistics initially calculated (Blum et al. 2013).

Fearnhead and Prangle (2012) suggested a method where an initial set of simulations is used to fit a linear model, using ordinary least squares, that expresses each parameter θ_i as a function of the summary statistics s. These functions are then used as statistics in subsequent ABC analysis. Thus Fearnhead and Prangle's approach reduces the dimensionality of the statistics space to a single combination of statistics per parameter. However, the Pitman–Koopman–Darmois theorem states that for models that do not belong to the exponential family, the dimensionality of sufficient statistics must grow with increasing sample size, suggesting that multiple summary statistics are likely required in this case, as each locus carries independent information about the parameter N_e. A method similar in spirit but not limited to a single summary statistic per parameter is the partial least-squares (PLS) transformation (Wegmann et al. 2009), which has been used successfully in many ABC applications (e.g., Veeramah et al. 2011; Chu et al. 2013; Dussex et al. 2014).

Here we chose to calculate the per-locus statistics proposed by Foll et al. (2015) and to then apply and empirically compare both methods to reduce dimensionality for this particular model. Before dimension reduction, however, we applied a multivariate Box–Cox transformation (Box and Cox 1964) to increase linearity between statistics and parameters, as suggested by Wegmann et al. (2009). To decide on the required number of PLS components, we performed a leave-one-out analysis implemented in the R package "PLS" (Mevik and Wehrens 2007). In line with the Pitman–Koopman–Darmois theorem, a small number (two) of PLS components were sufficient for s, but many more components contained information about N_e, for which many independent observations are available (Supplemental Material, Figure S1). However, the first PLS component alone already explained two-thirds of the total variance that can be explained with up to 100 components, suggesting that additional components add, besides information, also substantial noise. We thus chose to evaluate the accuracy of our inference with three different sets of summary statistics: (1) a single linear combination of summary statistics for each s and N_e chosen using ordinary least squares, as suggested by Fearnhead and Prangle (2012) (LC 1/1); (2) two PLS components for s and five PLS components for N_e, as suggested by the leave-one-out analysis (PLS 5/2); and (3) an intermediate set of one PLS component for s and three PLS components for N_e (PLS 3/1).
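As an illustration of this kind of dimension reduction (a sketch only; the authors used the R package PLS, whereas here SciPy and scikit-learn stand in and the pilot data are placeholders):

```python
import numpy as np
from scipy.stats import boxcox
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(8)
stats = np.abs(rng.normal(size=(10_000, 5))) + 0.1   # placeholder pilot statistics (must be positive for Box-Cox)
params = rng.uniform(size=(10_000, 2))               # placeholder (log10_Ne, s) values

# Box-Cox transform each statistic to linearize its relationship with the parameters
stats_bc = np.column_stack([boxcox(stats[:, k])[0] for k in range(stats.shape[1])])

# Fit PLS and keep a small number of components as the new summary statistics
pls = PLSRegression(n_components=3)
pls.fit(stats_bc, params)
reduced = pls.transform(stats_bc)                    # n_sim x 3 matrix of PLS components
print(reduced.shape)
```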

Performance of ABC-PaSS in inferring selection and demography: To examine the performance of ABC-PaSS under the WF model, we inferred N_e and s on sets of 100 loci simulated with varying selection coefficients. We evaluated the estimation accuracy by comparing the estimated vs. the true values of the parameters over 25 replicate simulations, first using a single linear combination of summary statistics per parameter found using ordinary least squares (LC 1/1). As shown in Figure 3A, N_e was estimated well over the whole range of values tested. Estimates for s were on average unbiased and accuracy was, as expected, higher for larger N_e (Figure 3B). Note that since the prior on s was U[0, 1], these results imply that our approach estimates N_e with high accuracy even when the majority of the simulated loci are under strong selection (90% of loci had N_e·s > 10). Hence, our method allows us to relax the assumption of neutrality on most of the loci, which was necessary in previous studies (Foll et al. 2015).

We next introduced hyperparameters for the distribution of selection coefficients (the so-called DFE). Such hyperparameters are computationally cheap to estimate under our framework, as their updates can be done analytically and do not require simulations. Following previous work (Beisel et al. 2007; Martin and Lenormand 2008), we assumed that the distribution of the locus-specific s is realistically described by a truncated GPD with location μ = 0 and parameters shape ξ and scale σ (Figure S2).

Figure 3 Accuracy in inferring demographic and selection parameters. Results were obtained with ABC-PaSS using a single combination of statistics for N_e and each s (LC 1/1). Shown are the true vs. estimated posterior medians for parameters N_e (A), s per locus (B), and ξ and σ of the generalized Pareto distribution (C and D, respectively). Boxplots summarize results from 25 replicate simulations, each with 100 loci. Uniform priors over the whole ranges shown were used. (A and B) N_e assumed in the simulations is represented as a color gradient of red (low N_e) to yellow (high N_e). (C and D) Parameters μ and N_e were fixed to 0 and 10³, respectively; log10 σ was fixed to −1 (C); and ξ was fixed to 0.5 (D).

We first evaluated the accuracy of estimating ξ and σ when fixing the value of the other parameter and found that both parameters are well estimated under these conditions (Figure 3, C and D, respectively). Since the truncated GPD of multiple combinations of ξ and σ is very similar, these parameters are not always identifiable. This renders the


accurate joint estimation of both parameters difficult (Figure S3, B and C). However, despite the reduced accuracy on the individual parameters, we found the overall shape of the GPD to be well recovered (Figure S3, D–F). Also, N_e was estimated with high accuracy for all combinations of ξ and σ (Figure S3A).

We then checked whether the accuracy of these estimates can be improved by using summary statistics of higher dimensionality. Specifically, we repeated these analyses with a high-dimensional set (PLS 5/2) consisting of the first five and the first two PLS components for N_e and each s, respectively, as well as a set of intermediate dimensionality (PLS 3/1) consisting of the first three PLS components for N_e and only the first PLS component for each s. Overall, all sets of summary statistics compared here resulted in very similar performance as assessed both visually (compare Figure 3, Figure S4, and Figure S5 for LC 1/1, PLS 5/2, and PLS 3/1, respectively) and by calculating both the root mean square error (RMSE) and Pearson's correlation coefficient between true and inferred values (Table S1). Interestingly, the intermediate set (PLS 3/1) performed worst in all comparisons, while the differences between LC 1/1 and PLS 5/2 were very subtle, particularly when uniform priors were used on all s (simulation set 1; Table S1). However, in the presence of hyperparameters on s, results were more variable (simulation sets 2–4; Table S1) and we found the effective population size N_e to be consistently overestimated when using high-dimensional summaries such as PLS 5/2 (simulation sets 2–4; Table S1). These results suggest that while our analysis is generally rather robust to the choice of summary statistics, the benefit of extra information added by additional summary statistics is offset by the increased noise in higher dimensions. We expect that the robustness of results to the choice of summary statistics will be model dependent and recommend that the performance of multiple dimension-reduction techniques be evaluated in future applications of ABC-PaSS, as we did here.

Analysis of influenza data: We applied our approach to data from a previous study (Foll et al. 2014) where cultured canine kidney cells infected with the influenza virus were subjected to serial transfers for several generations. In one experiment, the cells were treated with the drug oseltamivir, and in a control experiment they were not treated with the drug. To obtain allele frequency trajectories of all sites of the influenza virus genome (13.5 kbp), samples were taken and sequenced every 13 generations with pooled population sequencing. The aim of our application was to identify which viral mutations rose in frequency during the experiment due to natural selection and which due to drift, and to investigate the shape of the DFE for the control and drug-treated viral populations.

Following Foll et al. (2014), we filtered the raw data to contain loci for which sufficient data were available to calculate the summary statistics considered here (see Materials and Methods). There were 86 and 42 such loci for the control and drug-treated experiments, respectively (Figure S6).

Figure 4 Inferred demography and selection for experimental evolution of influenza. We show results for the no-drug (control) and drug-treated influenza in gray and orange, respectively. Shown are the posterior distributions for log10 N_e (A) and for log10 σ and ξ (B). In C, we plotted the modal DFE with thick lines by integrating over the posterior of its parameters. The thin lines represent the DFEs obtained by drawing 100 samples from the posterior of σ and ξ. Dashed lines in A and C correspond to the prior distributions. In D, the posterior estimates for N_e·s per locus vs. the position of the loci in the genome are shown. Open circles indicate nonsignificant loci whereas solid, thick circles indicate significant loci [P(N_e·s > 10) > 0.95, dashed line].

We then employed ABC-PaSS to estimate N_e, s per locus, and the parameters of the DFE, first using a single summary statistic per parameter (LC 1/1). We obtained a low estimate for N_e (posterior medians 350 for drug-treated and 250 for control influenza; Figure 4A), which is expected given the bottleneck that the cells were subjected to in each transfer. While we obtained similar estimates for the ξ parameter for the drug-treated and for the control influenza (posterior medians 0.44 and 0.56, respectively), the σ parameter was estimated to be much higher for the drug-treated than for the control influenza (posterior medians 0.047 and 0.0071, respectively; Figure 4B). The resulting DFE was thus very different for the two conditions: The DFE for the drug-treated influenza had a much heavier tail than that of the control (Figure 4C). Posterior estimates for N_e·s per locus also indicated that the drug-treated influenza had more loci under strong positive selection than the control (19% vs. 3.5% of loci had P(N_e·s > 10) > 0.95, respectively; Figure 4D and Figure S6). Almost identical results were also obtained when using higher-dimensional summary statistics based on PLS components (Figure S7). These results indicate that the drug treatment placed the influenza population away from a fitness optimum, thus increasing the number of positively selected mutations with large effect sizes. Presumably these


mutations confer resistance to the drug, thus helping influenza to reach a new fitness optimum.

Our results for influenza were qualitatively similar to those obtained by Foll et al. (2014). We obtained slightly larger estimates for N_e (350 vs. 226 for drug-treated and 250 vs. 176 for control influenza). Our estimates for the parameters of the GPD were substantially different from those of Foll et al. (2014) but resulted in qualitatively similar overall shapes of the DFE for both drug-treated and control experiments. These results underline the applicability of our method to a high-dimensional problem. In contrast to Foll et al. (2014), who performed estimations in a three-step approach, combining a moment-based estimator for N_e, ABC for s, and a maximum-likelihood approach for the GPD, our Bayesian framework allowed us to perform joint estimation and to obtain posterior distributions for all parameters in a single step.

Discussion

Due to the difficulty of finding analytically tractable likelihood solutions, statistical inference is often limited to models that make substantial approximations of reality. To address this problem, so-called likelihood-free approaches have been introduced that bypass the analytical evaluation of the likelihood function with computer simulations. While full-likelihood solutions generally have more power, likelihood-free methods have been used in many fields of science to overcome undesired model assumptions.

Here we developed and implemented a novel likelihood-free MCMC inference framework that scales naturally to high dimensions. This framework takes advantage of the observation that the information about one model parameter is often contained in a subset of the data, by integrating two key innovations: only a single parameter is updated at a time, and the update is accepted based on a subset of summary statistics sufficient for this parameter. We proved that this MCMC variant converges to the true joint posterior distribution under the standard assumptions.

Since simulations are accepted based on distances of lower dimensionality, the algorithm proposed here will have a higher acceptance rate than other ABC approaches for the same accuracy and hence require fewer simulations. This is particularly relevant for cases in which the simulation step is computationally challenging, such as for population genetic models that are spatially explicit (Ray et al. 2010) or require forward-in-time simulations (as opposed to coalescent simulations) (Hernandez 2008; Messer 2013).

We demonstrated the power of our framework through the application to multiple problems. First, our framework led to more accurate inference of the mean and standard deviation of a normal distribution than standard likelihood-free MCMC, suggesting that our framework is already competitive in models of low dimensionality. In high dimensions, the benefit was even more apparent. When applied to the problem of inferring parameters of a GLM, for instance, we found our framework to be insensitive to the dimensionality, resulting in a performance similar to that of analytical solutions both in low and in high dimensions. Finally, we used our framework to address the difficult and high-dimensional problem of inferring demography and selection jointly from genetic data. Specifically, through simulations and an application to experimental data, we showed that our framework enables the accurate joint estimation of the effective population size, the distribution of fitness effects of segregating mutations, and locus-specific selection coefficients from allele frequency time-series data.

More generally, we envision that any hierarchical model with genome-wide and locus-specific parameters would be well suited for application of ABC-PaSS. Such models may include hyperparameters like genome-wide mutation and recombination rates or parameters regarding the demographic history, along with locus-specific parameters that allow for between-locus variation, for instance in the intensity of selection, mutation, recombination, or migration rates. Among these, the prospect of jointly inferring selection and demographic history even from data of a single time point is particularly relevant, since it allows for the relaxation of a frequently used yet unrealistic assumption that neutral loci can be identified a priori. In addition, such a joint estimation allows for hierarchical parameters to aggregate information across individual loci to increase estimation power, for instance for the inference of locus-specific selection coefficients by also jointly inferring parameters of the DFE, as we did here.

Acknowledgments

We thank Pablo Duchen and the Laurent Excoffier group for comments and discussion on this work. We also thank two anonymous reviewers whose constructive criticism and comments helped to improve this work significantly. This study was supported by Swiss National Foundation grant 31003A_149920 (to D.W.).

Literature Cited

Adrion, J. R., A. Kousathanas, M. Pascual, H. J. Burrack, N. M. Haddad et al., 2014 Drosophila suzukii: the genetic footprint of a recent, worldwide invasion. Mol. Biol. Evol. 31: 3148–3163.

Aeschbacher, S., M. A. Beaumont, and A. Futschik, 2012 A novel approach for choosing summary statistics in approximate Bayesian computation. Genetics 192: 1027–1047.

Aeschbacher, S., A. Futschik, and M. A. Beaumont, 2013 Approximate Bayesian computation for modular inference problems with many parameters: the example of migration rates. Mol. Ecol. 22: 987–1002.

Bank, C., G. B. Ewing, A. Ferrer-Admettla, M. Foll, and J. D. Jensen, 2014 Thinking too positive? Revisiting current methods of population genetic selection inference. Trends Genet. 30: 540–546.

Barthelmé, S., and N. Chopin, 2014 Expectation propagation for likelihood-free inference. J. Am. Stat. Assoc. 109: 315–333.

Bazin, E., K. J. Dawson, and M. A. Beaumont, 2010 Likelihood-free inference of population structure and local adaptation in a Bayesian hierarchical model. Genetics 185: 587–602.


Beaumont, M. A., W. Zhang, and D. J. Balding, 2002 Approximate Bayesian computation in population genetics. Genetics 162: 2025–2035.

Beaumont, M. A., J.-M. Cornuet, J.-M. Marin, and C. P. Robert, 2009 Adaptive approximate Bayesian computation. Biometrika 96: 983–990.

Beisel, C. J., D. R. Rokyta, H. A. Wichman, and P. Joyce, 2007 Testing the extreme value domain of attraction for distributions of beneficial fitness effects. Genetics 176: 2441–2449.

Bilodeau, M., and D. Brenner, 2008 Theory of Multivariate Statistics. Springer Science & Business Media, New York.

Blum, M. G. B., 2010 Approximate Bayesian computation: a nonparametric perspective. J. Am. Stat. Assoc. 105: 1178–1187.

Blum, M. G. B., M. A. Nunes, D. Prangle, and S. A. Sisson, 2013 A comparative review of dimension reduction methods in approximate Bayesian computation. Stat. Sci. 28: 189–208.

Bollback, J. P., T. L. York, and R. Nielsen, 2008 Estimation of 2Nes from temporal allele frequency data. Genetics 179: 497–502.

Box, G. E. P., and D. R. Cox, 1964 An analysis of transformations. J. R. Stat. Soc. B 26: 211–252.

Brown, P. M. J., C. E. Thomas, E. Lombaert, D. L. Jeffries, A. Estoup et al., 2011 The global spread of Harmonia axyridis (Coleoptera: Coccinellidae): distribution, dispersal and routes of invasion. BioControl 56: 623–641.

Casella, G., and R. L. Berger, 2002 Statistical Inference, Vol. 2. Duxbury Press, Pacific Grove, CA.

Chu, J.-H., D. Wegmann, C.-F. Yeh, R.-C. Lin, X.-J. Yang et al., 2013 Inferring the geographic mode of speciation by contrasting autosomal and sex-linked genetic diversity. Mol. Biol. Evol. 30: 2519–2530.

Cornuet, J.-M., F. Santos, M. A. Beaumont, C. P. Robert, J.-M. Marin et al., 2008 Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713–2719.

Crisci, J. L., Y.-P. Poh, A. Bean, A. Simkin, and J. D. Jensen, 2012 Recent progress in polymorphism-based population genetic inference. J. Hered. 103: 287–296.

Csilléry, K., M. G. B. Blum, O. E. Gaggiotti, and O. François, 2010 Approximate Bayesian computation (ABC) in practice. Trends Ecol. Evol. 25: 410–418.

Durrett, R., 2008 Probability Models for DNA Sequence Evolution. Springer Science & Business Media, New York.

Dussex, N., D. Wegmann, and B. Robertson, 2014 Postglacial expansion and not human influence best explains the population structure in the endangered kea (Nestor notabilis). Mol. Ecol. 23: 2193–2209.

Fan, H. H., and L. S. Kubatko, 2011 Estimating species trees using approximate Bayesian computation. Mol. Phylogenet. Evol. 59: 354–363.

Fearnhead, P., and D. Prangle, 2012 Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. J. R. Stat. Soc. Ser. B Stat. Methodol. 74: 419–474.

Foll, M., Y.-P. Poh, N. Renzette, A. Ferrer-Admetlla, C. Bank et al., 2014 Influenza virus drug resistance: a time-sampled population genetics perspective. PLoS Genet. 10: e1004185.

Foll, M., H. Shim, and J. D. Jensen, 2015 WFABC: a Wright–Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Mol. Ecol. Resour. 15: 87–98.

Hernandez, R. D., 2008 A flexible forward simulator for populations subject to selection and demography. Bioinformatics 24: 2786–2787.

Jabot, F., and J. Chave, 2009 Inferring the parameters of the neutral theory of biodiversity using phylogenetic information and implications for tropical forests. Ecol. Lett. 12: 239–248.

Jensen, J. D., K. R. Thornton, and P. Andolfatto, 2008 An approximate Bayesian estimator suggests strong, recurrent selective sweeps in Drosophila. PLoS Genet. 4: e1000198.

Leuenberger, C., and D. Wegmann, 2010 Bayesian computation and model selection without likelihoods. Genetics 184: 243–252.

Li, J., D. J. Nott, Y. Fan, and S. A. Sisson, 2015 Extending approximate Bayesian computation methods to high dimensions via Gaussian copula. arXiv:1504.04093.

Malaspinas, A.-S., O. Malaspinas, S. N. Evans, and M. Slatkin, 2012 Estimating allele age and selection coefficient from time-serial data. Genetics 192: 599–607.

Marjoram, P., J. Molitor, V. Plagnol, and S. Tavaré, 2003 Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 100: 15324–15328.

Martin, G., and T. Lenormand, 2008 The distribution of beneficial and fixed mutation fitness effects close to an optimum. Genetics 179: 907–916.

Messer, P. W., 2013 SLiM: simulating evolution with selection and linkage. Genetics 194: 1037–1039.

Mevik, B., and R. Wehrens, 2007 The PLS package: principal component and partial least squares regression in R. J. Stat. Softw. 18: 1–24.

Nott, D. J., Y. Fan, L. Marshall, and S. A. Sisson, 2012 Approximate Bayesian computation and Bayes' linear analysis: toward high-dimensional ABC. J. Comput. Graph. Stat. 23: 65–86.

Ratmann, O., O. Jørgensen, T. Hinkley, M. Stumpf, S. Richardson et al., 2007 Using likelihood-free inference to compare evolutionary dynamics of the protein networks of H. pylori and P. falciparum. PLoS Comput. Biol. 3: e230.

Ray, N., M. Currat, M. Foll, and L. Excoffier, 2010 SPLATCHE2: a spatially explicit simulation framework for complex demography, genetic admixture and recombination. Bioinformatics 26: 2993–2994.

Schafer, C. M., and P. E. Freeman, 2012 Likelihood-free inference in cosmology: potential for the estimation of luminosity functions, pp. 3–19 in Statistical Challenges in Modern Astronomy V (Lecture Notes in Statistics no. 902), edited by E. D. Feigelson and G. J. Babu. Springer-Verlag, New York.

Sisson, S. A., Y. Fan, and M. M. Tanaka, 2007 Sequential Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 104: 1760–1765.

Veeramah, K. R., D. Wegmann, A. Woerner, F. L. Mendez, J. C. Watkins et al., 2011 An early divergence of KhoeSan ancestors from those of other modern humans is supported by an ABC-based analysis of autosomal re-sequencing data. Mol. Biol. Evol. 29: 617–630.

Wegmann, D., and L. Excoffier, 2010 Bayesian inference of the demographic history of chimpanzees. Mol. Biol. Evol. 27: 1425–1435.

Wegmann, D., C. Leuenberger, and L. Excoffier, 2009 Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182: 1207–1218.

Wegmann, D., C. Leuenberger, S. Neuenschwander, and L. Excoffier, 2010 ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11: 116.

Communicating editor: M. A. Beaumont


Appendix

Proof for Theorem 1. The transition kernel $K(\theta, \theta')$ associated with the Markov chain is zero if $\theta$ and $\theta'$ differ in more than one component. If $\theta_{-i} = \theta'_{-i}$ for some index $i$, then we have

$$K(\theta, \theta') = p_i\, r_i(\theta, \theta') + \bigl(1 - r(\theta)\bigr)\, \delta_\theta(\theta'), \qquad (A1)$$

where $r_i(\theta, \theta') = q_i(\theta' \mid \theta)\, \mathbb{P}\bigl(T_i = t_{i,\mathrm{obs}} \mid \theta'\bigr)\, h(\theta, \theta')$, $\delta_\theta$ is the Dirac mass at $\theta$, and

$$r(\theta) = \sum_{i=1}^{n} p_i \int r_i(\theta, \theta')\, d\theta'_i.$$

We may assume without loss of generality that

$$\frac{p(\theta')\, q_i(\theta \mid \theta')}{p(\theta)\, q_i(\theta' \mid \theta)} \le 1.$$

From (1) we conclude

$$\mathbb{P}(s = s_{\mathrm{obs}} \mid \theta) = \mathbb{P}\bigl(T_i = t_{i,\mathrm{obs}} \mid \theta\bigr)\, g_i(s_{\mathrm{obs}}; \theta_{-i}).$$

Setting

$$c := \left(\int \mathbb{P}(s = s_{\mathrm{obs}} \mid \theta)\, p(\theta)\, d\theta\right)^{-1}$$

and keeping in mind that $\theta_{-i} = \theta'_{-i}$ and $h(\theta', \theta) = 1$, we get

$$
\begin{aligned}
p(\theta \mid s = s_{\mathrm{obs}})\, r_i(\theta, \theta')
&= p(\theta \mid s = s_{\mathrm{obs}})\, q_i(\theta' \mid \theta)\, \mathbb{P}\bigl(T_i = t_{i,\mathrm{obs}} \mid \theta'\bigr)\, h(\theta, \theta') \\
&= c\; \mathbb{P}(s = s_{\mathrm{obs}} \mid \theta)\, p(\theta)\, q_i(\theta' \mid \theta)\, \mathbb{P}\bigl(T_i = t_{i,\mathrm{obs}} \mid \theta'\bigr)\, \frac{p(\theta')\, q_i(\theta \mid \theta')}{p(\theta)\, q_i(\theta' \mid \theta)} \\
&= c\; \mathbb{P}\bigl(T_i = t_{i,\mathrm{obs}} \mid \theta\bigr)\, g_i(s_{\mathrm{obs}}; \theta_{-i})\, \mathbb{P}\bigl(T_i = t_{i,\mathrm{obs}} \mid \theta'\bigr)\, p(\theta')\, q_i(\theta \mid \theta') \\
&= c\; \mathbb{P}\bigl(T_i = t_{i,\mathrm{obs}} \mid \theta'\bigr)\, g_i(s_{\mathrm{obs}}; \theta'_{-i})\, \mathbb{P}\bigl(T_i = t_{i,\mathrm{obs}} \mid \theta\bigr)\, p(\theta')\, q_i(\theta \mid \theta') \\
&= c\; \mathbb{P}\bigl(s = s_{\mathrm{obs}} \mid \theta'\bigr)\, \mathbb{P}\bigl(T_i = t_{i,\mathrm{obs}} \mid \theta\bigr)\, p(\theta')\, q_i(\theta \mid \theta')\, h(\theta', \theta) \\
&= p(\theta' \mid s = s_{\mathrm{obs}})\, r_i(\theta', \theta),
\end{aligned}
$$

where the fourth equality uses $g_i(s_{\mathrm{obs}}; \theta_{-i}) = g_i(s_{\mathrm{obs}}; \theta'_{-i})$ because $\theta_{-i} = \theta'_{-i}$. From this and Equation A1 it follows readily that the transition kernel $K(\cdot, \cdot)$ satisfies the detailed balance equation

$$p(\theta \mid s = s_{\mathrm{obs}})\, K(\theta, \theta') = p(\theta' \mid s = s_{\mathrm{obs}})\, K(\theta', \theta)$$

of the Metropolis–Hastings chain. □
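To make the structure of this kernel concrete, the following Python sketch implements a single iteration of the component-wise, likelihood-free Metropolis–Hastings update analysed above, with the exact-match event $T_i = t_{i,\mathrm{obs}}$ relaxed to a per-parameter tolerance $\delta_i$ as is done in practice. The callables `simulate`, `statistic_i`, `propose`, `proposal_pdf` and `prior_pdf`, as well as the function name itself, are hypothetical placeholders standing in for the user's simulator, the statistic $T_i$, the proposal $q_i$ and the prior; this is an illustrative sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def abc_pass_component_update(theta, i, t_obs_i, simulate, statistic_i,
                              propose, proposal_pdf, prior_pdf, delta_i, rng):
    """One single-parameter update of the likelihood-free MCMC kernel of Theorem 1.

    proposal_pdf(new, old) is the density q_i of proposing `new` from `old`;
    prior_pdf(theta) is the joint prior density p(theta).
    """
    theta_prop = np.array(theta, dtype=float)
    theta_prop[i] = propose(theta[i], rng)            # update component i only

    # Metropolis-Hastings ratio for prior and proposal; h(theta, theta') = min(1, ratio)
    ratio = (prior_pdf(theta_prop) * proposal_pdf(theta[i], theta_prop[i])) / (
             prior_pdf(theta) * proposal_pdf(theta_prop[i], theta[i]))

    # Likelihood-free part: a simulation under theta' must reproduce the observed
    # value of the statistic T_i used for parameter i (within tolerance delta_i)
    t_i_sim = statistic_i(simulate(theta_prop, rng))

    if abs(t_i_sim - t_obs_i) <= delta_i and rng.uniform() < min(1.0, ratio):
        return theta_prop                             # move accepted
    return np.array(theta, dtype=float)               # move rejected; chain stays put
```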

Suppose that, given the parameters $\theta$, the distribution of the statistics vector $s$ is multivariate normal according to the GLM

$$s = c + C\theta + e,$$

where $e \sim N(0, \Sigma_s)$ and $C$ is an arbitrary $m \times n$ matrix. If the prior distribution of the parameter vector is $\theta \sim N(\theta_0, \Sigma_\theta)$, then the posterior distribution of $\theta$ given $s_{\mathrm{obs}}$ is

$$\theta \mid s_{\mathrm{obs}} \sim N(Dd, D) \qquad (A2)$$

with $D = \left(C'\Sigma_s^{-1}C + \Sigma_\theta^{-1}\right)^{-1}$ and $d = C'\Sigma_s^{-1}(s_{\mathrm{obs}} - c) + \Sigma_\theta^{-1}\theta_0$ (see, e.g., Leuenberger and Wegmann 2010).
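For concreteness, this conjugate update can be evaluated directly; the short numpy sketch below is a plain transcription of the expressions for $D$ and $d$ in Equation A2, with the function name and its toy inputs being assumptions made for illustration rather than code from the paper.

```python
import numpy as np

def glm_posterior(C, c, Sigma_s, theta0, Sigma_theta, s_obs):
    """Posterior N(Dd, D) of theta given s_obs under the GLM s = c + C theta + e."""
    Sigma_s_inv = np.linalg.inv(Sigma_s)
    Sigma_theta_inv = np.linalg.inv(Sigma_theta)
    D = np.linalg.inv(C.T @ Sigma_s_inv @ C + Sigma_theta_inv)   # posterior covariance
    d = C.T @ Sigma_s_inv @ (s_obs - c) + Sigma_theta_inv @ theta0
    return D @ d, D                                              # posterior mean, covariance
```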

We have the following:

Theorem 2. Let $c_i$ be the $i$th column of $C$ and $b_i = \Sigma_s^{-1} c_i$. Moreover, let

$$t_i = t_i(s) = b_i' s.$$

Then $t_i$ is sufficient for the parameter $\theta_i$, and the collection of statistics

$$t = (t_1, \ldots, t_n)'$$

yields the same posterior (A2) as $s$.

In practice, the design matrix $C$ is unknown. We can perform an initial set of simulations from which we can infer that

$$\mathrm{Cov}(s, \theta_i) = \mathrm{Var}(\theta_i)\, c_i.$$

A reasonable estimator for the sufficient statistic $t_i$ is then $\hat{t}_i = \hat{b}_i' s$ with

$$\hat{b}_i = \hat{\Sigma}_s^{-1} \hat{\Sigma}_{s\theta_i}, \qquad (A3)$$

where $\hat{\Sigma}_s$ and $\hat{\Sigma}_{s\theta_i}$ for $i = 1, \ldots, n$ are the covariances estimated, for instance, using ordinary least squares.

Proof for Theorem 2. It is easy to check that the mean of $t_i$ is $\mu_i = b_i'(C\theta + c)$ and its variance is $\sigma_i^2 = b_i' \Sigma_s b_i$. The covariance between $s$ and $t_i$ is given by

$$\Sigma_{st} = E\bigl((s - C\theta - c)(t_i - \mu_i)\bigr) = E(e e' b_i) = \Sigma_s b_i.$$

Consider the conditional multinormal distribution $s \mid t_i$. Using the well-known formula for the variance and the mean of a conditional multivariate normal (see, e.g., Bilodeau and Brenner 2008), we get that the covariance of $s \mid t_i$ is given by

$$\Sigma_{s \mid t} = \Sigma_s - \sigma_i^{-2}\, \Sigma_{st} \Sigma_{st}'$$

and thus is independent of $\theta$. The mean of $s \mid t_i$ is

$$\mu_{s \mid t} = C\theta + c + \sigma_i^{-2}\, \Sigma_{st}\, b_i'(s - C\theta - c).$$

The part of this expression depending on $\theta_i$ is

$$\left(I - \frac{\Sigma_s b_i b_i'}{b_i' \Sigma_s b_i}\right) c_i\, \theta_i.$$

Inserting $b_i = \Sigma_s^{-1} c_i$ we obtain

$$\left(c_i - \frac{\Sigma_s \Sigma_s^{-1} c_i\, c_i' \Sigma_s^{-1} c_i}{c_i' \Sigma_s^{-1} \Sigma_s \Sigma_s^{-1} c_i}\right)\theta_i = (c_i - c_i)\, \theta_i = 0.$$

Thus the distribution of $s \mid t_i$ is independent of $\theta_i$ and hence $t_i$ is sufficient for $\theta_i$.

To prove the second part of Theorem 2, we observe that $t$ is given by the linear model

$$t = C'\Sigma_s^{-1} s = C'\Sigma_s^{-1} C\theta + C'\Sigma_s^{-1} c + \eta$$

with $\eta = C'\Sigma_s^{-1} e$. Using $\mathrm{Cov}(\eta) = C'\Sigma_s^{-1} C$ we get for the posterior variance

$$\left(C'\Sigma_s^{-1} C \left(C'\Sigma_s^{-1} C\right)^{-1} C'\Sigma_s^{-1} C + \Sigma_\theta^{-1}\right)^{-1} = \left(C'\Sigma_s^{-1} C + \Sigma_\theta^{-1}\right)^{-1} = D.$$

Similarly we see that the posterior mean is $Dd$. □
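As an illustration of how the empirical linear combinations of Equation A3 might be obtained from a pilot set of simulations, the sketch below fits the GLM by ordinary least squares, takes the residual covariance as $\hat{\Sigma}_s$ and the cross-covariances as $\hat{\Sigma}_{s\theta_i}$, and projects the statistics onto one combination per parameter. The toy data-generating model, the sample sizes and the function name are assumptions made for this example only; it is a sketch of the idea, not the authors' code.

```python
import numpy as np

def estimate_linear_combinations(theta_sims, s_sims):
    """Columns of the returned m x n matrix estimate b_i = Sigma_s^{-1} Sigma_{s,theta_i} (A3)."""
    N = theta_sims.shape[0]
    X = np.column_stack([np.ones(N), theta_sims])          # design matrix with intercept
    coef, *_ = np.linalg.lstsq(X, s_sims, rcond=None)      # OLS fit of s on theta
    Sigma_s_hat = np.cov(s_sims - X @ coef, rowvar=False)  # residual (error) covariance
    s_c = s_sims - s_sims.mean(axis=0)
    th_c = theta_sims - theta_sims.mean(axis=0)
    Sigma_s_theta_hat = s_c.T @ th_c / (N - 1)             # Cov(s, theta_i) in column i
    return np.linalg.solve(Sigma_s_hat, Sigma_s_theta_hat)

# Toy example: pilot simulations under a linear model with standard-normal priors.
rng = np.random.default_rng(1)
m, n, N = 6, 3, 20000
C_true, c0 = rng.normal(size=(m, n)), rng.normal(size=m)
theta_sims = rng.normal(size=(N, n))
s_sims = theta_sims @ C_true.T + c0 + rng.normal(size=(N, m))
B = estimate_linear_combinations(theta_sims, s_sims)       # b_i vectors as columns
t_sims = s_sims @ B                                        # one statistic t_i per parameter theta_i
```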


GENETICS Supporting Information

www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.187567/-/DC1

Likelihood-Free Inference in High-Dimensional Models

Athanasios Kousathanas, Christoph Leuenberger, Jonas Helfer, Mathieu Quinodoz, Matthieu Foll, and Daniel Wegmann

Copyright © 2016 by the Genetics Society of America. DOI: 10.1534/genetics.116.187567


Figure S1: Prediction error for the partial least squares (PLS) analysis. The root mean squared prediction error (RMSEP) is shown for the parameters Ne and s as a function of an increasing number of PLS components. We performed PLS using Wright-Fisher simulations of 100 loci where we assumed uniform priors for Ne and s (U[0.5, 4.5] and U[0, 1], respectively).


Figure S2: Directed acyclic graph describing the Wright-Fisher model examined in this study. Solid circles represent parameters to be estimated. The dashed square represents the full data, which is summarized here by a vector of statistics Fl, indicated by a solid square. Nodes contained in the plate are repeated for each locus l ∈ {1, . . . , L}.


Figure S3: Accuracy in estimating Ne and the DFE parameters σ and χ jointly. (A, B, C) Sets of simulations of 100 loci were conducted for combinations of the parameters σ and χ over a grid spanning their prior range, and we evaluated the median approximation error (ε = estimate − true) over 25 replicates. Color gradients indicate the extent of overestimation (red) or underestimation (blue) of each parameter. These results suggest very high accuracy when estimating Ne, with maximum ε ≈ 0.04 or 1% of the prior range, and rather low accuracy for σ (about 10% of the prior range). In contrast, ε is rather large for χ, spanning up to 75% of the prior range. This is due to several combinations of χ and σ leading to very similar shapes of the truncated GPD, as illustrated in panels D, E and F, where we show the true (dashed black line) versus estimated (red) DFE obtained for 50 replicates using the parameter combinations of χ and σ indicated in panels B and C.


Figure S4: Accuracy in inferring demographic and selection parameters using the PLS 5/2 set of statistics. Results were obtained with ABC-PaSS using five and two PLS components for Ne and each s, respectively (PLS 5/2). Shown are the true versus estimated posterior medians for the parameters Ne (A), s per locus (B), and χ and σ of the generalized Pareto distribution (C and D, respectively). Boxplots summarize results from 25 replicate simulations, each with 100 loci. Uniform priors over the whole ranges shown were used. (A, B) The Ne assumed in the simulations is represented as a color gradient from red (low Ne) to yellow (high Ne). (C, D) Parameters µ and Ne were fixed to 0 and 10^3, respectively; log10σ was fixed to −1 (C) and χ was fixed to 0.5 (D).


Figure S5: Accuracy in inferring demographic and selection parameters using the PLS 3/1 set of statistics. Results were obtained with ABC-PaSS using three and one PLS components for Ne and each s, respectively (PLS 3/1). Shown are the true versus estimated posterior medians for the parameters Ne (A), s per locus (B), and χ and σ of the generalized Pareto distribution (C and D, respectively). Boxplots summarize results from 25 replicate simulations, each with 100 loci. Uniform priors over the whole ranges shown were used. (A, B) The Ne assumed in the simulations is represented as a color gradient from red (low Ne) to yellow (high Ne). (C, D) Parameters µ and Ne were fixed to 0 and 10^3, respectively; log10σ was fixed to −1 (C) and χ was fixed to 0.5 (D).


Figure S6: Allele trajectories (A, C) and posterior estimates for Nes (B, D) for control (A, B) and drug-treated (C, D) influenza. Non-significant loci are colored grey and significant loci are colored with a unique color for each locus.


Figure S7: Inferred demography and selection for the experimental evolution of influenza using the PLS 3/1 set of statistics. We show results for the no-drug (control) and drug-treated influenza in grey and orange, respectively. Shown are the posterior distributions for log10Ne (A) and for log10σ and χ (B). In panel C, we plot the modal distribution of fitness effects (DFE) with thick lines, obtained by integrating over the posterior of its parameters. The thin lines represent the DFEs obtained by drawing 100 samples from the posterior of σ and χ. Dashed lines in panels A and C correspond to the prior distributions. In panel D, the posterior estimates of Nes per locus are shown against the position of the loci in the genome. Open circles indicate non-significant loci, whereas closed, thick circles indicate significant loci (P(Nes > 10) > 0.95, dashed line).


Simulation set   Parameter    Method     RMSE      Pearson's R2
Set 1            s            LC 1/1     0.0700*   0.970
                              PLS 3/1    0.0809    0.960
                              PLS 5/2    0.0743    0.966
                 log10(Ne)    LC 1/1     0.178     0.969
                              PLS 3/1    0.199     0.962
                              PLS 5/2    0.171*    0.972
Set 2            s            LC 1/1     0.0191*   0.954
                              PLS 3/1    0.0390    0.957
                              PLS 5/2    0.0195    0.953
                 log10(Ne)    LC 1/1     0.0292*   -
                              PLS 3/1    0.0390    -
                              PLS 5/2    0.0485    -
                 χ            LC 1/1     0.126*    0.911
                              PLS 3/1    0.127     0.910
                              PLS 5/2    0.140     0.895
Set 3            s            LC 1/1     0.0226*   0.987
                              PLS 3/1    0.0228    0.986
                              PLS 5/2    0.0227    0.986
                 log10(Ne)    LC 1/1     0.0427*   -
                              PLS 3/1    0.189     -
                              PLS 5/2    0.374     -
                 log10(σ)     LC 1/1     0.0783    0.988
                              PLS 3/1    0.0776*   0.989
                              PLS 5/2    0.0844    0.984
Set 4            s            LC 1/1     0.0258    0.982
                              PLS 3/1    0.0259    0.981
                              PLS 5/2    0.0254*   0.981
                 log10(Ne)    LC 1/1     0.0577*   -
                              PLS 3/1    0.0889    -
                              PLS 5/2    0.351     -
                 log10(σ)     LC 1/1     0.0191    0.961
                              PLS 3/1    0.0180    0.968
                              PLS 5/2    0.0167*   0.968
                 χ            LC 1/1     0.306     0.667
                              PLS 3/1    0.306     0.659
                              PLS 5/2    0.291*    0.666

Table S1: Performance of ABC-PaSS coupled with different dimension-reduction techniques in Wright-Fisher simulations. We computed the root mean square error (RMSE) and Pearson's correlation (R2) between true and estimated parameter values using three different dimension-reduction strategies: a single linear combination per parameter calculated according to Theorem 2 (LC 1/1), three and one PLS components for the parameters log10Ne and s, respectively (PLS 3/1), and five and two PLS components for the parameters log10Ne and s, respectively (PLS 5/2). Results shown are for four sets of 25 replicate simulations: Set 1 assumed uniform priors for Ne and s (U[0.5, 4.5] and U[0, 1], respectively); Sets 2-4 assumed a generalized Pareto distribution for s with hyperparameters σ and χ. For Set 2 we varied χ (U[−0.2, 1]) and kept σ fixed (= 0.01), for Set 3 we varied log10σ (U[−2.5, −0.5]) and kept χ fixed (= 0.5), and for Set 4 we varied both σ and χ. For Sets 2-4 we performed simulations only for log10Ne = 3; thus R2 is not calculable for this parameter. The dimension-reduction strategy with the smallest RMSE for each parameter per set is marked with an asterisk (*).