A Bayesian approach to inferring recent selective sweeps in West African Anopheles gambiae populations

John Marshall (1), Professor Robert Weiss (2)

(1) Department of Biomathematics, UCLA School of Medicine, Los Angeles CA 90095-1766 USA
(2) Department of Biostatistics, UCLA School of Public Health, Los Angeles CA 90095-1772 USA




Using microsatellite alleles to detect recent selective sweeps

Microsatellites:

– Tandem repeats of short DNA segments typically 1-5 bp in length

– Alleles defined by number of repeats at a particular locus

– Multiallelic → highly informative markers

Factors affecting variance in microsatellite allele size:

1. Locus specific:

• Microsatellite mutation rate (mainly due to ‘slippage’ during DNA replication)

2. Population specific:

• Effective population size

• Population-level events (migration, bottlenecks)

3. Population and locus specific:

• Hitchhiking of a microsatellite allele with a linked gene under selection


The lnRV statistic

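\[ E[\sigma^2_{ij}] = 4 N_{e_i} \mu_j \]

\[ \ln\frac{E[\sigma^2_{i_1 j}]}{E[\sigma^2_{i_2 j}]} = \ln\frac{4 N_{e_{i_1}} \mu_j}{4 N_{e_{i_2}} \mu_j} = \ln N_{e_{i_1}} - \ln N_{e_{i_2}} \]

\[ \ln RV_{i_1 i_2 j} = \ln\frac{\sigma^2_{i_1 j}}{\sigma^2_{i_2 j}} = \ln\sigma^2_{i_1 j} - \ln\sigma^2_{i_2 j} \]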

• From population genetics, the variance in microsatellite allele size at a given locus (j) in a given population (i) is a function of the effective population size (N_{e_i}) and the microsatellite mutation rate (μ_j)

• Taking the ratio of expected variances in microsatellite allele sizes for a pair of populations (i1 and i2) thus removes the locus dependence, because the mutation rate cancels

• For a pair of populations (i1 and i2), the log ratio of variances (lnRV) can be calculated at each of a set of loci (j = 1, 2, …, T)

• Coalescent simulations have shown empirically that lnRV values approximately follow a normal distribution

• A microsatellite near a selected locus is expected to have reduced variance, and hence an lnRV value that is an outlier from the otherwise normal distribution of lnRV values
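The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `ln_rv` and `flag_outliers`, the 1.96 cutoff, and the simulated allele sizes (drawn from rounded normal distributions) are all assumptions made for the example.

```python
import numpy as np


def ln_rv(sizes_pop1, sizes_pop2):
    """Per-locus lnRV for two populations.

    Each argument is a sequence with one 1-D array of allele sizes per
    locus; both sequences must list the same loci in the same order.
    """
    var1 = np.array([np.var(a, ddof=1) for a in sizes_pop1])
    var2 = np.array([np.var(a, ddof=1) for a in sizes_pop2])
    return np.log(var1) - np.log(var2)


def flag_outliers(lnrv, z_cut=1.96):
    """Standardize lnRV across loci and flag loci with |z| > z_cut."""
    lnrv = np.asarray(lnrv, dtype=float)
    z = (lnrv - lnrv.mean()) / lnrv.std(ddof=1)
    return z, np.abs(z) > z_cut


# Hypothetical simulated data: 20 loci, 50 individuals per population.
rng = np.random.default_rng(0)
n_loci, n_ind = 20, 50
# Locus-specific spread stands in for the locus-specific mutation rate.
locus_sd = rng.uniform(2.0, 5.0, size=n_loci)
pop1 = [np.round(rng.normal(100, s, n_ind)) for s in locus_sd]
pop2 = [np.round(rng.normal(100, s, n_ind)) for s in locus_sd]
# Mimic a sweep in population 1 at locus 0 by shrinking its spread.
pop1[0] = np.round(rng.normal(100, 0.3 * locus_sd[0], n_ind))

z, flagged = flag_outliers(ln_rv(pop1, pop2))
print(np.flatnonzero(flagged))  # indices of loci flagged as outliers
```

Because the locus-specific spread is shared by both populations, it cancels in the log ratio, and only the locus with the artificially reduced variance stands far from the bulk of the lnRV values.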


Pros and cons of the lnRV statistic

PROS:

– Easy and fast to calculate

– Intuitive to understand

– Can cope with a very large number of loci

– Not sensitive to genetic drift, migration or inbreeding, since these processes affect all loci to the same extent and so are removed in the ratio calculation

CONS:

– Much information is lost when a set of allele size data at a particular locus for all individuals in a population is reduced to a single value

– Only makes pair-wise comparisons

– Difficult to extrapolate methodology to >2 populations

– Inferences from pairs of populations are not carried over to other populations

– Masking can occur when multiple outliers expand the confidence interval and lead to none or only a subset of outliers being detected
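The masking problem can be illustrated with a small numerical sketch. The lnRV values below are invented for the example, and the 1.96 cutoff on standardized values is an assumed outlier rule rather than one stated in the slides.

```python
import numpy as np


def z_scores(values):
    """Standardize values using the sample mean and standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)


# Hypothetical lnRV values for ten neutral loci.
neutral = [-0.3, -0.2, -0.1, -0.1, 0.0, 0.0, 0.1, 0.1, 0.2, 0.3]

# Add one swept locus, then three swept loci, each with lnRV = -2.0.
one_sweep = z_scores(neutral + [-2.0])
three_sweeps = z_scores(neutral + [-2.0, -2.0, -2.0])

n_flagged_one = int(np.sum(np.abs(one_sweep) > 1.96))
n_flagged_three = int(np.sum(np.abs(three_sweeps) > 1.96))
print(n_flagged_one, n_flagged_three)  # prints: 1 0
```

With a single swept locus the outlier has a standardized value of about -2.9 and is detected. With three swept loci the same lnRV value standardizes to about -1.7, because the outliers themselves inflate the standard deviation, so none is detected.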


The Bayesian model

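Distribution of microsatellite allele sizes:

\[ y_{ijk} \sim N(\theta_{ij}, \sigma^2_{ij}) \]

(i indexes population, j indexes locus, k indexes individual)

Mean components:

\[ \theta_{ij} = \theta_0 + \alpha^{\theta}_i + \beta^{\theta}_j + \gamma^{\theta}_{ij} \]

\[ \alpha^{\theta}_i \sim N(0, \tau^{\theta}_{\alpha}), \quad \beta^{\theta}_j \sim N(0, \tau^{\theta}_{\beta}), \quad \gamma^{\theta}_{ij} \sim N(0, \tau^{\theta}_{\gamma}) \]

\[ \tau^{\theta}_{\alpha} \sim \mathrm{Gamma}(a, b), \quad \tau^{\theta}_{\beta} \sim \mathrm{Gamma}(a, b), \quad \tau^{\theta}_{\gamma} \sim \mathrm{Gamma}(a, b) \]

Variance components:

\[ \ln\sigma^2_{ij} = \lambda_0 + \alpha_i + \beta_j + \gamma_{ij} \]

\[ \alpha_i \sim N(0, \tau_{\alpha}), \quad \beta_j \sim N(0, \tau_{\beta}), \quad \gamma_{ij} \sim N(0, \tau_{\gamma}) \]

\[ \tau_{\alpha} \sim \mathrm{Gamma}(a, b), \quad \tau_{\beta} \sim \mathrm{Gamma}(a, b), \quad \tau_{\gamma} \sim \mathrm{Gamma}(a, b) \]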
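To make the structure of the model concrete, the sketch below simulates allele sizes from a normal model with additive population, locus and population-by-locus effects on both the mean and the log variance. It is a hedged illustration only: the function name, all default values, and the use of fixed effect standard deviations (instead of drawing them from Gamma priors) are assumptions, and the 40 individuals per cell are hypothetical.

```python
import numpy as np


def simulate_allele_sizes(n_pop, n_loci, n_ind, rng,
                          mean0=100.0, logvar0=np.log(9.0),
                          sd_mean=(1.0, 10.0, 1.0),
                          sd_logvar=(0.3, 0.5, 0.2)):
    """Draw allele sizes y[i, j, k] from the two-way normal model.

    All default values are hypothetical. sd_mean and sd_logvar hold the
    standard deviations of the (population, locus, interaction) effects.
    """
    # Effects on the mean: population (rows), locus (columns), interaction.
    mean = (mean0
            + rng.normal(0.0, sd_mean[0], size=(n_pop, 1))
            + rng.normal(0.0, sd_mean[1], size=(1, n_loci))
            + rng.normal(0.0, sd_mean[2], size=(n_pop, n_loci)))
    # Effects on the log variance, with the same additive structure.
    log_var = (logvar0
               + rng.normal(0.0, sd_logvar[0], size=(n_pop, 1))
               + rng.normal(0.0, sd_logvar[1], size=(1, n_loci))
               + rng.normal(0.0, sd_logvar[2], size=(n_pop, n_loci)))
    sd = np.exp(0.5 * log_var)
    # One normal draw per individual k within each population-locus cell.
    y = rng.normal(mean[:, :, None], sd[:, :, None],
                   size=(n_pop, n_loci, n_ind))
    return y, mean, log_var


rng = np.random.default_rng(1)
# 5 subpopulations and 21 loci, as in the 1998 data set.
y, mean, log_var = simulate_allele_sizes(n_pop=5, n_loci=21, n_ind=40, rng=rng)
print(y.shape)  # (5, 21, 40)
```

Fitting the model runs in the opposite direction: given y, the posterior distribution of the effects is explored by MCMC, which this sketch does not attempt.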


Consistency between lnRV statistic and Bayesian ANOVA

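Bayesian ANOVA:

\[ \ln\sigma^2_{ij} = \lambda_0 + \alpha_i + \beta_j + \gamma_{ij} \]

\[ \alpha_i \sim N(0, \tau_{\alpha}), \quad \beta_j \sim N(0, \tau_{\beta}), \quad \gamma_{ij} \sim N(0, \tau_{\gamma}) \]

\[ \ln RV_{i_1 i_2 j} = (\alpha_{i_1} - \alpha_{i_2}) + (\gamma_{i_1 j} - \gamma_{i_2 j}) \]

lnRV statistic:

\[ \ln(E[\sigma^2_{ij}]) = \ln 4 + \ln N_{e_i} + \ln\mu_j + \ln s_{ij} \]

where s_{ij} is the factor by which selection scales the allele size variance, and ln N_{e_i}, ln μ_j and ln s_{ij} are each normally distributed.

\[ \ln RV_{i_1 i_2 j} = (\ln N_{e_{i_1}} - \ln N_{e_{i_2}}) + \ln\frac{s_{i_1 j}}{s_{i_2 j}} \]

Relative selection:

\[ \gamma_{i_1 j} - \gamma_{i_2 j} = \ln\frac{s_{i_1 j}}{s_{i_2 j}} \]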
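The correspondence between the two formulations follows from a short cancellation, written out here as a worked step. The symbol names are assumptions made for this derivation: the log variance is taken to be an intercept λ_0 plus a population effect α_i, a locus effect β_j and a population-by-locus effect γ_{ij}.

```latex
\begin{align*}
\ln RV_{i_1 i_2 j}
  &= \ln\sigma^2_{i_1 j} - \ln\sigma^2_{i_2 j} \\
  &= (\lambda_0 + \alpha_{i_1} + \beta_j + \gamma_{i_1 j})
   - (\lambda_0 + \alpha_{i_2} + \beta_j + \gamma_{i_2 j}) \\
  &= (\alpha_{i_1} - \alpha_{i_2}) + (\gamma_{i_1 j} - \gamma_{i_2 j})
\end{align*}
```

The intercept and the locus effect β_j cancel, just as ln 4 and ln μ_j cancel in the ratio of expected variances. What remains is a population term, matching the difference in ln N_e, and an interaction term, matching the log ratio of the selection factors.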


Bayesian statistics for detecting selective sweeps

For a given locus j, the population with the smallest fractional reduction in allele size variance is denoted i_max and has the corresponding variance component \gamma_{i_{\max} j}.

• Here BnM (Mopti form, Banambani) has the largest value so is least selected

• BnB and SeB (Bamako form, Banambani and Selinkenyi) have the smallest values so are most selected

Relative selection at locus j can be measured relative to population i_max, e.g.:

• The extent of selection can be measured by:

\[ E(\gamma_{ij} - \gamma_{i_{\max} j} \mid y) \]

• And:

\[ \Pr(\gamma_{ij} < \gamma_{i_{\max} j} \mid y) \]
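Given posterior draws of the population-by-locus variance components at one locus, summaries of this kind can be computed directly. The sketch below is an assumption-laden illustration: the function name `relative_selection`, the choice of i_max as the population with the largest posterior mean, and the simulated draws (which stand in for real MCMC output) are all hypothetical.

```python
import numpy as np


def relative_selection(gamma_draws, labels):
    """Summarize selection at one locus relative to population i_max.

    gamma_draws has shape (n_draws, n_pop): posterior draws of the
    population-by-locus log-variance effect at a single locus.
    """
    gamma_draws = np.asarray(gamma_draws, dtype=float)
    post_mean = gamma_draws.mean(axis=0)
    i_max = int(np.argmax(post_mean))  # least-selected population
    # Draw-by-draw difference from the reference population.
    diff = gamma_draws - gamma_draws[:, [i_max]]
    return {
        "i_max": labels[i_max],
        "expected_diff": dict(zip(labels, diff.mean(axis=0))),
        "prob_lower": dict(zip(labels, (diff < 0).mean(axis=0))),
    }


# Simulated draws standing in for MCMC output (values are invented).
rng = np.random.default_rng(3)
labels = ["BnM", "BnB", "SeB"]
draws = rng.normal(loc=[0.5, -1.0, -0.9], scale=0.3, size=(4000, 3))
summary = relative_selection(draws, labels)
print(summary["i_max"])  # BnM
```

A strongly negative expected difference together with a probability near one indicates that the population's variance at this locus is reduced relative to the least-selected population.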


Pros and cons of Bayesian approach

PROS:

– Doesn’t shrink data down to summary statistics before analysis

– Can be used to compare >2 populations at once

– Inferences from one population are carried over to all others

– Can cope with any number of selected loci without masking occurring

– Supplies quantitative measures of the probability that selection has occurred

– Can cope well with very small sample sizes

CONS:

– Can take a long time to converge

– Sometimes requires a lot of computing power

– Bayesian methods are more difficult to implement

  • Require well-specified prior distributions

  • Require programming and the use of complicated software

– Inferences depend to some extent on the subjective choice of prior distributions


Microsatellite data for West African Anopheles gambiae populations

• 1998 data set:

  – Allele size data collected at 21 microsatellite loci dispersed throughout the Anopheles gambiae genome

  – 5 subpopulations:

    • Bamako chromosomal form in the villages of Banambani and Selinkenyi

    • Mopti chromosomal form in the villages of Banambani and Selinkenyi

    • Savannah chromosomal form in the village of Banambani

• 2003 data set:

  – Allele size data collected at 12 microsatellite loci dispersed throughout Anopheles gambiae chromosome 3

  – 12 subpopulations:

    • Mopti chromosomal form in the villages of Oure, Dire, Kondi, Nampala, Torkya and Banikane

    • Savannah chromosomal form in the villages of Oure, Gono, Kokouna, Pimperena, Soulouba and Madina Diasra


Loci likely targeted by recent selective sweeps (1998 data set)

Applying the Bayesian ANOVA model to the 1998 data set, there is evidence of selection (in order of magnitude) in:

Rank  Locus  Chromosome  Chromosomal form  Location
1     637    2L          Bamako            Banambani & Selinkenyi
2     038    X           Savannah          Banambani
3     135    2           Mopti             Banambani & Selinkenyi
4     079    2           Mopti             Banambani & Selinkenyi
5     175    2           Mopti             Banambani & Selinkenyi
6     095    2R          Mopti             Banambani & Selinkenyi
7     025    X           Savannah          Banambani


Loci likely targeted by recent selective sweeps (2003 data set)

Applying the Bayesian ANOVA model to the 2003 data set, there is evidence of selection (in order of magnitude) in:

Rank  Locus  Chromosome  Chromosomal form  Location
1     119    3R          Mopti             Oure
2     127    3R          Savannah          Oure
3     093    3L          Mopti             Kondi & Banikane
                         Savannah          Gono & Kokouna
4     812    3           Mopti             Nampala & Dire
5     817    3L          Savannah          Soulouba
6     555    3           Savannah          Madina Diasra


Implications for recent selection in the Anopheles gambiae genome


1998 data set:

– Strongest evidence for selection is for:

  • Locus 637 (chromosome 2) in Bamako form

  • Locus 038 (X chromosome) in Savannah form

– Most selected loci are on chromosome 2

– For a given chromosomal form collected at Banambani and Selinkenyi, selection seems to be evident in both locations

– The same does not apply for a given location where multiple chromosomal forms are collected

– Suggests there is more gene flow between these two villages than there is between chromosomal forms

2003 data set:

– Strongest evidence for selection is for:

  • Locus 119 (chromosome 3R) in Mopti form in Oure

  • Locus 127 (chromosome 3R) in Savannah form in Oure

– Selected loci are dispersed throughout chromosome 3 (only chromosome 3 loci were analyzed in this data set)

– This time there is very little correlation in selection for a given chromosomal form collected at neighbouring locations

– Possibly selection on chromosome 3 is weaker (1998 data set showed no selection on chromosome 3)