When Bayes meets Darwin: a journey in population genomics [email protected] Laboratoire TIMC-IMAG, Grenoble

Blum


Page 1: Blum

When Bayes meets Darwin: a journey in population genomics

[email protected]

Laboratoire TIMC-IMAG, Grenoble

Page 2: Blum

In "The Descent of Man", Darwin concluded that the visual differences between human populations were not adaptive to any significant degree […]

“Natural selection has almost become irrelevant in human evolution. There's been no biological change in humans in 40,000 or 50,000 years” Stephen J. Gould

Page 3: Blum

But here is a counter-example

• Tibetan populations have adapted to their high-altitude, low-oxygen environment thanks to an increased respiratory rate and increased blood flow.

• These traits are transmitted from generation to generation.

• The Tibetan plateau has been inhabited for ~20,000 years.

Page 4: Blum

Local adaptation

• Human adaptation to high altitude is an instance of local adaptation.

• Understanding how individuals adapt to their local environment is central in biology. Plants adapt to their environment, bacteria adapt to antibiotics…

• Definition of local adaptation: greater fitness (a measure of reproductive success) of individuals in their local habitats due to natural selection.

How to find genomic regions involved in local adaptation?

Page 5: Blum

Data description

Page 6: Blum

Single Nucleotide Polymorphism (SNP)

Indiv 1             ....ACCCG……….    ....AACCG……….
Number of copies    1    0

Indiv 2             ….ACCCT……….    ….ACCCT……….
Number of copies    0    2

• 3 billion base pairs in the human genome
• Commercial SNP chips, 100€ for 500,000 SNPs
• dbSNP: >10^6 SNPs

Page 7: Blum

Single Nucleotide Polymorphism (SNP)

           Locus 1   Locus 2   Locus 3
Indiv 1    1         0         2
Indiv 2    0         2         0
Indiv 3    0         0         0
Indiv 4    0         1         1
Indiv 5    1         1         1

Data matrix Y
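As a minimal sketch of how such a matrix can be filled in, the Python snippet below counts, for each individual and each SNP, the number of copies of one allele across the two chromosome copies. It reuses the two individuals of the previous slide. The SNP positions (0-based indices 1 and 4), the choice of counted alleles (A and T) and the function name are assumptions inferred from the counts shown on that slide, not something the slides state.

```python
# Haplotype pairs taken from the SNP slide (two chromosome copies per individual).
individuals = {
    "Indiv 1": ("ACCCG", "AACCG"),
    "Indiv 2": ("ACCCT", "ACCCT"),
}

# Assumption: the SNPs sit at 0-based positions 1 and 4,
# and the counted alleles are A and T respectively.
snp_positions = [1, 4]
counted_alleles = ["A", "T"]


def genotype_row(haplotypes, positions, alleles):
    """Count the copies (0, 1 or 2) of the counted allele at each SNP."""
    return [
        sum(hap[pos] == allele for hap in haplotypes)
        for pos, allele in zip(positions, alleles)
    ]


# One row per individual, one column per SNP: the data matrix Y.
Y = [genotype_row(haps, snp_positions, counted_alleles)
     for haps in individuals.values()]

for name, row in zip(individuals, Y):
    print(name, row)
```

Each entry of Y is therefore 0, 1 or 2, which is the coding used in the table above.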

Page 8: Blum

Main principle of population genomics

• Genome-wide patterns are influenced by neutral processes: migration, admixture, expansion.

• Genes involved in local adaptation are outliers.

Page 9: Blum

Adaptation to altitude

Manhattan plot

Xu et al. MBE 2011

Page 10: Blum

Human HGDP data

Page 11: Blum

Genome-wide patterns

Page 12: Blum

Principal component analysis

[Figure: PCA scatter plot of individuals, PC1 against PC2; legend: Africa, America, Oceania, Middle-East, Europe, East Asia, Asia]

Page 13: Blum

Principal component analysis

Novembre et al. Nature 2008

Page 14: Blum

Genome scan for local adaptation: a Bayesian PCA approach

Page 15: Blum

Singular Value Decomposition (SVD) viewpoint of PCA

In matrix notation, we have

$$Y = UV,$$

where Y is the (n,p) genotype matrix, U is the (n,K) score matrix and V is the (K,p) loadings matrix.

Variations around SVD in machine learning: matrix factorization, low-rank approximation, probabilistic PCA, factor analysis,…

Page 16: Blum

Singular Value Decomposition (SVD) viewpoint of PCA

An optimal approximation of rank K for the matrix of genotypes Y

$$Y_i = \sum_{k=1}^{K} u_{ik} V_k$$

$Y_i$: genotype of the i-th individual (0,1,1,2,0,0,…)

$V_k$: vector of loadings, of the same length as $Y_i$: $(v_{k,1}, v_{k,2}, v_{k,3}, \dots)$
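The rank-K approximation can be sketched with a truncated SVD in NumPy. This is an illustration under simplifying assumptions, not the author's implementation: the matrix is the 5 x 3 toy genotype matrix from the SNP slide, no centering or scaling of the columns is applied, the singular values are absorbed into the scores U, and the function name is hypothetical.

```python
import numpy as np

# Toy genotype matrix Y (n = 5 individuals, p = 3 loci) from the SNP slide.
Y = np.array(
    [[1, 0, 2],
     [0, 2, 0],
     [0, 0, 0],
     [0, 1, 1],
     [1, 1, 1]],
    dtype=float,
)


def rank_k_approximation(Y, K):
    """Return scores U (n, K) and loadings V (K, p) from a truncated SVD."""
    left, singular_values, right_t = np.linalg.svd(Y, full_matrices=False)
    U = left[:, :K] * singular_values[:K]  # scores, scaled by singular values
    V = right_t[:K, :]                     # loadings, one row per component
    return U, V


U, V = rank_k_approximation(Y, K=2)
Y_hat = U @ V                      # row i is sum_k u_ik * V_k
mse = np.mean((Y - Y_hat) ** 2)    # mean squared reconstruction error
print(U.shape, V.shape, mse)
```

Among all matrices of rank K, this truncation minimizes the squared reconstruction error (Eckart-Young theorem), which is the sense in which the approximation is optimal.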

Page 17: Blum

Bayesian principal component analysis

• A probabilistic version of PCA (Tipping and Bishop 1999)

$$Y_i = \sum_{k=1}^{K} u_{ik} V_k + \varepsilon_i.$$

• The variance-inflation model for outlier detection (Box and Tiao 1968)

$$p(v_j) = (1-\pi)\, N(0,\sigma^2) + \pi\, N(0, c^2\sigma^2),$$

where $\pi$ is the genome-wide outlier probability, and the prior for $c^2$ is uniform$(1, c_{max}^2)$.
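To see what the variance-inflation mixture implies, the sketch below applies Bayes' rule to a single scalar loading v, treated as if it were observed and with π, σ and c held fixed. In the model on the slide these quantities are unknown and handled by the sampler, so this is only an illustration. The numerical values and function names are hypothetical.

```python
import math


def normal_pdf(x, sd):
    """Density of a centered normal distribution N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def outlier_probability(v, pi, sigma, c):
    """P(outlier | v) under the two-component variance-inflation mixture."""
    inflated = pi * normal_pdf(v, c * sigma)      # outlier component
    regular = (1.0 - pi) * normal_pdf(v, sigma)   # non-outlier component
    return inflated / (inflated + regular)


# Hypothetical values: 5% of outliers, inflation factor c = 5.
pi, sigma, c = 0.05, 1.0, 5.0
for v in (0.0, 2.0, 6.0):
    print(v, round(outlier_probability(v, pi, sigma, c), 4))
```

Loadings close to zero are attributed to the non-outlier component, while loadings that are large relative to σ are attributed to the inflated component.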

Page 18: Blum

Accounting for local correlation in the genome

Local correlation because of recombination

Ising model (outlier $Z_j = 1$, non-outlier $Z_j = 0$)

$$P(Z_j = 1) \propto \pi \exp\Big(\beta \sum_{k \sim j} Z_k\Big),$$

where $\beta > 0$ is a hyperparameter.
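The effect of β can be sketched by computing the conditional probability that SNP j is an outlier given its neighbours. The slide only gives the unnormalized weight for $Z_j = 1$. The weight used here for $Z_j = 0$, namely $(1-\pi)\exp(\beta \cdot \#\{\text{non-outlier neighbours}\})$, is an assumption made to normalize the probability, as is the choice of the two adjacent SNPs as neighbours. The function name is hypothetical.

```python
import math


def outlier_probability_given_neighbours(j, Z, pi, beta):
    """Conditional P(Z_j = 1 | neighbours) on a chain of SNPs."""
    # Assumption: the neighbours of SNP j are the adjacent SNPs j-1 and j+1.
    neighbours = [k for k in (j - 1, j + 1) if 0 <= k < len(Z)]
    n_outliers = sum(Z[k] for k in neighbours)
    n_regular = len(neighbours) - n_outliers

    weight_outlier = pi * math.exp(beta * n_outliers)        # as on the slide
    weight_regular = (1.0 - pi) * math.exp(beta * n_regular)  # assumption
    return weight_outlier / (weight_outlier + weight_regular)


# Hypothetical configuration: SNP 1 is flanked by two outliers.
Z = [1, 0, 1, 0, 0]
print(outlier_probability_given_neighbours(1, Z, pi=0.1, beta=1.0))
print(outlier_probability_given_neighbours(3, Z, pi=0.1, beta=1.0))
```

With β = 0 the neighbours are ignored and the probability falls back to π. Larger values of β make outliers cluster along the genome.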

Page 19: Blum

A hierarchical Bayesian model

Gibbs sampler for sampling the posterior

[Figure: graphical model with nodes Y, U, V, σ, c, cmax, β, π, Z, K, σ0]

Page 20: Blum

Low-rank approximation for outlier detection in video sequences

Page 21: Blum

Bayesian scores for detecting outliers

• Bayes factors: a Bayesian alternative to P-values

$$BF = P(Y_j \mid \text{outlier}) \,/\, P(Y_j \mid \text{non-outlier})$$

• Posterior odds

$$P(\text{outlier} \mid Y_j) \,/\, P(\text{non-outlier} \mid Y_j) = \text{prior odds} \times BF$$

• For any list of outlier SNPs, a false discovery rate can be estimated based on posterior odds.
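The chain from Bayes factors to a false discovery rate can be sketched as follows. The slides do not say which estimator is used. The one below, the average posterior probability of being a non-outlier over the selected list, is a common choice and is an assumption here. The Bayes factor values and function names are hypothetical.

```python
def posterior_probability(bayes_factor, prior_odds):
    """Posterior probability of being an outlier: odds / (1 + odds)."""
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


def estimated_fdr(bayes_factors, prior_odds):
    """Average posterior probability of being a non-outlier over a list."""
    probabilities = [posterior_probability(bf, prior_odds)
                     for bf in bayes_factors]
    return sum(1.0 - p for p in probabilities) / len(probabilities)


# Hypothetical inputs: genome-wide outlier probability and Bayes factors
# of three candidate SNPs.
pi = 0.05
prior_odds = pi / (1.0 - pi)
candidate_bayes_factors = [1e4, 1e2, 19.0]

print(estimated_fdr(candidate_bayes_factors, prior_odds))
```

A Bayes factor of 1 leaves the prior unchanged, so the posterior probability is then π.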

Page 22: Blum

Ex 1: a simulation study in a divergence model

Neutral divergence (ms)

Divergence with selection (SimuPOP): 4% out of 10,000 SNPs under selection

Page 23: Blum

Other methods for genome scan of local adaptation

• Fst: a measure of differentiation between populations
• BayeScan (Foll and Gaggiotti 2008)
• Both methods assume (implicitly or explicitly) a mechanistic model of instantaneous divergence
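For reference, one textbook way to compute Fst at a single biallelic SNP is to compare expected heterozygosities, Fst = (H_T − H_S) / H_T. The slides do not specify which Fst estimator is used in the comparison, so the sketch below is an assumption. It also assumes populations of equal size, and the function name and allele frequencies are hypothetical.

```python
def fst(allele_frequencies):
    """Fst = (H_T - H_S) / H_T for one biallelic SNP.

    allele_frequencies: frequency of one allele in each population
    (populations are assumed to have equal sizes).
    """
    n_pops = len(allele_frequencies)
    mean_freq = sum(allele_frequencies) / n_pops
    # Expected heterozygosity in the pooled population.
    h_total = 2.0 * mean_freq * (1.0 - mean_freq)
    if h_total == 0.0:
        return 0.0  # monomorphic SNP: no differentiation to measure
    # Average expected heterozygosity within populations.
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_frequencies) / n_pops
    return (h_total - h_within) / h_total


print(fst([0.5, 0.5]))  # identical frequencies in both populations
print(fst([0.2, 0.8]))  # differentiated populations
print(fst([0.0, 1.0]))  # fixed differences between populations
```

In a genome scan based on this statistic, SNPs with unusually large Fst values are the candidates for local adaptation.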

Page 24: Blum

Population structure

[Figure: PCA scatter plot, PC1 against PC2; legend: Neutral, Adaptive]

Page 25: Blum

Selection scan

[Figure: log10(BF) for each SNP (0 to 10,000); legend: PC 1, PC 2, PC 3]

Page 26: Blum

Comparing methods of selection scan

[Figure: false discovery rate as a function of divergence time (0.01 to 0.05); legend: BayeScan, PCAdapt, Fst]

Advantage of non-parametric methods in data-rich situations

Page 27: Blum

Ex 2: a spatially-explicit simulation with a gradient of selection

Page 28: Blum

Population structure

[Figure: three panels labelled PC 1, PC 2 and PC 3]

Page 29: Blum

Selection scan

[Figure: log10(BF) for each SNP (0 to 2,000); legend: PC 1, PC 2, PC 3]

Page 30: Blum

Application to the human HGDP data

[Figure: PCA scatter plot of individuals, PC1 against PC2; legend: Africa, Americas, Oceania, Middle-East, Europe, East Asia, Asia]

Page 31: Blum

Manhattan plot

[Figure: log10(BF) against physical position (0 to 8e+07); legend: PC1, PC2, PC3, PC4; labelled peak: ABCC11]

Top hit is on chromosome 16

Page 32: Blum

Geographic distribution of the top SNP

Involved in earwax type (cerumen) and perspiration

Page 33: Blum

Enrichment analysis

[Figure: PCA scatter plot of individuals, PC1 against PC2; legend: Africa, Americas, Oceania, Middle-East, Europe, East Asia, Asia]

Are PC2 outliers enriched for genes involved in immunity?
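A question of this kind is often addressed with a one-sided hypergeometric test (equivalently, Fisher's exact test) on gene counts. The slides do not say which test is used, so the sketch below is an assumption. All counts are hypothetical and do not come from the HGDP analysis, and the function name is hypothetical as well.

```python
from math import comb


def enrichment_p_value(n_genes, n_annotated, n_outliers, n_overlap):
    """P(overlap >= n_overlap) when n_outliers genes are drawn at random.

    n_genes: total number of genes considered
    n_annotated: genes carrying the annotation (e.g. immunity)
    n_outliers: genes flagged as outliers
    n_overlap: outlier genes that also carry the annotation
    """
    total = comb(n_genes, n_outliers)
    upper = min(n_annotated, n_outliers)
    tail = sum(
        comb(n_annotated, x) * comb(n_genes - n_annotated, n_outliers - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
print(enrichment_p_value(n_genes=20000, n_annotated=1500,
                         n_outliers=200, n_overlap=30))
```

A small p-value indicates that annotated genes are over-represented among the outliers compared with a random draw of the same size.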

Page 34: Blum

Big data

What can you do with millions of SNPs? Scalable Bayesian computation?

Standard PCA and permutation tests.

Page 35: Blum

A George Box (1919-2013) story to conclude

• Box wanted to write a paper with Cox because having a Box and Cox paper would be fun.

• They decided to write a paper on transformations.

• One author wrote the Bayesian version and the other one wrote the maximum likelihood version. We do not know who wrote what.

• In the end, it did not make much practical difference.

Page 36: Blum

Nicolas Duforet-Frebourg

Page 37: Blum

Spatial autocorrelation explains the PCA pattern

Page 38: Blum

Choice of K

[Figure: mean squared error as a function of K (2 to 12)]

Page 39: Blum

Robustness w.r.t. the choice of K

[Figure: false discovery rate as a function of divergence time (0.01 to 0.05); legend: K=1, K=2, K>2]