Introduction Definitions What’s a marker Mutation models

Module de Master 2 Biostatistique : modèles de génétique des populations

Genetic markers for the study of the genetic polymorphism in natural populations

Raphaël Leblois & François Rousset

Centre de Biologie pour la Gestion des populations (CBGP, UMR INRA)

December 2013


Population Genetics (reminder...)

• Infer allelic and genotypic frequencies, to study their distribution within and between populations

• Understand the evolution of gene and genotype frequencies within and between populations due to the different evolutionary forces: mutation, drift, gene flow, selection


Population Genetics (reminder...)

• Theoretical: necessary to test verbal hypotheses and models, and to produce new theoretical hypotheses and models

• “Experimental”: test theoretical models and hypotheses under controlled conditions

• “Empirical”: study the distribution of polymorphism in natural populations

→ infer the demographic and adaptive history of natural populations


Polymorphism in natural populations: different forms of variation

• Within individuals

• Between individuals / within population

[Figure: collection of snails from a polymorphic population of Cepaea nemoralis in Poland, illustrating the variety of shell colours (yellow, pink, brown) and banding (0, 1, 5) typically found.]


• Between populations

[Figure: 17,000 polymorphic sites in 876 DNA sequences from 96 individuals]


Some definitions

• Gene: a copy of a piece of genetic information, carried by a sequence of nucleotides. A diploid individual has two copies of a gene.

• Locus: location of a gene on a chromosome.

• Allele (“allelic state”): class of equivalent homologous genes (i.e. genes in the same state).

From these definitions: at a given locus, a diploid individual has two homologous genes, which belong to the same allelic class if the individual is homozygous.
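As an illustration of these definitions, here is a minimal sketch that counts gene copies per allelic class in a sample of diploid genotypes at one locus. The function names and the four-individual sample are hypothetical and only serve the example.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes given as pairs of alleles."""
    # Each diploid individual contributes two gene copies at the locus.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def observed_homozygosity(genotypes):
    """Fraction of individuals whose two gene copies are in the same allelic state."""
    return sum(a == b for a, b in genotypes) / len(genotypes)

# Hypothetical sample: four diploid individuals typed at one locus
sample = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
freqs = allele_frequencies(sample)
hom = observed_homozygosity(sample)
print(freqs, hom)
```

Frequencies are computed over gene copies (two per individual), while homozygosity is computed over individuals.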


What is a genetic marker ?

• A good genetic marker should:
  • have a simple mode of inheritance (e.g. Mendelian)
  • be polymorphic
  • be co-dominant
  • be neutral (only for demographic inferences)

Genetic markers are used to describe the genetic polymorphism and its distribution within and between individuals, populations, or species (because we do not have direct access to individual genomes).


The different genetic markers (1)

[Figure: PCR]

• Microsatellite markers: repetitions of short DNA motifs, many loci dispersed in the genome
  • Mendelian inheritance
  • high polymorphism
  • co-dominant
  • “neutral”


The different genetic markers (2)

SNPs are the typical markers of the “next-generation sequencing” (NGS) era.

• SNPs: Single Nucleotide Polymorphisms
  • Mendelian inheritance
  • low polymorphism: 0 / 1 = ancestral / derived states
  • very many SNPs in the genome
  • co-dominant
  • “neutral” or “selected” (good for the study of selection)


The different genetic markers (4)

• DNA sequences: access to the nucleotide sequence of “short” DNA fragments.
  • Mendelian inheritance
  • intermediate polymorphism, depending on the length
  • co-dominant but difficult to “phase”
  • “neutral” or “selected”: intragenic vs intergenic sequences

Phasing: for diploid organisms, each observed polymorphism (SNP) in the DNA sequence needs to be attributed to a given strand (i.e. maternal or paternal DNA) to obtain the two haplotypes carried by each individual.
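To see why phasing is difficult, the following sketch enumerates every pair of haplotypes compatible with one unphased diploid genotype. It assumes biallelic SNPs coded as the number of derived alleles per site (0, 1 or 2); this coding and the function name are illustrative choices, not part of the course material.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs compatible with an unphased genotype.

    genotype: per-SNP count of derived alleles (0, 1 or 2).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are unambiguous: 0 -> allele 0 on both strands, 2 -> allele 1.
        hap1 = [g // 2 for g in genotype]
        hap2 = list(hap1)
        # At each heterozygous site, one strand carries 0 and the other carries 1.
        for site, allele in zip(het_sites, choice):
            hap1[site] = allele
            hap2[site] = 1 - allele
        h1 = "".join(map(str, hap1))
        h2 = "".join(map(str, hap2))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

# Hypothetical genotype: heterozygous at sites 1 and 3, homozygous derived at site 2
pairs = compatible_haplotype_pairs([1, 2, 1, 0])
print(pairs)
```

With h heterozygous sites (h ≥ 1) there are 2^(h − 1) compatible pairs, so the ambiguity grows quickly with sequence length.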


The different genetic markers (5)

• Whole genomes: ideally the best genetic data!
  • Mendelian inheritance
  • high polymorphism
  • co-dominant but difficult to “phase”
  • “neutral” and “selected”

Next-generation sequencing (NGS) techniques are clearly revolutionizing population genetics. Data are no longer limiting, but existing methods cannot deal with such large data sets: they must be adapted and new methods must be developed. See course 5: CoalHMM models.


Modelling of the mutation processes of the different genetic markers

• To use population genetic models with the data obtained from genetic markers, we need to take into account:
  • the mutation processes creating the DNA variants
  • but also the difference between the DNA variants and what we can observe given the molecular techniques used

→ need to define mutation models

Two contrasting examples of mutation processes: the case of microsatellites vs. DNA sequence data


• What is the main cause of mutation at microsatellite loci?
  • microsatellites are sequences of repeated short DNA motifs, e.g. $(\mathrm{CA})_{10}$
  • DNA loops are created during replication

→ (1) the new mutated allele has gained or lost one or more (CA) motifs

→ (2) it is a relatively frequent process, leading to high mutation rates (e.g. $5 \times 10^{-4}$ per generation)


The different mutation models

• Several mutation models have been developed for allelic data:
  • The infinite allele model (IAM) assumes that every new allele created by mutation is different from the existing ones. Identity in state is thus equivalent to identity by descent.
  • The K-allele model (KAM) assumes that there are K possible allelic states and that a mutation is equally likely to lead to any of the K − 1 other states (probability 1/(K − 1) each).
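The difference between the two models can be sketched as two ways of drawing the allele produced by one mutation event. This is only an illustration: the label format, the integer coding of the K states, and the function names are assumptions made for the example.

```python
import itertools
import random

def make_iam_mutator():
    """IAM: return a function that yields a never-seen-before allele label at each call."""
    counter = itertools.count(1)
    return lambda: f"allele_{next(counter)}"

def kam_mutate(state, K, rng):
    """KAM: draw the new state uniformly among the K - 1 states other than `state`."""
    others = [s for s in range(K) if s != state]
    return rng.choice(others)

new_allele = make_iam_mutator()
iam_labels = [new_allele() for _ in range(100)]   # 100 mutations, 100 distinct alleles

rng = random.Random(0)                            # fixed seed, for reproducibility only
kam_states = [kam_mutate(2, 6, rng) for _ in range(1000)]
print(len(set(iam_labels)), sorted(set(kam_states)))
```

Under the IAM two genes can only share an allele if no mutation separates them, whereas under the KAM the same state can be reached again by mutation (homoplasy).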


  • The stepwise mutation model (SMM) was designed to analyze alleles characterized by their electrophoretic mobility. It assumes that each mutation increases or reduces the mobility by one “step”. It applies directly to microsatellite markers by considering the loss or gain of one repeated motif (e.g. (TG)).
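A minimal simulation sketch of the SMM applied to a microsatellite allele, coded by its number of repeats. The lower bound of one repeat is an assumption added here so that the allele size stays positive; it is not part of the model as stated above, and the function names are hypothetical.

```python
import random

def smm_mutate(repeats, u, rng):
    """One generation for one gene copy: with probability u, gain or lose one repeat."""
    if rng.random() < u:
        step = rng.choice((-1, 1))
        # Assumption: an allele keeps at least one repeat.
        return max(1, repeats + step)
    return repeats

def simulate_lineage(start, u, generations, rng):
    """Follow the repeat number along one lineage, generation after generation."""
    sizes = [start]
    for _ in range(generations):
        sizes.append(smm_mutate(sizes[-1], u, rng))
    return sizes

rng = random.Random(1)                        # fixed seed, for reproducibility only
no_mutation = simulate_lineage(10, 0.0, 5, rng)
always_mutating = simulate_lineage(10, 1.0, 5, rng)
print(no_mutation, always_mutating)
```

Because steps in opposite directions can cancel out, two alleles of the same size are not necessarily identical by descent.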


Modelling of the mutation processes of the differentgenetic markers

• The main mutation processes occurring on DNA sequences are:
  - Single nucleotide changes (i.e. A ↔ T ↔ G ↔ C)
  - Insertions / deletions of one or more nucleotides
  - Mutations are rare: $10^{-8}$ to $10^{-12}$ per nucleotide per generation

• Several models exist for nucleotide evolution:
  - The simplest: the infinitely many sites model (ISM)


An infinitely long sequence → each mutation occurs at a different site

Each polymorphic site can have 2 states: ancestral (0) and derived (1)

The haplotypes of a sample can be written as series of 0/1, e.g. 101011011
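With this 0/1 coding, basic summaries of a sample are simple column and pair comparisons. The sketch below counts segregating sites and pairwise differences; the three haplotypes are a made-up sample and the function names are hypothetical.

```python
def segregating_sites(haplotypes):
    """Number of sites where both the ancestral (0) and derived (1) states are observed."""
    return sum(len(set(column)) > 1 for column in zip(*haplotypes))

def pairwise_differences(h1, h2):
    """Number of sites at which two haplotypes differ."""
    return sum(a != b for a, b in zip(h1, h2))

# Hypothetical sample of three haplotypes of equal length
sample = ["101011011", "101001011", "001011010"]
S = segregating_sites(sample)
d12 = pairwise_differences(sample[0], sample[1])
d13 = pairwise_differences(sample[0], sample[2])
d23 = pairwise_differences(sample[1], sample[2])
print(S, d12, d13, d23)
```

Under the ISM no site mutates twice, so the number of differences between two haplotypes equals the number of mutations that occurred on the two lineages since their common ancestor.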

  - Many other models exist, with 0 to 10 parameters, e.g. combining different nucleotide transition rates, specific insertion/deletion rates, time-varying rates, etc. They are usually calibrated using phylogenetic and fossil data.


More generally, mutation processes acting on the genetic markers can be modeled using Markov chains and a transition probability matrix between alleles or haplotypes (mutation matrix U).

For a KAM with K = 6 states, the transition matrix is

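\[
U \equiv (u_{ij}) =
\begin{pmatrix}
1-u & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & 1-u & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & 1-u & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & 1-u & \frac{u}{K-1} & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & 1-u & \frac{u}{K-1} \\
\frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & \frac{u}{K-1} & 1-u
\end{pmatrix}
\]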

With K = 6, each off-diagonal entry is u/(K − 1) = u/5:

\[
U \equiv (u_{ij}) =
\begin{pmatrix}
1-u & u/5 & u/5 & u/5 & u/5 & u/5 \\
u/5 & 1-u & u/5 & u/5 & u/5 & u/5 \\
u/5 & u/5 & 1-u & u/5 & u/5 & u/5 \\
u/5 & u/5 & u/5 & 1-u & u/5 & u/5 \\
u/5 & u/5 & u/5 & u/5 & 1-u & u/5 \\
u/5 & u/5 & u/5 & u/5 & u/5 & 1-u
\end{pmatrix}
\]
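The KAM matrix can be built for any K with a few lines of code. This is a sketch using plain nested lists; the function name and the value u = 0.01 are arbitrary choices for the example.

```python
def kam_matrix(K, u):
    """K x K mutation matrix of the K-allele model with mutation probability u."""
    off_diagonal = u / (K - 1)   # u is shared equally among the K - 1 other states
    return [[1 - u if i == j else off_diagonal for j in range(K)] for i in range(K)]

U = kam_matrix(6, 0.01)
row_sums = [sum(row) for row in U]   # each row is a probability distribution
for row in U:
    print(" ".join(f"{p:.3f}" for p in row))
```

Each row sums to 1 (up to floating-point rounding), as required for a transition probability matrix.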


For a SMM with 6 states, the transition matrix is

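\[
U \equiv (u_{ij}) =
\begin{pmatrix}
1-u & u & 0 & 0 & 0 & 0 \\
u/2 & 1-u & u/2 & 0 & 0 & 0 \\
0 & u/2 & 1-u & u/2 & 0 & 0 \\
0 & 0 & u/2 & 1-u & u/2 & 0 \\
0 & 0 & 0 & u/2 & 1-u & u/2 \\
0 & 0 & 0 & 0 & u & 1-u
\end{pmatrix}
\]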
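The same matrix can be generated for any number of states. The sketch below reproduces the boundary treatment of the matrix above, where the two extreme states mutate with probability u towards their single neighbour; the function name is hypothetical and at least two states are assumed.

```python
def smm_matrix(n_states, u):
    """Mutation matrix of a bounded stepwise mutation model (n_states >= 2)."""
    n = n_states
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        U[i][i] = 1 - u
        if i == 0:
            U[i][1] = u            # smallest allele can only gain one step
        elif i == n - 1:
            U[i][n - 2] = u        # largest allele can only lose one step
        else:
            U[i][i - 1] = u / 2    # lose one step
            U[i][i + 1] = u / 2    # gain one step
    return U

U_smm = smm_matrix(6, 0.01)
smm_row_sums = [sum(row) for row in U_smm]
```

Other boundary conventions are possible; this one simply keeps every row summing to 1.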


These mutation matrices are used throughout population genetics to go from identity by descent (i.e. no mutation since the common ancestor of two genes, as in the IAM and ISM) to identity in state (i.e. the observed allelic type of two genes is the same).
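As a rough numerical illustration of this step, consider two genes whose lineages each span t generations back to their common ancestor. The sketch assumes that mutations occur independently on the two lineages with the same matrix U every generation, and uses a KAM with arbitrary values K = 6, u = 0.01, t = 50; all function names are hypothetical.

```python
def kam_matrix(K, u):
    """K x K mutation matrix of the K-allele model."""
    off_diagonal = u / (K - 1)
    return [[1 - u if i == j else off_diagonal for j in range(K)] for i in range(K)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_pow(U, t):
    """U raised to the power t by repeated multiplication (fine for small t)."""
    n = len(U)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, U)
    return result

def prob_ibd(u, t):
    """No mutation on either lineage during t generations each."""
    return (1 - u) ** (2 * t)

def prob_iis(U, t, ancestral_state=0):
    """Both genes end in the same state, whatever happened along the lineages."""
    Ut = mat_pow(U, t)
    return sum(p * p for p in Ut[ancestral_state])

K, u, t = 6, 0.01, 50
ibd = prob_ibd(u, t)
iis = prob_iis(kam_matrix(K, u), t)
print(ibd, iis)
```

Identity in state is at least as probable as identity by descent, because two genes can also reach the same state through different mutation histories.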


Books

The notions of genetic markers and mutation models are covered in all good population genetics books, and are especially developed from a biological point of view in