Genetic diversity clustering and AMOVA Aladdin Hamwieh


Page 1: Genetic diversityclustering and AMOVA

Genetic diversity clustering and AMOVA

Aladdin Hamwieh

Page 2: Genetic diversityclustering and AMOVA

Sources of variation

• Variation from mutation

Mutation rates are so low that mutation alone cannot account for rapid genetic changes in populations and species.

Page 3: Genetic diversityclustering and AMOVA

Overview of the phenomena that cause genetic change in populations

Nonrandom mating

Gene flow

Page 4: Genetic diversityclustering and AMOVA

Overview of the phenomena that cause genetic change in populations

Page 5: Genetic diversityclustering and AMOVA

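Genetic distance

         Marker1  Marker2  Marker3  Marker4  Marker5  Marker6  Marker7
Plant1   1        0        1        1        0        1        1
Plant2   1        1        1        0        0        1        0

              Plant1 = 1   Plant1 = 0
Plant2 = 1    Fa = 3       Fb = 1
Plant2 = 0    Fc = 2       Fd = 1

N = Fa + Fb + Fc + Fd = 7

Simple match = Fa/N = 3/7 = 0.43 (this counts shared presences only; the usual simple matching coefficient, (Fa + Fd)/N, gives 4/7 = 0.57)

Jaccard similarity = Fa/(Fa + Fb + Fc) = 3/6 = 0.5 (Jaccard distance = 1 - 0.5 = 0.5)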
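The counts in the 2 x 2 table can be checked with a short script. This is a minimal sketch in Python; the function name `contingency` and the variable names are illustrative and do not come from the slides.

```python
def contingency(x, y):
    """Count the four presence/absence combinations for two binary marker profiles."""
    fa = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # band present in both
    fb = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # present only in the second
    fc = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # present only in the first
    fd = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # absent in both
    return fa, fb, fc, fd

plant1 = [1, 0, 1, 1, 0, 1, 1]
plant2 = [1, 1, 1, 0, 0, 1, 0]

fa, fb, fc, fd = contingency(plant1, plant2)
n = fa + fb + fc + fd

shared_presence = fa / n           # Fa/N, the value shown on the slide
simple_matching = (fa + fd) / n    # usual simple matching coefficient
jaccard = fa / (fa + fb + fc)      # Jaccard similarity (ignores shared absences)

print(fa, fb, fc, fd)
print(round(shared_presence, 2), round(simple_matching, 2), jaccard)
```

Jaccard ignores the shared absences (Fd), which is often preferred for dominant markers where a shared absence is weak evidence of similarity.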

Page 7: Genetic diversityclustering and AMOVA

Similarity coefficients for binary variables

Page 8: Genetic diversityclustering and AMOVA

Dissimilarity indices – Continuous

Euclidean distance

Euclidean distance is the most commonly used distance measure. When people speak of "distance" without qualification, they usually mean Euclidean distance. It is the square root of the sum of squared differences between the coordinates of a pair of objects.

Page 9: Genetic diversityclustering and AMOVA

Example: Point A has coordinates (0, 3, 4, 5) and point B has coordinates (7, 6, 3, -1).

Features   cost   time   weight   incentive
Plant A    0      3      4        5
Plant B    7      6      3        -1

The Euclidean distance between points A and B is

d(A, B) = sqrt((0 - 7)^2 + (3 - 6)^2 + (4 - 3)^2 + (5 - (-1))^2) = sqrt(95) = 9.747
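The Euclidean distance for the Plant A / Plant B example can be reproduced as follows. This is a small sketch; the function and variable names are illustrative.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

plant_a = [0, 3, 4, 5]    # cost, time, weight, incentive
plant_b = [7, 6, 3, -1]

d_euclid = euclidean(plant_a, plant_b)
print(round(d_euclid, 3))  # sqrt(95), about 9.747
```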

Page 10: Genetic diversityclustering and AMOVA

Manhattan (City-Block)

It is also known as Manhattan distance, boxcar distance, or absolute value distance. It sums the absolute differences between the coordinates of a pair of objects.

Features   cost   time   weight   incentive
Plant A    0      3      4        5
Plant B    7      6      3        -1
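For the same two plants, the city-block distance is the sum of the absolute coordinate differences. A minimal sketch (names are illustrative):

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

plant_a = [0, 3, 4, 5]    # cost, time, weight, incentive
plant_b = [7, 6, 3, -1]

d_manhattan = manhattan(plant_a, plant_b)
print(d_manhattan)  # 7 + 3 + 1 + 6 = 17
```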

Page 11: Genetic diversityclustering and AMOVA

Measures of Marker Informativeness

• Allelic Diversity

• Polymorphism Information Content (PIC)

• Example

Var   M1        M2
P1    150/150   130/134
P2    150/152   132/134
P3    152/152   130/132

Marker   Major allele frequency   Allele no.   Gene diversity   PIC
M1       0.5000                   2.0000       0.5000           0.3750
M2       0.3333                   3.0000       0.6667           0.5926
Mean     0.4167                   2.5000       0.5833           0.4838

Major allele frequency (M1) = 3/6 = 0.5

Gene diversity (M1) = 1 - (sum of squared allele frequencies) = 1 - 0.5 = 0.5

PIC (M1) = 0.5 - (0.5 x (0.5/2)) = 0.375
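The values in the example table can be reproduced with the commonly used definitions: gene diversity = 1 - sum(p_i^2), and PIC = gene diversity - sum over allele pairs of 2 p_i^2 p_j^2. The sketch below assumes those definitions and genotypes written as "allele/allele" strings; the function names are illustrative.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes written as 'a/b'."""
    alleles = [allele for g in genotypes for allele in g.split("/")]
    counts = Counter(alleles)
    return {allele: c / len(alleles) for allele, c in counts.items()}

def gene_diversity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1 - sum(p ** 2 for p in freqs.values())

def pic(freqs):
    """Polymorphism information content."""
    p = list(freqs.values())
    cross = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return gene_diversity(freqs) - cross

m1 = allele_frequencies(["150/150", "150/152", "152/152"])
m2 = allele_frequencies(["130/134", "132/134", "130/132"])

for name, freqs in (("M1", m1), ("M2", m2)):
    print(name, round(max(freqs.values()), 4), len(freqs),
          round(gene_diversity(freqs), 4), round(pic(freqs), 4))
```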

Page 12: Genetic diversityclustering and AMOVA

Analysis of molecular variance AMOVA

Page 13: Genetic diversityclustering and AMOVA

Analysis of molecular variance (AMOVA)

• AMOVA is a method for estimating molecular variation within a species

• It is not ANOVA: it does not require the assumption of a normal distribution (permutational methods)

• The hierarchy may contain regions within a continent

• Individuals within populations, in areas, in regions, in a continent

Page 14: Genetic diversityclustering and AMOVA

Analysis of molecular variance (AMOVA)

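[Diagram: three populations, Pop1, Pop2 and Pop3, each with its own within-population diversity HS; DST is the diversity between populations]

DST = HT - HS

GST = DST/HT

GST measures the proportion of gene diversity that is distributed among populations.

HT = 0.263, HS = 0.202

DST = 0.061, GST = 0.2319, which means that 23% of the variation is among the populations.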
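The numbers on this slide are consistent with DST = HT - HS and GST = DST/HT (0.061/0.263 = 0.2319). A short check, with an illustrative function name:

```python
def gst(h_t, h_s):
    """Return (DST, GST) from total (HT) and mean within-population (HS) gene diversity."""
    d_st = h_t - h_s          # diversity among populations
    return d_st, d_st / h_t   # GST is the among-population share of HT

d_st, g_st = gst(0.263, 0.202)
print(round(d_st, 3), round(g_st, 4))  # 0.061 0.2319
```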

Page 15: Genetic diversityclustering and AMOVA

Analysis of molecular variance (AMOVA)

Example 1

This model, also called AMOVA, measures gene diversity among populations with specific reference to areas of a region in a continent.

We have: i = individuals, j = alleles, k = populations

Page 16: Genetic diversityclustering and AMOVA

Analysis of molecular variance (AMOVA)

Page 17: Genetic diversityclustering and AMOVA

Pop1      Pop1      Pop1
A1  A2    A1  A2    A1  A2
0   0     0   1     1   1
1   1     0   1     1   1
0   0     1   1     0   1
1   0     1   0     1   1
0   0     0   1     0   1
0   0     0   1     0   0
1   1     1   1     1   1
0   0     1   1     0   0
1   0     1   1     1   0
1   1     1   0     0   1
1   0     0   1     1   1
0   0     1   1     1   0
1   1     1   1     0   1
1   1     1   0     1   0
1   1     0   1     1   0
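How an AMOVA partitions variation can be sketched for the simplest case, one level of populations. The sketch below is not taken from the slides: it assumes squared Euclidean distances between binary marker profiles and the usual sums-of-squares partition into among-population and within-population components. The data and all names are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two marker profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def ssd(group):
    """Sum of squared deviations: pairwise squared distances divided by group size."""
    n = len(group)
    pairs = sum(sq_dist(group[i], group[j])
                for i in range(n) for j in range(i + 1, n))
    return pairs / n

def amova_one_level(populations):
    """Partition variation among and within populations; returns variance components."""
    everyone = [ind for pop in populations for ind in pop]
    n_total, n_pops = len(everyone), len(populations)
    ssd_within = sum(ssd(pop) for pop in populations)
    ssd_among = ssd(everyone) - ssd_within
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # n0 is the size-weighted average sample size per population
    n0 = (n_total - sum(len(p) ** 2 for p in populations) / n_total) / (n_pops - 1)
    var_within = msd_within
    var_among = (msd_among - msd_within) / n0
    phi_st = var_among / (var_among + var_within)
    return {"var_among": var_among, "var_within": var_within, "phi_st": phi_st}

# Hypothetical data: two populations, three individuals each, two binary markers.
pops = [
    [(0, 0), (0, 1), (0, 0)],
    [(1, 1), (1, 1), (1, 0)],
]
result = amova_one_level(pops)
print(result)
```

In practice the significance of the variance components is tested by permuting individuals among populations, which is why no normality assumption is needed.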

Page 18: Genetic diversityclustering and AMOVA

2011

Clustering

Page 19: Genetic diversityclustering and AMOVA

Phylogenetic Trees and Dissimilarity estimation

Page 20: Genetic diversityclustering and AMOVA

Historical Note

• Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria)

• Since then, the focus has been on objective criteria for constructing phylogenetic trees
  – Thousands of articles in the last decades

• Important for many aspects of biology
  – Classification
  – Understanding biological mechanisms

Page 21: Genetic diversityclustering and AMOVA

Morphological vs. Molecular

• Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.

• Modern biological methods allow the use of molecular features
  – Gene sequences
  – Protein sequences
  – DNA markers

Page 22: Genetic diversityclustering and AMOVA


Rat QEPGGLVVPPTDA

Rabbit QEPGGMVVPPTDA

Gorilla QEPGGLVVPPTDA

Cat REPGGLVVPPTEG

From sequences to a phylogenetic tree

There are many possible types of sequences to use (e.g. Mitochondrial vs Nuclear proteins).

Page 23: Genetic diversityclustering and AMOVA

Basic Assumptions

More closely related organisms have more similar genomes.

Highly similar genes are homologous (have the same ancestor).

Phylogenetic relationships can be expressed by a dendrogram (a "tree").

Aardvark Bison Chimp Dog Elephant

Page 24: Genetic diversityclustering and AMOVA


Species Phylogeny

Gene Phylogenies

Speciation events

Gene Duplication

1A 2A 3A 3B 2B 1B

Phylogenies can be constructed to describe the evolution of genes.

Three species, termed 1, 2, and 3. Two paralogous genes, A and B.

Page 25: Genetic diversityclustering and AMOVA

Types of Trees

A natural model to consider is that of rooted trees.

[Figure: rooted tree with the common ancestor at the root]

Page 26: Genetic diversityclustering and AMOVA

Types of Trees

An unrooted tree represents the same phylogeny without the root node.

Depending on the model, data from current-day species do not distinguish between different placements of the root.

Page 28: Genetic diversityclustering and AMOVA

Human, Chimp, Gorilla, Orangutan, and Gibbon

Page 29: Genetic diversityclustering and AMOVA

UPGMA

Step 1: Generate data (sequence, genotype, or morphological) for each OTU.

Taxa    1  2  3  4  5  6  7
OTU-1   T  G  C  G  T  A  T
OTU-2   T  G  G  G  T  A  T
OTU-3   T  G  C  G  C  T  T
OTU-4   T  G  C  T  G  T  G
OTU-5   T  A  G  T  A  G  C

Page 30: Genetic diversityclustering and AMOVA

Step 2: Calculate the p-distance for all pairs of taxa.

Distance can be calculated using different substitution models:
1. Number of nucleotide differences
2. p-distance
3. JC distance
4. K2P distance
5. F81
6. HKY85
7. GTR, etc.

Taxa    1  2  3  4  5  6  7
OTU-1   T  G  C  G  T  A  T
OTU-2   T  G  G  G  T  A  T

p-distance (OTU-1, OTU-2) = differing sites / total sites = 1/7 = 0.142857143
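The p-distance is the proportion of sites at which two aligned sequences differ. A minimal sketch using the five OTU sequences from Step 1 (function and variable names are illustrative):

```python
from itertools import combinations

def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)

otus = {
    "OTU-1": "TGCGTAT",
    "OTU-2": "TGGGTAT",
    "OTU-3": "TGCGCTT",
    "OTU-4": "TGCTGTG",
    "OTU-5": "TAGTAGC",
}

# Pairwise p-distances for all ten OTU pairs
p_matrix = {(a, b): p_distance(otus[a], otus[b]) for a, b in combinations(otus, 2)}
for pair, d in p_matrix.items():
    print(pair, round(d, 4))
```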

Page 31: Genetic diversityclustering and AMOVA

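Step 3: Calculate the distance matrix for all pairs of taxa and select the pair of taxa with minimum distance as a new OTU. (Upper triangle: number of differences; lower triangle: p-distance.)

Taxa   1        2        3        4        5
1      0        1        2        4        6
2      0.1428   0        3        5        5
3      0.2857   0.4285   0        3        6
4      0.5714   0.7142   0.4285   0        5
5      0.8571   0.7142   0.8571   0.7142   0

[Tree: OTU-1 and OTU-2 are joined; each branch has length 0.1428/2 = 0.0714]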

Page 32: Genetic diversityclustering and AMOVA

Step 4: Recalculate the distance matrix, treating OTU-1 and OTU-2 as one OTU. The distance from the merged OTU to each remaining OTU is the average of the original distances, for example:

d(1+2, 3) = (0.2857 + 0.4285)/2 = 0.3571

taxa   1+2       3        4        5
1+2    0
3      0.35714   0
4      0.64285   0.4285   0
5      0.78571   0.8571   0.7142   0

Page 33: Genetic diversityclustering and AMOVA

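Step 5: Select the pair of taxa with minimum distance as a new OTU.

[Tree: OTU-3 joins the (OTU-1, OTU-2) cluster]
OTU-1   0.071
OTU-2   0.071
OTU-3   0.179
Internal branch from the (OTU-1, OTU-2) node to the new node: 0.107

0.107 + 0.071 + 0.179 = 0.357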

Page 34: Genetic diversityclustering and AMOVA

Step 6: Again select the pair of OTUs with minimum distance as a new OTU and recalculate the distance matrix. Each new distance is the average over all original OTU pairs, for example:

d((1+2)3, 4) = (0.5714 + 0.7142 + 0.4285)/3 = 0.5714

taxa     (1+2)3   4        5
(1+2)3   0
4        0.5714   0
5        0.8095   0.7142   0

Page 35: Genetic diversityclustering and AMOVA

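Step 7: Again select the pair of taxa with minimum distance as a new OTU.

[Tree: OTU-4 joins the ((OTU-1, OTU-2), OTU-3) cluster]
OTU-1   0.071
OTU-2   0.071
OTU-3   0.179   (internal branch: 0.107)
OTU-4   0.286   (internal branch: 0.107)

0.107 + 0.107 + 0.071 + 0.286 = 0.571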

Page 36: Genetic diversityclustering and AMOVA

Step 8: Again select the pair of OTUs with minimum distance as a new OTU and recalculate the distance matrix.

d(((1+2)3)4, 5) = (0.8571 + 0.7142 + 0.8571 + 0.7142)/4 = 0.7857

taxa        ((1+2)3)4   5
((1+2)3)4   0
5           0.7857      0

Page 37: Genetic diversityclustering and AMOVA

Step 9: Again select the pair of OTUs with minimum distance as a new OTU and make the final rooted tree.

[Tree: final rooted UPGMA tree]
OTU-1   0.071
OTU-2   0.071
OTU-3   0.179   (internal branch: 0.107)
OTU-4   0.286   (internal branch: 0.107)
OTU-5   0.393   (internal branch: 0.107)

0.393 + 0.107 + 0.107 + 0.107 + 0.071 = 0.785
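Steps 3 to 9 can be sketched as a short loop: repeatedly merge the two closest clusters, taking the distance between clusters as the average over all pairs of their original OTUs. This is a minimal illustration, not the implementation of any particular package; the function name and data layout are assumptions, and ties are broken arbitrarily.

```python
from itertools import combinations

def upgma(names, dist):
    """Return the merges as (cluster_a, cluster_b, distance, node_height).

    dist maps frozenset({a, b}) to the distance between original OTUs a and b.
    """
    clusters = [(name,) for name in names]   # each cluster is a tuple of OTU names
    merges = []

    def cluster_distance(c1, c2):
        # Average distance over all pairs of original OTUs
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        d = cluster_distance(c1, c2)
        merges.append((c1, c2, d, d / 2))     # node height is half the distance
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Number of differing sites (out of 7) between the five OTUs, from Step 3
differences = {(1, 2): 1, (1, 3): 2, (1, 4): 4, (1, 5): 6, (2, 3): 3,
               (2, 4): 5, (2, 5): 5, (3, 4): 3, (3, 5): 6, (4, 5): 5}
dist = {frozenset(pair): count / 7 for pair, count in differences.items()}

merges = upgma([1, 2, 3, 4, 5], dist)
for c1, c2, d, height in merges:
    print(c1, c2, round(d, 4), round(height, 3))
```

The node heights (0.071, 0.179, 0.286, 0.393) match the branch lengths drawn for OTU-1/OTU-2, OTU-3, OTU-4 and OTU-5.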

Page 38: Genetic diversityclustering and AMOVA

Jukes-Cantor distance

The model assumes that the rate of nucleotide substitution is the same for all pairs of the four nucleotides A, T, C, and G.

AA  AC  AG  AT
CA  CC  CG  CT
GA  GC  GG  GT
TA  TC  TG  TT

Of these 16 equally likely pairs, 4 are matches, so two random sequences are 25% similar (= distance of 0.75). A difference of 75% is what you expect with random assignment of nucleotides to a pair of taxa.

Page 39: Genetic diversityclustering and AMOVA

The UPGMA method assumes a constant rate along the branch lengths of the phylogenetic tree.

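Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p), where p is the p-distance.

For p = 0.1594: d = -(3/4)*LN(1-((4/3)*0.1594)) = 0.179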
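The spreadsheet formula above translates directly into code. A minimal sketch (the function name is illustrative); the correction is undefined once p reaches 0.75, the value expected for random sequences.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance from an observed p-distance."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - (4 / 3) * p)

d_jc = jukes_cantor(0.1594)
print(round(d_jc, 3))  # 0.179
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site.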

Page 43: Genetic diversityclustering and AMOVA

Method: Neighbor-joining

                A    B       C       D       E       r
A (human)       -    0.015   0.045   0.143   0.198   0.4010
B (chimp)            -       0.03    0.126   0.179   0.3500
C (gorilla)                  -       0.092   0.179   0.3460
D (orangutan)                        -       0.179   0.5400
E (gibbon)                                   -       0.7350

Page 44: Genetic diversityclustering and AMOVA

Neighbor-joining method

Upper triangle: distances dij. Lower triangle: Mij.

                A         B         C         D         E
A (human)       -         0.0150    0.0450    0.1430    0.1980
B (chimp)       -0.3605   -         0.0300    0.1260    0.1790
C (gorilla)     -0.3285   -0.3180   -         0.0920    0.1790
D (orangutan)   -0.3275   -0.3190   -0.3510   -         0.1790
E (gibbon)      -0.3700   -0.3635   -0.3615   -0.4585   -

M(A, B) = 0.015 - (0.4010 + 0.3500)/2 = -0.3605
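The Mij values can be recomputed from the distance matrix. The sketch below follows the slide in dividing (ri + rj) by 2; many descriptions of neighbor-joining divide by N - 2 instead (3 for five taxa), so the divisor is left as a parameter. All names are illustrative.

```python
taxa = ["A", "B", "C", "D", "E"]   # human, chimp, gorilla, orangutan, gibbon
d = {("A", "B"): 0.015, ("A", "C"): 0.045, ("A", "D"): 0.143, ("A", "E"): 0.198,
     ("B", "C"): 0.030, ("B", "D"): 0.126, ("B", "E"): 0.179,
     ("C", "D"): 0.092, ("C", "E"): 0.179, ("D", "E"): 0.179}

def dist(i, j):
    """Symmetric lookup in the upper-triangle distance table."""
    if i == j:
        return 0.0
    return d[(i, j)] if (i, j) in d else d[(j, i)]

def net_divergence(names):
    """r_i: sum of distances from taxon i to all other taxa."""
    return {i: sum(dist(i, j) for j in names) for i in names}

def m_matrix(names, divisor):
    """M_ij = d_ij - (r_i + r_j) / divisor for every pair of taxa."""
    r = net_divergence(names)
    return {(i, j): dist(i, j) - (r[i] + r[j]) / divisor
            for idx, i in enumerate(names) for j in names[idx + 1:]}

r = net_divergence(taxa)
m = m_matrix(taxa, divisor=2)              # divisor used on the slide
closest = min(m, key=m.get)                # pair with the smallest M_ij
print({k: round(v, 4) for k, v in r.items()})
print(round(m[("A", "B")], 4), closest)
```

With either divisor (2 or N - 2 = 3), the smallest value in this matrix is for the pair D and E.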

Page 45: Genetic diversityclustering and AMOVA

Example:

                A    B       C       D       E       r        r/3
A (human)       -    0.015   0.045   0.143   0.198   0.4010   0.1337
B (chimp)            -       0.03    0.126   0.179   0.3500   0.1167
C (gorilla)                  -       0.092   0.179   0.3460   0.1153
D (orangutan)                        -       0.179   0.5400   0.1800
E (gibbon)                                   -       0.7350   0.2450

Branch length from D to the new node = 0.179/2 + (0.18 - 0.245)/2

Branch length from E to the new node = 0.179 - 0.0572
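The two branch-length expressions for D and E can be evaluated as below. This sketch assumes the usual neighbor-joining branch-length formula, in which the net divergences are divided by N - 2 (the r/3 column for five taxa); the function name is illustrative.

```python
def nj_branch_lengths(d_ij, r_i, r_j, n_taxa):
    """Branch lengths from taxa i and j to the node that joins them."""
    length_i = d_ij / 2 + (r_i / (n_taxa - 2) - r_j / (n_taxa - 2)) / 2
    length_j = d_ij - length_i
    return length_i, length_j

# D (orangutan) and E (gibbon): d = 0.179, r_D = 0.540, r_E = 0.735, N = 5
branch_d, branch_e = nj_branch_lengths(0.179, 0.540, 0.735, 5)
print(round(branch_d, 3), round(branch_e, 3))  # 0.057 0.122
```

Unlike UPGMA, the two branches need not be equal, so neighbor-joining does not assume a constant rate along all lineages.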

Page 46: Genetic diversityclustering and AMOVA

Human and chimpanzee have the smallest value of Mij and they are replaced by node 2.

Page 48: Genetic diversityclustering and AMOVA

dij, Mij

Page 49: Genetic diversityclustering and AMOVA

• PHYLIP (Phylogeny Inference Package)

• UPGMA

• Neighbor-joining (NJ)

[Figure: tree joining A, B, C, D and E through internal nodes 1, 2 and 3. Branch lengths: a = 0.016, b = -0.001, c = 0.006, d = 0.057, e = 0.122, 1' = 0.0403, 2' = 0.024]

Page 53: Genetic diversityclustering and AMOVA

Hamwieh, A., Udupa, S., Sarker, A., Jung, C. and Baum, M. (2009). Development of new microsatellite markers and their application in the analysis of genetic diversity in lentils. Breeding Science 59: 77-86.

Project 2: Genetic diversity in lentils

Page 54: Genetic diversityclustering and AMOVA

300 accessions

2915 accessions

Chickpea Reference Set (GCP)

Upadhyaya HD, Dwivedi SL, Baum M, Varshney RK, Udupa SM, Gowda CLL, Hoisington D and Singh S (2008) Genetic structure, diversity, and allelic richness in composite collection and reference set in chickpea (Cicer arietinum L.). BMC Plant Biology 8: 106.

Page 56: Genetic diversityclustering and AMOVA

Thank you