Genomic simulation of complex traits using AlphaDrop Gorjanc G. & Hickey J. M. COST RGB-Net Rodica – Domžale, Slovenia 15th October 2012


Page 1: Genomic simulation of complex traits using AlphaDrop

Genomic simulation of complex traits using AlphaDrop

Gorjanc G. & Hickey J. M.

COST RGB-Net

Rodica – Domžale, Slovenia, 15th October 2012

Page 2: Genomic simulation of complex traits using AlphaDrop

Introduction

• Whole-genome technologies → rich data
• In complex traits (e.g., body height, weight, …) gene discovery is still very limited
• Rich genome-wide data can be used for prediction (classically based on phenotype and pedigree data)
• AIM: Show the simulation of different types of information for prediction in complex traits

Page 3: Genomic simulation of complex traits using AlphaDrop

Different sources of information (simplistic scheme)

[Figure: double-stranded haplotypes over positions k-1 … k+6, e.g. … C C T A G A … / … G G A T C T …, … C T C A G A … / … G A G T C T …, and … C T C A T A … / … G A G T A T …; one QTL and two SNP are marked; labels: Haplotypes, Phenotype (+0 cm, +1 cm, +2 cm)]

Page 4: Genomic simulation of complex traits using AlphaDrop

SNP

[Figure: SNP genotype classes AA, AB, and BB]

Page 5: Genomic simulation of complex traits using AlphaDrop

SNP data – long format

[Header]
BSGT Version    3.3.7
Content         BovineSNP50_A.bpm
Num SNPs        54001
Total SNPs      54001
Num Samples     191
Total Samples   191
[Data]
SNP Name            Sample ID  Allele1  Allele2
ARS-BFGL-BAC-10172  SLO110973  B        B
ARS-BFGL-BAC-1020   SLO110973  -        -
ARS-BFGL-BAC-10245  SLO110973  -        -
ARS-BFGL-BAC-10345  SLO110973  A        B
ARS-BFGL-BAC-10365  SLO110973  A        A
ARS-BFGL-BAC-10375  SLO110973  B        A
…

AB format!!!

Page 6: Genomic simulation of complex traits using AlphaDrop

SNP data – wide format

SLO110973 2 - - 1 0 1 …

SLO110974 0 0 - 1 0 1 …

SLO110975 1 2 - 1 0 1 …

0, 1, 2 format!!!
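The step from the long AB report to the wide 0, 1, 2 layout can be sketched as below. This is a minimal illustration, assuming whitespace-separated records, a count of B alleles as the code (AA → 0, AB → 1, BB → 2), and "-" kept for missing calls. The function name `long_to_wide` is hypothetical and not part of any genotyping software.

```python
def long_to_wide(records):
    """Turn (snp, sample, allele1, allele2) records into one 0/1/2 row per sample."""
    snps = {}   # dict used as an ordered set of SNP names
    wide = {}   # sample -> {snp: code}
    for snp, sample, a1, a2 in records:
        snps[snp] = None
        if {a1, a2} <= {"A", "B"}:
            code = str((a1 == "B") + (a2 == "B"))   # number of B alleles
        else:
            code = "-"                               # missing call
        wide.setdefault(sample, {})[snp] = code
    order = list(snps)
    rows = {s: [calls.get(snp, "-") for snp in order] for s, calls in wide.items()}
    return order, rows


# The six records shown on the long-format slide
text = """ARS-BFGL-BAC-10172 SLO110973 B B
ARS-BFGL-BAC-1020 SLO110973 - -
ARS-BFGL-BAC-10245 SLO110973 - -
ARS-BFGL-BAC-10345 SLO110973 A B
ARS-BFGL-BAC-10365 SLO110973 A A
ARS-BFGL-BAC-10375 SLO110973 B A"""
records = [line.split() for line in text.splitlines()]
snp_order, wide_rows = long_to_wide(records)
print("SLO110973", *wide_rows["SLO110973"])   # SLO110973 2 - - 1 0 1
```

The printed row matches the first line of the wide-format slide.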

Page 7: Genomic simulation of complex traits using AlphaDrop

Missing genotypes in reality?

„Computer eye“

Manual correction?

Page 8: Genomic simulation of complex traits using AlphaDrop

Methods - Idea

1. Simulate individuals' genomes
2. Allocate QTLs and SNPs to positions in genome
3. Sample QTL effects and compute (total) additive genetic value (AGV = sum of all QTL effects)
4. Sample phenotypic value based on additive genetic value and heritability ($h^2$)
5. Statistical analysis of co-variation between phenotypic and genotypic data
6. Correlate estimated AGV with true AGV
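Step 4 can be sketched as follows. This is an illustration only, assuming phenotype = AGV + Gaussian residual, with the residual variance chosen so that the ratio of AGV variance to phenotypic variance equals the target heritability. The heritability of 0.25, the sample size, and the function name `sample_phenotypes` are hypothetical.

```python
import random
import statistics


def sample_phenotypes(agv, h2, rng):
    """Add Gaussian residuals so that var(AGV) / var(phenotype) is about h2."""
    var_a = statistics.pvariance(agv)
    var_e = var_a * (1.0 - h2) / h2      # from h2 = var_a / (var_a + var_e)
    sd_e = var_e ** 0.5
    return [a + rng.gauss(0.0, sd_e) for a in agv]


rng = random.Random(1)
agv = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # stand-in for true AGVs
y = sample_phenotypes(agv, h2=0.25, rng=rng)

realised_h2 = statistics.pvariance(agv) / statistics.pvariance(y)
print(round(realised_h2, 2))
```

The realised ratio fluctuates around the target because the residuals are sampled.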

Page 9: Genomic simulation of complex traits using AlphaDrop

1. Simulate individuals' genome?

• A hard task!
• What is „going on under the hood“?
– „Mutation“ → SNP, MS, …
– Insertions
– Deletions
– Duplications → copy number variation
– Translocations
– Methylation → epigenetics
– …
– Linkage + Recombination
– Segregation
– …

[Side labels on the slide: Structural variation, MAJOR, „MINOR“]

Page 10: Genomic simulation of complex traits using AlphaDrop

Whole-genome sequence haplotypes (human trios) – Chr1

Page 11: Genomic simulation of complex traits using AlphaDrop
Page 12: Genomic simulation of complex traits using AlphaDrop

A compromise

• DNA is a double helix (linear structure)
→ simulate two „strings“ (haplotypes)
• Linkage + Recombination
→ simulate „chunkular strings“ (structured haplotypes manifesting linkage disequilibrium, LD); there are many programs to do this (e.g., ms, MaCS, Freegene, …)
→ sample chunk and position within the chunk where crossing over occurs (flip haplotypes between gametes)

… B B A B A A …

… A B A A B A …

… B B A A B A …

… A B A B A A …
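The crossing over shown by the four haplotypes above can be sketched as a tail swap. This is a minimal illustration, assuming a single crossover at a uniformly sampled breakpoint. Real simulators typically sample the number and positions of crossovers from the genetic map, and the function names here are hypothetical.

```python
import random


def cross_over(hap1, hap2, point):
    """Swap the tails of two haplotypes after the first `point` loci."""
    return hap1[:point] + hap2[point:], hap2[:point] + hap1[point:]


def make_gametes(hap1, hap2, rng):
    """Sample one breakpoint between loci and return both recombinant strands."""
    point = rng.randrange(1, len(hap1))
    return cross_over(hap1, hap2, point)


# The slide's example corresponds to a breakpoint after the third locus
g1, g2 = cross_over("BBABAA", "ABAABA", 3)
print(g1, g2)   # BBAABA ABABAA

r1, r2 = make_gametes("BBABAA", "ABAABA", random.Random(7))
```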

Page 13: Genomic simulation of complex traits using AlphaDrop

A compromise II

• Segregation
→ base: take gametes at random from a „soup“
→ non-base: take one gamete at random from each parent

… B B A A B A …

… A B A B A A …

… B B B B B B …

… A B A B A A …

… B B B B B B …

… A B A B A A …
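Dropping haplotypes through a pedigree can be sketched by combining the two rules above. This is an illustration under stated assumptions: a tiny hypothetical pedigree, a „soup“ made of the three haplotypes shown on the slide, and one crossover per gamete. The names `gamete` and `drop` are not taken from AlphaDrop.

```python
import random


def gamete(parent_haps, rng):
    """One gamete: a single crossover between the parent's two haplotypes,
    then one of the two recombinant strands picked at random."""
    h1, h2 = parent_haps
    point = rng.randrange(1, len(h1))
    strands = (h1[:point] + h2[point:], h2[:point] + h1[point:])
    return rng.choice(strands)


def drop(pedigree, soup, rng):
    """Assign two haplotypes to every individual; parents must precede progeny."""
    haps = {}
    for ind, dad, mum in pedigree:
        if dad is None:   # base individual: both gametes come from the soup
            haps[ind] = (rng.choice(soup), rng.choice(soup))
        else:             # non-base individual: one gamete from each parent
            haps[ind] = (gamete(haps[dad], rng), gamete(haps[mum], rng))
    return haps


soup = ["BBAABA", "ABABAA", "BBBBBB"]
pedigree = [(1, None, None), (2, None, None), (3, 1, 2), (4, 1, 2)]
haps = drop(pedigree, soup, random.Random(3))
for ind, pair in haps.items():
    print(ind, *pair)
```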

Page 14: Genomic simulation of complex traits using AlphaDrop

2. Allocate genes (QTL) and markers

• Sample position within the genome separately for QTL and markers
• How many?

… B B A A B A …

… A B A B A A …

… B B B B B B …

… A B A B A A …

… B B B B B B …

… A B A B A A …

… B B A B A A …

… A B A A B A …

Page 15: Genomic simulation of complex traits using AlphaDrop

Ascertainment bias

• SNP chips are designed from few animals (often not equally distributed among breeds, countries, …) and such that markers are polymorphic
• Variation on SNP chip does not necessarily represent the variation of QTL closely!!!
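One way to mimic this in a simulation is to pick chip markers from a small discovery panel only. The sketch below is an illustration, not a description of how any real chip was designed: the haplotypes, the panel size, and the minor allele frequency threshold are all hypothetical.

```python
def allele_freq(haps, site):
    """Frequency of allele B at one site."""
    return sum(h[site] == "B" for h in haps) / len(haps)


def ascertain(discovery_haps, min_maf):
    """Sites whose minor allele frequency in the discovery panel is >= min_maf."""
    chosen = []
    for site in range(len(discovery_haps[0])):
        f = allele_freq(discovery_haps, site)
        if min(f, 1.0 - f) >= min_maf:
            chosen.append(site)
    return chosen


population = ["ABAAB", "ABABB", "AAAAB", "ABAAA",
              "AABAB", "ABAAB", "AAAAB", "ABAAB"]
discovery = population[:3]            # chip designed from few haplotypes
chip = ascertain(discovery, min_maf=0.3)
print(chip)                           # [1, 3]
print(allele_freq(population, 2))     # 0.125
```

Sites 2 and 4 vary in the whole population but are fixed in the discovery panel, so they never reach the chip. A QTL at such a site would be poorly represented by the chosen markers.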

Page 16: Genomic simulation of complex traits using AlphaDrop

For example

B + B B A A + B B A B B +

B + A A B A + A A A A B +

A - B A A A + A A B B A -

B + A A B B - A A A B B +

A - B B A B - A A B A A -

A - A B A B - B B A A B +

B + B B A A + B B A B B +

B + A A A A + A A B B A -

B + A A B A + A A A B B +

A - B A A B - A A A B B +

A - B B A B - A A B A A -

A - B A A A + A A B B A -

A - B B B B - A A B A A +

B + A A B B + A A B B A -

[Figure: pedigree diagram of individuals 1 to 7]

Page 17: Genomic simulation of complex traits using AlphaDrop

For example II

ID DAD MUM SNP1 QTL1 SNP2 SNP3 SNP4 SNP5 QTL2 SNP6 SNP7 SNP8 SNP9 SNP10 QTL3    a     y
1  /   /   B    +    B    B    A    A    +    B    B    A    B    B     +     2,2 172,0
           B    +    A    A    B    A    +    A    A    A    A    B     +
2  /   /   A    -    B    A    A    A    +    A    A    B    B    A     -    -0,9 167,5
           B    +    A    A    B    B    -    A    A    A    B    B     +
3  /   /   A    -    B    B    A    B    -    A    A    B    A    A     -    -3,9 168,6
           A    -    A    B    A    B    -    B    B    A    A    B     +
4  2   1   B    +    B    B    A    A    +    B    B    A    B    B     +     2,1 169,1
           B    +    A    A    A    A    +    A    A    B    B    A     -
5  2   1   B    +    A    A    B    A    +    A    A    A    B    B     +    -0,8 170,3
           A    -    B    A    A    B    -    A    A    A    B    B     +
6  2   3   A    -    B    B    A    B    -    A    A    B    A    A     -    -2,0 165,9
           A    -    B    A    A    A    +    A    A    B    B    A     -
7  2   3   A    -    B    B    A    B    -    A    A    B    A    B     +    -0,9 168,1
           B    +    A    A    B    B    +    A    A    B    B    A     -

ID DAD MUM SNP1 QTL1 SNP2 SNP3 SNP4 SNP5 QTL2 SNP6 SNP7 SNP8 SNP9 SNP10 QTL3    a     y
1  /   /   BB   /    AB   AB   AB   AA   /    AB   AB   AA   AB   BB    /       / 172,0
2  /   /   AB   /    AB   AA   AB   AB   /    AA   AA   AB   BB   AB    /       / 167,5
3  /   /   AA   /    AB   BB   AA   BB   /    AB   AB   AB   AA   AB    /       / 168,6
4  2   1   BB   /    AB   AB   AA   AA   /    AB   AB   AB   BB   AB    /       / 169,1
5  2   1   AB   /    AB   AA   AB   AB   /    AA   AA   AA   BB   BB    /       / 170,3
6  2   3   AA   /    BB   AB   AA   AB   /    AA   AA   BB   AB   AA    /       / 165,9
7  2   3   AB   /    AB   AB   AB   BB   /    AA   AA   BB   AB   AB    /       / 168,1

Page 18: Genomic simulation of complex traits using AlphaDrop

For example III

AA → 0, AB → 1, BB → 2

ID DAD MUM SNP1 QTL1 SNP2 SNP3 SNP4 SNP5 QTL2 SNP6 SNP7 SNP8 SNP9 SNP10 QTL3    a     y
1  /   /   2    /    1    1    1    0    /    1    1    0    1    2     /       / 172,0
2  /   /   1    /    1    0    1    1    /    0    0    1    2    1     /       / 167,5
3  /   /   0    /    1    2    0    2    /    1    1    1    0    1     /       / 168,6
4  2   1   2    /    1    1    0    0    /    1    1    1    2    1     /       / 169,1
5  2   1   1    /    1    0    1    1    /    0    0    0    2    2     /       / 170,3
6  2   3   0    /    2    1    0    1    /    0    0    2    1    0     /       / 165,9
7  2   3   1    /    1    1    1    2    /    0    0    2    1    1     /       / 168,1

[Two equations; symbols lost in extraction]
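As a worked check, the a column of the haplotype table in "For example II" can be reproduced from the QTL alleles alone. In the sketch below the counts of "+" alleles per individual are read off that table. The three effects (1.0, 2.0, 0.1) and the constant of -4.0 are not stated on the slides: they are values inferred here because they fit all seven rows, so treat them as an assumption.

```python
# Number of "+" alleles at QTL1, QTL2, QTL3 for individuals 1 to 7
plus_counts = {
    1: (2, 2, 2),
    2: (1, 1, 1),
    3: (0, 0, 1),
    4: (2, 2, 1),
    5: (1, 1, 2),
    6: (0, 1, 0),
    7: (1, 1, 1),
}

effects = (1.0, 2.0, 0.1)   # inferred from the table, not given on the slides
constant = -4.0             # inferred shift, likewise an assumption


def additive_value(counts):
    """AGV as a linear regression on the 0/1/2 QTL genotype codes."""
    return constant + sum(x * e for x, e in zip(counts, effects))


a = {ind: round(additive_value(c), 1) for ind, c in plus_counts.items()}
print(a)
# {1: 2.2, 2: -0.9, 3: -3.9, 4: 2.1, 5: -0.8, 6: -2.0, 7: -0.9}
```

Individuals 2 and 7 carry the same number of "+" alleles at every QTL and therefore share the same additive genetic value, although their phenotypes differ.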

Page 19: Genomic simulation of complex traits using AlphaDrop

3. Sample QTL effects

• What is the true state of nature? → Nobody knows
• Additive
→ simple, i.e., linear regression on 0, 1, 2
• Dominance
→ interaction between alleles on one locus
• Epistasis
→ interaction between alleles on different loci
• Imprinting (parent of origin effect)
→ interaction between parental origin and allele
• … ???

Page 20: Genomic simulation of complex traits using AlphaDrop

Additive

[Equations; symbols lost in extraction]

Page 21: Genomic simulation of complex traits using AlphaDrop

QTL effect size - popular choices

• Gaussian distribution
→ most complex traits support the infinitesimal model with very many loci having very small effects
• Gamma distribution
→ can accommodate few genes with very large effect (major genes), as for example DGAT in %fat in milk

[Figures: distributions of QTL effects]
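Both choices can be sketched with the standard library. This is an illustration only: the gamma shape and scale below are hypothetical values picked to give a long tail, and the random sign is an assumption needed because a gamma variate is always positive.

```python
import random


def sample_effects(n_qtl, dist, rng, shape=0.4, scale=1.66):
    """Sample QTL effects; shape and scale are illustrative defaults."""
    if dist == "gaussian":
        return [rng.gauss(0.0, 1.0) for _ in range(n_qtl)]
    if dist == "gamma":
        # gamma gives the magnitude only, so attach a random sign
        return [rng.choice((-1.0, 1.0)) * rng.gammavariate(shape, scale)
                for _ in range(n_qtl)]
    raise ValueError(f"unknown distribution: {dist}")


def largest_share(effects):
    """Share of the summed absolute effects taken by the largest one."""
    sizes = [abs(e) for e in effects]
    return max(sizes) / sum(sizes)


rng = random.Random(11)
gaussian = sample_effects(1000, "gaussian", rng)
gamma = sample_effects(1000, "gamma", rng)
print(largest_share(gaussian), largest_share(gamma))
```

With these settings the largest gamma effect takes a visibly larger share than the largest Gaussian effect, which is the "few major genes" pattern.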

Page 22: Genomic simulation of complex traits using AlphaDrop

AlphaDrop in a nutshell

• Simple program
• Coalescent simulation of haplotypes with MaCS (Chen et al., 2009)
– mutation, recombination
• Dropping haplotypes through the pedigree
– recombination, segregation, selection
• Result
– SNP, haplotype, and sequence data (efficient internal compression: 01010110 → long integer)
– QTL position and effect and genetic values
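The compression idea (a string of 0/1 alleles stored as one integer) can be sketched as below. This shows the general bit-packing technique only. How AlphaDrop lays out its bits internally is not described on the slides, and the function names are hypothetical.

```python
def pack(bits):
    """Store a 0/1 haplotype string as a single integer."""
    return int(bits, 2)


def unpack(value, n_loci):
    """Recover the 0/1 string, keeping leading zeros."""
    return format(value, f"0{n_loci}b")


def allele(value, n_loci, locus):
    """Read one allele (0 or 1) without unpacking; locus 0 is the leftmost."""
    return (value >> (n_loci - 1 - locus)) & 1


packed = pack("01010110")
print(packed)                 # 86
print(unpack(packed, 8))      # 01010110
print(allele(packed, 8, 1))   # 1
```

Eight alleles fit in one byte here instead of eight characters, and single alleles remain readable with shifts and masks.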

Page 23: Genomic simulation of complex traits using AlphaDrop

EXEMPLAR APPLICATION

Page 24: Genomic simulation of complex traits using AlphaDrop

Methods - Simulation

1. Coalescent simulation of haplotypes (4000 on 30 Chr)
– mutation, recombination, drop in Ne
2. Dropping haplotypes through the pedigree (animal breeding scenario)
– recombination, segregation
– 50 sires × 10 dams × 2 progeny → 1000 animals / generation
– 10 generations
– QTL effects from Gaussian or gamma distribution
– phenotypes from Gaussian distribution, $h^2$ = …
– 50K SNP genotypes
• AlphaDrop software (Hickey & Gorjanc, 2012)
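The pedigree structure (50 sires × 10 dams × 2 progeny over 10 generations) can be sketched as follows. This is an illustration under assumptions that the slides do not state: parents are picked at random, sexes alternate by ID, generations do not overlap, and generation 1 is the unrelated base. The function name `build_pedigree` is hypothetical.

```python
import random


def build_pedigree(n_gen=10, n_sires=50, dams_per_sire=10, progeny_per_dam=2, seed=1):
    """Return (id, sire, dam, sex) records; generation 1 is the base."""
    rng = random.Random(seed)
    per_gen = n_sires * dams_per_sire * progeny_per_dam   # 1000 by default

    def sex(animal_id):
        return "M" if animal_id % 2 else "F"   # alternate sexes (assumption)

    current = [(i, None, None, sex(i)) for i in range(1, per_gen + 1)]
    pedigree = list(current)
    next_id = per_gen + 1
    for _generation in range(2, n_gen + 1):
        males = [rec[0] for rec in current if rec[3] == "M"]
        females = [rec[0] for rec in current if rec[3] == "F"]
        sires = rng.sample(males, n_sires)                  # random, no selection
        dams = rng.sample(females, n_sires * dams_per_sire)
        current = []
        for k, dam in enumerate(dams):
            sire = sires[k // dams_per_sire]                # 10 dams per sire
            for _ in range(progeny_per_dam):
                current.append((next_id, sire, dam, sex(next_id)))
                next_id += 1
        pedigree.extend(current)
    return pedigree


pedigree = build_pedigree()
print(len(pedigree))   # 10000
```

With the default design every generation holds 500 females, which is exactly the number of dams needed. Other settings may need a different sex assignment.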

Page 25: Genomic simulation of complex traits using AlphaDrop

Methods – Simulated data

[Figure: generations (Gen.) 1 to 10 with pedigree, phenotype, genotype, and AGV data, split into calibration and validation sets]

Page 26: Genomic simulation of complex traits using AlphaDrop

Methods – Statistical analysis

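• Statistical analysis of co-variation between phenotypic and genotypic data with linear mixed model

$\mathbf{y} \mid \mathbf{b}, \mathbf{a} \sim N(\mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a}, \mathbf{I}\sigma^2_e)$

• … accounting for relationships between individuals $\mathbf{K}$

$\mathbf{a} \sim N(\mathbf{0}, \mathbf{K}\sigma^2_a)$

• $\mathbf{K}$ based on:
– pedigree,
– SNP,
– haplotype, or
– QTL data → „genetic variation that matters“
(the other sources act as „proxies“)

$h^2 = \sigma^2_a / (\sigma^2_e + \sigma^2_a)$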
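A small end-to-end sketch of this analysis, written with NumPy, is given below. It is not the analysis behind the results in these slides. The assumptions are: independent loci instead of coalescent haplotypes, a relationship matrix (called K in the sketch) built as a scaled cross-product of centred 0/1/2 genotypes, the variance ratio treated as known, the SNP set including the QTL loci, and accuracy measured on the phenotyped animals themselves rather than in later generations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, n_qtl, h2 = 500, 500, 50, 0.5   # hypothetical sizes and heritability

# Stand-in genotypes: independent loci coded 0/1/2; the first 50 act as QTL
freq = rng.uniform(0.1, 0.9, size=m)
X = rng.binomial(2, freq, size=(n, m)).astype(float)
qtl = np.arange(n_qtl)

# Gaussian QTL effects and true additive genetic values (centred)
alpha = rng.normal(size=n_qtl)
a_true = X[:, qtl] @ alpha
a_true -= a_true.mean()

# Phenotypes with residual variance matching the target heritability
var_e = a_true.var() * (1.0 - h2) / h2
y = a_true + rng.normal(scale=np.sqrt(var_e), size=n)


def relationship(genotypes):
    """Relationship matrix from centred genotypes (one common scaling choice)."""
    p = genotypes.mean(axis=0) / 2.0
    W = genotypes - 2.0 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))


def blup(K, y, lam):
    """Predict a as K (K + lam I)^-1 (y - mean), with lam = var_e / var_a."""
    return K @ np.linalg.solve(K + lam * np.eye(len(y)), y - y.mean())


lam = (1.0 - h2) / h2
accuracy = {}
for name, cols in {"SNP": np.arange(m), "QTL": qtl}.items():
    a_hat = blup(relationship(X[:, cols]), y, lam)
    accuracy[name] = np.corrcoef(a_hat, a_true)[0, 1]   # estimated vs. true AGV
print(accuracy)
```

In this toy setting the matrix built from QTL genotypes gives the higher correlation, yet it stays below 1 because the phenotypes contain residual noise.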

Page 27: Genomic simulation of complex traits using AlphaDrop

GWAS vs. relationship modelling

• GWAS
• Relationships → use the same underlying information (phenotype and genotype data) to infer the sum of all GWAS estimates

Page 28: Genomic simulation of complex traits using AlphaDrop

Haplotype similarity

• Long haplotypes → „explosion“ in #haplotypes
• But parts of haplotypes are similar
→ effective number of haplotypes is smaller
• Similarities (several variations tested)

Positions k-1 … k+6 (complementary strand in parentheses):
Haplotype 1: … C C T A G A … (… G G A T C T …)
Haplotype 2: … C T C A G A … (… G A G T C T …)
Haplotype 3: … C T C A T A … (… G A G T A T …)

             Haplotype 1  Haplotype 2  Haplotype 3
Haplotype 1  6/6          4/6          3/6
Haplotype 2               6/6          5/6
Haplotype 3                            6/6
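The similarity table can be reproduced by counting matching alleles position by position. The sketch below shows only this simplest variant (identity by state over six positions). The slides mention that several variations were tested but do not define them.

```python
def matches(hap1, hap2):
    """Number of positions at which two haplotypes carry the same allele."""
    return sum(x == y for x, y in zip(hap1, hap2))


haplotypes = {
    "Haplotype 1": "CCTAGA",
    "Haplotype 2": "CTCAGA",
    "Haplotype 3": "CTCATA",
}

names = list(haplotypes)
similarity = {}
for i, first in enumerate(names):
    for second in names[i:]:
        similarity[(first, second)] = matches(haplotypes[first], haplotypes[second])
        print(first, second, f"{similarity[(first, second)]}/6")
```

Dividing each count by the haplotype length gives a value between 0 and 1 that could serve as an entry of a haplotype-based relationship matrix.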

Page 29: Genomic simulation of complex traits using AlphaDrop

Results – Gaussian QTL

[Figure panels: QTL; Pedigree; SNP V; SNP Y; Haplotypes – no similarity; Haplotypes – similarity 1; Haplotypes – similarity 2]

Page 30: Genomic simulation of complex traits using AlphaDrop

Results – Gamma QTL

[Figure panels: QTL; Pedigree; SNP V; SNP Y; Haplotypes – no similarity; Haplotypes – similarity 1; Haplotypes – similarity 2]

Page 31: Genomic simulation of complex traits using AlphaDrop

Conclusions

• Genome-wide information increases accuracy in comparison to classic methods using pedigrees and phenotypes only
• Long haplotypes → large #haplotypes
– low accuracies
– similarities help
– no advantage over SNP data (perhaps due to large #haplotypes)
• Accuracies drop in further generations (not so much with Gamma QTL data)
→ cannot predict distant relatives or unrelated individuals accurately!!!
• Even with QTL data accuracies are not perfect!!!
