Animal Breeding & Genomics Centre
Illumina Agriculture Seminar Series, August 3, 2009, Madison WI

Martien Groenen

High throughput SNP discovery in the pig using the Illumina Genome Analyzer and characterization of the porcine HapMap panel using the Illumina Porcine 60K iSelect™ BeadChip

Overview

SNP discovery and chip design

Statistics of the 60K chip

HapMap data: MDS and clustering

HapMap data: haplotype diversity, haplotype sharing and selective sweeps

From SNP genotyping towards resequencing

History

Objective
Design of a 60K Illumina BeadChip

Challenge
Limited number of SNPs available
Only half of the porcine genome sequence available
Job had to be done between April and August 2008

Approach
SNP identification by sequencing reduced representation libraries (RRL) on an Illumina Genome Analyzer

Overview sequencing approach

5 reduced representation libraries (RRL)
Restriction digest of DNA
Separate on acrylamide gels
Isolate 150-200 bp fragments

[Figure: 6x coverage shown five times, 30x in total]

30x: highly reliable SNP identification
Rough estimate of MAF within breeds

RRLs results

RRL (enzyme, fragment size in bp)   Genome %   Total reads (million)   Reads after filtering (million)
HaeIII 160-200                      1          67.1                    57.8
AluI 100-150                        2          88.6                    83.4
AluI 150-200                        3          145.9                   95.8
MspI 100-200                        1          69.1                    52.3
DraI 180-260                        2          100.3                   67.7
TOTAL                               9          481                     357

Reads after filtering = 12.8 billion bp
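As a quick check on these figures, the sketch below computes the fraction of reads kept per library and the mean read length implied by the totals. It assumes that the 12.8 billion bp figure refers to the filtered reads; the slide does not say so explicitly, and the function names are illustrative only.

```python
# Read counts in millions, copied from the RRL results: (total, after filtering).
libraries = {
    "HaeIII 160-200": (67.1, 57.8),
    "AluI 100-150": (88.6, 83.4),
    "AluI 150-200": (145.9, 95.8),
    "MspI 100-200": (69.1, 52.3),
    "DraI 180-260": (100.3, 67.7),
}


def retention(total, kept):
    """Fraction of reads that survive filtering."""
    return kept / total


def implied_read_length(total_bp, reads):
    """Mean bases per read implied by a base total and a read count."""
    return total_bp / reads


# Sum of the filtered column, in millions of reads.
kept_total = sum(kept for _, kept in libraries.values())

for name, (total, kept) in libraries.items():
    print(f"{name}: {retention(total, kept):.1%} of reads kept")
print(f"filtered reads: {kept_total:.1f} million")
# Assumption: the 12.8 billion bp total is for the filtered reads.
print(f"implied read length: {implied_read_length(12.8e9, kept_total * 1e6):.1f} bp")
```

Under that assumption the implied read length is about 36 bp.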

Align to reference genome and identify SNPs using MAQ

[Figure: reads aligned between two HaeIII sites, with SNPs C/T and G/C marked]

SNP called if minor allele seen at least 3x
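The calling rule can be illustrated with a minimal sketch. This is not MAQ's algorithm: it only applies the "minor allele seen at least 3x" threshold to the bases observed at one aligned position, and it ignores base and mapping quality. The function name `call_snp` is hypothetical.

```python
from collections import Counter

MIN_MINOR_ALLELE_COUNT = 3  # threshold stated on the slide


def call_snp(read_bases, min_minor=MIN_MINOR_ALLELE_COUNT):
    """Apply the minor-allele-count rule to one alignment column.

    read_bases: string of bases observed in the reads covering one position.
    Returns (major, minor, minor_count), or None if no SNP is called.
    """
    counts = Counter(b for b in read_bases.upper() if b in "ACGT")
    ranked = counts.most_common(2)
    if len(ranked) < 2:
        return None  # only one allele observed
    (major, _), (minor, minor_count) = ranked
    if minor_count < min_minor:
        return None  # minor allele not seen often enough
    return major, minor, minor_count


print(call_snp("CCCCCCTTT"))  # minor allele T seen 3 times
print(call_snp("CCCCCCTT"))   # minor allele T seen only twice
```

A position where the minor allele is seen only twice is rejected by this rule, which matches the separate count of SNPs "where minor allele seen twice" reported later in the deck.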

SNP distribution on Illumina GA reads

[Figure: series for all SNPs, transitions and transversions]
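For readers unfamiliar with the two SNP classes, here is a minimal sketch of the standard definition: a transition swaps two purines (A/G) or two pyrimidines (C/T), and every other substitution is a transversion. The helper name `snp_class` is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snp_class(allele_a, allele_b):
    """Classify a biallelic SNP as 'transition' or 'transversion'."""
    a, b = allele_a.upper(), allele_b.upper()
    if a == b or not {a, b} <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid biallelic SNP: {allele_a}/{allele_b}")
    # Both alleles in the same chemical class means a transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# The two example SNPs from the alignment slide.
print(snp_class("C", "T"))
print(snp_class("G", "C"))
```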

All porcine SNPs currently available

SNP source                Number
HQ Solexa SNPs WU*        333,000
454 sequencing MARC       110,000
7K iSelect chip AU-RI     5,500
dbSNP                     17,700
INRA (Sanger seq)         61,700
Cambridge University      14,900
Total No SNPs             543,000
Total unique SNPs         510,000

TOTAL #SNPs submitted for the chip: 72,000
70 % mapped on build 7
7 % predicted
23 % unmapped

*Plus additional 58,000 SNPs where minor allele seen twice

SNP distribution

[Figure: histogram with one axis from 0 to 10,000 and the other binned in steps of 5,000 from 5,000 to >150,000. Series: Build 7; Build 9 (89 % of SNPs)]

Porcine 60K BeadChip: Excellent performance

Final total number of SNPs on the chip: 62,163
Number of SNPs with MAF > 0.05 in at least 1 breed: 59K
Conversion rate overall: 94.8 % (Illumina GA SNPs: 96 %)
Good correlation between allele counts based on GA sequencing and genotyping

The porcine HapMap project

576 individuals typed by Illumina
991 individuals typed by WU

Phenotypic variation

Porcine HapMap panel

Breed                 Illumina   WU     Total
Landrace              79         33     112
Large White           129        28     157
Duroc                 82         16     98
Pietrain              95         17     112
Wild Boar             20         249    269
12 Chinese breeds     31         233    264
3 Synthetic           6          49     55
Hampshire             69         8      77
Berkshire             58         -      58
Museum samples        -          28     28
Other suiformes       6          41     47
Other European br.    -          289    289
Tobasco               1          -      1
TOTAL                 576        991    1567

Species listed on the slide: Sus celebensis, Sus verrucosus, Sus barbatus, Pecari tajacu, Potamochoerus, Babyrousa, Sus cebifrons

Including 165 animals of the discovery panel

Animal sequenced

Use of the 60K chip: European vs Chinese breeds

Biased towards common variants
Biased towards SNPs in European breeds
SNP density of the 60K chip:
– Sufficient for GWA and GWS in European breeds
– Too low for Chinese breeds

[Figure: panels for Large White and Ningxiang]

Amaral et al. (2008) Genetics 179: 569

Domestication of the pig

Greger Larson et al. (2005) Science 307:1618

Gene flow
Selective sweeps
Haplotype diversity
Haplotype sharing
Extent of LD

mtDNA analysis

Multi dimensional scaling and clustering

MDS plot Axis1-2: Division Europe-Asia (PLINK)

[Figure labels: Outgroup (other Suidae); Asian breeds; European/US breeds; Asian WB; European WB (labelled twice)]
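The slide names PLINK but not the settings used. PLINK's MDS analysis operates on a matrix of pairwise identity-by-state (IBS) distances between individuals, so the sketch below shows how such a distance can be computed from genotypes coded as allele counts (0, 1, 2, or None for missing). It is an illustration of the distance measure only, not a reimplementation of PLINK, and the function names and sample names are hypothetical.

```python
def ibs_distance(g1, g2):
    """1 minus the proportion of alleles shared identical-by-state.

    g1, g2: genotype lists coded as counts of one allele (0, 1 or 2);
    None marks a missing call and the SNP is skipped for that pair.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this SNP
        compared += 2
    if compared == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return 1.0 - shared / compared


def distance_matrix(genotypes):
    """Pairwise IBS distances for a dict of {sample_name: genotype_list}."""
    names = sorted(genotypes)
    rows = [[ibs_distance(genotypes[r], genotypes[c]) for c in names]
            for r in names]
    return names, rows


# Toy data with made-up sample names, for illustration only.
toy = {
    "sample_A": [0, 0, 1, 2],
    "sample_B": [0, 1, 1, 2],
    "sample_C": [2, 2, 1, 0],
}
names, rows = distance_matrix(toy)
for name, row in zip(names, rows):
    print(name, [round(d, 3) for d in row])
```

A matrix like this is the input that a classical MDS routine reduces to a small number of axes, such as the axes 1 and 2 shown on the slide.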

MDS plot Axis1-2: Division Europe-Asia (PLINK)

[Figure labels: WB; museum sample collected in Turkey, 1964; museum sample from Indonesia (~100 years old)]

Outgroups: other Suidae

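[Figure: phylogeny of the species in the panel; numbers in parentheses as shown on the slide]

Sus scrofa
Sus barbatus (11)
Sus celebensis (2)
Sus verrucosus (10)
Sus cebifrons (1)
Other Suidae: Potamochoerus p. (1), Potamochoerus l. (1), Phacochoerus a. (2), Babyrousa (1)
Pecari tajacu (1)

Divergence time labels on the tree: ~1 MY, ~2 MY, ~4 MY, ~10 MY, ~70 MY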

SNPs shared with other suidae

[Figure: counts of SNPs shared with other Suidae; values as extracted from the slide: 765438, 578, 5837, 59167, 1167, 2306, 478, 496]

Species              Average heterozygosity
Sus scrofa           23 %
Sus barbatus         3 %
Sus celebensis       2 %
Sus cebifrons        0.8 %
Sus verrucosus       0.6 %
Phacochoerus         0.8 %
Potamochoerus l.     0.7 %
Potamochoerus p.     0.9 %
Babyrousa b.         0.9 %

Ancestral allele identified for > 95 % of the SNPs
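The slide does not state how the ancestral allele was assigned. A common parsimony rule, shown here purely as an assumption, is to take the allele carried by the outgroup species as ancestral when all outgroup calls agree on one of the two SNP alleles. Function names are hypothetical.

```python
def ancestral_allele(snp_alleles, outgroup_calls):
    """Infer the ancestral allele of a biallelic SNP from outgroup genotypes.

    snp_alleles: the two alleles segregating in Sus scrofa, e.g. ("C", "T").
    outgroup_calls: alleles observed in outgroup species (None = no call).
    Returns the ancestral allele, or None if the outgroups are
    uninformative or disagree. Assumed rule, not taken from the slides.
    """
    observed = {a for a in outgroup_calls if a in snp_alleles}
    if len(observed) == 1:
        return observed.pop()
    return None


def derived_allele(snp_alleles, ancestral):
    """Return the SNP allele that is not the ancestral one."""
    first, second = snp_alleles
    return second if ancestral == first else first


anc = ancestral_allele(("C", "T"), ["C", "C", None, "C"])
print(anc, derived_allele(("C", "T"), anc))
```

With such a rule, SNPs left unresolved are those where the outgroups carry both alleles or have no call.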

Haplotype diversity, haplotype sharing and selective sweeps

Haplotype sharing between pig populations

3567 SNPs on SSC7, nearly complete, 133 Mb

Duroc/Duroc, Pietrain/Pietrain: sharing is high between populations of the same breed, but not uniform across chromosomes
Large W/Landrace, Large W/Pietrain: sharing is intermediate between populations of several 'white' breeds
Duroc/Meishan: sharing is low between Chinese and European breeds, but local exceptions exist

Haplotype diversity and linkage disequilibrium

[Figure: haplotype diversity on a scale from high to low (1 haplotype), for regions of 10 Mb on SSC7 and 3.5 Mb on SSC14; rows for DU, LW, LR, PI and MS]

Identification of selective sweeps: Examples on SSC1-4

[Figure: panels for SSC1, SSC2, SSC3 and SSC4; populations LW27, LWxPI, HA01, DU24, DU22, DU20]

Low heterozygosity
High frequency of derived allele
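The two sweep signatures can be turned into a simple window scan. The sketch below is an assumption about how such a scan might look, not the method used for the slides: it averages expected heterozygosity, 2p(1 - p), and derived allele frequency per fixed window and flags windows that pass both thresholds. The window size and thresholds in the example are arbitrary.

```python
def window_summary(snps, start, end):
    """Mean expected heterozygosity and mean derived allele frequency.

    snps: list of (position_bp, derived_allele_frequency) tuples.
    Only SNPs with start <= position < end are used.
    Returns (heterozygosity, derived_frequency), or None for an empty window.
    """
    freqs = [p for pos, p in snps if start <= pos < end]
    if not freqs:
        return None
    het = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    daf = sum(freqs) / len(freqs)
    return het, daf


def sweep_candidates(snps, window_bp, max_het, min_daf):
    """Windows with low heterozygosity and high derived allele frequency."""
    if not snps:
        return []
    last = max(pos for pos, _ in snps)
    hits = []
    for start in range(0, last + 1, window_bp):
        summary = window_summary(snps, start, start + window_bp)
        if summary and summary[0] <= max_het and summary[1] >= min_daf:
            hits.append((start, start + window_bp))
    return hits


# Toy data: (position, derived allele frequency); thresholds are arbitrary.
toy_snps = [(100, 0.5), (200, 0.5), (1100, 0.95), (1200, 1.0)]
print(sweep_candidates(toy_snps, window_bp=1000, max_het=0.1, min_daf=0.9))
```

A real scan would also need to account for SNP ascertainment bias on the chip, which this sketch ignores.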

From SNP typing to resequencing

(1) Site frequency spectrum analysis based on RRL sequence data

RRL cover 5-10 % of the genome
Coverage differs between breeds
4 commercial white breeds + Wild Boar
Data from pools
Within 500 Kb windows estimate:
– Watterson's estimator: θ = f(S, n)
– Tajima's estimator: π = f(S, n, freqs.)
– Tajima's D: D = (π - θ) / sd
– Fst measures population differentiation

[Figure: reads from breed 1 and breed 2 aligned to the reference genome]
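The window statistics listed above can be sketched with the textbook formulas for a sample of n chromosomes and S segregating sites (Watterson 1975; Tajima 1989). This is an illustration under simplifying assumptions: it treats n as a known number of sampled chromosomes, whereas the slides use pooled sequence data, which needs additional corrections not shown here. Function names are illustrative.

```python
from math import sqrt


def harmonic_sums(n):
    """a1 = sum of 1/i and a2 = sum of 1/i^2 for i = 1 .. n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    return a1, a2


def watterson_theta(num_segregating, n):
    """Watterson's estimator: theta_W = S / a1."""
    a1, _ = harmonic_sums(n)
    return num_segregating / a1


def tajima_pi(allele_counts, n):
    """Mean pairwise differences, summed over the segregating sites.

    allele_counts: for each segregating site, the number of the n
    chromosomes that carry one of the two alleles.
    """
    return sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)


def tajimas_d(allele_counts, n):
    """Tajima's D = (pi - theta_W) / sd, with Tajima's (1989) variance."""
    s = len(allele_counts)
    if s == 0 or n < 4:
        raise ValueError("need at least one segregating site and n >= 4")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    sd = sqrt(e1 * s + e2 * s * (s - 1))
    return (tajima_pi(allele_counts, n) - s / a1) / sd


# Toy window: 10 chromosomes, 6 segregating sites, mostly rare alleles.
counts = [1, 1, 1, 2, 1, 3]
print(round(watterson_theta(len(counts), 10), 3))
print(round(tajima_pi(counts, 10), 3))
print(round(tajimas_d(counts, 10), 3))
```

An excess of rare alleles pulls π below θ and makes D negative, which is one of the patterns expected after a selective sweep.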

Reanalyse data per breed

SNP distributions

[Figure: two histograms by minor allele count, for transitions and transversions; one panel covers minor allele counts 1-20 with an axis from 0 to 35,000, the other covers 1-33 with an axis from 0 to 4,500]

Example: Site frequency spectrum of Ssc8 derived from deep sequencing of pools

[Figure: panels for Pietrain and Wild Boar]

Variation of nucleotide diversity (θ̂_W)

Selective sweep around the KIT locus in white breeds

Sequencing of Sus verrucosus (Java warty pig)

6x on Illumina GA
200, 500 and 3000 bp libraries

~2 My

Acknowledgements (1)

Funding
USDA – CSREES
EU (Sabre, PigSNP)
Institute for Pig Genetics (IPG)
[Logo text: Food Quality and Safety]

DNA samples
China Agricultural University: Ning Li
University of Sassari, Italy: Massimo Scandura
Staff Institute, Japan: Naohiko Okumura
INRA-Toulouse, France: Alain Duvro
MARC, USA: Gary Rohrer
Roslin Institute, UK: Alan Archibald
EU PigBiodiv1
Wageningen University, The Netherlands: Richard Crooijmans
University of Illinois, USA: Larry Schook
Aristotle University of Thessaloniki, Greece: Costas Triantaphyllidis
Istituto Zootecnico per la Sardegna, Italy: Sara Casu
Research Centre for Biology – LIPI, Indonesia: Gono Semiadi
Hendrix Genetics (HG)
Institute for Pig Genetics (IPG)
Pig Improvement Company (PIC)
PTP, Italy: Elisabetta Guiffra
REPROGEN, Australia: Jaime Gongora
Universidade de Lisboa, Portugal: Deodália Dias
Universitat Autonoma Barcelona: Miguel Perez-Enciso
University of Aarhus, Denmark: Christian Bendixen
Durham University, UK: Greger Larson
Iowa State University, USA: Max Rothschild

Acknowledgements (2)

Wageningen University, The Netherlands: Richard Crooijmans, Marcos Ramos, Andreia Fonseca, Hinri Kerstens, Hendrik-Jan Megens
University of Aarhus, Denmark: Christian Bendixen, Jakob Hedegaard
USDA, ARS, MARC, USA: Gary Rohrer, Tim Smith, Dan Nonneman
INRA, Toulouse, France: Denis Milan
Roslin Institute, UK: Alan Archibald, Andy Law
Durham University, UK: Greger Larson
Purdue University, USA: Bill Muir
University of Illinois, USA: Larry Schook, Jon Beever
USDA, ARS, Beltsville, USA: Curt Van Tassel
Iowa State University, USA: Max Rothschild, Zhiliang Hu, James Reecy
Sanger Institute, UK: Carol Churcher, Richard Clark
University of Missouri-Columbia, USA: Jeremy Taylor, Bob Schnabel
Universitat Autonoma Barcelona: Miguel Perez-Enciso, Lucca Ferreti
Illumina Inc., San Diego, USA: Mark Hansen, Marylinn Munson