Genomic imputation and evaluation using 1074 high density Holstein genotypes

P. M. VanRaden¹, D. J. Null¹*, G. R. Wiggans¹, T. S. Sonstegard², E. E. Connor², M. Winters³, and M. Sargolzaei⁴

¹Animal Improvement Programs Laboratory, ARS, USDA, Beltsville, MD

²Bovine Functional Genomics Laboratory, ARS, USDA, Beltsville, MD

³DairyCo, Agriculture and Horticulture Development Board, Warwickshire, UK

⁴Centre for Genetic Improvement of Livestock, University of Guelph, ON, Canada

Abstr. W53

2011

Introduction

Data

• Four types of genotypes were used for this analysis: HD, 50K, 3K, and imputed dams.

• The animals included 1,074 genotyped with HD, 66,540 with 50K, and 33,119 with 3K, plus 2,337 imputed dams that had no genotypes of their own (103,070 animals in total).

• HD genotypes were from 356 influential USA and CAN sires, 398 GBR sires, 156 other sires, 138 Beltsville research cows, and 26 other females.

• To test imputation, an example simulated chromosome was used, with 1% of genotypes missing and 0.02% incorrect initially on each chip. Among all animals, 94.4% of genotypes were missing initially.
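The masking scheme in the last bullet can be sketched as follows. This is a minimal illustration, not the simulation program used for the poster: genotypes are assumed to be coded 0/1/2 (copies of one allele) with `None` for missing, and `observe_chip`, `expected_missing_fraction`, and `GROUPS` are hypothetical names. The second function shows how chip density alone drives the large share of missing genotypes.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple


def observe_chip(
    true_geno: Sequence[int],
    chip: Set[int],
    missing_rate: float = 0.01,
    error_rate: float = 0.0002,
    rng: Optional[random.Random] = None,
) -> List[Optional[int]]:
    """Return what a chip would report for one animal (hypothetical sketch).

    Markers not on the chip are missing; on-chip calls fail with probability
    missing_rate and, when called, are wrong with probability error_rate.
    Genotypes are coded 0/1/2; None marks a missing genotype (assumed coding).
    """
    rng = rng or random.Random()
    observed: List[Optional[int]] = []
    for marker, g in enumerate(true_geno):
        if marker not in chip or rng.random() < missing_rate:
            observed.append(None)  # absent from this chip, or a failed call
        elif rng.random() < error_rate:
            # Miscall: report one of the two other genotypes at random.
            observed.append(rng.choice([x for x in (0, 1, 2) if x != g]))
        else:
            observed.append(g)
    return observed


def expected_missing_fraction(
    groups: Sequence[Tuple[int, int]], total_markers: int, missing_rate: float = 0.01
) -> float:
    """Expected share of all animal x marker cells missing before imputation.

    groups holds (number of animals, markers on their chip) pairs.
    """
    missing = 0.0
    for n_animals, on_chip in groups:
        per_animal = (total_markers - on_chip) + missing_rate * on_chip
        missing += n_animals * per_animal
    n_total = sum(n for n, _ in groups)
    return missing / (n_total * total_markers)


# Genome-wide counts from the poster: HD, 50K, 3K, and imputed dams (0 markers).
GROUPS = [(1_074, 636_967), (66_540, 42_495), (33_119, 2_614), (2_337, 0)]
print(f"{expected_missing_fraction(GROUPS, 636_967):.3f}")
```

With the genome-wide counts, the function returns about 0.946, in line with the 94.4% reported; the simulated chromosome's own marker counts presumably account for the small difference.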

Conclusions

• Imputation from 50K to HD is accurate (98.9% of genotypes correctly called).

• The average increase in reliability of 0.4 percentage points is smaller than the 0.9 points expected from simulation.

• More animals with HD genotypes will improve imputation and reliability.

• Multi-breed evaluation could produce larger gains than the single-breed evaluation investigated here.

Software & Computing (cont.)

• A maximum segment length of 2,000 markers and a minimum of 200 markers gave the best results when findhap was run once.

• A maximum of 1,500 markers and a minimum of 200 markers gave the best results when findhap was run twice and when FImpute and findhap were combined.

• Running FImpute followed by findhap gave the best results, with an average of 96.76% of HD genotypes called correctly across all chip types, including imputed dams (Table 1).

• The average reliability gain over all traits was 0.4 percentage points (Table 2).

Table 2. Gains in reliability

• Three combinations of the programs were tested: findhap run once (imputing 3K and 50K directly to HD), findhap run twice (first imputing 3K to 50K, then 50K to HD), and FImpute (imputing 3K to 50K) followed by findhap (imputing 50K to HD).

• Several combinations of segment lengths were tested in findhap.

• Imputing 636,967 markers for 103,070 animals with findhap required 50 Gbytes of memory and 10 hours using 6 processors.

• Iterating to estimate SNP effects for 29 traits required 2 days using 6 processors.

• August 2007 predictions were tested with April 2011 data.
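To put the 50-Gbyte figure in perspective, the arithmetic below sizes a dense animals × markers genotype matrix at two packings. It is a back-of-envelope sketch only: the poster does not say how findhap stores genotypes, so the 2-bit and 8-bit encodings (and the names `MARKERS`, `ANIMALS`, `matrix_bytes`) are assumptions for comparison.

```python
MARKERS = 636_967   # HD markers imputed
ANIMALS = 103_070   # 1,074 HD + 66,540 50K + 33,119 3K + 2,337 imputed dams


def matrix_bytes(markers: int, animals: int, bits_per_genotype: int) -> int:
    """Bytes needed for a dense genotype matrix, rounded up to whole bytes."""
    total_bits = markers * animals * bits_per_genotype
    return (total_bits + 7) // 8


cells = MARKERS * ANIMALS  # 65,652,188,690 genotypes
for bits in (2, 8):  # 2 bits: packed 0/1/2/missing; 8 bits: one byte each
    gb = matrix_bytes(MARKERS, ANIMALS, bits) / 1e9
    print(f"{bits} bits per genotype: {gb:.1f} GB")
```

The two packings give about 16.4 GB and 65.7 GB, the same order of magnitude as the 50 Gbytes reported; any working storage findhap needs beyond the final matrix is not described here.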

Higher-density genotypes can provide markers closer to QTL, but genotypes from lower-density chips must first be imputed to the highest density. Markers from multiple chips can then be combined in genomic evaluation.

Results (cont.)

Objectives

• Determine the accuracy of imputing up to 636,967 markers (HD) from 42,495 markers (50K), 2,614 markers (3K), or 0 markers (imputed dams), using simulated data.

• Determine the gain in reliability from using more markers, using actual data.
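Imputation accuracy in the first objective is reported as the percentage of correctly called genotypes. A minimal sketch of that comparison, assuming true and imputed genotypes are coded alike (0/1/2, `None` if still missing); `percent_correct` is a hypothetical helper, and because the poster does not state whether every genotype or only the initially missing ones were scored, the positions to score are a parameter.

```python
from typing import Iterable, Optional, Sequence


def percent_correct(
    true_geno: Sequence[int],
    imputed_geno: Sequence[Optional[int]],
    positions: Optional[Iterable[int]] = None,
) -> float:
    """Percentage of scored positions where the imputed call equals the truth.

    A genotype left missing (None) after imputation counts as incorrect.
    By default every position is scored.
    """
    if len(true_geno) != len(imputed_geno):
        raise ValueError("genotype vectors differ in length")
    idx = list(range(len(true_geno)) if positions is None else positions)
    if not idx:
        raise ValueError("no positions to score")
    hits = sum(1 for i in idx if imputed_geno[i] == true_geno[i])
    return 100.0 * hits / len(idx)


truth = [0, 1, 2, 1, 0]
imputed = [0, 1, 1, None, 0]
print(percent_correct(truth, imputed))          # 60.0 (all positions scored)
print(percent_correct(truth, imputed, [2, 3]))  # 0.0 (only the masked positions)
```

The same imputed vector can look much better or worse depending on which positions are scored, which is why the scoring set is made explicit here.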

Results

Table 1. Correctly imputed genotypes.

Software & Computing

• Both findhap.f90, developed at AIPL, and FImpute, developed at the University of Guelph and Boviteq Alliance, were tested in this analysis.

• The imputation rate of findhap version 2 improved on the version 1 results tested earlier.

• Version 2 of findhap uses both long segments, to improve haplotype matches for close relatives, and short segments, to help detect matches from more remote ancestors.
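The long-then-short strategy can be illustrated with a simplified segment matcher. This is not findhap.f90's algorithm or code: it assumes a reference list of phased haplotypes (for example, from HD animals) already ordered from most to least frequent, codes genotypes as 0/1/2 with `None` for missing, and uses hypothetical function names. A haplotype is accepted for a segment when it creates no opposing homozygote with the animal's genotype and its complement is also in the list. The default lengths reuse the poster's best maximum and minimum (2,000 and 200 markers); how findhap chooses lengths in between is not covered here.

```python
from typing import List, Optional, Sequence

# Simplified, hypothetical sketch; not findhap's implementation.
Genotype = List[Optional[int]]  # 0/1/2 copies of allele 1; None = missing


def consistent(hap: Sequence[int], geno: Sequence[Optional[int]]) -> bool:
    """A haplotype (alleles 0/1) fits a genotype unless they form an opposing
    homozygote: allele 1 where the genotype is 0, or allele 0 where it is 2."""
    return all(
        g is None or not ((g == 0 and a == 1) or (g == 2 and a == 0))
        for a, g in zip(hap, geno)
    )


def impute_segment(seg: Genotype, library: Sequence[Sequence[int]]) -> Optional[Genotype]:
    """Find two library haplotypes that explain the observed genotypes and
    return their sum, which also fills the missing positions."""
    for h1 in library:  # library assumed ordered from most to least frequent
        if not consistent(h1, seg):
            continue
        # Alleles the second haplotype must carry at observed positions.
        needed = [None if g is None else g - a for g, a in zip(seg, h1)]
        for h2 in library:
            if all(b is None or b == a for b, a in zip(needed, h2)):
                return [a1 + a2 for a1, a2 in zip(h1, h2)]
    return None  # no match at this segment length


def impute(geno: Genotype, reference: Sequence[Sequence[int]],
           seg_lengths: Sequence[int] = (2000, 200)) -> Genotype:
    """Fill missing genotypes using long segments first, then shorter ones."""
    geno = list(geno)
    for seg_len in seg_lengths:
        for start in range(0, len(geno), seg_len):
            stop = min(start + seg_len, len(geno))
            if None not in geno[start:stop]:
                continue  # nothing left to impute in this segment
            library = [h[start:stop] for h in reference]
            filled = impute_segment(geno[start:stop], library)
            if filled is not None:
                geno[start:stop] = filled
    return geno


# The animal carries a recombinant haplotype twice: it matches neither
# reference haplotype over all 6 markers, but each half matches one of them.
ref = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
print(impute([0, 0, 0, 0, None, 0], ref, seg_lengths=(6, 3)))
```

In the example the 6-marker pass finds no consistent haplotype, and the 3-marker pass then fills the missing genotype. This mirrors the division of labor described above: long segments serve close relatives, short ones recover matches that survive recombination.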

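Correctly called genotypes (%)

3K to 50K   50K to HD        Dams       HD      50K       3K  Average
-           findhap         94.23    99.91    98.84    88.77    95.43
findhap     findhap         94.52    99.91    98.92    90.36    95.93
FImpute     findhap         95.53    99.91    98.93    92.69    96.76

Dams, HD, 50K, and 3K give results for the animals in each genotype group. In the first row, findhap was run once, imputing 3K and 50K genotypes directly to HD.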
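As a quick consistency check on Table 1, the Average column agrees to within 0.01 with the simple mean of the four genotype groups in each row. The values below are copied from the table; the row labels and the name `TABLE1` are descriptive only.

```python
from statistics import mean

# Correctly called genotypes (%) for Dams, HD, 50K, 3K, and the reported Average.
TABLE1 = {
    "findhap (one run)": ([94.23, 99.91, 98.84, 88.77], 95.43),
    "findhap, then findhap": ([94.52, 99.91, 98.92, 90.36], 95.93),
    "FImpute, then findhap": ([95.53, 99.91, 98.93, 92.69], 96.76),
}

for label, (groups, reported) in TABLE1.items():
    computed = mean(groups)  # unweighted mean over the four groups
    print(f"{label:24s} mean {computed:.4f}  reported {reported:.2f}")
```

The largest gap is in the first row (95.4375 against 95.43), which presumably reflects averaging unrounded per-group values before rounding.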

Reliability (%) with 50K and HD genotypes, and gain from HD (percentage points)

Trait                     50K Rel   HD Rel  HD Gain
Milk                         67.3     67.8      0.6
Fat                          69.9     70.3      0.4
Protein                      61.0     61.4      0.4
Fat %                        85.6     87.5      1.9
Protein %                    78.4     80.9      2.6
Net Merit                    52.4     52.4      0.0
Productive Life              52.9     53.1      0.2
SCS                          61.4     60.9     -0.5
Daughter Pregnancy Rate      50.8     50.5     -0.3
Sire Calving Ease            30.8     32.2      1.5
Daughter Calving Ease        38.9     37.0     -1.9
Sire Stillbirth              17.6     18.2      0.6
Daughter Stillbirth          28.5     28.8      0.3
Final Score                  53.2     53.4      0.2
Stature                      63.9     65.4      1.4
Strength                     63.8     64.0      0.2
Udder Depth                  73.8     74.2      0.4
Average                      57.0     57.4      0.4
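The HD Gain column is the difference between the two reliability columns, in percentage points. Recomputing it from the rounded values printed in Table 2 reproduces most rows exactly. Four rows differ by 0.1, presumably because the published gains were computed from unrounded reliabilities. The Average row also does not equal the mean of the 17 traits listed, consistent with its being averaged over all traits evaluated rather than only those shown. The data below are copied from the table; the variable names are illustrative.

```python
from statistics import mean

# (trait, 50K reliability %, HD reliability %, published HD gain) from Table 2
TABLE2 = [
    ("Milk", 67.3, 67.8, 0.6),
    ("Fat", 69.9, 70.3, 0.4),
    ("Protein", 61.0, 61.4, 0.4),
    ("Fat %", 85.6, 87.5, 1.9),
    ("Protein %", 78.4, 80.9, 2.6),
    ("Net Merit", 52.4, 52.4, 0.0),
    ("Productive Life", 52.9, 53.1, 0.2),
    ("SCS", 61.4, 60.9, -0.5),
    ("Daughter Pregnancy Rate", 50.8, 50.5, -0.3),
    ("Sire Calving Ease", 30.8, 32.2, 1.5),
    ("Daughter Calving Ease", 38.9, 37.0, -1.9),
    ("Sire Stillbirth", 17.6, 18.2, 0.6),
    ("Daughter Stillbirth", 28.5, 28.8, 0.3),
    ("Final Score", 53.2, 53.4, 0.2),
    ("Stature", 63.9, 65.4, 1.4),
    ("Strength", 63.8, 64.0, 0.2),
    ("Udder Depth", 73.8, 74.2, 0.4),
]

# Gain recomputed from the rounded reliabilities shown in the table.
gains = {trait: round(hd - k50, 1) for trait, k50, hd, _ in TABLE2}
off_by_rounding = [t for t, _, _, pub in TABLE2 if abs(gains[t] - pub) > 0.05]
losses = [t for t, g in gains.items() if g < 0]

print("gain differs from published value:", off_by_rounding)
print("traits losing reliability with HD:", losses)
print(f"mean of listed traits: 50K {mean(r[1] for r in TABLE2):.1f}, "
      f"HD {mean(r[2] for r in TABLE2):.1f}")
```

The listed traits average about 55.9 (50K) and 56.4 (HD), below the 57.0 and 57.4 in the Average row, while three traits (SCS, Daughter Pregnancy Rate, Daughter Calving Ease) lose reliability with HD genotypes.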