

Modeling nuisance variables for phenotypic evaluation of bull fertility

M. T. Kuhn, J. L. Hutchison, and H. D. Norman*
Animal Improvement Programs Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350

Abstract T20, 2008

INTRODUCTION

◆ In May 2006, AIPL began evaluation of U.S. bull fertility. General research objective: investigate options for modeling and trait definition that might improve accuracy

◆ Specific goal of this research: determine which available nuisance variables to include in the evaluation model and how to model them

◆ Factors considered were management (mgt) groups based on herd-yr-season-parity-registry status (HYSPR), Yr-State (St)-Mo, cow age, days in milk (DIM), lactation, service number, milk yield, cow effects, and short heat intervals

MATERIALS & METHODS

Comparing Predictors from Alternative Models

◆ Bulls’ predicted conception rates (CR) were computed from estimation data (n=3,613,907) and compared to their average CR in set-aside data (n=2,025,884) using accuracy, bias, and MSE; the 803 bulls with a min. of 50 matings in the estimation data and 100 matings in the set-aside data were included in comparisons. Only AI cow breedings were included
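The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation software itself: the function names, the dictionary layout, and the sign convention (predicted minus set-aside) are assumptions made for the example, and accuracy is taken to be the Pearson correlation between predicted and set-aside CR.

```python
from math import sqrt


def eligible_bulls(n_estimation, n_set_aside, min_estimation=50, min_set_aside=100):
    """Return bulls meeting both mating minimums (50 and 100 in the poster).

    n_estimation / n_set_aside: hypothetical dicts mapping bull ID -> matings.
    """
    return sorted(
        bull
        for bull, n in n_estimation.items()
        if n >= min_estimation and n_set_aside.get(bull, 0) >= min_set_aside
    )


def compare_predictions(predicted, observed):
    """Return (correlation, mean difference, MSE) for paired bull-level CR values.

    predicted: predicted CR (%) from the estimation data
    observed:  average CR (%) in the set-aside data, same bull order
    """
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 bulls")

    # Assumed sign convention: difference = predicted - observed.
    diffs = [p - o for p, o in zip(predicted, observed)]
    mean_diff = sum(diffs) / n            # bias
    mse = sum(d * d for d in diffs) / n   # mean squared error

    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined when one series is constant")
    corr = cov / sqrt(var_p * var_o)      # accuracy
    return corr, mean_diff, mse
```

Calling `compare_predictions` once per candidate model, on the same set of eligible bulls, gives the three statistics that the alternative models are ranked on.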

Management Groups

◆ Minimum (target) group sizes tested were 3, 5, 10, and 20

◆ Many small groups occurred; thus 3 basic strategies were tested:
1. Exclude records if HYSPR does not have the min. number (exact HYSPR groups)
2. Combine groups until the target size is reached; exclude the group if target not reached
3. Combine to target size but if HY has a specified minimum number of records, allow it into the evaluation; HY minimums were 2, 5, and 10 when target sizes were 5, 10, and 20, respectively

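◆ Model:

$$y = \text{HYSPR} + \beta_1\,\text{Milk} + \beta_2\,\text{Milk}^2 + \beta_3\,\text{Age}_{\text{Cow}} + \beta_4\,\text{Age}_{\text{Cow}}^2 + \beta_5\,\text{DIM} + \beta_6\,\text{DIM}^2 + \beta_7\,F_{\text{Bull}} + \beta_8\,F_{\text{Mating}} + \text{Age}_{\text{Bull}} + \text{Stud-Yr} + \text{SSR} + A_{\text{Cow}} + PE_{\text{Cow}} + e$$

where y = conception (yes or no), SSR = service sire, F denotes an inbreeding coefficient, A = the cow's additive genetic effect (breeding value), and PE = her permanent environmental effect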
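The three grouping strategies listed above can be illustrated with a small sketch. The poster does not say in what order subgroups are merged or what happens to a remainder once a target-size group has been formed, so the choices below (merge in the given order, fold any leftover into the last completed group) are assumptions, and both function names are hypothetical.

```python
def exact_groups(subgroups, minimum):
    """Strategy 1: keep only HYSPR subgroups that reach the minimum on their own.

    subgroups: list of (label, n_records) for one herd-year (HY).
    """
    return [((label,), n) for label, n in subgroups if n >= minimum]


def combined_groups(subgroups, target, hy_minimum=None):
    """Strategies 2 and 3: merge subgroups of one HY until `target` is reached.

    hy_minimum=None  -> strategy 2 (an HY that never reaches target is excluded)
    hy_minimum=k     -> strategy 3 (an HY with >= k records is kept as one group)
    Merging order and remainder handling are assumptions of this sketch.
    """
    kept = []
    labels, size = [], 0
    for label, n in subgroups:
        labels.append(label)
        size += n
        if size >= target:
            kept.append((tuple(labels), size))
            labels, size = [], 0

    if labels:  # leftover records below the target size
        if kept:
            # Assumed rule: fold the remainder into the last completed group.
            prev_labels, prev_size = kept.pop()
            kept.append((prev_labels + tuple(labels), prev_size + size))
        elif hy_minimum is not None and size >= hy_minimum:
            # Strategy 3: the whole HY enters as a single undersized group.
            kept.append((tuple(labels), size))
        # Otherwise the HY is excluded from the evaluation.
    return kept
```

For example, an HY with subgroups of 6 and 5 records is dropped under `combined_groups(..., target=20)` but retained as one group of 11 when `hy_minimum=10`, which mirrors the "combine, allow 10" option.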

Other Factors

◆ Tested by dropping/adding factors of interest from/to the basic model of:

$$y = \text{HYSPR} + \text{SSR variables} + PE + A + \text{Age}_{\text{Cow}} + \text{DIM} + \text{Yr-St-Mo} + \text{Milk} + \text{Lact} + e$$

◆ SSR variables included: $F_{\text{SSR}}$, $F_{\text{Mating}}$, $\text{Age}_{\text{Bull}}$, Stud-Yr, SSR

◆ The HYSPR strategy used was to combine to a target group size of 20 and allow the HY into the evaluation if it had at least 10 breedings

◆ Preliminary results showed:

1. Use of 305d-2x-ME (mature equivalent) milk yield provided as good or better predictions than use of test-day yields; ME records also did as well as fat-corrected milk (FCM). Thus, ME milk yield was used

2. For quantitative nuisance variables (e.g., cow age), categorical variables were found to be preferable to linear and quadratic covariates; relationships with CR were neither linear nor quadratic. Thus, quantitative variables were fit as categorical

3. Combining mgt groups implies some groups contain multiple seasons and lactations; inclusion of Yr-St-Mo and Lactation (Lact) was found to improve prediction, and both were therefore included in all models
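Preliminary result 2 (fitting a quantitative variable as classes rather than as linear and quadratic covariates) can be sketched as follows. The cut points and the example records are invented for illustration only; the poster does not give the class boundaries actually used.

```python
from bisect import bisect_right
from collections import defaultdict


def to_class(value, cut_points):
    """Map a quantitative value to a class index.

    cut_points must be ascending; class k covers
    cut_points[k-1] <= value < cut_points[k].
    """
    return bisect_right(cut_points, value)


def class_means(records, cut_points):
    """Average conception (0/1) per class of a quantitative variable.

    records: iterable of (value, conceived) pairs, e.g. (cow age in months, 0 or 1).
    Returns {class index: mean conception}.
    """
    totals = defaultdict(lambda: [0, 0])  # class -> [sum of outcomes, count]
    for value, conceived in records:
        bucket = totals[to_class(value, cut_points)]
        bucket[0] += conceived
        bucket[1] += 1
    return {k: s / n for k, (s, n) in totals.items()}


# Hypothetical cow-age cut points (months) and toy breedings.
AGE_CUTS = [30, 42, 54]
toy = [(25, 1), (28, 0), (35, 1), (40, 1), (60, 0)]
print(class_means(toy, AGE_CUTS))
```

Because each class gets its own level in the model, no particular shape is imposed on the relationship with CR, which is the reason given above for preferring classes over polynomial covariates.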

CONCLUSIONS

◆ Combining HYSPR groups to a target size of 20 and allowing HYs in with a min. of 10 records maximized accuracy and thus will be implemented

◆ Other nuisance variables to include are: cow PE, cow breeding value, cow age, Yr-St-Mo, ME milk yield, lactation, service number, and a short breeding interval variable; quantitative nuisance variables will be fit as categorical variables

RESULTS

Management Groups

Management group
Min. N.   Strategy             N. records for estimation   Corr    Mean Diff (%)   Std. Dev. Diff
3         No combine           3,467,947                   54.36   -0.229          3.279
3         Combine              3,612,549                   53.80   -0.067          3.292
5         No combine           3,249,048                   53.93   -0.222          3.290
5         Combine              3,609,384                   54.13    0.150          3.284
5         Combine, allow 2     3,613,907                   54.12    0.159          3.284
10        No combine           2,670,545                   52.67   -0.293          3.319
10        Combine              3,596,781                   54.66   -0.135          3.270
10        Combine, allow 5     3,609,273                   54.48   -0.106          3.275
20        No combine           1,905,602                   49.16   -0.181          3.400
20        Combine              3,542,053                   54.47    0.028          3.274
20        Combine, allow 10    3,595,601                   54.91    0.374          3.263

◆ Generally, combining groups resulted in higher correlations of predicted CR with bulls’ average CR in set-aside data than did using exact HYSPRs (no combining); except in the case where min. group size was 3, restricting to exact HYSPRs resulted in the loss of too many records

◆ Allowing HYs into the evaluation that had fewer than the target number of records was beneficial only when target group size was 20; considerably more records were salvaged when target group size was 20 than when it was 5 or 10

◆ In general, though, differences among the options tested were small; provided that excessive data exclusion is avoided, formation of mgt groups will not have a large impact on accuracy

◆ Combining groups to a target group size of 20 and allowing HYs in if they have a min. of 10 records maximized accuracy. The small mean difference for this option was eliminated by categorization of quantitative nuisance variables, as can be seen below (Basic Model)

Other Factors

Model                  Mean Diff (%)   Model                  Corr    Model                  MSE
Omit Cow Age           -0.011          ServN, Omit DIM        55.17   ServN, Omit DIM        3.254
Basic Model            -0.019          Lact*ServN, Omit DIM   55.11   Lact*ServN, Omit DIM   3.256
Omit DIM               -0.019          Lact*ServN and DIM     55.07   Lact*ServN and DIM     3.258
Lact*ServN, Omit DIM   -0.020          ServN and DIM          55.06   ServN and DIM          3.259
ServN, Omit DIM        -0.020          DIM*ServN              55.05   Basic Model            3.260
Omit Cow               -0.021          Basic Model            54.97   DIM*ServN              3.260
Lact*ServN and DIM     -0.022          Omit Cow               54.93   Omit Cow Age           3.263
DIM*ServN              -0.028          Omit Cow Age           54.89   Omit Cow               3.265
Omit Milk              -0.028          Omit Milk              54.72   Omit Milk              3.265
ServN and DIM          -0.042          Omit DIM               53.35   Omit DIM               3.300
Omit All               -0.257          Omit All               51.54   Omit All               3.501

◆ Models are sorted from best to worst for each statistic (mean difference, correlation, and mean square error); the model listed first was the best for that statistic and the model listed last was the poorest

◆ The model without cow age (but with Lact; see basic model in methods) had the smallest mean difference, but mean difference between bulls’ predicted CR and CR in the set-aside data was nearly 0 for all models, except when all nuisance variables were dropped from the model (Omit All)

◆ The model with service number (ServN) and without DIM maximized accuracy and minimized MSE; correlations with both in the model were lower than with just service number because these 2 variables are highly correlated; the importance of including at least one is seen from the correlation when DIM was omitted without including ServN (Omit DIM)

◆ The range in correlations and MSEs, however, was generally small, except when all nuisance variables were omitted

◆ While simple average CR was 9% lower for breedings preceded by a short breeding interval (10-17 days, min. of 10 required), they accounted for only 2.5% of all breedings and the max percentage for any one bull was 9%; thus, this variable had minimal impact overall. For bulls where these breedings accounted for at least 5% of their matings (52 out of 803), accuracy improved by 0.4% when this variable was included
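The short breeding interval variable discussed in the results can be illustrated with a small sketch that flags a breeding when the previous breeding of the same cow occurred 10 to 17 days earlier. Treating both bounds as inclusive, the function names, and the data layout are assumptions of this example rather than details taken from the poster.

```python
from datetime import date


def flag_short_intervals(breeding_dates, low=10, high=17):
    """Flag breedings preceded by a short interval.

    breeding_dates: chronologically sorted dates for ONE cow.
    Returns a list of booleans, one per breeding; the first is never flagged.
    Bounds are treated as inclusive (an assumption of this sketch).
    """
    flags = [False] * len(breeding_dates)
    for i in range(1, len(breeding_dates)):
        gap = (breeding_dates[i] - breeding_dates[i - 1]).days
        flags[i] = low <= gap <= high
    return flags


def short_interval_share(flags_by_bull):
    """Percentage of each bull's matings that followed a short interval.

    flags_by_bull: hypothetical dict mapping bull ID -> list of booleans.
    """
    return {
        bull: 100.0 * sum(flags) / len(flags)
        for bull, flags in flags_by_bull.items()
        if flags
    }


dates = [date(2005, 1, 1), date(2005, 1, 13), date(2005, 2, 3)]
print(flag_short_intervals(dates))  # gaps of 12 and 21 days
```

A per-bull share computed this way is the kind of figure behind the 5% threshold mentioned above, where including the variable made a measurable difference.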