So far...
Until we looked at factorial interactions, we were looking at differences and their significance, that is, the probability that an observed difference was due to chance.
But we had not learned anything about how two (or more) variables are related.
What is regression?
The way one variable is related to another.
As you change one, how are others affected?
[Figure: plot with axes labeled Yield and Protein %]
Types of Variables in Crop Experiments:
– Treatments, such as fertilizer rates, varieties, and weed control methods, which are the primary focus of the experiment
– Environmental factors, such as rainfall and solar radiation, which are not within the researcher's control
– Responses, which represent the biological and physical features of the experimental units that are expected to be affected by the treatments being tested
Usual associations within ANOVA...
Association between response and treatment
– when treatments are quantitative, such as fertilizer levels, it is possible to describe the association between treatment and response
– the response can then be specified not only for the treatment levels actually tested but also for all intermediate points within the range of the treatments tested
Partitioning SST into Regression Components
Agronomic experiments frequently consist of different levels of one or more quantitative variables:
– Varying amounts of fertilizer
– Several different row spacings
– Two or more depths of seeding
It would be useful to develop an equation to describe the relationship between plant response and treatment level
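Fitting the Model
[Figure: wheat yield (Y) plotted against applied N level, with observed yields Y1 to Y4 at N levels X1 to X4]
$Y = \beta_0 + \beta_1 X + \varepsilon$
where:
– $Y$ = wheat yield
– $X$ = nitrogen level
– $\beta_0$ = yield with no nitrogen
– $\beta_1$ = change in yield per unit of applied N
– $\varepsilon$ = random error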
Partitioning SST
Sums of Squares for Treatments (SST) contains:
– SSLIN = Sum of squares associated with the linear regression of Y on X
– SSLOF = Sum of squares for the failure of the regression model to describe the relationship between Y and X (lack of fit)
One way: Find a set of coefficients that define a linear contrast
– use the deviations of the treatment levels from the mean level of all treatments
– so that $k_j = X_j - \bar{X}$
Therefore
$L_{LIN} = \sum_j (X_j - \bar{X})\,\bar{Y}_j$
The sum of the coefficients will be zero, satisfying the definition of a contrast
Computing SSLIN
$SS_{LIN} = r\,L_{LIN}^2 / \sum_j (X_j - \bar{X})^2$
– really no different from any other contrast: df is always 1
SSLOF (sum of squares for lack of fit) is computed by subtraction
SSLOF = SST - SSLIN (df is df for treatments - 1)
Not to be confused with SSE, which is still the SS for pure error (experimental error)
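The partition above can be sketched in a few lines of Python. This is only an illustration: the function name `linear_partition` is hypothetical, and the data are the sugarbeet values from the numerical example in these slides, read here as treatment totals over three blocks (an assumption that reproduces the tabulated sums of squares).

```python
def linear_partition(levels, means, r):
    """Split the treatment SS into a linear part (1 df) and lack of fit."""
    t = len(levels)
    x_bar = sum(levels) / t
    y_bar = sum(means) / t
    k = [x - x_bar for x in levels]                     # k_j = X_j - X-bar, sums to zero
    l_lin = sum(kj * yj for kj, yj in zip(k, means))    # L_LIN = sum of k_j * Ybar_j
    ss_lin = r * l_lin ** 2 / sum(kj ** 2 for kj in k)  # SS_LIN, 1 df
    sst = r * sum((yj - y_bar) ** 2 for yj in means)    # treatment SS, t - 1 df
    ss_lof = sst - ss_lin                               # lack of fit, t - 2 df
    return k, ss_lin, ss_lof


levels = [0, 35, 70, 105, 140]            # kg N/ha
totals = [28.4, 66.8, 87.0, 92.0, 85.7]   # assumption: totals over r = 3 blocks
r = 3
means = [total / r for total in totals]

k, ss_lin, ss_lof = linear_partition(levels, means, r)
print(k)
print(round(ss_lin, 3), round(ss_lof, 3))
```

Under these assumptions SSLIN comes out near 651.47 and SSLOF near 262.09, in line with the sequential test table for the sugarbeet example.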
F Ratios and their meaning
All F ratios have MSE as a denominator
FT = MST/MSE tests
– significance of differences among the treatment means
FLIN = MSLIN/MSE tests
– H0: no linear relationship between X and Y ($\beta_1 = 0$)
– Ha: there is a linear relationship between X and Y ($\beta_1 \neq 0$)
FLOF = MSLOF/MSE tests
– H0: the simple linear regression model describes the data, $E(Y) = \beta_0 + \beta_1 X$
– Ha: there is significant deviation from a linear relationship between X and Y, $E(Y) \neq \beta_0 + \beta_1 X$
The linear relationship
The expected value of Y given X is described by the equation:
$\hat{Y}_j = \bar{Y} + b_1 (X_j - \bar{X})$
where:
– $\bar{Y}$ = grand mean of Y
– $X_j$ = value of X (treatment level) at which Y is estimated
– $b_1 = L_{LIN} / \sum_j (X_j - \bar{X})^2$, with $L_{LIN} = \sum_j (X_j - \bar{X})\,\bar{Y}_j$
$SS_{LIN} = r\,L_{LIN}^2 / \sum_j (X_j - \bar{X})^2$
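As a rough illustration of the prediction equation, the sketch below estimates the slope from treatment means and returns a prediction function. The name `fit_line` is hypothetical, and the data are again the sugarbeet values from these slides, treated as totals over three blocks (an assumption).

```python
def fit_line(levels, means):
    """Return (b1, predict) for Y-hat = Y-bar + b1 * (X - X-bar)."""
    x_bar = sum(levels) / len(levels)
    y_bar = sum(means) / len(means)
    k = [x - x_bar for x in levels]                   # deviations from mean level
    b1 = (sum(kj * yj for kj, yj in zip(k, means))
          / sum(kj ** 2 for kj in k))                 # b1 = L_LIN / sum of k_j^2

    def predict(x):
        return y_bar + b1 * (x - x_bar)

    return b1, predict


levels = [0, 35, 70, 105, 140]                        # kg N/ha
totals = [28.4, 66.8, 87.0, 92.0, 85.7]               # assumption: totals over 3 blocks
means = [total / 3 for total in totals]

b1, predict = fit_line(levels, means)
print(b1)            # slope per kg N/ha
print(predict(70))   # at the mean N level the prediction is the grand mean
```

A straight line is only a first step for these data; the lack-of-fit test in the numerical example suggests that a quadratic term is also needed.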
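Sources of Variation in Regression
[Figure: wheat yield (Y) plotted against applied N level (x1 to x4), showing observed yields Y1 to Y4, the grand mean $\bar{Y}$, the fitted line $\hat{Y} = \bar{Y} + b(X - \bar{X})$, and the deviation $\hat{Y}_i - \bar{Y}$]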
Orthogonal Polynomials
If the relationship is not linear, we can simplify curve fitting within the ANOVA with the use of orthogonal polynomial coefficients under these conditions:
– equal replication
– the levels of the treatment variable must be equally spaced
• e.g., 20, 40, 60, 80, 100 kg of fertilizer per plot
Curve fitting
Model: $E(Y) = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \dots$
Determine the coefficients for 2nd order and higher polynomials from a table
Use the F ratio to test the significance of each contrast.
Unless there is prior reason to believe that the equation is of a particular order, it is customary to fit the terms sequentially
Include all terms in the equation up to and including the term at which lack of fit first becomes nonsignificant
Table of coefficients
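Coefficients of this kind can also be generated rather than looked up. The sketch below is one possible approach, not necessarily how published tables were built: it applies Gram-Schmidt orthogonalization to the powers of equally spaced, centred level codes using exact fractions, then rescales each contrast to the smallest integers. The function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def _lcm(a, b):
    return a * b // gcd(a, b)


def orthogonal_poly_coefficients(t):
    """Integer orthogonal polynomial contrasts for t equally spaced levels.

    Returns t - 1 rows: linear, quadratic, cubic, ...
    """
    x = [Fraction(j) - Fraction(t - 1, 2) for j in range(t)]  # centred codes
    basis = [[Fraction(1)] * t]                               # constant column
    rows = []
    for degree in range(1, t):
        v = [xj ** degree for xj in x]
        # Remove the projection onto every lower-order column.
        for b in basis:
            coef = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
        # Rescale to the smallest set of integers.
        denom = reduce(_lcm, (vi.denominator for vi in v), 1)
        ints = [int(vi * denom) for vi in v]
        g = reduce(gcd, (abs(i) for i in ints))
        rows.append([i // g for i in ints])
    return rows


for name, row in zip(["Linear", "Quadratic", "Cubic", "Quartic"],
                     orthogonal_poly_coefficients(5)):
    print(name, row)
```

For five levels this reproduces the linear, quadratic, cubic, and quartic coefficients used in the examples in these slides.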
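Where do linear contrast coefficients come from? (revisited)
Assume 5 Nitrogen levels: 30, 60, 90, 120, 150
$\bar{x} = 90$
$k_1 = (-60, -30, 0, 30, 60)$
If we code the treatments as 1, 2, 3, 4, 5
$\bar{x} = 3$
$k_1 = (-2, -1, 0, 1, 2)$
In general, $k_1 = (X - \bar{X}) / d$, where $d$ is the spacing between successive levels (here 30)
$L_{LIN} = \sum_j (x_j - \bar{x})\,\bar{Y}_j$
$b_1 = L_{LIN} / \sum_j (x_j - \bar{x})^2$ when $L_{LIN}$ is computed from treatment means, but it must be decoded back to the original scale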
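A minimal sketch of the coding step, assuming equally spaced levels; the function name `coded_coefficients` is hypothetical.

```python
def coded_coefficients(levels):
    """Code equally spaced levels as k1 = (X - X-bar) / d."""
    x_bar = sum(levels) / len(levels)
    d = levels[1] - levels[0]          # spacing between successive levels
    return [(x - x_bar) / d for x in levels]


print(coded_coefficients([30, 60, 90, 120, 150]))
print(coded_coefficients([1, 2, 3, 4, 5]))
```

Both calls give the same coded values, which is why the coded contrast can be used whatever the actual spacing, as long as the slope is later divided by d to return to the original units.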
Consider an experiment
Five levels of N (10, 30, 50, 70, 90) with four replications
$SS_{LIN} = r\,L_{LIN}^2 / \sum_j (X_j - \bar{X})^2$
Linear contrast
– $L_{LIN} = (-2)\bar{Y}_1 + (-1)\bar{Y}_2 + (0)\bar{Y}_3 + (1)\bar{Y}_4 + (2)\bar{Y}_5$
– $SS_{LIN} = 4\,L_{LIN}^2 / 10$
Quadratic
– $L_{QUAD} = (2)\bar{Y}_1 + (-1)\bar{Y}_2 + (-2)\bar{Y}_3 + (-1)\bar{Y}_4 + (2)\bar{Y}_5$
– $SS_{QUAD} = 4\,L_{QUAD}^2 / 14$
LOF still significant? Keep going…
Cubic
– $L_{CUB} = (-1)\bar{Y}_1 + (2)\bar{Y}_2 + (0)\bar{Y}_3 + (-2)\bar{Y}_4 + (1)\bar{Y}_5$
– $SS_{CUB} = 4\,L_{CUB}^2 / 10$
Quartic
– $L_{QUAR} = (1)\bar{Y}_1 + (-4)\bar{Y}_2 + (6)\bar{Y}_3 + (-4)\bar{Y}_4 + (1)\bar{Y}_5$
– $SS_{QUAR} = 4\,L_{QUAR}^2 / 70$
Each contrast has 1 degree of freedom
Each F has MSE in denominator
Numerical Example
An experiment to determine the effect of nitrogen on the yield of sugarbeet roots:
– RBD
– three blocks
– 5 levels of N (0, 35, 70, 105, and 140 kg/ha)
Meets the criteria
– N is a quantitative variable
– levels are equally spaced
– equally replicated
Significant SST so we go to contrasts
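Orthogonal Partition of SST
N level (kg/ha)
Order          0     35     70    105    140      L_i   Σ k_j²    SS(L_i)
Total       28.4   66.8   87.0   92.0   85.7
Linear        -2     -1      0     +1     +2    46.60       10   651.4680
Quadratic     +2     -1     -2     -1     +2   -34.87       14   260.5038
Cubic         -1     +2      0     -2     +1     2.30       10     1.5870
Quartic       +1     -4     +6     -4     +1     0.30       70     0.0039
The yields are treatment totals over the three blocks; each L_i is computed from the corresponding means (total / 3), and SS(L_i) = 3 L_i² / Σ k_j².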
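The table can be checked with a short script. This is a sketch under one assumption: the yields 28.4 to 85.7 are treated as totals over the three blocks, since that reading reproduces the tabulated L_i and SS values. Variable names are illustrative.

```python
totals = [28.4, 66.8, 87.0, 92.0, 85.7]   # assumption: totals over r = 3 blocks
r = 3
means = [total / r for total in totals]

contrasts = {
    "Linear":    [-2, -1, 0, 1, 2],
    "Quadratic": [2, -1, -2, -1, 2],
    "Cubic":     [-1, 2, 0, -2, 1],
    "Quartic":   [1, -4, 6, -4, 1],
}

results = {}
for name, k in contrasts.items():
    l_i = sum(kj * yj for kj, yj in zip(k, means))   # contrast on treatment means
    divisor = sum(kj ** 2 for kj in k)               # sum of squared coefficients
    ss = r * l_i ** 2 / divisor                      # 1 df each
    results[name] = (l_i, divisor, ss)
    print(f"{name:<10}{l_i:8.2f}{divisor:5d}{ss:12.4f}")

print("Sum of SS:", round(sum(ss for _, _, ss in results.values()), 4))
```

Because the four contrasts are mutually orthogonal and use all 4 treatment df, their sums of squares add up to the treatment sum of squares.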
Sequential Test of Nitrogen Effects
Source           df         SS         MS          F
(1) Nitrogen      4   913.5627   228.3907    64.41**
(2) Linear        1   651.4680   651.4680   183.73**
    Dev (LOF)     3   262.0947    87.3649    24.64**
(3) Quadratic     1   260.5038   260.5038    73.47**
    Dev (LOF)     2     1.5909     0.7955     0.22ns
Choose a quadratic model
– First point at which the LOF is not significant
– Implies that a cubic term would not be significant
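The sequential bookkeeping can be sketched as follows. The sums of squares are taken from the table; MSE is not stated in the slides, so it is back-calculated here from the Nitrogen row (MS / F), which is an assumption and only accurate to rounding. The function name `sequential_lof` is hypothetical.

```python
def sequential_lof(ss_treat, df_treat, ss_terms, mse):
    """After adding each term, report its F and the remaining lack of fit."""
    rows = []
    ss_lof, df_lof = ss_treat, df_treat
    for name, ss in ss_terms:
        ss_lof -= ss                      # what the fitted terms leave unexplained
        df_lof -= 1                       # each polynomial term uses 1 df
        f_term = ss / mse
        f_lof = (ss_lof / df_lof) / mse   # F for deviations from the fitted model
        rows.append((name, f_term, ss_lof, df_lof, f_lof))
    return rows


mse = 228.3907 / 64.41                    # assumption: MSE recovered from MS / F
rows = sequential_lof(913.5627, 4,
                      [("Linear", 651.4680), ("Quadratic", 260.5038)], mse)
for name, f_term, ss_lof, df_lof, f_lof in rows:
    print(f"{name:<10} F = {f_term:7.2f}   LOF SS = {ss_lof:9.4f}"
          f"   df = {df_lof}   F(LOF) = {f_lof:6.2f}")
```

Each F would then be compared with the critical F value for its numerator df and the error df to decide whether to keep adding terms.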
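Regression Equation
$b_i = L_{REG} / \sum_j k_j^2$
Coefficient     b0      b1      b2
             23.99    4.66   -2.49
$\hat{Y}_j = \bar{Y} + 4.66\,k_{1j} - 2.49\,k_{2j}$
for example, at 0 kg N/ha ($k_1 = -2$, $k_2 = 2$):
$\hat{Y} = 23.99 + 4.66(-2) - 2.49(2) = 9.69$
To scale to original X values
$k_1 = \dfrac{X - \bar{X}}{d}$
$k_2 = \left(\dfrac{X - \bar{X}}{d}\right)^2 - \dfrac{t^2 - 1}{12}$
where $d$ is the spacing between levels (35) and $t$ is the number of levels (5), which gives
$\hat{Y} = 9.69 + 0.418X - 0.002X^2$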
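The decoding step amounts to substituting the expressions for k1 and k2 and collecting powers of X. The sketch below does that for the quadratic case; the function name `decode_quadratic` is hypothetical, and it assumes the quadratic coefficients need no extra scaling factor, which holds for five levels but not for every number of levels.

```python
def decode_quadratic(y_bar, b1, b2, x_bar, d, t):
    """Convert Y-hat = y_bar + b1*k1 + b2*k2 into a0 + a1*X + a2*X**2.

    Assumes k1 = (X - x_bar)/d and k2 = k1**2 - (t**2 - 1)/12.
    """
    c = (t ** 2 - 1) / 12
    a2 = b2 / d ** 2
    a1 = b1 / d - 2 * b2 * x_bar / d ** 2
    a0 = y_bar - b1 * x_bar / d + b2 * (x_bar ** 2 / d ** 2 - c)
    return a0, a1, a2


# Sugarbeet example: grand mean 23.99, b1 = 4.66, b2 = -2.49,
# mean N level 70 kg/ha, spacing 35 kg/ha, 5 levels.
a0, a1, a2 = decode_quadratic(23.99, 4.66, -2.49, x_bar=70, d=35, t=5)
print(round(a0, 2), round(a1, 3), round(a2, 3))
```

With the slide's coefficients this gives approximately 9.69, 0.418, and -0.002, matching the equation on the original scale.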
Common misuse of regression...
Broad Generalization
– Extrapolating the result of a regression line outside the range of X values tested
– Don't go beyond the highest nitrogen rate tested, for example
– Or don't generalize over all varieties when you have just tested one
Do not overinterpret higher order polynomials
– with t-1 df, they will explain all of the variation among treatments, whether there is any meaningful pattern to the data or not
Class vs nonclass variables
General linear model in matrix notation
$Y = X\beta + \varepsilon$
X is the design matrix
– Assume CRD with 3 fertilizer treatments, 2 replications

ANOVA (class variables)
      x1   x2   x3
 1     1    0    0
 1     1    0    0
 1     0    1    0
 1     0    1    0
 1     0    0    1
 1     0    0    1

Orthogonal polynomials
      L1   L2
 1    -1    1
 1    -1    1
 1     0   -2
 1     0   -2
 1     1    1
 1     1    1

Regression (continuous variables)
b0     x    x2
 1    30   900
 1    30   900
 1    60  3600
 1    60  3600
 1    90  8100
 1    90  8100
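The three design matrices can be built programmatically. The sketch below is illustrative only; the function names are hypothetical, and the levels 30, 60, 90 are read off the regression matrix above.

```python
def anova_design(t, r):
    """Intercept plus one indicator column per treatment (class variables)."""
    return [[1] + [1 if j == i else 0 for j in range(t)]
            for i in range(t) for _ in range(r)]


def orthopoly_design(contrasts, r):
    """Intercept plus one column per orthogonal polynomial contrast."""
    t = len(contrasts[0])
    return [[1] + [c[i] for c in contrasts]
            for i in range(t) for _ in range(r)]


def regression_design(levels, r, degree=2):
    """Columns 1, x, x^2, ... for a continuous predictor."""
    return [[x ** p for p in range(degree + 1)]
            for x in levels for _ in range(r)]


X_anova = anova_design(t=3, r=2)
X_poly = orthopoly_design([[-1, 0, 1], [1, -2, 1]], r=2)
X_reg = regression_design([30, 60, 90], r=2)

for row in X_anova:
    print(row)
```

In the class-variable matrix the three indicator columns add up to the intercept column, so that form carries one redundant column, whereas the polynomial and regression forms use three columns for the same three treatment means.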