A first order model with one binary and one quantitative predictor variable

Examples of binary predictor variables

• Gender (male, female)

• Smoking status (smoker, nonsmoker)

• Treatment (yes, no)

• Health status (diseased, healthy)

On average, do smoking mothers have babies with lower birth weight?

• Random sample of n = 32 births.

• y = birth weight of baby (in grams)

• x1 = length of gestation (in weeks)

• x2 = smoking status of mother (yes, no)

Coding the binary (two-group qualitative) predictor

• Using a (0,1) indicator variable:
  – xi2 = 1, if mother smokes
  – xi2 = 0, if mother does not smoke

• Other terms used:
  – dummy variable
  – binary variable
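A minimal Python sketch of the (0,1) coding. The `smoking_status` list and the helper name `to_indicator` are hypothetical, made up for illustration; they are not the study data.

```python
def to_indicator(status, level_coded_one="yes"):
    """Return 1 if `status` equals the level coded as 1, else 0."""
    return 1 if status == level_coded_one else 0

# Hypothetical smoking statuses for five mothers (illustrative only).
smoking_status = ["yes", "no", "no", "yes", "no"]

# x2 = 1 if the mother smokes, x2 = 0 if she does not.
x2 = [to_indicator(s) for s in smoking_status]
print(x2)  # [1, 0, 0, 1, 0]
```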

On average, do smoking mothers have babies with lower birth weight?

[Figure: scatterplot of Weight (grams) versus Gestation (weeks), with points marked by smoking status (0, 1).]

A first order model with one binary and one quantitative predictor

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$

where …

• yi is birth weight of baby i

• xi1 is length of gestation of baby i

• xi2 = 1, if mother smokes and xi2 = 0, if not

and … the independent error terms $\varepsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

An indicator variable for 2 groups yields 2 response functions

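$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$

If mother is a smoker (xi2 = 1):

$\mu_{Y \mid x_2 = 1} = (\beta_0 + \beta_2) + \beta_1 x_{i1}$

If mother is a nonsmoker (xi2 = 0):

$\mu_{Y \mid x_2 = 0} = \beta_0 + \beta_1 x_{i1}$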

Interpretation of the regression coefficients

$\beta_1$ represents the change in the mean response μY for each additional unit increase in the quantitative predictor x1 … for both groups.

$\beta_2$ represents how much higher (or lower) the mean response function for the second group (xi2 = 1) is than the one for the first group (xi2 = 0) … for any value of x1.
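The interpretation of the indicator's coefficient follows from subtracting the two response functions at the same value of x1. A short derivation under the (0,1) coding defined earlier:

```latex
\begin{align*}
\mu_{Y \mid x_2 = 1} - \mu_{Y \mid x_2 = 0}
  &= \bigl[(\beta_0 + \beta_2) + \beta_1 x_1\bigr] - \bigl[\beta_0 + \beta_1 x_1\bigr] \\
  &= \beta_2 .
\end{align*}
```

The x1 terms cancel, so the gap between the two parallel lines is the same at every gestation length.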

The estimated regression function

[Figure: scatterplot of Weight (grams) versus Gestation (weeks) by smoking status (0, 1), with the two fitted lines.]

The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

Nonsmokers (0): $\hat{y} = -2390 + 143x$

Smokers (1): $\hat{y} = -2635 + 143x$
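The two fitted lines come from substituting the indicator values into the single estimated equation. A worked substitution using the rounded coefficients from the regression equation above:

```latex
\begin{align*}
\text{Nonsmokers } (x_2 = 0):\quad \hat{y} &= -2390 + 143x - 245(0) = -2390 + 143x \\
\text{Smokers } (x_2 = 1):\quad \hat{y} &= -2390 + 143x - 245(1) = -2635 + 143x
\end{align*}
```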

The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

Predictor     Coef   SE Coef      T      P
Constant   -2389.6     349.2  -6.84  0.000
Gest       143.100     9.128  15.68  0.000
Smoking    -244.54     41.98  -5.83  0.000

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%

A significant difference in mean birth weights for the two groups?

$\mu_{Y \mid x_2 = 1} = (\beta_0 + \beta_2) + \beta_1 x_{i1}$

$\mu_{Y \mid x_2 = 0} = \beta_0 + \beta_1 x_{i1}$

Why not instead fit two separate regression functions?

One for the smokers and one for the nonsmokers?

Using indicator variable, fitting one function to 32 data points

The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

Predictor     Coef   SE Coef      T      P
Constant   -2389.6     349.2  -6.84  0.000
Gest       143.100     9.128  15.68  0.000
Smoking    -244.54     41.98  -5.83  0.000

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%

Using indicator variable, fitting one function to 32 data points

Predicted Values for New Observations
New Obs     Fit  SE Fit      95.0% CI            95.0% PI
1        2803.7    30.8  (2740.6, 2866.8)  (2559.1, 3048.3)
2        3048.2    28.9  (2989.1, 3107.4)  (2804.7, 3291.8)

Values of Predictors for New Observations
New Obs  Gest  Smoking
1        38.0     1.00
2        38.0     0.00
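The Fit column can be reproduced by plugging the predictor values into the estimated equation. A minimal Python sketch, with the coefficients copied from the regression output; the function name `fitted_weight` is hypothetical.

```python
# Estimated coefficients copied from the pooled regression output.
b0, b_gest, b_smoking = -2389.6, 143.100, -244.54

def fitted_weight(gest_weeks, smokes):
    """Estimated mean birth weight (grams); smokes is 1 or 0."""
    return b0 + b_gest * gest_weeks + b_smoking * smokes

fit_smoker = fitted_weight(38.0, 1)
fit_nonsmoker = fitted_weight(38.0, 0)

print(round(fit_smoker, 1))                  # 2803.7
print(round(fit_nonsmoker, 1))               # 3048.2
print(round(fit_smoker - fit_nonsmoker, 2))  # -244.54
```

At the same gestation length the two fits differ by exactly the Smoking coefficient.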

Fitting function to 16 nonsmokers

The regression equation is
Weight = - 2546 + 147 Gest

Predictor     Coef  SE Coef      T      P
Constant   -2546.1    457.3  -5.57  0.000
Gest        147.21    11.97  12.29  0.000

S = 106.9   R-Sq = 91.5%   R-Sq(adj) = 90.9%

Fitting function to 16 nonsmokers

Predicted Values for New Observations
New Obs     Fit  SE Fit      95.0% CI            95.0% PI
1        3047.7    26.8  (2990.3, 3105.2)  (2811.3, 3284.2)

Values of Predictors for New Observations
New Obs  Gest
1        38.0

Fitting function to 16 smokers

The regression equation is
Weight = - 2475 + 139 Gest

Predictor     Coef  SE Coef      T      P
Constant   -2474.6    554.0  -4.47  0.001
Gest        139.03    14.11   9.85  0.000

S = 126.6   R-Sq = 87.4%   R-Sq(adj) = 86.5%

Fitting function to 16 smokers

Predicted Values for New Observations
New Obs     Fit  SE Fit      95.0% CI            95.0% PI
1        2808.5    35.8  (2731.7, 2885.3)  (2526.4, 3090.7)

Values of Predictors for New Observations
New Obs  Gest
1        38.0

Summary table

Model estimated using…   SE(Gest)   Length of CI for μY
32 data points           9.128      (NS) 118.3
                                    (S)  126.2
16 nonsmokers            11.97      114.9
16 smokers               14.11      153.6
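The interval lengths in the summary table are the upper limit minus the lower limit of each 95% CI reported at Gest = 38. A short Python check, with the limits copied from the "Predicted Values for New Observations" outputs; the dictionary labels are just illustrative names.

```python
# 95% confidence intervals for the mean response at Gest = 38,
# copied from the prediction outputs of the three fitted models.
intervals = {
    "32 data points (NS)": (2989.1, 3107.4),
    "32 data points (S)": (2740.6, 2866.8),
    "16 nonsmokers": (2990.3, 3105.2),
    "16 smokers": (2731.7, 2885.3),
}

# Length of each interval, rounded to one decimal like the output.
lengths = {label: round(upper - lower, 1)
           for label, (lower, upper) in intervals.items()}

for label, length in lengths.items():
    print(f"{label}: {length}")
```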

Reasons to “pool” the data and to fit one regression function

• Model assumes equal slopes for the groups and equal variances for all error terms.

• It makes sense to use all of the data to estimate these quantities.

• More degrees of freedom associated with MSE, so confidence intervals that are a function of MSE tend to be narrower.
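A small sketch of the degrees-of-freedom comparison, assuming the usual error degrees of freedom n − p, where p counts the estimated regression coefficients including the intercept. The helper name `error_df` is hypothetical.

```python
def error_df(n_obs, n_coefficients):
    """Error degrees of freedom for MSE: observations minus estimated coefficients."""
    return n_obs - n_coefficients

# Pooled model: intercept, Gest, and Smoking fitted to all 32 births.
df_pooled = error_df(32, 3)

# Separate models: intercept and Gest fitted to 16 births each.
df_separate = error_df(16, 2)

print(df_pooled, df_separate)  # 29 14
```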

How to answer the research question using one regression function?

The regression equation is
Weight = - 2390 + 143 Gest - 245 Smoking

Predictor     Coef   SE Coef      T      P
Constant   -2389.6     349.2  -6.84  0.000
Gest       143.100     9.128  15.68  0.000
Smoking    -244.54     41.98  -5.83  0.000

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%
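With one pooled function, the research question reduces to testing whether the Smoking coefficient is zero (H0: β2 = 0). A minimal Python sketch that reproduces the t statistic from the output; the variable names are illustrative, and the P-value is read from the output rather than recomputed here.

```python
coef_smoking = -244.54   # estimated Smoking coefficient from the output
se_smoking = 41.98       # its standard error from the output

# t statistic for H0: beta2 = 0 is the estimate divided by its standard error.
t_stat = coef_smoking / se_smoking

print(round(t_stat, 2))  # -5.83
```

The reported P-value of 0.000 for this statistic points to a significant difference in mean birth weight between the two groups at a given gestation length, and the negative sign means the estimated mean is lower for smokers.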

How to answer the research question using two regression functions?

Nonsmokers

The regression equation is
Weight = - 2546 + 147 Gest

Predictor     Coef  SE Coef      T      P
Constant   -2546.1    457.3  -5.57  0.000
Gest        147.21    11.97  12.29  0.000

Smokers

The regression equation is
Weight = - 2475 + 139 Gest

Predictor     Coef  SE Coef      T      P
Constant   -2474.6    554.0  -4.47  0.001
Gest        139.03    14.11   9.85  0.000
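With two separate fits, no single coefficient measures the smoking effect. The estimated slopes differ (147.21 versus 139.03), so the gap between the two fitted lines changes with gestation length. A Python sketch of that gap, using the coefficients from the two outputs; the function names are hypothetical.

```python
# Coefficients copied from the two separate regression outputs.
nonsmokers = {"intercept": -2546.1, "slope": 147.21}
smokers = {"intercept": -2474.6, "slope": 139.03}

def fitted(model, gest_weeks):
    """Estimated mean birth weight (grams) from one group's fitted line."""
    return model["intercept"] + model["slope"] * gest_weeks

def gap(gest_weeks):
    """Smokers minus nonsmokers at a given gestation length."""
    return fitted(smokers, gest_weeks) - fitted(nonsmokers, gest_weeks)

# The gap is not constant because the two slopes are not equal.
for weeks in (34, 38, 42):
    print(weeks, round(gap(weeks), 1))
```

The separate outputs also give no standard error for this gap, which is part of why the pooled model answers the question more directly.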

Reasons to “pool” the data and to fit one regression function

• It allows you to easily answer research questions concerning the binary predictor variable.

What if we instead tried to use two indicator variables?

One variable for smokers and one variable for nonsmokers?

Definition of two indicator variables – one for each group

• Using a (0,1) indicator variable for smokers:
  – xi2 = 1, if mother smokes
  – xi2 = 0, if mother does not smoke

• Using a (0,1) indicator variable for nonsmokers:
  – xi3 = 1, if mother does not smoke
  – xi3 = 0, if mother smokes

The modified regression function with two binary predictors

$\mu_Y = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}$

where …

• μY is mean birth weight for given predictors

• xi1 is length of gestation of baby i

• xi2 = 1, if mother smokes and xi2 = 0, if not

• xi3 = 1, if mother does not smoke and xi3 = 0, if she smokes

Implication on data analysis

Regression Analysis: Weight versus Gest, x2*, x3*

* x3* is highly correlated with other X variables
* x3* has been removed from the equation

The regression equation is
Weight = - 2390 + 143 Gest - 245 x2*

Predictor     Coef   SE Coef      T      P
Constant   -2389.6     349.2  -6.84  0.000
Gest       143.100     9.128  15.68  0.000
x2*        -244.54     41.98  -5.83  0.000

S = 115.5   R-Sq = 89.6%   R-Sq(adj) = 88.9%
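The software drops x3* because the two indicators always sum to 1, which is exactly the column of ones used for the constant term. With that exact linear dependence, least squares has no unique solution. A minimal Python sketch of the dependence; the `smokes` values are hypothetical, not the study data.

```python
# Hypothetical smoking statuses (1 = smokes, 0 = does not); not the study data.
smokes = [1, 0, 0, 1, 1, 0]

x2 = smokes                      # indicator for smokers
x3 = [1 - s for s in smokes]     # indicator for nonsmokers
intercept = [1] * len(smokes)    # column of ones for the constant term

# Every row satisfies x2 + x3 = 1, so the intercept column is an exact
# linear combination of the two indicator columns.
is_redundant = all(a + b == c for a, b, c in zip(x2, x3, intercept))
print(is_redundant)  # True
```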

To prevent problems with the data analysis

• A qualitative variable with c groups should be represented by c-1 indicator variables, each taking on values 0 and 1.
  – 2 groups, 1 indicator variable
  – 3 groups, 2 indicator variables
  – 4 groups, 3 indicator variables
  – and so on…
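A Python sketch of the c-1 rule, assuming one group is chosen as the reference level and left without an indicator. The function name `indicator_columns` and the three-group `treatment` variable are hypothetical.

```python
def indicator_columns(values, reference):
    """Build c-1 (0,1) indicator columns, omitting the `reference` level."""
    levels = sorted(set(values))
    kept = [level for level in levels if level != reference]
    return {level: [1 if v == level else 0 for v in values] for level in kept}

# Hypothetical three-group variable (illustrative only).
treatment = ["A", "B", "C", "A", "C"]
columns = indicator_columns(treatment, reference="A")

# Three groups give two indicator columns; group "A" is all zeros in both.
print(columns)  # {'B': [0, 1, 0, 0, 0], 'C': [0, 0, 1, 0, 1]}
```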

What is the impact of using a different coding scheme?

… such as (1, -1) coding?

The regression model defined using (1, -1) coding scheme

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$

where …

• yi is birth weight of baby i

• xi1 is length of gestation of baby i

• xi2 = 1, if mother smokes and xi2 = -1, if not

and … the independent error terms $\varepsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.

The regression model yields 2 different response functions

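$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$

If mother is a smoker (xi2 = 1):

$\mu_Y = (\beta_0 + \beta_2) + \beta_1 x_{i1}$

If mother is a nonsmoker (xi2 = -1):

$\mu_Y = (\beta_0 - \beta_2) + \beta_1 x_{i1}$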
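Under the (1, -1) coding the two intercepts are β0 + β2 and β0 − β2. A short derivation of what that implies for the gap between the groups and for the meaning of β0:

```latex
\begin{align*}
\mu_{Y \mid \text{smoker}} - \mu_{Y \mid \text{nonsmoker}}
  &= \bigl[(\beta_0 + \beta_2) + \beta_1 x_1\bigr]
   - \bigl[(\beta_0 - \beta_2) + \beta_1 x_1\bigr] = 2\beta_2 \\
\tfrac{1}{2}\bigl[(\beta_0 + \beta_2) + (\beta_0 - \beta_2)\bigr]
  &= \beta_0
\end{align*}
```

So the difference between the groups is twice β2, and β0 is the midpoint of the two group intercepts.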

Interpretation of the regression coefficients

$\beta_1$ represents the change in the mean response μY for each additional unit increase in the quantitative predictor x1 … for both groups.

$\beta_0$ represents the “average” intercept

$\beta_2$ represents how far each group is “offset” from the “average”

The estimated regression function

[Figure: scatterplot of Weight (grams) versus Gestation (weeks) by smoking status (-1, 1), with the two fitted lines.]

The regression equation is
Weight = - 2512 + 143 Gest - 122 Smoking2

Nonsmokers (-1): $\hat{y} = -2390 + 143x$

Smokers (1): $\hat{y} = -2635 + 143x$
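Both coding schemes describe the same pair of fitted lines, so the (1, -1) estimates can be recovered from the (0,1) estimates. A Python sketch of that conversion, assuming Smoking2 is the (1, -1) coded version of Smoking; the variable names are illustrative.

```python
# Estimates under (0,1) coding, copied from the earlier regression output.
b0_01, b_smoking_01 = -2389.6, -244.54

# The two group intercepts implied by the (0,1) fit.
intercept_nonsmoker = b0_01
intercept_smoker = b0_01 + b_smoking_01

# Under (1,-1) coding the intercepts are b0 + b2 and b0 - b2,
# so b0 is their average and b2 is half their difference.
b0_pm = (intercept_smoker + intercept_nonsmoker) / 2
b2_pm = (intercept_smoker - intercept_nonsmoker) / 2

print(round(b0_pm), round(b2_pm))  # -2512 -122
```

The Gest coefficient is unchanged by the recoding, since both schemes assume a common slope.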

What is the impact of using a different coding scheme?

• Interpretation of regression coefficients changes.

• When reporting your results, make sure you explain what coding scheme was used!

• When interpreting others’ results, make sure you know what coding scheme was used!