
How do we find a confidence interval for the geometric mean? (rregal/documents/stat5411/5411_Sec_8.5.pdf)


Sec 8.5 Transformations

If we find that our data don't look very normal, we have a few options.

1) Use a different distribution. For example, Weibull distributions are often used to model lifetimes of machines.

2) There are "nonparametric methods" that don't rely on any assumption about the underlying distribution. These methods sacrifice some power. When a distribution can be used, you are better off doing so.

3) Re-scale (transform) the data so that the normality assumption appears to hold.

In this class, we will cover this transformation option. In many situations, a log transformation works well.

For log-transformed data it is useful to back-transform means to the original scale so readers can compare the values as originally measured. This back-transformed mean is the geometric mean.

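Example:

  Y      X = log10(Y)
  1      0
  10     1
  100    2

  Mean:  $\bar{Y} = 37$     $\bar{X} = 1$

The back-transformation from X to Y is $Y = 10^{X}$.

The arithmetic mean is $\bar{Y} = 37$.
The geometric mean is $10^{\bar{X}} = 10^{1} = 10$.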

For positive data it's always true that: geometric mean ≤ arithmetic mean, with equality only when all the values are the same.
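The example above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the notes.

```python
import math
from statistics import mean, geometric_mean

y = [1, 10, 100]                      # values from the example
x = [math.log10(v) for v in y]        # log10 scale: 0, 1, 2

arithmetic = mean(y)                  # mean in the original scale
x_bar = mean(x)                       # mean in the log10 scale
geometric = 10 ** x_bar               # back-transformed mean

print("arithmetic mean:", arithmetic)
print("mean of log10(Y):", x_bar)
print("geometric mean (back-transformed):", geometric)
print("geometric mean (statistics module):", geometric_mean(y))
```

The back-transformed log-scale mean and `statistics.geometric_mean` agree up to floating-point rounding, and both are well below the arithmetic mean because the largest value dominates the arithmetic mean.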

A random variable, Y, such that log(Y) or ln(Y) is normal is called a lognormal random variable.

If $\ln(Y) = X \sim N(\mu, \sigma^2)$, then the mean of Y is $E(Y) = e^{\mu + \frac{1}{2}\sigma^2}$.

$e^{\mu}$ is the population geometric mean of Y.
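To see the difference between the lognormal mean and the geometric mean numerically, here is a small simulation sketch. The values of `mu` and `sigma`, the seed, and the sample size are arbitrary choices for illustration, not values from the notes.

```python
import math
import random

mu, sigma = 1.0, 0.5                  # illustrative parameters of ln(Y)

pop_geometric_mean = math.exp(mu)               # e^mu
pop_mean = math.exp(mu + sigma ** 2 / 2)        # e^(mu + sigma^2 / 2)

random.seed(0)                        # fixed seed so the run is repeatable
sample = [random.lognormvariate(mu, sigma) for _ in range(200_000)]

sample_mean = sum(sample) / len(sample)
sample_geometric_mean = math.exp(sum(math.log(v) for v in sample) / len(sample))

print("population mean      :", pop_mean, " sample:", sample_mean)
print("population geom. mean:", pop_geometric_mean, " sample:", sample_geometric_mean)
```

With a large sample, the sample mean should land close to $e^{\mu + \sigma^2/2}$ and the back-transformed mean of the logs close to $e^{\mu}$, which is always the smaller of the two when $\sigma > 0$.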


How do we find a confidence interval for the geometric mean?

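Using the X = ln(Y) values, we find a 95% confidence interval for $\mu$, $(\hat{\mu}_L, \hat{\mu}_U)$, where s is the sample standard deviation of the X values:

$\hat{\mu}_L = \bar{X} - t\,\frac{s}{\sqrt{n}}$,   $\hat{\mu}_U = \bar{X} + t\,\frac{s}{\sqrt{n}}$

$P(\hat{\mu}_L < \mu < \hat{\mu}_U) = 0.95$

$P(e^{\hat{\mu}_L} < e^{\mu} < e^{\hat{\mu}_U}) = 0.95$   (confidence interval for the geometric mean)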
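The steps above can be sketched in Python as follows. The function name `geometric_mean_ci` is hypothetical, and the t critical value is passed in by hand (here the 0.975 quantile with 2 degrees of freedom, read from a t table) because the standard library has no t distribution. The data reuse the three values from the earlier example, so the interval is very wide.

```python
import math
from statistics import mean, stdev

def geometric_mean_ci(y, t_crit):
    """Back-transformed CI for the geometric mean (hypothetical helper)."""
    x = [math.log(v) for v in y]                  # work in the ln scale
    n = len(x)
    x_bar = mean(x)
    half_width = t_crit * stdev(x) / math.sqrt(n) # t * s / sqrt(n)
    mu_lo, mu_hi = x_bar - half_width, x_bar + half_width
    # Back-transform the estimate and both endpoints to the original scale.
    return math.exp(mu_lo), math.exp(x_bar), math.exp(mu_hi)

y = [1, 10, 100]
T_975_DF2 = 4.303                     # assumed t-table value, df = n - 1 = 2
lo, gm, hi = geometric_mean_ci(y, T_975_DF2)
print("geometric mean:", gm)
print("95% CI:", (lo, hi))
```

The interval is symmetric around $\bar{X}$ in the ln scale, so after back-transforming it is symmetric in a multiplicative sense: the geometric mean divided by the lower endpoint equals the upper endpoint divided by the geometric mean.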

Comparing 2 groups with log transformation

$P(\bar{X}_1 - \bar{X}_2 - t\,SE_{\bar{X}_1 - \bar{X}_2} < \mu_1 - \mu_2 < \bar{X}_1 - \bar{X}_2 + t\,SE_{\bar{X}_1 - \bar{X}_2}) = 0.95$

$P\left(e^{\bar{X}_1 - \bar{X}_2 - t\,SE_{\bar{X}_1 - \bar{X}_2}} < \frac{e^{\mu_1}}{e^{\mu_2}} < e^{\bar{X}_1 - \bar{X}_2 + t\,SE_{\bar{X}_1 - \bar{X}_2}}\right) = 0.95$

Backtransforming confidence intervals for the difference in means in the log scale gives a confidence interval for the ratio of the geometric means. For $\sigma_1^2 = \sigma_2^2$, the ratio of the population geometric means is also the ratio of the means, so the backtransformed log-scale confidence interval is an interval for the ratio of the means in the original scale.
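A sketch of the two-group calculation is given below. The notes do not say how the standard error is computed, so this example assumes a pooled (equal-variance) standard error; the function name `ratio_ci`, the two data sets, and the hand-entered t critical value (0.975 quantile, 8 degrees of freedom, from a t table) are all illustrative assumptions.

```python
import math
from statistics import mean, variance

def ratio_ci(y1, y2, t_crit):
    """CI for the ratio of geometric means, assuming a pooled SE (hypothetical helper)."""
    x1 = [math.log(v) for v in y1]
    x2 = [math.log(v) for v in y2]
    n1, n2 = len(x1), len(x2)
    # Pooled variance of the log-scale values (equal-variance assumption).
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean(x1) - mean(x2)
    # Back-transform the difference and its endpoints to get a ratio.
    return math.exp(diff - t_crit * se), math.exp(diff), math.exp(diff + t_crit * se)

group1 = [12, 15, 20, 26, 31]         # made-up data
group2 = [5, 7, 8, 11, 14]            # made-up data
T_975_DF8 = 2.306                     # assumed t-table value, df = n1 + n2 - 2 = 8
lo, ratio, hi = ratio_ci(group1, group2, T_975_DF8)
print("ratio of geometric means:", ratio)
print("95% CI for the ratio:", (lo, hi))
```

An interval for the ratio that excludes 1 corresponds to a log-scale interval for the difference that excludes 0.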

We just backtransform the endpoints of the confidence interval in the ln scale.

Or we could find a confidence interval for the mean of $\log_{10}(Y)$ and then find $(10^{\hat{\mu}_L}, 10^{\hat{\mu}_U})$.

Note: This is a confidence interval for the geometric mean of Y, not the mean of Y.

If $\ln(Y) \sim N(\mu, \sigma^2)$ and $\sigma^2$ is small, then the coefficient of variation, CV, for Y in the original scale is approximately $\sigma \times 100\%$.

The coefficient of variation is defined as
CV = (Standard Deviation / Mean) × 100%
CV = relative variation

For example, if ln(Y) has standard deviation 0.01, then the coefficient of variation of Y is about 0.01 × 100 = 1%.
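To see how good the approximation is, the exact lognormal coefficient of variation, $\sqrt{e^{\sigma^2} - 1}$ (a standard lognormal result, not stated in the notes), can be compared with $\sigma$. The function name and the chosen values of `sigma` are illustrative.

```python
import math

def lognormal_cv(sigma):
    """Exact CV of Y when ln(Y) is normal with standard deviation sigma."""
    return math.sqrt(math.expm1(sigma ** 2))      # sqrt(e^(sigma^2) - 1)

for sigma in (0.01, 0.1, 0.5, 1.0):
    exact = lognormal_cv(sigma)
    print(f"sigma = {sigma:<5} approx CV = {sigma:.4f}  exact CV = {exact:.4f}")
```

For small `sigma` the two columns are nearly identical; as `sigma` grows, the exact CV pulls away from `sigma`, which is why the notes restrict the approximation to small $\sigma^2$.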

Standard deviations in the ln scale represent relative variation in the original scale. So if populations have similar coefficients of variation, then ln(Y) or log(Y) will have similar variances.

It's not uncommon for treatment groups with bigger values to have more variation than treatment groups with smaller values. But often the groups have similar relative variation (similar CVs). In these situations, transforming to a logarithmic scale can put us into the usual ANOVA situation with equal variances.
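A toy illustration of this point: the two made-up groups below have the same relative variation because the second is just the first multiplied by 10. Their standard deviations differ by a factor of 10 in the original scale but match in the ln scale.

```python
import math
from statistics import stdev

low = [8, 10, 12, 9, 11]              # made-up group with small values
high = [10 * v for v in low]          # same relative spread, 10x larger values

log_low = [math.log(v) for v in low]
log_high = [math.log(v) for v in high]

print("SD original scale:", stdev(low), stdev(high))
print("SD ln scale      :", stdev(log_low), stdev(log_high))
```

Multiplying by a constant only shifts the logs by ln(10), so the log-scale spread is unchanged. Real groups will not be exact multiples of each other, but similar CVs lead to similar log-scale variances in the same way.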


In other situations, different transformations help.

$X = 1/Y$

Sometimes Y = time

X = rate = $1/Y$

Y ~ Poisson

Y = number of random events in time or space

$\sqrt{Y}$ makes variances nearly equal as long as $\mu_Y$ isn't too small
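The variance-stabilizing effect of the square root can be checked by computing $\mathrm{Var}(\sqrt{Y})$ exactly from the Poisson probabilities. This is a sketch with hypothetical helper names; the sum is truncated far out in the upper tail, and the means tried are arbitrary.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu), computed on the log scale for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def var_sqrt_poisson(mu):
    """Var(sqrt(Y)) = E[Y] - (E[sqrt(Y)])^2, summed over a truncated range."""
    k_max = int(mu + 20 * math.sqrt(mu) + 50)     # far beyond the bulk of the pmf
    probs = [poisson_pmf(k, mu) for k in range(k_max + 1)]
    mean_sqrt = sum(math.sqrt(k) * p for k, p in enumerate(probs))
    mean_y = sum(k * p for k, p in enumerate(probs))
    return mean_y - mean_sqrt ** 2

for mu in (1, 5, 20, 100):
    print(f"mu = {mu:<4} Var(Y) = {mu:<4} Var(sqrt(Y)) = {var_sqrt_poisson(mu):.4f}")
```

The variance of Y itself equals the mean, so it changes by a factor of 100 across these cases, while the variance of $\sqrt{Y}$ settles near 1/4 once the mean is moderately large. For a small mean such as 1 it is noticeably further from 1/4.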

Y ~ Binomial

Y = number of successes in n independent trials

$\arcsin\sqrt{Y/n}$, the arcsine of the square root of the observed proportion, makes variances nearly equal as long as the probability of success isn't too close to 0 or 1.
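A similar exact check works for the binomial case, applying the arcsine square root to the observed proportion Y/n. The helper name, `n = 50`, and the success probabilities are illustrative choices; the comparison value 1/(4n) is the usual large-sample approximation for this transformation.

```python
import math

def var_arcsine(n, p):
    """Exact Var(arcsin(sqrt(Y/n))) for Y ~ Binomial(n, p)."""
    probs = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    vals = [math.asin(math.sqrt(k / n)) for k in range(n + 1)]
    m1 = sum(v * q for v, q in zip(vals, probs))
    m2 = sum(v * v * q for v, q in zip(vals, probs))
    return m2 - m1 ** 2

n = 50
for p in (0.2, 0.5, 0.8):
    raw = p * (1 - p) / n             # variance of the untransformed proportion
    print(f"p = {p}  Var(Y/n) = {raw:.5f}  Var(arcsin sqrt) = {var_arcsine(n, p):.5f}"
          f"  1/(4n) = {1 / (4 * n):.5f}")
```

The variance of the raw proportion depends on p, while the transformed variance stays close to 1/(4n) for all three values of p.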

$1/X$ and $\sqrt{X}$ are special cases of power functions: $1/X = X^{-1}$ and $\sqrt{X} = X^{1/2}$.

The ln(X) function is related to a power function by

$\lim_{\lambda \to 0} \frac{X^{\lambda} - 1}{\lambda} = \ln(X)$

Power transformations of the form $\frac{X^{\lambda} - 1}{\lambda}$ are called Box-Cox transformations.
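A short sketch of the Box-Cox family, showing numerically that the transformation approaches ln(X) as $\lambda$ shrinks toward 0. The function name `box_cox` and the test value `x = 5.0` are illustrative.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform (x^lam - 1) / lam, with ln(x) as the lam = 0 case."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

x = 5.0
for lam in (1, 0.5, 0.1, 0.01, 0.0001, 0):
    print(f"lambda = {lam:<7} transformed value = {box_cox(x, lam):.6f}")
print("ln(x) =", math.log(x))
```

Apart from a shift and rescaling, $\lambda = 1$ leaves the data unchanged, $\lambda = 1/2$ corresponds to the square root, and $\lambda = -1$ corresponds to the reciprocal, so one parameter covers all of the power transformations mentioned above.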

SAS procedure TRANSREG and other software packages allow you to see how different power transformations, including a logarithmic transformation, perform in giving us normally distributed data with equal variances.