BIOL2300 Biostatistics Chapter 7 Point estimates, confidence intervals and minimum sample size for proportion, mean and variance

BIOL2300 Biostatistics Chapter 7 - Boston College

Inferential statistics

• estimate a population parameter (proportion, mean, variance)
  – confidence interval for point estimate
• hypothesis testing
  – H1: patients taking Vioxx (Merck) have MORE heart attacks and strokes than those not taking Vioxx
  – H0: NULL hypothesis, EQUAL number of heart attacks and strokes in control and treatment groups

Merck

• The arthritis medication Vioxx was removed from the US market on Sep 30, 2004, after data from a clinical trial showed an increased risk of heart attack, stroke, blood clots and other cardiovascular illnesses.
• FDA announced in 2004 that patients taking Vioxx have a 50 percent greater chance of heart attacks and sudden cardiac death, and that patients taking the highest recommended daily dosage of Vioxx had a 300 percent greater chance of heart attack and sudden cardiac death.
  – source of information for this paragraph: http://www.yourlawyer.com/topics/overview/vioxx

ESTIMATING PROPORTIONS

Requirements to check before approximating population proportion

• Simple random sample
• Sampling with replacement (binomial)
  – fixed number of trials
  – trials independent
  – 2 outcomes: success, failure
• Conditions allowing approximation of binomial by normal are satisfied (np and nq both sufficiently large; a commonly used rule is np ≥ 5 and nq ≥ 5), where p, q are estimated by p-hat (resp. q-hat), the proportion of successes (resp. failures) in the sample

• point estimate: a single value used as approximation for a population parameter
• confidence interval: a real interval (p-hat − E, p-hat + E), where E is the margin of error, such that we are 1-α confident that p lies in that interval

Rationale for confidence interval

Warning: Take care with proper interpretation of confidence intervals

• NOT that the probability that p lies in a given interval (a,b) is 95% (1 − 5%, where α equals 5%), since p is a fixed constant and either belongs to (a,b) or not.
• Proper interpretation: if one were to take ALL samples of size n, compute the corresponding p-hat for each sample and test whether p belongs to the interval (p-hat − E, p-hat + E) built from that sample, then p would belong to the interval for approximately 95% of the samples.
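The repeated-sampling interpretation can be checked by simulation. The sketch below is only an illustration: the true proportion 0.3, the sample size 200, the number of simulated samples and the function names are all assumptions chosen for the demo, and the interval used is the usual normal-approximation interval p-hat ± z·sqrt(p-hat·q-hat/n).

```python
import random
from statistics import NormalDist


def normal_approx_interval(successes, n, confidence=0.95):
    """Interval p-hat ± z * sqrt(p-hat * q-hat / n) from one sample."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, num_samples, seed=0):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        # One sample: n independent trials with success probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = normal_approx_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / num_samples


# Hypothetical settings: p is fixed, only the samples (and intervals) vary.
fraction_covering = coverage(true_p=0.3, n=200, num_samples=2000)
print(fraction_covering)
```

The printed fraction should land near 0.95: it is the intervals that vary from sample to sample, while p stays fixed.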

Critical value

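• Critical value for a 1-α confidence interval is the z-score z(α/2) such that P(Z > z(α/2)) = α/2 for a standard normal variable Z, i.e. the area under the standard normal curve between −z(α/2) and z(α/2) is 1-α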
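As a small sketch, the critical value can be computed with Python's standard library instead of a table. The function name `z_critical` is an assumption made for this example; `NormalDist().inv_cdf` plays the role of Excel's inverse normal function.

```python
from statistics import NormalDist


def z_critical(alpha):
    """z-score with right-tail area alpha/2 under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)


z_95 = z_critical(0.05)  # critical value for a 95% confidence interval
z_99 = z_critical(0.01)  # critical value for a 99% confidence interval
print(round(z_95, 3), round(z_99, 3))  # about 1.96 and 2.576
```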

How to find 95% confidence interval for point estimator of proportion: Margin of Error

• Recall that for the proportion p-hat of successes in n independent Bernoulli trials, we have mean p and standard deviation sqrt(pq/n)

Margin of error E for estimate of proportion for 1-α confidence interval

Why is E so defined?

Sample size for estimating proportion p with confidence 1-α

Minimum sample size is independent of population size

• Previous computation of minimum sample size depends on probability p of success, q of failure, margin of error E, and population standard deviation, but NOT on population size.
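A minimal sketch of the minimum-sample-size computation for a proportion, assuming the standard formula n = z²·p-hat·q-hat / E², with p-hat·q-hat replaced by its largest possible value 0.25 when no prior estimate is available. The function name and the example margin of 3 percentage points are assumptions for illustration. Note that the population size does not appear anywhere.

```python
import math
from statistics import NormalDist


def min_sample_size_proportion(margin_of_error, confidence=0.95, p_hat=None):
    """Smallest n with z * sqrt(p*q/n) <= margin_of_error, rounded UP."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Without a prior estimate, p*q is at most 0.25 (reached at p = q = 0.5).
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return math.ceil(z * z * pq / margin_of_error ** 2)


# Hypothetical request: 95% confidence, margin of error 0.03, p unknown.
n_needed = min_sample_size_proportion(0.03)
print(n_needed)
```

With these inputs the unrounded value is about 1067.07, so rounding up gives 1068.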

Example

Problem on proportion estimation

• The drug Eliquis (apixaban) is used to help prevent blood clots in certain patients. In clinical trials, among 5924 patients treated with Eliquis, 153 developed the adverse reaction of nausea (based on data from Bristol-Myers Squibb Co.). Construct a 99% confidence interval for the proportion of adverse reactions.
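One way to work this problem, sketched in Python rather than Excel and assuming the normal-approximation interval p-hat ± z·sqrt(p-hat·q-hat/n) is appropriate here (both the number of successes and the number of failures are far above 5). The variable names are chosen for this example only.

```python
from statistics import NormalDist

treated = 5924   # patients treated with Eliquis (from the problem)
nausea = 153     # patients who developed nausea (from the problem)
confidence = 0.99

p_hat = nausea / treated                            # point estimate
q_hat = 1 - p_hat
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
margin = z * (p_hat * q_hat / treated) ** 0.5       # margin of error E
low, high = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat:.4f}, E = {margin:.4f}, CI = ({low:.4f}, {high:.4f})")
```

Under these assumptions the interval comes out at roughly 2.05% to 3.11%.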

ESTIMATING THE MEAN

Estimate of mean

Requirements to estimate µ when σ is known

1) Simple random sample (all samples of same size have same probability of being selected)

2) Value of population standard deviation is known

3) Population is normally distributed or n>30

Problems

Answer: yes

1) Simple random sample (all samples of same size have same probability of being selected)

2) Value of population standard deviation is known: sigma = 13

3) n=239>30
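Since the requirements hold, the margin of error for the mean can be sketched as E = z·σ/√n with the values quoted in this answer (σ = 13, n = 239). The 95% confidence level and the sample mean passed to `mean_interval` are assumptions, because the problem statement itself is not reproduced here.

```python
from statistics import NormalDist

sigma = 13         # known population standard deviation (from the answer)
n = 239            # sample size (from the answer)
confidence = 0.95  # assumption: the confidence level is not given here

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sigma / n ** 0.5  # E = z * sigma / sqrt(n)


def mean_interval(sample_mean):
    """Confidence interval (x-bar - E, x-bar + E) for the population mean."""
    return sample_mean - margin, sample_mean + margin


print(round(margin, 3))
print(mean_interval(100.0))  # 100.0 is a made-up sample mean
```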

Sample size for estimating mean µ with confidence 1-α

This is not quite right, since an Excel function that rounds UP to the next integer, such as CEILING(real value, 1) or ROUNDUP(real value, 0), should be used.
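The rounding point can be sketched in Python, assuming the standard formula n = (z·σ/E)² and using `math.ceil` to round UP. The function name, σ = 13 and the margin of error 2 are illustrative assumptions.

```python
import math
from statistics import NormalDist


def min_sample_size_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin_of_error, rounded UP."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Hypothetical inputs: sigma = 13, desired margin of error E = 2.
n_needed = min_sample_size_mean(sigma=13, margin_of_error=2)
print(n_needed)
```

Here the unrounded value is about 162.3; ordinary rounding would give 162, which is too small, so the result must be 163.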

If σ not known, or if n<30, then replace critical value for normal distribution by critical value for t-distribution

t-distribution is symmetric, but wider than normal distribution

Superimposition of normal and Student T-distributions

Standard normal distribution has HIGHER y-intercept (~0.4) than all T-distributions, but then has SLIMMER tails.

T-distribution for n=3…12

2-tailed T-test

Right-tailed T-test

Left-tailed T-test

Excel analogue of normdist for the t-distribution, when n<30 or population is approximately normal but σ not known

TDIST
Returns the Percentage Points (probability) for the Student t-distribution where a numeric value (x) is a calculated value of t for which the Percentage Points are to be computed. The t-distribution is used in the hypothesis testing of small sample data sets. Use this function in place of a table of critical values for the t-distribution.

Syntax
TDIST(x,degrees_freedom,tails)
X is the numeric value at which to evaluate the distribution.
Degrees_freedom is an integer indicating the number of degrees of freedom.
Tails specifies the number of distribution tails to return. If tails = 1, TDIST returns the one-tailed distribution. If tails = 2, TDIST returns the two-tailed distribution.

Excel analogue of norminv for computing critical values for the t-distribution, when n<30 or population is approximately normal but σ not known

TINV
Returns the t-value of the Student's t-distribution as a function of the probability and the degrees of freedom.

Syntax
TINV(probability,degrees_freedom)
Probability is the probability associated with the two-tailed Student's t-distribution.
Degrees_freedom is the number of degrees of freedom to characterize the distribution.

Remarks
• TINV returns that value t, such that P(|X| > t) = probability where X is a random variable that follows the t-distribution and P(|X| > t) = P(X < -t or X > t).
• A one-tailed t-value can be returned by replacing probability with 2*probability. For a probability of 0.05 and degrees of freedom of 10, the two-tailed value is calculated with TINV(0.05,10), which returns 2.228139. The one-tailed value for the same probability and degrees of freedom can be calculated with TINV(2*0.05,10), which returns 1.812462.
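For readers without Excel, the two-tailed definition P(|X| > t) = probability can be checked numerically. The sketch below is not how Excel computes TINV; it is a standard-library illustration that integrates the Student t density with Simpson's rule and searches for t by bisection. The function names, the step count and the search bracket [0, 100] are assumptions.

```python
import math


def t_pdf(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_prob(t, df, steps=2000):
    """P(|X| > t) for t >= 0, using Simpson's rule on [0, t] (steps is even)."""
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def t_critical(probability, df):
    """Value t with P(|X| > t) = probability, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_prob(mid, df) > probability:
            lo = mid  # tails still too heavy: move outward
        else:
            hi = mid
    return (lo + hi) / 2


two_tailed = t_critical(0.05, 10)      # two-tailed, probability 0.05, 10 df
one_tailed = t_critical(2 * 0.05, 10)  # one-tailed via 2*probability
print(round(two_tailed, 4), round(one_tailed, 4))
```

For 10 degrees of freedom this gives about 2.2281 (two-tailed) and 1.8125 (one-tailed), both larger than the corresponding normal critical values 1.96 and 1.645.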

Answer

• The t-distribution is wider than the normal distribution, so to capture 95% of area under the curve, when centered at the mean, one must go out further, i.e. the 95% confidence interval when using the t-distribution is LARGER than the 95% confidence interval when using the normal distribution.
• Thus the answer to the question is that the confidence is LESS than 95%.

ESTIMATING THE VARIANCE

Variance estimation

• point estimate of variance
• confidence interval
• minimum sample size

• estimation of variance (stdev) of population is used in quality control

Chi square distribution χ²

Chi square distribution is NOT symmetric, unlike normal and t-distributions.

Approximation of Chi-Square distribution with 10 df

Let Z = X1 + … + X10 be the sum of 10 values, each value Xi = Yi², where Yi is sampled from the standard normal distribution. Obtain 10,000 values for Z and create a histogram.
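The simulation described above can be sketched as follows. This is a Python stand-in for the original demo, not a reproduction of it: the seed, the bin width and the text-based histogram are assumptions made so that the example needs only the standard library.

```python
import random


def simulate_chi_square(df=10, num_values=10_000, seed=1):
    """Each value is a sum of df squared standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(num_values)]


def text_histogram(values, bin_width=2, max_bar=50):
    """Return one line per bin; bar lengths are scaled to max_bar characters."""
    counts = {}
    for v in values:
        b = int(v // bin_width)
        counts[b] = counts.get(b, 0) + 1
    scale = max_bar / max(counts.values())
    lines = []
    for b in range(max(counts) + 1):  # max(counts) is the largest bin index
        bar = "#" * round(counts.get(b, 0) * scale)
        lines.append(f"{b * bin_width:>3}-{(b + 1) * bin_width:<3} {bar}")
    return lines


z_values = simulate_chi_square()
sample_mean = sum(z_values) / len(z_values)
print("\n".join(text_histogram(z_values)))
print(sample_mean)
```

The histogram is skewed to the right, and the sample mean should be close to 10, the number of degrees of freedom.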

Definition of χ² distribution

Using χ² distribution

Excel function chidist

CHIDIST
Returns the one-tailed probability of the χ² distribution. The χ² distribution is associated with a χ² test. Use the χ² test to compare observed and expected values. For example, a genetic experiment might hypothesize that the next generation of plants will exhibit a certain set of colors. By comparing the observed results with the expected ones, you can decide whether your original hypothesis is valid.

Syntax
CHIDIST(x,degrees_freedom)
X is the value at which you want to evaluate the distribution.
Degrees_freedom is the number of degrees of freedom.

Remarks
• CHIDIST is calculated as CHIDIST = P(X>x)

Excel function chiinv

CHIINV
Returns the inverse of the one-tailed probability of the chi-squared distribution. If probability = CHIDIST(x,df), then CHIINV(probability,df) = x. Use this function to compare observed results with expected ones in order to decide whether your original hypothesis is valid.

Syntax
CHIINV(probability,degrees_freedom)
Probability is a probability associated with the chi-squared distribution.
Degrees_freedom is the number of degrees of freedom.

Remarks
• Given a value for probability, CHIINV seeks that value x such that CHIDIST(x, degrees_freedom) = probability. Thus, precision of CHIINV depends on precision of CHIDIST. CHIINV uses an iterative search technique. If the search has not converged after 100 iterations, the function returns the #N/A error value.
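The relationship between the two functions (right-tail probability and its inverse found by iterative search) can be illustrated with a small Python sketch. It is not Excel's implementation: to stay within the standard library it only handles EVEN degrees of freedom, for which the right-tail probability has the closed form e^(−x/2) · Σ (x/2)^k / k! for k = 0 … df/2 − 1. The lowercase function names and the search bracket [0, 1000] are assumptions.

```python
import math


def chidist(x, df):
    """Right-tail probability P(X > x) for chi-square with EVEN df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    if x <= 0:
        return 1.0
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def chiinv(probability, df, iterations=100):
    """Value x with chidist(x, df) = probability, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chidist(mid, df) > probability:
            lo = mid  # right tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


right_tail = chidist(18.307, 10)
critical = chiinv(0.05, 10)
print(round(right_tail, 4), round(critical, 3))
```

For 10 degrees of freedom, the right-tail probability at 18.307 is about 0.05, and inverting 0.05 returns about 18.307, mirroring the identity CHIINV(CHIDIST(x,df),df) = x.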

Using Excel to compute minimum sample size for variance or stdev estimations

BOOTSTRAPPING

Bootstrapping allows one to create confidence intervals (CI) for proportions, means, variances and standard deviations in the case that the requirements for parametric methods are not satisfied. NEVERTHELESS, the initial sample must NOT be biased: it must be a simple random sample.

Bootstrapping construction of a 90% confidence interval for proportion
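A minimal sketch of one common bootstrap construction, the percentile method: resample the original sample with replacement many times, compute the proportion in each resample, and take the 5th and 95th percentiles of those proportions. The sample data (40 successes out of 100), the number of resamples, the seed and the function name are all assumptions for illustration; the construction used in the course demo may differ in detail.

```python
import random


def bootstrap_proportion_ci(sample, confidence=0.90, num_resamples=5000, seed=2):
    """Percentile bootstrap interval for the proportion of 1s in a 0/1 sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample draws n values from the sample WITH replacement.
    proportions = sorted(
        sum(rng.choices(sample, k=n)) / n for _ in range(num_resamples)
    )
    tail = (1 - confidence) / 2
    low = proportions[int(tail * num_resamples)]
    high = proportions[int((1 - tail) * num_resamples) - 1]
    return low, high


# Hypothetical simple random sample: 1 = success, 0 = failure.
sample = [1] * 40 + [0] * 60
low, high = bootstrap_proportion_ci(sample)
print(low, high)
```

The same resampling loop works for the mean, variance or standard deviation by replacing the proportion with the corresponding statistic of each resample.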

Mathematica demo

Similar constructions for mean, variance and standard deviation