BIOL2300 Biostatistics
Chapter 7
Point estimates, confidence intervals and minimum sample size for proportion, mean and variance
Inferential statistics
• estimate a population parameter (proportion, mean, variance)
– confidence interval for point estimate
• hypothesis testing
– H1: patients taking Vioxx (Merck) have MORE heart attacks and strokes than those not taking Vioxx
– H0: NULL hypothesis, EQUAL number of heart attacks and strokes in control and treatment groups
Merck
• The arthritis medication Vioxx was removed from the US market on Sep 30, 2004, after data from a clinical trial showed an increased risk of heart attack, stroke, blood clots and other cardiovascular illnesses.
• FDA announced in 2004 that patients taking Vioxx have a 50 percent greater chance of heart attacks and sudden cardiac death, and that patients taking the highest recommended daily dosage of Vioxx had a 300 percent greater chance of heart attack and sudden cardiac death.
– source of information for this paragraph: http://www.yourlawyer.com/topics/overview/vioxx
ESTIMATING PROPORTIONS
Requirements to check before approximating population proportion
• Simple random sample
• Sampling with replacement (binomial)
– fixed number of trials
– trials independent
– 2 outcomes: success, failure
• Conditions allowing approximation of binomial by normal are satisfied: np ≥ 5 and nq ≥ 5, where p, q are estimated by p-hat (resp. q-hat), the proportion of successes (resp. failures) in the sample
• point estimate: a single value used as approximation for a population parameter
• confidence interval: a real interval (a,b) such that we are 1-α confident that p lies in that interval
Rationale for confidence interval
Warning: Take care with proper interpretation of confidence intervals
• NOT that the probability that p lies in a given interval (a,b) is 95% (1 - 5%, where α equals 5%), since p is a fixed constant and either belongs to (a,b) or does not.
• Proper interpretation: if one were to take ALL samples of size n, compute the corresponding p-hat and confidence interval for each sample, and test whether p belongs to that interval, then p would belong to the interval for approximately 95% of the samples.
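The repeated-sampling interpretation can be checked by simulation. The sketch below is only an illustration: the true proportion p = 0.3, the sample size n = 200 and the number of simulated samples are made-up values, and the interval uses the normal-approximation margin of error defined later in this chapter.

```python
import random
from statistics import NormalDist

def coverage(p=0.3, n=200, alpha=0.05, num_samples=2000, seed=0):
    """Fraction of simulated samples whose interval (p_hat - E, p_hat + E) contains p.

    p, n and num_samples are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    hits = 0
    for _ in range(num_samples):
        # One simple random sample: n independent success/failure trials.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
        if p_hat - margin < p < p_hat + margin:
            hits += 1
    return hits / num_samples

print(coverage())  # a value close to 0.95 is expected
```

Here p stays fixed throughout. What varies from sample to sample is the interval, and roughly 95% of the intervals capture p.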
Critical value
• Critical value for 1-α confidence interval is the z-score z_{α/2} such that P(Z > z_{α/2}) = α/2, where Z follows the standard normal distribution
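As a rough sketch, the critical value can be computed with Python's standard library instead of a table. The helper name `critical_value` is made up for this example; `NormalDist().inv_cdf` plays the role of Excel's norminv for the standard normal distribution.

```python
from statistics import NormalDist

def critical_value(alpha):
    """z-score with area alpha/2 to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(f"{1 - alpha:.0%} confidence: z = {critical_value(alpha):.3f}")
```

For α = 5% this gives the familiar value of about 1.96.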
How to find 95% confidence interval for point estimator of proportion: Margin of Error
• Recall that for the proportion of successes in n independent Bernoulli trials, we have mean p and standard deviation sqrt(pq/n)
Margin of error E for estimate of proportion for 1-α confidence interval
Why is E so defined?
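The formulas on these slides did not survive extraction, so what follows is the standard textbook form rather than a copy of the original. It assumes the normal approximation to the binomial holds and that p, q are replaced by p-hat, q-hat in the standard deviation.

```latex
\[
\hat{p} \;\approx\; N\!\left(p,\ \sqrt{\tfrac{pq}{n}}\right)
\quad\Longrightarrow\quad
P\!\left(\left|\hat{p}-p\right| < z_{\alpha/2}\sqrt{\tfrac{pq}{n}}\right) \approx 1-\alpha
\]
\[
E = z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}}
\quad\Longrightarrow\quad
P\!\left(\hat{p}-E < p < \hat{p}+E\right) \approx 1-\alpha
\]
```

In words: E is the distance one must go out from p-hat, measured in standard deviations of p-hat, to capture area 1-α under the normal curve.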
Sample size for estimating proportion p with confidence 1-α
Minimum sample size is independent of population size
• Previous computation of minimum sample size depends on probability p of success, q of failure, margin of error E, and population standard deviation, but NOT on population size.
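A minimal sketch of the sample-size computation, assuming the usual textbook formula n = z² · p-hat · q-hat / E², with p-hat · q-hat replaced by its largest possible value 0.25 when no prior estimate is available. The function name and the example margin of 0.03 are made up for illustration. Population size is not among the arguments.

```python
import math
from statistics import NormalDist

def min_sample_size_proportion(margin, alpha, p_hat=None):
    """Smallest n giving margin of error `margin` at confidence 1 - alpha.

    If no prior estimate p_hat is given, the worst case p*q = 0.25 is used.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return math.ceil(z * z * pq / margin ** 2)  # always round UP

print(min_sample_size_proportion(0.03, 0.05))             # no prior estimate
print(min_sample_size_proportion(0.03, 0.05, p_hat=0.1))  # prior estimate of 10%
```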
Example
Problem on proportion estimation
• The drug Eliquis (apixaban) is used to help prevent blood clots in certain patients. In clinical trials, among 5924 patients treated with Eliquis, 153 developed the adverse reaction of nausea (based on data from Bristol-Myers Squibb Co.). Construct a 99% confidence interval for the proportion of adverse reactions.
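One way to work this problem, sketched in Python with the normal-approximation interval p-hat ± E. The function name `proportion_ci` is made up for the example. The counts 153 and 5924 come from the problem statement.

```python
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)         # critical value z_{alpha/2}
    p_hat = successes / n                           # point estimate
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5   # margin of error E
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(153, 5924, 0.99)
print(f"{low:.4f} < p < {high:.4f}")
```

The interval comes out at roughly 2.05% to 3.11%. Both np-hat = 153 and nq-hat = 5771 are large, so the normal approximation is reasonable here.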
ESTIMATING THE MEAN
Estimate of mean
Requirements to estimate µ when σ is known
1) Simple random sample (all samples of same size have same probability of being selected)
2) Value of population standard deviation is known
3) Population is normally distributed or n>30
Problems
Answer: yes
1) Simple random sample (all samples of same size have same probability of being selected)
2) Value of population standard deviation is known: sigma = 13
3) n=239>30
Sample size for estimating mean µ with confidence 1-α
This is not quite right, since the Excel function CEILING(real value, 0) should be used to round UP.
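The rounding-up step can be sketched in Python, assuming the usual formula n = (z · σ / E)². The function name and the margins of error used below are made up for illustration. σ = 13 is the value quoted in the problem above.

```python
import math
from statistics import NormalDist

def min_sample_size_mean(sigma, margin, alpha):
    """Smallest n so that the margin of error for the mean is at most `margin`."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # math.ceil rounds UP; ordinary rounding could give a sample that is too small.
    return math.ceil((z * sigma / margin) ** 2)

print(min_sample_size_mean(sigma=13, margin=2, alpha=0.05))
```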
If σ is not known, or if n<30, then replace the critical value for the normal distribution by the critical value for the t-distribution
t-distribution is symmetric, but wider than normal distribution
Superimposition of normal and Student T-distributions
Standard normal distribution has HIGHER y-intercept (~0.4) than all T-distributions, but then has SLIMMER tails.
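The comparison can be checked numerically from the density formulas. This is a sketch using only the standard library: the function names are made up, and the degrees of freedom 2, 5 and 11 are arbitrary picks within the range plotted on the next slide.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

print(f"normal:   peak {normal_pdf(0):.4f}, density at x=3 {normal_pdf(3):.4f}")
for df in (2, 5, 11):
    print(f"t, df={df:2d}: peak {t_pdf(0, df):.4f}, density at x=3 {t_pdf(3, df):.4f}")
```

The peak at x = 0 is highest for the normal curve, while far from the mean (here x = 3) every t-density is larger. That is the sense in which the t-distribution is wider.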
T-distribution for n=3…12
2-tailed T-test
Right-tailed T-test
Left-tailed T-test
Excel analogue of normdist for the t-distribution, when n<30 or population is approximately normal but σ not known
TDIST
Returns the Percentage Points (probability) for the Student t-distribution where a numeric value (x) is a calculated value of t for which the Percentage Points are to be computed. The t-distribution is used in the hypothesis testing of small sample data sets. Use this function in place of a table of critical values for the t-distribution.
Syntax
TDIST(x,degrees_freedom,tails)
X is the numeric value at which to evaluate the distribution.
Degrees_freedom is an integer indicating the number of degrees of freedom.
Tails specifies the number of distribution tails to return. If tails = 1, TDIST returns the one-tailed distribution. If tails = 2, TDIST returns the two-tailed distribution.
Excel analogue of norminv for computing critical values for the t-distribution, when n<30 or population is approximately normal but σ not known
TINV
Returns the t-value of the Student's t-distribution as a function of the probability and the degrees of freedom.
Syntax
TINV(probability,degrees_freedom)
Probability is the probability associated with the two-tailed Student's t-distribution.
Degrees_freedom is the number of degrees of freedom to characterize the distribution.
Remarks
• TINV returns that value t, such that P(|X| > t) = probability where X is a random variable that follows the t-distribution and P(|X| > t) = P(X < -t or X > t).
• A one-tailed t-value can be returned by replacing probability with 2*probability. For a probability of 0.05 and degrees of freedom of 10, the two-tailed value is calculated with TINV(0.05,10), which returns 2.228139. The one-tailed value for the same probability and degrees of freedom can be calculated with TINV(2*0.05,10), which returns 1.812462.
Answer
• The t-distribution is wider than the normal distribution, so to capture 95% of area under the curve, when centered at the mean, one must go out further; i.e. the 95% confidence interval when using the t-distribution is LARGER than the 95% confidence interval when using the normal distribution.
• Thus the answer to the question is that the confidence is LESS than 95%.
ESTIMATING THE VARIANCE
Variance estimation
• point estimate of variance
• confidence interval
• minimum sample size
• estimation of variance (stdev) of population is used in quality control
Chi square distribution χ2
Chi square distribution NOT symmetric, unlike normal and t-distributions.
Approximation of Chi-Square distribution with 10 df
Let Z = X1 + … + X10 be the sum of 10 values, each value Xi = Yi², where Yi is sampled from the standard normal distribution. Obtain 10,000 values for Z and create a histogram.
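The construction just described can be sketched in Python with the standard library. The seed, the bin width of 2 and the text-based histogram are choices made for this illustration, not part of the original demo.

```python
import random
from statistics import mean, variance

def simulate_chi_square(df=10, num_values=10_000, seed=0):
    """Each value is a sum of df squared standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(num_values)]

z_values = simulate_chi_square()
print(f"mean {mean(z_values):.2f}, variance {variance(z_values):.2f}")

# Crude text histogram: one '#' per 50 values falling in each bin of width 2.
bin_width = 2
counts = {}
for z in z_values:
    start = int(z // bin_width) * bin_width
    counts[start] = counts.get(start, 0) + 1
for start in sorted(counts):
    print(f"{start:3d}-{start + bin_width:<3d} {'#' * (counts[start] // 50)}")
```

The sample mean should land near 10 and the variance near 20, the theoretical values for 10 degrees of freedom. The histogram shows the long right tail that makes the distribution asymmetric.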
Definition of χ2 distribution
Using χ2 distribution
Excel function chidist
CHIDIST
Returns the one-tailed probability of the χ2 distribution. The χ2 distribution is associated with a χ2 test. Use the χ2 test to compare observed and expected values. For example, a genetic experiment might hypothesize that the next generation of plants will exhibit a certain set of colors. By comparing the observed results with the expected ones, you can decide whether your original hypothesis is valid.
Syntax
CHIDIST(x,degrees_freedom)
X is the value at which you want to evaluate the distribution.
Degrees_freedom is the number of degrees of freedom.
Remarks
• CHIDIST is calculated as CHIDIST = P(X>x)
Excel function chiinv
CHIINV
Returns the inverse of the one-tailed probability of the chi-squared distribution. If probability = CHIDIST(x,df), then CHIINV(probability,df) = x. Use this function to compare observed results with expected ones in order to decide whether your original hypothesis is valid.
Syntax
CHIINV(probability,degrees_freedom)
Probability is a probability associated with the chi-squared distribution.
Degrees_freedom is the number of degrees of freedom.
Remarks
• Given a value for probability, CHIINV seeks that value x such that CHIDIST(x, degrees_freedom) = probability. Thus, precision of CHIINV depends on precision of CHIDIST. CHIINV uses an iterative search technique. If the search has not converged after 100 iterations, the function returns the #N/A error value.
Using Excel to compute minimum sample size for variance or stdev estimations
BOOTSTRAPPING
Bootstrapping allows one to create confidence intervals (CI) for proportions, means, variances and standard deviations in the case that the requirements for parametric methods are NOT satisfied. NEVERTHELESS, the initial sample must NOT be biased – it must be a simple random sample.
Bootstrapping construction of a 90% confidence interval for proportion
Mathematica demo
Similar constructions for mean, variance and standard deviation
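The Mathematica demo itself is not reproduced in these notes. As a stand-in, here is a minimal Python sketch of a percentile bootstrap for a 90% confidence interval for a proportion. The sample of 12 successes out of 40, the 1000 resamples and the function name are all made up for illustration, and the percentile method is one common bootstrap variant, not necessarily the one used in the demo.

```python
import random

def bootstrap_proportion_ci(sample, confidence=0.90, num_resamples=1000, seed=0):
    """Percentile bootstrap CI for the proportion of 1s in a 0/1 sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample WITH replacement, same size as the original sample, many times.
    proportions = sorted(
        sum(rng.choices(sample, k=n)) / n for _ in range(num_resamples)
    )
    tail = (1 - confidence) / 2
    low = proportions[int(tail * num_resamples)]
    high = proportions[int((1 - tail) * num_resamples) - 1]
    return low, high

sample = [1] * 12 + [0] * 28  # hypothetical data: 12 successes in 40 trials
print(bootstrap_proportion_ci(sample))
```

Replacing the proportion computed on each resample with a mean, variance or standard deviation gives the analogous intervals for those parameters.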