madeleine-abigail-wheeler
Always be mindful of the kindness and not the faults of others.
Categorical Data
Sections 10.1 to 10.5
Estimation for proportions
Tests for proportions
Chi-square tests
Example
Researchers developing new treatments for cancer patients often evaluate the effectiveness of a new therapy by reporting the proportion of patients who survive for a specified period of time after completing treatment. In a study of a new treatment, 330 of 870 lung cancer patients survived at least 5 years.
Example
Estimate $\pi$, the proportion of all lung cancer patients who would survive at least 5 years after receiving this treatment.
What would you estimate this proportion to be?
Distribution of Sample Proportion
Y: the number of successes in n independent and identical trials
What’s the distribution of Y?
Sample proportion: $\hat\pi = Y/n$, with standard error
$$\sigma_{\hat\pi} = \sqrt{\frac{\pi(1-\pi)}{n}}$$
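A small simulation can make the two formulas above concrete. This is a sketch, not part of the original slides: it assumes a hypothetical true proportion $\pi = 0.38$ and sample size $n = 200$, draws many binomial samples with Python's standard `random` module, and compares the spread of the simulated $\hat\pi$ values with $\sigma_{\hat\pi}$. The average of the draws should also sit close to $\pi$, since $E(Y/n) = \pi$.

```python
import random
import statistics

def simulate_pi_hat(n, pi, reps, seed=1):
    """Draw `reps` sample proportions Y/n, where Y ~ Binomial(n, pi)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # Y = number of successes in n independent trials with P(success) = pi
        y = sum(rng.random() < pi for _ in range(n))
        draws.append(y / n)
    return draws

n, pi = 200, 0.38                          # hypothetical values for illustration
draws = simulate_pi_hat(n, pi, reps=5000)
sim_mean = statistics.mean(draws)          # should be close to pi
sim_sd = statistics.stdev(draws)           # empirical spread of pi-hat
theory_sd = (pi * (1 - pi) / n) ** 0.5     # sigma_{pi-hat} from the formula
print(f"mean {sim_mean:.4f}  sd {sim_sd:.4f}  formula sd {theory_sd:.4f}")
```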
Distribution of Sample Proportion
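When $n\pi \ge 5$ and $n(1-\pi) \ge 5$, the distribution of Y can be approximated by a normal distribution.
(Approximate) $(1-\alpha)$ confidence interval for $\pi$:
$$\hat\pi \pm z_{\alpha/2}\,\hat\sigma_{\hat\pi}, \qquad \hat\sigma_{\hat\pi} = \sqrt{\frac{\hat\pi(1-\hat\pi)}{n}}$$
Optional: (exact) C.I. for $\pi$ for small samples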
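Applied to the lung cancer example (330 survivors out of 870), the large-sample interval can be computed as below. This is a minimal sketch using only the standard library; the function name `proportion_ci` is hypothetical, and $z_{\alpha/2}$ comes from `statistics.NormalDist().inv_cdf`. Both $n\hat\pi = 330$ and $n(1-\hat\pi) = 540$ are well above 5, so the normal approximation applies.

```python
import math
from statistics import NormalDist

def proportion_ci(y, n, conf=0.95):
    """Large-sample (1 - alpha) confidence interval for a single proportion."""
    pi_hat = y / n                                   # point estimate of pi
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)     # z_{alpha/2}
    se = math.sqrt(pi_hat * (1 - pi_hat) / n)        # estimated sigma of pi-hat
    return pi_hat, (pi_hat - z * se, pi_hat + z * se)

pi_hat, (lo, hi) = proportion_ci(330, 870)
print(f"pi-hat = {pi_hat:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

The point estimate is about 0.379, and the 95% interval runs from roughly 0.347 to 0.412.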
Sample Size
$$n = \frac{z_{\alpha/2}^2\,\pi(1-\pi)}{E^2}$$
where E is the largest tolerable error at the $(1-\alpha)$ confidence level.
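The sample-size formula can be evaluated as in the sketch below (the function name `sample_size` is hypothetical). Because $\pi$ is unknown before sampling, a guess must be plugged in; $\pi(1-\pi)$ is largest at $\pi = 0.5$, so using 0.5 gives the most conservative (largest) n. The result is rounded up, since rounding down would exceed the tolerable error E.

```python
import math
from statistics import NormalDist

def sample_size(E, pi=0.5, conf=0.95):
    """Smallest n giving margin of error <= E at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return math.ceil(z ** 2 * pi * (1 - pi) / E ** 2)

n_conservative = sample_size(0.03)           # no prior guess: pi = 0.5
n_with_guess = sample_size(0.03, pi=0.38)    # hypothetical prior guess for pi
print(n_conservative, n_with_guess)
```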
Test for a Large Sample
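When $n\pi_0 \ge 5$ and $n(1-\pi_0) \ge 5$, the test statistic for Ho: $\pi = \pi_0$ is:
$$z = \frac{\hat\pi - \pi_0}{\sigma_{\hat\pi}}, \qquad \sigma_{\hat\pi} = \sqrt{\frac{\pi_0(1-\pi_0)}{n}}$$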
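A sketch of this one-sample z test follows. Note that the standard error uses $\pi_0$, the value under Ho, not $\hat\pi$. The lung cancer counts (330 of 870) are reused, but the null value $\pi_0 = 0.35$ is a hypothetical choice for illustration only; it does not come from the slides. The function name `one_prop_z` is also hypothetical.

```python
import math
from statistics import NormalDist

def one_prop_z(y, n, pi0):
    """Large-sample z test of Ho: pi = pi0."""
    pi_hat = y / n
    se0 = math.sqrt(pi0 * (1 - pi0) / n)     # sigma of pi-hat under Ho
    z = (pi_hat - pi0) / se0
    p_upper = 1 - NormalDist().cdf(z)        # p-value for Ha: pi > pi0
    p_two = 2 * min(p_upper, 1 - p_upper)    # p-value for Ha: pi != pi0
    return z, p_upper, p_two

z, p_upper, p_two = one_prop_z(330, 870, pi0=0.35)   # pi0 is hypothetical
print(f"z = {z:.3f}, one-sided p = {p_upper:.4f}, two-sided p = {p_two:.4f}")
```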
Inference about 2 Proportions
Notation:
                      Population 1                 Population 2
Proportion            $\pi_1$                      $\pi_2$
Sample size           $n_1$                        $n_2$
# of successes        $y_1$                        $y_2$
Sample proportion     $\hat\pi_1 = y_1/n_1$        $\hat\pi_2 = y_2/n_2$
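Estimation for $\pi_1 - \pi_2$
Point estimate: $\hat\pi_1 - \hat\pi_2$
$$\mu_{\hat\pi_1 - \hat\pi_2} = \pi_1 - \pi_2, \qquad \sigma_{\hat\pi_1 - \hat\pi_2} = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$$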
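Estimation for $\pi_1 - \pi_2$
$(1-\alpha)$ confidence interval for two large samples:
$$(\hat\pi_1 - \hat\pi_2) \pm z_{\alpha/2}\,\hat\sigma_{\hat\pi_1 - \hat\pi_2}, \qquad \hat\sigma_{\hat\pi_1 - \hat\pi_2} = \sqrt{\frac{\hat\pi_1(1-\hat\pi_1)}{n_1} + \frac{\hat\pi_2(1-\hat\pi_2)}{n_2}}$$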
Example 10.6
A company markets a new product in Grand Rapids and Wichita.
In Grand Rapids, the company’s advertising is based entirely on TV commercials.
In Wichita, it is based on a balanced mix of TV, radio, newspaper, and magazine ads.
Two months after the ad campaign begins, the company conducts surveys to determine consumer awareness of the product.
Example 10.6: Data Set
                        Grand Rapids    Wichita
Number interviewed          608           527
Number aware                392           413
Q: Calculate a 95% C.I. for the regional difference in the proportion of all consumers who are aware of the product.
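The 95% interval for the Example 10.6 data can be sketched as follows, using the large-sample formula with sample proportions in the standard error. For this sketch we label Wichita as population 1 and Grand Rapids as population 2 so that the difference comes out positive; the function name `two_prop_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def two_prop_ci(y1, n1, y2, n2, conf=0.95):
    """Large-sample (1 - alpha) confidence interval for pi1 - pi2."""
    p1, p2 = y1 / n1, y2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # estimated sigma
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)              # z_{alpha/2}
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)

# Population 1 = Wichita (413 of 527), population 2 = Grand Rapids (392 of 608)
diff, (lo, hi) = two_prop_ci(413, 527, 392, 608)
print(f"difference = {diff:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

The estimated difference is about 0.139, with a 95% interval of roughly (0.087, 0.191); since the interval excludes 0, the data suggest higher awareness in Wichita.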
Example 10.6 (cont.)
Conduct a test at $\alpha = 0.05$ to determine whether the proportion of Wichita consumers who are aware of the product exceeds the proportion of Grand Rapids consumers by more than 0.10 (10 percentage points).
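Test for $\pi_1 - \pi_2$ (Large Samples)
When $n_1\pi_1 \ge 5$, $n_1(1-\pi_1) \ge 5$, $n_2\pi_2 \ge 5$, and $n_2(1-\pi_2) \ge 5$, the test statistic for Ho: $\pi_1 - \pi_2 = d$ is
$$z = \frac{(\hat\pi_1 - \hat\pi_2) - d}{\sqrt{\dfrac{\hat\pi_1(1-\hat\pi_1)}{n_1} + \dfrac{\hat\pi_2(1-\hat\pi_2)}{n_2}}}$$
Optional: Fisher Exact Test (p. 511)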
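For the Example 10.6 question, a sketch of this test with d = 0.10 and a one-sided alternative is shown below. As an assumption of this sketch, Wichita is population 1 and Grand Rapids is population 2, so Ha is $\pi_1 - \pi_2 > 0.10$; the function name `two_prop_z` is hypothetical.

```python
import math
from statistics import NormalDist

def two_prop_z(y1, n1, y2, n2, d=0.0):
    """Large-sample z test of Ho: pi1 - pi2 = d against Ha: pi1 - pi2 > d."""
    p1, p2 = y1 / n1, y2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = ((p1 - p2) - d) / se
    p_upper = 1 - NormalDist().cdf(z)   # upper-tail p-value
    return z, p_upper

# Population 1 = Wichita, population 2 = Grand Rapids
z, p_value = two_prop_z(413, 527, 392, 608, d=0.10)
reject = p_value < 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject Ho: {reject}")
```

Here z is about 1.47 with a p-value near 0.07, so at $\alpha = 0.05$ the data do not establish that Wichita's awareness exceeds Grand Rapids' by more than 0.10.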
Minitab
Z test for one proportion:
Stat >> Basic Statistics >> 1 Proportion
Z test for two proportions:
Stat >> Basic Statistics >> 2 Proportions
Chi-Square Goodness of Fit Test
More than two possible outcomes per trial: the multinomial experiment
1. The experiment consists of n identical trials.
2. Each trial results in one of k outcomes with probabilities $\pi_1, \pi_2, \ldots, \pi_k$, where $\sum_i \pi_i = 1$.
$Y = (Y_1, \ldots, Y_k)$, where $Y_i$ = the number of trials that result in outcome i.
Chi-square Goodness of Fit Test
Goal: We are interested in testing a hypothesized distribution of Y (i.e., a specified set of values for the $\pi_i$’s).
Hypotheses:
Ho: $\pi_i = \pi_{i0}$ for all i vs. Ha: Ho is false
Chi-square Goodness of Fit Test
Test Statistic:
$$\chi^2 = \sum_i \frac{(n_i - E_i)^2}{E_i}$$
where $n_i$ = the observed value of $Y_i$ and $E_i = n\pi_{i0}$ = the expected value of $Y_i$ under Ho.
Chi-square Goodness of Fit Test
Rejection Region:
Reject Ho if $\chi^2 > \chi^2_{\alpha,\,df}$, where df = k − 1.
Note:
This test can be trusted only when at least 80% of the cells have $E_i \ge 5$.
Example 10.10
Category                          Hypothesized %    Observed count
Marked decrease                         50               120
Moderate decrease                       25                60
Slight decrease                         10                10
Stationary or slight increase           15                10
Minitab: Stat >> Tables >> Chi-Square Goodness-of-Fit Test (One Variable)
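The goodness-of-fit statistic for the Example 10.10 counts can be reproduced with a few lines of standard-library Python. This sketch assumes $\alpha = 0.05$ (the slide does not state a level) and hard-codes the chi-square table value $\chi^2_{0.05,3} = 7.815$, since the standard library has no chi-square quantile function. The function name `chisq_gof` is hypothetical.

```python
def chisq_gof(observed, hyp_props):
    """Chi-square goodness-of-fit statistic, expected counts, and df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in hyp_props]            # E_i = n * pi_i0
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected, len(observed) - 1

observed = [120, 60, 10, 10]            # observed counts from Example 10.10
hyp = [0.50, 0.25, 0.10, 0.15]          # hypothesized proportions
stat, expected, df = chisq_gof(observed, hyp)

# Reliability check: share of cells with E_i >= 5 should be at least 80%
share_ok = sum(e >= 5 for e in expected) / len(expected)
CRIT_05_DF3 = 7.815                     # chi-square table value, alpha = 0.05, df = 3
print(f"chi-sq = {stat:.3f}, df = {df}, reject Ho: {stat > CRIT_05_DF3}")
```

With n = 200 the expected counts are 100, 50, 20, and 30, giving $\chi^2 \approx 24.33$, well beyond 7.815.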
Example 10.11
Contingency Table (Example 10.12)
Counts $n_{ij}$
                                   Age Category
Severity of skin disease      1     2     3     4    Total $n_{i*}$
1                            15    32    18     5        70
2                             8    29    23    18        78
3                             1    20    25    22        68
Total $n_{*j}$               24    81    66    45       216 = n
Contingency Table
2 categorical variables:
row and column indexed by i and j, respectively
If they are independent, then the estimated expected count for cell (i, j) is
$$\hat E_{ij} = \frac{n_{i*}\,n_{*j}}{n}$$
Test for Independence of 2 Var’s
Hypotheses:
Ho: the row and column variables are independent
Ha: they are dependent
Test Statistic:
$$\chi^2 = \sum_{i,j} \frac{(n_{ij} - \hat E_{ij})^2}{\hat E_{ij}}$$
Test for Independence of 2 Var’s
Rejection Region:
Reject Ho if $\chi^2 > \chi^2_{\alpha,\,df}$, where df = (r − 1)(c − 1) for a table with r rows and c columns.
Note:
This test can be trusted only when at least 80% of the cells have $\hat E_{ij} \ge 5$.
Minitab: Stat >> Tables >> Chi-Square Test (Two-Way Table in Worksheet)
Chi-Square Test: C1, C2, C3, C4

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

            C1      C2      C3      C4   Total
    1       15      32      18       5      70
          7.78   26.25   21.39   14.58
         6.706   1.260   0.537   6.298

    2        8      29      23      18      78
          8.67   29.25   23.83   16.25
         0.051   0.002   0.029   0.188

    3        1      20      25      22      68
          7.56   25.50   20.78   14.17
         5.688   1.186   0.858   4.331

Total       24      81      66      45     216

Chi-Sq = 27.135, DF = 6, P-Value = 0.000
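The Minitab result can be cross-checked by hand with the sketch below, which computes $\hat E_{ij} = n_{i*} n_{*j}/n$, the $\chi^2$ statistic, and df = (r − 1)(c − 1) for the Example 10.12 counts. It uses only the standard library; the function name `chisq_independence` is hypothetical, and the critical value $\chi^2_{0.05,6} = 12.592$ is hard-coded from a chi-square table under the assumption $\alpha = 0.05$.

```python
def chisq_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_tot = [sum(row) for row in table]          # n_{i*}
    col_tot = [sum(col) for col in zip(*table)]    # n_{*j}
    n = sum(row_tot)
    # Estimated expected counts under independence: E_ij = n_{i*} n_{*j} / n
    expected = [[ri * cj / n for cj in col_tot] for ri in row_tot]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, expected, df

# Counts n_ij from Example 10.12 (rows i = 1..3, columns j = 1..4)
table = [
    [15, 32, 18, 5],
    [8, 29, 23, 18],
    [1, 20, 25, 22],
]
stat, expected, df = chisq_independence(table)
cells = [e for row in expected for e in row]
share_ok = sum(e >= 5 for e in cells) / len(cells)   # 80% rule check
CRIT_05_DF6 = 12.592                                 # table value, alpha = 0.05, df = 6
print(f"chi-sq = {stat:.3f}, df = {df}, reject Ho: {stat > CRIT_05_DF6}")
```

The statistic agrees with Minitab's 27.135 on 6 df, and every expected count is at least 5, so the approximation is trustworthy here.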
Example 10.12