Quarter 3 Portfolio Nikita Mehta Period 2 Mr. Hugus-AP Statistics

Quarter 3 Main Concepts

Chapter 18-Sampling Distribution Models

Different random samples give different values for a statistic. The sampling distribution model shows the behavior of the statistic over all the possible samples of the same size n.

Sampling variability (sampling error) is the variability we expect to see from one random sample to another.

If assumptions of independence and random sampling are met, and we expect at least 10 successes and 10 failures, then the sampling distribution of a proportion is modeled by a Normal model with a mean equal to the true proportion, p, and a standard deviation equal to $\sqrt{pq/n}$, where $q = 1 - p$.

The Central Limit Theorem states that the sampling distribution model of the sample mean (and proportion) from a random sample is approximately Normal for large n, regardless of the distribution of the population, as long as the observations are independent.
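The Central Limit Theorem can be checked with a small simulation. The sketch below is only an illustration: the skewed exponential population (mean 1, SD 1), the sample size of 50, the number of samples, and the function names are all assumptions chosen for the example, not part of the course material.

```python
import random
import statistics


def simulate_sample_means(population_draw, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(population_draw(rng) for _ in range(n))
        for _ in range(num_samples)
    ]


def draw_exponential(rng):
    """One draw from a hypothetical skewed population (mean 1, SD 1)."""
    return rng.expovariate(1.0)


n = 50
means = simulate_sample_means(draw_exponential, n, num_samples=5000, seed=1)

center = statistics.fmean(means)   # expected to be near the population mean
spread = statistics.stdev(means)   # expected to be near sigma / sqrt(n)
predicted_spread = 1.0 / n ** 0.5

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f} (model predicts {predicted_spread:.3f})")
```

Even though the population is strongly skewed, a histogram of `means` should look roughly Normal, centered near the population mean with a spread near sigma divided by the square root of n.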

If assumptions of independence and random sampling are met, and the sample size is large enough, the sampling distribution of the sample mean is modeled by a Normal model with a mean equal to the population mean, $\mu$, and a standard deviation equal to $\sigma/\sqrt{n}$.

Chapter 19-Confidence Intervals for Proportions

Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error.

Our best estimate of the true population proportion is the proportion we observed in the sample, so we center our confidence interval there.

The higher the level of confidence we want, the wider our confidence interval becomes.

The larger the sample size we have, the narrower our confidence interval can be.

A level C confidence interval for a model parameter is an interval of values of the form “estimate plus or minus margin of error” found from data in such a way that C% of all random samples will yield intervals that capture the true parameter value.

The critical value, denoted z*, is the number of standard errors to move away from the mean of the sampling distribution to correspond to the specified level of confidence. It is usually found from a table or with technology.
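The effect of confidence level and sample size on interval width can be seen numerically. This is a rough sketch using Python's standard-library `statistics.NormalDist` as the "technology" for z*; the sample proportion of 0.40, the sample sizes, and the helper names are hypothetical.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Return z* for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def margin_of_error(p_hat, n, confidence):
    """Margin of error z* x SE(p_hat) for a one-proportion z-interval."""
    standard_error = (p_hat * (1 - p_hat) / n) ** 0.5
    return critical_value(confidence) * standard_error


# Hypothetical sample proportion, used only to compare widths.
p_hat = 0.40

# Higher confidence -> larger z* -> wider interval.
for confidence in (0.90, 0.95, 0.99):
    z_star = critical_value(confidence)
    me = margin_of_error(p_hat, 100, confidence)
    print(f"{confidence:.0%}: z* = {z_star:.3f}, ME = {me:.4f}")

# Larger n -> smaller standard error -> narrower interval.
for n in (100, 400, 1600):
    print(f"n = {n}: ME = {margin_of_error(p_hat, n, 0.95):.4f}")
```

Because the standard error has the square root of n in the denominator, quadrupling the sample size only cuts the margin of error in half.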

Chapter 20-Testing Hypotheses About Proportions

The null hypothesis specifies a population model parameter of interest and proposes a value for that parameter.

The alternative hypothesis contains the values of the parameter that we consider plausible if we reject the null hypothesis. It can be one-sided or two-sided.

A two-sided alternative is an alternative hypothesis in which we are interested in deviations in either direction away from the hypothesized parameter value.

An alternative hypothesis is one-sided when we are interested in deviations in only one direction away from the hypothesized parameter value.

P-value: The probability of observing a value for a test statistic at least as far from the hypothesized value as the statistic value actually observed if the null hypothesis is true.

A one-proportion z-test is a test of the null hypothesis that the proportion of a single sample equals a specified value ($H_0: p = p_0$) by referring the statistic $z = \dfrac{\hat{p} - p_0}{SD(\hat{p})}$, with $SD(\hat{p}) = \sqrt{p_0 q_0 / n}$, to a Standard Normal model.

Chapter 21-More about Tests and Intervals

The P-value can indicate evidence against the null hypothesis when it’s small, but it does not tell us the probability that the null hypothesis is true.

Alpha level: The threshold P-value that determines when we reject a null hypothesis. If we observe a statistic whose P-value based on the null hypothesis is less than alpha, we reject that null hypothesis.

When a p-value falls below the alpha value, we say that the test is “statistically significant” at that alpha value.

The alpha level is also called the significance level, most often in a phrase such as a conclusion that a particular test is “significant at the 5% significance level.”

The error of rejecting the null hypothesis when in fact it is true is a Type I error. The probability of a Type I error is denoted alpha.

The error of failing to reject the null hypothesis when in fact it is false is a Type II error. The probability of a Type II error is denoted beta.

The probability that a hypothesis test will correctly reject a false null hypothesis is the power of the test. To find power, we must specify a particular alternative parameter value as the “true” value. For any specific value in the alternative, the power is 1-beta.
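The relationship between alpha, beta, and power can be sketched for a one-sided one-proportion z-test. This is an illustration under a Normal approximation; the null value 0.50, the "true" value 0.60, the sample sizes, and the function name are assumptions made up for the example.

```python
from statistics import NormalDist


def power_one_sided(p0, p_true, n, alpha=0.05):
    """Approximate power of the upper-tail test H0: p = p0 vs HA: p > p0.

    Uses the Normal model for p-hat under both the null and the
    specified "true" alternative value.
    """
    sd_null = (p0 * (1 - p0) / n) ** 0.5
    # Smallest p-hat that leads to rejecting H0 at level alpha.
    cutoff = p0 + NormalDist().inv_cdf(1 - alpha) * sd_null
    sd_true = (p_true * (1 - p_true) / n) ** 0.5
    # Probability that p-hat lands beyond the cutoff when p_true is the truth.
    return 1 - NormalDist(p_true, sd_true).cdf(cutoff)


# Hypothetical values: null p0 = 0.50, alternative "true" value 0.60.
power_small = power_one_sided(0.50, 0.60, n=100)
power_large = power_one_sided(0.50, 0.60, n=400)
beta_small = 1 - power_small  # probability of a Type II error

print(f"n = 100: power = {power_small:.3f}, beta = {beta_small:.3f}")
print(f"n = 400: power = {power_large:.3f}")
```

A larger sample gives more power against the same alternative, and when the "true" value equals the null value the rejection probability is just alpha.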

Chapter 22-Comparing Two Proportions

The variances of independent random variables add.

The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is, under appropriate assumptions and conditions, modeled by a Normal model with mean $p_1 - p_2$ and standard deviation $\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$.

A two-proportion z-interval gives a confidence interval for the true difference in proportions, $p_1 - p_2$, in two independent groups.

Two-proportion z-test: Test the null hypothesis $H_0: p_1 - p_2 = 0$ by referring the statistic $z = \dfrac{\hat{p}_1 - \hat{p}_2}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)}$ to a Standard Normal model.

When we have data from different sources that we believe are homogeneous, we can get a better estimate of the common proportion and its standard deviation. We can combine, or pool, the data into a single group for the purpose of estimating the common proportion. The resulting pooled standard error is based on more data and is thus more reliable (if the null hypothesis is true and the groups are truly homogeneous).

Chapter 23-Inferences about Means

Student's t is a family of distributions indexed by its degrees of freedom. The t-models are unimodal, symmetric, and bell-shaped, but generally have fatter tails and a narrower center than the Normal model. As the degrees of freedom increase, the t-distributions approach the Normal.

Assumptions and Conditions: Independence, Random, 10%, Nearly Normal (check histogram)

A one-sample t-interval for the population mean is $\bar{y} \pm t^*_{n-1} \times SE(\bar{y})$, where $SE(\bar{y}) = s/\sqrt{n}$.

One-sample t-test for the mean: The one-sample t-test for the mean tests the null hypothesis $H_0: \mu = \mu_0$ using the statistic $t_{n-1} = \dfrac{\bar{y} - \mu_0}{SE(\bar{y})}$.

The standard error of $\bar{y}$ is $SE(\bar{y}) = s/\sqrt{n}$.

Chapter 24-Comparing Means

Two-sample t methods allow us to draw conclusions about the difference between the means of two independent groups.

The two-sample methods make relatively few assumptions about the underlying populations, so they are usually the method of choice for comparing two sample means.

The assumptions and conditions that need to be checked are: Independence, Independence of groups, Random, Nearly Normal

A hypothesis test for the difference between the means of two independent groups is set up as follows: the null hypothesis is that the difference between the two means is equal to 0, and the alternative is that it is not equal to 0.

Data from two or more populations may sometimes be combined, or pooled, to estimate a statistic when we are willing to assume that the estimated value is the same in both populations. The resulting larger sample size may lead to an estimate with lower sample variance. However, pooled estimates are appropriate only when the required assumptions are true.

Chapter 25-Paired Samples and Blocks

Data are paired when the observations are collected in pairs or the observations in one group are naturally related to observations in the other.

The simplest form of pairing is to measure each subject twice, often before and after a treatment is applied. More sophisticated forms of pairing in experiments are a form of blocking and arise in other contexts.

The paired t-test tests the null hypothesis $H_0: \mu_d = \Delta_0$, where the hypothesized difference $\Delta_0$ is almost always 0, using the statistic $t_{n-1} = \dfrac{\bar{d} - \Delta_0}{SE(\bar{d})}$ with n-1 degrees of freedom, where $SE(\bar{d}) = s_d/\sqrt{n}$ and n is the number of pairs.

Assumptions and Conditions: Random, Independence, Paired Data Assumption, Nearly Normal Condition

Making a confidence interval for matched pairs follows exactly the steps for a one-sample t-interval.

Quarter 3 Problems

Chapter 18: p. 436, #33. GPAs. A college’s data about incoming freshmen indicates that the mean of their high school GPAs was 3.4, with a standard deviation of 0.35; the distribution was roughly mound-shaped and only slightly skewed. The students are randomly assigned to freshman writing seminars in groups of 25. What might the mean GPA of one of these seminar groups be? Describe the appropriate sampling distribution model (shape, center, and spread) with attention to assumptions and conditions.

Given that the mean is 3.4 and the SD is 0.35:
Independence: It is reasonable to assume that the GPAs of the students in a seminar group are independent of one another.
Random: The students are randomly assigned to freshman writing seminars in groups of 25.
10%: The 25 students in a seminar are less than 10% of all incoming freshmen.
Large Enough: The distribution of the population is roughly unimodal and symmetric, so a sample of 25 is large enough.

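$SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}} = \dfrac{0.35}{\sqrt{25}} = 0.07$

68%: $\mu - SD(\bar{y}) = 3.33$, $\mu + SD(\bar{y}) = 3.47$

95%: $\mu - 2\,SD(\bar{y}) = 3.26$, $\mu + 2\,SD(\bar{y}) = 3.54$

99.7%: $\mu - 3\,SD(\bar{y}) = 3.19$, $\mu + 3\,SD(\bar{y}) = 3.61$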
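The same arithmetic can be reproduced in a few lines of Python. This is only a sketch of the calculation in the GPA problem; the variable names are illustrative.

```python
# Values given in the problem statement.
mu = 3.4       # population mean GPA
sigma = 0.35   # population standard deviation
n = 25         # students per seminar

# Standard deviation of the sampling distribution of the mean.
sd_mean = sigma / n ** 0.5

# 68-95-99.7 Rule ranges for the seminar's mean GPA.
ranges = {
    label: (mu - k * sd_mean, mu + k * sd_mean)
    for label, k in (("68%", 1), ("95%", 2), ("99.7%", 3))
}

print(f"SD of the sample mean: {sd_mean:.2f}")
for label, (low, high) in ranges.items():
    print(f"{label}: {low:.2f} to {high:.2f}")
```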

Chapter 19: p. 456, #17. Teenage drivers. An insurance company checks police records on 582 accidents selected at random and notes that teenagers were at the wheel in 91 of them.

a) Create a 95% confidence interval for the percentage of all auto accidents that involve teenage drivers.

$n = 582$

$\hat{p} = \dfrac{91}{582} = 0.15636$

$SE(\hat{p}) = \sqrt{\dfrac{0.15636 \times 0.84364}{582}} = 0.01506$

$ME = z^* \times SE(\hat{p}) = 1.96 \times 0.01506 = 0.02951$

95%: $(\hat{p} - ME,\ \hat{p} + ME) = (0.15636 - 0.02951,\ 0.15636 + 0.02951) = (0.127,\ 0.186)$

b) Explain what your interval means.

Based on the given sample of 582 accidents, we are 95% confident that between 12.7% and 18.6% of all auto accidents involve teenage drivers.

c) Explain what “95% confidence” means.

95% of samples of this size will produce confidence intervals that capture the true proportion. We are 95% confident that the true proportion lies in our interval. Our uncertainty is about whether the particular sample we have at hand is one of the successful ones or one of the 5% that fail to produce an interval that captures the true value.

d) A politician urging tighter restrictions on drivers’ licenses issued to teens says, “In one of every five auto accidents, a teenager is behind the wheel.” Does your confidence interval support or contradict this statement? Explain.

One of every five means 20% of all auto accidents. Our confidence interval, 12.7% to 18.6%, lies entirely below 20%, so it contradicts the politician’s statement.
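The interval in part (a) and the comparison in part (d) can be checked with a short Python sketch. It uses the standard-library `statistics.NormalDist` for z*; the function name is illustrative.

```python
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (low, high) for a one-proportion z-interval."""
    p_hat = successes / n
    standard_error = (p_hat * (1 - p_hat) / n) ** 0.5
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


# 91 teenage drivers among 582 randomly selected accidents.
low, high = one_proportion_z_interval(91, 582)
print(f"95% interval: ({low:.3f}, {high:.3f})")

# The politician's claim of "one in five" corresponds to 0.20.
claim = 0.20
print("claim inside interval:", low <= claim <= high)
```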

Chapter 20: p. 478, #17. Law School. According to the Law School Admission Council, in the fall of 2006, 63% of law school applicants were accepted to some law school. The training program LSATisfaction claims that 163 out of the 240 students trained in 2006 were admitted to law school. You can safely consider these trainees to be representative of the population of law school applicants. Has LSATisfaction demonstrated a real improvement over the national average?

a) What are the hypotheses?

$H_0: p = 0.63$
$H_A: p > 0.63$

b) Check the conditions and find the P-value.

The 240 trainees can be considered representative of all law school applicants, and 240 is less than 10% of all law school applicants. We expect $np_0 = 240(0.63) = 151.2$ applicants to be admitted and $nq_0 = 240(0.37) = 88.8$ not to be admitted. Both are at least 10, so the success/failure condition is met. We have $n = 240$, $X = 163$, and $p_0 = 0.63$.

$\hat{p} = \dfrac{X}{n} = \dfrac{163}{240} = 0.6792$

$SD(\hat{p}) = \sqrt{\dfrac{0.63(1 - 0.63)}{240}} = 0.031164$

$z = \dfrac{\hat{p} - p_0}{SD(\hat{p})} = \dfrac{0.6792 - 0.63}{0.031164} = 1.58$

P-value $= P(z > 1.58) = 0.057$

c) Would you recommend this program based on what you see here? Explain.

The P-value shows that the evidence is weak: we can reject the null hypothesis at the 10% level of significance but not at the 5% level. Hence there is some indication that the program may be successful. Candidates should decide whether they can afford the time and expense.
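A one-proportion z-test like this one can be sketched in Python as a check on the hand calculation. The sketch uses the counts stated in the problem (163 admitted out of 240, null proportion 0.63) and an upper-tail alternative; the function name is illustrative.

```python
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Upper-tail one-proportion z-test; returns (p_hat, z, p_value)."""
    p_hat = successes / n
    # The standard deviation uses the null value p0, not p_hat.
    sd = (p0 * (1 - p0) / n) ** 0.5
    z = (p_hat - p0) / sd
    p_value = 1 - NormalDist().cdf(z)
    return p_hat, z, p_value


p_hat, z, p_value = one_proportion_z_test(163, 240, 0.63)
print(f"p-hat = {p_hat:.4f}, z = {z:.2f}, P-value = {p_value:.3f}")
```

The P-value falls between 0.05 and 0.10, which matches the conclusion that the result is significant at the 10% level but not at the 5% level.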

Chapter 21: p. 501, #21. Testing Cars. A clean air standard requires that vehicle exhaust emissions not exceed specified limits for various pollutants. Many states require that cars be tested annually to be sure they meet these standards. Suppose state regulators double-check a random sample of cars that a suspect repair shop has certified as okay. They will revoke the shop’s license if they find significant evidence that the shop is certifying vehicles that do not meet standards.

a) In this context, what is a Type I error? It is decided that the shop is not meeting standards when it is.

b) In this context, what is a Type II error? The shop is certified as meeting standards when it is not.

c) Which type of error would the shop’s owner consider more serious? Type I

d) Which type of error might environmentalists consider more serious? Type II

Chapter 22: p. 521, #23. Politics and Sex. One month before the election, a poll of 630 randomly selected voters showed 54% planning to vote for a certain candidate. A week later it became known that he had had an extramarital affair, and a new poll showed only 51% of 1010 voters supporting him. Do these results indicate a decrease in voter support for his candidacy?

a) Test an appropriate hypothesis and state your conclusion.

$H_0: p_1 - p_2 = 0$
$H_A: p_1 - p_2 > 0$

Here $p_1$ is the proportion of voters planning to vote for the candidate in the first poll and $p_2$ is the proportion supporting him in the poll taken after the news broke.

$\hat{p}_{pooled} = \dfrac{y_1 + y_2}{n_1 + n_2} = \dfrac{340.2 + 515.1}{630 + 1010} = 0.5215$

$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{(0.5215)(0.4785)}{630} + \dfrac{(0.5215)(0.4785)}{1010}} = 0.0254$

$\hat{p}_1 - \hat{p}_2 = 0.54 - 0.51 = 0.03$

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)} = 1.18$

P-value: 0.119

Because the P-value is large, we fail to reject the null hypothesis. There is not enough evidence to conclude that voter support for the candidate decreased.

b) If your conclusion turns out to be wrong, did you make a Type I error or Type II error? Because we failed to reject the null hypothesis, a wrong conclusion would be a Type II error.

c) If you concluded there was a difference, estimate that difference with a confidence interval and interpret your interval in context. N/A. Since we did not conclude there was a difference, we do not construct a confidence interval.
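The pooled two-proportion z-test from part (a) can be reproduced with a short Python sketch. It takes the poll percentages and sample sizes from the problem and uses an upper-tail alternative; the function name is illustrative.

```python
from statistics import NormalDist


def two_proportion_z_test(p_hat1, n1, p_hat2, n2):
    """Upper-tail pooled two-proportion z-test; returns (pooled, se, z, p_value)."""
    # Pool the successes from both groups to estimate the common proportion.
    pooled = (p_hat1 * n1 + p_hat2 * n2) / (n1 + n2)
    se_pooled = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p_hat1 - p_hat2) / se_pooled
    p_value = 1 - NormalDist().cdf(z)
    return pooled, se_pooled, z, p_value


# First poll: 54% of 630 voters. Second poll: 51% of 1010 voters.
pooled, se_pooled, z, p_value = two_proportion_z_test(0.54, 630, 0.51, 1010)
print(f"pooled p-hat = {pooled:.4f}, SE = {se_pooled:.4f}")
print(f"z = {z:.2f}, P-value = {p_value:.3f}")
```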

Chapter 23: p. 556, #23. Pizza. A researcher tests whether the mean cholesterol level among those who eat frozen pizza exceeds the value considered to indicate a health risk. She gets a P-value of 7%. In context, what does this mean?

If in fact the mean cholesterol of pizza eaters does not indicate a health risk, then only 7 of every 100 samples would have mean cholesterol levels as high as observed in this sample.

Chapter 24: p. 584, #29. Lower scores? Newspaper headlines recently announced a decline in science scores among high school seniors. In 2000, a total of 15,109 seniors tested by The National Assessment in Education Program scored a mean of 147 points. Four years earlier, 7537 seniors had averaged 150 points. The standard error of the difference in the mean scores for the two groups was 1.22.

a) Have the science scores declined significantly? Cite appropriate statistical evidence to support your conclusion.

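$n_{2000} = 15109$, $n_{1996} = 7537$
$\bar{y}_{2000} = 147$, $\bar{y}_{1996} = 150$
$SE(\bar{y}_{2000} - \bar{y}_{1996}) = 1.22$
$df = 7537 - 1 = 7536$

95%: $ME = t^* \times SE(\bar{y}_{2000} - \bar{y}_{1996}) = 1.96 \times 1.22 = 2.3912$

$(\bar{y}_{2000} - \bar{y}_{1996}) \pm ME = (147 - 150) \pm 2.3912 = (-5.3912,\ -0.6088)$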

Since 0 is not in the interval, we can conclude that there is a significant difference between the mean scores for the two groups. Hence we conclude that scores in 1996 were significantly higher than scores in 2000.

b) The sample size in 2000 was almost double that in 1996. Does this make the results more convincing or less? Explain.

Although the sample size in 2000 was almost double that in 1996, the results are about equally convincing. Both samples are large, so as long as the assumptions and conditions are satisfied, the difference in sample sizes has little effect on the results.
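The interval for the difference in mean scores can be sketched in Python from the summary numbers in the problem. One assumption to note: the standard library has no t-distribution, so this sketch uses the Normal critical value in place of t*, which is a very close approximation with 7536 degrees of freedom. The function name is illustrative.

```python
from statistics import NormalDist


def difference_interval(mean_a, mean_b, se_diff, confidence=0.95):
    """Interval for mean_a - mean_b given the standard error of the difference.

    Assumption: uses z* as a stand-in for t*, reasonable only when the
    degrees of freedom are very large.
    """
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * se_diff
    diff = mean_a - mean_b
    return diff - margin, diff + margin


# Mean score 147 in 2000, 150 in 1996, SE of the difference 1.22.
low, high = difference_interval(147, 150, 1.22)
print(f"95% interval for (2000 - 1996): ({low:.3f}, {high:.3f})")
print("zero inside interval:", low <= 0 <= high)
```

Both endpoints are negative, so the interval for the 2000 mean minus the 1996 mean excludes 0.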

Chapter 25: p. 604, #15. Temperatures. The table below gives the average high temperatures in January and July for several European cities. Write a 90% confidence interval for the mean temperature difference between summer and winter in Europe. Be sure to check conditions for inference, and clearly explain what your interval means.

City         Jan.  July
Vienna       34    75
Copenhagen   36    72
Paris        42    76
Berlin       35    74
Athens       54    90
Rome         54    88
Amsterdam    40    69
Madrid       47    87
London       44    73
Edinburg     43    65
Moscow       21    76
Belgrade     37    84

Assumptions and Conditions

Paired Data
Independence
Random: This is not a random sample, so we might be wary of inferring that this difference applies to all European cities.
Nearly Normal: The boxplot is symmetric, with no outliers.

$n = 12$, $\bar{d} = 36.8333$, $s_d = 8.6638$

$SE(\bar{d}) = \dfrac{s_d}{\sqrt{12}} = 2.501$

$df = n - 1 = 11$, $t^*_{11} = 1.796$

$ME = t^*_{11} \times SE(\bar{d}) = 1.796(2.50102) = 4.49183$

$\bar{d} \pm ME = (32.341,\ 41.325)$

We are 90% confident that the average high temperature in European cities in July is between 32.3 degrees F and 41.3 degrees F higher than in January.
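The paired t-interval can be reproduced from the temperature table with a short Python sketch. The critical value 1.796 is taken from the worked solution (a t-table value for 11 degrees of freedom at 90% confidence) rather than computed, since the standard library has no t-distribution; the variable names are illustrative.

```python
import statistics

# Average high temperatures (degrees F), in the same city order as the table.
january = [34, 36, 42, 35, 54, 54, 40, 47, 44, 43, 21, 37]
july = [75, 72, 76, 74, 90, 88, 69, 87, 73, 65, 76, 84]

# Paired data: work with the July minus January difference for each city.
differences = [jul - jan for jan, jul in zip(january, july)]

n = len(differences)
d_bar = statistics.fmean(differences)
s_d = statistics.stdev(differences)
se = s_d / n ** 0.5

t_star = 1.796  # table value for df = 11 at 90% confidence
margin = t_star * se
low, high = d_bar - margin, d_bar + margin

print(f"n = {n}, mean difference = {d_bar:.4f}, s_d = {s_d:.4f}")
print(f"SE = {se:.3f}, ME = {margin:.3f}")
print(f"90% interval: ({low:.3f}, {high:.3f})")
```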