One-Sample Means · 1 Write out the null and alternative hypotheses. 2 Calculate the test...

One-Sample Means

August 26, 2019

August 26, 2019 1 / 36

One-Sample Means

We can approach hypothesis testing for x̄ in mostly the same way thatwe approached p̂. However,

We will often run into situations where x̄ is not approximatelynormal.

We will develop a framework for dealing with these situations.

Section 7.1 August 26, 2019 2 / 36

The Central Limit Theorem for x̄

When we collect a sufficiently large sample of n independentobservations from a population with mean µ and standard deviation σ,the sampling distribution of x̄ will be nearly normal with mean

and standard errorσ√n

Section 7.1 August 26, 2019 3 / 36

Conditions for Normality

For the Central Limit Theorem to hold, we require

Independence

Normality

Section 7.1 August 26, 2019 4 / 36

Conditions for Normality: Independence

As before, for independence we typically check for

a simple random sample.

a random process.

Section 7.1 August 26, 2019 5 / 36

Conditions for Normality: Normality?

For x̄ to be normally distributed,

We require that the observations x used to calculate x̄ be from anormally distributed population.

When this doesn’t hold, we require a large sample size.

Section 7.1 August 26, 2019 6 / 36

Large Sample Size

So what is a ”large sample size”?

n < 30: A sample size n less than 30 is considered a small sample. Inthis case, we will need the data to come from a nearly normaldistribution.

n ≥ 30: A sample side n of at least 30 is considered a large sample.Then we can assume that x̄ is nearly normal.

In both cases, we need to be wary of outliers as these can causeproblems with our results.

Section 7.1 August 26, 2019 7 / 36

Normal or Not?

Section 7.1 August 26, 2019 8 / 36

Normal or Not?

For the first plot, n = 15 and there are no clear outliers. For the meanto be normally distributed, we would have to justify that these aredraws from a normal distribution.

Section 7.1 August 26, 2019 9 / 36

Normal or Not?

For the second plot, n = 50 but there is an extreme outlier at about 22.This outlier will prevent us from confidently using a normaldistribution.

Section 7.1 August 26, 2019 10 / 36

One-Sample Means

We will start with the situation wherein we know that X ∼ N(µ, σ)and the value of σ is known.

Section 7.1 August 26, 2019 11 / 36

Confidence Interval for µ

This (1− α)100% confidence interval for µ is

x̄± zα/2 ×σ√n

where σ/√n is the SE and zα/2 is again the critical value.

Section 7.1 August 26, 2019 12 / 36

Example

The following n = 5 observations are from a N(µ, 2) distribution. Finda 90% confidence interval for µ.

1.1, 0.5, 2, 1.9, 2.7

Section 7.1 August 26, 2019 13 / 36

Example

So we can be 90% confident that the true mean µ is in the interval(0.1687, 3.1113).

Section 7.1 August 26, 2019 14 / 36

Example

Recall that when we say ”90% confident”, we mean:

If we draw repeated samples of size 5 from this distribution, then90% of the time the corresponding intervals will contain the truevalue of µ.

Section 7.1 August 26, 2019 15 / 36

In practice, we typically do not know the population standarddeviation σ.

Instead, we have to estimate this quantity.

We will use the sample statistic s to estimate σ.

This strategy works quite well when n ≥ 30

Section 7.1 August 26, 2019 16 / 36

This works quite well because we expect large samples to give usprecise estimates such that

SE =σ√n≈ s√

Section 7.1 August 26, 2019 17 / 36

When n ≥ 30 and σ is unknown, a (1− α)100% confidence interval forµ is

x̄± zα/2s√n

where we’ve plugged in s for σ.

Section 7.1 August 26, 2019 18 / 36

Example

The average heart rate of a random sample of 60 students is found tobe 74 with a standard deviation of 11. Find a 95% confidence intervalfor the true mean heart rate of the students.

Section 7.1 August 26, 2019 19 / 36

Sample Size Calculation

Sample size calculation for means is essentially the same as forproportions!

n ≥(zα/2 × sdMoE

Section 7.1 August 26, 2019 20 / 36

Example

A manufacturer claims with 95% confidence that his product isaccurate within 0.2 units with a standard deviation of 0.2626. Whatsample size would we need to demonstrate this claim?

Section 7.1 August 26, 2019 21 / 36

Hypothesis Testing for a Population Mean

We begin with the setting where n ≥ 30.

As before, it is possible to use the confidence interval to completea hypothesis test.

However, we also want to be able to use our test statistic andp-value approaches.

Section 7.1 August 26, 2019 22 / 36

For n ≥ 30, the test statistic is

ts = z =x̄− µ0s/√n

where again s/√n ≈ σ/

√n because we are using a large sample.

Section 7.1 August 26, 2019 23 / 36

There are five steps to carrying out these hypothesis tests:

1 Write out the null and alternative hypotheses.

2 Calculate the test statistic.

3 Use the significance level to find the critical value

use the test statistic to find the p-value.

4 Compare the critical value to the test statistic

compare the p-value to α.

5 Conclusion.

Section 7.1 August 26, 2019 24 / 36

Example

In its native habitat, the average density of giant hogweed is 5 plantsper m2. In an invaded area, a sample of 50 plants produced an averageof 11.17 plants per m2 with a standard deviation of 8.9. Does theinvaded area have a different average density than the native area?Test at the 5% level of significance.

Section 7.1 August 26, 2019 25 / 36

We now move to the situation where n < 30.

If n < 30 but we are dealing with a normal distribution and σ is known,

ts = z =x̄− µ0σ/√n

but we know that this will rarely (if ever) occur in practice!

Section 7.1 August 26, 2019 26 / 36

Introducing the t-Distribution

With a small sample size, plugging in s for σ can result in someproblems.

Therefore less precise samples will require us to make somechanges.

This brings us to the t-distribution.

Section 7.1 August 26, 2019 27 / 36

The t-distribution is a symmetric, bell-shaped curve like the normaldistribution.

However, the t-distribution has more area in the tails.

Section 7.1 August 26, 2019 28 / 36

These thicker tails turn out to be exactly the correction we need inorder to use s in place of σ when calculating standard error!

Section 7.1 August 26, 2019 29 / 36

The t-Distribution

The t-distribution:

Is always centered at zero.

Has one parameter: degrees of freedom (df).

For our purposes,df = n− 1

where n is our sample size.

Section 7.1 August 26, 2019 30 / 36

The t-Distribution

For the t-distribution, the parameter df controls how fat the tailsare.

Higher values of df result in thinner tails.

That is, higher df results in a t-distribution that looks more like anormal distribution.I.e., bigger sample sizes make the t-distribution look more normal.

When n ≥ 30, the t-distribution will be essentially equivalent tothe normal distribution.

Section 7.1 August 26, 2019 31 / 36

Confidence Intervals for A Single Population Mean

When n < 30 and σ is unknown, we use the t-distribution for ourconfidence intervals. A (1− α)100% confidence interval for µ is

x̄± tα/2,df ×s√n

Section 7.1 August 26, 2019 32 / 36

Critical Values for the t-Distribution

Let’s take a minute to look at the table of t-distribution critical valuesthat will be provided for your final exam.

Section 7.1 August 26, 2019 33 / 36

Test Statistics

The test statistic for the setting where n < 30 and σ is unknown is

ts = t =x̄− µ0s/√n

(two-sided hypotheses)

Section 7.1 August 26, 2019 34 / 36

P-Values

The p-value for the two-sided hypotheses is then

2× P (tdf < −|ts|)

Section 7.1 August 26, 2019 35 / 36

Example

The following data is on red blood cell counts (in 106 cells permicroliter) for 9 people:

5.4, 5.3, 5.3, 5.2, 5.4, 4.9, 5.0, 5.2, 5.4

Test at the 5% level of significance if the average cell count is 5.

Section 7.1 August 26, 2019 36 / 36

One-Sample Means · 1 Write out the null and alternative hypotheses. 2 Calculate the test...

Documents

Hypothesis Testing. To define a statistical Test we 1.Choose a statistic (called the test statistic) 2.Divide the range of possible values for the test

Corpus Linguistics - The Chi Square Test for Statistical ... · Corpus Linguistics The Chi Square Test for Statistical Signi cance Niko Schenk Institut fur England- und Amerikastudien

1 Chapter 9: Introduction to the t statistic. 2 The t Statistic The t statistic allows researchers to use sample data to test hypotheses about an unknown

Robust and E cient One-way MANOVA Tests - Supplemental ... · MANOVA test statistic, and the MCD test statistic. The simulation setting is the same as in Section 6 of the manuscript

AP Statistics 2017 Free-Response Questions...2017 AP ® STATISTICS FREE-RESPONSE QUESTIONS -5- standard deviation of statistic statistic parameter Standardized test statistic: Confidence

-4-€¦ · -5- standard deviation of statistic statistic parameter Standardized test statistic: Confidence interval: statistic critical value standard deviation of statistic

guidelines cance retal_noPW

Dec 19 Presentation S€¦ · ¾List of journals requiring effect sizes ... Use a z test statistic Use s to estimate unknown σ, approximate with a z test statistic Use s to estimate

Don’t Be Another Statistic! Develop a Long-Term Test Automation Strategy

Statistic III: Chi-Square Test

The permutation test as a non-parametric method …hera.ugr.es › doi › 1500000x.pdfThe permutation test as a non-parametric method for testing the statistical signi¢cance of power

NIH Public Access RMSEA Test Statistic in Structural

of the Organic Acids Test - Invivo Clinical · Clinical Signi˜ cance of the Organic Acids Test ... present with elevated lactic acid, including disorders of sugar ... citric acid,

Communications and Its Signiﬁ cance

Towards Explainable AI: Signi cance Tests for Neural Networks · Application: House price valuation Variable Test Statistic Sale Month 2.660 Last Sale Amount 0.768 N Past Sales 0.705

Minimum Kolmogorov–Smirnov test statistic parameter estimates

Mercury - Log Transformed...Coefficient of Variation 7.121933 A-D Test Statistic 51.19065 Skewness 10.5195 A-D 5% Critical Value 0.932722 K-S Test Statistic 0.33119 k star (bias corrected)

V. Multiple Linear Regression€¦ · variables in a regression model: ... The ANOVA F test statistic istest statistic is therefore given by MSR/MSE, and approximately follows an

VAR Order Selection Checking the Model Adequacy · The Likelihood Ratio Test Statistic Because we just need to test zero restrictions on the coefficients, we may use the Wald statistic

A test statistic in the complex wishart distribution and ...4 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 41, NO. 1, JANUARY 2003 A Test Statistic in the Complex Wishart