An “app” thought!

[Figure: ESTIMATED versus OBSERVED values]

VC question: How much is this worth as a killer app?

GAUSS, Carl Friedrich 1777-1855

http://www.york.ac.uk/depts/maths/histstat/people/

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(X - \mu)^2 / 2\sigma^2}$$

where $\pi \approx 3.1416$ and $e \approx 2.7183$.

Normal Distribution

• Unimodal

• Symmetrical

• 34.13% of area under curve is between µ and +1σ.

• 34.13% of area under curve is between µ and -1σ.

• 68.26% of area under curve is within 1σ of µ.

• 95.44% of area under curve is within 2σ of µ.
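The density formula and the area figures can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper names `normal_pdf` and `area_within` are illustrative and not part of the slides.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density written out term by term from the formula for f(X)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Standard normal curve (mu = 0, sigma = 1) from the standard library.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Proportion of the curve lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print("height of the curve at the mean:", normal_pdf(0.0))
print("area between mu and +1 sd:", std_normal.cdf(1.0) - 0.5)
print("area within 1 sd:", area_within(1))
print("area within 2 sd:", area_within(2))
```

The printed areas agree with the percentages on the slide up to rounding in the last digit, since the slide values come from a rounded table.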

Some Problems

• If z = 1, what % of the normal curve lies above it? Below it?

• If z = -1.7, what % of the normal curve lies below it?

• What % of the curve lies between z = -.75 and z = .75?

• What is the z-score such that only 5% of the curve lies above it?

• In the SAT with µ = 500 and σ = 100, what % of the population do you expect to score above 600? Above 750?
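One way to check answers to these problems, instead of reading them from a z table, is sketched below with the standard library's `NormalDist`. The variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sd 1

# If z = 1, what proportion lies above it? Below it?
below_1 = z.cdf(1.0)
above_1 = 1.0 - below_1

# If z = -1.7, what proportion lies below it?
below_minus_1_7 = z.cdf(-1.7)

# What proportion lies between z = -.75 and z = .75?
between = z.cdf(0.75) - z.cdf(-0.75)

# Which z-score leaves only 5% of the curve above it?
z_top_5 = z.inv_cdf(0.95)

# SAT with mu = 500 and sigma = 100: proportion above 600 and above 750.
sat = NormalDist(mu=500, sigma=100)
above_600 = 1.0 - sat.cdf(600)
above_750 = 1.0 - sat.cdf(750)

for label, value in [
    ("above z = 1", above_1),
    ("below z = 1", below_1),
    ("below z = -1.7", below_minus_1_7),
    ("between z = -.75 and z = .75", between),
    ("z with 5% above", z_top_5),
    ("SAT above 600", above_600),
    ("SAT above 750", above_750),
]:
    print(f"{label}: {value:.4f}")
```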

[Figure: many sample means X̄ scattered around the population mean µ]

[Figure: a population with mean µ, from which Samples A through E are drawn, each with its own sample mean X̄A, X̄B, X̄C, X̄D, X̄E]

In reality, the sample mean is just one of many possible sample means drawn from the population, and is rarely equal to µ.

[Figure: the same population and Samples A through E, each of size n, now labeled with their sample standard deviations sA, sB, sC, sD, sE]

In reality, the sample sd is also just one of many possible sample sd’s drawn from the population, and is rarely equal to σ.
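A small simulation makes both points concrete. This is only a sketch: the population (10,000 normally distributed scores with µ = 100 and σ = 15) and the sample size n = 25 are assumptions chosen for illustration, not values from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 scores drawn with mu = 100 and sigma = 15.
population = [random.gauss(100, 15) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population sd (divides by N)

# Draw five samples, A through E, each of size n.
n = 25
samples = {name: random.sample(population, n) for name in "ABCDE"}
sample_means = {name: statistics.fmean(s) for name, s in samples.items()}
sample_sds = {name: statistics.stdev(s) for name, s in samples.items()}  # divides by n - 1

print(f"population: mu = {mu:.2f}, sigma = {sigma:.2f}")
for name in samples:
    print(f"sample {name}: mean = {sample_means[name]:.2f}, sd = {sample_sds[name]:.2f}")
```

Each sample yields a different mean and a different sd, and none of them is expected to match µ or σ exactly.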


$$s^2 = \frac{SS}{N - 1} \qquad \sigma^2 = \frac{SS}{N}$$

What’s the difference?

(Occasionally you will see a little “hat” on the symbol, as in $\hat{\sigma}^2$, to indicate clearly that this is a variance estimate. I like this because it is a reminder that we are usually just making estimates, that estimates are always accompanied by error and bias, and that this is one of the enduring lessons of statistics.)

Standard deviation.

$$s = \sqrt{\frac{SS}{N - 1}}$$
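The two denominators can be compared directly. The sketch below uses a small made-up data set (an assumption for illustration) and takes SS to be the sum of squared deviations from the mean; it then checks the hand computation against the standard library's `statistics` functions.

```python
import math
import statistics

# Made-up scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(scores)
mean = sum(scores) / N

# SS: sum of squared deviations from the mean.
SS = sum((x - mean) ** 2 for x in scores)

var_n = SS / N               # divide by N: variance of these scores as a population
var_n_minus_1 = SS / (N - 1) # divide by N - 1: estimate of the population variance
sd_estimate = math.sqrt(var_n_minus_1)

print("SS =", SS)
print("SS / N =", var_n)
print("SS / (N - 1) =", var_n_minus_1)
print("s =", sd_estimate)

# The standard library implements the same two definitions.
print(statistics.pvariance(scores), statistics.variance(scores))
```

Dividing by N - 1 always gives the larger value, and the gap shrinks as N grows.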

Standard Error of the Mean

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$

As sample size increases, the magnitude of the sampling error decreases; beyond a certain point, increasing the sample size yields diminishing returns in reducing sampling error.
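The diminishing returns follow from the square root in the standard error of the mean, σ divided by the square root of N. A minimal sketch, assuming σ = 15 purely for illustration:

```python
import math

sigma = 15.0  # assumed population sd, for illustration only

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

sample_sizes = [1, 4, 16, 64, 256]
errors = [standard_error(sigma, n) for n in sample_sizes]

for n, se in zip(sample_sizes, errors):
    print(f"N = {n:>3}: standard error = {se:.4f}")
```

Each fourfold increase in N only halves the standard error, so going from N = 64 to N = 256 buys far less absolute precision than going from N = 1 to N = 4.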

Central Limit Theorem

The sampling distribution of means from random samples of n observations approaches a normal distribution regardless of the shape of the parent population.

Just for fun, go check out the Khan Academy: http://www.khanacademy.org/video/central-limit-theorem?playlist=Statistics
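The theorem can also be seen in a quick simulation. The sketch below assumes a strongly skewed parent population (exponential with mean 1), samples of n = 30, and 5,000 replications; all three choices are illustrative and not taken from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Simple moment-based skewness; 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

n = 30             # observations per sample
replications = 5000

# Raw draws from the skewed parent population (exponential, mean 1, sd 1).
raw = [random.expovariate(1.0) for _ in range(replications)]

# Sampling distribution of the mean: one mean per random sample of size n.
means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

print("skewness of raw scores:  ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(means), 2))
print("mean of sample means:    ", round(statistics.fmean(means), 3))
print("sd of sample means:      ", round(statistics.pstdev(means), 3))
```

The sample means are far less skewed than the raw scores, they center on the population mean, and their spread is close to σ divided by the square root of n.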

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

Wow! We can use the z-distribution to test a hypothesis.

Step 1. State the statistical hypothesis H0 to be tested (e.g., H0: µ = 100)

Step 2. Specify the degree of risk of a Type I error, that is, the risk of incorrectly concluding that H0 is false when it is true. This risk, stated as a probability, is denoted by α, the probability of a Type I error.

Step 3. Assuming H0 to be correct, find the probability of obtaining a sample mean that differs from µ by an amount as large or larger than what was observed.

Step 4. Make a decision regarding H0, whether to reject or not to reject it.

An Example

You draw a sample of 25 adopted children. You are interested in whether they are different from the general population on an IQ test (µ = 100, σ = 15).

The mean from your sample is 108. What is the null hypothesis?

H0: µ = 100

Test this hypothesis at α = .05

Step 3. Assuming H0 to be correct, find the probability of obtaining a sample mean that differs from µ by an amount as large or larger than what was observed.

Step 4. Make a decision regarding H0, whether to reject or not to reject it.
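The four steps can be worked through for this example in a few lines. This is a sketch rather than the slides' own solution: it assumes a two-tailed test, because the question asks only whether the children are "different", and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Step 1. H0: mu = 100, with the population sd known to be 15.
mu0 = 100
sigma = 15

# Step 2. Risk of a Type I error.
alpha = 0.05

# Sample information from the example.
n = 25
sample_mean = 108

# Step 3. Probability of a sample mean at least this far from mu if H0 is true.
standard_error = sigma / math.sqrt(n)        # 15 / 5 = 3
z = (sample_mean - mu0) / standard_error     # 8 / 3
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed: an assumption

# Step 4. Decision.
reject_h0 = p_two_tailed < alpha

print(f"z = {z:.2f}, p = {p_two_tailed:.4f}, reject H0: {reject_h0}")
```

Under these assumptions z is about 2.67 and the two-tailed probability falls well below .05, so H0 would be rejected.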

GOSSET, William Sealy 1876-1937

The t-distribution is a family of distributions varying by degrees of freedom (d.f., where d.f. = n - 1). At d.f. = ∞, t = z, but at smaller d.f. the tails are fatter.

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \frac{s}{\sqrt{N}}$$

Degrees of Freedom

df = N - 1

Problem

Sample:

Mean = 54.2
SD = 2.4
N = 16

Do you think that this sample could have been drawn from a population with µ = 50?

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$

The mean for the sample of 54.2 (sd = 2.4) was significantly different from a hypothesized population mean of 50, t(15) = 7.0, p < .001.

The mean for the sample of 54.2 (sd = 2.4) was reliably different from a hypothesized population mean of 50, t(15) = 7.0, p < .001.
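The reported t(15) = 7.0 and p < .001 can be reproduced as follows. The standard library has no t-distribution, so this sketch integrates the t density numerically with Simpson's rule in place of a table lookup; the helper names `t_pdf` and `two_tailed_p` are hypothetical, and a two-tailed test is assumed.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value: area outside [-|t|, +|t|], via Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # the curve is symmetric about 0
    return max(0.0, 1.0 - central_area)

# Values from the problem.
sample_mean, sd, n = 54.2, 2.4, 16
mu0 = 50

standard_error = sd / math.sqrt(n)          # 2.4 / 4 = 0.6
t = (sample_mean - mu0) / standard_error    # 4.2 / 0.6
df = n - 1
p = two_tailed_p(t, df)

print(f"t({df}) = {t:.1f}, p = {p:.6f}")
```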

[Figure: a population with correlation ρXY, from which Samples A through E are drawn, each with its own sample correlation rXY]

The t distribution, at N - 2 degrees of freedom, can be used to test the probability that the statistic r was drawn from a population with ρ = 0. Table C.

$$H_0: \rho_{XY} = 0$$

$$H_1: \rho_{XY} \neq 0$$

where

$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$
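The test statistic for r is simple to compute. In the sketch below the function name `t_for_r` and the example values r = .50 and N = 27 are hypothetical, chosen only to illustrate the formula.

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 2:
        raise ValueError("n must be greater than 2")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical example: r = .50 observed in a sample of N = 27 pairs.
r, n = 0.50, 27
t = t_for_r(r, n)
df = n - 2

print(f"t({df}) = {t:.3f}")
```

The resulting t is then compared against the critical value for N - 2 degrees of freedom in a t table.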
