Last Time
• Central Limit Theorem
  – Illustrations
  – How large n?
  – Normal Approximation to Binomial
• Statistical Inference
  – Estimate unknown parameters
  – Unbiasedness (centered correctly)
  – Standard error (measures spread)
Administrative Matters
Midterm II, coming Tuesday, April 6
• Numerical answers:
  – No computers, no calculators
  – Handwrite Excel formulas (e.g. =9+4^2)
  – Don't do arithmetic (i.e. use such formulas)
• Bring with you:
  – One 8.5 x 11 inch sheet of paper
  – With your favorite info (formulas, Excel, etc.)
• Course is about Concepts, not Memorization
Administrative Matters
Midterm II, coming Tuesday, April 6
• Material Covered: HW 6 – HW 10
  – Note: due Thursday, April 2
  – Will ask grader to return Mon., April 5
  – Can pick up in my office (Hanes 352)
  – So today's HW not included
Administrative Matters
Extra Office Hours before Midterm II
Monday, Apr. 23 8:00 – 10:00
Monday, Apr. 23 11:00 – 2:00
Tuesday, Apr. 24 8:00 – 10:00
Tuesday, Apr. 24 1:00 – 2:00
(usual office hours)
Study Suggestions
1. Work an Old Exam
   a) On Blackboard
   b) Course Information Section
   c) Afterwards, check against given solutions
2. Rework HW problems
   a) Print Assignment sheets
   b) Choose problems in "random" order
   c) Rework (don't just "look over")
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 356-369, 487-497
Approximate Reading for Next Class:
Pages 498-501, 418-422, 372-390
Law of Averages
Case 2: any random sample X_1, …, X_n
CAN SHOW, for n "large":
X̄ is "roughly" N(μ, σ/√n)
Terminology: "Law of Averages, Part 2" = "Central Limit Theorem"
(widely used name)
Central Limit Theorem
Illustration: Rice Univ. Applet
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
[Figure: user-input starting distribution (very non-Normal); distribution of the average of n = 25 (seems very mound shaped?)]
Extreme Case of CLT
Consequences:
X is roughly N(np, √(np(1−p)))
p̂ is roughly N(p, √(p(1−p)/n))
Terminology: Called
The Normal Approximation to the Binomial
Normal Approx. to Binomial
How large n?
• Bigger is better
• Could use the "n ≥ 30" rule from the Law of Averages above
• But clearly depends on p
• Textbook Rule:
OK when {np ≥ 10 & n(1−p) ≥ 10}
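The rule and the approximation itself can be sanity-checked numerically. Here is a minimal Python sketch (the course uses Excel's BINOMDIST and NORMINV; `NormalDist` stands in for those, and n = 100, p = 0.3 are made-up illustration values, not course data):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.3                      # rule OK: np = 30 >= 10 and n(1-p) = 70 >= 10

# Exact Binomial probability P(X <= 35), summed term by term
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(36))

# Normal Approximation to the Binomial: N(np, sqrt(np(1-p)))
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(35)

print(exact, approx)                 # the two agree to within a few percent
```

When np or n(1−p) falls below 10, the same comparison shows the gap widening, which is the point of the textbook rule.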
Statistical Inference
Idea: Develop a formal framework for handling the unknowns p & μ
e.g. 1: Political Polls
e.g. 2a: Population Modeling
e.g. 2b: Measurement Error
Statistical Inference
A parameter is a numerical feature of the population, not the sample.
An estimate of a parameter is some function of the data
(hopefully close to the parameter).
Statistical Inference
Standard Error: for an unbiased estimator, the standard error is its standard deviation.
Notes: For the SE of p̂, since we don't know p, use the sensible estimate √(p̂(1−p̂)/n).
For the SE of X̄, use the sensible estimate s/√n.
Statistical Inference
Another view:
Form conclusions by quantifying uncertainty
(will study several approaches, first is…)
Confidence Intervals
Background:
The sample mean, X̄, is an "estimate" of the population mean, μ
How accurate?
(there is "variability", how much?)
Confidence Intervals
Idea:
Since a point estimate (e.g. X̄ or p̂) is never exactly right
(in particular P(X̄ = μ) = 0),
give a reasonable range of likely values
(range also gives a feeling for the accuracy of estimation)
Confidence Intervals
E.g. with σ known: X_1, …, X_n ~ N(μ, σ)
Think: measurement error
  – Each measurement is Normal
  – Known accuracy (maybe)
Think: population modeling
  – Normal population
  – Known s.d. (a stretch, really need to improve)
Confidence Intervals
E.g. with σ known: X_1, …, X_n ~ N(μ, σ)
Recall the Sampling Distribution:
X̄ ~ N(μ, σ/√n)
(recall we have this even when the data are not Normal, by the Central Limit Theorem)
Use to analyze variation
Confidence Intervals
Understand error via the dist'n of X̄:
(normal density quantifies randomness in X̄)
(distribution centered at μ)
(spread: s.d. = σ/√n)
How to explain to untrained consumers?
(who don't know randomness, distributions, normal curves)
Confidence Intervals
Approach: present an interval
With endpoints:
Estimate ± margin of error
I.e. X̄ ± m
reflecting variability
How to choose m?
Confidence Intervals
Choice of Confidence Interval Radius, i.e. margin of error, m:
Notes:
• No Absolute Range (i.e. including "everything") is available
• From the infinite tail of the normal dist'n
• So need to specify the desired accuracy
Confidence Intervals
Choice of margin of error, m:
Approach:
• Choose a Confidence Level
• Often 0.95
(e.g. FDA likes this number for approving new drugs, and it is a common standard for publication in many fields)
• And take the margin of error to include that part of the sampling distribution
Confidence Intervals
E.g. For confidence level 0.95, want:
[Figure: X̄ distribution, central area 0.95 between μ − m and μ + m, with m = margin of error]
Confidence Intervals
Computation: Recall NORMINV takes areas (probs), and returns cutoffs.
Issue: NORMINV works with lower areas
(Note: lower tail included)
Confidence Intervals
So adapt the needed probs to lower areas….
When inner area = 0.95,
Right tail = 0.025
Shaded Area = 0.975
So need to compute the cutoff as:
NORMINV(0.975, μ, σ/√n)
Confidence Intervals
Need to compute:
NORMINV(0.975, μ, σ/√n)
Major problem: μ is unknown
• But should the answer depend on μ?
• "Accuracy" is only about spread
• Not centerpoint
• Need another view of the problem
Confidence Intervals
Approach to unknown μ:
Recenter, i.e. look at the dist'n of X̄ − μ
Key concept: Centered at 0
Now can calculate as:
m = NORMINV(0.975, 0, σ/√n)
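The recentering trick, in which the unknown μ drops out, can be sketched in Python, with the stdlib `NormalDist.inv_cdf` standing in for NORMINV (σ = 10 and n = 15 here are illustrative values, not part of the derivation):

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, level=0.95):
    """m = NORMINV(1 - (1 - level)/2, 0, sigma/sqrt(n)):
    recentered at 0, so the unknown mu drops out entirely."""
    upper_area = 1 - (1 - level) / 2          # e.g. 0.975 for a 95% level
    return NormalDist(mu=0, sigma=sigma / sqrt(n)).inv_cdf(upper_area)

print(margin_of_error(10, 15))                # about 5.06 at the 95% level
```

Note that `margin_of_error` never takes a mean as input: only the spread σ/√n and the confidence level matter, which is the point of the slide above.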
Confidence Intervals
Computation of:
m = NORMINV(0.975, 0, σ/√n)
Smaller Problem: Don't know σ
Approach 1: Estimate σ with s
(natural approach: use the estimate)
• Leads to complications
• Will study later
Approach 2: Sometimes know σ
Research Corner
How many bumps in the stamps data?
Kernel Density Estimates: depends on the window
~1? ~2? ~7? ~10?
Early Approach:
Use the data to choose the window width
Challenge:
Not enough info in the data for a good choice
Alternate Approach:
Scale Space
Research Corner
Scale Space:
Main Idea:
• Don't try to choose a window width
• Instead use all of them
• Terminology from Computer Vision
(goal: teach computers to "see")
  – Oversmoothing: coarse scale view (zoomed out – macroscopic perception)
  – Undersmoothing: fine scale view (zoomed in – microscopic perception)
Research Corner
Scale Space:
View 1: Rainbow colored movie
View 2: Rainbow colored overlay
View 3: Rainbow colored surface
Research Corner
Scale Space:
Main Idea:
• Don't try to choose a window width
• Instead use all of them
Challenge: how to do statistical inference?
Which bumps are really there?
(i.e. statistically significant)
Address this next time
Confidence Intervals
E.g. Crop researchers plant 15 plots with a new variety of corn. The yields, in bushels per acre, are:
138, 139.1, 113, 132.5, 140.7, 109.7, 118.9, 134.8, 109.6, 127.3, 115.6, 130.4, 130.2, 111.7, 105.5
Assume that σ = 10 bushels / acre
Confidence Intervals
E.g. Find:
a) The 90% Confidence Interval for the mean value μ, for this type of corn.
b) The 95% Confidence Interval.
c) The 99% Confidence Interval.
d) How do the CIs change as the confidence level increases?
Solution, part 1 of Class Example 11:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg11.xls
Confidence Intervals
E.g. Find: a) 90% Confidence Interval for μ
Next study relevant parts of E.g. 11:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg11.xls
Use Excel, data in C8:C22
Steps:
- Sample Size, n
- Average, X̄
- S. D., σ
- Margin, m
- CI endpoint, left
- CI endpoint, right
Confidence Intervals
E.g. Find: a) 90% CI for μ: [119.6, 128.0]
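As a cross-check on the spreadsheet steps, here is a hedged Python version of the same computation, with the stdlib `NormalDist.inv_cdf` standing in for NORMINV (data and σ = 10 from the corn example above):

```python
from math import sqrt
from statistics import NormalDist

yields = [138, 139.1, 113, 132.5, 140.7, 109.7, 118.9, 134.8,
          109.6, 127.3, 115.6, 130.4, 130.2, 111.7, 105.5]
sigma = 10                                         # assumed known s.d.

n = len(yields)                                    # sample size, 15
xbar = sum(yields) / n                             # average, 123.8
m = NormalDist().inv_cdf(0.95) * sigma / sqrt(n)   # 90% level -> 0.95 cutoff
lo, hi = xbar - m, xbar + m

print(round(lo, 1), round(hi, 1))                  # 119.6 128.0
```

The 0.95 cutoff (not 0.90) appears because the 90% central area leaves 5% in each tail, exactly the lower-area adjustment discussed earlier.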
Confidence Intervals
An EXCEL shortcut: CONFIDENCE
(Note: same margin of error as before)
Inputs:
- Sample Size
- S. D.
- Alpha
Careful: parameter α is the 2-tailed outer area
So for level = 0.90, α = 0.10
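Excel's CONFIDENCE shortcut packages exactly the margin-of-error computation above. A minimal Python sketch of what it computes (a stand-in using the stdlib, not Excel's implementation; values from the corn example):

```python
from math import sqrt
from statistics import NormalDist

def confidence(alpha, standard_dev, size):
    """Margin of error in the style of Excel's CONFIDENCE:
    alpha is the 2-tailed outer area, so the cutoff uses 1 - alpha/2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)

# level = 0.90 -> alpha = 0.10; corn example: sigma = 10, n = 15
print(round(confidence(0.10, 10, 15), 2))   # about 4.25, same m as before
```

The `1 - alpha/2` line is the "careful" point on the slide: passing the confidence level itself instead of the outer area α would give the wrong cutoff.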
Confidence Intervals
E.g. Find:
a) 90% CI for μ: [119.6, 128.0]
b) 95% CI for μ: [118.7, 128.9]
c) 99% CI for μ: [117.1, 130.5]
d) How do the CIs change as the confidence level increases?
– Intervals get longer
– Reflects higher demand for accuracy
Confidence Intervals
HW: 6.11 (use Excel to draw curve &
shade by hand)
6.13, 6.14 (7.30,7.70, wider)
6.16 (n = 2673, so CLT gives Normal)
Choice of Sample Size
Additional use of the margin of error idea
Background: [Figure: X̄ distributions, wide for small n, narrow for large n; spread is σ/√n]
Could choose n to make σ/√n = a desired value
But the S. D. is not very interpretable, so instead make the "margin of error", m = the desired value
Then get: "X̄ is within m units of μ, 95% of the time"
Choice of Sample Size
Given m, how do we find n?
Solve for n the equation:
0.95 = P(|X̄ − μ| ≤ m)
(where is n in this?)
(use of "standardization"):
0.95 = P(|X̄ − μ| ≤ m) = P(|X̄ − μ| / (σ/√n) ≤ m / (σ/√n)) = P(|Z| ≤ m / (σ/√n))
[so use NORMINV & the Standard Normal, N(0,1)]
Choice of Sample Size
Graphically, find m so that:
Area between −m/(σ/√n) and m/(σ/√n) = 0.95,
i.e. Area below m/(σ/√n) = 0.975
Choice of Sample Size
Thus solve:
m / (σ/√n) = NORMINV(0.975, 0, 1)
√n = NORMINV(0.975, 0, 1) · σ / m
n = (NORMINV(0.975, 0, 1) · σ / m)²
(put this on the list of formulas)
Choice of Sample Size
n = (NORMINV(0.975, 0, 1) · σ / m)²
Numerical fine points:
• Change the 0.975 for coverage prob. ≠ 0.95
• Round decimals upwards, to be "sure of desired coverage"
Choice of Sample Size
EXCEL Implementation of n = (NORMINV(0.975, 0, 1) · σ / m)²
Class Example 11, Part 2:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg11.xls
Choice of Sample Size
Class Example 11, Part 2:
Recall: Corn Yield Data
- Gave X̄
- Assumed σ = 10
- Resulted in margin of error, m
How large should n be to give a smaller (90%) margin of error, say m = 2?
Choice of Sample Size
Class Example 11, Part 2:
How large should n be to give a smaller (90%) margin of error, say m = 2?
Compute from:
n = (NORMINV(0.95, 0, 1) · σ / m)²
(recall 90% central area, so use the 95% cutoff)
Round up, to be safe in the statement
Choice of Sample Size
Class Example 11, Part 2:
Excel function to round up: CEILING
Choice of Sample Size
Class Example 11, Part 2:
How large should n be to give a smaller (90%) margin of error, say m = 2?
n = 68
Choice of Sample Size
Now ask for a higher confidence level:
How large should n be to give a (99%) margin of error of m = 2?
Similar computations: n = 166
Choice of Sample Size
Now ask for a smaller margin:
How large should n be to give a (99%) margin of error of m = 0.2?
Similar computations: n = 16588
Note: serious round up
(10 times the accuracy requires 100 times as much data)
(Law of Averages: Square Root)
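The three sample-size answers above can be reproduced from the formula n = (NORMINV(cutoff, 0, 1) · σ/m)², rounded up. A minimal Python sketch (stdlib only; `inv_cdf` stands in for NORMINV and `ceil` for Excel's CEILING):

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, m, level):
    """n = (NORMINV(1 - (1 - level)/2, 0, 1) * sigma / m)^2,
    rounded up to be safe in the coverage statement."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z * sigma / m) ** 2)

print(sample_size(10, 2,   0.90))   # 68
print(sample_size(10, 2,   0.99))   # 166
print(sample_size(10, 0.2, 0.99))   # 16588
```

The last two calls show the square-root law directly: shrinking m by a factor of 10 at the same level multiplies the required n by 100.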
Choice of Sample Size
HW: 6.29, 6.30 (52), 6.31
n = (NORMINV(0.95, 0, 1) · σ / m)²
And now for something completely different….
An interesting advertisement:
http://www.albinoblacksheep.com/flash/honda.php
C.I.s for proportions
Recall:
Counts: X ~ Bi(n, p), with μ_X = np, σ_X = √(np(1−p))
Sample Proportions: p̂ = X/n, with μ_p̂ = p, σ_p̂ = √(p(1−p)/n)
C.I.s for proportions
Calculate prob's with BINOMDIST
(but C.I.s need the inverse of probs)
Note: there is no BINOMINV, so instead use the Normal Approximation
Recall:
Normal Approx. to Binomial
Example: from StatsPortal
http://courses.bfwpub.com/ips6e.php
[Applet: for Bi(n, p), control n and p, see the prob. histo., compare to the fitted (by mean & sd) Normal dist'n]
C.I.s for proportions
Recall the Normal Approximation to the Binomial:
For np ≥ 10 & n(1−p) ≥ 10:
X is approximately N(np, √(np(1−p)))
p̂ is approximately N(p, √(p(1−p)/n))
So use NORMINV (and often NORMDIST)
C.I.s for proportions
Main problem: don't know p
Solution: Depends on context: CIs or hypothesis tests
Different from the Normal, since now the mean and sd are linked, with both depending on p, instead of separate μ & σ
C.I.s for proportions
Case 1: Margin of Error and CIs:
p̂ − p ~ N(0, √(p(1−p)/n)), with central area 0.95 between −m and m, so area 0.975 below m
So: m = NORMINV(0.975, 0, √(p(1−p)/n))
C.I.s for proportions
Case 1: Margin of Error and CIs:
m = NORMINV(0.975, 0, √(p(1−p)/n))
Continuing problem: Unknown p
Solution 1: "Best Guess"
Replace p by p̂
C.I.s for proportions
Solution 2: "Conservative"
Idea: make the sd (and thus m) as large as possible
(makes no sense for the Normal)
f(p) = p(1−p):
- zeros at 0 & 1
- max at p = 1/2
C.I.s for proportions
Solution 2: "Conservative"
Can check by calculus:
max over p in [0, 1] of √(p(1−p)) = √((1/2)(1 − 1/2)) = √(1/4) = 1/2
Thus m = NORMINV(0.975, 0, √(1/(4n))) = NORMINV(0.975, 0, 1/(2·√n))
C.I.s for proportions
Example: Old Text Problem 8.8
Power companies spend time and money trimming trees to keep branches from falling on lines. Chemical treatment can stunt tree growth, but too much may kill the tree. In an experiment on 216 trees, 41 died. Give a 99% CI for the proportion expected to die from this treatment.
Solution: Class example 12, part 1:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg12.xls
C.I.s for proportions
Class e.g. 12, part 1
Steps:
- Sample Size, n
- Data Count, X
- Sample Prop., p̂
- Check the Normal Approximation
- Best Guess Margin of Error
- Conservative Margin of Error
(Recall 99% level & 2 tails…)
C.I.s for proportions
Class e.g. 12, part 1
Best Guess CI: [0.121, 0.259]
Conservative CI: [0.102, 0.277]
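Both intervals can be reproduced in Python, with the stdlib `NormalDist.inv_cdf` standing in for NORMINV (n = 216 and X = 41 are from the tree problem):

```python
from math import sqrt
from statistics import NormalDist

n, X = 216, 41
p_hat = X / n                                   # sample proportion, about 0.19
z = NormalDist().inv_cdf(0.995)                 # 99% level -> 0.995 cutoff

m_best = z * sqrt(p_hat * (1 - p_hat) / n)      # best-guess margin of error
m_cons = z * (1 / (2 * sqrt(n)))                # conservative margin of error

print(round(p_hat - m_best, 3), round(p_hat + m_best, 3))   # 0.121 0.259
print(round(p_hat - m_cons, 3), round(p_hat + m_cons, 3))   # 0.102 0.277
```

Note `m_cons > m_best` here: the conservative margin replaces √(p(1−p)) with its maximum value 1/2, which costs extra width whenever p̂ is far from 0.5.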
C.I.s for proportions
Example: Old Text Problem 8.8
Solution: Class example 12, part 1:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg12.xls
Note: the Conservative interval is bigger
Since p̂ = 0.19 is far from 0.5 (big gap)
So may pay a substantial price for being "safe"
C.I.s for proportions
HW:
8.7
Do both best-guess and conservative CIs: 8.11, 8.13a, 8.19
Recommended