
Stor 155, Section 2, Last Time

• Prediction in Regression

– Given new point X0, predict Y0

– Confidence interval for mean

– Prediction Interval for value

• Review…


Stat 31 Final Exam:

Date & Time: Tuesday, May 8, 8:00-11:00

Last Office Hours:

• Thursday, May 3, 12:00 - 5:00

• Monday, May 7, 10:00 - 5:00

• & by email appointment (earlier)

Bring with you, to exam:

• Single (8.5" x 11") sheet of formulas

• Front & Back OK


Review Slippery Issues

Major Confusion:

Population Quantities

Vs.

Sample Quantities


Response to a Request

You said at the end of today's class that you would be willing to take class time to "reteach" concepts that might still be unknown to us.

Well, in my case, it seems that probability and probability distribution is a hard concept for me to grasp.

On the first midterm, I missed … and on the second midterm, I missed …

I seem to be able to grasp the other concepts involving binomial distribution, normal distribution, t-distribution, etc fairly well, but probability is really killing me on the exams.

If you could reteach these or brush up on them I would greatly appreciate it.


Levels of Probability

• Simple Events

– Big Rules of Prob (Not, And, Or)

– Bayes Rule

• Distributions (in general)

– Defined by Tables

   • Summary of discrete probs

   • Get probs by summing

– Uniform

   • Get probs by finding areas


Levels of Probability

• Distributions (in general)

• Named (& Useful) Distributions

– Binomial

   • Discrete distribution of Counts

   • Compute with BINOMDIST & Normal Approx.

– Normal

   • Continuous distribution of Averages

   • Compute with NORMDIST & NORMINV

– T

   • Similar to Normal, for estimated s.d.

   • Compute with TDIST & TINV
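For readers working outside Excel, here is a rough Python sketch of the Binomial and Normal computations named above, using only the standard library. The helper names `binom_pmf` and `binom_cdf` and the 10-toss example are assumptions made for illustration, not part of the course materials. The T distribution is left out because the standard library has no counterpart to TDIST & TINV.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P{X = k} for X ~ Binomial(n, p); plays the role of BINOMDIST(k, n, p, FALSE)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P{X <= k}, found by summing; plays the role of BINOMDIST(k, n, p, TRUE)."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

# Hypothetical example: chance of at most 3 heads in 10 fair coin tosses
n, p = 10, 0.5
exact = binom_cdf(3, n, p)

# Normal Approx.: match the mean n*p and s.d. sqrt(n*p*(1-p))
# (no continuity correction, so the match is only rough for small n)
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(3)

# NORMINV-style lookup: cutoff with area 0.975 to its left under N(0, 1)
cutoff = NormalDist(0, 1).inv_cdf(0.975)

print(exact, approx, cutoff)
```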


Detailed Look

Simple Events:

• Big Rules of Probability:

– Not Rule ( 1 – P{opposite})

– Or Rule (glasses – football)

– And rule (multiply conditional prob’s)

– Use in combination for real power

• Bayes Rule

– Turn around conditional probabilities

– Write hard ones in terms of easy ones

– Recall surprising disease testing result
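A small Python sketch of how the Big Rules combine into Bayes Rule for a disease-testing setup. The numbers (1% prevalence, 95% sensitivity, 5% false positive rate) and the function name `bayes_posterior` are hypothetical, chosen only for illustration; they are not the figures from the class example.

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P{disease | positive test}, built from the Not, And and Or rules."""
    # And rule: P{positive & disease} = P{disease} * P{positive | disease}
    pos_and_disease = prior * sensitivity
    # Not rule gives P{no disease} = 1 - prior; And rule again for false positives
    pos_and_healthy = (1 - prior) * false_positive_rate
    # Or rule for disjoint cases: P{positive} is the sum of the two pieces
    p_positive = pos_and_disease + pos_and_healthy
    # Turn around the conditional probability
    return pos_and_disease / p_positive

# Hypothetical inputs (assumptions, not class data)
posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(posterior)
```

With these made-up inputs the chance of disease given a positive test comes out well below the test's 95% sensitivity, which is the kind of surprise the disease testing example points to.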


Detailed Look

• Distributions (in general)

– Defined by Tables

   • Summary of discrete probs

   • Get probs by summing

   • Easy to forget after so much other stuff…

Studied in Notes: 2/20, 2/22, 3/1

Some highlights…


Highlights of Dist’ns in Tables


Random Variables

Die rolling example, for X = “net winnings”:

Win $9 if 5 or 6; pay $4 if 1, 2 or 4

Probability Structure of X is summarized by:

P{X = 9} = 1/3, P{X = -4} = 1/2, P{X = 0} = 1/6

Convenient form: a table

Winning   9     -4    0
Prob.     1/3   1/2   1/6


Summary of Prob. Structure

In general: for discrete X, summarize “distribution” (i.e. full prob. structure) by a table:

Values   x1   x2   …   xk
Prob.    p1   p2   …   pk

Where:

i. All $p_i$ are between 0 and 1

ii. $\sum_{i=1}^{k} p_i = 1$ (so get a prob. funct’n as above)


Summary of Prob. Structure

Summarize distribution, for discrete X, by a table:

Values   x1   x2   …   xk
Prob.    p1   p2   …   pk

Power of this idea:

• Get probs by summing table values

• Special case of disjoint OR rule


Summary of Prob. Structure

E.g. Die Rolling game above:

P{X = 9} = 1/3

P{X < 2} = P{X = 0} + P{X = -4} =1/6+1/2 = 2/3

P{X = 5} = 0 (not in table!)

Winning   9     -4    0
Prob.     1/3   1/2   1/6
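The "get probs by summing" idea can be sketched in Python. The dictionary `dist` holds the table for the die rolling game; the helper name `prob` is an assumption made for this illustration.

```python
from fractions import Fraction

# Distribution table for X = net winnings in the die rolling game
dist = {9: Fraction(1, 3), -4: Fraction(1, 2), 0: Fraction(1, 6)}

def prob(event, table=dist):
    """P{event}: sum the table probs over the values x where event(x) is True."""
    return sum((p for x, p in table.items() if event(x)), Fraction(0))

p_nine = prob(lambda x: x == 9)    # a single table entry
p_less_2 = prob(lambda x: x < 2)   # sums the entries for 0 and -4
p_five = prob(lambda x: x == 5)    # 5 is not in the table, so nothing is summed

print(p_nine, p_less_2, p_five)
```

Exact fractions are used so the results can be compared directly with the hand calculations on the slide.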


Summary of Prob. Structure

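E.g. Die Rolling game above:

Winning   9     -4    0
Prob.     1/3   1/2   1/6

$P\{X = 9 \mid X \ge 0\} = \frac{P\{X = 9 \;\&\; X \ge 0\}}{P\{X \ge 0\}} = \frac{P\{X = 9\}}{P\{X \ge 0\}} = \frac{1/3}{1/3 + 1/6} = \frac{1/3}{1/2} = \frac{2}{3}$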
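A Python sketch of a conditional probability computed from the table. It assumes the slide's formula reads P{X = 9 | X ≥ 0}, which is the reading consistent with the denominator 1/3 + 1/6; the helper names `prob` and `cond_prob` are made up for illustration.

```python
from fractions import Fraction

# Distribution table for X = net winnings in the die rolling game
dist = {9: Fraction(1, 3), -4: Fraction(1, 2), 0: Fraction(1, 6)}

def prob(event):
    """P{event}, by summing table probs over the values in the event."""
    return sum((p for x, p in dist.items() if event(x)), Fraction(0))

def cond_prob(event, given):
    """P{event | given} = P{event & given} / P{given}."""
    return prob(lambda x: event(x) and given(x)) / prob(given)

# Assumed reading of the slide: P{X = 9 | X >= 0}
p_cond = cond_prob(lambda x: x == 9, lambda x: x >= 0)
print(p_cond)
```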


Mean of Discrete Distributions

Frequentist approach to mean: a weighted average of values, where weights are probabilities

$\mu_X = \sum_{i=1}^{k} p_i x_i$


Mean of Discrete Distributions

E.g. Above Die Rolling Game:

Winning   9     -4    0
Prob.     1/3   1/2   1/6

Mean of distribution = (1/3)(9) + (1/6)(0) + (1/2)(-4) = 3 - 2 = 1

Interpretation: on average (over large number of plays) winnings per play = $1

Conclusion: should be very happy to play
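The weighted average can be checked with a few lines of Python. This is only a sketch; the names `dist` and `mean` are chosen for illustration.

```python
from fractions import Fraction

# Distribution table for X = net winnings in the die rolling game
dist = {9: Fraction(1, 3), -4: Fraction(1, 2), 0: Fraction(1, 6)}

# Mean = weighted average of the values, with the probabilities as weights
mean = sum(p * x for x, p in dist.items())
print(mean)
```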


Variance of Random Variables

So define: Variance of a distribution (random variable X) as:

$\sigma_X^2 = \sum_{j=1}^{k} p_j (x_j - \mu_X)^2$


Variance of Random Variables

E.g. above game:

Winning   9     -4    0
Prob.     1/3   1/2   1/6

$\sigma_X^2 = \frac{1}{2}(-4 - 1)^2 + \frac{1}{6}(0 - 1)^2 + \frac{1}{3}(9 - 1)^2$

=(1/2)*5^2+(1/6)*1^2+(1/3)*8^2

Note: one acceptable Excel form, e.g. for exam (but there are many)


Standard Deviation

Recall standard deviation is square root of variance (same units as data)

E.g. above game:

Winning   9     -4    0
Prob.     1/3   1/2   1/6

Standard Deviation

=sqrt((1/2)*5^2+(1/6)*1^2+(1/3)*8^2)
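The Excel forms for the variance and standard deviation of the game can be mirrored in Python as a check. This is a sketch only; the variable names are chosen for illustration.

```python
from fractions import Fraction
from math import sqrt

# Distribution table for X = net winnings in the die rolling game
dist = {9: Fraction(1, 3), -4: Fraction(1, 2), 0: Fraction(1, 6)}

# Mean: probability-weighted average of the values
mean = sum(p * x for x, p in dist.items())

# Variance: probability-weighted average of squared distances from the mean
variance = sum(p * (x - mean) ** 2 for x, p in dist.items())

# Standard deviation: square root of the variance, back in dollars
sd = sqrt(variance)

print(mean, variance, sd)
```

The squared distances are 8^2, 5^2 and 1^2, matching the terms in the Excel formula.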


And Now for Something Completely Different

Thought Provoking Movie…

http://www.aclu.org/pizza/


Review Slippery Issues

Major Confusion:

Population Quantities

Vs.

Sample Quantities


Recall Pepsi Challenge

In class taste test:

• Removed bias with randomization

• Double blind approach

• Asked which was:

– Better

– Sweeter

– which


Recall Pepsi Challenge

Results summarized in http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stat155CokePepsiResults2007.xls

Recall Eyeball impressions:

a. Perhaps no consensus preference between Pepsi and Coke?

– Is 54% "significantly different" from 50%?

Result of "marketing research"???


Recall Pepsi Challenge

b. Perhaps no consensus as to which is sweeter?

• Very different from the past, when Pepsi was noticeably sweeter

• This may have driven old Pepsi challenge phenomenon

• Coke figured this out, and matched Pepsi in sweetness


Recall Pepsi Challenge

c. Most people believe they know

– Serious cola drinkers, because now flavor driven

– In past, was sweetness driven, and there were many advertising-caused misperceptions!

d. People tend to get it right or not??? (less clear)

– Overall 71% right. Seems like it, but again is that significantly different from 50%?


Recall Pepsi Challenge

e. Those who think they know tend to be right???

– People who thought they knew: right 71% of the time

f. Those who don't think they know seem to be right as well. Wonder why?

– People who didn't: also right 70% of time? Why? "Natural sampling variation"???

– Any difference between people who thought they knew, and those who did not think so?


Recall Pepsi Challenge

g. Coin toss was fair (or is 57% heads significantly different from 50%?)

How accurate are those ideas?

• Will build tools to assess this

• Called “hypo tests” and “P-values”

• Revisit this now


Pepsi – Coke Taste Test

Data and Analysis:

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stat155CokePepsiResults2007.xls

Hypothesis Tests:

• Proportions based (i.e. think about p)

• Interesting Hypos: $H_0: p = 0.5$ vs. $H_A: p \neq 0.5$

• Recall Sampling Distribution: $\hat{p} - p \sim N\left(0, \sqrt{\frac{p(1-p)}{n}}\right)$


Pepsi – Coke Taste Test

Data and Analysis:

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stat155CokePepsiResults2007.xls

P-value: P{what saw or m.c. (more conclusive) | p = 0.5}

Under assumption p = 0.5,

$\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.5(1-0.5)}{n}} = \frac{0.5}{\sqrt{n}} = \frac{1}{2\sqrt{n}}$

So compute P-value as: Area beyond obs’d $|\hat{p} - 0.5|$


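Pepsi – Coke Taste Test

Data and Analysis:

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stat155CokePepsiResults2007.xls

Compute P-value as: Area beyond obs’d $|\hat{p} - 0.5|$ (two-sided, so count both tails)

=2*(1-NORMDIST(ABS(phat-0.5),0,1/(2*SQRT(n)),TRUE))

(NORMDIST with TRUE gives the area to the left, so subtract from 1 to get a tail area)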
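For comparison, a Python sketch of the same P-value calculation using the standard library's `NormalDist`. It assumes a two-sided alternative (p different from 0.5), which matches the use of ABS in the Excel form. The function name `two_sided_p_value` and the inputs (54% out of n = 100) are hypothetical; the actual class counts are in the spreadsheet linked above.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(phat, n, p0=0.5):
    """Normal-approximation P-value for H0: p = p0 vs. HA: p != p0."""
    # s.d. of phat under H0; equals 1/(2*sqrt(n)) when p0 = 0.5
    se = sqrt(p0 * (1 - p0) / n)
    # Area to the left of |phat - p0| under N(0, se), like NORMDIST(..., TRUE)
    lower_area = NormalDist(0, se).cdf(abs(phat - p0))
    # Upper tail area, doubled to count both tails
    return 2 * (1 - lower_area)

# Hypothetical inputs (assumptions, not the class data)
p_val = two_sided_p_value(0.54, 100)
print(p_val)
```

A small P-value says the observed proportion would be hard to explain by sampling variation alone if p were really 0.5; a large one says it would not be surprising.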


Pepsi – Coke Taste Test

Conclusions (P-values):

http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stat155CokePepsiResults2007.xls

• No consensus, Pepsi vs. Coke (0.46)

• No consensus, Sweeter (0.81)

• Most think know (e-5, very strong)

• Get It Right (0.0006, very strong)

• Fair Coin Toss (0.21, seems OK)

• Thought Right, Were Right (0.003, yes)

• Thought Not, Were Right (0.09, perhaps too modest?)


Pepsi – Coke Taste Test

Some interesting history of this test:

• First Attempts

– Pepsi was preferred

– Pepsi was sweeter

– Many got it wrong (even if they thought they knew)

– Reason for “Pepsi challenge”?

• New Coke Came Out

– Response to Pepsi Challenge?


Pepsi – Coke Taste Test

Some interesting history of this test:

• New Coke Came Out

– People thought they hated it…

– Anger over changing the flavor…

– So Coke Classic came out

• Fun for me:

New Coke vs. Coke Classic


Pepsi – Coke Taste Test

Some interesting history of this test:

• Taste test: New Coke vs. Coke Classic

– New Coke preferred to Coke Classic!

– New Coke was sweeter

– Most got it wrong (even if they thought they knew)

• Changes Over Time

– Appears Coke Classic slowly morphed into New Coke…