Random Thoughts 2012 (COMP 066) Jan-Michael Frahm Jared Heinly source: fivethirtyeight.com




Election Polls

• Virginia at 8:30 pm was 58% Romney and 41% Obama with 12% of the polls in

• That is a poll of 971,000 people

• So why did Obama still win it in the end?


Election Polls

• Why does Florida still not have a projected winner?

• Why does Ohio already have a projected winner with the same percentage of polls in?


Statistic of Support for Candidates

• USA Today: “Romney leads in states with most American cars”
  - eight of the top 10 states for registrations of new American-built cars are Romney supporters
  - the two others are swing states (Iowa & Michigan)
  - in fact, the next four states are also Romney supporters
  - Obama has solid support in 9 of the 10 states with the most foreign-car registrations

• Is this statement true?

• How do we compute whether it's true?


Hypothesis Testing

• What we want is to test a hypothesis H0

• A hypothesis is usually a number that characterizes a population
  - percentage of cancer in the population
  - average height of a person in the US
  - …

• In hypothesis testing we also need an alternative hypothesis, picked so that we can make a statement if H0 is rejected


Alternative Hypothesis

• Alternative hypothesis Ha
  - selected to support the statement made when H0 is rejected
  - typical choices:
    - Ha < H0: "is less" is the desired statement if H0 is rejected
    - Ha ≠ H0: "is different" is the desired statement if H0 is rejected (H0 is false)
    - Ha > H0: "is larger" is the desired statement if H0 is rejected
  - Ha is often called the research hypothesis

• How to select what is H0 and what is Ha?
  - H0 is typically the statement you want to verify
  - H0 is typically assumed to be true unless there is strong evidence it's wrong (similar to a jury trial)


Find a Sample to Test Hypothesis

• Select a sample of size N to test the hypothesis
  - all rules of good samples that we have seen before for polls apply
  - the sample size still influences the certainty of your decision/estimate

• Compute the desired value (e.g. average height of males)

• What does that value tell us?
  - it's only a characteristic of our sample set
  - we will need to extract its characteristics (such as how much it varies) to use it as evidence about the hypothesis
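The "desired value" and its variability can be computed in a few lines. The following is a minimal sketch, assuming the statistic of interest is a sample mean; the function name `sample_summary` and the height values are made up for illustration and do not come from the slides.

```python
import math
import statistics


def sample_summary(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    # Sample standard deviation (divides by n - 1).
    s = statistics.stdev(values)
    # The standard error shrinks as the sample size grows.
    return mean, s / math.sqrt(n)


# Hypothetical heights in cm, for illustration only.
heights = [172.0, 180.5, 169.0, 177.5, 183.0, 175.0, 171.5, 178.5]
mean, se = sample_summary(heights)
print(f"mean = {mean:.2f} cm, standard error = {se:.2f} cm")
```

The standard error is the piece that later turns a raw distance from H0 into a standard score.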


Standardizing the Sample Value

• Convert to a standard score (used to find the probability of the result)
  1. Take out the null hypothesis H0 (value from sample − H0)
     - if small, this indicates you are close to H0; if large, H0 is less likely
  2. Divide by the standard error of the statistic, s
     - this normalizes the distance from step 1, so that "close" and "far" are measured relative to how much the value varies

• What distance is good enough to reject or not reject?
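The two steps above amount to one subtraction and one division. A minimal sketch, where the function name `standard_score` and the example numbers are hypothetical:

```python
def standard_score(sample_value, h0_value, standard_error):
    """Distance of the sample value from H0, measured in standard errors."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    # Step 1: take out the null hypothesis.
    distance = sample_value - h0_value
    # Step 2: divide by the standard error of the statistic.
    return distance / standard_error


# Hypothetical numbers: sample mean 175.9, H0 claims 175.0, standard error 1.5.
z = standard_score(175.9, 175.0, 1.5)
print(f"standard score = {z:.2f}")
```

A score near 0 means the sample agrees with H0; the further it is from 0 in either direction, the less plausible H0 becomes.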


How to reject H0

• The previous normalization brings the value into a standard distribution called the Z-distribution (standard normal distribution)

• Test whether the value is likely or unlikely given the distribution
  - if within the likely region, keep H0
  - if unlikely, reject H0


Z-Distribution (Normal distribution)


How to reject H0

• Note that if H0 is not rejected, that does not mean it is accepted either! It only means there is not enough evidence to reject it


Finding the Likelihood

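• The resulting probability is called the p-value

• Can be looked up in reference tables

• Excel: vp = NORM.S.DIST(value, TRUE)

• For the alternative hypothesis being:
  - less than: p = vp
  - not equal: p = 2 × min(vp, 1 − vp), which is 2 vp when the standard score is negative
  - larger than: p = 1 − vp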
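Outside of Excel, the same lookup can be sketched with the Python standard library. This is an illustrative sketch, not the course's method: `normal_cdf` plays the role of NORM.S.DIST(value, TRUE), and the function names and the labels "less", "greater", and "two-sided" are assumptions chosen here. The two-sided case uses 2 × min(vp, 1 − vp) so that it also works for positive standard scores.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_value(z, alternative):
    """p-value of standard score z for the given alternative hypothesis."""
    vp = normal_cdf(z)
    if alternative == "less":
        return vp
    if alternative == "greater":
        return 1.0 - vp
    if alternative == "two-sided":
        # Count both tails; equals 2 * vp when z is negative.
        return 2.0 * min(vp, 1.0 - vp)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


for alt in ("less", "greater", "two-sided"):
    print(alt, round(p_value(-1.96, alt), 4))
```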


Interpreting p-value

• Set your cutoff, called α (e.g. α = 0.05)

• If the p-value is:
  - less than 0.01: the result is considered highly statistically significant; reject the null hypothesis
  - between 0.01 and α (and not close to α): the result is statistically significant; reject the null hypothesis
  - close to α: the result is marginally statistically significant; either rejecting or not rejecting is defensible
  - greater than α: don't reject

• Always ask for the p-value and α to make up your own mind
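These rules of thumb can be written as a small decision function. The slides do not say how close to α counts as "close", so the `margin` parameter below (default 0.01) is purely an assumption for illustration, as are the function name and the returned wording.

```python
def interpret_p_value(p, alpha=0.05, margin=0.01):
    """Classify a p-value using the rules of thumb from the slide.

    `margin` is a hypothetical definition of "close to alpha".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.01:
        return "highly significant: reject H0"
    if abs(p - alpha) <= margin:
        return "marginally significant: judgment call"
    if p < alpha:
        return "significant: reject H0"
    return "not significant: do not reject H0"


for p in (0.003, 0.02, 0.055, 0.3):
    print(p, "->", interpret_p_value(p))
```

Reporting the p-value together with α, rather than only the verdict, lets readers apply their own cutoff.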


Testing for Proportion of Population

• Again, proportions need to be tested differently
  1. Compute the proportion of the sample that is positive (regular percentage calculation)
  2. Subtract the proportion that is claimed
  3. Calculate the standard error
  4. Divide the result of step 2 by the standard error
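The four steps can be sketched as follows. The slides do not spell out the standard error, so this sketch assumes the common large-sample form sqrt(p0 (1 − p0) / N), computed from the claimed proportion p0. The function name and the survey numbers are hypothetical.

```python
import math


def proportion_z(positives, n, claimed_p):
    """Standard score for a sample proportion against a claimed proportion."""
    if n <= 0 or not 0.0 < claimed_p < 1.0:
        raise ValueError("need n > 0 and 0 < claimed_p < 1")
    # Step 1: proportion of the sample that is positive.
    p_hat = positives / n
    # Step 2: subtract the claimed proportion.
    difference = p_hat - claimed_p
    # Step 3: standard error under H0 (assumed large-sample formula).
    se = math.sqrt(claimed_p * (1.0 - claimed_p) / n)
    # Step 4: divide step 2 by the standard error.
    return difference / se


# Hypothetical survey: 540 of 1000 respondents positive, H0 claims 50%.
z = proportion_z(540, 1000, 0.5)
print(f"standard score = {z:.3f}")
```

The resulting score is then turned into a p-value in the same way as for a mean.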


Statistic of Support for Candidates

• USA Today: “Romney leads in states with most American cars”
  - eight of the top 10 states for registrations of new American-built cars are Romney supporters
  - the two others are swing states (Iowa & Michigan)
  - in fact, the next four states are also Romney supporters
  - Obama has solid support in 9 of the 10 states with the most foreign-car registrations

• Is this statement true?

• How do we compute whether it's true?


Page 17: Random Thoughts 2012 (COMP 066) Jan-Michael Frahm Jared Heinly source: fivethirtyeight.com

17

T-distribution

source: Wikipedia


Small Samples

• Use t-distribution

• Accounts for the sample size through its degrees of freedom (N − 1 for a single sample mean)

• Values can be found in tables

• Excel: T.DIST(value, degrees of freedom, TRUE)
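For readers without Excel or a table at hand, the cumulative t-distribution can be approximated numerically. The sketch below is not a library routine: it integrates the t density with Simpson's rule, and the names `t_pdf`, `t_cdf`, and the `steps` parameter are assumptions made here. It is intended for the small degrees of freedom this slide is about (`math.gamma` overflows for very large values).

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x), comparable to Excel's T.DIST(x, df, TRUE).

    Uses Simpson's rule on [0, |x|] plus the symmetry of the distribution.
    `steps` must be even.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


# With few degrees of freedom the tails are heavier than the normal curve's.
for df in (5, 10, 30):
    print(df, round(t_cdf(-2.0, df), 4))
```

As the degrees of freedom grow, the printed left-tail probabilities shrink toward the value the Z-distribution would give.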


Errors

• Type 1 error: wrong rejection (H0 is rejected although it is true)

• Type 2 error: missed rejection (H0 is not rejected although it is false)
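A small simulation can make the Type 1 error concrete: when H0 is actually true, a test with cutoff α should still reject it wrongly in roughly a fraction α of repeated samples. This is an illustrative sketch only; the function name, the simulated population (mean 0, standard deviation 1, assumed known), and all parameter values are assumptions.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def type1_error_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of simulated samples that wrongly reject a true H0."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0  # H0 is true: the population mean really is mu0
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        z = (mean - mu0) / (sigma / math.sqrt(n))
        vp = normal_cdf(z)
        p = 2.0 * min(vp, 1.0 - vp)  # two-sided p-value
        if p < alpha:
            rejections += 1  # a wrong rejection (Type 1 error)
    return rejections / trials


print(f"wrong rejections: {type1_error_rate():.3f} of trials")
```

Lowering α reduces wrong rejections but makes missed rejections (Type 2 errors) more likely for the same sample size.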