HUDM4122 Probability and Statistical Inference April 1, 2015

Continuing from last class

You Try It

• 49 students use another curriculum and take pre and post tests

• The students average a gain of 3 points

• The students get a standard deviation of 14

• Do the students learn from this curriculum?

• Use a two-tailed Z test to find out

$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3 - 0}{14/\sqrt{49}} = \frac{3}{14/7} = \frac{3}{2} = 1.5$$

1.5 < 1.96, so it is not statistically significant

Questions? Comments?

P-value

• As you’ve probably noticed, most papers don’t just report whether a result is statistically significant, they report a p-value as well

• The p-value is the smallest value of α for which the test is still statistically significant

• Or in other words, it’s the probability that you could have seen a result at least as extreme as the one you got, if the null hypothesis were true

To compute that probability

• Compute a Z for your data

• Take the values –Z and +Z

• Find the area to the left of the smaller Z on the Z distribution

• Find the area to the right of the bigger Z on the Z distribution

• Add those together

• That’s your p
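
The steps above can be sketched in a few lines of Python. This is only an illustration: the function names `phi` and `two_tailed_p` are made up for this example, and the area under the Z distribution is computed with the standard library's `math.erfc` instead of being read from a probability table.

```python
import math


def phi(z: float) -> float:
    """Area to the left of z under the standard normal (Z) curve."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a Z statistic, following the steps on the slide."""
    smaller, bigger = -abs(z), abs(z)   # the values -Z and +Z
    left_area = phi(smaller)            # area to the left of the smaller Z
    right_area = 1.0 - phi(bigger)      # area to the right of the bigger Z
    return left_area + right_area       # add those together


print(round(two_tailed_p(-1.96), 3))  # 0.05
```

Because the Z distribution is symmetric, the two areas are always equal, so the sum is the same as doubling either one.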

Example

• Z = -1.96

• So take -1.96 and +1.96

• Area to the left of Z=-1.96 is 0.025

  – See your probability table

• Area to the right of Z=+1.96 is 1-0.975=0.025

  – See your probability table

• 0.025+0.025=0.05

• So for Z=-1.96, p=0.05

You try it

• Z = 1.53

• Z = -1.03

How you report it

• “The difference between the curricula was not statistically significant, Z=1.50, p=0.13”

• “The difference between the curricula was statistically significant, Z=5, p<0.001”

Reporting

• Customarily

  – p=actual value for p>=0.05

  – p<0.05 for 0.01<p<0.05

  – p<0.01 for 0.001<p<0.01

  – p<0.001 for p<0.001
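
As a rough sketch, the reporting custom above could be written as a small helper. The name `format_p` is hypothetical, and two details are assumptions the slide does not settle: exact boundary values (p = 0.01 or p = 0.001) are placed in the larger bracket, and actual values are shown to two decimal places.

```python
def format_p(p: float) -> str:
    """Format a p-value following the customary reporting brackets."""
    if p >= 0.05:
        return f"p={p:.2f}"   # not significant: report the actual value
    if p >= 0.01:             # assumption: p = 0.01 exactly is reported as p<0.05
        return "p<0.05"
    if p >= 0.001:            # assumption: p = 0.001 exactly is reported as p<0.01
        return "p<0.01"
    return "p<0.001"


print(format_p(0.13))    # p=0.13
print(format_p(0.0004))  # p<0.001
```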

MBB Say

This is non-standard; don’t do this

You can sometimes say “marginally significant” for 0.05<p<0.10; depends on the journal

Comments? Questions?

Comparing a sample to a single value

• Let’s say you want to compare a sample to a single value that is not zero

• Let’s review the example from last time

• And then you will do one

Example from last time

• A TC professor is studying the grades on an exam taken by 49 students

• The students get an average (sampled) grade of 72

• The students get a standard deviation of 7

• Are the students doing statistically significantly better than the C cut-off line of 70?

$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{72 - 70}{7/\sqrt{49}} = \frac{2}{1} = 2$$

2 > 1.96, so it is statistically significant

For Z=2, p = 0.023 + (1-0.977) = 0.046
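
The exam example can be checked with a short script. This is a sketch under the assumption that Z is the sample mean minus the comparison value, divided by the standard error (standard deviation over the square root of the sample size). The helper names `one_sample_z` and `two_tailed_p` are invented for this illustration.

```python
import math


def one_sample_z(mean: float, sd: float, n: int, target: float) -> float:
    """Z statistic for comparing a sample mean to a single target value."""
    standard_error = sd / math.sqrt(n)
    return (mean - target) / standard_error


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value: area left of -|Z| plus area right of +|Z|."""
    return math.erfc(abs(z) / math.sqrt(2))


# Exam example: 49 students, mean 72, standard deviation 7, cut-off 70.
z = one_sample_z(mean=72, sd=7, n=49, target=70)
p = two_tailed_p(z)
print(f"Z={z:.2f}, p={p:.3f}")
```

The computed p agrees with the table-based 0.046 up to rounding, since the table values 0.023 and 0.977 are themselves rounded.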

You try it

• A fisherman is examining the size of the fish he catches to decide if it’s worth fishing in these here waters

• If the average catch size is 36” or under, those jerks in Albany will confiscate his catch

• He catches 64 fish in his first net

• The fish have an average size of 37”

• The fish have a standard deviation of 40”

• Should he fish in these here waters?

• What’s the p value?

$$Z = \frac{37 - 36}{40/\sqrt{64}} = \frac{1}{5} = 0.2$$

0.2 < 1.96, so it is not statistically significant

p = 0.84

So he shouldn’t fish in these here waters

Comments? Questions?

Two-group Z-test

• Combines our previously studied analysis to estimate the confidence interval of the difference between two groups

• And the process for computing statistical significance rather than confidence intervals

Two-sample Z-test

• A statistical test involving the Z distribution

• Which, yes, means that your samples should each have N>30

The test

• H0: The difference between the population means is 0

• Ha: The difference between the population means is different from 0

• Calculate a Z value for the difference between sample means

Significance Criterion

• For a two-tailed test, where α = 0.05

• We consider the test significant if $|Z| > 1.96$

For example

• You’re comparing the difference between Reasoning Mind and Reasoning Lime

• Reasoning Mind: average grade = 72, standard deviation = 6, sample size = 36

• Reasoning Lime: average grade = 60, standard deviation = 30, sample size = 36

Hypotheses

• Null hypothesis: There is no difference between Reasoning Mind and Reasoning Lime

• Alternative hypothesis: There is a difference between Reasoning Mind and Reasoning Lime

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{72 - 60}{\sqrt{\frac{6^2}{36} + \frac{30^2}{36}}} = \frac{12}{\sqrt{26}} \approx 2.35$$

Z ≈ 2.35 > 1.96, so p=0.02 and it is statistically significant
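
The Reasoning Mind versus Reasoning Lime comparison can be reproduced with a short sketch. It assumes the usual two-sample Z statistic: the difference between the sample means divided by the square root of s1²/n1 + s2²/n2. The function names `two_sample_z` and `two_tailed_p` are hypothetical, chosen only for this example.

```python
import math


def two_sample_z(mean1: float, sd1: float, n1: int,
                 mean2: float, sd2: float, n2: int) -> float:
    """Z statistic for the difference between two sample means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value: area left of -|Z| plus area right of +|Z|."""
    return math.erfc(abs(z) / math.sqrt(2))


# Reasoning Mind: mean 72, SD 6, n 36. Reasoning Lime: mean 60, SD 30, n 36.
z = two_sample_z(72, 6, 36, 60, 30, 36)
p = two_tailed_p(z)
print(f"Z={z:.2f}, p={p:.2f}")  # Z=2.35, p=0.02
```

Note how the large standard deviation of the second group dominates the standard error: a 12-point difference still gives a Z only a little above 1.96.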

You try it

• Our friend the fisherman is fishing in two rivers and wants to know if the fish are bigger in one river than another

• Salmon River: average size = 42”, standard deviation = 20”, sample size = 100

• Hudson River: average size = 49”, standard deviation = 30”, sample size = 100

Questions? Comments?

Types of Errors

• Statistician Terminology

• Data Miner Terminology

“Type I Error”

• False Positive

• Rejecting the Null Hypothesis when the Null Hypothesis is true

• Saying the result is statistically significant when there’s nothing there

• α

“Type II Error”

• False Negative

• Failing to reject the Null Hypothesis when the Null Hypothesis is false

• Saying the result is not statistically significant when there’s actually something there

• β

In the traditional statistical significance paradigm

• You control α

• You are unable to control β
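
One way to see what "controlling α" means is a small simulation, sketched here under made-up settings: every simulated study has a true mean gain of 0 (so the null hypothesis is true), with 49 students and a standard deviation of 14 borrowed from the earlier curriculum example. The function name, the number of trials, and the random seed are all arbitrary choices for illustration.

```python
import math
import random
import statistics


def type_i_error_rate(trials: int = 2000, n: int = 49,
                      critical_z: float = 1.96, seed: int = 0) -> float:
    """Fraction of simulated null studies that come out 'significant'."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # The null hypothesis is true here: the true mean gain is 0.
        gains = [rng.gauss(0, 14) for _ in range(n)]
        standard_error = statistics.stdev(gains) / math.sqrt(n)
        z = statistics.mean(gains) / standard_error
        if abs(z) > critical_z:
            false_positives += 1   # a Type I error (false positive)
    return false_positives / trials


rate = type_i_error_rate()
print(f"Share of false positives: {rate:.3f}")
```

The share should land close to α = 0.05, because the 1.96 cut-off was chosen to produce exactly that false positive rate. Nothing in this procedure sets the Type II error rate, which depends on how large the true effect is.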

Type I or Type II error?

• Reasoning Mind is better than Reasoning Lime, but your stat test got p=0.13

• Dreambox is not better than Bob’s Discount Math Curriculum, but your stat test got p=0.03

• Columbia University is better than Columbia College of Hollywood CA, but your stat test got p=0.17

Questions? Comments?

Upcoming Classes

• 4/15 Statistical power

  – HW8 due

• 4/20 Independent-samples t-test

  – HW9 due

• 4/22 Paired-samples t-test

• 4/23 Special session on SPSS