INFERENTIAL STATISTICS

Statistics for Librarians, Session 3: Inferential statistics

DESCRIPTION

Third in a series of four seminars presented to University of North Texas librarians. This presentation focuses on using basic tests that determine the association of two sets of data based on measures of central tendency and variation.

Page 1: Statistics for Librarians, Session 3: Inferential statistics

INFERENTIAL STATISTICS

Page 2: Statistics for Librarians, Session 3: Inferential statistics

GOALS OF SERIES

Comfort

Fears

Page 3: Statistics for Librarians, Session 3: Inferential statistics

SESSION OBJECTIVES

Purpose of Inferential Statistics

Probability

Elements of Significance Testing

Three key tests
• T-test
• Chi-squared
• Correlation (or binomial)

Effect Measures

Page 4: Statistics for Librarians, Session 3: Inferential statistics

PURPOSE OF INFERENTIAL STATISTICS

• Infer results
• Draw conclusions
• Increase the Signal-Noise ratio

Signal

Noise

Page 5: Statistics for Librarians, Session 3: Inferential statistics

INFERENTIAL STATISTICS

Tests of hypotheses
• Expectations
• Associations

Accounts for uncertainty
• Random error
• Confidence interval

Page 6: Statistics for Librarians, Session 3: Inferential statistics

HYPOTHESES

Your Hypothesis (H1)

Null Hypothesis (H0)

Page 7: Statistics for Librarians, Session 3: Inferential statistics

NOT TO PROVE, BUT TO FALSIFY

H1: Difference

H0: No Difference

Page 8: Statistics for Librarians, Session 3: Inferential statistics

NOT TO PROVE, BUT TO FALSIFY

H1: >=10% Increase

H0: <10% Increase

Page 9: Statistics for Librarians, Session 3: Inferential statistics

REVIEW OF DESCRIPTIVE STATISTICS

Page 10: Statistics for Librarians, Session 3: Inferential statistics

LEVELS OF MEASUREMENT (NOIR)

Nominal
• Counts by category
• Binary (Yes/No)
• No meaning between the categories (Blue is not better than Red)

Ordinal
• Ranks
• Scales
• Space between ranks is subjective

Interval
• Integers
• Zero is just another value; it doesn’t mean “absence of”
• Space between values is equal and objective, but discrete

Ratio
• Interval data with a baseline
• Zero (0) means “absence of”
• Space between is continuous
• Includes simple counts

Page 11: Statistics for Librarians, Session 3: Inferential statistics

DESCRIPTIVE STATISTICAL ANALYSIS

Central Tendency

Error

Spread

Page 12: Statistics for Librarians, Session 3: Inferential statistics

CENTRAL TENDENCY BY LEVELS OF MEASUREMENT

Interval or Ratio

Mean

Median

Nominal or Rank

Mode

Median (rank only)

Page 13: Statistics for Librarians, Session 3: Inferential statistics

SPREAD

Interval & Ratio

• Range
• Quantiles
• Standard Deviation

Nominal & Rank

• Distribution Tables

• Bar Graphs

How variable is the data?

Page 14: Statistics for Librarians, Session 3: Inferential statistics

RANGE & QUARTILES

Page 15: Statistics for Librarians, Session 3: Inferential statistics

FORMULAS

Mean

Standard Deviation
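The formulas themselves did not survive on this slide. As a minimal sketch, assuming the usual definitions (the mean as the sum divided by the count, and the sample standard deviation with n-1 in the denominator), they can be written out as follows. The `scores` list is hypothetical illustration data, not the case-study data.

```python
import math
import statistics

def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation: square root of the summed squared
    deviations from the mean, divided by n - 1."""
    m = mean(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))

# Hypothetical data for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(scores))
print(sample_sd(scores))
# Cross-check against the standard library implementation
print(statistics.stdev(scores))
```

Dividing by n-1 rather than n is the choice Excel makes in STDEV.S; dividing by n (STDEV.P) would describe a whole population instead of a sample.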

Page 16: Statistics for Librarians, Session 3: Inferential statistics

PROBABILITY: WHAT’S PROBABILITY GOT TO DO WITH STATISTICS?

Page 17: Statistics for Librarians, Session 3: Inferential statistics

WHAT IS PROBABILITY?

Chance of something happening (x)

Expressed as P(x)=y

Between 0 and 1

Based on distribution of events

Page 18: Statistics for Librarians, Session 3: Inferential statistics

STEM-AND-LEAF

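Stem (groups) | Leaf (last digit)
0 | 01112222222222222233333344445556666677788899
1 | 0000000011122223333356778899
2 | 00122234444799
3 | 0245

Years at UNT (sorted values, partial listing in three columns)

0 | 5 | 13
1 | 6 | 13
1 | 6 | 13
1 | 6 | 13
2 | 6 | 15
2 | 6 | 16
2 | 7 | 17
2 | 7 | 17
2 | 7 | 18
2 | 8 | 18
2 | 8 | 19
3 | 11 | 29
4 | 11 | 29
4 | 12 | 30
4 | 12 | 32
4 | 12 | 34
5 | 12 | 35
5 | 13 |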

Page 19: Statistics for Librarians, Session 3: Inferential statistics

Stem | Leaf | Count
0 | 1122223334445555666666677777899 | 31
1 | 000011122222222333346677889 | 27
2 | 0122234468 | 10
3 | 1112355888 | 11
4 | 12 | 2

Range | Count
0-9 | 31
10-19 | 27
20-29 | 10
30-39 | 11
40-49 | 2

[Figure: Histogram of Years at UNT; x-axis: ranges 0-9 through 40-49; y-axis: count, 0 to 40]
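The step from a stem-and-leaf plot to a histogram can be sketched in a few lines. This is an illustration under assumptions: the function names are made up for this example, and the `years` list is hypothetical data, not the UNT data shown above.

```python
from collections import Counter, defaultdict

def stem_and_leaf(values):
    """Group each value by its tens digit (stem) and keep its last digit (leaf)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return {stem: "".join(str(d) for d in leaves) for stem, leaves in rows.items()}

def bin_counts(values, width=10):
    """Count how many values fall in each range of the given width."""
    counts = Counter(v // width for v in values)
    return {
        f"{b * width}-{b * width + width - 1}": counts[b]
        for b in sorted(counts)
    }

# Hypothetical years of service, for illustration only
years = [1, 2, 2, 5, 9, 10, 13, 13, 18, 21, 24, 30]

plot = stem_and_leaf(years)
hist = bin_counts(years)
for stem, leaves in plot.items():
    print(stem, "|", leaves)
print(hist)
```

The count for each range is simply the number of leaves on the matching stem, which is why the histogram bars have the same shape as the stem-and-leaf rows.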

Page 20: Statistics for Librarians, Session 3: Inferential statistics

NORMAL DISTRIBUTIONS

Page 21: Statistics for Librarians, Session 3: Inferential statistics

PROBABILITY DISTRIBUTION

Page 22: Statistics for Librarians, Session 3: Inferential statistics

Set the mean to 0

Standard Deviations above and below the mean

Page 23: Statistics for Librarians, Session 3: Inferential statistics

DEMONSTRATION OF DISTRIBUTIONS

Distribution of the Population: The “Truth”

N is the number of samples

n is the number of items in each sample

Watch the cumulative mean & median slowly converge on the population values
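The live demonstration is not reproduced in the slides, so the following is only a rough stand-in. It assumes a made-up, skewed population and draws N samples of n items each, tracking the cumulative mean of the sample means. All names and parameter values here are illustrative choices.

```python
import random
import statistics

def cumulative_sample_means(population, n, N, seed=0):
    """Draw N samples of n items each; after every sample, record the
    running average of all sample means seen so far."""
    rng = random.Random(seed)
    running = []
    total = 0.0
    for i in range(1, N + 1):
        sample = rng.sample(population, n)
        total += statistics.mean(sample)
        running.append(total / i)
    return running

# Hypothetical skewed population (the "truth"), for illustration only
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

running = cumulative_sample_means(population, n=5, N=2000)
print("population mean:", round(true_mean, 2))
print("after 10 samples:", round(running[9], 2))
print("after 2000 samples:", round(running[-1], 2))
```

Even with small samples (n = 5), the cumulative mean settles close to the population mean as N grows, which is the behavior the demonstration is meant to show.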

Page 24: Statistics for Librarians, Session 3: Inferential statistics

ACTIVITIES

Page 25: Statistics for Librarians, Session 3: Inferential statistics

CASE STUDY

• Background: Info-Lit course is meeting resistance from skeptical faculty.
• Research Questions:
  • Does the IL course improve grades on final papers?
  • Can the IL course improve passing rates for the course?
  • Do students in different majors respond differently to the IL training?
  • Is the final score related to the number of credit hours enrolled for each student?

Page 26: Statistics for Librarians, Session 3: Inferential statistics

METHODOLOGY

Selection

• Two sections of same course with different instructors.

• Random Assignment

Outcome

• Blinded scoring by 2 TAs
• Scores range from 1-100
• Passing grade: 70

Page 27: Statistics for Librarians, Session 3: Inferential statistics

ACTIVITIES

Table 1
• Distribution of scores

Table 2
• Distribution of passing rates by major

Table 3
• Correlation of scores with credit hours

Page 28: Statistics for Librarians, Session 3: Inferential statistics

DESCRIPTIVE STATISTICS OF CASE STUDY

Page 29: Statistics for Librarians, Session 3: Inferential statistics

DISTRIBUTION OF SCORES

Table 1
• Distribution of scores

Table 2
• Distribution of passing rates by broad field of major

Table 3
• Correlation of scores & credit hours

Page 30: Statistics for Librarians, Session 3: Inferential statistics

SIGNIFICANCE TESTING

Page 31: Statistics for Librarians, Session 3: Inferential statistics

SIGNIFICANCE TESTING

Comparing significance of differences
• Groups against each other
• A group against the population or standard

What is “significant”?
• Risk of being wrong
• Alpha (α)
• Set in advance

Critical Value
• The value that the statistic must meet or exceed to be statistically significant.
• Based on statistic and α

Page 32: Statistics for Librarians, Session 3: Inferential statistics

STEPS IN SIGNIFICANCE TESTING

Which Test?

Calculate Statistic

Critical Value of Statistic?

Probability (p-value)

Page 33: Statistics for Librarians, Session 3: Inferential statistics

KEY ELEMENTS OF SIGNIFICANCE TESTING

Null Hypothesis

Measure of Central Tendency

Standard deviations

Risk of being wrong (alpha)
• Usually .05 or .025 or .01 or .001

Degrees of freedom (df)

Page 34: Statistics for Librarians, Session 3: Inferential statistics

DEGREES OF FREEDOM

Number of values in the final calculation of a statistic that are free to vary.

Page 35: Statistics for Librarians, Session 3: Inferential statistics

DEGREES OF FREEDOM EXPLAINED

• All these have a mean of 5:
  • 5, 5, 5
  • 2, 8, 5
  • 3, 2, 10
  • 7, 4, & ?
• If 2 values are known and the mean is known, then the 3rd value is also known.
  • Only 2 of the 3 values are free to vary.

Page 36: Statistics for Librarians, Session 3: Inferential statistics

CALCULATING DEGREES OF FREEDOM (DF)

For a single sample:
• Degrees of freedom (df) for t-test = n-1

For more than one group:
• df = ∑(n-1) for all groups (k)
• OR, (∑n) - k

For comparing proportions in categories (k):
• df = k-1 (# of categories minus 1)
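The three rules above, and the "7, 4, & ?" puzzle from the previous slide, can be sketched as small helper functions. The function names are illustrative, not part of any library.

```python
def missing_value(known, mean, n):
    """The one value that is NOT free to vary: it is forced by the mean
    once the other n - 1 values are known."""
    return mean * n - sum(known)

def df_single_sample(n):
    """Single sample: df = n - 1."""
    return n - 1

def df_groups(group_sizes):
    """Several groups: df = sum of (n - 1) over groups = (sum of n) - k."""
    return sum(group_sizes) - len(group_sizes)

def df_categories(k):
    """Comparing proportions across k categories: df = k - 1."""
    return k - 1

print(missing_value([7, 4], mean=5, n=3))  # 4
print(df_groups([51, 50]))                 # 99
print(df_categories(5))                    # 4
```

The second call uses the group sizes of 51 and 50 that appear on the "Reading t-test results" slide.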

Page 37: Statistics for Librarians, Session 3: Inferential statistics

COMPARING VALUES: T-TEST

Page 38: Statistics for Librarians, Session 3: Inferential statistics

T-TEST

Used with interval or ratio data

Based on normal distribution

Four Decisions
• Paired or un-paired samples?
• Equal or unequal variances (standard deviations)?
• Risk?
• One- or two-tail?
  • Direction of expected difference
  • Best to bet on difference in both directions (2-tail)

Page 39: Statistics for Librarians, Session 3: Inferential statistics

One-Tail

Two-Tail

Page 40: Statistics for Librarians, Session 3: Inferential statistics

T-TEST FORMULA FOR UNPAIRED SAMPLES

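$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$$

$$\frac{\text{Signal}}{\text{Noise}} = \frac{\text{Difference Between Group Means}}{\text{Variability of Groups}}$$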
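A minimal sketch of this signal-to-noise ratio for two unpaired groups. It assumes equal variances, so the denominator is the standard error built from the pooled variance; the slide does not spell out how S is computed, so that choice is an assumption. The two lists are hypothetical, not the case-study scores.

```python
import math
import statistics

def unpaired_t(group1, group2):
    """Equal-variance (pooled) t statistic for two independent samples.
    Returns (t, degrees of freedom)."""
    n1, n2 = len(group1), len(group2)
    # Signal: difference between the group means
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    # Noise: standard error of that difference, from the pooled variance
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return mean_diff / std_err, n1 + n2 - 2

# Hypothetical scores, for illustration only
intervention = [2, 3, 4, 5, 6]
control = [1, 2, 3, 4, 5]
t, df = unpaired_t(intervention, control)
print(t, df)
```

The returned t would then be compared with the critical value for the chosen alpha and df, or converted to a p-value with a spreadsheet or statistics package.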

Page 41: Statistics for Librarians, Session 3: Inferential statistics

ELEMENTS OF T-TEST USING EXCEL DATA ANALYSIS TOOLPAK

Paired or Unpaired samples?
• Unpaired

Equal or Unequal Variances?
• Equal*

Data
• Data for intervention group
• Data for control group

Hypothesized difference
• 0

Alpha
• 0.025 (for a 2-tail test)

Page 42: Statistics for Librarians, Session 3: Inferential statistics

T-TEST IN EXCEL

Page 43: Statistics for Librarians, Session 3: Inferential statistics

READING T-TEST RESULTS

df = ∑(n-1) = (51-1)+(50-1) = 50+49 = 99

Is p <= 0.025?

Page 44: Statistics for Librarians, Session 3: Inferential statistics

IS THE DIFFERENCE SIGNIFICANT?

p=0.0005

Page 45: Statistics for Librarians, Session 3: Inferential statistics

TESTING DISTRIBUTION OF NOMINAL DATA

Page 46: Statistics for Librarians, Session 3: Inferential statistics

PEARSON’S CHI-SQUARED (χ²) GOODNESS OF FIT TEST

Does an observed frequency distribution differ from an expected distribution?
• Observed is the sample or the intervention.
• Expected is the population or the control or a theoretical distribution.
• Will depend on your Null Hypothesis

Nominal or categorical data
• Counts by category

Page 47: Statistics for Librarians, Session 3: Inferential statistics

EXPECTED RATIOS FOR CASE STUDY

Research Question:
• Do students in different majors respond differently to the IL training?

Null Hypothesis
• The ratio of students who passed will be the same for all majors.

Page 48: Statistics for Librarians, Session 3: Inferential statistics

WHEN TO USE PEARSON’S CHI-SQUARED GOODNESS OF FIT TEST

Nominal Data

Sample Size
• Not too large:
  • Sample is at most 1/10th of population
• Not too small:
  • At least five in each of the categories for the expected group.

Page 49: Statistics for Librarians, Session 3: Inferential statistics

OBSERVED PASSING RATES BY MAJOR

Major | Passed | Not Passed | Grand Total
Arts | 6 | 7 | 13
Humanities | 8 | 5 | 13
Social Sciences | 17 | 10 | 27
STEM | 20 | 5 | 25
Undeclared | 16 | 7 | 23
Total | 67 | 34 | 101

Page 50: Statistics for Librarians, Session 3: Inferential statistics

EXPECTED RATIOS OF PASSING RATES BY MAJOR

• H0: Rates of passing will be the same for all majors.
• Expected rates: 70% of class passes.
• Expected ratios: 70% of each major passes.

Major | Passed | Not Passed | Grand Total
Arts | 11.2 (16*.7) | 4.8 | 16
Humanities | 11.2 (16*.7) | 4.8 | 16
Social Sciences | 18.2 (26*.7) | 7.8 | 26
STEM | 16.1 (23*.7) | 6.9 | 23
Undeclared | 14 (20*.7) | 6 | 20

Page 51: Statistics for Librarians, Session 3: Inferential statistics

CHI-SQUARED GOF TEST FORMULA

• Critical value of Chi-squared depends on degrees of freedom.
• Degrees of freedom
  • Based on the number of categories or table cells (k)
  • df=k-1
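The formula image is missing from this slide. A minimal sketch, assuming the standard Pearson statistic (the sum over categories of (observed - expected)² / expected) with df = k - 1, is shown here. As a simplifying assumption, it tests only the "Passed" counts by major against what the class-wide pass rate would predict for each major's enrollment. That framing is one possible reading, not necessarily the setup used in the Excel slide, and it is not intended to reproduce the p-value reported there. The p-value helper uses a closed form that is valid only for even df.

```python
import math

def chi_squared_gof(observed, expected):
    """Pearson chi-squared statistic and df = k - 1 for k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

def chi_squared_p_value(stat, df):
    """Upper-tail probability of the chi-squared distribution.
    Closed form used here is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = stat / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )

# Observed passes and enrollment by major, from the observed table
passed = [6, 8, 17, 20, 16]
enrolled = [13, 13, 27, 25, 23]

# Assumption: expected passes = class-wide pass rate x enrollment per major
overall_rate = sum(passed) / sum(enrolled)
expected = [n * overall_rate for n in enrolled]

stat, df = chi_squared_gof(passed, expected)
p = chi_squared_p_value(stat, df)
print(round(stat, 3), df, round(p, 3))
```

Every expected count in this framing is above five, which satisfies the "not too small" rule from the earlier slide.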

Page 52: Statistics for Librarians, Session 3: Inferential statistics

CHI-SQUARED IN EXCEL

What is Null Hypothesis?

There is no difference between the majors regarding passing rates.

What is your alpha (risk)?

0.05

Data in summary tables?

Actual Ratios

Expected Ratios

Excel function:

=CHISQ.TEST(actual range1,expected range2)

Provides a p-value

0.0000172

Is p-value <= alpha?

Yes

Page 53: Statistics for Librarians, Session 3: Inferential statistics

CORRELATION OF SCORE & SEMESTER HOURS ENROLLED

Page 54: Statistics for Librarians, Session 3: Inferential statistics

STATISTICAL CORRELATION

Quantitative value of relationship of 2 variables

Expressed in range of -1 to +1
• -1 represents a perfect indirect correlation
• 0 represents no correlation
• +1 represents a perfect direct correlation

Based on co-variance
• How much two variables change together

Page 55: Statistics for Librarians, Session 3: Inferential statistics

PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT

Most commonly used statistic

Normally distributed interval or ratio data only

Labeled as r

Multiplication = Interaction

Signal

Noise

$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1)\, s_x s_y}$$

Page 56: Statistics for Librarians, Session 3: Inferential statistics

CORRELATION IN EXCEL

Null Hypothesis?
• No correlation

Coefficient function (r):
• =PEARSON(range1,range2)

Does NOT have a single function to test for significance

Calculate Probability:

Item | Formula | Value
n | # in sample | 101
df | # in sample - 2 | 99
alpha | 0.025 for 2-Tail Test | 0.025
r | =PEARSON(range1,range2) | 0.362287
t | =r*SQRT(df)/SQRT(1-r^2) | 3.867434
p | =T.DIST.2T(t,df) | 0.000197
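The same calculation can be sketched outside Excel. The t statistic for a correlation is r times the square root of df, divided by the square root of 1 - r², with df = n - 2. Plugging in the slide's r and n reproduces the t value in the table. The function names are illustrative, and the small x/y lists are hypothetical.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson's r: summed cross-products of deviations, divided by
    (n - 1) times the two sample standard deviations."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * statistics.stdev(xs) * statistics.stdev(ys))

def t_from_r(r, n):
    """t statistic and degrees of freedom for testing r against zero."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2), df

# Values from the table above
t, df = t_from_r(0.362287, 101)
print(round(t, 3), df)  # 3.867 99

# Hypothetical, perfectly linear data: r should be 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Converting t to a two-tailed p-value still needs a t distribution function such as Excel's T.DIST.2T, since the Python standard library does not include one.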

Page 57: Statistics for Librarians, Session 3: Inferential statistics

CORRELATIONS FOR ORDINAL DATA

Spearman’s ρ (rho)
• Use if there are limited ties in rank.

Kendall’s τ (tau)
• Use if you have a number of ties.

Page 58: Statistics for Librarians, Session 3: Inferential statistics

SELECTING THE TESTS

Page 59: Statistics for Librarians, Session 3: Inferential statistics

KNOW THE TESTS

Assumptions

Limitations

Appropriate data type

What the test tests

Page 60: Statistics for Librarians, Session 3: Inferential statistics

FACTORS ASSOCIATED WITH CHOICE OF STATISTICAL METHOD

Level of Measurement

What is being compared

Independence of units

Underlying variance in the population

Distribution

Sample size

Number of comparison groups

Page 61: Statistics for Librarians, Session 3: Inferential statistics

USE A FLOW CHART

Page 62: Statistics for Librarians, Session 3: Inferential statistics

GOING BEYOND THE P-VALUE: EFFECT SIZES

Page 63: Statistics for Librarians, Session 3: Inferential statistics

AND THE P-VALUE SAYS…

Much about the distributions

More about the H0 than H1

Little about size of differences

Page 64: Statistics for Librarians, Session 3: Inferential statistics

MORE USEFUL STATISTICS

Effect Sizes
• Tell the real story

Confidence Intervals
• State your certainty

Page 65: Statistics for Librarians, Session 3: Inferential statistics

EFFECT SIZES OF QUANTITATIVE DATA

Differences from the mean
• Standardized
  • weighted against the pooled (average) standard deviation
• Cohen’s d

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_{x_1, x_2}}$$

Correlations
• Cohen’s guidelines for Pearson’s r
• r = 0.362

Effect Size | r >
Small | .10
Medium | .30
Large | .50
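A minimal sketch of both effect sizes. Cohen's d is computed with the pooled standard deviation weighted by degrees of freedom, which is one common convention; the slide does not state which pooling is intended, so treat that as an assumption. The label "negligible" for r below .10 is also an addition for completeness. The two score lists are hypothetical.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Difference between means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    return (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(pooled_var)

def describe_r(r):
    """Label a correlation using the thresholds in the table above."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # label assumed; not on the slide

# Hypothetical scores, for illustration only
d = cohens_d([2, 3, 4, 5, 6], [1, 2, 3, 4, 5])
print(round(d, 3))
print(describe_r(0.362))  # the case-study r
```

Unlike the t statistic, d does not grow with sample size, so it describes how large the difference is rather than how detectable it is.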

Page 66: Statistics for Librarians, Session 3: Inferential statistics

EFFECT SIZES OF QUALITATIVE DATA

Based on Contingency table

Relative risk
• Uses probabilities

RR of Case Study
• RR = 1.608
• The passing rate for the intervention group was 1.6 times the passing rate for control group.

Group | Pass | No Pass | Total
Intervention | a (41) | b (24) | a+b (65)
Control | c (26) | d (10) | c+d (36)
Totals | a+c (67) | b+d (34) | a+b+c+d (101)
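A minimal sketch of the calculation, assuming the standard definition RR = [a/(a+b)] / [c/(c+d)]. With the counts exactly as printed in the table, this formula gives roughly 0.87 rather than 1.608. The reported 1.608 is what the formula yields for 41 of 50 passing versus 26 of 51, which matches the group sizes of 50 and 51 on the t-test slide. Which counts were intended is not stated in the slides, so the second call below is an assumption.

```python
def relative_risk(a, b, c, d):
    """Passing rate of the intervention group divided by the passing
    rate of the control group, from a 2x2 contingency table:
        intervention: a passed, b did not
        control:      c passed, d did not
    """
    intervention_rate = a / (a + b)
    control_rate = c / (c + d)
    return intervention_rate / control_rate

# Counts as printed in the table above
rr_table = relative_risk(41, 24, 26, 10)

# Assumed alternative: groups of 50 and 51, as on the t-test slide
rr_alt = relative_risk(41, 9, 26, 25)

print(round(rr_table, 3))
print(round(rr_alt, 3))
```

An RR above 1 means the intervention group passed at a higher rate than the control group; an RR below 1 means the opposite.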

Page 67: Statistics for Librarians, Session 3: Inferential statistics

CONFIDENCE INTERVALS

Point estimates
• Single value
• Mean

Intervals
• Degree of uncertainty
• Range of certainty around the point estimate

Based on
• Point estimate (e.g. mean)
• Confidence level (usually .95)
• Standard deviation

Expressed as:
• The mean score of the students who had the IL training was 79.5 with a 95% CI of 76.4 to 82.5.
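A minimal sketch of how the three ingredients combine into an interval. It uses the normal approximation (a z multiplier of about 1.96 for 95%), which is an assumption. For small samples a t multiplier would give a slightly wider interval, and the slides do not say which was used. The score list is hypothetical, not the case-study data.

```python
import math
import statistics

def mean_confidence_interval(values, level=0.95):
    """Point estimate plus a normal-approximation confidence interval.
    Returns (mean, lower bound, upper bound)."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error: standard deviation scaled by the sample size
    std_err = statistics.stdev(values) / math.sqrt(n)
    # z multiplier for the chosen confidence level (about 1.96 for .95)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean, mean - z * std_err, mean + z * std_err

# Hypothetical scores, for illustration only
scores = [70, 75, 80, 85, 90]
mean, low, high = mean_confidence_interval(scores)
print(f"mean {mean:.1f}, 95% CI {low:.1f} to {high:.1f}")
```

A higher confidence level or a smaller sample widens the interval; a larger sample narrows it.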

Page 68: Statistics for Librarians, Session 3: Inferential statistics

CASE STUDY CONCLUSIONS

• Research Questions:
  • Could the IL course improve grades on final papers?
  • Could the IL course improve passing rates for the course?
  • Do students in different majors respond differently to the IL training?
  • Is the final score related to the number of credit hours enrolled for each student?
• Control for external variables

Page 69: Statistics for Librarians, Session 3: Inferential statistics

STATISTICAL ANALYSIS

Signal

Noise