PSYC 6130C UNIVARIATE ANALYSIS Prof. James Elder

Introduction

What is (are) statistics?

• A branch of mathematics concerned with understanding and summarizing collections of numbers

• A collection of numerical facts

• Estimates of population parameters, derived from samples

What is this course about?

• Applied statistics

• Emphasizes methods, not proofs

• Descriptive statistics

• Inferential statistics

Fall Term

Date       Title                                    Readings                             Notes
10-Sep-08  Introduction                             1.1-1.3
           Probability                              5.1-5.5, 5.7
           Descriptive statistics                   2.1, 2.2, 2.5, 2.7-2.9, 2.12, 2.13
17-Sep-08  The normal distribution                  3.1-3.4                              Lab 1
24-Sep-08  Introduction to hypothesis testing       4
           t-tests                                  7
1-Oct-08   Rosh Hashanah – No Classes
8-Oct-08   t-tests                                  7                                    Lab 2
15-Oct-08  Statistical power and effect size        8                                    Assignment 1 due
22-Oct-08  Correlation and regression               9
29-Oct-08  One-way independent ANOVA                11                                   Lab 3
5-Nov-08   Multiple comparisons                     12.1-12.12
12-Nov-08  Multiple comparisons                     12.1-12.12                           Lab 4
19-Nov-08  Two-way ANOVA                            13.1-13.11, 13.14                    Assignment 2 due
26-Nov-08  Review
3-Dec-08   Exam

Winter Term

Date       Title                                    Readings         Deadlines
7-Jan-09   Repeated measures ANOVA                  14
14-Jan-09  Two-way mixed design ANOVA               14               Lab 5; deadline for choosing project topic
21-Jan-09  Reading Week
28-Jan-09  Multiple regression                      15               Lab 6
4-Feb-09   The general linear model                 16               Assignment 3 due, drop date is Feb 1
11-Feb-09  The binomial distribution                5.6, 5.8-5.10    Lab 7
18-Feb-09  Reading Week – No Classes
25-Feb-09  Chi-square tests                         6
4-Mar-09   Resampling and nonparametric techniques  18               Lab 8
11-Mar-09  Student Presentations
18-Mar-09  Student Presentations                                     Assignment 4 due
25-Mar-09  Review
1-Apr-09   Exam

Some Background

(Howell Ch. 1)

Variables and Constants

• Constants are properties that never change (e.g., the speed of light in a vacuum, approximately 3 × 10^8 m/s).

• Most physiological and psychological parameters of interest vary considerably

– Between individuals (e.g., intelligence quotient)

– Within individuals (e.g., heart rate)

• Any variable whose variation is somewhat unpredictable is called a random variable (rv).

Scales of measurement

• Nominal scale: values are categories, having no meaningful correspondence to numbers.

• Ordinal scale: ordering is meaningful, but exact numerical values (if they exist) are not.

• Interval scale: values are numerically meaningful, and the interval between two values is meaningful.

– Example: Celsius temperature scale. It takes the same amount of energy to raise the temperature of a gram of water from 20 °C to 21 °C as it does to raise it from 30 °C to 31 °C.

• Ratio scale: ratio of two values is also meaningful.

– Example: Kelvin temperature scale. A gram of H2O at 300 K has twice the energy of a gram of H2O at 150 K.

– Ratio scales require a 0-point corresponding to a complete lack of the substance being measured.

• Example: a gram of H2O at 0 K has no heat (particles are motionless).

Continuous vs Discrete Variables

• A continuous variable may assume any real value within some range

• A discrete variable may assume only a countable number of values: intermediate values are not meaningful.

Independent vs Dependent Variables

• Experiments involve independent and dependent variables.

– The independent variable is controlled by the experimenter.

– The dependent variable is measured.

– We seek to detect and model effects of the independent variable on the dependent variable.

• Example: In a visual search task, subjects are asked to find the odd-man-out in a display of discrete items (e.g., a horizontal bar amongst vertical bars).

– The number of items in the display is an independent variable.

– Reaction time is the main dependent variable.

– Typically, we observe a roughly linear relationship between the number of items and the reaction time.

Experimental vs Correlational Research

• Experimental study:

– Researcher controls the independent variable.

– Seek to detect effects on the dependent variable.

– Direction of causation may be inferred (but may be indirect).

• Correlational study:

– There are no independent or dependent variables.

– No variables are under control of the researcher.

– Seek to find statistical relationships (dependencies) between variables.

– Direction of causation may not normally be inferred.

Correlational Studies: Examples

Populations vs Samples

• In human science, we typically want to characterize and make inferences not about a particular person (e.g., Uncle Bob) but about all people, or all people with a certain property (e.g., all people suffering from a bipolar disorder).

• These groups of interest are called populations.

• Typically, these populations are too large and inaccessible to study.

• Instead, we study a subset of the group, called a sample.

• In order to make reliable inferences about the population, samples are ideally randomly selected.

• The population properties of interest are called parameters.

• The corresponding measurements made on our samples are called statistics. Statistics are approximations (estimates) of parameters.

Different Types of Populations and Samples

• Outside of human science, populations do not necessarily refer to humans

– e.g. populations may be of bees, algae, quarks, stock prices, pork belly futures, ozone levels, etc…

• In clinical and social psychology you will often be conducting large-n studies on human populations.

• In cognitive psychology, you will often be doing small-n within-subject studies involving repeated trials on the same subject.

– Here, you may think of the ‘population’ as being the infinite set of responses you would obtain were you able to continue the experiment indefinitely.

– The sample is the set of responses you were able to collect in a finite number of trials (e.g., 5000) on the same subject.

Summation Notation

Let $X_i$ = number of siblings for respondent $i$

and $Y_i$ = number of children for respondent $i$.

i    X_i  Y_i
1    1    2
2    2    1
3    2    1
…    …    …
N    4    0

Then

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$$

where $N$ = number of respondents in sample.

Some Summation Rules

1. Often abbreviate $\sum_{i=1}^{N} X_i$ as $\sum X_i$.

2. $\sum (X_i + Y_i) = \sum X_i + \sum Y_i$

since $(X_1 + Y_1) + (X_2 + Y_2) = (X_1 + X_2) + (Y_1 + Y_2)$ (associative and commutative properties of addition).

Similarly, $\sum (X_i - Y_i) = \sum X_i - \sum Y_i$

3. $\sum C = NC$, where $C$ is a constant, since adding $C$ to itself $N$ times yields $N$ $C$'s.

4. $\sum C X_i = C \sum X_i$

since $C X_1 + C X_2 = C (X_1 + X_2)$ (multiplication is distributive over addition).

But note that

5. $\sum X_i Y_i \neq \sum X_i \sum Y_i$

since $X_1 Y_1 + X_2 Y_2 \neq (X_1 + X_2)(Y_1 + Y_2) = X_1 Y_1 + X_1 Y_2 + X_2 Y_1 + X_2 Y_2$
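The summation rules can be checked numerically. The following is a minimal sketch, assuming small made-up score lists `X` and `Y` and a constant `C` that are not part of the course data:

```python
# Hypothetical scores, chosen only for illustration.
X = [1, 2, 2, 4]
Y = [2, 1, 1, 0]
C = 3
N = len(X)

# Rule 2: the sum of sums (or differences) equals the sum (or difference) of the sums.
rule2_add = sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)
rule2_sub = sum(x - y for x, y in zip(X, Y)) == sum(X) - sum(Y)

# Rule 3: summing a constant N times gives N * C.
rule3 = sum(C for _ in range(N)) == N * C

# Rule 4: a constant factor can be pulled outside the sum.
rule4 = sum(C * x for x in X) == C * sum(X)

# Rule 5: the sum of products is NOT, in general, the product of the sums.
sum_of_products = sum(x * y for x, y in zip(X, Y))
product_of_sums = sum(X) * sum(Y)

print(rule2_add, rule2_sub, rule3, rule4)
print(sum_of_products, product_of_sums)
```

For these particular numbers the sum of products and the product of sums differ, which is the point of rule 5.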

Summary

• What is (are) statistics

• Variables and constants

• Scales of measurement

• Continuous and discrete variables

• Independent and dependent variables

• Experimental and correlational research

• Populations and samples

• Summation Notation

Descriptive Statistics

(Howell Ch. 2)

Frequency Tables

1991 U.S. General Social Survey: Number of Brothers and Sisters

                  Frequency  Percent  Valid Percent  Cumulative Percent
Valid    0               74     4.88           4.92                4.92
         1              236    15.56          15.68               20.60
         2              276    18.19          18.34               38.94
         3              236    15.56          15.68               54.62
         4              209    13.78          13.89               68.50
         5              118     7.78           7.84               76.35
         6               80     5.27           5.32               81.66
         7               81     5.34           5.38               87.04
         8               58     3.82           3.85               90.90
         9               47     3.10           3.12               94.02
         10              34     2.24           2.26               96.28
         11              22     1.45           1.46               97.74
         12              11     0.73           0.73               98.47
         13               9     0.59           0.60               99.07
         14               5     0.33           0.33               99.40
         15               3     0.20           0.20               99.60
         16               1     0.07           0.07               99.67
         17               2     0.13           0.13               99.80
         18               1     0.07           0.07               99.87
         21               1     0.07           0.07               99.93
         26               1     0.07           0.07              100.00
         Total         1505    99.21         100.00
Missing  DK               4     0.26
         NA               8     0.53
         Total           12     0.79
Total                  1517   100.00
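The three percentage columns can be reproduced from the raw frequencies. The sketch below assumes that "Percent" divides by all 1517 respondents, "Valid Percent" divides by the 1505 valid responses, and "Cumulative Percent" is a running total of valid percent; the variable names are illustrative only.

```python
# Frequencies copied from the table (number of siblings -> count).
valid_counts = {0: 74, 1: 236, 2: 276, 3: 236, 4: 209, 5: 118, 6: 80,
                7: 81, 8: 58, 9: 47, 10: 34, 11: 22, 12: 11, 13: 9,
                14: 5, 15: 3, 16: 1, 17: 2, 18: 1, 21: 1, 26: 1}
n_missing = 4 + 8  # DK + NA

n_valid = sum(valid_counts.values())
n_total = n_valid + n_missing

rows = []
running = 0
for value, f in sorted(valid_counts.items()):
    running += f
    rows.append((value, f,
                 100 * f / n_total,        # Percent
                 100 * f / n_valid,        # Valid Percent
                 100 * running / n_valid)) # Cumulative Percent

for value, f, pct, valid_pct, cum_pct in rows[:3]:
    print(f"{value:>2} {f:>4} {pct:6.2f} {valid_pct:6.2f} {cum_pct:6.2f}")
```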

Bar Graphs and Histograms

Grouped Frequency Distributions

• What are the apparent limits?

• What are the real limits?

Statistics Canada 2001 Census: Age of Respondent

X          f
<5         581
5 - 9      661
10 - 14    740
15 - 19    701
20 - 24    689
25 - 29    674
30 - 34    731
35 - 39    903
40 - 44    930
45 - 49    838
50 - 54    746
55 - 59    608
60 - 64    434
65 - 69    383
70 - 74    345
75 - 79    288
80 - 84    174
85+        97

Percentiles and Percentile Ranks

• Percentile: The score at or below which a given % of scores lie.

• Percentile Rank: The percentage of scores at or below a given score

Linear Interpolation to Compute Percentile Ranks

What if you have a 23-year-old respondent and would like to know her percentile rank?

Let $x$ = age (percentile) and $y$ = percentile rank.

Then the linear (affine) interpolation model is: $y = ax + b$

There are 2 unknowns ($a$ and $b$). If we have two data points $(x_1, y_1)$ and $(x_2, y_2)$ bracketing the value of interest, we can solve:

$$y_1 = a x_1 + b$$

$$y_2 = a x_2 + b$$

$$\Rightarrow a = \frac{y_2 - y_1}{x_2 - x_1}$$

Thus

$$y = ax + b = ax + y_1 - a x_1 = y_1 + a(x - x_1) = y_1 + \frac{y_2 - y_1}{x_2 - x_1}(x - x_1)$$

Statistics Canada 2001 Census: Age of Respondent

                Frequency  Percent  Cumulative Percent
Valid  <5             581      5.5                 5.5
       5 - 9          661      6.3                11.8
       10 - 14        740      7.0                18.8
       15 - 19        701      6.7                25.5
       20 - 24        689      6.5                32.0
       25 - 29        674      6.4                38.4
       30 - 34        731      6.9                45.4
       35 - 39        903      8.6                54.0
       40 - 44        930      8.8                62.8
       45 - 49        838      8.0                70.8
       50 - 54        746      7.1                77.9
       55 - 59        608      5.8                83.6
       60 - 64        434      4.1                87.8
       65 - 69        383      3.6                91.4
       70 - 74        345      3.3                94.7
       75 - 79        288      2.7                97.4
       80 - 84        174      1.7                99.1
       85+             97      0.9               100.0
       Total        10523    100.0
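As a worked instance of the interpolation formula, here is a minimal sketch for the 23-year-old respondent. It assumes the real limits of the 20 - 24 class are ages 20 and 25 (age recorded in completed years), so that 25.5% of respondents fall below x1 = 20 and 32.0% below x2 = 25; the slide leaves the real limits as an open question, and a different convention (for example 19.5 and 24.5) would change the answer slightly. The function name `interpolate` is illustrative.

```python
def interpolate(x, x1, y1, x2, y2):
    """Linear interpolation: y = y1 + (y2 - y1) / (x2 - x1) * (x - x1)."""
    return y1 + (y2 - y1) / (x2 - x1) * (x - x1)

# Assumed bracketing points: (age 20, 25.5%) and (age 25, 32.0%).
rank_23 = interpolate(23, 20, 25.5, 25, 32.0)
print(round(rank_23, 1))
```

Under these assumptions the estimated percentile rank is 25.5 + (6.5 / 5) × 3 = 29.4.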

Linear Interpolation to Compute Percentiles

What if you want to know what the median age is? (Same table: Statistics Canada 2001 Census, Age of Respondent.)

To compute percentiles, simply swap the x's and y's in the formula:

$$x = x_1 + \frac{x_2 - x_1}{y_2 - y_1}(y - y_1)$$
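A sketch of the swapped formula applied to the median (y = 50), under the same assumption about real limits as before: the cumulative percent is 45.4 at age 35 and 54.0 at age 40, so the 50th percentile lies in the 35 - 39 class. The function name is hypothetical.

```python
def percentile_from_rank(y, x1, y1, x2, y2):
    """Inverse interpolation: x = x1 + (x2 - x1) / (y2 - y1) * (y - y1)."""
    return x1 + (x2 - x1) / (y2 - y1) * (y - y1)

# Assumed bracketing points: (age 35, 45.4%) and (age 40, 54.0%).
median_age = percentile_from_rank(50.0, 35, 45.4, 40, 54.0)
print(round(median_age, 2))
```

Because it works from grouped, rounded percentages, the interpolated value is only an estimate and need not match a median computed from the ungrouped ages.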

Measures of Central Tendency

• The mode – applies to ratio, interval, ordinal or nominal scales.

• The median – applies to ratio, interval and ordinal scales

• The mean – applies to ratio and interval scales

       Mean   Median  Mode
AGE    37.1   37      41

The Mode

• Defined as the most frequent value (the peak)

• Applies to ratio, interval, ordinal and nominal scales

• Sensitive to sampling error (noise)

• Distributions may be referred to as unimodal, bimodal or multimodal, depending upon the number of peaks

Mode = 41

The Median

• Defined as the 50th percentile

• Applies to ratio, interval and ordinal scales

• Can be used for open-ended distributions

Median = 37

The Mean

• Applies only to ratio or interval scales

• Sensitive to outliers

Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$

Sample mean: $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$

$\bar{X} = 37.1$

Properties of the Mean

1. Suppose a constant $C$ is added (or subtracted) to every score in your sample: $X_i \to X_i + C$

Then the mean also increases (decreases) by $C$: $\bar{X} \to \bar{X} + C$

2. Suppose every score in your sample is multiplied (divided) by a constant $C$: $X_i \to C X_i$

Then the mean is also multiplied (divided) by $C$: $\bar{X} \to C\bar{X}$

3. $\sum (X_i - \bar{X}) = 0$
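These three properties are easy to confirm on a small example. The sketch below uses a made-up sample `X` and constant `C` (assumptions for illustration, not course data):

```python
from statistics import mean

X = [2, 4, 4, 6, 9]   # hypothetical sample
C = 3                 # hypothetical constant

xbar = mean(X)

# Property 1: adding C to every score adds C to the mean.
shifted_mean = mean([x + C for x in X])

# Property 2: multiplying every score by C multiplies the mean by C.
scaled_mean = mean([C * x for x in X])

# Property 3: deviations from the mean sum to zero.
deviation_sum = sum(x - xbar for x in X)

print(xbar, shifted_mean, scaled_mean, deviation_sum)
```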

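Properties of the Mean (cont'd)

Least-squares property: the mean minimizes the sum of squared deviations:

$$\sum (X_i - \bar{X})^2 \le \sum (X_i - X)^2 \quad \forall X$$

Proof:

$\sum (X_i - X)^2$ has a minimum where $\frac{d}{dX}\sum (X_i - X)^2 = 0$ and $\frac{d^2}{dX^2}\sum (X_i - X)^2 > 0$

$$\frac{d}{dX}\sum (X_i - X)^2 = -2\sum (X_i - X) = 0 \;\Rightarrow\; X = \frac{1}{N}\sum X_i = \bar{X}$$

$$\frac{d^2}{dX^2}\sum (X_i - X)^2 = 2N > 0$$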
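The least-squares property can also be seen numerically by scanning candidate centre values. This is only an illustrative sketch on a made-up sample (`X`, the grid of candidates, and the helper `sum_sq_dev` are all assumptions), not a substitute for the proof:

```python
def sum_sq_dev(data, centre):
    """Sum of squared deviations of the data about an arbitrary centre."""
    return sum((x - centre) ** 2 for x in data)

X = [2, 4, 4, 6, 9]          # hypothetical sample
xbar = sum(X) / len(X)       # 5.0

# Candidate centres from 3.0 to 7.0 in steps of 0.1.
candidates = [xbar + d / 10 for d in range(-20, 21)]
best = min(candidates, key=lambda c: sum_sq_dev(X, c))

print(best, sum_sq_dev(X, best))
```

Moving the centre away from the mean by an amount d raises the sum of squares by N·d², which is why the scan bottoms out at the mean.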

Measures of Variability (Dispersion)

• Range – applies to ratio, interval, ordinal scales

• Semi-interquartile range – applies to ratio, interval, ordinal scales

• Variance (standard deviation) – applies to ratio, interval scales

Range

• Interval between lowest and highest values

• Generally unreliable – changing one value (highest or lowest) can cause large change in range.

Range = 79 drinks

Semi-Interquartile Range

• The interquartile range is the interval between the first and third quartile, i.e. between the 25th and 75th percentile.

• The semi-interquartile range is half the interquartile range.

• Can be used with open-ended distributions

• Unaffected by extreme scores

N            Valid    19769
             Missing  6004
Median                4
Percentiles  25       2
             50       4
             75       7

SIQ = 2.5 drinks
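The SIQ on the slide follows directly from the tabled quartiles: (7 − 2) / 2 = 2.5. The sketch below repeats that arithmetic and then uses a small made-up data set to illustrate that the SIQ is unaffected by an extreme score while the range is not. It relies on `statistics.quantiles` from the Python standard library (Python 3.8+) with the "inclusive" method, which is one of several quartile conventions and not necessarily the one used to produce the slide.

```python
from statistics import quantiles

def semi_iqr(data):
    """Half the distance between the first and third quartiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / 2

# From the slide: 25th percentile = 2, 75th percentile = 7.
slide_siq = (7 - 2) / 2

# Hypothetical scores, with and without one extreme value.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]

print(slide_siq)
print(semi_iqr(scores), max(scores) - min(scores))
print(semi_iqr(with_outlier), max(with_outlier) - min(with_outlier))
```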

Population Variance and Standard Deviation

$X_i - \mu$ is known as the deviation of sample $i$.

Thus $SS = \sum (X_i - \mu)^2$ is known as the sum of squared deviations.

The population variance is simply the mean squared deviation:

$$\sigma^2 = \frac{1}{N}\sum (X_i - \mu)^2$$

The population standard deviation is simply the square root of the variance:

$$\sigma = \sqrt{\frac{1}{N}\sum (X_i - \mu)^2}$$

The standard deviation is particularly sensitive to outliers, due to the squaring operation.

Sample Variance and Standard Deviation

$X_i - \bar{X}$ is known as the deviation of sample $i$.

Thus $SS = \sum (X_i - \bar{X})^2$ is known as the sum of squared deviations.

The mean squared sample deviation $\frac{1}{N}\sum (X_i - \bar{X})^2$ is a biased estimator of the population variance $\sigma^2$; it tends to underestimate $\sigma^2$.

A minor modification makes the sample variance $s^2$ unbiased:

$$s^2 = \frac{1}{N-1}\sum (X_i - \bar{X})^2$$

The corrected sample standard deviation is given by

$$s = \sqrt{\frac{1}{N-1}\sum (X_i - \bar{X})^2}$$

$s$ is not an unbiased estimator of $\sigma$, but is close enough for most purposes.
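The bias of the 1/N estimator can be illustrated exactly on a toy population by enumerating every possible sample. The sketch below assumes a made-up population of four values and draws all ordered samples of size 2 with replacement; `pvariance` divides by N and `variance` divides by N − 1 (both from the Python standard library).

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [1, 2, 3, 4]           # hypothetical population
sigma2 = pvariance(population)      # true population variance

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Average each estimator over every possible sample.
mean_biased = mean([pvariance(s) for s in samples])    # divides by N
mean_unbiased = mean([variance(s) for s in samples])   # divides by N - 1

print(sigma2, mean_biased, mean_unbiased)
```

In this setup the N − 1 version averages out to the population variance, while the 1/N version averages to (N − 1)/N of it, here one half.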

Degrees of Freedom

The degrees of freedom ($df$) is the number of independent measurements available for estimating a population parameter.

The calculation of $s^2$ involves $\bar{X}$. Knowing $\bar{X}$ and $N-1$ of the sample values allows us to infer the value of the remaining sample value. Thus only $N-1$ of the sample values are independent, and $df = N-1$.
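A small sketch of this argument, assuming a made-up sample: once the mean and N − 1 of the values are known, the last value is fixed.

```python
X = [3, 7, 8, 10]        # hypothetical sample
N = len(X)
xbar = sum(X) / N

# Pretend the last value is unknown: it must equal N * mean - (sum of the others).
known = X[:-1]
recovered = N * xbar - sum(known)

print(recovered)
```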

Computational Formulas for Variance

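The deviational formula for the sum of squares: $SS = \sum (X_i - \bar{X})^2$

More efficient to use the computational formula: $SS = \sum X_i^2 - N\bar{X}^2$

Why are these equivalent?

$$\sum (X_i - \bar{X})^2 = \sum (X_i^2 - 2X_i\bar{X} + \bar{X}^2)$$

$$= \sum X_i^2 - 2\bar{X}\sum X_i + N\bar{X}^2$$

$$= \sum X_i^2 - 2N\bar{X}^2 + N\bar{X}^2$$

$$= \sum X_i^2 - N\bar{X}^2$$

Thus

$$s^2 = \frac{1}{N-1}\left(\sum X_i^2 - N\bar{X}^2\right)$$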
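The equivalence of the two formulas can be checked on a small made-up sample. This is a sketch only; `statistics.variance` from the standard library is used as an independent reference.

```python
from statistics import variance

X = [2, 4, 4, 6, 9]      # hypothetical sample
N = len(X)
xbar = sum(X) / N

# Deviational formula: sum of squared deviations about the mean.
ss_deviational = sum((x - xbar) ** 2 for x in X)

# Computational formula: sum of squares minus N times the squared mean.
ss_computational = sum(x ** 2 for x in X) - N * xbar ** 2

s2 = ss_computational / (N - 1)
print(ss_deviational, ss_computational, s2, variance(X))
```

One caveat when working in floating point: the computational formula subtracts two large, nearly equal numbers when the mean is large relative to the spread, so the deviational form is usually the numerically safer choice in software.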

Properties of the Standard Deviation

1. Suppose a constant $C$ is added (or subtracted) to every score in your sample: $X_i \to X_i + C$

Then the standard deviation does not change.

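Properties of the Standard Deviation (cont'd)

2. Suppose every score in your sample is multiplied (divided) by a constant $C$: $X_i \to C X_i$

Then the standard deviation is also multiplied (divided) by $C$ (for $C > 0$; in general, by $|C|$): $s \to Cs$

Proof:

$$s_{old} = \sqrt{\frac{1}{N-1}\sum (X_i - \bar{X})^2}$$

$$s_{new} = \sqrt{\frac{1}{N-1}\sum (CX_i - C\bar{X})^2}$$

$$= C\sqrt{\frac{1}{N-1}\sum (X_i - \bar{X})^2}$$

$$= C s_{old}$$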
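Both properties of the standard deviation can be confirmed numerically. The sketch below assumes a made-up sample and constant, and also shows that a negative multiplier scales the standard deviation by its absolute value.

```python
from statistics import stdev

X = [2, 4, 4, 6, 9]      # hypothetical sample
C = 3                    # hypothetical constant

s = stdev(X)

s_shifted = stdev([x + C for x in X])    # property 1: unchanged
s_scaled = stdev([C * x for x in X])     # property 2: multiplied by C
s_negated = stdev([-C * x for x in X])   # negative constant: multiplied by |C|

print(s, s_shifted, s_scaled, s_negated)
```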

Standard Deviation Example

$\bar{X}$ = 5.7 drinks

$s$ = 5.8 drinks

cf. SIQ = 2.5 drinks

range = 79 drinks

Skew

• The mean and median are identical for symmetric distributions.

• Skew tends to push the mean away from the median, toward the tail (but not always)

Median=3

Mean=6.7

Skewness

• Properties of skewness

– Positive for positive skew (tail to the right)

– Negative for negative skew (tail to the left)

– Dimensionless

– Invariant to shifting or scaling data (adding or multiplying constants)

$$\text{Sample skewness} = \frac{N}{(N-2)(N-1)}\,\frac{\sum (X_i - \bar{X})^3}{s^3}$$
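A minimal sketch of the sample skewness formula, checked against the listed properties on made-up data. The function name `sample_skewness` and the data are assumptions for illustration; `s` is the corrected (N − 1) sample standard deviation.

```python
from statistics import mean, stdev

def sample_skewness(data):
    """N / ((N-1)(N-2)) * sum((X_i - Xbar)^3) / s^3, with s the N-1 standard deviation."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)
    return n / ((n - 1) * (n - 2)) * sum((x - xbar) ** 3 for x in data) / s ** 3

right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]          # hypothetical, tail to the right
left_tailed = [-x for x in right_tailed]          # mirror image, tail to the left
rescaled = [2 * x + 5 for x in right_tailed]      # shifted and scaled copy

print(sample_skewness(right_tailed))
print(sample_skewness(left_tailed))
print(sample_skewness(rescaled))
```

Note that the invariance to scaling holds for positive multipliers; multiplying by a negative constant flips the sign, as the mirrored data set shows.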

Dealing with Outliers

• Trimming:

– Throw out the top and bottom k% of values (k=5%, for example).

– May be justified if there is evidence for confounding process interfering with the dependent variable being studied

• Example: participant blinks during presentation of a visual stimulus

• Example: participant misunderstands a question on a questionnaire.

• Transforming

– Scores are transformed by some function (e.g., log, square root)

– Often done to reduce or eliminate skewness

Log-Transforming Data

[Figure: two histograms, labelled skewness = 0.67 and skewness = 0.08]
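The effect of a log transform on skewness can be sketched with a deliberately extreme, made-up example: scores that are evenly spaced on a log scale are strongly right-skewed in raw form and symmetric after transformation. The data and function name below are assumptions and are unrelated to the figure's data set.

```python
import math
from statistics import mean, stdev

def sample_skewness(data):
    """Sample skewness using the N-1 standard deviation."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)
    return n / ((n - 1) * (n - 2)) * sum((x - xbar) ** 3 for x in data) / s ** 3

raw = [1, 10, 100, 1000, 10000]          # hypothetical right-skewed scores
logged = [math.log10(x) for x in raw]    # evenly spaced after the transform

print(sample_skewness(raw))
print(sample_skewness(logged))
```

Real data will rarely become perfectly symmetric; the point is only that compressing the upper tail pulls the skewness toward zero. A log transform requires strictly positive scores.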

End of Lecture 1

Sept 10, 2008

Kurtosis

kurtosis > 0: leptokurtic (Laplacian)

kurtosis = 0: mesokurtic (Gaussian)

kurtosis < 0: platykurtic

$$\text{Sample kurtosis} = \frac{N(N+1)}{(N-1)(N-2)(N-3)}\,\frac{\sum (X_i - \bar{X})^4}{s^4} - \frac{3(N-1)^2}{(N-2)(N-3)}$$
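A minimal sketch of the sample kurtosis formula on two made-up data sets: one flat (platykurtic) and one with most scores at the centre plus two extreme values (leptokurtic). The function name and data are illustrative assumptions; `s` is the N − 1 sample standard deviation.

```python
from statistics import mean, stdev

def sample_kurtosis(data):
    """Bias-corrected excess kurtosis, following the formula above (requires N > 3)."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)
    fourth = sum((x - xbar) ** 4 for x in data) / s ** 4
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

flat = list(range(1, 10))                       # evenly spread scores
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 10]   # peaked with extreme values

print(sample_kurtosis(flat))
print(sample_kurtosis(heavy_tailed))
```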

Summary

• Measures of central tendency

• Measures of dispersion

• Skew

• Kurtosis