Experimental design and inferential statistics: An · PDF fileExperimental design and inferential statistics: An introduction Dr. Alissa Melinger School of Psychology University of

Experimental design andinferential statistics:

An introduction

Dr. Alissa MelingerSchool of PsychologyUniversity of Dundee

Structure of Tutorial

• Block 1: Background and fundamentalunderpinnings to inferential statistics

• Block 2: Tests for evaluating differences

• Block 3: Tests for evaluating associations

• Block 4: New analyses that I’m quite excited aboutPlus, your data and questions

Background and fundamentalunderpinnings to inferential

statistics

Block 1

The Goal

• We observe some behavior, of individuals,the economy, our computer models, etc.

• We want to say something about thisbehavior. We’d like to say something thatextends beyond just these observations, tofuture behaviors, past behaviors,unobserved behaviors.

Types of Analysis

• Descriptive statistics: summarizing anddescribing the important characteristics ofthe data.

• Inferential statistics: decide if a pattern,difference, or relation found with a sample isrepresentative and true of the population.

Definitions

• Population: the entire set of entities that areclassified together– Population of native German speakers (large group)– Population of native Lakhota speaker (medium sized

group)– Population of 5 year old Lakhota speakers (small group)

• Sample: is subset of entities that make up apopulation– Random selection of 1000 native German speakers.

• Data from a population → exhaustively collecteddata from all relevant individuals• No need for inferential statistics

• Data from a sample → collected data from asubset of your population

• Inferential statistics help determine if the samplereflects the population

•Key point → everyone is unique

Generalizing from a sample

• We want some measure of reassurance thatour observations are representative of thepopulation, and not just characteristic of thesample.

• Any DIFFERENCE we observe indescriptive statistics needs to be evaluated.

Is this difference REAL?

• Descriptive difference– Milk costs .69 at Plus and .89 at Lidl

• Is this a real difference?

• Subjective difference• Is the difference important enough to me? Is it worth my while

to travel farther to pay less?

• Statistical difference– Is Plus generally cheaper than Lidl?

• How representative of the prices is milk?– Are all Plus stores cheaper than all Lidl?

• How representative of all Plus stores is my Plus?

Is a difference REAL?• To answer this question we could go through the store and

compare every item in each store and call every Plus and Lidlin the world– If we can get information on a whole population, we don’t need

inferential statistics.OR

• We can look at a Sample of products and stores and then usestatistics to determine whether our observation is true of otherunobserved products and shops.– Important to choose sample well (not only from the chocolate aisle)

• Statistics help us determine how well our Sample represents thePopulation.

A simple model of the data

• Different statistical methods attempt to builda model of the data using hypothesizedfactors to account for the characteristics ofthe observed pattern.

• One simple model of the data is the MEAN– Mode, median are other simple models– Distributions, counts, spreads, range, ect.

The mean

1F

4E

0D

2C

3B

1A

# siblingsSubjects

Mean # siblings = 1.83

How well does the meanmodel the data?

ERROR

Error• Variance is the average error between the

mean and the observations.• Sum of error is offset by positive and negative

numbers– Take the square of each error value

• Sum of squared errors (SS) will increase themore data you collect.– Large number ≠ bad estimate of # of siblings

• Divide the sum of squared errors by N-1• Variance (s2) = SS/N-1

!

xi " x( )#

!

xi " x( )#2

Error is crucial to inferential statistics.

If you don’t have Error in your data (e.g., if you aretesting a model and 100 tests would give you the

same result because the model is deterministic) youdo not need to do inferential statistics.

Variance• Sum of error is offset by positive and

negative numbers– Take the square of each error value

• Sum of squared errors (SS)

• Divide the sum of squared errors by N-1• Variance (s2) = SS/N-1

!

xi " x( )#

!

xi " x( )#2

Standard Deviation

• Variance gives us measure in units squared,so not comparable to directly to the unitsmeasured.

• Standard Deviation is measure of how wellthe mean represents the data.

!

s = SSN "1

Sampling• Sampling is a random selection of representative

members of a population.• If you had access to all members of a population then

you would not need to conduct inferential statistics tosee whether some observation generalizes to thewhole population.

• Normally, we only have access to a (representative)subset.

• Most random samples tend to be fairly typical of thepopulation, but there is always variation and thepotential of selection bias.

Standard Error• Standard errors (SE) are similar to SD but they

apply to sample means rather than individualmeans.

• Standard errors give you a measure of howrepresentative your sample is of the population.

• A large standard error means your sample is notvery representative of the population. Small SEmeans it is representative.

!

MSE = S / N

types of tests / types of data

• Which test to choose depends on the typeof data you have and the question you areasking.

• Parametric tests have certain assumptionsabout the data.

• Non-parametric tests are ‘assumption free’

Parametric Assumptionsabout your data

• Normally distributed

• Independent **

• Homogeneity of variance

• At least interval scale **

Normality

• Your data should befrom a normallydistributed population.– Normal distributions are

symmetrical ‘bell-shaped’ distributionswith the majority ofscores around thecenter.

Normal Curves• Normal curves can be defined by their

mean and variance.• Z-distribution has mean = 0 and variance

= 1

Normal Curves

95% of cases fall within 2standard deviations ofmean

95% of the time, a samplepopulation will fall in the whitepart of the distribution.

Parametric testsassume your dataconform to thispattern

Homogeneity of Variance

• The variance should not changesystematically throughout the data.

• When you test different groups of subjects(monolinguals vs. bilinguals; test vs. control;trained vs. untrained), their variances shouldnot differ.– If you test two corpora, the variance should not

differ.

Independence

• Data from different subjects (speakers, sentences)are independent.– If trial n influences behavior on trial n+I, then trials are

not independent.– If two participants related (friends, partners), behavior

might not be independent.– Binary (either X or Y) classifications are non-

independent– If you measure distance between anaphor and

antecedent and you have more than one anaphor perantecedent, the individual distances will not beindependent.

Types of Data• Nominal scale (qualitative):

– Numbers represent qualitative features, not quantitative.– 1 not bigger than 2, just different; 1=masculine, 2 = feminine

• Ordinal Scale (qualitative):– Rankings, 1<2<3<4, but differences between values not important

or constant; Likert scale data.– Distance between 1&2 ≠ distance between 3&4

• Interval Scale (quantitative): like ordinal, but distances areequal– Differences make sense, but ratios don’t (30°-20°=20°-10°, but 20°/10° is not twice as hot)

– e.g., temperature, dates• Ratio Scale (quantitative): interval, plus a meaningful 0

point.– Weight, length, reaction times, age

Types of Measurement Scale

Measure #of correctanswerson test

Convey over& underestimates

Judge whois 1st, 2nd,3rd

ID malesvs. Females

YesNoNoNo

QuantityQuantityRelativeQuantity

RatioIntervalOrdinalNominal





ID malesvs. Females

YesNoNoNo

QuantityQuantityRelativeQuantity /Categories


What does the scale indicate?





ID malesvs. Females

YesNoNoNo



Is there a true Zero?


Measuretime tocompletetask



ID malesvs. Females

YesNoNoNo



How might the scale be used in research?

Experimental designs

• What I mean by experiment is likely quitedifferent from what you mean, but hopefullythe two terms will overlap sufficiently.

• An experiment should allow for a systematicobservation of a particular behavior undercontrolled circumstances.

• Observed patterns in the data should betraceable to our manipulation.

Two types of experimentalvariables

• You manipulate the situations under whichthe behavior is observed and measured.The variables you manipulate are yourindependent variables.

• You observe and measure a particularbehavior. This measurement is yourdependent variable.

Hypotheses: what the IV shoulddo to the DV

• The experimental hypothesis is what youare testing and what you are hoping to find.

• The NULL Hypothesis states manipulationwill have not impact

• The goal of our statistical tests is not toprove our hypothesis but to reject the NULLHypothesis.

P-value• Each test provides a test statistic and a p-value.• In parametric tests, the test statistic is a ratio of

the variance within the data not attributable to yourIV to the variance that is attributable to the IV.– Signal to noise ratio

• The p-value is a probability that the observeddifference is real.– probability that difference occurred by chance.– probability that it would not replicate.

The ratioVariance withinconditions reflectsrandom variancefrom multiplesources.

Variance betweenconditions attributable toexperimental manipulation

A significanteffect requiresmore betweenvariance thanwithin variance

Some common designs

• 1 IV, 2 levels(state or valueof IV)

• 2 IVs, 2 Levelseach

21

Subordinateclause

Mainclause

Readingtimes

21Transitive

43Intransitive

Subordinateclause

Mainclause

Readingtimes

Repeated Measures Designs(Within Sample)

• Repeated measures = more than one observationfrom each subject.

• To reduce subject variance, use same subjects inall conditions; within subjects

• take multiple measures from same individual.– Observe a single sentence in multiple contexts.– but be sure to control for order effects and sequence

effects.

Between sample design

• If each subject only experiences onecondition (1 level of the IV), then you makecomparisons between individuals.

• No way to factor out the inherent differencesbetween the individuals– Necessary when comparing monolinguals to

bilinguals, boys to girls, dyslexics to non-dyslexics, etc.

Jonckheeretrend

VarianceRatio (F)

Independent

Mann-Whitneyχ2

Independent Z-Independent t-VarianceRatio (F)

Independent

Spearman’srank correlationcoefficient

Page’s Ltrend

Wilcoxon

Sign

One-sampleproportion

VarianceRatio (F)

Related tOne-sample ZOnesample t

Product-momentcorrelationcoefficient(Pearson’s r)Linearregression

RelatedRelated

CorrelationK sampleTwo-SampleOne-Sample

Type of Research DesignTy

pe o

f Dat

a

Para

met

ricN

on-p

aram

etric

Between analyses

χ2Kruskal-Wallis

Between-subjectsANOVA

Three ormore

χ2Mann-Whitney

Independent samples t

two

Nonparametric -nominal

Nonparametric - ordinal

Parametricscores

# ofconditions

Within analyses

Linearmixedeffects

FriedmanWithin-subjectsANOVA

Three ormore

Linearmixedeffects

WilcoxonDependentsamples t

two

Nonparametric -nominal

Nonparametric - ordinal

Parametricscores

# ofconditions

Documents

Experimental design and inferential statistics: An · PDF fileExperimental design and inferential statistics: An introduction Dr. Alissa Melinger School of Psychology University of