Upload
vunga
View
216
Download
2
Embed Size (px)
Citation preview
Experimental design andinferential statistics:
An introduction
Dr. Alissa MelingerSchool of PsychologyUniversity of Dundee
Structure of Tutorial
• Block 1: Background and fundamentalunderpinnings to inferential statistics
• Block 2: Tests for evaluating differences
• Block 3: Tests for evaluating associations
• Block 4: New analyses that I’m quite excited aboutPlus, your data and questions
Background and fundamentalunderpinnings to inferential
statistics
Block 1
The Goal
• We observe some behavior, of individuals,the economy, our computer models, etc.
• We want to say something about thisbehavior. We’d like to say something thatextends beyond just these observations, tofuture behaviors, past behaviors,unobserved behaviors.
Types of Analysis
• Descriptive statistics: summarizing anddescribing the important characteristics ofthe data.
• Inferential statistics: decide if a pattern,difference, or relation found with a sample isrepresentative and true of the population.
Definitions
• Population: the entire set of entities that areclassified together– Population of native German speakers (large group)– Population of native Lakhota speaker (medium sized
group)– Population of 5 year old Lakhota speakers (small group)
• Sample: is subset of entities that make up apopulation– Random selection of 1000 native German speakers.
• Data from a population → exhaustively collecteddata from all relevant individuals• No need for inferential statistics
• Data from a sample → collected data from asubset of your population
• Inferential statistics help determine if the samplereflects the population
•Key point → everyone is unique
Generalizing from a sample
• We want some measure of reassurance thatour observations are representative of thepopulation, and not just characteristic of thesample.
• Any DIFFERENCE we observe indescriptive statistics needs to be evaluated.
Is this difference REAL?
• Descriptive difference– Milk costs .69 at Plus and .89 at Lidl
• Is this a real difference?
• Subjective difference• Is the difference important enough to me? Is it worth my while
to travel farther to pay less?
• Statistical difference– Is Plus generally cheaper than Lidl?
• How representative of the prices is milk?– Are all Plus stores cheaper than all Lidl?
• How representative of all Plus stores is my Plus?
Is a difference REAL?• To answer this question we could go through the store and
compare every item in each store and call every Plus and Lidlin the world– If we can get information on a whole population, we don’t need
inferential statistics.OR
• We can look at a Sample of products and stores and then usestatistics to determine whether our observation is true of otherunobserved products and shops.– Important to choose sample well (not only from the chocolate aisle)
• Statistics help us determine how well our Sample represents thePopulation.
A simple model of the data
• Different statistical methods attempt to builda model of the data using hypothesizedfactors to account for the characteristics ofthe observed pattern.
• One simple model of the data is the MEAN– Mode, median are other simple models– Distributions, counts, spreads, range, ect.
The mean
1F
4E
0D
2C
3B
1A
# siblingsSubjects
Mean # siblings = 1.83
How well does the meanmodel the data?
ERROR
Error• Variance is the average error between the
mean and the observations.• Sum of error is offset by positive and negative
numbers– Take the square of each error value
• Sum of squared errors (SS) will increase themore data you collect.– Large number ≠ bad estimate of # of siblings
• Divide the sum of squared errors by N-1• Variance (s2) = SS/N-1
!
xi " x( )#
!
xi " x( )#2
Error is crucial to inferential statistics.
If you don’t have Error in your data (e.g., if you aretesting a model and 100 tests would give you the
same result because the model is deterministic) youdo not need to do inferential statistics.
Variance• Sum of error is offset by positive and
negative numbers– Take the square of each error value
• Sum of squared errors (SS)
• Divide the sum of squared errors by N-1• Variance (s2) = SS/N-1
!
xi " x( )#
!
xi " x( )#2
Standard Deviation
• Variance gives us measure in units squared,so not comparable to directly to the unitsmeasured.
• Standard Deviation is measure of how wellthe mean represents the data.
!
s = SSN "1
Sampling• Sampling is a random selection of representative
members of a population.• If you had access to all members of a population then
you would not need to conduct inferential statistics tosee whether some observation generalizes to thewhole population.
• Normally, we only have access to a (representative)subset.
• Most random samples tend to be fairly typical of thepopulation, but there is always variation and thepotential of selection bias.
Standard Error• Standard errors (SE) are similar to SD but they
apply to sample means rather than individualmeans.
• Standard errors give you a measure of howrepresentative your sample is of the population.
• A large standard error means your sample is notvery representative of the population. Small SEmeans it is representative.
!
MSE = S / N
types of tests / types of data
• Which test to choose depends on the typeof data you have and the question you areasking.
• Parametric tests have certain assumptionsabout the data.
• Non-parametric tests are ‘assumption free’
Parametric Assumptionsabout your data
• Normally distributed
• Independent **
• Homogeneity of variance
• At least interval scale **
Normality
• Your data should befrom a normallydistributed population.– Normal distributions are
symmetrical ‘bell-shaped’ distributionswith the majority ofscores around thecenter.
Normal Curves• Normal curves can be defined by their
mean and variance.• Z-distribution has mean = 0 and variance
= 1
Normal Curves
95% of cases fall within 2standard deviations ofmean
95% of the time, a samplepopulation will fall in the whitepart of the distribution.
Parametric testsassume your dataconform to thispattern
Homogeneity of Variance
• The variance should not changesystematically throughout the data.
• When you test different groups of subjects(monolinguals vs. bilinguals; test vs. control;trained vs. untrained), their variances shouldnot differ.– If you test two corpora, the variance should not
differ.
Independence
• Data from different subjects (speakers, sentences)are independent.– If trial n influences behavior on trial n+I, then trials are
not independent.– If two participants related (friends, partners), behavior
might not be independent.– Binary (either X or Y) classifications are non-
independent– If you measure distance between anaphor and
antecedent and you have more than one anaphor perantecedent, the individual distances will not beindependent.
Types of Data• Nominal scale (qualitative):
– Numbers represent qualitative features, not quantitative.– 1 not bigger than 2, just different; 1=masculine, 2 = feminine
• Ordinal Scale (qualitative):– Rankings, 1<2<3<4, but differences between values not important
or constant; Likert scale data.– Distance between 1&2 ≠ distance between 3&4
• Interval Scale (quantitative): like ordinal, but distances areequal– Differences make sense, but ratios don’t (30°-20°=20°-10°, but 20°/10° is not twice as hot)
– e.g., temperature, dates• Ratio Scale (quantitative): interval, plus a meaningful 0
point.– Weight, length, reaction times, age
Types of Measurement Scale
Measure #of correctanswerson test
Convey over& underestimates
Judge whois 1st, 2nd,3rd
ID malesvs. Females
YesNoNoNo
QuantityQuantityRelativeQuantity
RatioIntervalOrdinalNominal
Types of Measurement Scale
Measure #of correctanswerson test
Convey over& underestimates
Judge whois 1st, 2nd,3rd
ID malesvs. Females
YesNoNoNo
QuantityQuantityRelativeQuantity /Categories
RatioIntervalOrdinalNominal
What does the scale indicate?
Types of Measurement Scale
Measure #of correctanswerson test
Convey over& underestimates
Judge whois 1st, 2nd,3rd
ID malesvs. Females
YesNoNoNo
QuantityQuantityRelativeQuantity /Categories
RatioIntervalOrdinalNominal
Is there a true Zero?
Types of Measurement Scale
Measuretime tocompletetask
Convey over& underestimates
Judge whois 1st, 2nd,3rd
ID malesvs. Females
YesNoNoNo
QuantityQuantityRelativeQuantity /Categories
RatioIntervalOrdinalNominal
How might the scale be used in research?
Experimental designs
• What I mean by experiment is likely quitedifferent from what you mean, but hopefullythe two terms will overlap sufficiently.
• An experiment should allow for a systematicobservation of a particular behavior undercontrolled circumstances.
• Observed patterns in the data should betraceable to our manipulation.
Two types of experimentalvariables
• You manipulate the situations under whichthe behavior is observed and measured.The variables you manipulate are yourindependent variables.
• You observe and measure a particularbehavior. This measurement is yourdependent variable.
Hypotheses: what the IV shoulddo to the DV
• The experimental hypothesis is what youare testing and what you are hoping to find.
• The NULL Hypothesis states manipulationwill have not impact
• The goal of our statistical tests is not toprove our hypothesis but to reject the NULLHypothesis.
P-value• Each test provides a test statistic and a p-value.• In parametric tests, the test statistic is a ratio of
the variance within the data not attributable to yourIV to the variance that is attributable to the IV.– Signal to noise ratio
• The p-value is a probability that the observeddifference is real.– probability that difference occurred by chance.– probability that it would not replicate.
The ratioVariance withinconditions reflectsrandom variancefrom multiplesources.
Variance betweenconditions attributable toexperimental manipulation
A significanteffect requiresmore betweenvariance thanwithin variance
Some common designs
• 1 IV, 2 levels(state or valueof IV)
• 2 IVs, 2 Levelseach
21
Subordinateclause
Mainclause
Readingtimes
21Transitive
43Intransitive
Subordinateclause
Mainclause
Readingtimes
Repeated Measures Designs(Within Sample)
• Repeated measures = more than one observationfrom each subject.
• To reduce subject variance, use same subjects inall conditions; within subjects
• take multiple measures from same individual.– Observe a single sentence in multiple contexts.– but be sure to control for order effects and sequence
effects.
Between sample design
• If each subject only experiences onecondition (1 level of the IV), then you makecomparisons between individuals.
• No way to factor out the inherent differencesbetween the individuals– Necessary when comparing monolinguals to
bilinguals, boys to girls, dyslexics to non-dyslexics, etc.
Jonckheeretrend
VarianceRatio (F)
Independent
Mann-Whitneyχ2
Independent Z-Independent t-VarianceRatio (F)
Independent
Spearman’srank correlationcoefficient
Page’s Ltrend
Wilcoxon
Sign
One-sampleproportion
VarianceRatio (F)
Related tOne-sample ZOnesample t
Product-momentcorrelationcoefficient(Pearson’s r)Linearregression
RelatedRelated
CorrelationK sampleTwo-SampleOne-Sample
Type of Research DesignTy
pe o
f Dat
a
Para
met
ricN
on-p
aram
etric
Between analyses
χ2Kruskal-Wallis
Between-subjectsANOVA
Three ormore
χ2Mann-Whitney
Independent samples t
two
Nonparametric -nominal
Nonparametric - ordinal
Parametricscores
# ofconditions
Within analyses
Linearmixedeffects
FriedmanWithin-subjectsANOVA
Three ormore
Linearmixedeffects
WilcoxonDependentsamples t
two
Nonparametric -nominal
Nonparametric - ordinal
Parametricscores
# ofconditions