Descriptive statistics

Hypothesis (Linguistic)
Participants
Task (stimuli = questions, responses = answers)
Results
Conclusions

  Key terms: stimulus design, response measure

What do we need to run an experiment?

Example

Show me the cat that bit the dog

Show me the cat that the dog bit

Picture from:

Friedmann & Novogrodsky (2001)

Design

Number of conditions
Within subject / between subject
How many items to each participant
Order of items

Measure Response

Variables Scales Analysis

Descriptive Inferential

Variables

Any experimental category that has a value that can vary.
Anything that is not constant and can change over time, or be different in different people, is a variable.
Variables can take many forms.
Variables can be manipulated and observed.

Properties of Variables

Continuous variable – along a continuum with equal intervals (e.g., age, height, weight, grade in a test)

Ordinal variables – rating along a continuum with estimated intervals (e.g., evaluation)

Discrete variables (categorical, nominal) – divided into categories (e.g., language, yes/no, correct/incorrect)

Types of Variables

Independent variables – characteristics of the subject (participant variable) or conditions chosen by the experimenter

Dependent variables – what the experiment measures (e.g., degree of success)

Intervening variables – variables which are not measured or manipulated, but could influence the results (e.g., concentration, intelligence)

Field, A. & G. Hole. 2003. How to Design and Report Experiments. London: Sage Publications.

Scales

Nominal – two things with the same number are similar (same name)

Ordinal – four is more than three (but the step from three to four is not necessarily the same as the step from two to three)

Interval – four is more than two (but not twice as much)

Ratio – four is more than three, the step from three to four is the same as the step from two to three, and four is twice two

Which scale are the following variables rated on?

Height
Celsius degrees
TV channel number
Grades in an exam (1-100)
Psychological rating (anxiety on a scale of 1-10)
Time (13:00, 14:00)
Time (one hour, two hours, three hours)
Phone number
Rating places in a race

Variables and Scales: summary

Choose an appropriate task
Measure responses
Be aware of the variables and their properties
Choose the mathematical operations appropriate for the scale

Factorial design

Tests all possible combinations, e.g., a 2x2 design – one participant variable and one independent variable with two conditions.

Conditions (independent variable): subject relatives, object relatives
Groups (participant variable): TLD, SLI

Practical questions for offline tasks

How many subjects? At least 25
How many categories? 2x2
How many items? More subjects >> fewer items.
For 25 subjects – 6 items per category
For 50 subjects – 3 is enough
For case studies and within subject analysis – at least 10.

SIMPLE NUMERICAL COMPUTATIONS

Ratio

The relation between two nominal variables

V/N ratio: 60/80=3/4

N/V ratio: 80/60=4/3

Word type      N
Nouns          80
Verbs          60
Other words    50
Total          190

Example

Goofy said that the Troll had to put two hoops on the pole to win.

Does the Troll win?

Musolino (2004)

Ratio

Yes/no ratio: 8/12=2/3

Proportion

Relation between a group and its part (Verb/Word, Pronouns/Subject position). Ratio out of the total

Verb/Word proportion: 60/190 ≈ 0.32 (roughly 1/3)

Percentage (%)

Relative proportion out of a hundred
Verb percentage (out of all words):

100*(60/190) ≈ 32%

Rate

The relative frequency (for a population, out of 1000)
7% of children have SLI >> 0.07 * 1000 = 70
70 children out of 1000 have SLI
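The four computations above (ratio, proportion, percentage, rate) can be sketched in a few lines of Python. This is only an illustration using the word counts from the table; the variable names and the helper `rate_per_thousand` are made up for this example.

```python
from fractions import Fraction

# Word counts from the table above.
counts = {"nouns": 80, "verbs": 60, "other": 50}
total = sum(counts.values())  # 190 words in all

# Ratio: one category relative to another (kept as an exact fraction).
verb_noun_ratio = Fraction(counts["verbs"], counts["nouns"])
noun_verb_ratio = Fraction(counts["nouns"], counts["verbs"])

# Proportion: a part relative to the whole.
verb_proportion = counts["verbs"] / total

# Percentage: the proportion scaled to 100.
verb_percentage = 100 * verb_proportion


def rate_per_thousand(percentage):
    """Convert a percentage into a rate per 1000 (hypothetical helper)."""
    return percentage * 1000 / 100


print("V/N ratio:", verb_noun_ratio)
print("N/V ratio:", noun_verb_ratio)
print("Verb proportion:", round(verb_proportion, 2))
print("Verb percentage:", round(verb_percentage, 1))
print("SLI rate per 1000:", rate_per_thousand(7))
```

Keeping the ratios as `Fraction` objects avoids rounding and shows them in reduced form, which matches how they are written on the slide.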

Frequency

Count the number of times a score occurs.

How many times a value of a variable occurs?

Example

Show 10 pictures, and count the number of “correct” responses

Is every bunny eating a carrot?

Roeper, Strauss and Zurer Pearson (2004)

Picture   Correct
1         1
2         1
3         0
4         0
5         0
6         0
7         1
8         1
9         1
10        1
Total     6

Frequency

Count the number of times a score occurs

Raw score

Child   Score
1       8
2       8
3       6
4       6
5       6
6       6
7       2
8       2

Frequency

Score   Frequency
2       2
6       4
8       2

Frequency = how many children got this score
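As a small sketch, the step from raw scores to a frequency table can be done with Python's `collections.Counter`. The list below holds the eight children's scores from the slide; the variable names are only illustrative.

```python
from collections import Counter

# Raw scores for children 1 to 8, as listed on the slide.
scores = [8, 8, 6, 6, 6, 6, 2, 2]

# Frequency = how many children got each score.
frequency = Counter(scores)

# Print the frequency table, lowest score first.
for score in sorted(frequency):
    print(score, frequency[score])
```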

Frequency graph

Score on the test is the horizontal axis (X-axis)

Frequency is on the vertical axis (Y-axis)

Percentile

The cumulative frequency - how many scores are at or below a particular point in the distribution

Percentile = 100(Cumulative Frequency/Total N)

Grade   Frequency   Cumulative frequency   Percentile
100     2           30                     100%
90      5           28                     93%
80      10          23                     77%
70      8           13                     43%
60      4           5                      17%
50      1           1                      3%
Total N 30
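The percentile column can be recomputed from the grade frequencies alone. The following is a minimal sketch, assuming the percentile is rounded to a whole percent as on the slide; the names are illustrative.

```python
from itertools import accumulate

# Grade -> frequency, from the slide (30 students in total).
grade_frequency = {50: 1, 60: 4, 70: 8, 80: 10, 90: 5, 100: 2}

grades = sorted(grade_frequency)
frequencies = [grade_frequency[g] for g in grades]
total_n = sum(frequencies)

# Running total of the frequencies, starting from the lowest grade.
cumulative = list(accumulate(frequencies))

# Percentile = 100 * (cumulative frequency / total N), rounded to whole percent.
percentiles = {g: round(100 * c / total_n) for g, c in zip(grades, cumulative)}

for g, c in zip(grades, cumulative):
    print(g, grade_frequency[g], c, f"{percentiles[g]}%")
```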

Frequency polygon (the curve)

Frequency distribution

[Figure: frequency polygon. X-axis: Grade (50 to 100). Y-axis: No. of students (0 to 12).]

The frequency polygon (the curve) is a picture of the data

Types of distributions (Fig. 4.3 & 4.4, pp. 113-116)

A bell shaped curve - a symmetric distribution, a unimodal distribution (one midpoint, one peak), normal distribution

Peak

Tails

Pointy distribution (leptokurtic)
Flat distribution (platykurtic)

In a skewed distribution the tail is skewed in one direction:

Positively skewed distribution - most scores are low; the tail is directed towards the high (positive) scores, which skew the distribution

Negatively skewed distribution - most scores are high; the tail is directed towards the low (negative) scores, which skew the distribution

Bimodal distribution - a double peaked curve

Min (the lowest score) and Max (the highest score)

Range – the range of observed values. Range = Max-Min

But the range changes with the extreme scores (unstable but useful informal measure).

Descriptive Statistics - Some definitions

Mode - most frequently obtained score
Mean (average) – average of a set of numbers
Median – the middle score of a group (when odd) or the average of the two middle scores (when even)

In a bell curve (normal) distribution mode, mean and median will be the same

Mode

Grade   Frequency
50      1
60      4
70      8
80      10
90      5
100     2
Total   30

Which grade is most frequent?

Highest in “frequency” column

Mean (average)

Compute a sum of all grades

Divide by number of grades

Mean (average)

Grade x times   Product
50 x 1          50
60 x 4          240
70 x 8          560
80 x 10         800
90 x 5          450
100 x 2         200
Total           2300

Mean = 2300/30 = 76.67

Median

Order all grades in a row according to value

The grade in “the middle” of the row is the median

We have a row of 30 grades: 50, 60, 60, 60, 60, 70…
Half of 30 is 15
The grade in the 15th position is the median

Slight complication: we have 15 grades on both sides of the median

Compute mean of the grades in the 15th and 16th positions
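The mode, mean and median of the 30 grades can be checked with Python's standard `statistics` module. This sketch expands the grade/frequency table into the ordered row of grades described above; the variable names are only for illustration.

```python
import statistics

# Grade -> frequency, from the slides.
grade_frequency = {50: 1, 60: 4, 70: 8, 80: 10, 90: 5, 100: 2}

# Expand the table into the ordered row of 30 grades: 50, 60, 60, 60, 60, 70, ...
grades = [g for g, n in sorted(grade_frequency.items()) for _ in range(n)]

mode = statistics.mode(grades)      # most frequent grade
mean = statistics.mean(grades)      # sum of all grades / number of grades
median = statistics.median(grades)  # average of the two middle grades (N is even)

# The same median by hand: the grades in the 15th and 16th positions.
middle_pair = (grades[14], grades[15])

print("mode:", mode)
print("mean:", round(mean, 2))
print("median:", median, "from middle pair", middle_pair)
```

Because the number of grades is even, `statistics.median` averages the two middle values, which is the same rule as taking the mean of the 15th and 16th positions.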

Variability

(Figure from Hatch & Farhady 1982, p. 56)

Questions: Are both curves the same? How? Are they different? How?

We need to measure the accuracy of the mean.

Coming attractions

How to draw valid statistical inferences?
We have to look at the relation between our sample and the population
Today we looked at where the ‘center’ of the data is – what is the big picture
Look at variance, how the data is distributed

Deviation

The distance between a score and the Mean (see Table 4.2, p. 125), how much a score deviates from the average

Sum of squared errors (SS)

Variance

Average error in the sample, average error in the population
Variance in the sample = SS/N
33.7143/7 = 4.8163
Variance in the population (estimated from the sample) = SS/(N-1)
33.7143/6 = 5.6191
Why N-1? Degrees of freedom (read box 4.5, page 129)

Standard deviation (SD)

The average distance between a score and the Mean (square root of the Variance)

SD= √5.6191 = 2.37

What can SD tell us about the distribution (pointy distribution vs. flat distribution)?
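The chain from deviations to SS, variance and SD can be traced step by step in code. The raw data behind SS = 33.7143 are not reproduced in these slides, so the sketch below reuses the eight raw scores from the frequency example purely as an illustration; the variable names are assumptions of this example.

```python
import math
import statistics

# Illustrative data: the eight raw scores from the frequency example,
# NOT the data of Table 4.2.
scores = [8, 8, 6, 6, 6, 6, 2, 2]
n = len(scores)
mean = sum(scores) / n

# Deviation: distance between each score and the mean.
deviations = [x - mean for x in scores]

# Sum of squared errors (SS).
ss = sum(d ** 2 for d in deviations)

# Variance in the sample (divide by N) and the estimate for the
# population (divide by N - 1).
variance_n = ss / n
variance_n_minus_1 = ss / (n - 1)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance_n_minus_1)

print("SS:", ss)
print("variance (N):", variance_n)
print("variance (N-1):", round(variance_n_minus_1, 4))
print("SD:", round(sd, 2))

# Cross-check against the standard library.
assert math.isclose(variance_n, statistics.pvariance(scores))
assert math.isclose(sd, statistics.stdev(scores))
```

In the standard library, `pvariance` divides by N and `variance`/`stdev` divide by N-1, so the two functions correspond to the two formulas on the slide.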

Standard Error (SE)

How well does the sample represent the population?

Different samples of the population might yield different means. The SE is the standard deviation of the means of several samples. Large value - big difference between sample means, small value - small difference.

SE = SD/√ N

Confidence Interval

The limits within which 95% or 99% of the sample means fall

For approximately 95%:
Lower boundary = Mean-2SE
Upper boundary = Mean+2SE
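A minimal sketch of the SE and confidence-interval formulas follows. It plugs in the summary values quoted in these slides (mean 3.57, SD 2.37, N = 7), on the assumption that they all describe the same sample; the function names are hypothetical.

```python
import math


def standard_error(sd, n):
    """SE = SD / sqrt(N)."""
    return sd / math.sqrt(n)


def confidence_interval(mean, se, multiplier=2):
    """Return (lower, upper) = mean -/+ multiplier * SE."""
    return mean - multiplier * se, mean + multiplier * se


# Summary values quoted in the slides (assumed to belong to one sample).
se = standard_error(2.37, 7)
lower, upper = confidence_interval(3.57, se)

print("SE:", round(se, 2))
print("interval:", round(lower, 2), "to", round(upper, 2))
```

The multiplier 2 is the rounded value used on the slide; it is kept as a parameter so a different multiplier can be supplied.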

Inferential statistics

z-score and T-score

How can we use the standard deviation (SD) to compare two samples? two exams? two tests?

We translate the raw scores into distance in SD from the mean, by subtracting the mean from the raw score and dividing by the SD.

So for Table 4.2:

$z = \frac{1 - 3.57}{2.37} = -1.08 \qquad z = \frac{8 - 3.57}{2.37} = 1.87$

These scores are z-scores. Some z-scores are negative and some are positive. Why?

If you prefer a scale with only positive numbers, you can use the T-score

T score = 10 * z-score + 50

10 * -1.08 + 50 = 39.2

10 * 1.87 + 50 = 68.7
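The two conversions can be written as small functions. This is a sketch using the mean (3.57) and SD (2.37) quoted for Table 4.2; the function names are made up for this example.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in SD units."""
    return (raw - mean) / sd


def t_score(z):
    """Rescale a z-score so that the mean is 50 and one SD is 10 points."""
    return 10 * z + 50


mean, sd = 3.57, 2.37  # values quoted for Table 4.2

for raw in (1, 8):
    z = z_score(raw, mean, sd)
    print(raw, round(z, 2), round(t_score(round(z, 2)), 1))
```

A raw score below the mean gives a negative z-score and a T-score under 50; a raw score above the mean gives a positive z-score and a T-score over 50.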

A few words on Covariance and Pearson correlation

Covariance - how much do two variables co-vary?

$\mathrm{Cov} = (X - \bar{X})(Y - \bar{Y})$

But we are interested in sets of scores, so we need to sum up all the individual cross-products and divide, as always, by N-1.

$\mathrm{COV}_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$

What do we need covariance for? To measure correlations (Pearson correlation coefficient is considered the best way to estimate correlation between X & Y).

Since the two samples do not have the same SD, we must adjust the covariance to the amount of variation

$r = \frac{\mathrm{COV}_{xy}}{SD_x \cdot SD_y}$

What does r mean ?

Positive r - positive correlation
Negative r - negative correlation
Small r - small correlation
Big r - big correlation
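The covariance and Pearson formulas can be sketched directly from their definitions. The paired scores `x` and `y` below are hypothetical values invented for this example, and the helper names are not from the slides.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation with N - 1 in the denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def covariance(xs, ys):
    """Sum of the cross-products of deviations, divided by N - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Covariance adjusted by the two standard deviations."""
    return covariance(xs, ys) / (sd(xs) * sd(ys))


# Hypothetical paired scores for five participants.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print("covariance:", covariance(x, y))
print("r:", round(r, 2))
```

Dividing by the two SDs puts r on a fixed scale from -1 to 1, so correlations from samples with different amounts of variation can be compared.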

inferential statistics.xls

Effect size

We can use correlations to measure experimental effect size
r² - the coefficient of determination - is the fraction of the variance that is accounted for by a linear correlation.

r=0.1 (small effect) - only 1% of the variance is accounted for by our task (1%=.01=r²)

r=0.3 (medium effect) - 9% of variance is accounted for by our task (9%=.09=r²)

r=0.5 (large effect) - 25% of variance is accounted for by our task (25%=0.25=r²)

r = 1 A perfect effect

Probability

How probable is it to get a certain correlation?
How probable is it to get a certain score?
How probable is it to get a certain mean?
How probable is it that two samples are the same/different?

Playing "Heads or tails?" Throwing a die.

Probability can be calculated by dividing the number of desired events by the number of possible outcomes.

What is the probability of getting a score above the mean?

What is the probability of getting a score which is up to 1SD above the mean? up to 1SD from the mean? (For every z-score there is a probability)

Or by relying on the SD
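Both routes to a probability can be illustrated briefly. The sketch below assumes the scores follow a normal distribution (an assumption of this example, not something the slides state), so that each z-score maps to a probability through the standard normal curve; the names are illustrative.

```python
from fractions import Fraction
from statistics import NormalDist

# Route 1: desired events / possible outcomes.
p_heads = Fraction(1, 2)  # one desired side out of two
p_six = Fraction(1, 6)    # one desired face out of six

# Route 2: relying on the SD, assuming normally distributed scores.
standard_normal = NormalDist(mu=0, sigma=1)


def prob_between(z_low, z_high):
    """Probability of a z-score falling between z_low and z_high."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


p_above_mean = 1 - standard_normal.cdf(0)  # score above the mean
p_up_to_1sd_above = prob_between(0, 1)     # between the mean and +1 SD
p_within_1sd = prob_between(-1, 1)         # within 1 SD of the mean

print(p_heads, p_six)
print(round(p_above_mean, 2), round(p_up_to_1sd_above, 2), round(p_within_1sd, 2))
```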

Confidence Interval

The limits within which approximately 95% of the sample means fall

Lower boundary = Mean-2SE
Upper boundary = Mean+2SE

Hypothesis testing

How likely is it (how probable is it) that our hypothesis is right?

The probability that some results could happen by chance is less than 5% (or 1%)

p<0.05 (or p<0.01) - the level of significance

Null hypothesis - there is no difference between our sample and the population

Positive hypothesis - the sample does better than the population.

Negative hypothesis - the sample does worse than the population

Alternative hypothesis - the sample is different but there is no direction.

p>0.05

p<0.05

(Figures from Hatch & Farhady 1982, p. 87)

If the data falls in the shaded area of 8.5 - the null hypothesis is confirmed

If the data falls in the shaded area of 8.6 - the null hypothesis is rejected

If the data falls in the shaded higher tail of 8.6 - the scores are higher than the population and the null hypothesis is rejected

If the data falls in the shaded negative tail of 8.6 - the scores are lower than the population and the null hypothesis is rejected

Since there is no direction specified by the null hypothesis, we must consider both tails - thus we use a two tailed test (with .025 in each tail).

If we test a directional hypothesis, the level of significance applies to one tail only.
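The difference between a two-tailed and a one-tailed test can be sketched as follows. The example assumes the test statistic is a z-score with a standard normal distribution under the null hypothesis, and the observed value z = 1.8 is hypothetical; for the one-tailed case it also assumes the result lies in the predicted direction.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def p_value(z, tails=2):
    """p-value for an observed z-score.

    tails=2: no direction predicted, both tails count.
    tails=1: directional hypothesis, result assumed to be in the
             predicted direction, so only one tail counts.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    one_tail = 1 - STANDARD_NORMAL.cdf(abs(z))
    return tails * one_tail


# Cut-off z-scores at the .05 level of significance.
two_tailed_critical = STANDARD_NORMAL.inv_cdf(1 - 0.025)  # .025 in each tail
one_tailed_critical = STANDARD_NORMAL.inv_cdf(1 - 0.05)   # .05 in one tail

observed_z = 1.8  # hypothetical result
print(round(two_tailed_critical, 2), round(one_tailed_critical, 2))
print(round(p_value(observed_z, tails=2), 3), round(p_value(observed_z, tails=1), 3))
```

With these assumptions the same observed z can be significant at .05 in a one-tailed test but not in a two-tailed test, because the two-tailed cut-off lies further from the mean.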

(Figures from Hatch & Farhady 1982, p. 88)

A score in the shaded area in 8.7 confirms the _____________ hypothesis

A score in the shaded area in 8.8 confirms the _____________ hypothesis
