Topics, Summer 2008

Day 1. Introduction

Day 2. Samples and populations

• Measures of central tendency and dispersion

• Evaluating differences between sample means to estimate differences between populations – normal distribution and t-test

Day 3. Evaluating relationships

• Scatterplots

• Correlation

Day 4. Regression and Analysis of Variance

Day 5. Logistic regression

Distributions for nominal variables

• Counts (i.e., frequency)

  How many Xs do I have?

• Proportions (i.e., probability)

  How many Xs do I have out of the total number of observations?

Example:

• How many of the clauses tagged in the Switchboard portion of the Bresnan et al. (2007) dataset show the PP realization of the recipient?

• What proportion of the Switchboard observations …?
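The count and proportion questions above can be sketched in a few lines of Python. The list of labels below is hypothetical and stands in for a tagged column of recipient realizations; it is not the Bresnan et al. (2007) data.

```python
from collections import Counter

# Hypothetical recipient realizations, for illustration only;
# these are NOT the Bresnan et al. (2007) data.
realizations = ["NP", "PP", "NP", "NP", "PP", "NP", "NP", "NP", "PP", "NP"]

# Counts: how many Xs do I have?
counts = Counter(realizations)

# Proportions: how many Xs do I have out of the total number of observations?
total = sum(counts.values())
proportions = {value: n / total for value, n in counts.items()}

print(counts["PP"], proportions["PP"])
```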

Frequency, probability, odds

Frequency and expectation:

• Of the 17 students who received financial support to attend the LSA Summer Meeting, how many do we expect to be women?

• If 7 were women, is this deviation from the expected value of 8.5 larger than we could expect by chance?

Evaluating frequency differences:

• Of the tagged clauses in the Switchboard portion of the Bresnan et al. (2007) dataset, 79% show the PP realization of the recipient.

• Is the proportion of PP realizations the same in the Wall Street Journal portion of the dataset?
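One possible way to approach the financial-support question is an exact binomial test. The sketch below assumes each of the 17 students is independently a woman with probability 0.5 (the value implied by the expected count of 8.5) and uses a two-sided test; both choices are assumptions made for illustration, not something the slides specify.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 17, 0.5      # 17 students; p = 0.5 is an assumed chance of "woman"
observed = 7

expected = n * p    # expected number of women under that assumption

# Two-sided exact test: add up the probabilities of all outcomes that are
# no more likely than the observed one.
p_observed = binomial_pmf(observed, n, p)
p_value = sum(
    binomial_pmf(k, n, p)
    for k in range(n + 1)
    if binomial_pmf(k, n, p) <= p_observed * (1 + 1e-9)
)

print(f"expected = {expected}, two-sided p = {p_value:.3f}")
```

Under these assumptions the p-value comes out at about 0.63, so a count of 7 would not be a surprising deviation from 8.5.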

Distributions for ratio variables

• Raw counts of values are not very useful

  How many Xs are equal to n1?

  How many Xs are more than n1 but less than n2?

• Proportions

  What percentage of Xs satisfy n1 < x < n2?

• Histogram: X = {x1, x2, …, xn}, breaks = {b1, b2, …, bm}

  What percentage of Xs satisfy x ≤ b1?

  What percentage of Xs satisfy b1 < x ≤ b2?

  …

  What percentage of Xs satisfy b(m-1) < x ≤ bm?
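The histogram questions above amount to counting values per bin. A minimal sketch follows; the function name `bin_percentages` and the data are hypothetical, and values larger than the last break are simply left uncounted.

```python
from bisect import bisect_left

def bin_percentages(xs, breaks):
    """Percentage of xs in each bin: x <= b1, b1 < x <= b2, ..., b(m-1) < x <= bm.

    breaks must be sorted in increasing order.
    Values above the last break are not counted in any bin.
    """
    counts = [0] * len(breaks)
    for x in xs:
        i = bisect_left(breaks, x)   # index of the first break >= x
        if i < len(breaks):
            counts[i] += 1
    return [100 * c / len(xs) for c in counts]

# Hypothetical measurements, for illustration only.
xs = [1.2, 2.5, 2.9, 3.1, 3.3, 4.0, 4.8, 5.0]
breaks = [2, 3, 4, 5]
print(bin_percentages(xs, breaks))
```

Because each bin is closed on the right, the value 4.0 falls in the bin 3 < x ≤ 4 rather than in 4 < x ≤ 5.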

Summary measures

• Central tendency (expected value)

  • mode

  • median

  • mean

• Dispersion (reliability of expectation)

  • range

  • inter-quartile range

  • variance

  • standard deviation
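Each of these summary measures can be computed with Python's standard `statistics` module. The sample below is hypothetical. Note that quartile conventions differ between tools, so the inter-quartile range may not match other software exactly.

```python
import statistics

# Hypothetical sample, for illustration only.
sample = [2, 3, 3, 4, 5, 5, 5, 7, 9]

# Central tendency
mode = statistics.mode(sample)
median = statistics.median(sample)
mean = statistics.mean(sample)

# Dispersion
value_range = max(sample) - min(sample)
q1, q2, q3 = statistics.quantiles(sample, n=4)   # quartile cut points
iqr = q3 - q1
variance = statistics.variance(sample)           # sample variance (n - 1 denominator)
sd = statistics.stdev(sample)                    # square root of the sample variance

print(mode, median, round(mean, 2), value_range, iqr, round(variance, 2), round(sd, 2))
```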

Descriptive vs inferential statistics

• descriptive statistics

  • summary of your sample

  • examples:

    • calculate sample mean (written “x-bar”)

    • calculate sample variance (s²)

• inferential statistics

  • generalization from your sample to the population from which your sample was drawn

  • examples:

    • use x-bar to estimate population mean (μ)

    • use s² to estimate population variance (σ²)
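To make the distinction concrete, the sketch below uses a population whose parameters are known exactly (the faces of a fair die, chosen here purely for illustration), draws one random sample, and compares the sample statistics with the population values they are meant to estimate.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Known population: faces of a fair die, so both parameters can be computed exactly.
faces = [1, 2, 3, 4, 5, 6]
mu = statistics.mean(faces)            # population mean
sigma2 = statistics.pvariance(faces)   # population variance (n denominator)

# Descriptive step: summarize one random sample of 200 throws.
sample = random.choices(faces, k=200)
x_bar = statistics.mean(sample)
s2 = statistics.variance(sample)       # sample variance (n - 1 denominator)

# Inferential step: treat x_bar and s2 as estimates of mu and sigma2.
print(f"x-bar = {x_bar:.2f} (mu = {mu}), s2 = {s2:.2f} (sigma2 = {sigma2:.2f})")
```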

Distribution families

• Uniform distribution

  Example:

  Expected value for throw of one die

• Binomial distribution

  Example:

  Expected number of heads when n coins tossed

• Normal distribution

  Example:

  Expected total value for throw of n=many dice

  Expected value for many variables that are the cumulative result of many independent influences
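The three examples can be worked out exactly for small cases. The sketch below assumes fair dice and fair coins, and picks 10 coins and 5 dice arbitrarily; with only 5 dice the distribution of the total is already symmetric and peaked in the middle, which is the shape the normal curve approximates as the number of dice grows.

```python
from fractions import Fraction
from math import comb

# Uniform: each face of a fair die has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
expected_die = sum(face * p for face, p in die.items())

# Binomial: number of heads when n fair coins are tossed.
n_coins = 10
heads = {k: Fraction(comb(n_coins, k), 2**n_coins) for k in range(n_coins + 1)}
expected_heads = sum(k * p for k, p in heads.items())

def add_die(dist):
    """Distribution of (current total + one more fair die)."""
    out = {}
    for total, p in dist.items():
        for face, q in die.items():
            out[total + face] = out.get(total + face, 0) + p * q
    return out

# Total of several dice, built up one die at a time.
n_dice = 5
total = {0: Fraction(1)}
for _ in range(n_dice):
    total = add_die(total)
expected_total = sum(t * p for t, p in total.items())

print(expected_die, expected_heads, expected_total)
```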

Central Limit Theorem

• Because the mean value of a large random sample is the cumulative result of many independent influences, the distribution of mean values of large random samples taken from a population will approximate a normal curve whatever the shape of the population distribution.

• Example:

  • distribution of values in random throws of a single die vs distribution of mean values calculated for a set of random throws of 10,000 dice
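A small simulation can illustrate the contrast in this example. To keep the run fast, the sketch below uses 100 dice per sample rather than the 10,000 mentioned above; that number, the number of repetitions, and the seed are all arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def mean_of_throws(n_dice):
    """Mean face value of n_dice independent throws of a fair die."""
    return sum(random.randint(1, 6) for _ in range(n_dice)) / n_dice

# Single throws: roughly flat (uniform) over the values 1..6.
single_throws = [random.randint(1, 6) for _ in range(2000)]

# Sample means: each value is the mean of 100 dice; these pile up around 3.5.
sample_means = [mean_of_throws(100) for _ in range(2000)]

centre = statistics.mean(sample_means)
sd_single = statistics.pstdev(single_throws)
sd_means = statistics.pstdev(sample_means)

# For a normal curve, about 68% of values lie within one standard deviation.
share_within_1sd = sum(abs(m - centre) <= sd_means for m in sample_means) / len(sample_means)

print(f"sd of single throws = {sd_single:.2f}, sd of sample means = {sd_means:.2f}")
print(f"share of sample means within 1 sd of their centre = {share_within_1sd:.2f}")
```

The sample means are far less spread out than the single throws, and the share within one standard deviation should come out close to the value expected for a normal curve.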

Hypothesis testing

• Null hypothesis (H0)

  • examples:

    • mean F4 for Detroit vowels is 3500 Hz (written H0: μ = 3500 Hz)

    • mean F4 of Detroit men’s vowels is 3500 Hz

    • mean F4 of men’s vowels is the same as mean F4 of women’s vowels

• Alternative hypothesis (HA)

  • examples (matching those above):

    • mean F4 for Detroit vowels is not 3500 Hz (written HA: μ ≠ 3500 Hz)
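As one illustration of how the null hypothesis that mean F4 is 3500 Hz might be evaluated, the sketch below computes a one-sample t statistic. The choice of a t-test here is an assumption, the function name `one_sample_t` is hypothetical, and the F4 values are invented for illustration; they are not measurements from any real dataset.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1)
    standard_error = s / math.sqrt(n)
    return (x_bar - mu0) / standard_error

# Hypothetical F4 measurements in Hz, for illustration only.
f4 = [3420, 3510, 3475, 3550, 3390, 3605, 3480, 3530]

t = one_sample_t(f4, mu0=3500)
df = len(f4) - 1                          # degrees of freedom

print(f"t = {t:.3f} with {df} degrees of freedom")
```

With 7 degrees of freedom, the two-sided critical value at the 0.05 level is about 2.365, so a t statistic this close to zero would give no grounds for rejecting H0 for this invented sample.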
