Topics, Summer 2008

Page 1: Topics, Summer 2008

Topics, Summer 2008

Day 1. Introduction

Day 2. Samples and populations
• Measures of central tendency and dispersion
• Evaluating differences between sample means to estimate differences between populations: normal distribution and t-test

Day 3. Evaluating relationships
• Scatterplots
• Correlation

Day 4. Regression and Analysis of Variance

Day 5. Logistic regression

Page 2: Topics, Summer 2008

Distributions for nominal variables

• Counts (i.e., frequency)
  How many Xs do I have?
• Proportions (i.e., relative frequency, an estimate of probability)
  How many Xs do I have out of the total number of observations?

Example:
• How many of the clauses tagged in the Switchboard portion of the Bresnan et al. (2007) dataset show the PP realization of the recipient?
• What proportion of the Switchboard observations …?
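The two questions above (how many, and what proportion) can be sketched in a few lines of Python. This is only an illustration: the function name `counts_and_proportions` and the toy list of labels are made up here and are not taken from the Bresnan et al. (2007) dataset.

```python
from collections import Counter


def counts_and_proportions(labels):
    """Return (counts, proportions) for a sequence of nominal labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    proportions = {label: n / total for label, n in counts.items()}
    return counts, proportions


# Hypothetical toy data, NOT real Switchboard observations.
realizations = ["PP", "NP", "NP", "PP", "NP", "NP", "NP", "PP"]
counts, proportions = counts_and_proportions(realizations)
print(counts["PP"], proportions["PP"])
```

The count answers "How many Xs do I have?"; dividing by the total number of observations answers the proportion question.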

Page 3: Topics, Summer 2008

Frequency, probability, odds

Frequency and expectation:
• Of the 17 students who received financial support to attend the LSA Summer Meeting, how many do we expect to be women?
• If 7 were women, is this deviation from the expected value of 8.5 larger than we could expect by chance?

Evaluating frequency differences:
• Of the tagged clauses in the Switchboard portion of the Bresnan et al. (2007) dataset, 79% show the PP realization of the recipient.
• Is the proportion of PP realizations the same in the Wall Street Journal portion of the dataset?
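One way to approach the first question is an exact binomial test. The sketch below assumes that each of the 17 students is independently a woman with probability 0.5 (the assumption implied by the expected value of 8.5); the slides do not say which test is intended, so treat this as one possible reading. The helper names are illustrative. The `odds` helper shows how a proportion such as 79% converts to odds.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binomial_p(k_observed, n, p):
    """Exact two-sided p-value: total probability of every outcome that is
    no more likely than the observed count."""
    observed = binomial_pmf(k_observed, n, p)
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if binomial_pmf(k, n, p) <= observed * (1 + 1e-9)  # float tolerance
    )


def odds(p):
    """Convert a probability to odds in favour."""
    return p / (1 - p)


n_students, n_women = 17, 7
p_woman = 0.5  # assumption: men and women equally likely
expected_women = n_students * p_woman
p_value = two_sided_binomial_p(n_women, n_students, p_woman)
print(expected_women, round(p_value, 3))
print(round(odds(0.79), 2))
```

Under these assumptions the p-value is about 0.63, so a count of 7 against an expectation of 8.5 is well within what chance alone would produce.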

Page 4: Topics, Summer 2008

Distributions for ratio variables

• Raw counts of values are not very useful
  How many Xs are equal to n1?
  How many Xs are more than n1 but less than n2?
• Proportions
  What percentage of Xs satisfy n1 < x < n2?
• Histogram: X = {x1, x2, …, xn}, breaks = {b1, b2, …, bm}
  What percentage of Xs satisfy x ≤ b1?
  What percentage of Xs satisfy b1 < x ≤ b2?
  …
  What percentage of Xs satisfy b(m−1) < x ≤ bm?
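The histogram questions above amount to sorting each value into a right-closed bin and dividing by the number of observations. A minimal sketch, assuming sorted breaks and adding an extra overflow bin for values above the last break (the overflow bin, the function name, and the sample values are assumptions made for illustration):

```python
from bisect import bisect_left


def histogram_proportions(values, breaks):
    """Proportion of values in each bin: x <= b1, b1 < x <= b2, ...,
    plus a final overflow bin for x > bm. `breaks` must be sorted."""
    counts = [0] * (len(breaks) + 1)
    for x in values:
        # bisect_left returns i such that breaks[i-1] < x <= breaks[i]
        counts[bisect_left(breaks, x)] += 1
    return [c / len(values) for c in counts]


values = [0.5, 1.0, 1.2, 2.0, 2.7, 3.0, 3.1, 4.8]  # hypothetical measurements
breaks = [1.0, 2.0, 3.0]
print(histogram_proportions(values, breaks))
```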

Page 5: Topics, Summer 2008

Summary measures

• Central tendency (expected value)
  • mode
  • median
  • mean
• Dispersion (reliability of expectation)
  • range
  • inter-quartile range
  • variance
  • standard deviation
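All of these summary measures are available in Python's standard `statistics` module. The sketch below uses a small made-up sample; note that `statistics.quantiles` has more than one interpolation method, so the inter-quartile range shown here (the default "exclusive" method) may differ slightly from what other software reports.

```python
import statistics


def summarize(values):
    """Central tendency and dispersion measures for a numeric sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "stdev": statistics.stdev(values),
    }


sample = [2, 4, 4, 5, 7, 9, 11]  # hypothetical data
summary = summarize(sample)
for name, value in summary.items():
    print(name, value)
```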

Page 6: Topics, Summer 2008

Descriptive vs inferential statistics

• descriptive statistics
  • summary of your sample
  • examples:
    • calculate sample mean (written "x-bar", x̄)
    • calculate sample variance (s²)
• inferential statistics
  • generalization from your sample to the population from which your sample was drawn
  • examples:
    • use x-bar to estimate population mean (μ)
    • use s² to estimate population variance (σ²)
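The difference between describing a sample and estimating a population shows up in the variance denominator. As a sketch with a made-up sample: dividing by n − 1 gives the usual estimate s² of the population variance σ², while dividing by n only describes the spread of the values at hand.

```python
import statistics

sample = [4, 8, 6, 2]  # hypothetical measurements

x_bar = statistics.mean(sample)               # estimate of the population mean
s_squared = statistics.variance(sample)       # divides by n - 1
n_denominator = statistics.pvariance(sample)  # divides by n

print(x_bar, s_squared, n_denominator)
```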

Page 7: Topics, Summer 2008

Distribution families

• Uniform distribution
  Example: Expected value for throw of one die
• Binomial distribution
  Example: Expected number of heads when n coins tossed
• Normal distribution
  Example: Expected total value for throw of n = many dice
  Expected value for many variables that are the cumulative result of many independent influences
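The three examples can be worked out exactly for small cases. The sketch below (function names are illustrative) computes the expected value of one fair die, the expected number of heads for n fair coins, and the exact distribution of the total of n dice, which becomes increasingly bell-shaped as n grows.

```python
from math import comb


def die_expected_value(sides=6):
    """Expected value of one throw of a fair die (uniform distribution)."""
    return sum(range(1, sides + 1)) / sides


def binomial_expected_heads(n, p=0.5):
    """Expected number of heads in n tosses, summed from the binomial pmf."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


def dice_total_distribution(n_dice, sides=6):
    """Exact probability of each possible total when n_dice are thrown."""
    dist = {0: 1.0}
    for _ in range(n_dice):
        new = {}
        for total, prob in dist.items():
            for face in range(1, sides + 1):
                new[total + face] = new.get(total + face, 0.0) + prob / sides
        dist = new
    return dist


print(die_expected_value())
print(binomial_expected_heads(10))
totals = dice_total_distribution(10)
print(max(totals, key=totals.get))  # most probable total for 10 dice
```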

Page 8: Topics, Summer 2008

Central Limit Theorem

• Because the mean value of a large random sample is the cumulative result of many independent influences, the distribution of mean values of large random samples taken from a population will approximate a normal curve whatever the shape of the population distribution.

• Example: distribution of values in a random throw of a die vs. distribution of mean values calculated for a set of random throws of 10,000 dice
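The die example can be simulated. The sketch below draws repeated samples of die throws and looks at the spread of their means; to keep the run short it uses 1,000 throws per sample rather than 10,000, and the seed, sample counts, and function name are arbitrary choices made for illustration.

```python
import random
import statistics


def sample_means(n_samples, sample_size, seed=0):
    """Mean value of each of n_samples samples of fair-die throws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    faces = range(1, 7)
    return [
        statistics.fmean(rng.choices(faces, k=sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(n_samples=300, sample_size=1_000)
print(statistics.fmean(means))  # should be close to 3.5
print(statistics.stdev(means))  # should be close to sqrt(35 / 12) / sqrt(1000)
```

Single throws are uniformly distributed over 1 to 6, but the sample means cluster tightly around 3.5, with a spread that shrinks in proportion to the square root of the sample size.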

Page 9: Topics, Summer 2008

Hypothesis testing

• Null hypothesis (H0)
  • examples:
    • mean F4 for Detroit vowels is 3500 Hz (written H0: μ = 3500 Hz)
    • mean F4 of Detroit men's vowels is 3500 Hz
    • mean F4 of men's vowels is the same as mean F4 of women's vowels
• Alternative hypothesis (H1)
  • examples (matching those above):
    • mean F4 for Detroit vowels is not 3500 Hz (written H1: μ ≠ 3500 Hz)