Topics, Summer 2008
Day 1. Introduction
Day 2. Samples and populations
• Measures of central tendency and dispersion
• Evaluating differences between sample means to estimate differences between populations: normal distribution and t-test
Day 3. Evaluating relationships
• Scatterplots
• Correlation
Day 4. Regression and Analysis of Variance
Day 5. Logistic regression
Distributions for nominal variables
• Counts (i.e., frequency)
  How many Xs do I have?
• Proportions (i.e., relative frequency, an estimate of probability)
  How many Xs do I have out of the total number of observations?
Example:
• How many of the clauses tagged in the Switchboard portion of the Bresnan et al. (2007) dataset show the PP realization of the recipient?
• What proportion of the Switchboard observations …?
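The difference between a count and a proportion can be shown with a few lines of Python. This is a minimal sketch. The tag list below is invented for illustration and is not the Bresnan et al. (2007) data. The labels "NP" and "PP" are assumed codings.

```python
from collections import Counter

# Hypothetical realization tags; NOT the Bresnan et al. (2007) data.
realizations = ["NP", "PP", "NP", "NP", "PP", "NP", "NP", "NP", "PP", "NP"]

counts = Counter(realizations)            # frequency: how many Xs do I have?
total = sum(counts.values())              # total number of observations
proportions = {tag: n / total for tag, n in counts.items()}  # share of the total

print(counts["PP"])       # 3
print(proportions["PP"])  # 0.3
```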
Frequency, probability, odds
Frequency and expectation:
• Of the 17 students who received financial support to attend the LSA Summer Meeting, how many do we expect to be women? (If each student is equally likely to be a woman or a man, the expected value is 17 × 0.5 = 8.5.)
• If 7 were women, is this deviation from the expected value of 8.5 larger than we could expect by chance?
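One way to answer the "by chance" question is an exact binomial test. The sketch below assumes p = 0.5 for each student. It adds up the probability of every outcome that lies at least as far from 8.5 as the observed 7 does. The slides may have a different test in mind, such as chi-square. The function names here are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_binomial_p(observed, n, p=0.5):
    """Probability of a count at least as far from n*p as `observed` is."""
    expected = n * p
    deviation = abs(observed - expected)
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if abs(k - expected) >= deviation
    )

n, p = 17, 0.5                        # p = 0.5 is an assumption
print(n * p)                          # 8.5 (expected number of women)
print(two_sided_binomial_p(7, n, p))  # about 0.629
```

A deviation of 1.5 from 8.5 is therefore quite likely by chance under this assumption. Counts of 7 or fewer, or 10 or more, occur about 63% of the time.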
Evaluating frequency differences:
• Of the tagged clauses in the Switchboard portion of the Bresnan et al. (2007) dataset, 79% show the PP realization of the recipient.
• Is the proportion of PP realizations the same in the Wall Street Journal portion of the dataset?
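A common way to compare two proportions is the pooled two-proportion z-test. Its square equals the 2×2 chi-square statistic without continuity correction. The sketch below implements that test with the standard library only. The counts passed in are hypothetical, because the WSJ figures are not given here.

```python
from math import erfc, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).
    Assumes the pooled proportion is strictly between 0 and 1."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical counts for illustration only (not the real dataset):
# 79 of 100 clauses PP in one portion, 60 of 100 in the other.
z, p = two_proportion_z(79, 100, 60, 100)
```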
Distributions for ratio variables
• Raw counts of values are not very useful:
  How many Xs are equal to n₁?
  How many Xs are more than n₁ but less than n₂?
• Proportions
  What percentage of Xs satisfy n₁ < x < n₂?
• Histogram: X = {x₁, x₂, …, xₙ}, breaks = {b₁, b₂, …, bₘ}
  What percentage of Xs satisfy x ≤ b₁?
  What percentage of Xs satisfy b₁ < x ≤ b₂?
  …
  What percentage of Xs satisfy bₘ₋₁ < x ≤ bₘ?
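The histogram questions above amount to sorting each value into a right-closed bin and dividing each bin count by n. The sketch below does this with `bisect_left`. It assumes the breaks are sorted in ascending order. It also adds an extra overflow bin for values above bₘ, which the outline does not mention. The sample durations are hypothetical.

```python
from bisect import bisect_left

def bin_proportions(xs, breaks):
    """Proportion of xs in each bin: x <= b1, b1 < x <= b2, ..., b(m-1) < x <= bm,
    plus a final overflow bin for x > bm. `breaks` must be sorted ascending."""
    counts = [0] * (len(breaks) + 1)
    for x in xs:
        # bisect_left gives the first index i with breaks[i] >= x,
        # so x falls in the right-closed bin whose upper edge is breaks[i].
        counts[bisect_left(breaks, x)] += 1
    n = len(xs)
    return [c / n for c in counts]

durations_ms = [62, 75, 80, 91, 104, 118, 130]  # hypothetical data
print(bin_proportions(durations_ms, [75, 100, 125, 150]))
```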
Summary measures
• Central tendency (expected value)
  • mode
  • median
  • mean
• Dispersion (reliability of expectation)
  • range
  • inter-quartile range
  • variance
  • standard deviation
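Python's standard `statistics` module can compute each summary measure listed above. This is a minimal sketch, and the helper name `summarize` is illustrative. Note that `statistics.variance` and `statistics.stdev` use the sample (n − 1) denominator. Note also that `statistics.quantiles` uses its default "exclusive" method, so other software may report a slightly different inter-quartile range.

```python
import statistics

def summarize(xs):
    """Central tendency and dispersion for a list of numbers."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # the three quartile cut points
    return {
        "mode": statistics.mode(xs),
        "median": statistics.median(xs),
        "mean": statistics.mean(xs),
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "variance": statistics.variance(xs),  # sample variance (n - 1)
        "sd": statistics.stdev(xs),           # sample standard deviation
    }

print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```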
Descriptive vs inferential statistics
• Descriptive statistics
  • summary of your sample
  • examples:
    • calculate the sample mean (x̄, written “x-bar”)
    • calculate the sample variance (s²)
• Inferential statistics
  • generalization from your sample to the population from which your sample was drawn
  • examples:
    • use x̄ to estimate the population mean (μ)
    • use s² to estimate the population variance (σ²)
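The contrast between description and inference is easiest to see when the population is known. The sketch below draws a sample from a hypothetical normal population with μ = 100 and σ = 15. These are illustrative values only. It then computes x̄ and s², which describe the sample and also serve as estimates of μ and σ².

```python
import random
import statistics

# Hypothetical population parameters, chosen for illustration only.
MU, SIGMA = 100.0, 15.0

rng = random.Random(42)  # fixed seed so the run is reproducible
sample = [rng.gauss(MU, SIGMA) for _ in range(1000)]

# Descriptive: summarize the sample itself.
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample)  # sample variance, n - 1 denominator

# Inferential: treat x_bar and s2 as estimates of MU and SIGMA**2.
print(f"x-bar = {x_bar:.2f} (estimates mu = {MU})")
print(f"s^2 = {s2:.1f} (estimates sigma^2 = {SIGMA**2})")
```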
Distribution families
• Uniform distribution
  Example: expected value for the throw of one die
• Binomial distribution
  Example: expected number of heads when n coins are tossed
• Normal distribution
  Examples:
  expected total value for a throw of n dice, with n large
  expected value for many variables that are the cumulative result of many independent influences
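The three dice and coin examples can be checked exactly with rational arithmetic. The sketch below computes three things: the expected value of one die (uniform), the expected number of heads for n fair coins (binomial, with p = 1/2 assumed), and the exact distribution of the total of n dice, built by repeated convolution. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def die_expected():
    """Uniform: each face 1..6 has probability 1/6."""
    return sum(Fraction(face, 6) for face in range(1, 7))

def expected_heads(n, p=Fraction(1, 2)):
    """Binomial: sum of k * P(k heads) over k = 0..n."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

def dice_total_pmf(n):
    """Exact distribution of the total of n fair dice, by repeated convolution."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0) + prob / 6
        dist = new
    return dist

def expected_total(n):
    return sum(total * prob for total, prob in dice_total_pmf(n).items())

print(die_expected())      # 7/2
print(expected_heads(10))  # 5
print(expected_total(3))   # 21/2
```

Printing `dice_total_pmf(n)` for increasing n shows the probabilities piling up around 3.5n in a bell shape.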
Central Limit Theorem
• Because the mean value of a large random sample is the cumulative result of many independent influences, the distribution of mean values of large random samples taken from a population will approximate a normal curve whatever the shape of the population distribution (provided the population variance is finite).
• Example: the distribution of values in a random throw of one die (uniform) vs. the distribution of mean values calculated for a set of random throws of 10,000 dice (approximately normal)
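The dice example can be simulated directly. The sketch below throws `n_dice` fair dice `reps` times and records each mean. It then checks two things against a normal curve: whether the spread of the means is close to σ/√n, and whether about 68% of the means fall within one standard error of 3.5. The function names and the seed are illustrative choices.

```python
import random
import statistics
from math import sqrt

DIE_SD = sqrt(35 / 12)  # population SD of one fair die (variance 35/12)

def die_mean_sample(n_dice, reps, seed=0):
    """Means of `reps` independent throws of `n_dice` fair dice."""
    rng = random.Random(seed)
    faces = range(1, 7)
    return [statistics.fmean(rng.choices(faces, k=n_dice)) for _ in range(reps)]

def share_within_one_se(means, n_dice):
    """Share of means within one standard error of 3.5 (about 0.68 if normal)."""
    se = DIE_SD / sqrt(n_dice)
    return sum(abs(m - 3.5) <= se for m in means) / len(means)
```

For example, `means = die_mean_sample(10_000, 500)` reproduces the slide's 10,000-dice setup in a few seconds. The standard deviation of those means should come out near 1.708 / 100 ≈ 0.017, far narrower than the spread of a single die.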
Hypothesis testing
• Null hypothesis (H₀)
  • examples:
    • mean F4 for Detroit vowels is 3500 Hz (written H₀: μ = 3500 Hz)
    • mean F4 of Detroit men’s vowels is 3500 Hz
    • mean F4 of men’s vowels is the same as mean F4 of women’s vowels
• Alternative hypothesis (H₁)
  • examples (matching those above):
    • mean F4 for Detroit vowels is not 3500 Hz (written H₁: μ ≠ 3500 Hz)
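A hypothesis of the form H₀: μ = 3500 Hz is usually tested with a one-sample t-test. The sketch below uses only the standard library. It computes t = (x̄ − μ₀)/(s/√n) with n − 1 degrees of freedom. It then gets the two-sided p-value by integrating the Student t density with Simpson's rule. The F4 values are invented for illustration. If SciPy is available, `scipy.stats.ttest_1samp` gives the same test.

```python
import statistics
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))

def two_sided_p(t, df, steps=2000):
    """1 - 2 * integral of the t density from 0 to |t| (Simpson's rule; steps even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

def one_sample_t(xs, mu0):
    """t statistic, degrees of freedom and two-sided p-value for H0: mu = mu0."""
    n = len(xs)
    se = statistics.stdev(xs) / sqrt(n)
    t = (statistics.fmean(xs) - mu0) / se
    return t, n - 1, two_sided_p(t, n - 1)

f4_hz = [3420, 3580, 3510, 3390, 3465, 3530]  # hypothetical F4 measurements
print(one_sample_t(f4_hz, 3500))
```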