Topics, Summer 2008
Day 1. Introduction
Day 2. Samples and populations
• Measures of central tendency and dispersion
• Evaluating differences between sample means to estimate differences between populations – normal distribution and t-test
Day 3. Evaluating relationships
• Scatterplots
• Correlation
Day 4. Regression and Analysis of Variance
Day 5. Logistic regression
Distributions for nominal variables
• Counts (i.e., frequency)
How many Xs do I have?
• Proportions (i.e., probability)
How many Xs do I have out of the total number of observations?
Example:
• How many of the clauses tagged in the Switchboard portion of the Bresnan et al. (2007) dataset show the PP realization of the recipient?
• What proportion of the Switchboard observations …?
Frequency, probability, odds
Frequency and expectation:
• Of the 17 students who received financial support to attend the LSA Summer Meeting, how many do we expect to be women?
• If 7 were women, is this deviation from the expected value of 8.5 larger than we could expect by chance?
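The question about 7 women out of 17 can be sketched as an exact two-sided binomial test. This is a minimal illustration, not necessarily the procedure used in the course: it assumes each student is a woman with probability 0.5 (which is what the expected value of 8.5 implies), and the function names are made up for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_binomial_p(k, n, p=0.5):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed = binomial_pmf(k, n, p)
    tolerance = observed * 1e-9  # guards against floating-point ties
    return sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= observed + tolerance
    )

n_students, n_women, p_woman = 17, 7, 0.5   # p_woman = 0.5 is an assumption
expected_women = n_students * p_woman       # 8.5
p_value = two_sided_binomial_p(n_women, n_students, p_woman)
print(f"expected = {expected_women}, two-sided p = {p_value:.3f}")
```

Under these assumptions the p-value is about 0.63, so a count of 7 is well within what chance alone would produce.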
Evaluating frequency differences:
• Of the tagged clauses in the Switchboard portion of the Bresnan et al. (2007) dataset, 79% show the PP realization of the recipient.
• Is the proportion of PP realizations the same in the Wall Street Journal portion of the dataset?
Distributions for ratio variables
• Raw counts of values not very useful
How many Xs are equal to n1?
How many Xs are more than n1 but less than n2?
• Proportions
What percentage of Xs such that n1 < x < n2?
• Histogram: X={x1, x2, …, xn}, breaks = {b1, b2, …, bm }
What percentage of Xs such that x ≤ b1 ?
What percentage of Xs such that b1 < x ≤ b2 ?
…
What percentage of Xs such that b(m-1) < x ≤ bm ?
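The binning questions above can be sketched in a few lines. This is an illustrative sketch only: the data values and breaks are hypothetical, the function name is invented here, and values larger than the last break are simply left out of every bin.

```python
from bisect import bisect_left

def histogram_proportions(xs, breaks):
    """Proportion of xs in each right-closed bin:
    x <= b1, b1 < x <= b2, ..., b(m-1) < x <= bm."""
    counts = [0] * len(breaks)
    for x in xs:
        i = bisect_left(breaks, x)   # number of breaks strictly below x
        if i < len(breaks):          # values above the last break are ignored
            counts[i] += 1
    return [count / len(xs) for count in counts]

values = [1, 2, 2, 3, 5, 8, 9, 10]   # hypothetical observations
breaks = [2, 5, 10]                  # hypothetical breaks b1, b2, b3
props = histogram_proportions(values, breaks)
print(props)  # [0.375, 0.25, 0.375]
```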
Summary measures
• Central tendency (expected value)
  • mode
  • median
  • mean
• Dispersion (reliability of expectation)
  • range
  • inter-quartile range
  • variance
  • standard deviation
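As a rough illustration, each of these summary measures can be computed with Python's standard `statistics` module. The sample values are hypothetical. Note that `statistics.variance` and `statistics.stdev` divide by n − 1 (the sample versions), and that the inter-quartile range depends on the quantile method; the module's default method is used here.

```python
import statistics

sample = [2, 4, 4, 5, 7, 9, 11]   # hypothetical sample

# Central tendency
mode = statistics.mode(sample)        # most frequent value
median = statistics.median(sample)    # middle value of the sorted sample
mean = statistics.mean(sample)        # arithmetic average

# Dispersion
value_range = max(sample) - min(sample)
q1, _, q3 = statistics.quantiles(sample, n=4)   # quartile cut points
iqr = q3 - q1
variance = statistics.variance(sample)          # divides by n - 1
std_dev = statistics.stdev(sample)              # square root of the variance

print(mode, median, mean, value_range, iqr, variance, round(std_dev, 3))
```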
Descriptive vs inferential statistics
• descriptive statistics
  • summary of your sample
  • examples:
    • calculate sample mean (written “x-bar”)
    • calculate sample variance (s²)
• inferential statistics
  • generalization from your sample to the population from which your sample was drawn
  • examples:
    • use x-bar to estimate population mean (μ)
    • use s² to estimate population variance (σ²)
Distribution families
• Uniform distribution
Example:
Expected value for throw of one die
• Binomial distribution
Example:
Expected number of heads when n coins tossed
• Normal distribution
Example:
Expected total value for throw of n=many dice
Expected value for many variables that are the cumulative result of many independent influences
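The expected values named in these examples can be worked out directly. The sketch below assumes a fair six-sided die and fair coins (neither is stated explicitly on the slide) and uses exact fractions; the function names are invented for illustration.

```python
from fractions import Fraction

# Uniform: each face of a fair die has probability 1/6.
expected_die = sum(Fraction(face, 6) for face in range(1, 7))   # 7/2

def expected_heads(n_coins, p_heads=Fraction(1, 2)):
    """Binomial expectation: n * p."""
    return n_coins * p_heads

def expected_dice_total(n_dice):
    """Expected total of n dice: n times the expectation of one die."""
    return n_dice * expected_die

print(expected_die, expected_heads(10), expected_dice_total(100))  # 7/2 5 350
```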
Central Limit Theorem
• Because the mean value of a large random sample is the cumulative result of many independent influences, the distribution of mean values of large random samples taken from a population will approximate a normal curve whatever the shape of the population distribution.
• Example:
  • distribution of values in random throw of a die vs distribution of mean values calculated for a set of random throws of 10,000 dice
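A small simulation can make the die example concrete. This is a sketch under assumptions: it uses 100 dice per sample rather than 10,000 to keep the run short, and the seed, the sample counts, and the function name are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(2008)   # fixed seed so the run is repeatable

def mean_of_throws(n_dice, rng):
    """Mean face value of n_dice fair dice thrown once each."""
    return statistics.fmean([rng.randint(1, 6) for _ in range(n_dice)])

N_DICE = 100        # dice per sample (assumption; the slide uses 10,000)
N_SAMPLES = 2000    # number of samples drawn

single_throws = [rng.randint(1, 6) for _ in range(N_SAMPLES)]
sample_means = [mean_of_throws(N_DICE, rng) for _ in range(N_SAMPLES)]

spread_single = statistics.stdev(single_throws)   # flat distribution, wide spread
spread_means = statistics.stdev(sample_means)     # bell-shaped, much narrower

print(f"single throws: mean {statistics.fmean(single_throws):.2f}, sd {spread_single:.2f}")
print(f"sample means:  mean {statistics.fmean(sample_means):.2f}, sd {spread_means:.2f}")
```

Single throws are spread evenly over 1 to 6, while the sample means cluster tightly around 3.5. Plotting a histogram of `sample_means` would show the approximately normal shape.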
Hypothesis testing
• Null hypothesis (H0)
  • examples:
    • mean F4 for Detroit vowels is 3500 Hz (written H0: μ = 3500 Hz)
    • mean F4 of Detroit men’s vowels is 3500 Hz
    • mean F4 of men’s vowels is the same as mean F4 of women’s vowels
• Alternative hypothesis (HA)
  • examples (matching those above):
    • mean F4 for Detroit vowels is not 3500 Hz (written HA: μ ≠ 3500 Hz)
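One common way to test a null hypothesis of the form μ = 3500 Hz is a one-sample t-test. The sketch below only computes the t statistic and degrees of freedom. The F4 measurements are invented for illustration and are not data from the course, and the function name is hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / standard_error
    return t, n - 1

f4_hz = [3600, 3400, 3700, 3500, 3800]   # invented F4 values, illustration only
t_stat, df = one_sample_t(f4_hz, mu0=3500)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic is then compared with the t distribution for the given degrees of freedom. With 4 degrees of freedom, the two-sided critical value at the 0.05 level is about 2.776, so these invented values would not lead to rejecting H0.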