AP Statistics
Course Review
Exploring Data
• Variables can be categorical or quantitative
• Quantitative variables can be discrete or continuous
• For categorical data, we use bar charts
• Numerical data can be displayed using a dotplot, stemplot, box-and-whisker plot, histogram, or cumulative frequency plot
• Histograms have no gaps between adjacent bars (unless a class interval contains no observations)
• Always include a key with a stemplot
• Always label axes, and read the axes carefully when interpreting a graph
Commenting on a graph
• Shape: symmetric, skewed, unimodal, uniform
• Center: mean and median
• Spread: range, standard deviation, IQR, gaps, outliers (an observation more than 1.5 × IQR below Q1 or above Q3)
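A minimal sketch of the 1.5 × IQR outlier rule. Quartile conventions differ between textbooks and software; this assumes the "median of each half" method (the overall median is left out of both halves when n is odd). The helper names and data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (needs at least 2 values)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]   # skip the middle value when n is odd
    return median(lower), median(upper)

def outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [2, 4, 5, 5, 6, 7, 8, 9, 30]
print(quartiles(data))   # (4.5, 8.5)
print(outliers(data))    # [30]
```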
Effect of changing units
• Multiplying every value by a positive constant (such as a unit conversion factor) multiplies measures of center and spread by that same constant; the variance is multiplied by the constant squared.
• Adding or subtracting the same constant shifts measures of center by that constant but does not change measures of spread.
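A small check of both rules, assuming hypothetical Celsius readings converted with F = 1.8C + 32: the multiplier 1.8 scales both center and spread, while the added 32 shifts only the center.

```python
from statistics import mean, median, stdev

# Hypothetical temperatures in degrees Celsius, converted with F = 1.8C + 32
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]
fahrenheit = [1.8 * c + 32 for c in celsius]

for name, stat in [("mean", mean), ("median", median), ("stdev", stdev)]:
    print(name, stat(celsius), stat(fahrenheit))

# Adding a constant by itself leaves the spread unchanged
shifted = [c + 100 for c in celsius]
print(stdev(shifted), stdev(celsius))
```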
Trial Run 1
Scatterplots
• Bivariate data: explanatory (x) and response (y) variables
• Correlation coefficient r ranges from −1 to 1
• r does not change when you switch x and y, nor when you add a constant to either variable or multiply either by a positive constant
• r measures only the strength of a linear relationship
• r is affected by outliers
• Watch for lurking variables
• Danger of extrapolation
• Coefficient of determination (r²)
• Residuals (observed − predicted)
• Influential points
• Transformations
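A minimal sketch of r, the least-squares line, and residuals, using hypothetical data and helper names of my own. It also lets you confirm that r is unchanged by swapping x and y or by a positive linear change of units.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def correlation(x, y):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def least_squares(x, y):
    """Intercept a and slope b of y-hat = a + b x (b = Sxy / Sxx, equivalently r * sy / sx)."""
    mx, my = mean(x), mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # observed - predicted
print(r, r ** 2, a, b)
```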
Trial run
Sampling
• Census, survey, experiment, observational study
• Parameter (describes a population) vs. statistic (describes a sample)
• Sampling methods: convenience, SRS, stratified, cluster, systematic
• Bias: undercoverage, nonresponse, response bias
• Placebo, blinding, randomization, replication, confounding variable
• Experimental designs: completely randomized, randomized block, matched pairs
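A sketch of an SRS, a stratified sample, and random assignment for a completely randomized design, using Python's `random` module. The student labels, strata, and function names are hypothetical; a seeded generator makes the draws reproducible.

```python
import random

def simple_random_sample(population, n, rng):
    # Every set of n individuals is equally likely to be chosen
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # A separate SRS of the same size from each stratum
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def completely_randomized(subjects, treatments, rng):
    # Shuffle the subjects, then deal them out to the treatments in turn
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

rng = random.Random(1)
students = [f"S{i}" for i in range(1, 21)]
srs = simple_random_sample(students, 5, rng)
by_grade = {"junior": students[:10], "senior": students[10:]}
strat = stratified_sample(by_grade, 3, rng)
groups = completely_randomized(students, ["treatment", "placebo"], rng)
print(srs, strat, groups, sep="\n")
```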
Trial run
Probability
• Law of large numbers: the long-run relative frequency of an outcome gets closer to its true probability as the number of trials increases
• Disjoint (mutually exclusive) events cannot occur simultaneously
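• Multiply for "and": P(A and B) = P(A) · P(B | A)
• Add for "or": P(A or B) = P(A) + P(B) − P(A and B)
• Conditional probability: P(A | B) = P(A and B) / P(B)
• Independence: knowing one event has occurred doesn't change the chance of the other, so P(A | B) = P(A)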
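A sketch of the addition rule, conditional probability, and an independence check computed from a hypothetical two-way table of counts. Exact fractions avoid rounding, so the independence comparison is exact.

```python
from fractions import Fraction

# Hypothetical counts: 200 people classified by whether events A and B occurred
counts = {
    ("A", "B"): 30, ("A", "not B"): 50,
    ("not A", "B"): 45, ("not A", "not B"): 75,
}
total = sum(counts.values())

def prob(pred):
    """Probability of the cells whose (a, b) key satisfies pred."""
    return Fraction(sum(c for key, c in counts.items() if pred(key)), total)

p_a = prob(lambda k: k[0] == "A")
p_b = prob(lambda k: k[1] == "B")
p_a_and_b = prob(lambda k: k == ("A", "B"))
p_a_or_b = prob(lambda k: k[0] == "A" or k[1] == "B")
p_a_given_b = p_a_and_b / p_b           # P(A | B) = P(A and B) / P(B)
independent = p_a_given_b == p_a        # does knowing B change the chance of A?
print(p_a, p_b, p_a_or_b, p_a_given_b, independent)
```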
Probability distributions
• Assigns a probability to every possible value of the random variable
• Every probability must be between 0 and 1
• The probabilities must sum to 1
• Mean: μ = Σ x · P(x)
• Variance: σ² = Σ (x − μ)² · P(x)
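A minimal sketch that checks the two validity conditions and computes μ and σ² for a discrete distribution, using X = number of heads in 2 fair coin flips as an example. The function names are my own.

```python
from fractions import Fraction

# Example distribution: X = number of heads in 2 fair coin flips
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def check(dist):
    """Raise if dist is not a valid probability distribution."""
    if any(not 0 <= p <= 1 for p in dist.values()):
        raise ValueError("each probability must be between 0 and 1")
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")

def mean(dist):
    return sum(x * p for x, p in dist.items())               # mu = sum of x * P(x)

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())   # sum of (x - mu)^2 * P(x)

check(dist)
print(mean(dist), variance(dist))   # 1 1/2
```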
Binomial Random Variables
• Fixed number of trials n, each a success or failure
• p remains constant from trial to trial
• Each trial is independent
• P(X = r) = (nCr) p^r (1 − p)^(n − r)
• Mean: np
• Variance: np(1 − p)
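A sketch of the binomial formula using `math.comb` (Python 3.8+), with hypothetical n = 10 and p = 0.3. Summing over the whole distribution confirms that the probabilities total 1, the mean is np, and the variance is np(1 − p).

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = (nCr) p^r (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 10, 0.3
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]
mean = sum(r * pr for r, pr in enumerate(pmf))
var = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))
print(sum(pmf), mean, n * p, var, n * p * (1 - p))
```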
Geometric Random Variable
• Each trial is a success or failure
• p is constant and each trial is independent
• X counts the number of trials until the first success
• Probability that the first success occurs on trial k: P(X = k) = p(1 − p)^(k − 1)
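A minimal sketch of the geometric formula, using the example of rolling a fair die until the first six (p = 1/6). The cumulative form 1 − (1 − p)^k is included as a cross-check; the function names are my own.

```python
def geometric_pmf(k, p):
    """P(first success on trial k) = p (1 - p)^(k - 1), for k = 1, 2, 3, ..."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return p * (1 - p) ** (k - 1)

def geometric_cdf(k, p):
    """P(first success on or before trial k) = 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k

p = 1 / 6                                   # chance of a six on one roll
third_roll = geometric_pmf(3, p)            # two non-sixes, then a six
within_three = sum(geometric_pmf(k, p) for k in range(1, 4))
print(third_roll, within_three, geometric_cdf(3, p))
```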
Trial run
Combining Variables
• Mean(X + Y) = mean(X) + mean(Y)
• Mean(X − Y) = mean(X) − mean(Y)
• If X and Y are independent: var(X + Y) = var(X) + var(Y), and var(X − Y) = var(X) + var(Y)
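A sketch that verifies these rules exactly for two independent fair dice by building the full distribution of X + Y and X − Y with fractions. Note that the variance of the difference also adds. The helper names are my own.

```python
from fractions import Fraction
from itertools import product

die = {face: Fraction(1, 6) for face in range(1, 7)}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def var(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def combine(dx, dy, op):
    """Distribution of op(X, Y) when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        v = op(x, y)
        out[v] = out.get(v, 0) + px * py   # independence: P(x and y) = P(x) P(y)
    return out

total = combine(die, die, lambda x, y: x + y)
diff = combine(die, die, lambda x, y: x - y)
print(mean(total), var(total), var(die))   # 7 35/6 35/12
```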
Normal distributions
• Z-score• Standardize endpoints, find area under curve
Trial run
Sampling distributions
• The sampling distribution of the sample mean x̄ is the distribution of x̄ over all possible random samples of the same size n
• Standard deviation of x̄: σ/√n
• Central Limit Theorem: as the size n of an SRS increases, the shape of the sampling distribution of x̄ tends toward normal, whatever the shape of the population
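A simulation sketch: since enumerating every possible sample is impractical, it approximates the sampling distribution by drawing many random samples from a hypothetical flat population of the digits 0 to 9. Sampling is with replacement so that σ/√n applies exactly; the seed and sizes are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(2024)
population = list(range(10))      # hypothetical flat (uniform), clearly non-normal population
sigma = pstdev(population)        # population standard deviation
n, reps = 25, 20_000

# Many samples of size n, drawn with replacement; keep each sample mean
xbars = [sum(rng.choices(population, k=n)) / n for _ in range(reps)]

print(mean(xbars), mean(population))    # both near 4.5
print(pstdev(xbars), sigma / n ** 0.5)  # both near 0.574
```

A histogram of `xbars` would look close to normal even though the population is flat, which is the Central Limit Theorem at work.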
Hypothesis Testing
• Sample proportion
• H0: p = p0
• Ha: p < p0, p > p0, or p ≠ p0
• Test statistic: z = (p̂ − p0) / √(p0(1 − p0)/n)
• P-value
• Assumptions: p̂ is from a random sample
• Sample size is large (np0 ≥ 10 and n(1 − p0) ≥ 10)
• Sample is no more than 10% of the population
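A minimal sketch of the one-proportion z test with `statistics.NormalDist` for the P-value. The function name and the `alternative` labels ("less", "greater", "two-sided") are hypothetical; the data (60 successes in 100 trials, testing p0 = 0.5) are made up.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """Large-sample z test of H0: p = p0; returns (z, P-value)."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("large-sample condition not met")
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    Z = NormalDist()
    if alternative == "less":
        p_value = Z.cdf(z)
    elif alternative == "greater":
        p_value = 1 - Z.cdf(z)
    else:
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

z, p_value = one_prop_z_test(60, 100, 0.5, alternative="greater")
print(z, p_value)
```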
Sample Mean
• H0: μ = μ0
• Ha: μ < μ0, μ > μ0, or μ ≠ μ0
• Test statistic: t = (x̄ − μ0) / (s/√n), with df = n − 1
• P-value
• Assumptions: data are from a random sample
• Sample size is large (n > 30) or the population distribution is approximately normal
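A sketch of the one-sample t statistic for hypothetical data. Python's standard library has no t distribution, so this stops at t and df; the P-value would come from a t table, a calculator, or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0, using the sample standard deviation s."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    return t, n - 1

data = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.3]   # hypothetical sample
t, df = one_sample_t(data, 5.0)
print(t, df)
```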
Hypothesis Testing
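• Difference in 2 sample proportions
• H0: p1 − p2 = 0
• Ha: p1 − p2 < 0, p1 − p2 > 0, or p1 − p2 ≠ 0
• Test statistic: z = (p̂1 − p̂2) / √(p̂c(1 − p̂c)/n1 + p̂c(1 − p̂c)/n2), where p̂c = (total successes)/(n1 + n2) is the pooled proportion
• P-value
• Assumptions: independently chosen random samples, or treatments assigned at random to individuals
• Both sample sizes are large (n1p̂1, n1(1 − p̂1), n2p̂2, and n2(1 − p̂2) are all at least 10)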
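A sketch of the pooled two-proportion z test for H0: p1 − p2 = 0 with a two-sided P-value. The function name and counts (45/100 vs. 30/100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Pooled z test of H0: p1 - p2 = 0; returns (z, two-sided P-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)            # pooled (combined) proportion
    se = sqrt(pc * (1 - pc) / n1 + pc * (1 - pc) / n2)
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_prop_z_test(45, 100, 30, 100)
print(z, p_value)
```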
Hypothesis Testing
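• Difference in two sample means
• H0: μ1 − μ2 = hypothesized value
• Ha: μ1 − μ2 <, >, or ≠ hypothesized value
• Test statistic: t = ((x̄1 − x̄2) − hypothesized value) / √(s1²/n1 + s2²/n2)
• P-value
• Assumptions: the 2 samples are independently selected random samples
• Sample sizes are large (n > 30) or the population distributions are approximately normal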
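A sketch of the two-sample t statistic. The degrees of freedom use the Welch approximation commonly used by calculators and software, which the outline does not spell out; treat that choice as an assumption. The samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a, b, diff0=0.0):
    """Unpooled t for H0: mu1 - mu2 = diff0, with Welch's approximate df."""
    v1, v2 = variance(a) / len(a), variance(b) / len(b)    # s1^2/n1 and s2^2/n2
    t = (mean(a) - mean(b) - diff0) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df

a = [12, 15, 14, 10, 13, 16]   # hypothetical sample 1
b = [9, 11, 10, 12, 8, 10]     # hypothetical sample 2
t, df = two_sample_t(a, b)
print(t, df)
```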
Hypothesis Testing
• Paired t test comparing 2 population means
• H0: µd = hypothesized value
• Ha: µd <, >, or ≠ hypothesized value
• Test statistic: t = (x̄d − hypothesized value) / (sd/√n), with df = n − 1
• P-value
• Assumptions: samples are paired
• The differences are a random sample from a population of differences
• Sample size is large (n > 30) or the population distribution of differences is approximately normal
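A sketch of the paired t test: it reduces the pairs to one list of differences and then runs a one-sample t on those differences. The before/after scores and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after, mu_d0=0.0):
    """t, df, and differences for H0: mu_d = mu_d0, where d = after - before."""
    d = [y - x for x, y in zip(before, after)]
    n = len(d)
    t = (mean(d) - mu_d0) / (stdev(d) / sqrt(n))
    return t, n - 1, d

before = [70, 65, 80, 75, 60]   # hypothetical scores for the same 5 subjects
after = [74, 66, 85, 78, 62]
t, df, d = paired_t(before, after)
print(d, t, df)
```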
Hypothesis Testing
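• Chi-square goodness-of-fit (GOF) test
• H0: the category proportions equal the hypothesized values
• Ha: at least one category proportion differs from its hypothesized value
• Test statistic: χ² = Σ (observed − expected)² / expected
• P-value
• Assumptions: data are from a random sample
• Sample size is large: every expected cell count is at least 5
• Degrees of freedom: number of categories − 1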
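A sketch of the goodness-of-fit statistic and degrees of freedom for hypothetical die-roll counts under H0 that all six faces are equally likely. The P-value needs a chi-square table or software, since the standard library has no chi-square distribution.

```python
def chi_square_gof(observed, proportions):
    """Chi-square goodness-of-fit statistic and degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    if min(expected) < 5:
        raise ValueError("some expected count is below 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

observed = [25, 17, 15, 23, 24, 16]          # hypothetical counts from 120 rolls
stat, df = chi_square_gof(observed, [1 / 6] * 6)
print(stat, df)
```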
Hypothesis Testing
• Chi-Square Test of Homogeneity or Independence (2 way table)
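• H0: There is no relationship between __ and __
• Ha: H0 is not true
• Test statistic: χ² = Σ (observed − expected)² / expected, where expected count = (row total)(column total) / (grand total)
• P-value
• Assumptions: independently chosen random samples, or random assignment to groups
• All expected cell counts are at least 5
• Degrees of freedom: (number of rows − 1)(number of columns − 1)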
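A sketch of the two-way table computation: expected counts from row and column totals, then the χ² statistic and degrees of freedom. The 2 × 3 table and the function name are hypothetical.

```python
def chi_square_two_way(table):
    """Chi-square statistic, df, and expected counts for a two-way table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    if min(min(row) for row in expected) < 5:
        raise ValueError("some expected count is below 5")
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, expected

table = [[20, 30, 50],
         [30, 30, 40]]
stat, df, expected = chi_square_two_way(table)
print(stat, df, expected)
```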
Hypothesis Testing (last one!!)
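• t test for slope
• H0: β = hypothesized value (often 0)
• Ha: β <, >, or ≠ hypothesized value
• Test statistic: t = (b − hypothesized value) / s_b
• P-value
• Assumptions: the distribution of e has mean 0; the standard deviation of e does not depend on x; the distribution of e is normal; the random deviations e are independent of each other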
• Degrees of freedom: n-2
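A sketch of the t test for slope in simple linear regression, computing b, the residual standard deviation s_e, the standard error s_b = s_e/√Sxx, and t with df = n − 2. The data and function name are hypothetical.

```python
from math import sqrt

def slope_t_test(x, y, beta0=0.0):
    """t statistic and df for H0: beta = beta0 in simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - b * mx
    sse = sum((yi - (intercept + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / (n - 2))      # estimated standard deviation of e
    s_b = s_e / sqrt(sxx)          # standard error of the slope
    return (b - beta0) / s_b, n - 2

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t, df = slope_t_test(x, y)
print(t, df)
```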
Confidence Intervals
• Statistic ± margin of error (also called the bound)
• The margin of error is the product of 2 numbers: (critical value)(standard error)
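One instance of "statistic ± (critical value)(standard error)": a sketch of the large-sample confidence interval for a proportion, with z* taken from `NormalDist.inv_cdf`. The function name and counts (120 successes in 200 trials) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_interval(successes, n, confidence=0.95):
    """Large-sample interval p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)   # critical value
    se = sqrt(p_hat * (1 - p_hat) / n)                     # standard error
    margin = z_star * se                                   # margin of error
    return p_hat - margin, p_hat + margin, z_star, se

low, high, z_star, se = one_prop_interval(120, 200)
print(low, high)
```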