AP Statistics

Course Review

Exploring Data

• Variables can be categorical or quantitative

• Quantitative variables can be discrete or continuous

• For categorical data, we use bar charts

• Numerical data can be displayed using a dotplot, stemplot, box-and-whisker plot, histogram or cumulative frequency plot

• Remember histograms have no spaces between bars (unless a class has no observations)

• Must include a key with a stemplot

• Always label axes, and make sure you read the axes when interpreting a graph.

Commenting on a graph

• Shape: symmetric, skewed, unimodal, uniform

• Center: mean and median

• Spread: range, standard deviation, IQR, gaps

• Outliers: any value more than 1.5 x IQR below the first quartile or above the third quartile
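
The 1.5 x IQR rule can be sketched in a few lines. This is only an illustration on made-up data; `iqr_outliers` is a hypothetical helper name, and quartile conventions differ between textbooks, calculators and software, so the fences may not match a hand calculation exactly.

```python
from statistics import quantiles

def iqr_outliers(data):
    """Return the values more than 1.5 * IQR below Q1 or above Q3."""
    # quantiles() uses one particular quartile convention (an assumption here).
    q1, _median, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # made-up data
outliers = iqr_outliers(scores)
print(outliers)
```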

Effect of changing units

• Changing units by multiplying every value by a positive constant multiplies measures of center and measures of spread (range, IQR, standard deviation) by that same multiplier.

• Adding or subtracting the same constant shifts measures of center by that constant but will not change measures of spread.
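
A quick check of both rules, as a sketch on made-up heights (the data and variable names are assumptions for illustration): converting inches to centimetres multiplies the mean and standard deviation by 2.54, while adding 2 inches to everyone moves the mean and leaves the standard deviation alone.

```python
from statistics import mean, stdev

heights_in = [60.0, 62.0, 65.0, 67.0, 71.0]   # made-up data, in inches
heights_cm = [2.54 * h for h in heights_in]   # change of units (multiply)
shifted_in = [h + 2.0 for h in heights_in]    # everyone wears 2-inch heels (add)

m_in, s_in = mean(heights_in), stdev(heights_in)
m_cm, s_cm = mean(heights_cm), stdev(heights_cm)
m_sh, s_sh = mean(shifted_in), stdev(shifted_in)

print(m_cm / m_in, s_cm / s_in)   # both ratios are about 2.54
print(m_sh - m_in, s_sh - s_in)   # mean shifts by 2, spread unchanged
```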

Trial Run 1

Scatterplots

• Bivariate data: explanatory variable (x), response variable (y)

• Correlation coefficient (r): always between -1 and 1

• r does not change when you switch x and y, nor when you add a constant to a variable or multiply it by a positive constant

• r only measures the strength of a linear relationship

• r is affected by outliers

• Lurking variables

• Danger of extrapolation

• Coefficient of determination (r²): proportion of the variation in y explained by the linear relationship with x

• Residuals (observed - predicted)

• Influential points

• Transformations
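
The properties of r and the definition of a residual can be checked numerically. The sketch below uses made-up data and two hypothetical helpers, `correlation` and `least_squares`, written out by hand rather than taken from any particular library.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation coefficient r for paired data."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares(xs, ys):
    """Slope and intercept of the least-squares regression line."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up explanatory values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]    # made-up response values

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)                          # switch x and y
r_rescaled = correlation([10 * x + 3 for x in xs], ys)   # multiply and add
r_squared = r ** 2

slope, intercept = least_squares(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]  # observed - predicted
print(r, r_swapped, r_rescaled, r_squared, residuals)
```

For a least-squares line the residuals sum to zero (up to rounding), which is a handy check on a hand calculation.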

Trial run

Sampling

• Census, survey, experiment, observational study

• Parameter (population), statistic (sample)

• Sampling methods: convenience, SRS, stratified, cluster, systematic

• Bias: undercoverage, nonresponse, response

• Experiments: placebo, blind, randomization, replication, confounding variable

• Experimental designs: completely randomized, blocks, matched pairs

Trial run

Probability

• Law of large numbers: the long-run relative frequency of an event gets closer to its true probability as the number of trials increases

• Disjoint (mutually exclusive): cannot occur simultaneously

• And and or: P(A or B) = P(A) + P(B) - P(A and B)

• Conditional probability: P(A | B) = P(A and B) / P(B)

• Independence: knowing one has occurred doesn't change the chance of the other, so P(A | B) = P(A)
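
These rules can be verified by brute force on a small sample space. The sketch below lists all 36 rolls of two fair dice; the two events and the `prob` helper are made up for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls

def prob(event):
    """Probability of an event given as a true/false test on one outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_is_six(o):
    return o[0] == 6

def sum_is_seven(o):
    return o[0] + o[1] == 7

p_a = prob(first_is_six)
p_b = prob(sum_is_seven)
p_a_and_b = prob(lambda o: first_is_six(o) and sum_is_seven(o))
p_a_or_b = prob(lambda o: first_is_six(o) or sum_is_seven(o))
p_b_given_a = p_a_and_b / p_a   # conditional probability

print(p_a_or_b == p_a + p_b - p_a_and_b)  # the "or" rule holds
print(p_b_given_a == p_b)                 # these two events are independent
```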

Probability distributions

• Matches every possible value of the variable with the probability of it happening

• All probabilities must be between 0 and 1

• Total of probabilities must be 1

• Mean: µ = Σ x·P(x)

• Variance: σ² = Σ (x - µ)²·P(x)
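
As a worked example, here is the mean and variance of a small made-up distribution, with the two validity checks done first. The distribution itself is an assumption chosen so the arithmetic is easy to follow.

```python
from math import isclose, sqrt

# Made-up distribution: value -> probability
distribution = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

# Validity checks: each probability in [0, 1], total equal to 1
valid = (all(0 <= p <= 1 for p in distribution.values())
         and isclose(sum(distribution.values()), 1.0))

mu = sum(x * p for x, p in distribution.items())                    # mean
variance = sum((x - mu) ** 2 * p for x, p in distribution.items())  # variance
std_dev = sqrt(variance)
print(valid, mu, variance, std_dev)
```

Here the mean works out to 1.7, the variance to 0.81 and the standard deviation to 0.9 (up to floating-point rounding).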

Binomial Random Variables

• Fixed number of trials (n), each a success or failure

• p (the probability of success) remains constant each trial

• Each trial is independent

• P(X = r) = (nCr) p^r (1-p)^(n-r)

• Mean: np

• Variance: np(1-p)
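
A short sketch that evaluates the binomial formula and confirms the mean and variance shortcuts for one made-up case (n = 10, p = 0.3); `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for a binomial variable with n trials and success probability p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 10, 0.3   # made-up example values
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

total = sum(pmf)
mean = sum(r * q for r, q in enumerate(pmf))
variance = sum((r - mean) ** 2 * q for r, q in enumerate(pmf))
print(total, mean, variance)   # compare with 1, n*p and n*p*(1-p)
```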

Geometric Random Variable

• Success or failure

• p constant, each trial independent

• How many times until ….

• Probability that the first success occurs on trial k: P(X = k) = p(1-p)^(k-1)
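
A minimal sketch of the geometric formula, assuming a made-up success probability of 0.2; `geometric_pmf` is a hypothetical helper name.

```python
def geometric_pmf(k, p):
    """P(first success occurs on trial k): k-1 failures, then one success."""
    return p * (1 - p) ** (k - 1)

p = 0.2                                   # made-up success probability
p_on_third = geometric_pmf(3, p)          # fail, fail, succeed
p_within_three = sum(geometric_pmf(k, p) for k in range(1, 4))
print(p_on_third, p_within_three)
```

The second value agrees with the complement shortcut 1 - (1-p)^3, the chance of not failing three times in a row.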

Trial run

Combining Variables

• Mean(x+y) = mean(x) + mean(y)

• Mean(x-y) = mean(x) - mean(y)

• If independent: var(x+y) = var(x) + var(y), and var(x-y) = var(x) + var(y) as well (variances add, they never subtract)
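
These rules can be confirmed exactly with two independent fair dice. The sketch below is an illustration only; treating all 36 pairs as equally likely is what makes the two dice independent, and the helper names are made up.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    """Variance of a list of equally likely values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

faces = list(range(1, 7))                  # one fair die
pairs = list(product(faces, repeat=2))     # two independent dice
sums = [a + b for a, b in pairs]
diffs = [a - b for a, b in pairs]

print(mean(sums), mean(diffs))             # 7 and 0
print(variance(faces), variance(sums), variance(diffs))
```

Both the sum and the difference have variance 35/6, twice the single-die variance of 35/12.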

Normal distributions

• Z-score: z = (value - mean) / standard deviation

• Standardize endpoints, find area under curve
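
The two steps can be sketched with Python's built-in `statistics.NormalDist`. The mean of 100, standard deviation of 15 and the endpoints 85 and 130 are made-up values for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 100.0, 15.0              # made-up normal distribution
z_low = z_score(85.0, mu, sigma)     # standardize the lower endpoint
z_high = z_score(130.0, mu, sigma)   # standardize the upper endpoint

standard_normal = NormalDist()       # mean 0, standard deviation 1
area = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)
print(z_low, z_high, round(area, 4))
```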

Trial run

Sampling distributions

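• Imagine taking all possible random samples of the same size n; the means of those samples form the sampling distribution of the sample mean

• Standard dev.: σ/√n (population standard deviation divided by the square root of the sample size)

• Central Limit Theorem: as the size of an SRS increases, the shape of the sampling dist. of the sample mean tends toward normal, even when the population itself is not normal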
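
For a tiny population, the sampling distribution can be built literally by listing every possible sample. The sketch below uses a made-up population of five values and samples of size 3 drawn with replacement (an assumption that keeps the draws independent, so the σ/√n formula holds exactly).

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2.0, 4.0, 6.0, 8.0, 20.0]   # made-up, skewed population
n = 3                                      # sample size

# Every possible ordered sample of size n, drawn with replacement
sample_means = [mean(sample) for sample in product(population, repeat=n)]

center = mean(sample_means)                # matches the population mean
spread = pstdev(sample_means)              # matches sigma / sqrt(n)
predicted_spread = pstdev(population) / sqrt(n)
print(center, spread, predicted_spread)
```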

Hypothesis Testing

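• Sample Proportion

• Ho: p = hypothesized value

• Ha: p <, >, or ≠ hypothesized value

• Test Statistic: z = (p̂ - p) / sqrt(p(1-p)/n), using the hypothesized p

• P value: area under the standard normal curve at least as extreme as z, in the direction of Ha

• Assumptions: p̂ is from a random sample

• Sample size is large (np and n(1-p) are both at least 10)

• Sample no more than 10% of population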
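
A sketch of the one-proportion z test for a two-sided alternative, assuming made-up data (60 successes in 100 trials, hypothesized p of 0.5); `one_proportion_z_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided P value) for Ho: p = p0."""
    # Large-sample condition from the slide
    assert n * p0 >= 10 and n * (1 - p0) >= 10, "sample too small for z test"
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)   # uses the hypothesized p
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z, 3), round(p_value, 4))
```

For a one-sided alternative, use the area in a single tail instead of doubling it.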

Sample Mean

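• Ho: µ = hypothesized value

• Ha: µ <, >, or ≠ hypothesized value

• Test Statistic: t = (x̄ - µ) / (s/√n), with df = n-1

• P value

• Assumptions: data are from a random sample

• Sample size is large (n > 30) or population distribution is approximately normal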
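
A sketch of the t statistic for a sample mean on made-up data; `one_sample_t` is a hypothetical helper name. The P value would come from a t table or calculator with the returned degrees of freedom, since the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return (t, df) for Ho: mu = mu0."""
    n = len(data)
    standard_error = stdev(data) / sqrt(n)   # s / sqrt(n)
    t = (mean(data) - mu0) / standard_error
    return t, n - 1

# Made-up small sample; assumes the population is approximately normal
data = [4.0, 6.0, 8.0, 10.0, 12.0]
t, df = one_sample_t(data, mu0=5.0)
print(round(t, 4), df)
```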

Hypothesis Testing

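• Difference in 2 sample proportions

• Ho: p1 - p2 = 0

• Ha: p1 - p2 <, >, or ≠ 0

• Test statistic: z = (p̂1 - p̂2) / sqrt(p̂c(1-p̂c)(1/n1 + 1/n2)), where p̂c is the combined (pooled) sample proportion

• P value

• Assumptions: independently chosen random samples or treatments were assigned at random to individuals

• Both sample sizes are large (np and n(1-p) are at least 10 for each sample)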
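
A sketch of the pooled two-proportion z test with a two-sided alternative, on made-up counts (45 of 100 versus 30 of 100); `two_proportion_z_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided P value) for Ho: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)           # combined sample proportion
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_proportion_z_test(x1=45, n1=100, x2=30, n2=100)
print(round(z, 3), round(p_value, 4))
```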

Hypothesis Testing

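• Difference in two sample means

• Ho: µ1 - µ2 = hypothesized value (usually 0)

• Ha: µ1 - µ2 <, >, or ≠ hypothesized value

• Test Statistic: t = (x̄1 - x̄2 - hypothesized value) / sqrt(s1²/n1 + s2²/n2)

• P value

• Assumptions: the 2 samples are independently selected random samples

• Both sample sizes are large (> 30) or both population distributions are approximately normal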
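
A sketch of the unpooled two-sample t statistic on made-up data; `two_sample_t` is a hypothetical helper name. The degrees of freedom are left out here because they are normally taken from a calculator or software.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(sample1, sample2, hypothesized_diff=0.0):
    """Return the two-sample t statistic (variances not pooled)."""
    n1, n2 = len(sample1), len(sample2)
    standard_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (mean(sample1) - mean(sample2) - hypothesized_diff) / standard_error

# Made-up samples; assumes both populations are approximately normal
group_a = [4.0, 6.0, 8.0, 10.0, 12.0]
group_b = [1.0, 3.0, 5.0, 7.0, 9.0]
t = two_sample_t(group_a, group_b)
print(t)
```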

Hypothesis Testing

• Paired t test comparing 2 population means

• Ho: µd = hypothesized value

• Ha: µd <, >, or ≠ hypothesized value

• Test statistic: t = (x̄d - hypothesized value) / (sd/√n), with df = n-1, where n is the number of pairs

• P value

• Assumptions: samples are paired

• Random sample from a population of differences

• Sample size is large (> 30) or population distribution of differences is about normal
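
A sketch of the paired t statistic: take the differences within each pair, then run a one-sample t on them. The before/after data and the `paired_t` helper name are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after, hypothesized_diff=0.0):
    """Return (t, df) computed from the within-pair differences."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)                      # number of pairs
    standard_error = stdev(differences) / sqrt(n)
    t = (mean(differences) - hypothesized_diff) / standard_error
    return t, n - 1

before = [10.0, 12.0, 9.0, 14.0, 11.0]   # made-up paired measurements
after = [12.0, 15.0, 10.0, 15.0, 14.0]
t, df = paired_t(before, after)
print(round(t, 4), df)
```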

Hypothesis Testing

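• Chi-Square GOF

• Ho: the population proportions for each category equal the hypothesized proportions

• Ha: at least one proportion differs from its hypothesized value

• Test Statistic: χ² = Σ (observed - expected)² / expected

• P value: area to the right of χ² under the chi-square curve

• Assumptions: based on random sample

• Sample size is large: every expected cell count at least 5

• Degrees of freedom: number of categories - 1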
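
A sketch of the goodness-of-fit statistic on made-up counts; `chi_square_gof` is a hypothetical helper name. The P value would be read from a chi-square table or calculator.

```python
def chi_square_gof(observed, hypothesized_proportions):
    """Return (chi-square statistic, df) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_proportions]
    # Large-sample condition from the slide
    assert all(e >= 5 for e in expected), "expected counts must be at least 5"
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

observed = [30, 20, 50]            # made-up counts for three categories
proportions = [0.25, 0.25, 0.50]   # proportions claimed by Ho
chi2, df = chi_square_gof(observed, proportions)
print(chi2, df)
```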

Hypothesis Testing

• Chi-Square Test of Homogeneity or Independence (2 way table)

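• Ho: There is no relationship between __ and __

• Ha: Ho not true

• Test Statistic: χ² = Σ (observed - expected)² / expected, with expected count = (row total)(column total) / (grand total)

• P value

• Assumptions: independently chosen random samples or random assignment to groups

• All expected cell counts are at least 5

• Degrees of freedom: (rows - 1)(columns - 1)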
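
A sketch of the two-way table calculation on a made-up 2 x 2 table; `chi_square_two_way` is a hypothetical helper name.

```python
def chi_square_two_way(table):
    """Return (chi-square statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

table = [[20, 30],
         [30, 20]]   # made-up counts
chi2, df = chi_square_two_way(table)
print(chi2, df)
```

In this table every expected count is 25, so each of the four cells contributes 1 to the statistic.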

Hypothesis Testing (last one!!)

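• t test for the slope of the regression line

• Ho: β = 0 (no linear relationship)

• Ha: β <, >, or ≠ 0

• Test statistic: t = b / (standard error of b), where b is the sample slope

• P value

• Assumptions: dist. of e has mean value = 0, std. dev. of e does not depend on x, dist. of e is normal, random deviations e are independent of each other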

• Degrees of freedom: n-2
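
A sketch of the slope test on made-up data, computing the least-squares slope, its standard error and the t statistic by hand; `slope_t_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean

def slope_t_test(xs, ys):
    """Return (slope b, t statistic, df) for Ho: beta = 0."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                        # sample slope
    a = my - b * mx                      # intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))              # std. dev. of the residuals
    standard_error_b = s / sqrt(sxx)
    return b, b / standard_error_b, n - 2

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b, t, df = slope_t_test(xs, ys)
print(round(b, 3), round(t, 4), df)
```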

Confidence Intervals

• Statistic ± margin of error (also called bound)

• Margin of error is the product of 2 numbers: (critical value)(standard error)
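
As one instance of this recipe, here is a sketch of a 95% confidence interval for a proportion, assuming a made-up sample (60 successes in 100); `proportion_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (low, high, margin of error) for a one-proportion z interval."""
    p_hat = successes / n
    critical_value = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z*
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = critical_value * standard_error
    return p_hat - margin, p_hat + margin, margin

low, high, margin = proportion_interval(successes=60, n=100)
print(round(low, 3), round(high, 3), round(margin, 4))
```

Intervals for means follow the same pattern, with a t critical value and s/√n as the standard error.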
