Statistics: First Steps Andrew Martin PS 372 University of Kentucky



Page 2: Statistics 091208004734-phpapp01 (1)

Variance

Variance is a measure of dispersion of data points about the mean for interval- and ratio-level data.

Variance is a fundamental concept: social scientists seek to explain the variance in the dependent variable.
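A minimal sketch of the calculation may make the definition concrete. The slides do not say whether to divide by n or by n - 1, so the function below offers both; the scores are hypothetical and were invented only for this illustration.

```python
def variance(values, sample=True):
    """Average squared deviation of the values from their mean.

    sample=True divides by n - 1 (the usual sample variance);
    sample=False divides by n (the population variance).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1 if sample else n)


# Hypothetical scores, not taken from the slides.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(scores, sample=False))  # population variance
print(variance(scores))                # sample variance
```

The larger the result, the more widely the data points are scattered around the mean.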

Page 4: Statistics 091208004734-phpapp01 (1)

Standard Deviation

Standard deviation is a measure of dispersion of data points about the mean for interval- and ratio-level data.

Like the mean, standard deviation is sensitive to extreme values.

Standard deviation is calculated as the square root of the variance.
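The two claims above (the square-root relationship and the sensitivity to extreme values) can be checked with Python's standard `statistics` module. This is only a sketch: the data are hypothetical, and the population versions of the functions are used for simplicity.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
with_outlier = scores + [40]         # the same data plus one extreme value

# The standard deviation is the square root of the variance.
sd = statistics.pstdev(scores)
sd_from_variance = math.sqrt(statistics.pvariance(scores))
print(sd, sd_from_variance)

# One extreme value inflates the standard deviation considerably,
# while the median barely moves.
sd_outlier = statistics.pstdev(with_outlier)
print(sd_outlier)
print(statistics.median(scores), statistics.median(with_outlier))
```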

Page 7: Statistics 091208004734-phpapp01 (1)

Normal Distribution

The bulk of observations lie in the center, where there is a single peak.

In a normal distribution half (50 percent) of the observations lie above the mean and half lie below it.

The mean, median and mode all have the same value.

Fewer and fewer observations fall in the tails. The spread of the distribution is symmetric.

Page 8: Statistics 091208004734-phpapp01 (1)

Normal Distribution

Mathematical theory allows us to know what percentage of observations lie within one (about 68%), two (about 95%) or three (about 99.7%) standard deviations of the mean.

If data are not perfectly normally distributed, the percentages will only be approximations.
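The percentages for an exactly normal distribution can be computed from the error function in Python's `math` module. The helper name `share_within` is an assumption made for this sketch.

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```

The loop prints roughly 68.3, 95.4 and 99.7 percent for one, two and three standard deviations.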

Many naturally occurring variables do have nearly normal distributions.

Some variables that are not normally distributed can be made more nearly normal by transforming them, for example with logarithms.
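As a rough illustration of a logarithmic transformation, the sketch below compares the gap between mean and median (scaled by the standard deviation) before and after taking logs. The values are hypothetical, and this gap is only one informal indicator of skew, not a formal test of normality.

```python
import math
import statistics

# Hypothetical right-skewed values, invented for illustration.
raw = [10, 12, 15, 20, 30, 50, 100, 400]
logged = [math.log10(x) for x in raw]


def skew_gap(values):
    """(mean - median) / standard deviation: zero for a symmetric distribution."""
    gap = statistics.mean(values) - statistics.median(values)
    return gap / statistics.pstdev(values)


print(skew_gap(raw))
print(skew_gap(logged))
```

In this made-up example the gap shrinks after the transformation, which is the sense in which logged values sit closer to a symmetric shape.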

Page 9: Statistics 091208004734-phpapp01 (1)

Frequency Distribution

Page 10: Statistics 091208004734-phpapp01 (1)

What about categorical variables?

Page 12: Statistics 091208004734-phpapp01 (1)

Example

Calculate the ID and IQV for the grades of a former PS 372 class, using the following frequencies or proportions:

Grade  Freq.  Prop.
A        4    (.12)
B        7    (.21)
C        4    (.12)
D        7    (.21)
E       12    (.34)

Page 13: Statistics 091208004734-phpapp01 (1)

Index of Diversity

ID = 1 - (p_a^2 + p_b^2 + p_c^2 + p_d^2 + p_e^2)

ID = 1 - (.12^2 + .21^2 + .12^2 + .21^2 + .34^2)

ID = 1 - (.0144 + .0441 + .0144 + .0441 + .1156)

ID = 1 - (.2326)

ID = .7674
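The same arithmetic can be reproduced in a few lines of Python. The function name `index_of_diversity` is an assumption made for this sketch; the proportions are the ones given in the example.

```python
# Grade proportions from the example slide.
proportions = {"A": 0.12, "B": 0.21, "C": 0.12, "D": 0.21, "E": 0.34}


def index_of_diversity(props):
    """One minus the sum of the squared category proportions."""
    return 1 - sum(p ** 2 for p in props)


id_value = index_of_diversity(proportions.values())
print(round(id_value, 4))  # 0.7674
```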

Page 14: Statistics 091208004734-phpapp01 (1)

Index of Qualitative Variation

IQV = [1 - (p_a^2 + p_b^2 + p_c^2 + p_d^2 + p_e^2)] / [1 - (1/K)]

where K is the number of categories.

Page 15: Statistics 091208004734-phpapp01 (1)

Index of Qualitative Variation

IQV = .7674 / (1 - 1/5)

IQV = .9592
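A short sketch of the IQV calculation, assuming the index is the ID divided by its maximum possible value, 1 - 1/K, for K categories. The function name `iqv` is hypothetical.

```python
def iqv(props):
    """Index of diversity divided by its maximum for K categories."""
    props = list(props)
    k = len(props)
    index_of_diversity = 1 - sum(p ** 2 for p in props)
    return index_of_diversity / (1 - 1 / k)


grade_props = [0.12, 0.21, 0.12, 0.21, 0.34]   # from the example slide
print(iqv(grade_props))

# With cases spread evenly over the categories the index is at its maximum.
print(iqv([0.2] * 5))
```

For the grade data the result is about .959; an even spread across the five grades gives a value of 1 (up to floating-point rounding).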

Page 17: Statistics 091208004734-phpapp01 (1)

Data Matrix

A data matrix is an array of rows and columns that stores the values of a set of variables for all the cases in a data set.

This is frequently referred to as a dataset.
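One simple way to picture a data matrix in code is a list of rows, with one row per case and one column per variable. The variable names and values below are hypothetical and are not the JRM example.

```python
# Column names (variables) and rows (cases); all values are invented.
variables = ["respondent", "age", "party"]
data = [
    [1, 34, "Democrat"],
    [2, 51, "Republican"],
    [3, 27, "Independent"],
]

# A row holds every variable for one case.
first_case = dict(zip(variables, data[0]))

# A column holds one variable for every case.
age_index = variables.index("age")
age_column = [row[age_index] for row in data]

print(first_case)
print(age_column)
```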

Page 20: Statistics 091208004734-phpapp01 (1)

Data Matrix from JRM

Page 21: Statistics 091208004734-phpapp01 (1)

Properties of Good Graphs

A good graph should answer several of the following questions (JRM 384):

1. Where does the center of the distribution lie?

2. How spread out or bunched up are the observations?

3. Does it have a single peak or more than one?

4. Approximately what proportion of observations is in the ends of the distribution?

Page 22: Statistics 091208004734-phpapp01 (1)

Properties of Good Graphs

5. Do observations tend to pile up at one end of the measurement scale, with relatively few observations at the other end?

6. Are there values that, compared with most, seem very large or very small?

7. How does one distribution compare to another in terms of shape, spread, and central tendency?

8. Do values of one variable seem related to another variable?

Page 28: Statistics 091208004734-phpapp01 (1)

Statistical Concepts

Let's quickly review some concepts.

Page 29: Statistics 091208004734-phpapp01 (1)

Population

A population refers to any well-defined set of objects such as people, countries, states, organizations, and so on. The term doesn't simply mean the population of the United States or some other geographical area.

Page 30: Statistics 091208004734-phpapp01 (1)

Sample

A sample is a subset of the population. Samples are drawn in some known manner and each case is chosen independently of the others.

From here on out, when the book uses the term sample, random sample or simple random sample, it is referring to the same concept: a sample chosen at random.
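Drawing a simple random sample can be sketched with the standard `random` module. The population of 100 numbered cases and the seed are assumptions made for this illustration.

```python
import random

# Hypothetical population: 100 cases identified by number.
population = list(range(1, 101))

# A fixed seed makes the draw repeatable; any seed would do.
rng = random.Random(372)

# random.sample draws without replacement, so no case is chosen twice
# and every subset of 10 cases is equally likely.
sample = rng.sample(population, k=10)
print(sorted(sample))
```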

Page 31: Statistics 091208004734-phpapp01 (1)

Populations

Parameters are numerical features of a population.

A sample statistic is an estimator that corresponds to a population parameter of interest and is used to estimate the population value.

Ȳ (Y-bar) is the sample mean; μ is the population mean.

A “hat” (^, also called a caret or circumflex) over a symbol marks an estimate of the corresponding population parameter.
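The relationship between a parameter and its estimator can be illustrated with a simulated population. Everything here is hypothetical: the population is generated from a normal distribution with mean 50 and standard deviation 10 purely for demonstration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)      # population parameter

# A simple random sample of 100 cases and its sample statistic.
sample = rng.sample(population, 100)
y_bar = statistics.mean(sample)       # estimate of mu

print(mu, y_bar)
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is what inference has to account for.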

Page 32: Statistics 091208004734-phpapp01 (1)

Two Kinds of Inference

Hypothesis Testing

Point and interval estimation

Page 33: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Many claims can be translated into specific statements about a population that can be confirmed or disconfirmed with the aid of probability theory.

Ex: There is no ideological difference between the voting patterns of Republican and Democratic justices on the U.S. Supreme Court.

Page 34: Statistics 091208004734-phpapp01 (1)

Point and Interval Estimation

The goal here is to estimate unknown population parameters from samples and to surround those estimates with confidence intervals. Confidence intervals indicate the estimate's reliability or precision.
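As one illustration of a point estimate with an interval around it, the sketch below uses the common normal-approximation interval for a proportion. The slides do not specify a method, so this choice, the function name `proportion_ci` and the coin-flip counts are all assumptions.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Point estimate and normal-approximation interval for a proportion."""
    p_hat = successes / n
    # z is about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin


# Hypothetical result: 60 heads in 100 flips.
p_hat, lower, upper = proportion_ci(60, 100)
print(f"estimate {p_hat:.2f}, 95% interval {lower:.3f} to {upper:.3f}")
```

Here the point estimate is .60 and the interval runs from roughly .50 to .70; a larger sample would give a narrower, more precise interval.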

Page 35: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Start with a specific verbal claim or proposition.

Ex: The chances of getting heads or tails when flipping a coin are roughly the same.

Ex: The chances of the United States electing a Republican or a Democratic president are roughly the same.

Page 36: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Page 37: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Next, the researcher constructs a null hypothesis.

A null hypothesis is a statement that a population parameter equals a specific value.

Page 38: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Following up on the coin example, the null hypothesis would be that the probability of heads equals .5.

Stated more formally: H0: P = .5

Where P stands for the probability that the coin will be heads when tossed.

H0 is typically used to denote a null hypothesis.

Page 39: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Next, specify an alternative hypothesis.

An alternative hypothesis is a statement about the value or values of a population parameter. It is proposed as an alternative to the null hypothesis.

An alternative hypothesis can merely state that the population parameter does not equal the value in the null hypothesis, or that it is greater than or less than that value.

Page 40: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Suppose you believe the coin is unfair, but have no intuition about whether it is too prone to come up heads or tails.

Stated formally, the alternative hypothesis is:

HA: P ≠ .5

Page 41: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Perhaps you believe the coin is more likely to come up heads than tails. You would formulate the following alternative hypothesis:

HA: P > .5

Conversely, if you believe the coin is less likely to come up heads than tails, you would formulate the alternative hypothesis in the opposite direction:

HA: P < .5

Page 42: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

After specifying the null and alternative hypotheses, identify the sample estimator that corresponds to the parameter in question.

The estimate must come from sample data, which in this case are generated by flipping a coin. Here the estimator is the sample proportion of heads, p.

Page 43: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Next, determine how the sample statistic is distributed in repeated random samples. That is, specify the sampling distribution of the estimator.

For example, if the coin is fair, what are the chances of getting 10 heads in 10 flips (p = 1.0)? What about 9 heads in 10 flips (p = .9)? Or 8 heads in 10 flips (p = .8)?
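These chances follow from the binomial distribution. A small sketch, assuming a fair coin (P = .5) and independent flips; the function name `prob_heads` is hypothetical.

```python
from math import comb


def prob_heads(k, n=10, p=0.5):
    """Probability of exactly k heads in n independent flips."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


for k in (10, 9, 8):
    print(k, k / 10, prob_heads(k))
```

With a fair coin, 10 heads in 10 flips has probability 1/1024 (about .001), 9 heads has 10/1024 (about .010) and 8 heads has 45/1024 (about .044). Listing these probabilities for every possible number of heads gives the sampling distribution of p.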

Page 45: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Make a decision rule based on some criterion of probability or likelihood.

In the social sciences, a result that would occur with a probability of .05 or less (that is, no more than 1 chance in 20) if the null hypothesis were true is considered unusual and consequently is grounds for rejecting the null hypothesis.

Other thresholds (.01, .001) are also common.

Make the decision rule before collecting data.

Page 46: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

In light of the decision rule, define a critical region. The critical region consists of those outcomes so unlikely to occur that one has cause to reject the null hypothesis should they occur.

So there are areas of “rejection” (critical areas) and nonrejection.
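For the coin example, a two-sided critical region can be built by adding outcomes from both tails inward until the next pair would push the total probability above the threshold. This is a sketch under the assumptions H0: P = .5, 10 flips and a .05 threshold; the symmetric construction only makes sense because the distribution under this null hypothesis is symmetric.

```python
from math import comb


def critical_region(flips, alpha=0.05):
    """Two-sided critical region for the number of heads under H0: P = .5."""
    probs = [comb(flips, k) / 2 ** flips for k in range(flips + 1)]
    region, size = [], 0.0
    # Work inward from the two extremes, one matching pair at a time.
    for low in range((flips + 1) // 2):
        high = flips - low
        pair = probs[low] + probs[high]
        if size + pair > alpha:
            break
        region += [low, high]
        size += pair
    return sorted(region), size


region, size = critical_region(10)
print(region, size)
```

With 10 flips the region of rejection is 0, 1, 9 or 10 heads, which together have probability 22/1024 (about .021) under the null hypothesis. Any count from 2 to 8 falls in the region of nonrejection.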

Page 48: Statistics 091208004734-phpapp01 (1)

Hypothesis Testing

Collect a random sample and calculate the sample estimator.

Calculate the observed test statistic. A test statistic converts the sample result into a number that can be compared with the critical values specified by your decision rule.

Examine the observed test statistic to see if it falls in the critical region.

Make a practical or theoretical interpretation of the findings.
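The final steps can be sketched for the coin example as an exact binomial test. The observed result (9 heads in 10 flips) is hypothetical, and using a p-value rather than a table of critical values is an assumption of this sketch; the two approaches lead to the same decision.

```python
from math import comb


def two_sided_p_value(heads, flips):
    """Exact p-value for H0: P = .5 against HA: P != .5."""
    probs = [comb(flips, k) / 2 ** flips for k in range(flips + 1)]
    observed = probs[heads]
    # Add up every outcome that is no more likely than the observed one.
    return sum(p for p in probs if p <= observed)


alpha = 0.05              # decision rule, fixed before collecting data
heads, flips = 9, 10      # hypothetical sample result

p_value = two_sided_p_value(heads, flips)
reject_null = p_value <= alpha
print(p_value, reject_null)
```

Nine heads in ten flips gives a p-value of 22/1024 (about .021), which is below .05, so under this decision rule the null hypothesis of a fair coin would be rejected.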
