
Eric Grodsky Sociology 360 Spring 2001 1

Lecture 12: Introduction to probability

Review cross tabulations/ conditional distributions

Conceptual introduction to parameters and statistics

The idea of randomness revisited

Thinking about probabilities

Basic probability math


Race and affirmative action

favor aff    |
action in    |               race
hiring       |     white      black      other |     Total
-------------+---------------------------------+----------
str support  |        57         94          9 |       160
             |      4.48      44.34      11.25 |     10.24
-------------+---------------------------------+----------
some support |        77         27          8 |       112
             |      6.06      12.74      10.00 |      7.17
-------------+---------------------------------+----------
oppose       |       335         43         24 |       402
             |     26.36      20.28      30.00 |     25.72
-------------+---------------------------------+----------
str oppose   |       802         48         39 |       889
             |     63.10      22.64      48.75 |     56.88
-------------+---------------------------------+----------
Total        |      1271        212         80 |      1563
             |    100.00     100.00     100.00 |    100.00


Aid support by income quartile

low-income |              income category
co         |        Q1         Q2         Q3         Q4 |     Total
-----------+--------------------------------------------+----------
def shd    |       144        149        114        132 |       539
           |     34.95      34.33      33.04      32.04 |     33.62
-----------+--------------------------------------------+----------
prob shd   |       213        232        182        216 |       843
           |     51.70      53.46      52.75      52.43 |     52.59
-----------+--------------------------------------------+----------
prob not   |        40         43         35         47 |       165
           |      9.71       9.91      10.14      11.41 |     10.29
-----------+--------------------------------------------+----------
def not    |        15         10         14         17 |        56
           |      3.64       2.30       4.06       4.13 |      3.49
-----------+--------------------------------------------+----------
Total      |       412        434        345        412 |      1603
           |    100.00     100.00     100.00     100.00 |    100.00


Parameters and statistics

A parameter is the true population value for some attribute

• May be an attribute of a distribution (such as mean, median, variance)

• May be an attribute of a relationship (correlation, least squares regression line)


Assumptions

When we talk about a parameter, we are assuming:

• The quantity exists in real life (empirical)

• The quantity is stable, at least for a moment

• The quantity is knowable

These assumptions are not universally accepted


Measuring net worth

The parameter of interest is mean net worth in the population of those over age 18 in the U.S. in 1990

• There is an average net worth out there

• It is stable at a particular moment

• It is a knowable quantity


Voting for president

Percent of “likely voters” supporting each candidate (October 11-13)

• Bush 48%

• Gore 44%

• Nader 2%

• Buchanan 1%


Assumptions on voting

The parameters are the percentages of the voting population voting for each candidate. The assumptions are:

• There is an actual percentage out there

• It is stable at a particular moment

• It is a knowable quantity


The internet stamp tax

In a debate in the fall of 2000, Rick Lazio and Hillary Clinton were asked to share their views on a House bill to allow the U.S. Postal Service to tax email at 5¢ a pop

The bill is fictitious: an internet hoax

Assume we asked Americans about their views on this bill


The internet tax question

Recently, the U.S. House of Representatives took up a bill that would tax email at 5¢ a message. Would you say you strongly oppose, somewhat oppose, somewhat favor, or strongly favor this legislation?


Assumptions on the internet tax

The parameters are the percentages of the population strongly opposed, somewhat opposed, somewhat in favor and strongly in favor of the bill. The assumptions are:

• There is an actual percentage out there

• It is stable at a particular moment

• It is a knowable quantity


Statistics and parameters

A statistic is your best guess at the value of a parameter; your attempt to infer the parameter’s value

We are almost always interested in parameters (properties of a population), but choose to estimate those parameters from samples

This applies to both experimental and observational studies


Population and theoretical distributions

There is a connection between population distributions and theoretical distributions

Though we believe there are population distributions, we seldom if ever observe them

Population and theoretical distributions share the same notation


Some (old) new notation

It is important to distinguish between sample and population distributions

So important we use different symbols

              Mean    Std dev

Sample        x̄       s

Population    μ       σ
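To make the sample/population distinction concrete, here is a minimal sketch. It assumes the usual convention of x̄ and s for the sample mean and standard deviation, and μ and σ for the population values. The "population" is a hypothetical list of 10,000 simulated values, not data from this lecture.

```python
import random
import statistics

rng = random.Random(12)  # fixed seed so the sketch is reproducible

# Hypothetical population (an assumption for illustration only).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Parameters: computed from the whole population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # population formula (divide by N)

# Statistics: computed from a simple random sample of 100 members.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)            # sample formula (divide by n - 1)

print(f"population: mu    = {mu:.2f}, sigma = {sigma:.2f}")
print(f"sample:     x_bar = {x_bar:.2f}, s     = {s:.2f}")
```

In practice we would see only the second line of output; the first is what we are trying to infer.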


How are statistics and parameters related?

If a statistic is calculated from data from a simple random sample, the distribution of the statistic has a known relationship to the population parameter.

By building on chance, on probability, we can make claims concerning the parameters in which we are interested.


Randomness revisited

To say something is random is to say that the outcome or value of something cannot be known with certainty before it is observed

This is very different from saying that the distribution of this value is random

Many phenomena are random at one level (individual observations), but follow a pattern at another level (aggregations)


The coin toss example

Heads or tails is the outcome of each trial (flip of the coin)

Assuming the coin is balanced, as the number of tosses increases, the proportion of heads approaches 0.50
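A short simulation can illustrate this long-run behavior. The sketch below assumes a balanced coin (probability 0.5 of heads on each toss) and independent tosses. The function name and seed are illustrative choices.

```python
import random

def running_proportion_heads(n_tosses, seed=0):
    """Return the proportion of heads after each of n_tosses simulated flips."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5      # True counts as 1 (heads)
        proportions.append(heads / toss)
    return proportions

props = running_proportion_heads(10_000)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} tosses: proportion of heads = {props[n - 1]:.3f}")
```

Any single toss is unpredictable, but the running proportion should settle near 0.50 as the number of tosses grows.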




To find probability, you must:

Have a long series of independent trials

Observe and record the outcomes of those trials

Aggregate across trials to find the probability


The voting example

Think of each observation as a trial

The responses are Bush, Gore, Nader or Buchanan

Observations are independent if:

• probability of selection for each sample member is independent

• interviewer and response are independent


The voting example

In simulated data, I gave each observation a 44% probability of choosing Gore.

The following graphs plot the proportion of observations voting for Gore by the number of observations

This is analogous to the number of successes (or failures) by the number of trials
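The simulation described above might look something like the following sketch. It is not the code used to produce the lecture's graphs. The seed, function name, and the checkpoints printed are all illustrative assumptions.

```python
import random
from itertools import accumulate

P_GORE = 0.44  # probability assigned to each simulated observation

def simulate_gore_share(n_obs, p=P_GORE, seed=2000):
    """Proportion choosing Gore among the first k observations, for k = 1..n_obs."""
    rng = random.Random(seed)
    votes = [1 if rng.random() < p else 0 for _ in range(n_obs)]
    running_totals = accumulate(votes)
    return [total / k for k, total in enumerate(running_totals, start=1)]

share = simulate_gore_share(2000)
for n in (10, 50, 200, 1000, 2000):
    print(f"n={n:>4}: proportion for Gore = {share[n - 1]:.3f}")
```

With few observations the proportion can wander well away from .44. With many it should stay close to it.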


Voting with 200 observations

[Figure: Proportion voting for Gore, first 200 observations (simulated). Horizontal axis: number of observations (1 to 200). Vertical axis: proportion voting for Gore (0 to .6, with .44 marked).]


Voting with 2000 observations

[Figure: Proportion voting for Gore, first 2000 observations (simulated). Horizontal axis: number of observations (100 to 2000). Vertical axis: proportion voting for Gore (0 to .6, with .44 marked).]


Probability math

A probability model describes a random phenomenon, or a random event. Probability models begin with:

• The sample space (S), which is the set of all possible outcomes

• The event, which is any outcome or set of outcomes of interest

• A way of assigning probabilities


The sample space

Think of a sample space as a population of outcomes. All possible outcomes are included.

The number of possible outcomes varies with the number of trials


Examples of sample space: one observation

Coin toss

S={heads, tails}

Voter poll

S={Bush, Gore, Nader, Buchanan}


Examples of sample space: two observations, order counts

Coin toss

S={heads tails, heads heads, tails heads, tails tails}

Voter poll

S={Bush Bush, Bush Gore, Bush Nader, Bush Buchanan,
Gore Gore, Gore Bush, Gore Nader, Gore Buchanan,
Nader Nader, Nader Bush, Nader Gore, Nader Buchanan,
Buchanan Buchanan, Buchanan Bush, Buchanan Gore, Buchanan Nader}
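When order counts, the sample space for two observations is every ordered pair of single-observation outcomes. A minimal sketch of that enumeration follows. The helper name is illustrative.

```python
from itertools import product

coin = ["heads", "tails"]
candidates = ["Bush", "Gore", "Nader", "Buchanan"]

def ordered_sample_space(outcomes, n_obs):
    """All sequences of n_obs outcomes, where order counts."""
    return list(product(outcomes, repeat=n_obs))

coin_space = ordered_sample_space(coin, 2)
poll_space = ordered_sample_space(candidates, 2)

print(len(coin_space))   # 4  (2 outcomes, 2 observations: 2**2)
print(len(poll_space))   # 16 (4 outcomes, 2 observations: 4**2)
for pair in coin_space:
    print(pair)
```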


Examples of sample space: two observations, order doesn’t count

Coin toss

S={heads tails, heads heads, tails tails}

Voter poll

S={Bush Bush, Bush Gore, Bush Nader, Bush Buchanan,
Gore Gore, Gore Nader, Gore Buchanan,
Nader Nader, Nader Buchanan,
Buchanan Buchanan}
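When order does not count, pairs such as "Bush Gore" and "Gore Bush" collapse into one outcome. One way to sketch this is with combinations that allow repeats. The helper name is illustrative.

```python
from itertools import combinations_with_replacement

coin = ["heads", "tails"]
candidates = ["Bush", "Gore", "Nader", "Buchanan"]

def unordered_sample_space(outcomes, n_obs):
    """All selections of n_obs outcomes where order does not count."""
    return list(combinations_with_replacement(outcomes, n_obs))

coin_space = unordered_sample_space(coin, 2)
poll_space = unordered_sample_space(candidates, 2)

print(len(coin_space))   # 3
print(len(poll_space))   # 10
for pair in poll_space:
    print(pair)
```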


Sample space, one outcome of interest, 10 trials

Coin toss (number of heads)

S={0,1,2,3,4,5,6,7,8,9,10}

Voter poll (votes for Bush)

S={0,1,2,3,4,5,6,7,8,9,10}


The event

An event is a sample of outcomes, a subset of interest to us. Not a random sample, a subset.

If order counts, event might be “getting heads first, tails second”

If order does not count, event might be “getting one heads and one tails”

The sample space and event depend on the research question
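The two coin events above can be written as subsets of the two-toss sample space. The sketch below additionally assumes a balanced coin and independent tosses, so that the four ordered outcomes are equally likely. That assumption is what allows a probability to be found by counting.

```python
from fractions import Fraction
from itertools import product

# Ordered sample space for two tosses.
sample_space = list(product(["heads", "tails"], repeat=2))

def probability(event, space):
    """P(event), assuming every outcome in `space` is equally likely."""
    favorable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favorable), len(space))

def heads_then_tails(outcome):
    # Order counts: heads first, tails second.
    return outcome == ("heads", "tails")

def one_of_each(outcome):
    # Order does not count: one heads and one tails, in either order.
    return sorted(outcome) == ["heads", "tails"]

print(probability(heads_then_tails, sample_space))  # 1/4
print(probability(one_of_each, sample_space))       # 1/2
```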


Properties of probability

Probability is the likelihood of some event occurring

Any probability is between 0 and 1

• 0 ≤ P(A) ≤ 1

• 0 ≤ P(vote Nader) ≤ 1


Properties of probability

The sum of probabilities for all possible outcomes is 1

• P(S)=1

• P(Nader or Bush or Gore or Buchanan or other)=1


Properties of probability

The probability that an event does not occur is 1 minus the probability that an event does occur

• P(not A)=1-P(A)

• P(not Nader)=1-P(Nader)


Properties of probability

If two events A and B have no outcomes in common, the probability of either event occurring is the sum of the probabilities of A and B.

• P(A or B)=P(A) + P(B)

• P(Nader or Buchanan)= P(Nader) + P(Buchanan)

Events with no outcomes in common are called disjoint events
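The four properties can be checked numerically using the poll percentages quoted earlier (Bush 48%, Gore 44%, Nader 2%, Buchanan 1%), treated here as if they were probabilities. The "other" category and its value are an assumption: it is simply whatever is left over so that the probabilities sum to 1.

```python
from fractions import Fraction

poll = {
    "Bush": Fraction(48, 100),
    "Gore": Fraction(44, 100),
    "Nader": Fraction(2, 100),
    "Buchanan": Fraction(1, 100),
}
# Assumption: the remainder is lumped into a single "other" outcome.
poll["other"] = 1 - sum(poll.values())

# Property 1: every probability is between 0 and 1.
print(all(0 <= p <= 1 for p in poll.values()))    # True

# Property 2: P(S) = 1.
print(sum(poll.values()))                         # 1

# Property 3: P(not Nader) = 1 - P(Nader).
p_not_nader = 1 - poll["Nader"]
print(float(p_not_nader))                         # 0.98

# Property 4: Nader and Buchanan are disjoint for a single voter, so add.
p_nader_or_buchanan = poll["Nader"] + poll["Buchanan"]
print(float(p_nader_or_buchanan))                 # 0.03
```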


Probabilities and frequencies

Probability and frequency are closely related logically and mathematically

The relative frequency for some event in a population is the probability of that event

• The relative frequency of some event in a sample is an estimate of the population probability


Who supports aid for college?

low-income |        race of respondent
co         |     white      black      other |     Total
-----------+---------------------------------+----------
def shld   |       386        113         40 |       539
           |     29.09      57.95      49.38 |     33.62
-----------+---------------------------------+----------
pr shld    |       739         73         31 |       843
           |     55.69      37.44      38.27 |     52.59
-----------+---------------------------------+----------
pr not     |       153          6          6 |       165
           |     11.53       3.08       7.41 |     10.29
-----------+---------------------------------+----------
def not    |        49          3          4 |        56
           |      3.69       1.54       4.94 |      3.49
-----------+---------------------------------+----------
Total      |      1327        195         81 |      1603
           |    100.00     100.00     100.00 |    100.00
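The column percentages in this table are sample relative frequencies, and each can be read as an estimate of a conditional probability. A minimal sketch that recomputes them from the cell counts is below. The dictionary layout and function name are illustrative choices.

```python
# Cell counts copied from the table above (rows within each race column).
counts = {
    "white": {"def shld": 386, "pr shld": 739, "pr not": 153, "def not": 49},
    "black": {"def shld": 113, "pr shld": 73, "pr not": 6, "def not": 3},
    "other": {"def shld": 40, "pr shld": 31, "pr not": 6, "def not": 4},
}

def conditional_distribution(column):
    """Relative frequency of each response within one column of the table."""
    total = sum(column.values())
    return {response: n / total for response, n in column.items()}

for race, column in counts.items():
    dist = conditional_distribution(column)
    summary = ", ".join(f"{resp}: {100 * p:.2f}%" for resp, p in dist.items())
    print(f"{race:>5} (n={sum(column.values())}): {summary}")
```

For example, the estimated probability that a black respondent answers "def shld" is 113/195, or about .58.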


Assigning probabilities

Sometimes it is useful to assign probabilities rather than observing them

• Simulations (such as the voting analysis)

• Sampling

When we assign probabilities, we often do so using random draws from a density curve


Uniform distribution (empirical)

[Figure: Uniform density curve, voting simulation (n=10,000). Horizontal axis: sort (0 to 1). Vertical axis: fraction (0 to .1).]


Simulating voting

In this case, 44% of “likely voters” in the Gallup poll preferred Gore, 48% preferred Bush

Assign voters to Gore if P(X) ≤ .44

Assign voters to Bush if .44 < P(X) ≤ .92
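One way to sketch this assignment is to draw a uniform random number between 0 and 1 for each simulated voter and compare it with cut points. The cut points below are chosen so that Gore receives 44% and Bush 48%, matching the poll figures quoted above. The "other" label for draws above .92 is an assumption, and this is not the lecture's own simulation code.

```python
import random
from collections import Counter

def assign_candidate(u):
    """Map a uniform draw u in [0, 1) to a candidate."""
    if u <= 0.44:
        return "Gore"      # interval of width .44
    if u <= 0.92:
        return "Bush"      # interval of width .48
    return "other"         # assumption: everyone else

rng = random.Random(360)   # fixed seed for reproducibility
draws = [rng.random() for _ in range(10_000)]
tally = Counter(assign_candidate(u) for u in draws)
shares = {name: count / len(draws) for name, count in tally.items()}

for name in ("Gore", "Bush", "other"):
    print(f"{name:>5}: {shares[name]:.3f}")
```

Because the draws are uniform, the share landing in each interval should be close to that interval's width.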


Sampling voting

Assign likely voters equal (uniform) probabilities of selection

Select voters at random

Observe their voting preferences

In this case, 44% of “likely voters” in the Gallup poll preferred Gore, 48% preferred Bush


In both cases…

Which of Gore, Bush, Nader or Buchanan individual i prefers is a random variable

• In the simulation, observations are randomly assigned to a candidate

• In the survey, observations have views and are randomly selected


Random variables

“A random variable is a variable whose value is a numerical outcome of a random phenomenon” (Moore, p. 231)

• Not completely random

• Not necessarily mostly random

• Just needs a random component


Probability distributions

The probability distribution of a random variable X shows us the values X can take and how probabilities are assigned to those values

There are many different probability distributions

One with which you are familiar is the normal probability distribution, AKA “Table A”


Next time

The normal probability distribution

The sampling distribution, which is NOT an empirical distribution


Homework due Wednesday, March 28

Moore: 4.10,18,20,32,35,40,42,43,54