Raoul LePage Professor STATISTICS AND PROBABILITY stt.msu/~lepage click on STT315_F06

1

Week 9-25-06 and some preparation for exam 2.

3

NORMAL DISTRIBUTION

BERNOULLI TRIALS

BINOMIAL DISTRIBUTION

POISSON DISTRIBUTION

4

note the point of inflexion

note the balance point

5

SD=15

MEAN = 100

point of inflexion

6

5

50

7

6.3

39.7

8

~68% of the area lies within 1 SD of the mean

Illustrated for the Standard Normal: Mean = 0, SD = 1

9

~95% of the area lies within 2 SD of the mean

Illustrated for the Standard Normal: Mean = 0, SD = 1

10

SD = 15

MEAN = 100

~68/2 = 34% between 85 and 100

~95/2 = 47.5% between 100 and 130
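The two half-areas above can be checked numerically. This is a minimal sketch, assuming IQ is modeled as exactly normal with mean 100 and SD 15; the variable names are illustrative only. The exact areas come out near 34.1% and 47.7%, close to the rule-of-thumb figures.

```python
from statistics import NormalDist

# IQ modeled as normal with mean 100 and SD 15 (the values on the slide).
iq = NormalDist(mu=100, sigma=15)

# Area between 85 (one SD below the mean) and the mean: about 68%/2.
p_85_to_100 = iq.cdf(100) - iq.cdf(85)

# Area between the mean and 130 (two SDs above the mean): about 95%/2.
p_100_to_130 = iq.cdf(130) - iq.cdf(100)

print(round(p_85_to_100, 4), round(p_100_to_130, 4))
```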

12

IQ: MEAN = 100, SD = 15

Z = (IQ - 100) / 15: MEAN = 0, SD = 1

Standard Normal

13

14

P(Z > 0) = P(Z < 0 ) = 0.5

P(Z > 2.66) = 0.5 - P(0 < Z < 2.66) = 0.5 - 0.4961 = 0.0039

P(Z < 1.92) = 0.5 + P(0 < Z < 1.92) = 0.5 + 0.4726 = 0.9726
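The table lookups above can be reproduced in code. This is a minimal sketch; the helper name `table_area` is hypothetical and stands for the area between 0 and z, which is what the table values 0.4961 and 0.4726 represent.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def table_area(z_value):
    """Area under the standard normal curve between 0 and z_value."""
    return z.cdf(z_value) - 0.5

# P(Z > 2.66) = 0.5 - P(0 < Z < 2.66)
p_above_266 = 0.5 - table_area(2.66)

# P(Z < 1.92) = 0.5 + P(0 < Z < 1.92)
p_below_192 = 0.5 + table_area(1.92)

print(round(p_above_266, 4), round(p_below_192, 4))
```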

15

x    p(x)
1    p     (1 denotes “success”)
0    q     (0 denotes “failure”)
     __
     1

0 < p < 1
q = 1 - p

16

P(success) = P(X = 1) = p
P(failure) = P(X = 0) = q

e.g. X = “sample voter is Democrat.” Population has 48% Dem.

p = 0.48, q = 0.52
P(X = 1) = 0.48

17

P(S1 S2 F3 F4 F5 F6 S7) = p^3 q^4

just write P(SSFFFFS) = p^3 q^4

“the answer only depends upon how many of each, not their order.”

e.g. 48% Dem, 5 sampled, with replacement:
P(Dem Rep Dem Dem Rep) = 0.48^3 0.52^2

18

e.g. P(exactly 2 Dems out of sample of 4)
= P(DDRR) + P(DRDR) + P(DRRD) + P(RDDR) + P(RDRD) + P(RRDD)
= 6 (0.48^2) (0.52^2) ~ 0.374.

There are 6 ways to arrange 2D 2R.

19

e.g. P(exactly 3 Dems out of sample of 5)
= P(DDDRR) + P(DDRDR) + P(DDRRD) + P(DRDDR) + P(DRDRD) + P(DRRDD) + P(RDDDR) + P(RDDRD) + P(RDRDD) + P(RRDDD)
= 10 (0.48^3) (0.52^2) ~ 0.299.

There are 10 ways to arrange 3D 2R.
Same as the number of ways to select 3 from 5.
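The counting argument above can be checked by brute force. This is a minimal sketch that lists every D/R string of a given length, keeps those with the required number of D's, and adds up their probabilities; the function name `prob_exact_count` is hypothetical.

```python
from itertools import product

def prob_exact_count(n, k, p):
    """List the length-n D/R strings with exactly k D's and sum their probabilities."""
    q = 1 - p
    strings = ["".join(s) for s in product("DR", repeat=n) if s.count("D") == k]
    # Each string has probability p^(number of D) * q^(number of R), whatever the order.
    total = sum(p ** s.count("D") * q ** s.count("R") for s in strings)
    return strings, total

arrangements_4, p_2_of_4 = prob_exact_count(4, 2, 0.48)
arrangements_5, p_3_of_5 = prob_exact_count(5, 3, 0.48)

print(len(arrangements_4), round(p_2_of_4, 3))
print(len(arrangements_5), round(p_3_of_5, 3))
```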

20

5! ways to arrange 5 things in a line

Do it thus (1:1 with arrangements): select 3 of the 5 to go first in line, arrange those 3 at the head of line then arrange the remaining 2 after.

5! = (ways to select 3 from 5) 3! 2!
So num ways must be 5! / (3! 2!) = 10.

21

Let random variable X denote the number of “S” in n independent Bernoulli p-Trials.
By definition, X has a Binomial Distribution and for each of x = 0, 1, 2, …, n
P(X = x) = (n!/(x! (n-x)!)) p^x q^(n-x)

e.g. P(44 Dems in sample of 100 voters) = (100!/(44! 56!)) 0.48^44 0.52^(100-44) = 0.05812.
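The binomial formula translates directly into code. This is a minimal sketch using the standard library's `math.comb` for the arrangement count; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for the number of successes in n independent Bernoulli p-trials."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# 44 Democrats in a with-replacement sample of 100 voters, p = 0.48.
p_44 = binomial_pmf(44, 100, 0.48)
print(p_44)

# The probabilities over x = 0, 1, ..., n add up to 1.
total = sum(binomial_pmf(x, 100, 0.48) for x in range(101))
```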

22

n!/(x! (n-x)!) is the count of how many arrangements there are of a string of x letters “S” and n-x letters “F.”

p^x q^(n-x) is the shared probability of each string of x letters “S” and n-x letters “F.”
(define 0! = 1, p^0 = q^0 = 1 and the formula goes through for every one of x = 0 through n)

(n choose x) is short for the arrangement count = n!/(x! (n-x)!)

23

24

n = 10, p = 0.4
mean = n p = 4
sd = root(n p q) ~ 1.55
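The mean and sd above can also be computed straight from the binomial probabilities, which confirms the shortcuts n p and root(n p q). This is a minimal sketch; the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 10, 0.4
q = 1 - p

# Binomial probabilities p(0), p(1), ..., p(n).
pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]

# Mean and sd computed from the distribution itself.
mean = sum(x * px for x, px in enumerate(pmf))
sd = sqrt(sum((x - mean) ** 2 * px for x, px in enumerate(pmf)))

print(mean, sd, sqrt(n * p * q))
```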

25

26

27

p(x) = e^(-mean) mean^x / x!
for x = 0, 1, 2, ... ad infinitum

28

e.g. X = number of times ace of spades turns up in 104 tries
X ~ Poisson with mean 2 (104 tries at 1/52 each)
p(x) = e^(-mean) mean^x / x!
e.g. p(3) = e^(-2) 2^3 / 3! ~ 0.18

29

e.g. X = number of raisins in MY cookie. Batter has 400 raisins and makes 144 cookies.
E X = 400/144 ~ 2.78 per cookie

p(x) = e^(-mean) mean^x / x!
e.g. p(2) = e^(-2.78) 2.78^2 / 2! ~ 0.24
(around 24% of cookies have 2 raisins)
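Both Poisson examples above (ace of spades, raisins) use the same one-line formula. This is a minimal sketch; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """p(x) = e^(-mean) * mean^x / x! for x = 0, 1, 2, ..."""
    return exp(-mean) * mean ** x / factorial(x)

# Ace of spades turning up 3 times in 104 tries: mean = 104 / 52 = 2.
p_three_aces = poisson_pmf(3, 104 / 52)

# Exactly 2 raisins in a cookie: mean = 400 / 144, about 2.78.
p_two_raisins = poisson_pmf(2, 400 / 144)

print(round(p_three_aces, 2), round(p_two_raisins, 2))
```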

30

THE FIRST BEST THING ABOUT THE POISSON IS THAT THE MEAN ALONE TELLS US THE ENTIRE DISTRIBUTION!
note: Poisson sd = root(mean)

31

E X = 400/144 ~ 2.78 raisins per cookie

sd = root(mean) = root(2.78) ~ 1.67 (for Poisson)

32

THE SECOND BEST THING ABOUT THE POISSON IS THAT FOR A MEAN AS SMALL AS 3 THE NORMAL APPROXIMATION WORKS WELL.

mean 2.78

1.67 = sd = root(mean)
Special to Poisson

33

E X = 127.8 accidents
If Poisson then sd = root(127.8) = 11.3049 and the approx dist is:

mean 127.8 accidents

sd = root(mean) ~ 11.3
Special to Poisson
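One way to see how well the normal approximation serves here is to add up exact Poisson probabilities within 1 and 2 sd of the mean and compare them with the normal benchmarks of about 68% and 95%. This is a minimal sketch; the helper names are hypothetical, and the log-gamma form of the Poisson formula is used only to avoid overflow with a mean this large.

```python
from math import ceil, exp, floor, lgamma, log, sqrt

def poisson_pmf(x, mean):
    """p(x) = e^(-mean) mean^x / x!, computed on the log scale for large x."""
    return exp(-mean + x * log(mean) - lgamma(x + 1))

mean = 127.8
sd = sqrt(mean)  # special to Poisson

def prob_within(k):
    """Exact Poisson probability that X lies within k sd of the mean."""
    lo = ceil(mean - k * sd)
    hi = floor(mean + k * sd)
    return sum(poisson_pmf(x, mean) for x in range(lo, hi + 1))

within_1sd = prob_within(1)
within_2sd = prob_within(2)
print(round(sd, 4), within_1sd, within_2sd)
```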

34

35

The overwhelming majority of samples of n from a population of N can stand in for the population.

37

Sample size n must be “large.”
For only a few characteristics at a time, such as profit, sales, dividend.
SPECTACULAR FAILURES MAY OCCUR!

38

With-replacement

39

With-replacement vs without replacement.

40

This sample is obviously “not representative.”

41

Rule of thumb: With and without replacement are about the same if

root [(N-n) /(N-1)] ~ 1.
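The rule of thumb is easy to evaluate for any N and n. This is a minimal sketch; the function name `fpc` is hypothetical, and the second example (half of a population of 100) is an invented contrast case, not from the lecture.

```python
from math import sqrt

def fpc(N, n):
    """The factor root[(N - n) / (N - 1)] from the rule of thumb."""
    return sqrt((N - n) / (N - 1))

# A sample of 600 from a population of 500 million: factor is essentially 1,
# so sampling with or without replacement makes almost no difference.
tiny_fraction = fpc(500_000_000, 600)

# Sampling half of a small population: factor is well below 1.
large_fraction = fpc(100, 50)

print(tiny_fraction, large_fraction)
```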

42

They would have you believe the population is {8, 9, 12, 42} and the sample is {42}.

A SET is a collection of distinct entities.

43

IF THE OVERWHELMING MAJORITY OF SAMPLES ARE “GOOD SAMPLES” THEN WE CAN OBTAIN A “GOOD” SAMPLE BY RANDOM SELECTION.

44

Digits are made to correspond to letters.
a = 00-02, b = 03-05, …, z = 75-77
Random digits then give random letters.
1559 9068 … (Table 14, pg. 809)
15 59 90 68 etc… (split into pairs)
f t * w etc… (take chosen letters; * marks a pair above 77, which matches no letter)
For samples without replacement just pass over any duplicates.
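The digit-to-letter scheme above can be sketched as follows. The function name `pairs_to_letters` and its `without_replacement` flag are hypothetical; the mapping itself (three consecutive two-digit values per letter, 78-99 discarded) is the one on the slide.

```python
import string

def pairs_to_letters(digits, without_replacement=False):
    """Turn a run of random digits into letters, two digits at a time."""
    digits = "".join(digits.split())  # drop the spaces between digit groups
    letters = []
    for i in range(0, len(digits) - 1, 2):
        value = int(digits[i:i + 2])
        if value > 77:
            continue  # 78-99 match no letter (the "*" on the slide)
        letter = string.ascii_lowercase[value // 3]  # 00-02 -> a, 03-05 -> b, ...
        if without_replacement and letter in letters:
            continue  # pass over duplicates
        letters.append(letter)
    return letters

print(pairs_to_letters("1559 9068"))
```

With the digits from the slide, the pairs 15, 59, 90, 68 give f, t, (skipped), w.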

45

The Great Trick is far more powerful than we have seen.

A typical sample closely estimates such things as a population mean or the shape of a population density.

But it goes beyond this to reveal how much variation there is among sample means and sample densities.

A typical sample not only estimates population quantities. It estimates the sample-to-sample variations of its own estimates.

46

The average account balance is $421.34 for a random with-replacement sample of 50 accounts.

We estimate from this sample that the average balance is $421.34 for all accounts.

From this sample we also estimate and display a “margin of error”:

$421.34 +/- $65.22

47

NOTE: Sample standard deviation s may be calculated in several equivalent ways, some sensitive to rounding errors, even for n = 2.
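The rounding issue can be illustrated with two algebraically equivalent formulas for s. This is a sketch under assumptions: the slide does not say which formulas it has in mind, so the "two-pass" and "shortcut" versions below, and the n = 2 data with very large values, are chosen here only for illustration.

```python
from math import sqrt

def sd_two_pass(xs):
    """s from squared deviations about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def sd_shortcut(xs):
    """s from (sum of squares - n * mean^2); same algebra, different rounding."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum(x * x for x in xs) - n * mean * mean
    return sqrt(max(ss, 0.0) / (n - 1))  # guard against a negative rounding result

small = [1.0, 2.0]                  # both formulas agree here
large = [100000001.0, 100000002.0]  # same spread, but huge values

print(sd_two_pass(small), sd_shortcut(small))
print(sd_two_pass(large), sd_shortcut(large))
```

Both data sets have the same spread, so s should be the same for each. The shortcut subtracts two nearly equal, very large numbers, which is where the rounding error enters.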

48

The following margin of error calculation for n = 4 is only an illustration. A sample of four would not be regarded as large enough.

Profits per sale = {12.2, 15.3, 16.2, 12.8}.
Mean = 14.125, s = 1.92765, root(4) = 2.
Margin of error = +/- 1.96 (1.92765 / 2)

Report: 14.125 +/- 1.8891.

A precise interpretation of margin of error will be given later in the course, including the role of 1.96. The interval 14.125 +/- 1.8891 is called a “95% confidence interval for the population mean.”

We used: (12.2-14.125)^2 + (15.3-14.125)^2 + (16.2-14.125)^2 + (12.8-14.125)^2 = 11.1475, so s = root(11.1475 / (4-1)) = 1.92765.
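The arithmetic in this illustration can be reproduced step by step. This is a minimal sketch with illustrative variable names; it uses the divisor n - 1 for s, which matches the reported value 1.92765.

```python
from math import sqrt

profits = [12.2, 15.3, 16.2, 12.8]
n = len(profits)

mean = sum(profits) / n
sum_sq_dev = sum((x - mean) ** 2 for x in profits)  # sum of squared deviations
s = sqrt(sum_sq_dev / (n - 1))                      # sample standard deviation
margin = 1.96 * s / sqrt(n)                         # margin of error

print(mean, sum_sq_dev, s, margin)
```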

49

A random with-replacement sample of 50 stores participated in a test marketing. In 39 of these 50 stores (i.e. 78%) the new package design outsold the old package design.

We estimate from this sample that 78% of all stores will sell more of new vs old.

We also estimate a “margin of error” of +/- 11.5%.

Figured: 1.96 root(pHAT qHAT)/root(n) = 1.96 root(.78 .22)/root(50) = 0.114823 in Binomial setup
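The same calculation in code, as a minimal sketch; `p_hat` and `q_hat` stand for pHAT and qHAT, and the other names are illustrative.

```python
from math import sqrt

n = 50          # stores sampled
successes = 39  # stores where the new design outsold the old

p_hat = successes / n
q_hat = 1 - p_hat

# Margin of error for a sample proportion in the Binomial setup.
margin = 1.96 * sqrt(p_hat * q_hat) / sqrt(n)

print(p_hat, margin)
```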

50

A sample of only n = 600 from a population of N = 500 million.

(FINE resolution)
