
Statr sessions 9 to 10


Page 1: Statr sessions 9 to 10

Discrete Distributions

• What is a random variable?
• Distinguish between discrete random variables and continuous random variables.
• Know how to determine the mean and variance of a discrete distribution.
• Identify the type of statistical experiments that can be described by the binomial distribution, and know how to calculate probabilities based on the binomial distribution.

Page 2: Statr sessions 9 to 10

Discrete vs. Continuous Distributions

• Random Variable: a variable whose value is the numerical outcome of a chance experiment

• Discrete Random Variable: a random variable that takes on only distinct, countable values

ex: Number of heads on 10 flips, number of defective items in a random sample of 100, number of times you check your watch during class, etc.

• Continuous Random Variable: a random variable that can take any value in an interval. Between any two values there is always another valid value, so measuring with more precision yields ever more possible values.

ex: Time until a bulb goes out, height, etc.

Page 3: Statr sessions 9 to 10

Describing a Distribution

• A distribution can be described by constructing a graph of the distribution

• Measures of central tendency and variability can be applied to distributions

Page 4: Statr sessions 9 to 10

Describing a Discrete Distribution

• Mean of a discrete distribution: the long-run average
– If the process is repeated long enough, the average of the outcomes will approach the long-run average (mean)
– Mean of a discrete distribution:

$\mu = \sum_i X_i \, P(X_i)$

where $\mu$ is the long-run average, $X_i$ = the ith outcome of random variable X, and $P(X_i)$ = probability that X = $X_i$

Page 5: Statr sessions 9 to 10

Describing a Discrete Distribution

• Variance of a discrete distribution is obtained in a manner similar to raw data, summing the squared deviations from the mean and weighting them by P(Xi) (rather than dividing by n):

$\sigma^2 = \mathrm{Var}(X) = \sum_i (X_i - \mu)^2 \, P(X_i)$

• Standard Deviation is computed by taking the square root of the variance

Page 6: Statr sessions 9 to 10

Discrete Distribution -- Example

• An executive is considering out-of-town business travel for a given Friday. At least one crisis could occur on the day that the executive is gone. The distribution on the following slide contains the number of crises that could occur during the day the executive is gone and the probability that each number will occur. For example, there is a 0.37 probability that no crisis will occur, a 0.31 probability of one crisis, and so on.

Page 7: Statr sessions 9 to 10

Discrete Distribution -- Example

Number of Crises (X)    Probability P(Xi)
0                       0.37
1                       0.31
2                       0.18
3                       0.09
4                       0.04
5                       0.01

[Figure: "Distribution of Daily Crises", a bar chart of probability P(Xi) (vertical axis, 0 to 0.5) against number of crises X (horizontal axis, 0 to 5).]
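The mean and variance formulas from the earlier slides can be applied directly to the crisis distribution. The following is a minimal sketch in Python; the function names `discrete_mean` and `discrete_variance` are illustrative choices, not part of the original slides.

```python
import math

# Distribution of daily crises from the slide: {number of crises: probability}
crises = {0: 0.37, 1: 0.31, 2: 0.18, 3: 0.09, 4: 0.04, 5: 0.01}

def discrete_mean(dist):
    """Long-run average: sum of each outcome times its probability."""
    return sum(x * p for x, p in dist.items())

def discrete_variance(dist):
    """Sum of squared deviations from the mean, weighted by probability."""
    mu = discrete_mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mean = discrete_mean(crises)
variance = discrete_variance(crises)
std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.4f}, std dev = {std_dev:.4f}")
```

Under these numbers the executive should expect about 1.15 crises on the day away, with a variance of about 1.41 and a standard deviation of about 1.19.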

Page 8: Statr sessions 9 to 10

Requirements for a Discrete Probability Function -- Examples

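• Each probability must be between 0 and 1
• The sum of all probabilities must be equal to 1.

Example 1: VALID

X       P(X)
-1      .1
0       .2
1       .4
2       .2
3       .1
Total   1.0

Example 2: NOT VALID (P(-1) is negative)

X       P(X)
-1      -.1
0       .3
1       .4
2       .3
3       .1
Total   1.0

Example 3: NOT VALID (the probabilities sum to 1.2)

X       P(X)
-1      .1
0       .3
1       .4
2       .3
3       .1
Total   1.2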
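The two requirements can be checked mechanically. The sketch below assumes a hypothetical helper named `is_valid_pmf` and a small floating-point tolerance for the sum; neither comes from the slides. It is applied to the three example tables.

```python
import math

def is_valid_pmf(probabilities, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# The three example tables, for X = -1, 0, 1, 2, 3
example_1 = [0.1, 0.2, 0.4, 0.2, 0.1]   # valid
example_2 = [-0.1, 0.3, 0.4, 0.3, 0.1]  # sums to 1, but one value is negative
example_3 = [0.1, 0.3, 0.4, 0.3, 0.1]   # all in range, but sums to 1.2

for name, probs in [("Example 1", example_1),
                    ("Example 2", example_2),
                    ("Example 3", example_3)]:
    print(name, "VALID" if is_valid_pmf(probs) else "NOT VALID")
```

Example 2 shows why both conditions are needed: its total is 1.0, yet it fails because a probability cannot be negative.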

Page 9: Statr sessions 9 to 10

Roulette

A roulette wheel has 37 pockets.

£1 on a number returns £36 if it comes up (i.e. your £1 back + £35 winnings). Otherwise you lose your £1.

What is the expected winnings (in pounds) on a £1 number bet?

1. -1/36
2. -1/37
3. -2/37
4. -1/35
5. 1/36

Page 10: Statr sessions 9 to 10

Roulette

A roulette wheel has 37 pockets.

£1 on a number returns £36 if it comes up (i.e. your £1 back + £35 winnings). Otherwise you lose your £1.

What is the expected winnings (in pounds) on a £1 number bet?

Expected winnings are

£35 × P(right number) + £(-1) × P(wrong number)

= £35 × (1/37) - £1 × (36/37) = -£1/37 (a loss of about 2.7p per £1 bet), which is answer 2.
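The roulette calculation is an application of the discrete mean formula with two outcomes. A short sketch using exact fractions follows; the variable names are illustrative.

```python
from fractions import Fraction

pockets = 37
p_win = Fraction(1, pockets)             # probability the chosen number comes up
p_lose = Fraction(pockets - 1, pockets)  # probability any other number comes up

# Net winnings in pounds: +35 on a win, -1 on a loss
expected_winnings = 35 * p_win + (-1) * p_lose

print(expected_winnings)         # exact value as a fraction
print(float(expected_winnings))  # decimal approximation
```

Using `Fraction` avoids rounding, so the result can be compared directly with the multiple-choice options.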

Page 11: Statr sessions 9 to 10

Binomial Distribution

• The binomial distribution is a discrete distribution where X, the random variable, represents the number of “successes” and the following four conditions are met:
– There are n trials
– The n trials are independent of each other
– The outcome is dichotomous: only two outcomes are possible
– The probability of “success” is constant across trials
• Example: 10 coin flips, X = # of heads
• X = the number of “successes”, and we say X follows a binomial distribution with n trials and P(success in each trial) = p
• If the data follow a binomial distribution, then we can summarize P(Xi) for all values Xi = 0, 1, …, n through the binomial probability distribution formula
• n = sample size (the number of trials)

Page 12: Statr sessions 9 to 10

Situations where a Binomial distribution might occur

1) Quality control: select n items at random; X = number found to be satisfactory.
2) Survey of n people about products A and B; X = number preferring A.
3) Telecommunications: n messages; X = number with an invalid address.
4) Number of items with some property above a threshold; e.g. X = number with height > A

Page 13: Statr sessions 9 to 10

Binomial distribution

• Probability function

$P(X) = \frac{n!}{X!\,(n-X)!} \, p^X q^{n-X}$ for $0 \le X \le n$, where $q = 1 - p$

• Mean value

$\mu = n p$

• Variance and Standard Deviation

$\sigma^2 = n p q$

$\sigma = \sqrt{n p q}$

Page 14: Statr sessions 9 to 10

Binomial Distribution: Demonstration Problem 5.3

According to the U.S. Census Bureau, approximately 6% of all workers in Jackson, Mississippi, are unemployed. In conducting a random telephone survey in Jackson, what is the probability of getting two or fewer unemployed workers in a sample of 20?

Page 15: Statr sessions 9 to 10

Binomial Distribution: Demonstration Problem 5.3

• In this example,
– 6% are unemployed => p = 0.06
– The sample size is 20 => n = 20
– 94% are employed => q = 0.94
– X is the number of successes desired
– What is the probability of getting 2 or fewer unemployed workers in the sample of 20? => P(X ≤ 2)
– The hard part of this problem is identifying p, n, and X


Page 16: Statr sessions 9 to 10

Binomial Distribution Table: Demonstration Problem 5.3


n = 20          PROBABILITY (p)
X       0.05      0.06      0.07
0       0.3585    0.2901    0.2342
1       0.3774    0.3703    0.3526
2       0.1887    0.2246    0.2521
3       0.0596    0.0860    0.1139
4       0.0133    0.0233    0.0364
5       0.0022    0.0048    0.0088
6       0.0003    0.0008    0.0017
7       0.0000    0.0001    0.0002
8       0.0000    0.0000    0.0000
…       …         …         …
20      0.0000    0.0000    0.0000

$n = 20, \quad p = .06, \quad q = .94$

$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = .2901 + .3703 + .2246 = .8850$
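The table lookup for Demonstration Problem 5.3 can be reproduced from the binomial probability function. This is a minimal sketch using only the Python standard library; `binomial_pmf` is an illustrative name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 20, 0.06  # sample of 20 workers, 6% unemployment

# P(X <= 2) = P(X = 0) + P(X = 1) + P(X = 2)
terms = [binomial_pmf(x, n, p) for x in range(3)]
p_two_or_fewer = sum(terms)

for x, t in enumerate(terms):
    print(f"P(X = {x}) = {t:.4f}")
print(f"P(X <= 2) = {p_two_or_fewer:.4f}")

# Mean and standard deviation from the formulas mu = n*p and sigma = sqrt(n*p*q)
mu = n * p
sigma = (n * p * (1 - p)) ** 0.5
```

The three terms should agree with the 0.06 column of the table to four decimal places, giving a total of about 0.885.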

Page 17: Statr sessions 9 to 10

Poisson Distribution

• The Poisson distribution focuses only on the number of discrete occurrences over some interval or continuum
– Poisson does not have a given number of trials (n) as a binomial experiment does
– Occurrences are independent of other occurrences
– Occurrences occur over an interval

Page 18: Statr sessions 9 to 10

Poisson Distribution

• If a Poisson-distributed phenomenon is studied over a long period of time, a long-run average can be determined
– The average is denoted by lambda (λ)
– Each Poisson distribution contains a lambda value from which the probabilities are determined
– A Poisson distribution can be described by λ alone

Page 19: Statr sessions 9 to 10

Poisson Distribution: Probability Function P(X)

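$P(X) = \frac{\lambda^X e^{-\lambda}}{X!}$ for $X = 0, 1, 2, 3, \ldots$

where:
$\lambda$ = long-run average
$e$ = 2.718282... (the base of natural logarithms)

Mean: $\mu = \lambda$

Variance: $\sigma^2 = \lambda$

Standard Deviation: $\sigma = \sqrt{\lambda}$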
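The Poisson probability function can be evaluated in a few lines. In the sketch below, the value λ = 3.2 is a hypothetical long-run average chosen only for illustration (it does not come from the slides), and `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with long-run average lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 3.2  # hypothetical long-run average number of occurrences per interval

# Probabilities for the first few counts
for x in range(6):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.4f}")

# The probabilities over all counts sum to 1 (truncated here at 60 terms)
total = sum(poisson_pmf(x, lam) for x in range(60))

# Mean and variance both equal lambda; the standard deviation is its square root
mean, variance, std_dev = lam, lam, sqrt(lam)
```

Because λ alone determines the distribution, changing `lam` is the only adjustment needed to model a different average rate.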

Page 20: Statr sessions 9 to 10

Continuous Random Variables

• A continuous random variable is a random variable which can take values measured on a continuous scale, e.g. weights, strengths, times or lengths.

• The probability of an outcome falling between two particular points is the area under the probability density function (pdf) curve, f(x), between those points.

[Figure: a pdf curve f(x) illustrating P(−1.5 < X < −0.7) as the area under the curve between x = −1.5 and x = −0.7.]

Probability density function (pdf): f(x)

Probability of X in the range a to b: $P(a < X < b) = \int_a^b f(x)\,dx$

Page 21: Statr sessions 9 to 10

Properties of Normal Distribution

• Continuous distribution: the line does not break
• The line does not touch the x-axis
• Bell-shaped, symmetrical distribution
• Ranges from -∞ to ∞
• Mean = median = mode
• Area under the curve = total probability = 1
• By the Empirical rule, about 68% of data are within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
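The Empirical rule percentages can be checked numerically. The sketch below relies on the standard identity that, for a normal distribution, the probability of falling within k standard deviations of the mean equals erf(k/√2); the function name `within_k_sigma` is illustrative.

```python
from math import erf, sqrt

def within_k_sigma(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal distribution."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sigma(k):.4f}")
```

The values come out close to 0.68, 0.95 and 0.997, which is where the rounded figures in the rule originate.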

Page 22: Statr sessions 9 to 10

Probability Density Function of Normal Distribution

There are many different normal distributions; each is characterized by its mean and its standard deviation.

Page 23: Statr sessions 9 to 10

Probability Density Function of Normal Distribution

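$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$

where:
$\mu$ = mean of x
$\sigma$ = standard deviation of x
$\pi$ = 3.14159 . . .
$e$ = 2.71828 . . .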

Page 24: Statr sessions 9 to 10

Normal Distribution – Calculating Probabilities

• Rather than create a different table for every normal distribution (with different means and standard deviations), we can convert any normal distribution to the standardized normal distribution by computing Z-scores

• A z-score gives the number of standard deviations that a value x is above the mean (a negative z-score means x is below the mean).

• The Z distribution is the normal distribution with a mean of 0 and a standard deviation of 1

$z = \frac{x - \mu}{\sigma}$

Page 25: Statr sessions 9 to 10

Standardized Normal Distribution – Calculating Probabilities

• Z distribution probability values are given in table A5 of your book or can be calculated using software

• Table A5 gives the total area under the Z curve between 0 and any point on the positive Z axis

• Since the curve is symmetric, the area under the curve between 0 and Z is the same whether Z is positive or negative

Page 26: Statr sessions 9 to 10

Standardized Normal Distribution – Calculating Probabilities – z table

        Second Decimal Place in z
z       0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.00    0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.10    0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.20    0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.30    0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
...
0.90    0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.00    0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.10    0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.20    0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
...
2.00    0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
...
3.00    0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
3.40    0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4998
3.50    0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998

Page 27: Statr sessions 9 to 10

Table Lookup of a Standardized Normal Probability

[Figure: standard normal curve, Z axis from -3 to 3.]

$P(0 \le Z \le 1) = 0.3413$

Z       0.00    0.01    0.02
0.00    0.0000  0.0040  0.0080
0.10    0.0398  0.0438  0.0478
0.20    0.0793  0.0832  0.0871
...
1.00    0.3413  0.3438  0.3461
1.10    0.3643  0.3665  0.3686
1.20    0.3849  0.3869  0.3888

Page 28: Statr sessions 9 to 10

Applying the Z Formula

X is normally distributed with µ = 485 and σ = 105.

$P(485 \le X \le 600) = P(0 \le Z \le 1.10) = 0.3643$

For X = 485: $Z = \frac{X - \mu}{\sigma} = \frac{485 - 485}{105} = 0$

For X = 600: $Z = \frac{X - \mu}{\sigma} = \frac{600 - 485}{105} = 1.10$

Z       0.00    0.01    0.02
0.00    0.0000  0.0040  0.0080
0.10    0.0398  0.0438  0.0478
...
1.00    0.3413  0.3438  0.3461
1.10    0.3643  0.3665  0.3686
1.20    0.3849  0.3869  0.3888

Page 29: Statr sessions 9 to 10

Applying the Z Formula

X is normally distributed with µ = 494 and σ = 100.

$P(X \le 550) = P(Z \le 0.56) = 0.7123$

For X = 550: $Z = \frac{X - \mu}{\sigma} = \frac{550 - 494}{100} = 0.56$

0.5 + 0.2123 = 0.7123

Page 30: Statr sessions 9 to 10

Applying the Z Formula

X is normally distributed with µ = 494 and σ = 100.

$P(X \ge 700) = P(Z \ge 2.06) = 0.0197$

For X = 700: $Z = \frac{X - \mu}{\sigma} = \frac{700 - 494}{100} = 2.06$

0.5 - 0.4803 = 0.0197

Page 31: Statr sessions 9 to 10

Applying the Z Formula

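X is normally distributed with µ = 494 and σ = 100.

$P(300 \le X \le 600) = P(-1.94 \le Z \le 1.06) = 0.8292$

For X = 300: $Z = \frac{X - \mu}{\sigma} = \frac{300 - 494}{100} = -1.94$

For X = 600: $Z = \frac{X - \mu}{\sigma} = \frac{600 - 494}{100} = 1.06$

0.4738 + 0.3554 = 0.8292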
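The four "Applying the Z Formula" examples can also be computed in software instead of with the table. The sketch below builds the normal cumulative distribution function from the standard library's error function; `normal_cdf` is an illustrative helper name, not something defined in the slides.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution, via the z-score and the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# mu = 485, sigma = 105: P(485 <= X <= 600)
p_485_600 = normal_cdf(600, 485, 105) - normal_cdf(485, 485, 105)

# mu = 494, sigma = 100
p_le_550 = normal_cdf(550, 494, 100)                               # P(X <= 550)
p_ge_700 = 1.0 - normal_cdf(700, 494, 100)                         # P(X >= 700)
p_300_600 = normal_cdf(600, 494, 100) - normal_cdf(300, 494, 100)  # P(300 <= X <= 600)

print(f"P(485 <= X <= 600) = {p_485_600:.4f}")
print(f"P(X <= 550)        = {p_le_550:.4f}")
print(f"P(X >= 700)        = {p_ge_700:.4f}")
print(f"P(300 <= X <= 600) = {p_300_600:.4f}")
```

The results should match the table-based answers closely. The first differs slightly in the third decimal place because the slide rounds z = 115/105 to 1.10 before the lookup, whereas the code uses the unrounded value.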