
Transcript
Page 1: Statistics (recap)

Statistics (Recap)

Finance & Management Students

Farzad Javidanrad

October 2013

University of Nottingham-Business School

Page 2: Statistics (recap)

Probability

• Some Preliminary Concepts:

Random: Something that happens (occurs) by chance.

Population: The set of all possible outcomes of a random experiment, or a collection of all members of a specific group under study. This collection forms a space from which all possible samples can be drawn; for that reason it is sometimes called the sample space.

Sample: Any subset of the population (sample space).

In tossing a die:

Random event: the event that any particular face of the die appears.

Population (sample space): the set {1, 2, 3, 4, 5, 6}.

Sample: any subset of the set above, such as {3} or {2, 4, 6}.

Page 3: Statistics (recap)

Probability

• Two events are mutually exclusive if they cannot happen together.

The occurrence of one of them prevents the occurrence of the other. For example, if a baby is a boy it cannot be a girl, and vice versa.

• Two events are independent if the occurrence of one has no effect on the chance of occurrence of the other. For example, the result of rolling a die has no impact on the outcome of flipping a coin. But in the experiment of drawing two cards consecutively from a deck of 52 (each card equally likely to be chosen), the chance of getting the second card is affected by the result of the first card.

• Two events are exhaustive if together they include all possible outcomes. For example, in rolling a die, having an odd number and having an even number are exhaustive events.

Page 4: Statistics (recap)

Probability

• If event A can happen in m different ways out of n equally likely ways, the probability of event A can be shown as its relative frequency; i.e.:

P(A) = m/n

where m is the number of ways that event A occurs and n is the total number of equally likely possible outcomes.

U: sample space (population)

A: an event (sample)

A′: mutually exclusive event with A

A & A′ are collectively exhaustive

(The slide's Venn diagram shows event A and its complement A′ inside the sample space U.)

Page 5: Statistics (recap)

Probability

• As 0 ≤ m ≤ n, it can be concluded that

0 ≤ m/n ≤ 1, or 0 ≤ P(A) ≤ 1

• P(A) = 0 means that event A cannot happen, and P(A) = 1 means that the event will happen with certainty.

• Defining A′ as the event of "non-occurrence" of event A, we find that:

P(A′) = (n − m)/n = 1 − m/n = 1 − P(A)

or P(A) + P(A′) = 1

Page 6: Statistics (recap)

Probability of Multiple Events

• If A and B are not mutually exclusive events, the probability that at least one of them happens (A or B) can be calculated as follows:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Here A ∪ B reads "A or B" and A ∩ B reads "A and B"; subtracting P(A ∩ B) removes the double-counted overlap.

Page 7: Statistics (recap)

Probability of Multiple Events

In case we are dealing with three events:

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)

(The slide illustrates this with a three-circle Venn diagram of P(A), P(B), P(C) and the central overlap P(A ∩ B ∩ C).)

Page 8: Statistics (recap)

Probability of Multiple Events

• Considering P(A ∪ B) = P(A) + P(B) − P(A ∩ B), we can have the following situations:

1. If A and B are mutually exclusive events, then:

P(A ∩ B) = 0

2. If A and B are two independent events, then:

P(A ∩ B) = P(A) × P(B)

3. If A and B are dependent events, then:

P(A ∩ B) = P(A) × P(B|A) = P(B) × P(A|B)

where P(A|B) and P(B|A) are conditional probabilities; P(A|B) means the probability of event A given that event B has already happened.

Page 9: Statistics (recap)

Probability of Multiple Events

o The probability of picking at random a Heart or a Queen in a single draw from a deck of 52 cards is:

P(H ∪ Q) = P(H) + P(Q) − P(H ∩ Q) = 13/52 + 4/52 − 1/52 = 4/13

o The probability of getting a 1 or a 4 on a single toss of a fair die is:

P(1 ∪ 4) = P(1) + P(4) = 1/6 + 1/6 = 1/3

As they cannot happen together, they are mutually exclusive events and P(1 ∩ 4) = 0.

o The probability of getting two heads in the experiment of tossing two fair coins (two independent events) is:

P(H ∩ H) = 1/2 × 1/2 = 1/4
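The Heart-or-Queen example can be checked by brute-force enumeration of the deck; this sketch (not part of the slides, with illustrative rank/suit names) confirms that the inclusion-exclusion formula agrees with direct counting:

```python
# A sketch verifying P(H ∪ Q) = P(H) + P(Q) − P(H ∩ Q) = 4/13 by
# enumerating all 52 equally likely cards.
from fractions import Fraction
from itertools import product

suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = list(product(ranks, suits))            # 52 cards as (rank, suit)

H = {c for c in deck if c[1] == "hearts"}     # the 13 hearts
Q = {c for c in deck if c[0] == "Q"}          # the 4 queens

p_union = Fraction(len(H | Q), len(deck))     # direct count of H or Q
p_formula = (Fraction(len(H), 52) + Fraction(len(Q), 52)
             - Fraction(len(H & Q), 52))      # inclusion-exclusion

print(p_union, p_formula)                     # 4/13 4/13
```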

Page 10: Statistics (recap)

Probability of Multiple Events

o The probability of picking two aces, without returning the first card to the deck of 52 playing cards, involves a conditional probability:

P(1st ace ∩ 2nd ace) = P(1st ace) × P(2nd ace | 1st ace)

or, written more compactly:

P(A1 ∩ A2) = P(A1) × P(A2|A1) = 4/52 × 3/51 = 1/221

• If two events A and B are independent of each other, then:

P(A|B) = P(A) and P(B|A) = P(B)

Page 11: Statistics (recap)

Random Variable & Probability Distribution

Some Basic Concepts:

• Variable: A letter (symbol) which represents the elements of a specific set.

• Random Variable: A variable whose values appear randomly according to a probability distribution.

• Probability Distribution: A rule (function) which assigns a probability to each value of a random variable.

• Variables (including random variables) are divided into two general categories:

1) Discrete Variables, and

2) Continuous Variables

Page 12: Statistics (recap)

Random Variable & Probability Distribution

• A discrete variable is a variable whose elements (values) can be put in correspondence with the set of natural numbers or any subset of it. So it is possible to order and count its values; the number of values can be finite or infinite.

• For a discrete variable it is not possible to define a neighbourhood, however small, around any value in its domain. There is a jump from one value to the next.

• If the elements of the domain of a variable can be put in correspondence with the set of real numbers or any subset of it, the variable is called continuous. It is not possible to order and count the values of a continuous variable. A variable is continuous if, for any value in its domain, a neighbourhood, however small, can be defined.

Page 13: Statistics (recap)

Random Variable & Probability Distribution

• Probability Distribution: A rule (function) that associates a probability either with each possible value of a random variable (RV) individually, or with a set of them in an interval.*

• For a discrete RV this rule associates a probability with each possible individual outcome. For example, the probability distribution for the number of Heads when flipping a fair coin (note: Σ Pᵢ = 1):

In one trial (outcomes H, T):

x      0     1
P(x)   0.5   0.5

In two trials (outcomes HH, HT, TH, TT):

x      0      1     2
P(x)   0.25   0.5   0.25

o The probability distribution for the price change of a share in the stock market in one day:

x = Price change   (+1)   (0)   (−1)
P(x)               0.6    0.1   0.3
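The two-trial table above can be rebuilt mechanically by listing the four equally likely outcomes and counting Heads; a short sketch (not part of the slides):

```python
# A sketch rebuilding the two-trial table: x = number of Heads in two
# fair-coin flips, from the equally likely outcomes HH, HT, TH, TT.
from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=2))             # HH, HT, TH, TT
counts = Counter(o.count("H") for o in outcomes)     # heads per outcome

dist = {x: counts[x] / len(outcomes) for x in sorted(counts)}
print(dist)                                          # {0: 0.25, 1: 0.5, 2: 0.25}
assert abs(sum(dist.values()) - 1) < 1e-12           # probabilities sum to 1
```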

Page 14: Statistics (recap)

Probability Distributions (Continuous)

• The probability that a continuous random variable takes exactly one of the values in its domain is zero, because the number of all possible outcomes n is infinite and m/∞ → 0.

• For the above reason, the probability for a continuous random variable needs to be calculated over an interval.

• The probability distribution of a continuous random variable is often called a probability density function (PDF), or simply probability function. It is usually denoted f(x) and has the following properties:

I. f(x) ≥ 0 (similar to P(x) ≥ 0 for a discrete RV*)

II. ∫_{−∞}^{+∞} f(x) dx = 1 (similar to Σ P(x) = 1 for a discrete RV)

III. ∫_{a}^{b} f(x) dx = P(a ≤ x ≤ b) = F(b) − F(a) (the probability given to the set of values in the interval [a, b])**

Page 15: Statistics (recap)

Probability Distributions (Continuous)

• where F(x) is the integral of the PDF f(x); it is called the Cumulative Distribution Function (CDF) and for any real value of x is defined as:

F(x) ≡ P(X ≤ x)

The CDF gives the area under the PDF f(x) from −∞ to x. For a discrete random variable, the CDF is the sum of the probabilities of all values up to and including x.

(The slide's figure shows the PDF f(x) with the shaded area F(x) ≡ P(X ≤ x).)

Adopted from http://beyondbitsandatomsblog.stanford.edu/spring2010/tag/embodied-artifacts/

Page 16: Statistics (recap)

Some Characteristics of Probability Distributions

• Expected Value (Probabilistic Mean Value): One of the most important measures, showing the central tendency of the distribution. It is the weighted average of all possible values of the random variable x, and it is denoted E(x).

• For a discrete RV (with n possible outcomes):

E(x) = x₁P(x₁) + x₂P(x₂) + ⋯ + xₙP(xₙ) = Σ_{i=1}^{n} xᵢP(xᵢ)

• For a continuous RV:

E(x) = ∫_{−∞}^{+∞} x·f(x) dx

Page 17: Statistics (recap)

Some Characteristics of Probability Distributions

• Properties of E(x):

i. If c is a constant then E(c) = c.

ii. If a and b are constants then E(ax + b) = aE(x) + b.

iii. If a₁, …, aₙ are constants then

E(a₁x₁ + ⋯ + aₙxₙ) = a₁E(x₁) + ⋯ + aₙE(xₙ)

or

E(Σ_{i=1}^{n} aᵢxᵢ) = Σ_{i=1}^{n} aᵢE(xᵢ)

iv. If x and y are independent random variables then

E(xy) = E(x)·E(y)

Page 18: Statistics (recap)

Some Characteristics of Probability Distributions

v. If ๐’ˆ ๐’™ is a function of random variable ๐’™ then

๐‘ฌ ๐’ˆ ๐’™ = ๐’ˆ ๐’™ .๐‘ท(๐’™)

๐‘ฌ ๐’ˆ ๐’™ = ๐’ˆ ๐’™ . ๐’‡ ๐’™ ๐’…๐’™

โ€ข Variance: To measure how random variable ๐’™ is dispersed around its expected value, variance can help. If we show ๐‘ฌ ๐’™ = ๐ , then

๐’—๐’‚๐’“ ๐’™ = ๐ˆ๐Ÿ = ๐‘ฌ[ ๐’™ โˆ’ ๐‘ฌ ๐’™๐Ÿ]

= ๐‘ฌ[ ๐’™ โˆ’ ๐ ๐Ÿ]

= ๐‘ฌ[๐’™๐Ÿ โˆ’ ๐Ÿ๐’™๐ + ๐๐Ÿ]

= ๐‘ฌ ๐’™๐Ÿ โˆ’ ๐Ÿ๐๐‘ฌ ๐’™ + ๐๐Ÿ

= ๐‘ฌ ๐’™๐Ÿ โˆ’ ๐๐Ÿ

For discreet RV

For continuous RV

Page 19: Statistics (recap)

Some Characteristics of Probability Distributions

var(x) = Σ_{i=1}^{n} (xᵢ − μ)²·P(xᵢ)          (for a discrete RV)

var(x) = ∫_{−∞}^{+∞} (x − μ)²·f(x) dx         (for a continuous RV)

• Properties of Variance:

i. If c is a constant then var(c) = 0.

ii. If a and b are constants then var(ax + b) = a²·var(x).

iii. If x and y are independent random variables then

var(x ± y) = var(x) + var(y)

(this can be extended to more variables)
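The identity var(x) = E(x²) − μ² derived above can be checked numerically on the share-price distribution from the earlier slide (x = +1, 0, −1 with P = 0.6, 0.1, 0.3); a short sketch, not part of the slides:

```python
# A sketch checking var(x) = E[(x − μ)²] = E(x²) − μ² on the discrete
# share-price distribution from the slides.
xs = [1, 0, -1]
ps = [0.6, 0.1, 0.3]

mu = sum(x * p for x, p in zip(xs, ps))                     # E(x) = 0.3
var_def = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))    # E[(x − μ)²]
var_alt = sum(x * x * p for x, p in zip(xs, ps)) - mu ** 2  # E(x²) − μ²

print(mu, var_def, var_alt)
```

Both routes give the same variance (0.81 here), as the algebra on the slide promises.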

Page 20: Statistics (recap)

Probability Distributions (Discrete RV)

• Some of the well-known probability distributions are:

• The Binomial Distribution:

1. The probability of the occurrence of an event is p and does not change.

2. The experiment is repeated n times.

3. The probability that out of n trials the event appears x times is:

P(x) = [n! / (x!(n − x)!)] pˣ(1 − p)ⁿ⁻ˣ

The mean value and standard deviation of the binomial distribution are:

μ = Σ_{i=0}^{n} xᵢ·P(xᵢ) = np and σ = √[Σ_{i=0}^{n} (xᵢ − μ)²·P(xᵢ)] = √[np(1 − p)]

So, to show that the probability distribution of the random variable X is binomial we can write X ~ Bi(np, np(1 − p)), listing its mean and variance.

Page 21: Statistics (recap)

Probability Distributions (Discrete RV)

• A gambler thinks his chance of getting a 1 when rolling a die is high. What is his chance of getting four 1s out of six rolls of a fair die?

The probability of getting a 1 in an individual trial is 1/6, and it remains the same in all 6 trials. So,

P(x = 4) = [6! / (4!·2!)] (1/6)⁴ (5/6)² = 375/46656 ≈ 0.008, i.e. less than 1%
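The binomial formula can be wrapped in a few lines of Python and applied to the gambler example; exact fractions make the arithmetic transparent (a sketch, not part of the slides):

```python
# A sketch of the binomial pmf P(x) = C(n, x) p^x (1 − p)^(n−x),
# applied to "four 1s in six rolls of a fair die".
from math import comb
from fractions import Fraction

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n trials with success prob p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p = binom_pmf(4, 6, Fraction(1, 6))
print(p, float(p))               # 125/15552 ≈ 0.008

# The mean of the distribution is np = 6 × 1/6 = 1:
mean = sum(x * binom_pmf(x, 6, Fraction(1, 6)) for x in range(7))
```

Note 375/46656 reduces to 125/15552; the check on `mean` confirms the μ = np formula from the previous slide.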

• The Poisson Distribution:

1. It is used to calculate the probability of a given number of desired events (number of successes) in a specific period of time.

2. The average number of desired events (number of successes) per unit of time remains constant.

Page 22: Statistics (recap)

Probability Distributions (Discrete RV)

• So, the probability of having x successes is calculated by:

P(x) = λˣ e^(−λ) / x!

where λ is the average number of successes in the specific period of time and e ≈ 2.7183.

• The mean value and standard deviation of the Poisson distribution are:

μ = Σ_{i=0}^{n} xᵢ·P(xᵢ) = λ and σ = √[Σ_{i=0}^{n} (xᵢ − μ)²·P(xᵢ)] = √λ

So, to show that the probability distribution of the random variable X is Poisson we can write X ~ Poi(λ, λ), listing its mean and variance.

o The emergency section in a hospital receives 2 calls per half hour (4 calls per hour). The probability of getting exactly 2 calls in a randomly chosen hour on a random day is:

P(x = 2) = 4² e⁻⁴ / 2! = 0.1465 ≈ 15%
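The emergency-calls example can be reproduced directly from the Poisson formula; the sketch below (not part of the slides) also sums the pmf to confirm that the mean is λ:

```python
# A sketch of the Poisson pmf P(x) = λ^x e^(−λ) / x!, applied to the
# hospital example (λ = 4 calls per hour, exactly 2 calls).
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x successes when the average rate is lam."""
    return lam ** x * exp(-lam) / factorial(x)

p = poisson_pmf(2, 4)
print(round(p, 4))          # 0.1465

# The mean of the distribution is λ (summing far enough out for λ = 4):
mean = sum(x * poisson_pmf(x, 4) for x in range(100))
```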

Page 23: Statistics (recap)

The Normal Distribution (Continuous RV)

• The Normal Distribution: It is the best-known probability distribution, reflecting the nature of many random variables in the world. The probability density function (PDF) of the normal distribution is:

1. Symmetrical around its mean value (μ).

2. Bell-shaped, with two tails approaching the horizontal axis asymptotically as we move further away from the mean.

Adopted from http://www.pdnotebook.com/2010/06/statistical-tolerance-analysis-root-sum-square/

Page 24: Statistics (recap)

The Normal Distribution (Continuous RV)

3. The probability density function (PDF) of the normal distribution can be represented by:

f(x) = [1 / (σ√(2π))] e^(−(x − μ)² / (2σ²))     (−∞ < x < +∞)

where μ and σ are the mean and standard deviation respectively:

μ = ∫_{−∞}^{+∞} x·f(x) dx and σ = √[∫_{−∞}^{+∞} (x − μ)²·f(x) dx]

So, X ~ N(μ, σ²).

• A linear combination of independent normally distributed random variables is itself normally distributed; that is,

if X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²) and if Z = aX + bY then

Z ~ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²)

• This can be extended to more than two random variables.
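The linear-combination property can be checked by simulation; this sketch (not part of the slides, with illustrative parameter values) draws independent X ~ N(2, 9) and Y ~ N(1, 4) and verifies that Z = 3X + 2Y has mean 3·2 + 2·1 = 8 and variance 9·9 + 4·4 = 97:

```python
# A simulation sketch of Z = aX + bY ~ N(aμ1 + bμ2, a²σ1² + b²σ2²)
# for independent X ~ N(2, 9) and Y ~ N(1, 4), with a = 3, b = 2.
import random
import statistics

random.seed(42)
n = 100_000
# random.gauss takes (mean, standard deviation), so sd = 3 and sd = 2
zs = [3 * random.gauss(2, 3) + 2 * random.gauss(1, 2) for _ in range(n)]

print(round(statistics.mean(zs), 2), round(statistics.variance(zs), 1))
```

The sample mean lands near 8 and the sample variance near 97, matching the formula on the slide.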

Page 25: Statistics (recap)

The Normal Distribution (Continuous RV)

• Recalling the last property of the PDF (∫_{a}^{b} f(x) dx = P(a ≤ x ≤ b)), it is difficult to calculate probabilities from the normal PDF above for every different pair of values of μ and σ. The solution to this problem is to transform the normal variable x into the standardised normal variable (or simply, standard normal variable) z, by:

z = (x − μ)/σ

Its parameters are the same for every normal random variable, whatever the values of μ and σ², because we always have E(z) = 0 and var(z) = 1 (why?).

• The probability distribution for the standard normal variable is defined as:

f(z) = [1/√(2π)] e^(−z²/2), Z ~ N(0, 1)

X ~ N(μ, σ²) → (standardised) → Z ~ N(0, 1)

Adopted and amended from http://www.mathsisfun.com/data/standard-normal-distribution.html

Page 26: Statistics (recap)

The Standard Normal Distribution

• Properties of the standard normal distribution curve:

1. It is symmetrical around the y-axis.

2. The area under the curve can be split into two equal areas, that is:

∫_{−∞}^{0} f(z) dz = ∫_{0}^{+∞} f(z) dz = 0.5

• To find the area under the curve to the left of z₁ = 1.26, using the z-table (next slide), we have:

P(z ≤ z₁ = 1.26) = ∫_{−∞}^{0} f(z) dz + ∫_{0}^{z₁} f(z) dz = 0.5 + 0.3962 = 0.8962 ≈ 90%

(The slide's figure shows the standard normal curve f(z), split into two 50% halves, with the area up to z₁ = 1.26 shaded.)
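The table lookup above can be reproduced without a table: the standard normal CDF can be written with `math.erf`, as in this sketch (not part of the slides):

```python
# A sketch reproducing the z-table lookup P(z ≤ 1.26) ≈ 0.8962 using the
# standard normal CDF, Φ(z) = ½(1 + erf(z/√2)).
from math import erf, sqrt

def phi(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(phi(1.26), 4))        # 0.8962
print(round(phi(1.26) - 0.5, 4))  # 0.3962, the tabulated area from 0 to z
```

Subtracting 0.5 recovers exactly the "area from 0 to z" convention used by the z-table on the next slide.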

Page 27: Statistics (recap)

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359

0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753

0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141

0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879

0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224

0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549

0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852

0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133

0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389

1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621

1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830

1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177

1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441

1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545

1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767

2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857

2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890

2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936

2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952

2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964

2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974

2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981

2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Page 28: Statistics (recap)

Working with the Z-Table

• To find the probability:

P(0.89 < z < 1.5) = ∫_{0}^{z₂} f(z) dz − ∫_{0}^{z₁} f(z) dz
= F(1.5) − F(0.89) = 0.4332 − 0.3133 = 0.1199 ≈ 12%

as both values are positive (here F denotes the tabulated area from 0 to z).

• To find a probability in the negative region we need to find the equivalent area on the positive side:

P(−1.32 < z < −1.25) = P(1.25 < z < 1.32)
= F(1.32) − F(1.25)
= 0.4066 − 0.3944 = 0.0122 ≈ 1%

Page 29: Statistics (recap)

Working with the Z-Table

• To find P(z ≤ −2.15) we can write:

∫_{−∞}^{−2.15} f(z) dz = ∫_{−∞}^{0} f(z) dz − ∫_{−2.15}^{0} f(z) dz
= 0.5 − 0.4842 = 0.0158 ≈ 2%

(by symmetry, ∫_{−2.15}^{0} f(z) dz ≡ ∫_{0}^{2.15} f(z) dz, which the table gives as 0.4842)

• And finally, to find P(z ≥ 1.93), we have:

∫_{1.93}^{+∞} f(z) dz = ∫_{0}^{+∞} f(z) dz − ∫_{0}^{1.93} f(z) dz
= 0.5 − 0.4732 = 0.0268

Page 30: Statistics (recap)

An Example

o If the income of employees in a big company is normally distributed with μ = £20000 and σ = £4000, what is the probability that a randomly picked employee has an income a) above £22000, b) between £16000 and £24000?

a) We need to transform x to z first:

P(x > 22000) = P((x − 20000)/4000 > (22000 − 20000)/4000)
= P(z > 0.5) = 0.5 − 0.1915 = 0.3085 ≈ 31%

b) P(16000 < x < 24000) = P((16000 − 20000)/4000 < (x − 20000)/4000 < (24000 − 20000)/4000)
= P(−1 < z < 1)
= 0.3413 + 0.3413
= 0.6826 ≈ 68%
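The same answers come out of the standardisation z = (x − μ)/σ combined with the normal CDF; a sketch of the income example (not part of the slides):

```python
# A sketch of the income example: X ~ N(20000, 4000²), standardised and
# evaluated with the normal CDF via math.erf.
from math import erf, sqrt

def phi(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 20000, 4000
p_above_22000 = 1 - phi((22000 - mu) / sigma)                 # P(z > 0.5)
p_16_to_24 = phi((24000 - mu) / sigma) - phi((16000 - mu) / sigma)  # P(−1 < z < 1)

print(round(p_above_22000, 4))   # 0.3085
print(round(p_16_to_24, 4))      # 0.6827
```

The second value is 0.6827 to four decimals; the slide's 0.6826 comes from the truncated table entries.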

Page 31: Statistics (recap)

The χ² (Chi-Squared) Distribution

• The χ² (Chi-Squared) Distribution:

Let Z₁, Z₂, …, Z_k be k independent standard normally distributed random variables; then the sum of their squares

X = Σ_{i=1}^{k} Zᵢ²

has a chi-squared distribution with degrees of freedom equal to the number of random variables (df = k). So, X ~ χ²_k.

The mean value and standard deviation of an RV with a chi-squared distribution are k and √(2k) respectively. So we can write:

X ~ χ²_k(k, 2k)

Probability Density Function (PDF) of the χ² Distribution

Adopted from http://2012books.lardbucket.org/books/beginning-statistics/s15-chi-square-tests-and-f-tests.html

Page 32: Statistics (recap)

Ad

op

ted

from

http

://ww

w.d

ocsto

c.com

/do

cs/80811

492/chi--sq

uare

-table

๐‘ƒ ๐‘ฅ2 = 32 ๐‘‘๐‘“ = 16 = 0.01 or ๐‘ฅ20.01 ,16 = 32

Page 33: Statistics (recap)

The t-Distribution

• If Z ~ N(0, 1) and X ~ χ²_k, and the two random variables Z and X are independent, then the random variable

t = Z / √(X/k) = Z·√k / √X

follows Student's t-distribution (the t-distribution) with k degrees of freedom. For a sample of size n we have df = k = n − 1.

• The mean value and standard deviation of this distribution are:

μ = 0 for n > 2 (undefined for n = 1, 2)

σ = √[(n − 1)/(n − 3)] for n > 3; ∞ for n = 3; undefined for n = 1, 2

Page 34: Statistics (recap)

The t-Distribution

• The t-distribution, like the standard normal distribution, is a bell-shaped and symmetrical distribution with zero mean (n > 2), but it is flatter. As the degrees of freedom increase (i.e. as n increases) it approaches the standard normal distribution, and for n ≥ 30 their behaviours are similar.

• From the table (next slide):

P(t ≥ 1.706 | df = 26) = 0.05 ≈ 5% or t_{0.05,26} = 1.706

(The slide's figure shades the 5% upper tail beyond t = 1.706.)

Adopted from http://education-portal.com/academy/lesson/what-is-a-t-test-procedure-interpretation-examples.html#lesson
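The construction t = Z/√(X/k) can be simulated directly; this Monte Carlo sketch (not part of the slides) checks the table value P(t ≥ 1.706 | df = 26) ≈ 0.05:

```python
# A simulation sketch of t = Z / sqrt(X/k) with k = 26 degrees of
# freedom, checking the tabulated tail P(t ≥ 1.706) ≈ 0.05.
import random
from math import sqrt

random.seed(1)
k, n = 26, 100_000

def chi2(df):
    """One chi-squared draw: a sum of df squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

ts = [random.gauss(0, 1) / sqrt(chi2(k) / k) for _ in range(n)]
p_tail = sum(1 for t in ts if t >= 1.706) / n
print(round(p_tail, 3))
```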

Page 35: Statistics (recap)

df 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.0025 0.001 0.0005

1 1.376 1.963 3.078 6.314 12.706 31.821 63.656 127.321 318.289 636.578

2 1.061 1.386 1.886 2.920 4.303 6.965 9.925 14.089 22.328 31.600

3 0.978 1.250 1.638 2.353 3.182 4.541 5.841 7.453 10.214 12.924

4 0.941 1.190 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.610

5 0.920 1.156 1.476 2.015 2.571 3.365 4.032 4.773 5.894 6.869

6 0.906 1.134 1.440 1.943 2.447 3.143 3.707 4.317 5.208 5.959

7 0.896 1.119 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.408

8 0.889 1.108 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.041

9 0.883 1.100 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.781

10 0.879 1.093 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587

11 0.876 1.088 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437

12 0.873 1.083 1.356 1.782 2.179 2.681 3.055 3.428 3.930 4.318

13 0.870 1.079 1.350 1.771 2.160 2.650 3.012 3.372 3.852 4.221

14 0.868 1.076 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.140

15 0.866 1.074 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.073

16 0.865 1.071 1.337 1.746 2.120 2.583 2.921 3.252 3.686 4.015

17 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.222 3.646 3.965

18 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.197 3.610 3.922

19 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.883

20 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.850

21 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.135 3.527 3.819

22 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.792

23 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.104 3.485 3.768

24 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.091 3.467 3.745

25 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.078 3.450 3.725

26 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.707

27 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.689

28 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.674

29 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.660

30 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.030 3.385 3.646

31 0.853 1.054 1.309 1.696 2.040 2.453 2.744 3.022 3.375 3.633

32 0.853 1.054 1.309 1.694 2.037 2.449 2.738 3.015 3.365 3.622

33 0.853 1.053 1.308 1.692 2.035 2.445 2.733 3.008 3.356 3.611

34 0.852 1.052 1.307 1.691 2.032 2.441 2.728 3.002 3.348 3.601

35 0.852 1.052 1.306 1.690 2.030 2.438 2.724 2.996 3.340 3.591

36 0.852 1.052 1.306 1.688 2.028 2.434 2.719 2.990 3.333 3.582

37 0.851 1.051 1.305 1.687 2.026 2.431 2.715 2.985 3.326 3.574

38 0.851 1.051 1.304 1.686 2.024 2.429 2.712 2.980 3.319 3.566

39 0.851 1.050 1.304 1.685 2.023 2.426 2.708 2.976 3.313 3.558

40 0.851 1.050 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3.551

50 0.849 1.047 1.299 1.676 2.009 2.403 2.678 2.937 3.261 3.496

60 0.848 1.045 1.296 1.671 2.000 2.390 2.660 2.915 3.232 3.460

80 0.846 1.043 1.292 1.664 1.990 2.374 2.639 2.887 3.195 3.416

100 0.845 1.042 1.290 1.660 1.984 2.364 2.626 2.871 3.174 3.390

150 0.844 1.040 1.287 1.655 1.976 2.351 2.609 2.849 3.145 3.357

Infinity 0.842 1.036 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.290

Page 36: Statistics (recap)

The F Distribution

• If Z₁ ~ χ²_{k₁} and Z₂ ~ χ²_{k₂}, and Z₁ and Z₂ are independent, then the random variable

F = (Z₁/k₁) / (Z₂/k₂)

follows the F distribution with k₁ and k₂ degrees of freedom, i.e.:

F ~ F_{k₁,k₂} or F ~ F(k₁, k₂)

• This distribution is skewed to the right, like the chi-squared distribution, but as k₁ and k₂ increase (n → ∞) it approaches the normal distribution.

Adopted from http://www.vosesoftware.com/ModelRiskHelp/index.htm#Distributions/Continuous_distributions/F_distribution.htm

Page 37: Statistics (recap)

The F Distribution

• The mean and standard deviation of the F distribution are:

μ = k₂/(k₂ − 2) for k₂ > 2, and

σ = [k₂/(k₂ − 2)]·√[2(k₁ + k₂ − 2) / (k₁(k₂ − 4))] for k₂ > 4

• Relation between the t and Chi-Squared distributions and the F distribution:

• For a random variable X ~ t_k it can be shown that X² ~ F_{1,k}. This can also be written as

t²_k = F_{1,k}

• If k₂ is large enough, then k₁·F_{k₁,k₂} ~ χ²_{k₁}.
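The relation t²_k = F_{1,k} can be illustrated by simulation: squared t draws and F(1, k) draws should share the same distribution, and in particular the same mean k/(k − 2). A sketch (not part of the slides):

```python
# A simulation sketch of t_k² = F_{1,k}: squared t samples and F(1, k)
# samples should both have mean k/(k − 2) (≈ 1.083 for k = 26).
import random
from math import sqrt

random.seed(2)
k, n = 26, 50_000

def chi2(df):
    """One chi-squared draw: a sum of df squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

t_sq = [(random.gauss(0, 1) / sqrt(chi2(k) / k)) ** 2 for _ in range(n)]
f_1k = [(chi2(1) / 1) / (chi2(k) / k) for _ in range(n)]

print(round(sum(t_sq) / n, 3), round(sum(f_1k) / n, 3), round(k / (k - 2), 3))
```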

Page 38: Statistics (recap)

α = 0.25. All adopted from http://www.stat.purdue.edu/~yuzhu/stat514s05/tables.html

Page 39: Statistics (recap)

α = 0.10

Page 40: Statistics (recap)

α = 0.05

Page 41: Statistics (recap)

α = 0.025

Page 42: Statistics (recap)

α = 0.01

Page 43: Statistics (recap)

Statistical Inference (Estimation)

• Statistical inference, or statistical induction, is one of the most important aspects of decision making. It refers to the process of drawing a conclusion about the unknown parameters of a population from a sample of randomly chosen data.

• So, the idea is that a sample of randomly chosen data provides the best information about the parameters of the population, and it can be considered representative of the population when its size is reasonably (appropriately) large.

• The first step in statistical inference (induction) is estimation, which is the process of finding an estimate or approximation of the population parameters (such as the mean value and standard deviation) using the data in the sample.

Page 44: Statistics (recap)

Statistical Inference (Estimation)

• The value of X̄ (the sample mean) in a randomly chosen and appropriately large sample is a good estimator of the population mean μ. The value of s² (the sample variance) is likewise a good estimator of the population variance σ².

• Before taking any sample from the population (when the sample is not yet realised or observed) we can talk about the probability distribution of a hypothetical sample. The probability distribution of a random variable x in a hypothetical sample follows the probability distribution of the population, even if the sampling process is repeated many times.

• But the probability distribution of the sample mean X̄ in repeated sampling does not necessarily follow the probability distribution of the population as the number of samples increases.

Page 45: Statistics (recap)

Central Limit Theorem

• Central Limit Theorem:

Imagine a random variable X with any probability distribution, defined in a population with mean μ and variance σ². Suppose we take n independent samples X₁, X₂, …, Xₙ and for each sample calculate the mean, giving X̄₁, X̄₂, …, X̄ₙ (see figure below):

X ~ i.i.d.(μ, σ²) → X̄₁, X̄₂, …, X̄ₙ

i.i.d. ≡ Independent & Identically Distributed RVs

Page 46: Statistics (recap)

Central Limit Theorem

As the number of samples increases indefinitely, the random variable X̄ has a normal distribution (regardless of the population distribution) and we have:

X̄ ~ N(μ, σ²/n) when n → +∞

And in the standard form:

Z = (X̄ − μ_X̄)/σ_X̄ = (X̄ − μ)/(σ/√n) = √n(X̄ − μ)/σ ~ N(0, 1)

o Taking a sample of 36 elements from a population with mean 20 and standard deviation 12, what is the probability that the sample mean falls between 18 and 24?

P(18 < x̄ < 24) = P(−1 < (x̄ − 20)/(12/√36) < 2) = 0.3413 + 0.4772 ≈ 82%
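The example above can be checked by simulating the sampling process; this sketch (not part of the slides) draws samples of 36 from a deliberately non-normal population with μ = 20 and σ = 12, chosen here as a uniform distribution for illustration:

```python
# A simulation sketch of the CLT with the slide's numbers: samples of
# size 36 from a non-normal population with mean 20 and sd 12, checking
# P(18 < x̄ < 24) ≈ 82%.
import random
from math import sqrt

random.seed(3)
n_samples, size, mu, sigma = 50_000, 36, 20, 12

def draw():
    # a uniform population on [μ − σ√3, μ + σ√3] has mean μ and sd σ
    return random.uniform(mu - sigma * sqrt(3), mu + sigma * sqrt(3))

means = [sum(draw() for _ in range(size)) / size for _ in range(n_samples)]
p = sum(1 for m in means if 18 < m < 24) / n_samples
print(round(p, 2))
```

Despite the flat population, the sample means behave like N(20, 144/36) = N(20, 4), and the estimated probability lands near 0.82.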

Page 47: Statistics (recap)

Estimation

• In the previous slides we introduced some of the most important probability distributions for discrete and continuous random variables.

• In many cases we know the nature of the probability distribution of a random variable defined in a population, but have no idea about its parameters, such as the mean value and/or standard deviation.

• Point Estimation:

• To estimate the unknown parameters of the probability distribution of a random variable we can make either a point estimate or an interval estimate, using an estimator.

• The estimator is a function of the sample values x₁, x₂, …, xₙ and is often called a statistic. If θ̂ represents that estimator we have:

θ̂ = f(x₁, x₂, …, xₙ)

Page 48: Statistics (recap)

Estimationโ€ข ๐œฝ is said to be an unbiased estimator of true ๐œฝ (parameter of the

population) if ๐‘ฌ ๐œฝ = ๐œฝ. Because the bias itself is defined as

๐‘ฉ๐’Š๐’‚๐’” = ๐‘ฌ ๐œฝ โˆ’ ๐œฝ

o For example, the sample mean ๐‘ฟ is a point and unbiased estimator for the unknown parameter ๐ (population mean):

๐œฝ = ๐‘ฟ = ๐’‡ ๐’™๐Ÿ, ๐’™๐Ÿ, โ€ฆ , ๐’™๐’ =๐Ÿ

๐’๐’™๐Ÿ + ๐’™๐Ÿ +โ‹ฏ+ ๐’™๐’

It is unbiased because ๐‘ฌ ๐‘ฟ = ๐.

Page 49: Statistics (recap)

• The sample variance in the form s² = Σ(xᵢ − x̄)²/n is a point but biased estimator of the population variance σ² in a small sample:

E(s²) = σ²(1 − 1/n) ≠ σ²

But it is a consistent estimator, because it approaches σ² when the sample size n increases indefinitely (n → ∞).

• With Bessel's correction (changing n to (n − 1)) we can define another sample variance which is unbiased even for a small sample size:

s² = Σ(xᵢ − x̄)²/(n − 1)

• The most common methods of finding point estimators are the least-squares method and the maximum likelihood method; the first of these will be discussed later.
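The bias and Bessel's correction can be seen numerically. The following simulation (an illustrative sketch, not from the slides) draws many samples of size n = 5 from a standard normal population (σ² = 1) and averages the two variance estimators:

```python
import random

random.seed(42)
mu, sigma, n, trials = 0.0, 1.0, 5, 200_000

biased_sum = 0.0    # running sum of s^2 with divisor n
unbiased_sum = 0.0  # running sum of s^2 with divisor n - 1 (Bessel's correction)
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased_sum += ss / n
    unbiased_sum += ss / (n - 1)

print(biased_sum / trials)    # close to sigma^2 * (1 - 1/n) = 0.8
print(unbiased_sum / trials)  # close to sigma^2 = 1.0
```

The divisor-n estimator settles near σ²(1 − 1/n) = 0.8, while the Bessel-corrected one settles near the true σ² = 1, matching the expectations on the slide.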


Page 50: Statistics (recap)

Interval Estimation

• Interval estimation, by contrast, provides an interval or a range of possible estimates at a specific level of probability, called the level of confidence, within which the true value of the population parameter may lie.

โ€ข If ๐œฝ๐Ÿ and ๐œฝ๐Ÿ are respectively the lowest and highest estimates of ๐œฝ

the probability that ๐œฝ is covered by the interval ๐œฝ๐Ÿ, ๐œฝ๐Ÿ is:

๐๐ซ ๐œฝ๐Ÿ โ‰ค ๐œฝ โ‰ค ๐œฝ๐Ÿ = ๐Ÿ โˆ’ ๐œถ (0 < ๐›ผ < 1)

Where ๐Ÿ โˆ’ ๐œถ is the level of confidence and ๐œถ itself is called level of

significance. The interval ๐œฝ๐Ÿ, ๐œฝ๐Ÿ is called confidence interval.

Page 51: Statistics (recap)

Interval Estimation

How to find θ̂₁ and θ̂₂? In order to find the lower and upper limits of a confidence interval we need prior knowledge about the distribution of the random variable in the population. If the random variable x is normally distributed in the population and the population standard deviation (σ) is known, the 95% confidence interval for the unknown population mean (μ) can be constructed by finding the symmetric z-values associated with 95% of the area under the standard normal curve:

1 − α = 95% → α = 5% → α/2 = 2.5%

So, ±z₀.₀₂₅ = ±1.96.

We know that z = (X̄ − μ)/σ(X̄) = (X̄ − μ)/(σ/√n), so:

P(−z_{α/2} ≤ z ≤ +z_{α/2}) = 95%

Adopted & altered from http://upload.wikimedia.org/wikipedia/en/b/bf/NormalDist1.96.png

=1โˆ’๐›ผ

๐œถ

๐Ÿ= ๐ŸŽ. ๐ŸŽ๐Ÿ๐Ÿ“

๐œถ

๐Ÿ= ๐ŸŽ. ๐ŸŽ๐Ÿ๐Ÿ“

โˆ’๐’ ๐œถ ๐Ÿ= = ๐’ ๐œถ ๐Ÿ

Page 52: Statistics (recap)

Interval Estimation

• So we can write:

P(x̄ − 1.96 σ(x̄) ≤ μ ≤ x̄ + 1.96 σ(x̄)) = 0.95

or

P(x̄ − 1.96 σ/√n ≤ μ ≤ x̄ + 1.96 σ/√n) = 0.95

Therefore, the interval (x̄ − 1.96 σ/√n , x̄ + 1.96 σ/√n) represents a 95% confidence interval (CI₉₅%) for the unknown value of μ.

It means that in repeated random sampling (say, 100 times) we expect 95 out of 100 such intervals to cover the unknown value of the population mean μ.

Adopted and altered from http://forums.anarchy-online.com/showthread.php?t=604728

Page 53: Statistics (recap)

Interval Estimation for Population Proportion

A confidence interval can also be constructed for the population proportion.

If X ~ Bi(n, p), with mean μ = np and variance σ² = np(1 − p), then each sample yields a sample proportion p̂ (p̂₁, p̂₂, …, across repeated samples). In repeated random sampling p̂ has its own probability distribution, with mean and variance:

μ(p̂) = E(p̂) = p = μ/n

σ²(p̂) = var(p̂) = σ²/n² = p(1 − p)/n

Page 54: Statistics (recap)

Interval Estimation for Population Proportion

• The 90% confidence interval for the population proportion p, when the sample size is bigger than 30 (n > 30) and there is no information about the population variance, is constructed as follows. With 1 − α = 90% we have α/2 = 0.05, so −z_{α/2} = −1.645 and +z_{α/2} = +1.645, and:

±z_{α/2} = (p̂ − p)/√(p̂(1 − p̂)/n)

P(−z_{α/2} ≤ z ≤ +z_{α/2}) = 1 − α

P(p̂ − z_{α/2}·√(p̂(1 − p̂)/n) ≤ p ≤ p̂ + z_{α/2}·√(p̂(1 − p̂)/n)) = 0.9

So, the confidence interval can simply be written as:

CI₉₀% = p̂ ∓ 1.645·√(p̂(1 − p̂)/n)

Obviously, if we had knowledge of the population variance we would be able to estimate the population proportion p directly. Why?

Adopted and altered from http://www.stat.wmich.edu/s216/book/node83.html
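The proportion interval above translates directly into code. The sketch below is illustrative (the sample values p̂ = 0.4 and n = 100 are made up, not from the slides); z defaults to the 90% table value 1.645:

```python
import math

def proportion_ci(p_hat: float, n: int, z: float = 1.645):
    """CI for a population proportion: p_hat ± z * sqrt(p_hat(1-p_hat)/n).
    z = 1.645 gives a 90% interval; 1.96 would give 95%."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = proportion_ci(0.4, n=100)  # e.g. 40 "successes" out of 100
print(round(lo, 4), round(hi, 4))   # → 0.3194 0.4806
```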

Page 55: Statistics (recap)

Examples

o Imagine the weight of people in a society is distributed normally. A random sample of 25 with sample mean 72 kg is taken from this society. If the standard deviation of the population is 6 kg, find a) the 90%, b) the 95% and c) the 99% confidence interval for the unknown population mean.

a) 1 − α = 0.9 → α/2 = 0.05 → z_{α/2} = 1.645

So, CI₉₀% = 72 ± 1.645 × 6/√25 = (70.03, 73.97)

b) 1 − α = 0.95 → α/2 = 0.025 → z_{α/2} = 1.96

So, CI₉₅% = 72 ± 1.96 × 6/√25 = (69.65, 74.35)

c) 1 − α = 0.99 → α/2 = 0.005 → z_{α/2} = 2.58

So, CI₉₉% = 72 ± 2.58 × 6/√25 = (68.90, 75.10)
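The three intervals can be reproduced with a few lines of code, using the same table z-values as the example:

```python
import math

def mean_ci(xbar, sigma, n, z):
    """CI for mu with known population sigma: xbar ± z * sigma / sqrt(n)."""
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

for level, z in [("90%", 1.645), ("95%", 1.96), ("99%", 2.58)]:
    lo, hi = mean_ci(72, sigma=6, n=25, z=z)
    print(level, round(lo, 2), round(hi, 2))
# 90% 70.03 73.97
# 95% 69.65 74.35
# 99% 68.9 75.1
```

Note how the interval widens as the confidence level rises: more certainty of covering μ costs a wider range.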

Page 56: Statistics (recap)

Examples

o Samples from one of the production lines in a factory suggest that 10% of products are defective. If a difference of 1% between the sample and population proportion is acceptable, what sample size do we need to construct a 95% confidence interval for the population proportion? What if the acceptable gap between the sample & population proportion increases to 3%?

1 − α = 0.95 → α/2 = 0.025 → z_{α/2} = 1.96

z_{α/2} = (p̂ − p)/√(p̂(1 − p̂)/n) → 1.96 = 0.01/√(0.1 × 0.9/n) → n = (196 × 0.3)² ≈ 3458

If the gap increases to 3%, then:

1.96 = 0.03/√(0.1 × 0.9/n) → n = (196 × 0.1)² ≈ 385

Page 57: Statistics (recap)

Interval Estimation (Using the t-distribution)

• If the population standard deviation σ is unknown and we use the sample standard deviation s instead, and the size of the sample is less than 30 (n < 30), then the random variable

(x̄ − μ)/(s/√n) ~ t₍ₙ₋₁₎

has a t-distribution with df = n − 1.

This means a confidence interval for the population mean μ will be of the form:

CI₍₁₋α₎ = (x̄ − t_{α/2, n−1}·s/√n , x̄ + t_{α/2, n−1}·s/√n)

โˆ’๐’• ๐œถ๐Ÿ,๐’โˆ’๐Ÿ

๐’• ๐œถ๐Ÿ,๐’โˆ’๐Ÿ

1 โˆ’ ๐›ผ % ๐œถ

๐Ÿ

๐œถ

๐Ÿ

Adopted and altered from http://cnx.org/content/m46278/latest/?collection=col11521/latest

Page 58: Statistics (recap)

Interval Estimation

• The following flowchart can help in choosing between the Z- and t-distributions when an interval estimate is constructed for μ in the population (and, where neither applies, it points to nonparametric methods).

Adopted from http://www.expertsmind.com/questions/flow-chart-for-confidence-interval-30112489.aspx

Page 59: Statistics (recap)

Interval Estimation

• Here is a list of confidence intervals for the subject parameters in the population.

Adopted from http://www.bls-stats.org/uploads/1/7/6/7/1767713/250709.image0.jpg

Page 60: Statistics (recap)

Hypothesis Testing

• Hypothesis testing is one of the important aspects of statistical inference. The main idea is to find out whether some claims/statements (in the form of hypotheses) about population parameters can be statistically rejected by the evidence from the sample, using a test statistic (a function of the sample).

• Claims are made in the form of a null hypothesis (H₀) against an alternative hypothesis (H₁), and they can only ever be rejected, never proven. These two hypotheses should be mutually exclusive and collectively exhaustive. For example:

๐ป0: ๐œ‡ = 0.8 ๐‘Ž๐‘”๐‘Ž๐‘–๐‘›๐‘ ๐‘ก ๐ป1: ๐œ‡ โ‰  0.8

๐ป0: ๐œ‡ โ‰ฅ 2.1 ๐‘Ž๐‘”๐‘Ž๐‘–๐‘›๐‘ ๐‘ก ๐ป1: ๐œ‡ < 2.1

๐ป0: ๐œŽ2 โ‰ค 0.4 ๐‘Ž๐‘”๐‘Ž๐‘–๐‘›๐‘ ๐‘ก ๐ป1: ๐œŽ

2 > 0.4

Always remember that the equality sign comes with ๐ป0.

โ€ข If the value of the test statistic lies in the rejection area(s) the null hypothesis must be rejected, otherwise the sample does not provide sufficient evidence to reject the null hypothesis.

Page 61: Statistics (recap)

Hypothesis Testing

• Assuming we know the distribution of the random variable in the population, and that different random variables are statistically independent, hypothesis testing follows these steps:

1. State the relevant null & alternative hypotheses. The form of the null hypothesis (whether it uses =, ≥ or ≤) indicates how many rejection regions we will have: for the = sign there are two regions, and for the others just one (depending on the difference between the value of the estimator and the claimed value of the population parameter, the rejection area could be on the right or the left of the distribution curve).

H₀: μ = 0.5 against H₁: μ ≠ 0.5

H₀: μ ≥ 0.5 (or μ ≤ 0.5) against H₁: μ < 0.5 (or μ > 0.5)

Graphs adopted from http://www.soc.napier.ac.uk/~cs181/Modules/CM/Statistics/Statistics%203.html

Page 62: Statistics (recap)

Hypothesis Testing

2. Identify the level of significance of the test (α); it is usually taken to be 5% or 1%, depending on the nature of the test and the goals of the researcher. When α is known, together with prior knowledge about the sampling distribution, the critical region(s) (or rejection area(s)) can be identified.

For a one-tail test on the standard normal distribution, the critical values associated with the significance levels α = 5% and α = 1% are z_α = 1.65 and z_α = 2.33 respectively.

Adopted from http://www.psychstat.missouristate.edu/introbook/sbk26.htm

Page 63: Statistics (recap)

Hypothesis Testing

3. Construct a test statistic (a function based on the sample distribution & sample size). This function is used to decide whether or not to reject H₀. A table of some of the test statistics for testing different hypotheses is given at the link below.

Table adopted from http://www.bls-stats.org/uploads/1/7/6/7/1767713/250714.image0.jpg

Page 64: Statistics (recap)

Hypothesis Testing

4. Take a random sample from the population and calculate the value of the test statistic. If the value falls in the rejection area, the null hypothesis H₀ is rejected in favour of the alternative H₁ at the predetermined significance level α; otherwise the sample does not provide sufficient evidence to reject H₀ (this does not mean that we accept H₀).

The critical boundaries are:

−z_α or −t_{α,df} for a left-tail test

+z_α or +t_{α,df} for a right-tail test

±z_{α/2} or ±t_{α/2,df} for a two-tail test

Adopted from http://www.onekobo.com/Articles/Statistics/03-Hypotheses/Stats3%20-%2010%20-%20Rejection%20Region.htm

Page 65: Statistics (recap)

Example

o A chocolate factory claims that its new tin of cocoa powder contains at least 500 gr of the powder. A standards checking agency takes a random sample of n = 25 tins and finds a sample mean weight of X̄ = 520 gr and a sample standard deviation of s = 75 gr. If we assume the weight of cocoa powder in tins has a normal distribution, does the sample provide enough evidence to reject the claim at the 95% level of confidence?

1. H₀: μ ≥ 500 against H₁: μ < 500 (so it is a one-tail, left-tail test)

2. Level of significance α = 5% → t_{α,(n−1)} = t₀.₀₅,₂₄ = 1.711 (we use the t-distribution because n < 30 and we have no prior knowledge of the population standard deviation)

3. The value of the test statistic is: t = (X̄ − μ)/(s/√n) = (520 − 500)/(75/√25) = 1.33

4. As t = 1.33 lies above the critical value −1.711, it is not in the left-tail rejection area, so H₀ (the claim) cannot be rejected at the 5% level of significance.
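The cocoa-tin test can be sketched in a few lines; the critical value 1.711 is the t-table value for t₀.₀₅,₂₄ used in the example:

```python
import math

# One-sample left-tail t-test of H0: mu >= 500 against H1: mu < 500.
n, xbar, s, mu0 = 25, 520.0, 75.0, 500.0
t_crit = 1.711  # t-table value for alpha = 0.05, df = n - 1 = 24

t_stat = (xbar - mu0) / (s / math.sqrt(n))
print(round(t_stat, 2))       # → 1.33

reject_h0 = t_stat < -t_crit  # the rejection region is the left tail
print(reject_h0)              # → False: H0 cannot be rejected
```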

Page 66: Statistics (recap)

Type I & Type II Errors

• Two types of errors can occur in hypothesis testing:

A. Type I error: when, based on our sample, we reject a true null hypothesis.

B. Type II error: when, based on our sample, we fail to reject a false null hypothesis.

• By reducing the level of significance α we can reduce the probability of making a type I error (why?); however, at the same time, we increase the probability of making a type II error.

โ€ข What would happen to type I and type II errors if we increase the sample size? (Hint: look at the confidence intervals)

Adopted from http://whatilearned.wikia.com/wiki/Hypothesis_Testing?file=Type_I_and_Type_II_Error_Table.jpg

Page 67: Statistics (recap)

Type I & Type II Errors

• The following graph shows how moving the critical line (critical value) changes the probability of making type I and type II errors:

P(Type I error) = α and P(Type II error) = β

Adopted from http://www.weibull.com/hotwire/issue88/relbasics88.htm

The Power of a Test:

The power of a test is the probability that the test will correctly reject a false null hypothesis; that is, the probability of not committing a type II error. The power is equal to 1 − β, which means that by reducing β the power of the test will increase.

Page 68: Statistics (recap)

The P-Value

• It is not unusual to reject H₀ at some level of significance, for example α = 5%, but be unable to reject it at some other level, e.g. α = 1%. The dependence of the final decision on the value of α is the weak point of the classical approach.

• In the new approach, we try to find the p-value, which is the lowest significance level at which H₀ can be rejected. If the level of significance is set at 5% and the lowest significance level at which H₀ can be rejected (the p-value) is 2%, then the null hypothesis should be rejected; i.e.

p-value < α → Reject H₀

To understand this concept better let's look at an example:

• Suppose we believe that the mean life expectancy of people in a city is 75 years (H₀: μ = 75), but our observation shows a sample mean of 76 years for a sample of size 100 with a sample standard deviation of 4 years.

Page 69: Statistics (recap)

The P-Value

• The Z-score (test statistic) can be calculated as follows:

z = (X̄ − μ)/(s/√n) = (76 − 75)/(4/√100) = 2.5

• At the 5% level of significance the critical Z-value is 1.96, so we must reject H₀. But we should not have obtained this result (or those observations in our random sample) in the first place if our assumption about the population mean μ was correct.

• The p-value is the probability of obtaining this kind of result, or one even more extreme (i.e. a Z-score bigger than 2.5), given that the null hypothesis is correct:

P(z ≥ 2.5 | μ = 75) = p-value ≈ 0.006

It means that in 1000 samples this type of result can theoretically happen about 6 times; yet it has happened in our very first random sample.

Adopted from http://faculty.elgin.edu/dkernler/statistics/ch10/10-2.html
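The p-value in this example can be computed from `math.erfc` rather than a normal table. A minimal sketch:

```python
import math

def p_value_right_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z = (76 - 75) / (4 / math.sqrt(100))    # test statistic from the example
print(z)                                 # → 2.5
print(round(p_value_right_tail(z), 4))   # → 0.0062
```

Since 0.0062 < 0.05, H₀ is rejected at the 5% level, in agreement with the classical critical-value comparison (2.5 > 1.96).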

Page 70: Statistics (recap)

The P-Value

• As we cannot deny what we have observed and obtained from the sample, we eventually need to change our belief about the population mean and reject our assumption about it.

• The smaller the p-value, the stronger the evidence against H₀.

