Biometric Method I - University of Ibadan

Biometric Method I

University of Ibadan Distance Learning Centre

Open and Distance Learning Course Series Development

COURSE MANUAL

Biometric Method I STA 351






Copyright © 2008, Revised in 2015 by Distance Learning Centre, University of Ibadan, Ibadan.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval

system, or transmitted in any form or by any means, electronic, mechanical,

photocopying, recording or otherwise, without the prior permission of the copyright

owner.

ISBN 978-021-340-6

General Editor: Prof. Bayo Okunade

University of Ibadan Distance Learning Centre

University of Ibadan, Nigeria

Telex: 31128NG

Tel: +234 (80775935727) E-mail: [email protected]

Website: www.dlc.ui.edu.ng


Vice-Chancellor’s Message

The Distance Learning Centre is building on a solid tradition of over two decades of service in the provision of the External Studies Programme, and now Distance Learning Education, in Nigeria and beyond. The Distance Learning mode to which we are committed is giving many deserving Nigerians access to higher education, especially those who, by the nature of their engagements, do not have the luxury of full-time education. More recently, it has contributed in no small measure to providing places for the teeming Nigerian youths who, for one reason or another, could not gain admission into the conventional universities.

These course materials have been written by writers specially trained in ODL course delivery. The writers have made great efforts to provide up to date information, knowledge and skills in the different disciplines and ensure that the materials are user-friendly.

In addition to the provision of course materials in print and e-format, a lot of Information Technology input has also gone into the deployment of course materials. Most of them can be downloaded from the DLC website and are available in audio format, which you can also download onto your mobile phone, iPod, MP3 player and other devices to allow you to listen to the audio study sessions. Some of the study session materials have been scripted and are being broadcast on the university’s Diamond Radio FM 101.1, while others have been delivered and captured in audio-visual format in a classroom environment for use by our students. Detailed information on availability and access is available on the website. We will continue in our efforts to provide and review course materials for our courses.

However, for you to take advantage of these formats, you will need to improve your I.T. skills and develop the requisite distance learning culture. It is well known that, for the efficient and effective provision of distance learning education, the availability of appropriate and relevant course materials is a sine qua non. So also is the availability of multiple platforms for the convenience of our students. It is in fulfilment of this that a series of course materials is being written to enable our students to study at their own pace and convenience.

It is our hope that you will put these course materials to the best use.

Prof. Abel Idowu Olayinka

Vice-Chancellor


Foreword

As part of its vision of providing education for “Liberty and Development” for Nigerians and the International Community, the University of Ibadan Distance Learning Centre has recently embarked on a vigorous repositioning agenda aimed at embracing a holistic and all-encompassing approach to the delivery of its Open Distance Learning (ODL) programmes. Thus we are committed to global best practices in distance learning provision. Apart from providing efficient administrative and academic support for our students, we are committed to providing educational resource materials for their use. We are convinced that, without up-to-date, learner-friendly and distance-learning-compliant course materials, there cannot be any basis to lay claim to being a provider of distance learning education. Indeed, availability of appropriate course materials in multiple formats is the hub of any distance learning provision worldwide.

In view of the above, we are vigorously pursuing, as a matter of priority, the provision of credible, learner-friendly and interactive course materials for all our courses. We commissioned teams of experts to author and review the course materials, and their outputs were subjected to rigorous peer review to ensure standards. The approach emphasizes not only cognitive knowledge but also skills and humane values, which are at the core of education, even in an ICT age.

The development of the materials, which is ongoing, also had input from experienced editors and illustrators who have ensured that they are accurate, current and learner-friendly. They are specially written with distance learners in mind. This is very important because distance learning involves non-residential students who can often feel isolated from the community of learners.

It is important to note that, for a distance learner to excel there is the need to source and read relevant materials apart from this course material. Therefore, adequate supplementary reading materials as well as other information sources are suggested in the course materials.

Apart from the responsibility for you to read this course material with others, you are also advised to seek assistance from your course facilitators, especially academic advisors, during your study, even before the interactive session, which is by design for revision. Your academic advisors will assist you using convenient technology including Google Hangouts, YouTube, Talk Fusion, etc., but you have to take advantage of these. It will also be of immense advantage if you complete assignments as and when due, so as to have the necessary feedback as a guide.

The implication of the above is that a distance learner has a responsibility to develop the requisite distance learning culture, which includes diligent and disciplined self-study, seeking available administrative and academic support, and acquisition of basic information technology skills. This is why you are encouraged to develop your computer skills by availing yourself of the training opportunities that the Centre provides and putting these skills to use.


In conclusion, it is envisaged that the course materials would also be useful for the regular students of tertiary institutions in Nigeria who are faced with a dearth of high quality textbooks. We are therefore, delighted to present these titles to both our distance learning students and the university’s regular students. We are confident that the materials will be an invaluable resource to all.

We would like to thank all our authors, reviewers and production staff for the high quality of work.

Best wishes.

Professor Bayo Okunade

Director


Course Development Team

Content Authoring: Obisesan K.O.

Content Editor: Prof. Remi Raji-Oyelade

Production Editor: Ogundele Olumuyiwa Caleb

Learning Design/Assessment Authoring: SkulPortal Technology

Managing Editor: Ogunmefun Oladele Abiodun

General Editor: Prof. Bayo Okunade


Table of Contents

Study Session 1: Introduction to Population Genetics
Introduction
Learning Outcomes for Study Session 1
1.1 What is Genetics
1.1.1 The early history of population genetics
1.2 Basic Definitions
In-Text Question
In-Text Answer
1.3 Gene Proportions and Variations
1.3.1 Gene Variations
1.3.2 The Mendel’s Experiment
In-Text Answer
In-Text Question
Summary of Study Session 1
Self-Assessment Question for Study Session 1
SAQ (Test of learning outcome)
References

Study Session 2: Introduction to Biometrics
Introduction
2.1 Biometrics
In-Text Question
In-Text Answer
2.2 Biological Data
2.2.1 Sources of Biological Data
2.3 Challenges in Biological Data Collection
2.3.1 Statistical Data
2.4 Basic Terms in Biometrics
In-Text Question
In-Text Answer
2.5 Descriptive Statistics
2.5.1 Measure of Location
2.5.2 Relationship between Arithmetic, Geometric and Harmonic Mean
2.5.3 Measures of Dispersion
Summary of Study Session 2
Post-Test
Self-Assessment Question for Study Session 2
SAQ (Test of learning outcome)
References

Study Session 3: Sampling and Sampling Distribution
Introduction
3.1 Sample
3.1.1 Sampling Method
In-Text Question
In-Text Answer
3.2 Hypothesis Testing
3.2.1 Definition of some Basic Concepts
3.2.2 Hypothesis Testing on the Difference of Two Population Mean
In-Text Question
In-Text Answer
3.2.3 Probability Distribution and Hypothesis Testing
Summary of Study Session 3
Self-Assessment Question for Study Session 3
SAQ (Test of learning outcome)
References

Study Session 4: Design and Analysis of Biological Experiments
Introduction
Learning Outcome for Study Session 4
4.1 Design of an Experiment
4.1.1 Components of Experimental Design
4.1.2 Principles of Experimental Design
In-Text Question
In-Text Answer
4.2 Steps Involved in the Planning and Execution of an Experimental Design
In-Text Answer
In-Text Question
4.3 Definitions of Relevant Terms in Design and Analysis of Biological
4.4 Experiments
4.4.1 Analysis of Variance (ANOVA)
References

Study Session 5: Design and Analysis of Clinical Trials
Introduction
5.1 What are Clinical Trials?
5.1.1 Features of Clinical Trials
5.1.2 Needs for Clinical Trials
5.1.3 Uses of Clinical Trials
5.1.4 Advantages of Clinical trials
5.1.5 Disadvantages of Clinical trials
5.2 Identification of Phases of Clinical Trials
5.2.1 Phase A of Clinical Trials
5.2.2 Phase B of Clinical Trials
5.2.3 Phase C of Clinical Trials
5.2.4 Phase D of Clinical Trials
In-Text Question
In-Text Answer
5.3 Basic Terms on Clinical Trial
5.3.1 Protocol Design in Clinical Trials
5.3.2 Role of Statisticians in Clinical Trials
In-Text Question
In-Text Answer
5.4 Ethical Issues on Clinical Trials
5.4.1 Sample Size Determination
References

Study Session 6: Biological Assay
Introduction
6.1 Biological Assay
6.1.1 Purpose of Bioassay
6.2 Types of Assay
6.2.1 Direct Assay
6.2.2 Indirect Assay
In-Text Question
In-Text Answer
6.2.3 Definitions of some Terms
References

Study Session 7: Life Table
Introduction
7.1 What is a Life Table?
7.1.1 Assumptions of Life Table
7.1.2 Usefulness of Life Tables
7.1.3 Types of Life Table
In-Text Question
In-Text Answer
7.2 Relevance of Rates in Life Table Analysis
7.2.1 Crude Death Rate
7.2. Age-Specific Death Rate
References

NOTES ON SAQS
Study Session 1
Study Session 2
Study Session 3
Study Session 4
Study Session 5
Study Session 6
Study Session 7


Course Introduction

Every aspect of human endeavour now requires the application of Statistics. Statistics is now employed to solve problems in fields ranging from Anatomy to Zoology (A-Z).

In this course material, you will be introduced to population genetics, biometrics, sampling estimation and sampling distribution, the design and analysis of biological experiments, the design and analysis of clinical trials, biological assay and, finally, the life table.

We shall adopt a step-by-step approach, providing examples and exercises in each study session, so as to facilitate easy understanding of the subject. It is hoped that at the end of this course, students and users at all levels will have a sound knowledge of biometric methods.

At the end of these studies, students and users are expected to be able to give a comprehensive meaning of genetics, compute gene proportions and their variations, and understand the meaning of biometrics. Furthermore, they should be able to discuss the problems associated with the collection of biological data, test hypotheses on some sampling assertions, design and analyse biological experiments, design and analyse clinical trials, and fully understand the concept of the life table.

I appreciate the cooperation of the Royal Statistical Society (RSS), London. The copyright of the statistical table at the back of this book is held by the RSS, and I thank the Society for its kind permission to use it in this book.

Obisesan, K.O.


Study Session 1: Introduction to Population Genetics

Introduction

Population genetics is a subfield of genetics that deals with genetic differences within and between populations, and it is a part of evolutionary biology. Studies in this branch of biology examine such phenomena as adaptation, speciation and population structure. An example is adaptation, which describes how a population has interacted with its environment over time and how that interaction has shaped the way it lives today.

Generally, population genetics has been applied to animal and plant breeding, to human genetics, and more recently to ecology and conservation biology. It is important that you understand simple genetics. Though the emphasis here is on human populations, genetics is applicable to any living organism, including plants and animals. Genetics shall be introduced to you in this study session.

Learning Outcomes for Study Session 1

At the end of this study, you should be able to:

1.1 Explain the meaning of Genetics;

1.2 Define some basic terms relating to genetics;

1.3 Define simple Mendelian ratio; and

1.4 Work out some applications to real life situations.

1.1 What is Genetics

You are aware that you came into existence through your parents, but you may not have noticed that you inherited several characteristics from them: not only physical ones, but also those that can be revealed only scientifically.

For instance, the parents of a child who is seriously ill in hospital will be called upon first if the child needs blood, because it is commonly believed that the blood group of at least one parent should tally with that of the child. When this is not so, questions may arise as to whether both of them are the true parents of the child, although a difference in blood group alone does not settle the matter (for example, parents of groups A and B can have a child of group O).

Meaning of Genetics

Genetics is the scientific study of the ways in which different characteristics are passed from each generation of living things to the next. In other words, it is the study of the genetic constitution of individuals and of how genes and genotypes are transmitted from one generation to the other. The genetic constitution of a population, however, is described by its gene frequencies and genotype frequencies.

In the process of transferring genes from one generation to the other, the genotypes of the parents are broken down when gametes are formed (segregation, which should not be confused with mutation), and a new set of genotypes is constituted in the progeny from the genes transmitted in the gametes. Simply put, genetics is the study of genes, the particles that are transmitted from parents to their offspring.

1.1.1 The early history of population genetics


1856-63 Mendel’s experiments on peas

1859 Darwin’s Origin of Species

1900 Rediscovery of Mendel’s laws

1909 Nilsson-Ehle’s experiments on wheat

1915 Morgan’s experiments on Drosophila

1918 Fisher’s paper on phenotypic correlations between relatives

1918 Sturtevant’s artificial selection experiments on Drosophila

1912-1920 Pearl, Jennings and Wright’s work on inbreeding

1930 Fisher’s The Genetical Theory of Natural Selection (Fundamental theorem)

1931 Wright’s Evolution in Mendelian populations

1932 Haldane’s The Causes of Evolution

1955 Kimura’s diffusion equation solution for the distribution of allele frequencies


1.2 Basic Definitions

Population: This is the total number of people living in a particular territory at a particular

time. In genetic sense, however, population refers to a breeding group that is capable of

transmitting genes from one generation to another.

Gene: This is a unit in a cell, which controls a particular quality in a living thing that has

been passed on from its parents. In other words, it is a basic part of a living cell that contains

characteristics of one’s parent.

Genotype: This is the make-up of living creatures with some genetic combination. It describes the combination of alleles present in each individual.

Allele: This is one of the several forms of a gene. It is a short form of allelomorph. Alleles of a given gene differ in their deoxyribonucleic acid (DNA), the hereditary material of the majority of living organisms, and are responsible for the production of contrasting characteristics by affecting the functioning of a single product (RNA, ribonucleic acid).

Fecundity: This is the physiological ability to produce offspring or the capability of repeated fertilization.

Fertilization: It is the fusion of two gametes to produce a cell called the zygote.

Natural Selection: When a selection force such as survivorship or competition for food and space is at work, some members of a species possess more adaptive characters than others. These members tend to survive and reproduce more successfully, so their characters become more common in later generations.

Recombination: The formation of offspring containing a combination of genes that is different from the combination in either parent. Natural selection can act on the resulting variation to produce evolutionary change.

Mating: It is the union in pairs of individuals of opposite sexes to accomplish sexual

reproduction.

Taxonomy: It is the study of the classification and identification of living organisms. For example, human beings are classified as follows:

Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Primates
Family: Hominidae
Genus: Homo
Species: Homo sapiens

Mutation: This is the process in which the genetic material of a living thing changes in structure; the change can be passed on to its offspring.

DNA (Deoxyribonucleic Acid): It is a chemical that carries genetic information in the cells of each living thing. (Watson and Crick, together with Wilkins, received a Nobel Prize in 1962 for discovering the structure of DNA.)

Chromosome: It is the part of a cell that contains the characteristics that are passed on from parents to their offspring. Chromosomes look like tiny threads and contain the DNA in each cell.

In-Text Question

______________ is the make up of living creatures with some genetic combination.

a. Genotype

b. Fecundity

c. Fertilization

d. DNA

In-Text Answer

a. Genotype

1.3 Gene Proportions and Variations

Gene Proportions

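Suppose there are two alleles, A and S, in a population of N individuals. The genotypes that can be formed are AA, AS (the same as SA) and SS. If $N_{11}$ individuals have genotype AA, $N_{12}$ have AS and $N_{22}$ have SS, then

$$N = N_{11} + N_{12} + N_{22}$$

The genotype frequency is generally defined as

$$P_{AA} = \frac{N_{11}}{N}, \qquad P_{AS} = \frac{N_{12}}{N}, \qquad P_{SS} = \frac{N_{22}}{N}$$

The proportion of the population (gene frequency) with allele A is

$$P(A) = P_{AA} + \tfrac{1}{2}P_{AS} = \frac{N_{11} + \tfrac{1}{2}N_{12}}{N}$$

The proportion of the population (gene frequency) with allele S is

$$P(S) = P_{SS} + \tfrac{1}{2}P_{AS} = \frac{N_{22}}{N} + \frac{\tfrac{1}{2}N_{12}}{N} = \frac{N_{22} + \tfrac{1}{2}N_{12}}{N}$$

The total proportion is unity, that is,

$$P(A) + P(S) = 1$$

But $P(A) = P_{AA} + \tfrac{1}{2}P_{AS}$ and $P(S) = P_{SS} + \tfrac{1}{2}P_{AS}$, so we have $P_{AA} + \tfrac{1}{2}P_{AS} + P_{SS} + \tfrac{1}{2}P_{AS} = 1$, which implies

$$P_{AA} + P_{AS} + P_{SS} = 1$$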

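Proof:

Recall that $N = N_{11} + N_{12} + N_{22}$. Divide through by N:

$$\frac{N}{N} = \frac{N_{11}}{N} + \frac{N_{12}}{N} + \frac{N_{22}}{N}$$

$$1 = \frac{N_{11}}{N} + \frac{N_{12}}{N} + \frac{N_{22}}{N}$$

where $N_{11}/N = P_{AA}$, $N_{12}/N = P_{AS}$ and $N_{22}/N = P_{SS}$. Hence

$$P_{AA} + P_{AS} + P_{SS} = 1$$

$$P_{AA} + \tfrac{1}{2}P_{AS} + \tfrac{1}{2}P_{AS} + P_{SS} = 1$$

$$P(A) + P(S) = 1$$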
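The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `genotype_frequencies` and `gene_frequencies` and the counts used are assumptions made for the example, not part of the course text.

```python
def genotype_frequencies(n_aa, n_as, n_ss):
    """Return (P_AA, P_AS, P_SS) computed from the genotype counts."""
    n = n_aa + n_as + n_ss  # N = N11 + N12 + N22
    if n <= 0:
        raise ValueError("the total count must be positive")
    return n_aa / n, n_as / n, n_ss / n


def gene_frequencies(n_aa, n_as, n_ss):
    """Return (P(A), P(S)); each heterozygote counts half towards each allele."""
    p_aa, p_as, p_ss = genotype_frequencies(n_aa, n_as, n_ss)
    return p_aa + 0.5 * p_as, p_ss + 0.5 * p_as


# Illustrative counts only (hypothetical, not taken from the course text).
p_aa, p_as, p_ss = genotype_frequencies(50, 40, 10)
p_a, p_s = gene_frequencies(50, 40, 10)

print("Genotype frequencies:", p_aa, p_as, p_ss)
print("Gene frequencies:", p_a, p_s)
print("Sum of gene frequencies:", p_a + p_s)
```

With any valid set of counts, the genotype frequencies and the gene frequencies should each sum to one (up to floating-point rounding), which is the result proved above.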

Box 1.1: Recall

Genetics is the scientific study of the ways in which different characteristics are passed from

each generation of living things to the next.

1.3.1 Gene Variations

The variance of these proportions also can be calculated. It is used to check on the variability

between the proportions.

For the proportion of population with an allele A,

The variance V{P(A)} = V(PAA) + V(PAS) – 2COV (PAA, PAS)

SS

AS

AA

PN

N

PN

N

pN

NWhere

N

N

N

N

N

N

N

N

N

N

N

N

N

N

=

=

=

++=

++=

22

12

11

221211

221211

1

19

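where

$$V(P_{AA}) = \frac{P_{AA}(1 - P_{AA})}{N}$$

$$V(P_{AS}) = \frac{\tfrac{1}{2}P_{AS}\left(1 - \tfrac{1}{2}P_{AS}\right)}{N}$$

$$\mathrm{COV}(P_{AA}, P_{AS}) = \frac{P_{AA}\left(\tfrac{1}{2}P_{AS}\right)}{N}$$

For the proportion of the population with allele S, the variance is

V{P(S)} = V(PSS) + V(PAS) – 2COV(PSS, PAS)

where

$$V(P_{SS}) = \frac{P_{SS}(1 - P_{SS})}{N}$$

$$V(P_{AS}) = \frac{\tfrac{1}{2}P_{AS}\left(1 - \tfrac{1}{2}P_{AS}\right)}{N}$$

$$\mathrm{COV}(P_{SS}, P_{AS}) = \frac{P_{SS}\left(\tfrac{1}{2}P_{AS}\right)}{N}$$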

The standard error, coefficient of variation and confidence interval can also be computed.

The standard error, S.E., is the positive square root of the variance:

$$\mathrm{S.E.}(P(A)) = \sqrt{V(P(A))}$$

The coefficient of variation is

$$\mathrm{C.V.}(P(A)) = \frac{\mathrm{S.E.}(P(A))}{P(A)} \times 100\%$$

The confidence interval, C.I., for P(A) is given as

$$P(A) \pm Z_{\alpha/2}\,\mathrm{S.E.}(P(A))$$

which can be put as

$$P(A) - Z_{\alpha/2}\,\mathrm{S.E.}(P(A)) \quad \text{and} \quad P(A) + Z_{\alpha/2}\,\mathrm{S.E.}(P(A))$$

or written as

$$\hat{P}(A) - Z_{\alpha/2}\,\mathrm{S.E.}(\hat{P}(A)) < P(A) < \hat{P}(A) + Z_{\alpha/2}\,\mathrm{S.E.}(\hat{P}(A))$$

where α is the significance level.

Confidence level = (1 − α) × 100%

Similarly,

$$\mathrm{S.E.}(P(S)) = \sqrt{V(P(S))}$$

$$\mathrm{C.V.}(P(S)) = \frac{\mathrm{S.E.}(P(S))}{P(S)} \times 100\%$$

The C.I. for P(S) is given as

$$P(S) \pm Z_{\alpha/2}\,\mathrm{S.E.}(P(S))$$

which can be put as

$$P(S) - Z_{\alpha/2}\,\mathrm{S.E.}(P(S)) \quad \text{and} \quad P(S) + Z_{\alpha/2}\,\mathrm{S.E.}(P(S))$$

or written as

$$\hat{P}(S) - Z_{\alpha/2}\,\mathrm{S.E.}(\hat{P}(S)) < P(S) < \hat{P}(S) + Z_{\alpha/2}\,\mathrm{S.E.}(\hat{P}(S))$$

where α is the significance level.

Confidence level = (1 − α) × 100%

Example 1

The number of individuals with genotypes AA, AS and SS in a sample of 1482 in a population is given as:

Genotype   AA    AS    SS
Number     421   735   326

a. For each blood group obtain the genotype frequency

b. Obtain the gene frequencies of P(A), P(S) and their standard errors.

c. Place a bound on your estimates in (b) above at 5% level.

Solution

(a) Genotype frequency

i. $\hat{P}_{AA} = \dfrac{N_{11}}{N} = \dfrac{421}{1482} = 0.284$

ii. $\hat{P}_{AS} = \dfrac{N_{12}}{N} = \dfrac{735}{1482} = 0.496$

iii. $\hat{P}_{SS} = \dfrac{N_{22}}{N} = \dfrac{326}{1482} = 0.220$

(b) Gene Frequency

i. $\hat{P}(A) = \hat{P}_{AA} + \tfrac{1}{2}\hat{P}_{AS}$

= 0.284 + ½(0.496)

= 0.284 + 0.248

= 0.532

ii. $\hat{P}(S) = \hat{P}_{SS} + \tfrac{1}{2}\hat{P}_{AS}$

= 0.220 + ½(0.496)

= 0.220 + 0.248

= 0.468

To obtain their standard errors, we calculate their variances first.

iii. V(P(A)) = V(PAA) + V(PAS) – 2COV(PAA, PAS)

$$= \frac{P_{AA}(1 - P_{AA})}{N} + \frac{\tfrac{1}{2}P_{AS}\left(1 - \tfrac{1}{2}P_{AS}\right)}{N} - \frac{2P_{AA}\left(\tfrac{1}{2}P_{AS}\right)}{N}$$

$$= \frac{0.284(0.716)}{1482} + \frac{0.248(0.752)}{1482} - \frac{2(0.284)(0.248)}{1482}$$

$$= \frac{0.203344 + 0.186496 - 0.140864}{1482}$$

$$= \frac{0.248976}{1482} = 0.000168$$

Hence,

$$\mathrm{S.E.}(P(A)) = \sqrt{0.000168} = 0.01296$$

By a similar calculation, S.E.(P(S)) is also 0.01296.

(c) 95% Confidence Interval

(i) For P(A): $\hat{P}(A) \pm Z_{\alpha/2}\,\mathrm{S.E.}(\hat{P}(A))$

= 0.532 ± Z0.05/2 (0.01296)

= 0.532 ± Z0.025 (0.01296)

= 0.532 ± 1.96(0.01296)

= 0.532 ± 0.0254016

= (0.507, 0.557)

That is 0.507 < P(A) < 0.557

(ii) For P(S): $\hat{P}(S) \pm Z_{\alpha/2}\,\mathrm{S.E.}(\hat{P}(S))$

= 0.468 ± Z0.05/2 (0.01296)

= 0.468 ± 1.96 (0.01296)

= 0.468 ± 0.0254016

= (0.443, 0.493)

That is 0.443 < P(S) < 0.493
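The arithmetic of Example 1 can be checked with a short script. The sketch below is only an illustration: the function name `gene_frequency_summary` and the dictionary layout are assumptions made here, not part of the course text. It uses the variance formula exactly as stated in this section, which simplifies algebraically to P(1 − P)/N for each allele.

```python
import math

def gene_frequency_summary(n_aa, n_as, n_ss, z=1.96):
    """Genotype and gene frequencies, standard errors and z-based intervals.

    Hypothetical helper: it follows the variance formula of this section,
    V = V(P_hom) + V(half P_AS) - 2 COV(P_hom, half P_AS).
    """
    n = n_aa + n_as + n_ss
    p_aa, p_as, p_ss = n_aa / n, n_as / n, n_ss / n
    half = 0.5 * p_as  # the share of heterozygotes counted towards each allele
    result = {"P_AA": p_aa, "P_AS": p_as, "P_SS": p_ss}
    for allele, p_hom in (("A", p_aa), ("S", p_ss)):
        p = p_hom + half
        variance = (p_hom * (1 - p_hom) + half * (1 - half) - 2 * p_hom * half) / n
        se = math.sqrt(variance)
        result[allele] = {"p": p, "se": se, "ci": (p - z * se, p + z * se)}
    return result

# Data of Example 1: 421 AA, 735 AS and 326 SS individuals.
summary = gene_frequency_summary(421, 735, 326)
for allele in ("A", "S"):
    row = summary[allele]
    print(allele, round(row["p"], 3), round(row["se"], 5),
          tuple(round(bound, 3) for bound in row["ci"]))
```

The value z = 1.96 corresponds to the 5% level used in the example; a different level would need a different tabulated Z value.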

1.3.2 Mendel’s Experiment

Mendel crossed a pure-breeding tall pea plant with a dwarf one. This was done by taking pollen from one plant and dusting it on the stigma of the other plant. Precautions were taken to ensure that the latter was not contaminated with its own pollen or that of any other plant. The seeds resulting from this cross were collected and sown. It was found that all the seeds gave rise to tall plants; no dwarf plant was produced at all. These plants belong to what is called the first filial generation (or F1 for short).

One of these tall F1 plants was then self-pollinated. Precautions were again taken to prevent its being pollinated by any other kind of pollen. The resulting seeds were sown and the offspring, the F2 generation, were carefully examined. It was found that the F2 plants were a mixture; some were tall and some dwarf.

In one particular cross, he counted 787 tall plants and 277 dwarf ones, i.e. approximately ¾ were tall and ¼ were dwarf, meaning that the proportion of tall to dwarf plants approximates to 3:1. This is the Mendelian ratio.

The Figure below Illustrates what Happened when a Pure Breeding Tall Pea Plant is Crossed with a Dwarf One

Parents (P):                      Tall TT   ×   Dwarf tt

Alleles in gametes:               All T         All t

                                  (gametes fused)

First Filial Generation (F1):     Tall Tt   ×   Tall Tt   (self-pollinated)

Segregation:                      ½T  ½t        ½T  ½t

                                  (gametes fused)

Second Filial Generation (F2):    ¼TT   ¼Tt   ¼Tt   ¼tt

                                  Tall ¾        Dwarf ¼

A Punnett square can be used to show the fusion of F1 gametes:

Gametes    ½T      ½t

½T         ¼TT     ¼Tt

½t         ¼Tt     ¼tt
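The Punnett square above can also be generated by enumerating every pairing of gametes. The following is a minimal sketch; the function name `cross` and the convention of writing a genotype as a two-letter string are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Offspring genotype probabilities for a single-gene cross.

    Each parent is a two-letter genotype such as "Tt"; every allele of a
    parent is assumed to be passed on with equal probability.
    """
    weight = Fraction(1, len(parent1) * len(parent2))
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sorting makes "tT" and "Tt" count as the same genotype.
        genotype = "".join(sorted(allele1 + allele2))
        offspring[genotype] += weight
    return dict(offspring)

f1 = cross("TT", "tt")   # pure-breeding tall crossed with dwarf
f2 = cross("Tt", "Tt")   # self-pollinated F1 plant
# Tall plants carry at least one dominant allele T.
tall = sum(prob for genotype, prob in f2.items() if "T" in genotype)
print(f1, f2, tall)
```

Exact fractions are used instead of floating-point numbers so that the ¼ : ½ : ¼ split appears without rounding.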

Example 2

If a cross-pollination experiment has been performed correctly, then ⅛ of the resulting plants should have red flowers, ⅜ should have pink flowers and the remainder should have white flowers. A random sample of 80 plants is found to comprise 4 with red flowers, 40 with pink flowers and 36 with white flowers. Does this provide significant evidence at the 5% level that the experiment has been performed incorrectly?

Solution

H0: Experiment was performed correctly

H1: Experiment was performed incorrectly.

Expected proportions for the plants

i. Red = ⅛

ii. Pink = ⅜

iii. White = 4/8 = ½

Observed frequencies

i) Red = 4

ii) Pink = 40

iii) White = 36

Total = 80

Therefore the expected frequencies, ei = nPi, are

i) Red: e1 = 80 × ⅛ = 10

ii) Pink: e2 = 80 × ⅜ = 30

iii) White: e3 = 80 × ½ = 40

Now

Oi     ei     (Oi − ei)²/ei

4      10     3.6

40     30     3.3

36     40     0.4

Total         7.3

i.e.

$$\chi^2_{\text{calculated}} = \sum \frac{(O_i - e_i)^2}{e_i} = 7.3$$

$$\chi^2_{\text{tabulated}} = \chi^2_{v-1-m,\ 5\%} = \chi^2_{(3-0-1),\ 0.05} = \chi^2_{2,\ 0.05} = 5.991$$

where v is the number of classes and m is the number of parameters estimated from the data.

Decision: Accept the null hypothesis if $\chi^2_{cal}$ is less than $\chi^2_{tab}$. Otherwise reject.

Conclusion: We reject the null hypothesis, since the calculated $\chi^2$ is in the rejection region, i.e. $\chi^2_{cal} > \chi^2_{tab}$. So, we conclude that the experiment has been performed incorrectly.

Example 3:

A botanist conjectures that the specimens of a particular plant are growing at random locations in a bar with mean rate 4 plants per square metre. To test the conjecture, he divides a randomly chosen region of the bar into non-overlapping ridges of area 0.5 m² and counts the number of plants in each region. The results were as follows:

No. of plants     0    1    2    3    4    ≥5

No. of regions    4    14   9    7    6    0

Test the conjecture using the 5% level of significance.

Solution:

H0: The specimens are growing at random locations

H1: The specimens are not growing at random locations

X       f     fx

0       4     0

1       14    14

2       9     18

3       7     21

4       6     24

≥5      0     0

Total   40    77

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{77}{40} = 1.925$$

We shall consider the Poisson distribution.

$$P(X = x) = \frac{e^{-\bar{X}}\,\bar{X}^{x}}{x!}$$

$$P(X = 0) = \frac{e^{-1.925}(1.925)^0}{0!} = 0.1459$$

$$P(X = 1) = \frac{e^{-1.925}(1.925)^1}{1!} = 0.2808$$

$$P(X = 2) = \frac{e^{-1.925}(1.925)^2}{2!} = 0.2703$$

$$P(X = 3) = \frac{e^{-1.925}(1.925)^3}{3!} = 0.1734$$

$$P(X = 4) = \frac{e^{-1.925}(1.925)^4}{4!} = 0.0835$$

$$P(X \ge 5) = 1 - P(X < 5) = 1 - 0.9539 = 0.0461$$

Expected frequencies, ei = nPi:

e0 = 40 × 0.1459 = 5.836

e1 = 40 × 0.2808 = 11.232

e2 = 40 × 0.2703 = 10.812

e3 = 40 × 0.1734 = 6.936

e4 = 40 × 0.0835 = 3.34

e≥5 = 40 × 0.0461 = 1.844

Total = 40

Now, we revise ei by adding together the classes with ei < 5, which also affects the corresponding Oi. Here e4 + e≥5 = 3.34 + 1.844 = 5.184, with observed frequency 6 + 0 = 6.

Oi      ei        (Oi − ei)²/ei

4       5.836     0.5776

14      11.232    0.6821

9       10.812    0.3037

7       6.936     0.0006

6       5.184     0.1284

Total             1.6924

$$\chi^2_{cal} = \sum \frac{(O_i - e_i)^2}{e_i} = 1.6924$$

$$\chi^2_{tab} = \chi^2_{v-1-m,\ 5\%} = \chi^2_{5-1-1,\ 0.05} = \chi^2_{3,\ 0.05} = 7.815$$

Decision: Accept the null hypothesis if $\chi^2_{cal}$ is less than $\chi^2_{tab}$. Otherwise reject.

Conclusion: We accept the null hypothesis since the calculated $\chi^2$ is in the acceptance region, i.e. $\chi^2_{cal} < \chi^2_{tab}$. So we conclude that the specimens are growing at random locations.
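The two goodness-of-fit calculations in Examples 2 and 3 follow the same pattern, so they can be checked with one small function. This is a sketch under stated assumptions: the helper names `chi_square_statistic` and `poisson_pmf` are illustrative, and the tabulated values 5.991 and 7.815 are simply copied from the worked examples rather than computed.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (O - e)^2 / e over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Example 2: flower colours with expected proportions 1/8, 3/8 and 1/2.
observed_flowers = [4, 40, 36]
n_plants = sum(observed_flowers)
expected_flowers = [n_plants * p for p in (1 / 8, 3 / 8, 1 / 2)]
chi2_flowers = chi_square_statistic(observed_flowers, expected_flowers)
reject_flowers = chi2_flowers > 5.991   # tabulated value, 2 d.f., 5% level

# Example 3: Poisson fit with the mean estimated from the sample.
plants_per_region = [0, 1, 2, 3, 4]
regions = [4, 14, 9, 7, 6]              # last class already pools 4 and >=5 (6 + 0)
n_regions = sum(regions)
lam = sum(x * f for x, f in zip(plants_per_region, regions)) / n_regions
probabilities = [poisson_pmf(x, lam) for x in range(4)]
probabilities.append(1 - sum(probabilities))   # pooled class: 4 or more plants
expected_regions = [n_regions * p for p in probabilities]
chi2_regions = chi_square_statistic(regions, expected_regions)
reject_regions = chi2_regions > 7.815   # tabulated value, 3 d.f., 5% level

print(round(chi2_flowers, 2), reject_flowers)
print(round(chi2_regions, 2), reject_regions)
```

Because the script keeps full precision, its statistic for Example 3 differs slightly in the third decimal place from the hand calculation, which rounds the probabilities to four decimals.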

In-Text Question

Mendel’s experiment was done by taking pollen from one plant and dusting it on the stigma of the other plant. True/False?

In-Text Answer

True

Summary of Study Session 1

In this study session, you have learnt that:

1. Genetics is an important topic in the study of biometrics. Gregor Mendel (1822-1884) is the father of genetics.

2. Population genetics is the branch of genetics that describes the consequences of Mendelian inheritance at the population level.

3. In genetics, we considered proportions of genes and their confidence intervals. We also considered the simple Mendelian ratio and tests of hypothesis.

Self-Assessment Question for Study Session 1

Having studied this session, you can now assess how well you have learnt its learning outcomes

by answering the following questions. It is advised that you answer the questions thoroughly and

discuss your answers with your tutor in the next Study Support Meeting. Also, brief answers to the

Self-Assessment Questions are provided at the end of this module as a guide.

SAQ (Test of learning outcome)

The number of individuals with genotypes AA, AS and SS in a sample of 1482 in a

population is given as:

Genotype AA AS SS

Number 421 735 326

a. For each blood group obtain the genotype frequency

b. Obtain the gene frequencies of P(A), P(S) and their standard errors.

c. Place a bound on your estimates in (b) above at 5% level.


References

Bliss C.I. (1970, 3rd Ed): Statistics in Biology: Statistical Methods for Research in the Natural

Sciences, Vol. 1 and 2, published by Wiley.

Cox D.R. (1992, Classical Ed): Planning of Experiments, published by Wiley.

Finney D.J. (1978 3rd Ed): Statistical Method in Biological Assay, Published by Griffin.

32

Study Session 2: Introduction to Biometrics

Introduction

Biometrics are computerised methods of recognizing you based on a physiological or

behavioural characteristic. Among the features measured are; face, fingerprint, hand

geometry, signature, voice, etc. Biometric technologies are becoming the foundation of an

extensive array of highly secure identification and personal verification solutions. Financial fraud and insecurity have encouraged their use, and biometric checks are now found almost everywhere you go.

You should be familiar with the methods of registration before an election and of voting during the election. Furthermore, you will learn about the basic statistical techniques that are used in the study of biological data and in the analysis of biological experiments. Statistical techniques or methods play a very significant role in virtually all human endeavours. They are widely applied in various fields such as biological science, medicine, physical science, and so on.



2.1 Discuss the meaning of Biometrics;

2.2 Identify the sources of biological data;

2.3 Identify problems associated with the collection of biological data;

2.4 Identify some basic terms in Biometrics; and

2.5 Discuss various statistical tools.

2.1 Biometrics

Population genetics is a subfield of genetics that deals with genetic differences within and between populations, and is a part of evolutionary biology. Studies in this branch of biology examine such phenomena as adaptation, speciation, and population structure.

Population genetics was a vital ingredient in the emergence of the modern evolutionary synthesis. Its primary founders were Sewall Wright, J. B. S. Haldane and Ronald Fisher, who also laid the foundations for the related discipline of quantitative genetics. Traditionally a highly mathematical discipline, modern population genetics encompasses theoretical, lab, and field work. Population genetic models are used both for statistical inference from DNA sequence data and for proof/disproof of concept.

Figure 2.1: Sewall Wright, J. B. S. Haldane and Ronald Fisher

Statistics applied to biological problems is called Biometry, which is the study of biological measurement using statistical methods or techniques. Biometrics is linked with the theory, method and application of the mathematical sciences (mathematical statistics and computer science) in the studies of agriculture, biology and the environment. It originated from the Greek words "bios", meaning "life", and "metron", meaning "measure".

Then, we can say that "Biometry" refers to "measurement of life". Biometrics then clearly refers to the application of statistical techniques to biological problems.

Biometrics is also the application of statistical methods to the analysis of data derived from the biological sciences and medicine. The aim of many medical investigations is to compare two or more treatments and to select the one that is more effective in dealing with some disease or disorder.

In-Text Question

As you have learnt thus far, statistics applied to ____________problem is called Biometry?

a. Biological

b. Mathematical

c. Scientific

d. Economic

In-Text Answer

a. Biological

2.2 Biological Data

Biological experiments deal with living things and are characterized by the fact that no

individuals are exactly alike. These variations between individuals are the bases of the

variation of new forms of life.

2.2.1 Sources of Biological Data

The following are the sources of biological data:

1. Laboratory Experiment.

2. Control Experiment.

3. Agricultural Farm.

4. Research Institutions.

5. Existing Records.


2.3 Challenges in Biological Data Collection

Biological data are data or measurements collected from biological sources, which are often

stored or exchanged in a digital form. Biological data are commonly stored in files or

databases. Examples of biological data are DNA base-pair sequences, and population data

used in ecology.

The following are the problems associated with the Collection of Biological Data:

1) Some diseases may not be found in the available register despite the fact that they are

in existence.

2) Causes of illness are not always recorded.

3) The general practitioner’s record, which is a source of medical data, is not kept up to

date.

4) Variation in climatic condition, e.g. rainfall, relative humidity, temperature e.t.c.,

these may affect the yield of crop under investigation.

5) Diseases and pests make the results of the experiment inadequate for analysis

purpose.

2.3.1 Statistical Data

Data are raw facts that have to be processed using statistical methods. When data are presented in numerical form, they are called "quantitative data". Statistical data must be compiled in a properly planned manner and for a specific purpose. In biological science, statistical data are products of measurement or direct observation. A measuring procedure called ‘scaling’ can be used in the process of collecting observations on a variable.

Box 2.1: Recall

Biometrics refers to "measurement of life”. Biometrics then clearly refers to the application

of statistical techniques to biological problems.


2.4 Basic Terms in Biometrics

1. Sample: It is a collection of individual observations selected by a specified procedure, e.g. a sample of individuals with HIV/AIDS. It is also defined in general form as the representative or fractional part of a population.

2. Observation: It refers to the count of an individual object or a measurement taken on the smallest sampling unit in a sample, such that the observation can be expressed numerically.

3. Population: It is the total number of individual observations about which inferences are made, existing anywhere in the world or at least within a definitely specified sampling area, limited in space or time, e.g. the population of infants in a Local Government Area, or the population of lions in the western region of Africa. A population may be of finite size or infinite size. It is the entire collection of all persons/objects concerning a specific area of investigation or study.

4. Variable: A biological variable defines a property with respect to which individuals in the sample differ in some ascertainable way. If the property does not differ among the sample cases, it cannot be of statistical interest and therefore cannot be referred to as a variable. It can also be defined as a characteristic of interest that is either measured or counted, e.g. the weight of an animal can be measured. Information on variables is called Qualitative data.

5. Accuracy: It is the closeness of a measurement or a computed value to its true value. It is a measure of the variation of the estimate from the actual value. That is,

Accuracy = Actual Value - Estimated Value.

6. Precision: It is the closeness of repeated measurements to one another. It is the closeness of an estimate to the expected value. It is the difference between the expected value and the estimated value, i.e. Precision = Expected value - Estimated value.

7. Unit of Analysis: This is the unit for which we wish to obtain statistical data. These may be animals, living matter, human beings etc.

In-Text Question

Observation refers to the count of an individual object or a measurement taken on the smallest sampling unit in a sample. True/False

In-Text Answer

True

2.5 Descriptive Statistics

2.5.1 Measure of Location

The various number values which tell us about the location (centres) of a distribution are

known as measures of location, central tendencies or simply called averages. Location here

refers to a value, which is typical or representative of the characteristics of a given

distribution. The Arithmetic Mean, Mode, Median, Harmonic mean, Geometric mean and the

quadratic mean are all measure of location.

1. Arithmetic Mean

This is perhaps the most important and most widely used average. The arithmetic mean is the sum of the variable values divided by the number of such values. The symbol X̄ (pronounced X-bar) is used to represent the arithmetic mean, or simply the mean, of a set of X-values.

$$\text{Mean}(\bar{X}) = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

that is, Mean = (Sum of variable values) / (Number of such values)

Example 2.1

The ages of 10 students in a school were given as follows in years: 12, 14, 20, 20, 17, 18, 19, 24, 25, and 25. Calculate the arithmetic mean.

Solution

$$\text{Mean}(\bar{X}) = \frac{12 + 14 + 20 + 20 + 17 + 18 + 19 + 24 + 25 + 25}{10} = \frac{194}{10}$$

$$\text{Mean}(\bar{X}) = 19.4 \approx 19 \text{ years}$$

Example 2.2

Obtain the arithmetic mean of the numbers 8, 3, 4, 7, 2, 4, 6, 9, 10, 11, 14, 20, 21, 18, 19

Solution

$$\text{Mean}(\bar{X}) = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{8 + 3 + 4 + 7 + 2 + 4 + 6 + 9 + 10 + 11 + 14 + 20 + 21 + 18 + 19}{15} = \frac{156}{15} = 10.4$$

2. Frequency Distribution

The mean is given and defined as:

$$\text{Mean}(\bar{X}) = \frac{\sum fx}{\sum f}$$

where “f” is the frequency

“x” is the variable value

“Σ” is the summation symbol

Example 2.3

The Ages in years of one hundred and fifty students in a secondary school are given as:

Age (x) 12 13 14 15 16 17

No of student 25 40 25 30 15 15

Determine the Arithmetic mean age of the distribution

Solution:

Age (x) f fx

12 25 300

13 40 520

14 25 350

15 30 450

16 15 240

17 15 255

Total 150 2115

$$\bar{X} = \frac{\sum_{i=1}^{6} f_i x_i}{\sum_{i=1}^{6} f_i} = \frac{2115}{150} = 14.1$$
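The mean calculations of Examples 2.1 and 2.3 can be reproduced in a few lines. The function names `arithmetic_mean` and `frequency_mean` below are illustrative choices for this sketch, not names used by the course.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# Example 2.1: ages of 10 students.
ages_of_ten = [12, 14, 20, 20, 17, 18, 19, 24, 25, 25]
mean_of_ten = arithmetic_mean(ages_of_ten)

# Example 2.3: ages of 150 students given as a frequency table.
ages = [12, 13, 14, 15, 16, 17]
students = [25, 40, 25, 30, 15, 15]
mean_age = frequency_mean(ages, students)

print(mean_of_ten, mean_age)
```

For a grouped table such as the waiting times in the next example, the same `frequency_mean` applies once each class is replaced by its mid-value.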

Example 2.4

The table below shows the distribution of the waiting times for some customers in a certain

Bank in Lagos.

Waiting time No of Customer

1.0 - 1.4 6

1.5 - 1.9 9

2.0 - 2.4 2

2.5 - 2.9 4

3.0 - 3.4 6

3.5 - 3.9 10

4.0 - 4.4 12

4.5 - 4.9 15

5.0 - 5.4 7

5.5 - 5.9 9

Find the average waiting time of the customers.

Solution:

Waiting time No of Customer (f) Mid value(x) fx

1.0 – 1.4 6 1.2 7.2

1.5 – 1.9 9 1.7 15.3

2.0 – 2.4 2 2.2 4.4

2.5 – 2.9 4 2.7 10.8

3.0 – 3.4 6 3.2 19.2

3.5 – 3.9 10 3.7 37

4.0 – 4.4 12 4.2 50.4

4.5 – 4.9 15 4.7 70.5

5.0 – 5.4 7 5.2 36.4

5.5 – 5.9 9 5.7 51.3

Total 80 302.5

$$\text{Mean}(\bar{X}) = \frac{\sum_{i=1}^{10} f_i x_i}{\sum_{i=1}^{10} f_i} = \frac{302.5}{80} = 3.78$$

Harmonic Mean (HM)

The harmonic mean of a set of values x1, x2, x3, …, xn is the number of observations, n, divided by the sum of the reciprocals of the values.

It is given by

$$H.M = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}$$

If x has frequency f, then

$$H.M = \frac{n}{\sum \dfrac{f}{x}} \quad \text{or} \quad \frac{\sum f}{\sum \dfrac{f}{x}}$$

Example 1.5

Find the harmonic mean of the following data

2, 3, 4, 6, 8, 9, 10 and 12

Solution:

$$H.M = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}} = \frac{8}{\frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{6} + \frac{1}{8} + \frac{1}{9} + \frac{1}{10} + \frac{1}{12}}$$

$$= \frac{8}{0.5 + 0.33 + 0.25 + 0.167 + 0.125 + 0.111 + 0.1 + 0.083}$$

$$= \frac{8}{1.666} = 4.8$$

H.M = 4.8

Example 1.6

The data below gives the distribution of the time of studying chimpanzees in a zoological

garden:

Time (Hour) No of Chimpanzees

10 – 19 4

20 – 29 6

30 – 39 7

40 – 49 10

50 – 59 3

60 – 69 7

70 – 79 8

80 – 89 9

90 – 99 6

Determine the harmonic mean of the study time, in hours.

Solution:

Time (Hour) f Midvalue(x) f/x

10 - 19 4 14.5 0.2759

20 - 29 6 24.5 0.2449

30 - 39 7 34.5 0.2029

40 - 49 10 44.5 0.2247

50 - 59 3 54.5 0.0550

60 - 69 7 64.5 0.1085

70 - 79 8 74.5 0.1074

80 - 89 9 84.5 0.1065

90 - 99 6 94.5 0.0635

Total 60 1.3893

$$H.M = \frac{\sum_{i=1}^{9} f_i}{\sum_{i=1}^{9} \dfrac{f_i}{x_i}} = \frac{60}{1.3893} = 43.187 \approx 43.2 \text{ hrs}$$

Geometric Mean

The Geometric mean (G.M) is an analytical method of finding the average rate of growth or decline in the value of an item over a particular period of time.

The Geometric mean (G.M) of a set of n observations x1, x2, x3, …, xn is the nth root of their product. It is given by

$$G.M = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$

Taking logarithms, we have

$$\log G.M = \frac{\log x_1 + \log x_2 + \cdots + \log x_n}{n}$$

For grouped data,

$$G.M = \text{Antilog}\left(\frac{\sum f \log x}{\sum f}\right)$$

Example 1.7

Determine the geometric mean of the following set of data: 2, 4, 8, 11, 4

Solution:

$$G.M = \sqrt[5]{2 \times 4 \times 8 \times 11 \times 4} = \sqrt[5]{2816} = 4.897 \approx 4.9$$

Using the data of the previous example (Example 1.6), find the geometric mean.

Solution:

Time (Hour) f Midvalue(x) logx f logx

10 - 19 4 14.5 1.1614 4.6456

20 - 29 6 24.5 1.3892 8.3352

30 - 39 7 34.5 1.5378 10.7646

40 - 49 10 44.5 1.6484 16.4840

50 - 59 3 54.5 1.7364 5.2092

60 - 69 7 64.5 1.8096 12.6672

70 - 79 8 74.5 1.8722 14.9776

80 - 89 9 84.5 1.9269 17.3421

90 - 99 6 94.5 1.9754 11.8524

Total 60 102.2779

$$G.M = \text{Antilog}\left(\frac{\sum f \log x}{\sum f}\right) = \text{Antilog}\left(\frac{102.2779}{60}\right) = \text{Antilog}[1.7046]$$

G.M = 50.6524

2.5.2 Relationship between Arithmetic, Geometric and Harmonic Mean

In general, the geometric mean for a set of positive data is always less than or equal to the corresponding arithmetic mean but greater than or equal to the harmonic mean. The relationship is expressed as follows:

H.M ≤ G.M ≤ A.M

The equality signs hold only if all the observations are identical.
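The ordering of the three means can be verified numerically. The sketch below assumes positive data and uses illustrative function names; the geometric mean is computed through logarithms, mirroring the "taking logarithms" form given earlier.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    """nth root of the product, computed as the antilog of the mean log."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def harmonic_mean(values):
    """Number of observations divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)

# Data of Example 1.5 (harmonic mean example).
data = [2, 3, 4, 6, 8, 9, 10, 12]
am, gm, hm = arithmetic_mean(data), geometric_mean(data), harmonic_mean(data)
print(round(hm, 2), round(gm, 2), round(am, 2))

# Data of Example 1.7 (geometric mean example).
gm_small = geometric_mean([2, 4, 8, 11, 4])

# With identical observations the three means coincide.
same = [5, 5, 5]
```

The harmonic mean printed for the Example 1.5 data is computed from unrounded reciprocals, so it may differ in the second decimal place from a hand calculation that rounds each reciprocal first.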


Median

The median of a set of numbers x1, x2, x3, x4, …, xn arranged in increasing or decreasing order of magnitude is defined as:

(a). The middle observation if the number of observations is odd.

(b). The arithmetic mean of the two middle observations if the number of observations is even.

If there are n observations in the distribution, the median is the magnitude of the ½(n + 1)th observation.

Example 1.8

Find the median of the following observations: 8, 2, 6, 1, 8, 9, 3

Solution:

Arranging the observations in ascending order of magnitude gives 1, 2, 3, 6, 8, 8, 9

The number of observations is odd, i.e. 7.

The middle observation, i.e. the median, is 6

Example 1.9

Find the median of the following observations:

6, 3, 3, 5, 9, 7, 2, 10

Solution:

Arranging the distribution in ascending order of magnitude gives 2, 3, 3, 5, 6, 7, 9, 10

Since the number of observations is even, i.e. 8, the median is the arithmetic mean of the two middle observations, i.e.

$$\text{Median} = \frac{5 + 6}{2} = 5.5$$
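The two-case definition of the median translates directly into code. This is a minimal sketch with an illustrative function name; Python's standard library also offers `statistics.median`, which gives the same results for these data.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                      # odd number of observations
    return (ordered[middle - 1] + ordered[middle]) / 2   # even number

median_odd = median([8, 2, 6, 1, 8, 9, 3])        # data of Example 1.8
median_even = median([6, 3, 3, 5, 9, 7, 2, 10])   # data of Example 1.9
print(median_odd, median_even)
```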

Median for Frequency Distribution

Steps:

a) Obtain the Cumulative frequency of a given distribution.

b) Determine N/2.

c) Determine the median class, i.e. the class whose cumulative frequency is equal to or greater than N/2.

d) Apply the formula

$$\text{Median} = L_m + \left(\frac{\frac{N}{2} - Cf_m}{f_m}\right) C_m$$

Where

Lm = Lower class boundary of the median class

N = Total frequency ∑= f

Cfm = Cumulative frequency preceding the median class

fm = frequency of the median class

Cm = Class interval of the median class
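The steps above can be sketched as a short function. The name `grouped_median` and its argument layout are assumptions made for illustration; the data are those of the laurel-leaf example that follows, with lower class boundaries written out by hand.

```python
def grouped_median(lower_boundaries, freqs, class_width):
    """Median of a grouped frequency distribution.

    Finds the first class whose cumulative frequency reaches N/2, then
    applies Lm + ((N/2 - Cfm) / fm) * Cm.
    """
    half = sum(freqs) / 2
    cumulative_before = 0          # Cfm: cumulative frequency preceding the class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative_before + f >= half:
            return lower + (half - cumulative_before) / f * class_width
        cumulative_before += f
    raise ValueError("frequencies must not be empty")

# Lengths of 70 laurel leaves: classes 118-126, 127-135, ..., 172-180.
boundaries = [117.5, 126.5, 135.5, 144.5, 153.5, 162.5, 171.5]
frequencies = [8, 10, 14, 18, 9, 7, 4]
leaf_median = grouped_median(boundaries, frequencies, class_width=9)
print(leaf_median)
```

A single `class_width` is passed because all classes in this table have the same width; a table with unequal classes would need the width of the median class itself.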

Example 2.0

The following table shows the frequency distribution of 70 laurel leaves.

Length Frequency

118 -126 8

127 – 135 10

136 – 144 14

145 – 153 18

154 – 162 9

163 – 171 7

172 – 180 4

Total 70

Compute the median, using

$$\text{Median} = L_m + \left(\frac{\frac{N}{2} - Cf_m}{f_m}\right) C_m$$

Solution:

Length       F     Class boundary    Cf (cumulative frequency)

118 – 126    8     117.5 – 126.5     8

127 – 135    10    126.5 – 135.5     18

136 – 144    14    135.5 – 144.5     32

145 – 153    18    144.5 – 153.5     50

154 – 162    9     153.5 – 162.5     59

163 – 171    7     162.5 – 171.5     66

172 – 180    4     171.5 – 180.5     70

The median position is the (N/2)th = (70/2)th item.

Median position = 35th; N/2 = 35

The median class is 145 – 153; the lower boundary of the median class is 144.5, i.e. Lm = 144.5

Frequency of the median class = 18, i.e. fm = 18

Cumulative frequency preceding the median class is 32, i.e. Cfm = 32

Class size of the median class is 9, i.e. Cm = 9

$$\text{Median} = 144.5 + \left(\frac{35 - 32}{18}\right) \times 9 = 144.5 + \frac{3 \times 9}{18} = 144.5 + 1.5 = 146$$

Median = 146

Mode

This is the observation or value with the highest frequency of occurrence in a given distribution.

Sometimes, there are two or more such observations. In such a situation the distribution is multimodal.

Example 2.1

Given the following observations, determine the mode

37, 24, 74, 18, 24, 20, 72, 21, 23, 30, 24

Solution

The modal value is 24 because it occurs most often.

Example 2.2

From the values below, determine the mode: 14, 3, 6, 7, 1, 4, 6, 8, 1, 6, 3, 6, 3, 14, and 3

Solution

This is a case of a bimodal distribution. That is, 3 and 6 are the modal values. In a situation like this we find the arithmetic mean of 3 and 6.

$$\text{The mode} = \frac{3 + 6}{2} = 4.5$$

Mode for a Frequency Distribution

a. Determine the modal class i.e. class with the highest frequency

b. Apply the formula

$$\text{Mode} = L_m + \left(\frac{d_1}{d_1 + d_2}\right) C_m$$

Where Lm = lower class boundary of the modal class

d1= frequency of modal class minus frequency of the class

preceding the modal class

d2 = Frequency of the modal class minus frequency of the

class immediately after the modal class

Cm = Class interval of the modal class
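The two steps for the grouped mode can be sketched as follows. The function name `grouped_mode` is hypothetical, and the treatment of a modal class at either end of the table (a missing neighbour counted as frequency 0) is an assumption of this sketch rather than something stated in the text. The data are those of the complaints example that follows.

```python
def grouped_mode(lower_boundaries, freqs, modal_class_width):
    """Mode of a grouped distribution: Lm + (d1 / (d1 + d2)) * Cm."""
    modal = max(range(len(freqs)), key=lambda i: freqs[i])   # first class with highest frequency
    before = freqs[modal - 1] if modal > 0 else 0            # assumption: 0 if no class before
    after = freqs[modal + 1] if modal < len(freqs) - 1 else 0
    d1 = freqs[modal] - before
    d2 = freqs[modal] - after
    return lower_boundaries[modal] + d1 / (d1 + d2) * modal_class_width

# Complaints per store: classes 0-4, 5-9, 10-14, 15-19, 20-24, 25-34, 35-40.
boundaries = [-0.5, 4.5, 9.5, 14.5, 19.5, 24.5, 34.5]
stores = [5, 9, 17, 13, 12, 10, 4]
modal_complaints = grouped_mode(boundaries, stores, modal_class_width=5)
print(round(modal_complaints, 3))
```

Only the width of the modal class enters the formula, which is why the unequal widths of the last two classes do not matter here.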

Example 2.3

The number of complaints received at the service counter of a store owned by a company

during a month are as follows

Number of complaint Store

0 – 4 5

5 – 9 9

10 – 14 17

15 – 19 13

20 – 24 12

25 – 34 10

35 – 40 4

Determine the modal number of complaints at the store, using

$$\text{Mode} = L_m + \left(\frac{d_1}{d_1 + d_2}\right) C_m$$

Solution:

Number of complaint f

0 – 4 5

5 – 9 9

10 – 14 17

15 – 19 13

20 – 24 12

25 – 34 10

35 – 40 4

The modal class is the class 10 – 14, with lower class boundary Lm = 9.5

d1 = 17 – 9 = 8

d2 = 17 – 13 = 4

Cm = 5

$$\text{Mode} = 9.5 + \left(\frac{8}{8 + 4}\right) \times 5 = 9.5 + \frac{8}{12} \times 5 = 9.5 + 3.333 = 12.833 \approx 13 \text{ complaints}$$

2.5.3 Measures of Dispersion

In a distribution, measures of central tendency or location only describe the nature of the distribution, but fail to give any idea as to how individual values differ from the central value, i.e. whether they are closely packed around the central value or widely scattered away from it.

As a scientist once put it, "An average doesn't give us the full story." An average does not give information about the dispersion of values in the distribution. The magnitude of variation between two distributions that may have the same mean is known as dispersion.

Measures of Dispersion

The following are the measures of dispersion:

i. Range

ii. Quartile deviation or semi-interquartile range

iii. Mean deviation

iv. Standard deviation

v. Variance

vi. Interquartile range

Range

The simplest measure of dispersion is the range. It is defined as the difference between the

highest and the smallest number in a set of a given data. It is very easy to compute. It is not

suitable for further statistical analysis.

Another very good use of range is in statistical quality control where an estimate of the

variance of the sample mean is obtained from R. Also, in the construction of a control chart

to know if a process is in control or not.

Example 2.4

Find the range of the given set of data: 89,120, 146, 76, 50, 49, 60

Solution

Range = highest value - smallest value

= 146 - 49

= 97

Quartile Deviation (Q.D)

a). Inter-Quartile Range: It is the difference between the upper quartile, Q3, and the lower

quartile (Q1).

b). Semi-Interquartile Range: It is defined as half of the interquartile range. It is also known as the quartile deviation, and it is represented mathematically as

$$QD = \frac{Q_3 - Q_1}{2}$$

c). Coefficient of quartile deviation

$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Example 2.5

Ungrouped Data

Compute

i. Quartile deviation

ii. Interquartile range

iii. Co-efficient of quartile deviation for the given set of data below:

50, 64, 70, 85, 90, 100, 115, 160, 45, 54

Solution

The data should be arranged in ascending order, we have

45, 50, 54, 64, 70, 85, 90, 100, 115,160

Since the number of observations is even,

Position of Q1 = N/4 = 10/4 = 2.5

Q1 = value of the 2.5th item

Q1 = 50 + 0.5(54 – 50)

= 50 + 0.5(4)

Q1 = 52

Position of Q3 = 3N/4 = 3(10)/4 = 30/4 = 7.5

Q3 = value of the 7.5th item

= 90 + 0.5(100 – 90)

= 90 + 0.5(10)

= 95

Therefore

$$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2} = \frac{95 - 52}{2} = \frac{43}{2} = 21.5$$

Interquartile Range = Q3 – Q1

= 95 – 52

= 43

$$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{43}{147} = 0.29$$

Example 2.6

Compute

i. QD

ii. IQR

iii. CQD for the data below

5, 4, 50, 44, 69, 75, 10

Solution:

Arrange in ascending order

4, 5, 10, 44, 50, 69, 75

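Since the number of observations is odd,

Position of Q1 = (N + 1)/4 = 8/4 = 2

Q1 = value of the 2nd item

Q1 = 5

Position of Q3 = 3(N + 1)/4 = 3(8)/4 = 6

Q3 = value of the 6th item

Q3 = 69

Therefore

$$QD = \frac{Q_3 - Q_1}{2} = \frac{69 - 5}{2} = 32$$

IQR = Q3 – Q1

= 69 – 5

= 64

$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{64}{74} = 0.86$$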
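The positional rule used in Examples 2.5 and 2.6 can be written as a small routine. This sketch assumes the convention followed in those examples (positions N/4 and 3N/4 for an even number of observations, (N + 1)/4 and 3(N + 1)/4 for an odd number, with linear interpolation between neighbouring items); the function names are illustrative, and other textbooks and software packages use different quartile conventions.

```python
def item_at(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    whole = int(position)
    fraction = position - whole
    value = ordered[whole - 1]
    if fraction:
        # Interpolate between the item at 'whole' and the next item.
        value += fraction * (ordered[whole] - ordered[whole - 1])
    return value

def quartile_summary(values):
    """Q1, Q3, interquartile range, quartile deviation and its coefficient."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three observations")
    base = n if n % 2 == 0 else n + 1   # N for even N, N + 1 for odd N
    q1 = item_at(ordered, base / 4)
    q3 = item_at(ordered, 3 * base / 4)
    return {"Q1": q1, "Q3": q3, "IQR": q3 - q1,
            "QD": (q3 - q1) / 2, "CQD": (q3 - q1) / (q3 + q1)}

even_case = quartile_summary([50, 64, 70, 85, 90, 100, 115, 160, 45, 54])  # Example 2.5
odd_case = quartile_summary([5, 4, 50, 44, 69, 75, 10])                     # Example 2.6
print(even_case)
print(odd_case)
```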

Grouped Data

Example 2.7

Calculate the quartile deviation and its related coefficient from the following data:

Size 10-19 20-29 30-39 40-49 50-59 60-69 70-79

Frequency 1 4 6 7 11 7 4

Solution:

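Position of Q1 = N/4 = 40/4 = 10

Size     f     Class boundary    Cumulative frequency

10-19    1     9.5-19.5          1

20-29    4     19.5-29.5         5

30-39    6     29.5-39.5         11

40-49    7     39.5-49.5         18

50-59    11    49.5-59.5         29

60-69    7     59.5-69.5         36

70-79    4     69.5-79.5         40

From the table, the position of Q1 falls in the class 30 – 39, whose lower class boundary is 29.5.

Q1 for grouped data is

$$Q_1 = l + \left(\frac{\frac{N}{4} - m}{f}\right) \times C$$

where l is the lower class boundary of the quartile class

C is the class interval

f is the frequency of the quartile class

m is the cumulative frequency of the class below the quartile class

$$\Rightarrow Q_1 = 29.5 + \left(\frac{10 - 5}{6}\right) \times 10 = 29.5 + \frac{5}{6} \times 10 = 29.5 + 8.33 = 37.83$$

Position of Q3 = 3N/4 = (3 × 40)/4 = 30

The position of Q3 falls in the class 60 – 69, so l = 59.5, m = 29 and f = 7.

$$Q_3 = l + \left(\frac{\frac{3N}{4} - m}{f}\right) \times C = 59.5 + \left(\frac{30 - 29}{7}\right) \times 10$$

= 59.5 + 1.43

= 60.93

$$QD = \frac{Q_3 - Q_1}{2} = \frac{60.93 - 37.83}{2} = \frac{23.1}{2} = 11.55$$

$$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{60.93 - 37.83}{60.93 + 37.83} = \frac{23.1}{98.76} = 0.23$$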
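The grouped quartile formula used in Example 2.7 is the same interpolation applied at N/4 and 3N/4, so one function can serve both quartiles. The sketch below is illustrative only: `grouped_quantile` and its parameters are names chosen here, and a common class width is assumed.

```python
def grouped_quantile(lower_boundaries, freqs, class_width, fraction):
    """Interpolated quantile of a grouped distribution.

    'fraction' is 0.25 for Q1 and 0.75 for Q3; the formula is
    l + ((fraction * N - m) / f) * C for the class containing the position.
    """
    target = fraction * sum(freqs)
    below = 0                       # m: cumulative frequency below the class
    for lower, f in zip(lower_boundaries, freqs):
        if below + f >= target:
            return lower + (target - below) / f * class_width
        below += f
    raise ValueError("fraction must lie between 0 and 1")

# Example 2.7: classes 10-19, 20-29, ..., 70-79.
boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
frequencies = [1, 4, 6, 7, 11, 7, 4]
q1 = grouped_quantile(boundaries, frequencies, 10, 0.25)
q3 = grouped_quantile(boundaries, frequencies, 10, 0.75)
quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2), round(coefficient, 2))
```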

Mean Deviation

It is the arithmetic mean of the absolute differences of the values from the mean, median or mode. Average deviation has been defined as the average amount of scatter of the items in a distribution from either the mean or the median, ignoring the signs of the deviations.

$$\text{Mean Deviation} = \frac{\sum |x - \bar{x}|}{n} \quad \text{for ungrouped data}$$

and

$$\text{Mean Deviation} = \frac{\sum f\,|x - \bar{x}|}{\sum f} \quad \text{for grouped data}$$

Example 2.8

Find the mean deviation of the following series

x   17-19  20-25  26-35  36-40  41-50  51-55  56-60  61-70
f   9      16     12     26     14     12     6      5

Solution

Taking x as the class mid-point:

x      f    |x - x̄|   f|x - x̄|
18.0   9    20.245    182.205
22.5   16   15.745    251.92
30.5   12   7.745     92.94
38.0   26   0.245     6.37
45.5   14   7.255     101.57
53.0   12   14.755    177.06
58.0   6    19.755    118.53
65.5   5    27.255    136.275
Total  100            1066.87

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{3824.5}{100} = 38.245$$

$$MD = \frac{\sum f\,|x - \bar{x}|}{\sum f} = \frac{1066.87}{100} = 10.67$$

Standard Deviation (S.D)

In 1893, Karl Pearson introduced the concept of standard deviation.

It is defined as the positive square root of the mean of the squares of the deviations from the arithmetic mean.

$$S.D = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} \quad \text{OR} \quad S.D = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$$

Example 2.9

Find the standard deviation of the scores obtained by students in the table given below:

S/No.   1   2   3   4   5
Scores  50  46  70  89  15

Solution

x      (x - x̄)²
50     16
46     64
70     256
89     1225
15     1521
Total  3082

$$\bar{x} = \frac{270}{5} = 54$$

$$S.D = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{3082}{5}} = \sqrt{616.4} = 24.83$$
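As a quick check of Example 2.9, the following Python sketch repeats the calculation step by step. It assumes, as the text does, that the sum of squared deviations is divided by n; the standard-library function `statistics.pstdev` uses the same divisor and is shown only for comparison.

```python
import math
import statistics

scores = [50, 46, 70, 89, 15]

mean = sum(scores) / len(scores)
squared_devs = [(x - mean) ** 2 for x in scores]

# Divide by n (not n - 1), following the definition used in the text
sd = math.sqrt(sum(squared_devs) / len(scores))

print(mean, sum(squared_devs), round(sd, 2))
print(round(statistics.pstdev(scores), 2))  # same divisor, same result
```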

Example 2.10

Find the standard deviation for the following data

Marks frequency

15 – 20 20

20 – 25 26

25 – 30 44

30 – 35 60

35 – 40 101

40 – 45 109

45 – 50 84

50 – 55 66

55 – 60 10

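Solution:

Taking X as the class mid-point:

X      frequency
17.5   20
22.5   26
27.5   44
32.5   60
37.5   101
42.5   109
47.5   84
52.5   66
57.5   10

Here $\sum f = 520$, $\sum fx = 20545$ and $\sum fx^2 = 859350$, so that

$$SD = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2} = \sqrt{\frac{859350}{520} - \left(\frac{20545}{520}\right)^2} = \sqrt{1652.596 - 1561.0097} = \sqrt{91.5863} = 9.57$$

Coefficient of Variation

It is known as the relative measure of dispersion. It is given by

$$CV = \frac{\sigma}{\bar{x}} \times 100 \quad \text{or} \quad CV = \frac{\text{Standard error}}{\text{mean}} \times 100\%$$

Example 2.11

Use the previous example on standard deviation to calculate the coefficient of variation.

$$CV = \frac{\sigma}{\bar{x}} \times 100 = \frac{9.57}{39.5} \times 100 \approx 24.23$$

Note

The coefficient of variation based on the standard deviation or standard error helps in comparing the variability of two or more distributions.

Note

$$CV = \frac{\text{standard error of estimate}}{\text{estimate}} \times 100$$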
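The grouped-data calculations of Examples 2.10 and 2.11 can be reproduced with the sketch below. The variable names are illustrative assumptions; the code uses the second (computational) form of the standard deviation formula and the class mid-points from the solution table.

```python
import math

midpoints = [17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5]
freqs = [20, 26, 44, 60, 101, 109, 84, 66, 10]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n
sd = math.sqrt(sum_fx2 / n - mean ** 2)   # sqrt(sum fx^2 / sum f - mean^2)
cv = sd / mean * 100                      # coefficient of variation in percent

print(n, sum_fx, sum_fx2)
print(round(mean, 2), round(sd, 2), round(cv, 2))
```

The small difference from the hand-computed coefficient arises because the text rounds the mean to 39.5 before dividing.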

Skewness

It is the measure of the asymmetry of a distribution. Skewness means lack of symmetry. Skewness indicates whether the curve is turned more to one side than to the other. It is the degree of departure from symmetry of a distribution. A distribution can be positively skewed, negatively skewed or zero skewed.

For a positively skewed distribution, mean > median > mode. For a negatively skewed one, mean < median < mode. Measures of skewness give the direction and extent of the deviation of the distribution from symmetry. Karl Pearson's coefficient of skewness is given by the formula

$$\text{Coefficient of Skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} \quad \text{or} \quad \frac{3(\text{Mean} - \text{Median})}{\text{standard deviation}}$$

Bowley's coefficient of skewness is given by

$$\frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$

This is a measure based on the quartiles and hence is also known as the quartile coefficient of skewness.

Kelly's coefficient of skewness is defined as

$$CS = \frac{D_9 + D_1 - 2\,\text{Median}}{D_9 - D_1}$$

Among the three measures, Pearson's coefficient is used more often.

Example 2.12

Calculate the coefficient of Skewness using data in the table below

Height in inches 58 59 60 61 62 63 64 65

No. of students 10 18 30 42 35 28 16 8

Solution

Using Pearson's coefficient of skewness

$$\text{Coefficient of Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$$

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{11482}{187} = 61.4$$

Mode = 61

SD = 1.76

Therefore

$$\text{Coefficient of Skewness} = \frac{61.4 - 61}{1.76} = 0.23$$

Kurtosis

Kurtosis is the measure of peakedness of a distribution. A normal curve, which is symmetrical and bell shaped, is called a mesokurtic curve. If a curve is more peaked at the top than the normal curve, it is termed leptokurtic, and a curve which is flatter than the normal curve is called platykurtic.

This property of the distribution is measured by

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

where $\mu_2$ and $\mu_4$ are the second and fourth central moments.

Coefficient of kurtosis is $\gamma_2 = \beta_2 - 3$

For a normal (mesokurtic) distribution, $\beta_2 = 3$. If $\beta_2 > 3$, the distribution is called leptokurtic, and if $\beta_2 < 3$, the distribution is called platykurtic.

Summary of Study session 2

In this study, you have learnt about:

1) The meaning of Biometrics;

2) The sources of biological data;

3) How to identify problems associated with the collection of biological data;

4) Some basic terms in Biometrics; and

5) The various statistical tools.

Post-Test

1. Calculate the range, the semi-interquartile range (quartile deviation) and the coefficient of quartile deviation.

Wages (Rs)  30-32  32-34  34-36  36-38  38-40  40-42  42-44
Freq.       12     18     16     14     12     8      6

Ans: 14, 2.85, 0.08

2. Calculate the QD for the following data:

Height           58  59  60  61  62  63  64  65  66
No. of students  15  20  32  35  33  22  20  10  8

3. Calculate MD for

Age (yrs)        0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of patients  20    25     32     40     42     35     10     8

Ans: 10.73

4. Find the SD of the marks scored by the students given as 74, 72, 78, 75, 80, 84

Ans: 4.02

5. Calculate the standard deviation and coefficient of variation of the following data

Age (yrs)        0-10  10-20  20-30  30-40  40-50
No. of patients  5     16     18     14     47

Ans: $\bar{x} = 33.2$, SD = 13.07, CV = 39.36

6. The coefficients of variation of two groups are 60 and 80, and their standard deviations are 20 and 16. Find their means.

Ans: 33.3 and 20.

1. Find the median of the following observations 8, 2, 6, 1, 8, 9, 3

2. Determine the geometric mean of the following set of data 2, 4, 8, 11, 4


Study Session 3: Sampling and Sampling Distribution

Introduction

What comes to your mind when you think of research? Most likely, it is a search for more relevant information about a subject on which several studies have already been done. The major purpose of conducting research is to make inferences or generalize from a sample to a larger population. This process of making inferences or generalizations about the total population is accomplished by the use of statistical methods based on probability.



3.1 Identify various sampling methods

3.2 Test hypotheses

3.1 Sample

In order for you to be in good shape in this study, you need to understand some terms:

Population can be defined as the collection of all items that have something in common. A population is not limited to human beings alone. A population may be of either finite or infinite size.

Sample is the fractional part of the population selected in such a way that it is representative of the entire population. Sampling is the process of selecting the fractional part of the population as a sample.

Figure 3.1: Population sampling

Why do we sample? We sample because of the following reasons:

1. Collecting information from a fraction of the population costs less than attempting to cover the entire population.

2. A study of an entire population is impossible in most situations; hence, the sample must be as representative of the population as possible.

3. Results based on a sample can be more accurate than those based on the entire population.

4. Samples can be studied more quickly than the entire population.

5. Data is generally more readily available and more quickly analysed when the sample

is being studied.

3.1.1 Sampling Method

Now, we consider how the sample can be selected, which is referred to as the sampling

techniques.

There are basically two types of sampling techniques

1. Probability sampling/random sampling.

2. Non-probability/non random sampling.

Probability Sampling

This is the method whereby the sample is selected/drawn from the population in such a way

that every member of the population has equal/known chance of being selected. The sample

selected is free from bias. Therefore, inferences about the population from sample are valid.

There are five types of probability sampling:

1. Simple random sampling

2. Systematic sampling

3. Stratified sampling

4. Cluster sampling

5. Multi-stage sampling

Figure 3.1: Types of probability sampling

1 Simple Random Sampling

This is a sampling technique in which each unit of the population has an equal probability of being selected.

A simple random sample of n units is a sample selected in such a way that every one of the N units in the population has an equal chance of being selected. Simple random sampling is done either WITH REPLACEMENT or WITHOUT REPLACEMENT. It is the simplest technique we have in sampling. It is very easy to apply and it avoids bias, i.e. it is unbiased.
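The two forms of simple random sampling can be sketched in Python as follows. The function name `simple_random_sample`, the numbered frame of 100 units and the fixed seed are assumptions made for illustration; the sketch relies only on the standard `random` module.

```python
import random


def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw a simple random sample of n units from a sampling frame."""
    rng = random.Random(seed)
    if replace:
        # With replacement: a unit may be drawn more than once
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: every unit appears at most once
    return rng.sample(frame, n)


frame = list(range(1, 101))  # hypothetical numbered list of 100 units
without_rep = simple_random_sample(frame, 10, replace=False, seed=1)
with_rep = simple_random_sample(frame, 10, replace=True, seed=1)
print(without_rep)
print(with_rep)
```

Fixing the seed only makes the illustration repeatable; in practice the draw would be left unseeded or taken from a random number table.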

2 Systematic Sampling

This is a sampling method in which the population units are selected at a fixed interval from the sampling frame. Suppose there are N units in a population; selecting a sample of n involves taking a unit at random from the first k units (where k = N/n) and every kth unit thereafter.

$$k = \frac{\text{Population size } (N)}{\text{Sample size } (n)}, \quad \text{i.e. } k = \frac{N}{n}$$

Where k is the sampling interval

Example

If a sample of 10 students is to be taken from a list of 100 students in a public school, then the

systematic selection of this sample entails:

Solution:

i. Determining the sampling interval k as follows:

N = 100, and n = 10

K = N/n = 100/10

K = 10.

ii. Using the random number table, select a number at random (i.e. Random start) between 1 and 10.

iii. Say 3 is selected. Then the sample would include students with the following numbers:

3, 13, 23, 33, 43, 53, 63, 73, 83 and 93.

In this method every unit has the same chance (1/k) of being selected, but not every possible sample of size n is equally likely: the selection of the first unit (random start) determines the whole sample.
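The systematic selection in the example above can be written as a short Python sketch. The function name `systematic_sample` is an assumption for illustration, and the sketch assumes that N is an exact multiple of n, as in the example.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Select every k-th unit (k = N // n) after a random start in 1..k."""
    k = N // n  # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start
    return [start + i * k for i in range(n)]


# Random start of 3, as in the example: units 3, 13, ..., 93
sample = systematic_sample(100, 10, start=3)
print(sample)
```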

The major advantages of this method are:

1. It is more convenient than simple random sampling or stratified sampling.

2. Systematic Sampling may be more efficient than the Simple Random Sampling, in the

sense that the sample selected is a true representative of the whole population.

3 Stratified Sampling

If the population is very heterogeneous, the best method of taking a sample is through

stratified sampling. In this method, the population is divided into two or more homogenous

groups called “strata”. The stratification factor can be sex, age, marital status, department,

income level, academic qualification, state of origin etc.

Then in each stratum, the sample size required is taken using either simple random or

systematic techniques. The basic purpose of stratification as compared to simple random

sampling is to obtain reduction in sampling error and an increase in precision.

4 Cluster Sampling

This is a form of probabilistic sampling in which the population is first of all divided into

clusters or groups. Then a probability sample of these clusters is taken for examination. The

method of simple random sampling or systematic sampling could be employed for the

selection of clusters.

5 Multi-Stage Sampling

When sampling is done in stages, it is referred to as Multi-Stage Sampling. It is normally

used to cut down the number of investigators and costs of obtaining a sample.

In-Text Question

_____________ is a sampling technique in which each unit of the population has an equal

probability of being selected.

a. Simple random sampling

b. Systematic sampling

c. Stratified sampling

d. Cluster sampling


In-Text Answer

a. Simple random sampling

Non-Probability Sampling

This is a method of selecting a sample with a definite purpose. It is a sampling technique that does not take probability into consideration. The choice of sample depends entirely upon the discretion and judgement of the sampler. It does not give every unit of the population an equal or known chance of being selected or included in the sample.

Examples of non-probability sampling are haphazard sampling (i.e. any procedure of selection that is carried out in a zigzag manner), volunteer sampling and quota sampling.

1 Quota Sampling

This is a method whereby the population is sub divided into homogenous units just like

stratified sampling, but the main difference is that, the selection of unit within each stratum is

not Random. This method is commonly used in opinion polls, television audience research

and market research.

2 Sampling Frame

This is the numbered list of all the items in the population. A sampling frame should have the

following properties:

1. Completeness

2. Up-to-date

3. Adequacy

4. No-duplication

5. Convenience

6. Accuracy

3 Sampling Distribution

1. Parameter: It is a numerical descriptive measure of a population. It is based on all the observations in the population, and its value is usually unknown. It can also be referred to as a population characteristic.

2. Statistic: It is a numerical descriptive measure of a sample, used to make inferences about the PARAMETERS of a population. It can also be referred to as a sample characteristic.

For each sample drawn from a given population, either with or without replacement, we can compute a statistic (e.g. the mean $\bar{x}$, the standard deviation or a proportion). A given statistic such as a mean or a proportion will vary from sample to sample. The distribution of this statistic can therefore be examined to estimate the amount of variation that can be expected from one sample to another.

The probability distribution of such a statistic is referred to as a "SAMPLING DISTRIBUTION".

Sampling Distribution of Means ($\bar{X}$)

Let f(x) be the probability distribution of any given population from which we take a sample of size n. Then, the probability distribution of the statistic $\bar{X}$ is called the sampling distribution of means.

Theorem 1

If $x_1, x_2, x_3, \ldots, x_n$ represent a sample of size n, drawn without replacement from a finite population of size N with population mean $\mu$ and variance $\sigma^2$, then the sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

satisfies

1. $E(\bar{X}) = \mu$ (i.e. the mean of the sampling distribution equals the population mean, $\mu_{\bar{x}} = \mu$)

2. $V(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$

Where $V(\bar{X})$ denotes the variance of the sampling distribution and $\sigma^2$ denotes the population variance.

Theorem 2

If $x_1, x_2, x_3, \ldots, x_n$ represent a sample of size n, drawn with replacement or from an infinite population with mean $\mu$ and variance $\sigma^2$, then the sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

satisfies

(i) $E(\bar{X}) = \mu$, i.e. $\mu_{\bar{x}} = \mu$

(ii) $V(\bar{X}) = \frac{\sigma^2}{n}$

The standard deviation of the mean is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

The standard deviation of a sampling distribution is referred to as the STANDARD ERROR.

Example

A population of size N = 5 consisting of the ages of children suffering from measles in a

clinic. The ages are as follows.

X1 = 3, X2 = 5, X3 = 7 X4 = 9 and X5 = 11

i. List all the possible sample of size 2 which can be drawn for the population with

replacement i.e. from an infinite population.

ii. Find the population mean and standard deviation.

iii. Find the mean of the sampling distribution of mean

iv. Find the standard deviation of the sampling distribution of mean.

2. Solve (i) if the sampling is without replacement.

Solution:

1 With Replacement

N = 5

n = 2

Number of possible samples = $N^n = 5^2 = 25$ samples.

      3       5       7       9       11
3     (3,3)   (3,5)   (3,7)   (3,9)   (3,11)
5     (5,3)   (5,5)   (5,7)   (5,9)   (5,11)
7     (7,3)   (7,5)   (7,7)   (7,9)   (7,11)
9     (9,3)   (9,5)   (9,7)   (9,9)   (9,11)
11    (11,3)  (11,5)  (11,7)  (11,9)  (11,11)

(ii) Population mean ($\mu$)

$$\mu = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{3 + 5 + 7 + 9 + 11}{5} = 7$$

Population standard deviation

$$\sigma^2 = \frac{(3-7)^2 + (5-7)^2 + (7-7)^2 + (9-7)^2 + (11-7)^2}{5} = \frac{16 + 4 + 0 + 4 + 16}{5} = \frac{40}{5} = 8$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{8} = 2.83$$

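(iii) The mean of the sampling distribution of means.

The corresponding sample means ($\bar{x}$) are obtained by averaging each pair, e.g. for row 1, column 1, $\frac{3 + 3}{2} = 3$; for row 1, column 2, $\frac{3 + 5}{2} = 4$; etc.

x̄      f    f x̄   (x̄ - µ)   (x̄ - µ)²   f(x̄ - µ)²
3      1    3     -4        16         16
4      2    8     -3        9          18
5      3    15    -2        4          12
6      4    24    -1        1          4
7      5    35    0         0          0
8      4    32    1         1          4
9      3    27    2         4          12
10     2    20    3         9          18
11     1    11    4         16         16
Total  25   175                        100

$$\mu_{\bar{x}} = \frac{\sum f\bar{x}}{\sum f} = \frac{175}{25} = 7$$

By Theorem 2, $\mu_{\bar{x}} = \mu$, i.e. the mean of the sampling distribution of means must be equal to the population mean.

(iv) The standard deviation of the sampling distribution of means.

$$\sigma_{\bar{x}}^2 = \frac{\sum f(\bar{x} - \mu)^2}{\sum f} = \frac{100}{25} = 4, \qquad \sigma_{\bar{x}} = \sqrt{4} = 2$$

From Theorem 2,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Where $\sigma$ is the population standard deviation.

$$\sigma_{\bar{x}} = \frac{2.83}{\sqrt{2}} = 2$$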

(2) Without Replacement

Total possible samples = ${}^{N}C_{n}$ samples

N = 5

n = 2

${}^{5}C_{2} = 10$ samples.

Samples are:

(3,5), (3,7), (3,9), (3,11), (5,7), (5,9), (5,11), (7,9), (7,11) and (9,11)

The corresponding sample means ($\bar{x}$) are: 4, 5, 6, 7, 6, 7, 8, 8, 9 and 10.

$$\sigma_{\bar{x}}^2 = \frac{\sum f(\bar{x} - \mu)^2}{\sum f}$$

x̄      f    f x̄   (x̄ - µ)   (x̄ - µ)²   f(x̄ - µ)²
4      1    4     -3        9          9
5      1    5     -2        4          4
6      2    12    -1        1          2
7      2    14    0         0          0
8      2    16    1         1          2
9      1    9     2         4          4
10     1    10    3         9          9
Total  10   70                         30

$$\mu_{\bar{x}} = \frac{\sum f\bar{x}}{\sum f} = \frac{70}{10} = 7$$

$$\sigma_{\bar{x}}^2 = \frac{\sum f(\bar{x} - \mu)^2}{\sum f} = \frac{30}{10} = 3$$

Hence

$$\sigma_{\bar{x}} = \sqrt{3} = 1.73$$
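The results above can be verified by listing every possible sample in Python. This is a minimal sketch using only the standard library; the variable names are illustrative assumptions. It compares the variance of the enumerated sample means with the expressions in Theorem 2 (with replacement) and Theorem 1 (without replacement).

```python
from itertools import combinations, product
from statistics import mean, pvariance

population = [3, 5, 7, 9, 11]
N, n = len(population), 2

mu = mean(population)           # population mean
sigma2 = pvariance(population)  # population variance (divisor N)

# All possible samples of size n, with and without replacement
means_wr = [mean(s) for s in product(population, repeat=n)]
means_wor = [mean(s) for s in combinations(population, n)]

var_wr = pvariance(means_wr)    # expected: sigma^2 / n
var_wor = pvariance(means_wor)  # expected: (sigma^2 / n) * (N - n) / (N - 1)

print(len(means_wr), mean(means_wr), var_wr, sigma2 / n)
print(len(means_wor), mean(means_wor), var_wor, sigma2 / n * (N - n) / (N - 1))
```

In both cases the mean of the sample means equals the population mean, and the variances agree with the two theorems.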

Example 2

The weights at birth of a population of four puppies delivered in one day are given below

1 2 3 4

3.9 3.6 3.4 3.7

Suppose a simple Random Sample of size n = 2 is to be selected from this population of N=4

with replacement and without replacement.


i. List the entire possible sample which can be drawn with replacement and without replacement.

ii. Find the population mean and standard deviation

iii. Find the mean of sampling distribution of mean

iv. Find the standard errors of the sampling distribution of mean.

Solution:

1 With Replacement

Total possible samples when N = 4 and n = 2 is $N^n = 4^2 = 16$ samples

       3.9         3.6         3.4         3.7
3.9    (3.9,3.9)   (3.9,3.6)   (3.9,3.4)   (3.9,3.7)
3.6    (3.6,3.9)   (3.6,3.6)   (3.6,3.4)   (3.6,3.7)
3.4    (3.4,3.9)   (3.4,3.6)   (3.4,3.4)   (3.4,3.7)
3.7    (3.7,3.9)   (3.7,3.6)   (3.7,3.4)   (3.7,3.7)

(ii) The population mean ($\mu$) and standard deviation ($\sigma$).

$$\mu = \frac{\sum X}{N} = \frac{3.9 + 3.6 + 3.4 + 3.7}{4} = 3.65$$

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{0.0625 + 0.0025 + 0.0625 + 0.0025}{4} = \frac{0.13}{4} = 0.0325$$

$$\sigma = \sqrt{0.0325} = 0.18$$

iii. The corresponding sample means ($\bar{x}$)

x̄      3.9    3.6    3.4    3.7
3.9    3.9    3.75   3.65   3.8
3.6    3.75   3.6    3.5    3.65
3.4    3.65   3.5    3.4    3.55
3.7    3.8    3.65   3.55   3.7

The mean of the sampling distribution of means, $\mu_{\bar{x}}$

x̄      f    f x̄    (x̄ - µ)   (x̄ - µ)²   f(x̄ - µ)²
3.4    1    3.4    -0.25     0.0625     0.0625
3.5    2    7.0    -0.15     0.0225     0.045
3.55   2    7.1    -0.1      0.01       0.02
3.6    1    3.6    -0.05     0.0025     0.0025
3.65   4    14.6   0         0          0
3.7    1    3.7    0.05      0.0025     0.0025
3.75   2    7.5    0.1       0.01       0.02
3.8    2    7.6    0.15      0.0225     0.045
3.9    1    3.9    0.25      0.0625     0.0625
TOTAL  16   58.4                        0.26

$$\mu_{\bar{x}} = \frac{\sum f\bar{x}}{\sum f} = \frac{58.4}{16} = 3.65$$

The variance of the sampling distribution of means

$$\sigma_{\bar{x}}^2 = \frac{\sum f(\bar{x} - \mu)^2}{\sum f} = \frac{0.26}{16} = 0.01625$$

$$\sigma_{\bar{x}} = \sqrt{\sigma_{\bar{x}}^2} = \sqrt{0.01625} = 0.127$$

ii) WITHOUT REPLACEMENT

Total possible samples when N = 4 and n = 2

${}^{N}C_{n} = {}^{4}C_{2} = 6$ samples

The samples are

(3.9, 3.6), (3.9, 3.4), (3.9, 3.7), (3.6, 3.4), (3.6, 3.7) and (3.4, 3.7)

The corresponding sample means are

3.75, 3.65, 3.8, 3.5, 3.65 and 3.55

x̄      f    f x̄    (x̄ - µ)   (x̄ - µ)²   f(x̄ - µ)²
3.5    1    3.5    -0.15     0.0225     0.0225
3.55   1    3.55   -0.1      0.01       0.01
3.65   2    7.3    0         0          0
3.75   1    3.75   0.1       0.01       0.01
3.8    1    3.8    0.15      0.0225     0.0225
TOTAL  6    21.9                        0.065

$$\mu_{\bar{x}} = \frac{\sum f\bar{x}}{\sum f} = \frac{21.9}{6} = 3.65$$

The variance of the sampling distribution of means

$$\sigma_{\bar{x}}^2 = \frac{\sum f(\bar{x} - \mu)^2}{\sum f} = \frac{0.065}{6} = 0.0108$$

The standard deviation of the sampling distribution of means

$$\sigma_{\bar{x}} = \sqrt{\sigma_{\bar{x}}^2} = \sqrt{0.0108} = 0.104$$

Theorem 3

Suppose that samples are taken from a normal population with mean $\mu$ and variance $\sigma^2$, i.e. $X \sim N(\mu, \sigma^2)$. Then the sample mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

has a normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$, i.e. $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

Since the random variable $\bar{x}$ has mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, appropriately modifying the standardization formula gives

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Example 1

(i) The uric acid values of 13 pregnant women randomly selected in a maternity clinic are approximately normally distributed with a mean of 7.9 and a standard deviation of 0.7 mg percent respectively. Find the probability that a sample of this size will yield a mean:

(a) At most 8.2

(b) Between 7.5 and 8.2

(c) At least 7.4

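Solution:

$\mu$ = 7.9

S.D = $\sigma$ = 0.7

n = 13

(i)

$$P(\bar{x} \le 8.2) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \frac{8.2 - 7.9}{0.7/\sqrt{13}}\right) = P\left(Z \le \frac{0.3}{0.194}\right) = P(Z \le 1.55)$$

$$P(\bar{x} \le 8.2) = P(Z \le 1.55) = \Phi(1.55) = 0.9394$$

(ii)

$$P(7.5 < \bar{x} < 8.2) = P\left(\frac{7.5 - 7.9}{0.7/\sqrt{13}} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 7.9}{0.7/\sqrt{13}}\right) = P\left(\frac{-0.4}{0.194} < Z < \frac{0.3}{0.194}\right)$$

$$= P(-2.06 < Z < 1.55) = \Phi(1.55) - \Phi(-2.06) = \Phi(1.55) - \left(1 - \Phi(2.06)\right)$$

$$= 0.9394 - (1 - 0.9803) = 0.9394 - 0.0197 = 0.9197$$

(iii)

$$P(\bar{x} \ge 7.4) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \ge \frac{7.4 - 7.9}{0.7/\sqrt{13}}\right) = P(Z \ge -2.58)$$

$$= 1 - P(Z \le -2.58) = 1 - \left(1 - \Phi(2.58)\right) = 1 - 0.0049 = 0.9951$$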
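The three probabilities in this example can be checked without tables, as in the sketch below. It assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative. Because the code does not round z to two decimal places, the results differ slightly in the third or fourth decimal place from the table-based answers.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7.9, 0.7, 13

# Sampling distribution of the mean: N(mu, sigma^2 / n)
xbar_dist = NormalDist(mu, sigma / sqrt(n))

p_at_most = xbar_dist.cdf(8.2)                       # (a) P(xbar <= 8.2)
p_between = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.5)  # (b) P(7.5 < xbar < 8.2)
p_at_least = 1 - xbar_dist.cdf(7.4)                  # (c) P(xbar >= 7.4)

print(round(p_at_most, 4), round(p_between, 4), round(p_at_least, 4))
```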

Example

A certain liquid drug is marketed in bottles containing a nominal 20 ml of drug. Tests on 10 bottles indicate that the volume of liquid in each bottle is distributed normally with mean 20.42 ml and standard deviation 0.429 ml.

(i) Estimate the percentage of bottles which would be expected to contain less than 20 ml of drug.

(ii) Estimate the percentage of bottles expected to contain between 20 ml and 20.5 ml.

Solution

$\mu$ = 20.42

$\sigma$ = 0.429

n = 10

(i)

$$P(\bar{x} \le 20) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \frac{20 - 20.42}{0.429/\sqrt{10}}\right) = P\left(Z \le \frac{-0.42}{0.136}\right) = P(Z \le -3.09)$$

$$P(\bar{x} \le 20) = P(Z \le -3.09) = 1 - \Phi(3.09) = 1 - 0.9990 = 0.0010$$

Therefore 0.1% of bottles would be expected to contain less than 20 ml of drug.

(ii)

$$P(20 \le \bar{x} \le 20.5) = P\left(\frac{20 - 20.42}{0.429/\sqrt{10}} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \frac{20.5 - 20.42}{0.429/\sqrt{10}}\right) = P(-3.09 \le Z \le 0.59)$$

$$= \Phi(0.59) - \left(1 - \Phi(3.09)\right) = 0.7224 - 0.0010 = 0.7214$$

3.2 Hypothesis Testing

When making a statistical enquiry, we often put forward a hypothesis concerning a

population parameter. The essence of hypothesis testing is to aid the researcher in reaching a

decision concerning a population by examining a sample from that population.

A statistical hypothesis test is usually structured in terms of two mutually exclusive

hypotheses referred to as the NULL HYPOTHESIS and ALTERNATIVE HYPOTHESIS

denoted by H0 and H1 respectively.


3.2.1 Definition of some Basic Concepts

i. Statistical Hypothesis: This is an assumption made about the value of a population parameter or about the distribution of a population.

ii. Null Hypothesis (H0): This is a statement of no significant difference between the observed value and the theoretical value. It is denoted by H0. It can either be accepted or rejected.

iii. Alternative Hypothesis: This is a statement that there is a significant difference between the observed value and the theoretical value. It is a statement against the null hypothesis and it is usually denoted by H1. There are two types of alternative hypothesis:

a. One tailed test

b. Two tailed test.

(i) ONE TAILED TEST: A one tailed test looks for a definite decrease or a definite increase in the parameter. The hypotheses could be

(i) Definite decrease     (ii) Definite increase

H0 : µ = 25               H0 : µ = 25

H1 : µ < 25               H1 : µ > 25

(ii) TWO TAILED TEST: A two tailed test looks for any change in the parameter.

Example

H0 : µ = 25

H1 : µ ≠ 25 (i.e. µ < 25 or µ > 25)

(iii) LEVEL OF SIGNIFICANCE (α): This is usually specified so as to guide one to either accept or reject H0. If we reject H0 at this level, we say that "there is significant evidence at the α level of significance".

(1) TYPE 1 ERROR: This error occurs if the null hypothesis is rejected when it is in fact true. The probability of committing a type 1 error is referred to as the level of significance. It is a conditional probability and is denoted by α.

P(Rejecting H0 | H0 is true) = α

1 - α = P(Accepting H0 | H0 is true)

(2) TYPE 2 ERROR: This error occurs if the null hypothesis is accepted when it is in fact false. Its probability is denoted by β such that

β = P(Accepting H0 | H0 is false)

1 - β = P(Rejecting H0 | H0 is false)

(3) CRITICAL REGION: This is the set of values of the test statistic that leads to the rejection of the null hypothesis. The critical region depends on the type and the level of the test chosen. The boundaries of the critical region are called the critical values.

In general, when performing significance test, it is useful to follow a set procedure.

(1) State the null hypothesis H0 and the alternative hypothesis H1

NB. If we are looking for a definite increase or definite decrease in the population

parameter, we use a one tailed test and if we are looking for any change we use a two-

tailed test.

(2) State the level of significance.

(3) Determine the appropriate test statistic, which is based on the random sample, i.e. if n is less than 30 use the t statistic and if n ≥ 30 use the Z statistic

(4) Determine the critical region using the cumulative distribution table for the test

statistic.

(5) Compute the value of the test statistic based on the sample information.

(6) Make a conclusion: If the value of the test statistic lies in the critical region, reject Ho.

If the value of the test statistic does not lie in the critical region, accept Ho.

Hypothesis for a Single Population Mean

Example 1

Nine crippled children were asked to perform a certain task. The average time required to perform the task was 7 minutes with a standard deviation of 2 minutes. Test the null hypothesis that $\mu_0 = 10$ minutes at $\alpha = 0.1$.

Solution

H0 : µ = 10

H1 : µ ≠ 10

α = 0.1

Test statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

$$t_{cal} = \frac{7 - 10}{2/\sqrt{9}} = \frac{-3}{2/3} = \frac{-3}{0.67} = -4.5$$

For a two tailed test with n - 1 = 8 degrees of freedom, the critical values from the t table are $\pm t_{0.95,\,8} = \pm 1.860$.

Conclusion

Since $t_{cal}$ lies in the rejection region, we reject H0 and conclude that there is significant evidence at the 10% level that the population mean is not 10.
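The computation of the test statistic in this example can be sketched in Python. The function name `one_sample_t` is an assumption for illustration, and the critical value 1.860 is taken from a standard t table (8 degrees of freedom, two tailed α = 0.1) rather than computed by the code.

```python
from math import sqrt


def one_sample_t(xbar, mu0, s, n):
    """t statistic for a single mean: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))


t_cal = one_sample_t(xbar=7, mu0=10, s=2, n=9)
t_crit = 1.860  # tabulated value, 8 d.f., two tailed alpha = 0.1 (assumed from tables)

# Two tailed decision rule: reject H0 when |t_cal| exceeds the critical value
reject_h0 = abs(t_cal) > t_crit
print(round(t_cal, 2), reject_h0)
```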

Example

A researcher wishes to estimate the mean maximum weight at birth of 40 randomly selected

babies delivered in four teaching hospitals in a day. He is willing to assume that the weights

are approximately normally distributed with a variance of 144. He found out that the mean

weight for these babies is 84.3kg. Can the researcher conclude that the weight for the

population is at most 95kg? Use α = 0.01

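Solution

H0 : µ ≤ 95

H1 : µ > 95

α = 0.01

Test statistic

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

$$Z_{tab} = Z_{\alpha} = Z_{0.01} = 2.326$$

$$Z_{cal} = \frac{84.3 - 95}{12/\sqrt{40}} = \frac{-10.7}{1.90} = -5.63$$

Conclusion

Since $Z_{cal}$ lies in the acceptance region, we accept H0 and conclude that the mean weight for the population is at most 95kg.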

Hypothesis Testing for a Single Population Variance

Example

A sample of 40 observations resulting from an experiment yielded a mean and standard deviation of 71.5 and 12 respectively. Can we conclude at 0.05 level of significance that the population variance is 150?

Solution

$H_0 : \sigma^2 = 150$

$H_1 : \sigma^2 \ne 150$

$\alpha = 0.05$

The critical values are

$\chi^2_{(1 - \alpha/2),\,(n-1)}$ and $\chi^2_{(\alpha/2),\,(n-1)}$

= $\chi^2_{0.975,\,39}$ and $\chi^2_{0.025,\,39}$

= 58.12 and 23.65

Test statistic

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(39)(144)}{150} = 37.44$$

Conclusion: Since $\chi^2_{cal}$ lies in the acceptance region (23.65 < 37.44 < 58.12), we accept H0 and conclude that the population variance is $\sigma^2 = 150$.

3.2.2 Hypothesis Testing on the Difference of Two Population Means

The following may be used to test whether there is a significant difference between two population means. In such cases, one of the following hypotheses is tested.

(1) $H_0 : \mu_1 - \mu_2 = 0$ against $H_1 : \mu_1 - \mu_2 \ne 0$

(2) $H_0 : \mu_1 - \mu_2 \ge 0$ against $H_1 : \mu_1 - \mu_2 < 0$

(3) $H_0 : \mu_1 - \mu_2 \le 0$ against $H_1 : \mu_1 - \mu_2 > 0$

(1) We will consider the case when $\sigma_1^2$ and $\sigma_2^2$ are known

We use the test statistic

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Which is distributed as N(0, 1)

(2) When $\sigma_1^2$ and $\sigma_2^2$ are unknown but assumed equal

We use the test statistic

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

With $n_1 + n_2 - 2$ degrees of freedom

Where

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Example

On assumption of office, the General Manager of Global Textile Plc was told that the average output of the 120 factory workers is 60 bundles of textile per month with a standard deviation of 19 bundles per month. He considered this average output too low and attributed it to the inefficiency of old workers in the factory. He therefore decided to retire those who were above 50 years and employed younger workers in their place.

After a year, the factory manager reported that the mean output of the present 115 workers in the factory had increased to 70 bundles with a standard deviation of 16 bundles per month. The G.M referred the case to a researcher with the question: has the mean output really increased? Answer this question by carrying out a test at the 5% level of significance.

Solution

Given

$n_1 = 120$, $\bar{X}_1 = 60$, $\sigma_1 = 19$

$n_2 = 115$, $\bar{X}_2 = 70$, $\sigma_2 = 16$

$H_0 : \mu_1 - \mu_2 = 0$ against

$H_1 : \mu_1 - \mu_2 < 0$

$\alpha = 0.05$

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

$$Z_{\alpha} = Z_{1 - 0.05} = Z_{0.95} = 1.645$$

$$Z_{cal} = \frac{(60 - 70) - 0}{\sqrt{\dfrac{19^2}{120} + \dfrac{16^2}{115}}} = \frac{-10}{\sqrt{\dfrac{361}{120} + \dfrac{256}{115}}} = \frac{-10}{\sqrt{3.01 + 2.223}} = \frac{-10}{\sqrt{5.24}} = \frac{-10}{2.29} = -4.37$$

Conclusion

Since $Z_{cal} = -4.37$ is less than $-1.645$, it lies in the critical region; therefore we reject H0 and conclude that the mean output has really increased.

Example 2

Suppose that random samples from populations A and B respectively give the following results:

         Sample mean   Sample variance   Sample size
POP A    12.6          10.02             9
POP B    8.9           16.08             7

If the variances are assumed equal, test whether there is a difference in the population means at the 5% level of significance.

Solution

$H_0 : \mu_1 - \mu_2 = 0$ against

$H_1 : \mu_1 - \mu_2 \ne 0$

$\alpha = 0.05$

$$t_{cal} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

$$s_p = \sqrt{\frac{(9 - 1)(10.02) + (7 - 1)(16.08)}{9 + 7 - 2}} = 3.55$$

$$t_{cal} = \frac{(12.6 - 8.9) - 0}{3.55\sqrt{\dfrac{1}{9} + \dfrac{1}{7}}} = \frac{3.7}{3.55(0.5)} = 2.084$$

and

$$t_{tab} = t_{1 - \alpha/2,\; n_1 + n_2 - 2} = t_{0.975,\; 9 + 7 - 2} = 2.145$$

Conclusion

Since $t_{cal}$ lies in the acceptance region, we accept H0 and conclude that there is no difference in the population means.
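The pooled two-sample calculation of Example 2 can be sketched in Python as follows. The function name `pooled_t` is an assumption for illustration, and the critical value 2.145 is the tabulated value quoted in the text. Because the code does not round the square root term to 0.5, it gives a slightly smaller statistic (about 2.07) than the hand calculation; the conclusion is unchanged.

```python
from math import sqrt


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Pooled standard deviation and t statistic for H0: mu1 - mu2 = 0."""
    sp = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))
    return sp, t


sp, t_cal = pooled_t(12.6, 10.02, 9, 8.9, 16.08, 7)
t_tab = 2.145  # t(0.975, 14 d.f.), as quoted in the text

# Two tailed decision rule: reject H0 only when |t_cal| exceeds t_tab
reject_h0 = abs(t_cal) > t_tab
print(round(sp, 2), round(t_cal, 2), reject_h0)
```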

Estimation

The subject of estimation is concerned with the methods by which population characteristics,

i.e. parameters are estimated from sample information. Suppose that a population has an

unknown parameter, such as the mean and the variance. We take a random sample (or

samples) from the population and make an estimate of the unknown parameter from this.

(i) Point Estimation: This is a single numerical value used to estimate the

corresponding population parameter.

(ii) Interval Estimation: This consists of two numerical values defining an interval

with varying degrees of confidence.

Techniques for Interval Estimation

(1) Make a direct probability statement, including the parameter to be estimated.

(2) Rearrange the confidence interval so that it will be in the form of direct inequalities of

the parameter.

Estimator: This is a statistic used to estimate the value of a parameter. It is denoted by a symbol such as $\hat{\mu}$ or $\hat{\sigma}^2$.

An Estimate: The numerical value taken by the estimator for a particular sample. For example, the sample mean ($\bar{x}$) is an estimator for the population mean ($\mu$), whereas the sample variance ($S^2$) is an estimator for the population variance ($\sigma^2$).

In-Text Question

Point Estimation is a single numerical value used to estimate the corresponding population parameter. True/False?

In-Text Answer

True

Properties of a Good Estimator

(1) Unbiasedness: An estimator is said to be unbiased, if the average value i.e. the

expected value of the estimator is equal to the true value of parameter being estimated

i.e. ( ) µ=xE

(2) Efficiency: The efficiency of any estimator is measured in terms of its variance. The

lower the variance, the more efficient an estimator is said to be.

(3) Consistency: An estimator is consistent if, as the sample size becomes large, the probability increases that the estimate will approach the true value of the population parameter.

(4) Sufficiency: An estimator is sufficient if it uses all the information that a sample can provide about a population parameter and no other estimator can provide additional information.

An interval estimate is a range within which we expect the true value of the population parameter to lie. We shall be discussing the construction of confidence intervals as a measure of interval estimation.

The confidence we have that a population parameter will fall within some confidence interval = 1 - α, where α is the significance level.

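(1) Construction of Confidence Interval for the Mean when the Variance ($\sigma^2$) is known, the population is normal, n is large and the point estimate is $\bar{x}$

Then, the 100(1 - α)% confidence interval for µ is obtained as follows:

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

$$P\left(-z_{1-\alpha/2} \le Z \le z_{1-\alpha/2}\right) = 1 - \alpha$$

$$P\left(-z_{1-\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{1-\alpha/2}\right) = 1 - \alpha$$

$$-z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

$$\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$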

Examples

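(1) The mean weight of a random sample of 80 yams is 3.75 kg with a standard deviation of 0.85 kg. Construct the 95% confidence interval for estimating the population mean µ of the entire farm.

$\bar{x} = 3.75$

$\sigma = 0.85$

n = 80

$$Z_{cal} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

A 100(1 - α)% C.I. for µ implies

$$\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

$$z_{1-\alpha/2} = z_{1-0.05/2} = z_{0.975} = 1.96$$

$$3.75 - 1.96\left(\frac{0.85}{\sqrt{80}}\right) \le \mu \le 3.75 + 1.96\left(\frac{0.85}{\sqrt{80}}\right)$$

$$3.75 - 1.96(0.095) \le \mu \le 3.75 + 1.96(0.095)$$

$$3.75 - 0.1862 \le \mu \le 3.75 + 0.1862$$

$$3.5638 \le \mu \le 3.9362 \;\Rightarrow\; (3.5638,\ 3.9362)$$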
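The yam calculation can be checked with a short Python sketch. The function name `z_confidence_interval` is illustrative, and the sample standard deviation is treated as the known σ, as the example does.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for the population mean when sigma is treated as known."""
    alpha = 1 - confidence
    # z_{1 - alpha/2}: the standard normal quantile that leaves alpha/2 in the upper tail
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin


# Yam example: x-bar = 3.75 kg, sigma = 0.85 kg, n = 80
lower, upper = z_confidence_interval(3.75, 0.85, 80)
print(f"({lower:.4f}, {upper:.4f})")
```

The small difference from the hand calculation in the fourth decimal place comes from using the exact quantile rather than the rounded value 1.96.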

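Confidence Interval for µ when the variance $\sigma^2$ is unknown, the population is normal, n is small and the point estimate is $\bar{x}$

Then, the 100(1 - α)% confidence interval for µ is obtained from

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$ at n - 1 degrees of freedom

v = n - 1

$$P\left(-t_{\alpha/2,\,v} \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{\alpha/2,\,v}\right) = 1 - \alpha$$

$$-t_{\alpha/2,\,v}\,\frac{s}{\sqrt{n}} \le \bar{x} - \mu \le t_{\alpha/2,\,v}\,\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2,\,v}$ is the value of t with upper-tail area α/2 (equivalently, the 1 - α/2 quantile).

Rearrange to reflect µ:

$$C.I. = \bar{x} - t_{\alpha/2,\,v}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,v}\,\frac{s}{\sqrt{n}}$$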

Example

The cost of assembling an item of equipment has been estimated by obtaining a sample of 15 jobs. The average cost of assembly derived from the sample was N1.23 million with a standard deviation of N0.27 million. Form a 99% confidence interval for the mean cost of assembling the items.

Solution

n = 15

$\bar{X}$ = 1.23

S = 0.27

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

A 100(1 - α)% C.I. for µ implies

$$C.I. = \bar{x} - t_{\alpha/2,\,v}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,v}\,\frac{s}{\sqrt{n}}$$

$$t_{1-\alpha/2,\,v} = t_{0.995,\,14} = 2.977$$

$$1.23 - 2.977\left(\frac{0.27}{\sqrt{15}}\right) \le \mu \le 1.23 + 2.977\left(\frac{0.27}{\sqrt{15}}\right)$$

$$1.23 - 2.977(0.07) \le \mu \le 1.23 + 2.977(0.07)$$

$$1.23 - 0.21 \le \mu \le 1.23 + 0.21$$

$$1.02 \le \mu \le 1.44 \;\Rightarrow\; (1.02,\ 1.44)$$

Example

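A sample of 25 boys yielded a mean weight and standard deviation (S.D.) of 73 and 10 pounds respectively. Assuming a normally distributed population, find the 90% confidence interval for µ.

Solution

n = 25

S = 10

$\bar{X}$ = 73

$$t_{cal} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

A 100(1 - α)% C.I. for µ implies

$$C.I. = \bar{x} - t_{\alpha/2,\,v}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,v}\,\frac{s}{\sqrt{n}}$$

$$t_{1-\alpha/2,\,v} = t_{0.95,\,24} = 1.711$$

$$73 - 1.711\left(\frac{10}{\sqrt{25}}\right) \le \mu \le 73 + 1.711\left(\frac{10}{\sqrt{25}}\right)$$

$$73 - 1.711(2) \le \mu \le 73 + 1.711(2)$$

$$73 - 3.422 \le \mu \le 73 + 3.422$$

$$69.578 \le \mu \le 76.422 \;\Rightarrow\; (69.578,\ 76.422)$$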
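The two t-based intervals above follow the same arithmetic, which can be sketched in Python as below. The critical values are passed in from the t table (2.977 and 1.711, as used in the examples) because the Python standard library has no t quantile function; the function name is illustrative.

```python
from math import sqrt


def t_confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper) for the mean using a tabulated t critical value."""
    margin = t_crit * s / sqrt(n)  # t * s / sqrt(n)
    return mean - margin, mean + margin


# Assembly cost example: n = 15, t_{0.995, 14} = 2.977 (from the t table)
cost_ci = t_confidence_interval(1.23, 0.27, 15, t_crit=2.977)

# Weight example: n = 25, t_{0.95, 24} = 1.711 (from the t table)
weight_ci = t_confidence_interval(73, 10, 25, t_crit=1.711)

print([round(v, 2) for v in cost_ci])
print([round(v, 3) for v in weight_ci])
```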

Confidence Interval for Variance ($\sigma^2$)

A 100(1 - α)% C.I. for $\sigma^2$ is

$$\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,(n-1)}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,(n-1)}}$$

Examples

Suppose a sample variance $S^2$ = 10.42 is obtained from a random sample of size 25 from a normal population. Obtain a 95% confidence interval for $\sigma^2$.

Solution:

n = 25

$S^2$ = 10.42

$$\chi^2_{0.975,\,n-1} = \chi^2_{0.975,\,24} = 39.36$$

$$\chi^2_{0.025,\,n-1} = \chi^2_{0.025,\,24} = 12.40$$

A 100(1 - α)% confidence interval for $\sigma^2$ implies

$$\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,v}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,v}}$$

where v = n - 1.

$$\frac{(24)(10.42)}{39.36} \le \sigma^2 \le \frac{(24)(10.42)}{12.40}$$

$$6.35 \le \sigma^2 \le 20.17 \;\Rightarrow\; (6.35,\ 20.17)$$

Example 2

In an experiment to determine the amount of carbon monoxide emitted by a generator in a day, a sample of 15 normal generators of the same capacity and brand was taken from an industrial area, and the sample yielded a mean of 96 and S.D. of 35. Construct the 95% confidence interval for $\sigma^2$.

Solution

$S^2 = (35)^2$

n = 15

A 100(1 - α)% confidence interval for $\sigma^2$ implies

$$\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,v}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,v}}$$

$$\chi^2_{0.975,\,14} = 26.12$$

$$\chi^2_{0.025,\,14} = 5.629$$

$$\frac{(14)(35)^2}{26.12} \le \sigma^2 \le \frac{(14)(35)^2}{5.629}$$

$$656.58 \le \sigma^2 \le 3046.72 \;\Rightarrow\; (656.58,\ 3046.72)$$
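As a check on the two variance intervals, the sketch below applies the same formula in Python. The chi-square values are taken from the table, as in the examples, and the function and argument names are illustrative.

```python
def variance_confidence_interval(s2, n, chi2_upper, chi2_lower):
    """Return (lower, upper) for sigma^2 using tabulated chi-square values.

    chi2_upper is the 1 - alpha/2 quantile and chi2_lower the alpha/2 quantile,
    both at n - 1 degrees of freedom.
    """
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower


# Example 1: S^2 = 10.42, n = 25, table values 39.36 and 12.40
ci1 = variance_confidence_interval(10.42, 25, chi2_upper=39.36, chi2_lower=12.40)

# Example 2: S = 35, n = 15, table values 26.12 and 5.629
ci2 = variance_confidence_interval(35 ** 2, 15, chi2_upper=26.12, chi2_lower=5.629)

print([round(v, 2) for v in ci1])
print([round(v, 2) for v in ci2])
```

Note that the larger chi-square value gives the lower limit, because it appears in the denominator.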

3.2.3 Probability Distribution and Hypothesis Testing

Please follow on as the focus of this study shifts to Binomial Distribution:

Binomial Distribution

1. An experiment whose outcome cannot be predicted with certainty is termed a random experiment.

2. The totality of all outcomes of a random experiment constitutes a space known as the sample space.

3. A collection of points of the sample space is termed an event associated with the random experiment (each point of a sample space is referred to as an elementary event).

4. A real-valued function defined on the outcomes of a random experiment is known as a random variable. If a random variable takes at most a countable number of values, it is called a discrete random variable. On the other hand, if a random variable assumes all the values between certain limits, it is said to be a continuous random variable.

The random variable X is said to have a Binomial distribution with two parameters n and p, written as X ~ B(n, p), where n is the number of independent trials and p the probability of success. Then, the probability of x successes in n independent trials, when p is constant from trial to trial, is

$$P(x) = {}^{n}C_x\, p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

where p + q = 1.

Mean µ = np

Variance $\sigma^2$ = npq

Properties of Binomial Distribution

i. The experiment consists of a fixed number, n, of trials.

ii. The trials are independent

iii. Only two possible outcomes are found in a trial e.g. success or failure.

iv. The probability of success p is constant from trial to trial.

Fitting Binomial Distribution and Goodness-of-Fit Test

A goodness-of-fit test is appropriate when one wishes to decide if an observed distribution of frequencies is compatible with some preconceived or hypothesized distribution. For example, we may wish to determine whether or not a sample of observed values of some random variable is compatible with the hypothesis that it was drawn from a population of values that are normally distributed.

NOTE

In the case of categorical data, compute the expected values $E_i$ using the formula

$$E_i = \frac{r_i \, c_i}{n}$$

where $r_i$ represents the row total, $c_i$ represents the column total and n represents the grand total.

For a binomial distribution,

$$P(X = x) = {}^{n}C_x\, p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

and p is chosen such that $\bar{X} = np$.

The expected frequency is obtained by $E_i = N\,P(X = x)$.

Example 1

The following table summarizes 100 observations of the discrete random variable X.

x    0    1    2    3    4
f    25   14   25   26   10

Perform a $\chi^2$-test of the hypothesis that X has a binomial distribution.

Solution

Assume X has a binomial distribution with n = 4.

x       f      fx
0       25     0
1       14     14
2       25     50
3       26     78
4       10     40
Total   100    182

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{182}{100} = 1.82$$

To get p, we let $\bar{X} = np$:

$$p = \frac{\bar{X}}{n} = \frac{1.82}{4} = 0.455$$

$$q = 1 - p = 1 - 0.455 = 0.545$$

We now compute

$$P(X = x) = {}^{4}C_x\, p^x q^{4-x}, \quad x = 0, 1, 2, 3, 4$$

$$P(X = 0) = {}^{4}C_0\, p^0 q^4 = 0.0882$$

$$P(X = 1) = {}^{4}C_1\, p^1 q^3 = 0.2946$$

$$P(X = 2) = {}^{4}C_2\, p^2 q^2 = 0.3689$$

$$P(X = 3) = {}^{4}C_3\, p^3 q^1 = 0.2054$$

$$P(X = 4) = {}^{4}C_4\, p^4 q^0 = 0.0429$$

The expected frequency is calculated as $E_i = N\,P(X = x)$. We have,

x             0        1        2        3        4
P(X = x)      0.0882   0.2946   0.3689   0.2054   0.0429
N P(X = x)    8.82     29.46    36.89    20.54    4.29

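NOTE: If an expected frequency is less than 5, combine the class with an adjacent class. We have

x       O_i    E_i      (O_i - E_i)²   (O_i - E_i)²/E_i
0       25     8.82     261.79         29.68
1       14     29.46    239.0          8.11
2       25     36.89    141.37         3.83
3       26     20.54    29.81          1.45
4       10     4.29     32.60          7.60
Total                                  50.67

H0: The distribution follows a binomial distribution.

H1: The distribution does not follow a binomial distribution.

$$\chi^2_{cal} = 50.67$$

$$\chi^2_{tab} = \chi^2_{1-\alpha,\,v-1-m} = \chi^2_{0.95,\,4-1-1} = \chi^2_{0.95,\,2} = 5.991$$

Here v is the number of classes and m = 1 is the number of estimated parameters (p). The table above lists the classes before combining; the class x = 4 has $E_i$ = 4.29 < 5, and combining it with x = 3 leaves v = 4 classes, the value used for the degrees of freedom. The decision below is the same with or without combining.

Decision: Since $\chi^2_{cal} > \chi^2_{tab}$ at 5%, we reject H0 and conclude that the random variable X does not follow a binomial distribution.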
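The fitting steps of this example (estimate p from the sample mean, compute expected frequencies, then sum the chi-square contributions) can be sketched in Python as follows. This is a minimal illustration using the five classes as tabulated, without combining; variable names are illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


observed = [25, 14, 25, 26, 10]          # frequencies for x = 0, 1, 2, 3, 4
n_trials = len(observed) - 1              # n = 4
total = sum(observed)                     # N = 100

# Estimate p by setting the sample mean equal to n * p
mean = sum(x * f for x, f in enumerate(observed)) / total
p_hat = mean / n_trials

# Expected frequencies E_i = N * P(X = x)
expected = [total * binomial_pmf(x, n_trials, p_hat) for x in range(n_trials + 1)]

# Chi-square statistic: sum of (O_i - E_i)^2 / E_i
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(p_hat, 3), [round(e, 2) for e in expected], round(chi2, 2))
```

The statistic differs from the hand calculation only in the second decimal place, because the code does not round the expected frequencies.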

Example

Seven coins are tossed and the number of heads obtained was noted. The experiment is

repeated 128 times and the following distribution is obtained.

No of Heads 0 1 2 3 4 5 6 7 Total

Frequency 7 6 19 35 30 23 7 1 128

Fit a binomial distribution assuming that

(i) The coin is unbiased

(ii) The nature of the coin is not known

Solution:

(i) When the coin is unbiased, p = q = 1/2

$$P(X = x) = {}^{n}C_x\, p^x q^{n-x} = {}^{7}C_x \left(\tfrac{1}{2}\right)^7$$

x    P(x)             Expected frequency
0    7C0 (1/2)^7      128 (1/2)^7 (1)  = 1
1    7C1 (1/2)^7      128 (1/2)^7 (7)  = 7
2    7C2 (1/2)^7      128 (1/2)^7 (21) = 21
3    7C3 (1/2)^7      128 (1/2)^7 (35) = 35
4    7C4 (1/2)^7      128 (1/2)^7 (35) = 35
5    7C5 (1/2)^7      128 (1/2)^7 (21) = 21
6    7C6 (1/2)^7      128 (1/2)^7 (7)  = 7
7    7C7 (1/2)^7      128 (1/2)^7 (1)  = 1

(ii) When the nature of the coin is not known

$$\text{Mean} = np = \frac{433}{128} = 3.3828$$

$$p = \frac{3.3828}{7} = 0.4832571 \approx 0.4833$$

$$q = 1 - p = 1 - 0.4833 = 0.5167$$

$$P(X = x) = {}^{7}C_x\, (0.4833)^x (0.5167)^{7-x}$$

x    P(x)                          Expected frequency
0    (0.5167)^7                    128 P(0) = 1.25856 = 1
1    7 (0.4833) (0.5167)^6         128 P(1) = 8.24024 = 8
2    21 (0.4833)^2 (0.5167)^5      128 P(2) = 23.1227 = 23
3    35 (0.4833)^3 (0.5167)^4      128 P(3) = 36.0467 = 36
4    35 (0.4833)^4 (0.5167)^3      128 P(4) = 33.7166 = 34
5    21 (0.4833)^5 (0.5167)^2      128 P(5) = 18.9223 = 19
6    7 (0.4833)^6 (0.5167)         128 P(6) = 5.8997 = 6
7    (0.4833)^7                    128 P(7) = 0.7883 = 1
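Both fits of the coin data can be reproduced with a few lines of Python. The sketch below uses the observed frequencies from the example; the helper name `expected_counts` is illustrative.

```python
from math import comb

heads = range(8)                               # x = 0, 1, ..., 7
freq = [7, 6, 19, 35, 30, 23, 7, 1]            # observed frequencies
N = sum(freq)                                  # 128 repetitions
n = 7                                          # coins per toss


def expected_counts(p):
    """Expected frequencies N * P(X = x) for X ~ B(7, p)."""
    return [N * comb(n, x) * p ** x * (1 - p) ** (n - x) for x in heads]


# (i) Unbiased coin: p = 1/2
fair = expected_counts(0.5)

# (ii) Nature of the coin unknown: estimate p from mean = n * p
p_hat = sum(x * f for x, f in zip(heads, freq)) / (N * n)
fitted = expected_counts(p_hat)

print([round(e) for e in fair])
print(round(p_hat, 4), [round(e) for e in fitted])
```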

Normal Distribution

The Binomial and Poisson distributions are useful theoretical distributions for discrete variables. In many problems, the variables concerned may take all values within a given interval. In such cases, the Binomial and Poisson distributions cannot be applied. In order to deal with such problems, we need a continuous distribution. The normal distribution is one of the most useful continuous distributions that we use to describe continuous random variables. It is a unimodal, symmetric, continuous distribution having two parameters: µ (the mean) and $\sigma^2$ (the variance).

$$X \sim N(\text{mean},\ \text{variance})$$

$$X \sim N(\mu,\ \sigma^2)$$

Features of Normal Distribution

i. It is a continuous distribution.

ii. It is a unimodal distribution.

iii. The total area under the curve is one.

iv. The curve is symmetrical and has a bell shape

v. The mean = median = mode.

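The probability density function of a normal distribution is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$

and the total area under the curve is one:

$$\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx = 1$$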

Example on Normal distribution

Fit a normal curve to the following data

Class interval   Freq   Upper limit (x)   Z = (x - µ)/σ   φ(z)     Cumulative frequency 100 φ(z)   Expected frequency
94.5 – 99.5      1      99.5              -2.46           0.0069   0.69                            0.69
99.5 – 104.5     4      104.5             -1.99           0.0233   2.33                            1.64
104.5 – 109.5    3      109.5             -1.52           0.0643   6.43                            4.10
109.5 – 114.5    3      114.5             -1.06           0.1446   14.46                           8.03
114.5 – 119.5    12     119.5             -0.59           0.2776   27.76                           13.3
119.5 – 124.5    27     124.5             -0.12           0.4522   45.22                           17.46
124.5 – 129.5    12     129.5             0.35            0.6368   63.68                           18.46
129.5 – 134.5    16     134.5             0.81            0.7910   79.1                            15.42
134.5 – 139.5    12     139.5             1.28            0.8997   89.97                           10.87
139.5 – 144.5    8      144.5             1.75            0.9599   95.99                           6.02
144.5 – 149.5    1      149.5             2.22            0.9868   98.68                           2.69
149.5 – 154.5    1      154.5             2.68            0.9963   99.63                           0.95

Here φ(z) is the cumulative probability up to z, and each expected frequency is the difference between successive cumulative frequencies.

$$\bar{X} = \frac{\sum fx}{\sum f} = 125.75 \approx 125.8$$

$$\sigma_{n-1} = 10.7$$
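The construction of the table (standardize each upper class limit, read off the cumulative probability, then difference the cumulative frequencies) can be sketched in Python with the standard library's `NormalDist`. The mean 125.8 and standard deviation 10.7 are the values quoted in the example; the variable names are illustrative.

```python
from statistics import NormalDist

N = 100                                            # total frequency
upper_limits = [99.5 + 5 * i for i in range(12)]   # 99.5, 104.5, ..., 154.5
dist = NormalDist(mu=125.8, sigma=10.7)            # fitted normal curve

# Cumulative frequency up to each upper class limit: N * P(X <= x)
cumulative = [N * dist.cdf(u) for u in upper_limits]

# Expected frequency of a class = difference of successive cumulative frequencies
expected = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

for u, c, e in zip(upper_limits, cumulative, expected):
    z = (u - dist.mean) / dist.stdev
    print(f"{u:6.1f}  z = {z:5.2f}  cumulative = {c:6.2f}  expected = {e:5.2f}")
```

Values may differ slightly from a table-based calculation because the code does not round z to two decimal places before looking up the probability.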

Poisson Distribution

The Poisson distribution is a discrete probability distribution. It describes events that occur at random points of time or space, where our interest lies only in the number of occurrences of the event and not in its non-occurrences. When is it necessary to apply the Poisson distribution?

(i) n (the number of trials) is very large or not readily obtainable (n > 50, np < 5)

(ii) p (the probability of success) is very small relative to q (the probability of failure).

The probability mass function of a Poisson distribution is given by

$$P(X = x) = \frac{e^{-m}\, m^x}{x!}, \quad \text{where } x = 0, 1, 2, \ldots$$

Uses of Poisson Distribution

i. It can be used in occurrence of accident in the factory.

ii. It can be used to find the demand for a product.

iii. It is also being used when the variable concerned occurs in a random manner, e.g.,

number of customers entering a banking hall at a specific time.

iv. It can be used to detect the occurrence of flux in a production process.

Properties of Poisson distribution

If $X \sim P_o(m)$:

i. Mean = λ (= m) = np

ii. Variance $\sigma^2$ = λ (the limit of npq, since q ≈ 1 when p is very small)

iii. The rate or population average must be known.

Examples on Poisson distribution

The following table shows the number of days (f) in a 50-day period during which automobile accidents occurred in a city. Fit a Poisson distribution.

No. of accidents (x)   0    1    2    3    4
No. of days (f)        21   18   7    3    1

Solution

$$\text{Mean } \bar{x} = \frac{\sum fx}{\sum f} = \frac{0 \times 21 + 1 \times 18 + 2 \times 7 + 3 \times 3 + 4 \times 1}{21 + 18 + 7 + 3 + 1} = \frac{45}{50} = 0.90$$

For a Poisson distribution, mean = λ = 0.90.

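$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-0.90} (0.90)^x}{x!}$$

$$P(X = 0) = \frac{e^{-0.90} (0.90)^0}{0!} = e^{-0.90} = 0.4066$$

$$P(X = 1) = \frac{e^{-0.90} (0.90)^1}{1!} = 0.90\, e^{-0.90} = 0.3659$$

$$P(X = 2) = \frac{e^{-0.90} (0.90)^2}{2!} = \frac{e^{-0.90} (0.9)^2}{2} = 0.1647$$

$$P(X = 3) = \frac{e^{-0.90} (0.90)^3}{3!} = \frac{e^{-0.90} (0.9)^3}{6} = 0.0494$$

$$P(X = 4) = \frac{e^{-0.90} (0.90)^4}{4!} = \frac{e^{-0.90} (0.9)^4}{24} = 0.0111$$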

Theoretical frequency = N P(X = x)

No. of accidents       0             1             2            3           4
Theoretical frequency  20.33 or 20   18.30 or 18   8.24 or 8    2.47 or 3   0.56 or 1
Observed frequency     21            18            7            3           1

The fit of the Poisson distribution to the given data is good.
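A minimal Python sketch of this fit is given below: it estimates λ from the sample mean and multiplies each Poisson probability by the number of days. The names used are illustrative.

```python
from math import exp, factorial

accidents = [0, 1, 2, 3, 4]       # x
days = [21, 18, 7, 3, 1]          # f
N = sum(days)                     # 50 days

# lambda is estimated by the sample mean, sum(f * x) / sum(f)
lam = sum(x * f for x, f in zip(accidents, days)) / N


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)


# Theoretical frequency = N * P(X = x)
theoretical = [N * poisson_pmf(x, lam) for x in accidents]

print(lam, [round(t, 2) for t in theoretical])
```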

Example 2

Fit a Poisson distribution to the following data and test whether it follows a Poisson law.

x    0    1    2    3    4
f    43   38   22   9    1

Solution

$$\lambda = \frac{\sum fx}{\sum f} = \frac{0 \times 43 + 1 \times 38 + 2 \times 22 + 3 \times 9 + 4 \times 1}{43 + 38 + 22 + 9 + 1} = \frac{113}{113} = 1$$

λ = 1

x    f     P(x)     Expected frequency
0    43    0.3679   41.6
1    38    0.3679   41.6
2    22    0.1839   20.8
3    9     0.0613   6.9
4    1     0.0153   1.7

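Revised table (the classes x = 3 and x = 4 are combined because the expected frequency for x = 4 is less than 5)

x       O_i    E_i     (O_i - E_i)²   (O_i - E_i)²/E_i
0       43     41.6    1.96           0.05
1       38     41.6    12.96          0.31
2       22     20.8    1.44           0.07
3 – 4   10     8.6     1.96           0.23
Total                                 0.66

H0: The distribution follows a Poisson law.

H1: The distribution does not follow a Poisson law.

$$\chi^2_{cal} = 0.66$$

$$\chi^2_{tab} = \chi^2_{1-\alpha,\,v-1-m} = \chi^2_{1-0.05,\,4-1-1} = \chi^2_{0.95,\,2} = 5.991$$

Decision: Since $\chi^2_{cal} < \chi^2_{tab}$, we do not reject H0 and conclude that the data are consistent with a Poisson law.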
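The goodness-of-fit calculation for this example can be sketched in Python as below. The sketch takes the observed frequencies (43, 38, 22, 9, 1) as the O_i, pools trailing classes whose expected frequency is below 5, and compares the statistic with the tabulated value 5.991 for 2 degrees of freedom. The helper `merge_small_classes` is a hypothetical name introduced for illustration.

```python
from math import exp, factorial

observed = [43, 38, 22, 9, 1]                 # frequencies for x = 0..4
N = sum(observed)
lam = sum(x * f for x, f in enumerate(observed)) / N   # estimated mean

# Expected frequencies E_i = N * P(X = x)
expected = [N * exp(-lam) * lam ** x / factorial(x) for x in range(len(observed))]


def merge_small_classes(obs, exp_freq, minimum=5.0):
    """Pool trailing classes until the last expected frequency is at least `minimum`."""
    obs, exp_freq = list(obs), list(exp_freq)
    while len(exp_freq) > 1 and exp_freq[-1] < minimum:
        last_o, last_e = obs.pop(), exp_freq.pop()
        obs[-1] += last_o
        exp_freq[-1] += last_e
    return obs, exp_freq


obs_m, exp_m = merge_small_classes(observed, expected)
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_m, exp_m))

# Degrees of freedom: classes - 1 - number of estimated parameters (lambda)
df = len(obs_m) - 1 - 1
critical = 5.991                              # chi-square table value, 95%, 2 d.f.
print(obs_m, round(chi2, 2), df, "reject H0" if chi2 > critical else "do not reject H0")
```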


In this study session, you have learnt that:

1. A population to be studied and compared typically consists of a large number of individuals; therefore, sampling methods are used to facilitate comparison.

2. Despite the importance of descriptive statistics, the scientific philosophy of our time places more emphasis on explanation, prediction and generalization than on pure description. We have discussed the basic sampling procedures, tests of hypothesis, confidence intervals and the fitting of distributions to data.







1. Nine crippled children were asked to perform a certain task, the average time required to



2. A researcher wishes to estimate the mean maximum weight at birth of 40 randomly

selected babies delivered in four teaching hospitals in a day. He is willing to assume that the

weights are approximately normally distributed with a variance of 144. He found out that the

mean weight for these babies is 84.3kg. Can the researcher conclude that the weight for the



Study Session 4: Design and Analysis of Biological Experiments

Introduction

An experiment can be described as an act or operation undertaken in order to obtain new facts or to confirm or deny the results of previous experiments.

Experiments are conducted to establish the effect of one or more independent variables on a response. These variables are often called treatments or factors. Examples of such variables are different drugs, different fertilizers and so on. Differences in animal breeds are caused by genetic and environmental differences.

Learning Outcome for Study Session 4


4.1 Explain what is involved in the design of an experiment;

4.2 State and explain the procedures involved in the planning and execution of biological experiments;

4.3 Define the relevant terms in the design and analysis of biological experiments;

4.4 Explain the concept of the completely randomized design (CRD) and the randomized complete block design (RCBD), and use the analysis of variance (one-way and two-way classification) techniques to solve problems in the CRD and RCBD.

4.1 Design of an Experiment

The design of an experiment can be defined as the order in which an experiment is

performed. It deals with the planning, execution, analysis and interpretation of results of an

experiment.

For instance, one method of determining the efficacy of a drug is by direct comparison with an agreed standard. The drug is applied at a constant rate to an experimental unit and the dose at which death or some other recognizable event (e.g. recovery) occurs is noted. This critical dose is called the tolerance or threshold. This is repeated for a number of experimental units using the drug under analysis.

The tolerances vary from one experimental unit to another. Also, the clinical investigation of the use of a new medical treatment raises similar problems of experimental design. The effect of a new treatment is compared to that of a control treatment. The experiment consists of applying one treatment to each unit and making one or more observations.

The assignment of treatments is under the experimenter's control when the objective of such an experiment is the comparison of treatments rather than the determination of absolute values. This type of experiment is called comparative.

4.1.1 Components of Experimental Design

The design of an experiment has three essential components:

Figure 4.1: Components of Experimental Design

1. Estimation of error

2. Control of error

3. Proper interpretation of result

1. Estimation of Error: When two or more experimental plots are treated in a similar manner and the differences among them are measured, these differences are called experimental errors. This error serves as a basis for deciding whether an observed difference is natural or artificial.

2. Control of Error: A common method of controlling or reducing experimental error is called blocking. A good experiment must incorporate every possible means of reducing experimental error.

3. Proper Interpretation of Results: An essential characteristic of an experimental design is its ability to resist all environmental factors which are not part of the treatments being considered.

4.1.2 Principles of Experimental Design

In all experimental designs, there are three essential principles:

Figure 4.2: Principles of Experimental Design

1 Replication

2 Randomization

3 Blocking/Local Control

1 Replication

Replication is the repetition of the same treatment on different experimental units. When a treatment appears more than once in an experiment, it is said to be replicated.

Function of Replication

The following are the functions of Replication:

i. To provide an estimate of experimental error.

ii. To improve the precision of an experiment by reducing the standard deviation of a

treatment mean.

iii. To increase the scope of inference of the experiment by selection and appropriate

use of more variable experimental units.

iv. To effect control of error variance.

2 Randomization

This is the use of a random process to assign experimental units to treatments so that every

experimental unit considered has an equal chance of receiving a treatment. The unbiasedness

of the treatment means and experimental error estimates is ensured. This is a major reason for

randomization.

Methods of Randomization

The following are the methods of randomization:

i. Use random number table.

ii. Tossing a coin.

iii. Shuffling numbered cards.

iv. Drawing numbered balls from a bag.

Functions of Randomization

• To eliminate systematic bias in treatment assignment.

• It ensures that each unit has an equal chance of receiving a treatment.
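In practice, a computer's pseudo-random generator often plays the role of the random number table or shuffled cards. The sketch below is one possible illustration, not a prescribed procedure: it assigns four hypothetical treatments (A to D) to twelve hypothetical plots with three replications each by shuffling the treatment labels. The function name, plot names and seed are assumptions made for the example.

```python
import random


def randomize_treatments(units, treatments, replications, seed=None):
    """Randomly assign each treatment to `replications` experimental units."""
    if len(units) != len(treatments) * replications:
        raise ValueError("number of units must equal treatments x replications")
    rng = random.Random(seed)            # a seed makes the plan reproducible
    # One label per unit, e.g. A, A, A, B, B, B, ...
    labels = [t for t in treatments for _ in range(replications)]
    rng.shuffle(labels)                  # every arrangement is equally likely
    return dict(zip(units, labels))


plots = [f"plot{i}" for i in range(1, 13)]
plan = randomize_treatments(plots, ["A", "B", "C", "D"], replications=3, seed=1)
for plot, treatment in plan.items():
    print(plot, treatment)
```

Shuffling the full list of labels keeps the replication equal across treatments while leaving the assignment to chance.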

3 Blocking/Local Control

Blocking is the process by which an experimental material is divided into blocks of

homogeneous units. This is to allow for certain restrictions on randomization to minimize

experimental error.


In-Text Question

The following are the methods of Randomization EXCEPT?

a. Use random number table.

b. Tossing a coin.

c. Shuffling greeting cards.

d. Drawing numbered balls from a bag

In-Text Answer

c. Shuffling greeting cards

4.2 Steps Involved in the Planning and Execution of an Experimental

Design

When planning and executing an experiment, the following are the procedures to be followed

in order to achieve a great success:

1. Problem Recognition and Definition: This is the first step in the planning and

execution of an experiment. After the problem has been recognized, it must be clearly

defined. The more we understand the problem at hand, the easier it is for us to find an

appropriate solution to it.

2. Objective of the Experiment: A clear statement of the objective is very important.

This may be in form of a hypothesis to be tested or a question which requires an

answer. If the objective is more than one, arrange them in their order of priority.

3. Analysis of the Problem and Objectives: The aims of the objectives must be

carefully studied and the necessity of the experiment stated in precise terms.

4. Treatment Selection: Reasonable selection of the treatments goes a long way in

achieving the desired results.

5. Selection of the Experimental Units: This involves the selection of experimental

units from the population on which inferences are to be made. The selected or chosen

experimental units must be a true representative of the entire population under

consideration.

6. Selection of the Experimental Design: The objective of the experiment should be carefully considered here, so that a design that will give the needed result is preferred.

7. Selection of Units for Observation and the Number of Replications: This involves

the determination of the number of replications and the plot size to be chosen in order

to produce the desired result.

8. Data to be collected: Consideration of data to be collected is indispensable. No

essential data should be omitted.

9. Outlining Statistical Analysis and Summarization of Results: Give a summary of the expected results in order to show the expected effects of the treatments on the experimental units. This can be presented in a summary table or in graphs. List the sources of variation and the corresponding degrees of freedom in the analysis of variance table. Check whether the expected results of the experiment are relevant to the objective.

10. Performing the Experiment: The experiment should be conducted without any

element of personal biases. Record all observations properly and later check them

with a view to minimizing recording errors or deleting data that are obviously

erroneous.

11. Data Analysis and Interpretation of Results: Analyse all data and give the

interpretation of the results obtained from the experiment. Data should be scrutinized

and carefully analyzed so that the result will be accurate. If either the data is not well

analyzed or the results not well interpreted, the conclusions may be wrong.

In-Text Question

The objective of the experiment should be carefully considered, so that a design that will give the needed result is preferred. The above statement explains one of the following:

a. Selection of the Experimental Design

b. Selection of the Experimental Units

c. Data to be collected

d. Outlining Statistical Analysis and Summarization

In-Text Answer

a. Selection of the Experimental Design

4.3 Definitions of Relevant Terms in Design and Analysis of Biological Experiments

The following are the definitions of relevant terms:

I. Factors: These are variables that can affect the outcome of an experiment. The

factors are always under investigation. Examples of factors are time, humidity,

temperature, etc.

II. Factor levels: Every factor in an experiment may take two or more levels. For quantitative factors, the levels are numerical values, while for qualitative factors, the levels are categories, e.g. different types of fertilizer such as NPK fertilizer.

III. Treatment: A treatment is the procedure whose effect is to be measured and

compared with other treatments.

IV. Experimental Units: An experimental unit is the smallest division of the

experimental materials, such that any two units may receive different treatments in

the experiment. It can also be defined as the unit of material to which one

application of a treatment is applied.

V. Pairing: It is defined as the method employed in assessing the effectiveness of a treatment or experimental procedure that makes use of related observations resulting from non-independent samples. A hypothesis test based on this type of data is called a paired comparison test.

VI. Response: The numerical result obtained for a specific experimental unit is

known as a response.

VII. Sample Statistic: These are descriptive measures computed from the data of a

sample. Examples are the sample mean, sample variance, sample standard

deviation.

VIII. Population parameter: These are descriptive measures computed from the data of a population. Examples are the population mean (µ), population variance ($\sigma^2$), population standard deviation, population range and so on.

4.4.1 Analysis of Variance (ANOVA)

The analysis of variance provides a method of testing whether the means of two or more normal populations with unknown but common variances are equal. The analysis of variance approach is to partition the variation within the experiment into meaningful components. The purpose is to compare the variation attributable to differences between the populations with the variation attributable to random variation within the various populations. If the former is sufficiently large in comparison to the latter, then the procedure will result in a rejection of the null hypothesis that all population means are equal.

Therefore, analysis of variance may be defined as a technique whereby the total variation

present in a set of data is partitioned into several components. Associated with each of these

components is a specific source of variation, so that in the analysis, it is possible to ascertain

the magnitude of the contributions of each of these sources to the total variation. It was

developed by R.A Fisher, and used mainly in agricultural experiment. The ANOVA can be

split into two types: one-way classification and two-way classification.

Assumptions underlying the use of ANOVA

1. Independence of Observations

This is the most important assumption underlying the F-test in ANOVA, and violation

of this can lead to serious inferential errors. In experimental research, independence is

achieved by random assignment of subjects to the treatment groups.

2. Normal Distribution of X:

Fisher’s F is an appropriate model for the distribution used in ANOVA only when the

means and variances of X are independent. This can be safely assumed only if X is

normally distributed.

3. Equality of Error Variances (homoscedasticity or homogeneity of variance), i.e.

$$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_j^2$$

4. Additivity (treatment effects are additive)

5. Independent errors

In the two-way classification model, our interest is centred on testing the equality of the treatment means. Hence, the hypotheses of interest are

(i) H0: $\mu_1 = \mu_2 = \cdots = \mu_n$

(ii) H1: At least one $\mu_i \ne \mu_j$

Classification of Experimental Design

There are two categories of single-factor experimental design (i.e. experiments in which only a single factor varies while the others remain constant). They are the Complete Block Design and the Incomplete Block Design. Examples of the former are Completely Randomized Designs, Randomized Complete Block Designs and Latin Square Designs, while for the latter, we have the Lattice Design and the Group Balanced Block Design.

The Complete Block Design is suitable for experiments with small number of treatment

while the Incomplete Block Design is applicable for experiments with large number of

treatments.

1. The Completely Randomized Design (CRD)

This is a design in which the treatments are randomly assigned to the experimental units or in which experimental units are randomly assigned to treatments. The design is simple and therefore widely used. However, it should be used only when the experimental units receiving the treatments are homogeneous.

Examples of cases where the CRD is applicable:

a. Laboratory experiments, where a quantity of material is thoroughly mixed and then divided into small lots to form the experimental units to which treatments are randomly assigned.

b. Plant and animal experiments where environmental effects are much alike.

The model of the CRD can be given as:

$$Y_{ij} = \mu + t_j + e_{ij}, \quad \text{where } i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, k$$

$Y_{ij}$ = the ith observation of the jth sample (treatment).

µ = overall mean (grand mean)

$t_j$ = effect of the jth treatment

$e_{ij}$ = random error effect.

$e_{ij}$ measures the deviation of the ith observation of the jth sample from the corresponding population mean and it is assumed to be normally and independently distributed with mean zero and common variance $\sigma^2$ for the treatments.

The hypothesis being tested

H0: $\mu_1 = \mu_2 = \cdots = \mu_k$

H1: At least one $\mu_i \ne \mu_j$

Table of sample values for the CRD

                          Treatment
         1      2      3     ...    j     ...    k
         y11    y12    y13   ...    y1j   ...    y1k
         y21    y22    y23   ...    y2j   ...    y2k
         .      .      .            .            .
         .      .      .            .            .
         .      .      .            .            .
Total    T.1    T.2    T.3   ...    T.j   ...    T.k     T..
Mean     ȳ.1    ȳ.2    ȳ.3   ...    ȳ.j   ...    ȳ.k     ȳ..

where

$T_{.j}$ = total of the observations taken under treatment j, i.e. $T_{.j} = \sum_{i=1}^{n_j} Y_{ij}$

$\bar{y}_{.j}$ = observed mean for treatment j = $\dfrac{T_{.j}}{n_j}$

$T_{..}$ = overall total for all observations taken, i.e. $T_{..} = \sum_{j=1}^{k} T_{.j} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} Y_{ij}$

$Y_{ij}$ = ith observation receiving the jth treatment

i = 1, 2, ..., n; j = 1, 2, ..., k

$$\bar{y}_{..} = \frac{T_{..}}{N}; \quad N = \sum_{j=1}^{k} n_j$$

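The computations of the sums of squares are as follows:

$$SS_{Total} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_{..}\right)^2 \qquad (1)$$

$$= \sum_{j=1}^{k}\sum_{i=1}^{n_j} Y_{ij}^2 - \frac{T_{..}^2}{N} \qquad (2)$$

To partition the total sum of squares, we introduce $-\bar{Y}_{.j} + \bar{Y}_{.j}$ into equation (1):

$$SS_{Total} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_{.j} + \bar{Y}_{.j} - \bar{Y}_{..}\right)^2$$

$$= \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left[\left(Y_{ij} - \bar{Y}_{.j}\right) + \left(\bar{Y}_{.j} - \bar{Y}_{..}\right)\right]^2$$

$$= \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_{.j}\right)^2 + 2\sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_{.j}\right)\left(\bar{Y}_{.j} - \bar{Y}_{..}\right) + \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(\bar{Y}_{.j} - \bar{Y}_{..}\right)^2 \qquad (3)$$

The middle term in (3) becomes zero since the sum of deviations of a set of values from their mean equals zero, i.e.

$$2\sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_{.j}\right)\left(\bar{Y}_{.j} - \bar{Y}_{..}\right) = 0$$

Hence

$$SS_{Total} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_{.j}\right)^2 + \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(\bar{Y}_{.j} - \bar{Y}_{..}\right)^2$$

$$= \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_{.j}\right)^2 + \sum_{j=1}^{k} n_j \left(\bar{Y}_{.j} - \bar{Y}_{..}\right)^2$$

When the number of observations is the same in each group ($n_j = n$), we get

$$SS_{Total} = \sum_{j=1}^{k}\sum_{i=1}^{n} \left(Y_{ij} - \bar{Y}_{.j}\right)^2 + n\sum_{j=1}^{k} \left(\bar{Y}_{.j} - \bar{Y}_{..}\right)^2 \qquad (4)$$

Hence, equation (4) can be symbolically written as

$$SS_{Total} = SS_{E} + SS_{Tr}$$

Therefore,

$$SS_{Total} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_{..}\right)^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j} Y_{ij}^2 - \frac{T_{..}^2}{N}$$

where N = nk and $\dfrac{T_{..}^2}{N}$ = correction factor.

$$SS_{Tr} = n\sum_{j=1}^{k} \left(\bar{Y}_{.j} - \bar{Y}_{..}\right)^2 = \sum_{j=1}^{k} \frac{T_{.j}^2}{n_j} - \frac{T_{..}^2}{N}$$

$$\therefore\ SS_{E} = SS_{Total} - SS_{Tr}$$

where

$$SS_{E} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_{.j}\right)^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j} Y_{ij}^2 - \sum_{j=1}^{k} \frac{T_{.j}^2}{n_j}$$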

The ANOVA table for one-way classification is given as follows:

| Source of Variation | Degrees of Freedom | Sum of Squares (SS) | Mean Square (MS) | F-ratio (F0) |
|---|---|---|---|---|
| Treatment | k - 1 | SSTr | MSTr = SSTr / (k - 1) | MSTr / MSE |
| Error | k(n - 1) | SSE | MSE = SSE / [k(n - 1)] | |
| Total | kn - 1 | SSTotal | | |

Decision:

To reach a decision, the computed F-ratio must be compared with the critical value of F obtained from the table. If the computed F-ratio exceeds the critical value of F, the null hypothesis is rejected. Otherwise, we do not reject the null hypothesis.

Example 1

Four types of pain relief drugs were administered to 12 patients in a hospital, and the number of hours of pain relief provided by each drug was recorded as follows:

Drugs

| | A | B | C | D |
|---|---|---|---|---|
| | 6 | 7 | 3 | 4 |
| | 3 | 10 | 6 | 2 |
| | 4 | 8 | 5 | 9 |
| Total | 13 | 25 | 14 | 15 |

Solution:

Ho: µ1 = µ2 = µ3 = µ4

H1: Not all population means are equal, i.e. µi ≠ µj for at least one pair (i, j)

Grand total (T..) = 13 + 25 + 14 + 15 = 67

N = nk, i.e. number of rows x number of columns = 3 x 4 = 12

$$SS_{Total} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} Y_{ij}^2 - \frac{T_{..}^2}{N} = (6^2 + 7^2 + 3^2 + \ldots + 5^2 + 9^2) - \frac{(67)^2}{12}$$

= 445 – 374.08

= 70.92

$$SS_{Tr} = \sum_{j=1}^{k} \frac{T_{.j}^2}{n_j} - \frac{T_{..}^2}{N} = \frac{13^2 + 25^2 + 14^2 + 15^2}{3} - \frac{(67)^2}{12}$$

= 405 – 374.08

= 30.92

SSE = SSTotal – SSTr

= 70.92 – 30.92

= 40.00

MSTr = SSTr / (k – 1) = 30.92 / 3 = 10.31

MSE = SSE / [k(n – 1)] = 40.00 / 8 = 5.00

F-ratio = MSTr / MSE = 10.31 / 5.00 = 2.06

ANOVA TABLE

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F-ratio |
|---|---|---|---|---|
| Treatment | 3 | 30.92 | 10.31 | 2.06 |
| Error | 8 | 40.00 | 5.00 | |
| Total | 11 | 70.92 | | |

The critical value is obtained as follows:

$F_{\alpha, (v_1, v_2)}$ = critical value

Where α = level of significance

v1 = k – 1

v2 = k(n – 1)

$F_{0.05, (3, 8)}$ = 4.07

Decision: Since Fcal < Ftab, we do not reject Ho and conclude that there is no significant difference in the mean number of hours of pain relief provided by the four types of drugs.
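The arithmetic of Example 1 can be checked with a few lines of Python. This is only a sketch of the shortcut formulas given above; the function name `one_way_anova` and the dictionary keys are illustrative choices, not part of the course text. The error degrees of freedom are computed as N − k, which equals k(n − 1) when every treatment has n observations. On the Example 1 data it gives SSE = 40.00 and an F-ratio of about 2.06.

```python
def one_way_anova(groups):
    """One-way ANOVA via the shortcut formulas (hypothetical helper).

    groups: list of lists, one inner list of observations per treatment.
    """
    all_values = [y for group in groups for y in group]
    n_total = len(all_values)                 # N
    k = len(groups)                           # number of treatments
    grand_total = sum(all_values)             # T..
    correction = grand_total ** 2 / n_total   # T..^2 / N

    ss_total = sum(y * y for y in all_values) - correction
    ss_tr = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_e = ss_total - ss_tr

    df_tr, df_e = k - 1, n_total - k
    ms_tr, ms_e = ss_tr / df_tr, ss_e / df_e
    return {
        "ss_total": ss_total, "ss_tr": ss_tr, "ss_e": ss_e,
        "ms_tr": ms_tr, "ms_e": ms_e, "f": ms_tr / ms_e,
    }


# Hours of pain relief for drugs A, B, C and D (Example 1).
drugs = [[6, 3, 4], [7, 10, 8], [3, 6, 5], [4, 2, 9]]
result = one_way_anova(drugs)
print({name: round(value, 2) for name, value in result.items()})
```

The computed F-ratio is then compared by hand with the tabulated critical value, exactly as in the decision step.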

Example 2:

Mathematics tests are conducted by 5 different examiners separately. The numbers of candidates in the categories of distinction, credit, pass and fail are as shown below:

Examiner

| Grade | A | B | C | D | E |
|---|---|---|---|---|---|
| Distinction | 5 | 7 | 4 | 10 | 6 |
| Credit | 13 | 3 | 5 | 8 | 15 |
| Pass | 21 | 31 | 19 | 27 | 33 |
| Fail | 39 | 17 | 28 | 34 | 19 |

Test the hypothesis that the examiners do not differ in their standards of awards. Use α = 0.05

Solution

Ho: µ1 = µ2 = µ3 = µ4 = µ5

H1: Not all the means are equal

Grand total = 344

$$SS_{T} = (5^2 + 7^2 + 4^2 + \ldots + 34^2 + 19^2) - \frac{(344)^2}{20}$$

= 8410 – 5916.8

= 2493.2

$$SS_{Tr} = \frac{(78)^2 + (58)^2 + (56)^2 + (79)^2 + (73)^2}{4} - \frac{(344)^2}{20}$$

= 6038.5 – 5916.8

= 121.7

SSE = SST – SSTr

= 2493.2 – 121.7

= 2371.5

The corresponding ANOVA table is presented thus:

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F-ratio |
|---|---|---|---|---|
| Treatment | 4 | 121.7 | 30.43 | 0.1925 |
| Error | 15 | 2371.5 | 158.10 | |
| Total | 19 | 2493.2 | | |

$F_{0.05, (4, 15)}$ = 3.06

Decision Rule: If Ftab < Fcal, reject Ho. Otherwise, do not reject Ho.

Conclusion:

Since Fcal = 0.1925 < $F_{\alpha, (v_1, v_2)}$ = 3.06, we do not reject the null hypothesis that there is no significant difference in the examiners’ standards of awards.

Co-efficient of Variation

This shows the level of precision with which the treatments are compared. The lower the co-efficient of variation, the higher the precision.

The co-efficient of variation (C.V) is given by

$$C.V. = \frac{\sqrt{MS_E}}{G.M} \times 100\%$$

where G.M is the overall (grand) mean, $G.M = \frac{T_{..}}{N}$.

The co-efficient of variation for example 1 is

$$C.V. = \frac{\sqrt{5.00}}{5.58} \times 100 = 40\%$$

For example 2,

$$C.V. = \frac{\sqrt{158.10}}{17.2} \times 100 = 73\%$$
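As a quick check of the two figures above, the co-efficient of variation can be computed directly from the error mean square and the grand mean. The helper name `coefficient_of_variation` is an illustrative assumption; the inputs are the MSE values, grand totals and numbers of observations taken from Examples 1 and 2.

```python
import math


def coefficient_of_variation(ms_error, grand_total, n_obs):
    """C.V. (%) = sqrt(MSE) / grand mean * 100 (hypothetical helper)."""
    grand_mean = grand_total / n_obs
    return math.sqrt(ms_error) / grand_mean * 100


cv1 = coefficient_of_variation(5.00, 67, 12)      # Example 1
cv2 = coefficient_of_variation(158.10, 344, 20)   # Example 2
print(round(cv1), round(cv2))
```

Both values round to the percentages quoted in the text (40% and 73%).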

Advantages of the Completely Randomized Design (CRD)

The following are the advantages of the completely randomized design (CRD):

1) The CRD is flexible in that the number of treatments and replicates is limited only by the number of experimental units available.

2) The statistical analysis is simple even if the number of replicates varies with the treatment.

3) The number of replicates can vary from treatment to treatment, though it is generally desirable to have the same number per treatment.

4) Loss of information due to missing data is small relative to the loss with other designs.

5) The number of degrees of freedom for estimating experimental error is maximum. This improves the precision of the experiment.

Disadvantages of the CRD

The following are the disadvantages of the CRD:

1) The CRD is not normally used for field experiments; other designs are usually more appropriate.

2) The major disadvantage of this design is that it is normally appropriate only for homogeneous experimental material and when the number of treatments is small.

The Randomized Complete Block Design (RCBD)

The RCBD is the most widely used design. It originated in agriculture, where land was divided into blocks and the blocks were divided into plots that received the treatments under investigation. It is a design in which the experimental units to which the treatments are applied are sub-divided into homogeneous groups called blocks, so that the number of experimental units in a block is equal to the number of treatments being considered. The treatments are then assigned at random to the experimental units within each block. It should be noted that each treatment appears in every block and each block receives every treatment.

The objective of using the Randomized Complete Block Design is to reduce as far as possible, or isolate, the variability among the experimental units within a block while ensuring that treatment means will be free of block effects. The effectiveness of the design depends on the ability to achieve homogeneous blocks of experimental units. The ability to form homogeneous blocks depends on the researcher’s knowledge of the experimental material.
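The random assignment of treatments within each block can be sketched as follows. This is only an illustration under stated assumptions: the treatment labels, the number of blocks and the function name `randomize_rcbd` are hypothetical, and the seed is fixed only so that the layout can be reproduced.

```python
import random


def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)   # every block receives every treatment
        rng.shuffle(order)         # random order of units within the block
        layout.append(order)
    return layout


layout = randomize_rcbd(["A", "B", "C", "D"], n_blocks=3, seed=1)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {order}")
```

Each block contains every treatment exactly once, and only the order within a block is left to chance, which is what distinguishes this design from the completely randomized design.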

The following are examples of areas where RCBD is applicable:

1. In experiments involving human beings, age and sex may be used as blocks.

2. In animal experiments, different breeds of animal may respond differently to the same treatment, so the breed type may be used as a blocking factor.

3. The design can also be used when an experiment must be carried out in more than

one laboratory or when several days are required for completion.

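Table of sample values for the RCBD

Two-Way Classification

Columns 1 to k are the treatments; rows 1 to m are the blocks.

| Blocks | 1 | 2 | 3 | ... | k | Total | Mean |
|---|---|---|---|---|---|---|---|
| 1 | y11 | y12 | y13 | ... | y1k | B1 | Ȳ1. |
| 2 | y21 | y22 | y23 | ... | y2k | B2 | Ȳ2. |
| 3 | y31 | y32 | y33 | ... | y3k | B3 | Ȳ3. |
| ... | ... | ... | ... | ... | ... | ... | ... |
| m | ym1 | ym2 | ym3 | ... | ymk | Bm | Ȳm. |
| Total | T1 | T2 | T3 | ... | Tk | R.. | |
| Mean | Ȳ.1 | Ȳ.2 | Ȳ.3 | ... | Ȳ.k | | Ȳ.. |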

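The model is given as:

$$Y_{ij} = \mu + \tau_i + \beta_j + e_{ij}$$

Where µ = grand mean

$\tau_i$ = ith row (block) effect, i = 1, 2, ..., m

$\beta_j$ = jth column (treatment) effect, j = 1, 2, ..., k

$e_{ij}$ = random error term

Hypothesis: We can consider two different null hypotheses. They are

Ho1: There is no difference in treatment means

Ho2: There is no difference in block means

Then we can compute the sums of squares for the three sources of variation as follows:

$$SS_{Total} = \sum_{i=1}^{m}\sum_{j=1}^{k} Y_{ij}^2 - \frac{R_{..}^2}{N} \quad \text{with } (mk - 1) \text{ d.f.}$$

Where N = mk; R.. = grand total

$$SS_{Treatment} = \sum_{j=1}^{k} \frac{T_j^2}{m} - \frac{R_{..}^2}{N} \quad \text{with } (k - 1) \text{ d.f.}$$

$$SS_{Blocks} = \sum_{i=1}^{m} \frac{B_i^2}{k} - \frac{R_{..}^2}{N} \quad \text{with } (m - 1) \text{ d.f.}$$

Here $T_j$ is the total of the m observations under treatment j (a column total) and $B_i$ is the total of the k observations in block i (a row total), so each squared total is divided by the number of observations it contains.

SSError = SSTotal – SSTreatment – SSBlocks, with (m – 1)(k – 1) d.f.

ANOVA Table for RCBD

| Source of Variation | Degrees of Freedom | Sum of Squares (SS) | Mean Square | F-ratio (Fo) |
|---|---|---|---|---|
| Treatment | k - 1 | SSTreatment | MSTr = SSTreatment / (k - 1) | MSTr / MSE |
| Blocks | m - 1 | SSBlock | MSB = SSBlock / (m - 1) | MSB / MSE |
| Error | (m - 1)(k - 1) | SSError | MSE = SSError / [(m - 1)(k - 1)] | |
| Total | mk - 1 | SSTotal | | |

Decision Rule: Reject Ho if Ftab < Fo. Otherwise, do not reject Ho.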

Example 3

The director of a tuberculosis control section of a general hospital wishes to study the effectiveness of 3 newly developed drugs to cure the disease. The drugs are administered to patients in different hospitals and the number of recoveries from the disease per 50 patients is recorded as follows:

Drugs

| Hospitals | 1 | 2 | 3 | Total |
|---|---|---|---|---|
| H1 | 8 | 13 | 10 | 31 |
| H2 | 11 | 15 | 16 | 42 |
| H3 | 14 | 17 | 17 | 48 |

Is there any difference in the number of recovery cases recorded in the 3 hospitals?

Test, using α = 5%

Solution

Ho: µ1 = µ2 = µ3

H1: Not all the means are equal

$$SS_{Total} = \sum_{i=1}^{m}\sum_{j=1}^{k} Y_{ij}^2 - \frac{R_{..}^2}{N}$$

N = mk

= 3 x 3 = 9

$$SS_{T} = (8^2 + 13^2 + \ldots + 17^2 + 17^2) - \frac{(121)^2}{9}$$

= 1709 – 1626.78

= 82.22

$$SS_{Tr} = \sum_{j=1}^{k} \frac{T_j^2}{m} - \frac{R_{..}^2}{N} = \frac{33^2 + 45^2 + 43^2}{3} - \frac{(121)^2}{9}$$

= 1654.33 - 1626.78

= 27.55

$$SS_{Block} = \sum_{i=1}^{m} \frac{B_i^2}{k} - \frac{R_{..}^2}{N} = \frac{31^2 + 42^2 + 48^2}{3} - \frac{(121)^2}{9}$$

= 1676.33 - 1626.78

= 49.55

SSError = SSTotal - SSTreatment - SSBlock

= 82.22 - 27.55 - 49.55

= 5.12

ANOVA Table

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F-ratio |
|---|---|---|---|---|
| Treatment | 2 | 27.55 | 13.78 | 10.77 |
| Blocks | 2 | 49.55 | 24.78 | 19.36 |
| Error | 4 | 5.12 | 1.28 | |
| Total | 8 | 82.22 | | |

$F_{0.05, (2, 4)}$ = 6.94 in both cases.

Decision: Since Fcal > Ftab in both cases, the null hypothesis Ho is rejected in each case. We therefore conclude that not all the treatment (drug) means are equal and that the block (hospital) means also differ.
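The RCBD sums of squares used in Example 3 can be reproduced with a short Python sketch. The function name `rcbd_anova` and its return keys are illustrative assumptions; rows of the input are blocks (hospitals) and columns are treatments (drugs). Because the code carries full precision rather than two decimals, its F-ratios (about 10.78 and 19.39) differ slightly in the second decimal from the hand-rounded table.

```python
def rcbd_anova(table):
    """Two-way ANOVA for an RCBD (hypothetical helper).

    table[i][j] is the observation in block i (row) under treatment j (column).
    """
    m = len(table)        # number of blocks
    k = len(table[0])     # number of treatments
    grand_total = sum(sum(row) for row in table)        # R..
    correction = grand_total ** 2 / (m * k)             # R..^2 / N

    ss_total = sum(y * y for row in table for y in row) - correction
    treatment_totals = [sum(row[j] for row in table) for j in range(k)]
    block_totals = [sum(row) for row in table]
    ss_tr = sum(t * t for t in treatment_totals) / m - correction
    ss_b = sum(b * b for b in block_totals) / k - correction
    ss_e = ss_total - ss_tr - ss_b

    ms_e = ss_e / ((m - 1) * (k - 1))
    return {
        "ss_total": ss_total, "ss_treatment": ss_tr,
        "ss_block": ss_b, "ss_error": ss_e,
        "f_treatment": (ss_tr / (k - 1)) / ms_e,
        "f_block": (ss_b / (m - 1)) / ms_e,
    }


# Recoveries per 50 patients: rows are hospitals H1-H3, columns are drugs 1-3.
recoveries = [[8, 13, 10], [11, 15, 16], [14, 17, 17]]
res = rcbd_anova(recoveries)
print({name: round(value, 2) for name, value in res.items()})
```

Both F-ratios exceed the tabulated value 6.94, in agreement with the decision above.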

Example 4

Four groups of physical therapy patients were subjected to different diets in 6 clinics. At the end of a specified period of time, each was given a test to measure treatment effectiveness. The following scores were obtained:

Treatments (Diet)

| Clinic | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|
| 1 | 76 | 70 | 90 | 80 | 316 |
| 2 | 75 | 82 | 64 | 88 | 309 |
| 3 | 72 | 80 | 79 | 71 | 302 |
| 4 | 58 | 95 | 74 | 90 | 317 |
| 5 | 66 | 80 | 87 | 60 | 293 |
| 6 | 82 | 88 | 85 | 75 | 330 |
| Total | 429 | 495 | 479 | 464 | 1867 |

Do these data provide sufficient evidence to indicate a difference among the treatments?

Use α = 0.05

Solution

Ho: µ1 = µ2 = µ3 = µ4

H1: at least one µi ≠ µj

$$SS_{Total} = (76^2 + 70^2 + 90^2 + \ldots + 85^2 + 75^2) - \frac{(1867)^2}{24}$$

= 147439 - 145237.04

= 2201.96

$$SS_{Treatment} = \frac{429^2 + 495^2 + 479^2 + 464^2}{6} - \frac{(1867)^2}{24}$$

= 145633.83 - 145237.04

= 396.79

$$SS_{Block} = \frac{316^2 + 309^2 + 302^2 + 317^2 + 293^2 + 330^2}{4} - \frac{(1867)^2}{24}$$

= 145444.75 - 145237.04

= 207.71

SSError = 2201.96 - 396.79 - 207.71

= 1597.46

ANOVA Table

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Fo |
|---|---|---|---|---|
| Treatment | 3 | 396.79 | 132.26 | 1.24 |
| Blocks | 5 | 207.71 | 41.54 | 0.39 |
| Error | 15 | 1597.46 | 106.50 | |
| Total | 23 | 2201.96 | | |

$F_{3, 15}(0.05)$ = 3.29

$F_{5, 15}(0.05)$ = 2.90

Decision: Since Fcal < Ftab in both cases, we do not reject the null hypotheses and conclude that the experiment does not show any significant difference among the four treatments.

Advantages of the RCBD

1) It is easy to understand and its computation is simple.

2) Grouping experimental units into blocks gives better precision.

3) A missing value can be easily estimated, so that arithmetic convenience is not lost.

4) Certain complications that may arise in the course of an experiment are easily handled with this design.

Disadvantage of RCBD

The disadvantage of the RCBD is as follows:

I. When the variation among the experimental units within a block is large, a large error term results. This frequently occurs when the number of treatments is large. Thus, it may not be possible to secure sufficiently uniform groups of units for blocks. This is the major disadvantage of the RCBD.


In this study session, you have learnt about:

1. The experiment involved in the planning of experimental designs;

2. The procedures involved in planning and execution of biological experiments;

3. How to define relevant terms in Design and Analysis of Biological Experiments;

4. The concept of the completely randomized design and the randomized complete block

design: the analysis of variance (one way and two-way classification) techniques to

solve problems in the CRD and RCBD.







When planning and executing an experiment, list and explain the first five procedures to

follow in order to accomplish success.


Study Session 5: Design and Analysis of Clinical Trials

Introduction

The design and analysis of clinical trials is closely tied to experimentation. For example, to test the effectiveness of a new drug formulation as a remedy for a specific disease, some patients with the disease are given the new drug and others are given a well-known drug. The two groups are then followed prospectively and compared for incidence of the disease and other outcomes.

Such an experiment may be applied to existing patients, in order to decide upon an appropriate remedy (treatment), or to those presently free of warning signs, in order to decide upon an appropriate preventive strategy. These experiments involve giving a remedy to the subjects in the study.



5.1 Define clinical trial and Identify the features of clinical trial;

5.2 Identify and explain the Phases of clinical trials;

5.3 Highlight and explain the basic terms on clinical trials;

5.4 State some ethical issues on clinical trials.

5.1 What are Clinical Trials?

A Clinical Trial is a well planned experiment on human beings (species) which is designed to

evaluate the efficacy and the safety of one or more forms of treatment (remedy).

Clinical trial is conducted to compare at least two interventions, one of which is usually the

standard or well-known intervention. The subjects are assigned to these interventions at

random. A properly designed clinical trial is extremely important for drawing valid

inferences from its results.

5.1.1 Features of Clinical Trials

The following are the features of Clinical Trials:

1. A comparative clinical trial is essentially prospective rather than retrospective. That

is, the participant must be followed from a well-defined time point which becomes

time zero or baseline for the study for that participant.

2. A comparative trial must have one or more interventions or treatments. These may be used to prevent disease, to diagnose it or to treat illness, etc.

3. A comparative clinical trial must have a control group against which the intervention

group is compared. That is, most often a new intervention is compared with the best

current standard treatment. Or if no standard treatment exists, the control group may

receive a placebo or nothing at all. Placebo is an identical looking drug that has no

active ingredient.

5.1.2 Needs for Clinical Trials

1. A clinical trial is the method used to determine whether a health or medical intervention has the postulated effect.

2. Unlike uncontrolled clinical observation, a clinical trial makes it possible to judge whether a new treatment has made a difference to the desired outcome, because there is a control group which is comparable to the intervention group in every way except for the intervention being studied.

3. Provided that it is conducted properly and follows the principles of scientific experimentation, a clinical trial evaluates the efficacy and safety of new treatments.

5.1.3 Uses of Clinical Trials

1. To provide confirmatory evidence for findings from earlier studies that may be

unconvincing.

2. To develop and test interventions in nearly all areas of medicine and public health.

3. To give a better understanding of safety and effectiveness concerns about new treatments prior to marketing approval.

4. To develop surgical treatments and some medical devices.

5.1.4 Advantages of Clinical trials

The following are the advantages of clinical trials:

1. It ensures that the ‘cause’ comes before the ‘effect’.

2. It ensures that possible confounding factors do not confuse the results, because we can allocate subjects to treatments in any way we choose.

3. It ensures that treatments are compared efficiently. That is, control over allocation is used to spread the sample over the treatment groups so as to achieve maximum power in statistical tests such as the t-test.

5.1.5 Disadvantages of Clinical trials

The following are the disadvantages of Clinical trials:

1. It may share many of the disadvantages of cohort studies, since intervention studies involve the prospective collection of data.

2. Ethical problems are associated with giving experimental treatments, which can rule out the use of intervention studies in epidemiological investigations.

3. Clinical trials screen out those who may have a special reaction to treatment, and this may restrict the generalization of results.

[Figure: Phases of clinical trials. Flowchart running from a new chemical compound and animal study through Phase A (to determine the maximum allowable or tolerated dose), Phase B1 (to determine if it is effective enough to continue), Phase B2 (to estimate efficacy), Phase C (to compare the efficacy of the new treatment with ...) and Phase D (to monitor performance and side effects), with Yes/No decision points and a Termination box.]

Box 5.1: Recall

A Clinical Trial is a well planned experiment on human beings (species) which is designed to

evaluate the efficacy and the safety of one or more forms of treatment (remedy).

5.2 Identification of Phases of Clinical Trials

Clinical trials are classified into four main phases of experimentation, namely phases I, II, III and IV (labelled Phases A, B, C and D in this session).

This classification shows how clinical trials are applied in the pharmaceutical industry. It also indicates how the research programme for the development of a new drug for a particular disease might proceed.

5.2.1 Phase A of Clinical Trials

1. The main purpose is to test the treatment, that is, the bioavailability of the drug, drug metabolism and toxicity.

2. Another purpose is dose-finding: determining how large a dose can be given before unacceptable toxicity is experienced by the patient. This dose is known as the maximum tolerated dose or MTD. Multiple doses are also used to determine appropriate dose schedules.

3. In this stage, a sample of usually between 20 and 50 subjects is drawn, and the trials are carried out in healthy volunteers.

5.2.2 Phase B of Clinical Trials

This stage investigates effectiveness and safety, i.e. it makes an estimate of the probability that a patient will:

1. Have serious side-effects from the treatment.

2. Benefit from the treatment.

It serves as a screening process to select those relatively few drugs of genuine potential from the larger number which are inactive or overly toxic.

Statistical considerations are required in Phase B trials for the estimation of optimum sample sizes.

5.2.3 Phase C of Clinical Trials

These are comparative trials of a new drug (emerging from Phase B trials) against the current standard treatments under the same conditions. They usually require a large number of subjects, and strict statistical principles are used to determine the optimum sample sizes required.

To a very wide audience, the term ‘clinical trial’ is synonymous with a Phase C comparative treatment efficacy trial. It is the most rigorous and extensive type of scientific clinical investigation of a new treatment. There are extensive rules and regulations for the design, organization, conduct and analysis of Phase C trials.

5.2.4 Phase D of Clinical Trials

These are studies to monitor post-registration performance, interaction with other therapies or

treatment, unusual complication and adverse side effects that are uncommon.

In this stage, some phase D trials may incorporate epidemiological designs such as

descriptive study, case-control and cohort study to enable the estimation of the incidence of

serious side effects.

Most studies in Phase D, especially those using only disease registers or notification systems, capture only serious side effects and may not precisely ascertain the number of patients who have received the treatment.

In-Text Question

In this stage, a sample of usually between 20 and 50 subjects is drawn, and the trials are carried out in healthy volunteers. This describes:

a. Phase A

b. Phase B

c. Phase C

d. Phase D

In-Text Answer

a. Phase A

5.3 Basic Terms on Clinical Trial

Ethics

As a moral philosophy, ethics embodies the analysis, evaluation and development of “normative moral criteria for dealing with moral problems”. It also assigns significance to norms, values and principles in the regulation of human affairs. It is the study of morality and its effects on human conduct. It is concerned with questions such as when an act is right or wrong.

Research

It can be defined as an organized, formally structured methodology for obtaining new knowledge. Examples are laboratory research and epidemiological research.

Placebo

It is an identical-looking drug that has no active ingredients. It is an inert substance which looks and tastes like the treatment.

Experiment

It is a planned investigation which is carried out to obtain new facts, or to confirm or deny the results of previous investigations.

Randomization

It is the process of using random numbers to assign treatments to patients (subjects). Subjects should be allocated to treatment groups according to some chance mechanism.

Advantages

1. To avoid systematic bias.

2. To exclude deliberate bias in the allocation.
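A chance mechanism of this kind can be sketched in Python. The example below is an assumption-laden illustration, not a prescribed procedure: the treatment labels, the function name `randomize_patients` and the fixed seed are hypothetical, and the scheme shown (shuffling a balanced list of labels) is only one of several ways to randomize.

```python
import random


def randomize_patients(patient_ids, treatments=("new drug", "standard"), seed=None):
    """Allocate patients to treatments by chance, in near-equal group sizes."""
    rng = random.Random(seed)
    patient_ids = list(patient_ids)
    # Build a balanced list of treatment labels, then shuffle it.
    labels = [treatments[i % len(treatments)] for i in range(len(patient_ids))]
    rng.shuffle(labels)
    return dict(zip(patient_ids, labels))


allocation = randomize_patients(range(1, 11), seed=2024)
for patient, treatment in allocation.items():
    print(patient, treatment)
```

Because the allocation depends only on the random shuffle, neither the investigator's preferences nor any patient characteristic can influence which treatment a subject receives.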

Blindness

It is the policy of keeping someone unaware of the treatment which has been given. There are three forms of blindness:

1. Single-blind

2. Double-blind

3. Triple-blind

It is proper for a trial to be “double-blind”, i.e. for the doctor, as well as the patient, not to know the particular treatment that is being given to a particular patient. The reason is to avoid subjective judgment.

Treatment

It can be defined as any stimulus that can be applied to a subject. It may be a drug, a diet, etc.

Replication: This is the repetition of the same treatment on different subjects.

Examples of clinical trials are:

1. Aspirin in the prevention of recurrent heart attacks.

2. A comparative clinical evaluation of econazole nitrate, miconazole and nystatin in the treatment of vaginal candidiasis.

5.3.1 Protocol Design in Clinical Trials

1. Background and general aims of the trial: These should be clearly stated and should build on the experience gained from past research. The aims may be to compare the efficacy and side-effects of a new treatment.

2. Specific objectives of the trial: There should be a concise and precise definition of the hypotheses concerning treatment efficacy and safety which are to be examined by the trial.

3. Sampling method

4. Treatment schedule: This involves the details of the treatment.

5. Method of patient evaluation.

6. Trial design: This deals with the method of treatment allocation.

7. Registration and randomization of patients

8. Patient consent

9. Required size of the study.

10. Monitoring of trial progress

11. Forms and data handling

12. Protocol deviation

13. Plans for statistical analysis

14. Administrative responsibilities

5.3.2 Role of Statisticians in Clinical Trials

1. The statistician advises on the required number of patients.

2. The statistician assists with the method of randomization and the data processing techniques.

3. The statistician assists with patients’ registration and the treatment schedule.

4. The statistician makes a valuable contribution to improving the overall design.

In-Text Question

Which of the following is not one of the three forms of blindness?

a. Placebo blind

b. Single-blind

c. Double blind

d. Triple blind

In-Text Answer

a. Placebo blind

5.4 Ethical Issues on Clinical Trials

1. The proposed treatment must be safe and therefore not likely to bring harm to the patient. The following are examples of studies carried out in the past that led to death and harm:

a. Tuskegee Syphilis Experiments in 1932

b. Nazi experiments from 1930 – 1945, stopped in 1946

c. Pfizer drug trial in children: It took place in Kano in 1996 and stopped in 2001

2. Consideration must be given to which types of patient may be brought into a controlled trial and allocated randomly to different treatments, e.g. pregnant women, the very young, and the old and frail should be excluded from trials.

3. The patient’s consent should be obtained in every trial.

4. It is ethical to use a placebo or dummy treatment.

5. The trial should be double-blind, in the sense that the doctor and the patient are not aware of the nature or kind of treatment given.

5.4.1 Sample Size Determination

Power

The power of a hypothesis test is the probability that the null hypothesis is rejected when it is false. Often, this is expressed as a percentage rather than a probability. Here we consider how to determine the power of a hypothesis test and how to estimate the sample size required.

For a one-sided test with hypotheses as below:

Ho: µ = µ0

H1: µ > µ0

the power, when the true mean is µ1, can be defined as

$$\text{power} = P\left(Z < \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} - z_{\alpha}\right)$$

For a two-sided test with hypotheses stated as:

Ho: µ = µ0

H1: µ ≠ µ0 (with true mean µ1)

$$\text{power} = P\left(Z < \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} - z_{\alpha/2}\right) \quad \text{where } \mu_1 > \mu_0$$

or

$$\text{power} = P\left(Z < \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} - z_{\alpha/2}\right) \quad \text{where } \mu_1 < \mu_0$$

Example 1

The male population of an area within a developing country is known to have had a mean serum total cholesterol value of 5.5 mmol/l 10 years ago. In recent years, many western foodstuffs have been imported into the country as part of a rapid move towards a western-style market economy. It is believed that this would have caused an increase in cholesterol levels. A sample survey involving 50 men is planned to test the hypothesis that the mean cholesterol is still 5.5 against the alternative that it has increased, at the 5% level of significance. From the evidence of several studies, it may be assumed that the standard deviation of serum total cholesterol is 1.4 mmol/l. The epidemiologist in charge of the project considers that an increase in cholesterol up to 6.0 mmol/l would be extremely important in a medical sense. What is the chance that the planned test will detect such an increase when it really has occurred?

H0: µ = 5.5 (µ0)

H1: µ > 5.5 (µ0)

α = 5%, µ1 = 6.0, µ0 = 5.5, σ = 1.4, n = 50

Since it is a one-sided test:

$$\text{power} = P\left(Z < \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} - z_{\alpha}\right) = P\left(Z < \frac{6.0 - 5.5}{1.4/\sqrt{50}} - z_{0.05}\right)$$

$z_{0.05}$ from statistical table = 1.645

$$= P\left(Z < \frac{0.5}{0.198} - 1.645\right)$$

$$= P(Z < 2.525 - 1.645)$$

$$= P(Z < 0.88)$$

Power = 0.8106

Hence, the chance of finding a significant result when the mean really is 6.0 is about 81%.

Example 2

Using example 1 again, suppose we wish to test for a cholesterol difference in either direction, so that a two-sided test is to be used. What is the chance of detecting a significant difference when the true mean is 8.0 mmol/l and when the true mean is 4.0 mmol/l? α = 1%

Solution

H0: µ = 5.5

H1: µ = 8.0

α = 0.01

$$\text{power} = P\left(Z < \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}} - z_{\alpha/2}\right) = P\left(Z < \frac{8.0 - 5.5}{1.4/\sqrt{50}} - z_{0.01/2}\right)$$

$$= P\left(Z < \frac{2.5}{0.198} - 2.576\right) = P(Z < 10.05) = 1$$

There is a 100% chance of detecting such an increase using a two-sided test.

If H0: µ = 5.5 against

H1: µ = 4.0

$$\text{power} = P\left(Z < \frac{5.5 - 4.0}{0.198} - 2.576\right) = P\left(Z < \frac{1.5}{0.198} - 2.576\right)$$

$$= P(Z < 7.576 - 2.576) = P(Z < 5) = 1$$

Example 3

Likewise, if the true mean is 6.0 mmol/l and α = 5%, we have

$$\text{power} = P\left(Z < \frac{6.0 - 5.5}{0.198} - 1.960\right) = P(Z < 0.57) = 0.7157$$

There is about a 72% chance of detecting an increase or decrease of 0.5 mmol/l using a two-sided test.
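The power calculations in Examples 1 to 3 can be sketched in Python using the standard library's normal distribution. The function name `power_mean_test` and its arguments are illustrative assumptions. Because the code uses unrounded z-values instead of table look-ups, its results (about 0.811 and 0.714) differ slightly in the third decimal from the hand calculations, which round the argument to 0.88 and 0.57.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_mean_test(mu0, mu1, sigma, n, alpha, two_sided=False):
    """Power of a z-test for a mean when the true mean is mu1 (sketch)."""
    tail = alpha / 2 if two_sided else alpha
    z_crit = STANDARD_NORMAL.inv_cdf(1 - tail)      # z_alpha or z_{alpha/2}
    # One-sided (upper-tail) test uses mu1 - mu0; two-sided uses the size
    # of the difference, whichever direction it lies in.
    difference = abs(mu1 - mu0) if two_sided else (mu1 - mu0)
    standard_error = sigma / math.sqrt(n)
    return STANDARD_NORMAL.cdf(difference / standard_error - z_crit)


p1 = power_mean_test(5.5, 6.0, 1.4, 50, alpha=0.05)                  # Example 1
p2 = power_mean_test(5.5, 8.0, 1.4, 50, alpha=0.01, two_sided=True)  # Example 2
p3 = power_mean_test(5.5, 6.0, 1.4, 50, alpha=0.05, two_sided=True)  # Example 3
print(round(p1, 3), round(p2, 3), round(p3, 3))
```

Comparing `p1` with `p3` shows the cost of a two-sided test: for the same data and significance level, the power to detect the 0.5 mmol/l difference falls from about 81% to about 71%.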

Determination of Sample Size for a Mean Value

For a one-sided test

$$n = \frac{(z_{\alpha} + z_{\beta})^2 \sigma^2}{(\mu_1 - \mu_0)^2}$$

For a two-sided test

$$n = \frac{(z_{\alpha/2} + z_{\beta})^2 \sigma^2}{(\mu_1 - \mu_0)^2}$$

Example 4

Considering example 1 again, the epidemiologist decides that an increase in mean total serum cholesterol up to 6.0 mmol/l from the known previous value of 5.5 mmol/l is so important that he would wish to be 95% sure of detecting it. Assuming that a one-sided 10% test is to be carried out and the standard deviation, σ, is known to be 1.4 mmol/l as before, what sample size should be used?

Solution

$$n = \frac{(z_{\alpha} + z_{\beta})^2 \sigma^2}{(\mu_1 - \mu_0)^2}$$

µ1 = 6.0, µ0 = 5.5

σ² = 1.4² = 1.96

$z_{\alpha} = z_{0.1}$ = 1.282

Power = 1 − β, so β = 1 − power = 1 − 0.95 = 0.05

$z_{\beta} = z_{0.05}$ = 1.645

$$n = \frac{(1.282 + 1.645)^2 \times 1.96}{(6 - 5.5)^2} = \frac{8.567329 \times 1.96}{0.25}$$

= 67.17

The value of n will be rounded up:

n = 68
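The sample size formula for a mean can be sketched in Python as follows. The helper name `sample_size_mean` is an illustrative assumption; it takes the required power directly (so that z_β is the z-value with upper-tail area β = 1 − power) and rounds the result up, as in Example 4.

```python
import math
from statistics import NormalDist


def sample_size_mean(mu0, mu1, sigma, alpha, power, two_sided=False):
    """Smallest n giving the required power for a z-test of a mean (sketch)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - (alpha / 2 if two_sided else alpha))
    z_beta = z(power)                     # upper-tail area beta = 1 - power
    n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (mu1 - mu0) ** 2
    return math.ceil(n)                   # always round up


# Example 4: one-sided 10% test, 95% power, sigma = 1.4 mmol/l.
n4 = sample_size_mean(5.5, 6.0, 1.4, alpha=0.10, power=0.95)
print(n4)
```

Rounding up rather than to the nearest integer guarantees that the achieved power is at least the power requested.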

Example 5

Suppose the epidemiologist has decided to change his requirements. He now wishes to be 90% sure of detecting a true mean of 6.4 mmol/l, with the previous data still the same. Assuming that a two-sided 5% test is to be carried out, what sample size should be used?

$z_{\alpha/2} = z_{0.025}$ = 1.96

$z_{\beta} = z_{0.1}$ = 1.282

Therefore,

$$n = \frac{(1.96 + 1.282)^2 \times 1.4^2}{(6.4 - 5.5)^2} = \frac{20.6007}{0.9^2} = 25.43$$

Rounding up as before, n = 26

Example 6

Vitamin A supplementation is known to confer resistance to infectious diseases in Nigeria, although the precise mechanism is unclear. To seek to explain the mechanism, a group of healthy children will be given vitamin A supplements for a long period. At baseline and at the end of the supplementation period, urine samples will be taken from all the children and neopterin excretion measured. The current level of neopterin is estimated as 600 mmol/mmol creatinine and the standard deviation of the change due to supplementation has been estimated (from a small pilot sample) to be 200 mmol/mmol creatinine.

i. Suppose that a drop of 10% in neopterin is considered a meaningful difference, which it is wished to find with 95% power using a two-sided 1% test. How many children should be recruited?

ii. Suppose that resources allow only 150 children to be given the supplements. What is the chance of detecting a drop of 10% in neopterin with a 1% two-sided test?

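Solution

i). For a two-sided test,

$$n = \frac{(z_{\alpha/2} + z_{\beta})^2 \sigma^2}{(\mu_1 - \mu_0)^2}$$

σ = 200, so σ² = 40000

1 − β = 95%, so β = 0.05

$z_{\alpha/2} = z_{0.005}$ = 2.576

$z_{\beta} = z_{0.05}$ = 1.645

Since neopterin is to drop by 10%: µ0 = 600, 10/100 × 600 = 60, so µ1 = 600 − 60 = 540

$$n = \frac{(2.576 + 1.645)^2 \times 40000}{(540 - 600)^2} = \frac{(4.221)^2 \times 40000}{60^2} = \frac{712673.64}{3600} = 197.9649$$

n = 198

ii). n = 150

µ0 = 600, µ1 = 540, σ = 200, α = 1%

$$\text{power} = P\left(Z < \frac{\mu_0 - \mu_1}{\sigma/\sqrt{n}} - z_{\alpha/2}\right) = P\left(Z < \frac{600 - 540}{200/\sqrt{150}} - z_{0.01/2}\right)$$

$$= P(Z < 3.674 - 2.576) = P(Z < 1.098) = 0.8639$$

power = 86.4%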

Determination of Sample Size for a Proportion

If the problem is to test the hypothesis

H0: P = Pm

H1: P = Pt = Pm + d

where P is the true population proportion. Pm is some specified value of this proportion which we wish to test; in a drug trial it is the minimum level of effectiveness, such that a drug that does not achieve it is dropped from further testing. Pt (which differs from Pm by an amount d) is the alternative value which we would like to identify correctly with probability 1 − β; it is the level of effectiveness of interest, such that a drug that achieves or exceeds it has an efficacy level worthy of further investigation in a phase III trial.

One-sided test

$$n = \frac{\left(z_{\alpha}\sqrt{p_m(1 - p_m)} + z_{\beta}\sqrt{p_t(1 - p_t)}\right)^2}{(p_t - p_m)^2}$$

Two-sided test

$$n = \frac{\left(z_{\alpha/2}\sqrt{p_m(1 - p_m)} + z_{\beta}\sqrt{p_t(1 - p_t)}\right)^2}{(p_t - p_m)^2}$$

Example 7

Suppose that in a country where male smoking has been extremely common in recent years, a government target has been to reduce the prevalence of male smoking to at most 30%. A sample survey is planned to test, at the 5% level, the hypothesis that the proportion of smokers in the male population is 0.3 against the one-sided alternative that it is greater. The test should be able to detect a prevalence of 32%, when it is true, with 90% power. Determine the sample size that should be used.

$$n = \frac{\left(z_{\alpha}\sqrt{p_m(1 - p_m)} + z_{\beta}\sqrt{p_t(1 - p_t)}\right)^2}{(p_t - p_m)^2}$$

$z_{\alpha} = z_{0.05}$ = 1.645; 1 − β = 0.9, so β = 0.1 and $z_{\beta} = z_{0.1}$ = 1.282

$p_m$ = 0.3, $p_t$ = 0.32

$$n = \frac{\left(1.645\sqrt{(0.3)(0.7)} + 1.282\sqrt{(0.32)(0.68)}\right)^2}{(0.32 - 0.3)^2}$$

n = 4567.2 ≈ 4567

(The figure 4567.2 corresponds to z-values carried to four decimal places, 1.6449 and 1.2816; with the three-decimal values shown, the expression evaluates to about 4569.)

Example 8

A certain drug is to be entered into a phase 2 trial. If its effectiveness is found to be lower than 20%, it will be dropped, but if it is at least 50%, it will be pursued in a further comparative trial. If the significance level is taken to be 5% with a power of 80%, how many subjects need to be studied?

Solution

$$n = \frac{\left(z_{\alpha}\sqrt{p_m(1 - p_m)} + z_{\beta}\sqrt{p_t(1 - p_t)}\right)^2}{(p_t - p_m)^2}$$

$z_{\alpha}$ = 1.645; 1 − β = 0.8, so β = 0.2 and $z_{\beta} = z_{0.2}$ = 0.84

$p_m$ = 0.2, $p_t$ = 0.5

$$n = \frac{\left(1.645\sqrt{(0.2)(0.8)} + 0.84\sqrt{(0.5)(0.5)}\right)^2}{(0.5 - 0.2)^2}$$

$$n = \frac{(0.658 + 0.42)^2}{(0.3)^2} = \frac{(1.078)^2}{0.09} = 12.9$$

n ≈ 13

Analysis of Clinical Trial

In a clinical trial of Aspirin versus placebo in the treatment of headache, the results were as

shown in the table below:

              No headache    Headache    Total
Aspirin       130 (102)      20 (48)     150
Placebo       40 (68)        60 (32)     100
Total         170            80          250

Examine whether the difference between the proportions cured with Aspirin and with placebo could have arisen by chance.

Solution

H0: The difference between those cured with Aspirin and with placebo arose by chance.

H1: The difference between those cured with Aspirin and with placebo did not arise by chance.

α = 0.05

Compute the expected values

$$E_i = \frac{r_i c_i}{N}$$

where $N$ is the grand total, $r_i$ is the row total and $c_i$ is the column total.

The expected values were computed and put in the table. Note that they are enclosed in brackets.

Compute

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

Oi (Observed)    Ei (Expected)    (Oi − Ei)²/Ei
130              102              7.69
40               68               11.53
20               48               16.33
60               32               24.50
Σ(Oi − Ei)²/Ei = 60.05

Tabulated value

$$\chi^2_{\alpha,\,(r-1)(c-1)} = \chi^2_{0.05,\,(2-1)(2-1)} = \chi^2_{0.05,\,1} = 3.841$$

Conclusion

Since the computed value (60.05) exceeds the tabulated value (3.841), reject H0 and conclude that the difference between those cured with Aspirin and with placebo has not arisen by chance.
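The chi-square calculation above can be checked with a short Python sketch. The function name `chi_square_table` is an assumption made for illustration; it computes each expected count as (row total × column total) / grand total and then sums (O − E)²/E over all cells.

```python
def chi_square_table(observed):
    """Expected counts and Pearson chi-square statistic for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E = (row total * column total) / grand total for every cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    return expected, chi2


# Rows: Aspirin, Placebo; columns: no headache, headache
observed = [[130, 20], [40, 60]]
expected, chi2 = chi_square_table(observed)
critical_value = 3.841  # tabulated chi-square, 1 degree of freedom, alpha = 0.05
print(expected, round(chi2, 2), chi2 > critical_value)
```

The critical value is entered by hand from the statistical table, as in the worked solution, so the sketch needs only the Python standard library.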







Write short notes on the following:

1. Randomization of Randomization

2. Experiment

3. Placebo

4. Research

References

Cochran W.G. & Cox G.M. (1992): Experimental Designs, Published by Wiley.

Ronald, E. Wapole (1982, 3rd Ed): Introduction to Statistics


Woodward, W. (2005): Epidemiology: Study Design and Data Analysis. Published by Chapman & Hall/CRC.


Study Session 6: Biological Assay

Introduction

Bioassays are typically conducted to measure the effects of a substance on a living organism

and are essential in the development of new drugs and in monitoring environmental

pollutants.

Bioassay (the commonly used shorthand for biological assay or assessment), or biological standardization, is a type of scientific experiment. A bioassay involves the use of a live animal or plant, or of tissue or cells in vitro (that is, in a controlled environment outside a living organism), to determine the biological activity of a substance such as a hormone or drug.

You are about to learn about Bioassay as an important area of application of statistical

methods. The general principles were established mainly during the 1930s and 1940s, and

were rapidly developed in detail to an extent which cannot be adequately covered here.



6.1 Define and Identify the basic components of Bioassay;

6.2 Highlight and explain the type and nature of Bioassay.

6.1 Biological Assay

A ‘biological assay’ is an experiment to determine the concentration of a key substance (i.e. the substance under investigation, which may be a pharmaceutical preparation such as digitalis or penicillin) in a preparation by measuring the activity of the preparation in a biological system.

The term ‘bioassay’ is often used in a more general sense, to denote any experiment in which

specific responses to externally applied agents are observed in animals or some other

biological system. The emphasis in such experiments is often to measure the effects of

various agents on the response variable, and no attempt is made to estimate relative potency.

This usage is for example, common in the extensive programmes for the testing of new drugs,

pesticide, and food additives, etc.

Figure 6.2: Biological assay experiment

More specifically, biological assays are methods used to compare the potencies of two preparations of a drug by observing their respective actions on biological systems (e.g. plants, animals), one of the two being a standard preparation whose potency in arbitrary units is known.

In general, bioassay is the measurement of the potency (i.e. efficacy or effectiveness) of any stimulus by means of the reactions that it produces on living matter. There are two types of preparation:

Test Preparation (T): the preparation whose potency is required. It is usually some quantity of naturally occurring diluents, or of a product of a manufacturing process, containing an unknown concentration of the substance under investigation.

Standard Preparation (S): the preparation whose potency in arbitrary units is known. The institution of a standard preparation usually leads to the definition of the standard unit as the activity of a certain amount of the standard.

Any test preparation (T) can now be assayed against the standard (S) by simultaneous experimentation. If, say, S is defined to contain 2000 units per gram and T happens to contain 150 units per gram, then the relative potency or potency ratio of T in terms of S is

$$R = \frac{150}{2000} = \frac{3}{40}$$

The relative potency of T is the ratio of the mean of the test preparation to the mean of the standard preparation:

$$\rho = R = \frac{\mu_T}{\mu_S}$$

The estimated relative potency is

$$\hat{\rho} = R = \frac{\bar{X}_T}{\bar{X}_S}$$

Basic Components of Bioassay

There are three essential basic aspects of an assay, namely:

• Stimulus.

• Subject or experimental unit or biological material.

• Response.

Figure 6.3: Types of preparation (test preparation and standard preparation)

1 Stimulus

It is an agent which causes a reaction or change in an organism or any of its parts. The stimulus (usually a drug) is applied to an experimental unit (animal, man or living material). It is also referred to as the object (substance) whose potency is being measured. Examples are vitamins, drugs, fungicides, insulin, analgesics, etc.

The application of a stimulus at a given dose results in a change in some measured characteristic. It is the procedure whose effect is to be measured.

2 Subject or Experimental Unit or Biological Material

This is the animal, piece of tissue, plant or living matter to which the stimulus is applied. It is the smallest division of the experimental material, i.e. the unit of material to which one application of a treatment (usually a stimulus) is applied.

3 Response

It is a measurable characteristic of the subject that develops as a result of the application of the stimulus, e.g. the allergic reaction that occurs when some people are given chloroquine.

It is defined as the reaction that occurs when a stimulus is applied to a subject. Concisely, it is the reaction of the subject.

6.1.1 Purpose of Bioassay

It is used to estimate the relative potency of the stimulus needed to produce the desired

response. It is used to compare the potency obtained from a test preparation relative to a

standard preparation.

Design of an Assay

Let S be a standard preparation of a drug whose potency in arbitrary units is known and T be another (test) preparation whose potency is required. Take a number of subjects and divide them into two groups; one group is allocated to the standard preparation and the other to the test preparation. To each subject, the drug is administered until a pre-assigned response has occurred. The dose corresponding to the occurrence of the pre-assigned response is recorded as an observation. The means of these observations are taken in order to estimate the potency. A measure of the potency of a test preparation is the relative potency, defined as

$$\hat{\rho} = R = \frac{\bar{X}_T}{\bar{X}_S}$$
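As a small illustration of this definition, the point estimate is just the ratio of the two sample means. The sketch below is not part of the original manual; the function name `relative_potency` is an assumption, and the doses are the determinations quoted in Example 1 of Section 6.2.1.

```python
from statistics import mean


def relative_potency(test_doses, standard_doses):
    """Point estimate R = mean(test) / mean(standard)."""
    return mean(test_doses) / mean(standard_doses)


# Determinations quoted in Example 1 of Section 6.2.1
standard = [10.5, 10.7, 9.8, 9.5, 9.5]
test = [12.7, 11, 5, 10.7, 13.5, 12.7, 13.3]

r_hat = relative_potency(test, standard)
print(round(r_hat, 3))
```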

6.2 Types of Assay

There are two major categories of biological assay, one of which can be further divided into other types:

1. Qualitative Assay: a form of assay in which the potency cannot be measured numerically, e.g. colour, complexion.

2. Quantitative Assay: a form of assay in which it is possible to measure the potency of the stimulus. This can be further broken down into two main types of assay:

a. Direct assay

b. Indirect assay

6.2.1 Direct Assay

In a direct biological assay, the potency of a test preparation can be examined directly and its relative potency can be determined in comparison with that of a standard preparation. The simplest form of assay is the direct assay, in which increasing doses of S and T are administered to an experimental unit until a certain critical event takes place. This situation is rare. An example is the assay of prepared digitalis in the guinea pig, in which the critical event is arrest of the heart beat. The dose given to any animal is a measure of the individual tolerance of that animal, which can be expected to vary between animals through biological and environmental causes.

The relative potency of the test preparation is estimated by

$$\hat{\rho} = \frac{\bar{X}_T}{\bar{X}_S}$$

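Derivation of the Confidence Interval for ρ

THEOREM

Let $\bar{X}_T$ and $\bar{X}_S$ be the sample averages of the test and standard preparations in a potency test. Suppose the standard-preparation average has mean $\mu_S$ and variance $\sigma^2/n$, and the test-preparation average has mean $\mu_T$ and variance $\sigma^2/m$. Assuming the independence of the two samples, $\bar{X}_T - \rho\bar{X}_S$ has mean 0 and variance $\sigma^2\left(\frac{1}{m} + \frac{\rho^2}{n}\right)$. Therefore the appropriate confidence interval for ρ is

$$\frac{1}{1-g}\left[\frac{\bar{X}_T}{\bar{X}_S} \pm \sqrt{\frac{\bar{X}_T^2}{\bar{X}_S^2}\left\{1 - (1-g)\left(1 - \frac{t^2 s^2}{m\bar{X}_T^2}\right)\right\}}\,\right]$$

or, equivalently,

$$\frac{\hat{\rho}}{1-g}\left[1 \pm \sqrt{1 - (1-g)\left(1 - \frac{t^2 s^2}{m\bar{X}_T^2}\right)}\,\right], \qquad g = \frac{t^2 s^2}{n\bar{X}_S^2}$$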

Proof

$$\rho = \frac{\mu_T}{\mu_S} \qquad \text{equation (1)}$$

$$R = \hat{\rho} = \frac{\bar{X}_T}{\bar{X}_S} \qquad \text{equation (2)}$$

$$\mathrm{Var}(R) \cong \frac{1}{\bar{X}_S^2}\left[V(\bar{X}_T) + R^2\,V(\bar{X}_S)\right]$$

The standard deviation of the estimate R can be obtained for a large sample size. Recall that the variance of the sample average is the variance of the population divided by the number of observations. Hence, both $V(\bar{X}_T)$ and $V(\bar{X}_S)$ can be obtained from the sample. If the variance is assumed to be the same for the standard and test preparations, a pooled variance estimate can be used for both.

Then the estimate of the common variance $\sigma^2$ is

$$s^2 = \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(X_{Si} - \bar{X}_S)^2 + \sum_{i=1}^{m}(X_{Ti} - \bar{X}_T)^2}{n + m - 2} \qquad \text{equation (3)}$$

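Since

$$V(\bar{X}_T - \rho\bar{X}_S) = V(\bar{X}_T) + \rho^2 V(\bar{X}_S) = \frac{\sigma^2}{m} + \frac{\rho^2\sigma^2}{n} = \sigma^2\left(\frac{1}{m} + \frac{\rho^2}{n}\right) \qquad \text{equation (4)}$$

and since $\rho = \mu_T/\mu_S$ gives $\mu_T - \rho\mu_S = \mu_T - \frac{\mu_T}{\mu_S}\mu_S = 0$, then, using $S^2$ as an estimate of $\sigma^2$, we can write

$$\frac{\bar{X}_T - \rho\bar{X}_S}{S\sqrt{\frac{1}{m} + \frac{\rho^2}{n}}} \qquad \text{equation (5)}$$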

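Equation (5) has a t-distribution with n + m − 2 degrees of freedom (where n and m are the numbers of observations on the standard and test preparations respectively). The (1 − α) level confidence interval is obtained with the help of the following inequality, where $t = t_{\alpha/2,\,n+m-2}$:

$$-t\,S\sqrt{\frac{1}{m} + \frac{\rho^2}{n}} < \bar{X}_T - \rho\bar{X}_S < t\,S\sqrt{\frac{1}{m} + \frac{\rho^2}{n}} \qquad \text{equation (6)}$$

This leads to obtaining the interval by solving the quadratic equation

$$\left(\bar{X}_T - \rho\bar{X}_S\right)^2 = t^2 S^2\left(\frac{1}{m} + \frac{\rho^2}{n}\right) \qquad \text{equation (7)}$$

Equation (7) can be re-written, after multiplying out all terms and collecting those involving $\rho^2$ and $\rho$, as

$$\rho^2\left(\bar{X}_S^2 - \frac{t^2 S^2}{n}\right) - 2\rho\,\bar{X}_S\bar{X}_T + \left(\bar{X}_T^2 - \frac{t^2 S^2}{m}\right) = 0$$

whose roots are

$$\rho = \frac{\bar{X}_S\bar{X}_T \pm \sqrt{\bar{X}_S^2\bar{X}_T^2 - \left(\bar{X}_S^2 - \frac{t^2 s^2}{n}\right)\left(\bar{X}_T^2 - \frac{t^2 s^2}{m}\right)}}{\bar{X}_S^2 - \frac{t^2 s^2}{n}} \qquad \text{equation (8)}$$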

If the roots of the quadratic equation are not real, i.e. the expression inside the radical sign in equation (8) is negative, we are unable to find the confidence interval for the relative potency by this method. Otherwise, dividing the numerator and denominator of the right-hand side of equation (8) by $\bar{X}_S^2$, the confidence interval for ρ can also be written as

$$\frac{1}{1-g}\left[\frac{\bar{X}_T}{\bar{X}_S} \pm \sqrt{\frac{\bar{X}_T^2}{\bar{X}_S^2}\left\{1 - (1-g)\left(1 - \frac{t^2 s^2}{m\bar{X}_T^2}\right)\right\}}\,\right] \qquad \text{equation (9)}$$

where

$$g = \frac{t^2 s^2}{n\bar{X}_S^2}$$

The confidence limit in equation (9) is a special case of the well-known Fieller's theorem.

Note

1) t can be obtained from the statistical table, i.e. $t = t_{\alpha/2,\,n+m-2}$.

2) $\bar{X}_T - \rho\bar{X}_S$ is normally distributed with mean zero, since

$$E(\bar{X}_T - \rho\bar{X}_S) = E(\bar{X}_T) - \rho E(\bar{X}_S) = \mu_T - \rho\mu_S = \mu_T - \frac{\mu_T}{\mu_S}\mu_S = 0, \quad \text{where } \rho = \frac{\mu_T}{\mu_S}$$
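The interval can also be obtained numerically by solving the quadratic in equation (7) directly, which avoids algebra slips when substituting into equation (9). The following is a minimal sketch under stated assumptions: the function name `fieller_interval` is hypothetical, the value of t is supplied by the user from the t-table, and the function returns `None` when the roots are not real or the coefficient of ρ² is not positive, in which case no bounded interval exists.

```python
import math
from statistics import mean


def fieller_interval(test, standard, t):
    """Confidence limits for rho = mu_T / mu_S from the quadratic in equation (7).

    `t` is the tabulated t-value with n + m - 2 degrees of freedom.
    """
    m, n = len(test), len(standard)
    x_t, x_s = mean(test), mean(standard)
    # Pooled variance estimate, equation (3)
    ss = (sum((x - x_s) ** 2 for x in standard)
          + sum((x - x_t) ** 2 for x in test))
    s2 = ss / (n + m - 2)
    # Coefficients of a*rho^2 + b*rho + c = 0
    a = x_s ** 2 - t ** 2 * s2 / n
    b = -2 * x_s * x_t
    c = x_t ** 2 - t ** 2 * s2 / m
    disc = b * b - 4 * a * c
    if a <= 0 or disc < 0:
        return None  # no bounded interval by this method
    root = math.sqrt(disc)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))


standard = [10.5, 10.7, 9.8, 9.5, 9.5]
test = [12.7, 11, 5, 10.7, 13.5, 12.7, 13.3]
interval = fieller_interval(test, standard, t=1.812)  # 90% level, 10 d.f.
print(interval)
```

Because the limits are the two roots of the same quadratic, they always enclose the point estimate whenever an interval is returned.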


Example 1

The standard preparation of a drug gave the determinations of 10.5, 10.7, 9.8, 9.5 and 9.5. For

the test preparation, we have 12.7, 11, 5, 10.7, 13.5, 12.7 and 13.3.

a. Find the point estimate of the relative potency of the test preparation.

b. Give a 90% confidence interval for the relative potency.

Solution

a) Relative potency:

$$R = \hat{\rho} = \frac{\bar{X}_T}{\bar{X}_S}, \qquad \bar{X}_T = \frac{\sum_{i=1}^{m} X_{Ti}}{m}, \qquad \bar{X}_S = \frac{\sum_{i=1}^{n} X_{Si}}{n}$$

        X_Si     (X_Si − X̄_S)²     X_Ti     (X_Ti − X̄_T)²
        10.5     0.25              12.7     2.0449
        10.7     0.49              11       0.0729
        9.8      0.04              5        39.3129
        9.5      0.25              10.7     0.3249
        9.5      0.25              13.5     4.9729
                                   12.7     2.0449
                                   13.3     4.1209
Total   50       1.28              78.9     52.8943

$$\bar{X}_T = \frac{78.9}{7} = 11.27, \qquad \bar{X}_S = \frac{50}{5} = 10$$

This implies

$$\hat{\rho} = \frac{11.27}{10} = 1.127$$

Therefore the relative potency of the test preparation is 1.127.

b) 90% confidence interval for the relative potency:

$$\frac{\hat{\rho}}{1-g}\left[1 \pm \sqrt{1 - (1-g)\left(1 - \frac{t^2 s^2}{m\bar{X}_T^2}\right)}\,\right], \qquad g = \frac{t^2 s^2}{n\bar{X}_S^2}$$

The pooled variance is

$$s^2 = \frac{\sum_{i=1}^{n}(X_{Si} - \bar{X}_S)^2 + \sum_{i=1}^{m}(X_{Ti} - \bar{X}_T)^2}{n + m - 2} = \frac{1.28 + 52.8943}{10} = 5.42$$

and

$$t = t_{\alpha/2,\,n+m-2} = t_{0.05,\,10} = 1.812$$

Therefore

$$g = \frac{(1.812)^2(5.42)}{5 \times 10^2} = 0.036, \qquad \frac{t^2 s^2}{m\bar{X}_T^2} = \frac{(1.812)^2(5.42)}{7 \times (11.27)^2} = 0.0200$$

The 90% confidence limits for ρ are

$$\frac{1.127}{1 - 0.036}\left[1 \pm \sqrt{1 - (0.964)(1 - 0.0200)}\,\right] = 1.169\left[1 \pm \sqrt{0.0553}\,\right] = 1.169\,[1 \pm 0.235] = (0.89,\ 1.44)$$

Therefore, the confidence interval for ρ is $0.89 \le \rho \le 1.44$.

Example 2

Let m = n = 7, $\bar{X}_S$ = 6.25, $\bar{X}_T$ = 8.5 and $S^2$ = 0.358. For a 90% confidence interval, we obtain t = 1.782 for 12 degrees of freedom.

1. Find the estimate of potency.

2. Construct a 90% confidence interval for ρ.

Solution

1.

$$\hat{\rho} = \frac{\bar{X}_T}{\bar{X}_S} = \frac{8.5}{6.25} = 1.36$$

2.

$$\frac{\hat{\rho}}{1-g}\left[1 \pm \sqrt{1 - (1-g)\left(1 - \frac{t^2 s^2}{m\bar{X}_T^2}\right)}\,\right]$$

where

$$g = \frac{t^2 s^2}{n\bar{X}_S^2} = \frac{(1.782)^2(0.358)}{7 \times (6.25)^2} = \frac{3.1755 \times 0.358}{273.4375} = 0.00416$$

and

$$\frac{t^2 s^2}{m\bar{X}_T^2} = \frac{(1.782)^2(0.358)}{7 \times (8.5)^2} = 0.00225$$

so that

$$\frac{1.36}{1 - 0.00416}\left[1 \pm \sqrt{1 - (0.99584)(1 - 0.00225)}\,\right] = 1.3657\left[1 \pm \sqrt{0.0064}\,\right] = 1.3657\,[1 \pm 0.080] = (1.26,\ 1.47)$$

Therefore the confidence interval for ρ is

$$1.26 \le \rho \le 1.47$$

6.2.2 Indirect Assay

When direct assays are not feasible, we apply different doses to various subjects and measure the responses. We consider the case in which the response is a continuous variable. We then assume that the dose-response relationship follows a parametric model. We are interested not only in estimating potency, but also in the overall dose-response relationship and the parameters of the model involved.

Generally, few dose levels are tested, and various standard designs can be used in conducting a bioassay. The general theory of the design of experiments applies here as well. We shall assume that the relationship between the log-dose and the response is linear. Let $Y_s$ and $Y_T$ be the responses to a dose $Z_s$ of the standard preparation and a dose $Z_T$ of the test preparation, where $\log Z_s = X_s$ and $\log Z_T = X_T$.

We assume linear relations for the test and standard preparations:

$$Y_T = \alpha_T + \beta_T X_T \qquad (1)$$

$$Y_s = \alpha_s + \beta_s X_s \qquad (2)$$

If the test preparation has potency P, a dose $Z_T$ (which is the same as $P Z_s$) gives the same response $Y_s$ as the standard preparation. Notice that

$$X_T = \log Z_T = \log(P Z_s) = \log P + \log Z_s = \log P + X_s$$

We shall discuss indirect assay under two types of assay.

1. Parallel line assay

2. Slope ratio assay.

Parallel Line Assay

Assume first that the dose-response relationship for the test drug is a line with the same slope as that of the standard preparation. Such assays are called parallel-line assays.

We then have

$$\beta_T = \beta_s \quad \text{and} \quad Y_T = Y_s \ \text{when} \ X_T = \log P + X_s$$

so that

$$Y_s = \alpha_T + \beta_s(\log P + X_s)$$

or

$$Y_s = \alpha_T + \beta_s \log P + \beta_s X_s \qquad (3)$$

Comparing equations (2) and (3), we have

$$\alpha_s = \alpha_T + \beta_s \log P \qquad (4)$$

Let $\hat{\alpha}_s$, $\hat{\alpha}_T$ and $\hat{\beta}_s$ be the estimates of the corresponding parameters; then

$$\hat{\alpha}_s = \bar{Y}_s - \hat{\beta}_s\bar{X}_s \qquad (5)$$

$$\hat{\alpha}_T = \bar{Y}_T - \hat{\beta}_s\bar{X}_T \qquad (6)$$

$$\hat{\beta}_s = \frac{\sum(X_{si} - \bar{X}_s)(Y_{si} - \bar{Y}_s) + \sum(X_{Ti} - \bar{X}_T)(Y_{Ti} - \bar{Y}_T)}{\sum(X_{si} - \bar{X}_s)^2 + \sum(X_{Ti} - \bar{X}_T)^2} \qquad (7)$$

The potency P can now be estimated in terms of $\hat{\alpha}_s$, $\hat{\alpha}_T$ and $\hat{\beta}_s$ with

$$\log R = \frac{\hat{\alpha}_s - \hat{\alpha}_T}{\hat{\beta}_s} \qquad (8)$$

An alternative form of log R is given by

$$\log R = \frac{\bar{Y}_s - \bar{Y}_T}{\hat{\beta}_s} - (\bar{X}_s - \bar{X}_T) \qquad (9)$$
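Equations (7) and (9) translate directly into code. The sketch below is illustrative only: the function name `parallel_line_log_potency` is an assumption, and the data are the coded log-doses and responses of the oestrone example (Example 3) that follows. Because the standard and test doses in that example are coded with different origins, the value returned here is in coded units and still has to be converted back through the coding transformations, as the worked solution does.

```python
def parallel_line_log_potency(x_s, y_s, x_t, y_t):
    """Pooled slope (equation 7) and log R (equation 9) for a parallel-line assay."""
    def avg(v):
        return sum(v) / len(v)

    xb_s, yb_s, xb_t, yb_t = avg(x_s), avg(y_s), avg(x_t), avg(y_t)
    s_xy = (sum((x - xb_s) * (y - yb_s) for x, y in zip(x_s, y_s))
            + sum((x - xb_t) * (y - yb_t) for x, y in zip(x_t, y_t)))
    s_xx = (sum((x - xb_s) ** 2 for x in x_s)
            + sum((x - xb_t) ** 2 for x in x_t))
    slope = s_xy / s_xx
    log_r = (yb_s - yb_t) / slope - (xb_s - xb_t)
    return slope, log_r


# Coded doses and responses from Example 3 (four litters per dose)
x_s = [-2] * 4 + [0] * 4 + [2] * 4
y_s = [50, 40, 60, 50, 80, 80, 100, 60, 80, 100, 110, 70]
x_t = [-3] * 4 + [-1] * 4 + [1] * 4 + [3] * 4
y_t = [60, 70, 50, 60, 90, 40, 90, 100, 120, 60, 100, 120, 130, 80, 120, 150]

slope, log_r = parallel_line_log_potency(x_s, y_s, x_t, y_t)
print(slope, round(log_r, 4))
```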

In-Text Question

In _____________ its relative potency can be determined in comparison with that of a

standard preparation.

In-Text Answer

Direct assay

Example 3

In an experiment to determine the concentration of oestrone, the percentage gains in weight of rats are recorded as responses to the standard and test doses a fixed number of days after treatment. Suppose 4 litters of 7 animals each are considered, whereby three animals in each litter are used for the standard preparation, with doses 0.2 mg, 0.4 mg and 0.8 mg respectively, and four animals are used for the test preparation, with doses 0.0075 cc, 0.015 cc, 0.03 cc and 0.06 cc respectively. The table below gives the responses, obtained as percentage weight gains:

                               Daily Dose
              Standard Oestrone               Test Oestrone
Litter      0.2 mg   0.4 mg   0.8 mg    0.0075 cc   0.015 cc   0.03 cc   0.06 cc
I           50       80       80        60          90         120       130
II          40       80       100       70          40         60        80
III         60       100      110       50          90         100       120
IV          50       60       70        60          100        120       150
Averages    50       80       90        60          80         100       120

Solution

The doses can be standardized so as to simplify graphing and computation. The standard dose $Z_s$ can be transformed to $X_s$ by dividing by 0.4 and taking the log to base 2:

$$X_s = 2\left(\log_2 Z_s - \log_2 0.4\right)$$

giving the coded values −2, 0 and 2.

Similarly,

$$X_T = 2\left[\log_2 Z_T - \tfrac{1}{2}\left(\log_2 0.015 + \log_2 0.03\right)\right]$$

transforms the test doses to −3, −1, 1 and 3. We then obtain the regression equations for the standard and test preparations in coded units, using equations (5), (6) and (7), as follows:

$$Y_s = 73 + 10X_s$$

$$Y_T = 90 + 10X_T$$

Using the transformations, we have

$$Y_s = 73 - 20\log_2 0.4 + 20\log_2 Z_s$$

and

$$Y_T = 90 - 10\left(\log_2 0.015 + \log_2 0.03\right) + 20\log_2 Z_T$$

Using equation (8), the estimate of the relative potency is obtained from:

$$\log_2 R = \frac{1}{20}\left[73 - 20\log_2 0.4 - 90 + 10\log_2 0.015 + 10\log_2 0.03\right]$$

or

$$\log_2 R = \tfrac{1}{2}\log_2(0.015 \times 0.03) - \log_2 0.4 - \frac{17}{20} = \log_2\frac{\sqrt{0.00045}}{0.4} - 0.85 = \log_2 0.05303 - \log_2 1.8025$$

so that

$$R = 0.029$$

i.e. under the convention $Z_T = P Z_s$ used above, about 0.03 cc of the test preparation produces the same response as 1 mg of the standard oestrone.

The diagram below shows the regression lines with coded values.

Slope-Ratio Assays

When the dose-response relationship does not lead to parallel lines, we must use other models. We can assume then that the lines have the same intercept on the y-axis but different slopes. An assay that uses the ratio of slopes is called a slope-ratio assay. In this case, we can estimate the relative potency of the test preparation with the help of the ratio of the slopes of the lines. For the standard preparation, we have:

$$Y_s = \alpha_s + \beta_s X_s \qquad (1)$$

For the test preparation, we have

$$Y_T = \alpha_s + \beta_T X_T = \alpha_s + \beta_T (P \times X_s) \qquad (2)$$

where P is the potency of the test preparation, i.e. a test dose $X_T = P X_s$ gives the same response as the standard dose $X_s$.

Equations (1) and (2) give $P\beta_T = \beta_s$, or $P = \dfrac{\beta_s}{\beta_T}$.

The estimate of the potency can be made in terms of the ratio of the estimated slopes of the lines.

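Figure: Parallel-Line Assay (regression lines for the standard and test preparations; coded dose from −3 to 3 on the horizontal axis, response from 0 to 150 on the vertical axis)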

$$R = \frac{\hat{\beta}_s}{\hat{\beta}_T}$$

Let $N_1$ be the number of observations from the standard preparation and $N_2$ be the number of observations from the test preparation. The estimates of $\beta_s$, $\beta_T$ and $\alpha_s$, by pooling both samples, are

$$\hat{\beta}_s = \frac{\sum(X_{si} - \bar{X}_s)(Y_{si} - \bar{Y}_s)}{\sum(X_{si} - \bar{X}_s)^2} \qquad (3)$$

$$\hat{\beta}_T = \frac{\sum(X_{Ti} - \bar{X}_T)(Y_{Ti} - \bar{Y}_T)}{\sum(X_{Ti} - \bar{X}_T)^2} \qquad (4)$$

and

$$\hat{\alpha}_s = \frac{N_1\left(\bar{Y}_s - \hat{\beta}_s\bar{X}_s\right) + N_2\left(\bar{Y}_T - \hat{\beta}_T\bar{X}_T\right)}{N_1 + N_2} \qquad (5)$$
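A minimal sketch of estimates (3) to (5) is given below. It is an illustration, not the manual's own procedure: the function names are assumptions, the data are hypothetical values chosen so that both lines share the intercept 2, and the potency estimate follows the convention R = (standard slope) / (test slope) stated above.

```python
def least_squares_slope(x, y):
    """Slope of the least-squares line of y on x, as in equations (3) and (4)."""
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    s_xx = sum((xi - xb) ** 2 for xi in x)
    return s_xy / s_xx


def slope_ratio_assay(x_s, y_s, x_t, y_t):
    """Return (common intercept, standard slope, test slope, R = b_s / b_t)."""
    b_s = least_squares_slope(x_s, y_s)
    b_t = least_squares_slope(x_t, y_t)
    n1, n2 = len(x_s), len(x_t)
    mean = lambda v: sum(v) / len(v)
    # Pooled intercept, equation (5)
    a = (n1 * (mean(y_s) - b_s * mean(x_s))
         + n2 * (mean(y_t) - b_t * mean(x_t))) / (n1 + n2)
    return a, b_s, b_t, b_s / b_t


# Hypothetical data: standard follows y = 2 + 3x, test follows y = 2 + 6x
x_s, y_s = [1, 2, 3, 4], [5, 8, 11, 14]
x_t, y_t = [1, 2, 3, 4], [8, 14, 20, 26]

a_hat, b_s, b_t, r_hat = slope_ratio_assay(x_s, y_s, x_t, y_t)
print(a_hat, b_s, b_t, r_hat)
```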

6.2.3 Definitions of some Terms

The following are the definitions of some terms:

Median Lethal Dose

This is the median of the response (tolerance) distribution when the response is death. It is the dose that is expected to kill half of the population. It is denoted by LD50.

Median Effective Dose

It is the dose producing a response in 50% of trials, i.e. the median of the response distribution when the response is something other than death. It is denoted by ED50.

Minimal Lethal Dose/Minimal Effective Dose

It is the dose of a given poison which is only just sufficient to kill all the subjects of a given species, such that a slightly smaller dose will not kill any of the subjects. It can be obtained only by observation.

ED90

It refers to the dose that produces a response in 90% of trials or subjects.

Method of Estimating ED50

1. The Spearman-Kärber method

2. Probit analysis

For the scope of this course, we shall be restricted to the first method.

Spearman-Kärber Method

Suppose that doses of a preparation having logarithms $x_1, x_2, \ldots, x_k$ are tested and that $r_i$ of the $n_i$ subjects at dose $x_i$ respond $(i = 1, 2, \ldots, k)$. Then

$$p_i = \frac{r_i}{n_i}$$

is an estimate of the response rate at dose $x_i$.

Suppose that $p_1 = 0$ and $p_k = 1$; then $(p_{i+1} - p_i)$ is an estimate of the proportion of the subjects whose tolerance lies between $x_i$ and $x_{i+1}$.

If a subject can tolerate up to dose d of a preparation, such that any dose greater than d will always kill it and any dose less than d will not kill it, then d is the tolerance of the subject relative to the preparation. If successive doses are fairly close together, the mean tolerance may be estimated as

$$m = \sum_{i=1}^{k-1} \frac{(p_{i+1} - p_i)(x_i + x_{i+1})}{2} \qquad (1)$$

If the doses are equally spaced, so that $x_{i+1} - x_i = d$ for all i, then the estimate of ED50 is

$$m = x_k + \frac{d}{2} - d\sum_{i=1}^{k} p_i \qquad (2)$$

where $x_k$ is the highest log-dose. If the number of subjects at each dose is also constant and equal to n, then

$$ED_{50} = m = x_k + \frac{d}{2} - \frac{d}{n}\sum_{i=1}^{k} r_i \qquad (3)$$

The variance for equation (2) is written as

$$V(m) = d^2 \sum_{i=1}^{k} \frac{p_i q_i}{n_i}, \qquad q_i = 1 - p_i$$

and the variance for equation (3) is given as

$$V(m) = \frac{d^2}{n^2(n-1)} \sum_{i=1}^{k} r_i(n - r_i)$$
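The Spearman-Kärber estimate and its variance can be sketched as follows. The function names are assumptions made for illustration. The estimate uses the general form (1), which reduces to forms (2) and (3) when the doses are equally spaced with p = 0 at the lowest dose and p = 1 at the highest; the variance uses the equal-spacing expression with p_i q_i / n_i. The data are those of Example 4 below.

```python
import math


def spearman_karber(x, n, r):
    """ED50 estimate from equation (1): sum of (p[i+1] - p[i]) * midpoint."""
    p = [ri / ni for ri, ni in zip(r, n)]
    return sum((p[i + 1] - p[i]) * (x[i] + x[i + 1]) / 2
               for i in range(len(x) - 1))


def variance_equal_spacing(d, n, r):
    """V(m) = d^2 * sum(p_i * q_i / n_i) for equally spaced log-doses."""
    return d ** 2 * sum((ri / ni) * (1 - ri / ni) / ni for ri, ni in zip(r, n))


# Data of Example 4: log-dose, number of rats, number dead
x = [4, 5, 6, 7, 8, 9, 10, 11]
n = [5, 7, 9, 6, 10, 8, 12, 15]
r = [0, 1, 2, 1, 4, 5, 8, 15]

m = spearman_karber(x, n, r)
se = math.sqrt(variance_equal_spacing(1, n, r))
lower, upper = m - 1.96 * se, m + 1.96 * se  # approximate 95% limits
print(round(m, 2), round(se, 2), round(lower, 2), round(upper, 2))
```

Working with unrounded proportions, as this sketch does, gives values that may differ in the second decimal place from a hand calculation based on proportions rounded to two places.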

Example 4

The table below gives the log-dose of a drug administered to random samples of rats of different sizes, and the number dead at each dose.

Log-dose (xi) Number of rats ni Number dead ri

4 5 0

5 7 1

6 9 2

7 6 1

8 10 4

9 8 5

10 12 8

11 15 15

Obtain an estimate of ED50 and place a 95% confidence bound on your estimate

Solution

Since the number of rats n is not constant, equation (2) is applied.

Log-dose xi    ni    ri    pi = ri/ni    qi = 1 − pi    pi qi / ni
4              5     0     0.00          1.00           0
5              7     1     0.14          0.86           0.017
6              9     2     0.22          0.78           0.019
7              6     1     0.17          0.83           0.023
8              10    4     0.40          0.60           0.024
9              8     5     0.63          0.37           0.029
10             12    8     0.67          0.33           0.019
11             15    15    1.00          0.00           0
Total                      3.23                         0.131

Here $d = x_{i+1} - x_i = 1$ and $x_k = 11$ (the highest log-dose), so

$$ED_{50} = m = x_k + \frac{d}{2} - d\sum p_i = 11 + \frac{1}{2}(1) - (1)(3.23) = 8.27$$

$$V(m) = V(ED_{50}) = d^2\sum_{i=1}^{k}\frac{p_i q_i}{n_i} = (1)^2(0.131) = 0.131$$

Standard error $= \sqrt{V(m)} = \sqrt{0.131}$

S.E. = 0.36

To place a 95% bound on ED50, we have

$$ED_{50} \pm Z_{\alpha/2}\,(S.E.) = 8.27 \pm Z_{0.025}(0.36) = 8.27 \pm 1.96(0.36) = 8.27 \pm 0.71 = (7.56,\ 8.98)$$

Example 5

The table below gives the logdose of a drug enough to kill a random sample of size 14 rats

and the number dead.

Logdose xi No of rats (n) No Dead (ri)

3 14 2

6 14 4

9 14 1

12 14 0

15 14 5

18 14 6

21 14 6

24 14 4

27 14 8

30 14 1

Obtain ED50 and place a 90% bound on your estimate.


Solution

Logdose xi n ri n-ri ri(n – ri)

3 14 2 12 24

6 14 4 10 40

9 14 1 13 13

12 14 0 14 0

15 14 5 9 45

18 14 6 8 48

21 14 6 8 48

24 14 4 10 40

27 14 8 6 48

30 14 1 13 13

Total 37 319

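where $d = x_{i+1} - x_i = 3$, $n = 14$ and $x_k = 30$ (the highest log-dose).

(i) Since n is constant, equation (3) is applied:

$$ED_{50} = m = x_k + \frac{d}{2} - \frac{d}{n}\sum r_i = 30 + \frac{3}{2} - \frac{3}{14}(37) = 31.5 - 7.93 = 23.57$$

(ii) To place a bound on ED50:

$$V(m) = \frac{d^2}{n^2(n-1)}\sum r_i(n - r_i) = \frac{3^2}{14^2(14 - 1)} \times 319 = \frac{2871}{2548} = 1.1268$$

Therefore

$$S.E. = \sqrt{V(m)} = \sqrt{1.1268} = 1.06$$

and the 90% bound is

$$ED_{50} \pm Z_{\alpha/2}\,(S.E.) = 23.57 \pm Z_{0.05}(1.06) = 23.57 \pm 1.645(1.06) = 23.57 \pm 1.74 = (21.83,\ 25.31)$$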

Example 6

Log dose xi Number of rats ni Number of dead ri

1.0 5 0

1.5 7 1

1.8 9 2

2.1 6 1

2.7 10 4

2.8 8 5

2.9 12 8

3.1 15 15

Obtain an estimate of ED50

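Solution

Since the log-doses are not equally spaced, equation (1) is applied.

Log dose xi    Number of rats ni    ri    pi = ri/ni
1.0            5                    0     0.00
1.5            7                    1     0.14
1.8            9                    2     0.22
2.1            6                    1     0.17
2.7            10                   4     0.40
2.8            8                    5     0.63
2.9            12                   8     0.67
3.1            15                   15    1.00

Interval (xi to xi+1)    pi+1 − pi    xi + xi+1    (pi+1 − pi)(xi + xi+1)
1.0 to 1.5               0.14         2.5          0.35
1.5 to 1.8               0.08         3.3          0.26
1.8 to 2.1               −0.05        3.9          −0.20
2.1 to 2.7               0.23         4.8          1.10
2.7 to 2.8               0.23         5.5          1.27
2.8 to 2.9               0.04         5.7          0.23
2.9 to 3.1               0.33         6.0          1.98
Total                                              4.99

$$m = \sum_{i=1}^{k-1}\frac{(p_{i+1} - p_i)(x_i + x_{i+1})}{2} = \frac{4.99}{2} \approx 2.5$$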



In this study, you have learnt about:

1. The basic components of Bioassay;

2. The type and nature of Bioassay.









a. Find the point estimate of the relative potency of the test preparation.

b. Give a 90% confidence interval for the relative potency.

References

Dobson A.J. (2002, 2nd Ed): An Introduction to Generalised Linear Models, published by

Chapman and Hall.

McCullagh P. & Nelder J.A. (1989, 2nd Ed): Generalised Linear Models, published by

Chapman and Hall.

Montgomery D.C., Peck E.A. & Vinning G.G. (2001, 3rd Ed): An Introduction to Linear

Regression Analysis, published by Wiley.

Myers R.H. (1992, 2nd Ed): Classical and Modern Regression with Applications, published by

PWS/Kent.

Thomas, H. Wonnacott, Ronald J. Wonnacott (1977, 2nd Ed): Introduction Statistics for

Business and Economics.



Study Session 7: Life Table

Introduction

A very long time ago, before the development of modern probability and statistics, people

were concerned with the length of life and constructed tables to measure longevity.

Traditionally, the life table was used primarily in the actuarial science for the analysis of life

contingencies and in demography to study population changes.

Today, life table methodology is being applied to vital statistics summarizing the mortality

experience of the current population. In this study you will learn about the uses and types of

life table including their functions.



7.1 Give the definition, uses and types of life table;

7.2 State the functions of a life table column headings and construct a complete and an

abridged life table; and

7.3 Solve various questions on life tables.

7.1 What is a Life Table?

Life table is a life history of a hypothetical group or cohort of people as it diminishes

gradually by death. It gives the mortality experience of a given population in a standard form.

It is usually used to study the mortality experience of selected cohort. A cohort is a group of

individuals followed over time for a specific purpose.

Life tables provide a description of the most prominent aspect of the state of human

mortality. Essentially, they illuminate and summarize the mortality experience of a

population. The life table is a valuable analytical tool for demographers, epidemiologists,

biologists, and research workers in other fields.


7.1.1 Assumptions of Life Table

The record of a life table begins at the birth of each member and continues until every member has died. The cohort loses a pre-determined proportion at each age, and this represents a situation that is artificially defined. This is done by means of a few stated assumptions, which may be described as follows:

• The cohort originates from an arbitrary number of births, always set as a round figure such as 1,000, 10,000 or 100,000, called the radix.

• The cohort is closed against migration.

• People die at each age group according to a schedule that is fixed initially and does not change.

• At each age group, except the first few years of life, deaths are evenly distributed between one age group and the next.

• The cohort normally contains members of only one sex.
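The way a closed cohort diminishes from its radix under a fixed schedule can be sketched in a few lines. This is an illustration only: the function name `survivors` and the probabilities of dying used below are hypothetical and are not taken from any real population.

```python
def survivors(radix, death_probabilities):
    """Number still alive at the start of each age interval.

    Starts from the radix and removes a fixed proportion at each age;
    the cohort is closed, so nobody enters after birth.
    """
    alive = [float(radix)]
    for q in death_probabilities:
        alive.append(alive[-1] * (1 - q))
    return alive


# Hypothetical schedule: 10% die in the first interval, 50% of the
# remainder in the second, and everyone left in the third.
cohort = survivors(1000, [0.10, 0.50, 1.00])
print(cohort)
```

The last probability is 1 because the record continues until every member of the cohort has died.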

7.1.2 Usefulness of Life Tables

Life tables can be specifically applied to the following aspects of population analysis:

• Life table models or references can be used for the graduation, adjustment and extension of limited and defective data.

• They are used in the study of replacement, reproduction, birth intervals, marriage and contraceptive use.

• They are used to study population growth and to make population estimates.

• They are used to summarize age-specific risks of death in a population, by providing the chances of dying and surviving as a function of age, and to compare these risks in different population groups.

• They are also used by insurance companies to determine the amount to be paid as premium on life insurance.

Life tables developed about four hundred years ago give the mortality experience of a given

population in a standard form.

7.1.3 Types of Life Table

There are basically two types of life table in use: the conventional (or periodic) life table or cross-sectional life table and the generation (or cohort) life table.

One conceptual approach to understanding a life table is to imagine a group of persons all born at exactly the same moment. Some persons in this imaginary birth cohort will die before reaching their first birthday; others will die during their second year, and so on. Although some members of the cohort will attain very advanced ages, all will eventually die.

Figure 7.1: Types of Life Table (the generation life table and the conventional life table; complete life table and abridged life table)

A. The Conventional Life Table

The conventional or periodic life table gives a cross-sectional view of the mortality and survival experience of a population during a current year (e.g. the Oyo State population of 2006). It is entirely dependent on the age-specific death rates prevailing in the year for which it is constructed. Such tables project the life span of each individual in a hypothetical cohort on the basis of the actual death rates in a given population.

The conventional life table is the most effective means of summarizing the mortality and survival experience of a population.

B. The Generation Life Table

The generation (or cohort) approach has been basically explained by the example given by the conceptual approach (cohort study), with the modification to take account of practical situations. In practice, the cohort or generation life table is usually constructed for groups still living and stops at the present age. The generation life table requires data covering the entire life span of persons making up the cohort, which stands as a serious limitation to its preparation.

The generation life table has, however, wide application in medical research where it can be

used to estimate the expectation of life after an operation for certain types of diseases, where

follow-up is necessary. Also, cohort life tables have practical applications in the study of

animal populations and have been extended to assess the durability of inanimate objects such

as engines, electric light bulbs, and other mechanical objects.

Merits and Demerits of the Two Methods

The merits and demerits of the two types of life table hinge on which is more appropriate for a specific purpose.

The current (or periodic or conventional) life table presents a picture of the mortality at one

moment of time. It combines the mortality experience of different people, and no single

cohort has actually experienced or will experience this particular pattern, which is in a way

fictitious, but the value of the fiction lies in the graphic and easily comprehensible way it

summarizes mortality conditions.

The cohort (or generation) life table presents a historical record of what has actually occurred

and presents a realistic mortality pattern for one cohort only. The numbers involved are

usually small, and the procedure takes a long time to complete. It is tedious and cumbersome.

For most demographic research, conventional or current life tables are used, and our main concern is to study how to construct such tables and how to use their various columns.

Construction of Life Tables

There are two major types of life table which can be constructed:

1. The complete life table

2. The abridged life table

C. Complete Life Table

A complete life table is a table in which the mortality experience is considered in single years

of age throughout the life span. All the columns of the life table are given for single years of age, so the table is extremely detailed.

Limitations

1. It needs very extensive and detailed data, which are not available in many countries.

2. Since the construction is carried out in single years of age, one also needs large numbers of events, say deaths, at each age; otherwise the estimates will be subject to systematic and random fluctuations.

D. Abridged Life Table

An abridged life table is one in which the measures are given not for every single year of age

but for age groups. The values are also in general more approximate, i.e. elaborate adjustments

are not made, although they are accurate enough for most practical purposes. It contains

columns similar to those of the complete life table. The only difference is the length of

intervals.

In-Text Question

A life table is a life history of a hypothetical group or cohort of people as it diminishes

gradually by death. True/False?

In-Text Answer

True

7.2 Relevance of Rates in Life Table Analysis

The following rates are relevant to life table analysis:

1. Crude Death Rate

2. Age-Specific Death Rate

7.2.1 Crude Death Rate

This is the number of deaths expressed as a fraction or percentage of the total population at a

given point in time.

$$\mathrm{CDR} = \frac{D}{P} \times K$$

where D = number of deaths during the period,

P = number of people present in the community over the specified period,

K = an appropriate constant, usually 100,000 for a human population.

Due to migration, births and deaths, the total number of persons in a given year may not be

known precisely. In this case, the mid-year population count could be used as the population

for the year.

Example

A community has 53,280 people and there were 419 deaths in the community during 1999. The crude death rate (CDR) for 1999 is

$$\mathrm{CDR}_{1999} = \frac{419 \times 100{,}000}{53{,}280} \approx 786 \text{ deaths per } 100{,}000 \text{ population}$$
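The crude death rate calculation above can be checked with a short Python sketch. The function name `crude_death_rate` and its parameter names are illustrative assumptions, not part of the course text; the default constant follows the K = 100,000 convention used in the definition.

```python
def crude_death_rate(deaths, population, k=100_000):
    """Return the crude death rate per k persons: CDR = D / P * K."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * k


# Worked example from the text: 419 deaths in a community of 53,280 people.
cdr_1999 = crude_death_rate(419, 53_280)
print(round(cdr_1999))  # deaths per 100,000 population
```

When the exact population is unknown, the mid-year population count can be passed as `population`, as the text suggests.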

7.2.2 Age-Specific Death Rate

We sometimes need death rates for a specific segment of the population, such as a certain age group, a certain disease or occupation, or a specific sex. The death rate is then obtained for that particular group by dividing the number of deaths in the group by the population at risk.

The population at risk is the total population of the group we are concerned with. Suppose that during a year there are Dx deaths in the age group (x, x+1), and that the total number of persons in the same age group is Px. Then the age-specific death rate is given by

$$\mathrm{ASDR}_x = \frac{D_x}{P_x} \times 100{,}000$$

Example

Suppose the number of people in the age group 35-39 years is 30,000, and there were 115 deaths from heart disease in that group. Then the disease- and age-specific death rate for persons aged 35-39 years is

$$\frac{115 \times 100{,}000}{30{,}000} \approx 383 \text{ deaths per } 100{,}000 \text{ persons}$$
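The same calculation can be applied to several groups at once. The sketch below is a hedged illustration: the function name `age_specific_death_rates` and the dictionary layout are assumptions made for this example, and only the 35-39 group comes from the text.

```python
def age_specific_death_rates(groups, k=100_000):
    """Map each group label to its death rate per k persons at risk.

    `groups` maps a label to a (deaths, population_at_risk) pair.
    """
    return {
        label: deaths / at_risk * k
        for label, (deaths, at_risk) in groups.items()
    }


# Only this group is taken from the worked example; others would be added in the same form.
groups = {"35-39": (115, 30_000)}
rates = age_specific_death_rates(groups)
print({label: round(rate) for label, rate in rates.items()})
```

Keeping deaths and population at risk paired per group makes it harder to divide by the wrong denominator, which is the main point of the "population at risk" definition.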

Functions of the Life Table Columns

Generation and conventional life tables are identical in appearance but different in construction. The following discussion refers to the complete life table.

Let tPx denote the population in the age interval x to (x + t) years, and tDx the number of deaths in that interval. The following column headings are used in a standard life table:

(i) Age Interval: (x, x + t)

In the complete life table, t = 1, but in the abridged life table, t = 5 except for the first year and the next four years. So, the intervals for an abridged life table are (0-1), (1-5), (5-10), (10-15) and so on.

(ii) Age-Specific Death Rate: (tmx)

This is defined as the number of deaths in a given year between exact ages x and x+t divided by the mid-year population of that age group in an actual population. It is a central rate and forms the basic data from which life table analysis begins.

$${}_tm_x = \frac{{}_tD_x}{{}_tP_x}$$

where tDx = deaths in the interval x to x+t

tPx = mid-year population in the interval x to x+t.

(iii) Survivors of a Cohort of Live-Born Babies to Exact Age x: (lx)

For all real populations, this column is a finite, non-negative, monotonically non-increasing function of x over the age range of individuals. The radix l0 can conveniently be taken as 100, 1000, 10,000 or 100,000. For example, if lx is linear and the ultimate age (the age by which everyone has died) is 108, then l108 = 0. If we assume that l0 = 108, then lx = 108 - x.

(iv) Deaths Experienced by the Life Table Cohort within the Interval x to x+t: (tdx)

This is obtained by taking the first difference of the lx column, starting with the radix. Thus, tdx may be interpreted as the number of persons who die during the t-year period after reaching exact age x. The length of the period t is equal to the interval between the birthdays.

For instance, 5d1 refers to the number of persons who reach their first birthday but die before reaching their sixth birthday.

$${}_td_x = l_x - l_{x+t}$$

For t = 1,

$$d_x = l_x - l_{x+1}$$

(v) Probability of a Person Aged x Dying within the Interval x to x+t: (tqx)

This is the conditional probability that an individual dies in the age interval (x, x+t), given that the person has reached age x. It is derived from the corresponding age-specific death rates of the current population.

$${}_tq_x = \frac{{}_td_x}{l_x} = \frac{l_x - l_{x+t}}{l_x} = 1 - \frac{l_{x+t}}{l_x}$$

For t = 1,

$$q_x = \frac{d_x}{l_x} = \frac{l_x - l_{x+1}}{l_x} = 1 - \frac{l_{x+1}}{l_x}$$

Also,

$${}_tq_x = 1 - {}_tP_x$$

where tPx = conditional probability of surviving the interval x to (x+t) years.

Note:

This tqx differs from the ordinary age-specific death rate, tmx, in that tqx is based on the life table population at the beginning of the age interval x to x + t to which it refers, whereas tmx is a central rate based on the actual population at the mid-point of that period.

(vi) Probability of Surviving: (tPx)

The function tPx defines the probability that a person aged x survives through the interval x to x + t years. It is the complement of tqx, that is,

$${}_tP_x + {}_tq_x = 1, \qquad {}_tP_x = 1 - {}_tq_x = \frac{l_{x+t}}{l_x}$$

For t = 1,

$$P_x = \frac{l_{x+1}}{l_x}$$

(vii) Life Table Central Death Rate: (tMx)

It is defined as the number of deaths in the life table (tdx) divided by the number of persons (tLx) in the stationary population of the life table between ages x and x + t:

$${}_tM_x = \frac{{}_td_x}{{}_tL_x} = \frac{l_x - l_{x+t}}{{}_tL_x}$$

(viii) Number of Years Lived by the Total Cohort: (tLx)

The function tLx shows the number of person-years lived by the cohort during the interval between specified birthdays, that is, between ages x and x + t years.

$${}_tL_x = \frac{t}{2}\left(l_x + l_{x+t}\right)$$

on the assumption that those who die during a t-year interval live, on the average, t/2 years of it.

tLx can also be estimated as

$${}_tL_x = t\,l_x - \frac{t}{2}\,{}_td_x$$

(ix) Total Number of Years Lived Beyond Age x: (Tx)

The function Tx is derived directly from the tLx column. This total is essential for computation of the life expectancy. It is equal to the sum of the number of years lived in each age interval beginning with age x, or

$$T_x = L_x + L_{x+1} + \cdots + L_w = \sum_{y=x}^{w} L_y$$

where x = 0, 1, 2, ..., w and w = the highest age attainable.

In general,

$${}_tL_x = T_x - T_{x+t}, \qquad T_x = T_{x+t} + {}_tL_x$$

(x) Life Expectancy: (ex)

This is the expectation of life at age x. It is the number of years, on the average, yet to be lived by a person of age x. The function is derived from the lx and Tx columns by the relationship

$$e_x = \frac{T_x}{l_x}$$

This column is the most important in the life table, since each ex summarizes the mortality experience of persons beyond age x in the population under consideration.

Since in a stationary population the number of births in a year is equal to the number of deaths in the year, then at x = 0

$$\frac{1}{e_0} = \frac{l_0}{T_0} = \frac{\text{Total number of births or deaths}}{\text{Total population}}$$

This is the birth rate or the death rate in the stationary population.
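The column definitions (iv) to (x) can be tied together in a short Python sketch. It is only an illustration under stated assumptions: the function name `life_table_from_lx` is hypothetical, t = 1 throughout, and the input is the linear survivorship lx = 108 - x mentioned under (iii) rather than real data.

```python
def life_table_from_lx(lx):
    """Build complete life table columns (t = 1) from survivors l_x.

    lx[x] is the number surviving to exact age x; the last entry must be 0,
    i.e. the ultimate age by which everyone has died.
    """
    n = len(lx) - 1                                      # number of one-year intervals
    dx = [lx[x] - lx[x + 1] for x in range(n)]           # deaths in (x, x+1)
    qx = [dx[x] / lx[x] for x in range(n)]               # probability of dying
    px = [1 - q for q in qx]                             # probability of surviving
    Lx = [(lx[x] + lx[x + 1]) / 2 for x in range(n)]     # person-years, t/2 assumption
    # T_x is the sum of L_y for y >= x, accumulated from the oldest age downwards.
    Tx = [0.0] * n
    running = 0.0
    for x in reversed(range(n)):
        running += Lx[x]
        Tx[x] = running
    ex = [Tx[x] / lx[x] for x in range(n)]               # expectation of life
    return {"dx": dx, "qx": qx, "px": px, "Lx": Lx, "Tx": Tx, "ex": ex}


# Linear survivorship from (iii): l_0 = 108, ..., l_108 = 0.
lx = [108 - x for x in range(109)]
table = life_table_from_lx(lx)
print(table["Tx"][0], table["ex"][0], table["ex"][20])
```

With linear lx every interval has one death, so the expectation of life at age x is simply half the remaining span, which gives an easy hand check of the Tx and ex columns.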

Remark: The proportion of those living at age x who will survive to age y is

$$\hat{P}_{xy} = \hat{P}_x\,\hat{P}_{x+1} \cdots \hat{P}_{y-1} = \frac{l_y}{l_x}$$

Example: From the complete life table for the California population in Table 1, the expected number of years yet to be lived by a person aged 20 is

$$\hat{e}_{20} = 54.01$$

That is, the person is expected to die at age

$$20 + \hat{e}_{20} = 20 + 54.01 = 74.01 \text{ years.}$$

Example of a complete life table:

The following table is the complete life table for the total population of California, U.S.A., 1970, for persons aged 0 to 23 years.

Table 1. Complete life table for the total California population, U.S.A., 1970, for persons aged 0 to 23 years

x to x+1   qx       lx       dx     Lx      Tx        ex
0-1        .01801   100000   1801   98361   7190390   71.90
1-2        .00113   98199    111    98136   7092029   72.22
2-3        .00086   98088    84     98042   6993893   71.30
3-4        .00073   98004    72     97966   6895851   70.36
4-5        .00052   97932    51     97906   6797885   69.41
5-6        .00049   97881    48     97857   6699979   68.45
6-7        .00045   97833    44     97811   6602122   67.48
7-8        .00034   97789    33     97772   6504311   66.51
8-9        .00031   97756    30     97741   6406539   65.54
9-10       .00030   97726    29     97711   6308798   64.56
10-11      .00031   97697    30     97682   6211087   63.58
11-12      .00033   97667    32     97651   6113405   62.59
12-13      .00035   97635    34     97618   6015754   61.61
13-14      .00041   97601    40     97581   5918136   60.64
14-15      .00048   97561    47     97538   5820555   59.66
15-16      .00062   97514    60     97484   5723017   58.69
16-17      .00093   97454    91     97408   5625533   57.73
17-18      .00105   97363    102    97312   5528125   56.78
18-19      .00142   97261    138    97192   5430813   55.84
19-20      .00166   97123    161    97043   5333621   54.92
20-21      .00162   96962    157    96884   5236578   54.01
21-22      .00161   96805    156    96727   5139694   53.09
22-23      .00156   96649    151    96574   5042967   52.18

Consider the following numerical examples taken from Table 1:

a. d0 = l0 × q0 = (100,000) × (0.01801) = 1,801

b. d1 = l1 × q1 = (98,199) × (0.00113) = 111

c. l1 = l0 – d0 = 100,000 – 1,801 = 98,199

d. l2 = l1 – d1 = 98,199 – 111 = 98,088

e. $P_0 = \frac{l_1}{l_0} = \frac{98{,}199}{100{,}000} = 0.98199$

f. $q_0 = 1 - P_0 = 1 - 0.98199 = 0.01801$, or $q_0 = \frac{d_0}{l_0} = \frac{1{,}801}{100{,}000} = 0.01801$

g. What is the probability that a person aged 15 will die before reaching age 19?

From Table 1, we obtain the following solution:

$${}_4q_{15} = 1 - \frac{l_{19}}{l_{15}} = 1 - \frac{97{,}123}{97{,}514} = 1 - 0.99599 = 0.00401$$

h. What is the probability that a person aged 10 will survive to exact age 15?

Solution

$${}_tP_x = \frac{l_{x+t}}{l_x}, \qquad {}_5P_{10} = \frac{l_{15}}{l_{10}} = \frac{97{,}514}{97{,}697} = 0.99813$$

The mortality prospects of an individual at different ages are independent. Therefore, the probability that a person aged 10 will die between the ages 15 and 19 is as follows:

$$\frac{l_{15}}{l_{10}} \times \left(1 - \frac{l_{19}}{l_{15}}\right) = \frac{l_{15} - l_{19}}{l_{10}} = \frac{97{,}514 - 97{,}123}{97{,}697} = 0.00400$$

From Table 1 above, we can also evaluate probabilities using fractional ages. Consider the problem of evaluating 1/3P10, the probability that a person aged 10 will survive at least four months.

By definition,

$${}_{1/3}P_{10} = \frac{l_{10} - \frac{1}{3}\left(l_{10} - l_{11}\right)}{l_{10}} = \frac{97{,}697 - \frac{1}{3}\left(97{,}697 - 97{,}667\right)}{97{,}697} = 0.999898$$
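The probability examples above all reduce to ratios of lx values, which a few lines of Python can confirm. This is a sketch rather than a prescribed method: the helper name `survive` and the variable names are assumptions, and the four lx values are copied from Table 1.

```python
# Survivors l_x at selected exact ages, copied from Table 1.
lx = {10: 97_697, 11: 97_667, 15: 97_514, 19: 97_123}


def survive(lx, x, y):
    """Probability that a person aged x survives to exact age y: l_y / l_x."""
    return lx[y] / lx[x]


# (g) aged 15, dies before reaching 19
q_15_19 = 1 - survive(lx, 15, 19)
# (h) aged 10, survives to exact age 15
p_10_15 = survive(lx, 10, 15)
# aged 10, dies between ages 15 and 19
die_15_19_from_10 = (lx[15] - lx[19]) / lx[10]
# aged 10, survives four months; deaths assumed evenly spread over the year of age
p_10_third = (lx[10] - (lx[10] - lx[11]) / 3) / lx[10]

print(round(q_15_19, 5), round(p_10_15, 5),
      round(die_15_19_from_10, 5), round(p_10_third, 6))
```

The fractional-age line makes explicit the assumption behind the worked example: one third of the year's deaths are taken to occur in the first four months.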

Treatment of the under 5 or 0-4 Age Groups

For the estimation of the tLx column for the under 1 (zero) age group, we have L0 = 0.3l0 + 0.7l1

For the 1-4 age group, we have 4L1 = 1.3l1 + 2.7l5

For open-ended intervals, say at age 75+, the following approximate formulae hold:

∞dx = lx

Tx = ∞Lx

∞L75 = l75 log10 l75
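As a hedged check of the under-5 treatment, the sketch below assumes the separation formulas L0 = 0.3 l0 + 0.7 l1 and 4L1 = 1.3 l1 + 2.7 l5 and applies them to the lx values of Table 2 (l0 = 10,000, l1 = 8,368, l5 = 6,811). The function name `person_years_under_5` is illustrative only.

```python
def person_years_under_5(l0, l1, l5):
    """Person-years lived in the intervals 0-1 and 1-5.

    Assumes the weights 0.3/0.7 for the first year of life and
    1.3/2.7 for ages 1 to 5, as stated in the lead-in.
    """
    L0 = 0.3 * l0 + 0.7 * l1
    L1_to_5 = 1.3 * l1 + 2.7 * l5
    return L0, L1_to_5


# l_x values taken from the abridged life table for males in Nigeria (Table 2).
L0, L1_to_5 = person_years_under_5(10_000, 8_368, 6_811)
print(round(L0), round(L1_to_5))
```

Both results agree with the first two nLx entries of Table 2, which is why these weights are used here instead of the simple t/2 rule: deaths under age 5 are concentrated early in each interval.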

Abridged Life Table

Since complete life tables are large, they are usually abridged.

The standard table has five-year intervals, except for the first two. Since many deaths occur in the first year of life, the first interval is one year, that is (0-1), and the second is from ages 1-5 years.

The columns of the abridged life table are the same as the columns of the complete life table. Abridged life tables are used extensively for specific purposes in clinical or animal studies. They use the interval (x, x + n) instead of (x, x + t).

The following table gives an example of an abridged life table.

Table 2. An abridged life table for males in Nigeria

Columns:
(1) x to x+n: age interval
(2) nMx: central age-specific death rate
(3) nqx: probability of dying in the interval x to x+n
(4) nPx: probability of surviving the interval x to x+n
(5) lx: persons surviving to exact age x
(6) ndx: number of deaths in the interval x to x+n
(7) nLx: person-years lived in the interval x to x+n
(8) Tx: person-years lived after exact age x
(9) ex: expectation of life at age x

(1)       (2)      (3)      (4)      (5)      (6)    (7)     (8)      (9)
x to x+n  nMx      nqx      nPx      lx       ndx    nLx     Tx       ex
Under 1   0.1843   0.1632   0.8368   10,000   1632   8858    369856   37.0
1-4       0.0532   0.1861   0.8139   8368     1557   29268   360998   43.1
5-9       0.0095   0.0464   0.9536   6811     316    33265   329733   48.1
10-14     0.0056   0.0276   0.9724   6495     179    32028   295705   45.5
15-19     0.0064   0.0315   0.9685   6316     199    31083   264627   41.9
20-24     0.0069   0.0339   0.9661   6117     207    30068   233539   38.2
25-29     0.0066   0.0325   0.9675   5910     192    29070   203471   34.4
30-34     0.0086   0.0421   0.9579   5718     241    27988   174401   30.5
35-39     0.0097   0.0474   0.9526   5477     260    26735   146412   26.7
40-44     0.0155   0.0747   0.9253   5217     390    25110   119678   22.9
45-49     0.0238   0.1123   0.8877   4827     542    22780   94568    19.6
50-54     0.0261   0.1225   0.8775   4285     525    20113   71788    16.8
55-59     0.0311   0.1443   0.8557   3760     543    17443   51675    13.7
60-64     0.0477   0.2131   0.7869   3217     686    14370   34232    10.6
65-69     0.0631   0.2725   0.7275   2531     690    10930   19862    7.8
70-74     0.0992   0.3974   0.6025   1841     732    13751   8932     4.9
75+       0.1299   1.0000   0.0000   1109     1109   3372    2372     3.0
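The lx and ndx columns of an abridged life table follow from the radix and the nqx column alone. The Python sketch below illustrates this with the first five nqx values of Table 2; the function name `survivors_from_q` is an assumption, and rounding deaths to whole persons is a choice made here to match the printed table.

```python
def survivors_from_q(radix, q_values):
    """Build the l_x and d_x columns from a radix and successive n_q_x values.

    Uses d = l * q and l_next = l - d, with deaths rounded to whole persons.
    """
    lx = [radix]
    dx = []
    for q in q_values:
        deaths = round(lx[-1] * q)
        dx.append(deaths)
        lx.append(lx[-1] - deaths)
    return lx, dx


# nqx for the intervals under 1, 1-4, 5-9, 10-14 and 15-19, taken from Table 2.
q_values = [0.1632, 0.1861, 0.0464, 0.0276, 0.0315]
lx, dx = survivors_from_q(10_000, q_values)
print(lx)
print(dx)
```

The last entry of `lx` is the number surviving to exact age 20, the start of the next interval.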


In this study session, you have learnt about:

1) Definition of Life Table

• Assumptions of Life Table

• Usefulness of Life Tables

• Types of Life Table

2) Relevance of Rates in Life Table Analysis

• Age-Specific Death Rate

• Crude Death Rate







Write short notes on the following:

i. The Generation Life Table

ii. The Conventional Life Table

iii. Complete Life Table

iv. Abridged Life Table

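Complete the following life table:

x to x+t   qx        lx      dx     Lx      Tx        ex
40-41      ____      93703   250    93578   ____      ____
41-42      ____      ____    ____   ____    ____      ____
42-43      ____      93165   ____   ____    ____      33.71
43-44      ____      92831   ____   ____    ____      ____
44-45      0.00399   ____    369    ____    2954625   ____
45-46      ____      92123   ____   91916   ____      ____
46-47      ____      91710   464    ____    ____      ____
47-48      ____      ____    ____   ____    ____      ____
48-49      ____      90746   510    ____    ____      28.52
49-50      0.00630   ____    568    89952   ____      ____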


NOTES ON SAQS

Study Session 1

Solution

(a) Genotype frequency

i. $\hat{P}_{AA} = \frac{N_{11}}{N} = \frac{421}{1482} = 0.284$

ii. $\hat{P}_{AS} = \frac{N_{12}}{N} = \frac{735}{1482} = 0.496$

iii. $\hat{P}_{SS} = \frac{N_{22}}{N} = \frac{326}{1482} = 0.220$

(b) Gene Frequency

i. $\hat{P}(A) = \hat{P}_{AA} + \frac{1}{2}\hat{P}_{AS}$

= 0.284 + ½(0.496)

= 0.284 + 0.248

= 0.532

ii. $\hat{P}(S) = \hat{P}_{SS} + \frac{1}{2}\hat{P}_{AS}$

= 0.220 + ½(0.496)

= 0.220 + 0.248

= 0.468

To obtain their standard errors, we calculate their variances first.

iii. $V(\hat{P}(A)) = V(\hat{P}_{AA}) + V(\tfrac{1}{2}\hat{P}_{AS}) - 2\,\mathrm{Cov}(\hat{P}_{AA}, \tfrac{1}{2}\hat{P}_{AS})$

$$= \frac{\hat{P}_{AA}(1-\hat{P}_{AA})}{N} + \frac{\frac{1}{2}\hat{P}_{AS}\left(1-\frac{1}{2}\hat{P}_{AS}\right)}{N} - \frac{2\,\hat{P}_{AA}\cdot\frac{1}{2}\hat{P}_{AS}}{N}$$

$$= \frac{0.284(0.716)}{1482} + \frac{0.248(0.752)}{1482} - \frac{2(0.284)(0.248)}{1482}$$

$$= \frac{0.203344 + 0.186496 - 0.140864}{1482} = \frac{0.248976}{1482} = 0.000168$$

Hence,

$$\mathrm{S.E.}(\hat{P}(A)) = \sqrt{0.000168} = 0.01296$$

By a similar calculation, S.E.(P̂(S)) is also 0.01296.

(c) 95% Confidence Interval

(i) $P(A) = \hat{P}(A) \pm Z_{\alpha/2}\,\mathrm{S.E.}(\hat{P}(A))$

= 0.532 ± Z0.05/2 (0.01296)

= 0.532 ± Z0.025 (0.01296)

= 0.532 ± 1.96 (0.01296)

= 0.532 ± 0.0254016

= (0.507, 0.557)

That is, 0.507 < P(A) < 0.557

(ii) $P(S) = \hat{P}(S) \pm Z_{\alpha/2}\,\mathrm{S.E.}(\hat{P}(S))$

= 0.468 ± Z0.05/2 (0.01296)

= 0.468 ± 1.96 (0.01296)

= 0.468 ± 0.0254016

= (0.443, 0.493)

That is, 0.443 < P(S) < 0.493
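The Study Session 1 arithmetic can be reproduced from the genotype counts 421, 735 and 326. The sketch below is illustrative, with variable names chosen for this example. It relies on the fact that the variance expression used in the solution simplifies algebraically to p(1 - p)/N, where p is the estimated gene frequency, and it mirrors that calculation rather than proposing a different estimator.

```python
import math

# Genotype counts from the solution: AA, AS and SS out of N = 1482.
counts = {"AA": 421, "AS": 735, "SS": 326}
n_total = sum(counts.values())

genotype_freq = {g: c / n_total for g, c in counts.items()}

# Gene frequencies: homozygote frequency plus half the heterozygote frequency.
p_a = genotype_freq["AA"] + 0.5 * genotype_freq["AS"]
p_s = genotype_freq["SS"] + 0.5 * genotype_freq["AS"]

# Standard error as used in the worked solution, i.e. sqrt(p(1 - p) / N).
se = math.sqrt(p_a * (1 - p_a) / n_total)

z = 1.96  # two-sided 95% normal critical value
ci_a = (p_a - z * se, p_a + z * se)
ci_s = (p_s - z * se, p_s + z * se)

print(round(p_a, 3), round(p_s, 3), round(se, 5))
print(tuple(round(v, 3) for v in ci_a), tuple(round(v, 3) for v in ci_s))
```

Because P(S) = 1 - P(A), the two gene frequencies share the same standard error, which is why the solution quotes 0.01296 for both.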

Study Session 2

1. Find the median of the following observations: 8, 2, 6, 1, 8, 9, 3

Solution:

Arranging the observations in ascending order of magnitude gives 1, 2, 3, 6, 8, 8, 9.

The number of observations is odd, i.e. n = 7.

The middle (fourth) observation, i.e. the median, is 6.

2. Determine the geometric mean of the following set of data: 2, 4, 8, 11, 4

Solution:

$$\mathrm{G.M.} = \sqrt[5]{2 \times 4 \times 8 \times 11 \times 4} = \sqrt[5]{2816} = 4.897 \approx 4.9$$

Using the previous example, find the geometric mean.

Solution:

Time (Hour)   f    Midvalue (x)   log x    f log x
10 - 19       4    14.5           1.1614   4.6456
20 - 29       6    24.5           1.3892   8.3352
30 - 39       7    34.5           1.5378   10.7646
40 - 49       10   44.5           1.6484   16.4840
50 - 59       3    54.5           1.7364   5.2092
60 - 69       7    64.5           1.8096   12.6672
70 - 79       8    74.5           1.8722   14.9776
80 - 89       9    84.5           1.9269   17.3421
90 - 99       6    94.5           1.9754   11.8524
Total         60                           102.2779

$$\mathrm{G.M.} = \operatorname{antilog}\left[\frac{\sum f \log x}{\sum f}\right] = \operatorname{antilog}\left[\frac{102.2779}{60}\right] = \operatorname{antilog}\,[1.7046]$$

G.M. = 50.6524
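The median and both geometric means can be recomputed in a few lines. This is a sketch under stated assumptions: the function names `geometric_mean` and `grouped_geometric_mean` are illustrative, and the grouped version uses full-precision logarithms, so it agrees with the four-figure log-table answer only to about two decimal places.

```python
import math
from statistics import median

# Question 1: the median is the middle value of the sorted observations.
observations = [8, 2, 6, 1, 8, 9, 3]
med = median(observations)


def geometric_mean(values):
    """Antilog of the mean of log10(values)."""
    return 10 ** (sum(math.log10(v) for v in values) / len(values))


def grouped_geometric_mean(midpoints, freqs):
    """Antilog of sum(f * log10(x)) / sum(f) for grouped data."""
    weighted_logs = sum(f * math.log10(x) for x, f in zip(midpoints, freqs))
    return 10 ** (weighted_logs / sum(freqs))


# Question 2: ungrouped data.
gm_raw = geometric_mean([2, 4, 8, 11, 4])

# Grouped data: class midvalues 14.5, 24.5, ..., 94.5 with their frequencies.
midpoints = [14.5 + 10 * i for i in range(9)]
freqs = [4, 6, 7, 10, 3, 7, 8, 9, 6]
gm_grouped = grouped_geometric_mean(midpoints, freqs)

print(med, round(gm_raw, 3), round(gm_grouped, 2))
```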

Study Session 3

1. Nine crippled children were asked to perform a certain task, the average time required to



Solution

H0 : µ = 10

H1 : µ ≠ 10

α = 0.1

Test statistic

$$t_{cal} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{7 - 10}{2/\sqrt{9}} = \frac{-3}{2/3} = \frac{-3}{0.67} \approx -4.5$$

Conclusion

Since t_cal lies in the rejection region, we reject H0 and conclude that there is significant evidence at the 10% level that the population mean is not 10.

2. A researcher wishes to estimate the mean maximum weight at birth of 40 randomly

selected babies delivered in four teaching hospitals in a day. He is willing to assume that the

weights are approximately normally distributed with a variance of 144. He found out that the

mean weight for these babies is 84.3kg. Can the researcher conclude that the weight for the


Test statistic

$$Z_{cal} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{84.3 - 95}{12/\sqrt{40}} = \frac{-10.7}{1.90} = -5.63$$

$$Z_{tab} = Z_{\alpha} = Z_{0.01} = 2.326$$

Conclusion

Since Z_cal lies in the acceptance region, we accept H0 and conclude that the population mean weight is at most 95kg.
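Both test statistics in Study Session 3 have the same form, (sample mean - hypothesized mean) divided by the standard error. The sketch below recomputes them; the function name `one_sample_statistic` is an assumption, and the inputs (7, 10, 2, 9 for the t test; 84.3, 95, 12, 40 for the Z test) are read off the worked arithmetic above.

```python
import math


def one_sample_statistic(sample_mean, hypothesized_mean, sd, n):
    """Return (x_bar - mu) / (sd / sqrt(n)) for a one-sample test."""
    return (sample_mean - hypothesized_mean) / (sd / math.sqrt(n))


# Question 1: t statistic with sample standard deviation s = 2 and n = 9.
t_cal = one_sample_statistic(7, 10, 2, 9)

# Question 2: Z statistic with known variance 144 (sigma = 12) and n = 40.
z_cal = one_sample_statistic(84.3, 95, math.sqrt(144), 40)

print(round(t_cal, 2), round(z_cal, 2))
```

The Z value computed at full precision differs from the solution's -5.63 in the second decimal place because the solution rounds the standard error to 1.90 before dividing; the conclusion is unaffected.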

Study Session 4

When planning and executing an experiment, the following procedures should be followed in order to achieve success:

1. Problem Recognition and Definition: This is the first step in the planning and

execution of an experiment. After the problem has been recognized, it must be clearly

defined. The more we understand the problem at hand, the easier it is for us to find an

appropriate solution to it.

2. Objective of the Experiment: A clear statement of the objective is very important.

This may be in form of a hypothesis to be tested or a question which requires an

answer. If there is more than one objective, arrange them in order of priority.

3. Analysis of the Problem and Objectives: The aims of the objectives must be

carefully studied and the necessity of the experiment stated in precise terms.

4. Treatment Selection: Reasonable selection of the treatments goes a long way in

achieving the desired results.

5. Selection of the Experimental Units: This involves the selection of experimental

units from the population on which inferences are to be made. The selected or chosen

experimental units must be a true representative of the entire population under

consideration.

6. Selection of the Experimental Design: The objective of the experiment should be carefully considered here, so that a design which will give the needed result is preferred.

7. Selection of Units for Observation and the Number of Replications: This involves

the determination of the number of replications and the plot size to be chosen in order

to produce the desired result.

8. Data to be collected: Consideration of data to be collected is indispensable. No

essential data should be omitted.

9. Outlining Statistical Analysis and Summarization of Results: Give a summary of the expected results in order to show the expected effects of the treatments on the experimental units. This can be presented in a summary table or in graphs. List the sources of variation and the corresponding degrees of freedom in the analysis of variance table. Check whether the expected results of the experiment are relevant to the objective.

10. Performing the Experiment: The experiment should be conducted without any

element of personal biases. Record all observations properly and later check them

with a view to minimizing recording errors or deleting data that are obviously

erroneous.

11. Data Analysis and Interpretation of Results: Analyse all data and give the

interpretation of the results obtained from the experiment. Data should be scrutinized

and carefully analyzed so that the result will be accurate. If the data are not well analyzed or the results are not well interpreted, the conclusions may be wrong.

Study Session 5

Research

It can be defined as an organized, formally structured methodology for obtaining new knowledge. Examples are laboratory research, epidemiological research, etc.

Placebo

It is an identical-looking drug that has no active ingredients. It is an inert substance which looks and tastes like the treatment.

Experiment

It is a planned investigation, which is carried out to obtain new facts or to confirm or deny the results of previous investigations.

Randomization

It is the process of using random numbers to assign treatments to patients (subjects). Subjects should be allocated to treatment groups according to some chance mechanism.
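Randomization as defined above can be illustrated with a small Python sketch. Everything in it is hypothetical: the function name `randomize`, the treatment labels and the twelve numbered subjects are made up for the example, and shuffling followed by dealing in turn is just one simple chance mechanism that gives groups of equal size.

```python
import random


def randomize(subjects, treatments, seed=None):
    """Randomly allocate subjects to treatment groups of near-equal size."""
    rng = random.Random(seed)          # a seed makes the allocation reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)              # chance decides the order of the subjects
    allocation = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal the shuffled subjects to the treatments in turn.
        allocation[treatments[i % len(treatments)]].append(subject)
    return allocation


# Twelve hypothetical subjects allocated to a drug group and a placebo group.
allocation = randomize(range(1, 13), ["drug", "placebo"], seed=1)
for treatment, group in allocation.items():
    print(treatment, sorted(group))
```

Recording the seed alongside the allocation lets the assignment be audited later without letting the investigator choose who receives which treatment.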

Study Session 6



c. Find the point estimate of the relative potency of the test preparation.

d. Give a 90% confidence interval for the relative potency.

Solution

The relative potency is estimated by

$$R = \hat{\rho} = \frac{\bar{X}_T}{\bar{X}_S}, \qquad \bar{X}_T = \frac{1}{m}\sum_{i=1}^{m} X_{Ti}, \qquad \bar{X}_S = \frac{1}{n}\sum_{i=1}^{n} X_{Si}$$

X_Si    (X_Si − X̄_S)²   X_Ti    (X_Ti − X̄_T)²
10.5    0.25            12.7    2.0449
10.7    0.49            11.0    0.0729
9.8     0.04            5.0     39.3129
9.5     0.25            10.7    0.3249
9.5     0.25            13.5    4.9729
                        12.7    2.0449
                        13.3    4.1209
Total
50      1.28            78.9    52.8943

Therefore

$$\bar{X}_T = \frac{78.9}{7} = 11.27, \qquad \bar{X}_S = \frac{50}{5} = 10$$

This implies

$$\hat{\rho} = \frac{\bar{X}_T}{\bar{X}_S} = \frac{11.27}{10} = 1.127$$

Therefore the relative potency of the test preparation is 1.127.

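d. 90% confidence interval for the relative potency

$$\frac{1}{1-g}\left[\hat{\rho} \pm \sqrt{\hat{\rho}^2 - (1-g)\left(1 - \frac{t^2 s^2}{m\bar{X}_T^2}\right)}\right]$$

where

$$g = \frac{t^2 s^2}{n\bar{X}_S^2}, \qquad t = t_{\alpha/2,\; n+m-2} = t_{0.05,\; 5+7-2} = t_{0.05,\,10} = 1.812$$

and

$$(n+m-2)\,s^2 = \sum_{i=1}^{n}\left(X_{Si}-\bar{X}_S\right)^2 + \sum_{i=1}^{m}\left(X_{Ti}-\bar{X}_T\right)^2$$

$$10\,s^2 = 1.28 + 52.8943, \qquad s^2 = 5.42$$

Therefore

$$g = \frac{(1.812)^2(5.42)}{5 \times 10^2} = 0.036$$

Therefore the 90% confidence limits for ρ are

$$(1-0.036)^{-1}\left[1.127 \pm \sqrt{(1.127)^2 - (1-0.036)\left(1 - \frac{(1.812)^2(5.42)}{7\,(11.27)^2}\right)}\right]$$

$$= 1.037\left[1.127 \pm \sqrt{(1.127)^2 - 0.964\,(1 - 0.020)}\right]$$

$$= 1.169 \pm \sqrt{0.32528}$$

$$= 1.169 \pm 0.570$$

$$= (0.6,\ 1.7)$$

Therefore, the confidence limits for ρ are 0.6 ≤ ρ ≤ 1.7.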
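The intermediate quantities in this solution (the two means, the point estimate, the pooled variance and g) can be recomputed from the column totals of the table. The sketch below is an illustration only; the variable names are assumptions, the totals 50, 78.9, 1.28 and 52.8943 and the critical value 1.812 are taken from the solution, and the confidence limits themselves are not recomputed here.

```python
# Column totals from the solution table.
sum_s, n = 50.0, 5             # standard preparation: total and sample size
sum_t, m = 78.9, 7             # test preparation: total and sample size
ss_s, ss_t = 1.28, 52.8943     # sums of squared deviations about each mean
t_crit = 1.812                 # t value with n + m - 2 = 10 degrees of freedom

mean_s = sum_s / n
mean_t = sum_t / m

# Point estimate of the relative potency.
rho_hat = mean_t / mean_s

# Pooled variance on n + m - 2 degrees of freedom.
s2 = (ss_s + ss_t) / (n + m - 2)

# g as defined in the solution: t^2 s^2 / (n * mean_s^2).
g = t_crit ** 2 * s2 / (n * mean_s ** 2)

print(round(rho_hat, 3), round(s2, 2), round(g, 3))
```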

Study Session 7

A. The Conventional Life Table

The conventional or periodic life table gives a cross-sectional view of the mortality and survival experience of a population during a current year (e.g. the Oyo State population of 2006). It is entirely dependent on the age-specific death rates prevailing in the year for which it is constructed.

B. The Generation Life Table

The generation (or cohort) approach is essentially the conceptual approach (cohort study) illustrated by the earlier example, modified to take account of practical situations. In practice, the cohort or generation life table is usually constructed for groups still living and stops at the present age. The generation life table requires data covering the entire life span of persons making up the cohort, which stands as a serious limitation to its preparation.

C. Complete Life Table

A complete life table is a table in which the mortality experience is considered in single years

of age throughout the life span. All the columns of the life table are given for single years of age, so the table is extremely detailed.

D. Abridged Life Table

An abridged life table is one in which the measures are given not for every single year of age

but for age groups. The values are also in general more approximate, i.e. elaborate adjustments

are not made, although they are accurate enough for most practical purposes. It contains

columns similar to those of the complete life table. The only difference is the length of

intervals.
