Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Biometric Method I
University of Ibadan Distance Learning CentreUniversity of Ibadan Distance Learning CentreUniversity of Ibadan Distance Learning CentreUniversity of Ibadan Distance Learning Centre
Open and Distance Learning Course Series DevelopmentOpen and Distance Learning Course Series DevelopmentOpen and Distance Learning Course Series DevelopmentOpen and Distance Learning Course Series Development
COURSE MANUAL
Biometric Method I STA 351
University of Ibadan Distance Learning CentreUniversity of Ibadan Distance Learning CentreUniversity of Ibadan Distance Learning CentreUniversity of Ibadan Distance Learning Centre
Open and Distance Learning Course Series DevelopmentOpen and Distance Learning Course Series DevelopmentOpen and Distance Learning Course Series DevelopmentOpen and Distance Learning Course Series Development
University of Ibadan Distance Learning CentreUniversity of Ibadan Distance Learning CentreUniversity of Ibadan Distance Learning CentreUniversity of Ibadan Distance Learning Centre
Open and Distance Learning Course Series DevelopmentOpen and Distance Learning Course Series DevelopmentOpen and Distance Learning Course Series DevelopmentOpen and Distance Learning Course Series Development
2
Copyright © 2008, Revised in 2015 by Distance Learning Centre, University of Ibadan, Ibadan.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, electronic, mechanical,
photocopying, recording or otherwise, without the prior permission of the copyright
owner.
ISBN 978-021-340-6
General Editor: Prof. Bayo Okunade
University of Ibadan Distance Learning Centre
University of Ibadan, Nigeria
Telex: 31128NG
Tel: +234 (80775935727) E-mail: [email protected]
Website: www.dlc.ui.edu.ng
3
Vice-Chancellor’s Message
The Distance Learning Centre is building on a solid tradition of over two decades of service in the provision of External Studies Programme and now Distance Learning Education in Nigeria and beyond. The Distance Learning mode to which we are committed is providing access to many deserving Nigerians in having access to higher education especially those who by the nature of their engagement do not have the luxury of full time education. Recently, it is contributing in no small measure to providing places for teeming Nigerian youths who for one reason or the other could not get admission into the conventional universities.
These course materials have been written by writers specially trained in ODL course delivery. The writers have made great efforts to provide up to date information, knowledge and skills in the different disciplines and ensure that the materials are user-friendly.
In addition to provision of course materials in print and e-format, a lot of Information Technology input has also gone into the deployment of course materials. Most of them can be downloaded from the DLC website and are available in audio format which you can also download into your mobile phones, IPod, MP3 among other devices to allow you listen to the audio study sessions. Some of the study session materials have been scripted and are being broadcast on the university’s Diamond Radio FM 101.1, while others have been delivered and captured in audio-visual format in a classroom environment for use by our students. Detailed information on availability and access is available on the website. We will continue in our efforts to provide and review course materials for our courses.
However, for you to take advantage of these formats, you will need to improve on your I.T. skills and develop requisite distance learning Culture. It is well known that, for efficient and effective provision of Distance learning education, availability of appropriate and relevant course materials is a sine qua non. So also, is the availability of multiple plat form for the convenience of our students. It is in fulfilment of this, that series of course materials are being written to enable our students study at their own pace and convenience.
It is our hope that you will put these course materials to the best use.
Prof. Abel Idowu Olayinka
Vice-Chancellor
4
Foreword
As part of its vision of providing education for “Liberty and Development” for Nigerians and the International Community, the University of Ibadan, Distance Learning Centre has recently embarked on a vigorous repositioning agenda which aimed at embracing a holistic and all encompassing approach to the delivery of its Open Distance Learning (ODL) programmes. Thus we are committed to global best practices in distance learning provision. Apart from providing an efficient administrative and academic support for our students, we are committed to providing educational resource materials for the use of our students. We are convinced that, without an up-to-date, learner-friendly and distance learning compliant course materials, there cannot be any basis to lay claim to being a provider of distance learning education. Indeed, availability of appropriate course materials in multiple formats is the hub of any distance learning provision worldwide.
In view of the above, we are vigorously pursuing as a matter of priority, the provision of credible, learner-friendly and interactive course materials for all our courses. We commissioned the authoring of, and review of course materials to teams of experts and their outputs were subjected to rigorous peer review to ensure standard. The approach not only emphasizes cognitive knowledge, but also skills and humane values which are at the core of education, even in an ICT age.
The development of the materials which is on-going also had input from experienced editors and illustrators who have ensured that they are accurate, current and learner-friendly. They are specially written with distance learners in mind. This is very important because, distance learning involves non-residential students who can often feel isolated from the community of learners.
It is important to note that, for a distance learner to excel there is the need to source and read relevant materials apart from this course material. Therefore, adequate supplementary reading materials as well as other information sources are suggested in the course materials.
Apart from the responsibility for you to read this course material with others, you are also advised to seek assistance from your course facilitators especially academic advisors during your study even before the interactive session which is by design for revision. Your academic advisors will assist you using convenient technology including Google Hang Out, You Tube, Talk Fusion, etc. but you have to take advantage of these. It is also going to be of immense advantage if you complete assignments as at when due so as to have necessary feedbacks as a guide.
The implication of the above is that, a distance learner has a responsibility to develop requisite distance learning culture which includes diligent and disciplined self-study, seeking available administrative and academic support and acquisition of basic information technology skills. This is why you are encouraged to develop your computer skills by availing yourself the opportunity of training that the Centre’s provide and put these into use.
5
In conclusion, it is envisaged that the course materials would also be useful for the regular students of tertiary institutions in Nigeria who are faced with a dearth of high quality textbooks. We are therefore, delighted to present these titles to both our distance learning students and the university’s regular students. We are confident that the materials will be an invaluable resource to all.
We would like to thank all our authors, reviewers and production staff for the high quality of work.
Best wishes.
Professor Bayo Okunade
Director
6
Course Development Team
Content Authoring Obisesan K.O.
Content Editor
Production Editor
Learning Design/Assessment Authoring
Managing Editor
General Editor
Prof. Remi Raji-Oyelade
Ogundele Olumuyiwa Caleb
SkulPortal Technology
Ogunmefun Oladele Abiodun
Prof. Bayo Okunade
7
Table of Contents Study session 1: Introduction to Population Genetics ............................................................. 13
Introduction .......................................................................................................................... 13
Learning Outcomes for Study Session 1 .............................................................................. 13
1.1 What is Genetics........................................................................................................ 13
1.1.1 The early history of population genetics ............................................................ 14
1.2 Basic Definitions ....................................................................................................... 15
In-Text Question .............................................................................................................. 16
In-Text Answer ................................................................................................................ 16
1.3 Gene Proportions and Variations ................................................................................... 17
1.3.1 Gene Variations ...................................................................................................... 18
1.3.2 The Mendel’s Experiment....................................................................................... 22
In-Text Answer ................................................................................................................ 29
In-Text Question .............................................................................................................. 29
Summary of Study Session 1 ............................................................................................... 29
Self-Assessment Question for Study Session 1 ................................................................... 30
SAQ (Test of learning outcome) ...................................................................................... 30
References ............................................................................................................................ 31
Study Session 2: Introduction to Biometrics ........................................................................... 32
Introduction .......................................................................................................................... 32
Learning Outcomes for Study Session 2 .............................................................................. 32
2.1 Biometrics ...................................................................................................................... 32
In-Text Question .............................................................................................................. 34
In-Text Answer ................................................................................................................ 34
2.2 Biological Data .............................................................................................................. 34
2.2.1 Sources of Biological Data ..................................................................................... 34
2.3 Challenges in Biological Data Collection ...................................................................... 35
2.3.1 Statistical Data ........................................................................................................ 35
2.4 Basic Terms in Biometrics ............................................................................................. 36
In-Text Question .............................................................................................................. 37
In-Text Answer ................................................................................................................ 37
8
2.5 Descriptive Statistics ...................................................................................................... 37
2.5.1 Measure of Location ............................................................................................... 37
2.5.2 Relationship between Arithmetric, Geometric and Harmonic Mean ...................... 44
2.5.3 Measures of Dispersion........................................................................................... 50
Summary of Study session 2 ................................................................................................ 65
Post-Test .......................................................................................................................... 65
Self-Assessment Question for Study Session 2 ................................................................... 67
SAQ (Test of learning outcome) ...................................................................................... 67
References ............................................................................................................................ 68
Study Session 3: Sampling and Sampling Distribution ........................................................... 69
Introduction .......................................................................................................................... 69
Learning Outcomes for Study Session 3 .............................................................................. 69
3.1 Sample............................................................................................................................ 69
3.1.1 Sampling Method .................................................................................................... 70
In-Text Question .............................................................................................................. 73
In-Text Answer ................................................................................................................ 74
3.2 Hypothesis Testing......................................................................................................... 87
3.2.1 Definition of some Basic Concepts ......................................................................... 88
3.2.2 Hypothesis Testing on the Difference of Two Population Mean ........................... 92
In-Text Question .............................................................................................................. 96
In-Text Answer ................................................................................................................ 96
3.2.3 Probability Distribution and Hypothesis Testing .................................................. 102
Summary of Study Session 3 ............................................................................................. 113
Self-Assessment Question for Study Session 3 ................................................................. 114
SAQ (Test of learning outcome) .................................................................................... 114
References .......................................................................................................................... 114
Study Session 4: Design and Analysis of Biological Experiments........................................ 115
Introduction ........................................................................................................................ 115
Learning Outcome for Study Session 4 ............................................................................. 115
4.1 Design of an Experiment ............................................................................................. 116
4.1.1 Components of Experimental Design ................................................................... 116
4.1.2 Principles of Experimental Design ....................................................................... 117
In-Text Question ............................................................................................................ 119
In-Text Answer .............................................................................................................. 119
9
4.2 Steps Involved in the Planning and Execution of an Experimental Design ................ 119
In-Text Answer .............................................................................................................. 120
In-Text Question ............................................................................................................ 120
4.3 Definitions of Relevant Terms in Design and Analysis of Biological......................... 121
4.4 Experiments ................................................................................................................. 121
4.4.1 Analysis of Variance (ANOVA) ........................................................................... 121
Summary of Study Session 4 ............................................................................................. 140
Self-Assessment Question for Study Session 3 ................................................................. 140
SAQ (Test of learning outcome) .................................................................................... 140
References .......................................................................................................................... 140
Study Session 5: Design and Analysis of Clinical Trials ...................................................... 141
Introduction ........................................................................................................................ 141
Learning Outcome for Study Session 5 ............................................................................. 141
5.1 What are Clinical Trials? ............................................................................................. 142
5.1.1 Features of Clinical Trials ..................................................................................... 142
5.1.2 Needs for Clinical Trials ....................................................................................... 142
5.1.3 Uses of Clinical Trials .......................................................................................... 143
5.1.4 Advantages of Clinical trials ................................................................................. 143
5.1.5 Disadvantages of Clinical trials ............................................................................ 143
5.2 Identification of Phases of Clinical Trials ................................................................... 144
5.2.1 Phase A of Clinical Trials ..................................................................................... 145
5.2.2 Phase B of Clinical Trials ..................................................................................... 145
5.2.3 Phase C of Clinical Trials ..................................................................................... 145
5.2.4 Phase D of Clinical Trials ..................................................................................... 146
In-Text Question ............................................................................................................ 146
In-Text Answer .............................................................................................................. 146
5.3 Basic Terms on Clinical Trial ...................................................................................... 146
5.3.1 Protocol Design in Clinical Trials......................................................................... 148
5.3.2 Role of Statisticians in Clinical Trials .................................................................. 148
In-Text Question ............................................................................................................ 149
In-Text Answer .............................................................................................................. 149
5.4 Ethical Issues on Clinical Trials .................................................................................. 149
5.4.1 Sample Size Determination................................................................................... 150
Self-Assessment Question for Study Session 5 ................................................................. 159
10
SAQ (Test of learning outcome) .................................................................................... 159
References .......................................................................................................................... 159
Study Session 6: Biological Assay ........................................................................................ 160
Introduction ........................................................................................................................ 160
Learning Outcome for Study Session 6 ............................................................................. 160
6.1 Biological Assay .......................................................................................................... 160
6.1.1 Purpose of Bioassay .............................................................................................. 163
6.2 Types of Assay ............................................................................................................. 164
6.2.1 Direct Assay .......................................................................................................... 164
6.2.2 Indirect Assay ....................................................................................................... 171
In-Text Question ............................................................................................................ 173
In-Text Answer .............................................................................................................. 173
6.2.3 Definitions of some Terms .................................................................................... 176
Summary of Study Session 6 ............................................................................................. 184
Self-Assessment Question for Study Session 6 ................................................................. 184
SAQ (Test of learning outcome) .................................................................................... 184
References .......................................................................................................................... 184
Study Session 7: Life Table ................................................................................................... 185
Introduction ........................................................................................................................ 185
Learning Outcome for Study Session 7 ............................................................................. 185
7.1 What is a Life Table? ................................................................................................... 185
7.1.1 Assumptions of Life Table.................................................................................... 186
7.1.2 Usefulness of Life Tables ..................................................................................... 186
7.1.3 Types of Life Table ............................................................................................... 187
In-Text Question ............................................................................................................ 189
In-Text Answer .............................................................................................................. 189
7.2 Relevance of Rates in Life Table Analysis .................................................................. 189
7.2.1 Crude Death Rate .................................................................................................. 189
7.2. Age- Specific Death Rate ........................................................................................ 190
Summary of Study Session 7 ............................................................................................. 201
Self-Assessment Question for Study Session 7 ................................................................. 201
SAQ (Test of learning outcome) .................................................................................... 201
References .......................................................................................................................... 203
NOTES ON SAQS ................................................................................................................. 204
11
Study Session 1 .................................................................................................................. 204
Study Session 2 .................................................................................................................. 206
Study Session 3 .................................................................................................................. 207
Study Session 4 .................................................................................................................. 209
Study Session 5 .................................................................................................................. 210
Study Session 6 .................................................................................................................. 211
Study Session 7 .................................................................................................................. 213
12
Course Introduction
Every aspect of human endeavours now requires the application of Statistics. Statistics is now
employed to solve problems ranging from Anatomy to Zoology (A-Z).
In this course material, we will be introduced to population genetics, biometrics, sampling
estimation and sampling distribution, design and analysis of biological experiment, design
and analysis of clinical trials, biological assay and finally life table.
We shall adopt a step by step approach so as to facilitate easy understanding of the subject,
by providing examples and exercises during each of the studies. It is hoped that at the end of
this course, students and users at all levels will have a sound knowledge of Biometrics
methods.
However, at the end of these studies, students and users are expected to give comprehensive
meaning of genetics, compute gene proportions and their variations, and understand the
meaning of Biometrics. Furthermore, the users should be able to discuss the problems
associated with the collection of biological data, test hypotheses on some sampling assertions,
design and analyse biological experiments, design and analyse clinical trials, and be able to
fully understand the concept of life-table.
The Royal Statistical Society (RSS) in London is appreciated for their cooperation. The
copyright of the statistical table at the back of this book is held by the RSS. I thank the
society for the kind permission to use it in this book.
Obisesan, K.O.
13
Study session 1: Introduction to Population Genetics
Introduction
Population genetics is a subfield of genetics, that deals with genetic differences within and
between populations, and is a part of evolutionary biology. Studies in this branch of biology
examine such phenomena as adaptation, speciation, and population structure. A sample is
Adaptation which speaks about how you have been able to interact with your environment
and it paved way for your dwelling of today.
Generally, population genetics have been applied to animal and plant breeding, to human
genetics, and more recently to ecology and conservation biology. It is important that you
understand simple genetics. Though the emphasis is on human population, genetics are
applicable to any living cell such as plants and animals. Genetics shall be introduced to you
in this study.
Learning Outcomes for Study Session 1
At the end of this study, you should be able to:
1.1 Explain the meaning of Genetics;
1.2 Define some basic terms relating to genetics;
1.3 Define simple Mendelian ratio; and
1.4 Work out some applications to real life situations.
1.1 What is Genetics
You should be aware that you came into existence through your parents, but it’s not certain
that you took note of the fact that you have inherited one or two things from your parents not
physically alone but those that can be revealed scientifically.
For instance, the parent of a child who is seriously ill in the hospital will be called upon at
first if the child needs blood because it is believed that at least one of the parent’s blood
14
group be the same should tally with that of their child. In case this is not so, there would be a
question to know if both of them are the true parents of the child.
Meaning of Genetics
Genetics is the scientific study of the ways in which different characteristics are passed from
each generation of living things to the next. In other words, it is the study of the genetic
constitution of individuals as it affects the genes and genotype from one generation to the
other. The genetic constitution of a population however is described by the gene frequencies
and genotype frequencies.
In the process of transferring these genes from one generation to the other, the genotype of
the parents is broken down (mutation) and a new set of genotype is constituted in the
progenies from the genes transmitted in the gametes. Simply put, genetics is a term used to
describe the particles that are transmitted from parent to their offspring.
1.1.1 The early history of population genetics
The early history of population genetics:
The early history of population genetics
1856-63 Mendel’s experiments on peas
1859 Darwin’s Origin of Species
1900 Rediscovery of Mendel’s laws
1909 Nilsson-Ehle’s experiments on wheat
1915 Morgan’s experiments on Drosophila
1918 Fisher’s paper on phenotypic correlations between relatives
1918 Sturtevant’s artificial selection experiments on Drosophila
1912-1920 Pearl, Jennings and Wright’s work on inbreeding
1930 Fisher’s The Genetical Theory of Natural Selection (Fundamental theorem)
1931 Wright’s Evolution in Mendelian populations
1932 Haldane’s The Causes of Evolution
1955 Kimura diffusion equation solution to the distribution of allele frequencies
15
1.2 Basic Definitions
Population: This is the total number of people living in a particular territory at a particular
time. In genetic sense, however, population refers to a breeding group that is capable of
transmitting genes from one generation to another.
Gene: This is a unit in a cell, which controls a particular quality in a living thing that has
been passed on from its parents. In other words, it is a basic part of a living cell that contains
characteristics of one’s parent.
Genotype: This is the make up of living creatures with some genetic combination. It
describes the array of gene frequencies present in each individual.
Allele: This is one of the several forms of a gene. It is a short form of allelomorph. Allele are
responsible for the production of contrasting characteristics of a given gene differing in
Deoxyribonucleic acid (DNA), which is the hereditary material of the majority of living
organism and affecting the functioning of single product (RNA-ribonucleic acid).
Fecundity: This is the physiological ability to produce offspring or the capability of repeated
fertilization.
Fertilization: It is the fusion of two gametes to produce a cell called the zygote.
Natural Selection: When a selection force is at work such as survivorship, competition for
food and space, some members of the species possess adaptive characters more than the
others.
Recombination: The formation of offspring, containing a combination of genes that are
different from the combination in either parent. This can act on natural selection to produce
evolutionary change.
Mating: It is the union in pairs of individuals of opposite sexes to accomplish sexual
reproduction.
16
Taxonomy: It is the study of classification and identification of living organisms whereby
you have the;
Mutation: This is the process in which the genetic material of a living thing changes in
structure when it is passed on to its offspring.
DNA (Deoxyribonucleic Acid): It is a chemical that carries genetic information in the cells
of each living thing. (Watson and Grida won a Nobel Prize for discovering DNA).
Chromosome: It is the part of a living thing that contains the characteristics that are passed
on from parents to their offspring. They look like tiny threads and also contain the DNA in
each cell.
In-Text Question
______________ is the make up of living creatures with some genetic combination.
a. Genotype
b. Fecundity
c. Fertilization
d. DNA
In-Text Answer
a. Genotype
Kingdom Animalia
Phylum Chordata
Class Mammalia
Order Primates
Family Hominidae
Genus Homo
Species Homo sapiens
17
1.3 Gene Proportions and Variations
Gene Proportions
Suppose there are two alleles A and S in a population of N individuals, genotypes could be
formed as AA, AS (also for SA) and SS. If N11 has AA and N22 has SS then N12 has AS and
that
N = N11 + N12 + N22
The genotype frequency is generally defined as
The proportion of population (gene frequency) with allele A,
( ) ASAA PPAP 21+=
N
NN 1211 21+
=
The proportion of population (gene frequency) with allele S,
( )
N
NN
N
N
N
N
PPSP ASSS
1222
1222
21
21
21
+
+
+=
The total proportion is unity, that is
P(A) + P(S) = 1
but P(A) = PAA + ½PAS
P(S) = PSS + ½PAS
So we have. PAA + ½PAS + PSS + ½PAS =1, which implies
PAA + PAS + PSS =1
N
NP
N
NP
N
NP
SS
AS
AA
22
12
11
=
=
=
18
Proof:
Recall that N = N11 + N12 + N22
Divide through by N
Hence
1 = PAA + PAS + PSS
PAA + PAS + PSS = 1
PAA + ½PAS + ½PAS + PSS = 1
P(A) + P(S) = 1
Box 1.1: Recall
Genetics is the scientific study of the ways in which different characteristics are passed from
each generation of living things to the next.
1.3.1 Gene Variations
The variance of these proportions also can be calculated. It is used to check on the variability
between the proportions.
For the proportion of population with an allele A,
The variance V{P(A)} = V(PAA) + V(PAS) – 2COV (PAA, PAS)
SS
AS
AA
PN
N
PN
N
pN
NWhere
N
N
N
N
N
N
N
N
N
N
N
N
N
N
=
=
=
++=
++=
22
12
11
221211
221211
1
19
Where V(PAA) = ( )
N
PP AAAA −1
V(PAS) = ( )N
PP ASAS 2112
1 −
COV (PAA, PAS) = ( )N
PP ASAA 21
For the proportion of population with an allele S,
The variance V{P(S)} = V(PSS) + V(PAS) - 2COV (PSS, PAS)
Where V (PSS) = ( )N
PP SSSS −1
V(PAS) = ( )N
PP ASAS 2112
1 −
{ }N
PPpp
ASSS
ASSS
)21(
)cov( , =
The standard error, coefficient of variation and confidence interval can also be computed.
Standard Error, S.E. is the positive square root of its variance
S. E. (P(A)) = ))(( APV
Coefficient of variation %100P(A)
(P(A)) E. S.))(.(. XAPVC =
The confidence interval C.I. for P(A) is given as
Which can be put as:
))((.)()),((.)(22
APESZAandPAPESZAP αα +−
or written as
))((..)( 2/ APESZAP α±
20
))(ˆ(.)(ˆ)())(ˆ(.)(ˆ22
APESZAPAPAPESZAP αα +<<−
where α is the significance level.
Confidence level = α−%100
Similarly,
S.E. (P(S) = ))(( SPV
%100P(S)
(P(S)) E. S.))(.(. XSPVC =
C.I. for P(S) is given as
))((.)(2
SPESZSP α±
Which can be put as
))((.)()),((.)(22
SPESZSandPSPESZSP αα +−
or written as
where α is the significance level.
Confidence level = α−%100
Example 1
The number of individuals with genotypes AA, AS and SS in a sample of 1482 in a
population is given as:
Genotype AA AS SS
Number 421 735 326
a. For each blood group obtain the genotype frequency
b. Obtain the gene frequencies of P(A), P(S) and their standard errors.
c. Place a bound on your estimates in (b) above at 5% level.
))(ˆ(..)(ˆ)())(ˆ(..)(ˆ2/2/ SPESZSPSPSPESZSP αα +<<−
21
Solution
(a) Genotype frequency
i. 284.01482
421ˆ 11 ===N
NPAA
ii. 496.01482
735ˆ 12 ===N
NPAS
iii. 220.01482
326ˆ 22 ===N
NPSS
(b) Gene Frequency
i. = ASAA PP ˆˆ2
1+
= 0.284 + ½(0.496)
= 0.284 + 0.248
= 0.532
ii. )(ˆ SP = ASSS PP ˆˆ2
1+
= 0.220 + ½(0.496)
= 0.220 + 0.248
= 0.468
To obtain their standard errors, we calculate their variances first.
iii. V(P(A) = V(PAA) + V(PAS) – 2COV (PAA, PAS)
= N
PP
N
PP
N
PP ASAAASASAAAA 21
21 2)1()1( −
−+−
= 0.284(0.716) + 0.248(0.752) – 2(0.284)(0.248)
1482 1482 1482
= 0.203344 + 0.186496 - 0.140864
1482
)(ˆ AP
22
= 0.248976
1482
= 0.000168
Hence,
S.E. (P(A)) =
= 0.01296
By calculation, S.E.(P(S)) also is 0.01296
(c) 95% Confidence Interval
(i) ( ) ( ) ))(ˆ.(.ˆ
2APESZAPAP α±=
= 0.532 )01296.0(2
05.0Z±
= 0.0532 ±Z0.025(0.01296)
= 0.532 ±1.96(0.01296)
= 0.532±0.0254016
= (0.507, 0.557)
That is 0.507 < P(A) < 0.557
(ii) For P(S) = ))(ˆ(.)(ˆ2
SPESZSP α±
= 0.468 ± Z0.05/2 (0.01296)
= 0.468 ± 1.96 (0.01296)
= 0.468 ± 0.0254016
= (0.443, 0.493)
That is 0.443 < P(S) < 0.493
1.3.2 The Mendel’s Experiment
This was done by taking pollen from one plant and dusting it on the stigma of the other plant.
Precautions were taken to ensure that the latter was not contaminated with its own pollen or
000168.0
23
that of any other plant. The seeds resulting from this cross were collected and sown. It was
found that all the seeds gave rise to tall plants, no dwarf plant was produced at all. This plant
belongs to what is called first filia generation (or F1 for short).
One of this tall F1 plant was then self pollinated, precautions again were taken to prevent its
being pollinated by any other kind of pollen, the resulting seeds were sown and the offspring
F2 generation were carefully examined. It was found that the F2 plants were a mixture; some
were tall and some dwarf.
In one particular cross, he counted 787 tall plants and 277 dwarf ones i.e. approximately ¾
were tall and ¼ were dwarf meaning that the proportion of tall to dwarf plants approximate to
3:1. This is the Mendelian ratio.
24
The Figure below Illustrates what Happened when a Pure Breeding Pea Tall Plant is
Crossed with a Dwarf One
A punnet square can be used to show the fusion of Fusion of F1 Gametes:
Gametes
½T ½t
½T ¼TT ¼Tt
½t ¼Tt ¼tt
σ
σ
Gam
etes
Parent
P
Tall
TT
Dwarf
tt
X
Alleles in gamates Alleles in gamates
All t All T Gamates
produced by
Gamates Fused
First Filia
Generation F1 Tall Tt Tall Tt
X
Self Polinated
Segregation Segregation
½ T ½ t ½ T ½ t
G a m a t e s F u s e d
Second Filia
Generation F2
¼ TT ¼ Tt ¼ Tt ¼ tt
Tall ¾ Dwarf ¼
25
Example 2
If a cross pollination experiment has been performed correctly, then ⅛ of the subsequent
plant should have red flowers, ⅜ should have pink flowers and the remainder should have
white flowers. A random sample of 80plants is found to comprise 4 red flowers, 40 pink
flowers and 36 white flowers. Does this provide significance evidence at 5% level that the
experiment has been performed incorrectly?
Solution
H0: Experiment was performed correctly
H1: Experiment was performed incorrectly.
Probability frequencies for plant
i. Red = ⅛
ii. Pink = ⅜
iii. White = 4/8 = ½
Observed frequencies
i) Red = 4
ii) Pink = 40
iii) White = 36
80
Therefore the expected frequencies ei = nPi
108
180;Re) 1 =
=edi
308
380;) 2 =
=ePinkii
402
180;) 3 =
=eWhiteiii
Now
26
Oi ei
4 10 3.6
40 30 3.3
36 40 0.4
7.3
i.e.
Decision: Accept the null hypothesis if is less than . Otherwise reject.
Conclusion: We reject the null hypothesis, since the calculated is in the rejection region
i.e. . So, we conclude that the experiment has been performed incorrectly.
Example 3:
A Botanist conjectures that the specimens of a particular plant are growing at random
location in a bar with mean rate 4 plants per square meter. To test the conjecture, he divides a
randomly chosen region of the bar into non-overlapping ridges of area 0.5m2 and counts the
number of plant in each region. The results were as follows:
No. of plants 0 1 2 3 4
No. of Regions 4 14 9 7 6 0
Test the conjecture using the 5% level of significance
2( )i i
i
O ee
−
2
2 21,5%
2(3 0 1),0.05
22,0.05
7.3
5.991
calculated
tabulated v m
χχ χ
χ
χ
− −
− −
=
=
=
==
2.calχ 2
tabχ
2χ
2 2cal tabχ χ>
5≥
27
Solution:
H0: The specimens are growing at random location
Hi: The specimens are not growing at random location
X f fx
0 4 0
1 14 14
2 9 18
3 7 21
4 6 24
0 0
Total 40 77
925.140
77 ===∑∑
f
fxX
5≥
28
We shall consider the Poisson distribution.
Total 04
Now, we revise ei by adding together ei<5 which also affects corresponding Oi.
Oi ei
4 5.836 0.5776
14 11.232 0.6821
9 10.812 0.3037
7 6.936 0.0006
6 5.184 0.1284
Total 1.6924
1 .9 2 5 0
1 . 9 2 5 1
1 . 9 2 5 2
1 .9 2 5 3
1 . 9 2 5 4
( )!
(1 . 9 2 5 )( 0 ) 0 . 1 4 5 9
0 !
(1 . 9 2 5 )( 1 ) 0 . 2 8 0 8
1 !
(1 . 9 2 5 )( 2 ) 0 . 2 7 0 3
2 !
(1 . 9 2 5 )( 3 ) 0 . 1 7 3 4
3 !
(1 . 9 2 5 )( 4 ) 0 . 0 8 3 5
4 !( 5 ) 1 ( 5 )
X xe XP X x
x
eP X
eP X
eP X
eP X
eP X
P X P X
−
−
−
−
−
−
= =
= = =
= = =
= = =
= = =
= = =
≥ = − <=
0
1
2
3
4
1 0 . 9 5 3 9 0 . 0 4 6 1
4 0 0 . 1 4 5 9 5 . 8 3 6
4 0 0 . 2 8 0 8 1 1 . 2 3 2
4 0 0 . 2 7 0 3 1 0 . 8 1 2
4 0 0 . 1 7 3 4 6 . 9 3 6
4 0 0 . 0 8 3 5
i iE x p e c t e d f r e q u e n c i e s e n P
e X
e X
e X
e X
e X
− =
== == == == == =
5
3 . 3 4
4 0 0 . 0 4 6 1 1 . 8 4 4e X≥ = =
2( )i i
i
O ee
−
29
815.7
.
6924.1
205.0,3
205.0,115
2%5,1
2
2
==
=
=
=
−−
−−
χ
χ
χχχ
ei
mvtab
cal
Decision: Accept the null hypothesis if is less than . Otherwise reject.
Conclusion: We accept the null hypothesis since the calculated is in the acceptance
region i.e. . So we conclude that the specimens are growing at random location.
In-Text Answer
Mendel’s experiment by taking pollen from one plant and dusting it on the stigma of the other
plant. True/False?
In-Text Question
True
Summary of Study Session 1
In this study, you have learnt that:
1. Genetics is an important topic in the study of biometrics. Gregor Mendel (1882-1884)
is the father of genetics.
2. Population genetics is the branch of genetics that describes the consequences of
mendelian inheritance at the population level.
3. Genetics; we considered proportions of genes and confidence intervals. We also
considered simple mendelian ratio and test of hypothesis.
2
calχ 2
tabχ
2χ2 2
cal tabχ χ<
30
Self-Assessment Question for Study Session 1
Having studied this session, you can now assess how well you have learnt its learning outcomes
by answering the following questions. It is advised that you answer the questions thoroughly and
discuss your answers with your tutor in the next Study Support Meeting. Also, brief answers to the
Self-Assessment Questions are provided at the end of this module as a guide.
SAQ (Test of learning outcome)
The number of individuals with genotypes AA, AS and SS in a sample of 1482 in a
population is given as:
Genotype AA AS SS
Number 421 735 326
a. For each blood group obtain the genotype frequency
b. Obtain the gene frequencies of P(A), P(S) and their standard errors.
c. Place a bound on your estimates in (b) above at 5% level.
31
References
Bliss C.I. (1970, 3rd Ed): Statistics in Biology: Statistical Methods for Research in the Natural
Sciences, Vol. 1 and 2, published by Wiley.
Cox D.R. (1992, Classical Ed): Planning of Experiments, published by Wiley.
Finney D.J. (1978 3rd Ed): Statistical Method in Biological Assay, Published by Griffin.
32
Study Session 2: Introduction to Biometrics
Introduction
Biometrics are computerised methods of recognizing you based on a physiological or
behavioural characteristic. Among the features measured are; face, fingerprint, hand
geometry, signature, voice, etc. Biometric technologies are becoming the foundation of an
extensive array of highly secure identification and personal verification solutions. Financial
fraud and insecurity encouraged its use, almost everywhere you go it is the check.
You should be familiar with the methods of registration before the election and voting during
election. Furthermore, you will learn about the basic statistical techniques that are used in the
study of biological data and in the analysis of biological experiment. Statistical techniques or
methods play a very significant role in virtually all-human endeavours. They are widely
applied in various fields such as Biological science, medicine, physical science, and so on.
Learning Outcomes for Study Session 2
At the end of this study, you should be able to:
2.1 Discuss the meaning of Biometrics;
2.2 Identify the sources of biological data;
2.3 Identify problems associated with the collection of biological data;
2.4 Identify some basic terms in Biometrics; and
2.5 Discuss various statistical tools.
2.1 Biometrics
Population genetics is a subfield of genetics that deals with genetic differences within and
between populations, and is a part of evolutionary biology. Studies in this branch of biology
examine such phenomena as adaptation, speciation, and population structure.
Population genetics was a vital ingredient in the emergence of the modern evolutionary
synthesis. Its primary founders were
who also laid the foundations for the related discipline of quantitative genetics.
Traditionally a highly mathematical discipline, modern population genetics encompasses
theoretical, lab, and field work.
inference from DNA sequence data and for proof/disproof of concept.
Figure 2.1: Sewall Wright, J. B. S. Haldane and Ronald Fisher
Statistics applied to biological problem is called B
measurement using statistical method
method and application of the mathematical sciences (Mathematical statistics and computer
science) in the studies of agriculture, biolo
Greek word "bios" meaning life and M
Then, we can say that "Biometry" refers to "measurement of life”. Biometric
refers to the application of statistical techniques to
Biometrics is also the application of statistical methods to the analysis of data derived from
the biological science and medicine.
two or more treatments and to select the one
disease or disorder.
33
synthesis. Its primary founders were Sewall Wright, J. B. S. Haldane and
who also laid the foundations for the related discipline of quantitative genetics.
Traditionally a highly mathematical discipline, modern population genetics encompasses
theoretical, lab, and field work. Population genetic models are used both for statistical
inference from DNA sequence data and for proof/disproof of concept.
: Sewall Wright, J. B. S. Haldane and Ronald Fisher
o biological problem is called Biometry, which is the study of biological
atistical methods or techniques. The biometric is linked with the theory,
method and application of the mathematical sciences (Mathematical statistics and computer
science) in the studies of agriculture, biology and the environment. It originated from the
k word "bios" meaning life and Metron meaning, "measure".
Then, we can say that "Biometry" refers to "measurement of life”. Biometric
refers to the application of statistical techniques to biological problems.
Biometrics is also the application of statistical methods to the analysis of data derived from
the biological science and medicine. The aim of many medical investigations is to compare
two or more treatments and to select the one that is more effective in dealing with some
and Ronald Fisher,
who also laid the foundations for the related discipline of quantitative genetics.
Traditionally a highly mathematical discipline, modern population genetics encompasses
Population genetic models are used both for statistical
: Sewall Wright, J. B. S. Haldane and Ronald Fisher
is the study of biological
is linked with the theory,
method and application of the mathematical sciences (Mathematical statistics and computer
It originated from the
Then, we can say that "Biometry" refers to "measurement of life”. Biometrics then clearly
Biometrics is also the application of statistical methods to the analysis of data derived from
The aim of many medical investigations is to compare
that is more effective in dealing with some
34
In-Text Question
As you have learnt thus far, statistics applied to ____________problem is called Biometry?
a. Biological
b. Mathematical
c. Scientific
d. Economic
In-Text Answer
a. Biological
2.2 Biological Data
Biological experiments deal with living things and are characterized by the fact that no
individuals are exactly alike. These variations between individuals are the bases of the
variation of new forms of life.
2.2.1 Sources of Biological Data
The following are the sources of biological data:
1. Laboratory Experiment.
2. Control Experiment.
3. Agricultural Farm.
4. Research Institutions.
5. Existing Records.
35
2.3 Challenges in Biological Data Collection
Biological data are data or measurements collected from biological sources, which are often
stored or exchanged in a digital form. Biological data are commonly stored in files or
databases. Examples of biological data are DNA base-pair sequences, and population data
used in ecology.
The following are the problems associated with the Collection of Biological Data:
1) Some diseases may not be found in the available register despite the fact that they are
in existence.
2) Causes of illness are not always recorded.
3) The general practitioner’s record, which is a source of medical data, is not kept up to
date.
4) Variation in climatic condition, e.g. rainfall, relative humidity, temperature e.t.c.,
these may affect the yield of crop under investigation.
5) Diseases and pests make the results of the experiment inadequate for analysis
purpose.
2.3.1 Statistical Data
Data are raw facts that have to be processed using "statistical method". When data are
presented in numerical form, they are called "quantitative data". A statistical data must be
compiled in a properly planned manner and for a specific purpose. In biological science,
statistical data are products of measurement or direct observation. A measuring device called
‘scaling’ can make the process of collecting observation on a variable.
Box 2.1: Recall
Biometrics refers to "measurement of life”. Biometrics then clearly refers to the application
of statistical techniques to biological problems.
36
2.4 Basic Terms in Biometrics
1. Sample: It is a collection of individual observation selected by a specified procedure, e.g. a
sample of individuals with HIV/AIDS. It is also defined in general form as the representative
or fractional part of a population.
2. Observation: It refers to the count of individual object or measurement taking on the
smallest sampling unit in a sample, such that the observation can be expressed numerically.
3. Population: It is the total number of individual observations about which inferences are
made, existing anywhere in the world or at least within a definitely specified sampling area,
limited in space or time, e.g. The population of infants in a Local Government Area,
population of lion in the western region of Africa. A population may be of finite size or
infinite size. It is the entire collection of all persons/objects concerning a specific area of
investigation or study.
4. Variable: (A Biological Variable) define a property with respect to which individual in the
sample differs in some ascertainable way. If the property does not differ among the sample
case, it cannot be of statistical interest and therefore cannot be referred to as a variable. It can
also be defined as characteristics of interest that is either measured or counted e.g. the weight
of animal can be measured. Information on variables is called Qualitative data.
5. Accuracy: It is the closeness of the measurement of a computed value to its true value. It is
a measure of variation of the estimates to the actual value. That is,
Accuracy = Actual Value - Estimated Value.
6. Precision: It is the closeness of repeated measurement to its true value. It is the closeness
of an estimate to the expected value. It is the difference between the expected value and the
estimated value. i.e. Precision = Expected value -Estimated value.
7. Unit of Analysis: This is the unit for which we wish to obtain statistical data. These may
be animals, living matter, human being etc.
37
In-Text Question
Observation refers to the count of individual object or measurement taking on the smallest
sampling unit in a sample. True/False
In-Text Answer
True
2.5 Descriptive Statistics
2.5.1 Measure of Location
The various number values which tell us about the location (centres) of a distribution are
known as measures of location, central tendencies or simply called averages. Location here
refers to a value, which is typical or representative of the characteristics of a given
distribution. The Arithmetic Mean, Mode, Median, Harmonic mean, Geometric mean and the
quadratic mean are all measure of location.
1. Arithmetic Mean
This is perhaps the most important and most widely used average. The Arithmetic mean is the
sum of variable values divided by the number of such values. The symbol X (pronounced
X-bar) is used to represent the arithmetic mean or simply mean of a set X-values.
( )
n
xxxn
XxMean
n
n
ii
+++=
=∑
=
K21
1
Sum of variable values (X)
Number of such values= Means
38
Example 2.1
The ages of 10 students in a school were given as follows in years: 12, 14, 20, 20, 17, 18, 19,
24, 25, and 25. Calculate the arithmetic mean.
Solution
Example 2.2
Obtain the arithmetic mean of the numbers 8, 3, 4, 7, 2, 4, 6, 9, 10, 11, 14, 20, 21, 18, 19
Solution
2. Frequency Distribution
The mean is given and defines as:
∑∑=
f
fxXMean )(
where “f” is the frequency
“n” is the variable value
“ Σ ” is the sum up symbol called summation
12 14 20 20 17 18 19 24 25 25( )
10194
10
( ) 19.4 19
Mean X
Mean X years
+ + + + + + + + +=
=
= ≈
1( )
8 3 4 7 2 4 6 9 10 11 14 20 21 18 19
15156
15
10.4
n
ii
XMean X
n==
+ + + + + + + + + + + + + +=
=
=
∑
39
Example 2.3
The Ages in years of one hundred and fifty students in a secondary school are given as:
Age (x) 12 13 14 15 16 17
No of student 25 40 25 30 15 15
Determine the Arithmetic mean age of the distribution
Solution:
Age (x) f fx
12 25 300
13 40 520
14 25 350
15 30 450
16 15 240
17 15 255
Total 150 2115
1
n
i
f xX
f==∑∑
6
16
1
2 1 1 5
1 5 0 1 4 . 1
i
i
f xX
f
X
=
=
=
=
=
∑
∑
40
Example 2.4
The table below shows the distribution of the waiting times for some customers in a certain
Bank in Lagos.
Waiting time No of Customer
1.0 - 1.4 6
1.5 - 1.9 9
2.0 - 2.4 2
2.5 - 2.9 4
3.0 - 3.4 6
3.5 - 3.9 10
4.0 - 4.4 12
4.5 - 4.9 15
5.0 - 5.4 7
5.5 - 5.9 9
Find the average waiting time of the customers.
Solution:
Waiting time No of Customer (f) Mid value(x) fx
1.0 – 1.4 6 1.2 7.2
1.5 – 1.9 9 1.7 15.3
2.0 – 2.4 2 2.2 4.4
2.5 – 2.9 4 2.7 10.8
3.0 – 3.4 6 3.2 19.2
3.5 – 3.9 10 3.7 37
4.0 – 4.4 12 4.2 50.4
4.5 – 4.9 15 4.7 70.5
5.0 – 5.4 7 5.2 36.4
5.5 – 5.9 9 5.7 51.3
Total 80 302.5
41
Harmonic Mean (HM)
The harmonic mean of a set of a value x1,x2,x3,…........xn is the number of observations, n,
divided by the sum of the reciprocals of the values.
It is given by
Example 1.5
Find the harmonic mean of the following data
2,3,4,6,8,9,10 and 12
Solution:
10
1
10
1
( )
302.5
80 3.78
i
i
fxMean X
f
X
=
=
=
=
=
∑
∑
1
i
1 1
.1
x has frequency f then
.
n
i
n n
x x
nH M
x
It
fnH M O R
f fx x
=
= =
=
= =
∑
∑∑ ∑
1
.1
8
1 1 1 1 1 1 1 12 3 4 6 8 9 10 12
8
0.5 0.33 0.25 0.167 0.125 0.111 0.1 0.0838
1.666
4.8
. 4.8
n
i
nH M
x
H M
=
=
=+ + + + + + +
=+ + + + + + +
=
==
∑
If
42
Example 1.6
The data below gives the distribution of the time of studying chimpanzees in a zoological
garden:
Time (Hour) No of Chimpanzees
10 – 19 4
20 – 29 6
30 – 39 7
40 – 49 10
50 – 59 3
60 – 69 7
70 – 79 8
80 – 89 9
90 – 99 6
Determine the Harmonic mean of the number of Chimpanzees to the nearest Hour.
Solution:
Time (Hour) f Midvalue(x) f/x
10 - 19 4 14.5 0.2759
20 - 29 6 24.5 0.2449
30 - 39 7 34.5 0.2029
40 - 49 10 44.5 0.2247
50 - 59 3 54.5 0.0550
60 - 69 7 64.5 0.1085
70 - 79 8 74.5 0.1074
80 - 89 9 84.5 0.1065
90 - 99 6 94.5 0.0635
Total 60 1.3893
43
Geometric Mean
The Geometric mean (G.M) is an analytical method of finding the average rate of growth or
decline in the value of an item over a particular period of time.
The Geometric mean (G.M) of a set of n observation: x1, x2, x3,..............,xn is the nth root
of their product. It is given by
Example 1.7
Determine the geometric mean of the following set of data 2, 4, 8,11,4
Solution:
Using the previous example, find the geometric mean?
( )9
160
H.M 1.3893
43.187 43.2
i
f
fx
hrs
== =
= ≈
∑
∑
( )
( )
1 2
1 2
. , , .......,
T ak ing logarithm s, w e have
LLog G .M = log log ..... log
n
logFor grouped data G .M = A ntilog
nn
n
G M x x x
x x x
f x
f
=
+ + +
∑∑
( )5
5
. 2 4 8 1 1 4
2 8 1 6
4 .8 9 7
4 .9
G M x x x x=
==≈
n
1
44
Solution:
Time (Hour) f Midvalue(x) logx f logx
10 - 19 4 14.5 1.1614 4.6456
20 - 29 6 24.5 1.3892 8.3352
30 - 39 7 34.5 1.5378 10.7646
40 - 49 10 44.5 1.6484 16.4840
50 - 59 3 54.5 1.7364 5.2092
60 - 69 7 64.5 1.8096 12.6672
70 - 79 8 74.5 1.8722 14.9776
80 - 89 9 84.5 1.9269 17.3421
90 - 99 6 94.5 1.9754 11.8524
Total 60 102.2779
[ ]7046.1log
60
2779.102log
loglog. 1
Anti
Anti
f
xfAntiMG
n
x
=
=
=∑
∑=
G.M = 50.6524
2.5.2 Relationship between Arithmetric, Geometric and Harmonic Mean
In general, the geometric mean for a set of data is always less than or equal to the
corresponding arithmetic mean but greater than or equal to the harmonic mean. The
relationship is expressed as follows:
H.M < G.M < A.M
The equality signs hold only if all the observations are identical.
45
Median
The median of a set of number nxxxxx ,,,,, 4321 K arranged in an increasing or decreasing
order of magnitude is defined as:
(a). The middle observation if the number of such is odd.
(b). The arithmetic mean of the two middle observations if the number of such observations
is even.
If there are n observations in the distribution, the median is the magnitude of the ½ (n + 1)th
observations.
Example 1.8
Find the median of the following observations 8, 2, 6, 1, 8, 9, 3
Solution:
Arranging the observations in ascending order of magnitude gives 1,2,3,6,8,8,9
The number of observations is odd i.e.
The middle observation, i.e. median is 6
Example 1.9
Given the following observations
6,3,3,5,9,7,2,10
Solution:
Arranging the distribution in ascending order of magnitude gives 2,3,3,5,6,7,9,10
Since the number of observation is even i.e. 8. The median is the arithmetic mean of the two
middle observations, i.e.
5 6
2 5 .5
M ed ian+=
=
46
Median for Frequency Distribution
Steps:
a) Obtain the Cumulative frequency of a given distribution.
b) Determine 2
N
c) Determine the median class i.e. the class whose cumulative frequency is equal to or
greater than 2
N
d) Apply the formula.
Where
Lm = Lower class boundary of the median class
N = Total frequency ∑= f
Cfm = Cumulative frequency preceding the median class
fm = frequency of the median class
Cm = Class interval of the median class
Example 2.0
The following table shows the frequency distribution of 70 laurel leaves.
Length Frequency
118 -126 8
127 – 135 10
136 – 144 14
145 – 153 18
154 – 162 9
163 – 171 7
172 – 180 4
2 m
m m
m
N CfMedian L C
f
− = +
47
Total 70
Compute the median.
Solution:
Length F
Class Cf (cumulative frequency)
118 -126 8 117.5 – 126.5 8
127 – 135 10 126.5 – 135.5 18
136 – 144 14 135.5 – 144.5 32
145 – 153 18 144.5 – 153.5 50
154 – 162 9 153.5 – 162.5 59
163 – 171 7 162.5 – 171.5 66
172 – 180 4 171.5 – 180.5 70
The median position
Median thth
N
=
2
70
2
Median position = 35th; 352
=N
The median class is 145 – 153, the lower boundary of the median class 144.5 Lm = 144.5
Frequency of the median class = 18
fm = 18
Cumulative Frequency Preceding the modal class is 32 i.e Cfm = 32
Class size of the median is 9
Cm = 9
48
+
−+=
18
935.144
918
32355.144
X
XMedian
Mode
This is the observation or value with the highest frequency of occurrence in a given
distribution
Sometimes, there are two or more of such observation. In such situation the distribution is
multi modal.
Example 2.1
Given the following observations, determine the mode
37, 24, 74, 18, 24, 20, 72, 21, 23, 30, 24
Solution
The modal value is 24 because it occurs most
Example 2.2
From the values below, determine the mode: 14, 3, 6, 7, 1, 4, 6, 8, 1, 6, 3, 6, 3, 14, and 3
Solution
This is a case of a bimodal distribution. That is 3 and 6 are the modal values. In a situation
like this we find the arithmetic mean of 3 and 6.
1 4 4 .5 1 5 1 4 6
1 4 6M e d ia n
+ ==
3 6 m od
2 4 .5
The e+=
=
49
Mode for a Frequency Distribution
a. Determine the modal class i.e. class with the highest frequency
b. Apply the formular
Where Lm = lower class boundary of the modal class
d1= frequency of modal class minus frequency of the class
preceding the modal class
d2 = Frequency of the modal class minus frequency of the
class immediately after the modal class
Cm = Class interval of the modal class
Example 2.3
The number of complaints received at the service counter of a store owned by a company
during a month are as follows
Number of complaint Store
0 – 4 5
5 – 9 9
10 – 14 17
15 – 19 13
20 – 24 12
25 – 34 10
35 – 40 4
Determine the modal number of complaints at the store
1
1 2
m m
dMode L C
d d
= + +
50
Solution:
Number of complaint f
0 – 4 5
5 – 9 9
10 – 14 17
15 – 19 13
20 – 24 12
25 – 34 10
35 – 40 4
The modal class is the class (10 – 14) lower class boundary 9.5
d1=17 - 9 = 8
d2 = 17 – 13= 4
Cm = 5
2.5.3 Measures of Dispersion
In a distribution, measures of central tendency or location only describe the nature of the
distribution, but fail to give any idea as to how individual values differ from the central value
i.e. if they are closely packed around the central value or widely scattered away from it.
As a scientist once puts it "An average doesn't give us full story. An average does not
give information about dispersion of values in the distribution. The magnitude of variation
between two distributions that may have the same mean is known as dispersion.
89.5 5
8 4
8 9.5 5
12
9.5 3.333
12.833
13 int
Mode
compla s
= + +
= +
= +=≈
51
Measures of Dispersion
The following are the measures of dispersion:
i. Range
ii. Quartile deviation or semi-interquartile range
iii. Mean deviation
iv. Standard deviation
v. Variance
vi. Interquartile range
Range
The simplest measure of dispersion is the range. It is defined as the difference between the
highest and the smallest number in a set of a given data. It is very easy to compute. It is not
suitable for further statistical analysis.
Another very good use of range is in statistical quality control where an estimate of the
variance of the sample mean is obtained from R. Also, in the construction of a control chart
to know if a process is in control or not.
Example 2.4
Find the range of the given set of data: 89,120, 146, 76, 50, 49, 60
Solution
Range = highest value - smallest value
= 146 - 49
= 97
Quartile Deviation (Q.D)
a). Inter-Quartile Range: It is the difference between the upper quartile, Q3, and the lower
quartile (Q1).
b). Semi - Interquartile Range: It is defined as half of the interquartile range. It is also
known as quartile deviation, it is represented mathematically as,
52
2
13 QQQD
−=
c). Coefficient of quartile deviation
13
13
QQCQD
+−
=
Example 2.5
Ungrouped Data
Compute
i. Quartile deviation
ii. Interquartile range
iii. Co-efficient of quartile deviation for the given set of data below:
50, 64, 70, 85, 90, 100, 115, 160, 45, 54
Solution
The data should be arranged in ascending order, we have
45, 50, 54, 64, 70, 85, 90, 100, 115,160
Since the data is even
Q1 = N
4
= 10
4
= 2.5
53
Value of 2.5th item
Q1 = 50 + 0.5 (54 - 50)
= 50 + 0.5(4)
Q1 = 52
Q3 = 3N
4
= 3(10)
4
= 30
4
= 7.5
= value of 7.5th item
= 90 + 0.5(100 - 90)
= 90 + 0.5(10)
= 95
Therefore Quartile deviation 2
13 QQ −=
= 95-52
2
= 43
2
= 21.5
54
Interquartile Range = Q3 - Q1
= 95 - 52
= 43
Coefficient of Quartile deviation
13
13
+−
=
= 43
147
= 0.29
Example 2.6
Compute
i. QD
ii. IQR
iii. CQD for the data below
5, 4, 50, 44, 69, 75, 10
Solution:
Arrange in ascending order
4, 5, 10, 44, 50, 69, 75
Q1 = (N+1)
4
= 8/4 = 2
55
Value of the 2nd item
Q1 = 5
Q3 = 3(N+1)
4
= 3(8) = 6
4
value of the 6th item
Q3 = 69
therefore QD = 2
13 QQ −
= 69-5
2
= 32
IQR = Q3 - Q1
= 69 - 5
= 64
CQR = 13
13
+−
= 64
74
= 0.86
56
Grouped Data
Example 2.7
Calculate the quartile deviation and its related coefficient from the following data:
Size 10-19 20-29 30-39 40-49 50-59 60-69 70-79
Frequency 1 4 6 7 11 7 4
Solution:
position of Q1 = N
4
40/4 = 10
Size f Class boundary Cumulative frequencies
10-19 1 9.5-19.5 1
20-29 4 19.5-29.5 5
30-39 6 29.5-39.5 11
40-49 7 39.5-49.5 18
50-59 11 49.5-59.5 29
60-69 7 59.5-69.5 36
70-79 4 69.5-79.5 40
From the table, position of Q1 falls in the class boundary 30 – 39 with lower class limit = 29.5
Q1 for grouped data
4N m
l xCf
− = +
57
Where l is the lower class limit
C is the class interval
f is the frequency of quartile position
m is the cumulative frequency of class below quartile position
3Q position 304
4034
3 === XN
l = 59.5 m = 29
107
29305.59
43
3
X
XCf
mNlQ
−+=
−+=
= 59.5 + 1.43
= 60.93
13
13
QQCQD
+−
=
83.3793.60
83.3793.60
+−=
76.98
1.23=
23.0=
1
10 529.5 10
6
5 29.5 10
6 29.5 8.33
37.83
Q x
x
− ⇒ = +
= +
= +=
58
Mean Deviation
It is the arithmetic mean of the absolute differences of the values from mean, median or
mode. Average deviation has been defined as the average amount of scatter of the items in a
distribution from either the mean or the median, ignoring the signs of the deviation.
Mean Deviation = n
xx∑ − for an
Ungrouped data
and for a grouped data
Example 2.8
Find out the mean deviation of the following series
x 17-19 20-25 26-35 36-40 41-50 51-55 56-60 61-70
f 9 16 12 26 14 12 6 5
Solution
x f
18 9 20.245 182.205
22.5 16 15.745 251.92
30.5 12 7.745 92.94
38.0 26 0.245 6.37
45.5 14 7.255 101.57
53.5 12 14.755 177.06
58.0 6 19.755 118.53
65.5 5 27.255 136.275
Total 1066.87
( ) f x x
Mean deviation MDf
−= ∑
∑
x x− f x x−
59
Standard Deviation (S.D)
In 1893, Karl Pearson introduced the concept of standard deviation.
It is defined as the positive square root of the mean of the squares of the deviations from the
garithmetic mean.
Example 2.9
Find the standard deviation of the scores obtained by students in the table given below:
S/No. 1 2 3 4 5
Scores 50 46 70 89 15
3824 .5
100 38 .245
fxx
f=
=
=
∑∑
1066.87
100 10 .67
f x xM D
f
−=
=
=
∑∑
( )2
22
.f x x
S Df
fx fxOR
f f
−=
−
∑∑
∑ ∑∑ ∑
60
Solution
x
50 16
46 64
70 256
89 1225
15 1521
3082
( )2
x x−
( )2
2 7 0
55 4
3 0 8 2
5
6 1 6 .4
2 4 .8 3
x
x xS D
n
S D
=
=
−=
=
==
∑
61
Example 2.10
Find the standard deviation for the following data
Marks frequency
15 – 20 20
20 – 25 26
25 – 30 44
30 – 35 60
35 – 40 101
40 – 45 109
45 – 50 84
50 – 55 66
55 – 60 10
Solution:
X frequency
17.5 20
22.5 26
27.5 44
32.5 60
37.5 101
42.5 109
47.5 84
52.5 66
57.5 10
62
Coefficient of Variation
It is known as the relative measure of dispersion. It is given by
100X
xCV
σ=
or %100tan
Xmean
darderrorS
Example 2.11
Using the previous example on standard deviation to calculate the coefficient of variation
Note
Coefficient of variation based on standard deviation or standard error helps the comparison of
two or more distributions for their variable.
Note
579
586391
009715615961652
520
20545
520
8593502
22
.
.
..
==
−=
−=
−=
∑∑
∑∑
f
fx
f
fxSD
100Xestimate
estimateoferrordardsCV
tan=
24.227 .
.
=
=
=
100 5 39
579
100
X
X x
CV xσ
63
Skewness
It is the measure of the asymmetric of a distribution. Skewness means lack of symmetry.
Skewness indicates whether the curve is turned more to one side than to the other.
It is the degree of departure from symmetry of a distribution. A distribution can be positively
skewed or negatively skewed or zero skewed.
For a positively skewed distribution, its mean > median >mode. For a negatively skewed one,
we have mean < median<mode. Measures of Skewness give the direction and dimension of
the deviation of the distribution from symmetry. Karl Pearson’s coefficient of Skewness is
given by the formula
Coefficient of Skewness
Bowley’s coefficient of Skewness is given by
This is a measure based on the quartiles and hence is also known as quartile coefficient of
Skewness.
Kelly’s coefficient of Skewness is defined as
19
19 2
DD
MedianDDCS
−−+
=
Among the three measures, the Pearson’s coefficient is used more often
( )deviation standard
MedianMean3 or
deviation standard
modemean
−
−=
13
13
Q-Q
2Median-QQ
+=
64
Example 2.12
Calculate the coefficient of Skewness using data in the table below
Height in inches 58 59 60 61 62 63 64 65
No. of students 10 18 30 42 35 28 16 8
Solution
Using Pearson’s coefficient of Skewness
= iondardDeviatS
MedianMean
tan
−
Kurtosis
Kurtosis is the measure of peakedness of a distribution. A normal curve which is symmetrical
and bell shaped is called a mesokurtic curve. If a curve is peaked at the top, it is termed as
leptokurtic and a curve which is more flat than the normal curve is called platykurtic.
Thus, property of the distribution is measured by
2
2
42 µ
µβ =
230761
61461
761
61
461187
11482
.
.
.
.
.
=
−=
=
=
=
=
=∑∑
Therefore
SD
Mode
f
fxx
65
Coefficient of kurtosis is 322 −= βγ
For a symmetric distribution, . If then, the distribution is called leptokurtic
and if the distribution is called platykurtic.
Summary of Study session 2
In this study, you have learnt about:
1) The meaning of Biometrics;
2) The sources of biological data;
3) Also, you have been able to learn how to identify problems associated with the
collection of biological data;
4) Some basic terms in Biometrics; and
5) The various statistical tools.
Post-Test
1. Calculate the range and semi-interquartile range deviation and coefficient of quartile
Wages
(Rs)
30-32 32-34 34-36 36-38 38-40 40-42 42-44
Freq. 12 18 16 14 12 8 6
Ans: 14, 2.85, 0.08
2. Calculate the QD for the following data:
Height 58 59 60 61 62 63 64 65 66
No. of
Student
15 20 32 35 33 22 20 10 8
32 =β 32 >β
32 <β
66
3. Calculate MD for
Age (yrs) 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
No. of
patient
20 25 32 40 42 35 10 8
Ans: 10.73
4. Find the SD of the marks scored by the students given as 74,72,78,75,80, 84
Ans: 4.02
5. Calculate the standard deviation and coefficient of variation of the following data
Age (yrs) 0-10 10-20 20-30 30-40 40-50
No. of patient 5 16 18 14 47
Ans:
6. If the coefficients of variation of two groups are 60 and 80 and their standard deviations
are 20 and 16. Find their mean
Ans: 33.3 and 20.
36390713233 . . ,. === CVSDx
67
Self-Assessment Question for Study Session 2
Having studied this session, you can now assess how well you have learnt its learning outcomes
by answering the following questions. It is advised that you answer the questions thoroughly and
discuss your answers with your tutor in the next Study Support Meeting. Also, brief answers to the
Self-Assessment Questions are provided at the end of this module as a guide.
SAQ (Test of learning outcome)
1. Find the median of the following observations 8, 2, 6, 1, 8, 9, 3
2. Determine the geometric mean of the following set of data 2, 4, 8,11,4
68
References
Brown D. & Rothery P. (1993): Models in Biology: Mathematics, Statistics and Computing.
Collett D. (1991): Modeling Binary data, Published by Chapman & Hall.
Mead R, Curnow R.N. & Hasted A.M. (2002, 3rd Ed): Statistical Methods in Agriculture and
Experimental Biology, Published by Chapman & Hall.
Murray, R. Spiegel (1972, 1st Ed): Theory and Problems of Statistics.
69
Study Session 3: Sampling and Sampling Distribution
Introduction
What comes to your mind when you think of research, I guess it’s a search for some more
relevant information about a subject which several searches has been previously done. The
major purpose of conducting a research is to make inferences or generalize from a sample to
a larger population. This process of making inferences or generalization about the total
population is accomplished by the use of statistical method based on probability.
Learning Outcomes for Study Session 3
At the end of this study, you should be able to:
3.1 Identify various sampling methods
3.2 Test hypotheses
3.1 Sample
In order for you to be in a good shape in this study, you need to understand some terms:
Population can be defined as the collection of the entire item that has something in common.
Population is not limited to human beings alone. A population may either be of finite size or
infinite size.
Sample is the fractional part of the population selected in such a way that it is a
representative of the entire population. Sampling is a process of selecting the fractional part
of the population as a sample.
70
Figure 3.1: Population sampling
Why do we sample? We sample because of the following reasons:
1. Collecting information from the fractional part of the population reduces expenditure
than when the entire population is attempted.
2. A study of an entire population is impossible in most situations, hence, the sample
must be a representative of the population as much as possible.
3. Results based on sample are more accurate than those based on the entire population.
4. Samples can be studied more quickly than the entire population.
5. Data is generally more readily available and more quickly analysed when the sample
is being studied.
3.1.1 Sampling Method
Now, we consider how the sample can be selected, which is referred to as the sampling
techniques.
There are basically two types of sampling techniques
1. Probability sampling/random sampling.
2. Non-probability/non random sampling.
Probability Sampling
This is the method whereby the sample is selected/drawn from the population in such a way
that every member of the population has equal/known chance of being selected. The sample
selected is free from bias. Therefore, inferences about the population from sample are valid.
There are five types of probability sampling:
Figure 3.1
1 Simple Random Sampling
This is a sampling technique in which each unit of the population has an
being selected.
A simple Random sample of n unit is a sample selected in such a way that every N unit in the
population has an equal chance of being selected. A Simple Random Sampling is done either
WITH REPLACEMENT OR WITHOUT REPL
have in sampling. It is very easy to apply and make use of to avoid bias i.e. it is unbiased.
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
5. Multi-state sampling
71
This is the method whereby the sample is selected/drawn from the population in such a way
that every member of the population has equal/known chance of being selected. The sample
Therefore, inferences about the population from sample are valid.
There are five types of probability sampling:-
Figure 3.1: Types of probability sampling
sampling technique in which each unit of the population has an equal probability of
A simple Random sample of n unit is a sample selected in such a way that every N unit in the
population has an equal chance of being selected. A Simple Random Sampling is done either
WITH REPLACEMENT OR WITHOUT REPLACEMENT. It is the simplest technique we
have in sampling. It is very easy to apply and make use of to avoid bias i.e. it is unbiased.
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
state sampling
This is the method whereby the sample is selected/drawn from the population in such a way
that every member of the population has equal/known chance of being selected. The sample
Therefore, inferences about the population from sample are valid.
equal probability of
A simple Random sample of n unit is a sample selected in such a way that every N unit in the
population has an equal chance of being selected. A Simple Random Sampling is done either
It is the simplest technique we
have in sampling. It is very easy to apply and make use of to avoid bias i.e. it is unbiased.
72
2 Systematic Sampling
This is a sampling method in which the population units are selected at a fixed interval from
the sampling frame. Suppose there are N units in a population, selecting a sample of n
involves taking a unit at random from the first K units (where K = N/n) and every kth unit
thereafter.
n
Nkei
nsizeSample
NsizePopulationK
=
=
.
)(
)(
Where k is the sampling interval
Example
If a sample of 10 students is to be taken from a list of 100 students in a public school, then the
systematic selection of this sample entails:
Solution:
i. Determining the sampling interval k as follows:
N = 100, and n = 10
K = N/n = 100/10
K = 10.
ii. Using the random number table, select a number at random (i.e. Random start) between 1 and 10.
iii. Say 3 is selected. Then the sample would include students with the following numbers:
3, 13, 23, 33, 43, 53, 63, 73, 83 and 93.
This method does not give every unit of the population an equal chance of being selected.
The selection of the first unit (Random start) determines the whole sample.
73
The major advantages of this method are:
1. It is more convenient than simple random sampling or stratified sampling.
2. Systematic Sampling may be more efficient than the Simple Random Sampling, in the
sense that the sample selected is a true representative of the whole population.
3 Stratified Sampling
If the population is very heterogeneous, the best method of taking a sample is through
stratified sampling. In this method, the population is divided into two or more homogenous
groups called “strata”. The stratification factor can be sex, age, marital status, department,
income level, academic qualification, state of origin etc.
Then in each stratum, the sample size required is taken using either simple random or
systematic techniques. The basic purpose of stratification as compared to simple random
sampling is to obtain reduction in sampling error and an increase in precision.
4 Cluster Sampling
This is a form of probabilistic sampling in which the population is first of all divided into
clusters or groups. Then a probability sample of these clusters is taken for examination. The
method of simple random sampling or systematic sampling could be employed for the
selection of clusters.
5 Multi-Stage Sampling
When sampling is done in stages, it is referred to as Multi-Stage Sampling. It is normally
used to cut down the number of investigators and costs of obtaining a sample.
In-Text Question
_____________ is a sampling technique in which each unit of the population has an equal
probability of being selected.
a. Simple random sampling
b. Systematic sampling
c. Stratified sampling
d. Cluster sampling
74
In-Text Answer
a. Simple random sampling
Non-Probability Sampling
This is a method of selecting sample with a definite purpose. It is a sampling technique
without the consideration of probability. The choice of sampling depends entirely upon the
direction and judgement of the sampler. It does not give every unit of the population an equal
or known chance of being selected or included in the sample.
Examples of Non-probability sampling, are; Haphazard; Volunteer, i.e. any procedure of
selection that is taken in a zigzag manner. Another example is quota sampling.
1 Quota Sampling
This is a method whereby the population is sub divided into homogenous units just like
stratified sampling, but the main difference is that, the selection of unit within each stratum is
not Random. This method is commonly used in opinion polls, television audience research
and market research.
2 Sampling Frame
This is the numbered list of all the items in the population. A sampling frame should have the
following properties:
1. Completeness
2. Up-to-date
3. Adequacy
4. No-duplication
5. Conveniency
6. Accuracy
75
3 Sampling Distribution
1. Parameter: It is a numerical descriptive measure of a population, because it is based
on the observation in the population and its value is unknown. It can also be referred
to as population characteristics.
2. Statistic: It is a numerical descriptive measure of a sample to make inferences about
the PARAMETERS of a population. It can also be referred to as sample
characteristics.
If we consider how “statistic” differs from sample to sample drawn from a given population
either with or without replacement, for each sample, we can compute a statistic (mean (x ),
standard deviation, proportion.) e.g. A given statistic such as mean or a proportion will vary
from sample to sample. The distribution of this mean can therefore be examined to estimate
the amount of variation that can be expected from one sample to another.
The probability distribution of such statistic is referred to as “SAMPLING
DISTRIBUTION”.
Sample Distribution of Means ( )X
Let f(x) be the probability distribution of any given population from which we take a sample
of size n. Then, the probability distribution of the statistic X is called the sampling
distribution of means.
Theorem 1
If x1, x2, x3,…., xn represent a sample of size n, without replacement, from a finite population
of size N with population mean µ and variance σ2, then the sample mean is.
n
XX
n
i
I∑== 1
76
1. ( )xE = µ (i.e. the mean of sampling distribution equal the population mean i.e µµ =x
2. ( )
−−=
1
2
N
nN
nxv
σ
Where ( )XV denotes variance of the sampling distribution and σ2 denote the population
variance.
Theorem 2
If x1, x2, x3….., xn represent a sample of size n, with replacement σ2 from an infinite
population with mean µ and variance 2σ , then the sample mean.
n
iXX
n
i∑
== 1
(i) E(x) = µ i.e. µx = µ
(ii) ( )n
xv2σ=
The standard deviation of the mean n
x
σσ =
The standard deviation of a sampling distribution is referred to as STANDARD ERROR.
Example
A population of size N = 5 consisting of the ages of children suffering from measles in a
clinic. The ages are as follows.
X1 = 3, X2 = 5, X3 = 7 X4 = 9 and X5 = 11
i. List all the possible sample of size 2 which can be drawn for the population with
replacement i.e. from an infinite population.
ii. Find the population mean and standard deviation.
iii. Find the mean of the sampling distribution of mean
iv. Find the standard deviation of the sampling distribution of mean.
2. Solve (i) if the sampling is without replacement.
77
Solution:
1 With Replacement
N = 5
n = 2
Possible samples = 5 x 5 = 25 samples or Nn sample = 52 = 25 sample.
3 5 7 9 11
3 (3,3) (3,5) (3,7) (3,9) (3,11)
5 (5,3) (5,5) (5,7) (5,9) (5,11)
7 (7,3) (7,5) (7,7) (7,9) (7,11)
9 (9,3) (9,5) (9,7) (9,9) (9,11)
11 (11,3) (11,5) (11,7) (11,9) (11,11)
(ii) Population mean (µ)
µ = x1 + x2 + x3 + x4 + x5
5
= 3 + 5 + 7 + 9 + 11
5
µ = 7
Population standard deviation
σ2 = (3-7)2 + (5-7)2 + (7-7)2 + (9-7)2 + (11-7)2
5
= 16 + 4 + 0 + 4 + 16
5
σ2 = 40
5
σ = √σ2
78
σ = √8
σ = 2.83
(iii) The mean of sampling distribution of means.
The corresponding sample means are ( )X i.e. for row 1
Column 1 i.e the sample means are 42
53,3
2
33 =+=+, etc.
x f f x ( )µ−x ( )2µ−x ( )2µ−xf
3 1 3 -4 16 16
4 2 8 -3 9 18
5 3 15 -2 4 12
6 4 24 -1 1 4
7 5 35 0 0 0
8 4 32 1 1 4
9 3 27 2 4 12
10 2 20 3 9 18
11 1 11 4 16 16
Total 25 175 100
725
175
=
=
=∑∑
f
xfxµ
By theorem 2.
µµ =x
79
i.e the mean of sampling distribution of means must be equal to the population mean.
om theorem 2
425
1002 ==xσ
242 === xx σσ
From equation 2
nx
σσ =
Where xσ is the population variance.
2
83.2=xσ
2=xσ
(2) Without Replacement
Total possible samples
nNC Samples
N = 5
n = 2
25C = 10 samples.
Samples are:
(3,5), (3,7), (3,9), (3,11), (5,7), (5,9), (5,11), (7,9), (7,11) and (9,11)
The corresponding samples mean (x ) are: 4, 5, 6, 7, 6, 7, 8, 8, 9 and 10.
( )∑
∑ −=
f
xfx
22 µ
σ
80
x f f x ( )µ−x ( )2µ−x ( )2µ−xf
4 1 4 -3 9 9
5 1 5 -2 4 4
6 2 12 -1 1 2
7 2 14 0 0 0
8 2 16 1 1 2
9 1 9 2 4 4
10 1 10 3 9 9
Total 10 70 30
710
70 ===∑∑
f
xfxµ
( )3
10
302
2 ==−
=∑
∑f
xf µσ
Hence
73.13 ==σ
Example 2
The weights at birth of a population of four puppies delivered in one day are given below
1 2 3 4
3.9 3.6 3.4 3.7
Suppose a simple Random Sample of size n = 2 is to be selected from this population of N=4
with replacement and without replacement.
81
i. List the entire possible sample which can be drawn with replacement and without replacement.
ii. Find the population mean and standard deviation
iii. Find the mean of sampling distribution of mean
iv. Find the standard errors of the sampling distribution of mean.
Solution:
1 With Replacement
Total possible sample when N = 4 and n = 2 is Nn = 24 = 16 samples
3.9 3.6 3.4 3.7
3.9 (3.9,3.9) (3.9,3.6) (3.9,3.4) (3.9,3.7)
3.6 (3.6,3.9) (3.6,3.6) (3.6,3.4) (3.6,3.7)
3.4 (3.4,3.9) (3.4,3.6) (3.4,3.4) (3.4,3.7)
3.7 (3.7,3.9) (3.7,3.6) (3.7,3.4) (3.7,(3.7)
(ii) The population mean (µ).
iii. The corresponding sample mean (x )
x 3.9 3.6 3.4 3.7
3.9 3.9 3.75 3.65 3.8
3.6 3.75 3.6 3.5 3.65
3.4 3.65 3.5 3.4 3.7
3.7 3.8 3.65 3.55 3.7
( )2
2
3.9 3.6 3.4 3.73.65
4
0.0625 0.0025 0.0625 0.0025
40.13
0.03254
0.0325 0.18
X
N
x
N
µ
µσ
σ
+ + += = =
− + + += =
= =
= =
∑
∑
82
The mean of the sampling distribution of mean xµ
x f f x ( )µ−x ( )2µ−x ( )2µ−xf
3.4 1 3.4 -0.25 0.0625 0.0625
3.5 2 7 -0.15 0.0225 0.045
3.55 2 7.1 -0.1 0.01 0.02
3.6 1 36 -0.05 0.0025 0.0025
3.65 4 14.6 0 0 0
3.7 1 3.7 0.05 0.0025 0.0025
3.75 2 7.5 0.1 0.01 0.02
3.75 2 7.6 0.15 0.0225 0.045
3.9 1 3.9 6.25 0.0625 0.0625
TOTAL 58.4
( )
65.316
4.58
=
=
=∑∑
f
xfxµ
65.3=xµ
The variance of the sampling distribution of mean
01625.0=xσ
2
2( )
x
f x
f
µσ
−= ∑
∑
2
0.127x x
x
σ σσ
==
83
ii) WITHOUT REPLACEMENT
Total possible sample when N = 4 and n = 2
624 == CCn
N
Samples
The samples are
(3.9, 3.6) (3.9, 3.4), (3.9, 3.7) (3.6, 3.4), 3.6, 3.7) and (3.4, 3.7)
The corresponding sample means are
3.75, 3.65, 3.8, 3.5, 3.65 and 3.55
xv
f f xv
( )µ−x ( )2µ−x ( )2µ−xf
3.5 1 3.5 -0.15 0.0225 0.0225
3.55 1 3.55 -0.1 0.01 0.01
3.65 2 7.3 0 0 0
3.75 1 3.75 0.1 0.01 0.01
3.8 1 3.8 0.15 0.0225 0.0226
TOTAL 6 21.9 0.065
( )
65.36
9.21 ==
=∑∑
f
xfxµ
The variance of the sampling distribution of mean
( )2
2
∑∑ −
=f
xfx
µσ
6
063.0=
= 0.0108
84
The standard deviation of the sampling distribution of mean
104.0
0108.02
==
x
x
σσ
Theorem 3
Suppose that the populations from which samples are taken are from Normal Population
with population mean µ and variance 2σ i.e ( )2,~ σµNX .The sample mean ( ) ∑=
=n
iix
nx
1
1
the sample mean has a normal distribution with mean µ and variance n
2σ i.e. ( )2,~ σµNx .
The random variable x from which the mean of its distribution is xµ and its standard
deviation n
x
σσ = by approximately modifying the formula,
n
xZ σ
µ−=
Example 1
(i) If the uric acid values in 13 pregnant women randomly selected in a maternity
clinic are approximately normally distributed with a mean of 7.9 and standard
deviation of 0.7 mgs percent respectively. Find the probability that the sample size
will yield a mean.
(a) At most 8.2
(b) Between 7.5 and 8.2
(c) At least 7.4
85
Solution:
µ = 7.9
S.D = = 0.7
n =13
(i)
( )
[ ]55.1
194.0
3.0
137.0
9.72.82.8
≤=
≤=
−≤−=≤
zp
zp
n
xpxP σ
µ
( ) [ ] ( )9394.0
55.155.12.8
==≤=≤ φzpxp
( )
( )( ) ( )( ) ( )( )
( )
9197.0
0197.09394.0
9803.019394.0
06.2155.1
06.255.1
55.106.2
194.0
3.0
194.0
4.0
137.0
9.72.8
137.0
9.75.72.85.7 (ii)
=−=
−−=−−=
−≥−<=<<−=
<<−=
−<−<−=<<
φφ
σµ
zpzp
zp
zp
n
xpxp
σ
86
( )
[ ]( )( )
9953.0
047.01
58.21
58.21
58.2
137.0
9.74.74.7 (iii)
=−=
−−=−≤−=
−≥=
−≥−=≥
φ
σµ
xp
zp
n
xpxp
Example
A certain liquid drug is marketed in bottles containing a nominal 20 ml of drug. Tests on 10
bottles indicate that the volume of liquid in each bottle is distributed normally with mean
20.42 ml and standard deviation 0.429
(i) Estimate the percentage of bottles which would be expected to contain less than 20ml of
drug.
(ii) Estimate the bottle expected to contain between 20 ml and 20.5ml
Solution
µ = 20.42
= 0.429
n = 10
σ
87
(i)
( )
( ) [ ]( )
[ ]( )
0010.0
9990.01
09.31
09.31
0010.0
09.3
09.320
136.0
421.0
10429.0
42.202020
=−=−=
≤−=
=−=
−≤=≤=
−≤=
−≤−=≤
φ
φ
σµ
zp
or
zpxp
zp
n
xpxp
Therefore 0.1% of bottles would be expected to contain less than20 ml of drug.
(ii)
( )
( ) [ ]( ) ( )( )
7214.0
0010.07224.0
09.3159.0
59.009.35.2020
10429.0
42.205.20
10429.0
42.20205.2020
=−=
−−−=≤≤−=≤≤
−≤−≤−=≤≤
φφ
σµ
zpxp
n
xpxp
3.2 Hypothesis Testing
When making a statistical enquiry, we often put forward a hypothesis concerning a
population parameter. The essence of hypothesis testing is to aid the researcher in reaching a
decision concerning a population by examining a sample from that population.
A statistical hypothesis test is usually structured in terms of two mutually exclusive
hypotheses referred to as the NULL HYPOTHESIS and ALTERNATIVE HYPOTHESIS
denoted by H0 and H1 respectively.
88
3.2.1 Definition of some Basic Concepts
i. Statistical Hypothesis: This is an assumption made about the value of population
parameter or parameter of the distribution or population.
ii. Null Hypothesis (HO): This is a statement of insignificant or no difference between
the observed value and theoretical value. It is denoted by HO. It can either be accepted
or rejected.
iii. Alternative Hypothesis: This is a statement of significance or differences between
the observed value and theoretical value. This is a statement against the Null
hypothesis and it is usually denoted by H1. There are two types of alternative
hypothesis.
a. One tailed test
b. Two tailed test.
i. ONE TAILED TEST: A one tailed test looks for a definite decrease or a definite increase in
the parameter. The hypothesis could be
(i) Definite decrease (ii) Definite increase
H0 : µ = 25 H0 : µ = 25
HI : µ < 25 HI : µ > 25
(ii) TWO TAILED TEST: Two tailed test look for any change in the parameter.
Examples
H0 : µ = 25
HI : µ ≠ 25 (i.e µ <25 or µ > 25
(iii) LEVEL OF SIGNIFICANCE (α ): This is usually specified so as to guide one to
either accept or reject H0: if we reject H0 at this level, we say that “there is
significant evidence at the level of significance”.
(1) TYPE 1 ERROR: These errors occur if the null hypothesis is rejected when it should be
accepted. The probability of committing type 1 error is referred to as level of significance.
The probability of a type 1 error is conditional probability. It is denoted by α .
P(Rejecting HO/HO is true) = α
1 - α = P(Accepting HO /HO is true)
89
(2) TYPE 2 ERROR: These errors occur if the Null hypothesis is accepted when it should be
rejected. Its probability is denoted by β such that
β = P(Accepting H0 /H0 is false)
β−1 = P(rejecting H0 /H0 is false)
(3) CRITICAL REGION: This is the value of the test statistic that leads to the rejection of
the null hypothesis. Critical region depends on the type and the level of the test chosen. The
boundaries of the critical region are called the critical values.
In general, when performing significance test, it is useful to follow a set procedure.
(1) State the null hypothesis H0 and the alternative hypothesis HI
NB. If we are looking for a definite increase or definite decrease in the population
parameter, we use a one tailed test and if we are looking for any change we use a two-
tailed test.
(2) State the level of significance.
(3) Determine test statistic which is based on random sample and appropriate test statistic
i.e. if n is less than 30 use t statistic and n ≥ 30 use Z statistic
(4) Determine the critical region using the cumulative distribution table for the test
statistic.
(5) Compute the value of the test statistic based on the sample information.
(6) Make a conclusion: If the value of the test statistic lies in the critical region, reject Ho.
If the value of the test statistic does not lie in the critical region, accept Ho.
Hypothesis for a Single Population Mean
Example 1
Nine crippled children were asked to perform a certain task, the average time required to
perform the task was 7 minutes with a standard deviation of 2 minutes. Set the null
hypothesis that 0µ = 10 mm and = 0.1
α
90
Solution
H0 : µ = 10
H1 : µ 10
= 0.1
Test statistic
5.467.0
33
23
92
107
−=
−=
−=
−=−=
−=
cal
cal
t
ns
xt
ns
xt
µ
µ
Conclusion
Since the calt lie in the rejection region, we reject Ho and conclude that there is significant
evidence at 10% level that the population mean is not 10
Example
A researcher wishes to estimate the mean maximum weight at birth of 40 randomly selected
babies delivered in four teaching hospitals in a day. He is willing to assume that the weights
are approximately normally distributed with a variance of 144. He found out that the mean
weight for these babies is 84.3kg. Can the researcher conclude that the weight for the
population is at most 95kg? Use α = 0.01
≠
α
91
Test statistic
63.590.1
7.1040
12953.84
326.201.0
−=
−=
−=
===
−=
Z
ZZZn
xZ
tab α
σµ
Conclusion
Since the calZ lie in the acceptance region, we accept H0 and conclude that the mean weight
for the population mean is at most 95kg.
Hypothesis Testing for a Simple Population Variance
Examples
A sample of 40 observations resulting from an experiment yielded a mean and standard
deviation of 71.5 and 12 respectively. Can we conclude at 0.05 level of significance that the
population variance is 150?
Solution
05.0
150:
150:2
1
20
=≠
=
ασσ
H
H
( ) ( )2
1,21 −− nαχ and ( ) ( )2
1,2 −nαχ
= 239,975.0χ and 2
39,025.0χ
= 58.12 and 23.65
2
22 )1(
σσ−= n
X
92
( )( )
44.37150
144392
=
=χ
Conclusion: Since 2calχ lies in the acceptance region. Therefore, we accept HO and conclude
that population variance i.e. = 150
3.2.2 Hypothesis Testing on the Difference of Two Population Mean
The following may be used to test whether there is a significant difference between two
populations mean. In such cases, one of the following hypotheses is tested.
(1) H0 : 21 µµ − = 0 against
H1 : 21 µµ − ≠ 0
(2) H0 : 21 µµ − ≥ 0 against
H1 : 21 µµ − < 0
(3) H0 : 21 µµ − ≤ 0 against
H1 : 21 µµ − > 0
(1) We will consider the case when 21σ and 22σ are known
We use the test statistic
( ) ( )
2
22
1
21
2121
nn
XXZ
σσµµ
+
−−−=
Which is distributed as N(0, 1)
(2) If when 21σ and 2
2σ are unknown but assumed equal
We use the test statistic
2σ
93
( ) ( )
2
2
1
2
2121
n
s
n
s
XXt
pp +
−−−= µµ
With 221 −+ nn degree of freedom
Where ( ) ( )
2
11
21
222
211
−+−+−=
nn
snsnSp
Example
On assumption of office the General Manager of Global Textile Plc was told that the average
output of the 120 factory workers is 60 bundles of textile per month with a standard deviation
of 19 bundles per month. He discovered that this average output was too low and attributed
this to efficiency of old workers in factory, he therefore decided to retire those who are above
50 yrs and employed younger worker in their place.
After a year, the factory manager reported that the mean output of the present 115 workers in
the factory had increased to 70 bundles with standard deviation of 116 bundles per month.
The G.M refers the case to a researcher with the question that has the mean output really
increase. Answer this question by carrying out a test of 5% level of significance.
Solution
Given
16
70
115
2
2
2
==
=
σX
n
H0 : 021 =− µµ against
HI : 021 >− µµ
Sp
nn
XX
21
2121
11
)()(
+
−−− µµ
19
60
120
1
1
1
==
=
σX
n
94
= 0.05
( ) ( )
2
22
1
21
2121
nn
XXZ
σσµµ
+
−−−=
( )
37.4
29.2
10
24.5
10223.201.3
10115256
120361
1011516
12019
07060
645.1
22
95.005.0105.0
−=
−=−=
+−=
+
−=
+
−−=
==== −
cal
cal
cal
cal
cal
Z
Z
Z
Z
Z
ZZZZα
Conclusion
Since the calZ lies in the rejection region in the critical region therefore we reject HO and
conclude that the mean has really increased.
Example 2
Suppose that the random sample from population A and B respectively give the following
result:
Sample mean Sample variation Sample size
POP A 12.6 10.02 9
POP B 8.9 16.08 7
If the variance is assumed equal, test whether there is a difference in the population mean at
5% level of significance.
α
95
Solution
HO : 021 =− µµ against
HI : 021 ≠− µµ
= 0.05
( ) ( )
21
2121
11
nns
XXt
p
cal
+
−−−=
µµ
( ) ( )2
11
21
222
211
−+−+−=
nn
snsnSp
( )
( )( ) ( )( )
55.3279
08.161702.1019
71
91
09.86.12
=−+
−+−=
+
−−=
p
p
p
cal
s
s
s
t
( )
145.2
084.2
5.055.3
7.3
5.07
1
9
1
279,975.0
2,1 212
==
==
=
=+
−+
−+−
tt
tt
t
and
tab
nntab
cal
α
Conclusion
Since calt lies in the acceptance region, therefore we accept HO and conclude that there is no
difference in the population mean.
α
96
Estimation
The subject of estimation is concerned with the methods by which population characteristics,
i.e. parameters are estimated from sample information. Suppose that a population has an
unknown parameter, such as the mean and the variance. We take a random sample (or
samples) from the population and make an estimate of the unknown parameter from this.
(i) Point Estimation: This is a single numerical value used to estimate the
corresponding population parameter.
(ii) Interval Estimation: This consists of two numerical values defining an interval
with varying degrees of confidence.
Techniques for Interval Estimation
(1) Make a direct probability statement, including the parameter to be estimated.
(2) Rearrange the confidence interval so that it will be in the form of direct inequalities of
the parameter.
Estimator: This is a statistic used to estimate the value of a parameter and it is denoted by a
capital letter e.g. 2,σµ .
An Estimate: The numerical value taken by the estimator for a particular sample e.g. Sample
mean ( ) is an estimator for population mean where sample variance ( ) is an estimator
for a population variance ( )
In-Text Question
Point Estimation is a single numerical value used to estimate the corresponding population
parameter. True/False?
In-Text Answer
True
x µ 2S
2σ
97
Properties of a Good Estimator
(1) Unbiasedness: An estimator is said to be unbiased, if the average value i.e. the
expected value of the estimator is equal to the true value of parameter being estimated
i.e. ( ) µ=xE
(2) Efficiency: The efficiency of any estimator is measured in terms of its variance. The
lower the variance, the more efficient an estimator is said to be.
(3) Consistency: An estimator is consistent if the sample size becomes large when the
probability increases, then the estimate will approach the true value of the population
parameters.
(4) Sufficiency: An estimator is sufficient, if it uses all the information that a sample can
provide about a population parameter and no other estimator can provide additional
information.
Interval estimation is where we expect the true value of the population parameter to lie. We
shall be discussing the construction of confidence interval as a measure of interval estimation.
The confidence we have that a population parameter will fall within some confidence interval
= 1 - σ where σ is the significant level.
(1) Construction of Confidence Interval for Mean when Variance ( ) is known,
population is normal, n is large and point estimate is
Then, the 100(1 - ) % confidence interval for the is given by
nzx
nzx
nzx
nz
z
n
xzp
n
xZ
σµσ
σµσ
ασµ
σµ
αα
αα
αα
2121
2121
21211
−−
−−
−−
+≤≤−
≤−≤−
−=
≤−≤−
−=
2σ
x
α µ
98
Examples
(1) The mean weight of a random sample of 80 yams is 3.75 Kg with a standard deviation of
0.85kg. Construct the 95% confidence interval for estimating population µ of entire farm.
= 3.75
= 0.85
n = 80
n
xZcal σ
µ−=
A 100 (1 - ) % C.I for µ implies
nzx
nzx σµσ
αα2121 −−
+≤≤−=
( ) ( )
( )9362.3,5638.3
9362.35638.3
1862.075.31862.075.3
095.096.175.3095.096.175.380
085.075.3
80
085.075.3
96.1
975.0975.0
975.02
05.0121
⇒
≤≤+≤≤−
+≤≤−
+≤≤−
===−−
µµ
µ
µ
α
ZZ
ZZZ
Confidence for µ when variance 2σ is unknown, population is normal, n is small and point
estimate is x .
Then, the 100 (1 – α) % confidence interval for µ is given by
ns
xt
µ−= at n-1 degree of freedom
v = n – 1
x
σ
α
nstx
nst
t
nsx
tp
vv
vv
,2,2
,2,2
αα
αα
µ
µ
≤−≤−
≤−≤−
99
Rearrange to reflect µ
nstx
nstxIC
vv ,2,2. αα µ +≤≤−=
Example
The cost of assembling an item of equipments has been estimated by obtaining a sample of 15
jobs. The average cost of assembly derived from the sample was N1.23 million with a
standard deviation of N0.27 million. Form a 99% confidence interval for the mean cost of
assembling the items.
Solution
n = 15
X = 1.23
S = 0.27
ns
xt
µ−=
A 100 (1 – α) % C.I for µ implies
nstx
nstxIC
vv ,2,2. αα µ +≤≤−=
( ) ( )
( )44.1,02.1
44.102.1
21.023.121.023.1
07.0977.223.107.0977.223.1
15
27.0977.223.1
15
27.0977.223.1
977.214,995.0,21
⇒
≤≤=+≤≤−=
+≤≤−=
+≤≤
−=
==−
µµ
µ
µ
α ttv
100
Example
A sample of 25 years old boys yielded a mean weight and S.D. of 73 and 10 pounds
respectively. Assuming a normal distributed population, find the 90% confidence interval for
µ .
Solution
n = 25
S = 10
X = 73
ns
xtcal
µ−=
A 100 (1 – α) % C.I. for µ implies
nstx
nstxIC
vv ,2,2. αα µ +≤≤−=
( ) ( )
( )422.76,578.69
422.76578.69
422.373422.373
2711.1732711.173
25
10711.173
25
10711.173
711.124,95.0,21
⇒
≤≤=+≤≤−=
+≤≤−=
+≤≤
−=
==−
µµ
µ
µ
α ttv
Confidence Interval for Variance ( 2σ )
A 100 (1 – α) % C.I for 2σ
( ))1(),(
)1(
)1()1(
1
22
22
22
2
−−≤≤
−−−
nX
sn
nX
snαα
σ
101
Examples
Suppose a sample 2S =10.42 is obtained from a random sample of size 25 from a normal
population, obtain a 95% confidence interval.
Solution:
n=25
2S =10.42
where V= n-1
( )
( ) 40.12
36.392
24,025.02
1,025.0
224,975.0
21,975.0
==
==
−
−
χχ
χχ
n
n
A 100 (1-α) % confidence interval for 2σ implies
( )( )
( )( )
( )( ) ( )( )
( )17.20,35.6
17.2035.6
40.12
42.1024
36.39
42.1024
11
2
2
2
,2
22
2
,21
2
⇒
≤≤=
≤≤=
−≤≤−
−
σ
σ
χσ
χ αα vv
snsn
Example 2
In an experiment to determine the amount of carbon monoxide emitted by generator in a day,
a sample of 15 normal generators of the same capacity and brand was taken from an
industrial area, and the sample yielded a mean of 96 and S.D. of 35. Construct the 95%
confidence interval for 2σ .
102
Solution
S2 = (35)2
n = 15
A 100 (1-α) % confidence for 2σ implies
( )( )
( )( ) vv
snsn
,2
2
22
,21
2
2 11
αα χσ
χ−≤≤−
−
( ) ( )
( )72.3046,58.656
72.304658.656
629.5
3514
12.26
3514
629.5
12.26
2
22
2
214,025.0
214,975.0
⇒
≤≤=
≤≤=
=
=
σ
σ
χ
χ
3.2.3 Probability Distribution and Hypothesis Testing
Please follow on as the focus of this study shifts to Binomial Distribution:
Binomial Distribution
1. An experiment whose outcome cannot be predicted is termed a random experiment.
2. The totality of all outcomes of a random experiment constitutes spaces, known as the
sample space.
3. A finite collection of these sample spaces is termed as an event associated with the
random experiment (each point of a sample space is referred to as an elementary event).
4. A real valued function connected with the outcome of a random experiment known as a
random variable. If a random variable takes almost a countable number of values, then it is
called a discrete random variable. On the other hand, if a random variable assumes all the
values between certain limits, it is said to be a continuous distribution. The random variable x
is said to have a Binomial distribution with 2 parameters n,p written as x in B~(n, p) where n
is the number of independent trials and p the probability of success. Then, the probability of x
success in independent trial when P is constant from trial to trial is P(x) = nCxPxqn-x x = 0, 1, 2
… where p + q = 1
103
Mean µ = np
Variance 2σ = npq
Properties of Binomial Distribution
i. The experiment has n independent trials
ii. The trials are independent
iii. Only two possible outcomes are found in a trial e.g. success or failure.
iv. The probability of success p is constant from trial to trial.
Fitting Binomial Distribution and Goodness-of-Fit
Test
A goodness-of-fit is appropriate when one wishes to decide if an observed distribution of
frequency is incompatible with some preconceived or hypothesized distribution. For
example; we wish to determine whether or not a sample of observed values of some random
variables is compatible with the hypothesis that it was drawn from a population of values that
are normally distributed.
NOTE
In case of categorical data compute the expected values Ei using the formula
ncrEi
iii
.=
Where r i represent row total.
ci represent Column total.
nirepresent Grand total.
( )nx
qpCxXp xnxx
n
.........,.........2,1,0=== −
and such that npX =
The expected frequency was obtained by ( )xXNPEi ==
104
Example 1
The following table summarizes 100 observations of the discrete random variable x
x 0 1 2 3 4
f 25 14 25 26 10
Perform a test−2χ of hypothesis that has a binomial distribution.
Solution
Assume X has a binomial distribution with n=4
x f fx
0 25 0
1 14 14
2 25 50
3 26 78
4 10 40
82.1100
182 ===∑∑
f
fxX
To get P, we let npX =
545.0455.01
1,455.04
82.1
=−=
=−==
=⇒
q
qpp
n
XP
105
We now compute
( )
( ) 0882.00
40
4,3,2,1,0
4
40 =
==
=
== −
qpXp
x
qpx
xXp xnx
( )
( )
( )
( ) 0429.04
44
2054.03
43
03689.02
42
2946.01
41
04
13
22
31
=
==
=
==
=
==
=
==
qpXp
qpXp
qpXp
qpXp
The expected frequency is calculated as ( )xXNPEi == we have,
X 0 1 2 3 4
P (X=x) 0.0882 0.2946 0.3689 0.2054 0.0429
N P (X=x) ‘8.82 29.46 36.89 20.54 4.29
NOTE: - If expected frequency is less than 5 combine with adjacent class. Therefore, we
have
X 0i Ei (0i-Ei)2 (0i Ei)2/Ei
0 25 8.82 261.79 29.68
1 14 29.46 239.0 8.11
2 25 36.89 141.37 3.83
3 26 20.54 29.81 1.45
4 10 4.29 32.60 7.60
50.67
106
H0: Distribution follows binomial distribution.
H1: Distribution does not follow binomial distribution.
67.502 =calχ
991.522,95.0
2114,95.0
21,1
2 ==== −−−−− χχχχ α mvtab
Decision since 22tabcal χχ > at 5%, we reject H0 and conclude that the random variable X does
not follow a binomial distribution.
Example
Seven coins are tossed and the number of heads obtained was noted. The experiment is
repeated 128 times and the following distribution is obtained.
No of Heads 0 1 2 3 4 5 6 7 Total
Frequency 7 6 19 35 30 23 7 1 128
Fit a binomial distribution assuming that
(i) The coin is unbiased
(ii) The nature of the coin is not known
Solution:
(i) When the coin is unbiased, p = q = (1/2)
( ) ( )77
21
xxnx
xn CqpCxXp === −
107
X P(x) Expected frequency
0 ( )707
21C
128(1/2)7 7 =1
1 ( )717
21C
128(1/2)7 7 = 7
2 ( )727
21C
128(1/2)7 21 = 21
3 ( )737
21C
128(1/2)7 35 = 35
4 ( )747
21C
128(1/2)7 35 = 35
5 ( )757
21C
128(1/2)7 21 =21
6 ( )767
21C
128(1/2)7 = 7
7 ( )777
21C
128(1/2)7 7 =1
(ii) When the nature of the coin is not known
5167.04833.011
4833.0
4832571.07
3828.3
3828.3128
433
=−=−=≈
==
===
pq
p
npMean
( ) ( ) ( ) xxxCxXp −= 77 5167.04833.0
108
X P(x) Expected frequency
0 (0.5167)7 128P(0) = 1.25856 = 1
1 7 (0.4833) (0.5167)6 128P(1) = 8.24024 = 8
2 21 (0.4833)2(0.5167)5 128(2) = 23.1227 = 23
3 35 (0.4833)3 (0.5167)4 128P(3) = 36.0467 = 36
4 35 (0.4833)4 (0.5167)3 128P(4) = 33.7166 = 34
5 21 (0.4833)5 (0.5167)2 128P(5) = 18.9223=19
6 7 (0.4833)6 (0.5167) 128P(6) = 5.8997 = 6
7 1(0.4833)7 128p(6)=0.7883=1
Normal Distribution
The Binomial and Poisson distributions are useful theoretical distributions for discrete
variable. In many problems, the variables concerned may take all values within a given time.
In such cases, the Binomial and Poisson distribution cannot be applied. In order to deal with
such problems, we will have a continuous distribution. The normal distribution is one of the
most useful continuous distributions that we use to describe continuous random variables.
This is a unimodal symmetric continuous distribution having two parameters µ (the mean)
and 2σ (the variance).
( )( )2,~
var,~
σµNX
iancemeanNX
Features of Normal Distribution
i. It is a continuous distribution.
ii. It is a unimodal distribution.
iii. The total area under the curve is one.
iv. The curve is symmetrical and has a bell shape
v. The mean = median = mode.
109
The probability function of a normal distribution is
( ) ∫∞
∞−
−−== dxexXP
x 2
2 2
2
1 σµ
πσ
Example on Normal distribution
Fit a normal curve to the following data
Class
interval
Freq Upper
limits
(x)
σµ−= x
Z
φ (z) Cumulative
frequency 100
φ (z)
Expect
ed
freque
ncy
94.5 – 99.5 1 99.5 -2.46 0.0069 0.69 0.69
99.5 – 104.5 4 104.5 - 1.99 2.33 2.33 1.64
104.5–109.5 3 109.5 -1.57 0.0582 5.82 3.49
109.5– 1145 3 114.5 -1.06 0.1446 14.46 8.64
114.5–119.5 12 119.5 -0.59 0.2776 27.76 13.3
119.5–124.5 27 124.5 -0.12 0.4522 45.22 17.46
124.5–129.5 12 129.5 0.35 0.6368 63.68 18.46
129.5-1345 16 134.5 0.81 0.7910 79.1 15.42
134.5-139.5 12 139.5 1.28 0.89.97 89.97 10.87
139.5-144.5 8 144.5 1.75 0.9599 95.99 6.02
144.5-149.5 1 149.5 2.22 0.9868 98.68 2.69
149.5-154.5 1 154.5 2.68 0.9963 99.63 0.95
7.10
8.125
75.125
1 =≈
==
−
∑∑
n
f
fxX
σ
110
Poisson Distribution
Poisson distribution is a discrete probability distribution. This distribution occurs at random
points of time and space wherein our interest lies only in the number of occurrences of the
event and not in the non-occurrence of it. When is it necessary to apply Poisson distribution?
(i) n (the no of trials) is very large or not readily obtainable (n> 50, 5<np )
(ii) P (probability of success) is very small relative to q (the probability of failure).
The mass probability function of a poisson distribution is given by
( ). 2, 1, 0, x Where
!
.mxXP
x
…=
=−
x
e m
Uses of Poisson Distribution
i. It can be used in occurrence of accident in the factory.
ii. It can be used to find the demand for a product.
iii. It is also being used when the variable concerned occurs in a random manner, e.g.,
number of customers entering a banking hall at a specific time.
iv. It can be used to detect the occurrence of flux in a production process.
Properties of Poisson distribution
If ( )mpX o~
i. np or npMean == λ
ii. npq== 2 Variance σ
iii. The rate or population average must be known.
111
Examples on Poisson distribution
The following table shows the number of days (f) in a 50 day period during which automobile
accidents occurred in a city. Fit a Poisson distribution.
Solution
No. of accidents (x) 0 1 2 3 4
No. of days (f) 21 18 7 3 1
Solution
0.90x
50
45
13171821
1x43x318x1x7x221x0
f
fx Mean
=
=++++
+++==∑∑
For a Poisson distribution mean = 0.90
0.90=λ
( ) ( )
( ) ( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( )
( ) ( ) ( )0111.0
24
39.0
!4
90.04
494.06
29.0
!3
90.03
1647.02
19.0
!2
90.02
3659.090.0!1
90.01
4066.0!0
90.00
!
90.0
!
490.0
390.0
290.0
90.0190.0
90.0090.0
90.0
=====
=====
=====
====
====
===
−
−
−
−−
−−
−−
XpeXp
XpeXp
XpeXp
ee
Xp
ee
Xp
x
e
x
exXp
xxλλ
112
( )xXP N frequency lTheoretica ==
No of accidents 0 1 2 3 4
Theoretical 20.33 0r 20 18.30 0r 18 8.24 or
18
2.47 0r 3 0.56 or
1
Observed frequency 21 18 7 3 1
The fit of the Poisson distribution to the given data is good.
Example 2
Fit a Poisson distribution and test if it follows a Poisson law.
x 0 1 2 3 4
f 43 38 22 9 1
Solution
1
1113
113
19223843
1493222381430
=
==++++
++++==∑∑
λ
λ XXXXX
f
fx
x f P(x) Expected frequency
0 43 0.3679 41.6
1 38 0.3679 41.6
2 22 0.1839 20.8
3 9 0.0613 6.9
4 1 0.0153 1.7
113
Revised table
iO Ei
( )2
EO ii− ( ) EEO iii
2−
0 41.6 1730.6 41.6
1 41.6 1648.4 39.6
2 20.8 353.4 17.0
10 8.6 2.0 0.2
Total 98.4
4.982 =calχ
991.522,95.0
2114,05.01
21,1 === −−−−−− χχχ α mv
Decision: Since 22tabcal χχ > , we reject H0 and conclude that the test does not follow a Poisson
law.
Summary of Study Session 3
In this study session, you have learnt that:
1. A larger number of individual usually typifies population to be studied and compared;
therefore, sampling methods are used to facilitate comparison.
2. Despite the importance of descriptive statistics, the scientific philosophy of our time
places more emphasis on explanation, prediction and generalization than on pure
description we have discussed the basic sampling procedures, test of hypothesis,
confidence intervals and fitting of distributions to data.
114
Self-Assessment Question for Study Session 3
Having studied this session, you can now assess how well you have learnt its learning outcomes
by answering the following questions. It is advised that you answer the questions thoroughly and
discuss your answers with your tutor in the next Study Support Meeting. Also, brief answers to the
Self-Assessment Questions are provided at the end of this module as a guide.
SAQ (Test of learning outcome)
1. Nine crippled children were asked to perform a certain task, the average time required to
perform the task was 7 minutes with a standard deviation of 2 minutes. Set the null
hypothesis that 0µ = 10 mm and = 0.1
2. A researcher wishes to estimate the mean maximum weight at birth of 40 randomly
selected babies delivered in four teaching hospitals in a day. He is willing to assume that the
weights are approximately normally distributed with a variance of 144. He found out that the
mean weight for these babies is 84.3kg. Can the researcher conclude that the weight for the
population is at most 95kg? Use α = 0.01
References
Mead R. (1988): The design of Experiments: Statistical Principles for Practical Application,
Published by CUP.
Rustagi J.S. (1985): Introduction to Statistical Methods. New Jersey: Rowman and Nanheld.
Thompson S.K. (2002, 2nd Ed): Sampling, Published by Wiley.
α
115
Study Session 4: Design and Analysis of Biological Experiments
Introduction
An experiment can be described as an act of operation undertaken in order to obtain new facts
or to confirm or deny the results of previous experiments.
Experiments are conducted to establish the effect of one or more independent variables on a
response. These variables are often called treatments or factors. Examples of variable are;
different drugs; different fertilizer and so on. Differences in animal breeds are caused by
genetic and environmental differences.
Learning Outcome for Study Session 4
At the end of this study, you should be able to:
4.1 Explain experiment involved in the planning of experimental designs;
4.2 State and explain the procedures involved in planning and execution of biological
experiments;
4.3 Definitions of Relevant Terms in Design and Analysis of Biological
Experiments
4.4 Explain the concept of the completely randomized design and the randomized complete
block design: use the analysis of variance (one way and two-way classification)
techniques to solve problems in the CRD and RCBD.
4.1 Design of an Experiment
The design of an experiment can be defined as the order in which an experiment is
performed. It deals with the planning, execution, analysis and interpretation of results of an
experiment.
For instance, one method of determining
an agreed standard. The drug is applied at a constant rate to an experimental unit and the dose
at which death or some other recognizable events (e.g. recovery) occur is noted. This critical
dose is called the tolerance or threshold. This is repeated for a number of experimental units
using the drug under analysis.
The tolerances vary from one experimental unit to another. Also, the clinical investigation of
the use of new medical treatment raises similar
of a new treatment is compared to that of a controlled treatment. The experiment consists of
applying one treatment to each unit and making one or more observations.
The assignment of treatments is under experi
experiment is the comparison of treatments rather than the determination of absolute values.
This type of experiment is called comparative.
4.1.1 Components of Experimental
The design of an experiment has three essential components:
Figure 4.1
116
Design of an Experiment
The design of an experiment can be defined as the order in which an experiment is
performed. It deals with the planning, execution, analysis and interpretation of results of an
For instance, one method of determining the efficacy of a drug is by direct comparison with
an agreed standard. The drug is applied at a constant rate to an experimental unit and the dose
at which death or some other recognizable events (e.g. recovery) occur is noted. This critical
d the tolerance or threshold. This is repeated for a number of experimental units
using the drug under analysis.
The tolerances vary from one experimental unit to another. Also, the clinical investigation of
the use of new medical treatment raises similar problems of experimental design. The effect
of a new treatment is compared to that of a controlled treatment. The experiment consists of
applying one treatment to each unit and making one or more observations.
The assignment of treatments is under experiment’s control when the objective of such an
experiment is the comparison of treatments rather than the determination of absolute values.
This type of experiment is called comparative.
Experimental Design
has three essential components:
Figure 4.1: Components of Experimental Design
1. Estimation of error
2. Control of error
3. Proper interpretation of result
The design of an experiment can be defined as the order in which an experiment is
performed. It deals with the planning, execution, analysis and interpretation of results of an
the efficacy of a drug is by direct comparison with
an agreed standard. The drug is applied at a constant rate to an experimental unit and the dose
at which death or some other recognizable events (e.g. recovery) occur is noted. This critical
d the tolerance or threshold. This is repeated for a number of experimental units
The tolerances vary from one experimental unit to another. Also, the clinical investigation of
problems of experimental design. The effect
of a new treatment is compared to that of a controlled treatment. The experiment consists of
applying one treatment to each unit and making one or more observations.
ment’s control when the objective of such an
experiment is the comparison of treatments rather than the determination of absolute values.
1. Estimation of Error:
similar manner and the differences among them are measured, these differences are
called experimental errors. This error serves as a basis for taking decision whether an
observed difference is natural or artificial.
2. Control of Error: A common method of controlling or reducing experimental error is
called blocking. A good experiment must inculc
experimental error.
3. Proper Interpretation of Results:
design is its ability to resist all environmental factors, which are not part of the
treatments being considered.
4.1.2 Principles of Experimental
In all experimental designs, there are three essential principles:
Figure 4.2
1 Replication
Replication is the repetition of the same treatment on different experimental units. When
treatment appears more than once in an experiment, it is said to be repeated.
117
Estimation of Error: This is when two or more experimental plots are treated in a
similar manner and the differences among them are measured, these differences are
experimental errors. This error serves as a basis for taking decision whether an
observed difference is natural or artificial.
A common method of controlling or reducing experimental error is
called blocking. A good experiment must inculcate every possible means of reducing
Proper Interpretation of Results: An essential characteristic of an experimental
design is its ability to resist all environmental factors, which are not part of the
treatments being considered.
Experimental Design
In all experimental designs, there are three essential principles:
Figure 4.2: Principles of Experimental Design
Replication is the repetition of the same treatment on different experimental units. When
treatment appears more than once in an experiment, it is said to be repeated.
1 Replication
2 Randomization
3 Blocking/Local Control
when two or more experimental plots are treated in a
similar manner and the differences among them are measured, these differences are
experimental errors. This error serves as a basis for taking decision whether an
A common method of controlling or reducing experimental error is
ate every possible means of reducing
An essential characteristic of an experimental
design is its ability to resist all environmental factors, which are not part of the
Replication is the repetition of the same treatment on different experimental units. When a
treatment appears more than once in an experiment, it is said to be repeated.
118
Function of Replication
The following are the functions of Replication:
i. To provide an estimate of experimental error.
ii. To improve the precision of an experiment by reducing the standard deviation of a
treatment mean.
iii. To measure the scope of inference of the experiment by selection and appropriate
use of variable experimental units.
iv. To effect control of error variance.
2 Randomization
This is the use of a random process to assign experimental units to treatments so that every
experimental unit considered has an equal chance of receiving a treatment. The unbiasedness
of the treatment means and experimental error estimates is ensured. This is a major reason for
randomization.
Methods of Randomization
The following are the method of Randomization:
i. Use random number table.
ii. Tossing a coin.
iii. Shuffling numbered cards.
iv. Drawing numbered balls from a bag.
Functions of Randomization
• To eliminate systematic bias in treatment assignment.
• It ensures that each unit has an equal chance of receiving a treatment.
3 Blocking/Local Control
Blocking is the process by which an experimental material is divided into blocks of
homogeneous units. This is to allow for certain restrictions on randomization to minimize
experimental error.
119
In-Text Question
The following are the methods of Randomization EXCEPT?
a. Use random number table.
b. Tossing a coin.
c. Shuffling greeting cards.
d. Drawing numbered balls from a bag
In-Text Answer
c. Shuffling greeting cards
4.2 Steps Involved in the Planning and Execution of an Experimental
Design
When planning and executing an experiment, the following are the procedures to be followed
in order to achieve a great success:
1. Problem Recognition and Definition: This is the first step in the planning and
execution of an experiment. After the problem has been recognized, it must be clearly
defined. The more we understand the problem at hand, the easier it is for us to find an
appropriate solution to it.
2. Objective of the Experiment: A clear statement of the objective is very important.
This may be in form of a hypothesis to be tested or a question which requires an
answer. If the objective is more than one, arrange them in their order of priority.
3. Analysis of the Problem and Objectives: The aims of the objectives must be
carefully studied and the necessity of the experiment stated in precise terms.
4. Treatment Selection: Reasonable selection of the treatments goes a long way in
achieving the desired results.
5. Selection of the Experimental Units: This involves the selection of experimental
units from the population on which inferences are to be made. The selected or chosen
experimental units must be a true representative of the entire population under
consideration.
6. Selection of the Experimental Design: The objective of the experiment should be
120
carefully considered here; so that an appropriate design will be given the needed
result that should be preferred.
7. Selection of Units for Observation and the Number of Replications: This involves
the determination of the number of replications and the plot size to be chosen in order
to produce the desired result.
8. Data to be collected: Consideration of data to be collected is indispensable. No
essential data should be omitted.
9. Outlining Statistical Analysis and Summarization of Results: Given the summary
of the expected results in order to show the expected effects of the treatments on the
experimental units. This can be presented on a summary table or graphs. List its
sources, variation and corresponding degrees of freedom in the analysis of variance
table. Check whether the expected results of the experiment are relevant to the
objective.
10. Performing the Experiment: The experiment should be conducted without any
element of personal biases. Record all observations properly and later check them
with a view to minimizing recording errors or deleting data that are obviously
erroneous.
11. Data Analysis and Interpretation of Results: Analyse all data and give the
interpretation of the results obtained from the experiment. Data should be scrutinized
and carefully analyzed so that the result will be accurate. If either the data is not well
analyzed or the results not well interpreted, the conclusions may be wrong.
In-Text Answer
The objective of the experiment should be carefully considered; so that an appropriate design
will be given the needed result that should be preferred. The above statement explains one of
the following:
a. Selection of the Experimental Design
b. Selection of the Experimental Units
c. Data to be collected
d. Outlining Statistical Analysis and Summarization
In-Text Question
b. Selection of the Experimental Units
121
4.3 Definitions of Relevant Terms in Design and Analysis of Biological
4.4 Experiments
The following are the definition of relevant terms:
I. Factors: These are variables that can affect the outcome of an experiment. The
factors are always under investigation. Examples of factors are time, humidity,
temperature, etc.
II. Factor levels: Every factor in an experiment may take two or more level for
quantitative factors while for qualitative factors, the levels could be fertilizers
such as NPK fertilizer.
III. Treatment: A treatment is the procedure whose effect is to be measured and
compared with other treatments.
IV. Experimental Units: An experimental unit is the smallest division of the
experimental materials, such that any two units may receive different treatments in
the experiment. It can also be defined as the unit of material to which one
application of a treatment is applied.
V. Pairing: It is defined as the method employed in assigning the effectiveness of a
treatment or experimental procedure that make use of related observation resulting
from non-independent samples. A hypothesis test based on these types of data is
called a paired comparison test.
VI. Response: The numerical result obtained for a specific experimental unit is
known as a response.
VII. Sample Statistic: These are descriptive measures computed from the data of a
sample. Examples are the sample mean, sample variance, sample standard
deviation.
VIII. Population parameter: These are descriptive measures computed from the data
of a population. For example, population mean ( ), variance ( ), population
standard deviation, population range and so on.
4.4.1 Analysis of Variance (ANOVA)
The analysis of variance provides a method of testing whether the means of two or more
normal populations with unknown but common variances are equal. The analysis of variance
µ 2σ
122
approach is to partition the variation within the experiment into meaningful components. The
purpose is to compare the variations attributable to differences between the populations to the
variation attributable to random variation within the various populations. If the former is
sufficiently large in comparison to the latter, then the procedure will result in a rejection of
the null hypothesis that all population means are equal.
Therefore, analysis of variance may be defined as a technique whereby the total variation
present in a set of data is partitioned into several components. Associated with each of these
components is a specific source of variation, so that in the analysis, it is possible to ascertain
the magnitude of the contributions of each of these sources to the total variation. It was
developed by R.A Fisher, and used mainly in agricultural experiment. The ANOVA can be
split into two types: one-way classification and two-way classification.
Assumptions underlying the use of ANOVA
1. Independence of Observations
This is the most important assumption underlying the F-test in ANOVA, and violation
of this can lead to serious inferential errors. In experimental research, independence is
achieved by random assignment of subjects to the treatment groups.
2. Normal Distribution of X:
Fisher’s F is an appropriate model for the distribution used in ANOVA only when the
means and variances of X are independent. This can be safely assumed only if X is
normally distributed.
3. Equality of Error variances (Homoscedasticity or homogeneity of variance) i.e.
222
21 jσσσ =−−−==
4. Additivity (treatment additive)
5. Independent error
In the two ways classification model, our interest is centred on testing the equality of the
treatment mean. Hence, the hypothesis of interest is
(i) H0 : nµµµ =−−−== 21
(ii) H1 : At least one ji µµ ≠
123
Classification of Experimental Design
There are two categories of experimental design (i.e. experiment in which only a single factor
varies while others remain constant). They are the complete Block Design and the Incomplete
Block Design. Examples of the former are Completely Randomized Designs, Randomized
Complete Block Designs and Latin Square Designs. While for the latter, we have Lattice
Design and Group Balanced Block Design.
The Complete Block Design is suitable for experiments with small number of treatment
while the Incomplete Block Design is applicable for experiments with large number of
treatments.
1. The Completely Randomized Design (CRD)
There is a design in which the treatments are randomly assigned to the experimental units or
in which experimental units are randomly assigned to treatments. The design is simple and
therefore widely used. However, it should be used when the units receiving the treatments are
homogenous if the experimental design should be considered.
Example of cases where CRD is applicable:
a. Laboratory experiment, where a quality of material is thoroughly mixed and then divided into small lots to form the experimental units to which treatments are randomly assigned.
b. Plant and animal experiment where environmental effects are much alike.
The model of CRD can be given as:
.n1,2, j and..k 1,2, i Where ……………=………=
++= ijiij etY µ
ijY = observation of the ith item of the jth sample.
µ = Overall mean (grand mean)
it = treatment effect
ije = random error effect.
ije measures the deviation of the jth observation of the ith sample from the
corresponding population mean and it is assumed to be normally and independently
distributed with mean zero and common variance for the treatment. 2σ
124
The hypothesis being tested
H0 : nµµµ =−−−== 21
H1 : At least one ji µµ ≠
Table of sample values for the CRD
Treatment
1 2 3 - - - j - - - k
y11 y12 y 13 - - - yij - - - y1k
y21 y22 y 23 - - - y2j - - - y2k
. . . . .
. . . . .
. . . . .
Total T1 T2 T3 - - - Tj - - - Tk T..
Mean y .1 y .2 y .3 - - - y .j - - - y .k y ..
Where T.j = total of the observation taken under treatment j
y .j = observed mean for treatment j = j
j
n
T
T.. = overall total for all observations taken
Yij= ith observation receiving jth treatment
i = 1, 2, - - - , n, j = 1, 2, - - - k
∑=
==k
jjn
N
Ty
1
;..
..
∑ ∑ ∑= = =
=⇒k
j
k
j
nj
i
YijjT1 1 1
.T..
125
The computations of sum of squares are as follows:
SSTotal = ( )∑∑= =
−k
j
nj
i
YYij1 1
2.. . . . ……………………………. (1)
N
TYij
..22∑∑ − . . ………………………………………… (2)
To partition the total sum of squares, we introduce
in to equation (1)
SSTotal = ( )∑∑= =
−+−k
j
nj
i
YjYjYYij1 1
2....
( )( )[ ]∑∑= =
−+−=k
j
nj
i
YjYjYYij1 1
2....
( ) ( )( ) ( )∑∑∑∑∑∑= == == =
−+−−+−=k
j
nj
i
k
j
nj
i
k
j
nj
i
YjYYjYjYYijjYYij1 1
2
1 11 1
2.......2. …(3)
The middle term in (3) becomes zero since the sum of deviations of a set values from their
mean equals to zero i.e
( )( )∑∑= =
−−k
j
nj
i
YjYjYYij1 1
....2 =0
∑=
=⇒nj
i
Yijj1
.T
jYjY .. +
126
SSTotal ( ) ( )∑∑∑∑= == =
−+−=k
j
nj
i
k
j
nj
i
YjYjYYij1 1
2
1 1
2....
( ) ( )∑∑∑== =
−+−=k
j
k
j
nj
i
YjYnjjYYij1
2
1 1
2....
When the observations are the same in each group, we get
SSTotal ( ) ( )∑∑∑∑= == =
−+−=k
j
nj
i
k
j
nj
i
YjYnjYYij1 1
2
1 1
2.... …………………… (4)
Hence, equation (4) can be symbolically written as
SST = SSt + SSE
Therefore,
SST = ( )∑∑= =
−k
j
nj
i
YYij1 1
2.. . .
= ∑∑= =
−k
j
nj
iij N
TY
1 1
22
..
Where N = nk; N
T ..2
= correction factor
( )
SSSSSS TrTotalE
k
j
K
Jtr
N
T
nj
jT
YjYnSS
−=
−=
−=
∑
∑
=
=
1
22
1
2
...
...
Where SSE = ( ) ∑∑∑∑∑== == =
−⇒−k
j
jk
j
nj
iij
k
j
nj
ij nj
TYYYij
1
2
1 1
2
1 1
2 ).()(.
∴
127
The ANOVA table for one-way classification is given as follows:
Source of
Variation
Degree of
Freedom
Sum of
Squares
(SS)
Mean Square
(MS)
F-ratio or
FO
Treatment K – 1 SSTr
Error K(n – 1) SSE
( )1−=
nKSSMS E
E
Total Kn - 1 SSTotal
Decision:
To reach a decision, the computed F-ratio must be compared with the critical value of F
obtained from the table. If the computed value of F-ratio exceeds the critical value of F, the
null hypothesis is rejected. Otherwise, we accept the null hypothesis and reject alternative
hypothesis.
Example 1
Four types of pain relief drugs were administered on 12 patients in a hospital and the number
of hours of pain relief provided by the drug recorded as follows:
Drugs
A B C D
6 7 3 4
3 10 6 2
4 8 5 9
Total 13 25 14 15
Solution:
TrTr MS
K
SS =− 1 E
TrMS
MS
128
Ho: µ1 = µ2 = µ3
H1: Not all population means are equal i.e. µ1 ≠ µ2 ≠ µ3
Grand total (T..) = 13 + 25 + 14 + 15
= 67
SStotal = ∑∑= =
−k
j
nj
iij N
TY
1 1
22
..
N = nk i.e. number of rows x number of columns
N = 3 x 4
= 12
SSTotal = 62 + 72 + 32 + … + 52 + 92 – (67)2
12
= 445 – 374.03
= 70.97
SSTr = N
T
nj
Tk
j
j ... 2
1
2
−∑=
=
= 405 – 374.03
= 30.97
SSE = SSTotal – SSTr
= 70.97 – 30.97
= 40.00
( )12
67
3
15142513 22222
−+++
129
ANOVA TABLE
Source of
Variation
Degree of
Freedom
Sum of
Squares
Mean
Square
F-ration
Treatment 3 30.97 10.32 2.064
Error 8 40.00 5.00
Total 11 70.97
The critical value is obtained as follows:
Fα, (v1, v2) = Critical value
Where α = level of significance
v1 = k – 1
v2 = k (n-1)
F0.05, (3,8) = 4.07
Decision: Since Fcal < Ftab, we accept Ho and conclude that there is no significant difference
in the mean number of hours of pain relief provided by the four types of drugs.
130
Example 2:
Mathematics tests are conducted by 5 different examiners separately. The numbers of
candidates in the categories of distinction, credit, pass, fail are as shown below:
Examiner
Grade A B C D E
Distinction 5 7 4 10 6
Credit 13 3 5 8 15
Pass 21 31 19 27 33
Fail 39 17 28 34 19
Test the hypothesis that the examiners do not differ in their standards of awards. Use α
= 0.05
Solution
Ho: µ1 = µ2 = µ3 = µ4 = µ5
Hi: µ1 ≠ µ2 ≠ … ≠ µ5
Grand total = 344
( )
2.2493
8.5916841020
3441934475
222222
=−=
−+++++= KSST
SSTr =
= 6038.5 – 5976.8
= 121.7
SSE = SST – SSTr
= 2493.2 – 121.7
20
)344(
4
)73()79()56()58()78( 222222
−++++
131
= 2371.5
The corresponding ANOVA table is presented thus,
Source of
Variation
Degree of
Freedom
Sum of
Squares
Mean
Square
F-ratio
Treatment 4 121.7 30.43 0.1925
Error 15 2371.5 158.10
Total 19 2493.2
F0.05, (4, 15) = 3.06
Decision Rule: If caltab FF < , reject Ho. Otherwise accept Ho
Conclusion:
Since Fcal= (0.1925) < Fα, (v1, v2) = 3.06, we accept the null hypothesis that there is no
significant difference in the examiners’ standards of awards.
Co-efficient of Variation
This shows the level of precision of the treatments compared. The lower the co-efficient of
variation, the higher the precision.
The co-efficient of variation (C.V) is given by C.V. = MG
MSE
. x 100%; Overall (Grand
Mean) =
Overall Mean
The co-efficient of variation for example 1 is
= 40%
N
T..
10058.5
00.5x
132
For example 2,
C.V = 1002.17
10.158X
= 73%
Advantages of the Completely Randomized Design (CRD)
The following are the advantages of completely randomized design (CRD)
1) The CRD is flexible in that the number of treatment and replicate is limited only
to the number of experiments available.
2) The statistical analysis is simple even if the number of replicates vary with the
treatment.
3) The number of replicates can vary from treatment to treatment, though it is
generally desirable to have the same number per treatment.
4) Loss of information due to missing data is small relative to loss with other
designs.
5) The number of degrees of freedom for estimating experimental error is maximum.
This improves the precision of the experiment.
Disadvantages of the CRD
The following are the disadvantages of the CRD
1) The CRD is not normally used for field experiments, other designs seem to be
more appropriate.
2) The major disadvantage of this design is that it is normally appropriate for
homogeneous experimental materials and when the number of treatment is small.
The Randomized Complete Block Design (RCBD)
The RCBD is the most widely used design and it was originated in agriculture where land
was divided into blocks and blocks were divided into plots that received the treatments under
investigation. It is a design in which the units called experimental unit to which the
treatments are applied are sub-divided into homogeneous groups called plots, so that the
133
number of experimental units in a plot is equal to the number of treatments being considered.
The treatments are then assigned at random to the experimental units within each block. It
should be noted that each treatment appears in every block and each receives every treatment.
However, the objective of using the Randomized Complete Block Design is to reduce as
small as possible or isolate variability among the experimental units within a block while
assuring that treatment means will be free of block effect. The effectiveness of the design
depends on the ability to achieve homogeneous blocks of experimental units. The ability to
form homogeneous blocks depends on the researcher’s knowledge of the experimental
material.
The following are examples of areas where RCBD is applicable:
1. In experiments involving human beings, age and sex may be used as blocks.
2. In animal experiments where different breeds of animal will respond differently to
the same treatment. The breed type may be used as a blocking factor.
3. The design can also be used when an experiment must be carried out in more than
one laboratory or when several days are required for completion.
Table of sample values for the RCBD
Two-Way Classification
Treatment
Blocks 1 2 3 - - - k Total Mean
1. y11 y12 y13 - - - y1k B1 1Y
2. y21 y22 y23 - - - y2k B2 2Y
3. y31 y32 y33 - - - y3k B3 3Y
. . . . . . .
m ym1 ym2 ym3 - - - ymk Bm mY
Total T1 T2 T3 - - - Tk R…
Mean Y.1 Y.2 Y.3 - - - Y.k ..Y
134
The model is given as:
Yij = µ + iτ + βj + eij
Where µ = Grand mean
iτ = ith row effect
βj = jth column effect
eij = random error term
Hypothesis: We can consider two different null hypotheses. They are
Ho1: There is no difference in treatment means
Ho2: There is no difference in block means
Then we can compute the sum of squares for the three sources of variations as
follows:
SSTotal = with (mk – 1) d.f
Where N = mk; R.. = Grand Total
SSTreatment = with (m-1) d.f.
SSBlocks = with (k-1) d.f.
SSError = SSTotal – SSTreatment – SSBlocks
With (m-1) (k-1) d.f.
N
RY
m
i
k
j
ij
2
1 1
2 ..−∑∑= =
∑=
−m
i N
R
m
Ti
1
22 ..
∑=
−k
j
j
N
R
K
B
1
22..
135
ANOVA Table for RCBD
Source of
Variation
Degree of
Freedom
Sum of
Squares (SS)
Mean Square F-ratio or
Fo
Treatment m-1 SSTreatment SSTr=MSTr
m-1
MSTr
MSE
Blocks k-1 SSBlock SSBlock = MSE
k-1
MSB
MSE
Error (m-1)(k-1) SSError SSError = MSE
(m-1)(k-1)
Total Mk-1 SSTotal
Decision Rule: Reject Ho if Ftab < F0. Otherwise, reject H1
Example 3
The director of a tuberculosis control section of a general hospital wishes to study the
effectiveness of 3 newly developed drugs to cure the disease. The drugs are administered on
patients in different hospitals and the number of recovery from the disease per 50 patients is
recorded as follows:
Drugs
Hospitals 1 2 3 Total
H1 8 13 10 31
H2 11 15 16 42
H3 14 17 17 48
Is there any difference in the number of recovery cases recorded in the 3 hospitals?
Test, using α = 5%
136
N
RY
m
iij
k
j
2
1
2
1
..−∑∑= =
Solution
Ho: µ1 = µ2 = µ3
Hi: µ1 ≠ µ2 ≠ µ3
SSTotal =
N = mk
= 3 x 3 = 9
SST = 82 + 132 + … + 172 + 172 - 9
)121( 2
= 1709 – 1626.78
= 82.22
SSTr = N
R
m
Tim
i
2
1
2 ..−∑=
=
= 1654.33 - 1626.78
= 27.55
SSBlock =N
R
k
Bjk
j
2
1
2 ..−∑=
=
= 1676.33 - 1626.78
= 49.55
SSError = SSTotal - SSTreatment - SSBlock
= 82.22 - 27.55 - 49.55
= 5.12
9
)121(
3
434533 2222
−++
9
)121(
3
484231 2222
−++
137
ANOVA Table
Source of
Variation
Degree of
Freedom
Sum of
Squares
Mean
Squa
re
F-ratio
Treatment 2 27.55 13.78 10.77
Blocks 2 49.55 24.78 19.36
Error 4 5.12 1.28
Total 8 82.22
Fo.o5, (2,4) = 6.94 in both cases.
Decision: Since Fcal> Ftab in both cases, the null hypothesis H0, is rejected. We therefore
conclude that not all the treatment means are equal.
Example 4
Four groups of physical therapy patients were subjected to different diets in 6 clinics. At the
end of a specified period of time, each was given a test to measure treatment effectiveness.
The following scores were obtained:
Treatments (Diet)
Clinic 1 2 3 4 Total
1. 76 70 90 80 316
2. 75 82 64 88 309
3. 72 80 79 71 302
4. 38 95 74 90 317
5. 66 80 87 60 293
6. 82 88 85 75 330
Total 429 495 479 464 1867
138
Do these data provided sufficient evidence to indicate a difference among the treatments?
Use α = 0.05
Solution
Ho: µ1 = µ2 = µ3 = µ4
Hi: at least one µi ≠ µj
SSTotal = 762 + 702 + 902 + … + 852 + 752 - 24
)1867( 2
= 147439 - 145237.04
= 2201.96
SSTreatment = 24
)1867(
6
464479495429 22222
−+++
= 145633.83 - 145237.04
= 396.79
SSBlock =
= 145444.75 - 145237.04
= 207.71
SSError = 2201.96 - 396.79 - 207.71
= 1597.46
24
)1867(
4
330293317302309316 2222222
−+++++
139
ANOVA Table
Source of
Variation
Degree of
Freedom
Sum of
Squares
Mean Square Fo
Treatment 3 396.79 132.26 1.24
Blocks 5 207.71 41.54 0.39
Error 15 1597.46 106.50
Total 23 2201.96
F2, 15 (0.05) = 3.29
F5, 15 (0.05) = 2.90
Decision: Since the value of Fcal < Ftab in both cases, we accept the null hypothesis and
conclude that the experiment does not show any significant difference among the four
treatments.
Advantages of the RCBD
1) It is easy to understand with simple computation.
2) Grouping experimental units into blocks gives better precision.
3) Missing value can be easily estimated so that arithmetic convenience is not loss.
4) Certain complications that may arise in the cause of experiment are easily handled
with this design.
Disadvantage of RCBD
The disadvantage of RCBD is as follows:
I. When the variation among the experimental units between a block is large, a large error
term results. This frequently occurs when the number of treatment is large. Thus, it
may not be possible to secure sufficiently uniform groups or units for blocks. This is
the major disadvantage of the RCBD.
140
Summary of Study Session 4
In this study session, you have learnt about:
1. The experiment involved in the planning of experimental designs;
2. The procedures involved in planning and execution of biological experiments;
3. How to define relevant terms in Design and Analysis of Biological
4. The concept of the completely randomized design and the randomized complete block
design: the analysis of variance (one way and two-way classification) techniques to
solve problems in the CRD and RCBD.
Self-Assessment Question for Study Session 3
Having studied this session, you can now assess how well you have learnt its learning outcomes
by answering the following questions. It is advised that you answer the questions thoroughly and
discuss your answers with your tutor in the next Study Support Meeting. Also, brief answers to the
Self-Assessment Questions are provided at the end of this module as a guide.
SAQ (Test of learning outcome)
When planning and executing an experiment, list and explain the first five procedures to
follow in order to accomplish success.
References
Edward, R. Dougherty (1990): Probability and Statistics for the Engineering, Compound and
Physical Sciences
James, T. McClave, Terry Sincich (1995, 5th Ed): A first course in Statistics
Lindgen, McElrath, Berry (1978, 4th Ed): Introduction to Probability and Statistics
Murray, R. Spiegel (1972, 1st Ed): Theory and Problems of Statistics
Royal Statistical Society (2004): Statistical tables for use in examinations.
Rustagi J.S. (1985): Introduction to Statistical Methods, Volume II.
141
Study Session 5: Design and Analysis of Clinical Trials
Introduction
Paying attention to design and analysis of clinical trial will have a lot to do with experiments.
For example, for you to test the effectiveness of a drug formulation in the remedy of a
specific disease, some patients with the disease are given the new drug and some are given a
well-known drug. The two groups are then compared prospectively for incidence of the
disease and other outcomes.
An experiment applied to existing patients, in order to decide upon an appropriate remedy
(treatment) or to those presently free of warning sign, in order to decide upon an appropriate
preventive strategy. These experiments involve giving remedy to the subjects in the study.
Learning Outcome for Study Session 5
At the end of this study, you should be able to:
5.1 Define clinical trial and Identify the features of clinical trial;
5.2 Identify and explain the Phases of clinical trials;
5.3 Highlight and explain the basic terms on clinical trials;
5.4 State some ethnical issues on clinical trials;
142
5.1 What are Clinical Trials?
A Clinical Trial is a well planned experiment on human beings (species) which is designed to
evaluate the efficacy and the safety of one or more forms of treatment (remedy).
Clinical trial is conducted to compare at least two interventions, one of which is usually the
standard or well-known intervention. The subjects are assigned to these interventions at
random. A properly designed clinical trial is extremely important for drawing valid
inferences from its results.
5.1.1 Features of Clinical Trials
The following are the features of Clinical Trials:
1. A comparative clinical trial is essentially prospective rather than retrospective. That
is, the participant must be followed from a well-defined time point which becomes
time zero or baseline for the study for that participant.
2. A comparative trial must have one or more interventions or treatments. These may be
used to prevent diseases, diagnostics or used in treating illness, etc.
3. A comparative clinical trial must have a control group against which the intervention
group is compared. That is, most often a new intervention is compared with the best
current standard treatment. Or if no standard treatment exists, the control group may
receive a placebo or nothing at all. Placebo is an identical looking drug that has no
active ingredient.
5.1.2 Needs for Clinical Trials
1. A clinical trial is the method used to determine whether a health or medical
intervention has the postulated effect.
2. A Clinical trial offers the possibility of judgment based on uncontrolled clinical
observation whether a new treatment has made a difference to desired outcome
because there is a control group which is comparable to the intervention group in
every way except for the intervention being studied.
3. It evaluates the efficacy and safety of the new treatments because clinical trial is
conducted properly and it follows the principles of scientific experimentation
provided.
143
5.1.3 Uses of Clinical Trials
1. To provide confirmatory evidence for findings from earlier studies that may be
unconvincing.
2. To develop and test intervention in nearly areas of medicine and public health.
3. For easier understanding of safety and effectiveness concerns about new treatments
prior to marketing approval.
4. It is used to develop surgical treatments and some medical device.
5.1.4 Advantages of Clinical trials
The following are the advantages of clinical trials:
1. It ensures that the ‘cause’ come before the ‘effect’.
2. It ensures that the possible confounding factors do not confuse the results because we
can allocate subjects to treatment in any way we choose.
3. It ensures that treatments are compared efficiently. That is, using control group over
allocation to spread the sample over the treatments groups so as to achieve maximum
power in statistical tests such as t test.
5.1.5 Disadvantages of Clinical trials
The following are the disadvantages of Clinical trials:
1. It may share many of the disadvantages of cohort studies, since intervention studies
involve the forthcoming collection of data.
2. Ethical problems are associated with the given experimental treatments, which rule
out the use of intervention studies in epidemiological investigations.
3. This may restrict the generalization of results because clinical trials screen out those
who may have a special reaction to treatment.
144
New
che
mic
al
com
poun
d or
A
nim
al s
tudy
P
hase
A
To
dete
rmin
e
the
max
imum
allo
wab
le o
r
tole
rate
d do
se
and
to k
now
Pha
se B
1
To
dete
rmin
e if
it is
effe
ctiv
e
enou
gh to
cont
inue
the
Ter
min
atio
n
Pha
se B
2
To
estim
ate
effic
acy
of t
he
Pha
se C
To
com
pare
the
effic
acy
of
the
new
trea
tmen
t w
ith
Pha
se D
To
mon
itor
perf
orm
ance
and
side
effe
cts
of t
he n
ew
PH
AS
ES
OF
CLI
NIC
AL
TR
IALS
No
N
o
Yes
Y
es
Yes
Y
es
Box 5.1: Recall
A Clinical Trial is a well planned experiment on human beings (species) which is designed to
evaluate the efficacy and the safety of one or more forms of treatment (remedy).
5.2 Identification of Phases of Clinical Trials
Clinical trial is classified into four main phases of experimentation namely phase I, II, III, and
IV.
This shows how it is applied in the pharmaceutical industry. It also indicates how the research
programme for the development of a new drug as applied to a particular disease might
proceed.
145
5.2.1 Phase A of Clinical Trials
1. The main purpose is to test the treatment that is, the bioavailability of the drug, drug
metabolism and toxicity.
2. Another purpose is dose-finding by determining how large a dose can be given before
unacceptable toxicity is experienced by patient. This dose is known as the maximum
dose or MTD. And use of multiple dose to determine appropriate close schedules.
3. In this stage, sample size usually between 20 and 50 subjects are drawn. And the trials
are carried out in healthy volunteers.
5.2.2 Phase B of Clinical Trials
This stage investigates the effectiveness and safety i.e. to make an estimate of the probability
that patient will:
1. Have serious side-effect from the treatment.
2. Benefit from the treatment.
It is used as a place where screening process has taken place to select out those relatively few
drugs of genuine potential from the larger number which are inactive or over toxic.
Involvement in the estimation of optimum sample sizes by statistical considerations is
required in phase B trials.
5.2.3 Phase C of Clinical Trials
These are comparative trials of a new drug (emerging from phase trails) simultaneously with
the current standard treatments under the same conditions. It usually requires a large number
of subjects, strict statistical principles are used to determine optimum sample sizes required.
In a very large audience, the term ‘clinical trial’ is synonymous with a phase B comparative
treatment efficacy trial. It is the most rigorous and extensive type of scientific, clinical
investigation of a new treatment. There are extensive rules and regulations for the design,
organization, conduct and analysis of phase C trials.
146
5.2.4 Phase D of Clinical Trials
These are studies to monitor post-registration performance, interaction with other therapies or
treatment, unusual complication and adverse side effects that are uncommon.
In this stage, some phase D trials may incorporate epidemiological designs such as
descriptive study, case-control and cohort study to enable the estimation of the incidence of
serious side effects.
Most studies in phase D especially those using only disease registers or notification systems
capture only serious side effects and may not precisely ascertain the number of patient who
have received the treatment.
In-Text Question
In this stage, sample size usually between 20 and 50 subjects are drawn. And the trials are
carried out in healthy volunteers. This describes;
a. Phase A
b. Phase B
c. Phase C
d. Phase D
In-Text Answer
a. Phase A
5.3 Basic Terms on Clinical Trial
As a moral philosophy embodies the analysis, evaluation and development of “normative
moral criteria for dealing with moral problems”. It also assigns significance to norms, values
and principles in the regulation of human affairs. It is the study of morality and its effects on
human conduct. It is concerned with questions such as when an act is right or wrong.
147
Research
It can be defined as an organized, formally structured methodology to obtain new knowledge.
Examples of research are the laboratory research, epidemiology research etc.
Placebo
It is an identical looking drug that has no active ingredients. It is an inert substance which
looked like and tasted like the treatment.
Experiment
It is a planned investigation, which is carried out to obtain facts or deny the results of
previous investigation.
Randomization of Randomization
It the process of using random number to assign treatment to patients (subjects). Subjects
should allocate to treatment group according to some chance mechanism.
Advantages
1. To avoid systematic bias.
2. To exclude deliberative bias in the allocation.
Blindness
It is the policy of keeping someone unaware of the treatment which has been given. There are
three forms of blindness.
1. Single-blind
2. Double blind
3. Triple blind
It is proper for a trial to be “double-blind” i.e. for the doctor not to know the particular
treatment that he is giving to a particular patient. The reason is to avoid subjective judgment.
Treatment
It can be defined as any stimuli that can be applied to a subject. It may be drug, diet, etc.
148
Replication: Is the repetition of some treatment on different subjects.
Examples of clinical trials are:
1. Aspirin in the prevention of recurrent heart attacks.
2. A comparative clinical evaluation of econazole nitrate, miconazale and nystatin in the
treatment of vaginal candidacies.
5.3.1 Protocol Design in Clinical Trials
1. Background and general aims of the trial: It should be clearly stated. It also builds on
the experience gained from past researches. The aims may be to compare the efficacy
and side-effects of a new treatment.
2. Specific objectives of the trials: There should be a concise and precise definition of
hypothesis concerning treatment efficacy and safety which are to be examined by the
trial.
3. Sampling method
4. Treatment schedule: This involves detail of the treatment.
5. Method of patient evaluation.
6. Trial design: It deals with method of treatment allocation.
7. Registration and randomization of patients
8. Patient consent
9. Required site for the study.
10. Monitoring of trial progress
11. Form and data handling
12. Protocol deviation
13. Plans for statistical analysis
14. Administrative responsibilities
5.3.2 Role of Statisticians in Clinical Trials
1. Assist by advising on the required number of patients.
2. He assists in the method of randomization and the data processing techniques.
3. Patients’ registration treatment schedule.
4. Statistician has a valuable contribution to improve the overall design.
149
In-Text Question
There are three forms of blindness, they are listed below except?
a. Placebo blind
b. Single-blind
c. Double blind
d. Triple blind
In-Text Answer
a. Placebo blind
5.4 Ethical Issues on Clinical Trials
1. The proposed treatment must be safe and therefore not likely to bring harm to the
patient. The following are the examples of treatment that has been carried out
before and has led to death and damages:
a. Tuskegee Syphilis Experiments in 1932
b. Nazi experiments from 1930 – 1945, stopped in 1946
c. Pfizer Drug trial in children: It took place in Kano in 1996 and stooped in
2001
2. Which type of patient may be brought into a controlled trial and allocated randomly to
different treatment e.g. pregnant woman, the very young, the old and frail should be
excluded from trial.
3. The patient consent should be obtained in every trial.
4. It is ethical to use a placebo or dummy treatment.
5. The trial should be double-blind. Double-blind, in the sense that the doctor and the
patient are not aware of the nature or kind of treatment given.
150
5.4.1 Sample Size Determination
Power
The power of a hypothesis is the probability that the null hypothesis is rejected when it is
false. Often, this is represented as a percentage rather than a probability. We shall consider
here, determination of the power of a hypothesis test and the sample size shall be estimated.
For a one-sided test with hypothesis as below:
Ho: µ = µ0
H1 : µ > µ0
The power can be defined as
)Z- (Z P power 01ασ
µµ
n
−<=
For a two – sided test with hypothesis stated
as: Ho: µ = µ o
H1 : µ = µ1
)Z- (Z P power 2
01ασ
µµ
n
−<= Where µ1 > µ0
or
)Z- (Z P 2
01ασ
µµ
n
−< Where µ0 > µ1 or µ1< µ0
Example 1
The male population of an area within a developing country is known to have had a mean
serum total cholesterols value of 5.5mmol/l 10 years ago. In recent years, many western
foodstuffs have been imported into the country as part of a rapid move towards a western –
style market economy. It is believed that this would have caused an increment in cholesterol
levels. A sample survey involving 50 men is planned to test the hypothesis that the mean
cholesterol is still 5.5 against the alternative that it has increased, at the 5% level of
significance from the evidence of several studies; it may be assumed that the standard
deviation of serum total cholesterol is 1.4mmol/l. The epidemiologist in charge of the project
151
considers that an increase in cholesterol up to 6.0monol/l would be extremely important in a
medical sense. What is the chance that the planned test will detect such an increase when it
really has occurred?
H0: µ = 5.5 (µo)
H1: µ > 5.5 (µo)
α = 5% and µ1 = 6.0 µ0 = 5.5, σ = 1.4, n = 50
Since it is one- sided test:
−−<=
−−<=
05.0
01
504.1
5.50.6ZZp
Z
n
Zppower ασµµ
Z0.05 from statistical table = 1.645
=
=
=
Power = 0.8106
Hence, the chance of finding a significant result when the mean really is 6.0 is about 81%
Example 2
Using example 1 again, but here, we wish to test for a cholesterol difference in either
direction so that a two – sided test is to be used. What is the chance of detecting a significant
difference when the true mean is 8.0mmol/l and when the true mean is 4.0mmol/l? α = .1%
−∠Ζ 645.1198.0
5.0p
( )645.1525.2 −<zp
( )88.0<zp
152
Solution
H0 : µ = 5.5
H1 = µ = 8.0
α = 0.01
−−<=
−−<=
201.0
01
504.1
5.50.8ZZp
Z
n
Zppower ασµµ
( ) 105.10 =<zp
There is 100% chance of detecting an interest using a two sided test
If H0 : µ = 5.5 against
H1 = µ = 4.0
( )576.2198.00.45.5 −<= −Zppower
( ) 15 =<zp
Example 3
Likewise, consider the following example if the true mean is 6.0mmol/l at α = 5% we have,
( )960.1198.05.50.6 −<= −Zppower
( ) 7157.057.0 =<Zp
There is about a 72% chance of detecting an increase or decrease of 0.5moll using a two
sided test.
−< 576.2198.0
5.2zp
−< 576.2198.0
5.1zp
( )576.2576.7 −<zp
153
Determination of Sample size of a mean value For a One sided test
( )( )2
01
22
µµσβα
−+
=zz
n
For a two sided test
( )( )2
01
222
µµσβα
−+
=zz
n
Example 4
Considering example 1 again, the epidemiologist decides that an increase in mean total serum
cholesterol up to 6.0 moll from the known previous value of 5.5mmoili is so important that
he would wish to be 95% sure of detecting it. Assuming that a one-sided 10 % test is to be
carried out and the standard deviation, σ is known to be 1.4 moll as before, what sample size
should be used?
Solution
( )( )2
01
22
µµσβα
−+
=zz
n
µ 1 = 6.0, µ 0 = 5.5
2σ = 1.42 = 1.96
Z α = Z 0.1
= 1.282
Power = 1-β , β = I-P = 1- 0.95 = 0.05
Z β = Z0.05
= 1.645
( )( )2
2
5.56
96.1*645.1282.1
−+=n
n= 8.567329 x1.96
0.25
154
= 67.17
Value of n will be rounded up
n = 68
Example 5
Suppose the epidemiologist has decided to change his requirements. He now wishes to be
90% sure of detecting when the true means is 6.4 mmol/l with the previous data still the
same. Assuming that a two sided 5% test is to be carried out, what sample size should be
used?
( )( )
( )
25
43.25
9.0
6007.20
4.1*5.54.6
282.196.1
,
282.1
96.1
2
22
2
1.0
025.02
==
=
−+=
==
==
n
n
Therefore
zz
zz
β
α
Example 6
Vitamin A supplementation is known to confer resistance to infectious diseases in Nigeria,
although the precise mechanism is unclear. To seek to explain the mechanism, a group of
healthy children will be given vitamin A supplements for a long period. At baseline and at
the end of the supplementation period, urine samples will be taken from all the children and
neopterin excretion measured. The current level of neopterim is estimated as 600 mmol/per
millomole of creatinine and the standard deviation of change due to supplementation has
been estimated (from a small pilot sample) to be 200 mmol/mmol creatinine.
i. Suppose that a drop of 10% neopterim is considered a meaningful difference, which it
is wished to find with 95% power using a two-sided 1% test. How many children
should be recruited?
ii. Suppose that resources allow only 150 children to be given the supplements. What is
the chance of detecting a drop of 10% neopterin with a 10% two-sided test?
155
Solution
( )( )
( )( )
( )( )
198
9649.197
40000*540600
221.4
54060600
60600*100
10
10%by drop
600
sin
40000*645.1576.2
645.1
576.2
05.0%951
40000
200
).
26064.712673
2
2
0
0
1
201
2
05.0
005.02
2
201
222
=
==−
=
=−=
==
=
−+=
==
===−=
=
=−
+=
n
n
ce
N
zz
zz
zzni
µ
µµ
µµ
βσσ
µµσ
β
α
βα
ii). n = 150
( )( )
%4.86
8639.0
098.1
576.2674.3
150200
540600
150200
540600
%1,200,540,600
201.0
2
01
==
<=−<=
−−<=
−−<=
====
power
zp
zp
zzp
zzppower α
ασµµ
156
Determination of Sample Size of Proportion
If the problem is to test the hypothesis
H0: P = Pm
H1: P= Pt = Pm + d
Where P is the true population; Pm is some specified value for this proportion for which we
wish to test or the minimum level of the effectiveness of which if the drug does not achieve,
results in the drug being dropped from further test and Pt (which differs from Pm by an
amount d) is the alternative value which we would like to identify correctly within probability
of β−1 or it is the interest level of effectiveness such that if the drug achieved or exceed it,
it would mean that the drug has an efficacy level worthy of further investigation in a phase III
trial
One-sided test
( ) ( )( )( )
( ) ( )( )( )2
2
2
2
2
11
testsided- Two
11
mt
ttmm
mt
ttmm
pp
ppzppzn
pp
ppzppzn
−−+−
=
−−+−
=
βα
βα
Example 7
Suppose that in a country where male smoking has been extremely common in recent years, a
government target has been to reduce the prevalence of male smoking to almost 30%. A
sample survey is planned to test at the 5% levels the hypothesis that the proportion of
smokers in the male population is 0.3 against the one-sided alternative that it is greater. The
test should be able to find a prevalence of 32%; when it is true with 90% power. Determine
the population that should be sampled.
157
( ) ( )( )( )
( ) ( )( )( )
4567
2.4567
3.032.0
68.032.0282.17.03.0645.1
32.0,3.0
282.1,1.09.01,645.1
11
2
2
1.005.0
2
2
≈=
−+
=
==
===−===−
−+−=
n
n
pp
zzzz
pp
ppzppzn
tm
mt
ttmm
βα
βα
β
Example 8
A certain drug is to be entered into a phase 2 trial. If its effectiveness is found to be lower
than 20%. It will drop but if it is at least 50%, it will be pursued in further comparative trial.
If the significance level is taken to be 5% with a power of 80%, how many subjects are
needed to be studied?
Solution
( ) ( )( )( )
( ) ( )( )( )
( )( )
( )
13
9.1209.0
078.1
3.0
42.0658.0
2.05.0
5.05.084.08.02.0645.1
5.0,2.0
84.0,2.08.01,645.1
11
2
2
2
2
2
2.0
2
2
≈
==
+=
−+
=
==
===−==−
−+−=
n
n
n
pp
zzz
pp
ppzppzn
tm
mt
ttmm
βα
βα
β
Analysis of Clinical Trial
In a clinical trial of Aspirin versus placebo in the treatment of headache, the results were as
shown in the table below:
158
No headache Headache Total
Aspirin 130 (102) 20 (48) 150
Placebo 40 (68) 60 (32) 100
Total 170 80 250
Examine whether the difference in those cure with Aspirin and placebo have risen by chance.
Solution
H0: The difference in those cure with Aspirin and placebo rise by chance.
H1: The difference in those cures with Aspirin and placebo does not rise by chance
α = 0.05
Compute the expected values
alcolumn tot is
totalrow is
totalgrand theis N Where
.E
i
i
ii
c
r
N
cr=
The expected values were computed and put in the table. Note that they are enclosed in
bracket.
Compute
( )
∑−=
i
ii
E
EO 22χ
Oi (Observed) Ei (Expected) (Oi-Ei)2/Ei
130 102 7.69
40 68 11.53
20 48 16.33
60 32 24.50
Σ(Oi - Ei)2/Ei = 60.05
159
Tabulated value
( ) ( )
( ) ( )
841.3
21,1,05.0
21,1,
=
−−
χ
χα rc
Conclusion
Reject H0 and conclude that the difference in those cures by Aspirin and placebo has not risen
by chance.
Self-Assessment Question for Study Session 5
Having studied this session, you can now assess how well you have learnt its learning outcomes
by answering the following questions. It is advised that you answer the questions thoroughly and
discuss your answers with your tutor in the next Study Support Meeting. Also, brief answers to the
Self-Assessment Questions are provided at the end of this module as a guide.
SAQ (Test of learning outcome)
Write short note on the following:
1. Randomization of Randomization
2. Experiment
3. Placebo
4. Research
References
Cochran W.G. & Cox G.M. (1992): Experimental Designs, Published by Wiley.
Ronald, E. Wapole (1982, 3rd Ed): Introduction to Statistics
Royal Statistical Society (2004): Statistical tables for use in examinations.
Woodward, W. (2005): Epidemiology: Study Design and Data Analysis. Published by Chapman & Hall/CRC.
160
Study Session 6: Biological Assay
Introduction
Bioassays are typically conducted to measure the effects of a substance on a living organism
and are essential in the development of new drugs and in monitoring environmental
pollutants.
Bioassay (commonly used shorthand for biological assay or assessment), or biological
standardization is a type of scientific experiment. A bioassay involves the use of live animal
or plant or tissue or cell [in vitro (is the technique of performing a given procedure in a
controlled environment outside of a living organism)] to determine the biological activity of a
substance, such as a hormone or drug.
You are about to learn about Bioassay as an important area of application of statistical
methods. The general principles were established mainly during the 1930s and 1940s, and
were rapidly developed in detail to an extent which cannot be adequately covered here.
Learning Outcome for Study Session 6
At the end of this study, you should be able to:
6.1 Define and Identify the basic components of Bioassay;
6.2 Highlight and explain the type and nature of Bioassay.
6.1 Biological Assay
A ‘biological assay’ is an experiment to determine the concentration of a key substance (i.e.
the substance under investigation, may be a pharmaceutical preparation like digitalis or
penicillin) in a preparation by measuring the activity of the preparation in a biological
system.
161
The term ‘bioassay’ is often used in a more general sense, to denote any experiment in which
specific responses to externally applied agents are observed in animals or some other
biological system. The emphasis in such experiments is often to measure the effects of
various agents on the response variable, and no attempt is made to estimate relative potency.
This usage is for example, common in the extensive programmes for the testing of new drugs,
pesticide, and food additives, etc.
Figure 6.2: Biological assay experiment
Vividly, biological assay or bioassays are methods used to compare the potencies of two
preparation of a drug by observing their respective actions on biological systems (e.g. plant,
animal) in relation to a standard preparation whose potency in arbitrary unit is known.
In general, bioassay is the measurement of the potency (i.e. Efficacy or effectiveness) of any
stimulus by means of the reactions that it produces on living matter. There are two types of
preparation, they are:
Test Preparation (T) – It is the preparation whose potency is required. It is usually some
quantity of naturally occurring diluents
containing an unknown concentration of substance under investigation.
Standard Preparation (S) –
The institution of a standard preparation usually leads to the definition of the standard unit as
the activity of a certain amount of the standard.
Any test preparation (T), can now be assayed against the standard (S) by simultaneous
experimentation. If, say S is defined to contain 2000 units per gram and T happens to contain
150 units per gram and therefore, the relative potency or potency ratio of T in terms of S is
then said to be 2000
150=R
40
3=
The relative potency of T is the ratio of the mean of test preparation to mean of the standard preparation.
ρ = R= S
T
µµ
The estimated relative potency
Basic Components of Bioassay
There are three essential basic aspect of an assay namely:
� Stimulus.
� Subject or experimental unit or biological material.
� Response.
Test preparation
Standard preparation
162
Figure 6.3: Types of preparation
is the preparation whose potency is required. It is usually some
quantity of naturally occurring diluents, or of a product of a manufacturing process
containing an unknown concentration of substance under investigation.
It is the preparation whose potency in arbitrary unit is known.
The institution of a standard preparation usually leads to the definition of the standard unit as
the activity of a certain amount of the standard.
can now be assayed against the standard (S) by simultaneous
experimentation. If, say S is defined to contain 2000 units per gram and T happens to contain
150 units per gram and therefore, the relative potency or potency ratio of T in terms of S is
The relative potency of T is the ratio of the mean of test preparation to mean of the standard
The estimated relative potency s
T
X
XR == ρˆ
Basic Components of Bioassay
There are three essential basic aspect of an assay namely:
Subject or experimental unit or biological material.
Test preparation
Standard preparation
is the preparation whose potency is required. It is usually some
or of a product of a manufacturing process
It is the preparation whose potency in arbitrary unit is known.
The institution of a standard preparation usually leads to the definition of the standard unit as
can now be assayed against the standard (S) by simultaneous
experimentation. If, say S is defined to contain 2000 units per gram and T happens to contain
150 units per gram and therefore, the relative potency or potency ratio of T in terms of S is
The relative potency of T is the ratio of the mean of test preparation to mean of the standard
163
1 Stimulus
It is an agent which causes a reaction or change in an organism or any of its part. The
stimulus (usually a drug) is applied to an experimental unit (animal, man or living material).
It is also referred to as the objects (substance) whose potency is been measured. Examples are
vitamins, drug, fungicide, insulin, analgesic etc.
The application of a stimulus at a given dose results in a change in some measured
characteristics. It is the procedure whose effect is to be measured.
2 Subject or Experimental or Biological Material
This is referred to as the piece of animal, tissues, plant or living matter to which the stimulus
is applied. It is the smallest division of the experimental material or as the unit of material to
which one application of a treatment (usually a stimulus) is applied.
3 Response
It is a measurable characteristic of the subject that develops as a result of the application of
the stimulus e.g. there are some chronic reactions such as allergic reaction that occurs when
some people are subjected to chloroquine.
It is defined as the reaction that occurs when a stimulus is applied to a subject. Concisely, it is
the reaction of the subject.
6.1.1 Purpose of Bioassay
It is used to estimate the relative potency of the stimulus needed to produce the desired
response. It is used to compare the potency obtained from a test preparation relative to a
standard preparation.
Design of an Assay
Let S be a standard preparation of a drug whose potency in arbitrary unit is known and T be
another test preparation whose potency is required. Take a number of subjects, divide them
into two groups, one of the groups is allocated to the standard preparation and the test
preparation is assigned to the other groups. To each subject, the drug is administered until
pre-assigned response has occurred. The dose corresponding to the occurrence of the pre-
164
assigned response is recorded as an observation. The mean of this observation are taken in
order to estimate the potency. A measure of potency of a test preparation is the relative
potency defined as s
T
X
XR == ρˆ
6.2 Types of Assay
There are two major categories of biological assay and this can be further categorized into
other types:
1. Qualitative Assay: It is a form of assay in which the potency cannot be measured
numerically e.g. colour, complexion.
2. Quantitative Assay: Is a form of assay where it is possible to measure the potency of
the stimulus. This can be further broken down into two main type of assay.
a. Direct assay
b. Indirect assay
6.2.1 Direct Assay
In direct biological assay, the potency of a test preparation can be examined directly and its
relative potency can be determined in comparison with that of a standard preparation. The
simplest form of assay is the direct assay, in which increasing doses of S and T can be
administered to an experienced unit until a certain critical event take place. This selection is
rare. An example is the assay of prepared digitalis in the guinea pig, in which the critical
event is arrest of heart beat. The dose given to any animal is a measure of the individual
tolerance of that animal which can be expected to vary between animals through biological
and environmental causes.
The relative potency of the test preparations is defined by s
T
X
X=ρ
Derivation of the Confidence Interval for ρ
THEOREM
Let TX and SX be the samples average of test and standard preparations in a potency test. If
the standard preparations have mean and variance sµ and n
2σ respectively and the test
preparation has mean Tµ and variancem
2σ . Assuming the independence of the two samples,
165
the distribution ST XX ρ− has mean 0 and variance
+
nm
22 1 ρσ therefore the appropriate
confidence interval for ρ is
( )
−−−±
− 2
22
2
2
111
1
TS
T
S
T
Xm
stg
X
X
X
X
g
( )
−−−±
2
22
11ˆˆ1
1
TXm
stg
-gor ρρ
Proof
ρ S
T
µµ= ………… equation (1)
S
T
X
XR == ρ ………… equation (2)
Var (R) ( ) ( )( )sT
s
XVRXVX
2
2
1 +≅
The standard deviation of the estimate of R can be obtained for a large sample size. Recall
that the variance of the sample average is the variance of the population divided by the
number of observations. Hence, both ( )TXV and ( )sXV can be obtained from the sample. If
the variance is assumed to be the same for the standard and test preparation, a pooled
variance estimate can be used for both.
Then the estimate of the common variance 2σ
i.e ( ) ( ) ( )∑∑==
− −+−−+==m
iTTi
n
issi XXXXmns
1
2
1
2122 2σ
= ( ) ( )
( )21
2
1
2
−+
−+− ∑∑==
mn
XXXXm
iTTi
n
issi
equation (3)
166
Since
( ) ( ) ( )( ) ( )
)4( ...................... 1
22
22
2
2
equationnm
nm
XVXV
XVXVXXV
ST
STST
+=
+=
+=
+=−
ρσ
σρσρ
ρρ
Since S
TST µ
µρρµµ ==− and 0
i.e 0=
− S
S
TT µ
µµµ then using S2 as an estimate of 2σ then we can write
+
−
nmS
XX ST
21 ρ
ρ ………….equation (5)
Equation (5) has a t-distribution with n+m-2 degree of freedom (where n and m is the no of
observation of standard and test preparation respectively). The ( )α−1 level of confidence
interval is obtained with the help of the following inequality.
+<−<
+−
−−+−−+ nmStXX
nmSt
nmSTnm
2
21,2
2
21,2
11 ρρραα
……………… equation (6)
This will lead to obtaining the interval by solving the quadratic equation
( )
+=−
nmStXX ST
2222 1 ρρ …………….. Equation (7)
Equation (7) can be re-written after multiplying out all terms and collecting those involving
ρ2 and ρ as
0222
222
2 =−+−
−
m
StXXX
m
StX TTSS ρρ
167
−
−
−−±=
m
stX
n
stX
m
stXXXXX
S
TSTSTS
22
222
2222
…………. Equation (8)
The root of a quadratic equation are not real i.e. the expression in equation (8) above inside
the radical sign is –ve. This means we are unable to find the confidence interval for the
relative potency by this method. Also, by dividing the numerator and denominator of right
hand side of equation (8) by 2
SX , the confidence interval for ρ can also be written as:
where
( )
2
22
2
22
2
2
111
1
S
TS
T
S
T
Xn
stg
Xm
stg
X
X
X
X
g
=
−−−±
−
……………. (9)
The confidence limit in equation (9) is a special case of the well known Feller’s theorem ……
Note
1) t can be obtained from the statistical table i.e t =,
, 22
−+nmtα
2) ( )ST XXE ρ− is normally distributed with mean zero
( ) ( )
S
T whereµµρρµµ
ρ
=−
−
ST
ST XEXE
0=− ss
TT µ
µµµ
168
Example 1
The standard preparation of a drug gave the determinations of 10.5, 10.7, 9.8, 9.5 and 9.5. For
the test preparation, we have 12.7, 11, 5, 10.7, 13.5, 12.7 and 13.3.
a. Find the point estimate of the relative potency of the test preparation.
b. Give a 90% confidence interval for the relative potency.
Solution
Relative potency, S
T
X
XR == ρ
m
XX
m
iT
T
i∑== 1 and
n
XX
n
iS
S
i∑== 1
siX ( )2
ssi XX − TiX ( )2
TTi XX −
10.5 0.25 12.7 2.0449
10.7 0.49 11 0.0729
9.9 0.04 5 39.3129
9.5 0.25 10.7 0.3249
9.5 0.25 13.5 4.9729
12.7 2.0449
13.3 4.1209
Total 50 1.28 78.9 52.8943
Therefore S
T
X
X=ρ
Where m
XX
m
iT
T
i∑== 1
= 78.9
7
= 11.27
169
127.1 10
27.11 implies This
10550
1
=
==
=∑
=
ρ
n
XX
n
iS
S
i
Therefore the relative potency of the test preparation is 1.127
b) 90% confidence internal for the relative potency
( )
−−−±
− 2
222 11ˆˆ
1
1
TXm
stg
gρρ
And
( ) ( ) ( )
( ) ( )42.5
8943.5228.110
2
2
12
1 1
2212
=
+=
−+−−+=
−
= =
− ∑ ∑
s
s
XXXXmnsn
i
m
iTTissi
Therefore g = ( ) ( )
2
2
105
4258121
X
..
= 0. 036
Therefore 90% confidence limit for ρ
812110050
21025722
2
22
.
,.
.,,
==
=
=
−+−+
t
tt
Xn
stgwhere
nm
S
α
170
( ) ( ) ( )
( )( ) ( )( )
( )7160
57001691
3252801691
02019640127112710371
27117
42581211036011271127103601
2
2
221
.,.
..
..
.....
.
......
=
±=
±=
−−±=
−−−±−= −
X
Therefore, the confidence limit for ρ
is 7.1ˆ6.0 ≤≤ ρ
Example 2
Let m = n = 7, SX = 6.25, TX = 8.5 and let S2 = 0.358 for a 95% confidence interval, we
obtain t = 1.882 for 12 degree of freedom.
1. Find the estimate of potency
2. Construst a 90% confidence interval for P
Solution
1.
36.1ˆ25.6
5.8ˆ
ˆ
=
=
=
ρ
ρ
ρS
T
X
X
2. ( )
−−−±
− 2
222 11
1
1
TXm
stgRR
gˆˆ
( ) ( )( )
( )( )4375273
35801755243
2567
358078212
2
2
22
.
..
.
..
==
=
X
Xn
stgwhere
S
171
g = 0.00416
( ) ( ) ( )
( )( )[ ][ ][ ]
[ ][ ][ ]
[ ]2.29 ,.
2.285 ,..
.. ,...
...
...
....
.....
.
.....
.
440
43500041
925036192503610041
92503610041
85603610041
99360849613610041
002250199584084961361995840
1
587
3580782110041601361361
0041601
12
22
+−±
±
−±
−−±
−−−±− X
Therefore the confidence limit for ρ is
29.244.0 ≤≤ ρ
6.2.2 Indirect Assay
When direct assays are not feasible, we apply different doses to various subjects and we
measure the responses. We consider a case when the response is a continuous variable. We
then assume that the dose-response relationship has a parametric model. We are interested not
only in estimating potency, but also the overall dose-response relationship and the parameters
of the model involved.
Generally, few dose levels are tested and various standard designs can be used in conducting
a bioassay. The general theory of the design of experiment apply here as well. We shall
assume that the relationship between the log-dose and the response is linear. Let Ys and YT be
responses to a dose level of the standard preparation Zs and ZT where log Zs = Xs and log ZT =
XT
We assume linear relations for the test and standard preparations.
TTTT XY βα += --------------------------------------- (1)
ssss XY βα += --------------------------------------- (2)
172
If the test preparation has potency P, a dose of ZT (which is the same as pZs) gives the same
response Ys as the standard preparations. Notice that
( ) XsPZspPZsLogZX TT +=+=== loglogloglog
We shall discuss indirect assay under two types of assay.
1. Parallel line assay
2. Slope ratio assay.
Parallel Line Assay
Assume first that the dose-response relationship for the test drug is a line with the same
slope as the one of the standard preparation. Such assays are called parallel-line assay.
We then have
sTsTTs XPXwhenYYand +=== logββ
so that
( )ssTs XPY ++= logβα
or
sssTs XPY ββα ++= log -------------------------------------- (3)
Comparing equation (2) and (3), we have
PsTs logβαα += ------------------------------------------------------- (4)
Let sTs andβαα ˆˆ,ˆ be the estimates of the corresponding parameters, then
ssss XY βα −=ˆ -------------------------------------------------------- (5)
TsTT XY βα −=ˆ ------------------------------------------------------- (6)
22 )()(
))(())((
TTissi
TTiTTississis
XXXX
YYXXYYXX
−Σ+−Σ−−Σ+−−Σ
=β ------------------- (7)
The potency P can now be estimated in terms of sTs and^^^
, βαα with
s
TsRβ
ααˆ
ˆˆlog
−= -------------------------------------------------------- (8)
173
An alternative form of r is given by
( )Tss
Ts XXYY
R −−−
=β
log ------------------------------ (9)
In-Text Question
In _____________ its relative potency can be determined in comparison with that of a
standard preparation.
In-Text Answer
Direct assay
Example 3
In an experiment to determine the concentration of Oestrone, the percentage gains in weight
of rats are recorded as responses and test doses after a fixed number of days after treatment.
Suppose 4 breeds of 7 animals are considered, whereby three animals are used for standard
preparations of 0.2mg, 0.4mg and 0.8mg respectively, and four animals are used for the test
preparation, with doses 0.0075cc, 0.015cc, 0.03cc and 0.06cc respectively. The table below
gives the responses, obtained as percentage weights (in grams):
Daily Dose
Standard Oestrone Yeeit Osetrone
Letter 0.2Ng 0.4Ng 0.8Ng 0.0075cc 0.15cc 0.03cc 0.06cc
I 50 80 80 60 90 120 130
II 40 80 100 70 40 60 80
III 60 100 110 50 90 100 120
IV 50 60 70 60 100 120 150
Averages 50 80 90 60 80 100 120
174
Solution
The doses can be standardized so as to obtain simplification in graphing and
computations. The standard dose Zs can be transformed to Xs by dividing by 0.4 and
taking the log to base 2
( )4.0loglog2 22 −= ss ZX
Giving the coded value as -2,0, and 2.
Similarly,
( 1103.0log015.0log2
1log2 22 −
+−= TT ZX
transforms the test preparations to -3, -1, 1, 3 we then obtain the regression equations for
the standard and test preparations in coded units using equation (5), 6, and (7) as follows:
Ts
ss
XY
XY
1090
1073
+=+=
using the transformation we have
ss ZY 22 log204.0log2073 +−=
and ( ) TT ZY log2003.0log015.0log1090 22 ++−=
using equation 8, the estimate of the relative potency is obtained from:
[ ]
( )
029.0
8025.1log5303.0log
85.04.0
00045.0loglog
20
174.0log03.0015.0log
2
1
03.0log10015.0log10904.0log207320
1log
22
22
22
2222
=−=
−=
−−×=
++−−=
Rthatso
R
or
R
175
i.e. one of the test preparation consists of 0.03K/g as Oestrone.
The diagram below shows the regression lines with coded values
Slope-Ratio Assays
When the dose response relationship does not lead to parallel lines, we must use other
models. We can assume then that the lines have the same intercepts on the y-axis but have
different slopes. An assay that uses the ratio of slopes is called the slope-ratio assays. In this
case, we can estimate the relative potency of the test-procedure with the help of the ratio of
the slopes of the lines. For the standard preparation, we have:
sssY βα += Xs ------------------------------------ (1)
for the test preparation, we have
( ) sTsTss XPSPY βαβα +=×+= ---------------------------------- (2)
P is the potency of the test preparation.
Equation 13 and 14 give sββτ =Ρ or τβ
βsP =
The estimates of the potency can be made in terms of the ratio of the estimated slopes of
the lines.
150
100
50
-3 -2 -1 0 1 2 3
x
x
x
0
0
0 x
Parallel – Line Assay
176
^
^
T
sRβ
β=
let N1 be the number of observations from the standard preparation and N2 be the number of
observations from the test preparation. The estimates of sTs and αββ , by pooling
both samples are
∑
∑
−
−
−=
−
−−
2
^
SSI
ssissi
s
XX
YYXXβ ----------------------------------- (3)
∑
∑
−
−
−=
−
−−
2
^
Tti
TTiTti
t
XX
YYXXβ --------------------------------------- (4)
and 21
^
2
^
1^
NN
XYNXYN TTTsss
s+
−+
−=
−−−ββ
α ---------------------- (5)
6.2.3 Definitions of some Terms
The following are the definition of some Terms:
Median Lethal Dose
This is the median of the response distribution. It signifies the value of the dose below which
half of the population will die. It is denoted by LD50.
Median Effective Dose
It is the dose producing response in 50% of trial. This means the median of the response
distribution when the response is something other than death. It is denoted by ED50.
Minimal Lethal Dose/Minimal Effective Dose
It is dose for any given poison which is only just sufficient to kill all the subject of a given
specie and that if the dose when little or smaller is given, it will not kill any of its subjects. It
177
can be obtained only by observation.
ED90. It refers to the dose that produces a response in 90% of trial or subjects.
Method of Estimating ED50
1. The spearman-kerbal method
2. Probit analysis
But for the scope of this course, we shall be restricted to the first method.
Spearman- Kerbal Method
Suppose that doses of a preparation having logarithm x1, x2 ..xk are tested and ri = 1,2, … of
the subject at dose xi responds then i
ii n
rp = is an estimate of the response rate at dose
xi
Suppose that Pi = 0 and PK = 1 then (Pi +1-Pi) is an estimate of the proportion of the subject
whose tolerance lies between xi and xi+1.
If a subject can tolerate up to dose d of a preparation such that any dose greater than d will
always kill it and any dose less than d will not kill it then d is the tolerance of the subject
relative to the preparation. If successive doses are fairly close together, the mean may be
estimated as
( )( )
21
11∑=
++ −−=
k
iiiii xxPP
m ---------------------------------------- (1)
If the doses are equally spaced so that idxix i ∀=−+ 1 then the estimate of ED50 = m=
∑−+ Piddx k 2
1 --------------------------------------- (2)
If the number of dose is also constant and equal to n then
ED50= m ∑−+= nrd
dx i
k 2 --------------------------------------- (3)
The variance for equation (2) is written as
178
∑=
=n
i
ii
n
qpdmv
1
2)(
the variance for equation (1) and (3) are given as
( ) ( ) ( )∑=
−−
=n
i
rinnn
dmV
12
2
1r i
Example 4
The table below gives the log-dose of a drug enough to kill a random sample of different rate
size and the number dead
Log-dose (xi) Number of rats ni Number dead ri
4 5 0
5 7 1
6 9 2
7 6 1
8 10 4
9 8 5
10 12 8
11 15 15
Obtain an estimate of ED50 and place a 95% confidence bound on your estimate
179
Solution
Since no of rat i.e. n is not constant, equation (2) is applied.
Log dose xi
ni ri Pi=
ni
ri
q = 1-pi
n
qp ii
4 5 0 0.00 1 O
5 7 1 0.14 0.86 0.02
6 9 2 0.22 0.78 0.02
7 6 1 0.17 0.83 0.02
8 10 4 0.40 0.60 0.02
9 8 5 0.63 0.37 0.03
10 12 8 0.67 0.33 0.02
11 15 15 1.00 0.00 0
Total 3.23 0.12
( ) ( )
12.0
12.0 x 1
27.5
23.35.08
23.3 x )1()1(2
18
1 2
1
2
1
250
50
50
1
50
==
==
==−+=
−+=
=−=
−+=
∑
∑
=
+
n
i
ii
ii
ik
nqpdEDVmV
Edm
ED
xxd
pddxED
180
Standard duration = ( )mV
= 12.0
S.E = 0.35
To place a 95% bound on ED50 we have
( )35.027.5
.
205.0
250
Z
ESZZED
±
± α
( )( ) ( )
( )956.5,584.4
686.027.5
35.096.127.5
35.0025.027.5
±±± Z
Example 5
The table below gives the logdose of a drug enough to kill a random sample of size 14 rats
and the number dead.
Logdose xi No of rats (n) No Dead (ri)
3 14 2
6 14 4
9 14 1
12 14 0
15 14 5
18 14 6
21 14 6
24 14 4
27 14 8
30 14 1
Obtain ED50 and place a 90% bound on your estimate.
181
Solution
Logdose xi n ri n-ri ri(n – ri)
3 14 2 12 24
6 14 4 10 40
9 14 1 13 13
12 14 0 14 0
15 14 5 9 45
18 14 6 8 48
21 14 6 8 48
24 14 4 10 40
27 14 8 6 48
30 14 1 13 13
Total 37 319
Where d= xi + 1 – xi = 3 xk = 10
( ) ( )
6.3
9.75.11
9.75.11014
3733
2
110
2
1
50 =−=
−+=
−+=
−+= ∑
ED
n
riddxm
nk
(ii) To place a bound on ED50
∑ −−
= )()1(
)(2
2
rinrinn
dmV
)319()114(14
32
2
X−
=
182
2548
2871)( =mV
( )( )
ESZED
mVEStherefore
mV
.
06.1
1268.1
..
1268.1
250 α+⇒
==
=
=
( )( ) ( )
( )3437.5,8563.1
7437.16.3
06.1645.16.3
06.16.3 05.0
=±±=±= Z
Example 6
Log dose xi Number of rats ni Number of dead ri
1.0 5 0
1.5 7 1
1.8 9 2
2.1 6 1
2.7 10 4
2.8 8 5
2.9 12 8
3.1 15 15
Obtain an estimate of ED50
Solution
183
Log dose xi Number of rats ri ni
riPi = PiPi −+1
1.0 5 0 0 ____
1.5 7 1 0.14 0.14
1.8 9 2 0.22 0.08
2.1 6 1 0.17 -0.005
2.7 10 4 0.40 0.23
2.8 8 5 0.63 0.23
2.9 12 8 0.67 0.04
3.1 15 15 1.00 0.33
( )( )2
111∑
=++ +−
=
n
iiiii xxPP
m
5.22
99.4
1
=
=
=i
xi + xi+1 ( )( )11 ++ +− iiii xxPP
2.5 0.35
3.3 0.26
3.9 -0.20
4.8 1.10
5.5 1.22
5.7 0.23
6.0 1.98
4.99
184
Summary of Study Session 6
In this study, you have learnt about:
1. The basic components of Bioassay;
2. The type and nature of Bioassay.
Self-Assessment Question for Study Session 6
Having studied this session, you can now assess how well you have learnt its learning outcomes
by answering the following questions. It is advised that you answer the questions thoroughly and
discuss your answers with your tutor in the next Study Support Meeting. Also, brief answers to the
Self-Assessment Questions are provided at the end of this module as a guide.
SAQ (Test of learning outcome)
The standard preparation of a drug gave the determinations of 10.5, 10.7, 9.8, 9.5 and 9.5. For
the test preparation, we have 12.7, 11, 5, 10.7, 13.5, 12.7 and 13.3.
a. Find the point estimate of the relative potency of the test preparation.
b. Give a 90% confidence interval for the relative potency.
References
Dobson A.J. (2002, 2nd Ed): An Introduction to Generalised Linear Models, published by
Chapman and Hall.
McCullagh P. & Nelder J.A. (1989, 2nd Ed): Generalised Linear Models, published by
Chapman and Hall.
Montgomery D.C., Peck E.A. & Vinning G.G. (2001, 3rd Ed): An Introduction to Linear
Regression Analysis, published by Wiley.
Myers R.H. (1992, 2nd Ed): Classical and Modern Regression with Applications, published by
PWS/Kent.
Thomas, H. Wonnacott, Ronald J. Wonnacott (1977, 2nd Ed): Introduction Statistics for
Business and Economics.
Royal Statistical Society (2004): Statistical tables for use in examinations.
185
Study Session 7: Life Table
Introduction
A very long time ago, before the development of modern probability and statistics, people
were concerned with the length of life and constructed tables to measure longevity.
Traditionally, the life table was used primarily in the actuarial science for the analysis of life
contingencies and in demography to study population changes.
Today, life table methodology is being applied to vital statistics summarizing the mortality
experience of the current population. In this study you will learn about the uses and types of
life table including their functions.
Learning Outcome for Study Session 7
At the end of this study, you should be able to:
7.1 Give the definition, uses and types of life table;
7.2 State the functions of a life table column headings and construct a complete and an
abridged life table; and
7.3 Solve various questions on life tables.
7.1 What is a Life Table?
Life table is a life history of a hypothetical group or cohort of people as it diminishes
gradually by death. It gives the mortality experience of a given population in a standard form.
It is usually used to study the mortality experience of selected cohort. A cohort is a group of
individuals followed over time for a specific purpose.
Life tables provide a description of the most prominent aspect of the state of human
mortality. Essentially, they illuminate and summarize the mortality experience of a
population. The life table is a valuable analytical tool for demographers, epidemiologists,
biologists, and research workers in other fields.
186
7.1.1 Assumptions of Life Table
The record of a life table begins at birth of each member and continues until every member
dies. The cohort losses the pre-determined proportion at each age and this represents a
situation that is artificially defined. This is done by means of a few stated assumptions, which
may be described as follows:
���� The cohort originates from an arbitrary number of births always set as round figures,
such as 1,000, 10,000, 100,000 e.t.c, called the radix.
���� Cohort is closed against migration
���� People die at each age group according to a schedule that is fixed initially, and does
not change.
���� At each age group, except the first few years of life, deaths are evenly distributed
between one age group and the next.
���� The cohort normally contains members of only one sex.
7.1.2 Usefulness of Life Tables
Life tables can be specifically applied to the following aspects of population analysis:
���� Life table models or references can be used for the graduation, adjustment and
extension of limited and defective data.
���� The study of replacement, reproduction, birth intervals, marriage and contraceptive
use.
���� It is used to study population growth and population estimate.
���� In the summarization of age- specific risks of death in a population, by providing us
with the chances of dying and surviving as a function of age and in the comparison of
these risks in different population groups.
���� It is also used by insurance companies to determine the amount to be paid as premium
on life insurance.
Life tables developed about four hundred years ago give the mortality experience of a given
population in a standard form.
7.1.3 Types of Life Table
There are basically two types of life table in use: the conventional (or periodic) life table or
cross-sectional life table and the generation (or cohort) life table.
One conceptual approach to understanding a life table is to imagine a group [of persons all
born at exactly the same moment]. Some persons in this imaginary birth cohort will die
before reaching their first birthday: others will die during their second year and so on.
Although some members of the cohort will attain very advanced ages, all will eventually die
A. The Conventional Life Table
The conventional or periodic life table gives a cross
survival experience of a population during a current year (e.g. the Oyo State population of
2006). It is entirely dependent on the age
it is constructed. Such tables project the life span of each individual in a hypothetical cohort
on the basis of the actual death rates in a given population.
The conventional life table is the most effective means of summarizing th
survival experience of a population.
B. The Generation Life Table
The generation (or cohort) approach has been basically explained by the example given by
the conceptual approach (cohort study) with the modification to take account of pra
situations. In practice, the cohort or generation life table is usually constructed for groups still
living and stops at the present age. The generation life
life span of persons making up the cohort, which st
187
ally two types of life table in use: the conventional (or periodic) life table or
sectional life table and the generation (or cohort) life table.
One conceptual approach to understanding a life table is to imagine a group [of persons all
tly the same moment]. Some persons in this imaginary birth cohort will die
before reaching their first birthday: others will die during their second year and so on.
Although some members of the cohort will attain very advanced ages, all will eventually die
Figure 7.1: Types of Life Table
The Conventional Life Table
The conventional or periodic life table gives a cross-sectional view of the mortality and
survival experience of a population during a current year (e.g. the Oyo State population of
2006). It is entirely dependent on the age-specific death rates prevailing in the year for which
it is constructed. Such tables project the life span of each individual in a hypothetical cohort
on the basis of the actual death rates in a given population.
The conventional life table is the most effective means of summarizing th
survival experience of a population.
The Generation Life Table
The generation (or cohort) approach has been basically explained by the example given by
the conceptual approach (cohort study) with the modification to take account of pra
situations. In practice, the cohort or generation life table is usually constructed for groups still
living and stops at the present age. The generation life-table requires data covering the entire
life span of persons making up the cohort, which stands as a serious limitation to its
The Generation Life Table
The Conventional Life Table
Complete Life Table
Abridged Life Table
ally two types of life table in use: the conventional (or periodic) life table or
One conceptual approach to understanding a life table is to imagine a group [of persons all
tly the same moment]. Some persons in this imaginary birth cohort will die
before reaching their first birthday: others will die during their second year and so on.
Although some members of the cohort will attain very advanced ages, all will eventually die.
sectional view of the mortality and
survival experience of a population during a current year (e.g. the Oyo State population of
in the year for which
it is constructed. Such tables project the life span of each individual in a hypothetical cohort
The conventional life table is the most effective means of summarizing the mortality and
The generation (or cohort) approach has been basically explained by the example given by
the conceptual approach (cohort study) with the modification to take account of practical
situations. In practice, the cohort or generation life table is usually constructed for groups still
table requires data covering the entire
ands as a serious limitation to its
188
preparation.
The generation life table has, however, wide application in medical research where it can be
used to estimate the expectation of life after an operation for certain types of diseases, where
follow- up is necessary. Also, cohort life tables do have practical applications in the study of
animal populations and have been extended to assess the durability of inanimate objects such
as engines, electric light bulbs, and other mechanical objects.
Merits and Demerits of the Two Methods
The merits and demerits of the two types of life table is hinged on which is more appropriate
for specific purposes.
The current (or periodic or conventional) life table presents a picture of the mortality at one
moment of time. It combines the mortality experience of different people, and no single
cohort has actually experienced or will experience this particular pattern, which is in a way
fictitious, but the value of the fiction lies in the graphic and easily comprehensible way it
summarizes mortality conditions.
The cohort (or generation) life table presents a historical record of what has actually occurred
and presents a realistic mortality pattern for one cohort only. The numbers involved are
usually small, and the procedure takes a long time to complete. It is tedious and cumbersome.
Basically for most demographic research, conventional or current life tables are used, and our
main concern is to study how to construct such tables and the uses of its various columns.
Construction of Life Tables
There are two major types of life table which can be constructed:
1. The complete life table
2. Abridged life table
C. Complete Life Table
A complete life table is a table in which the mortality experience is considered in single years
of age throughout the life span. All the columns of the life table are given for single year and
it is extremely detailed.
189
Limitations
1. It needs very extensive detailed data which is not always available to most countries.
2. Since the construction is only carried out in single years, one also needs large
numbers of, say death, at each point; otherwise the estimates will be subject to
systematic and random fluctuations.
D. Abridged Life Table
An abridged life table is one in which the measures are given not for every single year of age
but for age groups. The values are also in general more appropriate i.e. elaborate adjustments
are not made, although they are accurate enough for most practical purposes. It contains
columns similar to those of the complete life table. The only difference is the length of
intervals.
In-Text Question
Life table is a life history of a hypothetical group or cohort of people as it diminishes
gradually by death. True/False?
In-Text Answer
True
7.2 Relevance of Rates in Life Table Analysis
The following are the relevance of rates in Life Table Analysis:
1. Crude Death Rate
2. Age- Specific Death Rate
7.2.1 Crude Death Rate
This is the number of deaths expressed as a fraction or percentage of the total population at a
given point in time.
CDR KxP
D =
190
Where D = number pf deaths during the period
P = number of people in a community present over a specific period.
K = an appropriate constant, usually 100,000 for human population.
Due to migration, births and deaths, the total number of persons in a given year may not be
known precisely. In this case, the mid- year population count could be used as the population
for the year.
Example
A community has 53,280 people and there are 419 deaths in the community during 1999, the
crude death rate (CDR) for 1999, is
CDR 1999 = 419 (100,000)
53,280
= 786 deaths per 100,000 populations
7.2. Age- Specific Death Rate
We sometimes need death rates for specific segment of the population, such as for a certain
age group, or a certain disease or occupation or for a specific sex. The death rates are then
obtained for that particular group. In that case, the number of deaths is divided by the
population at risk.
The population at risk is the total population of the group we are concerned with. Suppose
that during a year we have Dx death in the age group (x, x+1), where the total number of
persons in the same age group (x, x+1) is Px, then the age specific death rate is given by
ASDR 100,000 x x
x
P
D=
191
Example
Suppose the number of people in the age group 35- 39 years is 30,000, and there were 115
deaths from heart disease. Then, the diseases and age specific death rate for 35- 39 years old
persons is
115 X 100,000
30,000
= 383 death per 100,000 persons.
Functions of the Life Table Columns
Generation and cohort life tables are identical in appearance but different in construction. The
following discussion refers to the complete life table.
Let xt p denote the population in the interval x to (x + t) years, and tDx denote the number of
deaths in the interval x to (x + t) years. The following column headings are used in a standard
life table:
(i) Age Interval: (x, x +t)
In the complete life table, t= 1, but in the abridge life table, t=5 except for the first year and
the next four years. So, the intervals for an abridge life table are (0-1). (1-5), (5-10), (10-15)
and so on.
(ii) Age specific Death Rate: (tmx)
This is defined as the number of deaths in a given year between exact ages x and x+t divided
by the mid-year population in an actual population. It is a central rate and forms the basic
data from which life table analysis begins.
tmx x
tx
t
p
D=
where tDx = death in the interval x to x+t
tPx = mid- year population in interval x to x+t.
192
(iii) Survivors of a Cohort of Live Born Babies to the Exact Age x: (lx)
For all real population, this column is a finite non-negative monotonic and non-increasing
function of x over the age range of individual. It can be conveniently taken as 100, 1000,
10,000 or 100,000. If lx is linear and the ultimate age (the age at which everyone dies) is 108,
then l1o8 = 0. If we assume that l0= 108, then lx= 108-x.
(iv) Death Experienced by the Life Table Cohort within the Interval x to x+t : (tdx)
This can be obtained by taking the first difference of the lx column starting with the radix.
Thus, tdx may be interpreted as the number of persons who died during the t-year period after
reaching exact age x. The length of the period t is then equal to the interval between the
birthdays.
For instance, 5d1 therefore refers to the number of persons who reach their first birthday but
die before reaching their sixth birth day.
xt d = lx – lx+t
For t = 1
xd = lx – lx+1
(v) Probability of a Person Aged x Dying within the Interval x to x+t :(tq x)
This is the conditional probability of death of an individual in the age interval (x, x+t) given
that a person has reached age x. It is derived from the corresponding age-specific death rates
of the current population.
xt
xt
x
x
x
xx
x
xx
x
tx
x
txx
x
xt
xt
pqalso
l
l
l
ll
l
dq
tfor
l
l
l
ll
l
dq
−=
−=−
==
=
−=
−==
++
+
+
1 ,
1
1
1
11
where tPx = conditional probability of surviving the interval x to (x+t) years.
193
Note:
This tqx differs from the ordinary age specific death rate, tMx, in the sense that tqx is based on
the life table population at the beginning of the age span x to x + t intervals to which it refers.
The tMx is a central rate based on the actual population at the mid- point of that period.
(vi) Probability of Surviving: ( tPx)
The function tPx defines the probability of a person aged x surviving through the interval x to
x + t year. It is the complement of tqx that is
tPx + tqx = 1
tPx = lx + t
lx
For t = 1
Px = lx + 1
lx
tPx + tqx = 1
tPx = 1- tqx
(vii) Life Table Central Death Rate: (tM x)
It is defined as the number of deaths in the life table (tdx) divided by the number of persons
(tLx) in the stationery population of the life table between age x and x + t
tMx x
ttxx
xt
xt
L
ll
L
d +−==
(viii) Number of Years Lived by the Total Cohort :(tL x)
The function tLx shows the number of persons years lived by the cohort during the interval
between specified birthdays that is between x to x + t years.
tLx ( )txx llt
++=2
194
on the assumption that those who died during a t – year interval live on the average 2
t years.
tLx can also be estimated as
tLx = tlx –2
t (tdx)
ix) Total Number of Years Lived Beyond Age x: (Tx)
The function Tx is derived directly from the tLx columns. This total is essential for
computation of the life expectancy. It is equal to the sum of the number of years lived in each
age interval beginning with age x or
Tx = Lx + Lx + 1 +--- + Lw
Where x = 0, 1, 2, ---, w
W= the highest age attainable
∑=
=w
xyyX LT In general,
TLx = Tx – Tx + t
Tx = Tx + t + tLx.
(x) Life Expectancy: (ex)
This is the expectation of life at age x. It is the number of years, on the average, yet to be
lived by a person of age x. The function is derived from the lx and Tx columns by the
relationship
ex = T x
lx
This column is the most important in the life table since each ex summarizes the mortality
experience of persons beyond age x in the population under consideration.
Since in a stationary population, the number of births in a year is equal to the number of
deaths in the year, then at x = 0
195
population Total
deathor birth ofnumber Total1
0
00
==T
l
e
This is the birth rate or the death rate in the stationary population.
Remark: The proportion of those living at age x who will survive to
age y is x
yyxxxy l
lPPPP == −+ 11ˆ...ˆˆˆ
Example: From the complete life table for California population in table 1, the expected
number of years to be lived by a person aged 20 is
01.54ˆ 20 =e
That is, the person is expected to die at age
= 20 + 20e
= 20 + 54.01
= 74.01 years.
Example of a complete life table:
The following table is the complete life table for total California, U.S.A., 1970, of persons
aged 0 to 23 years.
196
Table 1 Complete life table for total California population, U.S.A; 1970, of persons aged o to 23 years
x to x+1 qx 1x Dx Lx Tx ex
0-1 .01801 100000 1801 98631 7190390 71.90
1-2 .00113 98199 111 98136 7092029 72.22
2-3 .00086 98088 84 98042 6993893 71.30
3-4 .00073 98004 72 97966 6895851 70.36
4-5 .00052 97932 51 97906 6797885 69.41
5-6 .00049 97881 48 97857 6699979 68.45
6-7 .00045 97833 44 97811 6602122 67.48
7-8 .00034 97789 33 97772 6504311 66.51
8-9 .00031 97756 30 97741 6406539 65.54
9-10 .00030 97726 29 97711 6308798 64.56
10-11 .00031 97697 30 97682 6211087 63.58
11-12 .00033 97667 32 97651 6113405 62.59
12-13 .00035 97635 34 976581 6015754 61.61
13-14 .00041 97601 40 97581 5918136 60.64
14-15 .00048 96561 47 97538 5820555 59.66
15-16 .00062 97514 60 97484 5723017 58.69
16-17 .00093 97454 91 97408 5625533 57.73
17-18 .00105 97363 102 97312 5528125 56.78
18-19 .00142 97261 138 97192 5430813 55.84
19-20 .00156 97123 161 97043 5333621 54.92
197
20-21 .00162 96962 157 96884 5236578 54.01
21-22 .00161 96805 156 96727 5139694 53.09
22-23 .00156 96649 151 96574 5042967 52.18
Consider the following numerical examples taken from
Table 1:
a. do = lo x qo = (100,000) x (0.01801)
= 1,801
b. d1 = l1 x q1 -= (98199) x (0.00113)
= 111
c. l1 = lo – do = 100,000 - 1801
= 98,199
d. l2 = l1 – d1 = 98,199-111
= 98,088
e. Po 000,100
981991 == +
x
x
l
l
= 0.98199
f. qo = 1 – Po = 1- 0.98199
= 0.01801
Or, qo 000,100
1801
0
0 ==L
d
= 0.01801
g. What is the probability that a person aged 15 will die before reaching age 19?
From table 1, we obtain the following solution
198
97514
971231
115
1919
−=
−=l
lq
= 1 – 0.99599 = 0.00401
h. What is the probability that a person aged 10 will survive to exact age 15?
Solution
0.99813
97697
97514
l
l P
l
l P
10
15 15
x
tx x
t
=
==
= +
The mortality prospects of an individual at different ages are independent. Therefore, the
probability that a person aged 10 will die between the ages 15 and 19 is as follows:
97697
9712397514
1
10
1915
15
19
10
15
−=−
=
−=
l
ll
l
lX
l
l
= 0.00400
From the table 1 above, we can also evaluate probabilities using
fractional ages. Consider the problem of evaluating 1/3 P10 that a
person aged 10 will survive at least four months.
By definition
( ) ( )
97697
976679769731976973
1
31
10
111010
10
−−=
−−=
l
lllp
= 0.999898
199
Treatment of the under 5 or 0-4 Age Groups
For the estimation of the tlx column for the under 1 or the zero age
Group, we have lo = 0.3lo + 0.7l4
For the 1-4 age group, we have 4l1 = 1.3l4 + 2.7l5
For open ended intervals, say at age 75+, the following approximate
formulae hold:
lx = tdx
Tx = Lx
∞l75 = l75 Log10 l 75
Abridged Life Table
Since complete life tables are large, they are usually abridged.
The standard table has five-year intervals, except the first two. Since there are too many
deaths in the first year of life, the first interval is one year, that is (0-1), and the second is
from ages 1-5 years.
The column of the abridged life table is the same with the columns of the complete life table.
Abridged life tables are used extensively in specific purpose in clinical or animal studies.
They use interval (x, x + n) instead of (x, x + t)
The following table gives an example of an abridged life table.
Table 2. An abridged life table for males in Nigeria
X to
x+n
nMx nqx nPx Lx ndx nLx Tx
personal
ex
Age
interval
Central
age
specific
death rate
Probability
of dying of
dying in
interval x
to x + n
Probability
of
surviving
in interval
x to x+n
Persons
surviving
to exact
age x
No of
deaths x
to n+n
Person
years
lived x
to x+n
Year lived
after exact
age
Expectation
of life at
age x
200
(1) (2) (3) (4) (5) (6) (7) (8) (9)
Under
1
0.1843 0.1632 0.8368 10,000 1632 8858 369856 37.0
1-4 0.0532 0.1861 0.8139 8368 1557 29268 360998 43.1
5-9 0.0095 0.0464 0.9536 6811 316 33265 329733 48.1
10-14 0.0056 0.0276 0.9724 6495 179 32028 295705 45.5
15-19 0.0064 0.0315 0.9685 6316 199 31083 264627 41.9
20-24 0.0069 0.0339 0.9661 6117 207 30068 133539 38.2
25-29 0.0066 0.0325 0.9675 5910 192 29070 203471 34.4
30-34 0.0086 0.0421 0.9579 5718 241 27988 17401 30.5
35-39 0.0097 0.0474 0.9253 5477 260 26735 146412 26.7
40-44 0.0155 0.0747 0.9526 5217 390 25110 119678 22.9
45-49 0.0238 0.1123 0.8877 4827 542 22780 94568 19.6
50-54 0.261 0.1225 0.8775 4285 525 20113 71788 16.8
55-59 0.0311 0.1443 0.8557 3760 543 17443 51675 13.7
60-64 0.0477 0.2131 0.7869 3217 686 14370 34232 10.6
65-69 0.0631 0.2725 0.7275 2531 690 10930 19862 7.8
70-74 0.0992 0.3974 0.6025 1841 732 13751 8932 4.9
75+ 0.1299 1.0000 0.0000 1109 1109 3372 2372 3.0
201
Summary of Study Session 7
In this study you have learnt about:
1) Definition of Life Table
���� Assumptions of Life Table
���� Usefulness of Life Tables
���� Types of Life Table
2) Relevance of Rates in Life Table Analysis
� Age- Specific Death Rate
� Crude Death Rate
Self-Assessment Question for Study Session 7
Having studied this session, you can now assess how well you have learnt its learning outcomes
by answering the following questions. It is advised that you answer the questions thoroughly and
discuss your answers with your tutor in the next Study Support Meeting. Also, brief answers to the
Self-Assessment Questions are provided at the end of this module as a guide.
SAQ (Test of learning outcome)
Write short note on the following:
i. The Generation Life Table
ii. The Conventional Life Table
iii. Complete Life Table
iv. Abridged Life Table
202
Complete the following life table:
X to
x+ t
qx lx dx Lx Tx ex
40-41 93703 250 93578
41-42
42-43 93165 33.71
43-44 92831
44-45 0.00399 369 2954625
45-46 92123 91916
46-47 91710 464
47-48
48-49 90746 510 28.52
49-50 0.00630 568 89952
203
References
Edward R. Dougherty (1990): Probability and Statistics for the Engineering, Compound and
Physical Sciences.
James T. Mcclave, Terry Sincich (1995, 5th Ed): A First Course in Statistics.
Lindgen, McElrath, Berry (1978, 4th Ed): Introduction to Probability and Statistics.
Murray R. Spiegel (1972, 1st Ed): Theory and Problems of Statistics.
Royal Statistical Society (2004): Statistical tables for use in examinations.
204
NOTES ON SAQS
Study Session 1
Solution
(a) Genotype frequency
i. 284.01482
421ˆ 11 ===N
NPAA
ii. 496.01482
735ˆ 12 ===N
NPAS
iii. 220.01482
326ˆ 22 ===N
NPSS
(b) Gene Frequency
i. = ASAA PP ˆˆ2
1+
= 0.284 + ½(0.496)
= 0.284 + 0.248
= 0.532
ii. )(ˆ SP = ASSS PP ˆˆ2
1+
= 0.220 + ½(0.496)
= 0.220 + 0.248
= 0.468
To obtain their standard errors, we calculate their variances first.
iii. V(P(A) = V(PAA) + V(PAS) – 2COV (PAA, PAS)
= N
PP
N
PP
N
PP ASAAASASAAAA 21
21 2)1()1( −
−+−
= 0.284(0.716) + 0.248(0.752) – 2(0.284)(0.248)
)(ˆ AP
205
1482 1482 1482
= 0.203344 + 0.186496 - 0.140864
1482
= 0.248976
1482
= 0.000168
Hence,
S.E. (P(A)) =
= 0.01296
By calculation, S.E.(P(S)) also is 0.01296
(c) 95% Confidence Interval
(i) ( ) ( ) ))(ˆ.(.ˆ
2APESZAPAP α±=
= 0.532 )01296.0(2
05.0Z±
= 0.0532 ±Z0.025(0.01296)
= 0.532 ±1.96(0.01296)
= 0.532±0.0254016
= (0.507, 0.557)
That is 0.507 < P(A) < 0.557
(ii) For P(S) = ))(ˆ(.)(ˆ2
SPESZSP α±
= 0.468 ± Z0.05/2 (0.01296)
= 0.468 ± 1.96 (0.01296)
= 0.468 ± 0.0254016
= (0.443, 0.493)
That is 0.443 < P(S) < 0.493
000168.0
206
Study Session 2
1. Find the median of the following observations 8, 2, 6, 1, 8, 9, 3
Solution:
Arranging the observations in ascending order of magnitude gives 1,2,3,6,8,8,9
The number of observations is odd i.e.
The middle observation, i.e. median is 6
2. Determine the geometric mean of the following set of data 2, 4, 8,11,4
Solution:
Using the previous example, find the geometric mean?
Solution:
Time (Hour) f Midvalue(x) logx f logx
10 - 19 4 14.5 1.1614 4.6456
20 - 29 6 24.5 1.3892 8.3352
30 - 39 7 34.5 1.5378 10.7646
40 - 49 10 44.5 1.6484 16.4840
50 - 59 3 54.5 1.7364 5.2092
60 - 69 7 64.5 1.8096 12.6672
70 - 79 8 74.5 1.8722 14.9776
80 - 89 9 84.5 1.9269 17.3421
90 - 99 6 94.5 1.9754 11.8524
( )5
5
. 2 4 8 1 1 4
2 8 1 6
4 .8 9 7
4 .9
G M x x x x=
==≈
207
Total 60 102.2779
[ ]7046.1log
60
2779.102log
loglog. 1
Anti
Anti
f
xfAntiMG
n
x
=
=
=∑
∑=
G.M = 50.6524
Study Session 3
1. Nine crippled children were asked to perform a certain task, the average time required to
perform the task was 7 minutes with a standard deviation of 2 minutes. Set the null
hypothesis that 0µ = 10 mm and = 0.1
Solution
H0 : µ = 10
H1 : µ 10
= 0.1
Test statistic
5.467.0
33
23
92
107
−=
−=
−=
−=−=
−=
cal
cal
t
ns
xt
ns
xt
µ
µ
α
≠
α
208
Conclusion
Since the calt lie in the rejection region, we reject Ho and conclude that there is significant
evidence at 10% level that the population mean is not 10
2. A researcher wishes to estimate the mean maximum weight at birth of 40 randomly
selected babies delivered in four teaching hospitals in a day. He is willing to assume that the
weights are approximately normally distributed with a variance of 144. He found out that the
mean weight for these babies is 84.3kg. Can the researcher conclude that the weight for the
population is at most 95kg? Use α = 0.01
Test statistic
63.590.1
7.1040
12953.84
326.201.0
−=
−=
−=
===
−=
Z
ZZZn
xZ
tab α
σµ
Conclusion
Since the calZ lie in the acceptance region, we accept H0 and conclude that the mean weight
for the population mean is at most 95kg.
209
Study Session 4
When planning and executing an experiment, the following are the procedures to be followed
in order to achieve a great success:
1. Problem Recognition and Definition: This is the first step in the planning and
execution of an experiment. After the problem has been recognized, it must be clearly
defined. The more we understand the problem at hand, the easier it is for us to find an
appropriate solution to it.
2. Objective of the Experiment: A clear statement of the objective is very important.
This may be in form of a hypothesis to be tested or a question which requires an
answer. If the objective is more than one, arrange them in their order of priority.
3. Analysis of the Problem and Objectives: The aims of the objectives must be
carefully studied and the necessity of the experiment stated in precise terms.
4. Treatment Selection: Reasonable selection of the treatments goes a long way in
achieving the desired results.
5. Selection of the Experimental Units: This involves the selection of experimental
units from the population on which inferences are to be made. The selected or chosen
experimental units must be a true representative of the entire population under
consideration.
6. Selection of the Experimental Design: The objective of the experiment should be
carefully considered here; so that an appropriate design will be given the needed
result that should be preferred.
7. Selection of Units for Observation and the Number of Replications: This involves
the determination of the number of replications and the plot size to be chosen in order
to produce the desired result.
8. Data to be collected: Consideration of data to be collected is indispensable. No
essential data should be omitted.
9. Outlining Statistical Analysis and Summarization of Results: Given the summary
of the expected results in order to show the expected effects of the treatments on the
experimental units. This can be presented on a summary table or graphs. List its
sources, variation and corresponding degrees of freedom in the analysis of variance
210
table. Check whether the expected results of the experiment are relevant to the
objective.
10. Performing the Experiment: The experiment should be conducted without any
element of personal biases. Record all observations properly and later check them
with a view to minimizing recording errors or deleting data that are obviously
erroneous.
11. Data Analysis and Interpretation of Results: Analyse all data and give the
interpretation of the results obtained from the experiment. Data should be scrutinized
and carefully analyzed so that the result will be accurate. If either the data is not well
analyzed or the results not well interpreted, the conclusions may be wrong.
Study Session 5
Research
It can be defined as an organized, formally structured methodology to obtain new knowledge.
Examples of research are the laboratory research, epidemiology research etc.
Placebo
It is an identical looking drug that has no active ingredients. It is an inert substance which
looked like and tasted like the treatment.
Experiment
It is a planned investigation, which is carried out to obtain facts or deny the results of
previous investigation.
Randomization of Randomization
It the process of using random number to assign treatment to patients (subjects). Subjects
should allocate to treatment group according to some chance mechanism.
211
Study Session 6
The standard preparation of a drug gave the determinations of 10.5, 10.7, 9.8, 9.5 and 9.5. For
the test preparation, we have 12.7, 11, 5, 10.7, 13.5, 12.7 and 13.3.
c. Find the point estimate of the relative potency of the test preparation.
d. Give a 90% confidence interval for the relative potency.
Solution
Relative potency, S
T
X
XR == ρ
m
XX
m
iT
T
i∑== 1 and
n
XX
n
iS
S
i∑== 1
siX ( )2
ssi XX − TiX ( )2
TTi XX −
10.5 0.25 12.7 2.0449
10.7 0.49 11 0.0729
9.9 0.04 5 39.3129
9.5 0.25 10.7 0.3249
9.5 0.25 13.5 4.9729
12.7 2.0449
13.3 4.1209
Total 50 1.28 78.9 52.8943
Therefore S
T
X
X=ρ
Where m
XX
m
iT
T
i∑== 1
= 78.9
7
= 11.27
212
127.1 10
27.11 implies This
10550
1
=
==
=∑
=
ρ
n
XX
n
iS
S
i
Therefore the relative potency of the test preparation is 1.127
b) 90% confidence internal for the relative potency
( )
−−−±
− 2
222 11ˆˆ
1
1
TXm
stg
gρρ
And
( ) ( ) ( )
( ) ( )42.5
8943.5228.110
2
2
12
1 1
2212
=
+=
−+−−+=
−
= =
− ∑ ∑
s
s
XXXXmnsn
i
m
iTTissi
Therefore g = ( ) ( )
2
2
105
4258121
X
..
= 0. 036
Therefore 90% confidence limit for ρ
812110050
21025722
2
22
.
,.
.,,
==
=
=
−+−+
t
tt
Xn
stgwhere
nm
S
α
213
( ) ( ) ( )
( )( ) ( )( )
( )7160
57001691
3252801691
02019640127112710371
27117
42581211036011271127103601
2
2
221
.,.
..
..
.....
.
......
=
±=
±=
−−±=
−−−±−= −
X
Therefore, the confidence limit for ρ
is 7.1ˆ6.0 ≤≤ ρ
Study Session 7
A. The Conventional Life Table
The conventional or periodic life table gives a cross-sectional view of the mortality and
survival experience of a population during a current year (e.g. the Oyo State population of
2006). It is entirely dependent on the age-specific death rates prevailing in the year for which
it is constructed.
B. The Generation Life Table
The generation (or cohort) approach has been basically explained by the example given by
the conceptual approach (cohort study) with the modification to take account of practical
situations. In practice, the cohort or generation life table is usually constructed for groups still
living and stops at the present age. The generation life-table requires data covering the entire
life span of persons making up the cohort, which stands as a serious limitation to its
preparation.
C. Complete Life Table
A complete life table is a table in which the mortality experience is considered in single years
of age throughout the life span. All the columns of the life table are given for single year and
it is extremely detailed.
214
D. Abridged Life Table
An abridged life table is one in which the measures are given not for every single year of age
but for age groups. The values are also in general more appropriate i.e. elaborate adjustments
are not made, although they are accurate enough for most practical purposes. It contains
columns similar to those of the complete life table. The only difference is the length of
intervals.