USE OF MATHEMATICAL PROGRAMMING IN
STRATIFICATION WITH COST CONSTRAINT
by
Aluwesi Volau Fonolahi
A thesis submitted in fulfilment of the requirements for the degree of
Master of Science in Mathematics.
Copyright © 2015 by Aluwesi Fonolahi
School of Computing, Information and Mathematical Sciences
Faculty of Science, Technology and Environment
The University of the South Pacific
Suva, Fiji Islands.
September, 2015
Contents
Acknowledgement
Abstract
Preface
1. Introduction
   1.1 Survey
   1.2 Stratified Random Sampling
   1.3 Mathematical Programming Problem
   1.4 The Dynamic Programming Technique
   1.5 The Review of the Literature
2. Determination of the Optimum Strata Boundaries with Cost Constraint
   2.1 Introduction
   2.2 Formulation of the Problem of Determining the OSB as an MPP
   2.3 Solution Procedure Using Dynamic Programming Technique
3. Determination of the Optimum Strata Boundaries for a Population with Exponential Study Variable
   3.1 Exponential Distribution
   3.2 Formulation of the Problem of Determining the OSB for Exponential Variable
   3.3 Numerical Illustration of the Solution Procedure
4. Determination of the Optimum Strata Boundaries for a Population with Right-Triangular Study Variable
   4.1 Right-Triangular Distribution
   4.2 Formulation of the Problem of Determining the OSB for Right-Triangular Variable
   4.3 Numerical Illustration of the Solution Procedure
5. Determination of the Optimum Strata Boundaries for a Population with Cauchy Study Variable
   5.1 Cauchy Distribution
   5.2 Formulation of the Problem of Determining the OSB for Cauchy Variable
   5.3 Numerical Illustration of the Solution Procedure
6. Determination of the Optimum Strata Boundaries for a Population with Power Study Variable
   6.1 Power Distribution
   6.2 Formulation of the Problem of Determining the OSB for Power Variable
   6.3 Numerical Illustration of the Solution Procedure
7. Conclusion
8. Bibliography
9. Appendix
   A. The C++ Program Created to Determine the OSB with Cost Factor for Exponential Distribution
   B. The C++ Program Created to Determine the OSB with Cost Factor for Right-Triangular Distribution
   C. The C++ Program Created to Determine the OSB with Cost Factor for Standard Cauchy Distribution
   D. The C++ Program Created to Determine the OSB with Cost Factor for Power Distribution
List of Tables
3.1 OSW, OSB and Optimum Value of the Objective Function for Exponential Distribution
4.1 OSW, OSB and Optimum Value of the Objective Function for Right-Triangular Distribution
5.1 OSW, OSB and Optimum Value of the Objective Function for Cauchy Distribution
6.1 OSW, OSB and Optimum Value of the Objective Function for Power Distribution
Acknowledgement
First of all I would like to thank the Almighty God for his continual blessing to my life.
This thesis is dedicated to my husband Mr Jale Kotobalavu Fonolahi and my four children,
Joana Agnes Volau Sorowale, Susana Elizabeth Takaiwai Fonolahi, John Rabici Sakai
Fonolahi and Loata Talei Sorowale Fonolahi. Without their support, understanding and
encouragement, I would not have completed this thesis.
There are a number of people whom I would also like to thank for helping me to complete
this thesis.
• My parents, Mr Viliame Sorowale and Mrs Loata Lutu Volau Sorowale, for their
  guidance, encouragement and prayers.
• My supervisor, Dr. M.G.M. Khan, who has guided me through the process of
  research and scholarly writing. I would like to express my sincere gratitude for
  his valuable suggestions, motivation and counselling in this meritorious task.
• Mr Karuna Reddy and Mr Shalvindra Prasad, for answering my queries in
  programming.
• My sponsors, “The Itaukei Scholarship Unit”, for providing financial support
  towards my studies.
• My friends who have helped me in any way to complete this thesis.
Abstract
The aim of survey design is to obtain maximum precision at minimum cost. To achieve
this, stratified random sampling is one of the most commonly used sampling techniques in
designing a survey. When using stratified sampling, the problem of stratification, that is,
determining the optimum strata boundaries (OSB), is one of the main problems encountered
by survey designers. Many authors have proposed different methods of determining the
OSB under the assumption that the total sample size is fixed. They ignored the fact
that the cost of measurement per unit may vary from stratum to stratum. This research is
an attempt to determine the OSB when the budget of the targeted survey is fixed in
advance and the measurement cost per unit varies across the strata. The problem is
formulated as a mathematical programming problem and solved to obtain the optimum
strata widths, which are then used to calculate the optimum strata boundaries. The
formulated mathematical programming problem, being a multistage problem, is solved by
developing a dynamic programming technique. Numerical examples using populations
in which the stratification variable follows the exponential, right-triangular, Cauchy
and power distributions are presented to illustrate the procedure developed in this thesis.
Preface
This thesis, entitled “Use of Mathematical Programming in Stratification with Cost
Constraint”, is submitted to The University of the South Pacific, Suva, Fiji, in fulfilment
of the requirements for the degree of Master of Science in Mathematics.
When conducting a survey it is ideal to collect data from the whole population,
but this is usually impossible because it is too expensive and time consuming. For
this reason, many sampling techniques have been developed. When applying these sampling
techniques, the researcher’s aim is to obtain maximum precision at minimum cost. This
means that the results obtained from the sample should be as close as possible to the
results that would be obtained from the whole population.
One commonly used technique in surveys is stratified random sampling, which increases
the precision of the estimates. When conducting a survey using stratified sampling, one of
the main factors that one should consider is the determination of the optimum strata
boundaries (OSB), known as optimum stratification. The strata boundaries chosen should
ensure that the units within a stratum are as homogeneous as possible. This thesis looks at
constructing the OSB when the cost factor is taken into account, that is, when the total
budget of the survey is fixed in advance and the measurement cost per unit varies across strata.
This thesis has seven chapters. Chapter 1 consists of the introduction which explains the
purpose of conducting a survey and sampling. It explains why the stratified random
sampling technique has been very popular when compared to other sampling techniques
and the factors that need to be considered when conducting stratified random sampling.
This chapter also describes a mathematical programming problem (MPP) as a technique
of finding the optimum values of an objective function given a set of constraints. Also in
this chapter the dynamic programming technique is explained as a method used to solve
complex optimization problems. The chapter ends with a literature review of the different
methods of determining the OSB.
In Chapter 2, the problem of determining the OSB with cost factor is considered. First, a
brief introduction on considering the cost factor in sampling is given. Next, the
formulation of the problem as an MPP with cost factor is presented. The formulated MPP is
reconsidered as an equivalent MPP of determining the optimum strata widths, which is then
solved to obtain the OSB. Lastly, the solution procedure using a dynamic programming
technique is described.
In surveys the main stratification variables may follow different types of distributions. In
most cases it is assumed that the data follow a normal distribution, but in reality this is
often not the case. In many surveys in fields such as engineering, business and economics,
stratification variables generally have a distribution different from the normal distribution.
For this reason, it is important to consider different types of distributions when determining
the OSB of a stratification variable.
Thus, in Chapter 3 the problem of determining the OSB with cost factor for a population
with exponentially distributed stratification variable is considered. The chapter begins
with an introduction of the exponential distribution. Then the MPP for determining the
optimum strata width is described. This is then solved by developing the solution
procedure using a dynamic programming technique. A computer program coded in C++
was created to execute the solution procedure giving both the optimum strata widths and
the OSB as the output. The work carried out in Chapter 3 was presented at the IEEE Asia
Pacific World Congress on Computer Science and Engineering (APWC on CSE) held
on 4 and 5 November 2014 at Plantation Island in Fiji (see Fonolahi and Khan,
2014).
Chapters 4, 5 and 6 look at determining the OSB with cost factor for the stratification
variables, respectively, with right-triangular distribution, Cauchy distribution and power
distribution.
Finally, Chapter 7 gives a brief conclusion of this research work, followed by a
comprehensive list of references in the bibliography and the code for the C++ computer
programs in the appendix.
Chapter 1
Introduction
______________________________
1.1 Survey
A survey is the process of collecting data that aid decision making for development.
These data are analyzed and findings are interpreted from the results of the analyses. Later,
recommendations and conclusions are drawn from the findings.
Surveys are a very important component in the development of any country or organization.
It is from surveys (such as those estimating poverty, agricultural production, etc.) that we get
information on how to improve procedures for development. In government and other
non-profit organizations, before new policies are created, it is very important that a survey
is conducted first to determine whether there is a need for new policies, what changes
are needed to the existing policies and what advantages the new policies would have
compared to the old ones.
Manufacturers conduct many surveys to determine whether the public like their products,
to estimate the number of products that they should manufacture and sell, and to find out how
they can improve the quality of their products to suit the market and people’s needs.
Big businesses also conduct surveys to find the best location to set up their businesses,
how many clients they are likely to have and the type of products that their clients
need. The decisions resulting from information obtained in a survey are critical for
developing production and marketing policies and because of this, it is very important that
the information obtained from the survey is as accurate as possible. Otherwise, conducting
the survey will be a waste of time, money and resources.
The best way to conduct a survey is to reach out and question each individual in the
population. Yet, if the population is very large, this can be very time consuming and also
very expensive. Due to this, there has been a wide range of sampling techniques developed
to try and minimize the time spent and the cost of the survey.
Survey sampling simply means taking a small group from the population; the
statistical analysis of this small group is assumed to represent the whole population. Some
common sampling techniques are simple random sampling, systematic sampling,
quota sampling, stratified random sampling and cluster sampling. Choosing the sample is
a very critical process, as we try to ensure that the sample gives statistical results
that closely match those that would be obtained from the whole population. So after one has
chosen a sampling technique, the next step is to develop a way of using this technique that
makes the results as precise as possible. The sampling technique that will be
looked at in this research project is stratified random sampling.
1.2 Stratified Random Sampling
In stratified random sampling, a population of size $N$ is divided into smaller non-overlapping
groups of sizes $N_1, N_2, \ldots, N_L$ such that $N_1 + N_2 + \cdots + N_L = N$. These $L$
groups are called strata. After the strata have been formed, independent samples of
sizes $n_1, n_2, \ldots, n_L$ are drawn from within each stratum by simple random sampling. This
method is used when we have heterogeneous units. Heterogeneous means that the units
vary a lot or have a very wide range. The main aim of stratification is to try and group
similar types of units together, that is, the strata should be as homogeneous as possible.
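The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `stratified_sample`, the toy strata and the sample sizes are assumptions made for the example and do not come from the thesis.

```python
import random

def stratified_sample(strata, sizes, seed=0):
    """Draw an independent simple random sample (without replacement) from each stratum."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    return [rng.sample(units, n) for units, n in zip(strata, sizes)]

# Hypothetical population of N = 100 units split into L = 3 non-overlapping strata.
strata = [list(range(0, 50)), list(range(50, 80)), list(range(80, 100))]
sizes = [5, 3, 2]  # n_1, n_2, n_3
samples = stratified_sample(strata, sizes)
```

Each inner list of `samples` contains units from one stratum only, so every stratum is represented in the overall sample by construction.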
There are many reasons why stratified random sampling is chosen over the other methods.
Some reasons are:
1. The researcher ensures the representation of all the different subgroups in the
population, especially in cases where there are extreme values. For example, if
a survey is to be conducted on the average income of people working in a
company, then the strata could be formed according to the different categories of
positions in the company, i.e. executive positions, senior positions, officers,
assistants and interns. In this way, the employees within a group are homogeneous
with respect to salary and all the salary groups are represented in the
sample, even the executive positions with high salaries, although there are very few
people holding these positions.
2. There is increased precision obtained from the result since the subgroups are more
homogeneous.
3. The researcher is able to obtain individual results for each stratum and compare
the results across strata, especially if they are interested in finding out more
information about a particular subgroup. This can also set a platform
for further research. For example, in an agricultural survey on the
right time to pick oranges so that they produce sweet juice, the
strata can be formed according to the different types of oranges.
4. Since the variability of the data within each stratum is smaller than that of
the entire population, stratified random sampling usually requires a smaller sample.
This is because the units within each stratum are very similar, so the results
obtained will be similar whether the sample is big or small. Thus, choosing a
smaller sample saves a lot of money, time and energy.
5. Stratification has an administrative advantage because interviewers can be
trained to interview a specific group. This helps the interviewer to study and
manage a specific group, which makes the work easier than
focusing on a very wide range of the population. It also saves the money and
time that would be needed to train each interviewer in the different skills required
for the whole population, and helps to ensure that rich information is obtained from
the interview. For example, if research has to be conducted among people who speak
different languages, then the strata can be formed according to these
languages and each interviewer can be assigned to the stratum whose
language they know, instead of training each interviewer in all the
languages.
It is important to note that if stratification is not done properly then the results obtained
will be unreliable. Thus, to obtain maximum precision in the estimate of the study variable
when using stratification, one should consider the following factors:
1. The choice of the stratification variables.
2. The determination of the number of strata.
3. The determination of the optimum strata boundaries.
4. The determination of the optimum sample size to be selected from within each
stratum.
This research project will specifically look at the problem (3) above.
1.3 Mathematical Programming Problem
A Mathematical Programming Problem (MPP) can be stated as a technique of finding the
optimum solution of an objective function from all feasible values given by a set of
constraints. The general form of an MPP is given as:
Maximize (or minimize): $Z = f(x_1, x_2, \ldots, x_n)$
Subject to $g_i(x_1, x_2, \ldots, x_n)\ \{\le, =, \ge\}\ 0;\ i = 1, 2, \ldots, m$ and $x_j \ge 0;\ j = 1, 2, \ldots, n$.
All the functions in the MPP above are assumed to be continuously differentiable unless
stated otherwise, and for each $i$ only one of the signs $\le$, $=$, $\ge$ holds true. An MPP may
consist of one or more objective functions and constraints of several decision variables.
The equation that expresses the system response as a function of decision variables is
referred to as the objective function. The objective function is the function that we wish
to maximize or minimize. In the general form above the objective function is given as
$Z = f(x_1, x_2, \ldots, x_n)$. The decision variables $x_j \ge 0;\ j = 1, 2, \ldots, n$ are variables
which can be controlled and which influence the performance of the system. In most
situations some values of the decision variables are not possible. When there are
limitations on the resources required to implement a system, they are expressed as
constraint equations. In the general form above the constraint functions are given as
$g_i(x)$. The aim of solving the MPP is to find the values of the decision variables that give
the optimum solution for a particular objective function given a set of constraints. Thus,
when solving the MPP one can locate the value(s) of the decision variable(s) that will result
in the “best” (optimum) system in view of the limited resources available.
If the objective function and all the constraints are linear functions then the
MPP is a Linear Programming Problem (LPP). In more complex situations where nonlinear
functions are involved, the MPP becomes a Nonlinear Programming Problem (NLPP).
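As a small illustration of the general form, consider the hypothetical NLPP of minimizing $Z = (x_1 - 1)^2 + (x_2 - 2)^2$ subject to $x_1 + x_2 - 2 \le 0$ and $x_1, x_2 \ge 0$. The problem, the function name `solve_by_grid` and the brute-force grid search below are assumptions chosen for the example; they are not taken from the thesis and are not an efficient solution method.

```python
def solve_by_grid(steps_per_unit=100, upper_units=2):
    """Exhaustive search over a grid of (x1, x2) values with x1, x2 in [0, upper_units]."""
    best_value, best_point = None, None
    top = steps_per_unit * upper_units
    for i in range(top + 1):
        for j in range(top + 1):
            # Constraint x1 + x2 - 2 <= 0, checked on integer grid indices.
            if i + j > 2 * steps_per_unit:
                continue
            x1, x2 = i / steps_per_unit, j / steps_per_unit
            z = (x1 - 1.0) ** 2 + (x2 - 2.0) ** 2  # objective function
            if best_value is None or z < best_value:
                best_value, best_point = z, (x1, x2)
    return best_point, best_value

point, value = solve_by_grid()
```

For this toy problem the unconstrained minimum (1, 2) is infeasible, so the optimum lies on the constraint boundary $x_1 + x_2 = 2$.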
The MPP has received great attention from researchers in the fields of mathematics,
economics and operations research. The advantage of an MPP is that one can easily
manipulate the variables, parameters and constraints or even change the objective function.
The MPP has grown into many branches depending on the nature of the objective function,
the constraints and the decision variables. Some branches of MPP are Integer
Programming Problem (IPP), Quadratic Programming Problem (QPP), Convex
Programming Problem (CPP), Separable Programming Problem (SPP), Multi Objective
Programming Problem (MOPP), Fractional Programming Problem (FPP) and Geometric
Programming Problem (GPP).
The problem of stratification usually involves nonlinear functions so the research carried
out in this thesis will be attempted using NLPP. Then, a Dynamic Programming technique
is used to solve the problem.
1.4 The Dynamic Programming Technique
Dynamic Programming is used to solve complex optimization problems. After the MPP
has been constructed, an appropriate optimization technique must be chosen. This will
depend on the form of the objective function and constraints, the number and nature of the
variables and the kind of computational facilities available. Due to the complexity of the
nature of the problem to be optimized there is a need to make some transformation. The
transformed model preserves the properties of the original model but it is now in a form
that can be easily optimized. This transformation is known as dynamic programming.
Dynamic programming takes a sequential or multistage decision process containing many
interdependent variables and converts it into a series of single-stage problems, each
containing only a single decision variable. This transformation is invariant in that the
number of feasible solutions and the value of the objective function associated with
each feasible solution are preserved. The transformation is based on Bellman’s (1957)
principle of optimality that:
“An optimal set of decisions has the property that whatever the first decision is, the
remaining decisions must be optimal with respect to the outcome which results from the
first decision.”
Although the principle of optimality seems both obvious and simple, it can more
appropriately be described as powerful, subtle and elusive. We may say that a problem
with N decision variables can be transformed into N sub-problems, each containing only
one decision variable. As a rule of thumb, the computations increase exponentially with
the number of variables, but only linearly with the number of sub-problems. Thus there can
be great computational savings. Often this saving makes the difference between an
unsolvable problem and one requiring only a small amount of computer time.
An MPP that has the characteristics listed below can be solved using dynamic
programming techniques.
1. The given MPP may be described as a multistage decision problem, where at each
stage, the value(s) of one or more decision variables are to be determined.
2. The problem must be defined for any number of stages.
3. At each stage, there must be a specified set of parameters describing the state of
the system, that is, the parameters on which the values of the decision variables
and the objective functions depend.
4. The same set of state parameters must be described as the state of the system
irrespective of the number of stages.
5. The decision at any stage, that is, the determination of the decision variable(s) at
any stage must have no effect on the decisions of the remaining stages except in
changing the values of the parameter which describes the state of the system.
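The characteristics listed above can be seen in a small recursive sketch. The toy problem (split a total of 6 units over 3 stages so that the sum of squared stage decisions is minimum), the function name `solve` and the stage cost are assumptions chosen for illustration; the state of the system is the amount still to be allocated, and each stage determines one decision variable.

```python
from functools import lru_cache

def solve(stages, total, cost):
    """Minimize the sum of cost(x_k) over integer x_k >= 0 with x_1 + ... + x_stages = total."""
    @lru_cache(maxsize=None)
    def best(k, remaining):
        # k = number of stages still to be decided; remaining = state of the system.
        if k == 1:
            return cost(remaining), (remaining,)  # last stage takes whatever is left
        candidates = []
        for x in range(remaining + 1):
            value, tail = best(k - 1, remaining - x)  # optimal value of the remaining stages
            candidates.append((cost(x) + value, (x,) + tail))
        return min(candidates)
    return best(stages, total)

value, decisions = solve(3, 6, lambda x: x * x)
```

Each call to `best` solves a single-variable sub-problem and reuses stored results for states already visited, which is the source of the computational savings described in this section.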
Certain problem areas, such as inventory theory, allocation, control theory, and chemical
engineering design, have been particularly fertile for dynamic programming applications.
Dynamic Programming was certainly practiced long before it was named. Wald’s (1947)
work on sequential decision theory contains the seed of dynamic programming approach.
The two papers by Dvoretzky et al. (1952), on inventory theory are certainly in the spirit
of dynamic programming.
Undoubtedly, however, Richard Bellman is the father of dynamic programming. His
research at the Rand Corporation in the 1950’s led to the publication of a large number of
significant papers on dynamic programming first published in Bellman (1957). He
invented the rather undescriptive but alluring name for the approach: dynamic
programming. A more representative but less glamorous name would be recursive
optimization.
1.5 The Review of the Literature
To obtain the maximum precision of the estimates of population parameters when using
stratified random sampling, one of the important problems to consider is the determination
of the optimum strata boundaries (OSB). In practice, the OSB are determined by cutting
the range of the distribution of the study variable at suitable points.
The choice of OSB is important to ensure that the units in each stratum are homogeneous.
This means that, in order to achieve maximum precision, the stratum variance $\sigma_h^2$ should
be as small as possible for a given type of sample allocation. The problem of determining
the OSB was first studied by Dalenius (1950), who used the study variable as the
stratification variable. He presented a set of minimal equations for finding the OSB.
Unfortunately, the minimal equations were difficult to solve because of their implicit nature.
Another classical method of obtaining the OSB was given by Dalenius and
Gurney (1951), who mentioned that the boundary points can be obtained by making
$W_h \sigma_h$ constant, where $W_h$ is defined as the weight of the $h$th stratum. However, they
found that an explicit solution could not be determined, although they managed to obtain some
relations which the OSB points must satisfy. So, starting with a set of points, they proceeded
towards the optimum set by iterative steps, but it was noted that the results can be
unreliable for more than two strata. On the other hand, Mahalanobis (1952) and Hansen and
Hurwitz (1953) suggested that the stratum boundaries can be obtained by making $W_h \mu_h$
constant, where $\mu_h$ is defined as the stratum mean. Their rule is to consider equal stratum
totals, given the condition that the coefficients of variation within the strata are equal and
will remain the same if the strata sizes are adjusted. The advantage of this rule is its
simplicity and it has been claimed that it works well with a large number of real
populations.
Some authors proposed approximation rules. Aoyama (1954) suggested
that boundaries of equal width should be made, while Ekman (1959) gave a condition for
determining the stratum boundaries, which is that $W_h (x_h - x_{h-1})$ should be constant. In
reality the frequency distribution of the study variable is usually not known, so authors
like Dalenius (1957), Taga (1967), Singh and Sukhatme (1969, 1972, 1973), Singh (1971),
Singh and Prakash (1975), Mehta et al. (1996), Rizvi et al. (2002) and Gupta et al. (2005)
used the frequency distribution of an auxiliary variable and came up with different
approximation methods of determining the OSB.
Dalenius and Hodges (1959) first came up with the method of constructing the OSB by
dividing the cumulative square root of the frequencies into equal intervals. This method was
tested by Cochran (1977), who mentioned that it works well, especially when the
regression of $y$ on $x$ is linear and the correlation coefficient $\rho$ is nearly perfect. It was
noted that the disadvantages of using this rule are that the breaks for the intervals and the
number of initial class intervals are arbitrary. Due to these disadvantages authors like Singh
and Sukhatme (1969, 1972, 1973), Singh (1971), Singh and Prakash (1975), Mehta et al.
(1996), Rizvi et al. (2002) and Serfling (1968) tried to modify the Dalenius and Hodges
(1959) rule in some way. On the other hand, Cochran (1961), Hess et al. (1966) and Murthy
(1967) compared some classical methods of obtaining the OSB and concluded that the
Ekman method and the Dalenius and Hodges method worked consistently well.
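The cumulative square root of frequency rule can be sketched as follows. This is one simple way of applying the rule, in which each boundary is taken at the class limit whose cumulative $\sqrt{f}$ is nearest to an equally spaced target; the function name `cum_sqrt_f_boundaries` and the grouped frequencies are hypothetical and are used only for illustration.

```python
import math

def cum_sqrt_f_boundaries(upper_bounds, freqs, num_strata):
    """Pick L - 1 class limits that split the cumulative sqrt(f) scale into L nearly equal parts."""
    cum, total = [], 0.0
    for f in freqs:
        total += math.sqrt(f)
        cum.append(total)  # cumulative sqrt(f) up to each class upper limit
    boundaries = []
    for k in range(1, num_strata):
        target = k * total / num_strata  # equally spaced point on the cum sqrt(f) scale
        idx = min(range(len(cum)), key=lambda i: abs(cum[i] - target))
        boundaries.append(upper_bounds[idx])
    return boundaries

# Hypothetical grouped data: class upper limits and class frequencies.
upper_bounds = [10, 20, 30, 40, 50, 60]
freqs = [25, 16, 9, 4, 4, 1]
boundaries = cum_sqrt_f_boundaries(upper_bounds, freqs, 3)
```

The result depends on how the initial classes are formed, which is the disadvantage of the rule noted above.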
Sethi (1963) came up with another method where he proposed that the boundaries can be
calculated from the calculus equations:
$$\frac{\sigma_h^2 + (x_h - \mu_h)^2}{\sigma_h} = \frac{\sigma_{h+1}^2 + (x_h - \mu_{h+1})^2}{\sigma_{h+1}}.$$
Lately, more researchers have moved in the direction of proposing algorithms to
determine the OSB, such as Unnithan (1978), Lavallee and Hidiroglou (1988), Niemiro
(1999), Nicolini (2001), Lednicki and Wieczorkowski (2003) and Kozak (2004). One such
method was proposed by Bühler and Deutler (1975), where the determination of the OSB is
formulated as an optimization problem and solved using a dynamic programming technique.
This method was reviewed by Khan et al. (2008). Lavallee (1988, 1988) also used this approach
where the OSB divide the population domain of two stratification variables into distinct subsets
such that the precision of the variable of interest is maximized.
Recently Khan et al. (2002) proposed a technique to obtain the exact value for the OSB
when the frequency distribution of the study variable is given and the number of strata is
fixed in advance, by formulating the problem as an MPP and solving it using the dynamic
programming approach of Bühler and Deutler (1975). Later Khan et al. (2002, 2003, 2005, 2008,
2008, 2014) and Nand and Khan (2005a, 2005b) applied these procedures to determine
the OSB for other populations with different types of distributions such as uniform, right
triangular, exponential, triangular, standard normal, Cauchy, power and log-normal
frequency distributions. Khan et al. (2015) also looked at determining the OSB for skewed
populations using auxiliary information that follows the gamma distribution. They found
the OSB by considering the problem as that of determining the optimum strata widths (OSW),
which they formulated as an MPP and solved using the dynamic programming method.
This research is an attempt to determine the OSB for populations in which the study
variable follows the exponential, right-triangular, Cauchy and power distributions
respectively. The OSB are obtained for these populations by extending the problems
discussed in Khan et al. (2002, 2005) while taking into account the cost constraint, as
discussed in the subsequent chapters.
Chapter 2
Determination of the Optimum Strata Boundaries with Cost Constraint
______________________________
2.1 Introduction
In the literature so far, the construction of optimum stratification has been discussed
only in terms of a given total sample size. However, in practice, in many surveys the
total budget is fixed in advance and the cost of measurement per unit varies
across the strata. Thus, the OSB obtained based on the total sample size may not remain
optimum for a given cost. Due to this fact, it is important to consider the problem of
determining the OSB that is constrained by cost.
2.2 Formulation of the Problem of Determining the
OSB as an MPP
In stratified random sampling, where the population is divided into $L$ strata, an
unbiased estimate of the population mean $\bar{Y}$ is given by
$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h \qquad (2.1)$$
with variance
$$V(\bar{y}_{st}) = \sum_{h=1}^{L} \left( \frac{1}{n_h} - \frac{1}{N_h} \right) W_h^2 S_h^2, \qquad (2.2)$$
where $W_h = N_h / N$ is the proportion of the population contained in the $h$th stratum, $\bar{y}_h$ is the mean
of a sample of size $n_h$ and $S_h^2$ is the variance of the $h$th stratum.
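As a numerical illustration of the estimator (2.1) and its variance (2.2), the following sketch evaluates both expressions for three hypothetical strata. The function name `stratified_mean_and_variance` and all the numbers are assumptions made for the example.

```python
def stratified_mean_and_variance(N_h, n_h, ybar_h, S2_h):
    """Evaluate the stratified mean (2.1) and its variance (2.2)."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]  # stratum weights W_h = N_h / N
    mean = sum(w * y for w, y in zip(W, ybar_h))
    var = sum((1.0 / n - 1.0 / Nh) * w ** 2 * s2
              for n, Nh, w, s2 in zip(n_h, N_h, W, S2_h))
    return mean, var

# Hypothetical strata: sizes, sample sizes, sample means and stratum variances.
mean, var = stratified_mean_and_variance(
    N_h=[50, 30, 20], n_h=[5, 3, 2], ybar_h=[10.0, 20.0, 30.0], S2_h=[4.0, 9.0, 16.0]
)
```

With these inputs the weights are 0.5, 0.3 and 0.2, so the estimate is 0.5(10) + 0.3(20) + 0.2(30) = 17.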
The total cost $C$ of a survey may be expressed as
$$C = c_0 + \sum_{h=1}^{L} c_h n_h, \qquad (2.3)$$
where $c_0$ represents the overhead cost. This is usually the cost of administration and
of training the interviewers. The term $c_h$ gives the cost of collecting information
per unit in the $h$th stratum. This is usually the cost of travelling to conduct the interview and
the cost of the interview itself, which usually differs from stratum to stratum. A reason for
this could be the different distances the interviewer needs to travel to conduct the
interviews. For example, in one stratum the houses where the interviews are to be
conducted may be very close together, so the cost of travelling is low, whereas in
another stratum the houses may be far apart, so the cost of travelling is high.
Then, the problem of determining the optimum allocation of sample sizes $n_h;\ h = 1, 2, \ldots, L$ for
which $V(\bar{y}_{st})$ in (2.2) is minimum for a fixed total cost $C$ is given by
Minimize $V(\bar{y}_{st}) = \sum_{h=1}^{L} \left( \frac{1}{n_h} - \frac{1}{N_h} \right) W_h^2 S_h^2$,
subject to $c_0 + \sum_{h=1}^{L} c_h n_h = C$. (2.4)
Solving the problem stated in (2.4) using a Lagrange multiplier technique, the optimum
allocation $n_h$ is obtained as
$$n_h = \frac{(C - c_0)\, W_h S_h / \sqrt{c_h}}{\sum_{h=1}^{L} W_h S_h \sqrt{c_h}}. \qquad (2.5)$$
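The allocation (2.5) can be evaluated directly. In the sketch below the function name `optimum_allocation` and the values of $W_h$, $S_h$, $c_h$, $C$ and $c_0$ are hypothetical; the resulting sample sizes are left as real numbers, whereas in practice they would be rounded to integers.

```python
import math

def optimum_allocation(W, S, c, C, c0):
    """Sample sizes n_h from (2.5) for stratum weights W, standard deviations S and unit costs c."""
    denom = sum(w * s * math.sqrt(ch) for w, s, ch in zip(W, S, c))
    return [(C - c0) * w * s / (math.sqrt(ch) * denom) for w, s, ch in zip(W, S, c)]

# Hypothetical inputs for three strata.
W = [0.5, 0.3, 0.2]
S = [2.0, 3.0, 4.0]
c = [1.0, 4.0, 9.0]
n = optimum_allocation(W, S, c, C=100.0, c0=10.0)

# The allocation spends exactly the available budget, as required by (2.4).
total_cost = 10.0 + sum(ch * nh for ch, nh in zip(c, n))
```

Strata that are larger, more variable or cheaper to measure receive larger samples, since $n_h$ is proportional to $W_h S_h / \sqrt{c_h}$.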
Substituting (2.5) in (2.2), the variance under this optimum allocation is given by
$$V(\bar{y}_{st}) = \frac{\left( \sum_{h=1}^{L} W_h S_h \sqrt{c_h} \right)^2}{C - c_0} - \sum_{h=1}^{L} \frac{W_h S_h^2}{N}. \qquad (2.6)$$
If the finite population correction is ignored, then minimizing the expression on the right
hand side of (2.6) is equivalent to minimizing
$$\sum_{h=1}^{L} W_h S_h \sqrt{c_h}. \qquad (2.7)$$
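A quick numerical check of this step, using hypothetical values of $W_h$, $S_h$, $c_h$, $C$ and $c_0$: when the finite population correction is ignored, the variance under the allocation (2.5) equals the square of (2.7) divided by $C - c_0$, and another allocation with the same budget gives a larger variance.

```python
import math

# Hypothetical inputs for three strata.
W = [0.5, 0.3, 0.2]
S = [2.0, 3.0, 4.0]
c = [1.0, 4.0, 9.0]
C, c0 = 100.0, 10.0

def variance_no_fpc(n):
    """Variance (2.2) with the finite population correction ignored."""
    return sum((w * s) ** 2 / nh for w, s, nh in zip(W, S, n))

objective = sum(w * s * math.sqrt(ch) for w, s, ch in zip(W, S, c))  # expression (2.7)
n_opt = [(C - c0) * w * s / (math.sqrt(ch) * objective) for w, s, ch in zip(W, S, c)]

v_opt = variance_no_fpc(n_opt)
v_formula = objective ** 2 / (C - c0)  # first term of (2.6)

# An equal allocation that spends the same budget, for comparison.
n_equal = [(C - c0) / sum(c)] * 3
v_equal = variance_no_fpc(n_equal)
```

Since $C - c_0$ is a fixed constant, making the variance small amounts to making (2.7) small, which is why (2.7) is used as the objective function.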
Assume that the stratification variable $x$ has a continuous frequency function $f(x)$, $a \le x \le b$, and that the population is to be divided into $L$ strata. If $x_0 = a$ and $x_L = b$ are the initial and final values of the distribution, then the problem of determining the OSB is to cut the range of the distribution, $d$, that is
$$x_L - x_0 = d \tag{2.8}$$
at the intermediate points $x_1 \le x_2 \le \dots \le x_{L-1}$ such that the expression in (2.7) is minimum.
With a known frequency function $f(x)$ of the stratification variable $x$, the values of $W_h$ and $S_h$ in (2.7) are obtained by
$$W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx, \tag{2.9}$$
$$S_h^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x^2 f(x)\,dx - \mu_h^2, \tag{2.10}$$
where
$$\mu_h = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x f(x)\,dx \tag{2.11}$$
and $(x_{h-1}, x_h)$ are the boundaries of the $h$th stratum.
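For a density without convenient closed forms, the quantities in (2.9) to (2.11) can be approximated by numerical integration. The sketch below is an illustration under stated assumptions: it uses composite Simpson's rule, and the names `simpson`, `stratum_moments` and `stratum_term` are hypothetical helpers, not taken from the thesis.

```python
import math

def simpson(func, lo, hi, panels=1000):
    """Composite Simpson's rule on [lo, hi]; panels must be even."""
    h = (hi - lo) / panels
    total = func(lo) + func(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0

def stratum_moments(density, lo, hi):
    """Return (W_h, mu_h, S_h^2) for the stratum (lo, hi) as in (2.9)-(2.11)."""
    weight = simpson(density, lo, hi)
    mean = simpson(lambda x: x * density(x), lo, hi) / weight
    second = simpson(lambda x: x * x * density(x), lo, hi) / weight
    return weight, mean, second - mean ** 2

def stratum_term(density, lo, hi, unit_cost):
    """One summand W_h * S_h * sqrt(c_h) of (2.7)."""
    weight, _, variance = stratum_moments(density, lo, hi)
    return weight * math.sqrt(max(variance, 0.0) * unit_cost)

# Example: uniform density on [0, 1], stratum (0.2, 0.6).
weight, mean, variance = stratum_moments(lambda x: 1.0, 0.2, 0.6)
```

For the uniform example the stratum weight is its width and the variance is the squared width divided by 12, which gives a quick check of the helper.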
So (2.7) can be expressed as a function of the boundary points $(x_{h-1}, x_h)$.
Let
$$f_h(x_{h-1}, x_h) = W_h S_h \sqrt{c_h}. \tag{2.12}$$
Then the problem of obtaining the OSB can be expressed as:
Find $x_1, x_2, \dots, x_{L-1}$ that
$$\text{Minimize } \sum_{h=1}^{L} f_h(x_{h-1}, x_h)$$
$$\text{subject to } a = x_0 \le x_1 \le x_2 \le \dots \le x_{L-1} \le x_L = b. \tag{2.13}$$
Let
$$l_h = x_h - x_{h-1} \tag{2.14}$$
denote the width of the $h$th ($h = 1, 2, \dots, L$) stratum.
With this definition of $l_h$, the range of the distribution in (2.8) is expressed as a function of the stratum widths as
$$\sum_{h=1}^{L} l_h = \sum_{h=1}^{L} (x_h - x_{h-1}) = x_L - x_0 = d. \tag{2.15}$$
The $h$th stratification point $x_h$; $h = 1, 2, \dots, L-1$ is then expressed as
$$x_h = x_0 + l_1 + l_2 + \dots + l_h = x_{h-1} + l_h. \tag{2.16}$$
Adding (2.15) as a new constraint, the problem of determining the OSB can be treated as the problem of determining the optimum strata widths $l_1, l_2, \dots, l_L$ and can be expressed as the following Mathematical Programming Problem (MPP):
$$\text{Minimize } \sum_{h=1}^{L} f_h(l_h, x_{h-1})$$
$$\text{subject to } \sum_{h=1}^{L} l_h = d$$
$$\text{and } l_h \ge 0;\ h = 1, 2, \dots, L. \tag{2.17}$$
Initially, $x_0$ is known. Therefore, the first term $f_1(l_1, x_0)$ in the objective function of (2.17) is a function of $l_1$ alone. Once $l_1$ is known, the second term $f_2(l_2, x_1) = f_2(l_2, x_0 + l_1)$ becomes a function of $l_2$ alone, and so on. Due to this special nature, the objective function of the MPP (2.17) may be treated as a function of $l_h$ alone and the problem can be expressed as:
$$\text{Minimize } \sum_{h=1}^{L} f_h(l_h)$$
$$\text{subject to } \sum_{h=1}^{L} l_h = d$$
$$\text{and } l_h \ge 0;\ h = 1, 2, \dots, L. \tag{2.18}$$
2.3 Solution Procedure Using Dynamic Programming
Technique
The problem (2.18) is a multistage decision problem in which the objective function and the constraint are sums of separable functions of $l_h$; $h = 1, 2, \dots, L$. Due to this separable characteristic and the nature of the problem, the MPP (2.18) may be solved using a dynamic programming technique (see Khan et al., 2008). Dynamic programming determines the optimum solution of a multi-variable problem by decomposing it into stages, each stage comprising a single-variable sub-problem. A dynamic programming model is basically a recursive-equation-based solution procedure, which rests on Bellman’s principle of optimality (Bellman, 1957). The recursive equation links the different stages of the problem in a manner which guarantees that each stage’s optimum feasible solution is also optimal and feasible for the entire problem (see Taha 1997, chapter 10).
Consider a sub-problem of (2.18) for the first $k$ ($< L$) strata:
$$\text{Minimize } \sum_{h=1}^{k} f_h(l_h)$$
$$\text{subject to } \sum_{h=1}^{k} l_h = d_k$$
$$\text{and } l_h \ge 0;\ h = 1, 2, \dots, k, \tag{2.19}$$
where $d_k < d$ is the total width available for division into $k$ strata.
Note that $d_k = d$ when $k = L$.
The transformation functions are given by
$$\begin{aligned}
d_k &= l_1 + l_2 + \dots + l_k \\
d_{k-1} &= l_1 + l_2 + \dots + l_{k-1} = d_k - l_k \\
d_{k-2} &= l_1 + l_2 + \dots + l_{k-2} = d_{k-1} - l_{k-1} \\
&\ \ \vdots \\
d_2 &= l_1 + l_2 = d_3 - l_3 \\
d_1 &= l_1 = d_2 - l_2
\end{aligned}$$
Let $f(k, d_k)$ denote the minimum value of the objective function of (2.19), that is,
$$f(k, d_k) = \min\left[\sum_{h=1}^{k} f_h(l_h) \;\Big|\; \sum_{h=1}^{k} l_h = d_k \text{ and } l_h \ge 0;\ h = 1, 2, \dots, k\right]. \tag{2.20}$$
With the above definition of $f(k, d_k)$, the MPP (2.18) is equivalent to finding $f(L, d)$ recursively by finding $f(k, d_k)$ for $k = 1, 2, \dots, L$ and $0 \le d_k \le d$.
We can write:
$$f(k, d_k) = \min\left[f_k(l_k) + \sum_{h=1}^{k-1} f_h(l_h) \;\Big|\; \sum_{h=1}^{k-1} l_h = d_k - l_k \text{ and } l_h \ge 0;\ h = 1, 2, \dots, k\right].$$
For a fixed value of $d_k$ and any $l_k$ with $0 \le l_k \le d_k$,
$$f(k, d_k) = f_k(l_k) + \min\left[\sum_{h=1}^{k-1} f_h(l_h) \;\Big|\; \sum_{h=1}^{k-1} l_h = d_k - l_k \text{ and } l_h \ge 0;\ h = 1, 2, \dots, k-1\right].$$
Using Bellman’s Principle of Optimality, we get the recurrence relation of the Dynamic Programming technique as follows.
For stages $k \ge 2$:
$$f(k, d_k) = \min_{0 \le l_k \le d_k}\left[f_k(l_k) + f(k-1, d_k - l_k)\right]. \tag{2.21}$$
For the first stage (i.e. $k = 1$):
$$f(1, d_1) = f_1(d_1) \;\Rightarrow\; l_1^* = d_1, \tag{2.22}$$
where $l_1^* = d_1$ is the optimum width of the first stratum. The relations (2.21) and (2.22) are solved recursively for $k = 1, 2, \dots, L$ and $0 \le d_k \le d$, and $f(L, d)$ is obtained. From $f(L, d)$ the optimum width of the $L$th stratum, $l_L^*$, is obtained. From $f(L-1, d - l_L^*)$ the optimum width of the $(L-1)$th stratum, $l_{L-1}^*$, is obtained, and so on until $l_1^*$ is obtained.
The algorithm of the above solution procedure for the MPP (2.18) to determine the OSB is summarized as follows:
Step 1: Start at $k = 1$. Set $f(0, d_0) = 0$.
Step 2: Calculate $f(1, d_1)$, the minimum value of the RHS of (2.22) for $l_1 = d_1$, $0 \le d_1 \le d$.
Step 3: Record $f(1, d_1)$ and $l_1$.
Step 4: For $k = 2$, express the state variable as $d_{k-1} = d_k - l_k$.
Step 5: Set $f(k, d_k) = 0$ if $l_k > d_k$, where $0 \le l_k \le d$.
Step 6: Calculate $f(k, d_k)$, the minimum value of the RHS of (2.21) for $l_k$; $0 \le l_k \le d_k$.
Step 7: Record $f(k, d_k)$ and $l_k$.
Step 8: For $k = 3, \dots, L$, go to Step 4.
Step 9: At $k = L$, $f(L, d)$ is obtained and hence the optimum value $l_L^*$ of $l_L$ is obtained.
Step 10: At $k = L - 1$, using the backward calculation for $d_{L-1} = d - l_L^*$, read the value of $f(L-1, d_{L-1})$ and hence the optimum value $l_{L-1}^*$ of $l_{L-1}$.
Step 11: Repeat Step 10 until the optimum value $l_1^*$ of $l_1$ is obtained from $f(1, d_1)$.
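The steps above can be sketched on a discretized grid of widths. This is an illustrative Python sketch only, not the C++ program referred to in the later chapters: the uniform grid with `steps` points, the function name `solve_osb` and the callback signature `stage_cost(k, width, offset)` are all assumptions made for the example. Here `offset` is $d_k - l_k$, so that the lower boundary of stratum $k$ is $x_0$ plus `offset`.

```python
import math

def solve_osb(stage_cost, d, num_strata, steps=200):
    """Grid-based forward recursion (2.21)-(2.22) with a backward read-out.

    stage_cost(k, width, offset) returns f_k for stratum k (1-based) of the
    given width whose lower boundary lies `offset` above x_0.
    Returns (minimum objective value, list of optimum widths).
    """
    step = d / steps
    # best[k][j] approximates f(k, d_k) for d_k = j * step;
    # choice[k][j] stores the grid index of the minimizing l_k.
    best = [[0.0] * (steps + 1) for _ in range(num_strata + 1)]
    choice = [[0] * (steps + 1) for _ in range(num_strata + 1)]
    for j in range(steps + 1):            # first stage: l_1 = d_1
        best[1][j] = stage_cost(1, j * step, 0.0)
        choice[1][j] = j
    for k in range(2, num_strata + 1):    # stages k >= 2
        for j in range(steps + 1):
            best_val, best_i = math.inf, 0
            for i in range(j + 1):        # candidate l_k = i * step <= d_k
                val = stage_cost(k, i * step, (j - i) * step) + best[k - 1][j - i]
                if val < best_val:
                    best_val, best_i = val, i
            best[k][j] = best_val
            choice[k][j] = best_i
    widths = [0.0] * num_strata           # backward calculation (Steps 9-11)
    j = steps
    for k in range(num_strata, 0, -1):
        i = choice[k][j]
        widths[k - 1] = i * step
        j -= i
    return best[num_strata][steps], widths

# Toy check with f_k(l) = l^2, whose optimum is equal widths d / L.
value, widths = solve_osb(lambda k, width, offset: width * width,
                          d=1.0, num_strata=4, steps=100)
```

The work grows with the square of `steps` for each stage, so the grid resolution trades accuracy against run time.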
Chapter 3
Determination of the Optimum
Strata Boundaries for a Population
with Exponential Study Variable
______________________________
3.1 Exponential Distribution
In probability theory, the exponential distribution is a single-parameter family of continuous distributions. It is a commonly used model for waiting times between occurrences of events; for example, lifetimes of electrical or mechanical devices and the waiting time until failure are random variables that are frequently modeled with an exponential distribution.
If the stratification variable $x$ follows the exponential distribution, then its density function is given by
$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}; & x > 0 \\ 0; & \text{elsewhere} \end{cases} \tag{3.1}$$
where $\lambda$ is called the rate parameter.
The exponential distribution plays an important role in both queuing theory and reliability problems. It is very useful in modelling the time between arrivals at service facilities and the time to failure of component parts of electrical systems. There are many real-life examples where exponential distributions are used. For example, individual income, energy consumption and many wealth variables are exponential (Banerjee et al., 2006; Banerjee and Yakovenko, 2010).
3.2 Formulation of the Problem of Determining the
OSB for Exponential Variable
Let the stratification variable follow the exponential distribution with parameter $\lambda > 0$ as given by (3.1).
In practice, actual populations are often finite, so taking the largest value of $x$ in the population as $D$, the frequency function given in (3.1) can be approximated as:
$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}; & 0 \le x \le D \\ 0; & \text{elsewhere.} \end{cases} \tag{3.2}$$
Note that here $x_0 = 0$ and $x_L = D$. If $D$ is sufficiently large, (3.2) can be considered as an approximate exponential density. Otherwise, the truncated exponential density is to be used in the expression.
If the stratification variable $x$ follows the exponential distribution with the density function given in (3.2), then the stratum weight $(W_h)$, stratum mean $(\mu_h)$ and stratum variance $(S_h^2)$ can be obtained as functions of the boundary points $(x_{h-1}, x_h)$ by using (2.9), (2.11) and (2.10) respectively as follows:
$$W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx = \int_{x_{h-1}}^{x_h} \lambda e^{-\lambda x}\,dx.$$
Integrating and substituting the limits $x_h$ and $x_{h-1}$ gives
$$W_h = e^{-\lambda x_{h-1}}\left[1 - e^{-\lambda l_h}\right]. \tag{3.3}$$
From (2.14) note that
$$x_h = l_h + x_{h-1}. \tag{3.4}$$
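The stratum mean $(\mu_h)$ is obtained as follows:
$$\mu_h = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x f(x)\,dx = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x\,\lambda e^{-\lambda x}\,dx.$$
Thus, substituting the value of $W_h$ from (3.3), we get
$$\mu_h = \frac{1}{e^{-\lambda x_{h-1}}\left[1 - e^{-\lambda l_h}\right]}\int_{x_{h-1}}^{x_h} x\,\lambda e^{-\lambda x}\,dx.$$
Integrating gives
$$\mu_h = \frac{1}{e^{-\lambda x_{h-1}}\left[1 - e^{-\lambda l_h}\right]}\left[-x e^{-\lambda x} - \frac{e^{-\lambda x}}{\lambda}\right]_{x_{h-1}}^{x_h}.$$
Substituting the value of $x_h$ from (3.4) gives
$$\mu_h = \frac{1}{e^{-\lambda x_{h-1}}\left[1 - e^{-\lambda l_h}\right]}\left[-x e^{-\lambda x} - \frac{e^{-\lambda x}}{\lambda}\right]_{x_{h-1}}^{l_h + x_{h-1}},$$
which reduces to
$$\mu_h = \frac{1}{e^{-\lambda x_{h-1}}\left[1 - e^{-\lambda l_h}\right]}\left\{-\left[(l_h + x_{h-1})\,e^{-\lambda(l_h + x_{h-1})} + \frac{e^{-\lambda(l_h + x_{h-1})}}{\lambda}\right] + \left[x_{h-1}\,e^{-\lambda x_{h-1}} + \frac{e^{-\lambda x_{h-1}}}{\lambda}\right]\right\}.$$
Simplifying the above gives
$$\mu_h = \frac{\left(x_{h-1} + \dfrac{1}{\lambda}\right)\left(1 - e^{-\lambda l_h}\right) - l_h e^{-\lambda l_h}}{1 - e^{-\lambda l_h}}. \tag{3.5}$$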
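Similarly, the stratum variance $(S_h^2)$ is found as follows:
$$S_h^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x^2 f(x)\,dx - \mu_h^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x^2\,\lambda e^{-\lambda x}\,dx - \mu_h^2.$$
Thus, substituting the value of $W_h$ from (3.3) and $\mu_h$ from (3.5), we get
$$S_h^2 = \frac{1}{e^{-\lambda x_{h-1}}\left[1 - e^{-\lambda l_h}\right]}\int_{x_{h-1}}^{x_h} x^2\,\lambda e^{-\lambda x}\,dx - \left[\frac{\left(x_{h-1} + \dfrac{1}{\lambda}\right)\left(1 - e^{-\lambda l_h}\right) - l_h e^{-\lambda l_h}}{1 - e^{-\lambda l_h}}\right]^2.$$
Integrating and simplifying gives
$$S_h^2 = \frac{1}{e^{-\lambda x_{h-1}}\left[1 - e^{-\lambda l_h}\right]}\left[-x^2 e^{-\lambda x} - \frac{2x e^{-\lambda x}}{\lambda} - \frac{2 e^{-\lambda x}}{\lambda^2}\right]_{x_{h-1}}^{l_h + x_{h-1}} - \left[\frac{\left(x_{h-1} + \dfrac{1}{\lambda}\right)\left(1 - e^{-\lambda l_h}\right) - l_h e^{-\lambda l_h}}{1 - e^{-\lambda l_h}}\right]^2.$$
Simplifying further, we get
$$S_h^2 = \frac{\left(1 - e^{-\lambda l_h}\right)^2 - \lambda^2 l_h^2 e^{-\lambda l_h}}{\lambda^2\left(1 - e^{-\lambda l_h}\right)^2}. \tag{3.6}$$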
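The closed forms (3.3), (3.5) and (3.6) are easy to evaluate directly. The sketch below is illustrative only; the function names are assumptions, and the wide-stratum example simply approaches the untruncated exponential distribution, whose mean is $1/\lambda$ and variance $1/\lambda^2$.

```python
import math

def exp_weight(lam, x_prev, width):
    """Stratum weight W_h from (3.3)."""
    return math.exp(-lam * x_prev) * (1.0 - math.exp(-lam * width))

def exp_mean(lam, x_prev, width):
    """Stratum mean mu_h from (3.5)."""
    decay = math.exp(-lam * width)
    return ((x_prev + 1.0 / lam) * (1.0 - decay) - width * decay) / (1.0 - decay)

def exp_variance(lam, width):
    """Stratum variance S_h^2 from (3.6); it does not depend on x_{h-1}."""
    decay = math.exp(-lam * width)
    return ((1.0 - decay) ** 2 - (lam * width) ** 2 * decay) / (lam * (1.0 - decay)) ** 2

# A very wide first stratum behaves like the full exponential distribution.
wide = (exp_weight(2.0, 0.0, 50.0), exp_mean(2.0, 0.0, 50.0), exp_variance(2.0, 50.0))
```

Note that (3.6) depends on the stratum only through its width $l_h$, which reflects the memoryless property of the exponential distribution.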
Thus, substituting (3.3) and (3.6) in (2.7), we get
$$W_h S_h \sqrt{c_h} = \frac{e^{-\lambda x_{h-1}}}{\lambda}\sqrt{\left[\left(1 - e^{-\lambda l_h}\right)^2 - \lambda^2 l_h^2 e^{-\lambda l_h}\right] c_h}.$$
Then, the formulated MPP given in (2.18) to determine the optimum stratum widths and hence the optimum stratum boundaries can be expressed using (2.12), (3.3) and (3.6) as
$$\text{Minimise } \sum_{h=1}^{L} \frac{e^{-\lambda x_{h-1}}}{\lambda}\sqrt{\left[\left(1 - e^{-\lambda l_h}\right)^2 - \lambda^2 l_h^2 e^{-\lambda l_h}\right] c_h}$$
$$\text{subject to } \sum_{h=1}^{L} l_h = d$$
$$\text{and } l_h \ge 0;\ h = 1, 2, \dots, L, \tag{3.7}$$
where $d = x_L - x_0$ is the range of the distribution.
3.3 Numerical Illustration of the Solution Procedure
This section gives an illustration of the computational details of the proposed solution procedure using the dynamic programming technique to determine the OSB with varying stratum cost as discussed in Section 2.3. For the purpose of illustration we assume that $x$ follows the exponential distribution with $\lambda = 1$, $x_0 = 0$ and $x_L = 20$. This implies that $d = 20$. Then the MPP (3.7) is reduced to
$$\text{Minimise } \sum_{h=1}^{L} e^{-x_{h-1}}\sqrt{\left[\left(1 - e^{-l_h}\right)^2 - l_h^2 e^{-l_h}\right] c_h}$$
$$\text{subject to } \sum_{h=1}^{L} l_h = 20$$
$$\text{and } l_h \ge 0;\ h = 1, 2, \dots, L. \tag{3.8}$$
Note that the $(h-1)$th stratification point is obtained by
$$x_{h-1} = x_0 + l_1 + l_2 + \dots + l_{h-1} = 0 + l_1 + l_2 + \dots + l_{h-1} = d_{h-1} = d_h - l_h.$$
Substituting this value of $x_{h-1}$, the recurrence relations (2.21) and (2.22) for solving the MPP (3.8) reduce to the following.
For the first stage, $k = 1$:
$$f(1, d_1) = \sqrt{\left[\left(1 - e^{-d_1}\right)^2 - d_1^2 e^{-d_1}\right] c_1} \quad \text{at } l_1^* = d_1. \tag{3.9}$$
For the stages $k \ge 2$:
$$f(k, d_k) = \min_{0 \le l_k \le d_k}\left[e^{-(d_k - l_k)}\sqrt{\left[\left(1 - e^{-l_k}\right)^2 - l_k^2 e^{-l_k}\right] c_k} + f(k-1, d_k - l_k)\right] \tag{3.10}$$
because $x_{k-1} = x_0 + l_1 + \dots + l_{k-1} = d_k - l_k$.
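As a small sanity check of (3.9) and (3.10), the two-strata case can be solved by a direct grid search over $l_2$, since $f(2, d)$ then involves a single minimization. This is a sketch under stated assumptions, not the Appendix A program: the grid resolution and the names `stage_term` and `two_strata_search` are chosen for illustration, and the costs $c_1 = 2$, $c_2 = 3$ are those listed for $L = 2$ in Table 3.1.

```python
import math

def stage_term(width, x_prev, unit_cost):
    """One summand of (3.8) with lambda = 1."""
    decay = math.exp(-width)
    inner = (1.0 - decay) ** 2 - width ** 2 * decay
    # max(...) guards against tiny negative values caused by rounding.
    return math.exp(-x_prev) * math.sqrt(max(inner, 0.0) * unit_cost)

def two_strata_search(d=20.0, costs=(2.0, 3.0), steps=20000):
    """f(2, d) from (3.10), using f(1, d_1) from (3.9), by grid search over l_2."""
    best_value, best_l2 = math.inf, 0.0
    for i in range(steps + 1):
        l2 = d * i / steps
        d1 = d - l2                  # width left for the first stratum
        value = stage_term(l2, d1, costs[1]) + stage_term(d1, 0.0, costs[0])
        if value < best_value:
            best_value, best_l2 = value, l2
    return best_value, d - best_l2, best_l2

value, l1, l2 = two_strata_search()
```

The returned value and widths can be compared with the $L = 2$ row of Table 3.1; agreement is limited by the grid spacing of $d$ divided by `steps`.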
A C++ program (see Appendix A) was coded to solve the recurrence relations (3.9) and (3.10). Executing the program yields the optimum strata widths $l_h^*$ and hence the optimum strata boundaries $x_h^* = x_{h-1}^* + l_h^*$. The results for six different numbers of strata, $L = 2, 3, 4, 5, 6$ and $7$, with different strata measurement costs $c_h$, are presented in Table 3.1.
Table 3.1 OSW, OSB and Optimum Value of the Objective Function for Exponential Distribution

| No. of strata ($L$) | $h$ | Strata measurement cost ($c_h$) | Optimum strata width ($l_h^*$) | Optimum strata boundary ($x_h^* = x_{h-1}^* + l_h^*$) | Optimum value of the objective function |
|---|---|---|---|---|---|
| 2 | 1 | 2 | 1.467970 | 1.467970 | 0.8368043 |
| | 2 | 3 | 18.53203 | | |
| 3 | 1 | 2 | 0.95892 | 0.95892 | 0.6177939 |
| | 2 | 3 | 1.40568 | 2.36460 | |
| | 3 | 4 | 17.6354 | | |
| 4 | 1 | 2 | 0.735210 | 0.735210 | 0.5002674 |
| | 2 | 3 | 0.90241 | 1.637620 | |
| | 3 | 4 | 1.37240 | 3.01002 | |
| | 4 | 5 | 16.98998 | | |
| 5 | 1 | 2 | 0.60669 | 0.60669 | 0.4260679 |
| | 2 | 3 | 0.68336 | 1.29005 | |
| | 3 | 4 | 0.87155 | 2.16160 | |
| | 4 | 5 | 1.35166 | 3.51326 | |
| | 5 | 6 | 16.48674 | | |
| 6 | 1 | 2 | 0.52232 | 0.52232 | 0.3745282 |
| | 2 | 3 | 0.55854 | 1.08086 | |
| | 3 | 4 | 0.65463 | 1.73549 | |
| | 4 | 5 | 0.85203 | 2.58752 | |
| | 5 | 6 | 1.33748 | 3.92500 | |
| | 6 | 7 | 16.07500 | | |
| 7 | 1 | 2 | 0.46223 | 0.46223 | 0.3364002 |
| | 2 | 3 | 0.47719 | 0.93942 | |
| | 3 | 4 | 0.53160 | 1.47102 | |
| | 4 | 5 | 0.63626 | 2.10728 | |
| | 5 | 6 | 0.83853 | 2.94581 | |
| | 6 | 7 | 1.32718 | 4.27299 | |
| | 7 | 8 | 15.72701 | | |
Chapter 4
Determination of the Optimum
Strata Boundaries for a Population
with Right-Triangular Study
Variable
______________________________
4.1 Right-Triangular Distribution
A right-triangular distribution is a family of continuous probability distributions that models phenomena in which the most likely outcome (the mode) falls at one end of the range and the least likely outcome at the other. For the density (4.1) used here the mode is at the minimum: for example, lower incomes are earned by a larger portion of families in a society, whereas very few families earn larger incomes. The distribution is defined by two parameters $a$ and $b$, where $a$ is the minimum, at which the density is highest, and $b$ is the maximum, at which the density falls to zero.
The probability density function of a right-triangular distribution is given by the general formula
$$f(x; a, b) = \begin{cases} \dfrac{2(b - x)}{(b - a)^2}; & a \le x \le b \\ 0; & \text{otherwise} \end{cases} \tag{4.1}$$
4.2 Formulation of the Problem of Determining the
OSB for Right-Triangular Variable
If the stratification variable $x$ follows the right-triangular distribution with the density function given in (4.1), then the stratum weight $(W_h)$, stratum mean $(\mu_h)$ and stratum variance $(S_h^2)$ can be obtained as functions of the boundary points $(x_{h-1}, x_h)$ by using (2.9), (2.11) and (2.10) respectively as derived below.
$$W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx = \int_{x_{h-1}}^{x_h} \frac{2(b - x)}{(b - a)^2}\,dx = \frac{2}{(b - a)^2}\int_{x_{h-1}}^{x_h} (b - x)\,dx.$$
Performing simple integration gives
$$W_h = \frac{2}{(b - a)^2}\left[bx - \frac{x^2}{2}\right]_{x_{h-1}}^{x_h} = \frac{2}{(b - a)^2}\left[\left(bx_h - \frac{x_h^2}{2}\right) - \left(bx_{h-1} - \frac{x_{h-1}^2}{2}\right)\right].$$
From (3.4), replacing $x_h$ with $l_h + x_{h-1}$ gives
$$W_h = \frac{2}{(b - a)^2}\left[b(l_h + x_{h-1}) - \frac{(l_h + x_{h-1})^2}{2} - bx_{h-1} + \frac{x_{h-1}^2}{2}\right].$$
Simplifying the above equation gives
$$W_h = \frac{2}{(b - a)^2}\left[bl_h - \frac{l_h^2}{2} - l_h x_{h-1}\right] = \frac{l_h}{(b - a)^2}\left[2b - l_h - 2x_{h-1}\right].$$
Finally, substituting $a_h = b - x_{h-1}$ into the equation yields
$$W_h = \frac{l_h(2a_h - l_h)}{(b - a)^2}. \tag{4.2}$$
Using (2.11), the stratum mean $(\mu_h)$ of the right-triangular distribution is obtained as follows:
$$\mu_h = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x f(x)\,dx = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x\,\frac{2(b - x)}{(b - a)^2}\,dx.$$
Substituting $W_h$ from (4.2) gives
$$\mu_h = \frac{(b - a)^2}{l_h(2a_h - l_h)}\cdot\frac{2}{(b - a)^2}\int_{x_{h-1}}^{x_h} (bx - x^2)\,dx.$$
Performing simple integration gives
$$\mu_h = \frac{2}{l_h(2a_h - l_h)}\left[\frac{bx^2}{2} - \frac{x^3}{3}\right]_{x_{h-1}}^{x_h} = \frac{2}{l_h(2a_h - l_h)}\left[\frac{3bx_h^2 - 2x_h^3}{6} - \frac{3bx_{h-1}^2 - 2x_{h-1}^3}{6}\right].$$
From (3.4), substituting $x_h = l_h + x_{h-1}$ and expanding results in
$$\mu_h = \frac{1}{3l_h(2a_h - l_h)}\left[3b\left((l_h + x_{h-1})^2 - x_{h-1}^2\right) - 2\left((l_h + x_{h-1})^3 - x_{h-1}^3\right)\right]$$
$$\phantom{\mu_h} = \frac{1}{3l_h(2a_h - l_h)}\left[3b\left(l_h^2 + 2l_h x_{h-1}\right) - 2\left(l_h^3 + 3l_h^2 x_{h-1} + 3l_h x_{h-1}^2\right)\right].$$
Finally, simplifying the above expression gives
$$\mu_h = \frac{3b(l_h + 2x_{h-1}) - 2\left(l_h^2 + 3l_h x_{h-1} + 3x_{h-1}^2\right)}{3(2a_h - l_h)}. \tag{4.3}$$
Similarly, using (2.10), the stratum variance $(S_h^2)$ of the right-triangular distribution is obtained as follows:
$$S_h^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x^2 f(x)\,dx - \mu_h^2.$$
Substituting the mean $(\mu_h)$ from (4.3) gives
$$S_h^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x^2\,\frac{2(b - x)}{(b - a)^2}\,dx - \left[\frac{3b(l_h + 2x_{h-1}) - 2\left(l_h^2 + 3l_h x_{h-1} + 3x_{h-1}^2\right)}{3(2a_h - l_h)}\right]^2.$$
Substituting the weight $(W_h)$ from (4.2) gives
$$S_h^2 = \frac{2}{l_h(2a_h - l_h)}\int_{x_{h-1}}^{x_h} x^2(b - x)\,dx - \left[\frac{3b(l_h + 2x_{h-1}) - 2\left(l_h^2 + 3l_h x_{h-1} + 3x_{h-1}^2\right)}{3(2a_h - l_h)}\right]^2.$$
Performing simple integration gives
$$S_h^2 = \frac{2}{l_h(2a_h - l_h)}\left[\left(\frac{bx_h^3}{3} - \frac{x_h^4}{4}\right) - \left(\frac{bx_{h-1}^3}{3} - \frac{x_{h-1}^4}{4}\right)\right] - \left[\frac{3b(l_h + 2x_{h-1}) - 2\left(l_h^2 + 3l_h x_{h-1} + 3x_{h-1}^2\right)}{3(2a_h - l_h)}\right]^2.$$
Substituting $x_h = l_h + x_{h-1}$, expanding and simplifying further yields
$$S_h^2 = \frac{1}{6l_h(2a_h - l_h)}\left[4b\left((l_h + x_{h-1})^3 - x_{h-1}^3\right) - 3\left((l_h + x_{h-1})^4 - x_{h-1}^4\right)\right] - \left[\frac{3b(l_h + 2x_{h-1}) - 2\left(l_h^2 + 3l_h x_{h-1} + 3x_{h-1}^2\right)}{3(2a_h - l_h)}\right]^2.$$
After further simplification we get
$$S_h^2 = \frac{l_h^2\left(l_h^2 - 6a_h l_h + 6a_h^2\right)}{18(2a_h - l_h)^2}. \tag{4.4}$$
Then, the formulated MPP given in (2.18) to determine the optimum stratum widths and hence the optimum stratum boundaries can be expressed using (2.12), (4.2) and (4.4) as
$$\text{Minimize } \sum_{h=1}^{L} \frac{l_h(2a_h - l_h)}{(b - a)^2}\sqrt{\frac{l_h^2\left(l_h^2 - 6a_h l_h + 6a_h^2\right) c_h}{18(2a_h - l_h)^2}}$$
$$\text{subject to } \sum_{h=1}^{L} l_h = d$$
$$\text{and } l_h \ge 0;\ h = 1, 2, \dots, L, \tag{4.5}$$
where $d = x_L - x_0$ is the range of the distribution.
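The closed forms (4.2), (4.3) and (4.4) can be evaluated directly. The following sketch is illustrative; the function names are assumptions. A single stratum covering the whole range $[a, b]$ recovers the full distribution, whose mean is $a + (b - a)/3$ and variance $(b - a)^2/18$.

```python
import math

def tri_weight(a, b, x_prev, width):
    """Stratum weight W_h from (4.2), with a_h = b - x_{h-1}."""
    a_h = b - x_prev
    return width * (2.0 * a_h - width) / (b - a) ** 2

def tri_mean(b, x_prev, width):
    """Stratum mean mu_h from (4.3)."""
    a_h = b - x_prev
    numer = (3.0 * b * (width + 2.0 * x_prev)
             - 2.0 * (width ** 2 + 3.0 * width * x_prev + 3.0 * x_prev ** 2))
    return numer / (3.0 * (2.0 * a_h - width))

def tri_variance(b, x_prev, width):
    """Stratum variance S_h^2 from (4.4)."""
    a_h = b - x_prev
    return (width ** 2 * (width ** 2 - 6.0 * a_h * width + 6.0 * a_h ** 2)
            / (18.0 * (2.0 * a_h - width) ** 2))

def tri_term(a, b, x_prev, width, unit_cost):
    """One summand W_h * S_h * sqrt(c_h) of the objective in (4.5)."""
    return tri_weight(a, b, x_prev, width) * math.sqrt(
        tri_variance(b, x_prev, width) * unit_cost)

# Whole range [1, 2] treated as a single stratum.
full = (tri_weight(1.0, 2.0, 1.0, 1.0), tri_mean(2.0, 1.0, 1.0), tri_variance(2.0, 1.0, 1.0))
```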
4.3 Numerical Illustration of the Solution Procedure
This section gives an illustration of the computational details of the proposed solution procedure using the dynamic programming technique to determine the OSB with varying strata measurement cost for the right-triangular distribution as discussed in Section 2.3. We assume that $a = 1$ and $b = 2$, which gives $a_h = 2 - x_{h-1}$ and $d = 1$. So the MPP (4.5) reduces to
$$\text{Minimize } \sum_{h=1}^{L} \frac{l_h\sqrt{\left(l_h^4 - 6l_h^3 a_h + 6l_h^2 a_h^2\right) c_h}}{3\sqrt{2}}$$
$$\text{subject to } \sum_{h=1}^{L} l_h = 1$$
$$\text{and } l_h \ge 0;\ h = 1, 2, \dots, L. \tag{4.6}$$
Note that the $(h-1)$th stratification point is obtained by
$$x_{h-1} = x_0 + l_1 + l_2 + \dots + l_{h-1} = 1 + l_1 + l_2 + \dots + l_{h-1} = 1 + d_{h-1} = 1 + d_h - l_h.$$
Substituting this value of $x_{h-1}$, so that $a_h = 2 - x_{h-1} = 1 - d_h + l_h$, the recurrence relations (2.21) and (2.22) for solving the MPP (4.6) reduce to the following.
For the first stage, $k = 1$:
$$f(1, d_1) = \frac{d_1\sqrt{\left(d_1^4 - 6d_1^3 + 6d_1^2\right) c_1}}{3\sqrt{2}} \quad \text{at } l_1^* = d_1. \tag{4.7}$$
For the stages $k \ge 2$:
$$f(k, d_k) = \min_{0 \le l_k \le d_k}\left[\frac{l_k\sqrt{\left(l_k^4 - 6l_k^3(1 - d_k + l_k) + 6l_k^2(1 - d_k + l_k)^2\right) c_k}}{3\sqrt{2}} + f(k-1, d_k - l_k)\right]. \tag{4.8}$$
A C++ program (see Appendix B) was coded to solve the recurrence relations (4.7) and (4.8). Executing the program yields the optimum strata widths $l_h^*$ and hence the optimum strata boundaries $x_h^* = x_{h-1}^* + l_h^*$. The results for six different numbers of strata, $L = 2, 3, 4, 5, 6$ and $7$, with different strata measurement costs $c_h$, are presented in Table 4.1.
Table 4.1 OSW, OSB and Optimum Value of the Objective Function for Right-Triangular Distribution

| No. of strata ($L$) | $h$ | Strata measurement cost ($c_h$) | Optimum strata width ($l_h^*$) | Optimum strata boundary ($x_h^* = x_{h-1}^* + l_h^*$) | Optimum value of the objective function |
|---|---|---|---|---|---|
| 2 | 1 | 2 | 0.39770 | 1.39770 | 0.1915934 |
| | 2 | 3 | 0.60230 | | |
| 3 | 1 | 2 | 0.27858 | 1.27858 | 0.1399830 |
| | 2 | 3 | 0.27769 | 1.55627 | |
| | 3 | 4 | 0.44373 | | |
| 4 | 1 | 2 | 0.22026 | 1.22026 | 0.1127605 |
| | 2 | 3 | 0.20646 | 1.42672 | |
| | 3 | 4 | 0.21668 | 1.64340 | |
| | 4 | 5 | 0.35660 | | |
| 5 | 1 | 2 | 0.18500 | 1.18500 | 0.09573736 |
| | 2 | 3 | 0.16838 | 1.35338 | |
| | 3 | 4 | 0.16626 | 1.51964 | |
| | 4 | 5 | 0.17946 | 1.69910 | |
| | 5 | 6 | 0.30090 | | |
| 6 | 1 | 2 | 0.16116 | 1.16116 | 0.0839845 |
| | 2 | 3 | 0.14409 | 1.30525 | |
| | 3 | 4 | 0.13820 | 1.44345 | |
| | 4 | 5 | 0.14038 | 1.58383 | |
| | 5 | 6 | 0.15422 | 1.73805 | |
| | 6 | 7 | 0.26195 | | |
| 7 | 1 | 2 | 0.14383 | 1.14383 | 0.07532636 |
| | 2 | 3 | 0.12706 | 1.27089 | |
| | 3 | 4 | 0.11977 | 1.39066 | |
| | 4 | 5 | 0.11820 | 1.50886 | |
| | 5 | 6 | 0.12222 | 1.63108 | |
| | 6 | 7 | 0.13590 | 1.76698 | |
| | 7 | 8 | 0.23302 | | |
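The tabulated optimum values can be spot-checked by evaluating the objective function of (4.6) at the reported widths. The sketch below does this for the rows $L = 2$ and $L = 3$; the function name `objective_4_6` is an illustrative assumption, and the check only evaluates the objective at the reported solution rather than re-deriving the optimum.

```python
import math

def objective_4_6(widths, costs, x0=1.0, b=2.0):
    """Objective of (4.6) for a = 1, b = 2, evaluated at the given stratum widths."""
    total, x_prev = 0.0, x0
    for width, cost in zip(widths, costs):
        a_h = b - x_prev                       # a_h = 2 - x_{h-1}
        poly = width ** 4 - 6.0 * width ** 3 * a_h + 6.0 * width ** 2 * a_h ** 2
        total += width * math.sqrt(max(poly, 0.0) * cost) / (3.0 * math.sqrt(2.0))
        x_prev += width                        # x_h = x_{h-1} + l_h
    return total

value_l2 = objective_4_6([0.39770, 0.60230], [2.0, 3.0])
value_l3 = objective_4_6([0.27858, 0.27769, 0.44373], [2.0, 3.0, 4.0])
```

Both values agree with the last column of Table 4.1 to within the rounding of the tabulated widths.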
Chapter 5
Determination of the Optimum
Strata Boundaries for a Population
with Cauchy Study Variable
______________________________
5.1 Cauchy Distribution
The Cauchy distribution, named after Augustin Cauchy, is a continuous probability distribution. The history of the discovery of this distribution goes back to the 17th century, but it was first published by the French mathematician Siméon Denis Poisson in 1824, and Cauchy became associated with it during an academic controversy in 1853.
A Cauchy distribution is considered as a possible model whenever one needs a density function with heavier tails than the normal distribution allows. This distribution does not possess finite moments. It is a unimodal, symmetric, stable and infinitely divisible distribution. It is of particular interest because it is a simple family of distributions whose expected value does not exist. The family is closed under the formation of sums of independent variables. The Cauchy distribution is often used in statistics as a canonical example of a pathological distribution, since both its mean and its variance are undefined.
The probability density function of a Cauchy distribution is given by
$$f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\dfrac{x - x_0}{\gamma}\right)^2\right]}, \tag{5.1}$$
where $x_0$ is the location parameter that specifies the location of the peak of the distribution and $\gamma$ is the scale parameter that specifies the half-width at half maximum.
The simplest Cauchy distribution, called the standard Cauchy distribution, is the special case where $x_0 = 0$ and $\gamma = 1$. It has the probability density function
$$f(x) = \frac{1}{\pi\left(1 + x^2\right)};\quad -\infty < x < \infty. \tag{5.2}$$
In this Chapter the problem of constructing the OSB is discussed when the study
variable of the underlying population has a standard Cauchy distribution given in (5.2).
5.2 Formulation of the Problem of Determining the
OSB for Cauchy Variable
If the stratification variable $x$ follows the standard Cauchy distribution with the density function given by (5.2), then the stratum weight $(W_h)$, stratum mean $(\mu_h)$ and stratum variance $(S_h^2)$ can be obtained as functions of the boundary points $(x_{h-1}, x_h)$ by using (2.9), (2.11) and (2.10) respectively as follows:
$$W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx = \int_{x_{h-1}}^{x_h} \frac{1}{\pi\left(1 + x^2\right)}\,dx.$$
Integrating and substituting the limits $x_h$ and $x_{h-1}$ gives
$$W_h = \frac{1}{\pi}\left[\tan^{-1} x_h - \tan^{-1} x_{h-1}\right].$$
Note that $l_h = x_h - x_{h-1}$, and substituting this into the above equation gives
$$W_h = \frac{1}{\pi}\left[\tan^{-1}(l_h + x_{h-1}) - \tan^{-1} x_{h-1}\right]. \tag{5.3}$$
Similarly, using (2.11), the stratum mean $(\mu_h)$ is obtained as follows:
$$\mu_h = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x f(x)\,dx = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} \frac{x}{\pi\left(1 + x^2\right)}\,dx.$$
Let $u = 1 + x^2$. Then $x\,dx = \dfrac{du}{2}$. Also, $x = x_h$ gives $u = 1 + x_h^2$, and $x = x_{h-1}$ gives $u = 1 + x_{h-1}^2$.
Thus
$$\mu_h = \frac{1}{2\pi W_h}\int_{1 + x_{h-1}^2}^{1 + x_h^2} \frac{1}{u}\,du.$$
Performing simple integration yields
$$\mu_h = \frac{1}{2\pi W_h}\Big[\ln u\Big]_{1 + x_{h-1}^2}^{1 + x_h^2} = \frac{1}{2\pi W_h}\left[\ln\left(1 + x_h^2\right) - \ln\left(1 + x_{h-1}^2\right)\right].$$
Substituting $l_h = x_h - x_{h-1}$ gives
$$\mu_h = \frac{1}{2\pi W_h}\left[\ln\left(1 + l_h^2 + 2l_h x_{h-1} + x_{h-1}^2\right) - \ln\left(1 + x_{h-1}^2\right)\right].$$
Using the identity $\ln x - \ln y = \ln\left(\dfrac{x}{y}\right)$, the above equation reduces to
$$\mu_h = \frac{1}{2\pi W_h}\ln\left(\frac{1 + l_h^2 + 2l_h x_{h-1} + x_{h-1}^2}{1 + x_{h-1}^2}\right).$$
Also substituting the value of $W_h$ from (5.3), we get
$$\mu_h = \frac{\ln\left(\dfrac{1 + l_h^2 + 2l_h x_{h-1} + x_{h-1}^2}{1 + x_{h-1}^2}\right)}{2\left[\tan^{-1}(l_h + x_{h-1}) - \tan^{-1} x_{h-1}\right]}. \tag{5.4}$$
The stratum variance $(S_h^2)$ for the standard Cauchy distribution is found using (2.10) as follows:
$$S_h^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x^2 f(x)\,dx - \mu_h^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} \frac{x^2}{\pi\left(1 + x^2\right)}\,dx - \mu_h^2.$$
Let $x = \tan\theta$. Differentiating this gives $dx = \sec^2\theta\,d\theta$. Also $\theta = \tan^{-1} x$, so $x = x_h$ gives $\theta = \tan^{-1} x_h$ and $x = x_{h-1}$ gives $\theta = \tan^{-1} x_{h-1}$.
Thus the integral becomes
$$S_h^2 = \frac{1}{W_h}\int_{\tan^{-1} x_{h-1}}^{\tan^{-1} x_h} \frac{\tan^2\theta\,\sec^2\theta}{\pi\left(1 + \tan^2\theta\right)}\,d\theta - \mu_h^2$$
or
$$S_h^2 = \frac{1}{\pi W_h}\int_{\tan^{-1} x_{h-1}}^{\tan^{-1} x_h} \tan^2\theta\,d\theta - \mu_h^2. \tag{5.5}$$
Substituting $\tan^2\theta = \sec^2\theta - 1$ in (5.5) and integrating gives
$$S_h^2 = \frac{1}{\pi W_h}\Big[\tan\theta - \theta\Big]_{\tan^{-1} x_{h-1}}^{\tan^{-1} x_h} - \mu_h^2.$$
Thus
$$S_h^2 = \frac{1}{\pi W_h}\left[\left(\tan\left(\tan^{-1} x_h\right) - \tan^{-1} x_h\right) - \left(\tan\left(\tan^{-1} x_{h-1}\right) - \tan^{-1} x_{h-1}\right)\right] - \mu_h^2.$$
Substituting $l_h = x_h - x_{h-1}$ gives
$$S_h^2 = \frac{1}{\pi W_h}\left[l_h - \tan^{-1}(l_h + x_{h-1}) + \tan^{-1} x_{h-1}\right] - \mu_h^2.$$
Substituting the value of $W_h$ from (5.3) and $\mu_h$ from (5.4) gives
$$S_h^2 = \frac{l_h - \tan^{-1}(l_h + x_{h-1}) + \tan^{-1} x_{h-1}}{\tan^{-1}(l_h + x_{h-1}) - \tan^{-1} x_{h-1}} - \frac{1}{4\left[\tan^{-1}(l_h + x_{h-1}) - \tan^{-1} x_{h-1}\right]^2}\left[\ln\left(\frac{1 + l_h^2 + 2l_h x_{h-1} + x_{h-1}^2}{1 + x_{h-1}^2}\right)\right]^2.$$
Simplifying the above equation gives
$$S_h^2 = \frac{\left[l_h - \tan^{-1}(l_h + x_{h-1}) + \tan^{-1} x_{h-1}\right]\left[\tan^{-1}(l_h + x_{h-1}) - \tan^{-1} x_{h-1}\right] - \dfrac{1}{4}\left[\ln\left(\dfrac{1 + l_h^2 + 2l_h x_{h-1} + x_{h-1}^2}{1 + x_{h-1}^2}\right)\right]^2}{\left[\tan^{-1}(l_h + x_{h-1}) - \tan^{-1} x_{h-1}\right]^2}. \tag{5.6}$$
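The expressions (5.3), (5.4) and (5.6) can be evaluated as follows. This is an illustrative sketch with assumed function names. For the symmetric stratum $[-1, 1]$ the weight is $1/2$, the mean is $0$ by symmetry, and the variance works out to $4/\pi - 1$.

```python
import math

def cauchy_weight(x_prev, width):
    """Stratum weight W_h from (5.3)."""
    return (math.atan(width + x_prev) - math.atan(x_prev)) / math.pi

def cauchy_mean(x_prev, width):
    """Stratum mean mu_h from (5.4)."""
    ratio = (1.0 + (width + x_prev) ** 2) / (1.0 + x_prev ** 2)
    delta = math.atan(width + x_prev) - math.atan(x_prev)
    return math.log(ratio) / (2.0 * delta)

def cauchy_variance(x_prev, width):
    """Stratum variance S_h^2 from (5.6)."""
    delta = math.atan(width + x_prev) - math.atan(x_prev)
    log_ratio = math.log((1.0 + (width + x_prev) ** 2) / (1.0 + x_prev ** 2))
    return ((width - delta) * delta - 0.25 * log_ratio ** 2) / delta ** 2

# Symmetric stratum [-1, 1]: x_{h-1} = -1, l_h = 2.
sym = (cauchy_weight(-1.0, 2.0), cauchy_mean(-1.0, 2.0), cauchy_variance(-1.0, 2.0))
```

Although the untruncated Cauchy distribution has no mean or variance, every bounded stratum does, which is why the stratification problem remains well defined on a finite interval.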
Then, the formulated MPP given in (2.18) to determine the optimum stratum widths and hence the optimum stratum boundaries can be expressed using (2.12), (5.3) and (5.6) as:
$$\text{Minimise } \sum_{h=1}^{L} \frac{1}{\pi}\sqrt{\left\{\left[l_h - \tan^{-1}(l_h + x_{h-1}) + \tan^{-1} x_{h-1}\right]\left[\tan^{-1}(l_h + x_{h-1}) - \tan^{-1} x_{h-1}\right] - \frac{1}{4}\left[\ln\left(\frac{1 + l_h^2 + 2l_h x_{h-1} + x_{h-1}^2}{1 + x_{h-1}^2}\right)\right]^2\right\} c_h}$$
$$\text{subject to } \sum_{h=1}^{L} l_h = d$$
$$\text{and } l_h \ge 0;\ h = 1, 2, \dots, L, \tag{5.7}$$
where $d = x_L - x_0$ is the range of the distribution.
5.3 Numerical Illustration of the Solution Procedure
This section gives an illustration of the computational details of the proposed solution procedure using the dynamic programming technique to determine the OSB with varying strata measurement cost for the standard Cauchy distribution as discussed in Section 2.3. Let us assume that $x$ follows the standard Cauchy distribution in the interval $[-1, 1]$, that is, $x_0 = -1$, $x_L = 1$ and $d = 2$. Then, the MPP (5.7) becomes
$$\text{Minimise } \sum_{h=1}^{L} \frac{1}{\pi}\sqrt{\left\{\left[l_h - \tan^{-1}(l_h + x_{h-1}) + \tan^{-1} x_{h-1}\right]\left[\tan^{-1}(l_h + x_{h-1}) - \tan^{-1} x_{h-1}\right] - \frac{1}{4}\left[\ln\left(\frac{1 + l_h^2 + 2l_h x_{h-1} + x_{h-1}^2}{1 + x_{h-1}^2}\right)\right]^2\right\} c_h}$$
$$\text{subject to } \sum_{h=1}^{L} l_h = 2$$
$$\text{and } l_h \ge 0;\ h = 1, 2, \dots, L. \tag{5.8}$$
Note that the $(h-1)$th stratification point is obtained by
$$x_{h-1} = x_0 + l_1 + l_2 + \dots + l_{h-1} = -1 + l_1 + l_2 + \dots + l_{h-1} = -1 + d_{h-1} = -1 + d_h - l_h.$$
Substituting this value of $x_{h-1}$, the recurrence relations (2.21) and (2.22) for solving the MPP (5.8) reduce to the following.
For the first stage, $k = 1$:
$$f(1, d_1) = \frac{1}{\pi}\sqrt{\left\{\left[d_1 - \tan^{-1}(d_1 - 1) + \tan^{-1}(-1)\right]\left[\tan^{-1}(d_1 - 1) - \tan^{-1}(-1)\right] - \frac{1}{4}\left[\ln\left(\frac{d_1^2 - 2d_1 + 2}{2}\right)\right]^2\right\} c_1} \quad \text{at } l_1^* = d_1. \tag{5.9}$$
For the stages $k \ge 2$:
$$f(k, d_k) = \min_{0 \le l_k \le d_k}\left[\frac{1}{\pi}\sqrt{\left\{\left[l_k - \tan^{-1}(d_k - 1) + \tan^{-1}(d_k - l_k - 1)\right]\left[\tan^{-1}(d_k - 1) - \tan^{-1}(d_k - l_k - 1)\right] - \frac{1}{4}\left[\ln\left(\frac{1 + (d_k - 1)^2}{1 + (d_k - l_k - 1)^2}\right)\right]^2\right\} c_k} + f(k-1, d_k - l_k)\right]. \tag{5.10}$$
A C++ program (see Appendix C) was coded to solve the recurrence relations (5.9) and (5.10). Executing the program yields the optimum strata widths $l_h^*$ and hence the optimum strata boundaries $x_h^* = x_{h-1}^* + l_h^*$. The results for six different numbers of strata, $L = 2, 3, 4, 5, 6$ and $7$, with different values of $c_h$, are presented in Table 5.1.
Table 5.1 OSW, OSB and Optimum Value of the Objective Function for Standard Cauchy Distribution

| No. of strata ($L$) | $h$ | Strata measurement cost ($c_h$) | Optimum strata width ($l_h^*$) | Optimum strata boundary ($x_h^* = x_{h-1}^* + l_h^*$) | Optimum value of the objective function |
|---|---|---|---|---|---|
| 2 | 1 | 2 | 1.09957 | 0.09957 | 0.2179531 |
| | 2 | 3 | 0.90043 | | |
| 3 | 1 | 2 | 0.81943 | −0.18057 | 0.1586139 |
| | 2 | 3 | 0.58253 | 0.40196 | |
| | 3 | 4 | 0.59804 | | |
| 4 | 1 | 2 | 0.67321 | −0.32679 | 0.1273067 |
| | 2 | 3 | 0.46327 | 0.13648 | |
| | 3 | 4 | 0.41898 | 0.55546 | |
| | 4 | 5 | 0.444540 | | |
| 5 | 1 | 2 | 0.58034 | −0.41966 | 0.1078071 |
| | 2 | 3 | 0.39783 | −0.02183 | |
| | 3 | 4 | 0.33898 | 0.317150 | |
| | 4 | 5 | 0.33133 | 0.64848 | |
| | 5 | 6 | 0.35152 | | |
| 6 | 1 | 2 | 0.51509 | −0.48491 | 0.0943949 |
| | 2 | 3 | 0.35472 | −0.13019 | |
| | 3 | 4 | 0.29332 | 0.16313 | |
| | 4 | 5 | 0.27303 | 0.43616 | |
| | 5 | 6 | 0.27442 | 0.71058 | |
| | 6 | 7 | 0.28942 | | |
| 7 | 1 | 2 | 0.46630 | −0.53370 | 0.0845448 |
| | 2 | 3 | 0.32335 | −0.21035 | |
| | 3 | 4 | 0.26317 | 0.05282 | |
| | 4 | 5 | 0.23768 | 0.29050 | |
| | 5 | 6 | 0.23032 | 0.52082 | |
| | 6 | 7 | 0.23398 | 0.75480 | |
| | 7 | 8 | 0.24520 | | |
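As with the earlier tables, the reported optimum can be spot-checked by evaluating the objective function of (5.8) at the tabulated widths. The sketch below does so for the row $L = 2$; the function name `objective_5_8` is an illustrative assumption, and only the reported solution is evaluated, not the minimization itself.

```python
import math

def objective_5_8(widths, costs, x0=-1.0):
    """Objective of (5.8) on [-1, 1], evaluated at the given stratum widths."""
    total, x_prev = 0.0, x0
    for width, cost in zip(widths, costs):
        upper = width + x_prev
        delta = math.atan(upper) - math.atan(x_prev)
        log_ratio = math.log((1.0 + upper ** 2) / (1.0 + x_prev ** 2))
        inner = (width - delta) * delta - 0.25 * log_ratio ** 2
        # max(...) guards against tiny negative values caused by rounding.
        total += math.sqrt(max(inner, 0.0) * cost) / math.pi
        x_prev = upper                          # x_h = x_{h-1} + l_h
    return total

value_l2 = objective_5_8([1.09957, 0.90043], [2.0, 3.0])
```

The value agrees with the last column of Table 5.1 for $L = 2$ to within the rounding of the tabulated widths.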
Chapter 6
Determination of the Optimum
Strata Boundaries for a Population
with Power Study Variable
______________________________
6.1 Power Distribution
The power distribution is a continuous probability distribution with probability density function given by
$$f(x) = \begin{cases} \dfrac{\beta x^{\beta - 1}}{\theta^{\beta}}; & 0 \le x \le \theta \\ 0; & \text{otherwise} \end{cases} \tag{6.1}$$
where $\theta > 0$ and $\beta > 0$ are the scale and shape parameters. In this chapter the problem of determining the OSB is discussed when the study variable in the underlying population has the power distribution described in (6.1).
6.2 Formulation of the Problem of Determining the
OSB for Power Variable
If the stratification variable $x$ follows the power distribution with the density function given in (6.1), then the stratum weight $(W_h)$, stratum mean $(\mu_h)$ and stratum variance $(S_h^2)$ can be obtained as functions of the boundary points $(x_{h-1}, x_h)$ by using (2.9), (2.11) and (2.10) respectively as follows:
$$W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx = \int_{x_{h-1}}^{x_h} \frac{\beta x^{\beta - 1}}{\theta^{\beta}}\,dx.$$
Integrating and substituting the limits $x_h$ and $x_{h-1}$ gives
$$W_h = \frac{1}{\theta^{\beta}}\left[x_h^{\beta} - x_{h-1}^{\beta}\right]. \tag{6.2}$$
Note that $l_h = x_h - x_{h-1}$, and substituting this into (6.2) gives
$$W_h = \frac{1}{\theta^{\beta}}\left[(l_h + x_{h-1})^{\beta} - x_{h-1}^{\beta}\right]. \tag{6.3}$$
The stratum mean $(\mu_h)$ of the power distribution is obtained using (2.11) as follows:
$$\mu_h = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x f(x)\,dx = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x\,\frac{\beta x^{\beta - 1}}{\theta^{\beta}}\,dx.$$
Integrating and substituting $x_h = l_h + x_{h-1}$ gives
$$\mu_h = \frac{\beta}{(\beta + 1)\,\theta^{\beta}\,W_h}\left[(l_h + x_{h-1})^{\beta + 1} - x_{h-1}^{\beta + 1}\right].$$
Thus, substituting the value of $W_h$ from (6.3) and simplifying gives
$$\mu_h = \frac{\beta\left[(l_h + x_{h-1})^{\beta + 1} - x_{h-1}^{\beta + 1}\right]}{(\beta + 1)\left[(l_h + x_{h-1})^{\beta} - x_{h-1}^{\beta}\right]}. \tag{6.4}$$
The stratum variance $(S_h^2)$ for the power distribution is found using (2.10) as follows:
$$S_h^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x^2 f(x)\,dx - \mu_h^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x^2\,\frac{\theta x^{\theta-1}}{\nu^{\theta}}\,dx - \mu_h^2 .$$
By integrating the function above we get
$$S_h^2 = \frac{\theta}{W_h\,\nu^{\theta}(\theta+2)}\left[x_h^{\theta+2} - x_{h-1}^{\theta+2}\right] - \mu_h^2 .$$
Substituting $x_h = l_h + x_{h-1}$ gives
$$S_h^2 = \frac{\theta}{W_h\,\nu^{\theta}(\theta+2)}\left[(l_h + x_{h-1})^{\theta+2} - x_{h-1}^{\theta+2}\right] - \mu_h^2 .$$
Thus, substituting the value of $W_h$ from (6.3) and $\mu_h$ from (6.4) gives
$$S_h^2 = \frac{\theta\left[(l_h + x_{h-1})^{\theta+2} - x_{h-1}^{\theta+2}\right]}{(\theta+2)\left[(l_h + x_{h-1})^{\theta} - x_{h-1}^{\theta}\right]} - \frac{\theta^2\left[(l_h + x_{h-1})^{\theta+1} - x_{h-1}^{\theta+1}\right]^2}{(\theta+1)^2\left[(l_h + x_{h-1})^{\theta} - x_{h-1}^{\theta}\right]^2} .$$
Expanding and finding the common denominator, the above function can be simplified
as
$$S_h^2 = \frac{\theta(\theta+1)^2\left[(l_h + x_{h-1})^{\theta+2} - x_{h-1}^{\theta+2}\right]\left[(l_h + x_{h-1})^{\theta} - x_{h-1}^{\theta}\right] - \theta^2(\theta+2)\left[(l_h + x_{h-1})^{\theta+1} - x_{h-1}^{\theta+1}\right]^2}{(\theta+2)(\theta+1)^2\left[(l_h + x_{h-1})^{\theta} - x_{h-1}^{\theta}\right]^2} .$$
Further simplification gives
$$S_h^2 = \frac{\theta\left\{(\theta+1)^2\left[(l_h + x_{h-1})^{\theta+2} - x_{h-1}^{\theta+2}\right]\left[(l_h + x_{h-1})^{\theta} - x_{h-1}^{\theta}\right] - \theta(\theta+2)\left[(l_h + x_{h-1})^{\theta+1} - x_{h-1}^{\theta+1}\right]^2\right\}}{(\theta+1)^2(\theta+2)\left[(l_h + x_{h-1})^{\theta} - x_{h-1}^{\theta}\right]^2} . \tag{6.5}$$
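As a quick numerical check of (6.2) to (6.5), the closed forms can be evaluated directly. The following is a minimal sketch, assuming the power density $f(x) = \theta x^{\theta-1}/\nu^{\theta}$ on $[0, \nu]$; the names `StratumMoments` and `powerStratumMoments` are hypothetical and are not taken from the thesis programs.

```cpp
#include <cassert>
#include <cmath>

// Closed-form stratum quantities for the power density
// f(x) = theta * x^(theta-1) / nu^theta on [0, nu] (assumed form of (6.1)).
struct StratumMoments {
    double weight;    // W_h
    double mean;      // mu_h
    double variance;  // S_h^2
};

// a = x_{h-1} (lower boundary), l = l_h (stratum width, must be positive).
// Hypothetical helper written for illustration only.
StratumMoments powerStratumMoments(double a, double l, double theta, double nu) {
    assert(a >= 0.0 && l > 0.0 && theta > 0.0 && nu > 0.0);
    const double b = a + l;  // upper boundary x_h = l_h + x_{h-1}

    // Bracketed differences appearing in (6.3), (6.4) and (6.5).
    const double A0 = std::pow(b, theta) - std::pow(a, theta);
    const double A1 = std::pow(b, theta + 1.0) - std::pow(a, theta + 1.0);
    const double A2 = std::pow(b, theta + 2.0) - std::pow(a, theta + 2.0);

    StratumMoments m;
    m.weight = A0 / std::pow(nu, theta);              // (6.3)
    m.mean = theta * A1 / ((theta + 1.0) * A0);       // (6.4)
    m.variance = theta * A2 / ((theta + 2.0) * A0)    // second moment / W_h
               - m.mean * m.mean;                     // minus mu_h^2
    return m;
}
```

For example, with $\theta = 3$, $\nu = 1$ and a single stratum covering $[0, 1]$, the function returns weight 1, mean 3/4 and variance 3/80.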
Then, the formulated MPP given in (2.18) to determine the optimum stratum widths and
hence the optimum stratum boundaries could be expressed using (2.12), (6.3) and (6.5)
as:
$$\text{Minimize } \sum_{h=1}^{L} \frac{1}{\nu^{\theta}(\theta+1)}\sqrt{\frac{\theta\,c_h}{\theta+2}\left\{(\theta+1)^2\left[(l_h + x_{h-1})^{\theta+2} - x_{h-1}^{\theta+2}\right]\left[(l_h + x_{h-1})^{\theta} - x_{h-1}^{\theta}\right] - \theta(\theta+2)\left[(l_h + x_{h-1})^{\theta+1} - x_{h-1}^{\theta+1}\right]^2\right\}}$$
$$\text{subject to } \sum_{h=1}^{L} l_h = d$$
$$\text{and } l_h \ge 0, \quad h = 1, 2, \ldots, L. \tag{6.6}$$
6.3 Numerical Illustration of the Solution Procedure
This section gives an illustration of the computational details of the proposed solution
procedure using the dynamic programming technique to determine the OSB with varying
stratum cost for the power distribution. Let $x$ follow the power distribution in the interval
$[0, 1]$, that is, $x_0 = 0$, $x_L = 1$ and $d = 1$. We also assume that $\nu = 1$ and $\theta = 3$. Then, the
MPP (6.6) reduces to:
$$\text{Minimize } \sum_{h=1}^{L} \frac{l_h^2\sqrt{\left(3l_h^4 + 24l_h^3 x_{h-1} + 84l_h^2 x_{h-1}^2 + 120l_h x_{h-1}^3 + 60x_{h-1}^4\right)c_h}}{4\sqrt{5}}$$
$$\text{subject to } \sum_{h=1}^{L} l_h = 1$$
$$\text{and } l_h \ge 0; \quad h = 1, 2, \ldots, L. \tag{6.7}$$
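The reduction to (6.7) can be checked by direct expansion. The sketch below is a verification aid, not part of the original derivation. It assumes that the density (6.1) becomes $f(x) = 3x^2$ on $[0, 1]$ for the parameter values chosen above, and it writes $a$ for $x_{h-1}$ and $l$ for $l_h$ (shorthand introduced here only for brevity).

```latex
\begin{aligned}
W_h &= \int_a^{a+l} 3x^2\,dx = l\,(3a^2 + 3al + l^2),\\
W_h\mu_h &= \int_a^{a+l} 3x^3\,dx = \tfrac{3}{4}\,l\,(4a^3 + 6a^2 l + 4al^2 + l^3),\\
\int_a^{a+l} 3x^4\,dx &= \tfrac{3}{5}\,l\,(5a^4 + 10a^3 l + 10a^2 l^2 + 5al^3 + l^4),\\
W_h^2 S_h^2 &= W_h\int_a^{a+l} 3x^4\,dx - (W_h\mu_h)^2\\
&= \tfrac{3}{80}\,l^2\left[16\,(3a^2 + 3al + l^2)(5a^4 + 10a^3 l + 10a^2 l^2 + 5al^3 + l^4)
   - 15\,(4a^3 + 6a^2 l + 4al^2 + l^3)^2\right]\\
&= \tfrac{3}{80}\,l^4\left(l^4 + 8l^3 a + 28l^2 a^2 + 40la^3 + 20a^4\right)\\
&= \tfrac{1}{80}\,l^4\left(3l^4 + 24l^3 a + 84l^2 a^2 + 120la^3 + 60a^4\right).
\end{aligned}
```

Taking the square root and multiplying by $\sqrt{c_h}$ gives $W_h S_h \sqrt{c_h} = l^2\sqrt{(3l^4 + 24l^3 a + 84l^2 a^2 + 120la^3 + 60a^4)\,c_h}\,/\,(4\sqrt{5})$, since $\sqrt{80} = 4\sqrt{5}$. With $a = 0$ and $c_1 = 2$ this becomes $\sqrt{6}\,l^4/(4\sqrt{5})$, which is the first-stage expression.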
Note that the $(h-1)$th stratification point is obtained by
$$x_{h-1} = x_0 + l_1 + l_2 + \cdots + l_{h-1} = l_1 + l_2 + \cdots + l_{h-1} = d_{h-1} = d_h - l_h .$$
Substituting the value of $x_{h-1}$, the recurrence relations (2.21) and (2.22) for solving the
MPP reduce to:
For the first stage, $k = 1$:
$$f(1, d_1) = \frac{\sqrt{6}\,d_1^4}{4\sqrt{5}} \quad \text{at } l_1^* = d_1. \tag{6.8}$$
For the stage $k$, where $k \ge 2$:
$$f(k, d_k) = \min_{0 \le l_k \le d_k}\left\{\frac{l_k^2\sqrt{\left(3l_k^4 + 24l_k^3 x_{k-1} + 84l_k^2 x_{k-1}^2 + 120l_k x_{k-1}^3 + 60x_{k-1}^4\right)c_k}}{4\sqrt{5}} + f(k-1, d_k - l_k)\right\}$$
which simplifies to
$$f(k, d_k) = \min_{0 \le l_k \le d_k}\left\{\frac{l_k^2\sqrt{\left(3l_k^4 + 24l_k^3 (d_k - l_k) + 84l_k^2 (d_k - l_k)^2 + 120l_k (d_k - l_k)^3 + 60(d_k - l_k)^4\right)c_k}}{4\sqrt{5}} + f(k-1, d_k - l_k)\right\} \tag{6.9}$$
because $x_{k-1} = x_0 + l_1 + \cdots + l_{k-1} = d_k - l_k$.
A C++ program (see Appendix D) was coded to solve the recurrence relations (6.8) and
(6.9). Executing the program yields the optimum strata widths $l_h^*$ and hence the
optimum strata boundaries $x_h^* = x_{h-1}^* + l_h^*$. The results for six different numbers
of strata, $L = 2, 3, 4, 5, 6$ and $7$, and the different values of $c_h$ are presented in Table 6.1.
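The recurrence (6.8) and (6.9) can also be illustrated with a compact bottom-up tabulation. The sketch below is not the Appendix D program: it is a minimal illustration that assumes the density $f(x) = 3x^2$ on $[0, 1]$, discretizes $d_k$ and $l_k$ on a uniform grid, and takes the costs $c_h$ as input. The names `stageCost`, `solveOsb`, `DpResult` and the `grid` parameter are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Stage cost W_h * S_h * sqrt(c_h) for the density f(x) = 3x^2 on [0, 1],
// with l = l_k (stratum width) and a = x_{k-1} = d_k - l_k (lower boundary).
double stageCost(double l, double a, double c) {
    const double poly = 3.0 * std::pow(l, 4) + 24.0 * std::pow(l, 3) * a
                      + 84.0 * l * l * a * a + 120.0 * l * std::pow(a, 3)
                      + 60.0 * std::pow(a, 4);
    return l * l * std::sqrt(poly * c) / (4.0 * std::sqrt(5.0));
}

struct DpResult {
    double objective;            // f(L, 1)
    std::vector<double> widths;  // l_1*, ..., l_L*
};

// cost[k] holds c_{k+1}; grid is the number of equal steps dividing [0, 1].
DpResult solveOsb(const std::vector<double>& cost, int grid) {
    const int L = static_cast<int>(cost.size());
    assert(L >= 1 && grid >= 1);
    const double step = 1.0 / grid;

    // f[k][n]: minimum total cost of stages 1..k+1 when they cover [0, n*step].
    // arg[k][n]: grid index of the minimizing width of stage k+1.
    std::vector<std::vector<double>> f(L, std::vector<double>(grid + 1, 0.0));
    std::vector<std::vector<int>> arg(L, std::vector<int>(grid + 1, 0));

    for (int n = 0; n <= grid; ++n) {  // first stage: l_1 = d_1
        f[0][n] = stageCost(n * step, 0.0, cost[0]);
        arg[0][n] = n;
    }
    for (int k = 1; k < L; ++k) {
        for (int n = 0; n <= grid; ++n) {
            double best = std::numeric_limits<double>::infinity();
            int bestI = 0;
            for (int i = 0; i <= n; ++i) {  // candidate width l = i*step
                const double val = stageCost(i * step, (n - i) * step, cost[k])
                                 + f[k - 1][n - i];
                if (val < best) { best = val; bestI = i; }
            }
            f[k][n] = best;
            arg[k][n] = bestI;
        }
    }

    DpResult result;
    result.objective = f[L - 1][grid];
    result.widths.assign(L, 0.0);
    int n = grid;
    for (int k = L - 1; k >= 0; --k) {  // backtrack from the last stage
        const int i = arg[k][n];
        result.widths[k] = i * step;
        n -= i;
    }
    return result;
}
```

Calling `solveOsb` with costs 2 and 3 and `grid = 1000` gives widths accurate to about three decimal places. A finer grid, or a local refinement around the coarse solution, would be needed to reach five decimal places.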
Table 6.1 OSW, OSB and Optimum Value of the Objective Function for Power
Distribution

| No. of Strata ($L$) | Strata Measurement Cost ($c_h$) | Optimum Strata Width ($l_h^*$) | Optimum Strata Boundaries ($x_h^* = x_{h-1}^* + l_h^*$) | Optimum Value of the Objective Function |
|---|---|---|---|---|
| 2 | $c_1 = 2$ | $l_1 = 0.75883$ | $x_1 = x_0 + l_1 = 0.75883$ | 0.1580252 |
|   | $c_2 = 3$ | $l_2 = 0.24117$ | | |
| 3 | $c_1 = 2$ | $l_1 = 0.64929$ | $x_1 = x_0 + l_1 = 0.64929$ | 0.1157368 |
|   | $c_2 = 3$ | $l_2 = 0.20635$ | $x_2 = x_1 + l_2 = 0.85564$ | |
|   | $c_3 = 4$ | $l_3 = 0.14436$ | | |
| 4 | $c_1 = 2$ | $l_1 = 0.58323$ | $x_1 = x_0 + l_1 = 0.58323$ | 0.0933962 |
|   | $c_2 = 3$ | $l_2 = 0.18536$ | $x_2 = x_1 + l_2 = 0.76859$ | |
|   | $c_3 = 4$ | $l_3 = 0.12967$ | $x_3 = x_2 + l_3 = 0.89826$ | |
|   | $c_4 = 5$ | $l_4 = 0.10174$ | | |
| 5 | $c_1 = 2$ | $l_1 = 0.53776$ | $x_1 = x_0 + l_1 = 0.53776$ | 0.0794072 |
|   | $c_2 = 3$ | $l_2 = 0.17091$ | $x_2 = x_1 + l_2 = 0.70867$ | |
|   | $c_3 = 4$ | $l_3 = 0.11957$ | $x_3 = x_2 + l_3 = 0.82824$ | |
|   | $c_4 = 5$ | $l_4 = 0.09381$ | $x_4 = x_3 + l_4 = 0.92205$ | |
|   | $c_5 = 6$ | $l_5 = 0.07795$ | | |
| 6 | $c_1 = 2$ | $l_1 = 0.50396$ | $x_1 = x_0 + l_1 = 0.50396$ | 0.0697378 |
|   | $c_2 = 3$ | $l_2 = 0.16016$ | $x_2 = x_1 + l_2 = 0.66412$ | |
|   | $c_3 = 4$ | $l_3 = 0.11205$ | $x_3 = x_2 + l_3 = 0.77617$ | |
|   | $c_4 = 5$ | $l_4 = 0.08791$ | $x_4 = x_3 + l_4 = 0.86408$ | |
|   | $c_5 = 6$ | $l_5 = 0.07305$ | $x_5 = x_4 + l_5 = 0.93713$ | |
|   | $c_6 = 7$ | $l_6 = 0.06287$ | | |
| 7 | $c_1 = 2$ | $l_1 = 0.47750$ | $x_1 = x_0 + l_1 = 0.47750$ | 0.0626070 |
|   | $c_2 = 3$ | $l_2 = 0.15175$ | $x_2 = x_1 + l_2 = 0.62925$ | |
|   | $c_3 = 4$ | $l_3 = 0.10616$ | $x_3 = x_2 + l_3 = 0.73541$ | |
|   | $c_4 = 5$ | $l_4 = 0.08330$ | $x_4 = x_3 + l_4 = 0.81871$ | |
|   | $c_5 = 6$ | $l_5 = 0.06921$ | $x_5 = x_4 + l_5 = 0.88792$ | |
|   | $c_6 = 7$ | $l_6 = 0.05957$ | $x_6 = x_5 + l_6 = 0.94749$ | |
|   | $c_7 = 8$ | $l_7 = 0.05251$ | | |
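The boundary column of Table 6.1 follows from the width column by cumulative summation, $x_h = x_{h-1} + l_h$ with $x_0 = 0$. A minimal sketch of that bookkeeping is given below; the function name `boundariesFromWidths` is hypothetical and is used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the interior boundaries x_1, ..., x_{L-1} obtained from the
// starting point x0 and the stratum widths l_1, ..., l_L.
// The last boundary x_L = x0 + d is fixed by the range and is not returned.
std::vector<double> boundariesFromWidths(double x0, const std::vector<double>& widths) {
    std::vector<double> boundaries;
    double current = x0;
    for (std::size_t h = 0; h + 1 < widths.size(); ++h) {
        current += widths[h];  // x_h = x_{h-1} + l_h
        boundaries.push_back(current);
    }
    return boundaries;
}
```

Applied to the $L = 4$ widths of Table 6.1 with $x_0 = 0$, this reproduces the three interior boundaries listed there up to floating-point rounding.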
Chapter 7
Conclusion
______________________________
Surveys are an important decision-making tool for development. Results from surveys
aid decision-making in government, businesses and non-profit organizations. The best
way to conduct a survey is to consider the entire population. Usually this is impossible due to
budget constraints. To address this problem, many sampling techniques have been
developed. One commonly used technique is stratified random sampling, where the
population is divided into non-overlapping groups called strata and random samples
are drawn from each stratum. To achieve the maximum precision of the estimates when using
stratified sampling, the four basic problems to consider are: (1) the choice of the
stratification variables, (2) the determination of the number of strata, (3) the determination
of the optimum strata boundaries and (4) the determination of the optimum sample size to
be selected from each stratum.
The optimum strata boundaries (OSB) are determined by cutting the range of the
distribution of the study variable at suitable points. The choice of the OSB is important to
ensure that the strata are homogeneous. This means that, in order to achieve maximum
precision, the stratum variance $(S_h^2)$ should be as small as possible for a given type of
sample allocation. The problem of determining the OSB was first studied by Dalenius (1950),
and subsequently many authors proposed various techniques for determining the OSB.
On the other hand, in practice the budget of a survey is fixed in advance. The
purpose of survey design is then to maximize the amount of information gathered, with
optimum precision, for a given cost. Also, when the cost of measurement per unit varies
from stratum to stratum, this influences the determination of the OSB that
maximizes the precision of the estimate. Thus it is important that, while trying to determine
the OSB, the cost of measuring a unit in each stratum should also be
considered. Unfortunately, the problem of determining the OSB with these cost
constraints has not been studied much in the literature. In this thesis, an attempt is made to
determine the OSB for a given budget where the cost of measuring units varies from
stratum to stratum. First a mathematical programming problem (MPP) was formulated, which
addresses the problem of determining the OSB. Then a solution procedure was proposed
to solve the MPP using a dynamic programming technique. Numerical examples were
presented for determining the OSB using four populations in which the study variable
follows different distributions, namely, the exponential distribution, the right-triangular
distribution, the Cauchy distribution and the power distribution.
The findings from this project will be useful for those who want to conduct a survey
using stratified random sampling under a budget constraint. The advantage of the
proposed method is that it gives the global optimum of the boundary points. However, the
research carried out in this thesis is for only a single study variable. Future research
can be carried out for study variables that follow other frequency functions, such as the
normal, lognormal and gamma distributions, which are more useful in industry. The research can
also be extended to determining the OSB in the multivariate case.
Bibliography
1. Aoyama, H. (1954). A study of stratified random sampling. Ann. Inst. Stat.
Math., 6, 1-36.
2. Arthanari, T.S., Dodge, Y. (1981). Mathematical Programming in Statistics.
Wiley and Sons, Inc. USA
3. Banerjee, A., Yakovenko, V.M. and Matteo, T.D. (2006). A study of the personal
income distribution in Australia. Physica A, 370, 54-59.
4. Banerjee, A., Yakovenko, V.M. (2010). Universal patterns of inequality. New
Journal of Physics, 12, DOI: 10.1088/1367-2630/12/7/075032.
5. Bellman, R.E. (1957). Dynamic Programming. Princeton University Press,
New Jersey.
6. Brandt, S. (1999). Data Analysis. Statistical and Computational Methods for
Scientists and Engineers. Ed. 3. Springer Verlag, New York.
7. Buhler, W., Deutler, T. (1975). Optimal stratification and grouping by dynamic
programming. Metrika. 22(1), 161-175.
8. Cameron, N. (1985). Introduction to Linear and Convex Programming.
Cambridge University Press, Cambridge.
9. Claycombe, W.W. and Sullivan, W.G. (1975). Foundations of Mathematical
Programming. Reston Publishing Company, INC. A Prentice-Hall
Company,USA
10. Cochran, W.G. (1977). Sampling Techniques. John Wiley & Sons, New York.
11. Cochran, W.G. (1961). Comparisons of methods for determining stratum
boundaries. Bull. Int. Stat. Inst, Vol 38. Part 2, pp. 345-358.
12. Dalenius, T. (1950). The problem of optimum stratification- II. Skand.
Aktuartidskr, 33, 203-213.
13. Dalenius, T., and Gurney, M. (1951). The problem of optimum stratification.
Skand. Akt.,34, 133-148.
14. Dalenius, T. (1957). Sampling in Sweden. Almquist and Wiksell. Stockholm.
15. Dalenius, T., and Hodges, J. L. (1959). Minimum variance stratification. J. Amer.
Statis. Assoc., 54, 88-101.
16. Detlefsen, R. E., and Veum, C. S. (1991). Design issues for the retail trade
sample surveys of the US Bureau of the Census. Proceedings of the Survey
Research Methods Section, ASA, pp. 214-219.
17. Dvoretzky, A. Kiefer, J. Wolfowitz, J. (1952). The Inventory Problem: I, Case of
Known Distribution of Demand. Econometrica. 20, 187-222.
18. Dvoretzky, A. Kiefer, J. Wolfowitz, J. (1952). The Inventory Problem: II, Case
of Unknown Distribution of Demand. Econometrica. 20, 450-466.
19. Ekman, G. (1959). Approximate expression for conditional mean and variance
over small intervals of a continuous distribution. Ann. Inst. Stat. Math., 30, 1131-
1134.
20. Evans, M., Hastings, N. and Peacock, B. (2000). Statistical Distributions. John
Wiley and Sons Inc., Canada.
21. Findeisen, W., Szymanowski, J. and Wierzbicki, A. (1974). Metody
obliczeniowe optymalizacji (Computing Methods of Optimization).
Wydawnictwa Politechniki Warszawskiej, Warsaw, Poland.
22. Fonolahi, A.V., Khan, M.G.M. (2014). Determining the optimum strata
boundaries with constant cost factor. IEEE Proceedings of 2014 Asia-Pacific
World Congress on Computer Science and Engineering (APWC on CSE), 1-7,
DOI: 10.1109/APWC CSE.2014.7053850.
23. Glaisher, J.W. (1871). On a class of definite integrals. Philosophical Magazine,
XXXII, 294-301.
24. Govindarajulu, Z. (1999). Elements of sampling Theory and Methods. Prentice-
Hall Company.
25. Gupta, R. K., Singh, R., Mahajan, P.K. (2005). Approximate optimum strata
boundaries for ratio and regression estimators. Aligarh Journal of Statistics. 25,
49-55.
26. Hansen, M.H., and Hurwitz, W.N. (1953). On the theory of sampling from finite
population. Ann. Math. Statist., 14, 333-362.
27. Hansen, M.H., and Hurwitz, W.N., Madow, W.G. (1962). Sample Survey
Methods and Theory Methods and applications, Vol 1, John Wiley &Sons, Inc.
28. Hess, I., Sethi, V.K. and Balakrishnan, T.R. (1966). Stratification: A practical
investigation. J. Amer. Statist. Assoc., 61, 71-90.
29. Hidiroglou, M.A., and Srinath, K.P. (1993). Problems associated with designing
subannual business surveys. Journal of Business and Economic Statistics, 11,
397-405.
30. Johnson, D. (1997). The triangular distribution as a proxy for the beta
distribution in risk analysis. Journal of the Royal Statistical Society: Series D
(The Statistician), Vol 47, Issue 3, 387-398.
31. Khan, E.A., Khan, M.G.M., and Ahsan, M.J. (2002). Optimum stratification: A
mathematical programming approach. Calcutta Statistical Association Bulletin,
52 (special), 205-208.
32. Khan, M.G.M., Khan, N., and Ahsan, M.J. (2003). Optimum stratification for
exponential study variable under Neyman Allocation. Bulletin of the
International Statistical Institute, 54th Session, Vol LX, 606-607.
33. Khan, M.G.M., Najmussehar., Ahsan, M.J. (2005). Optimum stratification for
exponential study variable under Neyman allocation. J. Indi. Soc. Agri. Statist.,
59(2), 146-150.
34. Khan, M.G.M., Nand, N., Ahmad, N. (2008). Optimum stratification for cauchy
and power type study variables. J. Appl. Statist. Sci. 16(4), 64-74.
35. Khan, M.G.M., Nand, N., Ahmad, N. (2008). Determining the optimum strata
boundary points using dynamic programming. Survey methodology. 34(2), 205-
214.
36. Khan, M.G.M., Rao, D., Ansari, A. H., Ahsan, M. J., (2014). Determining
optimum strata boundaries and sample sizes for skewed population using log-
normal distribution. Communications in Statistics- Simulation and Computation.
DOI # 10.1080/03610918.2013.819917.
37. Khan, M.G.M., Reddy, K.G. and Rao, D.K. (2015). Designing stratified
sampling in economic and business surveys. Journal of Applied Statistics. DOI:
10.1080/02664763.2015.1018674.
38. Kozak, M. (2004). Optimal stratification using random search method in
agricultural surveys. Statistics in Transition, Vol. 6, No.5, 797-806.
39. Lavallée, P. (1988). Two-way optimal stratification using dynamic
programming. Proc. Sect. Surv. Resea. Meth., Amer. Statist. Assoc., Virginia,
646-651.
40. Lavallée, P., Hidiroglou, M. (1988). On the stratification of skewed populations,
Survey Methodology, 14, 3-43.
41. Lednicki, B., Wieczorkowski, R. (2003). Optimal stratification and sample
surveys. Sankhya, 12, 1-7.
42. Mahalanobis, P. C. (1952). Some Aspects of the Design of Sample Survey.
Sankhya, 12, 1-7.
43. Mehta, S. K., Singh, R. Kishore, L. (1996). On optimum stratification for
allocation proportional to strata totals. J. Indi. Statist. Assoc. 34, 9-19.
44. Murthy, M.N. (1967). Sampling Theory and Methods. Statistical Publishing
Society, Calcutta.
45. Nand, N. and Khan, M.G.M. (2005a). Use of mathematical programming in
optimum stratification. Presented in the conference of Australia – New Zealand
Industrial and Applied Mathematics (ANZIAM 2005) held during January 30 to
February 3 at Napier, New Zealand.
46. Nand, N. and Khan, M.G.M. (2005b). Determining the optimum strata boundary
points using mathematical programming. Presented in the 55th Session of the
International Statistical Institute (ISI) held during April 5-12 at Sydney,
Australia.
47. Nelder, J. A., Mead, R. (1965). A simplex method for function minimization.
Computer Journal, 7, 308-313.
48. Nemhauser, G. L. (1980). Introduction to Dynamic Programming. John Wiley &
Sons, Inc, USA.
49. Nicolini, G. (2001). A method to define strata boundaries. Departmental
Working Papers 2001-01, Department of Economics University of Milan Italy.
(www.economia.unimi.it/pubb/wp83)
50. Niemiro, W. (1999). Konstrukcja optymalnej stratyfikacja metoda poszukiwan
losowych. (Optimal stratification using random search method). Wiadomosci
Statystyczne, 10, 1-9.
51. Poisson, D.D. (1832). Sur la probabilite’ des resultat moyens des observations,
Paris.
52. Rivest, L. P. (2002). A generalization of Lavallee and Hidiroglou algorithm for
stratification in business survey. Techniques d’enquete, 28, 207-214.
53. Rizvi, S.E.H., Gupta, J.P., Bhargava M. (2002). Optimum stratification based on
auxiliary variable for compromise allocation. Metron, 28(1), 201-215.
54. Scheaffer, R. L., Mendenhall, R., Ott, L. R. (2006). Elementary Survey
Sampling (6th Edn). Thomson Brooks/Cole. University of California, USA.
55. Serfling, R.J. (1968). Approximately optimum stratification. Journal of American
Statistical Association. 63, 1298-1309.
56. Sethi, V. K. (1963). A note on optimum stratification of population for
estimating the population mean. Aust. J. Statist. 5, 20-23.
57. Simpson, T. (1755). A letter to the Right Honourable George Earl of
Macclesfield, President of the Royal Society on the advantage of taking the mean
of a number of observations, in practical astronomy. Philosophical Transactions
of the Royal Society of London. 49, 82-93.
58. Singh, R., Prakash, D. (1975). Optimum Stratification for Equal Allocation.
Annals of the Institute of Statistical mathematics, 27, 273-280.
59. Singh, R., Sukhatme, B.V. (1969). Optimum stratification. Ann. Inst. Stat. Math.
21, 515-528.
60. Singh, R., Sukhatme, B.V. (1972). Optimum stratification in sampling with
varying probabilities. Ann. Inst. Stat. Math. 24, 485-494.
61. Singh, R., Sukhatme, B.V. (1973). Optimum stratification with ratio and
regression methods of estimation. Ann. Inst. Stat. Math. 25, 627-633.
62. Singh, R. (1971). Approximately optimum stratification on auxiliary variable. J.
Amer. Statist. Assoc. 66, 829-833.
63. Singh, R., Prakash, D. (1975). Optimum stratification for equal allocation. Ann.
Inst. Stat. Math. 27, 273-280.
64. Sundaram, R. K. (1996). A First Course in Optimization Theory. Cambridge
University Press, USA.
65. Sweet, E.M., Sigman, R.S. (1995a). Evaluation of model-assisted procedures for
stratifying skewed populations using auxiliary data, U.S. Bureau of the Census
(available on the internet: www.census.gov/srd/papers/pdf/sm95-22.pdf).
66. Sweet, E.M., Sigman, R.S. (1995b). User guide for the generalized SAS univariate
stratification program, ESM Report Series, ESM-9504, U.S. Bureau of the
Census.
67. Taga, Y., (1967). On optimum stratification for the objective variable based on
concomitant variables using prior information. Ann. Inst. Stat. Math., 19, 101-
129.
68. Taha, H. A. (1997). Operations Research: An Introduction. Prentice Hall, Inc.,
Upper Saddle River, New Jersey.
69. Howard G. T. (1962). An introduction to probability and mathematical statistics.
Academic Press Inc. USA.
70. Unnithan, V.K.G. (1978). The minimum variance boundary points of
stratification. Sankhya, 40, C, 60-72.
71. Unnithan, V.K.G. and Nair, N.U. (1995). Minimum variance stratification.
Commun. Statist., 24(1), 275-284.
72. Walpole, R. E., Myers, R. H., Ye, K. (1981). Probability and Statistics for
Engineers and Scientists, 7th edn. Prentice Hall, Inc., USA.
73. Winston, W. L., Venkataramanan, M. (2003). Introduction to Mathematical
Programming, Operations Research Volume One, 4th Edn. Brooks/Cole-
Thomson Learning, USA.
74. Wald, A. (1947). Foundation of a General Theory of Sequential Decision
Functions. Econometrica, Vol. 15. 279-313.
Appendix
Appendix A:
The C++ Program Created to Determine the OSB
with cost factor for Exponential Distribution
/* This program gives the optimum strata width and optimum strata boundaries of
the main study variable following the exponential distribution */
#include <iostream>
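#include <cmath>
#include <cassert>
#include <cstdio> // printf
#include <cstdlib> // system
#include <conio.h> // non-standard (DOS/Windows) header that provides getch()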
#include <iomanip>
using namespace std;
typedef double Number;
/*********************************************************************/
//declare and initialize global constants
# define z 100 //refine to 5 decimal places
# define factor 4
# define inc 0.001 // amount of precision (10^-3)
# define inc2 0.00001 // amount of precision (10^-5)
# define prec 1/inc
# define points 1000 //Keep this to be 1/inc
// function declarations
double RootVal(int k, double d, double y, double c);
/*calculates the value of objective function and the minimal elements*/
double Minimum(double val1,double val2); // returns minimum of 2 numbers
/*Recursive function receives the parameter k, dk, yk to calculate f.*/
double fun(int k,int n,double incf, int minYk, int maxYk, bool isFirstRun, double []);
//void WeightSamPop(int h, int N, int nh[], int Nh[], double w[], double d[], double y[],
// double f, double Vybarst, double Vh[]) ;
//void sampsize(int nh[], int n, int h, double w[], double sig[], double prodwhsigh);
//declare global variables
int n, N; // number of strata (h), total sample size (n), pop size
double s; // s=x0, the initial value of 6.1
const int SIZE = 10;
const double lambda = 1;
//declare global constants and initialize their values
const double g = 20; // g is the distance
const int stages = 10;
int ylimits[10]; //stores the 3dp values for refining
const int e = (int)(g*points*z+1);
const int p = (int)(g*points+1);
double minkf2[stages][e];
double dk2[stages][e];
int h = 0;
int main() //program execution starts
{
double c[SIZE];
//take inputs of L, d and s as local variables
cout<<"Enter Number of Strata, L " << endl;
cin >> h;
cout<<"Enter Initial Value, Xo " << endl;
cin >> s;
s=0;
for (int i=0; i<h; i++)
{
cout<<"Enter the Cost, C " << endl;
cin >> c[i];
printf("c[%i+1] = %f \n\n", i, c[i]);
}
//initialize minkf locally
cout<<"\nInitializing points ...."<<endl;
for (int i=0; i < stages; i++)
{
for(int j=0; j<e; j++)
{
minkf2[i][j]= -9999; //assign -9999 to every cell
}
}
for (int k=0; k < stages; k++)
{
for(int l=0; l<e; l++)
{
dk2[k][l]= -9999; //assigning same as above
}
}
cout<<"Initialization complete"<<endl<<endl<<"Calculating...."<<endl<<endl;
double f=fun(h,p,inc ,0,p ,true, c);
double d[SIZE], y[SIZE], x[SIZE], w[SIZE], Vh[SIZE]; // d, y and x are arrays of h float numbers
int nh[SIZE], Nh[SIZE]; //stratum sample sizes
int temp;
double Vybarst;
//backward calculation for the 3dp results
for(int i=h; i>=1; i--)
{
//c[i] = i+1;
if(i==h)
{
d[i] = g;
y[i] = dk2[i][p];
//c[i] = c[i+1];
}
else if(i<h && i>1)
{
d[i] = d[i+1]-y[i+1];
temp = (int)(d[i]*points);
y[i] = dk2[i][temp];
//c[i] = c[i+1];
}
else if(i==1)
{
d[i]=d[i+1]-y[i+1];
y[i]=d[i];
//c[i] = c[i+1];
}
}
//setup the limits for the 6dp calculations
for(int i=h; i>=1; i--)
{
temp = (int)(y[i]*points*z);
ylimits[i] = temp;
}
f=fun(h, e-1, inc2, ylimits[h]-factor*z, ylimits[h]+factor*z, false, c);// for k>=2
cout <<"Strata: L = " << h << setw(30) << "Distance: d = " << g << endl;
printf("\nf(h,g): %.10f \n" ,f);
cout << setw(20) << "\n\n Distance" << setw(25) << "Width" << setw(26) <<
"Boundary" << endl;
cout << setfill('-') << setw(73) << "-";
//Backward calculation for the 6 dp, compute d, y and x
for(int i=h; i>=1; i--)
{
//c[i] = i+1;
if(i==h)
{
d[i]=g;
y[i] = dk2[i][(e-1)];
x[i]=s+g;
}
else if(i<h && i>1)
{
//cout << d[i+1] << "\t\t" << y[i+1] << endl;
d[i]=d[i+1]-y[i+1];
temp = (int)(d[i]*points*z);
y[i]=dk2[i][temp];
x[i]=x[i+1]-y[i+1];
}
else if(i==1)
{
//cout << "d=" << d[i+1] << "\t\ty=" << y[i+1] << endl;
d[i]=d[i+1]-y[i+1];
y[i]=d[i];
x[i]=y[i]+s;
}
printf("\nd[%i] = %f \t y[%i] = %f \t x[%i] = %f" , i, d[i], i, y[i], i, x[i]);
}
cout << endl << setfill('-') << setw(73) << "-" << endl;
getch();
system ("PAUSE");
return 0;
} //end main
double RootVal(int k, double d, double y, double c) //calculate the root value of the current distribution
{
    double calc;
    // The lower boundary of the stratum is x_{h-1} = d-y+s; y is the stratum width.
    double A = exp(-1*lambda*(d-y+s));
    double B = (1/pow(lambda,2))*pow((1-exp(-1*lambda*y)),2);
    double C = pow(y,2)*exp(-1*lambda*y);
    // Wh and Sig2h are not used below; they record the stratum weight and variance.
    double Wh = exp(-1*lambda*(d-y+s))*(1-exp(-1*lambda*y));
    double Sig2h = ((1/pow(lambda,2))*pow((1-exp(-1*lambda*y)),2)-
        pow(y,2)*exp(-1*lambda*y))/pow((1-exp(-1*lambda*y)),2);
    calc = pow(A,2)*(B-C)*c; // radicand: (Wh^2)*(Sig2h)*c
    if (calc<0)
    {
        cout<<"\nError: Negative root...\n";
        return -1; // signal an invalid value; previously overwritten by calc before returning
    }
    return sqrt(calc);
}
double Minimum(double val1,double val2) // returns minimum of 2 numbers
{
if (val1<=val2)
{
return val1;
}
else
{
return val2;
}
}
/*this functions performs the same actions as "function". It only defers in terms of the
iterations of the for loop.*/
double fun(int k,int n,double incf,int minYk,int maxYk,bool isFirstRun, double cost[])
{
assert (k>=1); //Abort if k is negative
double dblRetVal;
double d = n*incf; //d value for the function
double y;
//int c;
double min;
double val;
double miny = 0;
int col;
if(k==1) //base case
{
y = d;
dblRetVal = RootVal(k, d, y, cost[0]);
}
else
{
for(int i=minYk; i<=maxYk; i++) //iterate over the interval allowed to calculate the 6dp results.
{
y = i*incf; //this sets to precission of y to 6dp
double root;
root = RootVal(k, d, y, cost[k-1]); //calculate the root.
if(root != -1) //if root is valid
{
col = n-i; //get the current d value
if(minkf2[k-1][col]==-9999)
{
if(isFirstRun) //check if the result has been previously calculated
{
val = root+ fun((k-1),col,incf,0,col,true,
cost); //if not, calculate the result
}
else
{ //if not, calculate the result
val = root+ fun((k-1),col,incf,ylimits[k-1]-
factor*z,ylimits[k-1]+ factor*z,false, cost);
}
}
else
{
val = root+ minkf2[k-1][col]; //if result exists, use it for calculations
}
}
if (i==minYk)
{
min =val;//base case
}
else
{
min = Minimum(min,val);//get the minimum of the result and the current minimum
}
if(min == val)
{
miny=y;
}//get the position of the current minimum
}//end for
dblRetVal = min;
}//end else
//store the f and the d value of the minimum calculated.
col = n;
minkf2[k][col] = dblRetVal;
dk2[k][col]=miny;
return dblRetVal;
}//end function
Appendix B:
The C++ Program Created to Determine the OSB
with cost factor for Right-Triangular Distribution
/* This program gives the optimum strata width and optimum strata boundaries of
the main study variable following the right-triangular distribution */
#include <iostream>
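#include <cmath>
#include <cassert>
#include <cstdio> // printf
#include <cstdlib> // system
#include <conio.h> // non-standard (DOS/Windows) header that provides getch()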
#include <iomanip>
using namespace std;
typedef double Number;
/*********************************************************************/
//declare and initialise global constants
# define z 100 //refine to 5 decimal places
#define a 1
#define b 2
# define factor 4
# define inc 0.001 // amount of precision (10^-3)
# define inc2 0.00001 // amount of precision (10^-5)
# define prec 1/inc
# define points 1000 //Keep this to be 1/inc
// function declarations
double RootVal(int k, double d, double y, double c); /*calculates the value of objective
function and the minimal elements*/
double Minimum(double val1,double val2); // returns minimum of 2 numbers
/*Recursive function receives the parameter k, dk, yk to calculate f.*/
double fun(int k,int n,double incf,int minYk,int maxYk,bool isFirstRun, double []);
//void WeightSamPop(int h, int N, int nh[], int Nh[], double w[], double d[], double y[],
// double f, double Vybarst, double Vh[]) ;
//void sampsize(int nh[], int n, int h, double w[], double sig[], double prodwhsigh);
//declare global variables
int n, N; // number of strata (h), total sample size (n), pop size
double s; // s=x0, the initial value of 6.1
const int SIZE = 10;
//declare global constants and initialize their values
const double g = 1; // g is the distance
const int stages = 10;
int ylimits[10]; //stores the 3dp values for refining
const int e = (int)(g*points*z+1);
const int p = (int)(g*points+1);
double minkf2[stages][e];
double dk2[stages][e];
int h = 0;
int main() //program execution starts
{
double c[SIZE];
//take inputs of L, d and s as local variables
cout<<"Enter Number of Strata, L " << endl;
cin >> h;
s=1;
for (int i=0; i<h; i++)
{
cout<<"Enter the Cost, C " << endl;
cin >> c[i];
printf("c[%i+1] = %f \n\n", i, c[i]);
}
//initialize minkf locally
cout<<"\nInitializing points ...."<<endl;
for (int i=0; i < stages; i++)
{
for(int j=0; j<e; j++)
{
minkf2[i][j]= -9999; //assign -9999 to every cell
}
}
for (int k=0; k < stages; k++)
{
for(int l=0; l<e; l++)
{
dk2[k][l]= -9999; //assigning same as above
}
}
cout<<"Initialization complete"<<endl<<endl<<"Calculating...."<<endl<<endl;
double f=fun(h,p,inc ,0,p ,true, c);
double d[SIZE], y[SIZE], x[SIZE], w[SIZE], Vh[SIZE]; // d, y and x are arrays of h float numbers
int nh[SIZE], Nh[SIZE]; //stratum sample sizes
int temp;
double Vybarst;
//backward calculation for the 3dp results
for(int i=h; i>=1; i--)
{
//c[i] = i+1;
if(i==h)
{
d[i] = g;
y[i] = dk2[i][p];
//c[i] = c[i+1];
}
else if(i<h && i>1)
{
d[i] = d[i+1]-y[i+1];
temp = (int)(d[i]*points);
y[i] = dk2[i][temp];
//c[i] = c[i+1];
}
else if(i==1)
{
d[i]=d[i+1]-y[i+1];
y[i]=d[i];
//c[i] = c[i+1];
}
}
//setup the limits for the 6dp calculations
for(int i=h; i>=1; i--)
{
temp = (int)(y[i]*points*z);
ylimits[i] = temp;
}
f=fun(h, e-1, inc2, ylimits[h]-factor*z, ylimits[h]+factor*z, false, c);// for k>=2
cout <<"Strata: L = " << h << setw(30) << "Distance: d = " << g << endl;
printf("\nf(h,g): %.10f \n" ,f);
cout << setw(20) << "\n\n Distance" << setw(25) << "Width" << setw(26) <<
"Boundary" << endl;
cout << setfill('-') << setw(73) << "-";
//Backward calucation for the 6 dp, compute d, y and x
for(int i=h; i>=1; i--)
{
if(i==h)
{
d[i]=g;
y[i] = dk2[i][(e-1)];
x[i]=s+g;
}
else if(i<h && i>1)
{
//cout << d[i+1] << "\t\t" << y[i+1] << endl;
d[i]=d[i+1]-y[i+1];
temp = (int)(d[i]*points*z);
y[i]=dk2[i][temp];
x[i]=x[i+1]-y[i+1];
}
else if(i==1)
{
//cout << "d=" << d[i+1] << "\t\ty=" << y[i+1] << endl;
d[i]=d[i+1]-y[i+1];
y[i]=d[i];
x[i]=y[i]+s;
}
printf("\nd[%i] = %f \t y[%i] = %f \t x[%i] = %f" , i, d[i], i, y[i], i, x[i]);
}
cout << endl << setfill('-') << setw(73) << "-" << endl;
getch();
system("PAUSE");
return 0;
} //end main
double RootVal (int k, double d, double y, double c) //calculate the root value of the current distribution
{
    double calc;
    // u is the distance from the lower stratum boundary x_{h-1} = d-y+s to b.
    double u = b-(d-y+s);
    double A = y;
    double B = (pow(y,2)*(pow(y,2)-(6*u*y)+(6*pow(u,2))))/18;
    // Wh and Sig2h are not used below; they record the stratum weight and variance.
    double Wh = y*(2*u-y);
    double Sig2h = (pow(y,2)*(pow(y,2)-(6*u*y)+(6*pow(u,2))))/(18*pow((2*u-y),2));
    calc = pow(A,2)*(B*c);
    if (calc<0)
    {
        // cout<<"\nError: Negative root...\n";
        return -1; // signal an invalid value; previously overwritten by calc before returning
    }
    return sqrt(calc);
}
double Minimum (double val1,double val2) // returns minimum of 2 numbers
{
if (val1<=val2)
{
return val1;
}
else
{
return val2;
}
}
/*this functions performs the same actions as "function". It only defers in terms of the
iterations of the for loop.*/
double fun(int k,int n,double incf,int minYk,int maxYk,bool isFirstRun, double cost[])
{
assert (k>=1); //Abort if k is negative
double dblRetVal;
double d = n*incf; //d value for the function
double y;
//int c;
double min;
double val;
double miny = 0;
int col;
if (k==1) //base case
{
y = d;
dblRetVal = RootVal(k, d, y, cost[0]);
}
else
{
for(int i=minYk; i<=maxYk; i++) //iterate over the interval allowed to calculate the 6dp results.
{
y = i*incf; //this sets to precission of y to 6dp
double root;
root = RootVal(k, d, y, cost[k-1]); //calculate the root.
if (root != -1) //if root is valid
{
col = n-i; //get the current d value
if (minkf2[k-1][col]==-9999)
{
if (isFirstRun) //check if the result has been previously calculated
{
val = root+ fun((k-1),col,incf,0,col,true,
cost); //if not, calculate the result
}
else
{ //if not, calculate the result
val = root+ fun((k-1),col,incf,ylimits[k-1]-
factor*z,ylimits[k-1]+ factor*z,false, cost);
}
}
else
{
val = root+ minkf2[k-1][col]; //if result exists, use it for calculations
}
}
if (i==minYk)
{
min =val;//base case
}
else
{
min = Minimum(min,val);//get the minimum of the result and the current minimum
}
if (min == val)
{
miny=y;
}//get the position of the current minimum
}//end for
dblRetVal = min;
}//end else
//store the f and the d value of the minimum calculated.
col = n;
minkf2[k][col] = dblRetVal;
dk2[k][col]=miny;
return dblRetVal;
}//end function
Appendix C:
The C++ Program Created to Determine the OSB
with cost factor for Standard Cauchy Distribution
/* This program gives the optimum strata width and optimum strata boundaries of
the main study variable following the standard Cauchy distribution */
#include <iostream>
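#include <cmath>
#include <cassert>
#include <cstdio> // printf
#include <cstdlib> // system
#include <conio.h> // non-standard (DOS/Windows) header that provides getch()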
#include <iomanip>
using namespace std;
typedef double Number;
/*********************************************************************/
//declare and initialise global constants
# define z 100 //refine to 5 decimal places
# define factor 4
# define inc 0.001 // amount of precision (10^-3)
# define inc2 0.00001 // amount of precision (10^-5)
# define prec 1/inc
# define points 1000 //Keep this to be 1/inc
// function declarations
double RootVal(int k, double d, double y, double c); /*calculates the value of objective
function and the minimal elements*/
double Minimum(double val1,double val2); // returns minimum of 2 numbers
/*Recursive function receives the parameter k, dk, yk to calculate f.*/
double fun(int k,int n,double incf,int minYk,int maxYk,bool isFirstRun, double []);
//void WeightSamPop(int h, int N, int nh[], int Nh[], double w[], double d[], double y[],
// double f, double Vybarst, double Vh[]) ;
//void sampsize(int nh[], int n, int h, double w[], double sig[], double prodwhsigh);
//declare global variables
int n, N; // number of strata (h), total sample size (n), pop size
double s; // s=x0, the initial value of 6.1
const int SIZE = 10;
//declare global constants and initialise their values
const double g = 2; // g is the distance
const int stages = 10;
int ylimits[10]; //stores the 3dp values for refining
const int e = (int)(g*points*z+1);
const int p = (int)(g*points+1);
double minkf2[stages][e];
double dk2[stages][e];
int h = 0;
int main() //program execution starts
{
double c[SIZE];
//take inputs of L, d and s as local variables
cout<<"Enter Number of Strata, L " << endl;
cin >> h;
//cout<<"Enter Initial Value, Xo " << endl;
//cin >> s;
s=-1;
for (int i=0; i<h; i++)
{
cout<<"Enter the Cost, C " << endl;
cin >> c[i];
printf("c[%i] = %f \n\n", i+1, c[i]);
}
//initialize minkf locally
cout<<"\nInitializing points ...."<<endl;
for (int i=0; i < stages; i++)
{
for(int j=0; j<e; j++)
{
minkf2[i][j]= -9999; //assign -9999 to every cell
}
}
for (int k=0; k < stages; k++)
{
for(int l=0; l<e; l++)
{
dk2[k][l]= -9999; //assigning same as above
}
}
cout<<"Initialization complete"<<endl<<endl<<"Calculating...."<<endl<<endl;
double f=fun(h,p,inc ,0,p ,true, c);
double d[SIZE], y[SIZE], x[SIZE], w[SIZE], Vh[SIZE]; // d, y and x are arrays of h float numbers
int nh[SIZE], Nh[SIZE]; //stratum sample sizes
int temp;
double Vybarst;
//backward calculation for the 3dp results
for(int i=h; i>=1; i--)
{
//c[i] = i+1;
if(i==h)
{
d[i] = g;
y[i] = dk2[i][p];
//c[i] = c[i+1];
}
else if(i<h && i>1)
{
d[i] = d[i+1]-y[i+1];
temp = (int)(d[i]*points);
y[i] = dk2[i][temp];
//c[i] = c[i+1];
}
else if(i==1)
{
d[i]=d[i+1]-y[i+1];
y[i]=d[i];
//c[i] = c[i+1];
}
}
//setup the limits for the 6dp calculations
for(int i=h; i>=1; i--)
{
temp = (int)(y[i]*points*z);
ylimits[i] = temp;
}
f=fun(h, e-1, inc2, ylimits[h]-factor*z, ylimits[h]+factor*z, false, c);// for k>=2
cout <<"Strata: L = " << h << setw(30) << "Distance: d = " << g << endl;
printf("\nf(h,g): %.10f \n" ,f);
cout << setw(20) << "\n\n Distance" << setw(25) << "Width" << setw(26) <<
"Boundary" << endl;
cout << setfill('-') << setw(73) << "-";
for(int i=h; i>=1; i--)
{
//c[i] = i+1;
if(i==h)
{
d[i]=g;
y[i] = dk2[i][(e-1)];
x[i]=s+g;
}
else if(i<h && i>1)
{
//cout << d[i+1] << "\t\t" << y[i+1] << endl;
d[i]=d[i+1]-y[i+1];
temp = (int)(d[i]*points*z);
y[i]=dk2[i][temp];
x[i]=x[i+1]-y[i+1];
}
else if(i==1)
{
//cout << "d=" << d[i+1] << "\t\ty=" << y[i+1] << endl;
d[i]=d[i+1]-y[i+1];
y[i]=d[i];
x[i]=y[i]+s;
}
printf("\nd[%i] = %f \t y[%i] = %f \t x[%i] = %f" , i, d[i], i, y[i], i, x[i]);
}
//WeightSamPop(h, N, nh, Nh, w, d, y, f, Vybarst, Vh);
cout << endl << setfill('-') << setw(73) << "-" << endl;
getch();
system("PAUSE");
return 0;
} //end main
double RootVal(int k, double d, double y, double c)//calculate the root value of the current distribution
{
double rtval;
double calc;
calc = (y-atan(d-1)+atan(d-y-1))*(atan(d-1)-atan(d-y-1))-
0.25*pow((log((1+y*y+(2*y*(d-y-1))+
(d-y-1)*(d-y-1))/(1+(d-y-1)*(d-y-1)))),2);
if(calc<0)
{
// cout<<"\nError: Negative root...\n";
rtval = -1; //flag a negative radicand so that the caller can skip this point
}
else
{
rtval = 0.318309886*sqrt(calc*c); //0.318309886 = 1/pi
}
return rtval;
}
double Minimum(double val1,double val2) // returns minimum of 2 numbers
{
if(val1<=val2)
{
return val1;
}
else
{
return val2;
}
}
/*This function performs the same actions as "function". It only differs in terms of the
iterations of the for loop.*/
double fun(int k,int n,double incf,int minYk,int maxYk,bool isFirstRun, double cost[])
{
assert (k>=1); //Abort if k is less than 1
double dblRetVal;
double d = n*incf; //d value for the function
double y;
//int c;
double min;
double val;
double miny = 0;
int col;
if(k==1) //base case
{
y = d;
dblRetVal = RootVal(k, d, y, cost[0]);
}
else
{
for(int i=minYk; i<=maxYk; i++) //iterate over the interval allowed to calculate the 6dp results.
{
y = i*incf; //this sets the precision of y to 6dp
double root;
root = RootVal(k, d, y, cost[k-1]); //calculate the root.
if(root != -1) //if root is valid
{
col = n-i; //get the current d value
if(minkf2[k-1][col]==-9999)
{
if(isFirstRun) //check if the result has been previously calculated
{
val = root+ fun((k-1),col,incf,0,col,true,
cost); //if not, calculate the result
}
else
{ //if not, calculate the result
val = root+ fun((k-1),col,incf,ylimits[k-1]-
factor*z,ylimits[k-1]+ factor*z,false, cost);
}
}
else
{
val = root+ minkf2[k-1][col]; //if result exists, use it for calculations
}
}
if (i==minYk)
{
min =val;//base case
}
else
{
min = Minimum(min,val);//get the minimum of the result and the current minimum
}
if(min == val)
{
miny=y;
}//get the position of the current minimum
}//end for
dblRetVal = min;
}//end else
//store the f and the d value of the minimum calculated.
col = n;
minkf2[k][col] = dblRetVal;
dk2[k][col]=miny;
return dblRetVal;
}//end function
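For the standard Cauchy listing, the quantity under the square root in RootVal depends only on the two boundaries of the stratum. With x0 = -1 as set in main, the lower boundary is a = d - y - 1 and the upper boundary is b = d - 1, and the argument of the logarithm simplifies because 1 + y^2 + 2y(d-y-1) + (d-y-1)^2 = 1 + (d-1)^2. The sketch below rewrites the stage value in those terms. The name cauchyStageValue is hypothetical, and returning NaN for a negative radicand is an assumption made here in place of the -1 flag.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Stage value for a stratum of width y when the remaining distance is d and
// the per-unit cost is c, assuming the range starts at x0 = -1.
double cauchyStageValue(double d, double y, double c)
{
    assert(c >= 0.0);
    const double pi = std::acos(-1.0);
    const double a = d - y - 1.0;                        // lower stratum boundary
    const double b = d - 1.0;                            // upper stratum boundary
    const double dAtan = std::atan(b) - std::atan(a);
    const double logRatio = std::log((1.0 + b * b) / (1.0 + a * a));
    const double radicand = (y - dAtan) * dAtan - 0.25 * logRatio * logRatio;
    if (radicand < 0.0)
        return std::numeric_limits<double>::quiet_NaN(); // invalid point
    return std::sqrt(radicand * c) / pi;
}
```

A caller can test the result with std::isnan instead of comparing against -1, and a zero-width stratum (y = 0) contributes exactly zero.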
Appendix D:
The C++ Program Created to Determine the OSB
with cost factor for Power Distribution
/* This program gives the optimum strata width and optimum strata boundaries of
the main study variable following the power distribution */
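#include <iostream>
#include <math.h>
#include <assert.h>
#include <stdio.h> //printf
#include <stdlib.h> //system
#include <conio.h> //getch; non-standard header, available on Windows compilers
#include <iomanip>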
using namespace std;
typedef double Number;
/*********************************************************************/
//declare and initialise global constants
# define z 100 //refine to 5 decimal places
# define factor 4
# define inc 0.001 // amount of precision (10^-3)
# define inc2 0.00001 // amount of precision (10^-5)
# define prec 1/inc
# define points 1000 //Keep this to be 1/inc
// function declarations
double RootVal(int k, double d, double y, double c); /*calculates the value of objective
function and the minimal elements*/
double Minimum(double val1,double val2); // returns minimum of 2 numbers
/*Recursive function receives the parameter k, dk, yk to calculate f.*/
double fun(int k,int n,double incf,int minYk,int maxYk,bool isFirstRun, double []);
//void WeightSamPop(int h, int N, int nh[], int Nh[], double w[], double d[], double y[], double f, double Vybarst, double Vh[]) ;
//void sampsize(int nh[], int n, int h, double w[], double sig[], double prodwhsigh);
//declare global variables
int n, N; // number of strata (h), total sample size (n), pop size
double s; // s=x0, the initial value of 6.1
const int SIZE = 10;
//const double lambda = 1;
//declare global constants and initialise their values
const double g = 1; // g is the distance
const int stages = 10;
int ylimits[10]; //stores the 3dp values for refining
const int e = (int)(g*points*z+1);
const int p = (int)(g*points+1);
double minkf2[stages][e];
double dk2[stages][e];
int h = 0;
int main() //program execution starts
{
double c[SIZE];
//take inputs of L, d and s as local variables
cout<<"Enter Number of Strata, L " << endl;
cin >> h;
//cout<<"Enter Range of Data, d " << endl;
//cin >> g;
//cout<<"Enter Initial Value, Xo " << endl;
//cin >> s;
s=0; //the initial value x0 is fixed at 0, so no input is read
for (int i=0; i<h; i++)
{
cout<<"Enter the Cost, C " << endl;
cin >> c[i];
printf("c[%i] = %f \n\n", i+1, c[i]);
}
//initialize minkf locally
cout<<"\nInitializing points ...."<<endl;
for (int i=0; i < stages; i++)
{
for(int j=0; j<e; j++)
{
minkf2[i][j]= -9999; //assign -9999 to every cell
}
}
for (int k=0; k < stages; k++)
{
for(int l=0; l<e; l++)
{
dk2[k][l]= -9999; //assigning same as above
}
}
cout<<"Initialization complete"<<endl<<endl<<"Calculating...."<<endl<<endl;
double f=fun(h,p,inc ,0,p ,true, c);
double d[SIZE], y[SIZE], x[SIZE], w[SIZE], Vh[SIZE]; // d, y and x are arrays of h float numbers
int nh[SIZE], Nh[SIZE]; //stratum sample sizes
int temp;
double Vybarst;
//backward calculation for the 3dp results
for(int i=h; i>=1; i--)
{
//c[i] = i+1;
if(i==h)
{
d[i] = g;
y[i] = dk2[i][p];
//c[i] = c[i+1];
}
else if(i<h && i>1)
{
d[i] = d[i+1]-y[i+1];
temp = (int)(d[i]*points);
y[i] = dk2[i][temp];
//c[i] = c[i+1];
}
else if(i==1)
{
d[i]=d[i+1]-y[i+1];
y[i]=d[i];
//c[i] = c[i+1];
}
}
//setup the limits for the 6dp calculations
for(int i=h; i>=1; i--)
{
temp = (int)(y[i]*points*z);
ylimits[i] = temp;
}
f=fun(h, e-1, inc2, ylimits[h]-factor*z, ylimits[h]+factor*z, false, c);// for k>=2
cout <<"Strata: L = " << h << setw(30) << "Distance: d = " << g << endl;
// cout << "Sample Size: n = " << n << setw(30) << "Population Size: N = " << N << endl;
// printf("\n\nAccurate values derived after refining\n");
printf("\nf(h,g): %.10f \n" ,f);
cout << setw(20) << "\n\n Distance" << setw(25) << "Width" << setw(26) <<
"Boundary" << endl;
cout << setfill('-') << setw(73) << "-";
//Backward calculation for the 6 dp, compute d, y and x
for(int i=h; i>=1; i--)
{
//c[i] = i+1;
if(i==h)
{
d[i]=g;
y[i] = dk2[i][(e-1)];
x[i]=s+g;
}
else if(i<h && i>1)
{
//cout << d[i+1] << "\t\t" << y[i+1] << endl;
d[i]=d[i+1]-y[i+1];
temp = (int)(d[i]*points*z);
y[i]=dk2[i][temp];
x[i]=x[i+1]-y[i+1];
}
else if(i==1)
{
//cout << "d=" << d[i+1] << "\t\ty=" << y[i+1] << endl;
d[i]=d[i+1]-y[i+1];
y[i]=d[i];
x[i]=y[i]+s;
}
printf("\nd[%i] = %f \t y[%i] = %f \t x[%i] = %f" , i, d[i], i, y[i], i, x[i]);
}
cout << endl << setfill('-') << setw(73) << "-" << endl;
getch();
system("PAUSE");
return 0;
} //end main
double RootVal(int k, double d, double y, double c)//calculate the root value of the current distribution
{
double rtval;
double calc;
calc = 3*pow(y,4)+24*pow(y,3)*(d-y)+84*pow(y,2)*pow((d-y),2)
+120*y*pow((d-y),3)+60*pow((d-y),4);
if(calc<0)
{
cout<<"\nError: Negative root...\n";
rtval = -1; //flag a negative radicand so that the caller can skip this point
}
else
{
rtval = pow(y,2)*sqrt(calc *c)/(4*pow(5,0.5));
}
return rtval;
}
double Minimum(double val1,double val2) // returns minimum of 2 numbers
{
if(val1<=val2)
{
return val1;
}
else
{
return val2;
}
}
/*This function performs the same actions as "function". It only differs in terms of the
iterations of the for loop.*/
double fun(int k,int n,double incf,int minYk,int maxYk,bool isFirstRun, double cost[])
{
assert (k>=1); //Abort if k is less than 1
double dblRetVal;
double d = n*incf; //d value for the function
double y;
//int c;
double min;
double val;
double miny = 0;
int col;
if(k==1) //base case
{
y = d;
dblRetVal = RootVal(k, d, y, cost[0]);
}
else
{
for(int i=minYk; i<=maxYk; i++) //iterate over the interval allowed to calculate the 6dp results.
{
y = i*incf; //this sets the precision of y to 6dp
double root;
root = RootVal(k, d, y, cost[k-1]); //calculate the root.
if(root != -1) //if root is valid
{
col = n-i; //get the current d value
if(minkf2[k-1][col]==-9999)
{
if(isFirstRun) //check if the result has been previously calculated
{
val = root+ fun((k-1),col,incf,0,col,true,
cost); //if not, calculate the result
}
else
{ //if not, calculate the result
val = root+ fun((k-1),col,incf,ylimits[k-1]-
factor*z,ylimits[k-1]+ factor*z,false, cost);
}
}
else
{
val = root+ minkf2[k-1][col]; //if result exists, use it for calculations
}
}
if (i==minYk)
{
min =val;//base case
}
else
{
min = Minimum(min,val);//get the minimum of the result and the current minimum
}
if(min == val)
{
miny=y;
}//get the position of the current minimum
}//end for
dblRetVal = min;
}//end else
//store the f and the d value of the minimum calculated.
col = n;
minkf2[k][col] = dblRetVal;
dk2[k][col]=miny;
return dblRetVal;
}//end function
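The two backward loops in main recover the widths and boundaries from dk2 by starting at the last stratum with d_h = g and stepping back with d_{k-1} = d_k - y_k, so that each upper boundary is x_k = x0 + d_k. A compact sketch of that back-tracking follows. It assumes, hypothetically, that the optimal widths are stored as integer grid steps in a table bestWidth[k][n], which avoids the truncating cast (int)(d[i]*points); Stratum and recoverBoundaries are illustrative names and not part of the program.

```cpp
#include <cassert>
#include <vector>

// One row of the printed result table: remaining distance, width, upper boundary.
struct Stratum {
    double distance;
    double width;
    double upper;
};

// bestWidth[k][n] is assumed to hold the optimal width, in grid steps, of
// stratum k when n grid steps remain. Entry 0 of the result is unused so
// that strata are numbered 1..strata, as in the listing.
std::vector<Stratum> recoverBoundaries(
    const std::vector<std::vector<int>>& bestWidth,
    int strata, int totalSteps, double step, double x0)
{
    assert(strata >= 1 && totalSteps >= 0);
    std::vector<Stratum> out(strata + 1);
    int remaining = totalSteps;                     // d_h = g, in grid steps
    for (int k = strata; k >= 1; --k) {
        // The first stratum takes whatever distance is left.
        int w = (k == 1) ? remaining : bestWidth[k][remaining];
        out[k] = { remaining * step, w * step, x0 + remaining * step };
        remaining -= w;                             // d_{k-1} = d_k - y_k
    }
    return out;
}
```

For two strata on four grid steps of size 0.5 starting at x0 = -1, with bestWidth[2][4] = 2, the sketch gives upper boundaries 0 and 1 and a width of 1 for each stratum.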