Research Design Notes Weeks 7-12


    Research Design: Topic 7 Module 2: Topic 1

    Course notes STA60004 Semester 1, 2015

     

    Topic 1: Basic Concepts of Experimental Design

    Dr Amirul Islam

    Acknowledged to: Dr Jahar Bhowmik


    Contents

    1.1 Topic introduction

    1.2 Topic learning objectives

    1.3 Important Terms and Definitions of Experimental Design

    1.4 Principles of an Experimental Design

    1.5 Design of Experiments in Marketing

    1.6 Sample Surveys versus Experimental Design

    1.7 The Parallels between Experimental Designs & Sample Surveys

    1.8 Study Design in Medical Research

    1.9 Guidelines for Designing Experiments

    1.10 Research questions and hypotheses

    Revision Exercises

    Solution to Revision Exercises

    References


    Note: Some of the materials are adapted from standard texts and guides (see references).  

    1.1 Topic introduction

    “The formulation of a problem is often more essential than its solution which may be

    merely a matter of mathematical or experimental skill”. --------------Albert Einstein 

    Design of Experiment is a structured, organized method that is used to determine the

    relationship between the different factors affecting a process and the output of that

    process. This method was first developed in the 1920s and 1930s by Sir Ronald A. Fisher,

    the renowned mathematician and geneticist.

    This chapter examines the basic concepts of experimental design. Experimentation and

    making inferences are twin features of general scientific methodology. The subject-matter

    of experimental design includes:

    (i)  Planning the experiment,

    (ii)  Obtaining relevant information from it regarding the statistical hypothesis under study, and

    (iii)  Making a statistical analysis of the data.

    Experimental design is a term which includes efficient methods for planning the
    collection of data, in order to obtain the maximum amount of information for the least
    amount of work. Data are everywhere. Anyone who collects and analyses data, be it in the
    lab, the field, or the production plant, can benefit from knowledge about experimental
    design. Directed experimentation generates critical events. An experiment is “an
    invitation for an informative event to occur” (Box et al., 2005).

    Experience has shown that proper consideration of the statistical analysis before the
    experiment is conducted forces the experimenter to plan the design of the experiment
    more carefully. The observations obtained from a carefully planned and well-designed
    experiment support valid inferences.

    Experiments are usually more structured than sample surveys and include the additional
    step of treating the elements. In a sample survey we select elements from a frame and then
    take measurements (such as responses to a questionnaire), but in an experimental design
    we select experimental units, allocate treatments and then take measurements (on either a
    few or all of the elements).

    1.2 Topic learning objectives

    Learning objectives

    When you have worked through this topic you should:

    •  Understand the idea of experimental design.

    •  Know the basic definition of experimental design.

    •  Understand the basic concepts that underlie scientific investigations.

    1.3 Important Terms and Definitions of Experimental Design

    Observation (Correlational) and experimental studies


    A study in which the researcher observes and records what has already happened is called

    an observational approach. On the other hand, an experimental study or trial is initiated

    by a researcher. In an "ideal" experiment, the researcher manipulates the independent

    variable(s), holds all other variables constant, and observes the changes in the dependent

    variable. In experimental studies or trials we determine which experimental units receive

    which treatment, whereas in observational studies we have to take what is observed.

    Observational studies often show an association between two variables, but they cannot in

    themselves prove cause and effect.

    For example, consider the hypothesis: 

    "Driving ability varies with blood alcohol level".

    The researcher would manipulate the amount of alcohol given to the drivers and then

    observe changes in their driving skills. If all other variables are held constant, then any

    changes in driving skill  must be caused by the effects of the alcohol.

    Consider an alternative means of collecting data. The researcher stands outside the pub

    on Friday night and asks for volunteers leaving the pub. Each volunteer undergoes a

    driving test and also has his/her blood alcohol level measured. The researcher then

    compares the driving skills of volunteers with zero blood alcohol level to the driving

    skills of those drivers whose alcohol level is over .05. This is an observational design.
    The researcher is merely observing the blood alcohol level of each subject, rather than

    controlling or manipulating it.

    Experiment

    An experiment is the device or the means of getting the answer to the problem under

    investigation, e.g. comparison of different manures or fertilizers, different varieties of a

    crop, different cultivation processes, or different diets or medicines in a dietary or medical

    experiment.

    An experiment is a planned inquiry to discover new facts, or to confirm or deny the

    results of previous investigations (Petersen, 1985).

    Nuisance variables

    Nuisance variables are associated with variation in an outcome (dependent variable) that

    is extraneous to the effects of independent variables that are of primary interest to the

    researcher. In our description of an "ideal" experiment, we stated that "all other variables"

    should be held constant. If, for example, we are interested in the effects of alcohol on

    driving ability, then any other variable which may influence driving ability is known as a

    nuisance variable. Such things as the type of car, the driving course, temperature,
    humidity, time of day, and the driver's age, reflexes and level of experience would all
    have an influence on a driving test score. These are all referred to as nuisance variables.

    Confounding variables

    Variables that are not controlled for and that systematically change experimental results
    are called confounding variables. A confounding variable has two properties. First, a
    confounding variable is related to the explanatory (independent) variable in the sense that
    individuals who differ on the explanatory variable are also likely to differ on the
    confounding variable. Second, a confounding variable affects the response (dependent)
    variable.

    Suppose we are interested in the effects of alcohol on driving ability. If all of the zero

    alcohol level driving tests were performed in the morning, and all of the .05 alcohol level

    driving tests were completed in the evening, we could not tell if the resulting differences

    in driving abilities were due to differences in the alcohol level, or if they were due to

    differences in the time of day of the test. In this case, "time of day" is known as a

    confounding factor, because it literally confounds our interpretation of the experiment.
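    The time-of-day example can be made concrete with a small noise-free sketch. The effect
    sizes used here (10 points lost at the .05 alcohol level, 5 points lost in the evening), the
    additive model and the function names are all hypothetical, chosen only for illustration.
    The sketch compares a confounded schedule with one in which each alcohol level is tested
    equally often at each time of day.

```python
# Hypothetical, noise-free model of a driving test score (illustration only).
BASE_SCORE = 80.0
ALCOHOL_EFFECT = 10.0   # assumed points lost at the .05 alcohol level
EVENING_EFFECT = 5.0    # assumed points lost when tested in the evening


def score(alcohol: bool, evening: bool) -> float:
    """Driving score under the assumed additive model."""
    return BASE_SCORE - ALCOHOL_EFFECT * alcohol - EVENING_EFFECT * evening


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def observed_alcohol_difference(design):
    """Mean score of zero-alcohol tests minus mean score of .05 tests.

    `design` is a list of (alcohol, evening) pairs, one per driving test.
    """
    zero_level = mean(score(a, e) for a, e in design if not a)
    point05_level = mean(score(a, e) for a, e in design if a)
    return zero_level - point05_level


# Confounded design: zero-alcohol tests in the morning, .05 tests in the evening.
confounded = [(False, False)] * 4 + [(True, True)] * 4

# Balanced design: each alcohol level is tested equally often at each time of day.
balanced = [(False, False), (False, True), (True, False), (True, True)] * 2

confounded_diff = observed_alcohol_difference(confounded)
balanced_diff = observed_alcohol_difference(balanced)
print(confounded_diff, balanced_diff)
```

    Under these assumptions the confounded schedule reports the alcohol and time-of-day
    effects added together, while the balanced schedule recovers the assumed alcohol effect
    alone.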

    Treatments

    The various objects of comparison in a comparative experiment are known as treatments;
    e.g., in field experimentation, different fertilizers, different varieties of a crop, or different
    methods of cultivation are treatments.

    A treatment is one or a combination of categories of the explanatory variable(s) assigned

    by the experimenter. The plural term treatments incorporates a collection of conditions,

    each of which is one treatment.

    Factor and Level

    A factor of an experiment is a controlled independent variable; the levels of the variable
    are set by the experimenter.

    A factor is a general type or category of treatment. Different treatments constitute
    different levels of a factor. For example, three different groups of runners are subjected to
    three different training methods. The runners are the experimental units and the training
    methods are the treatments; the three training methods constitute three levels of the
    factor 'type of training'. The states of a factor, i.e., the treatments within the class, are
    called the levels of the factor.

    Experimental Units

    The individuals in an experiment are referred to as experimental units. The smallest

    division of the experimental material, to which we apply the treatments and on which we

    make observations on the variable under study, is termed an experimental unit.

    Experimental units can be people, animals, batteries, etc. In a field experiment the plot of
    land is the experimental unit. Other examples are a patient in a hospital, pigs in a
    pen, or a batch of seeds. With animal trials, an experimental unit can be a paddock of
    animals, a single animal, or even part of an animal.


    Blocks

    In agricultural experiments, most of the time we divide the whole experimental area
    (field) into relatively homogenous sub-groups or strata (shown in the following diagram).
    These strata, which are more uniform within themselves than the field as a whole, are
    known as blocks. That is, a block is a group of homogenous experimental units that are
    similar in a way that is expected to affect the response to the treatments.

    The term blocking was first used by R. A. Fisher in agronomic experiments (1920). In the

    statistical theory of the design of experiments, blocking is the arranging of experimental

    units in groups (blocks) that are similar to one another. Typically, a blocking factor is a

    source of variability that is not of primary interest to the experimenter. Blocking is

    sometimes used for nuisance factors that can be controlled. Nuisance factors are those

    that may affect the measured result, but are not of primary interest. For example, in

    applying a treatment, nuisance factors might be the specific operator who prepared the

    treatment, the time of day the experiment was run, or the room temperature. All

    experiments have nuisance factors. The experimenter will typically need to spend some

    time deciding which nuisance factors are important enough to keep track of or control if

    possible, during the experiment.

    Figure 1: Non-homogenous experimental units

    Figure 2: Blocking into homogenous groups

    Replication

    Replication means the repetition of a test or an experiment more than once. In other

    words, the repetition of treatments under investigation is known as replication.

    Precision


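    The reciprocal of the variance of the mean is termed the precision, or the amount of
    information, of a design. Thus for an experiment replicated r times, the precision is given
    by

    \[ \frac{1}{\operatorname{var}(\bar{x})} = \frac{r}{\sigma^2}, \]

    where σ² is the variance of a single observation (the error variance per experimental unit).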
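    As a rough numerical check of the precision r/σ², the sketch below tabulates the
    theoretical value and compares it with a simulated one. The per-plot variance of 4.0, the
    normal errors, the mean of 50 and the fixed seed are assumptions made only for this
    illustration.

```python
import random
from statistics import mean, variance


def precision(sigma2: float, r: int) -> float:
    """Precision of a treatment mean based on r replicates: r / sigma^2."""
    return r / sigma2


# Theoretical precision for an assumed per-plot variance of 4.0.
for reps in (1, 2, 4, 8):
    print(reps, precision(4.0, reps))

# Simulation check: the variance of many simulated means of r observations
# should be close to sigma^2 / r, so its reciprocal approximates the precision.
rng = random.Random(0)
r, sigma = 4, 2.0
simulated_means = [
    mean([rng.gauss(50.0, sigma) for _ in range(r)]) for _ in range(5000)
]
empirical_precision = 1.0 / variance(simulated_means)
print(empirical_precision)  # expected to be near r / sigma**2
```

    Doubling the number of replicates doubles the precision, which is why replication is
    described as a way of gaining information about a treatment mean.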

    Experimental Error

    Let us suppose that a large homogenous field is divided into different plots (of equal

    shape and size) and different treatments are applied to those plots. If the yields from some

    of the treatments are more than those of the others, the experimenter is faced with the

    problem of deciding if the observed differences are really due to treatment effects or they

    are due to chance (uncontrolled) factors. In field experiments, it is a common experience

    that the fertility gradient of the soil does not follow any systematic pattern but behaves in

    an erratic fashion. Experience tells us that even if the same treatment is used on all the

    plots, the yields would still vary due to the differences in soil fertility. Such variation

    from plot to plot, which is due to random (or chance or non-assignable) factors beyond

    human control, is spoken of as experimental error. It may be pointed out that the term
    ‘error’ used here is not synonymous with ‘mistake’ but is a technical term which includes
    all types of extraneous variations due to:

    (i)  the inherent variability in the experimental material to which treatments are applied,

    (ii)  the lack of uniformity in the methodology of conducting the experiments, or in other
    words failure to standardise the experimental technique, and

    (iii)  lack of representativeness of the sample to the population under study.

    Blind Experiment

    Blinding is a technique used in some scientific studies to prevent research outcomes
    from being influenced by either the placebo effect or observer bias. In a blind

    experiment, the subjects do not know whether they are in the treatment group or the

    control group. The idea is that the groups studied, including the control, should be

    unaware of the group they are placed in. In medicine, when researchers are testing a new

    medicine, they ensure that the placebo looks, and tastes, the same as the actual medicine.

    There is strong evidence of a placebo effect with medicine, where, if people believe that

    they are receiving a medicine, they show some signs of improvement in health. A blind
    experiment reduces the risk of bias from this effect, giving an honest baseline for the
    research, and allowing a realistic statistical comparison.

    Ideally, the subjects should not be told that a placebo was being used at all, but this is

    regarded as unethical.

    Natural sources of error in field experiments

    Plant variability

    –  type of plant, larger variation among larger plants
    –  competition, variation among closely spaced plants is smaller
    –  plot to plot variation because of plot location (border effects)

    Seasonal variability

    –  climatic differences from year to year
    –  rodent, insect, and disease damage varies
    –  conduct tests for several years before drawing firm conclusions

    Soil variability

    –  differences in texture, depth, moisture-holding capacity, drainage, available nutrients
    –  since these differences persist from year to year, the pattern of variability can be
       mapped with a uniformity trial

    1.4 The Three Basic Principles of Experimental Design

    Professor Ronald A. Fisher pioneered the study of experimental designs with his classical
    book, The Design of Experiments. According to him, the basic principles of the design of
    experiments are:

    (i)  Randomisation,
    (ii)  Replication, and
    (iii)  Local Control or Error Control or Blocking.

    The roles they play in data collection and interpretation are discussed below.

    Randomisation

    By randomisation we mean that both the allocation of the experimental material and the
    order in which the individual runs or trials of the experiment are to be performed are
    randomly determined. After the treatments and the experimental units are decided, the
    treatments are allotted to the experimental units at random to avoid any type of personal
    or subjective bias, which may be conscious or unconscious. This raises for the
    experimenter the question of how to allocate treatments to experimental units so that each
    treatment gets an equal chance of showing its worth. In the absence of prior knowledge of
    the variability of the experimental material, this objective is achieved through
    ‘randomisation’, a process of assigning the treatments to various experimental units in a
    purely chance manner. The following are the main objectives of randomisation:

    (i)  To eliminate bias,
    (ii)  To ensure independence among the observations.
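    A minimal sketch of randomisation as described above, assuming equal replication of each
    treatment. The function name `randomise`, the plot labels and the treatment labels A, B
    and C are hypothetical; the seed is fixed only so that the layout can be reproduced.

```python
import random


def randomise(units, treatments, seed=None):
    """Allot treatments to units at random, with equal replication.

    Returns (allocation, run_order): a dict mapping each unit to a treatment,
    and the units listed in a randomly determined run order.
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    reps = len(units) // len(treatments)
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)        # chance decides which unit gets which treatment
    allocation = dict(zip(units, labels))
    run_order = list(units)
    rng.shuffle(run_order)     # chance also decides the order of the runs
    return allocation, run_order


plots = [f"plot{i}" for i in range(1, 13)]
allocation, run_order = randomise(plots, ["A", "B", "C"], seed=1)
for plot in run_order:
    print(plot, allocation[plot])
```

    Because the experimenter never chooses which plot receives which treatment, personal
    preference, conscious or not, cannot favour one treatment over another.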

    Criteria for randomisation in clinical trial studies

    1. Unpredictability

    –  Each participant has the same chance of receiving any of the interventions.
    –  Allocation is carried out using a chance mechanism so that neither the
       participant nor the investigator will know in advance which will be
       assigned.

    2. Balance

    –  Treatment groups are of a similar size & constitution; groups are alike in all
       important aspects and only differ in the intervention each group receives.

    3. Simplicity

    –  Easy for investigator/staff to implement.

    Replication

    As pointed out earlier, replication means the execution of an experiment more than once.

    In other words, the repetition of treatments under investigation is known as replication.

    An experimenter resorts to replication in order to average out the influence of the chance

    factors on different experimental units. Thus, the repetition of treatments results in a more

    reliable estimate than is possible with a single observation. Replication is necessary to
    increase the precision of estimates of the treatment effects. Although a larger number of
    replications gives a better estimate, the number cannot be increased indefinitely because
    each additional replication increases the cost of experimentation.

    Replication serves a number of purposes in an experimental design:

    (i)  It allows the experimenter to obtain an estimate of the experimental error.

    (ii)  It permits the experimenter to increase precision by reducing standard errors.

    (iii)  It can expand the base for making inference.
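    Purposes (i) and (ii) can be illustrated with a short calculation. The four yields below
    are hypothetical values for one treatment, and the variable names are chosen for this
    sketch only.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical yields (kg per plot) from r = 4 replicates of one treatment.
yields = [20.0, 22.0, 19.0, 23.0]

r = len(yields)
treatment_mean = mean(yields)
error_variance = variance(yields)          # sample variance; needs r >= 2
standard_error = sqrt(error_variance / r)  # standard error of the treatment mean

print(treatment_mean, error_variance, standard_error)
```

    With a single observation the sample variance cannot be computed at all, which is the
    sense in which replication is needed to estimate experimental error. The standard error
    falls in proportion to the square root of r.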

    Local Control or Blocking

    Blocking means to arrange the experimental materials into groups, or blocks, of more or

    less uniform experimental units. If the experimental material, say a field in an agricultural
    experiment, is heterogenous and different treatments are allocated to various units
    (plots) at random over the entire field, the soil heterogeneity will also enter the
    uncontrolled factors and thus increase the experimental error. It is desirable to reduce the

    experimental error as far as practicable without unduly increasing the number of

    replications or without interfering with the statistical requirement of randomness, so that

    even smaller differences between treatments can be detected as significant.

    In addition to the principles of replication and randomisation discussed earlier, the

    experimental error can further be reduced by making use of the fact that neighbouring

    areas in a field are relatively more homogenous than those widely spread. In order to

    separate the soil fertility effects from the experimental error, the whole experimental area

    (field) is divided into homogenous groups (blocks) row-wise or column-wise or both,

    according to the fertility gradient of the soil such that the variation within each block is

    minimum and between the blocks is maximum. The treatments are then allocated at
    random within each block. The process of reducing the experimental error by dividing the
    relatively heterogenous experimental area (field) into homogenous blocks is known as
    local control.

    Example 1.1

    Consider the very simple agricultural problem of comparing two varieties of tomatoes.
    The purpose of the comparison is to find the variety which produces the greater quantity
    of marketable quality fruit from a given area for large scale commercial planting. What
    should we do? A simple approach would be to plant a block of land of each variety and
    measure the total weight of marketable fruit produced. However, there are some obvious
    difficulties. The variety that cropped most heavily may have done so simply because it
    was growing in better soil. There are a number of factors which affect growth: soil
    fertility, soil acidity, irrigation and drainage, wind exposure, exposure to sunlight
    (e.g. shading, north-facing or south-facing hillside). Unfortunately no one knows exactly
    to what extent changes in these factors affect growth. So unless the two blocks of land are
    comparable with respect to all of these features, we won't be able to conclude that the
    more heavily producing variety is better as it may just be planted in a block that is better
    suited to growth.

    If it were possible (and it never will be) to find two tracts of land that were identical in
    these respects, using just those two blocks would result in a fair comparison, but the
    differences found might be so special to that particular combination of growing
    conditions that the results obtained were not a good guide to full scale agricultural
    production anyway.

    Why randomise? Let us think about it another way. Suppose we took a large block of
    land and subdivided it into smaller plots by laying down a rectangular grid. By using
    some sort of systematic design to decide what variety to plant in each plot, we may come
    unstuck if there is a feature of the land like an unknown fertility gradient. We may still
    end up giving one variety better plots on average. Instead, let's do it randomly by
    numbering the plots and randomly choosing half of them to receive the first variety. The
    rest receive the second variety. We might expect the random assignment to ensure that
    both varieties were planted in roughly the same numbers of high fertility and low fertility
    plots, high pH and low pH plots, well drained and poorly drained plots etc.

    In that sense we might expect the comparison of yields to be fair. Moreover, although we
    have thought of some factors affecting growth, there will be many more that we, and even
    the specialist, will not have thought of. And we can expect the random assignment of
    treatments to ensure some rough balancing of those as well!

    Why replicate? Random sampling gives representative samples, on average. However, in
    small samples, it may occur, just by chance, that your sample may be a 'bit weird'.
    Unfortunately, we can only expect the random allocation of treatments to lead to balanced
    samples (e.g. a fair division of the more and less fertile plots) if we have a large number
    of experimental units to randomise. In many experiments this is not true (e.g. using plots
    to compare varieties) so that in any particular experiment there may well be a lack of
    balance on some important factor. Random assignment still leads to fair or unbiased
    comparisons, but only in the sense of being fair or unbiased when averaged over a whole
    sequence of experiments. This is one of the reasons why there is such an emphasis in
    science on results being repeatable.
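    The dependence of balance on the number of plots can be checked with an exact
    calculation. The sketch below assumes that exactly half of the plots are high fertility and
    that half of the plots are chosen at random for the first variety. The definition of a
    lopsided split (the first variety receiving at least 75% or at most 25% of the fertile
    plots) is an arbitrary threshold chosen for illustration, as is the function name.

```python
from math import comb


def prob_lopsided(n_plots: int) -> float:
    """Chance that random allocation splits the fertile plots badly.

    Assumes half of the plots are high fertility and half of the plots are
    chosen at random for variety 1. "Badly" is defined here (arbitrarily) as
    variety 1 receiving at least 75% or at most 25% of the fertile plots.
    """
    fertile = n_plots // 2
    chosen = n_plots // 2
    total = comb(n_plots, chosen)
    prob = 0.0
    for k in range(fertile + 1):   # k = fertile plots given to variety 1
        if 4 * k >= 3 * fertile or 4 * k <= fertile:
            # Hypergeometric probability of exactly k fertile plots for variety 1.
            prob += comb(fertile, k) * comb(n_plots - fertile, chosen - k) / total
    return prob


for n in (8, 16, 40, 80):
    print(n, round(prob_lopsided(n), 4))
```

    With 8 plots a lopsided split occurs in nearly half of all random allocations, whereas with
    80 plots it is very rare, which is the sense in which randomisation balances extraneous
    factors only when there are many experimental units.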

    Why block? Partly because random assignment of treatments does not necessarily ensure
    a fair comparison when the number of experimental units is small. In this case more
    complicated experimental designs are available to ensure fairness with respect to those
    factors which we believe to be very important. Suppose with our tomato example that,
    because of the small variation in the fertility of the land we were using, the only thing that
    we thought mattered greatly was drainage. We could then try and divide the land into two
    blocks, one well drained and one badly drained. These would then be subdivided into
    smaller plots, say 6 plots per block. Then in each block, 3 plots are assigned at random to
    the first variety and the remaining 3 plots to the second variety. We would then only
    compare the two varieties within each block so that well drained plots are only compared
    with well drained plots, and similarly for badly drained plots. This idea is called blocking.
    By allocating varieties to plots within a block at random we would provide some
    protection against other extraneous factors.
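    The blocked layout in the tomato example (two drainage blocks of 6 plots, 3 plots per
    variety in each block) can be sketched as follows. The function name, the plot labels and
    the seed are hypothetical choices made for this illustration.

```python
import random


def randomise_within_blocks(blocks, varieties, seed=None):
    """Assign varieties at random to plots separately within each block.

    `blocks` maps a block name to its list of plots; each block must hold a
    multiple of len(varieties) plots so every variety appears equally often.
    """
    rng = random.Random(seed)
    layout = {}
    for block, plots in blocks.items():
        reps = len(plots) // len(varieties)
        labels = [v for v in varieties for _ in range(reps)]
        rng.shuffle(labels)    # randomisation is restricted to this block
        layout[block] = dict(zip(plots, labels))
    return layout


# Two drainage blocks of 6 plots each, as in the tomato example.
blocks = {
    "well drained": [f"W{i}" for i in range(1, 7)],
    "badly drained": [f"B{i}" for i in range(1, 7)],
}
layout = randomise_within_blocks(blocks, ["variety 1", "variety 2"], seed=2)
for block, plan in layout.items():
    print(block, plan)
```

    Every block contains each variety three times, so drainage cannot favour one variety,
    while the shuffle inside each block still guards against factors nobody thought of.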

    1.5 Design of Experiments in Marketing

    Design of experiments, which in a marketing context is often applied in the form of
    conjoint analysis, is regarded as a powerful statistical method for establishing the linkage
    between a customer's decision-making process and the service or product being offered.
    After effective application of design of experiments, companies find it easier to gain an
    insight into the significant variables affecting a customer's decision-making.

    Marketing Problems

    Ultimately, the primary aim of marketing is to predict the future market share, net
    sales, or profitability of an offering, thus allowing a company to:

    • Foretell customer buying tendency

    • Boost customer retention

    • Ascertain trade-off strategies during contract negotiation

    • Ascertain competitive pricing

    • Predict sales

    • Control brand equity

    • Devise product elements

    • Establish price sensitivity

    • Forecast and reduce customer switch rates

    • Ascertain best market position for new product introductions.

    1.6 Sample Surveys versus Experimental Design

    Experiments are usually more structured than sample surveys and include the additional
    step of treating the elements in some way.

    Sample Survey                    Experimental Design
    -------------------------------  -------------------------------
    Select elements from frame.      Select experimental units.
    Take measurements.               Allocate treatments.
                                     Take measurements.

    1.7 The Parallels between Experimental Designs & Sample Surveys

    Sample Survey                                 Experimental Design
    --------------------------------------------  --------------------------------------------
    Random selection is the method used to        Randomisation is used to assign treatments
    choose units from the population for the      to experimental units.
    sample.

    The sampling error can be minimized by        Blocking (local control) is commonly used
    stratification.                               to reduce error.

    Partial grouping is useful in cluster         Partial grouping is useful in split-plot
    sampling.                                     designs.

    For analysis, regression techniques are       For analysis, ANOVA (analysis of variance)
    useful.                                       and ANCOVA (analysis of covariance) are
                                                  useful.

    1.8 Study Design in Medical Research

    (Taken from Dawson, B. & Trapp, R.G. (2004): Basic & Clinical Biostatistics, p.7)

    Study designs in medicine fall into two categories: studies in which subjects are observed
    (observational), and studies in which the effect of an intervention is observed
    (experimental).

    Classification of Study Designs

    With a little practice, the classification of study designs outlined below would help us to

    read medical articles and classify studies with little difficulty.

    1.  Observational studies
        a.  Descriptive or case-series
        b.  Case-control studies (retrospective studies)
            i.  Causes and incidence of disease
            ii.  Identification of risk factors
        c.  Cross-sectional studies, surveys (prevalence)
            i.  Disease description
            ii.  Diagnosis and staging
            iii.  Disease processes, mechanisms
        d.  Cohort studies (prospective studies)
            i.  Causes and incidence of disease
            ii.  Natural history, prognosis
            iii.  Identification of risk factors
        e.  Historical cohort studies

    2.  Experimental studies
        a.  Controlled trials
            i.  Parallel or concurrent controls
                1.  Randomised
                2.  Not randomised
            ii.  Sequential controls
                1.  Self-controlled
                2.  Crossover
            iii.  External controls (including historical)
        b.  Studies with no controls

    3.  Meta-analysis.

    1.9 Guidelines for Designing Experiments

    (Taken from Montgomery, D.C. (2005): Design and Analysis of Experiments).

    To use the statistical approach in designing and analysing an experiment, it is necessary

    for everyone involved in the experiment to have a clear idea in advance of exactly what is

    to be studied, how the data are to be collected, and at least a qualitative understanding of

    how these data are to be analysed. An outline of the procedure recommended by
    Montgomery (2005) is as below:

    Step 1: Recognition of and statement of the problem
    Step 2: Selection of the response variable*
    Step 3: Choice of factors, levels and ranges*
    (Steps 1 to 3 constitute pre-experimental planning.)
    Step 4: Choice of experimental design
    Step 5: Performing the experiment
    Step 6: Statistical analysis of the data
    Step 7: Conclusions and recommendations.

    *In practice, steps 2 & 3 are often done simultaneously or in reverse order.

    Step 1:  The first step for designing an experiment is to develop all ideas about the

    objectives of the experiment. It is usually helpful to prepare a list of specific problems or

    questions that are to be addressed by the experiment. A clear statement of the problem

    often contributes substantially to better understanding of the phenomenon being studied

    and the final solution of the problem. It is also important to keep an overall objective in
    mind; for example, is this a new process or system (in which case the initial objective is
    likely to be characterization or factor screening), or is it a mature or reasonably
    well-understood system that has been previously characterized (in which case the
    objective may be optimization)?

    Step 2:  In selecting the response variable, the experimenter should be certain that this

    variable really provides useful information about the process under study. Most often, the

    average or standard deviation (or both) of the measured characteristic will be the response

    variable. Multiple responses are not unusual. It is usually critically important to identify

    issues related to defining the responses of interest and how they are to be measured before

    conducting the experiment.

    Step 3: When considering the factors that may influence the performance of a process or

    system, the experimenter usually discovers that these factors can be classified as either

    potential design factors or nuisance factors. The potential design factors are those factors

    that the experimenter may wish to vary in the experiment. Nuisance factors are often

    classified as controllable, uncontrollable, or noise factors. Once the experimenter has

    selected the design factors, he or she must choose the ranges over which these factors will

    be varied and the specific levels at which runs will be made. We reiterate how crucial it is

    to bring out all points of view and process information in steps 1 through 3. We refer to

    this as pre-experimental planning.

    Step 4: If the above pre-experimental planning activities are done correctly, this step is

    relatively easy. Choice of design involves consideration of sample size (number of

    replicates), selection of a suitable run order for the experimental trials, and determination

    of whether or not blocking or other randomisation restrictions are involved. In Topic 2 we
    discuss some of the important types of experimental designs for a wide variety of
    problems. In selecting a design, it is important to keep the experimental objectives in mind.

    Step 5: When running the experiment, it is vital to monitor the process carefully to ensure

    that everything is being done according to plan. Errors in experimental procedure at this

    stage will usually destroy experimental validity. Up-front planning is crucial to success. It

    is easy to underestimate the logistical and planning aspects of running a designed

    experiment in a complex manufacturing or research and development environment. If necessary, the decisions made in steps 1 to 4 should be revisited at this stage.

    Step 6:  Statistical methods should be used to analyse the data so that results and

    conclusions are objective rather than judgemental in nature. If the experiment has been

    designed correctly and performed according to the design, the statistical methods required

    are not elaborate. Remember that statistical methods cannot prove that a factor (or

    factors) has a particular effect. They only provide guidelines as to the reliability and

    validity of results.

  • 8/18/2019 Research Design Notes Weeks 7-12

    16/170

    Research Design: Topic 7 Module 2: Topic 1

    Course notes STA60004 Semester 1, 2015 16

    Step 7:  Once the data have been analysed, the experimenter must draw practical

    conclusions about the results and recommend a course of action. Graphical methods are

    often used in this stage, particularly in presenting the results to others. Follow-up runs and

    confirmation testing should also be performed to validate the conclusions from the

    experiment.

    1.10 Research questions and hypotheses

    The development of a research question from a research idea is largely a matter of

    organising one’s thoughts into a concise statement of what one intends to do and why.

    Research questions and hypotheses are closely related but are not quite the same. A

    hypothesis is a statement, at a higher level, in which an attempt is made to generalise

    about the nature of the universe in which we live.

    Research begins with a question. Such questions may arise from talking with friends,

    reading the scientific literature, or through an untold number of ways. When reading the

    current literature as a means to inform your research, you will need to ask three questions:

    1. ‘Is my idea based solidly in theory?’; 2. ‘Is this idea the next most obvious step for the

    discipline to take?’; and 3. ‘Is my idea novel in some way?’ Having satisfied yourself

    that your idea is worth pursuing it is necessary to turn it into a specific research question.

    In doing so, you will have to tease out various parts of your idea, making each a more

    focused question. Through this process there is the genesis of experimental/research

    hypotheses.

    There is an art to devising good experimental/research hypotheses. As a general rule there

    should be one hypothesis per experiment. Put another way, each experiment should have

    only one question to answer. As to how we state an experimental/research hypothesis, it is

    more or less convention to treat it as a proposition of only one sentence. Begin with the

    word ‘That….’. Within the hypothesis include the general sort of manipulation you will

    be performing, known as the independent variable, and what it is you will be measuring,

    now referred to as the dependent variable. However, an excellent hypothesis goes one
    step further by suggesting how specific treatments, known as the levels of the independent
    variable, will affect the dependent variable.

    Research design and analysis is a method of thought. It begins with a good idea that is

    then refined into an experimental/research hypothesis but does not conclude until the

    experiment is completed and the results published. At its heart is an experimental design

    that limits error and thus promotes a simple and honest analysis of the data.

    (Taken from Edwards, T. 2008).

    Example: A suitable research question might be:

    “Does drug treatment of hypertension reduce the morbidity associated with cardiovascular disease?”

    A suitable hypothesis for the above research question might be:

    “Participants with hypertension who are treated with a specific drug will experience less

    morbidity associated with cardiovascular disease than participants who were not.”


     

    Revision Exercises

    1.  Explain the differences between sample survey and experimental design.

    2.  A pharmaceutical company wishes to test a new medication it thinks will reduce cholesterol. A group of 20 volunteers is formed and each person has their cholesterol

    level measured. Half the group is randomly assigned to take the new drug and the other

    half is given a placebo. After 6 months the volunteers’ cholesterol is measured again and

    any change from the beginning of the study is recorded. In this experiment, identify the

    experimental unit, factors, treatments, and response variable.

    3.  An agricultural researcher is interested in determining how much water and fertilizer are optimal for growing a certain plant. Twenty-four plots of land are available
    to grow the plant. The researcher will apply three different amounts of fertilizer (low,
    medium, and high) and two different amounts of water (light and heavy). Each of the six
    fertilizer-water combinations will be applied at random to four plots. After 6 weeks, the plants’ heights in each plot will be recorded.

    Identify the experimental units, factors and their levels, treatments (treatment

    combinations), and response variable in this study.

    4.  In 1930, it was decided to carry out an experiment in Lanarkshire schools to assess the

    possible beneficial effects of giving the children free milk during the school day. Twenty

    thousand children took part and over the course of five months, February to June, half of

    them had three-quarters of a pint of either raw or pasteurised milk while the remainder

    did not have milk. All the children were weighed and had their heights measured before

    and after the experiment, but contrary to expectation the average increase for the children

    who had not had milk exceeded that for the children who had milk. This unexpected

    result was later attributed to unconscious bias in the formation of the groups being

    compared. In each school the division of the children into a "milk" or a "no-milk" group

    was made either by ballot or by using an alphabetic system, but if the outcome appeared

    to give groups with an undue preponderance of well-nourished or ill-nourished children,

    some arbitrary interchange was carried out in an effort to balance them. In this

    interchange the teachers must have unconsciously tended to put a preponderance of ill-

    nourished children into the group receiving milk. The results of the experiment were

    further complicated by the fact that the children were weighed in their clothes and this

    probably introduced a differential effect as between winter and summer and children from

    poorer and wealthier homes. Because of the deficiencies in design the results of the experiment were ambiguous despite the very large sample of children concerned.

    (a) Suggest an appropriate research hypothesis.

    (b) What is the independent/predictor variable?

    (c) What is the dependent/outcome variable?


     

    (d) Is it an observational study or a designed experiment?

    5.  All the bottles of a certain brand of orange juice on the shelf of a fresh juice shop
    on a particular day were taken and analysed for their vitamin C content. There were
    21 bottles, and their vitamin C readings (mg/100gm) were as follows:
    15, 21, 20, 21, 18, 17, 15, 17, 13, 22, 23, 16, 13, 19, 23, 20, 25, 14, 26, 22, 23.

    (a)  Is it an observational study or a designed experiment?

    (b) Is the random variable discrete or continuous?

    (c) What parameters are we likely to be interested in estimating?

    (d) What null hypothesis might be taken?

    6.  A researcher conducted an experiment to examine the efficiency of three types of
    fungal sprays (T1, T2, & T3) in controlling fungal rots on blueberries. Three adjacent rows of blueberries are available, each with 24 plants. Sprays can be
    applied to individual blueberry plants. The outcome/response variable is the
    proportion of blueberries with rot. For the following two designs, specify the
    experimental unit, blocking factor, and number of replications of the treatments.

    (a) The sprays are randomly allocated to rows and 8 blueberry plants are randomly selected from each row for assessment.

    (b) Each row is divided into 3 plots of 8 plants each. The sprays are randomly allocated to plots within each row.

    7.  A new drug was given to a group of 20 patients who suffered from hay fever. Of these,
    15 reported that the remedy was very helpful in treating their hay fever. From this information we can conclude:

    (A) The new drug is effective for the treatment of hay fever;

    (B) Sample size is too small to make a decision;

    (C) This result is not valid because there was no control group for comparison.

    8.  Why is randomisation important?

    9.  Suppose a toy company wants to know if certain colors are more appealing and

    attractive to toddlers than others. They decide to measure this by choosing five colors of blocks and making sets of blocks in each of the five colors. Then they

    found 30 toddlers to participate in the study, and they randomly assigned each

    toddler a block color. They observed each toddler separately at the same time of

    the day, and gave them no other toys to play with. They recorded the length of

    time each toddler played with the blocks, to see if some colors of blocks were

    played with longer than other colors. All toddlers in the experiment were the same

    age (2 years old) and an equal number of girls and boys played with each color of

    blocks.


     

    (A) What is the explanatory variable (IV) and what is the response variable (DV)?

    (B)  Is this study an observational study or an experiment?

    (C) Name one confounding variable that was controlled for in this study.

    (D) Give two reasons why we must sometimes use an observational study instead of an experiment.

    10. A common mistake made by the media, the general public, and some researchers is to think that a link between two variables in any study implies that one variable
    causes the other. Explain what is wrong with this automatic conclusion.

    11. How can a researcher try to address the problem of confounding variables when designing an observational study?

    12. Explain why each of the following is used in experiments:

    a)  Placebo treatments
    b)  Blinding
    c)  Control groups.


    Solution to revision exercises

    1. A sample survey focuses on how individuals are selected from the population. An
    experiment is used to discover the effect of applying a stimulus (treatment) to subjects, so
    experimental design focuses on the formation of comparison groups that allow conclusions
    about the effect of the stimulus to be drawn.

    2. The experimental units (subjects) in this study are the 20 volunteers. There is one factor, the medication, and it has two levels, the active pill and the placebo. There are two

    treatments; the active pill and the placebo. The response variable is the change in

    cholesterol over the period of the study.

    3. The experimental units in this study are plots of land. There are two factors, fertilizer

    and water. Fertilizer has three levels: low, medium, and high. Water has two levels: light

    and heavy. There are a total of six treatments of fertilizer-water combinations: low-light,

    low-heavy, medium-light, medium-heavy, high-light, high-heavy. The response variable

    is the height of the plants at the end of the study.
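The six treatment combinations in solution 3, and one possible random allocation of them to the 24 plots, can be sketched in Python as follows. This is only an illustration: the plot numbering (1 to 24), the variable names and the seed are assumptions made here, not part of the exercise.

```python
import itertools
import random

fertilizer_levels = ["low", "medium", "high"]
water_levels = ["light", "heavy"]

# Each treatment is one fertilizer-water combination (3 x 2 = 6 in total).
treatments = list(itertools.product(fertilizer_levels, water_levels))

# Hypothetical plot numbers 1..24; the seed is fixed only so the sketch is repeatable.
rng = random.Random(2015)
plots = list(range(1, 25))
rng.shuffle(plots)

# Four randomly chosen plots per treatment combination.
allocation = {
    combo: sorted(plots[4 * k: 4 * k + 4])
    for k, combo in enumerate(treatments)
}

for (fertilizer, water), assigned in allocation.items():
    print(f"{fertilizer}-{water}: plots {assigned}")
```

Every plot appears exactly once, and each of the six combinations receives four plots, which is the "equal combination" requirement stated in the exercise.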

    4. (a) Average height will increase for the children who had milk as compared to the children who had not had milk.

    (b) Milk

    (c) Height

    (d) Designed experiment.

    5. (a) Observational study.

    (b) Continuous random variable.

    (c) Example: mean vitamin C concentration in all bottles of that brand of orange juice
    stocked at the fresh juice shop over a period of time.

    (d) Example: mean vitamin C equal to 20 mg/100gm.
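As an illustration of solution 5(d), the sample mean of the 21 readings can be compared with the hypothesised value of 20 mg/100gm. The sketch below computes a one-sample t statistic; the choice of a t statistic is an assumption made here for illustration, since the exercise only asks for a null hypothesis and does not name a test.

```python
import math
import statistics

# Vitamin C readings (mg/100gm) from exercise 5.
readings = [15, 21, 20, 21, 18, 17, 15, 17, 13, 22, 23, 16,
            13, 19, 23, 20, 25, 14, 26, 22, 23]
mu0 = 20  # hypothesised mean under the null hypothesis in solution 5(d)

n = len(readings)
mean = statistics.mean(readings)
sd = statistics.stdev(readings)  # sample standard deviation (n - 1 divisor)

# One-sample t statistic with n - 1 degrees of freedom.
t_stat = (mean - mu0) / (sd / math.sqrt(n))

print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}, t = {t_stat:.2f}")
```

The sample mean is 403/21, about 19.19 mg/100gm, slightly below the hypothesised value; whether the difference is statistically significant would be judged by comparing the t statistic with a t table on 20 degrees of freedom.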

    6. (a) Experimental unit = row,

    No Blocking factor,

    Number of replications is one.

    Layout of the design:

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    Single replication

    (b) Experimental unit = plot of 8 plants,

    Blocking factor is row,

    Number of replications is 3.


    Layout of the design:

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    3 replications
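The randomisation for design (b) can be sketched in Python: each row is a block, and the three sprays are shuffled separately within every row. The seed and the plot ordering within a row are assumptions made only for this illustration.

```python
import random

sprays = ["T1", "T2", "T3"]
rows = ["Row-1", "Row-2", "Row-3"]  # each row is one block of 3 plots (8 plants each)

rng = random.Random(7)  # hypothetical seed, fixed only so the sketch is repeatable

layout = {}
for row in rows:
    order = sprays[:]      # every spray appears once in every block
    rng.shuffle(order)     # separate randomisation within each block
    layout[row] = order    # sprays for plot 1, plot 2, plot 3 of this row

for row, order in layout.items():
    print(row, order)
```

Because every spray occurs once in each of the three rows, each treatment is replicated three times, in agreement with the answer given for 6(b).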

    7. (C) This result is not valid because there was no control group for comparison.

    8. Why randomisation?

    The basic benefits of randomisation include

    i. Elimination of selection bias.

    ii. Provision of a basis for statistical tests, in particular an assumption-free statistical test of
    the equality of treatments.

    In general, a randomised trial is an essential tool for testing the efficacy of the treatment.

    9. (A) Explanatory variable is Block Colour and response variable is Playing Time.

    (B) An experiment.

    (C) Any of the following: age, time of the day, other toys, interaction with other children, etc.

    (D) 1) It is unethical or impossible in certain situations to assign people to receive a

    specific treatment (such as smoking); 2) certain explanatory variables, such as left vs.

    right handedness, are inherent traits and cannot be randomly assigned.

    10. If the link is based on an observational study, there is simply no way to rule out all

    potential confounding factors, so cause and effect cannot be established.

    11.  Measure all the potential confounding variables he/she can think of and include them
    in the analysis to see whether they are related to the response variable; or use a
    case-control study and choose the controls to be as similar as possible to the cases.

    12. a) The power of suggestion may lead to changes in the participants, and those changes

    would be mistakenly attributed to the treatment or drug.

    b) Participants are kept blind so they don't alter their behavior or outcome to please the

    experimenter. Those collecting the measurements are kept blind so they don't

    inadvertently bias the measurements in the desired direction.

    c) Control groups are used to compare the effect of the treatment with what would have

    happened under similar circumstances without the treatment.


     

    References

    Box G.E.P., Hunter W.G. & Hunter J.S. (2005). Statistics for Experimenters: Design, Innovation and Discovery. 2nd edition. New York: Wiley.

    Cox D.R. (1958). Planning of Experiments. New York: Wiley.

    Das M.N. & Giri N.C. (1986). Design and Analysis of Experiments. 2nd edition. New Delhi: Wiley Eastern Ltd.

    Dawson B. & Trapp R.G. (2004). Basic and Clinical Biostatistics. New York: McGraw-Hill.

    Edwards T. (2008). Research Designs and Statistics. New York: McGraw-Hill.

    Gupta S.C. & Kapoor V.K. (1984). Applied Statistics. New Delhi: Sultan Chand & Sons.

    Hinkelmann K. & Kempthorne O. (2008). Design and Analysis of Experiments. John Wiley & Sons, Inc.

    Jones B. & Kenward M.G. (2003). Design and Analysis of Crossover Trials. 2nd edition. London: Chapman & Hall.

    Montgomery D.C. (2005). Design and Analysis of Experiments. 6th edition. New York: Wiley.

    Petersen R.G. (1985). Design and Analysis of Experiments. New York: Marcel Dekker, Inc.

    Utts J.M. (2005). Seeing Through Statistics. 3rd edition. Brooks/Cole Cengage Learning, CA, USA.


     Research Design: Topic 8 Module 2: Topic 2

    Course notes STA60004 Semester 2/SP3, 2015 1

     

    Topic 2: Common Designs

    Dr Amirul Islam

    Acknowledged to: Dr Jahar Bhowmik


    Contents

    2.1 Topic introduction

    2.2 Topic learning objectives

    2.3 Completely Randomised Designs

    2.4 Randomised Block Designs

    2.5 Latin Square Designs

    2.6 Factorial Experiments

    2.7 Nested Designs

    2.8 Repeated Measures Design

    Revision Exercises

    Solutions to Revision Exercises

    References

    Note: Some of the materials are adapted from standard texts and guides (see references).  


    2.1 Topic introduction

    In the previous chapter we have explored the fundamental principles of good

    experimental design. In this chapter we apply these principles to some of the basic

    designs that are commonly used in practice. These are: (i) Completely Randomised
    Designs (CRD), (ii) Randomised Block Designs (RBD) and (iii) Latin Square Designs (LSD). These designs are described below one by one. We also consider the analysis of
    data from these basic designs. In practice most experimental data are continuous, so we
    restrict our attention to continuous response (outcome) variables.

    2.2 Topic learning objectives

    Learning objectives

    When you have worked through this topic you should:

    •  Recognise the designs commonly used in practice.

    •  Understand the principles of basic designs.

    •  Understand which design would be useful for a particular research project.

    2.3 Completely Randomised Designs (CRD)

    The completely randomised design is the simplest of all the designs, based on principles

    of randomisation and replication. In this design treatments are allocated at random to the

    experimental units over the entire experimental material and each treatment is usually repeated an
    equal number of times. This design is very flexible in that any number of treatments and

    any number of replications may be used. A completely randomised design is one in which

    all experimental units are assigned treatments solely by chance. No grouping of

    experimental units is done prior to assignment of treatments. In general, an equal number

    of replications for each treatment should be made except in particular cases when some

    treatments are of greater interest than others or when practical limitations dictate

    otherwise.

    In this design treatments are assigned to the experimental units completely at random.

    There are a variety of ways to do this in practice, most easily with a computer program,
    but all have the feature that each experimental unit has an equal chance of being
    allocated to each treatment. Suppose we want to conduct an experiment with four treatments,
    each replicated five times. This will require 20 experimental units, which we number
    from 1 to 20 as in Figure 2.1 below. The experimental units can be assigned to the
    treatments in many ways; we describe two methods. Method 1 is rarely used in practice,
    whereas Method 2 is the usual approach.


    Method 1:

    1.  Obtain 20 identical pieces of paper (one for each experimental unit). Label five of them “Treatment A”, five of them “Treatment B”, five of them “Treatment C” and five
    of them “Treatment D”.

    2.  Place the pieces of paper in a box and mix thoroughly.

    3.  Pick a piece of paper at random. The treatment named on this piece is assigned to experimental unit 1.

    4.  Without returning the first piece of paper to the box, select another piece. The treatment named on this piece is assigned to experimental unit 2.

    5.  Continue this way until all 20 pieces of paper have been drawn.

    6.  Any allocation shown is just an example; the allocation will vary according to what is drawn at random.

    Treatment      Experimental unit       Total units in each treatment
    Treatment A     1   2   3   4   5      5
    Treatment B     6   7   8   9  10      5
    Treatment C    11  12  13  14  15      5
    Treatment D    16  17  18  19  20      5

    Figure 2.1: Assignment of numbers to experimental units.

    Method 2: Using EXCEL

    1.  Put the numbers 1 to 20 in column A.

    2.  Enter the formula =RAND() in cell B1 and fill down to B20. This generates 20 random numbers. The number of decimal places displayed does not matter; you can reduce it or leave it as it is.

    3.  Copy column B onto itself using Paste Special (Values), so that the random numbers no longer change when the worksheet recalculates.

    4.  Select a cell in either column A or B, and sort the worksheet by column B (Data, Sort, Column B).

    5.  We now have the numbers 1 to 20 in column A in random order.

    6.  Give the first five to treatment A, the next five to treatment B, and so on, which gives

    Treatment      Experimental unit       Total units in each treatment
    Treatment A     2  20  16  13   6      5
    Treatment B    10  12  19   4  17      5
    Treatment C     3   9  15   5   8      5
    Treatment D    18  14   7  11   1      5

    Figure 2.2: Assignment of numbers to experimental units.
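The same steps as Method 2 can be carried out in a few lines of Python instead of a spreadsheet. This is a minimal sketch, not part of the original notes: the variable names and the seed are assumptions, and the allocation it prints will differ from Figure 2.2 because the random numbers differ.

```python
import random

treatments = ["A", "B", "C", "D"]
replicates = 5
units = list(range(1, 21))      # experimental units numbered 1 to 20

rng = random.Random(60004)      # hypothetical seed, only for repeatability

# Step 2: attach a random number to every unit (the =RAND() column).
random_key = {unit: rng.random() for unit in units}

# Step 4: sort the units by their random number.
shuffled = sorted(units, key=random_key.get)

# Step 6: the first five units go to treatment A, the next five to B, and so on.
allocation = {
    name: shuffled[k * replicates:(k + 1) * replicates]
    for k, name in enumerate(treatments)
}

for name, assigned in allocation.items():
    print(f"Treatment {name}: {assigned}")
```

Sorting by an attached random number mirrors the spreadsheet procedure step by step; calling `rng.shuffle(units)` directly would give an equally valid random order.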

    2.3.1 Analysis

    A completely randomised design yields one-way classified data, classified according to the levels of
    a single factor, “treatment”. The data from this design can be analysed by a one-way analysis of variance (ANOVA). The ANOVA results help us to answer the following:

    •  How much variation is due to differences between treatments?

    •  How much variation is due to differences within each set of observations for the same treatment?

    •  Is there any difference across the treatments (Treatment A vs. Treatment B, and so on)? The ANOVA provides a test of this hypothesis.

    An appropriate linear statistical model for one-way classified data is

    Response = general mean effect (overall mean) + effect of treatment i + error

    $y_{ij} = \mu + \alpha_i + e_{ij}$;  i = 1, 2, …, p and j = 1, 2, …, r,

    where $y_{ij}$ is the yield or response from the jth unit receiving the ith treatment, $\mu$ is the
    general mean effect, $\alpha_i$ is the effect due to the ith treatment, and $e_{ij}$ is the error component
    due to chance. The error components are assumed to be independently and normally
    distributed with mean 0 and constant variance $\sigma^2$.

    The general form of the ANOVA table for a completely randomised design with p
    treatments, each replicated r times, with N = rp experimental units is given below.

    Table 2.1: ANOVA for CRD

    Source of         Degrees of      Sum of          Mean square          F statistic
    variation (SV)    freedom (df)    squares (SS)    (MS)
    Treatment         p-1             SST             MST = SST/(p-1)      FT = MST/MSE
    Error             N-p             SSE             MSE = SSE/(N-p)
    Total             N-1             SSTot

    Error df: N-p = (N-1) - (p-1), i.e., total df - treatment df.

    SST = Between treatments sum of squares (or between groups sum of squares), which is
    the sum of squares of the differences between the treatment means and the overall mean,
    with each treatment's squared difference counted once for every unit that received it.


    SSE= Residual sum of squares or error sum of squares (or within groups sum of squares)

    which is the sum of the squares of the differences between the observations and their

    respective treatment means.

    SSTot= Total sum of squares which is the sum of the squares of the differences between

    the observations and the overall mean. Note that SSTot=SST+SSE.

    In this design the total variation is partitioned into two components:

    (a) Variation among treatment means (treatments).

    (b) Variation among units within treatments (error).
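The partition can be written out in the notation of the model above. This is a sketch for the equally replicated case (r units per treatment); the symbols $\bar{y}_{i.}$ for the mean of treatment i and $\bar{y}_{..}$ for the overall mean are introduced here for illustration and are not used elsewhere in these notes.

```latex
\begin{aligned}
\mathrm{SSTot} &= \sum_{i=1}^{p}\sum_{j=1}^{r} \left(y_{ij} - \bar{y}_{..}\right)^2, \\
\mathrm{SST}   &= r \sum_{i=1}^{p} \left(\bar{y}_{i.} - \bar{y}_{..}\right)^2, \\
\mathrm{SSE}   &= \sum_{i=1}^{p}\sum_{j=1}^{r} \left(y_{ij} - \bar{y}_{i.}\right)^2, \\
\mathrm{SSTot} &= \mathrm{SST} + \mathrm{SSE}.
\end{aligned}
```

The factor r in SST arises because each treatment mean is compared with the overall mean once for every unit in that treatment.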

    Example 2.1

    The following table shows some of the results of an experiment on the effect of
    applications of sulphur in reducing scab disease of potatoes. The object in
    applying sulphur is to increase the acidity of the soil, since scab does not thrive in very
    acid soil. In addition to untreated plots, which serve as controls (O), three
    amounts of dressing are compared: 300, 600 and 1200 lb. per acre (coded 3, 6 and 12). Both a fall (F) and a spring (S)
    application of each amount were tested, so that in all there were seven distinct treatments
    (O, F3, S3, F6, S6, F12 and S12).

    Field plan and scab indices for a completely randomized experiment on potatoes
    (each plot shows the treatment followed by its index)

    F3 9      O 12      S6 18     F12 10    S6 24     S12 17    S3 30     F6 16
    O 10      S3 7      F12 4     F6 10     S3 21     O 24      O 29      S6 12
    F3 9      S12 7     F6 18     O 30      F6 18     S12 16    F3 16     F12 4
    S3 9      O 18      S12 17    S6 19     O 32      F12 5     O 26      F3 4

    Results grouped by treatments for data analysis

              O       F3      S3      F6      S6      F12     S12
              12      9       30      16      18      10      17
              10      9       7       10      24      4       7
              24      16      21      18      12      4       16
              29      4       9       18      19      5       17
              30
              18
              32
              26
    Totals    181     38      67      62      73      23      57
    Means     22.6    9.5     16.8    15.5    18.2    5.8     14.2
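The totals and means in the grouped table can be checked with a short Python sketch. The dictionary layout and variable names are choices made for this illustration; the data values are those listed in the table.

```python
# Indices grouped by treatment, as in the table above.
scores = {
    "O":   [12, 10, 24, 29, 30, 18, 32, 26],
    "F3":  [9, 9, 16, 4],
    "S3":  [30, 7, 21, 9],
    "F6":  [16, 10, 18, 18],
    "S6":  [18, 24, 12, 19],
    "F12": [10, 4, 4, 5],
    "S12": [17, 7, 16, 17],
}

# Treatment totals and means (the control O has 8 plots, the others 4 each).
totals = {name: sum(values) for name, values in scores.items()}
means = {name: totals[name] / len(values) for name, values in scores.items()}

for name in scores:
    print(f"{name:>3}  n={len(scores[name])}  total={totals[name]:>3}  mean={means[name]:.2f}")
```

Note that the design is not equally replicated: the untreated control has eight plots and every sulphur treatment has four, which a completely randomised design allows.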


     

    Figure 2.1: Plot of the mean index for each treatment (O, F3, S3, F6, S6, F12 and S12).

    The figure shows the highest mean index for the untreated control O (22.6). Every sulphur
    treatment has a lower mean index than the control, and at each amount the fall application (F)
    has a lower mean than the spring application (S); the differences are largest for S3 vs F3
    and S12 vs F12.

    Example 2.2 (taken from Petersen R.G. 1985, p.14)

    An anthropologist was interested in studying physical differences, if any, among the

    various races of people inhabiting Hawaii. As a part of her study she obtained a random

    sample of eight 5-year-old girls from each of three races: Caucasian, Japanese, and Chinese. She made a number of anthropometric measurements on each girl. She wanted

    to determine whether the Oriental races differ from the Caucasian, and whether the

    Oriental races differ from each other. The results of the head width measurements (cm)

    are given in the following table. The anthropologist is interested in knowing whether or

    not head width means differ among the races.

    Head width (cm)

              Caucasian    Japanese    Chinese
              14.20        12.85       14.15
              14.30        13.65       13.90
              15.00        13.40       13.65
              14.60        14.20       13.60
              14.55        12.75       13.20
              15.15        13.35       13.20
              14.60        12.50       14.05
              14.55        12.80       13.80
    Total:    116.95       105.50      109.55
    Mean:     14.619       13.188      13.694

    Grand mean: 13.83

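    ANOVA table of head width

    SV                  d.f.        SS       MS      F
    Race (Treatment)    3-1=2       8.43     4.21    23.39
    Error               24-3=21     3.84     0.18
    Total               24-1=23     12.27

    (F = 23.39 is obtained from the rounded mean squares, 4.21/0.18; with unrounded mean squares F = 23.04, as in the SPSS output below.)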

    Calculations:

    Error sum of squares (each observation minus its own treatment mean, squared and summed):

    $(14.20-14.619)^2 + \dots + (14.55-14.619)^2 + (12.85-13.188)^2 + \dots + (13.80-13.694)^2 = 3.84$

    Total sum of squares (each observation minus the grand mean, squared and summed):

    $(14.20-13.83)^2 + \dots + (13.80-13.83)^2 = 12.27$

    Treatment sum of squares = SS total - SS error = 12.27 - 3.84 = 8.43.

    In the case of a CRD, the total variation is due to treatment and error.
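The hand calculation can be reproduced with a short Python sketch that follows the same route: total and error sums of squares first, then the treatment sum of squares by subtraction. Only the standard library is used, and the variable names are choices made for this illustration.

```python
# Head width (cm) of eight girls from each of three races (Example 2.2).
head_width = {
    "Caucasian": [14.20, 14.30, 15.00, 14.60, 14.55, 15.15, 14.60, 14.55],
    "Japanese":  [12.85, 13.65, 13.40, 14.20, 12.75, 13.35, 12.50, 12.80],
    "Chinese":   [14.15, 13.90, 13.65, 13.60, 13.20, 13.20, 14.05, 13.80],
}

all_obs = [y for ys in head_width.values() for y in ys]
N = len(all_obs)        # 24 experimental units
p = len(head_width)     # 3 treatments (races)
grand_mean = sum(all_obs) / N

# Total SS: every observation against the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

# Error SS: every observation against its own treatment mean.
ss_error = sum(
    (y - sum(ys) / len(ys)) ** 2
    for ys in head_width.values()
    for y in ys
)

# Treatment SS by subtraction, then mean squares and the F statistic.
ss_treat = ss_total - ss_error
ms_treat = ss_treat / (p - 1)
ms_error = ss_error / (N - p)
f_stat = ms_treat / ms_error

print(f"SS treatment = {ss_treat:.3f}, SS error = {ss_error:.3f}, SS total = {ss_total:.3f}")
print(f"MS treatment = {ms_treat:.3f}, MS error = {ms_error:.3f}, F = {f_stat:.2f}")
```

Because the mean squares are not rounded before division, the F statistic agrees with the value from a statistical package rather than with a ratio of two-decimal mean squares.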


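    If we do the analysis in SPSS, the data should be entered with one row per girl: one column for the head width (headwidth) and one column for the race code.

    Here, 0 = Caucasian; 1 = Japanese; and 2 = Chinese.

    With the data entered in this way, the one-way ANOVA in SPSS produces the following output.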

    ANOVA (headwidth)

                      Sum of Squares    df    Mean Square    F         Sig.
    Between Groups    8.428             2     4.214          23.041    .000
    Within Groups     3.841             21    .183
    Total             12.268            23


     

    Explanation of the ANOVA Table:

    The between-groups degrees of freedom is 2, because there are 3 ethnic groups (the number
    of groups minus 1). There are eight girls in each ethnic group, so the total degrees of
    freedom equal 3×8 - 1 = 23. The error degrees of freedom = 23 - 2 = 21. Each mean square
    equals the sum of squares divided by its degrees of freedom, and F = MS between / MS within = 23.04.
    With (2, 21) degrees of freedom, F is significant at the 10% level if it is greater than
    2.57 and at the 5% level if it is greater than 3.47, so the observed value is clearly
    significant, in agreement with the SPSS significance value (Sig. = .000). These critical
    values can be read from an F table (available online from the link below).

    http://www.socr.ucla.edu/applets.dir/f_table.html

    Descriptives

    Headwidth

                 N     Mean      Std. Deviation
    Caucasian    8     14.6188   .31953
    Japanese     8     13.1875   .56553
    Chinese      8     13.6938   .35601
    Total        24    13.8333   .73035
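
As a cross-check, the ANOVA table can be rebuilt from the group sizes, means and standard deviations in the Descriptives output alone. The Python sketch below is illustrative only: the function name `anova_from_summary` and the dictionary layout are assumptions introduced here, not part of the SPSS output.

```python
# One-way ANOVA rebuilt from group summaries: (n, mean, standard deviation).
# The figures are copied from the Descriptives table in the notes.
groups = {
    "Caucasian": (8, 14.6188, 0.31953),
    "Japanese": (8, 13.1875, 0.56553),
    "Chinese": (8, 13.6938, 0.35601),
}


def anova_from_summary(summary):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    n_total = sum(n for n, _, _ in summary.values())
    grand_mean = sum(n * mean for n, mean, _ in summary.values()) / n_total
    # Between groups: group size times squared distance of group mean from grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in summary.values())
    # Within groups: (n - 1) times the sample variance of each group.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in summary.values())
    df_between = len(summary) - 1
    df_within = n_total - len(summary)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


result = anova_from_summary(groups)
for name, value in result.items():
    print(f"{name:12s} {value:.3f}")
```

Up to rounding, the printed values should agree with the Between Groups, Within Groups and Total lines of the SPSS output.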

    2.3.2 Advantages of CRD

    There are a number of advantages of a completely randomised design:

    (i) The design is very flexible. Any number of treatments can be used, and different treatments can be replicated an unequal number of times without unduly complicating the statistical analysis in most cases. The number of replications need not be the same from one treatment to another, although comparisons are most precise when the treatments are equally replicated.

    (ii) The statistical analysis remains simple if some or all of the observations for any treatment are rejected, lost or missing for purely random, accidental reasons. Moreover, the loss of information due to missing data is smaller than in any other design.

    (iii) The design provides the maximum degrees of freedom for estimating the error variance, which increases the sensitivity (precision) of small experiments, i.e., experiments with a small number of treatments.

    (iv) The design makes maximum use of the experimental units, since all the experimental material can be used.

    2.3.3 Disadvantages of CRD

    There is one principal disadvantage of this design:

    (i) If the experimental material is not uniform, precision is low. Because randomisation is not restricted in any way to ensure that the units receiving one treatment are similar to those receiving another, the whole of the variation among the experimental units is included in the residual variance. This makes the design less efficient and less sensitive in detecting significant effects.

    2.3.4 Applications of CRD

    Although other designs have more precision, the CRD has a number of uses:

    (i) It is most useful in laboratory techniques and methodological studies, e.g., in physics, chemistry or cookery, in chemical and biological experiments, in some greenhouse studies, etc., where the experimental material is uniform.

    (ii) This design is also recommended in situations where a large fraction of units is likely to be destroyed or to fail to respond.

    (iii) This design may be useful for experiments in which the total number of units is limited.

    2.4 Randomised Block Designs (RBD)

    The second commonly used design is the randomised block design. If a researcher has reason to believe that subgroups of the experimental units will respond differently to treatments because of some characteristic, the units are sorted into those subgroups before treatments are assigned. In an experiment these subgroups are called blocks. Once units are assigned to blocks, treatments are randomly assigned to the units in each block. Blocking is a form of control that reduces unwanted variability in the response variable due to some variable other than the treatment(s).

    In field experimentation, if the experimental area is not homogeneous (uniform) and the fertility gradient runs in one direction only, a simple way to control the variability of the experimental material is to stratify the whole area into relatively homogeneous strata (blocks) perpendicular to the direction of the fertility gradient. If the treatments are then applied at random to relatively homogeneous units within each stratum or block and replicated over all the blocks, the design is a randomised block design (RBD).

    In CRD no such local control measure is adopted, except that the experimental units should be homogeneous and treatments are allocated at random to the experimental units. In randomised block designs, treatments are allocated at random within the units of each stratum or block, i.e. randomisation is restricted. Homogeneous grouping of experimental units and random allocation of the treatments separately in each block are therefore the two main characteristic features of randomised block designs. RBD improves on CRD by adding error control: the units within each block are made as homogeneous as possible.

    Layout of RBD:  In the RBD the experimental units are first grouped into blocks or

    strata. Treatments are then randomly assigned to the units within the blocks. A separate

    randomisation is used in each block. To illustrate the procedure, suppose we want to run

    an experiment with five treatments (A, B, C, D and E) replicated four times in an

    agricultural field with a fertility gradient (see Petersen R.G, 1985, p. 36). We construct an

    RBD using the following steps:

                   BLOCK
                   I     II    III   IV
    Treatment A    1     1     1     1
    Treatment B    2     2     2     2
    Treatment C    3     3     3     3
    Treatment D    4     4     4     4
    Treatment E    5     5     5     5

    GRADIENT

    Figure 2.3: Assignment of numbers to units blocked to remove effects of a gradient

    Step 1: Form four blocks of five plots each perpendicular to the gradient. Number the

    plots from 1 to 5 within each block as shown in Figure 2.3.

    Step 2: Use a table of random numbers or some other procedure (e.g. using EXCEL), to

    assign treatments to the units in the first block. To illustrate for

    Block I:

    Sequence   Treatment   Random number          Rank of random number   Treatment according to the rank
                           (generated in Excel)   (plot)                  (1=A, 2=B, 3=C, 4=D, 5=E)
    1          A           293                    2 (second smallest)     B
    2          B           078                    1 (smallest)            A
    3          C           721                    5 (largest)             E
    4          D           569                    3 (3rd smallest)        C
    5          E           612                    4 (4th smallest)        D

    Step 3: Repeat step 2 for the remaining three blocks:

    Block II

    Sequence   Random number   Rank (plot)   Treatment
    1          962             4             D
    2          036             1             A
    3          844             3             C
    4          963             5             E
    5          097             2             B

    Block III

    Sequence   Random number   Rank (plot)   Treatment
    1          675             3             C
    2          936             5             E
    3          709             4             D
    4          591             1             A
    5          665             2             B

    Block IV

    Sequence   Random number   Rank (plot)   Treatment
    1          230             1             A
    2          981             5             E
    3          687             4             D
    4          604             3             C
    5          454             2             B

    The final plan of the RBD is given in the following figure.

    Block        I      II     III    IV
    Treatment    1 B    1 D    1 C    1 A
                 2 A    2 A    2 E    2 E
                 3 E    3 C    3 D    3 D
                 4 C    4 E    4 A    4 C
                 5 D    5 B    5 B    5 B

    Figure 2.4: Final experimental plan with treatments randomly assigned to units within blocks in an RBD
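
The ranking procedure of Steps 2 and 3 can be sketched in Python. This is a minimal illustration, not the method used to prepare the notes: the helper names `assign_by_rank` and `randomise_block` and the seed value are assumptions introduced here. The first part replays the random numbers quoted for Blocks I to IV; the second draws fresh numbers for a new block.

```python
import random

TREATMENTS = ["A", "B", "C", "D", "E"]


def assign_by_rank(random_numbers, treatments):
    """Plot k receives the treatment indexed by the rank of the k-th random number."""
    order = sorted(range(len(random_numbers)), key=lambda k: random_numbers[k])
    ranks = [0] * len(random_numbers)
    for rank, k in enumerate(order):
        ranks[k] = rank  # 0-based: the smallest number has rank 0, i.e. treatment A
    return [treatments[r] for r in ranks]


# Random numbers quoted in the notes (078, 036 and 097 are written without the leading zero).
quoted = {
    "I": [293, 78, 721, 569, 612],
    "II": [962, 36, 844, 963, 97],
    "III": [675, 936, 709, 591, 665],
    "IV": [230, 981, 687, 604, 454],
}
plan = {block: assign_by_rank(numbers, TREATMENTS) for block, numbers in quoted.items()}


def randomise_block(treatments, rng):
    """Draw fresh, distinct three-digit random numbers and apply the same ranking rule."""
    numbers = rng.sample(range(1000), len(treatments))
    return assign_by_rank(numbers, treatments)


rng = random.Random(2015)  # arbitrary seed, used only to make the run repeatable
new_block = randomise_block(TREATMENTS, rng)

for block, layout in plan.items():
    print(block, layout)
print("new block:", new_block)
```

Because a separate set of random numbers is drawn for every block, each block contains every treatment exactly once, which is the restriction on randomisation that distinguishes an RBD from a CRD.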

    Example 2.3

    Suppose we are interested in how weight gain (Y) in rats is affected by source of protein (Beef, Cereal, and Pork) and by level of protein (High or Low). There are 6 (3×2) treatment combinations of the two factors (Beef-High Protein, Cereal-High Protein, Pork-High Protein, Beef-Low Protein, Cereal-Low Protein, and Pork-Low Protein). Suppose we have N = 66 experimental rats to which we will apply the different diets based on the t = 6 treatment combinations. Before the experiment, the rats were divided into n = 11 homogeneous groups (blocks) of size 6. The grouping was based on factors that had previously been ignored (for example, initial weight, appetite, etc.). Within each of the 11 blocks, the six rats are randomly assigned to the six treatment combinations (diets). The weight gain (in grams) after six months is measured for each test animal and is tabulated below; the numbers (1) to (6) identify the treatment combinations.

    Completed Design

    Block   (1)    (2)    (3)    (4)    (5)    (6)
    1       107    96     112    83     87     90
    2       98     72     101    82     70     94
    3       102    76     101    85     95     89
    4       97     70     93     65     71     61
    5       109    79     101    75     75     81
    6       101    70     98     82     77     79
    7       128    89     104    85     84     89
    8       56     70     71     64     62     67
    9       99     91     92     80     71     85
    10      82     63     87     87     81     61
    11      101    102    110    83     93     83

    Example 2.4

    A group of researchers is interested in comparing the effects of four different chemicals (A, B, C and D) in producing water resistance (y) in textiles. A strip of material, randomly selected from each bolt, is cut into four pieces (samples), and the pieces are randomly assigned to receive one of the four chemical treatments. This process is replicated three times, producing a Randomised Block (RB) design. Moisture resistance (y) was measured for each of the samples (low readings indicate low moisture penetration). The data are given below.

    Blocks (Bolt Samples)

    9.9 C     13.4 D    12.7 B
    10.1 A    12.9 B    12.9 D
    11.4 B    12.2 A    11.4 C
    12.1 D    12.3 C    11.9 A

     Example 2.5

    An experiment was carried out on wheat. Three varieties of wheat, A, B and C, were tested for their yield in four randomised blocks. Each of the four blocks was divided into three plots, and the plots of each block were assigned at random to the three varieties. The plan and yield per plot in kg are given below:

    Block 1    Block 2    Block 3    Block 4
    A  8       C 10       A  6       B 10
    C 12       B  8       B  9       A  8
    B 10       A  8       C 10       C  9

    Wheat yield   Block 1   Block 2   Block 3   Block 4
    A             8         8         6         8
    B             10        8         9         10
    C             12        10        10        9

    Example 2.6

    A researcher is carrying out a study of the effectiveness of four different skin creams for the treatment of a certain skin disease. He has eighty subjects and plans to assign them to 4 treatment groups of twenty subjects each. Using a randomised block design, the subjects are assessed and put in blocks of four according to how severe their skin condition is; the four most severe cases are the first block, the next four most severe cases are the second block, and so on to the twentieth block. The four members of each block are then randomly assigned, one to each of the four treatment groups.

    (Example taken from Valerie J. Easton and John H. McColl's Statistics Glossary.)

    2.4.1 Analysis

    If in an RBD a single observation is made on each of the experimental units, then the data

    from an RBD can be analysed by a two-way ANOVA. In this design the ANOVA enables

    us to partition the total variation into blocks, treatments and error. A randomised block

    experiment is assumed to be a two-factor experiment. The factors are blocks and

    treatments.

    An appropriate linear statistical model for RBD is

    Response = general mean effect (overall mean)+ treatment effect + block effect + error

    $y_{ij} = \mu + \alpha_i + b_j + e_{ij}$; i = 1, 2, …, p and j = 1, 2, …, r,

    where $y_{ij}$ is the yield or response of the experimental unit receiving the ith treatment in the jth block, $\mu$ is the general mean effect, $\alpha_i$ is the effect due to the ith treatment, $b_j$ is the effect due to the jth block or replicate, and $e_{ij}$ is the error component due to chance. The error components are assumed to be independently and normally distributed with mean 0 and constant variance $\sigma^2$.

    The general form of the ANOVA table for p treatments, each replicated r times in a randomised block design with r blocks of p units each, is given below.

    Table 2.2: ANOVA for RBD

    Source of        Degrees of      Sum of          Mean square (MS)          F statistic
    variation (SV)   freedom (df)    squares (SS)
    Treatment        p-1             SST             MST = SST/(p-1)           FT = MST/MSE
    Block            r-1             SSB             MSB = SSB/(r-1)
    Error            (p-1)(r-1)      SSE             MSE = SSE/[(p-1)(r-1)]
    Total            rp-1            SSTot
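
The sums of squares in Table 2.2 can be written out explicitly. The block below is a sketch in the usual dot notation, which is an assumption introduced here and does not appear in the notes: $\bar{y}_{i.}$ is the mean of treatment i over the r blocks, $\bar{y}_{.j}$ is the mean of block j over the p treatments, and $\bar{y}_{..}$ is the grand mean.

```latex
\begin{aligned}
SSTot &= \sum_{i=1}^{p}\sum_{j=1}^{r} \left(y_{ij} - \bar{y}_{..}\right)^2, \\
SST   &= r \sum_{i=1}^{p} \left(\bar{y}_{i.} - \bar{y}_{..}\right)^2, \\
SSB   &= p \sum_{j=1}^{r} \left(\bar{y}_{.j} - \bar{y}_{..}\right)^2, \\
SSE   &= SSTot - SST - SSB .
\end{aligned}
```

The degrees of freedom partition in the same way: $rp - 1 = (p-1) + (r-1) + (p-1)(r-1)$.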

    Analysis output using SPSS from example 2.5 (four blocks and 3 varieties of wheat)

    Analysis summary

    Between-Subjects Factors

                     Value Label   N
    BLOCK       1                  3
                2                  3
                3                  3
                4                  3
    TREATNUM    1    A             4
                2    B             4
                3    C             4

    Tests of Between-Subjects Effects

    Dependent Variable: YEILD

    Source              Type III Sum of Squares   df   Mean Square   F   Sig.
    Corrected Model     26.000a                   11   2.364         .   .
    Intercept           972.000                   1    972.000       .   .
    BLOCK               4.667                     3    1.556         .   .
    TREATNUM            15.500                    2    7.750         .   .
    BLOCK * TREATNUM    5.833                     6    .972          .   .
    Error               .000                      0    .
    Total               998.000                   12
    Corrected Total     26.000                    11

    a. R Squared = 1.000 (Adjusted R Squared = .)

    Interpretation of the Table

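    There are four blocks, so the block df is 4 - 1 = 3; similarly, the variety (treatment) df is 3 - 1 = 2, and the interaction df is (4-1)×(3-1) = 3×2 = 6. Because the fitted model includes the BLOCK * TREATNUM interaction and there is only one observation per cell, no degrees of freedom are left for error, so no F value could be computed. Under the additive RBD model of Table 2.2, the interaction line (SS = 5.833, df = 6) plays the role of the error term.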
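
For comparison, the additive RBD analysis of Table 2.2 can be computed directly from the Example 2.5 yields. The Python sketch below is an illustration under the assumption that block and treatment effects do not interact, so that the 6 residual degrees of freedom estimate error; the function name `rbd_anova` and the dictionary keys are hypothetical.

```python
# Yields (kg per plot) from Example 2.5: one list per variety, ordered Block 1 to Block 4.
yields = {
    "A": [8, 8, 6, 8],
    "B": [10, 8, 9, 10],
    "C": [12, 10, 10, 9],
}


def rbd_anova(data):
    """Additive two-way ANOVA for an RBD with one observation per cell."""
    p = len(data)                         # number of treatments
    r = len(next(iter(data.values())))    # number of blocks
    values = [y for row in data.values() for y in row]
    grand_mean = sum(values) / (p * r)

    ss_total = sum((y - grand_mean) ** 2 for y in values)
    ss_treat = r * sum((sum(row) / r - grand_mean) ** 2 for row in data.values())
    block_means = [sum(row[j] for row in data.values()) / p for j in range(r)]
    ss_block = p * sum((m - grand_mean) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block   # residual after treatments and blocks

    df_treat, df_block = p - 1, r - 1
    df_error = df_treat * df_block
    ms_error = ss_error / df_error
    return {
        "ss_treat": ss_treat, "df_treat": df_treat,
        "ss_block": ss_block, "df_block": df_block,
        "ss_error": ss_error, "df_error": df_error,
        "ss_total": ss_total,
        "f_treat": (ss_treat / df_treat) / ms_error,
        "f_block": (ss_block / df_block) / ms_error,
    }


table = rbd_anova(yields)
for name, value in table.items():
    print(f"{name:9s} {value:.3f}")
```

The treatment, block and residual sums of squares should match the TREATNUM, BLOCK and BLOCK * TREATNUM lines of the SPSS output; the difference is only that the residual line is now used as the error term, which gives F ratios on (2, 6) and (3, 6) degrees of freedom.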

    2.4.2 Advantages of RBD

    The chief advantages of a randomised block design are as follows:

    (i) The design is more efficient (precise) than CRD for most types of experimental work. Blocking can increase precision by removing one source of variation from the experimental error.

    (ii) No restrictions are placed on the number of treatments or the number of replicates. Any number of blocks and any number of treatments can be used, so long as each treatment is replicated the same number of times in each block. For better management of the experiment, however, it is advisable not to use a large number of treatments.

    (iii) Statistical analysis is relatively simple and rapid.

    2.4.3 Disadvantages of RBD

    There are a few disadvantages of RBD:

    (i) RBD is not suitable for a large number of treatments. The efficiency of the design decreases as the number of treatments, and hence the block size, increases.

    (ii) In the analysis, missing data can cause some difficulty.

    2.4.4 Applications of RBD

    This design has a number of applications:

    (i) RBD provides unbiased estimates of the means for blocking categories, providing additional information from the experiment.

    (ii) This design can remove one source of variation from the experimental error and thus increase precision.

    2.5 Latin Square Designs (LSD)

    In RBD the whole of the experimental area is divided into relatively homogeneous groups (blocks) to control one source of variation, and treatments are allocated at random to units within each block. In field experimentation, however, an experimental area (field) may exhibit fertility in strips, e.g., cultivation might result in alternating strips of high and low fertility. RBD will be effective if the blocks happen to be parallel to these strips and extremely inefficient if the blocks run across the strips. The fertility gradient is seldom known at the outset. A useful way of eliminating fertility variation is an experimental layout that controls variation in two perpendicular directions. One design that controls two sources of variation is the Latin square design. A Latin square design incorporates two blocking factors, which are usually represented as rows and columns.

    Layout of LSD: In this design the number of treatments is equal to the number of replications. To construct a Latin square design for p treatments we require p×p = $p^2$ experimental units. The whole of the experimental area is divided into $p^2$ experimental units (plots) arranged in a square, so that each row and each column contains p units (plots). The p treatments are then allocated at random to these rows and columns in such a way that every treatment occurs once and only once in each row and in each column. Such a layout is known as a p×p Latin Square Design (LSD) and is extensively used in agricultural experiments. For example, if we are interested in studying the effects of p types of fertilizer (treatments) on the yield of a certain variety of wheat, it is customary to conduct the experiment on a square field with $p^2$ plots of equal area, associating treatments with the different fertilizers and row and column effects with variations in soil fertility.

    The basic pattern of a Latin square design with p = 5 treatments, A, B, C, D and E, in a 5×5 square is given below; the rows and columns represent the two blocking factors:

             Column
    Row      1    2    3    4    5
    1        A    B    C    D    E
    2        B    C    D    E    A
    3        C    D    E    A    B
    4        D    E    A    B    C
    5        E    A    B    C    D

    Figure 2.4: Basic design for a 5x5 Latin square.
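
The basic square above is cyclic: each row is the previous row shifted one place to the left. The Python sketch below builds it, checks the once-per-row and once-per-column property, and then permutes rows and columns at random. The function names and the seed are assumptions introduced here, and shuffling rows and columns is only one simple way of randomising; it does not sample uniformly from all possible Latin squares.

```python
import random


def cyclic_latin_square(treatments):
    """Row i is the treatment list shifted i places to the left."""
    p = len(treatments)
    return [[treatments[(i + j) % p] for j in range(p)] for i in range(p)]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    p = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == p and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(p)} == symbols for j in range(p))
    return len(symbols) == p and rows_ok and cols_ok


def randomise(square, rng):
    """Permute whole rows and whole columns; the Latin property is preserved."""
    p = len(square)
    row_order = rng.sample(range(p), p)
    col_order = rng.sample(range(p), p)
    return [[square[i][j] for j in col_order] for i in row_order]


basic = cyclic_latin_square(["A", "B", "C", "D", "E"])
shuffled = randomise(basic, random.Random(7))  # arbitrary seed for repeatability

for row in basic:
    print(" ".join(row))
print()
for row in shuffled:
    print(" ".join(row))
```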

    Example 2.7

    An experiment was conducted to compare the effectiveness of four types of food supplements for increasing the milk yield of dairy cows on a farm. The supplements (A, B, C and D) were given to four cows in four successive time periods, rotating the supplements among the cows. Milk yields, in grams/day, were recorded. The cow (1, 2, 3 or 4) was one blocking factor and the time period (I, II, III or IV) was the other. The plan and yields are given in the following table:

           Time Period
    Cow    I         II       III      IV
    1      A 882     B 605    C 947    D 772
    2      B 1078    C 705    D 712    A 756
    3      C 702     D 659    A 824    B 644
    4      D 690     A 789    B 930    C 762

    Example 2.8

    An experiment was conducted to compare the effectiveness of five manurial treatments A, B, C, D and E on the yield of sugarcane (in kg/plot). The following are the results of the Latin Square experiment.

    B 405    A 525    E 463    D 441    C 481
    C 325    D 445    B 429    A 513    E 493
    E 471    B 492    A 472    C 381    D 410
    A 552    C 431    D 425    E 572    B 410
    D 430    E 469    C 432    B 467    A 460

    Example 2.9 (taken from Petersen R.G, page 57)

    A ceramics engineer wanted to test the strength of high-tension insulators made from four

    clay mixtures A, B, C, D and a control, E. He made five insulators from each mixture. He

    suspected that there was a temperature gradient from front to back and from top to bottom

    in his oven. He decided to use a Latin square design with shelves (top to bottom) as rows

    and positions on the shelves (front to back) as columns. The insulators were placed in the

    oven in the Latin square arrangements. After firing, the strength of each insulator was

    measured. The experimental layout and strength measurements were as shown in the

    following table:

              Front                                    Back
    Top       A 33.8    B 33.7    D 30.4    C 32.7    E 24.4
              D 35.0    E 28.8    B 33.5    A 26.7    C 33.4
              C 35.8    D 35.6    A 36.9    E 26.7    B 35.1
              E 33.2    A 37.1    C 37.4    B 38.1    D 34.1
    Bottom    B 34.8    C 39.1    E 32.7    D 37.4    A 36.4

    2.5.1 Analysis

    In a Latin square design there are three factors: row, column and treatment. The data collected from this design can be analysed by a three-way ANOVA.

    An appropriate linear statistical model for the ith row, jth column and kth treatment is:

    Response = general mean effect (overall mean) + row effect + column effect + treatment effect + error

    $y_{ijk} = \mu + r_i + c_j + \alpha_k + e_{ijk}$; i, j, k = 1, 2, …, p,

    where $y_{ijk}$ is the yield or response of the experimental unit in the ith row and jth column receiving the kth treatment, $\mu$ is the general mean effect, $r_i$ is the effect due to the ith row, $c_j$ is the effect due to the jth column, $\alpha_k$ is the effect due to the kth treatment, and $e_{ijk}$ is the error component due to chance. As usual, the error components are assumed to be independently and normally distributed with mean 0 and constant variance $\sigma^2$.

    The general form of the ANOVA table for a Latin square design with p treatments is presented in the following table.

    Table 2.3: ANOVA for LSD

    Source of        Degrees of      Sum of          Mean square (MS)          F statistic
    variation (SV)   freedom (df)    squares (SS)
    Rows             p-1             SSR             MSR = SSR/(p-1)           FR = MSR/MSE
    Columns          p-1             SSC             MSC = SSC/(p-1)           FC = MSC/MSE
    Treatment        p-1             SST             MST = SST/(p-1)           FT = MST/MSE
    Error            (p-1)(p-2)      SSE             MSE = SSE/[(p-1)(p-2)]
    Total            $p^2 - 1$       SSTot
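
The decomposition in the Latin square ANOVA table can be sketched in Python using the plan and yields of Example 2.7 (cows as rows, time periods as columns). This is a minimal illustration, not output from the notes: the function name `latin_square_anova` and the dictionary keys are hypothetical, and no significance test is attempted.

```python
# Example 2.7: rows are cows 1-4, columns are time periods I-IV.
layout = ["ABCD", "BCDA", "CDAB", "DABC"]   # supplement given in each cell
yields = [
    [882, 605, 947, 772],
    [1078, 705, 712, 756],
    [702, 659, 824, 644],
    [690, 789, 930, 762],
]


def latin_square_anova(plan, y):
    """Row, column, treatment and error sums of squares for a p x p Latin square."""
    p = len(y)
    grand_mean = sum(sum(row) for row in y) / (p * p)

    ss_total = sum((v - grand_mean) ** 2 for row in y for v in row)
    ss_rows = p * sum((sum(row) / p - grand_mean) ** 2 for row in y)
    col_means = [sum(y[i][j] for i in range(p)) / p for j in range(p)]
    ss_cols = p * sum((m - grand_mean) ** 2 for m in col_means)

    # Total yield for each treatment letter, collected cell by cell.
    treat_totals = {}
    for i in range(p):
        for j in range(p):
            treat_totals[plan[i][j]] = treat_totals.get(plan[i][j], 0.0) + y[i][j]
    ss_treat = p * sum((t / p - grand_mean) ** 2 for t in treat_totals.values())

    ss_error = ss_total - ss_rows - ss_cols - ss_treat
    df_main = p - 1
    df_error = (p - 1) * (p - 2)
    ms_error = ss_error / df_error
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols, "ss_treat": ss_treat,
        "ss_error": ss_error, "ss_total": ss_total,
        "df_main": df_main, "df_error": df_error,
        "f_rows": (ss_rows / df_main) / ms_error,
        "f_cols": (ss_cols / df_main) / ms_error,
        "f_treat": (ss_treat / df_main) / ms_error,
    }


lsd = latin_square_anova(layout, yields)
for name, value in lsd.items():
    print(f"{name:9s} {value:.2f}")
```

With p = 4 the error line has (p-1)(p-2) = 6 degrees of freedom, and the four component sums of squares add up to the total by construction.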

    De