
Office of Learning, Teaching & Community Engagement Version date: 1 October 2008

Study Guide

Design and analysis of biological studies

SBI209

Student Name

Unit: Design and analysis of biological studies

Unit code: SBI209

Faculty: Engineering, Health, Science and the Environment

Awards: Bachelor of Applied Science; Bachelor of Biomedical Science; Bachelor of Environmental Science; Bachelor of Pharmacy

Prerequisites: None

Duration: One semester

Credit: 10 credit points

Assessment tasks: Four

The complete study package contains:

• Unit Information
• Study Guide
• CD-ROM
• Statistical Manual
• Statistical Tables
• Tutorial Problems
• Learnline (online)

Prepared by

Keith McGuinness

Acknowledgements Julia Schult for many helpful suggestions

Materials in this book are reproduced under section 40 (1A) of the Copyright Amendment Act 1980 (Cth) for the purposes of private study by external students enrolled in this unit.

© Charles Darwin University CRICOS provider 00300K
First published 2004
Reprinted June 2006
Minor revision September 2008
Reprinted October 2009
Published by the Office of Learning & Teaching and printed by Uniprint NT, Charles Darwin University

Contents

Introduction 1
Working through this unit 1
Unit content overview 4

Topic 1: Approaches to biological studies 5
  Introduction 6
  Sub-topic 1: Estimation and model testing 6
  Sub-topic 2: A framework for testing models 6
  Sub-topic 3: The role of the statistical test 8
  Sub-topic 4: Type I and Type II errors 9
  Summary 10

Topic 2: Elements of design 11
  Introduction 12
  Sub-topic 1: Variables, distributions and summary statistics 12
  Sub-topic 2: Accuracy and precision 16
  Sub-topic 3: Estimation and sampling schemes 18
  Sub-topic 4: Manipulative experiments 20

Topic 3: Problems and solutions 25
  Introduction 26
  Sub-topic 1: Problems with replication 26
  Sub-topic 2: Problems with confounding 29
  Sub-topic 3: Problems with non-independence 31
  Sub-topic 4: Review and exercises 32
  Sub-topic 5: Enhancements to design 33
  Summary 34

Topic 4: Testing hypotheses about one or two means 35
  Introduction 36
  Sub-topic 1: Types of comparisons of two values 37
  Sub-topic 2: Comparing the mean of a sample to a value 38
  Sub-topic 3: Comparing means of paired samples 39
  Sub-topic 4: Comparing means of unpaired samples 40
  Summary 41

Topic 5: Testing hypotheses about more than two means 43
  Introduction 44
  Sub-topic 1: One-factor ANOVA versus two-factor ANOVA 45
  Sub-topic 2: One-factor ANOVA 46
  Sub-topic 3: Multiple comparisons tests 47
  Sub-topic 4: Assumptions of ANOVA 49
  Sub-topic 5: Two-factor ANOVA 51

Topic 6: Testing hypotheses about frequencies 57
  Introduction 58
  Sub-topic 1: Reasons for looking at frequency distributions 58
  Sub-topic 2: Four particular “theoretical” distributions 61
  Sub-topic 3: Comparing observed and expected distributions 64
  Sub-topic 4: Comparing two or more observed distributions 67
  Summary 70

Topic 7: Testing hypotheses about relationships 71
  Introduction 72
  Sub-topic 1: Why look at relationships 72
  Sub-topic 2: Correlations 75
  Sub-topic 3: Linear regression 78
  Sub-topic 4: Non-linear relationships 79
  Summary 81

Topic 8: An introduction to multivariate analysis 83
  Introduction 84
  Sub-topic 1: Why use multivariate analysis? 84
  Sub-topic 2: Kinds of multivariate analysis 88
  Sub-topic 3: Multiple regression 89
  Sub-topic 4: Cluster analysis – “group membership” 90
  Sub-topic 5: Ordination – “structure” 91
  Sub-topic 6: Limitations of some multivariate methods 92
  Summary 93

References and Bibliography 94

Introduction 1

Study Guide SBI209 Design and analysis of biological studies

Introduction

Biology is “the science of life or living matter in all its forms and phenomena” (Macquarie Dictionary). To understand life “in all its forms”, biologists ask questions, make observations and test ideas. One general characteristic of “life and living matter” is its variability: life thrives in an enormous variety of different environments and takes an enormous variety of different forms. In addition to this variety of environments and forms, there is the variability of individual behaviour: even clones, which are genetically identical, can differ in their responses to the same events.

This wonderful, natural variability – which is in some ways one of the key features of biological systems – can complicate the task of answering even some of the simplest questions. For instance, according to principles worked out by Mendel, crossing pure lines of “yellow seed” and “green seed” peas should give offspring in which the “yellow seed” type outnumbers the “green seed” type three to one (3:1). Suppose that in one experiment the outcome is 25 yellow to 15 green: are these results consistent with Mendel’s theory? The results are not exactly as predicted, but they are also not that far out. What should our conclusion be in this case? The kinds of statistical methods discussed in this unit can help resolve this dilemma, and others like it.
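The pea-cross dilemma can be resolved with a chi-square goodness-of-fit test, covered formally later in the unit (Topic 6). As a preview, here is a minimal sketch of the calculation in Python (the unit itself works with a calculator or Excel, so this is illustrative only):

```python
# Chi-square goodness-of-fit test for the pea cross: are 25 yellow and
# 15 green offspring consistent with Mendel's predicted 3:1 ratio?
observed = [25, 15]
total = sum(observed)                      # 40 offspring in all
expected = [total * 3 / 4, total * 1 / 4]  # 3:1 ratio -> 30 yellow, 10 green

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 3))  # 3.333

# The critical value for 1 degree of freedom at the 5% significance level
# is 3.841 (from standard chi-square tables). 3.333 < 3.841, so the
# departure from 3:1 is NOT significant: the data are consistent with Mendel.
print(chi2 < 3.841)  # True
```

So although 25:15 is not exactly 30:10, the test tells us a deviation at least this large is quite likely to arise by chance when the 3:1 model is true.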

But this unit is designed to be more than a “statistical cook-book”. It aims to give you an appreciation of some of the general issues involved in answering questions, and testing ideas, in the biological sciences. This does involve learning how to use appropriate statistical methods to assist in the interpretation of results: this is the “analysis” part of the title. It also, however, involves appreciating important issues about the conduct of biological studies, to ensure that results are reliable: this is the “design” part of the title. This unit – and any biological study – requires attention to both sets of issues.

Working through this unit

This Study Guide is your key learning resource. As you work through it, you will encounter several different types of activities – these are highlighted with appropriate icons so that you can easily identify what you need to do at any point. Read the descriptions below, so that you understand what to do when they appear.

Icons and their meaning

This icon indicates an important point you need to take note of.

Read through the section(s) of the Statistical Manual indicated, then continue with the Study Guide or do other activities as directed.

Answer a question, make some notes or do some problems. This usually involves working with the SBI209 Problems book (you may also need to use your calculator and the Statistical Manual and Statistical Tables). In some cases, you may choose to do the problems with your calculator or on the computer with Excel.


Watch a demonstration, or work through an example, on the computer.

Do a computer exercise. These use Excel and special workbooks designed for this unit.

Organisation of the unit

The unit material is organised as a series of topics for you to work through. Each topic has sub-topics to enable you to work through material in short bursts rather than long sessions. More information about the unit, the learning materials and the assessment is in the Unit Information book.

Materials

The materials in your package comprise:

• Study Guide (this book);

• Unit Information;

• Statistical Manual;

• Statistical Tables;

• Tutorial Problems; and

• CD-ROM.

In addition, you will need a calculator and access to a computer running a recent version of Excel (from Excel97 onwards).

All of these items are discussed in more detail in the Unit Information.

Options for working through the materials

The materials for this unit are divided between those on paper (the Study Guide, etc.) and those on the CD-ROM. I deliberately chose not to put everything on the CD-ROM because some students may not have the ready access to a computer that this would require, and because some people might find doing everything on the computer rather tedious (I, for instance, don’t like reading lots of stuff on the PC; some is okay). For similar reasons, all the tutorials I include can be done using a fairly basic calculator and don’t require a statistical package. (There is an extended discussion on the CD of my reasons for this last decision.)

Because of the way I have put the materials together, you have a few options on how to proceed.

• Work mostly from paper: As all the tutorials can be done using a calculator, you may choose to work mostly from the “paper materials”. You would still need to access the CD-ROM to view the various examples – these are an essential part of the materials – but you need not spend much time with the Excel tutorials or the other materials. With this approach, one or two sessions a week with the computer would probably be sufficient.

• Work mostly from the PC: You can work almost entirely from the computer, as there are copies of all of the printed materials on the CD-ROM. A calculator program is standard equipment on most computers, and the CD-ROM comes with several specialised Excel workbooks. You will, however, need to do calculations during the exam with a calculator, so you need some practice with this.

• Take an intermediate approach: Obvious!

However you choose to proceed, always remember that the Study Guide – this book – is your main guide to working through the unit. You need to work through it carefully, departing from it when directed to do so.

A brief introduction

View the “Introduction to the unit” section on the CD.


Unit content overview


The material in this unit is divided into eight topics:

Topic 1: Approaches to biological studies
Topic 2: Elements of design
Topic 3: Problems and solutions
Topic 4: Testing hypotheses about one or two means
Topic 5: Testing hypotheses about more than two means
Topic 6: Testing hypotheses about frequencies
Topic 7: Testing hypotheses about relationships
Topic 8: An introduction to multivariate analysis

These topics are NOT of equal length: keeping each topic focused on appropriate areas necessarily means that some are longer than others. They should all, however, take only one or two weeks each to complete.

Each topic has text in the Study Guide (this book); most also have tutorial exercises in the Problems book and examples, demonstrations or Excel tutorials on the CD. (The CD itself has a contents list and instructions on how to use it and the materials it contains.)

The SBI209 Statistical Manual, and its associated Statistical Tables, is a general reference for statistical issues, but you will also find it referred to here in the Study Guide (you may, for instance, be directed to read part of the manual to explore certain issues) and elsewhere.

Before the start of each topic you will find an outline, similar to this, which lists the main issues to be covered and the tutorial exercises, demonstrations or other activities which make up the topic.


Topic 1: Approaches to biological studies


Sub-topic 1: Estimation and model testing
• No exercise or additional material.

Sub-topic 2: A framework for testing models
• Example: Testing models about mangrove snails [CD]
• Tutorial: Models, hypotheses and null hypotheses, Tutorial 1 in the Problems book

Sub-topic 3: The role of the statistical test
• Excel exercise: Exploring testing a null [CD]

Sub-topic 4: Type I and Type II errors
• Definition: Type I and Type II errors [CD]
• Excel exercise: Exploring Type I and Type II errors [CD]

Note: [CD] means that the item is on the CD-ROM.



Introduction

The major issues you will investigate in this topic will provide you with a framework for

• identifying two common reasons for collecting information (estimation and model testing);

• applying a systematic approach to testing models in biology;

• understanding the role of statistical tests in the model testing procedure; and

• understanding the nature and origin of two general types of errors that can arise during the procedure.

These investigations should provide you with a framework you can use to organise your thinking about the design and analysis of biological studies.

IMPORTANT: Before continuing you should briefly review some of the terms used in this unit. See Table 1–2 in the Statistical Manual or the CD (the CD has extended discussion).

Sub-topic 1: Estimation and model testing

Biologists collect many different kinds of information and appear to use it for many different purposes. In practice, however, the information is usually collected for one of two basic purposes: to estimate something of interest, or to test some idea of interest.

Read Box 1 for a description of these two purposes.

These two purposes are not mutually exclusive. Information may be collected first to estimate something of interest and subsequently to test some idea (or the other way around). For instance, fisheries managers may estimate the size of a fish stock (population) to determine catch limits for the next year, then compare results for different years to test ideas about how the stock responds to fishing.

Box 1. Estimation and model testing

Estimation: Collecting information to estimate some characteristic – a parameter – of interest: the percentage of females in the population; the rate of population increase; the amount of oxygen in the water. The result here – the estimate – is usually numerical, that is, expressed in numbers. The estimate may then be, and usually is, used to decide what course of action to take: reduce fishing pressure; add nutrients.

Model testing: Collecting information to test some idea – a model – about the world: nutrients are limiting plant growth; disturbance increases diversity. The result here is usually “yes” or “no”: the idea appears to be correct or it doesn’t.

The next sub-topic examines model testing in more detail.

Sub-topic 2: A framework for testing models

This sub-topic describes a framework – a step-wise sequential procedure – for testing models in biology. The foremost proponent of this particular procedure is Professor A. J. Underwood, of the University of Sydney, but other biologists have described similar frameworks.

In Assignment 2 you have to apply this procedure in a practical situation you devise.


The complete rationale and justification for this procedure will not be considered in this unit (see Underwood 1990 and 1997, in the References, for this) but it is worth noting that it does have several advantages:

• It is a falsificationist approach: models are proposed and tested and a model is retained only as long as it is not known to be wrong. This type of approach has logical and philosophical advantages.

• It is methodical: Each step in the procedure is specified and this can illuminate problems or confusion.

• The role of statistical tests is explicit: in particular, the alternate hypothesis and null hypothesis (described later) are logically derived from the model. This also can illuminate problems or confusion.

• It is iterative (repeating): Regardless of the outcome of the test – that is, regardless of whether the null hypothesis is accepted or rejected – progress is made and this is an essential feature of science.

The model testing procedure is illustrated in Figure 1 below and described in the Statistical Manual.

Important terms are listed in Box 2.

Read Section 1.2 of the Statistical Manual, then continue with the Study Guide.

Examine the extended example “Testing models about mangrove snails”, Topic 1–2, on the CD.

Box 2. Model, hypothesis and null

Model: The model is the idea being tested. It may be simple or complex; verbal or mathematical. It may be a description of a pattern, or an explanation for one.

Hypothesis: The hypothesis – often also called the alternate hypothesis – is a prediction of what should be observed in a particular situation if the model is correct.

Null hypothesis: The null hypothesis is the opposite of the hypothesis and is a prediction of what should be observed if the model is not correct. The hypothesis and null hypothesis are mutually exclusive.

This procedure may seem a bit “fiddly” or, perhaps, mechanical but it does emphasise important points that may otherwise be overlooked.

• Multiple explanations: Any observation or pattern can be explained by several, sometimes very different, models. This does not preclude testing the most obvious explanation first, but it should help in keeping alternatives in mind.

• Multiple predictions: Any model can give rise to many different predictions, or hypotheses, about what should happen if it is true. Some hypotheses may be easier to test, or more informative, than others.

• Logical null hypothesis: The null hypothesis is logically derived from the alternate hypothesis (prediction) and its role is clear.

• Imagination is required: Although the procedure may appear mechanical, it actually emphasises the roles of experience, imagination and ingenuity in the scientific process. All of these talents, and more, are required to make interesting observations, valid models and useful hypotheses.

Figure 1. Diagram of the model testing process.

Do Tutorial 1, “Models, hypotheses and null hypotheses”, in the Problems book.

The next sub-topic examines in more detail the role of the statistical test in this procedure.

Sub-topic 3: The role of the statistical test

One of the key features of biological systems – whether the “system” is a single cell or an entire ecosystem – is that they vary. Two seeds from the same plant won’t grow in exactly the same way and two kittens from the same litter won’t behave in exactly the same way.

Also, most of the time that we collect information we can’t observe the entire population. Because we have limited time, money or resources, we usually collect information on just part of the population, a sample.

As a consequence, we have incomplete information based on a sample from a population which varies.

Statistical methods allow us to draw conclusions about the entire population based on the sample collected.

If the samples were collected appropriately, and are large enough, then the conclusions we draw are likely to be reliable. (Later topics consider appropriate ways to collect samples and how to determine if they are large enough.)

When testing models, statistical methods help us to decide between these two alternatives:

• the observations collected are consistent with the null hypothesis so it should be accepted; or

• the observations collected are not consistent with the null hypothesis, so it should be rejected.


Here, the observations are consistent with the null hypothesis if we would be likely to see the observations we got if the null hypothesis were true; they are not consistent with the null hypothesis if we would be unlikely to see those observations if the null hypothesis were true.

Statistical tests may be simple or extremely complex. They may be done with a pocket calculator or require thousands of computations on a powerful computer. Regardless of the complexity of the procedure, any statistical test of a null hypothesis is ultimately done with the aim of deciding whether to accept or reject that null (and the model).

If necessary re-read the section on the “Statistical test” in Section 1.2 of the Statistical Manual, then continue with the Study Guide.

It is important to recognise that a statistical test does not prove that a null hypothesis is either true or false. The test provides objective estimates which we can use to decide whether to accept or reject the null.

Do the Excel tutorial exercise “Testing a null” on the CD. This is a simple exercise in using observations (data) to test an hypothesis. It does not actually involve statistical tests, just your personal assessment of what is “likely” or “reasonable”.

You will NOT find detailed instructions for the Excel tutorials in this book. Each of the workbooks has an “Instructions” page which provides (of course) instructions on how to use the worksheets in the workbook. For many (but not all) of the workbooks there are also instructions on the CD.

The decision we make – to accept or reject the null hypothesis – is based only on probabilities, rather than certainties. It is, therefore, always possible that we will make a mistake (although this may be very unlikely in some situations). The next sub-topic looks at these potential errors.

Sub-topic 4: Type I and Type II errors

In the “Testing a null” exercise you probably found that on some occasions you rejected a correct null hypothesis and on other occasions you accepted an incorrect null hypothesis. Statisticians refer to these two different types of incorrect decisions as Type I and Type II errors.

Read Section 1.6 of the Statistical Manual, then continue with the Study Guide.

There is also a definition of “Type I and Type II errors”, Topic 1–4, on the CD.

The Type I error rate is determined by the significance level used for the statistical test. The usual significance level in biology is 5% (or 0.05 or 1 chance in 20).
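The link between the significance level and the Type I error rate can be seen in a small simulation, in the spirit of the coin-tossing example on the CD. The sketch below is in Python (not part of the unit's materials, and the numbers are illustrative): when the null hypothesis "the coin is fair" is true, a decision rule that rejects the most extreme ~5% of outcomes wrongly rejects the null about 5% of the time.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Null hypothesis: the coin is fair (P(heads) = 0.5).
# Decision rule for 20 tosses: reject the null if we see 5 or fewer,
# or 15 or more, heads. Under the null this happens with probability
# about 0.041, i.e. a significance level of roughly 4-5%.
def reject_null(n_heads):
    return n_heads <= 5 or n_heads >= 15

trials = 20_000
type_i_errors = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(20))  # toss a FAIR coin 20 times
    if reject_null(heads):
        type_i_errors += 1  # the null is true, so every rejection is a Type I error

print(type_i_errors / trials)  # close to the ~0.04 significance level
```

Notice that the simulated Type I error rate matches the significance level of the decision rule: choosing the significance level *is* choosing the Type I error rate.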


The Type II error rate is determined by several factors, including the amount of data (number of replicates) collected, the variability in the population and the design of the study.

Do the Excel tutorial exercise “Exploring Type I and Type II errors” on the CD. This is an extension of the “coin tossing” example designed to look in more detail at Type I and Type II errors.

Summary

• Biologists collect information to either estimate some characteristic of the population being studied, or test a model about how the world works. Sometimes a study may be designed to collect information which will be used for both purposes.

• By adopting a particular framework, we can make the process of testing ideas, or models, clearer and more methodical. The framework described – due to Underwood – proceeds from Observation (usually) to Model, to Hypothesis (HA), to Null Hypothesis (H0), to Experiment, to Statistical Test and, finally, to a decision about the Null Hypothesis and, therefore, the Hypothesis and Model.

• Statistical tests are done to guide our decision to either accept or reject the null hypothesis. Most tests in biology are done at a significance level of 5% (or 0.05), but this is just a convention and may be varied if circumstances require it.

• The outcome of the statistical test will, hopefully, lead us to make the correct decision about the null hypothesis, but it is always possible (although, again hopefully, unlikely) that we may either incorrectly reject a true null or incorrectly accept a false null. Statisticians refer to these two types of mistakes, respectively, as Type I and Type II errors.


Topic 2: Elements of design


Sub-topic 1: Variables, distributions and summary statistics
• Example: Types of variables [CD]
• Exercise: Types of variables; Exercise 1, in Tutorial 2 in the Problems book
• Excel exercise: Distributions [CD]
• Exercises: Mathematical operations, Memory operations, Statistical operations, Summary statistics; Exercises 2–5, in Tutorial 2 in the Problems book
• Excel exercise: Summary statistics [CD]

Sub-topic 2: Accuracy and precision
• Exercises: Precision; Exercise 6, in Tutorial 2 in the Problems book
• Excel exercise: Reliability [CD]
• Example: Sample size and precision [CD]

Sub-topic 3: Estimation and sampling schemes
• Exercises: Sampling schemes; Exercise 7, in Tutorial 2 in the Problems book

Sub-topic 4: Manipulative experiments
• Example: Mensurative and manipulative experiments [CD]

Note: [CD] means that the item is on the CD-ROM.



Introduction

Topic 1 introduced a framework for testing biological ideas (or answering biological questions). In this topic, we look in more detail at the kinds of information biologists collect and the main reasons they collect it. We shall also examine factors which can affect the reliability of that information.

The major issues you will investigate in this topic will provide you with a framework for:

• identifying the types of variables biologists collect information on;

• understanding and evaluating the importance of the accuracy and precision of this information;

• appreciating the advantages and disadvantages of different study designs; and

• understanding the difference between mensurative and manipulative experiments and considerations important for both.

Sub-topic 1: Variables, distributions and summary statistics

Terms

In the unit so far, we have considered the observations that biologists might make only in general terms as “information”, “data” or “results”. It is now time to be more specific.

If you have not already done so, review the terms indicated in Table 1–2 of the Statistical Manual, then continue with the Study Guide.

Definitions of some of the most important terms are summarised in Box 3.

A one sentence description of why biologists make observations is:

• Biologists make observations of selected variables on a sample from the population to estimate the value of one or more parameters of that population.

The variables are the features, or characteristics, of the population that the biologist is interested in. The parameters are unknown values for the target population. Because we usually cannot examine the entire population, we try to learn about it by making observations on a sample from it. For instance, we might measure the amount of lead in a sample of oysters to estimate the average amount of lead in the entire target population. In this case, the variable is “amount of lead” and the parameter of interest is the “average amount in the population”.
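The sample-to-population logic in the oyster example can be sketched in a few lines of Python (the numbers below are made up for illustration; the unit itself uses Excel or a calculator):

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# A hypothetical "population": the lead content (ppm) of 1,000 oysters.
# In a real study we could never observe every one of these values.
population = [random.gauss(2.0, 0.5) for _ in range(1000)]

# We can only afford to examine a random sample of, say, 30 oysters.
sample = random.sample(population, 30)

# The sample mean is our ESTIMATE of the population PARAMETER
# (the true average lead content of all the oysters).
estimate = sum(sample) / len(sample)
true_mean = sum(population) / len(population)

print(round(estimate, 2), round(true_mean, 2))  # the two should be close
```

Here "amount of lead" is the variable, the 1,000 values are the population, the 30 values are the sample, and the population mean is the parameter being estimated.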

This discussion may appear most relevant to situations where the information is being collected for the purposes of estimation. For instance, estimates of the lead content of oysters may be required for public health purposes. The discussion is, however, equally relevant to model testing, although in such cases estimates might be made for two or more different populations. For instance, estimates of the lead content of small and large oysters might be made to test hypotheses, and models, about how the animals accumulated the metal during growth.

Box 3. Variables, populations, samples and parameters

Variable: A variable is any observable feature of the natural world. Examples of variables are: the number of limpets in a quadrat, the moisture content of a leaf, the sex of a frog. These are all variables because they have the potential to vary.

Population: The set of all possible observations on a variable is the population. The population is the thing being studied. For example, we might want to know the average weight of barramundi in Yellow Waters Lagoon. The variable is the weight of barramundi and the population would be the weights of all fish in the lagoon.

Sample: Large and infinite populations cannot be observed in their entirety, so we take only a sample – a (sub)set of observations, nearly always chosen randomly – and attempt to draw conclusions about the population from this sample.

Parameter: A parameter is some characteristic of the distribution of the values of a variable in a population. For example, if the population is the weights of barramundi in Yellow Waters Lagoon now, then one parameter of that population is the mean (i.e. the average weight); another is the variance.

Types of variables

Not all observations are created equal! In practice, biologists commonly make observations on three different types, or kinds, of variables.

These three types of variables are summarised in Box 4.

Box 4. Types of variables

Nominal or Classification: Features which can be classified into named groups, lacking order, such as habitat, sex, guild, organelle.

Ordinal or Ranking: Features which can be ranked in order, such as social position or size-class.

Numerical or Quantitative: Features which can be enumerated or quantified (counted or measured), such as weight, number, temperature. Discrete numerical variables are things, such as individuals, which can only be counted in whole numbers, whereas continuous numerical variables are things which can be measured in fractions, such as weight.

For further information read section 1.4 of the Statistical Manual, then continue with the Study Guide.

Do Exercise 1 in the tutorial “Variables, distributions and summary statistics” in the Problems book.

IMPORTANT: The kinds of operations, and statistical tests, which are sensible and appropriate depend on the types of variables which have been recorded. This is why it is important to be able to recognise the different types of variables.

The two names for each type of variable (e.g. nominal and classification) are interchangeable and used about equally often, so you should become familiar with BOTH terms.


Examine the extended example “Types of variables”, Topic 2–1, on the CD.

Frequency distributions

Information in its “raw” state is rarely very useful. To make sense of it – even before we use it for estimation or model testing – we may need to organise it, summarise it, or present it in a more interpretable form. One way of presenting data in a more interpretable form is to draw a graph showing its frequency distribution. The x-axis of the graph shows the different categories of observations, while the y-axis shows how often each of those observations was seen, or recorded, in the sample.

[Figure 2 is a histogram: x-axis SIZE CLASS (classes from 14 to 38 mm), y-axis NUMBER OF SNAILS (0 to 80); Mean = 30.1 mm, Variance = 24.5 mm²]

Figure 2. Size-frequency distribution for snails.

For instance, Figure 2, which we shall look at in more detail in Topic 6, shows the numbers (or “frequencies”) of snails counted in different size classes in a mangrove forest.

Read section 1.5 of the Statistical Manual for a little more on frequency distributions, then continue with the Study Guide.

Frequency distributions can be drawn for the results of observations of any type of variable. When frequency distributions for continuous variables are constructed, however, the observations usually have to be grouped together into classes; for example, size-classes (as in Figure 2).
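If you want to check your grouped counts by computer, the grouping of continuous observations into classes can be sketched in a few lines of Python. The shell lengths below are invented for illustration; they are not the data behind Figure 2.

```python
from collections import Counter

def frequency_distribution(values, class_width):
    """Group continuous observations into classes and count each class.

    Each value is assigned to the class whose lower bound is the largest
    multiple of class_width not exceeding the value."""
    classes = Counter((v // class_width) * class_width for v in values)
    return dict(sorted(classes.items()))

# Hypothetical shell lengths (mm) for a handful of snails
lengths = [14.2, 15.8, 16.1, 17.4, 18.0, 18.9, 19.5, 21.3]
print(frequency_distribution(lengths, class_width=2))
# → {14.0: 2, 16.0: 2, 18.0: 3, 20.0: 1}
```

Changing `class_width` regroups the same observations into wider or narrower classes, which is exactly the choice you face when drawing a size-frequency histogram by hand.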

IMPORTANT: Examining the frequency distribution for some data is often a good way of checking that the results conform to your expectations. If the shape of the distribution is not as you expect it to be, then you may have made mistakes or something unusual may be going on.

Summary statistics

A frequency distribution provides very complete information about the results for a variable in a sample but, in some cases, it may tell us more than we need to know. Also, graphs of frequency distributions tend to take up rather a lot of space, so plotting distributions may not be the best approach when we simply want to review the main features of the results. For this reason, statisticians have derived summary statistics which, as the name suggests, summarise in statistics (that is, numbers) some key features of the data. There are two categories of such statistics: measures of location and measures of shape.

These statistics are summarised in Boxes 5 and 6. Box 7 shows the symbols used to refer to the most commonly used statistics for samples and populations.


Box 5. Measures of location
These statistics provide information on where most of the values in the sample (and the population) lie. The examples below use this sample: 3, 4, 4, 5, 6, 7, 8.
Mean: the average value. The type of mean most commonly used is the arithmetic mean, calculated by adding up all of the values and dividing by the number of values. The arithmetic mean of the values above is about 5.29. (See Section 3.1 in the Statistical Manual for information about the other types of means.)
Median: the middle value, if the observations are put in order. The median of the sample above is 5.
Mode: the most common value; 4 in the sample above.

Box 6. Measures of shape
These statistics provide information about the spread of the values in the sample and the shape of the distribution.
Variance: a measure of how much variation there is in the sample. If most of the values are close together, the variance will be small; otherwise it will be large. This is the only one of these measures that is commonly used. A related measure is the standard deviation, which is just the square root of the variance. The standard deviation is sometimes more convenient to use because it is usually a smaller number.
Kurtosis: measures the “peakedness” of a distribution: think of spiky mountains compared to flat hills.
Skewness: measures the extent to which the distribution is “pushed”, or skewed, to one side or other of the mean. If a distribution is very skewed, the mean may not give a good idea of the “average” value. Skewness is likely to create more problems than kurtosis.
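The statistics in Boxes 5 and 6 can be calculated with Python's standard `statistics` module. The sketch below uses the Box 5 sample, so you can check the values quoted there.

```python
import statistics as st

sample = [3, 4, 4, 5, 6, 7, 8]  # the sample used in Box 5

mean = st.mean(sample)      # arithmetic mean: 37/7 ≈ 5.29
median = st.median(sample)  # middle value: 5
mode = st.mode(sample)      # most common value: 4

# Measures of shape (Box 6): the sample variance uses n - 1 in the
# denominator, and the standard deviation is its square root.
variance = st.variance(sample)
sd = st.stdev(sample)

print(round(mean, 2), median, mode, round(variance, 2))
# → 5.29 5 4 3.24
```

Doing the same calculation once by hand, and then checking it this way, is a good habit: it catches both arithmetic slips and misunderstandings of the formulas.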

Box 7. Symbols
Mean of a sample: X̄ (read as “X-bar”)
Mean of a population: µ (read as “mu”)
Variance of a sample: s² (read as “s squared”)
Variance of a population: σ² (read as “sigma squared”)
Standard deviation of a sample: s (read as “s”)
Standard deviation of a population: σ (read as “sigma”)
Important: the standard deviation is the square root of the variance; this is the reason that the variance (of a sample) is “s²” and the standard deviation is just “s”.

Do Exercises 2 to 5 in Tutorial 2, “Variables, distributions and summary statistics”, in the Problems book.

IMPORTANT: You can use the Excel workbook “Summary Statistics” on the CD to do Exercise 5 (see below) but it is a good idea to do part of this exercise using your calculator. You could use the workbook to more easily explore the effect of changing the number of classes.

Explore the Excel workbook “Summary Statistics” on the CD and compare the results you get using it for Exercise 5 to your own calculations. Explore the effect of distribution shape on the mean, median and mode (see the CD for more details).


Sub-topic 2: Accuracy and precision

Estimates are of little value unless they are reliable. There are two different aspects of reliability which are important: accuracy and precision. Estimates are accurate if they are not biased away from the true value. If, for instance, you are weighing samples but the weight balance is not properly zeroed, then your readings will be biased – always a bit too high or low – and not accurate. In contrast, estimates are precise if repeated estimates tend to be close together. For instance, if you weigh the same sample three times and get values of 1.35 g, 1.34 g and 1.35 g, then your readings are precise. Of course, if the weight balance is not zeroed properly, these readings could still be biased and inaccurate.

A “dart board” analogy often makes this clearer (Figure 3). In this picture, the circle “darts” are precise and accurate: they are close together and grouped around the “bull’s eye”. The star darts are accurate – because they tend to be centred on the bull’s eye – but they are all over the place and not precise. In contrast, the diamond darts are precise – they are all close together – but they are inaccurate, being way to the right, and slightly above, the bull’s eye. The square darts are the worst of the lot: they are neither precise – they are all over the place – nor accurate – they are all over to the left of the bull’s eye.

Figure 3. “Dart board” illustration of the terms “accuracy” and “precision”.
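A small simulation can illustrate the distinction. The sketch below invents a “balance” with a systematic bias (which makes readings inaccurate) and random noise (which makes them imprecise); all the numbers are hypothetical.

```python
import random
import statistics as st

random.seed(1)
TRUE_WEIGHT = 1.30  # grams; hypothetical sample

def weigh(n, bias, noise_sd):
    """Simulate n readings from a balance with a systematic bias
    (inaccuracy) and random noise (imprecision)."""
    return [TRUE_WEIGHT + bias + random.gauss(0, noise_sd) for _ in range(n)]

biased_precise  = weigh(1000, bias=0.05, noise_sd=0.005)  # like the diamond darts
unbiased_sloppy = weigh(1000, bias=0.00, noise_sd=0.05)   # like the star darts

# Bias shows up as a shifted mean; imprecision as a large spread.
print(round(st.mean(biased_precise) - TRUE_WEIGHT, 3))  # ≈ 0.05
print(round(st.stdev(unbiased_sloppy), 3))              # ≈ 0.05
```

Notice that the biased readings look very consistent: precision alone tells you nothing about accuracy, which is why accuracy must be checked against the measurement process itself (as discussed below).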

Accuracy

Ensuring that observations are accurate and precise requires paying attention to two different sorts of issues. Accuracy is a function of the methods used to make the measurements. To ensure that observations are accurate you have to pay attention to all the processes involved in making them. Any instruments used must be free of bias, in good working order and properly calibrated. You should select samples randomly so that they properly represent the population (more on this later). And you should take care when taking readings to ensure that you don’t introduce errors, particularly systematic errors. In some situations some bias is inevitable. For instance, nets are known to be selective, tending to catch only particular kinds and sizes of fish. When there are problems like this, you need to be sure that the resulting information will still be useful and that you take any biases into account when interpreting the results.


It is critically important that you make sure that your observations are as accurate as required. Doing this, however, involves evaluating the observational process and this process will depend greatly upon the type of measurements being taken. It is primarily a methodological, and not statistical, issue so we won’t consider it further. It is very important to realise that the accuracy of measurements usually cannot be determined after the fact from the measurements themselves. The accuracy of a set of measurements can usually only be determined by evaluating the measurement process. It may, for instance, involve the comparison of measurements taken using different methods.

Precision

In contrast, precision is a function of characteristics of the sample. Four of the most important aspects of the sample are:

• the number of observations made, usually referred to as the number of replicates but also called the sample size;

• the variability in the sample, which is a function of the variability in the population;

• the actual size of each individual sample (or replicate), which is, unfortunately, easy to get mixed up with the number of replicates (“sample size”); and

• the arrangement of the samples.

The section below examines how the precision of estimates is measured. It also considers how the precision of estimates can be affected by the number of replicates and the size of the sample unit (e.g. quadrat, water sample, etc.). Sub-topic 3 looks at the issue of the arrangement of the samples; that is, how they are collected.

Precision, number of replicates and sample size

It is usually easy to determine the precision of an estimate from the observations collected and, in some circumstances, it may be possible to collect more observations to improve precision if this is required. The two most commonly used measures of precision are the standard error of the mean and a confidence interval around the mean. The latter is based on the standard error but can be easier to interpret.
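As a sketch of these calculations, the standard error and an approximate 95% confidence interval can be computed as below. The critical t value (2.447 for 6 degrees of freedom, two-tailed, 0.05) is the sort of value you look up in the Statistical Tables; the sample is the one from Box 5.

```python
import math
import statistics as st

def standard_error(sample):
    """Standard error of the mean: s / sqrt(n)."""
    return st.stdev(sample) / math.sqrt(len(sample))

def confidence_interval_95(sample, t_crit):
    """Approximate 95% CI: mean ± t × SE. The critical t value for
    n - 1 degrees of freedom must be looked up in a t table."""
    m, se = st.mean(sample), standard_error(sample)
    return (m - t_crit * se, m + t_crit * se)

sample = [3, 4, 4, 5, 6, 7, 8]  # the Box 5 sample
t_crit = 2.447                  # t for 6 df, two-tailed, alpha = 0.05

print(round(standard_error(sample), 3))                               # → 0.68
print(tuple(round(x, 2) for x in confidence_interval_95(sample, t_crit)))
# → (3.62, 6.95)
```

Because SE = s/√n, quadrupling the number of replicates only halves the standard error: precision improves with sample size, but with diminishing returns.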

Read Section 4.2 of the Statistical Manual then continue with the Study Guide.

Do Exercise 6 in Tutorial 2, “Variables, distributions and summary statistics”, in the Problems book.

IMPORTANT: You can use the Excel workbook “Reliability” on the CD to do Exercise 6 (see below) but it is a good idea to also do this exercise using your calculator.

Explore the “SE” and “Confidence Interval” pages in the Excel workbook “Reliability” on the CD.

Examine the example “Sample size and precision”, Topic 2–2, on the CD.


Sub-topic 3: Estimation and sampling schemes

I have already mentioned that one thing that can affect the precision of estimates is the arrangement of the samples, in other words, how they are collected in space (or time). In fact, particular kinds of arrangements – or sampling schemes – have been designed for different situations. In this sub-topic we will look at the features and uses of these.

There are four basic types of scheme (Figure 4; Table 1).

• Simple random sampling. In this, the simplest scheme, samples are just randomly selected from the area, or population, of interest. For example, someone trying to estimate the number of starfish on a rocky shore might simply throw quadrats around randomly in the area of interest and count the number of animals in each.

• Stratified sampling. In this scheme, the area, or population, to be sampled is first divided into strata, with samples taken randomly within each stratum. For instance, someone studying beach animals may divide the shore into low-shore, mid-shore and high-shore zones, and then take random samples within each of these three zones. Stratified sampling often provides more precise – sometimes very much more precise – estimates than other methods because samples within each stratum tend to be rather similar. This similarity results in reduced variability and increased precision. In addition to providing estimates of population characteristics, such as mean abundance, this scheme is useful for testing hypotheses; this is discussed in more detail below.

• Cluster sampling. This scheme is often used when the items of interest tend to occur in clumps or clusters. In the first stage of sampling, several clusters are randomly selected. In the second stage, random items from each cluster are selected for measurement. There may, however, be more than two stages of selection. For instance, someone studying leaf physiology might first select some trees randomly from those of interest. They would then randomly select some branches on each tree, finally taking some leaves randomly off each branch. In situations where “things”, such as leaves, occur naturally in clusters, this type of sampling can be very efficient and save considerable time. Think which would be quicker: picking 27 leaves randomly – truly randomly – from a forest, or picking 3 randomly selected leaves from each of 3 randomly selected branches from each of 3 randomly selected trees?

• Systematic sampling. In this scheme, the starting point might be randomly selected but after this samples are selected according to some system. For instance, someone studying crops in an orchard might sample every second tree.
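The four schemes can be sketched with Python's `random` module. The “population” of 120 numbered sampling positions, and its division into three zones, is invented purely for illustration.

```python
import random

random.seed(0)
population = list(range(120))  # 120 hypothetical sampling positions
strata = {"low": population[:40], "mid": population[40:80],
          "high": population[80:]}  # e.g. three shore zones

# Simple random sampling: any 12 positions from the whole area.
simple = random.sample(population, 12)

# Stratified sampling: 4 random positions within each stratum.
stratified = [p for zone in strata.values() for p in random.sample(zone, 4)]

# Systematic sampling: a random start, then every 10th position.
start = random.randrange(10)
systematic = population[start::10]

# Cluster sampling: split the area into 12 clusters of 10 positions,
# randomly pick 3 clusters, then 4 random positions within each.
clusters = [population[i:i + 10] for i in range(0, 120, 10)]
chosen = random.sample(clusters, 3)
cluster = [p for c in chosen for p in random.sample(c, 4)]

print(len(simple), len(stratified), len(systematic), len(cluster))
```

All four designs collect 12 observations here, but they spread those observations over the area very differently, which is what drives the differences in precision and bias summarised in Table 1.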


Figure 4. Illustration of four basic types of sampling schemes.

Table 1. Advantages, disadvantages and uses of different schemes.

Simple random (SR)
  Advantages: usually simple to use.
  Disadvantages: provides limited information; probably not efficient or precise.
  Uses: pilot studies; simple studies.

Stratified
  Advantages: usually provides more precise results than other methods; provides more information than SR.
  Disadvantages: more complex to run and analyse; may take more time to sample.
  Uses: situations where an area, or population, can be divided into homogeneous strata; testing hypotheses.

Cluster
  Advantages: when the situation is suitable, this scheme is likely to be more efficient; provides more information than SR.
  Disadvantages: more complex to run and analyse; may be less precise than stratified sampling.
  Uses: situations where items of interest are naturally grouped in clusters; can be used to test some types of hypotheses.

Systematic
  Advantages: usually simple to use.
  Disadvantages: unless done carefully, may provide biased estimates.
  Uses: drawing maps and similar situations.

It is important to realise that the last scheme – systematic sampling – can have serious problems in some situations. For instance, suppose that, in the “orchard” example above (and in Figure 4), we were interested in the water status of the trees: were they getting enough water? If irrigation lines ran next to every second tree, and these happened to be the trees we sampled, then we would get a biased and unreliable idea of how the trees were doing. Randomly selecting trees to sample would solve this problem.

IMPORTANT: Systematic sampling may provide biased estimates in some circumstances and should only be used in situations, such as drawing maps, where a very even coverage of an area is required.
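A toy simulation shows how badly systematic sampling can go wrong in the orchard example. The water-status scores (80 for a well-watered tree, 40 for a dry one) are invented.

```python
import random
import statistics as st

random.seed(3)
# Hypothetical orchard: even-numbered trees sit beside an irrigation
# line and are well watered; odd-numbered trees are drier.
water_status = [80 if i % 2 == 0 else 40 for i in range(100)]

# Systematic sampling: every second tree, starting at tree 0.
systematic = water_status[0::2]

# Simple random sampling of the same number of trees.
random_sample = random.sample(water_status, 50)

print(st.mean(water_status))                # true mean: 60
print(st.mean(systematic))                  # 80 -- badly biased
print(round(st.mean(random_sample), 1))     # close to 60
```

The systematic sample hits only irrigated trees and overestimates the water status by a third, while the random sample of the same size lands near the true mean.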

Do Exercise 7 in Tutorial 2, “Variables, distributions and summary statistics”, in the Problems book.

Testing hypotheses

Stratified schemes, in addition to providing estimates of population characteristics, can be used to test hypotheses. For instance, we may have developed a model which says that beach worms should be most abundant lower on the beach where it is damper. After stating appropriate alternate and null hypotheses, we could test this model by collecting samples at different distances up the beach. The scheme might end up being very similar to that illustrated in Figure 4, where the beach is divided into three zones, parallel to the water, and samples are taken in each of these zones (and see next section).

Sub-topic 4: Manipulative experiments

If, as described above, we test a model about beach worms by collecting samples in zones at different distances from the water, then we are doing a sampling study. Such studies may also be called sampling experiments or mensurative experiments. In discussing these, and other, types of experiments, it will help to develop this example further.

• Observation: Casual observations suggest that beach worms might be more abundant closer to the water line where the sand is wetter longer.

• Model: Beach worms are more abundant closer to the water line (so my casual observations were correct; this is an example of an observation model; see Box 8).

• Hypothesis (HA): If I divide the beach into three zones, parallel to the water line, and I take samples of sand from each zone, then the numbers of worms in the sample will be higher in the samples in the lower zones.

• Null hypothesis (H0): If I divide the beach into three zones, parallel to the water line, and I take samples of sand from each zone, then the numbers of worms in the sample will not be higher in the samples in the lower zones.

• Mensurative experiment: Divide the beach into three roughly equal strips, parallel to the water line, and collect four (say) samples of sand from random positions in each zone and count the number of worms in each sample.

This is a sampling experiment – or sampling study – because all that is required to test the hypothesis is to collect some observations. In this particular case, collecting the observations requires that we first collect appropriate samples of sand, because the worms are usually hidden in the sand. If the animals were always obvious on the surface, then we would simply need to go to each randomly selected spot in each zone and count the number of animals in a defined area (as is done, for instance, on the rocky shore). Even if we need to take the samples back to the laboratory and subject them to complex processing in order to collect the observations (e.g. perhaps to extract and identify the animals), this would still be a sampling experiment because, aside from collecting the samples, we don’t alter the environment. Also, note that the hypothesis specifies that this is a sampling experiment – read it and you will see this – it is not something that is added later.

Box 8. Sampling experiments and observation models
Observation: casual observations suggest that beach worms might be more abundant closer to the water line where the sand is wetter longer.
Model: beach worms are more abundant closer to the water line (so my casual observations were correct).
The observation and model here illustrate one common use of sampling experiments: to test observation models. The observation in this example essentially states that a pattern in the distribution of the beach worms appears to exist; specifically, that they are more common lower on the shore. Now, if this pattern does exist, then it may be due to the sand being damper, as suggested, or perhaps predators (such as birds) are less effective, or perhaps food is more abundant. All of these explanations are process models: they invoke biologically plausible reasons (or processes) to explain what we have observed. It is, however, also possible that my casual observations were misleading and that the worms are not actually more common lower on the shore. Thus, in some situations – such as this one – it may be prudent to check that the observations are correct and that the supposed pattern does actually exist. To do this, we can propose an observation model that simply states that the casual observations were reliable.

Sampling experiments, designed to test appropriate hypotheses, are one way to test models but they have some disadvantages (and also some advantages; see later). In particular, they rarely provide unambiguous tests of models concerning biological processes. For instance, suppose that we discover that beach worms are indeed more abundant lower on the shore. Does this prove that the reason for this is that the sand is wetter for longer? Of course not; there are several other plausible explanations (see Box 8).

More definitive tests of models can be done by proposing hypotheses which are tested using manipulative experiments. Suppose that, by doing the sampling experiment described earlier, we have established that beach worms are more abundant lower on the shore. We might then construct an explanation for this observation and propose an hypothesis requiring an experimental test.

• Observation: Beach worms are more abundant lower down the shore, where the sand is damper for longer.

• Model: Beach worms are more abundant lower because they cannot tolerate being dried out as would occur higher on the shore.

• Hypothesis (HA): If I transplant worms (with sand) to a mid-shore region and keep them damp (by spraying sea water over the sand) then they will survive better than worms transplanted to the mid-shore but not kept damp, and as well as worms lower on the shore.

• Null hypothesis (H0): If I transplant worms (with sand) to a mid-shore region and keep them damp (by spraying sea water over the sand) then they will not survive better than worms transplanted to the mid-shore but not kept damp, or as well as worms lower on the shore.

• Manipulative experiment: Set up replicate trays containing sand from the low shore region and put equal numbers of worms into each, then set up these treatments: (a) trays placed low on the shore; (b) trays placed in mid-shore regions; and (c) trays placed in mid-shore regions and watered (with sea water). Keep track of the numbers of worms remaining in each tray (see diagram).

Figure 5. Experiment to test whether greater moisture is responsible for more worms lower on the shore.

This is a manipulative experiment because we are deliberately and carefully altering the natural conditions in selected parts of the environment. Because these parts of the environment – in this case, boxes of sand with worms – start out the same, any difference that develops must be due to the manipulation. In this case, if the worms do better in the watered boxes in the mid-shore area (compared to the unwatered boxes) then the only plausible explanation for this result is an effect of water. It can’t be anything else because we started with the same type of sand, and the same number of worms, in all the boxes. Manipulative experiments provide direct and unambiguous tests of hypotheses (and models), if done correctly (how to ensure that these experiments are done correctly is the next topic).

The example described here is a field experiment. The rationale for laboratory experiments is similar, although conditions in such experiments are usually more artificial and tightly controlled. Indeed, an experiment, like the one described here, could have been done in the laboratory. (The particular experiment described here contains some complexities: these will be examined in the next topic.)

Examine the example “Mensurative and manipulative experiments”, Topic 2–4, on the CD.

Summary

• The three types of variables observed are nominal or classification (or categorical), ordinal or ranking, and numerical or quantitative.

• To be reliable, observations and estimates should be accurate (not showing bias) and precise (not too variable).


• The three basic types of sampling schemes used are simple random sampling, cluster (or nested or hierarchical) sampling and stratified sampling. Systematic sampling may be useful for certain specific tasks (e.g. drawing maps).

• Models may be tested, after stating appropriate hypotheses, by doing sampling (or mensurative) experiments or manipulative experiments. The latter usually provide more definitive tests of models about processes.


Topic 3: Problems and solutions


Unit topics: Topic 1, Approaches to biological studies; Topic 2, Elements of design; Topic 3, Problems and solutions; Topic 4, Testing hypotheses about one or two means; Topic 5, Testing hypotheses about more than two means; Topic 6, Testing hypotheses about frequencies; Topic 7, Testing hypotheses about relationships; Topic 8, An introduction to multivariate analysis.

Topic 3 sub-topics and materials:
• Sub-topic 1: Problems with replication – no exercise or additional material.
• Sub-topic 2: Problems with confounding – explanation and examples: Types of confounding [CD].
• Sub-topic 3: Problems with independence – examples and further explanation of non-independence [CD].
• Sub-topic 4: Review exercises – exercises: Experimental problems; Exercise 1, in Tutorial 3 in the Problems book.
• Sub-topic 5: Enhancements to design – no exercise or additional material.
Note: [CD] means that the item is on the CD-ROM.


Introduction

It is sad, but true, that a superb analysis cannot save a poorly designed and executed study. And, unfortunately, studies – including published studies – have problems of design or execution more often than is desirable. For example, Underwood (1985) reviewed studies testing ideas about competition and concluded that 55% had one problem or another. Hurlbert (1984) reviewed studies in ecology and found that 47% were pseudoreplicated (a problem discussed below). And when I analysed experimental studies published in the journal Ecology I found that about 25% had non-independent data (and didn’t deal with this correctly). In most cases, the problems identified concern the design of the study, not the subsequent analysis of the data (although there may also be problems there).

The situation appears to have improved since these reviews were done but problems still occur all too often. Further, these problems are generally more serious than analysis flaws because analyses can always be redone (as long as the data exist). But once a study is over – and sometimes once it has started – it is usually too late to fix problems. I know this from personal experience. The largest experiment (in terms of replicates) I did for my Ph.D. – which concerned the ecology of organisms living on sea shore boulders – involved marking and manipulating hundreds of rocks of various sizes. Unfortunately, perhaps because I got carried away with the size of the study, the experiment contained a fatal flaw which would have made any results uninterpretable (I left out a critical control). (Fortunately(!), a storm destroyed the whole thing before I had to decide what to do with the data. I then set up a much more modest, but correctly designed, study.)

The major issues you will investigate in this topic will provide you with a framework for

• identifying common problems in the design of sampling and manipulative experiments;

• understanding the implications of different sorts of design problems or flaws; and

• deciding how to resolve such problems.

Sub-topic 1: Problems with replication

As discussed earlier in this unit, biological systems are inherently variable: individual animals, plants, tissue samples and study plots are likely to show varying responses in any situation (even sometimes in those cases where the different individuals are genetically identical). Statistical methods enable us to draw reliable conclusions, despite this inherent variation, but to use these methods we need to have some idea of the range of variation in responses. This usually means that we need to make several repeated, or replicate, observations under similar conditions.

Now suppose that we suspect that the females of a particular species of frog are larger than the males. Consider the following three sets of observations.

• Set 1 – Find one female and one male and measure the weight of each frog.


• Set 2 – Find one female and one male and measure the weight of each frog several times.

• Set 3 – Find several females and several males and measure the weight of each frog.

The first set of observations (Set 1) contains no replicates and no information about variability in the weight of female and male frogs. Obviously, we can’t draw any reliable conclusions about males and females from these data (and even if males and females are the same size on average, there is a 50% chance that any randomly selected female will be bigger than a randomly selected male). This situation is unreplicated.

The second set of observations (Set 2) does contain replicate measurements but these are just repeated measurements of the same two frogs. These data tell us how precise our measurements of weight are but do not provide any information on variation in the size of males and females. Here we have (or appear to have) replicate observations but they are not the right sort of replicates for testing the hypothesis. This situation is pseudoreplicated.

The third set of observations (Set 3) provides information on the average weights of male and female frogs and on how much variation there is among different male, and different female, frogs. We can use this information to do a valid test of the hypothesis. This situation is properly replicated.

A properly replicated study is essential and the problems of no replication and pseudoreplication are examined in a little more detail below. A separate, but still important, problem may occur if a study is insufficiently replicated. In this situation, the study is designed correctly – and proper replicate observations have been collected – but the number of observations is not sufficient to accomplish the objectives of the study. For instance, it may be that male and female frogs really do differ in size (as measured by weight) but the actual difference is rather small compared to the variation among frogs of the same sex (e.g. the difference between males and females may be 10 g but in the population frogs of the same sex may differ by up to 15 g). In this situation, a study with limited replication – say two or three frogs of each sex – may not provide a sufficiently powerful test of the hypothesis: a larger sample size may be needed. This is an important problem but it is beyond the scope of this unit to examine it in detail.
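A small simulation can make the problem of insufficient replication concrete. The figures follow the frog example above (a true 10 g difference between the sexes, with a 15 g standard deviation within each sex); the mean weights themselves are invented.

```python
import random
import statistics as st

random.seed(7)

def observed_difference(n):
    """Simulate weighing n female and n male frogs and return the
    observed difference in mean weight. The true difference is 10 g,
    with a 15 g standard deviation within each sex (figures invented
    for illustration)."""
    females = [random.gauss(110, 15) for _ in range(n)]
    males = [random.gauss(100, 15) for _ in range(n)]
    return st.mean(females) - st.mean(males)

# With 3 frogs per sex the estimated difference bounces around wildly;
# with 100 per sex it settles near the true value of 10 g.
small = [observed_difference(3) for _ in range(5)]
large = [observed_difference(100) for _ in range(5)]
print([round(d, 1) for d in small])
print([round(d, 1) for d in large])
```

Run this a few times with different seeds: the small-sample estimates can even come out negative, which is exactly why a weakly replicated study may fail to detect a real difference.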

The problem of no replication

If a study is unreplicated, then differences – for instance, among experimental treatments – may just be due to chance and there is no way of determining how likely this is. Consider the experiment in Figure 6, which is designed to test the model that predatory snails limit the numbers of clams on the shore. The researcher has found a piece of shore where clams and snails occur, then removed the snails from one large patch (and he will continue to do this to maintain this state). If, over time, the number of clams in the “removal” treatment is greater than the number in the “remain” treatment, then this could be due to predation or it might just mean that the “removal” patch was more favourable for other reasons. There is no way to tell from this experiment.


Figure 6. Unreplicated experiment testing effect of predators on clams.

The problem of pseudoreplication

In a pseudoreplicated study, replicates appear to be present but they are the “wrong kind” of replicates. Consider the revised version of the experiment above (Figure 7). Now the researcher will have replicate counts from plots where predators are “removed” and “remain” but the basic design of the study has not changed. All the “removed” replicate counts still come from the same patch of shore and the problem identified above (it may be a better bit of shore) remains. (In fact, the lack of appropriate replication here means that the study is confounded, an issue discussed in the next section.) To do a valid experiment, the researcher would have to locate several different patches of seashore containing both species and then remove the predatory snails from (a randomly selected) half of these patches (as, for example, in Figure 8).

Figure 7. Pseudoreplicated experiment testing effect of predators on clams.

Figure 8. Properly replicated experiment testing effect of predators on clams.

Studies can also be pseudoreplicated in time. Probably the most common situation in which this is done is when testing for seasonal changes. Suppose that I want to test if there is a seasonal pattern in the abundance of frogs in a lagoon, so this year I go out and collect these observations:

• 2003, Wet: January = 35; February = 32; March = 33

• 2003, Dry: June = 28; July = 15; August = 23

There are certainly fewer frogs in the dry season in this particular year than in the wet season but this may be an unusual year. To do a valid test for a seasonal pattern in abundance I would need (unfortunately) to collect observations in the wet and dry seasons of at least two years (and more would be better). Only then would I have a properly replicated (and not pseudoreplicated) set of observations.
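The replicated scheme is easy to lay out explicitly. The short Python sketch below builds the full sampling schedule (the years and months follow the frog example; in practice you would substitute your own):

```python
# Hypothetical sampling schedule for the frog example: to test for a
# seasonal pattern, each season must be replicated across two or more years.
seasons = {"Wet": ["January", "February", "March"],
           "Dry": ["June", "July", "August"]}
years = [2003, 2004]  # at least two years gives true replicates of "season"

schedule = [(year, season, month)
            for year in years
            for season, months in seasons.items()
            for month in months]

# 2 years x 2 seasons x 3 months = 12 sampling occasions
print(len(schedule))
```

With only one year, "season" and "year" would be inseparable; the second year is what makes the seasonal comparison properly replicated.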

Sub-topic 2: Problems with confounding

It is, obviously, critically important that any tests we do of hypotheses are fair. By this I mean that, whether we end up accepting or rejecting the null hypothesis, this decision is a fair reflection of the true state of affairs. As discussed in Topic 2, this requires, in the first instance, unbiased observations. But this alone may not be sufficient to ensure a fair test.

One common problem resulting in unfair tests is confounding. Confounding is easier to illustrate than explain. Suppose that we want to test the effect of a new fertiliser on the growth of pot plants (as in plants in pots). One way to do this would be to select some pot plants and feed some of them with the new fertiliser but not the others. Clearly, it would not be a fair test if we deliberately decided to feed the fertiliser only to those plants in poor condition. The plants might not show much of a response because they were in a bad way to start with. It would be equally bad to feed the fertiliser only to the plants in good condition. Here the effects of the fertiliser might be exaggerated because the unfed plants were struggling to start with. In both of these situations, we would have confounded the (potential) effects of the fertiliser with the (potential) effects of poor health. At the end of the study, we would have no way of untangling the effects of the fertiliser from the effects of poor health.

One way of visualising this problem is by drawing a table like this (using the last situation as the example):

                 Condition?   Fertilised?
Control plants   Poor         No
Treated plants   Good         Yes

This clearly illustrates that the (potential) effects of “fertiliser” – the factor we are interested in – and “condition” are confounded and inseparable.

Make no mistake: this is an extremely serious problem. If such a problem exists, it represents a “fatal flaw” in the design of a study. The study is not a fair test of the hypothesis and the information collected may very well be useless.

In this “plants and fertiliser” example the problem can be avoided by randomly assigning plants to the “control” and “fertiliser” treatments. (Just selecting plants in “good” condition won’t help because some plants will always be in better condition than others, or differ in response to fertiliser, or differ in something else: randomisation is the only solution.) And randomisation is the solution to confounding problems which occur due to poor allocation of replicates. This is, however, not the only way in which the problem of confounding can arise. For instance, the lack of appropriate replication in the “snail experiments” illustrated in Figure 6 and Figure 7 creates confounding:

                 Patch?   Predation?
Snails present   1        Yes
Snails removed   2        No

There are also other variations on the “theme” of non-random allocation of replicates. These issues are explored in more detail on the CD.
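As a concrete illustration of the randomisation fix in the fertiliser example, the sketch below allocates plants to treatments at random. This is a minimal, hypothetical version (the plant names and group labels are made up):

```python
import random

def allocate(replicates, treatments=("control", "fertilised"), seed=None):
    """Randomly split replicates (e.g. pot plants) between two treatments.

    Shuffling before splitting means plant condition (or anything else)
    cannot be systematically confounded with treatment.
    """
    rng = random.Random(seed)
    pool = list(replicates)
    rng.shuffle(pool)
    half = len(pool) // 2
    return {treatments[0]: pool[:half], treatments[1]: pool[half:]}

groups = allocate([f"plant_{i}" for i in range(1, 11)], seed=42)
print(groups)
```

A coin toss, a random number table or a seeded random number generator all achieve the same thing: the allocation is decided by chance, not by the experimenter.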

About “controls”

The two examples above (fertilising plants and removing snails) illustrate the importance of evaluating the design of experiments – both mensurative and manipulative – for potential confounding.

It can be particularly easy to introduce confounding into manipulative experiments because in such experiments we alter and interfere with the natural system. We need to do this to create the particular circumstances required to test the null hypothesis. Unfortunately, when doing this we may also introduce other, undesired, changes.

Suppose, for instance, that we want to test whether being rolled around affects the diversity of communities on intertidal boulders (the null hypothesis being that it doesn’t). To do this we might set up two treatments (I actually did this experiment in my Ph.D.): one consisting of natural rocks, which we just observe; and the other of stable rocks, which are fixed to the bottom by having a bolt put through them so that they can’t move.

                Bolt?   Rolled?
Natural rocks   No      Yes
Stable rocks    Yes     No

Now the “stable rocks” can’t be rolled, which is fine, but the problem is that they also have a large metal bolt stuck through them! Differences between the “natural” and “stable” rocks might occur because the latter can’t roll but could also be due to the presence of the bolt (leaching metal, for instance).

The solution to this problem is to introduce a third treatment: a “bolt control” treatment. These rocks have the bolt put through them but they aren’t actually anchored to the bottom: they have a bolt but can still roll.

                Bolt?   Rolled?
Natural rocks   No      Yes
Stable rocks    Yes     No
Bolt control    Yes     Yes

Now we can compare the “natural rocks” with the “bolt control” rocks to test if there are any effects of the bolt itself. If there are, then this can be taken into account when interpreting the results of the “stable rocks”.


These additional treatments, used to test for potential confounding in manipulative experiments, are often referred to as “controls”. A “control” may, however, also just be a treatment which hasn’t had anything done to it (as in the “fertiliser” example above): in this case it is “controlling” for the experimental manipulations. Because the term “control” can mean different things, you need to read the descriptions of experiments carefully.

Review Topic 3–2, “Explanation and examples: types of confounding”, on the CD.

Sub-topic 3: Problems with non-independence

The previous problem – confounding – mainly causes difficulties when it comes to interpreting the results of a study; that is, deciding what the results mean (which, if the study is badly confounded, may not be much). The problem of non-independence mainly causes difficulties when it comes to analysing the results of a study. If non-independent data are analysed incorrectly – and a proper analysis may be tricky – then any conclusions are likely to be suspect.

Problems arise because many statistical tests require that the observations – that is, the replicates – are randomly and independently selected and if this is not the case then the results of the test may be unreliable. In this case, “unreliable” means that a test may tend to give too many Type I errors (rejecting a true null hypothesis) or too many Type II errors (accepting a false null hypothesis). The reason for this is that the expected result of the test if the null is true is based on the assumption that the replicates are independently selected. If this assumption is false, because the replicates haven’t been independently selected, then the expected results may be invalid. And so will any conclusion – about accepting or rejecting the null – based on them.

Observations which are not independent are referred to as non-independent. The simplest way of ensuring that observations are independent is to randomly select the replicates included in the study; and this means, randomly select all the replicates included in all the groups in the study.

Probably the most common way in which non-independence occurs is through repeated observation of the same replicates. For instance, repeated observations – that is, observations made at several different times – on the same individual animals, plants, tissue samples, traps, cages, flowers, reefs or study plots are likely to be non-independent.

Review “Examples and further explanation of non-independence”, Topic 3–3, on the CD.

The simplest way to deal with this problem is avoid it by ensuring that all replicates are independently sampled. In some cases, however, this may not be possible. For example, if I set up an experiment by caging small plots in the mangrove forest, then, obviously, I am going to want to continue to go back to and observe those particular plots. I can’t randomly select new plots each time. In situations like this, there are a few alternative options:


• Sub-sample the “replicates” – In some cases, it might be possible to change the nature of the replicates by sub-sampling. In the “mangrove experiment” example, I might actually randomly select and observe smaller plots inside each of the experimental plots. These smaller plots could be randomly selected each time. This is illustrated on the CD (Topic 3–3).

• Do not use statistical analyses to compare groups of non-independent observations – In some cases, in which non-independent observations have been collected, statistical comparisons of the affected groups might not be necessary. Again, in the “mangrove experiment” example, I might only be interested in the differences among the treatments at each time; perhaps at the start, middle and end of the experiment. I may not need to compare the results at different times with each other – and it is here, and only here, that the problem lies.

• Use analyses designed for non-independent observations – If comparisons of non-independent observations are required, then the specialised analyses designed for these types of data should be used. One of these – the paired t-test – is described in this unit but others are beyond the scope of this unit and are only mentioned (where relevant). Aside from the paired t-test, which is something of a special case, these other analyses tend to be more complex to do and interpret.
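The sub-sampling option above can be sketched in code. In this hypothetical version of the “mangrove experiment”, the large plots are fixed but fresh sub-plots are drawn at random on each visit:

```python
import random

def subsample_plots(plots, n_subplots, subplots_per_plot=9, seed=None):
    """Randomly choose sub-plots within each fixed experimental plot.

    The experimental plots are observed repeatedly, but the small
    sub-plots counted inside them are re-drawn at random on each visit,
    so successive counts are not repeated observations of the same
    sub-plots.
    """
    rng = random.Random(seed)
    return {plot: rng.sample(range(subplots_per_plot), n_subplots)
            for plot in plots}

visit_1 = subsample_plots(["caged_A", "caged_B", "open_A", "open_B"], 3, seed=1)
visit_2 = subsample_plots(["caged_A", "caged_B", "open_A", "open_B"], 3, seed=2)
```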

It is very important to note that it is not always a problem to collect multiple observations on the same replicates. For instance, if we wanted to test whether the height of trees is related to their diameter then it would be both natural and

correct to select some trees and measure their height and diameter, taking both measurements on the same trees. In this case, however, what we would actually be testing is whether “height” and “diameter” were related; that is, whether height depended on diameter. In other words, the hypothesis being tested is about whether these observations are independent.

Sub-topic 4: Review and exercises

The three main problems are summarised in Box 9.

Box 9. Problems in design

Pseudoreplication occurs when the replicates included in the study do not encompass the kind of variation required to test the hypothesis. For instance, if the hypothesis concerns differences between species, then replicate individuals of each species must be observed. It is wrong to make replicate observations on just one individual of each species.

Confounding occurs when the groups included in the study differ in more ways than are required to test the hypothesis. For instance, if an hypothesis about the effects of cane toads on the fauna of natural ponds is to be tested by comparing ponds with and without cane toads, then the two groups of ponds must be similar in all respects except for the presence of cane toads.

Non-independence occurs when the same replicates are observed more than once during a study and this fact is not taken into account during the analysis of the results. As analysing non-independent data can be tricky, this situation is best avoided if possible. Note that some specialised designs, and some types of analyses, work with non-independent data: here expert advice and assistance is required.


Do Exercise 1 in Tutorial 3, “Design problems and solutions”, in the Problems book.

Sub-topic 5: Enhancements to design

These issues are easiest explained by example. Suppose that a biologist wishes to determine the best temperature and salinity conditions for raising prawn larvae. (Putting this more formally, she wishes to test the null hypothesis that prawn survival is not affected by temperature or salinity and, if this is rejected, identify the conditions which give best survival.) Also suppose that, from previous work, she knows that temperatures below 20°C or above 30°C are unsuitable, as are salinities less than 25 ppt (parts per thousand) or greater than 35 ppt.

Based on this information, she decides to do a laboratory experiment where she attempts to grow larvae at 20°C, 25°C and 30°C, and at 25 ppt, 30 ppt and 35 ppt. She intends to set up replicate tanks, each containing 50 larvae, at various combinations of these conditions. Her final design is illustrated in the table below.

                Salinity
              25 ppt    30 ppt    35 ppt
Temperature
  20°C        3 tanks   2 tanks   6 tanks
  25°C        2 tanks   –         2 tanks
  30°C        4 tanks   5 tanks   –

The design for this experiment is unbalanced and incomplete.

• Unbalanced – The design is unbalanced because the seven treatments included in the experiment have different numbers of replicates. Some treatments have only 2 replicate tanks; one treatment has 6. Unbalanced designs can be more difficult to analyse, although this is not a major concern with modern statistical packages. Major packages such as Statistica and SPSS, for instance, would have no problems with this. On the other hand, Excel, which is able to analyse designs of this general type (2 factor designs), cannot handle unbalanced experiments and doing the calculations by hand would be very difficult. Perhaps a more important consideration is the fact that some common kinds of analyses – such as analysis of variance (see Topic 5) – are more robust and reliable when designs are balanced.

• Incomplete – The design is incomplete because not every combination of the two factors being tested – temperature and salinity – is included in the experiment. Two combinations have been left out: 25°C at 30 ppt, and 30°C at 35 ppt. Incomplete designs are more difficult to analyse and this is the case whether or not advanced statistical packages are used. And the results of analyses can be more difficult to interpret. Further, the results of an incomplete experiment may be much less informative (because certain combinations of conditions are missing) than those of a complete design.
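Whether a design table like the one above is balanced and complete can be checked mechanically. The sketch below audits the prawn design (the replicate counts are those in the table):

```python
from itertools import product

# Tanks per (temperature °C, salinity ppt) combination, from the table above.
design = {
    (20, 25): 3, (20, 30): 2, (20, 35): 6,
    (25, 25): 2, (25, 35): 2,
    (30, 25): 4, (30, 30): 5,
}

temperatures = [20, 25, 30]
salinities = [25, 30, 35]

missing = [combo for combo in product(temperatures, salinities)
           if combo not in design]          # incomplete if any are missing
balanced = len(set(design.values())) == 1   # balanced if all counts are equal

print("missing combinations:", missing)
print("balanced:", balanced)
```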

It is best to avoid these potential problems by ensuring that designs are both balanced and complete. In most cases this requires little extra effort and, as only rarely are there good reasons not to do this, it should be standard practice. In the example above, this could be accomplished by including the two missing combinations of conditions and using the same number of replicate tanks (say, three) in all treatments. The design would then be as shown below.

                Salinity
              25 ppt    30 ppt    35 ppt
Temperature
  20°C        3 tanks   3 tanks   3 tanks
  25°C        3 tanks   3 tanks   3 tanks
  30°C        3 tanks   3 tanks   3 tanks

Unbalanced or incomplete designs may, however, be used in some special situations in which resources are limited, rare or valuable.

• Rare or valuable resources and unbalanced designs – In some cases it may be necessary to expose rare or valuable resources, such as endangered animals, to danger or risk in order to test critical hypotheses. In these situations it makes sense to have a larger group of “control”, and therefore undisturbed, animals and a smaller group of experimental, and potentially damaged, animals. Here, the value of the animals would over-ride other considerations. (It would, however, be vital to ensure that both groups were large enough for the experiment to be valid; otherwise, the study would risk disturbing or damaging organisms for no benefit.)

• Limited resources and incomplete designs – It may be that only a limited number of replicates – for instance, tanks or reefs or fields – are available or can be observed in the time available. In these situations it may simply not be possible to run a complete experiment and only the particular conditions deemed most important can be included.

Although these special situations do arise, they are not particularly common, and balanced, complete designs should be the norm.

Summary

• Studies must be properly replicated and, in particular, pseudoreplication must be avoided.

• Studies should be designed to avoid, so far as possible, confounding because this makes it difficult, if not impossible, to validly test the null hypothesis.

• Studies should be done using randomly selected replicates to avoid the problem of non-independence of observations. In special cases, this guideline may be relaxed but it is then important to ensure that the results can still be analysed validly.

• Balanced, complete designs usually provide more information and are also usually easier to analyse and interpret.


Topic 4: Testing hypotheses about one or two means

Sub-topic 1: Types of comparisons of two values
• No additional material

Sub-topic 2: Comparing the mean of a sample to a value
• Example: problem with “goodness-of-fit” tests [CD]
• Example: Comparing a mean to a value [CD]
• Excel worksheet: t-tests – hypothetical [CD]

Sub-topic 3: Comparing means of paired samples
• Example: Comparing means of paired samples [CD]
• Excel worksheet: t-tests – paired [CD]

Sub-topic 4: Comparing means of unpaired samples
• Example: Comparing means of unpaired samples [CD]
• Excel worksheet: t-tests – unpaired [CD]
• Exercises: Testing hypotheses about one or two means; Exercises 1 – 6 in Tutorial 4, in the Problems book.

Note: [CD] means that the item is on the CD-ROM.



Introduction

So far in this unit we have examined the steps leading up to the statistical test of an hypothesis: constructing models, hypotheses and null hypotheses; and assessing the validity of the data and study design. (And, as I have already emphasised, these steps are of critical importance, for if something goes wrong here, even the most complex possible analysis may not help.) It is now time to consider the steps after the collection of the data: the analysis of that data and the interpretation of the results of this analysis with respect to the null hypothesis (and hypothesis and model).

In this unit, we will examine in some detail tests for three broad classes of hypotheses:

• Hypotheses about means of samples (Topics 4 & 5) – Here the null hypothesis being tested concerns the mean of one or more samples (e.g. mean weight). The mean is, of course, the average value of the variable: it is a measure of where the “middle” of the population is. So here we are concerned primarily with this “middle”.

• Hypotheses about frequency distributions (Topic 6) – Here the null hypothesis being tested concerns the entire frequency distribution of a variable in one or more samples (e.g. the size-frequency distribution). The frequency distribution contains information about the “middle” of the population but also about the shape of that population. Here we are concerned with the shape of the entire distribution, not just where the “middle” is.

• Hypotheses about relationships between variables (Topic 7) – Here the null hypothesis being tested concerns the relationship between variables (e.g. between weight and length). Here we are not really concerned with averages, or frequency distributions, but in how one variable changes in response to changes in another (e.g. how weight changes with changing length).

The “Analysis selector” pages on the CD can help you decide what analysis to do. Select it from the top menu bar and follow the questions.

Here, in Topic 4, we look at testing relatively simple hypotheses about means: hypotheses involving only one, or at most two, means. (Topic 5 examines tests of hypotheses about more than two means.) After completing this topic, you should be able to:

• identify situations in which hypotheses about means are being tested;

• frame and test simple hypotheses about means of one or two samples; and

• interpret the results of these tests and draw conclusions about the alternate hypothesis and model.

Summary statistics

Before moving to the stage of formally testing an hypothesis, it may be useful to review the basic characteristics of the sample, or samples, collected. If the samples were collected with multiple objectives and hypotheses in mind, this may help in deciding which hypotheses to test, and which analyses to attempt, first. This review may also highlight errors made when the sample was selected, when the observations were made or when the data was entered. If these errors were simple calculation or typing errors, then they can be fixed; otherwise the incorrect values can be deleted and, if possible, new observations substituted.

This preliminary review of the results often involves looking at the frequency distribution of the sample and calculating the mean, variance and standard error. We examined these issues in Topic 2, Sub-Topic 1, “Variables, distributions and summary statistics”, but you may wish to briefly review this information.
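For reference, these basic summary statistics can be computed with a few lines of Python (the sample values here are invented):

```python
from statistics import mean, stdev
from math import sqrt

sample = [4.2, 5.1, 3.8, 4.9, 4.4, 5.3, 4.0]  # hypothetical observations

n = len(sample)
xbar = mean(sample)
s = stdev(sample)   # sample standard deviation (n - 1 denominator)
se = s / sqrt(n)    # standard error of the mean

print(f"n = {n}, mean = {xbar:.3f}, sd = {s:.3f}, se = {se:.3f}")
```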

Sub-topic 1: Types of comparisons of two values

As noted in the Introduction, here we are concerned with testing hypotheses about the means of one or two samples. (Hypotheses concerning greater numbers of means will be examined in the next topic.) Before looking at how we might test these hypotheses, it is useful first to consider the nature of these hypotheses in a little more detail.

In general, the hypotheses tested in this topic involve a comparison of two values: either a comparison of the means of two samples or a comparison of the mean of one sample with some specified value (that is, with a value coming from some theory and not from another sample). Thus, in very general terms, the null hypotheses we are concerned with in this topic are of one of the following three forms:

• H0: Value 1 = Value 2 (two-tailed hypothesis)

• H0: Value 1 ≤ Value 2 (one-tailed hypothesis)

• H0: Value 1 ≥ Value 2 (one-tailed hypothesis)

In these hypotheses, “Value 1” will be the mean of a sample; “Value 2” may be a “theoretical” value (more on this below) or the mean of another sample. Because these hypotheses are all of the same general form, they are all tested in a similar way: using a t-test.

Box 10. One- and two-tailed hypotheses

When two values are being compared (e.g. the means of two samples) two general kinds of null hypotheses can be identified. These differ in the number of kinds of alternative observations which can prove the null hypothesis wrong.

One-tailed tests: In the case of one-tailed hypotheses, there is only one kind of alternative to the stated null. For instance, if the null is:

H0: Zinc in the soil is less than or equal to 4 ppm

then the only alternative to this is:

HA: Zinc in the soil is greater than 4 ppm

The null does not have to say “less than or equal to”. The following null is also one-tailed:

H0: Moisture in the soil is greater than or equal to 60%

because the only alternative to this is:

HA: Moisture in the soil is less than 60%

Two-tailed tests: In the case of two-tailed hypotheses, there are two kinds of alternatives to the stated null. For instance, if the null is:

H0: Zinc in the soil is equal to 4 ppm

then the alternatives to this are:

HA1: Zinc in the soil is greater than 4 ppm

and

HA2: Zinc in the soil is less than 4 ppm

Important point: Note that in all cases the null hypothesis includes an “equals” but the alternate hypotheses do not. Null hypotheses without an “equals” are not easily tested.


Two-tailed and one-tailed hypotheses

Before examining how to actually test these hypotheses, there is one further general issue to consider: the concept of one and two tailed tests. This was briefly introduced – without a great deal of explanation – in Tutorial 1, Exercise 4.

These two types of hypotheses are described in Box 10.
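The numerical consequence of choosing one or two tails can be illustrated with a quick calculation. The sketch below uses the normal distribution as a large-sample stand-in for the t distribution, and the test statistic of 1.9 is made up for illustration:

```python
from statistics import NormalDist

z = 1.9  # hypothetical standardised test statistic

p_upper = 1 - NormalDist().cdf(z)  # one-tailed test of H0: value 1 <= value 2
p_lower = NormalDist().cdf(z)      # one-tailed test of H0: value 1 >= value 2
p_two = 2 * min(p_upper, p_lower)  # two-tailed test of H0: value 1 = value 2

# The same result can be significant one-tailed but not two-tailed.
print(round(p_upper, 4), round(p_two, 4))
```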

Sub-topic 2: Comparing the mean of a sample to a value

The simplest situation is when the null hypothesis concerns only one sample. These situations arise when testing whether or not the mean of a sample conforms to some expectation. Some examples are:

• A biologist derives a model describing how birds partition their time among different activities. The model predicts that, under certain conditions, the birds should spend an average of 4.3 hours/day foraging.

• A biologist is checking the calibration of an instrument before using it to take readings from water samples. If the instrument is properly calibrated, the mean reading from pure water should be 0 (zero).

• A biologist is responsible for checking the effluent from a factory. The factory can only release effluent if the concentration of copper is less than 4 ppm.

In all of these cases, the biologist would proceed by collecting a set of replicate observations – the sample – and then testing an hypothesis about the mean. I refer to this type of test as “comparing a mean to an hypothetical (or theoretical) value”.

As you can see from the examples above, the value that the mean is compared to may indeed come from some general model or theory (such as the one about birds). This is, however, not always the case: the value might come from a calibration manual or a pollution regulation. The important point is just that the mean of a sample is being compared to some other value which is not derived from another sample.

“ Goodness-of-fit” tests

Before (finally) getting to the statistical test, one potential complication must be discussed. The type of test described here is often referred to as a goodness-of-fit test, because we are testing how good the fit of the observations is to the theoretical value. If these do not differ, then we have a “good fit” (and have confidence in the model, calibration or whatever).

The “complication” that arises is that sometimes the model testing procedure (as described in Topic 1) results in us having to try to test an “untestable” null hypothesis. What does this mean? As noted in Box 10, null hypotheses without an “equals” sign cannot be tested easily (if at all). Sometimes, when following the model testing procedure, we end up with an hypothesis of this form: H0: Mean does not equal some value.

Null hypotheses like this do not have an “equals” sign, so cannot be easily tested (see Underwood 1997 for more).

This problem, and solutions, is further described in Box 11.


Box 11. “Untestable” null hypotheses

As noted in Box 10, null hypotheses without an “equals” are not easily tested. For instance, suppose that, following the “model testing” procedure, we end up with this hypothesis and null hypothesis:

HA: Mean foraging time equals 4.3 hours/day.
H0: Mean foraging time does not equal 4.3 hours/day.

The null here does not include an “equals” sign and cannot be easily tested. The solution here is to test the HA instead of the H0. In some cases it may be possible to sensibly restate the model so that a testable null results.

Review the extended example “Problems with goodness-of-fit tests”, Topic 4–2, on the CD.

t-test of a mean to an hypothetical value

In some cases, the null hypothesis that we wish to test is that the mean of a sample is equal (or less than or equal to, or greater than or equal to) to some particular value. The general form of these hypotheses is:

• The mean of a sample is equal to some value.

Specific examples are:

• The mean amount of time honeyeaters spend foraging is equal to the predicted value (from theory).

• The mean concentration of lead in the effluent pipe is less than or equal to the allowable limit.

The “value” here may come from a theory (or model), a regulation, a calibration guide, or be arrived at in some other way. The critical point is that this “value” is given; that is, it is not derived from another sample.

Null hypotheses of the form above can be tested using a type of test called a t-test, specifically, a t-test comparing the mean of a sample to a hypothetical value (“hypothetical value” is my terminology; elsewhere you may see reference to “testing hypotheses about the mean of a sample”).
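The calculation itself is simple. The sketch below tests the bird-foraging example; the ten observations are invented, and the critical value is the standard two-tailed t value for α = 0.05 with 9 degrees of freedom (in this unit you would look it up in the Statistical Tables):

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical sample: hours/day spent foraging by 10 birds.
sample = [4.6, 3.9, 4.8, 4.1, 5.0, 4.4, 3.7, 4.9, 4.2, 4.5]
mu0 = 4.3  # value predicted by the model

n = len(sample)
se = stdev(sample) / sqrt(n)
t = (mean(sample) - mu0) / se  # t statistic with n - 1 = 9 degrees of freedom

t_crit = 2.262  # two-tailed critical value, alpha = 0.05, df = 9
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```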

Read Section 4.1 of the Statistical Manual, then continue with the Study Guide.

Review the example “Comparing a mean to a value”, Topic 4–2, on the CD.

Examine the Excel worksheet “t-tests – hypothetical” on the CD.

Tutorial 4, “Testing hypotheses about one or two means”, contains exercises dealing with comparisons of a mean to a (hypothetical) value. You could attempt these exercises now but it is probably better to leave this until after you have reviewed the material in Sub-topic 4.

Sub-topic 3: Comparing means of paired samples

Sometimes the two values being compared are means of samples in which an observation in one sample is paired, or matched, with an observation in the other sample. The most obvious of these situations is when the observations are before and after readings on the same replicates; for instance, on the same set of plants, animals, study plots or islands. “Before” and “after” here just represent two times, so observations before and after some experimental treatment, in the morning and evening and in two different seasons all represent comparisons of paired samples (provided that the same replicates are observed each time).

Paired comparisons can also arise through deliberate matching of replicates, although this is less common in biology. For example, a set of experimental plants might first be grouped into pairs on the basis of condition. After this, one randomly selected plant from each pair would be given the experimental treatment and the other used as the control. (This “pairing” is an example of a scheme called “blocking” which can be used to make experiments more powerful. Full discussion is beyond the scope of this unit.)

The general form of these hypotheses is:

• The means of two samples of paired observations are equal

Specific examples are:

• The mean number of Crown-of-Thorns starfish on a set of reefs is the same in 1993 and 2003 (observations in both years on the same set of reefs)

• The mean heart rate of a group of people increases after stress (observations before and after stress on the same people)

There is one potential problem with testing these types of hypotheses: they actually appear to use non-independent data. As discussed earlier in the Study Guide, this can compromise the reliability of statistical tests. In practice, however, the test is actually done on the differences between the values for each replicate. These differences will be independent, if the replicates themselves were randomly selected, and this eliminates the problem of non-independence (more details in the Manual).
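Because the analysis reduces to the differences, a paired t-test is really a one-sample t-test of the differences against zero. A minimal sketch of the heart-rate example (the numbers are invented, and the one-tailed critical value for α = 0.05, df = 7 comes from standard tables):

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical heart rates (beats/min) for the same 8 people.
before = [68, 72, 75, 70, 66, 74, 71, 69]
after = [75, 78, 74, 79, 73, 80, 77, 72]

diffs = [a - b for a, b in zip(after, before)]  # one difference per person
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / sqrt(n))  # df = n - 1 = 7

# H0: mean increase <= 0, so reject only for large positive t.
t_crit = 1.895  # one-tailed critical value, alpha = 0.05, df = 7
print("reject H0" if t > t_crit else "do not reject H0")
```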

Read Section 5.1 of the Statistical Manual, then continue with the Study Guide.

Review the example “Comparing means of paired samples” on the CD.

Examine the Excel worksheet “t-tests – Paired” on the CD.

Tutorial 4, “Testing hypotheses about one or two means”, contains exercises dealing with comparisons of a mean to a (hypothetical) value. You could attempt these exercises now but it is probably better to leave this until after you have reviewed the material in Sub-topic 4.

Sub-topic 4: Comparing means of unpaired samples

Probably the most common type of comparison of two values arises when the hypothesis concerns the means of two separate samples (i.e. two samples each comprising several, randomly and independently selected, replicates).

The general form of these hypotheses is:

• The means of two samples are equal

Specific examples are:

• The mean weight of male and female turtles is the same (sample of 15 male and 12 female animals collected from the study area)

• The mean concentration of oil in sediments at the impact site is less than or equal to the mean at the control site (6 soil samples collected from random locations at each site)

The test here is again a type of t-test but this is complicated by having two separate samples, each with its own variance, and, therefore, standard error. This complicates the calculations – and also the assumptions required for the test to be valid – because the two standard errors have to be combined to provide one pooled estimate for use in the t-test (details in the Manual).
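The pooling step described above can be sketched in Python with made-up oil-concentration data (the exact formulas, and the assumptions they require, are in the Manual):

```python
# Unpaired t-test with a pooled variance estimate. Data are made up:
# oil concentration in six samples from each of two sites.
from statistics import mean, variance

impact = [5.1, 4.8, 5.6, 5.0, 4.9, 5.2]
control = [4.2, 4.5, 4.1, 4.4, 4.3, 4.6]

na, nb = len(impact), len(control)
# Pool the two sample variances, weighting each by its degrees of freedom.
sp2 = ((na - 1) * variance(impact) + (nb - 1) * variance(control)) / (na + nb - 2)
se = (sp2 * (1 / na + 1 / nb)) ** 0.5  # standard error of the difference
t = (mean(impact) - mean(control)) / se
df = na + nb - 2                       # df for the pooled t-test
print(f"t = {t:.2f} with {df} df")
```

Pooling assumes the two groups have similar variances, which is one reason this test carries more assumptions than the paired version.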

Read Section 6.1 of the Statistical Manual, then continue with the Study Guide.

Review the example “Comparing means of unpaired samples” on the CD.

Examine the Excel worksheet “t-tests – Unpaired” on the CD. You may wish to use this (and the other) worksheet to help you complete the tutorial.

Do Tutorial 4, “Testing hypotheses about one or two means”, Exercises 1 to 6, in the Problems book. This contains exercises dealing with all three types of t-tests discussed here.

Summary

• Hypotheses concerning comparisons of two values (where one or both values are means from samples) can be one- or two-tailed.

• Hypotheses about the mean of a single sample (e.g. that the mean of the sample equals some value) can be tested using a t-test for comparing a mean to a hypothetical value.

• Hypotheses about the means of two paired samples (e.g. observations taken before and after some event) can be tested using a t-test for comparing the means of paired samples.

• Hypotheses about the means of two unpaired samples (e.g. observations made at sites affected by oil and uncontaminated sites) can be tested using a t-test for comparing the means of unpaired samples.

Topic 5: Testing hypotheses about more than two means

Sub-topic 1: One-factor ANOVA versus two-factor ANOVA
• Exercise: Sampling schemes (review)

Sub-topic 2: One-factor ANOVA
• Example: One-factor ANOVA [CD]
• Excel workbook: 1FANOVA [CD]
• Exercises: Examples of one-factor ANOVA, Exercises 1 and 2 in Tutorial 5, in the Problems book

Sub-topic 3: Multiple comparisons tests
• Example: One-factor ANOVA – Tukey’s Test [CD]
• Excel worksheet: 1FANOVA – Tukey’s Test [CD]

Sub-topic 4: Assumptions of ANOVA
• Example: One-factor ANOVA – Residuals [CD]
• Excel worksheet: 1FANOVA – Residuals [CD]
• Excel worksheet: 1FANOVA – Transformations [CD]

Sub-topic 5: Two-factor ANOVA
• Example: Two-factor ANOVA [CD]
• Excel worksheet: 2FANOVA [CD]
• Excel worksheet: 2FANOVA example [CD]
• Example: Interactions in two-factor ANOVA [CD]
• Excel worksheet: ANOVA-Interactions [CD]
• Example: Nested and stratified designs [CD]
• Exercises: Examples of two-factor ANOVA, Exercises 3 and 4 in Tutorial 5, in the Problems book


Introduction

In Topic 4, we looked at testing relatively simple hypotheses about means: hypotheses involving only one, or at most two, means. In practice, very few studies use only two samples (or sets of replicates). For instance, a simple experiment testing the effect of temperature on the growth of crab larvae might use tanks at three, or more, different temperatures. And a study of patterns of zonation is likely to incorporate observations in more than just two zones.

So what do we do if we want to compare the means of more than two samples? More generally, what do we do when we want to test null hypotheses like these:

• H0: Mean 1 = Mean 2 = … = Mean i = Mean; or

• H0: µ1 = µ2 = … = µi = µ

The two null hypotheses above mean the same thing: the second, briefer, version just uses the statistical symbol for the mean (µ). In words, they say: “The null hypothesis is that Mean 1 equals Mean 2 equals Mean i (up to however many means there are in the study) equals some overall mean”. A shorter way of saying this is just to say that “all the means are equal”.

Of course, one obvious way to proceed is to use several unpaired t-tests. Suppose, for instance, that we do have three treatments in our study – tanks at 20°C, 25°C and 30°C – and we wish to test the null hypothesis that survival was the same at all temperatures. (The reason for testing this broad, general null hypothesis first is that if it is accepted then we don’t need to go any further. We only need to do more detailed analyses if the general null is rejected.)

Using t-tests to test this null hypothesis is clumsy because, for these three means, we need to do three separate tests:

• 20°C against 25°C

• 20°C against 30°C

• 25°C against 30°C

This is not too bad but as the number of treatments (or groups) in our study increases, the number of pair-wise comparisons required rises rather alarmingly (Table 2). Even with only five groups, ten tests are required. Using a computer, doing these tests would not be hard but it would be tedious, as would sorting through the results.

Table 2. How the number of pair-wise comparisons, and the chance of making at least one Type I error, rise with an increasing number of means.

Number of means   Pair-wise tests   Pr(at least one Type I error)
2                 1                 0.05
3                 3                 0.14
4                 6                 0.26
5                 10                0.40
6                 15                0.54
10                45                0.90
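The figures in Table 2 are easy to reproduce: with k means there are k(k − 1)/2 pairwise tests, and for independent tests at α = 0.05 the chance of at least one Type I error is 1 − 0.95 raised to the number of tests. A quick check in Python:

```python
# Number of pairwise tests among k means, and the chance of at least one
# Type I error if each test is done independently at alpha = 0.05.
from math import comb

alpha = 0.05
for k in (2, 3, 4, 5, 6, 10):
    m = comb(k, 2)               # pairwise comparisons: k(k - 1)/2
    p = 1 - (1 - alpha) ** m     # Pr(at least one Type I error)
    print(f"{k} means: {m} tests, Pr = {p:.2f}")
```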

The tedium is, however, not the major problem. As we learned back in Topic 1, every time we do a statistical test we run the chance of making a mistake. For a single t-test, done at the usual significance level, the chance of making a Type I error – rejecting the null when it is actually true – is 0.05, or 5%. In other words, if we do a hundred t-tests in situations where the null is actually true, in about five of them we will (incorrectly) conclude that the null is false (and should be rejected).

One way to put this error rate into context is to ask: What is the chance, in a given number of comparisons (where the null is true), that there will be at least one Type I mistake (with the null being incorrectly rejected)? Why ask this question? Simple: even if there is only one Type I mistake, incorrect conclusions about the results of the study are going to be drawn. (More mistakes will just make the situation worse.)

It is quite easy to calculate these probabilities (we will examine how in Topic 6). The results (Table 2) are very disturbing. Even with only four means, there is about a 25%, or one in four, chance of at least one Type I error. With six means, the odds are better than even: there will be at least one Type I error more often than not. And with ten means, you are virtually guaranteed at least one error. (A study with ten groups is large, but certainly not unheard of.)

Clearly, the use of multiple t-tests has considerable problems. A much better approach is to use a procedure which can test the general null hypothesis – all means are equal – directly. Such a procedure is analysis of variance, usually abbreviated as ANOVA. Using ANOVA, we can do one test of the general null hypothesis then, if it is rejected, proceed to do other, more specific, tests to determine which means (or groups) actually differ.

So, here in Topic 5, we will look at how ANOVA can be used to test hypotheses about three or more means. (In fact, ANOVA can also be used to compare just two means and in this situation it will give identical results to the unpaired t-test.)

After completing this topic, you should be able to:

• identify situations in which hypotheses about several means are being tested and distinguish between single factor and multi-factor situations;

• frame, and test, using ANOVA, hypotheses about means of three or more samples;

• evaluate the suitability of the data for ANOVA; and

• interpret the results of these tests and draw conclusions about the alternate hypothesis and model.

Sub-topic 1: One-factor ANOVA versus two-factor ANOVA

Earlier in this unit (Topic 3, Sub-topic 5), we examined an example of a study designed by a biologist wishing to determine the best temperature and salinity conditions for raising prawn larvae. The conditions studied were temperatures from 20°C to 30°C and salinities from 25 ppt (parts per thousand) to 35 ppt. The design (which had 50 larvae per tank) looked like this:

                         Salinity
                25 ppt    30 ppt    35 ppt
Temperature
  20°C          3 tanks   3 tanks   3 tanks
  25°C          3 tanks   3 tanks   3 tanks
  30°C          3 tanks   3 tanks   3 tanks

This is a two-factor experiment (or two-way experiment) because the effects of two factors – temperature and salinity – are being tested simultaneously. If the effects of salinity had been tested at only one temperature, as in the design below, then the experiment would be a one-factor study (or single-factor, or one-way, study). Here the study only tests the effects of different salinities:

Salinity

25 ppt 30 ppt 35 ppt

20°C 3 tanks 3 tanks 3 tanks

The two-factor design is an example of a multi-factor experiment (or multi-way experiment). Multi-factor experiments are often more efficient than single-factor experiments and are also usually more informative. (For one thing, they may identify important interactions: this is addressed a little later.)

Before testing hypotheses in situations in which there are more than two means, it is important to review the design of the study to determine if it is an example of a multi-factor design. If it is, then (usually) it will be necessary to analyse the results using the appropriate multi-factor ANOVA. In this unit, the only multi-factor designs we consider are those having two factors: in practice, studies may have up to six factors. (Studies with more than six factors are possible – there is nothing theoretically wrong with such designs – but they usually prove to be too impractical to actually attempt. For instance, a seven factor study with only two different treatments for each factor, and two replicates, would require 256 different experimental units!)

You have already seen other examples of multi-factor designs in Exercise 7 in the “Elements of design” tutorial. Then you had to determine whether the sampling program had a cluster or stratified design (with combinations also being possible). All of the studies in that exercise were multi-factor designs, with two or three factors being investigated. The issues covered in that exercise are also relevant here, so it is probably useful to briefly review the examples given.

Review Exercise 7 in Tutorial 2, “Elements of design”, in the Problems book. This had examples of multi-factor designs.

Sub-topic 2: One-factor ANOVA

A one-factor ANOVA is, in a way, a bit like a t-test with more than two means (although statisticians might be horrified by this description). It tests the general null hypothesis:

• H0: µ1 = µ2 = … = µi = µ


This, as noted earlier, just says that the means of all the samples are equal; or all samples have the same mean. For the example which started this topic (survival of larvae in tanks at three temperatures), using ANOVA we can easily and directly test the appropriate null hypothesis:

• H0: Mean at 20˚C = Mean at 25˚C = Mean at 30˚C

The purpose and rationale for one-factor ANOVA is further explained in the Statistical Manual.

Read Sections 6.2.1 and 6.2.2 of the Statistical Manual, then continue with the Study Guide.

Review the example “One-factor ANOVA” on the CD.

The calculations required to do a one-factor ANOVA can appear complicated but the actual mathematical operations are relatively simple: they involve only adding and squaring different combinations of the original data.
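To illustrate, here is a minimal sketch of that arithmetic in Python, using made-up data for three groups (the Manual gives the exact computational formulas; this just shows the adding-and-squaring at work):

```python
# One-factor ANOVA sums of squares on made-up data for three groups.
groups = {"20C": [8, 7, 9], "25C": [12, 11, 13], "30C": [6, 5, 7]}

all_obs = [x for g in groups.values() for x in g]
grand_mean = sum(all_obs) / len(all_obs)

def gmean(g):
    return sum(g) / len(g)

# Among-groups SS: squared deviations of group means from the grand mean.
ss_among = sum(len(g) * (gmean(g) - grand_mean) ** 2 for g in groups.values())
# Within-groups SS: squared deviations of observations from their group mean.
ss_within = sum(sum((x - gmean(g)) ** 2 for x in g) for g in groups.values())

k = len(groups)    # number of groups
n = 3              # replicates per group
f_calc = (ss_among / (k - 1)) / (ss_within / (k * (n - 1)))
print(f"F = {f_calc:.1f} with {k - 1} and {k * (n - 1)} df")
```

A large F means the variation among group means is large relative to the variation within groups, which is evidence against the general null hypothesis.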

Read Sections 6.2.4 and 6.2.5 of the Statistical Manual, then continue with the Study Guide.

Examine the Excel worksheet “1FANOVA” on the CD. You may wish to use this worksheet to help you complete the tutorial.

Do Tutorial 5, “Testing hypotheses about more than two means”, Exercises 1 and 2, in the Problems book.

Sub-topic 3: Multiple comparisons tests

As you have seen, the end result of the ANOVA is the acceptance or rejection of the general null hypothesis. If the null is accepted then all the means compared are equal (strictly speaking, there is no evidence that the means differ). If, on the other hand, the null is rejected then not all the means are equal.

Rejecting the null hypothesis does not, however, indicate that all the means differ. It may be, for instance, that all are equal except for one (which is, therefore, different from all the rest). In the case of the temperature experiment, several different alternatives are possible. Three possibilities (and there are several others) are:

• HA: Mean at 20˚C < Mean at 25˚C = Mean at 30˚C

• HA: Mean at 20˚C < Mean at 25˚C < Mean at 30˚C

• HA: Mean at 20˚C = Mean at 25˚C < Mean at 30˚C

Obviously, to get the most out of the experimental results we need to know which of these alternatives is correct. And the ANOVA, on its own, is of no help here.

Comparisons of individual means

What we really need to do here is to look at pairs of means and decide which differ and which do not. But this seems to suggest that we should go back to doing simple unpaired t-tests, an approach which we have already seen has problems. In practice, there are two ways to proceed here:

• State (and so test) a more specific combination of null and alternate hypotheses. ANOVA is a very flexible technique and it is possible to test more complex hypotheses. For instance, in the case of the “temperature experiment” we could test if survival tended to increase (or decrease) in a linear fashion with temperature. If it did, then this would provide a much better understanding of the results of the experiment. This approach – which is sometimes called “testing a priori hypotheses” – is beyond the scope of this unit and will not be considered further.

• Follow the ANOVA with a (so-called) “multiple comparisons test” which compares all the pairs of means. Such a test (and there are several; see below) will compare all the means and, again, provide a more detailed understanding of the results. This is the approach which is examined in this unit using one particular kind of test: Tukey’s test.

So why do an ANOVA first?

At this point you might well be wondering why you should bother to do an ANOVA if you are just going to have to do another test anyway. Why not do the other test (e.g. Tukey’s test) straight away and save some time and effort? There are several reasons.

• The ANOVA provides a direct test of the overall null hypothesis that all means are equal. If this null is accepted then there is probably not much reason to do anything else. (This will not always be true but in cases where it isn’t, one would probably be doing something else – such as testing a priori hypotheses – anyway.)

• The ANOVA is more powerful than any multiple comparisons test. In fact, if the ANOVA assumptions are valid (see below), then ANOVA is known to be the most powerful way of testing the overall null hypothesis. An ANOVA might well reject the overall null in cases where a multiple comparisons test would result in it being accepted.

• Results from the ANOVA are needed to complete a multiple comparisons test. Although you don’t have to do an ANOVA to get the results required, it is one easy way of getting them.

Tukey’s test

The test described in this unit is Tukey’s Test (often referred to as “Tukey’s HSD Test”, where HSD stands for “Honestly Significant Difference”). In this test, the means are ranked in order, from smallest to largest, and then the difference between each pair of means is compared to a calculated critical value. If the difference is greater than the critical value, then the means differ; otherwise they don’t.
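The comparison step just described can be sketched as follows. (The means and the critical value here are made up for illustration; in a real Tukey’s Test the critical value is calculated from the studentized range distribution, as the Manual explains.)

```python
# Tukey comparison step: rank the group means, then compare every pairwise
# difference to a critical value. Means and critical value are hypothetical.
from itertools import combinations

means = {"20C": 8.0, "25C": 12.0, "30C": 6.0}
critical = 3.5  # hypothetical Tukey critical difference

ranked = sorted(means.items(), key=lambda kv: kv[1])  # smallest to largest
for (g1, m1), (g2, m2) in combinations(ranked, 2):
    verdict = "differ" if m2 - m1 > critical else "do not differ"
    print(f"{g1} vs {g2}: {verdict}")
```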

Read Section 6.4 of the Statistical Manual, then continue with the Study Guide.

Review the example “One-factor ANOVA – Tukey’s test” on the CD.

Review the “Tukey’s Test” page in the Excel file “1FANOVA” on the CD.


The aim of multiple comparisons tests

At first sight, doing something like Tukey’s Test might not appear a lot different from doing a whole set of t-tests. The crucial difference is in the Type I error rate associated with these two different approaches. As we saw earlier, increasing the number of means – and, therefore, the number of t-tests – increases the chance of at least one Type I error dramatically. In contrast, multiple comparisons tests are designed so that, no matter how many means there are, the chance of at least one Type I error never goes above the chosen significance level (usually 0.05 or 5%). This is the advantage of these procedures over other approaches, such as multiple t-tests. The disadvantage of multiple comparisons tests is that, in order to control the Type I error rate, they usually sacrifice power.

Tukey’s Test is not the only multiple comparisons test, although it is one of the more commonly used (and is implemented in most statistical packages). Some tests are designed for specific, but rather uncommon, situations (such as comparing several different means to one control mean). There are, however, several other multiple comparisons procedures, designed for the same situations as Tukey’s Test, which do not properly control the Type I error rate. Because of the problems with these procedures, I generally recommend Tukey’s Test (see Day & Quinn 1989 in the reference list for more).

Sub-topic 4: Assumptions of ANOVA

The ANOVA procedure makes several assumptions about the observations (or data). If the data do not adequately satisfy these assumptions, the validity of the test may be seriously compromised. (Note: the material here is derived from Section 6.2.3 of the Manual.)

The ANOVA assumptions

Assumption of Additivity of Treatment Effects

In essence, the ANOVA model is additive – it assumes that variation from different sources is added together (rather than being, say, multiplied) to give the total variation in the data set. In general, it seems that even if this assumption is not strictly correct it is “close enough”. Therefore, there is usually no need to worry about this assumption.

Assumption of Normality of Distributions of Observations

The observations in each group are assumed to be normally distributed; that is, they are assumed to have a normal (or “bell-shaped”) distribution. Major departures from normality may well invalidate the test. Often, non-normality of the observations is associated with non-equality of group variances (see below) and fixing one problem fixes the other.

Assumption of Equality of Group Variance

The distributions of the observations in each group are assumed to have equal variances. Minor violations of this assumption are usually of little importance but major violations invalidate the test. A useful test of this assumption is Cochran’s test (see below and Section 6.3 in the Manual). If this assumption is not valid, then the data should be appropriately transformed (see below and Section 6.2.4.2 in the Manual).


Assumption of Independence of Observations

The observations are assumed to be collected in such a way that they are statistically independent. In other words, any observation made must not depend on any other observation. If the observations are in any way dependent the test is invalid.

Implications of violations of the assumptions

The only assumptions we need to be concerned about are those of independence, normality and equality (of variance). The additivity assumption is probably only of theoretical interest.

The independence assumption is critical. If the observations are not independent, then the actual Type I error rate may be much greater, or much smaller, than the assumed rate. The results of the analysis will probably not be reliable.

The normality and equality assumptions are much less important. ANOVA is said to be “robust” against moderate violations of these assumptions. In other words, provided that the distributions are roughly normal, with roughly equal variances, the analysis will be reliable. In a way, ANOVA is a bit like a car engine which, having been designed to run on high grade fuel, turns out to run perfectly well on fuel of much lower quality.

Unfortunately, statistics books, especially those for “users” (like biologists) often provide contradictory, and excessively alarmist, advice about the normality and equality assumptions (see McGuinness 2002 in the References). Moderate departures from normality and equality are not likely to be a problem.

Assessing the assumptions

So what is a “moderate departure”? And how do we determine if our data show this? Again, standard texts provide varying advice. (And for that reason, there is a longer-than-usual discussion here.)

Many authors recommend doing a special “test for equality of variance” before proceeding with an ANOVA. Indeed, the Statistical Manual describes one test for doing this: Cochran’s test. In practice, Cochran’s test is probably the best of the bunch but even it is overly conservative: it will indicate that there is a problem, when there probably isn’t (see McGuinness 2002 for details). Nonetheless, it is handy to have some idea how these tests function.
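The statistic behind Cochran’s test is itself simple: the largest group variance divided by the sum of all the group variances (the result is then compared to a tabled critical value, not computed here). A sketch with made-up data:

```python
# Cochran's C statistic: largest group variance as a fraction of the sum
# of all group variances. Compared to a tabled critical value (not shown).
from statistics import variance

groups = [[8, 7, 9], [12, 11, 16], [6, 5, 7]]  # made-up data
variances = [variance(g) for g in groups]
C = max(variances) / sum(variances)
print(f"C = {C:.3f}")  # values near 1 mean one variance dominates
```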

Read Sections 6.3 and 6.5 of the Statistical Manual, then continue with the Study Guide.

So how then to proceed? The recommendations below have been extracted and abbreviated from McGuinness (2002):

• Graphical methods (residual plots, normal plots, variance vs means plots) should be used to check the distribution of observations and variances. All that is required for the analysis to be reliable is approximate normality of observations and equality of variances. Only when there is a single, large variance, or marked non-normality, are there likely to be substantial problems.

• Groups with large variances should be checked for errors and outliers. An unusually large variance may simply result from an incorrectly recorded or entered observation.

• The data should be appropriately transformed (see below) if there is a relationship between means and variances, or if there are biological reasons for expecting non-normal distributions.

• If a formal test of the homogeneity assumption is required, Cochran’s test, with a revised significance level of 0.01, may be used, but this procedure is still likely to be overly conservative except for small sample sizes.

• In cases where there is very substantial non-normality or heterogeneity, alternative procedures (such as the use of randomization tests, or generalised linear models with non-normal error distributions) may be pursued through specialist advice.

Review the example “One-factor ANOVA – ANOVA assumptions” on the CD.

Examine the Excel worksheet “1FANOVA” on the CD and look at the “Residuals” page.

Transforming the data

In many biological situations there is a relationship between the mean and the variance of samples (or experimental treatments). One common trend is for larger means to be associated with larger variances. Another common trend – with proportional and percentage data – is for samples with means around 50% to have larger variances than samples with smaller or larger means.

If the problem is severe, transforming the data might help to resolve it. The types of transformations commonly used are described in the Manual.

Read Section 6.2.4.2 of the Statistical Manual, then continue with the Study Guide.

All of the analysis worksheets on the CD allow you to transform the data using these standard transformations. Try the following exercise to see how transformations might work.

Open the Excel worksheet “1FANOVA” on the CD. Clear the data and then enter the values 1, 2 and 3 for Group 1; 10, 13 and 16 for Group 2; and 10, 26 and 32 for Group 3. Look at the means and variances, Cochran’s value and the plots on the “Residuals” page. Now transform the data by typing “1” in the appropriate box on the “Data-ANOVA” page. Look again at the means and variances, Cochran’s value and the plots on the “Residuals” page. Note how the transformation has made the variances of the three groups more equal.

Sub-topic 5: Two-factor ANOVA

One-factor ANOVA is an excellent “general purpose” kind of tool but, as noted earlier, more complex multi-factor studies usually require an appropriate multi-factor analysis. These multi-factor analyses are, however, simply extensions of the basic one-factor ANOVA, with modifications to take into account more factors and, perhaps, different kinds (see below) of factors.

The main difference in the results, for a two-factor ANOVA, is the addition of rows in the ANOVA table to accommodate tests of more hypotheses. The table below shows the format of a standard one-factor ANOVA table of results (in the “df” column, k refers to the number of groups and n to the number of replicates in each group). By comparing the calculated “F-calc” value to the tabled value we can test the null hypothesis that all means are equal. (Note that the “within” row can also be labelled “error” or “residual”: these terms are equivalent.)

Table 3. Basic table for a one-factor ANOVA. Note that # just stands for a number.

SOURCE   SS   df          MS   F-calc
Among    #    k – 1       #    #
Within   #    k(n – 1)    #
Total    #    kn – 1

The table for a two-factor ANOVA is very similar, except that there are now three rows with calculated F-ratios (“F-calc”): Factor A, Factor B and Interaction.

Table 4. Basic table for a two-factor ANOVA. Note that # just stands for a number.

SOURCE        SS   df               MS   F-calc
Factor A      #    a – 1            #    #
Factor B      #    b – 1            #    #
Interaction   #    (a – 1)(b – 1)   #    #
Within        #    ab(n – 1)        #
Total         #    abn – 1

In this table, Factor A and Factor B refer to the two factors being tested (and a and b in the table refer to the number of different treatments, or levels, in Factors A and B, respectively). The Interaction row refers to something called the interaction between the factors. This interaction is examined in more detail below. The F-ratios (“F-calc”) for Factors A and B allow us to test hypotheses about the means of these factors:

• H01: Means of Factor A treatments are equal.

• H02: Means of Factor B treatments are equal.

Testing these hypotheses is referred to as testing for the main effects of the (respective) factors. This is because the test for Factor A effects completely ignores Factor B. If, for instance, Factor A was temperature and Factor B was salinity, then a test of H01 would just compare the averages for the different temperatures, ignoring the fact that the temperature results were collected at several different salinities. A test of H02 would do the reverse: compare salinities while ignoring temperatures. Testing the main effects is similar, in some senses, to doing one-factor ANOVAs.
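The degrees-of-freedom bookkeeping in Table 4 is worth checking for any design. A small helper (a sketch, using hypothetical function and variable names) shows how the df partition always sums to the total:

```python
# Degrees-of-freedom bookkeeping for a two-factor ANOVA with a levels of
# Factor A, b levels of Factor B, and n replicates per cell.
def two_factor_df(a: int, b: int, n: int) -> dict:
    df = {
        "Factor A": a - 1,
        "Factor B": b - 1,
        "Interaction": (a - 1) * (b - 1),
        "Within": a * b * (n - 1),
    }
    df["Total"] = a * b * n - 1  # equals the sum of the four rows above
    return df

# The 3 x 3 prawn-larvae design with 3 tanks per cell:
print(two_factor_df(3, 3, 3))
```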


Review the example “Two-factor ANOVA” on the CD.

Read Section 7 of the Statistical Manual, then continue with the Study Guide.

Examine the Excel workbook “2FANOVA” on the CD, which operates in a similar way to the “1FANOVA” workbook.

Examine the Excel workbook “ANOVA2F-EG” on the CD. You may wish to use this as an example to learn how to do a two-factor ANOVA using Excel’s Analysis ToolPak.

Interactions

The “Interaction” in the table is something new. It tests for so-called “interactive” effects. This is rather difficult to explain verbally but it basically means that the two factors act together in complex ways, so that the effects of one factor cannot be easily or sensibly separated from the effects of the other. The best way to understand interactions is through examples.

Review the example “Interactions in a two-factor ANOVA” on the CD.

Use the Excel workbook “ANOVA-Interaction” on the CD to explore the meaning of the interaction.

The last graph in this example (that is, the example on the CD-ROM) is Figure 9.

Figure 9. Example of an interaction.

The significant interaction here means that we cannot make simple statements about either the effects of the oil or the effects of height on the shore (i.e. the low zone versus the high zone). The effects of the oil depend on the zone we look at. Alternatively, the effects of the oil vary between the two zones. In contrast, if the interaction were not significant, then we could make simple statements about the effects of the oil and the height on the shore (refer back to the CD for these).

More complex two-factor designs

So far in this sub-topic we have looked at what might be called a “simple two-factor design with orthogonal, fixed factors”. Many two-factor designs will be like this but it is important to recognise that not all designs are.

In general, several things determine the design of an ANOVA:


• The number of factors. Here we are considering only designs with two factors but, as noted earlier, ANOVA designs may have more.

• The type of factors. Factors can be considered either fixed or random – this is explained more in the box. The type of factor affects the interpretation of the results and some of the calculations.

• The arrangement of the factors in the design. Designs can be either nested or stratified or, in some cases, a mixture of these. We considered this earlier from the point of view of sampling schemes; here we consider how it affects the analysis.

• The number of levels in each factor. The term “levels” just refers to the number of kinds of treatments, or groups, in each factor. For instance, consider an experiment testing the effects of three temperatures and five humidity regimes on the hatching success of turtle eggs. Here the temperature factor would have three levels and the humidity factor would have five levels. The numbers of levels present – and the number of replicates – affects the degrees of freedom in the table and the power of tests.

Nested and stratified designs

In a stratified design, each level of one factor is present with each level of the other factor. In a nested design, this is not the case: some levels of one factor are only present with some levels of the other factor.

Review the example “Nested and stratified designs” on the CD.

One of the main practical implications of this distinction is that we can’t test for interactions between nested factors. In the first CD example – high and low shore zone samples in two bays – we can see if Zone and Bay interact because both zones were sampled in both bays. In the second example – two randomly selected patches sampled in each bay – we can’t test for an interaction between Patch and Bay because different patches were sampled in the two bays.
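To make the crossed case concrete, here is a minimal Python sketch (using NumPy) of how the interaction term is calculated in a balanced two-factor design. The counts for the Zone and Bay example are invented for illustration; they are not the CD data.

```python
import numpy as np

# Invented data: y[zone, bay, replicate]. Every zone occurs in every bay
# (a stratified, or crossed, design), so the Zone x Bay interaction can
# be tested. In a nested design some zone/bay cells would be empty and
# this decomposition would not be possible.
y = np.array([[[12., 15., 11.], [20., 22., 19.]],   # low zone: bay A, bay B
              [[ 5.,  7.,  6.], [ 9.,  8., 10.]]])  # high zone: bay A, bay B
a, b, n = y.shape
grand = y.mean()

ss_zone   = n * b * ((y.mean(axis=(1, 2)) - grand) ** 2).sum()
ss_bay    = n * a * ((y.mean(axis=(0, 2)) - grand) ** 2).sum()
cells     = y.mean(axis=2)                          # mean of each zone/bay cell
ss_cells  = n * ((cells - grand) ** 2).sum()
ss_int    = ss_cells - ss_zone - ss_bay             # interaction sum of squares
ss_within = ((y - cells[:, :, None]) ** 2).sum()    # residual sum of squares

# F-ratio for the interaction (fixed factors): MS(interaction) / MS(within)
f_interaction = (ss_int / ((a - 1) * (b - 1))) / (ss_within / (a * b * (n - 1)))
print(round(f_interaction, 3))
```

A large F-ratio here, judged against the F tables, would indicate a significant Zone × Bay interaction, as in the oil example earlier.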

Re-read Section 7.2 of the Statistical Manual, on nested designs, then continue with the Study Guide.

Random and fixed factors

In an ANOVA, a factor may be considered “fixed” or “random”. In practice, most, but certainly not all, factors are fixed but you should be aware of the meaning of “random” in this context and be able to identify these kinds of factors.

Fixed and random factors are defined in Box 12.


Box 12. Fixed and random factors

Factors in ANOVA can be considered either fixed or random. The distinction is based primarily on the way in which the levels were selected and the nature of the conclusions to be drawn. Whether factors are fixed or random affects some ANOVA calculations.

Fixed factors. Most factors are fixed factors. In these factors, the particular levels included in the study are deliberately selected by the researcher and/or include all the possible levels. For instance, if a lake is divided into shallow and deep sections, then the “sections” factor is fixed because the two possible levels were deliberately selected and both included in the study. If three incubation temperatures are selected – 10, 20 and 30 degrees – then temperature is a fixed factor. With fixed factors, the primary focus of the study is on the particular levels included in the study.

Random factors. In random factors, the particular levels of the factor included in the study are randomly selected. If the study were repeated, different levels might be selected. For instance, if an experiment is repeated in three randomly selected patches of salt-marsh, then the factor “patch” is a random factor. If the study were repeated, different patches might be selected. Here the patches are just selected to be representative of “salt-marsh” and are not of interest in their own right.

Re-read Section 7.1.2 of the Statistical Manual, on random and fixed factors, then continue with the Study Guide.

Review the example “Nested and stratified designs” on the CD.

In the CD examples, “Patch” is a random factor because the patches were randomly selected. “Zone” is a fixed factor because the zones were deliberately selected. Whether or not “Bay” was a random factor would depend on how the bays were selected (and with what intention).

Exercises

Do Tutorial 5, “Testing hypotheses about more than two means”, Exercises 3 and 4, in the Problems book.

Summary

• Hypotheses about more than two means are best tested using the appropriate analysis of variance (ANOVA). This avoids the problem of excessive Type I errors which would result from, for instance, using many t-tests.

• ANOVA designs may have one, two or more factors and it is important to identify the correct number of factors in the study so that the appropriate design can be used.

• In multi-factor designs (those with two or more factors) it is also important to identify any random or nested factors as these affect the way in which the analysis is done.

• Upon finding a significant effect in an ANOVA, you may need to do a multiple comparisons test, such as Tukey’s Test, to identify exactly which differences among means are important.

• It is important to be sure that the ANOVA is likely to be valid by checking that the data conform reasonably well to the assumptions of normality and equality of variance. The types of analyses described here should never be used if the data are dependent (i.e. if the data are not independent).


Topic 6: Testing hypotheses about frequencies

Sub-topic 1: Reasons for looking at frequency distributions
• No additional material

Sub-topic 2: Four particular “theoretical” distributions
• Example: Regular, binomial, Poisson and normal distributions [CD]
• Excel worksheet: Distributions [CD]
• Exercises: Testing hypotheses about frequencies; Exercises 1 to 5 in the Problems book
• Example: Using a distribution – situation and expecteds [CD]

Sub-topic 3: Comparing an observed and an expected distribution
• Example: Using a distribution – testing the null [CD]
• Excel worksheet: Chi-distributions [CD]

Sub-topic 4: Comparing two or more observed distributions
• Example: Comparing observed distributions [CD]
• Excel worksheet: Chi-OBS [CD]
• Exercises: Testing hypotheses about frequencies; Exercises 6 to 10 in the Problems book.

Note: [CD] means that the item is on the CD-ROM.


Introduction

The last two topics have examined some ways of testing hypotheses about means. As noted earlier, this can be useful because our questions do often relate to average values: What is the average weight of the fish? (And does it differ among sites?) What is the average rate of bacterial growth? (And is it affected by temperature?)

There are, however, many situations in which focussing on averages might be inappropriate or misleading, or might simply not provide sufficient information. In such situations, it may be more appropriate, and more informative, to focus instead on the entire frequency distribution. In other words, instead of calculating a mean (and probably a standard deviation or standard error) and testing hypotheses about it, we look at the entire frequency distribution – in a table or graph – and test hypotheses about it.

Topic 6 looks at this approach. In it, we will cover these issues:

• some reasons for looking at frequency distributions;

• four particular “theoretical” distributions and their uses;

• methods for comparing an observed distribution with an expected “theoretical” distribution; and

• methods for comparing two or more observed distributions.

Sub-topic 1: Reasons for looking at frequency distributions

In this section we’ll examine why we might look at the entire frequency distribution, rather than just at an average. And, for a bit of variety, we’ll start with some non-biological examples: results from one of the Tests Australia played over the 2002 to 2003 season.

Figure 10 is a frequency distribution of the speed of 48 balls bowled by Andy Bichel. These speeds are measured to the nearest 0.1 km/hr and, for the purposes of this graph, have been grouped into 1 km/hr classes (that is, the first class has balls from 130.0 km/hr to 130.9 km/hr, the next class has balls from 131.0 km/hr to 131.9 km/hr, and so on).

Bichel’s average speed, during this spell, was 136.4 km/hr, with a variance of 5.3 km/hr; the median and mode speeds were 136.6 km/hr. These results – mean, mode and median all similar – and a quick look at the graph all suggest that, during this spell, Bichel bowled at a fairly consistent speed, only occasionally sending one in a little faster or slower.

In this situation, the mean, and variance, give a fairly good idea of the characteristics of this spell of bowling: an average speed of nearly 136.5 km/hr, with relatively little variation (the difference between the fastest and slowest classes is less than 10 km/hr). The mean, and variance, give a good idea of the bowling because the distribution is fairly symmetrical and also doesn’t have much spread. Also, none of the classes themselves are of particular interest.


[Figure: histogram of speed (km/hr, classes 130.5–141.5) against number of balls; average speed = 136.4 km/hr]

Figure 10. Andy Bichel bowling speeds.

Our second cricket example features Australia batting: a 45 over spell early in one innings (Figure 11). The graph shows the number of times Australia scored a particular number of runs in the over (so, for instance, there were 8 “maidens”: overs with no runs scored). This distribution is definitely not symmetric and it also displays a fair spread of values.

In this case, the mean number of runs per over – 4.6 runs per over – tells us something but it does not give us a very complete idea of the scoring pattern. A run rate of 4.6 is good (for test cricket), so the batting was fairly good, but we don’t know how the runs were being scored. By looking at the entire frequency distribution, however, we can see that much of the time the Australians scored about 2–6 runs an over (21 overs) but the opposition also managed to limit the scoring to 0 or 1 quite often (14 overs). To make up for this, the Australians had some big scoring overs (9 overs of 8 or more runs).

[Figure: histogram of runs scored per over (0–17) against number of overs; average = 4.6 runs per over]

Figure 11. Australian batting during a 45 over period.

In this situation, the average is meaningful, but looking at the entire distribution reveals (potentially) important information that would otherwise be overlooked.

This is also the case with the next example (Figure 12): a 111 ball, 98 run spell of batting by Steve Waugh. This time I’ve plotted two things: the number of times a certain number of runs was scored (stippled bars) and the total number of runs made from that type of scoring shot (solid black bars). Thus, the very first (left-most) bar shows that Waugh scored no runs off 76 of the 111 balls he faced. The very last (right-most) bar shows that he scored a total of 76 of his 98 runs in “fours”.


Figure 12. Batting by Steve Waugh (111 balls, 98 runs).

Waugh’s average here is 0.88 runs/ball – or 88.3 runs per 100 balls (as it is usually shown) – nearly a run a ball and pretty good going (especially for a “has been”). This “strike rate” is informative but, again, doesn’t tell the whole story. A quick look at the graph reveals that most of his runs came in boundaries, with a bunch of singles thrown in, and the occasional 2 or 3. Also, by looking at the frequency distribution we can tally up the “boundaries” (scores of 4); a statistic of some special interest and something we can’t get from the average.

For the final example, we will return to biology and turn to a type of frequency distribution often displayed and discussed: a size-frequency distribution (Figure 13). This graph (which is also in the Manual) shows the number of mangrove snails (Cerithidea anticipata) in each of 13 size-classes, each 2 mm wide.

[Figure: histogram of size class (14–38 mm) against number of snails; mean = 30.1 mm, variance = 24.5]

Figure 13. Size-frequency distribution for 300 mangrove snails.

The mean size is 30.1 mm, meaning that the average snail was about 3 cm long, but the variance was quite high (24.5, which is about 80% of the mean), indicating that there is a fair bit of variation around this average. The graph is more informative: it has one “peak” at the 28–30 mm size-class, with large numbers in adjacent size-classes: these appear to be adult animals and they make up much of the population. There is a much smaller, second “peak” at the 16–18 mm size-class: these are most probably juvenile individuals, ranging in size from about 12–22 mm. Thus, inspection of the entire frequency distribution reveals biologically important information: the population has at least two distinct size-classes, juveniles and adults. (It is possible that there may even be a third size-class, intermediate between these two, but simple inspection of the graph is not sufficient to test this idea.)


Sub-topic 2: Four particular “theoretical” distributions

So far in this topic we have looked at examples of what I call “observed” distributions: distributions constructed by collecting and summarising information (in tables or graphs). There are several “theoretical” distributions which are of interest because they can, sometimes, serve as models for biological situations. In other words, we can use a theoretical distribution – such as the binomial or Poisson distribution – as a model for some biological situation. If the observed data match the predictions of this model, then we have learned something about that situation. Examples in this section will illustrate how this works.

Rectangular (or even) distribution

In a rectangular, or even, distribution all categories have the same frequency (Figure 14). We would not expect to actually see this type of distribution much in practice but it can sometimes be a useful model to test. For instance, we could test the null hypothesis that epiphytes were evenly distributed at all heights up trees. Refuting this null hypothesis would indicate that epiphytes were more common at some heights than at others. More detailed biological studies could proceed from this point.

Figure 14. Graph of an even, or rectangular, distribution.

Review the examples of rectangular distributions on the “Distribution examples” page on the CD.

Binomial distribution

The binomial distribution can be applied in situations in which there are a series of “trials”, each with two possible outcomes. As much early probability work was based on games, these two outcomes are often called “success” and “failure”, although this terminology is not appropriate in many biological situations. Using the binomial distribution, we can calculate the likelihood, or expected frequency, of particular combinations of outcomes. For instance, in a gambling game we can calculate the chance of getting one head and one tail if we toss two coins. In a biological situation we can calculate the chance that a clutch of three eggs all hatch out to be female. The graph below (Figure 15) shows, for clutches of five eggs, the probabilities of getting no females (and five males), one female (and four males), two females (and three males)…and so on. (Note that this example assumes that there is an equal chance of an egg hatching out to be male or female.)

Figure 15. Graph of a binomial distribution.

The key features of a binomial distribution are p, the probability of a “successful” outcome (whatever that is), and n, the number of trials. In the example in the graph (Figure 15), p is assumed to be 0.5 (equal chance of male or female chick) and n is five (five eggs in the clutch).
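As a quick illustration (not part of the Study Guide materials), the probabilities plotted in Figure 15 can be calculated directly from the binomial formula. This sketch uses only the Python standard library and assumes p = 0.5 and n = 5, as in the example.

```python
from math import comb

n, p = 5, 0.5  # clutch of five eggs; each egg female with probability 0.5

# P(k females) = C(n, k) * p^k * (1 - p)^(n - k)
probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
for k, pr in enumerate(probs):
    print(f"{k} females (and {n - k} males): {pr:.5f}")
```

The six probabilities sum to 1 and, because p = 0.5, the distribution is symmetric: clutches of all females (or all males) are expected only 1 time in 32.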

Read Sections 9.1 and 9.2 of the Statistical Manual, then continue with the Study Guide.

Review the examples of binomial distributions on the “Distribution examples” page on the CD.

Explore the binomial distribution in the Excel file “Distributions” on the CD. (The rectangular, or even, distribution is not included because it is too basic.)

Poisson distribution

The Poisson distribution can be used to predict what we expect to see when sampling populations in which things are distributed at random in time or space. For instance, suppose that we have counted the number of termite mounds in many one hectare plots in the savanna. The graph below plots the numbers of plots expected to have no termite mounds, one termite mound, two termite mounds…and so on (if mounds are randomly distributed with a mean of two mounds per hectare). If the observed counts follow this expected pattern closely, then termite mounds would appear to be distributed at random in the area. (It is, however, not quite this simple, because the size of the plot can affect the outcome of this test.)

Figure 16. Graph of a Poisson distribution.

The Poisson distribution has one important feature (or parameter): the mean, µ. (A characteristic of this distribution is that the variance equals the mean.)
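By way of illustration, the expected counts behind a graph like Figure 16 can be generated from the Poisson formula. This sketch uses only the Python standard library; the sample size of 100 plots is an invented figure, not from the example.

```python
from math import exp, factorial

mu = 2.0        # mean of two mounds per hectare, as in the example
n_plots = 100   # hypothetical number of one-hectare plots sampled

def poisson_prob(k):
    """P(a plot contains exactly k mounds) = e^-mu * mu^k / k!"""
    return exp(-mu) * mu ** k / factorial(k)

expected = [n_plots * poisson_prob(k) for k in range(7)]
for k, e in enumerate(expected):
    print(f"plots with {k} mounds: expect {e:.1f}")
```

These expected counts are what the observed counts would later be compared against when testing whether mounds really are randomly distributed.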

Read Section 9.3 of the Statistical Manual, then continue with the Study Guide.


Review the examples of Poisson distributions on the “Distribution examples” page on the CD.

Explore the Poisson distribution in the Excel file “Distributions” on the CD.

Normal distribution

The normal distribution is often called the bell-shaped curve (or bell curve), and sometimes the Gaussian distribution. This is a symmetric distribution in which most of the population is close to the mean. For a particular cohort (age class) of animals, measures of size – such as weight, height or length – are often approximately normally distributed.

Figure 17. Graph of a normal distribution.

The key features (or parameters) of the normal distribution are its mean, µ, and its variance, σ2.

Read Section 9.4 of the Statistical Manual, then continue with the Study Guide.

Review the examples of normal distributions on the “Distribution examples” page on the CD.

Explore the normal distribution in the Excel file “Distributions” on the CD.

Do Tutorial 6, “Testing hypotheses about frequencies”, Exercises 1 to 5, in the Problems book.

Using a theoretical distribution

As noted earlier, we can use these theoretical distributions as “models” to try to understand biological processes. For instance, there has been much research trying to determine if birds can influence the sex-ratio of their offspring. Random mating, and equal survival of male and female embryos, should result in equal numbers of male and female chicks emerging. If equal numbers of the two sexes do not emerge, then it appears that some process is at work altering the eventual sex-ratio. Further studies can then be done to determine what this process actually is.

Review the example “Using a distribution – situation and expecteds” on the CD.


Sub-topic 3: Comparing observed and expected distributions

So far, we have examined different kinds of distributions, and how to calculate what we would expect to see in different circumstances (e.g. different probabilities of “success” for the binomial). Having calculated what we expect to see, how do we then compare the observed results with these expectations? In other words, how do we test the (general) null hypothesis:

• H0: The observed frequencies equal the expected frequencies.

There are actually a few different ways of testing this null. The one that will be discussed in detail in this unit is the Chi-squared test (you may encounter this in other subjects, for example, genetics).

Before examining this test, it is important to note that this is another kind of goodness-of-fit test. We first encountered these in Topic 4, where the problems that can arise with goodness-of-fit tests and the “model testing procedure” were discussed. Those same problems can occur here: this is discussed a little more in the example on the CD (see later).

Chi-squared test

The Chi-squared test is the procedure used most often to compare an observed set of frequencies to expectations derived from some model. It is commonly used in genetic studies to compare, for instance, observed phenotypes with, say, expectations under a model of random mating. In other fields it may be used to test if a size-frequency distribution is normal (using the normal distribution) or if events, such as bacterial deaths or divisions, are occurring randomly through time (using the Poisson distribution).

As with all these tests, the procedure is to calculate the test value (a Chi-squared value), then compare this to the appropriate value in the tables: if the calculated value is greater than the tabled value, then the null hypothesis is rejected.

Calculating the test value is fairly simple. For each category (or class) in the data table, we calculate (see the CD and the Manual for examples):

• χ² = (O − E)² / E

(Here χ² means “Chi-squared” and “O” and “E” stand for the Observed and Expected values, respectively.)

The individual (“cell”) Chi-squared values are added up to give a total for the table and it is this value which is compared to the table value. Notice that the comparison of the observed and expected values is done quite directly here just by subtracting one value from the other. Also note that if the observed and expected values are equal, the Chi-squared value will be zero. Thus, the larger the Chi-squared value, the larger the difference between the observed and expected frequencies.

As with the t-test and ANOVA, to find the appropriate value in the tables we need to work out the degrees of freedom for the test. In this particular case, we use this formula:

• df = k − p − 1


In this formula, k is the number of categories (or classes) in the data table and p is the number of parameters estimated from the data. This last value probably requires some explanation. In order to calculate the expected values for a particular distribution, we may need to estimate some parameters from the data we have collected. Consider the following examples.

• Even distribution example. We’ve counted the numbers of parasites in four bands of equal area along the side of a fish. If the total number of parasites is 84, and they are evenly distributed, then we would expect to see 21 parasites in each band. In this case, we didn’t estimate anything from the data, so the degrees of freedom are (4 – 0 – 1): 3. (Note that the “4” here is the number of bands.)

• Binomial distribution example. We’re studying a species of bird which lays clutches of two eggs and have counted the numbers with no males, one male (and one female) and two males. We then calculate how many clutches we expect to see, if the probability of a male and female is the same (0.5). This is another case where we have not estimated anything from the data, so the degrees of freedom are (3 – 0 – 1): 2. (Here the “3” refers to the three combinations of outcomes: 0 male, 1 male and 2 male.)

• Poisson distribution example. We have counts of barnacles in quadrats placed in random locations on the rocky shore. If the barnacles are randomly distributed on the shore, counts of the numbers of quadrats with no barnacles, with one barnacle, with two barnacles and so on, should follow a Poisson distribution. To test if the barnacles are randomly distributed we might compare our observed counts of numbers of quadrats with the values expected from the Poisson distribution. To derive the appropriate Poisson distribution, however, we need to know the mean and we would usually estimate this value from the observed counts. In this case we would have estimated one value – the mean – from the data, so the degrees of freedom are (k – 1 – 1): k – 2. (Here “k” would be however many categories we decided to use.)

• Normal distribution example. We have measured the sizes of juvenile fish caught in a billabong and want to test if the size-frequency distribution is normal. After deciding on appropriate class boundaries, we would draw up the distribution by counting how many fish were in each size-class. To calculate how many we would expect to see in each class, if sizes are normally distributed, we would use the normal distribution. In order to do this, however, we would usually have to estimate both the mean size and variance from the data. So in this case we have estimated two values – the mean and the variance – from the data, so the degrees of freedom are (k – 2 – 1): k – 3. (Here “k” would be however many size-classes we decided to use.)
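Pulling the pieces together, here is a small standard-library Python sketch of the whole procedure for the even-distribution parasite example above. The observed counts are invented for illustration, and the critical value is taken from the Chi-squared tables (df = 3, α = 0.05).

```python
# Hypothetical counts of parasites in four equal bands (total 84).
observed = [30, 25, 17, 12]
expected = [sum(observed) / len(observed)] * len(observed)  # 21 per band

# Sum the cell values (O - E)^2 / E to get the test statistic.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = k - p - 1: four categories, no parameters estimated from the data.
k, p_est = len(observed), 0
df = k - p_est - 1  # = 3

critical = 7.815  # Chi-squared table value for df = 3, alpha = 0.05
print(f"chi-squared = {chi_sq:.2f} with {df} df")
print("reject H0" if chi_sq > critical else "retain H0")
```

With these invented counts the statistic exceeds the tabled value, so the null hypothesis of an even distribution would be rejected.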

Read Sections 10.1.1 to 10.1.4 of the Statistical Manual, then continue with the Study Guide.

Review the example “Using a distribution – testing the null” on the CD.


Review the three test pages in the Excel file “Chi-Distributions” on the CD.

Other tests

The G-test

As noted above, the Chi-squared test is not the only test that can be used, although it is the only one which will be examined in detail here. The G-test is an alternative which can be used in exactly the same situations as the Chi-squared test. This test also compares the observed and expected frequencies (so starts in the same way with the tabulation of those frequencies) and it also gives a value which is compared to the Chi-squared tables. The test value is, however, calculated in a different way. For the kinds of situations discussed in this unit, the G-test has no particular advantages over the Chi-squared test: as most people find the latter easier to calculate, it is the test that will be used here. (The G-test does have advantages in other, more complex, situations.)

The Kolmogorov-Smirnov test

Another test which can be used to compare observed and expected frequencies is the Kolmogorov-Smirnov test (and, no, it doesn’t involve drinking quantities of vodka). The calculations for this test are very simple – it is probably the simplest test in the Statistical Manual – but it does require special tables. In most circumstances the Chi-squared test is the preferred procedure but there are some occasions where the Kolmogorov-Smirnov test may be better. In particular – and this is discussed in a little more detail below – the Chi-squared test may be invalid when the sample size is small: the Kolmogorov-Smirnov test does not have this problem.

Read Sections 10.1.5 and 10.1.6 of the Statistical Manual, then continue with the Study Guide.

Important points about comparing observed and expected distributions

For a Chi-squared test to be valid the following must be true:

• The test must be done on counts or frequencies, not on percentages – tests done using percentages are not valid. For example, the Chi-squared test must be done on the number of snails in the different size-classes, not on the percentages in these size-classes.

• Expected counts should not be too small – tests with very small expected counts are likely to be inaccurate. A commonly used rule of thumb is that no expected count should be less than one, and no more than one-fifth of the expected counts should be less than five. If too many expected values are too small, then you may have to combine some classes or categories (see the Manual, Section 10.1.4.2.2). If combining classes or categories is undesirable, then you should consider using the Kolmogorov-Smirnov test.

• The observations should be independent – tests using dependent observations will not be valid. This is usually not a problem but observations would be dependent if, for example, counting an individual in one size-class affected what size-class some other individual would be recorded in.


IMPORTANT: Read this section carefully. You must follow these rules for the test to be valid.

(Note that the Excel workbooks for this topic automatically highlight cells with expected values which may be too small.)

Sub-topic 4: Comparing two or more observed distributions

So far we have considered situations in which we want to compare an observed distribution – observed results – with the predictions (or expectations) from some model. There are, however, situations in which testing the null hypothesis requires us to compare two (or more) observed distributions. Consider these two situations:

• Example 1. The banding patterns on the shells of a particular species of snail are genetically determined. As part of a study of the biology of this species, you “cross” snails with different banding patterns and record the results. You then compare the observed frequencies of the different phenotypes (banding patterns) to the frequencies expected from Mendelian genetics.

• Example 2. Although the banding patterns are genetically determined, the survival of individuals with different patterns might be determined by environmental factors. Continuing your study, you count the frequencies of the different banding patterns in three very different habitats. You then compare the observed frequencies of the different phenotypes in these three habitats to see if they are the same.

Example 1 here is similar to other examples considered in Sub-topic 3. It is an example of a goodness-of-fit test comparing the observed frequencies with those expected from some model (in this case a genetic model, rather than a distribution, such as the Poisson). If the null hypothesis is accepted, then simple Mendelian genetics would appear to apply. If the null is rejected, then some more complex genetic process may be occurring which requires further study.

Example 2 is not a goodness-of-fit test and it differs from the other examples discussed so far in this section. Here there are no “theoretical expected frequencies”. (As you will see, there are expected frequencies but they are not derived from a model or distribution.) If the null is accepted here, then the observed frequencies are the same in the different habitats and, as a consequence, environmental factors do not appear to be important. Rejecting the null, in contrast, provides support for the idea that the environment is affecting the observed frequencies of the different phenotypes.

Let’s develop this example further, but, to make it simple, consider only two phenotypes and two habitats (and I’ll use round “neat” numbers to make it even clearer). The table below has some (invented) results.

OBSERVED        Sunny habitat   Shady habitat   TOTAL
Dark pattern         10              50            60
Light pattern        20              20            40
TOTAL                30              70           100


Now what we want to test, as we’ve seen, is if the frequencies of the two phenotypes are the same in the two habitats. In other words, we want to test a null hypothesis like this:

• H0: The proportions of dark and light snails are equal in the two habitats.

Note that this null hypothesis refers to the proportions of dark and light snails in the two habitats. Because we have different numbers of snails from the two habitats – 70 from the shady but only 30 from the sunny – we wouldn’t expect to see the same number of light and dark snails in each sample, even if there was the same proportion of the two phenotypes in both places.

In order to compare the two habitats, and take into account the different sample sizes, we need to calculate how many snails of each phenotype we would expect to see if the proportions did not differ. The next table (below) has these numbers.

EXPECTED        Sunny habitat   Shady habitat   TOTAL
Dark pattern        18.0            42.0          60.0
Light pattern       12.0            28.0          40.0
TOTAL               30.0            70.0         100.0

Thus, if the proportions of light and dark snails were actually the same in the two habitats, we would expect our sample of 30 snails from the sunny habitat to have 18 dark and 12 light snails. In the shady habitat we would expect the 70 snails to comprise 42 dark and 28 light.

How do we get these numbers? If we combine the samples from the two habitats we have a total of 100 snails, 60 dark and 40 light. In other words, 60% of the sample is dark and the other 40% light. Now, if there really is no difference in proportions between the two habitats, then 60% of the snails in each habitat should be dark and 40% should be light. So, 60% of the 30 snails in the sunny habitat should be dark – this is 18 snails – and the remainder – 12 – should be light. In the shady habitat, 60% of the 70 snails should be dark – this is 42 snails – and the remainder – 28 – should again be light.
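This marginal-totals calculation can be sketched in a few lines of Python, using the observed counts from the table above:

```python
# Observed counts: rows = phenotypes (dark, light), columns = habitats (sunny, shady)
observed = [[10, 50],
            [20, 20]]

row_totals = [sum(row) for row in observed]        # [60, 40]
col_totals = [sum(col) for col in zip(*observed)]  # [30, 70]
grand_total = sum(row_totals)                      # 100

# Expected count for each cell = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
# expected is [[18.0, 42.0], [12.0, 28.0]], matching the expected table
```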

At this point you should notice that, even though we didn’t use any sort of theoretical model or distribution, we have still derived “expected” values. In practice what we have done is estimate the overall proportions of the two phenotypes and apply them to each habitat. In other words, we have pretended that the null hypothesis is true (proportions the same in the two habitats) and calculated what we would then see. If the null hypothesis is true, then the observed and expected numbers should be fairly close. We wouldn’t expect the observed and expected numbers to be exactly the same, because we would expect some variation due to sampling error; but the observed values should be similar to the expected if the null is true.

In this particular example, the observed and expected values are not wildly different, but neither are they especially close. Could the sort of difference we see here – 10 versus 18, for instance, for dark snails in the sunny habitat – result from sampling variation? Could this sort of difference be due just to chance?

To answer these questions, and to help us decide whether to accept or reject the null, we need some way to formally, objectively test the null hypothesis.

Chi-squared test to compare two or more observed distributions

Of course, in this topic we have already used a test – the Chi-squared test – to compare observed and expected frequencies. The same test can be used here – although it is now referred to as a Chi-squared contingency table test – and the test value is calculated in exactly the same way:

• χ² = Σ (O − E)² / E

(As before, χ2 means “Chi-squared” and “O” and “E” stand for the Observed and Expected values, respectively.)

The individual (“cell”) Chi-squared values are again added up to give a total for the table and it is this value which is compared to the table value. Again, if the observed and expected values are equal, the Chi-squared value will be zero and the larger the Chi-squared value, the larger the difference between the observed and expected frequencies.

The degrees of freedom for this type of Chi-squared test are calculated in a different way:

• df = (r − 1) × (c − 1)

In this formula, r is the number of rows in the data table and c is the number of columns. For the “2 × 2” table above, the degrees of freedom are one. (The table value for one degree of freedom is 3.84 and this is considerably smaller than the calculated value (12.70), so the null hypothesis would be rejected.)
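The contingency-table calculation for the snail example can be sketched in Python, using the observed and expected counts from the tables above:

```python
# Observed and expected counts for the four cells of the 2 x 2 snail table
observed = [10, 50, 20, 20]
expected = [18.0, 42.0, 12.0, 28.0]

# Total Chi-squared: sum the (O - E)^2 / E value for each cell
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = (2 - 1) * (2 - 1)   # (rows - 1) * (columns - 1)
# chi2 is about 12.70 with 1 degree of freedom
```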

More than two distributions

The data table for the “snail” example has only two rows (phenotypes) and two columns (habitats) but the same procedure can be extended easily and directly to any number of rows and any number of columns. The calculations become increasingly tedious with more rows and columns (a good reason to use a computer) but they are done in precisely the same way.

Read Section 10.2.1 to 10.2.4 of the Statistical Manual, then continue with the Study Guide.

Review the example “Comparing observed frequency distributions” on the CD.

Review the “Chi-OBS” Excel file on the CD.

Do Tutorial 6, “Testing hypotheses about frequencies”, Exercises 6 to 10, in the Problems book.

A note about the “correction for continuity”

In some texts you will find a slightly different formula used for “2 × 2” tables (i.e. tables with two rows and two columns):

• χ² = Σ (|O − E| − 0.5)² / E

The “0.5” which is subtracted here is referred to as the “correction for continuity”. The reasons for using this adjustment are technical and of little importance here. In practice, the correction makes the calculated values slightly smaller but doesn’t seem to change things much.
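Applied to the snail table, a sketch of the corrected calculation shows how little the adjustment changes things:

```python
# Observed and expected counts for the 2 x 2 snail table
observed = [10, 50, 20, 20]
expected = [18.0, 42.0, 12.0, 28.0]

# Uncorrected statistic
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With the correction for continuity: subtract 0.5 from each |O - E|
chi2_corrected = sum((abs(o - e) - 0.5) ** 2 / e
                     for o, e in zip(observed, expected))
# chi2 is about 12.70; chi2_corrected is about 11.16 -- slightly smaller,
# but well above the 1-df table value of 3.84 either way
```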

Important points about comparing two or more observed distributions

The same three rules stated in Sub-topic 3 also apply here. The test must be done on frequencies (i.e. counts) not percentages; the expected values must not be too small (use the same rule); and the observations must be independent.

Alternative tests

As in Sub-topic 3, the G-test and Kolmogorov-Smirnov test can be used instead of the Chi-squared test. As before, the G-test does not seem to have particular advantages in these simple situations, so there is no strong reason to prefer it over the Chi-squared test. The Kolmogorov-Smirnov test may be used when sample sizes – and, therefore, expected values – are small but it has a major limitation: it can only compare two distributions at a time.

Summary

• Some simple (“theoretical”) distributions can be used as models for biological situations. Four distributions which may be useful in different situations are the Rectangular (or Even) distribution, the Binomial distribution, the Poisson distribution and the Normal distribution.

• To determine what would be expected (if the model applies), the required parameters are stated, or estimated, and the probabilities of different outcomes are calculated.

• An appropriate null hypothesis to test in this situation is that “the observed and expected frequencies do not differ”.

• The Chi-squared test can be used to test this null hypothesis (an alternative is the G-test). If sample sizes are small, the Kolmogorov-Smirnov test may be a better choice.

• In some situations, it may be useful to compare several observed distributions, to test the null that they do not differ. The Chi-squared test can again be used (with the G-test as an alternative). If sample sizes are small, the Kolmogorov-Smirnov test can be used to compare two (but only two) observed distributions.

Topic 7: Testing hypotheses about relationships

Sub-topic 1: Why look at relationships
• No additional material

Sub-topic 2: Correlations
• Example: Testing a correlation: Pearson’s [CD]
• Excel worksheet: Relationships [CD]
• Example: Testing a correlation: Spearman’s [CD]
• Excel worksheet: Spearmans [CD]
• Exercises: Exercises 1 and 2a in Tutorial 7 in the Problems book
• Example: Misinterpreting correlations [CD]

Sub-topic 3: Linear regression
• Example: Calculating a regression [CD]
• Excel worksheet: Relationships [CD]
• Exercises: Exercise 2b in Tutorial 7 in the Problems book

Sub-topic 4: Non-linear relationships
• Example: Transforming a relationship [CD]
• Excel worksheet: Relationships [CD]
• Exercises: Exercises 3 and 4 in Tutorial 7 in the Problems book

Sub-topic 5: Other methods
• No additional material

Note: [CD] means that the item is on the CD-ROM.

Introduction

In the unit so far, we have examined methods for testing hypotheses of two (general) types:

• Hypotheses about the means of samples.

• Hypotheses about the frequency distribution of samples.

Although these two sets of methods test different kinds of hypotheses, they are similar in that in both cases the hypotheses concern only single variables. Consider these examples of null hypotheses:

• H0: The mean lead content is less than or equal to the permitted amount.

• H0: The mean number of fish does not differ among experimental treatments.

• H0: The observed proportions of eye colours do not differ from Mendelian predictions.

• H0: The proportions of shell patterns do not differ among habitats.

These hypotheses concern, respectively, “lead content”, “number of fish”, “eye colours” and “shell patterns” and each of these is a single variable. Because these hypotheses refer to only one variable, the methods used to test them can be referred to as univariate procedures (or univariate statistics).

Consider, in contrast, these null hypotheses:

• H0: There is no relationship between nitrogen in the soil and plant growth.

• H0: There is no relationship between the size of a park and the number of species present.

• H0: There is no relationship between the length of a fish and its weight.

Each of these (perfectly fine) hypotheses mentions two variables. Also, these null hypotheses are not about differences in means or frequency distributions, but about the presence or absence of relationships. The methods used to test these hypotheses can be called bivariate procedures (or bivariate statistics). Topic 7 looks at hypotheses about relationships. In it, we will cover these issues:

• some reasons for looking at relationships;

• two methods, applicable to different types of variables, for testing whether or not a linear relationship exists;

• a method for describing the nature of the linear relationship and predicting one variable (Y; the dependent variable) from the other (X; the independent variable); and

• some methods which can be used when relationships are not linear.

Sub-topic 1: Why look at relationships

Consider again the “relationship” null hypotheses given above:

• H0: There is no relationship between nitrogen in the soil and plant growth.

• H0: There is no relationship between the size of a park and the number of species present.

• H0: There is no relationship between the length of a fish and its weight.

It is, I think, fairly easy to think of situations in which two of these null hypotheses might arise. The first might result from developing models about environmental factors affecting plant growth. The second might arise during studies aimed at developing models for park design and management.

The third is, perhaps, a little less obvious. Indeed, there is little chance of this null hypothesis being true: a long fish (of any particular species) will almost always weigh more than a short fish. In practice, null hypotheses similar to the third above usually arise in the first stage of studies designed to describe, and then utilise, the relationship between the two variables: in this case, weight and length. It is, for instance, usually easier to accurately measure the length of live animals than it is to accurately weigh them (because the animal’s wiggling causes the weight to bounce all over the place). But if an animal’s weight is closely related to its length (or some other measure), then we can measure the easier variable – length – and use this information to estimate its weight.

The last example demonstrates that, when examining relationships among variables, we may want to first test whether or not a relationship exists. If one does, then we may want to describe (and, potentially, use) that relationship. The first step here again involves statistically testing a null hypothesis. The methods for doing this differ in detail from those examined in earlier sections, but the overall process is the same (calculate a test statistic, etc.). The second step also involves the use of

statistical methods, but not to test a null hypothesis: instead the methods are used to find the best description of the relationship for predicting one variable from the other.

Sub-topic 2, immediately below, examines ways of testing null hypotheses about relationships. The remaining sub-topics explore the process of describing, and using, the relationship.

About dependent and independent, and X and Y, variables

Let’s examine, for the third (but not last) time, the three “relationship” null hypotheses introduced earlier:

• H0: There is no relationship between nitrogen in the soil and plant growth.

• H0: There is no relationship between the size of a park and the number of species present.

• H0: There is no relationship between the length of a fish and its weight.

Note that all three are of the form:

• H0: There is no relationship between Variable A and Variable B.

Note also that all three presume that we have observed Variables A and B on the same individual “entities”: plants, parks and fish. It would, for instance, be silly to suggest that there might be a relationship between the sizes of six randomly selected parks from New South Wales and the numbers of species in six randomly selected parks in the Northern Territory. The data in these cases consist of paired observations on appropriate plants, parks or fish.

Statisticians often refer to these two variables as X1 and X2. One reason for doing this is to emphasise that, in a sense, both variables are of equal importance, and that the correlation of X1 with X2 is the same as the correlation of X2 with X1. This may not make much sense at first, but it is relatively easy to explain with an example. Consider the two graphs below:

Figure 18. Two plots of relationship between length and weight.

Both graphs plot the same data and show the same relationship. Both are equally “sensible” because weight is related to length and length is related to weight. Consider, in contrast, the next two graphs (Figure 19).

Here the left graph may make more “sense” than the right. It may make sense to say that the number of species in a park is related to the area (size) of the park but the reverse statement – that the area of a park is related to the number of species – seems a bit odd.

Figure 19. Two plots of relationship between species and area.

The difference between these two situations – fish size and park diversity – results from differing interpretations of the underlying biology. In the case of “length” and “weight” we recognise that these two variables increase together because they are two measures of the same thing – the size of the fish. Further, both are likely to be related to another variable which is really “responsible” for fish being different sizes: age. In other words, length and weight both depend (to some extent) on age; but length doesn’t depend on weight and weight doesn’t depend on length. Thus, it may not matter too much which variable goes on the bottom (X1) axis and which goes on the side (X2, or Y) axis.

The “park” example is a little different. Here we may assume that the number of species in the park does depend on the area of the park (or things to do with the area, such as habitat diversity). Further, most of our “real” interest is in the number of species present and less in the area of the park. In this case, it makes more sense (and is conventional) to plot “number of species” on the side axis and “area” on the bottom axis.

Some people use the codes X1 and X2 to refer to the variables in those situations – such as the fish – in which either variable can “sensibly” go on the side axis. In other situations – like the park diversity one – they use Y to refer to the dependent variable, plotted on the side axis, and X to refer to the independent variable, plotted on the bottom axis.

Even in situations – such as the fish – where either variable could be plotted on either axis, we may actually be more “interested” in one of them. For instance, in the “fish example”, one reason for looking at the relationship might be (as we saw earlier) so that we can estimate the weight of a fish after measuring its length. In this case, it is “weight” that is really of interest and it should, therefore, be plotted on the side axis.

IMPORTANT: In this unit, I will just use the more familiar “X” and “Y” terms for the variables and axes. Also, it is critically important, when the aim is to estimate one variable from another, that the variable to be estimated is the “Y” variable. (The reason will be discussed later.)

Sub-topic 2: Correlations

The four graphs in Figure 20 illustrate relationships of various strengths. The graph in the top–left shows a situation where Y is perfectly correlated with X (all the points lie on a straight line). The graph on the bottom–right is of a situation in which there is no correlation at all between Y and X: the two vary quite independently of each other.

Figure 20. Examples of relationships of different strength.

Correlation coefficients do two things. Using them we can describe or measure the strength of the relationship (or correlation) and test the null hypothesis that there is no correlation. Both correlation coefficients discussed in this section can vary between –1 (minus one) and +1 (plus one). The table below summarises the meaning of different values. (Note that “~” means approximately, or around. Also note that these are my own categories; other people use similar but slightly different boundaries.)

+1 A perfect positive relationship. All the points lie on a line sloping up to the right.

~0.8 A fairly strong positive relationship. Most of the points lie fairly close to a line sloping up to the right.

~0.5 A moderate positive relationship. A trend sloping up to the right is evident but the points are scattered around it.

~0.2 A rather weak positive relationship. The points form a cloud but this tends up to the right and down to the left.

0 No relationship at all. The points form a random “cloud”.

~–0.2 to ~–0.8 These values indicate negative relationships which are, respectively, rather weak, moderate, or fairly strong.

–1 A perfect negative relationship. All the points lie on a line sloping down to the right.

IMPORTANT: Note that the terms “weak” and “strong” here are only descriptive. You must test the null hypothesis to determine whether or not a relationship actually exists.

IMPORTANT: These correlation coefficients MUST be between –1 (minus one) and +1 (plus one). Any result outside these limits is an error.

Pearson’s parametric correlation coefficient for quantitative data

Pearson’s correlation coefficient – also called the parametric correlation coefficient, or just the correlation coefficient – can be used when both variables are quantitative (which usually means counted or measured).

Read Section 11.1 of the Statistical Manual, then continue with the Study Guide.

Review the example of “Testing a correlation: Pearson’s” on the CD.

Explore the “Data-Graph” and “Correlations” pages in the Excel file “Relationships” on the CD.

As described in the Manual, once the correlation coefficient has been calculated, the null hypothesis – that there is no relationship – can be tested quite simply. It is just a matter of comparing the calculated value to that from the tables, with the appropriate degrees of freedom (which is n – 2).
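As a minimal sketch (with invented paired measurements), Pearson’s coefficient is the sum of cross-products of deviations, scaled by the sums of squares of each variable:

```python
import math

# Invented paired observations on the same individuals
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of cross-products and squared deviations
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)   # about 0.77 for these invented data
df = n - 2                       # degrees of freedom for the table lookup
```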

Spearman’s non-parametric correlation coefficient for ranked data

In some situations, one or both variables are ranked (or ordinal) variables, and not quantitative. For instance, in environmental studies, “damage” may be recorded on a scale of, say, 1, meaning none, to 5, meaning severe. In behavioural studies, individuals may be ranked by their position in the social structure of the group, or in some dominance hierarchy.

In these situations, Spearman’s rank correlation coefficient – also called Spearman’s non-parametric correlation coefficient – is more appropriate.
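A minimal sketch of Spearman’s coefficient, assuming no tied ranks (ties need averaged ranks, which this simplified version does not handle):

```python
# Invented ranked observations (e.g. damage score vs dominance rank)
x = [3, 1, 4, 2, 5]
y = [2, 1, 3, 4, 5]

def to_ranks(values):
    """Rank each value (1 = smallest); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

rx, ry = to_ranks(x), to_ranks(y)
n = len(x)

# Classic no-ties formula: rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs = 1 - 6 * d2 / (n * (n ** 2 - 1))
# rs = 0.7 for these invented data: a fairly strong positive rank correlation
```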

Read Section 11.2 of the Statistical Manual, then continue with the Study Guide.

Review the example of “Testing a correlation: Spearman’s” on the CD.

Explore the “Data-Graph” and “Correlations” pages in the Excel file “Spearmans” on the CD.

Do Tutorial 7, “Testing hypotheses about relationships”, Exercises 1 and 2a, in the Problems book.

These are tests for linear trend

It is important to remember that these correlation coefficients test for linear trend. Examine Figure 21 below.

Calculating the correlation coefficient for the graph on the left (A) would give a reasonably high value – around 0.8 – because the points do fall fairly close to an upward sloping line. It is, however, very obvious that the relationship is curved.

Figure 21. Examples of non-linear relationships.

The correlation coefficient for the graph on the right (B) would be close to 0 (zero) because there is, in this case, no linear trend. But the relationship in Graph B is just as strong as that in Graph A.

IMPORTANT: This is a critical point! It is important to review plots of the data to check that any relationship present is linear.

Re-read Section 11.1.5.2 of the Statistical Manual, then continue with the Study Guide to see the calculations for Example B above.

Try entering these values – (0, 0); (1, 1); (2, 3); (3, 4); (4, 3); (5, 1); and (6, 0) – into the “Data-Graph” page in the Excel files “Relationships” and “Spearmans” on the CD and looking at the correlations. Are they significant? Is there a relationship? [Note: (6, 0) means enter 6 for the X and 0 for the Y.]
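For the hump-shaped values in the exercise above, Pearson’s coefficient comes out at essentially zero, even though the relationship is clearly strong; a sketch:

```python
import math

x = [0, 1, 2, 3, 4, 5, 6]
y = [0, 1, 3, 4, 3, 1, 0]   # the hump-shaped data from the exercise

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)
# r is essentially zero: there is no linear trend, despite the clear curve
```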

“Correlation does not mean causation”

Once a significant correlation has been identified, it is tempting – too tempting for many people! – to conclude that the relationship thus identified is causal. In other words, that changes in the X variable (whatever it is) cause changes in the Y variable (whatever it is).

This temptation should be avoided! Changes in X may cause changes in Y but there may well be something more complicated going on. For instance, in most countries there are correlations between (a) the number of pubs (or hotels or bars) and the number of churches, and (b) the number of crimes and the number of police. It would, however, be quite wrong to conclude that (1) religion causes people to drink, or (2) that police cause crime!

These two examples, although silly, do illustrate the two main problems that can arise. The numbers of pubs and churches are correlated, not because religion drives people to drink, but because large towns have more people, so have more pubs and more churches. In other words, the numbers of both institutions are related to the number of people present, and the latter is the causal variable. And, of course, police do not cause crime: in this case there is a causal relationship, but it goes the other way.

IMPORTANT: This is a critical point! Much nonsense is generated, and spread, by assuming that a correlation indicates a causal relationship. This point is so important that it is explored on the CD.

Review the examples of “Misinterpreting correlations” on the CD.

Sub-topic 3: Linear regression

Calculating a correlation coefficient enables us to test whether or not a relationship exists. Sometimes we want, or need, to go further and describe that relationship. Describing the relationship is, in a way, like calculating a mean: it allows us to describe the results in more detail.

The simplest way to describe a linear, or straight-line, relationship is to work out the equation that best represents (what is meant by “best represents” will be discussed shortly) that relationship. The general equation for a straight-line is:

• Y = a + b × X

In this equation, Y is the Y-variable plotted on the Y-axis, and X is the X-variable plotted on the X-axis. The letter “a” stands for a number, called the Y-intercept, which is the point where the line crosses the Y-axis. The other letter “b” stands for another number, the slope, which indicates how steeply the line slopes. If “b” is a positive number, the line slopes up; if “b” is a negative number it slopes down.

The equation of the line describes it precisely. Further, in some cases the values of the numbers, a and b, may themselves be of special interest. For instance, these numbers may be compared with predicted values to determine if some general theory holds: physiology and island biogeography are two fields in which this is commonly done.

The other main use of the equation is that it enables us to predict what Y should be for different values of X. Suppose, for instance, that we have the following equation for mud crabs:

• Weight (g) = –1782.12 + 17.10 × Carapace width (cm)

With this equation, we can now estimate the weight of any mud crab, provided we have measured its carapace width. Equations are often used in this “instrumental” fashion.
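The slope and intercept come from the standard least-squares formulas; here is a sketch with invented data (not the mud crab figures above):

```python
# Invented paired data (e.g. X = a size measure, Y = weight)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 4.1, 5.9, 8.1]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares estimates: slope b = Sxy / Sxx, intercept a = mean(Y) - b * mean(X)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sxy / sxx                 # slope: 1.98 for these data
a = mean_y - b * mean_x       # Y-intercept: 0.1 for these data

predicted_y = a + b * 2.5     # predict Y for X = 2.5 (about 5.05)
```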

What does “best represents” mean?

Most of the time, the equation that is calculated is the one that allows us to make the best predictions of Y from X that we can (with the data we have). When the equation is derived especially for the purpose of making predictions, this is both sensible and essential.

Some people consider that this way of defining “best represents” may not be as suitable in other situations: in other words, in situations in which making predictions is not the primary aim. The arguments raised are rather technical, and the alternative methods proposed are not without their own problems. The linear regression method described in this unit is the one that is suitable for making predictions and is also the method most generally used.

IMPORTANT: The equations for predicting Y from X and X from Y will usually not be the same, so you can’t derive the equation to predict Y from X and simply rearrange this to predict X from Y: you must re-calculate the equation. This is a consequence of the way in which “best represents” is defined (and this is further discussed in the Statistical Manual).
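A small sketch illustrates the point with invented data: fitting Y on X and X on Y gives two different lines, and rearranging one does not recover the other.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b_y_on_x = sxy / sxx   # slope for predicting Y from X: 0.6
b_x_on_y = sxy / syy   # slope for predicting X from Y: 1.0

# Rearranging the X-on-Y line gives a slope of 1 / b_x_on_y = 1.0,
# which is NOT the Y-on-X slope of 0.6: the two regression lines only
# coincide when the correlation is perfect.
```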

Read Section 12 of the Statistical Manual, then continue with the Study Guide.

Review the example “Calculating a regression” on the CD.

Explore the “Regression” page in the Excel file “Relationships” on the CD.

Do Tutorial 7, “Testing hypotheses about relationships”, Exercise 2b, in the Problems book.

Sub-topic 4: Non-linear relationships

There is one practical problem that frequently arises in biology when trying to use the methods described in the previous two sub-topics: biological relationships are often curved, not linear. In such situations, “straight-line” methods may give unreliable and inaccurate results.

Two common kinds of curved relationships are shown in Figure 22 and it is fairly obvious that a straight-line doesn’t describe either situation very well. Predictions from a straight-line “fitted” through either of these relationships will probably be unreliable.

Figure 22. Two examples of common non-linear relationships.

There are two general ways of proceeding in this situation:

• Transform one, or both, of the variables so that the relationship is no longer curved, then use standard straight-line methods.

• Use other methods to derive and fit a more complex equation.

Both of these kinds of approaches are examined below.

Transforming variables

A curved relationship can often be “straightened” by a relatively simple transformation of one or both variables.

The relationship on the left in Figure 23 (A) is clearly curved, as such weight-length relationships often are. If, however, we plot the logarithm of the X and Y values, instead of the original observations, the resulting relationship will often be straight, or nearly so (Figure 23B). We can then proceed to use the usual methods for calculating correlation coefficients and regressions.

Figure 23. Transforming a curved relationship.

A wide range of transformations are possible but the most commonly used, and most commonly effective, involve taking logarithms of one, or both, variables. In part this is because there are theoretical reasons for expecting some relationships to display patterns (patterns which can be easily straightened by log transforms). For instance, under certain reasonable assumptions, weight-length relationships can be expected to have a particular type of shape, called a power curve, which can be straightened by a double-log transform (i.e. taking logarithms of both variables).
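A sketch of the double-log idea, using an invented power curve Weight = a × Length^b (the parameter values are hypothetical): after taking logarithms of both variables, the relationship is exactly linear, and its slope recovers the exponent b.

```python
import math

# Hypothetical power-curve parameters, for illustration only
a, b = 0.02, 3.0
lengths = [10.0, 20.0, 30.0, 40.0, 50.0]
weights = [a * L ** b for L in lengths]    # curved relationship

# Double-log transform straightens it: log(W) = log(a) + b * log(L)
log_l = [math.log10(L) for L in lengths]
log_w = [math.log10(w) for w in weights]

# Least-squares slope of log(weight) on log(length)
n = len(log_l)
mean_x, mean_y = sum(log_l) / n, sum(log_w) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(log_l, log_w))
sxx = sum((xi - mean_x) ** 2 for xi in log_l)

slope = sxy / sxx    # recovers the exponent b = 3.0 (up to rounding)
```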

Read Section 13.1 of the Statistical Manual, then continue with the Study Guide.

Review the example of “Transforming a relationship” on the CD.

Explore the “Transform” and “Regressions” pages in the Excel file “Relationships” on the CD.

Do Tutorial 7, “Testing hypotheses about relationships”, Exercises 3 and 4, in the Problems book.

Other methods

In some situations, transformation may not give satisfactory results, or may be undesirable for other reasons. In such situations it may be necessary to use other methods: methods which are better suited to working with curved relationships. These are briefly discussed in the Manual but they are really beyond the scope of this unit. One of these methods – multiple regression – is also discussed briefly in Topic 8, because it is more generally useful.

Read Section 13.2 of the Statistical Manual, then continue with the Study Guide.

Summary

• The degree of relationship between two variables (Y and X) can be measured using a correlation coefficient. The correlation coefficient ranges from –1 (minus one) to +1 (plus one), with –1 indicating a perfect negative (sloping down) relationship and +1 indicating a perfect positive (sloping up) relationship. A value of 0 (zero) indicates no relationship at all.

• When both variables are quantitative, Pearson’s parametric correlation coefficient can be used. When one, or both, variables are ranked, Spearman’s non-parametric correlation coefficient can be used.

• The null hypothesis that there is no relationship can be tested by comparing the calculated correlation coefficient to the appropriate set of tables.

• The method of linear regression can be used to derive the straight-line equation that best predicts Y from X.

• Curved relationships are common in biology but can often be handled using linear regression if one, or both, of the variables are transformed.

• In situations in which transformations are not effective, or not desirable, other methods are available.
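The first two summary points can be sketched in Python (a minimal numpy-only illustration; in practice a statistical package's ready-made correlation functions would be used):

```python
import numpy as np

def ranks(v):
    """Rank the values 0, 1, 2, ... (assumes no ties, for simplicity)."""
    return np.argsort(np.argsort(v)).astype(float)

def pearson_r(x, y):
    """Pearson's parametric correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman_r(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# A curved but perfectly monotonic (always-increasing) relationship
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

print(round(pearson_r(x, y), 3))   # about 0.94: the relationship is not straight
print(round(spearman_r(x, y), 3))  # 1.0: the ranks agree perfectly
```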


Topic 8: An introduction to multivariate analysis

Sub-topic 1: Why use multivariate analysis? • No additional material
Sub-topic 2: Kinds of multivariate analysis • No additional material
Sub-topic 3: Multiple regression • No additional material
Sub-topic 4: Cluster analysis • Example: Uses of cluster analysis [CD]
Sub-topic 5: Ordination • Example: Uses of ordination [CD]

Note: [CD] means that the item is on the CD-ROM.

Introduction

Most of this unit deals with methods which can be used to test hypotheses about one (Topics 4–6), or at most two (Topic 7), variables. This is appropriate: these methods suffice in many situations, as the examples used throughout should illustrate.

Of course, many studies collect information on more than one or two variables. And in some of these studies, focussing on only one or two variables may rather constrain or limit the questions being asked, or hypotheses being tested.

For instance, a study looking at plant growth might involve measuring growth and several environmental variables (such as light, moisture, nutrients, etc.). Using the methods of Topic 7 we could test if there was a relationship between growth and any of these environmental variables (by calculating the correlation coefficient). We could also describe the relationship between growth and any one of these variables (by calculating a regression). Doing this would not, however, allow us to investigate how growth responded to all of the important environmental variables (to do this we would need to calculate a multiple regression; see below).

Situations – such as this plant growth study – where we want to look simultaneously at several variables require special multivariate methods. Topic 8 is an introduction to multivariate analysis. In it, we will cover these issues:

• some reasons for using multivariate analysis;

• general types of multivariate analyses;

• multiple regression, an extension of linear regression; and

• two of the most common types of multivariate analyses, cluster analysis and ordination.

Sub-topic 1: Why use multivariate analysis?

It should be apparent from the brief introduction above that there are some situations in which univariate and bivariate methods may not be ideal. In this section, we examine the question “Why multivariate analysis?” in a little more detail, first with reference to ecology and subsequently in other situations.

Ecological “community” questions

Community data

Before looking at "community" questions, it is helpful to consider some of the kinds of observations which are commonly collected in "community" ecology studies. The results of such studies are often recorded in tables similar to Table 5, below. (And this table illustrates a useful format to use for such studies.)

In the particular study illustrated, samples of the benthic animals were taken in three pools on each of two streams at two seasons. Two replicates were taken in each pool each time. In addition, information on environmental conditions was also collected.


Table 5. Example of data table for a community ecological study.

Variables with information for sample identification and tracking

Variables with information about the study design

Variables with environmental information about each sample location

Variables with species information (abundances) (Columns for other species would follow, if more species were found.)

Variables with community information

Sample #  Sample ID  Date  Stream  Pool  Replicate  Temperature  Flow  Depth  Algal cover  Species 1  Species 2  Species 3  Species 4  Species 5  Number of species  H' Diversity

1 N11 3 May North 1 1 18 4 1.1 65 6 35 78 1 0 4 0.83

2 N12 3 May North 1 2 19 3 1.2 53 5 43 76 0 0 3 0.80

3 N21 4 May North 2 1 16 3 1.5 56 4 45 98 4 0 4 0.83

4 N22 4 May North 2 2 15 4 1.6 78 6 55 88 2 0 4 0.87

5 S11 6 May South 1 1 17 1 1.8 45 15 63 37 9 1 5 1.19

6 S12 6 May South 1 2 17 2 1.9 58 11 69 34 11 1 5 1.15

7 S21 7 May South 2 1 18 2 2.2 55 12 84 24 12 3 5 1.12

8 S22 7 May South 2 2 18 1 2.3 43 23 83 23 11 2 5 1.16

Sample ID is just the first letter of the stream name, followed by the pool number and replicate number. Temperature is in degrees Celsius; flow is on a six-point scale (0 = none to 5 = very fast); depth is in metres; algal cover is in percent cover. Values for species are counts of the number of individuals found. More columns would be used if more species were found. There might also be a column for "notes".

Note that in this table the rows contain the information for the different samples. This is the usual arrangement, although in some cases the rows are the variables and the columns have the information for the different samples.


The data table (Table 5) is a comprehensive record of the information collected. Note that information on five types of variables is present (usually in the order shown):

• Variables with information for sample identification and tracking. This information is important during data entry and checking, and for subsequent tracking of errors or anomalies.

• Variables with information about the study design. These columns specify the design of the study: in this case, which replicate, pool and stream each sample is from.

• Variables with environmental information. These columns have the results for the environmental variables recorded for each sample. (Note that in a study focussing on the algae, observations might be made for all the species present and these would then be “variables with species information”.)

• Variables with species information. These values are usually counts of the numbers of individuals of each species found, but this information might also be recorded as biomass or percent cover (for sessile species). (Note that, obviously, the correct species names, or abbreviations, would usually be used, if they were known.)

• Variables with community information. These values are derived from the species information. (The “number of species” is just a count of the number of species present and “H’ diversity” is calculated from a standard formula.)
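As an illustration, assuming the "standard formula" is Shannon's index, H' = −Σ p_i ln(p_i), where p_i is the proportion of individuals belonging to species i, a short Python sketch reproduces the community values for sample 1 (N11) of Table 5:

```python
import math

def shannon_h(counts):
    """H' = -sum(p_i * ln(p_i)), summed over species actually present."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]  # zero counts are excluded
    return -sum(p * math.log(p) for p in props)

# Species counts for sample 1 (N11) in Table 5
counts = [6, 35, 78, 1, 0]
print(sum(1 for c in counts if c > 0))  # number of species: 4
print(round(shannon_h(counts), 2))      # H' diversity: 0.83
```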

With this table of data as an example, let us proceed to consider what sorts of questions might be asked – or what sorts of hypotheses might be tested – as part of studies on the ecology of these streams. (We won’t, in this case, work through the entire model development procedure.)

Univariate questions or hypotheses

Studies might proceed by focussing on individual species or environmental conditions. For instance, these questions might be asked (for simplicity we’ll just look at the questions and take the hypotheses as “understood”):

• Is Species 1 (which might be of some special interest) equally abundant in both streams? (The null would be: Species 1 does not differ in abundance between North and South Streams.) Similar questions could, of course, be asked for the other species.

• Does algal cover differ between the streams? Again, similar questions could be asked for the other environmental variables.

• Does the number of species differ between streams? A similar question could be asked about H’ diversity.

All of these questions (and their matching hypotheses) focus on single variables. The last question does address a “community ecology” issue, because “diversity” is an attribute of communities, but does so by looking at single variables.

Bivariate questions or hypotheses

The univariate questions focus on a single variable, either a species variable, an environmental variable, or a single community variable. We might also ask questions about relationships (again we'll omit the hypotheses):

• Is the abundance of Species 1 related to the flow rate? And similar questions could be asked about relationships between other species and this, or other, environmental variables.

• Is the flow rate related to the depth? Again, similar questions could be asked about relationships between the other environmental variables.

• Is H’ diversity related to algal cover? And, again, similar questions could be asked about relationships between the other community variable and this, or other, environmental variables.

The answers to these questions might very well be of interest, but note that any particular question still concerns only a relatively small amount of the data collected for each sample.

Community questions

Derived community variables, such as the number of species or H' diversity, summarise complex information but, in doing so, also omit potentially valuable details. For instance, two pools may have exactly the same number of species but these may be totally different species. The communities in these pools would have the same diversity but differ in structure (because the actual species present would be different).

Thus, analyses of derived community variables may not "tell the complete story". Also, as noted in the introduction, we may want to know how the species, or community, respond to the entire suite of environmental variables (and not just depth or temperature). Consider these questions:

• What is the relationship between the abundance of Species 1 and algal cover, temperature, depth and flow?

• Considering all the environmental variables together, are the differences between the environments in the two streams greater than between the two pools in each stream?

• Do the communities in the two streams differ in structure?

• Is the community in Pool 2 on South Stream unusual in structure? Or is its structure similar to that of the other pools (from both streams)?

Questions like these cannot be easily answered (if at all) by focussing on individual environmental or species variables, or even by looking at relationships between pairs of variables. They require analyses which focus on entire suites, or sets, of variables. Multivariate methods are designed to address these sorts of questions.

Other situations

The data in Table 5, and the subsequent questions, are ecological in nature but similar sorts of data sets, and questions, arise in other disciplines.

• In studies of animal behaviour, the "species variables" might be replaced by observations on different kinds of behaviour (e.g. aggression, feeding, grooming). The environmental variables would then probably refer to potentially important features of each animal and its environment (e.g. size, age, condition, social position).

• In genetic studies, the “species variables” might be observations on the genetic characteristics of different samples. The “environmental variables” might then record potentially important information about the source of each sample.

• In physiological studies, the “species variables” might be observations on physiological variables for plants or animals. The “environmental variables” would again record potentially important information about the environment of each individual sampled.

• In evolutionary studies, the “species variables” might be observations on various morphological features (e.g. size, colour pattern, dentition).

In any of these situations, researchers may well proceed by testing univariate or bivariate hypotheses, using the kinds of methods described in earlier topics. They may also, however, need to address questions which require focussing simultaneously on several variables and, therefore, need to turn to multivariate methods.

Early restrictions on the use of multivariate methods

Most multivariate methods, even for small data sets, require a reasonable amount of "computing grunt". Now it is possible – indeed it is usually easy – to do complex analyses on a desktop (or laptop) computer that would have required a fairly large "main-frame" computer just 10 to 15 years ago. Not only has raw computing power increased phenomenally, so has the range of statistical packages available, and their "user friendliness".

In the “old days”, when statisticians derived analyses, they usually had to pay fairly careful attention to the practicality of doing the calculations required. There was little point in deriving some “gee whiz” method if it would take a roomful of mathematical geniuses two years to analyse one small data set!

Such considerations are now of much less importance. Computers are not particularly smart but they are extraordinarily fast. Several modern methods rely on this raw speed.

Sub-topic 2: Kinds of multivariate analysis

Tabachnick and Fidell (1989), in their book Using Multivariate Statistics, divide multivariate methods into four groups with different objectives:

• Methods for examining differences among groups;

• Methods for examining relationships among variables;

• Methods for examining group membership; and

• Methods for examining structure.

Read Sections 14.1 and 14.2 of the Statistical Manual, then continue with the Study Guide.

As noted in the Manual, the first two of these four groups represent extensions of the sorts of methods discussed in Topics 4 to 7 in this Study Guide. Most of these methods are not commonly used and will not be discussed further. One important exception is multiple regression, which is considered in the next section.


Sub-topic 3: Multiple regression

Multiple regression is a (relatively) simple extension of ordinary regression. Using ordinary linear regression we can calculate the equation of a straight line which enables one variable to be predicted from another. For instance, a study of plant growth might result in the following equation (where a and b are numbers):

• Growth = a + b × Nitrogen

We would, however, expect that plant growth might well be influenced by other factors, besides nitrogen, and want to take these into account. Using multiple regression we can do this.

Read Section 14.3 of the Statistical Manual, then continue with the Study Guide.

An expanded “growth prediction equation” might look like this (where a, b1, b2 and b3 are numbers; N is nitrogen and P is phosphorus):

• Growth = a + b1 × N + b2 × P + b3 × Moisture

In this equation, "growth" is the dependent variable – as it is in the simple straight-line equation – but there are three independent variables: "nitrogen", "phosphorus" and "moisture".

The concept of multiple regression is relatively simple but the calculations definitely are not! A computer and a decent statistical package are required to do them.
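As a sketch of what the computer is doing (invented data and coefficients; numpy's least-squares routine stands in for a statistical package), the expanded growth equation can be fitted like this:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: growth driven by nitrogen (N), phosphorus (P) and moisture
n = rng.uniform(0, 10, 50)
p = rng.uniform(0, 5, 50)
moisture = rng.uniform(10, 40, 50)
growth = 2.0 + 0.8 * n + 1.5 * p + 0.1 * moisture  # a=2, b1=0.8, b2=1.5, b3=0.1

# Design matrix: a column of ones (for the intercept a) plus one column per
# independent variable; least squares then finds a, b1, b2 and b3 together
X = np.column_stack([np.ones_like(n), n, p, moisture])
coefs, *_ = np.linalg.lstsq(X, growth, rcond=None)
print(np.round(coefs, 3))  # recovers [2.0, 0.8, 1.5, 0.1]
```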

Limitations of multiple regression

The Manual mentions some limitations. A major one is that all the relationships between the dependent variable and the independent variables are assumed to be linear.

A second is that the results can depend critically on exactly which independent variables are included. For instance, suppose an equation is required to predict diversity for benthic communities from environmental measurements (this is derived from the example "Mis-interpreting correlations: extended example" on the CD for Topic 7). The table below (derived from a "dummy" data set) shows how the values of the various coefficients (for Zinc, Nutrients and Depth, these are the multipliers) vary depending on which variables are included in the equation.

Table 6. Effect of excluding variables on results of multiple regression.

Variables included       Intercept    Zinc   Nutrients   Depth
Zinc, Nutrients, Depth       45.41    1.29        5.06    0.05
Nutrients, Depth             65.82       –        5.00    0.22
Zinc, Depth                 601.38   –2.51           –    0.34

Note that most of the numbers change to some degree, depending on which variables are included. For instance, the multiplier for depth is 0.05, 0.22 or 0.34, depending upon which of the other two variables are included. The value for Zinc is positive (+1.29) if all independent variables are used but negative (–2.51) if Nutrients is not included. This is a big change!
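The pattern in Table 6 can be reproduced with a small simulation (invented data and coefficients, not the Manual's "dummy" data set): when two predictors are strongly correlated, dropping one changes, and can even flip the sign of, the other's coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: "zinc" and "nutrients" are strongly correlated, and
# diversity responds positively to zinc but negatively to nutrients
zinc = rng.uniform(0, 10, 200)
nutrients = 2.0 * zinc + rng.normal(0, 0.5, 200)   # correlated with zinc
diversity = 5.0 + 1.3 * zinc - 1.0 * nutrients + rng.normal(0, 0.2, 200)

def fit(cols):
    """Least-squares multiple regression: intercept plus the given columns."""
    X = np.column_stack([np.ones(len(diversity))] + cols)
    return np.linalg.lstsq(X, diversity, rcond=None)[0]

full = fit([zinc, nutrients])   # zinc coefficient near +1.3
reduced = fit([zinc])           # zinc now "absorbs" the nutrient effect
print(round(full[1], 2), round(reduced[1], 2))  # positive, then negative
```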

Aside from these potential problems, multiple regressions, just like bivariate correlations and regressions, reveal relationships but not causes.

Sub-topic 4: Cluster analysis – "group membership"

Most people will have seen a diagram similar to the one below:

Chimpanzee

Cane toad

Dingo

Human

Figure 24. Evolutionary "tree" for four species.

It shows the evolutionary relationships among the species: that is, the "tree" links together, or groups together, species on the basis of the closeness of their evolutionary links.

With an evolutionary diagram like this, we can even give the groups names. The group comprising the chimpanzee and human is the primates. The group including the dingo, chimpanzee and human is the mammals. And the group including all the animals is the vertebrates.

Cluster analysis is the process of grouping “entities” together on the basis of their similarity. The “entities” may be species, samples, times, study sites or individuals. And “similarity” may be measured by closeness of evolutionary relationship, similarity in genetic composition, similarity of community structure, or similarity in environmental conditions. The overall aim is to form groups on the basis of the shared similarity of the entities in them.

Measures of similarity

Computers are “numerical animals”. The evolutionary tree in Figure 24 was constructed by humans evaluating a diverse array of different types of evidence. A computer working at the same task requires that the evidence, whatever it is, is represented by numbers. The computer can then use those numbers to determine patterns of similarity.

The first step in doing a cluster analysis is, therefore, to derive some measure of similarity between the different pairs of "entities" under study. The results of this first step are often presented in a "table of similarities". (Sometimes "dissimilarity" is measured but this is usually just the reverse of similarity. "Dissimilarity" is also sometimes called "distance".)
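As an illustration of one such measure (a sketch only; the Manual may use different measures), the widely used Bray–Curtis similarity for species-count data can be computed like this:

```python
def bray_curtis_similarity(a, b):
    """Percent similarity between two samples of species counts."""
    shared = sum(min(x, y) for x, y in zip(a, b))  # abundance shared per species
    total = sum(a) + sum(b)
    return 100.0 * 2.0 * shared / total

# Species counts for two invented samples (similar communities)
sample1 = [6, 35, 78, 1, 0]
sample2 = [5, 43, 76, 0, 0]
print(round(bray_curtis_similarity(sample1, sample2), 1))  # 95.1 (percent)
```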


Figure 25. Table showing similarities between four species (note: values invented).

Toad Dingo Chimp Human

Toad – 29% 27% 26%

Dingo 29% – 75% 74%

Chimp 27% 75% – 99%

Human 26% 74% 99% –

These similarities are derived mathematically from the original observations on each species. If we were clustering natural communities, then the similarities would (usually) be derived from the abundances of the different species in each community.

Different methods of determining similarity are appropriate in different situations, and in some cases, there may be a choice between alternative similar methods. Discussion of these is beyond the scope of this unit.

Clustering strategies

Once the “table of similarities” (or dissimilarities) is available, the next step is to use this information to successively group similar “entities” together.

Read Section 14.4 of the Statistical Manual, then continue with the Study Guide.

Review the examples of cluster analysis on the “Examples of cluster analysis” page on the CD.

A wide range of different "strategies" is available for doing the clustering, and the outcome may well depend on the particular method selected. Again, discussion of the various alternatives is beyond the scope of this unit.
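A minimal sketch of one such strategy (single-linkage agglomerative clustering, using the invented similarities from Figure 25; the Manual may describe other strategies): groups are repeatedly merged, most similar first.

```python
# Similarities (%) between species, from Figure 25 (invented values)
sim = {
    ("Toad", "Dingo"): 29, ("Toad", "Chimp"): 27, ("Toad", "Human"): 26,
    ("Dingo", "Chimp"): 75, ("Dingo", "Human"): 74, ("Chimp", "Human"): 99,
}

def similarity(a, b):
    return sim.get((a, b), sim.get((b, a)))

def group_similarity(g1, g2):
    # "Single linkage": a group's similarity is its single closest pair
    return max(similarity(a, b) for a in g1 for b in g2)

# Agglomerative clustering: repeatedly merge the two most similar groups
groups = [("Toad",), ("Dingo",), ("Chimp",), ("Human",)]
merges = []
while len(groups) > 1:
    g1, g2 = max(
        ((a, b) for i, a in enumerate(groups) for b in groups[i + 1:]),
        key=lambda pair: group_similarity(*pair),
    )
    groups.remove(g1)
    groups.remove(g2)
    groups.append(g1 + g2)
    merges.append(groups[-1])

print(merges)  # primates first, then mammals, then all four (vertebrates)
```

The merge order reproduces the named groups from Figure 24: chimpanzee and human join first (the primates), then the dingo (the mammals), then the toad (the vertebrates).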

Sub-topic 5: Ordination – "structure"

Cluster analysis is a good method for putting "entities" into groups but it does not always display the relationships among the "entities" particularly well. For instance, the first cluster example on the CD groups some major Australian cities and towns on the basis of the distances between them. In this analysis, Townsville and Cairns are grouped together because they are closest, but Brisbane is put into a group which includes Sydney, Melbourne, Alice Springs, Yulara and Adelaide. Obviously, this does not display the true geographic relationships of these places all that well.

Ordination methods attempt to display relationships by plotting the "entities" on two- or three-dimensional maps (maps with four, or more, dimensions are possible but difficult for us to interpret). In general, two-dimensional maps are used, if they adequately represent the relationships present.

As in cluster analysis, the “entities” may be species, samples, times, study sites or individuals. Also, as in cluster analysis, the first step in the process is usually calculating a table of similarities (although this does depend on the method used).
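One well-known ordination method, principal components analysis (PCA), can be sketched with numpy (invented data; PCA works from the data matrix itself rather than a similarity table, and the Manual may cover other methods):

```python
import numpy as np

# Invented samples-by-variables data matrix (8 samples, 4 variables)
rng = np.random.default_rng(2)
data = rng.normal(size=(8, 4))

# PCA ordination: centre the data, then project onto the two directions
# (eigenvectors of the covariance matrix) with the greatest variance
centred = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centred.T))
axes = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # top two components
map2d = centred @ axes                            # x, y position of each sample

print(map2d.shape)  # (8, 2): each sample becomes a point on a 2-D "map"
```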

The end result of an ordination is a diagram, or "map", similar to that in Figure 26 (assuming that the results can be displayed in two dimensions). Points which are close together represent "entities" – species in this case – which are rather similar, and points which are widely separated represent "entities" which are very different.

[Figure 26 shows a two-dimensional ordination "map" of five species – bottle tree (baobab), eucalypt, date palm, sugar cane and cactus – plotted as points, with arrows labelled "thin stem" and "thick stem" (horizontal) and "short" and "tall" (vertical).]

Figure 26. Example of an ordination of species (invented example). Note that the arrows are explained later.

Read Section 14.5 of the Statistical Manual, then continue with the Study Guide.

Review the examples of ordination on the “Examples of ordination” page on the CD.

The term "structure" is used with reference to these methods because the results of an ordination may reveal something of the processes "structuring" the relationships. For instance, examining the ordination in Figure 26 reveals that species are separated across the page (left to right) primarily on the basis of the thickness of the main stem. Separation up and down the page (bottom to top) is primarily on the basis of height. The important underlying factors here are, therefore, stem thickness and height.

As with cluster analysis, there are several alternative methods which can be used to "ordinate" a set of entities. Again, discussion of these is beyond the scope of this unit.

Sub-topic 6: Limitations of some multivariate methods

The emphasis in this unit, particularly in Topics 4 to 7, has been on testing hypotheses derived from models. Some multivariate methods, such as multiple regression and MANOVA (multivariate analysis of variance), can be used in this way. In these cases, the “logic” is essentially the same as used throughout this unit (although the actual statistical methods may be considerably more complex).

Some multivariate methods – and this usually includes cluster analysis and ordination (although it depends on which methods are used) – do not "lend themselves" that well to formal tests of hypotheses. This is changing as new methods are developed. Nonetheless, much of the interpretation of the outcomes of multivariate analyses is essentially subjective, often consisting of inspection of the results for meaningful patterns. This can make the conclusions of such studies a little less certain and rather more debatable.

These limitations may not be important in some circumstances and they are being overcome in others. Even when these limitations exist, it is important to recognise that multivariate methods may be used to address questions which cannot be examined using other methods (or, perhaps, may be examined but only in a limited fashion). In such situations, their advantages usually greatly outweigh their limitations.

Summary

• Univariate and bivariate methods are often appropriate and sufficient for answering biological questions and testing hypotheses.

• In some situations, however, focussing on only one or two variables may not address the question adequately (or test an appropriate hypothesis).

• Multivariate methods use information on several variables simultaneously. The two types of methods used most commonly in biology are cluster analysis and ordination. Several different approaches are available for each of these two types of methods.

• Cluster analysis groups items (samples, sites, times, individuals) together on the basis of their similarity. The results are usually presented as a kind of “tree diagram” (called a “dendrogram”). There are several approaches and options.

• Ordination plots items (samples, sites, times, individuals) in space based on their similarity. The results are usually presented as a kind of two-dimensional map (assuming that the results can be adequately displayed in two dimensions). Again, there are several approaches and options.

References and Bibliography

Andrew, N.L. & Mapstone, B.D. 1987. Sampling and the description of spatial pattern in marine ecology. Oceanography and Marine Biology: An Annual Review 25: 39–90.

Box, G.E.P., Hunter, W.G. & Hunter, J.S. 1978. Statistics for Experimenters: Introduction to Design, Data Analysis and Model Building. John Wiley & Sons.

Chalmers, A.F. 1982. What is this Thing Called Science? An Assessment of the Nature and Status of Science and its Methods. University of Queensland Press.

Cochran, W.G. & Cox, G.M. 1957. Experimental Designs. John Wiley & Sons.

Conover, W.J. 1980. Practical Nonparametric Statistics. John Wiley & Sons.

Day, R.W. & Quinn, G.P. 1989. Comparisons of treatments after an analysis of variance. Ecological Monographs 59: 433–463.

Fowler, J., Cohen, L. & Jarvis, P. 1998. Practical Statistics for Field Biology. John Wiley & Sons.

Hurlbert, S.H. 1984. Pseudoreplication and the design of ecological field experiments. Ecological Monographs 54: 187–211.

McGuinness, K.A. 2002. Of rowing boats, ocean liners and tests of the ANOVA variance homogeneity assumption. Austral Ecology 27: 681–688.

Neter, J., Wasserman, W. & Kutner, M.H. 1985. Applied Linear Statistical Models. Richard D. Irwin Inc.

Quinn, G.P. & Keough, M.J. 2002. Experimental Design and Data Analysis for Biologists. Cambridge University Press.

Ruxton, G.D. & Colegrave, N. 2003. Experimental Design for the Life Sciences. Oxford University Press.

Siegel, S. 1956. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, London.

Snedecor, G.W. & Cochran, W.G. 1980. Statistical Methods. Iowa State University Press, Ames, Iowa.

Sokal, R.R. & Rohlf, F.J. 1969. Biometry. Freeman.

Steel, R.G.D. & Torrie, J.H. 1980. Principles and Procedures of Statistics: A Biometrical Approach. McGraw-Hill.

Tabachnick, B.G. & Fidell, L.S. 1989. Using Multivariate Statistics. Harper & Row.

Underwood, A.J. 1981. Techniques of analysis of variance in experimental marine biology and ecology. Oceanography and Marine Biology: An Annual Review 19: 513–605.

Underwood, A.J. 1990. Experiments in ecology and management: their logics, functions and interpretations. Australian Journal of Ecology 15: 365–389.

Underwood, A.J. 1997. Experiments in Ecology: Their Logical Design and Interpretation Using Analysis of Variance. Cambridge University Press, Cambridge.

Winer, B.J., Brown, D.R. & Michels, K.M. 1991. Statistical Principles in Experimental Design, 3rd edn. McGraw-Hill, New York.

Zar, J.H. 1984. Biostatistical Analysis, 2nd edn. Prentice Hall, Englewood Cliffs.