

Mitosis Data Analysis: Testing Statistical Hypotheses

By Dana Krempels, Ph.D. and Steven Green, Ph.D.

The raw data (singular = datum) you have collected for the past two lab sessions are counts of the number of cells in various stages of mitosis. This chapter will guide you through the process of data analysis so that you can determine whether there is a difference between your treatment and control onion root tips.

I. Data, Parameters, and Statistics: Quick Review

Recall that data can be of three basic types:

1. Attribute data. These are descriptive, "either-or" measurements, and usually describe the presence or absence of a particular attribute. Because such data have no specific sequence, they are considered unordered.

2. Discrete numerical data. These correspond to biological observations counted as integers (whole numbers). These data are ordered, but do not describe physical attributes of the things being counted.

3. Continuous numerical data. These are data that fall along a numerical continuum. The limit of resolution of such data is the precision of the methods and instruments used to collect them. Continuous numerical data often approximate a normal (Gaussian) distribution, which is described by a function indicating the probability that a data point will fall between any two real numbers.

Usually, data measurements are distributed over a range of values. Measures of the tendency of measurements to occur near the center of the range include the mean (the average measurement), the median (the measurement located at the exact center of the measurements when they are placed in order) and the mode (the most common measurement). Measurements of dispersion around the mean include the range, variance and standard deviation.

Parameters and Statistics

If you were able to measure the height of every adult male Homo sapiens who ever existed, and then calculate a mean, median, mode, range, variance and standard deviation from your measurements, those values would be known as parameters. They represent the actual values as calculated from measuring every member of a population of interest. Obviously, it is very difficult to obtain data from every member of a population of interest, and impossible if that population is theoretically infinite in size. However, one can estimate parameters by randomly sampling members of the population. Such an estimate, calculated from measurements of a subset of the entire population, is known as a statistic.
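To make the difference between a parameter and a statistic concrete, here is a minimal Python sketch. The ten heights are invented for illustration only. Treating them first as an entire population and then as a sample is an assumption made purely to show the two calculations side by side.

```python
import statistics

# Hypothetical height measurements (cm); invented for illustration only.
heights = [170, 165, 180, 175, 165, 172, 168, 177, 165, 183]

# Measures of central tendency
mean = statistics.mean(heights)        # the average measurement
median = statistics.median(heights)    # middle of the ordered measurements
mode = statistics.mode(heights)        # the most common measurement

# Measures of dispersion
value_range = max(heights) - min(heights)

# If these ten values were the ENTIRE population, the results are parameters
# (the variance divides by n):
pop_variance = statistics.pvariance(heights)
pop_sd = statistics.pstdev(heights)

# If they are a SAMPLE drawn from a larger population, the results are
# statistics that estimate the parameters (the variance divides by n - 1):
sample_variance = statistics.variance(heights)
sample_sd = statistics.stdev(heights)

print("mean:", mean, "median:", median, "mode:", mode, "range:", value_range)
print("population SD:", pop_sd)
print("sample SD:", sample_sd)
```

The sample standard deviation comes out slightly larger than the population value because it divides by n - 1 rather than n.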


In general, parameters are written as Greek symbols equivalent to the Roman symbols used to represent statistics. For example, the standard deviation for a subset of an entire population is written as "s", whereas the true population parameter is written as "σ".

II. From Raw Data to Mitotic Index

Now that you’ve had a chance to review a bit of statistical information, it’s time to apply it to your team’s project. In this section, you will be guided through the process of calculating indices from your raw data collected over the past two weeks, and then using those indices to compare the two populations of dividing cells, treatment and control.

A. Ordinal Data Points: Mitotic Index (M)

When you counted mitotic cells in your samples, you were taking a survey of the number of different stages of mitosis present in each of your two populations (treatment and control). You counted the number of mitotic cells in 10 samples (remember: all the root tips from a single individual onion comprise one sample) in each of the treatment and control populations. You then calculated a Mitotic Index (M) for each sample. Depending on the index your team chose, this might have been based on all the mitotic cells in a sample (M), or on the cells in one particular phase of mitosis in a sample (M_x, with x being the phase of mitosis). Be sure to specify the nature of your index in all your reports.

At the end of your preliminary calculations, you should have ten M values for each of the two populations you are comparing. You will use these M values in a Mann-Whitney U test to determine whether your two populations differ significantly in their states of mitosis. Recall the formula for a Mitotic Index, which represents the proportion/frequency of mitotic cells in your total cell population.

M = n_m / N

n_m = the number of mitotic cells in the sample

N = the total number of cells counted in the sample.
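The Mitotic Index formula can be sketched in a few lines of Python. The function name `mitotic_index` and the three pairs of cell counts are hypothetical, chosen only to illustrate the arithmetic; substitute your own counts.

```python
def mitotic_index(n_mitotic, n_total):
    """Return the proportion of counted cells that are mitotic."""
    if n_total <= 0:
        raise ValueError("total cell count must be positive")
    if not 0 <= n_mitotic <= n_total:
        raise ValueError("mitotic count must lie between 0 and the total")
    return n_mitotic / n_total

# Hypothetical (mitotic cells, total cells) counts for three samples.
counts = [(40, 200), (25, 100), (90, 200)]

# One Mitotic Index per sample
indices = [mitotic_index(n_m, n) for n_m, n in counts]
print(indices)
```

Because the index is a proportion, it always falls between 0 and 1. A value outside that range signals a counting or data-entry problem.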

Your team should have counted at least 10 samples from each of your two root tip cell populations, and should have Mitotic Indices for both. If you have not yet done so, calculate the indices and enter them in the table below. Provide an appropriate table legend.

Table .

Treatment Sample #    Mitotic Index (M)    Control Sample #    Mitotic Index (M)


[NOTE: If your team is calculating indices for one or more specific stages of mitosis, you will subject each of those paired sets of indices to the Mann-Whitney test, as well. This will be for you to decide.]

So what do we do with these indices? You may have an intuitive sense of whether or not your treatment and control overlap in the number of mitotic cells. But that’s not enough. Statistics and statistical tests are used to test whether the results of an experiment are significantly different from the null hypothesis prediction. What is meant by "significant?" For that matter, what is meant by "expected" results? To answer these questions, we must consider the matter of probability.

B. Probability

The P value is the probability of obtaining a result at least as extreme as the one observed if chance alone were at work (that is, if the null hypothesis were true). The threshold against which a P value is judged is the significance level, also known as alpha (α). By convention, α is usually set at 0.05, or 5%, which means that a result is called significant only if chance alone would produce one at least that extreme no more than 5% of the time. In essence, α is a “cut off value” that defines the area(s) in a probability distribution where a particular value is unlikely to fall. In some studies, a more rigorous α of 0.01 (1%) is required to reject the null hypothesis, and in some others, a more lenient α of 0.1 (10%) is allowed for rejection of the null hypothesis. For our study of mitosis, you will use an α level of 0.05.

The term "significant" is often used in everyday conversation, yet few people know the statistical meaning of the word. In scientific endeavors, significance has a highly specific and important definition. Every time you read the word "significant" in this lab manual, know that we refer to the following scientifically accepted standard: The difference between an observed and expected result is said to be statistically significant if and only if:

Under the assumption that there is no true difference, the probability that the observed difference would be at least as large as that actually seen is less than or equal to α (5%; 0.05).

Conversely, under the assumption that there is no true difference, the probability that the observed difference would be smaller than that actually seen is at least 95% (0.95).

(Go ahead and read that as many times as it takes until (1) it makes sense, or (2) you fall asleep. Whichever comes first.)

Once an investigator has calculated a statistic from collected data, s/he must be able to draw conclusions from it. How does one determine whether deviations from the expected (null hypothesis) are significant? A probability distribution assigns a relative probability to any possible outcome (e.g., a particular Mitotic Index). The mitotic indices you calculated for each sample, while expressed as numbers, cannot be assumed to be distributed along a normal curve. We treat them as ordinal, rather than continuous, data. For this reason, a non-parametric statistical test, the Mann-Whitney U test, will be employed for your analysis.

C. Statistical Hypotheses

A non-parametric test is used to test the significance of data that cannot be assumed to follow a normal distribution, such as the ranked indices you have been collecting for this project. In the following sections, you will learn how to apply a statistical test to your data.


Your team should already have devised two statistical hypotheses stated in terms of opposing statements, the null hypothesis (Ho) and the alternative hypothesis (HA). The null hypothesis states that there is no significant difference between the two populations being compared. The alternative hypothesis may be either directional (one-tailed), stating the precise way in which the two populations will differ (“Control Group will have more mitotic cells than Treatment Group.”), or non-directional (two-tailed), not specifying the way in which two populations will differ (“Control and Treatment will differ in the number of mitotic cells.”). To determine whether or not there is a difference in mitosis between your two populations (Treatment and Control), you must perform a statistical test on your data.

III. Applying a Statistical Test to Your Mitotic Indices

Once your team has calculated a Mitotic Index (M) for each of your 10 samples from each of the two onion cell populations (Treatment and Control), you are ready to employ a statistical test to determine how much overlap there is between the two sets of calculated indices. If there is a great deal of overlap, then there is not a significant difference between them; you will fail to reject your null hypothesis. However, if there is so little overlap that chance alone would produce it 5% of the time or less, you can conclude that the two cell populations do differ significantly; you will reject your null hypothesis.

Non-parametric test for two samples: Mann-Whitney U

The Mann-Whitney test allows the investigator (you) to compare your two cell populations without assuming that your Mitotic Index values are normally distributed. The Mann-Whitney U does have its rules. For this test to be appropriate:

• You must be comparing two random, independent samples (Treatment & Control)

• The measurements (Mitotic Indices, in our case) should be ordinal

• No two measurements should have exactly the same value (though we can deal with “ties” in a way that will be explained shortly).

The Mann-Whitney U test allows the investigator to determine whether there is a significant difference between two sets of ordered/ranked data, such as those your team has collected in its mitosis study.

Here is a stepwise explanation and example of how to apply this test to your data.

1. State your null and alternative hypotheses.

Ho:

HA:

Example: Ho: There is no difference in the ranks of Mitotic Indices (M) between meristematic cells in an onion treated with aqueous trifluralin and an onion treated with plain water.

HA: There is a difference in the ranks of Mitotic Indices (M) between meristematic cells in an onion treated with aqueous trifluralin and an onion treated with plain water.

2. State the significance level (alpha, α) necessary to reject Ho. This is typically α = 0.05; Ho is rejected when P ≤ 0.05.

3. Rank your Mitotic Indices from smallest to largest in a table, noting which index came from which population of cells (Treatment or Control).

Example: Table 1 shows 20 (imaginary) values for Mitotic Indices from the two onion root tip cell populations mentioned before, treated with trifluralin (T) and treated with plain water (C). Table 2 shows the values ranked and labeled by population.


(Notice in the ranked table that if two values are the same, then each receives the average of the two ranks. For example, value 0.35 appears twice (ranks 6 and 7). The sum of the rank values divided by two is their mean: 13/2 = 6.5. The two equal values thus “share” ranks 6 and 7 equally.)

Table 1. Example: Mitotic Indices for treatment and control root tips (not ranked)

Sample #    M treatment    M control
1           0.20           0.55
2           0.25           0.60
3           0.45           0.65
4           0.35           0.80
5           0.15           0.35
6           0.10           0.75
7           0.55           0.70
8           0.40           0.85
9           0.30           0.90
10          0.45           0.50

Table 2. Example: Ranked Mitotic Indices

Rank    Ranked M value    Cell population
1       0.10              T
2       0.15              T
3       0.20              T
4       0.25              T
5       0.30              T
6.5     0.35              T
6.5     0.35              C
8       0.40              T
9.5     0.45              T
9.5     0.45              T
11      0.50              C
12.5    0.55              C
12.5    0.55              T
14      0.60              C
15      0.65              C
16      0.70              C
17      0.75              C
18      0.80              C
19      0.85              C
20      0.90              C

4. Assign points to each ranked value. Each “treatment” value gets one point for every “control” value that appears below it in the ranked table. Every “control” value gets one point for every “treatment” value that appears below it. For example, the first-ranked value, 0.10 (T), has 10 “control” values below it, so it gets 10 points. The value 0.35 (C) has 4 “treatment” values below it, so it gets 4 points. (Table 3)

Table 3. Points assigned to ranked M values in Treatment and Control onion cell populations. (example)

Rank    Ranked M value    Cell population    Points
1       0.10              T                  10
2       0.15              T                  10
3       0.20              T                  10
4       0.25              T                  10
5       0.30              T                  10
6.5     0.35              T                  10
6.5     0.35              C                  4
8       0.40              T                  9
9.5     0.45              T                  9
9.5     0.45              T                  9
11      0.50              C                  1
12.5    0.55              C                  1
12.5    0.55              T                  7
14      0.60              C                  0
15      0.65              C                  0
16      0.70              C                  0
17      0.75              C                  0
18      0.80              C                  0
19      0.85              C                  0
20      0.90              C                  0


5. Calculate a U statistic for each category by adding the points for each cell population.

U_treatment = 10 + 10 + 10 + 10 + 10 + 10 + 9 + 9 + 9 + 7 = 94

U_control = 4 + 1 + 1 + 0 + 0 + 0 + 0 + 0 + 0 + 0 = 6

Your final U value is the smaller of these two values. In this example our U value is 6. In general, the lower the U value, the greater the difference between the two groups being tested. (For example, if none of the M values overlapped, the U value would be zero. That means there is a large difference between the two groups: they do not overlap at all.)

6. You are now ready to move to the final step, determining whether to reject or fail to reject your null hypothesis. (Proceed to Section IV.)

IV. Critical values for non-parametric statistics

As you already know, a specific probability value is linked to every possible value of any statistic, including the Mann-Whitney U statistic you just calculated.

Remember that we have defined our significance level (α) as 0.05. This implies that a correct null hypothesis will be wrongly rejected only 5% of the time, and correctly retained 95% of the time. A critical value of a statistic (e.g., your Mann-Whitney U statistic) is the value associated with the chosen significance level (here, 0.05). The critical values for the Mann-Whitney U statistic are listed in Table 4.

Compare your U value to those shown in the Table of Critical Values for the Mann-Whitney U (Table 4). Find the sample size (i.e., the number of Mitotic Indices (M) you calculated) for each of your two cell populations, and use the matrix to find the critical value for U at those two sample sizes. (For example, if you calculated 19 M values for one cell population and 17 for the other, then the critical value of the U statistic would be 99. This means that a U value of 99 or lower indicates rejection of the null hypothesis.)

If your U value is lower than or equal to the critical value at the appropriate spot in the table, reject your null hypothesis. If your U value is greater than that in the table, fail to reject. In our example of treatment and control groups with 10 samples each, we obtained a Mann-Whitney statistic of 6. This is far lower than the critical value of 23 required for rejection of the null hypothesis. This means that there is very little overlap between the two populations: they are significantly different. A complete table of Mann-Whitney U critical values can be found in Table 5.
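The ranking, point-counting and decision steps can be sketched in Python using the example values from Table 1. The helper names `average_ranks` and `mann_whitney_u` are hypothetical. Giving half a point to each side of a tied treatment/control pair is an assumption (a common convention); the worked example assigns tied pairs by their position in the table, which for these data produces the same totals.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean rank."""
    ordered = sorted(values)
    ranks = {}
    for v in set(ordered):
        first = ordered.index(v) + 1        # 1-based rank of the first copy
        count = ordered.count(v)            # how many copies are tied
        ranks[v] = first + (count - 1) / 2  # mean of the tied ranks
    return ranks

def mann_whitney_u(treatment, control):
    """Return (U_treatment, U_control, U), counting points pair by pair."""
    u_t = 0.0
    u_c = 0.0
    for t in treatment:
        for c in control:
            if t < c:
                u_t += 1      # control value ranks below this treatment value
            elif c < t:
                u_c += 1      # treatment value ranks below this control value
            else:
                u_t += 0.5    # assumed convention: a tie is split evenly
                u_c += 0.5
    return u_t, u_c, min(u_t, u_c)

# Example Mitotic Indices from Table 1
treatment = [0.20, 0.25, 0.45, 0.35, 0.15, 0.10, 0.55, 0.40, 0.30, 0.45]
control = [0.55, 0.60, 0.65, 0.80, 0.35, 0.75, 0.70, 0.85, 0.90, 0.50]

ranks = average_ranks(treatment + control)
print("rank of 0.35:", ranks[0.35])   # the two 0.35 values share ranks 6 and 7

u_t, u_c, u = mann_whitney_u(treatment, control)

CRITICAL_U = 23   # critical value quoted in the text for two samples of 10
reject_null = u <= CRITICAL_U
print("U_treatment:", u_t, "U_control:", u_c, "U:", u)
print("reject null hypothesis:", reject_null)
```

A useful check on hand calculations is that the two U values always sum to the product of the two sample sizes (here 10 × 10 = 100).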

Table 4. Small section of a table of critical values for the Mann-Whitney U test. Example: If both your treatment and control groups consist of ten values, then the critical value for the Mann-Whitney U is shown in the square marked with the red arrow.


Table 5. Critical values for the Mann-Whitney U statistic. Find the value that corresponds to the sample sizes (10) of your two cell populations. If your U value is smaller than or equal to the one shown in the table for your two sample sizes, then a difference this large would arise by chance alone 5% of the time or less if the null hypothesis were true; reject your null hypothesis. If your U value is larger than that shown in the table, fail to reject your null hypothesis. (From The Open Door Web Site, http://www.saburchill.com/)

V. Graphic Representation of your Data

Tables of numerical data are important, but they are not always the best way to present your data to an audience. As the old saying goes, “A picture is worth a thousand words.” The most effective way to present your experimental results, whenever possible, is with a figure.


A. Mitosis Raw Data

A simple bar graph can be used to represent the proportion of cells in your sample that you found in each stage of mitosis. An example can be seen in Figure 1.

Figure 1. A bar graph showing a hypothetical distribution of cells in each stage of mitosis in a study population of cells. Note that the categories could be placed in any order, and do not necessarily represent a continuum.

Don’t confuse a bar graph, which depicts categories of data that are not necessarily continuous, with a histogram, which depicts continuous data. An example of a histogram is shown in Figure 2.

Figure 2. A histogram showing a hypothetical distribution of cells of different diameter in a population of cells. Note that each bar on the histogram represents a specific subset of a range of continuous numerical data that occur in a set order.

Notice that these figures, unlike tables, have their legends underneath. Be sure to use the proper format for all figures and tables in all your work.

B. Visualizing Mann-Whitney U results

Because the Mann-Whitney U provides a measure of how great the overlap is between two groups being compared, a box plot is a good way to represent your Mann-Whitney U results. The box plot can be created to show the median of each group, the range of values, and their overlap. An example of a box plot is shown in Figure 3, with a key and explanation in Figure 4.


Figure 3. Sample box plot showing overlap of mitotic index values for two populations of cells.

Figure 4. The black bar in the center of each population’s values represents the median. The Interquartile Range (IQR) includes 50% of the values, and is bordered on the bottom by the 25th percentile and on the top by the 75th percentile. The range is the region between the minimum and maximum values. The star represents a data point that is an outlier.
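The values a box plot displays (minimum, 25th percentile, median, 75th percentile and maximum) can be computed with Python's standard library. This is only a sketch: the function name `five_number_summary` is hypothetical, the data are the example indices from Table 1, and the "inclusive" quartile method is an assumption, since graphing programs use slightly different quartile conventions. It requires Python 3.8 or later and does not attempt to flag outliers.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points that split the data
    # into quarters: the 25th, 50th and 75th percentiles.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

# Example Mitotic Indices from Table 1
treatment = [0.20, 0.25, 0.45, 0.35, 0.15, 0.10, 0.55, 0.40, 0.30, 0.45]
control = [0.55, 0.60, 0.65, 0.80, 0.35, 0.75, 0.70, 0.85, 0.90, 0.50]

for name, group in (("Treatment", treatment), ("Control", control)):
    low, q1, med, q3, high = five_number_summary(group)
    iqr = q3 - q1   # the interquartile range holds the middle 50% of values
    print(name, "min:", low, "Q1:", q1, "median:", med,
          "Q3:", q3, "max:", high, "IQR:", iqr)
```

Comparing the two summaries side by side shows how little the middle 50% of treatment values overlaps with that of the control values.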

VI. Experimental Error vs. Human Error

Your team made sure that all factors except one (the chemical used on one population of onion root tips) were exactly the same for Treatment and Control groups. But did you get exactly the same number of mitotic cells in each sample? Probably not. What might account for slightly different results among samples?


Slight variation in results in carefully run trials is known as experimental variability or experimental error. In this experiment, it could be due to genetic differences between individual onions or to other biological factors. Note that this natural variability is NOT the same as variability caused by actual mistakes in experimental technique (human error). DO NOT CITE HUMAN ERROR AS A REASON FOR UNEXPECTED RESULTS IN YOUR EXPERIMENT! THAT IS UNPROFESSIONAL. If you make accidental mistakes that could affect your results, you should re-do the experiment, not simply explain away those mistakes as “human error.” Citing human error as a good reason for your results is about as good as saying, “Oops! We are terrible at science. But we don’t really care enough to do it right.”

NEVER include human error in this or any future discussions of experimental variability. Experimental error ≠ mistakes! When contemplating your results, your fellow scientists will assume you have done your experiments as carefully as possible, and have minimized inaccuracies due to human error.

In statistics, an outlier is a data point that is very different from the majority of the other data points. An outlier may indicate a problem with a measurement or true variability. If the investigators have good reason to believe an outlier results from a measurement problem, it may be excluded from the statistical analysis. However, it is always important to report outliers, even those excluded from the analysis. Real data should not be ignored.

VII. Project Completed. Is This the End?

The study you are now completing is only the beginning of what could be a long-term research project to discover the various factors that direct and affect mitosis. The only thing you are determining now is whether or not there is a statistically significant difference between your treatment and control cell populations. In other words, the research project you are now completing is a pilot study. It establishes an observable fact (i.e., that there is or is not a difference in mitosis between cells treated with a particular chemical and those treated with a placebo (plain water)). That fact should be subject to further investigation beyond what you have accomplished here.

Although you may have established that there is or is not a difference in mitosis between your treated and untreated roots, you still may not be able to definitively state why there is or is not a difference. To do that, you must move to the next step, which is to list as many competing hypotheses as possible as to why there is a difference (or even, if your team has obtained negative results, why there is not a difference, despite obvious differences in your two populations).
Each of these multiple hypotheses could form the basis for a research project that would take your team one step further towards discovering the reasons for your pilot study’s observed result. You should be able to give a brief description of an experiment that could be designed to test each of your competing hypotheses. In your presentation, be sure to include a list of hypotheses that could explain your observed results. What factors differed between the cell populations that might cause differences in mitosis? Consider what is happening on a cellular and molecular level. When you analyze your results, think about every aspect of your findings, and report anything you find intriguing enough to warrant further study. Science is not a one-project endeavor. Every new piece of valid information can be seen as opening a new doorway to discovery of the most intimate mechanisms of life.