
DATA ANALYSIS FOR RESEARCH PROJECTS

TYPES OF DATA

Quantitative data: measurements that use a scale with equal intervals

Examples include mass (g), length (cm), volume (mL), and temperature (°C or K).

Qualitative data: non-standard scales with unequal intervals, or discrete categories

Examples include gender, choice, and color scales.

Quantitative Scales of Measure

Interval (equal intervals)
Properties: The numerical value indicates rank and meaningfully reflects the relative distance between points on the scale.
Examples: Temperature (°C or °F)

Ratio (equal intervals)
Properties: Has all the properties of an interval scale and, in addition, has a true zero point (a proportional scale).
Examples: Length, weight, temperature (K)

Qualitative Scales of Measure

Nominal (to name)
Properties: Data represent qualitative or equivalent categories (not numerical; cannot be rank ordered).
Examples: Eye color, hair color, gender, race

Ordinal (to order)
Properties: Numerically ranked, but implies nothing about how far apart the ranks are.
Examples: Grades, rating scales

Sample Data

An experiment was conducted to measure the tensile strength of each of twelve pieces of two types of steel. The data from this experiment are given in the table below.

Is there a significant difference in tensile strength between the two types of steel?

Steel 1 (1000 lb/in^2)   Steel 2 (1000 lb/in^2)
23.39                    25.45
27.89                    22.92
24.29                    27.42
25.15                    27.65
24.28                    27.31
29.50                    27.26
25.36                    25.58
18.75                    25.62
22.93                    26.61
29.60                    25.92
13.82                    27.46
27.34                    26.46

Is there a better way to compare the data from these groups?

What have you used before to compare data from two different groups?

By inspection alone, it is difficult to decide consistently whether differences between experimental groups are significant.

We need a rigorous procedure that includes a clear operational definition of dissimilarity.
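A first step toward a consistent comparison is to summarize each group numerically. The sketch below (Python, standard library only; the helper name `summarize` is illustrative, not from the source) computes the mean and sample standard deviation of each steel sample in the table. Steel 1 has the lower mean (about 24.4 vs. 26.3) but a much larger spread, partly because of the 13.82 value, which is why a summary alone cannot settle whether the difference is significant.

```python
import statistics

# Tensile strength (1000 lb/in^2) of the twelve pieces of each steel type, from the table
steel_1 = [23.39, 27.89, 24.29, 25.15, 24.28, 29.50,
           25.36, 18.75, 22.93, 29.60, 13.82, 27.34]
steel_2 = [25.45, 22.92, 27.42, 27.65, 27.31, 27.26,
           25.58, 25.62, 26.61, 25.92, 27.46, 26.46]


def summarize(values):
    """Return (mean, sample standard deviation) of a list of measurements."""
    return statistics.mean(values), statistics.stdev(values)


for name, data in (("Steel 1", steel_1), ("Steel 2", steel_2)):
    mean, sd = summarize(data)
    print(f"{name}: mean = {mean:.2f}, SD = {sd:.2f}")
```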

Statistics & Statistical Analysis

Statistical hypothesis-testing methods give us a consistent way to judge, with a stated level of confidence, whether differences between groups are real rather than the result of random chance or sampling error.

Sample data for consideration…

For the following sets of data, discuss:
– What were the independent variable (IV) and dependent variable (DV) tested?
– How should the data be processed to determine whether the IV affects the DV?
– How will you decide whether the IV has a significant effect on the DV?

Sample Data Set 1: Effect of Temperature on the Pressure of a Sample of Gas above Water

Temperature of Water (°C)   Pressure (mmHg)
50                          90
55                          120
60                          145
65                          180
70                          219
75                          264
80                          310

Graphing data

The correlation coefficient (r) measures how strong the linear relationship is between the graphed variables.

Data from multiple trials can, and should, all be graphed and analyzed together.
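For Sample Data Set 1, the correlation coefficient can be computed straight from its definition. A minimal sketch, assuming the Pearson correlation coefficient is the one meant; the helper `pearson_r` is a hypothetical name:

```python
import math

# Sample Data Set 1: water temperature (°C) and gas pressure (mmHg)
temperature = [50, 55, 60, 65, 70, 75, 80]
pressure = [90, 120, 145, 180, 219, 264, 310]


def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temperature, pressure)
print(f"r = {r:.3f}")
```

A value of r close to +1 indicates a strong positive linear relationship. The pressure readings rise faster at higher temperatures, so a high r by itself does not prove the relationship is exactly linear; the graph should still be inspected.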

Sample Data Set 2: Effect of Stress on the Height of Bean Plants after 30 Days

Stressed Plants (cm)   Unstressed Plants (cm)
55.0                   48.0
65.0                   65.0
50.0                   59.0
57.0                   57.0
59.0                   51.0
73.0                   63.0
57.0                   65.0
54.0                   58.0
62.0                   44.0
68.0                   50.0

Comparing levels of the IV

If graphing the data is not appropriate, the different groups (levels) of the IV can be compared.

These types of statistics are called “Descriptive Statistics” because they:
– describe the data sets
– summarize groups of measurements

Descriptive Statistics:

Measures of central tendency attempt to provide the one value that is most typical of the entire data set.

What are some examples of measures of central tendency?
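The three common measures of central tendency (mean, median, and mode) can be computed with Python's standard `statistics` module. Applied to the bean plant heights from Sample Data Set 2, the stressed group has mean 60.0 cm, median 58.0 cm, and mode 57.0 cm. A minimal sketch:

```python
import statistics

# Sample Data Set 2: bean plant heights (cm) after 30 days
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
unstressed = [48.0, 65.0, 59.0, 57.0, 51.0, 63.0, 65.0, 58.0, 44.0, 50.0]

for name, heights in (("Stressed", stressed), ("Unstressed", unstressed)):
    print(name,
          "mean:", statistics.mean(heights),      # sum / count
          "median:", statistics.median(heights),  # middle of the ranked values
          "mode:", statistics.mode(heights))      # most frequent value
```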

Variation describes the spread within the data set.

* Two data sets with the same mean may have quite different spreads.

Appropriate Measures of Central Tendency and Variation for Types of Data

Central tendency measurement
– Quantitative data: mean, median, or mode
– Qualitative data: mode (nominal), median (ordinal)

Variation
– Quantitative data: standard deviation or range
– Qualitative data: frequency distribution

What is “standard deviation”?

The standard deviation is a statistic that tells you how tightly the data points are clustered around the mean of a data set. It describes the variation in a set of data.

When the data points are precise (close to the mean, little variation), the bell-shaped curve is steep and narrow, and the standard deviation is small.

When there is greater variation in the data, the bell curve is relatively flat, which tells you that the standard deviation is relatively large.
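A sketch of the variation measures for the bean plant data, assuming the sample (n − 1) forms of variance and standard deviation, which match the variances in Excel's t-test output and the calculator's Sx; the helper `spread` is a hypothetical name. For the stressed group this gives a variance of about 49.1 cm² and a range of 23 cm.

```python
import statistics

# Sample Data Set 2: bean plant heights (cm) after 30 days
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
unstressed = [48.0, 65.0, 59.0, 57.0, 51.0, 63.0, 65.0, 58.0, 44.0, 50.0]


def spread(values):
    """Return (sample variance, sample standard deviation, range) of the values."""
    variance = statistics.variance(values)  # sum of squared deviations / (n - 1)
    sd = statistics.stdev(values)           # square root of the sample variance
    value_range = max(values) - min(values)
    return variance, sd, value_range


for name, heights in (("Stressed", stressed), ("Unstressed", unstressed)):
    variance, sd, value_range = spread(heights)
    print(f"{name}: variance = {variance:.1f} cm^2, SD = {sd:.1f} cm, range = {value_range:.1f} cm")
```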

Displaying variation: Box-and-Whisker Plot

– First quartile (Q1): smaller than 75% of the ranked values
– Median (Q2): smaller than 50% and larger than 50% of the ranked values
– Third quartile (Q3): smaller than 25% of the ranked values

[Figure: box-and-whisker plot labeled, left to right: smallest value, first quartile, median, third quartile, largest value]
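The five values marked on a box-and-whisker plot can be computed as below, shown on the stressed bean plant heights. This is a sketch using `statistics.quantiles` with its default method; quartile conventions vary, so a TI-84 or Excel may report slightly different Q1 and Q3 for the same data. The helper `five_number_summary` is an illustrative name.

```python
import statistics

# Stressed bean plant heights (cm) from Sample Data Set 2
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]


def five_number_summary(values):
    """Smallest value, Q1, median, Q3, largest value (the points drawn on a box plot)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3;
    # the default 'exclusive' method interpolates between ranked values.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


print(five_number_summary(stressed))
```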

Illustrating Distributions for Quantitative Data: Histograms

– Symmetrical: mean equals median
– Left-skewed: mean < median
– Right-skewed: mean > median
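The mean-versus-median rule above can be turned into a quick check. This is only a rule of thumb (the histogram itself is still the better guide), and the function name `skew_direction` and its `tolerance` parameter are illustrative assumptions:

```python
import statistics


def skew_direction(values, tolerance=1e-9):
    """Classify a distribution by comparing its mean and median (rule of thumb only)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if abs(mean - median) <= tolerance:
        return "symmetrical"
    return "right-skewed" if mean > median else "left-skewed"


# Stressed bean plant heights (cm) from Sample Data Set 2
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
print(skew_direction(stressed))
```

For the stressed bean plants (mean 60.0 cm, median 58.0 cm) the rule reports right-skewed.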

Statistical Hypothesis Testing

“A trend is apparent in the graph of the data; is this trend significant?”

“So the means of the groups are different; is the difference significant?”

Statistical hypothesis testing is needed to determine whether the results of your data analysis are significant.

The results of these tests provide “Inferential Statistics.” We make inferences about a whole population based on the data we collect from a sample.


Example for comparing means: t Test for Quantitative Data (Equal Sample Size)

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}$$

where
$\bar{x}_1$ = mean of Group 1
$\bar{x}_2$ = mean of Group 2
$s_1^2$ = variance of Group 1
$s_2^2$ = variance of Group 2
$n$ = number of items or measurements in each group
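The equal-sample-size formula translates directly into code. A minimal sketch, assuming `statistics.variance` (sample variance, n − 1 denominator) for s²; `t_equal_n` is a hypothetical helper name. For the bean plant data it gives t ≈ 1.24.

```python
import math
import statistics

# Sample Data Set 2: bean plant heights (cm) after 30 days
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
unstressed = [48.0, 65.0, 59.0, 57.0, 51.0, 63.0, 65.0, 58.0, 44.0, 50.0]


def t_equal_n(group_1, group_2):
    """t = (mean_1 - mean_2) / sqrt((s1^2 + s2^2) / n) for two groups of equal size n."""
    if len(group_1) != len(group_2):
        raise ValueError("this formula assumes both groups have the same size n")
    n = len(group_1)
    mean_difference = statistics.mean(group_1) - statistics.mean(group_2)
    # statistics.variance is the sample variance s^2 (n - 1 in the denominator)
    standard_error = math.sqrt((statistics.variance(group_1) + statistics.variance(group_2)) / n)
    return mean_difference / standard_error


print(f"t = {t_equal_n(stressed, unstressed):.4f}")
```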

Statistical calculations: use a TI-83 or TI-84 graphing calculator, OR use the Data Analysis tool in Microsoft Excel.

Calculate the t-test for the stressed and unstressed bean plant data (Sample Data Set 2) using the graphing calculator.

Level of Significance

Establish a level of significance before analyzing the data. In this class, use 0.05.

This means that the probability of error in rejecting the null hypothesis (rejecting it when it is actually true) is 5/100,

OR

when we reject the null hypothesis, we do so with 95% confidence.

Results from the calculator

– t: value of the t statistic
– x̄1: mean of List 1
– x̄2: mean of List 2
– Sx1: standard deviation of List 1
– Sx2: standard deviation of List 2
– df: degrees of freedom
– n1: number of values in List 1
– n2: number of values in List 2

t-Test Results from Excel (t-Test: Two-Sample Assuming Equal Variances)

                               Stressed Plants (cm)   Unstressed Plants (cm)
Mean                           60                     56
Variance                       49.11111111            54.88888889
Observations                   10                     10
Pooled Variance                52
Hypothesized Mean Difference   0
df                             18
t Stat                         1.240347346
P(T<=t) one-tail               0.115386178
t Critical one-tail            1.734063062
P(T<=t) two-tail               0.230772356
t Critical two-tail            2.100923666
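Excel's "Two-Sample Assuming Equal Variances" test uses a pooled variance, which also handles unequal group sizes. The sketch below (standard library only; `pooled_t` is an illustrative name) reproduces the pooled variance of 52, df of 18, and t Stat of about 1.2403 for the bean plant data. P-values need the t distribution, which Python's standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=True)` is one option if SciPy is available.

```python
import math
import statistics

# Sample Data Set 2: bean plant heights (cm) after 30 days
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
unstressed = [48.0, 65.0, 59.0, 57.0, 51.0, 63.0, 65.0, 58.0, 44.0, 50.0]


def pooled_t(group_1, group_2):
    """Two-sample t statistic assuming equal variances; returns (t, df, pooled variance)."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2  # (n1 - 1) + (n2 - 1)
    # Weight each sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(group_1)
                  + (n2 - 1) * statistics.variance(group_2)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group_1) - statistics.mean(group_2)) / standard_error
    return t, df, pooled_var


t, df, pooled_var = pooled_t(stressed, unstressed)
print(f"pooled variance = {pooled_var:.4f}, df = {df}, t = {t:.4f}")
```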

Statistical Hypotheses (different from your research hypothesis)

Null hypothesis (H0): suggests that any observed difference between two sample means occurred by chance and is NOT significant.

– States that there is no relationship between the variables, i.e., the two means are equal, OR they are not statistically different.

Claim / alternative hypothesis: derived from the literature and your research hypothesis; suggests the outcome of the experiment if the IV affects the DV.

Null Hypothesis

What would be the null hypothesis for this set of data?

H0: The mean height of stressed plants is not significantly different from the mean height of unstressed plants.

Confidence Levels

The confidence level describes how much trust we place in extending the results from the sample to the whole population.

If we reject the null hypothesis at the 95% confidence level:
– there is at most a 5% probability of obtaining a difference this large by chance alone if the null hypothesis is true
– the difference between groups is unlikely to be due to chance, so similar results are expected with further testing (although this is not guaranteed)

Confidence Levels: Probability of Error

Probability of error: the error that occurs if the null hypothesis is rejected when it is true and should not be rejected (a Type I error).

It is identified by the Greek lowercase letter alpha (α). Researchers usually select α ≤ 0.05. If the confidence level is 95%, then the probability of error (α) is 5%, or 0.05.

Statistical Tests: Test Values and Critical Values

Test value: the result of a statistical test on your data.

Critical value: a reference value for each statistical test.
– Your calculated test value must exceed this value for you to reject the null hypothesis.
– You can find the critical value for each statistical test in published tables and on university websites (links available on my website).

If you use Microsoft Excel for your statistics, the critical value will be given with the results.

Significance of t value

Determine the degrees of freedom:
df = (number in experimental group – 1) + (number in control group – 1)
df = (10 – 1) + (10 – 1) = 18

Determine the significance of the calculated t by comparing it (as an absolute value) with the critical t value from a table:
– Calculated t < critical t: not significant
– Calculated t > critical t: significant

At df = 18 and the 0.05 level (two-tailed), the critical t = 2.101.

The calculated t of 1.24 < 2.101, so it is not significant at the 0.05 level.

Rejecting the Null Hypothesis

If the test value is not significant, the null hypothesis is NOT REJECTED.

If the test value is significant, the null hypothesis is REJECTED.
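The decision rule above fits in a few lines. A sketch, assuming the critical value is looked up separately (here 2.101 for df = 18 at the 0.05 level, two-tailed) and that the absolute value of t is compared, since the sign of t only reflects which group was listed first; `reject_null` is a hypothetical name.

```python
def reject_null(t_calculated, t_critical):
    """Return True if |t| exceeds the critical value, i.e. the null hypothesis is rejected."""
    return abs(t_calculated) > t_critical


# Bean plant example: calculated t = 1.24; critical t (df = 18, 0.05 level, two-tailed) = 2.101
if reject_null(1.24, 2.101):
    print("Significant: null hypothesis REJECTED")
else:
    print("Not significant: null hypothesis NOT REJECTED")
```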

Do Statistical Findings Support the Research Hypothesis?

Null hypothesis was rejected = research hypothesis was supported
(unless the research hypothesis IS a null hypothesis)

Null hypothesis was not rejected = research hypothesis was not supported

Summary: Steps of Hypothesis Testing

1. State the null hypothesis and alternative hypothesis (claim).
2. Choose the confidence level (95%) and sample size.
3. Collect the data and calculate the appropriate statistics.
4. Make the proper statistical inference.

Populations of Study – Be careful what you claim!

Sample: the specific portion of the population that is selected for the study (the 100 bean seedlings used in the study)

Sampled population: the population from which the sample was drawn (all the bean seedlings in the nursery from which the experimenter obtained their bean seedlings)

Target population: ALL units (persons, things, experimental outcomes) of the specific group whose characteristics are being studied (all the bean seedlings of the same species)

Communicating Statistics: Effect of Stress on the Mean Height of Bean Plants after 30 Days

                          Stressed Group     Unstressed Group
Mean                      60.0 cm            56.0 cm
Variance                  49.1 cm²           54.9 cm²
Standard Deviation        7.0 cm             7.4 cm
1SD (mean ± 1 SD)         53.0 – 67.0 cm     48.6 – 63.4 cm
2SD (mean ± 2 SD)         46.0 – 74.0 cm     41.2 – 70.8 cm
Number                    10                 10

Results of t test: t = 1.24, df = 18

t of 1.24 < 2.101; p > 0.10

[Figure: Effect of Stress on the Height of Bean Plants After 30 Days. x-axis: Treatment of Plants (Stressed, Unstressed); y-axis: Height (cm), 40 to 75]
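The 1SD and 2SD ranges in the Communicating Statistics table are mean ± 1 and mean ± 2 standard deviations. A sketch that recomputes them from the raw heights, assuming the sample standard deviation; `sd_intervals` is a hypothetical helper. For the stressed group it reproduces 53.0 – 67.0 cm and 46.0 – 74.0 cm.

```python
import statistics

# Sample Data Set 2: bean plant heights (cm) after 30 days
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
unstressed = [48.0, 65.0, 59.0, 57.0, 51.0, 63.0, 65.0, 58.0, 44.0, 50.0]


def sd_intervals(values):
    """Return the (mean ± 1 SD) and (mean ± 2 SD) ranges of the values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return (mean - sd, mean + sd), (mean - 2 * sd, mean + 2 * sd)


for name, heights in (("Stressed", stressed), ("Unstressed", unstressed)):
    (lo1, hi1), (lo2, hi2) = sd_intervals(heights)
    print(f"{name}: 1SD {lo1:.1f}-{hi1:.1f} cm, 2SD {lo2:.1f}-{hi2:.1f} cm")
```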

Types of Tests

For Quantitative Data:
– Linear Regression
– One-Way Analysis of Variance (ANOVA)
– t Test

For Qualitative Data:
– Chi-Squared Test
– Z Test

Linear Regression

Determines a linear relationship between two variables based on a correlation coefficient

H0: The number of yellow M&M’s is not related to the total number of M&M’s in the package.
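No M&M counts are given in the text, so this sketch fits a regression line to Sample Data Set 1 instead. It assumes ordinary least squares, the usual method behind a linear regression; `least_squares_line` is an illustrative name. For these data the slope works out to 7.3 mmHg per °C.

```python
# Sample Data Set 1: water temperature (°C) and gas pressure (mmHg)
temperature = [50, 55, 60, 65, 70, 75, 80]
pressure = [90, 120, 145, 180, 219, 264, 310]


def least_squares_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-products / sum of squared x deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(temperature, pressure)
print(f"pressure = {slope:.2f} * temperature + ({intercept:.2f})")
```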

ANOVA Test

Compares the means of more than two groups

H0: There is no significant difference between the numbers of M&M’s in plain packages, almond packages and peanut packages
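No M&M counts are given, so this sketch of the one-way ANOVA F statistic is checked on the two bean plant groups instead. ANOVA is intended for more than two groups; with exactly two groups, F equals t² from the pooled t-test (1.2403² ≈ 1.538), which makes a convenient sanity check. `one_way_anova_f` is a hypothetical helper name.

```python
import statistics

# Sample Data Set 2: bean plant heights (cm) after 30 days
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
unstressed = [48.0, 65.0, 59.0, 57.0, 51.0, 63.0, 65.0, 58.0, 44.0, 50.0]


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [statistics.mean(g) for g in groups]
    # Between-group variation: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: how far each value is from its own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f, df_between, df_within = one_way_anova_f(stressed, unstressed)
print(f"F = {f:.4f} with df = ({df_between}, {df_within})")
```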

t-Test

Compares the means of two independent groups

H0: There is no significant difference between the numbers of M&M’s in plain and peanut packages

A two-tailed test determines whether the population means are different (not the same), in either direction (more difficult to support).

A one-tailed test determines whether one mean is greater than the other, in a direction specified in advance (easier to support).

Chi-Squared Test

Determines whether the observed counts (proportions) within a sample differ from the expected counts; can be used for more than two groups

H0: There are equal numbers of each color of M&M in a package.
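A sketch of the chi-squared goodness-of-fit statistic, the sum of (observed − expected)² / expected, for the equal-colors null hypothesis. The counts below are HYPOTHETICAL, invented only to show the arithmetic; they are not data from this document. The critical value of 11.07 (df = 5, 0.05 level) is from a standard chi-squared table.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# HYPOTHETICAL counts of six colors in one package (illustration only, not real data)
observed = [12, 9, 8, 11, 10, 10]
total = sum(observed)
# H0: equal numbers of each color, so every color is expected total / 6 times
expected = [total / len(observed)] * len(observed)

chi2 = chi_squared_statistic(observed, expected)
df = len(observed) - 1
critical_value = 11.07  # chi-squared table value for df = 5 at the 0.05 level
print(f"chi-squared = {chi2:.2f}, df = {df}, reject H0: {chi2 > critical_value}")
```

Here χ² = 1.0 < 11.07, so H0 would not be rejected for these hypothetical counts.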

Z-Test

Compares proportions between two groups

H0: There are equal proportions of red M&M’s in plain and peanut packages
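A sketch of a two-proportion Z test using the pooled proportion. The red M&M counts are HYPOTHETICAL, chosen only to illustrate the calculation; `two_proportion_z` is an illustrative name. The result is compared with 1.96, the two-tailed critical z at the 0.05 level.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic comparing proportions x1/n1 and x2/n2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall proportion if H0 (equal proportions) is true
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# HYPOTHETICAL counts: 20 red out of 100 plain, 30 red out of 100 peanut (illustration only)
z = two_proportion_z(20, 100, 30, 100)
print(f"z = {z:.3f}, significant at 0.05 (two-tailed): {abs(z) > 1.96}")
```

Here |z| ≈ 1.63 < 1.96, so these hypothetical counts would not lead to rejecting H0.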

Selecting a Statistical Test

Things to consider:
– Number of groups of data
– Type of data: quantitative or qualitative
– Type of variable: numerical or categorical
– The relationship in the null hypothesis being tested

Statistical Tests Review

– Comparison of two variables for correlation: correlation coefficient test
– Comparing the means of more than two groups/levels: ANOVA test
– Comparing two means: t-test
– Comparison of proportions within a population: χ² (chi-squared) test
– Comparison of proportions between populations: Z test

Key Questions for your Research:

What kind of data will you need to collect to test your hypothesis? (Qualitative or quantitative)
– What kind of scale will you use?
– How do you plan on analyzing this data?
  • Comparison of groups? What will you compare?
  • Look for a trend? What will you graph?
– How many different levels will you need data for?
– How many trials?

What relevant qualitative data will you look for that may also help you interpret results?