Assignment #6

Chapter 10: 14, 15
Chapter 11: 14, 18
Due tomorrow, Nov. 6th, by 2pm in your TA’s homework box

Assignment #7

Chapter 12: 18, 24
Chapter 13: 28
Due next Friday, Nov. 13th, by 2pm in your TA’s homework box

Reading

For Today: Chapter 14
For Tuesday: Chapter 15

Lab Report

•  Posted on web-site

•  Dates
–  Rough draft due to your TA’s homework box on Monday Nov. 16th
–  Rough draft returned in your lab section the week of Nov. 23rd
–  Final draft due at start of your registered lab section the week of Nov. 30th

•  10% of course grade
–  Rough draft - 5%
–  Final draft - 5%
–  If you’re happy with your rough draft mark, you can tell your TA to use it for the final draft

•  Read the “Writing a Lab Report” section of your lab notebook for guidance!!

Chapter 13 Review

Assumptions of t-tests

•  Random sample(s)

•  Populations are normally distributed

•  (for 2-sample t) Populations have equal variances

Detecting deviations from normality

• Previous data/theory

• Histograms

• Quantile plots

• Shapiro-Wilk test

Sampled from a normally distributed population

Sampled from non-normally distributed populations

Detecting deviations from normality: by quantile plot

Normal data

Detecting differences from normality: Shapiro-Wilk test

A Shapiro-Wilk test is used to test statistically whether a set of data comes from a normal distribution.

What to do when the assumptions are not true

•  If the sample sizes are large, sometimes the parametric tests work OK anyway

•  Transformations

•  Non-parametric tests

•  Randomization and resampling

Data transformations

A data transformation changes each data point by some simple mathematical formula.
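As a minimal sketch of this idea, each of the common transformations covered in this chapter can be written as a one-line function that is applied to every data point. The function names and the example counts are hypothetical, chosen only for illustration; the formulas follow the usual textbook forms (for example, the arcsine transformation is applied to the square root of a proportion).

```python
import math

def log_transform(y):
    """Y' = ln[Y]; requires Y > 0."""
    return math.log(y)

def arcsine_transform(p):
    """p' = arcsin[sqrt(p)]; for proportions between 0 and 1."""
    return math.asin(math.sqrt(p))

def sqrt_transform(y):
    """Y' = sqrt(Y + 1/2); often used for counts."""
    return math.sqrt(y + 0.5)

def square_transform(y):
    """Y' = Y^2; for left-skewed data."""
    return y ** 2

def reciprocal_transform(y):
    """Y' = 1/Y; for right-skewed data, requires Y != 0."""
    return 1 / y

def antilog_transform(y):
    """Y' = e^Y; for left-skewed data."""
    return math.exp(y)

# Hypothetical count data: the same formula is applied to every data point.
counts = [0, 1, 4, 9, 25]
transformed_counts = [sqrt_transform(y) for y in counts]
```

After transforming, the test is carried out on the transformed values rather than on the original ones.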

Log-transformation

$Y' = \ln[Y]$

(Figure: histograms of frequency against Y and against Y' = ln[Y])

Other transformations

Arcsine
$p' = \arcsin[\sqrt{p}]$
Proportions

Square-root
$Y' = \sqrt{Y + 1/2}$
Counts; when standard deviation and mean increase together

Square
$Y' = Y^2$
Left-skewed data

Reciprocal
$Y' = 1/Y$
Right-skewed data

Antilog
$Y' = e^Y$
Left-skewed data

Non-parametric methods

•  Assume less about the underlying distributions

•  Also called "distribution-free"

•  "Parametric" methods assume a distribution or a parameter

Sign test

•  Non-parametric test

•  Compares data from one sample to a constant

•  Simple: for each data point, record whether individual is above (+) or below (-) the hypothesized constant.

•  Use a binomial test to compare the result to 1/2.
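The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name and the example data are hypothetical, and dropping data points exactly equal to the constant is an assumption (a common convention that the slide does not state).

```python
from math import comb

def sign_test(data, constant):
    """Two-sided sign test of the data against a hypothesized constant."""
    n_above = sum(1 for y in data if y > constant)  # "+" signs
    n_below = sum(1 for y in data if y < constant)  # "-" signs
    n = n_above + n_below  # assumption: points equal to the constant are dropped
    k = min(n_above, n_below)
    # Exact binomial tail probability P(X <= k) with p = 1/2, then doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return n_above, n_below, p_value

# Hypothetical measurements compared against a hypothesized constant of 2.0.
measurements = [1.2, 3.4, 2.2, 5.1, 4.8, 3.9, 6.0, 2.9]
n_above, n_below, p_value = sign_test(measurements, 2.0)
```

Only the signs enter the calculation, not the sizes of the deviations, which is one way to see why the test discards so much information.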

The sign test has very low power

So it is quite likely to not reject a false null hypothesis.

Most non-parametric methods use RANKS

•  Rank each data point in all samples from lowest to highest

•  Lowest data point gets rank 1, next lowest gets rank 2, ...

Non-parametric test to compare 2 groups

The Mann-Whitney U test compares the central tendencies of two groups using ranks.

Performing a Mann-Whitney U test

•  First, rank all individuals from both groups together in order (for example, smallest to largest)

•  Sum the ranks for all individuals in each group --> R1 and R2

Calculating the test statistic, U

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

$$U_2 = n_1 n_2 - U_1$$

U1 is the number of times an individual from pop. 1 has a lower rank than an individual from pop. 2, out of all pairwise comparisons.
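A short sketch of the ranking and the U calculation follows. The helper names and the two small groups are hypothetical, and giving tied values their average rank is an assumption (the slides do not discuss ties).

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Return (U1, U2) computed from the rank sum of group 1."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))  # rank both groups together
    r1 = sum(ranks[:n1])                                # R1: rank sum for group 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return u1, u2

# Hypothetical data for two small groups.
u1, u2 = mann_whitney_u([1.1, 2.3, 3.0], [2.8, 4.5, 5.2, 6.1])
```

In this example group 1 holds ranks 1, 2 and 4, so R1 = 7 and U1 = 12 + 6 - 7 = 11, which matches a direct count of the pairs in which the group 1 value is the smaller one.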

Mann-Whitney: Large sample approximation

For n1 and n2 both greater than 10, use

$$Z = \frac{2U - n_1 n_2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 3}}$$

Compare this Z to the standard normal distribution
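The approximation can be sketched as below. The function names are hypothetical, and the two-sided P-value is obtained from the standard normal distribution through `math.erfc`, which is one convenient way to do it with only the standard library. No tie correction is included.

```python
import math

def mann_whitney_z(u, n1, n2):
    """Large-sample Z for a Mann-Whitney U statistic (no tie correction)."""
    return (2 * u - n1 * n2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 3)

def two_sided_p_from_z(z):
    """Two-sided tail probability of |Z| under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical example: U equal to half of n1 * n2 gives Z = 0, i.e. no evidence
# of a difference between the groups.
z_null = mann_whitney_z(90, 12, 15)
p_null = two_sided_p_from_z(z_null)
```

Using U1 or U2 in the formula changes only the sign of Z, so the two-sided P-value is the same either way.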

Permutation tests

•  Also known as “randomization tests”

•  Used for hypothesis testing on measures of association

•  Mixes the real data randomly

•  Variable 1 from an individual is paired with variable 2 data from a randomly chosen individual. This is done for all individuals.

•  The estimate is made on the randomized data.

•  The whole process is repeated numerous times. The distribution of the randomized estimates is the null distribution.

Real data:

Male wingless: 0, 0.7, 0.7, 1.4, 1.6, 1.8, 1.9, 1.9, 1.9, 2.2, 2.1, 2.1
Male winged: 1.4, 1.6, 1.9, 2.3, 2.6, 2.8, 2.8, 2.8, 3.1, 3.8, 3.9, 4.5, 4.7

$\bar{Y}_1 - \bar{Y}_2 = -1.41$

Randomized data:

Male wingless: 0.7, 2.3, 1.9, 1.8, 3.8, 1.4, 1.9, 3.9, 4.7, 2.6, 1.9, 2.8
Male winged: 2.8, 1.9, 2.1, 1.6, 0, 1.4, 2.2, 2.1, 1.6, 4.5, 2.8, 0.7, 3.1

$\bar{Y}_1 - \bar{Y}_2 = 0.41$

1000 permutations

P < 0.001
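The procedure can be sketched for this two-group example as follows. The split of the slide's values into the two groups is a reading of the original two-column layout (it reproduces the reported difference of -1.41). The function names, the random seed and the two-sided comparison are assumptions made for illustration.

```python
import random
from statistics import mean

# Values as read from the slide's two columns.
wingless = [0, 0.7, 0.7, 1.4, 1.6, 1.8, 1.9, 1.9, 1.9, 2.2, 2.1, 2.1]
winged = [1.4, 1.6, 1.9, 2.3, 2.6, 2.8, 2.8, 2.8, 3.1, 3.8, 3.9, 4.5, 4.7]

def mean_difference(a, b):
    return mean(a) - mean(b)

def permutation_test(a, b, n_permutations=1000, seed=0):
    """Return the observed difference in means and a two-sided permutation P-value."""
    rng = random.Random(seed)  # seed is arbitrary; fixed only for repeatability
    observed = mean_difference(a, b)
    pooled = list(a) + list(b)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # mix the real data randomly
        diff = mean_difference(pooled[:len(a)], pooled[len(a):])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    return observed, n_extreme / n_permutations

observed, p_value = permutation_test(wingless, winged)
```

If none of the randomized differences is as extreme as the observed one, the estimated proportion is 0, which is reported as P smaller than 1 over the number of permutations.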

Chapter 14 Designing Experiments

Types of studies

Experimental study
Researchers assign treatments to units so that differences in response can be compared.

Observational study
Researcher has no influence over which subjects receive which treatments.

Why do an experimental study?

•  Random assignment of treatments minimizes influence of confounding variables

•  Confounding variables mask or distort the causal relationship between measured variables in a study

Confounding variables

Supplemental Oxygen (Explanatory variable)

Survive Mt. Everest (Response variable)

Preparedness (Confounding variable)

Unmeasured variable that masks or distorts the causal relationship between measured variables in a study

Goals of experiments

•  Eliminate bias

•  Reduce sampling error (increase precision and power)

(Figure: 2 x 2 comparison of precise vs. imprecise and biased vs. unbiased estimates)

Design features that reduce bias

•  Controls

•  Random assignment to treatments

•  Blinding

Controls

A group that is identical to the experimental treatment group in all respects aside from the treatment itself.

Uncontrolled experiment

•  Treatment applied to group of subjects and response measured.

•  We cannot determine whether the treatment is the cause of the response.

Example: placebo

•  Some illnesses, e.g. pain and depression, respond to the fact of treatment, even with no pharmaceutically active ingredients

•  Control: "sugar pills"

Example: independent recovery

•  Patients tend to seek treatment when they feel very bad

•  As a result, they often visit the doctor when they are at their worst. Improvement may be inevitable, even without treatment

•  Control: untreated group to compare with, if we want to measure the effects of a new therapy

Example: Stress associated with experimental methods

•  Stressful or intrusive methods may produce a response separate from the effect of the treatment of interest

•  Control: use same methods on group that does not get treatment of interest

Randomization

The random assignment of treatments to units in an experimental study

Breaks the association between possible confounding variables and the explanatory variable.

Randomization

Supplemental Oxygen (Explanatory variable)

Survive Mt. Everest (Response variable)

Preparedness (Confounding variable)

?

Randomization

•  Doesn’t eliminate variation caused by confounding variables, only their correlation with treatment

•  Variation from confounding variables is spread more evenly between treatments, so they create no bias.

Randomize using a random process

•  Example: Random number generator on computer (e.g. random.org)

1.  List all subjects

2.  Assign each a random number

3.  Assign treatment A to lowest numbers and B to highest numbers.
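The three steps above can be sketched with Python's standard `random` module. The function name and the subject labels are hypothetical; with an odd number of subjects this sketch gives the extra subject to treatment B, which is an arbitrary choice.

```python
import random

def randomize_two_treatments(subjects, seed=None):
    """Assign treatment A to the lowest random numbers and B to the highest."""
    rng = random.Random(seed)
    # Step 2: give every listed subject a random number.
    numbered = [(rng.random(), subject) for subject in subjects]
    # Step 3: sort by the random number; the lower half gets A, the rest get B.
    numbered.sort(key=lambda pair: pair[0])
    half = len(numbered) // 2
    return {
        subject: ("A" if position < half else "B")
        for position, (_, subject) in enumerate(numbered)
    }

# Step 1: list all subjects (hypothetical labels).
subjects = [f"subject_{i}" for i in range(1, 11)]
assignment = randomize_two_treatments(subjects, seed=1)
```

Because the assignment depends only on the random numbers, nothing about a subject (clinic, name, or the researcher's impression) can influence which treatment it receives.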

Experiment: individuals are randomly assigned to treatments

Examples of wrong ways to randomize

•  Treatment A to all patients at one clinic and B to all patients at second clinic

•  Assign treatments alphabetically

•  Haphazard assignment (researcher trying to be random)

Blinding

•  Preventing the patient and/or the experimenter from knowing which treatment is given to whom
–  Single blind – blind patient
–  Double blind – blind patient and experimenter

•  Unblinded studies usually find much larger effects (sometimes threefold higher), showing the bias that results from lack of blinding

Reducing sampling error

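$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Increasing the signal to noise ratio

"Signal": the numerator, $\bar{Y}_1 - \bar{Y}_2$

"Noise": the denominator, $\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$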

Reducing sampling error

Increasing the signal to noise ratio

If the "noise" is smaller, it is easier to detect a given "signal".

Can be achieved with smaller s or larger n:

$$\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
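As a rough illustration of the signal-to-noise idea, the sketch below computes the two parts of the two-sample t statistic separately. The function names and the example numbers are hypothetical, and the pooled variance uses the usual weighted average of the two sample variances, which is assumed here rather than taken from these slides.

```python
import math
from statistics import mean, variance

def pooled_variance(a, b):
    """Weighted average of the two sample variances (assumed standard formula)."""
    n1, n2 = len(a), len(b)
    return ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)

def signal_and_noise(a, b):
    """Return (signal, noise, t) for a two-sample t-test with pooled variance."""
    signal = mean(a) - mean(b)                                            # numerator
    noise = math.sqrt(pooled_variance(a, b) * (1 / len(a) + 1 / len(b)))  # denominator
    return signal, noise, signal / noise

# Hypothetical samples: repeating each sample four times leaves the signal
# unchanged but shrinks the noise, because n is larger.
small = signal_and_noise([1, 2, 3], [2, 3, 4])
large = signal_and_noise([1, 2, 3] * 4, [2, 3, 4] * 4)
```

Comparing `small` and `large` shows the same signal with a smaller noise term, so the t statistic is larger in magnitude for the larger sample.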

Design features that reduce the effects of sampling error

•  Replication

•  Balance

•  Blocking

•  Extreme treatments

Replication

The application of every treatment to multiple, independent experimental units

Replication

$$SE_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Larger n reduces sampling error

What are experimental units?

•  Units that are randomly sampled and assigned treatments
–  Single individuals
–  Batches of individuals that are more similar to each other than to other batches (e.g. family)

•  Pseudoreplication (analyzing the data as if you had more independent experimental units than you actually have) causes standard errors and P-values to be underestimated

Balance

In a balanced experimental design, all treatments have equal sample size.

Balance increases precision

$$SE_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

For a given total sample size (n1+n2), the standard error is smallest when n1=n2.

Balance increases precision

$n_1 + n_2 = 20$

$n_1 = 10$, $n_2 = 10$: $\frac{1}{n_1} + \frac{1}{n_2} = 0.2$

$n_1 = 19$, $n_2 = 1$: $\frac{1}{n_1} + \frac{1}{n_2} = 1.05$
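The same comparison can be made for every possible split of 20 units. This small sketch (with hypothetical names) computes the term 1/n1 + 1/n2 that multiplies the pooled variance inside the standard error, and finds the split that makes it smallest.

```python
def se_factor(n1, n2):
    """The term 1/n1 + 1/n2 inside the standard error of the difference in means."""
    return 1 / n1 + 1 / n2

total = 20
# Evaluate every split with at least one unit in each treatment.
factors = {n1: se_factor(n1, total - n1) for n1 in range(1, total)}
best_n1 = min(factors, key=factors.get)
```

The minimum falls at the balanced split, and the factor grows quickly as the design becomes more lopsided.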

Blocking

The grouping of experimental units that have similar properties.

Within each block, treatments are randomly assigned to experimental units.
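A minimal sketch of randomizing within blocks is shown below. The hospitals, patient labels and function name are hypothetical; the treatment labels C and T stand for control and treated. Treatments are dealt out in turn after shuffling, so each block ends up as balanced as its size allows.

```python
import random

def randomize_within_blocks(blocks, treatments=("C", "T"), seed=None):
    """Randomly assign treatments separately inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # random order within this block only
        for position, unit in enumerate(shuffled):
            # Deal treatments out in turn so the block stays balanced.
            assignment[unit] = (block, treatments[position % len(treatments)])
    return assignment

# Hypothetical blocks: patients grouped by hospital.
blocks = {
    "hospital_1": ["p1", "p2", "p3", "p4"],
    "hospital_2": ["p5", "p6", "p7", "p8"],
}
assignment = randomize_within_blocks(blocks, seed=3)
```

Every block contains both treatments, so comparisons between C and T can be made within a block rather than across blocks.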

Blocking accounts for extraneous variation

C = Control, T = Treated

Variance among hospitals will not contribute to SE. Only variance within hospitals will contribute to "noise"

Paired design is an example of blocking

Treatment effects are measured by differences between treatments within pairs. This minimizes the influence of differences between pairs.

Randomized block design

Like a paired design but for more than two treatments.

Extreme Treatments

•  Treatment effects are easiest to detect when they are large.

•  Stronger treatments can increase the signal-to-noise ratio.

•  Caution: effects may not scale linearly

Experiments with more than one factor

A factor is a single treatment variable whose effects are of interest to the researcher.

Use multiple factors to:
•  Make more efficient use of money and resources
•  Estimate effects of interaction between factors

Interaction between explanatory variables

The effect of one variable depends on the state of a second variable

Factorial Design

•  Investigates all treatment combinations of two or more variables.

•  Can measure interactions between treatments
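To make the idea concrete, the sketch below lists every treatment combination for two hypothetical factors and computes a simple interaction contrast for a 2 x 2 design. The factor names, levels and cell means are invented for illustration only.

```python
from itertools import product

# Hypothetical factors and their levels.
factors = {
    "temperature": ["low", "high"],
    "nutrient": ["absent", "present"],
}

def treatment_combinations(factors):
    """All combinations of factor levels, one dict per treatment combination."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[n] for n in names))]

def interaction_contrast(cell_means):
    """2 x 2 only: nutrient effect at high temperature minus its effect at low."""
    effect_at_low = cell_means[("low", "present")] - cell_means[("low", "absent")]
    effect_at_high = cell_means[("high", "present")] - cell_means[("high", "absent")]
    return effect_at_high - effect_at_low

combinations = treatment_combinations(factors)

# Hypothetical mean responses for each (temperature, nutrient) cell.
hypothetical_means = {
    ("low", "absent"): 10.0,
    ("low", "present"): 12.0,
    ("high", "absent"): 11.0,
    ("high", "present"): 18.0,
}
contrast = interaction_contrast(hypothetical_means)
```

A contrast of zero would mean the nutrient effect is the same at both temperatures; a non-zero value indicates that the effect of one factor depends on the level of the other.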

Example of factorial design and interaction

What if we can’t do experimental studies?

Observational studies are still useful to detect patterns and generate hypotheses

Best observational studies

Minimize bias:
•  Controls
•  Randomization
•  Blinding

Minimize sampling error:
•  Replication
•  Balance
•  Blocking
•  Extreme treatments

Matching

Every individual in the treatment group is paired with a control individual having the same or very similar values for the suspected confounding variables.

Unlike randomization, matching does not account for all confounding variables, only those used to match participants.
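One simple way to build matched pairs is sketched below, assuming a single numeric confounding variable (age, in this hypothetical example). The greedy nearest-value rule and all names are illustrative choices, not a method prescribed by the slides.

```python
def match_controls(treated, controls):
    """Greedily pair each treated individual with the closest unused control.

    Both arguments map an individual's label to its value for the suspected
    confounding variable. Requires at least as many controls as treated.
    """
    available = dict(controls)
    pairs = []
    for name, value in treated.items():
        best = min(available, key=lambda c: abs(available[c] - value))
        pairs.append((name, best))
        del available[best]  # each control is used at most once
    return pairs

# Hypothetical ages for treated individuals and candidate controls.
treated = {"t1": 34, "t2": 51}
controls = {"c1": 50, "c2": 29, "c3": 36, "c4": 62}
pairs = match_controls(treated, controls)
```

Only the variable used for matching is balanced between the pairs; any other confounding variable can still differ between the treated and control individuals.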

In-class Exercise

Do people use more paper when they know it will be recycled?

•  People given paper and told to test scissors.
•  Recycling bin either present or not

No recycling bin: 4,4,4,4,4,4,4,5,8,9,9,9,9,12,12,13,14,14,14,14,15,23
Recycling bin: 4,5,8,8,8,9,9,9,12,14,14,15,16,19,23,28,40,43,129,130

1.  Make histograms and identify options for test
2.  Choose a test that you can do in class and conduct it