RNA-seq Design of experiments

RNA-seq · RNA-seq Design of experiments. Experimental design. Introduction • An experiment is a process or study that results in the collection of data. • Statistical experiments

Experimental design

Introduction

• An experiment is a process or study that results in the collection of data.

• Statistical experiments are conducted in situations in which researchers can manipulate the conditions of the experiment and can control the factors that are irrelevant to the research objectives.

Statistical design of experiments

Experimental design is the process of planning a study to meet specified objectives. Planning an experiment properly is very important in order to ensure that the right type of data and a sufficient sample size and power are available to answer the research questions of interest as clearly and efficiently as possible.

Designing an experiment

• Perform the following steps when designing an experiment:

1. Define the problem and the questions to be addressed

2. Define the population of interest

3. Determine the need for sampling

4. Define the experimental design

Define problem

• Before data collection begins, specific questions that the researcher plans to examine must be clearly identified.

• In addition, a researcher should identify the sources of variability in the experimental conditions. One of the main goals of a designed experiment is to partition the effects of the sources of variability into distinct components in order to examine specific questions of interest.

• The objective of designed experiments is to improve the precision of the results in order to examine the research hypotheses.

Define population

• A population is a collective whole of people, animals, plants, or other items that researchers collect data from. Before collecting any data, it is important that researchers clearly define the population, including a description of the members.

• The designed experiment should designate the population for which the problem will be examined. The entire population for which the researcher wants to draw conclusions will be the focus of the experiment.

Determine the need for sampling

• A sample is one of many possible sub-sets of units that are selected from the population of interest.

• In many data collection studies, the population of interest is assumed to be much larger than the sample, so there is potentially a very large (usually treated as infinite) number of possible samples. The results from one sample are then used to draw valid inferences about the population.

• A random sample is a sub-set of units selected at random from a population. It is used to represent the general population, or the conditions selected for the experiment, when the population of interest is too large to study in its entirety.

• Using techniques such as random selection after stratification or blocking is often preferred.

• Determining the sample size requires some knowledge of the observed or expected variance among sample members in addition to how large a difference among treatments you want to be able to detect.

• Another way to approach this aspect of the design stage is to conduct a prospective power analysis, which assesses the capability of the planned analysis to detect a difference of practical importance. A power analysis is essential because it shows how the data collection plan can strengthen the statistical tests, primarily by reducing residual variation, which is one of the key inputs to a power calculation.
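The link between sample size, variance, and the detectable difference can be made concrete with the standard normal-approximation formula for comparing two group means, n = 2 (z_{1-α/2} + z_{power})² σ² / δ² per group. The sketch below is illustrative only: the function name and the example values of σ and δ are assumptions rather than anything taken from the slides, and an exact calculation would use the t distribution instead of the normal.

```python
import math
from statistics import NormalDist


def samples_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    sigma: expected standard deviation among units within a group
    delta: smallest difference between treatment means worth detecting
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


# Hypothetical inputs: halving the detectable difference roughly quadruples n.
n_large_effect = samples_per_group(sigma=1.0, delta=1.0)
n_small_effect = samples_per_group(sigma=1.0, delta=0.5)
print(n_large_effect, n_small_effect)
```

Reducing residual variation (a smaller sigma) has the same effect on n as looking for a proportionally larger difference, which is why blocking and covariates matter at the planning stage.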

Define experimental design

• Defining the experimental design consists of the following steps:

1. Identify the experimental unit.

2. Identify the types of variables.

3. Define the treatment structure.

4. Define the design structure.

Experimental units

• An experimental (or sampling) unit is the person or object that will be studied by the researcher. It is the smallest unit of analysis in the experiment from which data will be collected (e.g. patient, mouse, plant, or cell line).

• An entity receiving an independent application of a treatment is called an experimental unit.

• An experimental run is the process of applying a particular treatment combination to an experimental unit and recording its response.

• A replicate is an independent run carried out on a different experimental unit under the same conditions.

Example: Two pots

Experimental unit: the plant in each pot

No replication

Types of variables

• A data collection plan considers how four important types of variables (background, constant, uncontrollable, and primary) fit into the study.

• The explanatory variables are referred to as factors.

• Inconclusive results are likely if any of these classifications is not adequately defined. It is important to consider all the relevant variables before the final data collection plan is approved in order to maximize confidence in the final results.

Background variables

• Background variables can be identified and measured yet cannot be controlled; they will influence the outcome of an experiment.

• Background variables will be treated as covariates in the model rather than primary variables.

Primary variables

• Primary variables are the variables of interest to the researcher.

• Primary variables are independent variables that are possible sources of variation in the response. These variables comprise the treatment and design structures and are referred to as factors.

• When background variables are used in an analysis, better estimates of the primary variables should result because the sources of variation that are supplied by the covariates have been removed.

Constant variables

• Constant variables can be controlled or measured but, for some reason, will be held constant over the duration of the study.

• This action increases the validity of the results by keeping extraneous sources of variation out of the data. For this data collection plan, some of the variables that will be held constant include:

  • the use of standard operating procedures
  • the use of one operator for each measuring device
  • all measurements taken at specific times and locations

Uncontrollable variables

• Uncontrollable variables are those variables that are known to exist, but conditions prevent them from being manipulated, or it is very difficult (due to cost or physical constraints) to measure them.

• The experimental error is due to the influential effects of uncontrollable variables, which will result in less precise evaluations of the effects of the primary and background variables.

• The design of the experiment should eliminate or control these types of variables as much as possible in order to increase confidence in the final results.

Explanatory and response variables

[Diagram: X → Y]

• X: explanatory variables (factors)

• Y: response variables

Factors

[Diagram: X → Y, with Z also acting on Y]

• Y: response variables

• X: treatment factor or design factor

• Z: noise factor or blocking factor

• Levels: $X = x$

• Treatment combination or treatment: a particular combination of factor levels (e.g. $x_1, x_2$ if there are two treatment factors)

Primary factors

• The treatment structure consists of factors that the researcher wants to study and about which the researcher will make inferences.

• The primary (treatment or design) factors are controlled by the researcher and are expected to show the effects of greatest interest on the response variable(s).

Levels

• The levels of the primary factors represent the range of the inference space relative to a study. The levels of the primary factors can represent the entire range of possibilities or a random sub-set. It is also important to recognize and define when combinations of levels of two or more treatment factors are illogical or unlikely to exist.

Fixed effects

• Fixed effects treatment factors are usually considered to be "fixed" in the sense that all levels of interest are included in the study because they are selected by some non-random process, they consist of the whole population of possible levels, or other levels were not feasible to consider as part of the study.

• The fixed effects represent the levels of a set of precise hypotheses of interest in the research. A fixed factor can have only a small number of inherent levels; for example, the only relevant levels for gender are male and female.

• A factor should also be considered fixed when only certain values of it are of interest, even though other levels might exist. Treatment factors can also be considered "fixed" as opposed to "random" because they are the only levels about which you would want to make inferences.

Three basic principles of experimental design

• Replication

• Randomization

• Blocking

Replication

• By replication we mean an independent repeat run of each treatment combination.

• Replication is essential for estimating experimental error.

• If a treatment condition appears more than one time, it is defined to be replicated.

• Misconceptions about the number of replications have often occurred in experiments where sub-samples or repeated observations on a unit have been mistaken as additional experimental units.
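The distinction between sub-samples and replicates can be illustrated by collapsing repeated measurements to one value per experimental unit before counting replicates. The measurements, unit names, and treatment labels below are hypothetical and serve only to show the bookkeeping.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements: (experimental_unit, treatment, value).
# Each plant was measured three times; those repeats are sub-samples.
observations = [
    ("plant1", "A", 5.1), ("plant1", "A", 5.3), ("plant1", "A", 5.2),
    ("plant2", "A", 4.8), ("plant2", "A", 4.9), ("plant2", "A", 5.0),
    ("plant3", "B", 6.0), ("plant3", "B", 6.2), ("plant3", "B", 6.1),
    ("plant4", "B", 5.7), ("plant4", "B", 5.9), ("plant4", "B", 5.8),
]

values_by_unit = defaultdict(list)
treatment_of = {}
for unit, treatment, value in observations:
    values_by_unit[unit].append(value)
    treatment_of[unit] = treatment

# One summary value per experimental unit.
unit_means = {unit: mean(vals) for unit, vals in values_by_unit.items()}

# Replicates are counted over experimental units, not over observations.
replicates = defaultdict(int)
for unit in unit_means:
    replicates[treatment_of[unit]] += 1

print(len(observations), "observations")
print(dict(replicates))
```

Here twelve observations yield only two replicates per treatment, because the independent application of a treatment happens at the level of the plant.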

Randomization

By randomization we mean that both the assignment of treatments to units and the order in which the individual runs of the experiments are to be performed are randomly determined.

A completely randomized design is an experimental design in which treatments are assigned to all units by randomization.
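A minimal sketch of a completely randomized design follows, assuming equal replication and a seeded random number generator so that the plan is reproducible. The function name, pot labels, and treatment labels are hypothetical.

```python
import random


def completely_randomized(units, treatments, seed=None):
    """Randomly assign treatments to units with equal replication.

    Returns the assignment and a random run order, since both the
    assignment and the order of runs are randomized.
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    reps = len(units) // len(treatments)
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)
    assignment = dict(zip(units, labels))
    run_order = list(units)
    rng.shuffle(run_order)
    return assignment, run_order


pots = [f"pot{i}" for i in range(1, 9)]  # hypothetical unit names
assignment, run_order = completely_randomized(pots, ["A", "B"], seed=1)
print(assignment)
print(run_order)
```

With eight pots and two treatments, each treatment receives four replicates; fixing the seed lets the same plan be regenerated later.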

Example: Randomized

Experimental unit: the plant in each pot

4 replicates for each treatment

Blocking

• Most experimental designs require experimental units to be allocated to treatments either randomly or randomly with constraints, as in blocked designs.

• Blocks are groups of experimental units that are formed to be as homogeneous as possible with respect to the block characteristics. The term block comes from the agricultural heritage of experimental design, where a large block of land with uniform soil, drainage, sunlight, and other important physical characteristics was selected for the various treatments.

• Randomly allocating the treatment levels within each homogeneous block improves the comparison of treatments.

Blocking is an experimental design strategy used to reduce or eliminate the variability transmitted from nuisance factors, which may influence the response variable but in which we are not directly interested.

Blocking is the grouping of experimental units that have similar properties. Within each block, treatments are randomly assigned to experimental units. The resulting design is called a randomized block design. This design enables more precise estimates of the treatment effects because comparisons between treatments are made among homogeneous experimental units in each block.
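Randomization within blocks can be sketched as below, assuming complete blocks (each block holds exactly one unit per treatment). The block names, unit names, and treatment labels are hypothetical.

```python
import random


def randomized_block(blocks, treatments, seed=None):
    """Within each block, assign every treatment to one unit in random order.

    blocks: dict mapping block name -> list of unit names; each block must
            hold exactly one unit per treatment (a complete block).
    """
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs {len(treatments)} units")
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # a fresh random order for every block
        for unit, treatment in zip(units, shuffled):
            assignment[unit] = (block, treatment)
    return assignment


# Hypothetical layout: 2 chambers (blocks), 3 units per chamber.
blocks = {"chamber1": ["u1", "u2", "u3"], "chamber2": ["u4", "u5", "u6"]}
plan = randomized_block(blocks, ["control", "low", "high"], seed=7)
for unit, (block, treatment) in plan.items():
    print(unit, block, treatment)
```

Because every treatment appears once in every block, differences between blocks cancel out of the treatment comparisons.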

[Diagram: the treatment factor X and the blocking factor Z both act on the response Y]

Blocking example

Blocking removes the variation in response among chambers, allowing more precise estimates and more powerful tests of the treatment effects.

Design structure

• The design structure consists of those factors that define the blocking of the experimental units into clusters.

• Commonly used design structures include:

• Completely randomized design

• Randomized complete block design

• Factorial design

Completely randomized design

• Subjects are assigned to treatments completely at random.

Randomized complete block design

• Subjects are divided into b blocks (see description of blocks above) according to demographic characteristics. Subjects in each block are then randomly assigned to treatments so that all treatment levels appear in each block.

Factorial design

Many experiments in biology investigate more than one treatment factor, because:

1. answering two questions from a single experiment rather than just one makes more efficient use of time, supplies, and other costs

2. the factors might interact.

An experiment having a factorial design investigates all treatment combinations of two or more treatment factors. A factorial design can measure interactions between factors.

An interaction between two (or more) explanatory variables means that the effect of one variable on the response depends on the state of the other variable.
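As a small illustration, the treatment combinations of a factorial design can be enumerated as the Cartesian product of the factor levels, and an interaction in a 2 × 2 layout shows up as a difference between the simple effects of one factor at each level of the other. The factor names, levels, and cell means below are hypothetical.

```python
from itertools import product

factors = {                      # hypothetical factors and levels
    "temperature": ["low", "high"],
    "nutrient": ["none", "added"],
}
# Every combination of levels is one treatment in the factorial design.
treatment_combinations = list(product(*factors.values()))
print(treatment_combinations)

# Hypothetical cell means of the response for each combination.
cell_mean = {
    ("low", "none"): 10.0, ("low", "added"): 12.0,
    ("high", "none"): 11.0, ("high", "added"): 17.0,
}
nutrient_effect_at_low = cell_mean[("low", "added")] - cell_mean[("low", "none")]
nutrient_effect_at_high = cell_mean[("high", "added")] - cell_mean[("high", "none")]

# A nonzero difference of simple effects means the factors interact.
interaction = nutrient_effect_at_high - nutrient_effect_at_low
print(nutrient_effect_at_low, nutrient_effect_at_high, interaction)
```

In this made-up example the effect of the nutrient depends on temperature, which a pair of single-factor experiments could not reveal.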

[Diagram: two treatment factors, $X_1$ and $X_2$, both act on the response Y]

Analyzing data

A unified model: general linear model

$$E[y] = \beta_0 + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1}$$
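One way to see how a single formula covers both numerical and categorical explanatory variables is to write out a row of the design matrix. The sketch below assumes indicator (dummy) coding with the first level as the reference, which is a common convention but is not stated on the slides; the function names and coefficient values are hypothetical.

```python
def design_row(treatment, levels, covariates=()):
    """Intercept, indicators for all levels except the first, then numeric covariates."""
    if treatment not in levels:
        raise ValueError(f"unknown level: {treatment!r}")
    indicators = [1.0 if treatment == level else 0.0 for level in levels[1:]]
    return [1.0] + indicators + list(covariates)


def expected_response(beta, row):
    """E[y] = beta_0 * 1 + beta_1 * x_1 + ... for one design-matrix row."""
    if len(beta) != len(row):
        raise ValueError("beta and row must have the same length")
    return sum(b * x for b, x in zip(beta, row))


levels = ["control", "low", "high"]
beta = [3.0, -1.0, -1.5]   # hypothetical beta_0, beta_1, beta_2
for level in levels:
    print(level, design_row(level, levels), expected_response(beta, design_row(level, levels)))
```

Under this coding, beta_0 is the expected response at the reference level and each remaining coefficient is a difference from that reference.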

Basic linear models

| Model formula | Model | Design |
|---|---|---|
| $y \sim x$ | Linear regression | Dose-response |
| $y \sim t$ | One-way ANOVA | Completely randomized |
| $y \sim t + b$ | Two-way ANOVA | Randomized block |
| $y \sim t_1 + t_2 + t_1 t_2$ | Two-way, fixed-effect ANOVA | Factorial design |
| $y \sim t + x$ | ANCOVA | Observational study with one known noise factor |
| $y \sim x_1 + x_2 + x_1 x_2$ | Multiple linear regression | Dose-response |

$x$: numerical, $t$: categorical treatment factor, $b$: categorical blocking factor

Randomized complete block design

How does fish abundance affect the abundance and diversity of prey species?

Design

[Figure: three treatments, Control, Low (30 fish), and High (90 fish), each applied to a 3 m × 3 m area]

Data: Zooplankton diversity in three fish abundance treatments

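| Treatment \ Block | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Control | 4.1 | 3.2 | 3.0 | 2.3 | 2.5 |
| Low | 2.2 | 2.4 | 1.5 | 1.3 | 2.6 |
| High | 1.3 | 2.0 | 1.0 | 1.0 | 1.6 |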

Model: $y \sim t + b$

$$y_i = \beta_0 + \beta_1 t_i + \beta_2 b_i + \epsilon_i$$

H0: Mean zooplankton diversity is the same in every abundance treatment (reduced model: $y \sim b$)

H1: Mean zooplankton diversity is not the same in every abundance treatment (full model: $y \sim t + b$)
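The comparison of the two models can be sketched by hand for the zooplankton data. The code assumes that the columns 1 to 5 of the data table are the blocks, which the slides do not state explicitly. Because the design is balanced, the improvement in fit from adding the treatment factor to the block-only model equals the treatment sum of squares, so the F statistic can be computed from marginal means without a linear-model library.

```python
from statistics import mean

# Zooplankton diversity from the data table; columns are assumed to be blocks 1-5.
data = {
    "Control": [4.1, 3.2, 3.0, 2.3, 2.5],
    "Low":     [2.2, 2.4, 1.5, 1.3, 2.6],
    "High":    [1.3, 2.0, 1.0, 1.0, 1.6],
}
n_treat = len(data)
n_block = len(data["Control"])
grand = mean(v for row in data.values() for v in row)

treat_means = {t: mean(row) for t, row in data.items()}
block_means = [mean(data[t][j] for t in data) for j in range(n_block)]

# Partition the total variation into treatment, block, and residual parts.
ss_total = sum((v - grand) ** 2 for row in data.values() for v in row)
ss_treat = n_block * sum((m - grand) ** 2 for m in treat_means.values())
ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
ss_resid = ss_total - ss_treat - ss_block

df_treat = n_treat - 1
df_resid = (n_treat - 1) * (n_block - 1)
f_treat = (ss_treat / df_treat) / (ss_resid / df_resid)

print(f"SS treatment = {ss_treat:.3f}, SS block = {ss_block:.3f}, SS residual = {ss_resid:.3f}")
print(f"F({df_treat}, {df_resid}) = {f_treat:.2f}")
```

Turning the F statistic into a p-value requires the F distribution, which is outside the Python standard library, so that step is left out of this sketch.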

Fitting the model to data

Adjusting for a known confounding factor

Mole rats are the only known mammals with distinct social castes.

- A single queen and a small number of males are the only reproducing individuals in a colony.

- Workers gather food, defend the colony, care for the young, and maintain the burrows.

- Two worker castes in the Damaraland mole rat:
  - “Frequent workers”: do almost all of the work in the colony
  - “Infrequent workers”: do little work except on rare occasions after rains

To assess the physiological differences between the two types of workers, researchers compared daily energy expenditures of wild mole rats during a dry season.

Known noise factor: Energy expenditure appears to vary with body mass in both groups, but infrequent workers are heavier than frequent workers

Research question: How different is mean daily energy expenditure between the two groups when adjusted for differences in body mass?

Data

Model: $y \sim t + x$

H0: Castes do not differ in energy expenditure (reduced model: $y \sim x$)

H1: Castes differ in energy expenditure (full model: $y \sim t + x$)
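The effect of adjusting for a covariate can be shown on a toy data set. The numbers below are hypothetical and noise-free (they are not the mole rat measurements, which are not reproduced in this transcript): the response is constructed as y = 1 + 2t + 0.5x, and the t = 1 group is given larger x values, mimicking a heavier caste. The least-squares helper is a plain normal-equations solver written for this sketch.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical, noise-free data: y = 1 + 2*t + 0.5*x, with the t = 1 group heavier.
t = [0, 0, 0, 1, 1, 1]
x = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
y = [1 + 2 * ti + 0.5 * xi for ti, xi in zip(t, x)]

mean0 = sum(yi for ti, yi in zip(t, y) if ti == 0) / t.count(0)
mean1 = sum(yi for ti, yi in zip(t, y) if ti == 1) / t.count(1)
raw_difference = mean1 - mean0          # mixes the group effect with the covariate

beta = least_squares([[1.0, ti, xi] for ti, xi in zip(t, x)], y)
adjusted_difference = beta[1]           # group effect at equal x
print(raw_difference, round(adjusted_difference, 6))
```

The raw difference in group means overstates the group effect because the groups also differ in x; the coefficient of t in the model with the covariate recovers the difference at equal x.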

Fitting the model to data

Example: RNA-seq

Multiple factors

• Experiments with more than one factor influencing the counts can be analyzed using design formulas that include the additional variables.

• In fact, DESeq2 can analyze any possible experimental design that can be expressed with fixed effects terms (multiple factors, designs with interactions, designs with continuous variables, splines, and so on are all possible).

• By adding variables to the design, one can control for additional variation in the counts. For example, if the condition samples are balanced across experimental batches, including the batch factor in the design can increase the sensitivity for finding differences due to condition. There are multiple ways to analyze experiments when the additional variables are of interest and not just controlling factors.

Including type

Accounting for type

• We can account for the different types of sequencing, and get a clearer picture of the differences attributable to the treatment.

• As condition is the variable of interest, we put it at the end of the formula.

• Thus the results function will by default pull the condition results unless contrast or name arguments are specified. Then we can re-run DESeq.

• It is also possible to retrieve the log2 fold changes, p values and adjusted p values of the type variable. The contrast argument of the function results takes a character vector of length three: the name of the variable, the name of the factor level for the numerator of the log2 ratio, and the name of the factor level for the denominator.

Gene Ontology

Annotating and exporting results

• Our result table only contains information about Ensembl gene IDs, but alternative gene names may be more informative for collaborators. Bioconductor’s annotation packages help with mapping various ID schemes to each other.

Running topGO

Downregulated GO

Upregulated GO

Published results

The top 10 most significant terms are shown for downregulated (D) and upregulated (E) genes, respectively.