
Design of Experiments

• Problem formulation

• Setting up the experiment

• Analysis of data

Panu Somervuo, March 20, 2007

Problem formulation

• what is the biological question?
• how to answer that?
• what is already known?
• what information is missing?
• problem formulation → model of the biological system

Setting up an experiment

• what kind of data is needed to answer the question?
• how to collect the data?
• how much data is needed?
• biological and technical replicates
• pooling
• how to carry out the experiment (sample preparation, measurements)?

[Figure: Control and Test samples]

Analysis of data

• preprocessing
• filtering & outlier removal
• normalization
• statistical model fitting
• hypothesis testing
• reporting the results, documentation

Everything depends on everything

problem formulation → model of the system

setting up the experiment → number of samples

analysis of data → statistical tests

Practical guidelines

• blocking unwanted effects (e.g. dye effect)

• randomization (avoid systematic bias by randomizing e.g. the order of sample preparations)

• replication (replicate measurements can be averaged to reduce the effect of random errors)
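The randomization and blocking guidelines above can be sketched in R. This is only an illustration: the sample names, the group sizes, and the fixed seed are assumptions, not part of the original material.

```r
# Hypothetical experiment: 4 control and 4 test samples (names are illustrative)
design <- data.frame(
  sample = c(paste0("control", 1:4), paste0("test", 1:4)),
  group  = rep(c("control", "test"), each = 4)
)

set.seed(1)  # fixed seed only so that the chosen order can be documented

# Randomization: prepare the samples in a random order
preparation_order <- sample(design$sample)

# Blocking the dye effect: within each group, half of the samples get cy3
# and half get cy5, assigned in random order
design$dye <- c(sample(rep(c("cy3", "cy5"), 2)),
                sample(rep(c("cy3", "cy5"), 2)))

print(preparation_order)
print(table(design$group, design$dye))
```

Balancing the dyes within each group keeps a possible dye effect from being confounded with the control/test difference.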

[Figure: Control and Test samples in group1 and group2, labeled with cy5 and cy3; after log transform and normalization the measurements are modeled as y = µ + F1 + F2 + ... + error]

Pairwise sample comparison vs modeling

• pairwise sample comparison is easy and straightforward

• instead of comparing samples as such, we can construct a model for the measurements and then perform comparisons

[Figure: Control and Test samples]

Mathematical model of data

• try to capture the essence of a (biological) phenomenon in mathematical terms

• here we concentrate on linear models: observation consists of effects of one or more factors and random error

• factor may have several levels (e.g. factor sex has two levels, male and female)

Examples of models

• single factor: y = µ + gene + error
• two factors: y = µ + treatment + gene + error
• two factors including interaction term: y = µ + treatment + gene + treatment.gene + error
• four factors: y = µ + treatment + gene + dye + array + error

(y: measurement after normalization and log transform)
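As a rough guide, the example models above correspond to the following R model formulas. The variable names (gene, treatment, dye, array) are assumed to be factor columns in a data frame. The intercept µ and the error term are implicit in R's formula notation, and the interaction treatment.gene is written treatment:gene.

```r
# R formulas for the four example models (mu and error are implicit)
models <- list(
  single_factor    = y ~ gene,
  two_factors      = y ~ treatment + gene,
  with_interaction = y ~ treatment * gene,   # expands to treatment + gene + treatment:gene
  four_factors     = y ~ treatment + gene + dye + array
)

# List the terms each formula contains
term_labels <- lapply(models, function(f) attr(terms(f), "term.labels"))
print(term_labels)
```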

From model to experimental design

y = µ + drug + sex + drug.sex + error

factor 1, drug: 3 levels
factor 2, sex: 2 levels
3x2 factorial design:

               sex level 1               sex level 2
no treatment   y111, y112, y113, y114    y121, y122, y123, y124
treatment A    y211, y212, y213, y214    y221, y222, y223, y224
treatment B    y311, y312, y313, y314    y321, y322, y323, y324

Analysis of variance

• ANOVA can be used to analyse factorial designs

y = µ + drug + sex + drug.sex + error

               sex level 1           sex level 2
no treatment   1.0, 1.1, 0.9, 1.3    0.7, 0.5, 0.6, 0.8
treatment A    1.1, 1.2, 0.8, 1.3    0.7, 0.8, 0.6, 0.9
treatment B    2.1, 1.9, 1.7, 2.0    1.5, 1.3, 1.4, 1.1

summary(aov(y~drug*sex,data=data))

            Df  Sum Sq Mean Sq F value    Pr(>F)
drug         2 2.86750 1.43375 51.3582 3.644e-08 ***
sex          1 1.26042 1.26042 45.1493 2.673e-06 ***
drug:sex     2 0.06583 0.03292  1.1791    0.3302
Residuals   18 0.50250 0.02792
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
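For readers who want to reproduce the analysis, here is a minimal sketch of how the data frame could be set up. The level names "0", "A", "B" follow the labels in the Tukey output. The sex labels "s1" and "s2" are placeholders, since the slides do not say which row is which sex.

```r
# Measurements from the table, row by row (4 replicates per cell)
y <- c(1.0, 1.1, 0.9, 1.3,   0.7, 0.5, 0.6, 0.8,   # no treatment
       1.1, 1.2, 0.8, 1.3,   0.7, 0.8, 0.6, 0.9,   # treatment A
       2.1, 1.9, 1.7, 2.0,   1.5, 1.3, 1.4, 1.1)   # treatment B

data <- data.frame(
  y    = y,
  drug = factor(rep(c("0", "A", "B"), each = 8)),
  sex  = factor(rep(rep(c("s1", "s2"), each = 4), times = 3))  # placeholder labels
)

# Two-way ANOVA with interaction
fit <- aov(y ~ drug * sex, data = data)
anova_table <- summary(fit)[[1]]
print(anova_table)
```

The drug and sex main effects are clearly significant, while the drug:sex interaction is not (p = 0.33).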

Multiple pairwise comparisons

• ANOVA tells us that at least one drug treatment has an effect, but to find out which one, we perform all pairwise comparisons:

               sex level 1           sex level 2
no treatment   1.0, 1.1, 0.9, 1.3    0.7, 0.5, 0.6, 0.8
treatment A    1.1, 1.2, 0.8, 1.3    0.7, 0.8, 0.6, 0.9
treatment B    2.1, 1.9, 1.7, 2.0    1.5, 1.3, 1.4, 1.1

TukeyHSD(aov(y~drug*sex,data=data),"drug")

  Tukey multiple comparisons of means
    95% family-wise confidence level
    factor levels have been ordered

Fit: aov(formula = y ~ drug * sex, data = data)

$drug
      diff        lwr       upr
A-0 0.0625 -0.1507113 0.2757113
B-0 0.7625  0.5492887 0.9757113
B-A 0.7000  0.4867887 0.9132113

Only the intervals for B-0 and B-A exclude zero, so treatment B differs from both no treatment and treatment A, whereas treatment A does not differ from no treatment.
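The pairwise differences can be checked directly from the group means. The sketch below rebuilds the data (with placeholder sex labels "s1"/"s2", an assumption) and compares the differences of the drug means with the TukeyHSD result.

```r
y <- c(1.0, 1.1, 0.9, 1.3,   0.7, 0.5, 0.6, 0.8,   # no treatment
       1.1, 1.2, 0.8, 1.3,   0.7, 0.8, 0.6, 0.9,   # treatment A
       2.1, 1.9, 1.7, 2.0,   1.5, 1.3, 1.4, 1.1)   # treatment B
data <- data.frame(
  y    = y,
  drug = factor(rep(c("0", "A", "B"), each = 8)),
  sex  = factor(rep(rep(c("s1", "s2"), each = 4), times = 3))  # placeholder labels
)

# Mean response for each drug level, averaged over sex and replicates
drug_means <- tapply(data$y, data$drug, mean)
print(drug_means)

# Tukey's honest significant differences for the drug factor
tukey <- TukeyHSD(aov(y ~ drug * sex, data = data), "drug")
print(tukey$drug)
```

Each `diff` entry is simply the difference of two drug means. The `lwr` and `upr` columns give the family-wise 95% confidence interval.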

Benefits of (good) models

• after fitting the model to the data, the model can be used to answer questions, e.g.:
  – is there a dye effect?
  – is the difference in gene expression levels between two conditions statistically significant?
  – is there an interaction between gene and another factor?

• simple pairwise sample comparisons cannot give answers to all of these questions simultaneously

[Figure: Control and Test samples; y = µ + F1 + F2 + ... + error]

What is a good model?

• a good model allows us to get more detailed results
• the best model and parametrization are application specific
• simple vs complex model: y = µ + F1 + F2 + F3 + ... + error
• there should be a balance between model complexity and the amount of data

               dye1                dye2
control        y111, y112, y113    y121, y122, y123
treatment A    y211, y212, y213    y221, y222, y223
treatment B    y311, y312, y313    y321, y322, y323

How does the number of samples affect the confidence of our results?

• measurement error is always present, see the example self-self hybridization:

[Figure: self-self hybridization example]

• let's compute the mean expression level of a gene
• how accurate is this value?
• variance(mean) = variance(error) / number of samples
• samples from normal distribution (mean 0, sd 1):

[Figure: samples from a normal distribution (mean 0, sd 1)]
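The relation variance(mean) = variance(error) / number of samples can be checked with a small simulation. The sample sizes, the number of repeats, and the seed below are arbitrary choices for illustration.

```r
set.seed(42)
sample_sizes <- c(1, 2, 5, 10, 50)
n_repeats <- 10000  # simulated experiments per sample size

# For each sample size, draw many samples from N(0, 1) and record the
# variance of the resulting sample means
simulated_var <- sapply(sample_sizes, function(n) {
  means <- replicate(n_repeats, mean(rnorm(n, mean = 0, sd = 1)))
  var(means)
})

# Theory: variance(mean) = variance(error) / n, with variance(error) = 1
theoretical_var <- 1 / sample_sizes

print(data.frame(n = sample_sizes,
                 simulated = round(simulated_var, 4),
                 theoretical = theoretical_var))
```

Since the standard deviation of the mean shrinks as 1/sqrt(n), halving the uncertainty of the mean requires four times as many samples.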

Theoretical sample size calculations

• for each statistical test, there is a (test-specific) relation between:
  – power of a test: 1 - probability(type II error)
  – significance level: probability(type I error)
  – error variance
  – mean difference needed to be detected
  – number of samples

                        actual situation:        actual situation:
                        drug has effect          drug has no effect

our conclusion:         correct conclusion       type I error
drug has effect         (true positive)          (false positive)
                        probability 1 - β        probability α
                        (power)                  (significance level)

our conclusion:         type II error            correct conclusion
drug has no effect      (false negative)         (true negative)
                        probability β            probability 1 - α

How many samples are needed to detect a difference of 1 unit between sample means?

R function power.t.test:

> power.t.test(delta=1,power=0.95,sd=1,sig.level=0.05)

     Two-sample t test power calculation

              n = 26.98922
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.95
    alternative = two.sided

NOTE: n is number in *each* group

What is the power of the test when using 10 samples?

> power.t.test(n=10,delta=1,sd=1,sig.level=0.05)

              n = 10
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.5619846
    alternative = two.sided

How small a difference between sample means can we detect using 10 samples?

> power.t.test(n=10,power=0.95,sd=1,sig.level=0.05)

              n = 10
          delta = 1.706224
             sd = 1
      sig.level = 0.05
          power = 0.95
    alternative = two.sided
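To see how power grows with the number of samples, the same function can be evaluated over a range of group sizes. The grid of sample sizes below is an arbitrary choice. The other settings (delta = 1, sd = 1, sig.level = 0.05) are those used above.

```r
# Power of the two-sample t test for several group sizes
n_per_group <- c(5, 10, 15, 20, 27, 30)
power <- sapply(n_per_group, function(n) {
  power.t.test(n = n, delta = 1, sd = 1, sig.level = 0.05)$power
})
print(data.frame(n_per_group, power = round(power, 3)))

# The required n is fractional, so round up to whole samples per group
n_required <- ceiling(
  power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.95)$n
)
print(n_required)
```

In practice the fractional n = 26.98922 means 27 samples in each group, i.e. 54 samples in total.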

Two kinds of replicates

• biological replicates: biological variability

• technical replicates: measurement accuracy

• most statistical programs assume independent samples

[Figure: replicate samples A1-A3, B1-B3, C1-C3, D1-D3]

Pooling

• ok when the interest is not on the individual, but on common patterns across individuals (population characteristics)

• pooling results in averaging → reduces variability → substantive features are easier to find

• recommended when fewer than 3 arrays are used in each condition
• beneficial when many subjects are pooled
• one pool vs independent samples in multiple pools
• inference for most genes was not affected by pooling

C. Kendziorski, R. A. Irizarry, K.-S. Chen, J. D. Haag, and M. N. Gould, "On the utility of pooling biological samples in microarray experiments", PNAS March 2005, 102(12) 4252-4257

How to allocate the samples to microarrays?

• which samples should be hybridized on the same slide?
• different experimental designs
• reference design, loop design
• what is the optimal design?

Example of four-array experiment

array   cy3   cy5   log(cy5/cy3)
1       A     B     log(B) - log(A)
3       B     A     log(A) - log(B)
4       B     A     log(A) - log(B)

[Figure: arrays 1-4 with cy3 and cy5 channels]

Reference design

array   cy3   cy5   log(cy5/cy3)
1       Ref   A     log(A) - log(Ref)
2       Ref   B     log(B) - log(Ref)
3       Ref   C     log(C) - log(Ref)
4       Ref   D     log(D) - log(Ref)

log(C/A) = log(C) - log(A)
         = log(C) - log(Ref) + log(Ref) - log(A)
         = (log(C) - log(Ref)) - (log(A) - log(Ref))
         = logratio(array3) - logratio(array1)
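The indirect comparison through the reference can be sketched numerically. The log intensities below are made-up values for illustration only. The point is that the reference cancels out of the difference of two log ratios.

```r
# Hypothetical log intensities (illustrative values, not real data)
log_intensity <- c(A = 8.0, B = 8.5, C = 9.2, D = 7.6, Ref = 8.3)

# Each array measures log(sample) - log(Ref)
logratio <- log_intensity[c("A", "B", "C", "D")] - log_intensity[["Ref"]]
names(logratio) <- paste0("array", 1:4)
print(logratio)

# log(C/A) = logratio(array3) - logratio(array1); the reference cancels
log_C_over_A <- logratio[["array3"]] - logratio[["array1"]]
print(log_C_over_A)
```

Every comparison between two samples uses two arrays in this design, so the measurement error of both arrays enters the estimate.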

Loop design

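array   cy3   cy5   log(cy5/cy3)
1       A     B     log(B) - log(A)
2       B     C     log(C) - log(B)
3       C     D     log(D) - log(C)
4       D     A     log(A) - log(D)

Two independent estimates of log(C/A) are available, one through B and one through D:

estimate 1: log(C/A) = log(C) - log(B) + log(B) - log(A) = logratio(array2) + logratio(array1)

estimate 2: log(C/A) = log(C) - log(D) + log(D) - log(A) = - logratio(array3) - logratio(array4)

Combined estimate: log(C/A) = (estimate 1 + estimate 2)/2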
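A numerical sketch of the loop-design estimate follows. Array 1 is taken to measure log(B) - log(A), as implied by the derivation above. The log ratio values are invented for illustration and include a little noise, so the two paths around the loop do not agree exactly.

```r
# Hypothetical measured log ratios for the four arrays of the loop
# (array1: B-A, array2: C-B, array3: D-C, array4: A-D; values are illustrative)
logratio <- c(array1 = 0.52, array2 = 0.68, array3 = -1.57, array4 = 0.41)

# Path through B: log(C/A) = logratio(array2) + logratio(array1)
via_B <- logratio[["array2"]] + logratio[["array1"]]

# Path through D: log(C/A) = -logratio(array3) - logratio(array4)
via_D <- -logratio[["array3"]] - logratio[["array4"]]

# Combine the two independent estimates by averaging
combined <- (via_B + via_D) / 2
print(c(via_B = via_B, via_D = via_D, combined = combined))
```

Averaging the two paths uses all four arrays, which is why a loop can give a more precise comparison than a single indirect path.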

Comparing the designs

                                    reference   reference design   loop
                                    design      with replicates    design
number of arrays                    3           6                  3
amount of RNA required per sample   1+Ref       2+Ref              2
error                               2.0         1.0                0.67
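The error row can be reproduced under one plausible reading, which is an assumption here: three samples A, B, C, independent array errors with equal variance σ², and "error" meaning the variance of an estimated pairwise difference in units of σ². With a linear model and design matrix X, that variance is k' (X'X)^-1 k for a contrast vector k.

```r
# Variance of an estimated contrast, in units of the error variance sigma^2
contrast_variance <- function(X, contrast) {
  drop(t(contrast) %*% solve(crossprod(X)) %*% contrast)
}

# Reference design: arrays measure A-Ref, B-Ref, C-Ref;
# parameters are (A, B, C) relative to Ref
X_reference     <- diag(3)
X_reference_rep <- rbind(diag(3), diag(3))   # every array done twice

# Loop design: arrays measure B-A, C-B, A-C;
# parameters are (B, C) relative to A
X_loop <- rbind(c( 1,  0),
                c(-1,  1),
                c( 0, -1))

errors <- c(
  reference            = contrast_variance(X_reference,     c(-1, 1, 0)),  # B - A
  reference_replicated = contrast_variance(X_reference_rep, c(-1, 1, 0)),  # B - A
  loop                 = contrast_variance(X_loop,          c( 1, 0))      # B - A
)
print(round(errors, 2))
```

Under these assumptions the loop design reaches a lower error than the replicated reference design with half the arrays, because no array is spent on the reference sample.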

Design with all direct pairwise comparisons

[Figure: four samples (Parental - stressed, Parental - unstressed, Derived - stressed, Derived - unstressed) arranged along Environment and Genotype axes]

Example: examining genotype, phenotype, and environment

[Figure: Reference Sample, Assay Variation]

Optimal design

• maximize the accuracy of parameters of interest
• procedure: enumerate all possible designs, calculate the parameter accuracy for each of them and select the best design
• optimal design is model specific
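The enumeration procedure can be sketched for a toy case. Everything specific here is an assumption for illustration: three samples A, B, C, three two-colour arrays, dye orientation ignored, and accuracy measured as the mean variance of all pairwise comparisons in units of the error variance σ².

```r
# Candidate arrays: each compares one pair (log ratio = second - first).
# Parameters are (B, C) relative to baseline A.
candidate_rows <- rbind(AB = c( 1, 0),   # B - A
                        AC = c( 0, 1),   # C - A
                        BC = c(-1, 1))   # C - B

# Pairwise comparisons of interest: B-A, C-A, C-B
contrasts_of_interest <- list(c(1, 0), c(0, 1), c(-1, 1))

# Mean contrast variance for a design given as row indices into candidate_rows
design_score <- function(idx) {
  X <- candidate_rows[idx, , drop = FALSE]
  if (qr(X)$rank < ncol(X)) return(Inf)  # some comparison cannot be estimated
  XtX_inv <- solve(crossprod(X))
  mean(sapply(contrasts_of_interest,
              function(k) drop(t(k) %*% XtX_inv %*% k)))
}

# Enumerate every choice of 3 arrays (repeats allowed, order irrelevant)
designs <- expand.grid(a1 = 1:3, a2 = 1:3, a3 = 1:3)
designs <- designs[designs$a1 <= designs$a2 & designs$a2 <= designs$a3, ]
scores <- apply(designs, 1, design_score)

best <- rownames(candidate_rows)[unlist(designs[which.min(scores), ])]
print(best)
print(min(scores))
```

With this criterion the best choice uses each pair once, which is the loop design. A different model or a different set of parameters of interest can change the winner.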

About the nature of microarray data

• microarray data can generate hypotheses to be tested further
• results from microarray analysis should be verified by other means (qPCR, ...)
• quality of microarray data depends on samples, probes, hybridization, lab work
• data pre-processing, normalization, and outlier detection are as important as good experimental design

More about statistics

• M.J. Crawley: "Statistics: An Introduction using R", John Wiley & Sons, 2005

• S.A. Glantz: "Primer of Biostatistics", McGraw-Hill, 5th ed., 2002

• D.C. Montgomery: "Design and Analysis of Experiments", John Wiley & Sons, 5th ed., 2001

• Google
