Basic statistics
Descriptive statistics and ANOVA

Thomas Alexander Gerds
Department of Biostatistics, University of Copenhagen


Contents

• Data are variable
• Statistical uncertainty
• Summary and display of data
• Confidence intervals
• ANOVA


Data are variable

A statistician is used to receiving a value, such as

3.17 %,

together with an explanation, such as

"this is the expression of 1-B6.DBA-GTM in mouse 12".

The value from the next mouse in the list is 4.88% . . .


The measurement is difficult


Data processing is done by humans


Two mice have different genes


They are exposed


. . . and treated differently


Decomposing variance

Variability of data is usually a composite of

• Measurement error, sampling scheme
• Random variation
• Genotype
• Exposure, life style, environment
• Treatment

Statistical conclusions can often be obtained by explaining the sources of variation in the data.


Example 1

In the yeast experiment of Smith and Kruglyak (2008) [1], transcript levels were profiled in 6 replicates of the same strain called 'RM' in glucose under controlled conditions.

[1] The article is available at http://biology.plosjournals.org


Example 1

Figure:


Sources of the variation of these 6 values

• Measurement error
• Random variation


Example 1

In the same yeast experiment, Smith and Kruglyak (2008) also profiled 6 replicates of a different strain called 'By' in glucose. The order in which the 12 samples were processed was random to minimize a systematic experimental effect.


Example 1

Figure:


Sources of the variation of these 12 values

• Measurement error
• Study design/experimental environment
• Genotype


Example 1

Furthermore, Smith and Kruglyak (2008) cultured 6 'RM' and 6 'By' replicates in ethanol. The order in which the 24 samples were processed was random to minimize a systematic experimental effect.


Sources of variation

Figure:


Sources of variation

• Measurement error
• Experimental environment
• Genes
• Exposure, environmental factors


Example 2

Festing and Weigler in the Handbook of Laboratory Animal Science. . .

. . . consider the results of an experiment using a completely randomized design . . .

. . . in which adult C57BL/6 mice were randomly allocated to one of four dose levels of a hormone compound.

The uterus weight was measured after an appropriate time interval.


Example 2

Figure:


Example 2

Figure:


Example 2

Figure:


Example 2

Conclusions from the figures

• The uterus weight depends on the dose
• The variation of the data increases with increasing dose

Question: Why could these first conclusions be wrong?


Descriptive statistics


Descriptive statistics (summarizing data)

Categorical variables: count (%).

Continuous variables:
• raw values (if n is small)
• range (min, max)
• location: median (IQR = interquartile range)
• location: mean (SD)
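The summaries listed above can be computed with base R alone. The following is a minimal sketch; the vector `x` is invented for illustration and does not come from any of the studies in these notes. It includes one large value to show that the mean reacts to outliers while the median does not.

```r
# Hypothetical measurements, invented for illustration only
x <- c(4.1, 5.3, 2.8, 6.0, 3.9, 5.1, 4.4, 12.5)

rng <- range(x)                       # minimum and maximum
med <- median(x)                      # location, robust against outliers
q   <- quantile(x, c(0.25, 0.75))     # lower and upper quartile
iqr <- unname(q[2] - q[1])            # width of the interquartile range
m   <- mean(x)                        # location, sensitive to outliers
s   <- sd(x)                          # spread around the mean

summary_x <- c(min = rng[1], max = rng[2], median = med,
               q25 = unname(q[1]), q75 = unname(q[2]),
               mean = m, sd = s)
print(round(summary_x, 2))
```

Reporting median with quartiles is usually the safer choice for skewed data; mean (SD) is most informative when the distribution is roughly symmetric.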


Sample: Table 1 [2]

[2] Quality of life (QOL), supportive care, and spirituality in hematopoietic stem cell transplant (HSCT) patients. Sirilla & Overcash. Supportive Care in Cancer, October 2012.


Sample: Table 1


R excursion: calculating descriptive statistics in groups

    library(Publish)
    library(data.table)
    data(Diabetes)
    setDT(Diabetes) ## make data.table
    Diabetes[,.(mean.age=mean(age),
                sd.age=sd(age),
                median.chol=median(chol,na.rm=TRUE)),
             by=location]

         location mean.age   sd.age median.chol
    1: Buckingham 47.07500 16.74849         202
    2:     Louisa 46.63054 15.90929         206


R excursion: making table one

    library(Publish)
    data(Diabetes)
    tab1 <- summary(utable(location~gender + age + Q(chol) + BMI,
                           data=Diabetes))
    tab1

    Variable  Level         Buckingham (n=200)    Louisa (n=203)        Total (n=403)         p-value
    gender    female        114 (57.0)            120 (59.1)            234 (58.1)
              male          86 (43.0)             83 (40.9)             169 (41.9)            0.7422
    age       mean (sd)     47.1 (16.7)           46.6 (15.9)           46.9 (16.3)           0.7847
    chol      median [iqr]  202.0 [174.0, 231.0]  206.0 [183.5, 229.0]  204.0 [179.0, 230.0]  0.2017
              missing       1                     0                     1
    BMI       mean (sd)     28.6 (7.0)            29.0 (6.2)            28.8 (6.6)            0.5424
              missing       3                     3                     6


R excursion: exporting a table

Method 1: Write table to file

write.csv(tab1,file="tables/tab1.csv")

Then open file tab1.csv with Excel

Method 2: Use kable [3] and include in dynamic report [4]

```{r, results='asis'}
knitr::kable(tab1)
```

[3] https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html

[4] https://www.rdocumentation.org/packages/knitr/versions/1.17/topics/kable


Dynamite plots are deprecated (DO NOT USE)


Exercise

• Read and discuss the documentation of why dynamite plots are not good:
  http://biostat.mc.vanderbilt.edu/wiki/Main/DynamitePlots


Dot plots are appreciated when n is small

Figure: Dot plot on the measurement scale (y-axis, −3 to 3) for groups A, B and C. Group A (n=3), group B (n=3, one replicate), group C (n=4)


Box plots are appreciated when n is large

Figure: Box plots on the measurement scale (y-axis, −4 to 4) for groups A, B and C. Group A (n=300), group B (n=400), group C (n=400)


Making boxplots with ggplot2

    library(ggplot2)
    bp <- ggplot(Diabetes, aes(location,chol))
    bp <- bp + geom_boxplot(aes(fill=location))
    print(bp)

Find the ggplot2 cheat sheet via the Help menu in RStudio


Making boxplots with ggplot2

Figure: Box plots of chol (y-axis, 100 to 400) by location (x-axis: Buckingham, Louisa); fill colour legend: location.


Making boxplots with ggplot2

    bp + facet_grid(.~gender)

Figure: Box plots of chol (y-axis, 100 to 400) by location (Buckingham, Louisa), in separate panels for female and male; fill colour legend: location.


Making dotplots with ggplot2

    dp <- ggplot(mice,aes(x=Dose,fill=Dose,y=BodyWeight))
    dp <- dp + geom_dotplot(binaxis="y")
    print(dp)

Figure: Dot plot of BodyWeight (y-axis, 10 to 13) by Dose (x-axis: 0, 1, 2.5, 7.5, 50); fill colour legend: Dose.


R excursion: exporting a figure

Write figure to pdf (vector graphics, also eps [5], infinite zoom)

    ggsave(filename="dotplot-mice-bodyweight.pdf", plot=dp)
    # or
    pdf("figures/dotplot-mice-bodyweight.pdf")
    print(dp)   # explicit print so the plot is also drawn inside scripts and functions
    dev.off()

Write figure to jpg (image file, also tiff, gif etc.)

    jpeg("figures/dotplot-mice-bodyweight.jpg")
    print(dp)
    dev.off()

[5] postscript("figures/dotplot-mice-bodyweight.eps")


Quantifying variability

A sample of data $X_1, \dots, X_N$ has a standard deviation (SD); it is defined by

$$\mathrm{SD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(X_i-\bar{X}\right)^2}; \qquad \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

SD measures the variability of the measurements in the sample.

The variance of the sample is defined as $\mathrm{SD}^2$. The term 'standard deviation' relates to the normal distribution.
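To connect the formula with R's built-in functions, here is a small sketch that evaluates the definition step by step and compares the result with `sd()` and `var()`. The values in `x` are hypothetical and serve only as an illustration.

```r
# Hypothetical sample, invented for illustration only
x <- c(3.2, 4.9, 2.9, 4.1, 3.6)

N    <- length(x)
xbar <- sum(x) / N                 # sample mean
ss   <- sum((x - xbar)^2)          # sum of squared deviations from the mean
sd_manual  <- sqrt(ss / (N - 1))   # standard deviation by definition
var_manual <- sd_manual^2          # variance is the squared SD

# The built-in functions use the same N - 1 denominator
print(c(manual = sd_manual, builtin = sd(x)))
print(c(manual = var_manual, builtin = var(x)))
```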


Normal distribution


What is so special about the normal distribution?

• It is symmetric around the mean, thus the mean is equal to the median.
• The mean is the most likely value. Mean and standard deviation describe the full distribution.
• The distribution of measurements, like height, distance, volume, is often normal.
• The distribution of statistics, like mean, proportion, mean difference, etc., is very often approximately normal.


Quantifying statistical uncertainty

For statistical inference and conclusion making, via p-values and confidence intervals, it is crucial to quantify the variability of the statistic (mean, proportion, mean difference, risk ratio, etc.):

The standard error is the standard deviation of the statistic.

The standard error is a measure of the statistical uncertainty.


Illustration

Population:

Mean = 3.81

Mean = 4.01

Mean = 2.13


Quantifying statistical uncertainty

Example: We want to estimate the unknown mean uterus weight for untreated mice. The standard error of the mean is defined as

SE = SD/√N where N is the sample size.

Based on N = 4 values, 0.012, 0.0088, 0.0069, 0.009:

• mean: β̂ = 0.0091
• standard deviation: SD = 0.002108
• empirical variance: var = 0.0000044
• standard error: SE = 0.002108/2 = 0.001054
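The numbers in this example can be reproduced in a few lines of base R. The sketch below uses only the four values printed on the slide; the variable names are chosen here for illustration. Note that the mean of the four printed values evaluates to 0.009175, so the 0.0091 on the slide appears to be a shortened version of that value.

```r
# The four uterus weights (g) of untreated mice given in the example
uterus <- c(0.012, 0.0088, 0.0069, 0.009)

N        <- length(uterus)
beta_hat <- mean(uterus)      # estimate of the unknown mean
SD       <- sd(uterus)        # variability of the data
SE       <- SD / sqrt(N)      # uncertainty of the estimated mean

print(signif(c(mean = beta_hat, SD = SD, var = SD^2, SE = SE), 4))
```

With N = 4 the standard error is exactly half the standard deviation, because the square root of 4 is 2.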


The standard error is the standard deviation of the mean

Figure: Mean uterus weight (g), axis from 0.000 to 0.015, in our study and in hypothetical studies 1, ..., 47, ..., 100, shown against the unknown true average uterus weight.

The (hypothetical) mean values are approximately normally distributed, even if the data are not normally distributed!
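The idea of many hypothetical studies can be imitated by simulation. The following sketch is an illustration under assumptions that are not part of the lecture: the data are drawn from a clearly skewed exponential distribution with mean 1 and SD 1, each hypothetical study has N = 30 observations, and 10000 studies are simulated. The standard deviation of the simulated means should then be close to SD/√N.

```r
set.seed(1)

n_studies <- 10000   # number of hypothetical studies (assumption)
N         <- 30      # sample size per study (assumption)

# Each study: draw N skewed observations and keep only the mean
means <- replicate(n_studies, mean(rexp(N, rate = 1)))

se_theory    <- 1 / sqrt(N)   # SD of the data divided by sqrt(N)
se_empirical <- sd(means)     # standard deviation of the simulated means

print(round(c(theory = se_theory, empirical = se_empirical), 3))

# A histogram of the means looks close to a normal curve:
# hist(means, breaks = 50)
```

For very small samples such as N = 4 the normal approximation of the mean is rougher, so the sample size in the sketch is deliberately larger than in the uterus weight example.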


Variance vs statistical uncertainty

"The terms standard error and standard deviation are often confused. The contrast between these two terms reflects the important distinction between data description and inference, one that all researchers should appreciate." [6]

Rules:
• The higher the unexplained variability of the data, the higher the statistical uncertainty.
• The higher the sample size, the lower the statistical uncertainty.

[6] Altman & Bland, Statistics Notes, BMJ, 2005; Nagele P, Br J Anaesthesiol 2003;90: 514-6


Confidence intervals


Constructing confidence limits

A 95% confidence interval for the parameter β is

[β̂ − 1.96 * SE ; β̂ + 1.96 * SE]

Example: a confidence interval for the mean uterus weight of untreated mice is given by

95% CI = [0.0091 − 1.96 * 0.001054; 0.0091 + 1.96 * 0.001054] = [0.007; 0.011].

The standard error SE measures the variability of the mean β̂ around the (unknown) population value β, under the assumption that the model is correctly specified.


The idea of a 95% confidence interval

Figure: Confidence intervals for the mean uterus weight (g), axis from 0.000 to 0.015, from our study and from hypothetical studies 1, ..., 47, ..., 100, shown against the unknown true average uterus weight.

By construction, we expect about 5 of the 100 confidence intervals not to cover (include) the true value.
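The coverage property can be checked by simulation. The sketch below is an illustration under assumed settings that do not come from the lecture: normally distributed data with a true mean of 0.009 and SD of 0.002, N = 30 observations per hypothetical study, 2000 studies, and intervals based on the t quantile. The proportion of intervals that include the true mean should be close to 95%.

```r
set.seed(2)

n_studies <- 2000     # number of hypothetical studies (assumption)
N         <- 30       # sample size per study (assumption)
true_mean <- 0.009    # assumed true mean
true_sd   <- 0.002    # assumed true standard deviation

covered <- replicate(n_studies, {
  x    <- rnorm(N, mean = true_mean, sd = true_sd)
  se   <- sd(x) / sqrt(N)
  half <- qt(0.975, df = N - 1) * se     # half width of the 95% interval
  (mean(x) - half <= true_mean) && (true_mean <= mean(x) + half)
})

coverage <- mean(covered)   # proportion of intervals covering the true mean
print(coverage)
```

In any single run the number of intervals that miss the true value varies around 5%; it is an expectation, not an upper bound.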


Confidence limits for the mean uterus weights (long code)

    library(Publish)
    cidat <- mice[,{mean=mean(UterusWeight)
                    se=sqrt(var(UterusWeight)/.N)
                    list(mean=mean,
                         lower=mean-se*qnorm(1 - 0.05/2),
                         upper=mean+se*qnorm(1 - 0.05/2))},
                  by=Dose]
    publish(cidat,digits=1)

    Dose  mean lower upper
     0.0 0.009 0.007  0.01
     1.0 0.025 0.020  0.03
     2.5 0.051 0.046  0.06
     7.5 0.089 0.079  0.10
    50.0 0.087 0.066  0.11


Confidence limits for the mean uterus weights (short code)

    library(Publish)
    cidat <- mice[,ci.mean(UterusWeight),by=Dose]
    publish(cidat,digits=1)

    Dose  mean    se lower upper level statistic
     0.0 0.009 0.001 0.006  0.01  0.05 arithmetic
     1.0 0.025 0.002 0.018  0.03  0.05 arithmetic
     2.5 0.051 0.002 0.044  0.06  0.05 arithmetic
     7.5 0.089 0.005 0.072  0.11  0.05 arithmetic
    50.0 0.087 0.011 0.053  0.12  0.05 arithmetic
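The limits printed by the short code are wider than those from the long code (for dose 0 the lower limit is 0.006 instead of 0.007). A plausible explanation, stated here as an assumption rather than as documented behaviour of `ci.mean`, is that the short code uses a quantile of the t distribution with N − 1 degrees of freedom, whereas the long code uses the normal quantile from `qnorm`. The sketch below compares both choices for the four untreated mice from the earlier example.

```r
# The four uterus weights (g) of untreated mice from the earlier example
uterus <- c(0.012, 0.0088, 0.0069, 0.009)

N  <- length(uterus)
m  <- mean(uterus)
se <- sd(uterus) / sqrt(N)

z_quantile <- qnorm(0.975)            # normal quantile, about 1.96
t_quantile <- qt(0.975, df = N - 1)   # t quantile, larger for small N

ci_normal <- m + c(-1, 1) * z_quantile * se
ci_t      <- m + c(-1, 1) * t_quantile * se

print(round(rbind(normal = ci_normal, t = ci_t), 4))
```

With only 4 observations the t quantile is considerably larger than 1.96, so the t-based interval is the more cautious of the two; for large samples the difference becomes negligible.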


Confidence limits for the geometric mean uterus weights (short code)

    library(Publish)
    gcidat <- mice[,ci.mean(UterusWeight,statistic="geometric"),by=Dose]
    publish(gcidat,digits=1)

    Dose geomean  se lower upper level statistic
     0.0   0.009 1.1 0.006  0.01  0.05 geometric
     1.0   0.024 1.1 0.018  0.03  0.05 geometric
     2.5   0.051 1.0 0.044  0.06  0.05 geometric
     7.5   0.089 1.1 0.073  0.11  0.05 geometric
    50.0   0.085 1.1 0.057  0.13  0.05 geometric
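A geometric mean is the back-transformed mean of the log-values, and a common way to obtain its confidence limits is to compute the interval on the log scale and then exponentiate. The sketch below does this for the four untreated mice from the earlier example. That `ci.mean` works exactly this way is an assumption; the sketch only shows that this construction is consistent with the first row of the table above.

```r
# The four uterus weights (g) of untreated mice from the earlier example
uterus <- c(0.012, 0.0088, 0.0069, 0.009)

N      <- length(uterus)
logx   <- log(uterus)                 # work on the log scale
m_log  <- mean(logx)
se_log <- sd(logx) / sqrt(N)
tq     <- qt(0.975, df = N - 1)       # t quantile (assumption, see lead-in)

geomean  <- exp(m_log)                          # geometric mean
ci       <- exp(m_log + c(-1, 1) * tq * se_log) # back-transformed limits
ratio_se <- exp(se_log)                         # standard error as a multiplicative factor

print(round(c(geomean = geomean, lower = ci[1], upper = ci[2]), 4))
print(round(ratio_se, 1))
```

Because the limits are back-transformed, the interval is not symmetric around the geometric mean, and the standard error is read as a ratio rather than as an additive amount.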


ggplot2: Plot of means with confidence intervals

    library(ggplot2)
    ## cidat has the columns Dose, mean, lower and upper
    pom <- ggplot(cidat) +
        geom_pointrange(aes(x=Dose,y=mean,ymin=lower,ymax=upper),color=4)
    pom + coord_flip() + ylab("Uterus weight (g)") + xlab("Dose")


Plot of means with confidence intervals

Figure: Mean uterus weight (g) with confidence interval (horizontal axis, ticks at 0.03, 0.06, 0.09) for each Dose (vertical axis: 0, 1, 2.5, 7.5, 50).


Publish: Plot of means with confidence intervals (code)

    library(Publish)
    u <- plotConfidence(x=cidat$mean,
                        lower=cidat$lower,
                        upper=cidat$upper,
                        labels=cidat$Dose,
                        title.labels="Hormon dose",
                        title.values=expression(bold(paste("Mean (",CI[95],")"))),
                        cex=1.8,
                        stripes=TRUE,
                        stripes.col=c("gray95","white"),
                        xratio=c(.2,.3),
                        xlim=c(0,.15),
                        xlab="Uterus weight (g)")


Publish: Plot of means with confidence intervals (result)

Figure: Means with 95% confidence intervals plotted on the uterus weight (g) axis from 0.00 to 0.15, with the following label columns:

    Hormon dose   Mean (CI95)
    0             0.01 (0.01−0.01)
    1             0.02 (0.02−0.03)
    2.5           0.05 (0.04−0.06)
    7.5           0.09 (0.07−0.11)
    50            0.09 (0.05−0.12)


Parameters

It is generally difficult to interpret a p-value without further quantification of the parameter of interest.

Parameters are interpretable characteristics that have to be estimated based on data.

Examples that we will study during the course:

• Means
• Mean differences
• Probabilities
• Risk ratios, odds ratios, hazard ratios
• Association parameters, regression coefficients


Juonala et al. (part I)

Aims: The objective was to produce reference values and to analyse the associations of age and sex with carotid intima-media thickness (IMT), carotid compliance (CAC), and brachial flow-mediated dilatation (FMD) in young healthy adults.

Methods and results: We measured IMT, CAC, and FMD with ultrasound in 2265 subjects aged 24–39 years. The mean values (mean ± SD) in men and women were 0.592 ± 0.10 vs. 0.572 ± 0.08 mm (P < 0.0001) for IMT, 2.00 ± 0.66 vs. 2.31 ± 0.77 %/10 mmHg (P < 0.0001) for CAC, and 6.95 ± 4.00 vs. 8.83 ± 4.56 % (P < 0.0001) for FMD.

The sex differences in IMT (95% confidence interval = [-0.013; 0.004] mm, P = 0.37) and CAC (95% CI = [-0.01; 0.18] %/10 mmHg, P = 0.09) became non-significant after adjustments with risk factors and carotid diameter.


Confidence intervals

A confidence interval is a range of values which covers the unknown true population parameter with high probability. Roughly, the probability is (100 − α)%, where α is the level of significance in percent (α = 5 for a 95% confidence interval).

For example: −0.013 to 0.004

is a 95% confidence interval for the unknown average difference in IMT between men and women.

Confidence intervals have the advantage over p-values that their absolute value has a direct interpretation. [7]

[7] Confidence intervals rather than P values: estimation rather than hypothesis testing. Statistics with Confidence, Altman et al.


Relation between confidence intervals and p-values

If we estimate the parameter β, e.g.

β = mean(IMT men) − mean(IMT women)

and have computed a 95% confidence interval for this parameter,

[lower95, upper95]

then the null hypothesis

β = 0 "There is no difference"

can be rejected at the 5% significance level if the value 0 is not included in the interval: 0 ∉ [lower95, upper95].
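This rule is easy to express as a small helper. The function `rejects_null` below is a hypothetical name introduced here for illustration; it is not part of any package. It is applied to the two adjusted confidence intervals quoted from Juonala et al., and to one invented interval that excludes 0.

```r
# Hypothetical helper: TRUE if the null value lies outside the interval
rejects_null <- function(lower, upper, null_value = 0) {
  null_value < lower || null_value > upper
}

# Adjusted sex difference in IMT: [-0.013; 0.004], reported P = 0.37
imt <- rejects_null(-0.013, 0.004)

# Adjusted sex difference in CAC: [-0.01; 0.18], reported P = 0.09
cac <- rejects_null(-0.01, 0.18)

# Invented interval that does not include 0, for contrast
invented <- rejects_null(0.2, 0.5)

print(c(IMT = imt, CAC = cac, invented = invented))
```

Both published intervals include 0, which agrees with the reported p-values above 0.05; the interval additionally shows how large a difference is still compatible with the data.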


ANOVA


Example (DGA p.208)

22 cardiac bypass operation patients were randomized to 3 types of ventilation.

Outcome: Red cell folate level (µg/l)

    Group  Ventilation                        N   Mean    SD
    I      50% N2O, 50% O2 in 24 hours        8   316.6   58.7
    II     50% N2O, 50% O2 during operation   9   256.4   37.1
    III    30–50% O2 (no N2O) in 24 hours     5   278.0   33.8


ANOVA

    # R-code
    anova(lm(cell~group,data=RedCellData))


ANOVA table for red cell folate levels

    Source of variation   Degrees of freedom   Sum of squares   Mean squares   F      P
    Between groups        2                    15515.88         7757.9         3.71   0.04
    Within groups         19                   39716.09         2090.3
    Total                 21                   55231.97
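The columns of the table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and P is the upper tail probability of the F distribution. The following sketch recomputes these quantities from the sums of squares and degrees of freedom printed in the table; the variable names are chosen here for illustration.

```r
# Values taken from the ANOVA table for red cell folate levels
ss_between <- 15515.88; df_between <- 2
ss_within  <- 39716.09; df_within  <- 19

ms_between <- ss_between / df_between   # mean square between groups
ms_within  <- ss_within / df_within     # mean square within groups
F_stat     <- ms_between / ms_within    # F statistic

# P-value: probability of an F value at least this large under H0
p_value <- pf(F_stat, df_between, df_within, lower.tail = FALSE)

print(round(c(MS_between = ms_between, MS_within = ms_within,
              F = F_stat, P = p_value), 3))
```

The total row is the sum of the two other rows, both for the degrees of freedom and for the sums of squares.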


What are sum of squares and degrees of freedom?

Recall the definition of the variance for a sample of N values $X_1, \dots, X_N$ with mean $\bar{X}$:

$$\mathrm{Var} = \frac{1}{N-1}\left\{(X_1-\bar{X})^2 + \cdots + (X_N-\bar{X})^2\right\}$$

$$\mathrm{Var} = \frac{1}{\underbrace{N-1}_{\text{degrees of freedom}}}\ \underbrace{\left\{(X_1-\bar{X})^2 + \cdots + (X_N-\bar{X})^2\right\}}_{\text{sum of squares}}$$

In ANOVA terminology the variance is referred to as a mean square, which is short for: mean squared deviation from the mean.


ANOVA methods

• Independent observations
    – t test for two groups
    – One-way ANOVA for more groups
    – More-way ANOVA for more grouping variables
• Dependent observations:
    – Repeated measures ANOVA
    – Mixed effect models
• Rank statistics (non-parametric ANOVA tests)
    – Nonparametric ANOVA (Kruskal-Wallis test)
• Mixture of discrete and continuous factors:
    – ANCOVA
• Model comparison and model selection . . .


Nice method

Nice methods, but what is the question?


Typical F-test hypotheses

    H0   Null hypothesis          The red cell folate does not depend on the treatment
    H1   Alternative hypothesis   The red cell folate does depend on the treatment

This means

    H0 : Mean group I = Mean group II = Mean group III

    H1 : Mean group I ≠ Mean group II
         or Mean group III ≠ Mean group II
         or Mean group I ≠ Mean group III

Usually we want to know which treatment yields the best response.


F-test statistic

Central idea: The deviation of a subject's response from the grand mean of all responses is attributable to a deviation of that value from its group mean plus the deviation of that group mean from the grand mean.

$$F = \frac{\text{between-group variability}}{\text{within-group variability}} = \frac{\text{Variance of the mean response values between groups}}{\text{Variance of the values within the groups}}$$

If the between-group variability is large relative to the within-group variability, then the grouping factor contributes to the systematic part of the variability of the response values.
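The central idea can be written out as a worked decomposition. The notation below is introduced here for illustration and is not taken from the slides: $X_{gi}$ is the response of subject $i$ in group $g$, $n_g$ the size of group $g$, $\bar{X}_g$ the group mean, $\bar{X}$ the grand mean, $G$ the number of groups and $N$ the total sample size.

```latex
% Deviation of one response from the grand mean, split into two parts
X_{gi} - \bar{X} = (X_{gi} - \bar{X}_g) + (\bar{X}_g - \bar{X})

% Squaring and summing over all subjects (the cross term sums to zero)
\underbrace{\sum_{g=1}^{G}\sum_{i=1}^{n_g} (X_{gi} - \bar{X})^2}_{\text{total sum of squares}}
  = \underbrace{\sum_{g=1}^{G}\sum_{i=1}^{n_g} (X_{gi} - \bar{X}_g)^2}_{\text{within groups}}
  + \underbrace{\sum_{g=1}^{G} n_g\,(\bar{X}_g - \bar{X})^2}_{\text{between groups}}

% Each sum of squares is divided by its degrees of freedom
F = \frac{\mathrm{SS}_{\text{between}} / (G - 1)}{\mathrm{SS}_{\text{within}} / (N - G)}
```

In the red cell folate example, $G = 3$ and $N = 22$, which gives the 2 and 19 degrees of freedom shown in the ANOVA table.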


Conclusions from the ANOVA table

    Source of variation   Degrees of freedom   Sum of squares   Mean squares   F      P
    Between groups        2                    15515.88         7757.9         3.71   0.04
    Within groups         19                   39716.09         2090.3
    Total                 21                   55231.97

Conclusion: The red cell folate depends significantly on the treatment.


Take home messages

• The variation of data can be decomposed into a systematic and a random part.
• The standard deviation quantifies the variability of the data.
• The standard error quantifies the uncertainty of statistical conclusions.
• ANOVA is an old and general statistical technique with many different applications.