Biostatistics and Experimental Design
Gerhard Thallinger
Institute of Computational Biotechnology, Graz University of Technology
http://genome.tugraz.at
based on lecture notes from Hubert Hackl
2017/2018
Outline

Aims of this course
Introduction
Descriptive statistics
Diagnostic tests and method comparison
Probability and theoretical distributions
Parameter estimation and confidence interval
Hypothesis testing
Comparing groups
Correlation and regression
Relation between several variables
Experimental design
Study design and clinical trials
Discussion of medical literature
Aims
At the end of this course, you should be able to ...

- understand statistical results (understand statistics in (bio)medical publications)
- analyze and visualise data by applying appropriate statistical methods
- design experiments for research and clinical studies
- judge statistical results from a critical point of view
- use R, a free software environment for statistical computing and graphics
Introduction
Simpson's paradox

All patients
Drug   Recovery: yes   no   Sum   Recovery rate
new        20          20    40      50%
old        16          24    40      40%

Female
Drug   Recovery: yes   no   Sum   Recovery rate
new        18          12    30      60%
old         7           3    10      70%

Male
Drug   Recovery: yes   no   Sum   Recovery rate
new         2           8    10      20%
old         9          21    30      30%
Example adapted from: Pearl J. Causality: Models, Reasoning, and Inference, Cambridge University Press, 1st edition, 2000; 174ff
Simpson EH. Journal of the Royal Statistical Society, Ser. B, 1951;13:238-241
Simpson’s paradox
Confounding variables
Examples: kidney stone treatment, sex bias, education, ...
Breadth and length of skulls (Pearson 1896)
Pearson K. Phil. Trans. R. Soc. Lond. A, 1896;187:253-318
Car/goat problem (The Monty Hall Paradox)
One of three doors hides a car (all three equally likely) and the other two hide goats. You choose Door A. The host, who knows where the car is, then opens one of the other two doors to reveal a goat, and asks whether you wish to change your choice. Say he opens Door C; should you stick with the original choice, Door A, or switch to Door B?
”Let’s make a deal”
Car/goat problem
Naïve approach
Regardless of the initial situation, there are now only two doors from which I could choose.

p(car is behind A) = p(car is not behind A) = 1/2

⇒ There is no advantage in switching the door.
Bayes theorem

$P(A \mid \text{open } C) = \dfrac{P(\text{open } C \mid A) \times P(A)}{P(\text{open } C)} = \dfrac{\frac{1}{2} \times \frac{1}{3}}{\frac{1}{2}} = \dfrac{1}{3}$

$P(B \mid \text{open } C) = \dfrac{P(\text{open } C \mid B) \times P(B)}{P(\text{open } C)} = \dfrac{1 \times \frac{1}{3}}{\frac{1}{2}} = \dfrac{2}{3}$

⇒ The probability of winning the car is higher if one switches doors.
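The result is easy to check by simulation. A minimal R sketch (not from the original slides; sample size is arbitrary):

# Monte Carlo check of the car/goat problem
set.seed(42)
n    <- 10000
car  <- sample(1:3, n, replace = TRUE)   # door hiding the car
pick <- sample(1:3, n, replace = TRUE)   # contestant's first choice
# staying wins when the first pick was right; switching wins otherwise
mean(pick == car)   # ~1/3: winning probability when staying
mean(pick != car)   # ~2/3: winning probability when switching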
Diagnosis study

1 in 1000 persons suffers from a disease. There is a test which gives wrong results with a probability of 5% (the false-positive rate is 5%).

What is the probability that a person with a positive test result has this disease?

The naïve approach would yield 95%.

Considering the prevalence of the disease, the probability of having the disease when the test is positive is less than 2%.
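A quick R sketch of this calculation (assuming, in addition to the slide, a sensitivity of 100%, since only the 5% error rate is specified):

prev <- 0.001   # 1 in 1000
sp   <- 0.95    # specificity = 1 - false-positive rate
sn   <- 1.00    # sensitivity (assumption, not stated above)
ppv  <- sn * prev / (sn * prev + (1 - sp) * (1 - prev))
ppv             # ~0.0196, i.e. less than 2%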
Biostatistics

Biostatistics
Application of statistics in biology and medicine and related research. Guidelines to conduct and interpret medical studies. Helps to objectify the evaluation of medical data.

Descriptive statistics
The aim is to describe data by characteristic values and to visualize them with graphical procedures in a short and concise way. Data are presented without a measure of significance.

Inferential statistics
Used to draw inferences about a population from a sample: hypothesis testing, quantifying the uncertainty of decisions, parameter estimation.
Key concepts
Population
Collection of all objects, events or individuals (people) about whom you would like to ask a research question.

Sample
To study a population, the researcher typically selects a small group, called a sample, from the population. The sample size is the number of individuals in the sample (not the number of measurements you make on each person!). The sample should be representative and random.

Random sample
Sample chosen from a population in a fashion that ensures every object, event, item or individual has an equal chance of being drawn. The selection of any one entity can in no way influence or affect the selection of any other (independent).

Individuals
Objects, events, persons, individuals (observation unit)
What statistical calculations can do
Statistical estimation
An example is to calculate the mean of a sample. This is only an estimate of the population mean and is called a point estimate. You also want to know how good this estimate is and want to give a range of values (confidence interval).

Statistical hypothesis testing
Statistical hypothesis testing helps to decide whether an observed difference is likely to be caused by chance, and provides a measure called the p-value.

Statistical modeling
Statistical modeling tests how well experimental data fit a mathematical model constructed from e.g. physical principles. An example for this is linear regression.
Examples

Sample size and population
Aristotle maintained that women have fewer teeth than men; although he was twice married, it never occurred to him to verify this statement by examining his wives' mouths.
Russell B, The Impact of Science on Society, Simon and Schuster, New York, 1953; p 7
Test whether a drug is effective in treating patients with HIV

- The population you really care about is more diverse than the population from which your data were sampled
- Collection of data from a "convenience sample" rather than a random sample
- The measured variable (CD4 lymphocytes) is a proxy for another variable you really care about (survival time)
- Measurements may be made or recorded incorrectly (quality of antibody!)
- Combination of different kinds of measurements to reach an overall conclusion
Applications in Medicine
- Epidemiology
- Biometry
- In vitro and animal experiments
- Clinical trials (Phase I to IV)
- Approval for drugs and medical devices
- Evaluation of new measurement and diagnostic techniques
- Meta analysis
- Evidence based medicine
Research projects
PLANNING → DESIGN → EXECUTION (data collection) → DATA PROCESSING → DATA ANALYSIS → PRESENTATION → INTERPRETATION → PUBLICATION
Classification of statistical methods
Univariate methods
Each variable is considered individually

Bivariate methods
The relation between 2 variables is studied

Multivariate methods
The relation between more than 2 variables is studied
Descriptive statistics
Measurement
Observation unit
The unit upon which measurements are made.
Blood samples, animals, test persons, patients ...

Variable
Observable or measurable properties of the observation unit which can take different values. Should address the question and follow objectivity, reliability, and validity.
Diagnosis, tumor stage, cholesterol levels ...

Value
A realized measurement; feature characteristic.
Type of surgery, 3 mol/ml, female ...
Types of data
Categorical data (qualitative)
Nominal data (sex: male, female; blood group: 0, A, B, AB)
Ordinal data (cancer stage I, II, III, IV)

Numerical data (quantitative)
Discrete data (number of children 0, 1, 2, 3, 4, 5+)
Continuous data (blood pressure; height in cm)

Other types of data
Ranks, percentages, rates and ratios, scores, visual analogue scale, censored data

Note: It is important to know the data type since representation and analysis depend on it.
Types of scales
Nominal scale
Equal or not equal (a = b, a ≠ b)

Ordinal scale
Ranking is possible (a < b, a = b, a > b)

Interval scale
Not only rank but also difference of values (c = a − b);
0 is taken arbitrarily (e.g. 2007 AD, temperature scale, diopter)

Ratio scale
Not only differences but also ratios (c = a/b);
0 is represented naturally in empirical data (e.g. age of a person, absolute zero)
Frequencies

Absolute frequency
The number k of observations bearing the same value or falling within a given class, out of n total observations:

$f_{abs} = k$

Relative frequency
Estimate of the probability of a single event for discrete data:

$f_{rel} = \frac{k}{n}, \quad 0 \le f_{rel} \le 1$

Relative frequency in percent:

$f_{rel\%} = f_{rel} \times 100\%$
Presentation of categorical (discrete) data
Frequency table
Blood group distribution of 2060 individuals from Croatia:

Blood group   frequency   relative frequency   relative frequency %
0                702          0.341                 34.1%
A                862          0.418                 41.8%
B                365          0.177                 17.7%
AB               131          0.064                  6.4%
Total           2060          1.000                100.0%
Mourant AE, et al. The Distribution of the Human Blood Groups and Other Polymorphisms, Oxford University Press, 1976; pp. 909
Together with relative frequencies, the sample size should be given.

1 man and 6 women are 14.286% and 85.714%
⇒ if the sample size is small, use absolute and avoid relative frequencies
⇒ percentages with many decimal places pretend a large sample size
Presentation of discrete data
In bar charts, bars should always start from 0.
Prefer bar charts to pie charts since the eye is good at judging linearmeasures and bad at judging relative areas.
3-dimensional pie charts show misleading proportions due to thechange of perspective.
Presentation of data over time
Consider relation between x- and y-scale.
Diagrams should start from 0.
Presentation of continuous data
A simple graphical way of depicting a complete set of observations is by means of the histogram, in which the number (or frequency) of observations is plotted for different values or groups of values.
Example
Serum cholesterol levels (mmol/l) of a sample of 86 stroke patients:

3.7 3.8 3.8 4.4 4.5 4.5 4.5 4.7 4.7 4.8 4.8 4.9 4.9
4.9 5.0 5.1 5.1 5.2 5.3 5.3 5.4 5.4 5.5 5.5 5.5 5.6
5.6 5.6 5.6 5.6 5.6 5.6 5.7 5.7 5.7 5.8 5.8 5.9 6.0
6.1 6.1 6.1 6.1 6.2 6.3 6.3 6.4 6.4 6.4 6.4 6.4 6.5
6.5 6.6 6.7 6.7 6.8 6.8 7.0 7.0 7.0 7.0 7.1 7.1 7.2
7.3 7.4 7.4 7.5 7.5 7.6 7.6 7.6 7.7 7.8 7.8 7.8 8.2
8.3 8.6 8.7 8.9 9.3 9.5 10.2 10.4
Markus HS, et al. Stroke, 1995;26(8):1329-1333
Histogram
Partition into classes

The following aspects should be considered:

- The partition comprises all values
- Values have to be assigned to the classes unequivocally
- The class width should be the same for all classes
- The mid-point of a class represents all values within the class
- The smaller the number of classes, the greater the class width and the greater the loss of information
- The higher the number of classes, the more of the uninteresting random effects become apparent

Empirical formulae:

$k \approx \sqrt{n}, \quad k \approx 5 \times \log_{10}(n)$

where k is the number of classes and n the number of values.
Histogram
Partition into classes (Example)

Range: min = 3.7, max = 10.4
Span width: max − min = 10.4 − 3.7 = 6.7
$k \approx \sqrt{86} = 9.27$

Class width = 1.0 and k = 8 ⇒

Interval      Tally                            Frequency   Relative frequency
3.00-3.99     ///                                   3           3.5%
4.00-4.99     ///// ///// /                        11          12.8%
5.00-5.99     ///// ///// ///// ///// ////         24          27.9%
6.00-6.99     ///// ///// ///// /////              20          23.3%
7.00-7.99     ///// ///// ///// ////               19          22.1%
8.00-8.99     /////                                 5           5.8%
9.00-9.99     //                                    2           2.3%
10.00-10.99   //                                    2           2.3%
Total                                              86         100.0%
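The same partition can be reproduced in R with hist(); the vector below holds the 86 cholesterol values listed above:

chol <- c(3.7, 3.8, 3.8, 4.4, 4.5, 4.5, 4.5, 4.7, 4.7, 4.8, 4.8, 4.9, 4.9,
          4.9, 5.0, 5.1, 5.1, 5.2, 5.3, 5.3, 5.4, 5.4, 5.5, 5.5, 5.5, 5.6,
          5.6, 5.6, 5.6, 5.6, 5.6, 5.6, 5.7, 5.7, 5.7, 5.8, 5.8, 5.9, 6.0,
          6.1, 6.1, 6.1, 6.1, 6.2, 6.3, 6.3, 6.4, 6.4, 6.4, 6.4, 6.4, 6.5,
          6.5, 6.6, 6.7, 6.7, 6.8, 6.8, 7.0, 7.0, 7.0, 7.0, 7.1, 7.1, 7.2,
          7.3, 7.4, 7.4, 7.5, 7.5, 7.6, 7.6, 7.6, 7.7, 7.8, 7.8, 7.8, 8.2,
          8.3, 8.6, 8.7, 8.9, 9.3, 9.5, 10.2, 10.4)
sqrt(length(chol))        # ~9.3 classes by the square-root rule
5 * log10(length(chol))   # ~9.7 classes by the log rule
# class width 1.0 starting at 3.0 reproduces the tally table above
hist(chol, breaks = seq(3, 11, by = 1), right = FALSE,
     xlab = "Serum cholesterol (mmol/l)", main = "Stroke patients")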
Histogram
Histograms of cholesterol levels from stroke patients
Histogram
Histograms with different number of classes
Histograms have to be area-accurate: when f_rel or f_abs is plotted, the class width has to be constant. In cases of different class widths, the frequency density (f_rel/width_i) should be plotted.
Frequency density histogram
Age group   Relative frequency (%)   Frequency density (% per year)
0-4             25.3                     5.06
5-14            18.9                     1.89
15-44           30.3                     1.01
45-64           13.6                     0.68
65+             11.7                     0.33
Frequency polygon
Frequency polygons are useful for comparisons.
Cumulative frequency histogram and empirical cumulative distribution function
Measures of central tendency
Arithmetic mean

$\bar{x} = \frac{1}{n}(x_1 + x_2 + \ldots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i$

where n is the number of observations and $x_1, x_2, \ldots, x_n$ is the sample (observations).

Median
For ranked data $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$ the median $\tilde{x}$ is:

odd n:  $\tilde{x} = x_{((n+1)/2)}$
even n: $\tilde{x} = \frac{1}{2}(x_{(n/2)} + x_{(n/2+1)})$

Mode
The mode $x_{mod}$ is the most frequent observation.

It is the only measure for nominal data. For continuous data it is represented by the center of the class with the most frequent observations within the histogram, and it can be used for bimodal data.
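A minimal R sketch of the three measures (sample values made up; base R has no mode function for data, so it is derived from a frequency table):

x <- c(2, 3, 3, 5, 7, 7, 7, 10)
mean(x)                                  # arithmetic mean
median(x)                                # median
as.numeric(names(which.max(table(x))))   # mode = most frequent value (7)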
Measures of variability
Rank, rank list
The sample $x_1, x_2, \ldots, x_n$ sorted by the size of the values is $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ and called the rank list, where the indices (1), ..., (n) are the ranks $R(x_i)$ of the values.

Range
Span width (range): $r = x_{max} - x_{min} = x_{(n)} - x_{(1)}$

Percentiles
The p% percentile ($Q_p$) means that p% of the values are smaller than or equal to the p% percentile.

$Q_p = \begin{cases} x_{(k)} & n \times p \text{ is not an integer } (k = \mathrm{int}(n \times p) + 1) \\ \frac{1}{2}(x_{(k)} + x_{(k+1)}) & n \times p \text{ is an integer } (k = n \times p) \end{cases}$
Measures of variability

Quartiles
1st quartile = Q1 = Q25
2nd quartile = Q2 = Q50 = median
3rd quartile = Q3 = Q75

Interquartile range
IQR = Q3 − Q1 = Q75 − Q25

Outlier detection
$x_i \ge Q_{75} + 1.5 \times IQR$ or $x_i \le Q_{25} - 1.5 \times IQR$ ... mild outlier
$x_i \ge Q_{75} + 3.0 \times IQR$ or $x_i \le Q_{25} - 3.0 \times IQR$ ... extreme outlier

This approach could be misleading for a small number of observations. There are also other methods for outlier detection and for determination of quartiles, e.g.:

$Q_p = (1-j) \times x_{(k+1)} + j \times x_{(k+2)}, \quad k = \mathrm{int}((n-1) \times p), \; j = (n-1) \times p - k$
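In R, quantile() implements several such definitions; its default (type = 7) corresponds to the alternative formula above. A short sketch with made-up values:

x <- c(3.7, 4.2, 4.8, 5.1, 5.6, 6.0, 6.4, 7.1, 8.3, 10.4)
quantile(x, probs = c(0.25, 0.50, 0.75))  # quartiles (default type = 7)
IQR(x)                                    # Q75 - Q25
q <- quantile(x, c(0.25, 0.75))
fence_lo <- q[1] - 1.5 * IQR(x)           # lower fence for mild outliers
fence_hi <- q[2] + 1.5 * IQR(x)           # upper fence for mild outliers
x[x < fence_lo | x > fence_hi]            # flagged observations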
Box-and-whiskers plot
Measures of variability
Variance (2nd moment)

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Standard deviation

$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

where n is the number of observations and n − 1 corresponds to the degrees of freedom.

Coefficient of variation

$CV = s/|\bar{x}|$ or $CV = s/|\bar{x}| \times 100\%$

provides a standardized measure for the variability (CV < 10% represents low and CV > 25% high variability).
Measures of variability

Standard error of the mean

$SE(\bar{x}) = s/\sqrt{n}$

describes not the data, but the accuracy of the estimation.

SE is sometimes misleadingly used.
Measures of shape

Skewness (3rd moment)

$g_1 = \frac{m_3}{\sqrt{m_2^3}} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left(\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)^3}$

$g_1 = 0$ means the distribution is symmetrical, $g_1 > 0$ right skewed, and $g_1 < 0$ left skewed; $m_i$ is the i-th central moment.

Kurtosis (4th moment)

$g_2 = \frac{m_4}{m_2^2} - 3 = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^4}{\left(\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^2} - 3$

For the normal distribution $g_2 = 0$. If $g_2 > 0$ ($g_2 < 0$), more (fewer) values lie within the center of the distribution than for the normal distribution.
q-q plot

Comparison of sample quantiles with quantiles of a normal distribution.

Normally distributed observations should follow a straight line.
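A minimal R sketch of the q-q plot (simulated data), together with the Shapiro-Wilk test mentioned in the next section:

set.seed(1)
x <- rnorm(50, mean = 35, sd = 5)   # assumed normal sample
qqnorm(x)                           # sample vs. theoretical quantiles
qqline(x)                           # reference line
shapiro.test(x)                     # formal test for normality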
Transformations

Motivation
Most (parametric) statistical methods for analyzing continuous data assume a normal distribution. To test for a normal distribution, the Shapiro-Wilk test and the q-q plot can be used. Another important assumption is that different groups of observations have the same standard deviations (or CV). Transformations also reduce the influence of outlying values.

Transformations
Log (the most common transformation)
Square root
Reciprocal
Box-Cox (find the best transformation)
Rank
Log transformations

⇒ asymmetric confidence interval: $CI = b^{\overline{\log x} \pm t \times s_{\log x}/\sqrt{n}}$
Shapiro CM, et al., Am J Med Sci, 1987;293(6):365-370
[Figure: histograms of T4 counts (cells/mm³) for Hodgkin's and non-Hodgkin's disease patients, on the original and the log-transformed scale; the log transformation makes the skewed distributions approximately symmetric.]
Box-Cox-transformations
Define a function to find the best transformation:

$x' = \begin{cases} \dfrac{x^\lambda - 1}{\lambda} & \text{for } \lambda \ne 0 \\ \log(x) & \text{for } \lambda = 0 \end{cases}$

For the logarithmic transformation $\lambda = 0$, square root $\lambda = \frac{1}{2}$, cubic root $\lambda = \frac{1}{3}$, and reciprocal $\lambda = -1$.
Optimal λ can be calculated from the likelihood function L(λ).
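In R the profile likelihood L(λ) can be obtained with boxcox() from the MASS package (a sketch with simulated right-skewed data):

library(MASS)
set.seed(1)
x  <- rlnorm(100, meanlog = 1, sdlog = 0.5)        # right-skewed data
bc <- boxcox(lm(x ~ 1), lambda = seq(-2, 2, 0.1))  # plots L(lambda)
bc$x[which.max(bc$y)]   # lambda maximizing the likelihood, here close to 0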
Standardization
Standardization
For the analysis of multivariate data a standardization is often wanted. That is a normalization where the mean gets 0 and the standard deviation gets 1.

$x'_i = \frac{x_i - \bar{x}}{s}$

$x'_i$ is also called the z-score. The data are centered and the area under the normal distribution gets 1. This is helpful for comparisons.

Ranging

$x'_i = \frac{x_i - x_{min}}{x_{max} - x_{min}}$

E.g. for construction of diagrams and figures.
Bivariate descriptive methods

Contingency table
nominal versus nominal (ordinal) scaled variable

          Light   Regular   Dark   Total
Male        20      40       50     110
Female      50      20       20      90
Total       70      60       70     200
Barplots
Bivariate descriptive methods
Boxplots
nominal versus metric scaled variable
Diagnostic tests and method comparison
Diagnostic tests

Sensitivity (SN) = TP / (TP + FN) = TPR (recall)

Specificity (SP) = TN / (FP + TN) = 1 − FPR

Positive predictive value (PPV) = TP / (FP + TP) (precision)

Negative predictive value (NPV) = TN / (TN + FN)

Prevalence (observed in this study) = (TP + FN) / n;  Accuracy = (TP + TN) / n
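A small R sketch computing these measures from hypothetical TP/FP/TN/FN counts:

tp <- 90; fn <- 10; fp <- 45; tn <- 855   # made-up counts
n    <- tp + fp + tn + fn
sn   <- tp / (tp + fn)      # sensitivity = TPR = recall
sp   <- tn / (fp + tn)      # specificity = 1 - FPR
ppv  <- tp / (fp + tp)      # positive predictive value (precision)
npv  <- tn / (tn + fn)      # negative predictive value
c(SN = sn, SP = sp, PPV = ppv, NPV = npv,
  ACC = (tp + tn) / n, PREV = (tp + fn) / n)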
Diagnostic tests
Consider the predictive ability of the test for the general population or groups with different prevalence of disease (T .. test result, D .. disease state):

$P(D^+ \mid T^+) = \frac{P(T^+ \mid D^+) \times P(D^+)}{P(T^+)} = \frac{P(T^+ \mid D^+) \times P(D^+)}{P(T^+ \mid D^+) \times P(D^+) + P(T^+ \mid D^-) \times P(D^-)}$

$P(D^+)$ = Prevalence (PREV)
$P(D^+ \mid T^+)$ = PPV
$P(T^+ \mid D^+)$ = SN
$P(T^+ \mid D^-)$ = 1 − SP

$PPV = \frac{SN \times PREV}{SN \times PREV + (1-SP) \times (1-PREV)}$

$NPV = \frac{SP \times (1-PREV)}{(1-SN) \times PREV + SP \times (1-PREV)}$
Diagnostic tests
Likelihood ratio:

$LR^+ = \frac{P(T^+ \mid D^+)}{P(T^+ \mid D^-)} = \frac{SN}{1-SP}$

$LR^- = \frac{P(T^- \mid D^+)}{P(T^- \mid D^-)} = \frac{1-SN}{SP}$

post-test odds = pre-test odds × LR

$\frac{PPV}{1-PPV} = \frac{PREV}{1-PREV} \times \frac{SN}{1-SP} \quad \ldots \quad PPV = \frac{SN \times PREV}{SN \times PREV + (1-SP) \times (1-PREV)}$
Receiver operating characteristics (ROC) curve
Example
Thompson IM, et al. Prevalence of Prostate Cancer among Men with a Prostate-Specific Antigen Level ≤4.0 ng per Milliliter. N Engl J Med, 2004;350(22).
Table 2. Relationship of the Prostate-Specific Antigen (PSA) Level to the Prevalence of Prostate Cancer and High-Grade Disease.

PSA Level       No. of Men   Men with Prostate Cancer   Men with High-Grade Prostate     Sensitivity   Specificity
                (N=2950)     (N=449), no. (%)           Cancer (N=67), no./total no. (%)
≤0.5 ng/ml         486          32 (6.6)                   4/32 (12.5)                      1.0           0.0
0.6–1.0 ng/ml      791          80 (10.1)                  8/80 (10.0)                      0.93          0.02
1.1–2.0 ng/ml      998         170 (17.0)                 20/170 (11.8)                     0.75          0.33
2.1–3.0 ng/ml      482         115 (23.9)                 22/115 (19.1)                     0.37          0.73
3.1–4.0 ng/ml      193          52 (26.9)                 13/52 (25.0)                      0.12          0.92
Example
Discussion

Screening tests
Testing a healthy population for early signs of a rare serious disease.
High sensitivity and PPV.
Don't want FN and accept a moderate number of FP.

Diagnostic tests
E.g. testing high-risk individuals.
High specificity and NPV.
A false positive diagnosis would have major consequences for the patient (HIV+).

Predictive values are strongly dependent on prevalence.
The choice of the cut-off is not a statistical decision.
The test must be repeatable and should have minimal inter-observer variation.
Method comparison for categorical data

Used to quantify the agreement of categorical assessments by different observers (data from Boyd et al., J Natl Cancer Inst, 1982;68(3):357-363):

                     Observer A
Obs. B     Normal   Benign   Suspect   Cancer   Total
Normal       21       12        0         0       33
Benign        4       17        1         0       22
Suspect       3        9       15         2       29
Cancer        0        0        0         1        1
Total        28       38       16         3       85

Observed agreement of frequencies

$p_o = \frac{1}{n}\sum_{i=1}^{k} f_{ii} = (21 + 17 + 15 + 1)/85 = 0.635 \; (64\%)$

Expected agreement of frequencies (by chance)

$p_e = \frac{1}{n^2}\sum_{i=1}^{k} r_i c_i = (33 \cdot 28 + 22 \cdot 38 + 29 \cdot 16 + 1 \cdot 3)/85^2 = 0.308 \; (31\%)$
Method comparison for categorical data
Measure of agreement: Cohen's κ

$\kappa = \frac{p_o - p_e}{1 - p_e} = \frac{0.635 - 0.308}{1 - 0.308} = 0.47$

Guidelines to interpret κ

Value of κ    Strength of agreement
<0.20         Poor
0.21-0.40     Fair
0.41-0.60     Moderate
0.61-0.80     Good
0.81-1.00     Very good
Altman DG, Practical statistics for medical research, Chapman, London, 1991; pp 404
Adapted from Landis and Koch, Biometrics, 1977;33(1):159-174
Method comparison for categorical data
Cohen's κ does not take into account the degree of disagreement ⇒ weighted κ adds weights to the frequencies in each cell according to their distance:

$w_{ij} = 1 - \frac{|i-j|}{k-1}$ (linear)   $w_{ij} = 1 - \frac{|i-j|^2}{(k-1)^2}$ (quadratic; Fleiss-Cohen)

$p_{o(w)} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij} f_{ij}$   $p_{e(w)} = \frac{1}{n^2}\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij} r_i c_j$

$\kappa_{lw} = \frac{p_{o(w)} - p_{e(w)}}{1 - p_{e(w)}} = \frac{0.866 - 0.691}{1 - 0.691} = 0.57$ (linear weights)

$\kappa_{qw} = \frac{p_{o(w)} - p_{e(w)}}{1 - p_{e(w)}} = \frac{0.947 - 0.841}{1 - 0.841} = 0.67$ (quadratic weights)

Note: Weighted kappa must not be applied to unordered categorical data.
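The κ values above can be reproduced in R directly from the observer table (a sketch using only base R):

tab <- matrix(c(21, 12,  0, 0,
                 4, 17,  1, 0,
                 3,  9, 15, 2,
                 0,  0,  0, 1), nrow = 4, byrow = TRUE)
n  <- sum(tab)
po <- sum(diag(tab)) / n                          # observed agreement, 0.635
pe <- sum(rowSums(tab) * colSums(tab)) / n^2      # chance agreement, 0.308
(po - pe) / (1 - pe)                              # unweighted kappa, ~0.47
k <- nrow(tab)
w <- 1 - abs(outer(1:k, 1:k, "-")) / (k - 1)      # linear weights
pow <- sum(w * tab) / n
pew <- sum(w * outer(rowSums(tab), colSums(tab))) / n^2
(pow - pew) / (1 - pew)                           # linearly weighted kappa, ~0.57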
Example: Kappa statistics for gene grouping
Huang DW, et al., Genome Biology, 2007;8(9):R183
Example: Diagnosis of renal artery stenosis
Vasbinder GB, et al. Ann Intern Med, 2004;141(9):674-682
Objective: To determine the validity of computed tomographic angiography (CTA) and magnetic resonance angiography (MRA) compared with digital subtraction angiography (DSA) for detection of renal artery stenosis.

Results: Twenty percent of patients who underwent all 3 tests had clinically relevant renal artery stenosis. Moderate interobserver agreement was found, with values ranging from 0.59 to 0.64 for CTA and 0.40 to 0.51 for MRA. The combined sensitivity and specificity were 64% (95% CI, 55% to 73%) and 92% (CI, 90% to 95%) for CTA and 62% (CI, 54% to 71%) and 84% (CI, 81% to 87%) for MRA.

Limitations: Eighteen percent of the patients were included nonconsecutively. Digital subtraction angiography may be an imperfect reference test.

Conclusion: Computed tomographic angiography and MRA are not reproducible or sensitive enough to rule out renal artery stenosis in hypertensive patients. Therefore, DSA remains the diagnostic method of choice.
Table 3. Overall Diagnostic Accuracy and Areas under the Receiver-Operating Characteristic Curves for All Observers*

Observer        Sensitivity, %   Specificity, %   PPV, %       NPV, %       AUC
CTA A               69               91             67           92          0.84
CTA B               61               89             59           90          0.76†
CTA C               61               97             83           91          0.84
CTA Combined        64 (55–73)       92 (90–95)     68 (59–77)   91 (88–94)  0.85 (0.79–0.91)
MRA D               67               77             42           90          0.75
MRA E               63               84             50           90          0.76
MRA F               57               90             59           89          0.81
MRA Combined        62 (54–71)       84 (81–87)     49 (40–58)   90 (87–93)  0.83 (0.77–0.89)

* Values in parentheses are 95% CIs. AUC = area under the receiver-operating characteristic curve; CTA = computed tomographic angiography; MRA = magnetic resonance angiography.
† The AUC for CTA observer B is statistically significantly lower than the AUCs for CTA observers A (P = 0.03) and C (P = 0.05).
Method comparison studies
The aim is to see if 2 (or more) methods (devices) agree well enough that they can be interchanged (e.g. quicker or cheaper methods).

The best approach is to analyze the differences between the measurements of the 2 methods on each subject.
Bland JM, Altman DG, Lancet, 1986;1(8476):307-310
Method comparison studies
It is expected that about 95% of the observations lie within the range mean ± 2 SD of the differences.

This range of values defines the 95% limits of agreement.

In case of variable agreement (wider scatter as the average increases) ⇒ log-transform.
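A minimal Bland-Altman sketch in R with simulated paired measurements (all numbers assumed):

set.seed(7)
m1 <- rnorm(50, 100, 10)        # method 1
m2 <- m1 + rnorm(50, 1, 4)      # method 2: small bias plus noise
avg <- (m1 + m2) / 2
d   <- m1 - m2
md  <- mean(d); s <- sd(d)
plot(avg, d, xlab = "Mean of the two methods", ylab = "Difference")
abline(h = c(md, md - 2 * s, md + 2 * s), lty = c(1, 2, 2))
# dashed lines: 95% limits of agreement (mean difference +/- 2 SD)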
Inappropriate use of the correlation coefficient r and significance testing:

1. r measures the strength of a relation between 2 variables, not the agreement between them (perfect correlation if the points lie along any straight line).
2. A change in the scale of measurement does not affect the correlation.
3. Correlation depends on the range of the true quantity in the sample.
4. The test of significance may only show that the two methods are related.
5. Data which seem to be in poor agreement can produce quite high correlations.
Method comparison studies
Repeatability of a method

The repeatability of a method can be assessed by comparing repeated measurements using the method on a series of subjects. The Bland-Altman plot can also be used to assess the repeatability. Since the same method is used for the repeated measurements, the mean difference should be zero. Hence, the coefficient of repeatability (CR) can be defined as:

$CR = 1.96 \times \sqrt{\frac{\sum_{i=1}^{n}(d_{2i} - d_{1i})^2}{n-1}}$

If there are more than 2 measurements per subject ⇒ ANOVA

Measuring agreement using repeated measurements:

Take the difference of the means from each method. The SD has to be corrected (law of error propagation):

$SD_c = \sqrt{SD^2 + \left(\frac{SD_1}{2}\right)^2 + \left(\frac{SD_2}{2}\right)^2}$
Error grid analysis (EGA)
Comparison of blood glucose meters with the gold standard (Beckman analyzer)
Brunner GA, et al., Diabetes Care, 1998;21(4):585-590
Clarke WL, et al., Diabetes Care, 1987;10(5):622-628
Probability and theoretical distributions
Combinatorics
Permutations
For n different elements there are n! permutations.

For example n = 3: ABC, ACB, BAC, BCA, CAB, CBA ⇒ 3! = 6 permutations.

For n objects in k groups, not distinguishable within a group, there are

$\frac{n!}{n_1! \times n_2! \times \ldots \times n_k!}$

permutations.

For example 2 red balls, 3 green balls, and 7 blue balls ⇒

$\frac{12!}{2! \times 3! \times 7!} = 7920$ permutations.
Combinations

Combinations
Selections of k out of n elements (rather than all n, as for permutations) are called combinations.

Binomial coefficient

$\binom{n}{k} = \frac{n!}{k!(n-k)!} = \frac{n \times (n-1) \times (n-2) \times \ldots \times (n-k+1)}{1 \times 2 \times 3 \times \ldots \times k}$

Without repetitions
Order does not matter: $\binom{n}{k}$
Order matters: $k! \times \binom{n}{k}$

With repetitions
Order does not matter: $\binom{n+k-1}{k}$
Order matters: $n^k$
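These quantities are available directly in R (a quick sketch):

factorial(3)                  # 3! = 6 permutations of ABC
factorial(12) / (factorial(2) * factorial(3) * factorial(7))   # 7920
choose(5, 2)                  # unordered, without repetition
choose(5, 2) * factorial(2)   # ordered, without repetition
choose(5 + 2 - 1, 2)          # unordered, with repetition
5^2                           # ordered, with repetition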
Random experiments

All outcomes of the experiment are known in advance.

But it is a priori unknown which will be the outcome of each repetition of the experiment:

- Systematic and random errors
- Complex processes, result of many combined processes

The experiment can be repeated under identical conditions.

Examples are tossing a coin, throwing a die, or the life-time of a bulb.
Sample space and event

Sample space
Collection of possible elementary outcomes from a random experiment.

Throwing a die: Ω = {1, 2, 3, 4, 5, 6}
Life-time of a bulb: Ω = [0, ∞)
Diagnosis: Ω = {diseased, healthy}
Body height: Ω = R+

Event
A set of outcomes of the experiment.

A = {6}, A = {tail}, A = {diseased}, A = {height > 180 cm}
A = Ω ... certain event
A = ∅ ... impossible event

Sigma-field S
A σ-field (σ-algebra) S is a non-empty collection of subsets of Ω that satisfies:
- ∅ ∈ S
- A ∈ S ⇒ $A^c$ ∈ S
- if $A_i$ is a countable sequence of sets in S ⇒ $\bigcup_i A_i \in S$
Probability measure

The pair (Ω, S) is considered as the sample space associated with a statistical experiment.

A set function P defined on S is called a probability measure (or probability) if it satisfies the following conditions:

1. P(A) ≥ 0 for all A ∈ S.
2. P(Ω) = 1.
3. If $A_i \in S$ is a disjoint sequence of sets ($A_j \cap A_k = \emptyset$ for $j \ne k$), then
   $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$

P(A) is called the probability of the event A.

The triple (Ω, S, P) is called a probability space.
Probability
For an experiment with k equally probable outcomes:

$P(A_1) = P(A_2) = \ldots = P(A_k) = \frac{1}{k}, \quad \sum_{i=1}^{k} P(A_i) = 1$

If events are mutually exclusive, then the probability that one of them occurs is the sum of the probabilities of the individual events:

$P(A_1 \cup A_2 \cup \ldots \cup A_k) = P(A_1) + P(A_2) + \ldots + P(A_k) = \sum_{i=1}^{k} P(A_i)$

If events are independent, then the probability of occurrence of all events is the product of the probabilities of the individual events:

$P(A_1 \cap A_2 \cap \ldots \cap A_k) = P(A_1) \times P(A_2) \times \ldots \times P(A_k) = \prod_{i=1}^{k} P(A_i)$
Conditional probability
For 2 arbitrary events A and B:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A^c) = 1 - P(A)$

What is the probability of event A given B?
$P(A \mid B) = P(A \cap B)/P(B)$

What is the probability of event B given A?
$P(B \mid A) = P(A \cap B)/P(A)$
Bayes theorem

$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$

Example 1
Are women promoted less often than men?

Of 200 promotions only 4 women were promoted (2%). For one position 40 women and 3270 men have applied.

$P(P \mid F) = \frac{P(F \mid P) \times P(P)}{P(F)} = \frac{0.02 \times \frac{200}{3310}}{\frac{40}{3310}} = 0.1 = 10\%$

$P(P \mid M) = \frac{P(M \mid P) \times P(P)}{P(M)} = \frac{0.98 \times \frac{200}{3310}}{\frac{3270}{3310}} = 0.0599 \approx 6\%$
Bayes theorem

$P(B) = P(A \cap B) + P(A^c \cap B) = P(B \mid A) \times P(A) + P(B \mid A^c) \times P(A^c)$

$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B \mid A) \times P(A) + P(B \mid A^c) \times P(A^c)} = \frac{P(B \mid A) \times P(A)}{\sum_{i=1}^{n} P(B \mid A_i) \times P(A_i)}$

Example 2
In 1990 a Briton was sentenced to 16 years in prison based on a random DNA match with a probability of 1 in 3 × 10^6 according to experts.

Suppose there are 10000 people in the DNA database. Then the probability that the suspect is innocent given a DNA match (that is what we want to know) can be calculated using the Bayes theorem:

$P(I \mid M) = \frac{P(M \mid I) \times P(I)}{P(M)} = \frac{\frac{1}{3 \times 10^6} \times \frac{9999}{10000}}{\frac{1}{3 \times 10^6} \times \frac{9999}{10000} + 1 \times \frac{1}{10000}} = 0.0033$

$P(M \mid I) = \frac{1}{3\,000\,000}$, whereas $P(I \mid M) \approx \frac{3}{1000}$
Likelihood function
$\underbrace{P(B \mid A)}_{\text{posterior}} = \frac{\overbrace{P(A \mid B)}^{\text{likelihood}} \times \overbrace{P(B)}^{\text{prior}}}{\underbrace{P(A)}_{\text{evidence, normalizing factor}}} \;\propto\; \underbrace{L(B \mid A)}_{\text{likelihood of B given fixed A}} \times \underbrace{P(B)}_{\text{prior}}$

Consider a model which gives the probability density function (PDF) of an observable random variable vector X as a function of a parameter θ (in general a parameter vector). Then for specific values $x_1, \ldots, x_n$ of X (a given realization), the function

$L(\theta \mid x_1, \ldots, x_n) = f(x_1, \ldots, x_n \mid \theta)$

is a likelihood function of θ. The likelihood function is functionally the same in form as the PDF. However, the emphasis is changed from the x to the θ. The PDF is a function of the x's while holding the parameters θ constant; L is a function of the parameters θ while holding the x's constant.
Likelihood ratio
Bayes theorem
The Bayes theorem can also be written in terms of a likelihood ratio and odds:

$O(A \mid B) = O(A) \times \Lambda(A \mid B)$

where $\Lambda(A \mid B)$ is the likelihood ratio,

$O(A \mid B) = \frac{P(A \mid B)}{P(A^c \mid B)}$ are the odds of A given B, and

$O(A) = \frac{P(A)}{P(A^c)}$ are the odds of A.

Likelihood ratio

$\Lambda(A \mid B) = \frac{L(A \mid B)}{L(A^c \mid B)} = \frac{P(B \mid A)}{P(B \mid A^c)}$
Maximum likelihood estimation

Choosing an estimator $\hat{\theta}(X)$ for θ that maximizes $L(\theta \mid x_1, \ldots, x_n)$, i.e. that satisfies

$L(\hat{\theta} \mid x_1, \ldots, x_n) = \sup_{\theta \in \Theta} L(\theta \mid x_1, \ldots, x_n),$

is called the maximum likelihood estimator (MLE).

Since products of probabilities are very small it is convenient to work with the logarithm of the likelihood function. log is a monotone function, therefore

$\log L(\hat{\theta} \mid x_1, \ldots, x_n) = \sup_{\theta \in \Theta} \log L(\theta \mid x_1, \ldots, x_n).$

If $\hat{\theta}$ exists it must satisfy the likelihood equations

$\frac{\partial \log L(\theta \mid x_1, \ldots, x_n)}{\partial \theta_j} = 0, \quad j = 1, 2, \ldots, k, \quad \theta = (\theta_1, \ldots, \theta_k).$
Maximum likelihood

If $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.) with probability density function (PDF) or probability mass function (PMF) f, the likelihood function can be calculated as:

$L(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta)$

and the log likelihood function:

$\log L(\theta \mid x_1, \ldots, x_n) = \log \prod_{i=1}^{n} f(x_i \mid \theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$

For example linear regression:

$\log L(y = ax + b \mid x) = \sum_{i=1}^{n} \log f(ax_i + b)$
Random variable

The probability measure P is a set function and hence difficult to work with.

Let (Ω, S) be a sample space. A random variable is defined as a finite, single-valued function that maps Ω into R if the inverse images under X of all Borel sets in R are events, that is if

$X: \Omega \to \mathbb{R}, \quad X^{-1}(B) = \{\omega : X(\omega) \in B\} \in S \text{ for all } B \subset \mathbb{R}$

In short, a random variable (r.v.) is a function that assigns a real number to the outcome of a random experiment.

The resulting value (X = x) is called a realization of the random variable X.
Discrete random variable
A discrete random variable can take a countable number of predetermined values.

Examples
Tossing a coin, throwing a die, or the number of cars crossing a line during a certain time interval.

Probability mass function (PMF)
For discrete random variables the mass function determines the probability of each element of the sample space:

$f(x_i) = P[X = x_i]$

Continuous random variable
Continuous random variables can take any real value.

Probability density function (PDF)
A probability density function is a function f(x) that describes the probability density in terms of the input variable x and satisfies:

1. $P[a \le X \le b] = \int_a^b f(x)\,dx$
2. $f(x) \ge 0 \;\; \forall x \in \mathbb{R}$
3. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

The histogram is an estimator for the probability density function.
Cumulative distribution function (CDF)

$F(x) = P(X \le x) = \begin{cases} \int_a^x f(u)\,du & \text{continuous r.v.} \\ \sum_{x_i \le x} P(X = x_i) & \text{discrete r.v.} \end{cases}$

where a is the smallest value that the r.v. can take.

Properties of the CDF

1. $\lim_{x \to -\infty} F(x) = 0$; $\lim_{x \to +\infty} F(x) = 1$
2. $x < y \Rightarrow F(x) \le F(y)$
3. F(x) is continuous from the right: $F(x+h) \to F(x)$ as $h \to 0$

Probability and CDF

$P(X > x) = 1 - F(x)$
$P(x < X \le y) = F(y) - F(x)$
Measures for the distribution function and r.v.

Expectation
$E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx$

Variance
$var(X) = \sigma^2 = E[(X - E(X))^2] = E(X^2) - (E(X))^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx$

Standard deviation
$sd(X) = \sigma = \sqrt{E[(X - E(X))^2]}$

Covariance
$cov(X, Y) = E[(X - E(X))(Y - E(Y))]$

Correlation
$\rho = cov(X, Y)/(\sigma_x \sigma_y)$
Normal distribution
Factors of variation which act in an additive way result in a symmetric distribution which is called a normal distribution. The PDF of the normal distribution with parameters μ and σ (N(μ, σ); also called Gauss distribution) is:

$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad E(X) = \mu, \quad sd(X) = \sigma$
Normal distribution
Effects of different σ and µ on the PDF of the normal distribution:
Standard normal distribution
A variable that has a normal distribution with mean μ = 0 and variance σ² = 1 is called the standard normal variate and is commonly designated by the letter Z:

$Z = \frac{X - \mu}{\sigma} \sim N(0; 1)$
Standard normal distribution
The cumulative distribution function can be calculated as follows:

$F(x) = \int_{-\infty}^{x} f(u)\,du = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-\frac{(u-\mu)^2}{2\sigma^2}}\,du$

Substituting μ = 0 and σ² = 1 yields:

$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{u^2}{2}}\,du = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{z}{\sqrt{2}}\right)\right]$
Standard normal distribution and probability
Since the area under the standard normal distribution is 1, probabilities correspond to areas under the curve within the range of z:

$P(Z \le z) = \Phi(z)$

$P(-0.56 \le Z \le 2.00) = \Phi(2.00) - (1 - \Phi(0.56)) = 0.6895$

$P(-2.00 \le Z \le 2.00) = 2 \times \Phi(2.00) - 1 = 0.9545$
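The same probabilities in R, where Φ(z) is pnorm(z):

pnorm(2.00) - (1 - pnorm(0.56))   # P(-0.56 <= Z <= 2.00) = 0.6895
2 * pnorm(2.00) - 1               # P(-2.00 <= Z <= 2.00) = 0.9545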
Lognormal distribution
Factors of variation which act in a multiplicative way lead to an asymmetric distribution which is called a lognormal distribution.
Shapiro CM, et al., Am. J. Med Sci., 1987;293(6):365-370
[Figure: histograms of T4 counts (cells/mm³) for Hodgkin's and non-Hodgkin's disease patients, raw and log-transformed, as shown earlier; the log-transformed data are approximately normal.]
Lognormal distribution
The lognormal distribution with parameters μ and σ is denoted lnN(μ, σ) and has the following PDF:

$f(x; \mu, \sigma) = \frac{1}{x\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad E(X) = e^{\mu + \frac{\sigma^2}{2}}, \quad sd(X) = \sqrt{(e^{\sigma^2} - 1)\,e^{2\mu + \sigma^2}}$
Binomial distribution
The binomial distribution is the simplest probability distribution for discrete data.

It represents the probability distribution of the number of successes k in a sequence of n independent yes/no experiments, each of which yields success with probability p. It is denoted B(n, p).

$f(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}, \quad E(X) = np, \quad sd(X) = \sqrt{np(1-p)}$

For n = 1 it is identical to the Bernoulli distribution.
Binomial distribution

Example
The probability of being in blood group B is 0.08, so the probability of being in group 0, A, or AB is 0.92.

For two unrelated people, the probability of both being in blood group B is 0.08 × 0.08 = 0.0064.

           Number in B   Probability
B B            2         0.08 × 0.08 = 0.0064
¬B B           1         0.92 × 0.08 = 0.0736
B ¬B           1         0.08 × 0.92 = 0.0736
¬B ¬B          0         0.92 × 0.92 = 0.8464
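The same probabilities via the binomial PMF in R:

dbinom(0:2, size = 2, prob = 0.08)   # 0.8464 0.1472 0.0064
# 0.1472 = 0.0736 + 0.0736: the two orderings with exactly one B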
Binomial distribution
[Figure: binomial probability mass functions for p = 0.08 and n = 2, 6, 10, 20, 50, and 100; with increasing n the distribution becomes more spread out and more symmetric.]
Binomial versus hypergeometric distribution
Binomial distribution
Probability distribution of the number of successes k in a sequence of n independent yes/no experiments (with replacement), each of which yields success with probability p:

$f(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$

Hypergeometric distribution
Probability distribution that describes the number of successes k in a sequence of n draws from a finite population N without replacement.

$f(k; N, m, n) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}, \quad E(X) = \frac{nm}{N}, \quad sd(X) = \sqrt{\frac{nm}{N} \cdot \frac{N-m}{N} \cdot \frac{N-n}{N-1}}$

The finite population N consists in a drawing experiment e.g. of m white marbles and N − m black marbles.
Over-representation analysis
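A sketch of such an over-representation test in R using the upper hypergeometric tail (all counts made up: N genes in total, m in a gene set, n genes drawn, e.g. the significant ones, of which k fall into the set):

N <- 10000; m <- 200; n <- 150; k <- 12
phyper(k - 1, m, N - m, n, lower.tail = FALSE)   # P(X >= k)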
Poisson distribution

Another discrete probability distribution is the Poisson distribution. It describes the number of events k occurring over time (or space) at a fixed average rate λ, where each event occurs independently and at random (Pois(λ)). For example, the daily number of new registrations of cancer may be 2.2 on average, but on any day there may be no cases or there may be several.

$f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad E(X) = \lambda, \quad sd(X) = \sqrt{\lambda}$

Examples are:

- The number of phone calls at a call center per minute
- The number of mutations in a given stretch of DNA after a certain amount of radiation
- The number of light bulbs that burn out in a certain time interval
- The number of cars that pass through a certain point on a road (distant from traffic lights) during a given period of time
Poisson distribution
Examples for different values of λ

[Figure: Poisson probability mass functions for λ = 2.2, 10.0, and 24.0; with increasing λ the distribution shifts right and becomes more symmetric.]
Negative binomial distribution

If count data are too dispersed to fit a Poisson distribution, they can be modeled by the two-parameter negative binomial distribution (Pascal distribution or Polya distribution).

The negative binomial distribution is the distribution of the number of trials n needed to get a fixed number of successes k, where each of the trials yields success with probability p. It is denoted NB(k, p).

The probability mass function is therefore given by:

$f(n; k, p) = \binom{n-1}{k-1} p^k (1-p)^{n-k}, \quad E(X) = \frac{k}{p}, \quad sd(X) = \frac{\sqrt{k(1-p)}}{p}$

For k = 1 it is identical to the geometric distribution.
Negative binomial distribution

Examples for a fixed number of successes

Negative binomial distribution

Examples for a fixed probability
Other distributions

Test distributions
χ², t, F, ...

Mathematically deduced distributions
Exponential, Gamma, Beta, Cauchy, logistic, uniform, Weibull, ...

Extended binomial distributions
Bernoulli, geometric, multinomial, ...
Multinomial, Beta, and Dirichlet distribution

Two possibilities:

Binomial distribution: $f(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$

Beta distribution: $f(\theta; \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha-1}(1-\theta)^{\beta-1}$, with $x \ge 0: \Gamma(x+1) = x!$; $\alpha > 0, \beta > 0$

Three or more possibilities:

Multinomial distribution: $f(\theta_1, \ldots, \theta_k; n; p_1, \ldots, p_k) = \frac{n!}{\prod_{i=1}^{k}\theta_i!} \prod_{i=1}^{k} p_i^{\theta_i}$, with $\sum_{i=1}^{k} p_i = 1$, $0 \le \theta_i \le n$, $\sum_{i=1}^{k}\theta_i = n$

Dirichlet distribution: $f(\theta_1, \ldots, \theta_k; \alpha_1, \ldots, \alpha_k) = \frac{\Gamma\left(\sum_{i=1}^{k}\alpha_i\right)}{\prod_{i=1}^{k}\Gamma(\alpha_i)} \prod_{i=1}^{k}\theta_i^{\alpha_i - 1}$, with $\theta_i \ge 0$, $\sum_{i=1}^{k}\theta_i = 1$

The Beta distribution is the conjugate of the binomial distribution (same functional form; however, variable and parameter are exchanged).
Parameter estimation and confidence interval
Parameter estimation and confidence interval
Aims
- Estimation of parameters of the relevant population by the statistics of the sample distribution
- Measures of uncertainty and quality of these estimations and specification of a confidence interval

To be valid, the sample must be representative of the population. For quantification of the strength of the evidence or its uncertainty, the characteristics of the sampling distributions are useful (e.g. properties of the distribution of the means of random samples).
Sampling distributions
The variability of sample means of many random samples of a given size from the population

- is less among the means of large samples than small samples
- is less than the variability of the individual observations in the population
- increases with greater variability (standard deviation) among the individual values

The distribution of sample means will be nearly normal whatever the distribution of the variable in the population, as long as the samples are large enough.
Distribution of means from random sampling
[Figure: histograms of the means of many random samples of size n = 10, 25, and 100 drawn from a normal population; the spread of the sample means shrinks as n grows.]
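The figure can be reproduced with a short simulation (population parameters assumed from the plot axes):

set.seed(1)
sample_means <- function(n, reps = 5000)
  replicate(reps, mean(rnorm(n, mean = 35, sd = 5)))
par(mfrow = c(1, 3))
for (n in c(10, 25, 100))
  hist(sample_means(n), main = paste("n =", n),
       xlab = "sample mean", xlim = c(25, 45))
sd(sample_means(100))   # close to sigma/sqrt(n) = 5/10 = 0.5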
Central limit theorem

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (i.i.d.) with means $\mu_i$ and variances $\sigma_i^2$. Then

$X_{norm} = \frac{\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$

has a limiting cumulative distribution function which approaches a normal distribution (∼ N(0; 1)) for large n.

⇒ importance of the normal distribution
Standard error of sample mean
The standard deviation of a large number of sample means will be:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

where σ is the standard deviation of the variable in the population and n is the size of each sample.

We can estimate the standard error of the (population) mean (SEM) from a single sample using the observed standard deviation in that sample:

$SEM = \frac{s}{\sqrt{n}}$

The standard error of the mean is often abbreviated to standard error (SE). The standard error is a measure for the quality of the estimation of the population mean. SE can be used to construct a confidence interval.
Standard error
Standard error of the difference between two sample means

$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{[SE(\bar{x}_1)]^2 + [SE(\bar{x}_2)]^2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Standard error of a sample proportion
From the binomial distribution we know the standard deviation

$s = \sqrt{np(1-p)} \;\Rightarrow\; SE = \sqrt{\frac{p(1-p)}{n}}$

This will be true only for large samples (np > 5 and n(1 − p) > 5).

Standard error of the difference between two proportions

$SE(p_1 - p_2) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Confidence interval

A (1 − α) confidence interval $[\hat{\theta}_l, \hat{\theta}_u]$ is a random interval which includes the unknown, true value θ with a probability of 1 − α:

$P[\hat{\theta}_l \le \theta \le \hat{\theta}_u] \ge 1 - \alpha$

Per convention α = 0.05, but it can be chosen arbitrarily.

$\bar{x} - t_{1-\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{1-\alpha/2}\,\frac{s}{\sqrt{n}}$ for a normally distributed population
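A minimal R sketch of this interval, by hand and via t.test() (simulated data):

set.seed(1)
x <- rnorm(30, mean = 35, sd = 6)
n <- length(x); xbar <- mean(x); s <- sd(x)
tq <- qt(0.975, df = n - 1)        # t quantile for alpha = 0.05
c(xbar - tq * s / sqrt(n), xbar + tq * s / sqrt(n))
t.test(x)$conf.int                 # the same interval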
Student’s t-distribution
If $X_1, \ldots, X_n$ are independent and N(0, 1) distributed then

$t = \frac{\bar{x}}{s/\sqrt{n}}$

is t-distributed with n − 1 degrees of freedom.

With small degrees of freedom (or small n) the t-distribution differs from the normal distribution considerably. If the degree of freedom is high, the t-distribution approximates the standard normal distribution.
Student’s t-distribution
Student, Biometrika, 1908;6(1):1-25
William Sealy Gosset (1876-1937)
Confidence interval
95% confidence intervals for mean serum albumin concentration from 216 patients with primary biliary cirrhosis, constructed from 100 random samples of size 100
Christensen E, et al., Gastroenterology, 1985;89(5):1084-1091
Confidence interval for relative frequencies
If $X_1, \ldots, X_n$ are independent binary variables (0, 1) with parameter $p = P[x_i = 1]$, then $k = \sum_{i=1}^{n} x_i$ is binomially distributed, with $\hat{p} = \frac{k}{n}$.

The (1 − α) confidence interval is $[p_l, p_u]$ with

$p_l = \frac{k}{k + (n-k+1)F^*_{1-\alpha/2}} \qquad p_u = \frac{(k+1)F_{1-\alpha/2}}{n-k+(k+1)F_{1-\alpha/2}}$

where $F^*_{1-\alpha/2}$ and $F_{1-\alpha/2}$ are quantiles of F-distributions.

In case of large n,

$z = \frac{k - np}{\sqrt{np(1-p)}}$

is approximately N(0, 1) distributed and the confidence interval is:

$\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Parameter estimation
We want to have good estimators for different parameters of the population distribution (θ = μ, θ = σ²).

An estimator $\hat{\theta}$ of a parameter θ should

- for large n approach θ, and
- for large n follow a normal distribution (central limit theorem).

These properties are satisfied most of the time, and we want quantitative criteria. The estimation error ($\hat{\theta} - \theta$) should be minimal:

1. Unbiasedness: $E[\hat{\theta} - \theta] = 0$ (bias)
2. Minimal variance: $Var[\hat{\theta}] \to$ minimal
3. Consistency: $\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| < \varepsilon) = 1$ ($\varepsilon > 0$)
4. Robustness: not unduly affected by outliers
Maximum likelihood estimation

Parameter estimation for the normal distribution:

$L(\theta \mid x) = \prod_{i=1}^{n} f(x_i \mid \theta) = \frac{1}{(\sigma\sqrt{2\pi})^n}\, e^{-\frac{1}{2\sigma^2}\sum (x_i - \mu)^2}$

$\log L = -n \log \sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum (x_i - \mu)^2$

$\frac{d(\log L)}{d\mu} = 0 = \frac{1}{\sigma^2}\sum (x_i - \mu) = \frac{1}{\sigma^2}\left(\sum x_i - n\mu\right) \;\Leftrightarrow\; \hat{\mu} = \frac{\sum x_i}{n}$

$\frac{d(\log L)}{d\sigma} = 0 = -\frac{n}{\sigma} + \frac{\sum (x_i - \mu)^2}{\sigma^3} \;\Leftrightarrow\; \hat{\sigma}^2 = \frac{\sum (x_i - \mu)^2}{n}$

Ordinary least squares (OLS) is a special case of the maximum likelihood method.
Thumbnail example

$X = \{x_1, \ldots, x_n\}$, where $x_t \in \{0, 1\}$

Binomial distribution: $P(X \mid \Theta) = \binom{n}{k}\Theta^k (1-\Theta)^{n-k}$ ... likelihood

Maximum likelihood estimation

$\log P(X \mid \Theta) = k \log \Theta + (n-k)\log(1-\Theta) + C$

$\frac{d}{d\Theta}\log P(X \mid \Theta) = \frac{k}{\Theta} - \frac{n-k}{1-\Theta} = 0 \;\Rightarrow\; \hat{\Theta} = \frac{k}{n}$
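A quick numeric check in R that maximizing the log likelihood indeed yields k/n (values made up):

k <- 7; n <- 10
loglik <- function(theta) k * log(theta) + (n - k) * log(1 - theta)
optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)$maximum  # ~0.7
k / n                                                                 # analytical MLE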
Since the data X are usually subject to random fluctuations and intrinsic uncertainty, repeating the whole process of data collection and parameter estimation under identical conditions will mostly lead to slightly different results.

⇒ if we are able to repeat the data-generating process several times, we will get a distribution of parameter estimates $\hat{\Theta}$, from which we can infer the intrinsic uncertainty of the estimation process.

Distribution of the parameter estimate $\hat{\Theta}$

The probability of k observations of heads in a sample of size n is given by

$P(k) = \binom{n}{k}\Theta^k(1-\Theta)^{n-k}$

$k = n\hat{\Theta} \;\Rightarrow\; P(k) = \binom{n}{n\hat{\Theta}}\Theta^{n\hat{\Theta}}(1-\Theta)^{n(1-\hat{\Theta})}$

In more complicated situations analytical solutions are usually not available ⇒ a computational procedure called bootstrapping is used.
Frequentist versus Bayesian paradigm

Bayesian approach

$\underbrace{P(\Theta \mid X)}_{\text{posterior probability}} \;\propto\; \underbrace{P(X \mid \Theta)}_{\text{likelihood}} \times \underbrace{P(\Theta)}_{\text{prior probability}}$

We want to compute the posterior probability from the likelihood and the prior probability.

It is mathematically convenient to choose a functional form that is invariant with respect to the transformation (see above), that is, for which the prior and the posterior probability are in the same function family (conjugate).

The conjugate of the binomial distribution is the beta distribution:

$P(\Theta \mid X) \propto \Theta^{k+\alpha-1}(1-\Theta)^{N-k+\beta-1}$

$P(\Theta \mid X) = B(\Theta \mid k+\alpha, N-k+\beta)$
Comparison of frequentist and Bayesian approach
Maximum a posteriori (MAP) estimate: $\hat{\Theta}_{MAP} = \mathrm{argmax}_\Theta \, P(\Theta \mid X)$

Maximum likelihood (ML) estimate: $\hat{\Theta}_{ML} = \mathrm{argmax}_\Theta \, P(X \mid \Theta)$

$N \to \infty \;\Rightarrow\; \hat{\Theta}_{MAP} = \hat{\Theta}_{ML}$

Suppose you are allowed to toss a thumbnail a few times only. You can use prior knowledge, e.g. the torque acting on the falling thumbnail from theoretical physics.

If you are allowed to toss the thumbnail arbitrarily often, the data will "speak for themselves", and including any prior knowledge no longer makes any difference to the prediction.
Comparison of frequentist and Bayesian approach
The main difference between the frequentist and the Bayesian approach is the different interpretation of Θ:

The frequentist statistician interprets Θ as a parameter and aims to estimate it with a point estimate, typically adopting the maximum likelihood approach.

The Bayesian statistician interprets Θ as a random variable and tries to infer its whole posterior distribution, P(Θ | X).

For the derivation of P(Θ | X) in complex inference problems, a powerful computational approach called Markov Chain Monte Carlo (MCMC) can be used (the Bayesian pendant to the frequentist's bootstrap approach).
Parameter free estimation / Resampling
Parameter-free means there are no assumptions about the form of the population distribution; instead the data (sample) and their distribution are used.

We are not interested in the parameters per se, but we want to test a hypothesis or want to know the quality of a prediction based on the data.

In both cases using resampling methods allows us to quantify the performance of the estimation.
Bootstrap
The idea of the bootstrap is to randomly sample n times with replacement from the original data points (based on the same distribution as the original data).

If this procedure is repeated often (e.g. 1000 times), the distribution of the medians should approximate a normal distribution, and the mean and variance of the medians can be calculated.

The 95% confidence interval can be derived from the sorted bootstrap samples (at the 25th and 975th value).
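A manual bootstrap sketch in R (sample values simulated):

set.seed(1)
x <- rlnorm(40, meanlog = 1.7, sdlog = 0.4)   # assumed skewed sample
meds <- replicate(1000, median(sample(x, replace = TRUE)))
quantile(meds, c(0.025, 0.975))   # 95% CI: 25th and 975th sorted value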
Permutation test
The permutation test (randomization test) is similar to the bootstrap, except that the resampling procedure is done without replacement.

As an example, the question is addressed whether active genes in a specific condition tend to be adjacent within the genome. For this purpose the positions within the genome were permuted 10000 times and the number of adjacent active genes was counted.

As a measure of the test, the z-score or the p-value (that is, the fraction of the rearrangements that have counts as far apart or more than actually observed) can be provided.
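A generic two-group permutation test sketch in R (simulated data, not the gene-adjacency example above):

set.seed(1)
a <- rnorm(20, 10, 2); b <- rnorm(20, 11.5, 2)
obs  <- mean(a) - mean(b)
pool <- c(a, b)
perm <- replicate(10000, {
  idx <- sample(length(pool), length(a))   # relabel without replacement
  mean(pool[idx]) - mean(pool[-idx])
})
mean(abs(perm) >= abs(obs))   # two-sided permutation p-value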
Jackknife
The jackknife approach is used to measure the performance of an estimator ($\hat{\theta}^*$) by systematically recomputing the statistic estimate ($\hat{\theta}^*_{-i}$), leaving out one observation at a time from the sample.

Finally the jackknife-corrected estimator ($\hat{\theta}_{jack}$) can be calculated from the $\hat{\theta}^*_{-i}$ as follows:

$\hat{\theta}_{jack} = n\hat{\theta}^* - \frac{n-1}{n}\sum_{i=1}^{n}\hat{\theta}^*_{-i}$

For example, estimating the mean:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ and $\bar{x}_{-j} = \frac{\sum_{i=1}^{n} x_i - x_j}{n-1} \;\Rightarrow\; \tilde{x}_j = n\bar{x} - (n-1)\bar{x}_{-j}$

and analogously for general estimators:

$\tilde{\theta}^*_j = n\hat{\theta}^* - (n-1)\hat{\theta}^*_{-j}$ with $\hat{\theta}_{jack} = \frac{\sum_{j=1}^{n}\tilde{\theta}^*_j}{n}$
Quenouille M. Journal of the Royal Statistical Society, Ser. B, 1949;11:68-84Tukey JW. Annals of Mathematical Statistics, 1958;29:614
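A leave-one-out sketch in R; for the mean the jackknife-corrected estimate coincides with the plain mean, which makes it a convenient check:

set.seed(1)
x <- rnorm(25, 35, 6)
n <- length(x)
theta   <- mean(x)
theta_i <- sapply(1:n, function(i) mean(x[-i]))   # leave-one-out estimates
n * theta - (n - 1) / n * sum(theta_i)            # jackknife estimator
theta                                             # identical for the mean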
Parameter free estimation / Resampling - Summary
Bootstrap resampling
Generate samples of the same size n as x, with replacement, to establish confidence intervals.

Permutation subsampling
Generate samples of a (in general) smaller size than x, without replacement, to test hypotheses of 'no effect'.

Jackknife 'leave one out' sampling
Generate samples of size n − 1 to measure the performance of an estimator.
Hypothesis testing
Hypothesis test
In medicine, comparisons between treatments or procedures, or between groups of subjects, are often conducted. More generally, a research question is addressed and tested with an experiment.

The numerical value corresponding to the comparison of interest is called the effect.

A null hypothesis H0 can be stated that this effect of interest is zero, as well as an alternative hypothesis H1 that the effect is not zero.

The null hypothesis is in general the negation of the research hypothesis that generated the data.

The probability that we could have observed the data (or data that were more extreme) if the null hypothesis is true is called the p-value. The smaller the p-value, the more evidence we have against the null hypothesis.
Test statistic
For most problems, calculating a test statistic (a value which we can compare with the known distribution we expect when the null hypothesis is true) can be used to evaluate the probability:

$\text{test statistic} = \frac{\text{observed value} - \text{hypothesized value}}{\text{standard error}}$

In many cases the hypothesized value is zero, so that the test statistic becomes the ratio of the observed quantity of interest to its standard error.
Error types
If H0 can be rejected, then based on this evidence you can accept the research hypothesis. In general there are two possible decisions:

- reject H0 and accept H1, or
- do not reject H0 and consider H1 as not approved

Note: H0 can never be accepted, however large the p-value may be.

As apparent in the following table, there are two possibilities to decide correctly and two possibilities to make errors:

Decision           H0 is really true                H0 is really false
Do not reject H0   correct                          Type II error (probability β)
Reject H0          Type I error (probability α)     correct
Significance
α, the (maximal) probability of the Type I error, is the level of significance. By reducing the risk of an error of the first kind we increase the risk of an error of the second kind.

- The conventional compromise is to choose α = 0.05 as the level of significance.
- If p ≤ α, H0 is rejected (the research hypothesis accepted) and the test is stated statistically significant.
- Sometimes, if α = 0.001 is chosen and p ≤ α, the test is stated 'highly' significant.

These are reasonable guidelines, however, not an absolute demarcation. There is not a great difference between p = 0.06 and p = 0.04, and they indicate similar strength of evidence. Therefore the p-values should be provided, and not only the statement that the test is significant.
Two-sided tests versus one-sided tests

Example for a one-sample test:
H0: μ = μ0
H1: μ = μ1 ≠ μ0 ... two-sided alternative hypothesis
H1: μ = μ1 > μ0 or μ = μ1 < μ0 ... one-sided alternative hypotheses

One-sided tests are rarely appropriate, and in most cases two-sided tests are used. Even when there are strong prior expectations, for example that a new treatment can't be worse than the old one, you cannot be sure (otherwise you would not need an experiment).

[Figure: critical regions. Two-sided test: reject H0 below t_{α/2} and above t_{1−α/2}. One-sided test (lower tail): reject H0 below t_α. One-sided test (upper tail): reject H0 above t_{1−α}.]
Power of a test

The statistical power of a test is defined as 1 − β. This is the probability that a new therapy or theory is proven better, if it is really better.

The power depends on the sample size n and the effect size δ, which refers to the magnitude of the effect under the alternative hypothesis.

If means of normally distributed data are compared, the effect size is:

$\delta = \frac{\mu_1 - \mu_0}{\sigma_0}$

- Optimal tests are defined such that at a given α the power is maximal.
- The power decreases if α decreases.
- The power increases if the variability decreases.
- The power is better for one-sided tests.
Power analysis
Since there is a relation between α, the power (1 − β), the sample size n, and the effect size δ, the optimal sample size can be calculated from the other parameters. This procedure is called power analysis.

1. Estimate the effect size (e.g. from literature)
2. Define α and β
3. Calculate the optimal sample size n
Calculation of sample size (continuous data)
Determination of a difference in the mean for given μ0, known variance σ0², and independent normally distributed data $x_1, \ldots, x_n$:

$z = \sqrt{n}\,\frac{\bar{x} - \mu_0}{\sigma_0}$

Under H0: μ = μ0, z is normally distributed, and H0 is rejected if $|z| > z_{1-\alpha/2}$.

If $\mu = \mu_1 > \mu_0 \;\Rightarrow\; z = \sqrt{n}\,\frac{\bar{x} - \mu_1}{\sigma_0} + \sqrt{n}\,\frac{\mu_1 - \mu_0}{\sigma_0}$

$z_{1-\alpha/2} = z_\beta + \sqrt{n}\,\delta \;\Rightarrow\; n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\delta^2}$

For example: $\alpha = 0.05$, $\beta = 0.20$, $\delta = \frac{38 - 35}{6} = 0.5$ ⇒

$n = \frac{(z_{0.975} + z_{0.80})^2}{0.5^2} \approx \frac{(1.96 + 0.84)^2}{0.25} \approx 31$
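The built-in power.t.test() solves the same problem; it uses the t-distribution, so n comes out slightly larger than the normal-approximation value of 31:

power.t.test(delta = 3, sd = 6, sig.level = 0.05, power = 0.80,
             type = "one.sample")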
Calculation of sample size (proportions)
If two proportions p0 and p1 are compared, the effect size is: δ = p0 − p1.

The sample size can be calculated as follows:

$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \times [p_0(1-p_0) + p_1(1-p_1)]}{\delta^2}$

For example: $\alpha = 0.05$, $\beta = 0.20$, $p_0 = 0.80$, $p_1 = 0.75$ ⇒

$\delta = 0.80 - 0.75 = 0.05$

$n = \frac{(z_{0.975} + z_{0.80})^2 \times (0.80 \times 0.20 + 0.75 \times 0.25)}{0.05^2} \approx \frac{(1.96 + 0.84)^2 \times 0.35}{0.0025} \approx 1094$
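The corresponding R function is power.prop.test(), which returns a per-group sample size close to the approximation above:

power.prop.test(p1 = 0.80, p2 = 0.75, sig.level = 0.05, power = 0.80)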
Estimation versus hypothesis testing
- There is a close relation between confidence intervals and hypothesis testing: p < 0.05 (i.e. significant) ⇔ the 95% interval does not include the value specified in H0. The reason for this relation is that both methods are based on similar aspects of the theoretical distribution of the test statistic.
- The confidence interval shows the uncertainty, or lack of precision, in the estimate of interest, and thus conveys more useful information than the p-value.
- The use of a new treatment depends not only on the significance but also on the size of the effect. A single number (p-value) cannot convey the necessary information.
Testing for equality or noninferiority
In traditional comparative studies, the burden of proof rests on the alternative (research) hypothesis of a difference between the groups. If the evidence is not strong enough in favour of a difference, equality cannot be ruled out, but the null hypothesis cannot be accepted.

"Absence of evidence is not evidence of absence"
Altman DG and Bland JM, British Medical Journal, 1995;311:485.

It is not possible to establish an alternative hypothesis of exact equality. Therefore a region around the mean has to be defined where the two means are considered equal:

$|\mu_1 - \mu_0| < \delta$

δ represents the equivalence margin. This allows us to define the following null and alternative hypotheses:

H0: $|\mu_1 - \mu_0| \ge \delta$
H1: $|\mu_1 - \mu_0| < \delta$
Testing for equality or noninferiority
Two one-sided t-tests (TOST)
Perform two one-sided tests based on the following split null hypotheses:

H01 : µ1 − µ0 ≥ δ
H02 : µ1 − µ0 ≤ −δ

The p-value for the overall test is p = max(p1, p2). Whether a correction for multiple testing should be performed is heavily debated. If you want to be on the safe side, divide α by 2 (Bonferroni correction).
Schuirmann DJ, J Pharmacokin Biopharm, 1987;15:657-680.Wellek S, Testing Statistical Hypotheses of Equivalence. CRC Press, 2003.
Confidence interval
Construct the (1 − 2α) confidence interval of the difference of the means. If the CI for the difference is completely contained in the interval [−δ, δ], then we declare equivalence.
Tryon WW, Psychological Methods, 2001;6(4):371-386
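A minimal TOST sketch in R, using two calls to t.test() with shifted null values; tost() and the margin delta are illustrative names, not from the lecture:

    # Two one-sided tests for equivalence of two independent means
    tost <- function(x, y, delta, alpha = 0.05) {
      p1 <- t.test(x, y, mu =  delta, alternative = "less")$p.value    # H01: diff >= delta
      p2 <- t.test(x, y, mu = -delta, alternative = "greater")$p.value # H02: diff <= -delta
      c(p = max(p1, p2), equivalent = max(p1, p2) < alpha)
    }
    tost(rnorm(30, 10, 2), rnorm(30, 10.2, 2), delta = 1)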
Non-parametric tests
Parametric methods
I Make assumptions about the sampling distributions
I Based on theoretical distributions which are described by parameters (mean, standard deviation)
I Confidence intervals and hypothesis tests

Non-parametric (distribution-free) methods
I Often used to analyze data which are not normally distributed (i.e. skewed data)
I Mostly based on ranks or on comparing sums of ranks
I Tend to be more suited to hypothesis testing than to estimation
I In some cases estimation of confidence intervals is possible (e.g. for the median)
Multiple testing
Problem
If many hypothesis tests are performed in parallel, the probability of drawing wrong conclusions increases.

Example - Microarrays
Thousands of genes are tested for whether they are significantly differentially expressed.

I In case of 1000 tests, 50 false positives are expected at a Type I error level of 0.05.
I The probability for k independent tests that at least one p < α is 1 − (1 − α)^k, which converges towards 1 for large k.
I Multiple testing corrections adjust the p-values (or the significance level α) derived from multiple statistical tests to correct for the occurrence of false positives.
Type I error
Decision           H0 is really true       H0 is really false      total
Do not reject H0   U                       T (Type II error)       G − R
Reject H0          V (Type I error)        S                       R
total              G0                      G1                      G

Per family and per comparison error rate
PFER = E(V), PCER = E(V)/G

Family wise error rate (FWER)
FWER = P(V > 0)

False discovery rate (FDR)
FDR = E(V/R) for R > 0, and FDR = 0 for R = 0
Methods for multiple testing corrections
Method                            Error control
Bonferroni                        FWER (most stringent)
Bonferroni step down (Holm)       FWER
Westfall and Young permutation    FWER
Benjamini and Hochberg (FDR)      FDR (less stringent)

Family-wise error rate methods allow very few occurrences of false positives.

The false discovery rate allows a percentage of the called genes to be false positives.
Multiple testing corrections
Sort p-values from smallest to largest and apply correctioncorresponding to the selected method.
p         Bonferroni    Holm              BH (FDR)
p(1)      p(1)·n        p(1)·n            p(1)·n
p(2)      p(2)·n        p(2)·(n−1)        p(2)·n/2
:         :             :                 :
p(i)      p(i)·n        p(i)·(n−i+1)      p(i)·n/i
:         :             :                 :
p(n−1)    p(n−1)·n      p(n−1)·2          p(n−1)·n/(n−1)
p(n)      p(n)·n        p(n)              p(n)

padj = min(1, p)
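In R the table above corresponds to p.adjust(), which additionally enforces monotonicity of the adjusted values. A minimal sketch with made-up p-values:

    p <- c(0.0001, 0.004, 0.019, 0.095, 0.201)
    p.adjust(p, method = "bonferroni")
    p.adjust(p, method = "holm")
    p.adjust(p, method = "BH")        # Benjamini-Hochberg FDR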
Westfall and Young permutation
1. Compute the t statistic for each row in the original dataset.
2. Order them: |t(1)| ≥ |t(2)| ≥ ... ≥ |t(k)|
3. Permute the columns of the data matrix.
4. Compute t statistics for all rows of the permuted dataset: t1(b), ..., tk(b)
5. Compute uk(b) = |t(k)(b)| and uj(b) = max(uj+1(b), |t(j)(b)|) for 1 ≤ j ≤ k − 1
6. Repeat steps 3-5 N times and calculate the adjusted p-values:

p(j) = [ Σ_{b=1..N} I(uj(b) ≥ |t(j)|) ] / N

where I(·) is the indicator function, set to 1 if the condition in parentheses is true and 0 if false.
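The procedure can be sketched in a few lines of R for a two-group comparison; maxT, X, grp and N are illustrative names, and this is a sketch of the maxT idea rather than a production implementation (the Bioconductor package multtest provides one):

    # X: genes x samples matrix, grp: two-level factor, N: number of permutations
    maxT <- function(X, grp, N = 1000) {
      tstat <- function(m, g) apply(m, 1, function(x) t.test(x ~ g)$statistic)
      t0 <- abs(tstat(X, grp))
      o  <- order(t0, decreasing = TRUE)      # rank genes: |t(1)| >= |t(2)| >= ...
      count <- numeric(nrow(X))
      for (b in 1:N) {
        tb <- abs(tstat(X, sample(grp)))[o]   # permuted statistics, original ranking
        u  <- rev(cummax(rev(tb)))            # successive maxima u_j (steps 4-5)
        count <- count + (u >= t0[o])
      }
      padj <- cummax(count / N)               # adjusted p-values, made monotone
      padj[order(o)]                          # return in original gene order
    }

    X   <- matrix(rnorm(50 * 10), nrow = 50)  # simulated toy data
    grp <- factor(rep(c("A", "B"), each = 5))
    head(maxT(X, grp, N = 200))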
Comparing groups: general steps
I Determine the data type and putative distribution
I (Test the data for the presumed distribution)
I Select the test according to data type and distribution
I Formulate the null hypothesis and select the significance level α
I Calculate the test statistic value, determine the degrees of freedom (df)
I Determine the p-value from the test statistic value by lookup in a table with df
I Adjust the p-value if multiple tests were performed
I Reject the null hypothesis if the p-value ≤ α
Choosing an appropriate method
There are several aspects of the data to be considered whenchoosing an appropriate method of analysis:
I The number of groups of observations
I Independent or dependent groups of observations
I The type of the data
I The distribution of the data
I The objective of the analysis
Comparing groups of continuous data
One group of observations
Comparing the mean of a single group of observations with a specific value k.

Confidence interval for the mean
Is k within the (1 − α) CI: [x̄ − t_{1−α/2} s/√n, x̄ + t_{1−α/2} s/√n]?

One sample t-test
t = (x̄ − k) / (s/√n)

Confidence interval for the median
From the ranked data the CI is given by the values at the ranks nearest to
[np − 1.96 √(np(1−p)), np + 1.96 √(np(1−p))] with p = 1/2
One group of observations
Binomial sign test
z = (r − np) / √(np(1−p))
where r is the number of observations > k and p = 1/2

Binomial sign test with continuity correction
z = (|r − np| − 1/2) / √(np(1−p))

Wilcoxon signed rank sum test
1. Calculate the differences xi − k
2. Rank them according to the magnitude |xi − k|
3. Calculate the sum of all positive (negative) ranks corresponding to the observations above (below) k
⇒ get the p-value for the sum from the tabulated test statistic.
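All three one-group procedures are available in base R; a minimal sketch with illustrative data and k = 5.5:

    x <- c(5.1, 5.8, 6.3, 5.4, 6.0, 5.6)
    t.test(x, mu = 5.5)                  # one sample t-test (also reports the CI)
    binom.test(sum(x > 5.5), length(x))  # sign test, p = 1/2 under H0
    wilcox.test(x, mu = 5.5)             # Wilcoxon signed rank test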
Two groups of paired observations
Confidence interval for the differences between means
(1 − α) CI: [d̄ − t_{1−α/2} SE(d̄), d̄ + t_{1−α/2} SE(d̄)]

Paired t-test
The one sample t-test can also be used for the comparison of means, using the mean difference d̄:
t = (d̄ − k) / SE(d̄)   (e.g. k = 0)

Non-parametric methods
The one sample sign test and the Wilcoxon signed rank sum test can also be applied to the differences between the paired data (Wilcoxon matched pairs signed rank sum test).
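In R the paired versions are obtained with the paired argument; x and y are illustrative before/after measurements:

    x <- c(12, 15, 11, 18, 14, 16)
    y <- c(10, 11, 12, 12,  9, 13)
    t.test(x, y, paired = TRUE)       # paired t-test on the differences
    wilcox.test(x, y, paired = TRUE)  # Wilcoxon matched pairs test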
Two groups of independent observations
Confidence interval for the differences between means

Pooled variance: s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

Standard error: SE(x̄1 − x̄2) = s √(1/n1 + 1/n2)

(1 − α) CI: x̄1 − x̄2 ± t_{1−α/2} SE(x̄1 − x̄2)

Two sample t-test
t = (x̄1 − x̄2) / SE(x̄1 − x̄2)

Welch test
The Welch test is a modification of the t-test for the case of unequal variances in the two groups.

t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

with degrees of freedom

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
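Both tests are provided by t.test(); Welch is R's default. A minimal sketch with simulated groups:

    set.seed(1)
    x1 <- rnorm(12, mean = 35, sd = 6)
    x2 <- rnorm(15, mean = 38, sd = 6)
    t.test(x1, x2, var.equal = TRUE)  # two sample t-test with pooled variance
    t.test(x1, x2)                    # Welch test (default)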
Two groups of independent observations
Mann-Whitney U-test

Rank all observations (as if they were from a single sample).

U1 = n1 n2 + n1(n1 + 1)/2 − Σ ri  (sum over the n1 ranks of group 1)
U2 = n1 n2 + n2(n2 + 1)/2 − Σ ri  (sum over the n2 ranks of group 2)
U = min(U1, U2)

U < U(α; n1; n2) ⇒ the test is significant (U(·) from the tabulated statistics).
Mann-Whitney U-test Example
Two groups: A = 7, 4, 9, 17 and B = 11, 6, 21, 14

Is there any evidence that A and B are drawn from populations with different levels of the variable? H0: There is no tendency for members of one population to exceed members of the other.

Ranked observations:
group:  A  B  A  A  B   B   A   B
value:  4  6  7  9  11  14  17  21

For each A (B), count how many Bs (As) precede it:
U = 0 + 1 + 1 + 3 = 5 and U′ = 1 + 3 + 3 + 4 = 11
As a check, U + U′ = n1 · n2 ⇔ 5 + 11 = 4 · 4

There are 70 different arrangements (8!/(4!4!)), each with equal probability 1/70 under the null hypothesis. E.g. for U = 2 there are two arrangements, AAABBABB and AABAABBB ⇒ p = 2/70 = 0.029.
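The example can be reproduced with R's wilcox.test(), which reports the U statistic of the first group as W:

    A <- c(7, 4, 9, 17); B <- c(11, 6, 21, 14)
    wilcox.test(A, B)   # W = 5, exact two-sided p-value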
Comparing two variances using the F-test
We can test the null hypothesis that two population variances are equal using the F-distribution.

If the data are normally distributed, the ratio of two independent estimates of the same variance follows an F-distribution:

F(ν1, ν2) = (χ1²/ν1) / (χ2²/ν2)

where ν1, ν2 are the degrees of freedom.

Calculate (s1/s2)² with s1 > s2 and look it up with degrees of freedom (ν1 = n1 − 1; ν2 = n2 − 1) in the tabulated F-statistic.
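In R this test is var.test(); note its sensitivity to non-normality:

    var.test(x1, x2)   # F test of equal variances (x1, x2 as in the sketch above)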
Chi-square distribution
The chi-square distribution results when independent variables with standard normal distributions are squared and summed:

X = (Z1 + c1)² + (Z2 + c2)² + ... + (Zν + cν)² has a χ² distribution

with ν degrees of freedom and non-centrality parameter δ² = Σ_{i=1..ν} ci².
More independent groups of observations
One way ANalysis Of VAriance (ANOVA)
The main objective is to identify the sources of variation that influence the data. The following model is suggested for the data, where just one factor is supposed to affect the population:

xij = µ + αi + εij,  i = 1, ..., k,  j = 1, ..., ni,  Σ_{i=1..k} ni = N

The idea is to test whether the data xij can be explained as the response to different treatments (groups i = 1, ..., k) of a given factor.

αi is the treatment effect and can be characterized by the sample mean of every subgroup:

αi = x̄i − x̄ = (1/ni) Σ_{j=1..ni} xij − (1/N) Σ_{i=1..k} Σ_{j=1..ni} xij

and εij = xij − x̄i
ANOVA
H0 : µ1 = µ2 = ... = µk
H1 : at least one of the µi is not equal to the others

The test of H0 is based on estimating σ². A general estimator of the variance is based on the variance within groups:

MSE = (s1² + s2² + ... + sk²)/k = (1/(N − k)) Σ_i Σ_j (xij − x̄i)²

The second estimator of the variance is based on the variance between groups:

MSA = n s_x̄² = (1/(k − 1)) Σ_i ni (x̄i − x̄)²

If H0 is true, both variances will be very similar; if MSA >> MSE, then H0 is rejected. This can be formulated by the F statistic:

F = MSA/MSE;  H0 is rejected if F exceeds the critical value f_{α; k−1, N−k}
ANOVA
All the information can be summarized in an ANOVA table:

Variation    df      Sum of squares                  MSS            F-value
Treatments   k − 1   SSA = Σ_i ni (x̄i − x̄)²          SSA/(k − 1)    MSA/MSE
Error        N − k   SSE = Σ_i Σ_j (xij − x̄i)²       SSE/(N − k)
Total        N − 1   SST = Σ_i Σ_j (xij − x̄)²
ANOVA
H0 : µ1 = µ2 = ... = µk
Rejecting the null hypothesis signifies that there is a statistically significant difference (at the level α) between some of the group means.
It is, however, not known which of the means differ. Therefore a”post-hoc” test is necessary to determine which specific means showa difference. The following tests are commonly used:
I Fisher’s Least Significant Difference (LSD): Similar to pair-wiset-tests between all groups, uses pooled SD of all groups. Doesnot correct for multiple testing.
I Tukey’s Honestly Significant Difference (HSD): Similar topair-wise t-tests between all groups, does correct for multipletesting.
I Scheffe’s method: Corrects α for all pair-wise and also for allcomparisons involving more than two means at a time.
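A one-way ANOVA with a Tukey HSD post-hoc test in R; the data frame is simulated for illustration:

    set.seed(1)
    d <- data.frame(y   = c(rnorm(10, 10), rnorm(10, 12), rnorm(10, 10.5)),
                    grp = gl(3, 10, labels = c("A", "B", "C")))
    fit <- aov(y ~ grp, data = d)
    summary(fit)    # the ANOVA table (df, SS, MS, F, p)
    TukeyHSD(fit)   # pairwise differences, corrected for multiple testing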
Kruskal-Wallis test
As ANOVA is a more general form of the t-test, the Kruskal-Wallis test is a more general form of the non-parametric Mann-Whitney U-test.

H = [12 / (N(N + 1))] Σ_{i=1..k} Ni (R̄i − R̄)²

where R̄ is the average of all ranks (R̄ = (N + 1)/2), Ri is the rank sum of the Ni observations in the i-th group, and R̄i is the average rank in each group (R̄i = Ri/Ni).

The H statistic can also be formulated equivalently:

H = [12 / (N(N + 1))] Σ_{i=1..k} Ri²/Ni − 3(N + 1)

H is χ² distributed with k − 1 degrees of freedom. For more than one tie, H has to be corrected by

C = 1 − Σ_i (ti³ − ti)/(N³ − N) and H′ = H/C
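The rank-based analogue of aov() in R is kruskal.test(), which accepts a list of group vectors or a formula; the third group here is invented for illustration:

    kruskal.test(list(c(7, 4, 9, 17), c(11, 6, 21, 14), c(8, 12, 15, 10)))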
Comparing groups of categorical data
I Categorical data are very common in medical research, when individuals are categorized into one or more mutually exclusive groups. The number falling into a particular group is called the frequency.
I The data are often shown in the form of frequency tables.
I They can also be summarized as the proportion of the total number of individuals in one of the categories.
One proportion
Confidence interval
p = r/n and SE(p) = √(p(1 − p)/n)

Based on the normal distribution when np > 5 and n(1 − p) > 5 ⇒ r > 5 and (n − r) > 5

95% CI: [p − 1.96 √(p(1 − p)/n), p + 1.96 √(p(1 − p)/n)]

Hypothesis test
Test the null hypothesis that the population proportion is some pre-specified value k:

z = (p − k) / SE(p) with SE(p) = √(k(1 − k)/n)

and with continuity correction:

z = (|p − k| − 1/(2n)) / SE(p)
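Both the normal-approximation test and the exact test are one-liners in R; r = 30 successes out of n = 120 against k = 0.20 are illustrative numbers:

    prop.test(30, 120, p = 0.20)    # z-test equivalent, with continuity correction
    binom.test(30, 120, p = 0.20)   # exact binomial test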
Proportions in two independent groups
Confidence interval

SE(p1 − p2) = √( p1(1 − p1)/n1 + p2(1 − p2)/n2 )

95% CI: [p1 − p2 − 1.96 SE(p1 − p2), p1 − p2 + 1.96 SE(p1 − p2)]

Hypothesis test

p = (r1 + r2) / (n1 + n2)

SE(p1 − p2) = √( p(1 − p)/n1 + p(1 − p)/n2 ) = √( p(1 − p)(1/n1 + 1/n2) )

z = (p1 − p2) / SE(p1 − p2) and zc = ( |p1 − p2| − (1/n1 + 1/n2)/2 ) / SE(p1 − p2)
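In R, prop.test() compares two proportions directly from the counts (r1, r2 successes in n1, n2 trials; the numbers are illustrative):

    prop.test(c(45, 30), c(100, 100))   # CI and test for p1 - p2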
Two paired proportions
Example - Sleep difficulties
Two groups of individuals were investigated with regard to sleep difficulties. The individuals were matched with respect to age (within 5 years), level of education, marital status, occupation, tobacco smoking frequency and duration, and alcohol use.

Marijuana group   Control group   Number of pairs
yes               yes             a = 4
yes               no              b = 3
no                yes             c = 9
no                no              d = 16
total                             n = 32

p1 − p2 = (a + b)/n − (a + c)/n = (b − c)/n
Karacan I, et al. Ann NY Acad Sci, 1977;282(1):348-374
Two paired proportions
Confidence interval

SE(p1 − p2) = (1/n) √( b + c − (b − c)²/n )

95% CI: [p1 − p2 − 1.96 SE(p1 − p2), p1 − p2 + 1.96 SE(p1 − p2)]

Hypothesis test
Replace both b and c by (b + c)/2:

SE(p1 − p2) = (1/n) √( (b + c)/2 + (b + c)/2 ) = (1/n) √(b + c)

z = (p1 − p2) / SE(p1 − p2) = (b − c) / √(b + c)
Analysis of frequency tables
Chi squared test for an r × c table
The null hypothesis is that the two classifications (columns and rows) are unrelated in the relevant population.

Compare the observed frequencies with what we would expect if the null hypothesis were true:

X² = Σ_{i=1..r} Σ_{j=1..c} (Oij − Eij)² / Eij

with observed frequencies Oij and expected frequencies Eij.

The expected frequency in each cell is the product of the relevant row and column totals divided by the sum of all observed frequencies in the table (i.e. the sample size).

X² is χ² distributed with (r − 1)(c − 1) degrees of freedom.
2x2 frequency tables
        C1      C2      total
R1      a       b       a + b
R2      c       d       c + d
total   a + c   b + d   N

There are two common tests for 2 × 2 frequency tables:
I Chi squared test (if all Eij > 5)
I Fisher's exact test

Chi squared test

For the first cell:

(O11 − E11)²/E11 = (a − (a + b)(a + c)/N)² / ((a + b)(a + c)/N)

and for the sum of all 4 cells:

X² = N(ad − bc)² / [(a + b)(a + c)(b + d)(c + d)]

Continuity correction (also known as Yates' correction):

X²_Y = N(|ad − bc| − N/2)² / [(a + b)(a + c)(b + d)(c + d)]

The Chi squared test is equivalent to the comparison of proportions.
Fisher’s exact test
The method consists of evaluating the probability associated with all possible 2x2 tables which have the same row and column totals, under the assumption that the null hypothesis is true:

p(a, b, c, d) = (a + b)! (a + c)! (b + d)! (c + d)! / (N! a! b! c! d!)

To calculate the significance of the observed data, i.e. the total probability of observing data as extreme or more extreme if the null hypothesis is true, there are 2 possibilities:

1) Add the probabilities in the 'tail' of the distribution in which the observed data fall and double the value to get a two-tailed test.
2) Add up the probabilities of all tables where p < p(a, b, c, d).
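Both 2 × 2 tests are available in base R; the cell counts are illustrative:

    tab <- matrix(c(12, 5, 8, 15), nrow = 2,
                  dimnames = list(group = c("G1", "G2"), outcome = c("yes", "no")))
    chisq.test(tab)                    # applies Yates' correction for 2x2 tables
    chisq.test(tab, correct = FALSE)   # uncorrected X^2
    fisher.test(tab)                   # exact test; also reports an odds ratio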
McNemar’s test for paired samples
                 Cases
           +      −      total
Control +  a      b      a + b
Control −  c      d      c + d
total      a + c  b + d  N

X² = (|b − c| − 1)² / (b + c)
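The sleep example above in R; mcnemar.test() applies the continuity correction by default:

    tab <- matrix(c(4, 9, 3, 16), nrow = 2)   # rows: marijuana yes/no, cols: control yes/no
    mcnemar.test(tab)                          # X^2 = (|b - c| - 1)^2 / (b + c)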
Ordered 2 x k contingency table
score/categories   x1 = 1   ...   xk = k   total
frequency          r1       ...   rk       R = Σ ri
total              n1       ...   nk       N = Σ ni

From the regression approach we get:

X²_trend = [ Σ_{i=1..k} ri xi − R x̄ ]² / { p(1 − p) [ Σ_{i=1..k} ni xi² − N x̄² ] },  df = 1,

with p = R/N and x̄ = Σ_{i=1..k} ni xi / N

An alternative approach is based on Kendall's rank correlation (τ):

X² = ( τ / SE(τ) )²
Coefficients of association
The following coefficients are defined to describe the association of nominal data (categories) from contingency tables (k = min(r, c)).

Contingency coefficient (Pearson), adjusted for the number of rows and columns:
CC = √(k/(k − 1)) · √(χ²/(n + χ²))

Cramer's V: V = √( χ² / (n(k − 1)) )

Phi (Cramer's V in non-square tables): φ = √(χ²/n)

Coefficient of association (Yule): Q = (ad − bc)/(ad + bc)

Eta is a coefficient of nonlinear association, designed for cases where one of the measures is nominal and the other numeric:
η = √( (1/(n − 1)) Σ_{i=1..k} ni (ȳi − ȳ)² / s_y² )
Comparing risks
Relative risk and odds ratio
             Outcome
           +     −     total   risk        odds
Exposure + a     b     a + b   a/(a + b)   a/b
Exposure − c     d     c + d   c/(c + d)   c/d
total      a+c   b+d   N

RR = [a/(a + b)] / [c/(c + d)]   OR = ad/bc

There is another way of analyzing 2 × 2 tables, which involves the comparison of two groups with respect to the risk of some event.

The methods were developed in epidemiology, especially for the analysis of case-control studies.

The parameters of interest are the relative risk (RR) and the odds ratio (OR).
Relative risk
In a prospective study, groups of subjects with different characteristics are followed up to see whether an outcome of interest occurs.

The risks in the two groups (exposed and non-exposed) are a/(a + b) and c/(c + d).

The relative risk RR = [a/(a + b)] / [c/(c + d)]

Under the null hypothesis the expected value of RR is 1.

SE(log RR) = √( 1/a − 1/(a + b) + 1/c − 1/(c + d) )

(1 − α) CI: [log RR − z_{1−α/2} SE(log RR), log RR + z_{1−α/2} SE(log RR)]
Odds ratio
In retrospective case-control studies the selection of subjects is based on the outcome. In this case the relative risk is not a valid estimate.

We can use the odds (a/b) of the outcome in the first group (cases), compare them to the odds (c/d) in the second group (controls), and get the odds ratio OR = ad/bc.

For case-control studies the outcome of interest is usually rare, so the odds ratio offers a method of getting an approximate relative risk.

SE(log OR) = √( 1/a + 1/b + 1/c + 1/d )

(1 − α) CI: [log OR − z_{1−α/2} SE(log OR), log OR + z_{1−α/2} SE(log OR)]
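Both interval estimates are straightforward by hand in R; the cell counts are illustrative:

    n11 <- 12; n12 <- 5; n21 <- 8; n22 <- 15           # a, b, c, d
    OR <- (n11 * n22) / (n12 * n21)
    SE <- sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)
    exp(log(OR) + c(-1, 1) * qnorm(0.975) * SE)        # 95% CI for the OR
    RR  <- (n11 / (n11 + n12)) / (n21 / (n21 + n22))
    SEr <- sqrt(1/n11 - 1/(n11 + n12) + 1/n21 - 1/(n21 + n22))
    exp(log(RR) + c(-1, 1) * qnorm(0.975) * SEr)       # 95% CI for the RR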
Goodness-of-fit
I qq-plot
I Chi-square goodness-of-fit test
I Kolmogorov-Smirnov test (KS-test)
I Shapiro-Wilk test
I Anderson-Darling test
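Several of these are built into R (the Anderson-Darling test is provided by the nortest add-on package). A minimal sketch:

    set.seed(1); x <- rnorm(50)
    qqnorm(x); qqline(x)                   # graphical check
    shapiro.test(x)                        # Shapiro-Wilk
    ks.test(x, "pnorm", mean(x), sd(x))    # KS test; estimated parameters make it conservative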
Correlation and regression
The aim is to find associations between two or more variables (bivariate or multivariate data).

Possible questions are:

1. Is there a relation between the variables?
2. How strong is this relation?
3. What shape does this relation have?
4. Can a variable of interest be predicted by observation of other variables?
Correlation
Correlation is a method which analyzes the strength of the linear agreement between x and y, where x and y are pairwise observations of the same observation unit (bivariate data). As a measure, the (Pearson) correlation coefficient r is used.

Variance
s_x² = (1/(n − 1)) Σ_{i=1..n} (xi − x̄)²  and  s_y² = (1/(n − 1)) Σ_{i=1..n} (yi − ȳ)²

Covariance
Cov(x, y) = s_xy = (1/(n − 1)) Σ_{i=1..n} (xi − x̄)(yi − ȳ)

Correlation
r = s_xy / (s_x s_y) = Σ (xi − x̄)(yi − ȳ) / √( Σ (xi − x̄)² Σ (yi − ȳ)² )
Correlation
Test for linear relation

H0: true correlation ρ = 0

For jointly normally distributed (x, y), the test statistic

T = r √( (n − 2)/(1 − r²) )

is t-distributed with n − 2 degrees of freedom.

With the following transformation the correlation is approximately standard normally distributed:

z′ = 0.5 (ln(1 + r) − ln(1 − r)) and SE = 1/√(N − 3)

(1 − α) CI: [ (e^{2zl} − 1)/(e^{2zl} + 1), (e^{2zu} − 1)/(e^{2zu} + 1) ] with
zl = z′ − z_{1−α/2}/√(N − 3) and zu = z′ + z_{1−α/2}/√(N − 3)
Spearman’s rank correlation
Spearman's rank correlation coefficient rs is obtained by ranking the values of the two variables separately and calculating Pearson's correlation on the ranks of the data. For ties the average rank is used.

When there are no ties, Spearman's rank correlation can be calculated more simply as:

rs = 1 − 6 Σ_{i=1..n} di² / (N³ − N)

where di are the differences in the ranks.
Kendall’s τ
Kendall's rank correlation coefficient τ is the proportion of concordant pairs (ordered the same way) minus the proportion of discordant pairs (ordered the opposite way):

τ = (nc − nd) / (n(n − 1)/2) = S / (n(n − 1)/2)

When there are no ties, nc + nd = n(n − 1)/2.

To allow for perfect correlation when there are ties between subjects for both variables, there is a different version:

τb = S / √( (n(n − 1)/2 − Σ t(t − 1)/2) (n(n − 1)/2 − Σ u(u − 1)/2) )
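Both rank correlations are available through the method argument of cor() and cor.test():

    cor.test(x, y, method = "spearman")   # x, y from the sketch above
    cor.test(x, y, method = "kendall")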
Considerations for calculation of correlation
1. If many variables are tested there are many correlations. As with multiple testing, the significant correlations are overestimated.
2. Spurious correlations for trends over time (divorce rate vs. price of gasoline)
3. Correlation by heterogeneity (voice frequency vs. body height: correlation based on gender)
4. Trivial correlations
5. Confounding variables (number of storks vs. birth rate; Simpson's paradox)
6. Non-linear relations
7. Extreme data points
Regression
We want to describe the relation between a set of data on twocontinuous variables and predict the value of one variable for anindividual when we only know the other variable.
Also the effect of one variable on the other variable is of interest.Therefore the relation is directed and the variables are categorized:
X .. independent, predictor value (plotted on the horizontal x-axis)
Y .. dependent, response or outcome variable (plotted on the verticaly-axis)
Whereas correlation provides the strength and sign of a relation, regression gives a quantitative model of the relation for the dependent variable.
Linear regression
Define a statistical model of regression:

yi = f(xi) + εi,  i = 1, ..., n

where f is the regression function and εi is random noise (error) with E[εi] = 0 and variance σ².

For linear regression the regression function is the linear function:

f(x) = β0 + β1 x

where β0 is the intercept and β1 the slope of the linear function.
Estimation of parameter
Minimum least squares method:

∂/∂β0 Σ_{i=1..n} (yi − β0 − β1 xi)² = 2 Σ_{i=1..n} (β1 xi + β0 − yi) = 0
∂/∂β1 Σ_{i=1..n} (yi − β0 − β1 xi)² = 2 Σ_{i=1..n} xi (β1 xi + β0 − yi) = 0

⇒

β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)² = r s_y/s_x
β̂0 = ȳ − β̂1 x̄

ε̂i = yi − ŷi = yi − β̂0 − β̂1 xi = yi − ȳ − β̂1 (xi − x̄)
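In R the least squares estimates come from lm(); a minimal sketch with simulated data:

    set.seed(1)
    x <- runif(30, 0, 10)
    y <- 2 + 0.5 * x + rnorm(30)
    fit <- lm(y ~ x)
    coef(fit)        # beta0 (intercept) and beta1 (slope)
    summary(fit)     # r^2, residual SE, and the t test of beta1 = 0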
Residual variance
s²_res = Σ (yi − ŷi)² / (n − 2) = Σ (yi − ȳ − β̂1(xi − x̄))² / (n − 2) = (1 − r²) s_y²

The variance can be divided into the residual (unexplained) variance s²_res and the variance explained by the regression (s²_reg):

s_y² (total) = s²_reg (explained) + s²_res (unexplained) = r² s_y² + (1 − r²) s_y²

⇒ r² is a measure of the quality of the regression
Confidence interval
Slope
SE(β̂1) = s_res / √( Σ (xi − x̄)² )
(1 − α) CI: β̂1 ± t_{1−α/2} SE(β̂1)

Estimated ŷ for a given x
SE(ŷ) = s_res √( 1/n + (x − x̄)²/Σ (xi − x̄)² )
(1 − α) CI: ŷ ± t_{1−α/2} SE(ŷ)

Hypothesis test
H0: there is no relation ⇔ β1 = 0
The ratio β̂1 / SE(β̂1) is compared with the t-distribution with df = n − 2.
Prediction interval
s_pred = s_res √( 1 + 1/n + (x − x̄)²/Σ (xi − x̄)² )

(1 − α) prediction interval: ŷ ± t_{1−α/2} s_pred

Here the estimated standard deviation of the individual values y − ŷ at the value x is used, and not the standard error.
Note: the prediction interval is much wider than the confidenceinterval.
The confidence interval and the prediction interval can be added tothe scatter plot around the regression line.
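Continuing the lm() sketch from above, both bands can be computed with confint() and predict():

    confint(fit)                                        # CI for intercept and slope
    new <- data.frame(x = seq(0, 10, length.out = 50))
    head(predict(fit, new, interval = "confidence"))    # CI for the fitted line
    head(predict(fit, new, interval = "prediction"))    # wider prediction interval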
Causality
Correlation and regression are based on similar mathematicalbackground but are distinct methods with a different purpose.
Correlation and regression only give information about association,however, a causal relation cannot be directly inferred. This appliesregardless of the strength of the observed association.
One of the strongest ways to prove causal inference is to conduct anexperiment (i.e., systematically manipulate a variable to study itseffect on another).
Causal inference
Problem
I Confounding variables (see Simpson's paradox)

Methods
I Pearl's do-operator
I Control by selection (stratification): no variation in the confounding variable
I Statistical control: partial correlation, multiple regression model
I Directionality and time
Pearl’s do operator
The idea is to perform an atomic intervention, leaving all othermechanisms unperturbed. This is denoted by do(Xi = xi ) or shortdo(xi ).
Pearl J, Causality - Models, Reasoning, and Inference, Cambridge University Press, 2000
Partial correlation
Partial correlation represents the relationship between two variableswhile controlling for a third variable.
r_{YZ.X} = (r_{ZY} − r_{ZX} r_{XY}) / ( √(1 − r²_{ZX}) √(1 − r²_{XY}) )
Scatter plots
Multiple regression
In observational studies we are interested in the way one variable isinfluenced by several variables
X1, ...,Xk ... predictor variables, explanatory variables
Y ... dependent, response or outcome variable is expressed as acombination of the explanatory variables
It is not necessary for the explanatory variables to be continuous.
Statistical Model:
y = β0 + β1 x1 + β2 x2 + ... + βk xk + ε
where β0...βk are the regression coefficients.
Multiple regression
Multiple regression

Y = Xβ + ε with β = (β0, β1, ..., βk)^T,

Y = (y1, ..., yn)^T, ε = (ε1, ..., εn)^T, and X the n × (k + 1) matrix whose i-th row is (1, xi1, ..., xik).

Minimum squares estimator:

Σ_{i=1..n} εi² = (Y − Xβ)^T (Y − Xβ) → min! ⇒

∂/∂β (Y − Xβ)^T (Y − Xβ) = −2 X^T (Y − Xβ) = 0

β̂ = (X^T X)^{−1} X^T Y
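The closed-form estimator can be verified against lm(); a minimal sketch with two illustrative predictors:

    set.seed(1)
    x1 <- rnorm(40); x2 <- rnorm(40)
    y  <- 1 + 2 * x1 - x2 + rnorm(40)
    X  <- cbind(1, x1, x2)                  # design matrix with intercept column
    solve(t(X) %*% X, t(X) %*% y)           # beta-hat from the normal equations
    coef(lm(y ~ x1 + x2))                   # identical up to numerical error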
Global F test
H0 : β1 = β2 = ... = βk = 0
Source       df          Sum of squares           MSS                  F-value
Regression   k           SSreg = Σ (ŷi − ȳ)²      SSreg/k              MSreg/MSres
Residues     n − k − 1   SSres = Σ (yi − ŷi)²     SSres/(n − k − 1)
Total        n − 1       SSy = Σ (yi − ȳ)²

SSy = SSreg + SSres
Goodness-of-fit
R² = 1 − SSres/SSy = SSreg/SSy = Σ (ŷi − ȳ)² / Σ (yi − ȳ)²

R² · 100% tells how many percent of the variability around the overall mean can be explained by the regression.

The expected value of R² increases as more variables are added to the model, independent of the influence of each variable ⇒

Adjusted R² = 1 − MSres/MSy = 1 − [(n − 1)/(n − k − 1)] (1 − R²)

For simple linear regression R² = r². For multiple regression models R is called the multiple correlation coefficient; however, it must not be interpreted the same way. The F-test is the only way to assess whether a model explains a significant proportion of variability.
Variable selection
A problem arises when the number of variables p is high compared to the number of observations.

⇒ Selection of variables:
I Only select those variables which are significant or most significant in pairwise comparison.
I In case of many strongly correlated variables, include only one of them in the model.
I Include variables with already known influence.
I Exclude correlated variables whose influence is not plausible.
Forward selection
I Start with the null model, or take only those variables which have to be in the model.
I Stepwise add those variables which lead to the greatest reduction of SSres.
I Stop the procedure when SSres cannot be reduced (or when the changes are very small) by adding a new variable.
Backward selection (elimination)
I Start with a model containing all variables (p = k).
I Remove, one by one, the variables which show the least increase of SSres.
I Stop the procedure when SSres would be substantially increased by removing one of the remaining variables.
All subsets regression
Selecting the best model by examining every possible model:

I There are 2^k − 1 subsets {i1, ..., ip} ⊆ {1, 2, ..., k}.
I Calculate for each subset a multiple regression with the variables Xi1, ..., Xip.
I Choose the model with the smallest p and an acceptable SSres.
I Assess the goodness-of-fit with the Cp statistic.
Goodness-of-fit measures
Adjusted R²:
R²adj = 1 − MSres/MSy = 1 − [(n − 1)/(n − p − 1)] (1 − R²)

F-test: comparison of a model with k − 1 variables with a model including an additional variable:
F = [SSres(k − 1) − SSres(k)] / [SSres(k)/(n − k − 1)]

Mallow's Cp:
Cp = SSres(p)/MSres(k) − n + 2(p + 1)

Akaike information criterion (AIC, smaller values are better):
AIC = n log(SSres(p)/n) + 2(p + 1) + n
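R's step() performs stepwise selection using the AIC; a minimal sketch on simulated data, where x3 is pure noise:

    set.seed(1)
    d <- data.frame(x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40))
    d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(40)
    full <- lm(y ~ x1 + x2 + x3, data = d)
    step(full, direction = "backward")               # backward elimination
    step(lm(y ~ 1, data = d), scope = formula(full),
         direction = "forward")                      # forward selection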
Model assumptions
I Linearity: the expected value of Y depends linearly on the explanatory variables
I Homoscedasticity: homogeneity of the variance of the residuals, independent of the explanatory variables
I Normal distribution of the residuals
Methods to test assumptions
Two-way analysis of variance (ANOVA)
In one-way ANOVA the means across only one factor (treatment groups) are compared, whereas in two-way ANOVA the means across two factors are compared.

There are 2 common application cases:

1. Two-way cross classifications (e.g. randomized complete block design, RCBD)
2. Repeated measurements
Two-way cross classifications
[Table: two-way cross classification. Factor A with levels 1, ..., k (rows) and factor B with levels 1, ..., m (columns); each cell contains n repeated measurements xij1, ..., xijn. Ti.. are the row totals, T.j. the column totals, and T... the grand total.]
Two way cross classification model
Statistical model:
xijl = µ + αi + βj + γij + εijl

where the groups of A are i = 1, ..., k, the groups of B are j = 1, ..., m, and the repeated measurements are l = 1, ..., n.

The model describes whether the data xijl can be explained by the overall mean, the effects of the treatments of factor A, the treatments of factor B, and the interdependency (interaction) between A and B.

This is called an interdependency model; if the last term is omitted, it is basically an additive model.
Partitioning the variation
SST = SSA + SSB + SSAB + SSE

SST = Σ_i Σ_j Σ_l x²ijl − T².../N

SSA = Σ_i T²i../(mn) − T².../N

SSB = Σ_j T².j./(kn) − T².../N

SSAB = Σ_i Σ_j T²ij./n − T².../N − SSA − SSB

SSE = SST − SSA − SSB − SSAB
ANOVA table
Variation     df                SSQ     MSS
Factor A      k − 1             SSA     MSA = SSA/(k − 1)
Factor B      m − 1             SSB     MSB = SSB/(m − 1)
Interaction   (k − 1)(m − 1)    SSAB    MSAB = SSAB/[(k − 1)(m − 1)]
Error         N − km            SSE     MSE = SSE/(N − km)
Total         N − 1             SST
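A two-way cross classification in R; aov() with A * B fits both main effects and the interaction (the factors and data are simulated for illustration):

    set.seed(1)
    d <- expand.grid(A = gl(3, 1), B = gl(2, 1), rep = 1:4)
    d$y <- rnorm(nrow(d), mean = as.numeric(d$A))
    summary(aov(y ~ A * B, data = d))   # rows for A, B, A:B and residuals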
Fixed and random effects
Fixed effectsA variable (effect, factor) is considered fixed, when all possible valuescan be observed in the study (e.g. the gender of a patient, type ofcar). Categorical variables are (in general) fixed effects.
Random effectsA variable (effect, factor) is considered random, when only a subsetof a population can be observed in a study (e.g. only the threeuniversities in Graz out of all universities in Austria, a patient withmultiple measurements).
F-values from ANOVA for different effects
Effects    A fixed, B fixed   A random, B random   A fixed, B random   A random, B fixed
Factor A   F = MSA/MSE        F = MSA/MSAB         F = MSA/MSAB        F = MSA/MSE
Factor B   F = MSB/MSE        F = MSB/MSAB         F = MSB/MSE         F = MSB/MSAB
A × B      F = MSAB/MSE       F = MSAB/MSE         F = MSAB/MSE        F = MSAB/MSE
Repeated measurements
This analysis is considered an extension of the paired t-test, since the measurements are made on the same subject and therefore comprise paired data.

An example for this type of analysis is studying the short-term effects of a drug on the heart rate:

Subject   0 min   30 min   60 min   120 min
1         96      92       86       92
2         110     106      108      114
3         89      86       85       83
4         95      78       78       83
5         128     124      118      118
6         100     98       100      94
7         72      68       67       71
8         79      75       74       74
9         100     106      104      102
Statistical model for repeated measurements
Statistical model:
xij = µ + αi + βj(tj) + εij

where tj are the time points (or, in general, measuring points) and βj(tj) is the individual effect of subject j at time point tj.

The question to address is whether the time course is constant (αi = 0) or changes over the subjects (αi ≠ 0).

The analysis methods differ in their assumptions about the individual variations:

1. Multi-variate one-way analysis of variance (MANOVA)
2. Uni-variate model of analysis of variance with repeated measurements
MANOVA
Multi-variate analysis of variance (MANOVA) is used when there are2 or more dependent variables (DV).
MANOVA uses a linear combination of the response variables which maximizes the ratio of between-group and within-group variances of z:

zik = c0 + c1 xi1 + ... + ck xik

If H denotes the hypothesis sums of squares and cross-products matrix and E denotes the error sums of squares and cross-products matrix, then the matrix A can be expressed as A = HE^{−1}.

The eigenvalues λi of A correspond to the factors ci in the linear combination.

MANOVA

Based on the λi the following test statistics can be calculated:

Pillai's trace = trace[H(H + E)^{−1}] = Σ_{i=1..k} λi/(1 + λi)

Hotelling-Lawley's trace = trace(A) = Σ_{i=1..k} λi

Wilks' Λ = |E|/|H + E| = Π_{i=1..k} 1/(1 + λi)

Roy's largest root = max(λi)
These statistics are translated into F statistics in order to test the nullhypothesis.
Uni-variate model of analysis of variance withrepeated measurements
The within subjects design requires homogeneity of treatmentdifference variances. One can create a new set of variables,composed of all possible pairwise differences, and the variances ofthese differences must all be equal in the population. This is calledthe sphericity assumption.
The compound symmetry assumption - a special case of thesphericity assumption - is met if all the covariances (the off-diagonalelements of the covariance matrix) are equal and all the variancesare equal in the populations being sampled.
Since these assumptions often do not hold for more than 2 time points, there are corrections accounting for this, namely the Greenhouse-Geisser and the Huynh-Feldt corrections.
Logistic regression
In many studies the outcome variable of interest is the presence orabsence of some condition, or in general a binary variable.
For such data multiple linear regression cannot be used; a similar approach called multiple linear logistic regression, or simply logistic regression, is used instead.

Here the explanatory variables are used to predict a transformation of the dependent variable and to model a probability; therefore the linear model does not work directly.

The transformation is called logit:

logit(p) = log( p/(1 − p) ), where p/(1 − p) is the odds

and p is the proportion of individuals with the characteristic. The regression model can be formulated as:

log( p/(1 − p) ) = β0 + β1 x1 + β2 x2 + ... + βk xk
Logistic regression
p(x) = e^{β0 + β1 x} / (1 + e^{β0 + β1 x})

p(x) is the logistic distribution function (from which the name is derived) and models the probability that y = 1.

If you want to compare predictions for subjects with or without a particular characteristic (explanatory variable) you have:

log( p1/(1 − p1) ) − log( p2/(1 − p2) ) = log[ p1(1 − p2) / (p2(1 − p1)) ] = log(OR)
With the logit transformation there is now a linear relation betweenthe explanatory variables and the outcome.
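In R the model is fitted with glm() and the binomial family; exponentiated coefficients are the odds ratios. The data are simulated for illustration:

    set.seed(1)
    d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
    d$y <- rbinom(100, 1, plogis(-0.5 + 1.2 * d$x1))
    fit <- glm(y ~ x1 + x2, family = binomial, data = d)
    summary(fit)      # Wald z tests of the coefficients
    exp(coef(fit))    # odds ratios per unit change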
Estimation and tests in logistic regression
Estimation of the regression coefficients βi and standard errors SE(βi) is done by the maximum likelihood method.

To test whether the influence of xi on P(y = 1|xi) is significant, the null hypothesis is H0: βi = 0 and the two-sided alternative hypothesis is βi ≠ 0.

The test statistic is called the Wald statistic:

W = β̂i / SE(β̂i)

which can be approximated by a normal distribution.
Interpretation of coefficients
Linear model
g(x) = log( p/(1 − p) ) = β0 + β1 x

Binary variable x
For x = 0 and x = 1 ⇒ g(0) = β0 and g(1) = β0 + β1
β1 = g(1) − g(0) = log(OR) and OR = e^{β1}

Continuous variable x
If x changes by k units:
∆g = k β1 = log(OR)
e^{kβ1} = (e^{β1})^k = OR^k ⇒ the OR is multiplicative.
Computation
One issue to consider is that for y = 0 or y = 1 the logit(p) is −∞ or ∞.

The method of analysis uses an iterative procedure, whereby the answer is obtained by several repeated cycles of calculation using the maximum likelihood approach.

The k + 1 non-linear equations can sometimes lead to numerical problems. It is recommended that data from at least 20 events and 20 non-events are available for each explanatory variable.

Due to the computational complexity, logistic regression is typically found only in larger statistical packages.
Quality of the prediction
Information from different significant influence factors (explanatory variables) can be combined into the prognostic index (PI):

PI = β1 x1 + β2 x2 + ... + βk xk

As for diagnostic tests, the PI can be divided at different cut-points, and the quality of the prediction can be studied by a receiver operating characteristic.

For every cut-point c one studies how well the outcome is predicted by the binary variable PI > c.

The AUC is a measure of the quality, which can be compared with that of each univariate predictor (explanatory variable).
Discriminant analysis
We wish to find some combination of variables that classifies a large proportion of subjects into the correct group, so that we have a good chance of allocating (diagnosing) new subjects correctly.
The basic idea of discriminant analysis is to find the combination of variables that maximizes the separation between the groups, as with logistic regression.
With more than two groups the groups can be further separated byconstructing a second combination of the same variables which arecalled canonical variates or discriminant functions.
Discriminant analysis
Group   x1    x2    ...   xk
A       96    92    86    92
A       79    75    74    74
A       89    86    85    83
A       95    78    78    83
B       128   124   118   118
B       100   98    100   94
B       110   106   108   114
B       93    87    91    89

The discriminant function can be defined as:

y = β0 + β1 x1 + ... + βk xk

The parameters βi are estimated such that the ratio of the between-groups variance to the within-groups variance is maximal.
Discriminant analysis
Discriminant (function) analysis DA is mathematically identical to asingle factor MANOVA: DA is multivariate analysis of variance(MANOVA) reversed. In MANOVA, the independent variables are thegroups and the dependent variables are the predictors. In DA, theindependent variables are the predictors and the dependent variablesare the groups.
Factor analysis and ordination techniques
Explorative methods to find an elementary explanation model formutual relations.
Overview of common ordination techniques
           indirect                                      direct
linear     Principal Component Analysis (PCA)            Redundancy Analysis (RDA)
unimodal   (Detrended) Correspondence Analysis ((D)CA)   Canonical CA (CCA)
Other common methods in this context include Multi Dimensional Scaling(MDS) and Principal Coordinate Analysis (PCoA).
Principal component analysis (PCA)
Variables are summarized by a linear combination to the principalcomponents.
The origin of the coordinate system is centered to the center of thedata (mean centering).
The coordinate system is rotated to maximize the variance along the first axis ⇒ the first principal component (PC) points in the direction of maximum variance from the origin, and each subsequent PC is orthogonal to the previous ones and describes the maximum residual variance.
This method can be approached by a singular value decomposition of the (m × n) data matrix X.

Principal component analysis (PCA)

X = U W V^T with U^T U = V^T V = V V^T = I

For mean-centered data the covariance matrix C can be calculated as X X^T.

U are the eigenvectors of X X^T, and the eigenvalues are on the diagonal of W, defined by the characteristic equation |C − λI| = 0.

The transformation of the input vectors into the principal component space can be described by Y = XU, where the projection of sample i along the axis of the j-th principal component is:

yij = Σ_{t=1..m} xit utj
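In R, prcomp() computes PCA via exactly this kind of singular value decomposition; X is an illustrative samples × variables matrix:

    set.seed(1)
    X <- matrix(rnorm(20 * 5), nrow = 20)
    pc <- prcomp(X, center = TRUE, scale. = TRUE)
    summary(pc)      # proportion of variance per component
    pc$x[, 1:2]      # sample coordinates on the first two PCs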
PCA for gene expression data
Correspondence analysis (CA)
CA is an extension of the analysis of contingency tables. In this case the states of the descriptors (objects in rows) are compared with those of other descriptors (variables in columns).

The aim of CA is to reduce the contingency table to a few summarizing variables, showing a lack of independence between rows and columns.

The approach is a combination of the χ² statistic and singular value decomposition, similar to that for principal component analysis.

Starting with an r × c contingency table, Ti are the row totals in row i and Tj are the column totals in column j.

The total number is N and the number of observations in row i and column j is nij.
Correspondence analysis (CA)
χ² = (O − E)²/E ⇒

A matrix S with elements sij can be constructed, where

s²ij = ( nij/N − Ti Tj/N² )² / ( Ti Tj/N² )

The matrix S can be singular value decomposed:

S = U W V^T

W is a diagonal matrix, and its diagonal elements are referred to as the singular values of S. We think of them as sorted from the largest to the smallest and denote them by λk.
Correspondence analysis (CA)
The coordinates for sample i in the new space are then given by aik = λk uik/√(Ti/N) for k = 1, ..., J, and the variables are viewed in the same space with variable j given coordinates bjk = λk vjk/√(Tj/N) for k = 1, ..., J.

These coordinates are called principal coordinates.
Overlay of PCA and CA of expression data of 773 genes in 73samples across 5 cell cycle phases in yeast. Three most informativecomponents and coordinates respectively are used.
Data from Spellman P, et al. Mol Biol Cell. 1998;9:3273-3297
Experimental design
Basic principles
I Replication
I Independence and pseudo-replication
I Controls
I Randomization
I Interspersion (blocking, stratification)
I Design types
I Power analysis
Replication
I Reduce the effect of uncontrolled variation (i.e., increase precision)
I Quantify uncertainty
I Increase the power of the significance test (power analysis)
Types of replication
Technical replicates: replicates that share the same sample; i.e. themeasurements are repeated
Biological replicates: replicate measurements from independentbiological samples
Pseudo replicates
I "Incorrect" replication when samples, not treatments, are replicated
I Replicates are not independent
I The Type I error (α) approaches 1 with an increasing number of samples per unit
Hurlbert SH, Ecological Monographs, 1984;54(2):187-211
Controls
I Any treatment against which one or more other treatments are compared
I It may be an "untreated" treatment, a "procedural" treatment, or simply a different treatment
I Controls must undergo an experimental procedure identical to that of the treated units (e.g. injection of a saline solution)
I This allows separation of the effects of different aspects of the experimental procedure
Randomization
I Random sampling from clearly defined populations
I Experimental subjects ("units") should be assigned to treatment groups at random (which does not mean haphazardly)
I One needs to explicitly randomize using a computer, dice, ...
I Avoids bias
I Ensures that statistical inferences are reliable

Interspersion
I Interspersion is necessary to avoid unbalanced effects of unforeseen events (e.g. weather or other defects) between treatment and control.
I Even with randomization, simple segregation can occur (with 3-fold replication the chance is 10%).
Hurlbert SH, Ecological Monographs, 1984;54(2):187-211
Common design types
I Factorial designs
I Completely randomized design
I Complete randomized block design
I Latin square design
I Cross-over designs
I Nested design
I Split-plot design
I Repeated measurements
Factorial design
One factorial experiment
The aim is to study the effect of one single factor (with several levels).

For example, the only interesting factor is drug treatment; all other factors (age, weight, sex, ...) are ignored (but should be kept constant).
Multi-factorial experimentThe design incorporates two or more factors that are crossed witheach other. The term crossed indicates that all combinations of thefactors are included and that every level (group) of each factor occursin combination with every level of the other factors.
Multi-factorial design allows the study of interaction between factors.
Analysis of a two factorial design with two-way ANOVA.
Randomized complete block design (RCBD)
I Treatments are assigned at random within blocks of adjacent subjects, each treatment once per block.
I The number of blocks is the number of replications.
I Any treatment can be adjacent to any other treatment, but not to the same treatment within the block.
I Used to control variation in an experiment by accounting for spatial effects.

Sample layout with 4 treatments (A-D) and 3 blocks (I-III):

Block I     A B C D
Block II    D A B C
Block III   B D C A
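Randomizing such a layout is a one-liner per block in R; treatments and block count are taken from the sample layout above:

    set.seed(42)
    trt <- c("A", "B", "C", "D")
    t(replicate(3, sample(trt)))   # one random treatment order per block (rows)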
Latin square design (LSD)
I Treatments are assigned at random within rows and columns, with each treatment once per row and once per column.
I There are equal numbers of rows, columns, and treatments.
I Useful where the experimenter desires to control variation in two different directions.

Sample layout with 4 treatments (A-D) assigned to 4 rows (I-IV) and 4 columns (1-4):

          1  2  3  4
Row I     A  B  C  D
Row II    C  D  A  B
Row III   D  C  B  A
Row IV    B  A  D  C
Crossover design
An experimental design that combines attributes of latin squares andrepeated measures designs is the crossover design, often used inexperiments that apply multiple treatments to individual organisms.
In its simplest form, the crossover design can be considered as a latinsquare where subjects are one blocking factor (e.g. rows) and timeperiods are a second blocking factor (e.g. columns) and treatmentsare applied to each combination of subject and period using one ofthe latin square randomizations:
            Period 1   Period 2
Subject 1   A          B
Subject 2   B          A
Problematic in this type of design are carryover effects.
Nested design
Multi-factorial experimental designs where a factor (B) is crossed with one factor (C) but nested within another (A).
A second factor (or set of factors) is then applied to whole blocks, withreplicate blocks for each level of this factor.
Split-plot design
Split-plot designs were originally used in agricultural experiments andrepresents a randomized complete block design, with one or morefactors applied to the experimental units within each block.
A second factor (or set of factors) is then applied to whole blocks, withreplicate blocks for each level of this factor.
Note: The units of replication are different for different factors.
Crawley MJ, Statistical Computing, Wiley, 2002:352ff
Keough MJ & Quinn GP, Ecological Applications, 1998;8(1):141-161
Repeated measure designs
Factor A: units of replication, termed "subjects"
Factor B (subjects): nested within A
Factor C: repeated recordings on each subject

A completely randomized design (2-factor design (2x8) with 10 replicates) would require 160 subjects.
Power analysis
There is a relation between the 4 parameters of a significance test: sample size n, significance level α (commonly 0.05), power 1 − β (commonly 80%), and effect size δ = ∆/σ (standardized difference of means).

1. Clearly define the null hypothesis and the alternative hypothesis
2. Identify the statistical model to be applied to the data, the desired power, and the significance level
3. Identify the assumptions of the statistical procedure; obtain a pilot estimate of the variation
4. Specify the effect size (e.g. from other studies of the same biological system)
5. Calculate the sample size
Power analysis
Cohen (1988) suggested values for small, medium, and large standardized differences (δ = 0.2, 0.5, 0.8).

A more useful approach may be to plot the detectable effect size versus the sample size, or the power versus the effect size.

If there are constraints on the size of the experiment or sampling program, an estimate of σ, chosen values for α and β, and the number of observations possible can be used to determine the minimum detectable effect size (MDES).
Experimental design for cDNA microarrays
Churchill GA, Nature Genetics, 2002;32(Suppl):490-495
Types of study design
1. Retrospective studies (of past events), including case-control studies
2. Prospective studies (of ongoing or future events)
3. Cohort studies or epidemiological designs (of ongoing or future events)
4. Clinical trials
Basic structure for different designs
Types of studies
Therapy study
Effectiveness of a drug, new surgery or alternative methods
Design: RCT

Diagnosis study
Validity and reliability of new diagnostic tests
Design: Cross-sectional

Screening study
Investigation of test results
Design: Cross-sectional

Prognosis study
Progress of an early diagnosed disease
Design: Cohort

Causal study
Association between dangerous substances and a disease
Design: Cohort, Case control
Hierarchy of medical studies
Clinical trials
Clinical studies form the class of all scientific approaches to evaluating medical disease prevention, diagnostic techniques, and treatments. Within this class, trials, often called clinical trials, form the subset of clinical studies that evaluate investigational drugs.
I Phase I trials focus on safety of a new investigational medicine.These are the first human trials after successful animal trials.
I Phase II trials are small trials to evaluate efficacy and focus moreon a safety profile.
I Phase III trials are well-controlled trials, the most rigorousdemonstration of a drug’s efficacy prior to federal regulatoryapproval.
I Phase IV trials are often conducted after a medicine is marketedto provide additional details about the medicine’s efficacy and amore complete safety profile.
Clinical trials
The goal in a phase I trial is to identify a maximum tolerated dose(MTD), a dose that has reasonable efficacy (i.e. is toxic enough to killcancer cells) but with tolerable toxicity (i.e. not toxic enough to kill thepatient).
Phase I trials are applied to patients for whom standard treatment has failed and who are at high risk of death in the short term.
In phase II trials the optimal dose (MTD) is applied to a small group ofpatients meeting predefined inclusion criteria (there are alsoexclusion criteria) and the response rate, the proportion orpercentage of patients who respond, is studied.
A second type of phase II trials consist of small comparative trialswhere we want to establish the efficacy of a new drug against acontrol or standard regimen.
Clinical trials
Phase III/IV are larger studies, and the standard is a randomized double-blind controlled trial (the "gold standard").

Controlled: The drug is tested against a control group receiving a placebo or the standard treatment. The size, shape, and procedure should be very similar, to control for psychological and emotional effects.

Randomized: Whether a patient gets the drug or the placebo is assigned randomly.

Stratified randomization: If there are expected confounding variables (e.g. age), patients are stratified and the treatment is randomly assigned within each stratum.

Minimization: A non-random treatment allocation for smaller trials. The allocation is based on the balance of several parameters, so that the (n + 1)-th treatment is assigned based on the sum of the numbers within the stratified variables (e.g. age ≤ 50 or age > 50).
Clinical trials
Double-blind: Blind to the patient and blind to the investigator(Triple-blind means that also regulatory officers/statisticians are”blinded”).
Selection of subjects: Based on inclusion/exclusion criteria
Alternative designs
I Crossover design
I Within group (paired) comparisons
I Sequential design
I Factorial design
I Adaptive design
I Zelen's design
Sample size
Sample size for Phase II trials and surveys:

n = z²_{1−α} p(1 − p) / d²  (response rate)

Sample size for other Phase II trials:

n = z²_{1−α} s² / d²  (continuous endpoint)

n = (z_{1−α} + z_{1−β})² / ( (1/2) ln((1 + r)/(1 − r)) )² + 3  (correlation endpoint)

Phase II designs for selection:

N = 4 z²_{1−α} s² / d²  (continuous endpoint)

N = 4 z²_{1−α} p(1 − p) / (p2 − p1)²  (binary endpoint)

Phase III trials:

N = 4 (z_{1−α} + z_{1−β})² σ² / d²  (comparison of 2 means)

N = 4 (z_{1−α} + z_{1−β})² p(1 − p) / (p2 − p1)²  (comparison of 2 proportions)
Number-needed-to-treat
Experimental event rate: EER = a/(a + b)
Control event rate: CER = c/(c + d)

Relative risk: RR = EER/CER
Relative risk reduction: RRR = (EER − CER)/CER
Absolute risk reduction: ARR = EER − CER
Number-needed-to-treat: NNT = 1/ARR = 1/(EER − CER)
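A minimal worked example in R, with illustrative event rates (here the "event" is improvement, so EER > CER):

    EER <- 0.60; CER <- 0.45
    ARR <- EER - CER        # absolute risk reduction: 0.15
    1 / ARR                 # NNT ~ 6.7, i.e. treat 7 patients per extra event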
Study protocol
I International Conference on Harmonisation of TechnicalRequirements for Registration of Pharmaceuticals for HumanUse (ICH) guidelines for Good Clinical Practise (GCP).
I Formal document outlining the proposed procedures (basicallycontain any information from patient selection criteria toresponsibilities)
I For protocol violations (e.g. patients didn’t take their treatments)the only safe way is to keep those in the analysis as intended(intention-to-treat).
Study safety
Sponsor
Informing the local site investigators of the true historical safety record of the drug; monitoring the results of the study (Data Monitoring Committee (DMC), also known as Data Safety Monitoring Board); collecting adverse event reports; writing the site-specific informed consent

Local site investigator
Conducting the study according to the study protocol; obtaining truly informed consent (risks, potential benefits)

Institutional review board (IRB) or Ethics Committee
Scrutinizes the study for both medical safety and protection of the patients
Regulatory agencies (FDA, EMEA)
Review all study data before allowing the drug to proceed to the next phase; audits for the local site investigator
Medical journals and sites
How to choose a statistical test
Motulsky H, Intuitive Biostatistics, 2nd Ed., Oxford Univ. Press, 2010:pp 387-389
Bayesians vs. Frequentists

Frequentist
The population value is seen as fixed (but unknown); confidence intervals and hypothesis tests are calculated. The entire information comes from the data.

Bayesian
The population mean follows a distribution (prior probability). Data can be used to modify the prior probability distribution, which gives the posterior probability distribution. Here a 95% credible interval (or Bayesian confidence interval) can be constructed, which is narrower than the confidence interval derived from the data alone. Difficulties can arise in deciding on the prior distribution (prior), and some Bayesian methods may lead to intractable computational problems.
Dos and Don’ts
I Don't carry out a significance test, get a large p-value, and then interpret this as meaning that there is no difference.
I A confidence interval for the mean difference would be much better than significance tests. A non-significant difference in 10 subjects cannot be interpreted.
I Quote your p-values correctly to one significant figure (e.g. p = 0.007; do not use p < 0.013, p < 0.01, p > 0.05, or p = NS).
I "Significant" should not be used if you mean "important".
I Don't do direct comparisons of p-values. It is not correct to compare two groups by testing the changes in each one separately. Significance does not depend only on the magnitude, but also on the variability and the sample size. A two sample t-test should be used to compare the log ratios in the two groups.
Dos and Don’ts (cont.)
I Always state whether you are using SD, SE or CI. Avoid ±.
I Give confidence intervals (or SEs) for group means, rather than for comparisons.
I Don't use three-dimensional effects.
I Don't analyze the data as if they were all from the same population, ignoring the fact that the 21 groups of subjects are from 9 different trials.
I Don't do a Chi-square test analysis of ordered categorical data.
Manuscript Writing Guidelines
1. Read the journal's instructions to authors. If they do not cover statistics, use those of one of the major general medical journals.
2. Never, ever, conclude that there is no difference or relationship because it is not significant.
3. Give confidence intervals where you can.
4. Give exact p-values where possible, not p < 0.05 or p = NS, though only one significant figure is necessary.
5. Be clear about what your main hypothesis and outcome variable are. Avoid multiple testing. (Note: this is not feasible nowadays; it should be changed to "Adjust p-values for multiple testing.")
Bland M, How to Upset the Statistical Referee. Talk presented to the LondonHypertension Society, 2004
Manuscript Writing Guidelines (cont.)
6. Get the design right; be clear about blinding and randomization; do a sample size calculation if you can.
7. Be clear about whether you are quoting standard deviations or standard errors; avoid the ± notation.
8. Avoid bar charts with error bars.
9. Check the assumptions of your statistical methods.
10. Give clear descriptions of your statistical methods.
Bland M, How to Upset the Statistical Referee. Talk presented to the LondonHypertension Society, 2004