Variety’s the very spice of life, That gives it all its flavour.
Arthur Berg
Pennsylvania State University, Division of Biostatistics
statgen.psu.edu
Arthur Berg Variety’s the very spice of life, That gives it all its flavour. 1 / 25
About Me
I Assistant Professor, Penn State Biostatistics (August 2009 – present)
    Penn State College of Medicine, Department of Public Health Sciences
    Center for Statistical Genetics
    Penn State Cancer Institute
    Department of Statistics
    Colorectal Surgery
I Assistant Professor, University of Florida Statistics (2007 – 2009)
Penn State College of Medicine, Department of Public Health Sciences
I PhD (Math), MS (Stats) from UCSD (La Jolla, CA); MA (Math), BS (Math) from USC (Los Angeles, CA)
Statistical Modeling
Outcomes
I apple tree yield
I cow’s milk production
I stock price
I temperature
I survival probability
I disease status
general model
outcomes = f(observations) + error
Sum of Squared Errors ($SS_{\text{err}}$) vs. Total Sum of Squares ($SS_{\text{tot}}$)
$$R^2 = 1 - \frac{SS_{\text{err}}}{SS_{\text{tot}}}$$
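As a minimal sketch of the R-squared definition, the two sums of squares can be computed by hand for a simple linear fit. The built-in `cars` data set and the variable names `ss_err` and `ss_tot` are chosen here for illustration only; they are not part of the original slides.

```r
# Fit a simple linear model: outcome = f(observation) + error
fit <- lm(dist ~ speed, data = cars)

# Sum of squared errors: squared residuals around the fitted values
ss_err <- sum(residuals(fit)^2)

# Total sum of squares: squared deviations around the overall mean
ss_tot <- sum((cars$dist - mean(cars$dist))^2)

# R-squared from its definition
r2 <- 1 - ss_err / ss_tot
print(r2)

# The same quantity as reported by summary()
print(summary(fit)$r.squared)
```

The two printed values should agree, since `summary()` uses the same definition for a model with an intercept.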
The Blackbox Model
Statistical Issues
outcomes = f(observations) + error
I Goodness of Fit — How good is f?
    Overfitting: how good is the model under a new dataset?
    Underfitting: are there patterns in the residuals?
I “Large p, small n”: too many observed variables (p), too few outcomes (n)
    multiple hypotheses testing
    multiple estimation rank bias
I Missing data
    Missing Completely At Random (MCAR)
    Missing At Random (MAR)
    Missing Not At Random (MNAR) or non-ignorable
    Censored data
    Truncated data
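The multiple-testing issue in the “large p, small n” setting can be sketched with simulated data. Everything below is an assumption made for illustration: 1000 pure-noise variables, 10 subjects split into two hypothetical groups, and a two-sample t-test per variable. No variable is truly associated with the grouping, so every raw "significant" result is a false positive.

```r
set.seed(1)
n <- 10      # subjects (small n)
p <- 1000    # observed variables (large p)

# Pure noise: no variable truly differs between the two groups
x <- matrix(rnorm(n * p), nrow = n)
group <- rep(c(0, 1), each = n / 2)

# One two-sample t-test per variable
pvals <- apply(x, 2, function(v) t.test(v[group == 0], v[group == 1])$p.value)

# Raw p-values: some fall below 0.05 purely by chance
n_raw <- sum(pvals < 0.05)

# Adjusted p-values: Bonferroni (family-wise error) and
# Benjamini-Hochberg (false discovery rate)
n_bonferroni <- sum(p.adjust(pvals, method = "bonferroni") < 0.05)
n_bh <- sum(p.adjust(pvals, method = "BH") < 0.05)

print(c(raw = n_raw, bonferroni = n_bonferroni, BH = n_bh))
```

Adjusted p-values are never smaller than the raw ones, so the adjusted counts cannot exceed the raw count.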
Goals
Ultimate Goal: Prediction
I Given economic conditions and latest stock prices, predict tomorrow’s stock price
I Given the genetics and phenotypes of the parents, predict the phenotype of the offspring
I Given a patient’s medical history, genetics, and current symptoms, predict the best treatment
Intermediate Goal: Refining the Model
I Given a list of potential predictors, identify the relevant ones
I From the rich array of genomic data, identify genes responsible for the disease
I Research new directions identified from the estimated model
Improving Clinical Medicine
Users' Guides for Clinical Decision Analysis
Are the results valid?
    Were all important strategies and outcomes included?
    Was an explicit and sensible process used to identify, select, and combine the evidence into probabilities?
    Were the utilities obtained in an explicit and sensible way from credible sources?
    Was the potential impact of any uncertainty in the evidence determined?
What are the results?
    In the baseline analysis, does one strategy result in a clinically important gain for patients? If not, is the result a toss-up?
    How strong is the evidence used in the analysis?
    Could the uncertainty in the evidence change the result?
Will the results help me in caring for my patients?
    Do the probability estimates fit my patients' clinical features?
    Do the utilities reflect how my patients would value the outcomes of the decision?
... article on how to perform decision analysis; if you wish to read about that, you should look elsewhere.[13,14]

FRAMEWORK FOR THE USERS' GUIDES

We will approach articles on clinical decision analysis using the same framework introduced in earlier articles in this series, as follows:

Are the Results Valid?
This question addresses whether the strategy recommended by the analysis is truly likely to be the better one for patients. Just as with other types of studies, the validity of a decision analysis is largely determined by the strength of the methods used.

What Are the Results?
The users' guides under this second question consider the size of the expected net benefit from the recommended strategy and our confidence in this estimate of net benefit.

Will the Results Help Me in Caring for My Patients?
If the decision analysis yields valid and important results, you should examine whether these results can be generalized to the patients in your practice.

The Table summarizes the specific guides you should use when addressing these three questions. We will explore the guides by applying them to the study we found in our search. This article will deal with the validity guides, while the next in the series will address the results and applicability.

ARE THE RESULTS VALID?

Were All Important Strategies and Outcomes Included?
At issue here is how well the structure of the model fits the clinical decision you face. Most clinical decision analyses are built as decision trees, and the articles will usually include one or more diagrams showing the structure of the decision tree used for the analysis. Reviewing these diagrams will help you understand the model. You must then judge whether the model fits the clinical problem well enough to be valid.

[Figure: decision tree. Decision node: Cardiomyopathy, with strategies No Prophylaxis and Warfarin. Chance and outcome labels: Embolism, No Embolism, Bleed, No Bleed.]
Structure of a decision tree. Square indicates decision node; circles, chance nodes; triangles, outcome nodes; and lines, strategy pathways. Numbers (when present) by lines indicate probabilities, and by triangles, utilities.

The Figure shows a diagram of a much simplified version of the decision tree for the anticoagulation problem. The clinician has two options for patients with cardiomyopathy, either to offer no prophylaxis or to prescribe warfarin. Either way, patients may or may not develop embolic events. Prophylaxis lowers the chance of embolism but can cause bleeding in some patients. As seen in the Figure, decision trees are displayed graphically, oriented from left to right, with the decision to be analyzed on the left, the compared strategies in the center, and the clinical outcomes on the right. The decision is diagrammed by a square, termed a "decision node." The lines emanating from the decision node represent the clinical strategies being compared. Chance events are diagrammed with circles, called "chance nodes," and outcome states are shown as triangles or as rectangles.

To explore more fully how the model's structure affects its validity, we will highlight two aspects here.

Were All of the Realistic Clinical Strategies Compared?
In a decision analysis, a strategy is defined as a sequence of actions and decisions that are contingent on each other. For instance, the strategy of anticoagulant therapy for a patient includes not only the prescription and the monitoring, but also the adjustment of the warfarin dose for changes in prothrombin time. The authors should specify which decision strategies are being compared (at least two, otherwise there's no decision). Further, the clinical strategies included should be described in enough detail to recognize them as separate and realistic choices. You should satisfy yourself that the clinical strategies you consider important are included in the analysis.

For example, in a decision analysis of the management of suspected herpes encephalitis, the authors included the three strategies available to clinicians then: brain biopsy, empirical vidarabine, or neither.[15] At that time, this model represented the clinical decision well. Since then, however, acyclovir has become available and has been widely used for this disorder. Because the original model did not include an acyclovir strategy, it would no longer accurately portray the decision.

In the anticoagulation example, the analysts studied two clinical strategies, warfarin and no warfarin. This fits quite well the clinical decision you face in the scenario. Note that the decision model does not include a third strategy of using aspirin instead of warfarin. If, when considering the treatment options for this patient, you would seriously consider the use of aspirin instead of warfarin, then you would judge this model as incomplete.

Were All Clinically Relevant Outcomes Considered?
To be useful to clinicians and patients, the decision model should include the outcomes of the disease that matter to patients. Generally speaking, these include not only the quantity of life but also its quality, in measures of disease and disability. Obviously, the specific disorder in question determines which outcomes are clinically relevant. For an analysis of an acute, life-threatening condition, life expectancy might be appropriate as the main outcome measure. But in an analysis of diagnostic strategies for a nonfatal disorder, more relevant outcomes would be discomfort from testing or days of disability avoided. By examining the outcomes used in the analysis, you can discover the viewpoint from which the analyst built the decision model. Clinical decision analyses should be built from the perspective of the patient, that is, should include all the clinical benefits
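How a decision tree of this shape turns probabilities and utilities into a recommendation can be sketched by "folding back" the tree: each chance node takes the probability-weighted average of its branches, and the decision node picks the strategy with the larger expected utility. All probabilities and utilities below are hypothetical values invented for illustration; they are not taken from the article. The sketch also assumes that bleeding occurs only under warfarin and with the same probability whether or not an embolism occurs.

```r
# Hypothetical chance-node probabilities (illustration only)
p_embolism <- c(no_prophylaxis = 0.10, warfarin = 0.04)
p_bleed_warfarin <- 0.05

# Hypothetical utilities for the outcome states (1 = full health)
utility <- c(well = 1.0, embolism = 0.5, bleed = 0.8, embolism_and_bleed = 0.4)

# No prophylaxis: a single chance node (embolism or not)
eu_no_prophylaxis <-
  (1 - p_embolism[["no_prophylaxis"]]) * utility[["well"]] +
  p_embolism[["no_prophylaxis"]] * utility[["embolism"]]

# Warfarin: fold back the bleed chance nodes first, then the embolism node
eu_given_no_embolism <-
  (1 - p_bleed_warfarin) * utility[["well"]] +
  p_bleed_warfarin * utility[["bleed"]]
eu_given_embolism <-
  (1 - p_bleed_warfarin) * utility[["embolism"]] +
  p_bleed_warfarin * utility[["embolism_and_bleed"]]
eu_warfarin <-
  (1 - p_embolism[["warfarin"]]) * eu_given_no_embolism +
  p_embolism[["warfarin"]] * eu_given_embolism

# Decision node: compare the expected utilities of the two strategies
expected_utility <- c(no_prophylaxis = eu_no_prophylaxis, warfarin = eu_warfarin)
print(expected_utility)
best_strategy <- names(which.max(expected_utility))
print(best_strategy)
```

With different hypothetical inputs the preferred strategy can flip, which is the kind of sensitivity the users' guides ask about.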
Personalized Medicine
I Evidence-based medicine
    meta-analysis
    risk-benefit analysis
    randomized controlled trials
    data mining
I -Omic Sciences
    genomics
    proteomics
    metabolomics
    metagenomics
    pharmacogenomics
Translational Research vs Basic Science
Translational Research: “bench-to-bedside”; fast-tracking basic research with emphasis on practical application.
⇒ realizes incremental improvement over short periods of time
Basic Science: relies on theoretical underpinnings and takes a long time.
⇒ leads to breakthroughs or paradigm shifts in practice
Challenges of Translational Research
I Often requires multi-disciplinary teams which can be difficult to establish
I Traditional incentives are geared toward traditional research
I Requires relaxing experimental conditions, leaving one vulnerable to criticism (e.g. journal publications)
Biostatistics’ Home
Tools of the Trade
I R
I SAS, Stata, SPSS, MATLAB, Mathematica, WinBUGS
I LaTeX
I Office (Word, Excel, PowerPoint)
I Sweave (R + LaTeX)
What a Biostatistician Does (A Biased Perspective)
I Research
    statistical methodology
    application/modification of a statistical technique
    computational algorithms
I Collaboration
    Grants
    Papers
I Teaching
    Biostatistics
    Linear Models
    Clinical Trials
    Multivariate Statistics
    Survival Analysis
I Service
Institutional Review Board (IRB)
Case Study: Survival Regression
Data Structure (n = 120)
death: times at death or censoring
treatment: A, B, C, or placebo (30 individuals each)
status: 0=censored, 1=uncensored
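# Set the working directory (this path is specific to the author's machine)
setwd("/Users/berg/Documents/presentations/2010/Bucknell")
# Download the data set
download.file(
"http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/cancer.txt",
"cancer.txt")
library(survival)
# stringsAsFactors=TRUE keeps treatment a factor (no longer the default since R 4.0)
cancer<-read.table("cancer.txt",header=TRUE,stringsAsFactors=TRUE)
attach(cancer)
# Kaplan-Meier survivorship curves, one per treatment group
plot(survfit(Surv(death,status)~treatment),
ylab="Survivorship",xlab="Time", lwd=3,
col=c("red","blue","orange","black"))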
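How a survivorship curve makes use of censored observations can be sketched with a hand-computed Kaplan-Meier estimate. The five observations below are made up for illustration and are not taken from cancer.txt; `status` follows the same coding as above (0 = censored, 1 = uncensored).

```r
# Toy data (hypothetical): follow-up times and censoring indicators
time   <- c(2, 3, 3, 5, 7)
status <- c(1, 1, 0, 1, 0)

# Distinct times at which a death was observed
event_times <- sort(unique(time[status == 1]))

# Number still at risk just before each event time; censored subjects
# count as at risk until the time they are censored
at_risk <- sapply(event_times, function(t) sum(time >= t))

# Number of deaths at each event time
deaths <- sapply(event_times, function(t) sum(time == t & status == 1))

# Kaplan-Meier estimate: running product of conditional survival fractions
surv <- cumprod(1 - deaths / at_risk)

print(data.frame(time = event_times, at_risk = at_risk,
                 deaths = deaths, survival = surv))
```

A censored subject never triggers a drop in the curve, but it does shrink the risk set for later event times, which is the information a naive analysis of the death times throws away.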
Case Study (cont.)
[Figure: survivorship curves by treatment group. x-axis: Time (0 to 40); y-axis: Survivorship (0.0 to 1.0).]
Case Study (cont.)
Do not ignore the censoring like this!
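# Caution: assigning to levels() relabels the existing groups; it does not reorder them.
# relevel(treatment, ref="placebo") changes the reference level without relabeling.
# The output shown on these slides was produced with the line as written.
levels(treatment)<-levels(treatment)[c(4,1,2,3)]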
summary(lm(death~treatment))
Call:
lm(formula = death ~ treatment)
..................
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.633 1.205 7.996 1.07e-12 ***
treatmentDrugA -1.133 1.704 -0.665 0.5073
treatmentDrugB -3.367 1.704 -1.976 0.0505 .
treatmentDrugC -3.833 1.704 -2.250 0.0263 *
Residual standard error: 6.599 on 116 degrees of freedom
Multiple R-squared: 0.05581, Adjusted R-squared: 0.03139
F-statistic: 2.286 on 3 and 116 DF, p-value: 0.08246
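The difference between relabeling and reordering factor levels, which matters when choosing the reference group for a regression, can be sketched on a toy factor. The five-element factor `g` below is hypothetical and only reuses the group names from the case study.

```r
# Toy factor (hypothetical data); levels sort alphabetically:
# DrugA, DrugB, DrugC, placebo
g <- factor(c("DrugA", "DrugB", "DrugC", "placebo", "placebo"))

# Assigning to levels() renames the levels in place: the underlying
# integer codes stay the same, so every observation gets a new label
relabeled <- g
levels(relabeled) <- levels(relabeled)[c(4, 1, 2, 3)]

# relevel() moves "placebo" to the front as the reference level
# and leaves every observation's label unchanged
reordered <- relevel(g, ref = "placebo")

print(data.frame(original  = as.character(g),
                 relabeled = as.character(relabeled),
                 reordered = as.character(reordered)))
print(levels(reordered))
```

In the printed table, the `relabeled` column differs from `original` in every row, while `reordered` matches it exactly.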
Case Study (cont.)
Do not ignore the censoring like this!
fit<-lm(death[status==1]~treatment[status==1])
summary(fit)
Call:
lm(formula = death[status == 1] ~ treatment[status == 1])
................................
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.480 1.399 6.778 1.14e-09 ***
treatment[status == 1]DrugA -1.120 1.978 -0.566 0.5726
treatment[status == 1]DrugB -2.680 1.978 -1.355 0.1788
treatment[status == 1]DrugC -4.242 2.070 -2.049 0.0433 *
Residual standard error: 6.994 on 92 degrees of freedom
Multiple R-squared: 0.0498, Adjusted R-squared: 0.01881
F-statistic: 1.607 on 3 and 92 DF, p-value: 0.1931
Case Study (cont.)
model1<-survreg(Surv(death,status)~treatment,dist="exponential")
summary(model1)
Call:
survreg(formula = Surv(death, status) ~ treatment, dist = "exponential")
Value Std. Error z p
(Intercept) 2.448 0.200 12.238 1.95e-34
treatmentDrugA -0.125 0.283 -0.443 6.58e-01
treatmentDrugB -0.430 0.283 -1.520 1.28e-01
treatmentDrugC -0.333 0.296 -1.125 2.61e-01
Scale fixed at 1
Exponential distribution
Loglik(model)= -310.1 Loglik(intercept only)= -311.5
Chisq= 2.8 on 3 degrees of freedom, p= 0.42
Number of Newton-Raphson Iterations: 4
n= 120
Case Study (cont.)
model2<-survreg(Surv(death,status)~treatment)
summary(model2)
Call:
survreg(formula = Surv(death, status) ~ treatment)
Value Std. Error z p
(Intercept) 2.531 0.1572 16.102 2.47e-58
treatmentDrugA -0.191 0.2193 -0.872 3.83e-01
treatmentDrugB -0.475 0.2186 -2.174 2.97e-02
treatmentDrugC -0.454 0.2313 -1.963 4.96e-02
Log(scale) -0.260 0.0797 -3.264 1.10e-03
Scale= 0.771
Weibull distribution
Loglik(model)= -305.4 Loglik(intercept only)= -308.3
Chisq= 5.8 on 3 degrees of freedom, p= 0.12
Number of Newton-Raphson Iterations: 5
n= 120
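The exponential model is the Weibull model with the scale fixed at 1, so the two fits are nested and can be compared with a likelihood ratio test on one degree of freedom. The sketch below uses only the log-likelihoods printed in the two summaries, which are rounded to one decimal place, so the result is approximate; the variable names are chosen here for illustration.

```r
# Log-likelihoods as printed in the two summaries (rounded values)
loglik_exponential <- -310.1   # model1: scale fixed at 1
loglik_weibull     <- -305.4   # model2: scale estimated

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat <- 2 * (loglik_weibull - loglik_exponential)

# One extra parameter (the scale), hence one degree of freedom
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(c(lr_stat = lr_stat, p_value = p_value))
```

A small p-value here favours the Weibull fit, in line with the Wald test on Log(scale) reported in the Weibull summary.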
Case Study (cont.)
drug<-treatment
levels(drug)[2:4]<-"drug"
model3<-survreg(Surv(death,status)~drug)
anova(model2,model3); summary(model3)
Terms Resid. Df -2*LL Test Df Deviance P(>|Chi|)
1 treatment 115 610.7742 NA NA NA
2 drug 117 612.8627 1 vs. 2 -2 -2.088459 0.351963
Value Std. Error z p
(Intercept) 2.526 0.1593 15.86 1.19e-56
drugdrug -0.353 0.1834 -1.93 5.41e-02
Log(scale) -0.246 0.0792 -3.10 1.92e-03
Scale= 0.782
Weibull distribution
Loglik(model)= -306.4 Loglik(intercept only)= -308.3
Chisq= 3.71 on 1 degrees of freedom, p= 0.054
Number of Newton-Raphson Iterations: 5
n= 120
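The p-value in the anova table can be reproduced from the two printed -2*LL values. This is a minimal sketch using the rounded numbers from the table, so the result agrees with the printed value only to about four decimal places; the variable names are chosen here for illustration.

```r
# -2 * log-likelihood for each model, as printed in the anova table
neg2ll_treatment <- 610.7742   # four-level treatment factor
neg2ll_drug      <- 612.8627   # placebo vs. pooled drug

# Deviance lost by pooling the three drug groups into one
deviance_diff <- neg2ll_drug - neg2ll_treatment

# Pooling removes two parameters, hence two degrees of freedom
p_value <- pchisq(deviance_diff, df = 2, lower.tail = FALSE)

print(c(deviance = deviance_diff, p_value = p_value))
```

The large p-value indicates that the simpler two-level model fits about as well as the four-level one.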
Case Study (cont.)
tapply(predict(model3,type="response"),drug,mean)
placebo drug
12.502711 8.782404
sapply(split(death,drug),mean)
placebo drug
9.633333 6.855556
Poplar Gene Expressions: Multidimensional Scaling