Variety’s the very spice of life, That gives it all its flavour.

Arthur Berg
Pennsylvania State University, Division of Biostatistics

statgen.psu.edu

About Me

- Assistant Professor, Penn State Biostatistics (August 2009 – present)
  - Penn State College of Medicine, Department of Public Health Sciences
  - Center for Statistical Genetics
  - Penn State Cancer Institute
  - Department of Statistics
  - Colorectal Surgery
- Assistant Professor, University of Florida Statistics (2007 – 2009)
  - Penn State College of Medicine, Department of Public Health Sciences
- PhD (Math), MS (Stats): UCSD (La Jolla, CA); MA (Math), BS (Math): USC (Los Angeles, CA)


Variation


Statistical Modeling

Outcomes

- apple tree yield
- cow’s milk production
- stock price
- temperature
- survival probability
- disease status

general model

outcomes = f(observations) + error

Sum of Squared Errors ($SS_{\text{err}}$) vs. Total Sum of Squares ($SS_{\text{tot}}$)

$$R^2 = 1 - \frac{SS_{\text{err}}}{SS_{\text{tot}}}$$
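The R² definition can be checked numerically. The following is a minimal sketch on simulated data; the variables `x` and `y` and the linear form of f are illustrative assumptions, not taken from the talk.

```r
# Simulated data (assumption: f is linear and the error is Gaussian)
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)

fit <- lm(y ~ x)

ss_err <- sum(residuals(fit)^2)   # sum of squared errors
ss_tot <- sum((y - mean(y))^2)    # total sum of squares
r2 <- 1 - ss_err / ss_tot

# The hand-computed value should agree with the one summary() reports
all.equal(r2, summary(fit)$r.squared)
```

For a least-squares fit with an intercept, the hand-computed ratio and `summary(fit)$r.squared` are the same quantity.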


The Blackbox Model


Statistical Issues

outcomes = f(observations) + error

- Goodness of Fit: how good is f?
  - Overfitting: how good is the model under a new dataset?
  - Underfitting: are there patterns in the residuals?
- “Large p, small n”: too many observations (predictors, p), too few outcomes (samples, n)
  - multiple hypothesis testing
  - multiple estimation rank bias
- Missing data
  - Missing Completely At Random (MCAR)
  - Missing At Random (MAR)
  - Missing Not At Random (MNAR) or non-ignorable
  - Censored data
  - Truncated data
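The overfitting question (how good is the model under a new dataset?) can be illustrated by comparing fit on the data used for estimation with fit on fresh data. This is a sketch under stated assumptions: the sine-shaped truth, the noise level, and the polynomial degrees are arbitrary choices for illustration, and `r2()` is a hypothetical helper.

```r
set.seed(2)
n <- 30
train <- data.frame(x = runif(n))
train$y <- sin(2 * pi * train$x) + rnorm(n, sd = 0.3)
test <- data.frame(x = runif(n))
test$y <- sin(2 * pi * test$x) + rnorm(n, sd = 0.3)

# R^2 of a fitted model evaluated on an arbitrary data set
r2 <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

simple  <- lm(y ~ poly(x, 3),  data = train)   # modest flexibility
complex <- lm(y ~ poly(x, 15), data = train)   # far more flexible

c(train = r2(simple, train),  test = r2(simple, test))
c(train = r2(complex, train), test = r2(complex, test))
```

The more flexible model can never fit the training data worse than the simpler one, so a high in-sample R² says little by itself; the comparison on `test` is the informative one.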


Goals

Ultimate Goal: Prediction

- Given economic conditions and latest stock prices, predict tomorrow’s stock price
- Given the genetics and phenotypes of the parents, predict the phenotype of the offspring
- Given a patient’s medical history, genetics, and current symptoms, predict the best treatment

Intermediate Goal: Refining the Model

- Given a list of potential predictors, identify the relevant ones
- From the rich array of genomic data, identify genes responsible for the disease
- Research new directions identified from the estimated model


Improving Clinical Medicine

Users' Guides for Clinical Decision Analysis

Are the results valid?
- Were all important strategies and outcomes included?
- Was an explicit and sensible process used to identify, select, and combine the evidence into probabilities?
- Were the utilities obtained in an explicit and sensible way from credible sources?
- Was the potential impact of any uncertainty in the evidence determined?

What are the results?
- In the baseline analysis, does one strategy result in a clinically important gain for patients? If not, is the result a toss-up?
- How strong is the evidence used in the analysis? Could the uncertainty in the evidence change the result?

Will the results help me in caring for my patients?
- Do the probability estimates fit my patients' clinical features?
- Do the utilities reflect how my patients would value the outcomes of the decision?

[...] on how to perform decision analysis; if you wish to read about that, you should look elsewhere.13,14

FRAMEWORK FOR THE USERS' GUIDES

We will approach articles on clinical decision analysis using the same framework introduced in earlier articles in this series, as follows:

Are the Results Valid?
This question addresses whether the strategy recommended by the analysis is truly likely to be the better one for patients. Just as with other types of studies, the validity of a decision analysis is largely determined by the strength of the methods used.

What Are the Results?
The users' guides under this second question consider the size of the expected net benefit from the recommended strategy and our confidence in this estimate of net benefit.

Will the Results Help Me in Caring for My Patients?
If the decision analysis yields valid and important results, you should examine whether these results can be generalized to the patients in your practice.

The Table summarizes the specific guides you should use when addressing these three questions. We will explore the guides by applying them to the study we found in our search. This article will deal with the validity guides, while the next in the series will address the results and applicability.

ARE THE RESULTS VALID?

Were All Important Strategies and Outcomes Included?
At issue here is how well the structure of the model fits the clinical decision you face. Most clinical decision analyses are built as decision trees, and the articles will usually include one or more diagrams showing the structure of the decision tree used for the analysis. Reviewing these diagrams will help you understand the model. You must then judge whether the model fits the clinical problem well enough to be valid.

[Figure: decision tree for a patient with cardiomyopathy. Branch labels: No Prophylaxis, Warfarin; Embolism, No Embolism; Bleed, No Bleed.]
Structure of a decision tree. Square indicates decision node; circles, chance nodes; triangles, outcome nodes; and lines, strategy pathways. Numbers (when present) by lines indicate probabilities, and by triangles, utilities.

The Figure shows a diagram of a much simplified version of the decision tree for the anticoagulation problem. The clinician has two options for patients with cardiomyopathy, either to offer no prophylaxis or to prescribe warfarin. Either way, patients may or may not develop embolic events. Prophylaxis lowers the chance of embolism but can cause bleeding in some patients. As seen in the Figure, decision trees are displayed graphically, oriented from left to right, with the decision to be analyzed on the left, the compared strategies in the center, and the clinical outcomes on the right. The decision is diagrammed by a square, termed a "decision node." The lines emanating from the decision node represent the clinical strategies being compared. Chance events are diagrammed with circles, called "chance nodes," and outcome states are shown as triangles or as rectangles.

To explore more fully how the model's structure affects its validity, we will highlight two aspects here.

Were All of the Realistic Clinical Strategies Compared? In a decision analysis, a strategy is defined as a sequence of actions and decisions that are contingent on each other. For instance, the strategy of anticoagulant therapy for a patient includes not only the prescription and the monitoring, but also the adjustment of the warfarin dose for changes in prothrombin time. The authors should specify which decision strategies are being compared (at least two, otherwise there's no decision). Further, the clinical strategies included should be described in enough detail to recognize them as separate and realistic choices. You should satisfy yourself that the clinical strategies you consider important are included in the analysis.

For example, in a decision analysis of the management of suspected herpes encephalitis, the authors included the three strategies available to clinicians then: brain biopsy, empirical vidarabine, or neither.15 At that time, this model represented the clinical decision well. Since then, however, acyclovir has become available and has been widely used for this disorder. Because the original model did not include an acyclovir strategy, it would no longer accurately portray the decision.

In the anticoagulation example, the analysts studied two clinical strategies, warfarin and no warfarin. This fits quite well the clinical decision you face in the scenario. Note that the decision model does not include a third strategy of using aspirin instead of warfarin. If, when considering the treatment options for this patient, you would seriously consider the use of aspirin instead of warfarin, then you would judge this model as incomplete.

Were All Clinically Relevant Outcomes Considered? To be useful to clinicians and patients, the decision model should include the outcomes of the disease that matter to patients. Generally speaking, these include not only the quantity of life but also its quality, in measures of disease and disability. Obviously, the specific disorder in question determines which outcomes are clinically relevant. For an analysis of an acute, life-threatening condition, life expectancy might be appropriate as the main outcome measure. But in an analysis of diagnostic strategies for a nonfatal disorder, more relevant outcomes would be discomfort from testing or days of disability avoided. By examining the outcomes used in the analysis, you can discover the viewpoint from which the analyst built the decision model. Clinical decision analyses should be built from the perspective of the patient, that is, should include all the clinical benefits [...]

Downloaded from www.jama.com at Penn State Milton S Hershey Med Ctr on February 27, 2010
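A decision tree of the kind described in the excerpt (probabilities on the branches, utilities at the outcome nodes) is evaluated by computing the expected utility of each strategy. The sketch below shows only that arithmetic. Every probability and utility in it is a hypothetical placeholder, not a value from the article, and the two chance nodes are assumed to be independent.

```r
# HYPOTHETICAL inputs, chosen only to illustrate the calculation
p_embolism <- c(no_prophylaxis = 0.10, warfarin = 0.04)
p_bleed    <- c(no_prophylaxis = 0.00, warfarin = 0.03)

# Utility of each outcome state on a 0-1 scale (also hypothetical)
u <- c(well = 1.0, bleed = 0.8, embolism = 0.5, both = 0.4)

# Expected utility of one strategy: probability-weighted sum over outcomes
expected_utility <- function(pe, pb) {
  (1 - pe) * (1 - pb) * u[["well"]] +
    (1 - pe) * pb     * u[["bleed"]] +
    pe * (1 - pb)     * u[["embolism"]] +
    pe * pb           * u[["both"]]
}

eu <- mapply(expected_utility, p_embolism, p_bleed)
eu
names(which.max(eu))   # strategy with the highest expected utility
```

Changing the placeholder inputs and rerunning is a simple way to see how sensitive the preferred strategy is to uncertainty in the evidence, which is the concern raised by the validity guides.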


Personalized Medicine

- Evidence-based medicine
  - meta-analysis
  - risk-benefit analysis
  - randomized controlled trials
  - data mining
- -Omic Sciences
  - genomics
  - proteomics
  - metabolomics
  - metagenomics
  - pharmacogenomics


Translational Research vs Basic Science

Translational Research: “bench-to-bedside”; fast-tracking basic research with emphasis on practical application.
⇒ realizes incremental improvement over short periods of time

Basic Science: relies on theoretical underpinnings and takes a long time.
⇒ leads to breakthroughs or paradigm shifts in practice

Challenges of Translational Research

- Often requires multi-disciplinary teams, which can be difficult to establish
- Traditional incentives are geared toward traditional research
- Requires relaxing experimental conditions, leaving one vulnerable to criticism (e.g. journal publications)


Biostatistics’ Home


Tools of the Trade

- R
- SAS, Stata, SPSS, MATLAB, Mathematica, WinBUGS
- LaTeX
- Office (Word, Excel, PowerPoint)
- Sweave (R + LaTeX)


What a Biostatistician Does (A Biased Perspective)

- Research
  - statistical methodology
  - application/modification of a statistical technique
  - computational algorithms
- Collaboration
  - Grants
  - Papers
- Teaching
  - Biostatistics
  - Linear Models
  - Clinical Trials
  - Multivariate Statistics
  - Survival Analysis
- Service
  - Institutional Review Board (IRB)


Case Study: Survival Regression

Data Structure (n = 120)

death: times at death or censoring

treatment: A, B, C, or placebo (30 individuals each)

status: 0=censored, 1=uncensored

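# Working directory used for the talk; adjust to a local path
setwd("/Users/berg/Documents/presentations/2010/Bucknell")
# Fetch the data set and store it in the working directory
download.file(
  "http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/cancer.txt",
  "cancer.txt")
library(survival)
# stringsAsFactors=TRUE keeps treatment a factor (not the default since R 4.0)
cancer<-read.table("cancer.txt",header=TRUE,stringsAsFactors=TRUE)
attach(cancer)  # later slides refer to death, status, treatment directly
# Kaplan-Meier survivorship curves, one per treatment group
plot(survfit(Surv(death,status)~treatment),
     ylab="Survivorship",xlab="Time", lwd=3,
     col=c("red","blue","orange","black"))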


Case Study (cont.)

[Figure: Kaplan-Meier survivorship curves, one per treatment group. x-axis: Time (0 to 40); y-axis: Survivorship (0.0 to 1.0).]


Case Study (cont.)

Do not ignore the censoring like this!

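# Caution: assigning to levels() renames the existing levels in place
# (data originally coded DrugA now carry the label "placebo", and so on);
# it does not reorder them. relevel(treatment, ref="placebo") would change
# only the reference level. The output shown was produced by the line as written.
levels(treatment)<-levels(treatment)[c(4,1,2,3)]
summary(lm(death~treatment))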

Call:
lm(formula = death ~ treatment)
..................
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)       9.633      1.205   7.996 1.07e-12 ***
treatmentDrugA   -1.133      1.704  -0.665   0.5073
treatmentDrugB   -3.367      1.704  -1.976   0.0505 .
treatmentDrugC   -3.833      1.704  -2.250   0.0263 *

Residual standard error: 6.599 on 116 degrees of freedom
Multiple R-squared: 0.05581, Adjusted R-squared: 0.03139
F-statistic: 2.286 on 3 and 116 DF, p-value: 0.08246


Case Study (cont.)

Do not ignore the censoring like this!

fit<-lm(death[status==1]~treatment[status==1])

summary(fit)

Call:
lm(formula = death[status == 1] ~ treatment[status == 1])
................................
Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                    9.480      1.399   6.778 1.14e-09 ***
treatment[status == 1]DrugA   -1.120      1.978  -0.566   0.5726
treatment[status == 1]DrugB   -2.680      1.978  -1.355   0.1788
treatment[status == 1]DrugC   -4.242      2.070  -2.049   0.0433 *

Residual standard error: 6.994 on 92 degrees of freedom
Multiple R-squared: 0.0498, Adjusted R-squared: 0.01881
F-statistic: 1.607 on 3 and 92 DF, p-value: 0.1931
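Why neither naive approach is safe can be seen in a small simulation. This is a sketch under stated assumptions: exponential survival times with a known mean, uniform censoring times, and the variable names are all chosen for illustration and do not come from the cancer data.

```r
library(survival)

set.seed(3)
n <- 2000
true_mean <- 10
event_time  <- rexp(n, rate = 1 / true_mean)   # true time to death
censor_time <- runif(n, 0, 20)                 # end of follow-up (assumption)

obs_time <- pmin(event_time, censor_time)          # what is actually recorded
status   <- as.integer(event_time <= censor_time)  # 1 = death seen, 0 = censored

naive_all    <- mean(obs_time)               # treats censoring times as deaths
naive_events <- mean(obs_time[status == 1])  # drops censored individuals

# Survival regression uses the censored observations correctly
fit <- survreg(Surv(obs_time, status) ~ 1, dist = "exponential")
model_mean <- unname(exp(coef(fit)))   # exponential mean = exp(intercept)

c(naive_all = naive_all, naive_events = naive_events, survreg = model_mean)
```

Both naive summaries tend to fall below the true mean, because censored times are cut short and long survivors are the ones most likely to be censored. The survival model's estimate uses the censoring information instead of discarding or misreading it.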


Case Study (cont.)

model1<-survreg(Surv(death,status)~treatment,dist="exponential")

summary(model1)

Call:
survreg(formula = Surv(death, status) ~ treatment, dist = "exponential")
                Value Std. Error      z        p
(Intercept)     2.448      0.200 12.238 1.95e-34
treatmentDrugA -0.125      0.283 -0.443 6.58e-01
treatmentDrugB -0.430      0.283 -1.520 1.28e-01
treatmentDrugC -0.333      0.296 -1.125 2.61e-01

Scale fixed at 1

Exponential distribution
Loglik(model)= -310.1   Loglik(intercept only)= -311.5
        Chisq= 2.8 on 3 degrees of freedom, p= 0.42
Number of Newton-Raphson Iterations: 4
n= 120


Case Study (cont.)

model2<-survreg(Surv(death,status)~treatment)

summary(model2)

Call:
survreg(formula = Surv(death, status) ~ treatment)
                Value Std. Error      z        p
(Intercept)     2.531     0.1572 16.102 2.47e-58
treatmentDrugA -0.191     0.2193 -0.872 3.83e-01
treatmentDrugB -0.475     0.2186 -2.174 2.97e-02
treatmentDrugC -0.454     0.2313 -1.963 4.96e-02
Log(scale)     -0.260     0.0797 -3.264 1.10e-03

Scale= 0.771

Weibull distribution
n= 120


Case Study (cont.)

drug<-treatment

levels(drug)[2:4]<-"drug"

model3<-survreg(Surv(death,status)~drug)

anova(model2,model3); summary(model3)

      Terms Resid. Df    -2*LL    Test Df  Deviance P(>|Chi|)
1 treatment       115 610.7742         NA        NA        NA
2      drug       117 612.8627 1 vs. 2 -2 -2.088459  0.351963

Call:
survreg(formula = Surv(death, status) ~ drug)
             Value Std. Error     z        p
(Intercept)  2.526     0.1593 15.86 1.19e-56
drugdrug    -0.353     0.1834 -1.93 5.41e-02
Log(scale)  -0.246     0.0792 -3.10 1.92e-03

Scale= 0.782
n= 120
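The P(>|Chi|) entry in the anova table is a likelihood-ratio test, and it can be reproduced by hand from the two -2*LL values. A minimal sketch, using only numbers printed in the table (the variable names are illustrative):

```r
neg2ll_treatment <- 610.7742   # -2*LL, model with four treatment levels
neg2ll_drug      <- 612.8627   # -2*LL, model with placebo vs. drug only

deviance_diff <- neg2ll_drug - neg2ll_treatment   # increase in deviance
df_diff <- 117 - 115          # difference in residual degrees of freedom

# Upper tail of the chi-squared distribution gives the p-value
p_value <- pchisq(deviance_diff, df = df_diff, lower.tail = FALSE)
c(deviance = deviance_diff, df = df_diff, p = p_value)
```

A large p-value here means that collapsing the three drug levels into one does not fit significantly worse, which is the justification for preferring the simpler `model3`.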


Case Study (cont.)

tapply(predict(model3,type="response"),drug,mean)
  placebo      drug
12.502711  8.782404

sapply(split(death,drug),mean)
 placebo     drug
9.633333 6.855556
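The predicted values can be traced back to the coefficients of `model3`. The sketch below uses the rounded estimates from the printed summary, so it matches the `predict()` output only to about two decimal places. It assumes the usual `survreg` Weibull parameterization, in which `type="response"` returns exp(linear predictor).

```r
intercept <- 2.526    # (Intercept) from summary(model3)
b_drug    <- -0.353   # drugdrug coefficient
scale     <- 0.782    # Scale reported by survreg

# predict(..., type = "response") corresponds to exp(linear predictor)
pred <- c(placebo = exp(intercept), drug = exp(intercept + b_drug))
pred

# Under this parameterization the mean of the fitted Weibull
# distribution is exp(linear predictor) * gamma(1 + scale)
pred * gamma(1 + scale)
```

Either way, the model-based values are larger than the raw group means from `sapply(split(death,drug),mean)`, which treat censoring times as if they were deaths.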


Poplar Gene Expressions: Multidimensional Scaling


Variety’s the very spice of life,
That gives it all its flavour.

William Cowper, The Task (1785)