The Right Treatment for the Right Patient (at the Right Time): Personalized Medicine ...davidian/pm_mdacc.pdf · 2015-01-21 · Personalized Medicine and Statistics Marie Davidian

The Right Treatment for theRight Patient (at the Right Time):

Personalized Medicine and Statistics

Marie Davidian

Department of StatisticsNorth Carolina State University

http://www4.stat.ncsu.edu/~davidian

1/47 Personalized Medicine and Statistics

http://www4.stat.ncsu.edu/~davidian

“Personalized Medicine ”

Source of graphic: http://www.personalizedmedicine.com/


“Personalized Medicine ”

Source of graphic: http://www.personalizedmedicine.com/


Why is a statistician talking to you about personalized medicine?

To convince you of the essential role of statistics in the developmentof optimal, evidence-based personalized treatment strategies


Why is a statistician talking to you about personalized medicine?

To convince you of the essential role of statistics in the developmentof optimal, evidence-based personalized treatment strategies


There should be a treatment for that

Modern expectation: A treatment for everything• Drugs• Biologics• Medical devices• Surgical procedures• Behavioral interventions• . . .


Does it work?

Fundamental – the randomized controlled clinical trial:• Evaluation of effectiveness• Comparison of a new treatment to standard of care• Comparison of existing treatments to establish new uses


The controlled clinical trial

Basics of a typical clinical trial:• A sample of subjects with the disease/disorder is recruited• Subjects are randomized to treatments under study⇒ eliminate bias, allow fair comparison

• A (primary) outcome is ascertained for each subject• E.g., overall survival time in cancer, viral load level after 1 year in

human immunodeficiency virus (HIV) infection, death,myocardial infarction, or urgent target vessel revascularizationwithin 30 days in cardiovascular disease



Assessment of effectiveness:• Comparison of some summary measure of outcome

between/among treatments• E.g., the average

• “Is the average outcome if all patients in the population were totake treatment A different from (better than) that if they allinstead were to take treatment B? ”

• Use statistical methods to evaluate the strength of the evidencein the data from the sample supporting a real difference in thepopulation (statistical hypothesis test )



Assessment of effectiveness:• Comparison of some summary measure of outcome

between/among treatments• E.g., the average• “Is the average outcome if all patients in the population were to

take treatment A different from (better than) that if they allinstead were to take treatment B? ”

• Use statistical methods to evaluate the strength of the evidencein the data from the sample supporting a real difference in thepopulation (statistical hypothesis test )


Results

Thus, usually:• Assessment of effectiveness (and regulatory approval ) are

based on a summary measure (e.g., an average) across theentire population

• Statistics is key

Success:• Countless treatments have been evaluated and approved on this

basis• And have benefited numerous patients

However:• All patients are not created equal


Results








Results








Patient heterogeneity



Sources of heterogeneity:• Demographic characteristics• Physiological characteristics• Medical history, concomitant conditions

• Genetic/genomic characteristics

Result: What is “best ” for a patient with one set of characteristicsmight not be “best ” for another• Well-recognized issue for decades• Subgroup analyses• Interest skyrocketed with the advent of the genomic era



Sources of heterogeneity:• Demographic characteristics• Physiological characteristics• Medical history, concomitant conditions• Genetic/genomic characteristics




Sources of heterogeneity:• Demographic characteristics• Physiological characteristics• Medical history, concomitant conditions• Genetic/genomic characteristics



A contrived example

Average outcome in the patient population:• Larger outcomes are better (survival time)• If all patients took treatment A = 9 months• If all patients took treatment B =18 months• Treatment B is better on average

Genetic mutation:• 20% have it, 80% don’t• If all patients took treatment A = (0.2)(25) + (0.8)(5) = 9• If all patients took treatment B = (0.2)(10) + (0.8)(20) = 18• That is, patients with the mutation do much better on treatment A

(25 months vs. 10 months) on average


A contrived example


Genetic mutation:• 20% have it, 80% don’t

• If all patients took treatment A = (0.2)(25) + (0.8)(5) = 9• If all patients took treatment B = (0.2)(10) + (0.8)(20) = 18• That is, patients with the mutation do much better on treatment A



A contrived example


Genetic mutation:• 20% have it, 80% don’t• If all patients took treatment A = (0.2)(25) + (0.8)(5) = 9

• If all patients took treatment B = (0.2)(10) + (0.8)(20) = 18• That is, patients with the mutation do much better on treatment A



A contrived example


Genetic mutation:• 20% have it, 80% don’t• If all patients took treatment A = (0.2)(25) + (0.8)(5) = 9• If all patients took treatment B = (0.2)(10) + (0.8)(20) = 18

• That is, patients with the mutation do much better on treatment A(25 months vs. 10 months) on average


A contrived example


Genetic mutation:• 20% have it, 80% don’t• If all patients took treatment A = (0.2)(25) + (0.8)(5) = 9• If all patients took treatment B = (0.2)(10) + (0.8)(20) = 18• That is, patients with the mutation do much better on treatment A



Personalized medicine

Moral:• While useful for evaluation and approval, summary measures are

not useful for determining how to treat individual patients• Ad hoc search for subgroups could be misleading

(e.g., false discovery )• May be much more complicated than finding a single “mutation ”

(no “magic bullet ”)

Needed: A formal, principled, evidence-based approach


Popular perspective on personalized medicine

Subgroup identification and targeted treatment:• Can we determine subgroups of patients who share certain

characteristics and who are likely to benefit from a particulartreatment?

• Can biomarkers be developed to identify such patients?• In fact, can a new treatment be developed to target a subgroup

that is likely to benefit?• Can clinical trials and approval be focused on particular

subgroups of patients?

Focus: “The right patient for the treatment ”



Subgroup identification and targeted treatment:• Can we determine subgroups of patients who share certain

characteristics and who are likely to benefit from a particulartreatment?

• Can biomarkers be developed to identify such patients?• In fact, can a new treatment be developed to target a subgroup

that is likely to benefit?• Can clinical trials and approval be focused on particular

subgroups of patients?

Focus: “The right patient for the treatment ”



How do we identify “the right patient ?”

Source of graphic: Christoph Meinel, Hasso-Plattner-Institut Potsdam


Another perspective on personalized medicine

Clinical practice: Clinicians make (a series of) treatment decision(s)over the course of a patient’s disease or disorder• Key decision points in the disease/disorder process• Multiple treatment options at each• Accrued information on the patient• Goal: Make the best decision(s) for this individual patient

Focus: “The right treatment for the patient ”

How do we identify “the right treatment ?”


Another perspective on personalized medicine

Clinical practice: Clinicians make (a series of) treatment decision(s)over the course of a patient’s disease or disorder• Key decision points in the disease/disorder process• Multiple treatment options at each• Accrued information on the patient• Goal: Make the best decision(s) for this individual patient

Focus: “The right treatment for the patient ”

How do we identify “the right treatment ?”


Cancer treatment

Two decision points:• Decision 1: Induction chemotherapy (C)• Decision 2: Maintenance treatment (M) for patients who

respond, Salvage chemotherapy (S) for those who don’t• Several options for each• Goal: Prolong survival (outcome)


Clinical decision-making

How are these decisions made?• Clinical judgment• Practice guidelines based on previous studies, expert opinion

• Synthesis of all information on/characteristics of the patient tomake a treatment decision from among the possible options


Clinical decision-making

How are these decisions made?• Clinical judgment• Practice guidelines based on previous studies, expert opinion• Synthesis of all information on/characteristics of the patient to

make a treatment decision from among the possible options


In either case

Needed: A formal , evidence-based approach to• Identifying patient characteristics like “mutation status” for which

the “best ” treatment changes with their value• “Tailoring variables ”• Determining how and in what combination tailoring variables

may be used to identify subgroups and select treatment optionsfor individual patients

• Evidence = data• Statistical design, modeling, and analysis


Cancer treatment

Two decision points• Decision 1: Two induction chemotherapy options C1, C2

• Decision 2: Two maintenance options M1, M2 for patients whorespond, two salvage options S1, S2 for those who don’t

• Goal: Prolong survival (outcome)


How “best” to make these decisions?

Conventional clinical trials:• Most primary analyses focus on a summary measure• Most trials focus on a single decision

How can we challenge this paradigm and move toward making the“best ” decisions for individual patients ?


Using statistics to formalize decision-making

Formalizing clinical decision-making: At any decision• Would like a formal rule that takes as input all available

information on the patient to that point and outputs a treatmentfrom among the possible options

• Decision 1: “If age < 50 years and progesterone receptor level< 10 fmol, give chemotherapy C2; otherwise, give C1”

Dynamic treatment regime: Aka adaptive treatment strategy• A set of such rules, each corresponding to a decision point• Ideally – find the regime that leads to the “best ” possible

outcome for each patient given the available information• Base this on data
















Single decision

Assume:• There is an outcome of interest, Y , e.g., survival time• Large outcomes are good• X = all available information on the patient• Set of treatment options, e.g., {C1,C2} = {0, 1}

Treatment regime: A rule of the form d(X ) such that

d(X ) = 0 or 1 depending on X


Single decision

Assume:• There is an outcome of interest, Y , e.g., survival time• Large outcomes are good• X = all available information on the patient• Set of treatment options, e.g., {C1,C2} = {0, 1}

Treatment regime: A rule of the form d(X ) such that



Dynamic treatment regime


Example regimes: X = {age,PR level}• Rule involving cut-offs

d(X ) = 1 (C2) if age < 50 and PR level < 10= 0 (C1) otherwise

written concisely as

d(X ) = I(age < 50 and PR < 10)

• Rule involving linear combinations

d(X ) = I{age + 8.7 log(PR)− 60 > 0}


Dynamic treatment regime


Example regimes: X = {age,PR level}• Rule involving cut-offs

d(X ) = 1 (C2) if age < 50 and PR level < 10= 0 (C1) otherwise

written concisely as

d(X ) = I(age < 50 and PR < 10)

• Rule involving linear combinations

d(X ) = I{age + 8.7 log(PR)− 60 > 0}


Defining the optimal regime

Clearly: An infinite number of rules/regimes d(X ) are possible• Can we find the “best ?”• That is, “best ” or optimal, regime dopt(X ) among all possible

d(X )?

• What do we mean by “best ?”



Clearly: An infinite number of rules/regimes d(X ) are possible• Can we find the “best ?”• That is, “best ” or optimal, regime dopt(X ) among all possible

d(X )?• What do we mean by “best ?”



Make the “best” decision at the time a patient presents:• Everything we know about a patient is contained in X• For a patient with a particular set of information X , the “best ”

decision is to give the treatment that leads to the largestexpected outcome for such a patient

Thus: Optimal regime dopt should satisfy• If a patient with information X were to receive treatment

according to dopt , his/her expected outcome is as large aspossible

• And if all patients in the population were to follow dopt , theexpected (average) outcome across the entire population shouldbe as large as possible

















Expected outcome: A = treatment received

E(Y |X ,A)

is the expected outcome for patients with characteristics X were theyto receive treatment A• With two options {0, 1}:

dopt(X ) = 0 if E(Y |X ,A = 1) < E(Y |X ,A = 0)= 1 if E(Y |X ,A = 1) > E(Y |X ,A = 0)

• E(Y |X ,A) is the regression of outcome on characteristics andtreatment received (expected outcome given X and A)



Expected outcome: A = treatment received

E(Y |X ,A)

is the expected outcome for patients with characteristics X were theyto receive treatment A• With two options {0, 1}:


• E(Y |X ,A) is the regression of outcome on characteristics andtreatment received (expected outcome given X and A)


Estimating the optimal regime


Data: Recorded (X ,A,Y ) from n patients• From a conventional clinical trial comparing treatments 0 and 1,

baseline information X• Develop a regression model and fit to the data

• E.g., if X = (X (1),X (2)), a linear model (need not be linear)

E(Y |X ,A) = α0 + α1X (1) + A(β0 + β1X (2))

• Or a logistic regression model

logit{E(Y |X ,A) } = α0 + α1X (1) + A(β0 + β1X (2))





baseline information X• Develop a regression model and fit to the data• E.g., if X = (X (1),X (2)), a linear model (need not be linear)

E(Y |X ,A) = α0 + α1X (1) + A(β0 + β1X (2))








E(Y |X ,A) = α0 + α1X (1) + A(β0 + β1X (2))








E(Y |X ,A) = α0 + α1X (1) + A(β0 + β1X (2))






Data: Recorded (X ,A,Y ) from n patients• These models imply

dopt(X ) = 0 if β0 + β1X (2) < 0= 1 if β0 + β1X (2) > 0

• Estimated optimal regime: Substitute estimates for β0, β1

dopt(X ) = I( β0 + β1X (2) > 0 )

• The same idea applies to fancier models


Multiple decisions

Two decisions (cancer example):• Decision 1: X1 = information available, set of treatment options,

e.g., {C1,C2}• Decision 2: X2 = additional information collected, set of

treatment options, e.g., {M1, M2, S1, S2}• X2 includes responder status

Regime: A set of rules d = {d1(X1),d2(X1,A1,X2)}• d1(X1) dictates treatment at decision 1 given information

available at that point• A1 is treatment received at decision 1 (determined by X1)• d2(X1,A1,X2) dictates treatment at decision 2 given all accrued

information at that point


Multiple decisions

Two decisions (cancer example):• Decision 1: X1 = information available, set of treatment options,

e.g., {C1,C2}• Decision 2: X2 = additional information collected, set of

treatment options, e.g., {M1, M2, S1, S2}• X2 includes responder status

Regime: A set of rules d = {d1(X1),d2(X1,A1,X2)}• d1(X1) dictates treatment at decision 1 given information

available at that point• A1 is treatment received at decision 1 (determined by X1)• d2(X1,A1,X2) dictates treatment at decision 2 given all accrued

information at that point


Optimal multi-decision regime

Optimal regime: dopt is such that the expected outcome for a patientwith initial characteristics X1, if s/he were to receive all subsequenttreatments according to dopt , is as large as possible

dopt = {dopt1 (X1),d

opt2 (X1,A1,X2) }

• Can we use data and regression modeling as before?• I.e., consider each decision separately and use data from

separate clinical trials comparing the options at each?• Unfortunately not. . .


Optimal multi-decision regime

Optimal regime: dopt is such that the expected outcome for a patientwith initial characteristics X1, if s/he were to receive all subsequenttreatments according to dopt , is as large as possible

dopt = {dopt1 (X1),d

opt2 (X1,A1,X2) }

• Can we use data and regression modeling as before?• I.e., consider each decision separately and use data from

separate clinical trials comparing the options at each?• Unfortunately not. . .


Complications for multiple decisions

Delayed effects: For example• C1 may yield higher proportion of responders than C2 but may

also have other effects that render subsequent maintenancetreatments (M) less effective in regard to survival

• C1 may not appear best initially but may have enhancedeffectiveness over the long term when followed by M1

• Result – Must use data from a single study (same patients)reflecting the entire sequence of decisions

Data: (X1,A1,X2,A2,Y ) recorded from n patients• A1 = treatment received at decision 1,

A2 = treatment received at decision 2


Complications for multiple decisions

Delayed effects: For example• C1 may yield higher proportion of responders than C2 but may

also have other effects that render subsequent maintenancetreatments (M) less effective in regard to survival

• C1 may not appear best initially but may have enhancedeffectiveness over the long term when followed by M1

• Result – Must use data from a single study (same patients)reflecting the entire sequence of decisions

Data: (X1,A1,X2,A2,Y ) recorded from n patients• A1 = treatment received at decision 1,

A2 = treatment received at decision 2


SMART

SMART: Sequential, Multiple Assignment, Randomized Trial• Randomize subjects at each decision point• Collect not only baseline information X1 but intervening

information X2

• Goal: Provide rich data for estimation of optimal treatmentregimes

Pioneered by Susan Murphy, Phil Lavori, and others


SMART

Schematic of a SMART: Cancer example (randomization at •s)

Cancer

Response

NoResponse

Response

NoResponse

C

C

1

2

M1

M2

M1

M2

S1

S2

S1

S2


Sequential regression (Q-learning)

Estimating dopt : dopt = {dopt1 (X1),d

opt2 (X1,A1,X2) }

• Start at the final decision and work backwards• Backward induction, dynamic programming



Decision 2: Given the patient’s accrued history to this point,determine the optimal rule to be used now• Decision 2 options coded {0, 1}

dopt2 (X1,A1,X2)

= 0 if E(Y |X1,A1,X2,A2 = 1) < E(Y |X1,A1,X2,A2 = 0)= 1 if E(Y |X1,A1,X2,A2 = 1) > E(Y |X1,A1,X2,A2 = 0)

• E(Y |X1,A1,X2,A2) is the regression of outcome on bothtreatments received and accrued characteristics

• Can develop and fit a regression model



Decision 1: Trickier• Must take into account that the optimal rule at decision 2 will be

followed in the future• I.e., dopt

1 (X1) must make the expected outcome as large aspossible when also dopt

2 (X1,A1,X2) will be used to determinetreatment at decision 2

• Mathematically messy; best explained by illustration



Decision 1: Trickier• Must take into account that the optimal rule at decision 2 will be

followed in the future• I.e., dopt

1 (X1) must make the expected outcome as large aspossible when also dopt

2 (X1,A1,X2) will be used to determinetreatment at decision 2

• Mathematically messy; best explained by illustration



Implementation:• Develop a regression model for decision 2, e.g.,

E(Y |X1,A1,X2,A2) = α20 + α21X1 + α22X2 + α23A1X1

+ A2(β20 + β21X2 + β22A1),

fit to the data, and estimate dopt2 (X1,A1,X2)

d2(X1,A1,X2) = I( β20 + β21X2 + β22A1 > 0 )



Implementation:• Develop a regression model for decision 2, e.g.,

E(Y |X1,A1,X2,A2) = α20 + α21X1 + α22X2 + α23A1X1

+ A2(β20 + β21X2 + β22A1),

fit to the data, and estimate dopt2 (X1,A1,X2)

d2(X1,A1,X2) = I( β20 + β21X2 + β22A1 > 0 )



Implementation:• For each subject, form Y , the predicted value with the decision 2

optimal treatment determined by the estimated dopt2 (X1,A1,X2)

substituted for A2 actually received, i.e.,

Y = α20 + α21X1 + α22X2 + α23A1X1

+ dopt2 (X1,A1,X2)(β20 + β21X2 + β22A1)

• Develop a regression model for decision 1 with Y as theoutcome, e.g.,

E(Y |X1,A1) = α10 + α11X1 + A1(β10 + β11X1),

fit to the data, and estimate dopt1 (X1)

d1(X1) = I( β10 + β11X1 > 0 )



Implementation:• For each subject, form Y , the predicted value with the decision 2

optimal treatment determined by the estimated dopt2 (X1,A1,X2)

substituted for A2 actually received, i.e.,

Y = α20 + α21X1 + α22X2 + α23A1X1

+ dopt2 (X1,A1,X2)(β20 + β21X2 + β22A1)

• Develop a regression model for decision 1 with Y as theoutcome, e.g.,

E(Y |X1,A1) = α10 + α11X1 + A1(β10 + β11X1),

fit to the data, and estimate dopt1 (X1)

d1(X1) = I( β10 + β11X1 > 0 )


Result

• An approach exists for estimating optimal treatment regimesfrom appropriate data

• So why isn’t everyone doing this ?


Issues

• This all depends on the chosen regression models• If the regression model(s) are not correct, an estimated optimal

regime could be very different from the true optimal regime• Are there other ways to estimate an optimal regime that are less

sensitive to chosen models?

• If X1, X2 are high-dimensional, rules could be very complicated(and how to select the likely tailoring variables?)

• Distrust of “black boxes” – can we restrict attention only toregimes that are feasible and interpretable in practice?

• Linear models produce rules involving linear combinations ofcharacteristics – what if we want to restrict to rules involvingcut-offs/thresholds ?


Issues



sensitive to chosen models?• If X1, X2 are high-dimensional, rules could be very complicated

(and how to select the likely tailoring variables?)• Distrust of “black boxes” – can we restrict attention only to

regimes that are feasible and interpretable in practice?

• Linear models produce rules involving linear combinations ofcharacteristics – what if we want to restrict to rules involvingcut-offs/thresholds ?


Issues



sensitive to chosen models?• If X1, X2 are high-dimensional, rules could be very complicated

(and how to select the likely tailoring variables?)• Distrust of “black boxes” – can we restrict attention only to

regimes that are feasible and interpretable in practice?• Linear models produce rules involving linear combinations of

characteristics – what if we want to restrict to rules involvingcut-offs/thresholds ?


Issues

• What if more than one outcome is of interest? E.g., balancingefficacy and toxicity in cancer treatment?

• Assessment of uncertainty (e.g., standard errors, confidenceintervals)?

• Design principles (e.g., sample size) for SMARTs?• Once an optimal regime is estimated, should it be compared to

standard of care in a conventional clinical trial?• Can real-time regimes be developed?


Issues

• What if more than one outcome is of interest? E.g., balancingefficacy and toxicity in cancer treatment?

• Assessment of uncertainty (e.g., standard errors, confidenceintervals)?

• Design principles (e.g., sample size) for SMARTs?• Once an optimal regime is estimated, should it be compared to

standard of care in a conventional clinical trial?• Can real-time regimes be developed?


Next steps

• Statistical research is ongoing to address all of these issues (andis way ahead of what is actually being done in practice)

• SMARTs and estimation of optimal regimes have beenembraced in behavioral and mental health research

• But much less so in chronic diseases like cancer, CVD, HIVinfection

(with notable exceptions. . . )• Must challenge the conventional clinical trial paradigm and

myopic view of treatment


Next steps



• But much less so in chronic diseases like cancer, CVD, HIVinfection (with notable exceptions. . . )

• Must challenge the conventional clinical trial paradigm andmyopic view of treatment


Next steps






Next steps






Thought Leaders

2013 MacArthur Fellow Susan Murphy and Jamie Robins


Thought Leaders

Peter Thall


Summary

The goal of truly personalized medicine is still elusive

I hope I have convinced you that statistical science is essential toreaching this goal


Summary

The goal of truly personalized medicine is still elusive

I hope I have convinced you that statistical science is essential toreaching this goal


All things statistics

http://www.worldofstatistics.org/

http://thisisstatistics.org/


http://www.worldofstatistics.org/

http://thisisstatistics.org/

References

Schulte, P. J., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2014). Q-and A-learning methods for estimating optimal dynamic treatmentregimes. Statistical Science, in press.

Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012). Arobust method for estimating optimal treatment regimes. Biometrics68, 1010–1018.

Zhang, B., Tsiatis, A.A., Davidian, M., Zhang, M., and Laber, E.B.,(2012). Estimating optimal treatment regimes from a classificationperspective. Stat 1, 103–114.

Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2013).Robust estimation of optimal dynamic treatment regimes forsequential treatment decisions. Biometrika 100, 681–694.


Documents

The Right Treatment for the Right Patient (at the Right Time): Personalized Medicine ...davidian/pm_mdacc.pdf · 2015-01-21 · Personalized Medicine and Statistics Marie Davidian