22
This article was downloaded by: [173.166.127.254] On: 16 March 2016, At: 08:33 Publisher: Institute for Operations Research and the Management Sciences (INFORMS) INFORMS is located in Maryland, USA Management Science Publication details, including instructions for authors and subscription information: http://pubsonline.informs.org An Analytics Approach to Designing Combination Chemotherapy Regimens for Cancer Dimitris Bertsimas, Allison O’Hair, Stephen Relyea, John Silberholz To cite this article: Dimitris Bertsimas, Allison O’Hair, Stephen Relyea, John Silberholz (2016) An Analytics Approach to Designing Combination Chemotherapy Regimens for Cancer. Management Science Published online in Articles in Advance 16 Mar 2016 . http://dx.doi.org/10.1287/mnsc.2015.2363 Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions This article may be used only for the purposes of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval, unless otherwise noted. For more information, contact [email protected]. The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or support of claims made of that product, publication, or service. Copyright © 2016, INFORMS Please scroll down for article—it is on subsequent pages INFORMS is the largest professional society in the world for professionals in the fields of operations research, management science, and analytics. For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org

INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

This article was downloaded by: [173.166.127.254] On: 16 March 2016, At: 08:33Publisher: Institute for Operations Research and the Management Sciences (INFORMS)INFORMS is located in Maryland, USA

Management Science

Publication details, including instructions for authors and subscription information:http://pubsonline.informs.org

An Analytics Approach to Designing CombinationChemotherapy Regimens for CancerDimitris Bertsimas, Allison O’Hair, Stephen Relyea, John Silberholz

To cite this article:Dimitris Bertsimas, Allison O’Hair, Stephen Relyea, John Silberholz (2016) An Analytics Approach to Designing CombinationChemotherapy Regimens for Cancer. Management Science

Published online in Articles in Advance 16 Mar 2016

. http://dx.doi.org/10.1287/mnsc.2015.2363

Full terms and conditions of use: http://pubsonline.informs.org/page/terms-and-conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial useor systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisherapproval, unless otherwise noted. For more information, contact [email protected].

The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitnessfor a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, orinclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, orsupport of claims made of that product, publication, or service.

Copyright © 2016, INFORMS

Please scroll down for article—it is on subsequent pages

INFORMS is the largest professional society in the world for professionals in the fields of operations research, managementscience, and analytics.For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org

Page 2: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

MANAGEMENT SCIENCEArticles in Advance, pp. 1–21ISSN 0025-1909 (print) � ISSN 1526-5501 (online) http://dx.doi.org/10.1287/mnsc.2015.2363

© 2016 INFORMS

An Analytics Approach to Designing CombinationChemotherapy Regimens for Cancer

Dimitris BertsimasSloan School and Operations Research Center, Massachusetts Institute of Technology,

Cambridge, Massachusetts 02142, [email protected]

Allison O’HairStanford Graduate School of Business, Stanford, California 94305, [email protected]

Stephen RelyeaLincoln Laboratory, Massachusetts Institute of Technology, Lexington, Massachusetts 02420, [email protected]

John SilberholzSloan School and Operations Research Center, Massachusetts Institute of Technology,

Cambridge, Massachusetts 02142, [email protected]

Cancer is a leading cause of death worldwide, and advanced cancer is often treated with combinations ofmultiple chemotherapy drugs. In this work, we develop models to predict the outcomes of clinical trials

testing combination chemotherapy regimens before they are run and to select the combination chemotherapyregimens to be tested in new phase II and phase III clinical trials, with the primary objective of improving thequality of regimens tested in phase III trials compared to current practice. We built a database of 414 clinical trialsfor advanced gastric cancer and use it to build statistical models that attain an out-of-sample R2 of 0.56 whenpredicting a trial’s median overall survival (OS) and an out-of-sample area under the curve (AUC) of 0.83 whenpredicting if a trial has unacceptably high toxicity. We propose models that use machine learning and optimizationto suggest regimens to be tested in phase II and phase III trials. Though it is inherently challenging to evaluate theperformance of such models without actually running clinical trials, we use two techniques to obtain estimates forthe quality of regimens selected by our models compared with those actually tested in current clinical practice.Both techniques indicate that the models might improve the efficacy of the regimens selected for testing inphase III clinical trials without changing toxicity outcomes. This evaluation of the proposed models suggests thatthey merit further testing in a clinical trial setting.

Keywords : healthcare: treatment; programming: integer; simulation; statisticsHistory : Received October 1, 2014; accepted September 23, 2015, by Noah Gans, stochastic models and systems.

Published online in Articles in Advance.

1. IntroductionCancer is a leading cause of death worldwide, account-ing for 8.2 million deaths in 2012. This number isprojected to increase, with an estimated 13.1 milliondeaths in 2030 (World Health Organization 2012). Theprognosis for many solid-tumor cancers is grim unlessthey are caught at an early stage, when the tumoris contained and can still be surgically removed. Atthe time of diagnosis, the tumor is often sufficientlyadvanced that it has metastasized to other organs andcan no longer be surgically removed, leaving drugtherapy or best supportive care as the best treatmentoptions.

A key goal of oncology research for advanced can-cer is to identify novel chemotherapy regimens thatyield better clinical outcomes than currently availabletreatments (Overmoyer 2003, Roth 2003). Phase II clin-ical trials are used to evaluate the efficacy of novel

regimens, with a focus on exploring treatments thathave never been previously tested for a disease: inthis work we found that 89.4% of phase II trials foradvanced gastric cancer test a new chemotherapy regi-men. Although some trials evaluate a new drug for aparticular cancer, the majority (84.6% for gastric cancer)instead test novel combinations of previously testeddrugs in different dosages and schedules; in this workwe focus primarily on this type of chemotherapy regi-men. The most effective regimens identified in phase IItrials are then evaluated in phase III studies, which arelarge randomized controlled trials comparing one ormore experimental regimens against a control grouptreated with the best available standard chemother-apy regimen (Friedman et al. 2010b). Treatments thatperform well against standard treatments in phase IIItrials may then be considered new standard regimensfor advanced cancer; identifying such regimens and

1

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 3: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy Regimens2 Management Science, Articles in Advance, pp. 1–21, © 2016 INFORMS

thereby improving the set of treatments available topatients is a key goal of oncology research for advancedcancer (Overmoyer 2003, Roth 2003).

Finding novel, effective chemotherapy treatmentsfor advanced cancer is challenging in part becausethe most effective chemotherapy regimens often con-tain more than one drug. Meta-analyses for a numberof cancers have demonstrated efficacy gains of com-bination chemotherapy regimens over single-agenttreatments (Delbaldo et al. 2004, Wagner 2006), andin this work we found that 80% of all chemotherapyclinical trials for advanced gastric cancer have testedmultidrug treatments. As a result of the large num-ber of different chemotherapy drugs, there are a hugenumber of potential drug combinations that could beinvestigated in a new phase II trial, especially whenconsidering different dosages and dosing schedules foreach drug. However, testing any chemotherapy regimenin a clinical trial is expensive, costing on average morethan $10 million for phase II studies and $20 millionfor phase III trials (Sertkaya et al. 2014); these costsare often incurred either by pharmaceutical compa-nies or by the government. Furthermore, even after aphase II study testing a new regimen has been run, itcan be difficult to determine whether this regimen isa good candidate for testing in a larger phase III trialbecause phase II studies often enroll patient populationsthat are not representative of typical advanced cancerpatients (Friedman et al. 2010b). For these reasons, itis a challenge for researchers to identify effective newcombination chemotherapy regimens.

Our aspiration in this paper is to propose anapproach that could serve as a method for selecting thechemotherapy regimens to be tested in phase II andphase III clinical trials. Because phase III clinical trialsare used to test the most promising chemotherapy regi-mens to date and can directly affect standard clinicalpractice, our central objective in this work is to designtools that can improve the quality of the chemotherapyregimens tested in phase III trials compared to currentpractice. The key contributions of the paper are asfollows.

• Clinical Trial Database. We developed a databasecontaining information about the patient demographics,study characteristics, chemotherapy regimens tested,and outcomes of all phase II and phase III clinical trialsfor advanced gastric cancer from papers published inthe period 1979–2012 (§2). Surprisingly, and to the bestof our knowledge, such a database did not exist beforethis study.

• Statistical Models Predicting Clinical Trial Outcomes.We train statistical models using the results of previousrandomized and nonrandomized clinical trials (§3).We use these models to predict survival and toxicityoutcomes of new clinical trials evaluating regimenswhose drugs have individually been tested before, but

potentially in different combinations or dosages. To ourknowledge, this is the first paper to employ statisticalmodels for the prediction of clinical trial outcomesof arbitrary drug combinations and to perform anout-of-sample evaluation of the predictions.

• Design of Chemotherapy Regimens. We propose andevaluate tools for suggesting novel chemotherapy regi-mens to be tested in phase II studies and for selectingpreviously tested regimens to be further evaluated inphase III clinical trials (§4). Our methodology balancesthe dual objectives of exploring novel chemotherapyregimens and testing treatments predicted to be highlyeffective. To our knowledge, this is the first use ofstatistical models and optimization to design novelchemotherapy regimens based on the results of previ-ous clinical trials.

We summarize the models developed and evaluatedin this paper in Table 1. In §4.4 we approximate thequality of our suggested chemotherapy regimens usingboth simulated clinical trial outcomes and the trueoutcomes of similar clinical trials in our database. In§5 we discuss the next step in evaluating our mod-els: using clinical trials to evaluate the quality of thechemotherapy regimens we suggest.

The approach we propose in this work is related toboth patient-level clinical prediction rules and meta-regressions, though it differs in several important ways.Medical practitioners and researchers in the fields ofdata mining and machine learning have a rich historyof predicting clinical outcomes. For instance, techniquesfor prediction of patient survival range from simpleapproaches like logistic regression to more sophisticatedones such as artificial neural networks and decisiontrees (Ohno-Machado 2001). Most commonly, theseprediction models are trained on individual patientrecords and used to predict the clinical outcome of anunseen patient, often yielding impressive out-of-samplepredictions (Burke 1997, Delen et al. 2005, Lee et al.2003, Hurria et al. 2011, Jefferson et al. 1997). Areas ofparticular promise involve incorporating biomarker andgenetic information into individualized chemotherapyoutcome predictions (Efferth and Volm 2005, Phan et al.2009). Individualized predictions represent a usefultool to patients choosing between multiple treatmentoptions (Zhao et al. 2009, 2012; van’t Veer and Bernards2008) and, when trained on clinical trial outcomes for aparticular treatment, can be used to identify promisingpatient populations on which to test that treatment(Zhao et al. 2013) or to determine if that treatment ispromising for a phase III clinical trial (De Ridder 2005).However, such models do not enable predictions ofoutcomes for patients treated with previously unseenchemotherapy regimens, limiting their usefulness indesigning novel chemotherapy regimens.

The technique of meta-regression involves build-ing models using the effect size of randomized trials

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 4: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy RegimensManagement Science, Articles in Advance, pp. 1–21, © 2016 INFORMS 3

Table 1 A Summary of the Models Developed and Evaluated in This Paper

Model Approach Evaluation techniques

Prediction of clinical trial efficacy and toxicityoutcomes

Statistical models trained on a large databaseof previous clinical trials (§3.2)

Sequential out-of-sample R2, root-mean-square error(RMSE), and area under the curve (AUC)(§3.3), as wellas evaluation of whether models could aid planners inavoiding unpromising trials (§3.4)

Design of novel chemotherapy regimens forevaluation in phase II studies

Integer optimization using our statisticalmodels to select novel chemotherapyregimens with high predicted efficacy andacceptable predicted toxicity (§4.1)

Selection of previously tested chemotherapyregimens for further evaluation in phase IIIclinical trials

Use of our statistical models to identifypreviously tested regimens with highpredicted efficacy and acceptable predictedtoxicity (§4.2)

The simulation and matching metrics, which use bothsimulation and the outcomes of true clinical trials tocompare our suggested chemotherapy regimens againstthose selected in current clinical practice (§4.4)

as the dependent variable and using independentvariables such as patient demographics and informa-tion about the chemotherapy regimen in a particulartrial. These models are used to complement meta-analyses, explaining statistical heterogeneity betweenthe effect sizes computed from randomized clinical tri-als (Thompson and Higgins 2002). Though in structuremeta-regressions are similar to the prediction modelswe build, representing trial outcomes as a functionof trial properties, they are used to explain differ-ences between existing randomized trials, and studyauthors generally do not evaluate the out-of-samplepredictiveness of the models. Like meta-analyses, meta-regressions are performed on a small subset of theclinical trials for a given disease, often containing justa few drug combinations. Even when a wide range ofdrug combinations are considered, meta-regressionstypically do not contain enough drug-related variablesto be useful in proposing new trials. For instance, Hsuet al. (2012) uses only three variables to describe thedrug combination in the clinical trial; new combinationchemotherapy trials could not be proposed using theresults of this meta-regression. Finally, meta-analysesand meta-regressions are typically performed on asubset of published randomized controlled studies,whereas our approach uses data from both randomizedand nonrandomized studies.

Because approaches such as patient-level clinicalprediction rules, meta-analysis, and meta-regressioncannot be readily used to design new chemotherapyregimens to be tested in clinical trials, other methods,collectively termed “preclinical models,’’ are insteadused in the treatment design process. Following com-monly accepted principles for designing combinationchemotherapy regimens (Page and Takimoto 2002),researchers seek to combine drugs that are effectiveas single agents and that show synergistic behaviorwhen combined; drugs that cause the same toxic effectsand that have the same patterns of resistance are notcombined in suggested regimens. Molecular simula-tion is a well-developed methodology for identifying

synergism in drug combinations (Chou 2006), andvirtual clinical trials, which rely on pharmacodynamicand pharmacokinetic models to analyze different drugcombinations, can also be used to suggest new treat-ments (Kleiman et al. 2009). Animal studies and in vitroexperimentation can be used to further evaluate novelchemotherapy regimens; results of these preclinicalstudies are often cited as motivations for phase IIstudies of combination chemotherapy regimens (Chaoet al. 2006, Iwase et al. 2011, Lee et al. 2009). A keylimitation of current preclinical models is that mostdo not incorporate treatment outcomes from actualpatients, whereas the new models we propose in thiswork leverage patient outcomes reported in previousclinical trials.

We believe that the data-driven approaches we pro-pose in this paper would complement existing pre-clinical models. For instance, an in vitro experimentcould be performed to evaluate the antitumor activityof a combination chemotherapy regimen suggested byour model from §4.1. Such experimentation could beused to evaluate and refine chemotherapy regimenssuggested by our models, which would be especiallyimportant when our model suggests a regimen thatcombines drugs that have never been tested togetherin a prior clinical trial. Because our models cannotaccurately predict outcomes of clinical trials testingnew drugs, existing preclinical models would needto be used to design therapies incorporating thesenew drugs. On the other hand, our statistical modelscould be used to predict the efficacy and toxicity ofchemotherapy regimens designed using other preclini-cal models, identifying the most and least promisingsuggested regimens.

In this work, we evaluate our proposed approach ongastric cancer. Not only is this cancer important—gastriccancer is the third leading cause of cancer death inthe world (Torre et al. 2015)—but there is no singlechemotherapy regimen widely considered to be thestandard or best treatment for this cancer (Wagner2006, Wong and Cunningham 2009, NCCN 2014), and

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 5: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy Regimens4 Management Science, Articles in Advance, pp. 1–21, © 2016 INFORMS

Table 2 Definitions of Some Common Chemotherapy Clinical TrialTerms

Term Definition

Arm A group or subgroup of patients in a trial thatreceives a specific treatment

Controlled trial A type of trial in which an experimental treatment iscompared to a standard treatment

Cycle The length of time between repeats of a dosingschedule in a chemotherapy treatment

Exclusion criteria The factors that make a person ineligible fromparticipating in a clinical trial

Inclusion criteria The factors that allow a person to participate in aclinical trial

Phase I study A clinical study focused on identifying safe dosagesfor an experimental treatment

Phase I/II study A study that combines a phase I and phase IIinvestigation

Phase II study A clinical study that explores the efficacy and toxicityof an experimental treatment

Phase III trial A randomized controlled trial that compares anexperimental treatment with an established therapy

Randomized trial A type of trial in which patients are randomlyassigned to one of several arms

Sequential treatment A treatment regimen in which patients transition fromone treatment to another after a prespecifiednumber of treatment cycles

researchers frequently perform clinical trials testingnew chemotherapy regimens for this cancer. We believe,however, that our approach has potential to help inselecting regimens to test in trials for many other dis-eases, and we discuss this as an area of future workin §5.

2. Clinical Trial DatabaseIn this section, we describe the inclusion/exclusionrules we used and the data we collected to build ourdatabase. Definitions of some of the common clinicaltrial terms we use are given in Table 2.

In this study, we seek to include a wide range ofclinical trials, subject to the following inclusion crite-ria: (1) Phase I/II, phase II, or phase III clinical trialsfor advanced or metastatic gastric cancer;1 (2) trialspublished no later than March 2012, the study cutoffdate; and (3) trials published in the English language.Notably, these criteria include nonrandomized clini-cal trials, unlike meta-analyses, which typically onlyinclude randomized studies. Although including non-randomized trials provides us with a significantlylarger set of clinical trial outcomes and the ability togenerate predictions for a broader range of chemother-apy drug combinations, this comes at the price ofneeding to control for differences in demographics andother factors between different clinical trials.

1 Clinical trials for gastric cancer often contain patients with cancer ofthe gastroesophageal junction or the esophagus due to the similaritiesamong these three types of cancer. We include studies as long as allpatients have one of these three forms of cancer.

Exclusion criteria were as follows: (1) trials testingsequential treatments; (2) trials that involve the applica-tion of radiotherapy;2 (3) trials that apply chemotherapyfor earlier stages of cancer, when the disease can stillbe cured; and (4) trials to treat gastrointestinal stromaltumors, a related form of cancer.

To locate candidate papers for our database, weperformed searches on PubMed, the Cochrane Cen-tral Register of Controlled Trials, and the CochraneDatabase of Systematic Reviews. In the Cochrane sys-tems, we searched for either MeSH term “StomachNeoplasms” or MeSH term “Esophageal Neoplasms”with the qualifier “Drug Therapy.” In PubMed, wesearched for a combination of the following keywordsin the title: “gastr∗” or “stomach”; “advanced” or“metastatic”; and “phase” or “randomized trial” or“randomised trial.” A single individual reviewed thesesearch results, and these searches yielded 350 clinicaltrials that met the inclusion criteria for this study.

After this search through medical databases, we fur-ther expanded our set of papers by searching throughthe references of papers that met our inclusion criteria.This reference search yielded 64 additional papersthat met the inclusion criteria for this study. In total,our literature review yielded 414 clinical trials testing495 treatment arms that we deemed appropriate forour approach. Since there are often multiple paperspublished regarding the same clinical trial, we verifiedthat each clinical trial included was unique.

2.1. Manual Data CollectionA single individual manually extracted data from clin-ical trial papers and entered extracted data valuesinto a database. Values not reported in the clinicaltrial report were marked as such in the database. Weextracted clinical trial outcome measures of interestthat capture the efficacy and toxicity of each treatment.Several measures of treatment efficacy (e.g., tumorresponse rate, median time until tumor progression,median survival time) are commonly reported in clini-cal trials. A review of the primary objectives of thephase III trials in our database indicated that for themajority of these trials (60%), the primary objectivewas to demonstrate improvement in the median overallsurvival (OS)—the length of time from enrollment inthe study until death—of patients in the treatmentgroup. As a result, this is the metric we have chosen asour measure of efficacy.3 To capture the toxic effects oftreatment, we also extracted the fraction of patients

2 Radiotherapy is not recommended for metastatic gastric cancerpatients (NCCN 2014), and through PubMed and Cochrane searchesfor stomach neoplasms and radiotherapy, we found only threeclinical trials using radiotherapy for metastatic gastric cancer.3 The full survival distribution of all patients, which enables thecomputation of metrics such as 6-month and one-year survivalrates, was available for only 348/495 (70.3%) of treatment arms.

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 6: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy RegimensManagement Science, Articles in Advance, pp. 1–21, © 2016 INFORMS 5

experiencing any toxicity at Grade 3 or 4, designatingsevere, life-threatening, or disabling toxicities (NationalCancer Institute 2006).

For each drug in a given trial’s chemotherapy treat-ment, the drug name, dosage level for each application,number of applications per cycle, and cycle lengthwere collected. We also extracted many covariates thathave been previously investigated for their effects onresponse rate or overall survival in prior chemotherapyclinical trials for advanced gastric cancer. To limit thenumber of missing values in our database, we limitedourselves to variables that are widely reported in clinicaltrials. These variables are summarized in Table 3.

We chose not to collect many less commonly reportedcovariates that have also been investigated for theireffects on response and survival in other studies, includ-ing cancer extent, histology, a patient’s history of prioradjuvant therapy and surgery, and further details ofpatients’ initial conditions, such as their baseline biliru-bin levels or body surface areas (Ajani et al. 2010, Banget al. 2010, Kang et al. 2009, Koizumi et al. 2008). How-ever, the variables we do collect enable us to control forpotential sources of endogeneity, in which patient andphysician decision rules in selecting treatments mightlimit the generalizability of model results. For example,we collect performance status, a factor used by physi-cians in selecting treatments (NCCN 2014). Althoughother factors, such as comorbidities and patient prefer-ences for toxicities, are important in treatment decisionsfor the general population (NCCN 2014), clinical trialsuniformly exclude patients with severe comorbidities,and toxicity preferences do not affect actual survival ortoxicity outcomes. The only other treatment decisionwe do not account for in our models is that patientswith HER2-positive cancers should be treated with thedrug trastuzumab (NCCN 2014), whereas this treat-ment is ineffective in other patients. We address thisissue by excluding trastuzumab from the combinationchemotherapy regimen suggestions we make in §4.

In Table 3, we record the patient demographics wecollected as well as trial outcomes. We note that theset of toxicities reported varies across trials, and thatthe database contains a total of 9,592 toxicity entries,averaging 19.4 reported toxicities per trial arm.

2.2. An Overall Toxicity ScoreAs described in §2.1, we extracted the proportion ofpatients in a trial who experience each individualtoxicity at Grade 3 or 4. In this section, we present amethodology for combining these individual toxicityproportions into a clinically relevant score that captures

Meanwhile, the median OS was available for 463/495 (93.5%) oftreatment arms. Given the broader reporting of median OS coupledwith the established use of median OS as a primary endpoint inphase III trials, we have chosen this metric as our central efficacymeasure.

Table 3 Patient Demographic, Study Characteristic, and OutcomeVariables Extracted from Gastric Cancer Clinical Trials

Average %Variable value Range reported

Patient demographicsFraction male 0072 0.29–1.00 9708Fraction of patients with 0014 0.00–1.00 9802prior palliative chemotherapy

Median age (years) 59056 46–80 9900Mean performance status a 0086 0.11–3.00 7908Fraction of patients with 0089 0.00–1.00 9409primary tumor in the stomach

Fraction of patients with primary 0008 0.00–1.00 9403tumor in the gastroesophagealjunction

Study characteristicsFraction of study authors from Country 0.00–1.00 9508each country (11 different dependentvariables for countries in atleast 10 trial arms)b

Fraction of study authors 0042 0.00–1.00 9508from an Asian countryb

Number of patients 5309 11–521 10000Publication year 2003 1979–2012 10000

OutcomesMedian overall survival (months) 902 1.8–22.6 9305Incidence of every Grade 3/4 Toxicity dependent c

or Grade 4 toxicity

Note. These variables, together with the drug variables, were inputted into adatabase.

aThe mean Eastern Cooperative Oncology Group (ECOG) performance statusof patients in a clinical trial, on a scale from 0 (fully active) to 5 (dead). See§A.1 in Appendix A for details.

bThe studies that did not report this variable instead reported affiliatedinstitutions without linking authors to institutions. The proportion of authorsfrom an Asian country serves as a proxy to identify study populations withpatients of Asian descent, who are known to have different treatment outcomesthan other populations.

cSee §A.2 in Appendix A for details on data preprocessing for blood toxicities.

the overall toxicity of a treatment. The motivation foran overall toxicity score is that there are 370 differentpossible averse events from cancer treatments (NationalCancer Institute 2006). Instead of building a model foreach of these toxicities, some of which are significantlymore severe than others, we use an overall toxicityscore.

To gain insight into the rules that clinical decisionmakers apply in deciding whether a treatment hasan acceptable level of toxicity, we refer to guidelinesestablished in phase I clinical trials. The primary goalof these early studies is to assess drugs for safety andtolerability on small populations and to determine anacceptable dosage level to use in later trials (Golan et al.2008). These trials enroll patients at increasing dosagelevels until the toxicity becomes unacceptable. The“Patients and Methods” sections of phase I trials specifya set of so-called “dose-limiting toxicities’’ (DLTs). Ifa patient experiences any one of the toxicities in thisset at the specified grade, he or she is said to haveexperienced a DLT. When the proportion of patients

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 7: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy Regimens6 Management Science, Articles in Advance, pp. 1–21, © 2016 INFORMS

with a DLT exceeds a predetermined threshold, thetoxicity is considered “too high,” and a lower dose isindicated for future trials. From these phase I trials,we can learn the toxicities and grades that clinicaltrial designers consider the most clinically relevantand design a composite toxicity score to representthe fraction of patients with at least one DLT duringtreatment.

Based on a review of the 20 clinical trials meeting ourinclusion criteria that also presented a phase I study(so-called combined phase I/II trials), we identified thefollowing set of DLTs to include in the calculation ofour composite toxicity score:

• Any Grade 3 or Grade 4 nonblood toxicity, excludingalopecia, nausea, and vomiting. Of the 20 trials reviewed,18 stated that all Grade 3/4 nonblood toxicities areDLTs, except some specified toxicities. Alopecia wasexcluded in all 18 trials, and nausea and vomitingwere excluded in 12 (67%). The next most frequentlyexcluded toxicity was anorexia, which was excluded in5 trials (28%).

• Any Grade 4 blood toxicity. Of the 20 trials reviewed,17 (85%) defined Grade 4 neutropenia as a DLT, 16(80%) defined Grade 4 thrombocytopenia as a DLT,7 (35%) defined Grade 4 leukopenia as a DLT, and4 (20%) defined Grade 4 anemia as a DLT. Only onetrial defined Grade 3 blood toxicities as DLTs, so wechose to exclude this level of blood toxicity from ourdefinition of DLT.

The threshold for the proportion of patients with aDLT that constitutes an unacceptable level of toxicityranges from 33% to 67% over the set of phase I trialsconsidered, indicating the degree of variability amongdecision makers regarding where the threshold shouldbe set for deciding when a trial is “too toxic.” Inthis work we use the median value of 0.5 to identifytrials with an unacceptably high proportion of patientsexperiencing a DLT. Details on the computation ofthe proportion of patients experiencing a DLT arepresented in §A.3 of Appendix A. The 372 clinical trialarms in the data set with nonmissing median OS andDLT proportion are plotted in Figure 1. This figureshows that a typical clinical trial in our database has amedian OS between 5 months and 15 months, and aproportion of patients with a DLT between 0 and 0.75.

3. Statistical Models Predicting ClinicalTrial Outcomes

This section describes the development and testing ofstatistical models that predict the outcomes of clinicaltrials. These models are capable of taking a proposedclinical trial involving chemotherapy drugs that havebeen seen previously in different combinations andgenerating predictions of patient outcomes. In contrastwith meta-analysis and meta-regression, whose primary

Figure 1 Results of the 372 Clinical Trial Arms in Our Database withNonmissing Median OS and DLT Proportion

0

5

10

15

20

0

Proportion of patients with a DLT

Med

ian

over

all s

urvi

val

Number ofpatients

20

50

100

200

500

0.25 0.50 0.75 1.00

Note. The size of a point is proportional to the number of patients in thatclinical trial arm.

aim is the synthesis of existing trials, our objectiveis accurate prediction on unseen future trials (out-of-sample prediction).

3.1. Data and VariablesWe used the data we extracted from published clinicaltrials described in Table 3 together with data about thedrug therapy tested in each trial arm to develop thestatistical models. This data can be classified into fourcategories: patient demographics, study characteristics,chemotherapy treatment, and trial outcomes.

One challenge of developing statistical models usingdata from different clinical trials comes from the patientdemographic data. The patient populations can varysignificantly from one trial to the next. For instance,some clinical trials enroll healthier patients than others,making it difficult to determine whether differences inoutcomes across trials are actually due to different treat-ments or only differences in the patients. To account forthis, we include as independent variables in our modelall of the patient demographic and study characteristicvariables listed in Table 3. The reporting frequencies foreach of these variables is given in Table 3, and missingvalues are replaced by their variable means or esti-mates described in §A.1 of Appendix A before modelbuilding. In total, we included 20 patient demographicand study characteristic variables in our models.4

For each treatment protocol we also define a setof variables to capture the chemotherapy drugs usedand their dosage schedules. There exists considerablevariation in dosage schedules across chemotherapy

4 Variables include the fraction of patients who are male, the fractionof patients with prior palliative chemotherapy, the median patientage, the mean ECOG performance status of patients, the fractionof patients with a primary tumor in the stomach, the fraction ofpatients with a primary tumor in the gastroesophageal junction,the fraction of study authors from each country (11 total variables),the fraction of study authors from an Asian country, the number ofpatients in the study, and the study’s publication year.

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 8: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy RegimensManagement Science, Articles in Advance, pp. 1–21, © 2016 INFORMS 7

trials. For instance, consider two different trials thatboth use the common drug 5-fluorouracil:5 in the first,it is administered at 3,000 mg/m2 once a week, and inthe second, at 200 mg/m2 once a day. To allow for thepossibility that these different schedules might leadto different survival and toxicity outcomes, we definevariables that describe not only whether or not the drugis used (a binary variable), but we also define variablesfor both the instantaneous and average dosages for eachdrug in a given treatment. The instantaneous dose isdefined as the dose of drug d administered each day dis given to patients, and the average dose of a drug dis defined as the average dose of d delivered eachweek. We do not encode information about loadingdosages, which are only given during the first cycleof a chemotherapy regimen, and we use the averageinstantaneous dosage if a drug is given at differentdosages on different days during a cycle. In total, weincluded 72 drugs in our models, which are listed inthe electronic companion (available as supplementalmaterial at http://dx.doi.org/10.1287/mnsc.2015.2363).As a result, we included 216 drug-related independentvariables in our models.

Last, for every clinical trial arm we define outcomevariables to be the median overall survival and thecombined toxicity score defined in §2.2. Trial armswithout an outcome variable (and for which we cannotreplace the value by estimates described in §§A.2 andA.3 of Appendix A) are removed before building ortesting the corresponding models.

3.2. Statistical ModelsWe implement and test several statistical learning tech-niques to develop models that predict clinical trialoutcomes. Information extracted from results of pre-viously published clinical trials serve as the trainingdatabase from which the model parameters are learned.Then, given a vector of inputs corresponding to patientcharacteristics and chemotherapy treatment variablesfor a newly proposed trial, the models produce predic-tions of the outcomes for the new trial.

The first class of models we consider are regularizedlinear regression models. If we let x represent a mean-centered, unit-variance vector of inputs for a proposedtrial (i.e., patient, study, and treatment variables) andy represent a particular outcome measure we wouldlike to predict (median OS or DLT proportion), thenthis class of models assumes a relationship of theform y = Â′x+�0 + �, for some unknown vector ofcoefficients Â, intercept �0, and error term �. We assumethat the noise terms �i are independent with varianceof the form V 4�i5 = �2n−1

i , where � is an unknownconstant and ni is the number of patients in trialarm i. We adjust for this expected heteroskedasticity

5 See Lutz et al. (2007) and Thuss-Patience et al. (2005).

by assigning weight wi = ni/n̄ to each trial arm i forall linear models, where n̄ is the average number ofpatients in a clinical trial arm. It is well known that, insettings with a relatively small ratio of data samples topredictor variables, regularized models help to reducethe variability in the model parameters. We obtainestimates of the regression coefficients Â̂ and �̂0 byminimizing the following objective:

minÂ̂1 �̂0

{ N∑

i=1

wi4Â̂′xi + �̂0 − yi5

2+��Â̂�

pp

}

1 (1)

where N is the number of observations in the trainingset and � is a regularization parameter that limits thecomplexity of the model and prevents overfitting to thetraining data, thereby improving prediction accuracyon future unseen trials. We choose the value of � fromamong a set of 50 candidates through 10-fold crossvalidation on the training set.6

The choice of norm p leads to two different algo-rithms. Setting p = 2 yields the more traditional ridgeregression algorithm (Hoerl and Kennard 1970), popu-lar historically for its computational simplicity. Morerecently, the choice of p = 1, known as the lasso, hasgained popularity because of its tendency to inducesparsity in the solution (Tibshirani 1996). We presentresults for both variants below, as well as results forunregularized linear regression models.

The use of regularized linear models provides signif-icant advantages over more sophisticated models interms of simplicity and ease of interpretation. Never-theless, there is a risk that they will miss significantnonlinear effects and interactions in the data. Therefore,we also implement and test two additional techniquesthat are better suited to handle nonlinear relationships:random forests (RFs) and support vector machines(SVMs). For random forests (Breiman 2001), we use thenominal values recommended by Hastie et al. (2009) forthe number of trees to grow (500) and minimum nodesize (5). The number of variable candidates to sampleat each split is chosen through 10-fold cross validationon the training set.7 For SVMs, following the approachof Hsu et al. (2003), we adopt the radial basis functionkernel and select the regularization parameter C andkernel parameter � through 10-fold cross validation onthe training set.8

6 Candidate values of � are exponentially spaced between �max/104

and �max. We take �max to be the smallest value for which all fittedcoefficients Â̂ are (numerically) 0.7 Candidate values are chosen from among exponentially spacedvalues 46105−44v/3571 6105−34v/3571 0 0 0 1 610524v/3575, where v is the totalnumber of input variables and 6 · 7 denotes rounding to the nearestinteger.8 Candidate values are chosen from an exponentially spacedtwo-dimensional grid of candidates 4C = 2−512−31 0 0 0 12151� =

2−1512−131 0 0 0 1235.

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 9: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy Regimens8 Management Science, Articles in Advance, pp. 1–21, © 2016 INFORMS

All models were built and evaluated with the statis-tical language R version 3.0.1 (R Development CoreTeam 2011) using packages glmnet (Friedman et al.2010a), randomForest (Liaw and Wiener 2002), ande1071 (Meyer et al. 2012).

3.3. Statistical Model ResultsFollowing the methodology of §2, we collected andextracted data from a set of 414 published journal arti-cles from 1979–2012 describing the treatment methodsand patient outcomes for a total of 495 treatment armsof gastric cancer clinical trials.

To compare our statistical models and evaluate theirability to predict well on unseen trials, we implement asequential testing methodology. We begin by sorting allof the clinical trials in order of their publication date.We then only use the data from prior published trialsto predict the patient outcomes for each clinical trialarm. Note that we never use data from another arm ofthe same clinical trial to predict any clinical trial arm.This chronological approach to testing evaluates ourmodel’s capability to do exactly what will be requiredof it in practice: predict a future trial outcome usingonly the data available from the past. Following thisprocedure, we develop models to predict the medianoverall survival as well as the overall toxicity score. Webegin our sequential testing 20% of the way throughthe set of 495 total treatment arms, setting aside thefirst 98 arms to be used solely for model buildingso that our training set is large enough for the firstprediction. Of the remaining 397 arms, we first removethose for which the outcome is not available, leaving383 arms for survival and 338 for toxicity. We thenpredict outcomes only for those arms using drugsthat have been seen at least once in previous trials(albeit possibly in different combinations and dosages).This provides us with 347 data points to evaluatethe survival models and 307 to evaluate the toxicitymodels.

The survival models are evaluated by calculating theroot-mean-square error (RMSE) between the predictedand actual trial outcomes. They are compared againsta naive predictor (labeled “Baseline”), which ignoresall trial details and reports the average of previouslyobserved outcomes as its prediction. This is a stan-dard baseline method used in evaluating sequentialprediction models. Model performance is presented interms of the coefficient of determination (R2) of ourprediction models relative to this baseline. For eachfour-year period of time we compute the RMSE andR2 of each model. To assess statistical fluctuation ofthese quantities, for each prediction we additionallytrain 40 models with bootstrap resampled versionsof the training set, and for each four-year period wereport the mean, 2.5% quantile, and 97.5% quantileof the RMSE and R2 obtained when randomly sam-pling one of the 40 bootstrap model predictions for

each of the predictions made during that four-yearperiod. Figure 2 displays the four-year sliding-windowstatistical fluctuation of the out-of-sample R2 value,along with the values of the RMSE and R2 over the mostrecent four-year window of sequential testing, bothfor the cross-validation results and the out-of-samplepredictions.

To evaluate the toxicity models, recall from the discus-sion of §2.2 that the toxicity of a treatment is consideredmanageable as long as the proportion of patients expe-riencing a dose-limiting toxicity (DLT) is less than afixed threshold: a typical value used in phase I studiesfor this threshold is 0.5. Thus, we evaluate our tox-icity models on their ability to distinguish betweentrials with “high toxicity” (DLT proportion > 0.5) andthose with “acceptable toxicity” (DLT proportion ≤ 0.5).The metric we will adopt for this assessment is thearea under the receiver operating characteristic curve(AUC). The AUC can be naturally interpreted as theprobability that our models will correctly distinguishbetween a randomly chosen unseen trial arm withhigh toxicity and a randomly chosen unseen trialarm with acceptable toxicity. As was the case for sur-vival, we calculate the AUC for each model over afour-year sliding window and assess statistical fluctua-tion using bootstrapping, with the results shown inFigure 3.

We see in Figures 2 and 3 that models for survivaland toxicity all show a trend of improving predictionquality over time, which indicates that our modelsare becoming more powerful as additional data areadded to the training set. The decrease in the AUCof the toxicity model toward the end of the testingperiod might be attributable to the large number ofnew drugs tested in gastric cancer in recent years: 8%of trial arms from 2007–2009 evaluated a drug that hadbeen tested in fewer than three previous arms, whereas21% of arms from 2010–2012 tested such a drug. Wesee that ridge regression, lasso, SVM, and RF all attainsimilar performance when predicting both survival andtoxicity, with sequential R2 of approximately 0.5 forrecent survival predictions and AUC of more than 0.8for recent toxicity predictions. For both prediction tasks,statistical fluctuations overlap for these four models’performances in the final 48-month window. Especiallyfor earlier predictions, the unregularized linear modelis not competitive, likely because of overfitting to thetraining set.

As a result of this performance assessment, weidentified the regularized linear models as the bestcandidates for inclusion in our optimization models,because they have good prediction quality, are theleast computationally intensive, and are the simplest ofthe models we evaluated. We conducted additionaltesting to determine whether the explicit inclusion of

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 10: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy RegimensManagement Science, Articles in Advance, pp. 1–21, © 2016 INFORMS 9

Figure 2 Out-of-Sample Prediction Accuracy of Survival Models

0

0.2

0.4

0.6

2000

Year

LassoLinearRFRidgeSVM

2002 2004 2006 2008 2010 2012

Sur

viva

l: F

our-

year

seq

uent

ial R

2 Model X-Val OOS Bootstrap

Root-mean-square error (RMSE)Baseline — 3075 3.76 [3.73, 3.78]Linear — 2072 3.16 [2.67, 4.58]Ridge 2002 2050 2.57 [2.47, 2.67]Lasso 2000 2044 2.66 [2.50, 2.83]RF 2005 2061 2.68 [2.59, 2.77]SVM 2000 2060 2.65 [2.54, 2.77]

Coefficient of determination (R2)Baseline — 0000 0.00 [−0.01, 0.01]Linear — 0048 0.27 [−0.49, 0.50]Ridge 0049 0056 0.53 [0.50, 0.57]Lasso 0050 0058 0.50 [0.43, 0.56]RF 0048 0052 0.49 [0.45, 0.53]SVM 0050 0052 0.50 [0.46, 0.54]

Notes. (Left) Sequential out-of-sample prediction accuracy of survival models calculated over four-year sliding windows ending in the date shown, reported as thecoefficient of determination 4R25. (Right) Root-mean-square prediction error (RMSE) and R2 for the cross-validation set (“X-Val”), for out-of-sample predictions(“OOS”), and for bootstrapped out-of-sample predictions (“Bootstrap”) for the most recent four-year window of data (March 2008–March 2012), which includes132 out-of-sample predictions.

pairwise interaction terms between variables improvedthe ridge regression models for survival and toxicity ina significant way. We found that out-of-sample resultswere not significantly improved by the addition ofdrug/drug, drug/demographic, or drug/trial infor-mation interaction terms, and therefore we chose toproceed with the simpler models without interactionterms. The lack of improved out-of-sample performancedue to interaction terms may be due to insufficientsample size to identify interaction effects or due tononlinear interactions that could not be captured bythe regularized linear models. We ultimately selectedthe ridge regression models to carry forward into theoptimization. Depictions of the predicted versus actual

Figure 3 Out-of-Sample Classification Accuracy of Toxicity Models

0.5

0.6

0.7

0.8

0.9

1.0

LassoLinearRFRidgeSVM

2000

Year

2002 2004 2006 2008 2010 2012

Tox

icity

: Fou

r-ye

ar s

eque

ntia

l AU

C

Model X-Val OOS Bootstrap

Area under the curve (AUC)Baseline — 0043 0.49 [0.36, 0.61]Linear — 0082 0.77 [0.70, 0.84]Ridge 0076 0083 0.81 [0.77, 0.85]Lasso 0077 0082 0.81 [0.76, 0.85]RF 0078 0081 0.80 [0.75, 0.84]SVM 0078 0087 0.84 [0.80, 0.89]

Notes. (Left) Four-year sliding-window sequential out-of-sample classification accuracy of toxicity models, reported as the area under the curve (AUC) forpredicting whether a trial will have high toxicity (DLT proportion > 0.5). (Right) AUC for the cross-validation set (“X-Val”), for out-of-sample predictions (“OOS”),and for bootstrapped out-of-sample predictions (“Bootstrap”) for the most recent four-year window of data (March 2008–March 2012), which includes119 out-of-sample predictions. Of these, 24 (20.1%) actually had high toxicity.

values for survival along with the receiver operatingcharacteristic (ROC) curve for the toxicity model areshown for the ridge regression models in Figure 4.

Although we rely on a naive baseline throughoutthis section to evaluate our prediction models, it wouldbe challenging to improve this baseline. Clinical trialauthors do not publish predictions of trial survivaloutcomes, so we cannot compare our predictions tooncologists’ predictions. In §3.4 we evaluate whetherthese models could be used by clinical trial planners toidentify unpromising clinical trials before they are run,and in §4 we evaluate if our prediction models couldhelp us to design effective combination chemotherapyregimens.

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 11: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy Regimens10 Management Science, Articles in Advance, pp. 1–21, © 2016 INFORMS

Figure 4 Performance of the Ridge Regression Models for Survival and Toxicity Over the Most Recent Four Years of Data (March 2008–March 2012)

5

10

15

20

5

Actual survival (months)

Pre

dict

ed s

urvi

val (

mon

ths)

AUC = 0.828

0

0.25

0.50

0.75

1.00

0

False positive rate

Tru

e po

sitiv

e ra

te

10 15 20 0.25 0.50 0.75 1.00

Notes. (Left) Predicted versus actual values for survival model 4n = 1325. (Right) ROC curve for high toxicity (DLT proportion > 0.5) predictions, of which 24 areactually high 4n = 1195.

3.4. Identifying Unpromising Clinical TrialsBefore They Are Run

One application of statistical models for predicting atrial’s efficacy and toxicity is to identify and eliminateor modify unpromising proposed trials before they arerun. Such a tool could assist clinical trial planners indeciding whether to run a proposed trial.

First, clinical trial planners might use the modelspredicting toxicity to avoid clinical trials predicted tohave a high DLT rate or to adjust the dosages of thedrugs being tested. The ridge regression model for theproportion of patients with a DLT could be used torank trials based on their predicted DLT proportion,and trials with predicted values exceeding some cutoffcDLT could be flagged. The left side of Figure 5 plotsthe proportion of trials with a high DLT proportion(more than 50%) and the proportion of trials with a lowDLT proportion that are flagged with various cutoffscDLT across all 397 studies in the statistic model testingset (studies published since 1997). Overall, the ridgeregression model achieves an AUC of 0.75 in predictingif a trial will have a high DLT rate. Further, 10% of alltrials with a high DLT rate can be flagged while onlyflagging 0.4% of trials with a low DLT rate, and 20% ofall trials with high DLT rate can be flagged while onlyflagging 1.9% of trials with a low DLT rate.

Planners might also use the models predictingmedian OS to identify clinical trials predicted not toattain a high efficacy compared to recent trials in asimilar patient population. We stratify trials basedon their patient demographics9 and define a studyto have high efficacy if it exceeds the 75th percentile

9 The first stratum is trials for which at least half of the patients hadreceived prior palliative chemotherapy. These 55 arms in the testset had an average median OS of 7.6 months. The remaining four

of median OS values reported in trial arms withinits strata in the past four years. The ridge regressionmodel for the median OS could be used to rank trialsbased on the ratio between the predicted median OSand the strata-specific cutoff for high efficacy, andthe trials with a ratio below some cutoff cOS could beflagged as being unlikely to achieve high efficacy. Theright side of Figure 5 plots the proportion of flaggedtrials that did and did not achieve the strata-specificcutoff for high efficacy for various cutoffs cOS acrossthe 397 studies in the testing set. Overall, the ridgeregression model achieves an AUC of 0.72 in predictingif a trial will not achieve high efficacy. Further, 10%of all trials that do not achieve high efficacy can beflagged while only flagging 0.8% of trials with highefficacy, and 20% of trials that do not achieve highefficacy can be flagged while only flagging 3.1% oftrials with high efficacy.

4. Design of Chemotherapy RegimensThis section describes an approach for designing novelchemotherapy regimens to be tested in phase II studiesusing mixed integer optimization, using the extracteddata and the statistical models we have developed in

strata all consist of trial arms with fewer than half of the patientswith prior palliative chemotherapy (or that value not reported) andvarying patient health levels as measured by the average ECOGperformance status in the trial. We include a stratum for arms withgood performance status (average value 0–0.5; these 57 arms in thetest set had an average median OS of 11.8 months), with mediumperformance status (average value 0.5–1.0; these 173 arms in thetest set had an average median OS of 10.0 months), with poorperformance status (average value of 1.0 or greater; these 77 arms inthe test set had an average median OS of 9.0 months) and unreportedperformance status (these 35 arms in the test set had an averagemedian OS of 9.3 months).

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 12: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy RegimensManagement Science, Articles in Advance, pp. 1–21, © 2016 INFORMS 11

Figure 5 Performance of the Ridge Regression Models at Flagging Low-Performing Trial Arms Among the 397 Arms in the Testing Set

0

0.25

0.50

0.75

1.00P

rop.

hig

h-to

xici

ty a

rms

flagg

ed

0

Prop. low-toxicity arms flagged0.25 0.50 0.75 1.00

0

0.25

0.50

0.75

1.00

Pro

p. lo

w-e

ffica

cy a

rms

flagg

ed

0

Prop. high-efficacy arms flagged0.25 0.50 0.75 1.00

Notes. (Left) Performance flagging trials with unacceptably high toxicity. (Right) Performance flagging trials that do not achieve top-quartile median OS.

§§2 and 3. Further, we present a methodology thatleverages the statistical models from §3 to select thebest-performing regimen already tested in a phase IIstudy for further evaluation in a phase III trial. Finally,we evaluate the quality of the regimens we suggestfor phase II and phase III trials against those actuallyselected by oncology researchers using two evaluationapproaches: the simulation metric and the matchingmetric.

4.1. Phase II Regimen Optimization ModelGiven the current data from clinical trials and thecurrent predictive models that we have constructed,we would like to select the next best regimen to testin a phase II clinical trial. Following the objectivesfor designing clinical trials laid out in §1, we seekto identify trials that have high efficacy, that haveacceptable toxicity, and that test novel treatments.

To identify regimens with high efficacy, we includethe predicted median OS of patients in the trial in theobjective of our optimization model. Our reasoningfor this is that, for the majority of phase III trials inour database that clearly stated a primary objective,the objective was to demonstrate improvement inthe overall survival (OS) of patients in the treatmentgroup.10

To limit our suggestions to regimens with accept-able toxicity, we add a constraint to our optimizationmodel to bound the predicted proportion of patientsexperiencing a DLT to not exceed some constant t.No phase III studies in our database listed a toxicityoutcome as a primary objective, validating our choice

10 Of the 20 phase III trials in our database with a clearly statedprimary objective, 12 of them listed OS as a primary objective.

to address toxicity using a constraint instead of aspart of the objective. Even in cases where our modelpredicts that a regimen will be acceptably nontoxic, aphase I study would still be necessary to ensure patientsafety, potentially resulting in changes to the dosagelevels suggested by our models.

We seek novel treatments using three approaches.First, we require that all regimens (the combination ofdrug and dose selections) suggested by our modelshave never been previously tested in a clinical trial;hence, our model always suggests novel regimens.Second, we require that new drugs are tested in aclinical trial as soon as they are available, ensuringthat we evaluate new drugs as quickly as possible.Finally, our models assign higher weight to regimenscontaining drugs that have not been extensively tested.Motivated by the standard deviation of the samplemean,11 we assign weight ud = t−1/2

d to drug d, wheretd is the number of times drug d has previously beentested in a clinical trial, defining u to be the vectorof all such weights. In the model (1), we incrementthe objective by âud if drug d is selected for testing inthe regimen, where â is a parameter that controls theaggressiveness of the exploration.

Our mathematical model includes decision variablesfor the chemotherapy variables described in §3.1. Wedefine three variables for each drug, correspondingto the chemotherapy treatment variables used in thestatistical models: a binary indicator variable bd to

11 Recall that sd44∑t

i=1 Xi5/t5= �t−1/2 when Xi are independent andidentically distributed (IID) random variables with standard devia-tion � . If we view each Xi as the observed impact of some drug d onthe efficacy or toxicity of a regimen that tests it, then the standarddeviation of the sample mean estimate of that drug’s effect scaleswith t−1/2

d , where td is the number of times the drug has been tested.

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 13: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy Regimens12 Management Science, Articles in Advance, pp. 1–21, © 2016 INFORMS

indicate whether drug d is or is not part of the trial(bd = 1 if and only if drug d is part of the optimalchemotherapy regimen), a continuous variable id toindicate the instantaneous dose of drug d that shouldbe administered in a single session, and a continuousvariable ad to indicate the average dose of drug d thatshould be delivered each week. We define x in ouroptimization model to be the demographic and trialinformation variables for which the chemotherapyregimen is being selected; these values are treated as aconstant in the optimization process.

We use the ridge regression models from §3.2 toparameterize the optimization model. Let the model foroverall survival (OS) be denoted by 4Â̂b

OS5′b+ 4Â̂i

OS5′i+

4Â̂aOS5

′a+ 4Â̂xOS5

′x, with drug variables b, instantaneousdose variables i, average dose variables a, demographicand study characteristic constants x, and coefficientscorresponding to each set of variables indicated withsuperscripts. Similarly, we have a model for the propor-tion of patients with a DLT, which we will denote by4Â̂b

DLT5′b+ 4Â̂i

DLT5′i+ 4Â̂a

DLT5′a+ 4Â̂x

DLT5′x. Note that these

models are all linear in the variables.We can then select the drug therapy to test in the

next clinical study using the following mixed integeroptimization model:

maxb1 i1a

{

4Â̂bOS + âu5′b+ 4Â̂i

OS5′i+ 4Â̂a

OS5′a+ 4Â̂x

OS5′x}

(1)

subject to 4Â̂bDLT5

′b+4Â̂iDLT5

′i+4Â̂aDLT5

′a+4Â̂xDLT5

′x≤ t1 (1a)n∑

d=1

bd ≤N1 (1b)

Ab≤c1 (1c)

4b1i1a5yP1 (1d)

4bd1id1ad5∈ìd1 d=110001n1 (1e)

bd ∈801191 d=110001n0 (1f)

The objective of (1) maximizes the predicted overallsurvival of the selected chemotherapy regimen plussome constant â times ud, the weight capturing howoften drug d has previously been tested, for each drugd in the regimen. This “exploration constant” â controlshow much weight is assigned to exploring drugs thathave not been extensively tested in the training set; alarge â would value exploration of new drugs overidentifying a combination with high predicted efficacy,while â = 0 optimizes the efficacy of the regimen withno consideration for exploration. We experiment witha number of â values in §4.4.

Constraint (1a) bounds the predicted toxicity by aconstant t. This constant value can be defined basedon common values used in phase I/II trials or canbe varied to suggest trials with a range of predictedtoxicities. In §4.4, we present results from varyingthe toxicity limit t. Constraint (1b) limits the total

number of drugs in the selected trial to N , whichcan be varied to select trials with different numbersof drugs. We limit suggested drug combinations tocontain no more than N = 3 drugs, which encompasses89.1% of our database. We chose not to select a limit ofN = 4 or higher both because the average number ofdrugs tested in combinations in our database is 2.3and because all preferred regimens in the NationalComprehensive Cancer Network guidelines for gastriccancer contain three or fewer drugs (NCCN 2014).

We also include constraints (1c) to constrain the drugcombinations that can be selected. In our models, werequire a new drug to be included if it has never beenevaluated in a previous clinical trial and we incorporategenerally accepted guidelines for selecting combinationchemotherapy regimens (Page and Takimoto 2002, Pratt1994, Golan et al. 2008).12 As discussed in §2.1, we alsoeliminate the drug trastuzumab because it is only indi-cated for the subpopulation of HER2-positive patients.We leave research into effective treatments for thissubpopulation as future work. Additional requirementscould be added to constraints (1c), though we do notdo so in this work. Such additional constraints may benecessary because of known toxicities and propertiesof the drugs, or these constraints can be used to addpreferences of the business or research group runningthe clinical trial. For example, a pharmaceutical com-pany may want to require that a new drug they havedeveloped and only tested a few times be used in thetrial. In this case, the optimal solution will be the bestdrug combination containing the necessary drug.

Constraints (1d) force our selected regimen to differfrom the set P of all regimens previously tested in thetraining set. Constraints (1e) limit the instantaneousand average dose of drug d to belong to a feasible setìd. This forces id and ad to equal 0 when bd = 0 and tomatch the instantaneous and average dosages of drugd in some clinical trial in the full database when bd = 1.These constraints force the dosage for a particular drugto be realistic. Last, constraints (1f) define b to be abinary vector of decision variables.

12 We limit the combinations to contain no more than one drugfrom any drug class. There are 23 classes of drugs used in totalin our database, using the classes defined by Golan et al. (2008).The most common classes are platinum-based, antimetabolites,anthracyclines, taxanes, camptothecins, alkylating agents, andchemoprotectants. We disallow pairs of drug classes from beingused together if this pairing appears no more than once in ourdatabase and is discouraged in the guidelines for selecting regimens.The following pairs of classes were disallowed from being usedtogether: anthracycline/camptothecin, alkylating agent/taxane, tax-ane/topoisomerase II inhibitor, antimetabolite/protein kinase, andcamptothecin/topoisomerase II inhibitor. If a chemoprotectant drugis used, it must be used with a drug from the antimetabolite classthat is not capecitabine.

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 14: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy RegimensManagement Science, Articles in Advance, pp. 1–21, © 2016 INFORMS 13

4.2. Phase III Regimen Selection ModelPhase III trials are large randomized controlled trialsthat evaluate the most promising regimens tested inprevious phase II studies, and treatments that per-form well against historical controls in phase III trialsmay then be considered new standard therapies foradvanced cancer. A relatively small number of phase IIItrials are run (7% of trials in our database are phase III),both because a phase III trial is only run when a ther-apy is shown to be particularly effective in a phase IIstudy and because phase III trials enroll more patientsthan phase II studies and are therefore more expensive.Because of both the impact of phase III results onclinical standards and the relatively small numbers ofthese trials run, it is especially important that we selecthigh-quality regimens to test in the experimental armsof these clinical trials.

One option for selecting a regimen to test in aphase III trial would be to identify the prior phase IIstudy arm with DLT proportion not exceeding parame-ter t that achieved the highest median OS, selecting theregimen tested in that phase II study arm. However,phase II studies are often performed using patientpopulations that are not representative of the patientstested in phase III trials and standard clinical prac-tice (Friedman et al. 2010b). As a result, we instead usethe statistical models from §3 to select the regimen tobe tested in the experimental arms of phase III clinicaltrials. Using the demographic and trial variables xfrom the new phase III trial being designed, we selectthe regimen previously tested in a phase II study thathas the highest predicted median OS in population x,limiting to regimens with predicted DLT proportion notexceeding a toxicity limit parameter t. Each of the priorphase II studies is already in the training set of the

Figure 6 Median OS and DLT Proportion for Chemotherapy Regimens That Could Potentially Be Selected for a Phase III Trial Experimental Arm byOur Models

5

10

15

20

Med

ian

OS

(m

onth

s)

5

10

15

20

Adj

uste

d m

edia

n O

S (

mon

ths)

0

DLT (proportion)0.25 0.50 0.75 1.00 0

Adjusted DLT (proportion)0.25 0.50 0.75 1.00

Notes. The left figure plots outcomes of the phase II study that tested each regimen, and the right figure plots predicted outcomes for the regimen in a typicalphase III clinical trial patient population. The red points show the top five trials according to the data reported in the phase II trials, and the green points show thetop five trials according to the prediction models.

statistical model being used to select phase III regimens,but using the prediction model enables us to controlfor the patient population of trials and variability intrial outcomes due to chance. The experimental armsof phase III clinical trials seek novel chemotherapyregimens, so we limit our selected regimens to thosewhose set of drugs do not exactly match the set ofdrugs tested in any arm of a prior phase III trial.

The effects of using prediction models instead ofpublished clinical trial outcomes are displayed inFigure 6. The figure on the left shows, for each regimentested in a phase II study in our database that doesnot match the drug combination tested in a phase IIIclinical trial, the proportion of patients experiencing aDLT and the median overall survival, as they werereported in the phase II study. The figure on the rightshows the predicted performance of each drug regimen,using the average of trial and demographic variablesacross all phase III trials in our database. The red pointsshow the five best trials according to the data, andthe green points show the five best trials according tothe prediction models (we define the best trials hereas the ones with the highest overall survival, subjectto a DLT proportion of no more than 0.5). Figure 6shows that our method will often suggest differentregimens than we would select by just using the datapublished in the individual trials. These differencesoccur because our method takes into account patientand trial characteristics, controlling for trials run inparticularly healthy populations (where strong efficacyoutcomes may be due to demographics) and for trialswith fewer enrolled patients (where strong results aremore likely to be due to chance). For instance, the trialsindicated in red in Figure 6 were smaller on averagethan the trials in green (37 versus 51 patients), which

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 15: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy Regimens14 Management Science, Articles in Advance, pp. 1–21, © 2016 INFORMS

may indicate why our models had more confidencein the quality of the trials labeled in green. We willinvestigate how the regimens suggested by our modelscompare to those selected in current clinical practice in§§4.3 and 4.4.

4.3. Evaluation TechniquesEvaluating the quality of chemotherapy regimens sug-gested by our optimization model against those selectedfor clinical trials by oncologists is an inherently difficulttask. The only definitive way to evaluate a chemother-apy regimen in a given population is through a clinicaltrial, and the only definitive way to evaluate the per-formance of some regimen A suggested by our modelsagainst some regimen B designed without the modelsis through running a randomized controlled trial in apopulation, testing A against B.

Although ultimately clinical trial evidence is neededto evaluate the effectiveness of the proposed models, itis important to first perform preclinical evaluations ofthe proposed approach to determine if it shows promisein improving the results of clinical trials comparedto current practice. As described in §1, preclinicalevaluations estimate the quality of an approach beforetesting it in humans and are used extensively in thedesign of chemotherapy regimens for advanced cancer.Preclinical evaluation cannot be used to conclusivelyconfirm the effectiveness of a new therapy, but it can beused to rule out unpromising approaches. For instance,a new drug that has a high cell kill rate in vitromay or may not be effective at treating cancer in thehuman body, but a new drug that performs poorly inpreclinical testing would likely not be tested in humans.We similarly use preclinical evaluation to determine ifour proposed models show promise, which could helpin deciding whether clinical evaluation is appropriate.

To perform preclinical evaluation of our proposedmodels, we use two different techniques to approximatethe quality of suggestions from our model compared tothose made in real clinical trials: the simulation metricand the matching metric.

4.3.1. Simulation Metric. Through clinical trials,oncologists learn about the efficacy and toxicity ofregimens they test and use this information whendesigning further chemotherapy regimens. To evaluatethe ability of our approach to learn through time, wesimulate the efficacy and toxicity outcomes of clinicaltrials that test the chemotherapy regimens proposedby our models. To simulate clinical trial outcomes, wefirst select plausible coefficient vectors Â∗

OS and Â∗DLT

for our survival and toxicity prediction models, whichwe use to represent the ground truth impact of theexplanatory variables from §2 on efficacy and toxicity.We then simulate the median OS and the proportionof patients with a DLT in a proposed clinical trialusing Â∗

OS and Â∗DLT, respectively, in both cases adding

normally distributed noise with variance based on thenumber of patients in the clinical trial. To compare theregimens selected by our models with toxicity limitt and exploration parameter â against the regimenstested in clinical trials in current practice, we start 20%of the way through the clinical trial database (in 1997),selecting the regimen to be tested in each phase II studyusing the optimization model from §4.1 and selecting theregimen to be used in each phase III trial experimentalarm using the procedure described in §4.2. For eachtrial, both the regimen selected by our models and theregimen run in current practice are evaluated using Â∗

OSand Â∗

DLT. The outcomes for the regimen selected by ourmodels, plus normally distributed noise, are added tothe training set and can be used to design subsequentchemotherapy regimens. We evaluate each parameterset 4t1 â5 using 40 different sets of coefficients Â∗

OS andÂ∗

DLT. Each set of coefficients is obtained by drawing abootstrap sample of the entire database of clinical trialsand training ridge regression models for the efficacyand toxicity outcomes as described in §3; because thesecoefficients are obtained using bootstrap resampling oftrue clinical trial outcomes, they are plausible accordingto clinical trial data. Details of the simulation metricprocedure are provided in Appendix B.

As with many procedures developed to simulatecomplicated real-world systems, the simulation metricmay make a biased evaluation of our proposed model.First, the approach simulates outcomes using the samelinear model specification used by the ridge regressionmodel from §3, which means that the statistical modelsused to make decisions are assumed to be structurallyaccurate. Model performance might be worse if therewere a mismatch between the structure of the ridgeregression model and the true structure of the relation-ship between the covariates and outcomes. Further,although we include a number of constraints in theoptimization models to prevent infeasible regimensfrom being suggested, there there is no guarantee thatall chemotherapy regimens in the feasible set of theoptimization model are indeed biologically, legally, orpractically feasible. This could lead the optimizationmodel to obtain a strong evaluation for a suggestionthat is actually infeasible, which would favorably biasthe evaluation of our approach. Due to the potentialbiases in the simulation metric, results indicating thatour models improve over current practice might meritfurther study in a clinical trial setting, whereas resultsindicating that our models do not improve over currentpractice would suggest that no further evaluation iswarranted.

4.3.2. Matching Metric. Although the simula-tion metric enables us to evaluate our ability tolearn through time by providing feedback about thechemotherapy regimens selected by our optimizationmodel, a shortcoming of this technique is that it relies

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 16: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy RegimensManagement Science, Articles in Advance, pp. 1–21, © 2016 INFORMS 15

on simulated outcomes instead of actual clinical trialoutcomes, introducing a number of potential biases.As a result, we also evaluate our model’s suggestedregimens for phase III trial experimental arms usingthe results of similar clinical trials that were run inpractice. To compare the regimens selected by ourmodels with toxicity limit t against the regimens testedin clinical trials in current practice, we start 20% ofthe way through the clinical trial database (in 1997),selecting the regimen to be used in each phase III trialexperimental arm using the procedure described in§4.2. For each phase III experimental arm, the regimenselected by our models is evaluated using the clinicaltrial in the database testing the most similar chemother-apy regimen, taking into account how well the drugclasses, drugs, and dosages match13 and limiting tochronologically future clinical trials. Meanwhile, theregimens tested in the actual phase III experimentalarms are evaluated using the outcomes of those trials.Details of the matching metric procedure are providedin Appendix B.

A key benefit of the matching metric as defined is thatit does not use the statistical models developed in thiswork in any way to evaluate proposed regimens, noteven to adjust for the population in the matched clinicaltrial. As a result, the matching metric may make a biasedevaluation of proposed chemotherapy regimens becauseit evaluates a chemotherapy regimen using the outcomesof a trial run in a different population. If the matchedtrial used to evaluate a proposed chemotherapy regimenis run in a healthier population than the populationfor which the regimen was selected, favorable biaswould be introduced to the evaluation. Correspondinglyif the patients in the matched trial are less healthy,unfavorable bias is introduced. Due to the potentialbias in the matching metric’s evaluation of our model,any improvements over current practice indicated bythe matching metric would require confirmatory testingin a clinical trial setting.

4.4. Optimization ResultsTo compare our models’ suggested regimens withthose of oncologists, we sequentially computed thesimulation metric for 40 sets of simulated coefficients4Â∗

OS1Â∗DLT5, testing all 9 combinations of parameter

settings t ∈ 8003100410059 and â ∈ 8012149. As detailedin Appendix B, for each set of parameter values 4t1 â5,we obtain 40 vectors of overall survival values for the

13 For each drug in our suggested regimen, we assess a penalty of 0if the same drug is tested at the same dosage in the future regimen,a penalty of 1 if the same drug is tested at a different dosage, apenalty of 10 if a different drug from the same drug class is tested,and a penalty of 100 if no drugs from the same drug class are testedin the future regimen. The penalties are summed for each drug inour suggested regimen, and the future clinical trial with the smallestpenalty score is considered the best match.

27 phase III experimental arms for both current practice(yCP

OS) and for the proposed models (yModOS ). Additionally,

we obtain 40 vectors of “high toxicity” indicator valuesfor the phase III experimental arms both for currentpractice (yCP

DLT) and for the proposed models (yModDLT ).

Figure 7 compares the performance of the regimensselected by this paper’s models for the 27 phase IIIexperimental arms against the 27 phase III experi-mental arms selected by oncologists, reporting theaverage differences in outcomes across the 27 arms,1′4yMod

OS − yCPOS5/27 and 1′4yMod

DLT − yCPDLT5/27. The figure

plots each individual bootstrap replicate and the aver-age comparative performance across replicates, and theaccompanying table additionally reports 95% bootstrappercentile confidence intervals (Davison and Hinkley1997) of comparative performance, computed usingthe R boot package (Canty and Ripley 2014). In all360 simulation runs, the average simulated median OSof the regimens suggested by the proposed modelswas higher then the average simulated median OSof the regimens tested in current practice, with theaverage differences ranging from 2.8 months (95% CI0.9, 5.0) for the model with 4t1 â5= 4003105 to 4.7 months(95% CI 2.6, 6.5) for the model with 4t1â5= 4005145.The simulated proportion of trials with a high DLTrate did not significantly differ from current practicefor any model. Although on average models with highâ values have better simulated survival outcomes andmodels with restrictive toxicity limits have worse simu-lated survival outcomes and better simulated toxicityoutcomes, the bootstrap 95% confidence intervals forall these differences include 0.

The regimens selected by our proposed models forphase II studies and phase III trial experimental armsare qualitatively different from the ones tested in cur-rent practice, which may explain some of the differencesin simulated performance. Due to the nature of theformulation of optimization problem (1), nearly allregimens selected by the proposed models for phase IIstudies (100%) and phase III experimental arms (98%)contain exactly three drugs. Although this is similarto the average number of drugs tested in phase IIstudy arms (2.3) and phase III trial experimental arms(2.4), these studies test many other combination sizes(only 31% of phase II study arms and 37% of phase IIItrial experimental arms test three-drug combinations).Given that there are benefits to regimens with fewerdrugs (e.g., ease of administration and lower cost) andregimens with more drugs (e.g., better combatting drugresistance in the cancer), oncology researchers mayprefer to test a range of regimen sizes. For phase IIItrial experimental arms, the proposed models selectedregimens with newer drugs (newest drug tested in amedian of 5 previous trials) than the regimens tested inpractice (newest drug tested in a median of 22 previoustrials). Given the significant cost of phase III trials, this

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 17: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy Regimens16 Management Science, Articles in Advance, pp. 1–21, © 2016 INFORMS

Figure 7 Simulation Metric Evaluation of Phase III Experimental Arm Suggestions

Current practice0

2

4

6

0–0.2 0.2

Increase in high DLT rate (proportion)

Ave

rage

sur

viva

l im

prov

emen

t (m

onth

s)

Γ

0

2

4

t

0.3

0.4

0.5

â t ã OS (95% CI) ã DLT (95% CI)

0 0.3 2.8 [0.9, 5.0] −0.06 [−0.15, 0.04]0 0.4 3.1 [1.8, 4.9] −0.06 [−0.15, 0.04]0 0.5 3.2 [1.6, 6.3] −0.01 [−0.18, 0.18]2 0.3 3.6 [1.6, 5.9] −0.05 [−0.15, 0.07]2 0.4 4.3 [1.8, 5.6] −0.05 [−0.15, 0.11]2 0.5 4.5 [2.1, 6.7] 0.02 [−0.18, 0.26]4 0.3 3.7 [0.7, 6.0] −0.06 [−0.18, 0.07]4 0.4 4.4 [2.0, 6.3] −0.04 [−0.15, 0.07]4 0.5 4.7 [2.6, 6.5] 0.02 [−0.18, 0.33]

Notes. (Left) Comparison between phase III experimental arm suggestions 4n = 275 from our models and from current practice according to the simulation metric.Each point represents a simulation of our approach and highlighted points are averages across simulations. Points to the left of the y axis improved over currentpractice in the proportion of trials with unacceptably high toxicity, and points above the x axis improved over current practice in average median OS. (Right) Foreach optimization model with parameters â and t , the change in average median OS and proportion of trial arms with unacceptably high DLT compared to currentpractice are shown. CI, confidence interval.

could represent risk aversion on the part of clinicaltrial planners that is not captured in the procedurefrom §4.2. By design 100% of regimens suggested byour models for phase III experimental arms had neverbeen tested before (even in different dosages), similarto the 89% rate seen in clinical practice. In contrast, theproportion of suggested regimens testing new combina-tions (ignoring dosages) for phase II studies was 14% inthe proposed models with â = 0, 25% in the proposedmodels with â = 2, 37% in the proposed models withâ = 4, and 41% in current practice. This suggests theproposed models with â = 0 and â = 2 may spendmore effort optimizing dosages within a combinationand less effort exploring new combinations comparedto clinical practice.

To further compare our models’ suggested regimensfor phase III experimental arms with those of oncol-ogists, we used the matching metric to evaluate oursuggested regimens obtained using each toxicity limitt ∈ 8003100410059. As detailed in Appendix B, for eachparameter value t we obtain overall survival values forthe 27 phase III experimental arms for both currentpractice (yCP

OS) and for the proposed models (yModOS ) as

well as “high toxicity” indicator values for the phase IIIexperimental arms both for current practice (yCP

DLT) andfor the proposed models (yMod

DLT ).Using the matching metric, Figure 8 compares the

performance of the suggested regimens for phase IIIexperimental arms from the three models with t ∈

8003100410059 against the regimens selected by oncol-ogists. The plot uses the same axes as Figure 7 andcaptures statistical fluctuations with 2 standard devia-tion ellipses fitted by bootstrap resampling the match-ing metric results for the 27 phase III experimental

arms. According to the matching metric, the regimenssuggested by the proposed models with t = 0031004,and 0.5 had on average 2.3, 3.5, and 3.9 months highermedian OS than current practice and 0.02, 0.05, and0.12 proportion higher rate of trials with unacceptablyhigh toxicity, respectively. The ellipses for all threemodels include only positive changes in survival butpositive and negative changes in proportion of trialswith unacceptably high toxicity.

5. Discussion and Future WorkIn this work, we built a database of clinical trials forgastric cancer, built statistical models to predict out-of-sample efficacy and toxicity of clinical trials, anddesigned models that propose combination chemother-apy regimens to be tested in clinical trials. Out-of-sampleevaluation suggests that the statistical models could beused by clinical trial planners to identify 10%–20% ofthe trials with high toxicity or that fail to achieve highefficacy, in both cases with a small number of misclassi-fications. Further, two preclinical evaluation techniquesindicate that the models presented in this work mightimprove the efficacy of the regimens selected for testingin phase III clinical trial experimental arms withoutmajor changes in toxicity outcomes.

A key limitation of the results presented in this paperis that we do not run clinical trials to evaluate oursuggested chemotherapy regimens; instead, we evaluateour suggestions using simulated results or the results ofclinical trials testing similar chemotherapy regimens. Asdetailed in §4.3, such preclinical evaluation of proposedapproaches is important, because it can be used toeliminate unpromising approaches without needing torun a clinical trial. Given that both the simulation and

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 18: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy RegimensManagement Science, Articles in Advance, pp. 1–21, © 2016 INFORMS 17

Figure 8 (Color online) Matching Metric Evaluation of Phase III Experimental Arm Suggestions

0

2

4

Ave

rage

sur

viva

l im

prov

emen

t (m

onth

s)

t

0.3

0.4

0.5

0–0.2 0.2

Increase in high DLT rate (proportion)

Current practice

t Match drugs (%) Match classes (%) ã OS ã DLT

0.3 56 82 2.3 0.020.4 59 85 3.5 0.050.5 59 89 3.9 0.12

Notes. (Left) Comparison between phase III experimental arm suggestions 4n = 275 from our models and from current practice according to the matching metric.(Right) For each optimization model with parameter t , the proportion of the model suggestions that match all drugs and all drug classes of the matched future trialarm, as well as the average change in median OS and proportion of trial arms with unacceptably high DLT compared to current practice.

matching metrics indicated that the proposed approachcould improve the efficacy of the regimens tested inphase III trials, we believe it merits further evaluationin a clinical trial setting. We believe the techniqueshould first be evaluated using a single-arm clinicaltrial testing a regimen designed by our optimizationmodels from §4.1. Key outcomes of this clinical trialwould be the acceptability of our tools to clinical trialdecision makers and the efficacy and toxicity outcomesof the trial. If this initial evaluation is deemed a success,we could then perform a randomized controlled trialcomparing a regimen designed by our approach againsta regimen designed using standard clinical trial designmethodology.

We believe that the models presented in this workhave the potential to significantly improve the qual-ity of chemotherapy regimens tested in clinical trials.However, there are a number of promising futuredirections that have the potential to strengthen theresults presented in this work. One opportunity wouldbe to integrate preclinical models, such as in vitroexperimentation and molecular simulation studies, intoour optimization framework. This integration couldtake the form of adding constraints to eliminate drugcombinations found to be biologically infeasible andadding interaction terms between pairs of drugs foundto be synergistic or antagonistic in preclinical models.Another avenue of future work is to use more sophis-ticated techniques such as the knowledge gradientalgorithm (Frazier et al. 2008) or Q-learning (Deardenet al. 1998) when exploring the space of combinationchemotherapy regimens using phase II studies.

Although in this work we focused on designingcombination chemotherapy regimens for gastric cancer,we believe that data-driven tools leveraging databases

of clinical trial results could prove useful in other set-tings. Combination therapy is used to treat many othercancers as well as other diseases such as hypertensionand diabetes, so our models could be applied to designtherapies for these diseases. Models trained on a subsetof clinical trial results for a specific patient subpopula-tion, such as HER2-positive patients, could be used todesign specialized regimens for these subgroups. Otherapplications might include improving clinical prognos-tic models for individual cancer patients, identifyingthe best available therapies for a particular diseasefrom among all those tested in clinical trials, and com-paring treatments that have never been compared in arandomized setting.

Supplemental MaterialSupplemental material to this paper is available at http://dx.doi.org/10.1287/mnsc.2015.2363.

AcknowledgmentsThe authors gratefully acknowledge the contributions ofNatasha Markuzon in helping to design semiautomatedforms of data extraction and of Jason Acimovic, CynthiaBarnhart, Allison Chang, and over 20 MIT undergraduatestudents in the early stages of this research. The authorsalso thank department editor Noah Gans as well as theanonymous associate editor and referees for their helpfulfeedback, which led to significant improvements in the work.This material is based upon work supported by the NationalScience Foundation [Grant 1237136] and a grant by LincolnLaboratories. Any opinions, findings, and conclusions orrecommendations expressed in this material are the authorsand do not necessarily reflect the views of the National ScienceFoundation. This work is sponsored by the Department of theAir Force [Air Force Contract FA8721-05-C-0002]. Opinions,interpretations, conclusions, and recommendations are thoseof the authors and are not necessarily endorsed by the UnitedStates Government.

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 19: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy Regimens18 Management Science, Articles in Advance, pp. 1–21, © 2016 INFORMS

Appendix A. Data PreprocessingWe performed a series of data preprocessing steps to standard-ize the data collected from clinical trials. Many of these stepsinvolved imputation of some value y from a closely relatedvalue x. In all such cases, we consider linear imputation vialinear regression equation y = �0 +�1x + � and quadraticimputation via linear regression equation y = �0 +�1x

2 + �,selecting the model that obtains the lowest sum of squaredresiduals.

A.1. Performance StatusPerformance status is a measure of an individual’s overallquality of life and well-being. It is reported in our databaseof clinical trials predominantly using the Eastern CooperativeOncology Group (ECOG) scale (Oken et al. 1982), and lessoften using the Karnofsky performance status (KPS) scale(Karnofsky and Burchenal 1949). The ECOG scale runs from0–5, where 0 is a patient who is fully active and 5 representsdeath. Among the 495 treatment arms evaluated in this work,423 (85.5%) reported performance status with the ECOGscale, 71 (14.3%) reported with the KPS scale, and 1 (0.2%)did not report performance status. We include mean ECOGscore as a variable in our prediction models.

In 94 treatment arms, the proportion of patients withECOG score 0 and 1 was reported as a combined value. Tocompute the weighted performance score for these arms,we first obtain an estimate of the proportion of patientswith ECOG score 0 and score 1, based on the proportion ofpatients with score 0 or 1. This estimation is done by takingthe n= 302 trials with full ECOG breakdown and nonzerop0 + p1 and fitting linear and quadratic imputation models toestimate p0/4p0 + p15 from p0 + p1; the quadratic model wasselected because it had the lowest sum of squared residualsand achieved an R2 value of 0.264. For the 18 treatment armswith the proportion of patients with each KPS score reported,we perform a conversion from the KPS scale to the ECOGscale based on data in Buccheri et al. (1996). For treatmentarms reporting performance status with other data groupings,the combined score is marked as unavailable. In total, 100trial arms (20.2%) were assigned a missing score.

A.2. Grade 4 Blood ToxicitiesWe need to compute the proportion of patients with eachGrade 4 blood toxicity to compute the proportion of patientswith a DLT, as defined in §2.2. Many treatment arms reportthe proportion of patients with a Grade 3/4 blood toxicity butdo not provide the proportion of patients specifically with aGrade 4 blood toxicity. For neutropenia, thrombocytopenia,anemia, and lymphopenia, we built linear and quadraticimputation models to predict the Grade 4 toxicity from theGrade 3/4 toxicity, training on arms reporting both values.The models were trained using 217, 311, 254, and 8 treatmentarms, respectively. The quadratic imputation model wasmost effective in predicting Grade 4 neutropenia and thelinear models were the most effective for the remaining threeimputations, with the models obtaining R2 values of 0.881,0.669, 0.277, and 0.035, respectively. The models were used toimpute the proportion of patients with a Grade 4 toxicity in119, 93, 101, and 3 treatment arms, respectively.

Two of the most common blood toxicities, leukopeniaand neutropenia, are generally not both reported because of

their similarity (neutrophils are the most common type ofleukocyte). Because neutropenia is more frequently reportedthan leukopenia in clinical trial reports, we chose this asthe common measure for these two toxicities. We trainedquadratic and linear imputation models using the proportionof patients experiencing Grade 3/4 leukopenia to predict theproportion of patients experiencing Grade 4 neutropenia,training on the 142 arms that reported both proportions. Thelinear model was the most effective with R2 = 00752, andwe used that model to convert data from 99 treatment armsthat reported leukopenia toxicity data but not neutropenia.Overall, we used some form of imputation to computeGrade 4 blood toxicities in 230 treatment arms (46.5%).

As a sensitivity analysis to evaluate the effects of imputingthese dependent variables, we sequentially evaluated ridgeregression models limited to the 265 treatment arms for whichno imputation was performed on the dependent variables.The ridge regression model predicting the proportion ofpatients with a DLT had a test set sequential AUC of 0.817(bootstrap 95% CI [0.757, 0.866]) on the last four years ofthe limited data set and did not significantly differ from theAUC of 0.828 (bootstrap 95% CI [0.770, 0.846]) on the lastfour years of the full data set.

A.3. Proportion of Patients with a DLTThe fraction of patients with at least one DLT during treatmentcannot be calculated directly from the individual toxicityproportions reported. For instance, in a clinical trial in which20% of patients had Grade 4 neutropenia and 30% of patientshad Grade 3/4 diarrhea, the proportion of patients with a DLTmight range from 30% to 50%. Here we compare approachesfor computing the proportion of patients experiencing atleast one DLT. We consider five options for combining thetoxicities.

• Max Approach: Label a trial’s toxicity as the proportionof patients with the most frequently occurring DLT. This is alower bound on the true proportion of patients with a DLT.

• Independent Approach: Assume all DLTs in a trial occurredindependently of one another, and use this to compute theexpected proportion of patients with any DLTs.

• Sum Approach: Label a trial’s toxicity as the sum of theproportion of patients with each DLT. This is an upper boundon the true proportion of patients with a DLT.

• Grouped Independent Approach: Define groups of toxicities,using the 20 broad anatomical/pathophysiological categoriesdefined by the NCI-CTCAE v3.0 toxicity reporting criteria(National Cancer Institute 2006). Assign each toxicity groupa “group score” that is the incidence of the most frequentlyoccurring DLT in that group. Then, compute a toxicity scorefor the trial by assuming that toxicities from each group occurindependently, with probability equal to the group score.

• Grouped Sum Approach: Using the same groupings asin the Grouped Independent Approach, compute a toxicityscore for the trial as the sum of the group scores.

We evaluate how each of these five approaches do atestimating the proportion of patients with Grade 3/4 toxicitiesin clinical trials that report this value given the individualGrade 3/4 toxicities. Because there is a strong similaritybetween the set of Grade 3/4 toxicities and the set of DLTs,we believe this metric is a good approximation of howwell the approaches will approximate the proportion of

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 20: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy RegimensManagement Science, Articles in Advance, pp. 1–21, © 2016 INFORMS 19

Table A.1 Correlation of Estimates of Total Grade 3/4Toxicity to the True Value

Combination approach Correlation

Grouped independent 00893Independent 00875Max 00867Grouped sum 00855Sum 00820

patients with a DLT. Forty (8.1%) trial arms report this value,though we can only compute the combined metric for 36 ofthem because of missing toxicity data. The quality of eachcombination approach is obtained by taking the correlationbetween that approach’s results and the combined Grade 3/4toxicities.

As reported in Table A.1, all five combination approachesprovide reasonable estimates for the combined toxicity value,

Figure B.1 Pseudocode of the Simulation Metric Procedure

Input: Clinical trial database X (rows in chronological order) containing q arms (the first r 2= �q/5�− 1 will be used as a training set),corresponding trial weight vector w, and corresponding phase III study indicator pIII. Vectors of binary, instantaneous, and average dosevariables and demographic/study characteristic variables for trial k are indicated by bk1 ik1ak , and xk , respectively. Further input: outcomevectors yOS and yDLT and parameters t and â .Xboot , yboot

OS , ybootDLT , wboot ← bootstrap resampled versions of the q arms in X, yOS, yDLT, and w

Â∗

OS1Â∗

DLT1 rOS1 rDLT ← coefficients and residuals of ridge regression models with parameters selected via cross validation, trained with Xboot ,yboot

OS , ybootDLT , and wboot

� 2OS ← r′

OSdiag4wboot5rOS/4q − f − 15, with ridge regression degrees of freedom f (Hastie et al. 2009)� 2

DLT ← r′

DLTdiag4wboot5rDLT/4q − f − 15P←

⋃rk=184b

k1 ik1ak59, PIII ← 8bk � 1 ≤ k ≤ r1 pIIIk = 19

Xtrain ←X8110001r9, ytrainOS ← yOS1 8110001r9, ytrain

DLT ← yDLT1 8110001r9, wtrain ←w8110001r9

yCPOS ← 6 71yMod

OS ← 6 71yCPDLT ← 6 71yMod

DLT ← 6 7

for k = r + 1 to q do

Â̂OS1 Â̂DLT ← coefficients of ridge regression models trained using Xtrain , ytrainOS , ytrain

DLT , and wtrain

if Trial k is a control arm of a phase III trial thenb∗ ← bk , i∗ ← ik , a∗ ← ak , PIII ← PIII ∪ 8bk9

else if Trial k is an experimental arm of a phase III trial thenP̃← 84b1 i1a5 ∈ P � by PIII, 4Â̂b

DLT5′b+ 4Â̂i

DLT5′i+ 4Â̂a

DLT5′a+ 4Â̂x

DLT5′xk ≤ t9

b∗1 i∗1a∗ ← arg maxb1 i1a∈P̃

{

4Â̂bOS5

′b+ 4Â̂iOS5

′i+ 4Â̂aOS5

′a+ 4Â̂xOS5

′xk}

, with ties broken randomlyPIII ← PIII ∪ 8b∗9

else if Trial k is an arm of a phase II study then

Solve optimization model (1) using coefficients Â̂OS and Â̂DLT, vector u computed using Xtrain, and patient demographic variables xk ,obtaining optimal binary, instantaneous, and average dosage variable values b∗1 i∗, and a∗, respectively. Use parameter â in the objective, t inconstraint (1a), N = 3 in constraint (1b), set P in constraint (1d), and sets ìd derived from X for constraints (1e).

end if

P← P∪ 84b∗1 i∗1a∗59

Append Xtrain with a row derived from b∗1 i∗1a∗, and xk , and append wtrain with wk

Append ytrainOS with a sample from N44Â∗b

OS5′b∗ + 4Â∗i

OS5′i∗ + 4Â∗a

OS5′a∗ + 4Â∗x

OS5′xk1� 2

OS/wk5

Append ytrainDLT with a sample from N44Â∗b

DLT5′b∗ + 4Â∗i

DLT5′i∗ + 4Â∗a

DLT5′a∗ + 4Â∗x

DLT5′xk1� 2

DLT/wk5

if Trial k is the experimental arm of a phase III trial thenAppend yCP

OS with 4Â∗bOS5

′bk + 4Â∗iOS5

′ik + 4Â∗aOS5

′ak + 4Â∗xOS5

′xk

Append yModOS with 4Â∗b

OS5′b∗ + 4Â∗i

OS5′i∗ + 4Â∗a

OS5′a∗ + 4Â∗x

OS5′xk

Append yCPDLT with �Â∗b

DLT′bk+Â∗i

DLT′ ik+Â∗a

DLT′ak+Â∗x

DLT′xk≥005

Append yModDLT with �Â∗b

DLT′b∗+Â∗i

DLT′ i∗+Â∗a

DLT′a∗+Â∗x

DLT′xk≥005

end if

end for

Output: yCPOS , yMod

OS , yCPDLT, and yMod

DLT

though in general grouped metrics outperformed nongroupedmetrics. The best approach is the Grouped IndependentApproach, because it allows the best approximation ofthe combined Grade 3/4 toxicities. We use this approachto compute the final proportion of patients experiencinga DLT.

If one or more of the DLTs for a trial arm are mentioned inthe text but their values cannot be extracted (e.g., if toxicitiesare not reported by grade), then the proportion of patientsexperiencing a DLT for that trial arm is marked as unavailable.This is the case for 104 (21.0%) of trial arms in the database.

Appendix B. Pseudocode of Evaluation Techniquesfor Proposed Regimens

Figure B.1 provides the full pseudocode for the simulationmetric evaluation technique, and Figure B.2 provides the fullpseudocode for the matching metric evaluation technique.

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 21: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy Regimens20 Management Science, Articles in Advance, pp. 1–21, © 2016 INFORMS

Figure B.2 Pseudocode of the Matching Metric Procedure

Input: Clinical trial database X (rows in chronological order) containing q arms (the first r 2= �q/5�− 1 will be used as a training set),corresponding trial weight vector w from §3.2, and corresponding phase III study indicator pIII. Vectors of binary, instantaneous, and averagedose variables and demographic/study characteristic variables for trial k are indicated by bk1 ik1ak , and xk , respectively. The vector of drugclass indicators (Golan et al. 2008) for trial k is indicated by ck . Further input includes outcome vectors yOS and yDLT and parameter t.PIII ← 8bk � 1 ≤ k ≤ r1 pIII

k = 19yCP

OS ← 6 71yModOS ← 6 71yCP

DLT ← 6 71yModDLT ← 6 7

for k = r + 1 to q do

if Trial k is a control arm of a phase III trial thenPIII ← PIII ∪ 8bk9

else if Trial k is an experimental arm of a phase III trial thenXtrain ←X8110001k−19, ytrain

OS ← yOS18110001k−19, ytrainDLT ← yDLT18110001k−19, wtrain ←w8110001k−19

Â̂OS1 Â̂DLT ← coefficients of ridge regression models with parameters selected via cross validation (see §3), trained using Xtrain , ytrainOS , ytrain

DLT ,and wtrain

P←⋃k−1

j=1 84bj1 ij1aj59

P̃← 84b1 i1a5 ∈ P � by PIII, 4Â̂bDLT5

′b+ 4Â̂iDLT5

′i+ 4Â̂aDLT5

′a+ 4Â̂xDLT5

′xk ≤ t9

b∗1 i∗1a∗ ← arg maxb1 i1a∈P̃84Â̂bOS5

′b+ 4Â̂iOS5

′i+ 4Â̂aOS5

′a+ 4Â̂xOS5

′xk9, with ties broken randomlyPIII ← PIII ∪ 8b∗9

c∗ ← vector of drug class indicators (Golan et al. 2008) from b∗

F← arg maxk≤j≤q890c∗′cj + 9b∗′bj +b∗′6�a∗1=a

j1 and i∗1=i

j10 0 0�

a∗n=a

jn and i∗n=i

jn79

Append yCPOS with yOS1k and append yCP

DLT with �yDLT 1k≥005

Append yModOS with

j∈F yOS1 j/�F� and append yModDLT with

j∈F �yDLT 1 j≥005/�F�

end if

end for

Output: yCPOS , yMod

OS , yCPDLT, and yMod

DLT

ReferencesAjani J, Rodriguez W, Bodoky G, Moiseyenko V, Lichinitser M,

Gorbunova V, Vynnychenko I, Garin A, Lang I, Falcon S (2010)Multicenter phase III comparison of cisplatin/S-1 with cis-platin/infusional fluorouracil in advanced gastric or gastro-esophageal adenocarcinoma study: The FLAGS trial. J. ClinicalOncology 28(9):1547–1553.

Bang Y-J, Van Cutsem E, Feyereislova A, Chung HC, Shen L,Sawaki A, Lordick, F, et al. (2010) Trastuzumab in combinationwith chemotherapy versus chemotherapy alone for treatment ofHER2-positive advanced gastric or gastro-oesophageal junctioncancer (ToGA): A phase 3, open-label, randomised controlledtrial. Lancet 376(9742):687–697.

Breiman L (2001) Random forests. Machine Learn. 45(1):5–32.Buccheri G, Ferrigno D, Tamburini M (1996) Karnofsky and ECOG

performance status scoring in lung cancer: A prospective, longi-tudinal study of 536 patients from a single institution. Eur. J.Cancer 32(7):1135–1141.

Burke H (1997) Artificial neural networks improve the accuracy ofcancer survival prediction. Cancer 79(4):857–862.

Canty A, Ripley B (2014) boot: Bootstrap R 4S-Plus5 Functions. Packageversion 1.3-13.

Chao Y, Li CP, Chao TY, Su WC, Hsieh RK, Wu MF, Yeh KH, KaoWY, Chen LT, Cheng AL (2006) An open, multi-centre, phase IIclinical trial to evaluate the efficacy and safety of paclitaxel,UFT, and leucovorin in patients with advanced gastric cancer.British J. Cancer 95(2):159–163.

Chou T-C (2006) Theoretical basis, experimental design, and com-puterized simulation of synergism and antagonism in drugcombination studies. Pharmacological Rev. 58(3):621–681.

Davison AC, Hinkley DV (1997) Bootstrap Methods and Their Applica-tions (Cambridge University Press, Cambridge, UK).

Dearden R, Friedman N, Russell S (1998) Bayesian Q-learning.AAAI-98 Proc. (AAAI Press, Palo Alto, CA), 761–768.

Delbaldo C, Michiels S, Syz N, Soria JC, Le Chevalier T, Pignon JP(2004) Benefits of adding a drug to a single-agent or a 2-agentchemotherapy regimen in advanced non-small-cell lung cancer:A meta-analysis. J. Amer. Medical Assoc. 292(4):470–484.

Delen D, Walker G, Kadam A (2005) Predicting breast cancer surviv-ability: A comparison of three data mining methods. ArtificialIntelligence Medicine 34(2):113–127.

De Ridder F (2005) Predicting the outcome of phase III trials usingphase II data: A case study of clinical trial simulation in latestage drug development. Basic Clinical Pharmacology Toxicology96(3):235–241.

Efferth T, Volm M (2005) Pharmacogenetics for individualized cancerchemotherapy. Pharmacology Therapeutics 107(2):155–176.

Frazier PI, Powell WB, Dayanik S (2008) A knowledge-gradientpolicy for sequential information collection. SIAM J. ControlOptim. 47(5):2410–2439.

Friedman J, Hastie T, Tibshirani R (2010a) Regularization pathsfor generalized linear models via coordinate descent. J. Statist.Software 33(1):1–22.

Friedman LM, Furberg CD, DeMets DL (2010b) Fundamentals ofClinical Trials, Vol. 4 (Springer, New York).

Golan DE, Tashjian AH Jr, Armstrong EJ, Armstrong AW, eds. (2008)Principles of Pharmacology: The Pathophysiologic Basis of DrugTherapy, 2nd ed. (Lippincott, Williams and Wilkins, Philadelphia).

Hastie T, Tibshirani R, Friedman J (2009) The Elements of StatisticalLearning, 2nd ed. (Springer, New York).

Hoerl AE, Kennard RW (1970) Ridge regression: Biased estimationfor nonorthogonal problems. Technometrics 12(1):55–67.

Hsu C, Shen Y-C, Cheng C-C, Cheng AL, Hu FC, Yeh KH(2012) Geographic difference in safety and efficacy of systemicchemotherapy for advanced gastric or gastroesophageal carci-noma: A meta-analysis and meta-regression. Gastric Cancer 15(3):265–280.

Hsu C-W, Chang C-C, Lin C-J (2003) A practical guide to supportvector classification. Technical report, Department of ComputerScience, National Taiwan University. Accessed June 4, 2013,http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.

Hurria A, Togawa K, Mohile SG, Owusu C, Klepin HD, Gross CP,Lichtman SM, et al. (2011) Predicting chemotherapy toxicityin older adults with cancer: A prospective multicenter study.J. Clinical Oncology 29(25):3457–3465.

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.

Page 22: INFORMS is located in Maryland, USA ... - web.mit.eduweb.mit.edu/dbertsim/www/papers/HealthCare/An Analytics Approac… · An Analytics Approach to Designing Combination Chemotherapy

Bertsimas et al.: Designing Chemotherapy RegimensManagement Science, Articles in Advance, pp. 1–21, © 2016 INFORMS 21

Iwase H, Shimada M, Tsuzuki T, Ina K, Sugihara M, HarutaJ, Shinoda M, Kumada T, Goto H (2011) A phase II multi-center study of triple therapy with paclitaxel, S-1 and cisplatinin patients with advanced gastric cancer. Oncology 80(1–2):76–83.

Jefferson M, Pendleton N, Lucas S, Horan MA (1997) Comparison ofa genetic algorithm neural network with logistic regression forpredicting outcome after surgery for patients with nonsmall celllung carcinoma. Cancer 79(7):1338–1342.

Kang Y-K, Kang W-K, Shin D-B, Chen J, Xiong J, Wang J,Lichinitser M, et al. (2009) Capecitabine/cisplatin versus5-fluorouracil/cisplatin as first-line therapy in patients withadvanced gastric cancer: A randomised phase III noninferioritytrial. Ann. Oncology 20(4):666–673.

Karnofsky DA, Burchenal JH (1949) The clinical evaluation ofchemotherapeutic agents in cancer. MacLeod CM, ed. Evalu-ation of Chemotherapeutic Agents (Columbia University Press,New York), 191–205.

Kleiman M, Sagi Y, Bloch N, Agur Z (2009) Use of virtual patientpopulations for rescuing discontinued drug candidates and forreducing the number of patients in clinical trials. AlternativesLab Animals 37(Supplement 1):39–45.

Koizumi W, Narahara H, Hara T, Takagane A, Akiya T, Takagi M,Miyashita K, et al. (2008) S-1 plus cisplatin versus S-1 alone forfirst-line treatment of advanced gastric cancer (SPIRITS trial): Aphase III trial. Lancet 9(3):215–221.

Lee KH, Hyun MS, Kim H-K, Jin HM, Yang J, Song HS, Do YR, et al.(2009) Randomized, multicenter, phase III trial of heptaplatin1-hour infusion and 5-fluorouracil combination chemother-apy comparing with cisplatin and 5-fluorouracil combinationchemotherapy in patients with advanced gastric cancer. CancerRes. Treatment 41(1):12–18.

Lee Y-J, Mangasarian OL, Wolberg WH (2003) Survival-time classifica-tion of breast cancer patients. Comput. Optim. Appl. 25(1):151–166.

Liaw A, Wiener M (2002) Classification and regression by random-Forest. R News 2(3):18–22. http://CRAN.R-project.org/doc/Rnews/.

Lutz MP, Wilke H, Wagener DJT, Vanhoefer U, Jeziorski K, Hegewisch-Becker S, Balleisen L, et al. (2007) Weekly infusional high-dosefluorouracil (HD-FU), HD-FU plus folinic acid (HD-FU/FA), orHD-FU/FA plus biweekly cisplatin in advanced gastric cancer:Randomized phase II trial 40,953 of the European organisationfor research and treatment of cancer gastrointestinal groupand the arbeitsgemeinschaft internistische onkologie. J. ClinicalOncology 25(18):2580–2585.

Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F,Chang CC, Lin CC (2012) e1071: Misc Functions of theDepartment of Statistics, Probability Theory Group (formerly:E1071), TU Wien. R package version 1.6-1. http://CRAN.R-project.org/package=e1071.

National Cancer Institute (2006) Common terminology criteriafor adverse events v3.0 (CTCAE). http://ctep.cancer.gov/protocolDevelopment/electronic_applications/docs/ctcaev3.pdf.

NCCN (National Comprehensive Cancer Network) (2014) NCCNGuidelines for Gastric Cancer, Version 1.2014 (NCCN, Fort Wash-ington, PA). Accessed October 15, 2014, http://www.nccn.org/professionals/physician_gls/pdf/gastric.pdf.

Ohno-Machado L (2001) Modeling medical prognosis: Survivalanalysis techniques. J. Biomedical Informatics 34(6):428–439.

Oken MM, Creech RH, Tormey DC, Horton J, Davis TE, McFaddenET, Carbone PP (1982) Toxicity and response criteria of theeastern cooperative oncology group. Amer. J. Clinical Oncology5(6):649–656.

Overmoyer B (2003) Combination chemotherapy for metastaticbreast cancer: Reaching for the cure. J. Clinical Oncology 21(4):580–582.

Page R, Takimoto C (2002) Principles of chemotherapy. Pazdur R,Coia LR, Hoskins WJ, Wagman LD, eds. Cancer Management:A Multidisciplinary Approach: Medical, Surgical and RadiationOncology (PRR Inc., Melville, NY), 21–38.

Phan J, Moffitt R, Stokes T, Liu J, Young AN, Nie S, Wang MD (2009)Convergence of biomarkers, bioinformatics and nanotechnologyfor individualized cancer treatment. Trends Biotech. 27(6):350–358.

Pratt WB (1994) The Anticancer Drugs (Oxford University Press,New York).

R Development Core Team (2011) R: A Language and Environmentfor Statistical Computing R Foundation for Statistical Computing,Vienna, Austria. ISBN 3-900051-07-0. Available at http://www.R-project.org/.

Roth AD (2003) Chemotherapy in gastric cancer: A never endingsaga. Ann. Oncology 14(2):175–177.

Sertkaya A, Birkenbach A, Berlind A, Eyraud J (2014) Examination ofclinical trial costs and barriers for drug development. TechnicalReport HHSP23337007T, U.S. Department of Health and HumanServices, Washington, DC.

Thompson S, Higgins J (2002) How should meta-regression anal-yses be undertaken and interpreted? Statist. Medicine 21(11):1559–1573.

Thuss-Patience PC, Kretzschmar A, Repp M, Kingreen D, HennesserD, Micheel S, Pink D, Scholz C, Dörken B, Reichardt P (2005)Docetaxel and continuous-infusion fluorouracil versus epirubicin,cisplatin, and fluorouracil for advanced gastric adenocarci-noma: A randomized phase II study. J. Clinical Oncology 23(3):494–501.

Tibshirani RJ (1996) Regression shrinkage and selection via the Lasso.J. Roy. Statist. Soc. Ser. B 58(1):267–288.

Torre LA, Bray F, Siegel RL, Ferlay J, Lortet-Tieulent J, Jemal A (2015)Global cancer statistics, 2012. CA Cancer J. Clinicians 65(2):87–108.

van’t Veer LJ, Bernards R (2008) Enabling personalized cancermedicine through analysis of gene-expression patterns. Nature452(7187):564–570.

Wagner A (2006) Chemotherapy in advanced gastric cancer: Asystematic review and meta-analysis based on aggregate data.J. Clinical Oncology 24(18):2903–2909.

Wong R, Cunningham D (2009) Optimising treatment regimens forthe management of advanced gastric cancer. Ann. Oncology20(4):605–608.

World Health Organization (2012) Fact sheets: Cancer. World HealthOrganization, Geneva.

Zhao L, Tian L, Cai T, Claggett B, Wei LJ (2013) Effectively selectinga target population for a future comparative study. J. Amer.Statist. Assoc. 108(502):527–539.

Zhao Y, Kosorok MR, Zeng D (2009) Reinforcement learning designfor cancer clinical trials. Statist. Med. 28(26):3294–315.

Zhao Y, Zeng D, Rush AJ, Kosorok MR (2012) Estimating individual-ized treatment rules using outcome weighted learning. J. Amer.Statist. Assoc. 107(499):1106–1118.

Dow

nloa

ded

from

info

rms.

org

by [

173.

166.

127.

254]

on

16 M

arch

201

6, a

t 08:

33 .

For

pers

onal

use

onl

y, a

ll ri

ghts

res

erve

d.