Development and Validation of a Gene Signature Classifier ......Oct 27, 2020 · 2 . Abstract: Purpose: Consensus molecular subtyping (CMS) of colorectal cancer (CRC) has potential

1

Development and Validation of a Gene Signature Classifier for Consensus Mo-

lecular Subtyping of Colorectal Carcinoma in a CLIA-Certified Setting

Jeffrey S. Morris1, Rajayalakshmi Luthra2, Yusha Liu3, Dzifa Duose2, Wonyul Lee4, Neelima Reddy2,

Justin Windham5, Huiqin Chen4, Zhimin Tong2, Baili Zhang2, Wei Wei6, Manyam Ganiraju7, Bradley

Broom7, Hector Alvarez2, Alicia Mejia2, Omkara Veeranki2, Mark Routbort2, Van Morris8, Michael J.

Overman8, David Menter8, Riham Katkhuda9, Ignacio I. Wistuba2, Jennifer S. Davis10, Scott Kopetz8*,

Dipen M. Maru2*

1 Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania School of

Medicine

2 Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Cen-

ter

3 Department of Biostatistics, University of Chicago School of Medicine

4 Department of Biostatistics, The University of Texas MD Anderson Cancer Center

5 NanoString Technologies Inc.

6 Cleveland Clinic Foundation

7Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson

Cancer Center

8Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer

Center

9Department of Pathology, University of Chicago Medical Center

10Department of Epidemiology, The University of Texas MD Anderson Cancer Center

* Contributed equally as co-senior authors of this article

Corresponding Author:

Dipen Maru, MD

Professor

Departments of Anatomic Pathology and Translational Molecular Pathology

The University of Texas MD Anderson Cancer Center

Phone: 713 792 2678

Email: [email protected]

on July 5, 2021. © 2020 American Association for Cancer Research.clincancerres.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on October 27, 2020; DOI: 10.1158/1078-0432.CCR-20-2403

mailto:[email protected]://clincancerres.aacrjournals.org/

2

Abstract:

Purpose: Consensus molecular subtyping (CMS) of colorectal cancer (CRC) has potential to reshape

the CRC landscape. We developed and validated an assay that is applicable on formalin fixed paraf-

fin embedded (FFPE) samples of CRC and implemented the assay in a CLIA-certified laboratory. Ex-

perimental design: We performed an in silico experiment to build an optimal CMS classifier using a

training set of 1329 samples from 12 studies and validation set of 1329 samples from 14 studies. We

constructed assay based on Nanostring codesets for the top 472 genes, and performed analyses on

paired flash frozen (FF)/FFPE samples from 175 CRCs to adapt the classifier to FFPE using a subset

of genes found to be concordant between FF and FFPE, and tested the classifier`s reproducibility,

repeatability, and validated in a CLIA-certified laboratory. We assessed prognostic significance of

CMS in 345 patients pooled across 3 clinical trials. Results: The best classifier was Weighted Sup-

port Vector Machine with high accuracy across platforms and gene lists (>0.95), and the 472-gene

model outperforming existing classifiers. We constructed subsets of 99 and 200 genes with high

FF/FFPE concordance, and adapted FFPE-based classifier that had strong classification accuracy

(>80%) relative to “gold standard” CMS. The classifier was reproducible to sample type, RNA quality,

and demonstrated poor prognosis for CMS1-3 and good prognosis for CMS2 in metastatic CRC

(p

3

Introduction:

Colorectal cancer (CRC) is the third most common cancer and a leading cause of cancer death

worldwide. Several papers were published introducing CRC molecular subtyping systems; each parti-

tioning colorectal cancer into three to six subtypes (1-7). The translational value of these works were

limited by their relatively small sample sizes and lack of consensus regarding which of the six subtyp-

ing systems best captured the tumor heterogeneity and had superior utility as predictive and/or prog-

nostic marker. In this context, Guinney, et al. (8) assembled the colorectal subtyping consortium

(CRCSC) that sought to identify consensus molecular subtypes (CMS) by assembling a data base of

gene expression measurements from 4,151 CRC patients from a collection of 18 international stud-

ies, having each of the six subtyping systems applied to each of these samples, and then using a

network analysis to identify consensus clusters. The four consensus subtypes were identified primari-

ly based on the biologic characteristics of colorectal cancer. However, findings by Guinney, et al. and

subsequent other studies have demonstrated prognostic and predictive value of CMS in colorectal

cancer (9-17).

In order to fully realize these potential benefits of CMS, it is necessary to have a robust, reliable sin-

gle sample classifier to discern a CRC patient’s CMS from the tumor tissue. As part of an internation-

al consortium (8), we previously presented a Random Forest classifier that included 5973 genes and

a “single sample” classifier based on nearest centroid predictor applied using 693 genes, built primari-

ly using microarrays designed for use with flash frozen (FF) samples. Efforts are needed to build a

more parsimonious single sample classifier using fewer genes that can be reliably run on RNA ex-

tracted from formalin-fixed paraffin embedded (FFPE) samples. In this paper, we introduce an FFPE-

based CMS classifier using the NanoString platform that has strong accuracy for predicting the CMS

in CRC samples including those from the CRCSC study. This gene classifier was discovered and val-

idated in silico by using the CRCSC data sets and subsequently optimized based on degree of corre-

lation across tissue types; FFPE vs. FF samples and platform type NanoString vs. Affymetrix. Subse-

quently, we validated this FFPE tissue based gene classifier in a CLIA-certified molecular diagnostic

laboratory and demonstrate prognostic significance of CMS in CRC.

Material and Methods:

Development and Validation of CMS Classifier on CRCSC: We performed in silico development

and validation of the CMS classifier using the samples and datasets that were part of the “consensus”

set in CRCSC, meaning that they had so-called “gold standard” CMS status, defined based on

agreement among the six different subtyping systems, against which we could compare to assess

classification accuracy of our CMS classifier. Details of the discovery and validation approach, data

sets used, including tissue type, total number of samples, total number of consensus samples and

preprocessing method are shown in figure-1A, supplementary table-1 and supplementary methods 1

and 2. Classification Modeling Strategies: The classification modelling strategies we considered

included Linear Discriminant Analysis (18,19), Quadratic Discriminant Analysis (20,21), K-Nearest

Neighbor (22,23), Random Forrest (8,24), Rotation Forrest (25,26), Weighted Support Vector Ma-



http://clincancerres.aacrjournals.org/

4

chine (wSVM)(27,28), Distance-Weighted Discrimination (DWD)(29,30), and Ensemble Methods

comprised of voting schemes across these classifiers (supplementary methods 3). We split the train-

ing data set of V1 into subsets, containing 332, 332, 332, and 333 samples, respectively, for use in

the four-fold cross validation model building strategy. For each modeling strategy, we applied the

Quantile Normalization(30) and fit the model to each of the four ¾ subsets, optimizing tuning parame-

ters using nested cross validation, and assessed accuracy for predicting the gold standard CMS on

the left-out ¼ subset. Tuning parameters that showed the best accuracy were selected as optimal

parameters for each subset. We summarized the predictive accuracy of each modeling strategy as a

function of number of genes, allowing us to both assess which modeling strategy appears to be best

and the minimum number of genes needed for accurate CMS classification. Choosing the best mod-

eling strategy, we computed the classification accuracy in the validation data set V2 as well as the

various subsets mentioned above, and summarized results again as a function of number of genes in

the model. Gene Ranking Strategy: We designed a boosting procedure based on multi-class Ada-

boost (31) to order the genes (see supplementary methods 4), which amounts to a forward stage-

wise additive selection in which samples were repeatedly re-weighted at each step so the next best

gene focused more on samples misclassified on previous steps, resulting in a list of genes ranked in

descending order of classification importance. By using the same reduced gene sets for each classifi-

cation method, we were able to gain a straightforward comparison of which method appears to per-

form better, to find the minimum gene set size yielding good classification performance, and to fairly

compare the various methods at any desired model size. wSVM classifier: Our results revealed that

the best performing classifier was the wSVM. The user calls the wSVM function with an N by P matrix

of expression values for P genes for each of N samples with the column names as Entrez IDs, and

the function will quantile normalize the data and apply the wSVM to get class predictions. After pair-

wise coupling, for sample I, we obtain probabilities of each CMS, ij such that ∑ 𝜋𝑖𝑗 = 14𝑗=1 , with

i=maxj {ij} indicating the highest CMS class probability for that sample, which we consider a meas-

ure of CMS classification confidence. We have two possible rules to classify a sample into a CMS

group based on these measures:

1. Most Likely CMS: Classify sample i into the most likely CMS, {j: ij =max (ij)}, regardless of clas-

sification confidence i.

2. Most Likely CMS with a Confidence Threshold: Classify sample i into the most likely CMS as

long as the classification confidence i is above some threshold (e.g. 0.50 or higher), and oth-

erwise consider indeterminate {Choose CMS j: ij =max (ij) if ij >, otherwise CMS indetermi-

nate}. Indeterminate samples are heterogenous tumors containing characteristics of multiple

CMS, so could also be called “mixed CMS”, as done for 13% of total samples by Guinney et al.

(8).

Generation of Gene Signature Classifier in CRC Samples:

Summary of the Approach for Development of NanoString Classifier: We used a novel strategy

to port the CMS classifier designed for Affymetrix platform on FF samples over to the NanoS-

tring/FFPE setting that efficiently utilizes the vast information available to us in the CRCSC data sets

and overcomes inconsistencies in mRNA quality between FF and FFPE samples (Figure 1-B and

supplementary Figure-1).

Affymetrix 133-2 Plus2.0 and NanoString CodeSets Based Gene Expression Assays: Flash fro-

zen and FFPE tumor samples from randomly selected 175 patients including 95 men and 80 women

with stage I-IV colon cancer were included in the first phase of the study to build a FFPE classifier.




5

Seventy-two out of 175 samples were included in the CRCSC, therefore, the “gold standard” CMS based on

Affymetrix expression array was known and only Nanostring assay was run on RNA extracted from FF and

FFPE tissue samples of these tumors. Additional FF and FFPE samples from 103 tumors were identified from

our institutional biorepository. The “gold standard” CMS was not known for these samples, therefore, Affymet-

rix 133-2 Plus2.0 was run on FF samples to identify the CMS as the “gold standard”. Subsequently, Nanostring

assay was run on RNA extracted from FF and FFPE samples from these tumors. All samples were derived

from primary colon/rectum resection specimens without preoperative tumor targeted therapy. The

clinico-pathologic features of the patient population are shown in Table-1 and supplementary meth-

ods 5. Briefly, tumor areas with higher than 60% tumor cellularity from the gland forming tumor and

higher than 20% tumor cellularity from signet ring or mucinous tumor were manually macro-dissected

from FF or FFPE tissue sections to enrich for the demarcated tumor area. In more than 2/3 samples,

superficial and deep (invasive border) areas of the tumor were included for the macro dissection.

RNA was extracted using Qiagen’s AllPrep DNA/RNA kits (QIAGEN, Netherlands) per manufacturer’s

instructions. Each sample was quantitated using the qubit fluorometer, with yields ranging from .9 g

to 26 g. Each sample was run on the Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA) to

determine the RNA Integrity Number (RIN) for the FF samples and the DV200 value for the FFPE

samples. The FFPE samples included in the development of single sample classifier had 18% to 79%

of the RNA having greater than 200 intact nucleotides. Gene expression analysis with Affymetrix 133-

2 Plus2.0 was performed as described previously (supplementary methods 6). We designed a custom

set of NanoString CodeSets (NanoString Technologies, Seattle, WA) with 472 signature probes, se-

lected from the top 500 genes from the boosting procedure and 28 reference probes. The NanoS-

tring CodeSets for each gene was chosen to be the genomic region that was most highly correlated

with the fRMA level-expression summary of the gene by Affymetrix 133-2 Plus2.0 probe set (Affymet-

rix, Santa Clara, CA). The genes with at least 0.70 correlation between the CodeSets and gene level

summary were included in the customized CodeSets. The 28 reference CodeSets were selected from

the reference CodeSets on the NanoString PanCan array plusR, and selecting genes with evidence of

no difference in CMS in our preliminary data (32-36). The assay was performed as per the NanoS-

tring guidelines (supplementary methods 7) with 10 patient samples, a positive and negative control

on each cartridge. Raw data from the nsolver software was transferred to the bioinformatics group

where the custom CMS classifier algorithm in the form of R script was used to determine which sam-

ples belonged to a particular CMS.

Validation at the Research Molecular Diagnostic Laboratory: The NanoString CodeSets were technically validated by running two samples with “gold standard” CMS known from the CRCSC data on three different lots of CodeSets. The old lot of CodeSets and new lot of CodeSets were run to-gether in the same run and accuracy in identifying the CMS by these CodeSets was assessed by lin-ear regression. We tested repeatability of CMS assay across 4 different runs by same technician in 12 samples, reproducibility with different technician in 12 samples, and reproducibility with different input RNA quantity (50-500ng) for 6 samples using the same CodeSets on the same nCounter used for prior experiments. We also tested reproducibility of CMS between colonoscopy biopsies and sur-gically resected primary CRC by running customized CodeSets on matched biopsies and resection samples, using same CodeSets and nCounter and laboratory personnel. Assessing Performance of CMS Classifier Assay in A CLIA Certified Laboratory:

The NanoString assay with top 200 genes (CRC CMS-200) and top 99 genes (CRC CMS-100),was

further validated at our CLIA-certified Molecular Diagnostic Laboratory (MDL) to apply this assay as

an integral biomarker for a phase II clinical trial (NCT034365630) assessing safety and efficacy of




6

dual TGF-β trap: anti-PD-L1 molecule M7824 (EMD-Serono) in CMS4 subtype CRC. Thirty-five tumor

samples from stage II/III primary colon cancer, previously used for validation at the research molecu-

lar diagnostic laboratory, were used to validate the assay across 10 runs for a total of 120 reactions.

All 35 samples were included in the CRCSC study and gold standard CMS was known for these

samples, and the laboratory technician was blinded from the gold standard CMS for those samples.

Input for the assay was 250 ng of total RNA extracted from FFPE tumor tissue with 20% or higher

tumor cellularity. Accuracy, analytical sensitivity, and analytical specificity were assessed by compar-

ing calls from the MDL CMS panel with “gold standard” CRCSC Affymetrix calls. Reproducibility was

assessed across original run and at least 3 additional repeat runs without re-extraction of RNA. Re-

peat runs were also performed with re-extracted RNA and by 2 technicians.

Assessing Performance of CMS Classifier as a Prognostic Marker in Stage IV Colorectal Can-

cer:

Patients with a CMS determination from the NanoString based gene expression score were pooled

from three separate sources, including two clinical trials; NCT03436563 (n=91) and a phase II trial

assessing Trametinib and Durvalumab in microsatellite stable colorectal cancer (n=19,

NCT03428126) and Assessment of Targeted Therapies Against Colorectal Cancer (ATTACC,

NCT01196130) Screening Protocol (n=235) (37). The ATTACC samples and the samples from the

Trametinib/Durvalumab study were characterized by CRC CMS-100 assay at the research molecular

diagnostic laboratory, while samples from patients enrolled in M7824 clinical trial were characterized

by CRC CMS-200 performed at the CLIA compliant molecular diagnostic laboratory. Median overall

survival was calculated from date of stage IV diagnosis to death or date of last follow up, which was

censored. Survival patterns were visualized with Kaplan Meier survival curves and compared using

the log-rank test. Graphs were generated using IBM SPSS Statistics 24.

The study was approved by the institutional review board (IRB) with an informed consent from each

subject or each subject`s guardian for the clinical trial samples. The work on samples from subjects

not enrolled in the clinical trial was approved by the IRB with waiver of informed consent. The study

has been conducted as per the ethical guidelines of U.S. common rule.

Results:

Performance of CRCSC Classifier on CRCSC Data Sets:

We selected the wSVM model as it had the best performance in the Training Data V1 using 4-fold

cross-validation (Supplementary Figure 2, Supplementary Table 2 and supplementary material 8).

The 4-group classification accuracy of the wSVM model on the validation data set V2 was 0.955 for

the full model (5973 genes), and still outstanding for models involving smaller gene numbers, with 4-

group classification accuracies of 0.959, 0.932, and 0.898 for models with 500, 75, and 20 genes, re-

spectively. The performance of the wSVM classifier for the out-of-sample subset (V2o), the RNAseq

subset (TCGA), and the Affymetrix subset (V2a) were comparable to the overall validation perfor-

mance (V2), suggesting the classifier was robust to platform and has good out-of-sample perfor-

mance, relatively evenly across CMS (Supplementary Tables 3-7, supplementary material 9). We

chose a wSVM classifier with 472 genes to move forward with further validation. This classifier yield-

ed an overall 96.3% classification accuracy in the Affymetrix subset V2a, with accuracies of 0.966,

0.967, 0.932, and 0.971 for CMS1, CMS2, CMS3, and CMS4, respectively. The CMS structure was

remarkably persistent being highly consistent in training and validation datasets (heat map in Sup-

plemental Figure-3). Further comparison on our classifier with classifiers described by Guinney et. al.




7

is shown in the supplementary Table-8. Performance of 472 gene CRCSC classifier based on single

Affymetrix probe gene set and the classifier performance by classification confidence are described in

supplementary material 10 and 11 and supplementary Figure-4. Supplementary Table-9 shows all of

the misclassified samples along with the corresponding wSVM class probabilities (ij) for each CMS,

classification confidence (i), and indication of whether this sample could be considered a “CMS mix-

ture” (i.e. (ij >0.20 for multiple CMS) and if the “gold standard” was a part of that mixture. From this,

we see that most of the “misclassified samples” had lower classification confidence i, and many had

evidence of being CMS mixtures, with the “gold standard” CMS being a component of the mixtures.

Nanostring CRCSC Classifier Optimization Based on Correlation between FF and FFPE Tumor

Samples:

The sample-specific correlations of FF and FFPE measurements were very high for most samples

(Figure 2a, Supplementary Table 10A and supplementary material 12), with a small number of sam-

ples with low correlations tending to have poorer RNA quality for their FF samples (p=0.0077) but not

FFPE (p=0.28, Figures 2B and 2C). A histogram of the gene-specific correlation of FF and FFPE

measurements for each of the 472 classifier genes is found in Figure 2D and summarized in supple-

mentary Table 10B. This histogram demonstrates the high level of variability across genes in terms

of concordance of paired FF/FFPE gene expression measurements, and the remarkable consistency

of the gene-specific concordances across batches (Supplementary Figure 5) suggests that this con-

cordance is a consistent characteristic of the gene/probe set and not a random technical factor. This

motivated us to select a subset of genes showing high FF/FFPE concordance for use in our FFPE

classifier, choosing the top 100 genes in terms of FF/FFPE correlation for the CMS-100 classifier and

the top 200 genes for CMS-200.

NanoString FFPE Classifier Performance:

Figure 3 shows the classification accuracy of CMS-100 on FFPE and FF samples, CMS-472 on FF

samples, and the Affy FF-100 and Affy FF-472 based on the Affymetrix validation data V2a, with ac-

curacy split out by confidence threshold and proportion of unclassified samples. The CRC CMS-

100 model applied to FFPE samples had 4-group accuracy of 0.80 with 0.81 for CRCSC samples and

0.78 for non-CRCSC samples. For samples with high confidence (i>0.80 or 0.90), the performance

was better with 4-group accuracy of 0.86 and 0.89, respectively. For FF samples, the CMS-100 had

4-group classification accuracy of 0.80, 0.74 for the CRCSC samples and 0.88 for non-CRCSC sam-

ples, and 4-class accuracy of 0.87 and 0.92 for samples classified with high confidence i =0.80 or

0.90, respectively (supplementary material 13). These performed comparably to the CMS-472, the

472 gene classifier on FF samples, and not much worse than CMS-100 in an idealized non-clinical

setting based on batch-corrected Affymetrix data from the CRCSC studies.

Supplementary figures 6A and 6B plot the 4-class accuracy vs. confidence level i for FFPE and FF

samples, demonstrating that samples classified with high confidence were more likely to be accurate-

ly classified. Supplementary figures 6C and 6D plot the 4-class accuracy vs. RNA quality, defined by

%200nt (FFPE) or RIN (FF), demonstrating that there is little if any association of CMS accuracy with

RNA quality, suggesting that the performance of classifier is robust to RNA quality in this study. One

gene out of the 100 was mistakenly left off an order of the Nanostring codesets for some of the vali-

dation study, so the corresponding classifier CMS-100 that was validated has 99 genes. We con-

firmed the performance of the 100 and 99 gene classifier was concordant.




8

The CRC CMS-100 assay with 99 genes was 100% reproducible in predicting a CMS across different

runs (12 samples= 48 runs), between two laboratory personnel (12 samples) and with different RNA

input concentration (n=6). The reproducibility between biopsy and resection was 91% with 15 of 17

patients had same CMS between matched biopsy and resection specimens (Supplementary Table-

11). All (12 from left colon and 5 from right colon) biopsy samples were procured from same tumor as

surgically resected specimens. Tissue sections from RNA were derived from FFPE blocks that were

generated for the clinical use. The two cases with discrepant CMS between biopsy and resection

were sporadic CRC without any known predisposing condition or preoperative tumor targeted thera-

py. To determine impact of tumor location and histopathologic features on reproducibility of CMS, an-

other pathologist reviewed Hematoxylin & Eosin stained sections of primary tumor from surgical re-

sections of those included in assessment of inter-run reproducibility, InterTech reproducibility, repro-

ducibility across different RNA concentration and reproducibility between biopsy and resection (n=30).

In 19 samples both superficial and deep area of the tumor were macro dissected, 4 samples had only

superficial and 7 had only deep area of the tumor macro dissected. Five tumors were poorly differen-

tiated, one with mucinous histology and 25 tumors were moderately differentiated including one with

mucinous histology. Due to high reproducibility across runs, technicians and RNA concentration, we

did not observe any difference in CMS call among samples with different areas of macro dissection or

histologic parameters. We also did not observe significant difference in probability of a CMS in the

context of histologic parameters. However, two (of 17) samples that showed discrepancy for CMS be-

tween biopsy and surgical resection had only deep area of the tumor macro dissected from the resec-

tion specimens. We also did not find histologic features unique to 7 samples that were discrepant for

the CMS between research laboratory and CLIA certified laboratory.

Performance of CRC CMS-200 in CLIA-certified Molecular Diagnostic Laboratory:

On initial run, 32/35 samples were accurately assigned the CMS as compared to the gold standard

based on “most likely CMS”, i.e. with confidence threshold of i>0.50. Three misclassified samples,

with 0.50 and 0.57 “most likely CMS” probability on the initial run had “most likely CMS” probability in

the borderline range (≥0.43 & 0.57 is used, then in all three samples in all runs had CMS as per the gold

standard, and the assay had 100% analytical sensitivity and analytical specificity. Inter run reproduci-

bility was assessed from 3 separate extractions from 4 unique patient samples for a total of 12 cases.

These 12 cases were run across 3 separate NanoString Runs and by 2 technologists. There was

100% concordance for the CMS classification among all 3 runs with an average standard deviation of

± 0.002 for the “most likely CMS” probability. The inter tech reproducibility was 100% for the CMS

classification between both technicians with an average standard deviation of ± 0.002 for the CMS

probability. Intra run reproducibility was assessed among 4 samples run in triplicate on a single

NanoString run. There was 100% concordance for the CMS classification among all 3 runs with an

average standard deviation of ± 0.012 for the CMS probability. Comparing CMS reproducibility with

CMS-100 (99 genes) vs. 200 gene assay demonstrated 97% reproducibility with only 1 of 35 samples

showing discordant CMS. List of CMS100 (99 genes) and CMS200 test and 16 housekeeping genes

is shown in Supplementary Table-12. These reproducibility and repeatability findings were deemed

up to the level of a CLIA certified assay to determine CMS 4 vs. other CMS for FFPE tumor samples




9

from patients enrolled in the clinical trial targeting patients with CMS4 colorectal cancer, as described

in the methods.

KRAS-BRAF Mutational Status and Prognostic Relevance of CMS by the NanoString CMS

Classifier:

To confirm the expected biologic performance of the assay, we surveyed a set of mCRC patients en-

rolled in clinical trials and ATTACC Protocol (Table-2). Higher frequency of KRAS mutation was ob-

served with CMS 3 (66%) and CMS 4 (50%) samples. BRAF mutations were identified only in CMS 1

(50%), CMS 4(8%), and mixed (12%) subtype samples (Figure-4A). We did not find significant differ-

ence in any of the clinicopathologic and molecular characteristics between samples that were classi-

fied as mixed vs. those that were classified in to one of CMS. Using the CMS-100 (99 genes) classifi-

er, we were able to identify significant differences in overall survival by CMS, consistent with prior

studies (9,14). Specifically, patients with a CMS2 tumor had the best survival with a median of 46.1

months from stage IV diagnosis (95% CI: 36.6, 58.1), patients with a CMS1 or CMS3 tumor had the

poorest survival after a stage IV diagnosis with median survival times of 23.2 (95% CI: 19.3, 59.2) or

21.4 (95% CI: 15.8, 34.6) months, respectively. Patients with a CMS4 tumor had a survival pattern

that was in between that of CMS2 and CMS1 or CMS3 with a median survival time of 35.3 months

(95% CI: 32.2, 40.0) (Figure-4B).

Discussion:

Consensus molecular subtyping has great potential to reshape the landscape of CRC treatment and

contribute to the development of new precision therapeutic approaches. However, to realize this po-

tential, it is necessary to transform the CMS based on network analysis of multiple gene expression

datasets into a clinical test requires an assay that is reproducible across platforms and tissue types,

has high classification accuracy, and is able to generate CMS in a single sample setting. We

achieved this objective using a three step approach for building the classifier; in silico testing of vari-

ous classification strategies considering various gene list sizes on CRCSC data generated on Affy-

metrix platform and determining that wSVM is the optimal system, a gene reduction exercise to select

genes with best concordance across gene expression profiling platforms and tissue types to ensure

optimal performance in FFPE samples, updating the wSVM using the CRCSC training set based on

this reduced number of genes, and then using this classifier on measurements from the Nanostring

assay on FFPE samples after transforming these values onto the scale of the Affymetrix data on FF

samples that dominated the CRCSC training set.

Rather than just choosing a classification strategy in an ad hoc fashion, we used a systematic, rigor-ous strategy to rank the genes based on their classification value and to compare a large number of classification strategies for a wide range of model sizes. This allowed us to find out which strategy performed best, the wSVM, and to determine how parsimonious a classifier could be without sacrific-ing substantial classification accuracy. Given that the classification literature clearly demonstrates that no one classification strategy is optimal for all data sets, the consideration of multiple approaches is important when building classification signatures. Moreover, findings from the Microarray Quality Con-trol (MAQC) project from FDA has shown that even slight differences in the statistical analysis led to discrepancies in biological interpretation (38). High accuracy in predicting CMS by nearly all statistical




10

methods gives credence to the utility of our CMS assay in accurately classifying a colon cancer in one of the CMS. The wSVM classifier we built using this strategy applied to training data, consisting largely of batch-

corrected Affymetrix gene expression measurements from FF samples, performed exceptionally well

in the CRCSC validation data. Our custom design of CodeSets best capitulating the signal in our

training data, and our strategy of starting with more genes than necessary, then narrowing to a subset

with evidence of high FF/FFPE correlation further mitigated the influence of FFPE on classification

performance. Our quantile normalization strategy was sufficient to obtain reasonable performance for

small (99 or 200) gene FFPE NanoString classifier. This strategy allowed us to efficiently utilize our

data resources, using the enormous data on FF samples to train the classifier and collecting a smaller

set of paired FF/FFPE samples to identify genes with high FF/FFPE concordance and map the FFPE

NanoString expression values to the scale of the FF Affymetrix expression values, leading to our

novel strategy for building the classifier. The consistency of gene-specific FF/FFPE concordance

across batches provides strong support for this strategy.

The high concordance we observed in CMS for CRC samples between a research molecular testing laboratory and a CLIA-certified clinical molecular diagnostic laboratory indicate robust performance of the assay. High inter-laboratory reproducibility is likely due to similarities in the pre-analytical and analytical processes between two laboratories. Another reason for our high inter-laboratory reproduc-ibility is the use of NanoString n-counter technology that utilizes non-amplified nucleic acids without any reverse transcription step and is applicable to multiple samples. Ragulan et. al.(39) demonstrated high classification accuracy and reproducibility of NanoString based subtyping-classification between FF and FFPE tissue samples of colorectal cancer. In spite of significant differences in the CRC clas-ses and validation approach, this study and our study indicate that NanoString is a reliable platform to develop and validate gene expression based signature using FFPE samples of CRC. Guinney et al. (8) found that approximately 87% of CRC tumors classified cleanly into a single CMS, but approximately 13% were “mixed CMS”, not outliers or a fifth subtype but heterogeneous samples containing characteristics of multiple CMS. We also found similar proportions of “mixed CMS” sam-ples in our analyses. Clinically, patients with mixed CMS tumors could be treated multiple ways. One option would be to include any “mixed CMS” sample with a high enough probability of CMSx as a po-tential candidate for any targeted therapy that has been validated as a precision therapeutic for CMSx (x=1, 2, 3, or 4), which of course would require prospective validation before clinical application. There is increasing evidence of the prognostic and predictive utility of CMS. Lenz et al. (9) using a NanoString based assay in a large cohort of patients with metastatic or advanced colorectal cancer enrolled in CALGB/SWOG 80305 phase III clinical trial, demonstrated that there is significant differ-ence in overall survival by CMS with median survival of 40 months in CMS2 vs. median survival of 15 months in CMS1. The NanoString assay used for the CALGB/SWOG 80305 and our study differed significantly. Lenz et al. developed a customized NanoString based genes that were derived from some of the large data sets with published gold standard CMS labels, including, The Cancer Genome Atlas and other studies (5,13). Only genes that are common to these three data sets and those as-sessed in the CALGB/SWOG 80405 panel are used. While the genes included in our NanoString based assay were all derived from CRCSC database. Similar prognostic trends were observed by other groups including in patients enrolled in FIRE3 study comparing Cetuximab vs. Bevacizumab with FOLFIRI in metastatic colorectal cancer patients and by Mooi et. al.(14). As research only clas-sifiers, these methodologies are not designed for application for individual patients or suitable for use




11

in prospective patient assignment. In contrast, our classifier as deployed in a clinical lab is suitable for classifying individual patients with the rigor needed for guiding clinical management. Our CLIA validated assay has potential of integral, integrated and exploratory marker. Hypotheses being explored include focused immunotherapy in CMS1, which represents a subgroup with evidence of higher immune infiltrates and activated T-cells. CMS2 represents a group with best overall survival from EGFR inhibition in retrospective assessment of the CALGB/SWOG 80405 trial, while CMS1 benefited from VEGF inhibition (PMID 31042420). CMS4 has active stromal signature and an im-mune modulating strategy has been proposed. For example, in the clinical trial (NCT034365630) as-sessing safety and efficacy of dual TGF-β trap: anti-PD-L1 molecule M7824 (EMD-Serono), CMS as-say was used as an intergral biomarker to select patients with CMS4. Expanded efforts in this trial or other ongoing trials can be done looking to identify other CMS where efficacy of either M7824 or other drug can be assessed based on its mechanism of action and CMS biology. As an integrated assay, all patients can be prospectively tested to identify CMS. The interim analysis then looks at all comers, and if negative, then looks at CMS specific subgroups, with plan to continue the second half of the randomized study. Finally as an exploratory biomarker, a retrospective analysis can be done with the high quality CLIA assay to look for a CMS signal but also to minimize the risk of inconsistent assays when designing the follow up study. To support the goal of dissemination of a robust CMS classifier for retrospective or prospective utilization, the NanoString CodeSets and supporting bioinformatics information can be found at http://qcsrlshinypro.mdanderson.edu/CMSclia/. Unavailability of matched samples prevented us from assessing CMS accuracy between primary and

metastatic tissues in our study. Fontana E et.al (40) using publically available data from Khambata-

Ford dataset (41), demonstrated no significant difference in CMS distribution between localized vs.

metastatic disease. The impact of sample site on CMS classification is necessary to determine host

organ influence and metastasis associated evolution of gene expression in CRC.

In summary, we have developed, validated and demonstrated prognostic utility of a CRC-CMS assay

using FFPE samples. This CLIA validated assay provides a foundation to expand its utility to assess

prognosis in a standard of care setting and explore the assay as a predictor of response to therapy in

clinical trials.

Acknowledgments: This work is funded by National Cancer Institute through Assay Validation For

High Quality Markers For NCI-Supported Clinical Trials (UH2CA207101) and MD Anderson Cancer

Center SPORE in Gastrointestinal Cancer (P50 CA221707). Part of this research was performed in

MD Anderson’s Core facilities which is supported in part by the National Institutes of Health through

Cancer Center Support Grant CA016672. Part of the validation work in the clinical lab was funded as

part of a clinical trial (NCT03436563) funded by EMD-Serono. We thank Kim-Anh Vu in MD Anderson’s Department of Anatomic Pathology for helping with the figures.



http://qcsrlshinypro.mdanderson.edu/CMSclia/http://projectreporter.nih.gov/project_info_description.cfm?aid=9427630http://clincancerres.aacrjournals.org/

12

References:

1. Cancer Genome Atlas N. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012;487(7407):330-7 doi 10.1038/nature11252.

2. Perez-Villamil B, Romera-Lopez A, Hernandez-Prieto S, Lopez-Campos G, Calles A, Lopez-Asenjo JA, et al. Colon cancer molecular subtypes identified by expression profiling and associated to stroma, mucinous type and different clinical behavior. BMC Cancer 2012;12:260 doi 10.1186/1471-2407-12-260.

3. Schlicker A, Beran G, Chresta CM, McWalter G, Pritchard A, Weston S, et al. Subtypes of primary colorectal tumors correlate with response to targeted treatment in colorectal cell lines. BMC Med Genomics 2012;5:66 doi 10.1186/1755-8794-5-66.

4. Sadanandam A, Lyssiotis CA, Homicsko K, Collisson EA, Gibb WJ, Wullschleger S, et al. A colorectal cancer classification system that associates cellular phenotype and responses to therapy. Nat Med 2013;19(5):619-25 doi 10.1038/nm.3175.

5. Marisa L, de Reynies A, Duval A, Selves J, Gaub MP, Vescovo L, et al. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med 2013;10(5):e1001453 doi 10.1371/journal.pmed.1001453.

6. De Sousa EMF, Wang X, Jansen M, Fessler E, Trinh A, de Rooij LP, et al. Poor-prognosis colon cancer is defined by a molecularly distinct subtype and develops from serrated precursor lesions. Nat Med 2013;19(5):614-8 doi 10.1038/nm.3174.

7. Budinska E, Popovici V, Tejpar S, D'Ario G, Lapique N, Sikora KO, et al. Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer. J Pathol 2013;231(1):63-76 doi 10.1002/path.4212.

8. Guinney J, Dienstmann R, Wang X, de Reynies A, Schlicker A, Soneson C, et al. The consensus molecular subtypes of colorectal cancer. Nat Med 2015;21(11):1350-6 doi 10.1038/nm.3967.

9. Lenz HJ, Ou FS, Venook AP, Hochster HS, Niedzwiecki D, Goldberg RM, et al. Impact of Consensus Molecular Subtype on Survival in Patients With Metastatic Colorectal Cancer: Results From CALGB/SWOG 80405 (Alliance). Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2019;37(22):1876-85 doi 10.1200/JCO.18.02258.

10. Kwon Y, Park M, Jang M, Yun S, Kim WK, Kim S, et al. Prognosis of stage III colorectal carcinomas with FOLFOX adjuvant chemotherapy can be predicted by molecular subtype. Oncotarget 2017;8(24):39367-81 doi 10.18632/oncotarget.17023.

11. Song N, Pogue-Geile KL, Gavin PG, Yothers G, Kim SR, Johnson NL, et al. Clinical Outcome From Oxaliplatin Treatment in Stage II/III Colon Cancer According to Intrinsic Subtypes: Secondary Analysis of NSABP C-07/NRG Oncology Randomized Clinical Trial. JAMA oncology 2016;2(9):1162-9 doi 10.1001/jamaoncol.2016.2314.

12. Stintzing S WP, Lenz HJ et al. Consensus molecular subgroups (CMS)of colorectal cancer (CRC) and first-line efficacy of FOLFIRI plus cetuximab or bevasizumab in the FIRE3 (AIO KRK-0306) trial. . JCO 2017;35(Suppl 15):3510.

13. Van Cutsem E, Cervantes A, Adam R, Sobrero A, Van Krieken JH, Aderka D, et al. ESMO consensus guidelines for the management of patients with metastatic colorectal cancer. Annals of oncology : official journal of the European Society for Medical Oncology 2016;27(8):1386-422 doi 10.1093/annonc/mdw235.

14. Mooi JK, Wirapati P, Asher R, Lee CK, Savas P, Price TJ, et al. The prognostic impact of consensus molecular subtypes (CMS) and its predictive effects for bevacizumab benefit in metastatic colorectal cancer: molecular analysis of the AGITG MAX clinical trial. Annals of oncology : official journal of the European Society for Medical Oncology 2018;29(11):2240-6 doi 10.1093/annonc/mdy410.

15. Becht E, de Reynies A, Giraldo NA, Pilati C, Buttard B, Lacroix L, et al. Immune and Stromal Classification of Colorectal Cancer Is Associated with Molecular Subtypes and Relevant for Precision Immunotherapy. Clinical cancer research : an official journal of the American Association for Cancer Research 2016;22(16):4057-66 doi 10.1158/1078-0432.CCR-15-2879.

16. Lal N, White BS, Goussous G, Pickles O, Mason MJ, Beggs AD, et al. KRAS Mutation and Consensus Molecular Subtypes 2 and 3 Are Independently Associated with Reduced Immune Infiltration and Reactivity in Colorectal Cancer. Clinical cancer research : an official journal of the American Association for Cancer Research 2018;24(1):224-33 doi 10.1158/1078-0432.CCR-17-1090.




13

17. Okita A, Takahashi S, Ouchi K, Inoue M, Watanabe M, Endo M, et al. Consensus molecular subtypes classification of colorectal cancer as a predictive factor for chemotherapeutic efficacy against metastatic colorectal cancer. Oncotarget 2018;9(27):18698-711 doi 10.18632/oncotarget.24617.

18. Huang D, Quan Y, He M, Zhou B. Comparison of linear discriminant analysis methods for the classification of cancer based on gene expression data. J Exp Clin Cancer Res 2009;28:149 doi 10.1186/1756-9966-28-149.

19. Guo Y, Hastie T, Tibshirani R. Regularized linear discriminant analysis and its application in microarrays. Biostatistics 2007;8(1):86-100 doi 10.1093/biostatistics/kxj035.

20. Arevalillo JM, Navarro H. A new method for identifying bivariate differential expression in high dimensional microarray data using quadratic discriminant analysis. BMC Bioinformatics 2011;12 Suppl 12:S6 doi 10.1186/1471-2105-12-S12-S6.

21. Hastie T, Tibshirani R. Efficient quadratic regularization for expression arrays. Biostatistics 2004;5(3):329-40 doi 10.1093/biostatistics/5.3.329.

22. Ayyad SM, Saleh AI, Labib LM. Gene expression cancer classification using modified K-Nearest Neighbors technique. Biosystems 2019;176:41-51 doi 10.1016/j.biosystems.2018.12.009.

23. Kumar MA, Ewoldt RH, Zukoski CF. Intrinsic nonlinearities in the mechanics of hard sphere suspensions. Soft Matter 2016;12(36):7655-62 doi 10.1039/c6sm01310d.

24. Huynh-Thu VA, Geurts P. Unsupervised Gene Network Inference with Decision Trees and Random Forests. Methods Mol Biol 2019;1883:195-215 doi 10.1007/978-1-4939-8882-2_8.

25. Stiglic G, Rodriguez JJ, Kokol P. Rotation of random forests for genomic and proteomic classification problems. Adv Exp Med Biol 2011;696:211-21 doi 10.1007/978-1-4419-7046-6_21.

26. Rodriguez JJ, Kuncheva LI, Alonso CJ. Rotation forest: A new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 2006;28(10):1619-30 doi 10.1109/TPAMI.2006.211.

27. Chan WH, Mohamad MS, Deris S, Zaki N, Kasim S, Omatu S, et al. Identification of informative genes and pathways using an improved penalized support vector machine with a weighting scheme. Comput Biol Med 2016;77:102-15 doi 10.1016/j.compbiomed.2016.08.004.

28. Abdi MJ, Hosseini SM, Rezghi M. A novel weighted support vector machine based on particle swarm optimization for gene selection and tumor classification. Comput Math Methods Med 2012;2012:320698 doi 10.1155/2012/320698.

29. Huang H, Lu X, Liu Y, Haaland P, Marron JS. R/DWD: distance-weighted discrimination for classification, visualization and batch adjustment. Bioinformatics 2012;28(8):1182-3 doi 10.1093/bioinformatics/bts096.

30. Franks JM, Cai G, Whitfield ML. Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data. Bioinformatics 2018;34(11):1868-74 doi 10.1093/bioinformatics/bty026.

31. Ji Zhu HZ, Saharon Rosset and Trevor Hastie. Multi-class AdaBoost. 2009. 349-60 p. 32. Chen DT, Davis-Yadley AH, Huang PY, Husain K, Centeno BA, Permuth-Wey J, et al. Prognostic

Fifteen-Gene Signature for Early Stage Pancreatic Ductal Adenocarcinoma. PLoS One 2015;10(8):e0133562 doi 10.1371/journal.pone.0133562.

33. Ligibel JA, Cirrincione CT, Liu M, Citron M, Ingle JN, Gradishar W, et al. Body Mass Index, PAM50 Subtype, and Outcomes in Node-Positive Breast Cancer: CALGB 9741 (Alliance). J Natl Cancer Inst 2015;107(9) doi 10.1093/jnci/djv179.

34. Prat A, Galvan P, Jimenez B, Buckingham W, Jeiranian HA, Schaper C, et al. Prediction of Response to Neoadjuvant Chemotherapy Using Core Needle Biopsy Samples with the Prosigna Assay. Clinical cancer research : an official journal of the American Association for Cancer Research 2016;22(3):560-6 doi 10.1158/1078-0432.CCR-15-0630.

35. Veldman-Jones MH, Lai Z, Wappett M, Harbron CG, Barrett JC, Harrington EA, et al. Reproducible, Quantitative, and Flexible Molecular Subtyping of Clinical DLBCL Samples Using the NanoString nCounter System. Clinical cancer research : an official journal of the American Association for Cancer Research 2015;21(10):2367-78 doi 10.1158/1078-0432.CCR-14-0357.

36. Wallden B, Storhoff J, Nielsen T, Dowidar N, Schaper C, Ferree S, et al. Development and verification of the PAM50-based Prosigna breast cancer gene signature assay. BMC Med Genomics 2015;8:54 doi 10.1186/s12920-015-0129-6.




14

37. Overman MJ, Morris V, Kee B, Fogelman D, Xiao L, Eng C, et al. Utility of a molecular prescreening program in advanced colorectal cancer for enrollment on biomarker-selected clinical trials. Annals of oncology : official journal of the European Society for Medical Oncology 2016;27(6):1068-74 doi 10.1093/annonc/mdw073.

38. Goodsaid FM, Amur S, Aubrecht J, Burczynski ME, Carl K, Catalano J, et al. Voluntary exploratory data submissions to the US FDA and the EMA: experience and impact. Nature reviews Drug discovery 2010;9(6):435-45 doi 10.1038/nrd3116.

39. Ragulan C, Eason K, Fontana E, Nyamundanda G, Tarazona N, Patil Y, et al. Analytical Validation of Multiplex Biomarker Assay to Stratify Colorectal Cancer into Molecular Subtypes. Scientific reports 2019;9(1):7665 doi 10.1038/s41598-019-43492-0.

40. Fontana E, Eason K, Cervantes A, Salazar R, Sadanandam A. Context matters-consensus molecular subtypes of colorectal cancer as biomarkers for clinical trials. Annals of oncology : official journal of the European Society for Medical Oncology 2019;30(4):520-7 doi 10.1093/annonc/mdz052.

41. Khambata-Ford S, Garrett CR, Meropol NJ, Basik M, Harbison CT, Wu S, et al. Expression of epiregulin and amphiregulin and K-ras mutation status predict disease control in metastatic colorectal cancer patients treated with cetuximab. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2007;25(22):3230-7 doi 10.1200/JCO.2006.10.5437.




15

Table-1: Clinicopathologic features of patients and samples included in the assay development and validation in the research molecular diagnostic laboratory

Characteristics

n

Age 50 years 136

Gender Male 95 Female 80

Tumor location Right colon 75 Left and sigmoid colon 92 Rectum 5 Multiple primary tumors 3

pT stage pT1 0 pT2 16 pT3 133 pT4 18 pT4a 3 pT4b 1

pN stage pN0 76 pN1 55 pN1a 2 pN1b 5 pN1c 0 pN2 27 pN2a 4 pN2b 2 pN3 0 pNX 4

pM stage pM0 165 pM1 5 pMX 5

Grade Low (well or moderately differentiated) 145 High (poorly differentiated) 30

Time between date of surgery and gene expression analysis

< 5years 20

5-10 years 95 >10 years 60

Samples Matched flash frozen and formalin fixed paraffin embedded

149

Only formalin fixed paraffin embedded 12 Only flash frozen 4




16

Table-2: Patient characteristics of the cohorts utilized to correlate CMS and overall survival

N = 345

Mean Age at Initial Diagnosis (SD) 50.9 (11.5)

Mean Age at Stage IV Diagnosis (SD) 51.5 (11.5)

Sex

Male 187 (54.2) Female 158 (45.8)

Race/Ethnicity

Non-Hispanic (NH) White 258 (74.8) NH African American 32 (9.3) Hispanic 27 (7.8) NH Asian 22 (6.4) Other/Unknown 6 (1.7)

Stage at Initial Diagnosis

I 5 (1.4) II 24 (7.0) III 110 (31.9) IV 204 (59.1) NA 2 (0.6)

KRAS mutation status

wild type 107 (31.0) canonical mutation 147 (42.6) NA 91 (26.4)

NRAS mutation status

wild type 236 (68.4) canonical mutation 17 (4.9) non-canonical mutation 1 (0.3) NA 91 (26.4)

BRAF mutation status

wild type 229 (66.4) v600 20 (5.8) other mutation 5 (1.4) NA 91 (26.4)

MSI status

MSS 177 (51.3) NA 168 (48.7)

Consensus Molecular Subtype

1, Immune 12 (3.5) 2, Canonical 117 (33.9) 3, Metabolic 21 (6.1) 4, Mesenchymal 161 (46.7) Mixed 34 (9.9)




17

Figure Legends:

Figure-1: Flowchart showing approach development and validation of CMS classifier on colorectal cancer sub-

typing consortium (CRCSC) (1A) and development of NanoString classifier on colorectal cancer samples.

Figure 2: Sample-wise and Gene-wise correlation of paired FF/FFPE Samples: Figure- 2A: Histogram of sam-

ple-wise Spearman correlation of paired FF/FFPE values across all 472 CMS genes on Nanostring assay, with

threshold of 0.75 marked with red vertical line. Figure-2B and 2C: Association of sample-wise FF/FFPE with

RNA Quality: Scatterplot of gene-specific Spearman correlation of FF/FFPE vs. RNA quality of FF samples

(2B, based on RIN) and FFPE samples (2C, based on % with 200nt). Figure-2D: Histogram of gene-wise

Spearman correlation of paired FF/FFPE values based on samples with sample-wise correlation > 0.75, with

thresholds to determine the top 100 and top 200 genes indicated by red and blue vertical lines, respectively.

Figure-3: Bar chart and table showing 4-class accuracy of CMS classifiers, along with number (proportion) of

samples classified to each CMS. We assess accuracy for classifier with top 100 genes in terms of FF/FFPE

correlation for FFPE and FF, computed based on Nanostring measurements for FFPE and FF in current study

(Nano FFPE-100, Nano FF-100) and based on Affymetrix measurements for FF in the Affy CRCSC validation

data set (V2a, Affy FF-100), and for the full 472 gene classifier applied to FF samples run on Nanostring plat-

form for current study (Nano FF-472) and FF samples run on Affymetrix in the Affymetrix CRCSC validation

data set (Affy FF-472). Performance is summarized overall and for subsets of samples with high classification

confidence (⍺>0.50, 0.80 or 0.90).

Figure-4: Distribution of KRAS and BRAF mutations across CMS (4A) and correlation of CMS with overall sur-

vival (4B) in stage IV colorectal cancer.




CRCSC databases (n=18)

Agilent Datasets excluded

CRCSC databases (n=14)

V1: Discovery set

(n=1329 samples from 12 studies)V2: Validation

set (n=1329

samples from

14 studies)• Apply four-fold cross validation

model building for each

classification model

• Apply quantile normalization

Test accuracy of classification model

Predictive accuracy of each model as function

of number of genes

• Gene ranking strategy based on its

classification from 5973 to 5 genes

• Compare the classification model with same

gene list

V2a:

Validation

subset with

data on

Affymetrix

U133Plus2.0

(n=929

samples )

V2ap:

Validation

subset with

Affymetrix

U133Plus2.0

arrays fRMA

probe-level

data (n=929

samples)

Validation

subset with

TCGA RNAseq

data (n=189

samples)

V2o: Out of

sample

validation

subset

containing

GSE2109 and

GSE17536

(n=383

samples)

Test accuracy of each model and gene ranking

strategy on validation cohorts

Select the best performing model with least

number of genes - Weighted Vector Support

Machine Model

Further

optimization &

tuning of the

parameters

ValidationDiscovery

• Obtain best possible accuracy by

tuning parameters with each model

• Apply multiclass Adaboost to rank

genes for each model

Figure 1A




Sample wise correlation (ρ) between NanoString flash

frozen (FF) value and NanoString formalin fixed paraffin

embedded (FFPE) value across 472 genes (n=175

samples)

Samples with ρ≥0.75 (n=142 samples)

Gene-wise correlation (r) between FF value & FFPE to

identify sets of genes with top 100 (CMS100) and top

200 (CMS200) correlations

Train new model on reduced gene sets (CMS100/CMS200)

on CRCSC V1 and obtain new wSVM classifier Ψ100 and Ψ200Determine accuracy using Affymetrix data on CRCSC V2

Pick gene set (100 or 200 genes)

NanoString expression for each

FFPE sample for the chosen

gene set (n=158)

Quantile normalized expression

on NanoString FFPE sample for

the chosen gene set

Compute probability of each CMS pj• Most likely CMS: Choose CMS j with

highest pjOR

• Apply confidence threshold:

Choose CMS j with highest pj if

greater than confidence threshold α,

otherwise classify as “Mixed CMS”

• Quantile normalized to the scale of

Affymetrix value of matched FF

sample for the chosen gene set

Apply Ψ to NanoString data normalized

to the scale of Affymetrix data

CMS for each sample

Figure 1B




Figure 2F

req

ue

ncy

Fre

qu

en

cy

Sa

mp

le-w

ise

FF

/FF

FP

E

co

rre

latio

n

Sa

mp

le-w

ise

FF

/FF

FP

E

co

rre

latio

n

Sample-wise FF/FFPE correlation

Gene-wise FF/FFPE correlation

top 100 genes

top 100 genes

FF RIN

FFPE %200nt

p-value=0.0077

p-value=0.280

00.3

0.4

0.5

0.6

0.7

0.8

0.9

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0.2 0.4 0.6 0.8 1.0

0.2 0.4 0.6 0.80.0

2 4 6 8 10

20

40

60

80

0

20

20 30 50 70

40

40

60

60

80

80

100

120

140

A B

C D




CMS1 CMS2 CMS3 CMS4 Unclassified

No threshold 0.80 38 (0.24) 88 (0.56) 9 (0.06) 23 (0.15) 0

α>0.50 0.81 36 (0.23) 85 (0.54) 8 (0.05) 22 (0.14) 5 (0.03)

α>0.80 0.86 29 (0.19) 65 (0.42) 8 (0.05) 11 (0.07) 43 (0.28)

α>0.90 0.90 25 (0.16) 51 (0.33) 7 (0.04) 7 (0.04) 66 (0.42)


No threshold 0.80 34 (0.22) 72 (0.47) 31 (0.20) 16 (0.10) 0

α>0.50 0.82 34 (0.22) 71 (0.46) 28 (0.18) 19 (0.12) 4 (0.03)

α>0.80 0.87 29 (0.19) 64 (0.42) 19 (0.13) 9 (0.06) 30 (0.20)

α>0.90 0.92 20 (0.13) 51 (0.34) 16 (0.11) 5 (0.03) 59 (0.39)


No threshold 0.81 40 (0.26) 64 (0.42) 20 (0.13) 29 (0.19) 0

α>0.50 0.82 38 (0.25) 60 (0.40) 19 (0.13) 27 (0.18) 7 (0.05)

α>0.80 0.85 34 (0.22) 51 (0.33) 16 (0.10) 23 (0.15) 29 (0.19)

α>0.90 0.88 30 (0.20) 42 (0.28) 12 (0.08) 20 (0.13) 48 (0.32)


No threshold 0.89 222 (0.17) 597 (0.45) 176 (0.13) 334 (0.25) 0

α>0.50 0.9 218 (0.16) 590 (0.44) 172 (0.13) 329 (0.25) 20 (0.02)

α>0.80 0.96 162 (0.12) 479 (0.36) 120 (0.09) 247 (0.19) 320 (0.24)

α>0.90 0.98 131 (0.10) 418 (0.31) 103 (0.08) 199 (0.15) 478 (0.36)


No threshold 0.95 232 (0.18) 572 (0.43) 173 (0.13) 352 (0.27) 0

α>0.50 0.95 231 (0.17) 569 (0.43) 170 (0.13) 348 (0.26) 11 (0.01)

α>0.80 0.98 190 (0.14) 530 (0.40) 146 (0.11) 313 (0.24) 150 (0.11)

α>0.90 0.99 169 (0.13) 498 (0.38) 123 (0.09) 288 (0.22) 251 (0.19)

Nano FF-1004-class

accuracy

4-class accuracy

4-class accuracy

4-class accuracy

4-class accuracy

Distribution of predicted CMS

Nano FFPE-100Distribution of predicted CMS

Affy FF-472*Distribution of predicted CMS

Nano FF-472Distribution of predicted CMS

Affy FF-100*Distribution of predicted CMS

0.70

0.75

0.80

0.85

0.90

0.95

1.00

Nano FFPE-100 Nano FF-100 Nano FF-472 Affy FF-100 Affy FF-472

4-Group Accuracy Assessment of the CMS Classifier

No threshold

Fo

ur

Gro

up

-Accu

racy

α>0.50 α>0.80 α>0.90

Figure 3

A

B




1.00

0.50

0

1.00

0.50

0

Pro

po

rtio

n

NA

NA

p

Published OnlineFirst October 27, 2020.Clin Cancer Res Jeffrey S Morris, Rajyalakshmi Luthra, Yusha Liu, et al. CLIA-Certified SettingConsensus Molecular Subtyping of Colorectal Carcinoma in a Development and Validation of a Gene Signature Classifier for

Updated version

10.1158/1078-0432.CCR-20-2403doi:

Access the most recent version of this article at:

Material

Supplementary

http://clincancerres.aacrjournals.org/content/suppl/2020/10/27/1078-0432.CCR-20-2403.DC1

Access the most recent supplemental material at:

Manuscript

Authoredited. Author manuscripts have been peer reviewed and accepted for publication but have not yet been

E-mail alerts related to this article or journal.Sign up to receive free email-alerts

Subscriptions

Reprints and

[email protected] at

To order reprints of this article or to subscribe to the journal, contact the AACR Publications

Permissions

Rightslink site. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC)

.http://clincancerres.aacrjournals.org/content/early/2020/10/27/1078-0432.CCR-20-2403To request permission to re-use all or part of this article, use this link



http://clincancerres.aacrjournals.org/lookup/doi/10.1158/1078-0432.CCR-20-2403http://clincancerres.aacrjournals.org/content/suppl/2020/10/27/1078-0432.CCR-20-2403.DC1http://clincancerres.aacrjournals.org/cgi/alertsmailto:[email protected]://clincancerres.aacrjournals.org/content/early/2020/10/27/1078-0432.CCR-20-2403http://clincancerres.aacrjournals.org/

Article FileFigure-1AFigure-1BFigure-2Figure-3Figure-4

Documents

Development and Validation of a Gene Signature Classifier ......Oct 27, 2020 · 2 . Abstract: Purpose: Consensus molecular subtyping (CMS) of colorectal cancer (CRC) has potential