A novel messenger RNA signature as a biomarker for predicting early relapse in
non-small cell lung cancer
Authors: Jing Li, MD1, Xiaoxia Liu, MD1, Wenqian Xu, MD1, Xin Wang, MD1
Department: 1Departments of CyberKnife, Huashan Hospital, Fudan University,
Shanghai, China
Authors:
Jing Li, MD. Departments of CyberKnife, Huashan Hospital, Fudan University.
No.525, Hongfeng Road, Pudong District, Shanghai 200041, China. Tel: +86-021-
38719999. Fax: +86-021-38719999. Email: [email protected].
Xiaoxia Liu. Departments of CyberKnife, Huashan Hospital, Fudan University.
No.525, Hongfeng Road, Pudong District, Shanghai 200041, China. Tel: +86-021-
38719999. Fax: +86-021-38719999. Email: xiaoxia@fudan.edu.cn.
Wenqian Xu, MD. Departments of CyberKnife, Huashan Hospital, Fudan University.
No.525, Hongfeng Road, Pudong District, Shanghai 200041, China. Tel: +86-021-
38719999. Fax: +86-021-38719999. Email: amy126simon@126.com.
Xin Wang, MD. Departments of CyberKnife, Huashan Hospital, Fudan University.
No.525, Hongfeng Road, Pudong District, Shanghai 200041, China. Tel: +86-021-
38719999. Fax: +86-021-38719999. Email: [email protected].
Correspondence to:
Xin Wang, MD. Departments of CyberKnife, Huashan Hospital, Fudan University.
No.525, Hongfeng Road, Pudong District, Shanghai 200041, China. Tel: +86-021-
38719999. Fax: +86-021-38719999. Email: [email protected].
Running title: RNA signature for predicting early relapse in NSCLC
Category: Original article
This study was not based on a previous communication to a society or meeting.
Abstract
Background: High-throughput gene expression profiling has shown great promise in
providing insight into molecular mechanisms. Because recurrence-related mRNAs may be
enriched for genes that can predict cancer recurrence and survival, we attempted to
build an early recurrence-associated gene signature to improve prognostic prediction
in lung cancer.
Methods: Propensity score matching (1:1) was performed between patients in the early
relapse group and those in the long-term survival group from the TCGA training series
(N=579). Global transcriptome analysis was then performed between the paired groups
to identify differentially expressed mRNAs. Finally, using a LASSO Cox regression
model, we built a multi-gene early relapse classifier incorporating forty mRNAs. The
prognostic and predictive accuracy of the signature was internally validated in
another 193 lung cancer patients.
Results: Forty mRNAs were identified and used to build an early relapse classifier.
Using a specific risk score formula, patients were classified into a high-risk group and
a low-risk group. Relapse-free survival differed significantly between the two groups
in both the discovery (HR: 3.126, 95% CI: 2.249-4.346, P<0.001) and internal
validation (HR 1.806, 95% CI 1.077-3.030, P=0.025) sets. Further analysis revealed that
the prognostic value of this signature was independent of tumor stage, histotype and
EGFR mutation (P<0.05). Receiver operating characteristic (ROC) analysis showed that
the area under the ROC curve of this signature was higher than that of TNM stage alone
(0.771 vs 0.686, P<0.05).
Conclusions: Our forty-mRNA-based classifier provides a reliable model for
predicting early recurrence in non-small cell lung cancer after surgery. This model
may facilitate personalized therapeutic decision-making for these patients.
Keywords: Non-small cell lung cancer; Recurrence; Signature
Introduction
Lung cancer is one of the most frequent causes of cancer-related deaths worldwide [1,
2]. According to the latest cancer statistics for the United States (2018), an
estimated 234,030 new cases of lung and bronchus cancer will be diagnosed, and lung
cancer ranks second in incidence among all cancer types in both men and women [2].
Non-small cell lung cancer (NSCLC) accounts for about 80% of lung cancer cases at
initial diagnosis, and the standard treatment is curative resection, which offers a
higher chance of long-term survival. However, even after curative resection of NSCLC,
long-term survival is reported to be <50%, with 33.1% of patients experiencing
recurrence within 2 years [3]. Early detection of recurrence after surgery is
associated with improved outcomes and survival in patients who have undergone
surgical resection. Thus, there is an urgent need to uncover the underlying mechanisms
and identify precise biomarkers to facilitate early diagnosis and treatment of lung
cancer and to predict and monitor recurrence and metastasis.
Lung cancer is highly heterogeneous, arising from complex interactions between
environmental and genetic factors [4]. Several critical genes, such as EGFR [5], PD-1
[6], NFS1 [7], and BRAF [8], are implicated in the initiation, progression, and
metastasis of lung cancer. Great efforts have been made to identify molecular markers
for prognosis prediction. However, most studies have focused on single genes and have
sometimes produced conflicting evidence on the prognostic significance of these genes.
In recent years, many studies have examined gene expression profiles in lung cancer,
and these have shown great promise for predicting prognosis in individual patients.
Yu et al. developed a five-microRNA signature that effectively predicted survival and
relapse in lung cancer [9]. Tomida et al. developed a signature that could identify
adenocarcinoma patients at very high risk of relapse, even those with early-stage
cancer [10]. However, most of these signatures have not been adopted in clinical
practice. Thus, a more powerful and practical gene signature for prognosis prediction
is urgently needed.
In the present study, we used data from The Cancer Genome Atlas (TCGA) to analyze
mRNA expression profiles in large cohorts of NSCLC patients. Using a sample-splitting
approach and Cox regression analysis, we identified a prognostic forty-mRNA signature
in the discovery set and validated it in a separate cohort. This mRNA signature may
help identify the subset of NSCLC patients at high risk of early relapse.
Patients and Methods
Preprocessing of microarray data in TCGA database
The raw sequencing data and clinical information were downloaded from the TCGA
database (Illumina HiSeq Systems) (https://cancergenome.nih.gov/) and normalized
using Robust Multichip Average [11]. The samples were collected from 1991 to 2013.
mRNAs with zero expression in more than 50% of the samples were removed, and the
remaining expression values were transformed as log2(X + 1). mRNAs with a log2 fold
change (log FC) < −1 or log FC > 1 and a false discovery rate (FDR)-adjusted P < 0.05
were considered differentially expressed and were included in subsequent analyses [12].
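The filtering and thresholding rules above can be illustrated with a minimal Python sketch. It is not the authors' R code: the input layout (a dictionary from gene symbol to per-sample values), the function names, and the use of the Benjamini-Hochberg procedure for the FDR adjustment are assumptions, since the text only states that P values were FDR-adjusted. Log fold changes and raw P values are assumed to come from a separate per-gene test, such as a LIMMA fit.

```python
import math


def filter_and_log_transform(expr, max_zero_fraction=0.5):
    """Drop genes that are zero in more than half the samples, then apply log2(X + 1).

    `expr` is a hypothetical layout: gene symbol -> list of raw values, one per sample.
    """
    kept = {}
    for gene, values in expr.items():
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction > max_zero_fraction:
            continue  # too sparse to keep
        kept[gene] = [math.log2(v + 1) for v in values]
    return kept


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values in input order (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(stats, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep genes with |log2 FC| > lfc_cutoff and FDR-adjusted P < fdr_cutoff.

    `stats` is a hypothetical mapping: gene -> (log2 fold change, raw P value).
    """
    genes = list(stats)
    fdr = benjamini_hochberg([stats[g][1] for g in genes])
    return [g for g, q in zip(genes, fdr)
            if abs(stats[g][0]) > lfc_cutoff and q < fdr_cutoff]


if __name__ == "__main__":
    raw = {"GENE_A": [0, 0, 0, 5], "GENE_B": [0, 1, 3, 7]}
    print(filter_and_log_transform(raw))  # {'GENE_B': [0.0, 1.0, 2.0, 3.0]}
```

GENE_A is dropped because three of its four samples are zero; the thresholds default to the |log FC| > 1 and FDR < 0.05 values stated in this paragraph.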
Datasets selection
The selection criteria for lung cancer datasets were as follows: (i) pathological
diagnosis of NSCLC; (ii) available basic clinical information for analysis;
(iii) pathological stage I-III; and (iv) complete follow-up information on the
relapse-free survival (RFS) interval and RFS status. Patients who received
neoadjuvant chemotherapy or radiation were excluded from the study. Clinical data for
all patients in this study were obtained from TCGA. RFS times for patients who
experienced tumor progression during follow-up were obtained from the TCGA file for
new tumor events. The patients were randomly divided into a discovery cohort and a
validation cohort at a ratio of 3:1.
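A 3:1 random split such as the one described above might look like the following sketch. The seed and the placeholder patient IDs are hypothetical, and the text does not say whether the split was stratified (for example by stage), so this version is a plain unstratified shuffle.

```python
import random


def split_cohort(patient_ids, discovery_fraction=0.75, seed=2019):
    """Randomly split patients into discovery and validation cohorts (3:1 by default).

    The seed is hypothetical; fixing it only makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_discovery = round(len(shuffled) * discovery_fraction)
    return shuffled[:n_discovery], shuffled[n_discovery:]


if __name__ == "__main__":
    ids = [f"patient_{i:03d}" for i in range(772)]  # placeholder IDs, not TCGA barcodes
    discovery, validation = split_cohort(ids)
    print(len(discovery), len(validation))  # 579 193
```

With 772 patients, a 0.75 fraction gives exactly the 579/193 partition reported in the Results.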
Identification of early relapse associated genes
Early relapse was defined as locoregional recurrence or distant metastasis within 1
year after primary resection [3]. Samples in the discovery set were selected and
divided into an early relapse group and a long-term survival group (no relapse after a
minimum of 5 years of follow-up). Propensity score (PS) matching was performed
between the two groups to adjust for stage and histotype, which were the clinical
factors most significantly associated with early relapse. Patients were matched 1:1.
Finally, 31 matched pairs in the discovery set were used to compare global gene
expression profiles between the early relapse and long-term survival groups.
Differentially expressed genes (DEGs) between early relapse and long-term survival
samples were identified using the Linear Models for Microarray Data (LIMMA)
method [13]. The threshold for identification of DEGs was set as P<0.05 and fold
change>=1.25. Lastly, a LASSO Cox regression model [13] was used to select the
mRNAs most significantly associated with relapse from all the differentially
expressed genes.
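The group definitions and the 1:1 matching above can be sketched as follows, assuming RFS time is recorded in months. The text does not specify how propensity scores were estimated or which matching algorithm was used, so this sketch assumes the scores were already computed (for example by logistic regression of group membership on stage and histotype) and applies greedy nearest-neighbour matching without replacement, with an optional caliper. The function names are hypothetical.

```python
def label_outcome(rfs_months, relapsed):
    """Assign a patient to the early relapse or long-term survival group, or neither.

    Assumes months: relapse within 12 months is early relapse, and relapse-free
    follow-up of at least 60 months is long-term survival.
    """
    if relapsed and rfs_months <= 12:
        return "early_relapse"
    if not relapsed and rfs_months >= 60:
        return "long_term_survival"
    return None  # intermediate patients are not used for matching


def match_one_to_one(cases, controls, caliper=None):
    """Greedy 1:1 nearest-neighbour matching on propensity scores, without replacement.

    `cases` and `controls` map patient ID -> precomputed propensity score.
    `caliper`, if given, is the largest allowed score difference within a pair.
    """
    available = dict(controls)
    pairs = []
    for case_id, score in sorted(cases.items(), key=lambda item: item[1]):
        if not available:
            break
        best = min(available, key=lambda cid: abs(available[cid] - score))
        if caliper is not None and abs(available[best] - score) > caliper:
            continue  # no acceptable control for this case
        pairs.append((case_id, best))
        del available[best]  # each control is used at most once
    return pairs


if __name__ == "__main__":
    cases = {"ER1": 0.20, "ER2": 0.80}
    controls = {"LT1": 0.25, "LT2": 0.70, "LT3": 0.50}
    print(match_one_to_one(cases, controls))  # [('ER1', 'LT1'), ('ER2', 'LT2')]
```

A caliper discards poorly matched cases, so a matched sample can be smaller than the smaller of the two groups; whether a caliper was used here is not stated.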
Development of risk score and statistical analysis
Using LASSO Cox regression analysis, we identified a panel of genes and constructed
a multi-mRNA-based classifier for predicting early relapse in patients with stage
I-III lung cancer in the discovery set. Using the resulting risk score formula, patients
in each set were divided into high-risk and low-risk groups, with the median risk
score of the discovery set as the cutoff. Survival rates in the low-risk and high-risk
groups were estimated with the Kaplan-Meier method and compared using the log-rank
test. Multivariate Cox regression analysis and stratified analysis were performed to
test whether the risk score was an independent predictor of RFS. Time-dependent ROC
analysis was used to evaluate the prognostic and predictive accuracy of each feature
and of the signature. All statistical analyses were performed using R (version 2.15.0,
www.r-project.org). All statistical tests were 2-sided, and P values<0.05 were
considered statistically significant.
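The survival comparison described above (Kaplan-Meier curves compared with a log-rank test) can be sketched in plain Python. This is an illustrative re-implementation of the standard estimators, not the R routines the authors used; the event coding (1 = relapse, 0 = censored) and the function names are assumptions.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival just after it).

    `events` holds 1 for relapse and 0 for censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times together
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censorings both leave the risk set
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, 1-df P value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of chi-square(1 df) equals P(|Z| > sqrt(chi2)) for a standard normal Z.
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In this setting, groups A and B would be the low-risk and high-risk patients defined by the median-score cutoff. The quadratic loops keep the logic readable; they are not meant for large cohorts.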
Results
Preparation of lung cancer data sets
A total of 772 eligible patients were identified in the TCGA database, including 419
(54.3%) at stage I, 236 (30.6%) at stage II, and 117 (15.2%) at stage III. Of these,
375 patients were diagnosed with squamous cell carcinoma and 397 with adenocarcinoma.
The patients were randomly divided into a discovery set (579 patients) and an internal
validation set (193 patients), as described in Methods. The original data for all
patients included in the analysis are listed in Table S1.
Development of early relapse signature in the discovery set
Patients in the discovery set were divided into an early relapse group and a long-term
survival group (no relapse within five years). Patients' clinicopathological
characteristics before and after PS matching are summarized in Table 1. Before PS
matching, tumor stage in the early relapse group was significantly higher than that in
the long-term survival group, and the long-term survival group had a higher proportion
of squamous cell carcinoma. After PS matching, there were no significant differences
in tumor stage, histotype, or radiotherapy between the early relapse and long-term
survival groups (Table 1).
Global mRNA expression profiles were compared between the early relapse and
long-term survival groups. A total of 126 mRNAs were differentially expressed between
the two groups (P <0.05, fold change>=2.0) (Fig. 1A) (Table 3). LASSO coefficient
profiles of the 126 mRNAs are shown in Figure 1B. A coefficient profile plot was
produced against the log (λ) sequence. A vertical line was drawn at the value selected
by 10-fold cross-validation, and the λ that minimized the cross-validation error
yielded 40 nonzero coefficients. Of these, fifteen mRNAs were down-regulated and
twenty-five were up-regulated in the early relapse group compared with the long-term
survival group (Table 4). Using LASSO Cox regression modeling, we derived a
forty-mRNA signature to calculate a risk score for every patient based on the
expression levels of the forty mRNAs weighted by their regression coefficients:
Risk score =
ADAMTS18*0.068 + ADH1C*(-0.006) + AJAP1*0.092 + AKAP12*0.041
+ C1orf186*(-0.112) + CCR10*(-0.087) + CD177*(-0.076) + CLEC7A*(-0.070)
+ DPPA2*0.061 + DUSP13*(-0.061) + FGF19*0.023 + FTCD*(-0.079)
+ GLYATL2*0.003 + HOMER2*(-0.115) + HSD17B13*(-0.067) + HTR1B*0.059
+ KIAA1875*(-0.071) + KLB*(-0.091) + LEFTY1*0.080 + LOC100131726*0.184
+ MAGEA8*0.107 + MPPED1*(-0.057) + NBPF4*(-0.114) + PALM3*(-0.094)
+ PCDHA4*0.050 + PLA1A*0.133 + PLA2G4F*(-0.016) + PLEKHG4B*(-0.044)
+ PLIN4*0.029 + PSORS1C2*(-0.027) + PTPRR*(-0.019) + RBM46*0.079
+ RFPL3S*(-0.109) + TMEM213*(-0.089) + TMEM63C*(-0.030) + TRIM58*0.044
+ TSKS*(-0.069) + ZDHHC11*(-0.034) + ZFP42*0.086 + ZYG11A*(-0.075),
where each gene symbol denotes the expression level of that gene.
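The risk score formula above can be written as a small Python function. The coefficients are transcribed from the formula; the expression scale (for example, the log2-transformed values from preprocessing) is not stated explicitly and is an assumption, as is assigning a score exactly equal to the cutoff to the low-risk group. The scores in the usage block are made up.

```python
from statistics import median

# Coefficients transcribed from the forty-mRNA risk score formula.
COEFFICIENTS = {
    "ADAMTS18": 0.068, "ADH1C": -0.006, "AJAP1": 0.092, "AKAP12": 0.041,
    "C1orf186": -0.112, "CCR10": -0.087, "CD177": -0.076, "CLEC7A": -0.070,
    "DPPA2": 0.061, "DUSP13": -0.061, "FGF19": 0.023, "FTCD": -0.079,
    "GLYATL2": 0.003, "HOMER2": -0.115, "HSD17B13": -0.067, "HTR1B": 0.059,
    "KIAA1875": -0.071, "KLB": -0.091, "LEFTY1": 0.080, "LOC100131726": 0.184,
    "MAGEA8": 0.107, "MPPED1": -0.057, "NBPF4": -0.114, "PALM3": -0.094,
    "PCDHA4": 0.050, "PLA1A": 0.133, "PLA2G4F": -0.016, "PLEKHG4B": -0.044,
    "PLIN4": 0.029, "PSORS1C2": -0.027, "PTPRR": -0.019, "RBM46": 0.079,
    "RFPL3S": -0.109, "TMEM213": -0.089, "TMEM63C": -0.030, "TRIM58": 0.044,
    "TSKS": -0.069, "ZDHHC11": -0.034, "ZFP42": 0.086, "ZYG11A": -0.075,
}


def risk_score(expression):
    """Weighted sum of the forty expression levels (gene symbol -> value)."""
    missing = COEFFICIENTS.keys() - expression.keys()
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def assign_risk_groups(scores, cutoff):
    """Label each score 'high' if it exceeds the cutoff, otherwise 'low'."""
    return ["high" if s > cutoff else "low" for s in scores]


if __name__ == "__main__":
    discovery_scores = [-0.9, -0.5, -0.1, 0.3]  # made-up scores for illustration
    cutoff = median(discovery_scores)  # the discovery-set median is the cutoff
    print(cutoff, assign_risk_groups(discovery_scores, cutoff))
```

The same discovery-set cutoff would then be applied unchanged to validation-set scores, so the validation set never influences the threshold.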
The prognostic value of the forty-mRNA signature in the discovery and validation cohorts
The distribution of risk scores and RFS status is shown in Figure 2A (left panel). The
likelihood of recurrence rose steadily as the score increased. Time-dependent ROC
analyses at 1, 3 and 5 years were conducted to assess the prognostic accuracy of the
forty-mRNA-based classifier (Fig. 2A, middle panel). The 1-year, 3-year, and 5-year
RFS rates for patients with low-risk scores were 95.3%, 80.8%, and 67.4%, compared
with 79.0%, 49.7%, and 37.8% for patients with high-risk scores, respectively (HR:
3.244, 95% CI: 2.338-4.500, P<0.001, Fig. 2A, right panel).
We then performed the same analyses in the internal validation cohort. The risk score
showed the same prognostic significance as in the discovery set. The 1-year, 3-year
and 5-year RFS rates were 90.3%, 66.3%, and 62.1% for the low-risk group, and 80.1%,
54.3%, and 37.8% for the high-risk group (HR 1.970, 95% CI 1.181-3.289, P=0.009,
Fig. 2B). Analysis of the entire dataset yielded similar results with risk score-based
classification (Fig. 2C): patients could be divided into low- and high-risk groups with
significantly different RFS, and the signature showed the best predictive accuracy at
one year after surgery.
Independence and accuracy of the signature in predicting RFS
In multivariate analysis adjusted for the clinicopathological variables that were
significant in univariate survival analysis, the forty-mRNA-based signature remained a
powerful and independent prognostic factor in both the discovery and internal
validation cohorts (Table 2). Stratified analysis showed that the forty-mRNA-based
classifier remained a statistically significant prognostic model in patients with
stage IA (Fig. 3A), stage IB (Fig. 3B), stage II (Fig. 3C), and stage III (Fig. 3D)
disease, in patients diagnosed with adenocarcinoma (Fig. 3E) or squamous cell
carcinoma (Fig. 3F), and in patients with or without EGFR mutation (Fig. 3G and 3H).
To further confirm that the forty-mRNA-based signature was more effective in
predicting early relapse, we performed time-dependent ROC analysis, which showed that
the forty-mRNA-based classifier had significantly higher prognostic accuracy than
tumor stage at 1 year. The combination of TNM stage and the signature provided more
accurate survival prediction than either TNM stage or the forty-mRNA-based signature
alone (Fig. 4).
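The 1-year comparison of the signature with TNM stage rests on a time-dependent AUC. The sketch below computes a simplified cumulative/dynamic AUC: cases are patients who relapse by the horizon, controls are patients still under observation beyond it, and the AUC is the fraction of case-control pairs in which the case has the higher score. Patients censored before the horizon are simply dropped, whereas dedicated R estimators (for example, the timeROC package) reweight for censoring, so this is an illustration rather than the method behind Fig. 4. The text does not name the estimator that was used, and the function name is hypothetical.

```python
def time_dependent_auc(times, events, scores, horizon):
    """Cumulative/dynamic AUC at `horizon`, without censoring weights.

    Cases: relapse (event == 1) at or before the horizon.
    Controls: still under observation after the horizon.
    Patients censored before the horizon are excluded.
    """
    cases = [s for t, e, s in zip(times, events, scores) if e and t <= horizon]
    controls = [s for t, e, s in zip(times, events, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5  # ties count as half a concordant pair
    return concordant / (len(cases) * len(controls))
```

Stage can be passed as an ordinal score (1, 2, 3) at a 12-month horizon; the half-credit for ties matters there because many patients share the same stage.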
Identification of biological signaling pathways associated with the forty-mRNA signature
To identify the biological pathways in which the forty genes are involved, we
performed gene set enrichment analysis (GSEA) on the TCGA data. Significant gene sets
(FDR < 5%) were visualized as an Enrichment Map (Fig. 5). The risk score was
associated with dysregulation of several important cancer-related pathways, namely
Selenoamino acid metabolism, One carbon pool by folate, Amyotrophic lateral sclerosis
(ALS), and Drug metabolism cytochrome P450.
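The core quantity behind the GSEA results above is the running-sum enrichment score. The sketch below implements the weighted statistic described by Subramanian et al. (2005) with weight p = 1. How genes were ranked against the risk score is not stated, so the ranking metric (for example, each gene's correlation with the risk score) is an assumption, and the permutation test used to obtain FDR values is not shown.

```python
def enrichment_score(ranked_genes, metric, gene_set, weight=1.0):
    """GSEA-style weighted running-sum enrichment score.

    `ranked_genes` must be sorted by `metric` from most positive to most negative.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list but not cover all of it")
    hit_norm = sum(abs(m) ** weight for m, h in zip(metric, hits) if h)
    if hit_norm == 0:
        raise ValueError("gene-set members all have a zero ranking metric")
    running = best = 0.0
    for m, h in zip(metric, hits):
        if h:
            running += abs(m) ** weight / hit_norm  # step up on a gene-set member
        else:
            running -= 1.0 / n_miss  # step down on a non-member
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score means the set's members cluster at the top of the ranking (genes most positively associated with the metric); a negative score means they cluster at the bottom.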
Discussion
Surgical resection is the optimal curative treatment for lung cancer. However, nearly
50% of patients with NSCLC experience recurrence and have a poor prognosis despite
curative resection [14, 15]. TNM staging indicates the severity of disease and the
recurrence potential of primary lung cancer [16, 17]. However, even patients
diagnosed at the same stage are split between recurrent and non-recurrent groups
after curative resection. Therefore, the current TNM staging system has limitations in
clinical practice. Accurately predicting which patients are likely to experience
recurrence can help to personalize therapy and follow-up strategies. Several studies
have indicated that risk factors for postoperative recurrence include tumor
differentiation, vessel invasion, adenocarcinoma histology, visceral pleural invasion,
and the serum carcinoembryonic antigen (CEA) level [4, 18-20]. In addition, novel
predictors, such as the maximal standardized uptake value (SUV) of tumors on positron
emission tomography (PET) and the status of epidermal growth factor receptor (EGFR)
and KRAS, have also been associated with postoperative outcomes [20-23]. These
studies were mainly based on a single clinicopathological factor or the status of a
single gene, and their results vary depending on sample size and patient selection.
Little attention has been paid to mRNA expression patterns and their clinical
significance in predicting early relapse in stage I-III NSCLC using high-throughput
expression profile datasets.
In the present study, we developed a novel prognostic classifier based on forty
mRNAs to improve the accuracy of predicting early relapse and RFS in NSCLC after
surgical resection. When the forty-mRNA signature was applied to patients in the TCGA
discovery set, a clear difference in survival was observed between patients with low
and high risk scores. The signature was also internally validated in the validation
series, suggesting good reproducibility in lung cancer. After stratification by AJCC
stage, histotype and EGFR status, the forty-mRNA-based signature remained a good
prognostic model, implying that it could be used to refine the current staging system.
Furthermore, time-dependent ROC analysis at 1 year suggested that the forty-mRNA
signature has considerable accuracy in predicting tumor relapse within the first year
after initial resection of lung cancer. Therefore, our study identified a forty-mRNA
signature that could help identify patients at high risk of early relapse and guide
individualized treatment of patients with lung cancer, and that may be suitable for
clinical application [13].
Most of the genes included in the signature have previously been linked to cancer.
GSEA showed that the risk score was associated with dysregulation of several
important cancer-related pathways, including Selenoamino acid metabolism, One carbon
pool by folate, ALS, and Drug metabolism cytochrome P450. ALS is a progressive
disease characterized by degeneration of motor neurons that results in increasing
weakness and death [24]. An increased risk of ALS has been observed during the first
year after cancer diagnosis, whereas a lower risk of cancer has been observed in ALS
patients after diagnosis compared with ALS-free individuals [25, 26]. Some drugs for
ALS have been shown to have anti-cancer effects [25, 27, 28]. Survival analysis,
pathway enrichment analysis, and transcription factor (TF) enrichment analysis have
shown that most genes involved in ALS are related to various cancers [26]. The one
carbon pool by folate pathway has long been recognized as a predominant pathway in
cancer cell survival and progression [29]. It contributes to genome instability and
cancer development [30], cell proliferation, control of DNA synthesis, and cell
migration [31, 32], as well as mitochondrial folate metabolism [33]. Selenoamino acid
metabolism plays a critical role in reactive oxygen species-mediated DNA damage,
apoptosis and drug resistance in human cancer [34-36]. Cytochrome P450 activity in
tumors is relevant to cancer susceptibility, drug response, and progression. Taken
together, these associations are consistent with the ability of our signature to
predict early recurrence after surgery for NSCLC.
Although we developed a prognostic model using a biology-driven approach in a large
number of NSCLC cases, the limitations of our study should be acknowledged. First,
the signature has not been externally validated; external validation is essential
before it can be applied as a clinical-grade assay. Second, information on several
other important clinicopathological features and therapy strategies is not available
in the TCGA database, so we could not adjust for these factors when building the
signature. Third, our study was based on publicly available datasets and was not
tested prospectively in a clinical trial, so it shares the inherent limitations of a
retrospective study.
In conclusion, our study demonstrated that the forty-mRNA prognostic model can
effectively distinguish NSCLC patients at low risk of early recurrence from those at
high risk, regardless of TNM stage, histotype and EGFR status. Because our
forty-mRNA-based classifier complements traditional TNM staging, clinicians may be
able to recommend less aggressive therapy for low-risk patients and more intensive
care for high-risk patients, supporting personalized therapy. Therefore, this model
may facilitate personalized clinical decision-making for lung cancer patients.
Abbreviations
NSCLC: Non-small cell lung cancer
TCGA: The Cancer Genome Atlas
RFS: Relapse free survival
PS: Propensity score
DEGs: Differentially expressed genes
LIMMA: Linear Models for Microarray data
Acknowledgments
None
Funding support
This research was supported by the National Science Foundation of China (No.
81802374). The funders had no role in the study design, data collection and analysis,
decision to publish, or preparation of the manuscript.
Authors' contributions
JL and XW conceived this study. XXL, XW and WQX improved the study design and
contributed to the interpretation of results. JL and XW performed the study. JL and
XXL performed data processing and statistical analysis. JL wrote the manuscript. XW
revised the manuscript. All authors read and approved the final manuscript.
Competing Interests
The authors have declared that no competing interest exists.
References
1. Global Burden of Disease Cancer Collaboration, Fitzmaurice C, Allen C, Barber RM,
Barregard L, Bhutta ZA, et al. Global, Regional, and National Cancer Incidence,
Mortality, Years of Life Lost, Years Lived With Disability, and Disability-Adjusted
Life-years for 32 Cancer Groups, 1990 to 2015: A Systematic Analysis for the Global
Burden of Disease Study. JAMA Oncol. 2017; 3: 524-48.
2. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin. 2018;
68: 7-30.
3. Vendrell JA, Mau-Them FT, Beganton B, Godreuil S, Coopman P, Solassol J.
Circulating Cell Free Tumor DNA Detection as a Routine Tool for Lung Cancer
Patient Management. Int J Mol Sci. 2017; 18.
4. Chen YY, Huang TW, Tsai WC, Lin LF, Cheng JB, Chang H, et al. Risk factors of
postoperative recurrences in patients with clinical stage I NSCLC. World J Surg
Oncol. 2014; 12: 10.
5. Blakely CM, Watkins TBK, Wu W, Gini B, Chabon JJ, McCoach CE, et al.
Evolution and clinical impact of co-occurring genetic alterations in advanced-stage
EGFR-mutant lung cancers. Nat Genet. 2017; 49: 1693-704.
6. Akbay EA, Koyama S, Carretero J, Altabef A, Tchaicha JH, Christensen CL, et al.
Activation of the PD-1 pathway contributes to immune escape in EGFR-driven lung
tumors. Cancer Discov. 2013; 3: 1355-63.
7. Alvarez SW, Sviderskiy VO, Terzi EM, Papagiannakopoulos T, Moreira AL,
Adams S, et al. NFS1 undergoes positive selection in lung tumours and protects cells
from ferroptosis. Nature. 2017; 551: 639-43.
8. Damsky WE, Curley DP, Santhanakrishnan M, Rosenbaum LE, Platt JT, Gould
Rothberg BE, et al. beta-catenin signaling controls metastasis in Braf-activated Pten-
deficient melanomas. Cancer Cell. 2011; 20: 741-54.
9. Yu SL, Chen HY, Chang GC, Chen CY, Chen HW, Singh S, et al. MicroRNA
signature predicts survival and relapse in lung cancer. Cancer Cell. 2008; 13: 48-57.
10. Tomida S, Takeuchi T, Shimada Y, Arima C, Matsuo K, Mitsudomi T, et al.
Relapse-related molecular signature in lung adenocarcinomas identifies patients with
dismal prognosis. J Clin Oncol. 2009; 27: 2793-9.
11. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, et
al. Exploration, normalization, and summaries of high density oligonucleotide array
probe level data. Biostatistics. 2003; 4: 249-64.
12. Fu Q, Yang F, Xiang T, Huai G, Yang X, Wei L, et al. A novel microRNA
signature predicts survival in liver hepatocellular carcinoma after hepatectomy. Sci
Rep. 2018; 8: 7933.
13. Dai W, Li Y, Mo S, Feng Y, Zhang L, Xu Y, et al. A robust gene signature for the
prediction of early relapse in stage I-III colon cancer. Mol Oncol. 2018; 12: 463-75.
14. Hoffman PC, Mauer AM, Vokes EE. Lung cancer. Lancet. 2000; 355: 479-85.
15. al-Kattan K, Sepsas E, Fountain SW, Townsend ER. Disease recurrence after
resection for stage I lung cancer. Eur J Cardiothorac Surg. 1997; 12: 380-4.
16. Yamashita T, Uramoto H, Onitsuka T, Ono K, Baba T, So T, et al. Association
between lymphangiogenesis-/micrometastasis- and adhesion-related molecules in
resected stage I NSCLC. Lung Cancer. 2010; 70: 320-8.
17. Uramoto H, Tanaka F. Prediction of recurrence after complete resection in
patients with NSCLC. Anticancer Res. 2012; 32: 3953-60.
18. Koo HK, Jin SM, Lee CH, Lim HJ, Yim JJ, Kim YT, et al. Factors associated
with recurrence in patients with curatively resected stage I-II lung cancer. Lung
Cancer. 2011; 73: 222-9.
19. Maeda R, Yoshida J, Ishii G, Hishida T, Nishimura M, Nagai K. Risk factors for
tumor recurrence in patients with early-stage (stage I and II) non-small cell lung
cancer: patient selection criteria for adjuvant chemotherapy according to the seventh
edition TNM classification. Chest. 2011; 140: 1494-502.
20. Lee SH, Jo EJ, Eom JS, Mok JH, Kim MH, Lee K, et al. Predictors of Recurrence
after Curative Resection in Patients with Early-Stage Non-Small Cell Lung Cancer.
Tuberc Respir Dis (Seoul). 2015; 78: 341-8.
21. Higashi K, Ueda Y, Arisaka Y, Sakuma T, Nambu Y, Oguchi M, et al. 18F-FDG
uptake as a biologic prognostic factor for recurrence in patients with surgically
resected non-small cell lung cancer. J Nucl Med. 2002; 43: 39-45.
22. Liu WS, Zhao LJ, Pang QS, Yuan ZY, Li B, Wang P. Prognostic value of
epidermal growth factor receptor mutations in resected lung adenocarcinomas. Med
Oncol. 2014; 31: 771.
23. Nadal E, Chen G, Prensner JR, Shiratsuchi H, Sam C, Zhao L, et al. KRAS-G12C
mutation is associated with poor outcome in surgically resected lung adenocarcinoma.
J Thorac Oncol. 2014; 9: 1513-22.
24. Rowland LP, Shneider NA. Amyotrophic lateral sclerosis. N Engl J Med. 2001;
344: 1688-700.
25. Fang F, Al-Chalabi A, Ronnevi LO, Turner MR, Wirdefeldt K, Kamel F, et al.
Amyotrophic lateral sclerosis and cancer: a register-based study in Sweden.
Amyotroph Lateral Scler Frontotemporal Degener. 2013; 14: 362-8.
26. Taguchi YH, Wang H. Genetic Association between Amyotrophic Lateral
Sclerosis and Cancer. Genes (Basel). 2017; 8.
27. Oberley LW, Buettner GR. Role of superoxide dismutase in cancer: a review.
Cancer Res. 1979; 39: 1141-9.
28. Seol HS, Lee SE, Song JS, Lee HY, Park S, Kim I, et al. Glutamate release
inhibitor, Riluzole, inhibited proliferation of human hepatocellular carcinoma cells by
elevated ROS production. Cancer Lett. 2016; 382: 157-65.
29. Locasale JW. Serine, glycine and one-carbon units: cancer metabolism in full
circle. Nat Rev Cancer. 2013; 13: 572-83.
30. Yang M, Vousden KH. Serine and one-carbon metabolism in cancer. Nat Rev
Cancer. 2016; 16: 650-62.
31. Lehtinen L, Ketola K, Makela R, Mpindi JP, Viitala M, Kallioniemi O, et al.
High-throughput RNAi screening for novel modulators of vimentin expression
identifies MTHFD2 as a regulator of breast cancer cell migration and invasion.
Oncotarget. 2013; 4: 48-63.
32. Gustafsson Sheppard N, Jarl L, Mahadessian D, Strittmatter L, Schmidt A,
Madhusudan N, et al. The folate-coupled enzyme MTHFD2 is a nuclear protein and
promotes cell proliferation. Sci Rep. 2015; 5: 15029.
33. Nilsson R, Jain M, Madhusudhan N, Sheppard NG, Strittmatter L, Kampf C, et
al. Metabolic enzyme expression highlights a key role for MTHFD2 and the
mitochondrial folate pathway in cancer. Nat Commun. 2014; 5: 3128.
34. Chen T, Wong YS. Selenocystine induces reactive oxygen species-mediated
apoptosis in human cancer cells. Biomed Pharmacother. 2009; 63: 105-13.
35. Fan C, Chen J, Wang Y, Wong YS, Zhang Y, Zheng W, et al. Selenocystine
potentiates cancer cell apoptosis induced by 5-fluorouracil by triggering reactive
oxygen species-mediated DNA damage and inactivation of the ERK pathway. Free
Radic Biol Med. 2013; 65: 305-16.
36. Wang K, Fu XT, Li Y, Hou YJ, Yang MF, Sun JY, et al. Induction of S-Phase
Arrest in Human Glioma Cells by Selenocysteine, a Natural Selenium-Containing
Agent Via Triggering Reactive Oxygen Species-Mediated DNA Damage and
Modulating MAPKs and AKT Pathways. Neurochem Res. 2016; 41: 1439-47.
Figure Legends
Figure 1. (A) Heat map of the 126 mRNAs differentially expressed in NSCLC between the
early relapse and long-term survival groups in the discovery set. (B) LASSO coefficient
profiles of the 126 early relapse-associated mRNAs. A vertical line is drawn at the
value chosen by 10-fold cross-validation.
Figure 2. Distribution of risk scores (left panel), time-dependent ROC curves at 1, 3
and 5 years (middle panel) and Kaplan-Meier survival analysis of patients at low and
high risk of relapse (right panel) in the discovery set (A), internal validation set (B),
and entire dataset (C).
Figure 3. Kaplan-Meier survival analysis of patients classified by the forty-mRNA-based
signature, stratified by clinicopathological risk factors. (A) stage IA, P<0.001;
(B) stage IB, P<0.001; (C) stage II, P=0.001; (D) stage III, P=0.002; (E)
adenocarcinoma, P<0.001; (F) squamous cell carcinoma, P<0.001; (G) EGFR wild
type, P=0.003; (H) EGFR mutation, P=0.003.
Figure 4. (A) Time-dependent ROC curves at 1 year comparing the accuracy of the
forty-mRNA signature with that of the TNM staging system in predicting early relapse
in the entire cohort of patients with stage I-III lung cancer (N=772). (B) Decision
curve analysis at 12 months for tumor stage, the integrated mRNA signature and the
combined model. The y-axis measures the net benefit.
Figure 5. Gene set enrichment analysis delineates biological pathways associated
with the risk score.
Table 1 Clinical-pathological features of patients in early relapse and long-term survival groups before and after propensity score matching.
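Training set
Variable | Before matching: Early relapse | Before matching: Long-term survival | P | After matching: Early relapse | After matching: Long-term survival | P
Age, mean (IQR) | 65.1 (58.7-72.0) | 62.4 (59.5-72.3) | 0.29 | 65.2 (57.0-71.0) | 62.6 (59.0-71.0) | 0.397
Gender | | | 0.717 | | | 0.793
  Female | 22 | 19 | | 5 | 10 |
  Male | 44 | 33 | | 19 | 18 |
Stage | | | <0.001 | | | 1
  I | 17 | 38 | | 5 | 10 |
  II | 25 | 8 | | 19 | 18 |
  III | 24 | 6 | | 6 | 2 |
T stage | | | <0.001 | | | 0.296
  T1 | 5 | 20 | | 5 | 10 |
  T2 | 41 | 29 | | 19 | 18 |
  T3 | 18 | 2 | | 6 | 2 |
  T4 | 2 | 1 | | 1 | 1 |
N stage | | | <0.001 | | | 0.744
  N0 | 27 | 40 | | 20 | 19 |
  N1 | 23 | 7 | | 8 | 7 |
  N2 | 16 | 5 | | 3 | 5 |
Histological type | | | 0.046 | | | 1
  Adenocarcinoma | 35 | 18 | | 15 | 15 |
  Squamous cell carcinoma | 31 | 34 | | 16 | 16 |
Radiation therapy | | | 0.009 | | | 0.125
  No | 48 | 50 | | 22 | 29 |
  Yes | 14 | 2 | | 7 | 2 |
  Unknown | 4 | 0 | | 2 | 0 |
Total | 66 | 52 | | 31 | 31 |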
Table 2 Univariable and multivariable Cox regression analysis in lung cancer
Discovery set (N=579)
Variable | Univariable HR (95% CI) | p | Multivariable HR (95% CI) | p
Age | 1.005 (0.988-1.021) | 0.576 | |
Gender | | 0.580 | |
  Female | Reference | | |
  Male | 0.918 (0.679-1.242) | | |
Stage | | <0.001 | | <0.001
  I | Reference | | Reference |
  II | 1.993 (1.415-2.809) | | 1.936 (1.369-2.736) |
  III | 2.414 (1.630-3.577) | | 1.994 (1.308-3.039) |
Histological type | | 0.045 | | 0.114
  Squamous cell carcinoma | Reference | | Reference |
  Adenocarcinoma | 1.362 (1.007-1.843) | | 1.281 (0.942-1.742) |
Radiotherapy | | <0.001 | | <0.001
  No | Reference | | Reference |
  Yes | 2.146 (1.457-3.160) | | 1.607 (1.059-2.439) |
  Unknown | 3.523 (1.925-6.447) | | 3.006 (1.633-5.536) |
15-gene risk score | | <0.001 | | <0.001
  Low | Reference | | Reference |
  High | 3.244 (2.338-4.500) | | 3.126 (2.249-4.346) |

Validation set (N=193)
Variable | Univariable HR (95% CI) | p | Multivariable HR (95% CI) | p
Age | 0.998 (0.971-1.025) | 0.869 | |
Gender | | 0.068 | |
  Female | Reference | | |
  Male | 1.590 (0.967-2.613) | | |
Stage | | 0.103 | |
  I | Reference | | |
  II | 1.251 (0.721-2.172) | | |
  III | 2.071 (1.060-4.044) | | |
Histological type | | 0.002 | | 0.005
  Squamous cell carcinoma | Reference | | Reference |
  Adenocarcinoma | 2.277 (1.347-3.849) | | 2.119 (1.249-3.594) |
Radiotherapy | | 0.055 | |
  No | Reference | | |
  Yes | 2.081 (1.123-3.858) | | |
  Unknown | 0.746 (0.181-3.071) | | |
15-gene risk score | | 0.009 | | 0.025
  Low | Reference | | Reference |
  High | 1.970 (1.181-3.289) | | 1.806 (1.077-3.030) |
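The hazard ratios in Table 2 are reported as HR (95% CI). Assuming, as is usual for Cox model output, that each interval is a Wald interval built on the log scale, HR = exp(β) and the CI is exp(β ± 1.96·SE), so the log hazard ratio, its standard error and a two-sided p-value can be recovered from the printed numbers. The Python sketch below is a reader's consistency check under that assumption, not the authors' analysis code; the function name `cox_wald_from_ci` is hypothetical. It only applies to two-level comparisons such as high vs low risk score, because the p-values printed on the rows of multi-level variables (stage, radiotherapy) appear to be overall tests.

```python
import math


def cox_wald_from_ci(hr: float, lower: float, upper: float, z_crit: float = 1.959964):
    """Back-calculate the log hazard ratio, its standard error, the Wald z
    statistic and a two-sided p-value from a reported HR and 95% CI.

    Assumption (not stated in the paper): the CI was computed on the log
    scale as exp(beta +/- z_crit * SE), the usual Wald interval for Cox models.
    """
    beta = math.log(hr)                                      # log hazard ratio
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # half-width of log-scale CI / z_crit
    z = beta / se                                            # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2))                     # two-sided standard normal tail
    return beta, se, z, p


# High vs low 15-gene risk score, univariable analysis (values from Table 2)
for label, (hr, lo, hi) in {
    "discovery": (3.244, 2.338, 4.500),
    "validation": (1.970, 1.181, 3.289),
}.items():
    beta, se, z, p = cox_wald_from_ci(hr, lo, hi)
    print(f"{label}: beta={beta:.3f} SE={se:.3f} z={z:.2f} p={p:.2g}")
```

For the validation set this gives a p-value of roughly 0.009, in line with the univariable value reported for the risk score, and for the discovery set a p-value far below 0.001. Small discrepancies in other rows would be expected from rounding of the printed HRs and limits.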
Table 3. Differentially expressed genes between the early relapse and long-term survival groups (P < 0.05, absolute fold change ≥ 2.0).
Gene symbol | logFC | AveExpr | t | P.Value | adj.P.Val | B
FGF19 | -2.72809 | 2.262861 | -4.0604 | 0.000142 | 0.355359 | 0.815062
PLA2G4F | 1.576732 | 6.884189 | 3.756536 | 0.000387 | 0.533308 | -0.02272
ENHO | 1.527961 | 3.309913 | 3.611629 | 0.000616 | 0.56877 | -0.40911
RFPL3S | 1.092442 | 2.595911 | 3.574336 | 0.000694 | 0.56877 | -0.50709
ANXA10 | -2.55136 | 2.786353 | -3.55941 | 0.000727 | 0.56877 | -0.54613
TPPP | 1.284461 | 7.934898 | 3.405479 | 0.001173 | 0.56877 | -0.94277
HLF | 1.5885 | 7.216711 | 3.34167 | 0.001426 | 0.56877 | -1.10392
CYP17A1 | 1.321161 | 1.72539 | 3.32559 | 0.001497 | 0.56877 | -1.14421
LOC441869 | 1.124635 | 8.630224 | 3.308383 | 0.001577 | 0.56877 | -1.18719
RHCE | 1.004558 | 3.706463 | 3.290674 | 0.001664 | 0.56877 | -1.23127
LOC619207 | 1.2575 | 4.050082 | 3.267404 | 0.001784 | 0.56877 | -1.28896
HOXD8 | -1.45645 | 5.535513 | -3.22267 | 0.00204 | 0.56877 | -1.39909
TMEM63C | 1.721926 | 6.183976 | 3.218006 | 0.002068 | 0.56877 | -1.4105
TRIM58 | -1.42251 | 2.943063 | -3.21418 | 0.002092 | 0.56877 | -1.41987
PCDHA4 | -1.66471 | 4.905856 | -3.13664 | 0.00263 | 0.56877 | -1.60798
C1orf186 | 1.223768 | 3.091539 | 3.033808 | 0.003545 | 0.620294 | -1.85253
PLIN5 | 1.125352 | 5.04104 | 3.01553 | 0.003736 | 0.620294 | -1.8954
SOHLH2 | -1.75995 | 3.30674 | -2.97739 | 0.004165 | 0.628829 | -1.98427
PLIN4 | 1.010042 | 4.953353 | 2.966866 | 0.004292 | 0.633915 | -2.00864
C1orf51 | 1.133948 | 6.84631 | 2.948407 | 0.004522 | 0.640287 | -2.05125
STC2 | -1.10859 | 8.936181 | -2.9375 | 0.004663 | 0.641611 | -2.07634
KIAA1875 | 1.232977 | 4.071808 | 2.937158 | 0.004668 | 0.641611 | -2.07712
S100A1 | 1.130219 | 4.439777 | 2.933305 | 0.004719 | 0.641611 | -2.08596
SLCO1B1 | -1.10802 | 1.356992 | -2.90743 | 0.005075 | 0.661301 | -2.14515
BLK | 1.553532 | 3.893431 | 2.875499 | 0.005548 | 0.670049 | -2.21765
RBM46 | -1.48323 | 1.170571 | -2.80072 | 0.006821 | 0.684867 | -2.38515
CLEC7A | 1.045403 | 8.312295 | 2.77386 | 0.00734 | 0.6894 | -2.44451
PLEKHG4B | 1.527087 | 7.220969 | 2.768823 | 0.007441 | 0.6894 | -2.45559
ABHD12B | -1.05598 | 1.759376 | -2.75763 | 0.007671 | 0.6894 | -2.48016
AKAP12 | -1.2352 | 8.918735 | -2.7548 | 0.00773 | 0.6894 | -2.48638
NBPF4 | -1.09443 | 1.401131 | -2.7474 | 0.007887 | 0.6894 | -2.50256
STC1 | -1.08794 | 9.05034 | -2.7336 | 0.008187 | 0.6894 | -2.53267
HTR1B | -1.02307 | 1.591026 | -2.73162 | 0.008231 | 0.6894 | -2.53699
HAS2AS | -1.0462 | 2.595797 | -2.72026 | 0.008487 | 0.6894 | -2.56169
MIOX | 1.382545 | 2.099353 | 2.715899 | 0.008587 | 0.6894 | -2.57114
ZDHHC11 | 1.183084 | 5.846671 | 2.713221 | 0.008649 | 0.6894 | -2.57694
PCDHB6 | -1.42358 | 4.720731 | -2.69684 | 0.009037 | 0.6894 | -2.61232
FTCD | 1.154632 | 1.96121 | 2.647279 | 0.010311 | 0.705841 | -2.71842
LRRTM1 | -1.19805 | 1.434684 | -2.63601 | 0.010623 | 0.714931 | -2.74233
CKMT2 | 1.102965 | 2.780221 | 2.626701 | 0.010887 | 0.714931 | -2.76203
KCNA4 | -1.08192 | 1.450969 | -2.62538 | 0.010925 | 0.714931 | -2.76482
BMPER | -1.1493 | 4.426065 | -2.59193 | 0.011927 | 0.714931 | -2.83512
CCL19 | 1.633797 | 7.035918 | 2.551095 | 0.013262 | 0.714931 | -2.91996
CA3 | 1.084526 | 3.860808 | 2.538821 | 0.01369 | 0.721945 | -2.94526
GOLGA8C | 1.066539 | 1.640437 | 2.518014 | 0.014442 | 0.730697 | -2.98793
C12orf39 | -1.2449 | 1.331227 | -2.51232 | 0.014655 | 0.730697 | -2.99957
NOS1 | -1.35867 | 2.701153 | -2.50965 | 0.014755 | 0.730697 | -3.00501
MAGEA1 | -2.4099 | 3.853531 | -2.50031 | 0.015112 | 0.730697 | -3.02402
PABPC1L | 1.007319 | 8.404773 | 2.49826 | 0.015191 | 0.730697 | -3.02818
LOC100131726 | -1.21796 | 4.186685 | -2.49475 | 0.015328 | 0.730697 | -3.03532
GABRA2 | -1.084 | 1.168902 | -2.49389 | 0.015362 | 0.730697 | -3.03706
LGALS2 | 1.082987 | 3.797239 | 2.489552 | 0.015532 | 0.730697 | -3.04585
ADAMTS19 | -1.16557 | 0.831884 | -2.48395 | 0.015756 | 0.730697 | -3.05719
ADH1C | 2.179765 | 5.684573 | 2.482764 | 0.015803 | 0.730697 | -3.05958
SLC30A10 | -1.05045 | 1.32916 | -2.48222 | 0.015825 | 0.730697 | -3.06068
PSORS1C2 | 1.34559 | 2.36186 | 2.469677 | 0.016337 | 0.730697 | -3.08598
ZYG11A | 1.2821 | 5.500698 | 2.468606 | 0.016381 | 0.730697 | -3.08813
TSKS | 1.050597 | 1.790095 | 2.467907 | 0.01641 | 0.730697 | -3.08954
CCR10 | 1.037719 | 3.947876 | 2.465282 | 0.01652 | 0.730697 | -3.09482
TM7SF4 | 1.142423 | 3.947724 | 2.452018 | 0.017083 | 0.730697 | -3.12142
UNC5D | -1.3148 | 1.507402 | -2.44054 | 0.017584 | 0.730958 | -3.14435
PASD1 | -1.32895 | 1.058779 | -2.43685 | 0.017748 | 0.730958 | -3.15171
DUSP13 | 1.416968 | 3.189316 | 2.43644 | 0.017767 | 0.730958 | -3.15252
NCCRP1 | -1.59831 | 5.798971 | -2.43215 | 0.017959 | 0.730958 | -3.16106
HLA-DQB2 | 1.226561 | 7.496006 | 2.431941 | 0.017969 | 0.730958 | -3.16147
MST1P2 | 1.039103 | 6.513426 | 2.424799 | 0.018293 | 0.731066 | -3.17566
ACSM5 | 1.003565 | 2.65786 | 2.4221 | 0.018418 | 0.731224 | -3.18101
MUC15 | 1.371813 | 6.487268 | 2.404411 | 0.01925 | 0.737346 | -3.21596
GLI2 | -1.02179 | 6.76626 | -2.39705 | 0.019606 | 0.738871 | -3.23045
TSPYL5 | -1.10624 | 8.92384 | -2.37178 | 0.020874 | 0.738871 | -3.27989
LCT | 1.296242 | 1.728134 | 2.363528 | 0.021303 | 0.742647 | -3.29595
PCDHB17 | -1.0444 | 2.774953 | -2.35544 | 0.021732 | 0.743464 | -3.31163
GLYATL2 | 1.624242 | 2.434537 | 2.353948 | 0.021812 | 0.743464 | -3.31453
NCRNA00105 | 1.028248 | 5.679927 | 2.349027 | 0.022077 | 0.747975 | -3.32405
PCDHGB5 | -1.33108 | 5.785087 | -2.34329 | 0.022391 | 0.752388 | -3.33513
PLA1A | 1.108026 | 5.762555 | 2.335684 | 0.022812 | 0.752388 | -3.34979
HOMER2 | 1.012797 | 7.603153 | 2.33328 | 0.022946 | 0.752388 | -3.35441
C1orf161 | -1.1125 | 3.949881 | -2.32805 | 0.023242 | 0.752388 | -3.36446
BNIPL | 1.160219 | 7.014403 | 2.31594 | 0.023938 | 0.752388 | -3.38764
COL9A2 | 1.200081 | 8.090427 | 2.30279 | 0.024716 | 0.758232 | -3.41272
HSF4 | 1.010861 | 6.876789 | 2.298183 | 0.024993 | 0.758232 | -3.42147
LEFTY1 | -1.0556 | 2.877263 | -2.29633 | 0.025106 | 0.758232 | -3.42498
CYP4Z1 | 1.007042 | 1.83374 | 2.279884 | 0.026124 | 0.758232 | -3.45611
MESP1 | 1.043745 | 5.163456 | 2.258912 | 0.027474 | 0.758232 | -3.49552
GPD1 | 1.070329 | 4.546139 | 2.250551 | 0.02803 | 0.758232 | -3.51115
MAGEC1 | -1.68644 | 2.768142 | -2.25025 | 0.02805 | 0.758232 | -3.51171
MAPK15 | 1.103894 | 5.13425 | 2.243043 | 0.028537 | 0.758232 | -3.52514
NTSR1 | -1.43173 | 2.21385 | -2.237 | 0.028952 | 0.758232 | -3.53638
PRODH | 1.219016 | 8.858773 | 2.235566 | 0.02905 | 0.758232 | -3.53903
PTPRR | -1.05591 | 4.19124 | -2.23433 | 0.029136 | 0.758232 | -3.54133
ZFP42 | -1.74383 | 2.609129 | -2.22849 | 0.029543 | 0.758232 | -3.55215
CXCL13 | 1.446423 | 8.17065 | 2.219555 | 0.030177 | 0.758448 | -3.56866
TMEM213 | 1.305187 | 3.10531 | 2.206322 | 0.031137 | 0.759638 | -3.59302
KLHL4 | -1.11666 | 3.755319 | -2.20077 | 0.031548 | 0.759638 | -3.6032
EDAR | -1.24639 | 3.799781 | -2.1996 | 0.031635 | 0.759638 | -3.60534
CD1E | 1.057268 | 3.999411 | 2.192211 | 0.03219 | 0.76227 | -3.61885
AJAP1 | -1.18159 | 2.5557 | -2.18489 | 0.032748 | 0.765075 | -3.63221
GATA4 | -1.49704 | 2.611742 | -2.17205 | 0.033749 | 0.769278 | -3.65553
DPPA2 | -1.15719 | 1.117642 | -2.16566 | 0.034256 | 0.772019 | -3.66709
ALB | 1.033348 | 1.728623 | 2.158374 | 0.034843 | 0.772992 | -3.68024
CYP2D6 | 1.049642 | 4.151369 | 2.147564 | 0.03573 | 0.775971 | -3.69968
ECEL1 | -1.52079 | 4.220187 | -2.14435 | 0.035998 | 0.775971 | -3.70544
EPHA6 | -1.15379 | 2.426281 | -2.13713 | 0.036605 | 0.775971 | -3.71837
PALM3 | 1.185745 | 5.753402 | 2.131005 | 0.037127 | 0.77623 | -3.7293
CTAG2 | -1.92675 | 2.874261 | -2.13011 | 0.037204 | 0.776248 | -3.73089
CD177 | 1.638781 | 5.483997 | 2.129326 | 0.037271 | 0.776873 | -3.73229
DNAJB13 | 1.102387 | 4.297739 | 2.1277 | 0.037412 | 0.777084 | -3.73519
KLB | 1.019039 | 4.174035 | 2.118342 | 0.038227 | 0.779243 | -3.75182
FGF5 | -1.07948 | 1.794466 | -2.1181 | 0.038249 | 0.779243 | -3.75225
PCDHB5 | -1.16742 | 6.462105 | -2.11422 | 0.038592 | 0.779243 | -3.75913
CYP4F12 | 1.250645 | 3.756513 | 2.091896 | 0.040617 | 0.779243 | -3.79849
C4orf7 | 1.834165 | 4.279805 | 2.086703 | 0.041102 | 0.779243 | -3.80759
HSD17B13 | 1.128197 | 2.98134 | 2.083463 | 0.041406 | 0.779243 | -3.81326
GDF5 | 1.14729 | 3.700687 | 2.082806 | 0.041468 | 0.779243 | -3.81441
MAGEA8 | -1.27757 | 1.675569 | -2.07528 | 0.042185 | 0.779567 | -3.82756
PCK1 | -1.42483 | 1.760766 | -2.06872 | 0.042818 | 0.781907 | -3.83897
LOC100133469 | -1.64281 | 1.903152 | -2.06833 | 0.042855 | 0.781907 | -3.83965
MST1P9 | 1.186958 | 6.007192 | 2.057755 | 0.043894 | 0.783221 | -3.85799
ANKRD1 | -1.03876 | 3.819911 | -2.05739 | 0.04393 | 0.783221 | -3.85863
MPPED1 | 1.430481 | 1.978524 | 2.053326 | 0.044335 | 0.783221 | -3.86565
TFF1 | -1.46187 | 3.226495 | -2.03256 | 0.046456 | 0.78566 | -3.90137
PCDHA1 | -1.21527 | 3.55384 | -2.01645 | 0.048162 | 0.788116 | -3.92887
ADAMTS18 | -1.11048 | 3.522579 | -2.01617 | 0.048191 | 0.788116 | -3.92934
UCA1 | -1.45425 | 3.682305 | -2.01463 | 0.048357 | 0.788473 | -3.93196
C10orf81 | 1.564568 | 5.896255 | 2.009421 | 0.048922 | 0.789294 | -3.9408
PCDHGB1 | -1.15302 | 4.597055 | -2.00327 | 0.049597 | 0.790617 | -3.95122
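The selection rule in the Table 3 caption (P < 0.05 and absolute fold change ≥ 2.0) can be written as a simple filter. Judging from the table, the P threshold applies to the nominal P.Value column (every adj.P.Val entry exceeds 0.35), and the listed logFC values all satisfy |logFC| ≥ 1, which is what a 2-fold cut-off means if logFC is a log2 fold change as in limma's topTable output; that log2 reading is an assumption. In the Python sketch below, `DERow` and `select_degs` are hypothetical names, the first three rows are copied from Table 3, and the two `HYPOTHETICAL_*` rows are made-up values added only to show what the filter rejects.

```python
from dataclasses import dataclass


@dataclass
class DERow:
    gene: str
    log_fc: float   # assumed log2 fold change (limma topTable convention)
    p_value: float  # nominal (unadjusted) P.Value


def select_degs(rows, p_cut=0.05, fc_cut=2.0):
    """Keep rows with P < p_cut and absolute fold change >= fc_cut,
    split into up- and down-regulated gene lists (input order preserved)."""
    up, down = [], []
    for r in rows:
        fold_change = 2 ** abs(r.log_fc)  # |logFC| >= 1  <=>  fold change >= 2
        if r.p_value < p_cut and fold_change >= fc_cut:
            (up if r.log_fc > 0 else down).append(r.gene)
    return up, down


rows = [
    DERow("FGF19", -2.72809, 0.000142),
    DERow("PLA2G4F", 1.576732, 0.000387),
    DERow("PCDHGB1", -1.15302, 0.049597),
    DERow("HYPOTHETICAL_A", 0.80, 0.010),  # made-up: fold change < 2, excluded
    DERow("HYPOTHETICAL_B", 1.50, 0.060),  # made-up: P >= 0.05, excluded
]
up, down = select_degs(rows)
print("up:", up)      # up: ['PLA2G4F']
print("down:", down)  # down: ['FGF19', 'PCDHGB1']
```

Comparing on the fold-change scale rather than on |logFC| directly is only for readability; both forms select the same rows.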
Table 4. Forty differentially expressed mRNAs included in the signature.
Gene symbol | logFC | P.Value
Down-regulated
FGF19 | -2.72809 | 0.000142
ZFP42 | -1.74383 | 0.029543
PCDHA4 | -1.66471 | 0.00263
RBM46 | -1.48323 | 0.006821
TRIM58 | -1.42251 | 0.002092
MAGEA8 | -1.27757 | 0.042185
AKAP12 | -1.2352 | 0.00773
LOC100131726 | -1.21796 | 0.015328
AJAP1 | -1.18159 | 0.032748
DPPA2 | -1.15719 | 0.034256
ADAMTS18 | -1.11048 | 0.048191
NBPF4 | -1.09443 | 0.007887
PTPRR | -1.05591 | 0.029136
LEFTY1 | -1.0556 | 0.025106
HTR1B | -1.02307 | 0.008231
Up-regulated
PLIN4 | 1.010042 | 0.004292
HOMER2 | 1.012797 | 0.022946
KLB | 1.019039 | 0.038227
CCR10 | 1.037719 | 0.01652
CLEC7A | 1.045403 | 0.00734
TSKS | 1.050597 | 0.01641
RFPL3S | 1.092442 | 0.000694
PLA1A | 1.108026 | 0.022812
HSD17B13 | 1.128197 | 0.041406
FTCD | 1.154632 | 0.010311
ZDHHC11 | 1.183084 | 0.008649
PALM3 | 1.185745 | 0.037127
C1orf186 | 1.223768 | 0.003545
KIAA1875 | 1.232977 | 0.004668
ZYG11A | 1.2821 | 0.016381
TMEM213 | 1.305187 | 0.031137
PSORS1C2 | 1.34559 | 0.016337
DUSP13 | 1.416968 | 0.017767
MPPED1 | 1.430481 | 0.044335
PLEKHG4B | 1.527087 | 0.007441
PLA2G4F | 1.576732 | 0.000387
GLYATL2 | 1.624242 | 0.021812
CD177 | 1.638781 | 0.037271
TMEM63C | 1.721926 | 0.002068
ADH1C | 2.179765 | 0.015803
Supplementary files:
Table S1. Original data for all patients included in the analysis.