View
0
Download
0
Category
Preview:
Citation preview
Supplementary Results
Pratyaksha J. Wirapati
April 19, 2007
Contents
1 Survival data 21.1 Data collection and preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Cohort-specific survival for all endpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Coexpression modules 32.1 Gene Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.2 Number of genes selected and mapped to each dataset . . . . . . . . . . . . . . . . . . . . . . . . 32.3 Heatmaps for all datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.4 Biological annotation of the modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Module score distributions 83.1 Dot histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83.2 Tests of bimodality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4 Three subtypes 104.1 Scatter plots of ERBB2 vs ESR1 module scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104.2 Tests of trimodality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
5 Additional survival analyses 135.1 Three subtypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135.2 Three subtypes combined with high/low proliferation . . . . . . . . . . . . . . . . . . . . . . . . . 135.3 Separate metastasis-free survival and relapse-free survival . . . . . . . . . . . . . . . . . . . . . . . 13
6 Prognostic value of ER-status within type-2 (ERBB2+) tumors 14
7 Gene-by-gene survival analysis 147.1 Gene Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147.2 Agreement of gene associations with different survival endpoints . . . . . . . . . . . . . . . . . . . 14
8 Forest plots of signature performance for all survival endpoints 15
9 Concordance in risk classifications 169.1 Tables of pairwise concordance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169.2 Combined prediction by pairs of signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179.3 Patient classifications on proliferation-vs-subtype plots . . . . . . . . . . . . . . . . . . . . . . . . 18
1
1 Survival data
1.1 Data collection and preprocessing
Survival endpoints The original studies often do not give precise description of their endpoints. Thefollowing are the definitions we used:
• Relapse-free survival (RFS) The endpoints are any events including local (in the same breast orthe other side) or regional relapse, and distant metastasis (which is often implied by death). We alsoconsider as RFS endpoints described as disease-free survival (DFS) in the original studies.
• Metastasis-free survival (MFS) The endpoints are relapses to other organs, such as liver or brain.
• Overall survival (OS) The endpoints are patient’s deaths. Few studies specify “disease-specific sur-vival” to distinguish death due to breast cancer, as opposed to any death. When not specified, weassume all deaths are disease related.
Note that these endpoint types are “nested”. Death due to breast cancer implies distant metastasis shortlybefore (and often the time to metastasis is identical to the time to death). RFS typically includes MFSand OS. The events are therefore correlated. Although they obviously have different baseline hazard (RFSdrops faster than MFS, which in turn drops faster than OS) (See Supplementary Result 1.2 below), most ofour conclusions are concerned with relative survival between groups, which are valid for any type of events(Supplementary Result 1.2, 5.1, 5.2, 5.3, 6, 8). To save space (without affecting the conclusions), Figure 4c,dand Figure 6a,b,c in the main text are based on combined event types. Furthermore, we can not detect genesspecifically associated with one type of endpoint but not the others (see Supplementary Result 7).
Time-unit conversion and modifications All time units were converted to “days”. One year is consid-ered to be 365.25 days. “Months” and “‘weeks” were converted to “years” first. Patients with missing timewere excluded, but those with missing event indicator (very few) were considered censored. Median follow-uptime can vary greatly between studies (e.g. very short in STNO and very long in TRANSBIG; see the numbersat risk in Supplementary Result 1.2). Therefore all survival data were truncated to 10 years (i.e. patientswith longer follow-up are considered to be censored at this time point) in Cox regression analysis to ensurethat the meta-analytical results pertain to similar time range in all datasets. The truncation also deals withproblems with non-proportional hazard in long follow-up time (see Hilsenbeck et al. [1998] Time-dependenceof hazard ratios for prognostic factors in primary breast cancer. Breast Cancer Res. Treat. 52:227-237).
1.2 Cohort-specific survival for all endpoints
a b c
total 6311890 1415 1147 799 469
STNO 60115 39 10 1UNC 32128 33 10 4JRH1 4599 75 59 30 1MGH 2560 50 42 28 18NCH 47135 110 97 81 66NKI 121319 260 216 131 75TRANSBIG 101253 207 170 147 118UPP 88249 185 158 140 107JRH2 1561 55 44 38 33STOCK 40159 140 124 68EMC2 37180 164 149 94 40UCSF 20132 97 68 37 11
STNO
UNCJRH1MGH
NCH
NKITRANSBIGUPP
JRH2STOCK
EMC2UCSF
40
60
80
100
rela
pse-
free
sur
viva
l (%
)
group: number at risk: events
follow-up: 0 2.5 5 7.5 10 (year)
total 3061015 852 693 491 308
NKI 105319 266 223 135 82TRANSBIG 56253 225 197 177 149EMC 107286 226 180 127 44HPAZ 2596 80 49 16 2JRH2 1361 55 44 36 31
NKI
TRANSBIG
EMC
HPAZ
JRH2
40
60
80
100
met
asta
sis-
free
sur
viva
l (%
)
group: number at risk: events
follow-up: 0 2.5 5 7.5 10 (year)
total 4842019 1628 1313 867 512
STNO 46115 54 14 3 1DUKE 43170 86 46 12 3UNC 22129 39 14 5JRH1 4599 85 70 32 1UCSF 37132 104 74 43 14NKI 74319 290 248 147 89STOCK 40159 148 130 64NCH 34135 122 111 96 81TRANSBIG 57253 240 212 190 154UPP 51232 198 173 152 122EMC2 23180 175 166 103 44HPAZ 1296 87 55 20 3
STNODUKE
UNC
JRH1
UCSFNKI
STOCKNCH
TRANSBIGUPPEMC2HPAZ
40
60
80
100
over
all s
urvi
val (
%)
group: number at risk: events
follow-up: 0 2.5 5 7.5 10 (year)
Cohort-specific survival curves for relapse-free survival (a), metastasis-free survival (b) and overall survival(c). Panel c is the same as figure 1b in the main text.
2
2 Coexpression modules
2.1 Gene Tables
The genes identified by prototype-based coexpression analysis are listed in the file modules.txt, in a tab-delimited text file. For convenient browsing, open the file using spreadsheet program (e.g. MS Excel) andformat the column widths to fit the contents automatically. We identified 909 genes under the specifiedcriteria. The ordering of the genes in the table is the same as that in Figure 2b of the main text.
2.2 Number of genes selected and mapped to each dataset
---------------------------------------------------
modules
--------------------------------------
datasets ESR1 ERBB2 AURKA PLAU STAT1
---------------------------------------------------
NKI 250 20 278 154 159
EMC 248 18 270 151 156
UPP 268 21 294 160 166
STOCK 268 21 294 160 166
DUKE 182 14 204 130 134
UCSF 143 11 176 109 116
UNC 235 19 270 156 161
NCH 235 19 270 156 161
STNO 133 8 156 100 102
JRH1 100 8 135 90 99
JRH2 248 18 270 151 156
MGH 196 11 198 123 108
expO 268 21 294 160 166
TGIF1 248 18 270 151 156
BWH 268 21 294 160 166
TRANSBIG 11 1 50 2 2
EMC2 1 1 10 1 0
HPAZ 2 1 13 1 0
---------------------------------------------------
Gene union 268 21 294 160 166
---------------------------------------------------
3
2.3 Heatmaps for all datasets
NKI1 2 3
ESR1FOXA1SCUBE2ERBB4BCL2ERBB3AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12XBP1MYBTFF3TFF1CCND1
KRT5KRT7KLF5KRT16PROM1CDH3LMO4
ERBB2GRB7
STARD3PERLD1
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2MCM10CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECKDCN
PLAUMMP11CTSKPLAURCTSBMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL12A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-CHLA-BHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
EMC1 2 3
ESR1FOXA1SCUBE2ERBB4BCL2ERBB3AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12SLC39A6XBP1MYBTFF3TFF1KRT18CCND1
KRT5KRT7KLF5KRT16PROM1CDH3LMO4
ERBB2GRB7
STARD3PERLD1
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2MCM10CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECKDCN
PLAUMMP11CTSKPLAURCTSBMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-CHLA-BHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
TGIF112 3
ESR1FOXA1SCUBE2ERBB4BCL2ERBB3AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12SLC39A6XBP1MYBTFF3TFF1KRT18CCND1
KRT5KRT7KLF5KRT16PROM1CDH3LMO4
ERBB2GRB7
STARD3PERLD1
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2MCM10CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECKDCN
PLAUMMP11CTSKPLAURCTSBMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-CHLA-BHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
JRH11 2 3
ESR1BCL2
RARAEGFRFABP7YBX1FOXC1
GATA3SLC39A6XBP1MYBTFF3KRT18KRT5KRT7PROM1CDH3
ERBB2GRB7
STARD3AURKABUB1FOXM1CCNB1CDKN3MCM6MCM2CCNA2CCNE1MCM3TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSDPT
PLAUMMP11CTSKPLAURCTSBMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL6A3COL6A2COL3A1COL6A1
STAT1HLA-GHLA-CHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IRF1IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
STNO1 2 3 N
ESR1SCUBE2ERBB4BCL2ERBB3
EGFRFABP7CRYABYBX1FOXC1
GATA3SLC39A6XBP1MYBTFF3TFF1KRT18CCND1KRT5KRT7PROM1CDH3
ERBB2GRB7
STARD3AURKABUB1FOXM1CCNB1CDKN3MCM6MCM2CCNA2CCNE1MCM3TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNAMKI67CKS1BTYMSE2F1
DPT
PLAUCTSKPLAURCTSBMMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A2COL5A1COL12A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-CHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1
MX1IRF1IFITM1CD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
UCSF1 2 3 N
ESR1BCL2ERBB3AR
EGFRFABP7CRYABYBX1FOXC1
GATA3SLC39A6MYBTFF3KRT18CCND1KRT5KRT7KLF5KRT16CDH3LMO4
ERBB2GRB7
STARD3
AURKABUB1FOXM1CCNB1CDKN3MCM6MCM2CCNA2CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMS
DPT
PLAUMMP11CTSKPLAURCTSBMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-CHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1
MX1IRF1IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
4
UPP1 2 3
ESR1FOXA1SCUBE2ERBB4BCL2ERBB3
AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12SLC39A6XBP1MYBTFF3TFF1KRT18CCND1
KRT5KRT7KLF5KRT16PROM1CDH3LMO4
ERBB2GRB7
STARD3PERLD1
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2MCM10CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECKDCN
PLAUMMP11CTSKPLAURCTSBMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL12A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-CHLA-BHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1
IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
STOCK1 2 3
ESR1FOXA1SCUBE2ERBB4BCL2ERBB3
AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12SLC39A6XBP1MYBTFF3TFF1KRT18CCND1
KRT5KRT7KLF5KRT16PROM1CDH3LMO4
ERBB2GRB7
STARD3PERLD1
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2MCM10CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECKDCN
PLAUMMP11CTSKPLAURCTSBMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL12A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-CHLA-BHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1
IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
expO
1 2 3ESR1FOXA1SCUBE2ERBB4BCL2ERBB3
AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12SLC39A6XBP1MYBTFF3TFF1KRT18CCND1
KRT5KRT7KLF5KRT16PROM1CDH3LMO4
ERBB2GRB7
STARD3PERLD1
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2MCM10CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECKDCN
PLAUMMP11CTSKPLAURCTSBMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL12A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-CHLA-BHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1
IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
TRANSBIG1 2 3
ESR1
SCUBE2
KRT18
AURKACCNB2BUB1
MCM6CCNE2
TGFB3
BIRC5
TK1
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
HPAZ12 3
SCUBE2EGFR
ERBB2
MCM6
TGFB3
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
EMC21 3
CCNE2
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
Note Because of the small number of genes mapped, the vertical scale of the heatmaps for TRANSBIG,HPAZ and EMC2 is larger than that of other datasets.
5
UNC
1 2 3 NESR1FOXA1SCUBE2ERBB4ERBB3
AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12SLC39A6XBP1MYBTFF3TFF1KRT18CCND1
KRT5KLF5KRT16PROM1CDH3
ERBB2GRB7
STARD3PERLD1
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2MCM10CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECKDCN
PLAUMMP11CTSKPLAURCTSBMMP2MMP13MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL12A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-BHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1
IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
NCH
1 2 3ESR1FOXA1SCUBE2ERBB4ERBB3
AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12SLC39A6XBP1MYBTFF3TFF1KRT18CCND1
KRT5KLF5KRT16PROM1CDH3
ERBB2GRB7
STARD3PERLD1
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2MCM10CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECKDCN
PLAUMMP11CTSKPLAURCTSBMMP2MMP13MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL12A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-BHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1
IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
JRH2
12 3ESR1FOXA1SCUBE2ERBB4BCL2ERBB3AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12SLC39A6XBP1MYBTFF3TFF1KRT18CCND1
KRT5KRT7KLF5KRT16PROM1CDH3LMO4
ERBB2GRB7
STARD3PERLD1
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2MCM10CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECKDCN
PLAUMMP11CTSKPLAURCTSBMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-CHLA-BHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
BWH
1 23NESR1FOXA1SCUBE2ERBB4BCL2ERBB3
AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12SLC39A6XBP1MYBTFF3TFF1KRT18CCND1
KRT5KRT7KLF5KRT16PROM1CDH3LMO4
ERBB2GRB7
STARD3PERLD1
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2MCM10CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECKDCN
PLAUMMP11CTSKPLAURCTSBMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL12A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-CHLA-BHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1
IFITM1GZMACD38IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
DUKE
1 2 3ESR1FOXA1ERBB4BCL2ERBB3AR
RARA
EGFRFABP7CRYABYBX1FOXC1
GATA3CA12XBP1MYBTFF3TFF1KRT18CCND1KRT5KRT7KLF5KRT16PROM1CDH3LMO4
ERBB2GRB7
STARD3
AURKACCNB2BUB1FOXM1CCNB1CDKN3MCM6CCNE2MCM2CCNA2CCNE1MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1BIRC5TOP2AMYBL2PCNALMNB1MKI67CKS1BTK1TYMSE2F1
DPT
RECK
PLAUMMP11CTSKPLAURMMP2MMP13TIMP3MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL6A3COL6A2COL1A2COL3A1COL6A1
STAT1HLA-GHLA-FHLA-DRAHLA-DMAHLA-DQB1HLA-DPA1HLA-DMB
MX1IFIT3IRF1IFITM1GZMACD38IFIT5
MGH
12 3ESR1ERBB4BCL2ERBB3
AR
EGFR
FOXC1
GATA3CA12SLC39A6XBP1MYBTFF3CCND1
KRT5KLF5KRT16PROM1
ERBB2
AURKACCNB2BUB1FOXM1CDKN3MCM6MCM2CCNA2MCM10MCM3
TGFB3TGFBR2IGF1SPARCL1
PTTG1MYBL2
CKS1BTYMSE2F1
DPTRECKDCN
PLAUMMP11CTSBMMP2
MMP1LAMB1TIMP2
COL11A1COL5A2COL10A1COL5A1COL12A1COL6A3COL1A2COL3A1COL6A1
STAT1HLA-FHLA-CHLA-BHLA-DPA1HLA-EHLA-DMB
MX1IFIT3IRF1
IFIT5
05
1015 fo
llow
-up
time
(yea
r)
censoredevent
6
2.4 Biological annotation of the modules
The genes were divided into groups according to the module and the direction of the correlation (i.e., thosecorrelated and anticorrelated with ESR1 were grouped separately; denoted by ESR1+ and ESR1-). OnlyESR1 and AURKA modules have negatively correlated genes.
Each of these grouped was compared against the genesets in GO ontology database (based on annotationin Entrez Gene database tables from ftp://ftp.ncbi.nih.gov/gene/; version 21 January 2007) and Molecu-lar Signature Database (MSigDB) version 2 from http://www.broad.mit.edu/gsea/msigdb/msigdb index.
html, Fisher’s exact test was used to score the association between the modules and a gene set. The top-ranking genesets were shown in the file go.txt and msigdb.txt. The tables are in tab-delimited text formatthat can be opened from spreadsheet programs. The files contain Fisher’s exact test p-values, odd-ratio ofenrichment, geneset identifiers, and brief description.
7
3 Module score distributions
3.1 Dot histograms
ESR1 module ERBB2 module AURKA module PLAU module STAT1 module
NKI -1 0 estrogen score
-1 0 1 2 ERBB2-amplification score
-1 -0.5 0 0.5 1 proliferation score
-1 0 1 invasion score
-1 0 1 immune-response score
EMC
-1 0 1 -1 0 1 2 -1 0 1 -2 -1 0 1 -1 0 1 2
UPP
-1 -0.5 0 0.5 -1 0 1 2 -0.5 0 0.5 1 -1 0 1 -1 0 1
STOCK
-1 -0.5 0 0.5 1 -1 0 1 2 -0.5 0 0.5 1 -1 0 1 -1 0 1
DUKE
-0.5 0 0.5 0 1 -0.5 -0.25 0 0.25 0.5 -0.5 0 0.5 1 -0.5 0 0.5 1 1.5
UCSF
-0.5 0 0.5 -0.5 0 0.5 1 1.5 -1 -0.5 0 0.5 1 -0.5 0 0.5 -0.5 0 0.5 1
UNC
-1 -0.5 0 0.5 1 0 1 2 -1 -0.5 0 0.5 1 -1 0 1 -1 0 1 2
NCH
-0.5 0 0.5 -0.5 0 0.5 1 -0.5 0 0.5 -1 -0.5 0 0.5 1 -1 0 1
STNO
-1 0 1 -1 0 1 2 -1 0 1 -1 0 1 -1 -0.5 0 0.5 1JRH1
-0.5 0 0.5 0 0.5 1 -0.25 0 0.25 -0.5 0 0.5 -0.5 0 0.5JRH2
-1 -0.5 0 0.5 0 1 2 -1 -0.5 0 0.5 1 -1 0 1 -0.5 0 0.5 1MGH
-1 -0.5 0 0.5 0 1 -0.5 0 0.5 -1 -0.5 0 0.5 1 -0.5 0 0.5 1
expO
-1 0 1 0 2 -1 0 1 -2 -1 0 1 -1 0 1 2TGIF1
-1 -0.5 0 0.5 -0.5 0 0.5 1 -0.5 0 0.5 -0.5 0 0.5 -1 -0.5 0 0.5 1BWH
-1 0 1 2 0 2 -2 -1 0 1 -1 0 1 -2 -1 0 1 2
TRANSBIG
-2.5 0 2.5 0 2 -2.5 0 2.5 -5 -2.5 0 2.5 -2.5 0 2.5 5
HPAZ-2.5 0 2.5 -2.5 0 2.5 5 -2 0 2 -5 -2.5 0 2.5
EMC2-2.5 0 2.5 5 -0.5 0 0.5 -2 -1 0 1 2 -5 -2.5 0
Note In datasets TRANSBIG, EMC2 and HPAZ (produced on custom diagnostic platforms), few genes werefound for modules other than proliferation (AURKA), and therefore the module score might be unreliable.None can be found for STAT1-module for EMC2 and HPAZ.
8
3.2 Tests of bimodality
Log likelihood ratio between two- and one-component Gaussian mixture models were used as the test statis-tics, (as suggested in McLachlan and Peel [2000] Finite Mixture Modules, Wiley, New York). However, theparametric bootstrap null distribution was generated from uniform distribution instead of unimodal normaldistribution (following Hartigan [1985] “The dip test of unimodality.” Ann. Statist. 13:30–84). This wasdone because the p-values based on unimodal-normal null distribution were too sensitive to non-normal, butstill unimodal, densities.
---------+-------------------------------------------------+
| p-values of bimodality test of module scores |
+---------+---------+---------+---------+---------+
dataset | ESR1 | ERBB2 | AURKA | PLAU | STAT1 |
---------+---------+---------+---------+---------+---------+
NKI | 0.001 * | 0.001 * | 0.560 | 0.990 | 0.068 |
EMC | 0.001 * | 0.001 * | 0.438 | 0.996 | 0.064 |
UPP | 0.002 * | 0.001 * | 0.047 | 0.061 | 0.061 |
STOCK | 0.001 * | 0.001 * | 0.083 | 0.083 | 0.083 |
DUKE | 0.069 | 0.001 * | 0.630 | 0.082 | 0.078 |
UCSF | 0.003 * | 0.001 * | 0.055 | 0.473 | 0.054 |
UNC | 0.005 * | 0.001 * | 0.146 | 0.087 | 0.085 |
NCH | 0.040 | 0.047 | 0.097 | 0.295 | 0.083 |
STNO | 0.689 | 0.001 * | 0.412 | 0.573 | 0.571 |
JRH1 | 0.113 | 0.003 * | 0.724 | 0.804 | 0.731 |
JRH2 | 0.003 * | 0.001 * | 0.119 | 0.531 | 0.010 * |
MGH | 0.064 | 0.076 | 0.929 | 0.965 | 0.070 |
expO | 0.021 | 0.001 * | 0.673 | 0.574 | 0.481 |
TGIF1 | 0.877 | 0.001 * | 0.088 | 0.875 | 0.919 |
BWH | 0.927 | 0.001 * | 0.020 | 0.926 | 0.889 |
TRANSBIG | 0.001 * | 0.001 * | 0.715 | 0.108 | 0.072 |
EMC2 | 0.692 | 0.081 | 0.753 | 0.001 * | nan |
HPAZ | 0.108 | 0.001 * | 0.109 | 0.097 | nan |
---------+---------+---------+---------+---------+---------+
* p <= 0.01
9
4 Three subtypes
4.1 Scatter plots of ERBB2 vs ESR1 module scores
EMC UNC NKI STOCK UPP
1
2
3-1
0
1
2
ER
BB
2-am
plifi
catio
n s
core
-1 0 1estrogen score
1
2
3-1
0
1
2
-1 -0.5 0 0.5 1estrogen score
1
2
3-1
0
1
2
-1.5 -1 -0.5 0 0.5 1estrogen score
1
2
3-1
0
1
2
-1 -0.5 0 0.5 1estrogen score
1
2
3-1
0
1
2
-1 -0.5 0 0.5estrogen score
-1.5
-1
-0.5
0
0.5
1
prol
ifera
tion
sco
re
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
NCH DUKE UCSF STNO JRH1
1
2
3-1
-0.5
0
0.5
1
ER
BB
2-am
plifi
catio
n s
core
-1 -0.5 0 0.5estrogen score
1
2
3
0
1
2
17q-
ampl
ifica
tion
sco
re
-1 -0.5 0 0.5estrogen score
1
2
3-0.5
0
0.5
1
1.5
ER
BB
2-am
plifi
catio
n s
core
-1 -0.5 0 0.5estrogen score
1
2
3-1
0
1
2
-1 0 1estrogen score
1
2
3-0.5
0
0.5
1
ER
BB
2-am
plifi
catio
n s
core
-0.5-0.25 0 0.25 0.5 0.75estrogen score
-0.5
-0.25
0
0.25
0.5
0.75
prol
ifera
tion
sco
re
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
prol
ifera
tion
sco
re
subtype1 2 3
-1
-0.5
0
0.5
1
prol
ifera
tion
sco
re
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.2
0
0.2 pr
olife
ratio
n s
core
subtype1 2 3
10
JRH2 expO TGIF1 BWH
1
2
3-1
0
1
2
ER
BB
2-am
plifi
catio
n s
core
-1 -0.5 0 0.5estrogen score
1
2
3-1
0
1
2
3
ER
BB
2-am
plifi
catio
n s
core
-1 0 1estrogen score
1
2
3-0.5
0
0.5
1
1.5
ER
BB
2-am
plifi
catio
n s
core
-1 -0.5 0 0.5 1estrogen score
1
2
3-1
0
1
2
3
ER
BB
2-am
plifi
catio
n s
core
-1 0 1 2estrogen score
-1
-0.5
0
0.5
1
prol
ifera
tion
sco
re
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
1.5
prol
ifera
tion
sco
re
subtype1 2 3
-0.75
-0.5
-0.25
0
0.25
0.5
prol
ifera
tion
sco
re
subtype1 2 3
-2
-1
0
1
prol
ifera
tion
sco
re
subtype1 2 3 N
MGH TRANSBIG HPAZ EMC2
1
2
3-1
-0.5
0
0.5
1
1.5
ER
BB
2-am
plifi
catio
n s
core
-1 -0.5 0 0.5estrogen score
1
2
3-1
0
1
2
3
ER
BB
2-am
plifi
catio
n s
core
-2.5 0 2.5estrogen score
1
2
3-2.5
0
2.5
5
ER
BB
2-am
plifi
catio
n s
core
-2.5 0 2.5estrogen score
12
3
-1
-0.5
0
0.5
ER
BB
2-am
plifi
catio
n s
core
-2.5 0 2.5 5estrogen score
-0.5
-0.25
0
0.25
0.5
0.75
prol
ifera
tion
sco
re
subtype1 2 3
-3
-2
-1
0
1
2
3
prol
ifera
tion
sco
re
subtype1 2 3
-2
-1
0
1
2
prol
ifera
tion
sco
re
subtype1 2 3
Notes
1. Dataset BWH contains entirely of high-grade tumors according to pathological data. It does not havepatient outcome data and therefore the difficulty in deciding high- or low-proliferation does not affectsurvival analysis.
2. Dataset MGH contains only one ER-negative, 3 ERBB2+ and 3 low-grade tumors (out of 60) accordingto pathological data.
3. Dataset EMC2 does not have enough genes to reliably determine the ER and ERBB2 module scores.The clusters can not be fitted well. In subsequent analysis, the pathological ER-status are used to assignthe subtype (ER+ is considered type 3, ER- is type 1).
11
4.2 Tests of trimodality
Log likelihood ratio between k- and (k−1)-component Gaussian mixture models were used as the test statistics,(as suggested in McLachlan and Peel [2000] Finite Mixture Modules, Wiley, New York). Parametric bootstrapwere used to general the null distribution, using the the parameters from the fitted null model.
---------+-------------------+
| p-values of tests|
| for the number |
| of components |
---------+---------+---------+
dataset | 3 vs 2 | 4 vs 3 |
---------+---------+---------+
NKI | 0.001 * | 0.978 |
EMC | 0.001 * | 0.024 |
UPP | 0.001 * | 0.325 |
STOCK | 0.001 * | 0.014 |
DUKE | 0.002 * | 0.007 * |
UCSF | 0.001 * | 0.012 |
UNC | 0.001 * | 0.054 |
NCH | 0.001 * | 0.589 |
STNO | 0.001 * | 0.026 |
JRH1 | 0.002 * | 0.030 |
JRH2 | 0.002 * | 0.088 |
MGH | 0.932 | 0.671 |
expO | 0.001 * | 0.002 * |
TGIF1 | 0.001 * | 0.268 |
BWH | 0.001 * | 0.025 |
TRANSBIG | 0.001 * | 0.035 |
EMC2 | 0.091 | 0.007 * |
HPAZ | 0.001 * | 0.905 |
---------+---------+---------+
* p <= 0.01
12
5 Additional survival analyses
5.1 Three subtypes
total 6311890 1365 1010 584
type 1 129361 219 172 88type 2 110224 119 87 57type 3 3921305 1027 751 439
type 1
type 2
type 3
0
20
40
60
80
100
rela
pse-
free
sur
viva
l (%
)
group: number at risk: events
follow-up: 0 3 6 9 (year)
total 3061015 821 620 375
type 1 71211 151 125 70type 2 44119 83 60 34type 3 191685 587 435 271
type 1type 2
type 3
0
20
40
60
80
100
met
asta
sis-
free
sur
viva
l (%
)group: number at risk: events
follow-up: 0 3 6 9 (year)
total 4842019 1569 1122 644
type 1 122389 260 189 101type 2 100263 160 99 59type 3 2621367 1149 834 484
type 1type 2
type 3
0
20
40
60
80
100
over
all s
urvi
val (
%)
group: number at risk: events
follow-up: 0 3 6 9 (year)
5.2 Three subtypes combined with high/low proliferation
total 6311890 1365 1010 584
1H 108305 184 147 761L 2156 35 25 122H 100196 99 75 472L 1028 20 12 103H 253654 477 335 2023L 139651 550 416 237
1H
1L2H
2L3H
3L
0
20
40
60
80
100
rela
pse-
free
sur
viva
l (%
)
follow-up: 0 3 6 9 (year)
total 3061015 821 620 375
1H 65192 138 114 651L 619 13 11 52H 38101 69 49 262L 618 14 11 83H 127344 272 188 1213L 64341 315 247 150
1H
1L
2H
2L
3H
3L
0
20
40
60
80
100
met
asta
sis-
free
sur
viva
l (%
)
follow-up: 0 3 6 9 (year)
total 4842019 1569 1122 644
1H 104324 216 162 861L 1865 44 27 152H 94231 138 83 472L 632 22 16 123H 189686 559 385 2293L 73681 590 449 255
1H
1L
2H
2L
3H
3L
0
20
40
60
80
100
over
all s
urvi
val (
%)
follow-up: 0 3 6 9 (year)
5.3 Separate metastasis-free survival and relapse-free survival
Figure 4c,d use metastasis-free survival, if available, or relapse-free survival otherwise. Below are the resultsfor the two types of endpoints separately.
total 3061015 821 620 375
1 71211 151 125 702 44119 83 60 343H 127344 272 188 1213L 64341 315 247 150
123H
3L
0
20
40
60
80
100
met
asta
sis-
free
sur
viva
l (%
)
group: number at risk: events
poor(1+2+3H) vs. good(3L):5-year: 67% vs 89%HR = 2.28 [1.73, 3.01]
follow-up: 0 3 6 9 (year)
DUKEEMCEMC2HPAZJRH1JRH2MGHNKISTNOSTOCKUCSFUNCUPPNCHTRANSBIGTotal
0 25 50 75 100
5-year MFS (%)
poor: 1+2+3H good: 3L
hazard ratio
1 3 10 30
total 6311890 1365 1010 584
1 129361 219 172 882 110224 119 87 573H 253654 477 335 2023L 139651 550 416 237
1
23H
3L
0
20
40
60
80
100
rela
pse-
free
sur
viva
l (%
)
group: number at risk: events
poor(1+2+3H) vs. good(3L):5-year: 64% vs 84%HR = 2.22 [1.84, 2.69]
follow-up: 0 3 6 9 (year)
DUKEEMCEMC2HPAZJRH1JRH2MGHNKISTNOSTOCKUCSFUNCUPPNCHTRANSBIGTotal
0 25 50 75 100
5-year RFS (%)
poor: 1+2+3H good: 3L
hazard ratio
1 3 10 30
13
6 Prognostic value of ER-status within type-2 (ERBB2+) tumors
Because pathological ER status is not available for all patients, here ER-status is based on estrogen scores(using as cutoff the midpoint between the cluster center of type 1 and type 3).
total 110224 119 87 57
2,ER+ 66133 78 56 302,ER- 4491 41 31 27
2,ER+2,ER-
0
20
40
60
80
100
rela
pse-
free
sur
viva
l (%
)
follow-up: 0 3 6 9 (year)
total 44119 83 60 34
2,ER+ 2879 57 41 232,ER- 1640 26 19 11
2,ER+2,ER-
0
20
40
60
80
100
met
asta
sis-
free
sur
viva
l (%
)follow-up: 0 3 6 9 (year)
total 100263 160 99 59
2,ER+ 57152 101 66 362,ER- 43111 59 33 23
2,ER+2,ER-
0
20
40
60
80
100
over
all s
urvi
val (
%)
follow-up: 0 3 6 9 (year)
7 Gene-by-gene survival analysis
7.1 Gene Tables
The top-ranking genes identified by meta-analytical gene-by-gene Cox regression are are listed in the filezsurvival.txt, in a tab-delimited text file. For convenient browsing, open in spreadsheet program (e.g. MSExcel) and format the column widths to fit the contents automatically.
7.2 Agreement of gene associations with different survival endpoints
Here we show that the choice of endpoints (OS: overall survival, MFS: metastasis-free survival, RFS: relapse-free survival) do not substantially affect gene-by-gene survival analysis. The lower Z-scores of MFS is duemainly to smaller sample size (MFS data are available only in 5 datasets, see Figure 1a of the main text).
-10 -5 0 5 10
-10
-5
0
5
10
Z-OS
Z-M
FS
corr = 0.60
-10 -5 0 5 10
-10
-5
0
5
10
Z-OS
Z-R
FS
corr = 0.87
-10 -5 0 5 10
-10
-5
0
5
10
Z-MFS
Z-R
FS
corr = 0.67
14
8 Forest plots of signature performance for all survival endpoints
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
ZCOX
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
CCYC
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
GGI-128
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
CSR
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
p53-32
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
CON-52
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
NCH-70
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
EMC-76
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
NKI-70
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:ONC-16
50 60 70 80 90 1005-year survival (%)of low-risk group
hazard ratio.5 1 2 4 8 16
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
ZCOX
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
CCYC
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
GGI-128
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
CSR
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
p53-32
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
CON-52
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
NCH-70
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
EMC-76
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
NKI-70
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:ONC-16
50 60 70 80 90 1005-year survival (%)of low-risk group
hazard ratio.5 1 2 4 8 16
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
ZCOX
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
CCYC
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
GGI-128
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
CSR
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
p53-32
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
CON-52
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
NCH-70
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
EMC-76
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:
NKI-70
NKI
EMC
NCH
UPP
STOCK
DUKE
UNC
STNO
UCSF
JRH1
JRH2
MGH
TRANSBIG
HPAZ
EMC2
Total for genomic datasets:ONC-16
50 60 70 80 90 1005-year survival (%)of low-risk group
hazard ratio.5 1 2 4 8 16
Relapse-free Survival Metastasis-free Survival Overall Survival
15
9 Concordance in risk classifications
9.1 Tables of pairwise concordance
Whole signatures:
NKI-70 ONC-16 EMC-76 NCH-70 CON-52 p53-32 GGI-128 CSR CCYC ZCOX
NKI-70 * 79.97 69.37 77.54 81.02 76.07 82.18 75.12 81.34 83.66
ONC-16 79.97 * 70.22 82.50 83.87 74.06 86.51 79.02 84.08 86.72
EMC-76 69.37 70.22 * 70.11 73.59 61.04 73.38 72.11 73.27 73.48
NCH-70 77.54 82.50 70.11 * 85.45 67.84 86.29 82.39 83.13 85.66
CON-52 81.02 83.87 73.59 85.45 * 69.11 91.04 82.29 88.82 91.14
p53-32 76.07 74.06 61.04 67.84 69.11 * 71.64 67.21 69.74 72.69
GGI-128 82.18 86.51 73.38 86.29 91.04 71.64 * 83.34 92.62 93.46
CSR 75.12 79.02 72.11 82.39 82.29 67.21 83.34 * 82.81 83.87
CCYC 81.34 84.08 73.27 83.13 88.82 69.74 92.62 82.81 * 91.25
ZCOX 83.66 86.72 73.48 85.66 91.14 72.69 93.46 83.87 91.25 *
Average pairwise concordance: 79.4% (84.5% if EMC-76 and p53-32 are excluded)
Proliferation-gene partial signatures:
ONC-16 NKI-70 EMC-76 NCH-70 CON-52 p53-32 CSR GGI-128 CCYC ZCOX
ONC-16 * 84.71 82.39 85.35 89.46 84.92 84.40 90.72 89.77 90.20
NKI-70 84.71 * 83.24 83.45 88.09 80.50 83.34 88.51 88.72 89.67
EMC-76 82.39 83.24 * 84.61 86.93 78.39 83.13 86.29 86.72 85.56
NCH-70 85.35 83.45 84.61 * 87.45 81.13 87.66 87.56 86.51 87.24
CON-52 89.46 88.09 86.93 87.45 * 83.66 86.72 93.15 91.78 92.30
p53-32 84.92 80.50 78.39 81.13 83.66 * 81.34 85.98 84.29 84.40
CSR 84.40 83.34 83.13 87.66 86.72 81.34 * 86.61 86.19 87.45
GGI-128 90.72 88.51 86.29 87.56 93.15 85.98 86.61 * 94.83 94.52
CCYC 89.77 88.72 86.72 86.51 91.78 84.29 86.19 94.83 * 93.67
ZCOX 90.20 89.67 85.56 87.24 92.30 84.40 87.45 94.52 93.67 *
Average pairwise concordance: 86.7%
Non-proliferation-gene partial signatures:
NKI-70 ONC-16 EMC-76 NCH-70 CON-52 p53-32 GGI-128 CSR CCYC ZCOX
NKI-70 * 67.00 59.30 65.21 62.78 69.43 69.30 65.42 62.05 73.22
ONC-16 67.00 * 58.36 63.73 60.15 68.90 65.29 62.68 59.20 70.48
EMC-76 59.30 58.36 * 59.52 60.46 54.98 61.85 64.47 56.88 57.62
NCH-70 65.21 63.73 59.52 * 66.16 58.78 64.18 65.74 56.04 65.00
CON-52 62.78 60.15 60.46 66.16 * 58.99 61.40 60.46 58.88 62.99
p53-32 69.43 68.90 54.98 58.78 58.99 * 65.07 61.52 58.67 69.74
GGI-128 69.30 65.29 61.85 64.18 61.40 65.07 * 65.74 57.40 69.97
CSR 65.42 62.68 64.47 65.74 60.46 61.52 65.74 * 62.05 67.53
CCYC 62.05 59.20 56.88 56.04 58.88 58.67 57.40 62.05 * 63.42
ZCOX 73.22 70.48 57.62 65.00 62.99 69.74 69.97 67.53 63.42 *
Average pairwise concordance: 63.1%
16
9.2 Combined prediction by pairs of signatures
Every distinct pair of signatures were combined in the following way:
1. Their continuous prediction scores were used as explanatory variables in Cox regression (stratified bydataset).
2. The fitted coefficients were used as weights in linear combination of the two prediction scores to producethe combined prediction score
3. The combined scores were used to rank patients. 33% in each dataset were classified as low-risk.
The results for the 45 pairs are shown below as the black bars to the right of the individual signatures. Theidentity of each combination is not shown (the ordering from left to right is [ONC-16 + NKI-70], [ONC-16 +EMC-76p], and so on), because we only want to demonstrate that their hazard ratios and survivals are similarand within the confidence interval of the some individual signatures. Note that the combined performance maybe slightly biased upward because the weights are estimated from the same data. Even so, the improvementis not clinically substantial.
0 10 20 30 40 50
0.5
1.0
2.0
5.0
Haz
ard
Rat
io
ONC.16NKI.70EMC.76pNCH.70
CON.52p53.32GGI.128CSR
CCYCZCOXcombinations
0 10 20 30 40 50
6070
8090
100
Sur
viva
l at 5
yea
rs
ONC.16NKI.70EMC.76pNCH.70
CON.52p53.32GGI.128CSR
CCYCZCOXcombinations
17
9.3 Patient classifications on proliferation-vs-subtype plots
All vertical axes correspond to proliferation score.
NKI EMC NCH UPP STOCK
ONC-16
-1
-0.5
0
0.5
1
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
NKI-70
-1
-0.5
0
0.5
1
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
EMC-76
-1
-0.5
0
0.5
1
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
NCH-70
-1
-0.5
0
0.5
1
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
CON-52
-1
-0.5
0
0.5
1
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
18
All vertical axes correspond to proliferation score.
NKI EMC NCH UPP STOCK
p53-32
-1
-0.5
0
0.5
1
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
CSR
-1
-0.5
0
0.5
1
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
GGI-128
-1
-0.5
0
0.5
1
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
CCYC
-1
-0.5
0
0.5
1
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
ZCOX
-1
-0.5
0
0.5
1
subtype1 2 3
-1.5
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
-0.5
0
0.5
1
subtype1 2 3
19
All vertical axes correspond to proliferation score.
DUKE UNC STNO UCSF JRH1
ONC-16
-0.5
-0.25
0
0.25
0.5
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.5
-0.25
0
0.25
0.5
subtype1 2 3 N
-0.2
0
0.2
subtype1 2 3
NKI-70
-0.5
-0.25
0
0.25
0.5
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.5
-0.25
0
0.25
0.5
subtype1 2 3 N
-0.2
0
0.2
subtype1 2 3
EMC-76
-0.5
-0.25
0
0.25
0.5
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.5
-0.25
0
0.25
0.5
subtype1 2 3 N
-0.2
0
0.2
subtype1 2 3
NCH-70
-0.5
-0.25
0
0.25
0.5
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.5
-0.25
0
0.25
0.5
subtype1 2 3 N
-0.2
0
0.2
subtype1 2 3
CON-52
-0.5
-0.25
0
0.25
0.5
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.5
-0.25
0
0.25
0.5
subtype1 2 3 N
-0.2
0
0.2
subtype1 2 3
20
All vertical axes correspond to proliferation score.
DUKE UNC STNO UCSF JRH1
p53-32
-0.5
-0.25
0
0.25
0.5
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.5
-0.25
0
0.25
0.5
subtype1 2 3 N
-0.2
0
0.2
subtype1 2 3
CSR
-0.5
-0.25
0
0.25
0.5
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.5
-0.25
0
0.25
0.5
subtype1 2 3 N
-0.2
0
0.2
subtype1 2 3
GGI-128
-0.5
-0.25
0
0.25
0.5
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.5
-0.25
0
0.25
0.5
subtype1 2 3 N
-0.2
0
0.2
subtype1 2 3
CCYC
-0.5
-0.25
0
0.25
0.5
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.5
-0.25
0
0.25
0.5
subtype1 2 3 N
-0.2
0
0.2
subtype1 2 3
ZCOX
-0.5
-0.25
0
0.25
0.5
subtype1 2 3
-1
-0.5
0
0.5
1
subtype1 2 3 N
-1
-0.5
0
0.5
1
subtype1 2 3 N
-0.5
-0.25
0
0.25
0.5
subtype1 2 3 N
-0.2
0
0.2
subtype1 2 3
21
All vertical axes correspond to proliferation score.
JRH2 MGH TRANSBIG HPAZ EMC2
ONC-16
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-3
-2
-1
0
1
2
3
subtype1 2 3
-2
-1
0
1
2
subtype1 2 3
NKI-70
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-3
-2
-1
0
1
2
3
subtype1 2 3
-2
-1
0
1
2
subtype1 2 3
-2
-1
0
1
2
subtype1 2
EMC-76
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-2
-1
0
1
2
subtype1 2
NCH-70
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-3
-2
-1
0
1
2
3
subtype1 2 3
-2
-1
0
1
2
subtype1 2 3
CON-52
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-3
-2
-1
0
1
2
3
subtype1 2 3
-2
-1
0
1
2
subtype1 2 3
-2
-1
0
1
2
subtype1 2
22
All vertical axes correspond to proliferation score.
JRH2 MGH TRANSBIG HPAZ EMC2
p53-32
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
CSR
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-3
-2
-1
0
1
2
3
subtype1 2 3
-2
-1
0
1
2
subtype1 2 3
-2
-1
0
1
2
subtype1 2
GGI-128
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-3
-2
-1
0
1
2
3
subtype1 2 3
-2
-1
0
1
2
subtype1 2 3
-2
-1
0
1
2
subtype1 2
CCYC
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-3
-2
-1
0
1
2
3
subtype1 2 3
-2
-1
0
1
2
subtype1 2 3
-2
-1
0
1
2
subtype1 2
ZCOX
-1
-0.5
0
0.5
1
subtype1 2 3
-0.5
-0.25
0
0.25
0.5
0.75
subtype1 2 3
-3
-2
-1
0
1
2
3
subtype1 2 3
-2
-1
0
1
2
subtype1 2 3
-2
-1
0
1
2
subtype1 2
23
Recommended