Statistical Models - Purdue University · Big Data Training for Translational Omics Research

Big Data Training for Translational Omics Research

Statistical Models

Unit 3 Session 1


Outline

• Logistic Regression – ROC curve and AUC
• Linear Regression
• Kaplan-Meier plot and log-rank test
• Cox proportional hazards model


Logistic Model

• The logistic model is used for case/control studies
• Usage scenario: when the response is binary, say disease/healthy or recurrence/non-recurrence

$$\log\frac{p(\text{status} = \text{recurrence})}{1 - p(\text{status} = \text{recurrence})} = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n$$

• Where $x_i$ are predictors and $\beta_i$ are the parameters of interest
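To make the log-odds formula concrete, here is a minimal sketch on simulated data (the data, seed and coefficient values are assumptions for illustration, not part of the course material). It fits a one-predictor logistic model and checks that applying the inverse logit to the fitted linear predictor reproduces the probabilities returned by `predict()`.

```r
# Simulated stand-in data; none of these values come from the slides.
set.seed(1)
x <- rnorm(200)                         # one predictor, e.g. a gene expression
eta <- -0.5 + 1.5 * x                   # true linear predictor: beta0 + beta1 * x
status <- rbinom(200, 1, plogis(eta))   # 1 = recurrence, 0 = non-recurrence

fit <- glm(status ~ x, family = binomial(link = "logit"))
beta <- coef(fit)

# The fitted log-odds are linear in x ...
log_odds <- beta[1] + beta[2] * x
# ... and the inverse logit maps them back to probabilities.
p_manual <- 1 / (1 + exp(-log_odds))
p_glm <- predict(fit, type = "response")

same_probabilities <- isTRUE(all.equal(unname(p_manual), unname(p_glm)))
print(same_probabilities)
```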


Linear Model

• Response: continuous, say weight or gene expression
• Predictors: any variables (say gene expression)
• Model: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \epsilon$
• Assumptions: error term $\epsilon \sim \text{i.i.d. } N(0, \sigma^2)$
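The linear model and its error assumption can be illustrated with a small simulation. This is only a sketch: the sample size, true coefficients and error standard deviation below are made-up values. It compares the `lm()` estimates with the closed-form least-squares solution.

```r
# Simulated data under the model y = beta0 + beta1 * x + eps (values are assumptions).
set.seed(2)
n <- 100
x <- rnorm(n)
eps <- rnorm(n, mean = 0, sd = 0.5)     # i.i.d. N(0, sigma^2) with sigma = 0.5
y <- 1 + 2 * x + eps

fit <- lm(y ~ x)
print(coef(fit))                        # estimates of beta0 and beta1
sigma_hat <- summary(fit)$sigma         # residual standard error, estimates sigma

# Closed-form least squares (X'X)^{-1} X'y for comparison.
X <- cbind(1, x)
beta_ols <- solve(t(X) %*% X, t(X) %*% y)
```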


Survival Methods

• Kaplan-Meier plot: visually checking the survival curve between groups
• Cox proportional hazards model and log-rank test as formal statistical tests
• Response: survival time (say DFS) and censor
• Predictors: any variables (say group or specific genes)
• Recurrence: censor = 1, and non-recurrence: censor = 0
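As a small illustration of the censor coding, the sketch below builds a survival response with `Surv()` from the `survival` package and computes the Kaplan-Meier estimate by hand-checkable numbers. The five observations are hypothetical.

```r
library(survival)

# Hypothetical follow-up times; censor = 1 means recurrence was observed,
# censor = 0 means the patient was censored (no recurrence seen).
time <- c(5, 8, 12, 20, 24)
censor <- c(1, 0, 1, 0, 0)

print(Surv(time, censor))               # censored times are printed with a "+"

# Kaplan-Meier estimate for a single group.
km <- survfit(Surv(time, censor) ~ 1)
km_surv <- summary(km)$surv             # survival probability at each event time
print(km_surv)
# At t = 5: 4 of 5 at risk survive (0.8); at t = 12: 2 of 3 at risk survive.
```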


Load data

• Toy example data
toy_data <- read.csv("toy_example_data.csv")


Logistic Model

• Response: recurrence/non-recurrence status
• Predictor: the expression of gene HOXB13

# logistic regression: use gene HOXB13 to predict the recur/non-recur status
fitlogistic <- glm(status ~ gene_HOXB13, data = toy_data, family = binomial(link = "logit"))
summary(fitlogistic)

# plot ROC curve (prediction() and performance() come from the ROCR package)
library(ROCR)
p <- predict(fitlogistic, type = "response")
pr <- prediction(p, toy_data$status)
prf <- performance(pr, measure = "tpr", x.measure = "fpr")
plot(prf, main = "ROC plot of logistic regression")

# calculate the auc
auc <- performance(pr, measure = "auc")
auc <- auc@y.values[[1]]
auc
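For intuition about what the AUC measures, it can also be computed without any package as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The helper `auc_rank()` below is a hypothetical name introduced for this sketch, and the four scores are made up; it is not the ROCR implementation.

```r
# Rank-based AUC (Mann-Whitney form); auc_rank is a hypothetical helper.
auc_rank <- function(score, label) {
  r <- rank(score)                      # average ranks, so ties count as 1/2
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

score <- c(0.1, 0.4, 0.35, 0.8)         # predicted probabilities (made up)
label <- c(0, 0, 1, 1)                  # 1 = recurrence, 0 = non-recurrence

auc_value <- auc_rank(score, label)
print(auc_value)                        # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```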


Logistic Regression Result


ROC Curve


Linear Model

• Response: expression of HOXB13
• Predictor: expression of IL17BR

# linear model: use gene IL17BR to predict another gene HOXB13
fitlm <- lm(gene_HOXB13 ~ gene_IL17BR, data = toy_data)
summary(fitlm)


Linear Regression Result


Kaplan-Meier Plot

• We use the Kaplan-Meier plot and the log-rank test to check whether the survival time differs significantly between groups (say high/low ratio group)

# Surv() and survfit() come from the survival package
ratiosurv <- survfit(Surv(time, censor) ~ ratio_group, data = toy_data)
autoplot(ratiosurv, pVal = T, pX = 0.25, pY = 0.25, title = paste0("Kaplan-Meier plot of toy example "), yLab = "Survival Probability")
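The log-rank test behind the p-value on a Kaplan-Meier plot can also be run directly with `survdiff()` from the `survival` package. The sketch below uses a small made-up data frame (12 hypothetical patients in two groups) rather than the toy dataset of the course.

```r
library(survival)

# Hypothetical data: 6 patients per ratio group; censor = 1 marks a recurrence.
toy <- data.frame(
  time = c(6, 7, 10, 15, 19, 25, 9, 13, 18, 23, 28, 32),
  censor = c(1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0),
  ratio_group = rep(c("high", "low"), each = 6)
)

# Log-rank test for a difference between the two survival curves.
lr <- survdiff(Surv(time, censor) ~ ratio_group, data = toy)
print(lr)

# The test statistic is chi-squared with (number of groups - 1) degrees of freedom.
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
print(p_value)
```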


Kaplan-Meier Plot


Cox Proportional Hazards Model

• We use the high/low ratio group to predict the survival probability. Here the response is the survival time and the censor information

fitcox <- coxph(Surv(time, censor) ~ group, data = toy_data)

summary(fitcox)
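When reading a Cox model summary, the quantity of interest is usually the hazard ratio, which is the exponentiated coefficient. A minimal sketch on made-up data (12 hypothetical patients, not the course dataset) shows how to extract it together with a confidence interval.

```r
library(survival)

# Hypothetical data: 6 patients per group; censor = 1 marks a recurrence.
toy <- data.frame(
  time = c(6, 7, 10, 15, 19, 25, 9, 13, 18, 23, 28, 32),
  censor = c(1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0),
  group = rep(c("high", "low"), each = 6)
)

fitcox <- coxph(Surv(time, censor) ~ group, data = toy)

# Hazard ratio of the "low" group relative to the reference level "high".
hr <- exp(coef(fitcox))
ci <- exp(confint(fitcox))              # 95% confidence interval on the HR scale
print(hr)
print(ci)
```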


Cox Model Result


Data Downloading, Processing and Analysis


Outline

• Download data
• Parsing data
• Normalization
• Variance based filtering (top 25%)
• T test based filtering (based on the P-value cutoff)

The above steps are implemented in the “get_DEG_table.R” script


Data Availability

• Microdissected dataset: GSE1378
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1378
• Whole tissue dataset: GSE1379
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1379
• The easiest way to download data is using the “getGEO” function from the “GEOquery” package


Use “getGEO” to Download Data

• We have downloaded the data; you can use the “getGEO” function to get data locally or online
• Local (loading_method = ‘local’)
geo_Name <- 'GSE1378'
geodata2 <- getGEO(filename = paste0("geo_data/", geo_Name, "_series_matrix.txt.gz"), GSEMatrix = TRUE)
• Online (loading_method = ‘online’)
geodata <- getGEO(geo_Name, GSEMatrix = TRUE, destdir = "geo_data")
• You can set the loading_method variable in the get_DEG_table function to “local” or “online” to change the way of downloading data
• Note that the downloaded geno matrix is in log2 scale


Parsing Data

• Extract the geno matrix, pheno table and feature table
idx <- 1
geno <- assayData(geodata[[idx]])$exprs
pheno <- pData(phenoData(geodata[[idx]]))
feature <- as(featureData(geodata[[idx]]), "data.frame")

• Parsing the phenotype table to get the variables Age, Size, DFS, censor
infos_df$Age = as.numeric(unlist(strsplit(infos_df$X9, split = "="))[seq(2, 2 * n, 2)])
infos_df$Size = as.numeric(unlist(strsplit(infos_df$X3, split = "="))[seq(2, 2 * n, 2)])
infos_df$DFS = as.numeric(unlist(strsplit(infos_df$X10, split = "="))[seq(2, 2 * n, 2)])
infos_df$censor = ifelse(infos_df$status == "Status=recur", 1, 0)
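The parsing idiom above relies on each phenotype entry having the form "key=value": splitting n such strings on "=" yields 2n pieces, and the even positions hold the values. A small sketch with hypothetical entries (the strings below are invented, not taken from GSE1378) shows the mechanics and an equivalent one-liner.

```r
# Hypothetical phenotype strings in "key=value" form.
pheno_strings <- c("Age=55", "Age=61", "Age=47")
n <- length(pheno_strings)

# Split on "=", flatten, and keep every second element (the values).
parts <- unlist(strsplit(pheno_strings, split = "="))
age <- as.numeric(parts[seq(2, 2 * n, 2)])
print(age)

# Equivalent alternative: strip everything up to and including the first "=".
age_alt <- as.numeric(sub("^[^=]*=", "", pheno_strings))
```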


Normalization

• Gene wise normalization (subtract the median log2 value)
# median of each column of geno, then subtract it from that column
tmp_gm <- apply(geno, 2, median)
geno <- geno - matrix(rep(1, numOfGene), numOfGene, 1) %*% matrix(tmp_gm, 1, n)

• Sample wise normalization (divided by mean value in original scale)
# back to the original scale, divide each row by its mean, return to log2
geno <- apply(geno, c(1, 2), function(x) 2 ^ x)
geno <- t(apply(geno, 1, function(x) x / (mean(x))))
geno <- apply(geno, c(1, 2), function(x) log2(x))
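The two normalization steps can be checked on a small random matrix. The sketch below assumes, as in the rest of the pipeline, that rows of `geno` are genes and columns are samples; the matrix itself is simulated. It also shows that `sweep()` and `rowMeans()` give the same result as the matrix-product and `apply()` forms.

```r
# Simulated log2 expression matrix: 5 genes (rows) by 4 samples (columns).
set.seed(3)
numOfGene <- 5
n <- 4
geno <- matrix(rnorm(numOfGene * n), numOfGene, n)

# Step 1: subtract the column medians, written two equivalent ways.
tmp_gm <- apply(geno, 2, median)
centered_outer <- geno - matrix(rep(1, numOfGene), numOfGene, 1) %*% matrix(tmp_gm, 1, n)
centered_sweep <- sweep(geno, 2, tmp_gm, "-")
col_medians <- apply(centered_sweep, 2, median)   # all zero after centering

# Step 2: divide each row by its mean in the original scale, then go back to log2.
lin <- 2 ^ centered_sweep
scaled <- log2(lin / rowMeans(lin))
row_means_lin <- rowMeans(2 ^ scaled)             # all one after scaling
```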


Variance Based Filtering

• Calculate the variance for each gene and choose the top 25%

# variance based filtering (75th percentile)
var_geno <- apply(geno, 1, var)
var_filtered_idx <- var_geno > quantile(var_geno, 0.75)
feature_var_filtered <- feature[var_filtered_idx, ]
geno_var_filtered <- geno[var_filtered_idx, ]
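A quick sanity check of the variance filter on simulated data: with 40 genes and no tied variances, keeping the genes above the 75th percentile should retain exactly 10 of them. The matrix and gene names below are made up for the sketch.

```r
# Simulated matrix: 40 genes (rows) by 6 samples (columns).
set.seed(4)
geno <- matrix(rnorm(40 * 6), nrow = 40)
rownames(geno) <- paste0("gene", 1:40)

var_geno <- apply(geno, 1, var)                        # one variance per gene
var_filtered_idx <- var_geno > quantile(var_geno, 0.75)

# drop = FALSE keeps the result a matrix even if only one gene survives.
geno_var_filtered <- geno[var_filtered_idx, , drop = FALSE]
print(nrow(geno_var_filtered))                         # top 25% of 40 genes
```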


T test Based Filtering

• For each gene, do a T test between the recurrence and non-recurrence groups. The status variable indicates the group information
tmp_test <- t.test(gene_express ~ status, data = sdata, alternative = "two.sided")
pvalue_list[i] <- tmp_test$p.value

• Filtering the genes by the P-value cutoff
ttest_filtered_idx <- which(pvalue_list < cutoff)
feature_ttest_filtered <- feature_var_filtered[ttest_filtered_idx, ]
geno_ttest_filtered <- geno_var_filtered[ttest_filtered_idx, ]
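The slide shows only the body of the per-gene loop. A minimal sketch of the full loop is given here on simulated data; the matrix, the group labels, the planted difference in gene 1 and the cutoff value are all assumptions for illustration, and the actual script may organise this differently.

```r
# Simulated data: 20 genes by 12 samples, 6 samples per status group.
set.seed(5)
n_gene <- 20
n_sample <- 12
geno <- matrix(rnorm(n_gene * n_sample), n_gene, n_sample)
status <- factor(rep(c("recur", "non-recur"), each = 6))

# Plant a large difference in gene 1 so that it should pass the filter.
geno[1, status == "recur"] <- geno[1, status == "recur"] + 10

pvalue_list <- numeric(n_gene)
for (i in seq_len(n_gene)) {
  sdata <- data.frame(gene_express = geno[i, ], status = status)
  tmp_test <- t.test(gene_express ~ status, data = sdata, alternative = "two.sided")
  pvalue_list[i] <- tmp_test$p.value
}

cutoff <- 0.001                                        # assumed cutoff for this sketch
ttest_filtered_idx <- which(pvalue_list < cutoff)
print(ttest_filtered_idx)
```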


Sample Results

(GSE1378, microdissected, cutoff 0.0011)


Sample Results

(GSE1379, whole tissue dataset, cutoff 0.0011)


Statistical Modeling (examples)


Outline

• Select overlapped genes between GSE1378 and GSE1379 for subsequent analysis
• Heatmap and Dendrogram
• Univariate logistic regression for selected genes and two-gene ratio predictor
• Multivariate logistic regression (size and the other two potential predictors)
• Survival analysis part 1: Kaplan-Meier plot
• Survival analysis part 2: Cox proportional hazards model


Overlapped Genes

• In the preprocessing step we obtained two DEG tables for the datasets GSE1378 and GSE1379
• We used the overlapped genes in these two DEG tables for the subsequent analysis
• GSE1378: Micro-dissected breast cancer cells (LCM)
• GSE1379: Whole tissue section
• The overlapped genes are HOXB13 (identified twice as AI208111 and BC007092), IL17BR (AF208111) and AI240933 (EST)
• We will study the prognostic value of these markers


Heatmap and Dendrogram

• We use the heatmap and dendrogram to visually check the relationship (correlation) among genes or samples
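The dendrogram drawn beside a heatmap comes from hierarchical clustering. A minimal sketch with base R follows; the four genes are simulated, with `g3` constructed to track `g1` closely, and the choice of 1 - correlation as distance with average linkage is an assumption rather than the setting used for the course figures.

```r
# Simulated expression: 4 genes (rows) by 10 samples; g3 is built to follow g1.
set.seed(6)
geno_small <- matrix(rnorm(4 * 10), nrow = 4,
                     dimnames = list(c("g1", "g2", "g3", "g4"), NULL))
geno_small["g3", ] <- geno_small["g1", ] + rnorm(10, sd = 0.1)

# Correlation-based distance between genes, then hierarchical clustering.
d <- as.dist(1 - cor(t(geno_small)))
hc <- hclust(d, method = "average")

# The first merge joins the two most correlated genes.
first_merge <- abs(hc$merge[1, ])
print(rownames(geno_small)[first_merge])

# plot(hc) draws the dendrogram; heatmap(geno_small) draws a heatmap with
# dendrograms on both margins.
```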


Heatmap (microdissected, GSE1378)

consistent with the paper


Heatmap (whole section tissue, GSE1379)


Model Set 1

• Univariate logistic regression for each gene
– Response variable: recur/non-recur status
– Predictors: one of the overlapped genes HOXB13, IL17BR (AF208111), AI240933 (EST)


Model Set 2

• Univariate logistic regression for ratio of genes
– Response variable: recur/non-recur status
– Predictors: HOXB13/IL17BR


Model Set 3

• Multivariate logistic regression
– Response variable: recur/non-recur
– Predictors: tumor size, HOXB13/IL17BR, PGR and ERBB2


Model Set 4

• Survival model
– Response variable: DFS (disease free survival time), censor
– Predictor: use “-intercept/beta” from logistic regression as the cutoff to divide the samples into two groups: high ratio group and low ratio group
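The reason "-intercept/beta" works as a cutoff: in a one-predictor logistic model the fitted probability equals 0.5 exactly where the linear predictor beta0 + beta1 * x is zero, that is at x = -beta0/beta1. The sketch below verifies this on simulated data (the ratio values and true coefficients are made up).

```r
# Simulated log2 gene ratio and recurrence status (values are assumptions).
set.seed(7)
ratio <- rnorm(100)
status <- rbinom(100, 1, plogis(0.6 + 1.2 * ratio))

fit <- glm(status ~ ratio, family = binomial(link = "logit"))
b <- coef(fit)

# Cutoff where the fitted probability of recurrence is 0.5.
cutoff <- unname(-b["(Intercept)"] / b["ratio"])
p_at_cutoff <- predict(fit, newdata = data.frame(ratio = cutoff), type = "response")
print(c(cutoff = cutoff, p_at_cutoff = unname(p_at_cutoff)))

# Divide the samples into a high ratio group and a low ratio group.
group <- ifelse(ratio > cutoff, "high", "low")
print(table(group))
```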


Important Note

• Please remember there are two datasets: GSE1378 and GSE1379
• We can fit the same sets of models on these two datasets
• Need to set the working dataset variable
working_dataset = "GSE1378"  # whole tissue section: GSE1379
working_dataset = "GSE1378"  # microdissected breast cancer cells: GSE1378
• Use working dataset GSE1378 as example


Univariate Logistic Regression for Each Gene

• As an example, we check the gene HOXB13
gb_acc = "BC007092"  # HOXB13
geno_selected = geno[which(feature$GB_ACC == gb_acc), ]
logit_data = data.frame(status = infos_df$status, gene = geno_selected)
fit <- glm(status ~ geno_selected, data = logit_data, family = binomial(link = "logit"))
p <- predict(fit, type = "response")
pr <- prediction(p, infos_df$status)
prf <- performance(pr, measure = "tpr", x.measure = "fpr")
plot(prf, main = paste0("ROC plot of gene ", gb_acc))
auc <- performance(pr, measure = "auc")
auc <- auc@y.values[[1]]
auc


Sample Output (gene HOXB13)


ROC (AUC = 0.796, gene HOXB13)


Univariate Logistic Regression (HOXB13/IL17BR)

gb_acc1 = "BC007092"  # HOXB13
gb_acc2 = "AF208111"  # IL17BR
geno_selected1 = geno[which(feature$GB_ACC == gb_acc1), ]
geno_selected2 = geno[which(feature$GB_ACC == gb_acc2), ]

# in the log2 scale, the ratio is the difference
gene_ratio = geno_selected1 - geno_selected2
logit_data = data.frame(status = infos_df$status, gene1 = geno_selected1, gene2 = geno_selected2, ratio = gene_ratio)

# fit the model
fit <- glm(status ~ gene_ratio, data = logit_data, family = binomial(link = "logit"))
summary(fit)
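The statement that the ratio becomes a difference in the log2 scale follows from log2(a / b) = log2(a) - log2(b). A tiny numeric check, with made-up raw intensities for two genes:

```r
# Hypothetical raw (non-log) intensities for two genes in three samples.
gene_a <- c(8, 2, 16)
gene_b <- c(2, 4, 4)

log_ratio_direct <- log2(gene_a / gene_b)        # log2 of the ratio
log_ratio_diff <- log2(gene_a) - log2(gene_b)    # difference of the log2 values
print(log_ratio_diff)
```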


Sample Output (HOXB13/IL17BR)


ROC (AUC = 0.84, HOXB13/IL17BR)


Multivariate Logistic Regression (tumor size, gene ratio, PGR, ERBB2)

gene_name3 = "PGR_3UTR1"  # PGR
gene_name4 = "BF108852"  # ERBB2

# geno_selected1, geno_selected2 and gene_ratio are computed as in the
# univariate ratio model; geno_selected4 is selected for gene_name4
geno_selected3 = geno[which(feature$GeneName == gene_name3), ]

logit_data = data.frame(status = infos_df$status, size = infos_df$Size, gene1 = geno_selected1, gene2 = geno_selected2, ratio = gene_ratio, gene3 = geno_selected3, gene4 = geno_selected4)

# fit the multivariate logistic regression
fit <- glm(status ~ gene_ratio + size + gene3 + gene4, data = logit_data, family = binomial(link = "logit"))
summary(fit)


Sample Output (Multivariate)


ROC (AUC = 0.86, Multivariate)


Kaplan-Meier Plot

(gene ratio high/low group, cutoff = -12)




fitcox <- coxph(Surv(time, censor) ~ group, data = surv_data)

summary(fitcox)


Sample Output (Cox)


Validation: GSE6532

• The link to this dataset:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse6532
• Sample size: 87
• Number of total markers: 54675
• Genes HOXB13, IL17RB and ESTs are included in this dataset
• We use this dataset as validation
• Result: they are not significant on this independent set


Outline

bull Logistic Regressionndash ROC curve and AUC

bull Linear Regression

bull Kaplan-Meier plot and log-rank test

bull Cox Proportional odds model


Logistic Model



log119901 119904119905119886119905119906119904 = 119903119890119888119906119903119903119890119899119888119890

1 minus 119901 119904119905119886119905119906119904 = 119903119890119888119906119903119903119890119899119888119890= β0 + 12057311199091 +⋯+ 120573119899119909119899



Linear Model


gene expression


expression)

bull Model 119910 = β0 + 12057311199091 +⋯+ 120573119899119909119899 + 120598



Survival Methods


between groups


statistical test





Load data



Logistic Model










calculate the auc






ROC Curve


Linear Model




summary(fitlm)




Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Logistic Model



log119901 119904119905119886119905119906119904 = 119903119890119888119906119903119903119890119899119888119890

1 minus 119901 119904119905119886119905119906119904 = 119903119890119888119906119903119903119890119899119888119890= β0 + 12057311199091 +⋯+ 120573119899119909119899



Linear Model


gene expression


expression)

bull Model 119910 = β0 + 12057311199091 +⋯+ 120573119899119909119899 + 120598



Survival Methods


between groups


statistical test





Load data



Logistic Model










calculate the auc






ROC Curve


Linear Model




summary(fitlm)




Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Linear Model


gene expression


expression)

bull Model 119910 = β0 + 12057311199091 +⋯+ 120573119899119909119899 + 120598



Survival Methods


between groups


statistical test





Load data



Logistic Model










calculate the auc






ROC Curve


Linear Model




summary(fitlm)




Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Survival Methods


between groups


statistical test





Load data



Logistic Model










calculate the auc






ROC Curve


Linear Model




summary(fitlm)




Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Load data



Logistic Model










calculate the auc






ROC Curve


Linear Model




summary(fitlm)




Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Logistic Model










calculate the auc






ROC Curve


Linear Model




summary(fitlm)




Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87








ROC Curve


Linear Model




summary(fitlm)




Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






ROC Curve


Linear Model




summary(fitlm)




Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Linear Model




summary(fitlm)




Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87








Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Kaplan-Meier Plot





group)





Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Kaplan-Meier Plot






information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87










information


summary(fitcox)


Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Cox Model Result



and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87







and Analysis


Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Outline

bull Download data

bull Parsing data

bull Normalization






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Data Availability



1378



1379












Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87














Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Parsing Data











Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Normalization




matrix(tmp_gm 1 n)



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87



















twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87











twosided)






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Sample Results



Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Sample Results





Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87








Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87






Outline











Overlapped Genes









AI240933 (EST)













Model Set 1


gene


status



AI240933(EST)


Model Set 2


ratio of genes


status



Model Set 3



recur




Model Set 4

bull Survival model








Important Note






GSE1378




for Each Gene











auc















fit the model


summary(fit)





















summary(fit)






Kaplan-Meier Plot






summary(fitcox)


Sample Output (Cox)


Validation GSE6532



bull Sample size87





