3rd Summer School in Computational Biology
September 10, 2014
Frank Emmert-Streib & Salissou Moutari
Computational Biology and Machine Learning Laboratory
Center for Cancer Research and Cell Biology Queen’s University Belfast, UK
Exercise – Survival Analysis
Homework ~ 1.5 hours
1. Kaplan-Meier Survival Curves
Result: Survival Curve (figure: S(t) plotted against t)
Goal: estimate S(t) from data
• A survival curve shows S(t) as a function of t:
– S(t): survival function (survivor function)
– t: time
S(t) gives the probability that the random variable T is larger than a specified time t, i.e., S(t) = Pr(T > t), where T is the time until the event occurs.
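If every event time were observed, S(t) = Pr(T > t) could be estimated directly as the fraction of subjects whose event time exceeds t. The sketch below assumes fully observed, hypothetical event times (not the leukemia data); the function name `empirical_survival` is illustrative only.

```python
def empirical_survival(event_times, t):
    """Estimate S(t) = Pr(T > t) as the fraction of event times larger than t.

    Assumes every event time is observed, i.e., there is no censoring.
    """
    n_above = sum(1 for T in event_times if T > t)
    return n_above / len(event_times)


# Hypothetical, fully observed event times (e.g., in weeks)
event_times = [5, 8, 12, 23, 30]
for t in [0, 5, 10, 25, 30]:
    print(f"S({t}) = {empirical_survival(event_times, t):.1f}")
```

Here S(10) = 3/5 = 0.6 because three of the five times exceed 10. With censored observations this simple fraction no longer works: a censored time only tells us that T is larger than that time, not what T is.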
Problem: censoring
Small example: Leukemia
Acute Myelogenous Leukemia (AML), only 5 patients
(Table: survival time, censoring status and chemotherapy for each patient; the chemotherapy information is used later.)
Small example: Leukemia
(Table: for each time, the number at risk and the number of events; each observation is marked either as an event or as censored.)
???
Kaplan-Meier estimator for S(t)
• Estimator:
$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$
n_i: number of subjects at risk at time t_i
d_i: number of events at time t_i
Kaplan & Meier 1958
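The product formula can be sketched in a few lines of Python. This is a minimal illustration, not the R code used in the course; the function names `kaplan_meier` and `survival_at` and the toy data are hypothetical. It assumes the usual convention that a subject censored at an event time t_i is still counted in the risk set n_i.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t).

    times  : observed times (time of event or time of censoring)
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (t_i, n_i, d_i, S(t_i)) for each distinct event time t_i.
    """
    data = list(zip(times, events))
    event_times = sorted({t for t, e in data if e == 1})
    s = 1.0
    table = []
    for t_i in event_times:
        # n_i: subjects still at risk just before t_i (censored at t_i still count)
        n_i = sum(1 for t, _ in data if t >= t_i)
        # d_i: events observed exactly at t_i
        d_i = sum(1 for t, e in data if t == t_i and e == 1)
        s *= 1.0 - d_i / n_i
        table.append((t_i, n_i, d_i, s))
    return table


def survival_at(table, t):
    """Evaluate the KM step function: product over all event times t_i <= t."""
    s = 1.0
    for t_i, _, _, s_i in table:
        if t_i <= t:
            s = s_i
    return s


# Hypothetical toy data: 5 patients, one censored at t = 13
times = [9, 13, 13, 18, 23]
events = [1, 1, 0, 1, 1]
for t_i, n_i, d_i, s in kaplan_meier(times, events):
    print(f"t={t_i:>3}  n={n_i}  d={d_i}  S={s:.3f}")
```

In this toy example the patient censored at t = 13 leaves the risk set without lowering S, so at t = 18 only n = 2 subjects remain at risk and S drops from 0.600 to 0.300.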
Check S(t) till t
Censored: last time seen, still alive at that time
Full data set: Leukemia
23 patients
R code
2. Comparing Survival Curves
Reasons for comparing survival curves (SC)
• Treatment vs. no treatment:
– Compare an SC for patients who have been treated with a certain medication with the SC for patients who have not been treated.
– Result: Does the treatment have an effect on the survival of the patients?
Reasons for comparing survival curves
• Chemotherapy vs. no chemotherapy:
– Compare an SC for patients who had chemotherapy with the SC for patients who have not had chemotherapy.
– Result: Does the chemotherapy have an effect on the survival of the patients?
Survival analysis has great practical relevance.
Data: Leukemia
11 patients with chemotherapy, 12 patients without
Goal: compare the two SCs statistically
(Figure: survival curves for Group 1 and Group 2)
R code
Log-rank test (Mantel-Haenszel)
• Hypotheses:
Null hypothesis H0: No difference in survival between group 1 and group 2.
Alternative hypothesis H1: Difference in survival between group 1 and group 2.
Mantel and Haenszel 1959
Idea of the test
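• For each event time t, estimate the expected number of events for group 1 and group 2:
$$e_{it} = \frac{n_{it}}{n_{1t} + n_{2t}} \, (m_{1t} + m_{2t}), \qquad i = 1, 2$$
n_{it}: number at risk at t in group i
m_{it}: number of events at t in group i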
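For a single event time, the expected counts just split the total number of events m_{1t} + m_{2t} in proportion to the two risk sets. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def expected_events(n1, n2, m1, m2):
    """Expected numbers of events (e1, e2) at one event time t under H0.

    n1, n2 : numbers at risk at t in group 1 and group 2
    m1, m2 : observed numbers of events at t in group 1 and group 2
    Under H0 the m1 + m2 events are shared in proportion to the risk sets.
    """
    n = n1 + n2
    m = m1 + m2
    e1 = n1 * m / n
    e2 = n2 * m / n
    return e1, e2


# Hypothetical time point: 10 at risk in group 1, 5 in group 2, 3 events in total
m1, m2 = 1, 2
e1, e2 = expected_events(n1=10, n2=5, m1=m1, m2=m2)
print(e1, e2)            # 2.0 1.0
print(m1 - e1, m2 - e2)  # observed minus expected: -1.0 1.0
```

Because e_{1t} + e_{2t} = m_{1t} + m_{2t}, the two deviations m_{it} - e_{it} always cancel at each time point.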
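The e_{it} are obtained assuming H0 is true. Hence, m_{it} - e_{it} is a measure of the deviation of the data from H0.
Summing over all event times t gives
$$O_i - E_i = \sum_t \left(m_{it} - e_{it}\right), \qquad i = 1, 2,$$
where O_i = \sum_t m_{it} is the observed and E_i = \sum_t e_{it} the expected number of events in group i; this yields O_1 - E_1 and O_2 - E_2.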
Wrapping up
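• Test statistic:
$$s = \frac{(O_2 - E_2)^2}{\operatorname{Var}(O_2 - E_2)}, \qquad \operatorname{Var}(O_2 - E_2) = \sum_t \frac{n_{1t}\, n_{2t}\, (m_{1t} + m_{2t})\,(n_{1t} + n_{2t} - m_{1t} - m_{2t})}{(n_{1t} + n_{2t})^2\,(n_{1t} + n_{2t} - 1)}$$
• Sampling distribution: under H0, s approximately follows a chi-square distribution with one degree of freedom.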
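Putting the pieces together, the following is a minimal two-group log-rank sketch in plain Python (no R, no SciPy); the function name `logrank_test` and the toy data are hypothetical. It accumulates O_1, E_1 and the hypergeometric variance over all event times, then uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so P(X > s) = erfc(sqrt(s/2)). Since O_1 + O_2 = E_1 + E_2, using group 1 gives the same s as the group 2 formula. For comparison, R's `survival::survdiff` reports a log-rank chi-square of this kind.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test.

    times*  : observed times (event or censoring) per group
    events* : 1 = event observed, 0 = censored
    Returns (O1, E1, s, p): observed and expected events in group 1,
    the test statistic s and its p-value from a chi-square(1) distribution.
    """
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    O1 = E1 = V = 0.0
    for t in event_times:
        # Risk sets and event counts at time t in each group
        n1 = sum(1 for tt, _, g in data if g == 1 and tt >= t)
        n2 = sum(1 for tt, _, g in data if g == 2 and tt >= t)
        m1 = sum(1 for tt, e, g in data if g == 1 and tt == t and e == 1)
        m2 = sum(1 for tt, e, g in data if g == 2 and tt == t and e == 1)
        n, m = n1 + n2, m1 + m2
        O1 += m1
        E1 += n1 * m / n  # e_1t under H0
        if n > 1:  # hypergeometric variance contribution
            V += n1 * n2 * m * (n - m) / (n * n * (n - 1))

    if V == 0:
        raise ValueError("no informative event times; statistic undefined")
    s = (O1 - E1) ** 2 / V
    # Upper tail of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p = math.erfc(math.sqrt(s / 2))
    return O1, E1, s, p


# Hypothetical toy data (not the leukemia data from the slides)
g1_times, g1_events = [6, 9, 13, 20, 25, 30], [1, 1, 0, 1, 0, 1]
g2_times, g2_events = [2, 4, 5, 8, 12, 15], [1, 1, 1, 1, 0, 1]
O1, E1, s, p = logrank_test(g1_times, g1_events, g2_times, g2_events)
print(f"O1={O1:.0f}  E1={E1:.2f}  s={s:.2f}  p={p:.3f}")
```

The loops recount the risk sets at each time for readability; for large data sets one would instead sort once and update the counts incrementally.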
R code
• Back to our leukemia data set:
Survival Analysis & Biomarkers
What is a biomarker?
NIH Definition of Biomarker
A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.
FDA Definition of Biomarker
Any measurable diagnostic indicator that is used to assess the risk or presence of disease.
These definitions are very broad and do not help in finding practical implementations for a particular disease.
Our “definition”
Remark: We do not want to address all possible problems that can involve biomarkers but focus on a particular application.
Application: Identify a set of genes that can be used for a prognostic analysis.
…that are good!
Definition of ‘prognosis’
A prognosis is a medical term denoting the prediction of how a patient will progress over time.
For instance, a patient with a diagnosed disease can have:
– long-term survival
– short-term survival
• We call this set of genes biomarkers.
• Use the biomarkers to predict the prognostic outcome of a patient, i.e., to classify the patient's survival.
Underlying idea to identify biomarkers
The identification of biomarkers is a composite approach (a procedure) that builds on several other methods.
In the previous example:
1. Survival analysis
2. Differential expression of genes
3. Classification
In the previous example, including an initial clustering step:
1. Clustering
2. Survival analysis
3. Differential expression of genes
4. Classification
Our “definition”
Structured patient groups vs unstructured patient groups
Statistics: Feature selection problem
Underlying idea to identify biomarkers
The identification of biomarkers is a composite approach (a procedure) that builds on several other methods.
The definition of the procedure is part of the experimental design of the whole experiment.
Yes, the experimental design includes the analysis of the data!
Summary & Outlook to Genome and Network Medicine
Almost there!
Schedule
17 lectures
Interdisciplinary summer school
Vision of the VC
Universities require interdisciplinary engagement in their educational and research efforts.
Professor Patrick Johnston, President and Vice-Chancellor (VC) of Queen’s University
A look 5 years ahead
1. Single cell experiments
Experimental measurements of:
– DNA
– Gene expression (mRNA)
– Protein binding
within single cells.
What do other high-throughput data provide information about? Populations of cells.
NGS
Study the heterogeneity of cancer tumors.
PacBio (Pacific Biosciences) SMRT: single-molecule real-time sequencing
2. Personalized Medicine
The idea behind personalized medicine is to customize healthcare using molecular analysis, with medical decisions, practices, etc. tailored to the needs of the individual patient.
From one drug for all to customized treatment.
What does this all mean?
It means first of all more data!
Survey
Please participate in the survey about the summer school to help us improve.
We will send it early next week.
Thank you to everyone for participating!
We hope you enjoyed the summer school.