1 ROBUST CLINICAL PREDICTION
Luigi Salmaso, Associate Professor of Statistics, University of Padova
Research Group for the Bladder Cancer multicentric study: PF. Bassi, C. Brombin, L. Corain, M. Racioppi, L. Salmaso
INTERNATIONAL SYMPOSIUM OF UROLOGY FUT-UROLOGY 2008
Slide 2
2 Topics
- Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY
- Case study: INVASIVE BLADDER CANCER
- Application and results of several statistical methods to the case study
- Robust clinical prediction using the NonParametric Combination of Dependent Permutation Tests (NPC Test)
- Conclusions and practical suggestions
Slide 3
3 Necessary steps for optimal statistical predictions
- Study design
- Collecting data using a Web-based Database
- Study protocol
- . . .
- Robust Statistical Analysis by suitable statistical methods (e.g. nonparametric permutation methods)
- Individual predictions based, e.g., on nomograms or other techniques
Slide 4
4 Some considerations on DATA COLLECTION and STATISTICAL METHODS most frequently used in UROLOGY
- The availability of an electronic database can improve the quality and completeness of the collected data, reducing in particular the number of missing values and the risk of imputation errors.
- Accurately defining the nature of the study (observational or randomized) and its endpoints can lead to a better choice of the sample size and of the subsequent statistical analysis.
Slide 5
5 ELECTRONIC DATABASE: An example
[Figures: WEB-based Database; Variables coding]
Slide 6
6 STATISTICAL ANALYSIS: standard methods and recent advances
- Survival Analysis
- Univariate tests (Student t test, Wilcoxon)
- Multivariate methods (Logistic regression, ...)
- Complex classification methods (Neural Networks, Artificial Intelligence, ...)
- NonParametric Combination of Dependent Permutation Tests (NPC Test)
Slide 7
7 Case study: INVASIVE BLADDER CANCER
Italian multicentric observational study (from Jan 2001 to Dec 2006). Reference: prof. PF. Bassi (Univ. Cattolica, Rome)
Aim of the study: detecting the variables (factors) that best predict the outcome (DEAD or ALIVE) after a BLADDER CANCER DIAGNOSIS
Total sample size: 1,003 subjects
- 469 subjects: DOD (Dead of Disease) and AWD (Alive with Disease, i.e. counted as deaths for the statistical analysis) patients
- 534 subjects: NED (No Evidence of Disease) patients
- Lost patients and DOC (Dead of Other Causes) patients were excluded
Slide 8
8 Case study: INVASIVE BLADDER CANCER
The TNM Classification of Bladder Cancer was used, according to Wittekind & Sobin (2002); the original variables were therefore transformed into ordinal variables. 30 endpoints were considered relevant for the statistical analysis.
In particular, the interest is in evaluating the importance of the endpoints, collected at three phases of the study, in predicting the outcome:
- Phase I (first symptom): patient state of health at the first medical visit
- Phase II (diagnosis): patient condition after bladder cancer diagnosis
- Phase III (surgery): patient state after surgery (histopathological variables were examined)
Slide 9
9 Results of Kaplan-Meier (survival analysis) (artificial
example)
Slide 10
10 Results of univariate tests
Slide 11
11 Results of Logistic Regression
The logistic regression model was applied to the same dataset, but very poor results were obtained (only two significant predictors: TNM Stage at Phases I and II).
The main problems in its application:
- the inability of logistic regression to handle missing values (missing data are present in 522 of the 1,003 subjects);
- the high number of coefficients to be estimated, so that the iterative estimation algorithm does not converge (after 1000 iterations).
Note that when convergence is not achieved for the parameter estimates, the results may be unreliable.
Slide 12
12 Results of Logistic Regression
Slide 13
13 Results of Logistic Regression: Number and % of missing values by variable
Mean number of missing values per variable: 85.9
Mean % of missing values per variable: 9%
Subjects with at least one missing value: 522 (52%)
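The summary on this slide can be reproduced with a few lines of code. The following is a minimal sketch, assuming the data are stored as one dictionary per subject with `None` marking a missing value; the function name `missing_summary`, the variable names, and the toy records are hypothetical and are not taken from the study database. If 85.9 is read as the mean count of missing values per variable (an assumption), then 85.9 / 1,003 is about 8.6%, in line with the reported 9%.

```python
def missing_summary(rows, variables):
    """Summarize missing values (None) per variable and per subject."""
    n = len(rows)
    # Number of subjects with a missing value, for each variable.
    per_variable = {v: sum(1 for r in rows if r.get(v) is None) for v in variables}
    # Average of those counts across variables.
    mean_count = sum(per_variable.values()) / len(variables)
    # Subjects that have at least one missing variable.
    with_missing = sum(1 for r in rows if any(r.get(v) is None for v in variables))
    return {
        "per_variable": per_variable,
        "mean_count": mean_count,
        "mean_percent": 100.0 * mean_count / n,
        "subjects_with_missing": with_missing,
        "subjects_with_missing_percent": 100.0 * with_missing / n,
    }


# Hypothetical toy records (not study data).
toy = [
    {"stage": 2, "grade": None, "age_class": 1},
    {"stage": None, "grade": None, "age_class": 2},
    {"stage": 3, "grade": 1, "age_class": 3},
    {"stage": 1, "grade": 2, "age_class": 1},
]
summary = missing_summary(toy, ["stage", "grade", "age_class"])
```

A complete-case method such as standard logistic regression would keep only the subjects not counted in `subjects_with_missing`, which is why the 52% figure matters.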
Slide 14
14 Robust statistical prediction using NPC Test
PERMUTATION APPROACH FOR HYPOTHESIS TESTING
The multivariate permutation approach for hypothesis testing by NonParametric Combination (NPC) offers the following advantages:
- No need to specify the dependence structure among variables
- Exact solutions
- Powerful tests
- Treatment of missing values (missing completely at random, MCAR, or not completely at random, not-MAR)
It also deals with:
- Stratification
- Multivariate categorical variables
It handles:
- Mixed variables
- Multivariate restricted alternatives
NPC Test implements methods and algorithms presented in several international papers by prof. L. Salmaso and prof. F. Pesarin. L. Salmaso leads an internationally recognised research group in theoretical and applied nonparametric statistics. NPC TEST is a statistical method (and software) that provides researchers with powerful, innovative solutions in the field of hypothesis testing.
Slide 15
15 Robust statistical prediction using NPC Test
FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0
NPC TEST allows us to perform hypothesis testing in the case of:
- Two and C samples with dependent or independent variables
- Two and C samples with repeated measures
- Stratified analysis
NPC TEST also provides:
- Powerful test statistics for the treatment of missing values
- One- or two-tailed tests
Data (including mixed variables):
- categorical
- ordered categorical
- numeric or continuous
- binary
Slide 16
16 Robust statistical prediction using NPC Test
FEATURES OF STATISTICAL SOFTWARE NPC TEST 2.0
Elementary partial test statistics include:
- t statistic
- ANOVA
- difference of means
- test statistics for missing values
- Anderson-Darling
- Cramér-von Mises
- Chi-square
- Modified Chi-square
- Likelihood Ratio
Combining functions for intermediate tests include:
- Fisher
- Liptak
- Tippett
- Direct
An innovation of NPC TEST with respect to existing methods is that it can perform any combination of tests: starting from an appropriate set of elementary tests, it leads to a multivariate or multistrata overall global test through the NPC methodology.
NPC TEST supports the standard functions of statistical software (data import, data manipulation) and produces an effective report that can be easily integrated and customized by means of an efficient text editor.
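To make the idea of combining partial tests concrete, here is a minimal two-sample sketch. It is not the NPC TEST software's implementation. It rests on several assumptions: the partial statistic is the absolute difference of group means, random permutations are used instead of exact enumeration, missing values are not handled, and Fisher's combining function is the one chosen. The function name `npc_two_sample` and the toy data are hypothetical.

```python
import math
import random


def npc_two_sample(x, y, n_perm=300, seed=0):
    """x, y: lists of subjects, each subject a list of k variable values.

    Returns (partial p-values, global p-value from Fisher's combining function).
    """
    rng = random.Random(seed)
    pooled = x + y
    n1, k = len(x), len(x[0])

    def partial_stats(rows):
        # One statistic per variable: |mean of first n1 rows - mean of the rest|.
        a, b = rows[:n1], rows[n1:]
        return [abs(sum(r[j] for r in a) / len(a) - sum(r[j] for r in b) / len(b))
                for j in range(k)]

    # Row 0 holds the observed statistics. The other rows come from permuting
    # whole subjects, so the dependence among variables is preserved.
    table = [partial_stats(pooled)]
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        table.append(partial_stats(shuffled))

    m = len(table)
    # Partial p-value of every row (observed and permuted) for every variable.
    pvals = [[sum(1 for t in table if t[j] >= row[j]) / m for j in range(k)]
             for row in table]
    # Fisher's combining function applied to each row of partial p-values.
    combined = [-2.0 * sum(math.log(p) for p in row) for row in pvals]
    # Global p-value: share of rows at least as extreme as the observed one.
    global_p = sum(1 for c in combined if c >= combined[0]) / m
    return pvals[0], global_p


# Hypothetical toy data: two clearly separated groups, three variables each.
group_a = [[i, i % 3, i % 2] for i in range(8)]
group_b = [[20 + i, 10 + i % 3, 10 + i % 2] for i in range(8)]
partial_p, global_p = npc_two_sample(group_a, group_b)
```

Because the same permutations are used for all variables, the global test needs no explicit model of how the variables depend on each other. Swapping Fisher for Liptak or Tippett only changes the line that computes `combined`.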
Slide 17
17 Robust statistical prediction using NPC Test
Slide 18
18 Robust statistical prediction using NPC Test
After processing the variables and obtaining p-values with NPC methods, we also controlled the familywise error rate (FWE).
The need for multiplicity control arises whenever a problem is structured into two or more experimental hypotheses (Finos and Salmaso, 2006).
To make inference on all the hypotheses defining the multivariate problem, it is necessary to control the probability of erroneously rejecting at least one true univariate (elementary) hypothesis; this is called the multivariate type I error or familywise error rate (FWE) (Marcus et al., 1976).
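As an illustration of FWE control, the sketch below computes Bonferroni-Holm adjusted p-values. This is a swapped-in, simpler procedure: Holm's step-down method is the well-known shortcut of closed testing when each intersection hypothesis is tested with Bonferroni, whereas the analysis in this talk applies closed testing to NPC p-values. The function name `holm_adjust` and the raw p-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Return Bonferroni-Holm adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the hypotheses, from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Step-down multiplier: m for the smallest p-value, then m-1, ...
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity so adjusted p-values never decrease.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for four endpoints (not study results).
raw = [0.01, 0.04, 0.03, 0.005]
adj = holm_adjust(raw)
# adj is approximately [0.03, 0.06, 0.06, 0.02]
significant = [p <= 0.05 for p in adj]
```

Rejecting every hypothesis whose adjusted p-value is at most 0.05 keeps the probability of at least one false rejection at or below 5%.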
Slide 19
19 Robust statistical prediction using NPC Test CLOSED TESTING
GRAPHICAL REPRESENTATION
Slide 20
20 Results of NPC Test
Slide 21
21 Results of NPC Test
Slide 22
22 Results of NPC Test
Slide 23
23 Conclusions and practical suggestions
- The NPC method can offer a significant contribution to successful research in biomedical studies with several endpoints.
- The advantages of NPC Test are connected with its flexibility in handling any type of variable.
- We recommend the use of this methodology whenever the normality assumption is hard to justify, in the presence of missing values, and when the number of variables is higher than the number of subjects.
Slide 24
24 REFERENCES
- Bassi P.F., Pagano F. (2007). Invasive Bladder Cancer. Springer.
- Corain L., Salmaso L. (2007). A critical review and a comparative study on conditional permutation tests for two-way ANOVA. Communications in Statistics - Simulation and Computation, 36, 791-805.
- Finos L., Salmaso L. (2006). Weighted methods controlling the multiplicity when the number of variables is much higher than the number of observations. Journal of Nonparametric Statistics, 18, 245-261.
- Finos L., Salmaso L. (2006). FDR- and FWE-controlling methods using data-driven weights. Journal of Statistical Planning and Inference, 137, 3859-3870.
- Finos L., Salmaso L., Solari A. (2007). Conditional inference under simultaneous stochastic ordering constraints. Journal of Statistical Planning and Inference, 137, 2633-2641.
- Marcus R., Peritz E., Gabriel K.R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.
- Marozzi M., Salmaso L. (2006). Multivariate Bi-Aspect Testing for Two-Sample Location Problem. Communications in Statistics - Theory and Methods, 35, 477-488.
- Salmaso L., Solari A. (2005). Multiple aspect testing for case-control designs. Metrika, 62, 331-340.
- Wittekind C., Sobin L. H. (2002). TNM Classification of Malignant Tumours. UICC, International Union Against Cancer (6th ed.). Wiley-Liss, New York.
http://www.gest.unipd.it/~salmaso/NPC_TEST.htm
Slide 25
25 Results of Neural Networks
We applied a neural network model (Multilayer Perceptron) to the same dataset.
By applying k-fold cross-validation, we obtained a correct classification rate of 75.3% for DOD+AWD and of 60.5% for NED. Using the subset of variables identified by the univariate analysis, we obtained a very similar performance (74.5% and 62.4%).
The main problems of neural networks are:
- Neural networks work as black boxes, hence it is not possible to convert the network structure into a known model structure
- All input fields must be numeric (in this study the variables are not numerical but ordinal categorical)
- Neural networks can suffer from a problem called interference (i.e. forgetting some of what was learned on older data)
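The per-class rates quoted on this slide come from k-fold cross-validation. The sketch below shows how such rates can be computed, under loudly stated assumptions: a nearest-centroid rule stands in for the Multilayer Perceptron (it is not the model used in the study), the features are numeric, and the toy data are invented. All function names are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint folds after a seeded shuffle."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier: the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def per_class_cv_rate(X, y, k=5, seed=0):
    """Correct classification rate per class; each subject is tested once."""
    correct = {label: 0 for label in set(y)}
    total = {label: 0 for label in set(y)}
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            total[y[i]] += 1
            correct[y[i]] += predict(model, X[i]) == y[i]
    return {label: correct[label] / total[label] for label in total}


# Hypothetical, well-separated toy data (not study data).
X = [[i % 3, i % 2] for i in range(10)] + [[10 + i % 3, 10 + i % 2] for i in range(10)]
y = ["NED"] * 10 + ["DOD+AWD"] * 10
rates = per_class_cv_rate(X, y)
```

Reporting the rate separately for each class, as the slide does, shows whether a classifier favours one outcome over the other; a single overall accuracy would hide that.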