AUTOMATED CLASSIFICATION SYSTEM
FOR DETECTION AND PREDICTION OF
OSTEOARTHRITIS IN HUMAN KNEE JOINTS
TOMASZ WOLOSZYNSKI
(MSC)
THIS THESIS IS PRESENTED FOR THE DEGREE OF DOCTOR OF
PHILOSOPHY OF THE UNIVERSITY OF WESTERN AUSTRALIA
SCHOOL OF MECHANICAL AND CHEMICAL ENGINEERING
2011
ABSTRACT
The development of an automated classification system for detection and
prediction of knee osteoarthritis (OA) is of great interest to the medical
community. The system, once developed, would aid or replace human experts in
the assessment of risk and severity of knee OA and other chronic and progressive
joint diseases. Also, the system would provide inexpensive and reliable means
for patient monitoring and diagnosis and hence it would be a valuable tool in the
evaluation of drug treatment effects against knee OA. To date, a few attempts to
develop such a system have been reported in the literature. However, the systems
developed cannot detect the disease in its earliest stage and they are sensitive to
knee imaging conditions. Therefore, there is a growing need for the development
of an accurate and robust system for detection and prediction of knee OA.
This thesis is divided into three parts. The first part presents the development
(Chapter 2) and evaluation (Chapters 2 and 3) of a new method for measuring
distances between trabecular bone (TB) texture regions selected on knee
radiographs. The method developed, called a signature dissimilarity measure
(SDM), quantifies texture roughness and orientation at predefined scales that can
be adjusted to trabecular image sizes at which OA changes are most prominent.
Unlike other methods, the SDM method is invariant to in-plane rotation and to
scale within the predefined range. To evaluate the method developed, data sets of TB texture
images from healthy and OA knees and from knees with non-progressive and
progressive OA were constructed. The results obtained have demonstrated
that the method developed has high classification accuracies in detection and
prediction of progression of knee OA and, when combined with a support vector
machine (SVM) classifier, it outperforms a benchmark method used for knee
classification. The invariance of the SDM method to knee imaging conditions was
also evaluated. Using radiographs of a frozen tibia head and computer-generated
fractal texture images, the method developed was found to be invariant to a
range of exposure, magnification, image size, anisotropy direction, noise, and
blur encountered in routine screening of knee radiographs.
In the second part, a general-purpose classification method that would be more
accurate than single classifiers (e.g. the SVM classifier) was developed and
evaluated on benchmark data sets (Chapter 4). To achieve this, a special hybrid
fusion-selection method, called a measure of competence based on random
classification (MCR), was developed and used with ensembles of homogeneous
and heterogeneous classifiers. The MCR method estimates local (i.e. for each
image) classification accuracies of all classifiers in the ensemble and then selects
classifiers that have better-than-random accuracies. Using the majority voting
rule to combine the classifiers selected, the MCR method achieved the best
performance on 14 benchmark data sets.
The third part of this thesis presents a combination of the SDM and MCR methods
to form a dissimilarity-based multiple classifier (DMC) system (Chapter 5).
To combine the two methods, a special approach was applied to generate
ensembles of classifiers using the distances measured between TB texture images.
Performance of the DMC system in detection of knee OA and prediction of knee
OA progression was investigated and compared against benchmark systems. The
results obtained showed that the system developed is the most accurate system
in discriminating between healthy and OA and between non-progressive and
progressive knees.
In conclusion, the DMC system developed can accurately detect and predict knee
OA. The system could also find applications in other areas of medicine and in
engineering. This includes diagnosis and prognosis of diseases based on analysis
of medical images and machine condition monitoring based on classification of
anisotropic and textured engineering surfaces.
CONTENTS
ABSTRACT
CONTENTS
ACKNOWLEDGEMENTS
JOURNAL PUBLICATIONS AND CONFERENCE PRESENTATIONS ARISING
FROM THIS THESIS
STATEMENT OF CANDIDATE CONTRIBUTION
ABBREVIATIONS
CHAPTER 1. INTRODUCTION
1. Background
2. Thesis objectives
3. Thesis overview
4. References
5. List of figures
CHAPTER 2. A SIGNATURE DISSIMILARITY MEASURE FOR TRABECULAR BONE
TEXTURE IN KNEE RADIOGRAPHS
1. Introduction
2. Method
2.1 Scale-space representation of texture image
2.2 Roughness signature
2.3 Orientation signature
2.4 Signature dissimilarity measure
3. Materials and results
3.1 Brodatz textures
3.2 Fractal textures
3.3 Tibia head
3.4 Healthy and osteoarthritic knees
3.5 Computational times
4. Discussion and conclusion
5. References
6. List of figures
7. List of tables
CHAPTER 3. PREDICTION OF PROGRESSION OF RADIOGRAPHIC KNEE
OSTEOARTHRITIS USING TIBIAL TRABECULAR BONE TEXTURE
1. Introduction
2. Subjects and methods
2.1 Subjects
2.2 Acquisition and grading of knee radiographs
2.3 Definitions
2.4 Trabecular bone image analysis
2.5 Statistics
3. Results
4. Discussion
5. References
6. List of figures
7. List of tables
CHAPTER 4. A MEASURE OF COMPETENCE BASED ON RANDOM
CLASSIFICATION FOR DYNAMIC ENSEMBLE SELECTION
1. Introduction
2. Theoretical framework
2.1 Measure of competence based on random classification (MCR)
2.2 Theoretical justification
3. Methods
3.1 DES-P and DES-KL classification systems
4. Experiments
4.1 Data sets
4.2 MCSs
4.3 Classifier ensembles
5. Results
6. Discussion
7. Conclusion
8. References
9. List of tables
CHAPTER 5. DISSIMILARITY-BASED MULTIPLE CLASSIFIER SYSTEM FOR
TRABECULAR BONE TEXTURE IN KNEE RADIOGRAPHS: DETECTION AND
PREDICTION OF OSTEOARTHRITIS
1. Introduction
2. Materials and methods
2.1 Subjects and radiographs
2.3 DMC system
2.4 Comparison against other benchmark systems
3. Results
4. Discussion
5. List of figures
6. List of tables
7. References
CHAPTER 6. CONCLUSIONS AND FUTURE WORK
1. Summary of main findings and observations
2. General conclusions
3. Future work
3.1 Medicine
3.2 Other areas
4. References
ACKNOWLEDGEMENTS
I would like to express my gratitude and appreciation to those who have made
the completion of this thesis possible.
Firstly, I would like to thank my supervisors: Winthrop Professor Gwidon
Stachowiak for his invaluable help and guidance throughout the entire doctoral
research; Associate Professor Pawel Podsiadlo for providing me with technical
expertise on medical image processing and for his commitment, patience and
support during the preparation of all scientific materials arising from this
thesis; and Professor Marek Kurzynski for stimulating discussions on combining
classifiers.
Secondly, I would like to thank Professor Stefan Lohmander from Lund
University (Sweden) and University of Southern Denmark and Associate
Professor Martin Englund from Lund University and Boston University School
of Medicine (MA, USA) for their collaboration and help in evaluating the method
developed in this thesis in clinical settings.
Thirdly, I would like to acknowledge The University of Western Australia for
providing me with financial support during the time of my PhD study and
the School of Mechanical and Chemical Engineering for providing the necessary
infrastructure and environment to conduct this work.
Finally, I would like to thank my parents for their unconditional support
throughout my postgraduate studies.
JOURNAL PUBLICATIONS AND CONFERENCE
PRESENTATIONS ARISING FROM THIS THESIS
Journal publications
Tomasz Woloszynski, Pawel Podsiadlo, Gwidon W Stachowiak, and Marek
Kurzynski. A signature dissimilarity measure for trabecular bone texture in knee
radiographs. Medical Physics 2010;37:2030–2042 (Chapter 2).
Tomasz Woloszynski, Pawel Podsiadlo, Gwidon W Stachowiak, Marek
Kurzynski, L Stefan Lohmander, and Martin Englund. Prediction of Progression
of Radiographic Knee Osteoarthritis Using Tibial Trabecular Bone Texture.
Arthritis & Rheumatism 2012;64:688–695 (Chapter 3).
Tomasz Woloszynski, Marek Kurzynski, Pawel Podsiadlo, and Gwidon W
Stachowiak. A measure of competence based on random classification for
dynamic ensemble selection. Information Fusion 2012;13:207–212 (Chapter 4).
Tomasz Woloszynski, Pawel Podsiadlo, Gwidon W Stachowiak, and Marek
Kurzynski. Dissimilarity-based multiple classifier system for trabecular bone
texture in knee radiographs: detection and prediction of osteoarthritis. Submitted
to Proceedings of the Institution of Mechanical Engineers, Part H, Journal of
Engineering in Medicine (Chapter 5).
Conference presentations
Tomasz Woloszynski, Pawel Podsiadlo and Gwidon W Stachowiak.
Classification of bone texture for detection of early knee osteoarthritis. Oral
presentation at ASIATRIB 2010 - Tribology Congress in Australia, December
2010, Perth, Australia.
The presentation was awarded a “Young Investigators Award for the Outstanding
Paper.”
Tomasz Woloszynski, Pawel Podsiadlo and Gwidon W Stachowiak. A
multiple classifier bone texture system for prediction of knee osteoarthritis
progression. Oral presentation at International Tribology Conference Hiroshima
2011, October–November 2011, Hiroshima, Japan.
Invited talk
Tomasz Woloszynski, Pawel Podsiadlo and Gwidon W Stachowiak. Bone texture
analysis for detection and prediction of knee osteoarthritis. Oral presentation at
International Forum on “Front-line of Tribology in the Asian Region” in JAST
(Japanese Society of Tribologists) Tribology Conference, May 2011, Tokyo, Japan.
STATEMENT OF CANDIDATE CONTRIBUTION
Tomasz Woloszynski (70%), Pawel Podsiadlo, Gwidon W Stachowiak, and Marek
Kurzynski. A signature dissimilarity measure for trabecular bone texture in knee
radiographs. Medical Physics 2010;37:2030–2042 (Chapter 2).
Tomasz Woloszynski (70%), Pawel Podsiadlo, Gwidon W Stachowiak, Marek
Kurzynski, L Stefan Lohmander, and Martin Englund. Prediction of Progression
of Radiographic Knee Osteoarthritis Using Tibial Trabecular Bone Texture.
Arthritis & Rheumatism 2012;64:688–695 (Chapter 3).
Tomasz Woloszynski (70%), Marek Kurzynski, Pawel Podsiadlo, and Gwidon
W Stachowiak. A measure of competence based on random classification for
dynamic ensemble selection. Information Fusion 2012;13:207–212 (Chapter 4).
Tomasz Woloszynski (70%), Pawel Podsiadlo, Gwidon W Stachowiak, and
Marek Kurzynski. Dissimilarity-based multiple classifier system for trabecular
bone texture in knee radiographs: detection and prediction of osteoarthritis.
Submitted to Proceedings of the Institution of Mechanical Engineers, Part H,
Journal of Engineering in Medicine (Chapter 5).
Candidate signature: . . . . . . . . . . . . . . . . . . . . . . . . . . .Tomasz Woloszynski
Coordinating supervisor signature: . . . . . . . . . . . . . . . . . . . . . . . . . . .Professor Gwidon W. Stachowiak
ABBREVIATIONS
AUC area under receiver operating characteristic curve
AP anteroposterior
BMI body mass index
BMD bone mineral density
BPNN backpropagation neural network
CI confidence interval
DCS dynamic classifier selection
DCS-LA DCS local accuracy
DCS-MCB DCS multiple classifier behaviour
DCS-MLA DCS modified local accuracy
DES dynamic ensemble selection
DES-KE DES KNORA-Eliminate
DES-KL DES Kullback-Leibler
DES-P DES performance
DMC dissimilarity-based multiple classifier
EMD earth mover’s distance
EP ensemble pruning
FD fractal dimension
FSA fractal signature analysis
JSN joint space narrowing
k-NN k nearest neighbours
KL scale Kellgren and Lawrence scale
KL divergence Kullback-Leibler divergence
LBP local binary patterns
LDC linear discriminant classifier
LKC Ludmila Kuncheva collection
MCR measure of competence based on random classification
MCS multiple classifier system
MTF modulation transfer function
MRI magnetic resonance imaging
MV majority voting
NMC nearest mean classifier
NN nearest neighbour
OA osteoarthritis
OARSI Osteoarthritis Research Society International
PCA principal component analysis
PGD principal gradient direction
QDC quadratic discriminant classifier
RBF radial basis function
ROC receiver operating characteristic
ROI region of interest
SB single best
SD standard deviation
SDM signature dissimilarity measure
SVM support vector machine
TB trabecular bone
UCI University of California, Irvine machine learning repository
WND-CHARM weighted neighbor distance using compound hierarchy of
algorithms representing morphology
CHAPTER 1
INTRODUCTION
This thesis is arranged as a series of four journal papers. Papers 1, 2 and
3 have been published, while paper 4 has been submitted for publication.
The papers represent the development and progression of ideas that led to the
completion of this thesis.
1. Background
An automated classification system for detection and prediction of knee
osteoarthritis (OA) can be defined as a method used to assign a knee to one
of predefined classes. The assignment is based on computer-aided assessment
of OA changes in knee images and the classes are defined according to the
disease grading. Although a number of methods for knee classification have
been reported in the literature, there are currently no accurate methods that could
detect and predict the disease and that are invariant to knee imaging conditions.
Therefore, an accurate and robust system for detection and prediction of knee OA
is required.
Methods used for OA assessment can be divided into two groups: statistical
and classification/regression methods. The first group includes methods that
calculate statistical parameters from knee images using histomorphometric
analysis [1, 2], fractal signature analysis (FSA) [3–6], Hurst orientation
transform [7] and variance orientation transform [8]. Although some of the
methods provide a detailed description of the images, their application to detection
and prediction of knee OA is not simple. Also, the methods are sensitive to noise,
magnification, projection angle, and angular space quantisation [7, 8].
The second group includes methods in which detection and prediction of knee
OA are formulated as classification/regression problems. To date, to the best of
the author’s knowledge, two such methods have been developed: a weighted
neighbor distance using a compound hierarchy of algorithms representing
morphology (WND-CHARM) image classification system [9] and a regression
model that calculates shape parameters based on FSA [10]. The WND-CHARM
system extracts a large number of texture features from knee radiographs and
uses the features with the nearest neighbour classifier. The regression model
calculates horizontal and vertical shape parameters from trabecular bone (TB)
texture regions in knee radiographs and uses the parameters with a generalised
linear model. However, applications of the system and the model are rather
limited. This is because features extracted in the WND-CHARM system are
redundant, require extensive computation time and have little or no physical
interpretation [11], and shape parameters calculated in the regression model
describe TB texture only in the horizontal and vertical directions and the
box-counting technique used is highly sensitive to image signal-to-noise ratio
and trabecular marrow pore size [4]. Also, the nearest neighbour classifier and
the generalised linear model used are sensitive to outliers and cannot produce
nonlinear decision boundaries. Therefore, a new system that avoids problems
associated with the use of image features, that is invariant to imaging conditions
and that overcomes limitations of single classifiers and models needs to be
developed and evaluated in detection and prediction of knee OA. This issue is
addressed in this thesis.
The system developed could also find applications in other areas. In medicine,
this includes classification of breast lesions from ultrasound images [12],
diagnosis of interstitial lung disease based on chest radiography [13, 14] and
assessment of dermoscopic images for skin lesions [15]. In engineering, the
system could be used for classification of anisotropic/textured surfaces of wear
particles [16, 17], e.g. in a machine condition monitoring tool.
Thus, this thesis aims at the development and evaluation of an automated
classification system for detection and prediction of knee OA which could also be
useful in other areas of medicine and in engineering. Although several attempts
to develop such a system have been reported in the literature, there has been little
work conducted on theoretical foundations of the systems developed and on the
effects of imaging conditions on their performance.
2. Thesis objectives
The following thesis objectives were formulated:
I. Development of an automated system for detection and prediction of knee
OA, which includes
• Development of a method for measuring distances between TB texture
images,
• Evaluation of the method under varying imaging conditions and in TB
texture classification,
• Development of a classification method based on classifier ensembles,
• Evaluation of the classification method on benchmark data sets.
II. Evaluation of the system developed on TB texture images, which includes
• Detection of knee OA,
• Prediction of knee OA progression.
3. Thesis overview
An overview of the thesis is illustrated in Fig. 1.
Chapter 2 (paper 1): A signature dissimilarity measure for
trabecular bone texture in knee radiographs
In this chapter, a new method for measuring distances between TB texture
regions selected on knee radiographs was developed and evaluated. The method
developed, called a signature dissimilarity measure (SDM), quantifies roughness
and orientation of the bone texture for a predefined range of scales. The ability to
quantify texture roughness and orientation at predefined scales is important for
the assessment of biological and engineering surfaces since most of them are
multiscale and anisotropic in nature. The method developed is also invariant to
in-plane rotation and scale within the predefined range. In contrast, methods
used so far are sensitive to image rotation and scale [9] or describe the bone
texture only in the horizontal and vertical directions [10].
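The idea of rotation invariance can be illustrated with a minimal sketch. It is not the SDM method itself: it assumes, purely for illustration, that a texture is summarised at each scale by an orientation histogram, that an in-plane rotation appears as a circular shift of that histogram, and that an L1 difference is an adequate bin-wise distance. The function names are hypothetical.

```python
def circular_shift(hist, k):
    """Shift a histogram by k bins; models an in-plane rotation of the image."""
    n = len(hist)
    k %= n
    return hist[n - k:] + hist[:n - k]


def l1(a, b):
    """Bin-wise absolute difference between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rotation_invariant_distance(sig1, sig2):
    """Distance between two multiscale signatures.

    Each signature is a list of orientation histograms, one per scale.
    A rotation shifts every scale by the same number of bins, so the
    minimum is taken over one common shift applied to all scales.
    """
    n_bins = len(sig1[0])
    return min(
        sum(l1(a, circular_shift(b, k)) for a, b in zip(sig1, sig2))
        for k in range(n_bins)
    )


# Hypothetical signatures with two scales and four orientation bins each.
texture = [[0.5, 0.3, 0.1, 0.1], [0.4, 0.4, 0.1, 0.1]]
rotated = [circular_shift(h, 1) for h in texture]
other = [[0.1, 0.1, 0.1, 0.7], [0.25, 0.25, 0.25, 0.25]]

d_same = rotation_invariant_distance(texture, rotated)
d_other = rotation_invariant_distance(texture, other)
```

Under these assumptions a rotated copy of a texture lies at zero distance from the original, while a different texture does not.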
Changes in TB structure were shown to occur first on the development pathway
to knee OA [18, 19]. This indicates that accurate assessment of the bone structure
could be used not only for detection, but more importantly for prediction of knee
OA, aiding medical experts in diagnosis and prognosis of the disease. To date,
the most popular method used in the assessment of the bone structure has been
plain radiography [10, 20, 21]. This is because it is cheap, non-invasive, widely
accessible and it produces two-dimensional bone texture that is directly related
to the three-dimensional bone structure [22–25]. Therefore, the SDM method was
developed for TB texture images.
The performance of the SDM method in detection of knee OA and in
rotation-invariant texture classification was studied using TB texture images of
healthy and OA knees and images taken from the Brodatz album [26]. The effects
of imaging conditions such as exposure, magnification and projection angle on
the SDM method were investigated using knee radiographs of a frozen tibia head.
In addition, computer-generated fractal texture images were used to evaluate
the invariance of the method to image size, anisotropy direction, noise, and blur.
The results obtained showed that the SDM method combined with the support
vector machine (SVM) classifier outperforms the benchmark WND-CHARM
system in knee OA detection. The performance of the method in
rotation-invariant texture classification was comparable to a benchmark Local
Binary Patterns system [27]. Also, it was found that the SDM method is invariant
to a range of exposure, magnification, image size, anisotropy direction, noise, and
blur encountered in routine screening of knee radiographs.
From the work described in this chapter, it was concluded that the SDM method
can quantify TB texture roughness and orientation in detail and that it is
invariant to a range of imaging conditions. Therefore, the method developed
had potential for detection and prediction of knee OA and it was used in the
subsequent studies.
Chapter 3 (paper 2): Prediction of progression of radiographic knee
osteoarthritis using tibial trabecular bone texture
This chapter describes the evaluation of the SDM method in prediction of
progression of early and late knee OA using tibial TB texture. If the disease
could be predicted, this would indicate that the SDM method can assess OA
changes in TB that occur before progression of joint space narrowing (JSN) and
osteophytes. This work is a part of the current trend of research directed towards
the development of an accurate, low-cost and non-invasive system for prediction
of knee OA [10, 11].
A longitudinal study design with baseline and follow-up examinations four
years apart was used to evaluate the prediction accuracy of the SDM method.
All subjects studied underwent partial meniscectomy about 16 years prior to
the baseline. Knees of all subjects were divided into non-progressive and
progressive groups based on the difference in medial JSN grade between the two
examinations. For each knee, TB regions of interest (ROIs) were selected from
standing anteroposterior digital radiographs taken at baseline.
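The grouping step can be sketched as follows. The text states only that the groups were based on the difference in medial JSN grade between the two examinations; treating any increase in grade as progression is an assumption made here for illustration, and the record type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Knee:
    knee_id: str
    jsn_baseline: int   # medial JSN grade at baseline
    jsn_followup: int   # medial JSN grade at follow-up


def label_progression(knee):
    """Assumed rule: a knee is progressive if its medial JSN grade increased."""
    if knee.jsn_followup > knee.jsn_baseline:
        return "progressive"
    return "non-progressive"


# Hypothetical knees, not data from the study.
knees = [Knee("A", 0, 0), Knee("B", 1, 2), Knee("C", 2, 2)]
labels = {k.knee_id: label_progression(k) for k in knees}
```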
Three texture parameters, i.e. roughness, degree of anisotropy and direction of
anisotropy were calculated for the selected ROIs using the SDM method. The
results obtained for a generalised linear model based on the parameters showed
that the SDM method can successfully discriminate between non-progressive and
progressive knees. In particular, it was shown that high prediction accuracy can
be obtained for knees with early OA at baseline, i.e. for knees that have no or
doubtful radiographic signs of the disease. The results also showed that the SDM
method provides a detailed description of OA changes in the bone texture that
occur before progression of radiographic features such as JSN and osteophytes.
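A generalised linear model of this kind can be sketched with logistic regression, one common choice for a binary outcome. The link function, the fitting procedure and all numbers below are assumptions for illustration; they are not the model or the data used in the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (progressive) if the modelled probability exceeds 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5)


# Hypothetical, normalised texture parameters per knee:
# (roughness, degree of anisotropy, direction of anisotropy)
X = [(0.2, 0.1, 0.3), (0.3, 0.2, 0.1), (0.8, 0.9, 0.7), (0.9, 0.7, 0.8)]
y = [0, 0, 1, 1]   # 0 = non-progressive, 1 = progressive

w, b = fit_logistic(X, y)
predicted = [predict(w, b, x) for x in X]
```

In practice the direction of anisotropy is an angle, so a real model would need to encode it in a way that respects its circular nature; the sketch ignores this.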
In conclusion, the results presented in this chapter demonstrated that the SDM
method can predict loss of tibiofemoral joint space in knees with early and late
OA and that it can describe OA changes in TB texture in detail.
Chapter 4 (paper 3): A measure of competence based on random
classification for dynamic ensemble selection
Although the SDM method produces higher classification accuracies for TB
texture than the benchmark system, the accuracies could be further increased
by combining the method with classifier ensembles instead of single classifiers
(e.g. the SVM classifier) and models. The rationale behind this is that classifier
ensembles overcome most limitations of single classifiers and models and they
showed the best performance for a wide range of classification problems [28–31].
In this chapter, a new method based on classifier ensembles that could be used
with the SDM method was developed and evaluated.
The method developed, called a measure of competence based on random
classification (MCR), first estimates local (i.e. for each image) classification
accuracies of all classifiers in the ensemble and then selects classifiers that have
better-than-random accuracies. The method uses distances between images
instead of image features and therefore it is compatible with the SDM method.
A theoretical result showed that the MCR method improves the performance of
the majority voting rule.
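The select-then-vote idea can be sketched in a few lines. This is a simplified illustration rather than the MCR competence measure: it assumes that local accuracy is estimated as a distance-weighted fraction of correctly classified validation samples, that a random classifier over M classes has accuracy 1/M, and that the whole ensemble is used when no classifier passes the threshold. All names and numbers are hypothetical.

```python
import math
from collections import Counter


def local_accuracy(correct_flags, distances, bandwidth=1.0):
    """Distance-weighted fraction of validation samples classified correctly."""
    weights = [math.exp(-(d / bandwidth) ** 2) for d in distances]
    return sum(w * c for w, c in zip(weights, correct_flags)) / sum(weights)


def select_and_vote(predictions, correct_matrix, distances, n_classes):
    """Select locally better-than-random classifiers, then take a majority vote.

    predictions[i]       label predicted by classifier i for the test sample
    correct_matrix[i][j] 1 if classifier i was correct on validation sample j
    distances[j]         distance from the test sample to validation sample j
    """
    chance = 1.0 / n_classes
    selected = [
        i for i, flags in enumerate(correct_matrix)
        if local_accuracy(flags, distances) > chance
    ]
    if not selected:  # assumed fallback: keep the whole ensemble
        selected = list(range(len(predictions)))
    votes = Counter(predictions[i] for i in selected)
    return votes.most_common(1)[0][0], selected


# Hypothetical two-class example with three classifiers.
distances = [0.1, 0.2, 2.0]
correct_matrix = [[1, 1, 0], [0, 0, 1], [1, 0, 1]]
predictions = ["OA", "healthy", "OA"]

label, selected = select_and_vote(predictions, correct_matrix, distances, n_classes=2)
```

Because only distances between samples enter the estimate, the same scheme can be driven by a dissimilarity measure without any explicit feature vectors.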
The performance of the MCR method was investigated on 14 benchmark data
sets. The results obtained showed that the method developed outperforms other
methods based on classifier ensembles regardless of the ensemble type used
(homogeneous or heterogeneous). The results also showed that the MCR method
gives the best performance for classifier ensembles of different sizes.
From the results described in this chapter, it was clear that the MCR method
can reliably select accurate classifiers from homogeneous and heterogeneous
classifier ensembles and that it can perform well for a wide range of classification
problems.
Chapter 5 (paper 4): Dissimilarity-based multiple classifier system
for trabecular bone texture in knee radiographs: detection and
prediction of osteoarthritis
In this chapter, the SDM and MCR methods developed were combined into a
dissimilarity-based multiple classifier (DMC) system and used for detection and
prediction of knee OA. For this purpose, a special approach was applied to
generate homogeneous and heterogeneous classifier ensembles using distances
measured between TB texture images.
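The approach used in the thesis is not detailed at this point. As one possible illustration of building an ensemble directly from distances, the sketch below assumes a homogeneous ensemble of nearest-neighbour classifiers, each restricted to a random subset of training images (prototypes). The function names and the subset strategy are hypothetical.

```python
import random


def make_nn_member(prototype_idx, labels):
    """Return a classifier that labels a sample by its nearest prototype."""
    def classify(dist_to_train):
        nearest = min(prototype_idx, key=lambda j: dist_to_train[j])
        return labels[nearest]
    return classify


def build_ensemble(labels, n_members, subset_size, seed=0):
    """Create nearest-neighbour members, each with its own random prototypes."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    return [
        make_nn_member(rng.sample(idx, subset_size), labels)
        for _ in range(n_members)
    ]


# Hypothetical training labels and distances from one test image
# to each training image.
labels = ["healthy", "healthy", "OA", "OA"]
dist_to_train = [0.9, 0.8, 0.1, 0.2]

ensemble = build_ensemble(labels, n_members=5, subset_size=3)
member_outputs = [member(dist_to_train) for member in ensemble]
```

A heterogeneous ensemble could be obtained in the same way by mixing different distance-based classifier types.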
The accuracies of the DMC system in detection and prediction of knee OA were
evaluated using TB texture images of healthy and OA knees and knees with
non-progressive and progressive OA. The disease progression was defined as
an increase in the sum of JSN and osteophyte grades between baseline and
follow-up four years later.
The results obtained demonstrated that the DMC system can accurately
discriminate between healthy and OA knees and between knees with
non-progressive and progressive OA. The accuracies obtained were higher
than those of other classification systems, including the SDM method combined
with the SVM classifier.
Concluding remarks
This thesis began with the development of a new method (SDM) for measuring
distances between TB texture images. The aim was to determine if the method
developed could be applied to texture quantification in medicine. Using real-life
and artificial texture images, the SDM method was shown to be successful in
detection of knee OA and invariant to a range of imaging conditions. Further
evaluation of the method showed that it can predict progression of early and
late knee OA and that it can provide a detailed description of OA changes in the
bone texture. The successful quantification of texture roughness and orientation
for predefined scales demonstrated the potential of the SDM method in medicine,
where surfaces studied are multiscale and anisotropic in nature.
After evaluation of the SDM method, a new method (MCR) for selecting
accurate classifiers from homogeneous and heterogeneous classifier ensembles
was developed. The aim was to increase classification accuracies of the SDM
method in detection and prediction of OA. The MCR method was tested on
classifier ensembles of different types and sizes and it was shown to perform
well for a wide range of classification problems. The SDM and MCR methods
developed were then used to form an automated system (DMC) for texture
classification. Using the system, the highest classification accuracies for detection
and prediction of progression of knee OA were achieved.
In conclusion, the DMC system developed can be a useful decision-support
tool in medicine. This includes diagnosis and prognosis of joint diseases based
on quantification and classification of medical texture images. Since the DMC
system is accurate for multiscale and anisotropic texture images and it is invariant
to a range of imaging conditions, it could also find applications in other areas,
e.g. machine condition monitoring based on classification of wear particles and
product quality control based on quantification of surface morphology.
4. References
[1] L Kamibayashi, U P Wyss, T D V Cooke, and B Zee. “Trabecular
Microstructure in the Medial Condyle of the Proximal Tibia of Patients with
Knee Osteoarthritis”. In: Bone 17 (1995), pp. 27–35.
[2] L Kamibayashi, U P Wyss, T D V Cooke, and B Zee. “Changes in mean
trabecular orientation in the medial condyle of the proximal tibia in
osteoarthritis”. In: Calcified Tissue International 57 (1995), pp. 69–73.
[3] E A Messent, R J Ward, C J Tonkin, and C Buckland Wright. “Tibial
cancellous bone changes in patients with knee osteoarthritis. A short term
longitudinal study using Fractal Signature Analysis”. In: Osteoarthritis and
Cartilage 13 (2005), pp. 463–470.
[4] J A Lynch, D J Hawkes, and J C Buckland Wright. “Analysis of texture in
macroradiographs of osteoarthritic knees using the fractal signature”. In:
Physics in Medicine and Biology 36 (1991), pp. 709–722.
[5] J C Buckland Wright, J A Lynch, and D G Macfarlane. “Fractal signature
analysis measures cancellous bone organisation in macroradiographs of
patients with knee osteoarthritis”. In: Annals of the Rheumatic Diseases 55
(1996), pp. 749–755.
[6] E A Messent, J C Buckland Wright, and G M Blake. “Fractal analysis of
trabecular bone in knee osteoarthritis (OA) is a more sensitive marker
of disease status than bone mineral density (BMD)”. In: Calcified Tissue
International 76 (2005), pp. 419–425.
[7] P Podsiadlo and G W Stachowiak. “Analysis of trabecular bone texture
by modified Hurst orientation transform”. In: Medical Physics 29 (2002),
pp. 460–474.
[8] M Wolski, P Podsiadlo, and G W Stachowiak. “Directional fractal signature
analysis of trabecular bone: evaluation of different methods to detect
early osteoarthritis in knee radiographs”. In: Proceedings of the Institution
of Mechanical Engineers - Part H: Journal of Engineering in Medicine 223 (2009),
pp. 211–236.
[9] N Orlov, L Shamir, T Macura, J Johnston, M D Eckley, and I G Goldberg.
“WND-CHARM: Multi-purpose image classification using compound
image transforms”. In: Pattern Recognition Letters 29 (2008), pp. 1684–1693.
[10] V B Kraus, S Feng, S C Wang, S White, M Ainslie, A Brett, A Holmes, and
H C Charles. “Trabecular morphometry by fractal signature analysis is a
novel marker of osteoarthritis progression”. In: Arthritis & Rheumatism 60
(2009), pp. 3711–3722.
[11] L Shamir, S M Ling, W Scott, M Hochberg, L Ferrucci, and I G Goldberg.
“Early detection of radiographic knee osteoarthritis using computer aided
analysis”. In: Osteoarthritis and Cartilage 17 (2009), pp. 1307–1312.
[12] B Liu, H D Cheng, J Huang, J Tian, X Tang, and J Liu. “Fully automatic
and segmentation-robust classification of breast tumors based on local
texture analysis of ultrasound images”. In: Pattern Recognition 43 (2010),
pp. 280–298.
[13] B van Ginneken, L Hogeweg, and M Prokop. “Computer-aided diagnosis
in chest radiography: Beyond nodules”. In: European Journal of Radiology 72
(2009), pp. 226–230.
[14] S G Armato III, A S Roy, H MacMahon, F Li, K Doi, S Sone, and M B Altman.
“Evaluation of automated lung nodule detection on low-dose computed
tomography scans from a lung cancer screening program”. In: Academic
Radiology 12 (2005), pp. 337–346.
[15] C Serrano and B Acha. “Pattern analysis of dermoscopic images based on
Markov random fields”. In: Pattern Recognition 42 (2009), pp. 1052–1057.
[16] G P Stachowiak, P Podsiadlo, and G W Stachowiak. “Shape and texture
features in the automated classification of adhesive and abrasive wear
particles”. In: Tribology Letters 24 (2006), pp. 15–26.
[17] G P Stachowiak, G W Stachowiak, and P Podsiadlo. “Automated
classification of wear particles based on their surface texture and shape
features”. In: Tribology International 41 (2008), pp. 34–43.
[18] C Buckland Wright. “Subchondral bone changes in hand and knee
osteoarthritis detected by radiography”. In: Osteoarthritis and Cartilage 12
(2004), S10–S19.
[19] C Ding, F Cicuttini, and G Jones. “Tibial subchondral bone size and knee
cartilage defects: relevance to knee osteoarthritis”. In: Osteoarthritis and
Cartilage 15 (2007), pp. 479–486.
[20] P Podsiadlo, L Dahl, M Englund, L S Lohmander, and G W Stachowiak.
“Differences in trabecular bone texture between knees with and without
radiographic osteoarthritis detected by fractal methods”. In: Osteoarthritis
and Cartilage 16 (2008), pp. 323–329.
[21] M Wolski, P Podsiadlo, G W Stachowiak, L S Lohmander, and M Englund.
“Differences in trabecular bone texture between knees with and without
radiographic osteoarthritis detected by directional fractal signature
method”. In: Osteoarthritis and Cartilage 18 (2010), pp. 684–690.
[22] L Pothuaud, C L Benhamou, P Porion, E Lespessailles, R Harba, and
P Levitz. “Fractal dimension of trabecular bone projection texture is related
to three dimensional microarchitecture”. In: Journal of Bone and Mineral
Research 15 (2000), pp. 691–699.
[23] R Jennane, R Harba, G Lemineur, S Bretteil, A Estrade, and C L Benhamou.
“Estimation of the 3D self similarity parameter of trabecular bone from its
2D projection”. In: Medical Image Analysis 11 (2007), pp. 91–98.
[24] L Pothuaud, P Carceller, and D Hans. “Correlations between grey level
variations in 2D projection images (TBS) and 3D microarchitecture:
Applications in the study of human trabecular bone microarchitecture”. In:
Bone 42 (2008), pp. 775–787.
[25] G Luo, J H Kinney, J J Kaufman, D Haupt, A Chiabrera, and
R S Siffert. “Relationship between plain radiographic patterns and
three-dimensional trabecular architecture in the human calcaneus”. In:
Osteoporosis International 9 (1999), pp. 339–345.
[26] P Brodatz. Textures: A Photographic Album for Artists and Designers. Dover
Publications, New York, 1966.
[27] T Ojala, M Pietikainen, and T Maenpaa. “Multiresolution gray-scale and
rotation invariant texture classification with local binary patterns”. In:
IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002),
pp. 971–987.
[28] L I Kuncheva. Combining Pattern Classifiers: Methods and Algorithms.
Wiley-Interscience, 2004.
[29] J Kittler, M Hatef, R P W Duin, and J Matas. “On combining classifiers”.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998),
pp. 226–239.
[30] L Breiman. “Bagging predictors”. In: Machine Learning 24 (1996),
pp. 123–140.
[31] Y Freund and R E Schapire. “A decision-theoretic generalization of on-line
learning and an application to boosting”. In: Journal of Computer and System
Sciences 55 (1997), pp. 119–139.
5. List of figures
Figure 1: Thesis overview. [Flowchart; recovered box labels: Literature review; Paper 1 (chapter 2); Paper 2 (chapter 3); Paper 3 (chapter 4); Paper 4 (chapter 5); Automated system for detection and prediction of knee osteoarthritis. Stage labels: Development; Evaluation.]
CHAPTER 2
A SIGNATURE DISSIMILARITY MEASURE FOR
TRABECULAR BONE TEXTURE IN KNEE
RADIOGRAPHS
Tomasz Woloszynski (1), Pawel Podsiadlo, PhD (1), Gwidon W Stachowiak, PhD (1),
and Marek Kurzynski, PhD (2)
(1) Tribology Laboratory, School of Mechanical and Chemical Engineering,
University of Western Australia, Australia
(2) Chair of Systems and Computer Networks, Wroclaw University of Technology,
Poland
Medical Physics 2010;37;2030–2042
Abstract
Purpose: The purpose of this study is to develop a dissimilarity measure for
the classification of trabecular bone (TB) texture in knee radiographs. Problems
associated with the traditional extraction and selection of texture features and
with the invariance to imaging conditions such as image size, anisotropy, noise,
blur, exposure, magnification, and projection angle were addressed.
Methods: In the method developed, called a signature dissimilarity measure
(SDM), a sum of earth mover’s distances calculated for roughness and orientation
signatures is used to quantify dissimilarities between textures. Scale-space theory
was used to ensure scale and rotation invariance. The effects of image size,
anisotropy, noise, and blur on the SDM developed were studied using computer
generated fractal texture images. The invariance of the measure to image
exposure, magnification, and projection angle was studied using x-ray images
of a human tibia head. For the studies, Mann-Whitney tests with a significance level
of 0.01 were used. The performance of an SDM-based classification system was
compared with that of two other systems in the classification of Brodatz
textures and the detection of knee osteoarthritis (OA). The other
systems are based on weighted neighbor distance using compound hierarchy of
algorithms representing morphology (WND-CHARM) and local binary patterns
(LBP).
Results: Results obtained indicate that the SDM developed is invariant to image
exposure (2.5–30 mA s), magnification (×1.00–×1.35), noise associated with film
graininess and quantum mottle (<25%), blur generated by a sharp film screen,
and image size (>64×64 pixels). However, the measure is sensitive to changes in
projection angle (>5°), image anisotropy (>30°), and blur generated by a regular
film screen. For the classification of Brodatz textures, the SDM based system
produced comparable results to the LBP system. For the detection of knee OA,
the SDM based system achieved 78.8% classification accuracy and outperformed
the WND-CHARM system (64.2%).
Conclusions: The SDM is well suited for the classification of TB texture images
in knee OA detection and may be useful for the texture classification of medical
images in general.
Key words: dissimilarity measure, image classification, scale-space, knee
osteoarthritis, radiographs
1. Introduction
Trabecular bone (TB) analysis plays an important role in the assessment of
knee osteoarthritis (OA). This is due to the fact that the TB parameters such as
thickness, separation, connectivity, and orientation change with the progression
of the disease [1–4]. Recently, it has been shown that these changes in bone
structure precede radiographic OA, i.e., they occur before joint space narrowing
and osteophyte formation take place [5]. For these reasons, TB has been used in
the assessment of risk and severity of OA in knees. The images of TB used in
the OA assessment are usually acquired by plain radiography [5–7]. Although
other bone imaging techniques such as magnetic resonance imaging (MRI) [8]
and scintigraphy [9] are also used, plain radiography remains the most popular
technique. This is mainly due to its wide accessibility, low cost, and non-invasive
nature, and the fact that the bone texture obtained by plain radiography is directly
related to the three-dimensional bone structure [10–13]. Therefore, x-ray images
of TB are used in this study.
Two main approaches for the assessment of knee OA are used. The first approach
is based on statistical tests for the differences between healthy and OA knee TB
textures. Methods employing this approach include histomorphometric analysis
[2, 3], fractal signature analysis [4, 14–16], and Hurst orientation transform
based methods [6, 17]. Since TB exhibits fractal nature [18, 19], recent research
has been mainly focused on the development of fractal methods. However,
such methods generally lack the ability to describe the bone anisotropy, i.e.,
fractal dimensions (FDs) are calculated only in horizontal and vertical directions
[4, 15], and the ability to fully quantify bone roughness, i.e., FD is the only
parameter used. Fractal methods are also sensitive to imaging conditions such
as noise, magnification, and projection angle [17]. Although some of these
problems have recently been addressed [17, 20], dependence of results on angular
space quantization still remains unresolved. The second approach is based on
formulating OA detection and prediction as a classification problem [21, 22].
In this approach, multiple features are extracted from x-ray images of TB and
then used to classify patients' knees. Image classes are defined according to the
Kellgren-Lawrence (KL) scaling system [23]. However, in all methods used so far,
the calculation, extraction, and selection of texture features is not a trivial matter
[20–22]. Consequently, the development of reliable decision support systems
based on such methods is difficult.
One possible way to address the problem is through the use of a dissimilarity
measure. In this approach, no texture features are produced; instead a direct
measurement of distances between texture images is performed. This eliminates
the problems associated with the extraction and selection of texture features, e.g.,
the high dimensionality of feature space and the estimation of large number
of parameters. In past research, dissimilarity measures have proved to be
useful alternatives to feature-based classification [24–26], for example in image
search [27], iris recognition [28], and medical image segmentation [29]. However,
dissimilarity measures that are able to quantify TB anisotropy and roughness at
various scales and that are invariant to x-ray measurement conditions are not yet
available.
In this paper, the aforementioned problems were addressed by developing a new
measure, called a signature dissimilarity measure (SDM). In the SDM, a sum
of earth mover’s distances (EMDs) [30] calculated for roughness and orientation
signatures is used to quantify dissimilarities between textures. The problems
of invariance to imaging conditions, the ability to describe TB roughness and
anisotropy at various scales, and the detection of knee OA were addressed.
2. Method
The dissimilarity measure developed in this study is based on scale-space theory.
Relevant aspects of the theory are presented first.
2.1 Scale-space representation of texture image
Assume that texture data is a digital image defined as a function I : X × Y → Z
which assigns a gray value z ∈ Z to a single pixel at location (x, y) ∈ X × Y , i.e.,
z = I(x, y). Let a two-dimensional sampled Gaussian kernel function be given as
$$G(k, m, \sigma_n) = \frac{1}{2\pi\sigma_n^2} \exp\left(-\frac{k^2 + m^2}{2\sigma_n^2}\right), \tag{1}$$
where k,m are integers and σn ∈ R+ is the scale parameter taken from a
predefined increasing sequence σ1, . . . , σN ; N is the total number of scales used.
The range of scale parameters σn depends on the application and database used.
Generally, the range should be set in such a way that it covers the scales of interest
and that the sizes of sampled Gaussian kernels are smaller than the image size.
Then, the scale-space representation of the image can be defined as
$$L(x, y, \sigma_n) = G * I = \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} G(k, m, \sigma_n)\, I(x - k, y - m), \tag{2}$$
where ∗ is the convolution operator. The Gaussian kernel function G was chosen
because of its causality, homogeneity, and isotropy [31, 32]. Using the scale-space
representation scale-normalized image derivatives are obtained, i.e.,
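$$L_{x^\alpha y^\beta,\mathrm{norm}}(x, y, \sigma_n) = \sigma_n^{(\alpha+\beta)}\, G_{x^\alpha y^\beta} * I, \tag{3}$$
where α and β are the orders of differentiation in the x and y directions,
respectively, and $G_{x^\alpha y^\beta}(k, m, \sigma_n)$ is the sampled Gaussian derivative given by
$$G_{x^\alpha y^\beta}(k, m, \sigma_n) = \left.\left\{ \frac{\partial^{(\alpha+\beta)}}{\partial x^\alpha\, \partial y^\beta} \left[ \frac{1}{2\pi\sigma_n^2} \exp\left(-\frac{x^2 + y^2}{2\sigma_n^2}\right) \right] \right\}\right|_{(x,y)=(k,m)}, \tag{4}$$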
where x and y are continuous variables. The normalized derivatives have a useful
property which is the invariance to image scale [31].
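As an illustration of Eqs. (1)–(4), the following minimal Python sketch builds a sampled Gaussian kernel and a scale-normalized first derivative in the x direction. It is not the implementation used in the study. The truncation of the kernel at three standard deviations, the replication of image borders, and all function names are assumptions introduced here for illustration only. For α = 1 and β = 0, differentiating the Gaussian in Eq. (4) gives −(k/σn²)G, so the normalized kernel of Eq. (3) is −(k/σn)G.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Sampled 2-D Gaussian G(k, m, sigma) of Eq. (1), indexed as [k + r][m + r]."""
    if radius is None:
        radius = int(math.ceil(3.0 * sigma))  # assumed truncation at 3 sigma
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return [[norm * math.exp(-(k * k + m * m) / (2.0 * sigma ** 2))
             for m in range(-radius, radius + 1)]
            for k in range(-radius, radius + 1)]

def normalized_dx_kernel(sigma, radius=None):
    """Scale-normalized first x-derivative: sigma * dG/dx = -(k / sigma) * G."""
    g = gaussian_kernel(sigma, radius)
    r = len(g) // 2
    return [[-(k / sigma) * g[k + r][m + r] for m in range(-r, r + 1)]
            for k in range(-r, r + 1)]

def convolve(image, kernel):
    """Discrete convolution of Eq. (2) for image[x][y].

    Borders are handled by replicating edge pixels (an assumption; the
    text does not state a border rule).
    """
    nx, ny = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            acc = 0.0
            for k in range(-r, r + 1):
                xs = min(max(x - k, 0), nx - 1)
                for m in range(-r, r + 1):
                    ys = min(max(y - m, 0), ny - 1)
                    acc += kernel[k + r][m + r] * image[xs][ys]
            out[x][y] = acc
    return out

# Usage: a ramp image I(x, y) = x has unit slope in x, so the
# scale-normalized derivative at sigma = 1 is close to 1 away from borders.
ramp = [[float(x) for _ in range(9)] for x in range(9)]
lx_norm = convolve(ramp, normalized_dx_kernel(1.0))
```

The small deviation from 1 in the interior of `lx_norm` comes from sampling and truncating the kernel.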
2.2 Roughness signature
Roughness signature is used to measure the complexity of a texture image.
Complexity is understood as the frequency content of texture. For example, a
texture containing mainly smooth regions would have a low complexity, while
a texture containing rough patches with a number of sharp edges would exhibit
a high complexity. In the case of TB textures, the roughness signature would
be affected by bone properties such as trabecular thickness and separation. The
roughness signature is calculated in the following manner:
1. First, the gradient magnitude Diff1 and the Laplacian Diff2 are
calculated for a texture image at the predefined scales,
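$$\mathrm{Diff}_1(x, y, \sigma_n) = \sqrt{L_{x,\mathrm{norm}}^2(x, y, \sigma_n) + L_{y,\mathrm{norm}}^2(x, y, \sigma_n)}, \quad n = 1, \ldots, N, \tag{5}$$
$$\mathrm{Diff}_2(x, y, \sigma_n) = L_{xx,\mathrm{norm}}(x, y, \sigma_n) + L_{yy,\mathrm{norm}}(x, y, \sigma_n), \quad n = 1, \ldots, N. \tag{6}$$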
The gradient operator is used to detect edges. The second operator takes
nonzero values for smooth circular regions (called blobs) and it acts as a
smoothness detector.
2. For each pixel, the extremum values of the operators Diff1 and Diff2 are then
found across all scales,
$$\mathrm{Diff}_{k,\sigma_{\max}}(x, y) = \max_{n=1,\ldots,N} \left| \mathrm{Diff}_k(x, y, \sigma_n) \right|, \quad k = 1, 2, \tag{7}$$
where σmax denotes the scale at which the operator takes the extremum
value at pixel (x, y).
3. Next, a roughness measure is calculated as a difference between the two
extremum operators, i.e.,
$$R(x, y) = \mathrm{Diff}_{1,\sigma_{\max}}(x, y) - \mathrm{Diff}_{2,\sigma_{\max}}(x, y). \tag{8}$$
The measure R(x, y) is negative if a neighbourhood surrounding the pixel
located at (x, y) is smooth and positive otherwise. The measure is zero if
the neighbourhood does not resemble either a smooth circular region or an
edge.
4. In order to remove the effect of stationary distortions in the image (e.g.,
noise, overexposure, etc.), the roughness measure is normalized with
respect to its standard deviation calculated over the whole image
$$R_{\mathrm{norm}}(x, y) = \frac{R(x, y)}{\mathrm{SD}(R(x, y))}. \tag{9}$$
5. Finally, a roughness signature Srough(I) of image I(x, y) is calculated as a
normalized histogram of the roughness measure Rnorm(x, y). The histogram
is built with bin centers lying in the interval (MR − 3.5,MR + 3.5), where
MR is the mean value of Rnorm(x, y). The interval limits were chosen as a
trade off between mapping Rnorm(x, y) on the histogram with a preselected
number of bins and the minimization of the number of Rnorm(x, y) values
that lie outside the interval. The histogram is normalized so that entries
(weights) from all bins sum to 1. Resulting roughness signature Srough(I) is
stored as pairs of bin centers and weights.
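Steps 2 to 5 above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the study. It assumes that Diff1 and Diff2 have already been computed at every scale and stored as flat lists of pixel values, that the standard deviation is the population standard deviation, that R(x, y) is not constant, and that values of Rnorm falling outside the histogram interval are discarded. The function name and argument layout are hypothetical.

```python
import math

def roughness_signature(diff1, diff2, n_bins=50, half_width=3.5):
    """Roughness signature as a list of (bin center, weight) pairs.

    diff1[n][p] and diff2[n][p] hold Diff1 and Diff2 at scale n and pixel p
    (hypothetical layout chosen for this sketch).
    """
    n_pix = len(diff1[0])
    # Eq. (7): per-pixel extremum of |Diff_k| across all scales.
    ext1 = [max(abs(d[p]) for d in diff1) for p in range(n_pix)]
    ext2 = [max(abs(d[p]) for d in diff2) for p in range(n_pix)]
    # Eq. (8): roughness measure.
    r = [a - b for a, b in zip(ext1, ext2)]
    # Eq. (9): normalization by the standard deviation over the whole image
    # (population SD assumed; R must not be constant).
    mean_r = sum(r) / n_pix
    sd = math.sqrt(sum((v - mean_r) ** 2 for v in r) / n_pix)
    r_norm = [v / sd for v in r]
    # Step 5: histogram with bin centers inside (M_R - 3.5, M_R + 3.5).
    m_r = sum(r_norm) / n_pix
    lo = m_r - half_width
    width = 2.0 * half_width / n_bins
    weights = [0.0] * n_bins
    for v in r_norm:
        b = int((v - lo) // width)
        if 0 <= b < n_bins:  # values outside the interval are dropped (assumption)
            weights[b] += 1.0
    total = sum(weights)
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    return [(c, w / total) for c, w in zip(centers, weights)]
```

Storing the signature as (center, weight) pairs rather than as a fixed-grid histogram matters because the bin centers depend on MR and therefore differ from image to image.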
The roughness signature resembles the standard texture feature of coarseness [33]
described by Tamura et al. However, the standard Tamura coarseness is a single
number calculated as the average of coarseness measures at each pixel whereas
the signature is a histogram of the roughness measure. The Tamura coarseness
has been modified by using a histogram of coarseness measures [34, 35]. However,
neither the standard nor the modified Tamura coarseness is scale or rotation
invariant, since the coarseness measure is calculated using square regions of fixed
sizes (i.e., powers of 2).
Examples of fractal and Brodatz texture images with their roughness signatures
are shown in Fig. 1. It can be seen from this figure that the shape of the signature
and its position with respect to zero depend on texture complexity. The texture
image of reptile skin in Fig. 1(a) taken from the Brodatz album [36] was identified
as relatively smooth since the mean MR is negative. This indicates that blob
regions in this image cover a greater area than patches with sharp edges. For
the rough fractal image shown in Fig. 1(c), the roughness signature takes the
maximum value approximately at zero roughness measure Rnorm(x, y). This
indicates that, on average, a neighbourhood of each pixel taken from the fractal
surface does not resemble either a smooth circular region or an edge. All images
were analyzed using the same range of scales, i.e., $\sigma_n = 1.1^n$, $n = 1, \ldots, 25$. The
size of Brodatz and fractal images was 180×180 pixels.
2.3 Orientation signature
Orientation signature is a measure of a texture image direction. The signature
is defined as a histogram of angles between gradient directions calculated at the
pixel locations (x, y) and the principal gradient direction (PGD) of changes in the
entire image. For TB textures, the signature measures orientations of trabeculae.
The orientation signature is calculated in the following steps:
1. First, for each pixel located at (x, y) an angle θ(x, y) between the image
horizontal axis and a gradient vector at scale σmax is calculated, i.e.,
$$\theta(x, y) = \tan^{-1}\left( \frac{L_{y,\mathrm{norm}}(x, y, \sigma_{\max})}{L_{x,\mathrm{norm}}(x, y, \sigma_{\max})} \right). \tag{10}$$
2. Edge angles, i.e., the angles θ(x, y) at pixels where brightness changes
rapidly (edges), are then selected. It is reasonable to select such angles since they
represent dominating texture directions. Smooth regions (blobs) can be
ignored because they do not have any well-defined directional patterns.
For the selection of edge angles, the maximum and minimum values of the
Laplacian operator Diff2 are found for each pixel. The maxima and minima
of Diff2 represent valleys and hills in the texture image, respectively. A
binary image is obtained by adding the maximum and minimum values of
Diff2 for each pixel and setting the threshold at zero level. A perimeter of the
binary image is calculated and then used to locate edges (the rapid changes
in pixel brightness) connecting bright and dark patches in the texture image.
Angles located at the edges are edge angles. As an example, a TB texture
image was used to illustrate steps required for the selection of edge angles
(Fig. 2).
3. Next, the PGD is calculated using a principal component analysis (PCA)
method and image directional derivatives. In the PCA method, a covariance
matrix is used to retrieve information about a spread of data. In a similar
way, a second moment matrix µ2(0) of directional derivatives around zero
provides information about texture changes in x−y coordinates. The second
moment matrix is given by
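$$\mu_2(0) = \begin{pmatrix} \sum L_{x,\mathrm{norm}}^2 & \sum L_{x,\mathrm{norm}} L_{y,\mathrm{norm}} \\ \sum L_{y,\mathrm{norm}} L_{x,\mathrm{norm}} & \sum L_{y,\mathrm{norm}}^2 \end{pmatrix}, \tag{11}$$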
where L·,norm = L·,norm(x, y, σmax) and the sums are taken over pixels lying
on edges. The direction which captures most texture changes (i.e., the
PGD) is obtained by calculating the eigenvector associated with the largest
eigenvalue of the matrix µ2(0).
4. Finally, the orientation signature Sorient(I) of image I(x, y) is calculated
as a smoothed and normalized histogram of angle differences between
the PGD and the edge angles. For the histogram, the bin centers cover
a 180° range and each center is computed as the mean value of angle
differences within the bin. The number of bins is chosen in such a way
that the signature can capture image rotations at small angles and the
bin weights are not suppressed by an averaging filter. The filter is used
to reduce the high-frequency peaks in histogram associated with discrete
image rotation. All bin weights are normalized so that their sum is equal
to 1. Finally, the histogram of angles is represented in a two-dimensional
angular space. This is achieved by wrapping the bin centers around a
circle with a predetermined radius of 45. The resulting orientation signature
Sorient(I) is stored as pairs of bin centers and weights.
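A simplified Python sketch of steps 1, 3, and 4 is given below. It is an illustration under stated assumptions, not the implementation used in the study. The inputs are assumed to be the normalized derivatives at σmax for pixels already selected as edge pixels in step 2. Angles are folded into a 180° range with a two-argument arctangent, the principal direction of the symmetric 2×2 matrix in Eq. (11) is obtained from the closed-form expression ½·atan2(2b, a − c), and empty bins are omitted. The averaging filter and the wrapping of bin centers around a circle are left out. The function names are hypothetical.

```python
import math

def principal_gradient_direction(lx, ly):
    """PGD (radians) from the second moment matrix of Eq. (11).

    lx, ly: normalized derivatives at sigma_max for the edge pixels.
    For a symmetric 2x2 matrix [[a, b], [b, c]] the eigenvector of the
    largest eigenvalue points along the angle 0.5 * atan2(2b, a - c).
    """
    a = sum(u * u for u in lx)
    b = sum(u * v for u, v in zip(lx, ly))
    c = sum(v * v for v in ly)
    return 0.5 * math.atan2(2.0 * b, a - c)

def orientation_histogram(lx, ly, n_bins=45):
    """Unsmoothed orientation signature as (center in degrees, weight) pairs."""
    pgd = principal_gradient_direction(lx, ly)
    width = math.pi / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for u, v in zip(lx, ly):
        theta = math.atan2(v, u)               # gradient angle, cf. Eq. (10)
        diff = (theta - pgd) % math.pi         # fold the difference into [0, pi)
        b = min(int(diff / width), n_bins - 1)
        sums[b] += diff
        counts[b] += 1
    total = sum(counts)
    # Each center is the mean angle difference within its bin; empty bins
    # have no center and are omitted (an assumption of this sketch).
    return [(math.degrees(sums[i] / counts[i]), counts[i] / total)
            for i in range(n_bins) if counts[i]]
```

Because every angle is measured relative to the PGD, rotating the image shifts the gradient angles and the PGD by the same amount and leaves the histogram unchanged, up to interpolation effects.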
The orientation signature is similar to the standard Tamura texture feature of
directionality [33]. However, the Tamura directionality is not scale invariant
since it uses a fixed size (3×3 pixels) edge detection mask in the calculation
of gradients. Also, the feature is not rotation invariant since the histogram of
gradient directions depends on the initial orientation of the image.
As an example, orientation signatures were calculated for Brodatz texture images
shown in Figs. 3(a), (c), and (e). The images represent [Figs. 3(a) and (c)] two
weakly anisotropic leather textures with different orientations and [Fig. 3(e)] an
isotropic weave texture. The size of all images is 180×180 pixels and the scale
range was set to $\sigma_n = 1.1^n$, $n = 1, \ldots, 25$. The number of histogram bins was set
to 45. Rose plots of the signatures calculated are shown in Figs. 3(b), (d), and (f).
On the rose plots, the PGDs are marked as thick solid lines. It can be seen from the
figure that the shape of the rose plots depends on texture anisotropy. Elliptical
plots were obtained for the anisotropic textures of leather [Figs. 3(b) and (d)],
while a circular plot was obtained for the isotropic texture of weave [Fig. 3(f)].
It can also be seen that the rose plots obtained for the leather texture images are
virtually the same. This shows that the orientation signature is rotation invariant.
2.4 Signature dissimilarity measure
Roughness and orientation signatures are used to calculate a dissimilarity
measure between texture images. The dissimilarity measure between two
textures A and B is defined as
$$\mathrm{Diss}(A, B) = \alpha\, \mathrm{EMD}[S_{\mathrm{rough}}(A), S_{\mathrm{rough}}(B)] + \beta\, \mathrm{EMD}[S_{\mathrm{orient}}(A), S_{\mathrm{orient}}(B)], \tag{12}$$
where EMD[·, ·] is the earth mover’s distance [30] between two signatures and
α, β are the normalization factors. The EMD is calculated as a minimal amount
of work that is needed to move a mass of earth spread in space (represented as
bin weights and centers of one signature) to a collection of holes (represented as
bin weights and centers of the other signature). The minimal amount of work is
found by solving a special case of the transportation problem by linear optimization
[30]. The EMD was used because it can reliably compare two signatures with
different bin centers unlike statistical measures based on histograms, e.g.,
chi-square or G-test [30]. The EMDs are normalized with respect to the maximum
values of the EMDs obtained from a training set of images. This ensures that
roughness and orientation signatures equally affect the SDM, i.e., they are equally
informative.
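Eq. (12) can be illustrated with the short Python sketch below. It is a simplification and not the implementation used in the study. For one-dimensional signatures of equal total weight, the EMD reduces to the area between the two cumulative weight functions, so no linear programming solver is needed. The orientation signatures are represented in a two-dimensional angular space and would in general require a transportation solver; the one-dimensional routine is used for both terms here only as a placeholder. Taking α and β as the reciprocals of the maximum training-set EMDs is one reading of the normalization described above. All function names are hypothetical.

```python
def emd_1d(sig_a, sig_b):
    """EMD between two 1-D signatures [(center, weight), ...] whose weights
    each sum to 1; equals the area between the cumulative weight functions."""
    events = [(c, w) for c, w in sig_a] + [(c, -w) for c, w in sig_b]
    events.sort()
    work, cum, prev = 0.0, 0.0, events[0][0]
    for pos, w in events:
        work += abs(cum) * (pos - prev)  # mass carried over this gap
        cum += w
        prev = pos
    return work

def normalization_factor(training_emds):
    """Reciprocal of the largest EMD in a training set (assumed reading)."""
    return 1.0 / max(training_emds)

def dissimilarity(rough_a, rough_b, orient_a, orient_b, alpha, beta,
                  emd_orient=emd_1d):
    """Eq. (12). emd_orient defaults to the 1-D routine as a placeholder;
    a general transportation solver would be substituted for signatures
    stored in a two-dimensional angular space."""
    return (alpha * emd_1d(rough_a, rough_b)
            + beta * emd_orient(orient_a, orient_b))

# Usage: two unit masses two bins apart need 2 units of work.
work = emd_1d([(0.0, 1.0)], [(2.0, 1.0)])
```

Unlike a bin-by-bin statistic, `emd_1d` accepts signatures with different bin centers, which is the property the text relies on.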
3. Materials and results
The performance of the SDM was evaluated using Brodatz and fractal textures,
x-ray images of a tibia head, and x-ray images of healthy and OA knee joints. The numbers of
bins used in the roughness and orientation signatures were set to 50 and 45,
respectively, for all experiments conducted.
3.1 Brodatz textures
A benchmark database generated from the Brodatz album was used to evaluate
the performance of the SDM in a rotation invariant texture classification.
Although the relevance of the album to TB textures is marginal, the former
provides a controlled environment with well-defined and visually separable
classes. Therefore, it is frequently used for preliminary evaluations of the
discriminative power of newly developed texture analysis methods. The
performance of a classification system based on the SDM was compared to
a system based on local binary patterns (LBP) [37] operator with parameters
(P, R) = (8, 1) + (16, 2) + (24, 3). For the comparison study, a nearest neighbour classifier
was used. This is because the classifier is naturally fitted for dissimilarity-based
classifications and its simplicity ensures that the results obtained accurately
reflect the "true" discriminative power of the SDM. The image database described
by Ojala et al. [37] was chosen for the experiments. Although other databases
generated from the Brodatz album are available [38, 39], the chosen database was
specifically designed for rotation invariant texture classifications. Also, the LBP
system achieved the best scores for this database which makes the comparison
more challenging for the SDM. The database was replicated as follows.
Sixteen source textures were captured from the Brodatz texture album. Each
source texture represented one class. For each class, training and testing images
were generated as follows. Eight images of size 256×256 pixels were extracted
from each source texture. The first image was used for training the classifier and
the other images were used for testing. Each image was then rotated at 0°, 20°,
30°, 45°, 60°, 70°, 90°, 120°, 135°, and 150°, and cropped to the size of 180×180
pixels. This resulted in an image database containing 1280 (16×8×10) images
with 160 (16×1×10) training images and 1120 (16×7×10) testing images. The
training images were split into subimages of sizes 16×16, 30×30, 60×60, 90×90,
and 180×180 pixels resulting in five groups of training images with 19360, 5760,
1440, 640, and 160 subimages, respectively. Rotated images were computed using
bilinear interpolation. In the case of 0° and 90° rotation angles, an artificial blur
was added to the images to simulate the effect of blurring caused by bilinear
interpolation used for image rotation at other angles. The blur was generated
using a circular averaging filter with radius equal to 1. The scale range in the SDM
was set to $\sigma_n = 0.7 \times 1.07^n$, $n = 1, \ldots, 25$. This ensured that for all scales used,
the sizes of the gradient and Laplacian differential operators were smaller than
the sizes of training subimages.
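The image counts quoted above can be checked with a few lines of Python. The sketch assumes that the training images were split into non-overlapping square tiles, which is an assumption that happens to reproduce the reported subimage counts, and it reads the scale range as the geometric sequence 0.7 × 1.07 raised to the power n. Variable names are hypothetical.

```python
classes, images_per_class, angles = 16, 8, 10
image_size = 180  # pixels per side after cropping

total = classes * images_per_class * angles           # all images
training = classes * 1 * angles                       # one training image per class
testing = classes * (images_per_class - 1) * angles   # remaining seven per class

def subimage_count(sub_size):
    """Training subimages, assuming non-overlapping square tiles."""
    per_side = image_size // sub_size
    return training * per_side ** 2

counts = {s: subimage_count(s) for s in (16, 30, 60, 90, 180)}

# Scale range read as a geometric sequence (assumption about the notation).
sigmas = [0.7 * 1.07 ** n for n in range(1, 26)]

print(total, training, testing)
print(counts)
print(round(sigmas[0], 3), round(sigmas[-1], 3))
```

Under this reading the largest scale is below 4 pixels, which is consistent with the statement that the differential operators remain smaller than the smallest (16×16) training subimages.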
Two experiments were conducted using the image database generated. In the
first experiment, the training set comprised the images rotated at four angles:
0°, 30°, 45°, and 60°. The testing set was presented at six rotation angles: 20°, 70°,
90°, 120°, 135°, and 150°. In the second experiment, the training set comprised
the images rotated at a single angle (the training angle). The testing set contained
images rotated at the remaining nine angles. In each class, the roughness and
orientation signatures were calculated for each subimage taken from the training
set and then averaged. As a result, each class was represented by a training pair
of the averaged signatures.
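The nearest neighbour rule applied to such class representatives can be sketched as follows. The sketch is generic: the dissimilarity function is passed in as an argument, and the toy usage replaces signatures with plain numbers. How signatures with different bin centers were averaged is not specified here, so that step is not shown. All names are hypothetical.

```python
def classify_nearest(item, prototypes, diss):
    """Return the label of the class representative closest to item.

    prototypes: dict mapping class label -> training representation.
    diss: function returning the dissimilarity between two representations.
    """
    return min(prototypes, key=lambda label: diss(item, prototypes[label]))

def classification_accuracy(items, labels, prototypes, diss):
    """Percentage of correctly classified items."""
    hits = sum(classify_nearest(x, prototypes, diss) == y
               for x, y in zip(items, labels))
    return 100.0 * hits / len(items)

# Toy usage with numbers standing in for signatures.
prototypes = {"a": 0.0, "b": 10.0}
toy_diss = lambda u, v: abs(u - v)
acc = classification_accuracy([1.0, 9.0, 4.0], ["a", "b", "b"],
                              prototypes, toy_diss)
```

In the toy example the third item lies closer to the representative of class "a" and is misclassified, so two of three items are correct.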
The results obtained from the first experiment are shown in Table 1. These results
represent the classification accuracies (i.e., the percentage of correctly classified
images) of the classification systems for each size of training images. It can be
seen from this table that for small sizes (16×16 and 30×30 pixels) the LBP system
produced the most accurate results. However, when the size of training images
was increased to 60×60, 90×90, and 180×180 pixels the performance of SDM and
LBP systems was comparable. Classification accuracies obtained for the second
experiment were analogous and they are listed in Table 2. Results obtained for
the LBP system were similar to those presented by Ojala et al. [37].
3.2 Fractal textures
To evaluate the effects of image size, anisotropy, noise, and blur, fractal textures
were used. All images were generated by a spectral synthesis algorithm [40].
Fractal textures were used because they reflect the multiscale and nonstationary
nature of TB textures [18, 19] and provide a controlled environment for
experiments. However, since fractal textures are a rough approximation of bone
textures, the experimental results obtained only indicate the potential of the
SDM in TB texture classification. Five databases of fractal surface images were
generated.
1. The first image database contained images of isotropic fractal textures with
FD 2.7. Sizes of the images generated were: 64×64, 128×128, 192×192,
and 256×256 pixels. For each image size, there were 20 images of fractal
surfaces. This resulted in four groups of images with the total number of 80
images (20 images per group).
2. The second image database contained images of anisotropic fractal textures
with FDs 2.8 and 2.2 along two anisotropy directions perpendicular to each
other. The directions were generated along lines inclined to the horizontal
axis of an image at 6k degree angles, where k = 0, 1, . . . , 15. As a result,
16 directions were produced. For each anisotropy direction, a group of ten
fractal texture images of size 256×256 pixels was generated. This resulted
in 16 groups of images with the total number of 160 images (ten images per
group).
3. The third image database contained images of isotropic fractal textures with
FD 2.4 that were corrupted by the Poisson noise. The noise, which was
added to the images, simulated the effect of a quantum mottle observed in
x-ray images [41]. The FD 2.4 was used since this was the lowest dimension
calculated for TB textures in previous studies [17]. This presented a worst
case scenario because images with low FDs are most sensitive to noise [17,
40]. For the database, 20 isotropic fractal texture images of size 256×256
pixels were generated and then corrupted by Poisson noise with a mean
value of 128. The contribution of noise to the image pixel values was varied
between 0% and 25% in steps of 5%. This resulted in six groups of
images (20 images per group).
4. The fourth image database was similar to the third database, except that the
isotropic fractal texture images were corrupted by Gaussian noise. This
database was used to simulate the effect of film graininess noise in x-ray
images. The Gaussian noise had a mean of 128 and a standard deviation
of 40.
5. The fifth image database contained images of isotropic fractal textures with
FD 2.7 smoothed by two kernel operators. Smoothed images were used
to simulate effects of a blur occurring in an image acquisition. The kernel
operators were generated using two different logit modulation transfer
functions (MTFs) [42]. The first MTF was used to model a sharp Kodak
Lanex Fine/OG screen-film system. The second MTF was used to model
an unsharp Kodak Lanex Regular/OG screen-film system. The high FD
was used since this is the most sensitive case for the study of a blurring
effect [17]. The database contained three groups of 20 images each, i.e.,
unprocessed images and images smoothed by each of the two kernel operators.
Four experiments were conducted using the fractal image databases generated.
In all experiments, the scale range used in the SDM was set to $\sigma_n = 1.1^n$,
$n = 1, \ldots, 25$, in order to detect differences between fractal textures over a wide range
of scales. In the first experiment, the effect of image size on the SDM
was investigated using the first image database. Each image group in the
database was compared against a reference group of 128×128-pixel images.
The comparison between two groups was performed using the Mann-Whitney
test [43], with P<0.01 considered statistically significant. For the test, two
independent samples were used: (i) the first sample contained dissimilarity
measures calculated for every pair of images taken from the reference group; (ii)
the second sample contained dissimilarity measures calculated for every pair of
images taken from the other group and every cross pair of images taken from
both groups. Mean values with 99% confidence intervals (CI) for each sample
were calculated.
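The construction of the two samples described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the `dissimilarity` argument stands in for the SDM, the helper names `build_samples` and `mann_whitney_u` are hypothetical, and the U statistic is obtained by direct pair counting (ties counted as one half) without the P-value calculation.

```python
from itertools import combinations, product

def build_samples(reference, other, dissimilarity):
    """Return the two samples of dissimilarity values compared by the test.

    Sample 1: every pair of images within the reference group.
    Sample 2: every pair within the other group plus every cross pair.
    """
    sample1 = [dissimilarity(a, b) for a, b in combinations(reference, 2)]
    sample2 = [dissimilarity(a, b) for a, b in combinations(other, 2)]
    sample2 += [dissimilarity(a, b) for a, b in product(reference, other)]
    return sample1, sample2

def mann_whitney_u(x, y):
    """U statistic of x versus y by pair counting; ties count as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Toy stand-ins (assumption): "images" are numbers and the
# "dissimilarity" is their absolute difference.
reference = [0.0, 1.0, 2.0]
other = [10.0, 11.0, 12.0]
s1, s2 = build_samples(reference, other, lambda a, b: abs(a - b))
u = mann_whitney_u(s1, s2)
print(len(s1), len(s2), u)
```

With 20 images per group, as in the first database, the same construction gives 190 values in the first sample and 190 + 400 = 590 values in the second.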
In the second experiment, the effect of changes in the direction of image
anisotropy was investigated using the second image database. Each group of
images was compared against a reference group containing images with the
anisotropy direction at a 0° angle.
In the third experiment, the sensitivity to noise of the SDM was investigated using
the third and fourth image databases. The original images with FD 2.4 were used
as the reference group.
In the fourth experiment, the effect of a blur on the SDM was investigated
using the fifth image database. The reference group contained the unsmoothed
isotropic images with FD 2.7.
Mean (99% CI) values of the dissimilarity measures obtained in the four
experiments are listed in Tables 3–6. The results showed that the group
of 64×64-pixel images, the groups of images with anisotropy angles between
30° and 60°, and the group of images smoothed with the kernel representing the
regular screen-film system were statistically different from their respective
reference groups. For the other groups of images, no statistically significant
differences were found.
3.3 Tibia head
The tibia head database was used to investigate the effect of varying x-ray
measurement conditions, i.e., exposure, magnification, and projection angle, on
the SDM. The database contained images of anteroposterior radiographs of
human tibia head (provided by the Bank of Bone and Tissues, Hollywood Private
Hospital, Perth, Australia). The radiographs were obtained using a Shimadzu
Corporation (Kyoto, Japan) model P-20 x-ray machine with a fine sharp film.
They were digitized using a film scanner with a resolution of 50 µm per pixel and
quantized into 256 gray levels. The database was previously used for the
evaluation of the modified Hurst orientation transform method [17].
The tibia head database was used to construct three image databases. Each image
database contained groups of 25 overlapping regions of 256×256 pixels selected
under the medial compartment of the tibia head.
1. The first image database was constructed from radiographs taken with
seven different exposures, i.e., 2.5, 5, 7.5, 10, 16, 24, and 30 mA s. The total
number of images in the database was 175 (i.e., one group per exposure and
25 images per group).
2. The second image database contained x-ray images taken with
magnifications ×1.00, ×1.13, ×1.23, and ×1.35, respectively. The exposure
was set to 16 mA s. The database contained four groups with the total
number of 100 images (i.e., one group per magnification and 25 images per
group).
3. The third image database contained x-ray images of the tibia head taken at
0°, 5°, 10°, and 15° projection angles. The exposure was set to 24 mA s. The
database contained four groups with the total number of 100 images (i.e.,
one group per projection angle and 25 images per group).
Three experiments were conducted using the image databases. In all
experiments, the scale range used in the SDM was set to $\sigma_n = 1.1^n$, $n = 1, \ldots, 25$,
for the same reason as before. In the first experiment, the effect of exposure
changes on the SDM was investigated using the first image database. Each group
of images was compared against a reference group using the Mann-Whitney
test. The group of images obtained from the radiographs taken with exposure
of 2.5 mA s was selected by a radiologist as the reference group based on visual
examination. The mean (99% CI) values for each sample were also calculated.
In the second experiment, the effect of magnification changes was examined
using the second database. Each group of images was compared against the
reference group of images taken at magnification ×1.00.
In the third experiment, the effects of changes in projection angle were
investigated using the third database. The group of images taken at a 0° projection
angle was used as the reference group.
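For reference, the scale range used in these experiments can be tabulated directly. The snippet below is a small sketch that assumes the scales are read as 1.1 raised to the power n and expressed in pixels; the variable names are illustrative only.

```python
# Scale range sigma_n = 1.1**n for n = 1, ..., 25 (units of pixels assumed).
scales = [1.1 ** n for n in range(1, 26)]

# Consecutive scales differ by a constant factor, so they are evenly
# spaced on a logarithmic axis.
ratios = [b / a for a, b in zip(scales, scales[1:])]

print(f"number of scales: {len(scales)}")
print(f"smallest scale:   {scales[0]:.2f}")
print(f"largest scale:    {scales[-1]:.2f}")
```

The largest scale is about 10.83, roughly ten times the smallest, which is the sense in which the range covers a wide span of texture detail sizes.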
The results obtained from the first experiment are listed in Table 7. These results
are mean (99% CI) values of the dissimilarity measures calculated between seven
groups of images and the reference group. The table shows that the
CIs overlap. At the 0.01 significance level, no statistically significant differences
were found between the reference group and the remaining groups of images.
The results obtained from the second experiment are shown in Table 8. No
statistically significant differences were found between the groups of images.
For the third experiment, it was found that groups of images taken at projection
angles greater than 5° were statistically different from the reference group (P<0.01),
as shown in Table 9.
3.4 Healthy and osteoarthritic knees
The image database of healthy and OA knees was used to evaluate the
performance of the SDM in knee OA detection. The radiographs were taken
from 17 healthy and 34 OA subjects. Each subject was locked in a standardized
standing position [44]. Two radiographs were taken per subject (i.e., one per
knee). Each tibiofemoral compartment in each knee radiograph was graded on the KL
scale (0–4) by two radiologists with ∼10 years of experience. The compartments were
graded according to the atlas from the Osteoarthritis Research Society International
[45]. Disagreements in the KL grade between the two readers were adjudicated
by a third reader, a radiologist with 15 years of experience. The level of
disagreement for the database used was 14.6%. Subjects were divided into
healthy and OA groups based on the KL grades. Healthy subjects had both
tibiofemoral compartments in both knees assigned with KL grade 0 (no OA).
OA subjects had at least one tibiofemoral compartment in any knee assigned
with KL grade 2 (minimal OA) or 3 (moderate OA). This radiographic criterion
roughly correlates with the progression of OA in knee joints. Knee radiographs
were taken in the Perth Radiographic Clinic, Subiaco.
There were 137 TB texture images of size 256×256 pixels each. For healthy
subjects, TB texture images were extracted under the medial and lateral
compartments using the automated method developed in a previous study
[46]. For OA subjects, the images were extracted under the compartment
diagnosed with radiographic OA using the same method. The images of healthy
knees from OA subjects were not used in this study. This resulted in healthy
and OA classes with 68 and 69 images per class, respectively. The classes were
matched by age, body mass index (BMI), and gender as shown in Table 10.
Examples of TB texture images are shown in Figs. 4 and 5. The non-uniform
background brightness of TB texture images is sometimes corrected during
preprocessing for unbiased analyses [6]. However, since the SDM is based on
differential operators, the absolute values of pixels do not affect the measure. For
this reason, only the unprocessed TB texture images were used in the experiment.
Two SDM-based systems were used for knee OA detection, i.e., an SDM-nearest
neighbour (SDM-NN) and an SDM-support vector machine (SDM-SVM). In the
first system, the measure developed was used with an NN classifier for the
same reason as explained before. In the second system, an SVM classifier with a
radial basis kernel was used because of its good performance for a wide range
of binary classification problems. The classification accuracy was estimated
using a leave-one-out cross-validation method. The method selects a single
TB texture image for testing and uses the remaining images for training. The
experiment was repeated 137 times so that each image was used for testing only
once. Classification accuracy was calculated as the average value of accuracies
obtained from all repetitions. The scale range in the SDM was set to
$\sigma_n = 2 \times 1.07^n$, $n = 1, \ldots, 25$, which corresponded to trabecular sizes from 0.11 to
1.08 mm. Previous studies based on fractal analyses showed that differences
in FD of TB textures between subjects with healthy and OA knees are most
significant in this range [4, 6]. The performances of the proposed systems were
compared against a benchmark weighted neighbor distance using compound
hierarchy of algorithms representing morphology (WND-CHARM) classification
system [47] and a Tamura features based classification system. The former system
was successfully used in previous studies of knee OA detection and prediction
based on the texture analysis of knee joint areas [21, 22]. In this system, 2885
features were first extracted from each TB texture image, including Zernike, Tamura,
and Haralick features, multiscale histograms, and Chebyshev statistics. A Fisher
score was then calculated for each feature and 10% of features with the highest
scores were selected. The features selected were then used by a weighted nearest
neighbour classifier. In the latter system, Tamura features such as contrast,
directionality, and modified coarseness (average and three-bin histogram) were
used because of their similarity to the signatures developed in quantifying texture
roughness and orientation. An SVM classifier with a radial basis kernel function was
used.
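The leave-one-out procedure described above for the SDM-NN system can be sketched as follows. This is an illustrative sketch only: it assumes the pairwise dissimilarities have already been computed into a matrix, the function name `loo_nn_accuracy` is hypothetical, and the toy data are not from the thesis. Because each repetition tests a single image, its accuracy is either 0 or 1, so the average over all repetitions equals the fraction of correctly classified images.

```python
def loo_nn_accuracy(dist, labels):
    """Leave-one-out accuracy of a 1-NN classifier on a dissimilarity matrix.

    dist[i][j] is the dissimilarity between images i and j (symmetric,
    zero diagonal); labels[i] is the class of image i.
    """
    n = len(labels)
    correct = 0
    for test in range(n):
        # Train on all images except the held-out one.
        train = [j for j in range(n) if j != test]
        nearest = min(train, key=lambda j: dist[test][j])
        correct += labels[nearest] == labels[test]
    return correct / n

# Toy example (assumption): 1-D "images" with absolute difference as the
# dissimilarity. The last OA point lies closer to the healthy cluster.
points = [0.0, 0.2, 0.3, 5.0, 5.1, 2.4]
labels = ["healthy", "healthy", "healthy", "OA", "OA", "OA"]
dist = [[abs(a - b) for b in points] for a in points]
accuracy = loo_nn_accuracy(dist, labels)
print(f"leave-one-out accuracy: {accuracy:.3f}")
```

Working from a precomputed matrix suits a dissimilarity-based classifier, since no feature vectors are needed and each held-out image simply ignores its own row entry.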
Table 11 presents confusion matrices obtained for the SDM-NN, SDM-SVM,
WND-CHARM, and Tamura systems in knee OA detection. From these matrices,
the classification accuracy, specificity (percentage of healthy subjects classified
as healthy), and sensitivity (percentage of OA subjects classified as OA) were
calculated. The SDM-NN system achieved an accuracy of 78.8%, a specificity of
80.9%, and a sensitivity of 76.8%. The corresponding values were 85.4%, 83.8%,
and 87.0% for the SDM-SVM system; 64.2%, 70.6%, and 58.0% for the
WND-CHARM system; and 67.9%, 85.3%, and 50.7% for the Tamura system.
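The three measures can be recomputed from confusion-matrix counts, as in the sketch below. The counts used here are an assumption: they were back-calculated from the reported percentages and the class sizes (68 healthy and 69 OA images) and are not copied from Table 11. The function name `metrics` is illustrative.

```python
def metrics(tn, fp, fn, tp):
    """Accuracy, specificity and sensitivity (in %) from a 2x2 confusion matrix.

    tn/fp count healthy images classified as healthy/OA;
    fn/tp count OA images classified as healthy/OA.
    """
    accuracy = 100.0 * (tn + tp) / (tn + fp + fn + tp)
    specificity = 100.0 * tn / (tn + fp)
    sensitivity = 100.0 * tp / (tp + fn)
    return accuracy, specificity, sensitivity

# Counts inferred from the reported percentages (assumption), with
# 68 healthy and 69 OA images; order is (tn, fp, fn, tp).
counts = {
    "SDM-NN": (55, 13, 16, 53),
    "SDM-SVM": (57, 11, 9, 60),
    "WND-CHARM": (48, 20, 29, 40),
    "Tamura": (58, 10, 34, 35),
}
results = {
    name: tuple(round(v, 1) for v in metrics(*c)) for name, c in counts.items()
}
for name, (acc, spec, sens) in results.items():
    print(f"{name:10s} accuracy {acc}%  specificity {spec}%  sensitivity {sens}%")
```

Under these inferred counts, the rounded values reproduce the percentages quoted above for all four systems.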
To find which parts of the tibia have the strongest signal for OA detection, TB
texture images were extracted from a search region of the tibia head as shown in
Fig. 6. The search region was manually selected from 34 healthy and 57 OA knee
x-ray images, excluding subchondral bone sclerosis, periarticular osteopenia, and
fibula head. The region was resized, if necessary, to fit into a 1280×640-pixel
rectangle. From each region selected, 27 TB texture images of size 256×256 pixels
were extracted with a 128-pixel overlap. This resulted in 27 databases of texture
images of healthy and OA knees; one database per tibia part. Healthy knees had
both tibiofemoral compartments assigned with KL grade 0. OA knees had at
least one compartment assigned with KL grade 2 or 3. The SDM-NN system was
trained using 60 randomly selected TB texture images (30 images per class) and
tested using the remaining 31 images. This experiment was repeated 300 times.
The classification accuracy was calculated as the average value of accuracies
obtained from all repetitions. Classification accuracies calculated for all tibia
parts are shown in Fig. 6. The TB texture located immediately under the medial
compartment provided the strongest signal for knee OA detection (classification
accuracy of 77.1%).
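The repeated random split used for each tibia part can be sketched as below. This is a minimal illustration, assuming a precomputed dissimilarity matrix and a 1-NN rule; the function name, the seed, and the synthetic data are hypothetical and chosen only to mirror the class sizes reported above (34 healthy and 57 OA knees, 30 training images per class, 31 test images).

```python
import random

def repeated_holdout_accuracy(dist, labels, n_train_per_class=30,
                              n_repeats=300, seed=0):
    """Mean 1-NN accuracy over repeated random train/test splits.

    dist[i][j] is a precomputed dissimilarity; labels[i] is the class.
    Each repeat draws n_train_per_class training images from every class
    and tests on all remaining images.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    accuracies = []
    for _ in range(n_repeats):
        train = []
        for c in classes:
            members = [i for i, lab in enumerate(labels) if lab == c]
            train += rng.sample(members, n_train_per_class)
        train_set = set(train)
        test = [i for i in range(len(labels)) if i not in train_set]
        hits = 0
        for i in test:
            nearest = min(train, key=lambda j: dist[i][j])
            hits += labels[nearest] == labels[i]
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)

# Synthetic, well-separated 1-D data (assumption), so every split
# classifies all 31 test images correctly.
values = [float(i) for i in range(34)] + [100.0 + i for i in range(57)]
labels = ["healthy"] * 34 + ["OA"] * 57
dist = [[abs(a - b) for b in values] for a in values]
mean_accuracy = repeated_holdout_accuracy(dist, labels, n_repeats=20)
print(f"mean accuracy over 20 splits: {mean_accuracy:.3f}")
```

Drawing the same number of training images from each class keeps the training set balanced even though the two classes differ in size.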
3.5 Computational times
The computational time required by the SDM to calculate the roughness and
orientation signatures for a single TB texture image was 1.8 s. The calculation
of the EMD between the two pairs of signatures took an additional 0.2 s. The
SDM was implemented in MATLAB using the Parallel Computing Toolbox, and the
computational times were measured on an Intel Core 2 Quad computer with 4 GB
of RAM and a 2.66 GHz clock. At this stage, the MATLAB code developed is not
available.
4. Discussion and conclusion
For the classification of TB texture images, a new dissimilarity measure, i.e.,
the SDM, was developed. Unlike feature extraction methods, the SDM does not
produce texture parameters. For the measure, special roughness and orientation
signatures that are invariant to brightness shift, rotation, and scale changes in a
predefined range were developed.
The accuracy of the dissimilarity measure in texture classification was
investigated using Brodatz images. Lower accuracies of the SDM system at
16×16 and 30×30 pixels can be explained by the fact that the gradient and the
Laplacian differential operators use more extrapolated pixel values lying outside
the image boundary than the LBP operator. This problem could be overcome by
eliminating higher scales from the scale range used by the SDM. However, the
elimination of higher scales could limit the scale invariance of the SDM and this
could lead to high classification errors. The other way to overcome this problem
is by increasing training image sizes to at least 60×60 pixels.
The invariance of the SDM to noise, blur, changes in anisotropy direction, and
image size was evaluated using fractal texture images. It was found that the
SDM is invariant to Gaussian and Poisson noise contributing up to 25% of
the image pixel values. This is because changes in pixel values due to noise do
not affect image anisotropy and spatial arrangements of blob regions. The results
showed that the SDM is sensitive to the blur generated by the regular
screen-film system. The reason for this is that the blur suppresses texture details in
the image. This affects image roughness and, subsequently, changes the roughness
signature. The problem could be solved by eliminating lower scales from the scale
range used by the SDM. However, this would limit the scale invariance of the
dissimilarity measure. Therefore, for the best performance of the SDM, sharp
screen-film systems should be used.
The SDM was invariant to changes in image anisotropy except for angles between
30° and 60°. For these angles, the texture roughness is most affected by the pixel
interpolation used to represent anisotropy in directions other than horizontal and
vertical. This effect could be reduced by increasing the image resolution. It was
observed that the SDM is sensitive to the reduction in image size from 128×128 to
64×64 pixels. The reason is that in smaller images, the percentage of extrapolated
pixels used by the differential operators is higher than in larger images. The
other reason is that smaller images contain less detailed information about texture
roughness. The problem could be overcome by using image sizes larger than
64×64 pixels.
The invariance of the SDM to exposure, magnification and projection angle was
investigated using the tibia head database. Results obtained showed that the
SDM is invariant to exposure (2.5–30 mA s) and magnification (×1.00–×1.35).
Exposure causes changes in the width and position of the image histogram [48]. The
SDM is insensitive to these changes because the roughness measure $R_{\mathrm{norm}}(x, y)$ is
normalized and the differential operators used are invariant to brightness shift.
The invariance to magnification was achieved because the differential operators
are based on the scale-space theory. It was found that projection angles >5° affect
the measure. This is because the front view of bone texture is obstructed at higher
angles. Therefore, all precautions should be taken to ensure that radiographs are
taken at similar projection angles.
The performances of the SDM based classification systems in knee OA detection
were evaluated using classes of healthy and OA knee images. For the SDM-NN
system, the classification accuracy of 78.8% can be considered as relatively good
when compared against the other systems, but it is lower than the detection
rate of radiographic OA obtained by an experienced human reader (∼100%).
The values of specificity and sensitivity were similar, which indicates that the
SDM has a good detection trade-off between healthy and OA classes. However,
the sensitivity value (76.8%) was a little lower than the specificity value (80.9%),
which is usually inadvisable in medical applications. In this case, a subject
with OA is more likely to be classified as healthy than the other way
around. Examples of misclassified TB texture images are shown in Fig. 7. In
these examples the bone structures, especially trabeculae orientations, are similar
to those found in images from the opposite class (Figs. 5 and 7). The SDM-SVM
system achieved the best overall classification accuracy (85.4%). One possible
reason could be that the SVM classifier can account for possible nonlinear
decision boundaries between healthy and OA classes. The other reason could
be that the classifier works well in the presence of outliers or class overlapping
near the true decision boundary. The values of sensitivity (87.0%) and specificity
(83.8%) are slightly unbalanced in favour of the knee OA detection rate, which is
preferable. However, the classification accuracy of the SDM-SVM system is
still lower than the detection rate obtained by an experienced human reader.
The accuracies of these SDM based systems could be improved by adjusting
parameters of the SDM and using multiclassifiers.
The effect of different parts of the tibia on the performance of the SDM-NN
system was studied. It was found that the strongest signal related to OA detection
is provided by TB texture located immediately under the medial compartment
(77.1%). One possible reason is that most OA progression occurs medially due
to greater biomechanical load on this site of the knee [1]. This is supported by
the previous studies conducted on fractal analyses [4, 6]. The second strongest
signal was found for the part of the tibia located under the lateral compartment.
However, the classification accuracy calculated for this part was relatively low
(70.4%). The reason could be that for 57 OA knees used in the analysis there were
49 and 34 knees with radiographic OA in the medial and lateral compartments,
respectively. Therefore, the alterations in lateral tibial bone structure due to OA
were represented by fewer TB textures than in the medial case. It can be seen from
Fig. 6 that there is a large overlap between parts of the tibia with the strongest
signals related to OA detection and those selected by the automated method.
Two TB texture images were extracted from most knee radiographs. Since
image properties such as magnification, orientation, and contrast are similar
for TB texture images extracted from the same radiograph, this could lead to
overoptimistic results. However, the effects of these properties on the SDM
are minimized because the measure is scale and rotation invariant, and the
roughness and orientation signatures are normalized. It was found that for
two of the 108 images that were correctly classified by the SDM-NN system,
the nearest neighbour was an image extracted from the same radiograph.
When these two images were instead classified using their second nearest
neighbour, the classification accuracy was 78.8%, i.e., the same as for the
original setup. Since all classifiers are based on the concept of similarity between
objects taken from the same class, one may conclude that the performances of
systems evaluated were not optimistically biased.
Individual contributions of the roughness and orientation signatures to the
performance of the SDM based classification system can provide information
about TB texture changes related to OA. The contributions were calculated in
the following manner: First, for each TB texture image, the distance to its nearest
neighbour was decomposed into the distances associated with the roughness and
orientation, respectively. Then, each distance was normalized in such a way
that their sum was equal to 1. For the correctly classified texture images the
normalized distances were subtracted from 1. Finally, all normalized distances
were averaged. The averaged values obtained for the roughness and orientation
signatures were 0.56 and 0.44, respectively. This suggests that OA affects the
roughness of TB texture more than its orientation.
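One reading of the contribution calculation described above can be sketched as follows. The sketch rests on two assumptions that the text does not spell out: that the distance to the nearest neighbour is the sum of a roughness part and an orientation part, and that a smaller part of a correct match is treated as the larger contribution. The function name and the numbers are hypothetical and are not thesis data.

```python
def signature_contributions(records):
    """Average roughness and orientation contributions.

    records is a list of (d_rough, d_orient, correct) tuples, one per image:
    the two components of the distance to its nearest neighbour and whether
    the image was classified correctly.
    """
    rough_total = 0.0
    orient_total = 0.0
    for d_rough, d_orient, correct in records:
        total = d_rough + d_orient
        r, o = d_rough / total, d_orient / total  # components now sum to 1
        if correct:
            # For correctly classified images the normalized distances
            # are subtracted from 1, as described in the text.
            r, o = 1.0 - r, 1.0 - o
        rough_total += r
        orient_total += o
    n = len(records)
    return rough_total / n, orient_total / n

# Hypothetical distance components for three images.
records = [
    (0.2, 0.8, True),   # correct; roughness distance is the smaller part
    (0.3, 0.7, True),
    (0.6, 0.4, False),  # misclassified; normalized values are kept as is
]
rough, orient = signature_contributions(records)
print(f"roughness: {rough:.2f}, orientation: {orient:.2f}")
```

With only two components, the two averages always sum to 1, which is consistent with the reported pair of values 0.56 and 0.44.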
The SDM based system outperformed the WND-CHARM classification system
in knee OA detection. One possible reason for the worse performance of the
WND-CHARM system (64.2%) is that the features used are not scale invariant.
The other reason is that the features are sensitive to angular space quantization
(e.g., directional Radon transform and Haralick features) and to image rotation
(e.g., Tamura features). Thus, even a small change in magnification or rotation
of TB texture image affects the feature values. Another possible reason could
be that the WND-CHARM system was not specifically designed for TB texture
analysis. However, it works well for the whole area of knee joints [21]. The value
of sensitivity obtained for the WND-CHARM system was 58.0%. This suggests
that the classification of knees with radiographic OA is almost random.
The SDM based systems achieved better overall classification accuracies than
the Tamura system. Possible reasons are that the Tamura features are sensitive
to image scale and orientation, and they correlate well with human visual
perception while trabeculae patterns in x-ray images are difficult to classify by
human readers. The sensitivity value obtained for the Tamura system was 50.7%
which suggests that the classification of OA knees is random.
The results obtained indicate the potential of SDM in decision support systems
for the detection and prediction of knee OA based on TB texture analysis. The
measure could also be useful in the classification of other medical images. This
includes, for example, classification of breast lesions using ultrasound images
[49], diagnosis of interstitial lung disease based on chest radiography [50, 51],
classification of dermoscopic images for skin lesions [52], and, after extending
the SDM to three dimensions, classification of brain tumours from MRI [53].
Future research will focus on refining SDM based classification systems, e.g., by
using multiclassifiers, and evaluating the systems using large databases in clinical
settings.
Acknowledgements
Financial support from the School of Mechanical Engineering, University of
Western Australia is greatly appreciated.
5. References
[1] E L Radin and R M Rose. “Role of Subchondral Bone in the Initiation
and Progression of Cartilage Damage”. In: Clinical Orthopaedics and Related
Research 213 (1986), pp. 34–40.
[2] L Kamibayashi, U P Wyss, T D V Cooke, and B Zee. “Trabecular
Microstructure in the Medial Condyle of the Proximal Tibia of Patients with
Knee Osteoarthritis”. In: Bone 17 (1995), pp. 27–35.
[3] L Kamibayashi, U P Wyss, T D V Cooke, and B Zee. “Changes in mean
trabecular orientation in the medial condyle of the proximal tibia in
osteoarthritis”. In: Calcified Tissue International 57 (1995), pp. 69–73.
[4] E A Messent, R J Ward, C J Tonkin, and C Buckland Wright. “Tibial
cancellous bone changes in patients with knee osteoarthritis. A short term
longitudinal study using Fractal Signature Analysis”. In: Osteoarthritis and
Cartilage 13 (2005), pp. 463–470.
[5] C Buckland Wright. “Subchondral bone changes in hand and knee
osteoarthritis detected by radiography”. In: Osteoarthritis and Cartilage 12
(2004), S10–S19.
[6] P Podsiadlo, L Dahl, M Englund, L S Lohmander, and G W Stachowiak.
“Differences in trabecular bone texture between knees with and without
radiographic osteoarthritis detected by fractal methods”. In: Osteoarthritis
and Cartilage 16 (2008), pp. 323–329.
[7] H Defossez, R M Hall, P G Walker, B M Wroblewski, P D Siney, and B
Purbach. “Determination of the trabecular bone direction from digitised
radiographs”. In: Medical Engineering & Physics 25 (2003), pp. 719–729.
[8] E Lespessailles, C Chappard, N Bonnet, and C L Benhamou. “Imaging
techniques for evaluating bone microarchitecture”. In: Joint Bone Spine 73
(2006), pp. 254–261.
[9] P Dieppe, J Cushnaghan, P Young, and J Kirwan. “Prediction of the
progression of joint space narrowing in osteoarthritis of the knee by bone
scintigraphy”. In: Annals of the Rheumatic Diseases 52 (1993), pp. 557–563.
[10] L Pothuaud, C L Benhamou, P Porion, E Lespessailles, R Harba, and
P Levitz. “Fractal dimension of trabecular bone projection texture is related
to three dimensional microarchitecture”. In: Journal of Bone and Mineral
Research 15 (2000), pp. 691–699.
[11] R Jennane, R Harba, G Lemineur, S Bretteil, A Estrade, and C L Benhamou.
“Estimation of the 3D self similarity parameter of trabecular bone from its
2D projection”. In: Medical Image Analysis 11 (2007), pp. 91–98.
[12] L Pothuaud, P Carceller, and D Hans. “Correlations between grey level
variations in 2D projection images (TBS) and 3D microarchitecture:
Applications in the study of human trabecular bone microarchitecture”. In:
Bone 42 (2008), pp. 775–787.
[13] G Luo, J H Kinney, J J Kaufman, D Haupt, A Chiabrera, and
R S Siffert. “Relationship between plain radiographic patterns and
three-dimensional trabecular architecture in the human calcaneus”. In:
Osteoporosis International 9 (1999), pp. 339–345.
[14] J A Lynch, D J Hawkes, and J C Buckland Wright. “Analysis of texture in
macroradiographs of osteoarthritic knees using the fractal signature”. In:
Physics in Medicine and Biology 36 (1991), pp. 709–722.
[15] J C Buckland Wright, J A Lynch, and D G Macfarlane. “Fractal signature
analysis measures cancellous bone organisation in macroradiographs of
patients with knee osteoarthritis”. In: Annals of the Rheumatic Diseases 55
(1996), pp. 749–755.
[16] E A Messent, J C Buckland Wright, and G M Blake. “Fractal analysis of
trabecular bone in knee osteoarthritis (OA) is a more sensitive marker
of disease status than bone mineral density (BMD)”. In: Calcified Tissue
International 76 (2005), pp. 419–425.
[17] P Podsiadlo and G W Stachowiak. “Analysis of trabecular bone texture
by modified Hurst orientation transform”. In: Medical Physics 29 (2002),
pp. 460–474.
[18] N L Fazzalari and I H Parkinson. “Fractal dimension and architecture of
trabecular bone”. In: Journal of Pathology 178 (1996), pp. 100–105.
[19] I H Parkinson and N L Fazzalari. “Methodological principles for
fractal analysis of trabecular bone”. In: Journal of Microscopy 198 (2000),
pp. 134–142.
[20] M Wolski, P Podsiadlo, and G W Stachowiak. “Directional fractal signature
analysis of trabecular bone: evaluation of different methods to detect
early osteoarthritis in knee radiographs”. In: Proceedings of the Institution
of Mechanical Engineers - Part H: Journal of Engineering in Medicine 223 (2009),
pp. 211–236.
[21] L Shamir, S M Ling, W W Scott, Jr., A Bos, N Orlov, T J Macura, D M Eckley,
L Ferrucci, and I G Goldberg. “Knee x-ray image analysis method for
automated detection of osteoarthritis”. In: IEEE Transactions on Biomedical
Engineering 56 (2009), pp. 407–415.
[22] L Shamir, S M Ling, W Scott, M Hochberg, L Ferrucci, and I G Goldberg.
“Early detection of radiographic knee osteoarthritis using computer aided
analysis”. In: Osteoarthritis and Cartilage 17 (2009), pp. 1307–1312.
[23] J H Kellgren and J S Lawrence. “Radiological assessment of osteoarthrosis”.
In: Annals of the Rheumatic Diseases 16 (1957), pp. 494–502.
[24] R P W Duin, D de Ridder, and D M J Tax. “Experiments with a featureless
approach to pattern recognition”. In: Pattern Recognition Letters 18 (1997),
pp. 1159–1166.
[25] E Pekalska, P Paclik, and R P W Duin. “A generalized kernel approach to
dissimilarity based classification”. In: Journal of Machine Learning Research 2
(2001), pp. 175–211.
[26] E Pekalska and R P W Duin. “Dissimilarity representations allow
for building good classifiers”. In: Pattern Recognition Letters 23 (2002),
pp. 943–956.
[27] H Jegou, C Schmid, H Harzallah, and J Verbeek. “Accurate Image Search
Using the Contextual Dissimilarity Measure”. In: IEEE Transactions on
Pattern Analysis and Machine Intelligence 32 (2010), pp. 2–11.
[28] N Sudha and Y H K Wong. “Hausdorff distance for iris recognition”. In:
22nd IEEE International Symposium on Intelligent Control. 2007, pp. 614–619.
[29] W Zhu, T Jiang, and X Li. “Local region based medical image segmentation
using J divergence measures”. In: 27th Annual International Conference of the
Engineering in Medicine and Biology Society. 2005, pp. 7174–7177.
[30] Y Rubner, C Tomasi, and L J Guibas. “The Earth Movers Distance as a
metric for image retrieval”. In: International Journal of Computer Vision 40
(2000), pp. 99–121.
[31] T Lindeberg. “Scale-space for discrete signals”. In: IEEE Transactions on
Pattern Analysis and Machine Intelligence 12 (1990), pp. 234–254.
[32] J Babaud, A P Witkin, M Baudin, and R O Duda. “Uniqueness of the
Gaussian kernel for scale-space filtering”. In: IEEE Transactions on Pattern
Analysis and Machine Intelligence PAMI 8 (1986), pp. 26–33.
[33] H Tamura, S Mori, and T Yamawaki. “Textural features corresponding to
visual perception”. In: IEEE Transactions on Systems, Man and Cybernetics
SMC-8 (1978), pp. 460–473.
[34] M Wei-Ying and Z H Jiang. “Benchmarking of image features for
content-based retrieval”. In: Conference Record of the Thirty-Second Asilomar
Conference on Signals, Systems & Computers. 1998, pp. 253–257.
[35] V Castelli and L D Bergman. Image Databases: Search and Retrieval of Digital
Imagery. John Wiley & Sons, Inc., New York, 2002.
[36] P Brodatz. Textures: A Photographic Album for Artists and Designers. Dover
Publications, New York, 1966.
[37] T Ojala, M Pietikainen, and T Maenpaa. “Multiresolution gray-scale and
rotation invariant texture classification with local binary patterns”. In:
IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002),
pp. 971–987.
[38] T Randen and T J H Husoy. “Filtering for texture classification: A
comparative study”. In: IEEE Transactions on Pattern Analysis and Machine
Intelligence 21 (1999), pp. 291–310.
[39] J Zhang, M Marszalek, S Lazebnik, and C Schmid. “Local features and
kernels for classification of texture and object categories: a comprehensive
study”. In: International Journal of Computer Vision 73 (2007), pp. 213–238.
[40] J C Russ. Fractal surfaces. Plenum Press, New York, 1994.
[41] A G Haus and S M Jaskulski. The Basics of Film Processing in Medical Imaging.
Medical Physics Publishing, Madison, 1997.
[42] J A Bencomo and B G Fallone. “A logit model for the modulation transfer
function of screen film systems”. In: Medical Physics 13 (1986), pp. 857–860.
[43] H B Mann and D R Whitney. “On a test of whether one of two
random variables is stochastically larger than the other”. In: The Annals of
Mathematical Statistics 18 (1947), pp. 50–60.
[44] P Podsiadlo and G W Stachowiak. “A rig for acquisition of standardized
trabecular bone radiographs”. In: Acta Radiologica 43 (2002), pp. 101–103.
[45] R D Altman, M Hochberg, W A Murphy, Jr., F Wolfe, and M
Lequesne. “Atlas of individual radiographic features in osteoarthritis”. In:
Osteoarthritis and Cartilage 3 (1995), A3–A70.
[46] P Podsiadlo, M Wolski, and G W Stachowiak. “Automated selection of
trabecular bone regions in knee radiographs”. In: Medical Physics 35 (2008),
pp. 1870–1883.
[47] N Orlov, L Shamir, T Macura, J Johnston, M D Eckley, and I G Goldberg.
“WND-CHARM: Multi-purpose image classification using compound
image transforms”. In: Pattern Recognition Letters 29 (2008), pp. 1684–1693.
[48] J C Russ. The Image Processing Handbook, 4th ed. CRC Press, Florida, 2002.
[49] B Liu, H D Cheng, J Huang, J Tian, X Tang, and J Liu. “Fully automatic
and segmentation-robust classification of breast tumors based on local
texture analysis of ultrasound images”. In: Pattern Recognition 43 (2010),
pp. 280–298.
[50] B van Ginneken, L Hogeweg, and M Prokop. “Computer-aided diagnosis
in chest radiography: Beyond nodules”. In: European Journal of Radiology 72
(2009), pp. 226–230.
[51] S G Armato III, A S Roy, H MacMahon, F Li, K Doi, S Sone, and M B Altman.
“Evaluation of automated lung nodule detection on low-dose computed
tomography scans from a lung cancer screening program”. In: Academic
Radiology 12 (2005), pp. 337–346.
[52] C Serrano and B Acha. “Pattern analysis of dermoscopic images based on
Markov random fields”. In: Pattern Recognition 42 (2009), pp. 1052–1057.
[53] P Georgiadis, D Cavouras, I Kalatzis, D Glotsos, E Athanasiadis,
S Kostopoulos, K Sifaki, M Malamas, G Nikiforidis, and E Solomou.
“Enhancing the discrimination accuracy between metastases, gliomas and
meningiomas on brain MRI by volumetric textural features and ensemble
pattern recognition methods”. In: Magnetic Resonance Imaging 27 (2009),
pp. 120–130.
6. List of figures
[Figure 1 plots: bin weights against the roughness measure Rnorm(x,y); MR = -0.32 for the reptile skin texture and MR = 0.05 for the fractal texture.]
Figure 1: Texture images and their respective roughness signatures: [(a) and (b)] A reptile skin texture taken from Brodatz album; [(c) and (d)] an isotropic fractal texture image with FD 2.9.
Figure 2: Steps required for the selection of edge angles in texture image: (a) A trabecular bone texture image; (b) and (c) minimum (bright patches represent hills) and maximum (bright patches represent valleys) values of the Laplacian operator Diff2 calculated for the bone texture image; (d) a binary image of the minimum and maximum values of Diff2 (white regions represent minima; black regions represent maxima); (e) edges represented as a perimeter of the binary image; and (f) the bone texture image with superimposed edges. Edge angles are selected at the locations of edges.
Figure 3: Brodatz texture images and their respective orientation signatures: [(a), (b), (c), and (d)] Leather texture images rotated at 70° and 120°; [(e) and (f)] a weave texture image.
Figure 4: Examples of TB texture regions extracted from the medial and lateral compartments of the tibia.
Figure 5: Examples of TB texture images taken from [(a) and (b)] healthy and [(c) and (d)] OA classes.
Figure 6: Classification accuracies (%) calculated for different parts of tibia head. Two areas corresponding to the highest classification accuracies are highlighted. Square regions with white boundary were selected by the automated method.
Figure 7: Examples of misclassified images of [(a) and (b)] healthy and [(c) and (d)] OA TB textures.
7. List of tables
Table 1: Classification accuracies of Brodatz textures calculated using four training angles.
                          Accuracy (%)
Size of training images   SDM     LBP
16×16                     87.35   97.91
30×30                     94.79   98.21
60×60                     98.80   98.21
90×90                     98.95   98.06
180×180                   99.10   98.06
Table 2: Classification accuracies of Brodatz textures calculated using single training angle.
Size of training images / accuracy (%)
                      SDM                                  LBP
Training angle (deg)  16×16  30×30  60×60  90×90  180×180  16×16  30×30  60×60  90×90  180×180
0                     78.07  88.78  91.96  90.47  93.45    94.54  94.54  95.13  95.03  94.84
20                    83.03  96.32  99.10  99.00  99.10    96.72  98.01  97.91  98.01  98.01
30                    85.81  97.32  99.10  99.10  99.10    97.71  98.11  98.01  98.11  98.01
45                    86.30  96.03  98.80  99.10  99.10    97.51  97.42  97.51  97.61  97.61
60                    86.30  94.44  97.42  97.32  99.10    97.61  97.51  97.71  97.91  97.91
70                    84.53  94.34  95.63  96.62  98.61    98.01  98.01  98.01  98.01  98.01
90                    77.18  91.46  98.01  98.71  98.71    93.15  92.26  92.85  93.05  93.25
120                   83.43  98.21  99.10  99.10  99.10    98.11  97.91  98.01  98.01  98.11
135                   83.92  96.82  98.80  99.10  99.10    97.81  97.32  97.42  97.61  97.61
150                   88.19  94.34  96.42  96.92  98.61    98.21  98.01  98.11  98.11  98.11
Average               83.68  94.81  97.44  97.54  98.40    96.94  96.91  97.07  97.15  97.15
Table 3: Mean (99% CI) values of the SDM calculated for different image sizes; 20 images were used for each image size.
Image size                  Mean (99% CI)
64×64                       0.68 (0.03)∗
128×128 (reference group)   0.58 (0.06)
192×192                     0.52 (0.02)
256×256                     0.53 (0.03)
∗ Statistically significant differences (P<0.01).
Table 4: Mean (99% CI) values of the SDM calculated for different isotropy angles; ten images were used for each anisotropy direction.
Anisotropy direction (deg)   Mean (99% CI)
0 (reference group)          0.58 (0.08)
6                            0.54 (0.03)
12                           0.62 (0.04)
18                           0.60 (0.04)
24                           0.60 (0.04)
30                           0.70 (0.05)∗
36                           0.79 (0.06)∗
42                           0.84 (0.06)∗
48                           0.78 (0.06)∗
54                           0.80 (0.06)∗
60                           0.80 (0.06)∗
66                           0.60 (0.05)
72                           0.58 (0.04)
78                           0.59 (0.04)
84                           0.54 (0.03)
90                           0.55 (0.04)
∗ Statistically significant differences (P<0.01).
Table 5: Mean (99% CI) values of the SDM calculated for different noise contribution, no statistically significant differences were found; 20 images were used for each noise type.
Contribution of noise (%)   Mean (99% CI) (Poisson noise)   Mean (99% CI) (Gaussian noise)
0 (reference group)         0.66 (0.05)                     0.66 (0.05)
5                           0.64 (0.03)                     0.64 (0.03)
10                          0.64 (0.03)                     0.64 (0.03)
15                          0.64 (0.03)                     0.64 (0.03)
20                          0.64 (0.03)                     0.64 (0.03)
25                          0.64 (0.03)                     0.64 (0.03)
Table 6: Mean (99% CI) values of the SDM calculated for different film screens; 20 images were used for each film screen.
Film screen Mean (99% CI)
No screen (reference group) 0.60 (0.04)
Fine 0.62 (0.02)
Regular 0.64 (0.02)∗
∗ Statistically significant differences (P<0.01).
Table 7: Mean (99% CI) values of the SDM calculated for different exposures, no statistically significant differences were found; 25 images were used for each exposure.
Exposure value (mA s) Mean (99% CI)
2.5 (reference group) 0.71 (0.07)
5 0.70 (0.03)
7.5 0.69 (0.03)
10 0.69 (0.03)
15 0.65 (0.03)
24 0.63 (0.03)
30 0.67 (0.03)
Table 8: Mean (99% CI) values of the SDM calculated for different magnifications, no statistically significant differences were found; 25 images were used for each magnification.
Magnification Mean (99% CI)
×1.00 (reference group) 0.67 (0.06)
×1.13 0.64 (0.03)
×1.23 0.64 (0.03)
×1.35 0.70 (0.03)
Table 9: Mean (99% CI) values of the SDM calculated for different projection angles; 25 images were used for each angle.
Projection angle (deg) Mean (99% CI)
0 (reference group) 0.53 (0.03)
5 0.54 (0.01)
10 0.58 (0.02)∗
15 0.71 (0.03)∗
∗ Statistically significant differences (P<0.01).
Table 10: Details of healthy and OA classes.
Class     No. of TB texture images   Mean (SD) age in years   Mean (SD) BMI in kg/m²   Men/Women
Healthy   68                         40.8 (7.4)               24.6 (4.5)               0.70
OA        69                         44.5 (7.5)               27.1 (4.0)               0.69
Table 11: Confusion matrices of SDM-NN, SDM-SVM, WND-CHARM, and Tamura systems for knee OA detection.
          SDM-NN        SDM-SVM       WND-CHARM     Tamura
          Healthy  OA   Healthy  OA   Healthy  OA   Healthy  OA
Healthy   55       13   57       11   48       20   58       10
OA        16       53   9        60   29       40   34       35
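As an illustration, summary measures can be derived from the confusion matrices in Table 11. The sketch below assumes that rows are true classes and columns are predicted classes (the row sums of 68 and 69 match the class sizes in Table 10) and that OA is treated as the positive class. The dictionary layout and function names are hypothetical.

```python
# Confusion matrices from Table 11: ((healthy->healthy, healthy->OA), (OA->healthy, OA->OA)).
# Assumption: rows are true classes, columns are predicted classes.
CONFUSION = {
    "SDM-NN": ((55, 13), (16, 53)),
    "SDM-SVM": ((57, 11), (9, 60)),
    "WND-CHARM": ((48, 20), (29, 40)),
    "Tamura": ((58, 10), (34, 35)),
}


def summarize(matrix):
    """Return (accuracy, sensitivity, specificity) with OA as the positive class."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    sensitivity = tp / (tp + fn)  # fraction of OA images classified as OA
    specificity = tn / (tn + fp)  # fraction of healthy images classified as healthy
    return accuracy, sensitivity, specificity


for name, matrix in CONFUSION.items():
    acc, sens, spec = summarize(matrix)
    print(f"{name}: accuracy {100 * acc:.1f}%, "
          f"sensitivity {100 * sens:.1f}%, specificity {100 * spec:.1f}%")
```

Under these assumptions the overall accuracies are 108/137, 117/137, 88/137 and 93/137 for SDM-NN, SDM-SVM, WND-CHARM and Tamura, respectively.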
CHAPTER 3
PREDICTION OF PROGRESSION OF RADIOGRAPHIC
KNEE OSTEOARTHRITIS USING TIBIAL
TRABECULAR BONE TEXTURE
Tomasz Woloszynski1, Pawel Podsiadlo, PhD1, Gwidon W Stachowiak, PhD1,
Marek Kurzynski, PhD2, L Stefan Lohmander, MD, PhD3,4, and Martin
Englund, MD, PhD3,5
1Tribology Laboratory, School of Mechanical and Chemical Engineering,
University of Western Australia, Australia
2Chair of Systems and Computer Networks, Wroclaw University of Technology,
Poland
3Department of Orthopedics, Clinical Sciences, Lund University, Sweden
4Research Unit for Musculoskeletal Function and Physiotherapy, Institute of
Sports Science and Clinical Biomechanics, and Department of Orthopaedics and
Traumatology, University of Southern Denmark, Denmark
5Clinical Epidemiology Research & Training Unit, Boston University School of
Medicine, MA, USA
Arthritis & Rheumatism 2012;64:688–695
CHAPTER 3. PREDICTION OF PROGRESSION . . .
Abstract
Objective. To develop a system for prediction of progression of radiographic
knee osteoarthritis (OA) using tibial trabecular bone (TB) texture.
Methods. We studied 203 knees with (n=68) or without (n=135) radiographic
tibiofemoral OA in 105 subjects (90 men, 15 women, mean age 54 years) who had
2 sets of knee radiographs taken 4 years apart. We determined medial and lateral
compartment tibial TB texture using an automated region selection method.
Three texture parameters were calculated: roughness, degree of anisotropy, and
direction of anisotropy based on a signature dissimilarity measure method. We
evaluated tibiofemoral OA progression using a radiographic semi-quantitative
outcome: an increase in the medial joint space narrowing (JSN) grade. We
examined the predictive ability of TB texture in knees with and without
pre-existing radiographic OA, with adjustment for age, sex, and body mass
index using logistic regression (generalized estimating equations) and receiver
operating characteristic curves.
Results. The prediction of increased medial JSN in knees with or without
pre-existing radiographic OA was the most accurate for medial TB texture; area
under the curve (AUC) was 0.77 and 0.75, respectively. For lateral TB texture,
AUC was 0.71 and 0.72, respectively.
Conclusion. We have developed a system, based on analysing tibial TB texture,
which yields good prediction of loss of tibiofemoral joint space. The predictive
ability of the system needs to be further validated.
Key words: osteoarthritis, radiography
1. Introduction
Osteoarthritis (OA) is the most common knee joint disease and the leading cause
of knee pain and functional disability in adults [1]. On a structural level, knee
OA is characterized by loss of articular cartilage, meniscal tears and maceration,
osteophytes, and microstructural changes in subchondral bone [2–5]. Previous
studies have suggested that bone changes occur before cartilage defects [6]. The
two-dimensional trabecular bone (TB) texture provided by plain radiography
contains information directly related to the three-dimensional bone structure
[7–10]. Because of these findings, there is growing interest in developing a
low-cost and non-invasive TB texture-based system for predicting progression of
early and late knee OA, i.e. a method that examines the size, shape, and orientation
of TB in order to estimate the risk of structural progression of OA.
In a previous study structural features of x-ray images of the whole knee joint
predicted OA progression, defined as an increase from Kellgren and Lawrence
(KL) [11] grade 0 at baseline to grade 2 at follow-up 20 years later [12]. For
the prediction, a Weighted Neighbor Distance using Compound Hierarchy of
Algorithms Representing Morphology (WND-CHARM) classification system
was used and a classification accuracy of 62% was obtained. In a further study
the TB texture was used to predict OA progression defined as an increase in
medial joint space narrowing (JSN) grade over a 3-year period [13]. This study
was done using a system based on a regression model and fractal signature
analysis (FSA). The system achieved the prediction accuracy value (defined as an
area under the receiver operating characteristic curve, AUC) for OA progression
of 0.75. Although results obtained from these two systems are promising, the
interpretation of bone texture changes is not easy. This is because the image
features extracted in the WND-CHARM system and the polynomial coefficients
used in the regression model and FSA based system have little or no physical
meaning. Also, the WND-CHARM system is sensitive to imaging conditions
such as magnification and rotation, while the box-counting technique used in
the calculation of FSA highly depends on trabecular marrow pore size and
signal-to-noise ratio [14].
In the present study, we used a well-defined cohort of subjects with prior
meniscectomy having weight-bearing knee radiographs taken 4 years apart.
We developed trabecular bone structure parameters based on a signature
dissimilarity measure (SDM) method [15] that quantifies roughness, degree of
anisotropy and direction of anisotropy of TB textures. These parameters are
invariant to a range of image magnification, exposure, noise, and blur. Unlike the
previous studies [12, 13], we evaluated progression of both early and late medial
compartment knee OA. This allowed for a thorough assessment of the influence
of changes in TB texture in the different stages of knee OA.
2. Subjects and methods
2.1 Subjects
The study was approved by the ethics committee of the Faculty of Medicine
at Lund University, Sweden and informed consent was obtained from all
participants. Subjects were retrospectively identified via surgical records to have
undergone isolated medial or lateral meniscectomy at Lund University Hospital
in 1983, 1984, or 1985 [16, 17]. Exclusion criteria included cruciate ligament
injury, previous knee surgery (i.e., knee surgery before the index meniscectomy),
meniscectomy in both knee compartments, osteochondritis dissecans, fracture in
or adjacent to the knee, septic arthritis, osteonecrosis, and radiographic signs of
knee OA at the time of meniscectomy [17].
Out of 519 identified subjects, 254 who did not meet any of the exclusion criteria
were invited to participate at the first radiographic follow-up examination (Exam
A) in 2000. Knee x-rays were taken from 155 subjects who were then invited to a
second radiographic examination (Exam B) in 2004. For 106 subjects, longitudinal
knee x-rays were obtained (Table 1). We excluded one subject due to bilateral
radiographic end-stage OA at Exam A (i.e. medial joint space narrowing
(JSN) grade 3, osteotomy or arthroplasty), and a further 6 knees (in 6 subjects)
were excluded for the same reason. One knee was also excluded due to
artefacts present on the film preventing the image analysis, leaving 203 knees
in 105 subjects for analysis.
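The knee and subject counts can be checked with a short calculation. This is only a bookkeeping sketch; it assumes that both knees were imaged in every subject with longitudinal x-rays, which the text implies but does not state explicitly.

```python
# Bookkeeping of the exclusions described above.
subjects_with_longitudinal_xrays = 106
knees_per_subject = 2  # assumption: both knees imaged in every subject

# One subject excluded for bilateral end-stage OA at Exam A.
subjects = subjects_with_longitudinal_xrays - 1
knees = subjects * knees_per_subject

knees -= 6  # end-stage OA in one knee of each of 6 further subjects
knees -= 1  # film artefacts prevented image analysis of one knee

print(subjects, knees)  # prints "105 203"
```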
2.2 Acquisition and grading of knee radiographs
At Exam A, standing anteroposterior (AP) digital radiographs of the tibiofemoral
joint at about 15° flexion were obtained using a fluoroscopically positioned x-ray
beam [17]. At Exam B, posteroanterior digital radiographs of the tibiofemoral
joint were obtained using the fixed flexion (SynaFlexer) protocol [18, 19]. In a pilot
series prior to Exam B, we acquired knee radiographs from 10 subjects (20 knees)
on the same day using both protocols. On grading the radiographs (side-by-side
comparison), we detected some discrepancies but no statistical or systematic
differences between the pairs of knees with respect to semi-quantitative JSN
and osteophytes scoring according to the 1995 atlas of Osteoarthritis Research
Society International (OARSI) [20]. Since bone textures from radiographs at Exam
B were not used, having two different protocols did not affect subsequent image
analyses.
Two readers, who knew the time sequence but were blinded to
clinical data and to the other reader’s readings, graded the paired knee radiographs
from Exam A and B. They read for JSN and osteophytes in the tibiofemoral joint
on a four point scale (0 to 3, where 0 indicates no evidence of JSN or osteophytes)
according to the 1995 OARSI atlas [20]. Interobserver agreement using weighted
kappa was 0.84 for JSN and 0.72 for osteophytes. The films with discrepancy of
any JSN or osteophyte grade between the investigators were then adjudicated,
i.e. a consensus reading was made. At Exam B we classified one subject who
was operated on with proximal tibial valgus osteotomy in the left knee, between
examinations, as having JSN grade 3 in the medial compartment.
2.3 Definitions
Radiographic knee OA
We defined a knee to have radiographic tibiofemoral OA if one or more of the
following criteria were fulfilled in either the medial or lateral compartment:
• JSN grade ≥2
• Sum of marginal osteophytes grades in the same tibiofemoral compartment
≥2
• JSN grade 1 and marginal osteophytes grade 1 in the same tibiofemoral
compartment
These criteria approximate grade 2 or worse on the KL scale.
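The three criteria can be written as a simple rule. The following is a minimal sketch, not the procedure used by the readers: the function name and the input layout are hypothetical, and the third criterion is read literally as JSN grade 1 together with an osteophyte grade sum of 1 in the same compartment.

```python
def has_radiographic_oa(compartments):
    """Return True if any compartment meets at least one of the three criteria.

    `compartments` maps a compartment name to a (jsn_grade, osteophyte_sum)
    pair, where osteophyte_sum is the sum of marginal osteophyte grades in
    that compartment (hypothetical input layout).
    """
    for jsn_grade, osteophyte_sum in compartments.values():
        if jsn_grade >= 2:
            return True
        if osteophyte_sum >= 2:
            return True
        if jsn_grade == 1 and osteophyte_sum == 1:
            return True
    return False


# Example: JSN grade 1 with one grade 1 osteophyte in the medial compartment.
knee = {"medial": (1, 1), "lateral": (0, 0)}
print(has_radiographic_oa(knee))  # prints "True"
```

Note that a JSN grade in one compartment and an osteophyte grade in the other do not combine, since each criterion applies within a single compartment.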
Progression of medial compartment radiographic knee OA
Because lateral compartment OA is rare, we focused on medial compartment OA
only. We defined progression of medial compartment radiographic knee OA as
an increase in the medial compartment JSN grade.
First, we analysed all knees as one group irrespective of radiographic status
at Exam A. Second, we stratified the analysis for the absence or presence of
pre-existing radiographic tibiofemoral OA, as defined above. Hence, early and
late radiographic OA progressions in the medial compartment were evaluated
separately.
2.4 Trabecular bone image analysis
We used digital radiographs taken at Exam A (Phasix 60 generator, CGR, Liege,
Belgium). For the image analysis the radiographs were converted from DICOM
to uncompressed TIFF format and stored as 8-bit gray-scale level images with the
resolution of 146 µm per image pixel. Previous studies showed that 8-bit images
contain sufficient details for evaluation of OA changes [13, 21]. Image analysis
was performed blinded to the outcome.
Region selection
An automated region selection method was used to determine the TB region
of interest (ROI) on the digital radiographs [22]. The method selects the ROI
on the subchondral bone immediately under the cortical plate of the medial
and lateral tibial compartment, respectively, in a series of steps (Fig. 1). The
steps include delineation of cortical bone plates using an active shape model and
fine ROI adjustment for fibula head, periarticular osteopenia, and subchondral
bone sclerosis. Landmarks used are: tibial borders, tibial spine, fibula head, and
cortical plates. Epiphyseal bone and physis are not considered in the selection of
ROIs. Size of each TB texture image selected was 112×112 pixels, which covered
an area of 16.4×16.4 mm.
Bone texture parameters
We calculated three TB texture parameters, i.e. roughness, degree of anisotropy
and direction of anisotropy using the SDM method. In the method, a scale-space
representation of a bone texture image is generated as a set of images in which the
fine-scale features are successively smoothed. This is achieved by the convolution
of the bone image with Gaussian kernels of increasing width parameter (called
scale). 25 scales ranging from 1 to 9 pixels in steps of $1.096^{n}$, $n = 0, \ldots, 24$
were used. Having the image representation, for each pixel the gradient (edge
detector) and Laplacian (smoothness detector) operators are calculated across
all scales and the extremum values of the operators are found. A difference
between the extremum values and an angle associated with the extremum
gradient value define roughness and orientation measures at each pixel location,
respectively. A normalized histogram of the roughness (orientation) measure,
called a roughness (orientation) signature, is then generated. The shape and
position of the roughness signature with respect to zero describe the roughness
of bone texture image. If the signature has the maximum value approximately
at zero this indicates that, on average, a neighbourhood of each pixel does not
resemble either smooth regions or edge patches. For the signature skewed toward
negative values, each pixel has its neighbourhood, on average, with more blobs
(smooth regions) than edges. For a signature skewed toward positive values the opposite holds. The above
procedure is repeated for all bone texture images.
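Two steps of this procedure are easy to illustrate: the geometric progression of scales and the construction of a signature as a normalized histogram. The sketch below is not the SDM implementation; the Gaussian smoothing and the gradient and Laplacian operators are omitted, and the function name, bin range and number of bins are assumptions chosen for illustration.

```python
# 25 scales in geometric progression: 1.096**0 = 1 up to 1.096**24 (about 9 pixels).
scales = [1.096 ** n for n in range(25)]


def signature(values, lo, hi, n_bins):
    """Normalized histogram (bin weights sum to 1) of per-pixel measures.

    `lo`, `hi` and `n_bins` are illustrative choices, not values from the text.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp the index so that out-of-range values fall in the end bins.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


# Toy roughness measures for four pixels, binned into 4 bins over [-4, 4).
print(signature([-1.0, -1.0, 0.5, 3.0], -4.0, 4.0, 4))  # prints "[0.0, 0.5, 0.25, 0.25]"
```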
Roughness (R1,R2)
For roughness (i.e. complexity of a bone texture) measurement Earth Mover’s
Distances (EMDs) [23] between all possible pairs of roughness signatures are
calculated. The EMDs represent an image distance space. We defined two
independent roughness parameters (R1, R2) as a projection of the distances
calculated between the roughness signatures of TB texture images on a
two-dimensional space. For the distance projection, Sammon’s nonlinear
mapping [24] was used. The space dimension was chosen as a trade-off between
avoiding the “curse of dimensionality” and being able to capture possible
nonlinear relations in the distance space. The parameters provide a measure of
the overall texture roughness. Higher roughness indicates that there are more
sharp-edged texture features (i.e. more thin and long trabeculae and more narrow
spaces in between them). The parameters were normalized in such a way that the
smoothest and roughest TB texture in this study corresponded to the values (0,
0) and (1, 1), respectively, for (R1, R2). Characterisation of TB texture roughness
provides valuable information about TB changes in OA [3, 25].
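To illustrate how the distance space is built, the sketch below computes EMDs between all pairs of signatures. It assumes that all signatures are normalized histograms defined on the same equally spaced bins; in that one-dimensional special case the EMD reduces to the summed absolute difference of the cumulative histograms. The general EMD of [23] and the Sammon mapping are not reproduced here, and the function names are hypothetical.

```python
def emd_1d(sig_a, sig_b, bin_width=1.0):
    """EMD between two normalized histograms on the same equally spaced bins."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must be defined on the same bins")
    distance = 0.0
    carried = 0.0  # surplus mass that still has to be moved to later bins
    for a, b in zip(sig_a, sig_b):
        carried += a - b
        distance += abs(carried) * bin_width
    return distance


def pairwise_emd(signatures):
    """Symmetric matrix of EMDs between all pairs of signatures."""
    n = len(signatures)
    return [[emd_1d(signatures[i], signatures[j]) for j in range(n)]
            for i in range(n)]


# Moving all mass from the first bin to the third bin costs two bin widths.
print(emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # prints "2.0"
```

The resulting matrix would then be the input to a distance-preserving projection such as Sammon’s mapping.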
Degree of Anisotropy (DegA)
We defined the degree of anisotropy as the sum of squared bin weights in the
orientation signature, i.e.:
$$\mathrm{DegA} = \sum_{\theta} [S(\theta)]^{2}, \tag{1}$$
where S(θ) is the weight of the angle θ in the orientation signature. The DegA
parameter is a measure of the overall anisotropy of TB texture; the higher value
of the parameter, the higher is the degree of anisotropy (i.e. there are more
sharp-edged texture features aligned in the same direction). The parameter was
normalized in such a way that the values 0 and 1 for DegA represent the least and
most anisotropic TB texture. Anisotropy of TB texture changes with OA [21, 25].
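Equation (1) can be evaluated directly from the bin weights. The sketch below is illustrative only; it assumes a normalized orientation signature (weights summing to 1), for which the unnormalized DegA lies between 1/k for a uniform signature with k bins and 1 for a signature concentrated in a single bin. The min-max rescaling is an assumption about how the normalization to the range 0 to 1 could be done.

```python
def degree_of_anisotropy(orientation_signature):
    """Sum of squared bin weights of an orientation signature, Eq. (1)."""
    return sum(w * w for w in orientation_signature)


def min_max_normalize(values):
    """Rescale values so the smallest maps to 0 and the largest to 1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; normalization is undefined")
    return [(v - lo) / (hi - lo) for v in values]


isotropic = [0.25, 0.25, 0.25, 0.25]   # weight spread evenly over all angles
anisotropic = [0.0, 1.0, 0.0, 0.0]     # all weight at a single angle
print(degree_of_anisotropy(isotropic), degree_of_anisotropy(anisotropic))  # prints "0.25 1.0"
```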
Direction of Anisotropy (DirA)
Direction of anisotropy was defined as the average value of normalized bin
centers in the orientation signature, i.e.
$$\mathrm{DirA} = \frac{\sum_{\theta} \theta \cdot S(\theta)}{\sum_{\theta} S(\theta)}. \tag{2}$$
This parameter measures the weighted average direction of trabeculae
alignments. Each weight S(θ) is proportional to the “sharpness” of the trabeculae
aligned along the angle θ, i.e. longer and thinner (shorter and thicker) trabeculae
have higher (lower) weights. This allows for quantifying the overall direction of
bone texture with a single number that depends on the sharpness and alignment
of trabeculae. DirA is equal to 0° for bone texture that has all trabeculae aligned
to the image vertical direction. OA changes in TB texture at different directions
are significant [21, 26].
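Equation (2) is a weighted mean of the bin centers. A minimal sketch follows, assuming the bin centers are supplied in degrees measured from the vertical direction of the image; the function name and the example angles are hypothetical.

```python
def direction_of_anisotropy(bin_centers_deg, weights):
    """Weighted average of orientation bin centers, Eq. (2)."""
    if len(bin_centers_deg) != len(weights):
        raise ValueError("one weight is required per bin center")
    total = sum(weights)
    if total == 0:
        raise ValueError("orientation signature has zero total weight")
    return sum(t * w for t, w in zip(bin_centers_deg, weights)) / total


# All weight on the 0 degree bin: trabeculae aligned with the vertical direction.
print(direction_of_anisotropy([0.0, 45.0, 90.0], [1.0, 0.0, 0.0]))  # prints "0.0"
# Equal weight at 0 and 90 degrees gives the midpoint.
print(direction_of_anisotropy([0.0, 90.0], [0.5, 0.5]))  # prints "45.0"
```

Because this is a linear rather than a circular mean, the result depends on the range chosen for the angles.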
2.5 Statistics
To evaluate predictive abilities of the texture parameters we used two binary
logistic regression models (Model 1 and Model 2). For both models the covariates
used were the texture parameters and their quadratic and first-order interaction
terms. Neither forward/backward parameter selection nor a subset of the parameters with
significant associations was used, i.e. all linear, quadratic and interaction
terms of the three texture parameters were always included in the models. In Model 2
we further adjusted for age, sex, and body mass index (BMI). Hosmer-Lemeshow
tests were used to assess goodness of fit. To account for correlation between
knees within the same subject, we estimated regression coefficients using type III
generalized estimating equations with exchangeable working correlation matrix.
A prediction score for the progression of early and late medial compartment
radiographic OA was calculated for each knee. The score was an average value
of all covariates weighted by the regression coefficients.
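The covariate construction and the score can be sketched as follows. This is an illustration under stated assumptions, not the SPSS analysis: the helper names and the coefficient values are hypothetical, roughness is treated as its two components R1 and R2, and the score is written as a weighted sum, which differs from a weighted average only by a constant factor and therefore ranks knees identically.

```python
from itertools import combinations


def expand_terms(params):
    """Linear, quadratic and pairwise (first-order) interaction terms."""
    terms = dict(params)
    for name, value in params.items():
        terms[name + "^2"] = value * value
    for (a, va), (b, vb) in combinations(params.items(), 2):
        terms[a + "*" + b] = va * vb
    return terms


def prediction_score(terms, coefficients):
    """Covariates weighted by regression coefficients and summed."""
    return sum(coefficients[name] * value for name, value in terms.items())


# Texture parameters of one knee (made-up values for illustration).
knee = {"R1": 0.5, "R2": 0.2, "DegA": 0.4, "DirA": 0.1}
terms = expand_terms(knee)
print(len(terms))  # prints "14": 4 linear + 4 quadratic + 6 interaction terms

# Hypothetical coefficients: every term weighted by 1.0.
coefficients = {name: 1.0 for name in terms}
score = prediction_score(terms, coefficients)
```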
We constructed the receiver operating characteristic (ROC) curves based on
the scores using 10-fold cross-validation method [27]. The cross-validation was
repeated 300 times and the averaged ROC curves were calculated. The area under
the curve (AUC) was used as a measure of the overall performance of the model.
For the null model, the AUC is equal to 0.5. The two models were not optimized
for AUC since our aim was to develop models that can provide an accurate
prediction without being optimized for a particular performance measure. As
this is an exploratory study, we focused on identifying the models and terms that
are predictive of loss of tibiofemoral joint space and hence we did not correct
significant associations in the models for multiple testing.
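For illustration, the AUC of a set of prediction scores can be computed without drawing the curve, since it equals the probability that a randomly chosen progressing knee receives a higher score than a randomly chosen non-progressing knee. The sketch below also shows one way to assign items to 10 folds; the function names and the seed are hypothetical, and the text does not specify how knees were allocated to folds.

```python
import random


def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def kfold_indices(n_items, n_folds, seed):
    """Shuffle item indices and deal them into n_folds nearly equal folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[i::n_folds] for i in range(n_folds)]


print(auc([0.9, 0.8], [0.1, 0.2]))  # prints "1.0" (perfect separation)
print(auc([0.5], [0.5]))            # prints "0.5" (no discrimination)

# One of the repeated cross-validation runs would use a split such as this one.
folds = kfold_indices(203, 10, seed=0)
```

Repeating the split with different seeds and averaging the resulting curves corresponds to the repeated cross-validation described above.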
The statistical analysis was performed in SPSS software, release 16.0 (SPSS Inc.,
Chicago IL, USA).
3. Results
Radiographic characteristics
At Exam A, 68 knees (33%) in 51 subjects (49%) were classified as having
radiographic tibiofemoral OA (Table 1). Fifty-four knees (27%) in 41 subjects
(39%) had an increase in the medial compartment JSN grade from Exam A to B.
Associations with subject characteristics
Associations of the texture parameters of medial ROI with age, sex, BMI, and
medial JSN grade at Exam A were calculated using analysis of covariance. The
R1 and R2 parameters were both associated with age (P<0.01) and sex (P<0.01).
For both the DegA and DirA parameters, associations were found with BMI
(P<0.01) and medial JSN grade (P=0.02). For lateral ROI there were no significant
associations.
Prediction of OA progression
We studied all knees in the sample (n=203). The highest prediction accuracy
(AUC 0.77) was obtained using the medial ROI and Model 2 (Table 2). The
quadratic terms of the texture parameters R2, DegA and DirA, and the interaction
terms R1∗R2 and DegA∗DirA were significantly associated with increase in
medial JSN grade (Table 3). In this multivariable model with the highest
prediction accuracy, the covariates age (P=0.76), sex (P=0.57), and BMI (P=0.06)
were found not to be statistically significant. The ROC curves obtained for the
model are shown in Fig. 2.
Prediction of early OA progression
We studied knees (n=135) with KL grade equivalents ≤1 (calculated from JSN and
osteophytes grades) at Exam A. Model 2 constructed for the medial ROI achieved
the highest prediction accuracy (AUC 0.75, Table 2). Significant associations were
found for the interaction and quadratic terms of the DegA and DirA parameters.
Age (P=0.85), sex (P=0.89) and BMI (P=0.71) were not significant (Table 3).
Prediction of late OA progression
We studied knees (n=68) with KL grade equivalents ≥2 at Exam A. Once again the
highest prediction accuracy (AUC 0.77) was obtained using the medial ROI and
Model 2 (Table 2). The parameters R1, R2, their interaction term R1∗R2, sex, and
BMI were significantly associated with loss of medial compartment joint space.
Age (P=0.82) was found to be not statistically significant (Table 3).
The results were obtained for Models 1 and 2 that included linear, quadratic and
interaction terms. When not all terms were included, the prediction accuracies ranged
from 0.54 to 0.60 AUC.
For a model based on age, sex and BMI, the accuracies were 0.58 (all knees), 0.52
(knees with early OA progression) and 0.66 (knees with late OA progression)
AUC. Adding medial JSN grade at Exam A to the model increased the AUC
values by 0.02, and further adding the texture parameters increased them to
0.75 (all knees), 0.74 (knees with early progression) and 0.77 (knees with late OA
progression).
4. Discussion
In this study we confirm that medial tibial TB texture is predictive of loss
of medial joint space in knees with existing OA [13]. Importantly, we extend
these findings to provide a detailed description of TB roughness and anisotropy
changes due to knee OA and to show for the first time that medial tibial TB texture
is also predictive of medial joint space loss in knees with KL grade equivalents
≤1 at baseline and that lateral tibial TB texture is predictive of medial joint space
loss.
We have several possible explanations for the prediction of loss of medial
compartment joint space by TB texture, and they are not mutually exclusive.
First, there may be an interaction between subchondral bone remodelling
and the modulation of cartilage catabolism [28]. Previous studies suggest that
an increase in vascularisation and remodelling rate of OA subchondral bone
promotes diffusion of cytokines, eicosanoids and growth factors into articular
cartilage, thus affecting chondrocytes and inducing cartilage degradation [29–32].
This could lead to the onset of secondary ossification and a decrease in
cartilage thickness [33]. Second, genetic factors may play a role in the abnormal
metabolism of subchondral bone, leading to morphological alterations and
subsequently to radiographic OA changes. For example, Wnt signaling
antagonists are involved in the proliferation, differentiation and mineralisation of
osteoblasts [34–37], and polymorphisms of their encoding genes were identified
in patients with knee [38] and hip [39] OA. Also, abnormal production of
IGF-1 growth factor in OA subchondral bone osteoblasts could increase bone
remodelling rate and stiffness, leading to cartilage matrix degradation [28].
Third, TB texture changes may indicate abnormal trabecular bone structure
due to unfavourable biomechanical loading, which may also adversely affect
the overlying joint cartilage. Previous studies showed that high systemic
bone mineral density (BMD) increases the risk of incident knee OA and JSN
[40]. Meniscal damage is associated with increased BMD in the ipsilateral
compartment [41], and with increased risk of the development of subchondral
bone marrow lesions [42].
For the medial ROI and knees without pre-existing radiographic OA, the DegA∗DirA
term was positively associated with an increase in the medial JSN grade (Table 3).
This indicates an increase in the degree of anisotropy of TB texture and a shift
in the alignment of trabeculae towards the horizontal direction. The finding
is consistent with previous studies where the retention of horizontal trabeculae
within the medial subchondral region was observed [3, 43]. The bone architecture
may be reorganized in knees with early-stage OA as a result of trabeculae being
less well aligned to the main loading direction [44].
For knees with pre-existing OA, the R1 and R2 terms (medial ROI) were
negatively and positively associated with an increase in the medial JSN grade,
respectively (Table 3). This suggests a nonlinear change in the overall roughness.
The negative association of the R1 term could indicate a decrease in the overall
roughness of TB which can be associated with shorter and thicker trabeculae.
Previous studies found that during OA bone remodelling the TB thickness
increased, especially in the main loading direction [3–5], and the medial tibial
plateau bone area expanded [45]. The expansion of the bone area has been
associated with the thickening of subchondral trabeculae in knees with early
and moderate OA [46]. The positive association of the R2 term could indicate
an increase in the number of thinner trabeculae which can be attributed to
fenestration and thinning. It was suggested that the high roughness of TB texture
in late knee OA is caused by osteoporosis [43]. The DegA and DirA parameters
were found not to be statistically significant, while female gender and high BMI
were associated with increased risk of progression.
We also found that the lateral compartment tibial TB texture predicts medial
joint space loss. A possible explanation is that medial compartment OA is
often associated with relative unloading of the lateral compartment inducing
bone resorption [47]. Further, medial cartilage thickness correlates with lateral
apparent trabecular number, thickness and separation [47, 48].
Our study has several important limitations. First,
two different radiographic protocols were used, although both were acquired in
weight bearing and with the knee in about the same degree of flexion. A pilot
sample examined with both protocols prior to Exam B did not suggest any systematic
effects on the semi-quantitative scoring of JSN and osteophytes. Second, all
the subjects had undergone partial meniscectomy in at least one knee, and it is
unclear whether prediction would be different for a cohort without meniscal
surgery. Third, the sample available for lateral compartment OA was too
small and hence was not analysed. Fourth, for the prediction of OA we used
discrete medial JSN and osteophyte grades whereas the nature of knee OA is
continuous. This could affect our results since the grading of OA features is
subject to the reader’s interpretation and the grades may not be linear with
respect to the actual progression of knee OA [49]. A further limitation is that
we did not analyse x-ray images at Exam A for subjects lost to follow-up (32% of
subjects). Hence, it is unknown whether their texture parameters differ from those of
subjects who completed Exam B. Finally, the texture parameters do not provide
information about bone texture changes at individual scales and directions as
fractal signatures do. However, they are able to quantify texture changes at each
pixel location over all scales. Further studies with large databases of knee images
are required to evaluate the full potential of different approaches in the prediction
of OA.
In conclusion, the system for the automated analysis of TB texture showed
promising results in the prediction of loss of tibiofemoral joint space. In
particular, we showed that the texture parameters markedly improve the model
for prediction of joint space loss based on age, sex, BMI, and JSN grade, and that a
good prediction of medial joint space loss in knees with early OA progression (i.e.
KL grade equivalents ≤1 at baseline) can be obtained. The prediction accuracy
of this system needs to be further validated using large databases of knee images
from other populations.
Acknowledgements
The study was supported by grants from the School of Mechanical and Chemical
Engineering, University of Western Australia, the Swedish Research Council,
and Lund University Medical Faculty. Drs Englund and Lohmander are funded
by the Swedish Research Council, the Greta and Johan Kock Foundation, King
Gustaf V 80-year Birthday Foundation, and the Faculty of Medicine, Lund
University, Sweden.
Conflict of interest
The authors have no conflict of interest for this manuscript.
5. References
[1] G Peat, R McCarney, and P Croft. “Knee pain and osteoarthritis in older
adults: a review of community burden and current use of primary health
care”. In: Annals of the Rheumatic Diseases 60 (2001), pp. 91–97.
[2] E L Radin and R M Rose. “Role of subchondral bone in the initiation
and progression of cartilage damage”. In: Clinical Orthopaedics and Related
Research 213 (1986), pp. 34–40.
[3] J C Buckland-Wright. “Subchondral bone changes in hand and knee
osteoarthritis detected by radiography”. In: Osteoarthritis and Cartilage 12
(2004), S10–S19.
[4] L Kamibayashi, U P Wyss, T D V Cooke, and B Zee. “Trabecular
microstructure in the medial condyle of the proximal tibia of patients with
knee osteoarthritis”. In: Bone 17 (1995), pp. 27–35.
[5] L Kamibayashi, U P Wyss, T D V Cooke, and B Zee. “Changes in mean
trabecular orientation in the medial condyle of the proximal tibia in
osteoarthritis”. In: Calcified Tissue International 57 (1995), pp. 69–73.
[6] C Ding, F Cicuttini, and G Jones. “Tibial subchondral bone size and knee
cartilage defects: relevance to knee osteoarthritis”. In: Osteoarthritis and
Cartilage 15 (2007), pp. 479–486.
[7] L Pothuaud, C L Benhamou, P Porion, E Lespessailles, R Harba, and
P Levitz. “Fractal dimension of trabecular bone projection texture is related
to three-dimensional microarchitecture”. In: Journal of Bone and Mineral
Research 15 (2000), pp. 691–699.
[8] R Jennane, G Harba, G Lemineur, S Bretteil, A Estrade, and C L Benhamou.
“Estimation of the 3D self-similarity parameter of trabecular bone from its
2D projection”. In: Medical Image Analysis 11 (2007), pp. 91–98.
[9] L Pothuaud, P Carceller, and D Hans. “Correlations between grey-level
variations in 2D projections images (TBS) and 3D microarchitecture:
Applications in the study of human trabecular bone microarchitecture”. In:
Bone 42 (2008), pp. 775–787.
[10] L Apostol, V Boudousq, O Basset, C Odet, S Yot, J Tabary, J M Dinten,
E Boiler, P O Kotzki, and F J Peyrin. “Relevance of 2D radiographic texture
analysis for the assessment of 3D bone micro-architecture”. In: Medical
Physics 33 (2006), pp. 3546–3556.
[11] J H Kellgren and J S Lawrence. “Radiological assessment of osteoarthrosis”.
In: Annals of the Rheumatic Diseases 16 (1957), pp. 494–502.
[12] L Shamir, S M Ling, W Scott, M Hochberg, L Ferrucci, and I G Goldberg.
“Early detection of radiographic knee osteoarthritis using computer-aided
analysis”. In: Osteoarthritis and Cartilage 17 (2009), pp. 1307–1312.
[13] V B Kraus, S Feng, S C Wang, S White, M Ainslie, A Brett, A Holmes, and
H C Charles. “Trabecular morphometry by fractal signature analysis is a
novel marker of osteoarthritis progression”. In: Arthritis & Rheumatism 60
(2009), pp. 3711–3722.
[14] H W Chung, C C Chu, M Underweiser, and F W Wehrli. “On the fractal
nature of trabecular structure”. In: Medical Physics 21 (1994), pp. 1535–1540.
[15] T Woloszynski, P Podsiadlo, G W Stachowiak, and M Kurzynski. “A
signature dissimilarity measure for trabecular bone texture in knee
radiographs”. In: Medical Physics 37 (2010), pp. 2030–2042.
[16] M Englund, E M Roos, H P Roos, and L S Lohmander. “Patient-relevant
outcomes fourteen years after meniscectomy: influence of type of meniscal
tear and size of resection”. In: Rheumatology (Oxford) 40 (2001), pp. 631–639.
[17] M Englund, E M Roos, and L S Lohmander. “Impact of type of meniscal
tear on radiographic and symptomatic knee osteoarthritis: a sixteen-year
followup of meniscectomy with matched controls”. In: Arthritis &
Rheumatism 48 (2003), pp. 2178–2187.
[18] C Peterfy, J Li, S Zaim, J Duryea, J Lynch, Y Miaux, W Yu, and H K Genant.
“Comparison of fixed-flexion positioning with fluoroscopic semi-flexed
positioning for quantifying radiographic joint-space width in the knee:
test-retest reproducibility”. In: European Radiology 32 (2004), pp. 128–132.
[19] M Kothari, A Guermazi, G von Ingersleben, Y Miaux, M Sieffert, J E Block,
R Stevens, and C G Peterfy. “Fixed-flexion radiography of the knee
provides reproducible joint space width measurements in osteoarthritis”.
In: European Radiology 14 (2004), pp. 1568–1573.
[20] R D Altman, M Hochberg, W A Murphy, Jr., F Wolfe, and M
Lequesne. “Atlas of individual radiographic features in osteoarthritis”. In:
Osteoarthritis and Cartilage 3 (1995), A3–A70.
[21] P Podsiadlo, L Dahl, M Englund, L S Lohmander, and G W Stachowiak.
“Differences in trabecular bone texture between knees with and without
radiographic osteoarthritis detected by fractal methods”. In: Osteoarthritis
and Cartilage 16 (2008), pp. 323–329.
[22] P Podsiadlo, M Wolski, and G W Stachowiak. “Automated selection of
trabecular bone regions in knee radiographs”. In: Medical Physics 35 (2008),
pp. 1870–1883.
[23] Y Rubner, C Tomasi, and L J Guibas. “The earth mover's distance as a metric
for image retrieval”. In: International Journal of Computer Vision 40 (2000),
pp. 99–121.
[24] J W Sammon. “A nonlinear mapping for data structure analysis”. In: IEEE
Transactions on Computers C18 (1969), pp. 401–409.
[25] E A Messent, R J Ward, C J Tonkin, and C Buckland-Wright. “Tibial
cancellous bone changes in patients with knee osteoarthritis. A short-term
longitudinal study using Fractal Signature Analysis”. In: Osteoarthritis and
Cartilage 13 (2005), pp. 463–470.
[26] M Wolski, P Podsiadlo, G W Stachowiak, L S Lohmander, and M Englund.
“Differences in trabecular bone texture between knees with and without
radiographic osteoarthritis detected by directional fractal signature
method”. In: Osteoarthritis and Cartilage 18 (2010), pp. 684–690.
[27] W Adler and B Lausen. “Bootstrap estimated true and false positive rates
and ROC curve”. In: Computational Statistics & Data Analysis 53 (2009),
pp. 718–729.
[28] S K Tat, D Lajeunesse, J P Pelletier, and J Martel-Pelletier. “Targeting
subchondral bone for treating osteoarthritis: what is the evidence?” In: Best
Practice & Research Clinical Rheumatology 24 (2010), pp. 51–70.
[29] A M Coats, P Zioupos, and R M Aspden. “Material properties of
subchondral bone from patients with osteoporosis or osteoarthritis by
microindentation testing and electron probe microanalysis”. In: Calcified
Tissue International 73 (2003), pp. 66–71.
[30] D B Burr. “The importance of subchondral bone in osteoarthrosis”. In:
Current Opinion in Rheumatology 10 (1998), pp. 256–262.
[31] D B Burr and M B Schaffler. “The involvement of subchondral mineralized
tissues in osteoarthrosis: quantitative microscopic evidence”. In: Microscopy
Research and Technique 37 (1997), pp. 343–357.
[32] H Imhof, M Breitenseher, F Kainberger, T Rand, and S Trattnig.
“Importance of subchondral bone to articular cartilage in health and
disease”. In: Topics in Magnetic Resonance Imaging 10 (1999), pp. 180–192.
[33] D B Burr. “Increased biological activity of subchondral mineralized
tissues underlies the progressive deterioration of articular cartilage in
osteoarthritis”. In: The Journal of Rheumatology 32 (2005), pp. 1156–1158.
[34] X Li, P Liu, W Liu, P Maye, J Zhang, Y Zhang, M Hurley, C Guo, A Boskey,
L Sun, S E Harris, D W Rowe, H Zhu Ke, and D Wu. “Dkk2 has a role in
terminal osteoblast differentiation and mineralized matrix formation”. In:
Nature Genetics 37 (2005), pp. 945–952.
[35] J Li, I Sarosi, R C Cattley, J Pretorius, F Asuncion, M Grisanti, S Morony,
S Adamu, Z Geng, W Qiu, P Kostenuik, D L Lacey, W S Simonet,
B Bolon, X Qian, V Shalhoub, M S Ominsky, H Zhu Ke, X Li, and
W G Richards. “Dkk1-mediated inhibition of Wnt signalling in bone results
in osteopenia”. In: Bone 39 (2006), pp. 754–766.
[36] F Morvan, K Boulukos, P Clément-Lacroix, S Roman-Roman, I Suc-Royer,
B Vayssiere, P Ammann, P Martin, S Pinho, P Pognonec, P Mollat, C Niehrs,
R Baron, and G Rawadi. “Deletion of a single allele of the Dkk1 gene leads
to an increase in bone formation and bone mass”. In: Journal of Bone and
Mineral Research 21 (2006), pp. 934–945.
[37] D Diarra, M Stolina, K Polzer, J Zwerina, M S Ominsky, D Dwyer, A Korb,
J Smolen, M Hoffmann, C Scheinecker, D van der Heide, R Landewe, D
Lacey, W G Richards, and G Schett. “Dickkopf-1 is a master regulator of
joint remodelling”. In: Nature Medicine 13 (2007), pp. 156–163.
[38] A M Valdes, J Loughlin, M V Oene, K Chapman, G L Surdulescu,
M Doherty, and T D Spector. “Sex and ethnic differences in the association
of ASPN, CALM1, COL2A1, COMP, and FRZB with genetic susceptibility
to osteoarthritis of the knee”. In: Arthritis & Rheumatism 56 (2007),
pp. 137–146.
[39] J Loughlin, B Dowling, K Chapman, L Marcelline, Z Mustafa, L Southam,
A Ferreira, C Ciesielski, D A Carson, and M Corr. “Functional variants
within the secreted frizzled-related protein 3 gene are associated with hip
osteoarthritis in females”. In: Proceedings of the National Academy of Sciences
USA 101 (2004), pp. 9757–9762.
[40] M C Nevitt, Y Zhang, M K Javaid, T Neogi, J R Curtis, J Niu, C E McCulloch,
N A Segal, and D T Felson. “High systemic bone mineral density
increases the risk of incident knee OA and joint space narrowing, but not
radiographic progression of existing knee OA: the MOST study”. In: Annals
of the Rheumatic Diseases 69 (2010), pp. 163–168.
[41] G H Lo, J Niu, C E McLennan, D P Kiel, R R McLean, A Guermazi,
H K Genant, T E McAlindon, and D J Hunter. “Meniscal damage associated
with increased local subchondral bone mineral density: a Framingham
study”. In: Osteoarthritis and Cartilage 16 (2008), pp. 261–267.
[42] M Englund, A Guermazi, F W Roemer, M Yang, Y Zhang, M C Nevitt,
J A Lynch, C E Lewis, J Torner, and D T Felson. “Meniscal pathology on
MRI increases the risk for both incident and enlarging subchondral bone
marrow lesions of the knee: the MOST Study”. In: Annals of the Rheumatic
Diseases 69 (2010), pp. 1796–1802.
[43] E A Messent, R J Ward, C J Tonkin, and C Buckland-Wright. “Cancellous
bone differences between knees with early, definite and advanced joint
space loss; a comparative quantitative macroradiographic study”. In:
Osteoarthritis and Cartilage 13 (2005), pp. 39–47.
[44] M Ding, A Odgaard, and I Hvid. “Changes in the three-dimensional
microstructure of human tibial cancellous bone in early osteoarthritis”. In:
Journal of Bone and Joint Surgery (British Volume) 85-B (2003), pp. 906–912.
[45] D Dore, S Quinn, C Ding, T Winzenberg, F Cicuttini, and G Jones.
“Subchondral bone and cartilage damage”. In: Arthritis & Rheumatism 62
(2010), pp. 1967–1973.
[46] Y Wang, A Wluka, and F M Cicuttini. “The determinants of change in
tibial plateau bone area in osteoarthritic knees: a cohort study”. In: Arthritis
Research & Therapy 7 (2005), R687–R693.
[47] R I Bolbos, J Zuo, S Banerjee, T M Link, C B Ma, X Li, and S Majumdar.
“Relationship between trabecular bone structure and articular cartilage
morphology and relaxation times in early OA of the knee joint using
parallel MRI at 3 T”. In: Osteoarthritis and Cartilage 16 (2008), pp. 1150–1159.
[48] C T Lindsey, A Narasimhan, J M Adolfo, H Jin, L S Steinbach,
T Link, M Ries, and S Majumdar. “Magnetic resonance evaluation of the
interrelationship between articular cartilage and trabecular bone of the
osteoarthritic knee”. In: Osteoarthritis and Cartilage 12 (2004), pp. 86–96.
[49] L Shamir, S Rahimi, N Orlov, L Ferrucci, and I G Goldberg. “Progression analysis
and stage discovery in continuous physiological processes using image
computing”. In: EURASIP Journal on Bioinformatics and Systems Biology,
doi:10.1155/2010/107036 (2010).
6. List of figures
Figure 1: Trabecular bone regions of interest selected in the analysis using an automated method.
[Figure 2 plot: ROC curves with false positive rate (0 to 1) on the horizontal axis and true positive rate (0 to 1) on the vertical axis. Legend: early OA progression (AUC 0.75); late OA progression (AUC 0.77).]
Figure 2: The receiver operating characteristic (ROC) curves obtained for progression of early and late osteoarthritis (OA) respectively, i.e., using the medial region of interest and loss of medial compartment joint space as outcome with adjustment for age, sex, and body mass index. The diagonal line represents the ROC curve of the null model (area under the curve [AUC] 0.5).
7. List of tables
Table 1: Study subject characteristics (n=105).

Characteristic                                 Exam A       Exam B
Age, years                                     53.6±10.5    57.6±10.5
Men, n (%)                                     90 (86)      –
Body mass index, kg/m²                         26.1±3.4     26.8±3.7
Time since Exam A, months                      –            48.8±1.0
Knees (n=203) with radiographic OA∗, n (%)     68 (33)      95 (47)
Medial JSN grade (n=203), n (%)
  0                                            107 (53)     85 (42)
  1                                            78 (38)      79 (39)
  2                                            18 (9)       28 (14)
  3                                            0 (0)        11 (5)

Data are presented as means ± SD unless stated otherwise.
OA, osteoarthritis; JSN, joint space narrowing.
∗ Approximating Kellgren and Lawrence grade 2 or worse.
Table 2: Prediction accuracies calculated as area under the curve (AUC) for progression of medial tibiofemoral compartment osteoarthritis (OA) defined as an increase in medial joint space narrowing grade.

                                 Medial ROI                               Lateral ROI
Outcome                          Model 1∗            Model 2†             Model 1∗            Model 2†
Whole sample (n=203)             0.74 (0.67, 0.82)   0.77 (0.70, 0.84)    0.68 (0.62, 0.75)   0.71 (0.64, 0.78)
Early OA progression‡ (n=135)    0.74 (0.67, 0.82)   0.75 (0.69, 0.83)    0.72 (0.64, 0.80)   0.72 (0.65, 0.80)
Late OA progression§ (n=68)      0.76 (0.68, 0.84)   0.77 (0.68, 0.86)    0.68 (0.60, 0.77)   0.71 (0.63, 0.79)

Data are presented as AUC means (95% confidence interval).
ROI, region of interest.
∗ based on texture parameters only.
† based on texture parameters with adjustment for age, sex, and body mass index.
‡ knees with Kellgren and Lawrence equivalents ≤1, see methods.
§ knees with Kellgren and Lawrence equivalents ≥2, see methods.
Table 3: Significant associations of texture parameters and covariates for Model 2∗ and medial region of interest.

Texture parameter/covariate       β (95% CI)               P-value
Whole sample (n=203)
  R1∗R2                           21.3 (2.4, 40.0)         0.03
  DegA∗DirA                       12.6 (4.5, 20.5)         <0.01
  R2∗R2                           -12.4 (-17.3, -7.4)      0.01
  DegA∗DegA                       -7.5 (-10.4, -4.6)       0.01
  DirA∗DirA                       -4.9 (-6.3, -3.4)        <0.01
Early OA progression† (n=135)
  DegA∗DirA                       12.2 (6.8, 17.6)         0.02
  DegA∗DegA                       -8.5 (-12.7, -4.2)       0.05
  DirA∗DirA                       -6.3 (-8.7, -3.9)        0.02
Late OA progression‡ (n=68)
  R1                              -16.9 (-24.5, -9.4)      <0.01
  R2                              16.5 (8.8, 24.1)         <0.01
  R2∗R2                           -23.2 (-33.3, -13.1)     0.01
  Sex (male)                      -1.0 (-1.5, -0.5)        0.02
  Body mass index                 2.2 (1.5, 3.0)           <0.01

95% CI, 95% confidence interval; DegA, degree of anisotropy; DirA, direction of anisotropy; R1, R2, roughness.
∗ based on texture parameters with adjustment for age, sex, and body mass index.
† knees with Kellgren and Lawrence equivalents ≤1, see methods.
‡ knees with Kellgren and Lawrence equivalents ≥2, see methods.
CHAPTER 4
A MEASURE OF COMPETENCE BASED ON RANDOM
CLASSIFICATION FOR DYNAMIC ENSEMBLE
SELECTION
Tomasz Woloszynski1, Marek Kurzynski, PhD2, Pawel Podsiadlo, PhD1, and
Gwidon W Stachowiak, PhD1
1Tribology Laboratory, School of Mechanical and Chemical Engineering,
University of Western Australia, Australia
2Chair of Systems and Computer Networks, Wroclaw University of Technology,
Poland
Information Fusion 2012;13;207–212
Abstract
In this paper, a measure of competence based on random classification (MCR)
for classifier ensembles is presented. The measure selects dynamically (i.e. for
each test example) a subset of classifiers from the ensemble that perform better
than a random classifier. Therefore, weak (incompetent) classifiers that would
adversely affect the performance of a classification system are eliminated. When
all classifiers in the ensemble are evaluated as incompetent, the classification
accuracy of the system can be increased by using the random classifier instead.
Theoretical justification for using the measure with the majority voting rule is
given. Two MCR based systems were developed and their performance was
compared against six multiple classifier systems using data sets taken from
the UCI Machine Learning Repository and Ludmila Kuncheva Collection. The
systems developed typically had the highest classification accuracies regardless
of the ensemble type used (homogeneous or heterogeneous).
Key words: multiple classifier system, dynamic ensemble selection, competence
measure, random classification.
1. Introduction
In multiple classifier systems (MCSs), a diverse ensemble of classifiers is
trained and combined together in order to outperform any single classifier
in the ensemble [1]. The diversity of the ensemble can be achieved by using
heterogeneous classifiers [2–4] and/or by generating a different training data set
for each classifier through, for example, bagging [5], boosting [6], or random
subspaces [7]. For the combination of classifiers in the ensemble, the two main
approaches are classifier fusion and classifier selection.
In the first approach, all classifiers in the ensemble contribute to the decision
of the MCS, e.g. through sum or majority voting [2, 6]. Generally, a fusion
based MCS performs better than any classifier in the ensemble provided that all
classifiers are complementary, i.e. they make independent errors [8]. However,
such independence usually cannot be guaranteed in practice. To address this
problem, ensemble pruning (EP) methods have been developed [9–11]. In these
methods, the decision of the MCS is made by using a subset of complementary
classifiers selected from the ensemble. Techniques used for classifier
selection include measures of goodness [12, 13], multiple comparisons statistics
[14], reinforcement learning [15], genetic algorithms [16], and quadratic integer
programming [17]. The main drawback of the EP methods is that the selection
is global, i.e. the same subset of classifiers is used for classification of all test
examples. This could adversely affect the performance of EP based classification
systems, since for some regions in a feature space there could be subsets of
classifiers that have higher classification accuracies than the globally selected
subset.
In the second approach, a single classifier is selected from the ensemble for each
test example and its decision is used as the decision of the MCS. The selection
of a classifier can be either static or dynamic. In static classifier selection, a
region of competence in the feature space is assigned for each classifier during
the training phase [18, 19]. Classification is made by the classifier assigned to
the competence region that contains the test example. In dynamic classifier
selection (DCS), competences of classifiers are calculated during the classification
phase, i.e. at the time when the test example is presented [20–22]. The classifier
with the highest value of competence is used for the classification of the test
example. The performance of DCS based classification systems depends on the
correct estimation of competence which is usually defined as local accuracy [23].
However, in the case of poor estimation the selected classifier may not be the
most accurate classifier in the ensemble for the region and this could adversely
affect the performance of the system. Recently, dynamic ensemble selection
(DES) methods have been developed to overcome this problem [24–27]. These
methods first dynamically select a subset of classifiers from the ensemble and
then combine the selected classifiers by majority voting. In this way DES based
classification systems take advantage of both the selection and fusion approaches
while avoiding the problems of the EP and DCS methods. The systems have some
similarities to random forests [28] with regard to the selection of features and data
sets used in classifier training and the use of the majority voting rule. The major
difference, however, is that the random forests grow a diverse classifier ensemble
while DES based systems select and combine classifiers from an ensemble that
is already given. For the dynamic selection of the classifier subset in DES
based systems, local accuracy methods [24, 25], a local accuracy and diversity
method [26] and a two-level optimization and selection strategy [27] have been
used. However, these approaches either have little theoretical justification
[24, 26] or select the subset of classifiers in a rather complex manner [25,
27].
In this study, these issues were addressed by developing a measure of
competence based on random classification (MCR). The study is the continuation
of our initial work on the competence measure based on random guessing that
was successfully used for the classification of five benchmark data sets [29].
In particular, theoretical justification is given and extensive experiments are
conducted to investigate whether the performance of a random classifier is a
natural criterion for selecting relatively accurate and diverse classifiers from
the ensemble. The measure developed uses a random classifier to evaluate the
performances of classifiers in the ensemble. It is theoretically shown that the
MCR increases the classification accuracy of MCS when the majority voting rule
is used. The performances of MCR based classification systems were compared
against six MCSs based on the following combination methods: single best
(SB), majority voting (MV), DCS local accuracy (DCS-LA) [21], DCS multiple
classifier behaviour (DCS-MCB) [22], DCS modified local accuracy (DCS-MLA)
[30], and DES knora-eliminate (DES-KE) [24] using data sets from the UCI
Machine Learning Repository [31] and the Ludmila Kuncheva Collection [32].
2. Theoretical framework
Consider a classification problem with a set $\mathcal{M} = \{1, 2, \ldots, M\}$ of class labels and
a feature space $\mathcal{X} \subseteq \mathbb{R}^n$ (i.e. each example is described by $n$ features). Assume
that examples are independent and identically distributed (iid) random variable
pairs $(X, J)$, where $X \in \mathcal{X}$ and $J \in \mathcal{M}$. The probability measure $\mu$ and prior
probabilities $p_j$, $j = 1, \ldots, M$, describe distributions of $X$ and $J$, respectively. The
posterior probability that an example described by a feature vector $x \in \mathcal{X}$ belongs
to the class $j \in \mathcal{M}$ is denoted by $p_j(x)$.
Let $\psi : \mathcal{X} \to \mathcal{M}$ be a classifier that produces a vector of discriminant functions
$[d_1(x), d_2(x), \ldots, d_M(x)]$. The value of the discriminant function $d_j(x)$, $j =
1, 2, \ldots, M$, represents a support given by the classifier $\psi$ for the fact that the
example described by a feature vector $x$ belongs to the $j$-th class. Assume without
loss of generality that $d_j(x) \geq 0$, $j = 1, 2, \ldots, M$, and $\sum_{j=1}^{M} d_j(x) = 1$. Classification
is made according to the maximum rule
$$\psi(x) = i \Leftrightarrow d_i(x) = \max_{j=1,\ldots,M} d_j(x). \tag{1}$$
The performance of the classifier $\psi$ is measured by its probability of correct
classification $P_c(\psi)$. For a random classifier $\psi_{rnd}$ that draws a class label using
a uniform distribution the probability $P_c(\psi_{rnd})$ is equal to $\frac{1}{M}$.
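The maximum rule (1) and the uniform random classifier can be illustrated with a minimal Python sketch. The function names, the input check and the tie-breaking choice (lowest label wins) are assumptions made for illustration only; the text does not specify how ties are resolved.

```python
import random


def classify_max_rule(supports):
    """Return the 1-based class label i with the largest support d_i(x)."""
    # Supports are assumed non-negative and normalised to sum to one.
    if min(supports) < 0 or abs(sum(supports) - 1.0) > 1e-9:
        raise ValueError("supports must be non-negative and sum to one")
    # max() returns the first maximiser, so ties go to the lowest label
    # (an assumption; the text leaves ties unspecified).
    best_index = max(range(len(supports)), key=lambda j: supports[j])
    return best_index + 1


def random_classifier(num_classes, rng=random):
    """Draw a class label uniformly from {1, ..., M}."""
    return rng.randint(1, num_classes)


def random_classifier_accuracy(num_classes):
    """Probability of correct classification of the uniform random classifier."""
    return 1.0 / num_classes
```

For instance, with supports [0.2, 0.5, 0.3] the rule returns class 2, and for M = 4 classes the random classifier is correct with probability 0.25.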
2.1 Measure of competence based on random classification (MCR)
Let $C(\psi|x)$ denote the MCR of the classifier $\psi$ for the example described by a
feature vector $x$. The function $C(\psi|x)$ is defined as any strictly increasing function
$G(\cdot)$ with the property $G(0) = 0$ and the argument of the form $P_{c,N}(\psi|x) - \frac{1}{M}$, i.e.
$$C(\psi|x) = G\left[P_{c,N}(\psi|x) - \frac{1}{M}\right], \tag{2}$$
where $P_{c,N}(\psi|x)$ is the estimated value of conditional probability of correct
classification
$$P_c(\psi|x) = p_i(x) \Leftrightarrow d_i(x) = \max_{j=1,\ldots,M} d_j(x) \tag{3}$$
and $N$ is the number of examples in a validation data set $V$ used for the estimation
of $P_c(\psi|x)$. The examples in $V$ are assumed to be iid random variable pairs
distributed as $(X, J)$ and independent of examples used for the training of
classifiers.
The measure $C(\psi|x)$ is positive (non-positive) if the classifier $\psi$ is competent
(incompetent) for the example described by a feature vector $x$, i.e. if the value
of $P_{c,N}(\psi|x)$ is greater than (lower than or equal to) $P_c(\psi_{rnd})$.
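The definition of the MCR can be made concrete with a short Python sketch. It assumes, purely for illustration, that the local estimate of the probability of correct classification is the fraction of the k nearest validation examples that the classifier labels correctly (an unweighted k-NN estimate) and that G is the identity by default. The names local_accuracy, mcr and is_competent are hypothetical.

```python
import math


def local_accuracy(distances, correct, k=3):
    """Unweighted k-NN estimate of the probability of correct classification.

    distances[i] is the distance from the test example x to validation
    example i; correct[i] is True if the classifier labels example i correctly.
    """
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    return sum(1 for i in nearest if correct[i]) / len(nearest)


def mcr(p_est, num_classes, g=lambda t: t):
    """Competence C = G[p_est - 1/M] for a strictly increasing G with G(0) = 0."""
    return g(p_est - 1.0 / num_classes)


def is_competent(p_est, num_classes, g=lambda t: t):
    """A classifier is competent only if its competence is strictly positive."""
    return mcr(p_est, num_classes, g) > 0


# Hypothetical validation data: four examples, three nearest neighbours used.
p = local_accuracy([0.1, 0.4, 0.2, 0.9], [True, False, True, True], k=3)
competent = is_competent(p, num_classes=3)
```

Because G is strictly increasing with G(0) = 0, any admissible choice (for example math.tanh) gives the same competent/incompetent decision as the identity; only the magnitude of the competence changes.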
2.2 Theoretical justification
Let $\Psi = \{\psi_1, \ldots, \psi_L\}$ be an ensemble of $L$ classifiers and let $\Psi_{vote}$ be a MCS
obtained by combining classifiers from the ensemble $\Psi$ using the majority voting
rule. $\Psi_{vote}$ is Bayes optimal if the following assumptions are satisfied [33, 34]:
• for each classifier the probability of choosing the correct class is greater than
the probability of choosing any single other class,
• all classifiers make independent errors, and
• the number of classifiers tends to infinity.
However, the MCS has little practical value since the independence between
classifier errors usually cannot be guaranteed and a large number of classifiers
in the ensemble would adversely affect the system’s complexity and efficiency.
If the above assumptions are not satisfied then, assuming that there exists a set
$\mathcal{R} \subseteq \mathcal{X}$ with $\mu(\mathcal{R}) > 0$ where $P_c(\psi_l|x) < \frac{1}{M}$, $l = 1, \ldots, L$, it is possible to construct
a classification system $\Psi_{vote,N}$ for which
$$\Pr\left[\lim_{N \to \infty} P_c(\Psi_{vote,N}) > P_c(\Psi_{vote})\right] = 1. \tag{4}$$
The $\Psi_{vote,N}$ is constructed in the following manner:
• if the MCR evaluates at least one classifier as competent for a given $x$ then
$\Psi_{vote,N}$ is obtained by majority voting of the competent classifiers, and
• if there is no classifier evaluated as competent by the MCR for a given $x$
then $\Psi_{vote,N} = \psi_{rnd}$.
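The $\Psi_{vote,N}$ performs better than the $\Psi_{vote}$ if the condition
$$\forall_{l=1,\ldots,L} \left(P_c(\psi_l|x) \leq \frac{1}{M} \Rightarrow P_{c,N}(\psi_l|x) \leq \frac{1}{M}\right) \wedge \left(P_c(\psi_l|x) > \frac{1}{M} \Rightarrow P_{c,N}(\psi_l|x) > \frac{1}{M}\right) \tag{5}$$
is satisfied for almost all $x \in \mathcal{X}$, i.e. for all $x$ except possibly a region with the
probability measure zero. In particular
$$P_c(\Psi_{vote,N} \mid x \in \mathcal{X} \setminus \mathcal{R}) \geq P_c(\Psi_{vote} \mid x \in \mathcal{X} \setminus \mathcal{R}) \tag{6}$$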
because the performance of the $\Psi_{vote}$ in the set $\mathcal{X} \setminus \mathcal{R}$ would not deteriorate by
locally removing worse-than-random classifiers from the ensemble $\Psi$, and
$$P_c(\Psi_{vote,N} \mid x \in \mathcal{R}) > P_c(\Psi_{vote} \mid x \in \mathcal{R}) \tag{7}$$
because majority voting of worse-than-random classifiers in the set $\mathcal{R}$ cannot
outperform the random classifier. The reason is that the majority voting rule
can only select one of the class labels returned by classifiers in the ensemble.
Consequently, if all classifiers are worse-than-random then none of the
returned class labels gives a better-than-random classification accuracy.
The probability (4) can be easily obtained if the estimator $P_{c,N}(\psi|x)$ is strongly
consistent, e.g. k-NN, histogram or kernel estimator [35]. Consequently, as $N \to
\infty$, the condition (5) is satisfied with probability equal to one by noting that for
the estimator
$$\Pr\left[\lim_{N \to \infty} P_{c,N}(\psi|x) = P_c(\psi|x)\right] = 1. \tag{8}$$
Therefore, it follows from (6) and (7) that, as $N \to \infty$, $\Psi_{vote,N}$ performs better than
the $\Psi_{vote}$ with probability equal to one.
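The construction of the system that votes only among competent classifiers, with a random fallback, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the estimated probabilities of correct classification are supplied by the caller, competence is judged with G taken as the identity, and ties in the vote are broken in favour of the lowest label. The function name des_vote and the tie-breaking choice are not taken from the original work.

```python
import random
from collections import Counter


def des_vote(predictions, p_est, num_classes, rng=random):
    """Majority vote over competent classifiers, random label if none is competent.

    predictions[l] is the label returned by classifier l for the test example x;
    p_est[l] is its estimated probability of correct classification at x.
    """
    threshold = 1.0 / num_classes  # accuracy of the uniform random classifier
    competent_votes = [label for label, p in zip(predictions, p_est) if p > threshold]
    if not competent_votes:
        # No classifier beats random guessing at x: use the random classifier.
        return rng.randint(1, num_classes)
    counts = Counter(competent_votes)
    top = max(counts.values())
    # Tie-break by the lowest label (an assumption made for determinism).
    return min(label for label, count in counts.items() if count == top)
```

In a two-class problem with predictions [1, 1, 2] and estimates [0.1, 0.2, 0.9], plain majority voting would return class 1, whereas this sketch returns class 2 because the two classifiers voting for class 1 are evaluated as incompetent at the given x.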
3. Methods
3.1 DES-P and DES-KL classification systems
The DES-Performance (DES-P) and DES-Kullback-Leibler (DES-KL) classification
systems were developed according to Section 2.2. In the systems, the values of
the competence measure C(ψ|x) were estimated as follows:
1. DES-P system
The competence measure used in the DES-P system follows the definition of
the MCR. First, the performance of the classifier ψ in a neighbourhood of the
test example x is estimated using weighted k nearest neighbours taken from
the validation data set V . k = 10 was chosen since for this value a weighted
nearest neighbour estimation method produced the best results in previous
studies [30]. The competence measure is then obtained by subtracting the
performance of the random classifier 1/M from the performance estimated,
i.e.
\[ C(\psi|x) = P_c(\psi|x) - \frac{1}{M}. \qquad (9) \]
2. DES-KL system
The competence measure used in the DES-KL system estimates the
performances of classifiers from the information theory perspective. First,
for each example y in the validation data set V a source competence
Csrc(ψ|y) is calculated as the Kullback-Leibler (KL) divergence between the
uniform distribution [1/M, . . . , 1/M] and the vector of discriminant functions
[d1(y), . . . , dM(y)] produced by the classifier ψ. In this way, the source
competence measures how "close" the values of the discriminant functions
are to the probability of random classification. Since the KL divergence
is nonnegative, the source competence is given the sign of the expression
di(y) − max_{j=1,...,M, j≠i} dj(y), where i denotes the class of the example y.
As a result, the source competence attains its maximum (minimum) if the
support given by the classifier for the correct (any incorrect) class is equal
to one. The competence of the classifier ψ for the test example x is then
obtained by a weighted sum of the source competences
\[ C(\psi|x) = \sum_{y \in V} C_{src}(\psi|y) \exp\left(-d(x, y)^2\right), \qquad (10) \]
where d(x, y) is the Euclidean distance between the examples x and y. For the
DES-P and DES-KL systems, different types of methods for the comparison
between each classifier from the ensemble and the random classifier were
used to investigate their effect on the performance.
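The two competence measures in Eqs. (9) and (10) can be illustrated with the following Python sketch. Several details are assumptions made only for illustration: each validation example is stored as a hypothetical tuple (features, true label, support vector of the classifier), the neighbour weights are taken as 1/(1 + distance) because the weighting scheme is not given here, and the KL divergence is computed from the support vector to the uniform distribution with natural logarithms, a direction chosen so that a one-hot support vector gives a finite extreme value.

```python
import math

def competence_p(x, validation, n_classes, k=10):
    """Eq. (9): weighted k-NN accuracy minus the random-classifier accuracy 1/M."""
    neighbours = sorted(validation, key=lambda v: math.dist(x, v[0]))[:k]
    num = den = 0.0
    for feats, label, support in neighbours:
        w = 1.0 / (1.0 + math.dist(x, feats))  # assumed weighting scheme
        correct = support.index(max(support)) == label
        num += w * correct
        den += w
    return num / den - 1.0 / n_classes

def source_competence(label, support):
    """Signed KL divergence between the support vector and the uniform distribution."""
    m = len(support)
    kl = sum(d * math.log(d * m) for d in support if d > 0.0)
    rival = max(d for j, d in enumerate(support) if j != label)
    if support[label] > rival:
        sign = 1.0
    elif support[label] < rival:
        sign = -1.0
    else:
        sign = 0.0
    return sign * kl

def competence_kl(x, validation):
    """Eq. (10): source competences weighted by exp(-d(x, y)^2)."""
    return sum(source_competence(label, support)
               * math.exp(-math.dist(x, feats) ** 2)
               for feats, label, support in validation)
```

In both cases a classifier would be treated as competent for x when the returned value is positive, i.e. when it is estimated to be better than random classification.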
4. Experiments
The performance of the DES-P and DES-KL systems was compared against six
MCSs using 14 benchmark data sets. The comparison study was conducted in
MATLAB using PRTools developed by Duin et al. [36].
4.1 Data sets
The benchmark data sets taken from the UCI Machine Learning Repository (UCI)
[31] and the Ludmila Kuncheva Collection (LKC) [32] were used. For each
data set, the feature vectors were normalized to zero mean and unit standard
deviation. A brief description of the data sets used is given in Table 1. The
training and testing data sets were extracted using two-fold cross-validation. The
training data set was also used as the validation data set.
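The preprocessing described above can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, the population standard deviation is an assumption (the text does not say which estimator was used), and constant features are simply mapped to zero.

```python
import random
import statistics

def zscore_columns(rows):
    """Scale every feature column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # Population standard deviation is assumed; constant columns map to zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def two_fold_indices(n, rng):
    """Shuffle 0..n-1 and split it into two halves, each used once for testing."""
    idx = list(range(n))
    rng.shuffle(idx)
    half = n // 2
    return idx[:half], idx[half:]
```

In two-fold cross-validation each half serves once as the training (and here also validation) set and once as the testing set.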
4.2 MCSs
The DES-P and DES-KL systems were compared against two MCSs based on
benchmark combination methods and four competence based MCSs:
1. SB system
This system selects the single best classifier in the ensemble.
2. MV system
This system is based on majority voting of all classifiers in the ensemble.
3. DCS-LA system
This system defines the competence of the classifier ψ for the test example
x as the local classification accuracy. The accuracy is estimated using the k
nearest neighbours of the test example x that are taken from the validation
data set V . k = 10 was chosen since for this value the DCS-LA system had
the best overall performance in previous studies [21].
4. DCS-MCB system
This system defines the competence of the classifier ψ for the test example
x as the classification accuracy calculated for a subset of the validation
data set V. The subset is dynamically generated in the following manner: First, a
multiple classifier behaviour (MCB) is calculated for the test example x and
for its k nearest neighbours taken from V. The MCB is defined as a vector
whose elements are the decisions (i.e. class labels assigned to the example
x) of all classifiers in the ensemble. Next, similarities between the MCBs
are calculated using the averaged Hamming distance. The examples from
V that are most similar to the test example x (i.e. below some similarity
threshold) are then used to generate the subset. Since the optimal values
of the parameter k and the similarity threshold were not given in previous
studies, the values of k = 10 and the similarity threshold equal to 0.5 were
arbitrarily chosen.
5. DCS-MLA system
This system is similar to the DCS-LA system, except the local classification
accuracy is estimated using weighted k nearest neighbours of the test
example x that are taken from V [30].
6. DES-KE system
This system dynamically selects a subset of classifiers with the perfect
classification accuracy of k nearest neighbours of the test example x. The
k nearest neighbours are taken from the validation data set V . If there is no
classifier with the perfect classification accuracy of all k nearest neighbours,
the value of k is decreased until at least one such classifier is found.
k = 8 was chosen since for this value the DES-KE system had the best
performance [24].
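As an illustration of the last system in the list, the DES-KE selection step can be sketched as below. The data layout (pairs of features and per-classifier correctness flags) and the function name are hypothetical, and the fallback to the whole ensemble when no classifier is correct even on the single nearest neighbour is an assumption, as this case is not covered in the description above.

```python
import math

def des_ke_select(x, validation, k=8):
    """Return indices of classifiers that are correct on all k nearest neighbours.

    validation -- list of (features, correct) pairs, where correct[l] is True
                  if classifier l labels that validation example correctly
    """
    ranked = sorted(validation, key=lambda v: math.dist(x, v[0]))
    n_classifiers = len(ranked[0][1])
    # Decrease the neighbourhood size until at least one classifier qualifies.
    for size in range(min(k, len(ranked)), 0, -1):
        neighbours = ranked[:size]
        chosen = [l for l in range(n_classifiers)
                  if all(correct[l] for _, correct in neighbours)]
        if chosen:
            return chosen
    # Assumed fallback (not stated in the text): use the whole ensemble.
    return list(range(n_classifiers))
```

The classifiers returned would then be combined, for example by majority voting.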
4.3 Classifier ensembles
Two types of classifier ensembles were used in the experiments: homogeneous
and heterogeneous. The homogeneous ensemble consisted of classification trees
with the Gini splitting criterion and the pruning level set to 4 [37]. The trees
were used because of their instability with respect to the training data set and
relatively good classification accuracy. The heterogeneous ensemble consisted of
the following 11 classifiers [37]:
• (1 and 2) LDC (QDC) - linear (quadratic) discriminant classifiers based on
normal distributions with the same (different) covariance matrix for each
class;
• (3) NMC - nearest mean classifier;
• (4–6) k-NN - k-nearest neighbours classifiers with k = 1, 5, 15;
• (7 and 8) PARZEN1 (PARZEN2) - Parzen classifier with the Gaussian kernel
and the optimal smoothing parameter hopt (and the smoothing parameter
hopt/2);
• (9) TREE - classification tree with the Gini splitting criterion and the pruning
level set to 4;
• (10 and 11) BPNN1 (BPNN2) - feed-forward backpropagation neural
network classifier containing two hidden layers with 5 neurons each (one
hidden layer with 10 neurons) and the maximum number of learning
epochs set to 80.
The classifiers chosen for the heterogeneous ensemble are structurally diverse,
i.e. training and classification phases are carried out differently for each type of
classifier used. For both ensemble types, classifiers were trained using bagging.
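Bagging trains each ensemble member on a bootstrap replicate of the training data set, i.e. a sample of the same size drawn with replacement [5]. A minimal sketch of the sampling step is given below; the function names and the fixed seed are illustrative assumptions, and the training of the classifiers themselves is omitted.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) examples from data uniformly with replacement."""
    return [rng.choice(data) for _ in data]

def bagged_training_sets(data, n_classifiers, seed=0):
    """Return one bootstrap replicate of data per ensemble member."""
    rng = random.Random(seed)
    return [bootstrap_sample(data, rng) for _ in range(n_classifiers)]
```

Because each replicate omits some examples and repeats others, unstable classifiers such as classification trees trained on different replicates tend to differ from one another.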
5. Results
Classification accuracies (i.e. the percentage of correctly classified examples) were
calculated for the systems using the homogeneous and heterogeneous ensembles
containing 11 classifiers each and they are given in Tables 2 and 3, respectively.
The accuracies are average values obtained over 10 runs (5 replications of
two-fold cross-validation). Differences in rank between the systems were
evaluated using an Iman and Davenport test combined with a post-hoc Holm’s
step-down procedure [38]. Differences between the accuracies of the DES-P and
DES-KL systems and the six MCSs were also evaluated and a 5×2cv F-test was
used [39]. For both tests, the level of P<0.05 was considered as statistically
significant.
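For reference, the combined 5×2cv F statistic [39] can be computed as sketched below. The function and variable names are hypothetical; the input is assumed to be the ten differences in error rate between two systems (two folds in each of five replications). Under the null hypothesis the statistic approximately follows an F distribution with 10 and 5 degrees of freedom, whose upper 5% point is about 4.74.

```python
def combined_5x2cv_f(diffs):
    """Combined 5x2cv F statistic.

    diffs -- five pairs (p1, p2): the difference in error rate between the two
             systems on fold 1 and fold 2 of each replication
    """
    numerator = sum(p1 ** 2 + p2 ** 2 for p1, p2 in diffs)
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    # Undefined when both folds give identical differences in every replication.
    return numerator / (2.0 * sum(variances))

# Upper 5% point of the F distribution with (10, 5) degrees of freedom.
F_CRITICAL_10_5 = 4.74
```

A difference between two systems would be declared significant at P<0.05 when the statistic exceeds the critical value.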
The DES-P and DES-KL systems achieved the highest overall classification
accuracies averaged over all data sets (i.e. 85.19% and 85.09%) for the
homogeneous ensemble (Table 2). The two systems developed outperformed the
SB, MV, DCS-LA, DCS-MCB, DCS-MLA, and DES-KE systems by 9.70%, 1.98%,
5.04%, 4.06%, 4.86% and 3.06% on average, respectively. The ranks of the DES-P
and DES-KL systems were statistically significantly higher than those of the six
MCSs. The exception was that the DES-KL system had a lower rank than the
MV system. The systems developed produced statistically significantly higher
accuracies than the other MCSs in 79 out of 168 cases (14 data sets × 6 MCSs × 2
systems developed). However, against all the other MCSs they were significantly
better for only 2 data sets (the OptDigits and Vowel). The MV system that had
the third best accuracy (83.16%) outperformed the other 5 MCSs for 8 out of 14
data sets.
The DES-P and DES-KL systems also achieved the highest overall classification
accuracies (i.e. 87.71% and 87.24%) when the heterogeneous ensemble was used
(Table 3). The two systems outperformed the SB, MV, DCS-LA, DCS-MCB,
DCS-MLA, and DES-KE systems by 0.44%, 0.50%, 2.05%, 2.05%, 1.98% and
1.46% on average, respectively. The DES-P and DES-KL systems had statistically
significantly higher ranks than those of the DCS-LA, DCS-MCB and DES-KE
systems. The DES-KL system had also a higher rank than the DCS-MLA
system. The two systems developed had statistically significantly higher (lower)
accuracies than the six MCSs in 64 (3) out of 168 cases. However, against all the
other MCSs only the DES-KL system was significantly better, and only for the
Vowel data set. From the six MCSs used for the comparison study, the SB system
achieved the highest classification accuracy averaged over all data sets (87.04%).
Classification accuracies calculated for the MCSs using homogeneous and
heterogeneous ensembles of sizes ranging from 11 to 500 and from 11 to 55
classifiers, respectively, are given in Table 4. The DES-P and DES-KL systems
had the highest overall accuracies for all ensemble sizes and types.
The percentage of times each classifier was selected by the DCS and DES based
systems is given in Table 5. The nearest neighbour and the two Parzen classifiers
were most frequently selected by the systems.
6. Discussion
The newly developed DES-P and DES-KL systems typically had higher
accuracies than the six MCSs. In particular, the systems had the two highest
accuracies for 13 and 6 out of 14 data sets using the homogeneous and
heterogeneous ensembles, respectively. However, they produced statistically
significantly higher accuracies than those of all the other MCSs for only 2 data
sets.
For the homogeneous ensemble, possible reasons for the better performance of
the systems developed could be that the ensemble used was diverse, there was
no superior classifier, and the DES-P and DES-KL systems could select subsets
containing more complementary classifiers than subsets of the best-performing
classifiers used in the other MCSs [1, 10]. Another reason could be that in the
systems developed, all classifiers in the ensemble were evaluated as incompetent
for at least one test example in 9 data sets. Consequently, assuming that the
estimation of the competences was correct for these examples, one can claim that
the random classifier used in the DES-P and DES-KL systems outperformed all
other systems.
For the heterogeneous ensemble, the relatively worse performance of the systems
developed could be explained by the fact that the better-than-random classifiers
in the ensemble were rarely complementary, i.e. that there was a subset of
superior classifiers for all data sets. As a result, the performance of the systems
could deteriorate in the case where other than superior classifiers were selected
from the ensemble. For example, the nearest neighbour and the two Parzen
classifiers often outperformed all the other classifiers in the ensemble. For the
Dermatology, Iris and Yeast data sets, the single best classifier in the ensemble
also outperformed the DES-P and DES-KL systems. This problem could be
overcome by increasing the number of classifiers in the ensemble and introducing
additional diversity by, for example, using only a subset of features for training [6,
28]. The performance of the systems developed could also deteriorate for highly
skewed data sets (i.e. for data sets with highly different class prior probabilities
pj) such as the EColi and Yeast. This is because the value of 1/M could be lower
than the performance of almost any classifier. This problem could be rectified
by estimating prior probabilities pj using the training data set and eliminating
classifiers ψl from the ensemble with Pc(ψl) ≤ pi, where pi = maxj=1,...,M pj .
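The remedy suggested above for skewed data sets can be sketched as follows. The function name and inputs are hypothetical: class priors are estimated as relative class frequencies in the training data set, and the overall accuracy Pc(ψl) of each classifier is assumed to be available already.

```python
from collections import Counter

def prune_by_largest_prior(train_labels, accuracies):
    """Keep classifiers whose accuracy exceeds the largest estimated class prior."""
    counts = Counter(train_labels)
    largest_prior = max(counts.values()) / len(train_labels)
    # Classifiers with accuracy <= largest prior are eliminated.
    kept = [l for l, acc in enumerate(accuracies) if acc > largest_prior]
    return largest_prior, kept
```

For a three-class data set with class frequencies 7:2:1, the threshold becomes 0.7 rather than 1/M ≈ 0.33, so a classifier with 65% accuracy would be removed.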
The two systems developed typically had the highest classification accuracies for
all ensemble sizes. For homogeneous ensembles, their performance improved
noticeably with increased ensemble size. This agrees with the fact that for
bagging classification trees, the rate of the performance improvement is typically
at its highest for the first 30–40 trees [1, 5]. For heterogeneous ensembles, the
performance of each MCS evaluated remained almost the same for all ensemble
sizes. A possible explanation is that a subset of superior classifiers dominated
the ensembles and subsequently, the ensemble size had little or no effect on the
performance of the MCSs. The DES-P and DES-KL systems performed similarly
for all ensemble sizes and types. This indicates that the type of comparison
method used in the systems has little effect on their performance.
Because the MCR does not require feature vectors, the DES-P and DES-KL systems
can be used for classification in dissimilarity space. This is of practical importance
since the direct measurement of distances between examples may be preferred
over feature extraction in some problems, e.g. classification of electrocardiogram
[40] and electroencephalogram signals [41, 42] or trabecular bone texture in knee
radiographs [43].
7. Conclusion
From the work conducted, the following conclusions can be drawn:
1. A measure of competence based on random classification (MCR) for
dynamic ensemble selection was successfully developed. The use of the
MCR with majority voting rule was justified theoretically.
2. Two systems based on the MCR were developed, i.e. DES-Performance
(DES-P) and DES-Kullback-Leibler (DES-KL). They showed the best overall
performances when compared against six MCSs on 14 data sets. It appears
that the systems are well suited for the classification of a wide range of data
sets.
3. The MCR uses distances between examples and the DES-P and DES-KL
systems can be used for classification in both feature and dissimilarity
spaces.
Acknowledgements
Financial support from the School of Mechanical and Chemical Engineering, The
University of Western Australia, is greatly appreciated.
8. References
[1] L I Kuncheva. Combining Pattern Classifiers: Methods and Algorithms.
Wiley-Interscience, 2004.
[2] J Kittler, M Hatef, R P W Duin, and J Matas. “On combining classifiers”.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998),
pp. 226–239.
[3] T Woloszynski and M Kurzynski. “Application of combining classifiers
using dynamic weights to the protein secondary structure prediction -
comparative analysis of fusion methods”. In: 7th International Symposium
on Biological and Medical Data Analysis. 2006, pp. 83–91.
[4] A M P Canuto, M C C Abreu, L M Oliveira, J C Xavier, Jr., and A M Santos.
“Investigating the influence of the choice of the ensemble members in
accuracy and diversity of selection-based and fusion-based methods for
ensembles”. In: Pattern Recognition Letters 28 (2007), pp. 472–486.
[5] L Breiman. “Bagging predictors”. In: Machine Learning 24 (1996),
pp. 123–140.
[6] Y Freund and R E Schapire. “A decision-theoretic generalization of on-line
learning and an application to boosting”. In: Journal of Computer and System
Sciences 55 (1997), pp. 119–139.
[7] T K Ho. “The random subspace method for constructing decision forests”.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998),
pp. 832–844.
[8] L I Kuncheva. “A theoretical study on six classifier fusion strategies”. In:
IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002),
pp. 281–286.
[9] R E Banfield, L O Hall, K W Bowyer, and W P Kegelmeyer. “Ensemble
diversity measures and their application to thinning”. In: Information Fusion
6 (2005), pp. 49–62.
[10] D Ruta and B Gabrys. “Classifier selection for majority voting”. In:
Information Fusion 6 (2005), pp. 63–81.
[11] G Martinez-Munoz, D Hernandez-Lobato, and A Suarez. “An analysis
of ensemble pruning techniques based on ordered aggregation”. In:
IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2009),
pp. 245–259.
[12] D D Margineantu and T G Dietterich. “Pruning adaptive boosting”. In:
Fourteenth International Conference on Machine Learning. 1997, pp. 211–218.
[13] G Martinez-Munoz and A Suarez. “Aggregation ordering in bagging”. In:
IASTED International Conference on Artificial Intelligence and Applications.
2004, pp. 258–263.
[14] G Tsoumakas, L Angelis, and I Vlahavas. “Selective fusion of
heterogeneous classifiers”. In: Intelligent Data Analysis 9 (2005), pp. 511–525.
[15] I Partalas, G Tsoumakas, and I Vlahavas. “Pruning an ensemble of
classifiers via reinforcement learning”. In: Neurocomputing 72 (2009),
pp. 1900–1909.
[16] Z Zhou, J Wu, and W Tang. “Ensembling neural networks: many could be
better than all”. In: Artificial Intelligence 137 (2002), pp. 239–263.
[17] Y Zhang, S Burer, and W N Street. “Ensemble pruning via semi-definite
programming”. In: Journal of Machine Learning Research 7 (2006),
pp. 1315–1338.
[18] L I Kuncheva. “Clustering-and-selection model for classifier combination”.
In: Fourth International Conference on Knowledge-Based Intelligent Engineering
Systems and Allied Technologies. 2000, pp. 185–188.
[19] R Liu and B Yuan. “Multiple classifiers combination by clustering and
selection”. In: Information Fusion 2 (2001), pp. 163–168.
[20] B V Dasarathy and B V Sheela. “A composite classifier system design:
concepts and methodology”. In: Proceedings of the IEEE 67 (1979),
pp. 708–713.
[21] K Woods, W P Kegelmeyer, Jr., and K Bowyer. “Combination of multiple
classifiers using local accuracy estimates”. In: IEEE Transactions on Pattern
Analysis and Machine Intelligence 19 (1997), pp. 405–410.
[22] G Giacinto and F Roli. “Dynamic classifier selection based on multiple
classifier behaviour”. In: Pattern Recognition 34 (2001), pp. 1879–1881.
[23] L Didaci, G Giacinto, F Roli, and G L Marcialis. “A study on the
performances of dynamic classifier selection based on local accuracy
estimation”. In: Pattern Recognition 38 (2005), pp. 2188–2191.
[24] A H R Ko, R Sabourin, and A S Britto, Jr. “From dynamic classifier
selection to dynamic ensemble selection”. In: Pattern Recognition 41 (2008),
pp. 1718–1731.
[25] E Kim and Y Lee. “Multiple classifier fusion method based on local
competence”. In: IASTED International Conference. 2002, pp. 404–409.
[26] M C P de Souto, R G F Soares, A Santana, and A M P Canuto. “Empirical
comparison of dynamic classifier selection methods based on diversity and
accuracy for building ensembles”. In: International Joint Conference on Neural
Networks. 2008, pp. 1480–1487.
[27] E M dos Santos, R Sabourin, and P Maupin. “A dynamic
overproduce-and-choose strategy for the selection of classifier ensembles”.
In: Pattern Recognition 41 (2008), pp. 2993–3009.
[28] L Breiman. “Random forests”. In: Machine Learning 45 (2001), pp. 5–32.
[29] T Woloszynski and M Kurzynski. “On a new measure of classifier
competence applied to the design of multiclassifier systems”. In: 15th
International Conference on Image Analysis and Processing. 2009, pp. 995–1004.
[30] P C Smits. “Multiple classifier systems for supervised remote sensing image
classification based on dynamic classifier selection”. In: IEEE Transactions
on Geoscience and Remote Sensing 40 (2002), pp. 801–813.
[31] A Asuncion and D J Newman. UCI Machine Learning Repository. 2007. URL:
http://www.ics.uci.edu/~mlearn/MLRepository.html.
[32] L I Kuncheva. Ludmila Kuncheva Collection. 2004. URL:
http://www.bangor.ac.uk/~mas00a/activities/patrec1.html.
[33] P J Boland. “Majority systems and the Condorcet Jury Theorem”. In: Journal
of the Royal Statistical Society: Series D (The Statistician) 38 (1989), pp. 181–189.
[34] C List and R E Goodin. “Epistemic democracy: generalizing the Condorcet
Jury Theorem”. In: Journal of Political Philosophy 9 (2001), pp. 277–306.
[35] L Devroye, L Gyorfi, and G Lugosi. A Probabilistic Theory of Pattern
Recognition (Stochastic Modelling and Applied Probability). Springer, 1997.
[36] R P W Duin, P Juszczak, P Paclik, E Pekalska, D de Ridder, D M J Tax,
and S Verzakov. PR-Tools4.1, A Matlab Toolbox for Pattern Recognition.
http://prtools.org. 2007.
[37] R O Duda, P E Hart, and D G Stork. Pattern Classification.
Wiley-Interscience, 2001.
[38] J Demsar. “Statistical comparison of classifiers over multiple data sets”. In:
Journal of Machine Learning Research 7 (2006), pp. 1–30.
[39] E Alpaydin. “Combined 5x2cv F test for comparing supervised
classification learning algorithms”. In: Neural Computation 11 (1999),
pp. 1885–1892.
[40] S Fang and H Chan. “Human identification by quantifying similarity and
dissimilarity in electrocardiogram phase space”. In: Pattern Recognition 42
(2009), pp. 1824–1831.
[41] J L Hernandez, R Biscay, J C Jimenez, P Valdes, and R G de Peralta.
“Measuring the dissimilarity between EEG recordings through a non-linear
dynamical system approach”. In: International Journal of Bio-Medical
Computing 38 (1995), pp. 121–129.
[42] R G Geocadin, R Ghodadra, T Kimura, H Lei, D L Sherman, D F Hanley, and
N V Thakor. “A novel quantitative EEG injury measure of global cerebral
ischemia”. In: Clinical Neurophysiology 111 (2000), pp. 1779–1787.
[43] T Woloszynski, P Podsiadlo, G W Stachowiak, and M Kurzynski. “A
signature dissimilarity measure for trabecular bone texture in knee
radiographs”. In: Medical Physics 37 (2010), pp. 2030–2042.
9. List of tables
Table 1: A brief description of the data sets used.
Data set                  Source  Examples  Features  Classes
Breast Cancer Wisconsin   UCI     699       9         2
Dermatology               UCI     366       34        6
EColi                     UCI     336       7         8
Glass                     UCI     214       9         6
Ionosphere                UCI     351       34        2
Iris                      UCI     150       4         3
Laryngeal3                LKC     353       16        3
OptDigits                 UCI     3823      64        10
Page Blocks               UCI     5473      10        5
Segmentation              UCI     2310      19        7
Thyroid                   LKC     215       5         3
Vowel                     UCI     990       10        11
Wine                      UCI     178       13        3
Yeast                     UCI     1484      8         10
Table 2: Classification accuracies obtained for MCSs using a homogeneous ensemble containing 11 classifiers. The best result for each data set is marked with an asterisk.
Data set       SB      MV      DCS-LA  DCS-MCB DCS-MLA DES-KE  DES-P   DES-KL
Breast C.W.    94.82   95.71   94.85   94.65   94.88   95.16   95.85*  95.71
Dermatology    73.43   85.46   84.10   84.15   83.83   85.90   88.85   88.91*
EColi          71.07   78.81   76.08   75.90   74.53   75.97   81.25*  80.48
Glass          58.12   66.80   65.69   68.49   66.44   68.29   70.90*  70.60
Ionosphere     86.10   87.47   86.72   87.92   85.81   87.64   87.98   88.21*
Iris           90.13   91.07   91.73   92.00   92.00   91.87   92.00   92.67*
Laryngeal3     66.35   69.35   63.91   63.29   63.97   65.16   70.54*  69.92
OptDigits      62.55   86.87   76.17   78.98   75.80   79.84   89.40*  87.28
Page Blocks    95.12   96.08   95.82   95.78   95.73   96.07   96.15   96.35*
Segmentation   86.19   95.52   90.68   91.79   90.99   95.15   96.18   96.39*
Thyroid        90.52   91.72   90.89   90.61   90.71   92.56   93.02   93.21*
Vowel          50.65   71.76   66.81   71.49   71.74   73.49   81.03   83.47*
Wine           86.04   93.69   89.99   90.67   89.08   93.47   94.37*  94.26
Yeast          45.12   53.99   47.97   49.41   48.36   48.60   55.18*  53.81
Average        75.44   83.16   80.10   81.08   80.28   82.08   85.19*  85.09
Table 3: Classification accuracies obtained for MCSs using a heterogeneous ensemble containing 11 classifiers. The best result for each data set is marked with an asterisk.
Data set       SB      MV      DCS-LA  DCS-MCB DCS-MLA DES-KE  DES-P   DES-KL
Breast C.W.    96.25*  96.25*  94.79   94.74   94.82   94.94   96.25*  95.85
Dermatology    95.85*  95.68   93.55   93.60   93.44   94.65   95.68   94.97
EColi          84.05   84.35*  78.87   78.93   78.34   79.29   84.17   83.27
Glass          66.95   67.03   67.70   68.71   68.56   67.91   69.11*  69.01
Ionosphere     85.30   84.67   83.70   84.22   83.87   83.99   86.72*  86.10
Iris           96.80*  95.07   94.93   95.07   95.33   95.87   95.33   96.27
Laryngeal3     70.88   73.09*  65.21   65.10   65.27   65.10   70.48   68.84
OptDigits      96.43   97.23   95.99   95.76   95.90   97.31   97.51*  97.08
Page Blocks    96.09   96.06   95.97   96.00   95.93   96.02   96.04   96.47*
Segmentation   93.70   94.89   94.12   94.36   94.20   95.22   95.60*  95.25
Thyroid        94.52   91.36   94.42   94.42   94.42   94.70   95.08   95.63*
Vowel          87.82   87.94   90.53   88.99   91.19   92.20   92.81   93.17*
Wine           95.86   96.64   95.63   95.39   95.41   96.85   97.64*  96.97
Yeast          58.02*  57.45   50.58   50.78   50.32   50.21   55.53   52.43
Average        87.04   86.98   85.43   85.43   85.50   86.02   87.71*  87.24
Table 4: Classification accuracies averaged over all data sets obtained for different ensemble sizes. The best result for each ensemble size is marked with an asterisk.
Ensemble size  SB      MV      DCS-LA  DCS-MCB DCS-MLA DES-KE  DES-P   DES-KL
Homogeneous
11             75.44   83.16   80.10   81.08   80.28   82.08   85.19*  85.09
22             75.36   84.38   80.13   80.80   80.14   82.79   86.29   86.35*
33             76.35   85.31   81.04   81.20   81.05   84.12   87.07   87.34*
44             76.15   85.30   80.72   81.28   81.09   84.41   87.26   87.38*
55             76.14   85.78   80.98   81.38   80.90   84.90   87.44   87.76*
100            76.46   85.54   80.41   80.99   80.60   85.38   87.34   87.58*
200            76.93   85.96   80.98   81.33   80.94   85.34   87.78   88.05*
500            76.97   85.88   80.91   81.29   80.73   85.56   87.41   87.89*
Heterogeneous
11             87.04   86.98   85.43   85.43   85.50   86.02   87.71*  87.24
22             87.30   87.45   85.86   85.55   85.57   86.60   88.00*  87.56
33             87.38   87.61   85.44   85.40   85.63   86.84   88.05*  87.62
44             87.49   87.69   85.63   85.51   85.55   86.85   88.26*  87.70
55             87.38   87.65   85.53   85.53   85.48   86.83   88.24*  87.65
Table 5: The percentage of times each classifier was selected by the DCS and DES based systems averaged over all ensemble sizes and data sets.
Classifier  DCS-LA   DCS-MCB  DCS-MLA  DES-KE   DES-P    DES-KL
Homogeneous
TREE        –        –        –        47       97       88
Heterogeneous
LDC         7        8        8        60       97       90
QDC         7        7        7        50       87       79
NMC         6        7        6        54       91       82
1-NN        20       18       19       74       99       97
5-NN        7        8        8        60       98       92
15-NN       6        7        6        56       95       89
PARZEN1     18       17       17       73       99       97
PARZEN2     19       17       19       74       99       97
TREE        7        8        7        47       97       87
BPNN1       2        2        2        19       52       40
BPNN2       1        1        1        21       51       41
CHAPTER 5
DISSIMILARITY-BASED MULTIPLE CLASSIFIER
SYSTEM FOR TRABECULAR BONE TEXTURE IN KNEE
RADIOGRAPHS: DETECTION AND PREDICTION OF
OSTEOARTHRITIS
Tomasz Woloszynski¹, Pawel Podsiadlo, PhD¹, Gwidon W Stachowiak, PhD¹,
and Marek Kurzynski, PhD²
¹Tribology Laboratory, School of Mechanical and Chemical Engineering,
University of Western Australia, Australia
²Chair of Systems and Computer Networks, Wroclaw University of Technology,
Poland
Submitted to Proceedings of the Institution of Mechanical Engineers, Part H,
Journal of Engineering in Medicine (Chapter 5).
Abstract
A dissimilarity-based multiple classifier (DMC) system was developed and used
for detection and prediction of knee osteoarthritis (OA). The DMC system
calculates distances between trabecular bone (TB) texture images and uses
the distances with accurate classifiers. To generate the classifiers, a specially
developed approach is used to obtain an ensemble of diverse classifiers and
then select and combine accurate classifiers from the ensemble. The DMC
system was evaluated using standardised radiographs of human knees taken
at baseline and follow-up four years later. Three experienced readers graded
the medial and lateral compartments of the knees for joint space narrowing
(JSN), osteophytes and OA according to an atlas of radiographic features and
the Kellgren and Lawrence (KL) scale. From the radiographs, TB texture images
were selected under both compartments using an automated region selection
method. Texture images selected under 68 healthy (KL grade 0), 69 OA (KL
grade 2 or 3), 60 non-progressive (no change in the sum of JSN and osteophyte
grades between baseline and follow-up) and 59 progressive (an increase in
the sum) compartments were used. The DMC system exhibited statistically
significantly (P<0.05) higher classification accuracies in discriminating between
healthy and OA (90.51%) and between non-progressive and progressive (80.00%)
compartments than accuracies of two benchmark systems. The results indicate
the potential of the DMC system as a decision-support tool for detection and
prediction of knee OA.
Key words: knee osteoarthritis, radiographs, texture, multiple classifier system.
1. Introduction
Knee osteoarthritis (OA) is a chronic and progressive joint disease of unknown
origin characterised by cartilage degradation and bone changes [1–5]. Detection
and prediction of OA are of particular importance since they could prompt earlier
treatment to prevent the cartilage and bone destruction that lead to disability.
Previous studies suggested that bone changes occur first on the development
pathway to knee OA [1, 6], i.e. that changes in subchondral bone precede
osteophytes and loss of cartilage volume. Therefore, bone changes were analysed
for knee OA detection and prediction. For the analysis, knee radiographs were
used since they are cheap, non-invasive, widely accessible, and they contain
two-dimensional bone texture that is directly related to three-dimensional bone
structure [7–11].
Knee OA detection was defined as a problem of classifying a knee radiograph into
healthy (Kellgren and Lawrence (KL) [12] grade 0) or OA (KL grade 2 or 3) class,
and a weighted neighbor distance using a compound hierarchy of algorithms
representing morphology (WND-CHARM) classification system was used [13].
The system extracts 2885 texture features from each knee radiograph and uses
them with a nearest neighbour classifier. The WND-CHARM system was also
used to predict knee OA development defined as an increase in KL grade from 0
at baseline to 3 at follow-up 20 years later [14]. The system was trained on knee
radiographs taken at baseline and then it classified a given knee radiograph as
no-OA-development (KL grade 0 at follow-up) or OA-development (KL grade
3 at follow-up). In other work, a regression model was trained on knee (KL
grade ≥1) radiographs taken at baseline and then it was used to predict OA by
assigning a knee radiograph into no-OA-progression or OA-progression group
[15]. The disease progression was defined as an increase in the sum of joint space
narrowing (JSN) and osteophyte grades between baseline and follow-up three
years later. The regression model uses shape parameters and fractal signature
analysis.
However, uses of the classification system and the regression model are limited.
This is because features of the WND-CHARM system are redundant and their
computation is time-consuming [16]. In addition, shape parameters used in the
regression model describe bone anisotropy only in the horizontal and vertical
directions, and they are calculated using the box-counting technique, which
depends strongly on image signal-to-noise ratio and trabecular marrow pore size
[17]. The recently developed Hurst orientation transform [18] and variance
orientation transform [19] methods overcome most of these problems, but the
results produced depend on angular space quantisation.
One possible solution for knee OA detection and prediction is a
dissimilarity-based texture classification [20–22]. Unlike in other approaches,
distances between TB texture images are used for classification, avoiding
the problems associated with the use of image features. In a previous study,
the distances were calculated using a scale- and rotation-invariant signature
dissimilarity measure (SDM) [22]. For detection of knee OA, the SDM combined
with a support vector machine (SVM) classifier achieved 85.4% classification
accuracy, outperforming the benchmark WND-CHARM system. For prediction
of knee OA progression, a generalised linear model based on roughness and
orientation texture features obtained from the SDM achieved an area under the
receiver operating characteristic curve (AUC) of 0.74 [23].
Although the results obtained are promising, further work is required to
increase the accuracies. This could be achieved by combining the SDM with
a classifier ensemble. The rationale behind this is that classifier ensembles are
generally more accurate than a single classifier, as shown for a wide range of
classification problems [24–26]. However, this approach requires both generation
and combination of diverse and accurate classifiers.
To address these problems, a dissimilarity-based multiple classifier (DMC)
system was developed in this study. The system consists of the following
three components: 1) measurement of distances between TB texture images,
2) generation of a diverse classifier ensemble and 3) selection of accurate
classifiers from the ensemble and combination of the classifiers selected. For
the measurement of the distances, the SDM method is used. A diverse classifier
ensemble is generated using prototype selection [27], bootstrapping of training
set [28] and heterogeneous classifiers [29]. A measure of competence based on
random classification (MCR) [30] is used to select accurate classifiers from the
ensemble. The classifiers selected are combined using the majority voting rule. To
evaluate the performance of the DMC system, two data sets of knee radiographs
were chosen. The first data set, used for OA detection, contains radiographs
taken from healthy (KL grade 0) and OA (KL grade 2 or 3) knees. The second
data set, used for prediction of OA progression, contains radiographs of knees
with non-progressive and progressive OA. The disease progression was defined
as an increase in the sum of JSN and osteophyte grades between baseline and
follow-up four years later.
2. Materials and methods
2.1 Subjects and radiographs
Informed consent was obtained from all subjects studied. For each subject,
standardised anteroposterior knee radiographs were taken in semi-flexed (30°)
position using a Shimadzu Corporation (Kyoto, Japan) x-ray machine (model
P-20) and a radiographic frame [31] at Perth Radiological Clinic, Subiaco, Western
Australia.
Detection of knee OA
102 knee radiographs were taken from 51 subjects. For each radiograph, the
medial and lateral tibiofemoral compartments were graded on the KL scale according
to the 1995 Osteoarthritis Research Society International (OARSI) atlas [32]. The
compartments were graded by two readers with 10 years of experience each.
The discrepancies in grades were adjudicated by a third reader with 15 years
of experience. The grades were used to divide the subjects into healthy and
OA groups. The healthy group contained 17 subjects who had KL grade 0 (no
radiographic OA) in both tibiofemoral compartments in both knees. The OA
group contained 34 subjects who had KL grade 2 or 3 (minimal or moderate
radiographic OA) in at least one tibiofemoral compartment in at least one knee.
Subject characteristics for each group are listed in Table 1.
A TB texture region of size 256×256 pixels was selected under each of the
medial and lateral compartments. For OA subjects, the bone region was selected
only under the compartments diagnosed with radiographic OA. An example
of a knee radiograph on which the regions are selected is shown in Fig. 1. The
region selection was performed using an automated region selection method
[33]. The method uses an active shape model for the delineation of cortical bone
plates and a fine region adjustment for fibula head, periarticular osteopenia, and
subchondral bone sclerosis. The regions selected formed classes of 68 healthy
and 69 OA texture images, respectively. Examples of the images are shown in
Figs. 2(a)–(d). The classes were previously used to evaluate the SDM method in
knee OA detection [22].
Prediction of knee OA progression
From 50 subjects, 100 knee radiographs were taken at baseline (from September
2000 to November 2001) and another 100 at follow-up four years later. Each
knee radiograph was graded for tibiofemoral joint space narrowing (JSN) and
osteophytes (grades 0 to 3, where 0 indicates no JSN or osteophytes) in the
medial and lateral compartments according to the 1995 OARSI atlas. The same
three readers graded the radiographs. Based on the grades, the compartments were
divided into non-progressive and progressive groups. The first group contained
141 compartments with no increase in the sum of JSN and osteophytes grades
between baseline and follow-up. The second group contained 59 compartments
with an increase in the sum between the two examinations. 11 knees had both
non-progressive and progressive compartments. The groups had different subject
characteristics, i.e. age, BMI (body mass index) and gender at baseline, and this
could cause bias in the results [34, 35]. Therefore, 81 non-progressive compartments
from 14 subjects were excluded (Table 2) to match the characteristics. This left the
total number of compartments at 119.
Under each of the compartments a TB texture region of size 256×256 pixels was
selected using the automated region selection method. The selection was done
only for the radiographs taken at baseline. The regions formed non-progressive
and progressive classes containing 60 and 59 texture images, respectively.
Examples are shown in Figs. 2(e)–(h).
2.3 DMC system
A flowchart of the DMC system is shown in Fig. 3. Components of the system are
described in the following sections.
Measurement of distances between images
Distances between TB texture images are measured using the SDM method [22].
The method produces a matrix D = (d_ij)_{N×N} for N images, where d_ij is the
distance between the i-th and j-th images. The matrix is calculated as follows.
First, a scale-space representation of each image is calculated as K convolutions
of the image with the Gaussian kernel functions [36]. The functions differ in
scale parameter σ that is taken from a sequence of predefined scales σ_1, . . . , σ_K.
In this study, the sequence was set to σ_k = 2 × 1.07^k, k = 1, . . . , 25, which
corresponds to TB image sizes ranging from 0.11 to 1.08 mm. Previous studies
showed that OA changes in TB are most significant in this size range [4, 37].
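The construction of the scale-space representation can be illustrated on a one-dimensional intensity profile. This is a minimal sketch under stated assumptions, not the SDM implementation: it assumes the scale sequence is geometric (σ_k = 2 × 1.07^k), it uses a sampled and truncated Gaussian rather than the discrete scale-space kernel of [36], and the function names `scale_sequence`, `gaussian_kernel` and `smooth` are hypothetical.

```python
import math

def scale_sequence(base=2.0, ratio=1.07, num_scales=25):
    """Scales sigma_k = base * ratio**k for k = 1..num_scales (assumed geometric form)."""
    return [base * ratio ** k for k in range(1, num_scales + 1)]

def gaussian_kernel(sigma, truncate=3.0):
    """Sampled, normalised 1-D Gaussian kernel truncated at `truncate` standard deviations."""
    radius = int(math.ceil(truncate * sigma))
    weights = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve a 1-D signal with the Gaussian kernel, replicating border samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    smoothed = []
    for i in range(len(signal)):
        acc = 0.0
        for offset, weight in enumerate(kernel):
            j = min(max(i + offset - radius, 0), last)  # clamp index at the borders
            acc += weight * signal[j]
        smoothed.append(acc)
    return smoothed

sigmas = scale_sequence()
row = [float(i % 8 < 4) for i in range(64)]             # toy striped intensity profile
scale_space = [smooth(row, sigma) for sigma in sigmas]  # one smoothed copy per scale
print(len(scale_space), round(sigmas[0], 2), round(sigmas[-1], 2))
```

For a two-dimensional texture image, the same kernel could be applied along rows and then along columns, because the Gaussian is separable.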
Next, scale and rotation invariant gradient and Laplacian differential operators
[22] are calculated for each image representation. The operators calculated act
as edge and smoothness detectors, respectively. Maximum absolute values of
the operators are then found over all scales for each pixel. This step allows for
detecting edges and smooth regions that are most prominent in the texture image.
The values found are used to calculate roughness and orientation measures for
each pixel. The roughness measure is defined as a difference between the two
maximum absolute values, and it is negative if a neighbourhood of the pixel is
smooth and positive otherwise. The orientation measure is defined as an angle of
the gradient with the maximum absolute value, and it is calculated with respect to
the direction that captures most texture changes in the image. The two measures
are invariant to in-plane rotation and scale within the predefined range of scales.
Next, for each image representation, histograms of the measures are used to
produce roughness and orientation signatures. Finally, the distance d_ij is defined
as the sum of the distance between the two roughness signatures and the distance
between the two orientation signatures produced for the i-th and j-th images. For the
calculation of distances between the signatures, the Earth mover's distance is used [38]. The matrix
D produced is invariant to a range of imaging conditions such as exposure,
magnification, noise, and blur [22]. For TB texture images, the matrix is affected
by bone properties such as trabecular thickness, separation and orientation that
change with knee OA [3].
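The final step, in which d_ij is the sum of a roughness term and an orientation term, can be sketched as follows. The sketch makes simplifying assumptions that are not taken from [22]: each signature is treated as a histogram over the same unit-spaced bins, histograms are normalised to unit mass, and the orientation bins are treated as linear rather than circular. Under those assumptions the Earth mover's distance reduces to the summed absolute difference of the cumulative histograms. The names `emd_1d` and `sdm_like_distance` are hypothetical.

```python
from itertools import accumulate

def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two histograms on the same unit-spaced bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total mass")
    cdf_a = accumulate(h / total_a for h in hist_a)
    cdf_b = accumulate(h / total_b for h in hist_b)
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))

def sdm_like_distance(sig_i, sig_j):
    """Roughness-signature distance plus orientation-signature distance (sketch of d_ij)."""
    return (emd_1d(sig_i["roughness"], sig_j["roughness"])
            + emd_1d(sig_i["orientation"], sig_j["orientation"]))

# Toy signatures for three images (illustrative values only).
images = [
    {"roughness": [4, 3, 2, 1], "orientation": [1, 1, 1, 1]},
    {"roughness": [1, 2, 3, 4], "orientation": [0, 2, 2, 0]},
    {"roughness": [4, 3, 2, 1], "orientation": [1, 1, 1, 1]},
]
D = [[sdm_like_distance(a, b) for b in images] for a in images]
for row in D:
    print([round(d, 3) for d in row])
```

The resulting matrix is symmetric with a zero diagonal, which is the form expected of the dissimilarity matrix D used by the classifiers.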
Generation of a diverse classifier ensemble
The diverse classifier ensemble consists of three different subsets of classifiers.
These subsets are generated using prototype selection [27], bootstrapping of
training set [28] and heterogeneous classifiers [29], respectively.
Subset 1: Prototype selection
For this subset, first a percentage of TB texture images, called prototypes, is
randomly selected from each class. Then, classifiers are trained on the distances
between the prototypes and the remaining TB texture images. 100 linear and
100 quadratic discriminant classifiers were used with the prototypes consisting
of 20% and 10% of all TB texture images, respectively. The percentages of
prototypes and the number and types of classifiers used are a trade-off between
good performance, high diversity and low computational complexity [27].
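Prototype selection can be sketched on a toy dissimilarity matrix. This is an illustrative sketch only: the helper names `select_prototypes` and `dissimilarity_features`, the toy matrix and the fixed random seed are assumptions, and the discriminant classifiers that would be trained on the resulting vectors are omitted.

```python
import random

def select_prototypes(labels, fraction, rng):
    """Randomly pick `fraction` of the image indices from each class."""
    prototypes = []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        count = max(1, round(fraction * len(members)))
        prototypes.extend(rng.sample(members, count))
    return sorted(prototypes)

def dissimilarity_features(D, prototypes):
    """Represent each remaining image by its distances to the prototypes."""
    chosen = set(prototypes)
    return {i: [D[i][p] for p in prototypes]
            for i in range(len(D)) if i not in chosen}

rng = random.Random(42)                                    # fixed seed for repeatability
labels = ["healthy"] * 5 + ["OA"] * 5
D = [[abs(i - j) for j in range(10)] for i in range(10)]   # toy distance matrix
prototypes = select_prototypes(labels, 0.20, rng)          # 20% of each class
features = dissimilarity_features(D, prototypes)
print(prototypes, len(features))
```

Repeating the random draw with a different seed for each classifier is what would make the members of this subset differ from one another.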
Subset 2: Bootstrapping of training set
This subset contains 50 decision tree classifiers with the Gini splitting criterion
and the pruning level set to 5 [39]. The classifiers were trained in a
five-dimensional feature space obtained by projecting the distances taken from
the matrix D. For the distance projection, Sammon's nonlinear mapping was used
[40]. Each classifier was trained on a bootstrapped sample of the training set [28].
Previous studies showed that a combination of the classifiers outperforms the
single best classifier [26, 30].
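Bootstrapping of the training set amounts to drawing as many training indices as there are images, with replacement, once per classifier. The sketch below shows only the resampling step; the helper name `bootstrap_indices`, the training-set size of 60 and the seed are illustrative assumptions, and the decision trees themselves are not implemented.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n training indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(0)
n_train, n_classifiers = 60, 50
replicates = [bootstrap_indices(n_train, rng) for _ in range(n_classifiers)]

# On average roughly 63% of the training images appear in a replicate,
# so each tree sees a different subset and the ensemble gains diversity.
unique_share = sum(len(set(r)) for r in replicates) / (n_classifiers * n_train)
print(f"mean share of distinct images per replicate: {unique_share:.2f}")
```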
Subset 3: Heterogeneous classifiers
11 heterogeneous classifiers are included in this subset [39]: (1) linear, (2)
quadratic, (3) nearest mean, (4-6) k-nearest neighbour classifiers with k = 1, 5, 15,
(7 and 8) Parzen classifiers with the optimal smoothing parameter h or h/2, (9)
decision tree classifier with the Gini splitting criterion and the pruning level set
to 5, (10) SVM classifier with radial basis function (RBF) kernel, (11) feed-forward
backpropagation neural network classifier containing one hidden layer with 10
neurons. Each classifier was trained in a two-dimensional feature space that was
obtained by projecting the distances using Sammon's mapping. Bootstrapping
of training set was used. It was shown that the heterogeneous classifiers perform
well for a number of benchmark data sets [26, 30].
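As an illustration of one member of this subset, a k-nearest neighbour classifier in a two-dimensional feature space can be sketched as follows. The points stand in for Sammon-projected images and are invented for the example; the function name `knn_predict` is hypothetical, and smaller values of k than those used in the study are chosen because the toy set has only eight points.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Label of the majority class among the k training points closest to `query`."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D coordinates (illustrative values only).
train_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
                (3.0, 3.1), (3.2, 2.9), (2.9, 3.3), (3.1, 3.0)]
train_labels = ["healthy"] * 4 + ["OA"] * 4

for k in (1, 3, 5):
    print(k, knn_predict(train_points, train_labels, (0.2, 0.2), k),
          knn_predict(train_points, train_labels, (3.0, 3.0), k))
```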
Selection of accurate classifiers and combination of the classifiers selected
To select accurate classifiers from the diverse classifier ensemble, the MCR
method is used [30]. For each TB texture image, the method estimates local
classification accuracies of all classifiers. Accuracies of heterogeneous classifiers
are estimated using weighted k = 10 nearest neighbours. For homogeneous
classifiers, the estimation is performed by measuring how “close” the classifiers
are to a random classifier in an information-theoretic sense. The classifiers with
local accuracies that are higher than random are selected from the ensemble and
combined using the majority voting rule. In previous studies, the MCR method
and the voting rule outperformed other methods used for combining classifiers
[30].
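The selection-and-combination step can be sketched once a local competence value is available for each classifier. The sketch does not implement the MCR estimate itself; it assumes the competences are given as numbers in [0, 1], takes 1/(number of classes) as the accuracy of a random classifier, and falls back to the whole ensemble when no classifier qualifies. The fallback and the name `select_and_vote` are assumptions made for the example.

```python
from collections import Counter

def select_and_vote(predictions, competences, num_classes=2):
    """Majority vote among classifiers whose local competence beats random guessing."""
    threshold = 1.0 / num_classes
    selected = [p for p, c in zip(predictions, competences) if c > threshold]
    if not selected:                 # assumed fallback: use every classifier
        selected = list(predictions)
    # Ties are resolved in favour of the label encountered first.
    return Counter(selected).most_common(1)[0][0]

# Five classifiers voting on one texture image (illustrative values only).
predictions = ["OA", "healthy", "healthy", "OA", "healthy"]
competences = [0.90, 0.40, 0.45, 0.70, 0.30]
print(select_and_vote(predictions, competences))
```

In this toy case a plain majority vote over all five classifiers would return "healthy", whereas the vote restricted to the two competent classifiers returns "OA", which shows how the selection step can change the decision.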
2.4 Comparison against other benchmark systems
The performance of the DMC system was compared against the following
systems used for classification of knee radiographs.
WND-CHARM system
This system is a benchmark for detection of knee OA [13] and prediction of knee
OA development [14]. In the system, 2885 features are extracted from each texture
image. The features include Chebyshev statistics, first four moments, multiscale
histograms, Haralick, Tamura and Zernike features, and they are calculated on
raw images, images processed by wavelet, Fourier and Chebyshev transforms
and combinations of the processed images [16]. A Fisher score is calculated for
each feature and 10% of features with the highest scores are selected. The selected
features are used with a weighted nearest neighbour classifier.
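The feature-ranking step can be sketched with a textbook form of the Fisher score, i.e. between-class variance divided by within-class variance for each feature. The exact score used by WND-CHARM is defined in [16] and may differ in weighting; the function names `fisher_score` and `top_features` and the toy data are assumptions made for the example.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Between-class over within-class variance for one feature (textbook form)."""
    overall = mean(values)
    between = within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else float("inf")

def top_features(feature_columns, labels, fraction=0.10):
    """Indices of the highest-scoring `fraction` of features."""
    scores = [fisher_score(col, labels) for col in feature_columns]
    keep = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return sorted(ranked[:keep])

labels = [0] * 4 + [1] * 4
discriminative = [0.0, 0.1, 0.2, 0.1, 5.0, 5.1, 5.2, 5.1]   # separates the classes
uninformative = [1, 2, 3, 4, 1, 2, 3, 4]                     # same values in both classes
columns = [discriminative] + [uninformative] * 9             # 10 toy features
print(top_features(columns, labels))
```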
SDM-SVM system
This system achieved the best performance in detection of knee OA based on
classification of TB texture images [22]. In the system, the SVM classifier with RBF
kernel is trained on the distances between TB texture images measured using the
SDM method. The SVM classifier was chosen because of its good performance
for a wide range of binary classification problems.
3. Results
Classification accuracy (the percentage of correctly classified TB texture images),
specificity (the percentage of healthy/non-progressive images classified as
healthy/non-progressive) and sensitivity (the percentage of OA/progressive
images classified as OA/progressive) were calculated as averages over 5
repetitions of 2-fold cross-validation. Differences between the accuracies were
evaluated using 5×2 cv F test [41] with the significance level of P<0.05.
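The combined 5×2 cv F statistic of [41] can be computed from the ten fold-wise differences in error rate between two systems. With p_i^(j) denoting the difference on fold j of repetition i and s_i^2 the variance estimate of repetition i, the statistic is f = Σ_i Σ_j (p_i^(j))^2 / (2 Σ_i s_i^2), which is approximately F-distributed with 10 and 5 degrees of freedom. The sketch below uses invented differences, not values from this study, and the function name is hypothetical.

```python
def combined_5x2cv_f(diffs):
    """diffs: five (fold-1, fold-2) pairs of error-rate differences between two classifiers."""
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected 5 repetitions of 2-fold cross-validation")
    numerator = sum(p ** 2 for pair in diffs for p in pair)
    variances = []
    for p1, p2 in diffs:
        centre = (p1 + p2) / 2.0
        variances.append((p1 - centre) ** 2 + (p2 - centre) ** 2)
    return numerator / (2.0 * sum(variances))

F_CRITICAL_10_5 = 4.74  # upper 5% point of the F(10, 5) distribution

# Hypothetical error-rate differences, for illustration only.
diffs = [(0.06, 0.05), (0.07, 0.05), (0.05, 0.06), (0.06, 0.04), (0.05, 0.07)]
f_value = combined_5x2cv_f(diffs)
print(f"f = {f_value:.2f}, significant at P<0.05: {f_value > F_CRITICAL_10_5}")
```

The statistic is undefined when both folds of every repetition give identical differences, because the denominator is then zero; a practical implementation would need to handle that case.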
For detection of knee OA, the DMC system (90.51%) had a statistically significantly
higher classification accuracy than the SDM-SVM (84.96%) and
WND-CHARM (67.74%) systems (Table 3). The system developed also had the
highest specificity and sensitivity values. For the DMC and SDM-SVM systems,
the specificity values were lower than their respective sensitivities.
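Because classification accuracy is the class-size-weighted average of specificity and sensitivity, the detection results can be cross-checked against the class sizes given in Section 2.1 (68 healthy and 69 OA images). The sketch below performs that check with the values reported in Table 3; the function name `accuracy_from_rates` is hypothetical, and small residuals are expected because the tabulated values are rounded averages over cross-validation repetitions.

```python
def accuracy_from_rates(specificity, sensitivity, n_negative, n_positive):
    """Overall accuracy (%) implied by specificity and sensitivity (both in %)."""
    correct = specificity * n_negative + sensitivity * n_positive
    return correct / (n_negative + n_positive)

# (specificity, sensitivity, reported accuracy) from Table 3.
table3 = {
    "WND-CHARM": (76.47, 59.13, 67.74),
    "SDM-SVM": (77.65, 92.17, 84.96),
    "DMC": (87.65, 93.33, 90.51),
}
for name, (spec, sens, reported) in table3.items():
    implied = accuracy_from_rates(spec, sens, 68, 69)
    print(f"{name}: implied {implied:.2f}%, reported {reported:.2f}%")
```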
The DMC system (80.00%) also had a statistically significantly higher classification
accuracy than the SDM-SVM (71.09%) and WND-CHARM (63.87%) systems
for prediction of knee OA progression (Table 4). The highest specificity value
(82.00%) was obtained for DMC. The DMC and WND-CHARM systems had
sensitivity values lower than their respective specificities.
The total computational time required for the WND-CHARM system to produce
the detection and the prediction results was 571.9 minutes. For the SDM-SVM
and DMC systems, this took 62.1 and 66 minutes, respectively. The computational
times were measured on an Intel Core2 Quad computer with 4 GB of RAM and a 2.66
GHz clock using MATLAB.
4. Discussion
A new dissimilarity-based multiple classifier system, called a DMC system, for
TB texture images was developed and used to detect and predict knee OA.
Unlike other classification systems, the DMC system uses distances calculated
between images and accurate classifiers. For the calculation of the distances, the
SDM method was used. A special approach was applied to generate a diverse
ensemble consisting of three subsets of classifiers, and then accurate classifiers
were selected from the ensemble using the MCR method and combined according
to the majority voting rule.
The performance of the DMC system in detection of OA was investigated using
TB texture images extracted from radiographs of healthy (KL grade 0) and OA
(KL grade 2 or 3) knees. Results showed that the DMC system is more accurate (by
22.77 percentage points) than the benchmark WND-CHARM system. One possible reason for the
higher accuracy could be that the SDM method used in DMC quantifies bone
texture roughness and orientation for scales that were adjusted to trabecular
image sizes at which OA changes are most prominent [4]. In contrast, texture
features extracted in the WND-CHARM system were calculated for fixed scales
that cannot be adjusted. Previous studies showed that the roughness and
orientation of TB texture change at the trabecular sizes as a result of fenestration
and thinning of trabeculae and realignment of bone microstructure towards
the main loading direction in knee OA [1, 2, 4, 42]. Another reason could be
that the SDM method is invariant to a range of imaging conditions such as
exposure, magnification, noise, and blur encountered in a routine screening of
knee radiographs. The WND-CHARM system extracts features that are sensitive
to image rotation (e.g. Tamura features [43]) and angular space quantisation (e.g.
Haralick features [44]). Consequently, even a small change in imaging conditions
can be detrimental to the ability of the WND-CHARM system to discriminate
between images of healthy and OA knees.
The DMC system is also more accurate than the SDM-SVM system. This could
be explained by the fact that the latter system uses a single classifier instead of
a combination of classifiers. It was shown that combined classifiers reduce risks
associated with choosing a space that does not contain the optimal classifier and
falling into local error minima during training [24]. Another possible explanation
could be that the classifiers combined in DMC are able to reproduce arbitrary
decision boundaries and they are not sensitive to outliers. The single SVM
classifier used in the SDM-SVM, however, could perform poorly in detecting knee
OA where decision boundaries are piecewise smooth or the number of outliers is
unknown [45].
The performance of the DMC system was also investigated in prediction of
OA progression defined as an increase in the sum of JSN and osteophytes
grades between baseline and follow-up. The DMC system exhibited the best
performance. This could be explained in the same way as for the detection
of knee OA, in particular by the fact that the DMC system is adjusted to the
scales at which OA progression affects roughness and orientation of TB texture.
The progression is manifested by the retention of horizontal trabeculae and
reorganisation of bone microstructure due to trabeculae being less well aligned
to the main loading direction [42].
For detection of knee OA, the DMC system had the highest specificity (87.65%)
and sensitivity (93.33%) values. In the case of prediction of knee OA progression,
the SDM-SVM (81.36%) and DMC (77.97%) systems had the two highest
sensitivity values. The higher sensitivity of the former system, however, came at
the cost of a very low specificity (61.00%), much lower than the 82.00% obtained
for the system developed. The DMC values were higher than those obtained for
the shape parameters used in the previous study on the prediction of knee OA
progression [15]. For medical applications, however, slightly higher sensitivity
than specificity for OA prediction could be preferable. This is because the cost
of treating OA at later stages is usually higher than the cost of misdiagnosing a
non-progressive knee.
In conclusion, the results obtained indicate that the DMC system developed could
be a useful decision-support tool for the assessment of risk and severity of
the disease. It has higher classification accuracies for knee OA detection (by
22.77 and 5.55 percentage points) and prediction (by 16.13 and 8.91 percentage points) than the benchmark
WND-CHARM and SDM-SVM systems. The DMC could also be useful in other
medical applications, e.g. diagnosis of interstitial lung disease based on chest
radiography [46], classification of breast lesions using ultrasound images [47]
and discrimination of dermoscopic images for skin lesions [48]. Future work will
focus on further evaluations of the DMC system, especially using large data sets
of knee radiographs from longitudinal studies.
Acknowledgements
The authors appreciate financial support from the School of Mechanical and
Chemical Engineering, University of Western Australia.
5. List of figures
Figure 1: An example of knee radiograph with the selected regions.
Figure 2: Examples of TB texture images from healthy [(a) and (b)], OA [(c) and (d)], non-progressive [(e) and (f)], and progressive [(g) and (h)] classes.
[Flowchart boxes: Knee radiographs; Extraction of TB texture images; DMC system, comprising Signature dissimilarity measure (SDM), Distances between TB texture images, Diverse classifier ensemble (prototype selection, bootstrapping of training set, heterogeneous classifiers), Measure of competence based on random classification (MCR), Accurate classifiers, Majority voting, Classification; Classified TB texture images.]
Figure 3: A flowchart of the DMC system.
6. List of tables
Table 1: Subject characteristics for detection of knee OA.
Group     No. of TB texture images (KL grade)   Mean (SD) age in years   Mean (SD) BMI   Men/Women
Healthy   68 (0)                                40.8 (7.4)               24.6 (4.5)      0.70
OA        37 (2), 32 (3)                        44.5 (7.5)               27.1 (4.0)      0.69
Table 2: Subject characteristics for prediction of knee OA progression.
Group             No. of TB texture images (increase in the sum∗)   Mean (SD) age in years   Mean (SD) BMI in kg/m²   Men/Women
Non-progressive   60 (0)                                            47.1 (6.3)               25.5 (1.8)               0.75
Progressive       41 (1), 12 (2), 5 (3), 1 (5)                      47.5 (6.0)               26.7 (2.8)               0.83
∗ Sum of JSN and osteophytes grades between baseline and follow-up four years later.
Table 3: Classification accuracies, specificities and sensitivities (in percent) of the WND-CHARM, SDM-SVM and DMC systems for detection of knee OA.
System      Accuracy   Specificity   Sensitivity
WND-CHARM   67.74      76.47         59.13
SDM-SVM     84.96      77.65         92.17
DMC         90.51      87.65         93.33
Table 4: Classification accuracies, specificities and sensitivities (in percent) of the WND-CHARM, SDM-SVM and DMC systems for prediction of knee OA progression.
System      Accuracy   Specificity   Sensitivity
WND-CHARM   63.87      64.33         63.39
SDM-SVM     71.09      61.00         81.36
DMC         80.00      82.00         77.97
7. References
[1] C Buckland-Wright. “Subchondral bone changes in hand and knee
osteoarthritis detected by radiography”. In: Osteoarthritis and Cartilage 12
(2004), S10–S19.
[2] L Kamibayashi, U P Wyss, T D V Cooke, and B Zee. “Changes in mean
trabecular orientation in the medial condyle of the proximal tibia in
osteoarthritis”. In: Calcified Tissue International 57 (1995), pp. 69–73.
[3] L Kamibayashi, U P Wyss, T D V Cooke, and B Zee. “Trabecular
Microstructure in the Medial Condyle of the Proximal Tibia of Patients with
Knee Osteoarthritis”. In: Bone 17 (1995), pp. 27–35.
[4] E A Messent, R J Ward, C J Tonkin, and C Buckland-Wright. “Tibial
cancellous bone changes in patients with knee osteoarthritis. A short term
longitudinal study using Fractal Signature Analysis”. In: Osteoarthritis and
Cartilage 13 (2005), pp. 463–470.
[5] E L Radin and R M Rose. “Role of Subchondral Bone in the Initiation
and Progression of Cartilage Damage”. In: Clinical Orthopaedics and Related
Research 213 (1986), pp. 34–40.
[6] C Ding, F Cicuttini, and G Jones. “Tibial subchondral bone size and knee
cartilage defects: relevance to knee osteoarthritis”. In: Osteoarthritis and
Cartilage 15 (2007), pp. 479–486.
[7] L Apostol, V Boudousq, O Basset, C Odet, S Yot, J Tabary, J M Dinten,
E Boiler, P O Kotzki, and F J Peyrin. “Relevance of 2D radiographic texture
analysis for the assessment of 3D bone micro-architecture”. In: Medical
Physics 33 (2006), pp. 3546–3556.
[8] R Jennane, R Harba, G Lemineur, S Bretteil, A Estrade, and C L Benhamou.
“Estimation of the 3D self-similarity parameter of trabecular bone from its
2D projection”. In: Medical Image Analysis 11 (2007), pp. 91–98.
[9] G Luo, J H Kinney, J J Kaufman, D Haupt, A Chiabrera, and
R S Siffert. “Relationship between plain radiographic patterns and
three-dimensional trabecular architecture in the human calcaneus”. In:
Osteoporosis International 9 (1999), pp. 339–345.
[10] L Pothuaud, C L Benhamou, P Porion, E Lespessailles, R Harba, and
P Levitz. “Fractal dimension of trabecular bone projection texture is related
to three dimensional microarchitecture”. In: Journal of Bone and Mineral
Research 15 (2000), pp. 691–699.
[11] L Pothuaud, P Carceller, and D Hans. “Correlations between grey-level
variations in 2D projections images (TBS) and 3D microarchitecture:
Applications in the study of human trabecular bone microarchitecture”. In:
Bone 42 (2008), pp. 775–787.
[12] J H Kellgren and J S Lawrence. “Radiological assessment of osteoarthrosis”.
In: Annals of the Rheumatic Diseases 16 (1957), pp. 494–502.
[13] L Shamir, S M Ling, W W Scott, Jr., A Bos, N Orlov, T J Macura, D M Eckley,
L Ferrucci, and I G Goldberg. “Knee x-ray image analysis method for
automated detection of osteoarthritis”. In: IEEE Transactions on Biomedical
Engineering 56 (2009), pp. 407–415.
[14] L Shamir, S M Ling, W Scott, M Hochberg, L Ferrucci, and I G Goldberg.
“Early detection of radiographic knee osteoarthritis using computer aided
analysis”. In: Osteoarthritis and Cartilage 17 (2009), pp. 1307–1312.
[15] V B Kraus, S Feng, S C Wang, S White, M Ainslie, A Brett, A Holmes, and
H C Charles. “Trabecular morphometry by fractal signature analysis is a
novel marker of osteoarthritis progression”. In: Arthritis & Rheumatism 60
(2009), pp. 3711–3722.
[16] N Orlov, L Shamir, T Macura, J Johnston, M D Eckley, and I G Goldberg.
“WND-CHARM: Multi-purpose image classification using compound
image transforms”. In: Pattern Recognition Letters 29 (2008), pp. 1684–1693.
[17] H W Chung, C C Chu, M Underweiser, and F W Wehrli. “On the fractal
nature of trabecular structure”. In: Medical Physics 21 (1994), pp. 1535–1540.
[18] P Podsiadlo and G W Stachowiak. “Analysis of trabecular bone texture
by modified Hurst orientation transform”. In: Medical Physics 29 (2002),
pp. 460–474.
[19] M Wolski, P Podsiadlo, G W Stachowiak, L S Lohmander, and M Englund.
“Differences in trabecular bone texture between knees with and without
radiographic osteoarthritis detected by directional fractal signature
method”. In: Osteoarthritis and Cartilage 18 (2010), pp. 684–690.
[20] R P W Duin, D de Ridder, and D M J Tax. “Experiments with a featureless
approach to pattern recognition”. In: Pattern Recognition Letters 18 (1997),
pp. 1159–1166.
[21] E Pekalska and R P W Duin. “Dissimilarity representations allow
for building good classifiers”. In: Pattern Recognition Letters 23 (2002),
pp. 943–956.
[22] T Woloszynski, P Podsiadlo, G W Stachowiak, and M Kurzynski. “A
signature dissimilarity measure for trabecular bone texture in knee
radiographs”. In: Medical Physics 37 (2010), pp. 2030–2042.
[23] T Woloszynski, P Podsiadlo, G W Stachowiak, M Kurzynski, L S
Lohmander, and M Englund. “Prediction of progression of radiographic
knee osteoarthritis using tibial trabecular bone texture”. In: Arthritis &
Rheumatism 64 (2012), pp. 688–695.
[24] L I Kuncheva. Combining Pattern Classifiers: Methods and Algorithms.
Wiley-Interscience, 2004.
[25] P C Smits. “Multiple classifier systems for supervised remote sensing image
classification based on dynamics classifier selection”. In: IEEE Transactions
on Geoscience and Remote Sensing 40 (2002), pp. 801–813.
[26] T Woloszynski and M Kurzynski. “A probabilistic model of classifier
competence for dynamic ensemble selection”. In: Pattern Recognition 44
(2011), pp. 2656–2668.
[27] E Pekalska, R P W Duin, and P Paclik. “Prototype selection for
dissimilarity-based classifiers”. In: Pattern Recognition 39 (2006),
pp. 189–208.
[28] L Breiman. “Bagging predictors”. In: Machine Learning 24 (1996),
pp. 123–140.
[29] A M P Canuto, M C C Abreu, L M Oliveira, J C Xavier, Jr., and A M Santos.
“Investigating the influence of the choice of the ensemble members in
accuracy and diversity of selection-based and fusion-based methods for
ensembles”. In: Pattern Recognition Letters 28 (2007), pp. 472–486.
[30] T Woloszynski, M Kurzynski, P Podsiadlo, and G W Stachowiak. “A
measure of competence based on random classification for dynamic
ensemble selection”. In: Information Fusion 13 (2012), pp. 207–212.
[31] P Podsiadlo and G W Stachowiak. “A rig for acquisition of standardized
trabecular bone radiographs”. In: Acta Radiologica 43 (2002), pp. 101–103.
[32] R D Altman, M Hochberg, W A Murphy, Jr., F Wolfe, and M
Lequesne. “Atlas of individual radiographic features in osteoarthritis”. In:
Osteoarthritis and Cartilage 3 (1995), A3–A70.
[33] P Podsiadlo, M Wolski, and G W Stachowiak. “Automated selection of
trabecular bone regions in knee radiographs”. In: Medical Physics 35 (2008),
pp. 1870–1883.
[34] P Manninen, H Riihimaki, M Heliovaara, and P Makela. “Overweight,
gender and knee osteoarthritis”. In: International Journal of Obesity and
Related Metabolic Disorders 20 (1996), pp. 595–597.
[35] M Reijman, H A P Pols, A P Bergink, J M W Hazes, J N Belo, A M Lievense,
and S M A Bierma-Zeinstra. “Body mass index associated with onset and
progression of osteoarthritis of the knee but not of the hip: The Rotterdam
Study”. In: Annals of the Rheumatic Diseases 66 (2007), pp. 158–162.
[36] T Lindeberg. “Scale-space for discrete signals”. In: IEEE Transactions on
Pattern Analysis and Machine Intelligence 12 (1990), pp. 234–254.
[37] P Podsiadlo, L Dahl, M Englund, L S Lohmander, and G W Stachowiak.
“Differences in trabecular bone texture between knees with and without
radiographic osteoarthritis detected by fractal methods”. In: Osteoarthritis
and Cartilage 16 (2008), pp. 323–329.
[38] Y Rubner, C Tomasi, and L J Guibas. “The Earth Mover's Distance as a
metric for image retrieval”. In: International Journal of Computer Vision 40
(2000), pp. 99–121.
[39] R O Duda, P E Hart, and D G Stork. Pattern Classification.
Wiley-Interscience, 2001.
[40] J W Sammon. “A nonlinear mapping for data structure analysis”. In: IEEE
Transactions on Computers C18 (1969), pp. 401–409.
[41] E Alpaydin. “Combined 5x2cv F test for comparing supervised
classification learning algorithms”. In: Neural Computation 11 (1999),
pp. 1885–1892.
[42] E A Messent, R J Ward, C J Tonkin, and C Buckland-Wright. “Cancellous
bone differences between knees with early, definite and advanced joint
space loss; a comparative quantitative macroradiographic study”. In:
Osteoarthritis and Cartilage 13 (2005), pp. 39–47.
[43] H Tamura, S Mori, and T Yamawaki. “Textural features corresponding to
visual perception”. In: IEEE Transactions on Systems, Man and Cybernetics
SMC-8 (1978), pp. 460–473.
[44] R M Haralick, K Shanmugam, and I Dinstein. “Textural features for image
classification”. In: IEEE Transactions on Systems, Man, and Cybernetics SMC-3
(1973), pp. 610–621.
[45] S Abe. Support Vector Machines for Pattern Classification (Advances in Pattern
Recognition). Springer, 2005.
[46] S G Armato III, A S Roy, H MacMahon, F Li, K Doi, S Sone, and M B Altman.
“Evaluation of automated lung nodule detection on low-dose computed
tomography scans from a lung cancer screening program”. In: Academic
Radiology 12 (2005), pp. 337–346.
[47] B Liu, H D Cheng, J Huang, J Tian, X Tang, and J Liu. “Fully automatic
and segmentation-robust classification of breast tumors based on local
texture analysis of ultrasound images”. In: Pattern Recognition 43 (2010),
pp. 280–298.
[48] C Serrano and B Acha. “Pattern analysis of dermoscopic images based on
Markov random fields”. In: Pattern Recognition 42 (2009), pp. 1052–1057.
CHAPTER 6
CONCLUSIONS AND FUTURE WORK
This chapter begins with the main findings and observations taken from the four
papers that form the core of this thesis. General conclusions drawn from the
studies conducted are presented afterwards. The chapter ends with a discussion
of possible future work related to potential applications of the system developed
in medicine and other areas.
1. Summary of main findings and observations
Main findings and observations taken from Chapter 2 (paper 1):
• A new method, the signature dissimilarity measure (SDM), was developed
for measuring distances between trabecular bone (TB) texture regions
selected on knee radiographs. The method was evaluated using TB texture
images of healthy and osteoarthritic (OA) knees, images taken from the Brodatz
album, knee radiographs of a frozen tibia head, and computer-generated
fractal texture images.
• The accuracy of the method developed in detection of knee OA was studied.
Results showed that the SDM method, when combined with the support
vector machine (SVM) classifier, outperforms the benchmark system for
knee classification.
• The accuracy of the method in rotation-invariant texture classification
was evaluated and compared against the multiresolution grey-scale- and
rotation-invariant benchmark method. The two methods, when combined
with the nearest neighbour classifier, exhibited comparable performance.
• The effects of imaging conditions on the SDM method were investigated.
The method was found to be invariant to the ranges of exposure,
magnification, image size, anisotropy direction, noise, and blur
encountered in routine screening of knee radiographs.
• From the results obtained it was concluded that the SDM method is accurate
in texture classification and invariant to imaging conditions.
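The idea of classifying from distances rather than from feature vectors can be sketched as follows. This is a minimal illustration, assuming a dissimilarity matrix has already been computed by some measure such as the SDM; the function name, the distance values and the labels are hypothetical and are not taken from the thesis.

```python
import numpy as np

def nearest_neighbour_predict(dist_to_train, train_labels):
    """Assign each query the label of its closest training sample.

    dist_to_train: (n_queries, n_train) precomputed dissimilarities.
    train_labels:  (n_train,) class labels of the training samples.
    """
    dist_to_train = np.asarray(dist_to_train, dtype=float)
    train_labels = np.asarray(train_labels)
    # Index of the smallest dissimilarity in each row (one row per query).
    nearest = np.argmin(dist_to_train, axis=1)
    return train_labels[nearest]

# Hypothetical dissimilarities between 2 query images and 3 training images.
dist = [[0.2, 0.9, 0.7],
        [0.8, 0.1, 0.6]]
labels = ["healthy", "OA", "OA"]
predicted = nearest_neighbour_predict(dist, labels)
print(predicted)
```

Only the distance matrix is needed at prediction time, so any dissimilarity measure can be plugged in without defining explicit image features.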
Main findings and observations taken from Chapter 3 (paper 2):
• Three parameters of TB texture, i.e. roughness, degree of anisotropy
and direction of anisotropy, were calculated using the SDM method. A
generalised linear model based on the parameters was constructed.
• The accuracy of the model in prediction of knee OA progression was
studied. The results obtained showed that the model can predict OA
progression in knees with early (Kellgren and Lawrence [KL] grade ≤ 1) and
late (KL grade ≥ 2) radiographic knee OA at baseline.
• The model had higher prediction accuracy than models based on
age, sex, body mass index, and joint space narrowing grade at baseline.
• The SDM method could be a valuable tool in prediction of knee OA
progression and in quantification of OA changes in the bone texture.
Main findings and observations taken from Chapter 4 (paper 3):
• A new method, the measure of competence based on random classification
(MCR), was developed for selecting accurate classifiers from classifier
ensembles. The method was evaluated using 14 benchmark data sets.
• Based on theoretical derivation it was shown that the method developed
increases classification accuracy of the majority voting rule.
• Results showed that the MCR method outperforms other methods based on
classifier ensembles regardless of the ensemble type used (homogeneous or
heterogeneous).
• The MCR method uses distances between images instead of image features
and hence it can work with dissimilarity measures.
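The effect of discarding weak ensemble members before majority voting can be sketched as follows. This is a simplified stand-in and not the MCR method itself: it assumes a single global accuracy estimated on a validation set, keeps classifiers that beat random guessing (1 / number of classes), and then votes. All names and data are hypothetical.

```python
import numpy as np

def select_and_vote(val_preds, val_labels, test_preds, n_classes):
    """Keep better-than-random classifiers, then combine by majority vote.

    val_preds:  (n_classifiers, n_val) integer predictions on validation data.
    val_labels: (n_val,) true integer labels of the validation data.
    test_preds: (n_classifiers, n_test) integer predictions on test data.
    """
    val_preds = np.asarray(val_preds)
    val_labels = np.asarray(val_labels)
    test_preds = np.asarray(test_preds)
    # Validation accuracy of each classifier.
    accuracy = (val_preds == val_labels).mean(axis=1)
    keep = accuracy > 1.0 / n_classes
    if not keep.any():
        keep[:] = True  # fall back to the full ensemble
    selected = test_preds[keep]
    # Count the votes per class for every test sample.
    votes = np.array([np.bincount(col, minlength=n_classes)
                      for col in selected.T])
    return votes.argmax(axis=1), keep

val_labels = [0, 1, 0, 1]
val_preds = [[0, 1, 0, 1],   # accuracy 1.00
             [0, 1, 1, 1],   # accuracy 0.75
             [0, 0, 0, 1],   # accuracy 0.75
             [1, 0, 1, 1]]   # accuracy 0.25, worse than random
test_preds = [[0, 1, 1],
              [0, 1, 0],
              [1, 1, 1],
              [1, 0, 0]]
decision, keep = select_and_vote(val_preds, val_labels, test_preds, n_classes=2)
print(decision, keep)
```

In this toy data the fourth classifier is excluded and the remaining three decide each test sample by simple majority. Ties are resolved in favour of the lowest class index, which is an arbitrary choice of this sketch.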
Main findings and observations taken from Chapter 5 (paper 4):
• The SDM and MCR methods were used to form a dissimilarity-based
multiple classifier (DMC) system.
• A special approach was applied to generate homogeneous and
heterogeneous classifier ensembles using distances between TB texture
images.
• The accuracies of the DMC system in detection and prediction of
progression of knee OA were studied. The system developed had higher
accuracies than those obtained for the SDM method combined with the
SVM classifier and the benchmark system.
• The DMC system could be a useful decision-support tool in medicine,
engineering and other areas.
2. General conclusions
The following general conclusions can be drawn from the work conducted in this
thesis:
• Classification of texture images must involve quantification of roughness
and orientation at multiple scales. For the quantification, distances between
images can be used instead of image features.
• The SDM method developed has the ability to measure distances between
texture images in terms of roughness and orientation and is invariant to a
range of imaging conditions.
• The SDM method could successfully discriminate between healthy and OA
knees and between knees with non-progressive and progressive OA.
• Selection of classifiers that have better-than-random accuracies improves
performance of methods based on classifier ensembles.
• The MCR method developed has the ability to select accurate classifiers
from both homogeneous and heterogeneous classifier ensembles.
• The SDM and MCR methods form an accurate and robust DMC system that
could successfully detect knee OA and predict knee OA progression.
• The DMC system could eventually be used in other practical applications,
e.g. classification of medical images in diagnosis and prognosis of
diseases and quantification of engineering surfaces for automated machine
inspection.
3. Future work
This thesis describes the development of a new automated classification system
for detection and prediction of knee OA and its evaluation on medical images of
bone texture. Further work could aim at extending the applicability of the system
to medicine and other areas.
3.1 Medicine
The results obtained in this thesis demonstrated the potential of the DMC system
for the detection and prediction of knee OA. Future work could therefore
focus on the development of a DMC-based decision-support tool for patient
monitoring and management of patient treatment against the disease. Current
research on this topic is based on advanced imaging techniques (e.g. computed
tomography [1]) and biochemical markers (e.g. serum cartilage oligomeric
matrix protein, hyaluronan and soluble vascular cell adhesion molecule 1 [2–5]).
However, these methods are expensive, inaccessible to the majority
of patients, and require highly qualified medical staff and specialised
instruments. In addition, they lack a widely accepted and validated OA scoring
system. In contrast, since the DMC system uses plain radiography and the
gold-standard Kellgren and Lawrence scale, it could provide an inexpensive and
reliable means of evaluating the effects of medication, intra-articular injections
and surgical interventions on disease progression for patient monitoring and
treatment.
Future work could also focus on using the DMC system in those medical
applications where classification and quantification of biological surfaces are
required. For example, the system could be used for the assessment of
fracture risk due to osteoporosis [6, 7] and trabecular network damage due to
rheumatoid arthritis [8], quantification of TB structure on dental radiographs
[9], classification of breast lesions using ultrasound images [10], diagnosis of
interstitial lung disease based on chest radiography [11, 12], and discrimination of
skin lesions using dermoscopic images [13]. Other medical applications include
quantification of the surfaces of dental ceramics [14] and bone implants [15]. After
extending the SDM method to three dimensions, the DMC system could also
find applications in the analysis of bone, blood vessels and biomaterials from
tomographic images [16] and classification of brain tumours from magnetic
resonance imaging [17].
3.2 Other areas
The work conducted in this thesis showed that the DMC system can be used
for multiscale quantification of texture roughness and orientation and that it
is invariant to a range of imaging conditions. Thus, the system could be
a useful tool for the analysis and characterisation of engineering surfaces.
This includes 3D surface topography description in metrology,
machine condition monitoring and failure prediction based on classification of
wear particles [18], and automated surface inspection and quality control based
on quantification of defective or worn engineering surfaces [19]. The system could
also be used for finding a relationship between topographies of surfaces and their
friction coefficients. Current research targeted at finding this relationship uses
basic parameters (e.g. Ra and Rq) that provide limited information about surface
texture [20, 21]. In contrast, the DMC system provides detailed information
about texture roughness and orientation at predefined scales. Therefore, a
complete description of engineering surfaces, including textural and frictional
characteristics, would be possible once the relationship between the topographies
of the surfaces and their friction coefficients is found.
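Why Ra and Rq carry limited texture information can be illustrated with a short sketch. It assumes the standard profile definitions (Ra as the mean absolute deviation from the mean line, Rq as the root-mean-square deviation); the two profiles are hypothetical and chosen only to show that different spatial arrangements can give identical values.

```python
import numpy as np

def ra_rq(profile):
    """Return (Ra, Rq) of a 1D height profile about its mean line."""
    z = np.asarray(profile, dtype=float)
    deviation = z - z.mean()
    ra = np.abs(deviation).mean()          # arithmetic mean deviation
    rq = np.sqrt((deviation ** 2).mean())  # root-mean-square deviation
    return ra, rq

# Two hypothetical profiles with the same heights in a different order:
# one varies slowly along the surface, the other alternates at every sample.
slow_profile = np.array([1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
fast_profile = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

print(ra_rq(slow_profile))
print(ra_rq(fast_profile))
```

Both profiles give the same Ra and Rq because these parameters depend only on the distribution of heights and not on how the heights are arranged spatially, which is the kind of scale and orientation information a texture-based description is meant to capture.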
The DMC system could also find applications in food engineering and chemistry,
geology, botany and forestry, mining and textile industries, and remote sensing.
In food engineering and chemistry, the system could be used for segregation
of tea granules [22, 23], discrimination of crumb grains based on visual
appearance [24] and classification of meat texture [25, 26]. In geology, botany
and forestry, the system could be a useful tool for classification of surfaces of
heavy-mineral grains [27], identification of plants [28] and discrimination of
vegetation communities [29]. In mining and textile industries, the system could
be used for iron ore particle characterisation [30], estimation of run-of-mine ore
composition [31] and automated inspection of textile fabrics [32]. The system
could also be used for classification of land cover obtained through aerial space
photography in remote sensing [33].
4. References
[1] P N Bansal, N S Joshi, V Entezari, M W Grinstaff, and B D
Snyder. “Contrast enhanced computed tomography can predict the
glycosaminoglycan content and biomechanical properties of articular
cartilage”. In: Osteoarthritis and Cartilage 18 (2010), pp. 184–191.
[2] Y M Golightly, S W Marshall, V B Kraus, J B Renner, A Villaveces, C Casteel,
and J M Jordan. “Biomarkers of incident radiographic knee osteoarthritis”.
In: Arthritis & Rheumatism 63 (2011), pp. 2276–2283.
[3] V B Kraus. “Biomarkers in osteoarthritis”. In: Current Opinion in
Rheumatology 17 (2005), pp. 641–646.
[4] S M Ling, D D Patel, P Garnero, M Zhan, M Vaduganathan, D Muller,
D Taub, J M Bathon, M Hochberg, D R Abernethy, E J Metter, and L Ferrucci.
“Serum protein signatures detect early radiographic osteoarthritis”. In:
Osteoarthritis and Cartilage 17 (2009), pp. 43–48.
[5] J Cibere, H Zhang, P Garnero, A R Poole, T Lobanok, T Saxne, V B Kraus,
A Way, A Thorne, H Wong, J Singer, J Kopec, A Guermazi, C Peterfy,
S Nicolaou, P L Munk, and J M Esdaile. “Association of biomarkers
with pre-radiographically defined and radiographically defined knee
osteoarthritis in a population-based study”. In: Arthritis & Rheumatism 60
(2009), pp. 1372–1380.
[6] C L Benhamou, S Poupon, E Lespessailles, S Loiseau, R Jennane, V Siroux,
W Ohley, and L Pothuaud. “Fractal analysis of radiographic trabecular
bone texture and bone mineral density: two complementary parameters
related to osteoporotic fractures”. In: Journal of Bone and Mineral Research
16 (2001), pp. 697–704.
[7] B Brunet-Imbault, G Lemineur, C Chappard, R Harba, and C L Benhamou.
“A new anisotropy index on trabecular bone radiographic images using the
fast Fourier transform”. In: BMC Medical Imaging 5 (2005), pp. 4–14.
[8] C B Caldwell, E L Moran, and E R Bogoch. “Fractal dimension as a measure
of altered trabecular bone in experimental inflammatory arthritis”. In:
Journal of Bone and Mineral Research 13 (1998), pp. 978–985.
[9] T D Faber, D C Yoon, S K Service, and S C White. “Fourier and wavelet
analyses of dental radiographs detect trabecular changes in osteoporosis”.
In: Bone 35 (2004), pp. 403–411.
[10] B Liu, H D Cheng, J Huang, J Tian, X Tang, and J Liu. “Fully automatic
and segmentation-robust classification of breast tumors based on local
texture analysis of ultrasound images”. In: Pattern Recognition 43 (2010),
pp. 280–298.
[11] B van Ginneken, L Hogeweg, and M Prokop. “Computer-aided diagnosis
in chest radiography: Beyond nodules”. In: European Journal of Radiology 72
(2009), pp. 226–230.
[12] S G Armato III, A S Roy, H MacMahon, F Li, K Doi, S Sone, and M B Altman.
“Evaluation of automated lung nodule detection on low-dose computed
tomography scans from a lung cancer screening program”. In: Academic
Radiology 12 (2005), pp. 337–346.
[13] C Serrano and B Acha. “Pattern analysis of dermoscopic images based on
Markov random fields”. In: Pattern Recognition 42 (2009), pp. 1052–1057.
[14] J L Drummond, M Thompson, and B J Super. “Fracture surface examination
of dental ceramics using fractal analysis”. In: Dental Materials 21 (2005),
pp. 586–589.
[15] A Wennerberg. “The importance of surface roughness for implant
incorporation”. In: International Journal of Machine Tools and Manufacture 38
(1998), pp. 657–662.
[16] R E Guldberg, R T Ballock, B D Boyan, C L Duvall, A S Lin, S Nagaraja,
M Oest, J Phillips, B D Porter, G Robertson, and W R Taylor. “Analyzing
bone, blood vessels, and biomaterials with microcomputed tomography”.
In: IEEE Engineering in Medicine and Biology Magazine 22 (2003), pp. 77–83.
[17] P Georgiadis, D Cavouras, I Kalatzis, D Glotsos, E Athanasiadis,
S Kostopoulos, K Sifaki, M Malamas, G Nikiforidis, and E Solomou.
“Enhancing the discrimination accuracy between metastases, gliomas and
meningiomas on brain MRI by volumetric textural features and ensemble
pattern recognition methods”. In: Magnetic Resonance Imaging 27 (2009),
pp. 120–130.
[18] P Georgiadis, D Cavouras, I Kalatzis, D Glotsos, E Athanasiadis,
S Kostopoulos, K Sifaki, M Malamas, G Nikiforidis, and E Solomou.
“Enhancing the discrimination accuracy between metastases, gliomas and
meningiomas on brain MRI by volumetric textural features and ensemble
pattern recognition methods”. In: Magnetic Resonance Imaging 27 (2009),
pp. 120–130.
[19] Z Peng and T B Kirk. “Computer image analysis of wear particles in
three-dimensions for machine condition monitoring”. In: Wear 223 (1998),
pp. 157–166.
[20] N S S Mar, P K D V Yarlagadda, and C Fookes. “Design and development
of automatic visual inspection system for PCB manufacturing”. In: Robotics
and Computer-Integrated Manufacturing 27 (2011), pp. 949–962.
[21] P L Menezes and S V Kailas. “Influence of surface texture and roughness
parameters on friction and transfer layer formation during sliding of
aluminium pin on steel plate”. In: Wear 267 (2009), pp. 1534–1549.
[22] R Singh, S N Melkote, and F Hashimoto. “Frictional response of precision
finished surfaces in pure sliding”. In: Wear 258 (2005), pp. 1500–1509.
[23] S Borah, E L Hines, and M Bhuyan. “Wavelet transform based image texture
analysis for size estimation applied to the sorting of tea granules”. In:
Journal of Food Engineering 79 (2007), pp. 629–639.
[24] D Wu, H Yang, X Chen, Y He, and X Li. “Application of image texture
for the sorting of tea categories using multi-spectral imaging technique
and support vector machine”. In: Journal of Food Engineering 88 (2008),
pp. 474–483.
[25] U Gonzales-Barron and F Butler. “Discrimination of crumb grain visual
appearance of organic and non-organic bread loaves by image texture
analysis”. In: Journal of Food Engineering 84 (2008), pp. 480–488.
[26] O Basset, B Buquet, S Abouelkaram, P Delachartre, and J Culioli.
“Application of texture image analysis for the classification of bovine
meat”. In: Food Chemistry 69 (2000), pp. 437–445.
[27] J Li, J Tan, and P Shatadal. “Classification of tough and tender beef by image
texture analysis”. In: Meat Science 57 (2001), pp. 341–346.
[28] J P Moral Cardona, J M Gutierrez Mas, A Sanchez Bellon,
S Dominguez-Bella, and J Martinez Lopez. “Surface textures of
heavy-mineral grains: a new contribution to provenance studies”. In:
Sedimentary Geology 174 (2005), pp. 223–235.
[29] O Martinez Bruno, R de Oliveira Plotze, M Falvo, and M de Castro.
“Fractal dimension applied to plant identification”. In: Information Sciences
178 (2008), pp. 2722–2733.
[30] H Murray, A Lucieer, and R Williams. “Texture-based classification of
sub-Antarctic vegetation communities on Heard Island”. In: International
Journal of Applied Earth Observation and Geoinformation 12 (2010),
pp. 138–149.
[31] E Donskoi, S P Suthers, S B Fradd, J M Young, J J Campbell, T D Raynlyn,
and J M F Clout. “Utilization of optical image analysis and automatic
texture classification for iron ore particle characterisation”. In: Minerals
Engineering 20 (2007), pp. 461–471.
[32] J Tessier, C Duchesne, and G Bartolacci. “A machine vision approach to
on-line estimation of run-of-mine ore composition on conveyor belts”. In:
Minerals Engineering 20 (2007), pp. 1129–1144.
[33] K L Mak and P Peng. “An automated inspection system for textile fabrics
based on Gabor filters”. In: Robotics and Computer-Integrated Manufacturing
24 (2008), pp. 359–369.