AUTOMATED CLASSIFICATION SYSTEM

FOR DETECTION AND PREDICTION OF

OSTEOARTHRITIS IN HUMAN KNEE JOINTS

TOMASZ WOLOSZYNSKI

(MSC)

THIS THESIS IS PRESENTED FOR THE DEGREE OF DOCTOR OF

PHILOSOPHY OF THE UNIVERSITY OF WESTERN AUSTRALIA

SCHOOL OF MECHANICAL AND CHEMICAL ENGINEERING

2011

ABSTRACT

The development of an automated classification system for detection and

prediction of knee osteoarthritis (OA) is of great interest to the medical

community. The system, once developed, would aid or replace human experts in

the assessment of risk and severity of knee OA and other chronic and progressive

joint diseases. Also, the system would provide an inexpensive and reliable means

for patient monitoring and diagnosis and hence it would be a valuable tool in the

evaluation of drug treatment effects against knee OA. To date, a few attempts to

develop such a system have been reported in the literature. However, the systems

developed cannot detect the disease in its earliest stage and they are sensitive to

knee imaging conditions. Therefore, there is a growing need for the development

of an accurate and robust system for detection and prediction of knee OA.

This thesis is divided into three parts. The first part presents the development

(Chapter 2) and evaluation (Chapters 2 and 3) of a new method for measuring

distances between trabecular bone (TB) texture regions selected on knee

radiographs. The method developed, called a signature dissimilarity measure

(SDM), quantifies texture roughness and orientation at predefined scales that can

be adjusted to trabecular image sizes at which OA changes are most prominent.

Unlike other methods, the SDM method is invariant to in-plane rotation and to

scale within the predefined range. To evaluate the method developed, data sets of TB texture

images from healthy and OA knees and from knees with non-progressive and

progressive OA were constructed. The results obtained have demonstrated

that the method developed has high classification accuracies in detection and

prediction of progression of knee OA and, when combined with a support vector

machine (SVM) classifier, it outperforms a benchmark method used for knee

classification. The invariance of the SDM method to knee imaging conditions was

also evaluated. Using radiographs of a frozen tibia head and computer-generated

fractal texture images, the method developed was found to be invariant to a

range of exposure, magnification, image size, anisotropy direction, noise, and

blur encountered in a routine screening of knee radiographs.

In the second part, a general-purpose classification method that would be more

accurate than single classifiers (e.g. the SVM classifier) was developed and

evaluated on benchmark data sets (Chapter 4). To achieve this, a special hybrid

fusion-selection method, called a measure of competence based on random

classification (MCR), was developed and used with ensembles of homogeneous

and heterogeneous classifiers. The MCR method estimates local (i.e. for each

image) classification accuracies of all classifiers in the ensemble and then selects

classifiers that have better-than-random accuracies. Using the majority voting

rule to combine the classifiers selected, the MCR method achieved the best

performance on 14 benchmark data sets.

The third part of this thesis presents a combination of the SDM and MCR methods

to form a dissimilarity-based multiple classifier (DMC) system (Chapter 5).

To combine the two methods, a special approach was applied to generate

ensembles of classifiers using the distances measured between TB texture images.

Performance of the DMC system in detection of knee OA and prediction of knee

OA progression was investigated and compared against benchmark systems. The

results obtained showed that the system developed is more accurate than the

benchmark systems in discriminating between healthy and OA knees and between

knees with non-progressive and progressive OA.

In conclusion, the DMC system developed can accurately detect and predict knee

OA. The system could also find applications in other areas of medicine and in

engineering. This includes diagnosis and prognosis of diseases based on analysis

of medical images and machine condition monitoring based on classification of

anisotropic and textured engineering surfaces.

CONTENTS

ABSTRACT
CONTENTS
ACKNOWLEDGEMENTS
JOURNAL PUBLICATIONS AND CONFERENCE PRESENTATIONS ARISING FROM THIS THESIS
STATEMENT OF CANDIDATE CONTRIBUTION
ABBREVIATIONS

CHAPTER 1. INTRODUCTION
  1. Background
  2. Thesis objectives
  3. Thesis overview
  4. References
  5. List of figures

CHAPTER 2. A SIGNATURE DISSIMILARITY MEASURE FOR TRABECULAR BONE TEXTURE IN KNEE RADIOGRAPHS
  1. Introduction
  2. Method
    2.1 Scale-space representation of texture image
    2.2 Roughness signature
    2.3 Orientation signature
    2.4 Signature dissimilarity measure
  3. Materials and results
    3.1 Brodatz textures
    3.2 Fractal textures
    3.3 Tibia head
    3.4 Healthy and osteoarthritic knees
    3.5 Computational times
  4. Discussion and conclusion
  5. References
  6. List of figures
  7. List of tables

CHAPTER 3. PREDICTION OF PROGRESSION OF RADIOGRAPHIC KNEE OSTEOARTHRITIS USING TIBIAL TRABECULAR BONE TEXTURE
  1. Introduction
  2. Subjects and methods
    2.1 Subjects
    2.2 Acquisition and grading of knee radiographs
    2.3 Definitions
    2.4 Trabecular bone image analysis
    2.5 Statistics
  3. Results
  4. Discussion
  5. References
  6. List of figures
  7. List of tables

CHAPTER 4. A MEASURE OF COMPETENCE BASED ON RANDOM CLASSIFICATION FOR DYNAMIC ENSEMBLE SELECTION
  1. Introduction
  2. Theoretical framework
    2.1 Measure of competence based on random classification (MCR)
    2.2 Theoretical justification
  3. Methods
    3.1 DES-P and DES-KL classification systems
  4. Experiments
    4.1 Data sets
    4.2 MCSs
    4.3 Classifier ensembles
  5. Results
  6. Discussion
  7. Conclusion
  8. References
  9. List of tables

CHAPTER 5. DISSIMILARITY-BASED MULTIPLE CLASSIFIER SYSTEM FOR TRABECULAR BONE TEXTURE IN KNEE RADIOGRAPHS: DETECTION AND PREDICTION OF OSTEOARTHRITIS
  1. Introduction
  2. Materials and methods
    2.1 Subjects and radiographs
    2.3 DMC system
    2.4 Comparison against other benchmark systems
  3. Results
  4. Discussion
  5. List of figures
  6. List of tables
  7. References

CHAPTER 6. CONCLUSIONS AND FUTURE WORK
  1. Summary of main findings and observations
  2. General conclusions
  3. Future work
    3.1 Medicine
    3.2 Other areas
  4. References

ACKNOWLEDGEMENTS

I would like to express my gratitude and appreciation to those who have made

the completion of this thesis possible.

Firstly, I would like to thank my supervisors, Winthrop Professor Gwidon

Stachowiak for his invaluable help and guidance throughout the entire doctoral

research, Associate Professor Pawel Podsiadlo for providing me with technical

expertise on medical image processing and for his commitment, patience and

support during the preparation of all scientific materials arising from this

thesis and Professor Marek Kurzynski for stimulating discussions on combining

classifiers.

Secondly, I would like to thank Professor Stefan Lohmander from Lund

University (Sweden) and University of Southern Denmark and Associate

Professor Martin Englund from Lund University and Boston University School

of Medicine (MA, USA) for their collaboration and help in evaluating the method

developed in this thesis in clinical settings.

Thirdly, I would like to acknowledge The University of Western Australia for

providing me with financial support during my PhD study and

the School of Mechanical and Chemical Engineering for providing the necessary

infrastructure and environment to conduct this work.

Finally, I would like to thank my parents for their unconditional support

throughout my postgraduate studies.

JOURNAL PUBLICATIONS AND CONFERENCE

PRESENTATIONS ARISING FROM THIS THESIS

Journal publications

Tomasz Woloszynski, Pawel Podsiadlo, Gwidon W Stachowiak, and Marek

Kurzynski. A signature dissimilarity measure for trabecular bone texture in knee

radiographs. Medical Physics 2010;37:2030–2042 (Chapter 2).

Tomasz Woloszynski, Pawel Podsiadlo, Gwidon W Stachowiak, Marek

Kurzynski, L Stefan Lohmander, and Martin Englund. Prediction of Progression

of Radiographic Knee Osteoarthritis Using Tibial Trabecular Bone Texture.

Arthritis & Rheumatism 2012;64:688–695 (Chapter 3).

Tomasz Woloszynski, Marek Kurzynski, Pawel Podsiadlo, and Gwidon W

Stachowiak. A measure of competence based on random classification for

dynamic ensemble selection. Information Fusion 2012;13:207–212 (Chapter 4).

Tomasz Woloszynski, Pawel Podsiadlo, Gwidon W Stachowiak, and Marek

Kurzynski. Dissimilarity-based multiple classifier system for trabecular bone

texture in knee radiographs: detection and prediction of osteoarthritis. Submitted

to Proceedings of the Institution of Mechanical Engineers, Part H, Journal of

Engineering in Medicine (Chapter 5).

Conference presentations

Tomasz Woloszynski, Pawel Podsiadlo and Gwidon W Stachowiak.

Classification of bone texture for detection of early knee osteoarthritis. Oral

presentation at ASIATRIB 2010 - Tribology Congress in Australia, December

2010, Perth, Australia.

The presentation was awarded a “Young Investigators Award for the Outstanding

Paper”.

Tomasz Woloszynski, Pawel Podsiadlo and Gwidon W Stachowiak. A

multiple classifier bone texture system for prediction of knee osteoarthritis

progression. Oral presentation at International Tribology Conference Hiroshima

2011, October–November 2011, Hiroshima, Japan.

Invited talk

Tomasz Woloszynski, Pawel Podsiadlo and Gwidon W Stachowiak. Bone texture

analysis for detection and prediction of knee osteoarthritis. Oral presentation at

International Forum on “Front-line of Tribology in the Asian Region” in JAST

(Japanese Society of Tribologists) Tribology Conference, May 2011, Tokyo, Japan.

STATEMENT OF CANDIDATE CONTRIBUTION

Tomasz Woloszynski (70%), Pawel Podsiadlo, Gwidon W Stachowiak, and Marek

Kurzynski. A signature dissimilarity measure for trabecular bone texture in knee

radiographs. Medical Physics 2010;37:2030–2042 (Chapter 2).

Tomasz Woloszynski (70%), Pawel Podsiadlo, Gwidon W Stachowiak, Marek

Kurzynski, L Stefan Lohmander, and Martin Englund. Prediction of Progression

of Radiographic Knee Osteoarthritis Using Tibial Trabecular Bone Texture.

Arthritis & Rheumatism 2012;64:688–695 (Chapter 3).

Tomasz Woloszynski (70%), Marek Kurzynski, Pawel Podsiadlo, and Gwidon

W Stachowiak. A measure of competence based on random classification for

dynamic ensemble selection. Information Fusion 2012;13:207–212 (Chapter 4).

Tomasz Woloszynski (70%), Pawel Podsiadlo, Gwidon W Stachowiak, and

Marek Kurzynski. Dissimilarity-based multiple classifier system for trabecular

bone texture in knee radiographs: detection and prediction of osteoarthritis.

Submitted to Proceedings of the Institution of Mechanical Engineers, Part H,

Journal of Engineering in Medicine (Chapter 5).

Candidate signature: . . . . . . . . . . . . . . . . . . . . . . . . . . .Tomasz Woloszynski

Coordinating supervisor signature: . . . . . . . . . . . . . . . . . . . . . . . . . . .Professor Gwidon W. Stachowiak

ABBREVIATIONS

AUC area under receiver operating characteristic curve

AP anteroposterior

BMI body mass index

BMD bone mineral density

BPNN backpropagation neural network

CI confidence interval

DCS dynamic classifier selection

DCS-LA DCS local accuracy

DCS-MCB DCS multiple classifier behaviour

DCS-MLA DCS modified local accuracy

DES dynamic ensemble selection

DES-KE DES knora eliminate

DES-KL DES Kullback-Leibler

DES-P DES performance

DMC dissimilarity-based multiple classifier

EMD earth mover’s distance

EP ensemble pruning

FD fractal dimension

FSA fractal signature analysis

JSN joint space narrowing

k-NN k nearest neighbours

KL scale Kellgren and Lawrence scale

KL divergence Kullback-Leibler divergence

LBP local binary patterns

LDC linear discriminant classifier

LKC Ludmila Kuncheva collection

MCR measure of competence based on random classification

MCS multiple classifier system

MTF modulation transfer function

MRI magnetic resonance imaging

MV majority voting

NMC nearest mean classifier

NN nearest neighbour

OA osteoarthritis

OARSI Osteoarthritis Research Society International

PCA principal component analysis

PGD principal gradient direction

QDC quadratic discriminant classifier

RBF radial basis function

ROC receiver operating characteristic

ROI region of interest

SB single best

SD standard deviation

SDM signature dissimilarity measure

SVM support vector machine

TB trabecular bone

UCI University of California machine learning repository

WND-CHARM weighted neighbor distance using compound hierarchy of algorithms representing morphology

CHAPTER 1

INTRODUCTION

This thesis is arranged as a series of four journal papers. Papers 1, 2 and

3 have been published, while paper 4 has been submitted for publication.

The papers represent the development and progression of ideas that led to the

completion of this thesis.

1. Background

An automated classification system for detection and prediction of knee

osteoarthritis (OA) can be defined as a method used to assign a knee to one

of predefined classes. The assignment is based on computer-aided assessment

of OA changes in knee images and the classes are defined according to the

disease grading. Although a number of methods for knee classification have

been reported in the literature, there are currently no accurate methods that could

detect and predict the disease and that are invariant to knee imaging conditions.

Therefore, an accurate and robust system for detection and prediction of knee OA

is required.

Methods used for OA assessment can be divided into two groups: statistical

and classification/regression methods. The first group includes methods that

calculate statistical parameters from knee images using histomorphometric

analysis [1, 2], fractal signature analysis (FSA) [3–6], Hurst orientation

transform [7] and variance orientation transform [8]. Although some of the

methods provide a detailed description of the images, their application to detection

and prediction of knee OA is not straightforward. Also, the methods are sensitive to noise,

magnification, projection angle, and angular space quantisation [7, 8].

The second group includes methods in which detection and prediction of knee

OA are formulated as classification/regression problems. To date, to the best of

the author’s knowledge, two such methods have been developed: a weighted

neighbor distance using a compound hierarchy of algorithms representing

morphology (WND-CHARM) image classification system [9] and a regression

model that calculates shape parameters based on FSA [10]. The WND-CHARM

system extracts a large number of texture features from knee radiographs and

uses the features with the nearest neighbour classifier. The regression model

calculates horizontal and vertical shape parameters from trabecular bone (TB)

texture regions in knee radiographs and uses the parameters with a generalised

linear model. However, applications of the system and the model are rather

limited. The features extracted in the WND-CHARM system are

redundant, require extensive computation time and have little or no physical

interpretation [11]. The shape parameters calculated in the regression model

describe TB texture only in the horizontal and vertical directions, and the

box-counting technique used is highly sensitive to image signal-to-noise ratio

and trabecular marrow pore size [4]. Also, the nearest neighbour classifier and

the generalised linear model used are sensitive to outliers and cannot produce

nonlinear decision boundaries. Therefore, a new system that avoids problems

associated with the use of image features, that is invariant to imaging conditions

and that overcomes limitations of single classifiers and models needs to be

developed and evaluated in detection and prediction of knee OA. This issue is

addressed in this thesis.

The system developed could also find applications in other areas. In medicine,

this includes classification of breast lesions from ultrasound images [12],

diagnosis of interstitial lung disease based on chest radiography [13, 14] and

assessment of dermoscopic images for skin lesions [15]. In engineering, the

system could be used for classification of anisotropic/textured surfaces of wear

particles [16, 17], e.g. in a machine condition monitoring tool.

Thus, this thesis aims at the development and evaluation of an automated

classification system for detection and prediction of knee OA which could also be

useful in other areas of medicine and in engineering. Although several attempts

to develop such a system have been reported in the literature, there has been little

work conducted on theoretical foundations of the systems developed and on the

effects of imaging conditions on their performance.

2. Thesis objectives

The following thesis objectives were formulated:

I. Development of an automated system for detection and prediction of knee

OA, which includes

• Development of a method for measuring distances between TB texture

images,

• Evaluation of the method under varying imaging conditions and in TB

texture classification,

• Development of a classification method based on classifier ensembles,

• Evaluation of the classification method on benchmark data sets.

II. Evaluation of the system developed on TB texture images, which includes

• Detection of knee OA,

• Prediction of knee OA progression.

3. Thesis overview

An overview of the thesis is illustrated in Fig. 1.

Chapter 2 (paper 1): A signature dissimilarity measure for

trabecular bone texture in knee radiographs

In this chapter, a new method for measuring distances between TB texture

regions selected on knee radiographs was developed and evaluated. The method

developed, called a signature dissimilarity measure (SDM), quantifies roughness

and orientation of the bone texture for a predefined range of scales. The ability to

quantify texture roughness and orientation at predefined scales is important for

the assessment of biological and engineering surfaces since most of them are

multiscale and anisotropic in nature. The method developed is also invariant to

in-plane rotation and scale within the predefined range. In contrast, methods

used so far are sensitive to image rotation and scale [9] or describe the bone

texture only in the horizontal and vertical directions [10].
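
The idea of a rotation-invariant dissimilarity can be illustrated with a minimal sketch. It is not the SDM method itself, whose definition is given in Chapter 2; it only assumes that a texture's orientation content is summarised as a histogram over equally spaced angles, so that an in-plane rotation becomes a circular shift of the bins. The function name, the L1 distance and the toy histograms are hypothetical choices made for illustration.

```python
def circular_shift_distance(hist_a, hist_b):
    """Smallest L1 distance between two orientation histograms over all
    circular shifts of the second one (illustrative assumption, not SDM)."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    n = len(hist_a)
    best = float("inf")
    for shift in range(n):
        # Compare bin i of hist_a with bin i + shift (wrapping around) of hist_b.
        d = sum(abs(hist_a[i] - hist_b[(i + shift) % n]) for i in range(n))
        best = min(best, d)
    return best


# Toy orientation histograms (hypothetical values, not measured data).
texture = [0.5, 0.25, 0.125, 0.125, 0.0, 0.0]
rotated = texture[-2:] + texture[:-2]            # same texture rotated by two bins
other = [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]  # a different orientation profile

same_texture_distance = circular_shift_distance(texture, rotated)
different_texture_distance = circular_shift_distance(texture, other)
```

Under these assumptions the rotated copy is at zero distance from the original, while a texture with a different orientation profile remains at a positive distance regardless of how it is rotated.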

Changes in TB structure were shown to occur first on the development pathway

to knee OA [18, 19]. This indicates that accurate assessment of the bone structure

could be used not only for detection, but more importantly for prediction of knee

OA, aiding medical experts in diagnosis and prognosis of the disease. To date,

the most popular method used in the assessment of the bone structure has been

plain radiography [10, 20, 21]. This is because it is cheap, non-invasive, widely

accessible and it produces two-dimensional bone texture that is directly related

to the three-dimensional bone structure [22–25]. Therefore, the SDM method was

developed for TB texture images.

The performance of the SDM method in detection of knee OA and in

rotation-invariant texture classification was studied using TB texture images of

healthy and OA knees and images taken from the Brodatz album [26]. The effects

of imaging conditions such as exposure, magnification and projection angle on

the SDM method were investigated using knee radiographs of a frozen tibia head.

In addition, computer-generated fractal texture images were used to evaluate

invariance of the method to image size, anisotropy direction, noise, and blur.

The results obtained showed that the SDM method combined with the support

vector machine (SVM) classifier outperforms the benchmark WND-CHARM

system in knee OA detection. The performance of the method in

rotation-invariant texture classification was comparable to a benchmark Local

Binary Patterns system [27]. Also, it was found that the SDM method is invariant

to a range of exposure, magnification, image size, anisotropy direction, noise, and

blur encountered in a routine screening of knee radiographs.

From the work described in this chapter, it was concluded that the SDM method

can quantify TB texture roughness and orientation in detail and that it is

invariant to a range of imaging conditions. Therefore, the method developed

had potential for detection and prediction of knee OA and it was used in the

subsequent studies.

Chapter 3 (paper 2): Prediction of progression of radiographic knee

osteoarthritis using tibial trabecular bone texture

This chapter describes the evaluation of the SDM method in prediction of

progression of early and late knee OA using tibial TB texture. If the disease

could be predicted, this would indicate that the SDM method can assess OA

changes in TB that occur before progression of joint space narrowing (JSN) and

osteophytes. This work is a part of the current trend of research directed towards

the development of an accurate, low-cost and non-invasive system for prediction

of knee OA [10, 11].

A longitudinal study design with baseline and follow-up examinations four

years apart was used to evaluate the prediction accuracy of the SDM method.

All subjects studied underwent partial meniscectomy about 16 years prior to

the baseline. Knees of all subjects were divided into non-progressive and

progressive groups based on the difference in medial JSN grade between the two

examinations. For each knee, TB regions of interest (ROIs) were selected from

standing anteroposterior digital radiographs taken at baseline.

Three texture parameters, i.e. roughness, degree of anisotropy and direction of

anisotropy were calculated for the selected ROIs using the SDM method. The

results obtained for a generalised linear model based on the parameters showed

that the SDM method can successfully discriminate between non-progressive and

progressive knees. In particular, it was shown that high prediction accuracy can

be obtained for knees with early OA at baseline, i.e. for knees that have no or

doubtful radiographic signs of the disease. The results also showed that the SDM

method provides a detailed description of OA changes in the bone texture that

occur before progression of radiographic features such as JSN and osteophytes.

In conclusion, the results presented in this chapter demonstrated that the SDM

method can predict loss of tibiofemoral joint space in knees with early and late

OA and that it can describe OA changes in TB texture in detail.

Chapter 4 (paper 3): A measure of competence based on random

classification for dynamic ensemble selection

Although the SDM method produces higher classification accuracies for TB

texture than the benchmark system, the accuracies could be further increased

by combining the method with classifier ensembles instead of single classifiers

(e.g. the SVM classifier) and models. The rationale behind this is that classifier

ensembles overcome most limitations of single classifiers and models and they

have shown the best performance for a wide range of classification problems [28–31].

In this chapter, a new method based on classifier ensembles that could be used

with the SDM method was developed and evaluated.

The method developed, called a measure of competence based on random

classification (MCR), first estimates local (i.e. for each image) classification

accuracies of all classifiers in the ensemble and then selects classifiers that have

better-than-random accuracies. The method uses distances between images

instead of image features and therefore it is compatible with the SDM method.

A theoretical result showed that the MCR method improves the performance of

the majority voting rule.
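
The select-then-vote procedure described above can be sketched as follows. This is a simplified illustration and not the MCR competence measure defined in Chapter 4: local accuracy is assumed here to be the fraction of correct decisions on the k validation samples nearest to the test sample, and "better than random" is assumed to mean above 1 / (number of classes). All names, the value of k, the fallback rule and the toy data are hypothetical.

```python
from collections import Counter


def select_and_vote(test_to_val_dist, val_labels, val_preds, test_preds,
                    n_classes, k=3):
    """Classify one test sample by dynamic ensemble selection and majority voting.

    test_to_val_dist: distance from the test sample to each validation sample
    val_labels:       true labels of the validation samples
    val_preds:        val_preds[c][j] is classifier c's label for validation sample j
    test_preds:       test_preds[c] is classifier c's label for the test sample
    """
    # Indices of the k validation samples closest to the test sample.
    nearest = sorted(range(len(val_labels)), key=lambda j: test_to_val_dist[j])[:k]
    chance = 1.0 / n_classes

    selected = []
    for c, preds in enumerate(val_preds):
        # Local accuracy: fraction of the nearest samples classifier c gets right.
        local_acc = sum(preds[j] == val_labels[j] for j in nearest) / len(nearest)
        if local_acc > chance:
            selected.append(c)

    # Fallback (assumption): if no classifier beats chance, use the whole ensemble.
    if not selected:
        selected = list(range(len(val_preds)))

    votes = Counter(test_preds[c] for c in selected)
    return votes.most_common(1)[0][0], selected


# Toy two-class example with three classifiers and five validation samples.
dist = [0.1, 0.2, 0.3, 0.9, 1.0]
val_labels = [1, 1, 0, 0, 0]
val_preds = [
    [1, 1, 0, 0, 0],  # correct on all three nearest samples
    [0, 0, 1, 0, 0],  # wrong on all three nearest samples
    [1, 0, 0, 1, 1],  # correct on two of the three nearest samples
]
test_preds = [1, 0, 1]

label, selected = select_and_vote(dist, val_labels, val_preds, test_preds, n_classes=2)
```

In this toy case the second classifier is locally no better than chance and is excluded, so the vote is taken over the remaining two. Only distances between samples are needed, which is what makes this kind of selection compatible with a dissimilarity measure rather than a feature vector.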

The performance of the MCR method was investigated on 14 benchmark data

sets. The results obtained showed that the method developed outperforms other

methods based on classifier ensembles regardless of the ensemble type used

(homogeneous or heterogeneous). The results also showed that the MCR method

gives the best performance for classifier ensembles of different sizes.

From the results described in this chapter, it was clear that the MCR method

can reliably select accurate classifiers from homogeneous and heterogeneous

classifier ensembles and that it can perform well for a wide range of classification

problems.

Chapter 5 (paper 4): Dissimilarity-based multiple classifier system

for trabecular bone texture in knee radiographs: detection and

prediction of osteoarthritis

In this chapter, the SDM and MCR methods developed were combined into a

dissimilarity-based multiple classifier (DMC) system and used for detection and

prediction of knee OA. For this purpose, a special approach was applied to

generate homogeneous and heterogeneous classifier ensembles using distances

measured between TB texture images.
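
The approach used to generate the ensembles is described in Chapter 5 and is not reproduced here. As a hedged illustration of how classifiers can be built from distances alone, the sketch below assumes a homogeneous ensemble of k-nearest-neighbour classifiers that differ only in k and that operate on precomputed distances. The function names, the choice of k values and the toy labels and distances are hypothetical.

```python
from collections import Counter


def knn_from_distances(dist_to_train, train_labels, k):
    """Label a sample by majority vote among its k nearest training samples,
    using only precomputed distances (no feature vectors)."""
    nearest = sorted(range(len(train_labels)), key=lambda j: dist_to_train[j])[:k]
    return Counter(train_labels[j] for j in nearest).most_common(1)[0][0]


def make_knn_ensemble(ks):
    """Build one k-NN classifier per value of k (an assumed way of forming a
    homogeneous ensemble; the thesis's own construction may differ)."""
    return [lambda d, y, k=k: knn_from_distances(d, y, k) for k in ks]


# Toy data: distances from one unseen image to five labelled training images.
train_labels = ["healthy", "healthy", "OA", "OA", "OA"]
dist_to_train = [0.2, 0.4, 0.5, 0.6, 0.7]

ensemble = make_knn_ensemble([1, 3, 5])
predictions = [clf(dist_to_train, train_labels) for clf in ensemble]
```

The members of such an ensemble can disagree on the same sample, which is the situation in which selecting and combining classifiers becomes useful.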

The accuracies of the DMC system in detection and prediction of knee OA were

evaluated using TB texture images of healthy and OA knees and knees with

non-progressive and progressive OA. The disease progression was defined as

an increase in the sum of JSN and osteophyte grades between baseline and

follow-up four years later.
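
The progression criterion stated above can be written as a short function. This is a minimal sketch: the dictionary keys and the integer grades are hypothetical, and the grading scale used in the thesis is not specified in this overview.

```python
def is_progressive(baseline, follow_up):
    """Return True if the sum of the JSN and osteophyte grades increased
    between the baseline and follow-up examinations."""
    total_before = baseline["jsn"] + baseline["osteophytes"]
    total_after = follow_up["jsn"] + follow_up["osteophytes"]
    return total_after > total_before


# Hypothetical grades for two knees.
knee_a = is_progressive({"jsn": 1, "osteophytes": 0}, {"jsn": 2, "osteophytes": 0})
knee_b = is_progressive({"jsn": 1, "osteophytes": 1}, {"jsn": 1, "osteophytes": 1})
```

With this rule, a knee whose summed grade is unchanged or lower at follow-up is labelled non-progressive.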

The results obtained demonstrated that the DMC system can accurately

discriminate between healthy and OA knees and between knees with

non-progressive and progressive OA. The accuracies obtained were higher

than those of other classification systems, including the SDM method combined

with the SVM classifier.

Concluding remarks

This thesis began with the development of a new method (SDM) for measuring

distances between TB texture images. The aim was to determine if the method

developed could be applied to texture quantification in medicine. Using real-life

and artificial texture images, the SDM method was shown to be successful in

detection of knee OA and invariant to a range of imaging conditions. Further

evaluation of the method showed that it can predict progression of early and

late knee OA and that it can provide a detailed description of OA changes in the

bone texture. The successful quantification of texture roughness and orientation

for predefined scales demonstrated the potential of the SDM method in medicine,

where the surfaces studied are multiscale and anisotropic in nature.

After evaluation of the SDM method, a new method (MCR) for selecting

accurate classifiers from homogeneous and heterogeneous classifier ensembles

was developed. The aim was to increase classification accuracies of the SDM

method in detection and prediction of OA. The MCR method was tested on

classifier ensembles of different types and sizes and it was shown to perform

well for a wide range of classification problems. The SDM and MCR methods

developed were then used to form an automated system (DMC) for texture

classification. Using the system, the highest classification accuracies for detection

and prediction of progression of knee OA were achieved.

In conclusion, the DMC system developed can be a useful decision-support

tool in medicine. This includes diagnosis and prognosis of joint diseases based

on quantification and classification of medical texture images. Since the DMC

system is accurate for multiscale and anisotropic texture images and it is invariant

to a range of imaging conditions, it could also find applications in other areas,

e.g. machine condition monitoring based on classification of wear particles and

product quality control based on quantification of surface morphology.

4. References

[1] L Kamibayashi, U P Wyss, T D V Cooke, and B Zee. “Trabecular

Microstructure in the Medial Condyle of the Proximal Tibia of Patients with

Knee Osteoarthritis”. In: Bone 17 (1995), pp. 27–35.

[2] L Kamibayashi, U P Wyss, T D V Cooke, and B Zee. “Changes in mean

trabecular orientation in the medial condyle of the proximal tibia in

osteoarthritis”. In: Calcified Tissue International 57 (1995), pp. 69–73.

[3] E A Messent, R J Ward, C J Tonkin, and C Buckland Wright. “Tibial

cancellous bone changes in patients with knee osteoarthritis. A short term

longitudinal study using Fractal Signature Analysis”. In: Osteoarthritis and

Cartilage 13 (2005), pp. 463–470.

[4] J A Lynch, D J Hawkes, and J C Buckland Wright. “Analysis of texture in

macroradiographs of osteoarthritic knees using the fractal signature”. In:

Physics in Medicine and Biology 36 (1991), pp. 709–722.

[5] J C Buckland Wright, J A Lynch, and D G Macfarlane. “Fractal signature

analysis measures cancellous bone organisation in macroradiographs of

patients with knee osteoarthritis”. In: Annals of the Rheumatic Diseases 55

(1996), pp. 749–755.

[6] E A Messent, J C Buckland Wright, and G M Blake. “Fractal analysis of

trabecular bone in knee osteoarthritis (OA) is a more sensitive marker

of disease status than bone mineral density (BMD)”. In: Calcified Tissue

International 76 (2005), pp. 419–425.

[7] P Podsiadlo and G W Stachowiak. “Analysis of trabecular bone texture

by modified Hurst orientation transform”. In: Medical Physics 29 (2002),

pp. 460–474.

[8] M Wolski, P Podsiadlo, and G W Stachowiak. “Directional fractal signature

analysis of trabecular bone: evaluation of different methods to detect

early osteoarthritis in knee radiographs”. In: Proceedings of the Institution

of Mechanical Engineers - Part H: Journal of Engineering in Medicine 223 (2009),

pp. 211–236.

[9] N Orlov, L Shamir, T Macura, J Johnston, M D Eckley, and I G Goldberg.

“WND-CHARM: Multi-purpose image classification using compound

image transforms”. In: Pattern Recognition Letters 29 (2008), pp. 1684–1693.

[10] V B Kraus, S Feng, S C Wang, S White, M Ainslie, A Brett, A Holmes, and

H C Charles. “Trabecular morphometry by fractal signature analysis is a

novel marker of osteoarthritis progression”. In: Arthritis & Rheumatism 60

(2009), pp. 3711–3722.

[11] L Shamir, S M Ling, W Scott, M Hochberg, L Ferrucci, and I G Goldberg.

“Early detection of radiographic knee osteoarthritis using computer aided

analysis”. In: Osteoarthritis and Cartilage 17 (2009), pp. 1307–1312.

[12] B Liu, H D Cheng, J Huang, J Tian, X Tang, and J Liu. “Fully automatic

and segmentation-robust classification of breast tumors based on local

texture analysis of ultrasound images”. In: Pattern Recognition 43 (2010),

pp. 280–298.

[13] B van Ginneken, L Hogeweg, and M Prokop. “Computer-aided diagnosis

in chest radiography: Beyond nodules”. In: European Journal of Radiology 72

(2009), pp. 226–230.

[14] S G Armato III, A S Roy, H MacMahon, F Li, K Doi, S Sone, and M B Altman.

“Evaluation of automated lung nodule detection on low-dose computed

tomography scans from a lung cancer screening program”. In: Academic

Radiology 12 (2005), pp. 337–346.

[15] C Serrano and B Acha. “Pattern analysis of dermoscopic images based on

Markov random fields”. In: Pattern Recognition 42 (2009), pp. 1052–1057.

[16] G P Stachowiak, P Podsiadlo, and G W Stachowiak. “Shape and texture

features in the automated classification of adhesive and abrasive wear

particles”. In: Tribology Letters 24 (2006), pp. 15–26.

[17] G P Stachowiak, G W Stachowiak, and P Podsiadlo. “Automated

classification of wear particles based on their surface texture and shape

features”. In: Tribology International 41 (2008), pp. 34–43.

[18] C Buckland Wright. “Subchondral bone changes in hand and knee

osteoarthritis detected by radiography”. In: Osteoarthritis and Cartilage 12

(2004), S10–S19.

[19] C Ding, F Cicuttini, and G Jones. “Tibial subchondral bone size and knee

cartilage defects: relevance to knee osteoarthritis”. In: Osteoarthritis and

Cartilage 15 (2007), pp. 479–486.

[20] P Podsiadlo, L Dahl, M Englund, L S Lohmander, and G W Stachowiak.

“Differences in trabecular bone texture between knees with and without

radiographic osteoarthritis detected by fractal methods”. In: Osteoarthritis

and Cartilage 16 (2008), pp. 323–329.

[21] M Wolski, P Podsiadlo, G W Stachowiak, L S Lohmander, and M Englund.

“Differences in trabecular bone texture between knees with and without

radiographic osteoarthritis detected by directional fractal signature

method”. In: Osteoarthritis and Cartilage 18 (2010), pp. 684–690.

[22] L Pothuaud, C L Benhamou, P Porion, E Lespessailles, R Harba, and

P Levitz. “Fractal dimension of trabecular bone projection texture is related

to three dimensional microarchitecture”. In: Journal of Bone and Mineral

Research 15 (2000), pp. 691–699.

[23] R Jennane, R Harba, G Lemineur, S Bretteil, A Estrade, and C L Benhamou.

“Estimation of the 3D self similarity parameter of trabecular bone from its

2D projection”. In: Medical Image Analysis 11 (2007), pp. 91–98.

[24] L Pothuaud, P Carceller, and D Hans. “Correlations between grey level

variations in 2D projection images (TBS) and 3D microarchitecture:

Applications in the study of human trabecular bone microarchitecture”. In:

Bone 42 (2008), pp. 775–787.

[25] G Luo, J H Kinney, J J Kaufman, D Haupt, A Chiabrera, and

R S Siffert. “Relationship between plain radiographic patterns and

three-dimensional trabecular architecture in the human calcaneus”. In:

Osteoporosis International 9 (1999), pp. 339–345.

[26] P Brodatz. Textures: A Photographic Album for Artists and Designers. Dover

Publications, New York, 1966.

[27] T Ojala, M Pietikainen, and T Maenpaa. “Multiresolution gray-scale and

rotation invariant texture classification with local binary patterns”. In:

IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002),

pp. 971–987.

[28] L I Kuncheva. Combining Pattern Classifiers: Methods and Algorithms.

Wiley-Interscience, 2004.

[29] J Kittler, M Hatef, R P W Duin, and J Matas. “On combining classifiers”.

In: IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998),

pp. 226–239.

[30] L Breiman. “Bagging predictors”. In: Machine Learning 24 (1996),

pp. 123–140.

[31] Y Freund and R E Schapire. “A decision-theoretic generalization of on-line

learning and an application to boosting”. In: Journal of Computer and System

Sciences 55 (1997), pp. 119–139.

5. List of figures

Figure 1: Thesis overview. [Flow chart with boxes: Literature review; Paper 1 (chapter 2); Paper 2 (chapter 3); Paper 3 (chapter 4); Paper 4 (chapter 5); Automated system for detection and prediction of knee osteoarthritis. Stage labels: Development, Evaluation.]

CHAPTER 2

A SIGNATURE DISSIMILARITY MEASURE FOR

TRABECULAR BONE TEXTURE IN KNEE

RADIOGRAPHS

Tomasz Woloszynski¹, Pawel Podsiadlo, PhD¹, Gwidon W Stachowiak, PhD¹,

and Marek Kurzynski, PhD²

¹Tribology Laboratory, School of Mechanical and Chemical Engineering,

University of Western Australia, Australia

²Chair of Systems and Computer Networks, Wroclaw University of Technology,

Poland

Medical Physics 2010;37;2030–2042

Abstract

Purpose: The purpose of this study is to develop a dissimilarity measure for

the classification of trabecular bone (TB) texture in knee radiographs. Problems

associated with the traditional extraction and selection of texture features and

with the invariance to imaging conditions such as image size, anisotropy, noise,

blur, exposure, magnification, and projection angle were addressed.

Methods: In the method developed, called a signature dissimilarity measure

(SDM), a sum of earth mover’s distances calculated for roughness and orientation

signatures is used to quantify dissimilarities between textures. Scale-space theory

was used to ensure scale and rotation invariance. The effects of image size,

anisotropy, noise, and blur on the SDM developed were studied using computer

generated fractal texture images. The invariance of the measure to image

exposure, magnification, and projection angle was studied using x-ray images

of a human tibia head. For the studies, Mann-Whitney tests with a significance level

of 0.01 were used. A comparison study between the performance of an

SDM-based classification system and two other systems in the classification of Brodatz

textures and the detection of knee osteoarthritis (OA) was conducted. The other

systems are based on weighted neighbor distance using compound hierarchy of

algorithms representing morphology (WND-CHARM) and local binary patterns

(LBP).

Results: Results obtained indicate that the SDM developed is invariant to image

exposure (2.5–30 mA s), magnification (×1.00–×1.35), noise associated with film

graininess and quantum mottle (<25%), blur generated by a sharp film screen,

and image size (>64×64 pixels). However, the measure is sensitive to changes in

projection angle (>5°), image anisotropy (>30°), and blur generated by a regular

film screen. For the classification of Brodatz textures, the SDM-based system

produced comparable results to the LBP system. For the detection of knee OA,

the SDM-based system achieved 78.8% classification accuracy and outperformed

the WND-CHARM system (64.2%).

Conclusions: The SDM is well suited for the classification of TB texture images

in knee OA detection and may be useful for the texture classification of medical

images in general.

Key words: dissimilarity measure, image classification, scale-space, knee

osteoarthritis, radiographs

1. Introduction

Trabecular bone (TB) analysis plays an important role in the assessment of

knee osteoarthritis (OA). This is due to the fact that the TB parameters such as

thickness, separation, connectivity, and orientation change with the progression

of the disease [1–4]. Recently, it has been shown that these changes in bone

structure precede radiographic OA, i.e., they occur before joint space narrowing

and osteophyte formation take place [5]. For these reasons, TB has been used in

the assessment of risk and severity of OA in knees. The images of TB used in

the OA assessment are usually acquired by plain radiography [5–7]. Although

other bone imaging techniques such as magnetic resonance imaging (MRI) [8]

and scintigraphy [9] are also used, plain radiography remains the most popular

technique. This is mainly due to its wide accessibility, low cost, and non-invasive

nature, and the fact that the bone texture obtained by plain radiography is directly

related to the three-dimensional bone structure [10–13]. Therefore, x-ray images

of TB are used in this study.

Two main approaches for the assessment of knee OA are used. The first approach

is based on statistical tests for the differences between healthy and OA knee TB

textures. Methods employing this approach include histomorphometric analysis

[2, 3], fractal signature analysis [4, 14–16], and Hurst orientation transform

based methods [6, 17]. Since TB exhibits fractal nature [18, 19], recent research

has been mainly focused on the development of fractal methods. However,

such methods generally lack the ability to describe the bone anisotropy, i.e.,

fractal dimensions (FDs) are calculated only in horizontal and vertical directions

[4, 15], and the ability to fully quantify bone roughness, i.e., FD is the only

parameter used. Fractal methods are also sensitive to imaging conditions such

as noise, magnification, and projection angle [17]. Although some of these

problems have recently been addressed [17, 20], dependence of results on angular

space quantization still remains unresolved. The second approach is based on

formulating OA detection and prediction as a classification problem [21, 22].

In this approach, multiple features are extracted from x-ray images of TB and

then used to classify patients’ knees. Image classes are defined according to the

Kellgren-Lawrence (KL) scaling system [23]. However, in all methods used so far,

the calculation, extraction, and selection of texture features are not trivial matters

[20–22]. Consequently, the development of reliable decision support systems

based on such methods is difficult.

One possible way to address the problem is through the use of a dissimilarity

measure. In this approach, no texture features are produced; instead a direct

measurement of distances between texture images is performed. This eliminates

the problems associated with the extraction and selection of texture features, e.g.,

the high dimensionality of feature space and the estimation of large number

of parameters. In past research, dissimilarity measures have proved to be

useful alternatives to feature-based classifications [24–26]. Applications include image

search [27], iris recognition [28], and medical image segmentation [29]. However,

dissimilarity measures that are able to quantify TB anisotropy and roughness at

various scales and that are invariant to x-ray measurement conditions are not yet

available.

In this paper, the aforementioned problems were addressed by developing a new

measure, called a signature dissimilarity measure (SDM). In the SDM, a sum

of earth mover’s distances (EMDs) [30] calculated for roughness and orientation

signatures is used to quantify dissimilarities between textures. The problems

of invariance to imaging conditions, the ability to describe TB roughness and

anisotropy at various scales, and the detection of knee OA were addressed.

2. Method

The dissimilarity measure developed in this study is based on scale-space theory.

Relevant aspects of the theory are presented first.

2.1 Scale-space representation of texture image

Assume that texture data is a digital image defined as a function I : X × Y → Z

which assigns a gray value z ∈ Z to a single pixel at location (x, y) ∈ X × Y , i.e.,

z = I(x, y). Let a two-dimensional sampled Gaussian kernel function be given as

$$G(k, m, \sigma_n) = \frac{1}{2\pi\sigma_n^2} \exp\left(-\frac{k^2 + m^2}{2\sigma_n^2}\right), \qquad (1)$$

where k,m are integers and σn ∈ R+ is the scale parameter taken from a

predefined increasing sequence σ1, . . . , σN ; N is the total number of scales used.

The range of scale parameters σn depends on the application and database used.

Generally, the range should be set in such a way that it covers the scales of interest

and that the sizes of sampled Gaussian kernels are smaller than the image size.

Then, the scale-space representation of the image can be defined as

$$L(x, y, \sigma_n) = G * I = \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} G(k, m, \sigma_n)\, I(x - k, y - m), \qquad (2)$$

where ∗ is the convolution operator. The Gaussian kernel function G was chosen

because of its causality, homogeneity, and isotropy [31, 32]. Using the scale-space

representation, scale-normalized image derivatives are obtained, i.e.,

$$L_{x^\alpha y^\beta,\mathrm{norm}}(x, y, \sigma_n) = \sigma_n^{(\alpha+\beta)}\, G_{x^\alpha y^\beta} * I, \qquad (3)$$

where α and β are the orders of differentiation in the x and y directions,

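respectively, and $G_{x^\alpha y^\beta}(k, m, \sigma_n)$ is the sampled Gaussian derivative given by

$$G_{x^\alpha y^\beta}(k, m, \sigma_n) = \left.\left\{ \frac{\partial^{(\alpha+\beta)}}{\partial x^\alpha\, \partial y^\beta} \left[ \frac{1}{2\pi\sigma_n^2} \exp\left(-\frac{x^2 + y^2}{2\sigma_n^2}\right) \right] \right\}\right|_{(x,y)=(k,m)}, \qquad (4)$$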

where x and y are continuous variables. The normalized derivatives have a useful

property which is the invariance to image scale [31].
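As an illustration of Eqs. (1)–(4), the following minimal Python sketch builds the sampled Gaussian kernel and its first derivatives and evaluates the scale-normalized gradient at a single pixel. The function names, the truncation of the kernel at four standard deviations, and the zero padding outside the image are assumptions made for this sketch only; they are not specified in the text.

```python
import math

def gaussian_kernels(sigma, radius=None):
    """Sampled 2-D Gaussian G and its first derivatives Gx, Gy on an integer grid."""
    if radius is None:
        radius = math.ceil(4 * sigma)  # assumed truncation; not given in the text
    offsets = range(-radius, radius + 1)
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    g, gx, gy = {}, {}, {}
    for k in offsets:
        for m in offsets:
            value = norm * math.exp(-(k * k + m * m) / (2.0 * sigma ** 2))
            g[(k, m)] = value                     # Eq. (1)
            gx[(k, m)] = -k / sigma ** 2 * value  # Eq. (4) with alpha = 1, beta = 0
            gy[(k, m)] = -m / sigma ** 2 * value  # Eq. (4) with alpha = 0, beta = 1
    return g, gx, gy

def convolve_at(image, kernel, x, y):
    """Evaluate (kernel * image)(x, y) as in Eq. (2); pixels outside the image count as 0."""
    total = 0.0
    for (k, m), weight in kernel.items():
        xs, ys = x - k, y - m
        if 0 <= xs < len(image) and 0 <= ys < len(image[0]):
            total += weight * image[xs][ys]
    return total

def normalized_gradient_at(image, sigma, x, y):
    """Scale-normalized first derivatives of Eq. (3): sigma^(alpha+beta) with alpha+beta = 1."""
    _, gx, gy = gaussian_kernels(sigma)
    lx = sigma * convolve_at(image, gx, x, y)
    ly = sigma * convolve_at(image, gy, x, y)
    return lx, ly
```

For a ramp image I(x, y) = x, the unnormalized derivative in x is approximately 1 at any interior pixel, so the normalized value returned by this sketch is approximately σ.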

2.2 Roughness signature

Roughness signature is used to measure the complexity of a texture image.

Complexity is understood as the frequency content of texture. For example, a

texture containing mainly smooth regions would have a low complexity, while

a texture containing rough patches with a number of sharp edges would exhibit

a high complexity. In the case of TB textures, the roughness signature would

be affected by bone properties such as trabecular thickness and separation. The

roughness signature is calculated in the following manner:

1. First, the length of the gradient Diff1 and the Laplacian Diff2 operators are

calculated for a texture image at predefined scales,

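$$\mathrm{Diff}_1(x, y, \sigma_n) = \sqrt{L_{x,\mathrm{norm}}^2(x, y, \sigma_n) + L_{y,\mathrm{norm}}^2(x, y, \sigma_n)}, \quad n = 1, \ldots, N, \qquad (5)$$

$$\mathrm{Diff}_2(x, y, \sigma_n) = L_{xx,\mathrm{norm}}(x, y, \sigma_n) + L_{yy,\mathrm{norm}}(x, y, \sigma_n), \quad n = 1, \ldots, N. \qquad (6)$$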

The gradient operator is used to detect edges. The second operator takes

nonzero values for smooth circular regions (called blobs) and it acts as a

smoothness detector.

2. For each pixel, the extremum values of the operators Diff1 and Diff2 are then

found across all scales,

$$\mathrm{Diff}_{k,\sigma_{\max}}(x, y) = \max_{n=1,\ldots,N} \left|\mathrm{Diff}_k(x, y, \sigma_n)\right|, \quad k = 1, 2, \qquad (7)$$

where σmax denotes the scale at which the operator takes the extremum

value at pixel (x, y).

3. Next, a roughness measure is calculated as a difference between the two

extremum operators, i.e.,

$$R(x, y) = \mathrm{Diff}_{1,\sigma_{\max}}(x, y) - \mathrm{Diff}_{2,\sigma_{\max}}(x, y). \qquad (8)$$

The measure R(x, y) is negative if a neighbourhood surrounding the pixel

located at (x, y) is smooth and positive otherwise. The measure is zero if

the neighbourhood does not resemble either a smooth circular region or an

edge.

4. In order to remove the effect of stationary distortions in the image (e.g.,

noise, overexposure, etc.), the roughness measure is normalized with

respect to its standard deviation calculated over the whole image

$$R_{\mathrm{norm}}(x, y) = \frac{R(x, y)}{\mathrm{SD}(R(x, y))}. \qquad (9)$$

5. Finally, a roughness signature Srough(I) of image I(x, y) is calculated as a

normalized histogram of the roughness measure Rnorm(x, y). The histogram

is built with bin centers lying in the interval (MR − 3.5,MR + 3.5), where

MR is the mean value of Rnorm(x, y). The interval limits were chosen as a

trade off between mapping Rnorm(x, y) on the histogram with a preselected

number of bins and the minimization of the number of Rnorm(x, y) values

that lie outside the interval. The histogram is normalized so that entries

(weights) from all bins sum to 1. Resulting roughness signature Srough(I) is

stored as pairs of bin centers and weights.
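Steps 2–5 above can be sketched in Python as follows, taking the operator values of Eqs. (5) and (6) as given. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function name, the flattened-image input layout, the use of the population standard deviation, equally spaced bin centers, and the dropping of values that fall outside the interval are all choices made here for illustration.

```python
import statistics

def roughness_signature(diff1, diff2, n_bins=50, half_width=3.5):
    """Roughness signature from operator values.

    diff1[n][p] and diff2[n][p] hold Diff1 and Diff2 at scale n and pixel p
    (image flattened to a list); this layout is an assumption of the sketch.
    Returns a list of (bin center, weight) pairs.
    """
    n_pixels = len(diff1[0])
    # Eq. (7): extremum of |Diff_k| across all scales, for each pixel
    ext1 = [max(abs(scale[p]) for scale in diff1) for p in range(n_pixels)]
    ext2 = [max(abs(scale[p]) for scale in diff2) for p in range(n_pixels)]
    # Eq. (8): roughness measure
    r = [a - b for a, b in zip(ext1, ext2)]
    # Eq. (9): normalization by the standard deviation over the whole image
    sd = statistics.pstdev(r)
    if sd == 0:
        raise ValueError("roughness measure is constant; cannot normalize")
    r_norm = [v / sd for v in r]
    # Step 5: histogram over the interval (M_R - 3.5, M_R + 3.5)
    mean_r = statistics.fmean(r_norm)
    low = mean_r - half_width
    bin_width = 2.0 * half_width / n_bins
    counts = [0] * n_bins
    for v in r_norm:
        idx = int((v - low) // bin_width)
        if 0 <= idx < n_bins:  # values outside the interval are dropped (assumption)
            counts[idx] += 1
    total = sum(counts)
    centers = [low + (i + 0.5) * bin_width for i in range(n_bins)]
    return [(c, n / total) for c, n in zip(centers, counts)]
```

Storing the signature as (center, weight) pairs rather than as a bare list of weights matters because the bin centers depend on the mean of the normalized measure and therefore differ from image to image.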

The roughness signature resembles the standard texture feature of coarseness [33]

described by Tamura et al. However, the standard Tamura coarseness is a single

number calculated as the average of coarseness measures at each pixel whereas

the signature is a histogram of the roughness measure. The Tamura coarseness

has been modified by using a histogram of coarseness measures [34, 35]. However,

neither the standard nor the modified Tamura coarseness is scale or rotation

invariant since the coarseness measure is calculated using square regions of fixed

sizes (i.e., powers of 2).

Examples of fractal and Brodatz texture images with their roughness signatures

are shown in Fig. 1. It can be seen from this figure that the shape of the signature

and its position with respect to zero depend on texture complexity. The texture

image of reptile skin in Fig. 1(a) taken from the Brodatz album [36] was identified

as relatively smooth since the mean MR is negative. This indicates that blob

regions in this image cover a greater area than patches with sharp edges. For

the rough fractal image shown in Fig. 1(c), the roughness signature takes the

maximum value approximately at zero roughness measure Rnorm(x, y). This

indicates that, on average, a neighbourhood of each pixel taken from the fractal

surface does not resemble either a smooth circular region or an edge. All images

were analyzed using the same range of scales, i.e., $\sigma_n = 1.1^n$, $n = 1, \ldots, 25$. The

size of Brodatz and fractal images was 180×180 pixels.

2.3 Orientation signature

Orientation signature is a measure of a texture image direction. The signature

is defined as a histogram of angles between gradient directions calculated at the

pixel locations (x, y) and the principal gradient direction (PGD) of changes in the

entire image. For TB textures, the signature measures orientations of trabeculae.

The orientation signature is calculated in the following steps:

1. First, for each pixel located at (x, y) an angle θ(x, y) between the image

horizontal axis and a gradient vector at scale σmax is calculated, i.e.,

$$\theta(x, y) = \tan^{-1}\left(\frac{L_{y,\mathrm{norm}}(x, y, \sigma_{\max})}{L_{x,\mathrm{norm}}(x, y, \sigma_{\max})}\right). \qquad (10)$$

2. Angles associated with rapid changes in pixel brightness values (edge angles)

are then selected. It is reasonable to select such angles since they

represent dominating texture directions. Smooth regions (blobs) can be

ignored because they do not have any well-defined directional patterns.

For the selection of edge angles, the maximum and minimum values of the

Laplacian operator Diff2 are found for each pixel. The maxima and minima

of Diff2 represent valleys and hills in the texture image, respectively. A

binary image is obtained by adding the maximum and minimum values of

Diff2 for each pixel and setting the threshold at zero level. A perimeter of the

binary image is calculated and then used to locate edges (the rapid changes

in pixel brightness) connecting bright and dark patches in the texture image.

Angles located at the edges are edge angles. As an example, a TB texture

image was used to illustrate steps required for the selection of edge angles

(Fig. 2).

3. Next, the PGD is calculated using a principal component analysis (PCA)

method and image directional derivatives. In the PCA method, a covariance

matrix is used to retrieve information about a spread of data. In a similar

way, a second moment matrix µ2(0) of directional derivatives around zero

provides information about texture changes in x−y coordinates. The second

moment matrix is given by

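$$\mu_2(0) = \begin{bmatrix} \sum L_{x,\mathrm{norm}}^2 & \sum L_{x,\mathrm{norm}} L_{y,\mathrm{norm}} \\ \sum L_{y,\mathrm{norm}} L_{x,\mathrm{norm}} & \sum L_{y,\mathrm{norm}}^2 \end{bmatrix}, \qquad (11)$$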

where L·,norm = L·,norm(x, y, σmax) and the sums are taken over pixels lying

on edges. The direction which captures most texture changes (i.e., the

PGD) is obtained by calculating the eigenvector associated with the largest

eigenvalue of the matrix µ2(0).

4. Finally, the orientation signature Sorient(I) of image I(x, y) is calculated

as a smoothed and normalized histogram of angle differences between

the PGD and the edge angles. For the histogram, the bin centers cover

a 180° range and each center is computed as the mean value of angle

differences within the bin. The number of bins is chosen in such a way

that the signature can capture image rotations at small angles and the

bin weights are not suppressed by an averaging filter. The filter is used

to reduce the high-frequency peaks in histogram associated with discrete

image rotation. All bin weights are normalized so that their sum is equal

to 1. Finally, the histogram of angles is represented in a two-dimensional

angular space. This is achieved by wrapping the bin centers around a

circle with a predetermined radius of 45. The resulting orientation signature

Sorient(I) is stored as pairs of bin centers and weights.
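Steps 1, 3, and 4 above can be sketched in Python as follows, assuming that the normalized derivatives at the edge pixels have already been selected (step 2 is not shown). The function names are hypothetical. Three further choices are assumptions of this sketch rather than details given in the text: the three-bin circular averaging filter, the use of the bin midpoint for empty bins, and the mapping of the 180° range onto a full circle by doubling the angle.

```python
import math

def principal_gradient_direction(lx, ly):
    """Angle in degrees, in [0, 180), of the dominant eigenvector of the matrix in Eq. (11)."""
    sxx = sum(a * a for a in lx)
    syy = sum(b * b for b in ly)
    sxy = sum(a * b for a, b in zip(lx, ly))
    # For a symmetric 2x2 matrix the major axis lies at 0.5 * atan2(2*sxy, sxx - syy).
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy)) % 180.0

def orientation_signature(lx, ly, n_bins=45, radius=45.0, filter_width=3):
    """Orientation signature as ((u, v), weight) pairs from edge-pixel derivatives."""
    pgd = principal_gradient_direction(lx, ly)
    bin_width = 180.0 / n_bins
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    for a, b in zip(lx, ly):
        theta = math.degrees(math.atan2(b, a))  # Eq. (10), taken modulo 180 degrees below
        diff = (theta - pgd) % 180.0
        idx = min(int(diff // bin_width), n_bins - 1)
        counts[idx] += 1
        sums[idx] += diff
    # Circular moving average to damp high-frequency peaks (filter width assumed)
    half = filter_width // 2
    smoothed = [
        sum(counts[(i + j) % n_bins] for j in range(-half, half + 1)) / filter_width
        for i in range(n_bins)
    ]
    total = sum(smoothed)
    signature = []
    for i in range(n_bins):
        # Bin center: mean of the angle differences in the bin, else the bin midpoint
        center = sums[i] / counts[i] if counts[i] else (i + 0.5) * bin_width
        phi = math.radians(2.0 * center)  # 180-degree range wrapped onto a circle (assumption)
        point = (radius * math.cos(phi), radius * math.sin(phi))
        signature.append((point, smoothed[i] / total))
    return signature
```

Because every angle is measured relative to the PGD, rotating the image shifts the gradient angles and the PGD by the same amount and leaves the angle differences unchanged.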

The orientation signature is similar to the standard Tamura texture feature of

directionality [33]. However, the Tamura directionality is not scale invariant

since it uses a fixed size (3×3 pixels) edge detection mask in the calculation

of gradients. Also, the feature is not rotation invariant since the histogram of

gradient directions depends on the initial orientation of the image.

As an example, orientation signatures were calculated for Brodatz texture images

shown in Figs. 3(a), (c), and (e). The images represent [Figs. 3(a) and (c)] two

weakly anisotropic leather textures with different orientations and [Fig. 3(e)] an

isotropic weave texture. The size of all images is 180×180 pixels and the scale

range was set to $\sigma_n = 1.1^n$, $n = 1, \ldots, 25$. The number of histogram bins was set

to 45. Rose plots of the signatures calculated are shown in Figs. 3(b), (d), and (f).

On the rose plots, the PGDs are marked as thick solid lines. It can be seen from the

figure that the shape of the rose plots depends on texture anisotropy. Elliptical

plots were obtained for the anisotropic textures of leather [Figs. 3(b) and (d)],

while a circular plot was obtained for the isotropic texture of weave [Fig. 3(f)].

It can also be seen that the rose plots obtained for the leather texture images are

virtually the same. This shows that the orientation signature is rotation invariant.

2.4 Signature dissimilarity measure

Roughness and orientation signatures are used to calculate a dissimilarity

measure between texture images. The dissimilarity measure between two

textures A and B is defined as

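$$\mathrm{Diss}(A, B) = \alpha\,\mathrm{EMD}[S_{\mathrm{rough}}(A), S_{\mathrm{rough}}(B)] + \beta\,\mathrm{EMD}[S_{\mathrm{orient}}(A), S_{\mathrm{orient}}(B)], \qquad (12)$$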

where EMD[·, ·] is the earth mover’s distance [30] between two signatures and

α, β are the normalization factors. The EMD is calculated as a minimal amount

of work that is needed to move a mass of earth spread in space (represented as

bin weights and centers of one signature) to a collection of holes (represented as

bin weights and centers of the other signature). The minimal amount of work is

found by solving a special case of the transportation problem by linear optimization

[30]. The EMD was used because it can reliably compare two signatures with

different bin centers unlike statistical measures based on histograms, e.g.,

chi-square or G-test [30]. The EMDs are normalized with respect to the maximum

values of the EMDs obtained from a training set of images. This ensures that

roughness and orientation signatures equally affect the SDM, i.e., they are equally

informative.
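For one-dimensional signatures of equal total weight, such as the roughness signature, the EMD reduces to the area between the two cumulative weight functions, so no linear program is needed. The following Python sketch shows this special case together with the normalized sum of Eq. (12). The function names are hypothetical, and taking α and β as the reciprocals of the maximum training-set EMDs is one reading of the normalization described above. Orientation signatures lie in a two-dimensional angular space and would require the general transportation solver, which is not shown.

```python
def emd_1d(sig_a, sig_b):
    """EMD between two 1-D signatures [(center, weight), ...] of equal total weight.

    The bin centers of the two signatures may differ. Earth is swept from left
    to right; the work is the carried surplus times the distance travelled.
    """
    events = sorted([(c, w) for c, w in sig_a] + [(c, -w) for c, w in sig_b])
    work, carried, prev = 0.0, 0.0, events[0][0]
    for center, signed_weight in events:
        work += abs(carried) * (center - prev)
        carried += signed_weight
        prev = center
    return work

def sdm(emd_rough, emd_orient, max_emd_rough, max_emd_orient):
    """Eq. (12) with alpha = 1/max_emd_rough and beta = 1/max_emd_orient (assumed reading)."""
    return emd_rough / max_emd_rough + emd_orient / max_emd_orient
```

For example, moving a unit mass from center 0 to center 3 costs 3 units of work, whereas a bin-by-bin statistic could not relate the two bins at all.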

3. Materials and results

The performance of the SDM was evaluated using Brodatz and fractal textures,

x-ray images of a tibia head, and x-ray images of healthy and OA knee joints. The numbers of

bins used in the roughness and orientation signatures were set to 50 and 45,

respectively, for all experiments conducted.

3.1 Brodatz textures

A benchmark database generated from the Brodatz album was used to evaluate

the performance of the SDM in rotation-invariant texture classification.

Although the relevance of the album to TB textures is marginal, the album

provides a controlled environment with well-defined and visually separable

classes. Therefore, it is frequently used for preliminary evaluations of the

discriminative power of newly developed texture analysis methods. The

performance of a classification system based on the SDM was compared to

a system based on the local binary patterns (LBP) operator [37] with parameters

(P, R) = (8, 1) + (16, 2) + (24, 3). For the comparison study, a nearest neighbour classifier

was used. This is because the classifier is naturally fitted for dissimilarity-based

classifications and its simplicity ensures that the results obtained accurately

reflect the “true” discriminative power of the SDM. The image database described

by Ojala et al. [37] was chosen for the experiments. Although other databases

generated from the Brodatz album are available [38, 39], the chosen database was

specifically designed for rotation invariant texture classifications. Also, the LBP

system achieved the best scores for this database which makes the comparison

more challenging for the SDM. The database was replicated as follows.

Sixteen source textures were captured from the Brodatz texture album. Each

source texture represented one class. For each class, training and testing images

were generated as follows. Eight images of size 256×256 pixels were extracted

from each source texture. The first image was used for training the classifier and

the other images were used for testing. Each image was then rotated at 0°, 20°,

30°, 45°, 60°, 70°, 90°, 120°, 135°, and 150°, and cropped to the size of 180×180

pixels. This resulted in an image database containing 1280 (16×8×10) images

with 160 (16×1×10) training images and 1120 (16×7×10) testing images. The

training images were split into subimages of sizes 16×16, 30×30, 60×60, 90×90,

and 180×180 pixels resulting in five groups of training images with 19360, 5760,

1440, 640, and 160 subimages, respectively. Rotated images were computed using

bilinear interpolation. In the case of 0° and 90° rotation angles, an artificial blur

was added to the images to simulate the effect of blurring caused by bilinear

interpolation used for image rotation at other angles. The blur was generated

using a circular averaging filter with radius equal to 1. The scale range in the SDM

was set to $\sigma_n = 0.7 \times 1.07^n$, $n = 1, \ldots, 25$. This ensured that for all scales used,

the sizes of gradient and the Laplacian differential operators were smaller than

the sizes of training subimages.
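The image counts quoted above can be checked with a few lines of Python. The sketch assumes that each 180×180 training image is tiled into non-overlapping subimages, with any remainder at the border discarded, which reproduces the reported numbers. It also reads the scale sequence as geometric, 0.7 × 1.07 to the power n, which is an assumption of this sketch; under that reading the largest scale is about 3.8 pixels.

```python
# Database layout: 16 classes, 8 images per class, 10 rotation angles
classes, images_per_class, angles = 16, 8, 10
total_images = classes * images_per_class * angles
training_images = classes * 1 * angles                       # first image of each class
testing_images = classes * (images_per_class - 1) * angles   # remaining seven images

# Non-overlapping tiling of each 180x180 training image (assumed tiling scheme)
image_size = 180
subimage_sizes = [16, 30, 60, 90, 180]
subimage_counts = [training_images * (image_size // s) ** 2 for s in subimage_sizes]

# Scale sequence read as geometric (assumption): sigma_n = 0.7 * 1.07**n, n = 1..25
scales = [0.7 * 1.07 ** n for n in range(1, 26)]

print(total_images, training_images, testing_images)
print(subimage_counts)
print(round(max(scales), 2))
```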

Two experiments were conducted using the image database generated. In the

first experiment, the training set comprised the images rotated at four angles:

0°, 30°, 45°, and 60°. The testing set was presented at six rotation angles: 20°, 70°,

90°, 120°, 135°, and 150°. In the second experiment, the training set comprised

the images rotated at a single angle (the training angle). The testing set contained

images rotated at the remaining nine angles. In each class, the roughness and

orientation signatures were calculated for each subimage taken from the training

set and then averaged. As a result, each class was represented by a training pair

of the averaged signatures.
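The classification scheme described above (one averaged signature per class, followed by a nearest neighbour decision) can be sketched as follows. The function names are hypothetical, the dissimilarity is passed in as a function so that any measure can be plugged in, and bin-wise averaging of (center, weight) pairs is an assumption about how signatures were averaged.

```python
def average_signature(signatures):
    """Bin-wise mean of signatures [(center, weight), ...] with equal numbers of bins."""
    n = len(signatures)
    return [
        (sum(sig[i][0] for sig in signatures) / n,
         sum(sig[i][1] for sig in signatures) / n)
        for i in range(len(signatures[0]))
    ]

def nearest_class(test_item, class_prototypes, dissimilarity):
    """Label of the class prototype that is least dissimilar to the test item."""
    return min(
        class_prototypes,
        key=lambda label: dissimilarity(test_item, class_prototypes[label]),
    )

def accuracy(test_items, true_labels, class_prototypes, dissimilarity):
    """Classification accuracy: percentage of correctly classified test items."""
    hits = sum(
        nearest_class(item, class_prototypes, dissimilarity) == label
        for item, label in zip(test_items, true_labels)
    )
    return 100.0 * hits / len(test_items)
```

With a single prototype per class, the classifier has no tunable parameters, so differences in accuracy can be attributed to the dissimilarity measure itself.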

The results obtained from the first experiment are shown in Table 1. These results

represent the classification accuracies (i.e., the percentage of correctly classified

images) of the classification systems for each size of training images. It can be

seen from this table that for small sizes (16×16 and 30×30 pixels) the LBP system

produced the most accurate results. However, when the size of training images

was increased to 60×60, 90×90, and 180×180 pixels, the performance of the SDM and

LBP systems was comparable. Classification accuracies obtained for the second

experiment were analogous and they are listed in Table 2. Results obtained for

the LBP system were similar to those presented by Ojala et al. [37].

3.2 Fractal textures

Fractal textures were used to evaluate the effects of image size, anisotropy, noise,

and blur. All images were generated by a spectral synthesis algorithm [40].

These textures were chosen because they reflect the multiscale and nonstationary

nature of TB textures [18, 19] and provide a controlled environment for

experiments. However, since fractal textures are a rough approximation of bone

textures, the experimental results obtained only indicate the potential of the

SDM in TB texture classification. Five databases of fractal surface images were

generated.

1. The first image database contained images of isotropic fractal textures with

FD 2.7. Sizes of the images generated were: 64×64, 128×128, 192×192,

and 256×256 pixels. For each image size, there were 20 images of fractal

surfaces. This resulted in four groups of images with the total number of 80

images (20 images per group).

2. The second image database contained images of anisotropic fractal textures

with FDs 2.8 and 2.2 along two anisotropy directions perpendicular to each

other. The directions were generated along lines inclined to the horizontal

axis of an image at 6k degree angles, where k = 0, 1, . . . , 15. As a result,

16 directions were produced. For each anisotropy direction, a group of ten

fractal texture images of size 256×256 pixels was generated. This resulted

in 16 groups of images with the total number of 160 images (ten images per

group).

3. The third image database contained images of isotropic fractal textures with

FD 2.4 that were corrupted by Poisson noise. The noise, which was

added to the images, simulated the effect of quantum mottle observed in

x-ray images [41]. The FD 2.4 was used since this was the lowest dimension

calculated for TB textures in previous studies [17]. This presented a worst-case

scenario because images with low FDs are most sensitive to noise [17,

40]. For the database, 20 isotropic fractal texture images of size 256×256

pixels were generated and then corrupted by Poisson noise with a mean

value of 128. The contribution of noise to the image pixel values was varied

between 0% and 25% in steps of 5%. This resulted in six groups of

images (20 images per group).

4. The fourth image database was similar to the third database, except that the

isotropic fractal texture images were corrupted by Gaussian noise. This

database was used to simulate the effect of film graininess in x-ray

images. The Gaussian noise had mean and standard deviation values of 128

and 40, respectively.

5. The fifth image database contained images of isotropic fractal textures with

FD 2.7 smoothed by two kernel operators. Smoothed images were used

to simulate the blur that occurs during image acquisition. The kernel



operators were generated using two different logit modulation transfer

functions (MTFs) [42]. The first MTF was used to model a sharp Kodak

Lanex Fine/OG-screen film system. The second MTF was used to model

an unsharp Kodak Lanex Regular/OG-screen film system. The high FD

was used since this is the most sensitive case for the study of a blurring

effect [17]. The database contained three groups of 20 images each, i.e.,

unprocessed and smoothed by the kernel operators.

Four experiments were conducted using the fractal image databases generated.

In all experiments, the scale range used in the SDM was set to $\sigma_n = 1.1^n$,

$n = 1, \ldots, 25$, in order to detect differences between fractal textures at a wide range

of scales. In the first experiment, the effect of image size on the SDM

was investigated using the first image database. Each image group in the

database was compared against a reference group of 128×128-pixel images.

The comparison between two groups was performed using the Mann-Whitney

test [43] with P<0.01 considered statistically significant. For the test, two

independent samples were used: (i) The first sample contained dissimilarity

measures calculated for every pair of images taken from the reference group; (ii)

the second sample contained dissimilarity measures calculated for every pair of

images taken from the other group and every cross pair of images taken from

both groups. Mean values with 99% confidence intervals (CI) for each sample

were calculated.
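
The construction of the two samples and the test itself can be sketched as follows. This is only an illustration under stated assumptions: `dissimilarity` is a hypothetical stand-in for the SDM, the images are replaced by toy objects, and the P value is obtained from the normal approximation with tie correction, which may differ from the implementation used to produce the reported results.

```python
import math
from itertools import combinations, product


def build_samples(reference, other, dissimilarity):
    """Return the two samples of dissimilarity values described in the text."""
    # Sample (i): every pair of images taken from the reference group.
    sample_ref = [dissimilarity(a, b) for a, b in combinations(reference, 2)]
    # Sample (ii): every pair within the other group plus every cross pair.
    sample_other = [dissimilarity(a, b) for a, b in combinations(other, 2)]
    sample_other += [dissimilarity(a, b) for a, b in product(reference, other)]
    return sample_ref, sample_other


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney test (normal approximation, tie-corrected).

    Returns the U statistic of sample x and the approximate P value.
    No continuity correction is applied.
    """
    pooled = sorted((v, label) for label, s in ((0, x), (1, y)) for v in s)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of ranks i+1, ..., j
        count = j - i
        tie_term += count ** 3 - count
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2.0))
```

For two groups of 20 images each, the first sample contains 190 values and the second contains 190 + 400 = 590 values; a group would be reported as different from the reference group when the returned P value is below 0.01.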

In the second experiment, the effect of changes in the direction of image

anisotropy was investigated using the second image database. Each group of

images was compared against a reference group containing images with the

anisotropy direction at a 0° angle.

In the third experiment, the sensitivity to noise of the SDM was investigated using

the third and fourth image databases. The original images with FD 2.4 were used



as the reference group.

In the fourth experiment, the effect of a blur on the SDM was investigated

using the fifth image database. The reference group contained the unsmoothed

isotropic images with FD 2.7.

Mean (99% CI) values of the dissimilarity measures obtained in the four

experiments are listed in Tables 3–6. The results showed that the group

of 64×64-pixel images, the groups of images with anisotropy angles between

30° and 60°, and the group of images smoothed with the kernel representing the regular

film screen were statistically different from their respective reference groups. For

other groups of images, no statistically significant differences were found.

3.3 Tibia head

The tibia head database was used to investigate the effect of varying x-ray

measurement conditions, i.e., exposure, magnification, and projection angle, on

the SDM. The database contained images of anteroposterior radiographs of

human tibia head (provided by the Bank of Bone and Tissues, Hollywood Private

Hospital, Perth, Australia). The radiographs were obtained using a Shimadzu

Corporation (Kyoto, Japan) model P-20 x-ray machine with a fine sharp film.

They were digitized using a film scanner with a resolution of 50 µm per pixel and

quantized into 256 gray levels. The database was previously used for the

evaluation of the modified Hurst orientation transform method [17].

The tibia head database was used to construct three image databases. Each image

database contained groups of 25 overlapping regions of 256×256 pixels selected

under the medial compartment of the tibia head.

1. The first image database was constructed from radiographs taken with

seven different exposures, i.e., 2.5, 5, 7.5, 10, 16, 24, and 30 mA s. The total

number of images in the database was 175 (i.e., one group per exposure and



25 images per group).

2. The second image database contained x-ray images taken with

magnifications ×1.00, ×1.13, ×1.23, and ×1.35, respectively. The exposure

was set to 16 mA s. The database contained four groups with the total

number of 100 images (i.e., one group per magnification and 25 images per

group).

3. The third image database contained x-ray images of the tibia head taken at

0°, 5°, 10°, and 15° projection angles. The exposure was set to 24 mA s. The

database contained four groups with the total number of 100 images (i.e.,

one group per projection angle and 25 images per group).

Three experiments were conducted using the image databases. In all

experiments, the scale range used in the SDM was set to $\sigma_n = 1.1^n$, $n = 1, \ldots, 25$,

for the same reason as before. In the first experiment, the effect of exposure

changes on the SDM was investigated using the first image database. Each group

of images was compared against a reference group using the Mann-Whitney

test. The group of images obtained from the radiographs taken with exposure

of 2.5 mA s was selected by a radiologist as the reference group based on visual

examination. The mean (99% CI) values for each sample were also calculated.

In the second experiment, the effect of magnification changes was examined

using the second database. Each group of images was compared against the

reference group of images taken at magnification ×1.00.

In the third experiment, the effects of changes in projection angle were

investigated using the third database. The group of images taken at 0° projection

angle was used as the reference group.

The results obtained from the first experiment are listed in Table 7. These results



are mean (99% CI) values of the dissimilarity measures calculated between seven

groups of images and the reference group. The table shows that the

CIs overlap. At the significance level of 0.01, no statistically significant differences

were found between the reference group and the remaining groups of images.
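
How the CIs were computed is not stated here, so the following is only a minimal sketch assuming a normal approximation for the mean of a sample of dissimilarity values; `mean_ci` and `intervals_overlap` are hypothetical helper names. Overlapping intervals are a visual aid only, and the statistical decision in the text rests on the Mann-Whitney test.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.99):
    """Mean and normal-approximation confidence interval of a sample."""
    m = mean(values)
    # Two-sided critical value, about 2.576 for a 99% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, (m - half_width, m + half_width)


def intervals_overlap(ci_a, ci_b):
    """True if two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
```

A call such as `mean_ci(sample_ref)` would return the mean and its 99% CI for one sample, and `intervals_overlap` can then be applied to the intervals of a group and the reference group.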

The results obtained from the second experiment are shown in Table 8. No

statistically significant differences were found between the groups of images.

For the third experiment, it was found that groups of images taken at projection

angles greater than 5° are statistically different from the reference group (P<0.01),

as shown in Table 9.

3.4 Healthy and osteoarthritic knees

The image database of healthy and OA knees was used to evaluate the

performance of the SDM in knee OA detection. The radiographs were taken

from 17 healthy and 34 OA subjects. Each subject was locked in a standardized

standing position [44]. Two radiographs were taken per subject (i.e., one per

knee). Each tibiofemoral compartment in each knee radiograph was graded on the KL

scale (0–4) by two radiologists with approximately 10 years of experience. The compartments were

graded according to the atlas from Osteoarthritis Research Society International

[45]. Disagreements in the KL grade between the two readers were adjudicated

by a third reader who was a radiologist with 15 years of experience. The level of

disagreement for the database used was 14.6%. Subjects were divided into

healthy and OA groups based on the KL grades. Healthy subjects had both

tibiofemoral compartments in both knees assigned with KL grade 0 (no OA).

OA subjects had at least one tibiofemoral compartment in any knee assigned

with KL grade 2 (minimal OA) or 3 (moderate OA). This radiographic criterion

roughly correlates with the progression of OA in knee joints. Knee radiographs

were taken in the Perth Radiographic Clinic, Subiaco.

There were 137 TB texture images of size 256×256 pixels each. For healthy

subjects, TB texture images were extracted under the medial and lateral



compartments using the automated method developed in the previous study

[46]. For OA subjects, the images were extracted under the compartment

diagnosed with radiographic OA using the same method. The images of healthy

knees from OA subjects were not used in this study. This resulted in healthy

and OA classes with 68 and 69 images per class, respectively. The classes were

matched by age, body mass index (BMI), and gender as shown in Table 10.

Examples of TB texture images are shown in Figs. 4 and 5. The non-uniform

background brightness of TB texture images is sometimes corrected during

preprocessing for unbiased analyses [6]. However, since the SDM is based on

differential operators, the absolute values of pixels do not affect the measure. For

this reason, only the unprocessed TB texture images were used in the experiment.

Two SDM-based systems were used for knee OA detection, i.e., an SDM-nearest

neighbour (SDM-NN) and an SDM-support vector machine (SDM-SVM). In the

first system, the measure developed was used with an NN classifier for the

same reason as explained before. In the second system, an SVM classifier with a

radial basis kernel was used because of its good performance for a wide range

of binary classification problems. The classification accuracy was estimated

using a leave-one-out cross-validation method. The method selects a single

TB texture image for testing and uses the remaining images for training. The

experiment was repeated 137 times so that each image was used for testing only

once. Classification accuracy was calculated as the average value of accuracies

obtained from all repetitions. The scale range in the SDM was set to

$\sigma_n = 2 \times 1.07^n$, $n = 1, \ldots, 25$, which corresponded to trabecular sizes from 0.11 to

1.08 mm. Previous studies based on fractal analyses showed that differences

in FD of TB textures between subjects with healthy and OA knees are most

significant in this range [4, 6]. The performance of the proposed systems was

compared against a benchmark classification system, the weighted neighbor distance using compound

hierarchy of algorithms representing morphology (WND-CHARM) system [47],

and a classification system based on Tamura features. The former system



was successfully used in previous studies of knee OA detection and prediction

based on the texture analysis of knee joint areas [21, 22]. In this system, 2885

features were first extracted from each TB texture image, including Zernike, Tamura,

and Haralick features, multiscale histograms, and Chebyshev statistics. A Fisher

score was then calculated for each feature and 10% of features with the highest

scores were selected. The features selected were then used by a weighted nearest

neighbour classifier. In the latter system, Tamura features such as contrast,

directionality, and modified coarseness (average and three-bin histogram) were

used because of their similarity to the signatures developed in quantifying texture

roughness and orientation. An SVM classifier with a radial basis kernel function was

used.
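
The leave-one-out estimate for a nearest neighbour classifier can be sketched as below. The sketch assumes that the pairwise dissimilarities have already been computed and stored in a square matrix `dist` (a hypothetical name standing in for SDM values); it is an illustration of the evaluation scheme, not the code used in this study.

```python
def loo_nn_accuracy(dist, labels):
    """Leave-one-out accuracy of a 1-NN classifier on a dissimilarity matrix.

    dist[i][j] is the dissimilarity between images i and j, and labels[i]
    is the class of image i. Each image is tested once against all others.
    """
    n = len(labels)
    correct = 0
    for i in range(n):
        # Nearest neighbour of the held-out image i among the remaining images.
        nearest = min((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        correct += labels[nearest] == labels[i]
    return correct / n
```

With 137 images, the loop runs 137 times and the returned value is the fraction of held-out images whose nearest neighbour belongs to the same class.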

Table 11 presents confusion matrices obtained for the SDM-NN, SDM-SVM,

WND-CHARM, and Tamura systems in knee OA detection. Using the matrices,

the classification accuracy, specificity (percentage of healthy subjects classified

as healthy), and sensitivity (percentage of OA subjects classified as OA) were

calculated. The accuracy, specificity, and sensitivity were 78.8%, 80.9%, and 76.8%

for the SDM-NN system; 85.4%, 83.8%, and 87.0% for the SDM-SVM system;

64.2%, 70.6%, and 58.0% for the WND-CHARM system; and 67.9%, 85.3%, and 50.7% for the Tamura system.
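
The three measures follow directly from a two-class confusion matrix, as the short sketch below shows. The counts in the usage note are not copied from Table 11; they are back-calculated here from the reported percentages and the class sizes (68 healthy and 69 OA images) and should be read as an assumption.

```python
def detection_metrics(true_healthy, false_oa, false_healthy, true_oa):
    """Accuracy, specificity, and sensitivity (in %) from confusion-matrix counts.

    true_healthy:  healthy images classified as healthy
    false_oa:      healthy images classified as OA
    false_healthy: OA images classified as healthy
    true_oa:       OA images classified as OA
    """
    total = true_healthy + false_oa + false_healthy + true_oa
    accuracy = 100.0 * (true_healthy + true_oa) / total
    specificity = 100.0 * true_healthy / (true_healthy + false_oa)
    sensitivity = 100.0 * true_oa / (true_oa + false_healthy)
    return accuracy, specificity, sensitivity
```

For example, `detection_metrics(55, 13, 16, 53)` reproduces the SDM-NN values after rounding to one decimal place, and `detection_metrics(57, 11, 9, 60)` reproduces the SDM-SVM values.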

To find which parts of the tibia have the strongest signal for OA detection, TB

texture images were extracted from a search region of the tibia head as shown in

Fig. 6. The search region was manually selected from 34 healthy and 57 OA knee

x-ray images, excluding subchondral bone sclerosis, periarticular osteopenia, and

fibula head. The region was resized, if necessary, to fit into a 1280×640-pixel

rectangle. From each region selected, 27 TB texture images of size 256×256 pixels

were extracted using a 128-pixel overlap. This resulted in 27 databases of texture

images of healthy and OA knees; one database per tibia part. Healthy knees had

both tibiofemoral compartments assigned with KL grade 0. OA knees had at

least one compartment assigned with KL grade 2 or 3. The SDM-NN system was



trained using 60 randomly selected TB texture images (30 images per class) and

tested using the remaining 31 images. This experiment was repeated 300 times.

The classification accuracy was calculated as the average value of accuracies

obtained from all repetitions. Classification accuracies calculated for all tibia

parts are shown in Fig. 6. The TB texture located immediately under the medial

compartment provided the strongest signal for knee OA detection (classification accuracy of 77.1%).
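
The repeated random-split evaluation used for each tibia part can be sketched as follows. As before, `dist` is a hypothetical precomputed dissimilarity matrix standing in for SDM values, and the random number generator and seed are illustrative choices that are not taken from this study.

```python
import random


def stratified_split(labels, train_per_class, rng):
    """Draw train_per_class indices per class at random; the rest are test indices."""
    train = []
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        train += rng.sample(members, train_per_class)
    chosen = set(train)
    test = [i for i in range(len(labels)) if i not in chosen]
    return train, test


def repeated_holdout_nn(dist, labels, train_per_class=30, repeats=300, seed=0):
    """Average 1-NN accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        train, test = stratified_split(labels, train_per_class, rng)
        correct = 0
        for i in test:
            # Nearest training image decides the class of test image i.
            nearest = min(train, key=lambda j: dist[i][j])
            correct += labels[nearest] == labels[i]
        accuracies.append(correct / len(test))
    return sum(accuracies) / repeats
```

With 34 healthy and 57 OA images, each split yields 60 training and 31 test images, and the returned value is the average accuracy over the 300 repetitions.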

3.5 Computational times

The computational time required by the SDM to calculate the roughness and

orientation signatures for a single TB texture image was 1.8 s. The calculation

of the EMD between the two pairs of signatures took an additional 0.2 s. The

SDM was implemented in MATLAB using the Parallel Computing Toolbox and the

computational times were measured on an Intel Core 2 Quad computer with 4 GB

of RAM and 2.66 GHz clock. At this stage, the MATLAB code developed is not

available.
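
Since the MATLAB code is not available, the EMD step can only be illustrated for the simplest case. The sketch below assumes two one-dimensional histograms defined on the same equally spaced bins, normalized to equal total weight, with the absolute bin difference as the ground distance; in that case the EMD reduces to the summed absolute difference of the cumulative weights. The signatures and ground distances actually used by the SDM may differ (for example, orientation is a circular quantity), so this is not the implementation timed above.

```python
from itertools import accumulate


def emd_1d(weights_p, weights_q, bin_width=1.0):
    """EMD between two 1-D histograms on identical, equally spaced bins."""
    if len(weights_p) != len(weights_q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(weights_p), sum(weights_q)
    # Normalize so that both histograms carry a total weight of 1.
    p = [w / total_p for w in weights_p]
    q = [w / total_q for w in weights_q]
    # Running surplus of weight that still has to be moved to the next bin.
    surplus = accumulate(a - b for a, b in zip(p, q))
    return bin_width * sum(abs(s) for s in surplus)
```

For instance, moving all weight from the first to the third of three bins costs two bin widths, so `emd_1d([1, 0, 0], [0, 0, 1])` equals 2.0.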

4. Discussion and conclusion

For the classification of TB texture images, a new dissimilarity measure, i.e.,

the SDM, was developed. Unlike feature extraction methods, the SDM does not

produce texture parameters. For the measure, special roughness and orientation

signatures that are invariant to brightness shift, rotation, and scale changes in a

predefined range were developed.

The accuracy of the dissimilarity measure in texture classification was

investigated using Brodatz images. Lower accuracies of the SDM system at

16×16 and 30×30 pixels can be explained by the fact that the gradient and the

Laplacian differential operators use more extrapolated pixel values lying outside

the image boundary than the LBP operator. This problem could be overcome by

eliminating higher scales from the scale range used by the SDM. However, the

elimination of higher scales could limit the scale invariance of the SDM and this



could lead to high classification errors. The other way to overcome this problem

is by increasing training image sizes to at least 60×60 pixels.

The invariance of the SDM to noise, blur, changes in anisotropy direction, and

image size was evaluated using fractal texture images. It was found that the

SDM is invariant to Gaussian noise and Poisson noise with their contribution in

the image up to 25%. This is because changes in pixel values due to noise do

not affect image anisotropy and spatial arrangements of blob regions. Results

obtained showed that the SDM is sensitive to the blur generated by regular

film screen. The reason for this is that the blur suppresses texture details in the

image. This affects image roughness and, subsequently, the roughness signature

changes. The problem could be solved by eliminating lower scales from the scale

range used by the SDM. However, this would limit the scale invariance of the

dissimilarity measure. Therefore, for the best performance of the SDM, sharp film

screens should be used.

The SDM was invariant to changes in image anisotropy except for angles between

30° and 60°. For these angles, the texture roughness is most affected by the pixel

interpolation used to represent anisotropy in directions other than horizontal and

vertical. This effect could be reduced by increasing the image resolution. It was

observed that the SDM is sensitive to the reduction in image size from 128×128 to

64×64 pixels. The reason is that in smaller images, the percentage of extrapolated

pixels used by the differential operators is higher than in larger images. The

other reason is that smaller images contain less detailed information about texture

roughness. The problem could be overcome by using image sizes larger than

64×64 pixels.

The invariance of the SDM to exposure, magnification and projection angle was

investigated using the tibia head database. Results obtained showed that the

SDM is invariant to exposure (2.5–30 mA s) and magnification (×1.00–×1.35).



Exposure causes changes in the width and position of image histogram [48]. The

SDM is insensitive to these changes because the roughness measure $R_{\mathrm{norm}}(x, y)$ is

normalized and the differential operators used are invariant to brightness shift.

The invariance to magnification was achieved because the differential operators

are based on the scale-space theory. It was found that projection angles >5° affect

the measure. This is because the front view of bone texture is obstructed at higher

angles. Therefore, all precautions should be taken to ensure that radiographs are

taken at similar projection angles.

The performances of the SDM based classification systems in knee OA detection

were evaluated using classes of healthy and OA knee images. For the SDM-NN

system, the classification accuracy of 78.8% can be considered as relatively good

when compared against the other systems, but it is lower than the detection

rate of radiographic OA obtained by an experienced human reader (∼100%).

The values of specificity and sensitivity were similar, which indicates that the

SDM has a good detection trade-off between healthy and OA classes. However,

the sensitivity value (76.8%) was slightly lower than the specificity value (80.9%),

which is usually undesirable in medical applications. In this case, a subject

with OA is more likely to be classified as healthy than the other way

around. Examples of misclassified TB texture images are shown in Fig. 7. In

these examples the bone structures, especially trabeculae orientations, are similar

to those found in images from the opposite class (Figs. 5 and 7). The SDM-SVM

system achieved the best overall classification accuracy (85.4%). One possible

reason could be that the SVM classifier can account for possible nonlinear

decision boundaries between healthy and OA classes. The other reason could

be that the classifier works well in the presence of outliers or class overlapping

near the true decision boundary. The values of sensitivity (87.0%) and specificity

(83.8%) are slightly unbalanced in favour of the knee OA detection rate, which is

preferable. However, the classification accuracy of the SDM-SVM system is

still lower than the detection rate obtained by an experienced human reader.



The accuracies of these SDM based systems could be improved by adjusting

parameters of the SDM and using multiclassifiers.

The effect of different parts of the tibia on the performance of the SDM-NN

system was studied. It was found that the strongest signal related to OA detection

is provided by TB texture located immediately under the medial compartment

(77.1%). One possible reason is that most OA progression occurs medially due

to greater biomechanical load on this side of the knee [1]. This is supported by

the previous studies conducted on fractal analyses [4, 6]. The second strongest

signal was found for the part of the tibia located under the lateral compartment.

However, the classification accuracy calculated for this part was relatively low

(70.4%). The reason could be that for 57 OA knees used in the analysis there were

49 and 34 knees with radiographic OA in the medial and lateral compartments,

respectively. Therefore, the alterations in lateral tibial bone structure due to OA

were represented by fewer TB textures than in the medial case. It can be seen from

Fig. 6 that there is a large overlap between parts of the tibia with the strongest

signals related to OA detection and those selected by the automated method.

There were two TB texture images extracted from most knee radiographs. Since

image properties such as magnification, orientation, and contrast are similar

for TB texture images extracted from the same radiograph, this could lead to

overoptimistic results. However, the effects of these properties on the SDM

are minimized because the measure is scale and rotation invariant, and the

roughness and orientation signatures are normalized. It was found that for

two out of 108 images that were correctly classified by the SDM-NN system,

their nearest neighbours were images extracted from the same radiograph,

respectively. Classification accuracy obtained using the classifier based on the

second nearest neighbour for the two images was 78.8%, i.e., the same as for the

original setup. Since all classifiers are based on the concept of similarity between

objects taken from the same class, one may conclude that the performances of



systems evaluated were not optimistically biased.

Individual contributions of the roughness and orientation signatures to the

performance of the SDM based classification system can provide information

about TB texture changes related to OA. The contributions were calculated in

the following manner: First, for each TB texture image, the distance to its nearest

neighbour was decomposed into the distances associated with the roughness and

orientation, respectively. Then, each distance was normalized in such a way

that their sum was equal to 1. For the correctly classified texture images the

normalized distances were subtracted from 1. Finally, all normalized distances

were averaged. The averaged values obtained for the roughness and orientation

signatures were 0.56 and 0.44, respectively. This suggests that OA affects the roughness of

TB texture more than its orientation.
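
The averaging procedure described above can be written out as a short sketch. It assumes that the distance to the nearest neighbour is the sum of a roughness part and an orientation part and that this sum is positive; the function name and the input format are hypothetical.

```python
def signature_contributions(nn_distances):
    """Average contributions of the roughness and orientation signatures.

    nn_distances holds one tuple per image:
    (roughness distance, orientation distance, correctly classified?),
    where both distances refer to the nearest neighbour of that image.
    """
    sum_rough = 0.0
    sum_orient = 0.0
    count = 0
    for d_rough, d_orient, correct in nn_distances:
        total = d_rough + d_orient
        # Normalize the two distances so that they sum to 1.
        rough, orient = d_rough / total, d_orient / total
        if correct:
            # For correctly classified images, subtract the shares from 1.
            rough, orient = 1.0 - rough, 1.0 - orient
        sum_rough += rough
        sum_orient += orient
        count += 1
    return sum_rough / count, sum_orient / count
```

Because the two normalized values of every image still sum to 1 after the subtraction, the two averages also sum to 1, which agrees with the reported values of 0.56 and 0.44.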

The SDM based system outperformed the WND-CHARM classification system

in the knee OA detection. One possible reason for the worse performance of the

WND-CHARM system (64.2%) is that the features used are not scale invariant.

The other reason is that the features are sensitive to angular space quantization

(e.g., directional Radon transform and Haralick features) and to image rotation

(e.g., Tamura features). Thus, even a small change in magnification or rotation

of TB texture image affects the feature values. Another possible reason could

be that the WND-CHARM system was not specifically designed for TB texture

analysis. However, it works well for the whole area of knee joints [21]. The value

of sensitivity obtained for the WND-CHARM system was 58.0%. This suggests

that the classification of knees with radiographic OA is almost random.

The SDM based systems achieved better overall classification accuracies than

the Tamura system. Possible reasons are that the Tamura features are sensitive

to image scale and orientation, and they correlate well with human visual

perception while trabeculae patterns in x-ray images are difficult to classify by



human readers. The sensitivity value obtained for the Tamura system was 50.7%

which suggests that the classification of OA knees is random.

The results obtained indicate the potential of SDM in decision support systems

for the detection and prediction of knee OA based on TB texture analysis. The

measure could also be useful in the classification of other medical images. This

includes, for example, classification of breast lesions using ultrasound images

[49], diagnosis of interstitial lung disease based on chest radiography [50, 51],

classification of dermoscopic images for skin lesions [52], and classification of

brain tumours from MRI [53] after extending the SDM to three dimensions.

Future research will focus on refining SDM based classification systems, e.g., by

using multiclassifiers, and evaluating the systems using large databases in clinical

settings.

Acknowledgements

Financial support from the School of Mechanical Engineering, University of

Western Australia is greatly appreciated.



5. References

[1] E L Radin and R M Rose. “Role of Subchondral Bone in the Initiation

and Progression of Cartilage Damage”. In: Clinical Orthopaedics and Related

Research 213 (1986), pp. 34–40.










Cartilage 13 (2005), pp. 463–470.



(2004), S10–S19.




and Cartilage 16 (2008), pp. 323–329.

[7] H Defossez, R M Hall, P G Walker, B M Wroblewski, P D Siney, and B

Purbach. “Determination of the trabecular bone direction from digitised

radiographs”. In: Medical Engineering & Physics 25 (2003), pp. 719–729.

[8] E Lespessailles, C Chappard, N Bonnet, and C L Benhamou. “Imaging

techniques for evaluating bone microarchitecture”. In: Joint Bone Spine 73

(2006), pp. 254–261.



[9] P Dieppe, J Cushnaghan, P Young, and J Kirwan. “Prediction of the

progression of joint space narrowing in osteoarthritis of the knee by bone

scintigraphy”. In: Annals of the Rheumatic Diseases 52 (1993), pp. 557–563.




Research 15 (2000), pp. 691–699.


“Estimation of the 3D self similarity parameter of trabecular bone from its


[12] L Pothuaud, P Carceller, and D Hans. “Correlations between grey level

variations in 2D projection images (TBS) and 3D microarchitecture:


Bone 42 (2008), pp. 775–787.





[14] J A Lynch, D J Hawkes, and J C Buckland Wright. “Analysis of texture in

macroradiographs of osteoarthritic knees using the fractal signature”. In:

Physics in Medicine and Biology 36 (1991), pp. 709–722.

[15] J C Buckland Wright, J A Lynch, and D G Macfarlane. “Fractal signature

analysis measures cancellous bone organisation in macroradiographs of

patients with knee osteoarthritis”. In: Annals of the Rheumatic Diseases 55

(1996), pp. 749–755.

[16] E A Messent, J C Buckland Wright, and G M Blake. “Fractal analysis of

trabecular bone in knee osteoarthritis (OA) is a more sensitive marker

of disease status than bone mineral density (BMD)”. In: Calcified Tissue

International 76 (2005), pp. 419–425.





pp. 460–474.

[18] N L Fazzalari and I H Parkinson. “Fractal dimension and architecture of

trabecular bone”. In: Journal of Pathology 178 (1996), pp. 100–105.

[19] I H Parkinson and N L Fazzalari. “Methodological principles for

fractal analysis of trabecular bone”. In: Journal of Microscopy 198 (2000),

pp. 134–142.

[20] M Wolski, P Podsiadlo, and G W Stachowiak. “Directional fractal signature

analysis of trabecular bone: evaluation of different methods to detect

early osteoarthritis in knee radiographs”. In: Proceedings of the Institution

of Mechanical Engineers - Part H: Journal of Engineering in Medicine 223 (2009),

pp. 211–236.

[21] L Shamir, S M Ling, W W Scott, Jr., A Bos, N Orlov, T J Macura, D M Eckley,

L Ferrucci, and I G Goldberg. “Knee x-ray image analysis method for

automated detection of osteoarthritis”. In: IEEE Transactions on Biomedical

Engineering 56 (2009), pp. 407–415.




[23] J H Kellgren and J S Lawrence. “Radiological assessment of osteoarthrosis”.

In: Annals of the Rheumatic Diseases 16 (1957), pp. 494–502.

[24] R P W Duin, D de Ridder, and D M J Tax. “Experiments with a featureless

approach to pattern recognition”. In: Pattern Recognition Letters 18 (1997),

pp. 1159–1166.

[25] E Pekalska, P Paclik, and R P W Duin. “A generalized kernel approach to

dissimilarity based classification”. In: Journal of Machine Learning Research 2

(2001), pp. 175–211.



[26] E Pekalska and R P W Duin. “Dissimilarity representations allow

for building good classifiers”. In: Pattern Recognition Letters 23 (2002),

pp. 943–956.

[27] H Jegou, C Schmid, H Harzallah, and J Verbeek. “Accurate Image Search

Using the Contextual Dissimilarity Measure”. In: IEEE Transactions on

Pattern Analysis and Machine Intelligence 32 (2010), pp. 2–11.

[28] N Sudha and Y H K Wong. “Hausdorff distance for iris recognition”. In:

22nd IEEE International Symposium on Intelligent Control. 2007, pp. 614–619.

[29] W Zhu, T Jiang, and X Li. “Local region based medical image segmentation

using J divergence measures”. In: 27th Annual International Conference of the

Engineering in Medicine and Biology Society. 2005, pp. 7174–7177.

[30] Y Rubner, C Tomasi, and L J Guibas. “The Earth Mover's Distance as a

metric for image retrieval”. In: International Journal of Computer Vision 40

(2000), pp. 99–121.

[31] T Lindeberg. “Scale-space for discrete signals”. In: IEEE Transactions on


[32] J Babaud, A P Witkin, M Baudin, and R O Duda. “Uniqueness of the

Gaussian kernel for scale-space filtering”. In: IEEE Transactions on Pattern

Analysis and Machine Intelligence PAMI 8 (1986), pp. 26–33.

[33] H Tamura, S Mori, and T Yamawaki. “Textural features corresponding to

visual perception”. In: IEEE Transactions on Systems, Man and Cybernetics

SMC-8 (1978), pp. 460–473.

[34] M Wei-Ying and Z H Jiang. “Benchmarking of image features for

content-based retrieval”. In: Conference Record of the Thirty-Second Asilomar

Conference on Signals, Systems & Computers. 1998, pp. 253–257.

[35] V Castelli and L D Bergman. Image Databases: Search and Retrieval of Digital

Imagery. John Wiley & Sons, Inc., New York, 2002.



[36] P Brodatz. Textures: A Photographic Album for Artists and Designers. Dover

Publications, New York, 1966.

[37] T Ojala, M Pietikainen, and T Maenpaa. “Multiresolution gray-scale and

rotation invariant texture classification with local binary patterns”. In:


pp. 971–987.

[38] T Randen and J H Husoy. “Filtering for texture classification: A

comparative study”. In: IEEE Transactions on Pattern Analysis and Machine

Intelligence 21 (1999), pp. 291–310.

[39] J Zhang, M Marszalek, S Lazebnik, and C Schmid. “Local features and

kernels for classification of texture and object categories: a comprehensive

study”. In: International Journal of Computer Vision 73 (2007), pp. 213–238.

[40] J C Russ. Fractal surfaces. Plenum Press, New York, 1994.

[41] A G Haus and S M Jaskulski. The Basics of Film Processing in Medical Imaging.

Medical Physics Publishing, Madison, 1997.

[42] J A Bencomo and B G Fallone. “A logit model for the modulation transfer

function of screen film systems”. In: Medical Physics 13 (1986), pp. 857–860.

[43] H B Mann and D R Whitney. “On a test of whether one of two

random variables is stochastically larger than the other”. In: The Annals of

Mathematical Statistics 18 (1947), pp. 50–60.

[44] P Podsiadlo and G W Stachowiak. “A rig for acquisition of standardized

trabecular bone radiographs”. In: Acta Radiologica 43 (2002), pp. 101–103.

[45] R D Altman, M Hochberg, W A Murphy, Jr., F Wolfe, and M

Lequesne. “Atlas of individual radiographic features in osteoarthritis”. In:

Osteoarthritis and Cartilage 3 (1995), A3–A70.

[46] P Podsiadlo, M Wolski, and G W Stachowiak. “Automated selection of

trabecular bone regions in knee radiographs”. In: Medical Physics 35 (2008),

pp. 1870–1883.






[48] J C Russ. The Image Processing Handbook, 4th ed. CRC Press, Florida, 2002.




pp. 280–298.



(2009), pp. 226–230.




Radiology 12 (2005), pp. 337–346.



[53] P Georgiadis, D Cavouras, I Kalatzis, D Glotsos, E Athanasiadis,

S Kostopoulos, K Sifaki, M Malamas, G Nikiforidis, and E Solomou.

“Enhancing the discrimination accuracy between metastases, gliomas and

meningiomas on brain MRI by volumetric textural features and ensemble

pattern recognition methods”. In: Magnetic Resonance Imaging 27 (2009),

pp. 120–130.



6. List of figures

Figure 1: Texture images and their respective roughness signatures: [(a) and (b)] a reptile skin texture taken from the Brodatz album; [(c) and (d)] an isotropic fractal texture image with FD 2.9. In the signature plots, the horizontal axis is the roughness measure Rnorm(x, y) and the vertical axis is the bin weights; MR = -0.32 for the reptile skin texture and MR = 0.05 for the fractal texture.

48


Figure 2: Steps required for the selection of edge angles in a texture image: (a) a trabecular bone texture image; (b) and (c) minimum (bright patches represent hills) and maximum (bright patches represent valleys) values of the Laplacian operator Diff2 calculated for the bone texture image; (d) a binary image of the minimum and maximum values of Diff2 (white regions represent minima; black regions represent maxima); (e) edges represented as a perimeter of the binary image; and (f) the bone texture image with superimposed edges. Edge angles are selected at the locations of edges.


Figure 3: Brodatz texture images and their respective orientation signatures: [(a), (b), (c), and (d)] leather texture images rotated at 70° and 120°; [(e) and (f)] a weave texture image.


Figure 4: Examples of TB texture regions extracted from the medial and lateral compartments of the tibia.


Figure 5: Examples of TB texture images taken from [(a) and (b)] healthy and [(c) and (d)] OA classes.


Figure 6: Classification accuracies (%) calculated for different parts of the tibia head. Two areas corresponding to the highest classification accuracies are highlighted. Square regions with a white boundary were selected by the automated method.


Figure 7: Examples of misclassified images of [(a) and (b)] healthy and [(c) and (d)] OA TB textures.


7. List of tables

Table 1: Classification accuracies of Brodatz textures calculated using four training angles.

Size of training images    Accuracy (%)
                           SDM      LBP
16×16                      87.35    97.91
30×30                      94.79    98.21
60×60                      98.80    98.21
90×90                      98.95    98.06
180×180                    99.10    98.06


Table 2: Classification accuracies of Brodatz textures calculated using a single training angle.

                Size of training images / accuracy (%)
Training        SDM                                       LBP
angle (deg)     16×16  30×30  60×60  90×90  180×180       16×16  30×30  60×60  90×90  180×180
0               78.07  88.78  91.96  90.47  93.45         94.54  94.54  95.13  95.03  94.84
20              83.03  96.32  99.10  99.00  99.10         96.72  98.01  97.91  98.01  98.01
30              85.81  97.32  99.10  99.10  99.10         97.71  98.11  98.01  98.11  98.01
45              86.30  96.03  98.80  99.10  99.10         97.51  97.42  97.51  97.61  97.61
60              86.30  94.44  97.42  97.32  99.10         97.61  97.51  97.71  97.91  97.91
70              84.53  94.34  95.63  96.62  98.61         98.01  98.01  98.01  98.01  98.01
90              77.18  91.46  98.01  98.71  98.71         93.15  92.26  92.85  93.05  93.25
120             83.43  98.21  99.10  99.10  99.10         98.11  97.91  98.01  98.01  98.11
135             83.92  96.82  98.80  99.10  99.10         97.81  97.32  97.42  97.61  97.61
150             88.19  94.34  96.42  96.92  98.61         98.21  98.01  98.11  98.11  98.11
Average         83.68  94.81  97.44  97.54  98.40         96.94  96.91  97.07  97.15  97.15


Table 3: Mean (99% CI) values of the SDM calculated for different image sizes; 20 images were used for each image size.

Image size                    Mean (99% CI)
64×64                         0.68 (0.03)∗
128×128 (reference group)     0.58 (0.06)
192×192                       0.52 (0.02)
256×256                       0.53 (0.03)

∗ Statistically significant differences (P<0.01).


Table 4: Mean (99% CI) values of the SDM calculated for different anisotropy directions; ten images were used for each anisotropy direction.

Anisotropy direction (deg)    Mean (99% CI)
0 (reference group)           0.58 (0.08)
6                             0.54 (0.03)
12                            0.62 (0.04)
18                            0.60 (0.04)
24                            0.60 (0.04)
30                            0.70 (0.05)∗
36                            0.79 (0.06)∗
42                            0.84 (0.06)∗
48                            0.78 (0.06)∗
54                            0.80 (0.06)∗
60                            0.80 (0.06)∗
66                            0.60 (0.05)
72                            0.58 (0.04)
78                            0.59 (0.04)
84                            0.54 (0.03)
90                            0.55 (0.04)

∗ Statistically significant differences (P<0.01).


Table 5: Mean (99% CI) values of the SDM calculated for different noise contributions; no statistically significant differences were found; 20 images were used for each noise type.

Contribution of noise (%)    Mean (99% CI), Poisson noise    Mean (99% CI), Gaussian noise
0 (reference group)          0.66 (0.05)                     0.66 (0.05)
5                            0.64 (0.03)                     0.64 (0.03)
10                           0.64 (0.03)                     0.64 (0.03)
15                           0.64 (0.03)                     0.64 (0.03)
20                           0.64 (0.03)                     0.64 (0.03)
25                           0.64 (0.03)                     0.64 (0.03)


Table 6: Mean (99% CI) values of the SDM calculated for different film screens; 20 images were used for each film screen.

Film screen                    Mean (99% CI)
No screen (reference group)    0.60 (0.04)
Fine                           0.62 (0.02)
Regular                        0.64 (0.02)∗

∗ Statistically significant differences (P<0.01).


Table 7: Mean (99% CI) values of the SDM calculated for different exposures; no statistically significant differences were found; 25 images were used for each exposure.

Exposure value (mA s)    Mean (99% CI)
2.5 (reference group)    0.71 (0.07)
5                        0.70 (0.03)
7.5                      0.69 (0.03)
10                       0.69 (0.03)
15                       0.65 (0.03)
24                       0.63 (0.03)
30                       0.67 (0.03)


Table 8: Mean (99% CI) values of the SDM calculated for different magnifications; no statistically significant differences were found; 25 images were used for each magnification.

Magnification              Mean (99% CI)
×1.00 (reference group)    0.67 (0.06)
×1.13                      0.64 (0.03)
×1.23                      0.64 (0.03)
×1.35                      0.70 (0.03)


Table 9: Mean (99% CI) values of the SDM calculated for different projection angles; 25 images were used for each angle.

Projection angle (deg)    Mean (99% CI)
0 (reference group)       0.53 (0.03)
5                         0.54 (0.01)
10                        0.58 (0.02)∗
15                        0.71 (0.03)∗

∗ Statistically significant differences (P<0.01).


Table 10: Details of healthy and OA classes.

Class      No. of TB texture images    Mean (SD) age in years    Mean (SD) BMI in kg/m2    Men/Women
Healthy    68                          40.8 (7.4)                24.6 (4.5)                0.70
OA         69                          44.5 (7.5)                27.1 (4.0)                0.69


Table 11: Confusion matrices of SDM-NN, SDM-SVM, WND-CHARM, and Tamura systems for knee OA detection.

           SDM-NN           SDM-SVM          WND-CHARM        Tamura
           Healthy  OA      Healthy  OA      Healthy  OA      Healthy  OA
Healthy    55       13      57       11      48       20      58       10
OA         16       53      9        60      29       40      34       35
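As a reading aid, the accuracy, sensitivity, and specificity implied by Table 11 can be recomputed from the four confusion matrices. The sketch below assumes that rows are true classes and columns are assigned classes (the row totals, 68 and 69, match the class sizes in Table 10) and treats OA as the positive class; the names CONFUSION and summarize are illustrative only.

```python
import numpy as np

# Confusion matrices copied from Table 11.
# Assumption: rows = true class (healthy, OA), columns = assigned class (healthy, OA).
CONFUSION = {
    "SDM-NN": np.array([[55, 13], [16, 53]]),
    "SDM-SVM": np.array([[57, 11], [9, 60]]),
    "WND-CHARM": np.array([[48, 20], [29, 40]]),
    "Tamura": np.array([[58, 10], [34, 35]]),
}


def summarize(cm):
    """Return accuracy, sensitivity and specificity with OA as the positive class."""
    tn, fp = cm[0]  # true healthy: assigned healthy, assigned OA
    fn, tp = cm[1]  # true OA: assigned healthy, assigned OA
    return {
        "accuracy": (tn + tp) / cm.sum(),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    for name, cm in CONFUSION.items():
        stats = summarize(cm)
        print(name, {key: round(100 * value, 2) for key, value in stats.items()})
```

Under these assumptions the overall accuracy is simply the diagonal sum divided by the 137 images.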


CHAPTER 3

PREDICTION OF PROGRESSION OF RADIOGRAPHIC

KNEE OSTEOARTHRITIS USING TIBIAL

TRABECULAR BONE TEXTURE


Marek Kurzynski, PhD2, L Stefan Lohmander, MD, PhD3,4, and Martin

Englund, MD, PhD3,5




Poland

3Department of Orthopedics, Clinical Sciences, Lund University, Sweden

4Research Unit for Musculoskeletal Function and Physiotherapy, Institute of

Sports Science and Clinical Biomechanics, and Department Orthopaedics and

Traumatology, University of Southern Denmark, Denmark

5Clinical Epidemiology Research & Training Unit, Boston University School of

Medicine, MA, USA

Arthritis & Rheumatism 2012;64;688–695


Abstract

Objective. To develop a system for prediction of progression of radiographic

knee osteoarthritis (OA) using tibial trabecular bone (TB) texture.

Methods. We studied 203 knees with (n=68) or without (n=135) radiographic

tibiofemoral OA in 105 subjects (90 men, 15 women, mean age 54 years) who had

2 sets of knee radiographs taken 4 years apart. We determined medial and lateral

compartment tibial TB texture using an automated region selection method.

Three texture parameters were calculated: roughness, degree of anisotropy, and

direction of anisotropy based on a signature dissimilarity measure method. We

evaluated tibiofemoral OA progression using a radiographic semi-quantitative

outcome: an increase in the medial joint space narrowing (JSN) grade. We

examined the predictive ability of TB texture in knees with and without

pre-existing radiographic OA, with adjustment for age, sex, and body mass

index using logistic regression (generalized estimating equations) and receiver

operating characteristic curves.

Results. The prediction of increased medial JSN in knees with or without

pre-existing radiographic OA was the most accurate for medial TB texture; area

under the curve (AUC) was 0.77 and 0.75, respectively. For lateral TB texture,

AUC was 0.71 and 0.72, respectively.

Conclusion. We have developed a system, based on analysing tibial TB texture,

which yields good prediction of loss of tibiofemoral joint space. The predictive

ability of the system needs to be further validated.

Key words: osteoarthritis, radiography



1. Introduction

Osteoarthritis (OA) is the most common knee joint disease and the leading cause

of knee pain and functional disability in adults [1]. On a structural level, knee

OA is characterized by loss of articular cartilage, meniscal tears and maceration,

osteophytes, and microstructural changes in subchondral bone [2–5]. Previous

studies have suggested that bone changes occur before cartilage defects [6]. The

two-dimensional trabecular bone (TB) texture provided by plain radiography

contains information directly related to the three-dimensional bone structure

[7–10]. Because of these findings there is a growing interest in developing a

low-cost and non-invasive TB texture-based system for predicting progression of

early and late knee OA, i.e. a method that examines the size, shape, and orientation
of TB in order to foresee the risk of structural progression of OA.

In a previous study, structural features of x-ray images of the whole knee joint
predicted OA progression, defined as an increase from Kellgren and Lawrence
(KL) [11] grade 0 at baseline to grade 2 at follow-up 20 years later [12]. For
the prediction, a Weighted Neighbor Distance using Compound Hierarchy of
Algorithms Representing Morphology (WND-CHARM) classification system
was used and a classification accuracy of 62% was obtained. In a further study,
the TB texture was used to predict OA progression defined as an increase in
medial joint space narrowing (JSN) grade over a 3-year period [13]. This study
was done using a system based on a regression model and fractal signature
analysis (FSA). The system achieved a prediction accuracy (defined as the
area under the receiver operating characteristic curve, AUC) of 0.75 for OA
progression. Although results obtained from these two systems are promising, the

interpretation of bone texture changes is not easy. This is because the image

features extracted in the WND-CHARM system and the polynomial coefficients

used in the regression model and FSA based system have little or no physical

meaning. Also, the WND-CHARM system is sensitive to imaging conditions

such as magnification and rotation, while the box-counting technique used in

the calculation of FSA highly depends on trabecular marrow pore size and

signal-to-noise ratio [14].

In the present study, we used a well-defined cohort of subjects with prior

meniscectomy having weight-bearing knee radiographs taken 4 years apart.

We developed trabecular bone structure parameters based on a signature

dissimilarity measure (SDM) method [15] that quantifies roughness, degree of

anisotropy and direction of anisotropy of TB textures. These parameters are

invariant to a range of image magnification, exposure, noise, and blur. Unlike the

previous studies [12, 13], we evaluated progression of both early and late medial

compartment knee OA. This allowed for a thorough assessment of the influence

of changes in TB texture in the different stages of knee OA.

2. Subjects and methods

2.1 Subjects

The study was approved by the ethics committee of the Faculty of Medicine

at Lund University, Sweden and informed consent was obtained from all

participants. Subjects were retrospectively identified via surgical records to have

undergone isolated medial or lateral meniscectomy at Lund University Hospital

in 1983, 1984, or 1985 [16, 17]. Exclusion criteria included cruciate ligament

injury, previous knee surgery (i.e., knee surgery before the index meniscectomy),

meniscectomy in both knee compartments, osteochondritis dissecans, fracture in

or adjacent to the knee, septic arthritis, osteonecrosis, and radiographic signs of

knee OA at the time of meniscectomy [17].

Out of 519 identified subjects, 254 who did not meet any of the exclusion criteria

were invited to participate at the first radiographic follow-up examination (Exam

A) in 2000. Knee x-rays were taken from 155 subjects who were then invited to a
second radiographic examination (Exam B) in 2004. For 106 subjects, longitudinal
knee x-rays were obtained (Table 1). We excluded one subject due to bilateral
radiographic end-stage OA at Exam A (i.e. medial joint space narrowing
(JSN) grade 3, osteotomy or arthroplasty), and a further 6 knees (in 6 subjects)
were excluded for the same reason. One knee was also excluded due to
artefacts on the film that prevented image analysis, leaving 203 knees
in 105 subjects for analysis.

2.2 Acquisition and grading of knee radiographs

At Exam A, standing anteroposterior (AP) digital radiographs of the tibiofemoral

joint at about 15◦ flexion were obtained using a fluoroscopically positioned x-ray

beam [17]. At Exam B, posteroanterior digital radiographs of the tibiofemoral

joint were obtained using the fixed flexion (SynaFlexer) protocol [18, 19]. In a pilot

series prior to Exam B, we acquired knee radiographs from 10 subjects (20 knees)
on the same day using both protocols. On grading the radiographs (side-by-side
comparison), we detected some discrepancies but no statistical or systematic

differences between the pairs of knees with respect to semi-quantitative JSN

and osteophytes scoring according to the 1995 atlas of Osteoarthritis Research

Society International (OARSI) [20]. Since bone textures from radiographs at Exam

B were not used, having two different protocols did not affect subsequent image

analyses.

Two readers, who knew the time sequence but were blinded to
clinical data and to each other’s readings, graded the paired knee radiographs

from Exam A and B. They read for JSN and osteophytes in the tibiofemoral joint

on a four point scale (0 to 3, where 0 indicates no evidence of JSN or osteophytes)

according to the 1995 OARSI atlas [20]. Interobserver agreement using weighted

kappa was 0.84 for JSN and 0.72 for osteophytes. The films with discrepancy of

any JSN or osteophyte grade between the investigators were then adjudicated,

i.e. a consensus reading was made. At Exam B we classified one subject who

was operated on with proximal tibial valgus osteotomy in the left knee, between

examinations, as having JSN grade 3 in the medial compartment.
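The interobserver agreement reported above is a weighted kappa on the four-point (0 to 3) scale. The text does not state which weighting scheme was used, so the sketch below assumes linear disagreement weights by default and offers quadratic weights as an option; the function name and arguments are hypothetical.

```python
import numpy as np


def weighted_kappa(reader_1, reader_2, n_grades=4, weights="linear"):
    """Cohen's weighted kappa for two readers grading on 0..n_grades-1.

    Assumption: 'linear' weights |i - j| / (n_grades - 1); pass 'quadratic'
    to square them. The source does not say which scheme was used.
    """
    r1 = np.asarray(reader_1, dtype=int)
    r2 = np.asarray(reader_2, dtype=int)

    # Observed joint distribution of the two readers' grades.
    observed = np.zeros((n_grades, n_grades))
    np.add.at(observed, (r1, r2), 1.0)
    observed /= observed.sum()

    # Distribution expected if the readers graded independently.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    i, j = np.indices((n_grades, n_grades))
    disagreement = np.abs(i - j) / (n_grades - 1)
    if weights == "quadratic":
        disagreement = disagreement ** 2

    return 1.0 - (disagreement * observed).sum() / (disagreement * expected).sum()
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.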

2.3 Definitions

Radiographic knee OA

We defined a knee to have radiographic tibiofemoral OA if one or more of the

following criteria were fulfilled in either the medial or lateral compartment:

• JSN grade ≥2

• Sum of marginal osteophytes grades in the same tibiofemoral compartment

≥2

• JSN grade 1 and marginal osteophytes grade 1 in the same tibiofemoral

compartment

These criteria approximate grade 2 or worse on the KL scale.
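The three criteria above can be written as a small decision rule. This is a minimal sketch that assumes each compartment carries one JSN grade and two marginal osteophyte grades (femoral and tibial), which is our reading of "sum of marginal osteophytes grades"; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Compartment:
    jsn: int               # joint space narrowing grade, 0 to 3
    osteophyte_femur: int  # marginal osteophyte grade, 0 to 3 (assumed site)
    osteophyte_tibia: int  # marginal osteophyte grade, 0 to 3 (assumed site)


def compartment_meets_criteria(c: Compartment) -> bool:
    """Apply the three radiographic OA criteria to one compartment."""
    osteophyte_sum = c.osteophyte_femur + c.osteophyte_tibia
    return (
        c.jsn >= 2                                 # criterion 1
        or osteophyte_sum >= 2                     # criterion 2
        or (c.jsn == 1 and osteophyte_sum == 1)    # criterion 3
    )


def has_radiographic_oa(medial: Compartment, lateral: Compartment) -> bool:
    """A knee has radiographic tibiofemoral OA if either compartment qualifies."""
    return compartment_meets_criteria(medial) or compartment_meets_criteria(lateral)
```

For example, a knee with medial JSN grade 1 and a single grade 1 medial osteophyte is classified as OA, whereas JSN grade 1 with no osteophytes is not.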

Progression of medial compartment radiographic knee OA

Because lateral compartment OA is rare, we focused on medial compartment OA

only. We defined progression of medial compartment radiographic knee OA as

an increase in the medial compartment JSN grade.

First, we analysed all knees as one group irrespective of radiographic status

at Exam A. Second, we stratified the analysis for the absence or presence of

pre-existing radiographic tibiofemoral OA, as defined above. Hence, early and

late radiographic OA progressions in the medial compartment were evaluated

separately.

2.4 Trabecular bone image analysis

We used digital radiographs taken at Exam A (Phasix 60 generator, CGR, Liege,

Belgium). For the image analysis, the radiographs were converted from DICOM
to uncompressed TIFF format and stored as 8-bit gray-scale images with a
resolution of 146 µm per pixel. Previous studies showed that 8-bit images

contain sufficient details for evaluation of OA changes [13, 21]. Image analysis

was performed blinded to the outcome.

Region selection

An automated region selection method was used to determine the TB region

of interest (ROI) on the digital radiographs [22]. The method selects the ROI

on the subchondral bone immediately under the cortical plate of the medial

and lateral tibial compartment, respectively, in a series of steps (Fig. 1). The
steps include delineation of cortical bone plates using an active shape model and
fine ROI adjustment for fibula head, periarticular osteopenia, and subchondral
bone sclerosis. The landmarks used are the tibial borders, tibial spine, fibula head, and
cortical plates. Epiphyseal bone and physis are not considered in the selection of
ROIs. Each selected TB texture image was 112×112 pixels, covering
an area of 16.4×16.4 mm.

Bone texture parameters

We calculated three TB texture parameters, i.e. roughness, degree of anisotropy

and direction of anisotropy using the SDM method. In the method, a scale-space

representation of a bone texture image is generated as a set of images in which the

fine-scale features are successively smoothed. This is achieved by the convolution

of the bone image with Gaussian kernels of increasing width parameter (called
scale). Twenty-five scales ranging from 1 to 9 pixels were used, given by
$1.096^{n}$ pixels for $n = 0, \ldots, 24$. Given this representation, for each pixel the gradient (edge

detector) and Laplacian (smoothness detector) operators are calculated across

all scales and the extremum values of the operators are found. A difference

between the extremum values and an angle associated with the extremum

gradient value define roughness and orientation measures at each pixel location,

respectively. A normalized histogram of the roughness (orientation) measure,

called a roughness (orientation) signature, is then generated. The shape and

position of the roughness signature with respect to zero describe the roughness

of bone texture image. If the signature has the maximum value approximately

at zero this indicates that, on average, a neighbourhood of each pixel does not

resemble either smooth regions or edge patches. For a signature skewed toward
negative values, the neighbourhood of each pixel contains, on average, more blobs
(smooth regions) than edges. For a signature skewed toward positive values, the opposite holds. The above

procedure is repeated for all bone texture images.
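The scale-space step described above can be sketched in a few lines. The block assumes that the 25 scales are the Gaussian widths 1.096^n pixels (n = 0, ..., 24), which span roughly 1 to 9 pixels, and it uses a plain separable Gaussian filter with reflective borders. The function names, the 3-sigma kernel truncation, and the border handling are illustrative assumptions rather than details of the SDM implementation [15]; the per-pixel gradient and Laplacian extrema are not shown.

```python
import numpy as np


def sdm_scales(n_scales=25, base=1.096):
    """Geometric sequence of Gaussian widths: base**n for n = 0..n_scales-1."""
    return base ** np.arange(n_scales)


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma (assumed cut-off)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth(image, sigma):
    """Separable Gaussian smoothing with reflective padding."""
    kernel = gaussian_kernel_1d(sigma)
    pad = len(kernel) // 2
    padded = np.pad(np.asarray(image, dtype=float), pad, mode="reflect")
    # Filter along rows, then along columns; 'valid' removes the padding again.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def scale_space(image, scales):
    """Stack of progressively smoothed copies of the image, one per scale."""
    return np.stack([smooth(image, s) for s in scales])
```

For a 112×112 texture image, `scale_space(image, sdm_scales())` would return an array of shape (25, 112, 112) under these assumptions.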

Roughness (R1,R2)

For roughness (i.e. complexity of a bone texture) measurement Earth Mover’s

Distances (EMDs) [23] between all possible pairs of roughness signatures are

calculated. The EMDs represent an image distance space. We defined two

independent roughness parameters (R1, R2) as a projection of the distances

calculated between the roughness signatures of TB texture images on a

two-dimensional space. For the distance projection, Sammon’s nonlinear
mapping [24] was used. The space dimension was chosen as a trade-off between
avoiding the “curse of dimensionality” and being able to capture possible

nonlinear relations in the distance space. The parameters provide a measure of

the overall texture roughness. Higher roughness indicates that there are more

sharp-edged texture features (i.e. more thin and long trabeculae and more narrow

spaces in between them). The parameters were normalized in such a way that the

smoothest and roughest TB texture in this study corresponded to the values (0,

0) and (1, 1), respectively, for (R1, R2). Characterisation of TB texture roughness

provides valuable information about TB changes in OA [3, 25].
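For one-dimensional signatures defined on a common set of equally spaced bins and normalized to unit mass, the Earth Mover’s Distance reduces to the summed absolute difference of the cumulative histograms. The sketch below uses this special case to build the pairwise distance matrix; the subsequent Sammon projection to (R1, R2) is not shown. That the roughness signatures share bins is an assumption, and the names emd_1d and emd_matrix are illustrative.

```python
import numpy as np


def emd_1d(sig_a, sig_b, bin_width=1.0):
    """EMD between two histograms on the same equally spaced bins.

    Both inputs are rescaled to unit mass, so the distance is the L1 norm
    of the difference between their cumulative sums times the bin width.
    """
    a = np.asarray(sig_a, dtype=float)
    b = np.asarray(sig_b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    return bin_width * np.abs(np.cumsum(a - b)).sum()


def emd_matrix(signatures, bin_width=1.0):
    """Symmetric matrix of pairwise EMDs for a set of signatures (one per row)."""
    sigs = np.asarray(signatures, dtype=float)
    sigs = sigs / sigs.sum(axis=1, keepdims=True)
    cdfs = np.cumsum(sigs, axis=1)
    return bin_width * np.abs(cdfs[:, None, :] - cdfs[None, :, :]).sum(axis=2)
```

The resulting matrix is the image distance space that a nonlinear mapping would then project onto two dimensions.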

Degree of Anisotropy (DegA)

We defined the degree of anisotropy as the sum of squared bin weights in the

orientation signature, i.e.:

\[ \mathrm{DegA} = \sum_{\theta} \left[ S(\theta) \right]^{2}, \tag{1} \]

where S(θ) is the weight of the angle θ in the orientation signature. The DegA

parameter is a measure of the overall anisotropy of TB texture; the higher value

of the parameter, the higher is the degree of anisotropy (i.e. there are more

sharp-edged texture features aligned in the same direction). The parameter was

normalized in such a way that the values 0 and 1 for DegA represent the least and

most anisotropic TB texture. Anisotropy of TB texture changes with OA [21, 25].
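Equation (1) and the subsequent normalization can be illustrated as follows. The sketch assumes that the orientation signature is a histogram rescaled to unit sum and that the normalization to the range 0 to 1 is a min-max rescaling over all images in the study; the text does not spell out either detail, and the function names are hypothetical.

```python
import numpy as np


def degree_of_anisotropy(signature):
    """Sum of squared bin weights of a unit-sum orientation signature (Eq. 1)."""
    s = np.asarray(signature, dtype=float)
    s = s / s.sum()
    return float(np.sum(s ** 2))


def min_max_normalize(values):
    """Rescale values so the smallest maps to 0 and the largest to 1 (assumed scheme)."""
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min())
```

For a unit-sum signature with K bins, the raw value ranges from 1/K (all directions equally weighted) to 1 (all weight in a single direction), which is why larger values indicate stronger anisotropy.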

Direction of Anisotropy (DirA)

Direction of anisotropy was defined as the average value of normalized bin

centers in the orientation signature, i.e.

\[ \mathrm{DirA} = \frac{\sum_{\theta} \theta \cdot S(\theta)}{\sum_{\theta} S(\theta)}. \tag{2} \]

This parameter measures the weighted average direction of trabecular
alignment. Each weight S(θ) is proportional to the “sharpness” of the trabeculae
aligned along the angle θ, i.e. longer and thinner (shorter and thicker) trabeculae
have higher (lower) weights. This allows the overall direction of
bone texture to be quantified with a single number that depends on the sharpness and alignment
of trabeculae. DirA is equal to 0° for bone texture that has all trabeculae aligned
with the vertical direction of the image. OA changes in TB texture at different directions

are significant [21, 26].
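Equation (2) is a weighted mean of the bin centres of the orientation signature. The sketch below takes the bin centres in degrees measured from the vertical image direction, which is an assumption, and it applies the plain (non-circular) mean exactly as the formula is written; the function name is hypothetical.

```python
import numpy as np


def direction_of_anisotropy(theta_deg, signature):
    """Weighted average of bin centres theta with weights S(theta) (Eq. 2)."""
    theta = np.asarray(theta_deg, dtype=float)
    s = np.asarray(signature, dtype=float)
    return float(np.sum(theta * s) / np.sum(s))
```

Because the denominator is the total weight, the result is unchanged if the signature is multiplied by a constant, so normalized and unnormalized signatures give the same direction.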

2.5 Statistics

To evaluate predictive abilities of the texture parameters we used two binary

logistic regression models (Model 1 and Model 2). For both models the covariates

used were the texture parameters and their quadratic and first-order interaction

terms. Neither forward/backward parameter selection nor a subset of the parameters with
significant associations was used, i.e. all linear, quadratic and interaction
terms of the three texture parameters were always included in the models. In Model 2

we further adjusted for age, sex, and body mass index (BMI). Hosmer-Lemeshow

tests were used to assess goodness of fit. To account for correlation between

knees within the same subject, we estimated regression coefficients using type III

generalized estimating equations with an exchangeable working correlation matrix.

A prediction score for the progression of early and late medial compartment

radiographic OA was calculated for each knee. The score was an average value

of all covariates weighted by the regression coefficients.
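The covariate set described above can be made concrete with a small design-matrix builder. The sketch assumes that the texture covariates are the four numbers R1, R2, DegA and DirA and that "first-order interaction terms" means all pairwise products, giving 4 linear, 4 quadratic and 6 interaction columns; the text confirms only that terms such as R1∗R2 and DegA∗DirA were present. The score is written as the usual linear predictor, and the coefficient estimation itself (generalized estimating equations in SPSS) is not reproduced. All names are illustrative.

```python
import itertools

import numpy as np

PARAMS = ("R1", "R2", "DegA", "DirA")


def design_matrix(texture, names=PARAMS):
    """Linear, quadratic and pairwise interaction terms of the texture parameters."""
    x = np.asarray(texture, dtype=float)
    columns = [x[:, i] for i in range(len(names))]
    labels = list(names)
    for i, name in enumerate(names):
        columns.append(x[:, i] ** 2)
        labels.append(f"{name}^2")
    # Assumption: every pair of parameters contributes one interaction term.
    for i, j in itertools.combinations(range(len(names)), 2):
        columns.append(x[:, i] * x[:, j])
        labels.append(f"{names[i]}*{names[j]}")
    return np.column_stack(columns), labels


def prediction_score(design, coefficients, intercept=0.0):
    """Covariates weighted by regression coefficients (the linear predictor)."""
    return intercept + design @ np.asarray(coefficients, dtype=float)
```

Model 2 would append age, sex and BMI as further columns before the coefficients are estimated.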

We constructed the receiver operating characteristic (ROC) curves based on

the scores using a 10-fold cross-validation method [27]. The cross-validation was

repeated 300 times and the averaged ROC curves were calculated. The area under

the curve (AUC) was used as a measure of the overall performance of the model.

For the null model, the AUC is equal to 0.5. The two models were not optimized

for AUC since our aim was to develop models that can provide an accurate

prediction without being optimized for a particular performance measure. As

this is an exploratory study, we focused on identifying the models and terms that

are predictive of loss of tibiofemoral joint space and hence we did not correct

significant associations in the models for multiple testing.
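The evaluation procedure can be sketched as a rank-based AUC together with a generator of repeated 10-fold splits. The AUC here is the probability that a randomly chosen progressing knee scores higher than a randomly chosen non-progressing knee, with ties counted as one half, which equals the area under the empirical ROC curve. The splitting function assigns knees to folds at random without grouping knees of the same subject; the text does not say how this was handled, so that choice, like the function names, is an assumption.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    positives = scores[labels]
    negatives = scores[~labels]
    diff = positives[:, None] - negatives[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(positives) * len(negatives)))


def repeated_kfold(n_samples, n_folds=10, n_repeats=300, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random k-fold splits."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        order = rng.permutation(n_samples)
        for test_idx in np.array_split(order, n_folds):
            train_idx = np.setdiff1d(order, test_idx)
            yield train_idx, test_idx
```

A model that assigns the same score to every knee has an AUC of 0.5 under this definition, matching the null-model value quoted above.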

The statistical analysis was performed in SPSS software, release 16.0 (SPSS Inc.,

Chicago IL, USA).

3. Results

Radiographic characteristics

At Exam A, 68 knees (33%) in 51 subjects (49%) were classified as having

radiographic tibiofemoral OA (Table 1). Fifty-four knees (27%) in 41 subjects

(39%) had an increase in the medial compartment JSN grade from Exam A to B.



Associations with subject characteristics

Associations of the texture parameters of medial ROI with age, sex, BMI, and

medial JSN grade at Exam A were calculated using analysis of covariance. The

R1 and R2 parameters were both associated with age (P<0.01) and sex (P<0.01).

For both the DegA and DirA parameters, associations were found with BMI

(P<0.01) and medial JSN grade (P=0.02). For lateral ROI there were no significant

associations.

Prediction of OA progression

We studied all knees in the sample (n=203). The highest prediction accuracy

(AUC 0.77) was obtained using the medial ROI and Model 2 (Table 2). The

quadratic terms of the texture parameters R2, DegA and DirA, and the interaction

terms R1∗R2 and DegA∗DirA were significantly associated with increase in

medial JSN grade (Table 3). In this multivariable model with the highest

prediction accuracy, the covariates age (P=0.76), sex (P=0.57), and BMI (P=0.06)

were found not to be statistically significant. The ROC curves obtained for the

model are shown in Fig. 2.

Prediction of early OA progression

We studied knees (n=135) with KL grade equivalents ≤1 (calculated from JSN and

osteophytes grades) at Exam A. Model 2 constructed for the medial ROI achieved

the highest prediction accuracy (AUC 0.75, Table 2). Significant associations were

found for the interaction and quadratic terms of the DegA and DirA parameters.

Age (P=0.85), sex (P=0.89) and BMI (P=0.71) were not significant (Table 3).

Prediction of late OA progression

We studied knees (n=68) with KL grade equivalents ≥2 at Exam A. Once again the

highest prediction accuracy (AUC 0.77) was obtained using the medial ROI and

Model 2 (Table 2). The parameters R1, R2, their interaction term R1∗R2, sex, and

BMI were significantly associated with loss of medial compartment joint space.

Age (P=0.82) was found to be not statistically significant (Table 3).

The results were obtained for Models 1 and 2 that included linear, quadratic and

interaction terms. When not all terms were used, the AUC values ranged
from 0.54 to 0.60.

For a model based on age, sex and BMI, the AUC values were 0.58 (all knees), 0.52
(knees with early OA progression) and 0.66 (knees with late OA progression).
Adding medial JSN grade at Exam A to the model increased the AUC

values by 0.02, and further adding the texture parameters increased them to

0.75 (all knees), 0.74 (knees with early progression) and 0.77 (knees with late OA

progression).

4. Discussion

In this study we confirm that medial tibial TB texture is predictive of loss

of medial joint space in knees with existing OA [13]. Importantly, we extend

these findings to provide a detailed description of TB roughness and anisotropy

changes due to knee OA and to show for the first time that medial tibial TB texture

is also predictive of medial joint space loss in knees with KL grade equivalents

≤1 at baseline and that lateral tibial TB texture is predictive of medial joint space

loss.

We have several possible explanations for the prediction of loss of medial

compartment joint space by TB texture, and they are not mutually exclusive.

First, there may be an interaction between subchondral bone remodelling

and the modulation of cartilage catabolism [28]. Previous studies suggest that

an increase in vascularisation and remodelling rate of OA subchondral bone

promotes diffusion of cytokines, eicosanoids and growth factors into articular

cartilage, thus affecting chondrocytes and inducing cartilage degradation [29–32].

This could lead to the onset of secondary ossification and a decrease in

cartilage thickness [33]. Second, genetic factors may play a role in the abnormal

metabolism of subchondral bone, leading to morphological alterations and

subsequently to radiographic OA changes. For example, Wnt signaling

antagonists are involved in the proliferation, differentiation and mineralisation of

osteoblasts [34–37], and polymorphisms of their encoding genes were identified

in patients with knee [38] and hip [39] OA. Also, abnormal production of

IGF-1 growth factor in OA subchondral bone osteoblasts could increase bone

remodelling rate and stiffness, leading to cartilage matrix degradation [28].

Third, TB texture changes may indicate abnormal trabecular bone structure

due to unfavourable biomechanical loading, which may also adversely affect

the overlying joint cartilage. Previous studies showed that high systemic

bone mineral density (BMD) increases the risk of incident knee OA and JSN

[40]. Meniscal damage is associated with increased BMD in the ipsilateral

compartment [41], and with increased risk of the development of subchondral

bone marrow lesions [42].

For medial ROI and knees without pre-existing radiographic OA, the DegA∗DirA

term was positively associated with an increase in the medial JSN grade (Table 3).

This indicates an increase in the degree of anisotropy of TB texture and a shift

in the alignment of trabeculae towards the horizontal direction. The finding

is consistent with previous studies where the retention of horizontal trabeculae

within the medial subchondral region was observed [3, 43]. The bone architecture

may be reorganized in knees with early-stage OA as a result of trabeculae being

less well aligned to the main loading direction [44].

For knees with pre-existing OA, the R1 and R2 terms (medial ROI) were

negatively and positively associated with an increase in the medial JSN grade,

respectively (Table 3). This suggests a nonlinear change in the overall roughness.

The negative association of the R1 term could indicate a decrease in the overall

roughness of TB which can be associated with shorter and thicker trabeculae.

Previous studies found that during OA bone remodelling the TB thickness

increased, especially in the main loading direction [3–5], and the medial tibial

plateau bone area expanded [45]. The expansion of the bone area has been

associated with the thickening of subchondral trabeculae in knees with early

and moderate OA [46]. The positive association of the R2 term could indicate

an increase in the number of thinner trabeculae which can be attributed to

fenestration and thinning. It was suggested that the high roughness of TB texture

in late knee OA is caused by osteoporosis [43]. The DegA and DirA parameters

were found not to be statistically significant, while female gender and high BMI

were associated with increased risk of progression.

We also found that the lateral compartment tibial TB texture predicts medial

joint space loss. A possible explanation is that medial compartment OA is

often associated with relative unloading of the lateral compartment inducing

bone resorption [47]. Further, medial cartilage thickness correlates with lateral

apparent trabecular number, thickness and separation [47, 48].

Our study has several important limitations. First,
two different radiographic protocols were used, although both were
weight-bearing with the knee in about the same degree of flexion. A pilot
sample examined with both protocols prior to Exam B did not suggest any systematic
effects on the semi-quantitative scoring of JSN and osteophytes. Second, all
the subjects had undergone prior partial meniscectomy in at least one knee and it is
unclear whether prediction would be different for a cohort without meniscal
surgery. Third, the sample available for lateral compartment OA was too
small and hence was not analysed. Fourth, for the prediction of OA we used

discrete medial JSN and osteophyte grades whereas the nature of knee OA is

continuous. This could affect our results since the grading of OA features is

subject to the reader’s interpretation and the grades may not be linear with

respect to the actual progression of knee OA [49]. A further limitation is that
we did not analyse x-ray images at Exam A for subjects lost to follow-up (32% of
subjects). Hence, we cannot estimate whether their texture parameters differ from those of
subjects who completed Exam B. Finally, the texture parameters do not provide

information about bone texture changes at individual scales and directions as

fractal signatures do. However, they are able to quantify texture changes at each

pixel location over all scales. Further studies with large databases of knee images

are required to evaluate the full potential of different approaches in the prediction

of OA.

In conclusion, the system for the automated analysis of TB texture showed

promising results in the prediction of loss of tibiofemoral joint space. In

particular, we showed that the texture parameters markedly improve the model
for prediction of joint space loss based on age, sex, BMI, and JSN grade, and that a

good prediction of medial joint space loss in knees with early OA progression (i.e.

KL grade equivalents ≤1 at baseline) can be obtained. The prediction accuracy

of this system needs to be further validated using large databases of knee images

from other populations.

Acknowledgements

The study was supported by grants from the School of Mechanical and Chemical

Engineering, University of Western Australia, the Swedish Research Council,

and Lund University Medical Faculty. Drs Englund and Lohmander are funded

by the Swedish Research Council, the Greta and Johan Kock Foundation, King

Gustaf V 80-year Birthday Foundation, and the Faculty of Medicine, Lund

University, Sweden.

Conflict of interest

The authors have no conflict of interest for this manuscript.



5. References

[1] G Peat, R McCarney, and P Croft. “Knee pain and osteoarthritis in older

adults: a review of community burden and current use of primary health

care”. In: Annals of the Rheumatic Diseases 60 (2001), pp. 91–97.

[2] E L Radin and R M Rose. “Role of subchondral bone in the initiation

and progression of cartilage damage”. In: Clinical Orthopaedics and Related

Research 213 (1986), pp. 34–40.

[3] J C Buckland-Wright. “Subchondral bone changes in hand and knee


(2004), S10–S19.


microstructure in the medial condyle of the proximal tibia of patients with

knee osteoarthritis”. In: Bone 17 (1995), pp. 27–35.






Cartilage 15 (2007), pp. 479–486.



to three-dimensional microarchitecture”. In: Journal of Bone and Mineral

Research 15 (2000), pp. 691–699.

[8] R Jennane, G Harba, G Lemineur, S Bretteil, A Estrade, and C L Benhamou.

“Estimation of the 3D self-similarity parameter of trabecular bone from its




[9] L Pothuaud, P Carceller, and D Hans. “Correlations between grey-level

variations in 2D projections images (TBS) and 3D microarchitecture:


Bone 42 (2008), pp. 775–787.

[10] L Apostol, V Boudousq, O Basset, C Odet, S Yot, J Tabary, J M Dinten,

E Boiler, P O Kotzki, and F J Peyrin. “Relevance of 2D radiographic texture

analysis for the assessment of 3D bone micro-architecture”. In: Medical

Physics 33 (2006), pp. 3546–3556.




“Early detection of radiographic knee osteoarthritis using computer-aided





(2009), pp. 3711–3722.

[14] H W Chung, C C Chu, M Underweiser, and F W Wehrli. “On the fractal

nature of trabecular structure”. In: Medical Physics 21 (1994), pp. 1535–1540.

[15] T Woloszynski, P Podsiadlo, G W Stachowiak, and M Kurzynski. “A

signature dissimilarity measure for trabecular bone texture in knee

radiographs”. In: Medical Physics 37 (2010), pp. 2030–2042.

[16] M Englund, E M Roos, H P Roos, and L S Lohmander. “Patient-relevant

outcomes fourteen years after meniscectomy: influence of type of meniscal

tear and size of resection”. In: Rheumatology (Oxford) 40 (2001), pp. 631–639.

[17] M Englund, E M Roos, and L S Lohmander. “Impact of type of meniscal

tear on radiographic and symptomatic knee osteoarthritis: a sixteen-year

followup of meniscectomy with matched controls”. In: Arthritis &

Rheumatism 48 (2003), pp. 2178–2187.



[18] C Peterfy, J Li, S Zaim, J Duryea, J Lynch, Y Miaux, W Yu, and H K Genant.

“Comparison of fixed-flexion positioning with fluoroscopic semi-flexed

positioning for quantifying radiographic joint-space width in the knee:

test-retest reproducibility”. In: European Radiology 32 (2004), pp. 128–132.

[19] M Kothari, A Guermazi, G von Ingersleben, Y Miaux, M Sieffert, J E Block,

R Stevens, and C G Peterfy. “Fixed-flexion radiography of the knee

provides reproducible joint space width measurements in osteoarthritis”.

In: European Radiology 14 (2004), pp. 1568–1573.







and Cartilage 16 (2008), pp. 323–329.



pp. 1870–1883.

[23] Y Rubner, C Tomasi, and L J Guibas. “The earth mover’s distance as a metric

for image retrieval”. In: International Journal of Computer Vision 40 (2000),

pp. 99–121.

[24] J W Sammon. “A nonlinear mapping for data structure analysis”. In: IEEE

Transactions on Computers C18 (1969), pp. 401–409.

[25] E A Messent, R J Ward, C J Tonkin, and C Buckland-Wright. “Tibial

cancellous bone changes in patients with knee osteoarthritis. A short-term


Cartilage 13 (2005), pp. 463–470.







[27] W Adler and B Lausen. “Bootstrap estimated true and false positive rates

and ROC curve”. In: Computational Statistics & Data Analysis 53 (2009),

pp. 718–729.

[28] S K Tat, D Lajeunesse, J P Pelletier, and J Martel-Pelletier. “Targeting

subchondral bone for treating osteoarthritis: what is the evidence?” In: Best

Practice & Research Clinical Rheumatology 24 (2010), pp. 51–70.

[29] A M Coats, P Zioupos, and R M Aspden. “Material properties of

subchondral bone from patients with osteoporosis or osteoarthritis by

microindentation testing and electron probe microanalysis”. In: Calcified

Tissue International 73 (2003), pp. 66–71.

[30] D B Burr. “The importance of subchondral bone in osteoarthrosis”. In:

Current Opinion in Rheumatology 10 (1998), pp. 256–262.

[31] D B Burr and M B Schaffler. “The involvement of subchondral mineralized

tissues in osteoarthrosis: quantitative microscopic evidence”. In: Microscopy

Research and Technique 37 (1997), pp. 343–357.

[32] H Imhof, M Breitenseher, F Kainberger, T Rand, and S Trattnig.

“Importance of subchondral bone to articular cartilage in health and

disease”. In: Topics in Magnetic Resonance Imaging 10 (1999), pp. 180–192.

[33] D B Burr. “Increased biological activity of subchondral mineralized

tissues underlies the progressive deterioration of articular cartilage in

osteoarthritis”. In: The Journal of Rheumatology 32 (2005), pp. 1156–1158.

[34] X Li, P Liu, W Liu, P Maye, J Zhang, Y Zhang, M Hurley, C Guo, A Boskey,

L Sun, S E Harris, D W Rowe, H Zhu Ke, and D Wu. “Dkk2 has a role in

terminal osteoblast differentiation and mineralized matrix formation”. In:

Nature Genetics 37 (2005), pp. 945–952.



[35] J Li, I Sarosi, R C Cattley, J Pretorius, F Asuncion, M Grisanti, S Morony,

S Adamu, Z Geng, W Qiu, P Kostenuik, D L Lacey, W S Simonet,

B Bolon, X Qian, V Shalhoub, M S Ominsky, H Zhu Ke, X Li, and

W G Richards. “Dkk1-mediated inhibition of Wnt signalling in bone results

in osteopenia”. In: Bone 39 (2006), pp. 754–766.

[36] F Morvan, K Boulukos, P Clément-Lacroix, S Roman-Roman, I Suc-Royer,

B Vayssiere, P Ammann, P Martin, S Pinho, P Pognonec, P Mollat, C Niehrs,

R Baron, and G Rawadi. “Deletion of a single allele of the Dkk1 gene leads

to an increase in bone formation and bone mass”. In: Journal of Bone and

Mineral Research 21 (2006), pp. 934–945.

[37] D Diarra, M Stolina, K Polzer, J Zwerina, M S Ominsky, D Dwyer, A Korb,

J Smolen, M Hoffmann, C Scheinecker, D van der Heide, R Landewe, D

Lacey, W G Richards, and G Schett. “Dickkopf-1 is a master regulator of

joint remodelling”. In: Nature Medicine 13 (2007), pp. 156–163.

[38] A M Valdes, J Loughlin, M V Oene, K Chapman, G L Surdulescu,

M Doherty, and T D Spector. “Sex and ethnic differences in the association

of ASPN, CALM1, COL2A1, COMP, and FRZB with genetic susceptibility

to osteoarthritis of the knee”. In: Arthritis & Rheumatism 56 (2007),

pp. 137–146.

[39] J Loughlin, B Dowling, K Chapman, L Marcelline, Z Mustafa, L Southam,

A Ferreira, C Ciesielski, D A Carson, and M Corr. “Functional variants

within the secreted frizzled-related protein 3 gene are associated with hip

osteoarthritis in females”. In: Proceedings of the National Academy of Sciences

USA 101 (2004), pp. 9757–9762.

[40] M C Nevitt, Y Zhang, M K Javaid, T Neogi, J R Curtis, J Niu, C E McCulloch,

N A Segal, and D T Felson. “High systemic bone mineral density

increases the risk of incident knee OA and joint space narrowing, but not

radiographic progression of existing knee OA: the MOST study”. In: Annals

of the Rheumatic Diseases 69 (2010), pp. 163–168.



[41] G H Lo, J Niu, C E McLennan, D P Kiel, R R McLean, A Guermazi,

H K Genant, T E McAlindon, and D J Hunter. “Meniscal damage associated

with increased local subchondral bone mineral density: a Framingham

study”. In: Osteoarthritis and Cartilage 16 (2008), pp. 261–267.

[42] M Englund, A Guermazi, F W Roemer, M Yang, Y Zhang, M C Nevitt,

J A Lynch, C E Lewis, J Torner, and D T Felson. “Meniscal pathology on

MRI increases the risk for both incident and enlarging subchondral bone

marrow lesions of the knee: the MOST Study”. In: Annals of the Rheumatic

Diseases 69 (2010), pp. 1796–1802.

[43] E A Messent, R J Ward, C J Tonkin, and C Buckland-Wright. “Cancellous

bone differences between knees with early, definite and advanced joint

space loss; a comparative quantitative macroradiographic study”. In:

Osteoarthritis and Cartilage 13 (2005), pp. 39–47.

[44] M Ding, A Odgaard, and I Hvid. “Changes in the three-dimensional

microstructure of human tibial cancellous bone in early osteoarthritis”. In:

Journal of Bone and Joint Surgery (British Volume) 85-B (2003), pp. 906–912.

[45] D Dore, S Quinn, C Ding, T Winzenberg, F Cicuttini, and G Jones.

“Subchondral bone and cartilage damage”. In: Arthritis & Rheumatism 62

(2010), pp. 1967–1973.

[46] Y Wang, A Wluka, and F M Cicuttini. “The determinants of change in

tibial plateau bone area in osteoarthritic knees: a cohort study”. In: Arthritis

Research & Therapy 7 (2005), R687–R693.

[47] R I Bolbos, J Zuo, S Banerjee, T M Link, C B Ma, X Li, and S Majumdar.

“Relationship between trabecular bone structure and articular cartilage

morphology and relaxation times in early OA of the knee joint using

parallel MRI at 3 T”. In: Osteoarthritis and Cartilage 16 (2008), pp. 1150–1159.

[48] C T Lindsey, A Narasimhan, J M Adolfo, H Jin, L S Steinbach,

T Link, M Ries, and S Majumdar. “Magnetic resonance evaluation of the



interrelationship between articular cartilage and trabecular bone of the

osteoarthritic knee”. In: Osteoarthritis and Cartilage 12 (2004), pp. 86–96.

[49] L Shamir, S Rahimi, N Orlov, L Ferrucci, and I G Goldberg. “Progression analysis

and stage discovery in continuous physiological processes using image

computing”. In: EURASIP Journal on Bioinformatics and Systems Biology,

doi:10.1155/2010/107036 (2010).



6. List of figures

Figure 1: Trabecular bone regions of interest selected in the analysis using an automated method.



[Figure 2 plot. x-axis: false positive rate (0 to 1); y-axis: true positive rate (0 to 1); legend: early OA progression (AUC 0.75), late OA progression (AUC 0.77).]

Figure 2: The receiver operating characteristic (ROC) curves obtained for progression of early and late osteoarthritis (OA) respectively, i.e., using the medial region of interest and loss of medial compartment joint space as outcome with adjustment for age, sex, and body mass index. The diagonal line represents the ROC curve of the null model (area under the curve [AUC] 0.5).



7. List of tables

Table 1: Study subject characteristics (n=105).

Characteristic                                 Exam A       Exam B
Age, years                                     53.6±10.5    57.6±10.5
Men, n (%)                                     90 (86)      –
Body mass index, kg/m²                         26.1±3.4     26.8±3.7
Time since Exam A, months                      –            48.8±1.0
Knees (n=203) with radiographic OA∗, n (%)     68 (33)      95 (47)
Medial JSN grade (n=203), n (%)
    0                                          107 (53)     85 (42)
    1                                          78 (38)      79 (39)
    2                                          18 (9)       28 (14)
    3                                          0 (0)        11 (5)

Data are presented as means ± SD unless stated otherwise.
OA, osteoarthritis; JSN, joint space narrowing.
∗ Approximating Kellgren and Lawrence grade 2 or worse.



Table 2: Prediction accuracies calculated as area under the curve (AUC) for progression of medial tibiofemoral compartment osteoarthritis (OA) defined as an increase in medial joint space narrowing grade.

                                 Medial ROI                                Medial ROI
Outcome                          Model 1∗            Model 2†              Model 1∗            Model 2†
Whole sample (n=203)             0.74 (0.67, 0.82)   0.77 (0.70, 0.84)     0.68 (0.62, 0.75)   0.71 (0.64, 0.78)
Early OA progression‡ (n=135)    0.74 (0.67, 0.82)   0.75 (0.69, 0.83)     0.72 (0.64, 0.80)   0.72 (0.65, 0.80)
Late OA progression§ (n=68)      0.76 (0.68, 0.84)   0.77 (0.68, 0.86)     0.68 (0.60, 0.77)   0.71 (0.63, 0.79)

Data are presented as AUC means (95% confidence interval).
ROI, region of interest.
∗ based on texture parameters only.
† based on texture parameters with adjustment for age, sex, and body mass index.
‡ knees with Kellgren and Lawrence equivalents ≤1, see methods.
§ knees with Kellgren and Lawrence equivalents ≥2, see methods.



Table 3: Significant associations of texture parameters and covariates for Model 2∗ and medial region of interest.

Texture parameter/covariate       β (95% CI)               P-value

Whole sample (n=203)
    R1∗R2                         21.3 (2.4, 40.0)         0.03
    DegA∗DirA                     12.6 (4.5, 20.5)         <0.01
    R2∗R2                         -12.4 (-17.3, -7.4)      0.01
    DegA∗DegA                     -7.5 (-10.4, -4.6)       0.01
    DirA∗DirA                     -4.9 (-6.3, -3.4)        <0.01

Early OA progression† (n=135)
    DegA∗DirA                     12.2 (6.8, 17.6)         0.02
    DegA∗DegA                     -8.5 (-12.7, -4.2)       0.05
    DirA∗DirA                     -6.3 (-8.7, -3.9)        0.02

Late OA progression‡ (n=68)
    R1                            -16.9 (-24.5, -9.4)      <0.01
    R2                            16.5 (8.8, 24.1)         <0.01
    R2∗R2                         -23.2 (-33.3, -13.1)     0.01
    Sex (male)                    -1.0 (-1.5, -0.5)        0.02
    Body mass index               2.2 (1.5, 3.0)           <0.01

95% CI, 95% confidence interval; DegA, degree of anisotropy; DirA, direction of anisotropy; R1, R2, roughness.
∗ based on texture parameters with adjustment for age, sex, and body mass index.
† knees with Kellgren and Lawrence equivalents ≤1, see methods.
‡ knees with Kellgren and Lawrence equivalents ≥2, see methods.


CHAPTER 4

A MEASURE OF COMPETENCE BASED ON RANDOM

CLASSIFICATION FOR DYNAMIC ENSEMBLE

SELECTION

Tomasz Woloszynski1, Marek Kurzynski, PhD2, Pawel Podsiadlo, PhD1, and

Gwidon W Stachowiak, PhD1




Poland

Information Fusion 2012;13;207–212

Abstract

In this paper, a measure of competence based on random classification (MCR)

for classifier ensembles is presented. The measure selects dynamically (i.e. for

each test example) a subset of classifiers from the ensemble that perform better

than a random classifier. Therefore, weak (incompetent) classifiers that would

adversely affect the performance of a classification system are eliminated. When

all classifiers in the ensemble are evaluated as incompetent, the classification

accuracy of the system can be increased by using the random classifier instead.


Theoretical justification for using the measure with the majority voting rule is

given. Two MCR based systems were developed and their performance was

compared against six multiple classifier systems using data sets taken from

the UCI Machine Learning Repository and Ludmila Kuncheva Collection. The

systems developed typically had the highest classification accuracies regardless

of the ensemble type used (homogeneous or heterogeneous).

Key words: multiple classifier system, dynamic ensemble selection, competence

measure, random classification.



1. Introduction

In multiple classifier systems (MCSs), a diverse ensemble of classifiers is

trained and combined together in order to outperform any single classifier

in the ensemble [1]. The diversity of the ensemble can be achieved by using

heterogeneous classifiers [2–4] and/or by generating a different training data set

for each classifier through, for example, bagging [5], boosting [6], or random
subspaces [7]. The two main approaches to combining the classifiers in the
ensemble are classifier fusion and classifier selection.

In the first approach, all classifiers in the ensemble contribute to the decision

of the MCS, e.g. through sum or majority voting [2, 6]. Generally, a fusion

based MCS performs better than any classifier in the ensemble provided that all

classifiers are complementary, i.e. they make independent errors [8]. However,

such independence usually cannot be guaranteed in practice. To address this

problem, ensemble pruning (EP) methods have been developed [9–11]. In these

methods, the decision of the MCS is made by using a subset of complementary

classifiers selected from the ensemble. Techniques used for classifier
selection include measures of goodness [12, 13], multiple comparisons statistics

[14], reinforcement learning [15], genetic algorithms [16], and quadratic integer

programming [17]. The main drawback of the EP methods is that the selection

is global, i.e. the same subset of classifiers is used for classification of all test

examples. This could adversely affect the performance of EP based classification

systems, since for some regions in a feature space there could be subsets of

classifiers that have higher classification accuracies than the globally selected

subset.
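The role of the independence assumption in classifier fusion can be illustrated with a small calculation. The sketch below is not taken from the study; it assumes a two-class problem, an odd number of classifiers, and that every classifier is correct with the same probability p independently of the others, in which case the accuracy of majority voting is a binomial tail sum. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p, num_classifiers):
    """Probability that a strict majority of the classifiers is correct.

    Assumptions (illustrative only): two classes, an odd number of
    classifiers, and independent errors with a common accuracy p.
    """
    if num_classifiers % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    needed = num_classifiers // 2 + 1  # smallest strict majority
    return sum(
        comb(num_classifiers, m) * p ** m * (1 - p) ** (num_classifiers - m)
        for m in range(needed, num_classifiers + 1)
    )


for p in (0.6, 0.4):
    print(p, round(majority_vote_accuracy(p, 3), 3))
```

With three classifiers, an individual accuracy of 0.6 gives a voting accuracy of 0.648, whereas an individual accuracy of 0.4 gives 0.352. Under these assumptions, voting amplifies better-than-random classifiers and further degrades worse-than-random ones.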

In the second approach, a single classifier is selected from the ensemble for each

test example and its decision is used as the decision of the MCS. The selection

of a classifier can be either static or dynamic. In static classifier selection, a



region of competence in the feature space is assigned for each classifier during

the training phase [18, 19]. Classification is made by the classifier assigned to

the competence region that contains the test example. In dynamic classifier

selection (DCS), competences of classifiers are calculated during the classification

phase, i.e. at the time when the test example is presented [20–22]. The classifier

with the highest value of competence is used for the classification of the test

example. The performance of DCS based classification systems depends on the

correct estimation of competence which is usually defined as local accuracy [23].

However, in the case of poor estimation the selected classifier may not be the

most accurate classifier in the ensemble for the region and this could adversely

affect the performance of the system. Recently, dynamic ensemble selection

(DES) methods have been developed to overcome this problem [24–27]. These

methods first dynamically select a subset of classifiers from the ensemble and

then combine the selected classifiers by majority voting. In this way DES based

classification systems take advantage of both the selection and fusion approaches

while avoiding the problems of the EP and DCS methods. The systems have some

similarities to random forests [28] with regard to the selection of features and data

sets used in classifier training and the use of the majority voting rule. The major

difference, however, is that the random forests grow a diverse classifier ensemble

while DES based systems select and combine classifiers from an ensemble that

is already given. For the dynamic selection of the classifier subset in DES
based systems, local accuracy methods [24, 25], a local accuracy and diversity
method [26] and a two-level optimization and selection strategy [27] have been
used. However, these approaches either have little theoretical justification
[24, 26] or select the subset of classifiers in a rather complex manner [25, 27].

In this study, these issues were addressed by developing a measure of

competence based on random classification (MCR). The study is the continuation

of our initial work on the competence measure based on random guessing that



was successfully used for the classification of five benchmark data sets [29].

In particular, theoretical justification is given and extensive experiments are

conducted to investigate whether the performance of a random classifier is a

natural criterion for selecting relatively accurate and diverse classifiers from

the ensemble. The measure developed uses a random classifier to evaluate the

performances of classifiers in the ensemble. It is theoretically shown that the

MCR increases the classification accuracy of MCS when the majority voting rule

is used. The performances of MCR based classification systems were compared

against six MCSs based on the following combination methods: single best

(SB), majority voting (MV), DCS local accuracy (DCS-LA) [21], DCS multiple

classifier behaviour (DCS-MCB) [22], DCS modified local accuracy (DCS-MLA)

[30], and DES knora-eliminate (DES-KE) [24] using data sets from the UCI

Machine Learning Repository [31] and the Ludmila Kuncheva Collection [32].

2. Theoretical framework

Consider a classification problem with a set $\mathcal{M} = \{1, 2, \dots, M\}$ of class labels and
a feature space $\mathcal{X} \subseteq \mathbb{R}^n$ (i.e. each example is described by $n$ features). Assume
that examples are independent and identically distributed (iid) random variable
pairs $(X, J)$, where $X \in \mathcal{X}$ and $J \in \mathcal{M}$. The probability measure $\mu$ and prior
probabilities $p_j$, $j = 1, \dots, M$ describe distributions of $X$ and $J$, respectively. The
posterior probability that an example described by a feature vector $x \in \mathcal{X}$ belongs
to the class $j \in \mathcal{M}$ is denoted by $p_j(x)$.

Let $\psi : \mathcal{X} \to \mathcal{M}$ be a classifier that produces a vector of discriminant functions
$[d_1(x), d_2(x), \dots, d_M(x)]$. The value of the discriminant function $d_j(x)$, $j = 1, 2, \dots, M$
represents a support given by the classifier $\psi$ for the fact that the
example described by a feature vector $x$ belongs to the $j$-th class. Assume without
loss of generality that $d_j(x) \geq 0$, $j = 1, 2, \dots, M$ and $\sum_{j=1}^{M} d_j(x) = 1$. Classification
is made according to the maximum rule

$$\psi(x) = i \iff d_i(x) = \max_{j=1,\dots,M} d_j(x). \tag{1}$$

The performance of the classifier $\psi$ is measured by its probability of correct
classification $P_c(\psi)$. For a random classifier $\psi_{rnd}$ that draws a class label using
a uniform distribution the probability $P_c(\psi_{rnd})$ is equal to $\frac{1}{M}$.
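The maximum rule (1) and the baseline accuracy of the random classifier can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study; the function names are hypothetical, and breaking ties in favour of the lowest class label is an assumption that the text does not specify.

```python
def max_rule(supports):
    """Return the 1-based class label with the largest support d_j(x).

    Ties are broken in favour of the lowest label (an assumption).
    """
    best_index = max(range(len(supports)), key=lambda j: supports[j])
    return best_index + 1


def random_classifier_accuracy(num_classes):
    """Probability of correct classification for uniform random guessing."""
    return 1.0 / num_classes


supports = [0.2, 0.5, 0.3]  # d_1(x), d_2(x), d_3(x); non-negative, sum to one
print(max_rule(supports))
print(random_classifier_accuracy(len(supports)))
```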

2.1 Measure of competence based on random classification (MCR)

Let $C(\psi|x)$ denote the MCR of the classifier $\psi$ for the example described by a
feature vector $x$. The function $C(\psi|x)$ is defined as any strictly increasing function
$G(\cdot)$ with the property $G(0) = 0$ and the argument of the form $P_{c,N}(\psi|x) - \frac{1}{M}$, i.e.

$$C(\psi|x) = G\left[P_{c,N}(\psi|x) - \frac{1}{M}\right], \tag{2}$$

where $P_{c,N}(\psi|x)$ is the estimated value of the conditional probability of correct
classification

$$P_c(\psi|x) = p_i(x) \iff d_i(x) = \max_{j=1,\dots,M} d_j(x) \tag{3}$$

and $N$ is the number of examples in a validation data set $\mathcal{V}$ used for the estimation
of $P_c(\psi|x)$. The examples in $\mathcal{V}$ are assumed to be iid random variable pairs
distributed as $(X, J)$ and independent of examples used for the training of
classifiers.

The measure $C(\psi|x)$ is positive (non-positive) if the classifier $\psi$ is competent
(incompetent) for the example described by a feature vector $x$, i.e. if the value
of $P_{c,N}(\psi|x)$ is greater than (lower than or equal to) $P_c(\psi_{rnd})$.
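A short sketch may help to show why the choice of G does not affect which classifiers are deemed competent: any strictly increasing G with G(0) = 0 preserves the sign of its argument. The names `mcr` and `is_competent` are hypothetical, and the two choices of G shown (the identity and the hyperbolic tangent) are examples only.

```python
import math


def mcr(p_correct_estimate, num_classes, g=lambda t: t):
    """Competence C = G(P_cN - 1/M) for a strictly increasing G with G(0) = 0."""
    return g(p_correct_estimate - 1.0 / num_classes)


def is_competent(p_correct_estimate, num_classes, g=lambda t: t):
    """A classifier is competent only if its competence is strictly positive."""
    return mcr(p_correct_estimate, num_classes, g) > 0.0


# The sign of the competence is the same for both choices of G.
for estimate in (0.40, 0.25, 0.10):
    print(estimate, is_competent(estimate, 4), is_competent(estimate, 4, math.tanh))
```

A classifier whose estimated local accuracy equals 1/M exactly is treated as incompetent, in line with the non-positive case above.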

2.2 Theoretical justification

Let $\Psi = \{\psi_1, \dots, \psi_L\}$ be an ensemble of $L$ classifiers and let $\Psi_{vote}$ be an MCS
obtained by combining classifiers from the ensemble $\Psi$ using the majority voting
rule. $\Psi_{vote}$ is Bayes optimal if the following assumptions are satisfied [33, 34]:

• for each classifier the probability of choosing the correct class is greater than
the probability of choosing any single other class,

• all classifiers make independent errors, and

• the number of classifiers tends to infinity.

However, such an MCS has little practical value since the independence between
classifier errors usually cannot be guaranteed and a large number of classifiers
in the ensemble would adversely affect the system’s complexity and efficiency.
If the above assumptions are not satisfied then, assuming that there exists a set
$\mathcal{R} \subseteq \mathcal{X}$ with $\mu(\mathcal{R}) > 0$ where $P_c(\psi_l|x) < \frac{1}{M}$, $l = 1, \dots, L$, it is possible to construct
a classification system $\Psi_{vote,N}$ for which

$$\Pr\left[\lim_{N \to \infty} P_c(\Psi_{vote,N}) > P_c(\Psi_{vote})\right] = 1. \tag{4}$$

The $\Psi_{vote,N}$ is constructed in the following manner:

• if the MCR evaluates at least one classifier as competent for a given $x$ then
$\Psi_{vote,N}$ is obtained by majority voting of the competent classifiers, and

• if there is no classifier evaluated as competent by the MCR for a given $x$
then $\Psi_{vote,N} = \psi_{rnd}$.

The $\Psi_{vote,N}$ performs better than the $\Psi_{vote}$ if the condition

$$\forall_{l=1,\dots,L} \left(P_c(\psi_l|x) \leq \frac{1}{M} \Rightarrow P_{c,N}(\psi_l|x) \leq \frac{1}{M}\right) \wedge \left(P_c(\psi_l|x) > \frac{1}{M} \Rightarrow P_{c,N}(\psi_l|x) > \frac{1}{M}\right) \tag{5}$$

is satisfied for almost all $x \in \mathcal{X}$, i.e. for all $x$ except possibly a region with the
probability measure zero. In particular

$$P_c(\Psi_{vote,N} \mid x \in \mathcal{X} \setminus \mathcal{R}) \geq P_c(\Psi_{vote} \mid x \in \mathcal{X} \setminus \mathcal{R}) \tag{6}$$

because the performance of the $\Psi_{vote}$ in the set $\mathcal{X} \setminus \mathcal{R}$ would not deteriorate by
locally removing worse-than-random classifiers from the ensemble $\Psi$, and

$$P_c(\Psi_{vote,N} \mid x \in \mathcal{R}) > P_c(\Psi_{vote} \mid x \in \mathcal{R}) \tag{7}$$

because majority voting of worse-than-random classifiers in the set $\mathcal{R}$ cannot
outperform the random classifier. The reason is that the majority voting rule
can only select one of the class labels returned by classifiers in the ensemble.
Consequently, if all classifiers are worse-than-random then none of the
returned class labels gives a better-than-random classification accuracy.

The probability (4) can be easily obtained if the estimator $P_{c,N}(\psi|x)$ is strongly
consistent, e.g. k-NN, histogram or kernel estimator [35]. Consequently, as $N \to \infty$,
the condition (5) is satisfied with probability equal to one by noting that for
the estimator

$$\Pr\left[\lim_{N \to \infty} P_{c,N}(\psi|x) = P_c(\psi|x)\right] = 1. \tag{8}$$

Therefore, it follows from (6) and (7) that, as $N \to \infty$, $\Psi_{vote,N}$ performs better than
the $\Psi_{vote}$ with probability equal to one.
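The two-step construction of the system described in this section can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' code: the function name `des_vote` is hypothetical, competences are assumed to be already computed for the given test example, and ties in the vote are broken in favour of the lowest class label, which the text does not specify.

```python
import random
from collections import Counter


def des_vote(predictions, competences, num_classes, rng=random):
    """Combine class labels (1..M) of the competent classifiers by majority voting.

    predictions[l] is the label returned by classifier l for the test example,
    competences[l] is its competence C(psi_l | x). If no classifier is
    competent, a label is drawn uniformly at random (the random classifier).
    """
    competent = [label for label, c in zip(predictions, competences) if c > 0.0]
    if not competent:
        return rng.randint(1, num_classes)
    counts = Counter(competent)
    top = max(counts.values())
    # Tie-breaking by the lowest label is an assumption of this sketch.
    return min(label for label, n in counts.items() if n == top)


print(des_vote([1, 2, 2, 3], [0.10, 0.20, 0.30, -0.05], num_classes=3))
```

In the call above the fourth classifier is excluded because its competence is negative, and the remaining votes (1, 2, 2) yield class 2.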

3. Methods

3.1 DES-P and DES-KL classification systems

The DES-Performance (DES-P) and DES-Kullback-Leibler (DES-KL) classification

systems were developed according to Section 2.2. In the systems, the values of

the competence measure C(ψ|x) were estimated as follows:

1. DES-P system

The competence measure used in the DES-P system follows the definition of
the MCR. First, the performance of the classifier $\psi$ in a neighbourhood of the
test example $x$ is estimated using weighted $k$ nearest neighbours taken from
the validation data set $\mathcal{V}$. The value $k = 10$ was chosen since for this value a
weighted nearest neighbour estimation method produced the best results in previous
studies [30]. The competence measure is then obtained by subtracting the
performance of the random classifier $\frac{1}{M}$ from the performance estimated,
i.e.

$$C(\psi|x) = P_c(\psi|x) - \frac{1}{M}. \tag{9}$$

2. DES-KL system

The competence measure used in the DES-KL system estimates the
performances of classifiers from the information theory perspective. First,
for each example $y$ in the validation data set $\mathcal{V}$ a source competence
$C_{src}(\psi|y)$ is calculated as the Kullback-Leibler (KL) divergence between the
uniform distribution $\left[\frac{1}{M}, \dots, \frac{1}{M}\right]$ and the vector of discriminant functions
$[d_1(y), \dots, d_M(y)]$ produced by the classifier $\psi$. In this way, the source
competence measures how “close” the values of the discriminant functions
are to the probability of random classification. Since the KL divergence
is nonnegative, the source competence is given the sign of the expression
$d_i(y) - \max_{j=1,\dots,M,\, j \neq i} d_j(y)$, where $i$ denotes the class of the example $y$.
As a result, the source competence attains its maximum (minimum) if the
support given by the classifier for the correct (any incorrect) class is equal
to one. The competence of the classifier $\psi$ for the test example $x$ is then
obtained by a weighted sum of the source competences

$$C(\psi|x) = \sum_{y \in \mathcal{V}} C_{src}(\psi|y) \exp\left(-d(x, y)^2\right), \tag{10}$$

where $d(x, y)$ is the Euclidean distance between the examples $x$ and $y$. For the
DES-P and DES-KL systems, different types of methods for the comparison
between each classifier from the ensemble and the random classifier were
used to investigate their effect on the performance.
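The two competence estimates described above can be sketched as follows. Several details are assumptions made for illustration only and are not stated in the text: the inverse-distance weighting of the k nearest neighbours in the DES-P estimate, the direction of the KL divergence (taken here from the classifier supports to the uniform distribution, which is finite and largest for a one-hot support vector), and all function names.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def des_p_competence(x, validation, predict, num_classes, k=10, eps=1e-9):
    """Weighted k-NN estimate of local accuracy minus 1/M, as in equation (9).

    validation is a list of (feature_vector, true_label) pairs.
    Inverse-distance weights are an assumption of this sketch.
    """
    nearest = sorted(validation, key=lambda ex: euclidean(x, ex[0]))[:k]
    weights = [1.0 / (euclidean(x, y) + eps) for y, _ in nearest]
    hits = [1.0 if predict(y) == label else 0.0 for y, label in nearest]
    p_hat = sum(w * h for w, h in zip(weights, hits)) / sum(weights)
    return p_hat - 1.0 / num_classes


def source_competence(supports, true_label):
    """Signed KL divergence between the supports and the uniform distribution."""
    m = len(supports)
    kl = sum(d * math.log(d * m) for d in supports if d > 0.0)
    i = true_label - 1
    margin = supports[i] - max(d for j, d in enumerate(supports) if j != i)
    sign = (margin > 0) - (margin < 0)  # +1, 0 or -1
    return sign * kl


def des_kl_competence(x, validation, support_fn):
    """Gaussian-weighted sum of source competences, as in equation (10)."""
    return sum(
        source_competence(support_fn(y), label) * math.exp(-euclidean(x, y) ** 2)
        for y, label in validation
    )


validation = [([0.0], 1), ([0.2], 1), ([0.9], 2), ([1.0], 2)]
predict = lambda y: 1 if y[0] < 0.5 else 2
support_fn = lambda y: [0.9, 0.1] if y[0] < 0.5 else [0.1, 0.9]
print(des_p_competence([0.1], validation, predict, num_classes=2, k=3))
print(des_kl_competence([0.1], validation, support_fn))
```

In this toy example the classifier is correct on every validation example, so both competences are positive and the classifier would be selected for voting.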



4. Experiments

The performance of the DES-P and DES-KL systems was compared against six

MCSs using 14 benchmark data sets. The comparison study was conducted in

MATLAB using PRTools developed by Duin et al. [36].

4.1 Data sets

The benchmark data sets taken from the UCI Machine Learning Repository (UCI)

[31] and the Ludmila Kuncheva Collection (LKC) [32] were used. For each

data set, the feature vectors were normalized to zero mean and unit standard

deviation. A brief description of the data sets used is given in Table 1. The

training and testing data sets were extracted using two-fold cross-validation. The

training data set was also used as the validation data set.
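The preprocessing and data splitting described in this subsection can be sketched as below. The experiments themselves were run in MATLAB with PRTools; this Python outline is only an illustration. The use of the sample standard deviation (divisor n − 1), normalization over the whole data set before splitting, and the function names are assumptions.

```python
import math
import random


def standardize(rows):
    """Scale each feature (column) to zero mean and unit standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    # Constant features (zero spread) are mapped to 0.0 to avoid division by zero.
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def two_fold_indices(n, rng):
    """Randomly split indices 0..n-1 into two folds of (almost) equal size."""
    indices = list(range(n))
    rng.shuffle(indices)
    half = n // 2
    return indices[:half], indices[half:]


data = standardize([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
fold_a, fold_b = two_fold_indices(len(data), random.Random(0))
print(data, fold_a, fold_b)
```

In two-fold cross-validation each fold serves once for training (and, here, validation) and once for testing.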

4.2 MCSs

The DES-P and DES-KL systems were compared against two MCSs based on

benchmark combination methods and four competence based MCSs:

1. SB system

This system selects the single best classifier in the ensemble.

2. MV system

This system is based on majority voting of all classifiers in the ensemble.

3. DCS-LA system

This system defines the competence of the classifier ψ for the test example

x as the local classification accuracy. The accuracy is estimated using the k

nearest neighbours of the test example x that are taken from the validation

data set V . k = 10 was chosen since for this value the DCS-LA system had

the best overall performance in previous studies [21].

4. DCS-MCB system

This system defines the competence of the classifier ψ for the test example



x as the classification accuracy calculated for a data set V′. V′ is dynamically
generated from the validation data set V in the following manner. First, a
multiple classifier behaviour (MCB) is calculated for the test example x and
for its k nearest neighbours taken from V. The MCB is defined as a vector
whose elements are the decisions (i.e. class labels assigned to the example)
of all classifiers in the ensemble. Next, similarities between the MCBs
are calculated using the averaged Hamming distance. The neighbours whose MCBs
are most similar to that of the test example x (i.e. below some similarity
threshold) are then used to generate the data set V′. Since the optimal values
of the parameter k and the similarity threshold were not given in previous
studies, the values of k = 10 and the similarity threshold equal to 0.5 were
arbitrarily chosen.

5. DCS-MLA system

This system is similar to the DCS-LA system, except the local classification

accuracy is estimated using weighted k nearest neighbours of the test

example x that are taken from V [30].

6. DES-KE system

This system dynamically selects a subset of classifiers with the perfect

classification accuracy of k nearest neighbours of the test example x. The

k nearest neighbours are taken from the validation data set V . If there is no

classifier with the perfect classification accuracy of all k nearest neighbours,

the value of k is decreased until at least one such classifier is found.

k = 8 was chosen since for this value the DES-KE system had the best

performance [24].
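The selection steps of two of the competence based systems listed above, DCS-LA and DES-KE, can be sketched as follows. The sketch assumes that a table of correct/incorrect decisions on the k nearest validation neighbours (ordered from nearest to farthest) is already available. The function names, the tie-breaking in `dcs_la_select`, and the fallback to the whole ensemble when no classifier is correct even for the single nearest neighbour are assumptions.

```python
def dcs_la_select(neighbour_hits):
    """Return the index of the classifier with the highest local accuracy.

    neighbour_hits[l][r] is True if classifier l correctly classifies the
    r-th nearest neighbour of the test example. Ties go to the lowest index.
    """
    accuracies = [sum(hits) / len(hits) for hits in neighbour_hits]
    return max(range(len(accuracies)), key=lambda l: accuracies[l])


def knora_eliminate(neighbour_hits, k=8):
    """Select classifiers that are correct on all of the k nearest neighbours.

    If none exists, k is decreased until at least one classifier qualifies.
    Returns the selected indices and the neighbourhood size actually used.
    """
    for size in range(k, 0, -1):
        selected = [l for l, hits in enumerate(neighbour_hits) if all(hits[:size])]
        if selected:
            return selected, size
    # Fallback (assumption): use the whole ensemble.
    return list(range(len(neighbour_hits))), 0


hits = [
    [True, True, False],   # classifier 0
    [True, False, True],   # classifier 1
    [False, False, False], # classifier 2
]
print(dcs_la_select(hits))
print(knora_eliminate(hits, k=3))
```

In this example no classifier is correct on all three neighbours, so the neighbourhood shrinks to two, where only classifier 0 qualifies.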

4.3 Classifier ensembles

Two types of classifier ensembles were used in the experiments: homogeneous

and heterogeneous. The homogeneous ensemble consisted of classification trees

with the Gini splitting criterion and the pruning level set to 4 [37]. The trees



were used because of their instability with respect to the training data set and

relatively good classification accuracy. The heterogeneous ensemble consisted of

the following 11 classifiers [37]:

• (1 and 2) LDC (QDC) - linear (quadratic) discriminant classifiers based on

normal distributions with the same (different) covariance matrix for each

class;

• (3) NMC - nearest mean classifier;

• (4–6) k-NN - k-nearest neighbours classifiers with k = 1, 5, 15;

• (7 and 8) PARZEN1 (PARZEN2) - Parzen classifier with the Gaussian kernel

and the optimal smoothing parameter hopt (and the smoothing parameter

hopt/2);

• (9) TREE - classification tree with the Gini splitting criterion and the pruning

level set to 4;

• (10 and 11) BPNN1 (BPNN2) - feed-forward backpropagation neural

network classifier containing two hidden layers with 5 neurons each (one

hidden layer with 10 neurons) and the maximum number of learning

epochs set to 80.

The classifiers chosen for the heterogeneous ensemble are structurally diverse,

i.e. training and classification phases are carried out differently for each type of

classifier used. For both ensemble types, classifiers were trained using bagging.
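Bagging trains each classifier on a bootstrap replicate of the training data, i.e. a sample of the same size drawn with replacement. A minimal sketch of generating such replicates for an ensemble of 11 classifiers is given below; the function names and the fixed seed are illustrative assumptions, and the training of the classifiers themselves is omitted.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) examples from data uniformly with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def bagging_training_sets(data, num_classifiers=11, seed=0):
    """Return one bootstrap replicate of the training data per classifier."""
    rng = random.Random(seed)
    return [bootstrap_sample(data, rng) for _ in range(num_classifiers)]


training_data = list(range(20))  # stand-in for (feature vector, label) pairs
replicates = bagging_training_sets(training_data)
print(len(replicates), len(replicates[0]))
```

Because each replicate omits some examples and repeats others, unstable classifiers such as classification trees trained on different replicates tend to differ from one another.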

5. Results

Classification accuracies (i.e. the percentage of correctly classified examples) were

calculated for the systems using the homogeneous and heterogeneous ensembles

containing 11 classifiers each and they are given in Tables 2 and 3, respectively.

The accuracies are average values obtained over 10 runs (5 replications of

two-fold cross-validation). Differences in rank between the systems were

evaluated using an Iman and Davenport test combined with a post-hoc Holm’s

step-down procedure [38]. Differences between the accuracies of the DES-P and

DES-KL systems and the six MCSs were also evaluated and a 5×2cv F-test was

used [39]. For both tests, the level of P<0.05 was considered as statistically

significant.
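
For reference, the statistic of the combined 5×2cv F-test [39] can be computed as sketched below. The function name and the toy differences are assumptions; the input is the difference in error rates between two systems on each fold of each of the five replications. Under the null hypothesis the statistic approximately follows an F distribution with 10 and 5 degrees of freedom, so values above about 4.74 correspond to P<0.05.

```python
def combined_5x2cv_f(diffs):
    """Combined 5x2cv F statistic.

    diffs holds five (p1, p2) pairs: the difference in error rates between
    two systems on the two folds of each replication.
    """
    if len(diffs) != 5:
        raise ValueError("expected five replications of two-fold CV")
    numerator = sum(p1 * p1 + p2 * p2 for p1, p2 in diffs)
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    if variance_sum == 0:
        raise ValueError("zero variance: statistic is undefined")
    return numerator / (2 * variance_sum)


# Toy input: the same pair of fold differences in every replication.
f_stat = combined_5x2cv_f([(0.02, 0.04)] * 5)
```

For the toy input the statistic equals 5.0, which would exceed the 4.74 threshold.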

The DES-P and DES-KL systems achieved the highest overall classification

accuracies averaged over all data sets (i.e. 85.19% and 85.09%) for the

homogeneous ensemble (Table 2). The two systems developed outperformed the

SB, MV, DCS-LA, DCS-MCB, DCS-MLA, and DES-KE systems by 9.70%, 1.98%,

5.04%, 4.06%, 4.86% and 3.06% on average, respectively. The ranks of the DES-P

and DES-KL systems were statistically significantly higher than those of the six

MCSs. The exception was that the DES-KL system had a lower rank than the

MV system. The systems developed produced statistically significantly higher

accuracies than the other MCSs in 79 out of 168 cases (14 data sets × 6 MCSs × 2

systems developed). However, against all the other MCSs they were significantly

better for only 2 data sets (the OptDigits and Vowel). The MV system that had

the third best accuracy (83.16%) outperformed the other 5 MCSs for 8 out of 14

data sets.
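
The average margins quoted above for the homogeneous ensemble follow directly from the "Average" row of Table 2. The short check below reproduces them; the variable names are chosen for this example only.

```python
# Average accuracies (%) from the "Average" row of Table 2.
baseline_avg = {
    "SB": 75.44, "MV": 83.16, "DCS-LA": 80.10,
    "DCS-MCB": 81.08, "DCS-MLA": 80.28, "DES-KE": 82.08,
}
# Mean of the DES-P (85.19) and DES-KL (85.09) averages.
developed_avg = (85.19 + 85.09) / 2

# Margin of the two developed systems over each of the six MCSs.
margins = {name: round(developed_avg - acc, 2)
           for name, acc in baseline_avg.items()}
```

The resulting margins are 9.70, 1.98, 5.04, 4.06, 4.86 and 3.06 percentage points, matching the values in the text.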

The DES-P and DES-KL systems also achieved the highest overall classification

accuracies (i.e. 87.71% and 87.24%) when the heterogeneous ensemble was used

(Table 3). The two systems outperformed the SB, MV, DCS-LA, DCS-MCB,

DCS-MLA, and DES-KE systems by 0.44%, 0.50%, 2.05%, 2.05%, 1.98% and

1.46% on average, respectively. The DES-P and DES-KL systems had statistically

significantly higher ranks than those of the DCS-LA, DCS-MCB and DES-KE

systems. The DES-KL system also had a higher rank than the DCS-MLA

system. The two systems developed had statistically significantly higher (lower)

accuracies than the six MCSs in 64 (3) out of 168 cases. However, against all the

other MCSs only the DES-KL system was significantly better, and only for the

Vowel data set. From the six MCSs used for the comparison study, the SB system

achieved the highest classification accuracy averaged over all data sets (87.04%).

Classification accuracies calculated for the MCSs using homogeneous and

heterogeneous ensembles of sizes ranging from 11 to 500 and from 11 to 55

classifiers, respectively, are given in Table 4. The DES-P and DES-KL systems

had the highest overall accuracies for all ensemble sizes and types.

The percentage of times each classifier was selected by the DCS and DES based

systems is given in Table 5. The nearest neighbour and the two Parzen classifiers

were most frequently selected by the systems.

6. Discussion

The newly developed DES-P and DES-KL systems typically had higher

accuracies than the six MCSs. In particular, the systems had the two highest

accuracies for 13 and 6 out of 14 data sets using the homogeneous and

heterogeneous ensembles, respectively. However, they produced statistically

significantly higher accuracies than those of all the other MCSs for only 2 data

sets.

For the homogeneous ensemble, possible reasons for the better performance of

the systems developed could be that the ensemble used was diverse, there was

no superior classifier, and the DES-P and DES-KL systems could select subsets

containing more complementary classifiers than subsets of the best-performing

classifiers used in the other MCSs [1, 10]. Another reason could be that in the

systems developed, all classifiers in the ensemble were evaluated as incompetent

for at least one test example in 9 data sets. Consequently, assuming that the

estimation of the competences was correct for these examples, one can claim that

the random classifier used in the DES-P and DES-KL systems outperformed all

other systems.

For the heterogeneous ensemble, the relatively worse performance of the systems

developed could be explained by the fact that the better-than-random classifiers

in the ensemble were rarely complementary, i.e. that there was a subset of

superior classifiers for all data sets. As a result, the performance of the systems

could deteriorate in the case where other than superior classifiers were selected

from the ensemble. For example, the nearest neighbour and the two Parzen

classifiers often outperformed all the other classifiers in the ensemble. For the

Dermatology, Iris and Yeast data sets, the single best classifier in the ensemble

also outperformed the DES-P and DES-KL systems. This problem could be

overcome by increasing the number of classifiers in the ensemble and introducing

additional diversity by, for example, using only a subset of features for training [6,

28]. The performance of the systems developed could also deteriorate for highly

skewed data sets (i.e. for data sets with highly different class prior probabilities

p_j) such as the EColi and Yeast. This is because the value of 1/M could be

lower than the performance of almost any classifier. This problem could be rectified

by estimating prior probabilities p_j using the training data set and eliminating

classifiers ψ_l from the ensemble with P_c(ψ_l) ≤ p_i, where p_i = max_{j=1,...,M} p_j.
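
The proposed remedy can be sketched as a simple filter. This is only an illustration: it assumes that the probability of correct classification of each classifier is approximated by an accuracy estimate, and the names and toy numbers are hypothetical.

```python
from collections import Counter


def max_class_prior(train_labels):
    """Estimate the largest class prior from the training labels."""
    counts = Counter(train_labels)
    return max(counts.values()) / len(train_labels)


def drop_weak_classifiers(accuracies, train_labels):
    """Keep only classifiers whose estimated accuracy exceeds the largest prior."""
    threshold = max_class_prior(train_labels)
    return [name for name, acc in accuracies.items() if acc > threshold]


# Toy skewed problem: three classes with priors 0.7, 0.2 and 0.1.
labels = ["a"] * 70 + ["b"] * 20 + ["c"] * 10
accuracies = {"psi_1": 0.65, "psi_2": 0.70, "psi_3": 0.82}
kept = drop_weak_classifiers(accuracies, labels)
```

Here the threshold is 0.7 rather than 1/3, so the first two classifiers are eliminated even though both are better than random guessing among three classes.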

The two systems developed typically had the highest classification accuracies for

all ensemble sizes. For homogeneous ensembles, their performance improved

noticeably with increased ensemble size. This agrees with the fact that for

bagging classification trees, the rate of the performance improvement is typically

at its highest for the first 30–40 trees [1, 5]. For heterogeneous ensembles, the

performance of each MCS evaluated remained almost the same for all ensemble

sizes. A possible explanation is that a subset of superior classifiers dominated

the ensembles and subsequently, the ensemble size had little or no effect on the

performance of the MCSs. The DES-P and DES-KL systems performed similarly

for all ensemble sizes and types. This indicates that the type of comparison

method used in the systems has little effect on their performance.

The MCR does not require feature vectors and the DES-P and DES-KL systems

can be used for classification in dissimilarity space. This is of practical importance

since the direct measurement of distances between examples may be preferred

over feature extraction in some problems, e.g. classification of electrocardiogram

[40] and electroencephalogram signals [41, 42] or trabecular bone texture in knee

radiographs [43].

7. Conclusion

From the work conducted, the following conclusions can be drawn:

1. A measure of competence based on random classification (MCR) for

dynamic ensemble selection was successfully developed. The use of the

MCR with majority voting rule was justified theoretically.

2. Two systems based on the MCR were developed, i.e. DES-Performance

(DES-P) and DES-Kullback-Leibler (DES-KL). They showed the best overall

performances when compared against six MCSs on 14 data sets. It appears

that the systems are well suited for the classification of a wide range of data

sets.

3. The MCR uses distances between examples and the DES-P and DES-KL

systems can be used for classification in both feature and dissimilarity

spaces.

Acknowledgements

Financial support from the School of Mechanical and Chemical Engineering, The

University of Western Australia, is greatly appreciated.

8. References



[2] J Kittler, M Hatef, R P W Duin, and J Matas. “On combining classifiers”.


pp. 226–239.

[3] T Woloszynski and M Kurzynski. “Application of combining classifiers

using dynamic weights to the protein secondary structure prediction -

comparative analysis of fusion methods”. In: 7th International Symposium

on Biological and Medical Data Analysis. 2006, pp. 83–91.

[4] A M P Canuto, M C C Abreu, L M Oliveira, J C Xavier, Jr., and A M Santos.

“Investigating the influence of the choice of the ensemble members in

accuracy and diversity of selection-based and fusion-based methods for

ensembles”. In: Pattern Recognition Letters 28 (2007), pp. 472–486.


pp. 123–140.

[6] Y Freund and R E Schapire. “A decision-theoretic generalization of on-line

learning and an application to boosting”. In: Journal of Computer and System

Sciences 55 (1997), pp. 119–139.

[7] T K Ho. “The random subspace method for constructing decision forests”.


pp. 832–844.

[8] L I Kuncheva. “A theoretical study on six classifier fusion strategies”. In:


pp. 281–286.

[9] R E Banfield, L O Hall, K W Bowyer, and W P Kegelmeyer. “Ensemble

diversity measures and their application to thinning”. In: Information Fusion

6 (2005), pp. 49–62.

[10] D Ruta and B Gabrys. “Classifier selection for majority voting”. In:

Information Fusion 6 (2005), pp. 63–81.

[11] G Martinez-Munoz, D Hernandez-Lobato, and A Suarez. “An analysis

of ensemble pruning techniques based on ordered aggregation”. In:


pp. 245–259.

[12] D D Margineantu and T G Dietterich. “Pruning adaptive boosting”. In:

Fourteenth International Conference on Machine Learning. 1997, pp. 211–218.

[13] G Martinez-Munoz and A Suarez. “Aggregation ordering in bagging”. In:

IASTED International Conference on Artificial Intelligence and Applications.

2004, pp. 258–263.

[14] G Tsoumakas, L Angelis, and I Vlahavas. “Selective fusion of

heterogeneous classifiers”. In: Intelligent Data Analysis 9 (2005), pp. 511–525.

[15] I Partalas, G Tsoumakas, and I Vlahavas. “Pruning an ensemble of

classifiers via reinforcement learning”. In: Neurocomputing 72 (2009),

pp. 1900–1909.

[16] Z Zhou, J Wu, and W Tang. “Ensembling neural networks: many could be

better than all”. In: Artificial Intelligence 137 (2002), pp. 239–263.

[17] Y Zhang, S Burer, and W N Street. “Ensemble pruning via semi-definite

programming”. In: Journal of Machine Learning Research 7 (2006),

pp. 1315–1338.

[18] L I Kuncheva. “Clustering-and-selection model for classifier combination”.

In: Fourth International Conference on Knowledge-Based Intelligent Engineering

Systems and Allied Technologies. 2000, pp. 185–188.

[19] R Liu and B Yuan. “Multiple classifiers combination by clustering and

selection”. In: Information Fusion 2 (2001), pp. 163–168.

[20] B V Dasarathy and B V Sheela. “A composite classifier system design:

concepts and methodology”. In: Proceedings of the IEEE 67 (1979),

pp. 708–713.

[21] K Woods, W P Kegelmeyer, Jr., and K Bowyer. “Combination of multiple

classifiers using local accuracy estimates”. In: IEEE Transactions on Pattern

Analysis and Machine Intelligence 19 (1997), pp. 405–410.

[22] G Giacinto and F Roli. “Dynamic classifier selection based on multiple

classifier behaviour”. In: Pattern Recognition 34 (2001), pp. 1879–1881.

[23] L Didaci, G Giacinto, F Roli, and G L Marcialis. “A study on the

performances of dynamic classifier selection based on local accuracy

estimation”. In: Pattern Recognition 38 (2005), pp. 2188–2191.

[24] A H R Ko, R Sabourin, and A S Britto, Jr. “From dynamic classifier

selection to dynamic ensemble selection”. In: Pattern Recognition 41 (2008),

pp. 1718–1731.

[25] E Kim and Y Lee. “Multiple classifier fusion method based on local

competence”. In: IASTED International Conference. 2002, pp. 404–409.

[26] M C P de Souto, R G F Soares, A Santana, and A M P Canuto. “Empirical

comparison of dynamic classifier selection methods based on diversity and

accuracy for building ensembles”. In: International Joint Conference on Neural

Networks. 2008, pp. 1480–1487.

[27] E M dos Santos, R Sabourin, and P Maupin. “A dynamic

overproduce-and-choose strategy for the selection of classifier ensembles”.

In: Pattern Recognition 41 (2008), pp. 2993–3009.

[28] L Breiman. “Random forests”. In: Machine Learning 45 (2001), pp. 5–32.

[29] T Woloszynski and M Kurzynski. “On a new measure of classifier

competence applied to the design of multiclassifier systems”. In: 15th

International Conference on Image Analysis and Processing. 2009, pp. 995–1004.

[30] P C Smits. “Multiple classifier systems for supervised remote sensing image

classification based on dynamics classifier selection”. In: IEEE Transactions

on Geoscience and Remote Sensing 40 (2002), pp. 801–813.

[31] A Asuncion and D J Newman. UCI Machine Learning Repository. 2007. URL:

http://www.ics.uci.edu/~mlearn/MLRepository.html.

[32] L I Kuncheva. Ludmila Kuncheva Collection. 2004. URL:

http://www.bangor.ac.uk/~mas00a/activities/patrec1.html.

[33] P J Boland. “Majority systems and the Condorcet Jury Theorem”. In: Journal

of the Royal Statistical Society: Series D (The Statistician) 38 (1989), pp. 181–189.

[34] C List and R E Goodin. “Epistemic democracy: generalizing the Condorcet

Jury Theorem”. In: Journal of Political Philosophy 9 (2001), pp. 277–306.

[35] L Devroye, L Gyorfi, and G Lugosi. A Probabilistic Theory of Pattern

Recognition (Stochastic Modelling and Applied Probability). Springer, 1997.

[36] R P W Duin, P Juszczak, P Paclik, E Pekalska, D de Ridder, D M J Tax,

and S Verzakov. PR-Tools4.1, A Matlab Toolbox for Pattern Recognition.

http://prtools.org. 2007.

[37] R O Duda, P E Hart, and D G Stork. Pattern Classification.


[38] J Demsar. “Statistical comparison of classifiers over multiple data sets”. In:

Journal of Machine Learning Research 7 (2006), pp. 1–30.

[39] E Alpaydin. “Combined 5x2cv F test for comparing supervised

classification learning algorithms”. In: Neural Computation 11 (1999),

pp. 1885–1892.

[40] S Fang and H Chan. “Human identification by quantifying similarity and

dissimilarity in electrocardiogram phase space”. In: Pattern Recognition 42

(2009), pp. 1824–1831.

[41] J L Hernandez, R Biscay, J C Jimenez, P Valdes, and R G de Peralta.

“Measuring the dissimilarity between EEG recordings through a non-linear

dynamical system approach”. In: International Journal of Bio-Medical

Computing 38 (1995), pp. 121–129.

[42] R G Geocadin, R Ghodadra, T Kimura, H Lei, D L Sherman, D F Hanley, and

N V Thakor. “A novel quantitative EEG injury measure of global cerebral

ischemia”. In: Clinical Neurophysiology 111 (2000), pp. 1779–1787.




9. List of tables

Table 1: A brief description of the data sets used.

Data set                 Source  Examples  Features  Classes
Breast Cancer Wisconsin  UCI     699       9         2
Dermatology              UCI     366       34        6
EColi                    UCI     336       7         8
Glass                    UCI     214       9         6
Ionosphere               UCI     351       34        2
Iris                     UCI     150       4         3
Laryngeal3               LKC     353       16        3
OptDigits                UCI     3823      64        10
Page Blocks              UCI     5473      10        5
Segmentation             UCI     2310      19        7
Thyroid                  LKC     215       5         3
Vowel                    UCI     990       10        11
Wine                     UCI     178       13        3
Yeast                    UCI     1484      8         10

Table 2: Classification accuracies (%) obtained for MCSs using a homogeneous ensemble containing 11 classifiers. The best result for each data set is marked with an asterisk.

Data set      SB       MV       DCS-LA   DCS-MCB  DCS-MLA  DES-KE   DES-P    DES-KL
Breast C.W.   94.82    95.71    94.85    94.65    94.88    95.16    95.85*   95.71
Dermatology   73.43    85.46    84.10    84.15    83.83    85.90    88.85    88.91*
EColi         71.07    78.81    76.08    75.90    74.53    75.97    81.25*   80.48
Glass         58.12    66.80    65.69    68.49    66.44    68.29    70.90*   70.60
Ionosphere    86.10    87.47    86.72    87.92    85.81    87.64    87.98    88.21*
Iris          90.13    91.07    91.73    92.00    92.00    91.87    92.00    92.67*
Laryngeal3    66.35    69.35    63.91    63.29    63.97    65.16    70.54*   69.92
OptDigits     62.55    86.87    76.17    78.98    75.80    79.84    89.40*   87.28
Page Blocks   95.12    96.08    95.82    95.78    95.73    96.07    96.15    96.35*
Segmentation  86.19    95.52    90.68    91.79    90.99    95.15    96.18    96.39*
Thyroid       90.52    91.72    90.89    90.61    90.71    92.56    93.02    93.21*
Vowel         50.65    71.76    66.81    71.49    71.74    73.49    81.03    83.47*
Wine          86.04    93.69    89.99    90.67    89.08    93.47    94.37*   94.26
Yeast         45.12    53.99    47.97    49.41    48.36    48.60    55.18*   53.81
Average       75.44    83.16    80.10    81.08    80.28    82.08    85.19*   85.09

Table 3: Classification accuracies (%) obtained for MCSs using a heterogeneous ensemble containing 11 classifiers. The best result for each data set is marked with an asterisk.

Data set      SB       MV       DCS-LA   DCS-MCB  DCS-MLA  DES-KE   DES-P    DES-KL
Breast C.W.   96.25*   96.25*   94.79    94.74    94.82    94.94    96.25*   95.85
Dermatology   95.85*   95.68    93.55    93.60    93.44    94.65    95.68    94.97
EColi         84.05    84.35*   78.87    78.93    78.34    79.29    84.17    83.27
Glass         66.95    67.03    67.70    68.71    68.56    67.91    69.11*   69.01
Ionosphere    85.30    84.67    83.70    84.22    83.87    83.99    86.72*   86.10
Iris          96.80*   95.07    94.93    95.07    95.33    95.87    95.33    96.27
Laryngeal3    70.88    73.09*   65.21    65.10    65.27    65.10    70.48    68.84
OptDigits     96.43    97.23    95.99    95.76    95.90    97.31    97.51*   97.08
Page Blocks   96.09    96.06    95.97    96.00    95.93    96.02    96.04    96.47*
Segmentation  93.70    94.89    94.12    94.36    94.20    95.22    95.60*   95.25
Thyroid       94.52    91.36    94.42    94.42    94.42    94.70    95.08    95.63*
Vowel         87.82    87.94    90.53    88.99    91.19    92.20    92.81    93.17*
Wine          95.86    96.64    95.63    95.39    95.41    96.85    97.64*   96.97
Yeast         58.02*   57.45    50.58    50.78    50.32    50.21    55.53    52.43
Average       87.04    86.98    85.43    85.43    85.50    86.02    87.71*   87.24

Table 4: Classification accuracies (%) averaged over all data sets obtained for different ensemble sizes. The best result for each ensemble size is marked with an asterisk.

Ensemble size  SB       MV       DCS-LA   DCS-MCB  DCS-MLA  DES-KE   DES-P    DES-KL
Homogeneous
11             75.44    83.16    80.10    81.08    80.28    82.08    85.19*   85.09
22             75.36    84.38    80.13    80.80    80.14    82.79    86.29    86.35*
33             76.35    85.31    81.04    81.20    81.05    84.12    87.07    87.34*
44             76.15    85.30    80.72    81.28    81.09    84.41    87.26    87.38*
55             76.14    85.78    80.98    81.38    80.90    84.90    87.44    87.76*
100            76.46    85.54    80.41    80.99    80.60    85.38    87.34    87.58*
200            76.93    85.96    80.98    81.33    80.94    85.34    87.78    88.05*
500            76.97    85.88    80.91    81.29    80.73    85.56    87.41    87.89*
Heterogeneous
11             87.04    86.98    85.43    85.43    85.50    86.02    87.71*   87.24
22             87.30    87.45    85.86    85.55    85.57    86.60    88.00*   87.56
33             87.38    87.61    85.44    85.40    85.63    86.84    88.05*   87.62
44             87.49    87.69    85.63    85.51    85.55    86.85    88.26*   87.70
55             87.38    87.65    85.53    85.53    85.48    86.83    88.24*   87.65

Table 5: The percentage of times each classifier was selected by the DCS and DES based systems averaged over all ensemble sizes and data sets.

Classifier     DCS-LA   DCS-MCB  DCS-MLA  DES-KE   DES-P    DES-KL
Homogeneous
TREE           –        –        –        47       97       88
Heterogeneous
LDC            7        8        8        60       97       90
QDC            7        7        7        50       87       79
NMC            6        7        6        54       91       82
1-NN           20       18       19       74       99       97
5-NN           7        8        8        60       98       92
15-NN          6        7        6        56       95       89
PARZEN1        18       17       17       73       99       97
PARZEN2        19       17       19       74       99       97
TREE           7        8        7        47       97       87
BPNN1          2        2        2        19       52       40
BPNN2          1        1        1        21       51       41

CHAPTER 5

DISSIMILARITY-BASED MULTIPLE CLASSIFIER

SYSTEM FOR TRABECULAR BONE TEXTURE IN KNEE

RADIOGRAPHS: DETECTION AND PREDICTION OF

OSTEOARTHRITIS


and Marek Kurzynski, PhD2




Poland

Submitted to Proceedings of the Institution of Mechanical Engineers, Part H,

Journal of Engineering in Medicine (Chapter 5).

Abstract

A dissimilarity-based multiple classifier (DMC) system was developed and used

for detection and prediction of knee osteoarthritis (OA). The DMC system

calculates distances between trabecular bone (TB) texture images and uses

the distances with accurate classifiers. To generate the classifiers, a specially

developed approach is used to obtain an ensemble of diverse classifiers and

then select and combine accurate classifiers from the ensemble. The DMC

system was evaluated using standardised radiographs of human knees taken

at baseline and follow-up four years later. Three experienced readers graded

the medial and lateral compartments of the knees for joint space narrowing

(JSN), osteophytes and OA according to an atlas of radiographic features and

the Kellgren and Lawrence (KL) scale. From the radiographs, TB texture images

were selected under both compartments using an automated region selection

method. Texture images selected under 68 healthy (KL grade 0), 69 OA (KL

grade 2 or 3), 60 non-progressive (no change in the sum of JSN and osteophyte

grades between baseline and follow-up) and 59 progressive (an increase in

the sum) compartments were used. The DMC system exhibited statistically

significantly (P<0.05) higher classification accuracies in discriminating between

healthy and OA (90.51%) and between non-progressive and progressive (80.00%)

compartments than those of two benchmark systems. The results indicate the

potential of the DMC system as a decision-support tool for detection and prediction of

knee OA.

Key words: knee osteoarthritis, radiographs, texture, multiple classifier system.

1. Introduction

Knee osteoarthritis (OA) is a chronic and progressive joint disease of unknown

origin characterised by cartilage degradation and bone changes [1–5]. Detection

and prediction of OA are of particular importance since they could prompt earlier

treatment to prevent the cartilage and bone destruction that lead to disability.

Previous studies suggested that bone changes occur first on the development

pathway to knee OA [1, 6], i.e. that changes in subchondral bone precede

osteophytes and loss of cartilage volume. Therefore, bone changes were analysed

for knee OA detection and prediction. For the analysis, knee radiographs were

used since they are cheap, non-invasive, widely accessible, and they contain

two-dimensional bone texture that is directly related to three-dimensional bone

structure [7–11].

Knee OA detection was defined as a problem of classifying a knee radiograph into

healthy (Kellgren and Lawrence (KL) [12] grade 0) or OA (KL grade 2 or 3) class,

and a weighted neighbor distance using a compound hierarchy of algorithms

representing morphology (WND-CHARM) classification system was used [13].

The system extracts 2885 texture features from each knee radiograph and uses

them with a nearest neighbour classifier. The WND-CHARM system was also

used to predict knee OA development defined as an increase in KL grade from 0

at baseline to 3 at follow-up 20 years later [14]. The system was trained on knee

radiographs taken at baseline and then it classified a given knee radiograph as

no-OA-development (KL grade 0 at follow-up) or OA-development (KL grade

3 at follow-up). In other work, a regression model was trained on knee (KL

grade ≥1) radiographs taken at baseline and then it was used to predict OA by

assigning a knee radiograph into no-OA-progression or OA-progression group

[15]. The disease progression was defined as an increase in the sum of joint space

narrowing (JSN) and osteophyte grades between baseline and follow-up three

years later. The regression model uses shape parameters and fractal signature

analysis.

However, the uses of the classification system and the regression model are limited.

The features of the WND-CHARM system are redundant and their computation is

time-consuming [16]. The shape parameters used in the regression model describe

bone anisotropy only in the horizontal and vertical directions, and they are

calculated using the box-counting technique, which depends strongly on the image

signal-to-noise ratio and trabecular marrow pore size [17]. The recently

developed Hurst orientation transform [18] and variance orientation transform

[19] methods overcome most of these problems, but the results produced depend

on angular space quantisation.

One possible solution for knee OA detection and prediction is a

dissimilarity-based texture classification [20–22]. Unlike in other approaches,

distances between TB texture images are used for classification, avoiding

the problems associated with the use of image features. In a previous study,

the distances were calculated using a scale- and rotation-invariant signature

dissimilarity measure (SDM) [22]. For detection of knee OA, the SDM combined

with a support vector machine (SVM) classifier achieved 85.4% classification

accuracy, outperforming the benchmark WND-CHARM system. For prediction

of knee OA progression, a generalised linear model based on roughness and

orientation texture features obtained from the SDM achieved the accuracy of 0.74

AUC (area under the receiver operating characteristic curve) [23].

Although the results obtained are promising, further work is required to

increase the accuracies. This could be achieved by combining the SDM with

a classifier ensemble. The rationale behind this is that classifier ensembles are

generally more accurate than a single classifier, as shown for a wide range of

classification problems [24–26]. However, this approach requires both generation

and combination of diverse and accurate classifiers.

To address these problems, a dissimilarity-based multiple classifier (DMC)

system was developed in this study. The system consists of the following

three components: 1) measurement of distances between TB texture images,

2) generation of a diverse classifier ensemble and 3) selection of accurate

classifiers from the ensemble and combination of the classifiers selected. For

the measurement of the distances, the SDM method is used. A diverse classifier

ensemble is generated using prototype selection [27], bootstrapping of training

set [28] and heterogeneous classifiers [29]. A measure of competence based on

random classification (MCR) [30] is used to select accurate classifiers from the

ensemble. The classifiers selected are combined using the majority voting rule. To

evaluate the performance of the DMC system, two data sets of knee radiographs

were chosen. The first data set, used for OA detection, contains radiographs

taken from healthy (KL grade 0) and OA (KL grade 2 or 3) knees. The second

data set, used for prediction of OA progression, contains radiographs of knees

with non-progressive and progressive OA. The disease progression was defined

as an increase in the sum of JSN and osteophyte grades between baseline and

follow-up four years later.

2. Materials and methods

2.1 Subjects and radiographs

Informed consent was obtained from all subjects studied. For each subject,

standardised anteroposterior knee radiographs were taken in semi-flexed (30°)

position using a Shimadzu Corporation (Kyoto, Japan) x-ray machine (model

P-20) and a radiographic frame [31] at Perth Radiological Clinic, Subiaco, Western

Australia.

Detection of knee OA

102 knee radiographs were taken from 51 subjects. For each radiograph, the

medial and lateral tibiofemoral compartments were graded on the KL scale according

to the 1995 Osteoarthritis Research Society International (OARSI) atlas [32]. The

compartments were graded by two readers with 10 years of experience each.

The discrepancies in grades were adjudicated by a third reader with 15 years

of experience. The grades were used to divide the subjects into healthy and

OA groups. The healthy group contained 17 subjects who had KL grade 0 (no

radiographic OA) in both tibiofemoral compartments in both knees. The OA

group contained 34 subjects who had KL grade 2 or 3 (minimal or moderate

radiographic OA) in at least one tibiofemoral compartment in at least one knee.

Subject characteristics for each group are listed in Table 1.

A TB texture region of size 256×256 pixels was selected under each of the

medial and lateral compartments. For OA subjects, the bone region was selected

only under the compartments diagnosed with radiographic OA. An example

of a knee radiograph on which the regions are selected is shown in Fig. 1. The

region selection was performed using an automated region selection method

[33]. The method uses an active shape model for the delineation of cortical bone

plates and a fine region adjustment for fibula head, periarticular osteopenia, and

subchondral bone sclerosis. The regions selected formed classes of 68 healthy

and 69 OA texture images, respectively. Examples of the images are shown in

Figs. 2(a)–(d). The classes were previously used to evaluate the SDM method in

knee OA detection [22].

Prediction of knee OA progression

From 50 subjects, 100 knee radiographs were taken at baseline (from September

2000 to November 2001) and another 100 at follow-up four years later. Each

knee radiograph was graded for tibiofemoral joint space narrowing (JSN) and

osteophytes (grades 0 to 3, where 0 indicates no JSN or osteophytes) in the

medial and lateral compartments according to the 1995 OARSI atlas. The same

three readers graded the radiographs. Based on the grades, the compartments were

divided into non-progressive and progressive groups. The first group contained

141 compartments with no increase in the sum of JSN and osteophyte grades

between baseline and follow-up. The second group contained 59 compartments

with an increase in the sum between the two examinations. 11 knees had both

non-progressive and progressive compartments. The groups had different subject

characteristics, i.e. age, BMI (body mass index) and gender at baseline and this

could cause bias in results [34, 35]. Therefore, 81 non-progressive compartments

from 14 subjects were excluded (Table 2) to match the characteristics. This left the

total number of compartments at 119.

Under each of the compartments a TB texture region of size 256×256 pixels was

selected using the automated region selection method. The selection was done

only for the radiographs taken at baseline. The regions formed non-progressive

and progressive classes containing 60 and 59 texture images, respectively. Their

examples are shown in Figs. 2(e)-(h).

2.3 DMC system

A flowchart of the DMC system is shown in Fig. 3. Components of the system are

described in the following sections.

Measurement of distances between images

Distances between TB texture images are measured using the SDM method [22].

The method produces a matrix D = (d_ij)_{N×N} for N images, where d_ij is the

distance between the i-th and j-th images, as follows.

First, a scale-space representation of each image is calculated as K convolutions

of the image with the Gaussian kernel functions [36]. The functions differ in

the scale parameter σ, which is taken from a sequence of predefined scales σ_1, ..., σ_K.

In this study, the sequence was set to σ_k = 2 × 1.07^k, k = 1, ..., 25, which

corresponds to TB image sizes ranging from 0.11 to 1.08 mm. Previous studies

showed that OA changes in TB are most significant in this size range [4, 37].
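
The sequence of scales can be generated as sketched below. The sketch assumes that the expression for the scales is geometric, i.e. that the index k is an exponent (σ_k = 2 × 1.07^k); the function name and its default arguments are chosen for this example. The conversion from σ to a physical size in millimetres depends on the pixel size of the radiographs and is not attempted here.

```python
def predefined_scales(base=2.0, ratio=1.07, count=25):
    """Return sigma_k = base * ratio**k for k = 1, ..., count.

    Assumption: k acts as an exponent, giving a geometric sequence of scales.
    """
    return [base * ratio ** k for k in range(1, count + 1)]


sigmas = predefined_scales()
smallest, largest = sigmas[0], sigmas[-1]
```

With these defaults the smallest scale is 2.14 and each subsequent scale is 7% larger than the previous one.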

Next, scale and rotation invariant gradient and Laplacian differential operators

[22] are calculated for each image representation. The operators calculated act

as edge and smoothness detectors, respectively. Maximum absolute values of

the operators are then found over all scales for each pixel. This step allows for

detecting edges and smooth regions that are most prominent in the texture image.

The values found are used to calculate roughness and orientation measures for

each pixel. The roughness measure is defined as a difference between the two

maximum absolute values, and it is negative if a neighbourhood of the pixel is

smooth and positive otherwise. The orientation measure is defined as an angle of

the gradient with the maximum absolute value, and it is calculated with respect to

the direction that captures most texture changes in the image. The two measures

are invariant to in-plane rotation and scale within the predefined range of scales.
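
The per-pixel measures described above can be sketched as follows. This is an illustration under stated assumptions rather than the SDM implementation of [22]: the order of the subtraction is inferred from the sign convention given in the text (negative for smooth neighbourhoods), the reference direction is passed in as a parameter, and orientations are wrapped to the interval [0, π). The function names are hypothetical.

```python
import math

def roughness(gradient_by_scale, laplacian_by_scale):
    """Roughness of one pixel from its operator responses at all scales."""
    max_edge = max(abs(g) for g in gradient_by_scale)      # strongest edge response
    max_smooth = max(abs(v) for v in laplacian_by_scale)   # strongest smoothness response
    # Negative when the smoothness response dominates, positive otherwise.
    return max_edge - max_smooth

def orientation(gradient_vectors, reference_angle=0.0):
    """Angle of the strongest gradient relative to a reference direction."""
    gx, gy = max(gradient_vectors, key=lambda g: math.hypot(g[0], g[1]))
    # Wrapping to [0, pi) treats opposite directions as the same orientation (assumption).
    return (math.atan2(gy, gx) - reference_angle) % math.pi
```

For example, responses of (0.1, 0.5, -0.9) for the gradient and (0.2, -0.3) for the Laplacian give a roughness of 0.9 - 0.3 = 0.6, i.e. an edge-dominated pixel.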

Next, for each image representation, histograms of the measures are used to

produce roughness and orientation signatures. Finally, the distance d_ij is defined

as the sum of the distance between the two roughness signatures and the distance between the two

orientation signatures produced for the i-th and j-th images. The distances between

the signatures are calculated using the Earth Mover's Distance [38]. The matrix

D produced is invariant to a range of imaging conditions such as exposure,

magnification, noise, and blur [22]. For TB texture images, the matrix is affected

by bone properties such as trabecular thickness, separation and orientation that

change with knee OA [3].
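
The final step, summing two signature distances, can be sketched in Python as follows. The sketch makes simplifying assumptions that are not taken from [22] or [38]: each signature is represented as a histogram over the same unit-spaced bins, the histograms are normalised to equal mass, and the orientation bins are treated as linear rather than circular. Under these assumptions the Earth Mover's Distance reduces to the sum of absolute differences between cumulative histograms. The function names are hypothetical.

```python
def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two histograms on identical unit-spaced bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    carried = 0.0   # mass that still has to be moved to the next bin
    cost = 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a / total_a - b / total_b
        cost += abs(carried)
    return cost

def signature_distance(rough_i, rough_j, orient_i, orient_j):
    """d_ij as the sum of the roughness and orientation signature distances."""
    return emd_1d(rough_i, rough_j) + emd_1d(orient_i, orient_j)
```

Moving all mass from the first to the third bin, emd_1d([1, 0, 0], [0, 0, 1]), costs two bin widths, and the distance between identical signatures is zero.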

Generation of a diverse classifier ensemble

The diverse classifier ensemble consists of three different subsets of classifiers.

These subsets are generated using prototype selection [27], bootstrapping of

training set [28] and heterogeneous classifiers [29], respectively.

Subset 1: Prototype selection

For this subset, first a percentage of TB texture images, called prototypes, is

randomly selected from each class. Then, classifiers are trained on the distances

between the prototypes and the remaining TB texture images. 100 linear and

100 quadratic discriminant classifiers were used with the prototypes consisting

of 20% and 10% of all TB texture images, respectively. The percentages of

prototypes and the number and types of classifiers used are a trade-off between

good performance, high diversity and low computational complexity [27].
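
The prototype-based representation can be illustrated with a short Python sketch. It assumes that the distance matrix D is available as a list of lists and that prototypes are drawn without replacement within each class; the rounding rule and the function names are hypothetical and not taken from [27].

```python
import random

def select_prototypes(labels, fraction, rng):
    """Randomly pick the given fraction of image indices from each class."""
    prototypes = []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        count = max(1, round(fraction * len(members)))   # at least one per class
        prototypes.extend(rng.sample(members, count))
    return sorted(prototypes)

def dissimilarity_features(D, prototypes):
    """Describe each remaining image by its distances to the prototypes."""
    chosen = set(prototypes)
    remaining = [i for i in range(len(D)) if i not in chosen]
    return {i: [D[i][p] for p in prototypes] for i in remaining}
```

Each classifier in the subset would be trained on the feature vectors returned by dissimilarity_features for a different random draw of prototypes, which is what makes the subset diverse.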

Subset 2: Bootstrapping of training set

This subset contains 50 decision tree classifiers with the Gini splitting criterion

and the pruning level set to 5 [39]. The classifiers were trained in a

five-dimensional feature space obtained by projecting the distances taken from

the matrix D. For the distance projection, Sammon's nonlinear mapping was used

[40]. Each classifier was trained on a bootstrapped sample of the training set [28].

Previous studies showed that a combination of the classifiers outperforms the

single best classifier [26, 30].
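
Bootstrapping of the training set can be sketched as below. The sketch only draws the index samples, one per classifier, by sampling with replacement; the training of the decision trees is omitted. The function names and the seed are hypothetical.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n training indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def bootstrap_samples(n, n_classifiers, seed=0):
    """One bootstrap sample of the training set per classifier."""
    rng = random.Random(seed)
    return [bootstrap_indices(n, rng) for _ in range(n_classifiers)]
```

For the subset described above, bootstrap_samples would be called with n_classifiers=50 and n equal to the size of the current training fold.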

Subset 3: Heterogeneous classifiers

11 heterogeneous classifiers are included in this subset [39]: (1) linear, (2)

quadratic, (3) nearest mean, (4-6) k-nearest neighbour classifiers with k = 1, 5, 15,

(7 and 8) Parzen classifiers with the optimal smoothing parameter h or h/2, (9)

decision tree classifier with the Gini splitting criterion and the pruning level set

to 5, (10) SVM classifier with radial basis function (RBF) kernel, (11) feed-forward

backpropagation neural network classifier containing one hidden layer with 10

neurons. Each classifier was trained in a two-dimensional feature space that was

obtained by projecting the distances using Sammon's mapping. Bootstrapping

of the training set was used. It was shown that the heterogeneous classifiers perform

well for a number of benchmark data sets [26, 30].
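
The quality of a distance projection such as Sammon's mapping [40] is usually judged by Sammon's stress, which compares the original distances with the Euclidean distances between the projected points. The Python sketch below evaluates this stress for a given projection; it does not perform the iterative minimisation, and the function name is hypothetical.

```python
import math

def sammon_stress(D, points):
    """Sammon's stress between distances D and Euclidean distances of points."""
    n = len(D)
    scale = 0.0
    error = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            original = D[i][j]
            if original == 0:
                continue  # coincident items are skipped to avoid division by zero
            projected = math.dist(points[i], points[j])
            scale += original
            error += (original - projected) ** 2 / original
    return error / scale
```

A stress of zero means that the two-dimensional (or five-dimensional) feature space reproduces the distances in D exactly; larger values indicate more distortion.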

Selection of accurate classifiers and combination of the classifiers selected

To select accurate classifiers from the diverse classifier ensemble, the MCR

method is used [30]. For each TB texture image, the method estimates local

classification accuracies of all classifiers. Accuracies of heterogeneous classifiers

are estimated using weighted k = 10 nearest neighbours. For homogeneous

classifiers, the estimation is performed by measuring how "close" the classifiers

are to a random classifier in an information-theoretic sense. The classifiers with

local accuracies that are higher-than-random are selected from the ensemble and

combined using the majority voting rule. In previous studies, the MCR method

and the voting rule outperformed other methods used for combining classifiers

[30].
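
The selection and combination step can be sketched in Python as follows. The sketch takes the local accuracy estimates as given inputs and does not reproduce the MCR estimation itself [30]. It assumes that "random" means an accuracy of 1/(number of classes), and it falls back to the whole ensemble when no classifier passes the threshold; both choices, as well as the function name, are assumptions made for illustration.

```python
from collections import Counter

def select_and_vote(predictions, local_accuracies, n_classes=2):
    """Majority vote over classifiers whose local accuracy beats random guessing."""
    random_level = 1.0 / n_classes
    selected = [p for p, acc in zip(predictions, local_accuracies)
                if acc > random_level]
    if not selected:
        selected = list(predictions)  # fallback (assumption): use every classifier
    # Ties are resolved in favour of the label that appears first among the votes.
    return Counter(selected).most_common(1)[0][0]
```

With predictions [1, 1, 0, 0, 0] and local accuracies [0.9, 0.8, 0.4, 0.3, 0.6], only the first, second and fifth classifiers are selected, so the combined decision is class 1 even though class 0 has more votes in the full ensemble.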

2.4 Comparison against other benchmark systems

The performance of the DMC system was compared against the following

systems used for classification of knee radiographs.

WND-CHARM system

This system is a benchmark for detection of knee OA [13] and prediction of knee

OA development [14]. In the system, 2885 features are extracted from each texture

image. The features include Chebyshev statistics, first four moments, multiscale

histograms, Haralick, Tamura and Zernike features, and they are calculated on

raw images, images processed by wavelet, Fourier and Chebyshev transforms

and combinations of the processed images [16]. A Fisher score is calculated for

each feature and 10% of features with the highest scores are selected. The selected

features are used with a weighted nearest neighbour classifier.
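
The feature selection step can be illustrated with a generic Fisher score. The Python sketch below uses a common textbook form (between-class scatter of the class means divided by the pooled within-class scatter); the exact weighting used in WND-CHARM [16] may differ, and the function names are hypothetical.

```python
def fisher_score(values, labels):
    """Between-class scatter of class means divided by within-class scatter."""
    overall = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        members = [v for v, y in zip(values, labels) if y == cls]
        mean = sum(members) / len(members)
        between += len(members) * (mean - overall) ** 2
        within += sum((v - mean) ** 2 for v in members)
    return between / within if within > 0 else float("inf")

def top_features(feature_columns, labels, fraction=0.10):
    """Indices of the given fraction of features with the highest Fisher scores."""
    scores = [fisher_score(col, labels) for col in feature_columns]
    keep = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:keep])
```

Here feature_columns holds one list of values per feature, so selecting 10% of 2885 features would keep 288 columns under this rounding rule.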

SDM-SVM system

This system achieved the best performance in detection of knee OA based on

classification of TB texture images [22]. In the system, the SVM classifier with RBF

kernel is trained on the distances between TB texture images measured using the

SDM method. The SVM classifier was chosen because of its good performance

for a wide range of binary classification problems.

3. Results

Classification accuracy (the percentage of correctly classified TB texture images),

specificity (the percentage of healthy/non-progressive images classified as

healthy/non-progressive) and sensitivity (the percentage of OA/progressive

images classified as OA/progressive) were calculated as averages over 5

repetitions of 2-fold cross-validation. Differences between the accuracies were

evaluated using 5×2 cv F test [41] with the significance level of P<0.05.
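
The three performance measures and the combined 5×2 cv F statistic [41] can be sketched in Python as follows. The sketch treats the OA/progressive class as the positive class (label 1). The F statistic is computed from the ten differences in error rates between two systems (two folds for each of five repetitions) and is compared with an F distribution with 10 and 5 degrees of freedom; the P value lookup is omitted. The function names are hypothetical.

```python
def classification_metrics(true_labels, predicted_labels, positive=1):
    """Accuracy, specificity and sensitivity in percent."""
    tp = fn = tn = fp = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
    }

def combined_5x2cv_f(differences):
    """F statistic from five (fold 1, fold 2) pairs of error-rate differences."""
    numerator = sum(p1 * p1 + p2 * p2 for p1, p2 in differences)
    denominator = 0.0
    for p1, p2 in differences:
        mean = (p1 + p2) / 2.0
        denominator += (p1 - mean) ** 2 + (p2 - mean) ** 2   # variance estimate s_i^2
    return numerator / (2.0 * denominator)
```

The statistic is undefined when the two fold differences are identical in every repetition, because all variance estimates are then zero.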

For detection of knee OA, the DMC system (90.51%) had statistically significantly

higher classification accuracy than those of the SDM-SVM (84.96%) and

WND-CHARM (67.74%) systems (Table 3). The system developed also had the

highest specificity and sensitivity values. For the DMC and SDM-SVM systems,

the specificity values were lower than their respective sensitivities.

The DMC system (80.00%) also had statistically significantly higher classification

accuracy than the SDM-SVM (71.09%) and WND-CHARM (63.87%) systems

for prediction of knee OA progression (Table 4). The highest specificity value

(82.00%) was obtained for DMC. The DMC and WND-CHARM systems had

sensitivity values lower than their respective specificities.
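
As a consistency check on the reported values, the accuracy of each system should equal the class-size-weighted mean of its specificity and sensitivity. The Python sketch below performs this check using the class sizes from Tables 1 and 2 (68 healthy and 69 OA images; 60 non-progressive and 59 progressive images) and the values from Tables 3 and 4. The function and variable names are hypothetical.

```python
def weighted_accuracy(specificity, sensitivity, n_negative, n_positive):
    """Accuracy implied by specificity and sensitivity for given class sizes."""
    return ((specificity * n_negative + sensitivity * n_positive)
            / (n_negative + n_positive))

# (specificity, sensitivity, reported accuracy) in percent
detection = {
    "WND-CHARM": (76.47, 59.13, 67.74),
    "SDM-SVM": (77.65, 92.17, 84.96),
    "DMC": (87.65, 93.33, 90.51),
}
prediction = {
    "WND-CHARM": (64.33, 63.39, 63.87),
    "SDM-SVM": (61.00, 81.36, 71.09),
    "DMC": (82.00, 77.97, 80.00),
}

# Absolute gap between implied and reported accuracy for every system
gaps = {}
for name, (spec, sens, acc) in detection.items():
    gaps[("detection", name)] = abs(weighted_accuracy(spec, sens, 68, 69) - acc)
for name, (spec, sens, acc) in prediction.items():
    gaps[("prediction", name)] = abs(weighted_accuracy(spec, sens, 60, 59) - acc)
```

All six gaps are below 0.02 percentage points, which is within the rounding of the tabulated values.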

The total computational time required for the WND-CHARM system to produce

the detection and the prediction results was 571.9 minutes. For the SDM-SVM

and DMC systems, this took 62.1 and 66 minutes, respectively. The computational

times were measured on an Intel Core2 Quad computer with 4 GB of RAM and 2.66

GHz clock using MATLAB.

4. Discussion

A new dissimilarity-based multiple classifier system, called a DMC system, for

TB texture images was developed and used to detect and predict knee OA.

Unlike other classification systems, the DMC system uses distances calculated

between images and accurate classifiers. For the calculation of the distances, the

SDM method was used. A special approach was applied to generate a diverse

ensemble consisting of three subsets of classifiers, and then accurate classifiers

were selected from the ensemble using the MCR method and combined according

to the majority voting rule.

The performance of the DMC system in detection of OA was investigated using

TB texture images extracted from radiographs of healthy (KL grade 0) and OA

(KL grade 2 or 3) knees. Results showed that the DMC system is more accurate (by

22.77 percentage points) than the benchmark WND-CHARM system. One possible reason for the

higher accuracy could be that the SDM method used in DMC quantifies bone

texture roughness and orientation for scales that were adjusted to trabecular

image sizes at which OA changes are most prominent [4]. In contrast, texture

features extracted in the WND-CHARM system were calculated for fixed scales

that cannot be adjusted. Previous studies showed that the roughness and

orientation of TB texture change at these trabecular sizes as a result of fenestration

and thinning of trabeculae and realignment of bone microstructure towards

the main loading direction in knee OA [1, 2, 4, 42]. Another reason could be

that the SDM method is invariant to a range of imaging conditions such as

exposure, magnification, noise, and blur encountered in a routine screening of

knee radiographs. The WND-CHARM system extracts features that are sensitive

to image rotation (e.g. Tamura features [43]) and angular space quantisation (e.g.

Haralick features [44]). Consequently, even a small change in imaging conditions

can be detrimental to the ability of the WND-CHARM system to discriminate

between images of healthy and OA knees.

The DMC system is also more accurate than the SDM-SVM system. This could

be explained by the fact that the latter system uses a single classifier instead of

a combination of classifiers. It was shown that combined classifiers reduce risks

associated with choosing a space that does not contain the optimal classifier and

falling into local error minima during training [24]. Another possible explanation

could be that the classifiers combined in DMC are able to reproduce arbitrary

decision boundaries and they are not sensitive to outliers. The single SVM

classifier used in the SDM-SVM, however, could perform poorly in detecting knee

OA where decision boundaries are piecewise smooth or the number of outliers is

unknown [45].

The performance of the DMC system was also investigated in prediction of

OA progression defined as an increase in the sum of JSN and osteophytes

grades between baseline and follow-up. The DMC system exhibited the best

performance. This could be explained in a similar way to the results for the detection

of knee OA, in particular by the fact that the DMC system is adjusted to the

scales at which OA progression affects roughness and orientation of TB texture.

The progression is manifested by the retention of horizontal trabeculae and

reorganisation of bone microstructure due to trabeculae being less well aligned

to the main loading direction [42].

For detection of knee OA, the DMC system had the highest specificity (87.65%)

and sensitivity (93.33%) values. In the case of prediction of knee OA progression,

the SDM-SVM (81.36%) and DMC (77.97%) systems had the two highest

sensitivity values. The higher sensitivity of the former system, however, came at

the cost of a very low specificity value (61.00%), much lower than the 82.00% obtained

for the system developed. The DMC values were higher than those obtained for

the shape parameters used in the previous study on the prediction of knee OA

progression [15]. For medical applications, however, slightly higher sensitivity

than specificity for OA prediction could be preferable. This is because the cost

of treating OA at later stages is usually higher than the cost of misdiagnosing a

non-progressive knee.

In conclusion, results obtained indicate that the DMC system developed could

be a useful decision-support tool for the assessment of risk and severity of

the disease. It has higher classification accuracies for knee OA detection (by

22.77 and 5.55 percentage points) and prediction (by 16.13 and 8.91 percentage points) than the benchmark

WND-CHARM and SDM-SVM systems. The DMC could also be useful in other

medical applications, e.g. diagnosis of interstitial lung disease based on chest

radiography [46], classification of breast lesions using ultrasound images [47]

and discrimination of dermoscopic images for skin lesions [48]. Future work will

focus on further evaluations of the DMC system, especially using large data sets

of knee radiographs from longitudinal studies.

Acknowledgements

The authors appreciate financial support from the School of Mechanical and

Chemical Engineering, University of Western Australia.

5. List of figures

Figure 1: An example of knee radiograph with the selected regions.

Figure 2: Examples of TB texture images from healthy [(a) and (b)], OA [(c) and (d)], non-progressive [(e) and (f)], and progressive [(g) and (h)] classes.

[Flowchart of the DMC system: knee radiographs → extraction of TB texture images → signature dissimilarity measure (SDM) → distances between TB texture images → diverse classifier ensemble (prototype selection, bootstrapping of training set, heterogeneous classifiers) → measure of competence based on random classification (MCR) → accurate classifiers → majority voting → classified TB texture images.]

Figure 3: A flowchart of the DMC system.

6. List of tables

Table 1: Subject characteristics for detection of knee OA.

Group     No. of TB texture images (KL grade)   Mean (SD) age in years   Mean (SD) BMI   Men/Women
Healthy   68 (0)                                40.8 (7.4)               24.6 (4.5)      0.70
OA        37 (2), 32 (3)                        44.5 (7.5)               27.1 (4.0)      0.69

Table 2: Subject characteristics for prediction of knee OA progression.

Group             No. of TB texture images (increase in the sum*)   Mean (SD) age in years   Mean (SD) BMI in kg/m²   Men/Women
Non-progressive   60 (0)                                            47.1 (6.3)               25.5 (1.8)               0.75
Progressive       41 (1), 12 (2), 5 (3), 1 (5)                      47.5 (6.0)               26.7 (2.8)               0.83

* Sum of JSN and osteophytes grades between baseline and follow-up four years later.

Table 3: Classification accuracies, specificities and sensitivities (in percent) of the WND-CHARM, SDM-SVM and DMC systems for detection of knee OA.

System      Accuracy   Specificity   Sensitivity
WND-CHARM   67.74      76.47         59.13
SDM-SVM     84.96      77.65         92.17
DMC         90.51      87.65         93.33

Table 4: Classification accuracies, specificities and sensitivities (in percent) of the WND-CHARM, SDM-SVM and DMC systems for prediction of knee OA progression.

System      Accuracy   Specificity   Sensitivity
WND-CHARM   63.87      64.33         63.39
SDM-SVM     71.09      61.00         81.36
DMC         80.00      82.00         77.97

7. References



(2004), S10–S19.










Cartilage 13 (2005), pp. 463–470.

[5] E L Radin and R M Rose. “Role of Subchondral Bone in the Initiation

and Progression of Cartilage Damage”. In: Clinical Orthopaedics and Related

Research 213 (1986), pp. 34–40.



Cartilage 15 (2007), pp. 479–486.

[7] L Apostol, V Boudousq, O Basset, C Odet, S Yot, J Tabary, J M Dinten,

E Boiler, P O Kotzki, and F J Peyrin. “Relevance of 2D radiographic texture

analysis for the assessment of 3D bone micro-architecture”. In: Medical

Physics 33 (2006), pp. 3546–3556.


“Estimation of the 3D self-similarity parameter of trabecular bone from its









Research 15 (2000), pp. 691–699.

[11] L Pothuaud, P Carceller, and D Hans. “Correlations between grey-level

variations in 2D projections images (TBS) and 3D microarchitecture:


Bone 42 (2008), pp. 775–787.



[13] L Shamir, S M Ling, W W Scott, Jr., A Bos, N Orlov, T J Macura, D M Eckley,

L Ferrucci, and I G Goldberg. “Knee x-ray image analysis method for

automated detection of osteoarthritis”. In: IEEE Transactions on Biomedical

Engineering 56 (2009), pp. 407–415.







(2009), pp. 3711–3722.




[17] H W Chung, C C Chu, M Underweiser, and F W Wehrli. “On the fractal

nature of trabecular structure”. In: Medical Physics 21 (1994), pp. 1535–1540.



pp. 460–474.





[20] R P W Duin, D de Ridder, and D M J Tax. “Experiments with a featureless

approach to pattern recognition”. In: Pattern Recognition Letters 18 (1997),

pp. 1159–1166.

[21] E Pekalska and R P W Duin. “Dissimilarity representations allow

for building good classifiers”. In: Pattern Recognition Letters 23 (2002),

pp. 943–956.




[23] T Woloszynski, P Podsiadlo, G W Stachowiak, M Kurzynski, L S

Lohmander, and M Englund. “Prediction of progression of radiographic

knee osteoarthritis using tibial trabecular bone texture”. In: Arthritis &

Rheumatism 64 (2012), pp. 688–695.



[25] P C Smits. “Multiple classifier systems for supervised remote sensing image

classification based on dynamic classifier selection”. In: IEEE Transactions

on Geoscience and Remote Sensing 40 (2002), pp. 801–813.

[26] T Woloszynski and M Kurzynski. “A probabilistic model of classifier

competence for dynamic ensemble selection”. In: Pattern Recognition 44

(2011), pp. 2656–2668.

[27] E Pekalska, R P W Duin, and P Paclik. “Prototype selection for

dissimilarity-based classifiers”. In: Pattern Recognition 39 (2006),

pp. 189–208.


pp. 123–140.

[29] A M P Canuto, M C C Abreu, L M Oliveira, J C Xavier, Jr., and A M Santos.

“Investigating the influence of the choice of the ensemble members in

accuracy and diversity of selection-based and fusion-based methods for

ensembles”. In: Pattern Recognition Letters 28 (2007), pp. 472–486.

[30] T Woloszynski, M Kurzynski, P Podsiadlo, and G W Stachowiak. “A

measure of competence based on random classification for dynamic

ensemble selection”. In: Information Fusion 13 (2012), pp. 207–212.

[31] P Podsiadlo and G W Stachowiak. “A rig for acquisition of standardized

trabecular bone radiographs”. In: Acta Radiologica 43 (2002), pp. 101–103.






pp. 1870–1883.

[34] P Manninen, H Riihimaki, M Heliovaara, and P Makela. “Overweight,

gender and knee osteoarthritis”. In: International Journal of Obesity and

Related Metabolic Disorders 20 (1996), pp. 595–597.

[35] M Reijman, H A P Pols, A P Bergink, J M W Hazes, J N Belo, A M Lievense,

and S M A Bierma-Zeinstra. “Body mass index associated with onset and

progression of osteoarthritis of the knee but not of the hip: The Rotterdam

Study”. In: Annals of the Rheumatic Diseases 66 (2007), pp. 158–162.

[36] T Lindeberg. “Scale-space for discrete signals”. In: IEEE Transactions on





and Cartilage 16 (2008), pp. 323–329.

[38] Y Rubner, C Tomasi, and L J Guibas. “The Earth Mover's Distance as a

metric for image retrieval”. In: International Journal of Computer Vision 40

(2000), pp. 99–121.

[39] R O Duda, P E Hart, and D G Stork. Pattern Classification.


[40] J W Sammon. “A nonlinear mapping for data structure analysis”. In: IEEE

Transactions on Computers C18 (1969), pp. 401–409.

[41] E Alpaydin. “Combined 5x2cv F test for comparing supervised

classification learning algorithms”. In: Neural Computation 11 (1999),

pp. 1885–1892.

[42] E A Messent, R J Ward, C J Tonkin, and C Buckland-Wright. “Cancellous

bone differences between knees with early, definite and advanced joint

space loss; a comparative quantitative macroradiographic study”. In:


[43] H Tamura, S Mori, and T Yamawaki. “Textural features corresponding to

visual perception”. In: IEEE Transactions on Systems, Man and Cybernetics

SMC-8 (1978), pp. 460–473.

[44] R M Haralick, K Shanmugam, and I Dinstein. “Textural features for image

classification”. In: IEEE Transactions on Systems, Man, and Cybernetics SMC-3

(1973), pp. 610–621.

[45] S Abe. Support Vector Machines for Pattern Classification (Advances in Pattern

Recognition). Springer, 2005.




Radiology 12 (2005), pp. 337–346.




pp. 280–298.



CHAPTER 6

CONCLUSIONS AND FUTURE WORK

This chapter begins with main findings and observations taken from the four

papers that form the core of this thesis. General conclusions drawn from the

studies conducted are presented afterwards. The chapter ends with discussion

on possible future work related to potential applications of the system developed

in medicine and other areas.

1. Summary of main findings and observations

Main findings and observations taken from Chapter 2 (paper 1):

• A new method, i.e. signature dissimilarity measure (SDM), was developed

for measuring distances between trabecular bone (TB) texture regions

selected on knee radiographs. The method was evaluated using TB texture

images of healthy and osteoarthritic (OA) knees, images taken from Brodatz

album, knee radiographs of frozen tibia head, and computer-generated

fractal texture images.

• The accuracy of the method developed in detection of knee OA was studied.

Results showed that the SDM method, when combined with the support

vector machine (SVM) classifier, outperforms the benchmark system for

knee classification.

• The accuracy of the method in rotation-invariant texture classification

was evaluated and compared against the multiresolution grey-scale- and

rotation-invariant benchmark method. The two methods, when combined

with the nearest neighbour classifier, exhibited comparable performance.

• The effects of imaging conditions on the SDM method were investigated.

The method was found to be invariant to a range of exposure,

magnification, image size, anisotropy direction, noise, and blur

encountered in a routine screening of knee radiographs.

• From the results obtained it was concluded that the SDM method is accurate

in texture classification and invariant to imaging conditions.

Main findings and observations taken from Chapter 3 (paper 2):


• Three parameters of TB texture, i.e. roughness, degree of anisotropy

and direction of anisotropy, were calculated using the SDM method. A

generalised linear model based on the parameters was constructed.

• The accuracy of the model in prediction of knee OA progression was

studied. The results obtained showed that the model can predict OA

progression in knees with early (Kellgren and Lawrence [KL] grade ≤1) and

late (KL grade ≥2) radiographic knee OA at baseline.

• The model had higher prediction accuracy than those of models based on

age, sex, body mass index, and joint space narrowing grade at baseline.

• The SDM method could be a valuable tool in prediction of knee OA

progression and in quantification of OA changes in the bone texture.

Main findings and observations taken from Chapter 4 (paper 3):


• A new method, i.e. measure of competence based on random classification

(MCR), was developed for selecting accurate classifiers from classifier

ensembles. The method was evaluated using 14 benchmark data sets.

• Based on theoretical derivation it was shown that the method developed

increases classification accuracy of the majority voting rule.

• Results showed that the MCR method outperforms other methods based on

classifier ensembles regardless of the ensemble type used (homogeneous or

heterogeneous).

• The MCR method uses distances between images instead of image features

and hence it can work with dissimilarity measures.

Main findings and observations taken from Chapter 5 (paper 4):


• The SDM and MCR methods were used to form a dissimilarity-based

multiple classifier (DMC) system.

• A special approach was applied to generate homogeneous and

heterogeneous classifier ensembles using distances between TB texture

images.

• The accuracies of the DMC system in detection and prediction of

progression of knee OA were studied. The system developed had higher

accuracies than those obtained for the SDM method combined with the

SVM classifier and the benchmark system.

• The DMC system could be a useful decision-support tool in medicine,

engineering and other areas.

2. General conclusions

The following general conclusions can be drawn from the work conducted in this

thesis:

• Classification of texture images must involve quantification of roughness

and orientation at multiple scales. For the quantification, distances between

images can be used instead of image features.

• The SDM method developed has the ability to measure distances between

texture images in terms of roughness and orientation and is invariant to a

range of imaging conditions.

• The SDM method could successfully discriminate between healthy and OA

knees and between knees with non-progressive and progressive OA.

• Selection of classifiers that have better-than-random accuracies improves

performance of methods based on classifier ensembles.

• The MCR method developed has the ability to select accurate classifiers

from both homogeneous and heterogeneous classifier ensembles.

• The SDM and MCR methods form an accurate and robust DMC system that

could successfully detect knee OA and predict knee OA progression.

• The DMC system could eventually be used in other practical applications,

e.g. classification of medical images in diagnosis and prognosis of

diseases and quantification of engineering surfaces for automated machine

inspection.

3. Future work

This thesis describes the development of a new automated classification system

for detection and prediction of knee OA and its evaluation on medical images of

bone texture. Further work could aim at extending the applicability of the system

to medicine and other areas.

3.1 Medicine

The results obtained in this thesis demonstrated potential of the DMC system

for the detection and prediction of knee OA. Therefore, the future work could

focus on the development of a DMC-based decision-support tool for patient

monitoring and management of patient treatment against the disease. Current

research on this topic is based on advanced imaging techniques (e.g. computed

tomography [1]) and biochemical markers (e.g. serum cartilage oligomeric

matrix protein, hyaluronan and soluble vascular cell adhesion molecule 1 [2–5]).

However, these methods are rather expensive and inaccessible to the majority

of patients and they require highly qualified medical staff and specialised

instruments. In addition, they lack a widely accepted and validated OA scoring

system. In contrast, since the DMC system uses plain radiography and the

gold-standard Kellgren and Lawrence scale, it could provide inexpensive and

reliable means of evaluating the effects of medication, intra-articular injections

and surgical interventions on the disease progression for patient monitoring and

treatment.

The future work could also focus on using the DMC system in those medical

applications where classification and quantification of biological surfaces are

required. For example, the system could be used for the assessments of

fracture risk due to osteoporosis [6, 7] and trabeculae network damage due to

rheumatoid arthritis [8], quantification of TB structure on dental radiographs

[9], classification of breast lesions using ultrasound images [10], diagnosis of

interstitial lung disease based on chest radiography [11, 12], and discrimination of

skin lesions using dermoscopic images [13]. Other medical applications include

quantifications of surfaces of dental ceramics [14] and bone implants [15]. After

extending the SDM method to three dimensions, the DMC system could also

find applications in the analysis of bone, blood vessels and biomaterials from

tomographic images [16] and classification of brain tumours from magnetic

resonance imaging [17].

3.2 Other areas

The work conducted in this thesis showed that the DMC system can be used

for multiscale quantification of texture roughness and orientation and that it

is invariant to a range of imaging conditions. Thus, the system could be

a useful tool for the analysis and characterisation of engineering surfaces.

This includes applications in metrology in 3D surface topography description,

machine condition monitoring and failure prediction based on classification of

wear particles [18] and automated surface inspection and quality control based

on quantification of defective/worn engineering surfaces [19]. The system could

also be used for finding a relationship between topographies of surfaces and their

friction coefficients. Current research targeted at finding the relationship uses

basic parameters (e.g. Ra and Rq) that provide limited information about surface

texture [20, 21]. In contrast, the DMC system provides detailed information

about texture roughness and orientation for predefined scales. Therefore,

complete description of engineering surfaces, including textural and frictional

characteristics, would be possible when the relationship between topographies

of the surfaces and their friction coefficients is found.
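
The limitation of the basic parameters can be shown with a small numerical example. The Python sketch below uses the standard definitions of Ra (mean absolute deviation of the profile heights from their mean) and Rq (root-mean-square deviation) and applies them to two hypothetical height profiles. The function name and the profiles are illustrative only.

```python
import math

def ra_rq(heights):
    """Arithmetic mean roughness Ra and root-mean-square roughness Rq of a profile."""
    mean = sum(heights) / len(heights)
    deviations = [z - mean for z in heights]
    ra = sum(abs(d) for d in deviations) / len(heights)
    rq = math.sqrt(sum(d * d for d in deviations) / len(heights))
    return ra, rq

fine_profile = [1, -1, 1, -1, 1, -1, 1, -1]     # rapid alternation of heights
coarse_profile = [1, 1, 1, 1, -1, -1, -1, -1]   # a single broad step

fine_values = ra_rq(fine_profile)
coarse_values = ra_rq(coarse_profile)
```

Both profiles give identical Ra and Rq values although their textures differ markedly in scale, which is the kind of information that a multiscale description of roughness and orientation is intended to capture.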

The DMC system could also find applications in food engineering and chemistry,

geology, botany and forestry, mining and textile industries, and remote sensing.

In food engineering and chemistry, the system could be used for segregation

of tea granules [22, 23], discrimination of crumb grains based on visual

appearance [24] and classification of meat texture [25, 26]. In geology, botany

and forestry, the system could be a useful tool for classification of surfaces of

heavy-mineral grains [27], identification of plants [28] and discrimination of

vegetation communities [29]. In mining and textile industries, the system could

be used for iron ore particle characterisation [30], estimation of run-of-mine ore

composition [31] and automated inspection of textile fabrics [32]. The system

could also be used for classification of land cover obtained through aerial space

photography in remote sensing [33].

4. References

[1] P N Bansal, N S Joshi, V Entezari, M W Grinstaff, and B D

Snyder. “Contrast enhanced computed tomography can predict the

glycosaminoglycan content and biomechanical properties of articular

cartilage”. In: Osteoarthritis and Cartilage 18 (2010), pp. 184–191.

[2] Y M Golightly, S W Marshall, V B Kraus, J B Renner, A Villaveces, C Casteel,

and J M Jordan. “Biomarkers of incident radiographic knee osteoarthritis”.

In: Arthritis & Rheumatism 63 (2011), pp. 2276–2283.

[3] V B Kraus. “Biomarkers in osteoarthritis”. In: Current Opinion in

Rheumatology 17 (2005), pp. 641–646.

[4] S M Ling, D D Patel, P Garnero, M Zhan, M Vaduganathan, D Muller,

D Taub, J M Bathon, M Hochberg, D R Abernethy, E J Metter, and L Ferrucci.

“Serum protein signatures detect early radiographic osteoarthritis”. In:


[5] J Cibere, H Zhang, P Garnero, A R Poole, T Lobanok, T Saxne, V B Kraus,

A Way, A Thorne, H Wong, J Singer, J Kopec, A Guermazi, C Peterfy,

S Nicolaou, P L Munk, and J M Esdaile. “Association of biomarkers

with pre-radiographically defined and radiographically defined knee

osteoarthritis in a population-based study”. In: Arthritis & Rheumatism 60

(2009), pp. 1372–1380.

[6] C L Benhamou, S Poupon, E Lespessailles, S Loiseau, R Jennane, V Siroux,

W Ohley, and L Pothuaud. “Fractal analysis of radiographic trabecular

bone texture and bone mineral density: two complementary parameters

related to osteoporotic fractures”. In: Journal of Bone and Mineral Research

16 (2001), pp. 697–704.

[7] B Brunet-Imbault, G Lemineur, C Chappard, R Harba, and C L Benhamou.

“A new anisotropy index on trabecular bone radiographic images using the

fast Fourier transform”. In: BMC Medical Imaging 5 (2005), pp. 4–14.

[8] C B Caldwell, E L Moran, and E R Bogoch. “Fractal dimension as a measure

of altered trabecular bone in experimental inflammatory arthritis”. In:

Journal of Bone and Mineral Research 13 (1998), pp. 978–985.

[9] T D Faber, D C Yoon, S K Service, and S C White. “Fourier and wavelet

analyses of dental radiographs detect trabecular changes in osteoporosis”.

In: Bone 35 (2004), pp. 403–411.




pp. 280–298.



(2009), pp. 226–230.




Radiology 12 (2005), pp. 337–346.



[14] J L Drummond, M Thompson, and B J Super. “Fracture surface examination

of dental ceramics using fractal analysis”. In: Dental Materials 21 (2005),

pp. 586–589.

[15] A Wennerberg. “The importance of surface roughness for implant

incorporation”. In: International Journal of Machine Tools and Manufacture 38

(1998), pp. 657–662.

[16] R E Guldberg, R T Ballock, B D Boyan, C L Duvall, A S Lin, S Nagaraja,

M Oest, J Phillips, B D Porter, G Robertson, and W R Taylor. “Analyzing

bone, blood vessels, and biomaterials with microcomputed tomography”.

In: IEEE Engineering in Medicine and Biology Magazine 22 (2003), pp. 77–83.






pp. 120–130.






pp. 120–130.

[19] Z Peng and T B Kirk. “Computer image analysis of wear particles in

three-dimensions for machine condition monitoring”. In: Wear 223 (1998),

pp. 157–166.

[20] N S S Mar, P K D V Yarlagadda, and C Fookes. “Design and development

of automatic visual inspection system for PCB manufacturing”. In: Robotics

and Computer-Integrated Manufacturing 27 (2011), pp. 949–962.

[21] P L Menezes and S V Kailas. “Influence of surface texture and roughness

parameters on friction and transfer layer formation during sliding of

aluminium pin on steel plate”. In: Wear 267 (2009), pp. 1534–1549.

[22] R Singh, S N Melkote, and F Hashimoto. “Frictional response of precision

finished surfaces in pure sliding”. In: Wear 258 (2005), pp. 1500–1509.

[23] S Borah, E L Hines, and M Bhuyan. “Wavelet transform based image texture

analysis for size estimation applied to the sorting of tea granules”. In:

Journal of Food Engineering 79 (2007), pp. 629–639.

[24] D Wu, H Yang, X Chen, Y He, and X Li. “Application of image texture

for the sorting of tea categories using multi-spectral imaging technique

and support vector machine”. In: Journal of Food Engineering 88 (2008),

pp. 474–483.

[25] U Gonzales-Barron and F Butler. “Discrimination of crumb grain visual

appearance of organic and non-organic bread loaves by image texture

analysis”. In: Journal of Food Engineering 84 (2008), pp. 480–488.

[26] O Basset, B Buquet, S Abouelkaram, P Delachartre, and J Culioli.

“Application of texture image analysis for the classification of bovine

meat”. In: Food Chemistry 69 (2000), pp. 437–445.

[27] J Li, J Tan, and P Shatadal. “Classification of tough and tender beef by image

texture analysis”. In: Meat Science 57 (2001), pp. 341–346.

[28] J P Moral Cardona, J M Gutierrez Mas, A Sanchez Bellon,

S Dominguez-Bella, and J Martinez Lopez. “Surface textures of

heavy-mineral grains: a new contribution to provenance studies”. In:

Sedimentary Geology 174 (2005), pp. 223–235.

[29] O Martinez Bruno, R de Oliveira Plotze, M Falvo, and M de Castro.

“Fractal dimension applied to plant identification”. In: Information Sciences

178 (2008), pp. 2722–2733.

[30] H Murray, A Lucieer, and R Williams. “Texture-based classification of

sub-Antarctic vegetation communities on Heard Island”. In: International

Journal of Applied Earth Observation and Geoinformation 12 (2010),

pp. 138–149.

[31] E Donskoi, S P Suthers, S B Fradd, J M Young, J J Campbell, T D Raynlyn,

and J M F Clout. “Utilization of optical image analysis and automatic

texture classification for iron ore particle characterisation”. In: Minerals

Engineering 20 (2007), pp. 461–471.

[32] J Tessier, C Duchesne, and G Bartolacci. “A machine vision approach to

on-line estimation of run-of-mine ore composition on conveyor belts”. In:

Minerals Engineering 20 (2007), pp. 1129–1144.

[33] K L Mak and P Peng. “An automated inspection system for textile fabrics

based on Gabor filters”. In: Robotics and Computer-Integrated Manufacturing

24 (2008), pp. 359–369.
