Automatic Detection and Classification of Vertebral Fracture
using Statistical Models of Appearance
A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Medical and Human Sciences
2008
Martin G Roberts
School of Medicine
Contents
1 Introduction
1.1 The Problem
1.1.1 Aim
1.1.2 Context - Osteoporosis
1.1.3 Novel vertebral fracture detection methods
1.2 Overview of Thesis
2 Clinical Background
2.1 Osteoporosis
2.1.1 Epidemiology
2.1.2 Prevention and Treatment
2.1.3 Interpretation of Indicators of Osteoporosis
2.2 Measurement of Bone Mineral Density
2.2.1 Purpose
2.2.2 Single Photon Absorptiometry (SPA)
2.2.3 Dual Photon Absorptiometry (DPA)
2.2.4 Dual Energy X-ray Absorptiometry (DXA)
2.2.5 Quantitative Computed Tomography (QCT)
2.2.6 Quantitative Ultrasonography (QUS)
2.3 Measurement of Bone Structure and Integrity
2.4 Vertebral Fractures
3 Vertebral Fracture
3.1 Vertebrae and Vertebral Fractures
3.1.1 Significance
3.2 Imaging the Spine
3.2.1 Conventional Radiography
3.2.2 Imaging with DXA and SXA
3.2.3 Magnetic Resonance Imaging (MRI)
3.2.4 Sagittal Computed Tomography
3.3 Vertebral Fracture Identification
3.3.1 Semi-quantitative identification of vertebral fracture
3.3.2 Quantitative Morphometry
3.4 Conclusion
4 Model Based Vision
4.1 Introduction
4.2 Snakes
4.2.1 Deformable Elliptical Models
4.3 Elastic Models
4.4 Active Shape Model (ASM)
4.4.1 Summary of ASM
4.4.2 Point Distribution Models
4.4.3 Active Shape Model Search
4.5 Active Appearance Models
Word Count: 48996
4.5.1 Background to the Active Appearance Model
4.5.2 Appearance Models
4.5.3 Fitting Appearance Models
4.5.4 Initialisation
4.5.5 Extensions to the AAM
4.5.6 Constrained AAM
4.6 Model Optimisation using Minimum Description Length
4.7 Conclusions
5 The consistent combination of multiple sub-model AAMs
5.1 Introduction
5.2 Model-Based Segmentation - Some Trade-Offs
5.2.1 Statistical Models in Medical Imaging
5.2.2 Global vs Local Models
5.3 Combining Overlapping Sub-Models
5.3.1 Vertebral Triplet Modelling
5.3.2 Generalisation
5.3.3 Dynamic Sub-Model Sequence Ordering Algorithm
5.3.4 Algorithm Pseudo-Code
5.3.5 Updating the Constraint Variance
5.3.6 Quality of Fit Measure
5.4 Conclusion
6 Vertebral Segmentation using Multiple AAMs
6.1 Introduction
6.2 ASM vs AAM
6.3 Data - DXA Images
6.3.1 Summary of Training Set
6.3.2 Shape annotation
6.3.3 Point correspondence
6.4 Initialisation
6.5 Experiments
6.5.1 Summary of optimal AAM determination
6.5.2 AAM form used
6.5.3 Initialisation Method and AAM profile length
6.5.4 Point constraint form
6.5.5 Optimisation of the sub-model structure
6.5.6 Multiple Initialisations for Fractured Vertebrae
6.5.7 Discussion
6.5.8 Conclusions
7 Vertebral Fracture Classification using Shape and Appearance Parameters
7.1 Introduction
7.2 Classification Methods
7.2.1 Data and Ground Truth
7.2.2 Linear Classifiers - Inputs and Training Scheme
7.3 Experiments
7.3.1 Initial APM form selection
7.4 Results
7.5 Discussion
7.6 Classifying given a semi-automatic segmentation
7.6.1 Semi-automatic method
7.6.2 Semi-automatic Results
7.7 Conclusions
8 Segmentation of vertebrae in radiographs
8.1 Introduction
8.2 Materials and Methods
8.2.1 Data
8.2.2 AAM approach
8.2.3 Experiments
8.3 Results
8.4 Discussion
8.4.1 Overall Accuracy Performance
8.4.2 Conclusion
8.4.3 Future Work
9 Conclusions and Further Work
9.1 Summary of Original Work and Results
9.1.1 AAM methodological developments
9.1.2 Vertebral Segmentation
9.1.3 Vertebral Classification
9.2 Future Work
9.2.1 Other modalities
9.2.2 Classifier improvements
9.2.3 Automatic Detection of Search Failure
9.3 Final Statement
A
A.1 Weighted fitting of shape and appearance model parameters
A.2 Optimal pose parameters
A.3 Weighted fitting of shape model parameters
A.4 Applying additional appearance model constraints
List of Tables
6.1 Search error statistics (point-to-line) for 6mm profile gradient AAM
6.2 Search error statistics (point-to-line) for 6mm profile intensity AAM
6.3 Search error statistics (point-to-line) for 6mm profile renormalised intensity AAM
6.4 Search error statistics (point-to-line) for classical region intensity AAM
6.5 Search error statistics (point-to-line) for classical region intensity renormalised AAM
6.6 Search error statistics (point-to-line) for classical region intensity sigmoidal 2D gradient AAM
6.7 Search error statistics (point-to-line) for region corner feature AAM
6.8 Search error statistics (point-to-line) for 6 step profile gradient AAM
6.9 Search error statistics (point-to-line) for 8 step profile gradient AAM
6.10 Search error statistics (point-to-line) for 10 step profile gradient AAM
6.11 Search error statistics using full covariance matrix for point constraints
6.12 Search error statistics (point-to-line) for single vertebra sub-models
6.13 Search error statistics (point-to-line) for semi-triplet sub-models
6.14 Search error statistics (point-to-line) for quintet sub-models
6.15 Search error statistics (point-to-line) for single global model
6.16 Shape Model Intrinsic Accuracy for Triplet sub-models
6.17 Shape Model Intrinsic Accuracy for Quintet sub-models
6.18 Shape Model intrinsic accuracy for a single global model
6.19 Search error statistics using alternative fractured initialisations
6.20 Accuracy and Precision by individual vertebrae
7.1 Beta-convolved false positive rates (%) for the gradient appearance classifier as a function of variance retained in texture model
7.2 Beta-convolved false positive rates (%) for the intensity appearance classifier as a function of variance retained in texture model
7.3 Area under ROC curves
7.4 False Positive Rates (%) in the mid-thoracic spine (T9-T7) for the different classifiers at various sensitivities
7.5 False Positive Rates (%) in the lower-thoracic spine (T12-T10) for the different classifiers at various sensitivities
7.6 False Positive Rates (%) in the lumbar spine for the different classifiers at various sensitivities
7.7 McNemar Test Statistic comparing FPR for various classifiers between 93% and 97% sensitivity
7.8 Overall Patient-Level FPR and Sensitivity given individual vertebrae FPR
7.9 Classifier Sensitivities for 1%, 2% and 5% FPR, for semi-automatic segmentation
7.10 Area under ROC curves given semi-automatic segmentation
7.11 Overall Patient-Level FPR and Sensitivity given individual vertebrae FPR
8.1 Search Accuracy Percentiles by Fracture Status for the two profile samplers used
List of Figures
2.1 The microstructure of normal (left) and osteoporotic (right) bone.
2.2 The trabecular structure of normal (left) and osteoporotic (right) vertebrae.
2.3 The variation in fracture incidence rate with age for women. Taken from [132]
3.1 The lateral anatomy of a vertebra.
3.2 The spinal column, showing the numbered cervical, thoracic, and lumbar vertebrae
3.3 Examples of spinal radiographs
3.4 This radiograph shows an osteoporotic spine with numerous severe fractures.
3.5 The projection effect of lateral radiography on the spine.
3.6 Examples of parallax effects in radiographs
3.7 Examples of DXA images
3.8 Examples of vertebral fractures in DXA images
3.9 Appearance of vertebrae on a T1-weighted sagittal slice MRI image of the thoracic spine (T11-T9)
3.10 Examples of non-fracture vertebral deformities
3.11 The Genant semi-quantitative grading system
4.1 Spine shape model variation mode 1
4.2 Spine shape model variation mode 2
4.3 Spine appearance model variation mode 1
4.4 Spine appearance model variation mode 2
4.5 Spine appearance model variation mode 3
4.6 Spine appearance model variation mode 4
4.7 L1 triplet appearance model variation mode 1
4.8 L1 triplet appearance model variation mode 1
4.9 L1 triplet profile gradient appearance model variation mode 1
4.10 L1 triplet profile gradient appearance model variation mode 2
4.11 Face corner feature appearance model - mode 1 variation
5.1 Sub-model combination example - two iterations of vertebral triplets
6.1 DXA image with superimposed shape annotation
6.2 More examples of DXA images with vertebral fractures and superimposed shape annotation
6.3 Zoomed-in view of individual vertebral shape points
6.4 Fractured vertebral shape annotation example
6.5 Fractured vertebral shape annotation example
6.6 Fractured vertebral shape annotation example
6.7 Fractured vertebral shape annotation example
6.8 AAM search failure example with severe fracture
6.9 Example of large global contrast variation
6.10 Mean point-to-line errors (mm) by vertebral fracture grade, comparing quintet sub-model AAMs to a global AAM
7.1 Mid-Thoracic Spine (T7-T9) ROC Curves showing the Eastell-McCloskey height classifier and the shape and appearance model linear discriminants
7.2 Lower-Thoracic Spine (T10-T12) ROC Curves showing the Eastell-McCloskey height classifier and the shape and appearance model linear discriminants
7.3 Lumbar Spine ROC Curves showing the Eastell-McCloskey height classifier and the shape and appearance model linear discriminants
7.4 ROC Curves for combined Grade 1 Fractures showing the Eastell-McCloskey height classifier and the shape and appearance model linear discriminants
7.5 ROC Curves for combined Grade 2 Fractures showing the Eastell-McCloskey height classifier and the shape and appearance model linear discriminants
7.6 Visualisation of the (scale-free) discriminant direction in shape parameter space
7.7 Visualisation of the (scale-free) discriminant direction in appearance parameter space
7.8 ROC curves for (semi)automatically-segmented images, with all vertebrae combined
7.9 ROC curves for appearance classifier on (semi)automatically-segmented images, for the 3 fracture grades
8.1 Lumbar radiograph. a) shows the raw image (contrast enhanced); b) shows the automatically located vertebral contours superimposed.
8.2 Zoomed in view of L3 showing its shape model points
8.3 L2 triplet 3SD variation in first (left) and second (right) shape modes
Glossary
AAM Active Appearance Model
ABQ Algorithmically Based Qualitative method of vertebral fracture diagnosis
APM Appearance Model
ASM Active Shape Model
BP Bisphosphonate
BMD Bone Mineral Density
CoV Coefficient of Variation, i.e. precision SD as percentage of Mean.
CDF Cumulative Density Function (integral of PDF)
DXA Dual Energy X-ray Absorptiometry
FPR False Positive Rate
HRT Hormone Replacement Therapy
LD Linear Discriminant
MAD Median Absolute Deviation
MRI Magnetic Resonance Imaging
PCA Principal Components Analysis
PDF Probability Density Function
PDM Point Distribution Model
QCT Quantitative Computed Tomography
QM Quantitative Morphometric method of vertebral fracture diagnosis
QUS Quantitative Ultrasonography
ROC Receiver Operating Characteristic
SD Standard Deviation
SQ Semi-Quantitative method of vertebral fracture diagnosis
SVD Singular Value Decomposition (i.e. of matrix)
SVH Short Vertebral Height (i.e. a vertebral deformity)
SXA Single Energy X-ray Absorptiometry
WHO World Health Organisation
Abstract
Vertebral fractures are an important diagnostic feature for osteoporosis. However, existing expert diagnosis from radiological images is rather subjective, whilst current quantitative methods require time-consuming hand annotation, and then lack specificity. We develop methods from Computer Vision (the Active Appearance Model) to provide a semi-automatic segmentation method to locate the full outlines of the vertebral bodies. We split the spine up into a number of overlapping sub-models, and develop a novel approach to combining multiple sub-models into a consistent overall fit. The accuracy of these methods is shown to be superior to using a single global model - especially in the more difficult fractured cases. Mean segmentation accuracy is comparable to manual precision, and is of the order of 0.75mm for normal vertebrae and 1mm for fractured vertebrae, although accuracy can deteriorate for severe fractures. The method was applied to both lateral dual energy X-ray absorptiometry (DXA) scans, and digitised lumbar radiographs.
We develop novel fracture classification methods using the parameters of both shape and appearance models. Linear discriminants are trained using a consensus expert reading by two radiologists as the gold standard. The classifier performance is evaluated on unseen DXA images. By using the appearance parameters, the false positive rates are reduced substantially compared to conventional 3-height morphometry. At 95% sensitivity the appearance model classifiers give an overall false positive rate of under 5%, compared to 18% with conventional morphometric methods.
Institution: The University of Manchester
Candidate: Martin G Roberts
Degree Title: Doctor of Philosophy
Thesis Title: Automatic Detection and Classification of Vertebral Fracture using Statistical Models of Appearance
Date: 2nd May 2008
Declaration
No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.
Copyright Statement
1. The author of this thesis (including any appendices and/or schedules to this thesis) owns any copyright in it (the “Copyright”) and he has given the University of Manchester the right to use such Copyright for any administrative, promotional, educational and/or teaching purposes.
2. Copies of this thesis, either in full or in extracts, may be made only in accordance with the regulations of the John Rylands University Library of Manchester. Details of these regulations may be obtained from the Librarian. This page must form part of any such copies made.
3. The ownership of any patents, designs, trade marks, and any and all other intellectual property rights except for the Copyright (the “Intellectual Property Rights”) and any reproductions of copyright works, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property Rights and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property Rights and/or Reproductions.
4. Further information on the conditions under which disclosure, publication and exploitation of this thesis, the Copyright and any Intellectual Property Rights and/or Reproductions described in it may take place is available from the Head of School of Medicine (or the Vice-President).
Acknowledgements
I would like to thank my supervisors Prof. Tim Cootes and Prof. Judith Adams for their guidance, support and enthusiasm during my research.
I also thank Stephen Capener for annotation of the more recent data, and all who contribute C++ code to the VXL library, which I have used extensively, and in particular Prof. Tim Cootes for all his APM and AAM source code, and Dr Ian Scott for his linear classifier training and instantiation classes, and for code for hierarchical bootstrapped confidence intervals.
The radiologists who classified the DXA images were Prof. JE Adams (JEA) and Dr Elisa Pacheco (EP). I also thank Professor Cyrus Cooper for his permission to use a set of radiographs, previously obtained in an epidemiological study under his supervision [23].
Finally I thank both the Research Endowment in Central Manchester and Manchester Children’s University Hospitals NHS Trust∗ for providing initial funding for the project, and the Arthritis Research Council (ARC) for providing current funding.
∗CMMC account 9504
About the Author
Martin Roberts has had a varied background. He graduated from Cambridge in 1980, having read Mathematics and Theoretical Physics. He turned his mind from the mind-bending world of relativistic quantum mechanics to somewhat more practical mathematical applications by next taking a Masters degree at Lancaster in Operational Research (OR). After this he joined the then Scicon Consultancy (now EDS-Scicon) in London, working for five years in Operational Analysis simulations, and tracker development for the Royal Navy. He next became more of a software engineer, but specialising in algorithmic applications such as process monitoring (fault detection), manufacturing control, and air traffic control tools (e.g. aircraft conflict detection). His subsequent specialisation was sonar tracking in naval applications and the use of sonar in naval mine hunting operations. This was interspersed with a good deal of time exploring the Indian Himalaya. He also spent 3 years lecturing OR algorithms in the Mathematics Department of the University of Central Lancashire, with a collaborative research interest in algorithms for predicting (from the genetic sequence) which protein segments are likely to fold into surface-active α-helices, with application in penicillin binding proteins†. He joined the Division of Imaging Science and Biomedical Engineering (ISBE) at the University of Manchester as a Research Associate in 2003.
He now lives in Halifax, has two daughters, and enjoys all forms of mountaineering, on rock, ice, and mixed routes. He even likes climbing Yorkshire gritstone! He lead-climbs at about Very Severe grade on rock, or Scottish grade III/4 on mixed winter routes, and also enjoys skiing steep couloirs.
Publications since joining ISBE
Immediately prior to registering for a PhD he published the following paper which provided a basis for some of the work in this thesis.
• Roberts MG, Cootes TF, and Adams JE. Linking sequences of active appearance sub-models via constraints: an application in automated vertebral morphometry. In: 14th British Machine Vision Conference, (pages 349–358). 2003.
†with collaborators Dr DA Phoenix and Dr A Pewsey
After registering for a PhD he published the following papers related to the work in this thesis.
• Roberts MG, Cootes TF, and Adams JE. Vertebral shape: Automatic measurement with dynamically sequenced active appearance models. In: 8th MICCAI Conference, vol. 2, (pages 733–740). 2005.
• Roberts MG, Cootes TF, and Adams JE. Automatic segmentation of lumbar vertebrae on digitised radiographs using linked active appearance models. In: Graham J, Thacker N, and Cootes T, eds., Medical Image Understanding and Analysis Conference, (pages 120–124) (BMVA), 2006.
• Roberts MG, Cootes TF, and Adams JE. Improving the segmentation accuracy of fractured vertebrae with dynamically sequenced active appearance models. In: 9th MICCAI Conference - Workshop on joint and bone disease, (pages 1–8). 2006.
• Roberts MG, Cootes TF, and Adams JE. Vertebral morphometry: semiautomatic determination of detailed shape from DXA images using active appearance models. Investigative Radiology, 41(12):849–859, 2006.
• Roberts MG, Cootes TF, Pacheco EM, and Adams JE. Quantitative vertebral fracture detection on DXA images using shape and appearance models. Academic Radiology, 14:1166–1178, 2007.
Chapter 1
Introduction
1.1 The Problem
1.1.1 Aim
The aim of this thesis is to investigate the use of computer vision techniques to detect
and quantify vertebral fractures due to osteoporosis. Osteoporosis is a progressive
skeletal disease characterised by low bone mass and structural deterioration of bone
tissue, leading to bone fragility and an increased susceptibility to fractures, especially
of the hip, spine and wrist. Early detection of the condition can allow preventative
or therapeutic intervention.
This thesis describes investigations into locating and classifying vertebrae using
statistical models of shape and appearance, with the overall aim of improving the
effectiveness of current methods of osteoporosis diagnosis.
1.1.2 Context - Osteoporosis
Osteoporosis is one of the most important diseases facing the elderly, and increasing
life expectancy makes it a serious public health problem. The estimated
lifetime risk of sustaining an osteoporotic fracture in the U.S. is 39.7% for women,
and 13.1% for men at the age of 50 [101]. By the age of 80, 70% of U.S. women are
osteoporotic [99].
The financial cost of osteoporosis is increasing rapidly. In the EU osteoporosis now
costs more than 4.8 billion Euros annually in hospital healthcare alone, a 33% increase
over three years [131], whilst in England and Wales, the total direct hospital cost of
osteoporotic fractures in 1999 was £584 million [131].
Postmenopausal osteoporosis is a significant cause of morbidity and mortality amongst
the elderly in the Western world, leading to large numbers of fractures of the hip,
spine and wrist. Hip fractures are the most serious and painful: 27% of women who
sustain a hip fracture die within 1 year [102]. In the U.S., the estimated lifetime risk
of hip fracture is 17.5% for women and 6.0% for men [101]. In Europe, in 2000, the
number of osteoporotic fractures was estimated at 3.79 million of which 0.89 million
were hip fractures [80]. These figures are predicted to increase, due to increasing life
expectancy.
Half of all osteoporotic fractures are vertebral: a 50 year old woman has a one in
four chance of having such a fracture, a 50 year old man about half that risk [64].
Vertebral fractures tend to occur about two decades earlier than hip and other
osteoporotic fractures, and are often the first clinical sign of osteoporosis. The presence
of even one vertebral fracture increases the risk of any subsequent vertebral fracture
five-fold [100], and the risk of a subsequent hip fracture is doubled [9]. Proven
therapies are available for patients with vertebral fractures, which reduce the incidence of
subsequent fractures by 50% or more [131]. All such patients need treatment, as the
risk of further fractures is high, around 20% in the 12 months following a recent
vertebral fracture. Thus early diagnosis of vertebral fracture is important. Furthermore
in trials of new treatments for osteoporosis, incident vertebral fracture statistics are
studied, and used as a measure of efficacy. This provides another reason for increasing
the reliability and efficiency of vertebral fracture diagnosis.
1.1.3 Novel vertebral fracture detection methods
Currently there are quantitative approaches to diagnosing vertebral fracture, but
these rely on time-consuming and imprecise annotation of vertebrae, in order to
extract morphometric height information. Typically each vertebra is characterised
by three heights (posterior, anterior and middle), and either the heights or various
height ratios are thresholded. Such approaches are sometimes referred to as 3-height
morphometry. These current quantitative methods lack specificity (see Chapter 3
for a thorough discussion), as well as being time-consuming; whereas a widely
accepted method of semi-quantitative expert reading suffers from subjectivity, and is
less suitable when not practised by skilled radiologists (e.g. by other medical
specialists, general practitioners or specialised radiographers). Therefore we have developed
more specific and reliable quantitative methods applied to spinal images acquired by
dual energy X-ray absorptiometry (DXA) scans. We have also successfully applied
the segmentation phase of these to spinal radiographs.
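As a rough illustration of the 3-height morphometry described above, the following Python sketch computes two height ratios for a single vertebra and thresholds them. The class and function names, the choice of ratios, and the 0.8 threshold are assumptions made purely for illustration; they are not taken from any particular published morphometric scheme, and practical schemes typically also compare heights against neighbouring vertebrae or population reference values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VertebralHeights:
    """Three heights of one vertebral body, in mm (hypothetical container)."""
    posterior: float
    middle: float
    anterior: float


def height_ratios(h: VertebralHeights) -> Dict[str, float]:
    """Return two within-vertebra ratios, each relative to posterior height."""
    return {
        "anterior_to_posterior": h.anterior / h.posterior,  # low => wedge-like
        "middle_to_posterior": h.middle / h.posterior,      # low => biconcave-like
    }


def flag_deformity(h: VertebralHeights, threshold: float = 0.8) -> bool:
    """Flag the vertebra if any ratio falls below the illustrative threshold."""
    return any(ratio < threshold for ratio in height_ratios(h).values())


if __name__ == "__main__":
    normal = VertebralHeights(posterior=25.0, middle=24.0, anterior=24.5)
    wedged = VertebralHeights(posterior=25.0, middle=24.0, anterior=18.0)
    print(flag_deformity(normal), flag_deformity(wedged))
```

Because only three numbers per vertebra enter the decision, any shape or texture information away from the six measured points is ignored, which is one way to see why such methods can lack specificity.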
The first stage is to automatically and accurately segment the vertebral bodies. We
use methods from Computer Vision (the Active Appearance Model), and develop
these methods to accurately locate the full outlines of the vertebral bodies. To
do this we have split the spine up into a number of overlapping sub-models. We
have developed a novel approach to combining multiple sub-models into a consistent
overall fit. We assess the accuracy of these methods, comparing several different
sub-model structures, and show our approach is superior to using a single global model -
especially in the more difficult fractured cases.
We have developed novel and more specific fracture classification methods using the
parameters of both shape and appearance models (see Chapter 4). Linear classifiers
are trained, and their performance evaluated on unseen images using miss-1-out (leave-one-out)
tests. By using the appearance model parameters, the false positive rates are reduced
substantially compared to existing quantitative methods (3-height morphometry).
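The evaluation protocol mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the nearest-class-mean rule is a simple linear classifier standing in for the classifiers used in this work, and the feature vectors are invented toy values, not shape or appearance parameters from real images.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def nearest_mean_predict(train_x, train_y, x):
    """Assign x to the class whose training mean is closest (a linear rule)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        centre = mean_vector([r for r, y in zip(train_x, train_y) if y == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, centre))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample in turn, train on the rest, and test on the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_mean_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy data: label 0 = normal, label 1 = fractured (hypothetical values).
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
ys = [0, 0, 0, 1, 1, 1]
print(leave_one_out_accuracy(xs, ys))
```

Because every sample is classified by a model that never saw it during training, the resulting rate is an estimate of performance on unseen images.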
1.2 Overview of Thesis
Chapter 2 describes osteoporosis, how it is detected, measured and treated, in order
to provide the context of the project.
Chapter 3 continues the clinical background with a critical review of current meth-
ods of diagnosing vertebral fracture.
Chapter 4 consists of a literature review of computer vision techniques for robustly
segmenting structures in medical images, leading up to the Active Appearance Model
of Cootes et al which we subsequently use.
Chapter 5 introduces our development of the AAM methodology to allow the consistent
combination of multiple overlapping sub-model AAMs which are fitted in a
data-dependent sequence. We use a constrained form of the AAM in order to incor-
porate the linkage between the sub-models. The linking and sub-model sequencing
algorithm is described in general terms.
Chapter 6 next describes our use of multiple but overlapping Active Appearance
Models to accurately segment vertebrae in DXA images, using the multi-AAM ap-
proach described in Chapter 5. We present results on our DXA dataset and optimise
the AAM form and sub-structures used. We also analyse some of the failure cases
(often these are severe fractures), and propose methods of improving the performance
in difficult cases.
Chapter 7 presents our novel methods of vertebral fracture detection, using linear
classifiers trained on both shape and appearance parameters. The latter encode useful
information about texture around the endplate, and it is shown that this, in addition
to a more complete and subtle shape description, leads to a marked improvement in
specificity compared to existing quantitative (morphometric) methods.
Chapter 8 presents related results on segmentation accuracy of our methods applied
to lumbar radiographs.
Chapter 9 draws some conclusions from the work and outlines areas requiring future
development.
Chapter 2
Clinical Background
2.1 Osteoporosis
Osteoporosis is a progressive skeletal disease characterised by low bone mass and
structural deterioration of bone tissue, leading to bone fragility and an increased
susceptibility to fractures, especially of the hip, spine and wrist. Osteoporosis has
many causes, the most common of which is a deficiency in oestrogen production after
the menopause in women. This deficiency causes a loss of both cancellous (trabec-
ular or spongy) and cortical (compact) bone. Most bones comprise a hard cortical
shell, within which is a fine net of trabeculae (strands), which improve bone strength
whilst adding little weight. The loss of trabecular bone is known as postmenopausal,
or ‘Type I’ osteoporosis. This usually begins after the menopause in women between
the ages of 55-65 and gives rise to fractures in skeletal sites that are rich in tra-
becular bone: especially vertebral and wrist fractures. A decrease in cortical bone,
which occurs approximately 15 years later in life, is known as senile, or ‘Type II’
osteoporosis, occurs in both men and women as age advances, and leads to fractures
of the hip. Poor diet can also contribute to osteoporosis - for example, deficiencies
in calcium, protein and vitamins C and D adversely affect bone health. Many drug
therapies, such as anticoagulants, glucocorticoids, and hormones used in therapeutic
doses, have side-effects which accelerate bone loss. Osteoporosis can also be caused
by a wide variety of other conditions that affect the remodelling of bone, such as
abnormalities in the endocrine system (e.g. hyperparathyroidism, hyperadrenalism
and hypogonadism). Osteoporosis can even appear in childhood (osteogenesis
imperfecta), due to rare inherited forms of the disease (abnormal Type I collagen) which
result in poor bone formation. Maintenance of bone mass requires regular loading
of the bone, which encourages bone development and remodelling. Individuals who
are less active are therefore more likely to become osteoporotic. Almost any chronic
illness can lead to bone loss, with inactivity and malnutrition being major factors.
Figure 2.1: The microstructure of normal (left) and osteoporotic (right) bone.
The mechanism for osteoporotic bone loss is complex and only partially understood.
Bone mass is maintained by osteoblasts, which form bone, and osteoclasts, which
resorb bone from the skeleton. Trabecular bone, which has the highest surface area,
and largest metabolic activity, is remodelled at a greater rate than cortical bone, and
is therefore lost more rapidly from the skeleton when there is an imbalance between
bone formation and resorption. Trabeculae are reduced in number in osteoporotic
bone and the spacings between trabeculae are greater. Hence the bone becomes me-
chanically weaker. Figure 2.1 shows the difference in micro-architecture between
normal and osteoporotic bone, whilst Figure 2.2 shows the change in the microstruc-
ture of trabecular vertebral bone which occurs with osteoporosis.
Bone mass (in healthy individuals) increases from childhood until the early 20’s,
remains static up to age 40-50 years, after which it declines. In men, this decline is
fairly gradual, but in women, the decline is particularly rapid immediately after the
menopause. This decline in bone mass with age is reflected in an increase in the rate
of osteoporotic fractures in the elderly population.
Figure 2.2: The trabecular structure of normal (left) and osteoporotic (right) vertebrae.
2.1.1 Epidemiology
Postmenopausal osteoporosis is a significant cause of morbidity and mortality amongst
the elderly in the Western world, leading to large numbers of fractures of the hip,
spine and wrist. The number of osteoporotic fractures in the U.K. has been estimated
at 200 thousand per annum [49].
As the elderly population grows, due to advances in healthcare and demographic
changes, so the proportion of women (and men) suffering from the disease will con-
tinue to increase. Other trends have meant that the number of osteoporosis sufferers
has increased at an even greater rate than that expected from demographic changes
[2].
Of all fractures due to osteoporosis, hip fractures are the most serious and painful.
Approximately 27% of women who sustain a hip fracture die within 1 year [102],
while half will suffer long-term pain and disability [50]. The incidence of hip fractures
increases dramatically with age, as not only does the bone strength decrease due to
osteoporosis, but also individuals become more prone to falling [109]. Figure 2.3
shows the incidence for women of vertebral, hip and wrist fractures as a function of
age.
The prevalence and incidence of vertebral fractures are difficult to measure, as ver-
tebral fractures can be asymptomatic [43]. Estimates of vertebral fractures must
be extrapolated from epidemiological studies. It appears that typically only severe
vertebral fractures actually result in back pain [47], as the prevalence of back pain
actually declines after age 50, although the prevalence of vertebral fractures increases.
The detection of mild vertebral fractures is additionally complicated by the fact that
no reliable criteria exist to define them, and they are easily confused with other mild
deformities. There is also evidence that vertebral fractures on radiographs are often
not reported [60, 42], or else not acted upon, partly due to the wide variety of
terminology used by radiologists.
Figure 2.3: The variation in fracture incidence rate with age for women. Taken from [132].
In addition to causing pain and deformity to their sufferers, osteoporotic fractures
place a large burden upon national healthcare systems. Over 1.3 million osteoporotic
fractures occur annually in the United States [75]. In Europe in 2000 the number of
osteoporotic fractures was estimated to be 3.79 million, of which 0.89 million were
hip fractures. The financial cost of dealing with osteoporotic fractures in the United
States was estimated to be US$ 20 billion in 1988 and US$ 35 billion in 1998. In
Europe the financial cost was estimated to be 31 billion Euros in 2005; whilst in
England and Wales the cost was £542 million in 1999, and is currently in excess of
£1 billion in the UK. These figures will increase with the number of elderly in the
population in the coming years, providing further impetus for early detection and
effective treatment of osteoporosis. The earlier and more reliably the disease can be
detected, the more patients can benefit from strategies for prevention and treatment.
The scale of the disease and its increasing prevalence make its detection, prevention
and treatment important.
Knowledge of an individual’s lifestyle and medical history can help to detect patients
at high risk of osteoporosis [23, 85]. The most powerful risk factors for osteoporotic
fractures include having low premenopausal oestrogen due to stress, excessive exercise
or anorexia nervosa, and being thin. Dietary and lifestyle factors which increase
the likelihood of osteoporosis include the excessive intake of cigarettes, caffeine, and
alcohol, and a low intake of calcium and vitamin D. Some drugs are also known to
increase risk. An individual’s bone mineral density (BMD) has the most influence on
her/his risk of osteoporotic fracture [119, 137].
2.1.2 Prevention and Treatment
To some extent it is possible to prevent, or at least delay, the disease by avoiding
many of the known risk factors, such as excessive tobacco and alcohol, which have
other negative health consequences. Adequate calcium, vitamin C and D intake, and
regular moderate exercise are also important for maintaining high BMD.
Hormone replacement therapy (HRT) used to be given to women at menopause with
established low bone density. This has been shown to reduce bone loss immediately
after the menopause [48, 134]. Several cohort and case-control studies suggest
that HRT reduces fragility fracture risk by 30 to 50% [134], but that the effect
is lost within 5 years after discontinuation of HRT. However, HRT has side effects
(breast tenderness, uterine bleeding, increased risk of deep venous thromboembolism
and cardiovascular events), and its prolonged use increases the risk of breast can-
cer. Therefore HRT is no longer considered as a first line therapy for prevention
of postmenopausal osteoporosis, except for women who underwent the menopause
before the age of 45. Preliminary results from the Women's HOPE study indicate
that smaller doses of conjugated equine oestrogens (CEE) and medroxyprogesterone
acetate (MPA) are sufficient to slow down the bone turnover and to inhibit bone loss
in early postmenopausal women [89]. Long term evaluation of the side effects of this
regimen is not yet available.
Bisphosphonates (BP) are potent inhibitors of bone resorption through effects on
osteoclast resorption. They are used in a variety of metabolic bone diseases including
osteoporosis. BPs have a poor intestinal absorption, a long skeletal retention and can
induce mild gastrointestinal disturbances. The three bisphosphonates most frequently
used in the treatment of osteoporosis are etidronate, alendronate and risedronate.
There are other bisphosphonates under study e.g., ibandronate and zoledronate.
Alendronate (10 mg/day) was found to increase BMD, decrease levels of biochemical
markers of bone turnover, and decrease, by about 30-50%, the incidence of fragility
fractures [12]. The anti-fracture efficacy has been shown both in women with preva-
lent vertebral fractures and in women with low BMD (T-score ∗ < −2) but without
vertebral fractures [12, 10]. Risedronate (5 mg daily) decreases the incidence of new
vertebral and peripheral fractures by the same extent as alendronate in women with
prevalent vertebral fractures [74, 112]. In osteoporotic women 70 to 79 years of age,
risedronate decreased the incidence of hip fracture by 40% [96]. Histomorphometric
study in patients treated with risedronate for five years supports its excellent
long-term bone safety [129].
The first effective stimulator of bone formation, the recombinant 1-34 fragment of hu-
man parathyroid hormone [rhPTH(1-34)], has recently been approved. Teriparatide
is indicated for the treatment of osteoporosis in postmenopausal women who are at
high risk of a fracture. It also appears to increase bone mass in men with primary
or hypogonadal (low testosterone level) osteoporosis who are at high risk of fracture.
rhPTH(1-34) decreases the incidence of new vertebral fractures and nonvertebral frac-
tures by 65% and 53% respectively in osteoporotic women with prevalent fractures
[105].
Thus proven therapies exist for patients with vertebral fractures which reduce the
incidence of subsequent fractures by 30% to 65%. All patients with prevalent vertebral
fractures require treatment, as there is good evidence that the risk of further fractures
is extremely high, around 20% in the 12 months following a recent vertebral fracture.
Thus early detection of osteoporosis is important, and the early detection of prevalent
vertebral fracture is an important diagnostic feature.
With all new osteoporosis therapies, it is essential that safety and efficacy can be
evaluated rapidly and thoroughly in large multi-centre trials, in order to benefit
patients as soon as possible. The numbers of incident vertebral fractures that occur
for the trial group are used as a measure of efficacy. Furthermore osteoporosis can
be a side-effect of treatments for other conditions: for example chronic glucocorticoid
use [138]. There is also therefore a need for improvements in the detection of incident
fractures during large clinical trials.
∗T-score BMD measure is explained shortly when we discuss the measurement of BMD
2.1.3 Interpretation of Indicators of Osteoporosis
When interpreting indicators of skeletal status, it is important to consider how all
the information available about a patient relates to the patient’s risk of fractures. In
particular age has a significant effect upon how BMD values are interpreted. An 80
year old woman’s BMD may be well below that of a healthy 50 year old, but her
BMD may be above average for her age. Her immediate risk of fracture is much
greater than the 50 year old, but the cumulative risk of her suffering a fracture in the
rest of her life may be less than that of the 50 year old. When considering whether
an individual requires treatment for osteoporosis, both immediate and longer term
risk of osteoporotic fracture are usually considered. The WHO recommend use of a
fracture risk assessment for the next 10 years, using a multi-factor prediction model
including (inter alia) BMD, age, height loss, exposure to systemic glucocorticoids,
parental fracture history, and current fracture status [82].
To detect osteoporosis, and to monitor the effects of treatments on the disease, one
is therefore interested in any measurement that can be performed which relates to
the current and future risk that a patient might suffer an osteoporotic fracture. Such
measurements include direct assessment of bone quantity in the skeleton, measure-
ments of bone turnover using biochemical markers, which help predict future bone
loss, and measurements of bone structure, which contributes to bone strength.
2.2 Measurement of Bone Mineral Density
2.2.1 Purpose
The amount of bone in an individual’s skeleton has been shown to be a very powerful
predictor of the risk of a fracture [78]. In 1994, an operational definition of osteoporo-
sis was proposed by the World Health Organisation (WHO) with diagnostic criteria
of fragility based on the measurement of bone mineral density (BMD) and on the
presence of fractures [81]. There are four categories:
1. Normal: BMD not more than 1 standard deviation below the young adult
mean.
2. Low bone mass (osteopenia): BMD between 1 and 2.5 standard deviations
below the young adult mean.
3. Osteoporosis: BMD more than 2.5 standard deviations below the young adult
mean.
4. Severe osteoporosis (established osteoporosis): BMD more than 2.5 standard
deviations below the young adult mean in the presence of one or more fragility
fractures.
The normalised score $(\mathrm{BMD} - \mu_R)/\sigma_R$ is referred to as the T-score, where $\mu_R$, $\sigma_R$ are the
sample mean and standard deviation in a reference population of young adults. Note
different reference values are used for men and women. This pragmatic definition
in terms of T-score clearly has limitations, as the cut-offs are somewhat arbitrary.
This definition was established for postmenopausal Caucasian women and may not
be applicable to men or women from other ethnic groups, who moreover may have
different population statistics for the “normal” mean and standard deviation. Fur-
thermore the variance of peak BMD depends on the measurement site, and so the
prevalence of osteoporosis (according to this diagnostic) depends on the measurement
site. We discuss in subsequent sections the need to also incorporate measures of bone
structure, strength, and fracture status. In particular a patient in the early stages of
osteoporosis could be in the osteopenia category, but have a number of mild vertebral
fractures, and should be diagnosed as in fact osteoporotic.
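The T-score and the four WHO categories listed above translate directly into a short function. The following is a minimal sketch: the function names and the reference mean and standard deviation are hypothetical placeholders (real reference values depend on sex, population, measurement site and scanner), and the boundary handling follows the wording of the list above.

```python
def t_score(bmd, ref_mean, ref_sd):
    """Normalised score (BMD - mu_R) / sigma_R against a young adult reference."""
    return (bmd - ref_mean) / ref_sd


def who_category(t, has_fragility_fracture=False):
    """Map a T-score onto the four WHO diagnostic categories."""
    if t >= -1.0:
        return "normal"
    if t >= -2.5:
        return "low bone mass (osteopenia)"
    if has_fragility_fracture:
        return "severe osteoporosis"
    return "osteoporosis"


# Hypothetical reference values in g/cm^2, for illustration only.
REF_MEAN, REF_SD = 1.0, 0.12

for bmd in (0.94, 0.80, 0.64):
    t = t_score(bmd, REF_MEAN, REF_SD)
    print(f"BMD {bmd:.2f} -> T = {t:.2f} -> {who_category(t)}")
```

Note that this rule uses fracture status only to separate the last two categories, so a patient with osteopenic BMD and mild vertebral fractures is still labelled osteopenic, which is the limitation discussed above.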
Despite limitations in defining osteoporosis in terms of BMD scores, BMD is still
a powerful predictor of osteoporosis. There have been great advances over the last
decade in non-invasive techniques for very accurate measurement of bone mineral at
a range of skeletal sites. The method most commonly in use at present is dual energy
X-ray absorptiometry (DXA), which has superseded single and dual energy photon
absorptiometry (SPA and DPA). Ultrasound scanning, which is portable, and involves
no ionising radiation, is a promising technology for bone mineral measurement; how-
ever it is not sufficiently reliable at present for routine clinical use. Improvements in
its precision and accuracy may enable it to become a valuable measurement tool in
the future. We now discuss methods of measuring BMD.
2.2.2 Single Photon Absorptiometry (SPA)
Single photon absorptiometry [127] used rectilinear scanning with a beam of 27.3 keV
photons produced from an Iodine 125 source, and a collimated detector to measure
transmitted photons. It was the first method of direct bone mineral measurement
devised, and measured the rate of absorption of photons passing through bone. The
absorption of photons can be related to density and depth of bone through which
the beam has passed. Scanning was performed in a water bath, to correct the effect
of overlying tissue. Bone mineral was measured as ‘bone mineral content’ (BMC)
in grams, measuring the amount of bone in the path of the beam. By dividing this
measure by the projected area (Ap) in the path of the beam, one can also measure
bone mass per unit area (g/cm²), known as areal bone mineral density (BMD). BMD
is the most common measure of bone mass measured by densitometers.
SPA was best suited to the measurement of bone at peripheral skeletal sites, such
as the forearm or heel. To scan more clinically relevant sites, with more overlying
fat and soft tissue, better correction is required. This can be achieved if scanning is
performed at two separate energies. SPA has now been superseded by dual energy
methods (first DPA then DXA, see below).
2.2.3 Dual Photon Absorptiometry (DPA)
Dual photon absorptiometry [140] (now obsolete) measured absorption of photons at
44 and 100 keV, produced by a Gadolinium 153 source. As tissue and bone have
different absorption coefficients at the two energies, the component of absorption
resulting from bone rather than tissue could be calculated. This enabled sites such as
the spine and femoral neck, which are surrounded by soft tissue, to have their bone
density reliably measured.
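The principle of separating bone from soft tissue using two energies can be sketched as a pair of simultaneous equations. The sketch below assumes a simple two-material model in which the log attenuation at each energy is a weighted sum of the bone and soft tissue areal densities; the attenuation coefficients and densities are hypothetical round numbers chosen for illustration, not measured values for any real source or scanner.

```python
import math


def attenuation(i0, i):
    """Log attenuation ln(I0 / I) for incident intensity i0 and transmitted intensity i."""
    return math.log(i0 / i)


def decompose(a_low, a_high, mu_bone, mu_soft):
    """Solve, by Cramer's rule, the 2x2 system
        a_low  = mu_bone[0] * m_bone + mu_soft[0] * m_soft
        a_high = mu_bone[1] * m_bone + mu_soft[1] * m_soft
    where index 0 is the low energy and index 1 the high energy.
    Returns the areal densities (m_bone, m_soft) in g/cm^2."""
    det = mu_bone[0] * mu_soft[1] - mu_soft[0] * mu_bone[1]
    if abs(det) < 1e-12:
        raise ValueError("coefficients do not separate bone from soft tissue")
    m_bone = (a_low * mu_soft[1] - mu_soft[0] * a_high) / det
    m_soft = (mu_bone[0] * a_high - mu_bone[1] * a_low) / det
    return m_bone, m_soft


# Hypothetical mass attenuation coefficients (cm^2/g) at (low, high) energy.
MU_BONE = (0.60, 0.20)
MU_SOFT = (0.25, 0.17)

# Simulate 1.0 g/cm^2 of bone under 20.0 g/cm^2 of soft tissue, then recover both.
a_low = MU_BONE[0] * 1.0 + MU_SOFT[0] * 20.0
a_high = MU_BONE[1] * 1.0 + MU_SOFT[1] * 20.0
print(decompose(a_low, a_high, MU_BONE, MU_SOFT))
```

The system is only solvable because the ratio of bone to soft tissue attenuation differs between the two energies; if it did not, the determinant would vanish.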
By performing DPA in a scanning mode, it could be used as an imaging modality,
although it suffered from very low resolution and poor signal/noise, as photon flux
was low and calculation of bone content involves subtraction of images at separate
energies. This method has in turn been succeeded by dual energy X-ray absorptiometry
(DXA).
2.2.4 Dual Energy X-ray Absorptiometry (DXA)
DXA operates in a similar fashion to DPA, except that a low output X-ray tube is used
instead of a radionuclide source. The use of X-rays enables significantly higher photon
flux to be achieved, resulting in lower noise and improved image quality, shorter scan
times (5 minutes per site), and improved precision of BMD measurements (1-2%).
DXA is capable of measurement of BMD at a large range of skeletal sites, including
the arms, legs, spine, hip and pelvis. Whole body BMD can also be measured. Like
the earlier SPA and DPA, DXA measures “areal” BMD (in g/cm²), which depends
on both volumetric BMD and bone dimensions; but as fracture risk depends on both
bone mineralisation and bone size, areal BMD is a good predictor of fracture risk.
The accuracy and reproducibility of DXA are better than those of other densitometric
methods [63]. There is a good correlation between BMD at different measurement
sites, but the best predictor of the risk of fracture at a site is the BMD measured at
that site [36]. Sites for BMD measurement in clinical practice are the lumbar spine
(L1-L4) and the hip (femoral neck and total hip).
Initially DXA used a pencil beam and a single detector moved in a raster across
the site of measurement. Modern DXA scanners employ fan-beam X-ray sources
and a bank of detectors to image a whole line array simultaneously. This allows
faster scanning (approximately 3-5 minutes per site) than rectilinear (‘pencil’ beam)
systems, and has improved image quality and spatial resolution. DXA is currently
one of the most effective and reliable methods of measuring bone density, and its use
is increasing. There are around 27,000 central DXA scanners worldwide. Its only
major disadvantage is the use of ionising radiation, albeit at a low dose (only 1-6 µSv
for BMD measurement, 7 µSv for single-energy mode spine imaging, and 42 µSv for
dual energy spine imaging, compared to 500 µSv for conventional radiography [14]).
Imaging artefacts can cause inaccuracies in DXA areal BMD measurements [1], most
commonly in the lumbar spine. Degenerative disc disease with osteophytes, or
osteoarthritis with hyperostosis of the facet joints can falsely elevate BMD; laminectomy†
would falsely reduce the BMD of the affected vertebra; vertebral fracture can also
falsely elevate BMD (same BMC as before fracture, but Ap is reduced). A similar
effect of reduced projected area can cause overestimation of BMD at the hip (due to
patient positioning), if there is inadequate internal rotation of the femur (resulting
in foreshortening of the femoral neck and reduction of Ap). As DXA uses the soft
tissues as a reference, errors in BMD can also arise if the patient is excessively under-
or overweight.
†the removal of the laminae and spinous process
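The projected-area artefact described above follows directly from the definition of areal BMD as BMC divided by Ap. The short sketch below works through it with hypothetical numbers (a vertebra holding 12 g of mineral whose projected area falls from 12 to 10 cm² after a fracture); the function name and values are assumptions for illustration.

```python
def areal_bmd(bmc_g, projected_area_cm2):
    """Areal bone mineral density (g/cm^2) = bone mineral content / projected area."""
    if projected_area_cm2 <= 0:
        raise ValueError("projected area must be positive")
    return bmc_g / projected_area_cm2


# Hypothetical vertebra: same mineral content before and after fracture,
# but the collapsed vertebra projects onto a smaller area.
before = areal_bmd(12.0, 12.0)
after = areal_bmd(12.0, 10.0)
print(before, after)
```

With these numbers the areal BMD rises by about 20% even though no bone has been gained, which is why fractured vertebrae can mask low bone mass in lumbar spine measurements.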
As the image quality of DXA scans improved, so measurements based upon image
structure rather than intensity (BMD) became feasible. The measurement of verte-
bral shape and other bone dimensions can now be performed from DXA images with
reasonable accuracy, as will be further discussed in Section 3.2.2.
A review of DXA technology and clinical use is given in [1].
2.2.5 Quantitative Computed Tomography (QCT)
In Quantitative Computed Tomography (QCT) [62] a radiation source produces X-
rays that pass through the patient to a detector on the opposite side. The source and
detector rotate about the imaged volume, and the attenuated X-rays are obtained
as a set of 2-D projections. Mathematical reconstruction algorithms are then used to
reproduce the 3-D representation of the spatial variation in attenuation within the
imaged volume. Calibration phantoms, made of different concentrations of calcium
hydroxyapatite in water-equivalent plastic, are used to convert attenuation to true
volumetric bone mineral density. QCT is the only method regularly in use which
enables volumetric bone mineral density (g/cm³) to be measured, but at the cost
of increased radiation dose. Precision of QCT-measured ‘true’ BMD is excellent (c.
1% CoV). Single energy QCT systems for measurement of bone mass have been in
use since the early 1980s. The 3-D nature of the technique means that QCT allows
examination of the separate contributions of cortical and trabecular bone. Since the
trabecular bone is normally weakened first in osteoporosis, this makes it a sensitive
technique for detecting vertebral bone loss [72], but the method requires careful
calibration using a reference phantom. The most common application of QCT to bone
densitometry is the direct measurement of trabecular bone in the lower vertebrae of
the spine (L1-L3) using general purpose CT scanners. Specialised scanners have also
been developed to measure BMD in peripheral skeletal sites. These obtain 1.2mm
sections through the region of interest, in an effort to reduce dose. Thus QCT is an
established and useful tool in the measurement of site-specific bone density. The fat
content of trabecular bone means that dual-energy QCT can also be used to further
improve accuracy, but at the cost of higher dose and poorer precision.
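The phantom calibration step described above can be illustrated with a least-squares straight line. This is a sketch under the assumption that measured attenuation varies linearly with hydroxyapatite concentration over the range of the phantom; the function names and the attenuation and density values are hypothetical and not taken from any real scanner or phantom.

```python
def fit_calibration(attenuations, densities):
    """Least-squares line density = slope * attenuation + intercept,
    fitted to the phantom inserts of known density."""
    n = len(attenuations)
    mean_x = sum(attenuations) / n
    mean_y = sum(densities) / n
    sxx = sum((x - mean_x) ** 2 for x in attenuations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(attenuations, densities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_density(attenuation, slope, intercept):
    """Convert a measured attenuation value to volumetric density."""
    return slope * attenuation + intercept


# Hypothetical phantom: measured CT numbers and known densities (mg/cm^3).
phantom_ct = [0.0, 100.0, 200.0]
phantom_density = [0.0, 80.0, 160.0]

slope, intercept = fit_calibration(phantom_ct, phantom_density)
print(to_density(150.0, slope, intercept))
```

In practice the phantom is scanned with the patient so that the fit reflects the scanner's response at the time of the examination.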
2.2.6 Quantitative Ultrasonography (QUS)
In quantitative ultrasonography (QUS) bone density is measured using two param-
eters of ultrasound transmission: speed of sound (SOS) and broadband ultrasound
attenuation (BUA) [44]. Most equipment measures these parameters at the calcaneus,
phalanges of the fingers, tibia and patella. Most scientific data on QUS fracture
prediction has been obtained at the calcaneus. Correlation between QUS and DXA
BMD is modest [44], and the predictive power of QUS for osteoporotic fracture is
slightly lower than BMD. Despite the limited range of sites at which this technique
can be applied, its low cost, portability, and lack of ionising radiation mean that it
may become a practical alternative to X-ray based methods for routine screening,
although the technique is used primarily as a research tool at present. QUS has yet
to be widely used in clinical practice [44], possibly because its long term precision
is rather low. It is temperature-dependent, and there are no reliable phantoms for
cross-calibration between scanners.
2.3 Measurement of Bone Structure and Integrity
Bone density is not the only determinant of bone strength. BMD alone is insufficient
to determine bone strength [59], and there is a considerable overlap of BMD for
patients with and without fragility fractures [6]. Bone shape and structure also
affect its strength, and hence the likelihood of fracture. Bone shape can affect the
chances of a future fracture in one of two ways. Firstly, a shape change may have
resulted from an osteoporotic fracture itself (such as in the spine), indicating that
damage has already taken place. Secondly, a bone’s shape may affect the stress it
experiences under normal loading. For example, the natural variation in shape of
the femoral neck means that some individuals are at greater risk of hip fracture than
others, purely as a result of their femoral neck shape. Those with greater hip axis
length on DXA are at greater risk of hip fracture [51, 52].
Measures of the micro-structure of the trabeculae can also help in assessing the bone
strength. Various techniques have been established for describing changes in bone
structure and shape at a range of skeletal sites, and many of these have been shown
to be useful in improving fracture prediction [106, 67]. For example Smyth et al [125]
developed a method of characterising the texture of the proximal femur which had
a moderate correlation with expert radiologist evaluation of the Singh index [124].
Gregory et al [70, 69] have developed a combined predictor of hip fracture based
on BMD, shape descriptors, and texture descriptors of the bone micro-architecture.
Other trabecular bone structure descriptors of the distal radius, the calcaneus, and
the spine have been reported to improve fracture risk evaluation when combined with
BMD [90]. Recent developments in QCT technology (peripheral QCT, or pQCT, scanners) allow
high resolution imaging (82 µm voxels) of the distal radius and allow individual trabeculae
to be visualised. This allows a variety of histomorphometric measures: trabecular
bone volume fraction, trabecular thickness, spacing and number. It has been found
that pQCT-derived trabecular density at the distal radius is significantly different
in osteopenic women with a history of fracture than in those without [16], whereas
DXA-derived BMD was not able to distinguish between these fracture/non-fracture
groups.
Magnetic Resonance Imaging (MRI) (see also section 3.2.3) can be used to derive
bone structure estimates. Bone tissue has a very low water content, and additionally
protons within bone tissue matrix have a short T2 relaxation time - an MRI measure
reflecting the chemical environment of the protons. As a result bone gives no signal
in standard MRI. In high-resolution T2 weighted MRI images, bone tissue appears
black while bone marrow (because of its water and fat content) produces a high
intensity signal within the inter-trabecular spaces. Thus the trabecular network can
be visualised indirectly through marrow visualisation. MRI derived T2* was found
to have a predictive value in differentiating between healthy women and osteoporotic
women with mild fractures [37]. High resolution MRI bone imaging is most commonly
performed at peripheral sites (heel and wrist), but recent developments have allowed
high-resolution imaging of the proximal femur [86]. Morphology parameters and
fractal analysis from high-resolution MRI images can be used to detect differences in
trabecular structures with age, BMD and osteoporotic status [92, 91].
The use of combined structure/BMD measurements is likely to become increasingly
widespread as technological improvements begin to allow the clinical community to
move towards multiple factor risk assessments. A summary review is given in [84].
But for now these methods are still more in the research stage than in clinical use.
2.4 Vertebral Fractures
As vertebral fractures can occur relatively early in the progression of osteoporosis,
their presence is a very important diagnostic. The presence of a (non-traumatic)
vertebral fracture is itself an indicator of a loss of bone strength. In trials of osteo-
porosis treatments or prevention regimes in peri- and postmenopausal women, the
rate of vertebral fracture is used as a quantitative measure of the effectiveness of
the method under trial. The diagnosis of vertebral fracture is however by no means
straightforward, and is considered in more detail in the next chapter.
Chapter 3
Vertebral Fracture
3.1 Vertebrae and Vertebral Fractures
Vertebrae are box-like structures which form the spine, supporting the body’s weight
whilst allowing flexibility. The basic (lateral) anatomy of a vertebra is shown in
Figure 3.1.
The spine is composed of several sections (Figure 3.2). In this work we will be
concerned with vertebral fractures occurring in the thoracic and lumbar spine. In
the chest region the thoracic spine attaches to the ribs. There are 12 vertebrae in
the thoracic spine, denoted T1 (uppermost) to T12. In the lumbar spine there are
normally five vertebrae, denoted L1 (uppermost) to L5.
Figure 3.1: The lateral anatomy of a vertebra. Labelled structures: spinous process, vertebral body, superior vertebral notch, inferior vertebral notch, inferior articular facet, superior articular facet, cortical bone, trabecular bone.
Figure 3.2: The spinal column, showing the numbered cervical, thoracic, and lumbar vertebrae.
A vertebral fracture commonly occurs when the inner structure of the vertebral body
(made from cancellous trabecular bone) has been weakened by osteoporosis, and the
vertebra breaks under only very mild trauma. Typically the central endplate collapses
(or is pushed by the expanding nucleus pulposus) into the vertebral body. Most mild
fractures take the form of a concave central depression of the endplate. Often the
cortical rim of the vertebra remains intact, at least in the early (mild) stages of the
fracture. In lateral view this gives rise to a complex edge appearance, where the
vertebral ring appears as an exterior edge, and the depressed endplate as an inner
edge with a rather diffuse appearance. Fractures may then progress to become wedge-
like (anterior height reduction) if there is fracture of the vertebral ring and vertebral
body cortex. Finally the posterior cortex may also collapse giving rise to a crush
fracture where there is wholesale loss of height. Vertebral fracture is a continuous
process, rather than a simple dichotomy (fractured or not), and in its early stages it
can be difficult to diagnose.
Figure 3.3 shows the typical appearance of normal and fractured vertebrae imaged
using conventional radiography, together with rather ambiguous cases which could be
early mild fractures, but which could also be some other form of mild deformity. Fig-
ure 3.4 shows a radiograph of a spine displaying symptoms of advanced osteoporosis
with numerous severe fractures.
3.1.1 Significance
Low energy vertebral fractures can be a source of back pain, although they can often
be asymptomatic [47], and hence undiagnosed. The presence of even one vertebral
fracture increases the risk of any subsequent vertebral fracture five-fold [100], and
the risk of a subsequent hip fracture is doubled [9]. Vertebral fractures also result in
height loss, kyphosis and morbidity.
Pharmaceutical trials commonly use the presence of vertebral fractures as an entry
criterion for diagnosis of established disease. Although vertebral fracture incidence is
a less powerful and precise indicator of osteoporosis than bone mass, the fact that it
more directly indicates bone strength means that it is a more trusted validation for
treatments. In a report from the 2003 MORE study, the presence of severe vertebral
fracture was the strongest predictor of future vertebral and non-vertebral fractures
[41].
Figure 3.3: The left hand image is a good quality spinal radiograph showing generally healthy vertebrae. However both T9 and T7 (2nd and 4th from bottom) may be early mild fractures, though they could also be slightly deformed for some other reason. The right hand image shows an image with a moderate vertebral fracture.
Figure 3.4: This radiograph shows an osteoporotic spine with numerous severe fractures.
3.2 Imaging the Spine
3.2.1 Conventional Radiography
The conventional method of imaging the spine in order to detect vertebral fractures
is to perform lateral X-ray radiography. It is not possible to image the complete
spine on a single X-ray film, so normally both a lumbar and a thoracic radiograph
are taken to cover all vertebrae from L4 to T4.
Lateral radiography causes a projection effect due to the divergent beam, resulting in
enlargement and distortion of the vertebrae furthest from the centering point when
imaged. The effect of this is shown in Figure 3.5, in which vertebrae above or below
the X-ray centre point are magnified and distorted, as a result of the X-ray beam no
longer passing laterally through the vertebra, but at an angle from above or below.
If the patient is not correctly aligned (spine parallax), or perhaps has a scoliosis,
then there may be apparent tilting of the vertebral bodies. This is sometimes called
the “bean-can” effect, and leads to the vertebrae rims being distorted into quasi-
elliptical shapes. In extreme cases adjacent vertebral bodies may even appear to
inter-penetrate. Such radiographs are very difficult to interpret, and sometimes false
positive diagnoses of vertebral fracture may result from the distorted perspective,
which tends to give the endplate rim the concavely depressed appearance typical of
endplate fracture. Figure 3.6 shows a radiograph with a moderate tilting effect, and
a more seriously malaligned radiograph.
Although spinal radiographs give high resolution, they expose patients to a high
radiation dose, and the need to use two overlapping radiographs to visualise the
whole spine makes it difficult to identify the vertebral levels reliably.
3.2.2 Imaging with DXA and SXA
The use of lateral spine images, obtained with fan-beam X-ray bone densitometry
systems, offers a potential practical alternative to radiographs for clinical analysis
of vertebral fractures. DXA (and SXA) scanners, described in Section 2.2.4, use a
parallel beam geometry to eliminate the projection effect of conventional radiography.
Figure 3.5: The projection effect of lateral radiography on the spine. Diagram labels: source, film, vertebrae in spine (patient facing away from page), image of vertebrae distorted due to projection.
As DXA scanners have come to be used for measurement of bone dimensions to assess
osteoporosis, so the criteria for choosing the scanning method have changed. Such
morphometric measurements can also be performed using DXA scanners operating in
a single energy (SXA) mode, which can give a quicker scan time with some equipment.
However SXA imaging tends to give a noisy picture of the thoracic spine, due to soft
tissue associated with the lungs and ribs, and diaphragm motion artefacts. A review
of DXA technology and clinical use is given in [1].
Figure 3.6: Apparent tilting and parallax effects on lumbar radiographs. The left hand image shows modest beam misalignment in the upper part of the radiograph (T12/L1). The right hand image shows more serious parallax effects.
Figure 3.7 shows vertebrae imaged using Dual Energy X-ray Absorptiometry (DXA),
showing both a good quality image of a healthy spine, and a severely osteoporotic
spine in which almost every vertebra is fractured. Figure 3.8 shows an image with
several vertebral fractures, together with a zoomed-in view of the fractured T7-T9
vertebrae. The horizontal lines appearing in the middle of some of these images are
diaphragm motion artefacts due to breathing during the exposure. DXA involves a
substantially lower radiation dose than conventional spinal radiographs [14], avoids
the projectional parallax effects of conventional radiographs due to the divergent
X-ray beam, and has the whole spine on a single image. DXA vertebral fracture
assessment can be combined with a bone mineral density (BMD) assessment using
the same equipment. If the scanner has a C-arm then the lateral spine images can
be obtained in the supine position without repositioning the patient into the lateral
decubitus position ∗. Although the best resolutions available from DXA and SXA are
still significantly worse than that available using conventional radiography (0.35mm
vs 0.01mm for radiography), good agreement is obtained between morphometric mea-
surements on DXA and radiographs [130], and between expert reading of DXA images
and radiographs for moderate and severe fractures [110, 8, 56, 58, 20]. There are,
however, discrepancies over mild fractures, and there can be problems in visualising
the upper thoracic vertebrae on DXA (above T7), though there tend to be fewer ver-
tebral fractures in this region. Thus visual vertebral fracture evaluation, using lateral
spine images obtained with a DXA device, in the absence of, or as a pre-screen for
conventional radiography, can significantly improve osteoporosis risk evaluation. The
combined evaluation of vertebral fracture status and BMD could become the stan-
dard for patient evaluation, particularly in older postmenopausal women for whom
vertebral fractures are common but may be asymptomatic. A proportion of cases
where differential diagnosis is not possible on DXA, or where patient obesity causes
a particularly poor signal to noise ratio, would need referring for conventional spinal
radiography.
3.2.3 Magnetic Resonance Imaging (MRI)
Magnetic Resonance Imaging (MRI) is a non-ionizing modality that uses the interaction
of a strong magnetic field with the spin alignment of hydrogen nuclei
(i.e. protons), typically in water, to produce an image. Sequences of radiofrequency
pulses are used to produce 3-dimensional images. The interactions between protons,
gradient fields, and radiofrequency pulses allow an image to be created based on
spatial frequency encoding. Hydrogen, present in musculoskeletal water, is the most
frequently studied component by MRI. Bone tissue has a very low water content and
as a result bone gives a low signal value in standard MRI. MRI imaging can therefore
be used for vertebral morphometry. Normally the T1-weighted sagittal slice is used.
Figure 3.9 shows the typical appearance of vertebrae on an MRI image.
∗ i.e. lying on one side with the knees tucked up in a somewhat foetal position
Figure 3.7: The left hand image is a good quality DXA image showing a healthy spine with points marked for 6-point vertebral morphometry. The right hand DXA image shows a severely osteoporotic spine, with almost every vertebra fractured, and poorer signal to noise ratio probably due to low BMD or patient obesity.
Figure 3.8: The left hand DXA image shows a spine with multiple fractures in both thoracic and lumbar spines, whilst a zoomed-in view of the fractured T7-T9 is shown on the right.
Figure 3.9: Appearance of vertebrae on a T1-weighted sagittal slice MRI image of the thoracic spine (T11-T9).
Goh et al [66] evaluated the precision of MRI-based vertebral morphometry on a
dataset of 220 mid sagittal T1-weighted MRI images. They found a high degree of
precision (c. 2% CoV) for vertebral body heights. They concluded that MRI-based
morphometric analysis is likely to be superior to using radiographs, given the selective
planar nature of MRI imaging. Unlike radiological studies, MRI investigations of the
thoracic spine enable visualisation of the upper thoracic segments (above T4). More
importantly, MRI vertebral morphometry data may be examined and interpreted in
relation to other spinal pathologies or clinical findings, such as malignancy, infection,
and disc degeneration.
A similar study by Tomomitsu et al [133] comparing vertebral height measurements
with T1-weighted sagittal MRI to radiographs for the lumbar spine concluded that
mild biconcave fractures could be better detected by MRI than by radiographs. How-
ever the conclusion was a non-sequitur, as the morphometric criteria used for fracture
diagnosis were based on thresholds defined for radiographs, and as discussed in the
next section are notoriously non-specific in any case. The study showed that MRI
tended to produce values of anterior and posterior height that were greater than those
obtained from radiographs, whereas the value of central vertebral height tended to
be lower in the MRI images (which explains why more morphometric “fractures”
were diagnosed on MRI!). Nevertheless the ability of the modality to allow accurate
height information was established. Although the cost of the technique has reduced
in recent years, access to MRI scanners still tends to be restricted to higher priority
uses, and so the use of MRI for vertebral fracture evaluation is not yet established in
clinical practice.
3.2.4 Sagittal Computed Tomography
Bauer et al [5] investigated the use of sagittal reformations of axial computed tomog-
raphy datasets in the identification of vertebral fractures. The conclusion was that
sagittal CT reformations could more accurately assess vertebral fractures than stan-
dard radiographs, as long as the thinner available slice thicknesses were used (<3mm).
A recent study by Williams et al [139] also concluded that vertebral fractures could
be seen in sagittal slice reformations from 3D volume QCT images, although the
sensitivity of axial CT images was found to be poor. Although the original 3D axial
images may have been acquired for other purposes, it was recommended that the
sagittal reformation be examined by the consulting radiologists for signs of vertebral
fracture.
3.3 Vertebral Fracture Identification
The diagnosis of vertebral fracture can be challenging and somewhat controversial,
and there is no absolute “gold standard” for the definition of vertebral fracture. A
review of the various diagnosis methods is given by Guermazi et al [71]. Established
methods of vertebral fracture definition focus mainly on the identification of short
vertebral height as the indicator of a vertebral fracture. This can be problematical,
especially in the case of mild fractures for two reasons. The change in height cannot
be determined longitudinally (from a single image), and short vertebral height is not
specific to osteoporotic fracture. Short vertebral height may be a long-standing devel-
opmental abnormality, or due to for example Scheuermann’s disease or degenerative
disc disease; or normal vertebrae may exhibit misleading appearances on radiographs
due to parallax when the vertebral bodies are projected obliquely. Figure 3.10 shows
potentially confusing radiographs, one displaying reduced height due to Schmorl’s
nodes, and the other apparent wedging due to remodelling in degenerative disease.
Figure 3.10: Examples of non-fracture vertebral deformities. The left hand image shows a number of vertebrae with Schmorl’s nodes, and the right hand image shows the wedge-like appearance caused by remodelling due to degenerative disease.
Existing quantitative morphometric methods (discussed further below) based on
posterior, anterior and middle heights are particularly prone to false positives on
non-fracture short vertebral height deformity. Therefore the Genant semi-quantitative
method [65] (and see next paragraph) has become almost a de facto gold standard
for fracture assessment, but there still remains significant subjectivity, particularly
for mild (grade 1) fractures. The subjectivity problem is discussed at length by Jiang
et al and by Ferrar et al [79, 54], who propose an Algorithmically Based Qualitative
diagnosis method (ABQ). In addition to the issue of subjectivity, another problem
is the inadequate number of radiologists in some countries to interpret radiographs
[7]. Furthermore, newer scanning technologies (DXA) are becoming available in units
other than radiology departments. Therefore it is desirable to define a quantitative
approach which can capture at least some of the more subtle information used in
expert visual assessment.
3.3.1 Semi-quantitative identification of vertebral fracture
Figure 3.11: The Genant semi-quantitative grading system.
Identification of fracture in the Genant semi-quantitative (SQ) method [65] is based
on both the appearance of apparent reduction in vertebral body height, and also the
identification of radiological characteristics of fracture at the vertebral endplate. The
evaluation of the change in height is standardised so that a fracture is identified if
vertebral height appears reduced by more than 20%, and fractures are graded into 3
grades (mild, moderate, severe) corresponding to height loss of 20-25%, 25-40%, or
more than 40% respectively. Fractures can also be assigned a type (endplate, wedge
or crush) as illustrated in Figure 3.11. However height reduction alone is not intended
to be the sole criterion, and expert assessment of the radiological characteristics of
fracture should also be used: changes at the endplate and cortical margin, lack of
consistency with adjacent vertebrae, and lack of parallelism of the endplates. There
is a need for greater clarification and consistency of these radiological characteristics
of fracture, compared to other diagnoses. Otherwise there is a danger of too much
subjectivity. Nevertheless studies have shown good inter-radiologist concordance in
applying SQ, when the radiologists doing the assessment had been explicitly trained
in the technique [87, 141, 11]. On the other hand the proportion of mild vertebral
fractures identified by SQ tends to be greater than for other methods. For example in
the Study of Osteoporotic Fractures (SOF) almost four times as many mild vertebral
deformities were identified by SQ than by various quantitative approaches [142], and
in another study [11] mild vertebral fractures identified by SQ in men were found not
to be correlated with BMD measurements.
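The height-loss part of the SQ grading can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted above (20%, 25% and 40%). The function name and the treatment of values lying exactly on a boundary are assumptions, and the sketch deliberately ignores the expert assessment of endplate and cortical changes that the method also requires.

```python
def genant_grade(height_loss_fraction):
    """Map apparent vertebral height loss (0.0 to 1.0) to an SQ grade.

    Returns 0 (no fracture on height alone), 1 (mild), 2 (moderate)
    or 3 (severe). Boundary handling (e.g. exactly 25%) is an
    assumption of this sketch, not part of the published method.
    """
    if not 0.0 <= height_loss_fraction <= 1.0:
        raise ValueError("height loss must be a fraction between 0 and 1")
    if height_loss_fraction > 0.40:
        return 3  # severe: more than 40% height loss
    if height_loss_fraction > 0.25:
        return 2  # moderate: 25-40%
    if height_loss_fraction > 0.20:
        return 1  # mild: 20-25%
    return 0      # height reduction of 20% or less
```

A grade of 0 here means only that the height criterion is not met; it says nothing about the radiological characteristics discussed above.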
Jiang et al and Ferrar et al [79, 54] have proposed an Algorithmically Based Qualitative
diagnosis method (ABQ), in order to address the rather subjective assessment of
radiological characteristics of fracture in SQ. In ABQ more stress is placed on the
concavely depressed appearance of the collapsed endplate as being the fundamental way
of distinguishing an osteoporotic fracture from other forms of short height deformity.
As already discussed above this tends to produce a more diffuse or multiple-edged
appearance to the vertebra, whereas for example in wedging due to degenerative
disease the edges of the vertebral bodies remain relatively crisp. ABQ has a rec-
ommended flowchart of various differential diagnoses. In a recent study [55] it was
found that women with ABQ-identified vertebral fractures had significantly lower
BMD (and age-adjusted BMD) than those women who had been diagnosed with ver-
tebral fracture by other methods, but were regarded as short vertebral height (SVH)
deformities (not fractures) by ABQ. Similarly the ABQ-positives had better correla-
tion with other osteoporosis surrogates (history of non-vertebral fracture, low weight
and self-reported height loss).
3.3.2 Quantitative Morphometry
Melton [21] developed definitions of vertebral fractures utilising percentage reduc-
tions in ratios of anterior, middle or posterior heights of vertebral bodies compared
with normal values for that particular vertebral body. Eastell [45] modified this
method, defining fractures on the basis of standard deviation reductions instead of
fixed percentages. These were standard deviations derived from an (outlier-trimmed)
normal population. A 3 SD threshold was proposed. The Eastell/Melton criteria
define endplate fractures when the mid-height to posterior height ratio is below a
threshold, and wedge fractures when the anterior to posterior height ratio is below
a threshold. Crush fractures are defined by taking the ratio of the posterior height
to that of both neighbours and then thresholding both of these. One problem in the
crush criteria is that by using two ratios it doubles the chances of a false positive.
There can also be problems in identifying crush fractures if the neighbouring verte-
brae are fractured. Also false positive endplate or wedge fractures can be identified
because the posterior height is unusually large. These considerations led McCloskey
[95] to propose a number of modifications to the Eastell/Melton standard criteria,
including the use of predicted posterior heights and the addition of more complex
criteria in order to reduce false positives.
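As a rough illustration of the Eastell/Melton style of criteria described above, the following Python sketch thresholds the three kinds of height ratio at a fixed number of standard deviations below a normal mean. The function name, the layout of the reference data, and the choice to flag a crush fracture when either neighbour ratio is low are assumptions made for the example; the reference means and SDs must come from a real normal population.

```python
def eastell_melton_types(ha, hm, hp, hp_below, hp_above, ref, n_sd=3.0):
    """Return the set of fracture types flagged for one vertebra.

    ha, hm, hp         : anterior, middle and posterior heights
    hp_below, hp_above : posterior heights of the two neighbours
    ref                : {ratio name: (normal mean, normal SD)} for this
                         vertebral level (hypothetical layout)
    n_sd               : SDs below the mean used as the threshold
    """
    def is_low(value, name):
        mean, sd = ref[name]
        return value < mean - n_sd * sd

    types = set()
    if is_low(hm / hp, "mid_post"):
        types.add("endplate")
    if is_low(ha / hp, "ant_post"):
        types.add("wedge")
    # Assumption: either neighbour ratio below threshold flags a crush.
    if is_low(hp / hp_below, "post_below") or is_low(hp / hp_above, "post_above"):
        types.add("crush")
    return types
```

Because both the endplate and wedge tests divide by the measured posterior height, an unusually large posterior height lowers both ratios, which is the false-positive mechanism noted above.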
In the McCloskey method the expected posterior height is predicted from up to four of
its neighbours in a way which eliminates the use of fractured neighbouring vertebrae
as predictors. The algorithm proceeds up the spine † and predicts a vertebra’s pos-
terior height from up to four of its nearest neighbours. Each prediction assumes that
vertebral heights scale in the same ratio as the ratios of their mean heights, so some
mean reference data are required (typically obtained from a trimmed population of
healthy subjects). This gives up to four predictors, but some of the four neighbours
may be excluded from the prediction step. Firstly any vertebra below the currently
considered one that has already been classified as a crush fracture is excluded. Sec-
ondly the maximum of the predicted heights from the remaining neighbours is taken
as an initial base against which to compare the other remaining predictors. If the ra-
tio of any of the remaining predictors to this maximum is less than the crush fracture
threshold, then it too is excluded. The final prediction is the mean taken over the
remaining set of predictors. This gives a baseline posterior height against which to
assess crush ratios. Furthermore a vertebra is only considered to have an endplate or
wedge fracture if the ratio of the mid-height or anterior height to both the posterior
and predicted posterior heights is below threshold. These additional criteria tend to
reduce false positives caused by large posterior height.
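The prediction step just described can be sketched as follows. This is a minimal reading of the text, not McCloskey's published implementation: it assumes the four neighbours are the two vertebrae on either side, that vertebrae are listed most caudal first, and that a single fixed ratio (the hypothetical `crush_threshold`) stands in for the crush fracture threshold.

```python
def predict_posterior_height(idx, heights, ref_means, crushed, crush_threshold):
    """Predict the posterior height of vertebra idx from its neighbours."""
    predictions = []
    for j in (idx - 2, idx - 1, idx + 1, idx + 2):
        if j < 0 or j >= len(heights):
            continue
        if j < idx and crushed[j]:
            continue  # lower neighbour already classified as a crush fracture
        # Heights are assumed to scale in the ratio of the reference means.
        predictions.append(heights[j] * ref_means[idx] / ref_means[j])
    if not predictions:
        return None
    base = max(predictions)
    # Drop predictors that are themselves "crushed" relative to the largest.
    kept = [p for p in predictions if p / base >= crush_threshold]
    return sum(kept) / len(kept)


def classify_crush(heights, ref_means, crush_threshold=0.8):
    """Proceed up the spine, flagging crush fractures as they are found."""
    crushed = [False] * len(heights)
    predicted = [None] * len(heights)
    for i in range(len(heights)):
        predicted[i] = predict_posterior_height(
            i, heights, ref_means, crushed, crush_threshold)
        if predicted[i] is not None:
            crushed[i] = heights[i] / predicted[i] < crush_threshold
    return predicted, crushed
```

For five vertebrae with equal reference means and posterior heights of 30, 30, 20, 30 and 30, only the third is flagged, and its low height is excluded from the predictions made for its neighbours.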
Minne [103] has developed an approach that assesses the presence and severity of
vertebral fractures by comparing vertebral heights that have been normalised for
body size by dividing all values by the corresponding values of T4. These results are
then compared to a normal range based on the values for healthy young women.
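The normalisation step in this approach amounts to a division by the T4 values, as in the short Python sketch below. The dictionary layout and function name are assumptions made for illustration, and the subsequent comparison with a normal range is not shown.

```python
def normalise_by_t4(heights):
    """Divide every vertebral height by the corresponding height of T4.

    heights: {level: {site: height}}, e.g. {"T4": {"anterior": 20.0}, ...};
    it must contain an entry for "T4" (hypothetical layout).
    """
    t4 = heights["T4"]
    return {
        level: {site: value / t4[site] for site, value in sites.items()}
        for level, sites in heights.items()
    }
```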
† most caudal vertebra first
The main weakness of QM is its lack of specificity, because QM treats all height
reductions as indicative of vertebral fracture. Poor differentiation between true fracture
and non-fracture deformity by QM mainly affects mild deformities [141, 11, 68]. Wu
et al. [141] compared SQ and a morphometric approach using different fracture
criteria for the detection of incident fracture. A consensus reading by three observers using
SQ was used as the gold standard. There was only fair to moderate agreement
between quantitative morphometry and visual interpretation (the highest kappa score
was 0.63). The authors concluded that the assessment of incident fractures with
morphometric techniques may not be sufficiently reliable, and that visual reading
by a trained observer should also be performed. In a comprehensive study, Black
et al. [11] compared four different morphometric techniques in 3,013 spine films.
In addition, SQ was compared with the morphometric approach in 502 cases. The
agreement between the semi-quantitative approach and the quantitative approaches
was moderate. There was a high concordance between quantitative morphometry
and the semi-quantitative evaluation for fractures defined as moderate or severe by
semi-quantitative reading. There was, however, a significant discordance for fractures
designated mild in the semi-quantitative reading. In a recent study by Ferrar et al
[55] into the ABQ method with a fracture-enriched population, it was found that
those women who were diagnosed with osteoporotic fracture by ABQ correlated well
with other measures of osteoporosis (BMD age-adjusted z-score, weight and height
loss); whereas those ABQ-negatives with QM fractures (often mild wedges) were not
significantly different to the normal population with regard to these measures. The
conclusion was that morphometric diagnosis alone is insufficiently specific.
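For readers unfamiliar with the kappa score quoted above, the sketch below computes Cohen's kappa for two sets of per-vertebra labels (for example a QM classification and an SQ reading). It is given only to show what the statistic measures, namely agreement beyond that expected by chance; the cited studies may have used a different variant, and the function name is an assumption.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Proportion of cases on which the two readings agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each reading's label frequencies.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both readings use one identical category throughout
    return (observed - expected) / (1.0 - expected)
```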
3.4 Conclusion
There is a significant degree of under-reporting of vertebral fractures, but problems
remain in the differential diagnosis of osteoporotic fracture. There is a large degree
of subjectivity in the application of the most widely accepted semi-quantitative tech-
nique, especially in distinguishing mild fractures from other forms of SVH deformities.
The newer ABQ method shows good evidence of better distinguishing between frac-
tures and other SVH deformities. Nevertheless, as there is both a world-wide shortage
of radiologists [7] to perform expert reading of difficult cases, and newer equipment
such as DXA scanners are now becoming available outside of radiology units (e.g. in
general practice), or are being used in pre-screens by less-skilled operators (e.g. spe-
cially trained radiographers), there is also a need to advance quantitative techniques
of fracture diagnosis. The first requirement is to (semi)-automatically determine the
full vertebral shape, and secondly take account of more subtle shape information
(concave depression), and textural features in some way, in order to use at least some
of the visual clues used in expert reading. Although some difficult cases will always
require differential diagnosis by an expert radiologist, there is also a role for better
pre-screening of the cases that need such referral.
Later in this thesis we develop methods of automatically locating vertebrae and using
the parameters of statistical appearance models to achieve this. In the next chapter
we review the model-based vision methods upon which our work is based.
Chapter 4
Model Based Vision
4.1 Introduction
Medical images often provide noisy and sometimes incomplete data, and typically deal
with complex and variable structures. Sometimes additional structures are overlaid,
for example soft tissue or ribs can overlie vertebrae in the thoracic spine. For such
reasons low-level feature detection processes such as edge detection alone tend to be
unreliable when applied to medical or other biological images, as there are invariably
gaps and spurious fragments. The most effective approaches explicitly describe the
shape of object boundaries and introduce constraints on these, whilst also allowing a
degree of flexibility, in order to accommodate shape variation. This class of methods
is known as Deformable Template methods.
Deformable Template methods are local optimisation schemes for locating required
contours in images, which typically represent the boundaries of objects. The various
schemes try to match a deformable model to an image by some form of local optimisation,
typically of some form of “energy” functional. They all have an aspect of
incorporating prior knowledge about the kinds of allowable shape. Thus they have a
certain resistance to seduction by the noise or clutter typical of medical images, and
can “fill in” missing parts of the shape (e.g. where an edge is obscured or weak) by
a form of extrapolation.
Because the contours can deform to fit a wide variety of shapes, these methods are
often used in medical image analysis. This is because biological structures and organisms
are naturally deformable, and their shape varies considerably. During the search
the image data, desired contour properties (or prior knowledge of possible shapes),
and also constraints are incorporated. The combination of these three features into
a single search gives these algorithms a good chance of success, where for example a
purely data-driven search would be unreliable.
In this chapter we briefly review some early methods of Deformable Templates leading
up to the Active Shape Model of Cootes et al [27]. We then introduce the later
Appearance Models of Edwards et al [46, 24] and their fitting via the Active Appearance
Model [26].
4.2 Snakes
Snakes [83] incorporate a low degree of prior knowledge - typically a measure of
allowable local curvature and elasticity. The essential idea in the seminal paper by
Kass [83] was to take a form of feature map and treat it as a landscape in which a
“snake” (i.e. the contour) could slither around. Prior information on possible shapes
could be added in two ways. Kass recognised that the classical Euler equations for an
elastic string moving in an external field could be used in computer vision. The prior
information is relatively weak, and enforces smoothness on the snake by adding an
“internal” energy term, representing the potential energy stored in the snake due to
stretching and bending. The snake is a curve parameterised by a variable
s on [0,1]. The elastic energy adds terms in the first derivative with respect to s, whilst
the bending energy increases with the second derivative (i.e. curvature is penalised).
Reducing the curvature coefficient β over a section of a snake allows a corner to more
easily develop there. The external energy depends on the image data. For example
when the snake is seeking a strong edge, an appropriate energy term to add is
\[ E = -\lvert \nabla I \rvert^{2} \qquad (4.1) \]
The equilibrium configuration is the contour of minimum total energy. As the
total energy is integrated over the snake this is a classical calculus of variations
problem, soluble via the Euler-Lagrange equations for the snake energy functional.
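A discrete version of the snake energy makes the roles of the terms concrete. The sketch below is an illustration only: it uses finite differences on an open polyline, omits constant factors, and takes the external energy as a caller-supplied function (for instance one returning the negated squared gradient magnitude at a pixel). The symbol alpha for the elasticity coefficient is an assumption; beta is the curvature coefficient mentioned above.

```python
def snake_energy(points, alpha, beta, external_energy):
    """Total energy of an open snake given as a list of (x, y) points.

    alpha           : weight on stretching (first differences)
    beta            : weight on bending (second differences)
    external_energy : function (x, y) -> image energy at that position
    """
    n = len(points)
    total = 0.0
    for i, (x, y) in enumerate(points):
        total += external_energy(x, y)
        if i + 1 < n:
            # Elastic term: squared first difference along the contour.
            dx = points[i + 1][0] - x
            dy = points[i + 1][1] - y
            total += alpha * (dx * dx + dy * dy)
        if 0 < i < n - 1:
            # Bending term: squared second difference (curvature penalty).
            cx = points[i + 1][0] - 2 * x + points[i - 1][0]
            cy = points[i + 1][1] - 2 * y + points[i - 1][1]
            total += beta * (cx * cx + cy * cy)
    return total
```

With a flat external energy, a straight three-point snake has no bending energy, whereas moving the middle point sideways raises both the elastic and bending terms, which is how the internal energy enforces smoothness.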
Many methods have been proposed to improve the original snake of Kass. These typ-
ically involve adding additional terms to the energy functional. Cohen [22] proposed
an internal “inflation” force to expand a snake past spurious inner edges towards the
real edges of the sought structure, thus making the segmentation less sensitive to the
starting position. Poon [107] used simulated annealing to avoid entrapment by local
minima. Chakraborty and Duncan [19] introduced region-based information in order
to decrease seduction by insignificant edges. A review is given in [97].
4.2.1 Deformable Elliptical Models
Just as snake-type models have improved on simple image edge detection by enforcing
constraints on likely object boundaries, so the explicit incorporation of a priori knowl-
edge of shape variation into deformable models can enable the problem of matching
the model to the image to be further constrained. Staib and Duncan’s approach [128]
used stronger prior shape information than snakes, but it was somewhat less general,
requiring closed curves. It was applied to tracking heart motion on echocardiograms.
The model was based on the elliptical Fourier decomposition of the shape boundary,
and can be viewed as a linear combination of basis functions which are all ellipses, of
varying scale, orientation, and phase shift.
The high frequency content of the decomposition was discarded. For an application
fitting to echocardiograms of a heart ventricle, only the first four harmonics were
retained. This smoothes any segmentation derived as a model fit. By fitting the
model to an ensemble of shapes, a prior distribution can be derived for the model
parameters. Separate probability distributions were derived for each Fourier coeffi-
cient on the basis of the training set contours. But the probabilities of individual
Fourier coefficients are not necessarily independent over the training examples, with
the result that if two coefficients are strongly correlated, unrealistic combinations
of coefficients could still be considered likely by the model. The segmentation was
performed by using essentially a strongest edge search, but constrained by the prior
distribution of coefficients.
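The smoothing effect of truncating the decomposition can be seen by sampling a contour from only the leading harmonics. The following sketch uses the usual elliptic Fourier parameterisation with four coefficients per harmonic; the coefficient names and function signature are assumptions for illustration and are not taken from Staib and Duncan's paper.

```python
import math


def elliptic_fourier_contour(a0, c0, harmonics, n_points=100, max_harmonics=4):
    """Sample a closed contour from a truncated elliptic Fourier series.

    harmonics: list of (a_k, b_k, c_k, d_k) for k = 1, 2, ..., where
      x(t) = a0 + sum_k a_k cos(kt) + b_k sin(kt)
      y(t) = c0 + sum_k c_k cos(kt) + d_k sin(kt)
    Only the first max_harmonics terms are used; higher ones are discarded.
    """
    kept = harmonics[:max_harmonics]
    contour = []
    for i in range(n_points):
        t = 2.0 * math.pi * i / n_points
        x, y = a0, c0
        for k, (a, b, c, d) in enumerate(kept, start=1):
            x += a * math.cos(k * t) + b * math.sin(k * t)
            y += c * math.cos(k * t) + d * math.sin(k * t)
        contour.append((x, y))
    return contour
```

Each harmonic on its own traces an ellipse, so a single term such as (2, 0, 0, 1) gives an ellipse with semi-axes 2 and 1.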
4.3 Elastic Models
Nastar and Ayache [104] also used a set of orthogonal basis functions, but these were
based on ideas in mechanics of the normal modes of vibration of a body. Their model
was not simply of a contour, as elastic coupling between non-adjacent parts creates
a model of a volume, not just its surface. This also stops model solutions potentially
collapsing, as a snake could be inclined to, but it is necessary to set up a detailed
physical model of the body (or area) in terms of masses connected by springs, in
the manner of finite element analysis. The authors put the Newtonian equations
for damped coupled harmonic oscillators in linear form, thus decomposable in terms
of eigenmodes. However there are a large number of essentially ad hoc parameters
controlling elasticity terms and pseudo-masses.
4.4 Active Shape Model (ASM)
4.4.1 Summary of ASM
In the Active Shape Model (ASM) of Cootes et al [27], the equivalent of the template
is a statistical shape model. This requires a training set of representative images.
Points of interest (e.g. significant anatomical landmarks, corners, and strong edges)
must be manually annotated. The annotation must of course be in a consistent
manner (i.e. the same points in different images need to correspond).
Once the images have been annotated, the training shapes are co-registered into
a common frame using Procrustes Analysis, to remove variation which can be
reduced to similarity transforms (translation, rotation, and isotropic scaling).
The basic insight of the ASM was that the distribution of all (co-registered) shapes
in the training set can be approximated by the first two moments of the joint dis-
tribution. So the mean (co-registered) shape is derived. Individual variation is then
expressed in terms of the residuals from the mean. Deviation from the mean is es-
sentially captured in the residual covariance matrix, as there is of course substantial
correlation between the variation in points. It is precisely this correlation - which
describes how points tend to move together - that implicitly captures the shape constraints
inherent in the training set. The covariance matrix is diagonalised, thus
performing Principal Components Analysis on the training set.
The model of all possible shapes of the required type is then simply the mean shape
plus a weighted sum of the first m principal components. Thus the model is linear,
and uses an orthogonal basis. Also the eigenvalues can provide bounds on attain-
able shapes, either singly; or by assuming a Gaussian model via a hyper-ellipsoid
whose axes are parameterised by the (square rooted) eigenvalues. Thus strong prior
constraints are easily associated with the Shape Model.
4.4.2 Point Distribution Models
The Shape Model used in the ASM is a form of Point Distribution Model (PDM). We
now summarise this in mathematical form. We deal only with the 2-dimensional case,
but the method can be generalised to shapes in 3 dimensions or more (sometimes a
temporal sequence is represented as a 4D shape).
Let $\mathbf{x}_i$ be a vector of length $2n$ describing the $n$ points of the $i$th shape of the training
set, given by:
$$\mathbf{x}_i = (x_{i1}, x_{i2}, x_{i3}, x_{i4}, \ldots, x_{i,2n-1}, x_{i,2n}) \tag{4.2}$$
where $(x_{i,2j-1}, x_{i,2j})$ is the Cartesian coordinate of the $j$th landmark point on the $i$th
training example.
The training set shapes are then aligned to remove variability due to translation,
scaling, and rotation (i.e. similarity transforms). In principle more general alignment
can be done (e.g. using affine transforms to allow shearing). To align the set of N
training shapes, each shape is initially aligned with the first shape in the training set.
The mean shape of the training set is calculated, and all shapes are then aligned to the
mean, whereupon the mean is recalculated. The alignment and mean recalculation
are iterated until the alignment converges, using a summed square distance as the
alignment metric. Having found a scaling, translation and rotation to align each
training shape to the mean shape, aligned shapes are used in subsequent analysis.
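The iterative alignment described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: shapes are held as (n, 2) arrays, the helper names `align_to` and `generalised_procrustes` are hypothetical, and re-aligning the mean to the first shape at each iteration (to stop its pose and scale drifting) is an assumption not stated in the text.

```python
import numpy as np

def align_to(shape, target):
    """Least-squares similarity alignment of one (n, 2) shape onto a target."""
    a = shape - shape.mean(axis=0)
    b = target - target.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(u @ vt))            # -1 would indicate a reflection
    rot = u @ np.diag([1.0, d]) @ vt              # best rotation, reflections barred
    scale = (s * np.array([1.0, d])).sum() / (a ** 2).sum()
    return scale * a @ rot + target.mean(axis=0)

def generalised_procrustes(shapes, tol=1e-12, max_iter=50):
    """Align all shapes to the first, then iterate alignment to the evolving mean."""
    aligned = np.array([align_to(s, shapes[0]) for s in shapes])
    mean = aligned.mean(axis=0)
    for _ in range(max_iter):
        aligned = np.array([align_to(s, mean) for s in aligned])
        # Assumption: pin the pose of the mean to the first shape.
        new_mean = align_to(aligned.mean(axis=0), shapes[0])
        converged = ((new_mean - mean) ** 2).sum() < tol   # summed square distance
        mean = new_mean
        if converged:
            break
    return aligned, mean

# Toy data: similarity-transformed copies of one base shape.
rng = np.random.default_rng(0)
base = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0], [1.0, 1.5]])

def random_similarity(shape):
    ang, sc = rng.uniform(-np.pi, np.pi), rng.uniform(0.5, 2.0)
    rot = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
    return sc * shape @ rot + rng.uniform(-5.0, 5.0, size=2)

shapes = [base] + [random_similarity(base) for _ in range(4)]
aligned, mean_shape = generalised_procrustes(shapes)
```

Because the toy shapes differ only by similarity transforms, all aligned shapes coincide with the base shape; real training data would leave residual shape variation for the subsequent analysis.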
First the mean aligned shape is calculated, and then the deviations about that mean
are evaluated, whence the covariance matrix is computed. Mathematically, having
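obtained $N$ aligned shapes $\mathbf{x}_1 \ldots \mathbf{x}_N$, and calculated the mean shape
$$\bar{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i \tag{4.3}$$
the deviations of each example from the mean are calculated, given by
$$\mathbf{d}_i = \mathbf{x}_i - \bar{\mathbf{x}} \tag{4.4}$$
Then, writing $\mathbf{d}$ for the $2n \times N$ matrix whose columns are the $\mathbf{d}_i$, the covariance
matrix $\mathbf{C}$ is given (up to the conventional normalisation factor of $1/N$) by
$$\mathbf{C} = \mathbf{d}\mathbf{d}^T \tag{4.5}$$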
Principal component analysis (PCA) [93] is then applied to these deviations, effectively
fitting a $2n$-D ellipsoid to the distribution of the training vectors $\mathbf{d}_i$ in the $2n$-D
space. This is done by calculating the $2n$ eigenvectors $\mathbf{p}_k$ ($k = 1, \ldots, 2n$) of $\mathbf{C}$,
corresponding to the principal axes of the ellipsoid, with their corresponding eigenvalues
$\lambda_k$ such that
$$\mathbf{C}\mathbf{p}_k = \lambda_k \mathbf{p}_k \tag{4.6}$$
and $\lambda_k \geq \lambda_{k+1}$. The eigenvectors with the largest eigenvalues correspond to the axes
that describe the largest variation in the shapes. The eigenvectors are orthogonal,
so the projections of the training shapes onto them are uncorrelated, and each can be
thought of as an independent mode of variation. Each eigenvalue gives the amount of
residual variance associated with the corresponding mode. Furthermore the dimensionality
can often be substantially reduced by retaining only the most significant components
(i.e. the ones with the largest eigenvalues). For example, to retain 98% of the original
variance, $m$ is chosen as the smallest number of modes such that
$$\sum_{k=1}^{m} \lambda_k \geq 0.98\,\mathrm{Tr}(\mathbf{C}) \tag{4.7}$$
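As a minimal sketch of Equations 4.3 to 4.7, the following builds a truncated shape model from aligned shape vectors. It assumes shapes are stored one per row, uses NumPy's symmetric eigensolver, and includes the 1/N normalisation so that the eigenvalues are variances; the function name `build_shape_model` and the synthetic data are illustrative only.

```python
import numpy as np

def build_shape_model(aligned, keep=0.98):
    """aligned: (N, 2n) array, one aligned shape vector per row."""
    mean = aligned.mean(axis=0)                       # Equation 4.3
    dev = aligned - mean                              # Equation 4.4 (rows are d_i)
    cov = dev.T @ dev / len(aligned)                  # Equation 4.5 with 1/N factor
    eigvals, eigvecs = np.linalg.eigh(cov)            # returned in ascending order
    order = np.argsort(eigvals)[::-1]                 # sort to descending order
    eigvals = np.clip(eigvals[order], 0.0, None)      # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    fraction = np.cumsum(eigvals) / eigvals.sum()
    # Smallest m whose cumulative variance reaches the requested fraction (Eq. 4.7).
    m = min(int(np.searchsorted(fraction, keep)) + 1, len(eigvals))
    return mean, eigvecs[:, :m], eigvals[:m]

# Synthetic training set: two dominant modes plus a little noise in 8 dimensions.
rng = np.random.default_rng(0)
modes = np.linalg.qr(rng.normal(size=(8, 2)))[0]            # orthonormal directions
scores = rng.normal(size=(50, 2)) * np.array([3.0, 1.0])    # std 3 and 1 along them
shapes = 10.0 + scores @ modes.T + 1e-3 * rng.normal(size=(50, 8))

mean, P, lam = build_shape_model(shapes, keep=0.98)
```

For this synthetic set the first mode alone explains roughly 90% of the variance, so two modes are retained.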
There are many methods of diagonalising $\mathbf{C}$, but one of the most numerically robust is
to use Singular Value Decomposition (SVD) [108]. One problem can be that there
are insufficient training examples to be able to obtain a matrix of full rank, but by
transposing the problem it is possible to derive the $N$ (training set size) eigenvectors
of the $N$ by $N$ pseudo-covariance matrix:
$$\mathbf{C}' = \mathbf{d}^T\mathbf{d} \tag{4.8}$$
It can be shown that the eigenvalues of $\mathbf{C}'$ are also eigenvalues of the (degenerate)
$\mathbf{C}$. Also given the eigenvectors $\mathbf{P}'$ of $\mathbf{C}'$, the reduced set of $N$ eigenvectors $\mathbf{P}^{(N)}$ of $\mathbf{C}$
for the non-zero set of eigenvalues is given, after rescaling each column to unit length, by
$$\mathbf{P}^{(N)} = \mathbf{d}\mathbf{P}' \tag{4.9}$$
SVD is a very robust method when smaller-than-ideal training set sizes lead to $\mathbf{C}$
being not positive definite, and can be applied as described above to $\mathbf{C}'$ in this
situation.
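The transposed problem can be checked numerically with a short sketch. It assumes the deviations are stored as the columns of a matrix `d`, as in Equations 4.8 and 4.9, and uses a symmetric eigensolver rather than SVD for brevity. Note that because the deviations are taken about the mean, at most N - 1 eigenvalues are non-zero in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
n_coords, n_shapes = 40, 6                        # far fewer examples than dimensions
d = rng.normal(size=(n_coords, n_shapes))
d -= d.mean(axis=1, keepdims=True)                # columns are deviations from the mean

small = d.T @ d                                   # N x N matrix C' (Equation 4.8)
vals, vecs = np.linalg.eigh(small)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

nonzero = vals > 1e-10 * vals[0]                  # drop numerically zero eigenvalues
P = d @ vecs[:, nonzero]                          # Equation 4.9
P /= np.linalg.norm(P, axis=0)                    # rescale columns to unit length

big = d @ d.T                                     # the degenerate 2n x 2n matrix C
```

The columns of `P` are then eigenvectors of the large matrix with the same eigenvalues, obtained at the cost of a 6 by 6 rather than a 40 by 40 decomposition.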
By taking the first $m$ eigenvectors in order of decreasing eigenvalue (i.e. most variance
explained first), the principal shape variation observed can be described by a $2n \times m$
matrix $\mathbf{P}_s$, containing only these eigenvectors:
$$\mathbf{P}_s = (\mathbf{p}_1 \mathbf{p}_2 \ldots \mathbf{p}_m) \tag{4.10}$$
Any shape $\mathbf{x}$ similar to those in the training set, and allowable by the modes of
variation given by the principal axes of the hyper-ellipsoid, can be generated using
$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s\mathbf{b} \tag{4.11}$$
by varying a vector of weights $\mathbf{b} = (b_1 b_2 \ldots b_m)^T$. These are known as shape parameters,
and completely describe the range of possible shapes allowable by the PDM.
Given another shape $\mathbf{x}'$ within the subspace defined by the set of eigenvectors, the
required shape parameters $\mathbf{b}'$ are given by
$$\mathbf{b}' = \mathbf{P}_s^T(\mathbf{x}' - \bar{\mathbf{x}}), \tag{4.12}$$
since the columns of $\mathbf{P}_s$ are orthonormal, as the eigenvectors of a symmetric matrix can
always be chosen to be orthonormal.
Even if $\mathbf{x}'$ does not lie within the subspace, the above solution for $\mathbf{b}'$ is still optimal
in a least squares sense. When applying shape models the parameters are normally
additionally constrained in that the elements $b_k$ of $\mathbf{b}$ are kept within some limits, such
as between $-3\sqrt{\lambda_k}$ and $+3\sqrt{\lambda_k}$, where $\lambda_k$ is the $k$th eigenvalue, and hence $\sqrt{\lambda_k}$ is the
standard deviation of $b_k$ over the training set. This would define a hyper-cuboid in the
shape-space. This can still give rise to implausible shapes if all parameters are close to
their extrema, so a better method is to constrain to a hyper-ellipsoid by limiting the
Mahalanobis distance of $\mathbf{b}'$. The orthogonality of the Principal Components means
that covariance terms are zero, and the Mahalanobis distance $D$ is given by:
$$D^2 = \sum_{i=1}^{m} \frac{b_i'^2}{\lambda_i} \tag{4.13}$$
Under the assumption that the distribution of shape parameters over possible valid
shapes is Gaussian, $D^2$ is, up to an additive constant, proportional to the negative
log-likelihood of this distribution. So to prevent an unlikely configuration, a limit
can be placed upon the maximum Mahalanobis distance allowed.
Figure 4.1: Spine shape model variation mode 1
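A minimal sketch of projecting a shape into the model and applying either constraint is given below. The function name `fit_shape_parameters` and the limit `d_max` are illustrative. For the hyper-ellipsoid, the sketch simply rescales the parameter vector towards the mean until the Mahalanobis limit is met; this is a common simplification and an assumption here, and it is not in general the closest point on the ellipsoid.

```python
import numpy as np

def fit_shape_parameters(x, mean, P, eigvals, d_max=3.0, mode="ellipsoid"):
    """Project shape x onto the model and constrain the resulting parameters."""
    b = P.T @ (x - mean)                              # Equation 4.12
    if mode == "box":
        limit = d_max * np.sqrt(eigvals)              # +/- 3 standard deviations
        b = np.clip(b, -limit, limit)                 # hyper-cuboid constraint
    else:
        dist = np.sqrt(np.sum(b ** 2 / eigvals))      # Equation 4.13
        if dist > d_max:
            b = b * (d_max / dist)                    # assumed: rescale towards the mean
    return b, mean + P @ b                            # Equation 4.11

# Toy model: two modes in a four-dimensional shape space.
mean = np.zeros(4)
P = np.eye(4)[:, :2]
eigvals = np.array([4.0, 1.0])
x = np.array([8.0, 4.0, 7.0, 0.0])

b_ell, x_ell = fit_shape_parameters(x, mean, P, eigvals, mode="ellipsoid")
b_box, x_box = fit_shape_parameters(x, mean, P, eigvals, mode="box")
```

In both cases the component of `x` lying outside the model subspace (the third coordinate here) is discarded by the projection before any limit is applied.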
As an example of a shape model, Figures 4.1 and 4.2 show the change in shape
for the vertebrae from L4 to T7 as the first and second shape parameters of a shape
model of the spine vary through two standard deviations either side of zero.
Figure 4.2: Spine shape model variation mode 2
4.4.3 Active Shape Model Search
4.4.3.1 Summary of ASM Search
The Active Shape Model is a method of searching for the best set of pose parameters
and (constrained) shape model weights to generate an allowed shape which best
matches the image evidence. In order to have a criterion of best match, it is necessary
to supplement the shape model described above with a texture model. This is a set
of models (one per shape point) which model the grey level texture around each
point. The ASM proceeds by locally moving points along normals to the current
shape boundary, to locate the best profile match for each point. Then this set of
points is fitted to the constraining shape model. We now define the algorithm in
more detail.
4.4.3.2 Modelling Image Grey Levels
The ASM models the appearance of structures around the object boundaries using
a profile model of the image brightness perpendicular to the boundary. Typically
the brightness levels in nearby pixels will be correlated, and so again to model the
expected profile observed over the training set at a particular landmark point, a
Principal Component model (similar to that used for PDMs) is constructed.
Rather as the shapes were co-aligned using a similarity transform, each of the grey-
level profiles can be co-aligned over the training set by using a linear transform (offset
and linear scaling) to model gross variation in brightness and contrast. Each pixel’s
grey-level model is constructed after this normalisation.
Having performed PCA on the training profiles, and selected the number of modes
required to explain most of the variance between training profiles, the $j$th profile can
be modelled as
$$\mathbf{g}_j = \bar{\mathbf{g}}_j + \mathbf{P}_{tj}\mathbf{b}_j \tag{4.14}$$
where $\bar{\mathbf{g}}_j$ is the mean profile for model point $j$, $\mathbf{P}_{tj}$ the matrix of retained (orthonormal)
eigenvectors of the profile covariance matrix, and $\mathbf{b}_j$ a vector of weights.
It is necessary to define a measure of the quality of match between the image evidence
and the profile model.
In fitting the model to the profile $\mathbf{g}$, $\mathbf{b}_t = \mathbf{P}_{tj}^T(\mathbf{g} - \bar{\mathbf{g}})$ is the parameter vector used to
describe the profile. $\mathbf{g}_{fitted} = \bar{\mathbf{g}} + \mathbf{P}_{tj}\mathbf{b}_t$ is the closest approximation of the model to
the profile. The residual $\mathbf{r}$ is given by $\mathbf{r} = \mathbf{g} - \mathbf{g}_{fitted}$. The fitness measure is given
(from [28]) by
$$f = \sum_{i=1}^{m_t} \frac{b_i^2}{\mu_i} + \sum_{k=1}^{n} \frac{r_k^2}{v_k} \tag{4.15}$$
where $\mu_i$ is the eigenvalue of mode $i$ and $v_k$ describes how well the $k$th profile point
was modelled, and is the mean variance of each training profile from the best fit of a
model trained from the remaining training profiles. So this ‘fitness’ measure describes
the extent to which a sampled profile in the image differs from the profile model
both in ways observed in training (i.e. Mahalanobis distance - see Equation 4.13); and
in ways which are not explicable by the training data (residual variance). The second
term of the measure describes the residual deviations of the sampled profile from
the model, relative to the model’s residual variance over the training set. Arguably
these variances should ideally be estimated using jackknifed leave-one-out train/refit
schemes; but in practice it is usual to use a somewhat biased estimate by using the
refitting residual variance over the model’s own training set.
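Equation 4.15 translates directly into a few lines of code. The sketch below assumes the profile model is supplied as a mean profile, a matrix of orthonormal modes, the mode eigenvalues, and the per-point residual variances; the function name `profile_fitness` and the toy numbers are illustrative only. A lower value indicates a better match.

```python
import numpy as np

def profile_fitness(g, g_mean, P, mode_vars, resid_vars):
    """Two-term profile match measure of Equation 4.15 (lower is better)."""
    b = P.T @ (g - g_mean)                    # parameters describing the profile
    r = g - (g_mean + P @ b)                  # part of the profile the model cannot explain
    in_model = np.sum(b ** 2 / mode_vars)     # Mahalanobis term within the model
    off_model = np.sum(r ** 2 / resid_vars)   # residual term, scaled by v_k
    return in_model + off_model

# Toy profile model with three sample points and a single mode.
g_mean = np.zeros(3)
P = np.array([[1.0], [0.0], [0.0]])
mode_vars = np.array([4.0])
resid_vars = np.array([1.0, 0.25, 0.25])

f = profile_fitness(np.array([2.0, 0.5, 0.0]), g_mean, P, mode_vars, resid_vars)
```

Here the profile is one standard deviation along the mode and one residual standard deviation away from the model at the second sample point, so each term contributes 1.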
4.4.3.3 Combining Grey Level and Shape in Search
For each model point the ASM searches along the local normal over some search
distance to locate the point which allows texture model parameters to best match
the local texture model to the sampled grey level texture. At each iteration a set of
best new points are thus obtained. These are located in a manner which is greedy
on a per-point basis. The overall shape must still conform to the shape model, so
next the best overall shape model fit to the new point set is used. In effect this
also smooths the shape. Also the allowed shapes are bounded by the shape model
hyper-ellipsoid; and if the least squares fit falls outside the bound, the parameters
are adjusted so the fit lies on the closest point of the allowed volume. This gives the
ASM good robustness in the presence of noise. The ASM search continues to iterate
in this manner until the solution converges.
Local minima in the ASM search objective function can be reduced by smoothing
the objective function by considering the image at multiple scales. This has the
additional advantage of enabling profiles to be searched for over longer distances.
During training, a pyramid of images of multiple resolutions is created. These are
generated by using Gaussian smoothing, followed by sub-sampling - normally by
successive factors of 2. Separate grey-level profile models are generated for each level
of the multi-resolution pyramid by sampling from the appropriate subsampled im-
age, instead of the original full resolution image. This process is identical to that for
training a conventional local grey-level model, but using a lower resolution image,
and scaling the positions of the landmark points appropriately to the smaller image.
The local grey-level profiles are sampled over the same number of pixels as before,
making them effectively longer when viewed in the original resolution image.
To perform multi-resolution image search, conventional search is started in a reduced
resolution image. When search has converged in that image, the shape model is
projected into a higher resolution image, where search continues. This process repeats
until search has converged in the full resolution image.
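A minimal sketch of building such a pyramid is shown below. It assumes a separable 5-tap binomial kernel as an approximation to Gaussian smoothing and replicated edge pixels at the image border; both are illustrative choices, as are the function names. Landmark coordinates at level L would be divided by 2 to the power L to match the smaller image.

```python
import numpy as np

def smooth_and_halve(image):
    """Smooth with a separable binomial kernel, then keep every second pixel."""
    kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    height, width = image.shape
    padded = np.pad(image, 2, mode="edge")            # replicate border pixels
    # Horizontal pass followed by vertical pass of the separable kernel.
    rows = sum(k * padded[:, i:i + width] for i, k in enumerate(kernel))
    both = sum(k * rows[i:i + height, :] for i, k in enumerate(kernel))
    return both[::2, ::2]

def build_pyramid(image, n_levels):
    """Level 0 is the full resolution image; each later level is half the size."""
    levels = [np.asarray(image, dtype=float)]
    for _ in range(n_levels - 1):
        levels.append(smooth_and_halve(levels[-1]))
    return levels

pyramid = build_pyramid(np.full((16, 16), 5.0), n_levels=3)
coarse_points = np.array([[8.0, 12.0]]) / 2 ** 2      # a landmark scaled to level 2
```

Search would then start at the last (coarsest) level and move towards level 0, scaling the current shape by 2 at each transition.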
4.5 Active Appearance Models
4.5.1 Background to the Active Appearance Model
There are several weaknesses in the way that the ASM treats shape and texture.
Firstly separate texture models are generated for every point, but in practice texture
at points close to each other will be correlated. Secondly there will be correlations
between shape and texture which are not modelled - the degree of plausibility of a
particular set of textures might depend on the shape, but the ASM has an in-built
bias to try and locate points that appear to have a surrounding texture close to the
mean. Thirdly there is a degree of local greediness about the way the ASM initially
moves each point separately to its own local optimum, before applying the shape
model constraints. Fourthly, because only profile models are considered, no use is
made of texture information in the interior of the shape that is not intersected by a
profile.
Active Appearance Models arose as a way of overcoming these shortcomings, and
also as extensions of earlier work on Eigenfaces [135] and Active Blobs [121]. In the
Eigenface approach of Turk and Pentland, texture is modelled by extracting image
texture from a window containing a face, and then projecting the texture vector
into a subspace obtained by performing PCA over a training set of such windowed
faces. Rather than using a simple window, the AAM extracts texture from within
the convex hull of a shape described using a Shape Model, and also involves warping
to try and obtain a “shape-free” texture model.
4.5.2 Appearance Models
There are a number of AAM variants. We describe first the Appearance Model of the
original “classical” AAM. Unlike the ASM this samples texture from within the entire
region of the shape (actually its convex hull). Because the shape varies, the AAM
tries to define a shape-free patch from which to sample image texture. This is done
by using a triangulation mesh between the points of the shape model, that allows the
image patch to be covered by triangles. Each triangle is then affine warped so that
the three control points defining the triangle are in their mean positions. The process
of warping all the triangular segments warps the whole image patch to that defined by
the mean shape. An input to the AAM building process is the total number of pixels
to be included in the model. In effect this defines how many points are taken from
each triangle in the mesh. After warping to the mean shape each training image has
this set of sample points in correspondence, and the pixel values at all these points
are concatenated into one overall texture vector gi (for training image i). The texture
vectors are co-aligned for gross brightness and contrast variation using a linear shift
and scaling to the mean in a similar manner to the Procrustes alignment of shapes.
Again, just as for the shape model, PCA is performed on the aligned texture vector
covariance matrix. This results in a texture model of the form:
$$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_t\mathbf{b}_t + \mathbf{r} \tag{4.16}$$
The next step in the appearance model building is to perform a third PCA on
both the shape and texture parameters. Firstly in each training example the resultant
shape and texture parameters are concatenated into a single combined vector,
but with a rescaling weight $w$ on the shape components to bring them into a commensurate
scale with the texture parameters. Thus we have combined vectors $\mathbf{b}^{(a)}_i$
with:
$$\mathbf{b}^{(a)}_i = (w b_{s1}, w b_{s2}, \ldots, w b_{sm_s}, b_{t1}, b_{t2}, \ldots, b_{tm_t})^T \tag{4.17}$$
The scaling w is typically chosen so that shape and texture parameters contribute
an equal variance to the total variance of the combined vector parameters. Then to
model shape-texture correlation PCA is performed again on the combined vectors to
arrive at a compact “appearance model”. Typically some significant dimensionality
reduction can be obtained while still retaining 99% of the overall variance. The new
matrix of orthogonal eigenvectors is designated as $\mathbf{Q}$, and we then have the linear
model
$$\mathbf{b}^{(a)} = \bar{\mathbf{b}}^{(a)} + \mathbf{Q}\mathbf{c} + \mathbf{r}_a \tag{4.18}$$
where c are the appearance parameters. Constraints are applied to the appearance
parameters in a similar way to the shape model parameters. These implicitly con-
strain both shape and texture, but are somewhat stricter than constraints on either
alone, in that correlation between the two has been modelled. So certain forms of
texture can only occur with certain shape deformations.
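The construction of the combined model can be sketched as follows. The sketch assumes the shape and texture parameters of each training example are already available as rows of two arrays, and takes the weight to be the square root of the ratio of total texture variance to total shape variance, which is one way of realising the equal-variance choice described above. The function name `build_combined_model` and the synthetic data are illustrative only.

```python
import numpy as np

def build_combined_model(b_shape, b_texture, keep=0.99):
    """b_shape: (N, ms) and b_texture: (N, mt); one training example per row."""
    # Assumed choice of w: equalise the total variance of the two blocks.
    w = np.sqrt(b_texture.var(axis=0).sum() / b_shape.var(axis=0).sum())
    combined = np.hstack([w * b_shape, b_texture])    # rows follow Equation 4.17
    dev = combined - combined.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(dev.T @ dev / len(dev))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = np.clip(eigvals[order], 0.0, None), eigvecs[:, order]
    fraction = np.cumsum(eigvals) / eigvals.sum()
    m = min(int(np.searchsorted(fraction, keep)) + 1, len(eigvals))
    Q = eigvecs[:, :m]                                # appearance modes
    ms = b_shape.shape[1]
    return w, Q, Q[:ms, :], Q[ms:, :]                 # w, Q, shape rows, texture rows

# Synthetic parameters in which shape and texture share two latent factors.
rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))
b_shape = 5.0 * (latent @ rng.normal(size=(2, 3)))
b_texture = latent @ rng.normal(size=(2, 4)) + 0.01 * rng.normal(size=(200, 4))

w, Q, Q_shape_rows, Q_texture_rows = build_combined_model(b_shape, b_texture)
```

Because shape and texture are strongly correlated in this synthetic set, far fewer combined modes are needed than the seven original parameters.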
Figures 4.3 to 4.6 show the resultant variation in the appearance of the spine as
the first four appearance parameters vary by 2.5 standard deviations about zero.
Interestingly the second mode appears to be largely a gross variation in brightness
across the image.
Figure 4.3: Spine appearance model variation mode 1
Later in this thesis we discuss how we fit together appearance models of smaller
subsections of the spine, which use triplets of vertebrae. Figures 4.7 and 4.8 show
the resultant variation in the appearance of the L1-centred appearance model as the
first two appearance parameters vary by 2.5 standard deviations about zero.
Note that the linear nature of the models allows us to express the shape and grey-levels
directly as functions of $\mathbf{c}$. Firstly we row-decompose $\mathbf{Q}$ into the $m_s$ shape
parameter rows and $m_t$ texture parameter rows, so
$$\mathbf{Q} = \begin{pmatrix} \mathbf{Q}_{cs} \\ \mathbf{Q}_{ct} \end{pmatrix} \tag{4.19}$$
Figure 4.4: Spine appearance model variation mode 2
Figure 4.5: Spine appearance model variation mode 3
Figure 4.6: Spine appearance model variation mode 4
Figure 4.7: L1 triplet appearance model variation mode 1
Figure 4.8: L1 triplet appearance model variation mode 2
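It is possible to decompose equation 4.18 to obtain the model frame shape or texture
as:
$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{Q}_s\mathbf{c}, \qquad \mathbf{g} = \bar{\mathbf{g}} + \mathbf{Q}_t\mathbf{c} \tag{4.20}$$
where, with $\mathbf{I}_m$ denoting an $m$-dimensional identity matrix, the sub-matrices are given
by
$$\mathbf{W}_s = w\mathbf{I}_{m_s}, \qquad \mathbf{Q}_s = \mathbf{P}_s\mathbf{W}_s^{-1}\mathbf{Q}_{cs}, \qquad \mathbf{Q}_t = \mathbf{P}_t\mathbf{Q}_{ct} \tag{4.21}$$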
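The intermediate steps between the combined model and these expressions can be written out as below. This is a sketch under two assumptions that are not spelled out in the text: that the mean combined vector is zero (the shape and texture parameters each have zero mean over the training set), and that the residual terms are neglected.

```latex
\begin{align*}
\begin{pmatrix} \mathbf{W}_s\mathbf{b}_s \\ \mathbf{b}_t \end{pmatrix}
  &= \mathbf{Q}\mathbf{c}
   = \begin{pmatrix} \mathbf{Q}_{cs} \\ \mathbf{Q}_{ct} \end{pmatrix}\mathbf{c}
  && \text{(combined model, zero mean, no residual)} \\
\mathbf{b}_s &= \mathbf{W}_s^{-1}\mathbf{Q}_{cs}\mathbf{c},
  \qquad \mathbf{b}_t = \mathbf{Q}_{ct}\mathbf{c}
  && \text{(reading off each block of rows)} \\
\mathbf{x} &= \bar{\mathbf{x}} + \mathbf{P}_s\mathbf{b}_s
   = \bar{\mathbf{x}} + \mathbf{P}_s\mathbf{W}_s^{-1}\mathbf{Q}_{cs}\mathbf{c}
  && \text{(substituting into the shape model)} \\
\mathbf{g} &= \bar{\mathbf{g}} + \mathbf{P}_t\mathbf{b}_t
   = \bar{\mathbf{g}} + \mathbf{P}_t\mathbf{Q}_{ct}\mathbf{c}
  && \text{(substituting into the texture model)}
\end{align*}
```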
Figure 4.9: L1 triplet profile gradient appearance model variation mode 1
As well as texture models that sample within the shape's convex hull using a triangulated
mesh, it is also possible to use profiles in an ASM-like manner. However,
unlike the ASM, all the profiles are concatenated together into a single texture vector.
Using profiles like this removes one advantage of AAMs, namely the warping of the
patch to a shape-free control patch. On the other hand by sampling further outside
the shape region, the convergence zone of AAM search can be extended. A degree
of shape normalisation can be introduced by scaling the profile length in the ratio of
the overall shape size (e.g. r.m.s. distance to centroid) to that of the mean shape.
Figures 4.9 and 4.10 show the variation in the first two appearance modes of a
profile model for an L1-centred triplet of vertebrae. The gradient along the profile
has been sampled and then sigmoidally renormalised to the local image statistics in
a manner discussed later in section 4.5.5.
4.5.3 Fitting Appearance Models
Essentially the goal of fitting an appearance model to a given image is to find a
set of shape pose parameters, texture scaling parameters, and appearance model
parameters c that best match the synthesised texture to the sampled texture. The
Active Appearance Model (AAM) is a method of fitting appearance model parameters
to best match the image evidence, given the implicit model constraints. One of the
strengths of AAMs is that by using a sum-of-squares measure to compare model and
target, they can exploit the linear nature of the problem to perform fast parameter
updates and thus are able to match to a new image very quickly. The AAM fitting
scheme shares some common ideas with the Active Blobs of Sclaroff and Isidoro [121],
namely that the texture should be projected into a fixed shape space, and that the
current residuals should drive updates to the model parameters via a linear model. In
the Active Blobs method the update matrix was derived from a single image with a
shape model based on elastic vibration modes (like Nastar and Ayache [104]), and the
texture model involved only planar lighting changes; whereas the AAM incorporates
more general variability over an ensemble of training images. The key insight of a
linear relationship between residuals and parameter updates is retained. So the AAM
learns from the training set not only the appearance model itself, but also how to fit
that model.
Figure 4.10: L1 triplet profile gradient appearance model variation mode 2
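Firstly the shape pose parameters $\mathbf{t}$ (e.g. translation, scaling and rotation) and
texture scaling parameters $\mathbf{u}$ are concatenated with the appearance model parameters
$\mathbf{c}$ to give the full set of AAM parameters, denoted by $\mathbf{p}$. Thus
$$\mathbf{p}^T = (\mathbf{c}^T \,|\, \mathbf{t}^T \,|\, \mathbf{u}^T) \tag{4.22}$$
The AAM seeks to minimise a sum-of-squares problem of the form
$$F(\mathbf{p}) = |\mathbf{r}(\mathbf{p})|^2 = \mathbf{r}^T\mathbf{r} \tag{4.23}$$
where the vector of residuals $\mathbf{r}$ is calculated as:
$$\mathbf{r} = \mathbf{w}(I : \mathbf{p}) - \mathbf{g} \tag{4.24}$$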
where $\mathbf{w}(I : \mathbf{p})$ is the (normalised) texture sampled from the image $I$ given the
shape defined by model parameters $\mathbf{p}$. The AAM learns how to solve this minimisation
during training (see [26]), by learning the relationship between perturbations in parameters
and the texture residuals that these induce. This relationship is then inverted to
provide an update matrix that can be applied to the current residual vector. When
applied iteratively this provides an efficient model fitting scheme where the necessary
parameter change $\delta\mathbf{p}$ to the parameter vector $\mathbf{p}$, given $\mathbf{r}$, is estimated as:
$$\delta\mathbf{p} = -\mathbf{R}\mathbf{r} \tag{4.25}$$
where $\mathbf{R}$ is derived from the pseudo-inverse of the Jacobian $\mathbf{J} = \frac{\partial \mathbf{r}}{\partial \mathbf{p}}$, thus:
$$\mathbf{R} = \left[\mathbf{J}^T\mathbf{J}\right]^{-1}\mathbf{J}^T \tag{4.26}$$
The Jacobian is learnt off-line by first fitting the appearance model to each training
example in turn, and then systematically perturbing each AAM parameter through
a series of steps from its correct value. The induced residuals then have a Gaussian
smoothing kernel applied to them, and by averaging over the training set a numerical
approximation to the Jacobian is derived.
Of course the linear relationship 4.25 may break down, particularly if the residuals
are large (i.e. outside the training region of $\mathbf{J}$). In this case the update step may
actually increase the overall residual error. Therefore a simple form of line-search is
applied. Instead of applying an update $\delta\mathbf{p}$ the update $\alpha\delta\mathbf{p}$ is applied, with $\alpha = \frac{1}{2}$.
If the error is still worse $\alpha$ is halved again, until either some improvement results,
or a minimum value of $\alpha$ is attained, in which case the AAM is assumed to have
converged.
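The update scheme and step-halving can be illustrated on a toy problem. In the sketch below the texture residual is replaced by an arbitrary mildly non-linear function of three parameters, the Jacobian is estimated by central differences about the known correct parameters, and the Gaussian weighting and averaging over a training set described above are omitted. All names (`estimate_jacobian`, `aam_style_search`, `residual`) are hypothetical.

```python
import numpy as np

def estimate_jacobian(residual_fn, p_ref, step=0.1):
    """Numerical Jacobian from systematic perturbations about a known fit."""
    columns = []
    for k in range(len(p_ref)):
        dp = np.zeros(len(p_ref))
        dp[k] = step
        columns.append((residual_fn(p_ref + dp) - residual_fn(p_ref - dp)) / (2 * step))
    return np.column_stack(columns)

def aam_style_search(residual_fn, p0, R, max_iter=30, min_alpha=1.0 / 64):
    """Iterate dp = -R r, halving the step whenever the error fails to improve."""
    p = p0.copy()
    err = np.sum(residual_fn(p) ** 2)
    for _ in range(max_iter):
        dp = -R @ residual_fn(p)                  # Equation 4.25
        alpha = 1.0
        while alpha >= min_alpha:
            trial = p + alpha * dp
            trial_err = np.sum(residual_fn(trial) ** 2)
            if trial_err < err:
                break                             # improvement found: accept step
            alpha *= 0.5
        else:
            return p                              # no improving step: converged
        p, err = trial, trial_err
    return p

# Toy stand-in for the texture residual, zero at p_true.
rng = np.random.default_rng(3)
M = rng.normal(size=(20, 3))
p_true = np.array([0.5, -1.0, 2.0])
residual = lambda p: np.tanh(M @ (p - p_true))

J = estimate_jacobian(residual, p_true)
R = np.linalg.inv(J.T @ J) @ J.T                  # Equation 4.26
p_fit = aam_style_search(residual, p_true + np.array([0.2, -0.1, 0.15]), R)
```

The key point is that `R` is computed once, off-line, and each search iteration then costs only a residual evaluation and a matrix-vector product.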
Like the ASM, AAMs are typically used with coarse-to-fine search using Gaussian
image pyramids (usually separated by a scale factor of two). A multi-resolution AAM
starts from a coarse scale first, using a heavily smoothed image, and a sub-sampled
texture patch. Normally it will quickly find a good though imprecise fit. A few
iterations at each of the higher resolution layers refine the fit. Starting at the coarse,
heavily smoothed scale tends to avoid entrapment by local minima, and increases the
convergence zone of the AAM.
4.5.4 Initialisation
AAMs typically require an initialisation reasonably close (in some sense) to the sought
object, since they perform local search. A small set of approximate initialisation
points may be provided by a user interaction (e.g. clicking the mouse at a central
position); or from some higher level image understanding process. For example Howe
et al [77] use an initial search with a Generalised Hough Transform using a template
given by the mean shape model. The best matches in pose are used to initialise an
AAM.
4.5.5 Extensions to the AAM
4.5.5.1 Non-Linear renormalisation and Feature Appearance Models
Appearance models are not limited to simple grey level texture. It is also possible
to build appearance models of various feature measures. Typically these use a non-
linear renormalisation that makes subsequent AAM search more robust to changes
in contrast and lighting effects. Cootes [32] used Sobel-filtered gradients with the
sigmoidal renormalisation:
$$g' = \frac{g}{|g| + \overline{|g|}} \tag{4.27}$$
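A minimal sketch of this renormalisation follows. It assumes the second term in the denominator is the mean gradient magnitude over the sampled region (the local image statistic), which is how the equation is read here; the function name `sigmoid_renormalise` is illustrative.

```python
import numpy as np

def sigmoid_renormalise(g):
    """Map signed gradient values into (-1, 1) relative to their mean magnitude."""
    g = np.asarray(g, dtype=float)
    magnitude = np.abs(g)
    scale = magnitude.mean()          # assumed: mean magnitude over the sample
    if scale == 0.0:
        return np.zeros_like(g)       # a flat region has no gradient to normalise
    return g / (magnitude + scale)

renormalised = sigmoid_renormalise([-3.0, 0.0, 1.0, 4.0])
```

Values whose magnitude equals the mean map to plus or minus one half, and arbitrarily large gradients saturate towards plus or minus one, which limits the influence of outliers.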
Bosch et al [15] used a renormalisation to transform the asymmetric pixel-intensity
distribution of ultrasound cardiac images to a Gaussian. This gave significantly improved
AAM matching results. The AAM’s L2 norm is theoretically optimal (in
a maximum-likelihood sense) for Gaussian residuals, so it makes sense to apply a
remapping to the sampled texture if this is more likely to result in Gaussian residuals.
Another way of looking at it is that the AAM’s texture model can be a kind of
feature detector operating for example with some measure of edge strength or “cornerness”.
Scott et al [123] developed a pair of orthogonal edge and corner measures
from the structure tensor in a similar manner to a Harris corner detector. These were
then renormalised onto a [0,1) scale using a similar sigmoidal function to that used
by Cootes [32]. An advantage of the renormalisation is that the value of the “feature
present” indication is upper-bounded at 1. This tends to avoid very large outliers
that can wreak havoc with least squares estimators. The other advantage of using
local image structure measures is that they are relatively invariant to imaging parameters.
Furthermore because the structure tensor used in the measures of [123] involves
smoothing in a region around each pixel, they involve more information from outside
the boundaries of the current shape. When this is also placed in a multi-resolution
search, the effective convergence radius can be significantly increased. Figure 4.11
shows the variation in the first appearance mode’s “cornerness” feature component
in an appearance model of a dataset of faces, using Scott’s feature appearance model
[123]. The eyes, nostrils, and mouth are clearly visible as corner features.
Figure 4.11: Face corner feature appearance model - mode 1 variation
4.5.5.2 AAM variants
The AAM has been widely adopted, and many applications and modifications have been
suggested. Notable amongst these are the Shape-AAM [25] and the Inverse-Compositional
AAM [3]. In the Shape-AAM [25] the residuals drive only the shape and pose param-
eters, and the texture parameters are then set by directly fitting the texture model
to the texture sampled at the current shape. Overall combined appearance model
constraints can then be imposed. In the Inverse-Compositional AAM [3] of Baker
and Matthews, it is demonstrated that the shape update should be implemented as
a function composition rather than a simple linear addition. However this approach
requires the shape and texture parameters to be updated separately, so potential
advantages of modelling their correlation are lost.
A potential problem for AAMs is variations in R across the population. The estimate
of the Jacobian from the training set will only be an approximation for any given
target image, and may be a poor one if the target image is significantly different
from the training images, or in the margins of the appearance distribution. Batur
and Hayes [4] have proposed Adaptive AAMs, in which the Jacobian varies as a
linear function of the position in parameter space. This can lead to more robust
and accurate convergence, particularly when dealing with examples where there is
significant texture variation (such as faces under differing lighting conditions). Cootes
introduced the Updating AAM [35] to address this problem. In the updating AAM
the update matrix is initially the same as the standard AAM, but is progressively
corrected as the search proceeds by continually re-estimating the Jacobian by utilising
the actual residuals occurring after the update.
4.5.6 Constrained AAM
Since the AAM is a local search method, and relies on an update matrix learned
in the locale of correct solutions, it relies upon a suitable initialisation. Typically
this initialisation is provided by prior estimates of some of the shape points, either
manually (e.g. a user clicks the mouse pointer on a small subset of points), or via
automatic feature detectors. There may be some prior knowledge of the variances
associated with these initialisation points. Cootes developed the Constrained AAM
[31] to incorporate such constraints.
The least squares minimisation of the standard AAM is replaced by a maximum a-posteriori
(MAP) formulation, which seeks to maximise the probability of the model
given the data which (by Bayes’ theorem) is proportional to:
$$P(\mathrm{data} \mid \mathrm{model})\,P(\mathrm{model}) \tag{4.28}$$
Given a uniform prior on the model parameters this is equivalent to a least squares
formulation under the assumption of uncorrelated Gaussian residuals of equal vari-
ance. A Gaussian prior could be assumed on the model parameters, and Cootes
showed how the AAM update step can be reformulated to incorporate this prior.
The effect is to pull the AAM solution more towards its mean, which may be helpful
in high noise conditions. In the rest of this thesis we will be more concerned with
incorporating prior knowledge about constrained points, so we summarise a simpli-
fied version of [31], ignoring the model prior terms (in effect a uniform model prior
is assumed).
Suppose we have prior estimates of the positions of some points in the image frame
$\mathbf{X}_0$, together with their covariance matrix $\mathbf{S}_X$. Unknown points can be represented by
zeroes, together with large upper bounds on the variances in $\mathbf{S}_X$, and effectively zeroes
in $\mathbf{S}_X^{-1}$. Let $\mathbf{d}(\mathbf{p}) = (\mathbf{X} - \mathbf{X}_0)$ be a vector of the displacements of the current point
positions from their prior positions. We assume further that the prior point positions are
Gaussian distributed, and also that the texture residuals are independently and identically
distributed with variance $\sigma_r^2$. Then maximising the logarithm of the posterior probability
is equivalent to minimising:
$$E_1(\mathbf{p}) = \sigma_r^{-2}\mathbf{r}^T\mathbf{r} + \mathbf{d}^T\mathbf{S}_X^{-1}\mathbf{d} \tag{4.29}$$
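By using a first order Taylor expansion similar to that used to derive the basic AAM
update equation, the parameter update is given by the solution to the equation set:
$$\mathbf{A}\,\delta\mathbf{p} = -\mathbf{a} \tag{4.30}$$
where, after defining the Jacobian of $\mathbf{d}$ w.r.t. $\mathbf{p}$ as $\mathbf{K}$,
$$\mathbf{A} = \left(\sigma_r^{-2}\mathbf{J}^T\mathbf{J} + \mathbf{K}^T\mathbf{S}_X^{-1}\mathbf{K}\right), \qquad \mathbf{a} = \left(\sigma_r^{-2}\mathbf{J}^T\mathbf{r}(\mathbf{p}) + \mathbf{K}^T\mathbf{S}_X^{-1}\mathbf{d}\right) \tag{4.31}$$
and
$$\mathbf{K} = \frac{\partial \mathbf{d}}{\partial \mathbf{p}} \tag{4.32}$$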
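When computing the prior point displacement Jacobian $\mathbf{K}$, it is necessary to take
into account the global pose transformation $\mathbf{t}$ as well as the appearance model parameters
$\mathbf{c}$. Cootes further developed the special case of isotropic prior point positional
variance with zero off-diagonal terms, and when the pose transformation $S_t(\mathbf{x})$ is
a similarity transform which scales by $s$. Then let $\mathbf{x}_0$ be the prior point positions
mapped into the model frame, so $\mathbf{x}_0 = S_t^{-1}(\mathbf{X}_0)$, and let $\mathbf{y} = s(\mathbf{x} - \mathbf{x}_0)$. Then
$\mathbf{d}^T\mathbf{S}_X^{-1}\mathbf{d} = \mathbf{y}^T\mathbf{S}_X^{-1}\mathbf{y}$.
In this case:
$$\mathbf{A} = \left(\sigma_r^{-2}\mathbf{J}^T\mathbf{J} + \mathbf{K}_m^T\mathbf{S}_X^{-1}\mathbf{K}_m\right), \qquad \mathbf{a} = \left(\sigma_r^{-2}\mathbf{J}^T\mathbf{r}(\mathbf{p}) + \mathbf{K}_m^T\mathbf{S}_X^{-1}\mathbf{y}\right) \tag{4.33}$$
The Jacobian $\mathbf{K}_m$ is the concatenation $\left(\frac{\partial \mathbf{y}}{\partial \mathbf{c}} \,\middle|\, \frac{\partial \mathbf{y}}{\partial \mathbf{t}}\right)$ and:
$$\frac{\partial \mathbf{y}}{\partial \mathbf{c}} = s\mathbf{Q}_s, \qquad \frac{\partial \mathbf{y}}{\partial \mathbf{t}} = -s\frac{\partial (S_t^{-1}(\mathbf{X}_0))}{\partial \mathbf{t}} - (\mathbf{x} - \mathbf{x}_0)\cdot\left(\frac{s_x}{s}, \frac{s_y}{s}, 0, 0\right) \tag{4.34}$$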
The update equation can then be solved using standard methods in linear algebra;
for example, since the matrix A is symmetric, Cholesky decomposition [108] can be
used for speed to invert A; but if that appears ill-conditioned, then SVD can be used
to robustly calculate an inverse (in the least-squares sense).
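As a minimal sketch of this solution step (not the implementation used in this work), the update of equations 4.30 and 4.31 could be computed as follows in Python with NumPy. The function name constrained_update and its argument names are illustrative assumptions; S_inv stands for $S_X^{-1}$. Cholesky factorisation additionally requires A to be positive definite, so the sketch falls back to an SVD-based pseudo-inverse when the factorisation fails.

```python
import numpy as np

def constrained_update(J, r, K, d, S_inv, sigma_r):
    """Solve A dp = -a (equations 4.30, 4.31) for the parameter update dp.

    All names are illustrative: J is the texture Jacobian, r the texture
    residual, K the Jacobian of the point displacements d, and S_inv the
    inverse prior covariance of the constrained points.
    """
    w = 1.0 / sigma_r**2
    A = w * (J.T @ J) + K.T @ S_inv @ K
    a = w * (J.T @ r) + K.T @ S_inv @ d
    try:
        # Cholesky factor A = L L^T; NumPy raises LinAlgError if A is
        # not positive definite (for example when it is singular).
        L = np.linalg.cholesky(A)
        z = np.linalg.solve(L, -a)      # solve L z = -a
        return np.linalg.solve(L.T, z)  # solve L^T dp = z
    except np.linalg.LinAlgError:
        # Fallback: least-squares solution via the SVD-based pseudo-inverse.
        return -np.linalg.pinv(A) @ a
```

The fallback returns the minimum-norm least-squares solution, so parameters that are unconstrained by the data and the priors are left at a zero update.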
4.6 Model Optimisation using Minimum Description Length
One potential problem with statistical shape models is the need to establish the
set of correspondences between images in the training set. If points are placed at
positions that are in some sense inconsistent, then the shape model is degraded,
and may generate unphysical shapes. This can be particularly problematical when
not enough points are used. The correspondence problem is particularly acute when
dealing with 3D shapes with rather amorphous regions. An approach which shows
merit in partially automating the model-building process was introduced by Davies
et al [39, 40]. This approach uses an information-theoretic criterion, the
minimum description length (MDL) of the model and training set. The idea is that
a consistent set of correspondences will lead to a more compact description and a
better model. MDL reformulates the correspondence problem as a method of finding
the most compact coding of the training set. This requires the minimisation of the
sum of the description lengths of: the model, each shape’s model parameters, and
each shape’s residuals. An existing shape segmentation is still required, but the set of
corresponding points is obtained by a re-parameterisation of the shape using a set of
kernel functions. The method then finds the set of kernel parameters which minimises
the total description length. Davies et al [39, 40] compared MDL-optimised shape
models to manually crafted ones and found that the MDL-optimised models had
better specificity (synthesised shapes more closely match at least one shape in the
training set), and generalisability (lower residuals in leave-one-out tests).
The original formulation of the MDL approach required an existing segmentation to
reparameterise. More recently Cootes et al [29] have experimented with a completely
automatic model building process by generalising MDL to building appearance mod-
els within a groupwise registration scheme. Shape and texture models are computed
using a current set of corresponding points for all but a current target image. This
current target image then has its points moved in order to minimise the description
length of encoding it using the current model. This process is repeated for each image
in turn, within a larger iteration loop, until (it is hoped) the process converges to a
set of co-registered images, with a set of points in correspondence. The method has
given promising results on a face dataset, and on a dataset of MR images of normal
brains. Current evaluation criteria have focussed on specificity measures, partly due
to a lack of ground truth.
4.7 Conclusions
Methods purely driven by image data are often unreliable as they are too under-
constrained, especially in medical images where noise and clutter are ubiquitous.
Difficult image understanding and segmentation problems require strong priors. Us-
ing linear models with orthogonal bases for these priors has many advantages, due
to their mathematical simplicity and the associated least squares methods, and their
relative ease of optimisation. The ASM and its descendants (particularly the Active
Appearance Model) form the most suitable class for many problems, as not only is
the shape model based on a training set, but there is also a texture model used to
match to an unseen image, which is also learnt rather than imposed. Because the
AAM learns so much from its training set it is relatively free of ad-hoc parameters.
Although the shape and texture models of the ASM and AAM provide helpful con-
straints that avoid unphysical solutions, and are robust against noise and partially
missing structure, there still remains the problem that they will fail to adequately fit
unseen objects which deviate too far from the training set. Thus model-based meth-
ods can suffer from an undertraining problem. In medical images, the application
may well be related to diagnosing disease. Unfortunately it is precisely the (rare?)
pathological cases in which the shape model may be undertrained. For example an
ASM can be used to fit to vertebrae in images of the spine, but the ASM would not
be able to fit to a severe vertebral fracture or an extreme scoliosis if none such were
present in the training set.
In such cases the greater flexibility of a “snake” could be advantageous, perhaps as
a user-invoked fallback mode. Also annotating the images to create the training set
can be very time consuming. This can be “bootstrapped”, by building initial models
which are then used to provide a degree of fitting to further training images, with the
model being continuously updated as more training images are added. In the early
stages of annotating the training images a snake approach could be used, steered by
an expert user. Thus the snakes could assist in the generation of a better trained
model. Further work on MDL-based automatic model building [29] may eventually
allow automation of the tedious process of hand-annotation of the training set.
In the subsequent chapter we also introduce a method of using multiple sub-models
which reduces problems with undertraining.
Chapter 5
The consistent combination of
multiple sub-model AAMs
5.1 Introduction
In this chapter we discuss our methods for semi-automatically segmenting vertebrae,
but presented in a generic form. Rather than using a single model of the whole
spine, we have developed methods for using a sequence of linked sub-models. We
published an early paper [113] on a basic version of these ideas immediately prior
to commencing this PhD. This work is also substantially summarised as part of this
chapter, for reasons of logical completeness.
We begin by discussing a general trade-off concerning sub-model flexibility and con-
straint. We define a general sequencing algorithm for linking multiple sub-models,
illustrated specifically by the vertebral segmentation problem. We compare different
methods of sequencing the fitting of the several sub-models.
We have already published most of the ideas in this chapter in a number of papers
[113, 114, 116, 117]. However as the work has been spread over four different papers, it
is necessary to rewrite the content into a more integrated whole. Whereas the earlier
implementation [113] prior to this thesis assumed zero off-diagonal covariance terms,
in this thesis we examine the use of bootstrapped estimates of inter-point covariance,
which could allow better modelling of the links between the multiple sub-models.
Note that detailed results on vertebral segmentation accuracy with a variety of AAM
forms and sub-model structures are presented in the next chapter. Here we deal with
the more generic algorithm for combining sub-models together in a linked sequence.
5.2 Model-Based Segmentation - Some Trade-Offs
5.2.1 Statistical Models in Medical Imaging
Many problems in medical image interpretation require an automated system to anal-
yse images. These images may provide noisy data, and typically complex structure.
As noted in the previous chapter, model-based methods offer solutions to these diffi-
culties [33], by enforcing strong priors learned from a set of annotated training im-
ages. A widespread such approach is the Active Appearance Model (AAM) [33, 24],
discussed in detail in section 4.5.
5.2.2 Global vs Local Models
Although the use of a model provides helpful constraints that avoid unphysical solu-
tions, there still remains the problem that as the model is based on a finite training
set, it will fail to adequately fit unseen objects which deviate too far from the training
set. In particular under-training of the model may mean that it is insufficiently
adaptable on a local level, especially when pathologies are present. Sometimes the
training set might reasonably capture the variation in some sub-structures, but not all
the necessary variation in their inter-relationships. For example large kyphosis (high
spinal curvature) or scoliosis (pathological lateral displacement of vertebrae) may not
be captured in a global model. Or, since one vertebral fracture means that others are
more likely, in a small training set some spurious correlations may be learned, so that
for example a severe fracture of T12 might only be reproducible in a global model
with a severe fracture of T10. In previous work [113] we showed that such under-
training problems could at least be mitigated by using multiple sub-models, partly
because the use of sub-models means that each sub-structure can have its own pose
parameters. Moreover when there are projective effects present (e.g. from a divergent
X-ray beam) each sub-model can have a locally optimised affine transform to best
approximate to the varying projection. Another advantage of using sub-structures
is that the order of their fitting can be dynamically adjusted to defer problematical
regions. Problems can arise from either clutter (e.g. soft-tissue), high noise, par-
tial occlusion, or from pathologies. We show later how this dynamic sequencing can
be done using a quality of fit measure; in the next chapter we show that this dy-
namic ordering improves accuracy on DXA images of the spine. Also when there is
varying brightness, contrast, or other variable lighting effects across the image, the
use of smaller structures may mitigate such problems by allowing local normalisa-
tion. Finally inherent non-linearities, or higher moments (e.g. skewness) of the shape
distribution, are even harder to capture in a single global model.
But if the size of modelled sub-structures is too small, then in noisy data the models
can be too unconstrained. So there is a trade-off in the optimal size of structures to
model. Even when models of sub-structures are used, it is still necessary to express
the global level relationships between them. We show that if the constrained form of
the AAM is used [31], this linkage can be naturally expressed in terms of soft point
constraints, which are recursively updated as each sub-model is fitted. It is possible
to overlap the sub-structures, thus providing natural linkage. Also (or instead) a
global shape model of all the points can be used to constrain the sub-models.
5.3 Combining Overlapping Sub-Models
We summarise how the linked sub-model search algorithm for locating vertebrae
operated in our earlier work [113]. Then we describe the generalisation and the
enhancements we have introduced in the course of this work.
5.3.1 Vertebral Triplet Modelling
5.3.1.1 Sub-Model Iterative Linkage
In [113] the spine was modelled by a sequence of overlapping triplets of vertebrae, and
the sequence of sub-model solutions was combined as follows, using a fixed ordering that
starts in the lumbar spine and works upwards. Each vertebra is fitted using the triplet sub-model in
which it is central (Figure 5.1). When a triplet model has been fitted it also provides
part of the initialisation (via the overlapping vertebrae) for subsequent iterations
which fit its neighbours. Furthermore constraints are applied so that overlapping
vertebrae cannot be moved far from the provisional positions determined previously
- the constrained AAM [31] (equation 4.31) rather than simple AAM is used. Note
also that subsequent iterations do not update any point positions which have already
been determined, unless the point is in the central vertebra of the triplet.
This feed-forward of constraints is further aided by re-fitting the global shape model
to the solution so far. This is used to initialise a starting solution for vertebrae not
yet fitted, but only low constraint weights are attached to this global prior. Thus
information in the global shape model is still used in guiding the solution, but the
global shape constraints are downweighted, which allows their violation to a degree
if the image evidence locally supports such a solution.
Our approach differs somewhat from that of Davatzikos et al [38], who propose a
Hierarchical Active Shape Model based on a wavelet decomposition of the shape
contours, in which local regions of the wavelet transform space are decoupled when
applying the shape model constraints. In [38] coarse global constraints continue
to apply in a strict sense. However, in the case of segmenting vertebrae, certain
pathologies of the spine such as scoliosis may cause even coarse aspects of the shape
model to be violated, as whole vertebrae can be laterally shifted outside the region
captured in the training set. Therefore we use multiple overlapping AAMs instead,
which allows a wider range of pathologies to be adequately fitted. Also there may be
some advantages in decomposing the image search process, as well as the application
of shape constraints. Furthermore our approach is more suitable to AAM search, as
unlike with the ASM the shape constraints are not applied after first locating a set
of target positions, but are intrinsic to the search process per se.
5.3.2 Generalisation
This idea of linking sub-models can be generalised. It is simply necessary to label
points in a sub-model as either core (e.g. the central vertebra of the triplet), or overlap
(i.e. included to provide linkage with other sub-models). It is also necessary to store
the overall solution in, for example, a global vector of points. Each sub-model also has
its own local copy of its sub-solution only. The difference between core and overlap
points is that the core points are always copied back into the global solution when
the sub-model is fitted (even if they had been provisionally set as part of a previous
sub-model); whereas the overlap points are only copied out once, the first time they
are fitted. As overlap points can be contained in more than one model this distinction
is made in order not to compromise overall consistency. The second difference is that
once core points are fitted they are assigned higher constraint weights. In principle
the overlap can be null, in which case the updated predictions made through the
global shape model provide the only linkage.
Figure 5.1: An illustration of the first two iterations (static fit ordering) combining
vertebral triplet sub-models. a) shows the result of fitting the first triplet containing
L4/L3/L2. b) indicates the next iteration which fits L3/L2/L1. The second iteration
resets L2 (now core) and provides an initialisation of L1, and an updated prediction for
T12 via the global shape model. However the previous solution for L3 is retained (not core).
5.3.3 Dynamic Sub-Model Sequence Ordering Algorithm
If one sub-model fit fails, then this can misalign the starting solution for subsequent
iterations. In such a case it would be better to adapt the sequence dynamically by
comparing the fit quality of several candidate sub-models. Picking the best quality
fit model as the one to impose at this iteration will tend to defer noisier or poorly
fitting regions until they have been better constrained by their neighbours. Therefore
we extended the originally static ordering of [113] to a dynamic ordering,
which is also easier to generalise. This extension to the algorithm has been published
previously in [114]. A set of Nc candidate sub-models is maintained. Each candidate
is provisionally fitted, then the sub-model with the best fit quality is imposed into the
global solution. We use the residual sum of squares as the basis of the fitting quality
measure, but with some renormalisation as discussed subsequently. A new candidate
is then added as detailed in the next paragraph, until all sub-models have either been
fitted already, or are now in the set of candidates. When no more candidates can be
added the remaining Nc − 1 sub-models are fitted (best quality first) and the search
concludes.
A new candidate is added at the end of each iteration by searching from the latest
best candidate to locate its nearest neighbour that has never been added to the
candidate list. In the vertebral case the ordering uses the current estimates of the
distance between vertebral centres. In more general applications it would be necessary
to define a kind of proximity ordering on the sub-models in order to know which
candidate to push into the list next. This ordering can be viewed as a set of adjacency
relations, which may not necessarily be transitive overall. For example rather than
have a single overall ordering of the model set, each sub-model can have its own
ordering of all the others, based on either mean proximity, or mutual correlation.
5.3.4 Algorithm Pseudo-Code
In this section we summarise the sub-model combination algorithm more formally in
a pseudo-code form.
We make use of the following nomenclature:
Let C be the working set of current candidate AAMs, and let R be the set of indices
of remaining sub-models not yet fitted. Let $x^{(g)}$ be the current global solution of all
points. Let $S^{(g)}$ denote the global covariance matrix of the errors in $x^{(g)}$.
Let M be the global shape model. Let $\hat{x}^{(g)}$ be a vector containing the global shape
model’s weighted best fit to $x^{(g)}$, with weights determined by the respective (reciprocal)
variances. Note that points in $\hat{x}^{(g)}$ that have not yet been determined become in
effect predictions from the points that have been determined, because the determined
points have substantially higher weights attached. Let d be a vector of boolean flags
indicating which points have been determined by a sub-model fit. Let $\langle m_i | m_j \rangle$ denote
the proximity measure of AAM sub-model i to sub-model j. At each iteration, it is
necessary to copy between the partial global solution and relevant sub-model local
storage. We let $P_i(x^{(g)}, \hat{x}^{(g)}, d, f)$ denote the operation of copying from the global
solution and prediction vectors into a local points vector for sub-model i, where f is
a boolean flag set true on the first iteration, and false thereafter. Also let $T_i(S^{(g)})$
denote a similar operation to extract relevant portions of the covariance matrix.
We also adopt the following conventions.
1. Let M→fit(x,w) be the weighted least squares fit of model M to points vector
x using corresponding weights w. See Appendix A.
2. Let m→x denote the current points solution of model m
3. diag(A) denotes the vector formed from the diagonal elements of matrix A.
4. Given a vector w we use $w^{[-1]}$ to denote the vector whose elements are the
inverse of those of w, i.e. $[w^{[-1]}]_i = ([w]_i)^{-1}$.
5. We use the C-style ? operator with boolean flags, so (b ? x1 : x2) means the
latter expression evaluates to x1 if boolean variable b is true, or x2 otherwise.
Firstly it is necessary to initialise the global solution points vector $x^{(g)}$,
and set an initial covariance matrix $S^{(g)}$ to reflect the variance of this process,
together with a corresponding correlation matrix $T^{(g)}$. For example some limited
user interaction may fix an approximate initial solution, or a salient feature detection
process might produce an initial estimate. We use a parameter $\sigma_I^2$ which is the mean
point error variance associated with the initialisation, i.e. $\sigma_I^2 = \mathrm{Tr}(S^{(g)}) / 2n$. After the
first iteration (i.e. once a sub-model has been fitted), the relevant remaining
terms have a reduced variance $\sigma_P^2$ (see section 5.3.5.3). The algorithm then proceeds
as defined in Algorithm 1.
5.3.5 Updating the Constraint Variance
5.3.5.1 Simple diagonal constraint covariance matrix
When using the constrained AAM, the optimum solution depends on the covariance
matrix SX . In our earlier work [113] we assumed a simplified diagonal form with
essentially just three variances: low, medium and high values. These corresponded
Algorithm 1 Formal Sub-Model Combination Algorithm
1. Insert the initial $N_c$ candidate sub-models into C and initialise $R = \{i : m_i \notin C\}$.
2. Set “first” iteration flag f = true and $[d]_j$ = false, $\forall j$
3. While $C \neq \emptyset$ loop
   (a) Initialise sub-model quality container, $Q = \emptyset$
   (b) For each sub-model $m_i \in C$ do
       i. Copy current global solution and covariance matrix partitions to
          local constraints $x'_i$, $S_X^{(i)}$, thus (see Algorithm 2):
          A. $(x^{(g)}, \hat{x}^{(g)}) \xrightarrow{P_i(d,f)} x'_i$
          B. $S^{(g)} \xrightarrow{T_i} S_X^{(i)}$
       ii. Set weights $w = \mathrm{diag}(S_X^{(i)})^{[-1]}$
       iii. Initialise $m_i$ APM parameters to the best weighted fit of $m_i$ to $x'_i$
            given weights w (see Appendix A).
       iv. Perform a constrained AAM fit of $m_i$ to the image, given point
           constraint values $x'_i$ and covariance matrix $S_X^{(i)}$.
       v. Calculate the quality of fit of this sub-model q (see section 5.3.6),
          and store in a suitable ordered container Q so that Q(i) = q.
   (c) if f then reinitialise $S^{(g)}$: $S^{(g)} = \sigma_P^2 T^{(g)}$
   (d) Obtain index of best-fitting model, $k = \arg\max(Q)$
   (e) Impose sub-model $m_k$ into global solution, thus updating $x^{(g)}$, $S^{(g)}$, d.
       See Algorithm 3.
   (f) Update predicted points thus:
       i. Set weights $w = \mathrm{diag}(S^{(g)})^{[-1]}$; then $w'$ is set thus:
          $[w']_j = (d_j \;?\; w_j : \sigma_I^{-2}) \;\; \forall j$
       ii. $\hat{x}^{(g)} = M \to \mathrm{fit}(x^{(g)}, w')$
       iii. Adjust the global covariance matrix for updated predictions if
            necessary - see section 5.3.5.4.
   (g) $C \mapsto C \setminus \{m_k\}$
   (h) if $R \neq \emptyset$
       i. $k' = \arg\min_{j \in R} \left( \langle m_k | m_j \rangle \right)$
       ii. $C \mapsto C \cup \{m_{k'}\}$, $R \mapsto R \setminus \{k'\}$
   (i) f = false
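The control flow of Algorithm 1 can be illustrated with a short Python sketch, under simplifying assumptions: the constrained AAM fit and the quality measure are abstracted into a hypothetical callback fit_quality(i), the proximity measure $\langle m_k | m_j \rangle$ into a hypothetical callback proximity(k, j), and the first $N_c$ sub-models are taken as the initial candidates. Updating of the global solution and covariance (Algorithms 2 and 3) is omitted; only the order in which sub-models are imposed is returned.

```python
def dynamic_sequence(n_models, n_candidates, fit_quality, proximity):
    """Return the order in which sub-models are imposed (sketch only).

    fit_quality(i) -> float : hypothetical callback, higher means better fit
    proximity(k, j) -> float: hypothetical callback, lower means nearer
    """
    candidates = list(range(min(n_candidates, n_models)))  # working set C
    remaining = set(range(n_models)) - set(candidates)     # set R
    order = []
    while candidates:
        # Provisionally fit every candidate and record its quality.
        quality = {i: fit_quality(i) for i in candidates}
        # Impose the best-fitting candidate.
        k = max(quality, key=quality.get)
        order.append(k)
        candidates.remove(k)
        # Add the nearest neighbour of k that has never been a candidate.
        if remaining:
            k_new = min(remaining, key=lambda j: proximity(k, j))
            candidates.append(k_new)
            remaining.remove(k_new)
    return order
```

With five sub-models in a line, two candidates, and a poor fit quality for sub-model 1, the poorly fitting sub-model is deferred until all its neighbours have been imposed.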
Algorithm 2 Project points into sub-model, operations $P_i(x^{(g)}, \hat{x}^{(g)}, d, f)$; $T_i(S^{(g)})$
Let $I_i$ denote the ordered set of point indices of sub-model $m_i$. Note we assume that
the points have the same relative ordering in the sub-model as the global model, with
the first sub-model point having local index 0 and global index $[I_i]_0$ (i.e. as in
C-style array indexing). The operation sets an output vector $x_i$ and output covariance
matrix $S_X^{(i)}$.
1. k = 0
2. For $j \in I_i$ do
   (a) Perform $P_i(x^{(g)}, \hat{x}^{(g)}, d, f)$ for the current point thus:
       i. if $[d]_j$ = true or f = true then $[x_i]_k = [x^{(g)}]_j$;
          otherwise $[x_i]_k = [\hat{x}^{(g)}]_j$
   (b) Perform $T_i(S^{(g)})$ for the row corresponding to the current point thus:
       i. $k' = 0$
       ii. For $j' \in I_i$ do
           A. $[S_X^{(i)}]_{kk'} = [S^{(g)}]_{jj'}$
           B. $k' \mapsto k' + 1$
   (c) $k \mapsto k + 1$
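With array slicing the two operations of Algorithm 2 reduce to a few lines. The following NumPy sketch is illustrative only: the function name and arguments are assumptions, and each vector entry is treated as one scalar co-ordinate indexed by idx (the ordered index set $I_i$).

```python
import numpy as np

def project_into_submodel(x_g, x_pred, S_g, determined, first, idx):
    """Copy global values into local sub-model storage (sketch of Algorithm 2).

    x_g        : global solution vector
    x_pred     : predictions from the global shape model
    S_g        : global covariance matrix
    determined : boolean flags d, one per entry of x_g
    first      : True on the first iteration only
    idx        : ordered global indices belonging to this sub-model
    """
    idx = np.asarray(idx)
    # Use the global solution if the entry is determined or on the first
    # iteration; otherwise use the shape-model prediction.
    use_solution = determined[idx] | first
    x_i = np.where(use_solution, x_g[idx], x_pred[idx])
    # Extract the sub-block of the covariance matrix for these indices.
    S_i = S_g[np.ix_(idx, idx)]
    return x_i, S_i
```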
Algorithm 3 Impose sub-model into global solution
Let $I_i$ denote the ordered set of point indices of sub-model $m_i$. Note we assume
that the points have the same relative ordering in the sub-model as the global model,
with the first sub-model point having local index 0 and global index $[I_i]_0$ (i.e. as
in C-style array indexing). Let $I_C$ denote the ordered subset of $I_i$ containing the
indices of points in the core of this sub-model. Let $S_X^{(i)}$ denote the estimated
sub-model covariance matrix (after fitting this sub-model), as given by equation 5.7.
Let $T^{(g)}$ denote the global correlation matrix obtained by renormalising the initial
values of $S^{(g)}$ (see section 5.3.5.3). Note that as discussed in section 5.3.5.3 we use
a renormalisation to a constant mean prediction variance $\sigma_P^2$.
1. $x = m_i \to x$
2. k = 0
3. For $j \in I_i$ do
   (a) if $j \in I_C$ or $d_j$ = false
       i. $[x^{(g)}]_j = x_k$
       ii. $k' = 0$
       iii. For $j' \in I_i$ do
            A. if $j' \in I_C$ or $d_{j'}$ = false then $[S^{(g)}]_{jj'} = [S_X^{(i)}]_{kk'}$;
               otherwise $[S^{(g)}]_{jj'} = [S^{(g)}]_{j'j} = 0$
            B. $k' \mapsto k' + 1$
       iv. Renormalise the cross-covariance terms in the global covariance
           matrix between this updated value and all other predicted points.
           See section 5.3.5.3. Note that these steps could be replaced by a full
           update of the prediction covariance matrix, see section 5.3.5.4. In
           detail:
       v. Predicted point indices set $J_P = \{j' : j' \notin I_i \text{ and } d_{j'} = \text{false}\}$
       vi. For $j' \in J_P$ do
           A. $[S^{(g)}]_{jj'} = \sigma_P \sqrt{[S^{(g)}]_{jj}}\, [T^{(g)}]_{jj'}$
           B. $[S^{(g)}]_{j'j} = [S^{(g)}]_{jj'}$
   (b) $d_j$ = true
   (c) $k \mapsto k + 1$
respectively to: determined core points; overlap points which had been fitted but
were not yet core; and predicted points (i.e. predicted via the global model from
either the initialisation, or consequent to previous sub-model fits). See for example
Figure 5.1. The constraint variances for points in the core portion of the sub-model
just fitted are reset to reflect the estimated variance of the point location error $\sigma_a^2$*;
whereas the associated variance for those points in its overlap subset is set to a
higher value $\sigma_O^2$, which effectively downweights them, allowing greater flexibility to
neighbouring sub-models which will subsequently refit them. Other points which
have not yet been fitted (and are merely predicted via the global model) had a high
variance. This was set on the first iteration to the average variance (over all points)
of the initialisation (point-to-line) error $\sigma_I^2$. Subsequently the prediction variance
was set using a constant $\sigma_{P0}^2$ obtained by averaging over all points in the spine. This
was obtained by taking the initialisation points as given, and then fixing the points
in just two vertebrae in turn to their manually annotated values, and then obtaining
the best shape model fit to this point subset. The (point-to-line) prediction errors for
points in the next neighbour can then be obtained, and an average variance calculated
over the spine by gradually moving up fixing just one pair of vertebrae in turn. So
for example we fix the values of points in L4 and L3, and use the global model to
predict L2; next we treat L3 and L2 as fixed and predict where L1 is; and so on. In
the general case the relevant subset of points to average the predictions over at each
stage are those in the nearest other sub-model. The actual prediction variance used
in the algorithm is then further degraded by adding on the variance assumed for the
determined points, so finally:
$$\sigma_P^2 = \sigma_a^2 + \sigma_{P0}^2 \tag{5.1}$$
In reality the set of points that have been already determined may well contain more
than two vertebrae, especially when the dynamic sequencing method is used, so this
method represents a slightly crude upper bound. A possible more exact calculation
is presented later, but we have not implemented it.
* as a point-to-line error, which then produces an isotropic variance of $\sigma_a^2$ in both
Cartesian components
Somewhat arbitrarily the overlap point variance $\sigma_O^2$ was set in [113] by averaging the
prediction variance and the core accuracy so
$$\sigma_O^2 = 0.5(\sigma_a^2 + \sigma_P^2) \tag{5.2}$$
The value of $\sigma_a$ can be initially set by determining the mean accuracy of a standard
global AAM, and if necessary recursively fine-tuned in subsequent experiments with
sub-models.
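As a small worked illustration of equations 5.1 and 5.2, the following Python sketch computes the three variance levels from assumed values of $\sigma_a$ and $\sigma_{P0}$. The function name and the numerical values in the usage note are hypothetical and are not taken from our experiments.

```python
def constraint_variances(sigma_a, sigma_p0):
    """Return (core, overlap, predicted) constraint variances.

    sigma_a  : assumed standard deviation of the core point location error
    sigma_p0 : assumed standard deviation of the global-model prediction error
    """
    var_core = sigma_a**2
    var_pred = var_core + sigma_p0**2            # equation 5.1
    var_overlap = 0.5 * (var_core + var_pred)    # equation 5.2
    return var_core, var_overlap, var_pred
```

For example, with sigma_a = 1 and sigma_p0 = 2 (arbitrary units) this gives variances of 1, 3 and 5 for core, overlap and predicted points respectively, so the overlap points are always constrained less tightly than core points but more tightly than mere predictions.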
5.3.5.2 Non-diagonal constraint covariance matrix
In reality the simple diagonal assumption will fail, as there are bound to be correla-
tions between the errors in nearby points. In this section we discuss how a plausible
(though approximate) covariance matrix can be introduced. The true values of SX
are difficult to determine, and indeed are somewhat circular, since they depend on
previous stages of the fitting algorithm, which in turn depend on the values assumed
for SX . Also we wish to include a degree of “model noise” to reflect inadequacies of
the training set, which by definition is something of an unknown. However we can say
that we expect correlations in the errors to be related to the fundamental correlation
between the points themselves, as captured in the shape model. Clearly there will
be correlation between the errors in estimates of nearby points, as fitting errors will
tend to displace several adjacent points off their true location in a connected manner.
We recall that the AAM will be working in appearance parameter space, and so it is
natural to explore the effects on point errors of small errors in the model parameters.
These are not simple linear transforms due to “model noise”, so we estimate their
effects via “miss-M-out” bootstrap experiments.
Secondly the purpose of having the “medium” constraints attached to sub-model
overlap points is to provide a degree of linkage, but in a way which does not over-
constrain subsequent sub-model searches for which these points will be core. Part of
the purpose of this is to recognise that partial search failures do occur, and to have
a kind of “second-chance” when fitting subsequent sub-models. We therefore view
the medium constraint weights that should be attached to these overlap points as
intermediate between the underlying fit accuracy (on successful searches), and that
which would be obtained if we simply used the sub-model’s shape model to predict
the overlap points given the core. Thus for the overlap variance we use on point i in
a sub-model:
$$\sigma_{iO}^2 = 0.5(\sigma_a^2 + \sigma_{ip}^2) \tag{5.3}$$
where $\sigma_{ip}^2$ is the prediction variance of overlap point i given the core solution. The
prediction covariance terms are derived in “leave-M-out” bootstrap experiments as
follows.
We randomly select a test subset (M=16), and train shape models on the remainder
of the training set. We then loop over the test set. For each test image we take its
annotated shape and select the shape model parameters to best fit the core points
only. In essence the covariance of the prediction errors in the overlap points can then
be calculated from the differences between the annotated positions and their values in
the (core-fitted) shape model. We also wish to have an a priori plausible covariance
matrix for the fitting errors on the core, so we make small random perturbations to the
shape model parameters, which allows an overall covariance matrix to be established.
The scale of the perturbations is selected so that the average expected point variance
induced over the core points equals $\sigma_a^2$, though in reality it may be slightly higher
due to model inadequacy.
The perturbation in parameter k in test image $\nu$ is a zero-mean Gaussian with standard
deviation $\Delta_\nu \sigma_k$, where $\sigma_k$ is the shape parameter’s standard deviation (derived from
the mode’s associated eigenvalue), and $\Delta_\nu$ is a scaling factor. Recalling the linear
nature of the shape model, the variance induced in Cartesian co-ordinate $x_i$ will be:
$$\sigma_i^2 = \Delta_\nu^2 \sum_{k=1}^{m_s} [P_s]_{ik}^2\, \sigma_k^2 \tag{5.4}$$
Let $\tilde{\sigma}_i^2 = \sum_{k=1}^{m_s} [P_s]_{ik}^2\, \sigma_k^2$. Then $\Delta_\nu$ is determined by averaging the set of $\tilde{\sigma}_i^2$ over
the coordinates of core points and requiring an average variance of $\sigma_a^2$. Let the sets
$I_C$ and $I_O$ define the indices of points in the sub-model which are core or overlap
respectively. Then
$$\Delta_\nu^2 = \frac{|I_C|\, \sigma_a^2}{s_\nu^2 \sum_{i \in I_C} \tilde{\sigma}_i^2} \tag{5.5}$$
where $s_\nu$ is the scale of the model to world co-ordinates similarity transform for test
image $\nu$.
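Equations 5.4 and 5.5 can be sketched in Python as follows. This is an illustration under assumptions rather than the code used here: P_s is taken to be the matrix of shape modes with one row per co-ordinate, core holds the indices of the core co-ordinates, and $|I_C|$ is taken as the number of entries in core.

```python
import numpy as np

def perturbation_scale(P_s, sigma_k, core, sigma_a, s_nu):
    """Return the scaling factor Delta_nu of equation 5.5 (sketch only).

    P_s     : (n_coords, n_modes) matrix of shape modes
    sigma_k : standard deviation of each shape parameter
    core    : indices of the core co-ordinates
    sigma_a : target standard deviation of the core point error
    s_nu    : scale of the model-to-world similarity transform
    """
    # Variance induced in each co-ordinate for a unit scaling factor,
    # i.e. the sum over modes of [P_s]_ik^2 * sigma_k^2 (equation 5.4).
    unit_var = (P_s**2) @ (sigma_k**2)
    core = np.asarray(core)
    delta_sq = len(core) * sigma_a**2 / (s_nu**2 * unit_var[core].sum())
    return np.sqrt(delta_sq)
```

As a check, with orthonormal modes (identity P_s), all sigma_k equal to 2, sigma_a = 1 and unit scale, the factor is 0.5, so each core co-ordinate receives variance 0.25 x 4 = 1, as required.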
The covariance matrix C thus derived is then renormalised into a correlation matrix
T to allow a more flexible change of variance scaling. After fitting a sub-model we
then fix variances of core points to $\sigma_a^2$, and variances of overlap points to $\sigma_{iO}^2$. Note
that when deriving the latter via equation 5.3 we set
$$\sigma_{ip}^2 = C_{ii} \tag{5.6}$$
Thus after a sub-model has been fitted (and selected as the best fitting one of the
candidate set), and its associated point constraints therefore need updating, the point
covariance used on its updated points in a subsequent constrained AAM (see equation 4.31)
is given by the appropriate term in:
$$\begin{aligned}
[S_X]_{ij} &= T_{ij}\,\sigma_a^2 & i, j &\in I_C \\
[S_X]_{ij} &= T_{ij}\,\sigma_{iO}\,\sigma_{jO} & i, j &\in I_O \\
[S_X]_{ij} &= T_{ij}\,\sigma_{iO}\,\sigma_a & i &\in I_O,\ j \in I_C \\
[S_X]_{ij} &= [S_X]_{ji} & i &\in I_C,\ j \in I_O
\end{aligned} \tag{5.7}$$
But if previous sub-model fits had already determined some of the overlap points, then
since these points will not be updated again by this sub-model, the diagonal (variance)
overlap terms are not updated, and the covariance terms between such overlap and
core points are instead reset to zero. This reflects the fact that the respective core
and overlap points have been set by notionally independent estimators.
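The assembly of the point covariance in equation 5.7, including the decoupling of overlap points that an earlier sub-model has already fixed, might be sketched as below. The function `point_covariance` and the `frozen` argument are hypothetical names introduced here; for frozen overlap points the caller is assumed to pass in the existing (unchanged) standard deviation.

```python
def point_covariance(T, core, overlap, sigma_a, sigma_o, frozen=()):
    """Assemble the point covariance S_X of equation 5.7 as a nested list.

    T        : correlation matrix (nested list) over the sub-model's points
    core     : indices of core points (the set I_C)
    overlap  : indices of overlap points (the set I_O)
    sigma_a  : standard deviation assigned to every core point
    sigma_o  : dict mapping each overlap index to its standard deviation
    frozen   : overlap indices already fixed by an earlier sub-model fit
    """
    core, frozen = set(core), set(frozen)
    sd = {i: sigma_a for i in core}
    sd.update({i: sigma_o[i] for i in overlap})
    n = len(T)
    S = [[0.0] * n for _ in range(n)]
    for i in sd:
        for j in sd:
            # Core and previously fixed overlap points are treated as having
            # been set by independent estimators, so their cross-terms are zero.
            decoupled = (i in frozen and j in core) or (j in frozen and i in core)
            S[i][j] = 0.0 if decoupled else T[i][j] * sd[i] * sd[j]
    return S
```

Because T is symmetric, multiplying each correlation by the two relevant standard deviations covers all four cases of equation 5.7 in one expression.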
5.3.5.3 Covariance matrix of predictions via global shape model
Finally for points which have merely been predicted but not yet updated we simply
continue to use the prediction error correlation matrix resulting from the initialisation
process. This again is derived in bootstrap resampling experiments (with separate
train/test splits). The initialisation process is application specific, and may need to
simulate user input such as clicking on a small subset of initialisation points with
some typical precision error; or running some other initial salient feature detection
process. It could be as simple as assuming the mean shape and pose. The difference
between the actual shape and the shape generated by the initialisation process is
recorded for each test image, and then a prediction error covariance matrix $S_I^{(g)}$ can
be derived.
This matrix of course reflects the errors at initialisation. After the first iteration, in
order to allow for a somewhat reduced prediction error as more points are determined,
we renormalise this matrix by first converting it to a correlation matrix $T^{(g)}$, so
$$[T^{(g)}]_{ij} = \frac{[S_I^{(g)}]_{ij}}{\left([S_I^{(g)}]_{ii}\,[S_I^{(g)}]_{jj}\right)^{0.5}} \tag{5.8}$$
Then we multiply by $\sigma_P^2$ (see equation 5.1 above). Thus the variance terms are
constant, reflecting an approximate average as in the earlier implementation.
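As a small illustration of equation 5.8 followed by the rescaling to a constant variance, consider the sketch below. The function name `rescale_to_constant_variance` is an assumption, and matrices are plain nested lists to keep the example self-contained.

```python
import math

def rescale_to_constant_variance(S_init, sigma_p):
    """Convert a covariance matrix to a correlation matrix (equation 5.8),
    then scale every entry by sigma_p**2 so all variances are equal."""
    n = len(S_init)
    T = [
        [S_init[i][j] / math.sqrt(S_init[i][i] * S_init[j][j]) for j in range(n)]
        for i in range(n)
    ]
    return [[T[i][j] * sigma_p ** 2 for j in range(n)] for i in range(n)]
```

The correlation structure found at initialisation is preserved, while each diagonal term becomes `sigma_p**2`.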
Thus in our current implementation, although the predicted points themselves do
get updated as the search proceeds, the associated covariance terms do not entirely
reflect the full set of available predictors. This is admittedly inconsistent, but in our
application, the main constraints arise through overlapping the sub-model triplets.
If the sub-model coupling methods we have developed were to be used in another
application where the sub-models had a lesser overlap, then the updated predictions
through the global model would have to play a stronger role. In such a case the con-
ditional global covariance matrix (given the points determined so far) should be used,
but including some additional “model noise”. However in the vertebral case there
is already a set of constraints imposed via the overlapping of the sub-models them-
selves, and it is simply not worth the computational effort to evaluate the conditional
covariance matrix for predicted points; especially when the associated constraints are
in any case quite weak. However in the next section we outline how this could be
done.
A slight complication is how to update the covariance terms between predicted points
and those which have just been determined by a model fit. For example if we fit the
vertebral triplet centred on T12, then the covariance terms relating T11,T12,L1 are
updated. This has a knock-on effect on all other cross-terms between all these points
and all other points which are currently predicted. The update arising from the latest
sub-model imposition means that the cross-covariance terms between for example
T10 and T11 are in effect “broken” - as these are still set to values appropriate to
the initialisation, whereas the variance of the T11 estimates has just been reduced
by imposing the T12 solution. We therefore update these cross-terms assuming the
same prediction correlation coefficients as at initialisation, but we renormalise for the
updated T11 point variances. This ensures that the sub-partitions used in sub-model
fits remain consistent.
5.3.5.4 How to calculate a consistent global prediction covariance
Although as noted above, we have used an approximate form of the prediction co-
variance, it is possible to calculate the correct form of this, given whatever set of
sub-models have been determined so far. Although we have not implemented this we
include it for completeness, and because other applications with less (or no) overlap
between sub-models may require it. First suppose that some subset of the overall
points are known, thus defining a (partial) shape S2. As more sub-models are fitted
the number of points in S2 increases, which in general will decrease the prediction
errors in the remainder of the shape S1. Note that S2 may comprise not only those
points which have already been determined by previous sub-model AAMs, but also
any special points set in the global initialisation process. Firstly for convenience
we suppose that these two subsets have the point indices re-ordered so that we can
partition the overall global covariance matrix C thus:
$$C = \begin{pmatrix} C^{(11)} & C^{(12)} \\ C^{(21)} & C^{(22)} \end{pmatrix} \tag{5.9}$$
Then given the shape vector $x^{(2)}$ of S2, and assuming a joint Gaussian distribution,
the conditional distribution of S1 is still Gaussian. Formulae for the conditional mean
and covariance are given for example by de Bruijne [17]. The maximum likelihood
estimator of S1 is given by [17]:
$$\hat{x}^{(1)} = \bar{x}^{(1)} + C^{(12)} \left(C^{(22)}\right)^{-1} \left(x^{(2)} - \bar{x}^{(2)}\right) \tag{5.10}$$
The covariance matrix $K_0$ of this conditional estimator is given by:
$$K_0 = C^{(11)} - C^{(12)} \left(C^{(22)}\right)^{-1} C^{(21)} \tag{5.11}$$
There can be numerical problems in inverting C(22) due to chance covariance in the
training set, and multi-collinearity arising from estimating it from a small sample.
De Bruijne et al [17] suggest using ridge regression to avoid numerical problems by
adding a small constant along the diagonal of C(22).
In fact $K_0$ represents the covariance given a perfect knowledge of S2, whereas in
reality we are assuming $x^{(2)}$ has errors. Because it is a linear prediction process we
can calculate the additional induced covariance from the regression coefficients of S1
on S2.
Defining for convenience
$$B = C^{(12)} \left(C^{(22)}\right)^{-1} \tag{5.12}$$
the additional covariance induced in S1 given the current covariance matrix
$V^{(2)}$ for S2 will be given by
$$K_1 = B V^{(2)} B^T \tag{5.13}$$
This matrix must then be added to $K_0$ to obtain the final covariance matrix K for
the predicted points. Also because of undertraining there is a certain unmodelled
error which we represent by an additional variance term $\sigma_m^2$. This can be estimated
by bootstrap refitting of the shape model back to unseen examples using random
selection to split the training/test set (e.g. “leave-8-out”). Then by averaging over
all points in all such random test set selections, the residual variance $\sigma_m^2$ can be
obtained, which is then added to the diagonal elements, hence:
$$K = K_0 + K_1 + \sigma_m^2 I \tag{5.14}$$
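Since this calculation was not implemented in the present work, the following NumPy sketch is purely illustrative of equations 5.11 to 5.14. The function name, the `known` index argument and the default `ridge` value are all assumptions; the ridge term follows the suggestion of adding a small constant to the diagonal of $C^{(22)}$.

```python
import numpy as np

def conditional_prediction_covariance(C, known, V2, sigma_m, ridge=1e-6):
    """Covariance K of the predicted points S1 given noisy known points S2.

    C       : global covariance matrix (symmetric, n x n)
    known   : indices of the coordinates in S2
    V2      : current covariance matrix of the S2 estimates
    sigma_m : standard deviation of the unmodelled ("model noise") error
    ridge   : small constant added to the diagonal of C22 (assumed value)
    """
    known = np.asarray(known)
    unknown = np.setdiff1d(np.arange(C.shape[0]), known)
    C11 = C[np.ix_(unknown, unknown)]
    C12 = C[np.ix_(unknown, known)]
    C22 = C[np.ix_(known, known)] + ridge * np.eye(len(known))
    # B = C12 C22^{-1}; C22 is symmetric, so solve C22 X = C12^T and transpose.
    B = np.linalg.solve(C22, C12.T).T
    K0 = C11 - B @ C12.T          # equation 5.11, using C21 = C12^T
    K1 = B @ V2 @ B.T             # equation 5.13
    return K0 + K1 + sigma_m ** 2 * np.eye(len(unknown))  # equation 5.14
```

Solving the linear system rather than forming the inverse explicitly keeps the sketch in line with the numerical remarks made in this section.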
5.3.5.5 Inverting the covariance matrix
Because the constrained AAM actually works with the inverse of the points’ covari-
ance matrix, we work with each sub-model’s locally relevant sub-partition of the
overall covariance matrix. This sub-matrix is in effect inverted using SVD. As the
covariance matrices are necessarily symmetric (and positive definite when of full rank)
we could have used Cholesky decomposition [108], which would be more efficient.
However there may be problems in obtaining a full rank covariance matrix if the
training set is not sufficiently large, and so we use SVD which is more numerically
robust in such situations. If dealing with
larger matrices (e.g. full updating of the prediction covariance matrix as discussed in
the previous section) then it might be more efficient to add in small constants along
the diagonal of the matrix, effectively then using ridge regression as recommended by
de Bruijne [17], and then use Cholesky decomposition. Note that we never explicitly
form the inverse of the covariance matrix itself, as this always appears in equation 4.31
right multiplied by a vector (or another matrix which can be regarded as a column-wise
concatenation of vectors). Thus we actually numerically solve equations of the form
$$S_X d = y \tag{5.15}$$
using a pre-calculated singular value decomposition of $S_X$. Similar considerations
would apply to inverting $C^{(22)}$ in equations 5.10 and 5.11.
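Solving equation 5.15 from a pre-calculated SVD can be sketched with NumPy as follows. The class name `SvdSolver` and the relative tolerance used to discard negligible singular values are assumptions for illustration, not details of the C++ implementation.

```python
import numpy as np

class SvdSolver:
    """Solve S d = y repeatedly from a single stored SVD of S."""

    def __init__(self, S, rel_tol=1e-10):
        U, s, Vt = np.linalg.svd(S)
        # Singular values are returned in descending order; drop those that
        # are negligible relative to the largest (rank-deficient case).
        keep = s > rel_tol * s[0]
        self.U, self.s, self.Vt = U[:, keep], s[keep], Vt[keep]

    def solve(self, y):
        """Return the minimum-norm least-squares solution d.

        y may be a vector or a matrix whose columns are right-hand sides.
        """
        coeffs = ((self.U.T @ y).T / self.s).T
        return self.Vt.T @ coeffs
```

For a full rank matrix this reproduces the ordinary solution; for a rank-deficient one it returns the minimum-norm solution instead of failing.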
Another implementation detail is that because we work with several sub-models at
each iteration (and then only select the best-fitting one), some care needs to be taken
over the precise timing of when the singular value decompositions are recomputed. A
kind of cache is used, and whenever a sub-model is selected as the one to impose at
this iteration, any neighbouring models which have a mutual overlap are flagged as
requiring re-calculation of the singular value decomposition. Otherwise cached values
may be re-used.
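The caching scheme just described could look something like the sketch below. The class `DecompositionCache`, its method names and the `decompose` callback are hypothetical; only the invalidation rule (flag overlapping neighbours when a sub-model is imposed) is taken from the text.

```python
class DecompositionCache:
    """Cache one decomposition per sub-model, recomputing only when flagged."""

    def __init__(self, neighbours, decompose):
        self.neighbours = neighbours  # sub-model id -> ids sharing overlap points
        self.decompose = decompose    # callable: sub-model id -> decomposition
        self.store = {}
        self.stale = set()

    def get(self, model_id):
        # Recompute if never computed or flagged since the last computation.
        if model_id not in self.store or model_id in self.stale:
            self.store[model_id] = self.decompose(model_id)
            self.stale.discard(model_id)
        return self.store[model_id]

    def on_selected(self, model_id):
        # Imposing this sub-model changes covariance terms of the points it
        # shares with its neighbours, so their decompositions are out of date.
        self.stale.update(self.neighbours[model_id])
```

A sub-model whose neighbours have not been imposed since its last decomposition simply re-uses the stored value.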
5.3.6 Quality of Fit Measure
The “best fitting” model is essentially taken to be the one with the lowest residual
sum of squares. However some rescaling is applied to try and have a more consistent
way of comparing different sub-models containing potentially different numbers of
points. Firstly the scaled residual sum of squares is calculated as:
$$S = \sum_{i=1}^{n} \frac{r_i^2}{\sigma_{r_i}^2} \tag{5.16}$$
where $r_i$ is the grey level residual at point i, and $\sigma_{r_i}$ is its estimated standard
deviation. Then we effectively map S onto its associated cumulative distribution function
(CDF), as when comparing different sub-models with different numbers of points it
is not meaningful to directly compare values of S. So S is converted to a value on
a standard Gaussian approximating the associated theoretical $\chi^2$ distribution. In
the standard appearance model training code we use, the set $\{\sigma_{r_i}\}$ is estimated by
refitting the appearance model back to its own training set, and then calculating the
residual variances. However this tends to under-estimate the true values when fitting
to unseen data, so we introduced an additional scaling parameter α on the expected
value of S to account for this. Also model inadequacies can cause nearby residuals to
be positively correlated, so the variance of the sampled distribution is boosted, which
we model by a second scaling parameter β. This can happen because for example
shape model inadequacies lead to adjacent portions of edges being offset from the
“true” edge, which leads to spatially consistent residuals, whose positive correlation
will boost the overall residual variance.
If the distribution were a true $\chi^2$ then we would have:
$$E(S) = n, \qquad \mathrm{var}(S) = 2n \tag{5.17}$$
We extend this to
$$E(S) = \alpha n, \qquad \mathrm{var}(S) = 2\alpha^2 \beta n \tag{5.18}$$
Estimates of these multipliers can be obtained by repeated random splitting into a
training/test set, and estimating the distribution of S from the trained appearance
model’s best fit to the annotated image (given the annotated shape). There is a
danger that outliers from images containing atypical pathologies will bias the fitting
process. Therefore a robust estimator of sample variance was used. We used the Sn
statistic of Rousseeuw and Croux [120], which estimates the standard deviation of a
distribution given a sample set X as:
$$\sigma = 1.1926 \, M_{\{i\}}\left(\left\{ M_{\{j\}}\left(\{|x_i - x_j| : x_j \in X\}\right) : x_i \in X \right\}\right) \tag{5.19}$$
where $M_{\{k\}}(S)$ represents the median of values in set S, whose elements are
identified by a set of indices {k}. The median of median sample separations is used,
so no estimate of central location is needed. This estimator has significantly better
normal efficiency than the more common MAD estimators, and is more appropriate
for asymmetric distributions.
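Equation 5.19 translates almost directly into code. The sketch below is a naive quadratic-time version that uses the ordinary median throughout and includes the zero self-distance, exactly as the formula is written above; it omits the finite-sample corrections and the low/high median conventions of the original Rousseeuw and Croux estimator. The function name `sn_scale` is an assumption.

```python
import statistics

def sn_scale(xs):
    """Robust scale estimate: 1.1926 times the median over i of the
    median over j of |x_i - x_j| (equation 5.19)."""
    inner = [statistics.median(abs(xi - xj) for xj in xs) for xi in xs]
    return 1.1926 * statistics.median(inner)
```

For example, on the sample `[1, 2, 3, 4, 100]` the single outlier has almost no effect: the inner medians are 2, 1, 1, 2 and 97, whose median is 2.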
Having estimated sample standard deviation (and hence variance $v_S$) by Rousseeuw
and Croux’s method, we then calculate a sample mean $\mu_S$ on a trimmed sample,
obtained by excluding any outliers more than three standard deviations beyond the
median of S. The robustly estimated sampled variance $v_S$ and trimmed mean $\mu_S$ of
the sampled values of S then determine α and β thus:
$$\alpha = \frac{\mu_S}{n}, \qquad \beta = \frac{n \, v_S}{2 \mu_S^2} \tag{5.20}$$
The final standardised Gaussian z-value used is:
$$z = \frac{S - \alpha n}{\alpha \sqrt{2 n \beta}} \tag{5.21}$$
The quality measure is obtained by just negating this, as picking the highest value of
−z is equivalent to choosing the candidate with the lowest probability of obtaining a
residual sum of squares any better than that achieved.
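The calibration of α and β (equation 5.20) and the resulting quality measure (the negative of equation 5.21) can be sketched as below. The function names are assumptions; the robust estimates `mu_s` and `v_s` are taken as given inputs for one sub-model with n sampled points.

```python
import math

def calibrate(mu_s, v_s, n):
    """Return (alpha, beta) from the trimmed mean and robust variance of S."""
    alpha = mu_s / n
    beta = n * v_s / (2 * mu_s ** 2)
    return alpha, beta

def quality(S, n, alpha, beta):
    """Quality of fit: the negated standardised z-value of S."""
    z = (S - alpha * n) / (alpha * math.sqrt(2 * n * beta))
    return -z
```

As a consistency check, substituting equation 5.20 into the denominator gives $\alpha \sqrt{2 n \beta} = \sqrt{v_S}$, so z is simply the number of (robust) standard deviations by which S exceeds its expected value.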
5.4 Conclusion
We have presented a general algorithm for combining multiple (possibly overlapping)
sub-models by using the constrained AAM. A global shape model is also still used, to
provide linkage via iteratively updated predictions. Global constraints are therefore
still present, but as a weak prior. We have also developed a method of allowing the
fitting order of the sub-models to be determined by the data, using the heuristic of
“best quality first”. We believe that this will in general improve the robustness of the
search by deferring noisy or poorly fitting regions until they have been constrained
by their neighbours. Also this permits generalisation of the algorithm to cases where
there is no natural fitting order present. In the next chapter we will present results
showing the application of these methods to DXA images of the spine, and will
demonstrate that the dynamic ordering method improves the accuracy compared to
a static ordering of proceeding up the spine starting with the lower lumbar vertebrae.
The dynamic ordering algorithm together with a principled update of the global
constraints leads to a natural generalisation of the approach. This applies even when
the sub-structures have no overlapping portions, and no natural ordering for the fit
sequence, since the data itself suggests the sequence.
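The “best quality first” heuristic amounts to a greedy loop, sketched below under simplifying assumptions: every remaining sub-model is treated as a candidate at each iteration, and the callbacks `fit` and `impose` are hypothetical stand-ins for the constrained AAM search and the constraint update.

```python
def best_quality_first(sub_models, fit, impose):
    """Return the order in which sub-models are imposed.

    fit(m)              -> (quality, solution) for sub-model m under the
                           current constraints
    impose(m, solution) -> updates the constraints seen by later fits
    """
    remaining = list(sub_models)
    order = []
    while remaining:
        # Refit every remaining candidate under the current constraints.
        fits = {m: fit(m) for m in remaining}
        best = max(remaining, key=lambda m: fits[m][0])
        impose(best, fits[best][1])
        remaining.remove(best)
        order.append(best)
    return order
```

Because candidates are refitted after each imposition, a poorly fitting sub-model can improve once its neighbours constrain it, which is the behaviour the heuristic relies on.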
In principle the algorithm could be applied hierarchically with more than one layer
of decomposition. For example a vertebral triplet could be decomposed into the
upper and lower halves of single vertebrae. Such layered decomposition would be
intermediate between standard AAMs and the single point local AAMs of [30]. The
generality means this approach offers an interesting way of handling under-training
of pathological cases - a perennial problem in applying model based vision in medical
contexts.
Chapter 6
Vertebral Segmentation using
Multiple AAMs
6.1 Introduction
In this chapter we apply the multi-AAM method of the previous chapter to the
problem of segmenting vertebrae. We have already published similar results in a
number of papers [113, 114, 116, 117]. The training set has been somewhat extended
since these publications, the AAM parameters have been further optimised, and we
also present previously unpublished data on the optimal choice of sub-model structure
and appearance model form.
6.2 ASM vs AAM
Smyth et al [126] previously used an ASM to segment vertebrae, employing a single
model of the spine from L4 to T7. However we have used the AAM instead. The
AAM is a more statistically principled approach, as it incorporates all correlations
between texture at different points, and also models the correlation between shape
and texture via a tertiary PCA. Also as the AAM can match to a range of actual
textures, it is better suited than the ASM to the problem of segmenting vertebrae,
where the associated image texture can be quite variable, and in fact correlates with
shape. For example fractured vertebrae have not only a different shape, but the
appearance of the endplate is different to that of a normal vertebra: the fractured
endplate has a more diffuse, often multi-edged, texture [79] (see Figures 6.4 to
6.7); whereas a normal vertebra would have a clearly defined margin. An ASM in
contrast would try and adjust its shape so that each point located a position that
best matched the mean texture, whereas the AAM adjusts its appearance parameters
to best match to the actual texture of the image. Therefore this study used AAMs
rather than ASMs. In fact in further work by Smyth [125], in order to use an ASM
with fractured vertebrae it was necessary to introduce additional edges into the shape
model. However these essentially became degenerate in the case of normal vertebrae.
But with an AAM the surrounding texture (e.g. multi-edged structure) can more
naturally be incorporated into modes of the texture model.
Another group working with lumbar radiographs originally used the ASM as part of
a hierarchical segmentation process [143], but more recently have changed to using
the AAM instead [77]. On the other hand, as discussed in the previous chapter, the
ASM can have a larger convergence zone than the classical AAM, due to sampling
outside of the current shape along profiles. We attempt to have the best of both
worlds by using a profile AAM.
Furthermore the AAM can also combine prior estimates of shape points with the
image grey level residuals to perform Constrained AAM search [31]. In the previous
chapter we showed how this can be used to link multiple AAM sub-models together.
In this chapter we use these multi-AAM methods to segment vertebrae.
6.3 Data - DXA Images
6.3.1 Summary of Training Set
Assessment of vertebral fractures by spinal radiographs is still the definitive method
of determining vertebral fracture. However DXA images are being increasingly used,
because although the images are noisier and of lower spatial resolution, they have
several advantages, as discussed in chapter 3. DXA involves a substantially lower
radiation dose than conventional spinal radiographs, avoids the projectional parallax
effects of conventional radiographs due to the divergent X-ray beam, and has the
whole spine on a single image. See chapter 3 for further details. Typical images are
illustrated in Figures 6.1 and 6.2. These Figures also illustrate the annotated shape
model points, and include some more unusual shapes and fractured vertebrae.
Because the thoracic vertebrae are more clearly identified on dual energy images,
dual, rather than single, energy images have been used in the study. The anonymised
DXA images used were obtained from two previous studies [126, 94] with institutional
ethical approval, together with anonymised images from recent patients aged 65 years
and older referred for bone densitometry, and for whom the referring physician had
requested DXA vertebral fracture assessment. We obtained the dataset of Smyth et al
[126] consisting of 78 lateral spine dual energy DXA images in women (mean age 61,
age range 44-80 years), acquired on an Hologic (Bedford, MA) QDR2000plus scanner.
The dimensions of each pixel of the scan were 0.9 x 1.0 mm. We also annotated further
images in the same dataset, which had not been used before, as the previous study
focussed on normal, and not fractured, vertebrae. We added a further 46 images from
this source, 31 of which included fractures or borderline deformities. We obtained a
further set of anonymised DXA images that had been used in a previous study into
the efficacy of clodronate [94], using an Hologic QDR4500A scanner. The dimensions
of each pixel of the scan were 0.5 x 0.5 mm. This gave a further set of 78 images
for inclusion. Of these, 59 included vertebral fractures or borderline deformities. We
selected a further 158 images from patients attending for DXA BMD measurement,
using an Hologic QDR 4500 Discovery, with a resolution of 0.35mm. These patients
had a propensity to osteoporosis, and the specific images were selected because they
contained fractures or other deformities helpful in training the shape model. Patients
in the clodronate study tended towards obesity. DXA images of obese patients give
a particularly poor signal to noise ratio, especially in the lumbar spine. There were
10 images of extremely poor quality in the clodronate study which were not used
for testing, but still partially used for training models, as these images contained a
high fracture prevalence. This left a test set of 350 images, with 512 fractures or
deformities.
6.3.2 Shape annotation
The images were annotated manually using an in-house tool. This was written in the
ANSI C++ programming language using a combination of: i) the Trolltech Qt library
(Trolltech ASA, Oslo, Norway) to provide the user interface; ii) proprietary in-house
statistical modelling software. The tool partly used previous Active Appearance
Models to reduce the manual labour, with the models being iteratively updated as
the annotated cases were added to the dataset. A useful feature of this tool is that
when points have been manually repositioned, they constrain other points in the
model; so if the initial automatic fit is incorrect, the user can partially correct it and
then re-run the model fit as constrained by the corrections (the Constrained AAM is
used with high constraint weights on the manually positioned points).
Figure 6.1: The left hand image is a good quality DXA image, with the manually annotated shape superimposed next. Rightmost is a typical DXA image with moderate diaphragm motion artefacts (T10, T11), with the standard morphometric 6 points indicated.
Figure 6.2: The left hand image is a DXA image showing fractures of T12 and T5 (possibly T7). Next the same image is shown with the full annotated shape, followed by an image with obvious fractures of T10 and T12. Rightmost is an image with unusual lumbar curvature.
There can be vertebrae to which the provisional shape model cannot be adequately
fitted, because of undertraining. Therefore we also introduced a dynamic program-
ming edge search method into the tool. This first fits the vertebra’s shape model∗
learned so far to points already positioned by the user. Typically this will not fit
precisely due to undertraining, so next the whole shape is warped using thin plate splines
so that the shape exactly passes through all points fixed by the user. Next a profile
search 5 mm either side of unconstrained points, and 1 mm either side of constrained
ones, is conducted to try and obtain a maximum sum of local (along profile) absolute
gradients - i.e. we look for the strongest local edge at each point. However high
curvature is penalised, and the combined objective of gradient strength and curvature
penalty can be straightforwardly optimised by dynamic programming. Hence the vertebra’s
shape is updated. If necessary the user can then fix a further subset of points. The edge
search can have a tendency to overfit - it is difficult to get a curvature penalty coefficient
that works ideally on every image. So the final refinement is typically for the user
to fix certain points which are regarded as well positioned, and then use a “Snap to
Model” feature which just implements the first part of this algorithm (fit the current
shape model, then use thin plate spline warps to exactly fit all manually positioned
points). As a rough guide it was found to generally be adequate to manually fix
around one point in 4, plus all points going around vertebral corners. Hence the tool
is not constrained by the prototype models in cases in which there is an inadequate
fit. The initial annotation was performed by the author (MR), with checking by an
experienced radiologist (JEA), particularly in difficult cases (e.g. fractures). The
later 158 image clinical dataset was annotated by an experienced radiographer (SC),
supervised by the author.
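The dynamic programming edge search can be illustrated with a simplified sketch. The assumptions here are significant: the contour is treated as open, every point has the same number of candidate positions along its profile, and the curvature penalty is replaced by a squared difference between the offsets of neighbouring points. The function name `dp_edge_search` and its arguments are hypothetical and this is not the annotation tool’s actual code.

```python
def dp_edge_search(strength, penalty):
    """Choose one candidate position per contour point.

    strength[p][k] : absolute gradient at candidate k on the profile of point p
    penalty        : weight on the squared offset change between neighbours
    Returns the list of chosen candidate indices, one per point, maximising
    total gradient strength minus the smoothness penalty.
    """
    n_cand = len(strength[0])
    score = list(strength[0])   # best total score ending at each candidate
    back = []                   # back-pointers for each subsequent point
    for p in range(1, len(strength)):
        new_score, pointers = [], []
        for k in range(n_cand):
            prev = max(range(n_cand),
                       key=lambda j: score[j] - penalty * (k - j) ** 2)
            new_score.append(score[prev] - penalty * (k - prev) ** 2
                             + strength[p][k])
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final candidate.
    k = max(range(n_cand), key=lambda j: score[j])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path
```

With a zero penalty each point simply takes its strongest edge; increasing the penalty trades edge strength for a smoother contour, which is where the overfitting trade-off mentioned above arises. The narrower search window for constrained points could be imitated by giving out-of-window candidates a strength of negative infinity.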
We also modelled the linkage of the vertebral bodies with the pedicles, as the inferior
margin of the pedicle is continuous with the posterior inferior margin of the vertebral
body. However, pedicles above T10 were not so modelled as the angulation of these
pedicles results in their being poorly visualised on a lateral scan. Although the
pedicles are not of diagnostic interest, including the anchoring edges may help to
stabilise the AAM search by providing further structure in the model (Figure 6.3).
Since the posterior part of a vertebra is less prone to osteoporotic fracture, it was
thought that this anchoring of the posterior portion of the vertebrae would help to
stabilise the search, particularly with severe endplate or wedge fractures. Provisional
studies using only the dataset of Smyth et al [126] indicated a marginal improvement
in accuracy could be obtained by including the pedicles. Each vertebral contour uses
40 points around the vertebral body with 8 further points around the pedicles for
L4-T10 (see Figure 6.3), and 32 points per vertebra for T9-T7.
∗strictly the shape model of the sub-model determining the vertebra
Figure 6.3: The left hand image shows all model points around a lumbar vertebra, including pedicle connections. The middle shows the annotation of a T10 fracture, and on the right that of a severely fractured T9.
It can sometimes be rather ambiguous where to annotate fractured vertebrae. Some-
times there appear to be two or three edges: the remaining cortical rim, and a diffuse
lateral picture of the collapsed endplate. In such cases we attempted to consistently
place the vertebral contours on the outer edge of the endplate (but not the cortical
rim). As discussed above the variation in appearance should in principle be captured
as different texture modes in the appearance model.
6.3.3 Point correspondence
It would be possible to adjust the manually annotated shapes using the MDL ap-
proach [40] to further optimise the correspondences between the annotated points
across the training set images. We have simply retained the manually annotated
shapes for several reasons. Firstly, although there are some difficult cases, the box-
like structure of vertebrae generally means that there are some obvious landmarks
at the vertebral corners, connection with the pedicle, and typically also the endplate
mid-points as used in standard 6-point morphometry. We are not dealing with amorphous
3D blobs where establishing a set of point correspondences is a major problem.
Figure 6.4: Illustration of shape annotation of a fractured lumbar endplate. Although a remnant of cortical rim is still visible, the shape points are annotated on the more inner endplate border.
Figure 6.5: Illustration of shape annotation of a fractured lumbar endplate. Although a remnant of cortical rim is still visible, the shape points are annotated on the more inner endplate border.
Figure 6.6: Illustration of shape annotation of a fractured T12 endplate. Although a remnant of cortical rim is still visible, the shape points are annotated on the more inner endplate border.
Figure 6.7: Illustration of shape annotation of several fractured vertebrae. Although remnants of the vertebrae’s cortical rims are still visible, the shape points are annotated on the more inner endplate borders. Note the generally fuzzier edge structure of fractured vertebrae.
Secondly because we have always started with an existing shape model - originally
from the dataset of Smyth et al [126] - our annotation tool uses an existing model
to initialise points which have not been explicitly positioned, together with a small
degree of thin plate spline warping to force the model fit to the manually positioned
points. As discussed above this has been used in conjunction with local edge fitting.
So we have never been in the position of simply equi-positioning points along a curve,
as an existing shape model (gradually refined) has been used throughout. The model
was regularly rebuilt as the annotation proceeded, so that some degree of groupwise
correspondence has been implicitly induced.
Thirdly the MDL method adjusts point locations around an existing segmentation,
which is typically obtained by a manual process. Given the noisy nature of DXA data,
and the often multi-edged appearance of fractured vertebrae; together with other
complicating pathologies such as osteophytes on the anterior corners; we consider
that correspondence errors in the original segmentation are likely to exceed any errors
in point placement around the rim of the segmented shape. So the gain from MDL
may be lost in the other “noise” inherent in the original segmentation. Fourthly
in order to maximise the opportunity to train the models on as many fractured
vertebrae as possible, we have evaluated model fitting performance using a “leave-
N-out” train/test loop. In order to avoid bias, we would need to run the MDL
optimisation algorithm for each train/test split. This would be a costly computational
overhead, made worse by the fact that the original MDL optimisation algorithm was
designed for single part shapes; using multi-part shapes (or shapes not homotopic to
a circle/sphere) introduces an additional layer of combinatorial complexity. Since it
already takes a substantial amount of computation time (several hours) to build all
the Active Appearance Models†, we believe that the computational overhead would
not justify marginal improvements in the shape models.
Finally we are not really considering shape as the only criterion, as we are interested
in how well appearance models can be automatically fitted to unseen images. We
would really need to consider an MDL approach to the whole appearance model.
The development of this is still continuing. Although there has been recent progress
[29] in building fully automatic appearance models, we do not believe that this tech-
nology is sufficiently mature to be reliably used on difficult and noisy data like DXA
images. Also in [29] the correlation between shape and texture was not modelled, but
that is important in the form of (profile) appearance models that we use. Moreover
the pathological cases require some insight into the physical structures, whilst the
repetitive edge structure of vertebrae means that it would be easy for a completely
automatic process to fall into local minima involving fitting onto the converse edge
of a neighbouring vertebra - for example the superior edge of T12 might be assigned
to the inferior edge of T11, especially when there are diaphragm motion artefacts
present, or disc disease may obscure the inter-vertebral space.
We have examined our shape model modes, and not observed any unphysical char-
acteristics within allowed parameter ranges, such as might be caused by poor point
correspondence. Later in this chapter we also present results in shape model general-
isability, which appears to be adequate except for a small number of severe fractures.
Therefore we do not believe the use of the MDL method to be justified in this case.
†we use a 20 node cluster as it is
6.4 Initialisation
It is necessary to define how the starting solution for model searches is to be con-
structed. When the algorithm is run interactively in an associated prototype clinical
tool, the clinician initialises the solution by clicking on a number of points with the
mouse. We experimented with two methods of initialisation. In the first method
(adopted from Smyth [126]) the clinician clicks on the bottom, middle and top of
the spine. More specifically these points were the middle inferior of L4, middle supe-
rior of T12 and middle superior of T7. The least squares global shape model fit to
these 3 points is used as the starting solution.
It was noted that the clinician would essentially move the mouse pointer up the spine
counting vertebrae during the initialisation, so it would take only a little more time
to click instead on the approximate centre of each vertebra in turn. This
might provide a better initialisation, especially in cases where high kyphosis (often
accompanied by fractures) or scoliosis produces departures from the typical 3-point
prediction.
When using the 10-point initialisation (centre of each of 10 vertebrae) there are
some additional technical details. Firstly there is a slight complication in using the
underlying C++ shape model code, because the points used for initialisation are not
strictly speaking contained in the shape model itself. But essentially they are linear
combinations of other points, with an inherent additional variance (pointer error).
The centre $x^{(c)}_v$ of vertebra $v$ can be defined as
$$x^{(c)}_v = 0.5\,(x_{vs} + x_{vi}) + e \qquad (6.1)$$
where $x_{vs}$ is the superior midpoint of vertebra $v$ and $x_{vi}$ is its inferior midpoint, and $e$
is a random error, which we assume has zero mean and SD of 2mm in the y-direction
(double that of an edge point) and the same 3mm in the x-direction. Then the
covariance matrix of the extended vector of the original points plus these manually
placed centre-points can be deduced from this linear combination and the underlying
covariance matrix of the shape model. This extended covariance matrix can then in
turn be diagonalised to produce an extended shape model for initialisation. It is this
extended shape model which is used for the initialisation as follows.
1. Firstly this extended global shape model is used to obtain the model’s best fit
to the input set of vertebral centres. However, conditions such as scoliosis or
high kyphosis can cause the global model shape fit to be poor at particular
vertebrae.
2. So next the individual vertebrae are rigidly translated back to be centred on
their input positions, and then each sub-model is fitted to these translated
points in turn, by moving up the spine from L4. This gives a closer fit of
the centres to the user’s input, but can occasionally violate consistency. In
particular a vertebra could overlap with a neighbouring one.
3. There is a final correction phase, where if any overlaps are detected, the over-
lapping vertebrae are both reduced in height in proportion to the local extent
of their overlap, until they are separated. Points are moved along the local
normals (estimated from a Bezier spline) until they are separated by at least
1mm for thoracic vertebrae, and 2mm for lumbar vertebrae.
Two iterations around steps 2 and 3 are performed.
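
The construction of the extended shape model described before the numbered steps can be sketched as follows. This is a minimal NumPy sketch, not the C++ implementation referred to in the text: the function name, the flattened (x0, y0, x1, y1, ...) coordinate layout and the index arguments are assumptions made for illustration. It forms the centre points as linear combinations of the superior and inferior midpoints (equation 6.1), propagates the shape covariance through that linear map, adds the assumed pointer-error variance, and diagonalises the result.

```python
import numpy as np

def extended_shape_model(cov, sup_idx, inf_idx, sd_x=3.0, sd_y=2.0):
    """Extend a shape covariance with manually clicked vertebral centres.

    cov      : (2n, 2n) covariance of the shape vector (x0, y0, x1, y1, ...)
    sup_idx  : point indices of the superior midpoints, one per vertebra
    inf_idx  : point indices of the inferior midpoints, one per vertebra
    sd_x/y   : pointer-error SDs in mm (the values quoted in the text)
    """
    cov = np.asarray(cov, float)
    n2 = cov.shape[0]
    m = len(sup_idx)
    # Linear map M: shape vector -> centre coordinates, 0.5 * (sup + inf).
    M = np.zeros((2 * m, n2))
    for v, (s, i) in enumerate(zip(sup_idx, inf_idx)):
        for d in (0, 1):  # d = 0 for x, d = 1 for y
            M[2 * v + d, 2 * s + d] = 0.5
            M[2 * v + d, 2 * i + d] = 0.5
    A = np.vstack([np.eye(n2), M])       # extended vector = A @ shape (+ e)
    ext_cov = A @ cov @ A.T
    # Add independent pointer-error variance to the centre block only.
    ext_cov[n2:, n2:] += np.diag(np.tile([sd_x ** 2, sd_y ** 2], m))
    # Diagonalise: the eigenvectors are the modes of the extended model.
    evals, evecs = np.linalg.eigh(ext_cov)
    order = np.argsort(evals)[::-1]
    return ext_cov, evals[order], evecs[:, order]
```

In this sketch the pointer error is treated as independent between vertebrae and between the x and y directions, which is a simplifying assumption rather than something stated in the text.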
6.5 Experiments
6.5.1 Summary of optimal AAM determination
The overall optimisation problem is to select the best combination of sub-model
partitioning, fit sequencing and initialisation, and AAM form, together with various
associated parameters (e.g. profile length). It is not computationally tractable to
determine the overall best combination. So we have split the problem up into a
number of stages, at each stage determining the best approach and carrying it forward
to the next.
Since we already knew from our earlier work [113] that the sub-model triplets perform
well, we performed the initial experiments using vertebral triplets as the sub-model
structure, with the 3-point initialisation method. In this first stage we determined
the appearance model form to use, also addressing the question of whether a dynamic
ordering of the fitting (see section 5.3.3) performed better than a fixed static ordering
(from L3 to T8). For the dynamic method we used Nc = 3, with the initial candidate
list containing the sub-models corresponding to the 3 initialisation points, i.e. the
triplets centred on T8, T12 and L3.
Secondly we compared the two initialisation methods and picked the better for sub-
sequent use. When using 10-point initialisation we increased Nc to 5, with the initial
candidate list containing alternate sub-models starting with the top-most (i.e. triplets
centred on T8, T10, T12, L2 and then L3 as before). This was done because, although
it made sense with 3-point initialisation to start with a candidate set corresponding
to the more restricted set of initialisation points, with 10-point initialisation we have
as good initial information on every sub-model, and are constrained only by the
algorithm run-time. We subsequently refined certain appearance model parameters.
Thirdly we compared different methods of applying point constraints: a simplistic
diagonal matrix, or a full covariance matrix derived from bootstrapped estimates of
prediction errors.
Fourthly we compared different sub-model structures, also comparing to a single
global AAM. We were able to determine an optimal ‡ sub-model structure.
Finally we investigated the improvement that can be obtained with fractured ver-
tebrae by certain modifications to the sub-model sequencing algorithm (see section
5.3.3), using multiple initialisations of the sub-model shape.
In each set of experiments “miss-8-out” tests were performed over the dataset. In the
earlier experiments the 3-point user initialisation was emulated by using the known
equivalent marked points and adding random offsets to them. These were zero-mean
Gaussian errors with SD of 1mm in the y-direction (along the spine) and 3mm in the
x-direction. In later experiments using a 10 point initialisation on vertebral centres,
the centre point was estimated from the manually annotated data as the mean of
the two (superior and inferior) mid-points; and then similar Gaussian errors were
added, but with the SD in the y-direction increased to 2mm, as the centre is harder
to estimate than the location of an edge. Ten replications (i.e. random initialisations)
of each image were performed.
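
The emulation of user initialisation described above can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: the function names and the array layout (x in the first column, y along the spine in the second) are hypothetical, and only the noise SDs and the number of replications are taken from the text.

```python
import numpy as np

def emulate_clicks(points_mm, sd_x=3.0, sd_y=1.0, n_reps=10, seed=0):
    """Return (n_reps, n_points, 2) noisy copies of the annotated points.

    points_mm  : (n_points, 2) array of (x, y) positions in mm
    sd_x, sd_y : SDs of the zero-mean Gaussian offsets; use sd_y=2.0 when
                 the points are vertebral centres rather than edge points
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points_mm, dtype=float)
    offsets = rng.normal(0.0, [sd_x, sd_y], size=(n_reps,) + pts.shape)
    return pts[None, :, :] + offsets

def vertebra_centres(superior_mid, inferior_mid):
    """Centre estimate: mean of the superior and inferior midpoints."""
    sup = np.asarray(superior_mid, dtype=float)
    inf = np.asarray(inferior_mid, dtype=float)
    return 0.5 * (sup + inf)
```

For the 10-point case one would call `emulate_clicks(vertebra_centres(sup, inf), sd_y=2.0)`, giving ten randomly perturbed initialisations per image.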
The accuracy of the search was characterised by calculating the absolute point-to-line
distance error for each point on the vertebral body. This is computed by fitting a
smoothed Bezier spline to the annotated shape (which is treated as a gold standard),
‡strictly speaking we determined a close-to optimal
and then for each segmented point we calculate the closest distance to this spline
(within the same vertebra). We computed the mean, median, and 95th percentiles
of the pooled point error data. The number of points which failed to be fitted ad-
equately was assessed using a cut-off threshold of 2mm on the point-to-line error,
which would be 2.5 standard deviations of manual precision (c. 0.8mm) for patients
with osteoporosis [57]. As well as accuracy we also evaluated the precision, which
measures the self-consistency of differently initialised segmentations, rather than how
well they conform to a gold standard. Precision figures were also calculated as a mean
absolute point-to-curve error, but taking the reference curve to be the mean of the 10
segmentations from the 10 different initialisations. Also one degree of freedom must
be deducted so we divide the total error by 9 in this case to obtain the precision.
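
The accuracy and precision measures described above can be sketched as follows. This is a hedged illustration, not the evaluation code used for the reported results: the reference curve is approximated here by a densely sampled polyline rather than a smoothed Bezier spline, all function names are hypothetical, and the `precision` function is one reading of the divide-by-9 remark.

```python
import numpy as np

def point_to_polyline(p, curve):
    """Shortest distance from point p to a polyline given as (m, 2) vertices."""
    p = np.asarray(p, float)
    curve = np.asarray(curve, float)
    a, b = curve[:-1], curve[1:]
    ab = b - a
    denom = np.maximum((ab ** 2).sum(axis=1), 1e-12)
    # Parameter of the closest point on each segment, clamped to [0, 1].
    t = np.clip(((p - a) * ab).sum(axis=1) / denom, 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.sqrt(((p - closest) ** 2).sum(axis=1)).min()

def error_summary(errors_mm, threshold_mm=2.0):
    """Pooled statistics of absolute point-to-line errors."""
    e = np.asarray(errors_mm, float)
    return {
        "mean": e.mean(),
        "median": np.median(e),
        "p95": np.percentile(e, 95),
        "pct_over": 100.0 * (e > threshold_mm).mean(),
    }

def precision(abs_dev):
    """Mean absolute deviation from the mean segmentation.

    abs_dev : (n_reps, n_points) distances to the mean of the replications;
              one degree of freedom is deducted, so the divisor is n_reps - 1.
    """
    abs_dev = np.asarray(abs_dev, float)
    n_reps, n_points = abs_dev.shape
    return abs_dev.sum() / ((n_reps - 1) * n_points)
```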
A potential failure mode, particularly when fitting to fractured vertebrae, is for the
AAM to fit the top edge of a vertebra to the bottom of the vertebra above, or vice
versa (see e.g. Figure 6.8). This may happen when the initial solution is much
closer to the neighbour, because the fracture has collapsed the correct edge far from
the initial solution. We evaluated the number of cases in which this happened as
follows: if a point is more than 0.5 mm closer to the neighbour’s contour than to the
correct contour, and is within 2mm of the former, then this is flagged as a potential
misalignment. If at least 5 points along either the top or bottom 12 contour points
are thus misaligned, then the respective edge is counted as a misaligned edge. Note
that a vertebra can have two misaligned edges (i.e. the top has been fitted to the
vertebra above, and the bottom has been fitted to the vertebra below).
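
The misalignment rule above can be written compactly. The following is a minimal sketch assuming that the distances from each fitted point to the correct contour and to the neighbour's contour have already been computed; the function and argument names are hypothetical, while the thresholds are those given in the text.

```python
import numpy as np

def edge_misaligned(d_correct, d_neighbour, margin=0.5, near=2.0, min_points=5):
    """Apply the misalignment rule to one edge (its 12 top or bottom points).

    d_correct   : distances (mm) from each fitted point to the correct contour
    d_neighbour : distances (mm) from each fitted point to the neighbour's contour
    """
    d_correct = np.asarray(d_correct, float)
    d_neighbour = np.asarray(d_neighbour, float)
    # A point is flagged when it is more than `margin` closer to the
    # neighbour's contour and lies within `near` of that contour.
    flagged = (d_correct - d_neighbour > margin) & (d_neighbour < near)
    return int(flagged.sum()) >= min_points
```

The function would be called once for the top edge and once for the bottom edge of each vertebra, so a single vertebra can contribute up to two misaligned edges.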
6.5.2 AAM form used
6.5.2.1 Summary of AAM types
We experimented with several forms of AAM. The first distinction to be made is
between “classical” triangulated regional AAMs, which sample within the convex
hull of the shape; and profile AAMs, which (like the earlier ASM) sample along
normals to the current shape, defined at each shape model point. The profile AAMs
may have a larger convergence zone, as the profiles extend over some region outside
of the current shape, just as it has been found [25] that the ASM can have a larger
convergence zone than a regional AAM. On the other hand the regional AAM has
the advantage of consistently defining a shape-free patch, by warping the sampling

Figure 6.8: A misaligned AAM fit to T8. The superior edge has been fitted to the inferior edge of T7 (black arrows), whilst the anterior part of the inferior edge has been fitted onto the superior edge of T9 (white arrows). Only the 6 standard morphometric points are shown to avoid obscuring the vertebrae. Note that T7 has also been displaced, and the anterior points of T9 are affected.

region onto the mean shape via a set of triangulations (each of which can use an
affine transform). There would appear to be little useful information in the interior
of a vertebra, as the texture tends not to change much, or not consistently, within
a vertebral body. This is not like for example a human face, where possibly subtle
shading around the nose contains clues about where other parts of the face might
be. Given also the earlier success of the ASM-based work of Smyth et al [126], we
anticipated that profile AAMs would probably perform more accurately than regional
AAMs. We have experimented with the following types of AAM:
1. Profile AAMs sampling raw grey level
2. Profile AAMs sampling renormalised grey level
3. Profile AAMs sampling renormalised gradient (along profile)
4. Triangulated regional AAMs sampling raw grey level
5. Triangulated Regional AAMs sampling renormalised grey level
6. Triangulated Regional AAMs sampling renormalised 2D gradient
7. Triangulated Regional AAMs sampling corner/edge measures [123] derived
from the structure tensor ∇I∇IT §.
In these initial experiments a nominal profile length of 6mm (either side of the shape)
was used (at the highest resolution). A similar sampling scale was used with regional
AAMs, resulting in the use of 4000 pixels per vertebral triplet (at the highest resolu-
tion).
AAM search is usually performed using a pyramid of successively smoothed images,
with a commensurate reduction in resolution. We used a 2-level pyramid, starting
at half the full resolution, because the image features were not large enough to be
preserved beyond half the original resolution. Note that for profile AAMs this means
that the search starts off using a real profile distance double that of the original (i.e.
12mm), but sampling a smoothed (then sub-sampled) image at twice the final step-
size. This is also the same degree of image pyramid as that previously used by Smyth
et al [126].
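
The 2-level pyramid described above can be illustrated with a short sketch. The smoothing kernel is not specified in the text, so the 5-tap binomial kernel below is an assumption, as are the function names; the sketch only shows the smooth-then-subsample structure and the resulting doubling of the real profile distance at the coarse level.

```python
import numpy as np

def smooth_and_halve(image):
    """One pyramid step: separable binomial smoothing, then 2x sub-sampling."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # assumed smoothing kernel
    img = np.asarray(image, float)
    padded = np.pad(img, 2, mode="edge")
    # Convolve along rows, then columns ("valid" removes the padding again).
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    both = np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")
    return both[::2, ::2]

def build_pyramid(image, levels=2):
    """Return [full resolution, half resolution, ...]; search starts at the end."""
    pyramid = [np.asarray(image, float)]
    for _ in range(levels - 1):
        pyramid.append(smooth_and_halve(pyramid[-1]))
    return pyramid

def profile_half_length_mm(level, base_mm=6.0):
    """Real semi-profile length at a pyramid level (6mm at level 0, 12mm at level 1)."""
    return base_mm * 2 ** level
```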
§with the Cartesian image gradient ∇I as a column vector
6.5.2.2 Texture normalisation
There can be problems in using raw grey levels, where the brightness or contrast vary
considerably across the image, or where a few particularly bright outliers may wreak
havoc with the AAM’s underlying least squares error norm. Bosch et al [15] used
a renormalisation to transform the asymmetric pixel-intensity distribution of ultra-
sound cardiac images to a Gaussian. This gave significantly improved AAM matching
results. We noted in chapter 4 that Cootes had used a sigmoidal renormalisation of
image gradient [32], see equation 4.27. However the functional form used by Cootes
and also later by Scott et al [123] is not appropriate for simple grey levels, as the
background has a non-zero expectation value, and the mean is unrelated to the con-
trast scale. Therefore for renormalising grey levels we have instead used a mapping
based on the signed square root of the Geman-McClure robust error function [61],
which like a sigmoidal form applies a standard normalisation to (-1,1) whilst tending
to truncate any extreme values. This maps a sample value $g_i$ of pixel $i$ to $g'_i$ given
by:
$$g'_i = \frac{g_i - \mu}{\sqrt{(g_i - \mu)^2 + (\beta\sigma)^2}} \qquad (6.2)$$
where µ is the sample median and the scaling factor σ is a robust estimate of the
standard deviation of the background derived from the median absolute deviation
(MAD) about the median as σ = 1.4826MAD [76]. The multiplicative factor of
1.4826 arises because it can be shown that, assuming a Gaussian distribution, the
MAD is related to the standard deviation as:
$$\frac{\mathrm{MAD}}{\sigma} = \Phi^{-1}\!\left(\tfrac{3}{4}\right) \qquad (6.3)$$
where $\Phi$ is the cumulative distribution function of the standardised Gaussian distribution.
The multiplier $\beta$ is used because the Geman-McClure influence decreases beyond $\sigma/\sqrt{3}$,
and so it is common to use $\beta = \sqrt{3}$, to allow increasing influence out to one standard
deviation. We used this $\beta = \sqrt{3}$ setting, which normalises the absolute response to be
0.5 at one standard deviation.
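
The grey-level renormalisation of equation 6.2 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the thesis implementation: the function name and the small guard against a zero MAD are assumptions added here.

```python
import numpy as np

def renormalise_grey(samples, beta=np.sqrt(3.0)):
    """Map grey levels to (-1, 1) with the signed square root of the
    Geman-McClure function, centred on the sample median."""
    g = np.asarray(samples, float)
    mu = np.median(g)
    mad = np.median(np.abs(g - mu))
    sigma = 1.4826 * mad                   # robust SD estimate from the MAD
    scale2 = max((beta * sigma) ** 2, 1e-12)   # guard against MAD == 0
    return (g - mu) / np.sqrt((g - mu) ** 2 + scale2)
```

With $\beta = \sqrt{3}$, a sample lying exactly one robust standard deviation above the median maps to 0.5, and extreme outliers saturate towards ±1 rather than dominating the least squares fit.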
For the gradient models we used the sigmoidal form of equation 4.27, except that
for one-dimensional gradients along a profile the magnitude of the gradient vector is
replaced by a simple absolute value.
6.5.2.3 Point constraint form
In all initial experiments we assumed a diagonal form of the point constraint matrix
SX in equation 4.31. The constraint weights used were the same as were used in
our original work [113], and are the inverses of the variances associated with the
following error standard deviations (mm) used for the core, overlap and predicted
points respectively (see also section 5.3.5).
$\sigma_a = 1.0$ (6.4)
$\sigma_O = 2.5$ (6.5)
$\sigma_I = 5.0$ on the first iteration, then (6.6)
$\sigma_I = 3.5$ on subsequent iterations (6.7)
Note that the prediction standard deviation is reduced from $\sigma_I = 5.0$ to $\sigma_I = 3.5$
because after the first iteration the prediction will always be based on at least one
fully fitted neighbouring vertebra, not just the initial user click-points.
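
As an illustration of how the diagonal constraint weights follow from equations 6.4 to 6.7, the sketch below builds the diagonal of the inverse constraint covariance. The function name, the string labels for point types and the use of the same weight for the x and y coordinates are assumptions for illustration only.

```python
import numpy as np

# Error SDs in mm for core, overlap and predicted points (equations 6.4 to 6.7).
SD_MM = {"core": 1.0, "overlap": 2.5, "predicted_first": 5.0, "predicted_later": 3.5}

def constraint_weights(point_types, iteration):
    """Diagonal of the inverse point-constraint covariance (weights in mm^-2).

    point_types : sequence of "core", "overlap" or "predicted", one per point
    iteration   : 0 for the first iteration, larger values afterwards
    """
    weights = []
    for kind in point_types:
        if kind == "predicted":
            kind = "predicted_first" if iteration == 0 else "predicted_later"
        sd = SD_MM[kind]
        weights.extend([1.0 / sd ** 2] * 2)   # assumed equal for x and y
    return np.array(weights)
```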
6.5.2.4 AAM-form results and discussion
The accuracy results for the point-to-line error measure with the various AAM forms
are given in tables 6.1 to 6.7. Each row gives the mean, median and 95th percentile,
the percentage of errors in excess of 2mm, and the percentage of (superior or inferior)
edges erroneously fitted onto the converse edge of a neighbour. The tables give results
separated into normal and fractured vertebrae, with the latter further subdivided by
fracture grade.
It can be seen that using a dynamic ordering of sub-models generally improves the
accuracy. Profile AAMs outperform classical triangulated AAMs overall even when
using the more sophisticated “feature AAM” of Scott et al [123]. It appears that by
focussing the texture model on the most informative regions, and sampling further
outside the current shape, the AAM performance is improved. No doubt this also re-
flects the lack of interesting structure in the middle of vertebrae. However, comparing
the triangulated corner-edge AAM with a profile gradient AAM, although the latter
generally outperforms the former, the triangulated corner-edge AAM does give better
results with grade 3 fractures. This may be because the corner-edge AAM is somehow
more robust against local minima on the edges of neighbouring vertebrae. Nevertheless
we adopt a renormalised gradient profile AAM for subsequent optimisation, and
subsequently substantially improve its performance against severe fractures.
Referring to the results for the various regional AAMs, simply renormalising the
texture works almost as well as the sigmoidal gradient AAM. Both forms of renor-
malised feature (gradient and corner-edge) AAMs give similar results. The gradient
AAM may be marginally more accurate with normal vertebrae, but the corner-edge
AAM outperforms it against grade 3 fractures, probably because of a longer conver-
gence region due to the larger spread of the feature induced by the regional smoothing
of the structure tensor implicit in the corner-edge measures. It is not surprising that
the corner-edge AAM performs slightly worse in most “normal” cases, as using a
measure based on quadratic gradients is bound to boost the noise variance, and also
makes the variance a function of the gradient level. This tends to further boost the
variance of significant feature points (prior to normalisation), although it is difficult
to assess what effect the subsequent normalisation then has. Perhaps when using
feature AAMs it might be better to modify the AAM objective function to min-
imise the residual-variance normalised sum of squares, in order to compensate for the
variance-boosting effect of the feature extraction. But in difficult cases (i.e. severe
fractures) where the initialisation is not as good, it appears that the corner-edge
feature-extraction process may give a larger convergence zone. We have refrained
from combining both gradient and corner edge feature sampling, because these more
complex AAMs have a substantially longer compute time for both search and espe-
cially AAM training. In the absence of evidence that they offer much real gain over
simpler gradient profile models, we have continued with the latter.
The dynamic ordering method is significantly better. Results cannot be directly
compared using standard parametric tests such as the t-test, because multiple repli-
cations are not independent, and the errors are far from being normally distributed
due to long error tails caused by search failure. However for matched samples (same
image set and initialisations) we can perform hierarchical bootstrap resampling on
the differences between corresponding points, as demonstrated by Scott [122]. This
allows bootstrapped confidence intervals to be constructed for the mean difference
between two methods. If for example such a 98% confidence interval does not span
zero, then this can be interpreted as implying a significant difference at the 2% level.
Further details on this adaptation of hierarchical bootstrap confidence intervals are
given in [122].

Sequence   Vertebra    Mean    Median   95%-ile   % errors   % edges
form       status      (mm)    (mm)     (mm)      > 2mm      misaligned
Dynamic    Normal      0.95    0.58     1.86       8.8%       1.7%
Dynamic    Fractured   1.80    0.84     5.14      25.0%      12.8%
Dynamic    Grade 1     1.26    0.70     3.01      16.3%       8.1%
Dynamic    Grade 2     1.53    0.77     3.89      20.9%      10.0%
Dynamic    Grade 3     2.79    1.33     7.78      40.7%      30.0%
Static     Normal      1.17    0.62     2.58      13.0%       3.1%
Static     Fractured   2.12    0.93     6.08      29.0%      14.8%
Static     Grade 1     1.41    0.73     3.62      18.8%      10.0%
Static     Grade 2     1.98    0.85     5.40      26.3%      12.3%
Static     Grade 3     3.15    1.60     8.10      44.6%      32.8%
Table 6.1: Search error statistics (point-to-line) for 6mm profile gradient AAM

Sequence   Vertebra    Mean    Median   95%-ile   % errors   % edges
form       status      (mm)    (mm)     (mm)      > 2mm      misaligned
Dynamic    Normal      1.10    0.63     2.46      12.9%       4.00%
Dynamic    Fractured   1.90    0.97     5.26      27.6%      14.8%
Dynamic    Grade 1     1.42    0.81     3.72      20.2%       9.9%
Dynamic    Grade 2     1.72    0.88     4.39      23.6%      12.4%
Dynamic    Grade 3     2.68    1.50     7.24      41.6%      32.9%
Static     Normal      1.32    0.68     3.33      16.6%       5.3%
Static     Fractured   2.15    1.03     6.03      30.7%      15.2%
Static     Grade 1     1.51    0.81     4.05      22.1%       9.7%
Static     Grade 2     1.99    0.92     5.24      27.0%      13.8%
Static     Grade 3     3.14    1.71     7.97      45.9%      32.9%
Table 6.2: Search error statistics (point-to-line) for 6mm profile intensity AAM

Sequence   Vertebra    Mean    Median   95%-ile   % errors   % edges
form       status      (mm)    (mm)     (mm)      > 2mm      misaligned
Dynamic    Normal      1.06    0.61     2.27      11.8%       3.4%
Dynamic    Fractured   1.88    0.95     5.24      27.1%      14.2%
Dynamic    Grade 1     1.38    0.80     3.34      19.2%       8.9%
Dynamic    Grade 2     1.64    0.85     3.92      21.9%      11.1%
Dynamic    Grade 3     2.78    1.54     7.25      43.0%      33.2%
Static     Normal      1.30    0.67     3.24      16.2%       4.8%
Static     Fractured   2.15    1.03     5.90      30.0%      15.2%
Static     Grade 1     1.42    0.81     3.52      19.9%       8.6%
Static     Grade 2     2.03    0.96     5.14      27.1%      13.9%
Static     Grade 3     3.17    1.71     8.03      45.9%      34.2%
Table 6.3: Search error statistics (point-to-line) for 6mm profile renormalised intensity AAM

Sequence   Vertebra    Mean    Median   95%-ile   % errors   % edges
form       status      (mm)    (mm)     (mm)      > 2mm      misaligned
Dynamic    Normal      1.24    0.68     3.00      16.3%       4.6%
Dynamic    Fractured   2.20    1.12     5.97      33.5%      16.8%
Dynamic    Grade 1     1.61    0.89     4.23      24.7%      11.6%
Dynamic    Grade 2     2.00    1.01     5.32      30.33%     15.9%
Dynamic    Grade 3     3.15    1.85     8.12      48.0%      34.4%
Static     Normal      1.43    0.73     3.67      19.3%       5.1%
Static     Fractured   2.33    1.15     6.27      34.7%      17.6%
Static     Grade 1     1.68    0.92     4.31      26.3%      12.1%
Static     Grade 2     2.18    1.04     5.63      31.6%      15.6%
Static     Grade 3     3.32    1.91     8.18      48.8%      37.5%
Table 6.4: Search error statistics (point-to-line) for classical region intensity AAM

Sequence   Vertebra    Mean    Median   95%-ile   % errors   % edges
form       status      (mm)    (mm)     (mm)      > 2mm      misaligned
Dynamic    Normal      1.15    0.66     2.61      14.0%       3.7%
Dynamic    Fractured   2.01    1.03     5.42      30.5%      16.0%
Dynamic    Grade 1     1.45    0.82     3.81      22.3%       9.8%
Dynamic    Grade 2     1.80    0.95     4.49      26.5%      14.1%
Dynamic    Grade 3     2.96    1.70     7.59      45.5%      35.9%
Static     Normal      1.36    0.72     3.33      17.7%       4.5%
Static     Fractured   2.27    1.13     6.09      33.5%      17.4%
Static     Grade 1     1.61    0.91     4.16      24.4%      11.5%
Static     Grade 2     2.11    1.03     5.32      30.4%      15.4%
Static     Grade 3     3.28    1.88     7.99      48.3%      37.6%
Table 6.5: Search error statistics (point-to-line) for classical region intensity renormalised AAM

Sequence   Vertebra    Mean    Median   95%-ile   % errors   % edges
form       status      (mm)    (mm)     (mm)      > 2mm      misaligned
Dynamic    Normal      1.16    0.66     2.50      13.2%       3.2%
Dynamic    Fractured   1.96    0.93     5.61      28.2%      14.6%
Dynamic    Grade 1     1.41    0.75     3.69      18.9%       9.4%
Dynamic    Grade 2     1.68    0.86     4.50      24.3%      12.1%
Dynamic    Grade 3     2.99    1.57     7.64      44.2%      33.1%
Static     Normal      1.42    0.71     3.53      17.1%       4.0%
Static     Fractured   2.18    1.01     5.98      30.8%      16.2%
Static     Grade 1     1.59    0.81     4.43      22.5%      11.4%
Static     Grade 2     2.02    0.92     5.22      27.7%      14.3%
Static     Grade 3     3.09    1.58     7.82      44.7%      34.4%
Table 6.6: Search error statistics (point-to-line) for classical region intensity sigmoidal 2D gradient AAM

Sequence   Vertebra    Mean    Median   95%-ile   % errors   % edges
form       status      (mm)    (mm)     (mm)      > 2mm      misaligned
Dynamic    Normal      1.18    0.78     2.67      16.6%       4.2%
Dynamic    Fractured   1.74    1.06     4.15      28.7%      13.1%
Dynamic    Grade 1     1.43    0.89     3.32      21.5%       8.6%
Dynamic    Grade 2     1.71    1.10     3.96      30.2%      15.2%
Dynamic    Grade 3     2.14    1.31     5.24      33.5%      24.0%
Static     Normal      1.20    0.78     2.73      17.0%       4.4%
Static     Fractured   1.89    1.14     4.57      31.0%      13.7%
Static     Grade 1     1.41    0.87     3.27      21.4%       6.6%
Static     Grade 2     1.98    1.26     4.58      33.8%      15.6%
Static     Grade 3     2.36    1.44     5.97      39.0%      28.5%
Table 6.7: Search error statistics (point-to-line) for region corner feature AAM

The symmetric (in probability) bootstrapped confidence interval for the difference be-
tween static and dynamic fitting sequences for gradient profile models is [0.155,0.279].
This implies a significant difference at the 2% level, as zero is not spanned by the
interval. Similarly for a triangulated regional AAM with renormalised gradient sam-
pling, the corresponding confidence interval is [0.138,0.306], so again the dynamic
sequencing results in a statistically significant improvement at the 2% level.
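
The kind of interval quoted above can be illustrated with a generic two-level bootstrap. The sketch below is an assumption-laden illustration and does not reproduce the exact scheme of Scott [122]: it resamples images with replacement, then replications within each selected image, and reports an equal-tailed percentile interval for the mean paired difference. The function name and the data layout are hypothetical.

```python
import numpy as np

def hierarchical_bootstrap_ci(diffs, n_boot=2000, level=0.98, seed=0):
    """Equal-tailed bootstrap CI for the mean paired difference.

    diffs : list with one 2-D array per image, shape (n_replications, n_points),
            holding error(method A) - error(method B) for matched points.
    """
    rng = np.random.default_rng(seed)
    n_img = len(diffs)
    means = np.empty(n_boot)
    for b in range(n_boot):
        total, count = 0.0, 0
        for i in rng.integers(0, n_img, size=n_img):      # resample images
            d = diffs[i]
            reps = rng.integers(0, d.shape[0], size=d.shape[0])  # then replications
            total += d[reps].sum()
            count += d[reps].size
        means[b] = total / count
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(means, [tail, 100.0 - tail])
    return lo, hi
```

If the returned interval excludes zero, the difference would be read as significant at the 2% level, in the sense used in the text.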
Comparing now a classical regional AAM with renormalised 2-D gradient to the pro-
file AAM (with renormalised 1-D gradient), we obtain a bootstrapped 98% confidence
interval for the mean difference between the two AAMs (both with dynamic sequenc-
ing) of [0.149,0.259]. The corresponding confidence interval is [0.150,0.292] for the
difference between a regional feature AAM (corner-edge) and this same profile AAM.
Therefore we conclude that profile AAMs outperform triangulated region AAMs in
this application, probably because the convergence zone is increased by using a profile
which samples outside the current shape, whilst there is little useful information
inside the vertebra. However there is some suggestion that the
corner-edge feature AAM may perform better on severe fractures. Later we find other
means of improving the performance of the profile AAM in the severe fracture cases.
6.5.3 Initialisation Method and AAM profile length
Having selected profile gradient models as the AAM form to use, we then compared
the accuracy results for 3-point and 10-point initialisation methods. Only the dy-
namic sequencing method is used in these later experiments. Comparing the results
for 10-point initialisation given in table 6.8 with those already given for 3-point ini-
tialisation in table 6.1, it can be seen that using the full 10 points can improve the
accuracy, particularly for the fractured vertebrae. The 98% bootstrapped confidence
interval for the difference in mean error between 3-point and 10-point initialisation
methods is [0.149,0.281]. For fractured vertebrae only, the confidence interval is
[0.514,0.829]. The difference is therefore statistically significant, especially for frac-
tured vertebrae. Therefore the 10-point initialisation method is used subsequently.
We also evaluated several sampling profile lengths, and refined the sampling step size
by scaling relative to the mean vertebral size. We continued with a nominal 1mm for
the lower lumbar vertebrae (L2-L4), but other vertebrae have the step length reduced
in approximately the ratio of their mean mid-point height to that of L2. This results
in a nominal sampling step length of 0.75mm at T7. We varied the semi-profile length in
steps of 2, using 6, 8 and 10 steps. Tables 6.8 to 6.10 give the results. There
is a small but significant reduction in mean error in moving from a profile length of
6 to 8 steps - the 98% confidence interval for the reduction is [0.003,0.033] overall,
and [0.051,0.362] on severely fractured vertebrae. There is no statistically significant
difference (at the 2% level) in overall mean error between an 8 step and a 10 step
profile. Also using 10 steps tends to mean that once a vertebra is a grade 2 fracture
or above, the inner samples from opposite sides (inferior and superior edges) tend to
intersect, so we start to introduce spurious redundancy into the model.
We selected 8 steps as the profile length to use.
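
The scaling of the sampling step with vertebral size described above can be sketched as follows. The function names are hypothetical and the mean heights in the usage note are placeholder values chosen only to reproduce the 0.75 ratio quoted for T7; they are not measurements from the thesis.

```python
def step_lengths(mean_heights_mm, base_step_mm=1.0, reference="L2",
                 lower_lumbar=("L2", "L3", "L4")):
    """Per-vertebra sampling step: the nominal step for L2 to L4, otherwise
    scaled by the ratio of mean mid-point height to that of the reference."""
    ref = mean_heights_mm[reference]
    steps = {}
    for name, height in mean_heights_mm.items():
        if name in lower_lumbar:
            steps[name] = base_step_mm
        else:
            steps[name] = base_step_mm * height / ref
    return steps

def semi_profile_mm(n_steps, step_mm):
    """Real length of a semi-profile of n_steps samples either side of the shape."""
    return n_steps * step_mm
```

For example, with placeholder heights `{"T7": 21.0, "L2": 28.0, "L4": 30.0}` the step at T7 is 0.75mm, so an 8 step semi-profile spans 6mm at T7 and 8mm at L2.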
6.5.4 Point constraint form
As discussed in section 5.3.5 in chapter 5, we had originally treated point constraints
as independent, i.e. the matrix SX in equation 4.31 is diagonal. We now compare
this method to using a full covariance matrix derived as described in 5.3.5 from a
bootstrapped estimate of the overlap point prediction errors. Results for the same
set of miss-8-out experiments are presented in table 6.11.

Vertebra    Mean    Median   95%-ile   % errors   % edges
status      (mm)    (mm)     (mm)      > 2mm      misaligned
Normal      0.74    0.56     1.58       5.1%       0.3%
Fractured   1.15    0.70     2.63      14.8%       5.9%
Grade 1     0.85    0.60     1.78       8.1%       2.8%
Grade 2     1.01    0.66     2.25      12.3%       4.6%
Grade 3     1.69    0.93     4.57      26.2%      15.1%
Table 6.8: Search error statistics (point-to-line) for 6 step profile gradient AAM

Vertebra    Mean    Median   95%-ile   % errors   % edges
status      (mm)    (mm)     (mm)      > 2mm      misaligned
Normal      0.72    0.55     1.54       4.7%       0.2%
Fractured   1.08    0.68     2.41      13.4%       5.2%
Grade 1     0.80    0.58     1.73       6.8%       3.2%
Grade 2     1.00    0.67     2.26      12.4%       4.3%
Grade 3     1.53    0.84     4.12      22.5%      12.9%
Table 6.9: Search error statistics (point-to-line) for 8 step profile gradient AAM

Vertebra    Mean    Median   95%-ile   % errors   % edges
status      (mm)    (mm)     (mm)      > 2mm      misaligned
Normal      0.75    0.57     1.62       5.4%       0.2%
Fractured   1.04    0.67     2.34      12.8%       4.6%
Grade 1     0.81    0.57     1.76       7.5%       2.7%
Grade 2     0.97    0.65     2.17      11.3%       3.7%
Grade 3     1.41    0.83     3.55      20.9%      10.8%
Table 6.10: Search error statistics (point-to-line) for 10 step profile gradient AAM

Vertebra    Mean    Median   95%-ile   % errors   % edges
status      (mm)    (mm)     (mm)      > 2mm      misaligned
Normal      0.77    0.56     1.61       5.8%       0.5%
Fractured   1.15    0.71     2.68      15.4%       6.8%
Grade 1     0.84    0.61     1.76       7.6%       3.9%
Grade 2     1.02    0.70     2.39      13.0%       5.2%
Grade 3     1.71    0.93     4.72      27.5%      16.3%
Table 6.11: Search error statistics using full covariance matrix for point constraints

Comparing to the results in table 6.9 it appears that there is no advantage in using the
more complex model of point covariances. There is a small deterioration in mean
accuracy of around 0.05mm arising from using the more complex non-diagonal form
of SX , rising to 0.2mm on grade 3 fractures. Of course even this non-diagonal form of
SX is somewhat ad hoc, as the true error covariance is unknown, and has a circular
dependence on the constraint form assumed. One problem may be that even with
a non-diagonal form of SX , the AAM formulation assumes that the residual errors
are themselves independent, so equation 4.31 has a diagonal form of texture error
covariance matrix. But, as we noted in section 5.3.6 in chapter 5, the actual dis-
tribution of the normalised residual sum of squares is far from being χ2 distributed.
There is a large departure in both mean and variance from that expected. We speculated
that under-training and deficiencies in the AAM update matrix R will introduce
spatial correlation between nearby residuals. Perhaps there is no point in using more
complex models for SX , whilst using a simple diagonal form of the texture residuals’
Mahalanobis distance. Also as the actual residual variance is underestimated, there
is always a problem in using the correct commensurate scale between texture error
and point spatial constraints.
It seems that given the more fundamental underlying problem of how to formulate
an AAM objective function given the unknown residual distribution, the simpler
diagonal constraint form of equation 4.33 works well, as to some extent the errors in
both diagonal assumptions cancel each other out.
In other applications which use our sub-model combination approach it might be
more worthwhile to use a full non-diagonal point covariance matrix in the manner we
have developed, especially if there is less (or no) overlap between the sub-models, and
the constraints arise only from predictions through the global shape model. However
in such case some further rescaling of the texture error residual sum of squares, such
as we introduced in section 5.3.6, should be trialled, in order to partially compensate
for ignoring “model noise” and residual spatial correlation.
It might also be possible to slightly improve our current results by attempting to
optimise the constraint weights used, especially as we know that the assumed residual
variances will be incorrect. However there is also the danger of over-tuning parameters
to a particular dataset, so we have not attempted to do this. We have continued to
use the initial point constraint weights of [113] (see also section 6.5.2.3).
6.5.5 Optimisation of the sub-model structure
Having determined a reasonably good form of triplet profile model with 10-point ini-
tialisation, we then experimented with using this model form with other sub-model
structures. We evaluated sub-models using single vertebrae, a central vertebra plus
neighbouring semi-vertebrae (semi-triplets), vertebral triplets as before, quintets (five
vertebrae), and once again a full global model. Note that when using quintet struc-
tures we continued to use a triplet at both extremes of the spine (T7-T9 and L2-L4).
Tables 6.12 to 6.14 present the point-to-line error statistics for the three alternative
evaluated sub-model structures, and 6.15 shows similar results for using a single
global AAM. Note that the comparable results for triplet sub-models have already
been given in table 6.9. Single vertebrae sub-models do not appear to be reliable,
especially with fractured vertebrae. One problem with them is the rather high pro-
portion of cases in which a vertebral edge is confused with the converse edge of a
neighbour. It seems that not including any information about the neighbours in the
sub-model does not allow the search process to distinguish between the correct edge,
and a local minimum on the converse edge of a neighbour. Once we reach the semi-
triplet structure, there is a substantial increase in accuracy, especially for fractured
vertebrae. By including some information about neighbouring vertebrae the models
and the search process are better constrained, and better able to distinguish between
the correct vertebrae and erroneous solutions on neighbours. There is a further small
improvement moving up to triplets, and again with the quintets. In the latter case
the main advantage appears to be with the severe (grade 3) fractures, although the
relatively small numbers of vertebrae involved means it is difficult to be sure. The
98% confidence interval for the reduction in overall mean error moving from triplet to
quintet structures is [-0.003,0.027]. So the overall difference is not significant at the
2% level, but if we look at the difference only for fractured vertebrae the confidence
interval changes to [0.001,0.103], and so there appears to be a marginally significant
improvement.
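The confidence intervals quoted above can be illustrated with a paired percentile bootstrap. This is only a sketch under stated assumptions: the resampling unit (here one paired error value per sample), the number of bootstrap replicates, and the function names are hypothetical and are not taken from the experiments reported here.

```python
import numpy as np


def bootstrap_ci_mean_reduction(err_a, err_b, level=0.98, n_boot=2000, seed=0):
    """Percentile bootstrap CI for mean(err_a - err_b) over paired samples.

    err_a, err_b : errors (mm) for the same samples under two model structures
                   (assumed pairing; e.g. triplet and quintet search errors).
    """
    err_a = np.asarray(err_a, dtype=float)
    err_b = np.asarray(err_b, dtype=float)
    if err_a.shape != err_b.shape:
        raise ValueError("paired error arrays must have the same shape")
    diffs = err_a - err_b
    rng = np.random.default_rng(seed)
    # Resample the paired differences with replacement, n_boot times.
    idx = rng.integers(0, diffs.size, size=(n_boot, diffs.size))
    boot_means = diffs[idx].mean(axis=1)
    alpha = 1.0 - level
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def is_significant(ci):
    """A reduction is significant at the matching level if the CI excludes 0."""
    lo, hi = ci
    return lo > 0.0 or hi < 0.0
```

Applied to the intervals quoted in the text, `is_significant((-0.003, 0.027))` is false (overall difference) while `is_significant((0.001, 0.103))` is true (fractured vertebrae only), which matches the interpretation given above.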
If we continue to the largest possible structure and compare the better-performing
sub-model structures with the global AAM, we see that use of a single global AAM
gives substantially poorer results for fractured vertebrae. Comparing results for nor-
mals, the global model is a little worse than triplets or quintets, but this difference
is much smaller than we found with earlier results on a smaller (78 vertebrae) train-
ing set [113], for which the mean error on normal vertebrae reduced from 1.28mm
(global AAM) to 0.88mm (triplets). It appears that when the training set is suf-
ficiently expanded much of the advantage of the sub-model approach disappears in
largely “normal” cases. However even with a much-expanded training set the global
AAM is not able to cope with fractures. Even on grade 1 fractures we obtain rather
poor accuracy. This is not just a matter of shape-model undertraining. Tables 6.16
to 6.18 compare the underlying shape model re-fitting accuracies for triplets, quin-
tets, and a global shape model. These are obtained by randomly selecting 16 images
as a test set, and then training shape and appearance models on the remaining data.
The shape models are then refitted to the manually segmented points ¶, and residual
point-to-line errors calculated. The global shape model is less accurate, with around
double the mean errors of the triplet sub-models. There is a small deterioration in
the intrinsic shape model accuracy from triplets to quintets, but this is not reflected
in the AAM search errors, which are dominated instead by errors in locating the
correct solution.
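The error statistics reported in the tables of this section can be sketched as follows. This is a minimal illustration, assuming that the point-to-line error is the shortest distance from a fitted point to the piecewise-linear curve through the manually annotated points; the function names and the 2mm default threshold argument are illustrative only.

```python
import numpy as np


def point_to_polyline_distance(p, curve):
    """Shortest distance from point p to a polyline given as an (n, 2) array."""
    p = np.asarray(p, dtype=float)
    curve = np.asarray(curve, dtype=float)
    a, b = curve[:-1], curve[1:]
    ab = b - a
    seg_len2 = np.einsum("ij,ij->i", ab, ab)
    # Parameter of the closest point on each segment, clamped to [0, 1].
    t = np.einsum("ij,ij->i", p - a, ab) / np.where(seg_len2 > 0, seg_len2, 1.0)
    t = np.clip(t, 0.0, 1.0)
    closest = a + t[:, None] * ab
    return float(np.min(np.linalg.norm(closest - p, axis=1)))


def summarise_errors(errors_mm, threshold_mm=2.0):
    """Mean, median, 95th percentile and percentage of errors over a threshold."""
    e = np.asarray(errors_mm, dtype=float)
    return {
        "mean": float(e.mean()),
        "median": float(np.median(e)),
        "p95": float(np.percentile(e, 95)),
        "pct_over": float(100.0 * np.mean(e > threshold_mm)),
    }
```

Reporting the median and the tail statistics alongside the mean matters here because a small number of search failures can inflate the mean considerably.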
Although the underlying global shape model is less accurate, it is clear that the global
AAM fitting errors are larger than can be explained by the shape model deficiencies
alone. For example the mean segmentation error with grade 2 fractures is 1.4mm,
compared to 0.7mm in the intrinsic shape model accuracy. With the global model
there are 18.5% of points with segmentation errors beyond 2mm, whereas only 4.9%
of points would have this degree of error due to shape model undertraining alone.
It may be that the global AAM search is fundamentally biased towards the prior
of the shape models - i.e. the mean (largely normal) shapes - and cannot cope
well with pathologies. This may partly be due to inadequacies in the AAM update
matrix, and partly due to other difficulties with global texture, such as the problem of
normalisation across an image with varying brightness/contrast‖. For example Figure
6.9 shows an image with a huge difference in brightness between the thoracic and
lumbar vertebrae. The thoracic vertebrae are invisible on the original image, though
can be just seen with locally enhanced contrast. Using sub-models makes this kind
of contrast adjustment more adaptable, and means that the sampled renormalised
structure is more likely to fit a linear model.
¶ Strictly speaking the appearance model constraints are applied in addition to the shape constraints.
‖ See also Figure 4.4, which shows that a major mode of the global APM is brightness variation across the image.
Figure 6.9: There is a large variation in brightness across the original (leftmost) image. Local contrast enhancement (right) is necessary to reveal the mid-thoracic vertebrae.
            Search Error Statistic
Vertebra    Mean   Median   95%-ile   % errors   % edges
Status      (mm)   (mm)     (mm)      >2mm       misaligned
Normal      0.90   0.63     1.90       8.9%       1.2%
Fractured   1.75   0.94     4.66      25.3%      13.9%
Grade 1     1.16   0.73     2.48      14.0%       5.9%
Grade 2     1.43   0.88     3.44      21.5%      11.3%
Grade 3     2.85   1.58     7.64      43.6%      35.4%
Table 6.12: Search error statistics (point-to-line) for single vertebra sub-models
            Search Error Statistic
Vertebra    Mean   Median   95%-ile   % errors   % edges
Status      (mm)   (mm)     (mm)      >2mm       misaligned
Normal      0.77   0.58     1.60       5.2%       0.4%
Fractured   1.18   0.73     2.84      16.1%       6.5%
Grade 1     0.90   0.63     1.96       9.5%       3.2%
Grade 2     1.06   0.69     2.42      13.8%       5.9%
Grade 3     1.63   0.93     4.27      25.5%      14.2%
Table 6.13: Search error statistics (point-to-line) for semi-triplet sub-models
            Search Error Statistic
Vertebra    Mean   Median   95%-ile   % errors   % edges
Status      (mm)   (mm)     (mm)      >2mm       misaligned
Normal      0.71   0.54     1.51       4.1%       0.1%
Fractured   1.02   0.66     2.27      12.4%       4.5%
Grade 1     0.76   0.59     1.60       5.4%       2.8%
Grade 2     1.00   0.66     2.28      12.6%       4.4%
Grade 3     1.35   0.80     3.31      20.3%       9.4%
Table 6.14: Search error statistics (point-to-line) for quintet sub-models
            Search Error Statistic
Vertebra    Mean   Median   95%-ile   % errors   % edges
Status      (mm)   (mm)     (mm)      >2mm       misaligned
Normal      0.80   0.57     1.65       6.2%       1.0%
Fractured   1.55   0.76     4.36      20.1%       9.9%
Grade 1     1.15   0.63     2.39      12.5%       6.6%
Grade 2     1.41   0.75     3.57      18.5%       8.8%
Grade 3     2.20   0.99     6.76      31.3%      21.2%
Table 6.15: Search error statistics (point-to-line) for single global model
            Search Error Statistic
Vertebra    Mean   Median   95%-ile   % errors   % errors
Status      (mm)   (mm)     (mm)      >1mm       >2mm
Normal      0.20   0.15     0.56       0.4%       0.0%
Grade 1     0.24   0.18     0.67       0.9%       0.0%
Grade 2     0.34   0.24     0.96       4.4%       0.5%
Grade 3     0.49   0.34     1.39      12.3%       1.1%
Table 6.16: Shape Model Intrinsic Accuracy for Triplet sub-models
            Search Error Statistic
Vertebra    Mean   Median   95%-ile   % errors   % errors
Status      (mm)   (mm)     (mm)      >1mm       >2mm
Normal      0.26   0.20     0.70       1.2%       0.0%
Grade 1     0.34   0.25     0.97       1.1%       0.0%
Grade 2     0.37   0.27     1.03       4.6%       0.5%
Grade 3     0.54   0.40     1.58      14.5%       2.4%
Table 6.17: Shape Model Intrinsic Accuracy for Quintet sub-models
            Search Error Statistic
Vertebra    Mean   Median   95%-ile   % errors   % errors
Status      (mm)   (mm)     (mm)      >1mm       >2mm
Normal      0.43   0.34     1.16       7.9%       0.5%
Grade 1     0.56   0.43     1.47      15.6%       1.4%
Grade 2     0.69   0.51     1.98      21.9%       4.9%
Grade 3     0.87   0.69     2.35      32.9%       8.5%
Table 6.18: Shape Model intrinsic accuracy for a single global model
Figure 6.10: Mean point-to-line errors (mm) by vertebral fracture grade, comparing quintet sub-model AAMs to a global AAM. The associated bootstrapped 98% confidence intervals (quintets first, global second) are as follows. Normal vertebrae: [0.693,0.737] vs [0.777,0.834]; grade 1: [0.690,0.836] vs [0.910,1.46]; grade 2: [0.875,1.15] vs [1.17,1.71]; grade 3: [1.17,1.63] vs [1.79,2.56].
6.5.6 Multiple Initialisations for Fractured Vertebrae
We have already noted that an AAM may fail to converge on a severely fractured
vertebra, because instead it locates a local minimum by fitting to the converse edge
of a neighbour(s). Although this occurs less with the best AAM form, quintet sub-
models, and 10-point initialisation, there remains the possibility of edge confusion.
These local optima may be more likely when a severe fracture has collapsed the central
vertebra so far from the AAM’s initial solution that the neighbouring edges are more
likely to be within the starting solution’s sampling profile. Another possibility is that
the true solution is not reachable using the AAM update matrix because it is too
far from the displacement set that lies in the linear region validly modelled by the
trained Jacobian. We therefore investigated the use of two alternative initialisations,
which were chosen to represent moderately and severely fractured vertebrae.
To do this, we extended the sub-model combination method by adding two alternative
starting solutions for each sub-model in the (size Nc) candidate list. At each iteration
two alternative fractured variants are generated (see next paragraph) for the central
vertebra in each candidate sub-model. Each of these 3Nc AAM searches is now
tentatively run. The best fitting of all these 3Nc searches is selected as the solution
to impose at this iteration, and its submodel is removed from the list of current
candidates. In other respects the method proceeds as in chapter 5, section 5.3.4. So
having picked the best of the 3Nc candidates, the points of the central vertebra of the
sub-model are saved in an overall solution vector, as are points in either neighbour
which have not been previously determined. Likewise the central points now have
high constraint weights attached, to ensure consistency for subsequent searches with
neighbouring sub-models. Finally a new candidate sub-model is now added as before
(if possible), and this completes an iteration of the fitting sequence. As noted above
in section 6.5.1, we use Nc = 5, so in this variation we run 15 AAM searches during
the initial iterations. Obviously there is a penalty in computation time. On an
Intel Centrino laptop the computation time increases from around 20 seconds to one
minute, though this could be reduced on multi-core processors by running parallel
execution threads.
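One iteration of the selection step described above can be sketched as follows. This is a hedged illustration only: `make_initialisations` and `run_search` are hypothetical callables standing in for the generation of starting solutions and for a constrained AAM search, and the scalar fit cost (lower is better) is an assumed stand-in for whatever fit quality measure is used to rank the searches.

```python
from concurrent.futures import ThreadPoolExecutor


def select_best_search(candidates, make_initialisations, run_search, max_workers=4):
    """Run one search per (candidate, initialisation) pair and pick the best.

    candidates           : list of sub-model identifiers (length Nc)
    make_initialisations : hypothetical callable, candidate -> list of starts
                           (e.g. current solution plus two fractured variants)
    run_search           : hypothetical callable, (candidate, start) ->
                           (fit_cost, solution); lower cost means a better fit
    """
    jobs = [(c, s) for c in candidates for s in make_initialisations(c)]
    # The searches are independent, so they can be dispatched concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda job: run_search(*job), jobs))
    best_index = min(range(len(jobs)), key=lambda i: results[i][0])
    best_candidate, best_start = jobs[best_index]
    best_cost, best_solution = results[best_index]
    # The winning sub-model leaves the candidate list for later iterations.
    remaining = [c for c in candidates if c != best_candidate]
    return best_candidate, best_start, best_cost, best_solution, remaining
```

With Nc = 5 candidates and three starting solutions each, this dispatches 15 searches per iteration. In CPython a thread pool only gives a real speed-up if the underlying search releases the interpreter lock (for example in native code); otherwise a process pool would be the closer analogue of the parallel execution suggested in the text.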
The alternative fractured initialisations of the central vertebra are generated thus.
Firstly a lower bound is calculated for the posterior height. The McCloskey prediction
method [95] (see also section 3.3.2 in chapter 3) is used to calculate a predicted posterior
height $H_p^{(\mathrm{pred})}$ from the current solutions for the 4 nearest neighbours. The lower
bound is then set to

$$H_p^{(\min)} = 0.75\,H_p^{(\mathrm{pred})} \qquad (6.8)$$

Next the posterior height $H_p$ of the sub-model's central vertebra is reduced by 15%,
to allow for a potential misfit onto both neighbours in the worst case, and then
reduced by a further factor $r_p^{(g)}$ depending on the required fracture grade $g$, but subject
to the lower bound above.
So given the current posterior height $H_p$ the new height is set to

$$H_p' = \max\left(0.85\,r_p^{(g)} H_p,\; H_p^{(\min)}\right) \qquad (6.9)$$

Then the mid-height and anterior height are reduced using standard factors $r_m^{(g)}$ and
$r_a^{(g)}$ respectively. So

$$H_m' = r_m^{(g)} H_p' \qquad H_a' = r_a^{(g)} H_p' \qquad (6.10)$$

We used the following reduction factors

$$r_p^{(2)} = 0.925 \qquad r_m^{(2)} = 0.7 \qquad r_a^{(2)} = 0.825 \qquad (6.11)$$

$$r_p^{(3)} = 0.8 \qquad r_m^{(3)} = 0.55 \qquad r_a^{(3)} = 0.7 \qquad (6.12)$$
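The height adjustment of equations (6.8) to (6.12) can be written out as a short worked example. The function and variable names are illustrative; the numerical factors are those given in equations (6.11) and (6.12), and the input heights in the usage note are made-up values chosen only to show both branches of the maximum.

```python
# Reduction factors from equations (6.11) and (6.12), keyed by fracture grade.
REDUCTION_FACTORS = {
    2: {"rp": 0.925, "rm": 0.7, "ra": 0.825},
    3: {"rp": 0.8, "rm": 0.55, "ra": 0.7},
}


def fractured_heights(hp, hp_pred, grade):
    """Return (posterior, middle, anterior) heights for a fractured start.

    hp      : current posterior height of the central vertebra
    hp_pred : predicted posterior height from the neighbouring vertebrae
    grade   : target fracture grade (2 or 3)
    """
    r = REDUCTION_FACTORS[grade]
    hp_min = 0.75 * hp_pred                      # lower bound, eq. (6.8)
    hp_new = max(0.85 * r["rp"] * hp, hp_min)    # eq. (6.9)
    hm_new = r["rm"] * hp_new                    # eq. (6.10)
    ha_new = r["ra"] * hp_new                    # eq. (6.10)
    return hp_new, hm_new, ha_new
```

For example, with a current and predicted posterior height of 20mm, the grade 2 variant gives a posterior height of 0.85 x 0.925 x 20 = 15.725mm, whereas for grade 3 the unconstrained value of 13.6mm falls below the lower bound and is therefore replaced by 15mm.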
Having calculated the desired new heights, the posterior, middle and anterior refer-
ence points are moved equidistantly along their current separation vectors to achieve
the required respective heights (as their new separation distance). This fixes the
corners and mid-points in the central vertebra (i.e. the standard 6 morphometric
points). The corners and mid-points in neighbouring vertebrae are temporarily fixed
at their current values, and the alternative solution (for the central vertebra of the
triplet) is then initialised to the best sub-model shape model fit to these (18) fixed
points. Note that the other vertebrae in the sub-model do not have any adjustment
made, although their standard reference points are temporarily set to be fixed for
the purpose of estimating the initialisation of the central vertebra. Note also that
the constraint weights associated with the alternative initialisation are always reset
to the low values of a prediction, even if the vertebra had already been provisionally
located as part of another sub-model’s search.
6.5.6.1 Results for multiple initialisations
The results of using the alternative initialisations are given in table 6.19, for both
triplet and quintet sub-structures. Interestingly when using the alternative fractured
initialisations the results for triplets and quintets are also much closer to each other
than when running with a single initialisation. There may be a small improvement,
particularly for the triplets. Any apparent improvement occurs mainly for the grade
3 fractures, with triplet sub-models. Here the mean error is reduced from 1.53mm
to 1.31mm and the percentage of edges misfit is reduced from 12.9% to 9.1%. The
improvement is less than that we reported earlier in [116], mainly because further
optimisation of the AAM (and perhaps also the inclusion of more fractured vertebrae
in the training set), had already improved the results for grade 3 fractures, even
without the alternative initialisation. The 98% confidence interval for the reduction
in mean error on grade 3 fractures is [0.061,0.322], and so this is significant at the
2% level. There is less apparent improvement when using quintets, though the mean
error for grade 3 fractures does reduce from 1.35mm to 1.28mm. The 98% confidence
interval for the mean reduction (on grade 3 fractures) is [-0.055,0.256], and so fails
to be statistically significant at the 2% level.
We conclude that when using quintet structures the increase in computational effort
in using multiple initialisations is not worthwhile, as the method is already very
reliable with only a single initialisation.
6.5.7 Discussion
6.5.7.1 Summary of AAM optimisation
The dynamic ordering of the sub-model fit improves the robustness and accuracy of
the search. The main improvements are in the tails of the error distributions, and
in the more difficult cases such as fractured vertebrae and their neighbours. On the
vertebral dataset, profile AAMs perform better than the classical triangulated region
AAM, even when the more sophisticated feature-AAM of Scott et al [123] is used
for the latter. A profile length of around 8mm in the lower lumbar works well (with the
length then scaled relative to vertebral height for the thoracic vertebrae).

                        Search Error Statistic
Sub-model   Vertebra    Mean   Median   95%-ile   % errors   % edges
Form        Status      (mm)   (mm)     (mm)      >2mm       misaligned
Triplets    Normal      0.75   0.56     1.60       5.4%       0.1%
            Fractured   1.00   0.67     2.22      12.2%       4.1%
            Grade 1     0.79   0.58     1.72       6.7%       2.4%
            Grade 2     0.97   0.67     2.18      12.1%       3.9%
            Grade 3     1.31   0.77     3.25      18.8%       9.1%
Quintets    Normal      0.73   0.55     1.55       4.7%       0.1%
            Fractured   0.98   0.65     2.16      11.4%       3.9%
            Grade 1     0.76   0.59     1.60       5.1%       2.0%
            Grade 2     0.96   0.65     2.20      11.9%       4.3%
            Grade 3     1.28   0.78     3.00      18.3%       7.9%
Table 6.19: Search error statistics using alternative fractured initialisations

We optimised the sub-model structure used, finding that groups of 5 vertebrae (quin-
tets) were marginally superior, though the performance gain over triplets was small,
and largely confined to severe fractures. We found that single vertebra models were
unreliable. At the other extreme we confirmed that a single global model was also
unable to accurately segment fractured cases, though with normal vertebrae its per-
formance was more similar to the sub-model approach. This somewhat contradicted
our earlier result obtained on a smaller training set [113]. It appears that with a small
training set there is more gain to be obtained from decomposing the structure into
multiple sub-models, but this accuracy difference is gradually eroded as the training
set is extended; but only for normal cases. The global AAM still maintains too
great an a priori bias towards the mean shape to cope with pathologies or unusual
sub-shapes (see Figure 6.10). We therefore recommend our sub-model approach be
considered in other applications.
6.5.7.2 Overall final accuracy
The accuracy results in Table 6.14 demonstrate that the performance of the multi-
AAM segmentation on normal vertebrae (mean 0.71mm) is comparable with manual
precision (0.55mm population, 0.81mm osteoporotic patients [57]). “Normal” means
not fractured, but some images in the dataset included disc disease, which can lead
to narrowed disc spaces and the endplates of two adjacent vertebrae being closer
together, so that there is no clear edge to separate individual vertebrae. Other images
include large osteophytes, which can confuse the positions of the vertebral corners.
Despite some difficult images, over 95% of points in normal vertebrae are located to
within 2mm. The overall accuracy is better than other comparable cited figures in
the literature [126, 143, 18, 77]. For example de Bruijne et al [18] obtained a mean
point-to-line accuracy of 1.4mm on lumbar radiographs using shape particle filtering
- a stochastic search algorithm which uses a distribution of candidate shapes. Our
overall mean point-to-line error is half that at 0.7mm, even though DXA images are
typically noisier than conventional radiographs and have poorer spatial resolution;
and we also include the mid-thorax, which has more soft-tissue clutter and a greater
probability of fracture. On the other hand the method of de Bruijne et al [18] is
completely automatic, whereas our system uses an approximate manual initialisation
on the vertebral centres. There may be future scope in combining approaches and
using a stochastic search, followed by our AAM based method for fine accuracy. This
might be more appropriate for very large pharmaceutical trials or epidemiological
studies, whereas in a clinical setting an approximate manual initialisation by the
viewing clinician is reasonable when this takes only a few seconds.
A decomposition of the accuracy results by individual vertebrae is given in Table 6.20.
The accuracy is consistently good over most of the spine, but with some deterioration
in accuracy on the upper and lower vertebrae (L4 and T7). This is probably because
the extreme vertebrae tend to be noisier, and L4 can be partially obscured by the iliac
crest. The precision is excellent at around 0.1mm for L2-T9, but deteriorates
to in excess of 0.2mm at the extreme vertebrae (L4 and T7). This is still better than
typical manual precision.
The accuracy performance on fractured vertebrae is not as good as on non-fractured
vertebrae, but is still promising, with a median accuracy of 0.66mm, and mean error
1mm, which is comparable to manual precision. Fractured vertebrae present a more
challenging problem for an approach using statistical models, as the variation is much
greater for pathological cases than for normal vertebrae. As well as the greater vari-
ation in shape, the appearance is more complex, as in addition to the edge structure
provided by the endplate, there may also be a remnant of the outer ring of cortical
bone, and this can provide a secondary outer edge to confuse the search algorithm.
Approximately 88% of points in fractured vertebrae are located to within 2mm. The
accuracy on fractured vertebrae deteriorates with fracture grade (Table 6.14, Figure 6.10),
with the mean error increasing from 0.76mm in grade 1 fractures to
1.35mm in grade 3. The error distributions are generally skewed, as a number of
search failures produce long tails to the distributions. The skewed error distribution
increases the mean error to 1.35mm for grade 3 fractures, whereas the median is
only 0.8mm. Around 20% of points in grade 3 fractures have an error beyond 2mm.

Vertebra   Mean Accuracy (mm)   Mean Precision (mm)
T7         0.88                 0.21
T8         0.75                 0.16
T9         0.72                 0.11
T10        0.70                 0.10
T11        0.74                 0.10
T12        0.80                 0.12
L1         0.81                 0.11
L2         0.77                 0.12
L3         0.83                 0.19
L4         0.95                 0.24
Table 6.20: Accuracy and Precision (mm) by individual vertebrae. The accuracy is the mean point-to-line error averaged over the vertebrae. The precision is calculated similarly but taking the mean of the 10 segmentations as the reference curve, and deducting one degree of freedom when calculating the mean.

We had initially suspected that the tail of the error distribution was substantially
due to undertraining of the shape model on fractures. However, the shape model
intrinsic accuracy results (Table 6.17) indicate that, although this plays some part,
the undertraining alone is not sufficient to explain the size of the error tail. For
example only 0.5% of points in grade 2 fractured vertebrae cannot be fitted at all to
within 2mm because of shape model inadequacies, and even with grade 3 fractures
only 2.4% would fail on this basis alone. The shape model is adequate up to grade
2 fractures, and although in grade 3 fractures 14.5% of points cannot be fitted at all
to within 1mm, the mean fitting error of 0.54mm is still adequate. The tail of the
fitting error for the grade 3 fractures should reduce with more training examples.
If inadequate training of the shape model does not substantially account for the fitting
failures on grade 3 fractures, then the most likely explanation is that the AAM search
locates some form of local optimum. A possible erroneous local optimum is where
the top of the vertebra is fitted to the bottom of the neighbouring vertebra above, or
the converse (Figure 6.8). The number of such misaligned edges increases from 2.8%
with grade 1 fractures to 9.4% with grade 3 fractures (Table 6.14). We attempted to
avoid these misaligned solutions by searching from multiple initialisations (starting
from deliberately fractured shapes), but although there was a marginal reduction
down to 7.9% of misaligned edges, and the grade 3 mean dropped slightly to 1.28mm,
overall this very slight improvement is probably not worth the additional compute
time. If the somewhat less robust triplet structures, or a less highly optimised AAM
were used, then it would become more worthwhile. Ultimately there is a limit set by
the fact that grade 3 fractures have low BMD, which results in a poor signal strength.
It is likely that this inevitably makes them very difficult to segment accurately. In
fact the manually annotated “gold standard” has itself become somewhat tarnished
due to difficulties in visualising the exact location of the edge of the vertebral body
at poor signal-to-noise ratio.
The AAM segmentation provides a detailed shape outline using between 32 and
40 points (Figure 6.3), and so provides much more information than is obtained
in current six point morphometry. In [32] it was shown that the use of a computer-
assisted technique to position tangential lines which define marker placements and the
vertebral axis produced better precision in vertebral height measurements, because a
degree of subjectivity in the placement of points along the endplate boundaries was
removed, and a more consistent definition of the mid-vertebral axis was also obtained.
We anticipate that, for the same reasons, the use of a detailed shape model rather
than just six standard points would also allow better precision in extracted height
information, when this is required for standard morphometric techniques.
The use of AAM based methods in a clinical tool would be much quicker than man-
ual morphometric methods. The tool requires an approximate manual initialisation
(on the centre of each vertebra), but this takes only around 10 seconds per image
to perform, and need not be precise. The automatic algorithms then locate all the
vertebral endplates in around 30 seconds, rather than the minutes (typically 5-15 min-
utes) that would be required to perform manual 6-point placement. This algorithm
run-time could be improved on multi-core processors as the dynamically sequenced
search could be multi-threaded (each candidate AAM search is independent).
It is a limitation of the study that the more modern finer resolution (0.35mm) data was
not used to its full potential, as the presence of a substantial portion of older data led
us to construct appearance models appropriate to the worst case (1mm) resolution.
As we acquire more modern data we will be able to build finer scale models that would
be expected to yield better accuracy. We are also currently extending the modelling
up to T4, as the more modern DXA scanners tend to produce better images of the
upper thoracic vertebrae than in some of the older data used in the study.
The advantages of sub-model decomposition are greater with small training sets, and
with a large enough training set may become more marginal and apply mainly to
unusual examples (e.g. pathologies). However in medical applications when even
small changes in mis-classifications of diseases can be of importance to patients,
even a modest improvement in handling pathological cases can be of real benefit.
Furthermore the sub-model approach should offer a greater improvement in imaging
modes (e.g. radiographs) where varying projection or magnification across the image
means that there are advantages to locally varying the pose parameters.
6.5.8 Conclusions
We have assessed the location accuracy of multi-AAM segmentation for vertebral
assessment on DXA images, using a large training set containing a substantial number
of fractured vertebrae. The accuracy achieved on normal vertebrae is good and is
comparable to manual precision, whilst the automatic precision is substantially better
than by a manual method at under 0.2mm. The accuracy performance of the tool
does deteriorate with increasing fracture grade (Figure 6.10), but even in fractured
vertebrae, results of acceptable accuracy are achieved in almost 90% of cases. The
shape models are adequate up to grade 2 fractures, and for over 87% of points in even
grade 3 fractures. The feasibility of substantially automating vertebral morphometry
measurements on DXA images is confirmed, even with multiple fractures present.
However a small amount of manual correction would be necessary, mainly for the
more severe fractures.
Chapter 7
Vertebral Fracture Classification
using Shape and Appearance
Parameters
7.1 Introduction
It is known that current quantitative methods of detecting vertebral fractures based
on height ratios are insufficiently specific, particularly in distinguishing mild (grade
1) fractures from other kinds of short vertebral height deformities. This has led to
the Genant semi-quantitative method [65] becoming almost a de facto gold-standard
for fracture assessment, but there still remains significant subjectivity, particularly
for mild (grade 1) fractures. The subjectivity problem is discussed at length by Jiang
et al, who proposed an Algorithmically Based Qualitative diagnosis method (ABQ)
[79]. Further discussion has already been given in chapter 3. In addition to the issue
of subjectivity, another problem is the inadequate number of radiologists in some
countries to interpret radiographs. Furthermore, newer scanning technologies (DXA)
are becoming available in units other than radiology departments. Therefore it is
desirable to define a quantitative approach which can capture at least some of the
more subtle information used in expert visual assessment. Our aim is to define more
reliable quantitative fracture classification methods based on a complete definition
of the vertebra’s shape, and the texture within a sampling profile centred on the
endplate. Some similar work using shape parameters had been previously done by
Smyth [126] on a smaller dataset of lumbar vertebrae. Because in practice radiologists
do not classify only using shape, but also employ appearance clues involving
the texture around the endplate, and the presence or otherwise of other differen-
tial indicators (e.g. disc problems, osteophytes), we also investigated using the full
appearance model parameters. These encode both shape and texture information.
Linear classifiers were constructed using shape and appearance parameters and com-
pared with more standard morphometric methods.
We have already published much of the contents of this chapter in a journal paper
[118]. The results presented here are slightly different, due to some further optimisa-
tion of the classifiers, and also some subsequent changes in the dataset, though the
conclusions are not affected in substance.
7.2 Classification Methods
7.2.1 Data and Ground Truth
The same dataset was used as for the segmentation experiments of the previous
chapter, that is 360 DXA images. Initially we trained and tested the classifiers using
the manually segmented points (the “gold standard”). This was to explore classifier
performance unconfounded by AAM segmentation errors. Later in this chapter we
present results for classifiers still trained on the manual segmentations, but tested
against the automatically determined shapes (the AAM solutions of the previous
chapter).
The vertebrae were first independently classified by two radiologists (JEA,EP) us-
ing the Algorithmically Based Qualitative (ABQ) method [79, 54]. This method
emphasises the collapse of the endplate∗ as the fundamental visual indicator that a
deformed vertebra really is a fracture. This typically produces a more diffuse texture
around the endplate. In fractures it is often possible to also see the remnant of the
peripheral intact cortical rim of the vertebral body, which in lateral view appears as
a secondary edge above the endplate itself. This differs from the typically crisper
vertebral edge associated with (non-fracture) short vertebral height deformities (e.g.
∗most commonly the mid-portion of the endplate
mild wedging caused by spondylosis). The two radiologists then compared readings,
and where there was a discrepancy they reached a consensus judgment. If a vertebra
was judged as fractured, it was assigned a grade according to the Genant grading sys-
tem [65]. Vertebrae were classed as either: Normal, Deformed (but not fractured),
Fractured (with sub-grades in the Genant system), or Not Visualised. Note that
those vertebrae classed as deformed all display height loss in excess of 15% and
may be confused with mild fractures; often these displayed mild wedging due to disc
disease. The 79 vertebrae which could not be adequately visualised were ignored in
the later analysis. There were 354 fractures (97 mild, 147 moderate, and 110 severe)
and also 158 non-fracture vertebral deformities, and so there was considerable scope
for confusion between mild fractures and non-osteoporotic deformity.
Note that the “gold standard” data used here was slightly different to that previously
used in our earlier publication [118], as some of the spines in a subset of the data
had originally been incorrectly annotated by one vertebral level (usually L3 had
been marked as L4). This affected 53 images. These spines were shifted and the
supplementary vertebrae (usually T7) annotated and classified. In a few cases I
suspect that some confusion may have been introduced as to the vertebral levels
reported by the radiologist. This would lead to the results in [118] being slightly
worse than they should be. The affected spines were therefore completely re-classified
by the two radiologists and so the results reported here differ slightly from those
in [118] (overall performance is slightly better). However the conclusions are not
materially affected.
7.2.2 Linear Classifiers - Inputs and Training Scheme
Although vertebrae at different levels have subtly different shape distributions, there
was not enough data to have a separate classifier for each vertebral level. The data
was pooled into three shape models (and hence three classifiers) for: the lumbar
spine (L4 to L1), lower-thoracic spine (T12-T10), and mid-thoracic spine (T9 to
T7). For each of these sections a point distribution model (PDM) was derived as
follows. Firstly, the vertebral shapes were recursively aligned to the group mean
shape, using generalised Procrustes Analysis to remove translational, rotational and
isotropic scaling from the shape. Then the remaining variance around the mean shape
was modelled using principal components analysis (PCA) to extract the eigenvectors
of the covariance matrix associated with 98% of the remaining point position variance
Chapter 7. Vertebral Fracture Classification using Shape and
Appearance Parameters
(ordered by largest eigenvalue - i.e. highest variance component - first). This follows
the standard method for deriving a linear shape model as discussed in chapter 4, and
results in a substantial reduction in dimensionality. These eigenvectors then define a
basis for the space of allowable shapes, and the shape parameters of a vertebra are
simply the basis coefficients required to reconstruct its aligned shape. A particular
vertebral shape is defined by a vector of points x, which is projected into the shape
model space so:
$$\mathbf{x} \approx T(\bar{\mathbf{x}} + P_s \mathbf{b}), \qquad (7.1)$$
where $\bar{\mathbf{x}}$ is the mean aligned shape (in the model frame), Ps is the shape model
orthogonal basis matrix consisting of the column-wise concatenation of the eigenvec-
tors, b is the vector of shape parameters, and T is a similarity transform representing
the shape’s pose parameters (positional offset, isotropic scaling and rotation).
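As an illustration, the derivation of the linear shape model and the projection of equation 7.1 can be sketched as below. This is a minimal sketch rather than the code used for the experiments: it assumes the shapes have already been Procrustes-aligned (so the pose transform T has been removed) and flattened to one row per vertebra, and the function names `build_shape_model` and `shape_parameters` are hypothetical.

```python
import numpy as np

def build_shape_model(aligned_shapes, variance_kept=0.98):
    """PCA shape model from pre-aligned shapes (one flattened shape per row)."""
    x_mean = aligned_shapes.mean(axis=0)
    centred = aligned_shapes - x_mean
    # The right singular vectors of the centred data are the eigenvectors of
    # the covariance matrix, already ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    eigvals = s ** 2 / (len(aligned_shapes) - 1)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of modes whose cumulative variance reaches the target.
    n_modes = min(int(np.searchsorted(cumulative, variance_kept)) + 1, len(eigvals))
    P_s = vt[:n_modes].T  # columns are the retained eigenvectors
    return x_mean, P_s

def shape_parameters(x_aligned, x_mean, P_s):
    """Basis coefficients b such that x_aligned ~= x_mean + P_s @ b."""
    return P_s.T @ (x_aligned - x_mean)
```

Because the columns of `P_s` are orthonormal, the projection is a single matrix product and no least-squares solve is needed.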
Once the shape models were derived, the appropriate shape model was fitted to each
vertebra in the training set, and the resultant shape parameters were recorded. Note
that in the model frame, the eigenvectors are normalised, with absolute size implicit
in the pose parameters of T . Thus the shape parameters are scale-free, and so might
fail to determine crush fractures where there is simply an overall height reduction,
with little other change in shape. So in addition a scale measure was also used in the
classifier. This was taken from the McCloskey method [95], and was the ratio of the
actual posterior height Hp to the predicted height Hp(pred) given the posterior height
of four neighbours, with the predicted height calculated as in [95] (see also Chapter
3 for a summary of the McCloskey method). The expected ratios of vertebral heights
for the McCloskey predicted height were based on a trimmed population of normal
reference data for DXA images taken from [111]. We refer to the ratio of actual to
predicted posterior height below as the crush ratio with
$$r_{crush} = \frac{H_p}{H_{p(pred)}} \qquad (7.2)$$
We then trained a linear discriminant. This was done by assigning categorical values
of -1 to non-fractured vertebrae (normal or deformed) and +1 to fractured vertebrae.
We combined these into an overall category vector y. To handle mean offsets
in the regression model, let the shape parameters bi of image i be extended with an
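additional -1, to create the parameter vector $\mathbf{v}_i = [r_{crush}, \mathbf{b}_i^T, -1]^T$, and denote by $V$ the
matrix formed by the row-wise concatenation of the $\mathbf{v}_i^T$. The regression model weights are then given by:
$$\mathbf{w} = (V^T V)^{-1} V^T \mathbf{y} \qquad (7.3)$$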
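The least-squares fit of the discriminant weights can be sketched as follows. This is an illustrative sketch only: the function name `train_linear_discriminant` and its argument layout are hypothetical, and `numpy.linalg.lstsq` is used in place of an explicit matrix inverse because it solves the same normal equations more stably.

```python
import numpy as np

def train_linear_discriminant(r_crush, B, is_fractured):
    """Least-squares weights w for V w ~= y, with y = +1 (fractured) or -1 (not)."""
    n = len(r_crush)
    # Each row of V is [r_crush, b_i^T, -1]; the trailing -1 absorbs the mean offset.
    V = np.column_stack([r_crush, B, -np.ones(n)])
    y = np.where(is_fractured, 1.0, -1.0)
    # Equivalent to w = (V^T V)^{-1} V^T y when V has full column rank.
    w, *_ = np.linalg.lstsq(V, y, rcond=None)
    return w, V
```

A new vertebra with augmented parameter vector v is then scored by the inner product of w and v, with larger values indicating fracture.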
An appearance-based classifier was also derived in a similar manner but using the
parameters of an appearance model [26] rather than simply shape parameters. In
summary the texture used in the appearance model was extracted by sampling along
normal profiles centred on each shape model point. After a non-linear renormalisa-
tion to the local image statistics (see next section), a texture model can be derived in
an essentially similar manner to the shape model, by performing PCA. Finally corre-
lation between shape and texture can be modelled by concatenating both shape and
texture model parameters into a combined vector and performing a tertiary PCA.
This produces the appearance model, whose parameters determine both shape and
surrounding texture. See chapter 4 for further details.
Our classifiers are simply binary (fractured or not). If a vertebra is subsequently
classified as fractured then its grade can be assigned by conventional methods of
height reduction thresholds.
For a comparison with what might be achieved in a similar way using the more
standard morphometric height ratios we also trained a linear discriminant using the
crush ratio, and the mid-height and wedge ratios:
$$r_{mid} = \frac{H_m}{H_p}, \qquad r_{wedge} = \frac{H_a}{H_p}, \qquad (7.4)$$
where $H_m$, $H_a$ are the middle and anterior heights respectively. As a second baseline
comparison we used a hybrid morphometric method in the more standard manner,
using these same ratios: the Eastell-Melton method [45] for the mid-height and wedge
ratios, and the McCloskey method’s [95] crush ratio. We calculated the mean and
standard deviation of each of the three ratios over the normal population (excluding
any vertebrae classified as deformed). No sample trimming was performed, as we
considered that in effect this had already been done by the radiologists’ classification
of certain vertebrae as deformed. A vertebra was then classified as fractured if any
of the three ratios fell more than X standard deviations below the normal mean. X
was varied to generate derived Receiver-Operating-Characteristic (ROC) curves. We
refer to this method below as Eastell-McCloskey hybrid.
7.2.2.1 Height Calculation
In order to calculate vertebral heights it is necessary to define a vertebral axis line.
We use the method of Felsenberg and Kalender [53]. This also helps correct for lateral
displacement of the morphometric points, as only separation distance perpendicular
to this axis is used. The axis is defined as the angular bisector of the two (superior and
inferior) lines joining the respective posterior/anterior corner points. More precisely,
two lines L1,L2 are constructed: L1 is the line joining the superior posterior to
superior anterior points; and the inferior posterior to anterior points defines L2. Then
we calculate the point xI where these two lines intersect; or if a parallel check is
satisfied the algorithm takes the simpler case of the parallel line in the middle of
the (parallel) line pair. Assuming the more standard former case, we next calculate
the line LV passing through xI which bisects the angle between L1,L2. This defines
the vertebra’s axis. It is actually the normal L′V to this axis which defines the line
along which heights are measured. Thus for example, with the obvious notation
for superior and inferior posterior coordinates, and having normalised L′V to have a
length of unity:
$$H_p = \left| (\mathbf{x}_{sp} - \mathbf{x}_{ip}) \cdot L'_V \right| \qquad (7.5)$$
And likewise for the other point pairs, mutatis mutandis.
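The height calculation can be sketched as below. This is a minimal sketch under stated assumptions: points are 2D image coordinates, the function name `vertebral_heights` is hypothetical, and because only the direction of the bisector is needed to measure heights, the sketch averages the unit directions of L1 and L2 instead of explicitly intersecting the lines (which also covers the parallel case without a separate check).

```python
import numpy as np

def vertebral_heights(sup_post, sup_mid, sup_ant, inf_post, inf_mid, inf_ant):
    """Return (H_p, H_m, H_a) measured perpendicular to the vertebral axis."""
    sp, sm, sa, ip, im, ia = (np.asarray(p, dtype=float) for p in
                              (sup_post, sup_mid, sup_ant, inf_post, inf_mid, inf_ant))
    d1 = (sa - sp) / np.linalg.norm(sa - sp)    # unit direction of superior line L1
    d2 = (ia - ip) / np.linalg.norm(ia - ip)    # unit direction of inferior line L2
    axis = (d1 + d2) / np.linalg.norm(d1 + d2)  # direction of the angular bisector L_V
    normal = np.array([-axis[1], axis[0]])      # unit normal L'_V to the axis
    # Each height is the separation of a superior/inferior pair along the normal.
    pairs = ((sp, ip), (sm, im), (sa, ia))
    return tuple(float(abs((s - i) @ normal)) for s, i in pairs)
```

Projecting onto the normal discards any displacement of the points along the axis, which is how lateral displacement of the morphometric points is corrected.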
7.2.2.2 Appearance Model Form
The models used for classification were similar profile models to the ones previously
used for segmentation. Some pre-smoothing of the image was performed, and the
non-linear renormalisation was also slightly different. We experimented with using
simply the image texture (but renormalised), and also gradient along the profile.
Sampling profiles were defined normal to the vertebral shape tangent at each point
in the shape model, with a nominal step length depending on the vertebral type
(lumbar, lower-thoracic, mid-thoracic): 1mm for the lumbar vertebrae, 0.9mm
for the lower-thoracic vertebrae (T12-T10), and 0.75mm for the mid-thoracic vertebrae
(T9-T7). The local image patch was pre-smoothed with a 5-tap radially symmetric
Gaussian filter with a standard deviation of 0.7 times the step length. These sizes
are strictly applied to the mean shape in the class, and the actual step length is
renormalised based on the ratio of vertebral size to mean size, where the size metric
is the root mean square distance from the boundary to the centroid. For the gradient
APMs, the gradient along the profile was then extracted using a 1x3 Sobel filter,
introducing some additional smoothing across the profile. For texture models a similar
smoothing across the profile was introduced by sampling one step either side of the
current position and convolving with the Sobel smoothing weights [0.25, 0.5, 0.25]T . The profile
lengths were 5 steps inside the vertebra and 8 steps outside. A slightly larger sample
was taken outside because the shape follows the collapsed endplate in the case of a
fracture, and the larger external sampling distance ensures that where the endplate
has collapsed by up to approximately 40% in height, but a cortical rim remains
above, the latter can be sampled by the outer profile. The ensemble of sampled
vectors was then renormalised to the local image statistics using a mapping based
on the signed square root of the Geman-McClure robust error function [61], which
applies a standard normalisation to (-1,1) whilst tending to truncate any extreme
values. See equation 6.2 in chapter 6. The normal use of the Geman-McClure
function is as an M-estimator in robust statistics, but we have used it here as an
extension of the sigmoidal normaliser used by Cootes et al ([32], see also equation
4.27). The advantage of the Geman-McClure form over that of Cootes et al is that
scale is explicitly incorporated, and the use of median statistics helps avoid introducing
too much “signal” into the estimate of background. Thus significant structure should
be towards the end of the function range, whereas background and noise should be
mostly confined to the [-0.5,0.5] range. This normaliser devotes a greater proportion
of the overall range to significant structure than that proposed by Cootes et al [32].
Also it has a more natural extension to pure grey-level AAMs, rather than gradient
or other feature detector AAMs with zero background expectation.
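A normaliser of this kind can be sketched as follows. Equation 6.2 is not reproduced in this chapter, so the exact form here is an assumption: the sketch takes the signed square root of the Geman-McClure function x^2 / (x^2 + sigma^2), centres on the median, and sets the scale sigma from the median absolute deviation. The function name `geman_mcclure_normalise` and the `scale_factor` parameter are hypothetical.

```python
import numpy as np

def geman_mcclure_normalise(samples, scale_factor=1.0):
    """Map samples into (-1, 1) via the signed square root of Geman-McClure."""
    samples = np.asarray(samples, dtype=float)
    residual = samples - np.median(samples)  # median as robust background estimate
    # Assumed scale estimate: median absolute deviation (guarded against zero).
    sigma = max(scale_factor * np.median(np.abs(residual)), np.finfo(float).tiny)
    # sign(x) * sqrt(x^2 / (x^2 + sigma^2)) simplifies to x / sqrt(x^2 + sigma^2).
    return residual / np.sqrt(residual ** 2 + sigma ** 2)
```

Under this assumed form, a residual equal to sigma maps to about 0.71, and extreme values saturate towards -1 or +1 instead of dominating the texture vector.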
When concatenating the shape and texture parameters to produce the vectors used
to train the appearance models the weighting scheme was to equalise the variance of
the shape and texture sub-components.
7.3 Experiments
7.3.1 Initial APM form selection
When training classifiers it is generally necessary to select a subset of relevant features.
Often leave-one-out jackknife tests can be performed to select relevant features (e.g. by
greedily introducing a new feature that most improves the ROC curve area). However
in our case the features are shape and appearance parameters which themselves
depend on the training subset, and so it is not possible to do this. We experimented
instead with performing feature selection by means of a stepwise regression process.
We start with a forwards stepwise process which only selects shape or appearance
parameters that significantly improve the categorical regression’s residual sum-of-
squares. At each iteration the most significant (on an F -ratio) feature variable is
selected until no more significant variables remain. We also run a backwards process
that starts with all parameters included, and gradually removes variables that do
not make the residual sum of squares significantly worse. At each iteration the
variable with minimum associated F -ratio is removed. Next we take the intersection
of variables from both forwards and backwards processes, and run another forwards
process; similarly we take the union of both and run a secondary backwards process.
The final variable set is the union of both of these parameter sets.
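The combined forwards/backwards selection can be sketched as below. This is an illustrative sketch with several assumptions: significance is judged against a fixed F-ratio threshold (the hypothetical defaults `f_in` and `f_out` of 4.0) rather than a p-value, the second forwards pass is started from the intersection set, and the second backwards pass is started from the union set. All function names are hypothetical.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit on `cols` plus an intercept."""
    A = np.column_stack([X[:, np.array(sorted(cols), dtype=int)], np.ones(len(y))])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(resid @ resid)

def f_ratio(X, y, small, big):
    """F-ratio for the single extra variable in `big` relative to `small`."""
    dof = len(y) - len(big) - 1
    rss_big = rss(X, y, big)
    return (rss(X, y, small) - rss_big) / (rss_big / dof)

def forward(X, y, candidates, start=frozenset(), f_in=4.0):
    chosen = set(start)
    while set(candidates) - chosen:
        remaining = set(candidates) - chosen
        best = max(remaining, key=lambda j: f_ratio(X, y, chosen, chosen | {j}))
        if f_ratio(X, y, chosen, chosen | {best}) < f_in:
            break  # no remaining variable is significant
        chosen.add(best)
    return chosen

def backward(X, y, start, f_out=4.0):
    chosen = set(start)
    while chosen:
        worst = min(chosen, key=lambda j: f_ratio(X, y, chosen - {j}, chosen))
        if f_ratio(X, y, chosen - {worst}, chosen) >= f_out:
            break  # every remaining variable is significant
        chosen.remove(worst)
    return chosen

def stepwise_select(X, y):
    everything = set(range(X.shape[1]))
    fwd, bwd = forward(X, y, everything), backward(X, y, everything)
    second_fwd = forward(X, y, everything, start=fwd & bwd)
    second_bwd = backward(X, y, fwd | bwd)
    return second_fwd | second_bwd
```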
A closely related question is what proportion of the total variance of both shape and
texture models to include. We need to include enough shape variance to be able to
accurately fit to unseen shapes, so we simply fixed the shape variance proportion
a priori at 98%. We also noted that the stepwise regression included most shape
parameters (typically reducing for example from 18 dimensions to 15), so the shape
parameters are not hugely redundant. However the texture model is more question-
able, as the noisy nature of the images means that there is a risk of including a large
number of essentially meaningless “noise” modes. We also noted that the stepwise
regression would typically halve the number of included appearance parameters. We
therefore decided to conduct some preliminary experiments to try and optimise the
amount of variance included in the texture model. We could have run several leave-
one-out experiments over our dataset, and then picked the appearance model form
giving the best performance. But this would be liable to be a biassed estimate of
the true population performance, as the same experiment is used to optimise and
test. So instead we performed a coarser level optimisation of the texture model by
performing a bootstrap estimate of performance as follows.
We conducted two levels of randomisation. At the first level we randomly pick 32
images to be excluded, and in the second we randomly order the remaining dataset,
and loop through it using a leave-16-out train/test cycle. We conducted 50 cycles
of the outer exclude-32 loop. We then calculated ROC curves for the concatenated
bootstrap over all such randomisations. Note that this will tend to under-estimate
performance compared to an overall leave-one-out, as at each classification the models
and classifiers are somewhat less well trained (as 48 images are removed rather than
just one). We stepped through the texture variance included from 50% to 95% in 5%
steps, examining both classifiers with all parameters in, and performing a selection
using stepwise regression.
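The two levels of randomisation can be sketched as an index generator. This is a minimal sketch; the function name `bootstrap_splits`, the `seed` argument and the handling of a final short test batch (when the number of retained images is not a multiple of 16) are assumptions.

```python
import numpy as np

def bootstrap_splits(n_images, n_outer=50, n_excluded=32, n_test=16, seed=0):
    """Yield (train_idx, test_idx) pairs for the exclude-32 / leave-16-out scheme."""
    rng = np.random.default_rng(seed)
    for _ in range(n_outer):
        order = rng.permutation(n_images)
        # First level: drop 32 random images. The remainder is already in
        # random order, which gives the second level of randomisation.
        kept = order[n_excluded:]
        for start in range(0, len(kept), n_test):
            test_idx = kept[start:start + n_test]
            train_idx = np.concatenate([kept[:start], kept[start + n_test:]])
            yield train_idx, test_idx
```

For a full test batch, each training set omits 48 images in total, which is why this scheme tends to under-estimate performance relative to leave-one-out.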
As each leave-16-out train/test instance produces in fact a slightly different set of
classifiers to all others, the derivation of Receiver-Operating-Characteristic (ROC)
curves is not completely obvious. In order to concatenate the results into single
curves, for each classified vertebra in the test image we recorded the linear classifier’s
measure of the likelihood ratio that it was fractured, given by:
$$L_i(\mathrm{frac}) = \left(1 + \exp(\mathbf{w} \cdot \mathbf{v}_i)\right)^{-1} \qquad (7.6)$$
Then in deriving concatenated ROC curves we pool all these likelihood ratios and then
obtain a particular overall performance by imposing a specific detection threshold.
We derived ROC curves by varying the detection threshold used.
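Pooling the recorded values and sweeping the detection threshold can be sketched as follows. This is an illustrative sketch: the function name `pooled_roc` is hypothetical, higher scores are assumed to indicate fracture, and tied scores are not merged into a single operating point.

```python
import numpy as np

def pooled_roc(scores, is_fractured):
    """Sensitivity and false positive rate at each threshold over pooled scores."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(is_fractured, dtype=bool)
    # Sort from most to least fracture-like; lowering the threshold past the
    # k-th sorted score classifies the first k vertebrae as fractured.
    order = np.argsort(-scores, kind="stable")
    true_pos = np.cumsum(truth[order])
    false_pos = np.cumsum(~truth[order])
    return true_pos / truth.sum(), false_pos / (~truth).sum()
```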
Initially we intended to pick the texture model form and variance proportion that
maximised the area under the ROC curve. However we found that ROC curve area
was not a very discriminating measure. All values examined were similarly high
(around 0.98), and varying the variance proportion just tended to move around which
section of the ROC curve was best, rather than producing an overall optimum. There
was a tendency for the models with higher variance to be better at sensitivities in the
85%-95% range, but these often had long final tails and produced poor performance
over a few atypical examples, resulting in lower area over the final portion of the
ROC curve. This would be typical of over-complex models which are undertrained.
Which model form is the best depends really on the portion of the ROC curve around
which one wishes to operate. We knew from provisional results that the 90%-95%
sensitivity region would be a feasible operating region for an appearance classifier,
whereas higher sensitivity (e.g. > 97%) would be likely to produce unacceptably high
false positive rates. We chose to optimise performance around the 95% region.
There is clearly a danger of an optimisation which is based on only one point on a
ROC curve, which will be subject to sampling noise, increasing in the high sensitivity
region. Therefore we took a blurred version of the false positive rate in the [90,97]%
sensitivity region. This was done by convolving the false positive rates in this re-
gion with a beta distribution, with mode 0.95, and parameters selected to effectively
focus the beta PDF in the skewed [90,97]% region. A beta kernel was used rather
than a Gaussian for several reasons. Firstly we needed an asymmetric distribution
which could be skewed, because there is a natural asymmetry to the false positive
rates around 95% - these very rapidly increase as one moves to higher sensitivities.
Secondly the beta distribution has a finite domain on [0,1], and thirdly it is commonly
used as a conjugate Bayesian prior in problems involving binomial (Bernoulli)
distributions. Beta distributions are also commonly used to model events which are
constrained to take place within an interval defined by a minimum, maximum and
a mode; for example in project planning methodologies such as PERT† which use a
mode, optimistic and pessimistic time scale estimate for each task.
The beta distribution has a PDF given by
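$$f_B(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)} \qquad (7.7)$$
where the normalisation constant $B(\alpha,\beta)$ is the Beta function given by
$$B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \qquad (7.8)$$
and Γ is the standard gamma function.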
The parameters α, β can be estimated from the first two moments of a sample by
inverting the equations for mean and variance to obtain, defining $\gamma = \frac{\mu(1-\mu)}{\sigma^2} - 1$:
$$\alpha = \mu\gamma, \qquad \beta = (1-\mu)\gamma \qquad (7.9)$$
where µ, σ are the mean and standard deviation. These can also be estimated from
†Program Evaluation and Review Technique, originally developed for the management of the Polaris missile project
the desired effective range and mode according to:
$$\mu \approx \frac{a + 4b + c}{6}, \qquad \sigma \approx \frac{c - a}{6} \qquad (7.10)$$
where a is the minimum, b is the mode, and c is the maximum. Thus (substituting
a = 0.90, b = 0.95, c = 0.97) we obtain µ = 0.945, σ = 0.01167. We then loop over
all points on the ROC curve where the sensitivity changes between 0.90 and 0.97
and convolve their associated false positive value with this beta kernel, and then
normalise. Thus the convolved false positive rate is given by
$$\bar{r}_{fp} = \frac{\sum_{i \in I} f_B(s_i)\, r_{fp}(s_i)}{\sum_{i \in I} f_B(s_i)} \qquad (7.11)$$
where $r_{fp}(s_i)$ is the false positive rate at sensitivity $s_i$, and $I$ is the set of indices
where $s_i \in [0.90, 0.97]$ and $s_i \neq s_{i-1}$.
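The beta-weighted false positive rate can be sketched as below, following equations 7.7 to 7.11. This is a minimal sketch with hypothetical function names; the normalisation constant is evaluated with log-gamma functions because the gamma function itself overflows in double precision for parameters of this size (alpha is roughly 360 here).

```python
import math
import numpy as np

def pert_beta_parameters(a, b, c):
    """Beta parameters from minimum a, mode b and maximum c (eqs 7.9 and 7.10)."""
    mu = (a + 4.0 * b + c) / 6.0
    sigma = (c - a) / 6.0
    gamma = mu * (1.0 - mu) / sigma ** 2 - 1.0
    return mu * gamma, (1.0 - mu) * gamma

def beta_pdf(x, alpha, beta):
    """Beta PDF (eq 7.7), computed in log space for numerical safety."""
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    x = np.asarray(x, dtype=float)
    return np.exp((alpha - 1.0) * np.log(x) + (beta - 1.0) * np.log1p(-x) - log_norm)

def beta_convolved_fpr(sensitivity, fpr, lo=0.90, hi=0.97, mode=0.95):
    """Weighted mean FPR over ROC points where the sensitivity changes (eq 7.11)."""
    s = np.asarray(sensitivity, dtype=float)
    r = np.asarray(fpr, dtype=float)
    changed = np.concatenate([[True], s[1:] != s[:-1]])
    use = changed & (s >= lo) & (s <= hi)
    weights = beta_pdf(s[use], *pert_beta_parameters(lo, mode, hi))
    return float(np.sum(weights * r[use]) / np.sum(weights))
```

With a = 0.90, b = 0.95 and c = 0.97 the first function reproduces a mean of 0.945 and a standard deviation of about 0.0117.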
The texture variance proportion to use was then selected for each of the three spinal
regions using the best performing value of rfp. Results are given in the next section.
For now we note that rather different texture variance proportions were used (0.775
for mid-thoracic, 0.675 for lower thoracic, and 0.825 for lumbar). Also gradient models
worked best in the thoracic spine, whereas raw grey level‡ worked best in the lumbar.
Final leave-one-out cross-validation tests were then performed over the entire dataset
with texture models thus determined. Again for each classified vertebra in the test
image we then recorded the linear classifier’s measure of the likelihood ratio that it
is fractured, given by equation 7.6 above.
Similarly for the Eastell-McCloskey method we calculated the area in the tail of
the normal vertebra’s CDF Fn (i.e. the probability that a normal vertebra would
have a height ratio less than or equal to the given one), and then recorded the
“likelihood” of a fracture as the complement of this: 1 − inf{Fn}. Although this is
not strictly a likelihood ratio (it is not well-normalised) the use of the minimum is
logically equivalent to performing an OR operation over all three height ratios in the
original method, and fixing a standard deviation threshold is logically equivalent to a
threshold on the normal population CDF. We have reformulated the method in this
way as it makes it easier to combine data into an overall ROC curve in the same way
as the likelihood ratios from the discriminants.
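The reformulated Eastell-McCloskey score can be sketched as follows. This is an illustrative sketch which assumes that the normal population CDF of each ratio is Gaussian with the measured mean and standard deviation; the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def eastell_mccloskey_score(ratios, normal_means, normal_sds):
    """1 - min over the height ratios of the normal-population tail area F_n."""
    tails = [normal_cdf((r - m) / s)
             for r, m, s in zip(ratios, normal_means, normal_sds)]
    # Taking the minimum tail area acts as an OR over the three ratios: the
    # score is high as soon as any single ratio is abnormally low.
    return 1.0 - min(tails)
```

Thresholding this score at 1 - Phi(-X), where Phi is the standard normal CDF, flags the same vertebrae as requiring any ratio to lie X standard deviations below the normal mean.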
‡but renormalised using the Geman-McClure function
The statistical significance of classifier differences was investigated using McNemar’s
test [98]. This is applied to single points on the ROC curves [144], and can be used
for pairwise comparisons of either the specificities for given sensitivity, or vice versa.
We compared the specificities of a set of operating points in the 95% sensitivity
region (from 93% to 97%) for the following pairwise comparisons: height ratio vs
shape linear discriminants; height ratio vs appearance linear discriminants; shape
vs appearance linear discriminants; hybrid Eastell-McCloskey vs height ratio linear
discriminant. The test is applicable to matched samples (i.e. same image set) with
binary labels. To compare specificities at given sensitivity, we compared the number
of AB and BA pairings, where an AB result is one where classifier 1 gives the correct
true negative, but classifier 2 gives a false positive; and vice versa for a BA pairing.
Instances where both classifiers give the same result have no influence on the test
statistic. McNemar’s test statistic is given by:
$$\frac{(|N_{ab} - N_{ba}| - 1)^2}{N_{ab} + N_{ba}} \qquad (7.12)$$
where Nab and Nba are the number of AB and BA pairings as defined above. This
would asymptotically follow a chi-squared distribution with one degree of freedom
(valid for (Nab + Nba) ≥ 10), under the null hypothesis that the classifiers are equiv-
alent apart from random error. We compared a set of points around the desirable
operating region rather than for example ROC curve area, because statistics for the
whole curve area generally involve operating regions of little practical relevance (e.g.
unrealistically low sensitivities or high false positive rates).
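The test statistic of equation 7.12 can be computed as in the sketch below; the function names `count_discordant` and `mcnemar_statistic` are hypothetical. As a worked example, 30 AB pairings and 10 BA pairings give (|30 - 10| - 1)^2 / 40 = 9.025, which exceeds the 5% critical value of 3.84 for one degree of freedom.

```python
def count_discordant(correct_1, correct_2):
    """Count AB pairs (only classifier 1 correct) and BA pairs (only classifier 2)."""
    n_ab = sum(a and not b for a, b in zip(correct_1, correct_2))
    n_ba = sum(b and not a for a, b in zip(correct_1, correct_2))
    return n_ab, n_ba

def mcnemar_statistic(n_ab, n_ba):
    """Continuity-corrected McNemar statistic (equation 7.12)."""
    if n_ab + n_ba == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    return (abs(n_ab - n_ba) - 1) ** 2 / (n_ab + n_ba)
```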
We also evaluated the per-patient (rather than per-vertebra) sensitivity and false
positive rates for a range of underlying per-vertebra specificities. Here we assume
that a patient is truly osteoporotic if any vertebral fractures at all are present, and
normal otherwise. Then, for the underlying per-vertebra operating specificity, we
check whether the patient is diagnosed as having any fractures at all by the combined
classifiers applied to all the vertebrae. This enables us to produce a set of patient-
level (really image-level) false positive and sensitivity figures. Osteoporotic patients
will often have several vertebral fractures, and so it need not always matter if say
a grade 1 fracture is missed in a patient with other fractures that are recognised.
On the other hand per-patient false positive rates will be higher than the underlying
single-vertebra false positive rates. We examined these patient-level statistics for
single vertebra false positive rates of 2.5% and 5%.
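The aggregation from per-vertebra decisions to patient-level figures can be sketched as follows. This is a minimal sketch; the function name `patient_level_rates` and the flat array layout (one entry per vertebra, with a patient identifier) are assumptions.

```python
import numpy as np

def patient_level_rates(patient_ids, truly_fractured, flagged):
    """Per-patient sensitivity and false positive rate from per-vertebra labels."""
    ids = np.asarray(patient_ids)
    truth = np.asarray(truly_fractured, dtype=bool)
    pred = np.asarray(flagged, dtype=bool)
    patients = np.unique(ids)
    # A patient counts as positive if ANY of their vertebrae is positive.
    patient_truth = np.array([truth[ids == p].any() for p in patients])
    patient_pred = np.array([pred[ids == p].any() for p in patients])
    sensitivity = (patient_pred & patient_truth).sum() / patient_truth.sum()
    fpr = (patient_pred & ~patient_truth).sum() / (~patient_truth).sum()
    return float(sensitivity), float(fpr)
```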
7.4 Results
Table 7.1 shows the beta-convolved false positive rate in the 95% sensitivity region
as a function of the variance proportion retained in the texture model, for a gra-
dient based texture model; and Table 7.2 shows the corresponding figures using a
renormalised grey-level intensity model. These results are for the initial bootstrap
experiments, and were used to select the texture model to be used. It can be seen
that gradient models give better performance in the thoracic spine, whilst grey-level
models seem better in the lumbar. The reasons for the difference are not obvious, but
may be to do with the tendency for the lumbar region to have higher unstructured
noise due to a significant number of obese patients. As DXA images use low energy,
fat in the lumbar region can absorb a substantial portion of this photon flux, lead-
ing to poor signal to noise ratio with obese patients. As the variance of a gradient
is larger than the underlying grey level (typically doubled), it could be that under
higher noise conditions some advantages of gradient-based normalisation are lost due
to the increase in variance. Referring to table 7.1, there is generally a region of better
performance, and we selected variance proportions towards the upper end rather than
the lower. This is because we also experimented with a stepwise regression process as
discussed earlier, so we retain a further opportunity to discard useless information;
whereas there is no way to add it once it is excluded from the initial model. Also
as the second set of experiments should be somewhat better trained it is better to
err on the side of including some additional feature information that could yet be
useful. We selected variance proportions in the middle of the 75-80% region for the
mid-thoracic spine, and the 65-70% region for the lower-thoracic. It is interesting
that the optimal proportion for the lower-thoracic region seems smaller than for the
mid-thoracic. Somewhat different sampling lengths are used (smaller from T7-T9)
so this may be part of the explanation. Artefacts from diaphragm motion also tend
to affect T10-T11§. Referring now to the grey-level model results in Table 7.2, we
selected the mid-point of the 80-85% region for the lumbar texture model. The grey
level texture models tend to have significantly fewer modes than gradient based ones,
so it makes sense for the retained variance to be somewhat higher for the lumbar
than the thoracic spine.
Figures 7.1, 7.2 and 7.3 show the ROC curves for the mid-thoracic, lower thoracic
and lumbar spine respectively, derived from the final set of leave-one-out experiments.
§due to the long DXA scan time of around 5 minutes
Each figure gives ROC curves for the shape parameter classifier, appearance parameter
classifier, and baseline Eastell-McCloskey hybrid classifier. The plots
show only the most interesting section of each ROC curve (sensitivity exceeding 0.6,
FPR below 0.5). We show the effect of fracture grade in Figures 7.4 and 7.5,
which present similar ROC curves for grade 1 and 2 fractures respectively. No ROC
curve is given for severe fractures, as all classifiers are virtually perfect against the
grade 3 fractures. The areas under the curve for the various ROC curves are given
in Table 7.3, whilst Tables 7.4, 7.5 and 7.6 show the false positive rates obtained
with the various classifiers at sensitivities of 0.90, 0.95 and 0.98 for the mid-thoracic,
lower-thoracic, and lumbar vertebrae respectively.
The McNemar test statistics for the specificity comparisons around the 95% sensitiv-
ity operating point are given in Table 7.7. It can be seen that for most comparisons
between a shape model classifier and a height ratio linear discriminant, there are
highly significant differences; and even more so for the comparison between an appearance
model classifier and the height ratio linear discriminant. The exception is
in the lumbar spine, where use of a shape model discriminant is not significantly
better than a height ratio discriminant. The appearance model classifier significantly
outperforms the shape model classifier in every case but one. In the comparison be-
tween the two height based methods (linear discriminant vs Eastell-McCloskey) there
is no consistent overall difference; though one may outperform the other at particular
operating points.
A visual representation of the optimal shape discriminant direction for mid-thoracic
vertebrae is shown in Figure 7.6, which displays how the shape varies in the dis-
criminant direction, starting from a normal vertebra, then moving to the mean shape
and adjusting the shape parameters through the classification boundary and beyond.
It appears from Figure 7.6 that biconcave endplate fractures dominate, although a
modest degree of wedging does also start to appear. Note that Figure 7.6 shows
only the variation in scale-free shape, and absolute height (or McCloskey height ra-
tio) is not indicated on the Figure. Similarly Figure 7.7 shows the variation in
the synthesised appearance moving along the discriminant direction from the mean
appearance.
The results presented are all for classifiers using the full set of model parameters,
given the proportion of texture variance retained. As discussed above we also exper-
imented with using stepwise regression to try and obtain a more parsimonious set of
                       Beta-Convolved FPR in Spinal Region (%)
Texture Variance (%)   Mid-Thoracic   Lower-Thoracic   Lumbar
50                     7.1            10.3             9.0
55                     7.9            10.6             9.9
60                     7.0            9.9              9.9
65                     5.9            7.2              10.3
70                     5.1            7.6              9.5
75                     4.6            9.4              8.6
80                     4.8            9.8              9.1
85                     5.0            10.1             9.4
90                     7.7            10.1             9.6
Table 7.1: Beta-convolved false positive rates (%) for the gradient appearance classifier as a function of variance retained in texture model
                       Beta-Convolved FPR in Spinal Region (%)
Texture Variance (%)   Mid-Thoracic   Lower-Thoracic   Lumbar
50                     9.1            10.3             8.2
55                     9.4            10.1             8.4
60                     9.3            10.1             8.8
65                     7.2            9.9              7.9
70                     8.4            10.3             7.7
75                     7.9            11.0             6.5
80                     8.1            10.0             5.9
85                     8.0            11.1             6.2
90                     8.6            10.6             7.5
95                     9.5            11.5             7.9
Table 7.2: Beta-convolved false positive rates (%) for the intensity appearance classifier as a function of variance retained in texture model
parameters. Although in provisional results using larger proportions of texture vari-
ance (92%), this resulted in some improvements in certain cases, once the retained
texture variance had been optimised, we found that the stepwise selection process
did not result in any further improvement in appearance model results, and indeed
could even degrade performance. Further results are therefore omitted.
In [118] we had also presented some results using a modified classifier training
method, which used a form of robust statistics to fit the classifier weights. In essence
this downweighted the influence of cases far away from the classification boundary
(e.g. severe fractures). This had appeared to produce some slight improvements, but
again after further optimisation of the texture models, we found that this more com-
                      Spinal Region                Fracture Grade
Classifier            MT       LT       Lum       G1       G2       G3       All
Eastell-McCloskey     0.9631   0.9674   0.9733    0.9248   0.9800   0.9947   0.9690
Height Ratios LD      0.9691   0.9718   0.9749    0.9228   0.9846   0.9994   0.9718
Shape LD              0.9782   0.9854   0.9804    0.9435   0.9924   0.9998   0.9810
Appearance LD         0.9884   0.9848   0.9798    0.9490   0.9955   0.9998   0.9838
Table 7.3: Area under ROC curves. Columns labelled MT, LT and Lum refer to mid-thoracic (T7-T9), lower-thoracic (T10-T12) and lumbar (L1-L4) vertebrae respectively; those labelled G1, G2, G3 refer to fracture grades 1, 2 and 3 respectively.
Figure 7.1: Mid-Thoracic Spine (T7-T9) ROC Curves showing the Eastell-McCloskey height classifier and the shape and appearance model linear discriminants
                      Sensitivity (%)
Classifier            90      95      98
Eastell-McCloskey     6.7     20.6    27.6
Height Ratios LD      2.8     19.9    26.9
Shape LD              3.4     8.2     20.9
Appearance LD         2.4     3.2     7.6
Table 7.4: False Positive Rates (%) in the mid-thoracic spine (T9-T7) for the different classifiers at various sensitivities.
Figure 7.2: Lower-Thoracic Spine (T10-T12) ROC Curves showing the Eastell-McCloskey height classifier and the shape and appearance model linear discriminants
Figure 7.3: Lumbar Spine ROC Curves showing the Eastell-McCloskey height classifier and the shape and appearance model linear discriminants
                      Sensitivity (%)
Classifier            90      95      98
Eastell-McCloskey     5.4     19.6    33.3
Height Ratios LD      13.7    18.0    21.9
Shape LD              3.3     6.4     18.3
Appearance LD         2.5     4.7     20.8
Table 7.5: False Positive Rates (%) in the lower-thoracic spine (T12-T10) for the different classifiers at various sensitivities
                      Sensitivity (%)
Classifier            90      95      98
Eastell-McCloskey     6.2     10.0    21.9
Height Ratios LD      4.6     11.0    36.4
Shape LD              3.4     9.6     32.8
Appearance LD         1.5     4.9     45.7
Table 7.6: False Positive Rates (%) in the lumbar spine for the different classifiers at various sensitivities
Figure 7.4: ROC Curves for combined Grade 1 Fractures showing the Eastell-McCloskey height classifier and the shape and appearance model linear discriminants
Figure 7.5: ROC Curves for combined Grade 2 Fractures showing the Eastell-McCloskey height classifier and the shape and appearance model linear discriminants
                                               McNemar Test Statistic at Sensitivity Level:
Spinal region   Classifier Comparison          93%      95%      97%
T7-T9           Height LD vs Shape LD          37.4     83.8     19.5
                Height LD vs Appearance LD     54.9     127.7    138.7
                Shape LD vs Appearance LD      10.8     33.0     93.6
                E-M Height vs Height LD        (1.6)    0.3      0.2
T10-T12         Height LD vs Shape LD          91.0     93.0     14.7
                Height LD vs Appearance LD     99.0     96.6     40.7
                Shape LD vs Appearance LD      3.2      6.5      15.8
                E-M Height vs Height LD        (55.2)   1.3      15.0
L1-L4           Height LD vs Shape LD          (8.2)    2.6      3.7
                Height LD vs Appearance LD     26.7     42.0     14.0
                Shape LD vs Appearance LD      65.2     37.6     7.4
                E-M Height vs Height LD        5.6      (0.9)    (8.0)
Table 7.7: McNemar Test Statistic comparing various classifiers between 93% and 97% sensitivity. E-M Height refers to Eastell-McCloskey, whereas Height LD is the linear discriminant using the 3 heights. Note that the 5% significance level of the $\chi^2_1$ distribution is 3.84, and large values above this indicate significant differences. Bracketed figures indicate deterioration from 1st to 2nd classifier.
Figure 7.6: Visualisation of the (scale-free) discriminant direction in shape parameter space for a mid-thoracic vertebra. Shapes are generated by adjusting the shape parameters from the mean shape along the optimum discriminant direction in units of half the distance from the mean to the classification boundary. Counting left to right: 1) the shape moved one step into the normal region; 2) the mean shape; 3) a slightly deformed vertebra halfway to the boundary; 4) the boundary shape; 5-7) fractured vertebrae lying 1, 2, 3 steps into the fractured region. Note these are scale-free shapes with no account of absolute height.
plicated method produced no consistent improvement. Finally we also experimented
with fitting the appearance model using a robust M-estimator using the Geman-
McClure kernel as in [13]. This produced no improvement, though that might be
due to the fact that the radiologists discarded certain cases as “not visualised”. In
practice it could still be worthwhile fitting the models using a robust estimator, to
avoid for example false positives due to diaphragm motion artefacts. But in this the-
sis we have presented results only for standard least squares methods, which appear
to work well with this data.
The false positive rates and associated sensitivities at the patient level (i.e. combining
all vertebrae) are summarised in Table 7.8. Note that these figures reflect the fracture
prevalence within our dataset, which was deliberately fracture-enriched. Patient-level
results cannot be generalised to a more general population, but are included as a rough
guide to how vertebral-level specificities and sensitivities may translate to the overall
patient level.
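As a rough illustration of how a per-vertebra false positive rate compounds at the patient level, the sketch below computes the patient-level FPR that would result if errors on different vertebrae were statistically independent. This is not the calculation used to produce Table 7.8; the function name and the assumption of 10 assessed vertebrae per patient (T7 to L4) are ours, introduced for illustration only.

```python
def patient_fpr_if_independent(vertebra_fpr: float, n_vertebrae: int) -> float:
    """Probability that at least one of n unfractured vertebrae is flagged,
    assuming (hypothetically) that errors on different vertebrae are independent."""
    if not 0.0 <= vertebra_fpr <= 1.0:
        raise ValueError("vertebra_fpr must lie in [0, 1]")
    return 1.0 - (1.0 - vertebra_fpr) ** n_vertebrae


# Assumption: 10 vertebrae (T7 to L4) are assessed in every patient.
N_VERTEBRAE = 10
for fpr in (0.025, 0.05):
    combined = patient_fpr_if_independent(fpr, N_VERTEBRAE)
    print(f"{fpr:.1%} per vertebra -> {combined:.1%} per patient")
```

Under these assumptions a 5% vertebral FPR would give a patient-level FPR of about 40%, so observed patient-level rates well below this suggest that false positives are correlated within a patient.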
Figure 7.7: Visualisation of the (scale-free) discriminant direction in appearance parameter space for a mid-thoracic vertebra. Appearances are generated by adjusting the shape parameters from the mean shape along the optimum discriminant direction in units of 0.25 of the distance from the boundary to a borderline grade 3 fracture. The top left appearance is that of a normal vertebra two units from the boundary, then on the right is a normal vertebra one unit from the boundary. The middle appearance represents the boundary between normal and fractured vertebrae. The lower two appearances represent moderate (left) and severe (right) fractured vertebrae placed at two and four units from the boundary. Note that the appearance model displays the renormalised smoothed image gradient along the sampling profiles. Also note these are scale-free shapes with no account of absolute height.
Classifier Type      Vertebral FPR (%)   Patient FPR (%)   Patient Sensitivity (%)
Eastell-McCloskey          2.5               15.5                 92.6
Shape LD                   2.5               12.4                 96.3
Appearance LD              2.5               10.8                 97.2
Eastell-McCloskey          5.0               21.6                 95.4
Shape LD                   5.0               18.6                 97.2
Appearance LD              5.0               18.6                 98.1
Table 7.8: Overall Patient-Level FPR and Sensitivity given individual vertebrae FPR
7.5 Discussion
Firstly it is interesting that even the height-ratio based methods performed better
than might have been expected. For example Li et al [88] reported a 60% sensitivity
and FPR of 0.85%, when applying a 3SD threshold to height ratios; or when using
a 2SD threshold in the same study, a false positive rate of 8% pertained, with 80%
sensitivity. At similar specificities we obtained a 68% sensitivity (FPR of 0.85%),
and 92% sensitivity with FPR of 8%; whereas our false positive rate at 80% sensi-
tivity was only 2.2%. However most studies on morphometric methods have used
radiographs, where apparent shape change due to projective tilting may be more li-
able to induce false positives than in DXA. We have only considered vertebrae up to
T7, and some vertebrae have been excluded because the radiologists defined them as
inadequately visualised. Also we use a relatively complex height calculation which
normalises for lateral displacement of points, and the set of image processing facil-
ities and model-based interpolation inherent in our manual markup probably lead
to better precision in the underlying point placement. But it is interesting that our
DXA results appear to be considerably better than earlier studies using radiographs.
This provides some evidence that DXA images may actually be better suited than
radiographs for morphometric methods, though one has to accept a small proportion
of unvisualised vertebrae, and the technique would be poorer for obese patients. We
must also immediately qualify this by saying it is also necessary to use reasonably
sophisticated digital image processing techniques (i.e. non-linear contrast
enhancements tuned to the local histogram) to bring out the full information
in the image, in order to precisely place the points.
The appearance-based classifier dominates the ROC curves in each of the 3 spinal
regions over the ROC regions of practical interest (e.g. sensitivity exceeding 0.7,
false positive rate below 0.2). At the 95% sensitivity point the appearance based
classifier is superior in every case. In the lumbar and mid-thoracic spines the false
positive rate is more than halved in comparison to the best other classifier, and in
the lower-thoracic spine the false positive rate drops to 4.7% compared to 6.4% with
a shape based classifier (significant on the McNemar test), or 18% with the best
height-based method. The shape classifiers outperform height based classifiers in the
thoracic spine, especially in the lower-thoracic spine, where a shape-based classifier is
nearly as good as an appearance classifier. However in the lumbar spine, the use
of a shape-based classifier does not outperform height methods in the 95% region,
although it does appear from the ROC curve (Figure 7.3) that it is more specific at
slightly lower sensitivity (in the 80-90% region).
In the thoracic spine, the use of appearance model classifiers allows a huge improve-
ment in specificity at the 95% sensitivity point. The false positive rate drops from
around 20% with current morphometric methods to about 4%, whilst in the lumbar
the false positive rate is halved. The significance of these results is confirmed by the
results of the McNemar tests.
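For readers unfamiliar with the McNemar test, the following minimal sketch shows how such a statistic can be computed from the two discordant counts of a paired comparison (cases that only the first classifier gets right, and cases that only the second gets right). It uses the uncorrected form of the statistic; whether the figures in Table 7.7 include a continuity correction is not stated here, and the counts in the example are hypothetical rather than taken from our data.

```python
# 5% critical value of the chi-squared distribution with 1 degree of freedom.
CHI2_1DF_5PCT = 3.84


def mcnemar_statistic(only_first_correct: int, only_second_correct: int) -> float:
    """Uncorrected McNemar statistic (b - c)^2 / (b + c) for paired classifiers.

    b = cases classified correctly by the first classifier only,
    c = cases classified correctly by the second classifier only.
    Cases on which both classifiers agree do not enter the statistic.
    """
    b, c = only_first_correct, only_second_correct
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if b + c == 0:
        return 0.0  # no discordant cases: no evidence of a difference
    return (b - c) ** 2 / (b + c)


# Hypothetical discordant counts, for illustration only.
stat = mcnemar_statistic(only_first_correct=12, only_second_correct=40)
print(f"statistic = {stat:.2f}, significant at 5%: {stat > CHI2_1DF_5PCT}")
```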
At a sensitivity of 95% we obtain an overall false positive rate (FPR) of 4.9% using
appearance classifiers (roughly 5% equal error rate), whereas this increases to 12.4%
using only the shape parameters, and 18.3% using Eastell-McCloskey (or 16.2% with
a height ratio discriminant). The improvement in specificity is particularly marked
for the grade 1 fractures, as can be seen on Figure 7.4. The overall figure of 95%
sensitivity for the appearance classifier translates to 85.4% sensitivity against grade
1 fractures, 98.5% for grade 2 and 100% for grade 3 fractures (assuming the same
false positive rate of 4.9%). This compares to 75.2% against grade 1 fractures for a
shape classifier or 64.0% for a height ratio discriminant (65.2% Eastell-McCloskey),
at the same specificity. This indicates that the underlying appearance model does
capture some more subtle discriminating features than the simple height ratio or
shape classifiers; for example, information about the crispness of the edge of the
vertebral body. This may allow for example some of the false positive short vertebral
height wedge deformities, typical of the mid-thoracic region, to be rejected by the
appearance classifier. Once the fracture has reached grade 2, there is less to choose
between the three classifier types over much of the sensitivity range. At the same 4.9%
false positive rate, the shape and Eastell-McCloskey classifiers produce sensitivities of
98.5% (identical to appearance classifier) and 92.5% respectively. On severe fractures,
all linear discriminants are so good by this stage that it is not possible to tell any
difference.
Smyth et al [126] reported on results for classifying vertebrae using shape parameters,
but the dataset was small and confined to lumbar vertebrae. Our results extend
the method by using the appearance parameters. Furthermore in [126] a quadratic
classifier was used, which is theoretically optimal for Gaussian distributions with
unequal covariance matrices. We found that quadratic classifiers performed slightly
worse than the simpler linear classifier, even when stepwise regression was used to
reduce the number of features. This is probably because the number of fractured
training examples was too small to reliably estimate the population covariance ma-
trix of fractured parameters. In [126] a larger relative improvement was reported
between the shape parameter and height ratio classifier. However [126] used a more
complex shape model, including both the endplate inner contour, and the outer corti-
cal rim; whereas our shapes model only the endplate outer edge. We would expect to
capture at least a similar amount of information in the appearance model, where for
example secondary edges cause gradient highlights, which should produce particular
appearance mode parameters. De Bruijne et al [17] propose a Neighbour-Conditional
Shape Model. This predicts the expected (normal) shape given several neighbouring
vertebrae, and uses the total deviation from the predicted shape to classify a verte-
bra. The method was evaluated on lumbar radiographs. This does employ additional
prior information about the interrelations between vertebrae, but by using only shape
the method may be more prone than appearance-based classifiers to false positives
on non-fracture deformities. Our appearance classifier specificity at 95% sensitivity
seems good compared to that reported in [17] (95% compared to 84% in [17]), but
of course the datasets are different, and [17] used radiographs, which have the added
difficulty of projective effects.
Li et al [88] derived sensitivity/specificity figures for three radiologists using the semi-
quantitative method, compared to a gold standard derived by a consensus reading
involving also a fourth radiologist expert in the SQ method. The median sensitivity
was 88% with a specificity of 98% (so FPR of 2%). Our overall sensitivity with the
appearance classifier at this FPR is 88.3%, strikingly similar to that of an experienced
radiologist using SQ on radiographs. Of course our gold standard is less rigorous than
that of [88], and the radiologists’ reporting in our study might have been different in
some cases on radiographs rather than with the poorer quality DXA images. Both our
DXA gold-standard consensus reading and the classifier could have been wrong about
particular vertebrae, if compared to an even more rigorous gold standard derived
from a consensus reading of radiographs. Nevertheless it is interesting that, within
the limitations of DXA image quality, the concordance of our appearance classifier
with expert reading can be configured to be comparable to that of inter-radiologist
concordance using SQ on radiographs.
Examining Table 7.8 we see that at the coarser level of an overall patient result,
there is less difference between the different classifiers than would be expected from
single-vertebra performance. The overall patient FPR (where a positive patient result
is one where any of the vertebrae is diagnosed as fractured) for a 5% single vertebra FPR
is around 19% for shape and appearance classifiers, which is about half what would
be expected if all vertebrae were independent. When OR-ing all individual vertebrae,
there is little difference between shape and appearance classifiers, though these still
outperform morphometric methods. On the whole, differences in classifier performance
appear swamped by the statistical correlations between fracture occurrence in
different vertebrae in the same individual. This may be partly a result of our dataset
being deliberately fracture-enriched, including an over-representation of severely
osteoporotic patients. In a more truly representative population, we might see more
of the individual vertebrae performance differences preserved. Also these diagnosis
results at the patient level reflect the system performance viewed as a stand-alone
automatic system. In reality we envisage our methods being used as an aid to a
clinician, and in this context the appearance classifier would result in fewer overall false
alarms at the vertebral level, which might lessen clinician workload in re-examining
each false positive vertebra on the patient’s image.
7.6 Classifying given a semi-automatic segmentation
7.6.1 Semi-automatic method
The previous results are based on a detailed manual segmentation of all the vertebrae,
and so represent an idealised classifier performance. We also investigated classifier
performance using the semi-automatic segmentation obtained from our AAM methods
as described in the previous chapter. We stored the segmented shapes obtained
by using quintet sub-models with the fractured alternative initialisation method (see
section 6.5.6). In fact for each image we saved 10 replications, using different random
errors in the initialisation (i.e. randomising over the vertebral centre locations
to simulate the precision of a clinician clicking on the vertebral centres). Thus in
these experiments the test-set has each image present 10 times, but with slightly dif-
ferent segmentations. The vertebral shape and appearance models, and all classifiers
were then trained exactly as above (i.e. using the manual segmentations for classifier
training), and a similar leave-one-out experiment was run, but this time each test
case was evaluated using each of its 10 automatic segmentations. Sensitivity and
corresponding false positive rates were then evaluated exactly as before.
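The evaluation of a false positive rate at a fixed sensitivity can be sketched as follows. This is an illustrative reconstruction, not the code used in this thesis: the function name is hypothetical, and it assumes each vertebra has a scalar classifier score in which larger values mean "more fracture-like" and a binary gold-standard label. With replicated segmentations, one possible approach (again an assumption) is simply to pool all replications into the score list.

```python
import math


def fpr_at_sensitivity(scores, labels, target_sensitivity):
    """Return (false positive rate, threshold) at the strictest threshold
    that still detects at least the target fraction of fractured cases.

    labels: 1 = fractured, 0 = normal. A case is flagged when score >= threshold.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one fractured and one normal example")
    # Number of fractured cases that must be detected; the small epsilon
    # guards against floating-point round-up in the product.
    k = max(1, math.ceil(target_sensitivity * len(pos) - 1e-9))
    threshold = pos[k - 1]
    false_positives = sum(1 for s in neg if s >= threshold)
    return false_positives / len(neg), threshold


# Hypothetical scores, for illustration only.
scores = [0.9, 0.8, 0.7, 0.2, 0.75, 0.3, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(fpr_at_sensitivity(scores, labels, 0.75))
```

Sweeping the target sensitivity over a range of values traces out the ROC curve from the same scores.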
7.6.2 Semi-automatic Results
Figure 7.8 shows the ROC curves obtained from the semi-automatic segmentations,
with all three spinal regions combined. ROC curves for the shape and appearance
classifiers are given in this Figure, with a baseline ROC curve (with legend “Heights”)
for 3 height morphometry (“Eastell-McCloskey hybrid”). It can be seen that overall,
below a sensitivity of 75%, the shape and appearance classifiers are almost indistinguishable,
but as the FPR increases the appearance classifier dominates. Both shape
and appearance classifiers appear better than height-based morphometry. The over-
all ROC curve does mask certain performance differences in the different regions of
the spine. Table 7.9 shows the classifier sensitivities at FPR of 1%, 2.5% and 5%
for the 3 spinal regions, and 3 fracture grades. The advantages of the appearance
classifier seem greatest in the mid-thoracic spine, which accords with the results with
the “gold-standard” manual segmentation, and which interestingly is where morpho-
metric methods are particularly prone to false positives on mild wedges due to other
diseases (e.g. degenerative disc disease). The appearance classifier gives the best sen-
sitivity at 5% FPR, but at lower FPR (1%) it is outperformed at T10 and below by a
pure-shape classifier, and in the lower thoracic spine even by standard morphometry.
The advantages of the appearance classifier seem to manifest more as the sensitivity
is increased. Overall the sensitivity of the appearance classifier at 5% FPR is reduced
from around 95% with ideal (manual) segmentation to 86% with the semi-automatic
segmentation, a loss in sensitivity of 9% due to segmentation errors. The reduction
remains approximately constant across all fracture grades - even severe fractures only
give 92% sensitivity, which is not surprising given that 8% of the (grade 3) edges had
been mis-fitted to a neighbouring vertebra (see table 6.19).
Figure 7.8: ROC curves for (semi)automatically-segmented images, with all vertebrae combined
Figure 7.9 shows ROC curves for the appearance classifier only, but separated by the
3 fracture grades. The areas under the various ROC curves for classifier diagnoses
given the semi-automatic segmentation are given in table 7.10. The appearance
classifier gives the best ROC curve area in every category.
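One standard way to obtain the area under an ROC curve is via the rank (Mann-Whitney) interpretation: the area equals the probability that a randomly chosen fractured case scores higher than a randomly chosen normal case, counting ties as one half. The sketch below illustrates this; it is not necessarily how the areas in Table 7.10 were computed, and the scores shown are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 = fractured, 0 = normal; higher score = more fracture-like.
    Quadratic in the number of cases, which is adequate for a small sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one fractured and one normal example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # fractured case ranked above normal case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores, for illustration only.
scores = [0.9, 0.8, 0.7, 0.2, 0.75, 0.3, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(roc_auc(scores, labels))
```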
The patient-level sensitivity/FPR are summarised in Table 7.11. At an individual vertebra
FPR of 2.5%, a patient level FPR of under 10% was obtained for an appearance
classifier, with 91% sensitivity. To obtain 95% sensitivity at patient level, it would be
necessary to increase the FPR to around 22%. Although these patient statistics are
relative to our (fracture-enriched) dataset, they indicate a good enough performance
to encourage practical use of the technique as an aid to a clinician, or possibly even
in triage. We also anticipate that with a modest degree of user-correction of a few
poorly segmented vertebrae, the system performance could be improved to approach
the limits established for the classification on a fully manual segmentation.
Spinal Region    Classifier        Sensitivity given FPR
or Grade         Type               1%     2.5%    5%
Mid-Thoracic     E-M Height        41.0    58.9   73.1
                 Shape LD          54.8    66.0   76.2
                 Appearance LD     61.5    71.4   83.3
Lower-Thoracic   E-M Height        69.4    75.6   82.7
                 Shape LD          66.7    82.6   88.5
                 Appearance LD     60.0    79.1   89.4
Lumbar           E-M Height        59.3    68.1   79.2
                 Shape LD          74.6    78.6   82.4
                 Appearance LD     72.1    82.1   87.0
Grade 1          E-M Height        34.1    45.5   59.9
                 Shape LD          44.2    62.0   69.5
                 Appearance LD     42.0    61.0   75.2
Grade 2          E-M Height        58.8    70.3   75.4
                 Shape LD          70.0    81.4   86.3
                 Appearance LD     74.7    83.2   89.1
Grade 3          E-M Height        78.5    84.4   87.9
                 Shape LD          81.3    85.2   90.0
                 Appearance LD     80.2    86.6   92.2
All              E-M Height        57.6    66.4   74.8
                 Shape LD          66.0    77.0   82.6
                 Appearance LD     66.9    77.8   86.0
Table 7.9: Classifier Sensitivities for 1%, 2.5% and 5% FPR, for semi-automatic segmentation. E-M Height means the Eastell-McCloskey morphometric method.

                     Spinal Region             Fracture Grade
Classifier           MT      LT      Lum       G1      G2      G3      All
Eastell-McCloskey    0.8878  0.9456  0.9131    0.8623  0.9238  0.9586  0.9164
Height Ratios LD     0.8899  0.9462  0.9119    0.8672  0.9255  0.9507  0.9168
Shape LD             0.9167  0.9617  0.9347    0.8889  0.9502  0.9672  0.9376
Appearance LD        0.9366  0.9630  0.9484    0.9085  0.9615  0.9708  0.9490
Table 7.10: Area under ROC curves given semi-automatic segmentation. Columns labelled MT, LT and Lum refer to mid-thoracic (T7-T9), lower-thoracic (T10-T12) and lumbar (L1-L4) vertebrae respectively; those labelled G1, G2, G3 refer to fracture grades 1, 2 and 3 respectively.

Figure 7.9: ROC curves for appearance classifier on (semi)automatically-segmented images, for the 3 fracture grades

Classifier Type      Vertebral FPR (%)   Patient FPR (%)   Patient Sensitivity (%)
Eastell-McCloskey          2.5               13.1                 89.4
Shape LD                   2.5               10.0                 89.8
Appearance LD              2.5                9.8                 90.8
Eastell-McCloskey          5.0               25.6                 92.7
Shape LD                   5.0               19.8                 94.7
Appearance LD              5.0               21.9                 95.0
Table 7.11: Overall Patient-Level FPR and Sensitivity given individual vertebrae FPR

7.7 Conclusions
We have developed linear discriminants for detecting vertebral fracture using shape
and appearance parameters. The main advantages of using the more complex
appearance or shape parameter discriminants rather than height ratio methods are in
detecting grade 1 fractures. For lumbar vertebrae, the appearance-based classifier
can approximately halve the false positive rate when operating at 95% sensitivity
compared with traditional quantitative morphometric methods; and for thoracic ver-
tebrae the reduction is approximately four-fold.
There is generally an advantage in using a fuller shape description, though this is not
always apparent with lumbar vertebrae, but the better performance of the appear-
ance classifiers indicates that the underlying appearance model also captures textural
indicators of fracture, such as the more complex edge structure associated with end-
plate collapse. However the more complex appearance classifier also displays some
sign of undertraining, as it does not perform well at very high sensitivity (98%). Nev-
ertheless this sensitivity level is essentially unrealistic, as it produces unacceptable
false positive rates for all classifiers.
The results obtained when classifying on the basis of a semi-automatic segmentation
are not as good, but still very promising. At a false positive rate (per vertebra)
of 5% the overall sensitivity is 86% for the appearance classifier, compared to 75%
for standard morphometric methods. We would anticipate a modest degree of user-
correction of the segmentation, which should allow the results to approach those of
the manually segmented solutions.
Further improvements in appearance-parameter based classification might be made
by using more sophisticated training methods and non-linear kernel methods: for
example a Support Vector Machine [136] with radial basis function kernels. Neither
have we investigated what improvement might be made by using information from
neighbouring vertebrae. We have used the appearance parameters as a compact way
of capturing both shape and texture information, but one disadvantage of using a
model is that in a few cases which are not well fitted by the appearance model, the
parameters may not represent the actual texture of the vertebra very well. It might
also be possible to develop purely data-driven texture descriptors that can be used
in similar classifiers. There is thus further scope for developing reliable quantitative
methods of classification. So although current morphometric methods are not widely
trusted to be reliable, there is still scope for reliable quantitative appearance-based
methods. Indeed the 5% equal error rate we have already established is in practice
probably not much worse than typical inter-radiologist concordance, especially when
one considers the subjectivity involved in the widely accepted Genant SQ method.
Our development of more sophisticated quantitative methods also creates the possi-
bility of replacing the current grading system (grades 1-3) with a more continuous
measure based on the perpendicular distance from the classification boundary. This
might make it quicker to spot subtle worsening of existing fractures in longitudinal
studies (e.g. in a one-year follow up). In principle this could improve the statisti-
cal power of clinical trials, resulting ultimately in cost-savings as fewer subjects (or
shorter studies) would then be needed. Similar points are made by de Bruijne et al
in [17]. There would also be the future potential to combine a quantitative DXA
vertebral fracture measure with a BMD assessment using the same equipment. For
example a patient who might fall in the osteopenia class on BMD alone, but who also
had say two grade 1 vertebral fractures, might be considered as in fact osteoporotic.
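The continuous measure suggested above can be sketched as the signed perpendicular distance of a parameter vector from a linear decision boundary w·x + b = 0. The function below is a minimal illustration under that assumption; the weights, bias and sign convention (positive on the fractured side) are hypothetical and are not taken from our trained discriminants.

```python
import math


def signed_distance_to_boundary(params, weights, bias):
    """Signed perpendicular distance of `params` from the hyperplane
    weights . x + bias = 0. By the convention assumed here, positive
    values lie on the fractured side and negative values on the normal side."""
    if len(params) != len(weights):
        raise ValueError("params and weights must have the same length")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("weights must not all be zero")
    return (sum(w * x for w, x in zip(weights, params)) + bias) / norm


# Hypothetical two-parameter discriminant, for illustration only.
weights, bias = (3.0, 4.0), -5.0
baseline = signed_distance_to_boundary((0.0, 0.0), weights, bias)
follow_up = signed_distance_to_boundary((3.0, 4.0), weights, bias)
print(baseline, follow_up, follow_up - baseline)
```

In a longitudinal setting, the change in this distance between visits could serve as a continuous indicator of progression, rather than waiting for a change of discrete grade.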
Chapter 8
Segmentation of vertebrae in
radiographs
8.1 Introduction
In previous chapters we have presented results for segmenting and classifying ver-
tebrae on DXA images. In this chapter we present results on the segmentation of
radiographs. The work of this chapter has been previously published in [115].
Radiographs are somewhat more challenging as the fan beam used in conventional
radiography can lead to parallax errors and apparent scale changes, and there is
considerable variation of contrast across the image. In this chapter we apply our
segmentation methods to a dataset of lumbar radiographs, as the first step in obtain-
ing more reliable quantitative classification of vertebral fracture must be to achieve a
reliable automatic segmentation. Some success in automatically locating vertebrae in
radiographs has been reported by Howe et al [77] and by de Bruijne et al [18]. Howe
used an AAM approach not dissimilar to ours. First a Generalized Hough Transform
was used to locate a plausible starting position. Then a global AAM was fitted, and
finally the appearance model parameters for individual vertebrae were reset to half-
way between the global model solution and the mean; and then individual vertebrae
AAMs were run to give the final solution. However we have not found individual
vertebrae models to be reliable, and we have also found that the use of the global
model first tends to locate local minima when fractures are present. So we have used
the sub-model approach initially trialled on DXA images.
De Bruijne et al [18] used a shape model to generate a large ensemble of candidate
solutions, and then evolved the solution set using shape particle filtering in conjunction
with a bank of nearest neighbour classifiers based on a feature set of Gaussian
derivative filters up to 3rd order. Each pixel in a candidate shape is assigned a
probability of being background, within vertebra, or on the vertebral boundary, and
then an overall likelihood measure is derived for the whole shape and used to evolve
the particle ensemble. The need to evolve a large candidate set means the method is
computationally rather expensive, but it is a fully automatic method, and the shape
mutation methods used mean that the final solutions are not over-constrained by
shapes in the training set.
In this chapter we assess the accuracy of the triplet AAM sub-model approach to
segmenting vertebrae on lumbar radiographs. Although radiographs typically have
better resolution and signal to noise ratio, the shape and appearance of the vertebrae
are more complex due to projectional parallax effects. The divergent beam used in
conventional radiography causes a variable scaling across the image, and can cause
severe apparent tilting of the vertebral bodies. Also as the more extreme vertebral
bodies tend to be obliquely irradiated, their superior and inferior endplates typically
appear as elliptical rims, rather than the more linear edge typical of DXA. Figure
8.1 shows a typical lumbar radiograph, with some contrast enhancement to ensure
all vertebrae are simultaneously visible. Figure 3.6 in Chapter 3 shows an example
of severe apparent tilting.
8.2 Materials and Methods
8.2.1 Data
The images used were obtained from anonymised radiographs collected in a previous
epidemiological study [23], with the permission of Professor Cyrus Cooper. We have
thoracic and lumbar radiographs, but have initially just used lumbar radiographs
as these are the more straightforward case due to less clutter (e.g. from lungs and
ribs), and a lower fracture prevalence. The dataset consisted of 250 lumbar radiographs,
digitised using a Vidar Diagnostic Pro Advantage digitiser (Vidar Systems Corp,
Herndon VA, USA) at 300dpi and
12 bit intensity resolution. This Vidar digitiser allows a variety of analogue to digital
conversion mappings. As there is typically a large range of brightness/contrast at
different vertebral levels in the radiographs it was important to select a transform
that preserved information across a large dynamic range. The default logarithmic
transform did not work well on these images, as it typically “washed out” the often
brighter vertebrae in the lower lumbar, whereas using a more nearly linear transform
had the opposite effect of losing information in the typically darker upper portion
(T12/L1). After some initial experimentation it appeared that the “power 3” option
(the manufacturer’s designation; in fact it appears to be a cube root)
gave the best compromise performance.
Figure 8.1: Lumbar radiograph. a) shows the raw image (contrast enhanced); b) shows the automatically located vertebral contours superimposed.
Figure 8.2: Zoomed in view of L3 showing its shape model points
The digitised images were manually annotated using an in-house tool (written by the
author in C++ using the Trolltech Qt GUI library and existing AAM code, and using
bootstrapped AAM submodels, see section 6.3), by an experienced
radiographer, supervised by the author and with some advice from JEA. Each
vertebral contour uses 60 points around the vertebral body with 8 further points
around the pedicles. The endplate rims were modelled using a quasi-elliptical shape,
rather than the single edge previously used for DXA images. No images were included
where the projectively induced tilting was so severe that lumbar vertebrae appeared
to interpenetrate each other (with the occasional exception of the extreme T12/L1 or
L4/L5 pairs). Such images are extremely difficult to read, even by an expert radiol-
ogist, and can lead to unreliable diagnosis. Extreme projective tilting can be caused
by setup error, patient positioning, or can be the result of an intrinsic scoliosis. Figure
8.2 shows a zoomed in view of L3 with its shape model points displayed.
8.2.2 AAM approach
The dynamic linked AAM approach of chapter 6 was used to fit a sequence of three
AAMs composed of overlapping vertebral triplets covering the spine from L4 up to
T12. Note that L5 is not normally used in vertebral fracture assessment as it is very
rare for L5 to suffer osteoporotic fracture, and it may be obscured by the iliac crest.
The three triplet models used were T12/L1/L2, L1/L2/L3 and L2/L3/L4. A slight
variation from the DXA method was that affine transforms rather than similarity
transforms were used for the shape model pose. Allowing shearing and anisotropic
scaling (i.e. different x and y scaling) is a better approximation to the perspective distortion induced by the
fan beam than assuming isotropic scaling. Each triplet sub-model has its own affine
pose parameters, thus allowing for variation in projective effects across the image.
Figure 8.3 shows the variation in the first two shape modes of the L2-centred triplet.
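The benefit of an affine over a similarity pose can be illustrated with a small least-squares sketch. It aligns a set of 2D model points to target points under each transform class and compares the residuals; when the target differs from the model by unequal x and y scaling, the affine fit absorbs the difference whereas the similarity fit cannot. This is a stand-alone illustration with hypothetical function names and point sets, not the pose estimation code of our AAM implementation.

```python
def _centre(points):
    """Subtract the centroid, so that translation is factored out of the fit."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def similarity_residual(model, target):
    """Sum of squared errors after the best rotation + isotropic scale + shift."""
    p = [complex(x, y) for x, y in _centre(model)]
    q = [complex(x, y) for x, y in _centre(target)]
    # A 2D similarity is multiplication by a single complex number a.
    a = sum(pi.conjugate() * qi for pi, qi in zip(p, q)) / sum(abs(pi) ** 2 for pi in p)
    return sum(abs(qi - a * pi) ** 2 for pi, qi in zip(p, q))


def affine_residual(model, target):
    """Sum of squared errors after the best general 2x2 linear map + shift."""
    p, q = _centre(model), _centre(target)
    sxx = sum(x * x for x, _ in p)
    sxy = sum(x * y for x, y in p)
    syy = sum(y * y for _, y in p)
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("model points are collinear")

    def fit_row(values):
        # Solve the 2x2 normal equations for one output coordinate.
        bx = sum(v * x for v, (x, _) in zip(values, p))
        by = sum(v * y for v, (_, y) in zip(values, p))
        return (bx * syy - by * sxy) / det, (by * sxx - bx * sxy) / det

    a11, a12 = fit_row([x for x, _ in q])
    a21, a22 = fit_row([y for _, y in q])
    return sum((qx - (a11 * x + a12 * y)) ** 2 + (qy - (a21 * x + a22 * y)) ** 2
               for (x, y), (qx, qy) in zip(p, q))


# Hypothetical example: a unit square stretched by 1.2 in x and 0.9 in y.
model = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
target = [(1.2 * x, 0.9 * y) for x, y in model]
print(similarity_residual(model, target), affine_residual(model, target))
```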
Figure 8.3: L2 triplet 3SD variation in first (left) and second (right) shape modes
T12 was included to form the uppermost L1-centred triplet, although there were
some lumbar radiographs in which T12 was not fully visible. Nevertheless in general
T12 should be visible on a lumbar radiograph, and results from DXA lead us to
believe that it is helpful in fitting L1 to also include the neighbouring T12 in the
model. In fact sometimes T12 is better visualised on the lumbar radiograph than on
the thoracic. There was often a high variation in brightness and contrast across the
different vertebral levels. For example T12 or even L1 were often very dark, and could
often not be seen without some local contrast optimisation, whereas L4 typically had
an over-bright “washed-out” appearance. Figure 8.1 is typical in this respect (though
the displayed figure is after contrast enhancement - L1 would be barely visible on the
original). Another advantage of decomposing the overall shape into sub-structures
is that the texture normalisation can be better tuned to the local brightness and
contrast, where there is substantial variation in these across the image.
As there is little useful information inside the vertebral body we used profile samplers
for the AAM texture model, rather than the triangulated region samplers classically
used with an AAM. We have already established that profile samplers work better
on DXA images (see chapter 6). The profile samplers used were similar to those
used in DXA images, but due to the better resolution of radiographs they included
additional scales. The samplers extracted the gradient perpendicular to the local
shape, and this was non-linearly renormalised using a sigmoidal function tuned to
the mean absolute gradient [32, 123] over the entire profile set. We used a 4-level
multi-resolution pyramid search, to extend the convergence zone, with 8 samples
either side of the shape. The finest level step size was 0.375mm, and the images were
pre-smoothed up to a resolution of 0.1694 mm per pixel (i.e. one level of Gaussian
pyramid up-smoothing). Thus the profile step size represents about 2 pixels (at each
level of the pyramid). The extracted gradient is Gaussian smoothed across the local
tangent, with a smoothing window equal to the step length (on each side). We also
experimented with a profile sampler which concatenated this profile with a similar
profile sampler extracting a measure of image corner strength (“cornerness”), related
to the Harris corner detector [73], as in [123]. As the corners of the vertebrae are of
physical interest in standard morphometry, it was thought that including a cornerness
measure in the AAM might improve the accuracy at points of important diagnostic
interest. Furthermore the projective parallax and oblique beam orientation tend to
introduce curved features in the region of the profile. The cornerness measure has
the further advantage that it implicitly includes feature information from a somewhat
larger region, as the measure is based on the structure tensor ($\nabla I \, \nabla I^T$) §, which is
Gaussian smoothed over a locally square region with semi-width twice the profile step
length. See [123] for details.
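The non-linear renormalisation of the sampled gradients can be sketched as follows. The exact sigmoidal function is not stated in this section, so the form f(g) = g / (|g| + g0), with g0 the mean absolute gradient over the profile set, is an assumption here; the function name `normalise_gradients` and the synthetic profiles are likewise hypothetical.

```python
import numpy as np

def normalise_gradients(profiles):
    """Squash raw gradient samples into (-1, 1) with f(g) = g / (|g| + g0).

    profiles: (n_profiles, n_samples) array of signed gradients sampled
    along the profile normals. g0 is the mean absolute gradient over the
    whole profile set, acting as a soft threshold between weak and strong
    edges.
    """
    g = np.asarray(profiles, dtype=float)
    g0 = np.abs(g).mean()
    if g0 == 0.0:                       # flat patch: nothing to rescale
        return np.zeros_like(g)
    return g / (np.abs(g) + g0)

# 17 samples per profile would correspond to 8 samples either side of the
# shape point plus the point itself (an assumption about the layout).
rng = np.random.default_rng(0)
dark_level = rng.normal(0.0, 2.0, size=(6, 17))   # low-contrast vertebra
bright_level = 10.0 * dark_level                  # same edges, 10x contrast

same = np.allclose(normalise_gradients(dark_level),
                   normalise_gradients(bright_level))
```

Because g0 scales with the data, the normalised profiles are unchanged by a global contrast change (`same` is true), which is why such a normalisation suits images whose brightness and contrast vary between vertebral levels.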
8.2.3 Experiments
Leave-25-out tests were performed over the 250 images. As AAMs perform local
search an approximate initialisation somewhere in the vicinity of the vertebrae is
needed. The initialisation was performed as for DXA (method 2) on the approximate
centres of the vertebrae. We simulated the precision of a clinician clicking on the
centres of each vertebra by adding zero-mean Gaussian errors with SD of 2mm in
the y-direction (along the spine) and 3mm in the x-direction (as for DXA). Twenty
replications (i.e. random initialisations) of each image were performed.
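The experimental protocol can be sketched as below. The standard deviations, fold size and replication count come from the text; how the folds were actually chosen, the coordinate convention (x across, y along the spine) and all names (`leave_k_out_folds`, `perturb_centres`, the example centres) are assumptions for illustration only.

```python
import random

def leave_k_out_folds(n_images, k, seed=0):
    """Shuffle image indices and split them into disjoint test folds of size k."""
    order = list(range(n_images))
    random.Random(seed).shuffle(order)
    return [order[i:i + k] for i in range(0, n_images, k)]

def perturb_centres(centres_mm, sd_x=3.0, sd_y=2.0, rng=None):
    """Add zero-mean Gaussian click errors (in mm) to vertebral centres."""
    rng = rng or random.Random()
    return [(x + rng.gauss(0.0, sd_x), y + rng.gauss(0.0, sd_y))
            for x, y in centres_mm]

# 250 images in folds of 25: each model is trained on the other 225 images.
folds = leave_k_out_folds(n_images=250, k=25)

# Hypothetical centres for T12..L4, spaced 30mm apart along the spine (y).
rng = random.Random(42)
true_centres = [(50.0, 30.0 * i) for i in range(5)]
replications = [perturb_centres(true_centres, rng=rng) for _ in range(20)]
```

Each element of `replications` would serve as one simulated set of clinician clicks from which a search is started.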
8.3 Results
The accuracy of the search was characterised by calculating the absolute point-to-line
distance error for each point on the vertebral body. Table 8.1 compares results for
the two profile samplers used with the data separated into points within normal or
fractured vertebrae. Each row gives the mean, median and 75th percentiles, and the
percentage of point errors in excess of 2mm. The threshold of 2mm would be around
2.5 SDs of manual precision, and can be viewed as a point failure indicator.
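The summary statistics reported in Table 8.1 can be computed along the following lines. This is a sketch only: the point-to-line distance is taken here as the distance to the nearest segment of a polyline outline, which is an assumption, and the function names and example coordinates are hypothetical.

```python
import math
import statistics

def point_to_segment(p, a, b):
    """Euclidean distance from point p to the segment a-b (all (x, y) in mm)."""
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))                 # clamp to the segment ends
    return math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))

def point_to_outline(p, outline):
    """Distance from p to the nearest segment of a polyline outline."""
    return min(point_to_segment(p, a, b) for a, b in zip(outline, outline[1:]))

def summarise(errors_mm, threshold_mm=2.0):
    """Mean, median, 75th percentile and percentage of errors above threshold."""
    errors = sorted(errors_mm)
    q = statistics.quantiles(errors, n=4, method="inclusive")
    return {"mean": statistics.fmean(errors), "median": q[1], "p75": q[2],
            "pct_over": 100.0 * sum(e > threshold_mm for e in errors) / len(errors)}

outline = [(0.0, 0.0), (10.0, 0.0)]                       # annotated edge (mm)
found = [(1.0, 0.5), (5.0, -1.0), (9.0, 3.0), (12.0, 0.0)]
errors = [point_to_outline(p, outline) for p in found]
stats = summarise(errors)
```

The `pct_over` entry plays the role of the point failure indicator, counting errors strictly greater than the 2mm threshold.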
Table 8.1 shows that the results are worse for fractured than normal vertebrae.
§with the Cartesian image gradient ∇I as a column vector
                      Gradient Only Sampler     Gradient & Corner Sampler
Accuracy              Normal      Fractured     Normal      Fractured
Statistic             Vertebrae   Vertebrae     Vertebrae   Vertebrae
Mean (mm)             0.71        1.11          0.64        1.06
Median (mm)           0.46        0.62          0.43        0.61
75%-ile (mm)          0.89        1.34          0.82        1.32
% errors > 2mm        6.2%        14.1%         4.6%        13.3%
Table 8.1: Search Accuracy Percentiles by Fracture Status for the two profile samplers used
A more detailed examination by fracture grade gives mean accuracies of 0.84mm,
1.79mm and 3.35mm for fracture grades 1, 2 and 3 ¶ respectively (for the gradient
and corner sampler). However there was a low fracture prevalence in the lumbar
region in the sample, and these figures are based on 17 grade 1 fractured vertebrae,
two grade 2, and only a single grade 3 fracture.
The more sophisticated profile sampler including a corner measure appears to produce
a small improvement in accuracy of around 0.07mm. This represents a 10% reduction
in mean error. We confirmed that this difference is statistically significant at the 1%
level by calculating a 99% confidence interval for the mean difference between the two
samplers, using hierarchical bootstrap resampling of the differences in errors between
the two profiles as in [122]. This symmetric (in probability) 99% bootstrapped
confidence interval on the mean difference was [0.048,0.082]. As this interval does
not span zero, the difference is significant at the 1% level.
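A bootstrap test of this kind can be sketched as follows. The scheme actually used is described in [122]; the two-level resampling below (images first, then point differences within each resampled image), the equal-tailed percentile interval and the synthetic data are all assumptions made for illustration.

```python
import random
import statistics

def hierarchical_bootstrap_ci(diffs_by_image, n_boot=1000, level=0.99, seed=0):
    """Approximate percentile CI for the mean paired difference.

    diffs_by_image: list of lists; each inner list holds the per-point error
    differences (sampler A minus sampler B, in mm) for one image.
    """
    rng = random.Random(seed)
    n_images = len(diffs_by_image)
    boot_means = []
    for _ in range(n_boot):
        pooled = []
        for _ in range(n_images):                            # level 1: images
            image = diffs_by_image[rng.randrange(n_images)]
            pooled.extend(rng.choices(image, k=len(image)))  # level 2: points
        boot_means.append(statistics.fmean(pooled))
    boot_means.sort()
    tail = (1.0 - level) / 2.0                # equal probability in each tail
    lo = boot_means[int(tail * n_boot)]
    hi = boot_means[int((1.0 - tail) * n_boot) - 1]
    return lo, hi

# Synthetic differences (mm): 30 images with 20 points each, all non-negative.
data_rng = random.Random(1)
synthetic = [[data_rng.uniform(0.0, 0.14) for _ in range(20)] for _ in range(30)]
lo, hi = hierarchical_bootstrap_ci(synthetic, n_boot=500)
significant = lo > 0.0                        # interval does not span zero
```

Resampling whole images before resampling points respects the fact that point errors within one image are not independent of each other.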
8.4 Discussion
8.4.1 Overall Accuracy Performance
The mean segmentation accuracy of 0.64mm on normal vertebrae is comparable to
manual precision in point placement, and to our previous results on DXA images [117].
Over 95% of points in normal vertebrae are located to within 2mm of the manually
annotated outline. However the dataset contained a very low prevalence of fractured
vertebrae, and so the shape models are evidently undertrained for fractures above
¶i.e. mild, moderate and severe fractures, see [65]
grade 1. Therefore within the limitations of the small fractured sample it appears that
the results deteriorate with increasing fracture grade. However given our previous
reasonable accuracy achieved on fractured vertebrae with DXA images, we believe
that this problem could be solved by adding more fractured training examples. The
mean accuracy is better than other comparable cited figures in the literature [77, 18].
For example de Bruijne et al [18] obtained a mean point-to-contour accuracy of 1.4mm
on lumbar radiographs using shape particle filtering, which is more than double the
size of error achieved by our AAM approach. On the other hand this was for a fully
automatic search with no approximate manual initialisation such as we use. Howe
et al [77] state that 68% of points on lumbar radiographs were located to within 25
pixels. We understand the dataset used had a resolution of 0.174mm per pixel, so
this is equivalent to a 68th percentile of 4.35mm, clearly substantially worse than
our 75th percentile of 0.85mm. However again Howe et al were using a completely
automatic method, with the AAM being initialised to the best template match found
by an initial Generalised Hough Transform.
8.4.2 Conclusion
In conclusion the results confirm the feasibility of substantially automating vertebral
segmentation on radiographs, although the shape models need better training on
fractured vertebrae. Within the limitations of the dataset, the projective effects of
spinal radiography do not appear to present any substantial problem to an AAM-
based approach.
8.4.3 Future Work
Further fractured training examples are clearly required, and we also intend to extend
the work to the thoracic spine, which tends to contain more osteoporotic fractures.
A dataset is currently being annotated by an experienced radiographer.
Use of the shape and appearance parameters of the fitted models could in future
provide a means of classifying vertebrae as normal, fractured, or otherwise deformed,
as we have already demonstrated for DXA data. Current simplistic quantitative mor-
phometric methods are unreliable, especially for mild fractures, but the appearance
parameters may provide a quantified form of some of the more subtle aspects of visual
or semi-quantitative expert reading of vertebral fractures. We therefore view obtain-
ing a reliable automatic segmentation as the first step in achieving a Computer Aided
Diagnosis (CAD) system for the diagnosis of vertebral fracture from radiographs.
Chapter 9
Conclusions and Further Work
This chapter summarises the work described in this thesis, highlighting its novel
contributions and its potential for real clinical use. Areas of future development are
also summarised.
9.1 Summary of Original Work and Results
9.1.1 AAM methodological developments
We have presented a general algorithm for combining multiple AAM sub-models. Us-
ing multiple sub-models can mitigate the undertraining problem that can be inherent
in statistical models, and also allows for a greater range of pathological cases, or local
illumination/contrast effects. We envisaged that these sub-models could typically be
overlapped to provide linkage; although we also allow for the case where the only link-
age is provided by a global model, which is used to re-predict initial search locations
for latterly-fitted sub-models, given the latest set of already-fitted sub-models. We
also developed a technique of dynamically sequencing the sub-models, thus allowing
the fitting order to be determined by the data. This was shown to lead to modest
but significant improvements in accuracy on the DXA dataset. It also facilitates
generalisation of our method to cases where there is no natural fitting order.
The dynamic sequencing approach also lends itself to allowing multiple initialisations
of each sub-model in order to improve robustness in locating pathological cases.
9.1.2 Vertebral Segmentation
We have collated a large database of DXA images, and the dataset has been enriched
with a high fracture prevalence. This has allowed testing of the AAM segmentation
techniques under realistic operating conditions, such as would be encountered in
clinical use. The performance of the segmentation techniques has been extensively
verified against the full range of fracture grades, and good accuracy is maintained
even against severe fractures, although inevitably there are a modest number of search
failures, which increases with fracture grade. Good performance can be maintained
by using an alternative “fractured” initialisation of each AAM sub-model.
We have evaluated an extensive set of alternative AAMs. We found that profile AAMs
performed better than the classical triangulated region AAM, even when the more
sophisticated feature-AAM of Scott et al [123] was used for the latter. We optimised
the sub-model structure used, finding that groups of 5 vertebrae (quintets) were
marginally superior, though the performance gain over triplets was small, and largely
confined to severe fractures. We found that single vertebra models were unreliable.
At the other extreme we confirmed that a single global model was also unable to
accurately segment fractured cases, though with normal vertebrae its performance
was more similar to the sub-model approach. This somewhat contradicted our earlier
result obtained on a smaller training set [113]. It appears that with a small training
set there is more gain to be obtained from decomposing the structure into multiple
sub-models, but this accuracy difference is gradually eroded as the training set is
extended; but only for normal cases. The global AAM still maintains too great an
a priori bias towards the mean shape to cope with pathologies or unusual sub-shapes.
We therefore recommend our sub-model approach.
We have obtained a location accuracy that is comparable to manual precision of point
placement for osteoporotic patients in around 90% of cases.
We have also applied our methods to a dataset of 250 digitised lumbar radiographs,
with similarly good results.
9.1.3 Vertebral Classification
We investigated the use of both shape model and appearance model parameters in dis-
tinguishing vertebral fractures from both normal vertebrae, and other short vertebral
height deformities. We used linear classifiers, and optimised the proportion of texture
variance retained in the model. We compared our classifiers with methods based on
just the standard 3 vertebral heights, and demonstrated a convincing improvement
in specificity for given sensitivity. The false positive rate (per vertebra) at 95% sen-
sitivity is around 5% using an appearance classifier. Thus at 95% sensitivity we have
roughly halved the false positive rate for lumbar vertebrae compared with traditional
quantitative morphometric methods; and for thoracic vertebrae the reduction is ap-
proximately four-fold. This is a substantial improvement in the clinical applicability
of quantitative techniques. There had been a view among many radiologists that
quantitative methods were too unreliable to be of much clinical use. However we
believe that, because our appearance classifiers can capture at least some of the more
subtle distinguishing features used in expert evaluation, a reliable Computer Assisted
Diagnosis method is possible. This will be particularly valuable in situations where
DXA scans are conducted in units other than radiology departments. Furthermore
by using measures based on the distance in the appearance parameter hyperspace
from the classification boundary, we will be able to detect more subtle longitudinal
changes in patients. This should have applicability to a wide range of longitudinal
clinical trials.
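The operating point quoted above (false positive rate at 95% sensitivity) can be read off a set of classifier scores as sketched below. The convention that larger scores are more fracture-like, the function name `fpr_at_sensitivity` and the integer scores are assumptions for illustration, not the evaluation code used in this work.

```python
import math

def fpr_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """False positive rate at the highest threshold reaching the target sensitivity.

    scores: classifier outputs, larger = more fracture-like (assumed convention).
    labels: 1 for fractured, 0 for non-fractured vertebrae.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both fractured and non-fractured examples")
    # Smallest number of fractures that must be detected to reach the target.
    needed = max(1, math.ceil(target_sensitivity * len(pos) - 1e-9))
    threshold = pos[needed - 1]
    fpr = sum(s >= threshold for s in neg) / len(neg)
    return fpr, threshold

# Toy scores: 20 fractured vertebrae (10..29) and 20 normal ones (0..19).
scores = list(range(10, 30)) + list(range(0, 20))
labels = [1] * 20 + [0] * 20
fpr, threshold = fpr_at_sensitivity(scores, labels)
```

Comparing this figure between two classifiers at the same sensitivity gives the kind of "halved false positive rate" statement made above.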
We also evaluated a combined semi-automatic segmentation and classification, and
confirmed that with a semi-automatic segmentation at 5% FPR we can achieve 75%
sensitivity for grade 1 fractures, and over 90% on grades 2 and 3 combined. A modest
degree of user-correction to the segmentation should allow even better performance.
At the overall patient level (combining all vertebrae from L4-T7) this results in a
patient diagnosis sensitivity of 95% with 75% specificity; alternatively reducing the
individual vertebra FPR to 2.5% gave an overall patient specificity of over 90% with
similar sensitivity (10% equal error rate). These patient-level figures partly reflect
the fracture-enriched nature of our dataset.
9.2 Future Work
9.2.1 Other modalities
We have mostly used DXA images in this work. However the “gold standard” for
fracture evaluation in both radiology departments and clinical trials is still the spinal
radiograph (or computed radiography). We have collated an extended set of both
thoracic and lumbar radiographs and intend to evaluate both our segmentation and
classification techniques on these.
Because of its shorter scan time, single energy SXA vertebral fracture evaluation is
more common than DXA, despite the fact that the mid-thoracic vertebrae are not
as well visualised, due to soft tissue and diaphragm motion artefacts. We therefore
intend to also evaluate our techniques in single energy SXA mode.
9.2.2 Classifier improvements
We have only evaluated classifiers using a single vertebra. There could be some gain
in performance by using larger models, such as a triplet or quintet. By including the
neighbours in the model, some further shape information would be available. This
might be helpful for several reasons. Firstly if no shape information on neighbour lo-
cation is included, it can be difficult to distinguish the additional edge texture caused
by sampling into the converse edge of a neighbour from the cortical rim remnant of
a fractured endplate. Secondly the neighbour location allows better definition of the
inter-vertebral disc space, and the texture therein can be useful in differential diagno-
sis. Thirdly there may be some subtle conditional shape effects that are useful. Also
certain kinds of vertebral deformities tend to affect several neighbouring vertebrae
(e.g. mild wedging due to spondylosis).
However by increasing the number of parameters input to the classifier, there would
be a danger of causing training problems. If some spurious correlations are present
in the training set, the classifier may even generalise less well and actually perform
worse than a single vertebra one. We would therefore also investigate alternative
training methods, specifically the Support Vector Machine [136]. For purely linear
classifiers the SVM training method places no weight on instances that are far from
the boundary (e.g. severe fractures or clearly normal cases), instead being based on a
more parsimonious set of “Support Vectors” near the boundary. Also by using kernel
methods, such as a Radial Basis Function kernel, the parameter input space can be
projected into a higher dimensional space in which better linear separation can be
achieved. This allows non-linear relations in the original space to be modelled.
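The effect of a kernel can be illustrated with a toy example. This is not a trained SVM: the coefficients `alphas` are set by hand, and the XOR-style data and function names are hypothetical. It shows only that a decision function that is linear in the Radial Basis Function kernel values can separate classes which no straight line separates in the original space.

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    """k(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def kernel_decision(x, support, labels, alphas, bias=0.0, gamma=1.0):
    """SVM-style decision value: sum_i alpha_i * y_i * k(s_i, x) + bias."""
    return bias + sum(a * y * rbf_kernel(s, x, gamma)
                      for s, y, a in zip(support, labels, alphas))

# XOR layout: no straight line separates the two classes in the plane.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
labels = [-1, -1, +1, +1]
alphas = [1.0, 1.0, 1.0, 1.0]     # chosen by hand, not by SVM training

predicted = [1 if kernel_decision(p, points, labels, alphas) > 0 else -1
             for p in points]
```

In a real SVM the `alphas` and `bias` would come from the training optimisation, and only the support vectors near the boundary would have non-zero coefficients.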
The texture model used to build the appearance model implicit in the classifiers may
still not be optimal. We intend to investigate use of multi-scale texture samplers
(e.g. Gaussian smoothed derivatives at several scales), as well as the inclusion of
other feature detectors, such as the edge/corner AAM samplers developed by Scott
et al [123].
9.2.3 Automatic Detection of Search Failure
There will inevitably be some search failures, especially in cases of severe osteoporosis
with multiple grade 3 fractures. Ideally the segmentation system should be able to
evaluate its own performance, and detect when a search failure is likely. This might
even allow a degree of self-recovery. For example on failure a more extensive set of
alternative AAM initialisations might be tried. The likely failure could be highlighted
to the user in a clinical system, and the vertebra’s classification changed to unknown.
An obvious measure to try and use is the residual sum of squares. However because
the images are so noisy and often have low values of signal, it can be difficult to
distinguish between a correct segmentation of noisy data, and an incorrect fit to
the background; or a partially correct fit with some confusion with a neighbouring
vertebra’s edge. Our work on using a quality of fit measure in the dynamic AAM
sequencing method has also established that the residual sum of squares is far from
being $\chi^2$ distributed. We anticipate that in true failures, there are likely to be sets
of residuals that show strong spatial correlation. We therefore intend to develop
methods that use the spatial structure of the residuals, as well as their total sum of
squares, to assess whether a search has been successful or not.
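One simple way to quantify spatial structure in a sequence of residuals is their lag-1 autocorrelation, sketched below. This is only an illustration of the idea, not a method developed in this thesis: the choice of statistic, the function name and the synthetic residuals are all assumptions.

```python
import numpy as np

def lag1_autocorrelation(residuals):
    """Correlation between neighbouring residuals along the sample ordering."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    denom = float((r * r).sum())
    return 0.0 if denom == 0.0 else float((r[:-1] * r[1:]).sum() / denom)

# Residuals from a correct fit to noisy data: large but spatially unstructured.
rng = np.random.default_rng(3)
noisy_but_correct = rng.normal(0.0, 1.0, size=200)

# The same noise plus a contiguous run of biased residuals, as might arise
# when part of the model locks onto a neighbouring vertebra's edge.
partly_misplaced = noisy_but_correct.copy()
partly_misplaced[80:120] += 4.0

r_good = lag1_autocorrelation(noisy_but_correct)
r_bad = lag1_autocorrelation(partly_misplaced)
```

Here `r_bad` is much larger than `r_good`, so a statistic of this kind can separate the two cases even when the total sum of squares alone is ambiguous.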
9.3 Final Statement
We have successfully demonstrated the use of AAM-based methods to locate and
classify vertebral fractures due to osteoporosis. The segmentation accuracy we have
achieved approaches that of human precision. The specificity of current quantitative
fracture diagnosis techniques has been substantially improved. This will enable real
improvements in early diagnosis of osteoporosis, which could have a significant effect
on the health of millions of patients. In parallel with this work we have already
developed a prototype clinical tool utilising these techniques, and hope that our
methods become adopted by clinicians.
Appendix A
A.1 Weighted fitting of shape and appearance model parameters
It is necessary to compute the shape model parameters $\mathbf{b}_s$ with respect to a weighted
vector of points $\mathbf{X}$, where the weight of each point indicates its relative importance.
Often these weights $\mathbf{w}$ will be the reciprocals of the estimated point error variances.
As well as calculating model parameters it is generally necessary to calculate an
alignment transform $T_t$ with pose parameters $\mathbf{t}$ to approximately align the shape to
the mean model shape $\bar{\mathbf{x}}$; and also to apply the shape model constraints. As there
may be some interaction between these 3 stages, in general several iterations around
a loop are performed (e.g. 5), as defined in Algorithm 4.
In general the point weights need not be isotropic. In fact in the constrained AAM
search anisotropic weights are generally assumed. However there were certain
limitations in the general structure of the higher level APIs in the existing C++
code that we used. This uses a class hierarchy set up for general abstract active
models (i.e. a generalisation of the AAM or ASM), and assumes isotropic weights when
performing weighted fits of the shape model to a set of target points. As we only
use these functions to provide an approximate initialisation to the AAMs we have
not modified the existing APIs. Instead we renormalise the $2n$-dimensional weights
vector $\mathbf{w}$ to an $n$-dimensional isotropic form $\mathbf{w}'$. Assuming for convenience that the
$x$ Cartesian coordinates occupy the first $n$ elements in the representation, and the $y$
Cartesian coordinates occupy the second block of $n$ elements, we use the mapping:
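\[ w'_j = 0.5\,(w_j + w_{j+n}), \quad j \in \{1, \ldots, n\} \tag{A.1} \]
Then let $\mathbf{W}$ be a $2n$-dimensional square diagonal matrix, defined as:
\[ [\mathbf{W}]_{j,j} = w'_j, \quad j \in \{1, \ldots, n\} \tag{A.2} \]
and
\[ [\mathbf{W}]_{j,j} = w'_{j-n}, \quad j \in \{n+1, \ldots, 2n\} \tag{A.3} \]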
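The mapping of equations A.1 to A.3 can be written compactly as below. This is an illustrative sketch in Python rather than the C++ code referred to above, and the function name `isotropic_weight_matrix` is hypothetical.

```python
import numpy as np

def isotropic_weight_matrix(w):
    """Build the 2n x 2n diagonal matrix W from a 2n-vector of point weights.

    The first n entries of w are the x-coordinate weights and the last n the
    y-coordinate weights; each point receives the average of its two weights.
    """
    w = np.asarray(w, dtype=float)
    if w.ndim != 1 or w.size % 2:
        raise ValueError("expected a 1-D weight vector of even length 2n")
    n = w.size // 2
    w_iso = 0.5 * (w[:n] + w[n:])                     # equation A.1
    return np.diag(np.concatenate([w_iso, w_iso]))    # equations A.2 and A.3

# Two points: x-weights (1, 4) and y-weights (3, 0) both average to 2.
W = isotropic_weight_matrix([1.0, 4.0, 3.0, 0.0])
```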
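Algorithm 4 Weighted fitting of shape model to target world-frame points
1. $\mathbf{b} = \mathbf{0}$
2. counter = 0
3. While counter < Max Iterations do
   (a) $\mathbf{x}_m = \bar{\mathbf{x}} + \mathbf{P}_s\mathbf{b}$
   (b) Calculate best transform parameters $\mathbf{t}$ so $\mathbf{X} \cong T_t(\mathbf{x}_m)$. See section A.2
   (c) $\mathbf{x}_p = T_t^{-1}(\mathbf{X})$
   (d) $\mathbf{b} = \arg\min_{\mathbf{b}} \left( (\mathbf{x}_p - \bar{\mathbf{x}} - \mathbf{P}_s\mathbf{b})^T \mathbf{W} (\mathbf{x}_p - \bar{\mathbf{x}} - \mathbf{P}_s\mathbf{b}) \right)$. See section A.3
   (e) If $\sum_{i=1}^{m} b_i^2 / \lambda_i > D_{max}$ then
       i. Shift $\mathbf{b}$ to nearest point $\mathbf{b}'$ on hyperellipsoid so that $\sum_{i=1}^{m} b_i'^2 / \lambda_i = D_{max}$
   (f) increment counter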
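Step (e) of Algorithm 4 can be sketched as follows. Note the swapped-in technique: the code rescales $\mathbf{b}$ radially onto the hyperellipsoid, a simple and commonly used approximation, which is not in general the exactly nearest point referred to in the algorithm. The function name and test values are hypothetical.

```python
import numpy as np

def clamp_to_hyperellipsoid(b, eigenvalues, d_max):
    """Limit the squared Mahalanobis distance sum(b_i^2 / lambda_i) to d_max.

    Parameters inside the hyperellipsoid are returned unchanged; others are
    scaled towards the origin until they lie on its surface (radial scaling,
    an approximation to the nearest surface point).
    """
    b = np.asarray(b, dtype=float)
    lam = np.asarray(eigenvalues, dtype=float)
    d = float(np.sum(b * b / lam))
    if d <= d_max:
        return b
    return b * np.sqrt(d_max / d)

# Squared distance 9 exceeds the limit of 4, so b is scaled by 2/3.
b_clamped = clamp_to_hyperellipsoid([3.0, 0.0], eigenvalues=[1.0, 1.0], d_max=4.0)
```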
A.2 Optimal pose parameters
We assume two sets of $n$ points $\{\mathbf{x}_i : 1 \le i \le n\}$ and $\{\mathbf{x}'_i : 1 \le i \le n\}$. We seek to
align the shape represented by $\{\mathbf{x}_i\}$ to that represented by $\{\mathbf{x}'_i\}$ so that the weighted
square error norm is minimised. Note in this section a shape is represented by the set
of 2D points, where $\mathbf{x}_i$ is the $i$th 2D point, in contrast to the usual notation of $\mathbf{x}$ as a
$2n$-dimensional vector representing all the points. We represent the weightings by a
set of weight matrices $\mathbf{W}_i$, where $\mathbf{W}_i = w_i\mathbf{I}_2$; in other words the point weightings
are isotropic as discussed above. A treatment of the more general anisotropic case is
given by Cootes in [34]. The assumption of isotropic weightings allows us to make
some simplifications.
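We seek a transformation $T_t$ with parameters $\mathbf{t}$ so as to minimise:
\[ E = \sum_{i=1}^{n} (\mathbf{x}'_i - T_t(\mathbf{x}_i))^T \mathbf{W}_i (\mathbf{x}'_i - T_t(\mathbf{x}_i)) \tag{A.4} \]
Solutions may be obtained in general by setting $\partial E / \partial \mathbf{t} = \mathbf{0}$.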
For a similarity transform, with translation, scaling and rotation we can combine the
rotation and scaling factors into only two parameters $a$, $b$, together with a translation
shift $\mathbf{d} = (t_x, t_y)^T$, so that, defining the rotation/scaling matrix $\mathbf{S}$:
\[ \mathbf{S} = \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \tag{A.5} \]
we have:
\[ T_t(\mathbf{x}) = \mathbf{S}\mathbf{x} + \mathbf{d} \tag{A.6} \]
and $\mathbf{t} = (a, b, t_x, t_y)^T$.
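First we define the following summation forms.
\[
\begin{aligned}
S_{xWx} &= \sum_i \mathbf{x}_i^T \mathbf{W}_i \mathbf{x}_i, &
\mathbf{S}_{Wx} &= \sum_i \mathbf{W}_i \mathbf{x}_i, &
\mathbf{S}_{W} &= \sum_i \mathbf{W}_i, \\
S_{xWx'} &= \sum_i \mathbf{x}_i^T \mathbf{W}_i \mathbf{x}'_i, &
\mathbf{S}_{Wx'} &= \sum_i \mathbf{W}_i \mathbf{x}'_i
\end{aligned}
\tag{A.7}
\]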
We also define the rotation matrix $\mathbf{J}$ which transforms $\mathbf{x}$ into its normal, i.e.
\[ \mathbf{J} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \tag{A.8} \]
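We define the further summation forms (the transpose on $\mathbf{J}$ in the first form follows
from setting $\partial E / \partial b = 0$ in equation A.4, which gives the term $(\mathbf{J}\mathbf{x}_i)^T \mathbf{W}_i \mathbf{x}'_i$):
\[
S_{xWJx'} = \sum_i \mathbf{x}_i^T \mathbf{W}_i \mathbf{J}^T \mathbf{x}'_i, \qquad
\mathbf{S}_{WJx} = \sum_i \mathbf{W}_i \mathbf{J} \mathbf{x}_i
\tag{A.9}
\]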
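The pose parameters are then given by the solution to the equation:
\[
\begin{pmatrix}
S_{xWx} & 0 & \mathbf{S}_{Wx}^T \\
0 & S_{xWx} & \mathbf{S}_{WJx}^T \\
\mathbf{S}_{Wx} & \mathbf{S}_{WJx} & \mathbf{S}_{W}
\end{pmatrix}
\begin{pmatrix} a \\ b \\ t_x \\ t_y \end{pmatrix}
=
\begin{pmatrix} S_{xWx'} \\ S_{xWJx'} \\ \mathbf{S}_{Wx'} \end{pmatrix}
\tag{A.10}
\]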
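The linear system for the similarity pose can be assembled and solved as sketched below. This is an illustrative NumPy version, not the C++ code used in this work; the function name and test data are hypothetical. The right-hand side entry for $b$ is formed as $\sum_i w_i (\mathbf{J}\mathbf{x}_i)^T \mathbf{x}'_i$, which is what setting $\partial E / \partial b = 0$ in equation A.4 yields.

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])   # rotates a vector by 90 degrees

def fit_similarity_pose(x, x_target, w):
    """Solve the 4x4 normal equations for t = (a, b, tx, ty).

    x, x_target: (n, 2) arrays of corresponding points; w: (n,) isotropic
    weights. Minimises sum_i w_i * ||x_target_i - (S x_i + d)||^2 with
    S = [[a, -b], [b, a]] and d = (tx, ty).
    """
    x, x_target, w = (np.asarray(v, dtype=float) for v in (x, x_target, w))
    jx = x @ J.T                                   # rows are J x_i
    s_xwx = np.sum(w * np.sum(x * x, axis=1))      # scalar
    s_wx = (w[:, None] * x).sum(axis=0)            # 2-vector
    s_wjx = (w[:, None] * jx).sum(axis=0)          # 2-vector
    s_w = w.sum()                                  # total weight

    A = np.zeros((4, 4))
    A[0, 0] = A[1, 1] = s_xwx
    A[0, 2:], A[1, 2:] = s_wx, s_wjx
    A[2:, 0], A[2:, 1] = s_wx, s_wjx
    A[2:, 2:] = s_w * np.eye(2)
    rhs = np.concatenate([
        [np.sum(w * np.sum(x * x_target, axis=1))],    # from dE/da = 0
        [np.sum(w * np.sum(jx * x_target, axis=1))],   # from dE/db = 0
        (w[:, None] * x_target).sum(axis=0),           # from dE/dd = 0
    ])
    return np.linalg.solve(A, rhs)

# Recover a known pose: scale 1.5, rotation 30 degrees, shift (2, -1).
theta = np.deg2rad(30.0)
a_true, b_true = 1.5 * np.cos(theta), 1.5 * np.sin(theta)
S_true = np.array([[a_true, -b_true], [b_true, a_true]])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [-1.0, 0.5]])
target = pts @ S_true.T + np.array([2.0, -1.0])
pose = fit_similarity_pose(pts, target, w=[1.0, 2.0, 0.5, 3.0])
```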
A.3 Weighted fitting of shape model parameters
Having aligned a world shape to the model frame we next seek optimal shape model
parameters to minimise the weighted L2 norm:
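\[ (\mathbf{x}_p - \bar{\mathbf{x}} - \mathbf{P}_s\mathbf{b})^T \mathbf{W} (\mathbf{x}_p - \bar{\mathbf{x}} - \mathbf{P}_s\mathbf{b}) \tag{A.11} \]
Define $\mathbf{d} = \mathbf{x}_p - \bar{\mathbf{x}} - \mathbf{P}_s\mathbf{b}$. Differentiating $\mathbf{d}^T\mathbf{W}\mathbf{d}$ with respect to $\mathbf{b}$ and setting
derivatives to zero for the minimum leads to the equation:
\[ \mathbf{P}_s^T \mathbf{W} \mathbf{P}_s \mathbf{b} = \mathbf{P}_s^T \mathbf{W} (\mathbf{x}_p - \bar{\mathbf{x}}) \tag{A.12} \]
Define $\mathbf{A} = \mathbf{P}_s^T \mathbf{W} \mathbf{P}_s$. Equation A.12 is of form
\[ \mathbf{A}\mathbf{b} = \mathbf{z} \quad \text{with} \quad \mathbf{z} = \mathbf{P}_s^T \mathbf{W} (\mathbf{x}_p - \bar{\mathbf{x}}) \tag{A.13} \]
Also $\mathbf{A}^T = \mathbf{P}_s^T \mathbf{W}^T \mathbf{P}_s$. But as $\mathbf{W}$ is diagonal $\mathbf{W}^T = \mathbf{W}$. Hence also $\mathbf{A}^T = \mathbf{A}$.
As the weights are non-negative, $\mathbf{A}$ is also positive semi-definite, and it is positive
definite when enough points carry non-zero weight.
Therefore equation A.13 can be solved efficiently by Cholesky decomposition, though
in cases when the solution is ill-conditioned SVD can be used instead.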
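A minimal NumPy sketch of this solve is given below, assuming the normal equations take the form $\mathbf{P}_s^T\mathbf{W}\mathbf{P}_s\mathbf{b} = \mathbf{P}_s^T\mathbf{W}(\mathbf{x}_p - \bar{\mathbf{x}})$. The function name, the synthetic orthonormal basis and the fallback tolerance are assumptions for illustration; `np.linalg.solve` is a general solver, so a production version would normally use dedicated triangular solves instead.

```python
import numpy as np

def weighted_shape_params(P, x_bar, x_p, w):
    """Weighted least-squares shape parameters b for x_p ~ x_bar + P b.

    P: (2n, m) matrix of shape modes; w: (2n,) diagonal of W.
    """
    P, x_bar, x_p, w = (np.asarray(v, dtype=float) for v in (P, x_bar, x_p, w))
    PtW = P.T * w                        # P^T W without forming the 2n x 2n W
    A = PtW @ P                          # symmetric m x m matrix
    z = PtW @ (x_p - x_bar)
    try:
        L = np.linalg.cholesky(A)        # A = L L^T, needs positive definite A
        y = np.linalg.solve(L, z)        # solve L y = z
        return np.linalg.solve(L.T, y)   # solve L^T b = y
    except np.linalg.LinAlgError:
        # SVD-based pseudo-inverse when A is singular or badly conditioned.
        return np.linalg.pinv(A, rcond=1e-10) @ z

# Synthetic check: 4 points (2n = 8), 3 orthonormal modes, known parameters.
rng = np.random.default_rng(1)
P_demo, _ = np.linalg.qr(rng.normal(size=(8, 3)))
x_bar_demo = rng.normal(size=8)
b_true = np.array([0.5, -1.0, 2.0])
w_demo = rng.uniform(0.5, 2.0, size=8)
b_fit = weighted_shape_params(P_demo, x_bar_demo, x_bar_demo + P_demo @ b_true, w_demo)
```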
A.4 Applying additional appearance model constraints
When the sub-models are initialised, in addition to first performing a weighted fit of
the shape model to the required points, we also apply additional appearance model
constraints. This is done by assigning the combined (shape parameter, texture
parameter) vector $\mathbf{b}_a$ (see equation 4.17) using the required shape model parameters,
and with the texture parameters set to zero (i.e. mean texture is assumed). The
weights on the texture parameters are all set to zero, so these placeholder values do
not influence the fit. Fitting the appearance model
to the shape parameters is in essence the same as the weighted fit of the
shape model to a set of points already discussed in the previous section. We require
a solution to
\[ \mathbf{Q}^T \mathbf{W}_a \mathbf{Q} \mathbf{c} = \mathbf{Q}^T \mathbf{W}_a \mathbf{b}_a \tag{A.14} \]
The weights matrix $\mathbf{W}_a$ is a diagonal matrix with the first $m_s$ elements along the
diagonal set to unity (i.e. shape parameters all have the same weight), and the
remaining $m_t$ elements set to zero (i.e. we attach no importance to the texture
parameters).
The use of zero weights on the texture parameters is liable to lead to equation A.14
being not full rank, so SVD should be used to solve it.
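The rank-deficient solve can be sketched with an SVD-based pseudo-inverse, which returns the minimum-norm solution. The system is assumed to have the form $\mathbf{Q}^T\mathbf{W}_a\mathbf{Q}\mathbf{c} = \mathbf{Q}^T\mathbf{W}_a\mathbf{b}_a$; the function name, the random matrix standing in for $\mathbf{Q}$ and the tolerance `rcond=1e-10` are assumptions for illustration.

```python
import numpy as np

def fit_appearance_to_shape(Q, b_s, m_t):
    """Appearance parameters c whose implied shape parameters match b_s.

    Q: (m_s + m_t, m_c) matrix mapping c to the combined vector b_a.
    b_s: the m_s required shape parameters; m_t: number of texture parameters.
    """
    Q = np.asarray(Q, dtype=float)
    b_s = np.asarray(b_s, dtype=float)
    m_s = b_s.size
    b_a = np.concatenate([b_s, np.zeros(m_t)])            # mean texture assumed
    w_a = np.concatenate([np.ones(m_s), np.zeros(m_t)])   # ignore texture rows
    QtW = Q.T * w_a                                        # Q^T W_a
    # The normal matrix has rank at most m_s, so use a pseudo-inverse (SVD).
    return np.linalg.pinv(QtW @ Q, rcond=1e-10) @ (QtW @ b_a)

# Stand-in for Q: 3 shape rows and 4 texture rows acting on 5 parameters.
rng = np.random.default_rng(2)
Q_demo = rng.normal(size=(7, 5))
b_s_demo = np.array([1.0, -2.0, 0.5])
c_demo = fit_appearance_to_shape(Q_demo, b_s_demo, m_t=4)
implied_shape = (Q_demo @ c_demo)[:3]
```

The shape parameters re-derived from the first $m_s$ elements of $\mathbf{Q}\mathbf{c}$ (`implied_shape`) reproduce the requested ones in this example, before any Mahalanobis constraint is imposed.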
Finally the appearance parameter constraints on maximum Mahalanobis distance are
imposed, and if $\mathbf{c}$ is outside the allowed hyper-ellipsoid, then it is brought back to
the nearest point on the allowed hyper-ellipsoid. This may implicitly mean that the
shape is then altered slightly, as the actual shape model parameters that will then
finally be used are re-derived from $\mathbf{c}$ using the appearance model (i.e. the shape
model parameters are determined by the first $m_s$ elements of $\mathbf{Q}\mathbf{c}$).
Bibliography
[1] Adams JE. Dual-Energy X-ray absorptiometry. In: Baert A and Sartot K, eds., Radiology of Osteoporosis, (pages 87–100) (Springer-Verlag), 2003.
[2] Armstrong AL and Wallace WA. The epidemiology of hip fractures and methods of prevention. Acta. Orthop. Belg., 60(S1):85–101, 1994.
[3] Baker S and Matthews I. Equivalence and efficiency of image alignment algorithms. In: Computer Vision and Pattern Recognition Conference 2001, vol. 1, (pages 1090–1097). 2001.
[4] Bataur A and Hayes M. Adaptive active appearance models. IEEE Trans. Imaging Processing, 14:1707–1721, 2005.
[5] Bauer JS, Muller D, Ambekar A, Dobritz M, et al. Detection of osteoporotic vertebral fractures using multi-detector CT. Osteoporosis Int, 17:608–615, 2006.
[6] Beck TJ, Looker AC, Ruff CB, Sievanen H, et al. Structural trends in the aging femoral neck and proximal shaft: analysis of the third national health and nutrition examination survey dual-energy X-ray absorptiometry data. J Bone Miner Res, 15:2297–2304, 2000.
[7] Bhargavan M, Sunshine JH, and Schepps B. Too few radiologists? AJR Am J Roentgenol., 178:1075–1082, 2002.
[8] Binkley N, Krueger D, Gangnon R, Genant HK, et al. Lateral vertebral assessment: a valuable technique to detect clinically significant vertebral fractures. Osteoporosis Int, 16:1513–1518, 2005.
[9] Black DM, Arden NK, Palermo L, Pearson J, et al. Prevalent vertebral deformities predict hip fractures and new vertebral deformities but not wrist fractures. J Bone Miner Res, 14:821–828, 1999.
[10] Black DM, Cummings SR, Karpf DB, Kauley JA, et al. Randomised trial of effect of alendronate on risk of fracture in women with existing vertebral fractures. Lancet, 348:1535–1541, 1996.
[11] Black DM, Palermo L, Nevitt MC, Genant HK, et al. Comparison of methods for defining prevalent vertebral deformities: The study of osteoporotic fractures. J Bone Miner Res, 10(6):890–902, 1995.
[12] Black DM, Thompson DE, Bauer DC, Ensrud K, et al. Fracture risk reduction with alendronate in women with osteoporosis: the fracture intervention trial. J Clin Endocrinol Metab, 85:4118–4124, 2000.
[13] Black MJ and Jepson AD. Eigentracking: Robust matching and tracking of objects using view-based representation. International Journal of Computer Vision, 26(1):63–84, 1998.
[14] Blake GM, Rea JA, and Fogelman I. Vertebral morphometry studies using dual-energy X-ray absorptiometry. Semin Nucl Med, 27:276–290, 1997.
[15] Bosch HG, Mitchell SC, Boudewijn PF, Leieveldt PF, et al. Active appearance-motion models for endocardial contour detection in time sequences of echocardiograms. In: SPIE Medical Imaging, (pages 257–268). 2001.
[16] Boutroy S, Bouxsein ML, Munoz F, and Delmas PD. In vivo assessment of trabecular bone microarchitecture by high-resolution peripheral quantitative computed tomography. J Clin Endocrinol Metab, 90:6508–6515, 2005.
[17] de Bruijne M, Lund M, Tanko L, Pettersen PC, et al. Quantitative vertebral morphometry using neighbour-conditional shape models. In: 9th MICCAI Conference, vol. 1, (pages 1–8) (Springer-Verlag), 2006.
[18] de Bruijne M and Nielsen M. Image segmentation by shape particle filtering. In: International Conference on Pattern Recognition, (pages 722–725) (IEEE Computer Society Press), 2004.
[19] Chakraborty A and Duncan JS. Integration of boundary finding and region-based segmentation using game theory. In: Bizais Y, ed., Information Processing in Medical Imaging: Proc 14th Int. Conf (IPMI 95). In volume 3 of Computation Imaging and Vision, vol. 3, (pages 189–200) (Kluwer Academic Press, Dordrecht), 1995.
[20] Chapurlat RD, Duboeuf F, Marion-Audibert HO, Kalpakcioglu B, et al. Effectiveness of instant vertebral assessment to detect prevalent vertebral fracture. Osteoporosis Int, 17(8):1189–1195, 2006.
[21] Christensen C, Johansen JS, and Riis B, eds. Epidemiology of vertebral fractures (Copenhagen), 1987.
[22] Cohen LD and Cohen I. Finite element methods for active contour models and balloons for 2D and 3D images. IEEE Trans. on Pattern Analysis and Machine Intelligence, 15:1131–1147, 1993.
[23] Cooper C, Shah S, Hand DJ, Adams JE, et al. Screening for vertebral osteoporosis using individual risk factors. Osteoporosis Int., 2:48–53, 1991.
[24] Cootes TF, Edwards GJ, and Taylor CJ. Active appearance models. In: Burkhardt H and Neumann B, eds., 5th European Conference on Computer Vision, vol. 2, (pages 484–498) (Springer, Berlin), 1998.
[25] Cootes TF, Edwards GJ, and Taylor CJ. A comparative evaluation of active appearance model algorithms. In: Carter JN and Nixon MS, eds., 9th British Machine Vision Conference, vol. 2, (pages 557–566) (BMVA Press, Southampton, UK), 1998.
[26] Cootes TF, Edwards GJ, and Taylor CJ. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23:681–685, 2001.
[27] Cootes TF, Hill A, Taylor CJ, and Haslam J. The use of active shape models for locating structures in medical images. Image and Vision Computing, 12(6):276–285, 1994.
[28] Cootes TF, Page GJ, Jackson CB, and Taylor CJ. Statistical grey-level models for object location and identification. In: Pycock D, ed., 6th British Machine Vision Conference, (pages 533–542) (BMVA Press, Birmingham, England), 1995.
[29] Cootes TF, Petrovic V, Schestowitz R, and Taylor CJ. Groupwise construction of appearance models using piece-wise affine deformations. In: 16th British Machine Vision Conference, vol. 2, (pages 879–888) (BMVA Press, Birmingham), 2005.
[30] Cootes TF and Taylor CJ. Combining elastic and statistical models of appearance variation. In: European Conference on Computer Vision, vol. 1, (pages 149–163) (Springer), 2000.
[31] Cootes TF and Taylor CJ. Constrained active appearance models. In: 8th International Conference on Computer Vision, vol. 1, (pages 748–754) (IEEE Computer Society Press), 2001.
[32] Cootes TF and Taylor CJ. On representing edge structure for model matching. In: Computer Vision and Pattern Recognition Conference 2001, vol. 1, (pages 1114–1119). 2001.
[33] Cootes TF and Taylor CJ. Statistical models of appearance for medical image analysis and computer vision. Proc SPIE Medical Imaging, 3:138–147, 2001.
[34] Cootes TF and Taylor CJ. Statistical models of appearance for computer vision. Tech. rep., University of Manchester, 2004.
[35] Cootes TF and Taylor CJ. An algorithm for tuning an active appearance model to new data. In: British Machine Vision Conference, (pages 919–928). 2006.
[36] Cummings SR, Black DM, and Nevitt MC. Bone density at various sites for prediction of hip fracture. Lancet, 341:72–75, 1993.
[37] Damilakis J, Maris T, Papadokostakis G, Sideri L, et al. Discriminatory ability of magnetic resonance T2* measurements in a sample of postmenopausal women with low-energy fractures: a comparison with phalangeal speed of sound and dual X-ray absorptiometry. Investigative Radiology, 39(11):706–712, 2004.
[38] Davatzikos C, Tao X, and Shen D. Hierarchical active shape models, using the wavelet transform. IEEE Trans. Med. Imag., 22(3):414–423, 2003.
[39] Davies RH, Cootes TF, Twining CJ, and Taylor CJ. An information theoretic approach to statistical shape modelling. In: 12th British Machine Vision Conference, (pages 3–12) (BMVA Press, Birmingham), 2002.
[40] Davies RH, Twining CJ, Cootes TF, Waterton JC, et al. 3D statistical shape models using direct optimisation of description length. In: Heyden A, ed., 9th European Conference on Computer Vision, (pages 3–20) (Springer Verlag, Berlin Heidelberg), 2002.
[41] Delmas PD, Genant HK, Crans GG, Stock JL, et al. Severity of prevalent vertebral fractures and the risk of subsequent vertebral and non-vertebral fractures: results from the MORE trial. Bone, 33(4):522–532, 2003.
[42] Delmas PD, van de Langerijt L, Watts NB, Eastell R, et al. Underdiagnosis of vertebral fractures is a worldwide problem: the IMPACT study. J Bone Miner Res, 20(4):557–563, 2005.
[43] Dequeker J, Gautama K, and Roh YS. Femoral trabecular patterns in asymptomatic spinal osteoporosis and femoral neck fracture. Clin. Radiol., 25:243–246, 1974.
[44] Dunitz M and Muenier PJ, eds. Ultrasonic evaluation of osteoporosis (Taylor and Francis), 1998.
[45] Eastell R, Cedel SL, Wahner HW, Riggs BL, et al. Classification of vertebral fractures. J Bone Miner Res, 6(3):207–215, 1991.
[46] Edwards GJ, Lanitis A, Taylor CJ, and Cootes TF. Statistical models of face images - improving specificity. Image and Vision Computing, 16:203–211, 1998.
[47] Ettinger B, Block JE, Smith R, Cummings SR, et al. An examination of the association between vertebral deformities, physical disabilities and psychosocial problems. Maturitas, 10:283–296, 1988.
[48] Ettinger B, Genant HK, and Cann CE. Long-term estrogen replacement therapy prevents bone loss and fractures. Ann. Int. Med., 136:298, 1985.
[49] Evans JG. The significance of osteoporosis. In: Smith R, ed., Osteoporosis 1990, chap. 13, (pages 1–8) (Royal College of Physicians, London), 1 edn., 1995.
[50] Evans JG, Prudham D, and Wandles I. A prospective study of fractured proximal femur: Incidence and outcome. Public Health, 93:235–241, 1979.
[51] Faulkner K, Cummings S, Black D, Palermo L, et al. Simple measurement of femoral geometry predicts hip fracture: The Study of Osteoporotic Fractures. J Bone Miner Res, 10:1211–1217, 1993.
[52] Faulkner K, Wacker W, Barden H, Simonelli C, et al. Femur strength index predicts hip fracture independent of bone density and hip axis length. Osteoporosis Int, 17:593–599, 2006.
[53] Felsenberg D and Kalender WA. Computer-assisted morphometry of vertebral fractures. In: Genant H, Jergas M, and van Kuijk C, eds., Vertebral Fracture in Osteoporosis, (pages 309–318) (University of California), 1995.
[54] Ferrar L, Jiang G, Adams J, and Eastell R. Identification of vertebral fractures: an update. Osteoporosis Int., 16:717–728, 2005.
[55] Ferrar L, Jiang G, Armbrecht G, Reid DM, et al. Is short vertebral height always an osteoporotic fracture? the osteoporosis and ultrasound study (OPUS)? Bone, 41:5–12, 2007.
[56] Ferrar L, Jiang G, Barrington NA, and Eastell R. Identification of vertebral deformities in women: comparison of radiological assessment and quantitative morphometry using morphometric radiography and morphometric X-ray absorptiometry. J Bone Miner Res, 15(3):575–585, 2000.
[57] Ferrar L, Jiang G, and Eastell R. Short-term precision for morphometric X-ray absorptiometry. Osteoporosis Int., 12:710–715, 2001.
[58] Ferrar L, Jiang G, Eastell R, and Peel N. Visual identification of vertebral fractures in osteoporosis: using morphometric X-ray absorptiometry. J Bone Miner Res, 18(5):933–938, 2003.
[59] Frost HM. Absorptiometry and osteoporosis: problems. J Bone Miner Metab, 21:255–260, 2003.
[60] Gehlbach S, Bigelow C, Heimisdottir M, May S, et al. Recognition of vertebral fracture in a clinical setting. Osteoporosis Int, 11:577–582, 2000.
[61] Geman S and McClure D. Statistical methods for tomographic image reconstruction. Bulletin of the International Statistical Institute, LII:4–5, 1997.
[62] Genant HK, Cann CE, Ettinger B, and Gordan GS. Quantitative computed tomography of vertebral spongiosa: A sensitive method of detecting early bone loss after oophorectomy. Ann. Int. Med., 97:699–705, 1982.
[63] Genant HK, Engelke K, Fuerst T, Gluer C, et al. Noninvasive assessment of bone mineral and structure: state of art. J Bone Miner Res, 11:707–730, 1996.
[64] Genant HK, Jergas M, and van Kuijk C, eds. Vertebral Fracture in Osteoporosis (University of California), 1995.
[65] Genant HK, Wu CY, van Kuijk C, and Nevitt MC. Vertebral fracture assessment using a semi-quantitative technique. J Bone Miner Res, 8:1137–1148, 1993.
[66] Goh S, Price RI, Song S, Davis S, et al. Magnetic resonance-based vertebral morphometry of the thoracic spine: age, gender and level-specific influences. Clin Biomech, 15:417–425, 2000.
[67] Gordon CL, Lang TF, Augat P, and Genant HK. Image-based assessment of spinal trabecular bone structure from high-resolution CT images. Osteoporosis Int, 8:317–325, 1998.
[68] Grados F, Roux C, de Vernejoul MC, Utard G, et al. Comparison of four morphometric definitions and a semiquantitative consensus reading for assessing prevalent vertebral fractures. Osteoporosis Int, 12:716–722, 2001.
[69] Gregory JS, Stewart A, Undrill PE, Reid DM, et al. Bone shape, structure, and density as determinants of osteoporotic hip fracture: a pilot study investigating the combination of risk factors. Investigative Radiology, 40(9):591–597, 2005.
[70] Gregory JS, Testi D, Stewart A, Undrill PE, et al. A method for assessment of the shape of the proximal femur and its relationship to osteoporotic hip fracture. Osteoporosis Int, 15(4):5–11, 2004.
[71] Guermazi A, Mohr A, Grigorian M, Taouli B, et al. Identification of vertebral fractures in osteoporosis. Seminars in Musculoskeletal Radiology, 6(3):241–252, 2002.
[72] Guglielmi G, Grimston SK, Fischer KC, and Pacifici R. Osteoporosis: diagnosis with lateral and posteroanterior dual X-ray absorptiometry compared with quantitative CT. Radiology, 192:845–850, 1994.
[73] Harris C and Stephens M. A combined corner and edge detector. In: Alvey Vision Conference, (pages 147–151). 1988.
[74] Harris ST, Watts NB, Genant HK, McKeever CD, et al. Effects of risedronate treatment on vertebral and nonvertebral fractures in women with postmenopausal osteoporosis. a randomized clinical trial. JAMA, 282:1344–1352, 1999.
[75] Holbrook TL, Grazier K, Kelsey JL, and Stauffer RN. The Frequency of Occurrence, Impact and the Cost of Musculo-Skeletal Conditions in the United States (American Academy of Orthopedic Surgeons, Chicago), 1985.
[76] Holland PW and Welsch RE. Robust regression using iteratively reweighted least squares. Communications in Statistics, A6:813–827, 1977.
[77] Howe B, Gururajan A, Sari-Sarraf H, and Long R. Hierarchical segmentation of cervical and lumbar vertebrae using a customized generalized hough transform and extensions to active appearance models. In: Proc IEEE 6th SSIAI, (pages 182–186). 2004.
[78] Hui SL, Slemanda CW, and Johnson CC. Age and bone mass as predictors of fracture in a prospective study. J. Clin. Invest., 81:1804–1809, 1988.
[79] Jiang G, Eastell R, Barrington NA, and Ferrar L. Comparison of methods for the visual identification of prevalent vertebral fracture in osteoporosis. Osteoporosis Int, 15(4):000–nnn, 2004.
[80] Kanis JA and Johnell O. Requirements for DXA for the management of osteoporosis in Europe. Osteoporosis Int, 16:229–238, 2005.
[81] Kanis JA, Melton LJ, Christiansen C, Johnston CC, et al. The diagnosis of osteoporosis. J Bone Miner Res, 9:1137–1141, 1994.
[82] Kanis JA, Oden A, Johnell O, Johansson H, et al. The use of clinical risk factors enhances the performance of BMD in the prediction of hip and osteoporotic fractures in men and women. Osteoporosis Int, 18:1033–1046, 2007.
[83] Kass M, Witkin A, and Terzopoulos D. Snakes: Active contour models. In: 1st International Conference on Computer Vision, (pages 259–268) (London), 1987.
[84] Kazakia GJ and Majumdar S. New imaging techniques in the diagnosis of osteoporosis. Rev Endocr Metab Disord, 7:67–74, 2006.
[85] Kelsey JL and Hoffman S. Risk factors for hip fracture. N. Engl. J. Med., 316(7):404–406, 1987.
[86] Krug R, Banerjee S, Han ET, Newitt DC, et al. Feasibility of in vivo structural analysis of high-resolution magnetic resonance images of the proximal femur. Osteoporosis Int, 16:1307–1314, 2005.
[87] Li J, Wu CY, Jergas M, and Genant HK. Comparison of semiquantitative and quantitative methods for assessment of vertebral fractures. In: Christiansen C, ed., Fourth International Symposium on Osteoporosis and Consensus Development Conference (Gardiner-Caldwell), 1993.
[88] Li J, Wu CY, Jergas M, and Genant HK. Diagnosing prevalent vertebral fractures: A comparison between quantitative morphometry and a standard visual (semiquantitative) approach. In: Genant H, Jergas M, and van Kuijk C, eds., Vertebral Fracture in Osteoporosis, (pages 271–279) (University of California), 1995.
[89] Lindsay R, Gallagher JC, Kleerekoper M, and Pickar JH. Effect of lower doses of conjugated equine estrogens with and without medroxyprogesterone acetate on bone in early postmenopausal women. JAMA, 287:2668–2676, 2002.
[90] Link TM, Bauer J, Kollstedt A, Stumpf I, et al. Trabecular bone structure of the distal radius, the calcaneus, and the spine: which site predicts fracture status of the spine best? Investigative Radiology, 39(8):487–497, 2004.
[91] Link TM, Majumdar S, Augat P, Lin JC, et al. In vivo high resolution MRI of the calcaneus: differences in trabecular structure in osteoporosis patients. J Bone Miner Res, 13:1175–1182, 1998.
[92] Majumdar S, Genant HK, Grampp S, Newitt DC, et al. Correlation of trabecular bone structure with age, bone mineral density, and osteoporotic status: in vivo studies in the distal radius using high resolution magnetic resonance imaging. J Bone Miner Res, 12:111–118, 1997.
[93] Manly BFJ. Multivariate Statistical Methods, a Primer (Chapman and Hall), 1986.
[94] McCloskey E, Selby P, de Takats D, Bernard J, et al. Effects of clodronate on vertebral fracture risk in osteoporosis: a 1-year interim analysis. Bone, 28(3):310–315, 2001.
[95] McCloskey EV, Spector TD, Eyres KS, Fern ED, et al. The assessment of vertebral deformity: a method for use in population studies and clinical trials. Osteoporosis Int, 3:138–147, 1993.
[96] McClung MR, Geusens P, Miller PD, Zippel H, et al. Effect of risedronate on the risk of hip fracture in elderly women. N Engl J Med, 344:333–340, 2001.
[97] McInerney T and Terzopoulos D. Deformable models in medical image analysis: a survey. Medical Image Analysis, 1(2):91–108, 1996.
[98] McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12:153–157, 1947.
[99] Melton LJ. How many women have osteoporosis now? J Bone Miner Res, 10:175–177, 1995.
[100] Melton LJ, Atkinson EJ, Cooper C, O’Fallon WM, et al. Vertebral fractures predict subsequent fractures. Osteoporosis Int, 10:214–221, 1999.
[101] Melton LJ, Chrischilles EA, Cooper C, Lane AW, et al. How many women have osteoporosis? J Bone Miner Res, 7:1005–1010, 1992.
[102] Miller CW. Survival and ambulation following hip fractures. J. Bone Joint Surg., 60A:930–934, 1978.
[103] Minne HW, Leidig G, Wuster C, Siromachkostov L, et al. A newly developed spine deformity index (SDI) to quantitate vertebral crush fractures in patients with osteoporosis. Bone Miner, 3:335–349, 1998.
[104] Nastar C and Ayache N. Fast segmentation, tracking and analysis of deformable objects. In: 4th International Conference on Computer Vision, (pages 275–279) (IEEE Computer Society Press, Berlin), 1993.
[105] Neer RM, Arnaud CD, Zanchetta JR, Prince R, et al. Effect of parathyroid hormone (1-34) on fractures and bone mineral density in postmenopausal women with osteoporosis. N Engl J Med, 344:1434–1441, 2001.
[106] Peacock M, Turner CH, Liu G, Manatunga AK, et al. Better discrimination of hip fracture using bone density, geometry and architecture. Osteoporosis Int., 5:167–173, 1995.
[107] Poon CS, Braun M, Fahrig R, Ginige A, et al. Segmentation of medical images using an active contour model incorporating region-based image features. In: Robb R, ed., Proc 3rd Conf. on Visualization in Biomedical Computing (VBC 94), SPIE Proc, vol. 2359, (pages 90–97) (SPIE, Bellingham, WA), 1994.
[108] Press W, Teukolsky S, Vetterling W, and Flannery B. Numerical Recipes in C (Cambridge University Press), 2 edn., 1992.
[109] Prudham D and Evans JG. Factors associated with falls in the elderly: a community study. Age Ageing, 10(3):141–146, 1981.
[110] Rea JA, Li J, Blake GM, Steiger P, et al. Visual assessment of vertebral deformity by X-ray absorptiometry: a highly predictive method to exclude vertebral deformity. Osteoporosis Int, 11:660–668, 2000.
[111] Rea JA, Steiger P, Blake GM, Potts E, et al. Morphometric x-ray absorptiometry: reference data for vertebral dimensions. J. of Bone and Mineral Research, 13:464–474, 1998.
[112] Reginster J, Minne H, Sorensen O, Hooper M, et al. Randomized trial of the effects of risedronate on vertebral fractures in women with established postmenopausal osteoporosis. Osteoporos Int, 11:83–91, 2000.
[113] Roberts MG, Cootes TF, and Adams JE. Linking sequences of active appearance sub-models via constraints: an application in automated vertebral morphometry. In: 14th British Machine Vision Conference, (pages 349–358). 2003.
[114] Roberts MG, Cootes TF, and Adams JE. Vertebral shape: Automatic measurement with dynamically sequenced active appearance models. In: 8th MICCAI Conference, vol. 2, (pages 733–740). 2005.
[115] Roberts MG, Cootes TF, and Adams JE. Automatic segmentation of lumbar vertebrae on digitised radiographs using linked active appearance models. In: Graham J, Thacker N, and Cootes T, eds., Medical Image Understanding and Analysis Conference, (pages 120–124) (BMVA), 2006.
[116] Roberts MG, Cootes TF, and Adams JE. Improving the segmentation accuracy of fractured vertebrae with dynamically sequenced active appearance models. In: 9th MICCAI Conference - Workshop on joint and bone disease, (pages 1–8). 2006.
[117] Roberts MG, Cootes TF, and Adams JE. Vertebral morphometry: semi-automatic determination of detailed shape from DXA images using active appearance models. Investigative Radiology, 41(12):849–859, 2006.
[118] Roberts MG, Cootes TF, Pacheco EM, and Adams JE. Quantitative vertebral fracture detection on DXA images using shape and appearance models. Academic Radiology, 14:1166–1178, 2007.
[119] Ross PD, Davis JW, Vogel JM, and Wasnich RD. A critical review of bone mass and the risk of fractures in osteoporosis. Calcif. Tissue Int., 46(3):149–161, 1990.
[120] Rousseeuw PJ and Croux C. Alternatives to the median absolute deviation. J Amer Stat Assn, 88:1273–1283, 1993.
[121] Sclaroff S and Isidoro J. Active blobs. In: International Conference on Computer Vision (ICCV 98), (pages 1146–1153) (Springer), 1998.
[122] Scott IM. Searching Image Databases using Appearance Models, PhD Thesis, chap. Further Experiments with Texture AAMs, (pages 135–138) (Division of Imaging Science and Biomedical Engineering, University of Manchester), 2004.
[123] Scott IM, Cootes TF, and Taylor CJ. Improving active appearance model matching using local image structure. In: 18th Conference on Information Processing in Medical Imaging, (pages 258–269). 2003.
[124] Singh M, Nagrath AR, and Maini PS. Changes in trabecular pattern of the upper end of the femur as an index of osteoporosis. J Bone Joint Surg, 52:437, 1970.
[125] Smyth PP. Measurement of osteoporosis using computer vision, PhD Thesis (Department of Medical Biophysics, University of Manchester), 1997.
[126] Smyth PP, Taylor CJ, and Adams JE. Vertebral shape: automatic measurement with active shape models. Radiology, 211:571–578, 1999.
[127] Sorenson JA and Cameron JR. A reliable In Vivo measurement of bone mineral content. J. Bone Joint Surg., 49:481–497, 1967.
[128] Staib LH and Duncan JS. Left ventricular analysis from cardiac images using deformable models. Proc Computers in Cardiology, (pages 427–430), 1989.
[129] Ste-Marie LG, Sod E, Johnson T, and Chines A. Five years of treatment with risedronate and its effects on bone safety in women with postmenopausal osteoporosis. Calcif Tissue Int, 75:469–476, 2004.
[130] Steiger P, Cummings SR, Genant HK, Weiss H, et al. Morphometric X-ray absorptiometry of the spine: Correlation in vivo with morphometric radiography. Osteoporosis Int., 4:238–244, 1994.
[131] Szulc P and Delmas PD. Vertebral Fracture Initiative Resource Document (International Osteoporosis Foundation), 2005.
[132] Szulc P and Delmas PD. Vertebral Fracture Initiative Resource Pack (International Osteoporosis Foundation), 2005.
[133] Tomomitsu T, Murase K, Sone T, and Fukunaga M. Comparison of vertebral morphometry in the lumbar vertebrae by T1-weighted sagittal MRI and radiograph. European Journal of Radiology, 56:102–106, 2005.
[134] Torgerson DJ and Bell-Syer SEM. Hormone replacement therapy and prevention of non-vertebral fractures. a meta-analysis of randomised trials. JAMA, 285:2891–2897, 2001.
[135] Turk M and Pentland A. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3:71–86, 1991.
[136] Vapnik V. The nature of statistical learning theory (Springer-Verlag), 1995.
[137] Wasnich RD, Ross PD, Heilbrun LK, and Vogel JM. Prediction of postmenopausal fracture risk with use of bone mineral measurements. Am. J. Obstet. Gynecol., 153(7):745–751, 1985.
[138] Wilkins C and Birge S. Prevention of osteoporotic fractures in the elderly. Am. J. Med., 118:1190–1195, 2005.
[139] Williams AL, Al-Busaidi A, Sparrow PJ, Adams JE, et al. Under-reporting of osteoporotic vertebral fractures on computed tomography. European Journal of Radiology, (in press), 2008.
[140] Wilson CR and Matson M. Dichromatic absorptiometry of vertebral bone mineral content. Invest. Radiol., 12:188–194, 1977.
[141] Wu CY, Li J, Jergas M, and Genant HK. Semiquantitative and quantitative assessment of incident fractures: comparison of methods - abstract. J Bone Miner Res, 9(Suppl 1):S157, 1994.
[142] Wu CY, Li J, Jergas M, and Genant HK. Comparison of semiquantitative and quantitative methods for the assessment of prevalent and incident vertebral fractures. Osteoporosis Int, 5:354–379, 1995.
[143] Zamora G, Sari-Sarraf H, and Long R. Hierarchical segmentation of vertebrae from X-ray images. Med Imaging: Image Process, Proc of SPIE, 5032:631–642, 2003.
[144] Zweig MH and Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39:561–577, 1993.