

Mapping Visual Features to Semantic Profiles for Retrieval in Medical Imaging

Johannes Hofmanninger, Georg Langs
Department of Biomedical Imaging and Image-guided Therapy
Computational Imaging Research Lab, Medical University of Vienna
www.cir.meduniwien.ac.at

Content based image retrieval (CBIR) is highly relevant in medical imaging, since it makes vast amounts of imaging data accessible for comparison during diagnosis. Finding image similarity measures that reflect diagnostically relevant relationships is challenging, since the overall appearance variability is high compared to often subtle signatures of diseases. To learn models that capture the relationship between semantic clinical information and image elements at scale, we have to rely on data generated during clinical routine (images and radiology reports), since expert annotation is prohibitively costly.

The problem of learning relations between local image regions and labels on the image level can be posed as a multi-label multi-instance learning (MIL) problem. Retrieval related to clinical findings such as lung textures poses a very particular form of MIL that differs from standard MIL metric learning techniques in several aspects. The number of instances per bag is substantially higher than in standard MIL data reported in the literature (≫ 1000 vs. ∼ 10, as e.g. in [1, 2]) or in MI benchmark datasets such as Fox, Tiger, and Elephant. The optimization problem in [1] grows quadratically with the number of instances. Furthermore, when analysing medical imaging data, the bags are heavily skewed, each bag containing a large portion of healthy instances, since even patients' lungs contain healthy tissue. This poses challenges to distance definitions on the bag level, where the minimum distance among the instances of two bags is used to judge their relationship [1, 2].
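The bag-level minimum distance used in [1, 2] can be sketched as follows. This is a hypothetical illustration (function and variable names are ours, not from the cited works); it shows why, with heavily skewed bags, the minimum pairwise distance is dominated by the healthy majority and hides the diseased content:

```python
import numpy as np

def min_bag_distance(bag_a, bag_b):
    """Bag-level distance as used in standard MIL metric learning:
    the minimum Euclidean distance over all instance pairs.
    bag_a: (n_a, d) array, bag_b: (n_b, d) array."""
    # Pairwise differences via broadcasting -> squared distances (n_a, n_b).
    diff = bag_a[:, None, :] - bag_b[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    return np.sqrt(d2.min())

# Two skewed bags: mostly "healthy" instances near the origin, a few
# "diseased" instances far away. The minimum distance between the bags
# is small, because it is realised by two healthy instances.
rng = np.random.default_rng(0)
healthy_a = rng.normal(0.0, 0.1, size=(95, 8))
diseased_a = rng.normal(5.0, 0.1, size=(5, 8))
bag_a = np.vstack([healthy_a, diseased_a])
bag_b = rng.normal(0.0, 0.1, size=(100, 8))  # entirely healthy bag
print(min_bag_distance(bag_a, bag_b))  # small, despite diseased content in bag_a
```

Note also the quadratic cost: the pairwise distance matrix has n_a × n_b entries, which is what makes this formulation expensive for bags with ≫ 1000 instances.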

We demonstrate that re-mapping visual features extracted from medical imaging data based on weak image volume level label information creates descriptions of local image content that capture clinically relevant information. These labels can be extracted from radiology reports that describe findings in image volumes. Results show that these features enable higher recall and precision during retrieval compared to visual features. Furthermore, after learning, we can map specific semantic terms describing disease patterns to localized image volume areas.

The method consists of a training phase and an indexing (application) phase. During training, multiple dense, random, independent partitionings of the feature space are generated by a random ferns ensemble [3]. Based on the label distributions in the resulting partitions, a remapping of feature vectors is generated that captures the link between appearance and weak labels. In the indexing phase, an ensemble affinity of a novel record to each class is calculated, and a corresponding semantic profile feature vector is generated.
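The two phases above can be sketched schematically: random ferns partition the feature space with random axis-aligned threshold tests, each leaf stores the weak-label distribution of the training vectors falling into it, and the semantic profile of a new vector is the ensemble-averaged class affinity. This is our simplified reading, not the published implementation; all class and parameter names are invented for illustration:

```python
import numpy as np

class SemanticProfileMapper:
    """Schematic sketch: an ensemble of random ferns partitions the feature
    space; each leaf stores the label distribution of training vectors that
    fall into it; transform() returns ensemble-averaged class affinities."""

    def __init__(self, n_ferns=10, depth=8, n_classes=5, seed=0):
        self.n_ferns, self.depth, self.n_classes = n_ferns, depth, n_classes
        self.rng = np.random.default_rng(seed)

    def _leaf_index(self, X):
        # Each fern applies `depth` random threshold tests; the binary
        # outcomes form an integer leaf index per sample: shape (n, n_ferns).
        bits = (X[:, self.dims] > self.thresh).astype(int)   # (n, ferns, depth)
        return bits.dot(1 << np.arange(self.depth))

    def fit(self, X, y):
        n, d = X.shape
        # Random (dimension, threshold) tests define the dense partitionings.
        self.dims = self.rng.integers(0, d, size=(self.n_ferns, self.depth))
        lo, hi = X.min(axis=0), X.max(axis=0)
        self.thresh = self.rng.uniform(lo[self.dims], hi[self.dims])
        leaves = self._leaf_index(X)
        # Per-fern, per-leaf label histograms (small prior avoids empty leaves).
        self.hist = np.full((self.n_ferns, 2 ** self.depth, self.n_classes), 1e-3)
        for f in range(self.n_ferns):
            np.add.at(self.hist[f], (leaves[:, f], y), 1.0)
        self.hist /= self.hist.sum(axis=2, keepdims=True)
        return self

    def transform(self, X):
        # Semantic profile: leaf label distributions averaged over the ensemble.
        leaves = self._leaf_index(X)
        profiles = np.stack([self.hist[f][leaves[:, f]] for f in range(self.n_ferns)])
        return profiles.mean(axis=0)   # (n, n_classes)

# Usage on synthetic data: three separable classes in a 6-D feature space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, size=(200, 6)) for c in range(3)])
y = np.repeat(np.arange(3), 200)
profiles = SemanticProfileMapper(n_ferns=20, depth=6, n_classes=3, seed=1).fit(X, y).transform(X)
```

Fitted on supervoxel descriptors with report-level labels, such a mapper would yield, for each supervoxel, a five-dimensional affinity vector over the tissue classes — the semantic profile used for retrieval.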

For the experimental validation we created a weakly labelled data set from the labelled lung data. We performed experiments on a set of 300 high-resolution computed tomography (HRCT) scans of lungs. All voxels in the images are labelled with one of five tissue classes: healthy lung texture and four patterns (ground-glass, reticular pattern, honeycombing, emphysema) occurring in interstitial lung diseases (ILD). In advance, we perform over-segmentation of the volumes into supervoxels with an average size of 1 cm³. We re-bag sets of supervoxels so as to form training data equivalent to the training data a clinical source would provide for the proposed algorithm. For our experiments, we extract two texture descriptors for each supervoxel: (1) 1200-dimensional Texture Bags on Local Binary Patterns (BVW) and (2) 52-dimensional Haralick features around the center of the supervoxel. For each descriptor, we generate a semantic profile mapping (SP-BVW and SP-Haralick). Based on a set of queries, we rank the training data using Euclidean distance for each of the four feature representations. Figure 1 (b) shows ground truth labelings of a volume, and Figure 1 (c) a labeling obtained by assuming that the highest semantic profile coefficient is a good estimator for the correct label. Figure 2 shows precision-recall curves for the five tissue classes. The proposed method offers favourable runtime, with less than three minutes for learning on 615,000 instances and five classes. The calculation of the semantic profiles of 7,526 descriptors (one lung) takes 1.37 seconds.

This is an extended abstract. The full paper is available at the Computer Vision Foundation webpage.

Figure 1 (panels a–e): In medical imaging (a) only a small part of the information captured by visual features relates to relevant clinical information such as diseased tissue types (b). Information for learning is typically only available as sets of reported findings on the image level. We demonstrate how to learn a mapping of these weak semantic labels to individual voxels (c). This results in good labeling accuracy (d), and improved retrieval (e).

Figure 2 (two precision-recall plots, both axes 0–1, for the Ground-glass and Honeycombing classes; curves: BVW, SP-BVW, Haralick, SP-Haralick, baseline): Semantic Profiles (SP-BVW, SP-Haralick) consistently outperform the corresponding visual descriptors BVW and Haralick.
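The retrieval evaluation can be sketched as follows: rank the database by Euclidean distance to a query in a given feature representation, then trace precision and recall along the ranked list. This is a simplified, hypothetical illustration of the protocol, not the authors' evaluation code:

```python
import numpy as np

def rank_by_distance(query, database):
    """Return database indices sorted by ascending Euclidean distance."""
    d = np.linalg.norm(database - query, axis=1)
    return np.argsort(d)

def precision_recall_curve(ranking, relevant):
    """Precision and recall after each position of the ranked list.
    `relevant` is a boolean array over the database."""
    hits = np.cumsum(relevant[ranking])
    k = np.arange(1, len(ranking) + 1)
    return hits / k, hits / relevant.sum()

# Toy database: 50 relevant items near the query, 50 irrelevant far away.
rng = np.random.default_rng(0)
db = np.vstack([rng.normal(0, 0.3, (50, 5)), rng.normal(2, 0.3, (50, 5))])
labels = np.repeat([0, 1], 50)
query = np.zeros(5)
order = rank_by_distance(query, db)
prec, rec = precision_recall_curve(order, labels == 0)
```

A representation that places same-class supervoxels closer together (as the semantic profiles do) shifts this curve toward the top-right, which is what Figure 2 reports.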

[1] Matthieu Guillaumin, Jakob Verbeek, and Cordelia Schmid. Multiple instance metric learning from automatically labeled bags of faces. In Computer Vision – ECCV 2010, pages 634–647. Springer, 2010.

[2] Rong Jin, Shijun Wang, and Zhi-Hua Zhou. Learning a distance metric from multi-instance multi-label data. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pages 896–902. IEEE, 2009.

[3] Mustafa Ozuysal, Pascal Fua, and Vincent Lepetit. Fast keypoint recognition in ten lines of code. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), pages 1–8. IEEE, 2007.