Text- and Content-based Approaches to Image Retrieval for the ImageCLEF 2009 Medical Retrieval Track
Matthew Simpson, Md Mahmudur Rahman, Dina Demner-Fushman, Sameer Antani, George R. Thoma
Lister Hill National Center for Biomedical Communications, National Library of Medicine, NIH, Bethesda, MD, USA
CLEF 2009

Retrieval tasks and approaches

• ITI project long-term goal
  – Find a way to combine image and text features so that the whole is greater than the sum of its parts

• Ad-hoc image retrieval
  – Text-based
  – Image content-based
  – Automatic mixed
  – Relevance feedback mixed

• Case-based document retrieval
  – Text-based

Text-based approach

• Indexing:
  – Create image documents for ad-hoc image retrieval
  – Create surrogate documents for case-based retrieval
  – Index using Essie
    • Term normalization using the SPECIALIST Lexicon
    • Query expansion based on UMLS synonymy
    • Term weighting based on location in the document
    • Phrase-based search

Text documents

• Image document
  – Title and caption provided by organizers
  – Mention extracted from paper
  – MEDLINE citation (abstract + MeSH)
  – PICO frame of the caption + image modality (structured caption summary)

• Surrogate document
  – MEDLINE citation
  – Caption, mention, and structured caption summary of each image contained in the article
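The two document types above can be pictured as simple records. The sketch below is only an illustration of that structure: the class names, field names, and the `build_surrogate` helper are assumptions, and the placeholder strings stand in for real text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageDocument:
    """One indexable document per image (field names are hypothetical)."""
    image_id: str
    title: str
    caption: str
    mention: str          # sentence(s) in the paper that refer to the image
    citation: str         # MEDLINE abstract + MeSH terms
    caption_summary: str  # PICO frame of the caption + image modality


@dataclass
class SurrogateDocument:
    """One indexable document per article, used for case-based retrieval."""
    article_id: str
    citation: str
    image_sections: List[str] = field(default_factory=list)


def build_surrogate(article_id, citation, images):
    """Combine the citation with caption, mention, and summary of each image."""
    sections = [
        " ".join([img.caption, img.mention, img.caption_summary])
        for img in images
    ]
    return SurrogateDocument(article_id, citation, sections)


# Placeholder content only; no real article text is reproduced here.
images = [
    ImageDocument("img1", "title 1", "caption 1", "mention 1", "citation", "summary 1"),
    ImageDocument("img2", "title 2", "caption 2", "mention 2", "citation", "summary 2"),
]
surrogate = build_surrogate("article1", "citation", images)
print(len(surrogate.image_sections))  # 2
```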

Text retrieval

• PICO-based structured query and case representation
  – <topicID>19</topicID> <description>Crohn's disease CT</description>
  – <modality essieExp="false">ct</modality> <modSyn>c.a.t.</modSyn> <modSyn>cat</modSyn> <modSyn>computerised axial tomography</modSyn> ….
  – <cond essieExp="true">Crohn's disease</cond> <condPN>crohn disease</condPN> <condSyn>Regional enteritis</condSyn> <condSyn>eleocolitis</condSyn> <condSyn>Cicatrizing enterocolitis</condSyn> <condSyn>granulomatous enteritis</condSyn> <condSyn>INFLAMMATORY BOWEL DISEASE</condSyn> <condSyn>regional enterocolitis</condSyn> …
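One way to read the structured topic is as groups of interchangeable terms, one group per frame element. The sketch below turns a shortened fragment into a generic boolean query (OR within a group, AND across groups). The `<topic>` root element, the helper names, and the query syntax are assumptions for illustration; this is not Essie's query language, and the `essieExp` attribute is ignored here.

```python
import xml.etree.ElementTree as ET

# Shortened fragment using only terms shown on the slide. The <topic> root
# is an assumption, added so that the fragment parses as XML.
TOPIC_XML = """
<topic>
  <topicID>19</topicID>
  <description>Crohn's disease CT</description>
  <modality essieExp="false">ct</modality>
  <modSyn>c.a.t.</modSyn>
  <modSyn>cat</modSyn>
  <cond essieExp="true">Crohn's disease</cond>
  <condPN>crohn disease</condPN>
  <condSyn>Regional enteritis</condSyn>
</topic>
"""


def frame_terms(topic, main_tag, extra_tags):
    """Collect a frame element and its variants, lower-cased, no duplicates."""
    terms = []
    for tag in [main_tag] + list(extra_tags):
        for element in topic.findall(tag):
            term = (element.text or "").strip().lower()
            if term and term not in terms:
                terms.append(term)
    return terms


def build_query(topic_xml):
    """Return a boolean query: OR within a frame element, AND across them."""
    topic = ET.fromstring(topic_xml)
    groups = [
        frame_terms(topic, "modality", ["modSyn"]),
        frame_terms(topic, "cond", ["condPN", "condSyn"]),
    ]
    clauses = []
    for group in groups:
        if group:
            clauses.append("(" + " OR ".join('"%s"' % t for t in group) + ")")
    return " AND ".join(clauses)


query = build_query(TOPIC_XML)
print(query)
```

Quoting each term keeps multi-word synonyms together as phrases, in the spirit of the phrase-based search listed earlier.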

CBIR - Image feature representation

• Concepts - color and texture patches from local image regions

• Low-level global features
  – Color (Color Layout Descriptor, MPEG-7)
  – Edge (histogram of local edge distribution and direction)
  – Texture (grey level co-occurrence matrix)
  – Average grey level (256-dimensional vector of blocks in image normalized to grey-level 64x64)
  – Lucene (LIRE)-based Color and Edge Directivity Descriptor (CEDD) and Fuzzy Color and Texture Histogram (FCTH)
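As an illustration of the average grey level feature, the sketch below assumes the image has already been normalized to 64x64 grey levels and that the 256 dimensions come from a 16x16 grid of 4x4-pixel blocks. The grid layout is an inference from the numbers on the slide, and the function name is hypothetical.

```python
def average_grey_feature(image, grid=16):
    """Return the mean grey level of each block in a grid x grid partition.

    `image` is a square list of rows of grey levels, assumed to be already
    resized to 64x64. With grid=16 this yields a 256-dimensional vector.
    """
    size = len(image)
    if size % grid != 0 or any(len(row) != size for row in image):
        raise ValueError("expected a square image whose side is a multiple of grid")
    block = size // grid
    feature = []
    for by in range(grid):
        for bx in range(grid):
            total = 0
            for y in range(by * block, (by + 1) * block):
                for x in range(bx * block, (bx + 1) * block):
                    total += image[y][x]
            feature.append(total / (block * block))
    return feature


# Synthetic test image: every pixel holds its row index.
image = [[y for _ in range(64)] for y in range(64)]
feature = average_grey_feature(image)
print(len(feature))  # 256
print(feature[0])    # 1.5, the mean of rows 0 to 3
```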

Image similarity computation

• Category-specific
  – Determine image category (training set of 5000 images manually assigned to 32 mutually exclusive categories)
  – Use category-specific weights in linear similarity matching

• Relevance feedback
  – Feature weights updated using images judged relevant
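A minimal sketch of linear similarity matching with per-category feature weights follows. The category names, weight values, and distance-to-similarity mapping are all assumptions. The slide does not give the relevance feedback update rule, so `update_weights` shows just one plausible heuristic (features on which the judged-relevant images agree with the query get more weight), not the authors' method.

```python
import math


def similarity(a, b):
    """Map Euclidean distance between two feature vectors into (0, 1]."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


# Hypothetical weights for two made-up categories (the slide mentions 32).
CATEGORY_WEIGHTS = {
    "xray": {"color": 0.1, "edge": 0.5, "texture": 0.4},
    "photo": {"color": 0.6, "edge": 0.2, "texture": 0.2},
}


def weighted_similarity(query, candidate, weights):
    """Linear combination of per-feature similarities."""
    return sum(w * similarity(query[name], candidate[name])
               for name, w in weights.items())


def rank(query, collection, weights):
    """Return image ids sorted by decreasing weighted similarity."""
    ordered = sorted(collection.items(),
                     key=lambda item: weighted_similarity(query, item[1], weights),
                     reverse=True)
    return [image_id for image_id, _ in ordered]


def update_weights(query, relevant, feature_names):
    """Assumed heuristic: weight = normalized mean similarity to relevant images."""
    raw = {}
    for name in feature_names:
        sims = [similarity(query[name], img[name]) for img in relevant]
        raw[name] = sum(sims) / len(sims)
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


query = {"color": [0.0, 0.0], "edge": [0.0, 0.0], "texture": [0.0, 0.0]}
collection = {
    "A": {"color": [3.0, 4.0], "edge": [0.0, 0.0], "texture": [0.0, 0.0]},
    "B": {"color": [0.0, 0.0], "edge": [3.0, 4.0], "texture": [3.0, 4.0]},
}
xray_order = rank(query, collection, CATEGORY_WEIGHTS["xray"])
photo_order = rank(query, collection, CATEGORY_WEIGHTS["photo"])
new_weights = update_weights(query, [collection["A"]], ["color", "edge", "texture"])
print(xray_order, photo_order)  # ['A', 'B'] ['B', 'A']
```

The same pair of images swaps order depending on the category, which is the point of using category-specific weights.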

Combining text and image

• Based on text search results,
  – Compute mean vector of top 5 retrieved images, use as input to category-specific retrieval
  – Select 3-5 relevant images manually, use as input to category-specific retrieval
  – Re-rank text retrieval results using visual retrieval scores

• Provide feedback using all retrieval results,
  – Expand query using image documents
  – Pad selected relevant images with new retrieval results
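Two of the steps above lend themselves to a short sketch: averaging the feature vectors of the top text hits to form a visual query, and re-ranking text results with visual scores. The slide does not say how the two scores are combined, so the weighted sum and the `alpha` parameter are assumptions, as is the premise that both scores are already scaled to [0, 1].

```python
def mean_vector(vectors):
    """Component-wise mean of equally long feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def visual_query_from_text_hits(ranked_ids, features, top_k=5):
    """Average the feature vectors of the top_k text-retrieved images."""
    return mean_vector([features[image_id] for image_id in ranked_ids[:top_k]])


def rerank(text_scores, visual_scores, alpha=0.7):
    """Re-rank text results by a weighted sum of text and visual scores.

    alpha is a hypothetical mixing weight; images without a visual score
    are treated as having a visual score of 0.
    """
    combined = {
        image_id: alpha * score + (1 - alpha) * visual_scores.get(image_id, 0.0)
        for image_id, score in text_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Toy feature vectors for two images returned by a text search.
features = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
visual_query = visual_query_from_text_hits(["a", "b"], features)
print(visual_query)  # [2.0, 3.0]

order = rerank({"a": 0.9, "b": 0.8}, {"a": 0.1, "b": 0.9}, alpha=0.5)
print(order)  # ['b', 'a']
```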

Relevance Feedback

Results

[Figure: bar chart of MAP, P@5, and Recall on a 0 to 0.9 scale. Labels recovered from the chart: visual, category-specific, RF, text, mixed, re-ranked, case-based, BRF, RF, RF+QE.]

Image-text search engine

Thank you!

Questions?