
Slide - 1 Confidential

3A Group

Dr Maria Athelogou

Grace Kim:

Clarification in the study design: Since we have contours of lesions from many readers (for 1A, 1B, and 1C: will be), are we planning to use these as the gold standard? That would mean that we DO NOT require the radiologists’ visual readings to test inter-algorithm variation. Please confirm. It would also mean that the study design includes an overlap metric between a radiologist’s contour and the result from an algorithm.

- We have to define a metric for the overlap

Answer Andy Buckler:

Let's discuss this during the next 3A call. My feeling is that we have two types of analyses: one against true volume (directly in the phantom studies such as 1A and 1C, and indirectly, by averaging readers' results, for clinical data such as 1B), and the other using MICCAI-type metrics such as volume overlap (and others from a variety of papers such as http://books.google.com/books?id=Ot_3klqqO3cC&pg=PA977&lpg=PA977&dq=miccai+metrics+segmentation&source=bl&ots=hrJgveEFDs&sig=inZv8lbHcJ4rC9PiPtUQxIKSPSI&hl=en&ei=kzyrTcHzDITMgQfzypD0BQ&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCAQ6AEwAw#v=onepage&q=miccai%20metrics%20segmentation&f=false).
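The overlap metric still has to be defined by the group. As a starting point for that discussion, the sketch below computes three measures commonly reported in segmentation challenges (Jaccard overlap, Dice coefficient, relative volume difference) between a reader's contour and an algorithm result. It assumes both segmentations have already been rasterised to sets of (x, y, z) voxel indices on the same grid; the function name and the toy data are hypothetical and not part of the study design.

```python
def overlap_metrics(reference, result):
    """Compare two segmentations given as sets of (x, y, z) voxel indices."""
    reference, result = set(reference), set(result)
    if not reference or not result:
        raise ValueError("both segmentations must contain at least one voxel")
    intersection = len(reference & result)
    union = len(reference | result)
    return {
        # Shared voxels divided by voxels in either segmentation.
        "jaccard": intersection / union,
        # Twice the shared voxels divided by the sum of both sizes.
        "dice": 2 * intersection / (len(reference) + len(result)),
        # Signed size difference relative to the reference (0 means equal volume).
        "relative_volume_difference": (len(result) - len(reference)) / len(reference),
    }


# Toy data: a 2x2x2 reader cube and an algorithm result shifted by one voxel in x.
reader = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
algorithm = {(x + 1, y, z) for (x, y, z) in reader}
metrics = overlap_metrics(reader, algorithm)
```

In the toy data the two volumes are identical while only half of each segmentation overlaps, which illustrates why a volume comparison and an overlap comparison answer different questions.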

3A group member contributions

Slide - 2 Confidential

Slide - 3 Confidential

Slide - 4 Confidential

Slide - 5 Confidential

Keep it simple?

Slide - 6 Confidential

Slide - 7 Confidential

http://lts08.bigr.nl/about.php

Slide - 8 Confidential

Slide - 9 Confidential

Each algorithm needs a different computing time for the volume calculation. We need an appropriate metric for the time factor (performance).
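One possible time metric is the median wall-clock time per case. The sketch below is only an illustration under that assumption: `time_per_case` and `dummy_segment` are hypothetical names, and wall-clock results are comparable between algorithms only when they run on the same hardware.

```python
import statistics
import time


def time_per_case(segment, cases, repeats=3):
    """Return the median wall-clock seconds per case for a segmentation callable."""
    per_case = []
    for case in cases:
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            segment(case)
            runs.append(time.perf_counter() - start)
        # Keep the fastest repeat to reduce noise from other processes.
        per_case.append(min(runs))
    return statistics.median(per_case)


def dummy_segment(case):
    """Stand-in for a real volume calculation (assumption for illustration)."""
    return sum(range(case))


median_seconds = time_per_case(dummy_segment, cases=[1000, 2000, 3000])
```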

Slide - 10 Confidential

Hubert Beaumont

Slide - 11 Confidential

Hubert Beaumont

Slide - 12 Confidential

Slide - 13 Confidential

According to study design page 3

We need an exact workflow for the measuring process (protocol)

Slide - 14 Confidential

From the last 3A call (April 14, 2011)

Regarding the definition of intra-software variability, may I suggest instead considering algorithm variability as a function of initial conditions (e.g. variation of the seed)?

The text would then read:

The aim of the study is twofold:

To estimate inter-algorithm variability

To estimate independently the variability of each algorithm with respect to variation of the initial conditions (e.g. the seed point)

These estimations rely on the volume estimation of synthetic nodules from CT
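The two aims above could be estimated as sketched below, assuming each algorithm reports one volume per seed point for the same synthetic nodule. The coefficient of variation is used here only as one candidate measure; the function names and the volumes (in mm³) are invented for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def variability(volumes_by_algorithm):
    """Return (per-algorithm seed variability, inter-algorithm variability)."""
    # Variability of each algorithm across its own seed points.
    seed_cv = {
        name: coefficient_of_variation(volumes)
        for name, volumes in volumes_by_algorithm.items()
    }
    # Variability between algorithms, using each algorithm's mean volume.
    means = [statistics.mean(v) for v in volumes_by_algorithm.values()]
    return seed_cv, coefficient_of_variation(means)


# Invented volumes for one nodule: three seed points per algorithm.
volumes = {
    "algorithm_a": [500.0, 510.0, 490.0],
    "algorithm_b": [540.0, 540.0, 540.0],
}
seed_cv, inter_cv = variability(volumes)
```

Here algorithm_b is insensitive to the seed but differs systematically from algorithm_a, so the two estimates are kept separate.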

Asked: As the whole evaluation is processed remotely via batch processing, I cannot see how to change contours interactively afterward. How do we check the result of a given segmentation and redraw part of it?

If I have understood correctly, it would be quite tedious to process the first segmentation, download the result, check the contours, correct the segmentation, get the list of points, upload the list to the server, and so on, maybe several times, and to do this for a significant set of lesions.

Hubert Beaumont

Slide - 15 Confidential

Image Analysis algorithms and Batch Analysis Service conformity? May we give the users the possibility to download cases, if needed? Seed points could be predefined by QIBA.

Using a predefined template for upload and download of control and result parameters.

Clarify 3.3.1

Slide - 16 Confidential

CTZ:

1) Anthropomorphic phantoms have to be measured exactly to guarantee a connection to the SI standards (page 2)

2) Very clear algorithm classification (page 2)

3) Make sure that digital phantom data are produced from physical objects instead of mathematical algorithms (page 7)

Slide - 17 Confidential

David Gustafson

Slide - 18 Confidential

David Gustafson

Slide - 19 Confidential

David Gustafson

Slide - 20 Confidential

David Gustafson

Slide - 21 Confidential

David Gustafson

Slide - 22 Confidential

David Gustafson

Slide - 23 Confidential

McNitt-Gray, Michael

A.B.: “ground truth.”  Would it also be possible to use LIDC data, where ground truth is the aggregate of the radiologist readers on this data?

If CoreLabs makes available the contours they generated from the 1B data (I understand from David Clunie they did record them, so in theory it is possible), then 3A might be able to use these for comparison as “truth”. If not, then curating an evaluation set from LIDC, which would also have contours, would be possible as well. LIDC would have a larger set of lesions to draw from (hundreds, not 32).

In both cases, the first things to address would be:

- Format of contours: can you read in and decipher contour formats? LIDC used an XML format; CoreLabs presumably used some DICOM SR format (not sure).
- How to deal with four (LIDC) or five (QIBA 1B) contours so that you can use them as a basis for comparison with an algorithm.

Do you want to compare actual contours (voxel-by-voxel comparison of boundaries using something like STAPLE)? Or just compare the outcome variable: volume, diameter, etc.? In each case, you will have 4 or 5 potential individual comparisons; or just take the average of the human contour results and compare the algorithm to that.
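Both options for combining the 4 or 5 reader contours can be sketched in a few lines. The example below uses a strict majority vote, which is a much simpler stand-in for STAPLE and not an implementation of it, plus the plain average of the reader volumes. It assumes each reader contour has already been converted to a set of (x, y, z) voxel indices; all names and data are hypothetical.

```python
from collections import Counter


def majority_vote(reader_masks):
    """Voxels marked by more than half of the readers (ties are excluded)."""
    counts = Counter(voxel for mask in reader_masks for voxel in set(mask))
    threshold = len(reader_masks) / 2
    return {voxel for voxel, n in counts.items() if n > threshold}


def mean_reader_volume(reader_masks, voxel_volume_mm3=1.0):
    """Average of the individual reader volumes."""
    total_voxels = sum(len(set(mask)) for mask in reader_masks)
    return voxel_volume_mm3 * total_voxels / len(reader_masks)


# Toy data: four readers outlining a lesion along one row of voxels.
readers = [
    {(0, 0, 0), (1, 0, 0)},
    {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
    {(0, 0, 0)},
    {(0, 0, 0), (1, 0, 0)},
]
consensus = majority_vote(readers)
average_volume = mean_reader_volume(readers)
```

The consensus mask supports a voxel-by-voxel comparison with an algorithm result, while the average volume supports only an outcome-variable comparison.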

McNitt-Gray, Michael