MEDICAL IMAGE COMPUTING (CAP 5937)

LECTURE 14: Segmentation Evaluation Framework

Dr. Ulas Bagci
HEC 221, Center for Research in Computer Vision (CRCV), University of Central Florida (UCF), Orlando, FL 32814.
[email protected] or [email protected]

SPRING 2016




Outline

• How to evaluate accuracy of image segmentation?
  – Qualitative
  – Quantitative

Segmentation Evaluation

Segmentation evaluation can be considered to consist of two components:

(1) Theoretical: study the mathematical equivalence among algorithms.

(2) Empirical: study the practical performance of algorithms in specific application domains.

Segmentation Evaluation: Theoretical

Fundamental challenges in segmentation evaluation:

(Ch1) Are major pI (purely image-based) frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, and watersheds truly distinct, or does some level of equivalence exist among them?

(Ch2) How to develop truly distinct methods that constitute a real advance?

(Ch3) How to choose a method for a given application domain?

(Ch4) How to set an algorithm optimally for an application domain?

Currently, any method A can be shown empirically to be better than any method B, even when they are equivalent.

Segmentation Evaluation: Theoretical

Attributes commonly used by segmentation methods:

(1) Connectedness
(2) Texture
(3) Smoothness of boundary
(4) Gradient / homogeneity
(5) Shape information about object
(6) Noise handling
(7) Optimization employed
(8) Orientedness of boundary

Attributes utilized by well-known delineation models:

Model      | Connected         | Gradient           | Texture             | Smooth | Shape | Noise | Optimize
Fuzzy conn | Yes               | Gr = hom. affinity | Obj. feat. affinity | No     | No    | Scale | In RFC
Chan-Vese  | No                | No                 | Yes                 | Yes    | No    | No    | Yes
Mum-Shah   | No                | No                 | Yes                 | Yes    | No    | Yes   | Yes
KWT snake  | Boundary          | Yes                | No                  | Yes    | No    | No    | Yes
MSV LS     | Fg when expanding | Yes                | No                  | No     | No    | No    | No
Live wire  | Boundary          | Yes                | Yes                 | Yes    | User  | No    | Yes
Act. shape | Yes               | No                 | No                  | No     | Yes   | No    | Yes
Act. app   | Yes               | No                 | Yes                 | No     | Yes   | No    | Yes
Graph cut  | Usually not       | Yes                | Possible            | No     | No    | No    | Yes
Clustering | No                | No                 | Yes                 | No     | No    | No    | Yes

Segmentation Evaluation: Empirical

Application domain: a particular triple 〈T, B, P〉, where

T : a task. Example: estimating the volume of the brain.
B : a body region. Example: head.
P : an imaging protocol. Example: T2-weighted MR imaging with a particular set of parameters.

Q : a set of scenes acquired for a particular application domain 〈T, B, P〉.

Segmentation Evaluation: Empirical

The segmentation efficacy of a method M in an application domain 〈T, B, P〉 may be characterized by three groups of factors:

Precision (Reliability): repeatability, taking into account all subjective actions influencing the result.

Accuracy (Validity): degree to which the result agrees with truth.

Efficiency (Viability): practical viability of the method.

Segmentation Evaluation: Empirical

For determining accuracy, we need true delineations or surrogates of true delineations.

Q : a given set of images in 〈T, B, P〉.
Qtd : the corresponding set of images with true delineations.

Methods of generating true delineations:

(1) Manual delineation in the images in Q – trace or paint → Qtd.

(2) Simulated scenes I: create an ensemble of “cut-outs” of the object from different images and bury them realistically in different images → Q. The cut-outs are segmented carefully → Qtd.

Segmentation Evaluation: Empirical

A slice (a) of a scene simulated from an acquired MR proton density scene of a Multiple Sclerosis patient’s brain, and its “true” segmentation (b) of the lesions.

Segmentation Evaluation: Empirical

(3) Simulated scenes II: start from (binary/fuzzy) objects (Qtd) segmented from real scenes. Add intensity contrast, blur, noise, and background variation realistically → Q.

White matter (WM) in a gray matter background, simulated by segmenting WM from real MR images and by adding blur, noise, and background variation: (a) low, (b) medium, and (c) high.
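Step (3) can be sketched with a small simulator that renders a binary “true” object into a scene and adds acquisition noise and a background drift. The intensity values, the linear drift model, and the function name below are illustrative assumptions, not the actual simulation used for these slides:

```python
import random

def simulate_scene(truth, shape, fg=1.0, bg=0.0, noise=0.1, drift=0.05, seed=0):
    """Render a binary 'true' object (set of (row, col) voxels) into a 2-D
    scene, adding Gaussian noise and a linear background variation."""
    rng = random.Random(seed)
    rows, cols = shape
    scene = []
    for i in range(rows):
        row = []
        for j in range(cols):
            val = fg if (i, j) in truth else bg
            val += drift * j / max(cols - 1, 1)   # background variation
            val += rng.gauss(0.0, noise)          # acquisition noise
            row.append(val)
        scene.append(row)
    return scene
```

Raising `noise` and `drift` moves the simulated scene from the “low” toward the “high” degradation case shown in the figure, while Qtd (the input `truth`) stays exact by construction.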

Segmentation Evaluation: Empirical

(4) Simulated scenes III: as in (3) or (1), but apply realistic deformations to the scenes in Q and Qtd.

Simulating more scenes (c) and their “true” segmentations (d) from existing scenes (a) and their manual segmentations (b) by applying known realistic deformations.

Segmentation Evaluation: Empirical

(5) Simulated scenes IV: start from realistic mathematical phantoms (Qtd). Simulate the imaging process with noise, blur, background variation, etc. → Q.

Example: Montreal Neurological Institute BrainWeb data.

Segmentation Evaluation: Empirical

(6) Estimating surrogate segmentations from manual segmentations (STAPLE): have many manual segmentations for each image in Q, then estimate the segmentation that represents the best estimate of truth → Qtd.
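A much simpler baseline in the same spirit is per-voxel majority voting over the manual segmentations (unlike STAPLE, this does not estimate each rater's sensitivity/specificity via EM). Representing each segmentation as a set of voxel coordinates is an illustrative assumption:

```python
from collections import Counter

def majority_vote(segmentations):
    """Consensus surrogate of truth: keep every voxel marked by a strict
    majority of the raters. Each segmentation is a set of voxel coords."""
    counts = Counter(v for seg in segmentations for v in seg)
    k = len(segmentations)
    return {v for v, c in counts.items() if c > k / 2}
```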

Segmentation Evaluation: Empirical

Precision: repeatability, taking into account all subjective actions that influence the segmentation result C_M^O (a binary or fuzzy membership image). Four situations are considered:

(T1) Intra-operator variations
(T2) Inter-operator variations
(T3) Intra-scanner variations
(T4) Inter-scanner variations

Inter-scanner variations include variations due to the same brand and due to different brands.

Segmentation Evaluation: Empirical

Precision

A measure of precision for method M in a trial that produces C_M^{O1} and C_M^{O2} for situation T_i (intra/inter operator, intra/inter scanner) is given by

    PR_M^{T_i} = 1 − |C_M^{O1} Δ C_M^{O2}| / (|C_M^{O1}| + |C_M^{O2}|),  i = 1, …, 4,

where Δ denotes the symmetric difference. C_M^{O1} and C_M^{O2} may be binary or fuzzy segmentations; PR_M^{T_i} equals 1 when the two segmentations are identical and 0 when they are disjoint.
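With binary segmentations represented as sets of voxel coordinates (an illustrative choice; the fuzzy case would replace cardinalities with sums of memberships), the precision measure above can be sketched as:

```python
def precision(c1, c2):
    """PR = 1 - |C1 Δ C2| / (|C1| + |C2|) for two binary segmentations
    given as sets of voxel coordinates: 1.0 if identical, 0.0 if disjoint."""
    if not c1 and not c2:
        return 1.0
    return 1.0 - len(c1 ^ c2) / (len(c1) + len(c2))
```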

Segmentation Evaluation: Empirical

Accuracy

The degree to which segmentations agree with the true segmentation. Surrogates of truth are needed.

For any image C acquired for application domain 〈T, B, P〉:

C_M^O : fuzzy segmentation of object O in C by method M.
C_td^O : surrogate of true delineation of O in C.

With respect to the true segmentation C_td^O and the segmentation C_M^O produced by algorithm M, the reference region U_d is partitioned into true positive (TP), false positive (FP), true negative (TN), and false negative (FN) regions.

Segmentation Evaluation: Empirical

U_d : a binary scene representing a reference super set (for example, this may be the body region that is imaged).

The delineation volume fractions are defined by

    TPVF_M^d = |C_M^O ∩ C_td^O| / |C_td^O|
    FNVF_M^d = |C_td^O − C_M^O| / |C_td^O|
    FPVF_M^d = |C_M^O − C_td^O| / |U_d − C_td^O|
    TNVF_M^d = |U_d − (C_td^O ∪ C_M^O)| / |U_d − C_td^O|

FNVF_M^d : amount of tissue truly in O that is missed by M.
FPVF_M^d : amount of tissue falsely delineated by M.
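For binary segmentations stored as voxel-coordinate sets (again an illustrative representation; the fuzzy case would sum memberships instead of counting voxels), the four volume fractions can be computed as:

```python
def volume_fractions(c_m, c_td, u_d):
    """(TPVF, FNVF, FPVF, TNVF) for segmentation c_m against truth c_td,
    within the binary reference super set u_d. All are voxel-coordinate sets."""
    tpvf = len(c_m & c_td) / len(c_td)
    fnvf = len(c_td - c_m) / len(c_td)
    fpvf = len(c_m - c_td) / len(u_d - c_td)
    tnvf = len(u_d - (c_td | c_m)) / len(u_d - c_td)
    return tpvf, fnvf, fpvf, tnvf
```

By construction TPVF + FNVF = 1 and FPVF + TNVF = 1, which is the tissue-conservation requirement stated below.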

Segmentation Evaluation: Empirical

Requirements for accuracy metrics:

(1) Capture M's behavior of trade-off between FP and FN.
(2) Satisfy the laws of tissue conservation:

    FNVF_M^d = 1 − TPVF_M^d
    FPVF_M^d = 1 − TNVF_M^d

(3) Be capable of characterizing the range of behavior of M.
(4) Any monotonic function g(FNVF, FPVF) is fine as a metric.
(5) Be appropriate for 〈T, B, P〉.

Segmentation Evaluation: Empirical

Delineation Operating Characteristic (DOC)

The DOC curve plots 1 − FNVF against FPVF; the example shown is brain WM segmentation in PD MR images.

Each value of the parameter vector p of M gives a point on the DOC curve. The DOC curve characterizes the behavior of M over a range of parametric values of M.

A_M : area under the DOC curve.
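A DOC curve can be traced for a simple thresholding “method” whose parameter p is the threshold, with the area A_M estimated by the trapezoidal rule. The toy image representation and function names here are illustrative:

```python
def doc_points(image, truth, thresholds):
    """One (FPVF, 1-FNVF) point per threshold. `image` maps voxel -> intensity;
    `truth` is the set of voxels truly in the object."""
    u = set(image)
    pts = []
    for t in thresholds:
        seg = {v for v, g in image.items() if g >= t}
        tpvf = len(seg & truth) / len(truth)        # 1 - FNVF
        fpvf = len(seg - truth) / len(u - truth)
        pts.append((fpvf, tpvf))
    return pts

def doc_area(pts):
    """A_M: area under the DOC curve, by the trapezoidal rule."""
    pts = sorted(pts)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

For an image whose object and background intensities are perfectly separable, some threshold reaches the ideal operating point (FPVF = 0, 1 − FNVF = 1) and the area is 1.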

Segmentation Evaluation: Empirical

Optimally setting an algorithm for 〈T, B, P〉:

p : parameter vector for method M.
g_p(FPVF, FNVF) : a monotonic function.

    p* = arg min_p [ g_p(FPVF, FNVF) ]

Set M to operate at p*.
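Selecting p* can be sketched as a search over a finite set of candidate parameter vectors. Here `evaluate(p)` is assumed to run M at p against the surrogate truth and return (FPVF, FNVF), and the default g simply sums the two error fractions; both names are illustrative:

```python
def optimal_parameter(candidates, evaluate, g=lambda fpvf, fnvf: fpvf + fnvf):
    """p* = arg min_p g(FPVF(p), FNVF(p)) over the candidate parameter set."""
    return min(candidates, key=lambda p: g(*evaluate(p)))
```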

Existent Segmentation Data

Original image and manual segmentations by Experts 1–4.

• Manual segmentation performed by 4 independent experts
• Low-grade glioma

Expert and Student Segmentations

Test image, expert consensus, and segmentations by Students 1–3.

Segmentation Evaluation: Empirical

Efficiency

Describes the practical viability of a method. Four factors should be considered:

(1) Computational time for one-time training of M : t_M^{c1}
(2) Computational time for segmenting each scene : t_M^{c2}
(3) Human time for one-time training of M : t_M^{h1}
(4) Human time for segmenting each scene : t_M^{h2}

(2) and (4) are crucial; (4) determines the degree of automation of M.

Segmentation Evaluation: Empirical

The 11 parameters characterizing a method M:

Precision:
PR_M^{T1} : intra-operator
PR_M^{T2} : inter-operator
PR_M^{T3} : intra-scanner
PR_M^{T4} : inter-scanner

Accuracy:
FNVF_M^d : FN fraction for delineation
FPVF_M^d : FP fraction for delineation
A_M : area under the DOC curve

Efficiency:
t_M^{c1} : computational time for algorithm training
t_M^{c2} : computational time for scene segmentation
t_M^{h1} : operator time for algorithm training
t_M^{h2} : operator time for scene segmentation

Segmentation Evaluation

Software Systems for Segmentation

Software         | OS          | Cost   | Tools
3D Doctor        | W           | fee    | Manual tracing
3D Slicer        | W, L, U     | no fee | Manual, EM methods, level sets
3DVIEWNIX        | L, U binary | no fee | Manual, optimal thresh., FC family, live wire family, fuzzy thresh., clustering, live snake
Amira            | W, L, U, M  | fee    | Manual, snakes, region growing, live wire
Analyze          | W, L, U     | fee    | Manual, region growing, contouring, math morph, interface to ITK
Aquarius         | Unknown     | fee    | Unknown
Brain Voyager    | W, L, U     | fee    | Thresholding, region growing, histogram methods
CAVASS           | W, L, U, M  | no fee | Manual, opt thresh., FC family, live wire family, fuzzy thresh., clustering, live snake, active shape, interface to ITK
etdips           | W           | no fee | Manual, thresholding, region growing
Freesurfer       | L, M        | no fee | Atlas-based (for brain MRI)
AdvantageWindows | U, W        | fee    | Unknown
Image Pro        | W           | fee    | Color histogram

Segmentation Evaluation

Software Systems for Segmentation (continued)

Software      | OS          | Cost   | Tools
Imaris        | W           | fee    | Thresholding (microscopic images)
ITK           | W, L, U, M  | no fee | Thresh., level sets, watershed, fuzzy connectedness, active shape, region growing, etc.
MeVisLab      | W, L binary | no fee | Manual, thresh., region growing, fuzzy connectedness, live wire
MRVision      | L, U, M     | fee    | Manual, region growing
Osiris        | W, M        | no fee | Thresholding, region growing
RadioDexter   | Unknown     | fee    | Unknown
SurfDriver    | W, M        | fee    | Manual
SliceOmatic   | W           | fee    | Thresholding, watershed, region growing, snakes
Syngo InSpace | Unknown     | fee    | Automatic bone removal
VIDA          | Unknown     | fee    | Manual, thresholding
Vitrea        | Unknown     | fee    | Unknown
VolView       | W, L, U     | fee    | Level sets, region growing, watershed
Voxar         | W, L, U     | fee    | Unknown

Remarks

(1) Precision, accuracy, and efficiency are interdependent: improving accuracy typically costs efficiency, and improving precision and accuracy together is difficult.

(2) “Automatic segmentation method” (t_M^{h2} = 0) has no meaning unless the results are proven on a large number of data sets with acceptable precision, accuracy, and efficiency for 〈T, B, P〉.

(3) A descriptive answer to “is method M1 better than M2 under 〈T, B, P〉?” in terms of the 11 parameters is more meaningful than a “yes” or “no” answer.

(4) DOC is essential to describe the range of behavior of M.

Shape-Based Metrics for Segmentation Evaluation

Is overlap sufficient? Consider two example segmentations: one with Sensitivity = 94.69% and Specificity = 94.19%, the other with Sensitivity = 72.99% and Specificity = 78.16%. If you use only DSC (the Dice similarity coefficient, an overlap measure), the DSC values are similar to each other in both examples, even though the sensitivity–specificity values are not.
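The three metrics can be compared directly on voxel-coordinate sets (an illustrative representation). Note that specificity, unlike DSC, depends on the chosen reference region U, which is one reason the two can disagree:

```python
def dice(seg, truth):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    return 2.0 * len(seg & truth) / (len(seg) + len(truth))

def sensitivity_specificity(seg, truth, u):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP),
    computed within the reference region u."""
    sens = len(seg & truth) / len(truth)
    spec = len(u - (seg | truth)) / len(u - truth)
    return sens, spec
```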

Hausdorff Distance

• Can be used as a complementary evaluation metric to the overlap measure for measuring boundary mismatches!
• The lower the Hausdorff distance (HD), the better the segmentation accuracy!

    HD(A, B) = max( max_{a∈A} d_B(a), max_{b∈B} d_A(b) )

where d_B(a) = min_{b∈B} d(a, b) is the distance of a point a on A from B.
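For boundaries given as point sets in the plane (a simplification of the full image setting), the definition translates directly; this brute-force version is O(|A|·|B|):

```python
def hausdorff(a_pts, b_pts):
    """HD(A, B) = max(max_a d_B(a), max_b d_A(b)), d_B(a) = min_b ||a - b||,
    for two lists of 2-D boundary points."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    def directed(src, dst):
        return max(min(dist(p, q) for q in dst) for p in src)
    return max(directed(a_pts, b_pts), directed(b_pts, a_pts))
```

Taking the max of the two directed distances is what makes HD symmetric; a single directed distance can be small even when the other boundary strays far away.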

Segmentation Evaluation

An evaluation framework for computer-aided visualization and analysis should consist of:

(FW1) Real-life image data for several application domains 〈T, B, P〉.

(FW2) Reference segmentations (of all images) that can be used as surrogates of true segmentations.

(FW3) Specification of computable, effective, meaningful metrics for precision, accuracy, and efficiency.

(FW4) Several reference segmentation methods objectively optimized for each 〈T, B, P〉.

(FW5) Software incorporating (FW1)–(FW4).

Summary

• Need unifying segmentation theories that can explain the equivalences/distinctness of existing algorithms. This can ensure true advances in segmentation.

• Need evaluation frameworks with (FW1)–(FW5). This can standardize methods of empirical comparison of competing and distinct algorithms.

Slide Credits and References

• Credits to: Jayaram K. Udupa of Univ. of Penn., MIPG
• Neculai Archip, Ph.D.
• Simon K. Warfield, Ph.D. (See STAPLE Algorithm)