MEDICAL IMAGE COMPUTING (CAP 5937)
LECTURE 14: Segmentation Evaluation Framework
Dr. Ulas Bagci, HEC 221, Center for Research in Computer Vision (CRCV), University of Central Florida (UCF), Orlando, FL
[email protected] or [email protected]
SPRING 2016
Segmentation Evaluation
Segmentation evaluation can be considered to consist of two components:
(1) Theoretical: study the mathematical equivalence among algorithms.
(2) Empirical: study the practical performance of algorithms in specific application domains.
Segmentation Evaluation: Theoretical
Fundamental challenges in segmentation evaluation:
(Ch1) Are major pI (purely image-based) frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, and watersheds truly distinct, or does some level of equivalence exist among them?
(Ch2) How to develop truly distinct methods that constitute real advances?
(Ch3) How to choose a method for a given application domain?
(Ch4) How to set an algorithm optimally for an application domain?
Currently, any method A can be shown empirically to be better than any method B, even when they are equivalent.
Segmentation Evaluation: Theoretical

Attributes commonly used by segmentation methods:
(1) Connectedness
(2) Texture
(3) Smoothness of boundary
(4) Gradient / homogeneity
(5) Shape information about the object
(6) Noise handling
(7) Optimization employed
(8) Orientedness of boundary

Attributes utilized by well-known delineation models:

| Model      | Connected         | Gradient          | Texture           | Smooth | Shape | Noise    | Optimize |
|------------|-------------------|-------------------|-------------------|--------|-------|----------|----------|
| Fuzzy con  | Yes               | Gr = hom affinity | Obj feat affinity | No     | No    | Scale FC | In RFC   |
| Chan-Vese  | No                | No                | Yes               | Yes    | No    | No       | Yes      |
| Mum-Shah   | No                | No                | Yes               | Yes    | No    | Yes      | Yes      |
| KWT snake  | Boundary          | Yes               | No                | Yes    | No    | No       | Yes      |
| MSV LS     | Fg when expanding | Yes               | No                | No     | No    | No       | No       |
| Live wire  | Boundary          | Yes               | Yes               | Yes    | User  | No       | Yes      |
| Act. shape | Yes               | No                | No                | No     | Yes   | No       | Yes      |
| Act. app   | Yes               | No                | Yes               | No     | Yes   | No       | Yes      |
| Graph cut  | Usually not       | Yes               | Possible          | No     | No    | No       | Yes      |
| Clustering | No                | No                | Yes               | No     | No    | No       | Yes      |
Segmentation Evaluation: Empirical
Application domain: a particular triple ⟨T, B, P⟩, where
T: a task. Example: estimating the volume of the brain.
B: a body region. Example: head.
P: an imaging protocol. Example: T2-weighted MR imaging with a particular set of parameters.
Q: a set of scenes acquired for a particular application domain ⟨T, B, P⟩.
Segmentation Evaluation: Empirical
The segmentation efficacy of a method M in an application domain ⟨T, B, P⟩ may be characterized by three groups of factors:
Precision (reliability): repeatability, taking into account all subjective actions influencing the result.
Accuracy (validity): degree to which the result agrees with truth.
Efficiency (viability): practical viability of the method.
Segmentation Evaluation: Empirical
For determining accuracy, we need true delineations, or surrogates of true delineations.
Q: a given set of images in ⟨T, B, P⟩.
Qtd: the corresponding set of images with true delineations.
Methods of generating true delineations:
(1) Manual delineation in the images in Q (trace or paint) → Qtd.
(2) Simulated images I: create an ensemble of "cut-outs" of the object from different images and bury them realistically in different images → Q. The cut-outs are segmented carefully → Qtd.
Segmentation Evaluation: Empirical
[Figure] A slice (a) of a scene simulated from an acquired MR proton density scene of a Multiple Sclerosis patient's brain, and its "true" segmentation (b) of the lesions.
Segmentation Evaluation: Empirical
(3) Simulated scenes II: start from (binary/fuzzy) objects (Qtd) segmented from real scenes. Add intensity contrast, blur, noise, and background variation realistically → Q.
[Figure] White matter (WM) in a gray matter background, simulated by segmenting WM from real MR images and adding blur, noise, and background variation: (a) low, (b) medium, and (c) high.
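A minimal sketch of the simulation idea in (3), assuming numpy. The function name and the parameter values (contrast, noise level, background slope) are illustrative only, and blur is omitted for brevity:

```python
import numpy as np

def simulate_scene(binary_obj, contrast=100.0, noise_sigma=5.0,
                   bg_slope=0.2, seed=0):
    """Turn a 'true' binary object (from Q_td) into a simulated grey-level
    scene (for Q) by adding intensity contrast, Gaussian noise, and a
    linear background variation. All parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    obj = np.asarray(binary_obj, dtype=float)
    # slow background drift along the first axis
    ramp = bg_slope * np.arange(obj.shape[0])[:, None]
    noise = rng.normal(0.0, noise_sigma, size=obj.shape)
    return contrast * obj + ramp + noise
```

Increasing `noise_sigma` and `bg_slope` corresponds to moving from the low (a) to the high (c) degradation level shown on the slide.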
Segmentation Evaluation: Empirical
(4) Simulated scenes III: as in (3) or (1), but apply realistic deformations to the scenes in Q and Qtd.
[Figure] Simulating more scenes (c) and their "true" segmentations (d) from existing scenes (a) and their manual segmentations (b) by applying known realistic deformations.
Segmentation Evaluation: Empirical
(5) Simulated scenes IV: start from realistic mathematical phantoms (Qtd). Simulate the imaging process with noise, blur, background variation, etc. → Q.
Example: the Montreal Neurological Institute BrainWeb data.
Segmentation Evaluation: Empirical
(6) Estimating surrogate segmentations from manual segmentations (STAPLE): have many manual segmentations for each image in Q, and estimate the segmentation that represents the best estimate of truth → Qtd.
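STAPLE itself is an EM algorithm that weights raters by their estimated performance; as a far simpler stand-in, a per-voxel majority vote over the manual segmentations already gives a crude surrogate of truth. The sketch below (numpy assumed, function name illustrative) implements only that simpler vote, not STAPLE:

```python
import numpy as np

def consensus_truth(manual_masks):
    """Per-voxel majority vote over several binary manual segmentations
    of the same image; a simplified surrogate-of-truth estimate, not the
    EM-based STAPLE algorithm."""
    stack = np.asarray(manual_masks, dtype=bool)  # shape: (raters, ...)
    votes = stack.sum(axis=0)
    # a voxel is 'true object' if at least half the raters marked it
    return votes * 2 >= stack.shape[0]
```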
Segmentation Evaluation: Empirical
Precision
Repeatability, taking into account all subjective actions that influence the segmentation result (fuzzy membership image). Sources of variation:
- Intra-operator variations
- Inter-operator variations
- Intra-scanner variations
- Inter-scanner variations
Inter-scanner variations include variations due to the same brand and to different brands.
Segmentation Evaluation: Empirical
Precision
A measure of precision for method M in a trial that produces segmentations C_M^O1 and C_M^O2 for situation T_i (intra/inter operator, intra/inter scanner) is given by

PR_M^Ti = 1 - |C_M^O1 - C_M^O2| / [ (|C_M^O1| + |C_M^O2|) / 2 ]

C_M^O1 and C_M^O2 may be binary or fuzzy segmentations.
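The precision formula above can be sketched directly (numpy assumed; the function name is illustrative). Here |.| sums voxel memberships, so the same code works for binary masks and fuzzy membership images:

```python
import numpy as np

def precision_measure(c1, c2):
    """Precision PR for two repeated segmentations C1, C2 of the same
    scene: PR = 1 - |C1 - C2| / ((|C1| + |C2|) / 2)."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    disagreement = np.abs(c1 - c2).sum()        # |C1 - C2|
    mean_volume = (c1.sum() + c2.sum()) / 2.0   # (|C1| + |C2|) / 2
    return 1.0 - disagreement / mean_volume
```

Identical repeat segmentations give PR = 1; the more the two trials disagree, the lower PR becomes.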
Segmentation Evaluation: Empirical
Accuracy
The degree to which segmentations agree with the true segmentation. Surrogates of truth are needed.
For any image C acquired for an application domain ⟨T, B, P⟩:
C_M^O  - fuzzy segmentation of object O in C by method M,
C_td^O - surrogate of the true delineation of O in C.
Segmentation Evaluation: Empirical
Accuracy: delineation volume fractions

FNVF_M^d = |C_td^O - C_M^O| / |C_td^O|
TPVF_M^d = |C_td^O ∩ C_M^O| / |C_td^O|
FPVF_M^d = |C_M^O - C_td^O| / |U_d - C_td^O|
TNVF_M^d = |U_d - (C_td^O ∪ C_M^O)| / |U_d - C_td^O|

U_d: a binary scene representing a reference super set (for example, this may be the body region that is imaged).
FNVF_M^d: amount of tissue truly in O that is missed by M.
FPVF_M^d: amount of tissue falsely delineated by M.
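For binary masks the four volume fractions can be computed as follows (numpy assumed; names are illustrative). The tissue-conservation identities FNVF = 1 - TPVF and FPVF = 1 - TNVF then hold by construction:

```python
import numpy as np

def volume_fractions(c_td, c_m, u_d):
    """FNVF, FPVF, TPVF, TNVF for a binary method output c_m against a
    binary surrogate of truth c_td, within a reference super set u_d."""
    c_td, c_m, u_d = (np.asarray(x, dtype=bool) for x in (c_td, c_m, u_d))
    td = c_td.sum()                       # |C_td|
    bg = (u_d & ~c_td).sum()              # |U_d - C_td|
    fnvf = (c_td & ~c_m).sum() / td       # truly in O but missed by M
    tpvf = (c_td & c_m).sum() / td        # truly in O and delineated by M
    fpvf = (c_m & ~c_td).sum() / bg       # falsely delineated by M
    tnvf = (u_d & ~(c_td | c_m)).sum() / bg
    return fnvf, fpvf, tpvf, tnvf
```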
Segmentation Evaluation: Empirical
Requirements for accuracy metrics:
(1) Capture M's behavior of trade-off between FP and FN.
(2) Satisfy the laws of tissue conservation:
    FNVF_M^d = 1 - TPVF_M^d
    FPVF_M^d = 1 - TNVF_M^d
(3) Be capable of characterizing the range of behavior of M.
(4) Any monotonic function g(FNVF, FPVF) is fine as a metric.
(5) Be appropriate for ⟨T, B, P⟩.
Segmentation Evaluation: Empirical
Delineation Operating Characteristic (DOC)
[Figure: DOC curve plotting 1-FNVF against FPVF, for brain WM segmentation in PD MR images.]
Each value of the parameter vector p of M gives a point on the DOC curve. The DOC curve characterizes the behavior of M over a range of parametric values of M.
A_M: area under the DOC curve.
Segmentation Evaluation: Empirical
Optimally setting an algorithm for ⟨T, B, P⟩:
p - parameter vector for method M
g_p(FPVF, FNVF) - a monotonic function
p* = arg min_p [ g_p(FPVF, FNVF) ]
Set M to operate at p*.
[Figure: DOC curve, 1-FNVF versus FPVF, with both axes running from 0 to 1.]
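Over a sampled DOC curve, the rule p* = arg min_p g_p(FPVF, FNVF) reduces to a simple minimization. This sketch assumes the curve has already been sampled into a dict mapping each parameter setting to its (FPVF, FNVF) pair; the default g, which simply sums the two error fractions, is one illustrative choice of monotonic function:

```python
def select_operating_point(doc_points, g=lambda fpvf, fnvf: fpvf + fnvf):
    """Pick the operating point p* = argmin_p g(FPVF_p, FNVF_p).

    doc_points: dict mapping parameter setting p -> (FPVF, FNVF)
    g: any monotonic trade-off function over the two error fractions
    """
    return min(doc_points, key=lambda p: g(*doc_points[p]))
```

A weighted g, e.g. `lambda fp, fn: 2 * fn + fp`, would penalize missed tissue more heavily, which is often appropriate in clinical tasks.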
Existent Segmentation Data
[Figure: original image of a low-grade glioma and manual segmentations performed by 4 independent experts (Expert 1 through Expert 4).]
Segmentation Evaluation: Empirical
Efficiency
Describes the practical viability of a method. Four factors should be considered:
(1) Computational time for one-time training of M: t_M^c1
(2) Computational time for segmenting each scene: t_M^c2
(3) Human time for one-time training of M: t_M^h1
(4) Human time for segmenting each scene: t_M^h2
(2) and (4) are crucial; (4) determines the degree of automation of M.
Segmentation Evaluation: Empirical
Summary of the 11 evaluation parameters:
Precision:
  PR_M^T1: intra operator
  PR_M^T2: inter operator
  PR_M^T3: intra scanner
  PR_M^T4: inter scanner
Accuracy:
  FNVF_M^d: FN fraction for delineation
  FPVF_M^d: FP fraction for delineation
  A_M: area under the DOC curve
Efficiency:
  t_M^c1: computational time for algorithm training
  t_M^c2: computational time for scene segmentation
  t_M^h1: operator time for algorithm training
  t_M^h2: operator time for scene segmentation
Segmentation Evaluation
Software Systems for Segmentation

| Software         | OS         | Cost          | Tools |
|------------------|------------|---------------|-------|
| 3D Doctor        | W          | fee           | Manual tracing |
| 3D Slicer        | W, L, U    | no fee        | Manual, EM methods, level sets |
| 3DVIEWNIX        | L, U       | binary no fee | Manual, optimal thresh., FC family, live wire family, fuzzy thresh., clustering, live snake |
| Amira            | W, L, U, M | fee           | Manual, snakes, region growing, live wire |
| Analyze          | W, L, U    | fee           | Manual, region growing, contouring, math morph, interface to ITK |
| Aquarius         | Unknown    | fee           | Unknown |
| Brain Voyager    | W, L, U    | fee           | Thresholding, region growing, histogram methods |
| CAVASS           | W, L, U, M | no fee        | Manual, opt thresh., FC family, live wire family, fuzzy thresh., clustering, live snake, active shape, interface to ITK |
| etdips           | W          | no fee        | Manual, thresholding, region growing |
| Freesurfer       | L, M       | no fee        | Atlas-based (for brain MRI) |
| AdvantageWindows | U, W       | fee           | Unknown |
| Image Pro        | W          | fee           | Color histogram |
| Imaris           | W          | fee           | Thresholding (microscopic images) |
| ITK              | W, L, U, M | no fee        | Thresh., level sets, watershed, fuzzy connectedness, active shape, region growing, etc. |
| MeVisLab         | W, L       | binary no fee | Manual, thresh., region growing, fuzzy connectedness, live wire |
| MRVision         | L, U, M    | fee           | Manual, region growing |
| Osiris           | W, M       | no fee        | Thresholding, region growing |
| RadioDexter      | Unknown    | fee           | Unknown |
| SurfDriver       | W, M       | fee           | Manual |
| SliceOmatic      | W          | fee           | Thresholding, watershed, region growing, snakes |
| Syngo InSpace    | Unknown    | fee           | Automatic bone removal |
| VIDA             | Unknown    | fee           | Manual, thresholding |
| Vitrea           | Unknown    | fee           | Unknown |
| VolView          | W, L, U    | fee           | Level sets, region growing, watershed |
| Voxar            | W, L, U    | fee           | Unknown |
Remarks
(1) Precision, accuracy, and efficiency are interdependent: higher accuracy typically comes at the cost of efficiency, and achieving both high precision and high accuracy is difficult.
(2) "Automatic segmentation method" has no meaning unless the results are proven on a large number of data sets with acceptable precision, accuracy, and efficiency, and with t_M^h2 = 0.
(3) A descriptive answer to "is method M1 better than M2 under ⟨T, B, P⟩?" in terms of the 11 parameters is more meaningful than a "yes" or "no" answer.
(4) The DOC is essential to describe the range of behavior of M.
Shape-Based Metrics for Segmentation Evaluation
Example 1: Sensitivity = 94.69%, Specificity = 94.19%
Example 2: Sensitivity = 72.99%, Specificity = 78.16%
If you use only DSC (the Dice similarity coefficient, an overlap measure), the DSC values in the two examples are similar to each other, but the sensitivity and specificity values are not. Is overlap alone sufficient?
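The point can be checked numerically. A minimal sketch (numpy assumed, function name illustrative) computing DSC, sensitivity, and specificity from binary masks:

```python
import numpy as np

def dsc_sens_spec(truth, pred):
    """Dice similarity coefficient, sensitivity, and specificity for
    binary masks; a similar DSC can hide very different
    sensitivity/specificity trade-offs."""
    t = np.asarray(truth, dtype=bool)
    p = np.asarray(pred, dtype=bool)
    tp = (t & p).sum();  fp = (~t & p).sum()
    fn = (t & ~p).sum(); tn = (~t & ~p).sum()
    dsc = 2 * tp / (2 * tp + fp + fn)
    sens = tp / (tp + fn)   # true positive rate
    spec = tn / (tn + fp)   # true negative rate
    return dsc, sens, spec
```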
Hausdorff Distance
• Can be used as a complementary evaluation metric to the overlap measure, for measuring boundary mismatches.
• The lower the Hausdorff distance (HD), the better the segmentation accuracy.

HD(A, B) = max( max_{a∈A} d_B(a), max_{b∈B} d_A(b) )

where d_B(a) = min_{b∈B} d(a, b) is the distance of a point a on A from B.
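A direct, brute-force sketch of this definition for small point sets (numpy assumed; for large boundaries a distance-transform-based or scipy.spatial implementation would be preferable):

```python
import numpy as np

def hausdorff_distance(a_pts, b_pts):
    """Symmetric Hausdorff distance between two point sets (e.g. the
    boundary voxels of two segmentations):
        HD(A, B) = max( max_a min_b d(a, b), max_b min_a d(a, b) )."""
    a = np.asarray(a_pts, dtype=float)
    b = np.asarray(b_pts, dtype=float)
    # pairwise Euclidean distances, shape (|A|, |B|)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # directed distances A->B and B->A, then take the worse one
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```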
Segmentation Evaluation
An evaluation framework for computer-aided visualization and analysis should consist of:
(FW1) Real-life image data for several application domains ⟨T, B, P⟩.
(FW2) Reference segmentations (of all images) that can be used as surrogates of true segmentations.
(FW3) Specification of computable, effective, meaningful metrics for precision, accuracy, and efficiency.
(FW4) Several reference segmentation methods objectively optimized for each ⟨T, B, P⟩.
(FW5) Software incorporating (FW1)-(FW4).
Summary
Need unifying segmentation theories that can explain the equivalences and distinctness of existing algorithms. This can ensure true advances in segmentation.
Need evaluation frameworks with (FW1)-(FW5). This can standardize methods of empirical comparison of competing and distinct algorithms.