Practical Issues on Clinical Validation of Digital Imaging Applications in Routine Surgical Pathology FDA Hematology and Pathology Devices Panel Meeting

October 22-23, 2009

Tan Nguyen, MD, PhD, RAC
FDA/CDRH/OIVD/DIHD-DCTD

Digitalization Not a Barrier to Pathologic Diagnosis

• Image-based telepathology has been in place for a number of years
• Availability of capable automated, high-speed, high-resolution whole slide imaging (WSI) technology
• At issue: How can we demonstrate that pathologists can safely and effectively sign out routine surgical cases via WSI of H&E glass slides?
  – Compare with diagnoses made by light microscopy

Presentation Outline

• Quality of images
  – Image acquisition, image display
• Clinical performance study
  – Possible study designs
  – Selection of study participants
  – Case (specimen) selection
  – Establishing “reference” diagnosis
  – Evaluating diagnosis agreement
  – Other issues

Image Acquisition

• Optimal objective lens power for image scanning?
  – Digital magnification or magnification by interchangeable objective lenses?
• Single-focus plane or 3-D image enhancement?
  – Z-stacks needed for certain examinations (e.g., surgical margin, H. pylori, microcalcifications, nucleoli)
• Compression algorithm, user-selectable ratio?
  – Diagnosis made on uncompressed image or image retrieved from prior compressed image data file?

Image Display

• Viewing monitor
  – Standardized size, aspect ratio, display resolution (low, medium, high)?
• Viewing software
  – Image storage, retrieval, annotation
• Viewer functionality
  – “Thumbnail” view
  – Panning, zooming, side-by-side viewing of multiple images

Types of Possible Clinical Study

• Prospective study (“field study”)?
  – Replicating real-world surgical pathology practice
  – Minimizing case selection bias
  – Introducing multiple new sources of variation
    • e.g., non-uniform specimen selection/suitability, variable quality of glass slides
  – Impractical?
    • Resource constraints at each study site
    • Possibly longer overall study duration

Types of Possible Clinical Study

• Retrospective study?
  – Ability to select archival cases to challenge (“stress test”) the competing diagnostic modalities
  – Possible to incorporate more case variation
  – Inherent case selection bias
  – Often employed in MRMC ROC studies* to assess diagnostic accuracy of radiologic imaging interpretations
    • Large study to detect small differences in accuracy possible

* Multiple-reader multiple-case receiver operating characteristic studies

MRMC ROC Paradigm

• Possible to adopt MRMC ROC paradigm?
  – Frequently used tool in diagnostic radiology
    • More information per case, smaller sample sizes
  – Ability to compare accuracy of diagnostic modalities that rely on a wide range of subjective interpretations by readers of varying degrees of skill
  – Generalizable to similar readers and similar cases
  – Potentially complicated by multiple observations (diagnoses) in the same specimen
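As a rough illustration of the MRMC ROC idea, the sketch below computes an empirical (Mann-Whitney) area under the ROC curve for each reader under each modality and averages it across readers. The confidence scores, the reader and modality labels, and the function names `empirical_auc` and `mean_auc_by_modality` are hypothetical and do not come from the presentation. A full MRMC analysis would also model reader and case variance, which this sketch omits.

```python
from statistics import mean

def empirical_auc(scores, truth):
    """Mann-Whitney AUC estimate: P(diseased score > non-diseased score); ties count 0.5."""
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one diseased and one non-diseased case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auc_by_modality(ratings, truth):
    """ratings: {modality: {reader: [score per case]}} -> {modality: reader-averaged AUC}."""
    return {
        modality: mean(empirical_auc(scores, truth) for scores in readers.values())
        for modality, readers in ratings.items()
    }

# Hypothetical data: 6 cases (1 = malignant per reference), 2 readers,
# each giving a 1-5 confidence-of-malignancy score per case and modality.
truth = [1, 1, 1, 0, 0, 0]
ratings = {
    "light_microscopy": {"reader_A": [5, 4, 3, 2, 1, 1], "reader_B": [5, 5, 2, 3, 1, 2]},
    "wsi":              {"reader_A": [5, 4, 2, 2, 1, 1], "reader_B": [4, 5, 3, 3, 2, 1]},
}
aucs = mean_auc_by_modality(ratings, truth)
for modality, auc in aucs.items():
    print(f"{modality}: mean AUC = {auc:.3f}")
```

Comparing the two reader-averaged values gives only a point estimate of the difference between modalities; deciding whether that difference is meaningful requires the variance modeling that MRMC methods supply.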

Selecting Study Participants

• Study participants
  – Spectrum from pathologists without formal specialty training to specialty experts, or a more homogeneous population?
  – Prior exposure to digital pathology
• Study locations
  – Community/academic practices, commercial laboratories
• Number of study participants?
  – Traditional MRMC ROC studies: 10-20 readers; 100-300 cases

Selecting a Balanced Set of Cases

• Adequate mix of biopsies to radical excisions
  – Broad spectrum of diagnostic complexity
    • Not based on ease of diagnosis, typicality of appearance
  – Randomly or sequentially selected specimens
  – Anonymized archival or prospectively collected cases
  – Use of enriched samples for low-prevalence diseases?
  – Including all or only representative diagnostic part(s)?
  – How many cases?
    • Statistical power against reader’s burden
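To get a feel for the trade-off between statistical power and reader burden, one simple starting point is the normal-approximation sample size needed to estimate an agreement rate within a chosen margin. The sketch below is illustrative only: the expected agreement rate (0.90), the margins, and the function name `cases_for_agreement_ci` are assumptions, and the formula ignores the correlation introduced when several readers read the same cases.

```python
import math
from statistics import NormalDist

def cases_for_agreement_ci(expected_agreement, half_width, confidence=0.95):
    """Cases needed so a normal-approximation CI for a proportion has the given half-width."""
    if not 0 < expected_agreement < 1 or half_width <= 0:
        raise ValueError("expected_agreement must be in (0, 1) and half_width > 0")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n = z ** 2 * expected_agreement * (1 - expected_agreement) / half_width ** 2
    return math.ceil(n)

# Assumed planning values: 90% expected agreement, two candidate margins.
for margin in (0.05, 0.03):
    n_cases = cases_for_agreement_ci(0.90, margin)
    print(f"margin ±{margin:.2f}: about {n_cases} cases per reader")
```

Under these assumed values, tightening the margin from ±0.05 to ±0.03 raises the requirement from 139 to 385 cases, which shows how quickly precision translates into reading workload.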

Observer Variation

• Inherent subjectivity in interpretation thresholds
  – e.g., “atypia,” tumor grading, borderline or uncommon lesions
  – Paucity of lesional area; intra-lesional variation
  – Lack of clear diagnostic criteria
  – Non-quantitative nature of scoring (e.g., pleomorphism)
  – Subjective distinctions on a histologic continuum
• Broad spectrum of experience and confidence
  – Diagnostic “aggressiveness” or hedging in uncertainty

Reducing Observer Variation

• Strict adherence to diagnostic criteria and guidelines
• Use of pro forma histopathology reporting form
• Use of a checklist of standardized diagnostic lines
• Free-text diagnosis for diagnostic uncertainty?
  – Accommodating personal reporting style, judgment
  – Statistically problematic to evaluate
• Collapsed 2-tiered versus 3-tiered grading system?
• Circulating an annotated training set prior to study?

Establishing Light Microscopy “Reference” Diagnosis

• Diagnosis by expert or consensus panel?
  – Number of experts
• Consensus diagnosis by study participants?
  – Unanimous agreement or majority agreement?
• Allowing “acceptable” diagnosis?
  – Disagreement in opinion, but not “error” (i.e., no amendment necessary)?
• Should the “reference” diagnosis be an abstraction of the primary diagnosis or of all diagnostic lines?

Evaluating Diagnosis Agreement

• Primary diagnosis agreement only?
  – 2° diagnoses often posing no clinical impact
  – Unacceptable for a pathologist simply to make accurate diagnosis of malignancy!
• Line-by-line agreement (1° and 2° diagnoses)?
  – Ideal for collecting performance testing data
  – Unrealistic to expect high agreement without clearly defined diagnostic criteria for all lesions under inquiry
  – Incomplete agreement on 2° diagnoses?
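One common way to summarize primary diagnosis agreement beyond raw percent agreement is Cohen's kappa, which discounts the agreement expected by chance. The sketch below computes it for one reader's primary diagnoses under the two modalities. The diagnosis categories and case data are invented for illustration, and the presentation does not prescribe kappa as the agreement measure.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:  # both sequences use a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical primary diagnoses for 8 cases read by the same pathologist.
light_microscopy = ["malignant", "malignant", "benign", "benign",
                    "atypia", "benign", "malignant", "benign"]
wsi = ["malignant", "malignant", "benign", "atypia",
       "atypia", "benign", "benign", "benign"]

kappa = cohen_kappa(light_microscopy, wsi)
print(f"kappa = {kappa:.2f}")  # 6/8 observed, 24/64 expected by chance -> 0.60
```

Kappa treats every disagreement alike, so it does not by itself distinguish clinically important discrepancies from inconsequential ones.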

Evaluating Diagnosis Agreement

• “Major” versus “minor” discrepancy
  – Determined based on clinical impact or flat-out histopathologic error?
    • Compound nevus versus junctional nevus; CIN II versus CIN III → flat-out error, but no difference in treatment
    • Tumor on inked margin or within 1 mm of inked margin in breast biopsy → often a subjective call if specimen not adequately inked, but greatly affecting treatment decision
  – False-positive versus false-negative diagnosis?
  – Treated differently or equally in statistical evaluation?

Evaluating Diagnosis Agreement

[Diagram: three sets of diagnoses, with the comparisons among them labeled R1, R2, and R3]
• Panel’s “reference” diagnoses (light microscopy)
• Participants’ diagnoses by digital pathology
• Participants’ diagnoses by light microscopy
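The three sets of diagnoses can be compared pairwise. The sketch below computes percent agreement with a Wilson 95% interval for each pair. Which pair corresponds to R1, R2, or R3 cannot be recovered from the slide, so the pairs are named explicitly instead. The case data, category codes, and function names are hypothetical.

```python
import math
from itertools import combinations

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

def pairwise_agreement(diagnoses):
    """diagnoses: {source: [label per case]} -> {(src_a, src_b): (rate, (lo, hi))}."""
    results = {}
    for (name_a, a), (name_b, b) in combinations(diagnoses.items(), 2):
        if len(a) != len(b):
            raise ValueError("all sources must cover the same cases")
        matches = sum(x == y for x, y in zip(a, b))
        results[(name_a, name_b)] = (matches / len(a), wilson_interval(matches, len(a)))
    return results

# Hypothetical data for 10 cases: M = malignant, B = benign, A = atypia.
diagnoses = {
    "reference":        ["M", "M", "M", "B", "B", "B", "B", "A", "A", "B"],
    "digital":          ["M", "M", "B", "B", "B", "B", "B", "A", "B", "B"],
    "light_microscopy": ["M", "M", "M", "B", "B", "B", "A", "A", "A", "B"],
}
agreement = pairwise_agreement(diagnoses)
for pair, (rate, (lo, hi)) in agreement.items():
    print(f"{pair[0]} vs {pair[1]}: {rate:.0%} (95% CI {lo:.2f}-{hi:.2f})")
```

With only 10 cases the intervals are wide, which is one reason the case counts discussed earlier matter.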

“Wash-out” Period

• E.g., a study involving the same pathologist reading:
  – ½ cases: digital imaging followed by light microscopy
  – ½ cases: light microscopy followed by digital imaging
• “Wash-out” period between digital imaging reading and light microscopy reading?
  – Easier said than done!
  – Not necessary, if desirable to know whether one modality, when seen first, results in improved agreement rate of the subsequent one?
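The half-and-half reading scheme described above can be set up by randomly splitting the case list into two reading-order groups. The sketch below is one minimal way to do that, assuming a fixed random seed for reproducibility. The case identifiers and the function name `assign_reading_order` are hypothetical, and the length of any wash-out interval is left open, as it is on the slide.

```python
import random

def assign_reading_order(case_ids, seed=0):
    """Randomly assign half the cases to each modality order for one reader."""
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    order = {}
    for i, case in enumerate(shuffled):
        if i < half:
            order[case] = ("digital", "light_microscopy")
        else:
            order[case] = ("light_microscopy", "digital")
    return order

# Hypothetical case identifiers.
cases = [f"case_{i:03d}" for i in range(1, 11)]
schedule = assign_reading_order(cases, seed=42)
for case in cases:
    first, second = schedule[case]
    print(f"{case}: read by {first} first, then {second}")
```

Keeping the first-read modality recorded per case also makes it possible to check later whether reading order affected agreement.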

Evaluating Diagnosis Agreement

• If significant disagreement between R1, R2, and R3:
  – Case-sample variation
  – Intra- and interobserver variations
  – Variation intrinsic to each diagnostic modality
• Possible, or necessary, to tease out all variations?
• Or, account for effects of case and reader variations on accuracy of competing diagnostic modalities?
  – e.g., by MRMC statistical models; then comparing the overall accuracy

Other Issues

• Assuming valid performance data exist for one tissue type (e.g., breast pathology):
  – Can the test system be generalized and labeled for all other surgical pathology tissue types without the need for further validations?
  – Can it be generalized and labeled for intraoperative (frozen section) diagnosis and telepathology?
  – If not, how should the label explicitly state the test system’s limitations?

Other Issues

• Generalizing performance of WSI of H&E glass slides to non-H&E-stained glass slides?
• Required training of pathologists prior to using WSI?
  – What type of training?
• Need for post-marketing study for additional safety and effectiveness data?
  – How to conduct such study?
  – What data to collect?
