4th Program Face to Face, February 25, 2013
Andrew J. Buckler, MS, Principal Investigator, QI-Bench
With funding support provided by the National Institute of Standards and Technology
Slide 2
Agenda
1:00 PM  Overview of QI-Bench Progress since last F2F (Buckler)
         CTP and Q/R Demonstration (Reynolds)
1:30 PM  Architecture and Design Motivation (Dima)
         Component Model and Biomarker DB (Wernsing)
         Data Virtualization Layer (Reynolds)
2:30 PM  Break
         Specify and Formulate (Suzek)
         Compute Services, Analysis Library, Workflows (Danagoulian)
         QI-Bench Client / Workstation (Wernsing)
3:45 PM  CT Volumetry Test Bed (Buckler)
4:15 PM  Wrap-up (all)
4:45 PM  Adjourn
Slide 3
Resources are needed to address the widening gap between imaging capability as practiced and the capability of modern medicine.
Slide 4
Example: Beyond Anatomy to a Palette of Functional Measures
Biologic targets: glucose metabolism, bone formation, proliferation, hypoxia, amino acid metabolism, angiogenesis, receptor status, apoptosis
Probes/modalities: 18F-FDG, 18F-NaF, 18F-FLT, 18F-FACBC, 18F-FMISO, 18F-XXX, 18F-FES, DCE-MRI, PET
Slide 5
Biomarker Representation in Imaging vs. Genomics/Proteomics
Imaging has been around far longer than genomics/proteomics (1895 vs. 1995). Both are arrays of numbers, but only one has data conveniently pre-aligned for quantitative analysis; that alignment is what is missing for imaging.
Slide 6
Community Development of Quantitative Imaging Biomarkers
User base:
- Consortia and foundations interested in broadly promoting imaging biomarkers (e.g., FNIH Biomarkers Consortium, Prevent Cancer Foundation, RSNA QIBA)
- Academic groups and research centers developing novel imaging biomarkers and applications (e.g., Stanford, Georgetown, etc.)
- Medical device and software manufacturers producing software that quantifies image biomarkers (e.g., Definiens, Vital Images)
- Biopharmaceutical companies and/or CROs interested in utilizing specific imaging biomarkers in clinical trials (e.g., Merck, Otsuka)
- Government regulatory and standards agencies (e.g., FDA, NIST)
A community of people working together: no single stakeholder can do it alone, and this creates a need for standardized terminology and applications that use it.
Slide 7
Ex vivo and In vivo Biomarker Resources
Ex vivo biomarkers (genomic/proteomic) vs. in vivo biomarkers (imaging):
- Material resources: biobanks (Karolinska Institute Biobank, British Columbia Biobank) vs. probe/tracer banks (Radiotracer Clearinghouse)
- Data resources: biomarker databases (GEO, ArrayExpress, EDRN Biomarker Database, Infectious Disease Biomarker Database) vs. imaging biomarker resources (Midas, NBIA, XNAT, etc.)
- Metadata resources and information models: GO, MIAME vs. RadLex, DICOM, AIM, etc.
Certainly, as the science evolves, ex vivo and in vivo biomarkers will be thought of as on the same playing field, and even combined.
Slide 8
QIBO is analogous to GO
Advantages of shared terminology:
- GO: gene families, homologs, and orthologs create rich relationships; synonyms between researchers resolved
- QIBO: imaging biomarkers have rich relationships; synonyms between researchers resolved
Scope of the ontology:
- GO: does not enumerate all gene products; supports annotation
- QIBO: does not enumerate all imaging biomarkers; supports annotation
Cross-links between collaborating databases:
- GO: ArrayExpress, EMBL, Ensembl, GeneCards, KEGG, MGD, NextBio, PDB, SGD, UniProt, etc.
- QIBO: NBIA, Radiotracer Clearinghouse, etc.
Variable level-of-detail queries:
- GO: all gene products in the mouse genome vs. zooming in on only receptor tyrosine kinases
- QIBO: all ways to measure tumor volume vs. zooming in on % change of CT measurements of NSCLC tumor volumes
Slide 9
Worked Example (starting from the claim analysis we discussed in February 2011)
Measurements of tumor volume are more precise (reproducible) than uni-dimensional measurements of tumor diameter. Longitudinal changes in whole tumor volume during therapy predict clinical outcomes (i.e., OS or PFS) earlier than corresponding uni-dimensional measurements. Therefore, tumor response or progression as determined by tumor volume will be able to serve as the primary endpoint in well-controlled Phase II and III efficacy studies of cytotoxic and selected targeted therapies (e.g., antiangiogenic agents, tyrosine kinase inhibitors, etc.) in several solid, measurable tumors (including both primary and metastatic cancers of, e.g., lung, liver, colorectal, gastric, and head and neck cancer) and lymphoma. Changes in tumor volume can serve as the endpoint for regulatory drug approval in registration trials.
Biomarker claim statements are information-rich and may be used to set up the needed analyses.
Slide 10
The user enters information from the claim into the knowledgebase using Specify
(Claim statement as on Slide 9.)
Subject | Predicate | Object
CT | images | Tumor
Volumetry | analyzes | CT
Longitudinal Volumetry | estimates | TumorSizeChange
TumorSizeChange | predicts | TreatmentResponse (categoric / continuous)
Slide 11
…pulling various pieces of information,
(Claim statement as on Slide 9.)
Subject | Predicate | Object
CT | images | Tumor
Volumetry | analyzes | CT
Longitudinal Volumetry | estimates | TumorSizeChange
TumorSizeChange | predicts | CytotoxicTreatmentResponse
TyrosineKinaseInhibitor | is | CytotoxicTreatment
well-controlled Phase II and III efficacy studies | uses | CytotoxicTreatmentResponse
CytotoxicTreatment | influences | NonSmallCellLungCancer
CT | images | Thorax
Thorax | contains | NonSmallCellLungCancer
(Rows annotated with Intervention, Target, and Indication roles.)
Slide 12
…to form the specification.
(Claim statement as on Slide 9.)
To produce data for registration; to substantiate quality of evidence development.
Subject | Predicate | Object
CT | images | Tumor
Volumetry | analyzes | CT
Longitudinal Volumetry | estimates | TumorSizeChange
TumorSizeChange | predicts | CytotoxicTreatmentResponse
TyrosineKinaseInhibitor | is | CytotoxicTreatment
well-controlled Phase II and III efficacy studies | uses | CytotoxicTreatmentResponse
CytotoxicTreatment | influences | NonSmallCellLungCancer
CT | images | Thorax
Thorax | contains | NonSmallCellLungCancer
regulatory drug approval | dependsOn | PrimaryEndpoint
well-controlled Phase II and III efficacy studies | assess | PrimaryEndpoint
CT Volumetry | is | SurrogateEndpoint
Slide 13
Formulate interprets the specification as testable hypotheses,
(Claim statement as on Slide 9.)
Type of biomarker, in this case predictive (could have been something else, e.g., prognostic), to establish the mathematical formalism; plus the technical characteristic.
(Triple table as on Slide 12, with rows grouped into hypotheses 1, 2, and 3.)
Slide 14
setting up an investigation (I), study (S), assay (A) hierarchy
(Triple table as on Slide 12, with rows grouped into hypotheses 1, 2, and 3.)
Investigations to prove the hypotheses:
1. Technical Performance = Biological Target + Assay Method
2. Clinical Validity = Indicated Biology + Technical Performance
3. Clinical Utility = Biomarker Use + Clinical Validity
Investigation-Study-Assay hierarchy:
Investigation = {Summary Statistic} + {Study}
Study = {Descriptive Statistic} + Protocol + {Assay}
Assay = RawData + {AnnotationData}
AnnotationData = [AIM file | mesh | ...]
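The Investigation-Study-Assay hierarchy above can be sketched as plain data structures. A minimal Python sketch, where class and field names follow the slide but are illustrative rather than the actual QI-Bench schema:

```python
from dataclasses import dataclass, field
from typing import List

# Assay = RawData + {AnnotationData}; an annotation may be an AIM file, a mesh, etc.
@dataclass
class Assay:
    raw_data_uri: str
    annotations: List[str] = field(default_factory=list)  # e.g., AIM file or mesh URIs

# Study = {Descriptive Statistic} + Protocol + {Assay}
@dataclass
class Study:
    protocol_uri: str
    descriptive_statistics: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)

# Investigation = {Summary Statistic} + {Study}
@dataclass
class Investigation:
    kind: str  # e.g., "TechnicalPerformance", "ClinicalValidity", "ClinicalUtility"
    summary_statistics: List[str] = field(default_factory=list)
    studies: List[Study] = field(default_factory=list)

# Hypothetical instance (URIs invented for illustration):
inv = Investigation(
    kind="TechnicalPerformance",
    studies=[Study(protocol_uri="urn:protocol:1",
                   assays=[Assay(raw_data_uri="urn:ct:A-baseline",
                                 annotations=["urn:aim:seg-001"])])],
)
```

The nesting mirrors the braces in the BNF-style definitions: each `{...}` term becomes a list field.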
Slide 15
ADDING TRIPLES TO CAPTURE URIs:
Subject | Predicate | Object
ClinicalUtility | is | Investigation URI
ClinicalValidity | is | Investigation URI
TechnicalPerformance | is | Investigation URI
Investigation | hasSummaryStatistic | Type
Investigation | hasStudy | URI
Study | hasDescriptiveStatistic | Type
Study | hasProtocol | URI
Study | hasAssay | URI
Assay | hasRawData | URI
…and loading data into Execute (at least raw data, possibly annotations if they already exist).
DISCOVERED DATA, LOADING INTO THE RDSM:
A | is | Patient
A | isDiagnosedWith | Disease
Disease | is | NonSmallCellLungCancer
Pazopanib | is | TyrosineKinaseInhibitor
A | has | BaselineCT
A | has | TP1CT
A | has | TP2CT
B | isDiagnosedWith | Disease
B | has | BaselineCT
B | has | TP1CT
A | hasOutcome | Death
B | hasOutcome | Survival
Slide 16
If no annotations exist, Execute creates them (in either case leaving Analyze with its data set up for it)
Subject | Predicate | Object
ClinicalUtility | is | Investigation URI
ClinicalValidity | is | Investigation URI
TechnicalPerformance | is | Investigation URI
Investigation | hasSummaryStatistic | Type
Investigation | hasStudy | URI
Study | hasDescriptiveStatistic | Type
Study | hasProtocol | URI
Study | hasAssay | URI
Assay | hasRawData | URI
Assay | hasAnnotationData | URI
AIM file | isAnnotationData | URI
Mesh | isAnnotationData | URI
Either in batch or via scripted reader studies (using the Share and Duplicate functions of the RDSM to leverage cases across investigations); the knowledgebase is self-generating from the RDSM hierarchy and ISA-TAB description files.
Slide 17
Analyze performs the statistical analyses
Subject | Predicate | Object
A | is | Patient
A | isDiagnosedWith | Disease
Disease | is | NonSmallCellLungCancer
A | hasClinicalObservation | B
B | is | TumorShrinkage
C | is | Patient
C | hasClinicalObservation | B
Pazopanib | is | TyrosineKinaseInhibitor
A | isTreatedWith | Pazopanib
A | hasOutcome | Death
C | hasOutcome | Survival
(Triple table as on Slide 12, plus: CT Volumetry | is | SurrogateEndpoint for CytotoxicTreatment.)
Slide 18
…and adds the results to the knowledgebase (using W3C best practices for relation strength).
Subject | Predicates
URI=45324 | biasMethod; bias; variabilityMethod; variability
URI=9956 | Method; correlation; ROC
URI=98234 | Effect of treatment on true endpoint; Effect of treatment on surrogate endpoint; Effect of surrogate on true endpoint; Effect of treatment on true endpoint relative to that on surrogate endpoint
(Triple table as on Slide 12, plus: CT Volumetry | is | SurrogateEndpoint for CytotoxicTreatment.)
Slide 19
Package structures submissions according to eCTD, HL7 RCRIM, and SDTM
Section 2: Summaries
  2.1. Biomarker Qualification Overview
    2.1.1. Introduction
    2.1.2. Context of Use
    2.1.3. Summary of Methodology and Results
    2.1.4. Conclusion
  2.2. Nonclinical Technical Methods Data
    2.2.1. Summary of Technical Validation Studies and Analytical Methods
    2.2.2. Synopses of individual studies
  2.3. Clinical Biomarker Data
    2.3.1. Summary of Biomarker Efficacy Studies and Analytical Methods
    2.3.2. Summary of Clinical Efficacy [one for each clinical context]
    2.3.3. Synopses of individual studies
Section 3: Quality
Section 4: Nonclinical Reports
  4.1. Study reports
    4.1.1. Technical Methods Development Reports
    4.1.2. Technical Methods Validation Reports
    4.1.3. Nonclinical Study Reports (in vivo)
  4.2. Literature references
Section 5: Clinical Reports
  5.1. Tabular listing of all clinical studies
  5.2. Clinical study reports and related information
    5.2.1. Technical Methods Development reports
    5.2.2. Technical Methods Validation reports
    5.2.3. Clinical Efficacy Study Reports [context for use]
  5.3. Literature references
Slide 20
Slide 21
Development priorities (Functionality / Theoretical Base / Test Beds)
Theoretical base: Domain Specific Language; Executable Specifications; Computational Model
Functionality: enterprise vocabulary / data service registry; end-to-end Specify -> Package workflows; curation pipeline workflows; DICOM (segmentation objects, query/retrieve, structured reporting); worklist for scripted reader studies; improved query/search tools (including link of Formulate and Execute); continued expansion of the Analyze toolbox
Test beds: further analysis of 1187/4140, 1C, and other data sets using LSTK and/or the API to other algorithms; support more 3A-like challenges; integration of detection into the pipeline; meta-analysis of reported results using Analyze; false-positive reduction in lung cancer screening; other biomarkers
Slide 22
Unifying Goal
Perform end-to-end characterization of imaging biomarkers (e.g., vCT), including meta-analysis of the literature, incorporation of results from groups like QIBA, and "scaling up" using automated detection and reference quantification methods.
Integrated characterization across heterogeneous data sources (e.g., QIBA, FDA, LIDC/RIDER, Give-a-scan, Open Science sets), through analysis modules, rolled up in a way directly useful for electronic submissions.
Specifically, have medical physicists, statisticians, and imaging scientists able to use it (as opposed to only software engineers).
Slide 23
CTP AND Q/R DEMONSTRATION
Slide 24
DICOM Protocol Implementation
Uses DCMTK to handle protocol interaction. Datasets can be selectively made available via the DICOM interface. Protocol support was tested against OsiriX, ClearCanvas, and Ginkgo CADx.
Slide 25
DICOM Anonymization
Removes patient-identifying information based on established protocols. Leverages the Clinical Trials Processor (CTP) from the Radiological Society of North America. All processing happens client-side.
Slide 26
Demo
Slide 27
Demo
Slide 28
Demo
Slide 29
Demo
Slide 30
Demo
Slide 31
ARCHITECTURE AND DESIGN MOTIVATION (Dima)
Slide 32
Big Picture
Slide 33
Motivations for Rethinking Design and Architecture
Fully embracing reuse and open source can lead to eclectic architectures and implementations.
Issues: finding broadly fluent developers; system deployment and maintenance; compliance with organizational security plans; potential loss of architectural coherence and project focus.
Strategy: describe the required system using a Domain Specific Language (DSL); use the description to guide implementation; use the Java platform as much as possible for implementation.
Started with a sketch using Backus-Naur Form (BNF) notation; began looking at describing portions of the system in a Java-based DSL.
Slide 34
DSL Examples
Simple camera language grammar (nonterminal names are illustrative; the extraction dropped the bracketed symbols):
<program> ::= <size> <position> <moves>
<size> ::= "set" "camera" "size" ":" <number> "by" <number> "pixels" "."
<position> ::= "set" "camera" "position" ":" <number> "," <number> "."
<moves> ::= <move>+
<move> ::= "move" <number> "pixels" <direction> "."
<direction> ::= "up" | "down" | "left" | "right"
Example:
Set camera size: 400 by 300 pixels.
Set camera position: 100, 100.
Move 200 pixels right.
Move 100 pixels up.
SQL:
SELECT Book.title AS Title, COUNT(*) AS Authors
FROM Book JOIN Book_author ON Book.isbn = Book_author.isbn
GROUP BY Book.title;
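To make the toy grammar concrete, here is a minimal interpreter for the camera language. It is a sketch in Python (the QI-Bench DSL exploration itself was Java-based); the statement forms follow the example above, and the screen coordinate convention (y decreasing upward) is an assumption:

```python
import re

def run_camera_program(text):
    """Interpret the toy camera DSL; returns (width, height, x, y)."""
    # Parse the two "set" statements from the grammar.
    size = re.search(r"set camera size:\s*(\d+)\s*by\s*(\d+)\s*pixels\.", text, re.I)
    pos = re.search(r"set camera position:\s*(\d+)\s*,\s*(\d+)\.", text, re.I)
    w, h = int(size.group(1)), int(size.group(2))
    x, y = int(pos.group(1)), int(pos.group(2))
    # Apply each "move" statement in order (screen coords: y grows downward).
    deltas = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
    for n, d in re.findall(r"move\s+(\d+)\s+pixels\s+(up|down|left|right)\.", text, re.I):
        dx, dy = deltas[d.lower()]
        x, y = x + dx * int(n), y + dy * int(n)
    return w, h, x, y

program = """Set camera size: 400 by 300 pixels.
Set camera position: 100, 100.
Move 200 pixels right. Move 100 pixels up."""
```

Running the example program moves the camera from (100, 100) right by 200 and up by 100.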
Slide 35
BNF Model
Data resources:
RawDataType = ImagingDataType | NonImagingDataType | ClinicalVariableType
CollectedValue = Value + Uncertainty
DataService = { RawData | CollectedValue }  (implication that contents may change over time)
ReferenceDataSet = { RawData | CollectedValue }  (with fixed refresh policy and documented, controlled provenance)
Derived from analysis of ReferenceDataSets:
TechnicalPerformance = Uncertainty | CoefficientOfVariation | CoefficientOfReliability | ...
ClinicalPerformance = ReceiverOperatingCharacteristic | PPV/NPV | RegressionCoefficient | ...
SummaryStatistic = TechnicalPerformance | ClinicalPerformance
Knowledge, managed as a knowledge store:
Relation = subject property object (property object)
BiomarkerDB = { Relation }
Examples:
OntologyConcept has Instance |
Biomarker isUsedFor BiologicalUse use |
Biomarker isMeasuredBy AssayMethod method |
AssayMethod usesTemplate AimTemplate template |
AimTemplate includes CollectedValuePrompt prompt |
ClinicalContext appliesTo IndicatedBiology biology |
(AssayMethod targets BiologicalTarget) withStrength TechnicalPerformance |
(Biomarker pertainsTo ClinicalContext) withStrength ClinicalPerformance |
generalizations beyond this
Slide 36
Business Requirements
Provide:
- Means for FNIH, QIBA, and C-Path participants to precisely specify context for use and applicable assay methods (allow semantic labeling):
  BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);
- Ability for researchers and consortia to use data resources with high precision and recall:
  ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService});
- Vehicle for technology developers and contract research organizations to do large-scale quantitative runs:
  ReferenceDataSet.CollectedValue+ = Execute (ReferenceDataSet.RawData);
- Means for the community to apply definitive statistical analyses of annotation and image markup over a specified context for use:
  BiomarkerDB.SummaryStatistic+ = Analyze ({ReferenceDataSet.CollectedValue});
- Standardized methods for industry to report and submit data electronically:
  efiling transactions+ = Package (BiomarkerDB, {ReferenceDataSet});
Slide 37
Computational Model
Data availability is the bottleneck; the purpose here is to define informatics services that make the best use of data to:
- Optimize information content from any given experimental study, and
- Incorporate individual study results into a formally defined description of the biomarker acceptable to regulatory agencies.
efiling transactions = Package (Analyze (Execute (Formulate (Specify (biomarker domain expertise), DataService))));
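The composed expression reads as straight function composition. A minimal Python sketch with stub stages, where the stage names come from the slides but every body is purely illustrative (the real services operate on images, annotations, and triple stores, not these toy dictionaries):

```python
def specify(domain_expertise):
    # Capture the claim as subject-predicate-object triples (the BiomarkerDB).
    return {"triples": [("Volumetry", "analyzes", "CT")], "expertise": domain_expertise}

def formulate(biomarker_db, data_service):
    # Select reference data sets that match the specification.
    return {"raw_data": [d for d in data_service if d["modality"] == "CT"]}

def execute(reference_data_set):
    # Produce collected values (e.g., tumor volumes) from raw data.
    return [{"series": d["id"], "volume_mm3": 1000.0}
            for d in reference_data_set["raw_data"]]

def analyze(collected_values):
    # Roll collected values up into summary statistics.
    return {"n": len(collected_values)}

def package(summary):
    # Emit eCTD-style e-filing transactions.
    return [("section 2.1", summary)]

# Mock data service with one CT and one MR series (invented for illustration):
data_service = [{"id": "s1", "modality": "CT"}, {"id": "s2", "modality": "MR"}]
transactions = package(analyze(execute(formulate(specify("tumor volume claim"),
                                                 data_service))))
```

The nesting of the calls matches the slide's expression exactly: Specify feeds Formulate, whose output flows through Execute and Analyze into Package.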
Slide 38
COMPONENT MODEL AND BIOMARKER DB (Wernsing)
Slide 39
Slide 40
Most familiar: Data Services
Slide 41
Also familiar: Compute Services
Slide 42
Less familiar to some, but foundational to the full vision: The Blackboard
Slide 43
Current implementation of Specify
Slide 44
Beginning of the *new* Specify
Slide 45
Current Bio2RDF site
Slide 46
Less familiar to some, but foundational to the full vision: The Blackboard
Slide 47
Interfacing to existing ecosystem: Workstations
Slide 48
Internal components within QI-Bench to make it work: Controller and Model Layers
Slide 49
Internal components within QI-Bench to make it work: QI-Bench REST
Slide 50
Last but not least: QI-Bench Web GUI
Slide 51
DATA VIRTUALIZATION LAYER
Slide 52
Most familiar: Data Services
Slide 53
Data Virtualization Layer (Motivation)
Datasets come in disparate forms, from many different databases with different APIs. We need a method to aggregate data in such a way that all data may be addressed uniformly, and we need a Java-based solution for this.
Slide 54
Publicly Available Now for Detailed Use
- QI-Bench demonstrators (incl. QIBA and FDA data). Public-facing: 5,281 image series over 6 studies of 3 anatomic regions (secure instance: 17,000 image series over 7 studies of 1 anatomic region)
- LIDC/RIDER/TCIA: 2,129 patients over 9 studies of 4 anatomic regions
- Give-a-scan: 23 patients, at http://www.giveascan.org/community/view/2
- Open Science sets (e.g., biopsy cases): 1,209 datasets over 3 studies, at http://midas3.kitware.com/midas/community/6
- NBIA: 3,759 image series from 771 patients over 17 studies of 3 anatomic regions (the 3,770 from Formulate's simple search)
Slide 55
Data Virtualization (Implementation)
Teiid: a framework for exposing nearly any data source via a JDBC-compliant API. Teiid will allow adding new imaging and informatics databases with minimal effort. Teiid gets us to think about the data we want.
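Teiid itself is a Java/JDBC framework; the underlying idea, one uniform query surface over heterogeneous sources, can be sketched in a few lines. This Python sketch is illustrative only (the source names, fields, and adapter functions are invented, and real adapters would normalize each archive's native metadata):

```python
class VirtualCatalog:
    """Aggregate heterogeneous sources behind one uniform record interface."""
    def __init__(self):
        self.sources = {}

    def register(self, name, fetch_fn):
        # Each source supplies an adapter (fetch_fn) that yields records
        # already normalized to a common set of keys.
        self.sources[name] = fetch_fn

    def query(self, **criteria):
        # One query API regardless of where the records live.
        for name, fetch in self.sources.items():
            for record in fetch():
                if all(record.get(k) == v for k, v in criteria.items()):
                    yield {"source": name, **record}

catalog = VirtualCatalog()
# Two mock archives with already-normalized metadata (names are hypothetical):
catalog.register("midas", lambda: [{"modality": "CT", "region": "thorax", "id": "m1"}])
catalog.register("nbia", lambda: [{"modality": "CT", "region": "liver", "id": "n1"},
                                  {"modality": "MR", "region": "thorax", "id": "n2"}])
hits = list(catalog.query(modality="CT"))
```

The caller asks for "all CT series" without knowing which archive holds them, which is the shift in thinking the slide attributes to Teiid: think about the data you want, not the API that stores it.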
Slide 56
SPECIFY AND FORMULATE (Suzek)
Slide 57
Motivation
Specify: support a researcher in stating a hypothesis in a natural-language-like way using ontologies. ("The tumor volume change computed from longitudinal thorax CT images is a biomarker for treatment response to a specific drug family.")
Formulate: support seamless collection of data sets to support the hypothesis. (One needs longitudinal thorax CT images from lung cancer patients who have been treated with a specific drug family.)
Subject | Predicate | Object
CT | images | Thorax
Thorax | contains | NonSmallCellLungCancer
Volumetry | analyzes | CT
LongitudinalVolumetry | estimates | TumorSizeChange
TumorSizeChange | predicts | CytotoxicTreatmentResponse
TyrosineKinaseInhibitor | is | CytotoxicTreatment
Slide 58
Contribution
- A natural-language-like way of formalizing and standardizing the hypothesis statement
- A computable way to persist the hypothesis, supporting reuse and iteration
- An automated way to identify the data sets to support/study the hypothesis
- A reproducible flow from hypothesis to data collection/analysis
Slide 59
Solution Overview
(Diagram.) Specify and Formulate sit over a knowledge base held in a triple store, containing hypotheses, saved queries, and (testable) assertions. Formulate draws on data services and compute services to assemble Reference Data Sets for evaluation applications: existing and new datasets carrying raw data, initial and enriched annotations, and derived data, with new/updated triples fed back alongside the current triples. Inputs include unstructured or semi-structured expert sources for clinical context for use and assay methods, plus QIBO and linked ontologies.
Slide 60
Representing Semantics
Leverage existing established ontologies and extend QIBO. Normalize representation to ontologies, e.g., convert portions of BRIDG and LSDAM in UML to ontologies.
Slide 61
Specify
Navigate the ontology hierarchy for concepts. Create triples (subject, predicate, object) using concepts from the ontology. Manage and store the triples that represent the hypothesis: "The tumor volume change computed from longitudinal thorax CT images is a biomarker for treatment response to a specific drug family."
Slide 62
Formulate
Automatically populate the query from the triples created by Specify; invoke the query against data services; collect and aggregate normalized data into triples from the data services.
Transform: entity: CT images; properties: for Thorax, from patients with Non Small Cell Lung Cancer.
SELECT ?image WHERE {
  ?image x:type x:CT ;
         x:isFor x:Thorax ;
         x:isFrom ?patient .
  ?patient x:has x:NonSmallCellLungCancer .
}
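The same graph pattern can be evaluated over an in-memory set of triples. A minimal Python sketch of the Formulate idea (no SPARQL engine; triples and names follow the examples in these slides, and the `match` helper is a tiny illustrative evaluator, not QI-Bench code):

```python
def match(triples, patterns, binding=None):
    """Return all variable bindings satisfying a list of (s, p, o) patterns.
    Terms starting with '?' are variables, as in SPARQL."""
    binding = binding or {}
    if not patterns:
        return [binding]
    results = []
    # Substitute any variables already bound by earlier patterns.
    head = tuple(binding.get(t, t) for t in patterns[0])
    for triple in triples:
        new, ok = dict(binding), True
        for term, value in zip(head, triple):
            if term.startswith("?"):
                if term in new and new[term] != value:
                    ok = False
                    break
                new[term] = value  # bind the variable to this triple's term
            elif term != value:
                ok = False
                break
        if ok:
            results.extend(match(triples, patterns[1:], new))
    return results

triples = [
    ("image1", "type", "CT"), ("image1", "isFor", "Thorax"),
    ("image1", "isFrom", "patientA"),
    ("patientA", "has", "NonSmallCellLungCancer"),
    ("image2", "type", "MR"),
]
# The query from the slide, as a basic graph pattern:
query = [("?image", "type", "CT"),
         ("?image", "isFor", "Thorax"),
         ("?image", "isFrom", "?patient"),
         ("?patient", "has", "NonSmallCellLungCancer")]
hits = match(triples, query)
```

Only `image1` satisfies the whole chain, so the query binds `?image` to it along with its patient, mirroring what the SPARQL query above would return against a real triple store.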
Slide 63
Formulate is supported by existing image-related data services, wrapped to: serve SPARQL queries; provide metadata aligned with the same ontologies used by Specify.
Slide 64
Specify - Current Status
A prototype leveraging the Annotation and Image Markup (AIM) Template Builder: all navigation/management capabilities in the UI; triple storage.
Slide 65
Formulate - Current Status
A proof of concept leveraging a data query tool specifically designed for caGrid services, caB2B: forms from UML-based metadata to help search; query storage.
Slide 66
Challenges and Future Directions
- Alignment of Formulate with ontologies: a new Formulate using SPARQL and the ontologies used by Specify
- Integration of Specify and Formulate: import and transform mechanisms to convert Specify triples into Formulate queries
- Wrapping existing services and their metadata: data integration solutions such as Teiid to wrap native imaging services (e.g., MIDAS)
Slide 69
Compute Services: Objects for the Analyze Library
Technical performance:
- Capabilities to analyze the literature, to extract reported technical performance and covariates commonly measured in clinical trials
- Capability to analyze data to characterize image dataset quality and to characterize datasets of statistical outliers
- Capability to analyze technical performance of datasets to, e.g., characterize effects due to scanner settings, geography, scanner, site, and patient status; quantify sources of error and variability; characterize variability in the reading process; evaluate image segmentation algorithms
Clinical performance:
- Capability to analyze clinical performance, e.g., analyze the relative effectiveness of response criteria and/or read paradigms; response analysis in clinical trials; characterize metrics' limitations; establish a biomarker's value as a surrogate endpoint
(Status legend: In Place / In Progress / In Queue)
Slide 70
Analyze Library: Coding View
Core analysis modules: AnalyzeBiasAndLinearity; PerformBlandAltmanAndCCC; ModelLinearMixedEffects; ComputeAggregateUncertainty; meta-analysis
Extraction modules:
- CalculateReadingsFromMeanStdev (written in MATLAB to generate synthetic data)
- CalculateReadingsFromStatistics (written in R to generate synthetic data; inputs are number of readings, mean, standard deviation, and inter- and intra-reader correlation coefficients)
- CalculateReadingsAnalytically
Utility functions: PlotBlandAltman; GapBarplot; Blscatterplotfn
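As a flavor of what these modules compute, here is a Bland-Altman calculation (bias and limits of agreement between two measurement methods) in a few lines of stdlib Python. This is a generic sketch of the standard technique, not the QI-Bench PlotBlandAltman code, and the paired readings are invented:

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Bias and limits of agreement between paired measurements a and b.
    Returns (bias, lower LoA, upper LoA) using mean difference +/- 1.96 SD."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Paired tumor-volume readings (mm^3) from two hypothetical methods:
method_a = [1010.0, 995.0, 1020.0, 1005.0]
method_b = [1000.0, 1000.0, 1000.0, 1000.0]
bias, lo, hi = bland_altman(method_a, method_b)
```

The bias here is the systematic offset between methods; a PlotBlandAltman-style figure would scatter each pair's difference against its mean and draw horizontal lines at the bias and the two limits of agreement.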
Slide 71
Drill-down on segmentation analysis activities
Metric | Purpose | Source | Language | Status
STAPLE | probabilistic estimate of the true segmentation and a measure of the performance level of each segmentation | FDA | MATLAB | testing
STAPLE | same as above | ITK | C++ | implemented
soft STAPLE | extension of STAPLE to estimate performance from probabilistic segmentations | - | - | TBD
DICE | metric evaluation of spatial overlap | ITK | C++ | implemented
Vote | probability map | ITK | C++ | implemented
P-Map | probability map | C. Meyer | Perl | implemented
Jaccard, Rand, DICE, etc. | pixel-based comparisons | Versus (Peter Bajcsy) | Java | testing
Slide 72
Update on Workflow Engine for the Compute Services
- Allows users to create their own workflows and facilitates sharing and re-using of workflows
- Has a good interface for capturing the provenance of data
- Works across different platforms (Linux, OS X, and Windows)
- Easy access to a geographically distributed set of data repositories, computing resources, and workflow libraries
- Robust graphical interface
- Can operate on data stored in a variety of formats, locally and over the internet (APIs, RESTful web interfaces, SOAP, etc.)
- Directly interfaces to R, MATLAB, ImageJ (or other viewers)
- Ability to create new components or wrap existing components from other programs (e.g., C programs) for use within the workflow
- Provides extensive documentation
- Grid-based approaches to distributed computation
(Some of the above are supported by Taverna; others could also be done in Taverna but are already supported in Kepler.)
Slide 73
QI-BENCH CLIENT / WORKSTATION (Wernsing)
Slide 74
QI-Bench Client: a first view
Slide 75
The Scientist's view
Slide 76
The Biomarker view
Slide 77
The Blackboard view
Slide 78
One idea of the Blackboard view
Slide 79
The Clinician's view
Slide 80
Personalize Your Experience
Slide 81
Future Development
Continue with core infrastructure development: Jena TDB; Jersey; Kepler integration; Struts 2; Teiid integration
Parallel work: QI-Bench API; workflows; connections to the API; Web GUI; workstation plugins
Slide 82
CT VOLUMETRY TEST BED (Buckler)
Slide 83
Test bed: e.g., the 3A challenge series
Some of the participants: 1. Median Technologies; 2. Vital Images, Inc.; 3. Fraunhofer MEVIS; 4. Siemens; 5. Moffitt Cancer Center; 6. Toshiba; 7. GE Healthcare; 8. Icon Medical Imaging; 9. Columbia University; 10. INTIO, Inc.; 11. Vital Images, Inc.
(Diagram: pilot and pivotal investigations 1 through n, each with train and test splits; primary and secondary endpoints; a defined set of data, a defined challenge, and a defined test set policy.)
Slide 84
Broader capability: Systematic qualification of CT volumetry
[Same challenge structure diagram as the 3A series: Pivotal Investigations 1 through n, each with Pilot, Train, and Test phases, in primary and secondary arms; defined set of data, defined challenge, defined test set policy; same participants.]
PROFILE Authoring and Testing:
Intra- and inter-reader variability (1A): 5 readers, 3 reads each.
Minimum detectable biological change (1B).
Inter-scanner, -model, and -site variability (1C).
Inter-analysis technique (algorithm) variability (3A): performance-based branch / compliance procedure.
Correlation with clinical endpoints and outcomes (3B).
The machine view and the human expert view connect the modality environment, the therapy decision environment, and the patient through transformation and therapy feedback.
Extend to other lesion characteristics; explore figures-of-merit and QC procedures.
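The reader-variability study design above (1A: intra- and inter-reader variability, 5 readers with 3 reads each) can be illustrated with a small variance-decomposition sketch. This is a hypothetical illustration, not QI-Bench code; the data values and the `variability` helper are invented for the example.

```python
# Hypothetical sketch (not QI-Bench code): estimating intra- and
# inter-reader variability from repeated volume reads using a simple
# one-way variance decomposition.
# Assumed data layout: readings[r][k] = lesion volume (mm^3) measured
# by reader r on repeat read k; 5 readers, 3 reads each, same lesion.

readings = [
    [1010.0, 1025.0, 1018.0],   # reader 1
    [ 990.0,  985.0, 1002.0],   # reader 2
    [1040.0, 1032.0, 1029.0],   # reader 3
    [1000.0, 1008.0,  995.0],   # reader 4
    [1021.0, 1015.0, 1027.0],   # reader 5
]

def variability(readings):
    n_readers = len(readings)
    n_reads = len(readings[0])
    reader_means = [sum(r) / n_reads for r in readings]
    grand_mean = sum(reader_means) / n_readers
    # Intra-reader (repeatability): pooled variance of each reader's
    # reads about that reader's own mean.
    intra = sum(
        (x - m) ** 2 for r, m in zip(readings, reader_means) for x in r
    ) / (n_readers * (n_reads - 1))
    # Inter-reader (reproducibility): variance of the reader means
    # about the grand mean.
    inter = sum((m - grand_mean) ** 2 for m in reader_means) / (n_readers - 1)
    return intra, inter

intra, inter = variability(readings)
```

In a real 1C analysis the same decomposition extends to scanner, model, and site as additional grouping factors; the sketch keeps only the reader factor for brevity.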
Slide 85
Scope of Consideration: Purpose and value of the test-bed
Thesis: enough data exists, or could be made available with sufficient clarity on how it would be used, to qualify CT volumetry as a biomarker in cancer treatment contexts. Qualification per se is neither the only nor necessarily the best goal, but it does provide a defined target that is useful in driving activity. Another working model is how RECIST has come to be accepted. There is considerable overlap in the needs of these two models, and QI-Bench is ideally suited to meeting them. This drives both technical and clinical performance characterization activities. Formally articulating requirements for these activities, and reducing them to practice using open-source methods backed by a rigorous system development process, continues to drive us.
Slide 86
Scope of Consideration: Purpose and value of the test-bed
Theoretical contributions lie in formal methods for maximizing the value of data, specifically in pushing the limits of generalizability to eke out as much utility per unit of data or analytical resource as possible. The means to this end is to develop practical methods that merge logical and statistical inference. Practical contributions lie in developing tangible and effective systems for image archival, for representation of wide-ranging and heterogeneous metadata, and for facilities to conduct reproducible workflows that increase scientific rigor and discipline in promoting imaging biomarkers. The means to this end are the applications we develop and the deployment options we implement. CT volumetry is a rich example because so many have worked on it for so long, yet without the benefit of actual convergence, for lack of these capabilities. However, we are not limited to it. Sponsored uses of this capability have been conducted in both anatomic and functional applications of MR, and we hope other QIBA committees (e.g., FDG-PET, PDF-MRI) might also have an interest in using it; the technology choices would make it relatively easy for them to do so. NCI CIP, QIN, and other groups have also started to express interest.
Slide 87
Current initiatives on the test-bed
The common-footing analyses of QIBA studies.
The next 3A challenge as described today.
More broadly: specific datasets for vCT, literature-based meta-analysis, umbrella SAP, project plan.
Slide 88
Slide 89
Value proposition of QI-Bench
Efficiently collect and exploit evidence establishing standards for optimized quantitative imaging:
Users want confidence in the read-outs.
Pharma wants to use them as endpoints.
Device/SW companies want to market products that produce them without huge costs.
The public wants to trust the decisions that they contribute to.
By providing a verification framework to develop precompetitive specifications and support test harnesses that curate and utilize reference data. Doing so as an accessible and open resource facilitates collaboration among diverse stakeholders.
Slide 90
Summary: QI-Bench Contributions
We make it practical to increase the magnitude of data for increased statistical significance.
We provide practical means to grapple with massive data sets.
We address the problem of efficient use of resources to assess limits of generalizability.
We make formal specification accessible to diverse groups of experts who are not skilled in, or interested in, knowledge engineering.
We map both medical and technical domain expertise into representations well suited to emerging capabilities of the semantic web.
We enable a mechanism to assess compliance with standards or requirements within specific contexts for use.
We take a toolbox approach to statistical analysis.
We provide the capability in a manner accessible to varying levels of collaborative models, from individual companies or institutions, to larger consortia or public-private partnerships, to fully open public access.
Slide 91
QI-Bench Structure / Acknowledgements
Prime: BBMSC (Andrew Buckler, Gary Wernsing, Mike Sperling, Matt Ouellette, Kjell Johnson, Jovanna Danagoulian)
Co-Investigators: Kitware (Rick Avila, Patrick Reynolds, Julien Jomier, Mike Grauer); Stanford (David Paik)
Financial support as well as technical content: NIST (Mary Brady, Alden Dima, John Lu)
Collaborators / Colleagues / Idea Contributors: Georgetown (Baris Suzek); FDA (Nick Petrick, Marios Gavrielides); UMD (Eliot Siegel, Joe Chen, Ganesh Saiprasad, Yelena Yesha); Northwestern (Pat Mongkolwat); UCLA (Grace Kim); VUmc (Otto Hoekstra)
Industry, Pharma: Novartis (Stefan Baumann), Merck (Richard Baumgartner)
Industry, Device/Software: Definiens, Median, Intio, GE, Siemens, Mevis, Claron Technologies
Coordinating Programs: RSNA QIBA (e.g., Dan Sullivan, Binsheng Zhao)
Under consideration: CTMM TraIT (Andre Dekker, Jeroen Belien)