Center for
Ethics &
Policy
Ethical and Scientific Issues in Developing Deep Learning Systems in Medicine: More Like Drugs Than You Might Think
Alex John London, Ph.D.
Clara L. West Professor of Ethics and Philosophy
Director, Center for Ethics and Policy
Carnegie Mellon University
Twitter: @AlexJohnLondon
Disclosure
I. I have no conflicts to disclose regarding
the material in this talk.
II. Mention of a product, system or
approach is not an endorsement.
Agenda
I. An example of medical AI.
II. Why do we need RCTs in medicine?
III. Division of scientific labor across drug
development life cycle.
IV. Commonalities with Deep Learning
Systems
Ethics and Uncertainty
I. Medical uncertainty is an ethical issue
a. Stakeholders rely on medical information for momentous decisions¹:
1. Patients and providers
2. Organizing health systems
3. Spending scarce resources
b. Uncertainty creates inefficiencies for stakeholders
c. Resolution and management of uncertainty is a key aspect of a learning health system².
¹ London, A. J. (2012). A non-paternalistic model of research ethics and oversight: assessing the benefits of prospective review. The Journal of Law, Medicine & Ethics, 40(4), 930-944.
² London, A. J. (2018). Learning health systems, clinical equipoise and the ethics of response adaptive randomisation. Journal of Medical Ethics, 44(6), 409-415.
Contrast Case
I. Structural engineers don’t build 50 physical instances of one design and 50 physical instances of another and see which one works the best.
Contrast Case
I. Comprehensive models of relevant causal
systems:
a. Soil composition
b. Weather and climate
c. Material tensile strength, durability…
d. Structural capacities: load…
e. Stresses and loads from use cases…
II. Simulations based on models are reliable
and capture relevant causal relationships.
A Major Difference
I. In medicine our knowledge of causal structures is limited and unreliable.
II. Powerful machine learning techniques leverage associations in large data sets and computing power to make predictions or classifications.
III. Utility of these systems depends on empirical verification of accuracy and reliability.
a. Trust hinges on independence and rigor of verification.
Deep Learning Systems
I. Deep learning neural networks
a. Humans:
1. Specify a use case.
2. Label datasets to “train” the system.
3. Design the system architecture.
i. E.g., number of “layers” used.
b. System:
1. Learns which features to detect (from data).
2. Learns potentially very complex functions from features to labels.
3. Given a new case, outputs a label or a probable label.
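The division of labor in this outline can be illustrated with a deliberately small stand-in. This is a sketch only: it trains a single-layer logistic model rather than a deep network, and the feature values, labels, and function names are all hypothetical. Humans supply the labelled cases and fix the model form; the training loop finds the weights; the trained model then returns a probable label for a new case.

```python
import math

# Hypothetical labelled training set: (feature vector, label).
# The labels come from humans; the numbers are invented for illustration.
TRAINING_DATA = [
    ([0.1, 0.2], 0),
    ([0.2, 0.1], 0),
    ([0.3, 0.3], 0),
    ([0.8, 0.9], 1),
    ([0.9, 0.7], 1),
    ([0.7, 0.8], 1),
]


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Probability that a case carries label 1 under the current model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def train(data, epochs=2000, learning_rate=0.5):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict_probability(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# The system learns the mapping from features to labels...
weights, bias = train(TRAINING_DATA)

# ...and, given a new case, outputs a probable label.
new_case = [0.85, 0.8]
probability = predict_probability(weights, bias, new_case)
label = 1 if probability >= 0.5 else 0
print(f"p(label=1) = {probability:.3f}, predicted label = {label}")
```

In a deep network the hand-chosen features would themselves be learned from raw inputs, which is the step that makes the resulting model hard to inspect.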
Example: AI Screening
I. In April 2018 the FDA authorized marketing of IDx-DR, an AI software system for detecting “more-than-mild” diabetic retinopathy in adults with diabetes.
a. “IDx-DR is the first device authorized for marketing that provides a screening decision without the need for a clinician to also interpret the image or results, which makes it usable by health care providers who may not normally be involved in eye care.”¹
¹ https://www.fda.gov/newsevents/newsroom/pressannouncements/ucm604357.htm
IDx-DR: Hybrid System
I. CNNs trained to detect hemorrhages, exudates and other lesions.
II. Model connecting pathologies to diagnostic label reflects expert knowledge.
Source: https://www.reviewofophthalmology.com/article/machine-learning-for-diabetic-retinopathy
Types of Approach
I. Gulshan and colleagues¹:
a. CNN to detect DR using all information in retinal image.
b. “Another fundamental limitation arises from the nature of deep networks, in which the neural network was provided with only the image and associated grade, without explicit definitions of features (eg, microaneurysms, exudates). Because the network “learned” the features that were most predictive for the referability implicitly, it is possible that the algorithm is using features previously unknown to or ignored by humans.”
¹ JAMA. 2016;316(22):2402-2410. doi:10.1001/jama.2016.17216
Why Do We Need Trials
I. Models of pathophysiology and intervention mechanism are often unreliable or incomplete.
a. 9 of 10 new drugs fail to win approval for any indication.¹
1. Success rates are far lower in some areas.
i. Neuroprotective agents for Alzheimer’s and Parkinson’s
ii. Mega-dose vitamins and the CARET trial.
2. 50% failure rate in phase III.
¹ Thomas, D. W., Burns, J., Audette, J., Carroll, A., Dow-Hygelund, C. and Hay, M. (2016). Clinical Development Success Rates 2006–2015. San Diego: Biomedtracker.
Why Do We Need Trials
II. In the absence of complete and accurate
causal models, causal claims must be
tested empirically.
a. Trials directly test specific claims.
b. Indirectly test theories of pathophysiology
and mechanism.
Why Do We Need Trials
III. Models of pathophysiology and
intervention mechanism guide
intervention use.
a. Patients differ from trial populations.
b. Clinicians extrapolate efficacy and toxicity for differences in race, gender, age, co-morbidities…
c. Inferences go beyond validated medical evidence.
(Mis)Understanding Translation
[Figure sources: FDA, “Innovation or Stagnation”; California Institute of Regenerative Medicine, “Progress toward therapies: Path to the Clinic” (http://www.cirm.ca.gov/path-clinic)]
Structure of Translation¹
[Diagram: Theoretical Understanding]
(1) Theories include:
a. Mechanism of physiology and disease
b. Mechanism of intervention
c. Pharmacodynamics
d. Pharmacokinetics
¹ Kimmelman J and London AJ. (2015) The Structure of Clinical Translation: Efficiency, Information, and Ethics. The Hastings Center Report 45, no. 2: 27-39. DOI: 10.1002/hast.433
Structure of Translation
[Diagram: Theoretical Understanding; Intervention Ensemble Assembly; EXPLORATORY TRIALS; step (1)]
Intervention Ensemble
I. Dimensions of relevance
a. Indication
b. Dose
c. Schedule
d. Co-intervention
e. Diagnostic modalities
II. Values on dimensions that approximate
optimal performance
Structure of Translation
[Diagram: Theoretical Understanding; Intervention Ensemble Assembly; EXPLORATORY TRIALS; CONFIRMATORY TRIALS; steps (1), (2)]
Structure of Translation
[Diagram: Theoretical Understanding; Intervention Ensemble Assembly; EXPLORATORY TRIALS; CONFIRMATORY TRIALS; PRACTICE; steps (1), (2), (3)]
Deep Learning
Atheoretic:
– Don’t use our domain knowledge for the task.
– Learn a model from data.
Associationist:
– Association doesn’t entail causation.
– Different assumptions create different models.
Black Box:
– We know how the system generates a model.
– We often can’t directly inspect and understand the model.
Verifiability of Performance
I. Precision
a. p(+ | test +)
b. 8/10 vs 11/12
II. Sensitivity
a. p(test + | +)
b. 8/12 vs 11/12
III. Specificity
a. p(test - | -)
b. 16/18 vs 17/18
[Figure: panels (A) and (B), each a grid of 12 positive (+) and 18 negative (-) cases.]
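The fractions on this slide can be reproduced from confusion-matrix counts. The sketch below assumes the two values in each pair belong to two tests, (A) and (B), applied to the same 12 positive and 18 negative cases; the counts are reconstructed from the slide's fractions, and the function and variable names are hypothetical.

```python
from fractions import Fraction


def performance(tp, fp, fn, tn):
    """Return precision, sensitivity and specificity as exact fractions.

    tp/fp: cases the test flags positive that are truly positive/negative.
    fn/tn: cases the test flags negative that are truly positive/negative.
    """
    return {
        "precision": Fraction(tp, tp + fp),    # p(+ | test +)
        "sensitivity": Fraction(tp, tp + fn),  # p(test + | +)
        "specificity": Fraction(tn, tn + fp),  # p(test - | -)
    }


# Assumed counts: 12 positives and 18 negatives in both panels.
test_a = performance(tp=8, fp=2, fn=4, tn=16)
test_b = performance(tp=11, fp=1, fn=1, tn=17)

for name in ("precision", "sensitivity", "specificity"):
    print(f"{name}: A = {float(test_a[name]):.2f}, B = {float(test_b[name]):.2f}")
```

`Fraction` stores values in lowest terms, so 8/10 is held as 4/5; the comparison between the two tests is unaffected.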
Structure of Translation
[Diagram: EXPLORATORY TESTING; Decision Task; Training Data; Algorithm; Standardizations; step (1)]
ML Ensemble Development
I. Use Case:
a. Task to be performed?
1. Better way to do what we already do?
2. Way to do something new?
b. Relationship to clinical endpoints?
1. Direct
2. Indirect
3. Validated
4. Novel
ML Ensemble Development
II. Training data
a. Does it support the use case?
b. How representative is it?
1. Racial or other biases?
c. Raw data or curated in some way?
III. Algorithm
a. Robust across datasets?
b. Accuracy?
Why Do We Need Trials
I. Algorithms learn models from training sets and are checked against test sets.
II. Test sets may not reflect real world data.
a. Higher quality images
b. Curated or standardized fields, missing data
c. Common, arbitrary associations
1. Artifacts, clinical practices…
2. Devices correlate with patient demographics
III. Performance characteristics may not transfer to real world data.
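A toy calculation shows how an arbitrary association can inflate measured performance. This is a sketch under invented assumptions: the case counts, the device names, and the "shortcut" rule are hypothetical, and the rule stands in for a learned model that has latched onto the imaging device rather than the pathology.

```python
def make_cases(portable_diseased, portable_healthy,
               clinic_diseased, clinic_healthy):
    """Build a list of (device, has_disease) pairs from four counts."""
    return (
        [("portable", True)] * portable_diseased
        + [("portable", False)] * portable_healthy
        + [("clinic", True)] * clinic_diseased
        + [("clinic", False)] * clinic_healthy
    )


def shortcut_rule(device):
    """Predict disease from the device alone, ignoring the image."""
    return device == "portable"


def accuracy(cases, rule):
    """Share of cases where the rule's prediction matches the truth."""
    correct = sum(1 for device, diseased in cases if rule(device) == diseased)
    return correct / len(cases)


# Development data: device type happens to track disease status closely.
development = make_cases(45, 5, 5, 45)
# Real-world data: the same devices are used across all patients.
deployment = make_cases(25, 25, 25, 25)

dev_accuracy = accuracy(development, shortcut_rule)
real_accuracy = accuracy(deployment, shortcut_rule)
print(f"development accuracy: {dev_accuracy:.2f}")
print(f"real-world accuracy:  {real_accuracy:.2f}")
```

The rule looks accurate on the development data only because of how those data were collected; once the device no longer correlates with disease, it performs at chance.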
Structure of Translation
[Diagram: EXPLORATORY TESTING; Decision Task; Training Data; Algorithm; Standardizations; step (1); CONFIRMATORY TRIALS]
Why Do We Need Trials
IV. Testing systems prospectively:
a. Independent assessment of system
performance.
b. Feasibility of replicating conditions needed
to achieve performance benchmarks.
1. Standardizations?
i. Image quality?
ii. Data curation?
2. Protocols and reliability.
3. Personnel proficiency.
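One way to make the feasibility question concrete is to measure how often real-world inputs meet the conditions under which the system was benchmarked. The sketch below is illustrative only: the `Capture` fields, the resolution threshold, and the sample values are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical acquisition standard assumed to have held during development.
MIN_WIDTH_PX = 1000


@dataclass
class Capture:
    width_px: int          # image width in pixels (hypothetical field)
    in_focus: bool         # result of a focus check (hypothetical field)
    fields_complete: bool  # all required record fields present (hypothetical)


def meets_development_conditions(capture: Capture) -> bool:
    """True if the capture satisfies every assumed standardization."""
    return (
        capture.width_px >= MIN_WIDTH_PX
        and capture.in_focus
        and capture.fields_complete
    )


def usable_fraction(captures: List[Capture]) -> float:
    """Share of captures the system could assess under benchmark conditions."""
    usable = sum(1 for c in captures if meets_development_conditions(c))
    return usable / len(captures)


# Invented sample of captures from a prospective site.
clinic_captures = [
    Capture(width_px=1200, in_focus=True, fields_complete=True),
    Capture(width_px=800, in_focus=True, fields_complete=True),
    Capture(width_px=1500, in_focus=False, fields_complete=True),
    Capture(width_px=1100, in_focus=True, fields_complete=True),
]
fraction = usable_fraction(clinic_captures)
print(f"usable fraction: {fraction:.2f}")
```

A low usable fraction in a prospective study would signal that benchmark accuracy describes only a subset of the cases a clinic actually sees.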
Why Do We Need Trials
V. Proof of accuracy on specific decision
task may not translate into real world
effectiveness on clinically meaningful
endpoints.
Real World Effectiveness
[Diagram: Real World Effectiveness; Ensemble Efficacy; Utilization Factors]
Real World Effectiveness
Ensemble Efficacy¹:
a. Drug
b. Dosage
c. Schedule
d. Population
e. Co-interventions
f. Diagnostic requirements
¹ Kimmelman & London (2015) The Structure of Clinical Translation. Hastings Cent Rep. Mar-Apr;45(2):27-39. doi: 10.1002/hast.433. Epub 2015 Jan 27
Real World Effectiveness
Ensemble Efficacy:
a. Algorithm
b. Use Case
c. Data requirements
1. Standardizations?
2. Curation?
Real World Effectiveness
Utilization Factors:
– Provider or Patient Preferences
– Cost
– Tolerance/Adherence
– Clinical capacity
– Awareness
Real World Effectiveness
Utilization Factors:
– Single vs multiple (dose, test, visits…)
– Invasive?
– Less testing
– Time to results
Best Epistemic State
[Diagram: Real World Effectiveness of Ensemble2 vs Ensemble1; Ensemble2 Efficacy > Ensemble1 Efficacy; Utilization Factors of Ensemble2 > Utilization Factors of Ensemble1]
Pre-Approval Uncertainty
[Diagram: Real World Effectiveness of Ensemble2 vs Ensemble1; Ensemble Efficacy and Utilization Factors for both ensembles marked “?”]
Post-Approval Uncertainty
[Diagram: Real World Effectiveness of Ensemble2 vs Ensemble1; Ensemble Efficacy and Utilization Factors for both ensembles marked “?”]
Conclusions
I. Intervention development is an iterated process.
II. An ensemble of practices is necessary to achieve efficacy.
III. Trials validate the utility of this ensemble.
IV. Real world evidence includes utilization factors and comparative effectiveness relative to a clinically meaningful benchmark.