The best of both worlds
Pharma R&D IT - Informatics
Rudi Verbeeck
Guided analytics in the hands of the SME
Outline
• Changing world
– More data
– More analysis requirements
• Two worlds
– Technical skills
– Subject matter expertise
• Best of both worlds
– Exploratory analysis: DIAN study
– Guided analytics: phase III clinical trial planning
October 2014, Pharma R&D IT Informatics
Evolution of data
In the pharmaceutical industry
Largely paper-based R&D
Computerised records, increased efficiency
Rich electronic data sources, new linkages, new applications
More data sources, more linkages, more applications
(Timeline axis: 1990, 2000, 2010)
Data challenges for the EMIF consortium
Need for efficiency enhancing workflows
• Diversity
– Primary care
– HIS
– Administrative
– Regional record linkage
– Cohorts
– Biobanks
• Scale
– 52M subjects
– 7 EU countries
– 25K AD
– 95K Metabolics
• Privacy
– Population data
– Cohorts
• Harmonization
– Common data model
– Semantic web technology
• Analysis & research
– Disease insights
– Extreme phenotypes
– Pharmaco-epidemiology
Skills – two worlds
Identify data sources
Preprocessing, QA, cleaning
(Exploratory) Analysis
Verify conclusions
Interpretation, insights
Actions
Technical skills
Domain expertise
Personal: data analysis & exploration
Peer: discussions
Publication: communication
Handshake – packaging your analysis
Graphs that allow a rich analysis may not convey a message very well (and vice versa)
Example: circadian rhythm gene expression
Expression profile of typical genes
Exploratory analysis: data interactions
• Overview ⇨ zoom & filter ⇨ details on demand
– A table view of the raw data gives an idea of the variables and values
– Create overview visualizations, investigate distributions and correlations
– Use filtering to look at subsets of the data
– Detailed inspection of groups, outliers, anomalies...
• Brushing & linking show interactions between variables
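The overview, zoom & filter, details-on-demand sequence can be sketched in a few lines of plain Python. This is only an illustration: the records, field names and the one-standard-deviation outlier rule are hypothetical, not taken from the study data.

```python
from statistics import mean, stdev

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": "s1", "group": "A", "value": 4.1},
    {"id": "s2", "group": "A", "value": 3.9},
    {"id": "s3", "group": "A", "value": 4.0},
    {"id": "s4", "group": "A", "value": 4.2},
    {"id": "s5", "group": "B", "value": 6.0},
    {"id": "s6", "group": "B", "value": 6.2},
    {"id": "s7", "group": "B", "value": 5.8},
    {"id": "s8", "group": "B", "value": 9.5},
]

def overview(rows):
    """Overview: count, mean and standard deviation per group."""
    groups = {}
    for r in rows:
        groups.setdefault(r["group"], []).append(r["value"])
    return {g: {"n": len(v), "mean": mean(v), "sd": stdev(v)}
            for g, v in groups.items()}

def zoom(rows, group):
    """Zoom & filter: keep only the rows of one group."""
    return [r for r in rows if r["group"] == group]

def details(rows, n_sd=1.0):
    """Details on demand: rows further than n_sd standard deviations
    from the mean of the current subset (an assumed outlier rule)."""
    values = [r["value"] for r in rows]
    m, s = mean(values), stdev(values)
    return [r for r in rows if abs(r["value"] - m) > n_sd * s]

print(overview(records))
print(details(zoom(records, "B")))
```

In an interactive tool each function corresponds to a user action: looking at the summary, selecting a subset, then clicking through to individual records.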
Example: DIAN study
DIAN = Dominantly Inherited Alzheimer Network
Observational study of genetically inherited early-onset Alzheimer’s disease
– 73 families
– Mutations in amyloid precursor protein, presenilin 1 and presenilin 2
– Age at onset estimated from the parent’s age at onset
– > 600 variables per patient
Which baseline measurements are correlated with onset of AD?
Data overview
Show raw data in a table view
Overview of measurements and visits
Explore correlations between measurements
General overview of measurement groups
Detailed correlation for selected variable(s)
Individual observations
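As a rough sketch of the "detailed correlation for selected variable(s)" view, the snippet below ranks all other measurements by their Pearson correlation with one selected variable. The measurement names (m1 to m4) and values are hypothetical, and Pearson correlation is an assumption; the slides do not say which correlation measure was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical measurement columns, one value per subject.
data = {
    "m1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "m2": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with m1
    "m3": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as m1 rises
    "m4": [2.0, 1.0, 3.0, 1.0, 2.0],   # unrelated to m1
}

def correlations_with(selected, table):
    """Correlation of every other column with the selected one,
    strongest (by absolute value) first."""
    pairs = [(name, pearson(table[selected], col))
             for name, col in table.items() if name != selected]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)

for name, r in correlations_with("m1", data):
    print(f"{name}: r = {r:+.2f}")
```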
Validated statistics: guided analytics
• Guide the SME through a series of decisions
• Give freedom to explore, backtrack, what-if
• Ensure sound & consistent statistical methods
Example: clinical phase III trial planning
What patient inclusion criteria should be used for the Phase III trial of an AD compound?
– We expect the compound to work best in patients with mild cognitive impairment (MCI) who are on the verge of converting to Alzheimer’s disease (AD)
– We expect the compound to take some time to show a clinical effect, so subjects should not convert early in the trial
– Which measurements are realistic in a trial selection setting?
– Data from the ADNI study (Alzheimer’s Disease Neuroimaging Initiative)
Guided analytics
In a guided analysis, advanced statistics are packaged into a wizard-like application to guide the subject matter expert through a decision process
1. How is MCI to AD conversion measured?
2. What timeframe corresponds to “early” conversion?
3. Which (combination of) baseline measurements are predictive for early vs. late conversion?
– Use logistic regression modelling
4. For selected baseline measures, what cut-off value should be used as a selection criterion?
– Use survival analysis
Step 1: define early conversion
Text fields explain decision steps
Select visit that separates early from late convertors
Select conversion criterion (change in diagnosis or change in clinical dementia rating)
Graph shows number of early / late convertors
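A minimal sketch of this step, assuming conversion is recorded as the month of the first visit with a changed diagnosis. The subject IDs, months and the rule that conversion at the cut-off visit counts as early are all hypothetical choices made for illustration.

```python
# Hypothetical data: month of the first visit at which the diagnosis changed
# from MCI to AD; None means no conversion was observed during follow-up.
conversion_month = {"p1": 6, "p2": 12, "p3": 24, "p4": None, "p5": 36, "p6": 12}

def split_convertors(conv, cutoff_month):
    """Return (early, late) subject lists for a chosen cut-off visit.

    Assumptions: converting at the cut-off visit itself counts as early;
    subjects who never converted are left out of both groups.
    """
    early = sorted(s for s, m in conv.items()
                   if m is not None and m <= cutoff_month)
    late = sorted(s for s, m in conv.items()
                  if m is not None and m > cutoff_month)
    return early, late

# Early / late counts for each candidate cut-off visit, i.e. the kind of
# numbers a bar graph of early vs. late convertors would display.
counts = {c: tuple(len(g) for g in split_convertors(conversion_month, c))
          for c in (6, 12, 24)}
print(counts)
```

Moving the cut-off visit changes the group sizes, which is the trade-off the SME explores before fixing a definition of "early".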
Step 2: find predictive covariates
Select baseline covariates
Verify that groups are balanced
Verify variables are uncorrelated
Calculate logistic regression model in R
Verify model diagnostics
Find significant covariates by p-value
Inspect ROC for full model, stepwise model and cross validation
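The slides compute the model in R. As a stand-in, the sketch below fits a logistic regression by plain gradient descent in Python and summarises the ROC curve by its area (AUC). The covariate values and labels are hypothetical, and p-values, stepwise selection and cross validation are not covered here.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)

def predict(w, xi):
    """Predicted probability of early conversion; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit intercept and weights by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def auc(scores, labels):
    """Area under the ROC curve: probability that a random positive scores
    higher than a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical standardised baseline covariates; label 1 = early convertor.
X = [[-1.2, 0.3], [-0.8, -0.5], [-0.3, 0.9], [0.1, -1.1],
     [0.4, 0.2], [0.9, -0.4], [1.3, 0.8], [-0.5, 0.1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

w = fit_logistic(X, y)
model_auc = auc([predict(w, xi) for xi in X], y)
print("weights:", [round(v, 2) for v in w])
print("training AUC:", round(model_auc, 3))
```

The AUC here is computed on the training data, so it is optimistic; that is why the slide also lists cross validation.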
Step 3: find cut-off values
Select significant covariates from previous step
Verify distribution by conversion group and determine cut-off
Enter cut-off value
Verify time evolution of cut-off groups
Kaplan-Meier plot of conversion rates by cut-off group (calculate in R)
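The slides calculate the Kaplan-Meier curves in R. The Python sketch below shows the same estimator for two cut-off groups, using hypothetical subjects (baseline score, months of follow-up, conversion flag) and a hypothetical cut-off of 25.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the fraction not yet converted.

    times:  follow-up time per subject
    events: 1 if the subject converted at that time, 0 if censored
    Returns [(time, survival)] at each time with at least one conversion.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]  # everyone leaving at time t
        d = sum(tied)                            # conversions at time t
        if d > 0:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)
        i += len(tied)
    return curve

# Hypothetical subjects: (baseline score, months of follow-up, converted?)
subjects = [
    (21, 6, 1), (23, 12, 1), (24, 12, 0), (22, 24, 1), (20, 36, 0),
    (27, 12, 0), (28, 24, 1), (29, 36, 0), (26, 36, 0), (30, 36, 1),
]
cutoff = 25  # hypothetical cut-off value entered by the SME

def curve_for(group):
    return kaplan_meier([m for _, m, _ in group], [e for _, _, e in group])

below = curve_for([s for s in subjects if s[0] < cutoff])
above = curve_for([s for s in subjects if s[0] >= cutoff])
print("score < cutoff: ", below)
print("score >= cutoff:", above)
```

Comparing the two curves for different cut-off values shows how strongly a candidate inclusion criterion separates early from late convertors.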
Conclusions
• The application and its licensing model should support these usage patterns. For example, with Spotfire:
– Informatician / biostatistician prepares guided analysis using full client
– SME follows prepared analysis to draw conclusions using WebPlayer
• Think about how you present the data. Your chart encoding should be easy to understand.
• Conclusions from a guided analysis still need to be confirmed with a statistician.
• Guided analytics is a good communication tool.
• Distribute the workload: technical specialists prepare the analysis, SMEs run it and interpret the results.