Integrating Discovery, Development, and Commercial Data into Data Mining

Jennifer Sloan
Data Mining Consultant
GlaxoSmithKline: US Pharma IT
15 September 2004

Data Mining Definition

Data Mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid and accurate predictions.

Data Mining is a tool that allows us to:

- Identify problematic areas
- Control process variability
- Make concrete decisions on business needs
- Develop a model which can aid in future business decisions

Commercial Data

- Analyzing Multivariate Data
- Managing Data Usage
- Model Building

Multivariate Data Sets

- Data are multivariate in nature
- Large data sets containing multiple criteria within each observation
- Comparing multiple vectors is nearly impossible without reducing to a single point

Here we view 5-dimensional information on one observation. Each point represents a prescriber and the color represents a Market Share increase or decrease. Overlapping distributions make this difficult to interpret and further analysis is required. Over 200K observations are represented in this graph.

The same observations are shown, but now two-way interactions between the variables help us determine which variables are affecting market shifts. This leads to constructing models that predict prescriber behavior.
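One way to make two-way interactions explicit before model building is to add a product column for every pair of variables. The following is a minimal sketch of that idea, not the method behind the graph; the variable names and values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def add_two_way_interactions(rows, names):
    """Append the product of every pair of columns to each row."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = list(names) + [f"{names[i]}*{names[j]}" for i, j in pairs]
    new_rows = [list(r) + [r[i] * r[j] for i, j in pairs] for r in rows]
    return new_rows, new_names


# Hypothetical prescriber-level variables (illustrative values only).
names = ["calls", "samples", "rx_volume"]
rows = [
    [2.0, 10.0, 5.0],
    [4.0, 0.0, 8.0],
]
rows_int, names_int = add_two_way_interactions(rows, names)
print(names_int)
```

A model fitted on the expanded columns can then assign a separate weight to each pair, which is what lets an interaction show up as its own effect.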

Drug Development

Drug Development Issues

- Adverse Event Reporting System (AERS)
- Over 2 million adverse event (AE) reports on approximately 2000 drugs and biologics submitted to the FDA since 1968
- Creates an extremely complicated matrix of data
- Recently, Data Mining methods have helped address this issue with the development of a method used to examine large databases for associations between drugs and AEs

Data Mining Algorithm

- Multi-Item Gamma Poisson Shrinker (MGPS), developed by William DuMouchel (AT&T)
- Through statistical modeling, this Empirical Bayesian method identifies higher-than-expected reporting relationships of drug-event combinations
- Automated, web-based system with rapid drill-down capability
- MGPS runs using all event terms and drugs in the AERS database and produces results for all drug-event combinations
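The idea of "higher-than-expected reporting" can be illustrated with a plain observed-to-expected ratio for each drug-event pair. The sketch below is not MGPS itself: it omits the Empirical Bayesian shrinkage and the stratification, and the function name and report data are made up for illustration. It only assumes that the expected count for a pair is what independence between drug and event would predict.

```python
from collections import Counter


def relative_reporting_ratios(reports):
    """Return {(drug, event): observed / expected} for each observed pair.

    `reports` is a list of (drug, event) tuples, one per report.
    Expected counts assume drug and event are reported independently.
    """
    total = len(reports)
    pair_n = Counter(reports)
    drug_n = Counter(d for d, _ in reports)
    event_n = Counter(e for _, e in reports)
    return {
        (d, e): n / (drug_n[d] * event_n[e] / total)
        for (d, e), n in pair_n.items()
    }


# Made-up reports for illustration only.
reports = (
    [("drug_a", "rash")] * 8
    + [("drug_a", "nausea")] * 2
    + [("drug_b", "rash")] * 2
    + [("drug_b", "nausea")] * 8
)
rr = relative_reporting_ratios(reports)
for pair, ratio in sorted(rr.items()):
    print(pair, round(ratio, 2))
```

A ratio well above 1 flags a pair reported more often than expected. Raw ratios are unstable when counts are small, which is the problem a shrinkage method is meant to address.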

MGPS: Significance

- Handles complex stratification (age, gender, year of report; > 945 categories)
- Performs complex computations in a minimal amount of time: much MORE EFFICIENT

Real World Example:

Membership: PhRMA-FDA Working Group

- Chair: June Almenoff (GSK)
- FDA Involvement
- Involved PhRMA companies: Abbott, Allergan, AstraZeneca, Bristol-Myers Squibb, GlaxoSmithKline, Johnson & Johnson, Lilly, Merck, Novartis, Schering-Plough, Pfizer, Roche, Wyeth

Drug Discovery

SCAM: Statistical Classification of Activities of Molecules

- Recursive partitioning customized for chemistry
- Creates a structure-activity relationship (SAR) model
- Handles large numbers of descriptors (> 1 million)

SCAM: Data Structure

[Figure: chemical structures encoded as rows of binary descriptors, each row paired with a biological activity value (Y1, Y2, Y3, Y4, ..., Yn). Size labels on the figure: >100K and > 2 million.]

Descriptor rows shown in the figure:

1 0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 1 0 1 1 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1
1 0 1 0 0 1 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 1
1 0 0 0 1 1 1 1 0 1 0 1 0 0 0 1 0 0 0 0 0 1
......
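With binary descriptors and one activity value per molecule, a single partitioning step can be sketched as: for each descriptor, split the molecules into those with and without the feature, compare mean activity with a t statistic, and keep the descriptor with the largest absolute t. The code below is a minimal sketch under that assumption, not the SCAM implementation; the tiny matrix and activity values are made up. Recursive partitioning would repeat the same search inside each resulting group.

```python
from math import sqrt
from statistics import mean


def split_t(y_with, y_without):
    """Pooled-variance two-sample t statistic for one candidate split."""
    n1, n2 = len(y_with), len(y_without)
    m1, m2 = mean(y_with), mean(y_without)
    ss = sum((v - m1) ** 2 for v in y_with)
    ss += sum((v - m2) ** 2 for v in y_without)
    pooled_sd = sqrt(ss / (n1 + n2 - 2))
    if pooled_sd == 0:
        return 0.0  # no variation within groups; treated as uninformative here
    return (m1 - m2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))


def best_split(X, y):
    """Return (column index, t) for the descriptor with the largest |t|."""
    best = None
    for j in range(len(X[0])):
        y_with = [yi for row, yi in zip(X, y) if row[j] == 1]
        y_without = [yi for row, yi in zip(X, y) if row[j] == 0]
        if len(y_with) < 2 or len(y_without) < 2:
            continue  # need at least two molecules on each side
        t = split_t(y_with, y_without)
        if best is None or abs(t) > abs(best[1]):
            best = (j, t)
    return best


# Made-up data: 6 molecules, 3 binary descriptors, one activity each.
X = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
y = [2.5, 2.7, 2.4, 0.2, 0.4, 0.3]
column, t = best_split(X, y)
print(column, round(t, 2))
```

In this toy data the first descriptor separates the active molecules from the inactive ones, so it is the one selected.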

SCAM’s Recursive Partitioning

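Parent node: n = 1650, ave = 0.34, sd = 0.81

Split on feature: rP = 2.03E-70, aP = 1.30E-66

Child nodes:

- n = 36, ave = 2.60, sd = 0.9
- n = 1614, ave = 0.29, sd = 0.73

$$
t = \frac{\text{Signal}}{\text{Noise}} = \frac{2.60 - 0.29}{0.734 \sqrt{\frac{1}{36} + \frac{1}{1614}}} = 18.68
$$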
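The arithmetic on this slide can be checked from the summary statistics alone. The sketch below assumes the 0.734 in the denominator is the pooled standard deviation of the two child nodes, which is consistent with the reported sd values; the function name is illustrative.

```python
from math import sqrt


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic using a pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    pooled_sd = sqrt(pooled_var)
    t = (mean1 - mean2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    return t, pooled_sd


# Child-node statistics taken from the slide.
t, pooled_sd = pooled_t(36, 2.60, 0.9, 1614, 0.29, 0.73)
print(f"pooled sd = {pooled_sd:.3f}, t = {t:.2f}")
```

The signal is the difference in mean activity between the two child nodes, and the noise is the pooled standard deviation scaled by the group sizes, so a small group can still produce a large t when its mean is far from the rest.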

SCAM Tree

Advantages of SCAM

Works for complex situations, mixtures and interactions.

Output is easy to understand and explain

High statistical power

Produces a valid answer

SCAM Drawbacks

- Data greedy
- Only one view of the data
- Binary descriptors may be too “crude”
- Disposition of outliers is difficult
- Highly correlated variables may be obscured
- Higher order interactions may be masked

Concluding Remarks

Data Mining enables us to efficiently handle LARGE amounts of data

Data Mining allows us to perform analyses IN REAL TIME

Data Mining covers a wide array of topics in the drug industry, and its benefits are plentiful

References

Almenoff, June S., et al. “Disproportionality Analysis Using Empirical Bayes Data Mining: A Tool for the Evaluation of Drug Interactions in the Post-Marketing Setting.” Pharmacoepidemiology and Drug Safety, 12, 517-521 (2003).

Donahue, Rafe. “An Overview of Data Mining in Drug Development and Marketing.” http://home.earthlink.net/~rafedonahue. May 2003.

Hawkins, D.M. and G.V. Kass, “Automatic Interaction Detection.” Topics in Applied Multivariate Analysis, ed. Hawkins, (1982).

Hawkins, D.M., S.S. Young and A. Rusinko. “Analysis of a Large Structure-Activity Data Set Using Recursive Partitioning.” QSAR, 16, 296-302 (1997).