Tom Plasterer, PhD
integrated informatics Semantic Framework Lead (i2SF)
The Path to Linked Data in BioPharma
Integrated R&D Informatics and Knowledge Management
R&D | RDI
Blockbuster ‘Patent Cliff’ Gives Way to Personalized Approach
Drivers & Solutions
Blockbuster Patent Cliff
Growth of Generics
Mergers & Acquisitions
Personalized Medicine
• Pharmacogenetics
• Biomarkers
American Action Forum; Primer: The Pharmaceutical Industry (Han Zhong | Updated June 2012)
IMAP Pharma & Biotech Industry Global Report 2011
Evaluate Pharma World Preview 2018
From: http://www.liv.ac.uk/pharmacogenetics/
Where do the new opportunities arise?
Inside & Outside
Build from within
• Nurture ‘best in class’ programs
• Kill early
• Repositioning
Mergers & Acquisitions
• Partner or Buy?
• Integrate cultures & technology
• Is the disruption worth it?
Pre-Competitive Consortiums
• How much can be shared and still be useful?
• Who is driving?
Finding ‘KOLs’
• Aggressive Regional Partnerships (Pfizer's Centers for Therapeutic Innovation)
• Co-locate near Academic Centers of Excellence (Novartis)
• Cherry pick (GSK, AZ, others)
Distributed Data in a Monolithic Environment
Managing Silos
Partitioned By Content
• Regulated Systems vs. Discovery
Partitioned By Geography & Organization
• US, EU, ASIAPAC
Data Formats
• RDB, Excel, Text, RSS, RDF?
Warehouses & Service Oriented Architecture
• Steps in the right direction?
Collaborative Environment
• eRooms, SharePoint, Yammer, ‘Lync’ vs. Twitter, Google Docs, Skype
Standards?
• Vendor specific or open?
• Mixed Bag
Where are the ‘smarts’?
• UI? Services?
• Metadata?
Requirements of The Informatics Landscape
Must span the entire drug development lifecycle
o and back (post-market surveillance to discovery)
Must support large and very heterogeneous data
o single nucleotide polymorphisms to countries
Will change as new science emerges & new regulations come into play
o Medline just under 1M articles/year
Must be able to work with multiple, international regulatory bodies
o Emerging markets
Partners, customers and collaborators will change
o and will have divergent technical aptitudes
Must be able to interoperate with precompetitive consortia
o Can they perform common tasks for the community?
Must be able to work with legacy data
o Lots of unmined gems here!
Maximal Agility
What’s Needed?
Linked Data!
LOD Cloud 2011
http://thedatahub.org/group/lodcloud
The 5 Stars of Open Linked Data
W3C/TBL Guidance
http://www.w3.org/DesignIssues/LinkedData.html
★ Make your stuff available on the web (any format)
★★ Make it available as structured data (e.g. Excel instead of image scan of a table)
★★★ Use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ Use URLs to identify things, so that people can point at your stuff
★★★★★ Link your data to other people’s data to provide context
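A minimal sketch of the step from three stars to five, assuming a made-up CSV export and a hypothetical local namespace (`http://example.org/...`). The predicate `investigatedFor` and all row values are illustrative assumptions; only the idea (give each row a URL, then link out to someone else's identifier) comes from the list above.

```python
import csv
import io

# Hypothetical 3-star data: a plain CSV export (all values are made up).
CSV_TEXT = """compound_id,label,disease_doid
C001,Example compound A,2841
C002,Example compound B,9352
"""

# Hypothetical local namespaces, used only for illustration.
LOCAL_NS = "http://example.org/id/compound/"
INVESTIGATED_FOR = "http://example.org/vocab/investigatedFor"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumed external identifier pattern for Disease Ontology terms.
DOID_NS = "http://purl.obolibrary.org/obo/DOID_"


def csv_to_ntriples(text):
    """Turn each CSV row into N-Triples lines.

    4 stars: every row gets its own URL.
    5 stars: the disease column becomes a link to an external URL.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{LOCAL_NS}{row['compound_id']}>"
        # Escape backslashes and quotes so the literal stays valid.
        label = row["label"].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
        lines.append(
            f"{subject} <{INVESTIGATED_FOR}> <{DOID_NS}{row['disease_doid']}> ."
        )
    return lines


if __name__ == "__main__":
    print("\n".join(csv_to_ntriples(CSV_TEXT)))
```

The same CSV stays usable as three-star data; the triples are an additional view that others can point at and link to.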
The 5 Stars of Closed Linked Data
W3C/TBL Guidance
http://www.w3.org/DesignIssues/LinkedData.html
★ Make your stuff available on the intranet (any format)
★★ Make it available as structured data (e.g. Excel instead of image scan of a table)
★★★ Use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ Use URLs to identify things, so that people can point at your stuff
★★★★★ Link your data to other people’s data to provide context
Towards a Linked Data Architecture
[Architecture diagram. Labels: Structured, Semi-Structured, and Unstructured Content (+ Tagging); RDF; Triplestores; Catalogues, Mapping, Queries; Vocabulary Server; Central Identity Management; Active & Partial PURLs; Search; Semantic Visualization]
URLs shown on the diagram:
http://research.vocab.astrazeneca.com/id/DOID/2841
http://humandiseaseontology.astrazeneca.net/DOID/2841
Choosing Linked Vocabularies
Current LOD Cloud Adoption
Vocabulary prefix   Vocabulary link                             Number of usages in data sets
dc                  http://purl.org/dc/elements/1.1/            92 (31.19 %)
foaf                http://xmlns.com/foaf/0.1/                  81 (27.46 %)
skos                http://www.w3.org/2004/02/skos/core#        58 (19.66 %)
geo                 http://www.w3.org/2003/01/geo/wgs84_pos#    25 (8.47 %)
xhtml               http://www.w3.org/1999/xhtml/vocab#         19 (6.44 %)
akt                 http://www.aktors.org/ontology/portal#      17 (5.76 %)
bibo                http://purl.org/ontology/bibo/              14 (4.75 %)
mo                  http://purl.org/ontology/mo/                13 (4.41 %)
vcard               http://www.w3.org/2006/vcard/ns#            10 (3.39 %)
sioc                http://rdfs.org/sioc/ns#                    10 (3.39 %)
cc                  http://creativecommons.org/ns#              8 (2.71 %)
geonames            http://www.geonames.org/ontology#           6 (2.03 %)
http://www4.wiwiss.fu-berlin.de/lodcloud/state/#terms
[Architecture component: Vocabulary Server]
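As a small illustration of what reusing these vocabularies means in practice, the sketch below expands prefixed names into full URIs. The prefix-to-namespace pairs are copied from the adoption table; the function name `expand` is a hypothetical helper, not part of any particular toolkit.

```python
# Prefix-to-namespace pairs copied from the LOD Cloud adoption table.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'skos:prefLabel' to a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    print(expand("skos:prefLabel"))
    print(expand("dc:title"))
```

Two data sets that both use `skos:prefLabel` end up with the same predicate URI, which is what makes them queryable together.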
The 5 Stars of Open Linked Vocabularies
Bernard Vatant (Mondeca) Guidance
http://blog.hubjects.com/2012/02/is-your-linked-data-vocabulary-5-star_9588.html
★ Publish your vocabulary on the Web at a stable URI
★★ Provide human-readable documentation and basic metadata (e.g. creator, publisher, date of creation, last modification, version number)
★★★ Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes
★★★★ Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation
★★★★★ Link to other vocabularies by re-using elements rather than re-inventing
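The fourth star relies on content negotiation: one namespace URI, different representations depending on the client's `Accept` header. The sketch below is a simplified, server-side illustration. The file names are hypothetical, and it deliberately ignores partial wildcards such as `text/*` and the finer precedence rules of the HTTP specification.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Hypothetical file names for one vocabulary published at one namespace URI.
REPRESENTATIONS = {
    "text/turtle": "vocab.ttl",
    "application/rdf+xml": "vocab.rdf",
    "text/html": "vocab.html",
}


def negotiate(accept, default="text/html"):
    """Return the file to serve for the given Accept header."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type]
        if media_type == "*/*":
            return REPRESENTATIONS[default]
    return REPRESENTATIONS[default]


if __name__ == "__main__":
    print(negotiate("text/turtle"))                # formal file for machines
    print(negotiate("text/html,*/*;q=0.8"))        # documentation for browsers
```

A browser and a triplestore loader can then dereference the same stable URI and each receive the form they can use.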
Domain Specific Vocabularies
Linked Open Vocabularies, NCBO
http://labs.mondeca.com/dataset/lov/index.html
http://bioportal.bioontology.org/
Building Linked Data Applications
Capture Business Questions and Sources
Domain Expert Concept Map
Build Formal Ontology
• Reuse Vocabularies!
Challenge with Linked Data
Model Business Questions (SPARQL)
Interact with RDF answer in a Faceted Browser
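To make the "model business questions" step concrete, here is a toy sketch that answers one question over a handful of in-memory triples. Everything in it is an assumption for illustration: the triples, the `ex:` and `doid:` prefixed names, and the question itself. A real application would send the SPARQL shown in the docstring to a triplestore; the pure-Python matcher only mimics the join.

```python
# Hypothetical triples (subject, predicate, object). Prefixed names stand in
# for full URIs, and every identifier here is made up for illustration.
TRIPLES = [
    ("ex:C001", "ex:investigatedFor", "doid:2841"),
    ("ex:C002", "ex:investigatedFor", "doid:9352"),
    ("ex:C003", "ex:investigatedFor", "doid:2841"),
    ("ex:C001", "rdfs:label", "Example compound A"),
    ("ex:C003", "rdfs:label", "Example compound C"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def compounds_for_disease(triples, disease):
    """Business question: which labelled compounds are investigated for a disease?

    Roughly equivalent SPARQL (prefix declarations omitted):
        SELECT ?compound ?label WHERE {
          ?compound ex:investigatedFor doid:2841 ;
                    rdfs:label ?label .
        }
    """
    results = []
    for compound, _, _ in match(triples, p="ex:investigatedFor", o=disease):
        # Join on the compound: only compounds that also have a label are kept.
        for _, _, label in match(triples, s=compound, p="rdfs:label"):
            results.append((compound, label))
    return results


if __name__ == "__main__":
    for compound, label in compounds_for_disease(TRIPLES, "doid:2841"):
        print(compound, label)
```

The result rows are what a faceted browser would then let a scientist filter and explore.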
Improving Internal Interoperability
Scientists, Clinicians, Informaticists can now freely interoperate as:
• The PURL server provides a central identity management authority for resources that are of value (need to persist) across the enterprise. The Persistent URLs are used to connect resources found in multiple locations.
• The vocabulary server provides a way of harmonizing concepts across different domains
o Where possible, public vocabularies are used
o Where not, they’re extended
o We don’t want to develop and maintain vocabularies
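A minimal sketch of how a PURL server with partial PURLs might resolve an identifier, assuming a hypothetical lookup table. The `example.org` and `example.net` hosts are placeholders, not the actual server configuration; the point is that one registered prefix can cover every identifier beneath it, so the target system can move without breaking the persistent URL.

```python
from typing import Dict, Optional

# Hypothetical PURL table: persistent prefix -> current location.
# Both hosts are placeholders chosen for illustration.
PARTIAL_PURLS = {
    "http://purl.example.org/id/DOID/": "http://ontology.example.net/DOID/",
}


def resolve(purl: str, table: Dict[str, str] = PARTIAL_PURLS) -> Optional[str]:
    """Resolve a PURL by its longest matching registered prefix.

    The unmatched tail of the request is appended to the target, so one
    registration covers every identifier under the prefix.
    """
    best = max((p for p in table if purl.startswith(p)), key=len, default=None)
    if best is None:
        return None  # a real server would answer with an HTTP 404 here
    return table[best] + purl[len(best):]


if __name__ == "__main__":
    print(resolve("http://purl.example.org/id/DOID/2841"))
```

If the ontology moves to a new host, only the table entry changes; every link that uses the persistent form keeps working.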
Inside/Outside Disappears
[Architecture diagram. Labels: External; Internal; Structured Vendor Content; Consortium Content; RESTful APIs; Structured, Semi-Structured, and Unstructured Content (+ Tagging); RDF; Triplestores; Catalogues, Mapping, Queries; Vocabulary Server; Central Identity Management; Active & Partial PURLs; Semantic Visualization]
Unstructured Content
(or most of the data out there…)
Giving Structure to Unstructured Content
o Entity Recognition
o Use of common vocabularies
  o Schemas
  o Domain-Specific Content? Open BEL? TMO?
o Compatibility of text indices with triplestores & middleware tools
Encouraging Publishers to Structure Content
o How can this be ‘monetized’ so they don’t lose their ROI?
o What about interoperability & persistence?
o Can this be mandated via funding agencies?
o RDFa to start?
Publishers or ‘Re-publishers’
o Thomson-Reuters
o Ingenuity
o Open up vocabularies
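One simple way to picture entity recognition against a common vocabulary is a dictionary-based tagger. The sketch below assumes a tiny, made-up vocabulary with placeholder concept URIs; production systems typically add synonyms, disambiguation, and statistical models, none of which is shown here.

```python
import re

# Hypothetical mini-vocabulary: surface form -> concept URI (URIs are made up).
VOCABULARY = {
    "asthma": "http://example.org/vocab/disease/asthma",
    "type 2 diabetes": "http://example.org/vocab/disease/t2d",
    "egfr": "http://example.org/vocab/gene/EGFR",
}


def tag_entities(text, vocabulary=VOCABULARY):
    """Return (start, end, matched text, concept URI) for each vocabulary hit.

    Longer terms are listed first in the pattern so that multi-word terms
    are preferred over shorter terms starting at the same position.
    """
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), vocabulary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in tag_entities("EGFR variants were studied in asthma."):
        print(hit)
```

Because each hit carries a URI from a shared vocabulary, the tagged text can be joined with structured data that uses the same identifiers.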
Pre-Competitive Consortia
Open PHACTS (Innovative Medicines Initiative)
Pistoia Alliance
W3C Health Care & Life Sciences Interest Group
National Center for Biomedical Ontologies (NCBO)
Open BEL (Biological Expression Language)
Open PHACTS (Open Pharmacological Space)
• EU/EFPIA Innovative Medicines Initiative (IMI) project
Key Points
Large scale data integration
• Focused on pharmacology
• We integrate so you don’t have to
• Dealing with multiple identifiers for the same concept
• Always up-to-date
• State of the art and industrial strength
Flexible and adaptable
• Dynamic schema-less approach; rapidly incorporate new datasets
• Queries are adaptive, based on scientific profiles (e.g. chemist or biologist)
• Use-case driven & tested by users in industry and academia
Great APIs for building apps
• JSON REST-style APIs
• Also supports XML, Turtle, etc.
• Chemistry services
• Exemplars show how to take advantage of the platform
• Clear licensing details for all data in the system
Focus On Data Quality
• Provenance is critical – know where every data point comes from
• Google-style indexing; data providers keep their own data
• Chemistry Standardization – enhancing chemistry connectivity
• Working with data providers to expose and enhance their data
From: Open PHACTS Architecture - Building the extensible platform (EuroQSAR 2012 in Vienna, 30.08.2012)
W3C HCLS
http://www.w3.org/blog/hcls/
The mission of the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine.
Activities:
o Continue to develop high level (e.g. TMO) and architectural (e.g. SWAN) vocabularies.
o Implement proof-of-concept demonstrations and industry-ready code.
o Document guidelines to accelerate the adoption of the technology.
o Disseminate information about the group's work at government, industry, academic events and by participating in community initiatives.
Use Cases/Domains
o Drug Discovery
o Electronic Lab Notebooks
o Comparator Arm Data
o Patient Data Ownership
o Biotech Acquisition
o Supply Chain Automation
o Web Integration
o Bio-surveillance
o Co-development
Pleas & Future Directions
Prognostications
• RDF Content Farms
o Vendors: Someone will figure out how to monetize this
o Consortia: Who ‘Owns’ this?
• Government in Health Care & Life Sciences; can we learn from the EPA? open.gov?
• Shrinking Pharma
o Smaller (or virtual) footprint
o Back to first principles: what do we do best?
o More modeling & simulation
• Rise of the informaticist…
Community Help
• Resist Silos
o Where is your data? Where is it likely to be in 5, 10 years?
o A single triplestore with all ETL streams leading to an RDF ‘data warehouse’ is another silo
o Building on top of ‘standards+’ may lead to silos
o Need to follow & influence emergence of standards if you have a ‘horse in the race’
• Support (business focused) Consortiums
o We’re doing the same job many, many times
Thank You
Listeners & Molecular Med TRI-CON 2013 Organizers