Oracle Life Sciences Platform and 10g Preview
Charlie Berger
Sr. Director of Product Management, Life Sciences and Data Mining
Oracle Corporation
Session id: 40263
Welcome to the Oracle Life Sciences User Group Meeting
Oracle HQ, Bldg 350 Conference Center
Redwood Shores, CA
September 10th, 2003
8:30 am-7:30 pm
Oracle Life Sciences Day & User Group Meeting Agenda
8:00-8:30 Breakfast
8:30-8:45 Welcome
8:45-9:45 Charlie Berger, Oracle Corporation
  Oracle's Platform for Life Sciences - New 10g Features Preview & Solicitation Process for Features in Next Release
9:45-10:30 Joyce Peng, Oracle Corporation
  New In Silico Drug Discovery Integrated Demo
10:30-10:50 Break
10:50-11:30 European Bioinformatics Institute (EBI), Peter Stoehr
  Managing Scientific Literature (Medline) and XML Data Within Oracle
11:30-12:10 The Wellcome Trust Sanger Institute, Martin Widlake
  Implementing a Terascale Data Store (20 TB)
12:10-1:00 Lunch & Wish List Feature Post-it Notes
1:00-1:40 Wyeth Research, Peter Smith
  21 CFR Part 11 via Oracle Auditing at Wyeth
Oracle Life Sciences Day & User Group Meeting Agenda
1:40-2:20 Myriad Proteomics
  Sequence Search Capabilities in the Database
2:20-3:00 Johnson & Johnson, Richard Guida & Rajesh Shah
  Building a Secure Infrastructure with Oracle in Life Sciences, J & J PKI and Secure Connectivity to Oracle
3:00-3:20 Break & Afternoon Refreshments
3:20-4:00 Kyoto University, Japan, Susumu Goto
  Integrating Biological Information and Pathways using Oracle, KEGG at Kyoto University
4:00-4:40 BioMed Central Limited, Matthew Cockerill
  Managing Scientific Images with Oracle - Multimedia Database Improves the Bottom Line
4:40-5:20 Abbott Laboratories, Shon Naeymirad
  Electronic Records, 21 CFR Part 11 and Oracle 9i
5:20-5:30 Break
5:30-6:30 ISV Lightning Rounds, Life Sciences ISV Partners
6:30-7:30 ISV Reception and Demo Grounds
"My industry is going to become pretty boring soon –I don't believe you'll ever see this proliferation of informatics
companies or computer companies like you saw in the decade of the Nineties. The life sciences industry
is where the horizons are wide open. There'll be lots and lots of companies born, lots of new products, lots of new science
at least for the next 50 years.
Because of that...we've decided to focus heavily on the life sciences industry."
- Larry Ellison, CEO, Oracle Corporation, Bio-IT World magazine, premier issue March 2002
Oracle’s Commitment
Life Sciences Value Chain
[Diagram: life sciences value chain]
Stages: Discovery; Development; Manufacturing, Sales and Marketing
Participants and data shown: Biotech/Pharmaceutical Research Labs; Public/Private Data; Wet Lab; In Silico; Sample Data; Biomedical Firm; Pharmaceutical Company; Pre-Clinical Trials; Clinical Trials; Contract Research Organization; Hospital; Regulatory Agency; Pharmaceutical Mfg. Plant; Distribution; Pharmacy
Database
Application Server
Applications: Discovery, Finance, HR, Projects, Maintenance, Manufacture/Supply Chain Management
Manage all your data
Run all your applications
Oracle’s Solutions for Life Sciences
[Chart: revenue plotted against years over the life of a drug]
Stages: Discovery; Development & Clinical; Sales & Marketing
Milestones: Identify and Validate Targets; Identify and Validate Leads; Pre-Clinical Trials; Clinical Trials; Product Launch; Patent Expiry; Competition from Generics
Goal: Accelerate the Discovery Process
Source: Ernst & Young, Price Waterhouse
[Chart: R&D costs and sales revenue across Identify and Validate Targets, Identify and Validate Leads, Pre-Clinical Trials and Clinical Trials; the values 15 and 20 are marked on the chart]
Drug Discovery Economics 101
Better Data Management Accelerates Discovery
Cell Nucleus
Chromosome
Protein
Graphics courtesy of the National Human Genome Research Institute
Gene (DNA)
Gene (mRNA)
Organism
Life Sciences Discovery
Genes and Proteins Run the Cell
aattggaagc aaatgacatc acagcaggtc agagaaaaag ggttgagcgg caggcacccagagtagtagg tctttggcat taggagcttg agcccagacg gccctagcag ggaccccagcgcccgagaga ccatgcagag gtcgcctctg gaaaaggcca gcgttgtctc caaactttttttcagctgga ccagaccaat tttgaggaaa ggatacagac agcgcctgga attgtcagacatataccaaa tcccttctgt tgattctgct gacaatctat ctgaaaaatt ggaaagagaatgggatagag agctggcttc aaagaaaaat cctaaactca ttaatgccct tcggcgatgttttttctgga gatttatgtt ctatggaatc tttttatatt taggggaagt caccaaagcagtacagcctc tcttactggg aagaatcata gcttcctatg acccggataa caaggaggaacgctctatcg cgatttatct aggcataggc ttatgccttc tctttattgt gaggacactgctcctacacc cagccatttt tggccttcat cacattggaa tgcagatgag aatagctatgtttagtttga tttataagaa gactttaaag ctgtcaagcc gtgttctaga taaaataagtattggacaac ttgttagtct cctttccaac aacctgaaca aatttgatga aggacttgcattggcacatt tcgtgtggat cgctcctttg caagtggcac tcctcatggg gctaatctgggagttgttac aggcgtctgc cttctgtgga cttggtttcc tgatagtcct tgccctttttcaggctgggc tagggagaat gatgatgaag tacagagatc agagagctgg gaagatcagtgaaagacttg tgattacctc agaaatgatt gaaaatatcc aatctgttaa ggcatactgctgggaagaag caatggaaaa aatgattgaa aacttaagac aaacagaact gaaactgactcggaaggcag cctatgtgag atacttcaat agctcagcct tcttcttctc agggttctttgtggtgtttt tatctgtgct tccctatgca ctaatcaaag gaatcatcct ccggaaaatattcaccacca tctcattctg cattgttctg cgcatggcgg tcactcggca atttccctgggctgtacaaa catggtatga ctctcttgga gcaataaaca aaatacagga tttcttacaaaagcaagaat ataagacatt ggaatataac ttaacgacta cagaagtagt gatggagaatgtaacagcct tctgggagga gggatttggg gaattatttg agaaagcaaa acaaaacaataacaatagaa aaacttctaa tggtgatgac agcctcttct tcagtaattt ctcacttcttggtactcctg tcctgaaaga tattaatttc aagatagaaa gaggacagtt gttggcggttgctggatcca ctggagcagg caagacttca cttctaatga tgattatggg agaactggagccttcagagg gtaaaattaa gcacagtgga agaatttcat tctgttctca gttttcctggattatgcctg gcaccattaa agaaaatatc atCTTtggtg tttcctatga tgaatatagtacagaagcg tcatcaaagc atgccaacta gaagaggaca tctccaagtt tgcagagaaagacaatatag ttcttggaga aggtggaatc acactgagtg gaggtcaacg agcaagaatt
agaatttcat
at[T/C]gtg
gaagaggac
3.2 billion letters of human DNA
~ 2 million variation points (SNPs)
SNP = Single Nucleotide Polymorphism
Life Sciences Challenge
Correlate Biological and DNA Variation
Graphics courtesy of the National Human Genome Research Institute
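The bracketed notation on this slide, at[T/C]gtg, marks a single position where the sequence may carry either of two bases. As a minimal sketch of how such a pattern could be expanded into its two alleles, the following Python helper may be useful. The function name `expand_snp` and the exact pattern syntax it accepts are assumptions made for illustration; they are not part of any Oracle product.

```python
import re
from typing import List


def expand_snp(pattern: str) -> List[str]:
    """Expand 'left[A/B]right' into the two allele sequences.

    Hypothetical helper: it accepts exactly one bracketed SNP per pattern.
    """
    match = re.fullmatch(
        r"([acgtACGT]*)\[([acgtACGT])/([acgtACGT])\]([acgtACGT]*)", pattern
    )
    if match is None:
        raise ValueError(f"not a single-SNP pattern: {pattern!r}")
    left, allele_a, allele_b, right = match.groups()
    # Each allele is the flanking sequence with one base substituted.
    return [left + allele_a + right, left + allele_b + right]


alleles = expand_snp("at[T/C]gtg")
print(alleles)
```

Each returned string differs from the other at exactly one position, which is the defining property of a SNP.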
Life Sciences Challenge
Correlate Diseases, Genes and Environment
Myocardial Infarction
Stroke
Diabetes
Breast cancer
Manic-depression
Obesity
Hyperlipidemia
Inflammatory Bowel Disease
Hypertension
Schizophrenia
Graphics courtesy of the National Human Genome Research Institute
[Chart y-axis: data storage from 0 to 500 TB in 50 TB steps]
Life Science Challenge Exploding Volumes of Data
“To meet the scientific goals we believe we need to add around 80 - 100TB of storage each year for the next 5 years”
P. Butcher, The Sanger Centre
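The quoted growth rate implies a simple projection. The sketch below multiplies the low and high yearly figures from the quote by the five-year horizon; the function name is an assumption for illustration, and the calculation ignores any storage already in place.

```python
def added_storage_tb(tb_per_year: float, years: int) -> float:
    """Total storage added when a fixed amount is added every year."""
    return tb_per_year * years


YEARS = 5  # horizon stated in the quote
low = added_storage_tb(80, YEARS)    # lower bound of the quoted range
high = added_storage_tb(100, YEARS)  # upper bound of the quoted range
print(f"Added over {YEARS} years: {low:.0f} to {high:.0f} TB")
```

On these figures the added capacity alone reaches the top of the chart's 500 TB scale.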
[Chart x-axis: 1994, 1995, 1996, 1997, 1998, Oct-1999, Apr-2000, Nov-2001, Jan-01, 2002, 2003, 2004, 2005, 2006; marker: Data Storage Today]
Life Science Challenge Many Different Kinds of Data
Genomics
Functional Genomics
Cheminformatics
Proteomics
Pharmacogenomics
Modeling
Clinical
Pathways
Graphic modified from original courtesy of Sun Microsystems
Life Science Challenge
Typical Research Environment
Industrial Research Lab
Public Databases
Private/Service Databases
Local Copies
Partner or Collaborator
Local Databases
Access heterogeneous data
Integrate a variety of data types
Manage vast quantities of data
Find patterns and insights
Collaborate securely
Clients: Browser, Mobile Device
Oracle10g App Server
Oracle10g Database Server
Run All Your Applications
Manage All Your Data
Oracle Vision: At the core is a data management platform
Introducing Oracle 10g
Runs all your applications
Stores all your information
Highly scalable, available, reliable
Secure
Easy to manage
– Make individual systems self-managing
– Manage thousands of servers at once
Genomics
Proteomics
Pathways
Cheminformatics
Clinical
1. Access heterogeneous data
2. Integrate a variety of data types
3. Manage vast quantities of data
4. Find patterns and insights
5. Collaborate securely
Oracle’s Platform for Life Sciences
Oracle Life Sciences Platform
Access heterogeneous data
Integrate a variety of data types
Manage vast quantities of data
Find patterns and insights
Collaborate securely
Oracle Life Sciences Platform
Collaboration Suite: Collaborate securely
iFS/Files: Share documents
XML DB: Flexibly manage data
interMedia: Store & manage images
SQL*Loader: High performance data loader
Web Services: Standard communication between applications
Merge/Upsert: Enabling update and insert in one step
Oracle Portal: Build personalized portals
Application Server: Provide scalability for the middle tier
Transparent Gateways: Fast access using Oracle OCI
Distributed Queries: Perform searches across domains
Generic Gateways: Access any data using ODBC
Transportable Tablespaces: Rapidly exchange tables
Oracle Streams: Rule-based subscription for information sharing
Data Mining: Discover patterns & insights
Statistics: Perform basic statistics
Table Functions: Implement complex algorithms
OLAP & Discoverer: Interactive query & drill-down
Security: Enforce security
Auditing: Create audit trail to facilitate FDA compliance
Workflow: Automate laboratory & business processes
Extensibility Framework (Data cartridges): Manage complex scientific data
LOBs: Manage unstructured data
Text: Index & query text, e.g. literature searches
Real Application Clusters: Linear scalability
External Tables: Ability to index and query external files
UltraSearch: Search external sites & repositories
MySQL Toolkit: Easily move MySQL data into Oracle
Example data sources shown: SwissProt SP-ML, PubMed, MySQL, GenBank
[Diagram: routes into Oracle from external sources]
Sources shown: Flat files; External Sites; MySQL; Sybase; DB2
Access methods shown: Distributed query; Transparent Gateway; Generic Connectivity; MySQL Migration Toolkit; DBlinks; UltraSearch; External Table; Transportable Tablespaces
1. Access Heterogeneous Data
Oracle Transparent Gateways
– Integrate data from disparate systems
Generic Connectivity
– ODBC/JDBC connectivity
External Tables
– Access data from flat files
Distributed Queries
– Query across multiple Oracle and heterogeneous data sources
Transportable tablespaces
– Rapidly move tablespaces between Oracle databases
SQL*Loader
– High performance data loader
Oracle Streams
– Rule-based subscription for information sharing
Dblinks
– Connectivity between databases
UltraSearch
– Query range of data repositories (web sites, files, email, databases, etc.)
Migration Toolkits
– Tools to facilitate movement of data into Oracle
Merge / Upsert
– Update and insert in one step
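The merge/upsert idea in the last item can be illustrated outside the database. The sketch below is an in-memory Python analogy only, not Oracle's SQL syntax: for each incoming row it updates the existing record when the key is already present and inserts a new record otherwise, in one pass. The function name `merge_rows` and the sample record fields are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_rows(
    target: Dict[str, dict], source_rows: List[dict], key: str
) -> Tuple[int, int]:
    """Update matching records and insert new ones in a single pass.

    Returns (updated_count, inserted_count). Hypothetical helper that
    mimics the behaviour of an upsert on an in-memory table.
    """
    updated = inserted = 0
    for row in source_rows:
        row_key = row[key]
        if row_key in target:
            target[row_key].update(row)  # key exists: update in place
            updated += 1
        else:
            target[row_key] = dict(row)  # key is new: insert a copy
            inserted += 1
    return updated, inserted


# Hypothetical protein table keyed by accession number.
proteins = {"P1": {"accession": "P1", "name": "old name"}}
incoming = [
    {"accession": "P1", "name": "new name"},
    {"accession": "P2", "name": "second protein"},
]
counts = merge_rows(proteins, incoming, key="accession")
print(counts, proteins)
```

Doing both operations in one pass avoids a separate existence check followed by a conditional insert or update, which is the saving the slide refers to.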
Genomics
Functional Genomics
Cheminformatics
Proteomics
Pharmacogenomics
Modeling
Clinical
Pathways
Graphic modified from original courtesy of Sun Microsystems
2. Integrate a Variety of Data Types
XML DB
– Unite XML content and relational data
– SQL & XML become one
LOBs
– Manage unstructured data
Internet File System (Oracle Files)
– Manage files and folders
Text
– Index and query of text content & documents (Word, PowerPoint, HTML, Adobe PDFs, etc.)
interMedia
– Manage audio, video and image data
2. Integrate a Variety of Data Types
European Bioinformatics Institute (EBI)
Hosts major public databases (e.g. SwissProt, EMBL Nucleotide Sequence Database, Medline) on Oracle. (Total: > 5 TB)
Uses Oracle XML DB and Oracle Text for Medline – in development.
– Size: 11 million records, 200 GB
Uses Oracle9i Database and Application Server.
Extensibility Framework (Data Cartridges) - Manage complex scientific data
[Diagram labels: Data Cartridge; Service Interfaces; Extensibility Interfaces; Database Extensibility Services (Type System, Query Processing, Data Indexing, Server Execution, ...); Oracle8i Server; Oracle9i Server]
2. Integrate a Variety of Data Types
Chemical Searching
Chemistry searching requires special techniques
– Chemical name is not unique: "Viagra®", "sildenafil citrate"
– Chemists think graphically
[Chemical structure drawing]
The solution:
– A graphical user interface
– Specialized operators such as substructure search ("sss") = a chemical "contains"
[Diagram: a query fragment with the atoms Cl, Cl and O "finds" matching structures]
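To make the chemical "contains" idea concrete, here is a deliberately simplified Python sketch of substructure matching as a backtracking search over labeled graphs. It is a toy model under loud assumptions: atoms carry only an element symbol, bonds have no order, hydrogens are omitted, and the function and variable names are hypothetical. It is not the algorithm or API of any chemistry data cartridge.

```python
from typing import Dict, Iterable, Set, Tuple

Atoms = Dict[int, str]       # atom id -> element symbol
Bonds = Set[frozenset]       # unordered pairs of atom ids


def make_molecule(atoms: Atoms, bonds: Iterable[Tuple[int, int]]) -> Tuple[Atoms, Bonds]:
    """Build the toy (atoms, bonds) representation used below."""
    return dict(atoms), {frozenset(pair) for pair in bonds}


def contains_substructure(mol: Tuple[Atoms, Bonds], query: Tuple[Atoms, Bonds]) -> bool:
    """Return True if every query atom and bond can be mapped into mol."""
    mol_atoms, mol_bonds = mol
    q_atoms, q_bonds = query
    q_ids = list(q_atoms)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(q_ids):
            return True  # every query atom has a partner
        q = q_ids[len(mapping)]
        for m, element in mol_atoms.items():
            if element != q_atoms[q] or m in mapping.values():
                continue
            # Each query bond to an already mapped atom must exist in mol.
            bonds_ok = all(
                frozenset((m, mapping[other])) in mol_bonds
                for other in mapping
                if frozenset((q, other)) in q_bonds
            )
            if bonds_ok:
                mapping[q] = m
                if extend(mapping):
                    return True
                del mapping[q]  # backtrack and try the next candidate
        return False

    return extend({})


# Hypothetical molecule: a carbon carrying two chlorines, bonded to a
# second carbon that carries an oxygen.
molecule = make_molecule(
    {1: "C", 2: "C", 3: "Cl", 4: "Cl", 5: "O"},
    [(1, 2), (1, 3), (1, 4), (2, 5)],
)
dichloro = make_molecule({1: "C", 2: "Cl", 3: "Cl"}, [(1, 2), (1, 3)])
chloro_oxy = make_molecule({1: "C", 2: "Cl", 3: "O"}, [(1, 2), (1, 3)])

print(contains_substructure(molecule, dichloro))
print(contains_substructure(molecule, chloro_oxy))
```

The first query matches because one carbon carries both chlorines; the second does not, because no single carbon is bonded to both a chlorine and an oxygen. A name-based text search could not express either question, which is why the slide calls for specialized operators.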
MDL Information Systems, Inc.
MDL Discovery Framework
A multi-tier system for managing and integrating discovery data and workflows
– Domain-specific application and database services and API
– Chemistry rules, drawing, and rendering
– Single application access to multiple DBs and services
Key Advantages
– Integrate data sources across R&D
– Easily create web or client solutions
– Quickly adopt new tools and methods for development
www.mdl.com
Oracle Features
– Oracle 8i/9i Database Extensibility Option (chemical data cartridge)
– Replication support
– Oracle9iAS J2EE services
IDBS
The ActivityBase Suite
– Capture, manage and use chemical and biological data in life sciences discovery
– Manage full range of disparate data types
– The leading application for drug discovery research worldwide
Key Advantages
– Integration framework for cheminformatics and bioinformatics data
– Rich data context enables data quality
– Supports manual and automated data capture & management
– Maximizes the value of discovery data
www.id-bs.com
Oracle Features
– Chemistry cartridge (ChemXtra)
– PL/SQL stored procedures
– Java stored procedures
– XML
– Materialized views
– Data warehousing
– 9i compatible
Grid support in Oracle 10g
Oracle Scales to Petabytes
– Largest life sciences databases run Oracle
– Oracle 80% market share - IDC
Partitioning
– Divide and conquer
Oracle 10g Application Server
– Provide scalability for middle tier
Oracle Data Guard
– Protect data from human or system failures
3. Manage Vast Quantities of Data
[Chart: data storage from 0 to 500 TB in 50 TB steps, plotted for 1994, 1995, 1996, 1997, 1998, Oct-1999, Apr-2000, Nov-2001, Jan-01, 2002, 2003, 2004, 2005, 2006; marker: Data Storage Today]
3. Manage Vast Quantities of Data
Support for Grid
Distributed queries, External Tables, Security, RAC
Grid Access to Oracle Utilities through Globus Resource Allocation Manager (GRAM)
– Export, Import, SQL*Plus
Grid Access to Oracle 10g Database
– Invoke PL/SQL routines specified in Globus Resource Specification Language
Grid Resource Information Service (GRIS) for Oracle Database
– Discover & monitor Oracle databases
3. Manage Vast Quantities of Data
Real Application Clusters (RAC)
– Start with one server, one database and grow as you grow
– Linear scalability out of the box
– Save on hardware and storage costs
– Works with ALL applications
– Fail-over transparent to users
– Easy to administer
[Diagram labels: High-speed interconnect; Data Loads; Sample/Lab; Proteomics; Portal; A-Z]
Oracle Real Application Clusters Works for All Applications
[Diagram: four Oracle instances]
1. Add new node
2. Start instance on new node
No Code Change
Oracle Real Application Clusters Greater Than 85% Scalability
[Chart: scalability from 0% to 100% for 1, 2, 4, 8 and 16 nodes]
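The slide does not say how the scalability percentage is defined. One common reading, assumed here, is scaling efficiency: the throughput measured on N nodes divided by N times the single-node throughput. The Python sketch below applies that definition to made-up throughput figures; both the numbers and the function name are hypothetical and are not Oracle benchmark results.

```python
from typing import Dict


def scaling_efficiency(
    single_node_throughput: float, cluster_throughput: float, nodes: int
) -> float:
    """Fraction of ideal linear speed-up achieved on a cluster of `nodes`."""
    if nodes < 1 or single_node_throughput <= 0:
        raise ValueError("need at least one node and a positive baseline")
    return cluster_throughput / (nodes * single_node_throughput)


# Hypothetical measurements: node count -> transactions per second.
measurements: Dict[int, float] = {
    1: 1000.0,
    2: 1900.0,
    4: 3700.0,
    8: 7100.0,
    16: 13600.0,
}

efficiencies = {
    nodes: scaling_efficiency(measurements[1], throughput, nodes)
    for nodes, throughput in measurements.items()
}
for nodes, value in efficiencies.items():
    print(f"{nodes:>2} nodes: {value:.1%}")
```

Under this definition, perfectly linear scaling would read 100% at every node count, and a claim of "greater than 85%" means no point falls below 0.85.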
Genentech, Inc.
Leading biotech company
– Over 2 TBs of data in Oracle
– Oracle serves as a centralized information resource for gene searching and database cross-referencing.
– Oracle used for the entire pipeline from research to clinical data to manufacturing and sales applications.
Key Advantages of Oracle
– Improved performance
– Greater reliability
– Genentech's corporate goal is 99.999% availability in a 24x7 environment
Oracle Environment
– Oracle 9i database
– Real Application Clusters
"Oracle9i Real Application Clusters provide the foundation for the scalable and highly available database infrastructure we require to meet our growing data demands in all areas of our business."
-- Scooter Morris, Genentech, Inc.
The Dragon Genomics Center of Takara Bio Inc.
High-Level Project Goals
– Manage data throughout every step of a complicated process
– Create a laboratory information management system (LIMS) enabling large scale sequencing
– Provide reliable back up and recovery of vast amounts of data
Key Benefits
– Provided easy access and management for vast amounts of data
– Ensured scalability needed to accommodate future growth
Oracle Environment
– Oracle Database Enterprise Edition
– Oracle9iAS Enterprise Edition
"We trust Oracle in its ability to run terabyte-class databases in clustered environments with high availability. And we're pleased to say that Oracle has not disappointed us."
-- Toru Suzuki, Project Manager, Dragon Genomics Center, Takara Bio Inc.
The Dragon Genomics Center of Takara Bio Inc., specializing in large-scale sequencing, is among the highest speed genome-analyzing centers in Asia.
Bioinformatics Center, Institute for Chemical Research, Kyoto University
The Bioinformatics Center, Institute for Chemical Research, Kyoto University is leading biotechnology research thanks to its comprehensive studies in various areas, including the life sciences, information sciences, chemistry and physics.
“In order to manage this massive amount of genetic information and to operate efficiently, it is essential to have a platform with paramount stability. Our web site receives accesses from all over the world continuously, 24 hours a day. In order to offer the latest information under such circumstances, performance is also an issue. In this sense, the Oracle Database was the most appropriate since it can handle this enormous amount of data in a fast and stable manner, 24 hours a day.”
– Professor and Director Minoru Kanehisa, Bioinformatics Center Institute for Chemical Research Kyoto University
4. Find Patterns and Insights
Oracle Data Mining
– Find relationships and clusters associated with healthy and diseased states
– Naïve Bayes, Adaptive Bayes Networks, Attribute Importance, Association Rules, K-Means, O-Cluster, SVM, NMF algorithms
– Data Mining for Java (DM4J) GUI wizards and results browser
Oracle Discoverer & Oracle OLAP
– Interactive query & drill-down
Statistical functions
– Perform basic statistics in Oracle, e.g. summary statistics (mean, stdev, median, quantiles), hypothesis testing, distribution fitting, correlations, linear regression
Oracle Text & Text Mining
– Classify & cluster documents relevant to area of interest
Table Functions
– Implement complex algorithms within the database
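The summary statistics named above (mean, standard deviation, median, quantiles) can be illustrated with Python's standard `statistics` module. This is only a sketch of what the measures compute on a small made-up sample; it says nothing about how the database implements its statistical functions, and the sample values and variable names are hypothetical.

```python
import statistics

# Hypothetical measurements, e.g. expression levels for one gene.
expression_levels = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(expression_levels),
    "median": statistics.median(expression_levels),
    # Population standard deviation; statistics.stdev gives the sample version.
    "population_stdev": statistics.pstdev(expression_levels),
    # Three cut points that split the data into four equal-sized groups.
    "quartiles": statistics.quantiles(expression_levels, n=4),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Running the same calculations inside the database, as the slide proposes, avoids exporting the data to a separate statistics tool first.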
4. Find Patterns and Insights
Deductive Analysis: Answer complex questions about the relationships in genomic, clinical and pharmacological data
Inductive Analysis: Finding relationships for classification, class discovery and prediction
Life Sciences data: Pharmacological databases; Proteomics Database; Clinical Databases; Functional Genomic Databases