Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be
incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in
making purchasing decision. The development, release, and timing of any features or functionality described for Oracle’s products
remains at the sole discretion of Oracle.
<Insert Picture Here>
Copyright © 2007 Oracle Corporation
Charlie BergerSr. Director Product Management, Data Mining Technologies & Life Sciences & Healthcare IndustriesOracle [email protected]
Oracle Platform for Life Sciences &
Healthcare and Oracle’s In-
Database Analytics
Copyright © 2007 Oracle Corporation
Agenda
• Oracle’s Platform for Life Sciences & Healthcare Overview
• Oracle's in-Database Data Mining and Statistical Functions
• Brief overview of in-database advanced analytics strategy and features
• Data mining examples in Healthcare and life sciences• Demonstrations
• Statistical functions• Fraud detection• Patient profiling• Text Mining
Data Warehousing
ETL
OLAP
Data Mining
Oracle 10Oracle 10gg DBDB
Statistics
Copyright © 2007 Oracle Corporation
Oracle Life & Health Sciences User Community
Copyright © 2007 Oracle Corporation
• Oracle Life Sciences User Group, Inc.
• 5+ years• Goal: Share best practices in
applying Oracle technology in life sciences & healthcare use cases
• Active User Group• 1-2 User Group Meetings per
each• Wiki web site
Copyright © 2007 Oracle Corporation
Oracle Life & Health Sciences User Community
Copyright © 2007 Oracle Corporation
Oracle Life & Health Science PlatformAccess distributed data
Gateways, External Tables, SQL Loader, Streams, Transparent Gateways, etc.
Integrate a variety of data typesRDF, XML DB, InterMedia, Text, Semantic, etc.
Manage vast quantities of dataRAC, ASM, Partitioning, Grid, etc.
Collaborate securelyCollaboration Suite, Oracle FilesOnline, Portal,
Security, etc.
Find patterns and insightsData Mining, Text Mining, Statistics,
Regular Expression Searches, etc.
GenomicsGenomics
ProteomicsProteomics
PathwaysPathways
CheminformaticsCheminformatics
ClinicalClinical
Copyright © 2007 Oracle Corporation
1. Access Distributed Data • SQL*Loader• Heterogeneous Transportable Tablespaces• Oracle Warehouse Builder• Merge Statement• Oracle Streams• Migration Toolkits• High Speed Import/Export• SRS Gateway• Migration Toolkit• Secure Enterprise Search
Flat files MySQL
Copyright © 2007 Oracle Corporation
Merge Statement
MATCHED THEN insert clauseWHEN NOT
WHENSKIP ( condition )
MERGE INTO table
USING table/view/subquery
ON ( condition )
WHEN MATCHED THEN update clause
Fast insert, update or conditional update/insert of records
Copyright © 2007 Oracle Corporation
Cross Platform Transportable Tablespaces• Copy database subsets
(Tablespaces) between databases• Operating system file copy for data• Managed transfer of metadata
(indexes, constraints, stats, etc.) between databases
• “Plug and go”
• High speed conversion between Big-Endian Little-Endian platforms
• Result: Extremely fast bulk data transport between databases
Copyright © 2007 Oracle Corporation
2. Integrate a Variety of Data Types• XML DB
• Unite XML content & SQL/relational data• LOBs
• Manage unstructured data e.g. BFILES, BLOBs, CLOBs, URIs
• Files(Oracle9iFS)• Central repository for structured & unstructured data
• Text• Index & fast query of text content
• interMedia• Manage audio, video & image data
• Network Data Model (Oracle Spatial)• Graph (arc node) relationships
• Extensible indexing• Manage & index complex scientific data
XMLXML
N
N
SO O
O
N
NN
N
O
H
H H
HHHH
H
H
Copyright © 2007 Oracle Corporation
Oracle Database 10g
interMedia• Ability to store wide range of image
types• Processing functionality
• Rotate/flip, brighten/darken using gamma processing, adjust contrast, change bit depth
• Access through SQL, Java & Web interfaces
• Restrict access via security roles• Conform to SQL/MM still image standard• Store images as columns
• Tight integration with annotations
Copyright © 2007 Oracle Corporation
Oracle Text
• Powerful text search and intelligent text management capabilities
• Fully integrated with the database• Text can be ASCII, HTML, XML, or
formatted (150+ formats supported)• Offers premier text search quality• Document Services such as themes,
gist, term highlighting and markup• Classification and clustering capabilities• Simply text applications development
via JDeveloper Wizards
Copyright © 2007 Oracle Corporation
Oracle RDF Data Model• W3C standard for the common data format• Based on triples (subject–predicate–object)
• Expression of well-defined & rich models of biological systems
• Everything has a URI• Annotating & sharing findings with others
• Object-relational implementation• Ontologies used to label the RDF tagged
elements• Subjects and objects are re-used• Links represent complete RDF triples• Applying logic to infer additional insights
Copyright © 2007 Oracle Corporation
•Real Application Clusters (RAC)•Provides high availability, performance and ease of scalability
•Grid Computing•Automated data and computational provisioning
•Automated Storage Management•Scheduler•Partitioning
•Divide and conquer•Oracle Data Guard
•Protect data from human or system failures•Oracle 10g Application Server
•Provide scalability for middle tier
3. Manage Vast Quantities of Data
Copyright © 2007 Oracle Corporation
4. Collaborate Securely•Oracle Collaboration Suite
• Integrated communications•Oracle 10gAS Portal
• Build personalized portals•Oracle Workflow
• Automate laboratory and business processes•Oracle 10gAS Files
• Enable content management and collaboration•Applications Express (formerly HTML DB)
• Develop and deploy database-centric Web applications•Virtual Private Database
• Different users have unique access privileges•Oracle Data Vault
• Solution for ensuring data is secure•Oracle Secure Backup
• Automated encrypted data to tape•Auditing
• Create audit trail to facilitate FDA compliance•Oracle 10gAS Web Services
• Standard way to collaborate through the Web
Copyright © 2007 Oracle Corporation
Oracle Files• Collaborate easily and
securely via workspaces• Groups of users can be
created with different project access privileges
• Protect your data from with role-based security
• Oracle Files supports HTTP/WebDAV, FTP, SMB, AFP, and NFS
• Stop sending/receiving email attachments
Copyright © 2007 Oracle Corporation
Data
Security & Privacy
Network
HealthcareWorker
Identify&
Authenticate
DiagnosisCoverage
Office Visit
Therapy
X-Ray
Enrollment
Lab Test
Rx Shot
Cert 973
Cert Child
Outpatient
Accesscontrol
Nurse
Doctor
Clerical
Employer
Privacy &integrity of
data
Comprehensiveauditing
Privacy &integrity of
communications
Copyright © 2007 Oracle Corporation
5. Discover Patterns and Insights
• Oracle Data Mining Option• In-Database Mining Find relationships and clusters
• Naïve Bayes, Support Vector Machine, Decision Tree, Attribute Importance, Association Rules, K-Means Clustering, O-Cluster, NMF algorithms
• Oracle BI EE• Interactive query & drill-down, Dashboards
• Oracle OLAP Option• In-Database OLAP
• Statistics• Perform statistics in Oracle
• For example, summary statistics, hypothesis tests, cross-tab statistics, distribution tests, correlations, linear regression
• Oracle Text• Search, index, classify and cluster documents
• IEEE Float support• Table Functions
• Implement complex algorithms within the database
Copyright © 2007 Oracle Corporation
11g Statistics & SQL AnalyticsFREE (Included in Oracle Database SE & EE)
• Ranking functions• rank, dense_rank, cume_dist, percent_rank, ntile
• Window Aggregate functions (moving and cumulative)
• Avg, sum, min, max, count, variance, stddev, first_value, last_value
• LAG/LEAD functions• Direct inter-row reference using offsets
• Reporting Aggregate functions• Sum, avg, min, max, variance, stddev, count,
ratio_to_report
• Statistical Aggregates• Correlation, linear regression family, covariance
• Linear regression• Fitting of an ordinary-least-squares regression line
to a set of number pairs. • Frequently combined with the COVAR_POP,
COVAR_SAMP, and CORR functions.
• Descriptive Statistics• average, standard deviation, variance, min, max, median
(via percentile_count), mode, group-by & roll-up• DBMS_STAT_FUNCS: summarizes numerical columns
of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values
• Correlations• Pearson’s correlation coefficients, Spearman's and
Kendall's (both nonparametric).
• Cross Tabs• Enhanced with % statistics: chi squared, phi coefficient,
Cramer's V, contingency coefficient, Cohen's kappa
• Hypothesis Testing• Student t-test , F-test, Binomial test, Wilcoxon Signed
Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
• Distribution Fitting• Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-
Squared Test, Normal, Uniform, Weibull, Exponential
• Pareto Analysis (documented)• 80:20 rule, cumulative results table
Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition
Copyright © 2007 Oracle Corporation
Copyright © 2007 Oracle Corporation
What is Data Mining?• Process of sifting through massive amounts
of data to find hidden patterns and discover new insights
• Data Mining can provide valuable results:• Identify factors more associated with a target attribute
(Attribute Importance)• Predict individual behavior (Classification)• Find profiles of targeted people or items
(Decision Trees)• Segment a population (Clustering)• Determine important relationships with the population
(Associations)• Find fraud or rare “events” (Anomaly Detection)
Copyright © 2007 Oracle Corporation
Data MiningOverview (Classification)
FunctionalRelationship:Y = F(X1, X2, …, Xm)
Cases
Name Income Age . . . . . . .Buy Product?1 =Yes, 0 =No
JonesSmithLeeRogers
30,00055,00025,00050,000
30672344
1100
Model
Historic Data
CamposHornickHabersBerger
40,50037,00057,20095,600
52733234
New Data.85.74.93.65
Prediction Confidence
1001
????
Input Attributes Target
Copyright © 2007 Oracle Corporation
Data Mining ProvidesBetter Information, Valuable Insights and Predictions
Inco
me
Customer Months
Cell Phone Churners vs. Loyal Customers
InsightSegment #1: IF CUST_MO > 14 AND INCOME < $90K, THEN Prediction = Cell Phone Churner, Confidence = 100%, Support = 8/39
Segment #3: IF CUST_MO > 7 AND INCOME < $175K, THEN Prediction = Cell Phone Churner, Confidence = 83%, Support = 6/39
Copyright © 2007 Oracle Corporation
ExampleSimple, Predictive SQL
• Select patients who are more than 60% likely to become very sick with Lymphoma and display their Treatment_Plan
SELECT * from(SELECT A.I_D, A.TREATMENT_PLAN,
PREDICTION_PROBABILITY(LYMPHOMA70788_DT, 1 USING A.*) prob
FROM CBERGER.LYMPHOMA A)WHERE prob > 0.6;
Copyright © 2007 Oracle Corporation
In-Database AnalyticsAdvantages
• Data remains in the database at all times…with appropriate access security control mechanisms—fewer moving parts
• Straightforward inclusion within interesting and arbitrarily complex queries
• Real-world scalability—available for mission critical appls• Enabling pipelining of results without costly materialization• Scalable & Performant
• Fast scoring: • 2.5 million records scored in 6 seconds on a single CPU system
• Real-time scoring: • 100 models on a single CPU: 0.085 seconds
Data Warehousing
ETL
OLAP
Data Mining
Oracle 10Oracle 10gg DBDB
Statistics
Copyright © 2007 Oracle Corporation
Oracle Data MiningAlgorithm Summary 11g
Classification
Association Rules
Clustering
Attribute Importance
Problem Algorithm ApplicabilityClassical statistical techniquePopular / Rules / transparencyEmbedded appWide / narrow data / text
Minimum Description Length (MDL)
Attribute reductionIdentify useful dataReduce data noise
Hierarchical K-Means
Hierarchical O-Cluster
Product groupingText miningGene and protein analysis
AprioriMarket basket analysisLink analysis
Multiple Regression (GLM)Support Vector Machine
Classical statistical techniqueWide / narrow data / text
Regression
Feature Extraction NMFText analysisFeature reduction
Logistic Regression (GLM)Decision TreesNaïve Bayes Support Vector Machine
One Class SVM Lack examplesAnomaly Detection
Copyright © 2007 Oracle Corporation
Oracle Data Mining 11gOracle in-Database Mining Engine
• In-Database PL/SQL API & Java (JDM) APIs• Develop advanced analytical applications
• Oracle Data Miner (GUI)• Simplified, guided data mining
• Spreadsheet Add-In for Predictive Analytics• “1-click data mining” from a spreadsheet
• Wide range of algorithms• Anomaly detection • Attribute importance• Association rules • Clustering • Classification & regression• Nonnegative matrix factorization• Multiple and logistic regression • Structured & unstructured data (text mining)
Data Warehousing
ETL
OLAP
Data Mining
Oracle 11Oracle 11gg DBDB
Statistics
Copyright © 2007 Oracle Corporation
Oracle Data MiningDecision Trees• Classification, Prediction, Patient “profiling”
>45 <45
No Infection Infection <=35>35
<100 M>100 >4
Age
Risk = 0 Risk = 1 Risk = 1 Risk = 0
<=4
Risk = 1
F
Risk = 0
Age
Status
Temp Gender Days ICU
Simple DM model:Can include unstructured data (e.g. text comments), transactions data (e.g. purchases), etc.
IF (Age > 45 AND Status = Infection AND Temp = >100) THEN P(High Risk=1) = .77 Support = 250
Copyright © 2007 Oracle Corporation
Oracle Data Mining Anomaly Detection
• “One-Class” SVM Models• Fraud, noncompliance• Outlier detection • Network intrusion detection • Disease outbreaks• Rare events, true novelty
Problem: Detect rare cases
Copyright © 2007 Oracle Corporation
Spreadsheet Add-In for Predictive Analytics
• Enables Excel users to “mine”Oracle or Excel data using “one click” Predict and Explain predictive analytics features
• Users select a table or view, or point to data in Excel, and select a target attribute
Copyright © 2007 Oracle Corporation
PROFILE
•PROFILE finds targeted customer segments, size and confidence and “rules”
Copyright © 2007 Oracle Corporation
Oracle Data Mining & Oracle Text
• Oracle Data Mining mines “text” to build classification and clustering models
• Oracle Text (included in Oracle Database Standard Edition)preprocesses unstructured text
• Handles large volumes of “documents” or text
Copyright © 2007 Oracle Corporation
Deductive Analysis
Inductive Analysis
Answer complex questions about the
relationships in genomic, clinical and
pharmacological data
Finding relationships for classification,
class discovery and prediction
Life & Health Sciences data
Pharmacological databases
Proteomics Database
Clinical Databases
5. Discover Patterns and Insights
Functional Genomic
Databases
Copyright © 2007 Oracle Corporation
Biological/ Clinical
Experiments
InstrumentsData Pre-
ProcessingInterpretation
of Results
New PaperNew Drug
New TreatmentNew DB Entries
Life Science Discovery Phases:• Exploratory/Prototype Analysis
• Application Development
• Production System
Analytical Pipelines
Analytical Algorithms
FilesFilesFilesFiles
Perl Scripts
Perl Scripts AlgorithmsAlgorithms
Files
DB Files Files
AlgorithmsPerl
ScriptsFiles
DBFiles
Oracle Life Sciences Platform
C A T G0 0 1 0 1
Copyright © 2007 Oracle Corporation
Walter Reed Medical Center• Improving clinical outcomes
Copyright © 2007 Oracle Corporation
Analytics vs. 1. In-Database Analytics Engine
Basic Statistics (Free)Data MiningText Mining
2. Development Platform
Java (standard)
SQL (standard)
J2EE (standard)
3. Costs (ODM: $20K cpu)Simplified environmentSingle serverSecurity
1. External Analytical EngineBasic StatisticsData MiningText Mining (separate: SAS EM for Text)
Advanced Statistics
2. Development Platform
SAS Code (proprietary)
3. Costs (SAS EM: $150K/5 users)Annual Renewal Fee
(~40% each year)
Data Warehousing
ETL
OLAP
Data Mining
Oracle 10Oracle 10gg DBDB
Statistics
<Insert Picture Here>
Copyright © 2007 Oracle Corporation
Oracle Data Mining
D E M O N S T R A T I O N S
1. Oracle Statistical Functions
2. Oracle Data Miner (gui)
3. ODMr Code Generation feature
4. Oracle Spread-sheet Add-in for Predictive Analytics
5. Oracle Text Mining
Copyright © 2007 Oracle Corporation
Descriptive Statistics
• MEDIAN & MODE• Median: takes numeric or datetype values and returns the middle
value• Mode: returns the most common value
A. SELECT STATS_MODE(EDUCATION) from CD_BUYERS;
B. SELECT MEDIAN(ANNUAL_INCOME) from CD_BUYERS;
C. SELECT EDUCATION, MEDIAN(ANNUAL_INCOME) from CD_BUYERS GROUP BY EDUCATION;
D. SELECT EDUCATION, MEDIAN(ANNUAL_INCOME) from CD_BUYERS GROUP BY EDUCATION ORDER BY MEDIAN(ANNUAL_INCOME) ASC;
> SQL
Copyright © 2007 Oracle Corporation
One-Sample T-Test
STATS_T_TEST_* The t-test functions are:STATS_T_TEST_ONE: A one-sample t-testSTATS_T_TEST_PAIRED: A two-sample, paired t-test (also known as
a crossed t-test)STATS_T_TEST_INDEP: A t-test of two independent groups with the
same variance (pooled variances)STATS_T_TEST_INDEPU: A t-test of two independent groups with
unequal variance (unpooled variances)
http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/functions157.htm
Copyright © 2007 Oracle Corporation
Independent Samples T-Test
Query compares the mean of AMOUNT_SOLD between MEN and WOMEN within CUST_INCOME_LEVEL ranges Comparison results grouped
by gender with statistical level of significance
Copyright © 2007 Oracle Corporation
Customer Example
"..Our experience suggests that Oracle 10g Statistics and Data Mining features can reduce development effort of analytical systems by an order of magnitude."
Sumeet Muju Senior Member of Professional Staff, SRA International (SRA supports NIH bioinformatics
development projects)
<Insert Picture Here>
Copyright © 2007 Oracle Corporation
Oracle Data Mining D E M O N S T R A T I O N
Data Warehousing
ETL
OLAP
Data Mining
Oracle 11Oracle 11gg DBDB
Statistics
Copyright © 2007 Oracle Corporation
Oracle Data Mining Oracle Data Mining provides summary statistical information prior to data mining
Copyright © 2007 Oracle Corporation
Oracle Data Mining
Oracle Data Mining’s Activity Guides simplify & automate data mining for business users
Oracle Data Mining provides model performance and evaluation viewers
Copyright © 2007 Oracle Corporation
Oracle Data Mining
Additional model evaluation viewersAdditional model evaluation viewers
Apply model viewers
<Insert Picture Here>
Copyright © 2007 Oracle Corporation
Oracle Data Miner Code Generation
Data Warehousing
ETL
OLAP
Data Mining
Oracle 11Oracle 11gg DBDB
Statistics
Copyright © 2007 Oracle Corporation
Oracle Data Miner (gui)Code Generation
• PL/SQL code generation for Mining Activities
Copyright © 2007 Oracle Corporation
Oracle Data Miner (gui) Code Generation
Copyright © 2007 Oracle Corporation
Oracle Data Miner (gui) Code Generation
Copyright © 2007 Oracle Corporation
Oracle Data Miner (gui) Code Generation
Copyright © 2007 Oracle Corporation
Oracle Data Miner (gui) Code Generation
<Insert Picture Here>
Copyright © 2007 Oracle Corporation
Oracle Business Intelligence EE
Data Warehousing
ETL
OLAP
Data Mining
Oracle 11Oracle 11gg DBDB
Statistics
Copyright © 2007 Oracle Corporation
Oracle Data Mining results available to Oracle BI EE administrators
Oracle BI EE defines results for end user presentation
Integration with Oracle BI EE
Copyright © 2007 Oracle Corporation
Likelihood to buy
Integration with Oracle BI EE
Create Customers Categories
Oracle
Copyright © 2007 Oracle Corporation
Integration with Oracle BI EE ODM provides likelihood of fraud and other important questions.
<Insert Picture Here>
Copyright © 2007 Oracle Corporation
PartnersData Warehousing
ETL
OLAP
Data Mining
Oracle 11Oracle 11gg DBDB
Statistics
Copyright © 2007 Oracle Corporation
SPSS Clementine• NASDAQ-listed, top 25
software company• 35+ year heritage in
analytic technologies• Operations in over 60
countries• 95+% of FORTUNE
1000 are SPSS customers• Combine SPSS Clementine
ease of use with ODM in-Database functionality & scalability
• Build, store, browse and score models in the Database for optimal performance
• For more information :• SPSS – Mike Bittner, Strategic Alliance Manager, 770.329.3870 or [email protected]• Oracle – Alan Manewitz, TBU, (925) 984-9910 or [email protected]• Oracle – Charlie Berger, Product Management (781) 744-0324 or [email protected]
Copyright © 2007 Oracle Corporation
Oracle Data Sources
Data Mining
Preprocess
Statistics
Text
OLAP
Scheduler
Oracle Functionalities:
Deploy the analytic workflow as an Oracle Portal
Oracle Decision Tree Model
InforSense -- A Single Optimized Environment for Real Time Business Analytics within the Database
SAS free analytics: leverage Oracle analyticsSQL free analytics: drag-drop application buildVisual analytics: interactive visualisation
Integrative analytics: unified analytical environmentAutomated analytics: deploy to Oracle Portal and BPEL
InforSenseService
Deploy the analytic workflow as a service embedding to BPEL, SFA, CRM
Interact with (visualize) data at any step in the workflow
Deployment
Copyright © 2007 Oracle Corporation
Find me any compound that looks like my current structure, and that has been tested on any assay in my company where the IC50>200nM, where I know that I have a unique patent position, and hasn't been published in any journal?
Find me any compound that looks like my current structure, and that has been tested on any assay in my company where the IC50>200nM, where I know that I have a unique patent position, and hasn't been published in any journal?
Oracle11gOracle11gOracle11g
Relational
Message
XML
Text
Image
Oracle’s Contribution to Life Sciences
select c.id, p.structure, from compound c, protein p, assay awhere a.compound_id = c.idand a.protein_id = p.idand a.company = “BIO_SYS”and a.IC50 > 200nM and similar_to(p.id, “protein kinase”)and not_published(p.id, “Medline”)and extract_value(value(p.id), ‘Dgene/Protein/Id’) = p.id
Copyright © 2007 Oracle Corporation
• Oracle 11g Enables you to:• Access distributed data• Integrate a variety of data types• Manage vast quantities of data• Collaborate securely• Find patterns and insights
• Oracle 11g is an ideal platform for health & life sciences
Oracle Life & Health Sciences Platform
<Insert Picture Here>
Copyright © 2007 Oracle Corporation
More Information:
Contact Information:Email: [email protected]
Oracle Life Sciences Platform•oracle.com/technology/industries/life_sciences/index.html
Oracle Data Mining 10g •oracle.com/technology/products/bi/odm/index.html
Oracle Statistical Functions•oracle.com/technology/products/bi/stats_fns/index.html
Q U E S T I O N SQ U E S T I O N SA N S W E R SA N S W E R S
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”