65
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”

“This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”

Page 2: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be

incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in

making purchasing decision. The development, release, and timing of any features or functionality described for Oracle’s products

remains at the sole discretion of Oracle.

Page 3: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

<Insert Picture Here>

Copyright © 2007 Oracle Corporation

Charlie BergerSr. Director Product Management, Data Mining Technologies & Life Sciences & Healthcare IndustriesOracle [email protected]

Oracle Platform for Life Sciences &

Healthcare and Oracle’s In-

Database Analytics

Page 4: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Agenda

• Oracle’s Platform for Life Sciences & Healthcare Overview

• Oracle's in-Database Data Mining and Statistical Functions

• Brief overview of in-database advanced analytics strategy and features

• Data mining examples in Healthcare and life sciences• Demonstrations

• Statistical functions• Fraud detection• Patient profiling• Text Mining

Data Warehousing

ETL

OLAP

Data Mining

Oracle 10Oracle 10gg DBDB

Statistics

Page 5: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Life & Health Sciences User Community

Page 6: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

• Oracle Life Sciences User Group, Inc.

• 5+ years• Goal: Share best practices in

applying Oracle technology in life sciences & healthcare use cases

• Active User Group• 1-2 User Group Meetings per

each• Wiki web site

Page 7: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Life & Health Sciences User Community

Page 8: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Life & Health Science PlatformAccess distributed data

Gateways, External Tables, SQL Loader, Streams, Transparent Gateways, etc.

Integrate a variety of data typesRDF, XML DB, InterMedia, Text, Semantic, etc.

Manage vast quantities of dataRAC, ASM, Partitioning, Grid, etc.

Collaborate securelyCollaboration Suite, Oracle FilesOnline, Portal,

Security, etc.

Find patterns and insightsData Mining, Text Mining, Statistics,

Regular Expression Searches, etc.

GenomicsGenomics

ProteomicsProteomics

PathwaysPathways

CheminformaticsCheminformatics

ClinicalClinical

Page 9: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

1. Access Distributed Data • SQL*Loader• Heterogeneous Transportable Tablespaces• Oracle Warehouse Builder• Merge Statement• Oracle Streams• Migration Toolkits• High Speed Import/Export• SRS Gateway• Migration Toolkit• Secure Enterprise Search

Flat files MySQL

Page 10: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Merge Statement

MATCHED THEN insert clauseWHEN NOT

WHENSKIP ( condition )

MERGE INTO table

USING table/view/subquery

ON ( condition )

WHEN MATCHED THEN update clause

Fast insert, update or conditional update/insert of records

Page 11: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Cross Platform Transportable Tablespaces• Copy database subsets

(Tablespaces) between databases• Operating system file copy for data• Managed transfer of metadata

(indexes, constraints, stats, etc.) between databases

• “Plug and go”

• High speed conversion between Big-Endian Little-Endian platforms

• Result: Extremely fast bulk data transport between databases

Page 12: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

2. Integrate a Variety of Data Types• XML DB

• Unite XML content & SQL/relational data• LOBs

• Manage unstructured data e.g. BFILES, BLOBs, CLOBs, URIs

• Files(Oracle9iFS)• Central repository for structured & unstructured data

• Text• Index & fast query of text content

• interMedia• Manage audio, video & image data

• Network Data Model (Oracle Spatial)• Graph (arc node) relationships

• Extensible indexing• Manage & index complex scientific data

XMLXML

N

N

SO O

O

N

NN

N

O

H

H H

HHHH

H

H

Page 13: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Database 10g

interMedia• Ability to store wide range of image

types• Processing functionality

• Rotate/flip, brighten/darken using gamma processing, adjust contrast, change bit depth

• Access through SQL, Java & Web interfaces

• Restrict access via security roles• Conform to SQL/MM still image standard• Store images as columns

• Tight integration with annotations

Page 14: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Text

• Powerful text search and intelligent text management capabilities

• Fully integrated with the database• Text can be ASCII, HTML, XML, or

formatted (150+ formats supported)• Offers premier text search quality• Document Services such as themes,

gist, term highlighting and markup• Classification and clustering capabilities• Simply text applications development

via JDeveloper Wizards

Page 15: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle RDF Data Model• W3C standard for the common data format• Based on triples (subject–predicate–object)

• Expression of well-defined & rich models of biological systems

• Everything has a URI• Annotating & sharing findings with others

• Object-relational implementation• Ontologies used to label the RDF tagged

elements• Subjects and objects are re-used• Links represent complete RDF triples• Applying logic to infer additional insights

Page 16: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

•Real Application Clusters (RAC)•Provides high availability, performance and ease of scalability

•Grid Computing•Automated data and computational provisioning

•Automated Storage Management•Scheduler•Partitioning

•Divide and conquer•Oracle Data Guard

•Protect data from human or system failures•Oracle 10g Application Server

•Provide scalability for middle tier

3. Manage Vast Quantities of Data

Page 17: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

4. Collaborate Securely•Oracle Collaboration Suite

• Integrated communications•Oracle 10gAS Portal

• Build personalized portals•Oracle Workflow

• Automate laboratory and business processes•Oracle 10gAS Files

• Enable content management and collaboration•Applications Express (formerly HTML DB)

• Develop and deploy database-centric Web applications•Virtual Private Database

• Different users have unique access privileges•Oracle Data Vault

• Solution for ensuring data is secure•Oracle Secure Backup

• Automated encrypted data to tape•Auditing

• Create audit trail to facilitate FDA compliance•Oracle 10gAS Web Services

• Standard way to collaborate through the Web

Page 18: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Files• Collaborate easily and

securely via workspaces• Groups of users can be

created with different project access privileges

• Protect your data from with role-based security

• Oracle Files supports HTTP/WebDAV, FTP, SMB, AFP, and NFS

• Stop sending/receiving email attachments

Page 19: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Data

Security & Privacy

Network

HealthcareWorker

Identify&

Authenticate

DiagnosisCoverage

Office Visit

Therapy

X-Ray

Enrollment

Lab Test

Rx Shot

Cert 973

Cert Child

Outpatient

Accesscontrol

Nurse

Doctor

Clerical

Employer

Privacy &integrity of

data

Comprehensiveauditing

Privacy &integrity of

communications

Page 20: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

5. Discover Patterns and Insights

• Oracle Data Mining Option• In-Database Mining Find relationships and clusters

• Naïve Bayes, Support Vector Machine, Decision Tree, Attribute Importance, Association Rules, K-Means Clustering, O-Cluster, NMF algorithms

• Oracle BI EE• Interactive query & drill-down, Dashboards

• Oracle OLAP Option• In-Database OLAP

• Statistics• Perform statistics in Oracle

• For example, summary statistics, hypothesis tests, cross-tab statistics, distribution tests, correlations, linear regression

• Oracle Text• Search, index, classify and cluster documents

• IEEE Float support• Table Functions

• Implement complex algorithms within the database

Page 21: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

11g Statistics & SQL AnalyticsFREE (Included in Oracle Database SE & EE)

• Ranking functions• rank, dense_rank, cume_dist, percent_rank, ntile

• Window Aggregate functions (moving and cumulative)

• Avg, sum, min, max, count, variance, stddev, first_value, last_value

• LAG/LEAD functions• Direct inter-row reference using offsets

• Reporting Aggregate functions• Sum, avg, min, max, variance, stddev, count,

ratio_to_report

• Statistical Aggregates• Correlation, linear regression family, covariance

• Linear regression• Fitting of an ordinary-least-squares regression line

to a set of number pairs. • Frequently combined with the COVAR_POP,

COVAR_SAMP, and CORR functions.

• Descriptive Statistics• average, standard deviation, variance, min, max, median

(via percentile_count), mode, group-by & roll-up• DBMS_STAT_FUNCS: summarizes numerical columns

of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values

• Correlations• Pearson’s correlation coefficients, Spearman's and

Kendall's (both nonparametric).

• Cross Tabs• Enhanced with % statistics: chi squared, phi coefficient,

Cramer's V, contingency coefficient, Cohen's kappa

• Hypothesis Testing• Student t-test , F-test, Binomial test, Wilcoxon Signed

Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA

• Distribution Fitting• Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-

Squared Test, Normal, Uniform, Weibull, Exponential

• Pareto Analysis (documented)• 80:20 rule, cumulative results table

Note: Statistics and SQL Analytics are included in Oracle Database Standard Edition

Page 22: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Page 23: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

What is Data Mining?• Process of sifting through massive amounts

of data to find hidden patterns and discover new insights

• Data Mining can provide valuable results:• Identify factors more associated with a target attribute

(Attribute Importance)• Predict individual behavior (Classification)• Find profiles of targeted people or items

(Decision Trees)• Segment a population (Clustering)• Determine important relationships with the population

(Associations)• Find fraud or rare “events” (Anomaly Detection)

Page 24: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Data MiningOverview (Classification)

FunctionalRelationship:Y = F(X1, X2, …, Xm)

Cases

Name Income Age . . . . . . .Buy Product?1 =Yes, 0 =No

JonesSmithLeeRogers

30,00055,00025,00050,000

30672344

1100

Model

Historic Data

CamposHornickHabersBerger

40,50037,00057,20095,600

52733234

New Data.85.74.93.65

Prediction Confidence

1001

????

Input Attributes Target

Page 25: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Data Mining ProvidesBetter Information, Valuable Insights and Predictions

Inco

me

Customer Months

Cell Phone Churners vs. Loyal Customers

InsightSegment #1: IF CUST_MO > 14 AND INCOME < $90K, THEN Prediction = Cell Phone Churner, Confidence = 100%, Support = 8/39

Segment #3: IF CUST_MO > 7 AND INCOME < $175K, THEN Prediction = Cell Phone Churner, Confidence = 83%, Support = 6/39

Page 26: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

ExampleSimple, Predictive SQL

• Select patients who are more than 60% likely to become very sick with Lymphoma and display their Treatment_Plan

SELECT * from(SELECT A.I_D, A.TREATMENT_PLAN,

PREDICTION_PROBABILITY(LYMPHOMA70788_DT, 1 USING A.*) prob

FROM CBERGER.LYMPHOMA A)WHERE prob > 0.6;

Page 27: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

In-Database AnalyticsAdvantages

• Data remains in the database at all times…with appropriate access security control mechanisms—fewer moving parts

• Straightforward inclusion within interesting and arbitrarily complex queries

• Real-world scalability—available for mission critical appls• Enabling pipelining of results without costly materialization• Scalable & Performant

• Fast scoring: • 2.5 million records scored in 6 seconds on a single CPU system

• Real-time scoring: • 100 models on a single CPU: 0.085 seconds

Data Warehousing

ETL

OLAP

Data Mining

Oracle 10Oracle 10gg DBDB

Statistics

Page 28: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data MiningAlgorithm Summary 11g

Classification

Association Rules

Clustering

Attribute Importance

Problem Algorithm ApplicabilityClassical statistical techniquePopular / Rules / transparencyEmbedded appWide / narrow data / text

Minimum Description Length (MDL)

Attribute reductionIdentify useful dataReduce data noise

Hierarchical K-Means

Hierarchical O-Cluster

Product groupingText miningGene and protein analysis

AprioriMarket basket analysisLink analysis

Multiple Regression (GLM)Support Vector Machine

Classical statistical techniqueWide / narrow data / text

Regression

Feature Extraction NMFText analysisFeature reduction

Logistic Regression (GLM)Decision TreesNaïve Bayes Support Vector Machine

One Class SVM Lack examplesAnomaly Detection

Page 29: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Mining 11gOracle in-Database Mining Engine

• In-Database PL/SQL API & Java (JDM) APIs• Develop advanced analytical applications

• Oracle Data Miner (GUI)• Simplified, guided data mining

• Spreadsheet Add-In for Predictive Analytics• “1-click data mining” from a spreadsheet

• Wide range of algorithms• Anomaly detection • Attribute importance• Association rules • Clustering • Classification & regression• Nonnegative matrix factorization• Multiple and logistic regression • Structured & unstructured data (text mining)

Data Warehousing

ETL

OLAP

Data Mining

Oracle 11Oracle 11gg DBDB

Statistics

Page 30: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data MiningDecision Trees• Classification, Prediction, Patient “profiling”

>45 <45

No Infection Infection <=35>35

<100 M>100 >4

Age

Risk = 0 Risk = 1 Risk = 1 Risk = 0

<=4

Risk = 1

F

Risk = 0

Age

Status

Temp Gender Days ICU

Simple DM model:Can include unstructured data (e.g. text comments), transactions data (e.g. purchases), etc.

IF (Age > 45 AND Status = Infection AND Temp = >100) THEN P(High Risk=1) = .77 Support = 250

Page 31: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Mining Anomaly Detection

• “One-Class” SVM Models• Fraud, noncompliance• Outlier detection • Network intrusion detection • Disease outbreaks• Rare events, true novelty

Problem: Detect rare cases

Page 32: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Spreadsheet Add-In for Predictive Analytics

• Enables Excel users to “mine”Oracle or Excel data using “one click” Predict and Explain predictive analytics features

• Users select a table or view, or point to data in Excel, and select a target attribute

Page 33: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

PROFILE

•PROFILE finds targeted customer segments, size and confidence and “rules”

Page 34: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Mining & Oracle Text

• Oracle Data Mining mines “text” to build classification and clustering models

• Oracle Text (included in Oracle Database Standard Edition)preprocesses unstructured text

• Handles large volumes of “documents” or text

Page 35: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Deductive Analysis

Inductive Analysis

Answer complex questions about the

relationships in genomic, clinical and

pharmacological data

Finding relationships for classification,

class discovery and prediction

Life & Health Sciences data

Pharmacological databases

Proteomics Database

Clinical Databases

5. Discover Patterns and Insights

Functional Genomic

Databases

Page 36: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Biological/ Clinical

Experiments

InstrumentsData Pre-

ProcessingInterpretation

of Results

New PaperNew Drug

New TreatmentNew DB Entries

Life Science Discovery Phases:• Exploratory/Prototype Analysis

• Application Development

• Production System

Analytical Pipelines

Analytical Algorithms

FilesFilesFilesFiles

Perl Scripts

Perl Scripts AlgorithmsAlgorithms

Files

DB Files Files

AlgorithmsPerl

ScriptsFiles

DBFiles

Oracle Life Sciences Platform

C A T G0 0 1 0 1

Page 37: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Walter Reed Medical Center• Improving clinical outcomes

Page 38: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Analytics vs. 1. In-Database Analytics Engine

Basic Statistics (Free)Data MiningText Mining

2. Development Platform

Java (standard)

SQL (standard)

J2EE (standard)

3. Costs (ODM: $20K cpu)Simplified environmentSingle serverSecurity

1. External Analytical EngineBasic StatisticsData MiningText Mining (separate: SAS EM for Text)

Advanced Statistics

2. Development Platform

SAS Code (proprietary)

3. Costs (SAS EM: $150K/5 users)Annual Renewal Fee

(~40% each year)

Data Warehousing

ETL

OLAP

Data Mining

Oracle 10Oracle 10gg DBDB

Statistics

Page 39: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

<Insert Picture Here>

Copyright © 2007 Oracle Corporation

Oracle Data Mining

D E M O N S T R A T I O N S

1. Oracle Statistical Functions

2. Oracle Data Miner (gui)

3. ODMr Code Generation feature

4. Oracle Spread-sheet Add-in for Predictive Analytics

5. Oracle Text Mining

Page 40: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Descriptive Statistics

• MEDIAN & MODE• Median: takes numeric or datetype values and returns the middle

value• Mode: returns the most common value

A. SELECT STATS_MODE(EDUCATION) from CD_BUYERS;

B. SELECT MEDIAN(ANNUAL_INCOME) from CD_BUYERS;

C. SELECT EDUCATION, MEDIAN(ANNUAL_INCOME) from CD_BUYERS GROUP BY EDUCATION;

D. SELECT EDUCATION, MEDIAN(ANNUAL_INCOME) from CD_BUYERS GROUP BY EDUCATION ORDER BY MEDIAN(ANNUAL_INCOME) ASC;

> SQL

Page 41: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

One-Sample T-Test

STATS_T_TEST_* The t-test functions are:STATS_T_TEST_ONE: A one-sample t-testSTATS_T_TEST_PAIRED: A two-sample, paired t-test (also known as

a crossed t-test)STATS_T_TEST_INDEP: A t-test of two independent groups with the

same variance (pooled variances)STATS_T_TEST_INDEPU: A t-test of two independent groups with

unequal variance (unpooled variances)

http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/functions157.htm

Page 42: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Independent Samples T-Test

Query compares the mean of AMOUNT_SOLD between MEN and WOMEN within CUST_INCOME_LEVEL ranges Comparison results grouped

by gender with statistical level of significance

Page 43: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Customer Example

"..Our experience suggests that Oracle 10g Statistics and Data Mining features can reduce development effort of analytical systems by an order of magnitude."

Sumeet Muju Senior Member of Professional Staff, SRA International (SRA supports NIH bioinformatics

development projects)

Page 44: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

<Insert Picture Here>

Copyright © 2007 Oracle Corporation

Oracle Data Mining D E M O N S T R A T I O N

Data Warehousing

ETL

OLAP

Data Mining

Oracle 11Oracle 11gg DBDB

Statistics

Page 45: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Mining Oracle Data Mining provides summary statistical information prior to data mining

Page 46: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Mining

Oracle Data Mining’s Activity Guides simplify & automate data mining for business users

Oracle Data Mining provides model performance and evaluation viewers

Page 47: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Mining

Additional model evaluation viewersAdditional model evaluation viewers

Apply model viewers

Page 48: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

<Insert Picture Here>

Copyright © 2007 Oracle Corporation

Oracle Data Miner Code Generation

Data Warehousing

ETL

OLAP

Data Mining

Oracle 11Oracle 11gg DBDB

Statistics

Page 49: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Miner (gui)Code Generation

• PL/SQL code generation for Mining Activities

Page 50: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Miner (gui) Code Generation

Page 51: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Miner (gui) Code Generation

Page 52: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Miner (gui) Code Generation

Page 53: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Miner (gui) Code Generation

Page 54: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

<Insert Picture Here>

Copyright © 2007 Oracle Corporation

Oracle Business Intelligence EE

Data Warehousing

ETL

OLAP

Data Mining

Oracle 11Oracle 11gg DBDB

Statistics

Page 55: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Mining results available to Oracle BI EE administrators

Oracle BI EE defines results for end user presentation

Integration with Oracle BI EE

Page 56: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Likelihood to buy

Integration with Oracle BI EE

Create Customers Categories

Oracle

Page 57: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Integration with Oracle BI EE ODM provides likelihood of fraud and other important questions.

Page 58: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

<Insert Picture Here>

Copyright © 2007 Oracle Corporation

PartnersData Warehousing

ETL

OLAP

Data Mining

Oracle 11Oracle 11gg DBDB

Statistics

Page 59: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

SPSS Clementine• NASDAQ-listed, top 25

software company• 35+ year heritage in

analytic technologies• Operations in over 60

countries• 95+% of FORTUNE

1000 are SPSS customers• Combine SPSS Clementine

ease of use with ODM in-Database functionality & scalability

• Build, store, browse and score models in the Database for optimal performance

• For more information :• SPSS – Mike Bittner, Strategic Alliance Manager, 770.329.3870 or [email protected]• Oracle – Alan Manewitz, TBU, (925) 984-9910 or [email protected]• Oracle – Charlie Berger, Product Management (781) 744-0324 or [email protected]

Page 60: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Oracle Data Sources

Data Mining

Preprocess

Statistics

Text

OLAP

Scheduler

Oracle Functionalities:

Deploy the analytic workflow as an Oracle Portal

Oracle Decision Tree Model

InforSense -- A Single Optimized Environment for Real Time Business Analytics within the Database

SAS free analytics: leverage Oracle analyticsSQL free analytics: drag-drop application buildVisual analytics: interactive visualisation

Integrative analytics: unified analytical environmentAutomated analytics: deploy to Oracle Portal and BPEL

InforSenseService

Deploy the analytic workflow as a service embedding to BPEL, SFA, CRM

Interact with (visualize) data at any step in the workflow

Deployment

Page 61: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

Find me any compound that looks like my current structure, and that has been tested on any assay in my company where the IC50>200nM, where I know that I have a unique patent position, and hasn't been published in any journal?

Find me any compound that looks like my current structure, and that has been tested on any assay in my company where the IC50>200nM, where I know that I have a unique patent position, and hasn't been published in any journal?

Oracle11gOracle11gOracle11g

Relational

Message

XML

Text

Image

Oracle’s Contribution to Life Sciences

select c.id, p.structure, from compound c, protein p, assay awhere a.compound_id = c.idand a.protein_id = p.idand a.company = “BIO_SYS”and a.IC50 > 200nM and similar_to(p.id, “protein kinase”)and not_published(p.id, “Medline”)and extract_value(value(p.id), ‘Dgene/Protein/Id’) = p.id

Page 62: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Copyright © 2007 Oracle Corporation

• Oracle 11g Enables you to:• Access distributed data• Integrate a variety of data types• Manage vast quantities of data• Collaborate securely• Find patterns and insights

• Oracle 11g is an ideal platform for health & life sciences

Oracle Life & Health Sciences Platform

Page 63: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

<Insert Picture Here>

Copyright © 2007 Oracle Corporation

More Information:

Contact Information:Email: [email protected]

Oracle Life Sciences Platform•oracle.com/technology/industries/life_sciences/index.html

Oracle Data Mining 10g •oracle.com/technology/products/bi/odm/index.html

Oracle Statistical Functions•oracle.com/technology/products/bi/stats_fns/index.html

Page 64: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

Q U E S T I O N SQ U E S T I O N SA N S W E R SA N S W E R S

Page 65: “This presentation is for informational purposes only and ...insideinformatics.cambridgesoft.com/webinars/pdf/oracle_2007.pdf · “This presentation is for informational purposes

“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”