Page 1

Oracle BIWA Summit 2015

Personalized Healthcare using large genomic datasets and Oracle Exadata

René Kuipers, Principal Consultant VX Company

Page 2

It’s all in the genes

Personalized Healthcare using large genomic datasets and Oracle Exadata

Page 3

About me

–  Principal Consultant, Data and BI Solutions
–  Data warehouse architect
–  Business Intelligence specialist
–  Master’s degree in Biochemistry (molecular biology, cancer genetics)

Page 4

Agenda

–  Basic genetics: analyses
–  The technology behind this
–  What does it look like?
–  The next step: combining genomic data with patient data
–  When both worlds meet

Page 5

BASIC GENETICS

Set the context

Page 7

Chromosomes

Page 8

Genes

Page 9

DETERMINING THE GENETIC SEQUENCE

basic genetics

Page 10

Genetic sequence

–  Blood / cancer tissue
–  DNA isolation
–  DNA amplification
–  DNA sequencing (40x–80x coverage)
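The 40x–80x figure is the sequencing depth: on average each base of the genome is read 40 to 80 times. A minimal Python sketch of the standard mean-coverage relation (coverage = number of reads × read length / genome size) follows. The 100-base read length in the usage lines is an assumption for illustration, not a figure from the talk.

```python
import math

# Approximate genome size used on the slides: ~3 billion base pairs.
GENOME_SIZE_BP = 3_000_000_000


def mean_coverage(num_reads: int, read_length: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Average number of times each base is covered: reads * read length / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: int, read_length: int, genome_size: int = GENOME_SIZE_BP) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Usage, assuming (hypothetically) reads of 100 bases each.
READ_LENGTH = 100
for depth in (40, 80):
    print(f"{depth}x needs {reads_needed(depth, READ_LENGTH):,} reads of {READ_LENGTH} bases")
```

Doubling the depth doubles the number of reads (and roughly the raw data), which is one reason the per-sample volumes quoted later are so large.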

Page 11

Genetic sequence

–  Approx. 5% of DNA consists of genes
–  Approx. 95% of DNA is referred to as ‘junk DNA’
–  99% of the entire DNA sequence is stable
–  Genetic variations are normal

Page 13

DNA (Next Generation) Sequencing
–  From blood sample to DNA sequence
–  3 billion base pairs
–  2 TB per sample
–  Unique: whole genomes

Page 14

Abnormal genetic variations

Page 15

Searching for the unknown
–  Genetic variations: normal genetic variations versus cancer
–  Better diagnoses require better analyses.
–  Upfront (predictive) diagnoses require a lot of data and processing power.
–  Result: less-invasive treatment and a better life for the patient.
–  What did we not know (yet), and what can be learned from it?
–  Ultimate goal: a centralized DNA library for statistical purposes

Page 16

THE TECHNOLOGY BEHIND THIS

Page 17

DNA (Next Generation) Sequencing

–  3 billion base pairs
–  2 TB per sample
–  Whole genomes

Page 18

Handling large volumes

Oracle Database
–  Partitioning
–  Optimized data model

Oracle Exadata Database Machine
–  Optimized to run Oracle Database
–  Specific performance features
   -  Smart Scans
   -  Exadata Hybrid Columnar Compression

Performance increase: 700x
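Partitioning lets the database skip data a query does not need. In this talk the tables are partitioned by sample, so a query for one sample only has to read that sample's partition (partition pruning). The Python toy model below illustrates the idea; the class and method names are hypothetical, and this is not how Oracle implements partitioning internally.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

Row = Tuple[int, str]  # (position, genotype)


class SamplePartitionedTable:
    """Toy table partitioned by sample id (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # One list of rows per sample plays the role of a partition.
        self._partitions: DefaultDict[str, List[Row]] = defaultdict(list)

    def insert(self, sample_id: str, position: int, genotype: str) -> None:
        # Route each row to the partition of its sample.
        self._partitions[sample_id].append((position, genotype))

    def scan(self, sample_id: Optional[str] = None) -> Tuple[List[Row], int]:
        """Return (rows, rows_read); a sample filter prunes all other partitions."""
        if sample_id is not None:
            partition = self._partitions.get(sample_id, [])
            return list(partition), len(partition)
        rows = [row for partition in self._partitions.values() for row in partition]
        return rows, len(rows)


# Usage: three samples with 1,000 positions each.
table = SamplePartitionedTable()
for sample in ("S1", "S2", "S3"):
    for pos in range(1000):
        table.insert(sample, pos, "A/A")

_, read_pruned = table.scan("S2")  # only partition S2 is read
_, read_full = table.scan()        # every partition is read
print(read_pruned, read_full)      # 1000 3000
```

The work for a single-sample query stays constant as more samples are loaded, which is what makes a sample-oriented layout attractive for a growing DNA library.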

Page 19

Handling large volumes: database benefits

Datamodel V1
–  Sample-oriented (partitioned)
–  Each base position stored (compared to the reference genome)
   -  leads to 95% no-calls
–  206 samples → 800 GB
   -  max. 2500 samples on Exadata
–  Indexes are (still) needed: index size is 5x larger than the sample size
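The V1 figures can be turned into a back-of-the-envelope estimate. The sketch below derives the storage per sample from the slide (800 GB for 206 samples) and extrapolates to the stated 2500-sample maximum. It assumes that storage grows linearly with the number of samples, that the 800 GB excludes indexes, and that "5x larger" means index size = 5 × data size; all three are assumptions for illustration, not figures from the talk.

```python
# Figures from the Datamodel V1 slide.
SAMPLES = 206
STORAGE_GB = 800
MAX_SAMPLES = 2500
INDEX_FACTOR = 5  # "Index size 5x larger than sample-size" (read as 5 x data size)

per_sample_gb = STORAGE_GB / SAMPLES

# Assumption: linear growth, and the 800 GB above is table data only.
data_at_max_gb = per_sample_gb * MAX_SAMPLES
index_at_max_gb = data_at_max_gb * INDEX_FACTOR

print(f"per sample:             {per_sample_gb:.2f} GB")
print(f"data at {MAX_SAMPLES} samples:  {data_at_max_gb / 1000:.1f} TB")
print(f"index at {MAX_SAMPLES} samples: {index_at_max_gb / 1000:.1f} TB")
```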

Page 20

Handling large volumes: database benefits

Datamodel V2
–  Sample-oriented (partitioned)
–  Positions are stored as regions (buckets)
   -  1000 positions per region
–  Buckets are indexed
–  EHCC compression
–  Reduce redundant data
   -  Store allele 1 and 2 as 1 row when values are equal
–  Storage: 99 GB (246 samples)
   -  Up to 20,000 samples
–  Indexes require less space than in Datamodel V1
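The two main ideas of Datamodel V2, grouping positions into 1000-position regions and storing a genotype as a single row when both alleles are equal, can be sketched in Python as follows. The row layout, the field names, the `index_by_region` helper and the convention `allele_no = 0` for a collapsed call are hypothetical; the talk does not show the actual table definition.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

REGION_SIZE = 1000  # "1000 positions per region"


@dataclass(frozen=True)
class GenotypeRow:
    """Hypothetical V2 row: one allele, or both alleles collapsed into one row."""
    sample_id: str
    chromosome: str
    region_id: int  # bucket number, used as the indexed key
    position: int
    allele_no: int  # 1 or 2; 0 means "both alleles equal, stored once"
    base: str


def to_rows(sample_id: str, chromosome: str,
            calls: Iterable[Tuple[int, str, str]]) -> List[GenotypeRow]:
    """Convert (position, allele1, allele2) calls into bucketed rows."""
    rows: List[GenotypeRow] = []
    for position, allele1, allele2 in calls:
        region_id = position // REGION_SIZE
        if allele1 == allele2:
            # Equal alleles: one row instead of two reduces redundant data.
            rows.append(GenotypeRow(sample_id, chromosome, region_id, position, 0, allele1))
        else:
            rows.append(GenotypeRow(sample_id, chromosome, region_id, position, 1, allele1))
            rows.append(GenotypeRow(sample_id, chromosome, region_id, position, 2, allele2))
    return rows


def index_by_region(rows: Iterable[GenotypeRow]) -> Dict[int, List[GenotypeRow]]:
    """Toy stand-in for the bucket index: region_id -> rows in that region."""
    index: Dict[int, List[GenotypeRow]] = defaultdict(list)
    for row in rows:
        index[row.region_id].append(row)
    return index


# Usage: position 999 falls in region 0, positions 1000 and 1001 in region 1.
rows = to_rows("S1", "chr1", [(999, "A", "A"), (1000, "C", "T"), (1001, "G", "G")])
index = index_by_region(rows)
print(len(rows), sorted(index), len(index[1]))
```

Indexing one key per region rather than one per position is what keeps the index small. By the slides' own figures, V2 needs roughly 0.40 GB per sample (99 GB / 246 samples) against roughly 3.9 GB per sample in V1 (800 GB / 206 samples), close to a tenfold reduction.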

Page 21

Exadata benefits

–  Flash
–  Parallel processing
–  Smart Scans
–  Exadata Hybrid Columnar Compression

Let’s have a look… (videos courtesy of Frits Hoogland)
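Smart Scans offload row filtering and column projection from the database servers to the Exadata storage cells, so only the matching rows and the requested columns travel back. The toy Python model below contrasts the two paths by counting the rows shipped from "storage"; all names are hypothetical, and the model ignores blocks, caching and parallelism.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]


def conventional_scan(cell_rows: Sequence[Row], predicate: Callable[[Row], bool],
                      columns: Sequence[str]) -> Tuple[List[Row], int]:
    """Storage returns every row; the database server filters and projects."""
    shipped = list(cell_rows)
    result = [{c: r[c] for c in columns} for r in shipped if predicate(r)]
    return result, len(shipped)


def smart_scan(cell_rows: Sequence[Row], predicate: Callable[[Row], bool],
               columns: Sequence[str]) -> Tuple[List[Row], int]:
    """Filtering and projection happen in the storage cell; only matches are shipped."""
    shipped = [{c: r[c] for c in columns} for r in cell_rows if predicate(r)]
    return shipped, len(shipped)


def wanted(row: Row) -> bool:
    # Hypothetical filter: all rows of one sample.
    return row["sample_id"] == "S007"


# Usage: 10,000 hypothetical genotype rows spread over 100 samples.
rows: List[Row] = [
    {"sample_id": f"S{i % 100:03d}", "position": i, "genotype": "A/A"}
    for i in range(10_000)
]
conv_result, conv_shipped = conventional_scan(rows, wanted, ["position", "genotype"])
smart_result, smart_shipped = smart_scan(rows, wanted, ["position", "genotype"])
print(conv_shipped, smart_shipped, conv_result == smart_result)
```

Both paths return the same answer; the difference is how much data has to cross from storage to the database server.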

Page 22

Executed tests

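Nr  Exadata features  Parallel  Disk type
1   -                 Serial    HDD
2   -                 Serial    FDD
3   -                 64        HDD
4   -                 64        FDD
5   SS                Serial    HDD
6   SS                Serial    FDD
7   SS                64        HDD
8   SS                64        FDD
9   SS + EHCC         64        FDD

SS = Smart Scans; EHCC = Exadata Hybrid Columnar Compression; HDD = hard disk; FDD = flash disk; 64 = degree of parallelism 64.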

Page 34

Query performance (times in seconds)

Nr  Exadata features  Parallel  Disk type  11.2.0.1 / 11.2.0.2
1   -                 Serial    HDD        695 / 153
2   -                 Serial    FDD        403 / 91
3   -                 64        HDD        19 / 18
4   -                 64        FDD        16 / 13
5   SS                Serial    HDD        41
6   SS                Serial    FDD        37
7   SS                64        HDD        13
8   SS                64        FDD        6
9   SS + EHCC         64        FDD        1
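The speedups in this table can be computed relative to test 1 on 11.2.0.1 (695 s). The sketch below takes the lowest reported time for each test. For tests 5–9 the slide gives a single time without saying which release it belongs to, so that time is used as-is.

```python
BASELINE_S = 695  # test 1 on 11.2.0.1: no Exadata features, serial, HDD

# Lowest reported time per test, in seconds. Tests 1-4 list two releases.
best_time_s = {
    1: min(695, 153),
    2: min(403, 91),
    3: min(19, 18),
    4: min(16, 13),
    5: 41,
    6: 37,
    7: 13,
    8: 6,
    9: 1,
}

speedup = {nr: BASELINE_S / seconds for nr, seconds in best_time_s.items()}

for nr, factor in speedup.items():
    print(f"test {nr}: {best_time_s[nr]:>3} s, {factor:6.1f}x faster than the baseline")
```

Test 9 (Smart Scans plus EHCC, parallel 64, flash) comes out at 695x, which is consistent with the roughly 700x performance increase quoted earlier in the talk.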

Page 35

WHAT DOES IT LOOK LIKE?

Page 37

Why is this important?

Speed
–  Faster results
–  ‘No’ is found earlier

Volume (centralized DNA library)
–  Better statistical basis
–  Less-invasive treatments for patients
–  Personalized healthcare

Page 38

Even more…

Add clinical data to genomic data:
–  Patient history
–  Drug treatment history
–  Demographics

Integration of data: clinical data, biobanks, lab systems, omic data
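Combining the two worlds comes down to joining genomic and clinical records on a shared patient identifier. The Python sketch below shows such a join; the record types, field names, gene labels and drug labels are hypothetical placeholders, not an actual data model from the talk.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical genomic record."""
    patient_id: str
    gene: str
    genotype: str


@dataclass(frozen=True)
class ClinicalRecord:
    """Hypothetical clinical record: demographics and drug treatment history."""
    patient_id: str
    age: int
    treatments: Tuple[str, ...]


def join_on_patient(variants: List[Variant],
                    clinical: List[ClinicalRecord]) -> List[Tuple[Variant, ClinicalRecord]]:
    """Inner join on patient_id using a hash lookup on the clinical side."""
    by_patient: Dict[str, ClinicalRecord] = {c.patient_id: c for c in clinical}
    return [(v, by_patient[v.patient_id]) for v in variants if v.patient_id in by_patient]


variants = [
    Variant("P1", "GENE_X", "A/G"),
    Variant("P2", "GENE_X", "A/A"),
    Variant("P3", "GENE_Y", "C/T"),
]
clinical = [
    ClinicalRecord("P1", 54, ("drug_A",)),
    ClinicalRecord("P3", 61, ("drug_B",)),
]

pairs = join_on_patient(variants, clinical)
# Which carriers of a GENE_X variant were treated with drug_A?
treated = [v.patient_id for v, c in pairs if v.gene == "GENE_X" and "drug_A" in c.treatments]
print(treated)
```

Patients without a clinical record (P2 here) drop out of an inner join; whether that is acceptable depends on the analysis.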

Page 39

Oracle Translational Research Center (TRC)

Page 41

Advanced visualizations

Page 43

Future

•  Extend Huvariome to use Hadoop for raw reads
   –  Big Data Discovery
   –  Big Data SQL
•  Advanced visualizations
   –  D3
   –  Spotfire
•  RNA expression data
•  Pigs / cows / chickens
•  Multitenancy
•  Cloud offering
•  In-memory analyses

Page 45

Summary

Care is primary.
–  Technology is supporting.

Oracle offers platforms to provide better care:
–  Database
–  Exadata
–  TRC

Clinical and genomic data are complementary. Not everything is in the genes…
