PHUSE Connect - DH-08: Controlled and FAIR Data Access with JDBC and Arrow, Including Clinical Data
Ronald Steinhau, Entimo AG, 2020-SEP-25

PHUSE Connect 2020 / DH-08

Vision

§ Securely access clinical data from all development tools
    § Ensure data security (e.g. fine-grained access rights)
    § Support older (JDBC) and newer (Arrow) standards for data transport
    § Support your favorite development tools
        § R-Studio, Python (Jupyter, Python IDE, …)
        § Java/Scala/SAS
        § Spark, other Big Data tools, …
§ Fully leverage your infrastructure
    § Maximum transfer speeds
    § Streaming rather than copy & use
    § Caching for lasting performance in builds and pipelines
§ Combine clinical data with any other data
§ Architecture presentation
    § entimICE Data Access (EDA)

Extendable Data Access (EDA) - Architecture

[Figure: EDA architecture diagram. Labeled components: entimICE File-System, external file systems, external databases, entimICE ClinRep-DB, entimICE Indexing, EDA-Service, EDA-Grabber, Apache Arrow (Flight) Shared Proxy Service, Apache Ignite/H2 Shared Memory, Gandiva/GPUs, JDBC/ODBC (virtual DB), Remote JDBC (virtual DB), Web APIs (JSON), Web Data Grid (server side), EDA Web Frontend, DataTables/AG-Grid, EDA Data Grid via R-Shiny tools, Big Data tools, and client tools (R-Studio, Jupyter, R/Python, Spotfire, SAS, PL/SQL, Excel).]

FAIR Data Access with EDA

§ Findable
    § Use entimICE metadata, derived schema and index
    § Optionally integrate a data catalog
§ Accessible
    § Use JDBC and Apache Arrow to securely access the data
    § Standard tools (R, Python) enriched by custom packages
§ Interoperable
    § Join data with SQL from all sources (files, tables, streams)
§ Reusable
    § entimICE metadata
    § Auto-derived schema information
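The "Findable" and "Reusable" points rely on auto-derived schema information. As a rough, hypothetical illustration (not EDA's actual mechanism), the Python sketch below derives a column-to-type schema from CSV text by trying the narrowest type first. `derive_schema`, `infer_type` and the sample data are assumptions made for this example; sources such as SAS datasets already carry explicit type metadata, so a real implementation would read that instead of guessing.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name ('int', 'float' or 'str') that fits every non-missing value."""
    present = [v for v in values if v not in (None, "")]  # empty cells count as missing
    if not present:
        return "str"  # no evidence either way; fall back to text
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in present:
                cast(v)
            return name
        except ValueError:
            continue  # this type does not fit; try the next, wider one
    return "str"


def derive_schema(csv_text):
    """Derive a {column: type name} schema from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {col: infer_type([row[col] for row in rows]) for col in reader.fieldnames or []}


sample = "USUBJID,AGE,WEIGHT,SEX\nS-001,34,70.5,F\nS-002,41,,M\n"
print(derive_schema(sample))
# {'USUBJID': 'str', 'AGE': 'int', 'WEIGHT': 'float', 'SEX': 'str'}
```

Treating empty cells as missing keeps a partially filled numeric column (WEIGHT here) numeric instead of demoting it to text.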

Access by EDA Smart JDBC Driver (Virtual DB)

§ Large JAR file for easy tool integration
    § Optional ODBC bridge support
§ Operates as a virtual (in-memory) database
    § Respects access rights after login
    § Supports joins between any datasets
§ SAS datasets, databases, CSV, Web APIs
    § Full ANSI SQL standard supported
    § Supports all entimICE datasets as a data source
    § Adaptable to other clinical data sources
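The virtual-database idea (only the datasets a user may read are visible after login, and any of them can be joined with SQL) can be sketched with Python's built-in `sqlite3` standing in for the EDA JDBC driver. This is an assumption-laden illustration: `open_virtual_db`, `ACCESS_RIGHTS` and the DM/AE sample data are hypothetical, and EDA itself is reached through JDBC, not through this code.

```python
import csv
import io
import sqlite3

# Hypothetical sources: a "file" dataset (CSV text) and a "database" dataset (rows).
DM_CSV = "USUBJID,ARM\nS-001,PLACEBO\nS-002,ACTIVE\n"
AE_ROWS = [("S-001", "HEADACHE"), ("S-002", "NAUSEA"), ("S-002", "RASH")]

# Hypothetical access rights: the datasets each user may read.
ACCESS_RIGHTS = {"alice": {"dm", "ae"}, "bob": {"dm"}}


def open_virtual_db(user):
    """Build an in-memory database that contains only the datasets `user` may read."""
    allowed = ACCESS_RIGHTS.get(user, set())
    con = sqlite3.connect(":memory:")
    if "dm" in allowed:
        con.execute("CREATE TABLE dm (USUBJID TEXT, ARM TEXT)")
        # DictReader yields one mapping per CSV row, matching the named placeholders.
        con.executemany(
            "INSERT INTO dm VALUES (:USUBJID, :ARM)",
            csv.DictReader(io.StringIO(DM_CSV)),
        )
    if "ae" in allowed:
        con.execute("CREATE TABLE ae (USUBJID TEXT, AETERM TEXT)")
        con.executemany("INSERT INTO ae VALUES (?, ?)", AE_ROWS)
    return con


# A join across the two sources: adverse events per treatment arm.
query = """
    SELECT dm.ARM, COUNT(*) AS n_ae
    FROM ae JOIN dm ON ae.USUBJID = dm.USUBJID
    GROUP BY dm.ARM
    ORDER BY dm.ARM
"""

print(open_virtual_db("alice").execute(query).fetchall())  # [('ACTIVE', 2), ('PLACEBO', 1)]

try:
    open_virtual_db("bob").execute(query)  # 'ae' was never loaded into bob's session
except sqlite3.OperationalError as exc:
    print("not visible to bob:", exc)
```

Enforcing rights when the session is built, rather than filtering query results afterwards, means a forbidden dataset is simply absent from the user's view.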

Apache Arrow Architecture

[Figure: Apache Arrow architecture diagram with the labels entimICE, R-Studio, Jupyter and Gandiva.]

Access by Apache Arrow

§ Apache Arrow: a new (faster) data exchange standard
    § Version 1.0 released in July 2020
    § Evolution of ideas from Parquet and other columnar formats
§ Columnar, optimized data format
    § Organized in vectors of a single type (compact and fast)
    § Parallel vector transfer over the network (gRPC)
    § Optional dictionary-encoded arrays for compression (e.g. code lists)
§ Tool support
    § R-Studio → read/write/convert data frames to/from Arrow
    § Jupyter → transform into Python data structures
    § entimICE convenience packages (supporting the EDA Arrow server)
    § Big Data tools such as Spark, Pandas and Dremio support Arrow
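To make "vectors of a single type" and "dictionary-encoded arrays" concrete, the standard-library sketch below pivots row records into columns and dictionary-encodes a code-list column. It only mimics the layout ideas; it is not Arrow's memory format, and the helper names and sample data are hypothetical.

```python
from array import array

# Row-oriented records, as a tool might receive them one observation at a time.
rows = [
    {"USUBJID": "S-001", "AGE": 34, "SEX": "F"},
    {"USUBJID": "S-002", "AGE": 41, "SEX": "M"},
    {"USUBJID": "S-003", "AGE": 29, "SEX": "F"},
]


def to_columns(records):
    """Pivot row records into one list per column (a columnar layout)."""
    names = list(records[0])
    return {name: [rec[name] for rec in records] for name in names}


def dictionary_encode(values):
    """Replace each value by an integer index into a list of its distinct values."""
    dictionary = []
    positions = {}
    indices = array("i")  # compact vector of signed C ints
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return dictionary, indices


columns = to_columns(rows)
columns["AGE"] = array("i", columns["AGE"])  # a typed vector instead of generic objects
sex_dict, sex_idx = dictionary_encode(columns["SEX"])
print(sex_dict, list(sex_idx))  # ['F', 'M'] [0, 1, 0]
```

With the real library, `pyarrow.array([...]).dictionary_encode()` returns a `DictionaryArray` built on the same principle: each distinct code-list value is stored once, and every row stores only a small integer index.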

EDA Productivity Enhancements

§ Transparent in-memory caching
    § Multi-level caching (fast file, in-memory)
    § Controlled via Arrow/JDBC
    § No HDFS required
§ Web Data Grid
    § Fast data evaluation (sort, filter, search) of large datasets
    § All filtering/sorting/searching server side (e.g. in cache)
§ Data catalog integration hooks
    § Find data already covered by catalogs, indices and metadata
    § Customize EDA by catalog integration
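Multi-level caching (fast file, in-memory) can be sketched as a lookup that tries memory first, then a file cache, and only then the slow source. The class below is a hypothetical illustration, not EDA's cache: `TwoLevelCache`, the JSON file format and `slow_loader` are assumptions.

```python
import json
import os
import tempfile


class TwoLevelCache:
    """Look a dataset up in memory, then in a file cache, and only then load it from the source."""

    def __init__(self, loader, cache_dir):
        self.loader = loader        # hypothetical: dataset name -> list of rows
        self.cache_dir = cache_dir
        self.memory = {}
        self.loads = 0              # how often the slow source was actually read

    def _path(self, name):
        return os.path.join(self.cache_dir, f"{name}.json")

    def get(self, name):
        if name in self.memory:                 # level 1: in-memory
            return self.memory[name]
        path = self._path(name)
        if os.path.exists(path):                # level 2: fast file
            with open(path, encoding="utf-8") as fh:
                data = json.load(fh)
        else:                                   # miss: read the source, then fill the file level
            data = self.loader(name)
            self.loads += 1
            with open(path, "w", encoding="utf-8") as fh:
                json.dump(data, fh)
        self.memory[name] = data                # promote to level 1
        return data


def slow_loader(name):
    # Stand-in for an expensive read; lists (not tuples) survive the JSON round trip unchanged.
    return [["S-001", 34], ["S-002", 41]]


with tempfile.TemporaryDirectory() as tmp:
    cache = TwoLevelCache(slow_loader, tmp)
    cache.get("dm")
    cache.get("dm")                             # served from memory
    fresh = TwoLevelCache(slow_loader, tmp)     # simulates a new process with a warm file cache
    fresh.get("dm")
    print(cache.loads, fresh.loads)  # 1 0
```

The second instance starts with empty memory but finds the file level already warm, so the source is not read again. A production cache would also need invalidation when the underlying dataset changes, which this sketch omits.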

Summary

§ entimICE EDA architecture
    § Covers secure and fast access to clinical data
    § Supports the most popular development tools
    § Caching makes full use of the infrastructure
    § FAIR principles implemented
§ Vision turned into reality!

Thank you for your attention!

For questions please contact:

[email protected]