Best practice reference architecture for data standardization and curation
Dr. Michael Engels, OSTHUS
BioIT World, Boston – April 21st 2015
Slide 2
Agenda
OSTHUS – Who we are
Pain points
Reference architecture
Use cases
Benefits
Slide 3
Who we are
Slide 4
Cutting edge in R&D
Global partner
Independent
Digital Lab Informatics
Innovation
Active network
Open collaboration
Customer orientation
Trust
Who we are
Slide 5
Who we are
Focus on value
Concepts and methodology
Approach & commitment
Slide 7
Life science data
Scientific data are
Valuable assets to NGOs, academia, and industry
Domain- and context-specific
Interpretable only by experts
Scientific data are subject to continuous change:
Growth
Formats, standards, and technology
Concept extensions
Context changes
Slide 8
Change of concepts
From a phenomenology-based concept to a gene-based concept
Pharmacology example: Ion channels taxonomy
Slide 9
Pain points
Data standardization, data curation, master data management, data migration, …
Are complex endeavors
Are labor- and alignment-intensive
Need expert input (technical and scientific)
Are highly iterative
Are difficult to frame in timelines or costs
How can this challenge be addressed?
Slide 11
Reference architecture
Data migration
[Diagram: stages I, II, III, IV, …; labels recovered in reading order, grouping inferred from that order]
Modules: Manage Dictionary, Manage Curation runs, Manage Results, Analysis
Data flow: Data Source(s), Copy, Copy of target, Working area, Target Data, Extraction & Loading, CDC, SQL to Load, Audit Trails
Working area: Transformation, Glossary and Vocabulary, Property Mapping
Dictionary: Data Concept, Source Glossary, Vocabulary, Annotation Rules, Mapping Rules, Transformation Rules
Curation runs: Run Configuration, Data partitioning, Data Processing, Filtering
Results: Monitoring & Audit, Logs & Observ., Exceptions, Comments, Dashboard
Analysis: Calculate Properties, Data Comparison, Visual Analytics, Tag Data, List Management
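Illustration (not from the original slides): a minimal Python sketch of how mapping rules from a dictionary might be applied in one curation run, with unmapped values routed to an exception list for expert review. The names MAPPING_RULES, RunResult, and run_curation, the "stereo" property, and the sample records are all hypothetical and chosen only for illustration; the framework's actual rule format is not described in the slides.

```python
from dataclasses import dataclass, field

# Hypothetical mapping rules: source vocabulary term -> target vocabulary term.
MAPPING_RULES = {
    "rac": "racemic",
    "racemate": "racemic",
    "racemic": "racemic",
}


@dataclass
class RunResult:
    curated: list = field(default_factory=list)     # records mapped successfully
    exceptions: list = field(default_factory=list)  # records needing expert review


def run_curation(records, rules, prop="stereo"):
    """Apply vocabulary mapping rules to one property of each record."""
    result = RunResult()
    for record in records:
        raw = str(record.get(prop, "")).strip().lower()
        if raw in rules:
            # Build a new record so the source data stays untouched.
            result.curated.append({**record, prop: rules[raw]})
        else:
            result.exceptions.append(
                {"record": record, "reason": f"no mapping rule for {raw!r}"}
            )
    return result


# Invented sample data, for illustration only.
source = [
    {"id": 1, "stereo": "Rac"},
    {"id": 2, "stereo": "racemate"},
    {"id": 3, "stereo": "unknown mix"},
]
result = run_curation(source, MAPPING_RULES)
```

In this sketch the run never edits the source records in place: curated records are new copies, and anything without a rule is collected together with a reason, so a scientific expert can extend the dictionary and rerun.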
Slide 13
Use case 1
Chemical cartridge/structure migration
Accord Mol2000
#1: racemic
#1
Big Bang
Slide 14
Use case 2
Data integration – DWH
Continuous Growth
Slide 16
Benefits
Benefits are
Modular setup
All functions available within one integrated framework
Separate components for technical and scientific experts alike
Data curation as part of a process, not as individual data editing
Easy-to-use
Configurable toolbox tailored to any program
Integrated visual / comparative analysis between source and target data
Reduction of technical issues
Error propagation contained, rollbacks possible
Focus on data, not on technology
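Illustration (not from the original slides): one way to picture "error propagation contained, rollbacks possible" is to run a transformation on a working copy and commit it only if every record succeeds. This is a minimal Python sketch under that assumption; apply_run, to_kelvin, and the sample records are hypothetical, and the slides do not say how the framework implements rollbacks.

```python
import copy


def apply_run(target, transform):
    """Apply `transform` to a working copy; commit only if every record succeeds."""
    working = copy.deepcopy(target)  # working area, isolated from the target
    try:
        for record in working:
            transform(record)
    except Exception as exc:
        # Roll back: discard the working copy and return the untouched target.
        return target, f"rolled back: {exc}"
    return working, "committed"


def to_kelvin(record):
    # Hypothetical transformation rule; raises ValueError on non-numeric input.
    record["temp_k"] = float(record.pop("temp_c")) + 273.15


# Invented sample data: the second record cannot be converted.
target = [{"id": 1, "temp_c": "25"}, {"id": 2, "temp_c": "n/a"}]
data, status = apply_run(target, to_kelvin)

# A clean input is committed, and the original list is still left unchanged.
clean = [{"id": 1, "temp_c": "25"}]
data_ok, status_ok = apply_run(clean, to_kelvin)
```

Because the run works on a deep copy, a failure on one record leaves the target exactly as it was, which is the containment property the slide refers to.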
Slide 17
Questions?
For more information:
Visit us at Booth # 451
or at Poster # 47