21
Representation of biological data, information and knowledge: opportunities offered by systemic biology V. Fromion [email protected] INRA, MaAIGE, Université Paris-Saclay, Jouy-en-josas DATAIA-JST International Symposium on Data Science and AI Paris, July 11, 2018

Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Representation of biological data,

information and knowledge:

opportunities offered by

systemic biology

V. [email protected]

INRA, MaAIGE, Université Paris-Saclay, Jouy-en-josas

DATAIA-JST International Symposium on Data Science and AI

Paris, July 11, 2018

Page 2: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Enormous progress made by biologists in the understanding of the functioning ofliving systems (in particular at infra-cell scale)

New observation technologies combined with a continuous increase inmeasurement quality and a drastic reduction in costs, i.e. the data Deluge !

Great progress in the multi-scale integration of living systems, in particular for thebacterial cell through the use of the systems biology approach, i.e. Kitano'sapproach

A lot of progress in the biological field in 3 complementary aspects

The presentation focuses on issues raised by knowledge,

information, data and model management attached to a

bacterial cell (infra and cell scale)

Page 3: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

DATA deluge in the biological field …

Omics technologies

Biological data production

Needs for data and knowledgemanagement

Bio-ontologies: controlled vocabularies for

• Knowledge representation in Biology• Structuration• Indexation/annotation• Data sharing• Search retrieval

3

Page 4: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Cellular and molecular biology is a wide and heterogeneous field

Phenomics

Interactomics

Fluxomics

Omics

heterogeneous investigation design

Gra

nu

lari

ty

water ions metabolites protein t-RNA heterogeneous

complex

Universal molecules Individual dependent molecules

heterogeneous molecular entities heterogeneous processes

Page 5: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Bio-ontologies are useful tools to formalize biological knowledge representation…

Phenomics

Interactomics

Fluxomics

Omics

heterogeneous investigation design

Gra

nu

lari

ty

water ions metabolites protein t-RNA heterogeneous

complex

Universal molecules Individual dependent molecules

heterogeneous molecular entities heterogeneous processes

ED

AM

Page 6: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Chemical

GeneGene product

Sequence

GO-Molecular Function (MF)

GO-Biological Process (BP)

GO-Cellular Component (CC)

• Annotations are gene-centered• genes and gene products have the same annotations• independent of the state of a molecule

• Annotations are “implicit” information 3

Bio-ontologies are useful tools to formalize biological knowledge representation…

Page 7: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Systems Biology is a suitable framework to integrate heterogeneous entities at

different scales…

… but even if a mathematical model is a formal object,

it does not really manage the knowledge

Unformal biological data integration

Mathematical models

Page 8: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Systemic approach appears to be a good integrative framework to support

and relate the representation of biological and mathematical knowledge

Biological representation of enzymatic reactionMathematical representation

of enzymatic reaction

Page 9: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Substract

binding

Product

release

A

B

E

C

D

E-AB cplx

Biological representation of enzymatic reactionMathematical representation

of enzymatic reaction

has_input/output

has_subprocess

has_parameter

Systemic representation of the enzymatic reaction

Systemic approach appears to be a good integrative framework to support

and relate the representation of biological and mathematical knowledge

Page 10: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Enzymatic

reaction

Substract

binding

Product

release

A

B

E

C

D

E-AB cplx

KM

VM

Biological representation of enzymatic reactionMathematical representation

of enzymatic reaction

has_input/output

has_subprocess

has_parameter

Multi-scale systemic representation of the enzymatic reaction

Systemic approach appears to be a good integrative framework to support

and relate the representation of biological and mathematical knowledge

Page 11: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Enzymatic

reaction

Substract

binding

Product

release

A

B

E

C

D

E-AB cplx

KM

VM

Biological representation of enzymatic reactionMathematical representation

of enzymatic reaction

has_input/output/mediator

has_subprocess

has_parameter

A

B

C

D

E

Automatic aggregation of the multi-scale systemic representation of the enzymatic reaction

has_input

has_input

process Input

Process

has_subprocess

Logical rules

Systemic approach appears to be a good integrative framework to support

and relate the representation of biological and mathematical knowledge

Page 12: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

The main hypothesis of the approach

• In systemic approach, the representation is process-centered

• The information are supported by the process

• The molecule properties are conditioned by the biological process to which the molecule belongs

A fine description of biological processes as an instances should automatically conferred properties to its participants

Process

function?

activity

type?role?

INOUT

Page 13: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Systemic approach: a process-centered representation of systems

input 1 output 2process 3

activity 3

mediator 1 mediator 2

process 1 process 2input 1 input 2output 1 = output 2

activity 1 activity 2

mediator 1 mediator 2

pro

cess

3

Page 14: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

System biology: a process-centered representation of biology

input 1 output 2process 3

activity 3

mediator 1 mediator 2

process 1 process 2input 1 input 2output 1 = output 2

activity 1 activity 2

mediator 1 mediator 2

pro

cess

3

output

substrate product

Enzymaticreaction 1

activity1

inputoutput

enzyme1

input

substrate

product

Enzymaticreaction 2

activity3

enzyme2

Metabolicpathway

has_subprocess has_subprocess

Proteinassembly

activity1

inputoutput

Page 15: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Protein-coding gene production

Clearance

Binding sequence process

polymerisation process

loading process

release process

translocation proces

Crawling

chemical process

bioBiPON> 300 biological processes and subprocesses with representative singletons as instances

modelBiPON 9 abstract processes defined by mathematical expression

Bacterial interlocked Process ONtology (BiPON)

SWRL rules &automatic reasoning

• Complex and heterogeneous biological knowledge at the molecular scale • could be described using a systemic representation• could be automatically reclassify under a few more abstract processes

and gain new properties

Gene ExpressiontRNA

aminoacylationRNA

degradation

translation

ribosome assembly

regulation

RNA maturation

transcription

System biology: a process-centered representation of biology

Page 16: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Protein-coding gene production

Clearance

Binding sequence process

polymerisation process

loading process

release process

translocation proces

Crawling

chemical process

bioBiPON> 300 biological processes and subprocesses with representative singletons as instances

modelBiPON 9 abstract processes defined by mathematical expression

Bacterial interlocked Process ONtology (BiPON)

SWRL rules &automatic reasoning

• Complex and heterogeneous biological knowledge at the molecular scale • could be described using a systemic representation• could be automatically reclassify under a few more abstract processes

and gain new properties

Gene ExpressiontRNA

aminoacylationRNA

degradation

translation

ribosome assembly

regulation

RNA maturation

transcription

System biology: a process-centered representation of biology

Page 17: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Cellular

Process

BioChemicals

or

Sequences

BioChemicals

or

Sequences

BioChemicals or Sequences

Cellular

subprocess

Cellular

subprocess

Modeling heterogeneous and multi-scale processes

of bacterial gene expression

Gene

Expression

Transcription Translation

Initiation

Pre-Initiation

Factor

Binding

has_input/output

has_subprocess

has_parameter

has_model

is_a (infered)

Biological knowledge representation

300 multi-scale cellular processes involvein the full aggregated Gene Expressionprocess and its regulation processes

Page 18: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Cellular

Process

BioChemicals

or

Sequences

BioChemicals

or

Sequences

BioChemicals or Sequences

Cellular

subprocess

Cellular

subprocess

Modeling heterogeneous and multi-scale processes

of bacterial gene expression

Input Types Output TypesModeled

Process

Mathematical

ExpressionParameter

has_input/output

has_subprocess

has_parameter

has_model

is_a (infered)

Biological knowledge representation

Mathematical modeling knowledge representation

9 multi-scale Modeling Processes related totheir mathematical expressions and parameters

Page 19: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Cellular

Process

BioChemicals

or

Sequences

BioChemicals

or

Sequences

BioChemicals or Sequences

Cellular

subprocess

Cellular

subprocess

Modeling heterogeneous and multi-scale processes

of bacterial gene expression

Input Types Output TypesModeled

Process

Mathematical

ExpressionParameter

has_input/output

has_subprocess

has_parameter

has_model

is_a (infered)

Biological knowledge representation

Mathematical modeling knowledge representation

Our model:• Could describe heterogeneous using a systemic multi-scale representation

with a single pattern• Could automatically relate Biological Process to Mathematical Models

Page 20: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Conclusion

Just a change of point of view :• Processes are already described (GO-BP & GO-MF)• Some are in relationship with chemical (GO-plus / LEGO)• Public databases contain annotated data

Needs a “as fine as possible” description of biological processes and molecular states. This description is based on:

• description of different states of molecule (multimer, PTM,…)

• a systematic template (few properties define a process)• genericity (adapted to all biochemical reactions)• plasticity (flexibility of SWRL rules)

Page 21: Representation of biological data, information and ...dataia.eu/sites/default/files/DATAIA_JST... · living systems (in particular at infra-cell scale) New observation technologies

Vincent Henry, Arnaud Ferré, Anne GoelzerChristine Froidevaux,

and also

Fatiha Sais, Elodie Marchadier, Juliette Dibie

Thanks!

[email protected]

Process

function?

activity

type?role?

INOUT