Poster genome engineering & Synthetic Biology 2016

A Unified Framework for Annotating, Engineering and Designing Biological Sequences

Michiel Stock, Laurentijn Tilleman, Bernard De Baets, Willem Waegeman
[email protected]

Introductory example

Cytochrome P450 is a family of oxidoreductases that shows an enormous diversity in affinity, specificity and reactivity towards different types of molecules. Understanding and exploiting these interactions has a large medical and biotechnological potential.

[Figure: interactions between drugs and P450 proteins; edge labels: inhibits, binds to, oxidises, activates.]

Some possible applications and research questions:

• explore which parts of the molecules determine the molecular interaction;

• predict whether a drug will be detoxified by P450;

• search for a molecule to inhibit a specific cytochrome P450;

• design a novel P450 to facilitate an industrial reaction.

We present some methods and techniques to represent biomolecules, make functional predictions and search or design novel compounds or sequences.

Representing biomolecules

Proteins, DNA, RNA or small compounds are complex objects, often represented by sequences and graphs. Kernels are mathematical tools to represent similarities of general objects x ∈ X by some explicit feature representation φ(x): k(x, x′) = ⟨φ(x), φ(x′)⟩.

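[Figure: the feature map φ takes objects from the input space X to the feature space F, where k(x, x′) = ⟨φ(x), φ(x′)⟩.]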

Many kernels that encode prior knowledge exist [3]. Representations of sequences, graphs, trees, sets, etc. can be obtained by dividing the objects into subcomponents and defining a suitable convolution.
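As an illustration of the subcomponent idea, the sketch below implements a k-mer (spectrum) kernel for sequences: each sequence is divided into its length-k substrings and the kernel is the inner product of the resulting count vectors. This is only one simple instance chosen for illustration, not necessarily the kernel used by the authors; the function names and the default k = 3 are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Explicit feature map phi(x): counts of every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, x_prime, k=3):
    """k(x, x') = <phi(x), phi(x')>, summed over the k-mers present in x."""
    phi_x = kmer_counts(x, k)
    phi_xp = kmer_counts(x_prime, k)
    # A Counter returns 0 for k-mers it has not seen, so only shared k-mers contribute.
    return sum(count * phi_xp[kmer] for kmer, count in phi_x.items())
```

For example, `spectrum_kernel("ABAB", "ABAB", k=2)` counts "AB" twice and "BA" once in each sequence, giving 2·2 + 1·1 = 5.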

Pairwise learning

Using observed data, we can build models to make predictions for a pair of objects, such as a cytochrome x and a small molecule y:

$$f(x, y) = \phi(x)^\top W \psi(y) \tag{1}$$

Here, ψ(y) is the feature representation of the second object, and the parameters W can be optimized for regression, classification or ranking [1, 6].
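To make the role of W concrete, the following sketch fits the bilinear model (1) by ridge regression on a complete matrix of observed pair labels. It is a minimal illustration under stated assumptions, not the efficient algorithms of [1, 6]: the names `fit_bilinear` and `predict`, the regularization parameter `lam`, and the use of an explicit Kronecker product (only practical for small feature dimensions) are all choices made here for clarity.

```python
import numpy as np


def fit_bilinear(Phi, Psi, Y, lam=1e-3):
    """Ridge estimate of W in f(x, y) = phi(x)^T W psi(y).

    Phi: (n, p) features of the n first objects (e.g. cytochromes).
    Psi: (m, q) features of the m second objects (e.g. small molecules).
    Y:   (n, m) observed label for every pair (assumed complete).
    """
    p, q = Phi.shape[1], Psi.shape[1]
    # With row-major vectorization, vec(Phi W Psi^T) = (Phi kron Psi) vec(W).
    Z = np.kron(Phi, Psi)                      # shape (n * m, p * q)
    A = Z.T @ Z + lam * np.eye(p * q)          # regularized normal equations
    w = np.linalg.solve(A, Z.T @ Y.ravel())
    return w.reshape(p, q)


def predict(Phi, W, Psi):
    """Predicted scores f(x_i, y_j) for all pairs, as an (n, m) matrix."""
    return Phi @ W @ Psi.T
```

With noise-free labels and a very small `lam`, the fitted W reproduces the generating matrix; in practice `lam` would be tuned, for instance by cross-validation.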

[Figure: traditional versus pairwise learning; labels: molecular interaction, toxic, receptor.]

Using powerful algorithms, we can learn and evaluate functional relations from large databases of interactions or annotations, e.g. [2, 5, 7].

Searching through the design space

Using the bilinear model (1), engineering or designing molecules is formulated as an optimization problem. To find the drug y that shows the highest affinity for a given cytochrome x, one solves

$$y^* = \arg\max_{y \in Y} f(x, y)$$

[Figure: search-based versus de novo design; labels: query, database, predicted affinity, iterative optimization.]

Efficient data structures [3, 8] and algorithms [4] allow for searching huge or infinite design spaces.
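For a finite database, the search-based variant of the optimization problem above can be sketched as a plain exhaustive scan. Because model (1) is bilinear, the query x collapses into a single vector Wᵀφ(x), after which every candidate is scored with one inner product. This is a naive baseline for illustration, not the exact top-K algorithm of [4]; the function name `top_k_candidates` and its arguments are assumptions.

```python
import numpy as np


def top_k_candidates(phi_x, W, Psi_db, k=5):
    """Return indices and scores of the k database items with the highest f(x, y).

    phi_x:  (p,) feature vector of the query object x.
    W:      (p, q) parameter matrix of the bilinear model.
    Psi_db: (m, q) feature vectors psi(y) of the m database candidates.
    """
    query = W.T @ phi_x                  # (q,) query in candidate feature space
    scores = Psi_db @ query              # (m,) f(x, y) for every candidate
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]   # unordered top-k in O(m)
    top = top[np.argsort(-scores[top])]         # order those k by descending score
    return top, scores[top]
```

Setting `k=1` recovers the single maximizer y* from the database; larger k returns a ranked shortlist for experimental follow-up.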

KERMIT

The activities of the research unit KERMIT: knowledge-based, predictive and spatio-temporal modelling are oriented towards the principles and practice of the Extraction, Representation and Management of Knowledge by means of so-called Intelligent Techniques. These techniques are drawn from the fields of artificial and computational intelligence, operations research and natural computing. KERMIT aims at an optimal blend of fundamental and applied research. Fields of application vary within the applied bio-sciences.

www.kermit.ugent.be

References

[1] T. Pahikkala, A. Airola, M. Stock, B. De Baets, and W. Waegeman. Efficient regularized least-squares algorithms for conditional ranking on relational data. Machine Learning, 93(2-3):321–356, 2013.

[2] R. Pelossof, I. Singh, J. L. Yang, M. T. Weirauch, T. R. Hughes, and C. S. Leslie. Affinity regression predicts the recognition code of nucleic acid-binding proteins. Nature Biotechnology, 33(12):1242–1249, 2015.

[3] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

[4] M. Stock, K. Dembczynski, B. De Baets, and W. Waegeman. Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models. Data Mining and Knowledge Discovery, Submitted, 2016.

[5] M. Stock, T. Fober, E. Hüllermeier, S. Glinca, G. Klebe, T. Pahikkala, A. Airola, B. De Baets, and W. Waegeman. Identification of functionally related enzymes by learning-to-rank methods. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(6):1157–1169, 2014.

[6] M. Stock, T. Pahikkala, A. Airola, B. De Baets, and W. Waegeman. Efficient kernel-based models for pairwise learning. Manuscript in preparation.

[7] J.-P. Vert, J. Qiu, and W. S. Noble. A new pairwise kernel for biological network inference with support vector machines. BMC Bioinformatics, 8(S-10):1–10, 2007.

[8] V. Vishwanathan and A. Smola. Fast kernels for string and tree matching. Advances in Neural Information Processing Systems, 15:585–592, 2004.