Poster genome engineering & Synthetic Biology 2016

A Unified Framework for Annotating, Engineering and Designing Biological Sequences

Michiel Stock, Laurentijn Tilleman, Bernard De Baets, Willem Waegeman

Introductory example

Cytochrome P450 is a family of oxidoreductases that shows an enormous diversity in affinity, specificity and reactivity towards different types of molecules. Understanding and exploiting these interactions has large medical and biotechnological potential.

[Figure: interaction network linking drugs, P450 enzymes and other proteins, with edges labelled "inhibits", "binds to", "oxidises" and "activates".]

Some possible applications and research questions:

• explore which parts of the molecules determine the molecular interaction;

• predict whether a drug will be detoxified by P450;

• search for a molecule to inhibit a specific cytochrome P450;

• design a novel P450 to facilitate an industrial reaction.

We present some methods and techniques to represent biomolecules, make functional predictions and search or design novel compounds or sequences.

Representing biomolecules

Proteins, DNA, RNA and small compounds are complex objects, often represented by sequences and graphs. Kernels are mathematical tools to represent the similarity between general objects x ∈ X through a feature representation φ(x), which need not be computed explicitly: k(x, x′) = 〈φ(x), φ(x′)〉.

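[Figure: the feature map φ sends objects from the input space X to a feature space F, where the kernel k(x, x′) is evaluated as the inner product 〈φ(x), φ(x′)〉.]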
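As an illustration of the kernel definition (the poster does not fix a particular kernel), the following sketch uses the homogeneous quadratic kernel k(x, x′) = (xᵀx′)², chosen here only as a familiar example. Its explicit feature map φ(x) lists all d² pairwise products x_i x_j, yet the kernel itself can be evaluated in O(d) time without building φ. The function names are hypothetical.

```python
import numpy as np

def quadratic_kernel(x, x_prime):
    """Homogeneous polynomial kernel of degree 2: k(x, x') = (x . x')^2.

    Costs O(d): the feature map phi is never constructed.
    """
    return float(np.dot(x, x_prime)) ** 2

def quadratic_features(x):
    """Explicit feature map phi(x): all d^2 ordered products x_i * x_j."""
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
x_prime = np.array([0.5, -1.0, 2.0])

# Implicit route: evaluate the kernel directly on the inputs.
k_implicit = quadratic_kernel(x, x_prime)
# Explicit route: inner product of the two feature vectors <phi(x), phi(x')>.
k_explicit = float(np.dot(quadratic_features(x), quadratic_features(x_prime)))
print(k_implicit, k_explicit)  # prints: 20.25 20.25
```

Both routes give the same similarity. This equivalence is what lets kernel methods work with feature spaces that are too large, or even infinite-dimensional, to construct explicitly.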

Many kernels that encode prior knowledge exist [3]. Kernels for sequences, graphs, trees, sets, etc. can be obtained by dividing the objects into subcomponents and defining a suitable convolution over these parts.
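One well-known instance of such a decomposition for biological sequences is the k-spectrum kernel, used here as an assumed example rather than a method taken from the poster. The subcomponents are all substrings of length k (k-mers), φ(x) counts how often each k-mer occurs, and the kernel counts matching pairs of k-mers between two sequences. The sketch computes it both as an inner product of count vectors and as a convolution-style sum over pairs of substrings, to show that the two views agree. Function names are hypothetical.

```python
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Explicit feature map phi(seq): counts of every length-k substring (k-mer)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s: str, t: str, k: int = 3) -> int:
    """k-spectrum kernel as an inner product of the two k-mer count vectors."""
    phi_s, phi_t = kmer_counts(s, k), kmer_counts(t, k)
    # Only k-mers occurring in both sequences contribute to the inner product.
    return sum(count * phi_t[kmer] for kmer, count in phi_s.items() if kmer in phi_t)

def spectrum_kernel_pairwise(s: str, t: str, k: int = 3) -> int:
    """Same kernel as a convolution: sum a match indicator over all substring pairs."""
    return sum(
        1
        for i in range(len(s) - k + 1)
        for j in range(len(t) - k + 1)
        if s[i:i + k] == t[j:j + k]
    )

s, t = "ACGTACGT", "CGTACG"
print(spectrum_kernel(s, t), spectrum_kernel_pairwise(s, t))  # prints: 6 6
```

The pairwise form needs O(|s|·|t|) substring comparisons; efficient implementations instead rely on data structures such as suffix trees [8].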

Pairwise learning

Using observed data, we can build models that make predictions for a pair of objects, such as a cytochrome x and a small molecule y:

$$f(x, y) = \phi(x)^\top W \psi(y) \qquad (1)$$

Here, ψ(y) is the feature representation of the second object, and the parameter matrix W can be optimized for regression, classification or ranking [1, 6].
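A minimal sketch of fitting W, assuming the simplest setting: squared loss with a ridge penalty, and a complete label matrix Y in which every pair (x_i, y_j) has been observed. The methods in [1, 6] are more general (kernelized, and covering classification and ranking); this code does not reproduce them. The sketch exploits the structure of model (1): after eigendecomposing the two small Gram matrices XᵀX and ZᵀZ, whose rows X and Z hold the feature vectors φ(x_i) and ψ(y_j), the normal equations decouple into one scalar division per entry of W. All names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_bilinear_ridge(X, Z, Y, lam=1.0):
    """Fit W in f(x, y) = phi(x)^T W psi(y) by ridge regression (sketch).

    X: (n, p) features phi(x) of the n row objects (e.g. cytochromes)
    Z: (m, q) features psi(y) of the m column objects (e.g. molecules)
    Y: (n, m) fully observed labels (e.g. affinities)
    Minimises ||Y - X W Z^T||_F^2 + lam * ||W||_F^2 in closed form.
    """
    s, U = np.linalg.eigh(X.T @ X)  # X^T X = U diag(s) U^T
    l, V = np.linalg.eigh(Z.T @ Z)  # Z^T Z = V diag(l) V^T
    # Normal equations: X^T X W Z^T Z + lam W = X^T Y Z.
    # In the rotated basis W~ = U^T W V they become element-wise:
    # (s_i * l_j + lam) * W~_ij = (U^T X^T Y Z V)_ij
    B = U.T @ X.T @ Y @ Z @ V
    W_tilde = B / (np.outer(s, l) + lam)
    return U @ W_tilde @ V.T

def predict(W, phi_x, psi_y):
    """Bilinear score f(x, y) = phi(x)^T W psi(y), as in model (1)."""
    return float(phi_x @ W @ psi_y)

# Synthetic example: labels generated from a known W plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Z = rng.normal(size=(30, 4))
W_true = rng.normal(size=(5, 4))
Y = X @ W_true @ Z.T + 0.01 * rng.normal(size=(20, 30))

W_hat = fit_bilinear_ridge(X, Z, Y, lam=1e-3)
print(np.abs(W_hat - W_true).max())  # expected to be small at this noise level
print(predict(W_hat, X[0], Z[0]), Y[0, 0])
```

Since φ(x)ᵀWψ(y) = Σ_{i,j} W_ij φ_i(x) ψ_j(y), model (1) is a linear model on the products of the two feature vectors (their Kronecker product). The eigendecomposition trick solves it without ever forming the corresponding pq × pq linear system.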

[Figure: traditional versus pairwise learning, with the example labels "toxic", "molecular interaction" and "receptor".]

Using efficient algorithms, we can learn and evaluate such functional relations from large databases of interactions or annotations, e.g. [2, 5, 7].

Searching through the design space

Using the bilinear model (1), engineering or designing molecules is formulated as an optimization problem. To find the drug y with the highest predicted affinity for a given cytochrome x, one solves

$$y^* = \arg\max_{y \in \mathcal{Y}} f(x, y).$$
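For search-based design over a finite database, the argmax can be computed by scoring every candidate. The sketch below assumes W has already been learned and that the features ψ(y) of all candidates are stored as rows of a matrix; the names and random data are hypothetical. It precomputes a = Wᵀφ(x) once, so that each candidate costs a single dot product, and returns a shortlist of the K best candidates rather than only y∗.

```python
import numpy as np

def top_k_candidates(phi_x, W, Psi, k=5):
    """Indices and scores of the k database items with the highest f(x, y).

    phi_x: (p,)   features of the query, e.g. a given cytochrome x
    W:     (p, q) learned bilinear parameters
    Psi:   (m, q) features psi(y) of all m candidates in the database
    """
    a = W.T @ phi_x            # computed once per query
    scores = Psi @ a           # f(x, y) = phi(x)^T W psi(y) for every y
    k = min(k, len(scores))
    idx = np.argpartition(-scores, k - 1)[:k]  # unordered top-k in O(m)
    idx = idx[np.argsort(-scores[idx])]        # sort only those k
    return idx, scores[idx]

rng = np.random.default_rng(1)
phi_x = rng.normal(size=6)
W = rng.normal(size=(6, 4))
Psi = rng.normal(size=(1000, 4))
idx, scores = top_k_candidates(phi_x, W, Psi, k=5)
print(idx, scores)  # idx[0] is the arg max y* over this database
```

The cost is linear in the database size, which is fine for moderate databases but motivates the faster search strategies discussed next.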

[Figure: search-based design, in which a query is matched against a database by predicted affinity, versus de novo design by iterative optimization.]

Efficient data structures [3, 8] and algorithms [4] make it possible to search huge or even infinite design spaces.
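As one example of such an algorithm, the sketch below applies Fagin's threshold algorithm to the bilinear model. This is an illustrative choice and not necessarily the exact procedure of [4]. Writing a = Wᵀφ(x), the score f(x, y) = Σ_j a_j ψ_j(y) is monotone in each feature. Candidates can therefore be visited in the order of each feature (descending when a_j ≥ 0, ascending otherwise), and the search can stop as soon as the K-th best score found so far is at least the best score any unseen candidate could still reach. The result is exact; in favourable cases only part of the database has to be scored. Names and data are hypothetical, and the sorted lists would normally be built once, offline.

```python
import heapq
import numpy as np

def threshold_top_k(a, Psi, k=5):
    """Exact top-k of the scores Psi @ a with Fagin's threshold algorithm.

    a:   (q,)   query vector, e.g. a = W^T phi(x) for the bilinear model
    Psi: (m, q) candidate features psi(y)
    Returns (indices, scores, number of candidates actually scored).
    """
    m, q = Psi.shape
    k = min(k, m)
    # One sorted list per feature, best candidates first for that feature.
    # (Rebuilt here only to keep the sketch self-contained.)
    orders = [np.argsort(-Psi[:, j] if a[j] >= 0 else Psi[:, j]) for j in range(q)]
    seen = set()
    heap = []  # min-heap of (score, index) holding the current best k
    for depth in range(m):
        for j in range(q):
            i = int(orders[j][depth])
            if i not in seen:
                seen.add(i)
                score = float(Psi[i] @ a)  # random access: full score of item i
                if len(heap) < k:
                    heapq.heappush(heap, (score, i))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, i))
        # Upper bound on the score of any candidate not yet seen.
        threshold = sum(a[j] * Psi[orders[j][depth], j] for j in range(q))
        if len(heap) == k and heap[0][0] >= threshold:
            break
    best = sorted(heap, reverse=True)
    return [i for _, i in best], [s for s, _ in best], len(seen)

rng = np.random.default_rng(2)
Psi = rng.normal(size=(2000, 3))
a = np.array([1.5, -0.5, 0.2])
top, top_scores, n_scored = threshold_top_k(a, Psi, k=5)
print(top, n_scored)  # n_scored is at most 2000 and often far fewer
```

The early-stopping test is valid because every unseen candidate lies below the current depth in every list, so its score cannot exceed the threshold.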

KERMIT

The activities of the research unit KERMIT (knowledge-based, predictive and spatio-temporal modelling) are oriented towards the principles and practice of the Extraction, Representation and Management of Knowledge by means of so-called Intelligent Techniques. These techniques are drawn from the fields of artificial and computational intelligence, operations research and natural computing. KERMIT aims at an optimal blend of fundamental and applied research. Its fields of application lie within the applied biosciences.


www.kermit.ugent.be

References

[1] T. Pahikkala, A. Airola, M. Stock, B. De Baets, and W. Waegeman. Efficient regularized least-squares algorithms for conditional ranking on relational data. Machine Learning, 93(2-3):321–356, 2013.

[2] R. Pelossof, I. Singh, J. L. Yang, M. T. Weirauch, T. R. Hughes, and C. S. Leslie. Affinity regression predicts the recognition code of nucleic acid-binding proteins. Nature Biotechnology, 33(12):1242–1249, 2015.

[3] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

[4] M. Stock, K. Dembczynski, B. De Baets, and W. Waegeman. Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models. Data Mining and Knowledge Discovery, submitted, 2016.

[5] M. Stock, T. Fober, E. Hüllermeier, S. Glinca, G. Klebe, T. Pahikkala, A. Airola, B. De Baets, and W. Waegeman. Identification of functionally related enzymes by learning-to-rank methods. IEEE Transactions on Computational Biology and Bioinformatics, 11(6):1157–1169, 2014.

[6] M. Stock, T. Pahikkala, A. Airola, B. De Baets, and W. Waegeman. Efficient kernel-based models for pairwise learning. Manuscript in preparation.

[7] J.-P. Vert, J. Qiu, and W. S. Noble. A new pairwise kernel for biological network inference with support vector machines. BMC Bioinformatics, 8(S-10):1–10, 2007.

[8] V. Vishwanathan and A. Smola. Fast kernels for string and tree matching. Advances in Neural Information Processing Systems, 15:585–592, 2004.

