Biological sequence analysis and information processing by artificial neural networks Søren Brunak Center for Biological Sequence Analysis Technical University of Denmark [email protected]

Biological sequence analysis and information processing by artificial neural networks

Søren Brunak

Center for Biological Sequence Analysis

Technical University of Denmark

[email protected]

Pairwise alignment

>carp Cyprinus carpio growth hormone 210 aa vs.

>chicken Gallus gallus growth hormone 216 aa

scoring matrix: BLOSUM50, gap penalties: -12/-2

40.6% identity; Global alignment score: 487

10 20 30 40 50 60 70

carp MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD

:: . : ...:.: . : :. . :: :::.:.:::: :::. ..:: . .::..: .: .:: :.

chicken MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE

10 20 30 40 50 60 70 80

80 90 100 110 120 130 140 150

carp YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN

: ::.:::..:..: ..:::.:. ::.:: : : ::. .:.:. :. ... ::: ::. ::..:.. : .: .

chicken TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G

90 100 110 120 130 140 150 160

170 180 190 200 210

carp DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL

.: : .. : . . .:. : ... ::.:::::.:::::::.: .::: .::::.

chicken PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI

170 180 190 200 210
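The identity figure reported for an alignment like the one above can be illustrated with a small sketch. It assumes identity is counted as the number of columns with the same residue in both rows, divided by the total number of alignment columns; alignment programs may use a different denominator, so this will not necessarily reproduce the 40.6% quoted above. The function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns where both rows carry the same residue.

    Assumption: the denominator is the number of alignment columns,
    gap columns included. Other tools may define it differently.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return 100.0 * identical / len(row_a)


# First ten columns of the carp/chicken alignment shown above.
carp = "MA--RVLVLL"
chicken = "MAPGSWFSPL"
print(percent_identity(carp, chicken))
```

Passing the complete aligned rows, gaps included, gives the identity for the whole alignment under the same assumption.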

Biological neuron

Diversity of interactions in a network enables complex calculations

• Similar in biological and artificial systems

• Excitatory (+) and inhibitory (-) relations between compute units

Transfer of biological principles to neural network algorithms

• Non-linear relation between input and output

• Massively parallel information processing

• Data-driven construction of algorithms

• Ability to generalize to new data items
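A minimal sketch of a single artificial compute unit may make the first two ideas concrete: positive weights act as excitatory connections, negative weights as inhibitory ones, and a sigmoid gives the non-linear relation between input and output. The sigmoid, the weight values and the name `unit_output` are illustrative assumptions, not taken from the slides.

```python
import math


def sigmoid(x: float) -> float:
    """Smooth non-linear squashing function with output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by the sigmoid non-linearity."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)


inputs = [1.0, 1.0]
# One excitatory connection (+2.0) and one silent connection (0.0).
print(unit_output(inputs, [2.0, 0.0]))
# Same excitatory connection, now opposed by an inhibitory one (-3.0).
print(unit_output(inputs, [2.0, -3.0]))
```

With the inhibitory weight in place the output drops below 0.5; without it the output rises above 0.5.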

Simplest non-trivial classification problem

CNHSYYP, HIETRRA, NWQSADY, NQYSEPR, WHITRCA, DYHSANY, ...

• Two categories: positives and negatives

• Data described by two features, e.g. charge, sidechain volume, molecular weight, number of atoms, ...
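As a sketch of how each peptide could be reduced to two features, the example below computes a crude net charge and an approximate molecular weight. The charge rule (K and R count +1, D and E count -1, histidine and the termini ignored) is a simplifying assumption, and the residue masses are rounded average values; the function names are hypothetical.

```python
# Approximate average residue masses in Da (rounded values, assumed here).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02  # one water is added back for the free termini


def net_charge(peptide: str) -> int:
    """Crude net charge: +1 per K/R, -1 per D/E (assumption: H and termini ignored)."""
    return sum(1 for r in peptide if r in "KR") - sum(1 for r in peptide if r in "DE")


def molecular_weight(peptide: str) -> float:
    """Approximate molecular weight as the sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER_MASS


peptides = ["CNHSYYP", "HIETRRA", "NWQSADY", "NQYSEPR", "WHITRCA", "DYHSANY"]
for peptide in peptides:
    # Each peptide becomes one point in a two-dimensional feature space.
    print(peptide, net_charge(peptide), round(molecular_weight(peptide), 1))
```

Each peptide then corresponds to a point in a plane, and the classification task is to separate the positive points from the negative ones.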

Features of phosphorylation sites

PKG: cGMP-dep. kinase

PKC

CaM-II: Ca++/calmodulin-dep. kinase

cdc2: Cyclin-dep. kinase 2

CK-II: Casein kinase 2

Homotypical cerebral cortex (from primate): 6 layers

DEMO

negative

positive

Training and error reduction
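One way to picture training as error reduction is batch gradient descent on a single sigmoid unit, sketched below. The toy data, the squared-error measure, the learning rate and the number of epochs are all illustrative assumptions; the slides do not specify the training procedure.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def output(weights, bias, x):
    """Output of one sigmoid unit for feature vector x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(samples, targets, rate=0.5, epochs=200):
    """Batch gradient descent on the summed squared error of one sigmoid unit."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    errors = []  # error measured at the start of each epoch
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        error = 0.0
        for x, t in zip(samples, targets):
            o = output(weights, bias, x)
            error += 0.5 * (o - t) ** 2
            delta = (o - t) * o * (1.0 - o)  # derivative of the error w.r.t. the summed input
            for i, xi in enumerate(x):
                grad_w[i] += delta * xi
            grad_b += delta
        errors.append(error)
        # Step against the gradient to reduce the error.
        weights = [w - rate * g for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b
    return weights, bias, errors


# Hypothetical two-feature data: three positives (target 1), three negatives (target 0).
samples = [(2.0, 1.0), (3.0, 2.0), (2.5, 3.0), (-1.0, -2.0), (-2.0, -1.0), (-3.0, -2.5)]
targets = [1, 1, 1, 0, 0, 0]
weights, bias, errors = train(samples, targets)
print(errors[0], errors[-1])
```

The recorded `errors` list shows the error falling over the epochs on this separable toy set; real sequence data would need held-out examples to check generalization.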

Transfer of biological principles to neural network algorithms

• Non-linear relation between input and output

• Massively parallel information processing

• Data-driven construction of algorithms

Sparse encoding of amino acid sequence windows
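A sketch of sparse encoding for amino acid windows follows: each residue becomes a vector of 20 units with a single 1, and a window is the concatenation of its residue vectors. The alphabetical ordering of the 20 letters, the absence of an extra unit for positions beyond the sequence ends, and the function names are assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 input units


def encode_residue(residue: str) -> list:
    """Sparse (one-hot) vector: a single 1 at the position of the residue."""
    vector = [0] * len(AMINO_ACIDS)
    vector[AMINO_ACIDS.index(residue)] = 1  # raises ValueError for unknown letters
    return vector


def encode_window(window: str) -> list:
    """Concatenate the sparse vectors of all residues in the window."""
    encoded = []
    for residue in window:
        encoded.extend(encode_residue(residue))
    return encoded


def windows(sequence: str, size: int):
    """Yield every full-length window as it slides along the sequence."""
    for start in range(len(sequence) - size + 1):
        yield sequence[start:start + size]


for window in windows("MARVLVLLSVV", 7):
    bits = encode_window(window)
    print(window, len(bits), sum(bits))
```

A window of 7 residues gives 7 x 20 = 140 input values, of which exactly 7 are set to 1.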

Sparse encoding of nucleotide sequence windows

Nucleotides

4 letter alphabet

Normally no need for a fifth letter

ACGTAGGCAATCTCAGACGTTTATC

1000010000100001100000100010010010001000000101000001010010000010100001000010000100010001100000010100
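The bit string above is consistent with four units per nucleotide, using A = 1000, C = 0100, G = 0010 and T = 0001. A minimal sketch that reproduces it is given below; the function name `sparse_encode` is hypothetical.

```python
# Four units per nucleotide, as read off the bit string above.
NUCLEOTIDE_CODE = {"A": "1000", "C": "0100", "G": "0010", "T": "0001"}


def sparse_encode(sequence: str) -> str:
    """Concatenate the four-unit code of every nucleotide in the sequence."""
    return "".join(NUCLEOTIDE_CODE[base] for base in sequence)


encoded = sparse_encode("ACGTAGGCAATCTCAGACGTTTATC")
print(len(encoded))
print(encoded)
```

The 25 nucleotides give 100 binary inputs, with exactly one unit set per position.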