Computational intelligence and machine learning in bioinformatics

GUEST EDITORIAL

Computational intelligence and machine learning in bioinformatics

Artificial Intelligence in Medicine (2009) 45, 91-96

http://www.intl.elsevierhealth.com/journals/aiim

1. Introduction

The treatment of the huge amounts of data delivered by high-throughput bio-technologies requires, on the one hand, advanced data management procedures for the efficient storage and retrieval of biological information [1] and, on the other hand, refined methods to extract and model biological knowledge from the data [2].

Computational intelligence (CI) and machine learning (ML) methods are widely applied to the extraction of biological knowledge from bio-molecular data [3,4], in order to obtain models that both represent biological knowledge and predict the characteristics of biological systems.

It is worth noting that a well-known ML algorithm (namely the perceptron, inspired by the behaviour of a neuronal cell [5]) had already been applied to bioinformatics in the eighties to distinguish translation initiation sites in prokaryotic organisms [6]. Starting from this early application, a growing number of CI and ML methods have been applied, and often specifically developed, to deal with a wide range of bioinformatics problems in genomics, proteomics, gene expression analysis, biological evolution, systems biology, and other relevant bioinformatics domains.

Genomics studies biological sequences at the genome level. CI and ML methods have been applied to the reconstruction and sequencing of entire genomes [7,8], to the extraction and identification of the structure of genes [9,10], to the identification and analysis of regulatory non-coding DNA elements [11,12], to the genome-wide identification of genes involved in genetic diseases [13], to the prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms [14], to the identification of RNA structural elements [15], to the modeling of haplotype blocks [16], to splice site prediction [17], to the detection of gene-to-gene interactions in studies of human diseases [18], to the multiple alignment of bio-sequences in phylogenomics [19], and to many other relevant genomics problems.

0933-3657/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.artmed.2008.08.014

In proteomics, the prediction of the secondary and tertiary structure of proteins represents one of the main challenges for CI and ML methods in bioinformatics [20]. Another key problem in proteomics (and in genomics too) is the prediction of the functions of proteins and genes: large-scale sequencing programs make available the sequences of the entire genomes of several organisms but, besides identifying genes, we need to understand their properties and the functions of the corresponding proteins [21,22]. Many other problems in proteomics have been formalized as ML problems, such as fold recognition [23], the prediction of contact maps [24], and protein sub-cellular localization [25].

Gene expression data analysis, and transcriptomics in particular, is another well-established bioinformatics domain where CI and ML methods have been widely applied, providing significant results in several fields of molecular biology and medicine [26,27]. Three kinds of problems have mainly been studied within the community of bioinformaticians, corresponding to three basic questions [28]: (a) class prediction, that is, the determination of the functional state of a cell or tissue from the expression level of its genes [29,30]; (b) gene selection, that is, the identification of genes correlated with the functional state under investigation [4]; (c) class discovery, that is, the analysis of groups (clusters) of co-expressed and functionally correlated genes/examples [31].

Systems biology is an emerging bioinformatics area [32] where ML and CI techniques play a central role. Indeed, modeling biological processes inside cells and, more generally, biological systems requires the development of mathematical models and of learning methods to fit the models to biological data. In particular, probabilistic graphical models have been widely applied to model biological networks [33], ranging from genetic [34] to metabolic [35] and signal transduction networks [36].

ML and CI methods have also been successfully developed and applied in other relevant bioinformatics and bioinformatics-related areas, such as biomedical image analysis, but a thorough overview of this wide and growing research area is far beyond the scope of this editorial.

2. Special issue contents

The special issue presents 13 papers with contributions coming from diverse areas of ML and CI methods for bioinformatics. The papers have been selected after extensive reviews and revisions, starting from about 50 papers submitted to the Fourth International Meeting on Computational Intelligence methods for Bioinformatics and Biostatistics (CIBB 2007), which was held in Portofino Vetta, Ruta di Camogli (Italy) in July 2007 in the framework of the activities of the Special Interest Group in Bioinformatics of the International Neural Network Society. The main goal of CIBB meetings is to provide a forum open to researchers from different disciplines to present and discuss problems concerning computational techniques in bioinformatics and medical informatics, with a particular focus on ML and CI methods.

The papers of this special issue embrace a wide range of bioinformatics areas. At a glance, the papers are subdivided into three main groups, according to three of the main general bioinformatics domains: genomics, transcriptomics and proteomics. Even if for some papers this subdivision is quite schematic (indeed, several research areas span different bioinformatics domains), we follow this broad taxonomy in line with the main subdivisions of bioinformatics research areas adopted by the ML and CI communities [3,4].

The opening paper is the extended version of Joaquin Dopazo's invited talk at CIBB 2007 and provides a general overview of a new research line in functional genomics. Dopazo's paper is included in the genomics section but, considering its wide-ranging domain, it could be included in the transcriptomics section as well.

2.1. Genomics

Dopazo [37] provides a thorough critical discussion of hypothesis formulation and testing in functional genomics, introducing new perspectives for the development of computational methods that relate the available genomic information to the hypotheses that originated the experiments. In this paper the author reviews the main characteristics of functional enrichment methods, by which we can find out whether gene modules, that is, groups of genes related by some common biological property (e.g. Gene Ontology functional modules [38] or Kyoto Encyclopedia of Genes and Genomes pathways [39]), are significantly over-represented among the relevant genes selected in the experiment. The inconsistencies in the way functional hypotheses are tested by functional enrichment methods are analyzed, and new methods, generally known as gene set analysis methods and inspired by systems biology criteria, are critically discussed and reviewed. These methods, recently proposed in the domain of functional genomics, are based on the biological fact that modules, and not single genes, constitute the ultimate functional "bricks" which act cooperatively to carry out functions in the cell. In particular, new supervised and unsupervised methods that attempt to exploit a priori biological knowledge of functional relationships between sets of genes (modules) are discussed, as well as applications of gene set analysis in transcriptomics, large-scale genotyping and phylogenomics.
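The over-representation question behind functional enrichment methods can be illustrated with a one-sided hypergeometric test, which is one common way to formalize it. The sketch below is only an illustration under that assumption and is not the procedure reviewed in [37]; the function name `enrichment_p_value` and the gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_module, n_selected, n_overlap):
    """Upper-tail hypergeometric probability.

    Probability of finding at least n_overlap genes of a module of size
    n_module among n_selected genes drawn without replacement from a
    background of n_genes genes.
    """
    max_overlap = min(n_module, n_selected)
    total = comb(n_genes, n_selected)
    tail = sum(
        comb(n_module, k) * comb(n_genes - n_module, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 of them in the module,
# 6 genes selected by the experiment, 4 of which belong to the module.
p = enrichment_p_value(20, 5, 6, 4)
print(f"p-value = {p:.4f}")  # 540/38760, roughly 0.0139
```

A small p-value suggests that the module is over-represented among the selected genes. In practice many modules are tested at once, so a multiple-testing correction would normally follow.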

Ceccarelli and Maratea [40] address the problem of alternative splicing prediction, a key mechanism for understanding the multiplicity of proteins arising from a relatively low number of genes in eukaryotic organisms. The authors present a supervised ML approach based on support vector machines, using a virtual genetic coding scheme to numerically model the information content of sequences and time series analysis to extract a fixed-length set of features from each sequence. ML recognition of alternatively spliced exons reaches an area under the curve of over 96% on the tested C. elegans data, confirming that the procedure is effective, especially when no a priori biological knowledge is available. As a byproduct of this study, the virtual genetic code based on Shannon information content proposed in this paper could be an attractive option whenever a numerical translation of a biological sequence is needed, and could in principle be applied in other areas of genomics and transcriptomics.

Re and Pavesi [41] address the problem of detecting conserved coding genomic regions through signal processing techniques applied to the analysis of the alignment of nucleotide sequences of different organisms at the level of the entire genome. The authors analyze the discrete Fourier transform spectrum of the signal of the mismatches between two aligned human/mouse sequences of length n. The main idea behind this approach is the biological fact that, in coding regions, mismatches occur predominantly in the third codon position, while they should appear almost randomly in non-coding regions, thus resulting in a higher n/3 frequency component in coding regions. The authors propose measures based on this analysis to unravel the coding potential of genomic regions. This method, which applies signal processing in a comparative genomics framework, can in principle be extended to the analysis of the genomes of other organisms, because of the universality of the genetic code and the selective pressure acting on protein coding regions.
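The period-3 intuition can be sketched on a toy binary mismatch signal (1 for a mismatch, 0 for a match). The ratio used below, the spectral power at index n/3 divided by the mean power over all non-zero frequencies, is an assumption chosen for illustration and is not necessarily one of the measures proposed in [41]; the function names and the toy signals are hypothetical as well.

```python
import cmath


def dft_power(signal, k):
    """Squared magnitude of the k-th discrete Fourier coefficient."""
    n = len(signal)
    coeff = sum(
        x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)
    )
    return abs(coeff) ** 2


def period3_ratio(mismatches):
    """Power at frequency index n/3 relative to the mean non-DC power."""
    n = len(mismatches)
    if n < 3 or n % 3 != 0:
        raise ValueError("signal length must be a positive multiple of 3")
    peak = dft_power(mismatches, n // 3)
    background = sum(dft_power(mismatches, k) for k in range(1, n)) / (n - 1)
    return peak / background if background > 0 else 0.0


# Toy "coding" signal: a mismatch at every third (wobble) position.
coding = [0, 0, 1] * 30
# Toy "non-coding" signal: mismatches with no period-3 structure.
noncoding = [1 if t % 5 == 0 or t % 7 == 0 else 0 for t in range(90)]

print(period3_ratio(coding))     # about 44.5, the largest value possible for n = 90
print(period3_ratio(noncoding))  # smaller: the power is spread over other frequencies
```

The direct summation costs O(n^2) operations, which is acceptable for a toy signal; a fast Fourier transform would be the natural choice for genome-scale windows.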

Nicotra and Micheli [42] tackle the problem of gene function prediction using phylogenetic data. To this end they propose supervised learning methods based on a class of kernels for structured data that leverage a hierarchical probabilistic model of phylogeny among species: a sufficient statistics kernel, a Fisher kernel, and a probability product kernel. The authors introduce kernel adaptivity to the data through the estimation of the parameters of a tree-structured model of evolution, showing an improvement in the classification of functional classes of genes in S. cerevisiae with respect to standard vector-based kernels and non-adaptive tree kernel functions.

Perina et al. [43] address two central problems in medical genetics, related to the localization of genetic regions containing susceptibility genes for genetic diseases: haplotype reconstruction and haplotype block discovery. To this end they propose a new hidden Markov model (HMM) and an inference strategy for learning. The estimation of haplotypes from genetic patterns in unrelated individuals is performed by applying variational learning strategies, thus avoiding the local minima that affect other HMM methods based on the classical expectation-maximization algorithm. Moreover, the proposed fully non-homogeneous HMM is able to segment genotypes into linkage disequilibrium blocks, using the Gini index, a classical statistical measure, to determine the block boundaries. The results are competitive with state-of-the-art systems for haplotype reconstruction and block discovery.

2.2. Transcriptomics

Okun and Priisalu [44] present a paper that opens a new approach to the computer-aided bio-molecular diagnosis of malignancies, explicitly taking into account the complexity that characterizes high-dimensional gene expression and other types of bio-molecular data. The authors propose a supervised ensemble method based on the complexity of the data, obtained by randomly choosing subsets of features (expression levels associated with specific genes) and then selecting only the least complex data through a proper measure of complexity. The authors also show, through an extensive statistical analysis, that there is a direct relationship between the accuracy of the base learners (estimated through a low-bias bolstered error) and the complexity of the data (estimated through an adaptation of the Wilcoxon rank sum test). The proposed new scheme for generating ensembles of classifiers is applied to the analysis of several gene expression data sets, showing that the selection of features/genes leading to less complex data ensures a better performance of the resulting ensemble.

Weightless connectionist models, in which each neuron basically performs Boolean operations and analog-to-digital conversions, are proposed by Muselli et al. [45] in conjunction with recursive feature addition (RFA) techniques to properly select genes related to a specific phenotype. With this technique the authors are able to assign a relevance value to the variable associated with the expression level of each gene and to select the most relevant variables through the RFA approach. The effectiveness of the method is demonstrated on both real and artificial gene expression data, using a recently proposed mathematical model based on the biological concepts of expression signature and expression profile.

In the paper presented by Avogadri and Valentini [46], the authors address the problem of the uncertainty underlying the assignment of examples/patients to clusters in the context of unsupervised gene expression data analysis. This problem is relevant to the discovery of subclasses of pathologies based on the bio-molecular characteristics of patients. To deal with this problem, a fuzzy approach is adopted by applying a fuzzy k-means algorithm to different instances of the data and by using a fuzzy t-norm to combine the multiple clusterings. The multiple instances of the data are obtained by Bernoulli random projections that reduce the high dimensionality of gene expression data without introducing relevant distortions into the data, thus improving both the accuracy and the diversity of the obtained base clusterings. The advantages and limitations of the proposed approach are shown by comparing its accuracy and robustness with those of state-of-the-art clustering ensemble algorithms. Finally, an empirical analysis of the relationships between the accuracy and diversity of the base fuzzy clusterings is provided.
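A random projection of the kind mentioned above can be sketched as follows. The construction used here, a matrix whose entries are +1 or -1 with equal probability, scaled by 1/sqrt(d_out), is one standard Bernoulli-type projection and is assumed for illustration; the exact variant, the fuzzy k-means step and the t-norm combination used in [46] are not reproduced. All names and sizes are hypothetical.

```python
import math
import random


def bernoulli_projection_matrix(d_in, d_out, rng):
    """Return a d_out x d_in matrix with entries +/- 1/sqrt(d_out)."""
    scale = 1.0 / math.sqrt(d_out)
    return [
        [scale * rng.choice((-1.0, 1.0)) for _ in range(d_in)]
        for _ in range(d_out)
    ]


def project(matrix, x):
    """Map a d_in-dimensional vector to d_out dimensions."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


rng = random.Random(0)
d_in, d_out = 2000, 200  # e.g. 2000 expression levels reduced to 200

# Two synthetic "expression profiles" (random data, for illustration only).
a = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
b = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

P = bernoulli_projection_matrix(d_in, d_out, rng)
ratio = math.dist(project(P, a), project(P, b)) / math.dist(a, b)
print(f"projected / original distance = {ratio:.3f}")  # expected to be close to 1
```

Drawing a new matrix for each base clustering yields different low-dimensional views of the same data, which is the source of diversity that a clustering ensemble can exploit.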

The paper of Campadelli et al. [47] is included in this section only for organizational reasons: its research domain is bio-medical image analysis, an important research area related to bioinformatics, especially in the perspective of the integration of multi-source biological data for the diagnosis and outcome prediction of diseases. The paper provides a description and a critical analysis of the state of the art of semi-automatic and automatic liver segmentation techniques, and proposes a new algorithm to deal with most of the problems and drawbacks of the computational methods discussed in the review. Live wire segmentation approaches, gray level based methods, neural networks, Bayesian and model fitting based methods are reviewed, in order to analyze the pros and cons of the different image processing methods that constitute the first step towards automatic liver disease diagnosis and three-dimensional liver rendering. The authors propose a three-step gray level based technique to cope with the high inter- and intra-patient gray level and shape variability, achieving high accuracy in the liver segmentation obtained from 40 abdominal contrast-enhanced computed tomography images.

2.3. Proteomics

Verkhivker's paper [48] focuses on the analysis of the binding mechanisms and molecular signatures of HIV-1 protease (HIV-1 PR) drugs. HIV-1 PR represents an important target for the design of antiviral agents, and in this work the molecular basis of HIV-1 PR inhibition is studied. To this end, Monte Carlo simulations with the conformational ensembles of the HIV-1 PR dimer and monomer structures have been performed, thus enabling a molecular analysis of the active site and dimerization modes of inhibition. The author shows that an acetylated tetrapeptide, Ac-SYEL-OH, can act as both a dimerization inhibitor and a competitive active site inhibitor, and unravels the way in which the peptide NIIGRNLLTQI acts as a folding inhibitor of HIV-1 PR, thus enabling the design of novel inhibitors of HIV-1 protease.

The classification of protein samples with respect to a given phenotype is one of the major goals in quantitative proteomics. When comparing two biological samples measured with liquid chromatography coupled to mass spectrometry, one often observes a nonlinear time deformation between consecutive experiments, which introduces a severe alignment problem.

Fischer et al. [49] address this problem by applying a method based on generalized canonical correlation analysis, in order to improve the estimation of differential protein expression values. In particular, they introduce an adaptive scale space estimation for tuning the complexity of the time-warping functions, and a local model selection procedure for each time axis instead of the usual global model selection procedure. With this novel technique the authors overcome two severe problems of previous approaches: the non-symmetry of the time prediction function and a potential violation of the monotonicity constraint in temporal alignments.

The classification of high-dimensional mass spectrometry measurements represents a challenging CI and ML problem, with significant applications in cancer research.

Schleif et al. [50] propose a supervised prototype-based classifier applied to mass-spectrometric data preprocessed with wavelet techniques; the classifier uses a functional norm that takes into account the specific nature of mass spectra. The prototype-based classifier proposed by the authors is the supervised relevance neural gas (SRNG), whose accuracy, in this context, is comparable with that of state-of-the-art supervised learning algorithms. Moreover, since SRNG generates models that consist of typical points of the data (prototypes for the classes under investigation), the representation of the solution also allows a more natural interpretation of the data from a bio-medical standpoint.

Vassura et al. [51] address the problem of protein structure selection (PSS), that is, the assignment of a given protein to one of a set of 3D structures (named decoys) according to a given distance measure. In the literature, existing methods for solving PSS usually rely on the primary structure of the protein and on protein chemistry, making use of specific energy functions that need to be minimized through suitable optimization methods. By contrast, the authors propose an original approach to the selection of the decoys that are closest to the original (unknown) structures, based solely on geometric and graph-based information. Indeed, they show that graph properties can be used to assess the quality of a prediction of the native state structure of a protein, identifying important connections between properties of decoys and graphs. The results show that, based on simple geometrical properties, graph-based predictions can be as robust as seemingly more sophisticated energy-based scoring of the best decoys, opening new perspectives for solving PSS problems.

We would like to thank all the reviewers for their critical, yet constructive, comments that significantly improved the quality of the papers. We would also like to thank Klaus-Peter Adlassnig, who made possible the publication of this special issue.

References

[1] Goble C, Stevens R. State of the nation in data integration for bioinformatics. Journal of Biomedical Informatics, available online at http://www.sciencedirect.com; in press [accessed 5.08.08].

[2] Baldi P, Brunak S. Bioinformatics: the machine learning approach. Cambridge, MA: MIT Press; 2001.

[3] Cios K, Mamitsuka H, Nagashima T, editors. Computational intelligence techniques in bioinformatics (special issue). Artificial Intelligence in Medicine 2005;35(1-2).

[4] Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, et al. Machine learning in bioinformatics. Briefings in Bioinformatics 2006;7(1):86-112.

[5] Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 1958;65:386-408.

[6] Stormo G, Schneider T, Gold L, Ehrenfeucht A. Use of the perceptron algorithm to distinguish translation initiation sites in E. coli. Nucleic Acids Research 1982;10:2997-3011.

[7] Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature 2001;409(6822):860-921.

[8] Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. The sequence of the human genome. Science 2001;291(5507):1304-51.

[9] Brent M, Guigo R. Recent advances in gene structure prediction. Current Opinion in Structural Biology 2004;14(3):264-72.

[10] Bernal A, Crammer K, Hatzigeorgiou A, Pereira F. Global discriminative learning for higher-accuracy computational gene prediction. PLoS Computational Biology 2007;3(3).

[11] Ratsch G, Sonnenburg S, Srinivasan J, Witte H, Muller K-R, Sommer R-J, et al. Improving the Caenorhabditis elegans genome annotation using machine learning. PLoS Computational Biology 2007;3(2):e20.

[12] Holloway D, Kon M, DeLisi C. Machine learning for regulatory analysis and transcription factor target prediction in yeast. Systems and Synthetic Biology 2007;1(1):25-46.

[13] Lopez-Bigas N, Ouzounis C. Genome-wide identification of genes likely to be involved in human genetic diseases. Nucleic Acids Research 2004;32(10):3108-14.

[14] Bao L, Cui Y. Prediction of the phenotypic effects of non synonymous single nucleotide polymorphisms using structural and evolutionary information. Bioinformatics 2005;21(5):2185-90.

[15] Fogel G, Porto W, Weekes D. Discovery of RNA structural elements using evolutionary computation. Nucleic Acids Research 2002;30(23):5310-7.

[16] Greenspan G, Geiger D. High density linkage disequilibrium mapping using models of haplotype block variations. Bioinformatics 2004;20(S1):137-44.

[17] Saeys Y, Degroeve S, Aeyels D, Rouze P, Van de Peer Y. Feature selection for splice site prediction: a new method using EDA-based feature ranking. BMC Bioinformatics 2004;5(64).

[18] Ritchie M, White BC, Parker JS, Hahn LW, Moore JH. Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases. BMC Bioinformatics 2003;4(28).

[19] Handl J, Kell D, Knowles J. Multiobjective optimization in bioinformatics and computational biology. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2007;4(2):279-92.

[20] Randall A, Cheng J, Sweredoski M, Baldi P. TMBpro: secondary structure, beta-contact and tertiary structure prediction of transmembrane beta-barrel proteins. Bioinformatics 2008;24(4):513-20.

[21] Troyanskaya O, Dolinski K, Owen AB, Altman RB, Botstein D. A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proceedings of the National Academy of Sciences of the United States of America 2003;100:8348-53.

[22] Barutcuoglu Z, Schapire R, Troyanskaya O. Hierarchical multi-label prediction of gene function. Bioinformatics 2006;22(7):830-6.

[23] Raval A, Ghahramani Z, Wild D. A Bayesian network model for protein fold and remote homologue recognition. Bioinformatics 2002;18(6):788-801.

[24] Pollastri G, Baldi P. Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Bioinformatics 2002;18(S1):62-70.

[25] Huang Y, Li Y. Prediction of protein subcellular localization using fuzzy k-nn method. Bioinformatics 2004;20(1):21-8.

[26] Allison D, Cui X, Page G, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nature Reviews Genetics 2006;7(1):55-65.

[27] Wang S, Cheng Q. Microarray analysis in drug discovery and clinical applications. Methods in Molecular Biology 2006;316:49-65.

[28] Dopazo J. Functional interpretation of microarray experiments. OMICS 2006;3(10).

[29] Dudoit S, Fridlyand J, Speed T. Comparison of discrimination methods for the classification of tumors using gene expression data. JASA 2002;97(457):77-87.

[30] Lee Z. An integrated algorithm for gene selection and classification applied to microarray data of ovarian cancer. Artificial Intelligence in Medicine 2008;42(1):81-93.

[31] Handl J, Knowles J, Kell D. Computational cluster validation in post-genomic data analysis. Bioinformatics 2005;21(15):3201-15.

[32] Kitano H. Systems biology: a brief overview. Science 2002;295(5560):1662-4.

[33] Bower J, Bolouri H. Computational modeling of genetic and biochemical networks. MIT Press; 2004.

[34] Friedman N. Inferring cellular networks using probabilistic graphical models. Science 2004;303:799-805.

[35] Green M, Karp P. A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics 2004;5(76).

[36] Steffen M, Petti A, Aach J, D'haeseleer P, Church G. Automated modelling of signal transduction networks. BMC Bioinformatics 2002;3(34).

[37] Dopazo J. Formulating and testing hypothesis in functional genomics. Artificial Intelligence in Medicine 2009;45:97-107.

[38] The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nature Genetics 2000;25:25-9.

[39] Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research 2000;28:27-30.

[40] Ceccarelli M, Maratea A. Virtual genetic coding and time series analysis for alternative splicing prediction in C. elegans. Artificial Intelligence in Medicine 2009;45:109-15.

[41] Re M, Pavesi G. Detecting conserved coding genomic regions through signal processing of nucleotide substitution patterns. Artificial Intelligence in Medicine 2009;45:117-23.

[42] Nicotra L, Micheli A. Modeling adaptive kernels from probabilistic phylogenic trees. Artificial Intelligence in Medicine 2009;45:125-34.

[43] Perina A, Cristani M, Xumerle L, Murino V, Pignatti P, Malerba G. FNH-HMM double net: a Bayesian network for haplotype reconstruction and haplotype block discovery. Artificial Intelligence in Medicine 2009;45:135-50.

[44] Okun O, Priisalu H. Dataset complexity in gene expression based cancer classification using ensembles of k-nearest neighbors. Artificial Intelligence in Medicine 2009;45:151-62.

[45] Muselli M, Costacurta M, Ruffino F. Evaluating gene selection methods through artificial and real gene expression data. Artificial Intelligence in Medicine 2009;45:163-71.

[46] Avogadri R, Valentini G. Fuzzy ensemble clustering based on random projections for DNA microarray data analysis. Artificial Intelligence in Medicine 2009;45:173-83.

[47] Campadelli P, Casiraghi E, Esposito A. Liver segmentation from CT scans: a survey and a new algorithm. Artificial Intelligence in Medicine 2009;45:185-96.

[48] Verkhivker G. Computational proteomics analysis of binding mechanisms and molecular signatures of the HIV-1 protease drugs. Artificial Intelligence in Medicine 2009;45:197-206.

[49] Fischer B, Roth V, Buhmann J. Adaptive bandwidth selection for biomarker discovery in mass spectrometry. Artificial Intelligence in Medicine 2009;45:207-14.

[50] Schleif F, Villmann T, Kostrzewa M, Hammer B, Gammerman A. Cancer informatics by prototype networks in mass spectrometry. Artificial Intelligence in Medicine 2009;45:215-28.

[51] Vassura M, Margara L, Fariselli P, Casadio R. A graph theoretic approach to protein structure selection. Artificial Intelligence in Medicine 2009;45:229-37.

Giorgio Valentini*
DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39, Milan, Italy

Roberto Tagliaferri
DMI, Dipartimento di Matematica e Informatica, Università di Salerno, Via Ponte don Melillo, Fisciano (SA), Italy

Francesco Masulli a,b
a DISI, Dipartimento di Informatica e Scienze dell'Informazione, Università di Genova, Via Dodecaneso 35, Genoa, Italy
b Center for Biotechnology, Temple University, 1900 N 12th Street, Philadelphia, PA 19122, USA

*Corresponding author. Tel.: +39 02 50316225; fax: +39 02 50316373

E-mail addresses: [email protected] (G. Valentini), [email protected] (R. Tagliaferri), [email protected] (F. Masulli)