Tecniche di Intelligenza Artificiale in Bioinformatica (Artificial Intelligence Techniques in Bioinformatics)
Università degli Studi di Ferrara, ENDIF – Dipartimento di Ingegneria
Giacomo Gamberoni



Page 2: Data Mining in Bioinformatics

  • Genetic data from comparative experiments (normal vs. cancer)
  • Data provided by Dipartimento di Morfologia ed Embriologia, Università di Ferrara (Dott. Stefano Volinia)
  • Software used: Weka, Matlab, MySQL

Page 3: Microarray Experiments

  • The slide is prepared by fixing base sequences (ESTs) at specific points (spots) on the glass
  • Two mRNA samples from two cell populations, coloured with different fluorescent dyes, are hybridized on the slide
  • Scanning the slide, we measure the fluorescence intensities of the two channels in each spot

Page 4: Dataset normalization

  • Keep only spots with good intensity in at least 75% of the samples
  • Log ratio: $\log_2(\mathrm{red} / \mathrm{green})$
  • Subtract the median of the ratios in each spot
  • Divide by the SD of each spot
  • Keep only spots with at least one sample significantly expressed (Log Ratio > 1.5)

         s1     s2     s3     s4
Class    C      C      N      C
EST1     0.2    1.2   -2.3   -0.7
EST2    -1.1    2.5    0.7    0.3
EST3     0.9    1.2    2.3   -0.6
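
The normalization steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the slide does not say which SD estimator was used (the sample SD is assumed here), the "good intensity" test is represented by precomputed boolean flags, and the final filter is applied with the signed comparison exactly as written on the slide.

```python
import math
import statistics


def enough_good_samples(good_flags, min_fraction=0.75):
    """Keep a spot only if enough samples had a good intensity.

    good_flags holds one boolean per sample; how "good" is decided is not
    stated on the slide, so the flags are assumed to be given.
    """
    return sum(good_flags) / len(good_flags) >= min_fraction


def log_ratio(red, green):
    """Base-2 log ratio of the two channel intensities for one spot."""
    return math.log2(red / green)


def normalize_spot(ratios):
    """Center one spot (one row) on its median and scale by its SD."""
    med = statistics.median(ratios)
    sd = statistics.stdev(ratios)  # assumption: sample standard deviation
    return [(r - med) / sd for r in ratios]


def is_informative(values, threshold=1.5):
    """Keep a spot if at least one sample exceeds the threshold."""
    return any(v > threshold for v in values)


# Rows of the example table, treated here as already normalized values.
table = {
    "EST1": [0.2, 1.2, -2.3, -0.7],
    "EST2": [-1.1, 2.5, 0.7, 0.3],
    "EST3": [0.9, 1.2, 2.3, -0.6],
}
kept = [name for name, values in table.items() if is_informative(values)]
print(kept)  # ['EST2', 'EST3']
```

If the intended filter is on the absolute log ratio, `is_informative` would compare `abs(v)` instead, and EST1 (with -2.3 in s3) would also be kept.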

Page 5: Datasets analyzed

  • Hepatocellular Carcinoma
      • Reference: artificial mRNA pool
      • 7449 ESTs for 161 samples
          • 95 Cancer: 82 HBV+, 3 HCV+, 10 no Hepatitis antibodies
          • 66 Normal: 47 HBV+, 5 HCV+, 14 no Hepatitis antibodies
  • Larynx squamous cell carcinoma
      • Reference: normal larynx
      • 7626 ESTs for 22 samples
          • 11 lymph node negative (N0)
          • 11 lymph node positive (N+)

Page 6: Supervised/unsupervised learning

  • Supervised learning
      • Decision tree
      • Support vector machines
  • Unsupervised learning
      • Hierarchical clustering
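
As an illustration of the unsupervised side, the following is a minimal pure-Python sketch of agglomerative hierarchical clustering. The slides do not state the distance measure or the linkage rule, so Euclidean distance and average linkage are assumptions, and the toy points are invented. The analyses in the slides were run with Weka and Matlab, not with this code.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy profiles: two well-separated groups of samples.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(agglomerate(toy_points, 2))  # [[0, 1], [2, 3]]
```

Recording the order and distance of each merge, instead of stopping at a fixed number of clusters, gives the information needed to draw a dendrogram.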

Page 7: Results

  • Clustering dendrogram [figure]
  • Decision tree:

358885 <= 0.719385542
|   740476 <= 0.856739394
|   |   626619 <= 0.552788235
|   |   |   451711 <= -0.84774
|   |   |   |   786690 <= -0.116917241: HBV+ (5.0)
|   |   |   |   786690 > -0.116917241: HBV- (4.0)
|   |   |   451711 > -0.84774: HBV+ (107.0/1.0)
|   |   626619 > 0.552788235
|   |   |   310406 <= -0.162467: HBV- (6.0)
|   |   |   310406 > -0.162467: HBV+ (12.0/1.0)
|   740476 > 0.856739394
|   |   344648 <= 0.051885057: HBV- (10.0)
|   |   344648 > 0.051885057: HBV+ (7.0/1.0)
358885 > 0.719385542: HBV- (10.0/1.0)
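
The decision tree on this slide can be read as a chain of nested threshold tests. The sketch below transcribes it into a Python function; the function name and the dictionary input format are hypothetical, and the numeric keys are taken as printed in the tree (presumably clone identifiers of the ESTs).

```python
def classify_hbv(expr):
    """Classify one sample as "HBV+" or "HBV-" using the tree from the slide.

    expr maps each identifier used in the tree (as a string) to the
    normalized expression value of that spot in the sample.
    """
    if expr["358885"] > 0.719385542:
        return "HBV-"
    if expr["740476"] > 0.856739394:
        return "HBV-" if expr["344648"] <= 0.051885057 else "HBV+"
    if expr["626619"] > 0.552788235:
        return "HBV-" if expr["310406"] <= -0.162467 else "HBV+"
    if expr["451711"] > -0.84774:
        return "HBV+"
    return "HBV+" if expr["786690"] <= -0.116917241 else "HBV-"


# Hypothetical sample with every tested spot at 0.0.
sample = {key: 0.0 for key in
          ("358885", "740476", "626619", "451711", "786690", "310406", "344648")}
print(classify_hbv(sample))  # HBV+
```

If this is J48-style output from Weka, a leaf annotation such as (107.0/1.0) means that 107 training instances reached the leaf and 1 of them was misclassified. Under that reading the leaves cover 161 instances in total, which matches the number of samples in the hepatocellular carcinoma dataset.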

Page 8: Gene correlation

  • Analysis of the correlation between the expression of different genes
  • Study of the expression of every possible pair of genes
      • Computational complexity
  • Integration with extra knowledge
      • Genetic annotation (Gene Ontology)
      • Chromosome location
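
The complexity concern comes from the number of pairs, which grows quadratically with the number of genes. The sketch below shows an all-pairs correlation pass and the pair count; the slides do not name the correlation measure, so Pearson correlation is an assumption, and the function names and toy profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return float("nan")  # a constant profile has no defined correlation
    return cov / (norm_x * norm_y)


def all_pair_correlations(profiles):
    """Correlation for every unordered pair of genes in profiles."""
    return {
        (g1, g2): pearson(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


def n_pairs(n_genes):
    """Number of unordered gene pairs: n * (n - 1) / 2."""
    return n_genes * (n_genes - 1) // 2


# Hypothetical toy profiles over three samples.
toy = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0], "g3": [3.0, 2.0, 1.0]}
correlations = all_pair_correlations(toy)
print(len(correlations))  # 3
print(n_pairs(7449))      # 27740076
```

For the 7449 ESTs of the hepatocellular carcinoma dataset this is roughly 27.7 million pairs, which is why restricting the analysis with extra knowledge (annotation, chromosome location) is attractive.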

Page 9: Intra-gene relations

Studying intra-gene relations we can obtain useful results for:

  • Quality control
      • Different ESTs from the same UGC (UniGene Cluster) should be equally expressed
      • A bad correlation between these ESTs may be due to experimental error
  • Chromosomal aberration
      • We can highlight parts of genes that lose correlation

UGC          ESTs   Possible pairs   Total Rel.   LIV Rel.   HCC Rel.
Hs.315379    6      15               12           3          13
Hs.306864    5      10               7            4          8
Hs.381184    4      6                6            2          6
Hs.8207      4      6                2            0          2
Hs.386834    4      6                3            1          3
Hs.386784    4      6                0            0          0
Hs.355608    4      6                0            1          0
Hs.236456    4      6                1            0          2
Hs.168913    4      6                2            0          2
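
The number of possible pairs in the table is the number of unordered pairs of ESTs within a cluster, C(n, 2). A short sketch of the grouping and counting step follows; the function names and the EST identifiers in the example mapping are hypothetical, and only the cluster name is taken from the table.

```python
from collections import defaultdict
from math import comb


def group_by_cluster(est_to_ugc):
    """Group EST identifiers by the UniGene Cluster they belong to."""
    clusters = defaultdict(list)
    for est, ugc in est_to_ugc.items():
        clusters[ugc].append(est)
    return dict(clusters)


def possible_pairs(n_ests):
    """Number of unordered EST pairs inside one cluster."""
    return comb(n_ests, 2)


# Hypothetical EST identifiers assigned to one cluster from the table.
mapping = {"est_a": "Hs.8207", "est_b": "Hs.8207",
           "est_c": "Hs.8207", "est_d": "Hs.8207"}
for ugc, ests in group_by_cluster(mapping).items():
    print(ugc, len(ests), possible_pairs(len(ests)))  # Hs.8207 4 6
```

The counts agree with the table: 6 ESTs give 15 pairs, 5 give 10, and 4 give 6.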

Page 10: Relations in Processes

  • Study relations between the genes involved in the same biological processes
      • Biological processes as defined by the Gene Ontology
      • Highlight differences in gene correlations between normal and cancer

Biological Process                            Total Rel.   Only HCC   Only LIV
immune response                               109          83         16
regulation of transcription, DNA-dependent    61           2          56
antigen presentation, exogenous antigen       46           40         0
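
One plausible way to obtain the three columns is with set operations over the relations found in each condition. The sketch below assumes, without confirmation from the slides, that "Total Rel." counts relations found in either condition and that the "Only" columns count relations found in exactly one of them; the function name and the gene pairs are hypothetical.

```python
def compare_relations(hcc_pairs, liv_pairs):
    """Count relations found in either condition and in only one of them.

    Each relation is a pair of gene identifiers; pairs are stored as
    frozensets so that (a, b) and (b, a) count as the same relation.
    """
    hcc = {frozenset(pair) for pair in hcc_pairs}
    liv = {frozenset(pair) for pair in liv_pairs}
    return {
        "total": len(hcc | liv),      # assumption: union of both conditions
        "only_hcc": len(hcc - liv),
        "only_liv": len(liv - hcc),
    }


# Hypothetical relations inside one biological process.
hcc_relations = [("geneA", "geneB"), ("geneA", "geneC")]
liv_relations = [("geneB", "geneA"), ("geneC", "geneD")]
print(compare_relations(hcc_relations, liv_relations))
# {'total': 3, 'only_hcc': 1, 'only_liv': 1}
```

Under this reading, the immune response row would correspond to 109 - 83 - 16 = 10 relations shared by both conditions.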

Page 11: Present Activities

  • Development of a web-based interface to make several algorithms available to biologists (PHP, Java)
  • Implementation of some algorithms as plug-ins of an open source analysis suite (Java)
  • Extension of our algorithms in order to analyze other data sources:
      • SAGE data
      • Affymetrix data

Page 12: Publications

  • Giacomo Gamberoni, Evelina Lamma, Sergio Storari, Diego Arcelli, Francesca Francioso and Stefano Volinia. Exploiting supervised and unsupervised learning techniques for profiling cancer data. Presented at the workshop Data Mining in Functional Genomics and Proteomics at ECAI 2004.
  • Giacomo Gamberoni and Sergio Storari. Supervised and unsupervised learning techniques for profiling SAGE results. Presented at the Discovery Challenge at ECML/PKDD 2004.

Page 13: Publications

  • Giacomo Gamberoni, Evelina Lamma, Sergio Storari, Diego Arcelli, Francesca Francioso and Stefano Volinia. Correlation of expression between different IMAGE clones from the same UniGene Cluster. Presented at ISBMDA 2004; published in Biological and Medical Data Analysis, Lecture Notes in Computer Science 3337, Springer.