Artificial Intelligence Techniques in Bioinformatics
Università degli Studi di Ferrara
ENDIF – Dipartimento di Ingegneria
Giacomo Gamberoni

Data Mining in Bioinformatics
Genetic data from comparative experiments (normal-cancer)
Data provided by Dipartimento di morfologia ed embriologia – Università di Ferrara (Dott. Stefano Volinia)
Software used: Weka, Matlab, MySQL

Microarray Experiments
The slide is prepared by fixing base sequences (ESTs) at specific points (spots) on the glass.
Two mRNA samples from two cell populations, coloured with different fluorescent dyes, are hybridized on the slide.
Scanning the slide, we measure the fluorescence intensities of the two channels in each spot.

Dataset normalization
Keep only spots with good intensity in at least 75% of the samples
Log ratio: $\log_2(\mathrm{red}/\mathrm{green})$
Subtract the median of the ratios in each spot
Divide by the SD of each spot
Keep only spots with at least one sample significantly expressed (Log Ratio > 1.5)

         s1     s2     s3     s4
Class    C      C      N      C
EST1     0.2    1.2    -2.3   -0.7
EST2     -1.1   2.5    0.7    0.3
EST3     0.9    1.2    2.3    -0.6
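
The normalization steps listed above can be sketched for a single spot as follows. This is only an illustration under stated assumptions, not the procedure actually used: the function name `normalize_spot`, the intensity cutoff `min_intensity` (the slides do not say what "good intensity" means), the use of the absolute value in the final filter, and the choice to apply the 1.5 threshold after centering and scaling are all assumptions.

```python
import math
import statistics

def normalize_spot(red, green, min_intensity=100.0, min_fraction=0.75, threshold=1.5):
    """Return the normalized log ratios of one spot, or None if the spot is discarded.

    red, green: per-sample intensities of the two channels for this spot.
    min_intensity is a hypothetical cutoff for "good intensity".
    """
    # Step 1: keep the spot only if enough samples have good intensity.
    good = [r >= min_intensity and g >= min_intensity for r, g in zip(red, green)]
    if sum(good) < min_fraction * len(red):
        return None

    # Step 2: log2(red / green) for the good samples; bad samples stay missing.
    ratios = [math.log2(r / g) if ok else None for r, g, ok in zip(red, green, good)]
    valid = [x for x in ratios if x is not None]
    if len(valid) < 2:
        return None

    # Steps 3 and 4: subtract the spot median, divide by the spot SD.
    median = statistics.median(valid)
    sd = statistics.stdev(valid)
    if sd == 0:
        return None
    scaled = [None if x is None else (x - median) / sd for x in ratios]

    # Step 5: keep the spot only if at least one sample exceeds the threshold
    # (absolute value is an assumption; the slide only says "Log Ratio > 1.5").
    if not any(x is not None and abs(x) > threshold for x in scaled):
        return None
    return scaled
```

Applying the function to every spot and dropping the `None` results would give a table of the kind shown above, with one row per EST and one column per sample.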

Datasets analyzed
Hepatocellular Carcinoma
  Reference: artificial mRNA pool
  7449 ESTs for 161 samples
  95 Cancer: 82 HBV+, 3 HCV+, 10 no Hepatitis antibodies
  66 Normal: 47 HBV+, 5 HCV+, 14 no Hepatitis antibodies
Larynx squamous cell carcinoma
  Reference: normal larynx
  7626 ESTs for 22 samples
  11 lymph node negative (N0)
  11 lymph node positive (N+)

Supervised/unsupervised learning
Supervised learning
  Decision tree
  Support vector machines
Unsupervised learning
  Hierarchical clustering

Results
Decision tree

358885 <= 0.719385542
|   740476 <= 0.856739394
|   |   626619 <= 0.552788235
|   |   |   451711 <= -0.84774
|   |   |   |   786690 <= -0.116917241: HBV+ (5.0)
|   |   |   |   786690 > -0.116917241: HBV- (4.0)
|   |   |   451711 > -0.84774: HBV+ (107.0/1.0)
|   |   626619 > 0.552788235
|   |   |   310406 <= -0.162467: HBV- (6.0)
|   |   |   310406 > -0.162467: HBV+ (12.0/1.0)
|   740476 > 0.856739394
|   |   344648 <= 0.051885057: HBV- (10.0)
|   |   344648 > 0.051885057: HBV+ (7.0/1.0)
358885 > 0.719385542: HBV- (10.0/1.0)

Clustering dendrogram (figure not reproduced)
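
To make the decision tree above easier to read, it can be transcribed directly into a chain of conditionals. The sketch below assumes a sample is given as a dictionary from EST identifier (as a string) to normalized log ratio; the function name `classify_hbv` and that input layout are illustrative choices, while the thresholds and class labels are copied from the tree.

```python
def classify_hbv(sample):
    """Classify one sample as "HBV+" or "HBV-" by following the tree's branches.

    sample: dict mapping EST id (str) to its normalized log ratio.
    """
    # Root: a high value of EST 358885 leads straight to an HBV- leaf.
    if sample["358885"] > 0.719385542:
        return "HBV-"
    # Right branch of EST 740476 is decided by EST 344648.
    if sample["740476"] > 0.856739394:
        return "HBV-" if sample["344648"] <= 0.051885057 else "HBV+"
    # Right branch of EST 626619 is decided by EST 310406.
    if sample["626619"] > 0.552788235:
        return "HBV-" if sample["310406"] <= -0.162467 else "HBV+"
    # The largest leaf of the tree: HBV+ (107.0/1.0).
    if sample["451711"] > -0.84774:
        return "HBV+"
    # Deepest split, on EST 786690.
    return "HBV+" if sample["786690"] <= -0.116917241 else "HBV-"
```

Only the ESTs on the path actually followed are read, so a sample needs values for at most five of the seven ESTs that appear in the tree.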

Gene correlation
Analysis of the correlation between the expression of different genes
Study of the expression of every possible pair of genes
  Computational complexity
Integration with extra knowledge
  Genetic annotation (Gene Ontology)
  Chromosome location
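
The computational cost of studying every pair of genes comes from the number of pairs, n(n-1)/2, which for the 7449 ESTs of the hepatocellular dataset is 27,740,076. The sketch below illustrates an exhaustive pairwise scan. The slides do not say which correlation measure was used, so the Pearson coefficient, the cutoff `min_abs_r` and the function names are assumptions made for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(profiles, min_abs_r=0.8):
    """Scan every pair of genes and keep those with |r| >= min_abs_r.

    profiles: dict mapping gene id to its list of expression values.
    min_abs_r is a hypothetical cutoff, not a value from the slides.
    """
    hits = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= min_abs_r:
            hits.append((a, b, r))
    return hits

def pair_count(n_genes):
    """Number of unordered gene pairs the scan has to visit."""
    return n_genes * (n_genes - 1) // 2
```

Restricting the scan to genes that share an annotation or a chromosome location, as the integration with extra knowledge suggests, reduces the number of pairs that must be examined.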

Intra-gene relations
Studying intra-gene relations, we can obtain useful results for:
Quality control
  Different ESTs from the same UniGene Cluster (UGC) should be equally expressed
  A poor correlation between these ESTs may be due to experimental error
Chromosomal aberration
  We can highlight parts of genes that lose correlation

UGC          ESTs   Possible pairs   Total Rel.   LIV Rel.   HCC Rel.
Hs.315379    6      15               12           3          13
Hs.306864    5      10               7            4          8
Hs.381184    4      6                6            2          6
Hs.8207      4      6                2            0          2
Hs.386834    4      6                3            1          3
Hs.386784    4      6                0            0          0
Hs.355608    4      6                0            1          0
Hs.236456    4      6                1            0          2
Hs.168913    4      6                2            0          2
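
The first columns of the table above follow from simple counting: a cluster with n ESTs has n(n-1)/2 possible pairs (15 for 6 ESTs, 10 for 5, 6 for 4). The sketch below shows how such a summary could be assembled once a set of related EST pairs is available. The function name, the input layout and the identifiers in the usage example are hypothetical, and how a pair is judged "related" is left outside the sketch.

```python
from collections import defaultdict
from itertools import combinations

def intra_cluster_summary(est_to_ugc, related_pairs):
    """Count possible and related EST pairs inside each UniGene cluster.

    est_to_ugc: dict mapping EST id to the UGC it belongs to.
    related_pairs: iterable of (est_a, est_b) pairs already judged related.
    """
    clusters = defaultdict(list)
    for est, ugc in est_to_ugc.items():
        clusters[ugc].append(est)

    # frozenset makes the pair order irrelevant: (a, b) equals (b, a).
    related = {frozenset(pair) for pair in related_pairs}

    summary = {}
    for ugc, ests in clusters.items():
        pairs = [frozenset(pair) for pair in combinations(ests, 2)]
        summary[ugc] = {
            "ests": len(ests),
            "possible": len(pairs),
            "related": sum(1 for pair in pairs if pair in related),
        }
    return summary
```

For example, with hypothetical identifiers:

```python
est_to_ugc = {"e1": "Hs.A", "e2": "Hs.A", "e3": "Hs.A", "e4": "Hs.A"}
summary = intra_cluster_summary(est_to_ugc, [("e1", "e2"), ("e3", "e1")])
# summary["Hs.A"] == {"ests": 4, "possible": 6, "related": 2}
```

Running the same summary separately on the normal and the cancer samples would give per-condition counts like the LIV and HCC columns.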

Relations in Processes
Study relations between the genes involved in the same biological processes
  Biological processes as defined by the Gene Ontology
  Highlight differences in gene correlations between normal and cancer

Biological Process                            Total Rel.   Only HCC   Only LIV
immune response                               109          83         16
regulation of transcription, DNA-dependent    61           2          56
antigen presentation, exogenous antigen       46           40         0
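
A comparison of this kind can be sketched with set operations on the related gene pairs found in each condition. The code below is an assumption-laden illustration: it takes "Total Rel." to mean the pairs related in at least one of the two conditions, which the slides do not state, and the function name and input layout are hypothetical.

```python
def compare_relations(process_genes, hcc_pairs, liv_pairs):
    """Per biological process, count relations seen in cancer (HCC) and normal (LIV).

    process_genes: dict mapping a process name to the genes annotated with it.
    hcc_pairs, liv_pairs: iterables of (gene_a, gene_b) related pairs.
    """
    hcc = {frozenset(pair) for pair in hcc_pairs}
    liv = {frozenset(pair) for pair in liv_pairs}

    report = {}
    for process, genes in process_genes.items():
        members = set(genes)
        # Keep only pairs whose two genes both belong to this process.
        in_hcc = {pair for pair in hcc if pair <= members}
        in_liv = {pair for pair in liv if pair <= members}
        report[process] = {
            "total": len(in_hcc | in_liv),      # assumed meaning of "Total Rel."
            "only_hcc": len(in_hcc - in_liv),
            "only_liv": len(in_liv - in_hcc),
        }
    return report
```

Processes where one of the two "only" counts dominates are the ones in which gene correlations differ most between normal and cancer tissue.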

Present Activities
Development of a web-based interface to make several algorithms available for biologists (PHP, JAVA)
Implementation of some algorithms as plug-ins of an open source analysis suite (JAVA)
Extension of our algorithms in order to analyze other data sources:
  SAGE data
  Affymetrix data

Publications
Giacomo Gamberoni, Evelina Lamma, Sergio Storari, Diego Arcelli, Francesca Francioso and Stefano Volinia. Exploiting supervised and unsupervised learning techniques for profiling cancer data. Presented at Workshop: Data Mining in Functional Genomics and Proteomics in ECAI 2004.
Giacomo Gamberoni and Sergio Storari. Supervised and unsupervised learning techniques for profiling SAGE results. Presented at Discovery Challenge in ECML/PKDD 2004.
Giacomo Gamberoni, Evelina Lamma, Sergio Storari, Diego Arcelli, Francesca Francioso and Stefano Volinia. Correlation of expression between different IMAGE clones from the same UniGene Cluster. Presented in ISBMDA 2004; published in Biological and Medical Data Analysis, Lecture Notes in Computer Science 3337, Springer.