Bioinformatics for Microarray Studies at IBS
Pei-Ing Hwang, Ph.D.
Mar. 24, 2005
TIGP

Different aspects for life science research
  genomics
  transcriptomics
  proteomics

Building blocks for DNA or RNA
  DNA: A, T, G, C
  RNA: A, U, G, C

DNA: deoxyribonucleic acid
  Double stranded
  Antiparallel

Why microarray?
  Gene Expression
    To simultaneously study multiple genes
    To obtain an overview of gene expression at the transcriptional level under specific experimental conditions
    To study gene interaction networks from the transcriptional aspect
  Genome
    SNP detection
    To find recombination sites in the chromosome/genome
    Hopefully to discover the gene responsible for a genetic disease

Outline
  Introduction to Microarray experiments
  Experiences at IBS for the cDNA arrays
    Data generated with microarray
    DNA annotation
    Data Analysis
    Data Management

About Microarray Technology-1
  Up to hundreds of thousands of spots in a fixed area on a glass slide or a membrane
  One species of DNA molecule per spot
    A spot is also called a “feature”
    DNA fixed on the chip or membrane is also called a “probe”
  The sequence and/or function of each DNA species on the spot is known.

About Microarray Technology-2
  Making use of the “hybridization method”
    A : T, U
    G : C
  Image processing
  Data analysis
  Result interpretation from the biology aspect
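
The pairing rule above (A with T or U, G with C) can be illustrated with a short Python sketch. The function names and toy sequences are hypothetical and not from the slides; the sketch assumes unambiguous bases and perfect, full-length pairing, which real hybridization does not require.

```python
# Watson-Crick pairing used by hybridization: A:T and G:C (A:U when the partner is RNA).
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the antiparallel complementary DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def hybridizes(probe: str, target: str) -> bool:
    """Toy check (assumption): the target must pair perfectly with the whole probe."""
    target_as_dna = target.upper().replace("U", "T")  # treat RNA U like DNA T
    return reverse_complement(probe) == target_as_dna


print(reverse_complement("ATGC"))   # GCAT
print(hybridizes("ATGC", "GCAU"))   # True
```

Because the two strands are antiparallel, the complementary strand is read in reverse, which is why the probe ATGC pairs with GCAU rather than UACG.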

Types of Microarray
  Types of DNA immobilized on the solid support
    cDNA vs. oligonucleotides
  Manufacturing methods
    Printing vs. photolithography
  Solid support
    Glass slides
    Membrane
  Nucleotide labeling (slide scanning condition)
    One color vs. two colors

GeneChip® Array Manufacturing
Figure 1. Affymetrix uses a unique combination of photolithography and combinatorial chemistry to manufacture GeneChip® Arrays.

Microarray printing machine
http://arrayit.com/Products/MicroarrayI/NanoPrint/Nano-Print-new-600.jpg

Data Analyses
  Feature intensity acquisition
    Image analyses
  To identify differentially expressed genes
    Normalization (global, local, print-tip, between-array, etc.)
  Clustering or Classification
  Analyses from biology aspect
    Significant genes
    Transcriptional regulation study
    Cellular pathway or network finding
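
As one illustration of the "global" normalization listed above, here is a minimal sketch of global median centering for a two-color array. The slides do not say which formula was used at IBS; the function name, the input layout and the toy intensities are assumptions, and the intensities are assumed to be positive and already background-corrected.

```python
import math
from statistics import median


def global_median_normalize(red, green):
    """Return log2(red/green) ratios shifted so that their median is 0."""
    # Assumption: red and green are equal-length lists of positive feature intensities.
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    center = median(log_ratios)
    return [m - center for m in log_ratios]


red = [200.0, 400.0, 800.0]     # made-up channel 1 intensities
green = [100.0, 100.0, 100.0]   # made-up channel 2 intensities
print(global_median_normalize(red, green))  # [-1.0, 0.0, 1.0]
```

The underlying assumption is that most genes are not differentially expressed, so the typical log ratio should be zero after correction. Local and print-tip methods apply the same idea within subsets of spots.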

Experiences at IBS for the cDNA arrays

About IBS tomato arrays
  ~13000 spots/features per chip
  1 clone per spot
  cDNA clones from about a dozen different cDNA libraries
  At least two different protocols were followed and six different vectors were used
  More than ten technicians involved

Bioinformatics for Microarray at IBS (cont’d)
  IBS tomato EST database construction
  Installation, management and maintenance of data analysis software
  Reference information searching
  Batch submission of EST sequences

Bioinformatics Needs for Microarray Studies at IBS
  Pre-arraying data management
    cDNA info collection, vector trimming, sequence annotation, EST submission, etc.
  Array information management
    Gene set characterization, data storage, data retrieval
  Post-hybridization data analysis and management
    Array data analyses, storage of the scanning result, biology-oriented bioinformatics analyses

Bioinformatics Service Work for Microarray Studies at IBS
  Data pre-processing for the cDNAs
    Clone id assignment
    Sequence trimming
    Gene annotation
    Function classification
  Data sheet preparation for commercial software to analyze microarray data
    GAL file preparation for GenePix Pro
    Master Gene List preparation for GeneSpring

[Workflow diagram. Boxes: cDNA clones; PCR; sequencing; vector trimming; assembly; function annotation; database; GenePix (feature intensities, normalization); Spotfire, GeneSpring (data analysis: normalization, variance, clustering); biological meaning (pathway analysis, transcription network, gene-gene interaction)]

Pre-array Bioinformatics
clones from labs
sequencing
Raw EST seq
1. Clone id generation
2. Vector Trimming
3. Sequence assembly
4. Seq annotation (BLAST)
5. EST submission to NCBI
6. Database construction
Data Processing and Management

Clone id generation
  Data centralization following sequencing
  Rules for re-arraying
    96 well plate to/from 384 well
    PCR from 96 well and spotting from 384 well
    Order of A1, A2, B1, B2
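
A possible reading of the re-arraying rule is sketched below, assuming the common convention in which four 96-well plates are interleaved into one 384-well plate and their A1 wells land on 384-well positions A1, A2, B1 and B2 in that order. The slides do not spell out the exact IBS rule, so the function name and the quadrant convention are assumptions.

```python
import string

ROWS_96 = "ABCDEFGH"                      # 8 rows x 12 columns
ROWS_384 = string.ascii_uppercase[:16]    # rows A..P, 24 columns


def well_96_to_384(plate_index: int, well: str) -> str:
    """Map a well of 96-well plate 1-4 to its position on the 384-well plate."""
    if plate_index not in (1, 2, 3, 4):
        raise ValueError("plate_index must be 1, 2, 3 or 4")
    row96 = ROWS_96.index(well[0].upper())   # 0..7
    col96 = int(well[1:]) - 1                # 0..11
    if not 0 <= col96 < 12:
        raise ValueError("column must be 1-12")
    # Assumed quadrant order: plate 1 -> A1, plate 2 -> A2, plate 3 -> B1, plate 4 -> B2.
    row_offset, col_offset = divmod(plate_index - 1, 2)
    return f"{ROWS_384[2 * row96 + row_offset]}{2 * col96 + col_offset + 1}"


for plate in (1, 2, 3, 4):
    print(plate, well_96_to_384(plate, "A1"))   # A1, A2, B1, B2
```

Keeping this mapping in one function makes it possible to trace a spotted feature back from its 384-well position to the 96-well plate used for PCR, and from there to the clone id.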

Data collection
  Raw sequencing data obtained from the sequencing company
  Organized and stored both ABI and text files by labs and by date
  Confirmed with each sequence contributor for clone info
  Clone id matched with raw sequences

Processing the sequencing data
  cDNA library procedures confirmed with each single lab
  Vector/linker/primer trimming (Seqclean)
  Function annotation
    Blast against different databases
    Gene Ontology annotation
  Sequence Assembly (Phrap)
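
To show what the trimming step does, here is a deliberately simplified sketch. It is not how Seqclean works: it only strips exact vector matches at the read ends, and the function name and the toy vector sequences are made up for illustration.

```python
def trim_vector(read: str, vector_5p: str, vector_3p: str) -> str:
    """Remove an exact 5' vector prefix and an exact 3' vector suffix, if present."""
    # Assumption: vector ends appear verbatim; real tools tolerate mismatches.
    seq = read.upper()
    if vector_5p and seq.startswith(vector_5p):
        seq = seq[len(vector_5p):]
    if vector_3p and seq.endswith(vector_3p):
        seq = seq[:-len(vector_3p)]
    return seq


# Made-up read: vector end + insert + vector end
print(trim_vector("GGATCCATGAAACTCGAG", "GGATCC", "CTCGAG"))  # ATGAAA
```

Because several vectors and protocols were used, a real pipeline would need the matching vector sequences for each library, which is why the library procedures had to be confirmed with each lab first.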

IBS tomato EST Database
  Cloning information
  Sequencing data
  Vector/adaptor trimming information
  EST assembly
  Function annotation
  Cross reference

The Tomato Database Entity-Relationship model
  ID MAP
    1. Seq id; 2. Clone_id; 3. Contig id; 4. Lab_id#1; 5. Lab_id#2; 6. NCBI_sbmt_id93; 7. NCBI_sbmt_id94; 8. dbEST_accn_no; 9. note
  Trimmed Sequence
    1. Seq id; 2. Trimmed Sequence; 3. Method; 4. Trim set
  Untrimmed Sequence
    1. Seq id; 2. Trimmed Sequence
  Assembly Information
    1. Contig_id; 2. Contig Sequence; 3. BLAST Result; 4. Position; 5. Component seq id
  TAIR Result
    1. Seq id; 2. At number; 3. E-Value; 4. Description; 5. Identity; 6. Other result
  NCBI BLAST Result
    1. Seq id; 2. NCBI_id; 3. E-Value; 4. Description; 5. Identity; 6. Other result
  TIGR Result
    1. Seq id; 2. TC number; 3. E-Value; 4. Description; 5. Identity; 6. Other result
  Lab info
    1. Seq id; 2. Comment; 3. Primer; 4. Biotech; 5. Sender; 6. Collect From
  cDNA Library Information
    1. Clone_id(3)(4); 2. Name; 3. Date made; 4. Developmental stage; 5. Cloning sites; 6. Description; 7. Library; 8. Host; 9. Species; 10. Vector; 11. Antibiotic; 12. Authors; 13. Tissue; 14. Primer
  Gene Ontology
    1. TC number; 2. EC number; 3. Process (GO_id, Description); 4. Function (GO_id, Description); 5. Component (GO_id, Description)
  Relationship labels in the diagram: Seq_id, Clone_id, TC number; cardinalities 1 and n; TOM 3, TOM 4
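
The way such tables link through shared ids can be sketched with Python's built-in sqlite3 module. This is a reduced illustration only. The slides do not name the database system used for the tomato database, the table and column names below are simplified, the three BLAST result tables are merged into one with a source column, and all inserted values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE id_map (
    seq_id    TEXT PRIMARY KEY,
    clone_id  TEXT,
    contig_id TEXT
);
CREATE TABLE trimmed_sequence (
    seq_id           TEXT PRIMARY KEY REFERENCES id_map(seq_id),
    trimmed_sequence TEXT,
    method           TEXT
);
-- Simplification (assumption): TAIR, NCBI and TIGR results share one table.
CREATE TABLE blast_result (
    seq_id      TEXT REFERENCES id_map(seq_id),
    source      TEXT,
    hit_id      TEXT,
    e_value     REAL,
    description TEXT
);
""")

# Made-up example rows
conn.execute("INSERT INTO id_map VALUES ('S1', 'C1', 'Contig1')")
conn.execute("INSERT INTO trimmed_sequence VALUES ('S1', 'ATGAAA', 'Seqclean')")
conn.execute(
    "INSERT INTO blast_result VALUES ('S1', 'TAIR', 'hypothetical_hit', 1e-20, 'made-up description')"
)

# Cross-reference a clone with its trimmed sequence and annotation via seq_id.
rows = conn.execute("""
    SELECT m.clone_id, t.trimmed_sequence, b.source, b.hit_id
    FROM id_map AS m
    JOIN trimmed_sequence AS t ON t.seq_id = m.seq_id
    JOIN blast_result AS b ON b.seq_id = m.seq_id
""").fetchall()
print(rows)  # [('C1', 'ATGAAA', 'TAIR', 'hypothetical_hit')]
```

The ID MAP entity plays the central role here: every other table is reached by joining on Seq id, Clone_id or Contig id.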

Information to be further analyzed
  Gene set characterization
    Number of unique genes on the array
    Number of known/unknown genes
  Coordinates of each spotted sequence
  Statistics about spotted cDNA
    grouped by function/pathway
    grouped by sequence similarity
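
The gene set counts listed above could be computed along the following lines. This is a sketch under stated assumptions: each spot is taken to carry a gene identifier (for example a contig id from the assembly) and an annotation that is None when no BLAST hit was found. The function name, the dictionary keys and the example spots are hypothetical.

```python
def characterize_gene_set(spots):
    """Count spots, unique genes, and known/unknown genes on an array."""
    annotation_by_gene = {}
    for spot in spots:
        # Several spots may carry the same gene; keep any annotation found for it.
        current = annotation_by_gene.get(spot["gene_id"])
        annotation_by_gene[spot["gene_id"]] = current or spot["annotation"]
    known = sum(1 for annotation in annotation_by_gene.values() if annotation)
    return {
        "spots": len(spots),
        "unique_genes": len(annotation_by_gene),
        "known": known,
        "unknown": len(annotation_by_gene) - known,
    }


# Made-up spots for illustration
spots = [
    {"gene_id": "contig1", "annotation": "kinase"},
    {"gene_id": "contig1", "annotation": "kinase"},
    {"gene_id": "contig2", "annotation": None},
    {"gene_id": "contig3", "annotation": "transporter"},
]
print(characterize_gene_set(spots))
# {'spots': 4, 'unique_genes': 3, 'known': 2, 'unknown': 1}
```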

Post-hybridization data analysis and management

Post-hybridization data analysis
  Software for Microarray Analysis at IBS
    GenePix Pro 5.0 – image processing
    GeneSpring – microarray data analysis
    Spotfire – microarray data analysis and data storage
    TransPath – pathway searching

Image Processing
  GenePix Pro 5.0
  GAL (GenePix Array List) file

From multi-well plate to microarray

GeneSpring at IBS
  for microarray data analyses
  standalone software
  providing statistical methods for data analysis
  Some bioinformatics
  providing visualization
  licensed annually
  rigid format requirement for input data
  requiring installation of a master gene list (master table) prior to data analysis

Master table for GeneSpring
  Master table contains information on
    Id
    Source of DNA
    Gene name
    Gene function annotation (from Blast results)
    GO annotation
  Each array needs its own master table
  Format of master table may vary with different versions of the software.

To generate master table for GeneSpring
  Batch blast against three sequence databases
  Parsing Blast results
  Incorporating EC number, GO number and other related data from the best BLAST matched results
  Integrate all required data from various files and generate the master table
  Checking
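
The parsing and integration steps above can be sketched as follows, assuming BLAST was run with tabular output (12 tab-separated columns, with E-value in column 11 and bit score in column 12). The function names, the output columns and the sample data are hypothetical. The real master table layout required by GeneSpring is not given in the slides and varies by version.

```python
import csv
import io


def best_hits(blast_tabular: str) -> dict:
    """Return {query_id: (subject_id, evalue, bitscore)}, keeping the top bit score per query."""
    best = {}
    for fields in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def master_table(clone_ids, hits, descriptions):
    """Join clone ids with their best hit and a description (hypothetical column layout)."""
    rows = [("Id", "Best hit", "E-value", "Description")]
    for clone_id in clone_ids:
        subject, evalue, _ = hits.get(clone_id, ("", "", 0.0))
        rows.append((clone_id, subject, str(evalue), descriptions.get(subject, "")))
    return rows


# Made-up tabular BLAST lines: two hits for clone c1, none for clone c2
sample = "\n".join([
    "\t".join(["c1", "subjA", "98.0", "200", "4", "0", "1", "200", "1", "200", "1e-50", "180"]),
    "\t".join(["c1", "subjB", "80.0", "100", "20", "0", "1", "100", "1", "100", "1e-10", "60"]),
])
hits = best_hits(sample)
table = master_table(["c1", "c2"], hits, {"subjA": "made-up kinase"})
for row in table:
    print("\t".join(row))
```

Clones without any hit are kept in the table with empty annotation fields, so the final checking step can compare the row count against the number of clones on the array.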

Spotfire
  for microarray data analyses
  server-client software
  linked to Oracle database for data storage
  providing various statistical methods for data analysis
  capability in establishing links to more bioinformatics tools
  can record analysis procedure
  more flexible format requirement for input data

One color array for Arabidopsis
  Affymetrix ATH1 chip
  Annotation information provided by company and available on internet

Projects for now and the near future
  Infrastructure build-up
  Microarray data management system
  Platform for Bioinformatics analyses
  Plant Signaling Pathway Database