Swiss Institute of Bioinformatics / Institut Suisse de Bioinformatique
LF-2001.11
Probe selection for Microarrays: Considerations and pitfalls
Probe selection wish list
Probe selection strategy should ensure
  - Biologically meaningful results (The truth...)
  - Coverage, Sensitivity (... The whole truth...)
  - Specificity (... And nothing but the truth)
  - Annotation
  - Reproducibility
Technology
Probe immobilization
  - Oligonucleotide coupling
      - Synthesis with linker, covalent coupling to surface
  - Oligonucleotide photolithography
  - ds-cDNA coupling
      - cDNA generated by PCR, nonspecific binding to surface
  - ss-cDNA coupling
      - PCR with one modified primer, covalent coupling, 2nd strand removal
Spotting
  - With contact (pin-based systems)
  - Without contact (ink jet technology)
Technology-specific requirements
General
  - Not too short (sensitivity, selectivity)
  - Not too long (viscosity, surface properties)
  - Not too heterogeneous (robustness)
  - Degree of importance depends on method
Single strand methods (Oligos, ss-cDNA)
  - Orientation must be known
  - ss-cDNA methods are not perfect
  - ds-cDNA methods don't care
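The general requirements above (not too short, not too long, not too heterogeneous) can be checked mechanically once candidate probe sequences are available. The following is a minimal sketch, not part of the original slides: the function names, the use of GC content as a heterogeneity measure, and the length thresholds are all illustrative assumptions, and suitable values depend on the array technology.

```python
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def filter_by_length(probes: dict[str, str], min_len: int, max_len: int) -> dict[str, str]:
    """Keep probes whose length lies within [min_len, max_len]."""
    return {name: s for name, s in probes.items() if min_len <= len(s) <= max_len}


def heterogeneity(probes: dict[str, str]) -> dict[str, float]:
    """Summarize how uniform a probe set is (spread of length and GC content)."""
    lengths = [len(s) for s in probes.values()]
    gcs = [gc_fraction(s) for s in probes.values()]
    return {
        "length_mean": mean(lengths),
        "length_sd": pstdev(lengths),
        "gc_sd": pstdev(gcs),
    }


# Toy probe set; thresholds are assumptions chosen only for illustration.
probes = {"p1": "ACGT" * 10, "p2": "GGCC" * 12, "p3": "AT" * 5}
kept = filter_by_length(probes, min_len=30, max_len=60)
summary = heterogeneity(kept)
print(sorted(kept))   # p3 (10 nt) is dropped as too short
print(summary)
```

A large `length_sd` or `gc_sd` would suggest that the set hybridizes non-uniformly; how much spread is tolerable is a property of the method, as the slide notes.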
Probe selection approaches
[Figure: probe selection approaches placed along an accuracy/throughput axis: Anonymous, ESTs, Cluster representatives, Selected genes, Selected gene regions]
Non-Selective Approaches
Anonymous (blind) spotting
  - Using clones from a library without prior sequencing
  - Only clones with interesting expression pattern are sequenced
  - Normalization of library highly recommended
  - Typical uses:
      - HT-arrays of 'exotic' organisms or tissues
      - Large-scale verification of Differential Display clones
EST spotting
  - Using clones from a library after sequencing
  - Little justification, since sequence availability allows selection
Spotting of cluster representatives
Sequence Clustering
  - For human/mouse/rat EST clones: public cluster libraries
      - UniGene (NCBI)
      - THC (TIGR)
  - For custom sequence: clustering tools
      - STACK_PACK (SANBI)
      - JESAM (HGMP)
      - PCP (Paracel, commercial)
A benign clustering situation
In the absence of 5'-3' links
  - Two clusters corresponding to one gene!
Overlap too short
  - Three clusters corresponding to one gene!
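The fragmentation of one gene into several clusters can be reproduced with a toy model. The sketch below is an illustration under strong assumptions: ESTs are reduced to half-open intervals on a hypothetical transcript coordinate (real clustering tools only see sequences and must find overlaps by alignment), and `min_overlap` stands in for whatever overlap criterion a given tool applies.

```python
def cluster_by_overlap(ests: dict[str, tuple[int, int]], min_overlap: int) -> list[set[str]]:
    """Single-linkage clustering: two ESTs are joined if their intervals
    share at least min_overlap positions."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(ests)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = ests[a], ests[b]
            overlap = min(e1, e2) - max(s1, s2)  # negative if the ESTs do not touch
            if overlap >= min_overlap:
                parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Four ESTs from ONE 2000-nt transcript (coordinates are made up).
ests = {"a": (0, 500), "b": (460, 900), "c": (880, 1400), "d": (1500, 2000)}

print(cluster_by_overlap(ests, min_overlap=10))  # d has no link to the rest: 2 clusters
print(cluster_by_overlap(ests, min_overlap=30))  # b-c overlap (20 nt) too short: 3 clusters
```

With the permissive threshold the 3' EST `d` still forms its own cluster because nothing bridges the gap; raising the threshold additionally breaks the weak `b`-`c` link.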
Chimeric ESTs
  - One cluster corresponding to two genes!!
Chimeric ESTs ... continued
  - Chimeric ESTs are quite common
  - Chimeric ESTs are a major nuisance for array probe selection
  - One of the fusion partners is usually a highly expressed mRNA
  - Double-picking of chimeric ESTs can fool even cautious clustering programs
  - UniGene contains several chimeric clusters
  - The annotation of chimeric clusters is erratic
  - Chimeric ESTs can be detected by genome comparison
  - There is one particularly bad class of chimeric sequences that will be the subject of the exercises
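Detection by genome comparison can be sketched as a consistency check on where the parts of one EST align in the genome. This is a simplified illustration, not the procedure used in the course: the `Hit` record, the function name, and the `max_gap` threshold are assumptions, and the hits are presumed to come from some alignment tool and to be the best non-overlapping alignments of the EST.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    q_start: int  # start on the EST (0-based)
    q_end: int    # end on the EST (exclusive)
    chrom: str    # chromosome or contig name
    strand: str   # "+" or "-"
    g_start: int  # start on the genome
    g_end: int    # end on the genome (exclusive)


def looks_chimeric(hits: list[Hit], max_gap: int = 500_000) -> bool:
    """Return True if consecutive EST segments map to inconsistent genomic loci.

    max_gap is an assumed upper bound on plausible intron length.
    """
    ordered = sorted(hits, key=lambda h: h.q_start)
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom or left.strand != right.strand:
            return True
        # Genomic distance between the two aligned blocks, in either order.
        gap = max(left.g_start, right.g_start) - min(left.g_end, right.g_end)
        if gap > max_gap:
            return True
    return False


# A normally spliced EST: two exons on the same chromosome, 4.8 kb apart.
spliced = [Hit(0, 100, "chr1", "+", 100, 200), Hit(100, 300, "chr1", "+", 5000, 5200)]
# A chimera candidate: the two halves align to different chromosomes.
fused = [Hit(0, 150, "chr1", "+", 100, 250), Hit(150, 400, "chr7", "-", 9000, 9250)]

print(looks_chimeric(spliced))  # False
print(looks_chimeric(fused))    # True
```

A check like this would miss fusions between neighbouring genes and could flag genes with unusually long introns, so flagged clones are candidates for inspection rather than confirmed chimeras.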
How to select a cluster representative
  - If possible, pick a clone with completely known sequence
  - Avoid problematic regions
      - Alu-repeats, B1, B2 and other SINEs
      - LINEs
      - Endogenous retroviruses
      - Microsatellite repeats
  - Avoid regions with high similarity to non-identical sequences
  - In many clusters, orientation and position relative to ORF are unknown and cannot be selected for
  - Test selected clone for sequence correctness
  - Test selected clone for chimerism
  - Some commercial providers offer sequence-verified UniGene cluster representatives
Selection of genes
  - If possible, use all of them
  - Biased selection
      - Selection by tissue
      - Selection by topic
      - Selection by visibility
      - Selection by known expression properties
  - Selection from unbiased pre-screen
  - Use sources of expression information
      - EST frequency
      - Published array studies
      - SAGE data
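EST frequency as a source of expression information amounts to counting how many ESTs of each cluster were sequenced from each library. The sketch below assumes a hypothetical input format (one `(cluster_id, library)` pair per EST) and made-up cluster identifiers and library sizes; counts are scaled per 10,000 ESTs so that libraries of different depth can be compared.

```python
from collections import Counter


def est_frequency(
    est_records: list[tuple[str, str]],
    library_sizes: dict[str, int],
) -> dict[tuple[str, str], float]:
    """ESTs per cluster and library, scaled to counts per 10,000 ESTs sequenced."""
    counts = Counter(est_records)  # key: (cluster_id, library)
    return {
        (cluster, lib): 10_000 * n / library_sizes[lib]
        for (cluster, lib), n in counts.items()
    }


# Hypothetical data: cluster IDs and library sizes are invented for illustration.
records = [("Hs.1", "liver")] * 3 + [("Hs.1", "brain")] + [("Hs.2", "brain")] * 2
sizes = {"liver": 5_000, "brain": 20_000}

freq = est_frequency(records, sizes)
print(freq[("Hs.1", "liver")])  # 6.0
print(freq[("Hs.1", "brain")])  # 0.5
```

Such counts are only a rough guide: small numbers are noisy, and counts from normalized libraries do not reflect transcript abundance.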
Selection of gene regions
[Figure: mRNA schematic showing 5' UTR, ORF and 3' UTR]
Alternative polyadenylation
Constitutive polyA heterogeneity
  - 3'-Fragments: reduced sensitivity
  - No impact on expression ratio
Regulated polyA heterogeneity
  - Fragment choice influences expression ratio
  - Multiple fragments necessary
Detection of cryptic polyA signals
  - Prediction (AATAAA)
  - Polyadenylated ESTs
  - SAGE tags
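The prediction step, scanning a 3' UTR for the AATAAA hexamer, is simple enough to sketch directly. The function name and the example sequence below are illustrative assumptions; a hexamer match is only a hint, which is why the slide lists polyadenylated ESTs and SAGE tags as independent evidence.

```python
import re


def find_polya_signals(seq: str, signal: str = "AATAAA") -> list[int]:
    """Return 0-based start positions of every occurrence of the hexamer.

    A lookahead pattern is used so that overlapping occurrences are reported too.
    RNA input (U) is converted to DNA letters (T) first.
    """
    seq = seq.upper().replace("U", "T")
    pattern = f"(?={re.escape(signal)})"
    return [m.start() for m in re.finditer(pattern, seq)]


# Made-up 3' UTR with an upstream (possibly cryptic) and a downstream signal.
utr = "GCT" + "AATAAA" + "GGCTTTTCC" + "AATAAA" + "TTGC" + "AAAAAAAA"

print(find_polya_signals(utr))  # [3, 18]
```

If more than one position is returned, fragments located downstream of the first signal may be absent from part of the transcripts.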
Alternative splicing
Constitutive splice form heterogeneity
  - Fragment in alternative exon: reduced sensitivity
  - No impact on expression ratio
Regulated splice form heterogeneity
  - Fragment choice influences expression ratio
  - Multiple fragments necessary
Detection of alternative splicing events
  - Hard/Impossible to predict
  - EST analysis (beware of pre-mRNA)
  - Literature
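Why constitutive heterogeneity leaves the expression ratio intact while regulated heterogeneity does not can be seen from a small calculation. The model below is a deliberately simple assumption: a probe placed in an alternative exon (or downstream of an alternative polyA site) detects only a fraction of the gene's transcripts, and signal is proportional to the number of detectable transcripts. All numbers are invented.

```python
def measured_ratio(total_a: float, total_b: float, frac_a: float, frac_b: float) -> float:
    """Expression ratio (condition B over A) seen by a probe that detects
    only the fraction frac_a / frac_b of all transcripts of the gene."""
    return (total_b * frac_b) / (total_a * frac_a)


# True transcript levels: 100 units in condition A, 200 in condition B.
true_ratio = 200 / 100

# Constitutive heterogeneity: the probe sees 25% of transcripts in both conditions.
# Signal drops to a quarter (reduced sensitivity), but the ratio is unchanged.
constitutive = measured_ratio(100, 200, frac_a=0.25, frac_b=0.25)

# Regulated heterogeneity: the detectable isoform rises from 25% to 75%.
# The probe now reports a threefold exaggerated ratio.
regulated = measured_ratio(100, 200, frac_a=0.25, frac_b=0.75)

print(true_ratio, constitutive, regulated)  # 2.0 2.0 6.0
```

In the regulated case no single fragment recovers both the overall level and the isoform switch, which is the reason multiple fragments are needed.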
Alternative promoter usage
What is the desired readout?
  - If promoter activity matters most: multiple fragments
  - If overall mRNA level matters most: downstream fragment
Detection of alternative promoter usage
  - Prediction difficult (possible?)
  - EST analysis
  - Literature
UDP-Glucuronosyltransferases
[Figure: UGT1A8 and UGT1A7]
Selection of gene regions
Coding region (ORF)
  - Annotation relatively safe
  - No problems with alternative polyA sites
  - No repetitive elements or other funny sequences
  - Danger of close isoforms
  - Danger of alternative splicing
  - Might be missing in short RT products
3' untranslated region
  - Annotation less safe
  - Danger of alternative polyA sites
  - Danger of repetitive elements
  - Less likely to cross-hybridize with isoforms
  - Little danger of alternative splicing
5' untranslated region
  - Close linkage to promoter
  - Frequently not available
A checklist
  - Pick a gene
  - Try to get a complete cDNA sequence
  - Verify sequence architecture (e.g. cross-species comparison)
  - Mask repetitive elements (and vector!)
  - If possible, discard 3'-UTR beyond first polyA signal
  - Look for alternative splice events
  - Use remaining region of interest for similarity searches
  - Mask regions that could cross-hybridize
  - Use the remaining region for probe amplification or EST selection
  - When working with ESTs, use sequence-verified clones
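The masking steps of the checklist end with "the remaining region", which can be found mechanically once every excluded stretch is expressed as an interval. The sketch below assumes the masked intervals (vector, repeats, 3' UTR beyond the first polyA signal, cross-hybridizing regions) have already been determined by other tools; the function name and all coordinates are illustrative.

```python
def longest_unmasked(seq_len: int, masked: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the longest half-open interval [start, end) of a sequence of
    length seq_len that is not covered by any masked interval."""
    best = (0, 0)
    pos = 0  # first position not yet known to be masked
    for start, end in sorted(masked):
        # Clip intervals to the sequence boundaries.
        start = min(max(start, 0), seq_len)
        end = min(max(end, 0), seq_len)
        if start - pos > best[1] - best[0]:
            best = (pos, start)
        pos = max(pos, end)
    if seq_len - pos > best[1] - best[0]:
        best = (pos, seq_len)
    return best


# Made-up masks on a 2000-nt cDNA clone.
masked = [
    (0, 50),       # vector sequence
    (600, 900),    # repetitive element
    (1200, 1300),  # region similar to another gene (cross-hybridization risk)
    (1700, 2000),  # 3' UTR beyond the first polyA signal
]

print(longest_unmasked(2000, masked))  # (50, 600)
```

Taking the single longest stretch is one possible design choice; where alternative splicing or polyadenylation is regulated, several shorter fragments may be the better answer.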