Probe selection for Microarrays

Swiss Institute of Bioinformatics / Institut Suisse de Bioinformatique

LF-2001.11

Probe selection for Microarrays: Considerations and pitfalls


Probe selection wish list

Probe selection strategy should ensure
  Biologically meaningful results (The truth...)
  Coverage, Sensitivity (... The whole truth...)
  Specificity (... And nothing but the truth)
  Annotation
  Reproducibility


Technology

Probe immobilization
  Oligonucleotide coupling
    Synthesis with linker, covalent coupling to surface
  Oligonucleotide photolithography
  ds-cDNA coupling
    cDNA generated by PCR, nonspecific binding to surface
  ss-cDNA coupling
    PCR with one modified primer, covalent coupling, 2nd strand removal

Spotting
  With contact (pin-based systems)
  Without contact (ink jet technology)


Technology-specific requirements

General
  Not too short (sensitivity, selectivity)
  Not too long (viscosity, surface properties)
  Not too heterogeneous (robustness)
  Degree of importance depends on method

Single strand methods (Oligos, ss-cDNA)
  Orientation must be known
  ss-cDNA methods are not perfect
  ds-cDNA methods don’t care
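For single-strand probes the spotted strand has to be the one that can pair with the labeled target, so a candidate sequence may need to be reverse-complemented before synthesis. The sketch below shows that operation together with a simple length check. The helper names and the length limits passed in are illustrative assumptions, not values from these slides, and which strand is actually required depends on the labeling protocol.

```python
# Hypothetical helpers; the length limits are placeholders chosen by the caller.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def length_ok(seq: str, min_len: int, max_len: int) -> bool:
    """True if the probe is neither too short nor too long."""
    return min_len <= len(seq) <= max_len


probe = "ATGGCCTTAA"
print(reverse_complement(probe))  # TTAAGGCCAT
```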


Probe selection approaches

[Figure: probe selection approaches placed along an accuracy/throughput axis: selected gene regions, selected genes, cluster representatives, ESTs, anonymous]


Non-Selective Approaches

Anonymous (blind) spotting
  Using clones from a library without prior sequencing
  Only clones with interesting expression pattern are sequenced
  Normalization of library highly recommended
  Typical uses:
    HT-arrays of ‘exotic’ organisms or tissues
    Large-scale verification of Differential Display clones

EST spotting
  Using clones from a library after sequencing
  Little justification, since sequence availability allows selection


Spotting of cluster representatives

Sequence Clustering
  For human/mouse/rat EST clones: public cluster libraries
    Unigene (NCBI)
    THC (TIGR)
  For custom sequence: clustering tools
    STACK_PACK (SANBI)
    JESAM (HGMP)
    PCP (Paracel, commercial)


A benign clustering situation


In the absence of 5'-3' links

Two clusters corresponding to one gene!


Overlap too short

Three clusters corresponding to one gene!
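The effect of short overlaps can be illustrated with a toy single-linkage clustering, in which two ESTs are joined only if they overlap by at least a chosen number of bases. This is a minimal sketch and not the algorithm of any of the tools named above. The function name, the overlap table and the thresholds are invented for illustration.

```python
def cluster(n_seqs, overlaps, min_overlap):
    """Single-linkage clustering: join two sequences if their overlap is
    at least min_overlap bases. Returns a sorted list of clusters."""
    parent = list(range(n_seqs))

    def find(i):
        # Follow parent links to the cluster root (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, length in overlaps:
        if length >= min_overlap:
            parent[find(a)] = find(b)

    groups = {}
    for i in range(n_seqs):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Six ESTs from one gene; two of the pairwise overlaps are short (25 and 30 bases).
overlaps = [(0, 1, 200), (1, 2, 25), (2, 3, 150), (3, 4, 30), (4, 5, 180)]
print(cluster(6, overlaps, min_overlap=40))  # [[0, 1], [2, 3], [4, 5]]
print(cluster(6, overlaps, min_overlap=20))  # [[0, 1, 2, 3, 4, 5]]
```

With the stricter threshold the single gene falls apart into three clusters. Lowering the threshold merges them, at the price of a higher risk of joining unrelated sequences.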


Chimeric ESTs

One cluster corresponding to two genes!!


Chimeric ESTs ... continued

Chimeric ESTs are quite common
Chimeric ESTs are a major nuisance for array probe selection
One of the fusion partners is usually a highly expressed mRNA
Double-picking of chimeric ESTs can fool even cautious clustering programs
Unigene contains several chimeric clusters
The annotation of chimeric clusters is erratic
Chimeric ESTs can be detected by genome comparison
There is one particularly bad class of chimeric sequences that will be the subject of the exercises
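Detection by genome comparison can be sketched as follows: if different parts of one EST align to different chromosomes, or to positions implausibly far apart on the same chromosome, the EST is a candidate chimera. The alignment blocks are assumed to come from an external alignment tool that is not shown. The `Hit` record, the function name and the `max_gap` default are hypothetical choices for illustration only.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    q_start: int  # start of the aligned block on the EST
    q_end: int    # end of the aligned block on the EST
    chrom: str    # genomic sequence that was hit
    g_start: int  # start of the block on the genome


def looks_chimeric(hits, max_gap=500_000):
    """Flag an EST whose consecutive alignment blocks map to different
    chromosomes, or lie further apart than max_gap on the same one."""
    ordered = sorted(hits, key=lambda h: h.q_start)
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom:
            return True
        if abs(right.g_start - left.g_start) > max_gap:
            return True
    return False


spliced = [Hit(0, 300, "chr2", 1_000), Hit(300, 600, "chr2", 5_000)]
fused = [Hit(0, 300, "chr2", 1_000), Hit(300, 600, "chr11", 88_000)]
print(looks_chimeric(spliced), looks_chimeric(fused))  # False True
```

A check like this only flags candidates. Gaps or errors in the genome assembly can produce the same picture, so flagged clones still need inspection.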


How to select a cluster representative

If possible, pick a clone with completely known sequence
Avoid problematic regions
  Alu-repeats, B1, B2 and other SINEs
  LINEs
  Endogenous retroviruses
  Microsatellite repeats
Avoid regions with high similarity to non-identical sequences
In many clusters, orientation and position relative to ORF are unknown and cannot be selected for
Test selected clone for sequence correctness
Test selected clone for chimerism
Some commercial providers offer sequence-verified UNIGENE cluster representatives
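Of the problematic regions listed above, microsatellite repeats are the easiest to spot without a repeat database, because they are short motifs repeated in tandem. The following is a minimal sketch using a regular expression. The function name and the default thresholds (`max_unit`, `min_repeats`) are arbitrary assumptions. SINEs, LINEs and retroviral elements need a dedicated repeat-masking tool and are not covered here.

```python
import re


def find_microsatellites(seq, max_unit=4, min_repeats=6):
    """Return (start, end, unit) for tandem repeats of a motif of
    1..max_unit bases occurring at least min_repeats times in a row."""
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


seq = "GATTGTC" + "CA" * 10 + "GGTCCATG"
print(find_microsatellites(seq))  # [(7, 27, 'CA')]
```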


Selection of genes

If possible, use all of them
Biased selection
  Selection by tissue
  Selection by topic
  Selection by visibility
  Selection by known expression properties
  Selection from unbiased pre-screen
Use sources of expression information
  EST frequency
  Published array studies
  SAGE data
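EST frequency can serve as a rough expression estimate: the share of a library's ESTs that fall into a given cluster. The sketch below assumes a precomputed mapping from EST identifiers to cluster names. All identifiers and names are made up for illustration, and the estimate is only meaningful for libraries that were not normalized.

```python
from collections import Counter


def est_frequency(est_to_cluster, library_ests):
    """Fraction of a library's ESTs that belong to each cluster.
    ESTs without a cluster assignment count towards the total only."""
    counts = Counter(
        est_to_cluster[e] for e in library_ests if e in est_to_cluster
    )
    total = len(library_ests)
    return {name: n / total for name, n in counts.items()}


est_to_cluster = {"e1": "geneA", "e2": "geneA", "e3": "geneB"}
liver_library = ["e1", "e2", "e3", "e4"]
print(est_frequency(est_to_cluster, liver_library))
# {'geneA': 0.5, 'geneB': 0.25}
```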


Selection of gene regions

[Figure: mRNA schematic showing 5' UTR, ORF and 3' UTR]


Alternative polyadenylation

Constitutive polyA heterogeneity
  3'-Fragments: reduced sensitivity
  No impact on expression ratio

Regulated polyA heterogeneity
  Fragment choice influences expression ratio
  Multiple fragments necessary

Detection of cryptic polyA signals
  Prediction (AATAAA)
  Polyadenylated ESTs
  SAGE tags
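Prediction of polyA signals amounts, in its simplest form, to scanning the 3' UTR for the AATAAA hexamer. The sketch below does exactly that and nothing more. The function name and the example sequence are invented. A hexamer match alone is weak evidence, which is why polyadenylated ESTs and SAGE tags are listed above as further sources.

```python
def find_polya_signals(utr, signal="AATAAA"):
    """Return 0-based start positions of every occurrence of the signal,
    including overlapping ones. RNA input (U) is converted to DNA (T)."""
    utr = utr.upper().replace("U", "T")
    positions = []
    pos = utr.find(signal)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(signal, pos + 1)
    return positions


utr = "GCTTAATAAAGTCTGGCCATTTGCAATAAACTGA"
print(find_polya_signals(utr))  # [4, 24]
```

More than one hit in a 3' UTR suggests that a fragment placed downstream of the first signal may miss the shorter transcript form.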


Alternative splicing

Constitutive splice form heterogeneity
  Fragment in alternative exon: reduced sensitivity
  No impact on expression ratio

Regulated splice form heterogeneity
  Fragment choice influences expression ratio
  Multiple fragments necessary

Detection of alternative splicing events
  Hard/Impossible to predict
  EST analysis (beware of pre-mRNA)
  Literature


Alternative promoter usage

What is the desired readout?
  If promoter activity matters most: multiple fragments
  If overall mRNA level matters most: downstream fragment

Detection of alternative promoter usage
  Prediction difficult (possible?)
  EST analysis
  Literature


UDP-Glucuronosyltransferases

[Figure: UGT1A8 and UGT1A7]


Selection of gene regions

Coding region (ORF)
  Annotation relatively safe
  No problems with alternative polyA sites
  No repetitive elements or other funny sequences
  Danger of close isoforms
  Danger of alternative splicing
  Might be missing in short RT products

3' untranslated region
  Annotation less safe
  Danger of alternative polyA sites
  Danger of repetitive elements
  Less likely to cross-hybridize with isoforms
  Little danger of alternative splicing

5' untranslated region
  Close linkage to promoter
  Frequently not available


A checklist

Pick a gene
Try to get a complete cDNA sequence
Verify sequence architecture (e.g. cross-species comparison)
Mask repetitive elements (and vector!)
If possible, discard 3'-UTR beyond first polyA signal
Look for alternative splice events
Use remaining region of interest for similarity searches
Mask regions that could cross-hybridize
Use the remaining region for probe amplification or EST selection
When working with ESTs, use sequence-verified clones
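The masking steps of the checklist end with the question of which part of the cDNA is left. A minimal sketch of that last step: given the intervals masked as repeats, vector, discarded 3'-UTR or potential cross-hybridization, find the longest remaining stretch. The masked intervals are assumed to come from external tools that are not shown. The function name, the `min_len` parameter and the example coordinates are hypothetical.

```python
def longest_unmasked_region(seq_len, masked, min_len=1):
    """Return (start, end) of the longest stretch not covered by any masked
    interval, or None if no stretch reaches min_len.
    masked: iterable of half-open (start, end) intervals, 0-based."""
    best = None
    cursor = 0  # first position not yet known to be masked
    # A zero-length sentinel interval at the end closes the last gap.
    for start, end in sorted(masked) + [(seq_len, seq_len)]:
        start = min(start, seq_len)
        if start > cursor and (best is None or start - cursor > best[1] - best[0]):
            best = (cursor, start)
        cursor = max(cursor, min(end, seq_len))
    if best is None or best[1] - best[0] < min_len:
        return None
    return best


# 1000-base cDNA: an Alu-like repeat at 100-150, discarded 3'-UTR from 600 to 900.
print(longest_unmasked_region(1000, [(100, 150), (600, 900)]))  # (150, 600)
```

Setting `min_len` to the shortest fragment that is still acceptable for the chosen array technology makes the function return `None` when masking leaves nothing usable.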
