RNA-RNA interaction
A biological crash course and introduction to prediction methods
Part I – Biological crash course
Bacteria
  Plasmid copy control
  Post-segregational killing systems
  Trans-encoded chromosomal RNAs
RNA interference (gene silencing)
Translation regulation
  C. elegans developmental regulation
  miRNA-miRNA interactions
Human telomerase
DNA vs. RNA
     Bases         Strands   Structure
DNA  A, C, G, T    2         Double helix
RNA  A, C, G, U    1 or 2    Stem-loop, pseudoknots, etc.
Gene expression
Central dogma of molecular biology: DNA -> RNA (transcription) -> protein (translation)
Translation
  mRNA -> protein via the triplet code
  What happens if the mRNA is destroyed or otherwise can’t be translated?
Bacteria backgrounder
Single-celled organisms
Prokaryotes = no nucleus
Multi-cistronic transcripts -> multiple genes transcribed at one time, often with overlapping reading frames
Bacterial genetic information
Bacterial chromosome (1)
  Genome of the organism
  Required for life
Plasmids (2)
  Circular DNA molecules
  Double-stranded
  Independently self-replicating
  Not required for life, but often confer a selective advantage such as antibiotic resistance
Plasmid replication
Figure legend: (1), (2) – genes encoded on the plasmid; (3) – origin of replication (ORI)
Plasmid copy control
Recall: plasmids self-replicate independently
Copy number fluctuations are unavoidable
  Too many -> “runaway” replication, host dies
  Too few -> increased risk of plasmid loss
Problem: How to control copy count?
Solution: negative feedback loop mediated by RNA-RNA interaction
R1 copy control
Genes:
  oriR1 – origin of replication
  repA – large amounts of this protein product are required for replication initiation
  tap – translation of this protein product is required for translation of the repA protein
  copA – product is an antisense RNA
  copB – product is a repressor protein (not covered here)
R1 copy control (2)
copA – RNA with stem-loop structure
copT – target segment of the repA/tap mRNA; also forms a stem-loop structure
Single loop-loop interaction
R1 copy control (3)
R1 copy control (4)
copA RNA is unstable and degrades quickly, so its level tracks the current copy number
If not enough plasmids are producing copA antisense RNA (copy number is too low), less copA binds copT, so more repA protein can be produced
Therefore the plasmid can replicate
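The feedback loop above can be illustrated with a toy deterministic model. This is a sketch under made-up assumptions, not a model of real R1 kinetics: copA is taken to scale with copy number n, replication per cell cycle is repressed as 1/(1 + n/k), and cell division halves the count. The parameters r and k and the function names are hypothetical.

```python
# Toy negative-feedback model of plasmid copy number (illustrative only).
# Hypothetical assumptions: copA level is proportional to copy number n, so
# replication per cell cycle is repressed as 1 / (1 + n / k); division halves n.


def next_copy_number(n: float, r: float = 3.0, k: float = 10.0) -> float:
    """One cell cycle: copA-repressed replication, then division."""
    growth = 1.0 + r / (1.0 + n / k)  # more copies -> more copA -> less repA -> less replication
    return n * growth / 2.0           # plasmids are split between two daughter cells


def simulate(n0: float, cycles: int = 200) -> list[float]:
    """Copy number after each cycle, starting from n0 copies."""
    trajectory = [n0]
    for _ in range(cycles):
        trajectory.append(next_copy_number(trajectory[-1]))
    return trajectory


if __name__ == "__main__":
    # Steady state where growth == 2: n* = k * (r - 1) = 20 for the defaults.
    for start in (2.0, 80.0):
        print(f"start={start:5.1f} -> after 200 cycles: {simulate(start)[-1]:.2f}")
```

Starting from too few (2) or too many (80) copies, the count settles at the same steady state, n* = k(r - 1) = 20 for these hypothetical values; convergence from both sides is what a negative feedback loop buys.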
Post-segregational killing systems
Plasmid self-preservation mechanism
  A bacterial host that loses the plasmid dies
The R1 plasmid hok/sok system is the prototype
  All such systems work similarly
R1 hok/sok system
hok/sok locus encodes:
  hok protein – “host killing”
  mok – overlapping reading frame, “modulator of killing”
  sok RNA – “suppressor of killer”
mok must be translated for hok to be expressed
mok cannot be translated if sok is present
R1 hok/sok system (2)
hok mRNA is extremely compact
  Many stem-loop structures
  Flush 5’–3’ pairing
  Highly stable -> long half-life
  Translationally inert
The mok segment is both:
  Translationally active
  Able to bind the sok inhibitor RNA
R1 hok/sok system (3)
sok RNA is highly unstable
Bacteria carrying R1 produce lots of sok
  sok binds mok, so hok is not translated
Bacteria that lose R1 have:
  Lots of stable hok mRNA
  Quickly degrading sok RNA (low stability)
  No new sok RNA being produced
  hok is translated -> the bacterium dies
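The timing argument above (stable hok mRNA, unstable sok RNA, no new sok after plasmid loss) can be sketched with two exponential decays. The initial levels, half-lives, and the rule that hok is translated once sok falls below hok mRNA are all illustrative assumptions, not measured values.

```python
# Toy model of post-segregational killing after loss of the R1 plasmid.
# Hypothetical: all initial levels and half-lives are made up, and "hok is
# translated once sok falls below hok mRNA" is a simplifying assumption.


def level(initial: float, half_life: float, t: float) -> float:
    """Exponential decay with no new transcription (the plasmid is gone)."""
    return initial * 2.0 ** (-t / half_life)


def time_to_killing(sok0: float = 100.0, hok0: float = 10.0,
                    sok_half_life: float = 1.0, hok_half_life: float = 20.0,
                    dt: float = 0.01, t_max: float = 100.0):
    """First time step at which sok RNA drops below hok mRNA, or None."""
    for i in range(int(t_max / dt) + 1):
        t = i * dt
        if level(sok0, sok_half_life, t) < level(hok0, hok_half_life, t):
            return t
    return None


if __name__ == "__main__":
    print(time_to_killing())  # about 3.5 time units for these made-up values
```

With these numbers the crossover is at log2(10)/0.95, about 3.5 time units. If sok were as stable as hok, its initial excess would never disappear and the function returns None, which is why sok instability is essential to the mechanism.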
Bacterial chromosomes
Plasmid antisense RNAs are generally cis-encoded (transcribed from the opposite strand of their target’s locus)
  Implies complete Watson-Crick complementarity
Bacterial chromosomes contain trans-encoded antisense RNAs (transcribed from a different locus)
  Not necessarily complete complementarity
  Often stress-related control systems
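To make the cis/trans contrast concrete, the sketch below scores the fraction of Watson-Crick pairs in an ungapped, antiparallel duplex. The sequences are made up, and the helper names are illustrative; real trans-encoded pairs usually form partial, gapped duplexes that this simple score ignores.

```python
# Fraction of Watson-Crick pairs between an antisense RNA and its target.
# Both sequences are 5'->3'; pairing is antiparallel, so antisense[i]
# pairs with target[len - 1 - i]. All sequences here are hypothetical.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def reverse_complement(rna: str) -> str:
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(rna))


def pairing_fraction(antisense: str, target: str) -> float:
    """Fraction of positions forming Watson-Crick pairs (equal length, no gaps)."""
    if len(antisense) != len(target):
        raise ValueError("sketch assumes equal-length, ungapped duplexes")
    pairs = sum((a, t) in WATSON_CRICK for a, t in zip(antisense, reversed(target)))
    return pairs / len(target)


target = "GGAUCCAUGGCAUUAGC"        # hypothetical 17-nt target segment
cis = reverse_complement(target)     # cis-encoded: opposite strand of the same locus
trans = cis[:5] + "A" + cis[6:12] + "A" + cis[13:]  # hypothetical trans RNA, 2 mismatches
print(pairing_fraction(cis, target))    # 1.0
print(pairing_fraction(trans, target))  # 15 of 17 positions pair
```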
oxyS/fhlA in E. coli
oxyS – RNA transcript induced by stress
fhlA – target mRNA, encoding a transcriptional activator
oxyS/fhlA complex binds via two loop-loop interactions
RNA interference (RNAi)
a.k.a. post-transcriptional gene silencing
Double-stranded RNAs are introduced into the cell
  Complementary to the mRNA of a gene
  Directly introduced in a wet lab, or
  Produced by the cell itself
RNA interference (2)
dsRNAs are cleaved into 21-23 nt segments (“small interfering RNAs”, or siRNAs) by an enzyme called Dicer
RNA interference (3)
siRNAs are incorporated into RNA-induced silencing complex (RISC)
RNA interference (4)
Guided by base complementarity of the siRNA, the RISC targets mRNA for degradation
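The base-complementarity step above can be sketched as a search for the reverse complement of the guide strand in the mRNA. This is a simplification that assumes perfect complementarity (real RISC targeting tolerates some mismatches); the guide sequence, mRNA, and function names are hypothetical.

```python
# Locate mRNA sites perfectly complementary to an siRNA guide strand.
# The guide (5'->3') pairs antiparallel with the mRNA, so a fully
# complementary site is the guide's reverse complement.


def reverse_complement(rna: str) -> str:
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(rna))


def find_target_sites(guide: str, mrna: str) -> list[int]:
    """0-based start positions in mrna of perfect target sites (overlaps allowed)."""
    site = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1) if mrna[i:i + len(site)] == site]


guide = "UUCGAAGUACUCAGCGUAAGU"  # hypothetical 21-nt guide strand
mrna = "AUG" + reverse_complement(guide) + "CCCGGG" + reverse_complement(guide) + "UAA"
print(find_target_sites(guide, mrna))  # [3, 30]
```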
RNA interference – why?
Studying gene function
  Knock out or inhibit a gene’s normal function
  Can the organism survive?
  What phenotypic changes are observed?
Therapeutic suppression
  E.g. cancer treatment
micro RNA (miRNA)
Gene expression regulation
Created by a similar process to siRNA
Generally prevents binding of the ribosome
Ex: C. elegans development
lin-4 and let-7 antisense RNAs
  Regulate larval development in C. elegans
One of the two binding sites for the lin-41 and let-7 interaction (figure not reproduced)
Human telomerase
Telomerase = ribonucleoprotein complex
  Ribonucleo = contains ribonucleic acid (RNA)
  Protein = contains a protein
Responsible for maintaining telomere length in eukaryotic chromosomes
Main components:
  Telomerase reverse transcriptase
  Human telomerase RNA (hTR)
Human telomerase (2)
Reverse transcriptase
  Transcribes RNA to DNA (rather than the usual DNA to RNA)
Telomeres – repeated regions at the ends of eukaryotic chromosomes
hTR is the template for the repeated region
Human telomerase (3)
hTR 11-nt templating region consists of:
  Repeat template: CUAACCC – provides the template for the repeat region
  Alignment domain: UAAC – positions telomerase on the DNA strand
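The templating role of hTR can be checked in a few lines: reverse-transcribing the repeat template (read 5'->3' as written above) gives DNA that is complementary and antiparallel to it. The helper names are illustrative, not from any library.

```python
# Reverse-transcribe the hTR repeat template into telomeric DNA.
# Templates are written 5'->3'; the DNA product is complementary and
# antiparallel, so it is the reverse complement with T in place of U.

RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(template_rna: str) -> str:
    """DNA strand (5'->3') synthesized on an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(template_rna))


repeat_template = "CUAACCC"
alignment_domain = "UAAC"
new_dna = reverse_transcribe(repeat_template)
print(new_dna)                               # GGGTTAG
print(reverse_transcribe(alignment_domain))  # GTTA
```

The product GGGTTAG is a stretch of the human telomeric repeat (TTAGGG)n, and the alignment domain is complementary to GTTA within that same repeat, consistent with its role in positioning telomerase on the existing DNA end.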
Human telomerase (4)
Loop-loop interaction
Sometimes referred to as “kissing loops”
Recall that all of the RNA-RNA interactions discussed so far (except RNAi) involve loop-loop interactions
Predicting miRNA transcripts and targets involves loop structure prediction
Part II – Prediction
Identifying effective siRNAs
  Neural network approach
Identifying targets
  Mammalian miRNA target prediction
Prediction of siRNAs
Sequence properties that make a good antisense RNA an effective gene inhibitor are not well understood
Most computational models consider only:
  RNA structure prediction
  Motif searches
Neural net approach
Training set: 490 known siRNA molecules
Input parameters:
  Base composition
  mRNA:siRNA binding energy properties
  3’ and 5’ binding energy
  Structure of siRNA (hairpin energy and quality)
Target function: efficacy
Neural net approach (2)
Neural net results
14 inputs, 11 hidden units, 1 output
Success rate of 92%
Predicts an average of 12 effective siRNAs per 1000 base pairs
  Stringent (high specificity)
  Good for designing siRNAs for RNAi
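To show the shape of the 14-11-1 network described above, here is a forward pass with placeholder weights. Everything beyond the layer sizes is an assumption: the sigmoid activations, the random (untrained) weights, the example sequence, and the choice to compute only base-composition features while zero-filling the other ten inputs.

```python
# Forward pass through a 14-11-1 feedforward network (shape only).
# Hypothetical: weights are random placeholders, not trained values;
# only 4 of the 14 inputs (base composition) are actually computed.
import math
import random

N_INPUTS, N_HIDDEN = 14, 11


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def base_composition(seq: str) -> list[float]:
    """Fractions of A, C, G, U in the sequence."""
    return [seq.count(b) / len(seq) for b in "ACGU"]


def forward(x, w_hidden, b_hidden, w_out, b_out) -> float:
    """One hidden layer; the single output in (0, 1) is read as predicted efficacy."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


rng = random.Random(0)  # seeded so the untrained placeholder weights are reproducible
w_hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
w_out = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
b_out = rng.uniform(-1, 1)

features = base_composition("GCAAGCUGACCCUGAAGUUCA") + [0.0] * (N_INPUTS - 4)
print(forward(features, w_hidden, b_hidden, w_out, b_out))
```

Training (for example by backpropagation against measured efficacy) would replace the random weights; the sketch only fixes the input, hidden, and output dimensions.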
Prediction of miRNA targets
Mammals/vertebrates
  Lots of known miRNAs
  Mostly unknown target genes
Initial method outline
  Look at conserved miRNAs
  Look for conserved target sites
micro RNAs in animals
0.5-1.0% of predicted genes encode miRNAs
  One of the more abundant regulatory classes
Tissue-specific or developmental stage-specific expression
High evolutionary conservation
micro RNAs in plants
Finding targets in plants is relatively easy
  Look for mRNA transcripts with near-perfect complementarity to known miRNAs
  Signal-to-noise ratio exceeds 10:1 for Arabidopsis (model plant organism)
The same naïve approach in C. elegans and D. melanogaster?
  No more hits than expected by random chance!
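The plant strategy above amounts to scanning transcripts for windows that pair with a miRNA with at most a few mismatches. A minimal sketch, assuming ungapped Watson-Crick pairing and a hypothetical cut-off of three mismatches; let-7a is used only as a familiar example sequence, and the mRNA is made up.

```python
# Plant-style target search: mRNA windows with near-perfect miRNA complementarity.
# Assumptions: no gaps, Watson-Crick pairs only, illustrative 3-mismatch cut-off.

WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def reverse_complement(rna: str) -> str:
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(rna))


def mismatches(mirna: str, site: str) -> int:
    """Count unpaired positions; mirna[i] pairs with site[-1 - i]."""
    return sum((m, s) not in WC_PAIRS for m, s in zip(mirna, reversed(site)))


def near_perfect_sites(mirna: str, mrna: str, max_mismatches: int = 3) -> list[tuple[int, int]]:
    """(start, mismatch_count) for every window within the cut-off."""
    n = len(mirna)
    hits = []
    for start in range(len(mrna) - n + 1):
        mm = mismatches(mirna, mrna[start:start + n])
        if mm <= max_mismatches:
            hits.append((start, mm))
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
site = reverse_complement(let7a)
mrna = "CCCC" + site + "GGGG"  # hypothetical transcript with one perfect site
print(near_perfect_sites(let7a, mrna))
```

In animals this strict criterion finds few real targets, which is why the next slides switch to seed-based matching.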
So what can we use?
Pairing to nucleotides 2-8 at the 5’ end of the miRNA
  Target recognition
Target regions enriched for genes involved in transcriptional regulation
Goals for algorithm
Predict hundreds of miRNA targets
Estimate false-positive rates
Provide computational and experimental evidence of authenticity
Identify common functional classes other than transcriptional regulator genes
TargetScan
Algorithm developed by Lewis et al 2003
Input:
  A miRNA that is known to be conserved across multiple organisms
  Orthologous 3’ UTR sequences
  Cut-off values for two parameters
  Value for one free parameter
Output: Ranked list of candidate target genes
TargetScan (1)
Search UTRs in one organism
  Bases 2-8 of the miRNA = “miRNA seed”
  Perfect Watson-Crick complementarity
  No wobble pairs (G-U)
  7-nt matches = “seed matches”
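Step 1 can be sketched as follows, assuming the UTR and miRNA are given 5'->3' as RNA strings. The seed is sliced from bases 2-8, and a seed match is its reverse complement (perfect Watson-Crick pairing, no wobble). The UTR is a made-up string containing two sites for let-7a, and the function names are illustrative.

```python
# Find 7-nt seed matches for a miRNA in a 3' UTR.
# Seed = miRNA bases 2-8 (1-based); seed match = its reverse complement.


def reverse_complement(rna: str) -> str:
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(rna))


def seed(mirna: str) -> str:
    return mirna[1:8]  # 0-based slice covering bases 2-8


def seed_matches(mirna: str, utr: str) -> list[int]:
    """0-based start positions of seed matches in the UTR."""
    site = reverse_complement(seed(mirna))
    return [i for i in range(len(utr) - 6) if utr[i:i + 7] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAUCUACCUCAGGAAGCUACCUCUU"  # hypothetical UTR with two sites
print(seed(let7a), reverse_complement(seed(let7a)), seed_matches(let7a, utr))
# GAGGUAG CUACCUC [3, 16]
```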
TargetScan (2)
Extend seed matches
  Allow G-U (wobble) pairs
  Both directions
  Stop at mismatches
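Seed extension can be sketched as two walks outward from the seed, one per direction, each stopping at the first position that cannot pair (G-U now allowed). The index convention, miRNA base i pairs with UTR base p + 7 - i for a seed match starting at UTR position p (0-based), follows from antiparallel pairing; the function name and test sequences are illustrative.

```python
# Extend a seed match in both directions, allowing G-U wobble,
# stopping at the first mismatch. Returns the 0-based range of miRNA bases paired.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def extend_seed_match(mirna: str, utr: str, p: int) -> tuple[int, int]:
    """p = 0-based UTR start of the seed match (pairs with miRNA bases 1..7)."""
    def pairs(i: int) -> bool:
        j = p + 7 - i  # antiparallel partner of miRNA base i
        return 0 <= i < len(mirna) and 0 <= j < len(utr) and (mirna[i], utr[j]) in PAIRS

    first, last = 1, 7            # the seed (bases 2-8) is already paired
    while pairs(first - 1):       # toward the miRNA 5' end (UTR 3' side)
        first -= 1
    while pairs(last + 1):        # toward the miRNA 3' end (UTR 5' side)
        last += 1
    return first, last


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
site = "AACUAUACAACCUACUACCUCA"   # reverse complement of let7a
utr = "GGGG" + site + "GGGG"       # hypothetical UTR; seed match starts at 18
print(extend_seed_match(let7a, utr, 18))  # (0, 21): the whole miRNA pairs
```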
TargetScan (3)
Optimize base pairing
  Remaining 3’ region of the miRNA
  35 bases of UTR 5’ to each seed match
  RNAfold program (Hofacker et al 1994)
TargetScan (4)
Folding free energy (ΔG) assigned to each putative miRNA:target interaction
  Ignores initiation free energy
  RNAeval (Hofacker et al 1994)
Z score for each UTR (no match -> Z=1.0)
TargetScan (5)
$$Z = \sum_{k=1}^{n} e^{-\Delta G_k / T}$$
n = number of seed matches in the UTR (may be more than one)
ΔG_k = free energy of the miRNA:target site interaction of the kth seed match
T = parameter influencing the relative weighting of UTRs with a few high-affinity target sites against UTRs with many low-affinity target sites (experimentally determined)
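The Z score can be computed directly from the site free energies. The ΔG values and T below are hypothetical; the sketch follows the slide's convention that a UTR with no seed match gets Z = 1.0 (an empty sum would otherwise give 0).

```python
# Z score for one UTR: sum over seed matches k of exp(-dG_k / T).
# Free energies (negative = favorable) and T are hypothetical examples.
import math


def z_score(delta_gs: list[float], t: float) -> float:
    if not delta_gs:
        return 1.0  # convention stated on the slide for UTRs with no seed match
    return sum(math.exp(-dg / t) for dg in delta_gs)


one_strong = [-20.0]        # one high-affinity site
three_weak = [-10.0] * 3    # three lower-affinity sites
for t in (5.0, 20.0):
    print(t, z_score(one_strong, t), z_score(three_weak, t))
```

This shows the role of T: at T = 5 the single strong site scores higher (e^4 ≈ 54.6 vs 3e^2 ≈ 22.2), while at T = 20 the three weaker sites win (e^1 ≈ 2.7 vs 3e^0.5 ≈ 4.9).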
TargetScan (6)
Order UTRs by Z score
Assign a rank to each UTR
Repeat this process for each of the other organisms with UTR datasets
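The per-organism ranking step might look like this. The gene names and Z values are hypothetical, and breaking ties by identifier is an arbitrary choice made here only for determinism.

```python
# Rank UTRs by Z score within one organism (rank 1 = highest Z).


def rank_by_z(z_by_utr: dict[str, float]) -> dict[str, int]:
    ordered = sorted(z_by_utr, key=lambda utr: (-z_by_utr[utr], utr))
    return {utr: rank for rank, utr in enumerate(ordered, start=1)}


z_by_organism = {  # made-up Z scores for three UTRs in two organisms
    "human": {"geneA": 54.6, "geneB": 22.2, "geneC": 1.0},
    "mouse": {"geneA": 40.1, "geneB": 3.5, "geneC": 1.0},
}
ranks_by_organism = {org: rank_by_z(z) for org, z in z_by_organism.items()}
print(ranks_by_organism)
```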
TargetScan (7)
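UTR i is a predicted target if, for all organisms:
$$Z_i \ge Z_C \quad \text{and} \quad R_i \le R_C$$
where Z_C and R_C are the cut-off values for the Z score and the rank.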
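The final cut-off test can be sketched as a filter over all organisms: a UTR is kept only if Z_i >= Z_C and R_i <= R_C hold in every organism. The cut-off values and per-organism scores are hypothetical, and the function name is illustrative.

```python
# Keep UTR i only if Z_i >= Z_C and R_i <= R_C in every organism.


def predicted_targets(z, r, z_c, r_c):
    """z and r map organism -> {utr: value}; return UTRs passing in all organisms."""
    shared = set.intersection(*(set(per_org) for per_org in z.values()))
    return sorted(
        utr for utr in shared
        if all(z[org][utr] >= z_c and r[org][utr] <= r_c for org in z)
    )


z_example = {  # hypothetical Z scores
    "human": {"geneA": 54.6, "geneB": 22.2, "geneC": 1.0},
    "mouse": {"geneA": 40.1, "geneB": 3.5, "geneC": 1.0},
}
r_example = {  # corresponding ranks within each organism
    "human": {"geneA": 1, "geneB": 2, "geneC": 3},
    "mouse": {"geneA": 1, "geneB": 2, "geneC": 3},
}
print(predicted_targets(z_example, r_example, z_c=5.0, r_c=2))  # ['geneA']
```

geneB passes in human but fails the Z cut-off in mouse, so requiring every organism to pass is what removes it.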
Datasets
nrMamm (mammalian – 79 sequences)
  Homologs in human, mouse, and pufferfish
  Identical between human and mouse, not necessarily pufferfish (fugu)
nrVert (vertebrate – 55 sequences)
  Identical between human, mouse, and fugu
Non-redundant: if multiple miRNAs had the same seed, one representative was chosen
Sample program flow
Results for nrMamm
nrMamm searched against human, mouse, and rat orthologous 3’ UTRs
451 miRNA:target interactions predicted for 400 unique genes
Average 5.7 targets per miRNA
Signal:noise ratio of 3.2:1
Results for nrVert
Additional search against fugu UTRs
  Signal:noise ratio improves to 4.6:1
Relaxed cut-off values
  115 predicted miRNA:target interactions for 107 unique genes
  2.1 putative targets per miRNA
Signal:noise ratio calculation
Signal = number of predicted targets from the nrMamm dataset
Noise = number of predicted targets from randomly shuffled miRNAs
Shuffled control sequences screened to ensure preservation of relevant features – don’t underestimate the noise!
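The signal:noise definition above can be sketched with composition-preserving shuffles. The predictor is a deliberately crude, hypothetical stand-in (count UTRs containing any seed match) rather than the full TargetScan pipeline, and the number of shuffled cohorts is arbitrary.

```python
# Signal:noise estimate using composition-preserving shuffled miRNA controls.
import random
from statistics import mean


def reverse_complement(rna):
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(rna))


def count_targets(mirnas, utrs):
    """Hypothetical stand-in predictor: UTRs containing a seed match for any miRNA."""
    sites = {reverse_complement(m[1:8]) for m in mirnas}
    return sum(any(site in utr for site in sites) for utr in utrs)


def shuffled(mirna, rng):
    """Same bases in random order: base composition kept, sequence order lost."""
    return "".join(rng.sample(mirna, len(mirna)))


def signal_to_noise(mirnas, utrs, n_cohorts=10, seed=0):
    rng = random.Random(seed)
    signal = count_targets(mirnas, utrs)
    noise = mean(count_targets([shuffled(m, rng) for m in mirnas], utrs)
                 for _ in range(n_cohorts))
    return signal / noise if noise else float("inf")
```

A plain shuffle only preserves composition; the next slide's point is that controls should also be screened on further features so the noise is not underestimated.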
Screening control sequences
Features to consider:
  Expected frequency of seed matches
  Expected frequency of matching to the 3’ end of the miRNA (after seed extension)
  Observed count of seed matches in UTR datasets
  Predicted free energies for seed:match interactions
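One of the screening features, expected seed-match frequency, can be estimated from UTR base composition. This sketch assumes a zeroth-order (independent-base) model and an illustrative 15% tolerance; it does not implement the other features listed, and the function names are hypothetical.

```python
# Screen a shuffled control: keep it only if its seed match is expected to be
# about as frequent in the UTRs as the real miRNA's seed match.
from collections import Counter
from math import prod


def base_frequencies(utrs):
    counts = Counter("".join(utrs))
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGU"}


def expected_site_frequency(site, freqs):
    """Independent-base model: product of per-base frequencies."""
    return prod(freqs[b] for b in site)


def seed_site(mirna):
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(mirna[1:8]))


def passes_screen(real, control, utrs, tolerance=0.15):
    freqs = base_frequencies(utrs)
    f_real = expected_site_frequency(seed_site(real), freqs)
    f_ctrl = expected_site_frequency(seed_site(control), freqs)
    return abs(f_ctrl - f_real) <= tolerance * f_real
```

In A-rich UTRs, a control whose seed match is all A would be expected far more often than a real C/U-rich seed match and would be rejected; without such screening, the noise estimate could be biased.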
Signal:noise results (figure: bar chart)
  Filled bars: authentic miRNAs
  Open bars: mean and standard deviation for shuffled sequences
  nrMamm set used for the first two; nrVert used for the set including fugu
Biological relevance
Hypothesis: 5’ conservation of miRNAs is important for mRNA target recognition
  Highest signal:noise ratio observed when the seed is positioned close to the 5’ end
Hypothesis: highly conserved miRNAs are more involved in regulation
  High degree of conservation -> more predicted targets
  Membership in a large miRNA family -> more predicted targets
Experimental verification
15 predicted target sites chosen
  All with known biological function
  Representative of the entire list of candidates
11 target sites confirmed
  Expression of the upstream ORF influenced
  27% false positives – close correspondence to the predicted 30% false positives
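A quick arithmetic check of the two rates: 4 of the 15 tested sites were not confirmed. Reading the predicted ~30% false-positive rate as noise/signal from the nrMamm signal:noise ratio of 3.2:1 is an interpretation, not something stated on the slide.

$$\frac{15 - 11}{15} = \frac{4}{15} \approx 0.27 \qquad \text{vs.} \qquad \frac{\text{noise}}{\text{signal}} = \frac{1}{3.2} \approx 0.31$$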
References
Chalk, A.M. and Sonnhammer, E.L.L. (2002) “Computational antisense oligo prediction with a neural network model.” Bioinformatics 18(12):1567-1575.
Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, S., Tacker, M., and Schuster, P. (1994) “Fast folding and comparison of RNA secondary structures.” Monatshefte für Chemie 125:167-188.
Lewis, B.P., Shih, I., Jones-Rhoades, M.W., and Bartel, D.P. (2003) “Prediction of mammalian microRNA targets.” Cell 115(7):787-798.