WIDESPREAD PURIFYING SELECTION ON RNA STRUCTURE IN MAMMALS
Martin A. Smith, Tanja Gesel, Peter F. Stadler, John S. Mattick
Garvan Institute, Sydney, Australia
I. Generating positive controls for algorithm benchmarking
1. Select random RNA family
2. Select subset of sequences randomly
3. Emulate genomic alignment by realigning with MAFFT
4. Use native RFAM alignments as reference
5. Select random sub-alignment simulating a sliding window
6. Submit to RNA structure prediction algorithms
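The benchmarking steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. The following are all assumptions made for the example: the toy data layout (sequence ID mapped to aligned row), the function names, and the treatment of "." as a gap. MAFFT is an external program, so step 3 is shown only up to preparing degapped sequences for realignment.

```python
import random

# Hypothetical toy representation: sequence ID -> aligned sequence (one row per sequence).
# Rfam-style alignments may use both "-" and "." as gap characters.
Alignment = dict[str, str]


def select_random_family(families: dict[str, Alignment], rng: random.Random) -> tuple[str, Alignment]:
    """Step 1: pick one RNA family uniformly at random."""
    name = rng.choice(sorted(families))
    return name, families[name]


def select_subset(alignment: Alignment, k: int, rng: random.Random) -> Alignment:
    """Step 2: keep k randomly chosen sequences (rows) of the family alignment."""
    ids = rng.sample(sorted(alignment), k)
    return {seq_id: alignment[seq_id] for seq_id in ids}


def degap(alignment: Alignment) -> Alignment:
    """Step 3 input: strip gap characters so the sequences can be handed to an
    external aligner such as MAFFT. The realignment itself is not shown here."""
    return {seq_id: seq.replace("-", "").replace(".", "") for seq_id, seq in alignment.items()}


def random_subalignment(alignment: Alignment, width: int, rng: random.Random) -> Alignment:
    """Step 5: cut a random block of `width` columns, mimicking one sliding-window position."""
    length = len(next(iter(alignment.values())))
    width = min(width, length)
    start = rng.randrange(0, length - width + 1)
    return {seq_id: seq[start:start + width] for seq_id, seq in alignment.items()}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the toy run is reproducible
    families = {"toy_family": {"s1": "GGC-AUCC", "s2": "GGCAAUCC", "s3": "GG-AAUCC"}}
    name, aln = select_random_family(families, rng)
    # Step 4 branch: slice the native (reference) alignment directly.
    window = random_subalignment(select_subset(aln, 2, rng), 5, rng)
    print(name, window)
```

Seeding a single `random.Random` instance makes every sampled control reproducible. This matters when the same windows must be submitted to several prediction tools (step 6).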
Most genetic variants identified as associated with complex diseases occur in non-coding regions of the genome that show no evidence of purifying evolutionary selection.
Using an optimised sliding-window approach on alignments of 35 sequenced mammalian genomes, we report with unprecedented accuracy that a large proportion of these genomes harbors evolutionarily conserved RNA structure motifs.
We propose that the higher-order structural components of RNA serve as a flexible and modular evolutionary platform for the diversification of genetic regulatory mechanisms, assisted by low penetrance of affected alleles and by compensatory base-pairing.
Over 75% of the human genome is processed into RNA, with only 2% encoding proteins.
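The sliding-window approach mentioned above scans a long alignment in overlapping blocks of columns. A minimal sketch of window placement follows. The window width (120) and step (40) in the usage line are hypothetical values chosen for illustration; the optimised parameters used in the screen are not given here. The rule of adding one final window anchored to the end is also an assumption.

```python
# Hypothetical representation: sequence ID -> aligned row.
Alignment = dict[str, str]


def sliding_windows(length: int, window: int, step: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) column ranges covering an alignment of `length` columns.

    Full-width windows advance by `step`. If the last one stops short of the end,
    one extra window is anchored to the final column so no column is skipped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if length <= window:
        return [(0, length)]
    starts = list(range(0, length - window + 1, step))
    if starts[-1] + window < length:
        starts.append(length - window)
    return [(s, s + window) for s in starts]


def extract_window(alignment: Alignment, start: int, end: int) -> Alignment:
    """Slice the same column range out of every aligned row."""
    return {seq_id: seq[start:end] for seq_id, seq in alignment.items()}


if __name__ == "__main__":
    print(sliding_windows(300, 120, 40))
    # [(0, 120), (40, 160), (80, 200), (120, 240), (160, 280), (180, 300)]
```

Overlapping windows mean a structure shorter than the window is likely to fall entirely inside at least one of them. The cost is that neighbouring windows report partially redundant predictions.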
II. Performance on benchmarking data
[Figure: sensitivity and specificity (%) of RNAz 2.0, SISSIz 2.0 and SISSIz 2.0 [+R] on partial structure alignments [RFAM] and partial sequence alignments [RFAM+MAFFT].]
III. Performance on chr10
[Figure: false discovery rate against the genomic background (annotated values: 13.6%, 9.2%, [5-22]%); density by number of species in alignment (5-35).]
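Panels II and III report sensitivity, specificity and false discovery rate. The sketch below shows the standard definitions from a confusion table. The counts in the usage line are hypothetical, and the mapping of "positives" to structured positive controls and "negatives" to background windows is an assumption about how such a benchmark would be scored. The exact procedure used for the genomic-background estimate may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # positive-control windows predicted as structured
    fp: int  # background windows predicted as structured
    tn: int  # background windows predicted as unstructured
    fn: int  # positive-control windows missed


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true structures that are recovered: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of background windows correctly rejected: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def false_discovery_rate(c: ConfusionCounts) -> float:
    """Fraction of positive calls that are false: FP / (FP + TP)."""
    return c.fp / (c.fp + c.tp)


if __name__ == "__main__":
    counts = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)  # hypothetical numbers
    print(sensitivity(counts), specificity(counts))  # 0.8 0.9
```

Specificity and false discovery rate answer different questions. In a genome-wide screen where true structures are rare, even a high specificity can leave a sizeable false discovery rate, because false positives accumulate over the many background windows.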
[Figure: overlap of predicted 2D structures with Gerp++, SiPhy-merged and PhastCons constrained elements.]
[Figure: genomic distribution of predictions: intronic 55%, intergenic 41%, exonic 8% (3'UTR, 5'UTR, CDS), non-coding (0.3%).]
[Figure: count (log scale) of overlapping predictions (nt); density of mean pairwise identity (%) and of G+C content (%).]
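The density plots are drawn against mean pairwise identity (%) and G+C content (%). Below is a sketch of both statistics for one alignment window.

Pairwise identity conventions vary between tools. This example assumes identity is counted over columns where at least one of the two rows has a nucleotide, with "-" and "." treated as gaps. Treat it as one reasonable convention rather than the one used for the poster.

```python
from itertools import combinations

GAPS = set("-.")  # assumed gap characters


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity of two aligned rows over columns where at least one row
    has a nucleotide (assumed convention)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x in GAPS and y in GAPS)]
    if not columns:
        return 0.0
    # Both-gap columns were dropped above, so x == y here means a nucleotide match.
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def mean_pairwise_identity(rows: list[str]) -> float:
    """Average percent identity over all unordered pairs of rows."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def gc_content(rows: list[str]) -> float:
    """Percent G+C over all non-gap characters in the alignment."""
    bases = [c for row in rows for c in row.upper() if c not in GAPS]
    if not bases:
        raise ValueError("alignment contains no nucleotides")
    return 100.0 * sum(1 for c in bases if c in "GC") / len(bases)


if __name__ == "__main__":
    window = ["ACGU", "ACGA", "AC-U"]  # hypothetical toy window
    print(mean_pairwise_identity(window), gc_content(window))
```

Both quantities matter for structure prediction. Very high identity leaves few substitutions from which to detect compensatory changes, and G+C content shifts the folding energies of random sequences.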
SISSIz 2.0: compares a native consensus structure prediction against a background distribution of randomized alignments.
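The comparison of a native score against a background of randomized alignments is typically summarised as a z-score. The sketch below computes one.

Two caveats apply. First, as a simpler stand-in, it builds the background by shuffling alignment columns. SISSIz itself simulates alignments with a dinucleotide-aware model, which is not reproduced here. Second, the `score` callable is a hypothetical placeholder for a consensus folding energy. It must depend on column order, because any purely compositional score is unchanged by column shuffling and would give a zero-variance background.

```python
import random
import statistics
from typing import Callable, Sequence


def shuffle_columns(alignment: Sequence[str], rng: random.Random) -> list[str]:
    """Permute alignment columns; preserves per-column composition and gap pattern."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("aligned rows must have equal length")
    columns = list(zip(*alignment))
    rng.shuffle(columns)
    return ["".join(row) for row in zip(*columns)]


def background_scores(
    alignment: Sequence[str],
    score: Callable[[list[str]], float],
    n: int,
    seed: int = 0,
) -> list[float]:
    """Score n column-shuffled copies of the alignment (simplified null model)."""
    rng = random.Random(seed)
    return [score(shuffle_columns(alignment, rng)) for _ in range(n)]


def z_score(native: float, background: Sequence[float]) -> float:
    """How many background standard deviations the native score lies from the background mean.
    For folding energies, strongly negative values indicate unusually stable structure."""
    mu = statistics.mean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        raise ValueError("background scores have zero variance")
    return (native - mu) / sd
```

A real screen would replace the column shuffle with a null model that preserves local sequence dependencies. Shuffling columns destroys dinucleotide structure, and that loss alone can make native alignments look more stable than they are.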
SISSIz 2.0 [+R]: similar to regular SISSIz, but employs a RIBOSUM substitution matrix to score compensatory mutations.
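Compensatory mutations are substitutions at both positions of a base pair that keep the pair canonical (for example G-C becoming C-G). The sketch below simply counts such double changes between a reference row and one other row, given a dot-bracket structure. It does not use RIBOSUM scores, whose matrix values are not reproduced here. Normalising T to U is an added assumption so that genomic (DNA) alignments can be used directly.

```python
# Canonical RNA pairs: Watson-Crick plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(dot_bracket: str) -> list[tuple[int, int]]:
    """Parse a dot-bracket string into sorted (i, j) base-pair positions."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unclosed '('")
    return sorted(pairs)


def normalise(seq: str) -> str:
    """Upper-case and convert DNA T to RNA U (assumption for genomic alignments)."""
    return seq.upper().replace("T", "U")


def count_compensatory(structure: str, reference: str, other: str) -> int:
    """Count base pairs where both positions changed and both versions can pair."""
    if not len(structure) == len(reference) == len(other):
        raise ValueError("structure and rows must have equal length")
    ref, oth = normalise(reference), normalise(other)
    count = 0
    for i, j in base_pairs(structure):
        both_changed = ref[i] != oth[i] and ref[j] != oth[j]
        if both_changed and (ref[i], ref[j]) in CANONICAL and (oth[i], oth[j]) in CANONICAL:
            count += 1
    return count


if __name__ == "__main__":
    print(count_compensatory("((..))", "GCAAGC", "CGAACG"))  # 2
```

Such double changes are strong evidence for selection on structure rather than sequence, because each single substitution alone would disrupt the pair. A substitution matrix such as RIBOSUM refines this by weighting each pair-to-pair change, instead of counting them equally.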
RNAz 2.0: employs a regression model trained on known RNA structures to classify sampled alignments as structured or non-structured.
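One input that RNAz's classifier is known to use is the structure conservation index (SCI). The SCI is the minimum free energy (MFE) of the consensus structure divided by the mean MFE of the individual sequences.

The sketch below computes only this ratio. The energies in the usage line are hypothetical; in practice they would come from a folding program. The remaining features and the trained model itself are not reproduced.

```python
import statistics


def structure_conservation_index(consensus_mfe: float, single_mfes: list[float]) -> float:
    """SCI = consensus MFE / mean MFE of the individual sequences.

    Values near 1 suggest the sequences fold into a shared structure; values near 0
    suggest little common structure.
    """
    mean_single = statistics.mean(single_mfes)
    if mean_single == 0:
        raise ValueError("mean single-sequence MFE is zero; SCI undefined")
    return consensus_mfe / mean_single


if __name__ == "__main__":
    # Hypothetical energies in kcal/mol.
    print(structure_conservation_index(-20.0, [-25.0, -15.0]))  # 1.0
```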
[Figure: overlap between predictions; average runtime for 200 nt (s).]
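The poster reports overlap between predictions and with constrained elements, measured in nucleotides. Below is a minimal sketch of that computation for two interval sets on one chromosome.

The sketch makes three assumptions. It uses half-open (start, end) coordinates, so a region ending at 100 and one starting at 100 share no nucleotides. It merges overlapping intervals within each set first, so shared nucleotides are not double-counted. Its inputs are hypothetical.

```python
def merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def overlap_nt(a_intervals: list[tuple[int, int]], b_intervals: list[tuple[int, int]]) -> int:
    """Number of nucleotides covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a_intervals), merge(b_intervals)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first; it cannot overlap anything further.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


if __name__ == "__main__":
    predictions = [(0, 100), (200, 300)]  # hypothetical predicted structures
    constrained = [(50, 250)]             # hypothetical constrained element
    print(overlap_nt(predictions, constrained))  # 100
```

Because both lists are sorted and merged, the sweep runs in time linear in the number of intervals after sorting. This matters when the inputs are millions of genome-wide predictions.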
IV. Optimised genome-wide screen
Access predicted structures in the UCSC Genome Browser
A. Genomic distribution of evolutionarily conserved RNA structures
B. Overlap with annotated sequence-constrained elements
[Figure: fold enrichment vs. uniform distribution (axis 0.5-2.5) for exonic, intronic, CDS, 3'UTR, 5'UTR, intergenic, non-coding and repeat regions.]
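Fold enrichment against a uniform distribution compares two fractions. The first is the observed fraction of predicted nucleotides falling in an annotation class. The second is the fraction expected if predictions were spread evenly across the genome, which is that class's share of the genome.

A minimal sketch follows, assuming nucleotide counts as the unit. The numbers in the usage line are hypothetical.

```python
def fold_enrichment(pred_nt_in_region: int, total_pred_nt: int, region_nt: int, genome_nt: int) -> float:
    """(pred_in_region / total_pred) / (region / genome), rearranged to a single division.

    1.0 means no enrichment; above 1 means predictions are over-represented in the region.
    """
    if min(total_pred_nt, region_nt, genome_nt) <= 0:
        raise ValueError("totals and region size must be positive")
    return (pred_nt_in_region * genome_nt) / (total_pred_nt * region_nt)


if __name__ == "__main__":
    # Hypothetical: 30% of predicted nucleotides fall in a class covering 20% of the genome.
    print(fold_enrichment(30, 100, 20, 100))  # 1.5
```

Annotation classes such as exonic, CDS and UTR overlap one another, so each class is compared with the uniform expectation separately rather than as parts of one partition.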
www.martinalexandersmith.com/ECS
>4,000,000 high-confidence predictions
Garvan Institute, Sydney, Australia; Interdisciplinary Centre for Bioinformatics, Leipzig, Germany; Centre for Integrative Bioinformatics, Vienna, Austria
Less than 10% of the genome is currently defined as evolutionarily constrained.