RNA meets promoter (The Story of Old, New, Borrowed and Blue)
Wolfgang Otto
Chair in Bioinformatics, University of Leipzig
Herbstseminar, October 2009
Introduction Method Results
Motivation
• background: ncRNA annotation within the worms (this presentation covers only Caenorhabditis; P. pacificus has slightly different promoters, while B. malayi, M. haplanaria and M. incognita acrita have no recognisable promoter elements)
• goal: find new members of the ncRNA families (and maybe genes of unknown families)
Idea
• find ncRNA genes by their promoters
1. analyse promoters of known worm ncRNA genes
2. search for similar motifs in worm genomes
3. check the DNA downstream of promoter-like regions
Promoter Analyses
• ncRNAs are transcribed by RNA polymerase II or III
• extracted 100 nt upstream of U3 snoRNA, 7SK RNA, SL1 and SL2 RNA, SmY RNA, U1, U2, U4 and U5 snRNA (pol II) and 100 nt upstream of RNase MRP, RNase P, sbRNA and U6 snRNA (pol III)
• aligned the extracted sequences on the PSE-B box (present in every promoter sequence)
• ncRNA sequences without these promoter elements were used to identify pseudogenes
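The extraction step above has one strand-dependent detail: for a minus-strand gene the "upstream" 100 nt lie to the right of the gene on the genome and must be reverse-complemented. The following is a minimal sketch, assuming the chromosome is held as a Python string and gene coordinates are 0-based, half-open; the function names are hypothetical and not taken from the original pipeline.

```python
# Minimal sketch (assumptions: chromosome held as a Python string,
# coordinates 0-based half-open [start, end), strand is "+" or "-").

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom: str, start: int, end: int, strand: str,
                    length: int = 100) -> str:
    """Return up to `length` nt directly 5' of a gene, in gene orientation.

    Plus strand: the region lies left of `start`.
    Minus strand: it lies right of `end` and is reverse-complemented.
    The region is truncated at the chromosome ends.
    """
    if strand == "+":
        return chrom[max(0, start - length):start]
    if strand == "-":
        return reverse_complement(chrom[end:end + length])
    raise ValueError(f"unknown strand: {strand!r}")


if __name__ == "__main__":
    chrom = "TTTTAACG"
    print(upstream_region(chrom, 4, 8, "+", length=4))  # TTTT
    print(upstream_region(chrom, 0, 4, "-", length=4))  # CGTT
```

The same function with the roles of `start` and `end` swapped would give the downstream windows used later in the pipeline.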
Global Promoter Search
• used the alignment to create a fragrep pattern for each kind of promoter sequence
• remarkable: two kinds of PSE-B boxes in pol II promoters
pol II (a): U3 snoRNA, 7SK RNA, SL1 RNA
pol II (b): SL2 RNA, SmY RNA, U1, U2, U4 and U5 snRNA
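A fragrep-style pattern describes a promoter as an ordered chain of conserved fragments separated by spacers of bounded length, which is why two pol II box variants need two separate patterns. The sketch below is a simplified, hypothetical re-implementation of that idea using plain consensus strings with a per-fragment mismatch budget; it is not fragrep's input format or scoring scheme, and the fragment sequences in the test are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Fragment:
    consensus: str       # plain A/C/G/T consensus of one conserved block (hypothetical)
    max_mismatches: int  # mismatches tolerated inside this block


@dataclass
class FragmentPattern:
    fragments: List[Fragment]
    gaps: List[Tuple[int, int]]  # (min, max) spacer length between blocks i and i+1


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def _match_from(seq: str, pattern: FragmentPattern, i: int, pos: int) -> bool:
    """True if fragments i, i+1, ... of the pattern match starting at seq[pos]."""
    frag = pattern.fragments[i]
    window = seq[pos:pos + len(frag.consensus)]
    if len(window) < len(frag.consensus):
        return False
    if hamming(window, frag.consensus) > frag.max_mismatches:
        return False
    if i == len(pattern.fragments) - 1:
        return True
    lo, hi = pattern.gaps[i]
    nxt = pos + len(frag.consensus)
    # Try every allowed spacer length before the next fragment.
    return any(_match_from(seq, pattern, i + 1, nxt + g) for g in range(lo, hi + 1))


def find_matches(seq: str, pattern: FragmentPattern) -> List[int]:
    """Start positions at which the whole fragment chain matches."""
    if len(pattern.gaps) != len(pattern.fragments) - 1:
        raise ValueError("need exactly one gap range between consecutive fragments")
    return [s for s in range(len(seq)) if _match_from(seq, pattern, 0, s)]
```

The brute-force recursion is only meant to make the pattern semantics explicit; a real genome-wide search would need an index or automaton-based approach.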
Global Promoter Search
• pattern search with fragrep and relative variable scores ⇒ high number of candidates
• reduce the number of false positive hits:
  • use a mismatch-based matrix similarity score (mmS)
  • shuffle the genomes and repeat the fragrep search; all hits found there are false positives
  • define the mmS cutoff at which only 1% of these false positive hits remain
  • apply this cutoff to the hits in the original genomes
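The cutoff procedure above amounts to an empirical false-positive calibration: score hits in a shuffled genome, choose the threshold that only 1% of those decoy hits exceed, then apply it to the real hits. A minimal sketch follows; it assumes that a higher mmS means a better match, and it uses a simple mononucleotide shuffle because the slides do not say which shuffling was used (dinucleotide-preserving shuffles are a common alternative). The mmS formula itself is not reproduced; scores are taken as given.

```python
import random
from typing import List, Sequence, Tuple


def shuffle_sequence(seq: str, seed: int = 0) -> str:
    """Mononucleotide shuffle: keeps base composition, destroys real motifs."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def mms_cutoff(decoy_scores: Sequence[float], keep_fraction: float = 0.01) -> float:
    """Score threshold such that at most `keep_fraction` of decoy hits exceed it.

    All hits in the shuffled genome are treated as false positives; a hit is
    kept only if its score is strictly greater than the returned value.
    Assumes higher mmS means a better match to the promoter pattern.
    """
    if not decoy_scores:
        raise ValueError("no decoy scores")
    if not 0.0 <= keep_fraction < 1.0:
        raise ValueError("keep_fraction must be in [0, 1)")
    ranked = sorted(decoy_scores, reverse=True)
    allowed = int(keep_fraction * len(ranked))  # decoys tolerated above the cutoff
    return ranked[allowed]


def filter_hits(hits: Sequence[Tuple[str, float]], cutoff: float) -> List[Tuple[str, float]]:
    """Keep (hit_id, mmS) pairs from the real genome that score above the cutoff."""
    return [hit for hit in hits if hit[1] > cutoff]
```

With 200 decoy scores 1, 2, ..., 200, the cutoff is 198, so exactly two decoys (1%) would survive; real hits are then filtered with the same threshold.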
The Story of Old, New, Borrowed and Blue
• remaining hits are assumed to be real promoter sequences
1. match the hits to the known ncRNAs that were used for pattern generation [⇒ old and borrowed]
2. look for new members of known families [⇒ new]
  • generate a FASTA database of all known ncRNAs
  • BLAST the 100 nt downstream of each hit against this database
3. check the remaining hits for ncRNA genes of unknown families [⇒ blue]
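The three steps above can be read as a decision rule applied to each surviving promoter hit. The sketch below assumes the downstream region of each hit is given as a genomic interval, that the genes used for pattern generation are available as intervals, and that the BLAST search was run with tabular output (`-outfmt 6`, where column 1 is the query id, column 2 the subject id and column 11 the e-value). The e-value threshold and all names are illustrative assumptions; since the slides group "old" and "borrowed" together, the sketch does too.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, half-open
    end: int

    def overlaps(self, other: "Interval") -> bool:
        return (self.chrom == other.chrom
                and self.start < other.end
                and other.start < self.end)


def parse_blast_tabular(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Set[str]]:
    """Map query id -> subject ids from BLAST tabular output (-outfmt 6).

    The e-value threshold is an illustrative assumption.
    """
    hits: Dict[str, Set[str]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if float(cols[10]) <= max_evalue:
            hits.setdefault(cols[0], set()).add(cols[1])
    return hits


def classify_hit(hit_id: str, downstream: Interval,
                 training_genes: List[Interval],
                 blast_hits: Dict[str, Set[str]]) -> str:
    """Assign a promoter hit to one of the three categories."""
    if any(downstream.overlaps(gene) for gene in training_genes):
        return "old/borrowed"   # step 1: a gene already used to build the pattern
    if blast_hits.get(hit_id):
        return "new"            # step 2: similar to a known ncRNA family
    return "blue"               # step 3: candidate for an unknown family
```

Checking the training genes first matters: those genes would otherwise also produce BLAST hits against the database of known ncRNAs and be miscounted as new members.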
The Blue Part
• extract 70 nt downstream of each promoter hit
• create clusters of similar sequences with blastclust
• remove all clusters that contain sequences from only one species
• check the remaining inter-species clusters for potential ncRNA genes (so far only by inspection in the UCSC genome browser)
• ⇒ 11 potential new ncRNA families
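The species filter above is a small post-processing step on the clustering result. This sketch assumes the usual blastclust output of one cluster per line with whitespace-separated sequence ids, and a hypothetical id convention `species|hit` (for example `cel|h1`) that lets the species be read off each id; the original pipeline's naming scheme is not given in the slides.

```python
from typing import Iterable, List


def parse_blastclust(lines: Iterable[str]) -> List[List[str]]:
    """One cluster per non-empty line, sequence ids separated by whitespace."""
    return [line.split() for line in lines if line.strip()]


def species_of(seq_id: str, sep: str = "|") -> str:
    """Species tag of a sequence id, assuming ids look like 'cel|h42' (hypothetical)."""
    return seq_id.split(sep, 1)[0]


def inter_species_clusters(clusters: List[List[str]]) -> List[List[str]]:
    """Drop clusters whose members all come from a single species."""
    return [c for c in clusters if len({species_of(s) for s in c}) >= 2]
```

Singletons are removed by the same rule, since a one-member cluster necessarily contains only one species.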
Cluster 1 (c000124, Pol III)
• chrI:8,245,471-8,245,683 (-); MFE = −14.50 kcal/mol
• EST: GenBank BJ118936.1 (unpublished oligo-capped cDNA library, L1 stage)
[Figure: UCSC Genome Browser view of the locus; tracks: gaps, GC percent, WormBase and RefSeq genes, C. elegans mRNAs and spliced ESTs, Multiz conservation (6 nematodes) with alignment nets for C. remanei, C. briggsae, C. brenneri, C. japonica and P. pacificus, RepeatMasker]
[Figure: predicted minimum free energy secondary structure of the candidate]
Cluster 2 (c000147, Pol III)
• chrII:5,599,660-5,599,872 (+); MFE = −21.00 kcal/mol
[Figure: UCSC Genome Browser view of the locus; tracks: gaps, GC percent, WormBase and RefSeq genes, C. elegans mRNAs and spliced ESTs, Multiz conservation (6 nematodes) with alignment nets for C. remanei, C. briggsae, C. brenneri, C. japonica and P. pacificus, RepeatMasker]
[Figure: predicted minimum free energy secondary structure of the candidate]
Cluster 3 (h0000010, Pol III)
• chrII:14,635,601-14,636,068 (-)
[Figure: UCSC Genome Browser view of the locus; tracks: WormBase and RefSeq genes (annotated genes in view: C38C6.4, sre-13), C. elegans mRNAs and spliced ESTs, Multiz conservation (6 nematodes) with alignment nets for C. remanei, C. briggsae, C. brenneri, C. japonica and P. pacificus, RepeatMasker]