
3.31.2005 BIOL497 Undergraduate Presentation, Stanislav Luban, Member of Kihara Lab, Purdue Univ.

1

Stanislav Luban (1, 2)

Daisuke Kihara (2, 1)

1. Department of Computer Sciences
2. Department of Biological Sciences

Purdue University, West Lafayette, IN

Comparative Study of Small RNA and Small Peptides in Complete Genome Sequences


2

Introduction: Structural Small RNA (sRNA)

sRNA genes produce non-coding transcripts that function directly as structural, regulatory, or catalytic RNAs

They include rRNAs, tRNAs, small nucleolar RNAs, spliceosomal RNAs, virus-associated RNAs, microRNAs, ctRNAs, and others

The Rfam (RNA families) database stores 34496 sRNA entries distributed among 352 known families

In E. coli, about 50 sRNAs are known

(figure from Rfam database: http://www.sanger.ac.uk/Software/Rfam/)


3

Methods: QRNA

QRNA models three distinctive patterns of mutation in a pairwise alignment:

Conserved structural RNA: a pattern of compensatory mutations consistent with a base-paired secondary structure (pair stochastic context-free grammar model)

Conserved coding region: a pattern of synonymous codon substitutions (pair hidden Markov model)

Other types of conserved regions: approximated by a “null hypothesis” that mutations occur position-independently, without pattern (pair hidden Markov model)

The scores are log likelihoods, which are used to calculate a final log-odds score for the RNA model compared with the other two models

(Figure: Rivas et al., Current Biol. 2001)
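The slide does not spell out how the three log likelihoods become one score. A minimal sketch, assuming equal prior probabilities for the three models and log-likelihoods in bits, computes the posterior log-odds of the RNA model against the other two, log2[P(a|RNA)π_RNA / (P(a|COD)π_COD + P(a|OTH)π_OTH)]. The function names and the equal priors are illustrative assumptions, not QRNA's own code.

```python
import math


def log2_sum(*values: float) -> float:
    """Numerically stable log2(2**v1 + 2**v2 + ...) for values given in bits."""
    top = max(values)
    return top + math.log2(sum(2.0 ** (v - top) for v in values))


def rna_posterior_logodds(ll_rna: float, ll_cod: float, ll_oth: float,
                          priors=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Posterior log-odds (bits) of the RNA model against the other two models.

    ll_rna, ll_cod, ll_oth: log2-likelihoods of one pairwise alignment under the
    RNA (pair SCFG), coding (pair HMM) and null/"other" (pair HMM) models.
    Equal priors are an assumption made for illustration only.
    """
    lp_rna, lp_cod, lp_oth = (math.log2(p) for p in priors)
    numerator = ll_rna + lp_rna
    denominator = log2_sum(ll_cod + lp_cod, ll_oth + lp_oth)
    return numerator - denominator


# Hypothetical likelihoods: the RNA model explains the alignment 2**8 times
# better than either alternative; the two alternatives together cost 1 bit,
# so the score is about 7 bits.
print(rna_posterior_logodds(-100.0, -108.0, -108.0))
```

Working in log space with a max-shifted sum avoids underflow, because likelihoods of long alignments are extremely small numbers.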


4

Procedure for Extracting sRNAs

1. Extract intergenic regions from 30 sequenced genomes
2. Perform an all vs. all nucleotide-nucleotide BLAST
3. Select significant alignments, concatenate them, and format them into QRNA program input
4. Run QRNA and extract alignments scoring as sRNAs rather than as coding or null hypothesis regions
5. Eliminate alignment regions that overlap >50% with E. coli regulatory regions
6. Extend regions that lie within 25 nt of other regions so that they include each other
7. Merge sRNA regions that align or exactly overlap into families
8. Eliminate family regions not found using both the query and the database organism as source
9. Verify results computationally and experimentally (yet to be done)
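The regulatory-overlap filter and the 25 nt extension described above are interval operations. The sketch below assumes 1-based inclusive coordinates and reads ">50% overlap" as more than half of the candidate region being covered by any single regulatory region; both are assumptions, since the slides do not define them precisely.

```python
from typing import List, Tuple

# (start, end) genome coordinates, 1-based and inclusive (an assumption).
Region = Tuple[int, int]


def overlap_fraction(region: Region, other: Region) -> float:
    """Fraction of `region` covered by `other`; 0.0 if they are disjoint."""
    start, end = max(region[0], other[0]), min(region[1], other[1])
    if end < start:
        return 0.0
    return (end - start + 1) / (region[1] - region[0] + 1)


def drop_regulatory(regions: List[Region], regulatory: List[Region],
                    max_fraction: float = 0.5) -> List[Region]:
    """Discard regions more than 50% covered by any single regulatory region."""
    return [r for r in regions
            if all(overlap_fraction(r, g) <= max_fraction for g in regulatory)]


def merge_nearby(regions: List[Region], gap: int = 25) -> List[Region]:
    """Join regions separated by at most `gap` nt (or overlapping) into one span."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] - 1 <= gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


candidates = [(1, 100), (126, 200), (300, 350), (400, 480)]
regulatory = [(400, 470)]  # hypothetical E. coli regulatory region
print(merge_nearby(drop_regulatory(candidates, regulatory)))  # [(1, 200), (300, 350)]
```

Sorting first lets the merge run in a single pass: each region only needs to be compared with the last merged span.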


5

Genome Data Set: 30 Microbial Genomes Used as Queries and Databases

Gammaproteobacteria

Acinetobacter calcoaceticus, Blochmannia floridanus, Buchnera aphidicola, Coxiella burnetii, Erwinia carotovora, Escherichia coli, Haemophilus ducreyi, Haemophilus influenzae, Pasteurella multocida, Photorhabdus luminescens, Pseudomonas aeruginosa, Pseudomonas putida, Pseudomonas syringae, Salmonella enterica, Salmonella typhimurium, Shewanella oneidensis, Shigella flexneri, Vibrio cholerae, Vibrio parahaemolyticus, Vibrio vulnificus, Wigglesworthia brevipalpis, Xanthomonas campestris, Xanthomonas citri, Xylella fastidiosa, Yersinia pestis

Alphaproteobacteria

Agrobacterium tumefaciens, Brucella melitensis, Caulobacter crescentus, Mesorhizobium loti

Deinococci

Deinococcus radiodurans


6

Result Statistics

Total number of intergenic regions: 94464

Average number of intergenic regions per organism: 3148.8

Total combined length of intergenic regions: 16663732 nt

Average length of intergenic region: 176.4 nt
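The first step of the procedure, extracting intergenic regions, and the statistics above can be sketched as follows. This assumes 1-based inclusive gene coordinates, pools genes from both strands, and treats each chromosome as linear (bacterial chromosomes are circular), so it is an illustration rather than the exact extraction used in this study.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (assumed)


def intergenic_regions(genes: List[Interval], genome_length: int) -> List[Interval]:
    """Gaps between annotated genes, treating the chromosome as linear.

    Genes on either strand are pooled; overlapping genes are handled by
    tracking the furthest gene end seen so far.
    """
    regions: List[Interval] = []
    covered_to = 0
    for start, end in sorted(genes):
        if start > covered_to + 1:
            regions.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)
    if covered_to < genome_length:
        regions.append((covered_to + 1, genome_length))
    return regions


def summarize(regions_by_organism: Dict[str, List[Interval]]) -> Dict[str, float]:
    """Totals and averages of the kind reported on the Result Statistics slide."""
    all_regions = [r for rs in regions_by_organism.values() for r in rs]
    total_length = sum(end - start + 1 for start, end in all_regions)
    return {
        "total_regions": len(all_regions),
        "avg_regions_per_organism": len(all_regions) / len(regions_by_organism),
        "total_length_nt": total_length,
        "avg_length_nt": total_length / len(all_regions),
    }


# Toy 600 nt genome with three hypothetical, partly overlapping genes.
print(intergenic_regions([(101, 200), (150, 300), (401, 500)], 600))
# [(1, 100), (301, 400), (501, 600)]

# The slide's totals reproduce its averages.
print(round(94464 / 30, 1), round(16663732 / 94464, 1))  # 3148.8 176.4
```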


7

sRNA Length vs. Score Plot

[Figure: scatter plot of QRNA score (log odds, y-axis, 0 to 140) against sRNA length (nt, x-axis, 0 to 2000)]

Total: 29488 sRNAs


8

Number of sRNA Entries by Organism

[Bar chart: number of sRNA entries (y-axis, 0 to 2500) for each of the 30 organisms (x-axis)]

1 - Pseudomonas putida: 875
2 - Shigella flexneri: 1036
3 - Xanthomonas citri: 1094
4 - Shewanella oneidensis: 976
5 - Wigglesworthia brevipalpis: 383
6 - Haemophilus ducreyi: 744
7 - Pseudomonas syringae: 679
8 - Erwinia carotovora: 1189
9 - Escherichia coli: 1510
10 - Vibrio parahaemolyticus: 1372
11 - Mesorhizobium loti: 773
12 - Buchnera aphidicola: 715
13 - Brucella melitensis: 612
14 - Yersinia pestis: 1412
15 - Xylella fastidiosa: 316
16 - Pseudomonas aeruginosa: 1003
17 - Salmonella enterica: 1848
18 - Caulobacter crescentus: 503
19 - Agrobacterium tumefaciens: 455
20 - Blochmannia floridanus: 729
21 - Pasteurella multocida: 807
22 - Deinococcus radiodurans: 271
23 - Vibrio cholerae: 1000
24 - Photorhabdus luminescens: 2178
25 - Coxiella burnetii: 401
26 - Vibrio vulnificus: 1402
27 - Salmonella typhimurium: 1788
28 - Acinetobacter calcoaceticus: 1331
29 - Xanthomonas campestris: 1201
30 - Haemophilus influenzae: 885

Total: 29488 sRNAs


9

Conservation of sRNAs

[Bar chart: number of sRNA families (y-axis, 0 to 3000) vs. number of entries from distinct organisms (x-axis, 1 to 30)]

Number of sRNA families with entries from 1, 2, ..., 30 distinct organisms:
0, 2797, 594, 240, 75, 26, 17, 11, 1, 1, 0, 2, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1

Total: 3768 families


10

Conservation of sRNAs

[Bar chart: number of sRNA families (y-axis, 0 to 3000) vs. number of entries from distinct organisms (x-axis, 1 to 30), for all families and for families with an E. coli entry]

All families, with entries from 1, 2, ..., 30 distinct organisms:
0, 2797, 594, 240, 75, 26, 17, 11, 1, 1, 0, 2, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1

Families containing at least one E. coli entry:
0, 247, 121, 107, 46, 10, 10, 7, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1

Total: 3768 families

E. coli total: 554 families

For comparison, the statistics for all families are shown alongside those for families containing at least one E. coli entry
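The conservation histogram can be computed by counting the distinct organisms in each family. In this sketch each family is a list of (organism, start, end) entries, which is an assumed data layout; per the candidate-slide convention, two chromosomes of one organism would need distinct keys to be counted separately. The chart values are transcribed from the slides.

```python
from typing import List, Tuple

Entry = Tuple[str, int, int]  # (organism or chromosome id, start, end), assumed layout

# Transcribed from the "Conservation of sRNAs" charts: index k-1 holds the
# number of families whose entries come from k distinct organisms.
ALL_FAMILIES = [0, 2797, 594, 240, 75, 26, 17, 11, 1, 1, 0, 2, 0, 0, 1,
                0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
ECOLI_FAMILIES = [0, 247, 121, 107, 46, 10, 10, 7, 0, 1, 0, 1, 0, 0, 1,
                  0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]


def conservation_histogram(families: List[List[Entry]], max_orgs: int = 30) -> List[int]:
    """Histogram of families by the number of distinct organisms among their entries."""
    hist = [0] * max_orgs
    for entries in families:
        k = len({organism for organism, _start, _end in entries})
        hist[k - 1] += 1
    return hist


def conserved_in_at_least(hist: List[int], n: int) -> int:
    """Number of families with entries from n or more distinct organisms."""
    return sum(hist[n - 1:])


print(sum(ALL_FAMILIES), sum(ECOLI_FAMILIES), conserved_in_at_least(ALL_FAMILIES, 5))
# 3768 554 137
```

The totals agree with the slide (3768 and 554 families), and the count for five or more organisms matches the 137 families cited in the Conclusions.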


11

Common Organism Combinations in Families

Top 5 most frequent combinations of 4 and of 7 organisms:

Combination: Occurrences:

Ecoli, Senterica, Sflexneri, Styphimurium 117

Ecarotovora, Ecoli, Senterica, Styphimurium 26

Ecoli, Senterica, Styphimurium, Ypestis 20

Ecarotovora, Ecoli, Sflexneri, Styphimurium 18

Ecoli, Sflexneri, Styphimurium, Ypestis 17

Ecarotovora, Ecoli, Pluminescens, Senterica, Sflexneri, Styphimurium, Ypestis 4

Acalcoaceticus, Ccrescentus, Mloti, Paeruginosa, Pputida, Psyringae, Xcampestris 2

Acalcoaceticus, Atumefaciens, Ccrescentus, Mloti, Pputida, Psyringae, Xcampestris 2

Acalcoaceticus, Atumefaciens, Ccrescentus, Mloti, Paeruginosa, Psyringae, Xcampestris 2

Acalcoaceticus, Atumefaciens, Ccrescentus, Mloti, Paeruginosa, Pputida, Xcampestris 2
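The combination counts above can be reproduced by tallying organism sets across families. This sketch assumes a "combination" is the exact set of organisms in one family (families of 4 organisms are counted for the 4-organism table); the slides do not say whether subsets of larger families were also counted. The family data here are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_combinations(families: Iterable[Iterable[str]], size: int,
                     top: int = 5) -> List[Tuple[Tuple[str, ...], int]]:
    """Most frequent exact sets of `size` organisms among sRNA families."""
    counts = Counter()
    for organisms in families:
        combo = tuple(sorted(set(organisms)))  # order-independent key
        if len(combo) == size:
            counts[combo] += 1
    return counts.most_common(top)


families = [  # hypothetical families, listed by organism abbreviation
    ["Ecoli", "Senterica", "Sflexneri", "Styphimurium"],
    ["Styphimurium", "Ecoli", "Sflexneri", "Senterica"],
    ["Ecoli", "Senterica", "Styphimurium", "Ypestis"],
    ["Ecoli", "Ypestis"],
]
print(top_combinations(families, size=4))
# [(('Ecoli', 'Senterica', 'Sflexneri', 'Styphimurium'), 2),
#  (('Ecoli', 'Senterica', 'Styphimurium', 'Ypestis'), 1)]
```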


12

Result Verification

A total of 71 E. coli-related sRNAs already annotated in the Rfam database were used as a benchmark

Of those:
15 – found by the computational method (also listed in Rfam; not tRNAs)
6 – not found due to shortcomings of the method
29 – tRNAs already annotated as gene loci in the E. coli genome sequence used
10 – E. coli plasmid loci, not present in the full E. coli genome sequence used
2 – 4.5S RNAs already annotated as gene loci in the E. coli genome sequence used
2 – E. coli reverse transcriptase loci, not present in the full E. coli genome sequence used
1 – E. coli insertion sequence, not present in the full E. coli genome sequence used
1 – E. coli small RNA annotated separately, not present in the full E. coli genome sequence used
1 – antisense RNA already annotated as a gene locus in the E. coli genome sequence used
1 – cloning vector with an E. coli promoter, not present in the full E. coli genome sequence used
1 – E. coli transposable element, not present in the full E. coli genome sequence used
1 – reporter vector, not present in the full E. coli genome sequence used
1 – E. coli retron, not present in the full E. coli genome sequence used
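A quick tally confirms that the benchmark categories add up to 71. The recall figure below is a derived illustration, not a number reported on the slide: it assumes that the loci annotated as genes or absent from the genome sequence were outside what an intergenic search could detect, leaving 15 + 6 searchable loci.

```python
# Counts transcribed from the "Result Verification" slide.
found = 15
missed = 6  # not found due to shortcomings of the method
outside_search = {  # annotated as gene loci, or absent from the genome sequence used
    "tRNA gene loci": 29,
    "plasmid loci": 10,
    "4.5S RNA gene loci": 2,
    "reverse transcriptase loci": 2,
    "insertion sequence": 1,
    "separately annotated small RNA": 1,
    "antisense RNA gene locus": 1,
    "cloning vector": 1,
    "transposable element": 1,
    "reporter vector": 1,
    "retron": 1,
}

total = found + missed + sum(outside_search.values())
recall = found / (found + missed)  # among loci an intergenic search could see (assumption)
print(total, f"{recall:.1%}")  # 71 71.4%
```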


13

Candidates for Experimental Verification of Findings

For the following 2 slides:

The family designation is given as [organism name] [absolute locus start] [absolute locus end] and is the same as that of the first (header) entry of the family

Entries: the number of sRNA entries in the family from different organisms (two chromosomes of one organism are counted separately)

Length (nt) and score refer only to the header entry of the family

Scores are the log odds posterior (“log odds post”) calculated by the QRNA program for the RNA model as opposed to the null hypothesis


14

Candidates for Experimental Verification of Findings

Top 10 highest-scoring E. coli sRNA loci found by the computational method:

Family designation: Ecoli 3941194 3941327; Length: 133; Score: 34.114
Family designation: Ecoli 2744345 2744445; Length: 100; Score: 29.631
Family designation: Ecoli 780875 781068; Length: 193; Score: 29.194
Family designation: Ecoli 2687537 2687689; Length: 152; Score: 27.734
Family designation: Ecoli 2519348 2519548; Length: 200; Score: 23.876
Family designation: Ecoli 4169337 4169400; Length: 63; Score: 21.625
Family designation: Ecoli 4038218 4038281; Length: 63; Score: 21.596
Family designation: Ecoli 2751994 2752022; Length: 28; Score: 20.893
Family designation: Ecoli 3420989 3421058; Length: 69; Score: 20.821
Family designation: Ecoli 3808832 3808858; Length: 26; Score: 16.995
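Ranking candidate loci like this from a text listing can be sketched as below. The regular expression targets the "Family designation: ... Score: ..." layout used on these slides; the `Family` type and the parsing approach are assumptions for illustration. Note that the slides report length as end minus start (3941327 - 3941194 = 133), not as an inclusive count.

```python
import re
from typing import List, NamedTuple


class Family(NamedTuple):
    organism: str
    start: int
    end: int
    score: float

    @property
    def length(self) -> int:
        # Matches the slides' convention: end - start, e.g. 3941327 - 3941194 = 133.
        return self.end - self.start


# "Family designation: <organism> <start> <end> ... Score: <value>"
DESIGNATION = re.compile(
    r"Family designation:\s*(\S+)\s+(\d+)\s+(\d+).*?Score:\s*([\d.]+)")


def parse_families(text: str) -> List[Family]:
    return [Family(m.group(1), int(m.group(2)), int(m.group(3)), float(m.group(4)))
            for m in DESIGNATION.finditer(text)]


def top_by_score(families: List[Family], n: int = 10) -> List[Family]:
    return sorted(families, key=lambda f: f.score, reverse=True)[:n]


slide_text = ("Family designation: Ecoli 2744345 2744445 Length: 100 Score: 29.631 "
              "Family designation: Ecoli 3941194 3941327 Length: 133 Score: 34.114")
best = top_by_score(parse_families(slide_text), n=1)[0]
print(best.start, best.end, best.length, best.score)  # 3941194 3941327 133 34.114
```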


15

Candidates for Experimental Verification of Findings

Top 10 largest sRNA families found by the computational method:

Family designation: Styphimurium 3358766 3358804; Entries: 18; Length: 38; Score: 4.590
Family designation: Ecarotovora 3161909 3161946; Entries: 15; Length: 37; Score: 12.604
Family designation: Ecarotovora 1144121 1144141; Entries: 12; Length: 20; Score: 5.265
Family designation: Styphimurium 3342804 3342899; Entries: 12; Length: 95; Score: 4.328
Family designation: Ecarotovora 2597534 2597593; Entries: 10; Length: 59; Score: 3.343
Family designation: Paeruginosa 2508264 2508282; Entries: 9; Length: 18; Score: 7.068
Family designation: Styphimurium 975191 975219; Entries: 8; Length: 28; Score: 16.296
Family designation: Styphimurium 3746886 3746903; Entries: 8; Length: 17; Score: 1.146
Family designation: Ecarotovora 3477891 3477922; Entries: 8; Length: 31; Score: 2.697
Family designation: Ecarotovora 4490537 4490683*; Entries: 7; Length: 146; Score: 16.753

*This last entry was used as a sample for a detailed study and is discussed on the following slides.


16

Detailed Study of Located Sample sRNA

Organism       Location (in genome)   Length (nt)   Score    Neighboring genes
Ecarotovora    4490537-4490683        146           16.753   rpsM - rpmJ
Pluminescens   5487752-5487866        114           10.791   rpsM - secY
Ypestis        232330-232476          146           15.757   rpmJ - rpsM
Styphimurium   3585744-3585879        135           41.980   rpsM - rpmJ
Senterica      4243623-4243770        147           40.046   rpmJ - rpsM
Ecoli          3440108-3440255        147           43.556   rpsM - rpmJ
Sflexneri      3426855-3427002        147           41.980   rpsM - rpmJ

Hit to Alpha_RBS RNA (Rfam: RF00140) (115 nt)

Rfam Sequence: GUCCUUGAUAUUCUGUUUGAGUAUCCUGAAAACGGGCUUUUCAAGAUCAGAAUAUCAAAUUAAUUAAAAUAUAGGAGUGCAUAGUGGCCCGUAUUGCAGGCAUUAACAUUCCUGAU


17

Detailed Study of Located Sample sRNA

Most likely (lowest free energy) predicted fold of an 80 nt segment of the sequence

(Predicted with Mfold; Zuker et al., 2004)
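Mfold predicts the lowest free energy fold using nearest-neighbor energy parameters. To illustrate only the underlying dynamic-programming idea, the sketch below swaps in the much simpler Nussinov algorithm, which maximizes the number of nested base pairs (Watson-Crick plus G-U). It will not reproduce Mfold's predicted fold; the function name and the minimum loop size of 3 are assumptions for this example.

```python
from typing import List, Tuple

# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Maximize nested base pairs; return (pair count, dot-bracket string).

    `min_loop` is the minimum number of unpaired bases enclosed by a hairpin.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    dp: List[List[int]] = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])        # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)    # i pairs with j
            for k in range(i + 1, j):                     # split into two parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:                                          # traceback
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

The cubic running time is fine for an 80 nt segment; free-energy methods such as Mfold use the same kind of recursion but score stacked pairs and loops with measured energies instead of counting pairs.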


18

Another Approach to Finding sRNAs in E. coli: Paper Summary


19

Method Used in Paper to Find Putative sRNAs

A database of all E. coli intergenic DNA sequences was built from the gene annotations in an early release of the EcoGene database and used as input to a profile search program (pftools 2.2, Swiss Institute of Bioinformatics) configured to find sigma-70 promoters

The terminator motif was searched for in the database using the following criteria: (1) an 11-nt A-rich region; (2) a variable-length hairpin; (3) a variable-length spacer; (4) a 5-nt T-rich region nearest the hairpin; and (5) a 7-nt distal extra T-rich region

Predicted promoters and terminators were paired to generate putative sRNAs if (1) both were on the same strand and (2) they were more than 45 but less than 350 nt apart

To check the candidates, open reading frames and possible ribosome binding sites were searched for downstream of each promoter
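The promoter/terminator pairing rule above can be sketched as a filter over predicted sites. This assumes each motif is reduced to one genome coordinate and that distance is measured in the direction of transcription, so on the reverse strand the terminator lies at a lower coordinate; the `Site` type and function name are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    position: int  # genome coordinate of the predicted motif (assumed: one point per motif)
    strand: str    # "+" or "-"


def pair_promoters_terminators(promoters: List[Site], terminators: List[Site],
                               min_gap: int = 45, max_gap: int = 350
                               ) -> List[Tuple[Site, Site]]:
    """Same-strand promoter/terminator pairs more than 45 and less than 350 nt apart."""
    pairs = []
    for p in promoters:
        for t in terminators:
            if p.strand != t.strand:
                continue
            # Distance in the direction of transcription (an assumption):
            # the terminator must lie downstream of the promoter on its strand.
            gap = t.position - p.position if p.strand == "+" else p.position - t.position
            if min_gap < gap < max_gap:
                pairs.append((p, t))
    return pairs


promoters = [Site(1000, "+"), Site(5000, "-")]
terminators = [Site(1100, "+"), Site(1500, "+"), Site(4800, "-"), Site(900, "-")]
for p, t in pair_promoters_terminators(promoters, terminators):
    print(p.strand, p.position, t.position)
# + 1000 1100
# - 5000 4800
```

The all-pairs loop is quadratic, which is acceptable for a sketch; sorting sites by position per strand would allow a windowed scan on a whole genome.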


20

Synopsis of Method Used in Paper

Using the E. coli MG1655 genome, the paper searched for DNA regions containing a sigma-70 promoter within a short distance of a rho-independent terminator

The paper predicted 227 putative sRNAs of 80 to 400 nt in E. coli, 32 of which were already known sRNAs

Transcripts of some of the candidate loci were verified by Northern hybridization

The approach may also be useful for annotating sRNA loci in other bacterial genomes


21

Verification of Paper Results with Results Using Our Method

Along with other results, the paper gives a detailed listing of the 227 predicted sRNAs, including the designation, strand orientation (forward or reverse), left and right boundaries (nt from the genome start position), and length (nt) of each sRNA

The left and right genome boundary positions given by the paper were compared with the left and right boundary positions of the putative sRNAs found by our method

If an sRNA candidate from the paper was within 100 nt of any sRNA predicted by our method, that candidate was scored as ‘found’
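The comparison above can be sketched as an interval-distance test. One reading of "within 100 nt" is that the gap between the two intervals is at most 100 nt, with overlapping or adjacent intervals counting as 0; the slides do not say whether each boundary had to match individually, so this is an assumption. The coordinates are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # (left, right) boundaries, nt from the genome start


def gap_between(a: Interval, b: Interval) -> int:
    """Number of nucleotides separating two intervals; 0 if they overlap or abut."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]) - 1)


def count_found(paper: List[Interval], ours: List[Interval], threshold: int = 100) -> int:
    """Paper candidates lying within `threshold` nt of any sRNA from our method."""
    return sum(1 for p in paper if any(gap_between(p, q) <= threshold for q in ours))


def sweep(paper: List[Interval], ours: List[Interval],
          thresholds: Iterable[int] = (10, 50, 100, 1000)) -> Dict[int, int]:
    """Re-run the comparison for several distance thresholds."""
    return {t: count_found(paper, ours, t) for t in thresholds}


paper = [(1000, 1100), (5000, 5200), (9000, 9100)]  # hypothetical coordinates
ours = [(1150, 1300), (5300, 5400)]
print(sweep(paper, ours))  # {10: 0, 50: 1, 100: 2, 1000: 2}
```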


22

Results of Verification

227 candidate sRNAs were predicted in E. coli by the paper

Of these, 150 (66.1%) were located by our method using the 100 nt criterion

The test was re-run with a 50 nt threshold, yielding 140 hits (61.7%); a 10 nt threshold, yielding 128 hits (56.4%); and a 1000 nt threshold, yielding 187 hits (82.4%)
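The reported percentages follow directly from the hit counts and the 227 candidates predicted by the paper:

```python
# Hit counts reported on this slide, keyed by distance threshold (nt).
PREDICTED_BY_PAPER = 227
hits_by_threshold = {10: 128, 50: 140, 100: 150, 1000: 187}

for threshold, hits in sorted(hits_by_threshold.items()):
    print(f"{threshold:>4} nt: {hits}/{PREDICTED_BY_PAPER} = {hits / PREDICTED_BY_PAPER:.1%}")
#   10 nt: 128/227 = 56.4%
#   50 nt: 140/227 = 61.7%
#  100 nt: 150/227 = 66.1%
# 1000 nt: 187/227 = 82.4%
```

The jump from 100 nt to 1000 nt suggests that many unmatched candidates lie near, but not on, regions predicted by the QRNA-based method.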


23

Preliminary Procedure for Extracting Small Peptides

1. Extract intergenic regions from 30 sequenced genomes
2. Perform an all vs. all nucleotide-nucleotide BLAST
3. Select significant alignments, concatenate them, and format them into QRNA program input
4. Run QRNA and extract alignments scoring as coding rather than as sRNA or null hypothesis regions
5. Extend regions that lie within 25 nt of other regions so that they include each other
6. Merge regions that align or exactly overlap into families
7. BLAST the resulting family entries against the SwissProt database
8. Score regions based on the quality of their fit inside a nearby open reading frame
9. Observe results and refine the extraction method
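Scoring a region by its fit inside a nearby open reading frame could look like the sketch below. It finds ATG-to-stop ORFs on the forward strand only and scores a region by the largest fraction of it contained in a single ORF; the forward-strand restriction, the 10-codon minimum, and the scoring rule are all assumptions, since the slide does not define "quality of fit".

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 10) -> List[Tuple[int, int]]:
    """ATG-to-stop ORFs on the forward strand as 0-based, half-open (start, end)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:  # codons including the stop
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def orf_fit(region: Tuple[int, int], orfs: List[Tuple[int, int]]) -> float:
    """Largest fraction of `region` (0-based, half-open) lying inside one ORF."""
    region_start, region_end = region
    best = 0.0
    for orf_start, orf_end in orfs:
        overlap = min(region_end, orf_end) - max(region_start, orf_start)
        if overlap > 0:
            best = max(best, overlap / (region_end - region_start))
    return best


toy = "CC" + "ATG" + "AAA" * 10 + "TAA" + "GG"  # one 12-codon ORF at (2, 38)
orfs = find_orfs(toy)
print(orfs, orf_fit((10, 20), orfs), orf_fit((30, 50), orfs))  # [(2, 38)] 1.0 0.4
```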


24

Preliminary Results of Small Peptide Search

Query sequence information:

Organism: Erwinia carotovora; location (in genome): 843815-843948; length: 133 nt; E-value: 0.69

Aligns to: gb|AAF36091.1| flagelliform silk protein [Nephila madagascariensis]

Sequence: aattccgtcgcatgttctctggtgagtacgacagcgcggattgctatctggatattcaggcgggatctggcggtacggaagcgcaggactgggccagcatgctggtacgtatgtacctgcgttgggcggaagc

TBLASTX alignment:

Query:   133 LPPNAGTYVPACWPSPALPYRQIPPEYPDSNP 38
Subject: 1373 LPPLXTSXXPPPPPPPSXPLXSLPPSXPPSLP 1278
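tblastx compares the six-frame translations of the query with those of the subject; the descending query coordinates in the alignment above (133 to 38) indicate a hit in a reverse-strand frame. A minimal six-frame translation, using the standard genetic code, is sketched below; the function names are hypothetical and this is not BLAST's implementation.

```python
from itertools import product
from typing import Dict, Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; codons with non-ACGT characters become 'X'.

    A trailing partial codon is dropped.
    """
    return "".join(CODE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> Dict[Tuple[str, int], str]:
    """Translations of frames 0-2 on the forward (+) and reverse (-) strands."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[("+", offset)] = translate(dna[offset:])
        frames[("-", offset)] = translate(rev[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
print(frames[("+", 0)], frames[("-", 0)])  # MA* LGH
```

Applying `six_frames` to the query sequence above gives the six peptides tblastx would search; `*` marks stop codons.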


25

Preliminary Results of Small Peptide Search

Query sequence information:

Organism: Pseudomonas syringae; location (in genome): 6171796-6172006; length: 210 nt; E-value: 0.23

Aligns to: emb|CAD88221.2| C. elegans GRL-25 protein (corresponding sequence ZK 643.8)

Sequence: tgagttccggcagctcgtcatccagcttctgacgcaaccgcccggtcagaaacgcaaagccctcgagcaaccgctccacatccggatcccgtccggcctgccccagaaacggcgccaacgccggactacgctcggcgaagcgacgaccaagctggcgcagtgcagtgagttcgctctggtagtaatggttaaaggacacgggttacctgc

TBLASTX alignment:

Query:   62 PRATAPHPDPVRPAPETAPTP 124
Subject: 90 PPAPAPRPPPVAPAPRPLPPP 28


26

Conclusions

Possible sRNAs are found in 20 to 39% of the intergenic regions of each organism

Of these, about 31% have a log-odds score of 5.0 or higher

137 “families” are conserved in 5 or more organisms

Because they are well conserved, these sRNAs may be responsible for fundamental functions of living organisms


27

Future Direction

The sRNA search will be expanded to a larger number of more diverse genomes

Secondary structure prediction will later be used in greater detail to verify sRNA regions that are well conserved among multiple evolutionarily distant organisms

Experimental verification of the findings of this study is under way (particularly for Shewanella oneidensis)

Comparative genomics will be used to discover the function of each sRNA and possibly its role in cellular pathways