Bioinformatics Practicals using C++, Perl, BioPerl and R language
Lab in Programming in C, PERL and R
I M.Sc. Bioinformatics (2012 – 2014)
Department of Bioinformatics, Noorul Islam College of Arts and Science, Kumaracoil – 629 180
PRACTICAL: 01 TRANSCRIPTION AND TRANSLATION USING PERL
/ / 201
AIM:
To write a PERL program that, according to the user's choice, finds the complement, reverse
complement, transcribed RNA or translated protein of a DNA sequence.
SOFTWARE USED:
Perl 5.16.2
SOURCE CODE:
x:
system("cls");
print "\nCentral Dogma Menu:-\n";
print "------------------\n";
print "0. Exit\n";
print "1. Complement\n";
print "2. Reverse Complement\n";
print "3. Transcription\n";
print "4. Translation\n";
print "\nEnter your choice: ";
$choice = <>;
if ($choice == 1)
{
&Complement;
}
elsif ($choice == 2)
{
&RevComplement;
}
elsif ($choice == 3)
{
&Transcription;
}
elsif ($choice == 4)
{
&Translation;
}
elsif ($choice == 0)
{
exit;
}
else
{
print "Enter a valid number !!!\n";
<>;
goto x;
}
sub Complement()
{
system("cls");
print "Enter the DNA sequence:\n";
$seq = <>;
chomp($seq);
$seq =~s/[^actg]//ig;
$seq =~ tr/ATCGatcg/TAGCtagc/;
print "\nComplement of the DNA sequence is:\n$seq";
<>;
goto x;
}
sub RevComplement()
{
system("cls");
print "Enter the DNA sequence:\n";
$seq = <>;
chomp($seq);
$seq =~s/[^actg]//ig;
$seq =~ tr/ATCGatcg/TAGCtagc/;
$seq = reverse($seq);
print "\nReverse complement of the DNA sequence is:\n$seq";
<>;
goto x;
}
sub Transcription()
{
system("cls");
print "Enter the DNA sequence:\n";
$seq = <>;
chomp($seq);
$seq =~s/[^actg]//ig;
$seq =~ tr/Tt/Uu/;
print "\nTranscribed RNA sequence is:\n$seq";
<>;
goto x;
}
sub Translation()
{
system("cls");
print "Enter the DNA sequence:\n";
$seq = <>;
chomp($seq);
$seq =~s/[^actg]//ig;
$seq =~ tr/Tt/Uu/;
$seq = uc($seq);
my %CodonMap = (
'GCA'=>'A', 'GCC'=>'A', 'GCG'=>'A', 'GCU'=>'A',
'UGC'=>'C', 'UGU'=>'C',
'GAC'=>'D', 'GAU'=>'D',
'GAA'=>'E', 'GAG'=>'E',
'UUC'=>'F', 'UUU'=>'F',
'GGA'=>'G', 'GGC'=>'G', 'GGG'=>'G', 'GGU'=>'G',
'CAC'=>'H', 'CAU'=>'H',
'AUA'=>'I', 'AUC'=>'I', 'AUU'=>'I',
'AAA'=>'K', 'AAG'=>'K',
'UUA'=>'L', 'UUG'=>'L', 'CUA'=>'L', 'CUC'=>'L', 'CUG'=>'L', 'CUU'=>'L',
'AUG'=>'M',
'AAC'=>'N', 'AAU'=>'N',
'CCA'=>'P', 'CCC'=>'P', 'CCG'=>'P', 'CCU'=>'P',
'CAA'=>'Q', 'CAG'=>'Q',
'CGA'=>'R', 'CGC'=>'R', 'CGG'=>'R', 'CGU'=>'R', 'AGA'=>'R', 'AGG'=>'R',
'UCA'=>'S', 'UCC'=>'S', 'UCG'=>'S', 'UCU'=>'S', 'AGC'=>'S', 'AGU'=>'S',
'ACA'=>'T', 'ACC'=>'T', 'ACG'=>'T', 'ACU'=>'T',
'GUA'=>'V', 'GUC'=>'V', 'GUG'=>'V', 'GUU'=>'V',
'UGG'=>'W',
'UAC'=>'Y', 'UAU'=>'Y',
'UAA'=>'_', 'UAG'=>'_', 'UGA'=>'_');
my $protein = "";
for (my $i=0; $i<length($seq)-2; $i+=3)
{
$codon = substr($seq,$i,3);
$protein .= $CodonMap{$codon};
}
print "\nTranslated protein sequence is:\n$protein";
<>;
goto x;
}
INPUT/OUTPUT:
Central Dogma Menu:-
------------------
0. Exit
1. Complement
2. Reverse Complement
3. Transcription
4. Translation
Enter your choice: 4
Enter the DNA sequence:
ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT
Translated protein sequence is:
TAVSILPGSGVMVHHQFSPSLLV
RESULT:
A PERL program to find the complement, reverse complement, transcribed RNA or translated protein
of a DNA sequence, according to the user's choice, is written and executed successfully.
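The menu program above mixes input, output and `goto` jumps with the sequence logic. As a sketch (not the lab's original method), the four operations can be written as plain subroutines under `use strict`, which makes them easy to test without the interactive menu. The subroutine names are hypothetical; the codon table is the same one used in the listing, with `_` marking stop codons.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code keyed by RNA codon; '_' marks a stop codon.
my %CODON = (
    GCA=>'A', GCC=>'A', GCG=>'A', GCU=>'A', UGC=>'C', UGU=>'C',
    GAC=>'D', GAU=>'D', GAA=>'E', GAG=>'E', UUC=>'F', UUU=>'F',
    GGA=>'G', GGC=>'G', GGG=>'G', GGU=>'G', CAC=>'H', CAU=>'H',
    AUA=>'I', AUC=>'I', AUU=>'I', AAA=>'K', AAG=>'K',
    UUA=>'L', UUG=>'L', CUA=>'L', CUC=>'L', CUG=>'L', CUU=>'L',
    AUG=>'M', AAC=>'N', AAU=>'N', CCA=>'P', CCC=>'P', CCG=>'P', CCU=>'P',
    CAA=>'Q', CAG=>'Q', CGA=>'R', CGC=>'R', CGG=>'R', CGU=>'R', AGA=>'R', AGG=>'R',
    UCA=>'S', UCC=>'S', UCG=>'S', UCU=>'S', AGC=>'S', AGU=>'S',
    ACA=>'T', ACC=>'T', ACG=>'T', ACU=>'T', GUA=>'V', GUC=>'V', GUG=>'V', GUU=>'V',
    UGG=>'W', UAC=>'Y', UAU=>'Y', UAA=>'_', UAG=>'_', UGA=>'_',
);

# Drop everything except A/C/G/T (any case) and upper-case the rest.
sub clean_dna {
    my ($seq) = @_;
    $seq =~ s/[^ACGT]//gi;
    return uc $seq;
}

sub complement {
    my $seq = clean_dna(shift);
    $seq =~ tr/ACGT/TGCA/;
    return $seq;
}

sub reverse_complement {
    return scalar reverse complement(shift);
}

sub transcribe {
    my $seq = clean_dna(shift);
    $seq =~ tr/T/U/;
    return $seq;
}

# Translate reading frame 1; a trailing partial codon is ignored.
sub translate {
    my $rna = transcribe(shift);
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $rna; $i += 3) {
        $protein .= $CODON{ substr($rna, $i, 3) };
    }
    return $protein;
}

print translate('ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT'), "\n";
```

Run on the sample input, `translate()` prints TAVSILPGSGVMVHHQFSPSLLV, the same protein as the menu program's Translation option.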
PRACTICAL: 02 SIX READING FRAMES USING PERL
/ / 201
AIM:
To write a PERL program to translate a DNA sequence in all six reading frames.
SOFTWARE USED:
Perl 5.16.2
SOURCE CODE:
system("cls");
print "Six Reading Frames:-\n";
print "------------------\n\n";
print "Enter the DNA sequence:\n";
$seq = <>;
chomp($seq);
$seq =~s/[^actg]//ig;
$seq =~ tr/Tt/Uu/;
$seq = uc($seq);
my %CodonMap = (
'GCA'=>'A', 'GCC'=>'A', 'GCG'=>'A', 'GCU'=>'A',
'UGC'=>'C', 'UGU'=>'C',
'GAC'=>'D', 'GAU'=>'D',
'GAA'=>'E', 'GAG'=>'E',
'UUC'=>'F', 'UUU'=>'F',
'GGA'=>'G', 'GGC'=>'G', 'GGG'=>'G', 'GGU'=>'G',
'CAC'=>'H', 'CAU'=>'H',
'AUA'=>'I', 'AUC'=>'I', 'AUU'=>'I',
'AAA'=>'K', 'AAG'=>'K',
'UUA'=>'L', 'UUG'=>'L', 'CUA'=>'L', 'CUC'=>'L', 'CUG'=>'L', 'CUU'=>'L',
'AUG'=>'M',
'AAC'=>'N', 'AAU'=>'N',
'CCA'=>'P', 'CCC'=>'P', 'CCG'=>'P', 'CCU'=>'P',
'CAA'=>'Q', 'CAG'=>'Q',
'CGA'=>'R', 'CGC'=>'R', 'CGG'=>'R', 'CGU'=>'R', 'AGA'=>'R', 'AGG'=>'R',
'UCA'=>'S', 'UCC'=>'S', 'UCG'=>'S', 'UCU'=>'S', 'AGC'=>'S', 'AGU'=>'S',
'ACA'=>'T', 'ACC'=>'T', 'ACG'=>'T', 'ACU'=>'T',
'GUA'=>'V', 'GUC'=>'V', 'GUG'=>'V', 'GUU'=>'V',
'UGG'=>'W',
'UAC'=>'Y', 'UAU'=>'Y',
'UAA'=>'_', 'UAG'=>'_', 'UGA'=>'_');
my $protein = "";
for (my $i=0; $i<length($seq)-2; $i+=3)
{
$codon = substr($seq,$i,3);
$protein .= $CodonMap{$codon};
}
print "\nForward Frame 1:\n$protein\n";
$protein = "";
for (my $i=1; $i<length($seq)-2; $i+=3)
{
$codon = substr($seq,$i,3);
$protein .= $CodonMap{$codon};
}
print "\nForward Frame 2:\n$protein\n";
$protein = "";
for (my $i=2; $i<length($seq)-2; $i+=3)
{
$codon = substr($seq,$i,3);
$protein .= $CodonMap{$codon};
}
print "\nForward Frame 3:\n$protein\n";
$protein = "";
# The reverse frames are read from the reverse complement of the sequence
# (reverse the RNA string, then swap A<->U and C<->G)
($rev_seq = reverse($seq)) =~ tr/ACGU/UGCA/;
for (my $i=0; $i<length($rev_seq)-2; $i+=3)
{
$codon = substr($rev_seq,$i,3);
$protein .= $CodonMap{$codon};
}
print "\nReverse Frame 1:\n$protein\n";
$protein = "";
for (my $i=1; $i<length($rev_seq)-2; $i+=3)
{
$codon = substr($rev_seq,$i,3);
$protein .= $CodonMap{$codon};
}
print "\nReverse Frame 2:\n$protein\n";
$protein = "";
for (my $i=2; $i<length($rev_seq)-2; $i+=3)
{
$codon = substr($rev_seq,$i,3);
$protein .= $CodonMap{$codon};
}
print "\nReverse Frame 3:\n$protein\n";
<>;
INPUT/OUTPUT:
Six Reading Frames:-
------------------
Enter the DNA sequence:
ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT
Forward Frame 1:
TAVSILPGSGVMVHHQFSPSLLV
Forward Frame 2:
PPSPFFQDPA_WCTTSFRPVFLS
Forward Frame 3:
RRLHSSRIRRNGAPPVFAQSSC
Reverse Frame 1:
RQEDWAKTGGAPLRRILEEWRRR
Reverse Frame 2:
DKKTGRKLVVHHYAGSWKNGDGG
Reverse Frame 3:
TRRLGENWWCTITPDPGRMETA
RESULT:
A program in PERL is written to translate a DNA sequence in all six reading frames and
executed successfully.
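As a sketch, the six copies of the translation loop can be folded into one subroutine called with offsets 0, 1 and 2, applied first to the sequence and then to its reverse complement (the opposite strand, read 5' to 3'). The subroutine names are hypothetical; the codon table matches the listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code keyed by RNA codon; '_' marks a stop codon.
my %CODON = (
    GCA=>'A', GCC=>'A', GCG=>'A', GCU=>'A', UGC=>'C', UGU=>'C',
    GAC=>'D', GAU=>'D', GAA=>'E', GAG=>'E', UUC=>'F', UUU=>'F',
    GGA=>'G', GGC=>'G', GGG=>'G', GGU=>'G', CAC=>'H', CAU=>'H',
    AUA=>'I', AUC=>'I', AUU=>'I', AAA=>'K', AAG=>'K',
    UUA=>'L', UUG=>'L', CUA=>'L', CUC=>'L', CUG=>'L', CUU=>'L',
    AUG=>'M', AAC=>'N', AAU=>'N', CCA=>'P', CCC=>'P', CCG=>'P', CCU=>'P',
    CAA=>'Q', CAG=>'Q', CGA=>'R', CGC=>'R', CGG=>'R', CGU=>'R', AGA=>'R', AGG=>'R',
    UCA=>'S', UCC=>'S', UCG=>'S', UCU=>'S', AGC=>'S', AGU=>'S',
    ACA=>'T', ACC=>'T', ACG=>'T', ACU=>'T', GUA=>'V', GUC=>'V', GUG=>'V', GUU=>'V',
    UGG=>'W', UAC=>'Y', UAU=>'Y', UAA=>'_', UAG=>'_', UGA=>'_',
);

# Translate an RNA string starting at 0-based offset $offset (0, 1 or 2).
sub translate_frame {
    my ($rna, $offset) = @_;
    my $protein = '';
    for (my $i = $offset; $i + 3 <= length $rna; $i += 3) {
        $protein .= $CODON{ substr($rna, $i, 3) };
    }
    return $protein;
}

# Return six translations: forward frames 1-3 read from the sequence,
# then reverse frames 1-3 read from its reverse complement.
sub six_frames {
    my ($dna) = @_;
    (my $fwd = uc $dna) =~ s/[^ACGT]//g;
    (my $rev = reverse $fwd) =~ tr/ACGT/TGCA/;
    tr/T/U/ for $fwd, $rev;    # DNA -> RNA alphabet for the codon table
    return ((map { translate_frame($fwd, $_) } 0 .. 2),
            (map { translate_frame($rev, $_) } 0 .. 2));
}

my @frames = six_frames('ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCT');
my @names  = ((map { "Forward Frame $_" } 1 .. 3), (map { "Reverse Frame $_" } 1 .. 3));
print "$names[$_]:\n$frames[$_]\n" for 0 .. 5;
```

For the sample input, forward frame 1 comes out as TAVSILPGSGVMVHHQFSPSLLV, matching the run above. Keeping a single translation routine also means a change to the codon table or stop-codon symbol only has to be made once.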
PRACTICAL: 03 DOWNLOAD SEQUENCE FROM DATABASE USING BIOPERL
/ / 201
AIM:
To write a BioPERL program to download a nucleotide/protein sequence from a biological
sequence database.
SOFTWARE USED:
Perl 5.16.2
BioPerl 1.6.1
SOURCE CODE:
# Gene sequence retrieval from GenBank database
system("cls");
use strict;
use Bio::SeqIO;
use Bio::DB::GenBank;
my $genBank = Bio::DB::GenBank->new;
print "\nGenBank Sequence Download:-";
print "\n-------------------------\n";
print "\nAccession No. (AF060485):\n";
my $acc = <>; chomp($acc);
my $seq = $genBank->get_Seq_by_acc($acc);
my $seqOut = Bio::SeqIO->new(-file => ">$acc.fasta", -format => 'fasta');
$seqOut->write_seq($seq);
print "\nDownloaded Successfully!";
<>;
INPUT/OUTPUT:
Gene sequence retrieval from GenBank database (AF060490.fasta)
GenBank Sequence Download:-
-------------------------
Accession No. (AF060485):
AF060490
Downloaded Successfully!
>AF060490 Mus musculus TLS-associated protein TASR-2 mRNA, complete cds.
GTGTGGTGTGAGTGGATGTGAGCCGCCGCCGGAGCTGCGGACGGTTTGCCCGAGCCCGTT
AGCGCCGCCGGCCCAGAGTCCCGCCGCCACCATGTCCCGATACCTGCGCCCCCCTAACAC
GTCTCTGTTCGTCAGGAACGTGGCGGACGACACCAGGTCTGAAGATTTACGTCGGGAATT
TGGTCGTTATGGTCCAATAGTAGATGTTTATGTCCCACTTGATTTCTACACTCGGCGTCC
AAGAGGATTTGCATATGTTCAATTTGAGGATGTTCGTGATGCTGAAGACGCTTTACATAA
TTTGGACAGAAAATGGATTTGTGGGCGTCAGATTGAAATCCAGTTCGCACAGGGGGATCG
GAAGACACCAAATCAAATGAAAGCCAAGGAAGGGAGGAATGTATACAGCTCTTCACGATA
TGACGATTATGACCGATATAGACGCTCTCGAAGCCGGAGTTATGAAAGGAGAAGATCGAG
GAGTCGCTCCTTTGATTATAACTATAGGAGATCTTACAGTCCTAGAAACAGTAGACCGAC
TGGAAGACCACGGCGTAGCCGAAGCCATTCCGACAATGATAGATTCAAACACCGAAATCG
ATCTTTTTCAAGATCTAAATCCAATTCAAGATCACGGTCCAAGTCCCAGCCCAAGAAAGA
AATGAAGGCTAAATCACGTTCTAGGTCTGCATCTCACACCAAAACTAGAGGCACCTCTAA
AACAGATTCCAAAACACATTATAAGTCTGGCTCAAGATATGAAAAGGAATCAAGGAAAAA
AGAACCACCTAGATCCAAATCTCAGTCAAGATCACAGTCTAGGTCTAGGTCAAAATCTAG
GTCAAGGTCTTGGACTAGTCCCAAGTCCAGTGGCCACTGATAGTATAAATTATGATACTT
CTAGGCATGTATCATTCATTTACTCATAGTTTGGTATACTTAAATTATCAGGAATACAAT
GTTGCAATGATGCGTTTTAAAAACAAACAAACTTAACTTGTTAGTTTTCCCTGTACTGGG
CAATGGTTATAATTAAAAAGATGCGCTGTTGAGAAGCCACTCTTAAGAGTCCAGTTTGTT
TAATGTTATGGGCAGCTACCAATTTGTGGTGTCTCTGTATATTTTTGTAAAGATTCTCAT
TTTTTATGCTTGAAGTATTTGGTGAAAAGATGTTGGTTGACCATAATTTGCAACATTGTC
TTATTAGAAATAAATTTTCATATCCATATTTGGTAGAACTGTTAACCTAGAAATGTAGCT
TGCTAATAAGATAGAATGATACAGAAGTGAAGTGGTAGCCACATTACAACACTGACTGCT
CAGACACATTTAGGTTCAGGGTGGACTTTATGTCTTGTCAAGATGTCTAAGCCCATGATG
ATTATTTATGATGCAATGTGGAATAGTTCTTTTGTTAAATCCACCATCTGGGGATTGATG
CCAACTGGGTTAAATAGCGTTTTCAGGGAGAGTGCCCTTTTCACTGAAACATGGAGCCTT
CACTGCTTTCCCCACCTCAATCCCTGCTGGTTTCTAAGATATGGAACATTAAAGCATAAG
GGAAAACCCTCCCCCTTAAGTTGTGAGTGAGTCAGTGATCACAGAAACCATTGTAAGGGG
AAAAGACTGTTCTTAGCATAGTTGCTCTAAATTTAACTATTGTTGATCATTGTTATTTAG
GGGTTTTGTTTTGTTGTTTGTTTTTTCTGTTAGAAACAAGTGAACTGTTTGAAAATACAT
TTTTGTTTGTTTATATGCATAGTGTAAAACAAACTGAATTTTGATGCTCACAGCACTTAC
CATGTGCGTTTGTATCAAAATCTGCCTGTTCTTCATAGGGGAGGCTTGCTCTTCACACCT
CAGTTTATTCATGTGAGACAGGCTGAGAAGATAACACTCCTAGGTGATTTTGTGGTGCCG
TGGATTTTTGGGGAAAGTTGAGTTTTAAGCAAAAGCCACATCACTTAGTTTTTGGTAATG
TAGGACATGACTAAAAAATAACGAAATGATACCCTTAAATATTTATAATTTCTAGTATTT
CAAGATTGTTTTGGAGGCAATAAAATGACTTGAAATGTCCGGTGTCATTTCAGAATACAA
AGCTAGTGTCTCTAAGATCTTAGATTCGTTGCTTACAGATGTGAGTGAAGATACTGTGGG
GGACGATCCTCCTGGAGGATTACCTTATTTTTTTCCTTTCGATTTTGTTTTTAGAAATTT
AGTCCTTGCTTGTAGACAACAAAAGATGGTTTTAAGAACTGTTTGTGGAATGTGTTTGGA
GGGTTAATTCTAGAACCTTTGTATATTTAATAGTATTTCTAACTTTTATTTCTTTACTGT
TTGCAGTTAATGTTCTTGTTCTGCTATGCAATCATTTATATGCACGTTTCTTTAATTTTT
TTAGATTTTCCTGGATGTATAGTTTAAACAAAGTCTATTTAAAACTGTAGCGGTAGTTTG
CAGTTCTAGCAAAGAGGAAAGTTGTGGGGTTAAACTTTGTATTTTCTTTCTTATAGAAGC
TTCTAAAAAGGTATTTTTATATGTTCTTTTTAACAAATATTGTGTACAACCTTTAAAACA
TCAATGTTTGGATCAAAACAAGACCCAGCTTATTTTCTGCTTGCTGTAAATTAAGCAAAG
ATGCTATAATAAAAACAAAATGAAGGAAAAAAAAAAAAAAAAAAAAAAAAAAA
RESULT:
A program using BioPERL is written to download a nucleotide/protein sequence from a
biological sequence database and executed successfully.
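To check the file written by `write_seq()`, the FASTA record can be read back and its accession and length reported. BioPerl's `Bio::SeqIO` can read it with `-format => 'fasta'`; the sketch below is a core-Perl alternative that needs no extra modules. The subroutine name `read_fasta` is hypothetical, and it assumes the usual FASTA layout of a `>` header line followed by sequence lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a FASTA file and return a list of [header, sequence] pairs.
# The header is the text after '>'; sequence lines are concatenated.
sub read_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip LF or CRLF line ending
        next if $line eq '';
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];       # start a new record
        } elsif (@records) {
            $records[-1][1] .= $line;      # append to the current record
        }
    }
    close $fh;
    return @records;
}

# Usage: perl read_fasta.pl AF060490.fasta
for my $file (@ARGV) {
    for my $rec (read_fasta($file)) {
        my ($id) = split ' ', $rec->[0];   # first word of the header
        printf "%s\t%d bp\n", $id, length $rec->[1];
    }
}
```

Stripping `\r` as well as `\n` lets the same script read files saved with Windows line endings, as on the `cls`-based setup used in these practicals.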
PRACTICAL: 04 REMOTEBLAST USING BIOPERL
/ / 201
AIM:
To write a BioPERL program that uses RemoteBLAST to find sequences homologous to a query
sequence in a biological sequence database.
SOFTWARE USED:
Perl 5.16.2
BioPerl 1.6.1
SOURCE CODE:
use Bio::Tools::Run::RemoteBlast;
use strict;
system("cls");
print "+------------------------------------+\n";
print "| Remote BLAST Program |\n";
print "+------------------------------------+\n";
print "\nEnter the following details:-\n";
print "\nProgram (blastn|blastp|blastx|tblastn|tblastx):\n";
my $prog = <>; chomp($prog);
print "\nDataBase (nr|swissprot|pdb|month):\n";
my $db = <>; chomp($db);
print "\nE-value (Example: 1e-10):\n";
my $e_val = <>; chomp($e_val);
my @params = ('-prog' => $prog,
'-data' => $db,
'-expect' => $e_val,
'-readmethod' => 'SearchIO');
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
print "\nFile name (.fasta format):\n";
my $fname = <>; chomp($fname);
my $r = $factory->submit_blast($fname);
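while ( my @rids = $factory->each_rid )
{
for my $rid ( @rids )
{
my $rc = $factory->retrieve_blast($rid);
if ( !ref($rc) )
{
# No result object yet: a negative value means the search failed,
# otherwise it is still running on the server, so wait and poll again.
$factory->remove_rid($rid) if $rc < 0;
sleep 5;
next;
}
my $result = $rc->next_result();
$factory->save_output("Blast Output.txt");
$factory->remove_rid($rid);
}
}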
print "\nBlast output is generated successfully!";
<>;
INPUT:
+------------------------------------+
| Remote BLAST Program |
+------------------------------------+
Enter the following details:-
Program (blastn|blastp|blastx|tblastn|tblastx):
blastn
DataBase (nr|swissprot|pdb|month):
nr
E-value (Example: 1e-10):
1e-5
File name (.fasta format):
dna.fasta
Blast output is generated successfully!
OUTPUT:
BLASTN 2.2.27+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and
David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs", Nucleic
Acids Res. 25:3389-3402.
RID: F5FR6GCG015
Database: Nucleotide collection (nt)
17,084,706 sequences; 43,890,479,962 total letters
Query= gi|440487466|gb|JH795076.1| Magnaporthe oryzae P131 unplaced genomic
scaffold P131_scaffold00326, whole genome shotgun sequence
Length=980
Score E
Sequences producing significant alignments: (Bits) Value
ref|XM_003721193.1| Magnaporthe oryzae 70-15 initiation-speci... 1768 0.0
ref|XM_003711036.1| Magnaporthe oryzae 70-15 initiation-speci... 277 1e-70
ref|XM_003660234.1| Myceliophthora thermophila ATCC 42464 gly... 93.3 3e-15
gb|CP003002.1| Myceliophthora thermophila ATCC 42464 chromoso... 93.3 3e-15
ref|XM_001935551.1| Pyrenophora tritici-repentis Pt-1C-BFP al... 87.8 1e-13
ref|XM_003306105.1| Pyrenophora teres f. teres 0-1 hypothetic... 66.2 5e-07
ref|XM_003300282.1| Pyrenophora teres f. teres 0-1 hypothetic... 64.4 2e-06
ALIGNMENTS
>ref|XM_003721193.1| Magnaporthe oryzae 70-15 initiation-specific alpha-1,6-
mannosyltransferase
(MGG_02562) mRNA, complete cds
Length=1412
Score = 1768 bits (1960), Expect = 0.0
Identities = 980/980 (100%), Gaps = 0/980 (0%)
Strand=Plus/Minus
Query 1 ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGT 60
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 1144 ACCGCCGTCTCCATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGT 1085
Query 61 CTTCTTGTCTCCATGCTGTTGATTCATCGTGTCCGCAAAGGCGTAGTCGGGCAACACCAG 120
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 1084 CTTCTTGTCTCCATGCTGTTGATTCATCGTGTCCGCAAAGGCGTAGTCGGGCAACACCAG 1025
Query 121 CACGTCGCCCAACAGCCTGGGCTCTTTGACGTTGGCTATCTCGCCGTCGCCCACGGTCTC 180
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 1024 CACGTCGCCCAACAGCCTGGGCTCTTTGACGTTGGCTATCTCGCCGTCGCCCACGGTCTC 965
Query 181 ATTCAGCGTGTTGCTCAGACTCTTCAAGATGCCCCTCGTCAACCTGCGCGGGCCCGAAAC 240
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 964 ATTCAGCGTGTTGCTCAGACTCTTCAAGATGCCCCTCGTCAACCTGCGCGGGCCCGAAAC 905
Query 241 ATCGACAATGTCGTCAACCATGTCGAGCCTGAGGTCCTGGATCCCGCCGACCTTGCGCTC 300
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 904 ATCGACAATGTCGTCAACCATGTCGAGCCTGAGGTCCTGGATCCCGCCGACCTTGCGCTC 845
Query 301 CTTGGCCTTGGCGACCAGTCCCTCGAGACCATCTTGGACGGCCATCATCATGTGCGGCGA 360
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 844 CTTGGCCTTGGCGACCAGTCCCTCGAGACCATCTTGGACGGCCATCATCATGTGCGGCGA 785
Query 361 TTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCACTGGTCCACATCGAA 420
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 784 TTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCACTGGTCCACATCGAA 725
Query 421 CTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTGGGAACCCATTCGCTGAT 480
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 724 CTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTGGGAACCCATTCGCTGAT 665
Query 481 CGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTACAGGATCAG 540
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 664 CGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTACAGGATCAG 605
Query 541 GTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTGGCGACCAC 600
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 604 GTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTGGCGACCAC 545
Query 601 GTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCGTCCGTCAG 660
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 544 GTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCGTCCGTCAG 485
Query 661 GAACTCGACCTTGAAGCCGGGGTTCTTGGACACACAGGAGTCGACGTGGGGCTTGAGGTC 720
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 484 GAACTCGACCTTGAAGCCGGGGTTCTTGGACACACAGGAGTCGACGTGGGGCTTGAGGTC 425
Query 721 GTCCTTCAAGCCTGCAGGCCCGAGTTTGTACCACAGCCTTTGTGGTAGTGCTGCGACGGC 780
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 424 GTCCTTCAAGCCTGCAGGCCCGAGTTTGTACCACAGCCTTTGTGGTAGTGCTGCGACGGC 365
Query 781 AGTCGGCCCCGAGCTCGACGCGGCAGACGTGGTGGTTTCTTGTGCCAGCAGCGGCGCGGG 840
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 364 AGTCGGCCCCGAGCTCGACGCGGCAGACGTGGTGGTTTCTTGTGCCAGCAGCGGCGCGGG 305
Query 841 CTTCATCCGGGGTGTCGCAAAGGTGGGGCCGGCCTTCCATTCCGAAGGCCTGTGGAAATT 900
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 304 CTTCATCCGGGGTGTCGCAAAGGTGGGGCCGGCCTTCCATTCCGAAGGCCTGTGGAAATT 245
Query 901 GAGAATGAGGAAGCATATTGTGAGAAAGCTCAAGGCAGCCGGCACTTTGGCTGTCAAACG 960
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 244 GAGAATGAGGAAGCATATTGTGAGAAAGCTCAAGGCAGCCGGCACTTTGGCTGTCAAACG 185
Query 961 ATTGTGAAATGCCAAAATCA 980
||||||||||||||||||||
Sbjct 184 ATTGTGAAATGCCAAAATCA 165
>ref|XM_003711036.1| Magnaporthe oryzae 70-15 initiation-specific alpha-1,6-
mannosyltransferase
(MGG_08652) mRNA, complete cds
Length=984
Score = 277 bits (306), Expect = 1e-70
Identities = 524/760 (69%), Gaps = 17/760 (2%)
Strand=Plus/Minus
Query 12 CATTCTTCCAGGATCCGGCGTAATGGTGCACCACCAGTTTTCGCCCAGTCTTCTTGTCTC 71
|||| |||||||| || |||||||||||||| || || || ||||| | | |||||| |
Sbjct 967 CATTTTTCCAGGACCCTGCGTAATGGTGCACTACGAGCTTCCGCCCTGGCACCTTGTCCC 908
Query 72 CATGCTGTTGATTCATCGTGTCCGCAAAGGCGTAGTCGGGCAACACCAGCACGTCGCCCA 131
| | ||||| | || || || | || ||| |||| | | || | |||||
Sbjct 907 CCCACCACATGTTCATGGAATCTGCGAACGAGTGGTCCGGCAGTATTAAAACATTGCCCA 848
Query 132 ACAGCCTGGGCTCTTTGACGTTGGCTATCTCGCCGTCGCCCACGGTCTC-ATTCAGCGTG 190
||||| || || | || ||| |||| || | ||||| |||| || || |
Sbjct 847 GAAGCCTCGGTTCCGTAACATTGACTATGTCATTATTCCCCAC-CTCTCGGTTGAGTGAT 789
Query 191 TTGCTCAGACTCTTCAAGATGCCCCTCGTCAACCTGCGCGGGCCCGAAACATCGACAATG 250
||||| || ||||| || || ||||||||||| | ||||||||| | |||| ||
Sbjct 788 TTGCTGAGGCTCTTGAAAATCGACCTCGTCAACCGTCTCGGGCCCGACAAGTCGATAACA 729
Query 251 TCGTCAACCATGTCGAGCCTGAGGTCCTGGATCCCGCCG-ACCTTGCGCTCCTTGGCCTT 309
||||| | | || | || | || ||| ||| | | || || || ||||| |
Sbjct 728 TCGTCGAGCTGGTTGCGCTTCAGCTCC--GATATTGGGGTTCCAAGC-CT-TTTGGCAGT 673
Query 310 GGCGACCAGTCCCTCGAGACCATCTTGGACGGCCATCATCATGTGCGGCGATTTTGGTTT 369
||| ||| | |||| | | |||| ||| ||||| | | |||||| ||| || ||
Sbjct 672 GGCCGCCACTTCCTCCAAGCAGTCTTCGACCGCCATTAAGAAATGCGGCTGTTTGGGCTT 613
Query 370 CGCCATGATAGTCCAACTGGCGAACTGCCGAACCCACTGGTCCACATCGAACTCCAGTCC 429
|||||||| ||||| ||||||| ||| || | |||| |||| |||||||||| ||
Sbjct 612 GGCCATGATGGTCCAGCTGGCGATCTGTCGCAACCACCTGTCCGTGTCGAACTCCATCCC 553
Query 430 AACAACAATTTTGGCTTGGT--CTTTGTACTGCTTGGGAACCCATTCGCTGATCGGTGCC 487
|| || | || ||| ||||||||||| ||| | ||||||| ||| |||||
Sbjct 552 GACGACGGTAGCAGC--GGTAGATTTGTACTGCTCGGGGATCCATTCGTCGATGGGTGCT 495
Query 488 TCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTACAGGATCAGGTAGCGG 547
||||| || |||||||| |||||||| | || || ||||| ||||| |||||||||
Sbjct 494 TCGCAAGACACGTCCAGATCGTTCCAGATTCCACCCTTTTCGTAGAGGATGAGGTAGCGG 435
Query 548 AGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTGGCGACCACGTCAGGG 607
|| || |||||||||||||||||||| || |||| |||||| || || | | ||
Sbjct 434 AGGAGGTCGGCCTTGATGATTGGAATGCTGATGGGGAGGAACCTGTTGATTATATTTGGA 375
Query 608 CGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCGTCCGTCAGGAACTCG 667
| | | | |||| |||||||||| ||||||| || |||| |||||||||||
Sbjct 374 TTCCACGAG---TAGTGCTTCTTGACAAACTCGTCGCCCGAGACGTCGGTCAGGAACTCA 318
Query 668 ACCTTGAAGCCGGGGTTCTTGGACACACAGGAGTCGACGTGGGGCTTGAGGTCGTCCTTC 727
|| | | |||| || |||||| | || || | | || ||||||| | | | ||
Sbjct 317 ACATCGTAGCCTGGATTCTTG---AGGCAAGATTTTATGTATGGCTTGATATTCTTCCTC 261
Query 728 AAGCCTGCAGGCCCGAGTTTGTACCACAGCCTTTGTGGTA 767
| || ||| | |||| ||| ||||||||| ||| |||||
Sbjct 260 ACCCCCGCACGTCCGACTTTATACCACAGCTTTTTTGGTA 221
>ref|XM_003660234.1| Myceliophthora thermophila ATCC 42464 glycosyltransferase family
32 protein (MYCTH_97899) mRNA, complete cds
Length=699
Score = 93.3 bits (102), Expect = 3e-15
Identities = 158/229 (69%), Gaps = 9/229 (4%)
Strand=Plus/Minus
Query 353 TGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCA----CTG 408
|||||||| |||||||||||||| |||||| | |||| ||| | || | | |
Sbjct 344 TGCGGCGACCCCGGTTTCGCCATGATGGTCCAAATCGCGAGCTGGTGGACAAAAGGCCGG 285
Query 409 GTCCA-----CATCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTG 463
| ||| ||| |||||| || || || | |||| | ||| | ||||||
Sbjct 284 GGCCAGCCGACATTAAACTCCCAGCCCACGACGACGTTGGTCTCGTCCTCATACTGCGGA 225
Query 464 GGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCG 523
|||| |||||||| || || | ||||||||||||||||||||| | || || ||
Sbjct 224 GGAATCCATTCGCCGAAGGGCGTGTCGCACGAGACGTCCAGGTCGCAGTAGACGCCTCCT 165
Query 524 AACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAA 572
| | |||| |||||||||||||| ||| | |||| ||||||||
Sbjct 164 TCGGAGAAGAGGAGGAGGTAGCGGAGAAGGTCGACTTTGAGGATTGGAA 116
>gb|CP003002.1| Myceliophthora thermophila ATCC 42464 chromosome 1, complete
sequence
Length=10931058
Features in this part of subject sequence:
glycosyltransferase family 32 protein
Score = 93.3 bits (102), Expect = 3e-15
Identities = 158/229 (69%), Gaps = 9/229 (4%)
Strand=Plus/Plus
Query 353 TGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACCCA----CTG 408
|||||||| |||||||||||||| |||||| | |||| ||| | || | | |
Sbjct 8013281 TGCGGCGACCCCGGTTTCGCCATGATGGTCCAAATCGCGAGCTGGTGGACAAAAGGCCGG 8013340
Query 409 GTCCA-----CATCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTG 463
| ||| ||| |||||| || || || | |||| | ||| | ||||||
Sbjct 8013341 GGCCAGCCGACATTAAACTCCCAGCCCACGACGACGTTGGTCTCGTCCTCATACTGCGGA 8013400
Query 464 GGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCG 523
|||| |||||||| || || | ||||||||||||||||||||| | || || ||
Sbjct 8013401 GGAATCCATTCGCCGAAGGGCGTGTCGCACGAGACGTCCAGGTCGCAGTAGACGCCTCCT 8013460
Query 524 AACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAA 572
| | |||| |||||||||||||| ||| | |||| ||||||||
Sbjct 8013461 TCGGAGAAGAGGAGGAGGTAGCGGAGAAGGTCGACTTTGAGGATTGGAA 8013509
>ref|XM_001935551.1| Pyrenophora tritici-repentis Pt-1C-BFP alpha-1,6-
mannosyltransferase
Och1, mRNA
Length=846
Score = 87.8 bits (96), Expect = 1e-13
Identities = 217/327 (66%), Gaps = 12/327 (4%)
Strand=Plus/Minus
Query 346 CATCATGTGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAA----CTGCCGAA 401
|||||| || || || |||||| |||||||||||||| || ||||| | | || |
Sbjct 585 CATCATATGTGGGGACCGTGGTTTAGCCATGATAGTCCAGCTAGCGAATTGTCGGACGTA 526
Query 402 CCCACTGGTCCA-----CATCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTA 456
| ||| || ||| ||||||||| || ||||| | ||| | | | | |||
Sbjct 525 CACACCGGGCCAGCCTTGGTCGAACTCCCACCCTACAACGAGCGAGGCGTTGGCCTGGTA 466
Query 457 CTGCTTGGGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACAC 516
| || |||||| || || |||| |||||| || ||||| || ||| ||| |
Sbjct 465 TTCAGACGGCACCCATGTGCCAATAGGTGTCTCGCAGGATACGTCTAGATCGGACCATAT 406
Query 517 CCCGCCGAACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCT 576
|||||| || | |||| |||||||| || | ||| || ||| ||| || ||
Sbjct 405 ACCGCCGCGGTCCCAGAGGAGGAGGTAGCGCAGGAAATCTGCTTTGTAGATGGGGATGGG 346
Query 577 AATGGCGAGGAAATTGGCGACCACGTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAA 636
| ||||||| | || ||||| | ||| ||||| | ||| |||| | | ||| |
Sbjct 345 GAGGGCGAGGTAGTTTGCGACGATGTCCGGGCGGGAAGCG--AAAG-CTGTACGGACGTA 289
Query 637 GGCGTCGCCTGATTCGTCCGTCAGGAA 663
| |||||| |||||| |||| |||
Sbjct 288 GTCGTCGCTGCTTTCGTCGGTCATGAA 262
>ref|XM_003306105.1| Pyrenophora teres f. teres 0-1 hypothetical protein, mRNA
Length=1044
Score = 66.2 bits (72), Expect = 5e-07
Identities = 167/251 (67%), Gaps = 6/251 (2%)
Strand=Plus/Minus
Query 416 TCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTACTGCTTGGGAACCCATTCG 475
||||||||| ||| || |||| |||| | ||||||| ||| || ||||||| |
Sbjct 566 TCGAACTCCCATCCCACGACAACACTGGCGTTTGCTTTGTATCGCTCCGGGACCCATTGG 507
Query 476 CTGATCGGTGCC---TCGCACGAGACGTCCAGGTCGTTCCACACCCCGCCGAACTCGTAC 532
|| ||| | || || |||||||| |||||| | || || || ||| |
Sbjct 506 TCCATGGGTACTCCTTCACAGGAGACGTCGAGGTCGGCGTAGACGCCACCCTGGTCGAAG 447
Query 533 AGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCTAATGGCGAGGAAATTG 592
|||| |||||||||||| | ||||| || | ||| |||||| || |||| | ||
Sbjct 446 AGGAGCAGGTAGCGGAGCATGTCGGCTTTCAGGATGGGAATCGGAAGACCGAGATAGTTC 387
Query 593 GCGACCACGTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAAGGCGTCGCCTGATTCG 652
|||| | || || |||| || | | | |||| || | ||||| | | ||||
Sbjct 386 TCGACGATATCCGGACGCATCACG--TATGCCT-TCTTTACGTATTCGTCGGCAGTTTCG 330
Query 653 TCCGTCAGGAA 663
||||||| |||
Sbjct 329 TCCGTCATGAA 319
>ref|XM_003300282.1| Pyrenophora teres f. teres 0-1 hypothetical protein, mRNA
Length=939
Score = 64.4 bits (70), Expect = 2e-06
Identities = 211/327 (65%), Gaps = 12/327 (4%)
Strand=Plus/Minus
Query 346 CATCATGTGCGGCGATTTTGGTTTCGCCATGATAGTCCAACTGGCGAACTGCCGAACC-- 403
||||||||| || || ||||| || | ||||||||| || || ||||| |||||
Sbjct 582 CATCATGTGTGGGGACCGGGGTTTGGCTAGGATAGTCCAGCTAGCAAACTGACGAACGTA 523
Query 404 --CACTGGTCCACA-----TCGAACTCCAGTCCAACAACAATTTTGGCTTGGTCTTTGTA 456
||| || ||| ||||| ||| || || || | || | | | | |||
Sbjct 522 GACACCGGGCCAGCCTTGGTCGAATTCCCAGCCTACCACCAGCGACGCGTTGGCCTGGTA 463
Query 457 CTGCTTGGGAACCCATTCGCTGATCGGTGCCTCGCACGAGACGTCCAGGTCGTTCCACAC 516
| || ||||| | ||| || ||||||| || ||||| || ||| ||| |
Sbjct 462 TTCGGGCGGCACCCACGAGTCGATGGGCACCTCGCAGGATACGTCTAGATCGGACCATAT 403
Query 517 CCCGCCGAACTCGTACAGGATCAGGTAGCGGAGAAGATCGGCCTTGATGATTGGAATCCT 576
||||| || | |||| ||||||||||| | |||||| || ||| || |
Sbjct 402 GCCGCCTTGGTCCCAGAGGAGGAGGTAGCGGAGGAAATCGGCTTTATAGATGGGGACGGG 343
Query 577 AATGGCGAGGAAATTGGCGACCACGTCAGGGCGCAAGGCGGCAAAGTGTCTCTTGACAAA 636
| ||||||| | |||| ||| | ||| ||||| || |||||| | | ||| |
Sbjct 342 GAGGGCGAGGTAGTTGGAGACGATGTCGGGGCGGAA---TGCAAAGGCTGTACGGACGTA 286
Query 637 GGCGTCGCCTGATTCGTCCGTCAGGAA 663
| |||||| ||||||||||| |||
Sbjct 285 GCCGTCGCTGCTTTCGTCCGTCATGAA 259
Database: Nucleotide collection (nt)
Posted date: Jan 12, 2013 4:14 PM
Number of letters in database: 43,890,479,962
Number of sequences in database: 17,084,706
Lambda K H
0.634 0.408 0.912
Gapped
Lambda K H
0.625 0.410 0.780
Matrix: blastn matrix:2 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Sequences: 17084706
Number of Hits to DB: 15849831
Number of extensions: 96625
Number of successful extensions: 96625
Number of sequences better than 1e-05: 0
Number of HSP's better than 1e-05 without gapping: 0
Number of HSP's gapped: 96625
Number of HSP's successfully gapped: 0
Length of query: 980
Length of database: 43890479962
Length adjustment: 36
Effective length of query: 944
Effective length of database: 43275430546
Effective search space: 40852006435424
Effective search space used: 40852006435424
A: 0
X1: 22 (20.1 bits)
X2: 33 (29.8 bits)
X3: 110 (99.2 bits)
S1: 25 (23.8 bits)
S2: 68 (62.6 bits)
RESULT:
A program using BioPERL is written to find sequences homologous to a query sequence in a
biological sequence database using RemoteBLAST, and executed successfully.
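The saved report can then be screened for the strongest hits. A minimal core-Perl sketch, assuming the plain-text layout in the OUTPUT above: a "Sequences producing significant alignments" header followed by one hit per line that ends in the bit score and the E-value. The subroutine names are hypothetical; for full reports (HSPs, alignments), BioPerl's `Bio::SearchIO` with `-format => 'blast'` is the more robust parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the one-line hit summaries that follow the
# "Sequences producing significant alignments" header of a plain-text
# BLAST report. Each hit becomes { id, desc, bits, evalue }.
sub parse_hit_summary {
    my ($report) = @_;
    my ($in_table, @hits) = (0);
    for my $line (split /\n/, $report) {
        if ($line =~ /^Sequences producing significant alignments/) {
            $in_table = 1;
            next;
        }
        next unless $in_table;
        last if $line =~ /^(?:ALIGNMENTS|>)/;     # summary table has ended
        next unless $line =~ /^(\S+)\s+(.*?)\s+(\d+(?:\.\d+)?)\s+(\S+)\s*$/;
        my ($id, $desc, $bits, $evalue) = ($1, $2, $3, $4);
        $evalue = "1$evalue" if $evalue =~ /^e/;  # older style such as "e-150"
        push @hits, { id => $id, desc => $desc, bits => $bits, evalue => $evalue + 0 };
    }
    return @hits;
}

# Keep only hits whose E-value is at or below $cutoff.
sub filter_hits {
    my ($cutoff, @hits) = @_;
    return grep { $_->{evalue} <= $cutoff } @hits;
}

# Usage: perl filter_blast.pl "Blast Output.txt" 1e-10
if (@ARGV == 2) {
    my ($file, $cutoff) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $report = do { local $/; <$fh> };
    close $fh;
    printf "%s\t%s\t%g\n", $_->{id}, $_->{bits}, $_->{evalue}
        for filter_hits($cutoff, parse_hit_summary($report));
}
```

Anchoring the regex at the end of the line means descriptions containing numbers (such as "70-15" above) do not confuse the bit-score and E-value columns.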
PRACTICAL: 05 SECONDARY STRUCTURE PREDICTION USING BIOPERL
/ / 201
AIM:
To write a BioPERL program to predict the secondary structure of a protein sequence.
SOFTWARE USED:
Perl 5.16.2
BioPerl 1.6.1
SOURCE CODE:
system("cls");
use Bio::PrimarySeq;
use Bio::Tools::Analysis::Protein::Sopma;
print "Protein Secondary Structure Prediction (SOPMA):-";
print "\n----------------------------------------------\n";
print "\nEnter your query sequence:\n";
$query = <>;
chomp($query); # remove the trailing newline before building the sequence object
my $seqs = Bio::PrimarySeq->new(-seq => $query, -alphabet => 'protein');
$tool = Bio::Tools::Analysis::Protein::Sopma->new( -seq => $seqs,
-window_width => 15);
$tool->run();
my $raw = $tool->result('');
my @fts = $tool->result('Bio::SeqFeatureI');
print "\n Predicted Regions are below:\n";
for my $ft (@fts)
{
print "From ", $ft->start, " to ",$ft->end, " struc: " ,
($ft->each_tag_value('type'))[0],"\n";
}
<>;
INPUT/OUTPUT:
Protein Secondary Structure Prediction (SOPMA):-
----------------------------------------------
Enter your query sequence:
EHIMELLIMVDALKRASAKTINIVIPYYGYARQDRKARSREPITAKLFANLLETAGATRVIALDLHAPQI
Predicted Regions are below:
From 1 to 20 struc: H
From 43 to 54 struc: H
From 55 to 56 struc: T
From 25 to 42 struc: C
From 21 to 24 struc: E
From 59 to 64 struc: E
RESULT:
A program using BioPERL is written to predict the secondary structure of a protein sequence
and executed successfully.
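The predicted regions can be summarised as the share of residues in each SOPMA state (H alpha helix, E extended strand, T beta turn, C random coil). A minimal sketch, assuming the regions are available as 1-based, inclusive (start, end, state) triples like those printed above; the subroutine names are hypothetical. In the sample output the listed regions cover 62 of the 70 residues, so the percentages below are taken over the full sequence length and need not add up to 100.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum region lengths per state from [start, end, state] triples (1-based, inclusive).
sub state_counts {
    my (@regions) = @_;
    my %count;
    for my $r (@regions) {
        my ($start, $end, $state) = @$r;
        $count{$state} += $end - $start + 1;
    }
    return %count;
}

# Percentage of a sequence of length $len assigned to each state.
sub state_percentages {
    my ($len, @regions) = @_;
    my %count = state_counts(@regions);
    return map { ($_ => 100 * $count{$_} / $len) } keys %count;
}

# Regions from the sample run above; the query sequence is 70 residues long.
my @regions = ([1, 20, 'H'], [43, 54, 'H'], [55, 56, 'T'],
               [25, 42, 'C'], [21, 24, 'E'], [59, 64, 'E']);
my %pct = state_percentages(70, @regions);
printf "%s: %.1f%%\n", $_, $pct{$_} for sort keys %pct;
```

In a full script the triples would come from the `@fts` loop, using `$ft->start`, `$ft->end` and the `type` tag value as in the listing.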
PRACTICAL: 06 GLOBAL ALIGNMENT USING R
/ / 201
AIM:
To write an R program to align a pair of sequences using the Needleman-Wunsch (global alignment) algorithm.
SOFTWARE USED:
R 2.15.2
Biostrings 2.6.6: Module for string objects representing biological sequences, and matching
algorithms in R
Seqinr: Biological Sequences Retrieval and Analysis module of R (provides read.fasta() and c2s())
SOURCE CODE:
library("seqinr")
library("Biostrings")
leprae <- read.fasta(file = "E:/R Practical/Q9CD83.fasta")
ulcerans <- read.fasta(file = "E:/R Practical/A0PQ23.fasta")
lepraeseq <- leprae[[1]]
ulceransseq <- ulcerans[[1]]
lepraeseqstring <- c2s(lepraeseq)
ulceransseqstring <- c2s(ulceransseq)
lepraeseqstring <- toupper(lepraeseqstring)
ulceransseqstring <- toupper(ulceransseqstring)
globalAlignLepraeUlcerans <- pairwiseAlignment(lepraeseqstring,
ulceransseqstring, substitutionMatrix = "BLOSUM50", gapOpening = -2,
gapExtension = -8, scoreOnly = FALSE)
printPairwiseAlignment <- function(alignment, chunksize=60, returnlist=FALSE)
{
require(Biostrings) # This function requires the Biostrings package
seq1aln <- pattern(alignment) # Get the alignment for the first sequence
seq2aln <- subject(alignment) # Get the alignment for the second sequence
alnlen <- nchar(seq1aln) # Find the number of columns in the alignment
starts <- seq(1, alnlen, by=chunksize)
n <- length(starts)
seq1alnresidues <- 0
seq2alnresidues <- 0
for (i in 1:n)
{
chunkseq1aln <- substring(seq1aln, starts[i], starts[i]+chunksize-1)
chunkseq2aln <- substring(seq2aln, starts[i], starts[i]+chunksize-1)
# Find out how many gaps there are in chunkseq1aln (countPattern() is from the Biostrings package):
gaps1 <- countPattern("-",chunkseq1aln)
# Find out how many gaps there are in chunkseq2aln:
gaps2 <- countPattern("-",chunkseq2aln)
# Calculate how many residues of the first sequence we have printed so far in the alignment:
seq1alnresidues <- seq1alnresidues + chunksize - gaps1
# Calculate how many residues of the second sequence we have printed so far in the alignment:
seq2alnresidues <- seq2alnresidues + chunksize - gaps2
if (!returnlist)
{
print(paste(chunkseq1aln,seq1alnresidues))
print(paste(chunkseq2aln,seq2alnresidues))
print(paste(' '))
}
}
if (returnlist)
{
vector1 <- s2c(substring(seq1aln, 1, nchar(seq1aln)))
vector2 <- s2c(substring(seq2aln, 1, nchar(seq2aln)))
mylist <- list(vector1, vector2)
return(mylist)
}
}
printPairwiseAlignment(globalAlignLepraeUlcerans, 60)
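pairwiseAlignment() performs a global (Needleman-Wunsch style) alignment by default. To make the dynamic programming behind a global alignment score concrete, here is a minimal base-R sketch. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -2) instead of the BLOSUM50 matrix and affine gaps used above, so its scores are not comparable with the pairwiseAlignment() result. The function name nwScore() is hypothetical.

```r
# Minimal Needleman-Wunsch global alignment score (sketch).
# Assumed scoring: match = +1, mismatch = -1, linear gap penalty = -2.
# This is NOT the BLOSUM50 / affine-gap scheme used by pairwiseAlignment().
nwScore <- function(a, b, match = 1, mismatch = -1, gap = -2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # dp[i + 1, j + 1] holds the best score for aligning x[1..i] with y[1..j]
  dp <- matrix(0, nrow = n + 1, ncol = m + 1)
  dp[, 1] <- gap * (0:n)   # prefix of x aligned entirely against gaps
  dp[1, ] <- gap * (0:m)   # prefix of y aligned entirely against gaps
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match else mismatch
      dp[i + 1, j + 1] <- max(dp[i, j] + s,        # x[i] aligned with y[j]
                              dp[i, j + 1] + gap,  # x[i] aligned with a gap
                              dp[i + 1, j] + gap)  # y[j] aligned with a gap
    }
  }
  dp[n + 1, m + 1]   # score of the best global alignment
}

nwScore("GATTACA", "GATTACA")
nwScore("ACGT", "AGT")
```

For "ACGT" versus "AGT", the best global alignment is ACGT / A-GT: three matches (+3) and one gap (-2), giving a score of 1.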
INPUT:
File 1: (E:\R Practical\Q9CD83.fasta)
>sp|Q9CD83|PHBS_MYCLE Chorismate--pyruvate lyase OS=Mycobacterium leprae (strain TN) GN=ML0133 PE=3 SV=1
MTNRTLSREEIRKLDRDLRILVATNGTLTRVLNVVANEEIVVDIINQQLLDVAPKIPELE
NLKIGRILQRDILLKGQKSGILFVAAESLIVIDLLPTAITTYLTKTHHPIGEIMAASRIE
TYKEDAQVWIGDLPCWLADYGYWDLPKRAVGRRYRIIAGGQPVIITTEYFLRSVFQDTPR
EELDRCQYSNDIDTRSGDRFVLHGRVFKNL
File 2: (E:\R Practical\A0PQ23.fasta)
>tr|A0PQ23|A0PQ23_MYCUA Chorismate pyruvate-lyase OS=Mycobacterium ulcerans (strain Agy99) GN=MUL_2003 PE=4 SV=1
MLAVLPEKREMTECHLSDEEIRKLNRDLRILIATNGTLTRILNVLANDEIVVEIVKQQIQ
DAAPEMDGCDHSSIGRVLRRDIVLKGRRSGIPFVAAESFIAIDLLPPEIVASLLETHRPI
GEVMAASCIETFKEEAKVWAGESPAWLELDRRRNLPPKVVGRQYRVIAEGRPVIIITEYF
LRSVFEDNSREEPIRHQRSVGTSARSGRSICT
OUTPUT:
[1] "MT-----NR--T---LSREEIRKLDRDLRILVATNGTLTRVLNVVANEEIVVDIINQQLL 50"
[1] "MLAVLPEKREMTECHLSDEEIRKLNRDLRILIATNGTLTRILNVLANDEIVVEIVKQQIQ 60"
[1] " "
[1] "DVAPKIPELENLKIGRILQRDILLKGQKSGILFVAAESLIVIDLLPTAITTYLTKTHHPI 110"
[1] "DAAPEMDGCDHSSIGRVLRRDIVLKGRRSGIPFVAAESFIAIDLLPPEIVASLLETHRPI 120"
[1] " "
[1] "GEIMAASRIETYKEDAQVWIGDLPCWLADYGYWDLPKRAVGRRYRIIAGGQPVIITTEYF 170"
[1] "GEVMAASCIETFKEEAKVWAGESPAWLELDRRRNLPPKVVGRQYRVIAEGRPVIIITEYF 180"
[1] " "
[1] "LRSVFQDTPREELDRCQYSNDIDTRSGDRFVLHGRVFKN 230"
[1] "LRSVFEDNSREEPIRHQRS--VGT-SA-R---SGRSICT 233"
[1] " "
RESULT:
An R program was written to align a pair of sequences using the Needleman-Wunsch (global alignment) algorithm, and it executed successfully.
PRACTICAL: 07 DOTPLOT USING R
/ / 201
AIM:
To write an R program to display a dot plot for a pair of sequences.
SOFTWARE USED:
R 2.15.2
Seqinr 3.0-7: Biological Sequences Retrieval and Analysis module of R
SOURCE CODE:
Online:
library("seqinr")
choosebank("swissprot")
query("leprae", "AC=Q9CD83")
lepraeseq <- getSequence(leprae$req[[1]])
query("ulcerans", "AC=A0PQ23")
ulceransseq <- getSequence(ulcerans$req[[1]])
closebank()
dotPlot(lepraeseq, ulceransseq)
Offline:
library("seqinr")
leprae <- read.fasta(file = "C:/Users/Ashok Kumar/Desktop/Q9CD83.fasta")
ulcerans <- read.fasta(file = "C:/Users/Ashok Kumar/Desktop/A0PQ23.fasta")
lepraeseq <- leprae[[1]]
ulceransseq <- ulcerans[[1]]
dotPlot(lepraeseq, ulceransseq)
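In its simplest form, a dot plot marks every cell (i, j) where residue i of the first sequence equals residue j of the second, so diagonal runs of dots reveal similar regions. The minimal base-R sketch below builds that logical match matrix with outer() and draws it with image(). It ignores the windowing options of seqinr's dotPlot(); the helper name dotMatrix() and the two short demonstration strings (the first ten residues of each input sequence) are illustrative assumptions.

```r
# Minimal dot-plot sketch in base R (no seqinr needed).
# Assumption: one dot wherever residue i of seq1 equals residue j of seq2
# (single-residue comparison, no sliding window).
dotMatrix <- function(seq1, seq2) {
  a <- strsplit(toupper(seq1), "")[[1]]
  b <- strsplit(toupper(seq2), "")[[1]]
  outer(a, b, "==")   # logical matrix: rows follow seq1, columns follow seq2
}

m <- dotMatrix("MTNRTLSREE", "MLAVLPEKRE")
print(dim(m))
# Draw matches as black cells; rows of m run along the x axis (seq1)
image(x = seq_len(nrow(m)), y = seq_len(ncol(m)), z = m * 1,
      col = c("white", "black"), xlab = "seq1", ylab = "seq2")
```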
INPUT:
Sequence 1: (SwissProt ID: Q9CD83)
>sp|Q9CD83|PHBS_MYCLE Chorismate--pyruvate lyase OS=Mycobacterium leprae (strain TN) GN=ML0133 PE=3 SV=1
MTNRTLSREEIRKLDRDLRILVATNGTLTRVLNVVANEEIVVDIINQQLLDVAPKIPELE
NLKIGRILQRDILLKGQKSGILFVAAESLIVIDLLPTAITTYLTKTHHPIGEIMAASRIE
TYKEDAQVWIGDLPCWLADYGYWDLPKRAVGRRYRIIAGGQPVIITTEYFLRSVFQDTPR
EELDRCQYSNDIDTRSGDRFVLHGRVFKNL
Sequence 2: (SwissProt ID: A0PQ23)
>tr|A0PQ23|A0PQ23_MYCUA Chorismate pyruvate-lyase OS=Mycobacterium ulcerans (strain Agy99) GN=MUL_2003 PE=4 SV=1
MLAVLPEKREMTECHLSDEEIRKLNRDLRILIATNGTLTRILNVLANDEIVVEIVKQQIQ
DAAPEMDGCDHSSIGRVLRRDIVLKGRRSGIPFVAAESFIAIDLLPPEIVASLLETHRPI
GEVMAASCIETFKEEAKVWAGESPAWLELDRRRNLPPKVVGRQYRVIAEGRPVIIITEYF
LRSVFEDNSREEPIRHQRSVGTSARSGRSICT
OUTPUT:
RESULT:
An R program was written to display a dot plot for a pair of sequences, and it executed successfully.
PRACTICAL: 08 FILE FORMAT CONVERSION USING R
/ / 201
AIM:
To write an R program to convert a file from GenBank format to FASTA format.
SOFTWARE USED:
R 2.15.2
Seqinr 3.0-7: Biological Sequences Retrieval and Analysis module of R
SOURCE CODE:
library("seqinr")
gb2fasta(source.file = "E:/R Practical/AF060490.gb",
destination.file = "E:/R Practical/AF060490.fasta")
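gb2fasta() performs the conversion in a single call. To show roughly what such a conversion involves, here is a simplified base-R sketch: it takes the accession from the LOCUS line, collects the sequence lines between ORIGIN and //, strips the position numbers and spaces, and writes the sequence 60 characters per line. It is not the implementation of gb2fasta() and handles only a single, well-formed record. The function genbankToFasta() and the tiny inline record are hypothetical.

```r
# Simplified GenBank -> FASTA conversion (sketch; single record only).
genbankToFasta <- function(gbLines, width = 60) {
  # Accession = second whitespace-separated field of the LOCUS line
  locus <- grep("^LOCUS", gbLines, value = TRUE)[1]
  accession <- strsplit(locus, "[[:space:]]+")[[1]][2]
  # Sequence lines lie between the ORIGIN line and the terminating "//"
  start <- grep("^ORIGIN", gbLines)[1] + 1
  end <- grep("^//", gbLines)[1] - 1
  seqLines <- gbLines[start:end]
  # Remove position numbers, spaces and anything else that is not a letter
  sequence <- paste(gsub("[^A-Za-z]", "", seqLines), collapse = "")
  # Break the sequence into lines of 'width' characters
  starts <- seq(1, nchar(sequence), by = width)
  body <- substring(sequence, starts, pmin(starts + width - 1, nchar(sequence)))
  c(paste0(">", accession), body)
}

# Hypothetical 70 bp record used only for illustration
record <- c("LOCUS       DEMO0001                 70 bp    DNA     linear",
            "ORIGIN",
            "        1 acgtacgtac gtacgtacgt acgtacgtac gtacgtacgt acgtacgtac gtacgtacgt",
            "       61 acgtacgtac",
            "//")
fasta <- genbankToFasta(record)
writeLines(fasta)                        # print the FASTA lines to the console
# writeLines(fasta, "DEMO0001.fasta")    # or save them to a file
```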
INPUT:
File Name: AF060490.gb
LOCUS AF060490 2693 bp mRNA linear ROD 02-MAY-2000
DEFINITION Mus musculus TLS-associated protein TASR-2 mRNA, complete cds.
ACCESSION AF060490
VERSION AF060490.1 GI:3327956
KEYWORDS .
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
Sciurognathi; Muroidea; Muridae; Murinae; Mus; Mus.
REFERENCE 1 (bases 1 to 2693)
AUTHORS Yang,L., Embree,L.J. and Hickstein,D.D.
TITLE TLS-ERG leukemia fusion protein inhibits RNA splicing mediated by
serine-arginine proteins
JOURNAL Mol. Cell. Biol. 20 (10), 3345-3354 (2000)
PUBMED 10779324
REFERENCE 2 (bases 1 to 2693)
AUTHORS Yang,L., Embree,L., Tsai,S. and Hickstein,D.D.
TITLE Molecular cloning of TASR-2, a TLS-associated protein with Ser-Arg
repeats
JOURNAL Unpublished
REFERENCE 3 (bases 1 to 2693)
AUTHORS Yang,L., Embree,L., Tsai,S. and Hickstein,D.D.
TITLE Direct Submission
JOURNAL Submitted (17-APR-1998) Medicine/Oncology, University of
Washington, 1660 S. Columbian Way, GMR 151, Seattle, WA 98108, USA
FEATURES Location/Qualifiers
source 1..2693
/mol_type="mRNA"
/db_xref="taxon:10090"
/cell_line="EML"
/cell_type="hematopoietic"
/organism="Mus musculus"
CDS 92..880
/db_xref="GI:3327957"
/codon_start=1
/protein_id="AAC26715.1"
/translation="MSRYLRPPNTSLFVRNVADDTRSEDLRREFGRYGPIVDVYVPLD
FYTRRPRGFAYVQFEDVRDAEDALHNLDRKWICGRQIEIQFAQGDRKTPNQMKAKEGR
NVYSSSRYDDYDRYRRSRSRSYERRRSRSRSFDYNYRRSYSPRNSRPTGRPRRSRSHS
DNDRFKHRNRSFSRSKSNSRSRSKSQPKKEMKAKSRSRSASHTKTRGTSKTDSKTHYK
SGSRYEKESRKKEPPRSKSQSRSQSRSRSKSRSRSWTSPKSSGH"
/product="TLS-associated protein TASR-2"
/note="contains Ser-Arg (SR) repeats"
ORIGIN
1 gtgtggtgtg agtggatgtg agccgccgcc ggagctgcgg acggtttgcc cgagcccgtt
61 agcgccgccg gcccagagtc ccgccgccac catgtcccga tacctgcgcc cccctaacac
121 gtctctgttc gtcaggaacg tggcggacga caccaggtct gaagatttac gtcgggaatt
181 tggtcgttat ggtccaatag tagatgttta tgtcccactt gatttctaca ctcggcgtcc
241 aagaggattt gcatatgttc aatttgagga tgttcgtgat gctgaagacg ctttacataa
301 tttggacaga aaatggattt gtgggcgtca gattgaaatc cagttcgcac agggggatcg
361 gaagacacca aatcaaatga aagccaagga agggaggaat gtatacagct cttcacgata
421 tgacgattat gaccgatata gacgctctcg aagccggagt tatgaaagga gaagatcgag
481 gagtcgctcc tttgattata actataggag atcttacagt cctagaaaca gtagaccgac
541 tggaagacca cggcgtagcc gaagccattc cgacaatgat agattcaaac accgaaatcg
601 atctttttca agatctaaat ccaattcaag atcacggtcc aagtcccagc ccaagaaaga
661 aatgaaggct aaatcacgtt ctaggtctgc atctcacacc aaaactagag gcacctctaa
721 aacagattcc aaaacacatt ataagtctgg ctcaagatat gaaaaggaat caaggaaaaa
781 agaaccacct agatccaaat ctcagtcaag atcacagtct aggtctaggt caaaatctag
841 gtcaaggtct tggactagtc ccaagtccag tggccactga tagtataaat tatgatactt
901 ctaggcatgt atcattcatt tactcatagt ttggtatact taaattatca ggaatacaat
961 gttgcaatga tgcgttttaa aaacaaacaa acttaacttg ttagttttcc ctgtactggg
1021 caatggttat aattaaaaag atgcgctgtt gagaagccac tcttaagagt ccagtttgtt
1081 taatgttatg ggcagctacc aatttgtggt gtctctgtat atttttgtaa agattctcat
1141 tttttatgct tgaagtattt ggtgaaaaga tgttggttga ccataatttg caacattgtc
1201 ttattagaaa taaattttca tatccatatt tggtagaact gttaacctag aaatgtagct
1261 tgctaataag atagaatgat acagaagtga agtggtagcc acattacaac actgactgct
1321 cagacacatt taggttcagg gtggacttta tgtcttgtca agatgtctaa gcccatgatg
1381 attatttatg atgcaatgtg gaatagttct tttgttaaat ccaccatctg gggattgatg
1441 ccaactgggt taaatagcgt tttcagggag agtgcccttt tcactgaaac atggagcctt
1501 cactgctttc cccacctcaa tccctgctgg tttctaagat atggaacatt aaagcataag
1561 ggaaaaccct cccccttaag ttgtgagtga gtcagtgatc acagaaacca ttgtaagggg
1621 aaaagactgt tcttagcata gttgctctaa atttaactat tgttgatcat tgttatttag
1681 gggttttgtt ttgttgtttg ttttttctgt tagaaacaag tgaactgttt gaaaatacat
1741 ttttgtttgt ttatatgcat agtgtaaaac aaactgaatt ttgatgctca cagcacttac
1801 catgtgcgtt tgtatcaaaa tctgcctgtt cttcataggg gaggcttgct cttcacacct
1861 cagtttattc atgtgagaca ggctgagaag ataacactcc taggtgattt tgtggtgccg
1921 tggatttttg gggaaagttg agttttaagc aaaagccaca tcacttagtt tttggtaatg
1981 taggacatga ctaaaaaata acgaaatgat acccttaaat atttataatt tctagtattt
2041 caagattgtt ttggaggcaa taaaatgact tgaaatgtcc ggtgtcattt cagaatacaa
2101 agctagtgtc tctaagatct tagattcgtt gcttacagat gtgagtgaag atactgtggg
2161 ggacgatcct cctggaggat taccttattt ttttcctttc gattttgttt ttagaaattt
2221 agtccttgct tgtagacaac aaaagatggt tttaagaact gtttgtggaa tgtgtttgga
2281 gggttaattc tagaaccttt gtatatttaa tagtatttct aacttttatt tctttactgt
2341 ttgcagttaa tgttcttgtt ctgctatgca atcatttata tgcacgtttc tttaattttt
2401 ttagattttc ctggatgtat agtttaaaca aagtctattt aaaactgtag cggtagtttg
2461 cagttctagc aaagaggaaa gttgtggggt taaactttgt attttctttc ttatagaagc
2521 ttctaaaaag gtatttttat atgttctttt taacaaatat tgtgtacaac ctttaaaaca
2581 tcaatgtttg gatcaaaaca agacccagct tattttctgc ttgctgtaaa ttaagcaaag
2641 atgctataat aaaaacaaaa tgaaggaaaa aaaaaaaaaa aaaaaaaaaa aaa
//
OUTPUT:
File Name: AF060490.fasta
>AF060490 2693 bp
gtgtggtgtgagtggatgtgagccgccgccggagctgcggacggtttgcccgagcccgtt
agcgccgccggcccagagtcccgccgccaccatgtcccgatacctgcgcccccctaacac
gtctctgttcgtcaggaacgtggcggacgacaccaggtctgaagatttacgtcgggaatt
tggtcgttatggtccaatagtagatgtttatgtcccacttgatttctacactcggcgtcc
aagaggatttgcatatgttcaatttgaggatgttcgtgatgctgaagacgctttacataa
tttggacagaaaatggatttgtgggcgtcagattgaaatccagttcgcacagggggatcg
gaagacaccaaatcaaatgaaagccaaggaagggaggaatgtatacagctcttcacgata
tgacgattatgaccgatatagacgctctcgaagccggagttatgaaaggagaagatcgag
gagtcgctcctttgattataactataggagatcttacagtcctagaaacagtagaccgac
tggaagaccacggcgtagccgaagccattccgacaatgatagattcaaacaccgaaatcg
atctttttcaagatctaaatccaattcaagatcacggtccaagtcccagcccaagaaaga
aatgaaggctaaatcacgttctaggtctgcatctcacaccaaaactagaggcacctctaa
aacagattccaaaacacattataagtctggctcaagatatgaaaaggaatcaaggaaaaa
agaaccacctagatccaaatctcagtcaagatcacagtctaggtctaggtcaaaatctag
gtcaaggtcttggactagtcccaagtccagtggccactgatagtataaattatgatactt
ctaggcatgtatcattcatttactcatagtttggtatacttaaattatcaggaatacaat
gttgcaatgatgcgttttaaaaacaaacaaacttaacttgttagttttccctgtactggg
caatggttataattaaaaagatgcgctgttgagaagccactcttaagagtccagtttgtt
taatgttatgggcagctaccaatttgtggtgtctctgtatatttttgtaaagattctcat
tttttatgcttgaagtatttggtgaaaagatgttggttgaccataatttgcaacattgtc
ttattagaaataaattttcatatccatatttggtagaactgttaacctagaaatgtagct
tgctaataagatagaatgatacagaagtgaagtggtagccacattacaacactgactgct
cagacacatttaggttcagggtggactttatgtcttgtcaagatgtctaagcccatgatg
attatttatgatgcaatgtggaatagttcttttgttaaatccaccatctggggattgatg
ccaactgggttaaatagcgttttcagggagagtgcccttttcactgaaacatggagcctt
cactgctttccccacctcaatccctgctggtttctaagatatggaacattaaagcataag
ggaaaaccctcccccttaagttgtgagtgagtcagtgatcacagaaaccattgtaagggg
aaaagactgttcttagcatagttgctctaaatttaactattgttgatcattgttatttag
gggttttgttttgttgtttgttttttctgttagaaacaagtgaactgtttgaaaatacat
ttttgtttgtttatatgcatagtgtaaaacaaactgaattttgatgctcacagcacttac
catgtgcgtttgtatcaaaatctgcctgttcttcataggggaggcttgctcttcacacct
cagtttattcatgtgagacaggctgagaagataacactcctaggtgattttgtggtgccg
tggatttttggggaaagttgagttttaagcaaaagccacatcacttagtttttggtaatg
taggacatgactaaaaaataacgaaatgatacccttaaatatttataatttctagtattt
caagattgttttggaggcaataaaatgacttgaaatgtccggtgtcatttcagaatacaa
agctagtgtctctaagatcttagattcgttgcttacagatgtgagtgaagatactgtggg
ggacgatcctcctggaggattaccttatttttttcctttcgattttgtttttagaaattt
agtccttgcttgtagacaacaaaagatggttttaagaactgtttgtggaatgtgtttgga
gggttaattctagaacctttgtatatttaatagtatttctaacttttatttctttactgt
ttgcagttaatgttcttgttctgctatgcaatcatttatatgcacgtttctttaattttt
ttagattttcctggatgtatagtttaaacaaagtctatttaaaactgtagcggtagtttg
cagttctagcaaagaggaaagttgtggggttaaactttgtattttctttcttatagaagc
ttctaaaaaggtatttttatatgttctttttaacaaatattgtgtacaacctttaaaaca
tcaatgtttggatcaaaacaagacccagcttattttctgcttgctgtaaattaagcaaag
atgctataataaaaacaaaatgaaggaaaaaaaaaaaaaaaaaaaaaaaaaaa
RESULT:
An R program was written to convert a GenBank-format file to FASTA format, and it executed successfully.
PRACTICAL: 09 HYPOTHESIS t-TEST USING R
/ / 201
AIM:
To write an R program to perform t-tests (one-sample, two-sample and paired) and draw a conclusion about each hypothesis.
SOFTWARE USED:
R 2.15.2
PROBLEM/SOURCE CODE:
1. One sample t-test
Problem:
An outbreak of Salmonella related illness was attributed to ice cream produced at a certain
factory. Scientists measured the level of Salmonella in 9 randomly sampled batches of ice
cream. The levels (in MPN/g) were: 0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418
Is there evidence that the mean level of Salmonella in the ice cream is greater than 0.3 MPN/g?
SourceCode:
x = c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
t.test(x, alternative="greater", mu=0.3)
Output:
One Sample t-test
data: x
t = 2.2051, df = 8, p-value = 0.02927
alternative hypothesis: true mean is greater than 0.3
95 percent confidence interval:
0.3245133 Inf
sample estimates:
mean of x
0.4564444
Conclusion:
From the output, the p-value is 0.029, which is below the 0.05 significance level, so we reject the null hypothesis. There is moderately strong evidence that the mean Salmonella level in the ice cream is above 0.3 MPN/g.
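The one-sample statistic is t = (mean(x) - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom, and for alternative="greater" the p-value is the upper-tail probability of the t distribution. A short base-R check that reproduces the t.test() result by hand (the variable names tStat and pValue are illustrative):

```r
# One-sample t statistic computed by hand for the Salmonella data
x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3
n <- length(x)
tStat <- (mean(x) - mu0) / (sd(x) / sqrt(n))         # t = (mean - mu0) / (s / sqrt(n))
pValue <- pt(tStat, df = n - 1, lower.tail = FALSE)  # H1: true mean > mu0 (upper tail)
round(c(t = tStat, p = pValue), 4)
```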
2. Two sample t-test
Problem:
Six subjects were given a drug (treatment group) and another six a placebo (control group). Their reaction time to a stimulus was measured (in ms). We want to perform a two-sample t-test to compare the means of the treatment and control groups.
Control (x) 91 87 99 77 88 91
Treatment (y) 101 110 103 93 99 104
SourceCode:
Control = c(91, 87, 99, 77, 88, 91)
Treat = c(101, 110, 103, 93, 99, 104)
t.test(Control,Treat,alternative="less", var.equal=TRUE)
t.test(Control,Treat,alternative="less")
Output:
Two Sample t-test
data: Control and Treat
t = -3.4456, df = 10, p-value = 0.003136
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -6.082744
sample estimates:
mean of x mean of y
88.83333 101.66667
Welch Two Sample t-test
data: Control and Treat
t = -3.4456, df = 9.48, p-value = 0.003391
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -6.044949
sample estimates:
mean of x mean of y
88.83333 101.66667
Conclusion:
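Here the pooled t-test and the Welch t-test give roughly the same results (p-value = 0.00314 and 0.00339, respectively). Both p-values are well below 0.05, so we reject the null hypothesis of equal means: there is strong evidence that the mean reaction time of the control group is lower than that of the treatment group.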
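The pooled test uses the standard error sqrt(sp2 * (1/n1 + 1/n2)), where sp2 is the pooled variance, with n1 + n2 - 2 = 10 degrees of freedom. Welch's test uses sqrt(s1^2/n1 + s2^2/n2) with Welch-Satterthwaite degrees of freedom. With equal group sizes (6 and 6) the two standard errors are identical, which is why both outputs report t = -3.4456 and differ only in the degrees of freedom (10 vs 9.48) and hence the p-value. A base-R sketch that reproduces both tests by hand (variable names are illustrative):

```r
Control <- c(91, 87, 99, 77, 88, 91)
Treat   <- c(101, 110, 103, 93, 99, 104)
n1 <- length(Control); n2 <- length(Treat)
v1 <- var(Control);    v2 <- var(Treat)
diffMeans <- mean(Control) - mean(Treat)

# Pooled (equal-variance) test: df = n1 + n2 - 2
sp2 <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
tPooled <- diffMeans / sqrt(sp2 * (1 / n1 + 1 / n2))
pPooled <- pt(tPooled, df = n1 + n2 - 2)   # H1: mean(Control) < mean(Treat)

# Welch test: unpooled standard error, Welch-Satterthwaite df
se2 <- v1 / n1 + v2 / n2
tWelch <- diffMeans / sqrt(se2)
dfWelch <- se2^2 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
pWelch <- pt(tWelch, df = dfWelch)

round(c(tPooled = tPooled, pPooled = pPooled,
        tWelch = tWelch, dfWelch = dfWelch, pWelch = pWelch), 4)
```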
3. Paired t-test
Problem:
A study was performed to test whether cars get better mileage on premium gas than on regular
gas. Each of 10 cars was first filled with either regular or premium gas, decided by a coin toss,
and the mileage for that tank was recorded. The mileage was recorded again for the same cars
using the other kind of gasoline. We use a paired t-test to determine whether cars get
significantly better mileage with premium gas.
Regular (x) 16 20 21 22 23 22 27 25 27 28
Premium (y) 19 22 24 24 25 25 26 26 28 32
SourceCode:
reg = c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem = c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
t.test(prem,reg,alternative="greater", paired=TRUE)
Output:
Paired t-test
data: prem and reg
t = 4.4721, df = 9, p-value = 0.0007749
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
1.180207 Inf
sample estimates:
mean of the differences
2
Conclusion:
The results show that the t-statistic is 4.47 and the p-value is 0.00077. Since the p-value is very low, we reject the null hypothesis. There is strong evidence that mean gas mileage is higher with premium gasoline than with regular gasoline.
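A paired t-test is simply a one-sample t-test on the within-pair differences d = premium - regular, tested against a mean of 0. The base-R sketch below computes it by hand. Here the differences have mean 2 and variance 2, so t = 2 / sqrt(2/10) = 4.4721 with 9 degrees of freedom, matching the output above (variable names are illustrative):

```r
reg  <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
d <- prem - reg                                       # per-car change with premium gas
n <- length(d)
tStat <- mean(d) / (sd(d) / sqrt(n))                  # t = mean(d) / (s_d / sqrt(n))
pValue <- pt(tStat, df = n - 1, lower.tail = FALSE)   # H1: mean(d) > 0
round(c(meanDiff = mean(d), t = tStat, p = pValue), 5)
```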
RESULT:
An R program was written to perform one-sample, two-sample and paired t-tests and draw a conclusion about each hypothesis, and it executed successfully.
PRACTICAL: 10 RETRIEVE SEQUENCE FROM DATABASE USING R
/ / 201
AIM:
To write an R program to download a nucleotide/protein sequence from a biological sequence database.
SOFTWARE USED:
R 2.15.2
Seqinr 3.0-7: Biological Sequences Retrieval and Analysis module of R
SOURCE CODE:
library("seqinr")
choosebank("swissprot")
query("seq_id", "AC=Q9CD82")
seqs <- getSequence(seq_id$req[[1]])
closebank()
write.fasta(names="Q9CD82", sequences=seqs,
file.out="E:/R Practical/Q9CD82.fasta")
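getSequence() returns the sequence as a vector of single letters, and write.fasta() saves it as a header line beginning with ">" followed by the sequence wrapped at a fixed width (60 characters per line, as in the output below). The minimal base-R sketch below reproduces that layout and reads the file back to check the residue count, which is a convenient sanity check on a downloaded sequence. The function formatFasta() and the 130-residue demonstration sequence are hypothetical.

```r
# Minimal FASTA formatter (sketch): header line, then the sequence wrapped at 'width'
formatFasta <- function(name, residues, width = 60) {
  s <- paste(residues, collapse = "")   # accepts a vector of single letters
  starts <- seq(1, nchar(s), by = width)
  c(paste0(">", name), substring(s, starts, starts + width - 1))
}

# Hypothetical 130-residue protein used only for illustration
demo <- rep(c("M", "A", "K", "L", "V"), 26)
lines <- formatFasta("DEMO", demo)
outFile <- tempfile(fileext = ".fasta")
writeLines(lines, outFile)

# Read the file back and count the residues (all lines except the header)
back <- readLines(outFile)
sum(nchar(back[-1]))
```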
INPUT/OUTPUT:
>Q9CD82
MRSENLAALLARQAAEAGWYDKPAYFAPDVVTHGQIHDGAVRLGEVLRNRGLSAGDRVLL
CLPDSPDLVQLLLACLARGIMAFLANPELHRDDYAFPERDTAAALVITNGSLRDRFQSSN
VVEPAELLSDATRVEPSDYEPVSGDAYAFATYTSGTTGKPKAAIHRHADPFTFVDAMCRK
ALRLTPQDIGLCSARMYFAYGLGNSVWFPLATGGSAVISSVPVSAESAAMLSTRFEPSVL
YGVPSFFARVVGACSPDSFRSLRCVVTAGEALEPALAERLVEFFGGIPILDGIGSSEVGQ
TFVSNSVDDWRVGTLGKVLPPYEIRVVAPDGATAGSGIEGNLWVRGPSIAQSYWNRPDSL
LENGDWLNTRDRVRIDGDGWVTYGCRADDTEIVGGVNINPREVERLIIEADAVAEAAVVG
VREFTGASTLQAFLVPAVGAFIDESVMRDVHRRLLTQLTAFKVPHRFAIIERLPRSTNGK
LLRNVLRAQSPTKPIWELSLTESQSATKAQLDGRPASNAHAQAAVGHAAGATLKQRLSAL
QQERERLVVEAVCAEAVKMLGESDPGLINRDLAFSDLGFDSQMTVTLCNRLAVVTGLRLP
ETVGWDYGSISGLSRYLEAELSGVRSRPETPLSANSGAKGLSPIDEELKKVEEMVVAIGA
SEKQRVADRLRALLGIIVDGEAGLSKRIQAASTPDEIFQLIDSELCE
RESULT:
An R program was written to download a nucleotide/protein sequence from a biological sequence database, and it executed successfully.