8/3/2019 Project Bio Information Technology
1/19
Project Bioinformation Technology
Martijn Heddes (870313134060)
Nguyen Thi Kha Tu (860319828030)
Introduction
In this experiment, a set of bioinformatics applications was used to find out as much as possible about
a raw set of sequences. The five main goals were gene annotation, protein identification,
determination of protein topology and localization, determination of a 3D structure model,
and retrieval of additional information from the literature. The set of sequences received was coded and contained 17 different sequences.
METHOD SECTION
The 17 sequences were expected to be distinct reads from a shotgun sequencing reaction. They were
therefore screened for vector contamination with VecScreen [1] before a contig was assembled with
CAP3 [2]. The contig was used for a nucleotide BLAST search against the Nucleotide collection (nr/nt)
database to find similar or related sequences with a known function. The highly similar sequences
found were aligned to the contig sequence and checked for differences, and some corrections were made
to the contig sequence, which was then compared with the protein sequence obtained from the BLAST
search. Furthermore, additional literature about the protein was retrieved from the MEDLINE database
via PubMed and from Google Scholar, to obtain more information about the function of the protein, its
mechanism of action, its conserved domains and/or motifs, the evolution of the protein superfamily,
and confirmations of previous findings.
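CAP3 relies on quality-aware overlap detection and consensus building, which is far more involved than can be shown here. The basic idea of joining overlapping shotgun reads into a contig can nevertheless be sketched with a toy greedy assembler in Python that repeatedly merges the pair of reads with the longest exact suffix-prefix overlap. This is an illustrative assumption, not CAP3's algorithm: the function names and the min_len cutoff are hypothetical, and reverse-complement reads and sequencing errors are ignored.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the best-overlapping pair of reads."""
    # Remove exact duplicates (keeping order) and reads contained in another read.
    unique = list(dict.fromkeys(reads))
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]))
```

In this toy model, a read that overlaps nothing is simply returned as a separate contig. For example, greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]) returns ["ATGGCGTACCGGA"].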
Signal peptide, topology and subcellular localization
To predict the topology and subcellular localization of the protein, PSORT (version 2.0.4)
[8] and SOSUI [9] were used to search for sequence features that could reveal its subcellular
localization. In addition, SignalP was used to predict an N-terminal signal peptide cleavage site.
3D protein modeling
To obtain a proper view of the three-dimensional structure of the predicted protein, several modeling
applications were used. The protein sequence was searched with BLAST against the PDB database, and
the predicted 3D structure, together with the PDB codes of the protein and of other homologous
proteins, was obtained from ModBase [11]. These PDB codes were used to search the VAST [10]
database. The predicted models were then analyzed with DeepView [13] and Cn3D. Finally, the
reliability of the model was checked with ProSA [12] and with a Ramachandran plot generated in
DeepView.
RESULTS AND DISCUSSION
Gene annotation
VecScreen detected no vector contamination in any of the 17 sequences provided, and no other unusual
sequences were found in the file, so all sequences were used for the contig assembly. CAP3 produced
one contig from 15 of the 17 sequences; sequences 10 and 12 were not incorporated and were
considered contamination.
>Contig1
GCGGCCGAATGTGGGGGATGTCCCATTCATTACCATCCGGAATGGACACAGATTTTTCATCATATC
CATCAACGATAGGTCGGCAACACTCACAACAATCTCAAAGATATTACCAAAATAATAATTGTGGTT
TAGGTTCAGTGGGAAATATGGCAAACAGTACAAATTCCTTAAATTCAGGTACCAACAATAGTGGAA
CAAATTTGATTGTAAATTATTTGCCCCAAGATATGCAAGACCGTGAACTTTATTCATTATTTAGGAC
CATTGGTCCAATCAATACCTGCAGAATAATGCGAGATTATAAGACTGGTTACAGTTATGGATATGG
TTTTGTGGACTTTGGATCGGAAGCAGATGCATTGAGAGCCATTAATAATCTTAATGGAATCACAGT
CAGAAATAAGAGGATAAAGGTTTCATTCGCTAGGCCGGGTGGAGAACAACTGAGAGATAAATTCA
AACTTGTATGTAACAAATTTATCTAGATCAATTACTGACGAACAATTAGAAACAATTTTTGGAAAA
TATGGGCAAATTGTACAGAAAAACATTTTACGCGACAAACATACAGGAACACCGCGTGGAGTGGC
CTTTATCAGATTTAATAAGAGAGAAGAAGCTCAAGAAGCAATATCAGCTTTGAACAACGTAATACC
TGAAGGGGGAACACAACCGCTAACTGTTCGTGTTGCCGAGGAACACGGAAAGTCAAAGGGCCATG
TTTATATGGCTCCAAATCAACCACCTCACGGAAACATGGGTCATGGAAATATGGGGAATATGGGAC
ATGGAAACATGGGAATGGCCGGTGGTTCCGGAATGAATTTAAATAATATGAATGCATTCAATGGG
ATGAATCAAATGGTGCACAGAGGTAGACAAAAACATAGTTACCAACGTAAAATTCATCCATATAA
TCCAAATTTTCTTTAAACATTTAATTAAACAATATAAACAGAAACTTAGTTTTGCTTTGCTGGACAA
ATTTTAGAAAACCAATAATTAATAACACAAGTCCAAAACTAATTTTTTTTTATCGTTTTTCATATAA
AAATTCGTTGTTTATCCGATATTTTAAACTTTAAAAACGAGACACTGTAATTTAATAAGTAGTTATA
CTTAATCAATTTACTAAATTTTCAGTAGCAGTAGATGTATGTGTTAAATATCCACACTTAAAAAAGA
TGAAAACTTTTAAAACAAACAAAACACATGTAATAGTTATTCTGTATTAATATTTAAGGTATTCGAT
ATTATTCATCTTATATTTGAAAATGTCGCGAATTAATTTTAAAACAATATTAAAATTTCAATATTTA
ATGTACGACCTATTTGGGTTTGCATCAATAAACTAGATTTATGTTGTTCTCCGAAAAAAGAAAAAA
AAAAAAAAAAAAAAAAAA
The most likely protein was then predicted with ORF Finder at NCBI. The most likely open reading frame
is frame +3, which encodes a 307-aa protein starting with methionine (M). The full predicted
protein sequence is as follows:
>translation
MWGMSHSLPSGMDTDFSSYPSTIGRQHSQQSQRYYQNNNCGLGSVGNMANSTNSLNSGTNN
SGTNLIVNYLPQDMQDRELYSLFRTIGPINTCRIMRDYKTGYSYGYGFVDFGSEADALRAINN
LNGITVRNKRIKVSFARPGGEQLRDTNLYVTNLSRSITDEQLETIFGKYGQIVQKNILRDKHTG
TPRGVAFIRFNKREEAQEAISALNNVIPEGGTQPLTVRVAEEHGKSKGHVYMAPNQPPHGNM
GHGNMGNMGHGNMGMAGGSGMNLNNMNAFNGMNQMVHRGRQKHSYQRKIHPYNPNFL
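The ORF search can be illustrated with a minimal six-frame scan in Python. This sketch assumes the standard genetic code (NCBI table 1), reports only ORFs that begin with ATG and end with a stop codon, and labels frames +1 to +3 and -1 to -3 by their offset on each strand (a convention of this sketch). NCBI ORF Finder offers further options, such as alternative start codons, a minimal ORF length and partial ORFs, which are not reproduced here. The function names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_frame(seq: str, offset: int) -> str:
    """Translate seq from `offset`; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(offset, len(seq) - 2, 3))


def longest_orf(seq: str) -> tuple[str, str]:
    """Return (frame label, protein) for the longest ATG-to-stop ORF in six frames."""
    seq = seq.upper()
    best_frame, best_protein = "", ""
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            protein = translate_frame(strand_seq, offset)
            start = None
            for pos, aa in enumerate(protein):
                if aa == "M" and start is None:
                    start = pos  # first Met opens an ORF
                elif aa == "*" and start is not None:
                    if pos - start > len(best_protein):
                        best_frame = f"{strand}{offset + 1}"
                        best_protein = protein[start:pos]
                    start = None  # stop codon closes the ORF
    return best_frame, best_protein
```

For example, longest_orf("CCATGAAATTTTAGCC") returns ("+3", "MKF"), because the ATG...TAG stretch starts at the third nucleotide of the plus strand.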
The contig was searched against the Nucleotide collection (nr/nt) database using megablast, which gave the following output.
As shown in the BLAST output, our contig has 100% query coverage with the Megaselia
scalaris mRNA for the Sex-lethal orthologous protein (Megsxl), splice variant, clone pMSWc114,
with a bit score of 2571. In addition, the query is highly similar to the Megaselia
scalaris mRNA for the sex-lethal homologue (sxl gene), with 96% coverage and a bit score of
2479. A BLASTP search of both translated sequences against the nr database then gave, in both
cases, 100% identity with a Sex-lethal homologous protein of Megaselia scalaris. Next, the
taxonomy report of the BLASTP output was analyzed, and all homologous and paralogous sequences
were retrieved from the NCBI protein database. A multiple sequence alignment of these sequences
was made with ClustalX, and a phylogenetic tree was built from that alignment with the same
program.
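The bit scores quoted above (2571 and 2479) can be related to E-values with the standard BLAST relation E = m·n·2^(-S'), where S' is the bit score, m the query length and n the total database length. The small Python sketch below shows why hits with bit scores this large are typically displayed with an E-value of 0.0: 2^(-2571) is far below the smallest positive double-precision number, so the sketch works with log10(E) instead. The query and database lengths are hypothetical, and BLAST's effective-length corrections are ignored.

```python
import math


def log10_evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """log10 of E = m * n * 2**(-S'), with S' the bit score.

    Working in log space avoids floating-point underflow: 2**-2571 is far
    smaller than the smallest positive double (about 5e-324).
    """
    return math.log10(query_len) + math.log10(db_len) - bit_score * math.log10(2)


# Hypothetical lengths, for illustration only.
print(round(log10_evalue(2571, query_len=1_500, db_len=10**11), 1))
```

With these hypothetical lengths, a bit score of 2571 corresponds to an E-value of roughly 10^-760.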
Fig. BLASTP output.
Fig. BLASTP output of the predicted protein against the nr database.
Fig. Alignment output of BLASTP against the nr database.
Based on the BLAST output, we conclude that our protein is the Sex-lethal orthologous protein of Megaselia scalaris.
Fig. Taxonomy report of the BLASTP output.
Protein identification
According to the superfamily link in the BLASTP output, our protein contains conserved RNA/DNA binding
sites belonging to the RRM superfamily. The RRM (RNA recognition motif), also known as the RBD (RNA
binding domain) or RNP (ribonucleoprotein domain), is a highly abundant domain in eukaryotes, found in
proteins involved in post-transcriptional gene expression processes, including mRNA and rRNA
processing, RNA export and RNA stability. The domain is about 90 amino acids long and consists of a
four-stranded beta-sheet packed against two alpha-helices. The RRM usually interacts with ssRNA, but it
is also known to interact with ssDNA and with proteins. It binds a variable number of nucleotides,
ranging from two to eight. The active site includes three aromatic side chains located within the
conserved RNP1 and RNP2 motifs of the domain. The RRM domain is found in a variety of heterogeneous
nuclear ribonucleoproteins (hnRNPs), in proteins implicated in the regulation of alternative splicing,
and in protein components of small nuclear ribonucleoproteins (snRNPs).
The output of the NCBI Structure search further suggests that our protein has two domains and forms a
homodimer.
To assess the similarity of our protein to its orthologs, a multiple sequence alignment of the proteins
from nineteen eukaryotes that have more than 95% identity with our protein was made with ClustalX
(Appendix 1 and 2). As expected, the phylogenetic tree shows clear relationships with our protein, and
the Megaselia scalaris protein is the closest to ours, consistent with the BLAST output. This agreement
with the BLAST result supports the reliability of the tree.
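The distances behind such a tree can be illustrated with a short Python sketch. For every pair of aligned sequences, it computes the fraction p of differing residues, ignoring gap columns, and converts it with Kimura's empirical protein correction d = -ln(1 - p - 0.2p²). ClustalX can apply this kind of correction (its "correct for multiple substitutions" tree option) before building a neighbor-joining tree. Whether that option was used here is not stated, and the function names and toy alignment are hypothetical. Note also that ClustalX's own gap handling may differ from the pairwise exclusion used below.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("the two sequences share no ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_distance(a: str, b: str) -> float:
    """Kimura's empirical protein distance, d = -ln(1 - p - 0.2 p^2)."""
    p = p_distance(a, b)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for Kimura's formula")
    return -math.log(arg)


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise Kimura distances for every ordered pair of named, aligned sequences."""
    return {(m, n): kimura_distance(alignment[m], alignment[n])
            for m in alignment for n in alignment}


# Hypothetical toy alignment, not taken from Appendix 1.
toy = {"seqA": "MKV-LA", "seqB": "MRV-LA", "seqC": "MRVQLS"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```

A tree-building method such as neighbor joining then works only with this symmetric matrix, not with the sequences themselves.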
Fig. 3. Phylogenetic relationships of our protein with homologous proteins from other organisms.
The active sites (a large conserved region) consist of two RNA-binding domains, RBD1 and RBD2
(Fig. 2; Bopp et al. 1996).
Fig. Location of RBD1 and RBD2.
The two main active sites of our protein are fully conserved (100% identity) compared with RBD1 and
RBD2, the RNA-binding domains of the Sex-lethal protein (Bopp et al. 1996). We therefore conclude that
our protein is a Sex-lethal protein containing two RNA-binding domains and is expected to bind RNA.
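Checking whether a region is fully conserved can be sketched in Python as a sliding-window comparison of two aligned sequences. Every window in which both sequences are identical and gap-free is kept, and overlapping windows are merged into conserved regions. The window size and function names are hypothetical choices for illustration, and the RBD boundaries themselves are not derived here.

```python
def conserved_windows(a: str, b: str, window: int = 10) -> list[int]:
    """Start columns of windows in which two aligned sequences are identical and gap-free."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    starts = []
    for i in range(len(a) - window + 1):
        seg_a, seg_b = a[i:i + window], b[i:i + window]
        if "-" not in seg_a and seg_a == seg_b:
            starts.append(i)
    return starts


def merge_windows(starts: list[int], window: int) -> list[tuple[int, int]]:
    """Merge overlapping or touching windows into (start, end) ranges, end exclusive."""
    regions: list[tuple[int, int]] = []
    for s in starts:
        if regions and s <= regions[-1][1]:
            regions[-1] = (regions[-1][0], s + window)
        else:
            regions.append((s, s + window))
    return regions
```

For example, merge_windows(conserved_windows("AAAAACCCCC", "AAAAAGCCCC", 4), 4) gives [(0, 5), (6, 10)]. A single mismatch at column 5 splits the alignment into two fully conserved regions.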
Protein signal peptide, topology and subcellular localization
SignalP detected no signal peptide in our protein sequence, indicating that it is not a secretory
protein. This is consistent with the SOSUI prediction that our protein is a soluble
protein.
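SOSUI's soluble-versus-membrane call rests largely on hydrophobicity along the sequence. As a rough stand-in that relies on several assumptions (it is not SOSUI's actual algorithm), a Kyte-Doolittle hydropathy scan in Python flags 19-residue windows whose mean hydropathy reaches 1.6. These are the window size and cutoff that Kyte and Doolittle suggested for spotting membrane-spanning segments. Residues outside the 20 standard amino acids are scored 0 here, which is an arbitrary choice, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every `window`-residue stretch (non-standard residues score 0)."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows hydrophobic enough to span a membrane."""
    return [i for i, h in enumerate(hydropathy_profile(protein, window))
            if h >= threshold]
```

For a soluble protein, this scan would be expected to return an empty list, or at most a few isolated windows.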
In addition, PSORT predicted with high certainty that the protein is localized in the cytoplasm,
while it gave no clear prediction for the outer or inner membrane. These membrane categories describe
the Gram-negative bacterial cell envelope, in which an inner (plasma) membrane and an outer membrane
enclose the periplasmic space. Because our protein comes from an insect, a eukaryote, these categories
are not informative here, and the cytoplasmic prediction mainly supports the conclusion that the
protein is soluble.
3D protein modeling
The predicted model obtained has two domains, a moderate sequence identity (49%), a highly
significant E-value (6e-28) and a model score of 1.00, which is very good. We therefore expect the 3D
structure of our protein to be very similar to this model.
According to the Ramachandran plot generated in Swiss-PdbViewer (DeepView), almost all residues of
our protein lie in the allowed regions. Four residues fall outside these regions. A few such outliers
are acceptable, because proteins are flexible and their backbone conformations are not always ideal.
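Each point of a Ramachandran plot represents one residue's pair of backbone torsion angles: phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)). The minimal Python sketch below shows how these angles follow from atomic coordinates, using the standard atan2 formula for a signed dihedral angle. Under the IUPAC convention, a clockwise rotation gives a positive angle. Coordinates are plain (x, y, z) tuples, and the function names are hypothetical; they are not DeepView's API.

```python
import math

Point = tuple[float, float, float]


def _sub(p: Point, q: Point) -> Point:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p: Point, q: Point) -> float:
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _cross(p: Point, q: Point) -> Point:
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Point, n: Point, ca: Point, c: Point,
            n_next: Point) -> tuple[float, float]:
    """Backbone phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)) of residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Plotting the (phi, psi) pairs of all residues and comparing them with the allowed regions is what produces a residue count such as the four outliers reported above.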
The overall structure quality was assessed with the ProSA Z-score (-6.7). This negative value is
typical of native-like structures and suggests a low probability of major errors in the model. The
per-residue energy scores of both our model and the template were also negative, indicating normal
energy levels.
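A Z-score expresses how many standard deviations a value lies from the mean of a reference distribution: z = (E - mean) / sd. ProSA applies this to the energy of the structure relative to an energy distribution derived from random conformations. The Python sketch below shows only that arithmetic, with hypothetical reference energies. It also includes a sliding-window average of the kind used to smooth a per-residue energy profile. It does not reproduce ProSA's knowledge-based potentials, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def z_score(value: float, reference: list[float]) -> float:
    """Number of standard deviations `value` lies from the mean of `reference`."""
    sd = pstdev(reference)
    if sd == 0:
        raise ValueError("reference distribution has zero spread")
    return (value - mean(reference)) / sd


def window_average(values: list[float], window: int) -> list[float]:
    """Sliding-window mean, e.g. to smooth a per-residue energy profile."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

A strongly negative z means the structure's energy is much lower than that of typical alternative conformations. Smoothing prevents single residues from dominating the per-residue profile.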
Appendix 1:
>Megaselia_scalaris_gi|6456838|emb|CAA04179.2| Sex-lethal orthologous protein [Megaselia scalaris]
MWGMSHSLPSGMDTDFSSYPSTIGRQHSQQSQRYYQNNNCGLGSVGNMANSTNSLNSGTNNSGTNLIVNY
LPQDMQDRELYSLFRTIGPINTCRIMRDYKTGYSYGYGFVDFGSEADALRAINNLNGITVRNKRIKVSFA
RPGGEQLRDTNLYVTNLSRSITDEQLETIFGKYGQIVQKNILRDKHTGTPRGVAFIRFNKREEAQEAISA
LNNVIPEGGTQPLTVRVAEEHGKSKGHVYMAPNQPPHGNMGHGNMGNMGHGNMGMAGGSGMNLNNMNAFN
GMNQMVHRGRQKHSYQRKIHPYNPNFL
>Ceratitis_capitata_gi|2981305|gb|AAC38968.1| sex-lethal homolog CcSXL [Ceratitis capitata]
MYGNMNNGGHAPYGYNGYRPSGGRMWGMSHSLPSGMDTDFTSSYPGPSMNRRGGYNDFSGGGGGGGGTMG
SMNNMVNAASTNSLNCGGGGGRDGHGGGSNGTNLIVNYLPQDMTDRELYALFRTIGPINTCRIMRDYKTG
YSFGYAFVDFAAETDSQRAIKSLNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDQLDTIFGK
YGMIVQKNILRDKLTGKPRGVAFVRFNKREEAQEAISALNNVIPEGASQPLTVRLAEEHGKAKAQHYMSQ
LGLIGGGGGGGGGGGGGGGGMGGPPPPPMNMGYNNMVHRGRQNKSRFQKMHPYHNAQKFI
>Lucilia_cuprina_gi|13357168|gb|AAK20025.1|AF234183_1 sex-lethal protein SXL1 [Lucilia cuprina]
MYGQNIRNVTYAPYGYNGYRQSGERMWRMSHSLPSGMDTDFTSSYPGPSAMNHRGGRGGGYNDFSGGGSA
MGSMCNMAPAISTNSVNSGGGDCGDTQGCNGTNLIVNYLPQDMTDRELYALFRTCGPINTCRIMKDYKTG
YSFGYAFVDFASEIDAQNAIKSLNGVTVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDELEKIFGK
YGNIVQKNILRDKLTGKPRGVAFVRFNKREEAQEAISALNNVIPEGASQPLTVRLAEEHGKMKANHFMNQ
LGMGPPAAPIPAAGPGYNNMVHRGRHNKNRNQKSHPYHNPQKFI
>Musca_domestica_gi|6226777|sp|O17310.1|SXL_MUSDO RecName: Full=Sex-lethal homolog
MYGQNVRNVSYYPPYGYNGYKQSGERMWRMSHSLPSGMDTDFTSSYPGPSAMNPRGRGGYNDFSGGGSAM
GSMCNMAPAQSTNSLNSGGDGGGGDTQAVNGTNLIVNYLPQDMTDRELYALFRTCGPINTCRIMKDYKTG
YSFGYAFVDFASEIDAQNAIKTVNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDELEKIFGK
YGNIVQKNILRDKLTGRPRGVAFVRFNKREEAQEAISALNNVIPEGASQPLTVRLAEEHGKMKAHHFMNQ
LGMGPPAAPIPAAGPGYNNMVHRGRQNKMRNHKVHPYHNPQKFI
>Chrysomya_rufifacies_gi|6226775|sp|O97018.2|SXL_CHRRU RecName: Full=Sex-lethal homolog
MWRMSHSLPSGMSRYAFSPQDTDFTSSYPGPSAMNHRGGRGGGYNDFSGGGSAMGSMCNMAPAISTNSVN
SGGGDCGDNQGCNGTNLIVNYLPQDMTDRELYALFRTCGPINTCRIMKDYKTGYSFGYAFVDFASEIDAQ
NAIKSLNGVTVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDELEKIFGKYGNIVQKNILRDKLTGK
PRGVAFVRFNKREEAQEAISALNNVIPEGASQPLTVRLAEEHGKMKAHHFMNQLGMGPPAAPIPAAGPGY
NSMVHRGRHNKNRNQKPHPYHNPQKFI
>Bactrocera_oleae_gi|52075416|emb|CAG29242.1| sex-lethal protein [Bactrocera oleae]
MYGNMNNGGHVPYGFNGYRPSGGRRWGMSHSLPSGMDTDFTSSYPGPSMNRRGYNDFSGGGGGGSGGGGG
AMGSMNNAPAISTNSLNCGSGGGGGDGHGGGSNGTNLIVNYLPQDMTDRELYALFRTIGPINTCRIMRDY
KTGYSFGYAFVDFASETDSQRAIKSLNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDQLDTI
FGKYGMIVQKNILRDKLTGKPRGVAFVRFNKREEAQEAISALNNVIPEGGSQPLTVRLAEEHGKAKAQQY
MSQLGLIGGGGGGGGGGGGMGGPPPPPMNMGYNNMVHRGRQNKSRFQKMHPYHNAQKFI
>Drosophila_grimshawi_gi|195048229|ref|XP_001992493.1| GH24172 [Drosophila grimshawi]
MYGNNNPGSNNNNGGYPPYGYNNKSSGGRVFGMSHSLPSGMSRYAFSPQDTEFTFPSSSSRRGYNEFAGV
NGGSANSLGGLCNMPMASNNSLNNLCGLSIGSGGSDDLMNDPRNSNTNLIVNYLPQDMTDRELYALFRSI
GTINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNL
PRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRL
AEEHGKAKAAHFMTQMGMGPPQVPPPPPPPPPHMASFNMVHRDGAMEKLRSLFDAICDAIFGLDSDNFAD
LLDGLYRRKYHYPYL
>Drosophila_willistoni_gi|195439114|ref|XP_002067476.1| GK16445 [Drosophila willistoni]
MYGNNNPGSNNNNGGYPPYGYKQSSGGRGFGMSHSLPSGMSRYAFSPQDTEFTFPSSSSRRGYNDFPGGG
GNGGSANSLGGGNMCNMPPMASNNSLNNLCGLSLGSGGSDDLMNDHRPSNTNLIVNYLPQDMTDRELYAL
FRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLY
VTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPL
SVRLAEEHGKAKAAHFMSQMGMGPSPPNVPPPPPPPPPHMAAGFNNMVHRDGAMEKLRSLFDAICDAIFG
LDSDNFADLLDGLYRRKYHYPYL
>Drosophila_virilis_gi|195396313|ref|XP_002056776.1| Sex lethal [Drosophila virilis]
MYGNNNPGSNNNNGGYPPYGYNKSSGGRVFGMSHSLPSGMSRYAFSPQDTEFTFPSSSSRRGYNDFPGGN
GGSANSLGGNICNMPPMASNNSLNNLCGLSIGSGGSDDHMNDQRNSNTNLIVNYLPQDMTDRELYALFRA
IGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTN
LPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVR
LAEEHGKAKAAHFMSQMGMGPPQAPPPPPPPPPHMAAGFNNMVHRDGAMEKLRSLFDAICDAIFGLDSDN
FADLLDGLYRRKYHYPYL
>Drosophila_mojavensis_gi|195132500|ref|XP_002010681.1| GI21573 [Drosophila mojavensis]
MYGNNNPGSNNNNGGYPPYGYNKSSGGRVFGMSHSLPSGMSRYAFSPQDTEFTFPSSSSRRGYNDFPGGN
GGSANSLGGNICNMPMASNNSLNNLCGLSIGSGGSDDLMNDQRTSNTNLIVNYLPQDMTDRELYALFRAI
GPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNL
PRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRL
AEEHGKAKAAHFMSQMGMGPPQAPPPPPPPPPHMAAGFNNMVHRDGAMEKLRSLFDAICDAIFGLDSDNF
ADLLDGLYRRKYHYPYL
>Drosophila_melanogaster_gi|78706524|ref|NP_001027063.1| Sex lethal, isoform N [Drosophila melanogaster]
MDFNFDTVTPCSTMSSYYNFKMASGGRGFGMSHSLPSGMDTEFSFPSSSSRRGYNDFPGCGGSGGNGGSA
NNLGGGNMCHLPPMASNNSLNNLCGLSLGSGGSDDLMNDPRASNTNLIVNYLPQDMTDRELYALFRAIGP
INTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPR
TITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRLAE
EHGKAKAAHFMSQMGVVPANVPPPPPQPPAHMAAAFNMMHRGRSIKSQQRFQNSHPYFDAKKFI
>Drosophila_subobscura_gi|1403308|emb|CAA67016.1| sex-lethal [Drosophila subobscura]
MYGNNNPGSNNNNGGYPPYGYNKSSGGRGFGMSHSLPSGMDTEFSFPSSSSRRGYNEFPGGGGIGIGANG
GSANNLGGNMCNLLPMTSNNSLSNLCGLSLGSGGSDDHMMMHDQRSSNTNLIVNYLPQDMTDRELYALFR
AIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVT
NLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSV
RLAQEHGKAKAAHFMSQIGVPSANAPPPPPPPPHMAFNNMVHRGRSIKSQQRFQKTHPYFDAQKFI
>Drosophila_sechellia_gi|195353431|ref|XP_002043208.1| Sxl [Drosophila sechellia]
MYGNNNPGSNNNNGGYPPYGYNNKSSGGRGFGMSHSLPSGMDTEFSFPSSSSRRGYNDFPGCGGSGGNGG
SANNLGGGNMCHLPPMASNNSLNNLCGLSLGSGGSDDLMNDPRASNTNLIVNYLPQDMTDRELYALFRAI
GPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNL
PRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRL
AEEHGQGEGCPLYVADGRGSS
>Drosophila_simulans_gi|195565544|ref|XP_002106359.1| Sex lethal [Drosophila simulans]
MYGNNNPGSNNNNGGYPPYGYNNKSSGGRGFGMSHSLPSGMDTEFSFPSSSSRRGYNDFPGCGGSGGNGG
SANNLGGGNMCHLPPMASNNSLNNLCGLSLGSGGSDDLMNDPRASNTNLIVNYLPQDMTDRELYALFRAI
GPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNL
PRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRFSESNILFYGAMEKLRSLLDGIWDAIFGLD
SENFADLLDGLYRRKYHYPYL
>Drosophila_erecta_gi|194896665|ref|XP_001978518.1| GG17637 [Drosophila erecta]
MSHSLPSGMDTEFSFPSSSSRRGYNDFPGCGGSGGNGGSANNLGGGNMCHLPPMASNNSLNNLCGLSLGS
GGSDDLMNDPRASNTNLIVNYLPQDMTDRELYALFRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQ
RAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGR
PRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRLAEEHGKAKAAHFMSQMGVAAPNVPPPPPPPPP
HMAAGFNNMVHRDGAMEKLRSLFDAICDAIFGLDSDNFADLLDGLYRRKYHYPYL
>Drosophila_yakuba_gi|195480535|ref|XP_002101294.1| GE17544 [Drosophila yakuba]
MSHSLPSGMDTEFSFPSSSSRRGYNDFPGCGGSGGNGGSANNLGGGNMCHLPPMASNNSLNNLCGLSLGS
GGSDDLMNDPRASNTNLIVNYLPQDMTDRELYALFRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQ
RAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGR
PRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRLAEEHGKAKAAHFMSQMGVAAPNVPPPPPPPPP
HMAAGFNNMVHRDGAMEKLRSLFDAICDAIFGLDSDNFADLLDGLYRRKYHYPYL
>Drosophila_ananassae_gi|194763735|ref|XP_001963988.1| GF20968 [Drosophila ananassae]
MYGNNNPGSNNNNGGYPPYGYNNKSSGGRGFGMSHSLPSGMSRYAFSPQDTEFSFPSSSSRRGYNDFPGC
GIGGNGGSANSLGGGGGGNMCNLPPMTSNNSLNNLCGLSLGSGGSDDHMLNDQRPSNTNLIVNYLPQDMT
DRELYALFRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGES
IKDTNLYVTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIP
EGGSQPLSVRLAEEHGKAKAAHFMSQMGVPAPNAPQPPPPPPLPHMAAGFNSMVHRDGAMEKLRSLFDAI
CDAIFGLDSDNFADLLDGLYRRKYHYPYL
>Drosophila_persimilis_gi|195168643|ref|XP_002025140.1| GL26885 [Drosophila persimilis]
MYGNNNPGSNNNNGGYPPYGYNKSSGGRGFGMSHSLPSGMSRYAFSPQDTEFSFPSSSSRRGYNEFPGGG
GIGIGANGGSANNLGGNMCNLLPMTSNNSLSNLCGLSLGSGGSDDHMMMHDQRSSNTNLIVNYLPQDMTD
RELYALFRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESI
KDTNLYVTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPE
GGSQPLSVRLAEEHGKAKAAHFMSQMGVPAPNAPPPPPPPPPHMAAGFNNMVHRDGAMEKLRSLFDAICD
AIFGLDSDNFADLLDGLYRRKYHYPYL
>Drosophila_pseudoobscura_gi|198471287|ref|XP_002133706.1| GA22653 [Drosophila pseudoobscura pseudoobscura]
MYGNNNPGSNNNNGGYPPYGYNKSSGGRGFGMSHSLPSGMSRYAFSPQDTEFSFPSSSSRRGYNEFPGGG
GIGIGANGGSANNLGGNMCNLLPMTSNNSLSNLCGLSLGSGGSDDHMMMHDQRSSNTNLIVNYLPQDMTD
RELYALFRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESI
KDTNLYVTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPE
GGSQPLSVRLAEEHGKAKAAHFMSQMGVPAPNAPPPPPPPPPHMAAGFNNMVHRDGAMEKLRSLFDAICD
AIFGLDSDNFADLLDGLYRRKYHYPYL
Appendix 2:
Bopp, D., Calhoun, G., Horabin, J.I., Samuels, M., and Schedl, P. 1996. Sex-specific control of Sex-lethal is a
conserved mechanism for sex determination in the genus Drosophila. Development, 122: 971-982.