Project Bio Information Technology

  • 8/3/2019 Project Bio Information Technology

    1/19

    Project Bioinformation Technology

    Martijn Heddes (870313134060)

    Nguyen Thi Kha Tu (860319828030)

    Introduction

    In this experiment a set of bioinformatics applications was used to find out as much as possible about
    a raw set of sequences. The five main goals were gene annotation, protein identification, determination
    of protein topology and localization, determination of a 3D structure model, and retrieval of additional
    information from the literature. The set obtained was coded and comprised 17 different sequences.

    METHOD SECTION

    The 17 sequences were assumed to be different reads obtained from a shotgun sequencing reaction.
    Therefore, the sequences were screened for vector contamination using VecScreen[1] before the contig
    was constructed using CAP3[2]. The contig obtained was used for a nucleotide BLAST search against the
    Nucleotide collection (nr/nt) database in order to find similar or related sequences with a known
    function. The highly similar sequences obtained were aligned to the contig sequence and checked for
    differences. Some corrections were made to the contig sequence, which was then compared to the protein
    sequence of the protein obtained from the BLAST search. Furthermore, additional literature about the
    specific protein was obtained by searching the MEDLINE database using PubMed and Google Scholar. This
    was done to get additional information about the function of the protein, its working mechanism, its
    conserved domains and/or motifs, the evolution of the protein superfamily, and confirmation of
    previous findings.

    Signal Topology and subcellular localization

    To predict the topology and subcellular localization of the predicted protein, PSORT (version 2.0.4) [8]
    and SOSUI [9] were used to search for specific motifs that could reveal information about the
    subcellular localization of the protein. In addition, SignalP was used to predict a terminal
    cleavage site.

    3D protein modeling

    To get a proper view of the three-dimensional structure of the predicted protein, several modeling
    applications were used. The protein sequence obtained was searched with BLAST against the PDB database,
    and the predicted 3D structure, together with the PDB codes of the protein and of other homologous
    proteins, was obtained from ModBase[11]. These PDB codes were used to search the VAST[10] database.
    Furthermore, the predicted models were analyzed using DeepView[13] and Cn3D. Finally, the reliability
    of the model was checked using ProSA [12] and by making a Ramachandran plot in DeepView.

    RESULTS AND DISCUSSION

    Gene annotation

    No vector contamination was detected by VecScreen in any of the 17 sequences provided. No unexpected
    sequences were detected in the file either, so all sequences were used for the contig assembly. CAP3
    produced one contig, using 15 of the 17 sequences. Sequences 10 and 12 were not incorporated and were
    regarded as contamination.

    >Contig1

    GCGGCCGAATGTGGGGGATGTCCCATTCATTACCATCCGGAATGGACACAGATTTTTCATCATATC

    CATCAACGATAGGTCGGCAACACTCACAACAATCTCAAAGATATTACCAAAATAATAATTGTGGTT

    TAGGTTCAGTGGGAAATATGGCAAACAGTACAAATTCCTTAAATTCAGGTACCAACAATAGTGGAA

    CAAATTTGATTGTAAATTATTTGCCCCAAGATATGCAAGACCGTGAACTTTATTCATTATTTAGGAC

    CATTGGTCCAATCAATACCTGCAGAATAATGCGAGATTATAAGACTGGTTACAGTTATGGATATGG

    TTTTGTGGACTTTGGATCGGAAGCAGATGCATTGAGAGCCATTAATAATCTTAATGGAATCACAGT

    CAGAAATAAGAGGATAAAGGTTTCATTCGCTAGGCCGGGTGGAGAACAACTGAGAGATAAATTCA

    AACTTGTATGTAACAAATTTATCTAGATCAATTACTGACGAACAATTAGAAACAATTTTTGGAAAA

    TATGGGCAAATTGTACAGAAAAACATTTTACGCGACAAACATACAGGAACACCGCGTGGAGTGGC

    CTTTATCAGATTTAATAAGAGAGAAGAAGCTCAAGAAGCAATATCAGCTTTGAACAACGTAATACC

    TGAAGGGGGAACACAACCGCTAACTGTTCGTGTTGCCGAGGAACACGGAAAGTCAAAGGGCCATG

    TTTATATGGCTCCAAATCAACCACCTCACGGAAACATGGGTCATGGAAATATGGGGAATATGGGAC

    ATGGAAACATGGGAATGGCCGGTGGTTCCGGAATGAATTTAAATAATATGAATGCATTCAATGGG

    ATGAATCAAATGGTGCACAGAGGTAGACAAAAACATAGTTACCAACGTAAAATTCATCCATATAA

    TCCAAATTTTCTTTAAACATTTAATTAAACAATATAAACAGAAACTTAGTTTTGCTTTGCTGGACAA

    ATTTTAGAAAACCAATAATTAATAACACAAGTCCAAAACTAATTTTTTTTTATCGTTTTTCATATAA

    AAATTCGTTGTTTATCCGATATTTTAAACTTTAAAAACGAGACACTGTAATTTAATAAGTAGTTATA

    CTTAATCAATTTACTAAATTTTCAGTAGCAGTAGATGTATGTGTTAAATATCCACACTTAAAAAAGA

    TGAAAACTTTTAAAACAAACAAAACACATGTAATAGTTATTCTGTATTAATATTTAAGGTATTCGAT

    ATTATTCATCTTATATTTGAAAATGTCGCGAATTAATTTTAAAACAATATTAAAATTTCAATATTTA

    ATGTACGACCTATTTGGGTTTGCATCAATAAACTAGATTTATGTTGTTCTCCGAAAAAAGAAAAAA

    AAAAAAAAAAAAAAAAAA

    The most likely protein was then obtained with ORF Finder at NCBI. The most likely open reading frame
    is in frame +3 and encodes a protein of 307 aa starting with M (methionine). The full predicted
    protein sequence is as follows:

    >translation

    MWGMSHSLPSGMDTDFSSYPSTIGRQHSQQSQRYYQNNNCGLGSVGNMANSTNSLNSGTNN

    SGTNLIVNYLPQDMQDRELYSLFRTIGPINTCRIMRDYKTGYSYGYGFVDFGSEADALRAINN

    LNGITVRNKRIKVSFARPGGEQLRDTNLYVTNLSRSITDEQLETIFGKYGQIVQKNILRDKHTG

    TPRGVAFIRFNKREEAQEAISALNNVIPEGGTQPLTVRVAEEHGKSKGHVYMAPNQPPHGNM

    GHGNMGNMGHGNMGMAGGSGMNLNNMNAFNGMNQMVHRGRQKHSYQRKIHPYNPNFL
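The frame +3 translation reported above can be reproduced in outline with a few lines of Python. The sketch below is not ORF Finder itself: it assumes the standard genetic code, looks only at the forward strand, and requires an ORF to run from a methionine to a stop codon. The function names `translate` and `longest_orf` are chosen here for illustration only.

```python
# Standard genetic code (NCBI translation table 1), with codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=1):
    """Translate a DNA string in forward frame 1, 2 or 3 ('*' marks a stop codon)."""
    dna = dna.upper()
    start = frame - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    # Codons containing characters other than T, C, A, G are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(dna):
    """Return (frame, protein) for the longest Met-to-stop ORF on the forward strand."""
    best = (0, "")
    for frame in (1, 2, 3):
        # Every segment except the last one is terminated by a stop codon.
        for segment in translate(dna, frame).split("*")[:-1]:
            first_met = segment.find("M")
            if first_met == -1:
                continue
            orf = segment[first_met:]
            if len(orf) > len(best[1]):
                best = (frame, orf)
    return best


# The first 38 bases of Contig1, read in frame +3.
contig_start = "GCGGCCGAATGTGGGGGATGTCCCATTCATTACCATCC"
print(translate(contig_start, 3))  # → GRMWGMSHSLPS
```

The peptide printed for this prefix contains `MWGMSHSLPS`, the start of the translation shown above. Passing the complete contig to `longest_orf` would be the way to check the reported frame and length.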

    The contig sequence was searched against the nr database using megablast, which gave the following output.

    As shown in the BLAST output, our contig had 100% query coverage with the Megaselia scalaris mRNA
    for Sex-lethal orthologous protein (Megsxl), splice variant, clone pMSWc114, with a bit score of 2571.
    In addition, the query showed high similarity to the Megaselia scalaris mRNA for sex-lethal homologue
    (sxl gene), with a coverage of 96% and a bit score of 2479. After that, a BLASTP search against the nr
    database was performed for both translated sequences; both gave 100% identity with a Sex-lethal
    homologous protein of Megaselia scalaris. Then the taxonomy report of the BLASTP output was analyzed,
    and all the homologous and paralogous sequences were obtained from the NCBI protein database. A
    multiple sequence alignment of these sequences was made using ClustalX, and a phylogenetic tree was
    built from that alignment using the same program.

    Fig. the BlastP output

    Fig. the BlastP output of purified predicted protein against nr database.

    Fig. the alignment output BlastP against nr database.

    Based on the BLAST output, we can confirm that our protein is a Sex-lethal orthologous protein.

    Fig. Taxonomy output of BlastP

    Protein identification

    According to the superfamily link in the BLASTP output, our protein contains an RNA/DNA-binding site
    within a conserved region of the RRM superfamily. RRM (RNA recognition motif), also known as RBD (RNA
    binding domain) or RNP (ribonucleoprotein domain), is a highly abundant domain in eukaryotes, found in
    proteins involved in post-transcriptional gene expression processes including mRNA and rRNA
    processing, RNA export, and RNA stability. This domain is about 90 amino acids in length and consists
    of a four-stranded beta-sheet packed against two alpha-helices. RRM usually interacts with ssRNA, but
    is also known to interact with ssDNA as well as proteins. RRM binds a variable number of nucleotides,
    ranging from two to eight. The active site includes three aromatic side-chains located within the
    conserved RNP1 and RNP2 motifs of the domain. The RRM domain is found in a variety of heterogeneous
    nuclear ribonucleoproteins (hnRNPs), proteins implicated in regulation of alternative splicing, and
    protein components of small nuclear ribonucleoproteins (snRNPs).

    From the output of the NCBI structure search, we predict with high confidence that our protein forms a
    homodimer with two domains.

    To assess the similarity of our protein to its orthologs, a multiple sequence alignment was made in
    ClustalX of the proteins from nineteen different eukaryotes that have more than 95% identity with our
    protein (Appendix 1 and 2). As expected, the phylogenetic tree shows clear relationships with our
    protein, and the protein of Megaselia scalaris is closest to our protein, in agreement with the BLAST
    output. Hence, the phylogenetic tree is considered highly reliable.

    Fig 3. Phylogenetic relationship of our protein with homologous proteins from other organisms

    The active sites (large conserved region) consist of two RNA-binding domains (Fig. 2, RBD1, RBD2)
    (Bopp et al. 1996).

    Fig: the location of RBD1, RBD2

    The two main active sites of our protein are fully conserved (100% identity) compared with the active
    sites RBD1 and RBD2 of the RNA-binding domains of the Sex-lethal protein (Bopp et al. 1996). We
    therefore conclude that our protein is a Sex-lethal protein containing two RNA-binding domains and is
    likely to bind RNA.
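An identity figure such as the one quoted above is obtained by counting matching positions in an alignment. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length and that gapped columns are left out of the denominator (other definitions of percent identity exist). The function name `percent_identity` is hypothetical, and the example compares a short stretch of the Megaselia scalaris and Ceratitis capitata sequences from Appendix 1, not the RBD comparison itself.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Corresponding 26-residue stretches from two Appendix 1 sequences.
megaselia = "NLIVNYLPQDMQDRELYSLFRTIGPI"
ceratitis = "NLIVNYLPQDMTDRELYALFRTIGPI"
print(round(percent_identity(megaselia, ceratitis), 1))  # → 92.3
```

Two of the 26 positions differ in this stretch, which gives 24/26 identical residues.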

    Protein Signal Topology & subcellular localization

    According to SignalP, no signal peptide was detected in our protein sequence, indicating that it is
    not a secretory protein. This result is in line with the SOSUI result, which predicts that our protein
    is a soluble protein.

    In addition, PSORT predicted with high certainty that the protein is localized in the cytoplasm, while
    giving no clear prediction for localization in the outer or inner membrane. Note that these membrane
    categories describe prokaryotic cells, whereas Megaselia scalaris is a eukaryote, so this part of the
    prediction should be interpreted with caution.
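A soluble-versus-membrane call of the kind reported for SOSUI is related to the hydropathy of the sequence. As a rough illustration, and not a reimplementation of SOSUI, the sketch below computes the average Kyte-Doolittle hydropathy (often called GRAVY) of a sequence; negative values indicate an overall hydrophilic protein. The function name `gravy` is an assumption made for this example.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Average Kyte-Doolittle hydropathy of a protein sequence."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# First residues of the predicted protein, used only to show the call.
print(gravy("MWGMSHSLPSGMDTDF"))
```

A full-length value would be obtained by passing the complete 307-residue translation to `gravy`.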

    3D protein modeling

    We obtained a predicted model with two domains, a fair sequence identity (49%), a very significant
    E-value (6e-28) and a model score of 1.00, which is very good. We therefore predict with high
    confidence that the 3D structure of our protein is very similar to this model.

    According to the Ramachandran plot produced by the PDB Viewer program, almost all residues of our
    protein lie in the allowed regions. Four residues lie outside the allowed regions, which is acceptable
    because proteins are flexible rather than perfectly regular.
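A Ramachandran plot places each residue at its backbone torsion angles phi (C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue). The sketch below shows how one such torsion angle can be computed from four atom positions. It is a generic geometric calculation, not DeepView's code, and the coordinates in the example are made up to give easily checked angles.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / length for c in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a cis arrangement (all four points on the same side).
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))  # → 0.0
```

For a real model, the four points would be the atom coordinates read from the PDB file, and each residue would contribute one (phi, psi) pair to the plot.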

    Overall structure quality was indicated by the Z-score (-6.7), which is quite low, suggesting a stable
    structure with a low probability of errors. The residue scores of our protein and of the template were
    also negative, indicating normal energy levels.
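A Z-score of this kind expresses how far the energy of a model lies from the mean of a reference distribution of energies, in units of that distribution's standard deviation. The generic calculation can be sketched as follows. The reference energies are invented for illustration and are not ProSA data, and ProSA's own energy function and reference set are not reproduced here.

```python
import statistics


def z_score(value, reference):
    """Number of standard deviations by which value differs from the reference mean."""
    mean = statistics.mean(reference)
    spread = statistics.pstdev(reference)  # population standard deviation
    if spread == 0:
        raise ValueError("reference values have zero spread")
    return (value - mean) / spread


# Hypothetical energies: a model that is well below the reference mean.
reference_energies = [-1.0, 1.0]
print(z_score(-3.0, reference_energies))  # → -3.0
```

A strongly negative result means the model's energy is much lower than that of the reference conformations.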

    Appendix 1:

    >Megaselia_scalaris_gi|6456838|emb|CAA04179.2| Sex-lethal orthologous protein [Megaselia scalaris]

    MWGMSHSLPSGMDTDFSSYPSTIGRQHSQQSQRYYQNNNCGLGSVGNMANSTNSLNSGTNNSGTNLIVNY

    LPQDMQDRELYSLFRTIGPINTCRIMRDYKTGYSYGYGFVDFGSEADALRAINNLNGITVRNKRIKVSFA

    RPGGEQLRDTNLYVTNLSRSITDEQLETIFGKYGQIVQKNILRDKHTGTPRGVAFIRFNKREEAQEAISA

    LNNVIPEGGTQPLTVRVAEEHGKSKGHVYMAPNQPPHGNMGHGNMGNMGHGNMGMAGGSGMNLNNMNAFN

    GMNQMVHRGRQKHSYQRKIHPYNPNFL

    >Ceratitis_capitata_gi|2981305|gb|AAC38968.1| sex-lethal homolog CcSXL [Ceratitis capitata]

    MYGNMNNGGHAPYGYNGYRPSGGRMWGMSHSLPSGMDTDFTSSYPGPSMNRRGGYNDFSGGGGGGGGTMG

    SMNNMVNAASTNSLNCGGGGGRDGHGGGSNGTNLIVNYLPQDMTDRELYALFRTIGPINTCRIMRDYKTG

    YSFGYAFVDFAAETDSQRAIKSLNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDQLDTIFGK

    YGMIVQKNILRDKLTGKPRGVAFVRFNKREEAQEAISALNNVIPEGASQPLTVRLAEEHGKAKAQHYMSQ

    LGLIGGGGGGGGGGGGGGGGMGGPPPPPMNMGYNNMVHRGRQNKSRFQKMHPYHNAQKFI

    >Lucilia_cuprina_gi|13357168|gb|AAK20025.1|AF234183_1 sex-lethal protein SXL1 [Lucilia cuprina]

    MYGQNIRNVTYAPYGYNGYRQSGERMWRMSHSLPSGMDTDFTSSYPGPSAMNHRGGRGGGYNDFSGGGSA

    MGSMCNMAPAISTNSVNSGGGDCGDTQGCNGTNLIVNYLPQDMTDRELYALFRTCGPINTCRIMKDYKTG

    YSFGYAFVDFASEIDAQNAIKSLNGVTVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDELEKIFGK

    YGNIVQKNILRDKLTGKPRGVAFVRFNKREEAQEAISALNNVIPEGASQPLTVRLAEEHGKMKANHFMNQ

    LGMGPPAAPIPAAGPGYNNMVHRGRHNKNRNQKSHPYHNPQKFI

    >Musca_domestica_gi|6226777|sp|O17310.1|SXL_MUSDO RecName: Full=Sex-lethal homolog

    MYGQNVRNVSYYPPYGYNGYKQSGERMWRMSHSLPSGMDTDFTSSYPGPSAMNPRGRGGYNDFSGGGSAM

    GSMCNMAPAQSTNSLNSGGDGGGGDTQAVNGTNLIVNYLPQDMTDRELYALFRTCGPINTCRIMKDYKTG

    YSFGYAFVDFASEIDAQNAIKTVNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDELEKIFGK

    YGNIVQKNILRDKLTGRPRGVAFVRFNKREEAQEAISALNNVIPEGASQPLTVRLAEEHGKMKAHHFMNQ

    LGMGPPAAPIPAAGPGYNNMVHRGRQNKMRNHKVHPYHNPQKFI

    >Chrysomya_rufifacies_gi|6226775|sp|O97018.2|SXL_CHRRU RecName: Full=Sex-lethal homolog

    MWRMSHSLPSGMSRYAFSPQDTDFTSSYPGPSAMNHRGGRGGGYNDFSGGGSAMGSMCNMAPAISTNSVN

    SGGGDCGDNQGCNGTNLIVNYLPQDMTDRELYALFRTCGPINTCRIMKDYKTGYSFGYAFVDFASEIDAQ

    NAIKSLNGVTVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDELEKIFGKYGNIVQKNILRDKLTGK

    PRGVAFVRFNKREEAQEAISALNNVIPEGASQPLTVRLAEEHGKMKAHHFMNQLGMGPPAAPIPAAGPGY

    NSMVHRGRHNKNRNQKPHPYHNPQKFI

    >Bactrocera_oleae_gi|52075416|emb|CAG29242.1| sex-lethal protein [Bactrocera oleae]

    MYGNMNNGGHVPYGFNGYRPSGGRRWGMSHSLPSGMDTDFTSSYPGPSMNRRGYNDFSGGGGGGSGGGGG

    AMGSMNNAPAISTNSLNCGSGGGGGDGHGGGSNGTNLIVNYLPQDMTDRELYALFRTIGPINTCRIMRDY

    KTGYSFGYAFVDFASETDSQRAIKSLNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDQLDTI

    FGKYGMIVQKNILRDKLTGKPRGVAFVRFNKREEAQEAISALNNVIPEGGSQPLTVRLAEEHGKAKAQQY

    MSQLGLIGGGGGGGGGGGGMGGPPPPPMNMGYNNMVHRGRQNKSRFQKMHPYHNAQKFI

    >Drosophila_grimshawi_gi|195048229|ref|XP_001992493.1| GH24172 [Drosophila grimshawi]

    MYGNNNPGSNNNNGGYPPYGYNNKSSGGRVFGMSHSLPSGMSRYAFSPQDTEFTFPSSSSRRGYNEFAGV

    NGGSANSLGGLCNMPMASNNSLNNLCGLSIGSGGSDDLMNDPRNSNTNLIVNYLPQDMTDRELYALFRSI

    GTINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNL

    PRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRL

    AEEHGKAKAAHFMTQMGMGPPQVPPPPPPPPPHMASFNMVHRDGAMEKLRSLFDAICDAIFGLDSDNFAD

    LLDGLYRRKYHYPYL

    >Drosophila_willistoni_gi|195439114|ref|XP_002067476.1| GK16445 [Drosophila willistoni]

    MYGNNNPGSNNNNGGYPPYGYKQSSGGRGFGMSHSLPSGMSRYAFSPQDTEFTFPSSSSRRGYNDFPGGG

    GNGGSANSLGGGNMCNMPPMASNNSLNNLCGLSLGSGGSDDLMNDHRPSNTNLIVNYLPQDMTDRELYAL

    FRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLY

    VTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPL

    SVRLAEEHGKAKAAHFMSQMGMGPSPPNVPPPPPPPPPHMAAGFNNMVHRDGAMEKLRSLFDAICDAIFG

    LDSDNFADLLDGLYRRKYHYPYL

    >Drosophila_virilis_gi|195396313|ref|XP_002056776.1| Sex lethal [Drosophila virilis]

    MYGNNNPGSNNNNGGYPPYGYNKSSGGRVFGMSHSLPSGMSRYAFSPQDTEFTFPSSSSRRGYNDFPGGN

    GGSANSLGGNICNMPPMASNNSLNNLCGLSIGSGGSDDHMNDQRNSNTNLIVNYLPQDMTDRELYALFRA

    IGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTN

    LPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVR

    LAEEHGKAKAAHFMSQMGMGPPQAPPPPPPPPPHMAAGFNNMVHRDGAMEKLRSLFDAICDAIFGLDSDN

    FADLLDGLYRRKYHYPYL

    >Drosophila_mojavensis_gi|195132500|ref|XP_002010681.1| GI21573 [Drosophila mojavensis]

    MYGNNNPGSNNNNGGYPPYGYNKSSGGRVFGMSHSLPSGMSRYAFSPQDTEFTFPSSSSRRGYNDFPGGN

    GGSANSLGGNICNMPMASNNSLNNLCGLSIGSGGSDDLMNDQRTSNTNLIVNYLPQDMTDRELYALFRAI

    GPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNL

    PRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRL

    AEEHGKAKAAHFMSQMGMGPPQAPPPPPPPPPHMAAGFNNMVHRDGAMEKLRSLFDAICDAIFGLDSDNF

    ADLLDGLYRRKYHYPYL

    >Drosophila_melanogaster_gi|78706524|ref|NP_001027063.1| Sex lethal, isoform N [Drosophila melanogaster]

    MDFNFDTVTPCSTMSSYYNFKMASGGRGFGMSHSLPSGMDTEFSFPSSSSRRGYNDFPGCGGSGGNGGSA

    NNLGGGNMCHLPPMASNNSLNNLCGLSLGSGGSDDLMNDPRASNTNLIVNYLPQDMTDRELYALFRAIGP

    INTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPR

    TITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRLAE

    EHGKAKAAHFMSQMGVVPANVPPPPPQPPAHMAAAFNMMHRGRSIKSQQRFQNSHPYFDAKKFI

    >Drosophila_subobscura_gi|1403308|emb|CAA67016.1| sex-lethal [Drosophila subobscura]

    MYGNNNPGSNNNNGGYPPYGYNKSSGGRGFGMSHSLPSGMDTEFSFPSSSSRRGYNEFPGGGGIGIGANG

    GSANNLGGNMCNLLPMTSNNSLSNLCGLSLGSGGSDDHMMMHDQRSSNTNLIVNYLPQDMTDRELYALFR

    AIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVT

    NLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSV

    RLAQEHGKAKAAHFMSQIGVPSANAPPPPPPPPHMAFNNMVHRGRSIKSQQRFQKTHPYFDAQKFI

    >Drosophila_sechellia_gi|195353431|ref|XP_002043208.1| Sxl [Drosophila sechellia]

    MYGNNNPGSNNNNGGYPPYGYNNKSSGGRGFGMSHSLPSGMDTEFSFPSSSSRRGYNDFPGCGGSGGNGG

    SANNLGGGNMCHLPPMASNNSLNNLCGLSLGSGGSDDLMNDPRASNTNLIVNYLPQDMTDRELYALFRAI

    GPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNL

    PRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRL

    AEEHGQGEGCPLYVADGRGSS

    >Drosophila_simulans_gi|195565544|ref|XP_002106359.1| Sex lethal [Drosophila simulans]

    MYGNNNPGSNNNNGGYPPYGYNNKSSGGRGFGMSHSLPSGMDTEFSFPSSSSRRGYNDFPGCGGSGGNGG

    SANNLGGGNMCHLPPMASNNSLNNLCGLSLGSGGSDDLMNDPRASNTNLIVNYLPQDMTDRELYALFRAI

    GPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNL

    PRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRFSESNILFYGAMEKLRSLLDGIWDAIFGLD

    SENFADLLDGLYRRKYHYPYL

    >Drosophila_erecta_gi|194896665|ref|XP_001978518.1| GG17637 [Drosophila erecta]

    MSHSLPSGMDTEFSFPSSSSRRGYNDFPGCGGSGGNGGSANNLGGGNMCHLPPMASNNSLNNLCGLSLGS

    GGSDDLMNDPRASNTNLIVNYLPQDMTDRELYALFRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQ

    RAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGR

    PRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRLAEEHGKAKAAHFMSQMGVAAPNVPPPPPPPPP

    HMAAGFNNMVHRDGAMEKLRSLFDAICDAIFGLDSDNFADLLDGLYRRKYHYPYL

    >Drosophila_yakuba_gi|195480535|ref|XP_002101294.1| GE17544 [Drosophila yakuba]

    MSHSLPSGMDTEFSFPSSSSRRGYNDFPGCGGSGGNGGSANNLGGGNMCHLPPMASNNSLNNLCGLSLGS

    GGSDDLMNDPRASNTNLIVNYLPQDMTDRELYALFRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQ

    RAIKVLNGITVRNKRLKVSYARPGGESIKDTNLYVTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGR

    PRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVRLAEEHGKAKAAHFMSQMGVAAPNVPPPPPPPPP

    HMAAGFNNMVHRDGAMEKLRSLFDAICDAIFGLDSDNFADLLDGLYRRKYHYPYL

    >Drosophila_ananassae_gi|194763735|ref|XP_001963988.1| GF20968 [Drosophila ananassae]

    MYGNNNPGSNNNNGGYPPYGYNNKSSGGRGFGMSHSLPSGMSRYAFSPQDTEFSFPSSSSRRGYNDFPGC

    GIGGNGGSANSLGGGGGGNMCNLPPMTSNNSLNNLCGLSLGSGGSDDHMLNDQRPSNTNLIVNYLPQDMT

    DRELYALFRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGES

    IKDTNLYVTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIP

    EGGSQPLSVRLAEEHGKAKAAHFMSQMGVPAPNAPQPPPPPPLPHMAAGFNSMVHRDGAMEKLRSLFDAI

    CDAIFGLDSDNFADLLDGLYRRKYHYPYL

    >Drosophila_persimilis_gi|195168643|ref|XP_002025140.1| GL26885 [Drosophila persimilis]

    MYGNNNPGSNNNNGGYPPYGYNKSSGGRGFGMSHSLPSGMSRYAFSPQDTEFSFPSSSSRRGYNEFPGGG

    GIGIGANGGSANNLGGNMCNLLPMTSNNSLSNLCGLSLGSGGSDDHMMMHDQRSSNTNLIVNYLPQDMTD

    RELYALFRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESI

    KDTNLYVTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPE

    GGSQPLSVRLAEEHGKAKAAHFMSQMGVPAPNAPPPPPPPPPHMAAGFNNMVHRDGAMEKLRSLFDAICD

    AIFGLDSDNFADLLDGLYRRKYHYPYL

    >Drosophila_pseudoobscura_gi|198471287|ref|XP_002133706.1| GA22653 [Drosophila pseudoobscura pseudoobscura]

    MYGNNNPGSNNNNGGYPPYGYNKSSGGRGFGMSHSLPSGMSRYAFSPQDTEFSFPSSSSRRGYNEFPGGG

    GIGIGANGGSANNLGGNMCNLLPMTSNNSLSNLCGLSLGSGGSDDHMMMHDQRSSNTNLIVNYLPQDMTD

    RELYALFRAIGPINTCRIMRDYKTGYSFGYAFVDFTSEMDSQRAIKVLNGITVRNKRLKVSYARPGGESI

    KDTNLYVTNLPRTITDDQLDTIFGKYGSIVQKNILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPE

    GGSQPLSVRLAEEHGKAKAAHFMSQMGVPAPNAPPPPPPPPPHMAAGFNNMVHRDGAMEKLRSLFDAICD

    AIFGLDSDNFADLLDGLYRRKYHYPYL

    Appendix 2:

    Bopp, D., Calhoun, G., Horabin, J.I., Samuels, M., and Schedl, P. 1996. Sex-specific control of Sex-lethal is a
    conserved mechanism for sex determination in the genus Drosophila. Development, 122: 971-982.