Accessing Molecular Data & Web Tools

  • 8/9/2019 Accessing Molecular Data & Web Tools

    Somchai Saengamnatdej

    April 25, 2010

    1. Retrieving a genome/chromosome

    EBI genomes (http://www.ebi.ac.uk/genomes/)

    For downloading a chromosome.

    Click on eukaryota to see a list.

    Scroll down to find the chromosome you want.

    Click on the accession number, then save file.

    Genomes OnLine Database, GOLD (http://genomesonline.org)

    To retrieve a genome.

    Click on 'Enter GOLD'.

    Click on 'Search GOLD'.

    Type the name of your organism in the 'Organism Name' box.

    When a list appears, click on the link in the 'Data-Search' column; you will be led to the entry in NCBI/GenBank.

    Artemis (If you know the accession number)

    Click 'File', then 'Open from EBI-Dbfetch'.

    Type in the accession number.

    Click OK.
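The Artemis 'Open from EBI-Dbfetch' step can also be approximated from a script. The following is a minimal sketch, assuming the public Dbfetch endpoint at https://www.ebi.ac.uk/Tools/dbfetch/dbfetch and its 'ena_sequence' database name; the function names and the placeholder accession are illustrative only and do not come from the original handout.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Dbfetch endpoint; check the current EBI documentation before relying on it.
DBFETCH_URL = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(accession, db="ena_sequence", fmt="fasta"):
    """Build a Dbfetch request URL for one accession number."""
    query = urlencode({"db": db, "id": accession, "format": fmt, "style": "raw"})
    return f"{DBFETCH_URL}?{query}"


def fetch_entry(accession, db="ena_sequence", fmt="fasta"):
    """Download the entry as text. Requires network access."""
    with urlopen(dbfetch_url(accession, db, fmt)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # 'YOUR_ACCESSION' is a placeholder; replace it with a real accession number.
    print(dbfetch_url("YOUR_ACCESSION"))
```

Calling fetch_entry() with a real accession should return the same record that Artemis loads, here as plain FASTA text rather than an annotated entry.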

    2. Retrieving a protein sequence.

    SRS (http://srs.ebi.ac.uk)

    In the Quick Text Search window, select 'Protein' in the find box, and type the name of your protein in the matching box. Then, click 'Search'.

    When the list shows up, go to the entry with the 'accession number' you want.

    Tick in the box at the start of the entry.

    In the 'Display Options' window, select 'UniprotView' in the 'view results using:' box.

    Then, click on the 'Apply Display Options' button.

    When the list window appears, double-click on the UniProtKB entry to open it.

    When the full entry shows up, scroll through it (general information, description & origin of the protein, published/unpublished references, comments on the function of the gene, database cross references, keywords, sequence features, & sequence).

    Click on the hyperlinked text to go to the database entries.

    Go back to the query list page.

    Now, tick the box at the start of the entry again.

    In the 'Result Options' window, select 'FastA' in the 'Launch analysis tool' box. Click 'Save'.

    When the new window shows up, select 'FastaSeqs' in the 'save with' box.

    In the window 'Output To', select 'Browser Window (HTML)'

    Click 'Save'.
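The saved FASTA text is what the annotation tools in the next section expect as input. As an illustration only, here is a minimal sketch of reading FASTA-formatted text into (header, sequence) pairs; the function name parse_fasta is hypothetical, and the sketch ignores any text that appears before the first '>' line.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without line breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # A made-up two-line record, used only to show the call.
    example = ">example_protein made-up entry\nMKTAYIAK\nQRQISFVK\n"
    for name, seq in parse_fasta(example):
        print(name, len(seq))
```

The joined sequence string (without the header line) is what gets pasted into the search boxes of the tools below.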

    3. Annotation of a gene.

    Protein domain prediction

    PROSITE (http://www.expasy.ch/prosite/)

    Paste the protein sequence retrieved from a database in the box provided.

    Click on 'Scan'.

    In the results viewer, there is a list of PROSITE hits; click on the individual hits to go to the specific entries and read their descriptions.

    There is a high level of false positives because PROSITE motif patterns are generally small and rarely cover complete domains.

    The more reliable methods (Pfam, SMART) use HMMs, searching against a library of HMMs describing hundreds of conserved domains.
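To see why short PROSITE patterns match so often by chance, it can help to translate one into a regular expression and scan a sequence by hand. The sketch below is not the PROSITE scanner itself; it assumes only the basic pattern syntax (residues separated by '-', 'x' for any residue, '[..]' for allowed residues, '{..}' for excluded residues, '(n)' or '(n,m)' for repeats, '<' and '>' as sequence-end anchors). The function names and the test sequence are made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern, e.g. 'N-{P}-[ST]-{P}', into a regex string.

    Anchors written inside brackets (such as '[G>]') are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):          # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        if "(" in element:                   # repetition: x(2) or x(2,4)
            element, count = element.rstrip(")").split("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."                       # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except those listed
        else:
            core = element                   # a single residue or an [ABC] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based start, matched text) for every hit, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation site pattern; the sequence is invented for the example.
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(scan("MNGTANPSANVSK", "N-{P}-[ST]-{P}"))
```

A four-residue pattern like this one fixes only one position completely, so hits are expected in many unrelated proteins; this is the behaviour the false-positive remark above describes.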

    Pfam (http://pfam.sanger.ac.uk/)

    Select 'SEQUENCE SEARCH'.

    Paste your protein sequence in the box.

    Click 'Go'

    A 'progress' window appears. Then, the search results window shows up.

    There is a list of 'significant' & 'insignificant' matches and an interactive graphical output.

    Click on the link in the 'Family' column to go to the entry.

    In the Pfam entry page, click on the tabs at the top (Domain organization & Species distribution).

    SMART (http://www.embl-heidelberg.de/)

    Paste the protein sequence into the box.

    Select all the search options available.

    Click on 'Sequence SMART' to run.

    Output:

    Schematic output.

    Description of the programs that are used to produce the schematic output.

    Interaction network.

    Other output including BLAST results.

    InterPro (http://www.ebi.ac.uk/interpro/)

    A database of protein families, domains and functional sites.

    Identifiable features found in known proteins can be applied to unknown protein sequences.

    The icons at the bottom of the page represent the databases involved.

    Enter the InterProScan Sequence Search page by clicking on 'InterProScan' (in the left column).

    The submission form appears. Paste the sequence of your protein in the box.

    Under 'Results', select 'interactive'.

    Check all in 'APPLICATIONS TO RUN'

    Then, click the 'Submit Job' button.

    A temporary window appears, then the results page shows up.

    There is a list of hits; click on the 'InterPro' link to go to the entry.

    BLOCKS (http://www.blocks.fhcrc.org/)

    TIGRfam (http://www.tigr.org/TIGRFAMs/)

    PRINTS (http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/)

    ProDom (http://prodom.prabi.fr/prodom/current/html/home.php)

    Transmembrane predictions

    TMHMM (http://www.cbs.dtu.dk/services/TMHMM/)

    Open the TMHMM v2.0 server page from the URL above.

    Paste the protein sequence in the box.

    Select output format as 'Extensive, with graphics'.

    Click on 'Submit'

    The results are given as tabular output and graphics.

    How many transmembrane domains are there in the protein? Try 'TMPRED' at the URL below to compare the result.

    TMPRED (http://www.ch.embnet.org/software/TMPRED_form.html)

    PHOBIUS (http://phobius.cgb.ki.se/)
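For a rough feel of what these servers look for, a simple hydropathy window can be computed by hand. Note that this is a much cruder technique than the ones the servers use (TMHMM and PHOBIUS are HMM-based); the sketch below is a plain Kyte-Doolittle sliding-window average, assuming a 19-residue window and a cutoff of 1.6, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window; raises KeyError on non-standard residues."""
    scores = [KD[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_segments(sequence, window=19, threshold=1.6):
    """Return 1-based (start, end) ranges covered by windows scoring above the threshold.

    Overlapping or adjacent windows are merged into one range.
    """
    segments = []
    for i, score in enumerate(hydropathy_profile(sequence, window)):
        if score > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)  # extend the previous range
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Artificial sequence: a leucine stretch flanked by charged residues.
    toy = "DEKR" * 5 + "L" * 22 + "DEKR" * 5
    print(candidate_segments(toy))
```

The ranges it reports are only candidate hydrophobic stretches; counting them gives a first guess to compare against the TMHMM and TMPRED outputs, not a replacement for them.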

    Signal peptide prediction

    SignalP (http://www.cbs.dtu.dk/services/SignalP/)

    Go to the SignalP 3.0 Server page.

    Paste the protein sequence into the box.

    Select your search options and output format.

    Click on 'Submit' button.

    The prediction results are graphical, tabular, and SignalP-HMM outputs.

    Try 'PSORT' at the following URL to compare the results.

    PSORT (http://psort.nibb.ac.jp/)

    RNA annotation

    tRNA Scan (http://selab.janelia.org/tRNAscan-SE/)

    Go to the tRNAscan-SE server.

    Select your sequence format, source organism, analysis type, & output format.

    Paste the genome DNA sequence in the box.

    Click on 'Run tRNAscan-SE'.

    Rfam (http://www.sanger.ac.uk/Software/Rfam/ or http://Rfam.sanger.ac.uk/)

    4. Access the sequence read archive.

    Sequence Read Archive (SRA) (http://www.ebi.ac.uk/ena)

    Type 'RNA-seq Plasmodium falciparum' into the search box, with 'All Databases' selected.

    When the new page appears, click on 'Nucleotide Sequences'.

    In the list on the new page, click on the link 'Experiments'.

    A list of all the RNA experiments will show up.

    Click on the red arrow at the end of the line of the entry to expand the window.

    Then, click on 'Runs' at the end of the window.

    The SRA Run Record shows up and allows you to download the RNA-seq data.
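Once a run or study accession is known, the download locations can probably also be looked up without clicking through the pages. The following is a minimal sketch, assuming the ENA Portal API 'filereport' endpoint with its 'read_run' result type and 'fastq_ftp' field, and assuming that paired files are separated by ';' in that field; the function names are hypothetical and the accession must be supplied by the user.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed ENA Portal API endpoint; verify against the current ENA documentation.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession, fields=("run_accession", "fastq_ftp")):
    """Build a file-report request URL for a run, experiment, or study accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Map each run accession to its list of FASTQ locations from a TSV report."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["run_accession"]: [p for p in row["fastq_ftp"].split(";") if p]
        for row in reader
    }


def fetch_filereport(accession):
    """Download and parse the report. Requires network access."""
    with urlopen(filereport_url(accession)) as response:
        return parse_filereport(response.read().decode("utf-8"))


if __name__ == "__main__":
    # 'YOUR_ACCESSION' is a placeholder for an accession taken from the Runs list.
    print(filereport_url("YOUR_ACCESSION"))
```

The returned locations can then be passed to any download tool; the sketch deliberately stops at listing them.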

    Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra)

    5. Other web-based resources.

    Entrez (http://www.ncbi.nlm.nih.gov/Entrez/)

    Blast searches (http://www.ncbi.nlm.nih.gov/BLAST)

    Fasta searches (http://www.ebi.ac.uk/fasta33/)

    ExPASy Molecular Biology Server (http://ca.expasy.org/)

    References

    See my previous documents.