
Created by the ViPR/IRD team and licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License

Section C. Comparative Genomics Analysis of West Nile Virus

Objective

Upon completion of this exercise, you will be able to use the Virus Pathogen Resource (ViPR; http://www.viprbrc.org/) to:

• Search for virus sequences and view genome annotations in ViPR

• Save selected sequences as a working set in your private Workbench space

• Build and visualize a phylogenetic tree on a set of sequences to infer their evolutionary relationships

• Predict genotype and detect recombination in virus genomes

• Annotate virus genome sequences

I. Search for sequences and save matching sequences into working sets

a. Go to the ViPR homepage (http://www.viprbrc.org/), click “Flaviviridae” to get to the family homepage.

b. Mouse-over “Search Data” in the grey navigation bar and click “Genomes”.

c. The Genome Search page allows you to search for sequences based on taxonomy, collection year, sample location, host selection, complete genome or not, etc. A dynamic number of matching search results is displayed at the top of the page to help you search more efficiently.

d. For this exercise, we are going to search for West Nile viruses isolated from 1999-2001 in the US and South Africa. Select the following criteria and click the “Search” button to run the query.

• Virus: West Nile virus (Flaviviridae -> Flavivirus -> West Nile virus)

• Complete Genome: Complete Genome Only

• Collection Year: 1999-2001

• Geographic Grouping: Africa, North America

• Country: South Africa, USA
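The criteria above amount to a conjunction of simple filters. As a minimal sketch (this is not ViPR's implementation or API; the `Record` fields and the `matches` helper are hypothetical names chosen for illustration), the same selection could be applied to locally held metadata like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical local metadata record; field names are illustrative only.
    strain: str
    species: str
    country: str
    year: int
    complete_genome: bool


def matches(rec, species="West Nile virus", countries=("South Africa", "USA"),
            start_year=1999, end_year=2001, complete_only=True):
    """Return True if the record satisfies every search criterion."""
    return (
        rec.species == species
        and rec.country in countries
        and start_year <= rec.year <= end_year
        and (rec.complete_genome or not complete_only)
    )


records = [
    Record("New York 99", "West Nile virus", "USA", 1999, True),
    Record("LSU-AR01", "West Nile virus", "USA", 2001, True),
    # The two records below are made up to show rejected cases.
    Record("example-2005", "West Nile virus", "USA", 2005, True),
    Record("example-partial", "West Nile virus", "USA", 2000, False),
]
selected = [r.strain for r in records if matches(r)]
print(selected)
```

Every criterion must hold at once, which is why adding a criterion on the search page can only keep or shrink the count of matching results.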


This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN272200900041C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute and Vecna Technologies. Virus images courtesy of CDC Public Health Image Library, Wellcome Images, U.S. Department of Veterans Affairs, Science of the Invisible and ViralZone, Swiss Institute of Bioinformatics.

Figure: ViPR Genome Search page for Flaviviridae with the criteria above selected: West Nile virus in the taxonomy browser (18367 strains, 711 complete genomes), Complete Genome Only, collection year 1999 to 2001, geographic groupings Africa and North America, and countries South Africa and USA. Results matching your criteria: 45. Tip: To select multiple items or to deselect, Ctrl-click (Windows) or Cmd-click (MacOS).

Genome Search: Search for virus genomic sequences and related information. You can search the whole virus family or a specified genus, species, etc. You can also find your strain or genome record if you have information such as its strain name or accession.

Genome searches for Dengue virus or Hepatitis C virus can be augmented with clinical metadata criteria. Selecting the appropriate nodes in the taxonomy browser (Flavivirus, Dengue virus, Hepacivirus, Hepatitis C virus) will add metadata search panels and enable you to include these criteria. Some sequences have more metadata fields defined than others. Queries based on metadata only retrieve sequences for which those fields are defined.


e. The Search Results page will be displayed. Here you can:

i. Save the search query to your Workbench and rerun the search later.

ii. Download the sequences (genome, CDS, protein) by clicking “Download”.

iii. Store selected sequences as a working set in the Workbench so that you can run various analyses on the working set.

iv. View the details for any item in the results table by clicking on “View” next to any row.

f. On the Genome Details page, you will find the strain information, genome information, genome image map, and mature peptide annotations generated by ViPR.

g. Click “View” for a protein to load the Mature Peptide Details page. Here you will find annotations of the mature peptide including: genomic locations, HMM/Pfam domains, related protein structures, predicted and experimentally determined immune epitopes, etc.

h. Return to the Genome Search Results page by clicking “Results” in the breadcrumb.


Figure: Genome Search Result page. Your search returned 45 genomes, displayed at 50 records per page and sorted by Species Name, Strain Name, and GenBank Accession in ascending order. All 45 items are selected. The rows visible in the screenshot are listed below.

Strain Name               Species Name     GenBank Accession  Sequence Length  Collection Date  Host      GenBank Host              Country  Mol Type
3356.2.1.1(JEV)           West Nile virus  EF530047           11029            2000             Crow      American crow             USA      genomic RNA
3356K VP2                 West Nile virus  EF657887           11029            2000             Crow      American crow             USA      genomic RNA
FL2001 crow 67030         West Nile virus  GQ379156           11029            07/2001          Crow      crow                      USA      genomic RNA
LSU-AR01                  West Nile virus  FJ527738           11029            2001             Blue Jay  blue jay                  USA      genomic RNA
New York 99               West Nile virus  HQ596519           11029            1999             Crow      crow                      USA      genomic RNA
NY 2001 Suffolk           West Nile virus  DQ164194           11029            2001             Crow      American crow             USA      genomic RNA
NY99-crow-V76/1           West Nile virus  FJ151394           11029            1999             Crow      crow                      USA      genomic RNA
WNV-1/US/BID-V4186/1999   West Nile virus  HM488125           10516            1999             Crow      Corvus brachyrhynchos     USA      genomic RNA
WNV-1/US/BID-V4187/1999   West Nile virus  HM488126           10598            1999             Crow      Corvus brachyrhynchos     USA      genomic RNA
WNV-1/US/BID-V4188/1999   West Nile virus  HM488127           10625            1999             Crow      Corvus brachyrhynchos     USA      genomic RNA
WNV-1/US/BID-V4189/1999   West Nile virus  HM488128           10616            1999             Crow      Corvus brachyrhynchos     USA      genomic RNA
WNV-1/US/BID-V4191/2000   West Nile virus  HM488129           10617            2000             Mosquito  Culex salinarius          USA      genomic RNA
WNV-1/US/BID-V4192/2000   West Nile virus  HM488130           10621            2000             Mosquito  Culex salinarius          USA      genomic RNA
WNV-1/US/BID-V4193/2000   West Nile virus  HM488131           10620            2000             Mosquito  Culex pipiens             USA      genomic RNA
WNV-1/US/BID-V4194/2000   West Nile virus  HM488132           10621            2000             Mosquito  Culiseta melanura         USA      genomic RNA
WNV-1/US/BID-V4195/2001   West Nile virus  HM488133           10621            2001             Mosquito  Culex pipiens             USA      genomic RNA
WNV-1/US/BID-V4196/2001   West Nile virus  HQ671696           10618            2001             Mosquito  Culex salinarius          USA      genomic RNA
WNV-1/US/BID-V4197/2001   West Nile virus  HQ671697           10621            2001             Mosquito  Aedes vexans              USA      genomic RNA
WNV-1/US/BID-V4198/2001   West Nile virus  HM488134           10513            2001             Mosquito  Ochlerotatus sollicitans  USA      genomic RNA
WNV-1/US/BID-V4199/2001   West Nile virus  HM488135           10621            2001             Mosquito  Ochlerotatus cantator     USA      genomic RNA
WNV-1/US/BID-V4200/2001   West Nile virus  HM488136           10621            2001             Mosquito  Culex restuans            USA      genomic RNA
WNV-1/US/BID-V4689/2001   West Nile virus  HM488246           10533            2001             Avian     Corvus brachyrhynchos     USA      genomic RNA
WNV-1/US/BID-V4691/2001   West Nile virus  HM488247           10612            2001             Avian     Corvus brachyrhynchos     USA      genomic RNA

Run Analysis menu options: Identify Similar Sequences (BLAST); Analyze Sequence Variation (SNP); Align Sequences (MSA); Metadata-driven Comparative Analysis Tool; Generate Phylogenetic Tree; Genotype Recombination; Sequence Format Conversion; PCR Primer Design.


Figure: Strain Details for West Nile virus Strain 3356.2.1.1(JEV)

Strain Information
Strain Name: 3356.2.1.1
Organism: West Nile virus
Taxonomy: Flaviviridae -> Flavivirus -> West Nile virus -> Type JEV
GenBank Host: American crow
Host: Crow
Isolation Country: USA
Collection Date: 2000

Genome: EF530047
GenBank Definition: West Nile virus strain 3356.2.1.1, complete genome.
Authors: Jia,Y., Dupuis,A.P. II, Jerzak,G.V.S., Maffei,J.G. and Kramer,L.D.; Jia,Y., Moudy,R.M., Dupuis,A.P. II, Ngo,K.A., Maffei,J.G., Jerzak,G.V., Franke,M.A., Kauffman,E.B. and Kramer,L.D.
GenBank Sequence Accession: EF530047
Sequence Length: 11029
Sequence Status: Complete
Number of Proteins: 14
Organism Name: West Nile virus
Isolation Source: kidney
GenBank Note: small plaque phenotype; plaque purified virus variant with small plaque morphology; derived from WN NY 2000-crow3356 VP1
Mol Type: genomic RNA


Protein Information

Gene Symbol  Protein Product Name          ViPR Locus ID  CDS Start  CDS End  NCBI Gene ID  Locus Name
GenBank
-N/A-        polyprotein                   WNV-1          97         10398    -N/A-         -N/A-
ViPR-generated
ancC         anchored core protein C       ancC           97         465      -N/A-         -N/A-
C            core protein C                C              97         411      -N/A-         -N/A-
preM         PreM protein                  preM           466        966      -N/A-         -N/A-
M            matrix protein M              M              742        966      -N/A-         -N/A-
E            envelope protein              E              967        2469     -N/A-         -N/A-
NS1          non-structural protein NS1    NS1            2470       3525     -N/A-         -N/A-
NS2a         non-structural protein NS2a   NS2a           3526       4218     -N/A-         -N/A-
NS2b         non-structural protein NS2b   NS2b           4219       4611     -N/A-         -N/A-
NS3          non-structural protein NS3    NS3            4612       6468     -N/A-         -N/A-
NS4a         non-structural protein NS4a   NS4a           6469       6846     -N/A-         -N/A-
2k           2K protein                    2k             6847       6915     -N/A-         -N/A-
NS4b         non-structural protein NS4b   NS4b           6916       7680     -N/A-         -N/A-
NS5          RNA-dependent RNA polymerase  NS5            7681       10395    -N/A-         -N/A-
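The CDS coordinates in the protein table above are enough to translate a residue number in the polyprotein into a residue number within a mature peptide. The following is a minimal sketch of that arithmetic, assuming 1-based inclusive coordinates and an unbroken reading frame starting at nucleotide 97; the `locate` helper and the dictionary layout are hypothetical names for illustration, not part of ViPR.

```python
# CDS coordinates (1-based, inclusive) copied from the protein table for EF530047.
POLYPROTEIN_START = 97
MATURE_PEPTIDES = {
    "ancC": (97, 465), "C": (97, 411), "preM": (466, 966), "M": (742, 966),
    "E": (967, 2469), "NS1": (2470, 3525), "NS2a": (3526, 4218),
    "NS2b": (4219, 4611), "NS3": (4612, 6468), "NS4a": (6469, 6846),
    "2k": (6847, 6915), "NS4b": (6916, 7680), "NS5": (7681, 10395),
}


def locate(polyprotein_pos):
    """Map a 1-based polyprotein residue to (peptide, position in peptide) pairs."""
    # First nucleotide of the codon that encodes this residue.
    codon_start = POLYPROTEIN_START + 3 * (polyprotein_pos - 1)
    hits = []
    for name, (start, end) in MATURE_PEPTIDES.items():
        if start <= codon_start <= end:
            hits.append((name, (codon_start - start) // 3 + 1))
    return hits


print(locate(522))
```

For polyprotein position 522 this yields position 232 in the E protein, the same pair of positions quoted in the tree-coloring step later in this exercise. Because some annotated peptides overlap (ancC and C, preM and M), the function returns every peptide that contains the residue.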


i. To analyze these sequences, select all records by ticking the checkbox above the table and add them to a working set by clicking the “Add to Working Set” button. This way, you will be able to retrieve the data from the Workbench later and run various analyses on the same data set.

j. You’ll be prompted to log in to your Workbench account in order to save data to a working set. If you don’t have an account, register for a free one by choosing the “Register for a new account” option and following the prompts.

k. An “Add to Working Set” dialog will pop up. Create a new working set and name it “WNV 1999-2001 US & S Africa complete genomes”. Click “Add to Working Set” to save the sequences to the working set.

II. Construct and visualize a genome phylogenetic tree

a. Click “Workbench” in the grey navigation bar to access your Workbench area.

b. On the Workbench page, click “View” next to the saved WNV working set.

c. The Working Set Details page displays the sequence records saved in the working set. Select all records by clicking the checkbox above the table. Mouse over “Run Analysis” and click “Generate Phylogenetic Tree”.

d. On the tree settings page, select “Quick Tree”, choose Strain Name and Date as the tree tip labels, and click “Build Tree”.

e. While the analysis is running, you can save the analysis to your Workbench by entering a name and then clicking “Save to Workbench”. Once it is saved, you can come back to the Workbench at any time to retrieve the analysis results.

f. After the analysis is finished, the View Phylogenetic Tree page will load. Here you can save the tree file in Newick or PhyloXML format to your computer. Click “View Tree” to load the Archaeopteryx Tree Viewer window.


Figure: Generate Phylogenetic Tree page. Input: 45 genomes selected for tree. Label tree tips (ends) with: Strain Name, or specify a custom format of up to 4 fields chosen from Strain Name, Accession Number, Date, Country, USA State, Host Species, and Species Name. Tree generation: Quick Tree, or Custom Tree (I want to set my own parameters and/or I have a large dataset).

The "Quick Tree" option uses the FastME algorithm [Desper, R., Gascuel, O. (2002) Journal of Computational Biology 9(5), pp. 687-705]. This fast, distance-based approach is suited to 1) datasets containing more than 1,000 sequences of short or medium length, 2) datasets containing more than 100 very long sequences, or 3) reconstructing a "quick and dirty" tree. The "Custom Tree" option incorporates the PhyML [Guindon, S. and Gascuel, O. (2003) Syst Biol. 52: 696-704] or RAxML [Stamatakis, A. et al. (2005) Bioinformatics 21: 456-463] algorithms. User-defined settings are required for either. PhyML infers a more evolutionarily accurate phylogenetic topology by applying a substitution model to the nucleotide sequences. This algorithm is best applied to datasets containing 1) fewer than 100 very long sequences, or 2) between 100 and 1,000 sequences of short or medium length. When large datasets are input, ViPR will automatically use the RAxML algorithm.
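Distance-based methods such as the one behind "Quick Tree" start from a matrix of pairwise distances between aligned sequences. The sketch below illustrates only that input step, using the uncorrected proportion of differing sites (p-distance). It is not FastME, and the choice of p-distance, the gap handling, and the function names are assumptions made for illustration; the source does not state which distance model ViPR applies.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or N are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(aligned):
    """Pairwise p-distances keyed by (name, name) for a dict of aligned sequences."""
    return {(s, t): p_distance(aligned[s], aligned[t])
            for s, t in combinations(aligned, 2)}


# Made-up toy alignment; real input would be an alignment of the working set.
toy = {"strainA": "ACGTACGT", "strainB": "ACGTACGA", "strainC": "ACGAAC-T"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```

A tree-building algorithm then works from this matrix alone, which is one reason distance-based approaches scale to large datasets more easily than methods that evaluate a substitution model on the full alignment.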


g. A Tree Viewer window will pop up. Many tree customization options exist including: change tree type, reroot the tree, collapse/expand/display subtree, swap descendants, decorate (color) the tree leaves by any associated metadata (e.g., country, year of isolation, host, virus type, sequence position, etc.), resize the tree, change the font size, etc.

i. Color the tree leaves by country by selecting “Country” in the Basic Decoration Options section.

ii. Re-root the tree based on the South African sequences. To do so, make sure “Root/ Reroot” is selected in the Tree Manipulations section, and then click the node next to the two South African sequences.

iii. Our previous Meta-CATS analysis identified amino acid position 522 in the polyprotein (position 232 in the E protein) as significantly different between lineage 1a and lineage 2 viruses: lineage 1a viruses more often have V and lineage 2 viruses more often have T at this position (p-value = 1.3E-128). Now color-code the tree by the corresponding nucleotide position, 1667.


This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN266200400041C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory.


The IRD Tree Decorator is a custom enhancement of Archaeopteryx. The original FORESTER/ATV library is freely available from SourceForge. Credits: Zmasek C.M. and Eddy S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384.

Click the "View Tree" button to launch the tree viewer software in a new window. If you prefer other viewing software, the tree data is available for download in Newick or PhyloXML format.

Due to security concerns, certain browsers (e.g. Safari and Firefox) have disabled Java plug-ins by default. If the Tree Viewer takes a long time to load, please test your browser's Java plug-in to make sure it can display Java applets properly. Safari has recently tightened the security settings on Java applets, which may affect the image export functions of the Tree Viewer.

ENHANCED TREE VIEWER

The IRD team provides software that allows "decoration" of your tree by features such as host species, year, country, and subtype. This custom software is based on Archaeopteryx. In the tree viewer, use the drop-down menu for basic decoration or advanced decoration to select the feature for coloring. The decorated tree and corresponding legend can be exported using options in the File drop-down menu.


iv. Click on the “Advanced Decoration” button and select the “Sequence Position” option from the drop-down menu.

v. Enter position 1667 into the textbox to highlight this nucleotide position.

vi. The tree shows that the South African sequences (lineage 2) both have A at this position while the American strains (lineage 1a) all have G at this position. This position, therefore, can be used to color-code the tree by taxonomic lineage.

vii. The default colors may or may not be ideal for your purpose. You can change them using “Advanced Decoration”. In the Advanced Decoration Options dialog box, select “Sequence Position”, tick the Manual Decoration checkbox, and click “Go”.

viii. Check A and choose red in the color palette, then click “Apply”. Now strains with A at position 1667 are colored in red.

ix. You can save the tree image by clicking the “File” menu and then choosing a file format.

h. Return to the Tree Results page. Save the tree analysis to your Workbench by clicking “Save Analysis”. Rename the analysis so that you can recognize it later, for example, “WNV 1999-2001 USA S Africa phylogeny”. Then click “Save”.

i. Go to your Workbench. You can see the tree is listed at the top of the Workbench table. Click “View” to retrieve the tree analysis result. The parameters used to generate the tree are also saved.
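Coloring the tree by sequence position, as in step g, amounts to grouping the strains by the base they carry at one alignment column. A minimal offline sketch of that grouping follows; the toy sequences are made up, column 4 merely stands in for position 1667, and `group_by_position` is a hypothetical helper rather than anything provided by the tree viewer.

```python
from collections import defaultdict


def group_by_position(aligned, position):
    """Group sequence names by the base at a 1-based alignment column."""
    groups = defaultdict(list)
    for name, seq in aligned.items():
        if not 1 <= position <= len(seq):
            raise IndexError(f"position {position} is outside sequence {name!r}")
        groups[seq[position - 1].upper()].append(name)
    return dict(groups)


# Toy alignment with made-up sequences; column 4 stands in for position 1667.
toy = {
    "US_strain_1": "ATGGTC",
    "US_strain_2": "ATGGTT",
    "SA_strain_1": "ATGACC",
    "SA_strain_2": "ATGACT",
}
palette = {"A": "red"}  # mirrors the manual choice of red for A in step g
groups = group_by_position(toy, 4)
for base, names in sorted(groups.items()):
    print(base, palette.get(base, "default"), names)
```

When a single column separates two lineages cleanly, as reported for the South African and American strains, each group in the output corresponds to one lineage.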

III. Genotype & Recombination Detection

a. Download a WNV genome sequence to your computer from: http://tinyurl.com/l4dbg29.

b. Mouse-over “Analyze & Visualize” in the grey navigation bar and click “Genotype-Recombination Detection”.


c. When the landing page for this tool loads, either:

• Select the “Paste sequences in FASTA format” option and paste the contents of the downloaded genome into the textbox, OR

• Select the “Upload a file containing my sequences in FASTA format” option and find the correct file on your own computer.

d. Next, select the “West Nile Virus” species from the drop-down list and click the “Run” button.

e. The results should be displayed in a table with the summary information for each analyzed strain shown in separate rows. To view more detailed information for the results for the chimera strain, click on the “View” link in the first column of the table.

f. On the Genotype Report page, you can:

• View the predicted genotype and recombination type (if applicable).

• Download a spreadsheet listing the detailed results of recombination determination.

• View the genotyping results in graphical format.

• Download or view the alignment of your sequence with representative sequences from each taxon selected by ViPR.

• Download or view the phylogenetic tree based on the alignment of your sequence with representative sequences from each taxon selected by ViPR.


Figure: Genotype Determination and Recombination Detection input form.

Source of sequences to be analyzed: Upload a file containing my sequences in FASTA format; Paste sequences in FASTA format; Use working sets. Sequences can also be selected from search results or a working set in your Workbench.

Pasted sequence, as far as it is visible in the textbox:

>WNV-Chimera|New York 99|SA93/01
AGTAGTTCGCCTGTGTGAGCTGACAAACTTAGTAGTGTTTGTGAGGATTAACAACAATTAACACAGTGCGAGCTGTTTCTTAGCACGAAGATCTCGATGTCTAAGAAACCAGGAGGGCCCGGCAAGAGCCGGGCTGTCAATATGCTAAAACGCGGAATGCCCCGCGTGTTGTCCTTGATTGGACTGAAGAGGGCTATGTTGAGCCTGATCGACGGCAAGGGGCCAATACGATTTGTGTTGGCTCTCTTGGCGTTCTTCAGGTTCACAGCAATTGCTCCGACCCGAGCAGTGCTGGATCGATGGAGAGGTGTGAACAAACAAACAGCGATGAAACACCTTCTGAGTTTTAA

Only 1 sequence is needed. The defline in your FASTA file will be used to label the display.

Select species: West Nile Virus
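The form above expects FASTA input with a single sequence whose defline labels the display. Before pasting or uploading, the input can be checked locally. The sketch below uses only the Python standard library; the helper names and the accepted character set (IUPAC nucleotide codes plus the gap character) are assumptions made for illustration, not ViPR's own validation rules.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (defline, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first defline")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_single_nucleotide_record(text):
    """Return the only record, or raise if the input is not one nucleotide sequence."""
    records = parse_fasta(text)
    if len(records) != 1:
        raise ValueError(f"expected 1 sequence, found {len(records)}")
    defline, seq = records[0]
    unexpected = set(seq.upper()) - set("ACGTUNRYKMSWBDHV-")
    if not seq or unexpected:
        raise ValueError(f"empty sequence or unexpected characters: {sorted(unexpected)}")
    return defline, seq.upper()


# Defline from the tutorial, followed by the first 40 bases of its sequence.
example = ">WNV-Chimera|New York 99|SA93/01\nAGTAGTTCGCCTGTGTGAGC\nTGACAAACTTAGTAGTGTTT\n"
defline, seq = check_single_nucleotide_record(example)
print(defline, len(seq))
```

Sequence lines may be wrapped at any width, so the parser joins them; the defline must sit on its own line, starting with ">".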

Genotype Determination and Recombination Detection: This annotation pipeline takes an alignment of sequences containing at least two representatives from each taxon. This reference alignment is then used to construct a distance-based tree, which is then parsed in order to find the closest relatives for any query sequence using a Branch Indexing method. By incorporating a static window size, this pipeline can also identify any recombinant query sequence. When the analysis is completed, a graphical representation of the score corresponding to the genotype classification for each region of the "sliding window" will be shown. A spreadsheet file with the results will also be available for download. This tool is based on the Genotype Determination Tool developed by Carla Kuiken's group at Los Alamos National Laboratory for the HCV database.
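The sliding-window idea described above can be illustrated with a much simpler stand-in. The sketch below does not implement Branch Indexing or build a tree; it swaps in a plain nearest-reference rule based on percent identity within each window, purely to show how per-window genotype calls can reveal a recombinant. The reference sequences, window size, and function names are all made up for illustration.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def window_genotypes(query, references, window, step):
    """Assign each window of the aligned query to the most similar reference genotype."""
    calls = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        best = max(
            references,
            key=lambda g: identity(segment, references[g][start:start + window]),
        )
        calls.append((start + 1, start + window, best))  # 1-based window bounds
    return calls


def summarize(calls):
    """List genotypes in order of first appearance; more than one suggests recombination."""
    seen = []
    for _, _, genotype in calls:
        if genotype not in seen:
            seen.append(genotype)
    return seen


# Made-up aligned toy sequences; real references would come from a curated alignment.
references = {"1A": "AAAAAAAAAAAAAAAA", "2": "CCCCCCCCCCCCCCCC"}
chimera = "AAAAAAAACCCCCCCC"
calls = window_genotypes(chimera, references, window=4, step=4)
print(calls)
print(summarize(calls))
```

A query whose windows all agree receives a single genotype, while a query whose windows switch from one genotype to another is flagged as a candidate recombinant, in the same spirit as a result that lists two genotypes for one sequence.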


Figure: Genotype Recombination Analysis Result (your analysis contains 1 record).

Defline    Species   Status (Genotype)  BI (Genotype)  Genotype  Status (Recombination)  Recombination  Comment
WNV_Chime  WESTNILE  Success            0.354          2         Success                 1A,2           -N/A-


Note that this strain was artificially constructed as a recombinant, and the algorithm detected it as such (Recombination: 1A,2). Naturally recombinant strains are detected in the same way.
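If you download the results spreadsheet, a check like the one above can be scripted. The following is a minimal sketch that assumes the downloaded file is tab-separated and uses the same column headers as the on-screen result table; the actual layout of the ViPR download may differ, and the function names are hypothetical.

```python
import csv
import io

# Column headers as shown on the result page (assumed to match the download).
COLUMNS = [
    "Defline", "Species", "Status(Genotype)", "BI(Genotype)",
    "Genotype", "Status(Recombination)", "Recombination", "Comment",
]

# The single result row from this exercise, rebuilt as tab-separated text.
SAMPLE = (
    "\t".join(COLUMNS) + "\n"
    + "\t".join(["WNV_Chime", "WESTNILE", "Success", "0.354",
                 "2", "Success", "1A,2", "-N/A-"]) + "\n"
)


def parse_results(text):
    """Parse tab-separated result text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def is_recombinant(row):
    """Flag a row whose Recombination column lists more than one genotype."""
    if row["Status(Recombination)"] != "Success":
        return False
    genotypes = {g.strip() for g in row["Recombination"].split(",") if g.strip()}
    return len(genotypes) > 1


ROWS = parse_results(SAMPLE)
```

For the row above, `is_recombinant(ROWS[0])` is true because the Recombination column lists both 1A and 2.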

IV. Genome Annotation

a. Mouse-over the “Analyze & Visualize” tab in the grey navigation bar and click “Genome Annotator (GATU)”.

b. To annotate your own sequence, you need a previously annotated reference sequence. If you already have an annotated reference sequence in GenBank (.gb) format, click “Launch GATU” to proceed directly. If not, you can use ViPR BLAST to search for a closely-related annotated sequence to use as your reference.

i. If you have your own sequence, prepare the sequence in FASTA format, save it in plain text and use .fasta as the file extension. FASTA file example:


Genotype determination and Recombination detection Results Details page:

Genotype: a tab-separated file listing the sequence name, a single consensus genotype result for the entire genome, and the confidence metric.

Recombination: a tab-separated file listing the results for all windows for the sequence.

Genotyping results in graphical format

Alignment (Download Aligned Fasta, Visualize Aligned Sequences): the multiple sequence alignment of your sequence with a ViPR reference sequence alignment that consists of at least 2 representatives from each taxon.

Tree (View phylogenetic tree, Download Newick File): the tree generated by PAUP based on the input alignment for the whole genome.

Genotype Report

Whole Genome Genotype prediction: 2

Whole Genome Recombination Type: 1A,2


REFERENCE SEQUENCE

To use GATU, you will need to select a reference sequence. If you already have an appropriate GenBank file, proceed directly to Launch GATU. If not, you can use ViPR Blast to search for one. Browse to your target sequence in FASTA format, then click on Go. Run a Blast search, pick a reference sequence file, and download it to your directory in GenBank format. Then click Launch GATU and upload your reference and target sequences using the respective controls.

FILE FORMAT CONVERSION

The GATU-produced annotation file can be modified for submission to GenBank by using the file format conversion tool. This tool will convert the GATU-produced (GenBank format) annotation file into a Fasta sequence file and a tab-delimited 'Feature Table' file. Both files can then be used for submitting new sequences to GenBank with the Sequin and tbl2asn tools.

Genome Annotator (GATU)

GATU, a Genome Annotation Transfer Utility (Tcherepanov, et al., BMC Genomics 2006, 7:150 PubMed: 16772042) is an initial-stage tool to transfer annotations from a previously annotated reference to a new, closely-related target genome. ViPR users should ensure that their system has Java 1.6 or higher. The GATU interface provides controls for uploading a reference .gb file of the relevant viral family, along with the target genome in .gb or Fasta format. When done, a table summarizes the similarities of transferred annotations and provides users with checkbox control over which to accept. GATU also detects ORFs in the target and provides bioinformatics tools to assess if these should be annotated. The annotated target genome can be saved in multiple file formats.

Originally developed at the University of Victoria, GATU was adapted for use with ViPR.
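To give a feel for what a tab-delimited Feature Table looks like, here is a minimal sketch that writes one in the general five-column layout accepted by tbl2asn (a `>Feature` header line, then start, stop and feature key, then indented qualifier lines). It is not the output of the ViPR conversion tool; the function name, the sequence identifier and the coordinates are placeholders invented for illustration.

```python
def feature_table(seq_id, features):
    """Render features as a five-column, tab-delimited feature table."""
    lines = [f">Feature {seq_id}"]
    for feature in features:
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{feature['start']}\t{feature['end']}\t{feature['key']}")
        # Columns 4-5: qualifier name and value, after three leading tabs.
        for name, value in feature.get("qualifiers", {}).items():
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


# Placeholder feature (coordinates are illustrative only, not a real annotation).
EXAMPLE_FEATURES = [
    {"start": 1, "end": 300, "key": "CDS", "qualifiers": {"product": "polyprotein"}},
]

TABLE = feature_table("target_seq1", EXAMPLE_FEATURES)
```

Calling `print(TABLE)` shows the header line followed by one feature line and one qualifier line.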


Launch GATU directly if you have a reference sequence in .gb format.

Use your target sequence to BLAST for a closely-related annotated sequence as your reference.


>gb:EF657887|Organism:West Nile virus 3356K VP2|Subtype:null|Host:Crow
AGTAGTTCGCCTGTGTGAGCTGACAAACTTAGTAGTGTTTGTGAGGATTAACAACAATTAACACAGTGCG
AGCTGTTTCTTAGCACGAAGATCTCGATGTCTAAGAAACCAGGAGGGCCCGGCAAGAGCCGGGCTGTCAA

Otherwise, you can use a sample sequence from: https://tinyurl.com/mz62l34

ii. Click “Browse”, find the target sequence file on your computer, and click “Go” to run a BLAST search against annotated WNV reference sequences in ViPR.

iii. After BLAST is finished, a list of recommended reference sequences will be displayed. Choose a closely-related sequence and download its GenBank file to your computer.

c. Now, click “Launch GATU” to run the GATU application. A dialog box will pop up. Click “Allow” to allow the GATU applet to be loaded on your computer.

d. In the GATU window, upload your .gb file as the “Reference Genome” and your target genome FASTA file as the “Genome to Annotate”.

e. Click “Annotate” to run the annotation process. When it finishes, a table summarizes the similarities of the transferred annotations and provides checkboxes for choosing which ones to accept.

f. Click “Save” to save the annotated target genome in GenBank, EMBL, or XML file format.
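For step b.i, a target sequence can be written to a plain-text .fasta file with a few lines of Python. This is a minimal sketch under stated assumptions: the function names, the 70-character line width and the output path are illustrative choices, and the check only accepts standard IUPAC nucleotide codes.

```python
from pathlib import Path

# IUPAC nucleotide codes accepted by this sketch.
VALID_BASES = set("ACGTUNRYKMSWBDHV")


def format_fasta(header, sequence, width=70):
    """Return a FASTA record with a '>' header line and wrapped sequence lines."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if not seq:
        raise ValueError("sequence is empty")
    unexpected = sorted(set(seq) - VALID_BASES)
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {unexpected}")
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{header.lstrip('>')}\n{body}\n"


def save_fasta(path, header, sequence):
    """Write the record as plain text; the caller chooses a .fasta file name."""
    Path(path).write_text(format_fasta(header, sequence), encoding="ascii")
```

For example, `save_fasta("target.fasta", "my_WNV_isolate", sequence_string)` would produce a file that can be selected with “Browse” in step b.ii; both the file name and the header here are hypothetical.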


REFERENCE SEQUENCE

Here are some recommended Reference Sequences. Select one, click the link Download GenBank File, and save the file to your local machine. Now click Launch GATU. Under Genome Selection > Reference Genome, click Upload Genome File and browse to the saved Reference.

Download GenBank File  Sequence header                                                                       Bit Score  E Value
EXT853727              >gi|853727| Country:USA| West Nile virus, complete genome.|gb|158516887               21740      0.0
EXT870105              >gi|870105| Country:| West Nile virus, complete genome.|gb|11528013                   2553       0.0
EXT849462              >gi|849462| Country:| Murray Valley encephalitis virus, complete genome.|gb|9633622   204        5.0E-51
EXT391627              >gi|391627| Country:USA| St. Louis encephalitis virus, complete genome.|gb|123205971  153        2.0E-35
EXT844430              >gi|844430| Country:Austria| Usutu virus, complete genome.|gb|56692441                145        4.0E-33
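Choosing a closely-related reference from a list like this amounts to ranking the hits by bit score within the species of interest. The sketch below uses the five hits shown in this exercise; the `Hit` type and the `best_reference` function are hypothetical names, and filtering on the description text is a simplifying assumption.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    identifier: str
    description: str
    bit_score: float
    e_value: float


# The recommended reference sequences listed in the BLAST report above.
HITS = [
    Hit("EXT853727", "West Nile virus, complete genome.", 21740, 0.0),
    Hit("EXT870105", "West Nile virus, complete genome.", 2553, 0.0),
    Hit("EXT849462", "Murray Valley encephalitis virus, complete genome.", 204, 5.0e-51),
    Hit("EXT391627", "St. Louis encephalitis virus, complete genome.", 153, 2.0e-35),
    Hit("EXT844430", "Usutu virus, complete genome.", 145, 4.0e-33),
]


def best_reference(hits, species="West Nile virus"):
    """Return the highest-scoring hit whose description names the species."""
    candidates = [hit for hit in hits if species in hit.description]
    if not candidates:
        raise ValueError(f"no hits found for {species}")
    return max(candidates, key=lambda hit: hit.bit_score)


BEST = best_reference(HITS)
```

Here `BEST.identifier` is EXT853727, the hit with the highest bit score (21740), so that is the GenBank file you would download as the reference.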
