Created by the ViPR/IRD team and licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License


Section B. Comparative Genomics Analysis of Influenza H5N2 Viruses

Objective

Upon completion of this exercise, you will be able to use the Influenza Research Database (IRD; http://www.fludb.org/) to:

• Search for virus sequences and view detailed information about these sequences in IRD

• Save selected sequences as a working set in your private Workbench space

• Combine multiple working sets

• Build a phylogenetic tree on a set of sequences to infer their evolutionary relationships

• Use the Meta-CATS tool to identify nucleotide or amino acid positions that significantly differ between groups of virus sequences

• Perform a multiple sequence alignment to observe sequence conservation and variation

• Determine if significant positions are located in viral protein Sequence Features and examine Sequence Feature Variant Type reports

• Search for 3D protein structures and highlight Sequence Features and custom positions on a structure

Background

In December 2014, a highly pathogenic avian influenza H5N2 infection was identified on two farms in British Columbia (Canadian Food Inspection Agency, 2015). As of June 17, 2015, a total of 223 detections had been reported in the United States, with 48,091,293 wild and domestic birds affected (United States Department of Agriculture, 2015). Phylogenetic analysis showed that this H5N2 outbreak lineage is a reassortant in which five genome segments (HA, PA, M, PB2, NS) are from the highly pathogenic Eurasian clade 2.3.4.4 and the remaining three segments (NA, NP, PB1) are from the North American low pathogenic avian lineage (Pasick, 2015; Ip, 2015).

For this use case we are interested in analyzing and comparing the HA segments from the H5N2 outbreak isolates using the data and tools provided in the Influenza Research Database.


Analysis Workflow

• Search for sequences and save sequences into working sets:
  (1) search for HA segment sequences from North American H5N2 viruses from 2010-2015 and save sequences as working set (1)
  (2) BLAST for HA segment sequences similar to H5N2 outbreak sequences and save them as working set (2)
  (3) combine working sets (1) and (2) into (3)

• Construct nucleotide phylogenetic tree:
  - construct phylogenetic tree using HA segment sequences from working set (3)
  - color tree by subtype, flu season, H5 clade, and SFVT to explore the evolution of H5N2 outbreak viruses

• Run Metadata-driven Comparative Analysis Tool (Meta-CATS):
  - convert segment working set (1) into protein working set (4)
  - input working set (4) to Meta-CATS
  - identify positions that are significantly different between H5N2 outbreak and older viruses

• Visualize multiple sequence alignment:
  - align amino acid sequences of North American H5N2 viruses (4)
  - identify variant positions on alignment

• Determine if the significant positions are located in Sequence Features:
  - follow the Sequence Feature links on the Meta-CATS report
  - examine Sequence Features containing the significant positions

• Highlight significant positions and Sequence Features on protein structure:
  - search for H5 3D protein structures
  - highlight Meta-CATS positions and Sequence Features on a structure


I. Search for sequences and save matching sequences into working sets

1. Search for HA sequences from North American H5N2 viruses

a. Go to the IRD homepage (http://www.fludb.org/), mouse-over “Search Data” in the gray navigation bar, then “Search Sequences” and click “Nucleotide Sequences”.

b. The Nucleotide Sequence Search page allows you to search for sequences based on data type, virus type, subtype, strain name, classical proteins, IRD-predicted variant proteins, host, geographical region, sequence completeness, H1N1 pandemic status, and date range. Select the following parameters and click “Search”.

• Subtype: H5N2

• Select Proteins: ☑ HA

• Date Range: From: 2010

• Geographic Grouping: ☑ North America

• Country: ☑ Canada; USA

• Advanced Options (click “Advanced Options” to view and select additional search options):
  - Laboratory Strains: Exclude laboratory strains
  - Minimum Segment Length: 4: 1600

Note: IRD shows instant counts of search results (in red) to help you search quickly and efficiently. When you select search criteria on search pages, you will instantly know how many records match your search criteria before clicking the “Search” button and actually running the search.


c. Once the “Search” button is clicked, the search results will be displayed in a table as shown below. Each column can be sorted by clicking its header. Click the “Collection Date” header to sort records by collection date.

d. The IRD team has implemented an algorithm for classifying the clade of the hemagglutinin gene of influenza A (H5) viruses. It uses phylogenetic analysis to classify HA (H5) sequences (from both highly pathogenic Goose/Guangdong-like viruses and from non-pathogenic Eurasian and American lineage viruses) according to the WHO classification scheme. Now, click the “Display Settings” button and select “H5 Clade” as an additional display field. You will see H5 Clade annotation in the rightmost column.

e. From the Search Results page you can:

i. Store selected sequences as a working set in the Workbench so that you can save the dataset and run various analyses on the dataset in the future.

ii. Save the search query to your Workbench and rerun the search again later.

iii. Select records and run an analysis on the selected records by mousing-over the “Run Analysis” button and clicking a desired analysis option.

iv. Download the sequences (gene, CDS, protein) by clicking “Download”.

v. View the details for any sequence in the results table.

For this exercise, click the “View” link for A/turkey/Washington/61-22/2014 to access the Segment/Protein Details page. Specifically look for the following annotations:

• H5 Clade

• SNP

• Sequence Derived Phenotype Marker

• Protein Sequence Features


f. Click the “Results” breadcrumb to return to the Search Results page. To analyze the retrieved sequences, select all records by ticking the checkbox above the table, then click the “Add to working set” button. This way, we will be able to retrieve the data from the Workbench later and run various analyses on the same data set.

g. You’ll be prompted to log in to your Workbench account in order to save data to a working set. If you don’t have an account already, simply register for an account for free by choosing the “Register for a new account” option and following the prompts.

h. An “Add to Working Set” lightbox will pop up. Create a new working set and name it “H5N2 HA N America 2010-2015”. Click “Add to Working Set” to save the sequences to the working set.

i. Access your Workbench by clicking the “Workbench” tab. You will see the newly created working set at the top of the content list.
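The search criteria from step b can also be pictured as a simple record filter. The following is a minimal sketch only, not IRD's implementation: the `SegmentRecord` fields, the `EX…` accessions, the sequence lengths, and the three "A/example/…" strains are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SegmentRecord:
    accession: str      # hypothetical IDs, not real IRD/GenBank accessions
    strain: str
    subtype: str
    segment: int        # 4 = HA
    country: str
    year: int
    length: int         # illustrative lengths only
    lab_strain: bool = False

def matches(rec, subtype="H5N2", segment=4, countries=("Canada", "USA"),
            year_from=2010, min_length=1600):
    """Apply the same criteria as the search form in step b."""
    return (rec.subtype == subtype
            and rec.segment == segment
            and rec.country in countries
            and rec.year >= year_from
            and rec.length >= min_length
            and not rec.lab_strain)

records = [
    SegmentRecord("EX0001", "A/turkey/Washington/61-22/2014", "H5N2", 4, "USA", 2014, 1704),
    SegmentRecord("EX0002", "A/turkey/BC/FAV10/2014", "H5N2", 4, "Canada", 2014, 1704),
    SegmentRecord("EX0003", "A/example/old/2009", "H5N2", 4, "USA", 2009, 1700),      # too early
    SegmentRecord("EX0004", "A/example/partial/2012", "H5N2", 4, "USA", 2012, 900),   # too short
    SegmentRecord("EX0005", "A/example/h5n8/2014", "H5N8", 4, "USA", 2014, 1704),     # wrong subtype
]

hits = [r for r in records if matches(r)]
for r in hits:
    print(r.accession, r.strain)
```

Keeping the criteria as keyword arguments mirrors the way the search form lets you change one parameter at a time and see how the result count changes.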

2. BLAST for HA sequences similar to H5N2 outbreak sequences

Now we are going to expand the sequence set by including HA sequences that are highly similar to the H5N2 North American viruses. The IRD BLAST tool utilizes the NCBI BLAST program set and has a collection of custom influenza sequence databases to search against.

a. From the Workbench table, click “View” next to “H5N2 HA N America 2010-2015” to display items in the working set.

b. Select A/turkey/BC/FAV10/2014 by ticking its checkbox, mouse over “Run Analysis” and click “Identify Similar Sequences (BLAST)”.

c. In the Select Sequence Type lightbox, choose “Nucleic Acid (NA)” and click “Continue”.

d. The BLAST settings page will now load. IRD provides a collection of custom influenza sequence databases to search against. Select “Blastn” and “Nucleotides for segment 4 H5”. Keep the default settings for other parameters. Then click “Run”.


e. On the BLAST Report page, all nearest hits are listed in the table. Click a hit to view its alignment. Click the IRD link (e.g., ird|1281736) to view the hit’s Segment/Protein Details page in IRD. What are the geographic region and subtype of the sequences that are most similar to the H5N2 outbreak sequences?

f. Now select all hits by ticking the checkbox in the column header and click “Add to Working Set” to save these sequences into a new working set named “A/turkey/BC/FAV10/2014 BLAST hits”.

3. Combine working sets of North American H5N2 and closely related viruses

a. Now we are going to combine the working sets of H5N2 HA North American sequences and A/turkey/BC/FAV10/2014 BLAST hits and use the combined working set for phylogeny analysis.

b. Click the “Workbench” tab in the gray navigation bar to go to your Workbench.

c. Find the two working sets we just saved and select them by ticking the corresponding checkboxes. Next, click “More Actions” above the table. A lightbox will pop up. Click “Combine” to combine the selected working sets into a new working set. Name the new working set “H5N2 HA N America 2010-2015 + blastn hits” and click “Combine”.

d. The combined working set will appear at the top of the Workbench table. Duplicate sequences have been automatically removed.
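Combining working sets with automatic duplicate removal amounts to a keyed merge. Here is a minimal sketch, assuming each working set is a list of (accession, strain) pairs and that the accession is what identifies a duplicate; both the data layout and the `EX…` accessions are hypothetical, and IRD's own deduplication rule may differ.

```python
def combine_working_sets(*working_sets):
    """Merge working sets, keeping the first occurrence of each accession."""
    combined = {}
    for working_set in working_sets:
        for accession, strain in working_set:
            # setdefault leaves an existing entry untouched, so duplicates are dropped
            combined.setdefault(accession, strain)
    return list(combined.items())

h5n2_north_america = [
    ("EX0001", "A/turkey/Washington/61-22/2014"),
    ("EX0002", "A/turkey/BC/FAV10/2014"),
]
blast_hits = [
    ("EX0002", "A/turkey/BC/FAV10/2014"),   # also present in the first set
    ("EX0009", "A/example/Korea/2014"),     # hypothetical BLAST hit
]

combined = combine_working_sets(h5n2_north_america, blast_hits)
print(len(combined), "unique sequences")
```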

II. Construct an HA segment phylogenetic tree

a. Now we will construct a phylogenetic tree using HA segment sequences from North American H5N2 isolates and sequences that are most similar to the current H5N2 outbreak isolates.

b. On the workbench content list, click “View” next to “H5N2 HA N America 2010-2015 + blastn hits” to display sequences in the working set.

c. On the working set page, select all records by checking the checkbox above the table, mouse-over the “Run Analysis” button and click “Generate Phylogenetic Tree”.  

d. A “Select Sequence Type” lightbox will pop up. Choose “Nucleic Acid (NA)” and click “Continue”.

e. On the Generate Phylogenetic Tree page, select “Quick Tree”, choose Strain Name, Accession Number, Season, and Subtype as the tree tip labels, and click “Build Tree”.

f. If you are logged into your Workbench and submit a very large dataset for tree building, a recommendation to run the job in the CIPRES high performance computing environment will appear. If you choose the CIPRES option, your selected sequences will be automatically transferred to CIPRES for tree calculation. Once the analysis is completed, the result will be sent back to IRD and saved to your personal Workbench.

g. While the analysis is running, you can save the analysis to your Workbench by entering a name and then clicking “Save to Workbench”. Once it is saved, you can come back to the Workbench at any time to retrieve the analysis results. You can also request email notification when the analysis task is completed.

h. After the analysis is finished, the View Phylogenetic Tree page will be loaded. Here you can save the tree file in Newick or PhyloXML format to your computer. Click “View Tree” to load the Archaeopteryx Tree Viewer window.

i. A Tree Viewer window will pop up. Many tree customization options exist including: reroot the tree, collapse/expand/display subtrees, swap descendants, decorate (color) the tree leaves by any associated metadata (e.g. host, year or country of isolation, HA or NA subtype, etc.), resize the tree, zoom in/out, fit the tree to window, change the font size, etc.

i. With the Root/Reroot tree manipulation option selected, reroot the tree such that the outbreak H5N2 and older H5N2 are in two separate lineages.

ii. In the Tree Decorations section, choose “HA & NA subtype” from the Basic Decoration Options. Click “Show Legend” to display the color code for different subtypes.

iii. The default colors may or may not be ideal for your purpose. You can change the color by using the “Advanced Decoration” option. In the Advanced Decoration Options dialog box, choose “HA & NA subtype”, click the Manual Decoration checkbox and click “Go”.

iv. Check H5N2 and choose red in the color palette, then click “Apply”. Now the H5N2 strains are colored in red.


v. Now try coloring the tree by flu season and H5 clade.

vi. The tree shows that i) the HA sequences from the H5N2 outbreak are closely related to the HA sequences from several US H5N1/H5N8 isolates and a series of H5N8 isolates from Korea, ii) the HA sequences from the H5N2 outbreak and the closely related H5N1 and H5N8 isolates belong to the 2.3.4.4 clade within the highly pathogenic H5 lineage. Can you infer the origin of the North American H5N2 outbreak isolates?

vii. You can export the tree image by using options under the “File” menu.

viii. Save the tree analysis to your Workbench. To do so, return to the View Phylogenetic Tree page, click the “Save Analysis” button and enter a name.
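If you download the tree in Newick format, the coloring done in the Tree Viewer can be approximated offline. The sketch below is an illustration under stated assumptions, not the Tree Decorator's logic: it assumes tip labels are written as "strain|accession|season|subtype" (a hypothetical layout), that labels contain no parentheses, commas, colons, or semicolons, and it uses two made-up "A/example/…" tips and made-up branch lengths.

```python
import re

# A leaf label follows "(" or "," and runs up to the next Newick delimiter.
LEAF = re.compile(r"[(,]\s*([^(),:;]+)")

def leaf_labels(newick):
    """Return tip labels in the order they appear in a Newick string."""
    return [m.group(1).strip() for m in LEAF.finditer(newick)]

def subtype_of(label):
    # Assumes the hypothetical "strain|accession|season|subtype" label layout.
    return label.split("|")[-1]

PALETTE = {"H5N2": "red"}   # manual decoration: H5N2 in red, everything else default

def color_leaves(newick, default="black"):
    return {label: PALETTE.get(subtype_of(label), default)
            for label in leaf_labels(newick)}

tree = ("((A/turkey/BC/FAV10/2014|EX0002|14-15|H5N2:0.01,"
        "A/example/Korea/2014|EX0009|13-14|H5N8:0.02):0.05,"
        "A/example/mallard/2011|EX0010|10-11|H5N2:0.10);")

colors = color_leaves(tree)
for label, color in colors.items():
    print(color, label)
```

A regular expression is enough for simple labels like these; quoted labels or labels with embedded delimiters would need a real Newick parser.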

[Figure: HA phylogenetic tree with three labeled groups: N American H5N2 2010-2013; N American H5N2 outbreak 2014-2015; N American H5N2 outbreak nearest BLAST hits]

III. Metadata-driven Comparative Analysis Tool for Sequences (Meta-CATS)

Now we will use Meta-CATS to identify amino acid positions on the HA protein that significantly differ between the North American H5N2 outbreak isolates and older H5N2 isolates.

a. Go to your Workbench. Select working set “H5N2 HA N America 2010-2015”, click “More Actions” above the table, then click “Convert” in the “More Actions” lightbox.

b. A “Convert Working Set” lightbox will pop up. Name the new working set to be “H5N2 HA N America 2010-2015 protein”, select “Protein” from the Type list, and then click “Convert”.

c. The converted protein working set will appear at the top of the Workbench table.

d. Click the “View” link for the converted protein working set to display protein records saved in the working set.

e. Select all records by ticking the checkbox above the table, mouse-over the “Run Analysis” dropdown list, and click “Metadata-driven Comparative Analysis Tool”.

 

Metadata-driven Comparative Analysis Tool for Sequences (Meta-CATS)

• A unique comparative genomics analysis tool in IRD that identifies nucleotide/amino acid positions that significantly differ between two or more groups of virus sequences.

• Meta-CATS consists of three parts: a multiple sequence alignment (using MUSCLE), a chi-square goodness of fit test to identify positions (columns) of the multiple sequence alignment that significantly differ from the expected (random) distribution of residues between all metadata groups, and a Pearson's chi-square test to identify the specific pairs of metadata groups that contribute to the observed statistical difference.

• Pickett BE, et al. (2013) "Metadata-driven Comparative Analysis Tool for Sequences (meta-CATS): an Automated Process for Identifying Significant Sequence Variations Dependent on Differences in Viral Metadata." Virology, 447(1-2):45-51. doi: 10.1016/j.virol.2013.08.021.
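To make the statistical idea concrete, here is a simplified sketch of a per-column test. It is not the Meta-CATS code: it runs a Pearson chi-square test of independence on a residue-by-group table for one alignment column, splits sequences into groups with a year break point, and uses toy three-residue sequences. The names `chi2_sf`, `split_by_year`, and `column_test` are hypothetical. With counts this small the chi-square approximation is unreliable, so treat the p-values as illustrative.

```python
import math
from collections import Counter

def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable (integer dof >= 1)."""
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)                 # closed form for dof = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))      # closed form for dof = 1
    while a < dof / 2.0:                         # step up two dof at a time
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def split_by_year(records, breakpoint):
    """Group (year, aligned_sequence) pairs into <= breakpoint and > breakpoint."""
    groups = {"older": [], "recent": []}
    for year, seq in records:
        groups["older" if year <= breakpoint else "recent"].append(seq)
    return groups

def column_test(groups, col):
    """Return (statistic, dof, p_value) for one 0-based alignment column.

    Assumes every group is non-empty and all sequences have equal length.
    """
    counts = {g: Counter(seq[col] for seq in seqs) for g, seqs in groups.items()}
    residues = sorted(set().union(*counts.values()))
    if len(residues) < 2:
        return 0.0, 0, 1.0                       # invariant column, nothing to test
    total = sum(len(seqs) for seqs in groups.values())
    stat = 0.0
    for residue in residues:
        residue_total = sum(c[residue] for c in counts.values())
        for g, seqs in groups.items():
            expected = residue_total * len(seqs) / total
            stat += (counts[g][residue] - expected) ** 2 / expected
    dof = (len(residues) - 1) * (len(groups) - 1)
    return stat, dof, chi2_sf(stat, dof)

# Toy data: (collection year, aligned protein fragment)
records = [(2011, "MKR"), (2012, "MKR"), (2013, "MKR"), (2013, "MKQ"),
           (2014, "MRR"), (2014, "MRR"), (2015, "MRR"), (2015, "MRR")]
groups = split_by_year(records, breakpoint=2013)
results = {col + 1: column_test(groups, col) for col in range(3)}
for position, (stat, dof, p) in sorted(results.items(), key=lambda kv: kv[1][2]):
    print(position, round(stat, 3), dof, round(p, 4))
```

In this toy alignment only position 2 separates the two groups cleanly, so it sorts to the top when the results are ordered by p-value.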


f. The Meta-CATS landing page will be loaded. Here we will separate the input sequences into two groups: older H5N2 sequences as a group and H5N2 outbreak sequences as another group. We can do so by grouping the sequences by year. Choose “Auto Grouping”, and then select “Year” from the dropdown list. Now enter year break point “2013” to get groups of: (1) 2013 & before, and (2) >2013 (recent outbreak isolates). Keep the default settings for other parameters and click “Continue”.

g. On the next page, you will see that your sequences are grouped into two groups: Group 1 containing <=2013 sequences, and Group 2 containing 2014-2015 sequences. Verify the sequences in each group assignment. You can manually remove any sequence from the lists by selecting a sequence and then clicking “Remove”. Click “Run” when you are finished.

h. While the analysis is running, you can choose to save the analysis result to your own Workbench upon completion by typing an analysis name in the “Save Analysis to Workbench” box and clicking the “Save to Workbench” button. Now you can move to other parts of the site. The analysis result can be retrieved from your Workbench later.

i. The Meta-CATS analysis result has two reports: a Chi-square Goodness of Fit test result table listing the positions that have a significant non-random distribution between your specified groups, and a Pearson's chi-square test result table listing the specific pairs of groups that contribute to the observed statistical difference. Since this analysis only deals with two groups of sequences, the results in the two tables are identical.

j. Review the Chi-square test results to see the positions that differ significantly between the older and outbreak H5N2 isolates.

i. How many positions are significantly different between the two groups?

ii. Sort the results by p-value to bring the most significantly different positions to the top of the table. Which position has the most significant p-value?


k. To compare the Meta-CATS positions with positions characterized in H5N1 HA sequences, we will convert the numbering of the current analysis to the H5 numbering scheme based on an H5N1 reference strain. To do so, in the Reference Coordinate section, select “4: H5 A/Viet Nam/1203/2004(H5N1)” from the drop-down list.

l. The numbering from the custom alignment of the input sequences is now mapped to both the H5 HA0 and H5 HA1/HA2 numbering schemes based on A/Viet Nam/1203/2004(H5N1).

m. Save the analysis result to your Workbench by clicking the “Save Analysis” button.
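The renumbering in steps k and l can be understood as a walk along the reference sequence within the alignment. The sketch below shows the general idea only, using a toy aligned reference rather than A/Viet Nam/1203/2004(H5N1); the function name is hypothetical, and the HA1/HA2 scheme would additionally need the offsets of the mature chains, which are not shown here.

```python
def column_to_reference(aligned_reference, gap="-"):
    """Map 1-based alignment columns to 1-based reference positions.

    Columns where the reference has a gap (an insertion relative to the
    reference) have no reference coordinate and map to None.
    """
    mapping = {}
    position = 0
    for column, residue in enumerate(aligned_reference, start=1):
        if residue == gap:
            mapping[column] = None
        else:
            position += 1
            mapping[column] = position
    return mapping

# Toy aligned reference with a two-column gap
mapping = column_to_reference("MEK--IVL")
print(mapping)
```

This is why a position reported by Meta-CATS can shift once a reference coordinate is chosen: every gap in the aligned reference pushes later alignment columns ahead of the reference numbering.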

   

IV. Visualize protein sequence alignment

Now we are going to view the protein sequence alignment to confirm the Meta-CATS results and to verify clade relationships inferred from the phylogenetic analysis.

a. From the Meta-CATS Report page, click “Visualize Aligned Sequences” at the top of the page.

Note: You can also run an alignment on the saved sequence working set by navigating to the working set in your Workbench area and then clicking the “Visualize Aligned Sequences” option from the “Run Analysis” pull down menu.

b. The alignment is presented in the JalView visualization window. The window is interactive.

i. The consensus sequence is shown at the bottom of the window. You can choose to show sequence logos by right-clicking on consensus and then selecting “Show logo”.

ii. You can manually adjust the alignment and display using various gray menu options.

iii. Scroll right to the region of 335-345. Several amino acid substitutions and a 3-residue insertion are observed in all H5N2 outbreak isolates. This is the polybasic cleavage site that characterizes highly pathogenic avian influenza.


iv. We can change the View Option to “Conserved vs. reference” such that only the first sequence shows full characters; for the remaining sequences, only the nucleotides/residues differing from the reference sequence are shown as full characters.

v. You can download the input sequences or alignment in various formats, or save the alignment to your Workbench. Click “Save Analysis”, give it a name, and click “Save”.
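If you download the alignment, the consensus row and the “Conserved vs. reference” view can be reproduced with a few lines of code. This is a rough sketch, not JalView's implementation: the "." used for matching residues is an assumption, the function names are hypothetical, and the three short sequences are illustrative fragments rather than records from the working set.

```python
from collections import Counter

def consensus(alignment):
    """Most common character in each alignment column."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*alignment))

def conserved_vs_reference(alignment, match="."):
    """Show the first sequence in full; elsewhere show only differing characters."""
    reference = alignment[0]
    rows = [reference]
    for seq in alignment[1:]:
        rows.append("".join(match if a == b else a
                            for a, b in zip(seq, reference)))
    return rows

def variable_columns(alignment):
    """1-based columns where the sequences are not all identical."""
    return [i for i, column in enumerate(zip(*alignment), start=1)
            if len(set(column)) > 1]

# Illustrative fragments: two without and one with a 3-residue insertion
alignment = ["PQRETR---GLF",
             "PQRETR---GLF",
             "PQRERRRKRGLF"]

view = conserved_vs_reference(alignment)
for row in view:
    print(row)
print("consensus:", consensus(alignment))
print("variable columns:", variable_columns(alignment))
```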

V. Determine if the significant positions are located in Sequence Features

Sequence Features (SFs) are defined as interesting protein regions with known structural or functional properties. They are curated from the literature or obtained from other databases and validated by domain experts. Once a Sequence Feature region has been defined, the distinct amino acid sequences observed for that region in the sequence database are determined, and each is defined as a unique variant type. The reference strain is always Variant Type (VT)-1.

The Sequence Feature (SF) column in the Meta-CATS table provides a convenient linkout to a list of all Sequence Features that contain that amino acid position.

a. On the Meta-CATS report page, click the “View SF” link for position 344. How many Sequence Features are mapped to this position?

b. Click “View” for Influenza A_H5_SF456 to get to the Sequence Feature (SF) Details page.

[Figure. Labeled groups: N American H5N2 outbreak 2014-2015; N American H5N2 2010-2013]

i. This SF is a feature characteristic of highly pathogenic H5N1 influenza viruses. The polybasic cleavage site RXRR/RXKR is a major determinant of virulence and systemic spread in poultry and mammals, as shown by multiple independent studies. It begins at residue 339 and is 8 amino acids long.

ii. What is the reference strain used to define the position coordinates of this SF? What is the position number on the reference strain?

iii. How many Variant Types does this SF have?

iv. Click strain count for VT-4 to retrieve all sequences harboring this Variant Type. On the Sequence Feature Strains page, sort records by Collection Date. Are they from the current outbreak? Click “View” for a strain to access the Strain Details page. Look at H5 Clade annotation in the Strain Information section. Is it low pathogenic or highly pathogenic?

v. Return to the Sequence Feature Details page. All H5N2 outbreak isolates have RERRRK in this Sequence Feature. Now search for all strains harboring this combination. Click “Find a VT”, type in wildcard “?” at positions 344 and 345 and click “Search”.


vi. Did you find the VT for H5N2 outbreak isolates?

vii. Now sort the Variant Type list by Phenotypic Variant Type. What common feature is shared among all Phenotypic Variant Type - Yes Variant Types?

c. In the IRD Tree Viewer, you can also color tree leaves by SFVT. Return to the HA tree window, click “Advanced Decoration” to open up the dialog box. Choose “SFVT” in the Decorate By dropdown list. Type “Influenza A_H5_SF456” in the Sequence Feature ID box and click “Apply”. The North American H5N2 outbreak isolates and their closest BLAST hits belong to VT-3, while the older H5N2 isolates belong to either VT-4 or VT-12.
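The Variant Type concept and the “Find a VT” wildcard search can be sketched as follows. This is a toy illustration, not IRD's SFVT pipeline: the sequences, the feature coordinates (start 3, length 4), and the rule of numbering new variants in order of first appearance are all assumptions. Only the convention that the reference strain is VT-1 comes from the text above.

```python
def assign_variant_types(reference, sequences, start, length):
    """Number the distinct residue strings seen in a Sequence Feature region.

    start is 1-based. The reference variant is always VT-1; other variants
    are numbered in order of first appearance (an assumption of this sketch).
    """
    region = slice(start - 1, start - 1 + length)
    vt_ids = {reference[region]: 1}
    assignment = {}
    for name, seq in sequences.items():
        variant = seq[region]
        vt_ids.setdefault(variant, len(vt_ids) + 1)
        assignment[name] = f"VT-{vt_ids[variant]}"
    return vt_ids, assignment

def find_variants(vt_ids, pattern, wildcard="?"):
    """Return variants matching a pattern where the wildcard stands for any residue."""
    return sorted(v for v in vt_ids
                  if len(v) == len(pattern)
                  and all(p == wildcard or p == c for p, c in zip(pattern, v)))

# Toy data: a 4-residue feature starting at position 3
reference = "MKRERRGL"
sequences = {"s1": "MKRERRGL", "s2": "MKRETRGL", "s3": "MKRERRGL", "s4": "MKREKRGL"}

vt_ids, assignment = assign_variant_types(reference, sequences, start=3, length=4)
print(assignment)
print(find_variants(vt_ids, "RE?R"))
```

A wildcard at one position returns every variant that agrees at the remaining positions, which is the same narrowing you perform in the “Find a VT” dialog.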

[Figure: HA tree colored by SFVT. Labeled groups: N American H5N2 2010-2013; N American H5N2 2014-2015 & closest BLAST hits]

VI. Highlight variant positions and Sequence Features on protein structure

IRD imports experimentally-determined virus protein structures from the Protein Data Bank (PDB), integrates data from the Immune Epitope Database (IEDB) and UniProt, and provides various visualization options. To investigate the structural implications of the sequence variants identified by the Meta-CATS analysis, we are going to highlight the positions on an H5 protein structure.

a. From the gray navigation bar, mouse over “Search Data” and click “3D Protein Structures”.

b. Search for the 3D structures of influenza A H5 proteins.

• Subtype: H5

• Select Proteins to search: ☑ 4 HA

c. The Search Results page displays a list of matching structures. Click “View Structure” for 4BH3 (Viet Nam/1203 (H5N1)) to display the structure.

d. Now we are on the 3D protein structure viewer page. Click and drag with your mouse in the display window to change the focus point.

i. In the Display Options section, you can change the Display Type to line, stick, space, primary structure, secondary structure, etc. Choose “Secondary Structure in Cartoon”.

ii. In the Highlight Label Features section, choose “Chain” to color the structure by chain.

iii. In the Highlight Ligands section, choose a desired color and check the “Highlight Ligands” checkbox.

iv. In the Highlight Sequence Features section, choose “Functional” from the dropdown list, then select “Influenza A_H5_sialic-acid-binding-site_107(14)”.

v. In the Highlight By Swiss-Prot Position section, type in “171” which is a determinant of virulence and species adaptation, and then click “Highlight”.

vi. Click “Spin” to view the structure spinning. Then click “Rock” to rock the structure back and forth.

vii. The custom highlighted protein structure can be downloaded as an image by clicking “Save View As Image” beneath the image, or a 3D movie of either a spinning structure or a rocking structure by clicking “Generate Video”.


References

Canadian Food Inspection Agency. Avian influenza investigation in British Columbia - 2014/2015. Accessed on June 17, 2015 from: http://www.inspection.gc.ca/animals/terrestrial-animals/diseases/reportable/ai/2014-2015-ai-investigation-in-bc/eng/1418491040802/1418491095666

United States Department of Agriculture. Update on Avian Influenza Findings. Accessed on June 17, 2015 from: http://www.aphis.usda.gov/wps/portal/aphis/ourfocus/animalhealth/sa_animal_disease_information/sa_avian_health/ct_avian_influenza_disease/!ut/p/a1/lZLJbsIwEIafpQeOwUMWEnpjadlLVdSW5BI5jpNYJHbkOCD69DVLKyoVaH2bmW80_z9jFKAVCjjesBQrJjjO93HQDieLkdnqgTkeLjsPMH56e5x7M9dajGwN-BroD7sj250BgO2ZMB70RgO3MwcYt2_1v6MABYSrUmXIx2XGqpAIrihXYc4iieWuARUORS3DRJC6OkSYswLnYUZxrrLzTMwqiisaMp4IWRxMHMsbhvk3T9QpobG8pvwDfzXuxZSExciPzE4CbZMaltfChk0iy8AOdgzTiZM4JsS1LPdkHi68LvzJ_A9k-NDTyOPs2Z1OTJg6J-Dafg_AFQ2-FuleVNGx0fKfric3bfVQsGzVfn-735Ap5_15qidglRn7w6DV1YMdy2cHQ6srB9Ni0lxEh6_qd3lkeXqUpAmVVDZrqdOZUmV134AGbLfbZipEmtMmEUUDfmvJRKXQ6ieJyuK18KydsX7xwHLW7wMn38y6d3efvyh9NQ!!/?1dmy&urile=wcm%3apath%3a%2Faphis_content_library%2Fsa_our_focus%2Fsa_animal_health%2Fsa_animal_disease_information%2Fsa_avian_health%2Fsa_detections_by_states%2Fct_ai_pacific_flyway

Pasick J, et al. 2015. Reassortant highly pathogenic influenza A H5N2 virus containing gene segments related to Eurasian H5N8 in British Columbia, Canada, 2014. Sci Rep. 2015 Mar 25;5:9484. doi: 10.1038/srep09484.

Ip HS, et al. 2015. Novel Eurasian highly pathogenic avian influenza A H5 viruses in wild birds, Washington, USA, 2014. Emerg Infect Dis. 2015 May;21(5):886-90. doi: 10.3201/eid2105.142020.

Noronha JM, et al. Influenza virus sequence feature variant type analysis: evidence of a role for NS1 in influenza virus host range restriction. J Virol. 2012 May;86(10):5857-66. doi: 10.1128/JVI.06901-11. PMID: 22398283

Pickett BE, et al. Metadata-driven Comparative Analysis Tool for Sequences (meta-CATS): an Automated Process for Identifying Significant Sequence Variations Dependent on Differences in Viral Metadata. Virology, 2013, 447(1-2):45-51. doi: 10.1016/j.virol.2013.08.021. PMID: 24210098

Squires RB, et al. Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance. Influenza Other Respir Viruses, 2012, 6(6):404-16. doi: 10.1111/j.1750-2659.2011.00331.x. PMID: 22260278