

Statistical Tool for Identifying Sequence Variations that Correlate with Virus Phenotypic Characteristics in the Virus Pathogen Resource (ViPR)

Brett E. Pickett¹, Douglas S. Greer¹, Yun Zhang¹, Liwei Zhou², Sanjeev Kumar², Sam Zaremba², Chris Larsen³, Edward B. Klem², Richard H. Scheuermann¹

¹J. Craig Venter Institute, San Diego, CA; ²Northrop Grumman Health Solutions, Rockville, MD; ³Vecna Technologies, Greenbelt, MD.

Introduction

Figure 2: Screenshots of the Ortholog Group Component. Users can search for orthologs using various criteria (left) and then browse the results according to ortholog group (right). Of the 493 gD protein orthologs predicted by ViPR, 39 (HHV-1) and 25 (HHV-2) non-redundant sequences were included in this analysis.

[1] Pickett, B.E., et al. (2012) ViPR: an open bioinformatics database and analysis resource for virology research. Nucl. Acids Res. 40(D1): D593-D598.

We would like to thank the primary data providers for the data that was used throughout this study. We also recognize the scientific and technical personnel responsible for supporting and developing ViPR, which has been wholly supported with federal funds from the NIH/NIAID (N01AI2008038 and N01AI40041 to R.H.S.).

Figure 6: 3D Protein Structure Viewer in ViPR. A display of a 3D protein structure for HHV-1 glycoprotein D complexed with Nectin-1. Residue 48 (cyan) and an epitope comprising residues 77-87 (green) are highlighted (PDB ID: 3U82).

ViPR can assist in various comparative genomics analyses. As an example use case, we identified 2 significant sequence variations that:

• Have diverged through speciation between HHV-1 and HHV-2
• Overlap with known B-Cell epitopes
• Could vary in response to external pressure(s) while retaining the ability to bind and enter host cells following speciation

In conclusion, the ViPR resource combines a powerful database with integrated bioinformatics tools to perform computational analyses and assist in hypothesis generation. The uniqueness of ViPR lies in:

• integrating data from various sources
• capturing unique data on the host response to virus infection
• combining necessary tools to perform analytical workflows
• allowing data sharing and storage with collaborators

Figure 1: A screenshot of the ViPR homepage. The ViPR homepage is the portal used to access the various types of data and advanced functionality for any supported virus family.

The Virus Pathogen Database and Analysis Resource (ViPR, www.viprbrc.org), sponsored by the National Institute of Allergy and Infectious Diseases, serves as a single publicly accessible repository of integrated datasets and analysis tools for 14 different virus families, including Herpesviridae. ViPR supports wet-bench virology research focusing on the development of diagnostics, prophylactics, vaccines, and treatments for these pathogens [1].

The usefulness of the ViPR system can be demonstrated by using a scientific use case. Here we examine the sequence variation existing within the glycoprotein D (gD) protein in Human Herpesvirus-1 (HHV-1) and Human Herpesvirus-2 (HHV-2).

ViPR Supports 14 Virus Families

ViPR Integrates Data from Many Sources

• GenBank sequence records, gene annotations, and strain metadata
• Protein Data Bank (PDB) 3D protein structures
• Immune epitopes from the Immune Epitope Database (IEDB)
• Clinical data
• Host Factor Data generated from the NIAID Systems Biology projects and the ViPR-funded Driving Biological Projects
• UniProtKB protein annotations
• Gene Ontology (GO) classifications
• Additional data derived from computational algorithms

ViPR Provides Analysis and Visualization Tools

• Multiple Sequence Alignment
• Phylogenetic Tree Construction
• Sequence Polymorphism Analysis
• Metadata-driven Comparative Genomics Statistical Analysis
• Genome Annotator
• GBrowse Genome Viewer
• Sequence Format Conversion
• BLAST Sequence Similarity Search
• 3D Protein Structure Visualization
• Sequence Feature Variant Types
• Ortholog Group Assignments

ViPR enables you to store and share data and results through the ViPR Workbench

Figure 4: Alignment of gD Amino Acid Sequences. HHV-1 (white) and HHV-2 (gray) gD sequences show a high degree of divergence towards the N-terminus of the protein. Blue arrows highlight a subset of significant positions.

Phylogenetic Tree

3D Protein Structure Viewer

Summary

Acknowledgements

References

Protein Ortholog Search

• ViPR groups viral proteins together based on their predicted orthology within a virus taxon to facilitate gene/protein search, gene function inference, and virus evolution research. These orthologous groups can then be queried intuitively.

• A search for orthologs of the US6 gene, which codes for Glycoprotein D (gD), was performed. Non-redundant HHV-1 and HHV-2 sequences were selected for more in-depth analysis.
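The selection of non-redundant sequences can be sketched as a simple de-duplication on the sequence string. This is only an illustration: the poster does not say how ViPR defines redundancy, so exact string identity is an assumption, and the strain names and sequences below are hypothetical.

```python
def non_redundant(records):
    """Return records with exact duplicate sequences removed (first one wins).

    Assumption: two records are redundant only if their sequences are
    identical after upper-casing. ViPR's actual criterion is not stated.
    """
    seen = set()
    unique = []
    for name, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, seq))
    return unique


# Hypothetical records, not real gD sequences.
records = [
    ("strain_A", "MGGAAARLGA"),
    ("strain_B", "MGGAAARLGA"),  # same sequence as strain_A, so it is dropped
    ("strain_C", "MGRLTSGVGT"),
]
unique = non_redundant(records)
unique_names = [name for name, _ in unique]
```

Keeping the first record for each distinct sequence preserves input order, which makes the result reproducible across runs.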

Metadata-driven Comparative Genomics Statistical Analysis

Figure 5: Results from the Metadata-driven Comparative Analysis Tool for Sequences (Meta-CATS). Abridged Meta-CATS output comparing HHV-1 and HHV-2. Residues located within experimentally determined positive B-Cell epitopes found in ViPR are underlined.
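Checking whether a significant position falls inside a known epitope reduces to an interval test. The sketch below assumes that alignment positions and epitope coordinates use the same residue numbering, which the poster does not confirm. The function name is illustrative, the 77-87 range comes from the Figure 6 caption, and position 80 is a hypothetical input.

```python
def positions_in_epitopes(positions, epitopes):
    """Map each position to the names of epitopes whose inclusive range contains it."""
    return {
        pos: [name for name, (start, end) in epitopes.items() if start <= pos <= end]
        for pos in positions
    }


# Only the single epitope named in the Figure 6 caption is listed here.
epitopes = {"gD 77-87": (77, 87)}
# 48 is the residue highlighted in Figure 6; 80 is a hypothetical position.
positions = [48, 80]
hits = positions_in_epitopes(positions, epitopes)
```

Because only one epitope is included, an empty list for a position does not mean that no epitope in ViPR covers that residue.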

• A multiple sequence alignment (MSA) can be calculated directly from search results, a working set, or custom uploaded sequences.

Multiple Sequence Alignment

Figure 3: A Phylogenetic Tree Reconstruction of HHV-1 and HHV-2 sequences. The distance-based FastME algorithm, as implemented in ViPR, was used to generate a phylogenetic tree of all 64 HHV-1 (red) and HHV-2 (blue) amino acid sequences.
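A distance-based method such as FastME starts from a matrix of pairwise distances between aligned sequences. The sketch below computes only that input and does not reproduce FastME. It uses the p-distance (the fraction of compared sites that differ), which is an assumption, since the poster does not state which distance ViPR uses. The sequence names and residues are hypothetical.

```python
def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Pairwise p-distances keyed by (name, name) for a dict of aligned sequences."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


# Hypothetical aligned fragments, not real gD sequences.
aligned = {
    "seq_1": "MGGAA",
    "seq_2": "MGRAA",
    "seq_3": "MG-AA",
}
matrix = distance_matrix(aligned)
```

Skipping gapped sites is one common convention. Other tools treat gaps differently, so distances may not match the output of any particular program.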

• ViPR uses an automated pipeline to identify sequence variations that significantly differ between groups of strains.

Position  Chi-square Value  P-value   Degrees of Freedom  Residue Diversity (Group 1)  Residue Diversity (Group 2)
3         59.864            1.02E-14  1                   39 G                         25 R
17*       63.993            1.27E-14  2                   36 I, 3 L                    25 A
32        59.864            1.02E-14  1                   39 A                         25 P
46        59.864            1.02E-14  1                   39 D                         25 N
354       32.708            1.07E-08  1                   39 A                         8 A, 17 V
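The statistics reported for each position are consistent with a chi-square test of independence on a table of residue counts per group. The sketch below is an illustration, not the Meta-CATS implementation. The function names are made up, and applying Yates' continuity correction to 2x2 tables is an assumption chosen because it lands close to the reported values.

```python
import math
from collections import Counter


def contingency_table(group1, group2):
    """Rows are the two groups; columns are the residues seen at one position."""
    counts1, counts2 = Counter(group1), Counter(group2)
    residues = sorted(set(counts1) | set(counts2))
    return [[counts1[r] for r in residues], [counts2[r] for r in residues]]


def chi_square(table, yates=True):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    Assumptions: both groups are non-empty, and Yates' continuity
    correction is applied only when the table has one degree of freedom.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    correction = 0.5 if (yates and df == 1) else 0.0
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            diff = max(0.0, abs(observed - expected) - correction)
            stat += diff * diff / expected
    return stat, df


def chi_square_sf(stat, df):
    """Upper-tail probability of a chi-square variable (closed form, integer df)."""
    if df < 1:
        return 1.0  # a fully conserved column gives no evidence of a difference
    half = stat / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum of x^k / (2k+1)!!
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= stat / (2 * k + 1)
        total += term
    tail = math.sqrt(2.0 * stat / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


# Position 3: 39 G in Group 1 versus 25 R in Group 2.
stat, df = chi_square(contingency_table(["G"] * 39, ["R"] * 25))
p_value = chi_square_sf(stat, df)
```

Under these assumptions the statistics for positions 3, 17, and 354 come out within about 0.01 of the reported values, but they do not match exactly, so the pipeline may apply a slightly different adjustment.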

• ViPR can generate phylogenetic trees from search results, a multiple sequence alignment, a working set, or custom uploaded sequences.

• ViPR provides multiple data types for viewing on a 3D structure.

• Arenaviridae
• Bunyaviridae
• Caliciviridae
• Coronaviridae
• Filoviridae
• Flaviviridae
• Hepeviridae
• Herpesviridae
• Paramyxoviridae
• Picornaviridae
• Poxviridae
• Reoviridae
• Rhabdoviridae
• Togaviridae