19 July 2011 Richard H. Scheuermann, Ph.D. Department of Pathology


Sequence Feature Variant Type and Evolutionary Trajectory Analysis using the Influenza Research Database (IRD). 19 July 2011. Richard H. Scheuermann, Ph.D., Department of Pathology, U.T. Southwestern Medical Center. Outline: brief overview of the NIAID-sponsored Influenza Research Database (IRD).


<p>Slide 1</p>
<p>Sequence Feature Variant Type and Evolutionary Trajectory Analysis using the Influenza Research Database (IRD). 19 July 2011</p>
<p>Richard H. Scheuermann, Ph.D., Department of Pathology, U.T. Southwestern Medical Center</p>
<p>www.fludb.org</p>
<p>Outline. Brief overview of the NIAID-sponsored Influenza Research Database (IRD): comprehensive integrated database; analysis and visualization tools; U.S. NIH-funded, free access, open to all; developed by a team of research scientists, bioinformaticians and professional software developers; www.fludb.org (www.viprbrc.org for other human viral pathogens). Novel approach to genotype-phenotype association studies: Sequence Feature Variant Type (SFVT) analysis. Evolutionary Trajectory analysis of the pandemic (H1N1) 2009 strain.</p>
<p>Public Health Impact of Influenza. Seasonal flu epidemics occur yearly during the fall/winter months and result in 3-5 million cases of severe illness worldwide. More than 200,000 people are hospitalized each year with seasonal flu-related complications in the U.S. Approximately 36,000 deaths occur due to seasonal flu each year in the U.S. Populations at highest risk are children under age 2, adults age 65 and older, and groups with other comorbidities. Pandemics: 1918 Spanish flu (H1N1), 20-100 million deaths; 1957 Asian flu (H2N2), 1-1.5 million deaths; 1968 Hong Kong flu (H3N2), 750,000-1 million deaths; 2009 swine-origin (H1N1), &gt; 16,000 deaths as of March 2010. Source: World Health Organization - http://www.who.int/mediacentre/factsheets/fs211/en/index.html</p>
<p>Influenza Virus. Orthomyxoviridae family; negative-strand RNA; segmented; enveloped; 8 RNA segments encode 11 proteins; classified based on serology of HA and NA.</p>
<p>IRD Overview</p>
<p>Data from both public archives (e.g. 
GenBank, PDB) and novel data derived by IRD through core data analysis and manual curation. GenBank data are loaded on a daily basis; other loads are based on the data refresh frequencies of the source archives.</p>
<p>Search Access to Data. Data are accessed through optimized search interfaces.</p>
<p>Data Types. Search pages for the various data types.</p>
<p>Core Query Attributes. Commonly used search criteria.</p>
<p>Advanced Query Options. A variety of advanced search options.</p>
<p>Segment search results</p>
<p>Analysis and Visualization. Link to the list of analysis and visualization tools.</p>
<p>Analysis and Visualization Tools. Current analysis tools are focused on comparative genomics, with emphasis placed on data integration for visualization.</p>
<p>Workbench Access. Link to a personal workbench for saving working sets of sequence and surveillance records, and analysis results.</p>
<p>My Private Workbench. Example of my workbench showing surveillance, segment and protein working sets as well as SNP analysis results. In the left panel, note the sharing function in the Access panel.</p>
<p>Example of data integration in 3D protein visualization. (A) Various custom display options. (B) Ribbon diagram of influenza hemagglutinin in complex with a neutralizing antibody. (C) Sequence conservation highlighted; red residues are hypervariable among different virus isolates. (D) Location of an antibody epitope added and highlighted in yellow; note that the antibody epitope corresponds to a hypervariable region (red in panel C).</p>
<p>Phylogenetic trees of H4 surveillance sequences with custom coloring based on year of isolation. 
Panels B and C are zoomed-in views of sections of the tree shown in panel A. Based on these trees, viruses isolated from shore birds (Ruddy turnstone) are more closely related to viruses isolated from Alberta duck species than to those from the Minnesota, North Dakota, Texas duck lineage (panel C). Note that this is not intended to be a definitive study but rather to illustrate IRD functionality. Other options for coloring, in addition to year of isolation, include country of isolation, HA subtype, NA subtype, host species, and SFVT.</p>
<p>Multiple sequence alignment of H4 surveillance sequences with custom coloring based on year of isolation. Red arrows indicate positions conserved between viruses isolated from shore birds (Ruddy turnstone) and Alberta duck species, supporting their common origin, in contrast to viruses from the Minnesota, North Dakota, Texas duck lineage.</p>
<p>IRD Summary. Funded by the U.S. National Institute of Allergy and Infectious Diseases (NIAID); free and open access with no use restrictions; developed by a team of research scientists, bioinformaticians and professional software developers; comprehensive collection of public data; novel derived data, novel analytical tools, unique functions; integration, integration, integration.</p>
<p>Novel approach to genotype-phenotype association studies: Sequence Feature Variant Type (SFVT) Analysis</p>
<p>Limitations to Phylogenetics. Traditional virus phylogenetics focuses on comparative analysis of whole genomes or genome segments, and is most useful for understanding virus evolution. However, the genetic determinants of important viral phenotypes, e.g. 
virulence, host range, replication efficiency, immune response evasion, etc., are determined by focused functional regions of viral proteins. Therefore, specific genotype-phenotype associations can be masked by other evolutionary factors that contribute to traditional phylogenetic analysis.</p>
<p>SFVT approach. Variant types shown for sequence feature Influenza A_NS1_nuclear-export-signal_137(10): VT-1 IFDRLETLIL; VT-2 IFNRLETLIL; VT-3 IFDRLETIVL; VT-4 LFDQLETLVS; VT-5 IFDRLENLTL; VT-6 IFNRLEALIL; VT-7 IYDRLETLIL; VT-8 IFDRLETLVL; VT-9 IFDRLENIVL; VT-10 IFERLETLIL; VT-11 LFDQMETLVS.</p>
<p>Identify regions of a protein/gene with known structural or functional properties, called Sequence Features (SFs): e.g. an alpha-helical region, the binding site for another protein, an enzyme active site, an immune epitope. Determine the extent of sequence variation for each SF by defining each unique sequence as a Variant Type (VT). High-level, comprehensive grouping of all virus strains by VT membership for each SF independently. Genotype-phenotype association statistical analysis, e.g. 
genetic determinants of host range, virulence, replication rate. Figure label: Influenza A_NS1_alpha-helix_171(17).</p>
<p>Influenza A NS1 protein (PDB 2GX9) crystal structure showing the Nuclear Export Signal Sequence Feature (SF) highlighted in red and the alpha-helix SF highlighted in green. Amino acid alignment with colors showing variation within the nuclear export signal region. Each sequence with 1+ substitutions comprises a unique fingerprint or Variant Type (VT). A set of unique sequence substitutions existing within any defined region is a sequence feature variant type (SFVT). Statistical analyses on SFVTs can identify genotype-phenotype relationships.</p>
<p>SF definition. Based on experimentation reported in the literature and 3D protein structures (PDB records); captured by manual curation; defined by the specific amino acid positions in the polypeptide chain; annotated with the known structural or functional properties.</p>
<p>Influenza A Sequence Features as of 18JUL2011: 4128 SFs total</p>
<p>NS1 Sequence Features</p>
<p>SF8 (nuclear export signal)</p>
<p>VT for SF8 (nuclear export signal)</p>
<p>VT-1 strains</p>
<p>Do variations in NS1 sequence features influence influenza virus host range? 
</p>
<p>NS1 Sequence Features</p>
<p>VT for SF8 (nuclear export signal)</p>
<p>VT distribution by host</p>
<p>Causes of apparent NS1 VT-associated host range restriction. Virus spread - capability + opportunity: phenotypic property of the virus - limited capacity; restricted founder effect - limited opportunity; restricted spatial-temporal distribution. Sampling bias - assumption of random sampling: oversampling - avian H5N1 in Asia, 2009 H1N1; undersampling - large and domestic cats. Linkage to causative variant.</p>
<p>VT-11 strains</p>
<p>VT for SF8 (nuclear export signal)</p>
<p>VT lineages</p>
<p>VT-4 lineage</p>
<p>VT-4 lineage = B allele/group</p>
<p>VT-16 &amp; VT-9 lineages</p>
<p>VT-7 lineage</p>
<p>Evolutionary Trajectory analysis of the pandemic (H1N1) 2009 strain</p>
<p>Phylogenetic Analysis. Evolutionary origin: select a representative pandemic (H1N1) 2009 sequence from the IRD database; BLAST to identify the most similar sequences; assess phylogenetic relationships.</p>
<p>Pandemic (H1N1) 2009 selection. Search for all Influenza A Segment 4 sequence records from Human H1N1 2009. As of 29AUG2009 this search returns 993 sequence records in the IRD. Select A/California/04/2009, which was one of the original index cases of pandemic (H1N1) 2009 in the U.S. Run a BLAST analysis using this query sequence.</p>
<p>BLAST Result. Select the top 1000 hits and save them as a working set in my own personal workbench.</p>
<p>Segment 1 phylogenetic tree. Tree labels: Swine/Ohio/2004; Duck/USA/2000s; Human/USA/2007 (seasonal); Swine/USA/1990s; Pandemic (H1N1) 2009. Trees suggest that Segment 1 of pandemic (H1N1) 2009 (a.k.a. swine) is most like one of two North American swine lineages - Swine/Ohio/2004-2007. Segment 1 of pandemic (H1N1) 2009 (a.k.a. 
swine) is quite distinct from human seasonal H1N1 flu from previous years (e.g. 2007, 2008) and from the other swine flu lineage circulating in the USA in the 1990s. Similar comparative relationships are seen with Segments 1, 2, 3.</p>
<p>Temporal component. Reference strain: A/California/04/2009. BLAST; return the top 1000 results; normalize the data; graph nucleotide differences versus isolation year differences.</p>
<p>NP chart</p>
<p>NS chart</p>
<p>HA chart</p>
<p>Chart labels: Group 1, Group 2, Group 3</p>
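
The "Temporal component" slide lists its steps only as bullet points, so the following is a minimal sketch of how the points for such a chart could be computed, not the IRD implementation. It assumes the BLAST hits are already aligned to the reference, and it reads "normalize data" as dividing the mismatch count by the number of compared positions; both are assumptions. The `Isolate` type, the function names, the hit names and the ten-letter sequences are all made up for illustration; only the reference strain name comes from the slides.

```python
from typing import NamedTuple


class Isolate(NamedTuple):
    name: str
    year: int          # year of isolation
    aligned_seq: str   # nucleotide sequence, pre-aligned to the reference


def fraction_different(ref: str, other: str) -> float:
    """Fraction of aligned, non-gap positions at which two sequences differ."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ref.upper(), other.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def trajectory_points(reference: Isolate, hits: list[Isolate]) -> list[tuple[int, float]]:
    """One (isolation-year difference, normalized nucleotide difference) pair per hit."""
    return [
        (reference.year - hit.year,
         fraction_different(reference.aligned_seq, hit.aligned_seq))
        for hit in hits
    ]


# Toy data: the sequences are invented and far shorter than a real segment.
reference = Isolate("A/California/04/2009", 2009, "ATGGCATTAC")
hits = [
    Isolate("hypothetical swine isolate 2004", 2004, "ATGGCATTGC"),
    Isolate("hypothetical swine isolate 1995", 1995, "ATGACATTGC"),
    Isolate("hypothetical human seasonal isolate 2007", 2007, "TTGACGTTGC"),
]
points = trajectory_points(reference, hits)
for years_apart, diff in points:
    print(f"{years_apart:>3} years apart: {diff:.2f} differences per site")
```

Each resulting pair can be plotted with any charting library, with the year difference on one axis and the normalized difference on the other. In this toy data the hypothetical seasonal isolate is close in time but distant in sequence, which is the kind of separation between groups that the slides describe.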

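
The SFVT steps described in the slides (extract the residues of a Sequence Feature, treat each unique sequence as a Variant Type, then group strains by VT) can be sketched roughly as below. This is an illustration under stated assumptions rather than the IRD's method. The strain names, hosts, flanking residues and the SF positions 3-12 are hypothetical. The three 10-residue segments are taken from the VT list on the SFVT slide. Numbering VTs by decreasing frequency is an assumption of this sketch, so its labels need not match the IRD's VT numbers.

```python
from collections import Counter, defaultdict


def extract_feature(protein: str, positions: list[int]) -> str:
    """Concatenate the residues found at the 1-based positions of a Sequence Feature."""
    return "".join(protein[p - 1] for p in positions)


def assign_variant_types(proteins: dict[str, str], positions: list[int]) -> dict[str, str]:
    """Map each strain to a VT label; each unique SF sequence gets one label."""
    segments = {name: extract_feature(seq, positions) for name, seq in proteins.items()}
    counts = Counter(segments.values())
    # Assumption: most frequent variant is VT-1; ties are broken alphabetically.
    ordered = sorted(counts, key=lambda seg: (-counts[seg], seg))
    label = {seg: f"VT-{i}" for i, seg in enumerate(ordered, start=1)}
    return {name: label[seg] for name, seg in segments.items()}


# Hypothetical toy proteins: two flanking residues on each side of a 10-residue SF.
proteins = {
    "strain_a": "MDIFDRLETLILKV",
    "strain_b": "MDIFDRLETLILKV",
    "strain_c": "MDIFNRLETLILKV",
    "strain_d": "MDLFDQLETLVSKV",
    "strain_e": "MDIFDRLETLILRV",  # differs only outside the SF
}
hosts = {
    "strain_a": "Avian",
    "strain_b": "Human",
    "strain_c": "Avian",
    "strain_d": "Avian",
    "strain_e": "Swine",
}
sf_positions = list(range(3, 13))  # hypothetical SF covering residues 3-12

vts = assign_variant_types(proteins, sf_positions)

# Tabulate VT membership by host, the raw material for an association test.
by_host: dict[str, Counter] = defaultdict(Counter)
for strain, vt in vts.items():
    by_host[vt][hosts[strain]] += 1

for vt in sorted(by_host):
    print(vt, dict(by_host[vt]))
```

Because variation outside the feature is ignored, `strain_e` falls into the same VT as `strain_a` and `strain_b`. A table like `by_host` is what a genotype-phenotype association test would take as input, and the sampling-bias caveats from the slides apply to it directly.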
