Future Plans for the HGNC

Elspeth Bruford

HGNC Team

Elspeth Bruford

Susan Tweedie*

Ruth Seal

Kris Gray

Beth Yates

Welcome to Susan

* starting 8.12.2014

Funded Aims (2012-2017)

1. continue naming of human protein-coding genes, pseudogenes & RNA genes

2. continue reassignment of uninformative symbols based on functional data

3. coordinate gene naming across vertebrates

4. assign gene names within complex families across vertebrate species (olfactory receptors, cytochrome P450s)

Human Gene Naming

• Locus types: complex cases

  segregating pseudogenes: pseudogene/protein coding

  pseudogenes/lncRNAs

  smORFs: lncRNA/protein coding

  MT-RNR2: rRNA/protein coding

• Continue to display only one locus type per symbol?

Proteogenomics

• Proposed workshop with neXtProt, UniProt, Havana, RefSeq, HPA, PeptideAtlas etc. (Tress, Pandey, Rinn, Ponting, smORFs)

• Discuss complex cases and genes encoding UniProt PE3/4/5

NB we currently have 316 entries of locus type “unknown”

• Protein nomenclature – separate field? e.g. TACR3, “tachykinin receptor 3” vs “neuromedin K receptor”

Bidirectional promoters

Grzechnik et al 2014, TiBS

• lncRNA lies “head to head” (TSSs < 1 kb) with protein coding gene on antisense strand

• implications for transcription of both loci

• currently denoted in gene name, e.g. FOXG1-AS1, “FOXG1 antisense RNA 1 (head to head)”

• have proposed naming these as GENE1-AU1, “gene 1 antisense upstream RNA 1”
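
The head-to-head criterion and the proposed symbol format can be sketched in a few lines of Python. This is only an illustration, not an HGNC tool: the `Locus` record, the function names and the coordinates in the usage note are hypothetical, and the check covers only the two conditions stated above (opposite strands, TSSs less than 1 kb apart).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Hypothetical minimal record for a gene locus on one chromosome."""
    symbol: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate


def is_head_to_head(lncrna: Locus, coding: Locus, max_gap: int = 1000) -> bool:
    """Return True if the loci are on opposite strands with TSSs < max_gap bp apart."""
    if lncrna.strand == coding.strand:
        return False
    return abs(lncrna.tss - coding.tss) < max_gap


def proposed_symbol(coding_symbol: str, index: int = 1) -> str:
    """Build a symbol in the proposed GENE1-AU1 style."""
    return f"{coding_symbol}-AU{index}"


def proposed_name(coding_symbol: str, index: int = 1) -> str:
    """Build the matching name, e.g. 'GENE1 antisense upstream RNA 1'."""
    return f"{coding_symbol} antisense upstream RNA {index}"
```

For example, with made-up coordinates, `is_head_to_head(Locus("LNC", "-", 1000), Locus("GENE1", "+", 1500))` is true, and `proposed_symbol("GENE1")` gives `GENE1-AU1`.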

Human Phenotype Naming

• Historically HGNC maintained symbols for mapped phenotypes where causative gene was unknown

• Many of these fall into series, e.g. MRX#, DFN#, JBTS#

• Now only approve new phenotypic symbols upon request, usually via direct contact from researchers

• Allows researchers to reserve provisional symbols that can be confirmed upon acceptance of ms

• In the last decade OMIM have started assigning new members to these series without notification/consultation

• Many have been added retrospectively

• Some confusion caused, attempts made to minimise this but…

• …OMIM have told us they want to take over assignment

Human Phenotype Ontology

• Being developed by Peter Robinson et al in Berlin

• Standardized vocabulary of phenotypic abnormalities encountered in human disease

• ~10,000 terms, over 50,000 annotations to inherited diseases

• Created using information from literature, Orphanet, DECIPHER & OMIM

• CEP290: mutations can cause multiple phenotypes

• CEP290 has 143 HPO terms

• HPO PhenExplorer

• How/where to display 143 terms?

• Likewise, should we include GO terms?

Renaming

• Literature searches

• Comparison with UniProt names

• CFAP# genes example – working with expert: 19 genes renamed, 19 aliased

Gene Naming Across Vertebrates

• What species? chimp > dog > cow > pig > horse > ?

• GeneFam – website branding

• Database ID format: e.g. HGNCGF:#, GeneFam:#, HGNC:PTRO# ?

Semi-automated naming of consensus 1:1 orthologs as identified by 4 comprehensive orthology resources:

• Ensembl Compara

• HomoloGene

• OMA

• PANTHER
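
One way to picture "consensus 1:1 orthologs" is as the intersection of the 1:1 calls made by each resource. The sketch below assumes each resource's predictions have already been reduced to (human symbol, other-species gene ID) pairs; the function names, the input layout and the gene IDs in the usage note are hypothetical and do not describe the actual HGNC pipeline.

```python
from collections import defaultdict

RESOURCES = ("Ensembl Compara", "HomoloGene", "OMA", "PANTHER")


def one_to_one(pairs):
    """Keep pairs whose human gene and ortholog each occur exactly once."""
    human_count = defaultdict(int)
    other_count = defaultdict(int)
    for human, other in pairs:
        human_count[human] += 1
        other_count[other] += 1
    return {
        (human, other)
        for human, other in pairs
        if human_count[human] == 1 and other_count[other] == 1
    }


def consensus_orthologs(predictions):
    """Return {human_symbol: ortholog_id} for pairs called 1:1 by all four resources.

    predictions: dict mapping each name in RESOURCES to an iterable of
    (human_symbol, ortholog_id) pairs (an assumed input format).
    """
    per_resource = [one_to_one(set(predictions[name])) for name in RESOURCES]
    agreed = set.intersection(*per_resource)
    return dict(sorted(agreed))
```

A human gene is named automatically only if every resource reports the same single partner; any disagreement, missing call or one-to-many relationship drops the gene from the result, leaving it for manual curation.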

HCOP – HGNC Comparison of Orthology Predictions tool

compares human to 17 species: chimp, macaque, mouse, rat, dog, horse, cow, pig, opossum, platypus, chicken, Anole lizard, Xenopus, zebrafish, C. elegans, Drosophila and S. cerevisiae

using data from 12 resources

Text mining

• Recent study by EBI Literature Services team

• Looked at gene symbol usage in full text articles in EuropePMC

• In 2006 Tamames & Valencia article estimated usage at 30% HGNC symbols vs 70% synonyms

• In 2014: 70% HGNC symbols vs 30% synonyms

• Awaiting data on the recalcitrant 30%

  can we update to what is being used?
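
The kind of measurement behind these percentages can be illustrated with a toy calculation. This is not the EBI team's method: the tokenizer, the function name and the input mapping are assumptions, and real text mining needs disambiguation that simple token matching cannot provide. The symbols in the usage note are the ones cited elsewhere in these slides.

```python
import re
from collections import Counter


def approved_share(text, approved_to_synonyms):
    """Return the fraction of recognised gene mentions that use an approved symbol.

    approved_to_synonyms: dict mapping an approved symbol to a list of its
    synonyms (an assumed input format). Returns None if no mention is found.
    """
    lookup = {}
    for approved, synonyms in approved_to_synonyms.items():
        lookup[approved] = "approved"
        for synonym in synonyms:
            # An approved symbol always takes precedence over a synonym entry.
            lookup.setdefault(synonym, "synonym")

    counts = Counter()
    # Naive tokenizer: runs of letters, digits and hyphens.
    for token in re.findall(r"[A-Za-z0-9][A-Za-z0-9-]*", text):
        kind = lookup.get(token)
        if kind is not None:
            counts[kind] += 1

    total = counts["approved"] + counts["synonym"]
    return counts["approved"] / total if total else None
```

Given the mapping `{"LBX2-AS1": ["AK096725"], "LINC01373": ["lincRNA-ENST00000515084"], "CYP51A1-AS1": ["ENST00000453068"]}`, a sentence that mentions AK096725 once and each approved symbol once would score 0.75.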

Journals

• HGNC have checked manuscripts for Genomics since the 1990s (or earlier?)

• 113 mss in the last year

• impact factor in 1994: 5.037 > now: 2.793

• possible alternatives: PLOS One? (IF 2013 = 3.534)

  publishes ncRNA papers

  e.g. in June, PMID 24905231 cited AK096725 (LBX2-AS1) and ENST00000453068 (now CYP51A1-AS1)

  in May, PMID 24879036 cited lincRNA-ENST00000515084 (now LINC01373) - NB ENST00000515084 is a retired ID

  BUT huge no. of submissions, how to filter/identify, speed, logistics etc

• other suggestions?

Creative Commons?

• Public copyright licence, now on version 4.0

• Different types

  • allow adaptations of your work to be shared?

    • yes

    • no

    • yes, as long as others share alike

• CC0 – public domain dedication

Complex Gene Families

• Olfactory receptors – Tsviya

• Cytochrome P450s – Jed & David
