14
Molecular Immunology 40 (2004) 647–660 Review IMGT-ONTOLOGY and IMGT databases, tools and Web resources for immunogenetics and immunoinformatics Marie-Paule Lefranc a,b,a Laboratoire d’ImmunoGénétique Moléculaire, LIGM, Institut de Génétique Humaine IGH, Université Montpellier II, UPR CNRS 1142, 141 rue de la Cardonille, 34396 Montpellier Cedex 5, France b Institut Universitaire de France, France Received 18 June 2003; received in revised form 2 September 2003; accepted 16 September 2003 Abstract The international ImMunoGeneTics information system ® (IMGT; http://imgt.cines.fr), is a high quality integrated information system specialized in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC), and related proteins of the immune system (RPI) of human and other vertebrates, created in 1989, by the Laboratoire d’ImmunoGénétique Moléculaire (LIGM; Université Montpellier II and CNRS) at Montpellier, France. IMGT provides a common access to standardized data which include nucleotide and protein sequences, oligonucleotide primers, gene maps, genetic polymorphisms, specificities, 2D and 3D structures. IMGT consists of several sequence databases (IMGT/LIGM-DB, IMGT/MHC-DB, IMGT/PRIMER-DB), one genome database (IMGT/GENE-DB) and one 3D structure database (IMGT/3Dstructure-DB), interactive tools for sequence analysis (IMGT/V-QUEST, IMGT/JunctionAnalysis, IMGT/PhyloGene, IMGT/Allele-Align), for genome analysis (IMGT/GeneSearch, IMGT/GeneView, IMGT/LocusView) and for 3D struc- ture analysis (IMGT/StructuralQuery), and Web resources (“IMGT Marie-Paule page”) comprising 8000 HTML pages. IMGT other ac- cesses include SRS, FTP, search by BLAST, etc. By its high quality and its easy data distribution, IMGT has important implications in medical research (repertoire in autoimmune diseases, AIDS, leukemias, lymphomas, myelomas), veterinary research, genome diversity and genome evolution studies of the adaptive immune responses, biotechnology related to antibody engineering (single chain Fragment variable (scFv), phage displays, combinatorial libraries) and therapeutical approaches (grafts, immunotherapy). IMGT is freely available at http://imgt.cines.fr. © 2003 Elsevier Ltd. All rights reserved. Keywords: IMGT; Database; Immunoglobulin; T cell receptor; MHC; HLA; Immunogenetics; 3D structure; Web resources; Leukemia; Lymphoma 1. Introduction The molecular synthesis and genetics of the immunoglob- ulin (IG) and T cell receptor (TR) chains is particularly complex and unique as it includes biological mechanisms such as DNA molecular rearrangements in multiple loci (three for IG and four for TR in humans) located on different chromosomes (four in humans), nucleotide dele- tions and insertions at the rearrangement junctions (or N-diversity), and somatic hypermutations in the IG loci (for review Lefranc and Lefranc, 2001a,b). The number of potential protein forms of IG and TR is almost un- limited. Owing to the complexity and high number of published sequences, data control and classification and Tel.: +33-4-9961-9965; fax: +33-4-9961-9901. E-mail address: [email protected] (M.-P. Lefranc). URL: http://imgt.cines.fr. detailed annotations are a very difficult task for the gen- eralist databanks such as EMBL (UK) (Stoesser et al., 2003), GenBank (USA) (Benson et al., 2003), and DDBJ (Japan) (Miyazaki et al., 2003). These observations were the starting point of the international ImMunoGeneT- ics information system® (IMGT; http://imgt.cines.fr) (Lefranc, 2003a), created in 1989, by the Laboratoire d’ImmunoGénétique Moléculaire (LIGM; Université Mont- pellier II and CNRS) at Montpellier, France. IMGT is a high quality integrated information system specialized in IG, TR, major histocompatibility complex (MHC) and related proteins of the immune system (RPI) of hu- man and other vertebrate species (Giudicelli et al., 1997; Lefranc et al., 1998, 1999; Ruiz et al., 2000; Lefranc, 2001a, 2002, 2003a,b). IMGT consists of several se- quence databases (IMGT/LIGM-DB, IMGT/MHC-DB, IMGT/PRIMER-DB, IMGT/PROTEIN-DB, this last one in development), one genome database (IMGT/GENE-DB) 0161-5890/$ – see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.molimm.2003.09.006

Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

Molecular Immunology 40 (2004) 647–660

Review

IMGT-ONTOLOGY and IMGT databases, tools and Webresources for immunogenetics and immunoinformatics

Marie-Paule Lefranca,b,∗a Laboratoire d’ImmunoGénétique Moléculaire, LIGM, Institut de Génétique Humaine IGH, Université Montpellier II,

UPR CNRS 1142, 141 rue de la Cardonille, 34396 Montpellier Cedex 5, Franceb Institut Universitaire de France, France

Received 18 June 2003; received in revised form 2 September 2003; accepted 16 September 2003

Abstract

The international ImMunoGeneTics information system® (IMGT; http://imgt.cines.fr), is a high quality integrated information systemspecialized in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC), and related proteins of the immunesystem (RPI) of human and other vertebrates, created in 1989, by the Laboratoire d’ImmunoGénétique Moléculaire (LIGM; UniversitéMontpellier II and CNRS) at Montpellier, France. IMGT provides a common access to standardized data which include nucleotide andprotein sequences, oligonucleotide primers, gene maps, genetic polymorphisms, specificities, 2D and 3D structures. IMGT consists ofseveral sequence databases (IMGT/LIGM-DB, IMGT/MHC-DB, IMGT/PRIMER-DB), one genome database (IMGT/GENE-DB) andone 3D structure database (IMGT/3Dstructure-DB), interactive tools for sequence analysis (IMGT/V-QUEST, IMGT/JunctionAnalysis,IMGT/PhyloGene, IMGT/Allele-Align), for genome analysis (IMGT/GeneSearch, IMGT/GeneView, IMGT/LocusView) and for 3D struc-ture analysis (IMGT/StructuralQuery), and Web resources (“IMGT Marie-Paule page”) comprising 8000 HTML pages. IMGT other ac-cesses include SRS, FTP, search by BLAST, etc. By its high quality and its easy data distribution, IMGT has important implications inmedical research (repertoire in autoimmune diseases, AIDS, leukemias, lymphomas, myelomas), veterinary research, genome diversityand genome evolution studies of the adaptive immune responses, biotechnology related to antibody engineering (single chain Fragmentvariable (scFv), phage displays, combinatorial libraries) and therapeutical approaches (grafts, immunotherapy). IMGT is freely availableathttp://imgt.cines.fr.© 2003 Elsevier Ltd. All rights reserved.

Keywords:IMGT; Database; Immunoglobulin; T cell receptor; MHC; HLA; Immunogenetics; 3D structure; Web resources; Leukemia; Lymphoma

1. Introduction

The molecular synthesis and genetics of the immunoglob-ulin (IG) and T cell receptor (TR) chains is particularlycomplex and unique as it includes biological mechanismssuch as DNA molecular rearrangements in multiple loci(three for IG and four for TR in humans) located ondifferent chromosomes (four in humans), nucleotide dele-tions and insertions at the rearrangement junctions (orN-diversity), and somatic hypermutations in the IG loci(for review Lefranc and Lefranc, 2001a,b). The numberof potential protein forms of IG and TR is almost un-limited. Owing to the complexity and high number ofpublished sequences, data control and classification and

∗ Tel.: +33-4-9961-9965; fax:+33-4-9961-9901.E-mail address:[email protected] (M.-P. Lefranc).

URL: http://imgt.cines.fr.

detailed annotations are a very difficult task for the gen-eralist databanks such as EMBL (UK) (Stoesser et al.,2003), GenBank (USA) (Benson et al., 2003), and DDBJ(Japan) (Miyazaki et al., 2003). These observations werethe starting point of the international ImMunoGeneT-ics information system® (IMGT; http://imgt.cines.fr)(Lefranc, 2003a), created in 1989, by the Laboratoired’ImmunoGénétique Moléculaire (LIGM; Université Mont-pellier II and CNRS) at Montpellier, France. IMGT isa high quality integrated information system specializedin IG, TR, major histocompatibility complex (MHC)and related proteins of the immune system (RPI) of hu-man and other vertebrate species (Giudicelli et al., 1997;Lefranc et al., 1998, 1999; Ruiz et al., 2000; Lefranc,2001a, 2002, 2003a,b). IMGT consists of several se-quence databases (IMGT/LIGM-DB, IMGT/MHC-DB,IMGT/PRIMER-DB, IMGT/PROTEIN-DB, this last onein development), one genome database (IMGT/GENE-DB)

0161-5890/$ – see front matter © 2003 Elsevier Ltd. All rights reserved.doi:10.1016/j.molimm.2003.09.006

Page 2: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

648 M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660

Fig. 1. Databases, interactive tools and Web resources of IMGT, the international ImMunoGeneTics information system® (http:/lirngt.cines.fr).

and one three-dimensional 3D structure database (IMGT/3Dstructure-DB), interactive tools for sequence analy-sis (IMGT/V-QUEST, IMGT/JunctionAnalysis, IMGT/PhyloGene, IMGT/Allele-Align), for genome analysis(IMGT/GeneSearch, IMGT/GeneView, IMGT/LocusView)and 3D structure analysis (IMGT/StructuralQuery), andWeb resources (“IMGT Marie-Paule page”) comprising8000 HTML pages which include IMGT Scientific chart,IMGT Repertoire (for IG and TR, MHC, RPI), IMGTBloc-notes, IMGT Education and IMGT Index (Fig. 1).IMGT other accesses include SRS, FTP, search by BLAST,

etc. By its high quality and its easy data distribution, IMGThas important implications in medical research (reper-toire in normal and pathological situations: autoimmunediseases, infectious diseases, AIDS, detection of residualdiseases in leukemias, lymphomas, myelomas), veterinaryresearch, genome diversity and genome evolution studiesof the adaptive immune responses, biotechnology relatedto antibody engineering (single chain Fragment variable(scFv), phage displays, combinatorial libraries) and ther-apeutical approaches (grafts, immunotherapy). IMGT isfreely available athttp://imgt.cines.fr.

Page 3: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660 649

2. IMGT-ONTOLOGY

IMGT has developed a formal specification of the termsto be used in the domain of immunogenetics and immunoin-formatics to ensure accuracy, consistency and coherencein IMGT. This has been the basis of IMGT-ONTOLOGY(Giudicelli and Lefranc, 1999), the first ontology inthe domain, which allows the management of the im-munogenetics knowledge for human and other vertebratespecies. IMGT-ONTOLOGY comprises five main con-cepts: IDENTIFICATION, CLASSIFICATION, DESCRIP-TION, NUMEROTATION and OBTENTION (Giudicelliand Lefranc, 1999). Standardized keywords, standardizedsequence annotation, standardized IG and TR gene nomen-clature, the IMGT unique numbering, and standardizedorigin/methodology were defined, respectively, based onthese five main concepts. The controlled vocabulary and theannotation rules for data and knowledge management ofthe IG, TR, MHC and RPI of human and other vertebratespecies constitute the IMGT Scientific chart. All IMGT dataare expertly annotated according to the IMGT Scientificchart. IMGT is the global internationally acknowledgedreference in immunogenetics and immunoinformatics.

IMGT standardized keywords, based on the IDENTIFI-CATION concept, have been assigned to all IMGT/LIGM-DBentries. They include (i)general keywords: indispensablefor the sequence assignments, they are described in anexhaustive and non-redundant list, and are organized ina tree structure, and (ii)specific keywords: they are morespecifically associated with particularities of the sequences(orphon, transgene, etc.) or with diseases (leukemia, lym-phoma, myeloma, etc.) (Giudicelli et al., 1997). The listis not definitive and new specific keywords can easily beadded if needed.

Two hundred fifteen feature labels, based on the DE-SCRIPTION concept, are necessary in IMGT/LIGM-DB todescribe all structural and functional subregions that com-pose IG and TR sequences (Giudicelli et al., 1997), whereasonly seven of them are available in EMBL, GenBank orDDBJ. Annotation of sequences with these labels constitutesthe main part of the expertise. Levels of annotation havebeen defined, which allow the users to query sequences inIMGT/LIGM-DB even though they are not fully annotated(Giudicelli et al., 1997). An internal tool, IMGT/Automat,has been developed to automatically perform the annota-tion of the rearranged cDNA sequences in IMGT/LIGM-DB(Giudicelli et al., 2003). One hundred seventy two additionallabels were defined for IG, TR, MHC, and RPI amino acidsequences and domain structures in IMGT/PROTEIN-DBand IMGT/3Dstructure-DB. Prototypes represent the orga-nizational relationship between labels and give informationon the order and expected length (in number of nucleotides)of the labels (Giudicelli et al., 1997; Lefranc et al., 1999).Prototypes can apply to general configuration of IG, TRor MHC, independently of the chain type, the species orany other parameters like functionality. However, prototypes

Fig. 2. The IMGT-ONTOLOGY CLASSIFICATION concept. The CLAS-SIFICATION concept organizes the immunogenetics knowledge useful toname and classify IG and TR genes in lMGT. (A) Concepts of classifi-cation and their relations. (B) Examples of instances: for each concept ofclassification is shown the instance which allows the classification of thehuman IGLV2-1 1∗02 allele.

may also be established for very precise cases when se-quence characteristics are clearly established.

IMGT provides immunologists and geneticists with astandardized nomenclature per locus and per species whichallows extraction and comparison of data for the complexB and T cell antigen receptors. The CLASSIFICATIONconcept (Fig. 2) has been used to set up a unique nomencla-ture of the human IG and TR genes, which was approvedby the Human Genome Organization (HUGO) Nomencla-ture Committee (HGNC) in 1999 (Wain et al., 2002a) andhas become the community standard. The complete list ofthe human IG and TR gene names (Lefranc, 2000a,b,c,d;Lefranc, 2001b,c,d; Lefranc and Lefranc, 2001a,b) hasbeen entered by the IMGT Nomenclature Committee in thegenome database (GDB, Canada) (Letovsky et al., 1998),LocusLink at NCBI (USA) (Pruitt and Maglott, 2001),and GeneCards (Safran et al., 2002). The complete list ofthe mouse IG and TR gene names was sent by IMGT, inJuly 2002, to the mouse genome informatics (MGI) mousegenome database (MGD) (USA) (Blake et al., 2003), Lo-cusLink at NCBI, and HGNC. Both lists are available fromthe IMGT site (Lefranc, 2003a) and queries on the humanand mouse IG and TR gene classification and gene namescan be made from IMGT/GENE-DB. IMGT reference se-quences have been defined for each allele of each genebased on one or, whenever possible, several of the follow-ing criteria: germline sequence, first sequence published,longest sequence, mapped sequence (Lefranc et al., 1999).

A uniform numbering system for IG and TR sequencesof all species, based on the NUMEROTATION concept,has been established to facilitate sequence comparison andcross-referencing between experiments from different labo-ratories whatever the antigen receptor (IG or TR), the chaintype, the domain (variable V or constant C), or the species(Lefranc, 1997, 1999; Lefranc et al., 2003). This number-ing results from the analysis of more than 5000 IG and TR

Page 4: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

650 M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660

Fig. 3. Example of IMGT Collier de Perles on two layers. TheIMGT Collier de Perles on two layers are generated dynamically inlMGT/3Dstructure-DB (http://imgt.cines.fr). CDR1-IMGT, CDR2-IMGTand CDR3-IMGT are in red, orange and purple, respectively. The outer AB D E beta strands (from left to right) are in white, whereas the inner GF C C′ C′ ′ beta strands (from left to right) are in grey. Hydrogen bondsbetween amino acids of the inner G F C C′ C′ ′ sheet, deduced from thePDB X-ray crystallographic data, are shown in green. Hatched circles cor-respond to missing positions according to the IMGT unique numbering.The disulfide bond between 1st-CYS 23 and 2nd-CYS 104 is not shown.

variable region sequences of vertebrate species from fish tohuman. It takes into account and combines the definition ofthe framework (FR) and complementarity determining re-gion (CDR) (Kabat et al., 1991), structural data from X-raydiffraction studies (Satow et al., 1986) and the characteriza-tion of the hypervariable loops (Chothia and Lesk, 1987). Inthe IMGT numbering, conserved amino acids from frame-works always have the same number whatever the IG orTR variable sequence, and whatever the species they comefrom (as examples: cysteine 23 (in FR1), tryptophan 41 (inFR2), leucine 89 and cysteine 104 (in FR3). Based on theIMGT unique numbering, standardized 2D graphical repre-sentations, designated as IMGT Colliers de Perles (Lefrancet al., 1999) (Fig. 3), are available in IMGT Repertoire(http://imgt.cines.fr).This IMGT unique numbering has sev-eral advantages:

• It has allowed the redefinition of the limits of the FRand CDR of the IG and TR variable regions (Lefranc andLefranc, 2001a,b) and domains (Ruiz and Lefranc, 2002).The FR-IMGT and CDR-IMGT lengths become in them-selves crucial information which characterize variable re-gions belonging to a group, a subgroup and/or a gene.

• Framework amino acids (and codons) located at the sameposition in different sequences can be compared withoutrequiring sequence alignments. This also holds for aminoacids belonging to CDR-IMGT of same length.

• The unique numbering is used as the output of theIMGT/V-QUEST alignment tool. The aligned sequencesare displayed according to the IMGT numbering andwith the FR-IMGT and CDR-IMGT delimitations.

• The unique numbering has allowed a standardization ofthe description of mutations and the description of IG andTR allele polymorphisms (Lefranc and Lefranc, 2001a,b).These mutations and allelic polymorphisms are describedby comparison to the IMGT reference sequences of thealleles∗01 (Lefranc, 1998).

• The unique numbering allows the description and com-parison of somatic hypermutations of the IG IMGT vari-able domains.

By facilitating the comparison between sequences andby allowing the description of alleles and mutations, theIMGT unique numbering represents a big step forward inthe analysis of the IG and TR variable region (V-REGION)sequences of all vertebrate species (Pommié et al., in press).Moreover, it gives insight into the structural configurationof the variable domain (V-DOMAIN encoded by the V–J-or V–D–J-REGION) (Ruiz and Lefranc, 2002; Lefrancet al., 2003)(Fig. 3). The IMGT unique numbering opensinteresting views on the evolution of the proteins belong-ing to the “immunoglobulin superfamily” (IgSF) (Williamsand Barclay, 1988). It has been applied with success toall the sequences of domains belonging to the IgSF V-set,designated as V-LIKE-DOMAINs in IMGT, which in-clude non-rearranging sequences in vertebrates (humanCD4 [D1,D3], Xenopus CTXg1, etc.) and in invertebrates(drosophila amalgam, drosophila fasciclin II, etc.) (IMGTRepertoire (RPI),http://imgt.cines.fr) (Lefranc, 1997, 1999;Lefranc et al., 2003).

This standardized approach has been applied to theconstant domain (C-DOMAIN) of the IG and TR, and tothe C-LIKE-DOMAINs of proteins other than IG and TR(IMGT Repertoire (RPI),http://imgt.cines.fr). An IMGTunique numbering has also been implemented for the groovedomain (G-DOMAIN) (Duprat and Lefranc, 2003) of theMHC class I and II chains (IMGT Repertoire (MHC),http://imgt.cines.fr), and for the G-LIKE-DOMAINsof proteins other than MHC (IMGT Repertoire (RPI),http://imgt.cines.fr).

The OBTENTION concept is a set of standardizedterms that precise the origins of the sequence (the ‘origin’concept) and the conditions in which the sequenceshave been obtained (the ‘methodology’ concept). The‘origin’ concept comprises the subsets of ‘cell, tissue ororgan’, ‘auto-immune diseases’, ‘clonal expansion diseases’(such as leukemia, lymphoma, myeloma), whereas the‘methodology’ concept comprises the subsets related tothe ‘hybridoma’, to the experimental conditions (sequences

Page 5: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660 651

amplified by ‘PCR’), to the obtention from ‘libraries’ (ge-nomic, cDNA, combinatorial, etc.) or from ‘transgenic’organisms (animal, plant). At this stage of development, theexhaustive definition of the concepts of obtention and oftheir instances is still in progress.

3. IMGT databases

IMGT/LIGM-DB contains three sequence databases,IMGT/LIGM-DB, IMGT/MHC-DB, and IMGT/PRIMER-DB, one genome database, IMGT/GENE-DB, and one 3Dstructure database, IMGT/3Dstructure-DB. IMGT/LIGM-DB, the first and the largest database, is the comprehensiveIMGT database of IG and TR nucleotide sequences from hu-man and other vertebrate species, with translation for fullyannotated sequences, created in 1989 by LIGM, at Montpel-lier, France, on the Web since July 1995 (Giudicelli et al.,1997; Lefranc et al., 1998, 1999; Ruiz et al., 2000; Lefranc,2001a, 2002, 2003a,b). In October 2003, IMGT/LIGM-DBcontained 77,810 nucleotide sequences of IG and TR fromhuman and 150 other vertebrate species.

IMGT/LIGM-DB sequence data are identified by theEMBL/GenBank/DDBJ accession number. The uniquesource of data for IMGT/LIGM-DB is EMBL which sharesdata with the other two generalist databases GenBankand DDBJ. Once the sequences are allowed by the au-thors to be made public, LIGM automatically receives IGand TR sequences by e-mail from EMBL. After controlby LIGM curators, data are scanned to store sequences,bibliographical references and taxonomic data, and stan-dardized IMGT/LIGM-DB keywords are assigned to allentries. Based on expert analysis, specific detail annotationsare added to IMGT flat files in a second step (Giudicelliet al., 1997). Since August 1996, the IMGT/LIGM-DBcontent has closely followed that of the EMBL for the IGand TR, with the following advantages: IMGT/LIGM-DBcontains IG and TR entries which have disappeared fromthe generalist databases (as examples: the L36092 acces-sion number which encompasses the complete human TRBlocus is still present in IMGT/LIGM-DB, whereas it hasbeen deleted from EMBL/GenBank/DDBJ due to its toolarge size (684,973 bp); in 1999, IMGT detected the disap-pearance of 20 IG and TR sequences which inadvertentlyhad been lost by GenBank, and allowed the recuperationof these sequences in the generalist databases); conversely,IMGT/LIGM-DB does not contain sequences which havepreviously been wrongly assigned to IG and TR.

The IMGT/LIGM-DB data are provided with a userfriendly interface. The Web interface allows searches ac-cording to immunogenetic specific criteria and is easy touse without any knowledge in a computing language. Theinterface allows the users to get easily connected fromany type of platform (PC, Macintosh, workstation) usingfreeware such as Explorer, Netscape. All IMGT/LIGM-DBinformation is available through five modules of search:

catalogue, taxonomy and characteristics, keywords, anno-tation labels and references (Fig. 4). Selection is displayedat the top of the “results of your search” page, so theusers can check their own queries (Lefranc et al., 1999).Users have the possibility to modify their request or consultthe results. They can (i) add new conditions to increaseor decrease the number of resulting sequences, (ii) viewdetails: selecting this “View” option provides a list of re-sulting sequences; selection of one sequence in the listoffers nine possibilities: annotations, IMGT flat file, cod-ing regions with protein translation, catalogue and externalreferences, sequence in dump format, sequence in FASTAformat, sequence with three reading frames, EMBL flat file,IMGT/V-QUEST, or (iii) search for sequence fragments:selecting this “Subsequences” option allows to search forsequence fragments (subsequences) corresponding to a par-ticular label for the resulting sequences (available for fullyannotated sequences (Lefranc et al., 1999).

IMGT/LIGM-DB data are distributed by anonymousFTP servers at the Centre Informatique National del’Enseignement Supérieur (CINES), Montpellier, France(ftp://ftp.cines.fr/IMGT/), at the European Bioinformat-ics Institute (EBI), Hinxton, UK (ftp://ftp.ebi.ac.uk/pub/databases/imgt/) and at the Institut de Génétique Hu-maine (IGH), Montpellier, France (ftp://ftp.igh.cnrs.fr/pub/IMGT), and from many sequence retrieval system (SRS)sites. IMGT/LIGM-DB releases are produced weekly.Users can also compare their own sequences againstIMGT/LIGM-DB data using BLAST or FASTA on dif-ferent servers (EBI, IGH, INFOBIOGEN, Institut PasteurParis, etc.). Links are provided from the IMGT Home page,http://imgt.cines.fr, in “IMGT other accesses”).

IMGT/MHC-DB, hosted on the EBI server at Hinx-ton (UK), comprises a database of the human MHCallele sequences (IMGT/MHC-HLA, developed by Can-cer Research, UK and Anthony Nolan Research Institute(ANRI), London, UK, on the Web since December 1998(1738 entries in October 2003) (Robinson et al., 2003),databases of the MHC class II sequences from non-humanprimates (IMGT/MHC-NHP, curated by the BiomedicalPrimate Research Centre (BPRC), Rijswijk, The Nether-lands) and from felines and canines (IMGT/MHC-FLA andIMGT/MHC-DLA, curated by the Centre for IntegratedGenomic Medical Research, Manchester, UK), on the Websince April 2002 (Robinson et al., 2003).

IMGT/PRIMER-DB (Fig. 5) is the IMGT oligonucleotide(primer) database for the IG and TR, created by LIGM(Montpellier, France) in collaboration with EUROGEN-TEC S.A. (Seraing, Belgium), on the Web since February2002. IMGT/PRIMER-DB provides information on IGand TR primers which are useful for the analysis of theIG and TR gene repertoire and expression, the detectionof minimal residual diseases in B and T cell malignan-cies, the construction of antibody combinatorial libraries,scFv, phage display or microarray technologies. In Octo-ber 2003, IMGT/PRIMER-DB contained 292 entries from

Page 6: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

652 M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660

Fig. 4. IMGT/LIGM-DB search page (http://imgt.cines.fr). Five modules of search are available: Catalogue, Taxonomy and Characteristics, Keywords,Annotation labels, and References. These modules allow extensive and complex queries on IG and TR sequences from human and other vertebrates. InOctober 2003, IMGT/LIGM-DB contained 77,810 sequences of IG and TR from 150 species. A short path selection allows a direct query with an accessionnumber or with a part of it. For example, “AJ012264" will retrieve that sequence, whereas “AJ012" will retrieve all sequences beginning with AJ012.

Homo sapiensand Mus musculus. IMGT/PRIMER-DBcontains information on primers and combinations ofprimers described as “sets” (primers sharing identicalproperties (species, group and orientation)) and “couples”(sets of opposite orientation for which IMGT/LIGM-DBsequences are known (or expected)). Primers, Sets andCouples are described in IMGT Primer cards, IMGT Setcards and IMGT Couple cards, respectively. An IMGTPrimer is an oligonucleotide described by comparison toan IMGT/LIGM-DB reference sequence, according to thestandardized rules of the IMGT Scientific chart, based onIMGT-ONTOLOGY (Giudicelli and Lefranc, 1999). Tax-onomy species and IMGT classification (group, subgroup,gene, allele) of a primer are those of the IMGT/LIGM-DBreference sequence, and not those of the PCR amplifica-tion products. This provides the following advantages forthe data standardization: IMGT/PRIMER-DB primer defi-nition, classification, and description are independent fromthe experimental conditions, from DNA sources and fromthe different combinations (sets and couples) in which theprimer can be used. That means that (i) the specificity ofa primer (subgroup, gene or allele specific) which is ei-ther described experimentally or deduced from sequence

comparison is not used for the primer classification, al-though these data are provided in the IMGT/PRIMER-DBPrimer card in “Classification comments and specificity”,(ii) the sequences resulting from the PCR amplifications areuniquely associated to the couples. The IMGT Primer cardsare linked to IMGT/LIGM-DB flat files, IMGT Colliers dePerles and IMGT Repertoire> Alignments of alleles of theIMGT/LIGM-DB reference sequence used for the primerdescription.

IMGT/GENE-DB (Fig. 6) is the comprehensive IMGTgenome database for IG and TR genes from human andmouse, and in development, from other vertebrates, createdby LIGM, on the Web since January 2003. IMGT/GENE-DBannotated data are extracted from IMGT Repertoire, theglobal ImMunoGeneTics Web resource, and from theIMGT/LIGM-DB database. All the human IMGT genenames (Lefranc and Lefranc, 2001a,b) were approved byHGNC in 1999 (Wain et al., 2002a), and entered in GDB(Canada), LocusLink at NCBI (USA) and GeneCards. InOctober 2003, IMGT/GENE-DB contained 1375 genesand 2201 alleles (673 IG and TR genes and 1024 alle-les from H. sapiens, and 702 IG and TR genes and 1177alleles fromM. musculus, Mus cookii, Mus pahari, Mus

Page 7: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660 653

Fig. 5. IMGT/PRIMER-DB Query page (http://imgt.cines.fr). IMGT group, subgroup, gene and allele name used for the query and display are accordingto the IMGT gene nomenclature, based on the CLASSIFICATION concept of IMGT-ONTOLOGY.

spretus, Mus saxicola, Mus minutoides). IMGT/GENE-DBallows a search of IG and TR gene entries by locus, group,subgroup, based on the CLASSIFICATION concept ofIMGT-ONTOLOGY (Giudicelli and Lefranc, 1999). Ashort cut allows to search genes by a selection on gene name(according to the IMGT nomenclature) or on clone name(s)(data from the “Reference sequences” and “Sequences fromthe literature” columns in IMGT Repertoire > Gene ta-bles). The selection is displayed at the top of the resultinggenes page. The users can select the genes to view theirdetailed entries. Each IMGT/GENE-DB entry correspondsto one gene and provides, for each gene, the chromosomallocalization, the gene name and definition, the number ofalleles, links to IMGT Repertoire and to external sequencedatabases (EMBL, GenBank, DDBJ), genome databases(LocusLink, OMIM), and gene nomenclature database(Genew (Wain et al., 2002b)). Reciprocally, LocusLink,GDB, GeneCards and HGNC Genew have direct links tothe IMGT/GENE-DB entries. IMGT/GENE-DB providesfor each allele, the functionality, the clone names, theIMGT/LIGM-DB reference sequence accession numbers

(with link to the flat files), and the “IMGT/GENE-DB ref-erence sequences in FASTA format” (nucleotide and aminoacid sequences of the coding regions extracted from theIMGT/LIGM-DB reference sequence), with gaps accordingto the IMGT unique numbering (Lefranc et al., 2003).

IMGT/3Dstructure-DB is the IMGT 3D structuredatabase for IG, TR, MHC, and RPI of human and othervertebrate species, created by LIGM, on the Web sinceNovember 2001 (Kaas and Lefranc, 2002; Kaas et al.,2004). IMGT/3Dstructure-DB comprises IG, TR, MHC,and RPI with known 3D structures. In October 2003,IMGT/3Dstructure-DB managed 634 atomic coordinatefiles which correspond to 422 different proteins (260 IG,18 TR and 144 MHC). Coordinate files are extracted fromthe Protein Data Bank (PDB) (Berman et al., 2000), andIMGT annotations are added according to the IMGT Scien-tific chart rules, based on the IMGT-ONTOLOGY concepts(Giudicelli and Lefranc, 1999). An IMGT/3Dstructure-DBcard provides the IMGT gene and allele identification(based on the CLASSIFICATIOB concept), domain de-limitations (based on the DESCRIPTION concept), amino

Page 8: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

654 M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660

Fig. 6. IMGT/GENE-DB Query page (http://imgt.cines.fr). IMGT locus, group, subgroup are according to the IMGT nomenclature, based on theCLASSIFICATION concept of IMGT-ONTOLOGY. In October 2003, IMGT/GENE-DB contained 1375 IG and TR genes and 2201 alleles, from humanand mouse.

acid positions according to the IMGT unique numbering(Lefranc et al., 2003) (based on the NUMEROTATION con-cept). Domains that are analyzed in IMGT/3Dstructure-DBinclude V-DOMAIN (variable) and C-DOMAIN (constant)found in IG and TR, V-LIKE- and C-LIKE-DOMAINsfound in proteins other than IG and TR, G-DOMAIN(groove) found in MHC, and G-LIKE-DOMAINs foundin proteins other than MHC (Duprat and Lefranc, 2003).Moreover, IMGT/3Dstructure-DB provides renumbered co-ordinate flat files, Colliers de Perles (Fig. 3) (standard andtwo-layer 2D graphical representations) (Ruiz and Lefranc,2002; Kaas and Lefranc, 2002; Kaas et al., 2004), and re-sults of contact analysis. The IMGT unique numbering andgene standardization will provide a great help in large scalesequence–structure studies and more generally in proteinengineering.

4. IMGT interactive tools

IMGT interactive immunoinformatics tools rely onthe DESCRIPTION and NUMEROTATION concepts ofIMGT-ONTOLOGY, and are available for sequence anal-

ysis: IMGT/V-QUEST, IMGT/JunctionAnalysis, IMGT/PhyloGene and IMGT/Allele-Align, for genome analysis:IMGT/GeneSearch, IMGT GeneView, IMGT LocusView,and for 3D structural analysis: IMGT/StructuralQuery.IMGT/V-QUEST (V-QUEry and standardization) is anintegrated software for IG and TR (Lefranc, 2001). Thistool, easy to use, analyses an input IG or TR germline orrearranged variable nucleotide sequence. IMGT/V-QUESTresults (Fig. 7) comprise the identification of the V, D, andJ genes and alleles and the nucleotide alignment by com-parison with sequences from the IMGT reference directory,the delimitations of the FR-IMGT and CDR-IMGT basedon the IMGT unique numbering, the protein translationof the input sequence, the identification of the JUNC-TION and the 2D Collier de Perles representation of theV-REGION. IMGT/V-QUEST is particularly useful forthe analysis of the rearranged variable genes: it allows toidentify the V-GENE and J-GENE and alleles involved inthe IG and TR rearrangements, and to delimit the JUNC-TION. Searches can be done related to IG and TR of humanand mouse, and of other species (non-human primates,sheep, teleostei, and chondrichthyes). IMGT/V-QUESTcan also be used for the analysis of functional or ORF

Page 9: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660 655

Fig. 7. IMGT/V-QUEST (http://imgt.cines.fr) results on gene and allele identification. IMGT/V-QUEST compares the input germline or rearrangedIG or TR variable sequences with the IMGT/V-QUEST reference directory sets. For example, the highest scores for the input AJ399821 rearrangedsequence allow to identify IGHV1-3∗01, lGHD4-4∗01 or IGHD4-11∗01, IGHJ1∗01 as being the genes and alleles most probably involved in the V-D-Jrearrangement. The IMGT/V-QUEST results comprise the translation of the JUNCTION for rearranged sequences, the delimitations of the FR-IMGTand CDR-IMGT, and also, not shown in the figure, the protein translation and the two-dimensional representation or Collier de Perles of the V-REGION.Information provided by lMGT/V-QUEST (V and J gene and allele names, sequence of the JUNCTION (from 2nd-CYS 104 to J-PHE or J-TRP 118) canthen be used in IMGT/JunctionAnalysis for a confirmation of the D gene and allele identification and a more accurate analysis of the junction (see Fig. 8).

germline variable genes from other species, to delimit theFR-IMGT and CDR-IMGT, provided that the similaritywith sequences of the IMGT/V-QUEST reference direc-tory sets is sufficiently high. The set of sequences fromthe IMGT reference directory, used for IMGT/V-QUEST,can be downloaded in FASTA format from the IMGTsite. IMGT/JunctionAnalysis is a tool, complementary toIMGT/V-QUEST, which provides a thorough analysis of

the V–J and V–D–J junction of IG and TR rearranged genes(Fig. 8). IMGT/JunctionAnalysis identifies the D-GENEsand alleles involved in the IGH, TRB and TRD V–D–Jrearrangements by comparison with the IMGT referencedirectory, and delimits precisely the P, N and D regions(Fig. 8). Results from IMGT/JunctionAnalysis are more ac-curate than those given by IMGT/V-QUEST regarding theD-GENE identification. Indeed, IMGT/JunctionAnalysis

Page 10: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

656 M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660

Fig. 8. lMGT/Junction-Analysis (http://imgt.cines.fr) results. (A) Examples of IGK V-J junctions. (B) Examples of IGH V-D-J junctions. ThelMGT/JunctlonAnalysis results comprise the junction translation, the identification of the D-GENE and allele (for each V-D-J junction), the identificationof the P and N regions (N1, N2, etc.) and their precise delimitations, the identification of the deleted nucleotides by comparison to the germline allelesequences. The CDR3-IMGT numbering is according to the IMGT unique numbering for V-DOMAIN. Vmut, Dmut and Jmut correspond to the numberof mutations in the input junction sequence by comparison to the germline allele sequences. Ngc is the ratio of the number of g+ c nucleotides to thetotal number of nudeotides in the N regions. lMGT/JunctionAnalysis analyses, in a single search, an unlimited number of junctions provided that theV-GENE and J-GENE allele IMGT names are identified.

Page 11: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660 657

works on shorter sequences (JUNCTION), and with ahigher constraint since the identification of the V-GENEand J-GENE and alleles is a prerequisite to perform theanalysis. Several hundreds of junction sequences can beanalyzed simultaneously.

IMGT/PhyloGene is an easy to use tool for phyloge-netic analysis of variable region (V-REGION) and con-stant domain (C-DOMAIN) sequences (Elemento andLefranc, 2003). This tool is particularly useful in de-velopmental and comparative immunology. The userscan analyze their own sequences by comparison withthe IMGT standardized reference sequences for humanand mouse IG and TR. IMGT/Allele-Align allows thecomparison of two alleles highlighting the nucleotideand amino acid differences. IMGT/GeneSearch, IMGTGeneView and IMGT LocusView are tools which providethe display of physical maps for the human IG, TR, andMHC loci. The mouse TRA/TRD locus is also available.IMGT/StructuralQuery is an interactive tool which allows toretrieve the IMGT/3Dstructure-DB entries based on specificstructural characteristics: phi (φ) and psi (ψ) angles, accessi-ble surface area ASA, amino acid type, distance in angstrombetween amino acids, CDR-IMGT lengths (Kaas andLefranc, 2003; Kaas et al., 2004). IMGT/StructuralQuery iscurrently available for the V-DOMAINs.

5. IMGT Web resources

The IMGT Web resources (“IMGT Marie-Paule page”)(Lefranc, 2003a) consist of 8000 HTML pages and comprisethe following sections: “IMGT Scientific chart”, “IMGTRepertoire”, “IMGT Bloc-notes”, “IMGT Education”and “IMGT Index”. IMGT Scientific chart providesthe controlled vocabulary and the annotation rules andconcepts defined by IMGT-ONTOLOGY (Giudicelliand Lefranc, 1999) for the identification, the descrip-tion, the classification and the numerotation of the IG,TR, MHC, and RPI data of human and other verte-brates. The IMGT Scientific chart rules are described inthe corresponding sections: sequence and 3D structureidentification and description, nomenclature, numbering(http://imgt.cines.fr/textes/IMGTScientificChart).

IMGT Repertoire is the global Web resource in ImMuno-GeneTics for the IG, TR, MHC, and RPI of human andother vertebrates, based on the “IMGT Scientific chart”.IMGT Repertoire provides an easy-to-use interface to care-fully and expertly annotated data on the genome, proteome,polymorphism, and structural data, organized in three majorsections: IMGT Repertoire (IG and TR), IMGT Repertoire(MHC), and IMGT Repertoire (RPI) (Lefranc, 2001a).Only titles of this large section are quoted here. Genomedata (“Locus and genes”) include chromosomal localiza-tions, locus representations, locus description, gene tables,potential germline repertoires, lists of IG and TR genesand links between IMGT, HUGO, GDB, LocusLink and

OMIM, correspondence between nomenclatures (Lefrancand Lefranc, 2001a,b), reference sequences (Barbié andLefranc, 1998; Pallarès et al., 1998, 1999; Ruiz et al., 1999;Folch and Lefranc, 2000a,b; Scaviner and Lefranc, 2000a,b;Martinez-Jean et al., 2001; Bosc and Lefranc, 2003). Pro-teome and polymorphism data (“Proteins and alleles”)are represented by protein displays which show translatedsequences of the allele∗01 of each functional or ORFgene (Lefranc and Lefranc, 2001a,b; Scaviner et al., 1999;Folch et al., 2000), alignments of alleles, tables of alle-les, allotypes, particularities in protein designations, IMGTreference directory in FASTA format, correspondence be-tween IG and TR chain and receptor IMGT designations(Lefranc and Lefranc, 2001a,b). Structural data (“2D and3D structures”) comprise 2D graphical representations des-ignated as Colliers de Perles (Lefranc et al., 1998, 1999),FR-IMGT and CDR-IMGT lengths, and 3D representationsof IG and TR variable domains (Ruiz and Lefranc, 2002;Lefranc et al., 2003). This visualization permits rapid cor-relation between protein sequences and 3D data retrievedfrom the PDB. Other data comprise: “Probes and RFLP”with phages, probes used for the analysis of IG and TRgene rearrangements and expression, and restriction frag-ment length polymorphism (RFLP) studies, “Taxonomy”of vertebrate species present in IMGT/LIGM-DB, “Generegulation and expression” with data on promoters, primers,cDNAs, reagent monoclonal antibodies, and “Genes andclinical entities”: translocations and inversions, humanizedantibodies, monoclonal antibodies with clinical indications.

IMGT Bloc-notes is a selection of useful links for im-munoinformatics, immunogenetics, immunology, genetics,molecular biology, and bioinformatics. IMGT Bloc-notes isorganized in several sections. The IMGT immunoinformat-ics page comprises links to databases, tools, and resourceson IG, TR, MHC, and RPI. Interesting links provide numer-ous hyperlinks towards the Web servers specializing in im-munology, genetics, molecular biology, and bioinformatics(associations, biopharmaceuticals, collections, companies,databases, immunology themes, journals, molecular biologyservers, resources, societies, tools, etc.) (Lefranc, 2000e).Other sections are meeting announcements, postdoctoral po-sitions, etc.

IMGT Education is a section which provides use-ful biological resources for students. It includes IMGTAide-mémoire which provides an easy access to informa-tion such as genetic code, splicing sites, amino acid struc-tures, restriction enzyme sites, etc., questions and answers,tutorials (in English and/or in French) on 3D structure, im-munoglobulins and B cells, T cell receptors and T cells, NKreceptors, pathologies of the immune system, cancer, AIDS,etc. IMGT Index is a referential index which provides a fastway to access data when information has to be retrievedfrom different parts of the IMGT site. For example, “allele”provides links to the IMGT Scientific chart rules for theallele description, and to the IMGT Repertoire Alignmentsof alleles and Tables of alleles.

Page 12: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

658 M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660

6. Conclusion

Since July 1995, IMGT has been available on the Webat http://imgt.cines.fr. IMGT provides the biologists withan easy to use and friendly interface. Since January 2000,the IMGT www server at Montpellier has been accessed bymore than 250,000 sites. IMGT has an exceptional responsewith more than 140,000 requests a month. Two-thirdsof the visitors are equally distributed between the Euro-pean Union and the United States. IMGT distributes highquality data with an important incremental value addedby the IMGT expert annotations, according to the rulesdescribed in the IMGT Scientific chart. Control of co-herence in IMGT combines data integrity control and bi-ological data evaluation (Giudicelli et al., 1998a,b). Theinformation provided by IMGT is of much value to clin-icians and biological scientists in general (Lefranc, 2002,2003b; Chardès et al., 2002). Further developments in-clude the implementation of the IMGT/GeneInfo (Baumet al., 2004) and IMGT/GeneFrequency tools, and that ofIMGT/PROTEIN-DB, a protein database for IG and TR,which will contain translations of potentially functionaland ORF sequences from IMGT/LIGM-DB, and proteindata fromKabat et al. (1991)and PDB. To facilitate theintegration of IMGT data into applications developed byother laboratories, we have built an application program-ming interface (API) to access the database (Giudicelliet al., 1998a). This API includes: a set of URL links toaccess biological knowledge data (keywords, labels, func-tionalities, list of gene names, etc.), a set of URL links toaccess all data related to one given sequence. To increaseinteroperability with other ontologies and informationsystems, IMGT-ONTOLOGY is currently being writtenusing extensible markup language (XML) approach, inIMGT-ML ( Chaume et al., 2001, 2003). IMGT-ML wilIalso be used for implementing Web services and developingIMGT-Chorégraphie, a long-term project aimed at creatingdynamic interactions between the IMGT databases, tools,and Web resources. IMGT is designed to allow a commonaccess to all immunogenetics data, and particular attentionis given to the establishment of cross-referencing links toother databases pertinent to the users of IMGT.

7. Citing IMGT

If you use IMGT databases, tools and/or Web re-sources, please citeLefranc (2003)and this paper as ref-erences, and quote the IMGT Home page URL address,http://imgt.cines.fr.

Acknowledgements

I am deeply grateful to the IMGT team for its hardwork, expertise, constant motivation and enthusiasm. “Mille

mercis” to Nathalie Bosc, Denys Chaume, Kora Com-bres, Elodie Duprat, Géraldine Folch, Chantal Ginestoux,David Girod, Véronique Giudicelli, Delphine Guiraudou,Joumana Jabado-Michaloud, Stéphanie Jeanjean, QuentinKaas, Gérard Lefranc, Séverine Magris, Christelle Pommié,Céline Protat, Dominique Scaviner, Valérie Thouvenin, andMehdi Yousfi Monod. I thank Marc Lemaitre, Marie-ClaireBeckers, Géraldine Folch, and Delphine Valette from EU-ROGENTEC S.A., Belgium, for their scientific contributionto IMGT/PRIMER-DB, Julien Bertrand, Olivier Elemento,Christèle Jean-Martinez, and Manuel Ruiz for their pre-vious work at LIGM, Peter Stoehr (EBI), Denis Pugnère(IGH), Nicolas Joly (Institut Pasteur Paris), and Jean-MarcPlaza (INFOBIOGEN) for the IMGT/LIGM-DB flat filedistribution. IMGT is funded by the European Union’sfifth PCRDT programme (QLG2-2000-01287), the CentreNational de la Recherche Scientifique (CNRS), the Min-istère de l’Education Nationale, and the Ministère de laRecherche. Subventions have been received from Associa-tion pour la Recherche sur le Cancer (ARC) and the RégionLanguedoc-Roussillon.

References

Barbié, V., Lefranc, M.-P., 1998. The human immunoglobulin kappavariable (IGKV) genes and joining (IGKJ) segments. Exp. Clin.Immunogenet. 15, 171–183.

Baum, T.-P., Pasqual, N., Thuderoz, F., Hierle, V., Chaume, D.,Lefranc, M.-P., Jouvin-Marche, E., Marche, P.-N., Demongeot, J.,2004. IMGT/GeneInfo: enhancing V(D)J recombination databaseaccessibility. Nucleic Acids Res. 32, Database January issue.

Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Wheeler,D.L., 2003. GenBank. Nucleic Acids Res. 31, 23–27.

Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig,H., Shindyalov, I.N., Bourne, P.E., 2000. The Protein Data Bank.Nucleic Acids Res. 28, 235–242.

Blake, J.A., Richardson, J.E., Bult, C.J., Kadin, J.A., Eppig, J.T., andthe mouse genome database group, 2003. MGD: the Mouse GenomeDatabase. Nucleic Acids Res. 31, 193–195.

Bosc, N., Lefranc, M.-P., 2003. IMGT Locus in Focus: the mouse (Musmusculus) T cell receptor alpha (TRA) and delta (TRD) variable genes.Dev. Comp. Immunol. 27, 465–497.

Chardès, T., Chapal, N., Bresson, D., Bès, C., Giudicelli, V., Lefranc,M.-P., Péraldi-Roux, S., 2002. The human anti-thyroid peroxidaseautoantibody repertoire in Graves and Hashimoto’s autoimmune thyroiddiseases. Immunogenetics 54, 141–157.

Chaume, D., Giudicelli, V., Lefranc, M.-P., 2001. IMGT-XML a languagefor IMGT-ONTOLOGY and IMGT/LIGM-DB data. In: Proceedingsof the NETTAB on CORBA and XML: Towards a BioinformaticsIntegrated Network Environment. Network Tools and Applications inBiology. pp. 71–75.

Chaume, D., Giudicelli, V., Combres, K., Lefranc, M.-P., 2003.IMGT-ONTOLOGY and IMGT-ML for Immunogenetics andImmunoinformatics. European Congress in Computational BiologyECCB. Sequence Databases and Ontologies Satellite Event. Paris.

Chothia, C., Lesk, A.M., 1987. Canonical structures for the hypervariableregions of immunoglobulins. J. Mol. Biol. 196, 901–917.

Duprat, E., Lefranc, M.-P., 2003. IMGT Standardization and Analysisof V-LIKE, C-LIKE and G-LIKE-DOMAINs. In: Proceedings of theEuropean Congress in Computational Biology ECCB. Paris, 2003,pp. 223–224.

Page 13: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660 659

Elemento, O., Lefranc, M.-P., 2003. IMGT/PhyloGene: an online softwarepackage for phylogenetic analysis of immunoglobulin and T cellreceptor genes. Dev. Comp. Immunol. 27, 763–779.

Folch, G., Lefranc, M.-P., 2000a. The human T cell receptor beta variable(TRBV) genes. Exp. Clin. Immunogenet. 17, 42–54.

Folch, G., Lefranc, M.-P., 2000b. The human T cell receptor beta diversity(TRBD) and beta joining (TRBJ) genes. Exp. Clin. Immunogenet. 17,107–114.

Folch, G., Scaviner, D., Contet, V., Lefranc, M.-P., 2000. Protein displaysof the human T cell receptor alpha, beta, gamma and delta variableand joining regions. Exp. Clin. Immunogenet. 17, 205–215.

Giudicelli, V., Lefranc, M.-P., 1999. Ontology for immunogenetics:IMGT-ONTOLOGY. Bioinformatics 12, 1047–1054.

Giudicelli, V., Chaume, D., Bodmer, J., Müller, W., Busin, C., Marsh, S.,Bontrop, R., Lemaitre, M., Malik, A., Lefranc, M.-P., 1997. IMGT,The International ImMunoGeneTics Database. Nucleic Acids Res. 25,206–211.

Giudicelli, V., Chaume, D., Lefranc, M.-P., 1998a. IMGT/LIGM-DB:a systematized approach for ImMunoGeneTics database coherenceand data distribution improvement. In: Proceedings of the SixthInternational Conference on Intelligent Systems for Molecular Biology(ISBM). 1998, pp. 59–68.

Giudicelli, V., Chaume, D., Mennessier, G., Althaus, H.H., Müller, W.,Bodmer, J., Malik, A. Lefranc, M.-P., 1998b. IMGT, the internationalImMunoGeneTics database: a new design for immunogenetics dataaccess. In: Cesnik, B., et al. (Eds.), Proceedings of the Ninth WorldCongress on Medical Informatics. MEDINFO, IOS Press, Amsterdam,pp. 351–355.

Giudicelli, V., Protat, C., Lefranc, M.-P., 2003. The IMGT strategy for theautomatic annotation of IG and TR cDNA sequences: IMGT/Automat.In: Proceedings of the European Congress in Computational Biology(ECCB). Paris, 2003, pp. 103–104.

Kaas, Q., Lefranc, M.-P., 2002. IMGT/3DStructure-DB for immuno-globulin, T cell receptor and MHC structural data. In: Proceedings ofthe European Congress in Computational Biology (ECCB). 2002, P72http://www.eccb2002.de.

Kaas, Q., Lefranc, M.-P., 2003. IMGT/StructuralQuery: a tool forstructural data analysis of immunoglobulin and T cell receptorvariable domains (http://imgt.cines.fr). In: Proceedings of the EuropeanCongress in Computational Biology (ECCB). 2003, Paris, 101–102.

Kaas, Q., Ruiz, M., Lefranc, M.-.P., 2004. IMGT/3DStructure-DB andIMGT/StructuralQuery, a database and a tool for immunoglobulin, Tcell receptor and MHC structural data. Nucleic Acids Res. 32, DatabaseJanuary issue.

Kabat, E.A., Wu, T.T., Perry, H.M., Gottesman, K.S., Foeller, C., 1991.Sequences of Proteins of Immunological Interest. National Institute ofHealth Publications. Washington, DC, U.S.A., pp. 91–3242.

Lefranc, M.-P., 1997. Unique database numbering system forimmunogenetic analysis. Immunol. Today 18, 509.

Lefranc, M.-P., 1998. IMGT (ImMunoGeneTics) Locus on Focus. Anew section of Experimental and Clinical Immunogenetics. Exp. Clin.Immunogenet. 15, 1–7.

Lefranc, M.-P., 1999. The IMGT unique numbering for immunoglobulins,T cell receptors and Ig-like domains. The Immunologist 7, 132–136.

Lefranc, M.-P., 2000a. Nomenclature of the Human ImmunoglobulinGenes. Current Protocols in Immunology. Supplement 40,A.1P.1-A.1P.37. Wiley, New York, U.S.A.

Lefranc, M.-P., 2000b. Nomenclature of the Human T cell Receptor Genes.Current Protocols in Immunology. Supplement 40, A.1O.1-A.1O.23.Wiley, New York, U.S.A.

Lefranc, M.-P., 2000c. Locus maps and genomic repertoire of the humanIg genes. The Immunologist 8, 80–87.

Lefranc, M.-P., 2000d. Locus maps and genomic repertoire of the humanT-cell receptor genes. The Immunologist 8, 72–79.

Lefranc, M.-P., 2000e. Web sites of Interest to Immunologists. CurrentProtocols in Immunology. A.1J.1-A.1J.33. Wiley, New York, U.S.A.

Lefranc, M.-P., 2001a. IMGT, the international ImMunoGeneTicsdatabase. Nucleic Acids Res. 29, 207–209.

Lefranc, M.-P., 2001b. Nomenclature of the human immunoglobulin heavy(IGH) genes. Exp. Clin. Immunogenet. 18, 100–116.

Lefranc, M.-P., 2001c. Nomenclature of the human immunoglobulin kappa(IGK) genes. Exp. Clin. Immunogenet. 18, 161–174.

Lefranc, M.-P., 2001d. Nomenclature of the human immunoglobulinlambda (IGL) genes. Exp. Clin. Immunogenet. 18, 242–254.

Lefranc, M.-P., 2002. IMGT, the international ImMunoGeneTics database:a high-quality information system for comparative immunogeneticsand immunology. Dev. Comp. Immunol. 26, 697–705.

Lefranc, M.-P., 2003a. IMGT, the international ImMunoGeneTicsdatabase®http://imgt.cines.fr.Nucleic Acids Res. 31, 307–310.

Lefranc, M.-P., 2003b. IMGT® databases, Web resources and toolsfor immunoglobulin and T cell receptor sequence analysis,http://imgt.cines.fr. Leukemia 17, 260–266.

Lefranc, M.-P., Lefranc G., 2001a. The Immunoglobulin FactsBook.Academic Press, London, UK, pp. 1–458, ISBN: 012441351X.

Lefranc, M.-P., Lefranc, G., 2001b. The T Cell Receptor FactsBook.Academic Press, London, UK, pp. 1–398, ISBN: 0124413528.

Lefranc, M.-P., Giudicelli, V., Busin, C., Bodmer, J., Müller, W., Bontrop,R., Lemaitre, M., Malik, A., Chaume, D., 1998. IMGT, the internationalImMunoGeneTics database. Nucleic Acids Res. 26, 297–303.

Lefranc, M.-P., Giudicelli, V., Ginestoux, C., Bodmer, J., Müller, W.,Bontrop, R., Lemaitre, M., Malik, A., Barbié, V., Chaume, D., 1999.IMGT, the international ImMunoGeneTics database. Nucleic AcidsRes. 27, 209–212.

Lefranc, M.-P., Pommié, C., Ruiz, M., Giudicelli, V., Foulquier, E.,Truong, L., Thouvenin-Contet, V., Lefranc, G., 2003. IMGT, uniquenumbering for immunoglobulin and T cell receptor variable domainsand Ig superfamily V-Like domains. Dev. Comp. Immunol. 27, 55–77.

Letovsky, S.I., Cottingham, R.W., Porter, C.J., Li, P.W., 1998. GDB: thehuman Genome DataBase. Nucleic Acids Res. 26, 94–99.

Martinez-Jean, C., Folch, G., Lefranc, M.-P., 2001. Nomenclature andoverview of the mouse (Mus musculus and Mus sp.) immunoglobulinkappa (IGK) genes. Exp. Clin. Immunogenet. 18, 255–279.

Miyazaki, S., Sugawara, H., Gojobori, T., Tateno, Y., 2003. DNA DataBank of Japan (DDBJ) in XML. Nucleic Acids Res. 31, 13–16.

Pallarès, N., Frippiat, J.P., Giudicelli, V., Lefranc, M.-P., 1998. The humanimmunoglobulin lambda variable (IGLV) genes and joining (IGLJ)segments. Exp. Clin. Immunogenet. 15, 8–18.

Pallarès, N., Lefebvre, S., Contet, V., Matsuda, F., Lefranc, M.-P., 1999.The human immunoglobulin heavy variable (IGHV) genes. Exp. Clin.Immunogenet. 16, 36–60.

Pommié, C., Levadoux, S., Sabatier, R., Lefranc, G., Lefranc, M.-P.IMGT standardized criteria for statistical analysis of immunoglobulinV-REGION amino acid properties. J. Mol. Recognit., in press.

Pruitt, K.D., Maglott, D.R., 2001. RefSeq and LocusLink: NCBIgene-centered resources. Nucleic Acids Res. 29, 137–140.

Robinson, J., Waller, M.J., Parham, P., de Groot, N., Bontrop, R., Kennedy,L.J., Stoehr, P., Marsh, S.G.E., 2003. IMGT/HLA and IMGT/MHC:sequence databases for the study of the major histocompatibilitycomplex. Nucleic Acids Res. 31, 311–314.

Ruiz, M., Pallarès, N., Contet, V., Barbié, V., Lefranc, M.-P., 1999. Thehuman immunoglobulin heavy diversity (IGHD) and joining (IGHJ)segments. Exp. Clin. Immunogenet. 16, 173–184.

Ruiz, M., Giudicelli, V., Ginestoux, C., Stoehr, P., Robinson, J., Bodmer,J., Marsh, S.G., Bontrop, R., Lemaitre, M., Lefranc, G., Chaume,D., Lefranc, M.-P., 2000. IMGT, The International ImmunoGeneTicsDatabase. Nucleic Acids Res. 28, 219–221.

Ruiz, M., Lefranc, M.-P., 2002. IMGT gene identification and Colliersde Perles of human immunoglobulin with known 3D structures.Immunogenetics 53, 857–883.

Safran, M., Solomon, I., Shmueli, O., Lapidot, M., Shen-Orr, S., Adato,A., Ben-Dor, U., Esterman, N., Rosen, N., Peter, I., Olender, T.,Chalifa-Caspi, V., Lancet, D., 2002. GeneCards 2002: towards acomplete, object-oriented, human gene compendium. Bioinformatics18, 1542–1543.

Satow, Y., Cohen, G.H., Padlan, E.A., Davies, D.R., 1986. Phosphocholinebinding immunoglobulin Fab, McPC603. J. Mol. Biol. 190, 593–604.

Page 14: Review IMGT-ONTOLOGY and IMGT databases, tools and Web … · 648 M.-P. Lefranc/Molecular Immunology 40 (2004) 647–660 Fig. 1. Databases, interactive tools and Web resources of

660 M.-P. Lefranc / Molecular Immunology 40 (2004) 647–660

Scaviner, D., Lefranc, M.-P., 2000a. The human T cell receptor alphavariable (TRAV) genes. Exp. Clin. Immunogenet. 17, 83–96.

Scaviner, D., Lefranc, M.-P., 2000b. The human T cell receptor alphajoining (TRAJ) genes. Exp. Clin. Immunogenet. 17, 97–106.

Scaviner, D., Barbié, V., Ruiz, M., Lefranc, M.-P., 1999. Protein displaysof the human immunoglobulin heavy, kappa and lambda variable andjoining regions. Exp. Clin. Immunogenet. 16, 234–240.

Stoesser, G., Baker, W., Van den Broek, A., Garcia-Pastor, M., Kanz, C.,Kulikova, T., Leinonen, R., Lin, Q., Lombard, V., Lopez, R., Mancuso,R., Nardone, F., Stoehr, P., Tuli, M.A., Tzouvara, K., Vaughan, R., 2003.

The EMBL nucleotide sequence database: major new developments.Nucleic Acids Res. 31, 17–22.

Wain, H.M., Bruford, E.A., Lovering, R.C., Lush, M.J., Wright, M.W.,Povey, S.,2002a. Guidelines for human gene nomenclature. Genomics79, 464–470.

Wain, H.M., Lush, M., Ducluzeau, F., Povey, S., 2002b. Genew: thehuman gene nomenclature database. Nucleic Acids Res. 30, 169–171.

Williams, A.F., Barclay, A.N., 1988. The immunoglobulin superfamily-domains for cell surface recognition. Annu. Rev. Immunol. 6, 381–405.