Phenotype analysis in humans using OMIM

Rosario M. Piro
Molecular Biotechnology Center
University of Torino, Italy

MBC, Torino – May 26th 2007

Outline:
1) Introduction to OMIM
2) Phenotype similarity map
3) Exercises



What is OMIM?

● Online Mendelian Inheritance in Man
● comprehensive knowledge base of human genes and genetic disorders
● started by V.A. McKusick in 1966 as MIM
● available online since 1987
● maintained since 1995 at the National Center for Biotechnology Information (NCBI)
  – integrated with Entrez
  – updated daily by Johns Hopkins University

Hamosh et al., Nucleic Acids Res. 30(1):52-55, 2002

OMIM entries (1)

● OMIM entries:
  – currently 17,663 entries (May 14th, 2007)
  – full-text summary of gene and/or phenotype
    ● important fields: titles/symbols, gene map locus (cytogenetic location), text, clinical synopsis and features of a disorder, allelic variants, references, ... (not all present in every entry)
    ● links to other genetic databases, PubMed references, etc.
● MIM numbers/IDs: six digits starting with:

1----- (100000- ) Autosomal dominant loci or phenotypes (entries created before May 15, 1994)
2----- (200000- ) Autosomal recessive loci or phenotypes (entries created before May 15, 1994)
3----- (300000- ) X-linked loci or phenotypes
4----- (400000- ) Y-linked loci or phenotypes
5----- (500000- ) Mitochondrial loci or phenotypes
6----- (600000- ) Autosomal loci or phenotypes (entries created after May 15, 1994)

  – allelic variants: 10-digit number: <MIMentry>.NNNN
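
The numbering scheme above can be turned into a small lookup. The following is only a minimal sketch: the function name `describe_mim_number` is made up for illustration, and the ID `113705.0001` is used purely as an example of the `<MIMentry>.NNNN` pattern.

```python
from typing import Optional, Tuple

# Leading digit of a six-digit MIM number -> category, copied from the list above.
MIM_DIGIT_CATEGORIES = {
    "1": "Autosomal dominant loci or phenotypes (entries created before May 15, 1994)",
    "2": "Autosomal recessive loci or phenotypes (entries created before May 15, 1994)",
    "3": "X-linked loci or phenotypes",
    "4": "Y-linked loci or phenotypes",
    "5": "Mitochondrial loci or phenotypes",
    "6": "Autosomal loci or phenotypes (entries created after May 15, 1994)",
}


def describe_mim_number(mim_id: str) -> Tuple[str, Optional[str]]:
    """Return (category, allelic-variant suffix or None) for an unprefixed MIM ID.

    Hypothetical helper: accepts '113705' or '113705.0001'.
    """
    entry, _, variant = mim_id.strip().partition(".")
    if len(entry) != 6 or not entry.isdigit():
        raise ValueError(f"expected a six-digit MIM number, got {mim_id!r}")
    if variant and not (len(variant) == 4 and variant.isdigit()):
        raise ValueError(f"expected a four-digit variant suffix, got {mim_id!r}")
    if entry[0] not in MIM_DIGIT_CATEGORIES:
        raise ValueError(f"no category listed for leading digit {entry[0]!r}")
    return MIM_DIGIT_CATEGORIES[entry[0]], (variant or None)


if __name__ == "__main__":
    for example in ("113705", "600185", "113705.0001"):
        print(example, "->", describe_mim_number(example))
```

Note that the leading digit only reflects the numbering convention; for entries in the 6----- range it says nothing about dominant versus recessive inheritance.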

OMIM entries (2)

● Entry classes:
  – the prefix of a MIM number indicates the type of entry: e.g. “+113705”, BREAST CANCER 1 GENE; BRCA1

Symbol   Category
*        gene of known sequence
#        descriptive entry, usually of a phenotype; does not represent a unique locus
+        contains the description of a gene of known sequence and a phenotype
%        confirmed Mendelian phenotype or phenotypic locus; molecular basis unknown
<none>   phenotype with suspected Mendelian basis or independence of entry unclear
^        entry no longer exists (removed or moved to another entry)

OMIM entries (3)

● Example: “+113705”, BREAST CANCER 1 GENE; BRCA1

+113705
BREAST CANCER 1 GENE; BRCA1

Alternative titles; symbols

BREAST CANCER, TYPE 1, INCLUDED
BREAST CANCER 1, EARLY-ONSET, INCLUDED
BREAST-OVARIAN CANCER, INCLUDED

Gene map locus 17q21

TEXT

For a general discussion of hereditary breast cancer, see 114480.

CLINICAL FEATURES

Familial Breast Cancer

Features characteristic of familial, versus sporadic, breast cancer are younger age at diagnosis, frequent bilateral disease, and frequent occurrence of disease among men (Hall et al., 1990).

According to the conclusions of the Breast Cancer Linkage Consortium (1997), the histology of breast cancers in women predisposed by reason of carrying BRCA1 and BRCA2 (600185) mutations differs from that in sporadic cases, and there are differences between breast cancers in carriers of BRCA1 and BRCA2 mutations. [...]


#114480
BREAST CANCER

Alternative titles; symbols

BREAST CANCER, FAMILIAL
BREAST CANCER, FAMILIAL MALE, INCLUDED

Gene map locus 22q12.1, 17q22-q23, 17q22, 17p13.1, 15q15.1, 13q12.3, 12p12.1, 11q22.3, 11p15.5, 8q11, 16p12, 3q26.3, 2q34-q35

TEXT

A number sign (#) is used with this entry because of evidence that mutation at more than one locus can be involved in different families or even in the same case. These loci include BRCA1 (113705) on 17q, BRCA2 (600185) on 13q12, [...]


+600185
BREAST CANCER 2 GENE; BRCA2

Alternative titles; symbols

FANCD1 GENE; FANCD1
BREAST CANCER, TYPE 2, INCLUDED
BREAST CANCER 2, EARLY-ONSET, INCLUDED
[...]
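
The identifiers in these examples (+113705, #114480, +600185) combine an entry-class prefix with a six-digit MIM number. As a rough sketch, they could be split as follows; the helper name `split_mim_id` and the regular expression are assumptions made for illustration, not part of OMIM.

```python
import re
from typing import Tuple

# Entry-class prefixes as listed in the "Entry classes" table of these slides.
ENTRY_CLASSES = {
    "*": "gene of known sequence",
    "#": "descriptive entry, usually of a phenotype; does not represent a unique locus",
    "+": "gene of known sequence and a phenotype",
    "%": "confirmed Mendelian phenotype or phenotypic locus; molecular basis unknown",
    "": "phenotype with suspected Mendelian basis or independence of entry unclear",
    "^": "entry no longer exists (removed or moved to another entry)",
}

# Optional one-character prefix followed by exactly six digits.
_MIM_PATTERN = re.compile(r"^([*#+%^]?)(\d{6})$")


def split_mim_id(identifier: str) -> Tuple[str, str, str]:
    """Split e.g. '+113705' into (prefix, number, entry class description)."""
    match = _MIM_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a (prefixed) MIM number: {identifier!r}")
    prefix, number = match.groups()
    return prefix, number, ENTRY_CLASSES[prefix]


if __name__ == "__main__":
    for mim in ("+113705", "#114480", "+600185"):
        prefix, number, entry_class = split_mim_id(mim)
        print(f"{number} [{prefix}]: {entry_class}")
```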

Gene and Morbid Map

● Gene Map (maps genes to disorders):

Location | Symbol | Title | MIM # | Disorder | Comments | Method | Mouse
17q21 | BRCA1, PSCP | Breast cancer-1 gene | 113705 | Breast cancer-1 (3); Ovarian cancer (3); Breast-ovarian cancer (3); Papillary serous carcinoma of the peritoneum (3) | | Fd, REc | 11(Brca1)

● Morbid Map (maps disorders to genes):

Disorder | Symbol(s) | OMIM | Location
Breast cancer (1) | BCPR | 113721 | 17p13.3
Breast cancer (1) | ESR1, ESR | 133430 | 6q25.1
Breast cancer (3) | TSG101 | 601387 | 11p15.2-p15.1
Breast cancer 2, early onset (3) | BRCA2, FANCD1 | 600185 | 13q12.3
Breast cancer, 114480 (3) | PIK3CA | 171834 | 3q26.3
Breast cancer, 114480 (3) | PPM1D, WIP1 | 605100 | 17q22-q23
Breast cancer, 114480 (3) | SLC22A1L, BWSCR1A, IMPT1 | 602631 | 11p15.5
Breast cancer, 114480 (3) | TP53, P53, LFS1 | 191170 | 17p13.1
Breast cancer, 11:22 translocation associated (1) | BRCATA | 600048 | 11q23
Breast cancer, ductal (2) | BRCD1 | 211410 | Chr.13
Breast cancer, ductal (2) | BRCD2 | 211420 | 1p36
Breast cancer, early-onset, 114480 (3) | BRIP1, BACH1, FANCJ | 605882 | 17q22
Breast cancer, invasive intraductal (3) | RAD54L, HR54, HRAD54 | 603615 | 1p32
Breast cancer, lobular (3) | CDH1, UVO, LCAM, ECAD | 192090 | 16q22.1
Breast cancer, male, with Reifenstein syndrome (3) | AR, DHTR, TFM, SBMA, KD, SMAX1 | 313700 | Xq11-q12
Breast cancer, somatic, 114480 (3) | KRAS2, RASK2, NS3 | 190070 | 12p12.1
Breast cancer, somatic, 114480 (3) | RB1CC1, CC1, KIAA0203 | 606837 | 8q11
Breast cancer, sporadic (3) | PHB | 176705 | 17q21
Breast cancer, type 3 (2) (?) | BRCA3, BRCAX | 605365 | 13q21
Breast cancer-1 (3) | BRCA1, PSCP | 113705 | 17q21
Breast-ovarian cancer (3) | BRCA1, PSCP | 113705 | 17q21
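
Because the Morbid Map is keyed by disorder, going from a gene entry back to its disorders means inverting it. The sketch below does this for a few rows copied from the Morbid Map above. The "|"-delimited layout of the sample text and the function name `disorders_by_mim` are assumptions chosen for this example, not a statement about the format of any downloadable OMIM file.

```python
from collections import defaultdict
from typing import Dict, List

# A few Morbid Map rows from the slide: disorder | symbol(s) | OMIM | location.
# The "|" delimiter is an assumption made for this sketch.
MORBID_MAP_ROWS = """\
Breast cancer 2, early onset (3)|BRCA2, FANCD1|600185|13q12.3
Breast cancer, 114480 (3)|TP53, P53, LFS1|191170|17p13.1
Breast cancer, sporadic (3)|PHB|176705|17q21
Breast cancer-1 (3)|BRCA1, PSCP|113705|17q21
Breast-ovarian cancer (3)|BRCA1, PSCP|113705|17q21
"""


def disorders_by_mim(rows: str) -> Dict[str, List[str]]:
    """Group disorder names by the MIM number of the gene entry they map to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in rows.splitlines():
        if not line.strip():
            continue  # skip blank lines
        disorder, _symbols, mim, _location = (field.strip() for field in line.split("|"))
        grouped[mim].append(disorder)
    return dict(grouped)


if __name__ == "__main__":
    for mim, disorders in disorders_by_mim(MORBID_MAP_ROWS).items():
        print(mim, "->", "; ".join(disorders))
```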

Database Searches

● Database searches:
  – by MIM number
  – by disorder or gene name/symbol
  – plain English (e.g. tissues)

Example: OMIM search (1)

Example: OMIM search (2)

Filters (no filtering if not ticked)

OMIM Output Formats (1)

ASN.1 (can also be saved as a file)

OMIM Output Formats (2)

XML (can also be saved as a file)

Example 2: Morbid Map

Example 3: Gene Map

OMIM Downloads

Phenomics

● systematic classification/grouping of relationships between (disease) phenotypes (“phenomics”)
  – in analogy to the more common classification of gene relationships, protein-protein interactions, etc.
  – similarity between phenotypes reflects gene relationships at the levels of
    ● protein sequence and protein motifs
    ● protein-protein interactions (e.g. genes involved in the same multi-protein complex or biochemical pathway)
    ● functional annotation
  – may be used to predict biological relations between genes and proteins (for functional annotation or disease gene prediction)

Phenomap

● www.cmbi.ru.nl/MimMiner/
● phenotype map that describes the level of similarity between the majority of OMIM phenotypes
  – establishes normalized similarity scores (between 0 = unrelated and 1 = identical/highly related)
  – organized in a symmetrical score matrix (excerpt; the rows and columns omitted between these four entries are marked [...] on the slide):

           #114480  #176807  #120435  +176860
#114480    1.0000   0.5108   0.4560   0.3027
#176807    0.5108   1.0000   0.3881   0.2389
#120435    0.4560   0.3881   1.0000   0.2132
+176860    0.3027   0.2389   0.2132   1.0000

OMIM ID:   Phenotype:
#114480    BREAST CANCER
#176807    PROSTATE CANCER
#120435    COLORECTAL CANCER, HEREDITARY NONPOLYPOSIS, TYPE 1; HNPCC1
+176860    PROTEIN C DEFICIENCY, CONGENITAL THROMBOTIC DISEASE DUE TO

  – reasonable threshold for related phenotypes: ≥0.4

van Driel et al., European Journal of Human Genetics 14:535-542, 2006
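
To illustrate how the ≥0.4 threshold might be applied, the sketch below scans the four-phenotype excerpt of the score matrix and lists the pairs that would count as related. The names `IDS`, `SCORES` and `related_pairs` are invented for this example; only the scores themselves come from the slide.

```python
from typing import List, Sequence, Tuple

# Excerpt of the symmetric phenomap score matrix shown on the slide.
IDS = ["#114480", "#176807", "#120435", "+176860"]
SCORES = [
    [1.0000, 0.5108, 0.4560, 0.3027],
    [0.5108, 1.0000, 0.3881, 0.2389],
    [0.4560, 0.3881, 1.0000, 0.2132],
    [0.3027, 0.2389, 0.2132, 1.0000],
]


def related_pairs(
    ids: Sequence[str],
    scores: Sequence[Sequence[float]],
    threshold: float = 0.4,
) -> List[Tuple[str, str, float]]:
    """Return all distinct phenotype pairs whose score reaches the threshold."""
    pairs = []
    for i in range(len(ids)):
        # The matrix is symmetric, so the upper triangle is sufficient.
        for j in range(i + 1, len(ids)):
            if scores[i][j] >= threshold:
                pairs.append((ids[i], ids[j], scores[i][j]))
    return pairs


if __name__ == "__main__":
    for first, second, score in related_pairs(IDS, SCORES):
        print(f"{first} ~ {second}: {score:.4f}")
```

Within this excerpt, breast cancer pairs with prostate cancer and with HNPCC1, while protein C deficiency stays below the threshold for every other entry.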

Phenomap Construction (1)

● Problem:
  – no systematic description of phenotypes in OMIM (hand-curated, different authors, etc.)
● Approach: text mining to classify >5000 phenotypes in OMIM
  – extraction of phenotypic features from the full text (TX) and clinical synopsis (CS) of phenotype entries
  – search terms taken from the anatomy (A) and disease (C) sections of the Medical Subject Headings (MeSH) vocabulary
    ● hierarchical structure that groups MeSH terms into “concepts”
    ● useful for mining OMIM entries that use different terminology for the same concepts
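
The idea of mapping different wordings to the same concept can be sketched with a toy vocabulary. Everything in the example below is hypothetical: the two concepts, their term lists and the sample sentence are invented for illustration and are not taken from MeSH or from the MimMiner implementation.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical mini-vocabulary (NOT real MeSH data): concept -> synonymous terms.
CONCEPT_TERMS: Dict[str, List[str]] = {
    "Breast Neoplasms": ["breast cancer", "breast carcinoma"],
    "Ovarian Neoplasms": ["ovarian cancer"],
}


def count_concepts(text: str, concept_terms: Dict[str, List[str]]) -> Counter:
    """Count how often any term of each concept occurs in the text."""
    lowered = text.lower()
    counts: Counter = Counter()
    for concept, terms in concept_terms.items():
        for term in terms:
            # Whole-word match so that a term is not found inside a longer word.
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            counts[concept] += len(re.findall(pattern, lowered))
    return counts


if __name__ == "__main__":
    sample = (
        "Familial breast cancer shows a younger age at diagnosis. "
        "Breast carcinoma and ovarian cancer may occur in the same family."
    )
    print(count_concepts(sample, CONCEPT_TERMS))
```

Both "breast cancer" and "breast carcinoma" increase the count of the same concept, which is the point of grouping terms into concepts.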

Phenomap Construction (2)

● Construction of “feature vectors”
  – one feature vector per OMIM phenotype entry
  – each feature corresponds to a MeSH concept
    ● non-significant concepts like “syndrome” or “disease” are excluded from the analysis
  – count the number of times the terms of a MeSH concept occur in the OMIM entry
    ● reflects the concept's relevance to the phenotype entry
    ● the relevance is locally weighted (for each entry) according to how detailed the concept is (e.g. “eye” -> “retina” -> “photoreceptors”) and how detailed the record itself is (shorter records have fewer occurrences)
  – the significance of a feature/concept is also weighted according to the inverse document frequency measure:
    ● global weight of a concept c:

$$gw_c = \log_2\left(\frac{\text{number of entries analyzed}}{\text{number of entries that contain } c}\right)$$
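
The global weight can be computed directly from per-entry concept counts. The sketch below covers only this inverse-document-frequency step; the local weighting described above is not modelled, and the toy entries and the function name `global_weights` are assumptions made for illustration.

```python
import math
from typing import Dict, List


def global_weights(entries: List[Dict[str, int]]) -> Dict[str, float]:
    """Compute gw_c = log2(N / n_c) for every concept c.

    N is the number of entries analyzed and n_c the number of entries
    in which concept c occurs at least once.
    """
    n_entries = len(entries)
    containing: Dict[str, int] = {}
    for counts in entries:
        for concept, count in counts.items():
            if count > 0:
                containing[concept] = containing.get(concept, 0) + 1
    return {concept: math.log2(n_entries / n) for concept, n in containing.items()}


if __name__ == "__main__":
    # Toy concept counts for four hypothetical phenotype entries.
    toy_entries = [
        {"Breast Neoplasms": 3, "Neoplasms": 1},
        {"Prostatic Neoplasms": 2, "Neoplasms": 2},
        {"Retina": 4},
        {"Neoplasms": 1},
    ]
    for concept, weight in sorted(global_weights(toy_entries).items()):
        print(f"{concept}: {weight:.3f}")
```

A concept that appears in many entries (here "Neoplasms") receives a low weight, while a concept confined to a single entry receives the highest one.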

Phenomap Construction (3)

● Comparing OMIM phenotype entries:
  – similarity quantified by comparing feature vectors (corrected according to local and global weight of MeSH concepts)
  – similarity score s = cosine of the angle between the feature vectors a, b of two phenotype entries A and B:

$$s(a, b) = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}$$

where $a_i$ and $b_i$ are the weighted frequencies of feature/concept i in entries A and B.
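
As a minimal sketch, the cosine score can be computed for two sparse feature vectors stored as dictionaries (concept -> weighted frequency). The function name and the toy vectors are assumptions for illustration; returning 0.0 for an all-zero vector is a convention chosen here, not something stated on the slide.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse feature vectors."""
    # Concepts missing from b contribute zero to the dot product.
    dot = sum(weight * b.get(concept, 0.0) for concept, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for empty vectors (assumption)
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    entry_a = {"Breast Neoplasms": 3.0, "Neoplasms": 4.0}
    entry_b = {"Breast Neoplasms": 4.0, "Neoplasms": 3.0}
    entry_c = {"Retina": 2.0}
    print(cosine_similarity(entry_a, entry_b))  # shared concepts
    print(cosine_similarity(entry_a, entry_c))  # no shared concepts
```

Since the weighted frequencies are non-negative, the score lies between 0 and 1, matching the normalized range of the phenomap.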

Phenomap Evaluation

● evaluation on a subset of 1653 phenotypes for which the causative gene and protein were known
● average of 10 randomized phenomaps as control for background signal
● results:
  – genes involved in similar phenotypes often have similar sequences
    ● allelic mutations of the same gene can cause similar phenotypes
    ● paralogs that have similar function can cause similar phenotypes
    ● genes that share functional protein domains (only partly similar sequence) can cause similar phenotypes
  – genes involved in similar phenotypes often share a protein-protein interaction or are related to the same pathway
  – genes involved in similar phenotypes often share Gene Ontology (GO) annotations

References

● OMIM:
  – Hamosh et al., “Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders”, Nucleic Acids Research 30(1):52-55, 2002.
  – website: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
● Phenome Map / MimMiner:
  – van Driel et al., “A text-mining analysis of the human phenome”, European Journal of Human Genetics 14:535-542, 2006.
  – website: http://www.cmbi.ru.nl/MimMiner/

Exercises (1)

● What are the ten most similar phenotypes of “Amyotrophic Lateral Sclerosis” (ALS) in the phenomap?
● What can you find out about the most similar phenotype?
  – To which OMIM category (entry class) does its entry belong? Why?
● Is “Hexosaminidase A Deficiency” also related to ALS?
  – What are the ID and class of its OMIM entry?
  – What gene causes the disease?
  – What is the most similar phenotype to “Hexosaminidase A Deficiency”? Why is it so similar?

Exercises (2)

● What is the most similar phenotype to “Breast Cancer”?
  – Is “Breast Cancer” also the most similar phenotype to that phenotype according to the phenomap, i.e. does the reverse relationship hold?
● Does it make sense to talk about “most similar phenotypes”? Is a phenomap score of 0.5225 so much better than 0.5108?
  – Remember how the phenomap is constructed and what assumptions are made
● Does the phenomap accurately reflect phenotype similarities? What is its strength/advantage?
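
For exploring questions such as "ten most similar phenotypes" or "does the reverse relationship hold", a small ranking helper may be useful. The sketch below uses only the four-phenotype excerpt of the score matrix from the Phenomap slide, so its output is not the answer to the exercises; the names `most_similar` and `is_reciprocal` are invented for this example.

```python
from typing import Dict, List, Tuple

# Off-diagonal scores from the four-phenotype excerpt on the Phenomap slide.
SCORES: Dict[str, Dict[str, float]] = {
    "#114480": {"#176807": 0.5108, "#120435": 0.4560, "+176860": 0.3027},
    "#176807": {"#114480": 0.5108, "#120435": 0.3881, "+176860": 0.2389},
    "#120435": {"#114480": 0.4560, "#176807": 0.3881, "+176860": 0.2132},
    "+176860": {"#114480": 0.3027, "#176807": 0.2389, "#120435": 0.2132},
}


def most_similar(
    scores: Dict[str, Dict[str, float]], query: str, k: int = 10
) -> List[Tuple[str, float]]:
    """Return up to k (phenotype, score) pairs, best score first."""
    ranked = sorted(scores[query].items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def is_reciprocal(scores: Dict[str, Dict[str, float]], query: str) -> bool:
    """Check whether the query is also the top hit of its own top hit."""
    best, _ = most_similar(scores, query, k=1)[0]
    best_of_best, _ = most_similar(scores, best, k=1)[0]
    return best_of_best == query


if __name__ == "__main__":
    for phenotype in SCORES:
        top, score = most_similar(SCORES, phenotype, k=1)[0]
        print(phenotype, "->", top, score, "reciprocal:", is_reciprocal(SCORES, phenotype))
```

Even in this tiny excerpt, "most similar" is not always mutual: the top hit of #120435 is #114480, whose own top hit is #176807.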