Cincinnati Comparative Mouse Genomics Centers Consortium: Bioinformatics Analysis Tools for Assessment of Human Gene Polymorphisms
Anil G Jegga, Sivakumar Gowrisankar, Jing Chen, Rafal Adamczak, Ashima Gupta, Marc A Ramirez, Kalyan SC Andra, James W Carman, Bruce J Aronow
University of Cincinnati and Cincinnati Children’s Hospital Medical Center, Cincinnati, OH-45229
A principal goal of the NIEHS Comparative Mouse Genome Centers Consortium (CMGCC) is to systematically evaluate the effect of human genome polymorphisms on critical genes, pathways, and processes that alter the impact of environmental agents on human disease. The evaluation process identifies polymorphisms, performs sequence analysis, and assesses disease association and functional impact using mouse models. Sequence analysis makes it possible to prioritize functional evaluation in favor of polymorphisms likely to have a harmful impact. The University of Cincinnati – CCMGC (Cincinnati Comparative Mouse Genomics Center; http://cmgcc.cchmc.org) has developed several bioinformatics analysis tools to improve visualization, assessment, and ranking of polymorphisms. We are now collaborating with the Universities of Washington and Utah and have developed methods to study all of the genes within the NIEHS Environmental Genome Project (EGP). Our analyses and tool development focus on three areas: genes, proteins, and pathways. In particular, we have analyzed most of the genes within the DNA repair and cell cycle control categories.
Support: NIEHS U01 ES11038 Mouse Centers Genomics Consortium
PolyDoms workflow (figure labels):
Human RefSeq Proteins
NCBI dbSNP (non-EGP Genes)
EGP SNPs
Domain Mapping
PolyDoms
Pfam Domain Database
Biological Implication: Text Parsing, PolyPhen/SIFT
PolyDoms: We have now mapped all non-synonymous SNPs (nsSNPs) of EGP genes onto the corresponding conserved and known functional protein domains. The potential protein-structure-altering implications of the coding SNPs have been collected into a general visualizer for the polymorphic proteins (http://polydoms.cchmc.org) using annotations from PolyPhen (Polymorphism Phenotyping; http://tux.embl-heidelberg.de/ramensky/) and the SIFT (Sorting Intolerant From Tolerant; http://blocks.fhcrc.org/~pauline/SIFT.html) server. Links to MEDLINE abstracts that refer to the disease implications of any polymorphism of each protein are also provided when available and are automatically updated for each protein. We are also extending the mapping of the nsSNPs to the context of the 3D protein structure.
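The domain-mapping step described above can be illustrated with a minimal sketch. It is not the PolyDoms implementation: the class names, the toy variants, and the domain boundaries are all hypothetical and chosen only to show the interval test that places an nsSNP inside or outside a domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    start: int  # first residue of the domain (1-based, inclusive)
    end: int    # last residue of the domain (1-based, inclusive)


@dataclass(frozen=True)
class NsSnp:
    ref: str       # reference amino acid
    position: int  # residue position in the protein
    alt: str       # variant amino acid


def map_snps_to_domains(snps, domains):
    """Return {snp: [names of domains whose span contains the SNP position]}."""
    return {
        snp: [d.name for d in domains if d.start <= snp.position <= d.end]
        for snp in snps
    }


# ASSUMPTION: toy variants and toy domain boundaries, not taken from any real protein.
toy_snps = [NsSnp("Ala", 12, "Val"), NsSnp("Gly", 57, "Ser"), NsSnp("Asp", 140, "Asn")]
toy_domains = [Domain("hypothetical_domain", 50, 120)]

mapping = map_snps_to_domains(toy_snps, toy_domains)
for snp, hits in mapping.items():
    label = ", ".join(hits) if hits else "outside listed domains"
    print(f"{snp.ref}{snp.position}{snp.alt}: {label}")
```

In a real setting the domain boundaries would come from a domain database such as Pfam and the variants from dbSNP or EGP resequencing data.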
TraFaC: To identify gene features potentially susceptible to the effects of polymorphisms, we have developed a system that uses mouse-human comparative genomic sequence analysis to identify conserved cis-element clusters that could act as gene regulatory regions. Our current implementation of this tool focuses on the identification and characterization of genomic features that are conserved between mouse and human (http://trafac.cchmc.org). Additional functionality under development will allow direct visualization of conserved features altered by insertions, deletions, and non-coding SNPs.
TF binding sites, Seq 1:
V$ETSF/ETS1_B 8333-8347
V$STAT/STAT1_01 8335-8355
V$ETSF/PU1_B 8335-8350
V$ETSF/GABP_B 8336-8347
V$ETSF/NRF2_01 8338-8347
V$CLOX/CDPCR3_01 8363-8377
V$EVI1/EVI1_01 8373-8388
TF binding sites, Seq 2:
V$ETSF/ETS1_B 8880-8894
V$STAT/STAT1_01 8881-8901
V$ETSF/PU1_B 8882-8897
V$ETSF/NRF2_01 8892-8902
V$CLOX/CDPCR3_01 8908-8922
V$GATA/GATA_C 8916-8928
V$FKHD/FREAC2_01 8923-8938
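As a rough illustration of what a conserved cis-element cluster looks like in these two hit lists, the sketch below pairs hits that share a matrix name and checks whether they occur in the same order in both sequences. This is a simplification written for this example, not the TraFaC algorithm; it assumes each matrix name appears at most once per list, and the function name is hypothetical.

```python
# Binding-site hits copied from the two lists above: (matrix, start, end).
seq1_sites = [
    ("V$ETSF/ETS1_B", 8333, 8347),
    ("V$STAT/STAT1_01", 8335, 8355),
    ("V$ETSF/PU1_B", 8335, 8350),
    ("V$ETSF/GABP_B", 8336, 8347),
    ("V$ETSF/NRF2_01", 8338, 8347),
    ("V$CLOX/CDPCR3_01", 8363, 8377),
    ("V$EVI1/EVI1_01", 8373, 8388),
]
seq2_sites = [
    ("V$ETSF/ETS1_B", 8880, 8894),
    ("V$STAT/STAT1_01", 8881, 8901),
    ("V$ETSF/PU1_B", 8882, 8897),
    ("V$ETSF/NRF2_01", 8892, 8902),
    ("V$CLOX/CDPCR3_01", 8908, 8922),
    ("V$GATA/GATA_C", 8916, 8928),
    ("V$FKHD/FREAC2_01", 8923, 8938),
]


def shared_sites(sites_a, sites_b):
    """Pair hits with the same matrix name: (matrix, start_a, start_b, offset)."""
    # ASSUMPTION: each matrix name occurs at most once in sites_b.
    starts_b = {name: start for name, start, _ in sites_b}
    return [
        (name, start, starts_b[name], starts_b[name] - start)
        for name, start, _ in sites_a
        if name in starts_b
    ]


pairs = shared_sites(seq1_sites, seq2_sites)
for name, start_a, start_b, offset in pairs:
    print(f"{name}: seq1 {start_a}, seq2 {start_b}, offset {offset}")

# Shared hits that keep the same left-to-right order in both sequences
# are candidates for a conserved cluster.
seq2_starts = [p[2] for p in pairs]
same_order = seq2_starts == sorted(seq2_starts)
print("shared matrices:", len(pairs), "| same order:", same_order)
```

For the hits listed here, five matrices are shared, they appear in the same order, and their offsets between the two sequences stay within a few bases of each other (545 to 554).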
>Seq 1 Genomic
AGAGAAAATTGCTAGAGCTCAGGAGTTTGAGACCAGCCTGGGCAATAGAGTAAGACTTTGTCTCTATCAAAAATTTAAAAATTAACTGGGCTTGGCGGTGTGCACCTGTGGTCCAGCTACTCAGGAGGCTGAGGTGGGAGGATTGCTTGAGCCCAAGAGGTTGAGGCTGCAGTAAGCCGT
>Seq 2 Genomic
GACTGAGGGCTTGTGAAACAGCAAGAACCTGTCTCAAAAAACAGTGGGCAGGGAGGGGATTAATGAATAGGCAGCTACGTTCTGGGACTGGAGGGACTCGAGGTGGCTAGAAAGCAAGAGGTACTGGGAGACAAGGCTGCAGACATTTCTTTTTTTTTTTTTTTTTTTTGAGACAGAGTC
Local Alignment Number 5
Similarity Score: 3074; Match Percentage: 51%; Number of Matches: 96; Number of Mismatches: 39; Total Length of Gaps: 52
Begins at (8281, 8874) and ends at (8416, 9059)

| Seq 1 | Seq 2 | Sim% | No. of Nt |
|---|---|---|---|
| 8281-8300 | 8874-8893 | 70% | 20 nt |
| 8301-8310 | 8902-8911 | 90% | 10 nt |
| 8311-8324 | 8923-8936 | 57% | 14 nt |
| 8325-8376 | 8947-8998 | 62% | 52 nt |
| 8378-8386 | 8999-9007 | 67% | 9 nt |
| 8387-8416 | 9030-9059 | 90% | 30 nt |
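The summary figures for this alignment can be cross-checked against the block coordinates. The sketch below assumes that the match percentage is matches divided by matches plus mismatches plus total gap length; the output shown here does not state the definition, but this reading reproduces the reported 51%.

```python
matches, mismatches, gap_length = 96, 39, 52

# Ungapped blocks of the alignment: (seq1_start, seq1_end, seq2_start, seq2_end).
blocks = [
    (8281, 8300, 8874, 8893),
    (8301, 8310, 8902, 8911),
    (8311, 8324, 8923, 8936),
    (8325, 8376, 8947, 8998),
    (8378, 8386, 8999, 9007),
    (8387, 8416, 9030, 9059),
]

# Every block covers the same number of bases in both sequences.
block_lengths = [end1 - start1 + 1 for start1, end1, _, _ in blocks]
assert block_lengths == [end2 - start2 + 1 for _, _, start2, end2 in blocks]

aligned_columns = sum(block_lengths)  # ungapped columns: matches + mismatches

# Bases inside the aligned region that fall between blocks are gap positions.
seq1_span = blocks[-1][1] - blocks[0][0] + 1
seq2_span = blocks[-1][3] - blocks[0][2] + 1
gaps_from_blocks = (seq1_span - aligned_columns) + (seq2_span - aligned_columns)

# ASSUMPTION: percentage taken over all alignment columns, gaps included.
match_pct = round(100 * matches / (matches + mismatches + gap_length))

print(aligned_columns, gaps_from_blocks, match_pct)
```

The block lengths sum to 135 (96 matches plus 39 mismatches), the positions skipped between blocks add up to the 52 gap bases, and the match percentage rounds to 51.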
BlastZ
TF Binding Sites

| Seq 1 | Seq 2 | Sim% | Nt | Hits |
|---|---|---|---|---|
| 8301-8310 | 8902-8911 | 90% | 10 nt | 3 |
| 8311-8324 | 8923-8936 | 57% | 14 nt | 2 |
| 8325-8376 | 8947-8998 | 62% | 52 nt | 3 |
| 8378-8386 | 8999-9007 | 67% | 9 nt | 0 |
| 8387-8416 | 9030-9059 | 90% | 30 nt | 4 |

TraFaC
CMGCC-UC BIOINFORMATICS TOOLS
TraFaC (http://trafac.cchmc.org)
| Gene Group | No. of Genes | Genes with at least one nsSNP | Genes with no nsSNP | No. of nsSNPs | Genes with at least one nsSNP in a Pfam Domain | No. of nsSNPs in Pfam Domains | Genes with at least one PubMed Reference | SIFT Analysis: Genes Analysed | SIFT Analysis: % Intolerant (No. Intolerant / Total Analysed nsSNPs) |
|---|---|---|---|---|---|---|---|---|---|
| Cell Cycle | 76 | 56 | 20 | 159 | 41 | 69 | 31 | 35 | 21/82 = 26% |
| Cell Division | 17 | 12 | 5 | 52 | 7 | 14 | 10 | 6 | 7/34 = 21% |
| Cell Signaling | 21 | 18 | 3 | 102 | 11 | 44 | 19 | 10 | 17/67 = 25% |
| Cell Structure | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| DNA Repair | 70 | 66 | 4 | 445 | 47 | 130 | 40 | 42 | 29/274 = 11% |
| Gene Expression | 6 | 3 | 3 | 10 | 2 | 2 | 3 | 2 | 1/7 = 14% |
| Homeostasis | 21 | 17 | 4 | 58 | 14 | 36 | 14 | 11 | 11/37 = 30% |
| Metabolism | 22 | 17 | 5 | 76 | 17 | 55 | 18 | 11 | 12/37 = 32% |
| Total | 234 | 190 | 44 | 903 | 140 | 351 | 136 | 117 | 98/538 = 18% |

EGP genes (234): Incidence of SNPs in the context of functional domains
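The percent-intolerant column is a simple ratio, and it can be recomputed from the counts in the table. The sketch below does so per gene group and for the pooled counts; treating the Cell Structure row as zero analysed nsSNPs is an assumption based on its "0" entries, and the function name is illustrative.

```python
# (intolerant nsSNPs, nsSNPs analysed by SIFT) per gene group, from the table above.
sift_counts = {
    "Cell Cycle": (21, 82),
    "Cell Division": (7, 34),
    "Cell Signaling": (17, 67),
    "Cell Structure": (0, 0),  # ASSUMPTION: "0" means no nsSNP was analysed
    "DNA Repair": (29, 274),
    "Gene Expression": (1, 7),
    "Homeostasis": (11, 37),
    "Metabolism": (12, 37),
}


def percent_intolerant(intolerant, analysed):
    """Whole-number percentage, or None when no nsSNP was analysed."""
    return round(100 * intolerant / analysed) if analysed else None


per_group = {group: percent_intolerant(*counts) for group, counts in sift_counts.items()}
total_intolerant = sum(intolerant for intolerant, _ in sift_counts.values())
total_analysed = sum(analysed for _, analysed in sift_counts.values())
pooled = percent_intolerant(total_intolerant, total_analysed)

for group, pct in per_group.items():
    print(f"{group}: {pct if pct is not None else 'n/a'}")
print(f"Pooled: {total_intolerant}/{total_analysed} = {pooled}%")
```

Pooling the counts gives 98 intolerant out of 538 analysed nsSNPs, which is about 18%.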
PolyDoms (http://polydoms.cchmc.org)
PathMaker
PathMaker: To represent the presence and impact of polymorphisms in the context of biological pathways, we have sought to unify our representation of molecular, biological, and environmental entities so that biological knowledge from experts and the biomedical literature can be assembled on a storyboard canvas. For example, the representation of a disease could consist of a biological process that is itself composed of one or more pathways, within which entities (gene products, complexes, and cellular and sub-cellular components) are subjected to one or more interactions and transitions to states associated with disease terms. We have begun developing an application and database structure that can represent these processes, using a host of publicly available data sources, including gene objects and biological ontologies, to represent biomedical literature and expert knowledge.
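One way to picture the nesting described above (a process made of pathways, pathways made of interactions between entities that carry a state) is a small object model. The sketch below is hypothetical and is not the PathMaker schema: the class and field names are invented for illustration, the action names follow the Action list in the PathMaker ontology figure, and the example entities are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    BIND = "bind"
    DISSOCIATE = "dissociate"
    ACTIVATE = "activate"
    INHIBIT = "inhibit"
    CONVERT = "convert"


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str              # e.g. "gene product", "complex", "cellular component"
    state: str = "normal"  # a disease-associated state would be recorded here


@dataclass(frozen=True)
class Interaction:
    source: Entity
    action: Action
    target: Entity


@dataclass
class Pathway:
    name: str
    interactions: list = field(default_factory=list)

    def entities(self):
        """Distinct entities in order of first appearance."""
        seen = {}
        for interaction in self.interactions:
            seen.setdefault(interaction.source.name, interaction.source)
            seen.setdefault(interaction.target.name, interaction.target)
        return list(seen.values())


@dataclass
class Process:
    name: str
    pathways: list = field(default_factory=list)


# ASSUMPTION: placeholder entities, not real molecules or pathways.
protein_a = Entity("protein_A", "gene product")
complex_b = Entity("complex_B", "complex", state="damaged")
pathway = Pathway("example_pathway", [Interaction(protein_a, Action.BIND, complex_b)])
process = Process("example_process", [pathway])

print([entity.name for entity in process.pathways[0].entities()])
```

Keeping state on the entity rather than on the interaction is only one possible design; a model that tracks transitions between states would need an explicit before and after state on each interaction.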
Data sources (Molecular Databases; Processes & Associated Databases): Ensembl, GenBank, Swiss-Prot, KEGG, OMIM, PubMed
PathMaker Application Layers: Search and Browse; Organization and Modification Layer; Display Layer
Components: PathMaker Canvas and Navigator, Ontology Categorizer, State Changer, Ontology Browser, Model Viewer, Complex Builder, Taxonomy Viewer, General Search, Molecule Search, Ontology Explorer
Generalized Biological Object Model: Molecular Entities & Relationships; Pathways, Processes & Diseases; Biological Ontologies
PATHMAKER: A systems biology modeling and data mining tool built on a generalized biological object model able to represent the interaction of genes, biologic processes, environment, oncogenic pathways, and disease.
Biomaterial: Taxon, Developmental, Temporal, Anatomic (body part, organ, tissue, cell, subcellular)
Process: Physiological, gene ontology (molecular function, biological process, cellular localization), Temporal, Pathologic
Toxicologic Injury, Genomic damage, SNOMED pathology, ICD-9/10, OMIM
Exposure: Xenobiotic, Microbiologic, Pharmacologic, Biologic
Clinical Outcome
Pathway: Biomaterial, Process_In, Process_out, Molecule_in, Molecule_out, Action (bind, dissociate, activate, inhibit, convert)
PathMaker uses ontologies of specific biological domain knowledge to model the effects of environment, genetic variation, and therapy on oncogenic molecular pathways and disease processes.
Molecule: Gene, feature, promoter, cis-element, product, transcript, splice form, protein, domain, polymorphism, snp, ins/del
Chromosome: structure, damage state
Chemical: structure, reaction
Molecular Complex: composition, modification state
Human-Mouse Comparative Genomics Analysis of OGG1 for Coding and Non-Coding Regulatory Region Conservation
Genomic Sequence with Exons (red)
% sequence identity
Conserved cis-element density between human and mouse
Regions of sequence similarity between human and mouse
Conserved cis-elements in 2nd intron of Ogg1
| Type | Amino acid change | Posit. | Allele | Freq | #Chrom. |
|---|---|---|---|---|---|
| nonsyn | Arg 229 Gln | 15804 | G/A | 0.03 | 840 |
| nonsyn | Ala 288 Val | 17566 | C/T | 0.01 | 180 |
| nonsyn | Asp 322 Asn | 18056 | G/A | 0.01 | 154 |
| nonsyn | Ser 326 Cys | 18069 | C/G | 0.29 | 756 |

* Disease Implication: Esophageal cancer (Xing et al 2001); Lung cancer (Sugimura et al 1999); Prostate cancer (Xu et al 2002); Stomach cancer (Hanaoka et al 2001)
OGG1: Mapping Non-Synonymous SNPs onto Conserved Protein Domains & 3D Structure
OGG1 Peptide Sequence 345 aa
Protein domain-pfam00730, HhH-GPD superfamily base excision DNA repair protein
Ala 288 Val, Arg 229 Gln, Asp 322 Asn
Clinical/Experimental GeneChip DataSet
Questions: what are the relevant patterns for disease/biology?
[Figure: gene expression heat map. Sample groups: Controls; Poly-Articular JRA Course. Each sample column is labeled with a sample ID, diagnosis (Pauci, Poly, Syst, Spond, Control), course (1_Pauci, 2_Poly, 3_Systemic, JAS, JSPA, na), MTX (0 or 1), and XR finding (normal, erosions, space narrowing, sclerosis, unknown, na).]
242 genes
105 Genes with Significantly Lower Expression in PolyArticular JRA
137 Genes with Significantly Higher Expression in PolyArticular JRA
Individuals (33 patients + 12 controls)