1
Experimental Approaches in Nutrigenetics #2:
Studying the Effects of Phytochemicals on Human Health and Disease
Cory Brouwer, Ph.D.
Director Bioinformatics Services and Professor of Bioinformatics and Genomics
2
Outline
bioservices.uncc.edu
Nutrigenetic discoveries
How they were made
Bioinformatics in Nutrigenetics/Nutrigenomics
P2EP Knowledgebase
Visualization
3
1-Carbon Metabolism
Produce the building blocks of DNA and RNA
S-adenosylmethionine (SAM) – methyl group donor
Vitamins B12, B6 and folic acid co-enzymes
Amino acid methionine
Choline metabolism
Possible links to Cognitive function
CVD
Cancer
Etc.
Fredriksen et al. (2007) Hum. Mutat. 28(9) 856-865.
4
Jay JJ, Brouwer C. (2016) Lollipops in the clinic: information dense mutation plots for precision medicine. PLoS ONE 11(8).
1-Carbon Metabolism
Fredriksen et al. (2007) Hum. Mutat. 28(9) 856-865.
5
Finding Variation
>2,000 variants discovered through GWAS
Usually require a minor allele frequency > 5%
Typically produce Manhattan Plots
GWAS Manhattan plot showing differences in allele frequency between responders and nonresponders to n-3 PUFA supplementation.
Iwona Rudkowska et al. J. Lipid Res. 2014;55:1245-1253
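A Manhattan plot places each variant at its genomic position on the x-axis and -log10(p) on the y-axis. The following is a minimal Python sketch of how those plot coordinates could be derived, assuming invented variant records (the IDs, positions, frequencies and p-values are illustrative only and are not data from the cited study). It applies the MAF > 5% filter and flags variants at the conventional genome-wide threshold of p ≤ 5×10⁻⁸.

```python
import math

# Hypothetical variant records: (id, chromosome, position, minor allele frequency, p-value).
# All values are invented for illustration.
VARIANTS = [
    ("rs_a", 1, 1_200_000, 0.21, 3.0e-9),
    ("rs_b", 1, 5_400_000, 0.02, 1.0e-10),  # fails the MAF filter
    ("rs_c", 2, 800_000, 0.35, 4.0e-4),
    ("rs_d", 2, 9_900_000, 0.12, 5.0e-8),
]

MAF_MIN = 0.05        # common-variant filter (> 5%)
GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def manhattan_points(variants, maf_min=MAF_MIN):
    """Return (id, chromosome, position, -log10(p)) for variants passing the MAF filter."""
    points = []
    for vid, chrom, pos, maf, p in variants:
        if maf > maf_min:
            points.append((vid, chrom, pos, -math.log10(p)))
    # Sort by chromosome, then position: the usual x-axis order.
    return sorted(points, key=lambda pt: (pt[1], pt[2]))


def significant(variants, threshold=GENOME_WIDE_P, maf_min=MAF_MIN):
    """IDs of MAF-filtered variants with p at or below the threshold."""
    return [vid for vid, _, _, maf, p in variants if maf > maf_min and p <= threshold]


points = manhattan_points(VARIANTS)
hits = significant(VARIANTS)
for vid, chrom, pos, score in points:
    print(f"{vid}\tchr{chrom}:{pos}\t-log10(p)={score:.2f}")
print("genome-wide significant:", hits)
```

The resulting tuples can be handed to any plotting library; the sketch stops at the coordinates so that it has no dependencies beyond the standard library.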
6
NHGRI GWA Catalog
www.genome.gov/GWAStudies
www.ebi.ac.uk/fgpt/gwas/
Published Genome-Wide Associations through 12/2013
Published GWA at p ≤ 5×10⁻⁸ for 17 trait categories
7
However…
Even large GWAS typically only explain a fraction of heritability
Type 2 diabetes >150,000 individuals – 11% of heritability (Morris et al. Nat. Genet. 2012;44:981–990)
Crohn's disease >210,000 individuals – 23% of heritability (Franke et al. Nat. Genet. 2010;42:1118–1125)
So what other approaches can you take?
8
Candidate Gene Approach
or RNA-seq
http://www.koonec.com/k-blog/2010/06/22/how-to-select-candidate-genes-for-your-association-study/
9
UNCC Bioinformatics at NCRC
1. Bioinformatics Services
2. Bioinformatics Research
3. Student Training (Ph.D., Masters, Undergrad.)
10
Bioinformatics Services Division
Majority Ph.D. level research staff
Specializations in: NGS (DNA-seq, RNA-seq, reference and de novo), biostatistics, gene expression, metabolomics, metagenomics, variant analysis, pathway analysis, data integration, text mining
Full time high performance computing expert
Full time office manager and administrative staff
Student interns
11
Bioinformatics Services Division
1000+ core Linux cluster
8 GB RAM per core
1 PB Compellent Storage Solution
Multiple high-memory servers: 2 with 2 TB RAM, 1 with 4 TB RAM
Multiple Mac, Linux and Windows workstations
High Performance Computing
12
Software Resources
Ingenuity IPA & GeneGo Metacore, integrated knowledge databases and software suites for pathway analysis of experimental data and gene lists.
Linguamatics i2e, text mining software that extracts relevant facts and relationships from Medline and AGRICOLA.
CLC Bio, which provides recent algorithms for the analysis of genomic, transcriptomic and epigenomic NGS data.
Geneious Pro, a software platform for searching, organizing and analyzing genomic and protein information.
Umetrics SIMCA-P+, software for experimental design and multivariate data analysis.
SAS program package for statistical analyses.
Numerous open-source software packages as well as in-house developed applications, scripts and workflows
13
Omics
Genomics: GWAS, WGS, WES, etc.
Epigenomics: Methyl-seq
Transcriptomics: RNA-seq
Proteomics: Protein-Protein interactions
Metabolomics: Mass Spec, NMR
14
Bringing Information Together
Figure labels: ‘Omics (Transcriptional Profiling, Proteomics, Metabonomics); Functional Screens (RNAi); Genetics; Pathways; PPi Networks; Literature; Disease; Knowledge of human health and nutrition
15
Traditional Bioinformatics
BLAST Search
ClustalW Alignment
Protein Domains
16
Traditional Bioinformatics
Command-line Programming
17
Knowledge-Based Bioinformatics
Text Mining
Ontologies
Knowledge bases
Network analysis
18
Growth of PubMed 1986-2010
Zhiyong Lu. Database 2011;2011:baq036
19
Text mining concepts
Information Retrieval: Define relevant literature
Entity Recognition: Identifying agricultural and biomedical entities
Information Extraction: Formatting of typified information
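The three stages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used for the talk: the mini-corpus, the entity dictionary and the function names `retrieve`, `recognize` and `extract` are all hypothetical, and real systems use far richer retrieval and recognition than substring and dictionary matching.

```python
import re

# Hypothetical mini-corpus and entity dictionary, invented for illustration.
ABSTRACTS = {
    "doc1": "We measured kaempferol in broccoli and PIK3CA expression in breast cancer cell lines.",
    "doc2": "Soil nitrogen affects yield in maize.",
    "doc3": "Broccoli intake and colorectal cancer risk.",
}
ENTITY_DICT = {
    "broccoli": "plant",
    "kaempferol": "phytochemical",
    "pik3ca": "gene",
    "breast cancer": "disease",
    "colorectal cancer": "disease",
}


def retrieve(abstracts, query_terms):
    """Information retrieval: keep documents mentioning any query term."""
    terms = [t.lower() for t in query_terms]
    return [doc_id for doc_id, text in abstracts.items()
            if any(t in text.lower() for t in terms)]


def recognize(text, dictionary):
    """Entity recognition: case-insensitive dictionary lookup on word boundaries."""
    lowered = text.lower()
    return [(term, etype) for term, etype in dictionary.items()
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)]


def extract(abstracts, doc_ids, dictionary):
    """Information extraction: emit one typed record per recognized entity."""
    return [{"doc": d, "entity": term, "type": etype}
            for d in doc_ids
            for term, etype in recognize(abstracts[d], dictionary)]


relevant = retrieve(ABSTRACTS, ["cancer"])
records = extract(ABSTRACTS, relevant, ENTITY_DICT)
for rec in records:
    print(rec)
```

Each record carries the document ID alongside the typed entity, which is the kind of structured output a knowledge base can ingest.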
20
Results
P2EP Knowledge Base – Hypothesis Generation
OMIM
Homologene
NCBI Taxonomy
AGRICOLA
PubMed
NCBI Gene
BioCyc
Allen Brain Atlas
Gene Expression (GEO)
Gene Ontology
Mammalian Phenotype Ontology
Human Phenotype Ontology
Reactome
ChEBI
P2EP Internal data
Sequence
Markers
Curated Gene Lists
Curated Literature
Curated Pathways
P2EP-KB
Data Sources
Queries
21
P2EP Knowledge-base Statistics
Node Source              Unique Count  Description
ncbi.gene                  14,314,272  All NCBI Gene identifiers
ncbi.taxonomy               1,113,614  Every entry in NCBI Taxonomy tree
ncbi.pubmed                 1,075,534  PubMed records with Gene Annotations
chebi                          41,824  ChEBI nodes (full tree)
gene_ontology                  40,959  Gene Ontology nodes (full tree)
tair                           33,334  TAIR Gene Identifiers
biocyc.gene                    29,770  BioCyc Gene Identifiers
biocyc.enzyme_reaction         27,408  BioCyc Enzyme-specific Reaction Identifiers
ensembl                        20,056  Ensembl Identifiers referenced in NCBI
biocyc.protein                 18,271  BioCyc Protein Identifiers
hpo                            10,491  Human Phenotype Ontology terms
mpo                            10,244  Mammalian Phenotype Ontology terms
biocyc.reaction                 4,627  BioCyc Reaction Identifiers
mim                             3,995  OMIM Identifiers
biocyc.compound                 3,444  BioCyc Compound Identifiers
biocyc.regulation               1,383  BioCyc Regulation Identifiers
plant_trait_ontology            1,327  Plant Trait Ontology Identifiers
enzyme_commission               1,191  EC Numbers
biocyc.pathway                    563  BioCyc Pathway Identifiers
mirbase                           425  miRBase Identifiers
unit_ontology                     330  Unit Ontology Identifiers
plant_ontology                    193  Plant Ontology Identifiers
433,383,352 Linked Pairs
22
Knowledge-base Workflow
Ontologies
• MeSH
• NALT
• ChEBI
• Entrez Gene
• KEGG
• Text Mining/NLP
• Ontology based
• Query development
• Co-occurrence (6,7)
Agricola
Graph Database
JSON (Mining Results)
CSV (Intermediate Files)
EXTRACT data (Python)
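One way the JSON-to-CSV step of such a workflow might look is sketched below in Python. The JSON layout, the field names (`pmid`, `entities`, `id`, `type`) and the CSV columns are assumptions made for illustration; the slides do not show the actual P2EP file formats. The sketch counts how often two entities are mentioned in the same record and writes the pairs as an edge list that a graph database loader could read.

```python
import csv
import io
import json
from collections import Counter
from itertools import combinations

# Hypothetical mining output; field names are assumptions, not the P2EP schema.
MINING_JSON = """
[
  {"pmid": "100", "entities": [{"id": "CHEBI:1", "type": "chemical"},
                                {"id": "GENE:5", "type": "gene"}]},
  {"pmid": "101", "entities": [{"id": "CHEBI:1", "type": "chemical"},
                                {"id": "GENE:5", "type": "gene"},
                                {"id": "MESH:D1", "type": "disease"}]}
]
"""


def cooccurrence_edges(records):
    """Count, per unordered entity pair, the records in which both appear."""
    counts = Counter()
    for rec in records:
        ids = sorted({e["id"] for e in rec["entities"]})
        for a, b in combinations(ids, 2):
            counts[(a, b)] += 1
    return counts


def edges_to_csv(counts):
    """Serialize the edge counts as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target", "cooccurrences"])
    for (a, b), n in sorted(counts.items()):
        writer.writerow([a, b, n])
    return buf.getvalue()


records = json.loads(MINING_JSON)
edges = cooccurrence_edges(records)
csv_text = edges_to_csv(edges)
print(csv_text)
```

Sorting the IDs within each record keeps every pair in one canonical orientation, so (A, B) and (B, A) are never counted as separate edges.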
23
Literature Based Discovery
“Literature Based Discovery (LBD) refers to a particular type of text mining that seeks to identify nontrivial assertions that are implicit, and not explicitly stated, and that are detected by juxtaposing (generally a large body of) documents”
-Neil R. Smalheiser
24
ABC Model of Discovery
Concept A: Dietary Fish Oils
Concept C: Raynaud’s Syndrome
Concept B: Blood Viscosity
Concept B: Platelet Aggregation
Concept B: Vascular Reactivity
Figure adapted from Figure 1 of: Smalheiser, Neil R., Vetle I. Torvik, and Wei Zhou. "Arrowsmith two-node search interface: A tutorial on finding meaningful links between two disparate sets of articles in MEDLINE." Computer methods and programs in biomedicine 94.2 (2009): 190-197.
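The ABC model reduces to a set intersection: find the B concepts that co-occur with A in one set of documents and with C in another, when A and C never appear together. The Python sketch below illustrates this with the concept names from the slide. The toy documents are invented, and the function names `cooccurring` and `abc_bridges` are hypothetical; this is not the Arrowsmith implementation.

```python
# Toy "literatures": each document is a set of concept terms.
# The documents are invented; only the concept names come from the slide.
A_TERM = "dietary fish oils"
C_TERM = "raynaud's syndrome"

DOCS = [
    {"dietary fish oils", "blood viscosity"},
    {"dietary fish oils", "platelet aggregation"},
    {"dietary fish oils", "vascular reactivity", "triglycerides"},
    {"raynaud's syndrome", "blood viscosity"},
    {"raynaud's syndrome", "platelet aggregation", "vascular reactivity"},
    {"raynaud's syndrome", "cold exposure"},
]


def cooccurring(term, docs):
    """All other terms that share at least one document with `term`."""
    linked = set()
    for doc in docs:
        if term in doc:
            linked |= doc - {term}
    return linked


def abc_bridges(a, c, docs):
    """B terms linked to both A and C; empty if A and C already co-occur."""
    if any(a in doc and c in doc for doc in docs):
        return set()  # the A-C link is explicit, so nothing implicit remains
    return cooccurring(a, docs) & cooccurring(c, docs)


bridges = abc_bridges(A_TERM, C_TERM, DOCS)
print(sorted(bridges))
```

Terms linked to only one side, such as "triglycerides" or "cold exposure" in the toy data, drop out of the intersection, which is what narrows a large literature down to candidate hypotheses.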
25
Cancer Treatment
“We know of many food constituents that have anti-cancer properties,” says Dr. Steven Zeisel, director of the University of North Carolina’s Nutrition Research Institute … Garlic, broccoli, green tea and turmeric, for example, have been shown to fight cancer through extensive good research, he says. “But we do not know precisely which mixture of these constituents works best.”
26
Cancer pathways
PI3K–AKT–mTOR signaling network
Fruman & Rommel. Nature Reviews Drug Discovery 13, 140–156 (2014)
27
Broccoli to Cancer
Figure key (node types): Plant, Phytochemical, Drug target, Disease
Plant: Brassica oleracea
Phytochemical: Hydroxycinnamic acid
Drug target: PIK3CB, PIK3CD, PIK3CG, PIK3CA
Disease: Thyroid Cancer, Endometrial Cancer, Prostate Cancer, Cervical Cancer, Neuroblastoma, Breast Cancer, Colorectal Cancer
Edge label: has
28
Broccoli to Cancer
Figure key (node types): Plant, Phytochemical, Drug target, Disease
Plant: Brassica oleracea
Phytochemical: Hydroxycinnamic acid, Kaempferol
Drug target: PIK3CB, PIK3CD, PIK3CG, PIK3CA
Disease: Thyroid Cancer, Endometrial Cancer, Prostate Cancer, Cervical Cancer, Neuroblastoma, Breast Cancer, Colorectal Cancer
Edge label: has
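A chain such as plant, phytochemical, drug target, disease can be followed in a knowledge graph with a simple breadth-first search. The Python sketch below is an illustration only: the node names and types come from the slide, but the edges are placeholders chosen to make the example run and do not reproduce the links drawn in the figure, and `paths_to_type` is a hypothetical helper rather than part of P2EP.

```python
from collections import deque

# Node types follow the slide's key. The edges are placeholders for
# illustration and do NOT reproduce the links drawn in the figure.
NODE_TYPE = {
    "Brassica oleracea": "Plant",
    "Hydroxycinnamic acid": "Phytochemical",
    "Kaempferol": "Phytochemical",
    "PIK3CA": "Drug target",
    "PIK3CG": "Drug target",
    "Breast Cancer": "Disease",
    "Colorectal Cancer": "Disease",
}
EDGES = {
    "Brassica oleracea": ["Hydroxycinnamic acid", "Kaempferol"],
    "Hydroxycinnamic acid": ["PIK3CA"],
    "Kaempferol": ["PIK3CG"],
    "PIK3CA": ["Breast Cancer", "Colorectal Cancer"],
    "PIK3CG": ["Breast Cancer"],
}


def paths_to_type(start, target_type, edges, node_type):
    """Breadth-first enumeration of paths from `start` to nodes of `target_type`."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node_type.get(node) == target_type:
            found.append(path)
            continue  # stop extending once a target node is reached
        for nxt in edges.get(node, []):
            if nxt not in path:  # avoid revisiting nodes on this path
                queue.append(path + [nxt])
    return found


paths = paths_to_type("Brassica oleracea", "Disease", EDGES, NODE_TYPE)
for p in paths:
    print(" -> ".join(p))
```

Each returned path is a candidate hypothesis of the form "this plant contains a compound that acts on a target implicated in this disease", which would then need checking against the underlying literature.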
29
P2EP Knowledgebase – Web Interface
30
P2EP Knowledgebase
Plant Genetic Markers
Plant Genomic Sequence
Plant Genes
Plant Pathways
Plant Phytochemicals & nutrients
Human genes
Human Pathways
Human Health & Disease
31
Plant Pathway Elucidation Project (P2EP)
Scientific Discovery:
What plants make
How they make them
What is the effect on human health?
Educational Opportunity:
Students training students (Ph.D.s -> undergrads)
Research opportunities for undergrads
Knowledgebase Creation:
Assemble what is known about plant metabolic pathways
Capture what is discovered
32
1-Carbon Metabolism
33
https://github.com/pbnjay/lollipops
34
35
36
37
Pathview – pathview.uncc.edu
RNA-seq results
Metabolomics
Proteomics
Anything mapped to genes or metabolites
38
Software Developed by BiSD
LinkageMapView
Exome Project Reports & Lollipops
39
Acknowledgements
Richard Linchangco
Jeremy Jay
Rob Reid
Weijun Luo
Bioinformatics Services Group
P2EP Students and PIs