DESCRIPTION
Background: The major bottleneck in genome sequencing is no longer data generation, but the computational challenges around data analysis, display and integration. New approaches and methods are, therefore, required to meet these challenges. Visual analytics is the representation and presentation of data that exploits human visual perception abilities in order to amplify cognition. Opportunities exist for African researchers to expand the use of visual discovery tools and curated datasets to enable visual discovery (exploration, mining and analysis via interactive visual interfaces) of bioinformatics results from high-quality genomics research.
Methods: We are developing a system of visual analytics resources that are based on molecular and clinical data, including molecular consequences of single nucleotide variants, the RNA-seq expression levels of transcripts, and the functional sites in protein sequences.
Results: We have developed an initial set of visual analytics resources, using the major intrinsic protein family of water and glycerol transporters as the use case. Members of this protein family have been implicated in diverse cardiometabolic diseases. The computational resources developed can be adapted for gene lists, including those obtained from high-throughput assays. The long-term goal of the project is to empower researchers to make discoveries from large-scale molecular and clinical datasets to support decision-making on genetic and environmental determinants of cardiometabolic diseases in Africa.
Presenter: Oyekanmi Nash, PhD
Node Principal Investigator,
H3Africa Bioinformatics Network Node at National Biotechnology Development Agency (NABDA)
Abuja, Nigeria
Visual Analytical Screening System for Disease Linked Gene Variants
Visual Discovery Tools
Cardiometabolic Diseases
NIH/WT H3Africa Research Network
H3Africa/HVP: Leveraging Potentials
• NIH/WT-H3Africa
• Collaborative Centers: 8
• Research Projects: 9
• BioBanks: 4
• Bioinformatics Network: 1
Map of Africa showing the distribution of nodes in the H3ABioNet network
H3Africa: Bioinformatics Network
• H3ABioNet: a sustainable African Bioinformatics Network for H3Africa
The network provides:
• computational infrastructure and hardware,
• human resources,
• tools and computational solutions for genomic and population-based research, and
• communications among African researchers and other interested parties.
These aims are achieved by:
• providing user support,
• training and capacity development,
• research and tools development, and
• outreach and communication.
ORGANIZATION OF THE HVP Nigeria Node
ICCAC Country Representative : Prof. Oyekanmi Nash, Alternate Representative: Hadiza Rasheed-Jada
Reports directly to the DG/CEO, NABDA/FMST
ORGANIZATION OF THE HVP NIGERIA NODE II
The staff members of the Node include:
• Alternate Representative - Hadiza Rasheed-Jada
• Node Manager - Atinuke Hassan
• Systems Administrator - Adekunle Farouk
• Research Associates
  - Abimbola Kashim
  - Deborah Fasesan
  - Taoheed Abdulkareem
  - Ayodele Fakoya
  - Adijat Ozohu Jimoh
• Post-doctoral Researcher - Dr. Segun Fatumo
Institutional and researcher affiliations with the Node will drive the activities of the Node.
Background – Cardiometabolic Diseases
• Worldwide, cardiometabolic diseases are the major causes of:
  • Disability, rising healthcare costs and deaths
• Examples:
  • Type 2 diabetes, hypertension, dyslipidemia, coronary heart disease and chronic kidney disease
• Over the next 7 years:
  • Africa is projected to experience the largest increase in death rates from cardiovascular disease, cancer, respiratory disease and diabetes (Aikins et al., 2010)
Noncommunicable Diseases    AFR - 2015      AFR - 2030      Fold Change
Diabetes mellitus           205,378.79      390,614.91      1.90
Malignant neoplasms         521,029.65      966,876.53      1.86
Other neoplasms             20,155.67       37,375.03       1.85
Cardiovascular diseases     1,179,320.20    1,966,212.66    1.67
Respiratory diseases        234,649.72      356,651.78      1.52
Source: Global Health Estimates (GHE) 2013: Deaths by age, sex and cause
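The Fold Change column in the table above appears to be the 2030 value divided by the 2015 value. A minimal Python sketch that reproduces it from the two AFR columns follows; the names `projected_deaths` and `fold_change` are illustrative and are not part of the original slides.

```python
# Values copied from the AFR - 2015 and AFR - 2030 columns of the table.
projected_deaths = {
    "Diabetes mellitus": (205_378.79, 390_614.91),
    "Malignant neoplasms": (521_029.65, 966_876.53),
    "Other neoplasms": (20_155.67, 37_375.03),
    "Cardiovascular diseases": (1_179_320.20, 1_966_212.66),
    "Respiratory diseases": (234_649.72, 356_651.78),
}


def fold_change(baseline, projected):
    """Ratio of the projected value to the baseline value."""
    return projected / baseline


# Assumption: the slide rounds the ratio to two decimal places.
fold_changes = {
    cause: round(fold_change(value_2015, value_2030), 2)
    for cause, (value_2015, value_2030) in projected_deaths.items()
}

# List causes from the largest to the smallest projected increase.
for cause, ratio in sorted(fold_changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{cause:<25} {ratio:.2f}")
```

Under that assumption the computed ratios match the slide's column, with diabetes mellitus showing the largest projected increase.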
A Strategy in Africa to Address Burden of Cardiometabolic Diseases
• Genomic and Environmental Determinants (H3Africa Projects)
  • H3Africa Kidney Disease Research Network
  • Genomic and environmental risk factors for cardiometabolic disease in Africans
  • Burden, spectrum and etiology of type 2 diabetes in sub-Saharan Africa
  • …..
Examples of Projected Massive and Complex Datasets from H3Africa Projects (2013….
Type 2 Diabetes Project
• 12,000 Cases and 12,000 Controls
• Sequencing of known T2DM regions
• Genome-wide genotyping arrays
• Whole exome/genome sequencing
Body Composition Project
• African genome structure
• Phenotyping and sampling for Cohorts
• Genetic and environmental contribution to body composition (~12,000 individuals)
These research investigations rely significantly on bioinformatics analysis and inferences from large and heterogeneous datasets
obtained from populations inside and outside Africa.
DATA SCIENCE
• Data Flow
• Data Curation
• Data Analysis
“The major bottleneck in genome sequencing is no longer data generation—the computational challenges around data analysis, display and integration are now rate limiting. New approaches and methods are required to meet these challenges”.
National Human Genome Research Institute Strategic Plan: Charting a course for genomic medicine from base pairs to bedside
http://www.genome.gov/Pages/About/Planning/2011NHGRIStrategicPlan.pdf
Making Discoveries from the Massive and Complex Genomics Datasets and Bioinformatics Results
from H3Africa Projects
Visual Discovery Tools
Visual Discovery Tasks
• Exploration
• Mining
• Analysis
Goal: users access and analyze data visually at the speed of thought, with minimal or no IT assistance, and then share the results of their discoveries with colleagues, usually in the form of an interactive dashboard.
Benefits
• Data sharing
• Collaboration
• Easy to Deploy
• Research in Limited or No Internet Access
What is Visual Analytics?
http://www.slideshare.net/TableauSoftware/visual-analytics-best-practices
“Visual analytics is the representation and presentation of data that exploits our visual perception abilities in order to amplify cognition.”
- Andy Kirk, author of “Data Visualization: a successful design process”
Knowledge-Building Insights from Visual Analytics
http://www.flickr.com/photos/pnnl/6310387725/
Visual Interfaces
Examples of Visual Analytics Software
http://www.vacommunity.org/Education+Resources
Toolkits
Analytic Tools
Jigsaw
Types of Visual Discovery Tools
H3ABioNet Workshop: Visual Analytics of Human Genomics Variation Datasets
July 2013
Opportunities exist for African researchers to expand the use of visual discovery tools and curated datasets to enable visual discovery (exploration, mining and analysis via interactive visual interfaces) of bioinformatics results from high-quality genomics research.
Long-Term Goal of Project
• Visual Analytical System for
  • discovery of molecular consequences of variants and linked transcript expression for sets of genes or gene families
http://www.ensembl.org/info/genome/variation/predicted_data.html
Molecular Consequences of Gene Variants
Transcripts
Research Approach
Obtain Datasets
• Ensembl Genome Browser (www.ensembl.org): BioMart for genes and variants
• Database of Alternate Transcript Expression: data download for transcript expression values
Data Cleaning and Preparation
• Scripting and spreadsheets
Construct Views and Dashboards
To address scientific questions such as:
• Identify molecular consequences of gene variants (single nucleotide variants) in a specific disease or trait.
• Identify gene variants that result in multiple molecular consequences in gene transcripts.
• Identify gene variants specific to a transcript.
• Compare RNA-Seq expression values for gene transcripts in tissues.
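Two of the questions above (variants with multiple molecular consequences, and variants specific to one transcript) reduce to grouping variant records. The Python sketch below illustrates the idea on made-up records; the identifiers `var_A`, `tx_1` and so on are hypothetical placeholders, not real Ensembl data, and the slides do not say which scripting language the project used.

```python
from collections import defaultdict

# HYPOTHETICAL records in the form (variant, transcript, consequence).
# A real run would read these rows from a BioMart export instead.
records = [
    ("var_A", "tx_1", "missense_variant"),
    ("var_A", "tx_2", "intron_variant"),
    ("var_B", "tx_1", "synonymous_variant"),
    ("var_C", "tx_1", "missense_variant"),
    ("var_C", "tx_2", "missense_variant"),
]


def group_by_variant(rows, field_index):
    """Collect the distinct values of one field for each variant."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row[0]].add(row[field_index])
    return grouped


def multi_consequence_variants(rows):
    """Variants that have more than one distinct consequence across transcripts."""
    by_consequence = group_by_variant(rows, 2)
    return {v: c for v, c in by_consequence.items() if len(c) > 1}


def transcript_specific_variants(rows):
    """Variants that occur in exactly one transcript, mapped to that transcript."""
    by_transcript = group_by_variant(rows, 1)
    return {v: next(iter(t)) for v, t in by_transcript.items() if len(t) == 1}


multi = multi_consequence_variants(records)
specific = transcript_specific_variants(records)
print("Multiple consequences:", multi)
print("Transcript-specific:", specific)
```

In an interactive dashboard the same grouping would typically be expressed as filters on the variant and transcript dimensions rather than as code.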
Use Case – Gene Families
AQUAPORIN – Water and glycerol transporter
• 13 mammalian aquaporins (AQP0-AQP12).
• Malfunction or absence linked to disease.
• Adipose AQP7 deficiency is associated with an increase of intracellular glycerol content.
• Up-regulation of AQP1 in the glomeruli of most diseased kidneys.
Reference: Hibuse et al. (2005). Aquaporin 7 deficiency is associated with development of obesity through activation of adipose glycerol kinase. Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):10993-8. http://www.ncbi.nlm.nih.gov/pubmed/16009937
Molecular Consequences of Single Nucleotide Variants of Aquaporin Genes - Overview
[Table: variant counts by consequence to transcript and by gene; cell values not recoverable]
Ensembl Gene ID / Approved Symbol:
ENSG00000135517    MIP
ENSG00000240583    AQP1
ENSG00000167580    AQP2
ENSG00000165272    AQP3
ENSG00000171885    AQP4
ENSG00000161798    AQP5
ENSG00000086159    AQP6
ENSG00000165269    AQP7
ENSG00000103375    AQP8
ENSG00000103569    AQP9
ENSG00000143595    AQP10
ENSG00000178301    AQP11
ENSG00000184945    AQP12A
ENSG00000185176    AQP12B
Consequence to transcript:
3_prime_UTR_variant, 5_prime_UTR_variant, coding_sequence_variant, downstream_gene_variant, feature_elongation, feature_truncation, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, intron_variant, missense_variant, nc_transcript_variant, NMD_transcript_variant, non_coding_exon_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, stop_retained_variant, synonymous_variant, upstream_gene_variant
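An overview of this kind is a cross-tabulation: one count per (gene, consequence) pair. The Python sketch below shows how such a matrix could be built from exported rows. The gene symbols and consequence terms come from the slide, but the rows themselves, and therefore the counts, are HYPOTHETICAL and do not reproduce the slide's values.

```python
from collections import Counter

# HYPOTHETICAL (gene symbol, consequence) rows, one per variant-transcript pair.
rows = [
    ("AQP1", "missense_variant"),
    ("AQP1", "intron_variant"),
    ("AQP1", "intron_variant"),
    ("AQP7", "missense_variant"),
    ("AQP7", "synonymous_variant"),
]


def consequence_matrix(pairs):
    """Cross-tabulate counts with consequences as rows and genes as columns."""
    counts = Counter(pairs)
    genes = sorted({gene for gene, _ in pairs})
    consequences = sorted({consequence for _, consequence in pairs})
    # Counter returns 0 for (gene, consequence) pairs that never occur.
    matrix = [[counts[(gene, consequence)] for gene in genes] for consequence in consequences]
    return genes, consequences, matrix


genes, consequences, matrix = consequence_matrix(rows)

# Print a plain-text table: one row per consequence, one column per gene.
print(f"{'consequence':<22}" + "".join(f"{gene:>8}" for gene in genes))
for consequence, row_counts in zip(consequences, matrix):
    print(f"{consequence:<22}" + "".join(f"{count:>8}" for count in row_counts))
```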
Visual Analytical System for Screening Disease Linked Gene Variants
Integrates data from ENSEMBL and Database of Alternate Transcript Expression (DBATE)
Data Sources
Blending of Data Dimensions from multiple Data Sources
Identifies Variants linked to Transcripts
Insights: rs199936776 is unique to AQP7-004 and could affect expression of transcript or properties of protein isoform
Identification of variants that could affect transcript expression in adipose tissues
Variation Name        PolyPhen prediction    Associated Transcript Name    Ensembl Transcript ID
rs2381002             probably damaging      AQP7-001                      ENST00000379507
rs2381003             probably damaging      AQP7-001                      ENST00000379507
rs2381004             possibly damaging      AQP7-001                      ENST00000379507
rs62542743            probably damaging      AQP7-001                      ENST00000379507
rs62542744            probably damaging      AQP7-004                      ENST00000447660
                                             AQP7-001                      ENST00000379507
rs62542746            possibly damaging      AQP7-204                      ENST00000541274
rs74668961            probably damaging      AQP7-204                      ENST00000541274
rs114937176           probably damaging      AQP7-204                      ENST00000541274
rs138290992           probably damaging      AQP7-004                      ENST00000447660
                                             AQP7-001                      ENST00000379507
rs139024279           possibly damaging      AQP7-004                      ENST00000447660
                                             AQP7-001                      ENST00000379507
                      probably damaging      AQP7-004                      ENST00000447660
                                             AQP7-001                      ENST00000379507
rs140405277           probably damaging      AQP7-001                      ENST00000379507
rs140411243           probably damaging      AQP7-204                      ENST00000541274
rs199936776           possibly damaging      AQP7-004                      ENST00000447660
TMP_ESP_9_33385689    probably damaging      AQP7-004                      ENST00000447660
                                             AQP7-001                      ENST00000379507
[Bar-chart columns: adipose_GSE12946 (axis 0 to 100) and adipose_GSE30611 (axis 0 to 40); bar values not recoverable]
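Blending the two data sources amounts to joining variant rows with transcript expression values on the shared Ensembl Transcript ID. The Python sketch below uses a left join so that variants without expression data are kept; the slides do not specify the join semantics, so this is an assumption. The three variant rows are taken from the table above, while the expression numbers are HYPOTHETICAL placeholders, not DBATE values.

```python
# (variation name, PolyPhen prediction, Ensembl transcript ID), from the table above.
variants = [
    ("rs2381002", "probably damaging", "ENST00000379507"),
    ("rs199936776", "possibly damaging", "ENST00000447660"),
    ("rs140411243", "probably damaging", "ENST00000541274"),
]

# HYPOTHETICAL expression values keyed by transcript ID, then by dataset name.
expression = {
    "ENST00000379507": {"adipose_GSE12946": 10.0},
    "ENST00000447660": {"adipose_GSE12946": 2.5},
}


def blend(variant_rows, expression_by_transcript):
    """Left-join variant rows with expression values on the transcript ID."""
    blended = []
    for name, prediction, transcript_id in variant_rows:
        # Variants whose transcript has no expression record keep an empty set of values.
        values = expression_by_transcript.get(transcript_id, {})
        blended.append(
            {"variant": name, "polyphen": prediction, "transcript": transcript_id, **values}
        )
    return blended


blended = blend(variants, expression)
for row in blended:
    print(row)
```

Keying both sources on the transcript ID rather than the gene symbol is what makes transcript-level insights possible, such as a variant found in only one isoform.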
Summary
In Africa, researchers will be able to use visual discovery tools to make DISCOVERIES from large-scale molecular and clinical datasets to support decision-making on genetic and environmental determinants of cardiometabolic diseases.
Visual Analytics can facilitate collaboration between Data Experts and Subject Matter Experts
Acknowledgments
• H3Africa Bioinformatics Network (H3ABioNet)
  – National Human Genome Research Institute
  – NIH Common Fund
  – Grant U41HG006941
• National Institutes of Health
• Dr. Raphael Isokpehi, Bethune-Cookman University, Florida, USA
• National Biotechnology Development Agency, Federal Ministry of Science and Technology, Nigeria
• Visual Analytics in Biology Curriculum Network