77
Adam Stevens Research Fellow MANCHESTER ACADEMIC HEALTH SCIENCES CENTRE Kat Hollywood Senior Experimental Officer SYNBIOCHEM SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL RESEARCH MODULE 2: ‘ OMICS, SYSTEMS BIOLOGY & BIO INFORMATICS Manchester Institute of Biotechnology Discovery through innovation

SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Adam Stevens

Research Fellow

MANCHESTER ACADEMIC HEALTH SCIENCES CENTRE

Kat Hollywood

Senior Experimental Officer

SYNBIOCHEM

SOC-B CENTRE FOR DOCTORAL TRAINING IN

BIOSOCIAL RESEARCH

MODULE 2: ‘OMICS, SYSTEMS BIOLOGY & BIO

INFORMATICS

Manchester Institute of Biotechnology

Discovery through innovation

Page 2: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 2

INTRODUCTIONS

Page 3: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Overview

• What is biological information?

• Types of ‘omic data

• Basic analysis and tools

• Network analysis of ‘omic data

BIOSOCIAL CDT 13/02/2020 3

Page 4: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 4

WHAT IS BIOLOGICAL

INFORMATION?

Page 5: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Genotype vs. Phenotype

• Genotype

• The genetic makeup of an organism

• An organisms complete set of genes

• Instructions for building and maintaining

• Formation of proteins, regulation of metabolism

• Genetic traits

• Internally coded- not observed

• Copied during cell division & reproduction

• Phenotype

• Observable physical properties of an organism

• Appearance, development & behavior

Phenotype is determined

by an organisms

genotype and ALSO

environmental factors

Genes + Environment = PhenotypeBIOSOCIAL CDT 13/02/2020 5

Page 6: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Central Paradigm of Molecular Biology

Information

Flow/Process

Function

Page 7: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Central dogma of molecular biology

Proteomics

Metabolomics

Transcriptomics

GenomicsDNA

RNA

Protein

Metabolite

DNA Replication

Transciption

Translation

‘Om

ics

BIOSOCIAL CDT 13/02/2020 7

Page 8: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

DNA & Genomics

BIOSOCIAL CDT 13/02/2020 8https://courses.lumenlearning.com/microbiology/chapter/structure-and-function-of-dna/

• Deoxyribonucleic acid (DNA)

• Polymer of nucleotides

• Sequence of nucleotides is responsible for carrying and retaining

hereditary information in a cell – Base Sequence

• Double helix of complementary base pairs

• Nitrogenous base

• Adenine and Thymine

• Cytosine and Guanine

• Phosphate group

• Deoxyribose

Page 9: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

RNA & Transciptomics

• Ribonucleic acid (mRNA)

• Result of TRANSCRIPTION

• Information encoded with the DNA sequence of one or more genes

is TRANSCRIBED into a strand of RNA –RNA transcript

• Single stranded

• A,G,C,U (T)

BIOSOCIAL CDT 13/02/2020 9

Translation: DNA to mRNA to Protein

By: Suzanne Clancy, Ph.D. & William Brown, Ph.D. (Write Science Right) © 2008 Nature Education

Citation: Clancy, S. & Brown, W. (2008) Translation: DNA to mRNA to Protein. Nature Education 1(1):101

Page 10: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

• Play key roles within all cells

• Enzymes

• Transport molecules

• Structural

• Polymers

• Long chains of amino acids joined together by peptide bonds

• Range of sizes & shapes

• Primary, secondary, tertiary & quaternary

Proteins & Proteomics

BIOSOCIAL CDT 13/02/2020 10

Page 11: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Metabolites & Metabolomics

• Low molecular weight products/components of metabolism

• Metabolism is a complex interplay of chemical reactions

that occur within cells

• Diverse roles

• Energy metabolism

• Amino acid metabolism

• Lipid metabolism

• Primary metabolism –directly involved in growth,

development

• Secondary metabolism – end products of metabolism

• Antibiotics, steroids etc

BIOSOCIAL CDT 13/02/2020 11

Page 12: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Central dogma of molecular biology

Proteomics

Metabolomics

Transcriptomics

GenomicsDNA

RNA

Protein

Metabolite

BIOSOCIAL CDT 13/02/2020 12

Genotype

Phenotype

Page 13: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

What is ‘Omic Data

Omic data sets include:

• Genetics (SNPs, CNVs)

• Transcriptomics (Affymetrix, RNAseq)

• Epigenomics (DNA methylation, histone mods)

• ChIPseq

• Metabolomics

• Proteomics

• Phosphoproteomics

All have specific quality control (QC) issues and difficulties in analysis

All rely on the use of a false discovery rate correction (FDR) for analysis

Multi-omic analysis is the integration of different omic data sets

Fixed

Gene:Environment

BIOSOCIAL CDT 13/02/2020 13

Page 14: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 14

TYPES OF ‘OMICS

DATA?

Page 15: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Why ‘omics?

• Adopts an holistic view of all ‘molecules’ that make up a

cell, tissue or organism

• Universal approach/Hypothesis-generating

• No analysis bias

• Many applications

• ‘BIOMARKER’ discovery

• Early detection/population screening

• Increasing understanding of disease aetiology

• Drug discovery & toxicity and efficacy screens

BIOSOCIAL CDT 13/02/2020 15

Top Down

Bottom Up

Page 16: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Biomarkers of diseaseIn medicine, a biomarker is a measurable indicator of the severity or presence of

some disease state. More generally a biomarker is anything that can be used as an

indicator of a particular disease state or some other physiological state of an organism.

BIOSOCIAL CDT 13/02/2020 16

Page 17: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Experimental Design

BIOSOCIAL CDT 13/02/2020 17

• Case vs. Control (an appropriate one!)

• Samples numbers?

• How many is sufficient?

• Test cohort + validation cohort?

• Quality of samples

• Collection/storage/processing

• Confounding factors?

• Age, gender, environment

Page 18: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Is this good experimental design?

• Metabolomics investigation of liver failure from plasma

samples

BIOSOCIAL CDT 13/02/2020 18

Page 19: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Or what about this one?

Generic approach to cancer classification

based on gene expression monitoring by

DNA microarrays applied to human acute

leukemias

38 Affymetrix microarrays with 6,817

probes

27 from childhood acute lymphoblastic

leukemia

11 from adult acute myeloid leukemia

BIOSOCIAL CDT 13/02/2020 19

Page 20: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Or this?

• Pre-eclampsia - Pregnancy-induced hypertension which may affect mother and foetus

BIOSOCIAL CDT 13/02/2020 20

Page 21: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Genomics

• Genome = total DNA of a cell or an organism

• Human genome = 3.2 billion bases and estimated >

30,000 protein coding genes

• Seeking mutations or alterations that may contribute

towards a certain disease!• i.e. the genes BRCA1 and BRCA2 cause 60% of all cases of hereditary breast and ovarian

cancers

• BUT not a single mutation- there are >800 different mutations in BRCA1 alone

TOOLS: Whole Genome Sequencing

BIOSOCIAL CDT 13/02/2020 21

https://theanalyticalscientist.com/fields-applications/the-tools-behind-genomics

Page 22: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Sequencing

• Determining the complete DNA sequence of an organisms

genome at a single time.

BIOSOCIAL CDT 13/02/2020 22https://www.cdc.gov/pulsenet/pdf/Genome-Sequencing-508c.pdf

Page 23: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Cost of Sequencing

BIOSOCIAL CDT 13/02/2020 23

PacBio Sequencer

£300K

Single-Molecule, Real-Time (SMRT) technology

www.pacb.com

Page 24: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Transcriptomics

• Measuring the transcriptome

• All the mRNA that is transcribed at a given point in a given cell or

organism

• Provides direct knowledge of gene regulation and protein

content information

• Which genes are actively expressed

• Began in 1990s and is now a widespread discipline

• Many technological advances

• Now a routine, ‘simple’ process

• TOOLS: Microarrays & RNA-Seq

BIOSOCIAL CDT 13/02/2020 24

Page 25: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

DNA Microarrays

http://www.pbs.org/wgbh/nova/education/activities/3413_genes_02.htmlBIOSOCIAL CDT 13/02/2020 25

Page 26: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

DNA Microarrays

Converted to

numerical

values for

interpretation

BIOSOCIAL CDT 13/02/2020 26

Limitation: Results

limited to what

probes you have on

your chip

Page 27: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

RNA Seq• High throughput sequencing with computational methods

• No reference sequence

• Large dynamic range

• Sequences every RNA molecule and profiles the

expression of a particular gene by counting the number of

times its transcript has been sequenced

• Expression levels!

SAMPLE RNA

RNA

FRAGMENTSREADS

FragmentationReverse transcription

& amplification

Sequencing

Machine

cDNA

FRAGMENTS

BIOSOCIAL CDT 13/02/2020 27

Page 28: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Proteomics

• Investigating the entire complement of proteins within a

cell, tissue or organism.

• >100,000 proteins

• Large dynamic range

• Some questions

• When & where proteins are expressed

• The involvement of proteins with particular phenotypes

• How proteins are modified or how they interact with each other

• Blood, urine, tissues

BIOSOCIAL CDT 13/02/2020 28

Page 29: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Proteomic Tools

• Mass spectrometry

BIOSOCIAL CDT 13/02/2020 29http://planetorbitrap.com/data/fe/image/Workflows_TopDownProt(1).png

Page 30: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Proteomic Tools

• Gels

BIOSOCIAL CDT 13/02/2020 30

https://www.creative-proteomics.com/services/2d-electrophoresis.htm

Page 31: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Proteomics Data

id Protein

ID

1 2 3 4 5 6 7 8 9 10 11

1 x

2 y

3 z

4

5

6

7

8

9

Sample Names – often 10’s with different classes

Dete

cte

d P

rote

ins -

100’s

Abundance of protein ?

in each sample

Do statistical analysis to decide which features change between classes

BIOSOCIAL CDT 13/02/2020 31

Page 32: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Metabolomics• Investigating the entire complement of low molecular

weight molecules in an sample of interest.

• ~ 5000 metabolites in a biological sample from a human

• GREATLY impacted by environment!

• GREATLY impacted by time!

• Experimental design & sample collection is VERY

important!

BIOSOCIAL CDT 13/02/2020 32

Page 33: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Goodacre, R. (2007) Metabolomics of a superorganism. Journal of Nutrition 137, 259S-266S.

Page 34: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Biological Matrices

Primary

Sources

Serum

Plasma

CSF

BAL

Saliva

Semen

Urine

Faeces

Sweat

Breath

Secondary

Sources

Tissue Biopsies

Brain

Nerve

Lung

Pancreas

Liver

Heart

Gut

Skin

Additional

Sources

Animal Models

Mammalian Cell Culture

IVF culture medium

BIOSOCIAL CDT 13/02/2020 34

Page 35: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Chromatography linked to Mass Spectrometry

Higher affinity to

stationary phase

Lower affinity to

stationary phase

DETECTOR

lactate

monosaccharides

disaccharides

Amino and organic acids

GC-MS

Yeast footprint

70-140 metabolite peaks

Serum: GC-MS or LC-MS Urine: GC-MS or LC-MS

BIOSOCIAL CDT 13/02/2020 35

Page 36: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

GC-MS (electron impact)

Much greater degree of

fragmentation as higher

energy ionisation process

Hippuric acid UHPLC-MS (electrospray (ESI))

Soft ionisation technique, intact

parent mass ion detected, but many

adducts can be produced

m/z=179.17

Matching of the chromatographic retention time and fragmentation mass spectra

between a sample analyte and a reference standard is required for definitive id.

We have ca. 1600 analytes in our GC-MS library

Higher affinity to

stationary phase

Lower affinity to

stationary phase

DETECTOR

Sumner L.W. et al. (2007) Metabolomics 3, 211-221BIOSOCIAL CDT 13/02/2020 36

Chromatography linked to Mass Spectrometry

Page 37: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Metabolomics Data

id RT

(min)

m/z ID 1 2 3 4 5 6 7 8 9 10 11

1

2

3

4

5

6

7

8

9

BIOSOCIAL CDT 13/02/2020 37

Sample Names – often 100’s with different

classes

Dete

cte

d F

eatu

res-

1000’s

Use RT & m/z to identify the metabolite – MAJOR CHALLENGE!

Do statistical analysis to decide which features change between classes

Page 38: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Common Theme

Data ....... & lots of it!

BIOSOCIAL CDT 13/02/2020 38Nix, H., A national geographic information system - an achievable objective?, in Keynote

address. 1990: AURISA.

“Data does not equal information;

information does not equal knowledge;

and, most importantly of all, knowledge

does not equal wisdom. We have oceans

of data, rivers of information, small puddles

of knowledge, and the odd drop of

wisdom.”

Page 39: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

The challenge of biomarkers?

• Poste, G., Bring on the biomarkers. Nature, 2011.

469(7329): p. 156-157.

• Multi-Centre, large cohort studies

BIOSOCIAL CDT 13/02/2020 39

Page 40: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Increasing

sample

numbers

and

throughput

Biomarker Discover: From lab to bedside

Conception (objectives, collaborations, design of experiment)

Set of candidate metabolites

Representative Cohort study

(n=1000s) with analysis of

candidates (LC-MS and/or

assay)

To the clinic

Hypothesis generation study 1

(independent sample set 1)

Hypothesis validation

(independent sample set 2)

BIOSOCIAL CDT 13/02/2020 40

Page 41: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 41

SOME EXAMPLES......

Page 42: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Personalised/Precision Medicine

• Is this the future of healthcare?

• 2015 State of the Union address President Obama announced that hewas launching the Precision Medicine Initiative

• Assess the genotype (SNPs) and phenotype (metabolome) of apatient before they undergo any treatment

• Population monitoring & data collation

• Seeking cures & preventative screening

• Offering a well-designed screening program at a reasonablecost may not always be possible due to the numerousassociated challenges

• monetary limitations (labour and consumable costs) as well as ethical,legal and social considerations for an opt-in test

BIOSOCIAL CDT 13/02/2020 42

Page 43: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

(Metabol)’omics for the masses?

• Wearable technologies – smartphones, smart-watches, health

bands, necklaces, glucose monitoring contact lenses• Innovations for collecting personal information

• mPower - mobile Parkinson’s Disease study that attempts to

research the occurrence, presentation and management of PD

symptoms via survey telemetry data using a smartphone app• Bot, B.M., et al., The mPower study, Parkinson disease mobile data collected using ResearchKit. Scientific Data, 2016. 3: p. 160011.

• Smart-phone app to monitor the association between pain and

the weather for people suffering from rheumatoid arthritis• Reade, S., et al., Cloudy with a Chance of Pain: Engagement and Subsequent Attrition of Daily Data Entry in a

Smartphone Pilot Study Tracking Weather, Disease Severity, and Physical Activity in Patients With Rheumatoid Arthritis.

JMIR Mhealth Uhealth, 2017. 5(3): p. e37.

• Dixon, W.G., et al. How the weather affects the pain of citizen scientists using a smartphone app. Npj Digital Medicine

(2019) 2:105

BIOSOCIAL CDT 13/02/2020 43

Page 44: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

44BIOSOCIAL CDT 13/02/2020

Page 45: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 45

The future cycle of metabolomics precision medicine-based research and healthcare where

academia, industrial partners, corporate data analytics work with patients’ wearable data collection

devices to provide health monitoring solutions.

Page 46: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

TWINS UK

BIOSOCIAL CDT 13/02/2020 46

Page 47: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

TWINS UK

• Comprehensive study

• Unique design with internal controls

• Well documented & controlled

• Wealth of scientific publications

BIOSOCIAL CDT 13/02/2020 47

Page 48: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

The super smeller

BIOSOCIAL CDT 13/02/2020 48

Page 49: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Biomarker detection

BIOSOCIAL CDT 13/02/2020 49

Gaining distance on biomarker

discovery......

Page 50: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Data Repositories

BIOSOCIAL CDT 13/02/2020 50

Page 51: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 51

Page 52: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 52

Page 53: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 53

Page 54: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 54

Page 55: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 55

Page 56: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Human Metabolome Database

BIOSOCIAL CDT 13/02/2020 56

Page 57: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Search Parameters

BIOSOCIAL CDT 13/02/2020 57

Page 58: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Standardisation• MSI formed in 2005 to unify and to engage with the growing

metabolomics community so that experiments can be

reproduced by others and are based on solid sample

collection, analysis and data processing. – Complete

Transparency!

• Working group now working on how to perform experimental

design better.

• Pre-requisite for publication in Metabolomics

Metabolite Identification

MSI Leve1 1 – Definitive

MSI Level 2 - Putative

BIOSOCIAL CDT 13/02/2020 58

Page 59: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 59

NETWORK ANALYSIS

Page 60: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Individual Omic Data Set Analysis

Lists and Cut-

offs!!

Age group 2 Vs Rest, ANOVA,

q<0.1 = 1524 probe-sets

What does a p-value threshold mean?

What does a Fold change cut off mean?

BIOSOCIAL CDT 13/02/2020 60

Page 61: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Fisher’s Exact Test /

Hypergeometric Test

Pick 5 marbles

P(2 are red)

Gene Ontology

BIOSOCIAL CDT 13/02/2020 61

Page 62: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Upstream Regulator Analysis

Enrichment of ‘causal transitive triangles’

Literature Driven analysis

BIOSOCIAL CDT 13/02/2020 62

Page 63: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

What is Network Biology

Biological networks are:

• “Scale free”

• Resistant to random error

• Exhibit “small world” properties

small-world network - most

nodes are not neighbours of

one another, but most nodes

can be reached from every

other by a small number of

steps.

typical distance L between two randomly chosen

nodes (the number of steps required) grows

proportionally to the logarithm of the number of

nodes N in the network

63

Page 64: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 64

Page 65: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Human Interactome.

Biogrid 3.1.88

Proteins = 14 334

Interactions = 65 710

Biogrid 3.2.105

Proteins = 18 107

Interactions = 217 927

BIOSOCIAL CDT 13/02/2020 65

Page 66: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

FIFA World Cup 2014

http://blog.physicsworld.com/2014/06/19/a-network-analysis-of-the-fifa-world-cup/

What is a network model

66

Page 67: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Network Topology Is Associated with Essential Function

Sun et al (2010), BMC Genomics 11 S5

Red = Cancer Genes, Black = Essential Genes, Grey = Control Genes

BIOSOCIAL CDT 13/02/2020 67

Page 68: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Network Clusters are associated with hierarchy of biological function

“overlapping modules”

Hierarchy of Functions and Pathways Core nodes of cluster

Cellular responses to stress, Apoptosis, Circadian clock, DNArepair, Metabolism of Proteins (q<1x10-3).G1/S DNA damage checkpoint (q<6.2x10-8)

SUMO2, UBC, MDM2, TP53, EEF1A1,PRKDC, RPS4X, SRRM2, RPS13, RPS20

Cell Cycle, Circadian Clock (q<1x10-5).Wnt & Prolactin signalling (q<1.0x10-5).

SKP1, BTRC, CUL1,GSK3B, CTNNB1, SKP2,NFKBIA, CLSPN, FBXW11, FBXO6

Apoptosis, Signal transduction (q<0.01).Wnt, PI3/AKT signalling (q<0.01).

GSK3B, AXIN1, APP, AKT1, MAPT,CTNNB1, ELAVL1, KIF5B, AXIN2, HIPK2

Transcriptional regulation of white adipocyte differentiation, Regulation of lipid metabolism by PPARalpha, Circadian Clock, Mitochondrial biogenesis (q<0.01).

NCOA1, NCOA6, ESR1, PPARG, MLL3,RXRA, ESR2, MLL4, NCOR2, VDR

Cellular Senescence, Epigenetic regulation of gene expression (q<1.0x10-4)

EZH2, EED, SUZ12, EZH1, JARID2, SON,SRSF7, NRF1, FBL, HDAC1

Cellular response to hypoxia (q<1x10-5).TGF-beta Signalling (q<0.05)

TCEB2, VHL, CUL5, TCEB1, CUL2, TCEB3,ASB9, STK16, NEDD8, COPS6

Transcriptional regulation of white adipocyte differentiation, Regulation of lipid metabolism by PPARalpha, Circadian Clock, Mitochondrial biogenesis (q<0.01).

NCOR2, NCOA6, HDAC3, BCL6, RARA, AR,ANKRD11, THRB, HDAC1, KDM5B

Immune system, Apoptosis (q<0.05)Toll-Like Receptors Cascades (q<1.0x10-4)

DIABLO, XIAP, BIRC2, BIRC6, UBE2D4,BIRC3, BIRC7, TRAF2, BIRC5, ELAVL1

DNA Repair, Cell Cycle (q<1.0x10-4)G2/M DNA damage checkpoint (q<6.7x10-4)

NBN, MRE11A, MDC1, RAD50, BRCA1,H2AFX, FANCD2, ATM, TP53BP1, ATR

Cellular response to stress (q<0.01).Insulin, IGF1, mTOR, PI3K signalling (q<1.0x10-5).

TSC1, RHEB, TSC2, YWHAE, MTOR,YWHAB, RAF1, RPTOR, BECN1, MAST3

Hierarchy of functions and pathways ranked by centrality of cluster

Cluster

SUMO2

SKP1

GSK3B

NCOA1

EZH2

TCEB2

NCOR2

DIABLO

NBN

TSC1

“Overlapping Community Clustering”

BIOSOCIAL CDT 13/02/2020 68

Page 69: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Constraint Based Modelling

NATURE REVIEWS | GENETICS VOLUME 15 | FEBRUARY

2014 | 107 69

Page 70: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Why are methods of prioritising Omic data important?

Answer: Lists don’t necessarily deliver the solution!

Gene set analysis of Transcriptomic data

Systems Analysis by constraint based

modelling predicts correctly

BIOSOCIAL CDT 13/02/2020 70

Page 71: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Exponential Random Graphs – Network statistics

Many structural elements can be used to model

networks

E.G.

BIOSOCIAL CDT 13/02/2020 71

Page 72: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Integration of Multi-Omic Data

“Network Biology is a primary tool”

Page 73: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Integration of Networks to Enhance Clinical Prediction

BIOSOCIAL CDT 13/02/2020 73

Page 74: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

BIOSOCIAL CDT 13/02/2020 74

Page 75: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

• A central goal of genetics is to understand the links between genetic variation and

disease.

• Intuitively, one might expect disease-causing variants to cluster into key pathways that

drive disease etiology.

• But for complex traits, association signals tend to be spread across most of the

genome—including near many genes without an obvious connection to disease.

• We propose that gene regulatory networks are sufficiently interconnected such that all

genes expressed in disease-relevant cells are liable to affect the functions of core

disease-related genes and that most heritability can be explained by effects on

genes outside core pathways.

• We refer to this hypothesis as an “omnigenic” model.

An Expanded View of Complex Traits: From Polygenic to Omnigenic

Cell - Volume 169, Issue 7, p1177–1186, 15 June 2017BIOSOCIAL CDT 13/02/2020 75

Page 76: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

An Expanded View of Complex Traits: From Polygenic to Omnigenic

Cell - Volume 169, Issue 7, p1177–1186, 15 June 2017BIOSOCIAL CDT 13/02/2020 76

Page 77: SOC-B CENTRE FOR DOCTORAL TRAINING IN BIOSOCIAL … · Genomics •Genome = total DNA of a cell or an organism •Human genome = 3.2 billion bases and estimated > 30,000 protein coding

Take home messages?

BIOSOCIAL CDT 13/02/2020 77