Lecture 4. Topics in Gene Regulation and Epigenomics (Basics)
The Chinese University of Hong Kong
CSCI5050 Bioinformatics and Computational Biology
CSCI5050 Bioinformatics and Computational Biology | Kevin Yip-cse-cuhk | Fall 2015 2
Lecture outline
1. Introduction to gene regulation and epigenetics
2. Problems in computational biology and bioinformatics
Last update: 26-Sep-2015
INTRODUCTION TO GENE REGULATION AND EPIGENETICS
Part 1
Gene regulation
• Here defined as the control of the amount and the products of a gene
• Amount:
  – Number of transcripts produced
  – Number of proteins produced
• Products:
  – RNAs
    • Isoforms
    • Modifications
  – Proteins
    • Modifications
Gene “expression”
• Gene expression is a general term for the production of gene products
• More specific terms:
  – Transcription rate (number of new transcripts per unit time)
  – Transcript level (total number of transcripts in the cell)
  – Translation rate
  – Protein level
• These quantities are correlated but not identical, and sometimes only weakly correlated
Gene regulation
• Expression of genes needs to be tightly regulated
  – Differentiation into different cell types
  – Response to environmental conditions
• How are genes regulated?
  – Transcriptional
  – Post-transcriptional
  – Translational
  – Post-translational
• Analogy: lighting control
A simple illustration
[Figure: schematic of four genes (G1-G4) and proteins (P1, P3, P5, P6, P7), with methyl (Me) and acetyl (Ac) marks]
[Figure: the same schematic annotated with the mechanisms involved: transcription factor binding, histone modifications, chromatin accessibility, protein-protein interactions and DNA long-range interactions, protein-RNA interactions, miRNA-mRNA interactions, DNA methylation]
More details and other mechanisms
• Transcriptional regulation
  – Transcription factors
    • Binding to promoter vs. distal elements (e.g., enhancers)
    • Activators vs. repressors
• Post-transcriptional regulation
  – Capping
  – Polyadenylation
  – Splicing
  – RNA editing
  – mRNA degradation
• Translation
  – Translational repression
• Post-translational
  – Protein modifications (e.g., phosphorylation)
Image source: http://www.emunix.emich.edu/~rwinning/genetics/eureg.htm
Epigenetics
• Wikipedia: “the study of heritable changes in gene expression or cellular phenotype caused by mechanisms other than changes in the underlying DNA sequence”
  – Heritable: can be passed on to offspring (daughter cells)
  – Mechanisms other than changes in DNA
    • Same DNA, different outcomes
Active and inactive epigenetic signals
• DNA methylation
• Chromatin remodeling
• Histone modifications
• RNA transcripts
• ...
Image credit: Zhou et al., Nature Reviews Genetics 12(1):7-18, (2011)
DNA methylation
• Methyl group (-CH3) added to cytosine in eukaryotic DNA, usually next to a guanine (in a CpG dinucleotide)
  – Forming 5-methylcytosine
    • Can be further modified into 5-hydroxymethylcytosine
• Hypermethylation at a promoter can cause gene repression
• Recent studies have suggested links between DNA methylation and
  – Protein binding
  – Transcriptional elongation
  – Splicing
• Gene imprinting: parent-specific expression
• Implications in diseases
• De novo vs. maintenance methylation
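As a minimal sketch of the CpG context described above, the following Python snippet lists the positions of CpG dinucleotides in a DNA string, i.e., the candidate sites for cytosine methylation. The function name `find_cpg_sites` and the example sequence are illustrative assumptions, not part of the lecture material.

```python
def find_cpg_sites(sequence):
    """Return the 0-based positions of the C in each CpG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


# Hypothetical example sequence (not real genomic data).
example = "ACGTCGGCATCG"
sites = find_cpg_sites(example)
print(sites)  # candidate methylation sites on the given strand
```

Only the given strand is scanned. Because CG is its own reverse complement, each reported site also corresponds to a CpG on the opposite strand.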
Image source: http://www.zymoresearch.com/media/images/products/D5405-2.jpg, http://missinglink.ucsf.edu/lm/genes_and_genomes/methylation.html
Chromatin remodeling
• Chromatin: compact structure of DNA and proteins
  – DNA wraps around histone proteins to form nucleosomes
  – Nucleosome positioning can change dynamically, affecting DNA accessibility (e.g., to binding proteins)
Image credit: wikipedia
Histone modifications
• Modification of specific residues on histone proteins
  – Acetylation, methylation, phosphorylation, ubiquitination, etc.
  – Nomenclature: H3K4me3 (histone protein H3, lysine 4, tri-methylation)
  – Histone modifications give different types of signals in gene regulation
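The nomenclature above can be read mechanically: histone name, residue letter, residue position, then the modification. The snippet below is a small sketch of such a parser. The regular expression covers only a simplified subset of mark names (core histones, four residue types, six modification codes), and the function name `parse_histone_mark` and the lookup tables are assumptions made for illustration.

```python
import re

# Simplified pattern: histone (e.g., H3, H2A), residue letter, position, modification.
MARK_PATTERN = re.compile(r"^(H\d[AB]?)([KRST])(\d+)(me[123]|ac|ph|ub)$")

RESIDUES = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}
MODIFICATIONS = {
    "me1": "mono-methylation",
    "me2": "di-methylation",
    "me3": "tri-methylation",
    "ac": "acetylation",
    "ph": "phosphorylation",
    "ub": "ubiquitination",
}


def parse_histone_mark(name):
    """Split a mark name such as 'H3K4me3' into its components."""
    match = MARK_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized histone mark name: {name!r}")
    histone, residue, position, modification = match.groups()
    return {
        "histone": histone,
        "residue": RESIDUES[residue],
        "position": int(position),
        "modification": MODIFICATIONS[modification],
    }


print(parse_histone_mark("H3K4me3"))
print(parse_histone_mark("H3K27ac"))
```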
Image credit: Zhou et al., Nature Reviews Genetics 12(1):7-18, (2011)
Non-coding RNA
• There are different types of functional RNA that are not translated into proteins

Type                    Abbreviation   Function
Ribosomal RNA           rRNA           Translation
Transfer RNA            tRNA           Translation
Small nuclear RNA       snRNA          Splicing
Small nucleolar RNA     snoRNA         Nucleotide modifications
MicroRNA                miRNA          Gene regulation
Small interfering RNA   siRNA          Gene regulation
…                       …              …
MicroRNA
• Short (~22 nucleotides) RNAs that regulate gene expression by promoting mRNA degradation or repressing translation
Image credit: wikipedia, Sun et al., Annual Review of Biomedical Engineering 12:1-27, (2010)
Gene regulation and epigenetics
• Some mechanisms are known to regulate gene expression. For example:
  – Transcription factor binding can activate or repress transcription
  – miRNA-mRNA binding can promote mRNA cleavage or repress translation
• Some signals are correlated with expression, but the causal direction is not certain (or not fixed). For example:
  – Promoter DNA methylation and transcriptional repression
  – Histone modifications and expression levels
• The different mechanisms are not independent.
High-throughput methods (recap)
• Protein-DNA binding (ChIP-seq, ChIP-exo, ...)
• DNA long-range interactions (ChIA-PET, Hi-C, TCC, ...) [project]
• DNA methylation (bisulfite sequencing, RRBS, MeDIP-seq, MBDCap-seq, ...) [project]
• Open chromatin (DNase-seq, FAIRE-seq, ...)
• Histone modifications (ChIP-seq)
• Gene expression (RNA-seq, CAGE, ...), isoforms [project]
• Protein-RNA binding (CLIP-Seq, HITS-CLIP, PAR-CLIP, RIP-seq, ...) [project]
• ...
PROBLEMS IN COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
Part 2
Some related CBB Problems
• Analysis of chromatin patterns [project]
• Identification of regulatory elements [lecture, discussion paper]
• Reconstruction of transcription factor (TF) regulatory networks [project]
• Identification of non-coding RNAs [project]
• Prediction of miRNA targets [project]
• Construction of gene expression models [project]
Analysis of chromatin patterns
• Computational tasks:
  – Segmentation of the human genome
    • Fixed-size bins
    • Based on annotation
    • Unsupervised clustering
      – Hidden Markov models
    • Supervised classification
  – Data aggregation and integration
  – Large-scale correlations
    • Learning of signal shapes
  – Visualization
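The simplest segmentation listed above, fixed-size bins, can be sketched as follows: a signal given as intervals (for example, read coverage) is averaged within each bin. The input format (0-based, half-open, non-overlapping intervals that lie within the chromosome) and the function name `bin_signal` are assumptions chosen for illustration.

```python
def bin_signal(intervals, chrom_length, bin_size):
    """Average an interval-based signal over fixed-size bins.

    intervals: list of (start, end, value) tuples, 0-based and half-open,
    assumed to be non-overlapping and to lie within [0, chrom_length).
    Positions not covered by any interval contribute a signal of 0.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    totals = [0.0] * n_bins
    for start, end, value in intervals:
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        for b in range(first_bin, last_bin + 1):
            bin_start = b * bin_size
            bin_end = min(bin_start + bin_size, chrom_length)
            overlap = min(end, bin_end) - max(start, bin_start)
            totals[b] += value * overlap
    # Divide by the true bin width (the last bin may be shorter).
    widths = [min((b + 1) * bin_size, chrom_length) - b * bin_size for b in range(n_bins)]
    return [total / width for total, width in zip(totals, widths)]


# Hypothetical signal on a 300-bp region, summarized in 100-bp bins.
signal = [(0, 100, 2.0), (150, 200, 6.0)]
print(bin_signal(signal, chrom_length=300, bin_size=100))
```

Fixed-size bins are easy to compute and compare across data sets, but bin boundaries are arbitrary with respect to biological features, which is one motivation for the annotation-based and model-based alternatives in the list.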
Genome segmentation
• Using chromatin state to segment the genome
  – Hidden Markov model
  – Clustering
• Annotate identified states using biological knowledge
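To make the hidden Markov model idea concrete, here is a toy sketch, not the method of any particular published tool. Each genomic bin carries a binary observation (a chromatin mark is present or absent), and the Viterbi algorithm finds the most probable sequence of hidden states. The two state names and all probabilities are made-up assumptions; in practice the parameters are learned from data and there are many marks and states.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path (log-space Viterbi)."""
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            # Best previous state for arriving in state s at position t.
            best_prev = max(states,
                            key=lambda p: score[t - 1][p] + math.log(trans_p[p][s]))
            score[t][s] = (score[t - 1][best_prev]
                           + math.log(trans_p[best_prev][s])
                           + math.log(emit_p[s][observations[t]]))
            back[t][s] = best_prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy parameters (assumed for illustration only).
states = ["active", "inactive"]
start_p = {"active": 0.5, "inactive": 0.5}
trans_p = {"active": {"active": 0.9, "inactive": 0.1},
           "inactive": {"active": 0.1, "inactive": 0.9}}
emit_p = {"active": {1: 0.8, 0: 0.2},      # mark usually present
          "inactive": {1: 0.1, 0: 0.9}}    # mark usually absent

marks = [0, 0, 1, 1, 1, 0, 0]  # one binary observation per genomic bin
print(viterbi(marks, states, start_p, trans_p, emit_p))
```

The decoded path assigns a state label to every bin. Runs of identical labels form the segments, which are then annotated using biological knowledge as noted above.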
Image credit: Ernst and Kellis, Nature Methods 9(3):215-216, (2012)
Global chromatin patterns
• Many recent findings relate chromatin patterns to other features
  – Global example: histone modifications, recombination rates and chromosome 1D and 3D structures in C. elegans
Image credit: Gerstein et al., Science 330(6012):1775-1787, (2010)
Local chromatin patterns
  – Histone modifications and protein binding at promoters and enhancers in human
Image credit: Heintzman et al., Nature Genetics 39(3):311-318, (2007)
Identifying regulatory elements
• There are different types of protein-binding regions in the DNA
  – Promoters
  – Enhancers
  – Silencers
  – Insulators
  – ...
• How to locate them in the genome?
Image credit: Raab and Kamakaka, Nature Reviews Genetics 11(6):439-446, (2010)
Identifying regulatory elements
• Some useful information:
  – Genomic location
    • E.g., promoters are around transcription start sites
  – Evolutionary conservation
    • Functional regions tend to be more conserved
  – Protein binding signals and motifs
    • E.g., EP300 at enhancers, CTCF at insulators
  – Chromatin features
    • E.g., DNase I hypersensitivity, H3K4me1 and H3K27ac at active enhancers
  – Reporter assays
  – ...
• Difficulty: integrating different types of information
Reconstruction of TF network
• Goals:
  – Identifying TF binding sites
  – Determining the target genes of each TF
    • In different cell types
    • In different conditions
  – Deducing how gene expression is regulated by TFs
  – Studying how TFs interact with each other
• Methods:
  1. From expression data
  2. Sequence-based (motif analysis)
  3. From binding experiments
  – Sign of regulation (activation vs. repression) usually not determined for #2 and #3
Expression-based methods
• Input: expression levels of genes
  – Usually from microarrays
  – Often time series data
• Output: a network (i.e., directed graph)
  – Each node is a gene (and its protein product)
  – An A→B edge means A is a TF and it regulates B
• Types:
  – Differential equations
  – Probabilistic networks
  – Boolean networks
[Figure: example expression time series for ten genes (G1-G10); vertical axis ticks from 0 to 1, horizontal axis ticks from 0 to 200]
Expression-based methods
• Differential equations
  – Models (y_j: expression level of gene j, a_ji: influence of TF i on gene j):
    • Linear
    • Sigmoidal
    • ...
  – Methods:
    • Solve system of equations to get best-fit parameter values
  – Difficulties:
    • Many parameters when there are many TFs
      – Insufficient training data
      – L1 (LASSO) regularization to control the number of non-zero variables
    • Long running time
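Linear model:
\[ \frac{dy_j}{dt} = a_{j0} + \sum_{k=1}^{m} a_{jk} y_k = a_{j0} + \sum_{k \neq j} a_{jk} y_k + a_{jj} y_j \]
Sigmoidal model:
\[ \frac{dy_j}{dt} = \frac{b_{j1}}{1 + \exp\left(-\left(a_{j0} + \sum_{k \neq j} a_{jk} y_k\right)\right)} - b_{j2} y_j \]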
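The role of L1 (LASSO) regularization can be illustrated with a small sketch. For one target gene j, each row of `X` holds the expression levels of the candidate TFs at one time point, and `y` holds the corresponding rate of change of gene j (in practice estimated from the time series, for example by finite differences). The code below fits the linear model by cyclic coordinate descent with soft-thresholding, a standard way to solve the LASSO problem. The synthetic data, the parameter values, and the omission of the intercept term are all assumptions made to keep the example short.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    residual = list(y)  # equals y - Xw; w starts at zero
    col_sq = [sum(row[k] ** 2 for row in X) / n for k in range(m)]
    for _ in range(n_iter):
        for k in range(m):
            if col_sq[k] == 0.0:
                continue
            # Correlation of column k with the residual that ignores w[k].
            rho = sum(X[i][k] * (residual[i] + X[i][k] * w[k]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[k]
            delta = new_w - w[k]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][k] * delta
                w[k] = new_w
    return w


def make_synthetic_data(n=200, seed=0):
    """Generate made-up data in which only 2 of 6 candidate TFs matter."""
    rng = random.Random(seed)
    true_a = [1.5, 0.0, 0.0, -2.0, 0.0, 0.0]
    X = [[rng.gauss(0, 1) for _ in true_a] for _ in range(n)]
    y = [sum(a * x for a, x in zip(true_a, row)) + rng.gauss(0, 0.05) for row in X]
    return X, y, true_a


X, y, true_a = make_synthetic_data()
estimated = lasso_coordinate_descent(X, y, lam=0.1)
print([round(a, 2) for a in estimated])
```

The penalty shrinks all coefficients and sets weak ones to zero, so each gene ends up with a small set of candidate regulators. The price is a bias toward zero in the retained coefficients, which grows with `lam`.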
Boolean networks
• Considering each gene to be either on or off
• Treat the gene regulatory network as a Boolean network (similar to an electric circuit)
  – Expression of a gene at time t+1 depends on the expression of genes that regulate it at time t
  – Goal: Find the logical relationships between genes
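The update rule described above can be sketched as a synchronous simulation: every gene's state at time t+1 is computed from the states of its regulators at time t. The three-gene network and its logic rules below are invented for illustration. The inference problem stated as the goal runs in the opposite direction: given observed state transitions, find rules consistent with them.

```python
def step(state, rules):
    """Compute the state at time t+1 from the state at time t (synchronous update)."""
    return {gene: bool(rule(state)) for gene, rule in rules.items()}


def simulate(initial_state, rules, n_steps):
    """Return the list of states from time 0 to time n_steps."""
    trajectory = [dict(initial_state)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1], rules))
    return trajectory


# Hypothetical network: A is a constant input, A activates B,
# and C is on only when A is on and B is off.
rules = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and not s["B"],
}

initial = {"A": True, "B": False, "C": False}
for t, state in enumerate(simulate(initial, rules, 3)):
    print(t, state)
```

In this toy network C is switched on briefly and then turned off once B has been activated, a pulse-like behavior that arises purely from the wiring.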
Image credit: Akutsu and Miyano, Pacific Symposium on Biocomputing 4:17-28, (1999)
From binding data
• Input: binding signals of transcription factors in the whole genome
  – Usually from ChIP-chip or ChIP-seq
  – Or from motifs
  – (Best to combine both)
• Output: TF regulatory network
• Difficulties:
  – Finding binding sites
    • Peak calling
    • Motif analysis
  – Associating binding sites with target genes
    • Promoters (e.g., 500bp upstream of transcription start site)
    • More difficult for distal binding sites
    • Expression patterns could help
  – Evaluating functional effects of binding (strong vs. weak, transient binding)
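The promoter-based association step can be sketched as follows: a binding peak is assigned to a gene if its summit falls within a fixed window upstream of that gene's transcription start site (TSS), taking strand into account. The tuple layouts, the function names, and the example coordinates are assumptions for illustration. The 500 bp default follows the example in the list above.

```python
def promoter_window(tss, strand, upstream=500):
    """Return the (start, end) half-open window of `upstream` bp before the TSS."""
    if strand == "+":
        return (max(0, tss - upstream), tss)
    # On the minus strand, "upstream" means larger coordinates.
    return (tss + 1, tss + 1 + upstream)


def assign_peaks_to_genes(peaks, genes, upstream=500):
    """Map each TF to the set of genes with a peak summit in their promoter.

    peaks: list of (tf_name, chrom, summit_position)
    genes: list of (gene_name, chrom, tss_position, strand)
    """
    targets = {}
    for tf, peak_chrom, summit in peaks:
        for gene, gene_chrom, tss, strand in genes:
            if peak_chrom != gene_chrom:
                continue
            start, end = promoter_window(tss, strand, upstream)
            if start <= summit < end:
                targets.setdefault(tf, set()).add(gene)
    return targets


# Hypothetical genes and peaks (coordinates are made up).
genes = [("geneA", "chr1", 1000, "+"), ("geneB", "chr1", 5000, "-")]
peaks = [("TF1", "chr1", 800), ("TF1", "chr1", 5200), ("TF2", "chr1", 3000)]
print(assign_peaks_to_genes(peaks, genes))
```

The TF2 peak lies far from both promoters and is left unassigned, which illustrates why distal binding sites are the harder case: they need additional evidence such as long-range interaction data or expression patterns.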
Combining both types of data
1. Use expression data to infer an initial network
2. Identify potential regulators
3. Search for binding motifs of these regulators
4. Incorporate global occurrence of these motifs at gene promoters to refine the network
Image credit: Tamada et al., Bioinformatics 19(Suppl.2):ii227-ii236, (2003)
Identification of non-coding RNAs
• High-throughput experiments have recently shown that a vast amount of DNA is transcribed into RNA
• What are these transcripts?
  – Experimental artifacts?
  – Unannotated protein-coding genes?
  – Non-functional transcripts?
  – Functional non-coding RNAs?
  – Pseudogenes?
Construction of expression models
• Given the many different mechanisms involved in gene regulation, how are they related to each other?
  – Are they redundant?
  – Do they simply add to each other, or have synergistic effects?
  – Which have more impact on final expression levels?
  – What are their time scales?
  – When is each mechanism used?
Construction of expression models
• Modeling and prediction
  – An indirect way to estimate how good a model is: evaluating the accuracy of predicted expression
• Prediction of:
  – Expression level
    • Regression: y_i ≈ f(x_i)
    • Classification: (y_i > t) ≈ f(x_i)
  – Differential expression
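The two prediction settings above call for different accuracy measures. As a hedged sketch: a regression model can be scored by the Pearson correlation between predicted and measured expression, and a classification model by the fraction of genes whose predicted on/off label matches the label obtained by thresholding the measured expression at t. The function names and the numbers below are illustrative assumptions, and other measures (e.g., Spearman correlation or AUC) are equally reasonable choices.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def classification_accuracy(measured, predicted_labels, threshold):
    """Fraction of genes whose predicted label equals (measured > threshold)."""
    true_labels = [value > threshold for value in measured]
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(measured)


# Made-up measured expression levels and model outputs for four genes.
measured = [0.2, 1.5, 3.0, 0.1]
predicted_levels = [0.4, 1.2, 2.5, 0.3]       # output of a regression model
predicted_labels = [False, True, False, False]  # output of a classifier

print(pearson_correlation(measured, predicted_levels))
print(classification_accuracy(measured, predicted_labels, threshold=1.0))
```

To give a fair estimate, both measures should be computed on genes that were not used to fit the model.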
Construction of expression models
• Chromatin features and expression
Image credit: Cheng et al., Genome Biology 12(2):R15, (2011)
Construction of expression models
• Model construction and accuracy
Image credit: Cheng et al., Genome Biology 12(2):R15, (2011)
“Histone code” hypothesis
• The statistical models are good, but too complex for humans to interpret
• Is there a simple set of rules (i.e., a “code”) that can easily tell the expression level of a gene?
Image credit: Cheng et al., Genome Biology 12(2):R15, (2011)
Summary
• “Gene expression” is a general term with several possible meanings
• Gene expression is regulated by many mechanisms, including (but not limited to)
  – Transcription factor binding
  – DNA long-range interactions
  – DNA methylation
  – Chromatin structure
  – Histone modifications
  – MicroRNA-mRNA binding
• A lot of new genome-wide data
• Many emerging research topics in CBB