Epigenomics: A Practical Guide
Benjamin Rodriguez, PhD
Wei Li Lab, Baylor College of Medicine
Molecular Biology Refresher Course with Bioinformatics
August 30th 2013
Course Materials: http://dldcc-web.brc.bcm.edu/lilab/benji/MBRB_2013/index.html
Most up-to-date slides: I will upload them for all three of my lectures
Browsers:
http://genome.ucsc.edu/
http://epigenomegateway.wustl.edu/
Web-based analysis:
http://bejerano.stanford.edu/great/public/html/
http://david.abcc.ncifcrf.gov
Software, Sites, Materials
Outline
• DNA methylation
• Histone modifications
• DNase hypersensitivity
• Aberrant methylation in cancer
• Epigenetic inheritance in development and disease
DNA is Packaged in Chromatin
[Figure labels: DNA, histone, nucleosome, chromatin]
• Chromatin consists of nucleosomes: DNA wrapped around histone proteins
• Chromatin organizes genes to be accessible for transcription, replication, and repair
CpG Islands and Promoters
• Although C and G together constitute 42% of the human genome, less than 1% of dinucleotide pairs are CpG
  – Less than ¼ of the expected frequency of 0.04
• A ‘CpG island’ is a run of “CpG-rich” sequence
  – Minimum 200 bp in length
  – GC content > 50%
  – Observed : Expected CpG ratio > 0.6
  – This definition is not precise
• Many CpG islands occur within promoters
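The island criteria above can be checked on a single sequence. The sketch below is a minimal illustration, assuming the commonly used observed/expected formula (CpG count × length) / (C count × G count). The function names `cpg_stats` and `looks_like_cpg_island` are invented for this example, and real island callers scan sliding windows across the genome rather than testing one sequence.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC content, and observed/expected CpG ratio for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0  # expected CpG count if C and G were independent
    obs_exp = cpg / expected if expected else 0.0
    return {"length": n, "gc_content": gc_content, "obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the three slide criteria: length, GC content, observed/expected ratio."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_len
            and stats["gc_content"] > min_gc
            and stats["obs_exp"] > min_obs_exp)
```

The same independence assumption gives the genome-wide expectation quoted above: if C and G each make up about 21% of bases, the expected CpG frequency is roughly 0.21 × 0.21 ≈ 0.04.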
DNA Methylation
• Methylation at CpG islands often represses expression of nearby genes
• Many highly expressed genes have CpG methylation on their exons
• Some genes are imprinted, so maternal and paternal copies have different DNA methylation
• In embryonic stem cells, there is also CHG methylation
• More recently, another type of DNA modification, hydroxymethylation (hmC), has been found
Epigenetic Mechanisms: DNA Methylation
[Figure: a CpG island across gene segments 1-4 in a normal cell, with a few methylated sites (mCG). Key: C, cytosine; mC, methylcytosine]
DNA Methylation and Gene Silencing
[Figure: the CpG island across gene segments 1-4 in normal versus cancer cells; more sites are methylated (mCG) in cancer, and the gene is marked with an X. Key: C, cytosine; mC, methylcytosine]
DNA Methylation and Regulation
• Cytosine methylation blocks DNA-binding proteins’ access to regulatory sites and creates binding sites for repressive proteins
• Methylation often follows decrease in site use
From Thurman et al Nature 2012
Methylation and Expression
R² = 0.7817, P < 0.0005
Some genes (e.g. HOXB13 in breast cancer) show strong correlation of promoter methylation with expression
From Rodriguez et al Carcinogenesis 2008
Methylation, Retroviruses and Repeats
• Bacteria use DNA methylation to limit invasive DNA from viruses
• A large fraction of the human genome consists of carcasses of retro-viruses and transposons
• Almost all DNA repeats are heavily methylated
• If they lose methylation they are more likely to be expressed
DNA Methylation and Development
• Almost all DNA is de-methylated in the embryo
• Increasing methylation at various times during fetal development restricts functionality
  – This is why cloning is difficult
• Wave of methylation in adolescence
• Gradual de-methylation in old age
DNA Methylation and Inheritance
• Most DNA is de-methylated during gametogenesis and embryogenesis
• Methylation persists in some DNA regions
• Humans and mice show epigenetic inheritance apparently mediated by DNA methylation
Agouti Mice and DNA Methylation
Genomic Imprinting
[Figure labels: paternal allele, maternal allele]
• Epigenetic mechanism of transcriptional regulation
• Expression of a subset of mammalian genes is restricted to one parental allele
• Parental chromosomes are differentially marked by DNA methylation
• Imprinting is regulated by cis-acting elements (Imprinting Control Regions) and non-coding RNAs
• Imprinting Control Regions act over long distances and control the imprinting of multiple genes
• We will examine a recent study of the IGF2 DMR in individuals exposed to famine in utero
Epigenetic Mechanisms: Post-Translational Modification to Histones
[Figure labels: histone acetylation (Ac), histone methylation (Me)]
• Epigenetic modifications of histones include acetylation and methylation
Histone Modifications
• Different modifications at different locations by different enzymes
• Potential temporal and spatial specificity
Histone Modifications
• Gene body mark: H3K36me3, H3K79me3
• Active promoter (TSS) mark: H3K4me3
• Active enhancer (TF binding) mark: H3K4me1, H3K27ac
• Both enhancers and promoters: H3K4me2, H3/H4ac, H2AZ
• Repressive promoter mark: H3K27me3
• Repressive mark for DNA methylation: H3K9me3
Genes, regulatory DNA, and epigenetic features
Graphic from NIH RoadMap Epigenomics Site
- promoters
- enhancers
- silencers
- insulators
- etc.
DNase Hypersensitive (HS) Mapping
• DNase randomly cuts genome (more often in open chromatin region)
• Select short fragments (two nearby cuts) to sequence
• Map to active promoters and enhancers
~100,000 – 250,000 DHSs per cell type (0.5-1.5% of genome)
genome.ucsc.edu www.epigenomebrowser.org
DNaseI Hypersensitive site (DHS)
Promoters
Enhancers
DNaseI hypersensitive sites mark regulatory DNA
• Coordinated activities of chromatin-modifying enzymes lead to condensation of chromatin and inhibition of gene expression
[Figure labels: HMT, HDAC, DNMT; acetyl (Ac) and methyl (Me) marks; gene expression]
Epigenetic Modifications to Histones and DNA Can Cooperate to Silence Gene Expression
Roles in Normal Development and Cancer
• Regulation of genes involved in differentiation, cell cycle, and cell survival
• Through epigenetic silencing of certain genes, affected cells may acquire new phenotypes which promote tumorigenesis
[Figure: EPIGENETICS. Normal epigenetic mechanisms: progenitor cell, differentiated cells. Deregulated epigenetic mechanisms: malignant progenitor cell, tumor]
HOXB13 hypermethylation in breast cancer cells
From Rodriguez et al Carcinogenesis 2008
Strong inverse association between promoter CpG island hypermethylation and HOXB13 gene expression
R² = 0.7817, P < 0.0005
HOXB13 hypermethylation in breast cancer cells
From Rodriguez et al Carcinogenesis 2008
Bisulfite sequencing
(Sanger, clone-based, very laborious)
Inhibition of DNA methyltransferase activity restores expression of HOXB13
From Rodriguez et al Carcinogenesis 2008
[Figure labels: paired tumor and adjacent normal tissues; patient ER status]
HOXB13 hypermethylation strongly associates with patient ER status
(OR=3.75, 95% CI 1.41-9.96; P = 0.008)
From Rodriguez et al Carcinogenesis 2008
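The reported odds ratio, confidence interval, and P value can be checked for internal consistency without the underlying counts. The sketch below is an illustration only: it assumes the interval is a Wald interval on the log-odds scale (the slides do not say how it was computed), and the function name `wald_check` is invented here.

```python
from math import erfc, log, sqrt


def wald_check(odds_ratio: float, ci_low: float, ci_high: float,
               z_crit: float = 1.959964) -> dict:
    """Back-calculate the standard error, z statistic, and two-sided P value
    implied by an odds ratio and its 95% CI, assuming a Wald interval on log(OR)."""
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)  # CI half-width on the log scale
    z = log(odds_ratio) / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    midpoint = sqrt(ci_low * ci_high)     # geometric mean of the CI limits
    return {"se": se, "z": z, "p": p_two_sided, "midpoint": midpoint}


# Values reported on the slide
check = wald_check(3.75, 1.41, 9.96)
```

Under that assumption the geometric midpoint of the interval falls close to the reported odds ratio, and the implied two-sided P value is close to the reported 0.008.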
HOXB13 hypermethylation associates with poor disease free survival in ERα-positive patients
From Rodriguez et al Carcinogenesis 2008
• Epidemiologic studies suggest adult disease risk is associated with adverse environmental conditions early in development
• Involvement of epigenetic dysregulation has been hypothesized
• Can early-life environmental conditions cause epigenetic changes in humans that persist throughout life? Is there a role for clinical intervention?
1. Periconceptional exposure to famine
2. Offspring born before vs. after maternal gastrointestinal bypass surgery
Epigenetic inheritance and human development
Persistent epigenetic differences associated with prenatal exposure to famine in humans
Individuals who were prenatally exposed to famine during the Dutch Hunger Winter in 1944–45 had, 6 decades later, less DNA methylation of the imprinted IGF2 gene compared with their unexposed, same-sex siblings.
The association was specific for periconceptional exposure, reinforcing that very early mammalian development is a crucial period for establishing and maintaining epigenetic marks.
Heijmans et al PNAS 2008
Insulin-like growth factor II (IGF2)
• One of the best-characterized epigenetically regulated loci
• Key factor in human growth and development
• Maternally imprinted
• Imprinting is maintained through the IGF2 differentially methylated region (DMR)
• Hypomethylation of DMR leads to bi-allelic expression of IGF2
• IGF2 DMR methylation is a normally distributed quantitative trait largely determined by genetic factors
• Methylation mark is stable up to middle age
• If affected by environmental conditions early in human development, altered methylation may be detected many years later
Difference in IGF2 DMR methylation between individuals prenatally exposed to famine and their same-sex sibling
Fig A displays the difference in IGF2 DMR methylation within sibships according to the estimated conception date of the famine-exposed individual.
IGF2 DMR methylation was lowest in the famine-exposed individual among 72% (43/60) of sibships; this lower methylation was observed in conceptions across the famine period.
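One simple way to ask whether 43 of 60 sibships is more than chance would produce is an exact sign test against a 50/50 null. This is a sketch for intuition, not the analysis used in the study (which the slides do not describe); the function name `sign_test_two_sided` is invented for this example.

```python
from math import comb


def sign_test_two_sided(successes: int, n: int) -> float:
    """Exact two-sided binomial P value under the null that each sibship is
    equally likely to show lower methylation in either sibling."""
    k = max(successes, n - successes)  # fold onto the upper tail (null is symmetric)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# 43 of 60 sibships had the lowest methylation in the famine-exposed sibling
p_value = sign_test_two_sided(43, 60)
```

A sign test ignores the size of each within-pair difference, so it is less powerful than a paired test on the methylation values themselves.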
IGF2 DMR methylation among individuals periconceptionally exposed to famine and their
unexposed, same-sex siblings
• 62 individuals were exposed to famine late in gestation for at least 10 weeks; they were born during or shortly after the famine
• No difference in IGF2 DMR methylation between the exposed individuals and their unexposed siblings
IGF2 DMR methylation among individuals exposed to famine late in gestation and their unexposed, same-sex siblings
Timing of famine exposure during gestation and IGF2 DMR methylation
• Periconceptional, late exposure groups and 122 controls
• Periconceptional exposure associated with lower methylation
• Statistically significant association between timing and exposure
Differential methylation in offspring born before versus after maternal gastrointestinal bypass surgery
• Obesity during pregnancy affects fetal programming of adult disease
• Children born after maternal surgery (AMS) are less obese and exhibit improved cardiometabolic risk profiles that are carried into adulthood
• The study analyzes the impact of maternal weight loss surgery on methylation levels in offspring born before maternal surgery (BMS) and in AMS offspring
• Statistically significant correlations between gene methylation levels and gene expression and plasma markers of insulin resistance
• Effective treatment of a maternal phenotype is durably detectable in the methylome and transcriptome of subsequent offspring
Guenard et al. PNAS 2013
Offspring born before vs. after maternal gastrointestinal bypass surgery
BMS offspring
• Higher weight, height, and waist and hip girth (P < 0.05)
AMS offspring
• Lower body fat % (P = 0.07)
• Improved fasting insulin levels (P = 0.03)
• Improved homeostatic model of insulin resistance (HOMA-IR) index (P = 0.03)
• Lower blood pressure (P < 0.05)
• 14,466 CpG sites (2.9% of sites analyzed) exhibited significant differences
• These corresponded to 5,698 unique genes
• significant biological functions related to autoimmune disease, pancreas disorders, diabetes mellitus, and disorders of glucose metabolism
Differential methylation analysis of offspring born before vs. after maternal gastrointestinal bypass surgery
Any questions?
On to the Laboratory!
Laboratory Exercises
• We will work with the significant differentially methylated CpG sites published as supporting data from the Guenard study
• The MGBS_study.xlsx and AMS.probes.bed files are available from the class web site
• We will perform our own mapping and significance testing of the CpG sites (in relation to genes) using GREAT
• We will analyze the published gene list and our custom gene list in DAVID
• Finally, we will analyze last week’s gene list (from the MLL-AF9 fusion protein study) in DAVID
Do hyper- and hypo-methylated sites in AMS offspring have different distributions?
Open MGBS_study.xlsx and examine the “DMC list” worksheet
The study used a poorly described algorithm, DiffScore, to assess statistical differences and to rank CpG sites.
It also applied a loose threshold for the change cutoff.
In Excel, we can easily compute summary statistics
The average and standard deviation of the Delta beta values are quite similar
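The same summary can be computed outside Excel. The sketch below is illustrative only: the delta beta values are made up rather than taken from MGBS_study.xlsx, the sign convention (positive means hypermethylated in AMS offspring) is an assumption, and `summarize_delta_beta` is a name invented for this example.

```python
from statistics import mean, stdev


def summarize_delta_beta(delta_betas):
    """Split sites by the sign of delta beta and summarize the magnitude of change."""
    hyper = [d for d in delta_betas if d > 0]    # assumed: gained methylation
    hypo = [-d for d in delta_betas if d < 0]    # assumed: lost methylation (as magnitudes)
    summary = {}
    for label, values in (("hyper", hyper), ("hypo", hypo)):
        summary[label] = {
            "n": len(values),
            "mean_abs": mean(values) if values else None,
            "sd_abs": stdev(values) if len(values) > 1 else None,
        }
    return summary


# Made-up delta beta values, for illustration only
example = [0.10, 0.20, 0.30, -0.10, -0.20, -0.30]
result = summarize_delta_beta(example)
```

Comparing `mean_abs` and `sd_abs` between the two groups mirrors the Excel comparison: similar values suggest the two directions of change have similar distributions of magnitude.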
• Choose human GRCh37 under species assembly
• Under Test regions, upload the BED file AMS.probes.bed
• Set Background regions to whole genome
• Choose submit
Mapping significant CpG sites to genes with GREAT
http://jura.wi.mit.edu/cgi-bin/bioc/tools/compare.cgi
Choose compare 2 lists, paste the lists of genes, and press submit
Compare gene lists from publication to those obtained by GREAT
• GREAT recovers 170 of 198 genes from the publication (AMS and GREAT)
• GREAT identifies 170 additional genes (because by default it searches a wider space of genomic distances)
• The missing 28 genes may result from gene name synonyms
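The list comparison done with the web tool amounts to set intersection and difference. The sketch below is a minimal stand-in, not the tool's implementation; `compare_gene_lists` is an invented name, the gene symbols are arbitrary placeholders, and upper-casing symbols is an assumption that does not resolve true synonyms.

```python
def compare_gene_lists(published, recovered):
    """Return genes shared by both lists and genes unique to each list."""
    pub = {g.strip().upper() for g in published if g.strip()}
    rec = {g.strip().upper() for g in recovered if g.strip()}
    return {
        "shared": sorted(pub & rec),
        "only_published": sorted(pub - rec),
        "only_recovered": sorted(rec - pub),
    }


# Placeholder lists, for illustration only
published_genes = ["IGF2", "HOXB13", "TP53"]
great_genes = ["igf2", "TP53 ", "BRCA1"]
comparison = compare_gene_lists(published_genes, great_genes)
```

Genes that land in `only_published` are the candidates to check by hand for alternative names.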
Mapping significant CpG sites to genes with GREAT
On “Region-Gene Association Graphs”:
• We see that 3/4 of the CpG sites are assigned two genes
• Orientation and distance to TSS look fairly flat upstream, but there is a spike in predictions when the distance is > 5 kb from the TSS
• Could that be the reason we don’t see any significance test results? Let us find out
• Open the “Association rule settings” dialog box
• Change downstream to 5 kb and distal to 5 kb
• Resubmit the job
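To see why shrinking the search range returns fewer genes, consider a deliberately simplified model in which a CpG site is assigned to every gene whose TSS lies within a fixed distance. This is not GREAT's association rule, which is strand-aware and builds regulatory domains per gene; the function name, gene names, and coordinates below are all hypothetical.

```python
def genes_within_window(sites, tss_by_gene, max_dist):
    """Return the set of genes with a TSS within max_dist bp of any site.

    sites: list of (chrom, position)
    tss_by_gene: dict mapping gene name -> (chrom, tss_position)
    """
    hits = set()
    for chrom, pos in sites:
        for gene, (gene_chrom, tss) in tss_by_gene.items():
            if gene_chrom == chrom and abs(pos - tss) <= max_dist:
                hits.add(gene)
    return hits


# Hypothetical TSS positions and CpG sites
tss_by_gene = {
    "GENE_A": ("chr1", 10_000),
    "GENE_B": ("chr1", 60_000),
    "GENE_C": ("chr2", 5_000),
}
sites = [("chr1", 12_000), ("chr1", 40_000), ("chr2", 300_000)]

wide = genes_within_window(sites, tss_by_gene, 1_000_000)  # generous search range
narrow = genes_within_window(sites, tss_by_gene, 5_000)    # 5 kb search range
```

With the wide window every site picks up distant genes; with the 5 kb window only sites close to a TSS contribute, so the gene list shrinks.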
Modifying the genomic region search range in GREAT
What happened with a smaller genomic search interval?
We only returned 64 genes! Crap.
But we did finally return a single significant test result: InterPro (protein sequence analysis and classification)
Functional enrichment analyses with DAVID
• With GREAT, we were able to identify the majority of genes published in the original study
• We do not have sufficient information to repeat the study’s original analyses
• We can use DAVID to analyze the study gene list and our gene list from GREAT
Open http://david.abcc.ncifcrf.gov and choose “Start Analysis”
• “Upload Gene List” dialog box
• Copy and paste the list from the MGBS_study.xlsx worksheet “Study Genes”
• On “Select Identifier”, choose “Official Gene Symbol” and choose “Gene List” on “List Type”
• Then Submit List
Functional enrichment analyses with DAVID
For species, highlight Homo sapiens and click “Select Species”
Rename the list
Functional enrichment analyses with DAVID
Choose “Functional Annotation Tool”
Functional enrichment analyses with DAVID
Each Annotation Category on the left can be expanded to reveal a number of optional databases to query.
This allows for powerful customization.
For this exercise, we will accept the default options.
Choose “Functional Annotation Chart”
Functional enrichment analyses with DAVID
Shown above are the first three results, the only ones to pass multiple-testing correction.
They reference the same group of genes.
Functional Annotation Chart fields are: category, term, related term (RT), genes, count, percentage, p-value (univariate modified Fisher’s), and Benjamini p-value (correction for multiple testing)
Terms with arrows can be sorted
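The two P value columns can be illustrated with a small sketch. It assumes a one-sided hypergeometric (Fisher-type) test for over-representation and the Benjamini-Hochberg adjustment. The `ease_like` variant discounts one gene from the list hits, which is my understanding of DAVID's "modified Fisher" score, but DAVID's exact table layout may differ. All function names are invented for this example.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of at least k term genes in a list of n genes drawn
    from a background of N genes, K of which carry the term."""
    top = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def ease_like(k: int, n: int, K: int, N: int) -> float:
    """More conservative variant: discount one list hit before testing (assumption)."""
    return hypergeom_upper_tail(max(k - 1, 0), n, K, N)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Because the adjustment scales each P value by the number of terms tested, a term that looks significant on its own can fail after correction when many terms are tested at once.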
Functional enrichment analyses with DAVID
Clicking on the link for term “Pleckstrin homology” opens the corresponding entry at InterPro
Proteins containing this domain can bind to and interact with membrane bound proteins, potentially mediating various signal transduction pathways in the cell
Let’s now perform the analysis with our list of differentially methylated genes obtained via GREAT
• “Upload Gene List” dialog box
• Copy and paste the list from the MGBS_study.xlsx worksheet “Great Analysis”
• On “Select Identifier”, choose “Official Gene Symbol” and choose “Gene List” on “List Type”
• Submit List, choose “Homo sapiens”
• Select “Functional Annotation Chart”
Functional enrichment analyses with DAVID
Note: Entrez Gene IDs are a preferred way to search for gene functions.
They can account for the fact that a gene may go by several different names.
Functional enrichment analyses with DAVID
We see the same first three results as before, but now they do not pass multiple-testing correction.
Why? One explanation: we introduced “noisy” genes with GREAT.
Why did we not see any significant biological functions related to autoimmune disease, pancreas disorders, diabetes mellitus, or disorders of glucose metabolism?
1. Study authors gave us a small piece of the data they likely used
2. Methodological issues
3. Commercial IPA is very different from publicly curated databases and search tools
Functional enrichment analyses with DAVID
Finally, let’s analyze the list of genes from last week’s MLL-AF9 fusion gene study.
The file “MLL-AF9_promoters.bed” is available from the course website
• “Upload Gene List” dialog box
• Open the BED file in Excel, then copy and paste the fifth column into DAVID
• On “Select Identifier”, choose “Entrez Gene ID” and choose “Gene List” on “List Type”
• Submit List, choose “Mus musculus”
• Select “Functional Annotation Chart”
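Pulling the fifth column can also be scripted instead of done in Excel. The sketch below assumes a tab-delimited BED file whose fifth column holds the Entrez Gene ID, as in the steps above; the function name `fifth_column_ids` and the example rows (coordinates, names, and IDs) are made up for illustration.

```python
def fifth_column_ids(lines):
    """Collect unique values from the fifth tab-delimited column, in file order."""
    ids = []
    seen = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header/comment lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # row too short to have a fifth column
        gene_id = fields[4]
        if gene_id not in seen:
            seen.add(gene_id)
            ids.append(gene_id)
    return ids


# Made-up rows standing in for the real file
example_rows = [
    "track name=example",
    "chr1\t1000\t2000\tpromoter_1\t11111",
    "chr1\t5000\t6000\tpromoter_2\t22222",
    "chr2\t100\t900\tpromoter_3\t11111",
    "chr3\t10\t20",
]
entrez_ids = fifth_column_ids(example_rows)
```

For the real file, the same function can be given an open file handle, for example `fifth_column_ids(open("MLL-AF9_promoters.bed"))`, and the result pasted into DAVID one ID per line.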
Functional enrichment analyses with DAVID
Jackpot! We have dozens of highly enriched terms for the genes bound by oncogenic MLL-AF9 in mouse leukemia stem cells.
Enriched functions include transcription regulation and cell cycle.
More than 40% of targets are phosphoproteins.
Laboratory Summary
• The Guenard study was not very fruitful, so to speak
• I have some issues with their methodology
• Limited sharing of published data is poor practice
• DNA methylation data is difficult to interpret
• GREAT and DAVID are powerful tools for functional enrichment analyses of genome-wide studies
• With the right tools and a little patience, you can make novel discoveries and draw meaningful biological interpretation from genomics datasets