Introduction to Bioinformatics and Genomics 2013
University of Brawijaya, 4th December 2013

Understanding the Human Genome: Lessons from the ENCODE project
Austen Ganley, INMS

Glossary
• Genome
• Genes
• DNA/RNA
• Protein
• Cell
• Transcription
• Chromatin
• Histones
• Nucleosomes
• Non-coding RNA
• Sequencing
• Microarray
• Transcription start site
• Active/open
• Inactive/repression

[Figure: structure of a gene, showing the promoter, transcriptional start site, exons, introns and transcriptional terminator]

Introduction
• ENCODE (the Encyclopedia of DNA Elements) brought together many research groups working as one consortium
• The aim was to understand 1% of the human genome (pilot phase, 2007), and then 100% (2012)
• Looked at:
  – Transcription
  – Chromatin/transcription factors
  – Replication
  – Evolution

Genes
• Now estimated to be about 21,000 protein-coding genes (taking up about 3% of the whole genome)
• In addition, there are about 9,000 microRNAs and about 10,000 long non-coding RNAs

Transcription
• Transcription was measured by two different methods:
  – Whole-genome (tiled) microarrays
  – RNA sequencing (RNA-seq)

Detecting transcription using tiled microarrays
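Tiled microarrays carry probes at regular intervals along the genome, and labelled material made from the cell's RNA hybridises to the probes it matches. A minimal sketch of how transcribed regions could be called from such data, assuming probe intensities are already normalised and that a region counts as transcribed when at least `min_probes` consecutive probes exceed a fixed threshold. The function name, threshold and `min_probes` value are hypothetical; the published analyses used more careful statistics.

```python
from typing import List, Optional, Tuple


def call_transcribed_regions(probe_starts: List[int], intensities: List[float],
                             probe_length: int, threshold: float,
                             min_probes: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) regions where >= min_probes consecutive probes exceed threshold.

    Coordinates are 0-based and half-open; probes must be sorted by start position.
    """
    regions: List[Tuple[int, int]] = []
    run_start: Optional[int] = None  # index of the first probe in the current run
    # A trailing -inf sentinel forces any run still open at the end to be closed
    for i, value in enumerate(list(intensities) + [float("-inf")]):
        if value > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_probes:
                regions.append((probe_starts[run_start], probe_starts[i - 1] + probe_length))
            run_start = None
    return regions
```

Requiring several consecutive bright probes guards against a single noisy probe being called as a transcript.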

• They found that at least 62% of the whole genome is transcribed (remember, protein-coding genes account for only about 3% of the whole genome)
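Because transcripts overlap one another, a figure such as "62% of the genome" has to count each base once rather than add up transcript lengths. A minimal sketch, assuming transcripts are given as half-open `(start, end)` intervals on a single sequence; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def merge_intervals(intervals: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_covered(intervals: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of bases covered by at least one interval, each base counted once."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length
```

For example, `fraction_covered([(0, 50), (40, 80), (90, 100)], 200)` returns `0.45`: the three transcripts cover 90 distinct bases, although their lengths add up to 100.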

Transcription start sites
• The goal is to identify transcription start sites (TSS)
• Not easy to do!
• Use a technique called CAGE (Cap Analysis of Gene Expression)

CAGE
• Makes use of the 5′ cap on mRNA
• First, mRNA is reverse-transcribed to form cDNA (giving an RNA-cDNA hybrid)
• Then, biotin is attached to the 5′ cap, and the cDNA is fragmented
• The biotin-tagged fragments (representing the 5′ end of the mRNA) are isolated, for example with streptavidin beads, and these fragments are sequenced
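Once the CAGE tags are mapped back to the genome, the 5′ end of each tag marks a position where transcription started, and nearby tags on the same strand can be grouped into one transcription start site. A minimal sketch, assuming each tag has already been aligned and is given as a hypothetical `(start, end, strand)` tuple with 0-based, half-open coordinates on one chromosome. The `max_gap` of 20 bp is an illustrative choice, not a published parameter, and real CAGE pipelines use more careful clustering.

```python
from collections import Counter
from typing import List, Tuple


def five_prime_end(start: int, end: int, strand: str) -> int:
    """5' end of a tag aligned to [start, end); on the minus strand it is the rightmost base."""
    return start if strand == "+" else end - 1


def cluster_tss(reads: List[Tuple[int, int, str]],
                max_gap: int = 20) -> List[Tuple[str, int, int, int]]:
    """Group tag 5' ends on the same strand that lie within max_gap bp of each other.

    Returns (strand, first_pos, last_pos, tag_count), sorted by strand then position.
    """
    # Count how many tags start at each (strand, position)
    counts = Counter((strand, five_prime_end(s, e, strand)) for s, e, strand in reads)
    clusters: List[Tuple[str, int, int, int]] = []
    for strand, pos in sorted(counts):
        n = counts[(strand, pos)]
        if clusters and clusters[-1][0] == strand and pos - clusters[-1][2] <= max_gap:
            _, first, _, total = clusters[-1]
            clusters[-1] = (strand, first, pos, total + n)  # extend the current cluster
        else:
            clusters.append((strand, pos, pos, n))  # start a new cluster
    return clusters
```

Each resulting cluster is a candidate TSS, and its tag count gives a rough measure of how often transcription starts there.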

• About 60,000 transcription start sites were found
• Only about half of these match known genes
• What do the others do? They may help explain the high level of transcription
• Transcription start sites are often far upstream of the gene start, and can lie within other genes

Overlapping genes: transcription start sites for the DONSON gene
• An overlapping gene, starting far upstream
• DONSON is a known gene
• However, some DONSON transcripts start in the ATP5O gene and include some ATP5O exons
• Two intervening genes are skipped

Chromatin: histones and nucleosomes

• Nucleosomes are formed from DNA that is wrapped around histones
• Histones are a set of proteins that usually associate as an octamer (two copies each of H2A, H2B, H3 and H4)

www.palaeos.com/Eukarya/Eukarya.Origins.5.html
www.mun.ca/biochem/courses/3107/Topics/supercoiling.html

DNase I hypersensitive sites (DHS)

Gilbert, Developmental Biology, Sinauer
Hebbes Lab, University of Portsmouth, UK

• DNase I preferentially digests nucleosome-depleted regions, called DNase I hypersensitive sites
• These are associated with gene transcription
• Method: chromatin is lightly digested with DNase I, which cuts mainly in the nucleosome-free regions
• The DNA at the cut sites is isolated, and hybridised to a microarray or sequenced
• This finds the open, active regions of the genome
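In sequencing-based DNase I experiments each read end marks a DNase I cut, so hypersensitive sites show up as windows with many more cuts than average. A minimal sketch, assuming cut positions on a single sequence are already available as integers. The window size, the `min_fold` cutoff and the function name are hypothetical, and real peak callers compare against a local background with a statistical test rather than a fixed fold change.

```python
from typing import List, Tuple


def call_hotspots(cut_sites: List[int], genome_length: int, window: int = 150,
                  min_fold: float = 3.0) -> List[Tuple[int, int]]:
    """Return merged (start, end) windows whose cut count is >= min_fold times the mean."""
    n_windows = (genome_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in cut_sites:
        counts[pos // window] += 1  # assumes 0 <= pos < genome_length
    expected = len(cut_sites) / n_windows  # mean cuts per window
    regions: List[Tuple[int, int]] = []
    if expected == 0:
        return regions
    for i, count in enumerate(counts):
        if count >= min_fold * expected:
            start, end = i * window, min((i + 1) * window, genome_length)
            if regions and regions[-1][1] == start:
                regions[-1] = (regions[-1][0], end)  # extend an adjacent enriched window
            else:
                regions.append((start, end))
    return regions
```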

DNase I hypersensitive sites
• In total, there are about 3 million DNase I hypersensitive sites in the genome, covering about 15% of it (versus about 40,000 genes covering about 4%)
• Transcription start sites are regions of DNase I hypersensitivity, as expected
• Most DNase I hypersensitive sites, though, are not associated with a transcription start site
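Separating DNase I hypersensitive sites into those at transcription start sites and those elsewhere comes down to measuring the distance from each site to its nearest TSS. A minimal sketch, assuming DHS midpoints and TSS positions are integers on the same chromosome; the 1 kb cutoff and the function name are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def classify_dhs(dhs_midpoints: List[int], tss_positions: List[int],
                 max_distance: int = 1000) -> Tuple[List[int], List[int]]:
    """Split DHS into TSS-proximal and distal sets by distance to the nearest TSS."""
    tss = sorted(tss_positions)
    proximal: List[int] = []
    distal: List[int] = []
    for mid in dhs_midpoints:
        i = bisect_left(tss, mid)
        # The nearest TSS is either just before or just after the insertion point
        candidates = tss[max(i - 1, 0):i + 1]
        nearest = min((abs(mid - t) for t in candidates), default=None)
        if nearest is not None and nearest <= max_distance:
            proximal.append(mid)
        else:
            distal.append(mid)
    return proximal, distal
```

Sorting the TSS list once and using binary search keeps each lookup logarithmic, which matters when there are millions of sites.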

[Figure: the genome, showing the transcribed region, DNase I hypersensitive regions, transcription start sites and genes]

Histone modification effects
• Modifications occur on the histone tails
• They alter the strength of DNA-histone binding, and influence the binding of other proteins to the DNA
• Thus they can activate or silence gene expression

The “Histone Code”
• The combination of histone modifications determines a gene’s transcriptional status: the histone code
• Some modifications are associated with active gene expression:
  – H3K4me2
  – H3K4me3
  – H3ac
  – H4ac
• Some are associated with repression:
  – H3K27me3
  – H3K4me1 (although this mark is more often described as marking enhancers)

www.nature.com/nrm/index.html
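As a toy illustration of reading a combination of marks, the sketch below uses only the active marks listed above and H3K27me3 for repression. It leaves H3K4me1 out, because that mark is more usually associated with enhancers than with a simple active/repressed call, and it reports regions carrying both kinds of mark as mixed. The mark sets and the function name are assumptions for illustration, not a published classifier.

```python
from typing import Set

# Marks taken from the lists on this slide (H3K4me1 deliberately omitted)
ACTIVE_MARKS = {"H3K4me2", "H3K4me3", "H3ac", "H4ac"}
REPRESSIVE_MARKS = {"H3K27me3"}


def predicted_status(marks: Set[str]) -> str:
    """Toy call of transcriptional status from the set of marks present at a gene."""
    has_active = bool(marks & ACTIVE_MARKS)
    has_repressive = bool(marks & REPRESSIVE_MARKS)
    if has_active and has_repressive:
        return "mixed"
    if has_active:
        return "active"
    if has_repressive:
        return "repressed"
    return "unknown"
```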

ChIP (Chromatin immunoprecipitation)
• A method to find where a protein of interest binds in the genome
• Cross-link proteins to DNA in the sample (usually with formaldehyde), and fragment the DNA into small pieces
• Immunoprecipitate using an antibody against the protein of interest (for histone modifications, an antibody that recognises the specific modification)
• Reverse the cross-links, and isolate the DNA
• To find where in the genome the protein was bound:
  – Hybridise the DNA to a microarray (ChIP-chip), OR
  – Sequence it (ChIP-seq)

www.rndsystems.com/product_detail_objectname_exactachip_assayprinciple.aspx
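After ChIP-seq, reads are counted in windows along the genome and compared with a control, commonly input DNA that went through the same steps without the immunoprecipitation. A minimal sketch of a per-window fold enrichment, assuming counts for the ChIP and input libraries are already binned into the same windows. The pseudocount and the function name are hypothetical, and dedicated peak callers model the background statistically instead.

```python
from typing import List


def fold_enrichment(chip_counts: List[int], input_counts: List[int],
                    pseudocount: float = 1.0) -> List[float]:
    """Per-window ChIP/input ratio after scaling input to the ChIP library size."""
    if len(chip_counts) != len(input_counts):
        raise ValueError("ChIP and input must cover the same windows")
    chip_total, input_total = sum(chip_counts), sum(input_counts)
    if input_total == 0:
        raise ValueError("input library is empty")
    # Scale input so both libraries have the same total sequencing depth
    scale = chip_total / input_total
    # A positive pseudocount avoids division by zero in windows with no input reads
    return [(c + pseudocount) / (i * scale + pseudocount)
            for c, i in zip(chip_counts, input_counts)]
```

Windows with values well above 1 are candidate binding sites; values below 1 indicate depletion relative to the control.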

Histone modification profiles
• Histone modifications associated with active transcription were enriched around transcription start sites
• Histone modifications associated with gene repression were depleted around transcription start sites
• This is as expected
• Around DNase I hypersensitive sites not near transcription start sites, they found almost the opposite pattern
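Profiles like these are usually built by lining up many TSSs and averaging the signal at each offset from them. A minimal sketch, assuming a ChIP signal already averaged into fixed-size bins along one chromosome and TSS positions given as bin indices with their strands; the names and the two-bin flank are hypothetical.

```python
from typing import List


def tss_profile(signal: List[float], tss_bins: List[int], strands: List[str],
                flank_bins: int = 2) -> List[float]:
    """Average signal from -flank_bins to +flank_bins around each TSS.

    Minus-strand TSSs are flipped so that the right-hand side is always downstream.
    TSSs whose window runs off either end of the chromosome are skipped.
    """
    width = 2 * flank_bins + 1
    totals = [0.0] * width
    used = 0
    for tss, strand in zip(tss_bins, strands):
        lo, hi = tss - flank_bins, tss + flank_bins + 1
        if lo < 0 or hi > len(signal):
            continue
        window = signal[lo:hi]
        if strand == "-":
            window = window[::-1]
        totals = [t + w for t, w in zip(totals, window)]
        used += 1
    return [t / used for t in totals] if used else totals
```

Running this once for an active mark and once for a repressive mark, and plotting both, gives the kind of comparison summarised on this slide.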

Enrichment of active histone marks and depletion of inactive histone marks at a transcription start site

Enrichment of inactive histone marks but little enrichment of active histone marks at a DNase I hypersensitive site

Histone modification profiles
• They also found other patterns
• Combining all the results (plus results for transcription factor binding), they proposed that the human genome can be divided into seven different chromatin states
• Which state a region is in depends on its combination of histone modifications and transcription factor binding

The seven chromatin states

[Figure: the genome coloured by chromatin state; the colour key includes promoter (red), enhancer (yellow), gene body (green) and inactive region (grey)]
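The ENCODE analysis assigned chromatin states with statistical segmentation methods, such as hidden Markov models, that learn the states from the data, not with hand-written rules. Purely as a toy illustration of the idea that a region's state depends on its combination of signals, the sketch below applies hypothetical rules to binned data and then merges neighbouring bins with the same state into segments. The rules, the four labels (borrowed from the colour key of the chromatin-state figure) and the function names are all assumptions.

```python
from typing import List, Set, Tuple


def assign_state(marks: Set[str], tf_bound: bool, transcribed: bool) -> str:
    """Hypothetical rules for one bin; the first matching rule wins."""
    if "H3K4me3" in marks:
        return "promoter"
    if tf_bound:
        return "enhancer"
    if transcribed:
        return "gene body"
    return "inactive"


def segment(bins: List[Tuple[Set[str], bool, bool]]) -> List[Tuple[int, int, str]]:
    """Merge consecutive bins with the same state into (first_bin, end_bin, state) segments."""
    segments: List[Tuple[int, int, str]] = []
    for i, (marks, tf_bound, transcribed) in enumerate(bins):
        state = assign_state(marks, tf_bound, transcribed)
        if segments and segments[-1][2] == state:
            start = segments[-1][0]
            segments[-1] = (start, i + 1, state)  # extend the current segment
        else:
            segments.append((i, i + 1, state))  # end_bin is exclusive
    return segments
```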

Grand Summary

ENCODE

Transcription:
• A lot of non-coding transcription (~60% of the genome is transcribed), much more than is needed just to transcribe all the genes

Transcription start sites:
• Twice as many transcription start sites as traditional “genes”
• Transcripts span large regions, even between genes

DNase I hypersensitive sites:
• Found at more than just transcription start sites
• Two types: those found at TSS, and those found at other regions
• These two types have different chromatin profiles

Histone modifications:
• Active marks correlate with TSS and the DHS found there
• Distal DHS have a different histone modification profile

Chromatin states:
• The genome can be divided into seven different types
• These are determined by the combination of histone modifications and transcription factor binding that occur

Overview:
• The genome can be generalised into seven different states
• The function of some of these states is known, e.g. promoter
• The function of others is not known, but they may explain the high level of transcription and the open chromatin structure