4-23 Recitation, Prof. Gifford: L18 Analysis of Chromatin Structure

Recitation 11: Review of topics covered in Lecture 18 · 2020. 7. 10.



Announcements

•  Problem Set 5 due next Thursday (May 1st)
•  2 more lectures from Prof. Gifford (including today), then 2 guest lectures: Ron Weiss from MIT (Synthetic Bio), George Church from Harvard (Genome Engineering & Systems Biology)
•  2nd exam – Tuesday, May 6th


Outline

•  Chromatin Structure
•  Dynamic Bayesian Networks / Segway
•  DNase-seq & Protein Interaction Quantitation (PIQ)
•  ChIA-PET reveals 3D interactions in the genome


Introduction to Chromatin

•  DNA in one cell is about 2 meters long, yet fits into a tiny nucleus
•  To facilitate the packaging, DNA is wrapped around nucleosomes, and this fiber is wrapped into higher-order structures up to the level of a chromosome
   –  "chromatin" refers to the structure of DNA + nucleosomes
   –  Each nucleosome contains a histone octamer composed of two copies each of four histone proteins: H2A, H2B, H3, and H4

https://www.broadinstitute.org/files/news/images/2010/chromatin_states_2a.png

http://www.mun.ca/biology/desmid/brian/BIOL2060/BIOL2060-18/1820.jpg

Courtesy of the Broad Institute. Used with permission. The most recent best practices can be found at this website: https://www.broadinstitute.org/gatk/guide/best-practices.

© Pearson Education, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.


Histone modifications

–  Particular residues on the tails of these histones commonly undergo post-translational chemical modifications

http://a.static-abcam.com/CmsMedia/Media/common-histone-modification-1.jpg

–  Some of these modifications are associated with functions; different combinations of marks and their meanings compose the "histone code"

Histone modification or variant | Signal characteristics | Putative functions
H2A.Z | Peak | Histone protein variant (H2A.Z) associated with regulatory elements with dynamic chromatin
H3K4me1 | Peak/region | Mark of regulatory elements associated with enhancers and other distal elements, but also enriched downstream of transcription starts
H3K4me2 | Peak | Mark of regulatory elements associated with promoters and enhancers
H3K4me3 | Peak | Mark of regulatory elements primarily associated with promoters/transcription starts
H3K9ac | Peak | Mark of active regulatory elements with preference for promoters
H3K9me1 | Region | Preference for the 5′ end of genes
H3K9me3 | Peak/region | Repressive mark associated with constitutive heterochromatin and repetitive elements
H3K27ac | Peak | Mark of active regulatory elements; may distinguish active enhancers and promoters from their inactive counterparts
H3K27me3 | Region | Repressive mark established by polycomb complex activity associated with repressive domains and silent developmental genes
H3K36me3 | Region | Elongation mark associated with transcribed portions of genes, with preference for 3′ regions after intron 1
H3K79me2 | Region | Transcription-associated mark, with preference for 5′ end of genes
H4K20me1 | Region | Preference for 5′ end of genes

ENCODE Consortium, Nature 2012

Most common in papers

© Abcam. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Courtesy of Macmillan Publishers Limited. Used with permission. Source: ENCODE Project Consortium. "An Integrated Encyclopedia of DNA Elements in the Human Genome." Nature 489, no. 7414 (2012): 57-74.


Histone code & DNA methylation regulate gene expression

•  In addition to histone modifications, gene expression can be affected by DNA methylation: carbon 5 of cytosines in DNA can be methylated.
   –  In metazoans, methylation occurs predominantly at C's that precede G's (the C's of CpG). CpG-rich stretches hundreds to thousands of bases long form "CpG islands" at some promoters; methylation of these islands is associated with repression of gene expression
•  "Epigenetic" changes are reversible, heritable changes to DNA that are not at the level of primary sequence
•  Epigenetic marks are often cell-type and/or disease-state specific
   –  For example, the pluripotency gene Nanog is demethylated during reprogramming of differentiated cells into iPSCs
•  Enzymes actively regulate the epigenetic marks
   –  Chromatin modifiers and nucleosome remodelers are enzymes that actively regulate chromatin marks, nucleosome positioning & turnover
   –  DNA methyltransferases methylate DNA

Courtesy of Macmillan Publishers Limited. Used with permission.


Profiling histone modifications

•  ChIP-seq w/ antibodies specific to a particular type of modified nucleosome can map histone modifications genome-wide
•  We'd like a way to turn the combinations of histone marks observed throughout the genome into functional annotations
   –  Based on an observed pattern of marks at a locus, we'd like to label it w/ ‘enhancer’, ‘promoter’, ‘inactive’ or ‘active gene body region’, etc.
•  2 approaches to functional annotation of the genome:
   –  Hidden Markov Model (ChromHMM)
   –  Dynamic Bayesian Network (Segway)


Dynamic Bayesian Networks

•  A Bayesian network (directed graphical model where arcs/edges represent conditional dependencies) that models a dynamic process (sequential data, either temporal or spatial – e.g. along the genome)
•  Similar to Hidden Markov Models, but include additional random variables that allow tuning (e.g., hard limits on segment lengths)


Segway: Dynamic Bayesian Network

A variable's parents are its direct predecessors in the directed graph
Every variable is conditionally independent of its non-descendants in the model given its parents

n observation tracks
T: sequence length

Square: discrete random variable
Circle: continuous random variable

White: hidden variable
Black: observed variable

Black arcs (edges) = deterministic conditional dependence, red = stochastic conditional dependence


Segway: Dynamic Bayesian Network

Observation track: assay output (e.g., density of H3K4me3 ChIP-seq reads – one track for each of n experiments)

Indicator: 1/0 whether or not the assay produced any results for that region (= 0 if the assay can't map reads to that region; in this case, the edge from Q_t to X_t^(i) is edited out)

Segment label: The hidden annotation you're trying to infer (e.g. promoter).


Segway: Dynamic Bayesian Network

Countdown: Discrete variable that allows the specification of minimum or maximum segment length. Starts at an initial value dependent on Q_t (might want TSS to be short but intergenic region to be long) and decreases where ruler marker M_t = 1.

Ruler marker: = 1 every 10th position, 0 otherwise (every 10th position, we update the countdown variable as to how long we've been in that label)


Segway: Dynamic Bayesian Network

Transition: binary segment transition label that either forces the segment label to change at the current position (J_t = 1) or prevents it from changing (J_t = 0).

Segway generates a conditional probability table P(J_t = 1 | Q_{t-1}, C_{t-1}) that maps each (Q_{t-1}, C_{t-1}) to one of three rules that determine the value of J_t:

1. Force: P(J_t = 1) = 1
2. Prevent: P(J_t = 1) = 0
3. Allow: P(J_t = 1) = 1/(1+L)

The "Allow" rule models a geometric distribution w/ expected length L
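As a rough check of why the "Allow" rule gives an expected length of L: if a transition fires with probability p = 1/(1+L) at each position, the number of positions passed before it fires is geometrically distributed with mean (1-p)/p = L. The sketch below sums that series numerically. The function name and the truncation length `max_len` are illustrative assumptions, not part of Segway, and counting the transition position itself would give L+1 instead.

```python
def expected_positions_before_transition(L, max_len=10_000):
    """Mean number of positions passed before a transition fires
    when P(J_t = 1) = 1 / (1 + L) at every position (truncated sum)."""
    p = 1.0 / (1.0 + L)
    mean = 0.0
    stay = 1.0  # probability that no transition has fired so far
    for k in range(max_len):
        mean += k * stay * p  # exactly k positions passed, then a transition
        stay *= 1.0 - p
    return mean


mean_for_10 = expected_positions_before_transition(10)
mean_for_100 = expected_positions_before_transition(100)
```

Both values come out numerically equal to L, which is the sense in which the rule encodes an expected segment length.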


Segway: Dynamic Bayesian Network

•  Train on 1% of the genome
   –  Assign equal probability (= 1/n) of each label to the starting position, then use the Expectation-Maximization (EM) algorithm to learn model parameters (contributions of each track (experimental assay) to each label)
   –  Starting from different initial conditions (i.e., contributions of each track to a particular label) gave similar results
•  Then use these parameters to segment the rest of the genome using Viterbi decoding (similar to what we discussed for HMMs)
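For reference, here is a minimal sketch of Viterbi decoding on a plain HMM. It is not Segway's model, which has additional countdown and transition variables. The two labels, the binarized "high"/"low" signal, and all probabilities are made-up toy values chosen only for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a sequence of discrete observations."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log_trans[r][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy example: two hypothetical labels and one binarized signal track
states = ["promoter", "dead"]
lg = math.log
log_start = {s: lg(0.5) for s in states}  # equal start probability (1/n)
log_trans = {"promoter": {"promoter": lg(0.9), "dead": lg(0.1)},
             "dead": {"promoter": lg(0.1), "dead": lg(0.9)}}
log_emit = {"promoter": {"high": lg(0.8), "low": lg(0.2)},
            "dead": {"high": lg(0.1), "low": lg(0.9)}}
signal = ["low", "low", "high", "high", "high", "low", "low"]
path = viterbi(signal, states, log_start, log_trans, log_emit)
```

With these toy parameters the decoded path labels the run of "high" positions as promoter and the flanking "low" positions as dead. Working in log space avoids numerical underflow on genome-length sequences.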


Example of Segway's segmentation for a gene

[Figure: genome browser view of chr6:33044414-33057260 (12,847 bp; Human Mar. 2006, NCBI36/hg18; 5 kb scale bar), showing the "Segway 31-track chromatin segmentation (K562)" track with its segment labels (D, L0, L1, F0, F1, R0–R5, C0, C1, H3K9me1, TF0–TF2, TSS, GS, E/GM, GM0, GM1, GE0–GE2) above the "ENCODE Gencode Manual Gene Annotations (level 1+2) (Oct 2009)" track for HLA-DMA, BRD2, and AL645941.1]

-  Arbitrarily chose there to be 25 labels (so that they would remain interpretable by biologists)
-  The authors gave names to the resulting 25 labels:
     D: "dead" – no activity
     GS: gene start
     GM: gene middle
     GE: gene end
     E: enhancer


Transcription Factor Binding

•  Many more possible binding sites in genome than are actually occupied
•  Binding sites are different in cell types and across time

[Figure labels: ~650,000 TF motifs; ~50,000 binding sites for a typical TF; "Motifs are insufficient to predict binding"; "Binding sites change across time"]

•  One key determinant of whether or not a TF binds is the local chromatin landscape: is the DNA accessible?


DNase-seq reveals protected regions of the genome

•  DNase I cleaves at unprotected regions
   –  Regions of open chromatin
   –  Not wrapped around nucleosomes or bound strongly by TFs

[Figure label: 175–400 bp]


Protein Interaction Quantitation (PIQ)

•  Predicts TF binding from DNase-seq + sequence motif preferences of TFs
•  Can get predictions for hundreds of TFs (need a motif for that TF) – no need for antibodies specific to proteins

[Figure panels: Input; Modeling; Predictions]

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Sherwood, Richard I., Tatsunori Hashimoto, et al. "Discovery of Directional and Nondirectional Pioneer Transcription Factors by Modeling DNase Profile Magnitude and Shape." Nature Biotechnology 32, no. 2 (2014): 171-8.


3 Steps of PIQ algorithm

•  1. Identification of candidate sites using TF motifs from TF databases
•  2. Smoothing of raw reads from each DNase-seq experiment. DNase-seq reads are modeled as arising from a Gaussian process to remove noise by adaptively smoothing the reads from neighboring bases
•  3. Identify binding sites of TF by iteratively combining direct evidence of binding (DNase-seq) with a computer-generated model of DNase I hypersensitivity that includes that event (uses TF-signature profile shapes and magnitudes for each TF to build a model of the expected DNase I hypersensitivity)
•  Use log-likelihood ratio to test each region for TF binding, calling those above 1% of null distribution as binary "bound" regions
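The final thresholding step can be sketched as follows. This is an illustration under an explicit assumption: "above 1% of null distribution" is read here as "scoring higher than all but the top 1% of null scores". The function name, the inputs, and that reading are hypothetical and are not taken from the PIQ software.

```python
def call_bound_sites(candidate_llr, null_llr, tail_fraction=0.01):
    """Call a candidate 'bound' when its log-likelihood ratio exceeds a cutoff
    that at most `tail_fraction` of the null scores lie above (assumed reading)."""
    if not null_llr:
        raise ValueError("need at least one null score")
    ranked = sorted(null_llr)
    # Number of null scores allowed to lie above the cutoff
    n_above = min(int(tail_fraction * len(ranked)), len(ranked) - 1)
    threshold = ranked[len(ranked) - n_above - 1]
    return threshold, [score > threshold for score in candidate_llr]


# Toy usage with made-up scores and a 25% tail so the numbers are easy to follow
threshold, calls = call_bound_sites([5, 6, 1], list(range(8)), tail_fraction=0.25)
```

In the toy call, two of the eight null scores (6 and 7) lie above the cutoff of 5, so only the candidate scoring 6 is called bound.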


Pioneer Transcription Factors

•  Region of "closed" chromatin that's inaccessible to most TFs can be opened by pioneer TF binding
•  Then once chromatin is opened, other "settler" TFs can bind

[Figure: Zaret 2011]

© Cold Spring Harbor Laboratory Press. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/. Source: Zaret, Kenneth S., and Jason S. Carroll. "Pioneer Transcription Factors: Establishing Competence for Gene Expression." Genes & Development 25, no. 21 (2011): 2227-41.


Identification of Pioneer TFs

•  Apply PIQ to a developmental lineage model that involves stepwise differentiation of mouse stem cells
   –  Collect DNase-seq data at six cell states at different timepoints
   –  "Pioneer index" measures motif-specific expected increase in DNase I accessibility at sites whose binding changes at successive timepoints
   –  Most motifs showed little pioneer activity, while a small number of motifs (TFs) open chromatin substantially upon binding
   –  Settler TFs can bind once pioneers have opened the chromatin; loss of pioneer binding causes chromatin to return to a closed state

[Figure, panel a: differentiation scheme from mESCs (day 0) to mesendoderm (day 3), then mesoderm (day 5) and endoderm (day 5), then intestinal endoderm (day 6) and prepancreatic endoderm (day 6); treatment labels as extracted: "Serum removal", "WntAct + Activin", "Bmp", "WntAct RA + Bmp + Tgf inh", "Activin Bmp inh"]

[Figure, panel c: histogram of number of TFs (y-axis, 0 to 120) vs. pioneer index score (x-axis, –3 to 15); labeled motifs include Oct4, RXR:RAR, Myc, FoxA, CREB, Klf/Sp, ETS, NFYA, E2F, Nrf1, Zfp161, and KAISO]

[Figure, panel d: motif and pioneer schematic at timepoints t0 and t1]

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Sherwood, Richard I., Tatsunori Hashimoto, et al. "Discovery of Directional and Nondirectional Pioneer Transcription Factors by Modeling DNase Profile Magnitude and Shape." Nature Biotechnology 32, no. 2 (2014): 171-8.


Asymmetrical chromatin opening by directional pioneers

•  For non-palindromic motifs (e.g. AATTCG), we know which strand (+/-) the motif is on and therefore in which direction the TF is binding
•  Some pioneer TFs tend to open chromatin more strongly in one direction – this could inform the mechanisms by which TFs deposit histone marks
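Whether a motif is palindromic in the DNA sense (equal to its own reverse complement) is easy to check, and it determines whether a strand, and hence a binding direction, can be assigned. A minimal sketch, with function names chosen here for illustration:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of a DNA motif written with A, C, G, T."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(motif):
    """A motif is palindromic when it reads the same on both strands."""
    return motif.upper() == reverse_complement(motif)


slide_example = is_palindromic("AATTCG")  # reverse complement is CGAATT
ecori_site = is_palindromic("GAATTC")     # reads the same on both strands
```

The slide's example AATTCG differs from its reverse complement, so a match can be assigned to the + or - strand; a site such as GAATTC cannot.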

[Figure: chromatin opening index (y-axis) around the motif (x-axis) in four panels; panel labels as extracted: "Creb1a", "Klf7", "NFYA", "Zfp161"]


3D structure of the genome & enhancer looping

•  DNA is packaged tightly in 3D space in the nucleus
   –  This structure dictates which elements far apart on the genome (Mb away) can physically interact due to close proximity in 3D space
•  Important for formation of promoter-enhancer interactions
   –  Enhancers: distal regulatory elements that, when bound by specific TFs, enhance the expression of an associated gene

http://www.science.ngfn.de/images/S31T04_fig1.JPG
http://www.nature.com/nrg/journal/v15/n4/images/nrg3663-f4.jpg

© Michael Speicher. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Ong, Chin-Tong, and Victor G. Corces. "CTCF: An Architectural Protein Bridging Genome Topology and Function." Nature Reviews Genetics (2014).


ChIA-PET reveals 3D interactions of the genome

Chromatin Interaction Analysis by Paired-End Tagging

-  Crosslink DNA & proteins, ChIP on protein of interest (e.g. RNA Pol II), and shear DNA
-  Attach linkers w/ restriction enzyme sites & perform ligation in dilute conditions to favor ligation within each complex
-  Perform restriction digest & PCR amplify fragments, then sequence
-  Two types of ligation events:
   1. Self-ligation (e.g., both tags are near the promoter) – these map near each other on the genome & we throw these out
   2. Inter-ligation (e.g., 1 tag from promoter & 1 from enhancer) – these reveal interactions

http://genomebiology.com/content/figures/gb-2010-11-2-r22-1-l.jpg
http://www.nature.com/nature/journal/v462/n7269/images/nature08497-f1.2.jpg

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Fullwood, Melissa J., Mei Hui Liu, et al. "An Oestrogen-receptor-α-bound Human Chromatin Interactome." Nature 462, no. 7269 (2009): 58-64.

Courtesy of Li et al. License: CC-BY. Source: Li, Guoliang, Melissa J. Fullwood, et al. "Software ChIA-PET Tool for Comprehensive Chromatin Interaction Analysis with Paired-end Tag Sequencing." Genome Biology 11 (2010): R22.
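To illustrate the two types of ligation events described on this slide, the sketch below classifies a paired-end tag by where its two tags map. The tag representation (chromosome, position) and the 3 kb cutoff are assumptions made for illustration; the notes do not state a cutoff, and real pipelines choose one per data set.

```python
def classify_pet(tag1, tag2, self_ligation_cutoff=3000):
    """Classify a paired-end tag as 'self-ligation' or 'inter-ligation'.

    Each tag is a (chromosome, position) tuple. The cutoff in bp is a
    hypothetical value, not one given in the source.
    """
    chrom1, pos1 = tag1
    chrom2, pos2 = tag2
    if chrom1 == chrom2 and abs(pos1 - pos2) < self_ligation_cutoff:
        return "self-ligation"  # both tags map near each other: discarded
    return "inter-ligation"     # tags map far apart: candidate interaction


nearby = classify_pet(("chr6", 33_046_000), ("chr6", 33_046_800))
distant = classify_pet(("chr6", 33_046_000), ("chr6", 33_946_000))
```

Only the pairs classified as inter-ligation would be carried forward as candidate interactions.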


ChIA-PET reveals 3D interactions of the genome

-  ChIA-PET sequence tags that pair with tags from known promoter regions reveal RNA Pol II ChIP peaks that are at enhancer regions

http://www.nature.com/ng/journal/v45/n8/images/ng.2677-F2.jpg

Courtesy of Macmillan Publishers Limited. Used with permission. Source: Mercer, Tim R., Stacey L. Edwards, et al. "DNase I-hypersensitive Exons Colocalize with Promoters and Distal Regulatory Elements." Nature Genetics (2013).


Assessing significance of ChIA-PET interactions

•  Inter-ligation events (e.g. between a putative enhancer and promoter) could arise from two sources
   –  1. Within the same cluster – these are true 3D interactions
   –  2. From ligation between 2 different clusters – these are false positives
•  We need to assess if the inter-ligation events are significantly enriched for having occurred from within the same cluster (true interaction events)

http://genomebiology.com/2010/11/2/R22/figure/F1

Courtesy of Li et al. License: CC-BY. Source: Li, Guoliang, Melissa J. Fullwood, et al. "Software ChIA-PET Tool for Comprehensive Chromatin Interaction Analysis with Paired-end Tag Sequencing." Genome Biol 11 (2010): R22.


Assessing significance of ChIA-PET interactions

•  Hypergeometric test for significance
   –  I_{A,B}: # of inter-ligation events between loci A and B (paired tags mapping to A & B)
   –  c_A, c_B: total number of ligation events associated with A, B (single tags mapping to A or B)
   –  N: total number of ligation events (total single tags)

Probability of observing exactly your observed number of inter-ligation events under the null hypothesis that each sticky end has an equal probability of ligating to any other end:

$$P(I_{A,B} \mid N, c_A, c_B) = \frac{\binom{c_A}{I_{A,B}} \binom{N - c_A}{c_B - I_{A,B}}}{\binom{N}{c_B}}$$

P-value is the probability of your observation plus anything more extreme:

$$p = \sum_{i = I_{A,B}}^{\min\{c_A, c_B\}} P(i \mid N, c_A, c_B)$$
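The hypergeometric p-value defined on this slide can be computed directly with exact binomial coefficients. The sketch below uses the slide's symbols (N, c_A, c_B, I_{A,B}) as argument names. The function names and the toy counts are illustrative assumptions, not taken from the ChIA-PET Tool.

```python
from math import comb


def hypergeom_pmf(i, N, c_a, c_b):
    """P(i | N, c_A, c_B): probability of exactly i inter-ligation events
    between A and B under the random-ligation null hypothesis."""
    return comb(c_a, i) * comb(N - c_a, c_b - i) / comb(N, c_b)


def interaction_p_value(i_ab, N, c_a, c_b):
    """Probability of the observed count i_ab or anything more extreme (larger)."""
    upper = min(c_a, c_b)
    return sum(hypergeom_pmf(i, N, c_a, c_b) for i in range(i_ab, upper + 1))


# Toy counts: 10 tags in total, 3 at locus A, 4 at locus B, 3 observed A-B pairs
p_toy = interaction_p_value(3, N=10, c_a=3, c_b=4)
```

For genome-scale counts the same sum is usually evaluated with a statistics library's survival function rather than explicit binomial coefficients.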


ChIA-PET has a high false negative rate

•  Due to heterogeneous starting population of cells (transient promoter-enhancer interactions), complex protocol & stringent P-values to cut down on false positives, ChIA-PET has a high false negative rate
•  So, given the number of promoter-enhancer interactions observed in ChIA-PET experiment 1 & in ChIA-PET experiment 2, with each capturing only a subset of the total events, we'd like to estimate the true number of interactions occurring in the cell


Estimating the total number of events from overlap

•  Again, we can use the hypergeometric model
   –  Given two observed sample sizes m & n along with their overlap k, we'd like to estimate the total number of events N
   –  The maximum likelihood estimate of N is approximately:

$$\hat{N} \approx \frac{m\,n}{k}$$

   –  Example: Experiment 1: 100 events; Experiment 2: 200 events
      -  overlap is only 20. It seems like we must be sampling only a small fraction of the total events each time!
      -  Indeed, the maximum likelihood estimate is (100)(200)/20 = 1000 total events that each sample came from
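A small numerical check of this estimate, assuming the closed form N ≈ mn/k implied by the worked example (100 × 200 / 20 = 1000). The brute-force scan treats experiment 2 as drawing n events from N, of which m were seen in experiment 1, and picks the N that maximizes the hypergeometric likelihood of the overlap k. Function names and the scan limit are illustrative.

```python
from fractions import Fraction
from math import comb


def estimate_total_events(m, n, k):
    """Closed-form approximation to the maximum likelihood estimate of N."""
    if k <= 0:
        raise ValueError("overlap k must be positive to estimate N")
    return m * n / k


def brute_force_mle(m, n, k, n_max):
    """Scan N from the smallest feasible value up to n_max and return the N
    that maximizes the exact hypergeometric likelihood of overlap k."""
    def likelihood(N):
        return Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))
    return max(range(m + n - k, n_max + 1), key=likelihood)


closed_form = estimate_total_events(100, 200, 20)
scanned = brute_force_mle(100, 200, 20, n_max=3000)
```

The scanned maximum lands within one event of the closed-form value, which is why the slide calls the formula approximate.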


Estimating the total number of events from overlap

•  The previous model assumes all events were true positives, while in reality some are false positives.
   –  We overestimate the total event count since the observed m and n are larger than they would be without the false positives
   –  Assume the overlapping events are true positives and the non-overlapping events have false positive rate f (so 1-f of the events are true positives). Then we can update estimates of m and n:

$$m' = (1 - f)(m - k) + k, \qquad n' = (1 - f)(n - k) + k$$

   –  With the previous example (100 and 200 events w/ 20 overlapping), let's say there is a false positive rate of 5%. Then:
      •  m' = (0.95)(80) + 20 = 96
      •  n' = (0.95)(180) + 20 = 191
      •  The modified estimate of the total # of events is therefore: (96)(191)/20 ≈ 917 (vs. 1000 without considering false positives)
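The false-positive correction can be written as a short function. It follows the update used in the worked example: non-overlapping events are scaled by (1 - f) and overlapping events are kept as true positives. The function name and return layout are illustrative assumptions.

```python
def corrected_total_estimate(m, n, k, f):
    """Adjust sample sizes m and n for false positive rate f among the
    non-overlapping events, then re-estimate the total as m' * n' / k."""
    if k <= 0:
        raise ValueError("overlap k must be positive to estimate N")
    m_adj = (1 - f) * (m - k) + k
    n_adj = (1 - f) * (n - k) + k
    return m_adj, n_adj, m_adj * n_adj / k


m_adj, n_adj, total = corrected_total_estimate(100, 200, 20, f=0.05)
```

For the slide's numbers this gives m' = 96, n' = 191, and a total of 96 × 191 / 20 = 916.8, i.e. roughly 917 events.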


MIT OpenCourseWare
http://ocw.mit.edu

7.91J / 20.490J / 20.390J / 7.36J / 6.802J / 6.874J / HST.506J Foundations of Computational and Systems Biology
Spring 2014

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.