Functional annotation of ChIP-peaks Minghui Wang, Qi Sun Bioinformatics Facility Institute of Biotechnology

Source: cbsu.tc.cornell.edu/lab/doc/CHIPseq_workshop_2017_lecture2.pdf


  • Functional annotation of ChIP-peaks

    Minghui Wang, Qi Sun

    Bioinformatics Facility, Institute of Biotechnology

  • [Figure: panels Sense, Antisense, Antisense_step1, Antisense_step2; correlations shown: R = -0.27, R = 0.13, R = 0.79]

    peak_annotation

    Columns: seqnames, start, end, width, strand, length, summit, tags, X.10.log10.pvalue., fold_enrichment, FDR..., annotation, geneChr, geneStart, geneEnd, geneLength, geneStrand, geneId, transcriptId, distanceToTSS

    First row (truncated): 1 I 4058 4225 168 * 16841364612.73.5710.99 Promoter (

  • Experimental design

    Biorep 1, Biorep 2, Biorep 3

    Young: TR ($u_1$), CL ($u_2$)

    Old: TR ($u_3$), CL ($u_4$)

    $u_1 - u_2 \;?\; 0$

    $u_3 - u_4 \;?\; 0$

    $\big((u_1 - u_2) - (u_3 - u_4)\big) \;?\; 0$ is for ????
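The three comparisons on this slide can be read as contrasts between group means. The sketch below uses made-up read counts (hypothetical, not workshop data) and assumes that u1 and u2 are the TR and CL means in Young and that u3 and u4 are the TR and CL means in Old. Under that reading, the last contrast asks whether the TR-versus-CL enrichment differs between the two ages.

```python
from statistics import mean

# Hypothetical read counts at one region, three bioreps per group.
# Assumed labelling: u1 = Young TR, u2 = Young CL, u3 = Old TR, u4 = Old CL.
counts = {
    "young_TR": [40, 44, 42],
    "young_CL": [20, 22, 21],
    "old_TR": [30, 33, 30],
    "old_CL": [19, 21, 20],
}

u1 = mean(counts["young_TR"])
u2 = mean(counts["young_CL"])
u3 = mean(counts["old_TR"])
u4 = mean(counts["old_CL"])

# Enrichment within each age group (TR minus CL).
enrich_young = u1 - u2
enrich_old = u3 - u4

# Difference of differences: does enrichment change between Young and Old?
diff_of_diffs = enrich_young - enrich_old

print(enrich_young, enrich_old, diff_of_diffs)
```

A formal test would still need a model of the count variance; the arithmetic here only shows which quantities each comparison involves.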

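  • GLM (Poisson)

    $$
    Y_i =
    \begin{pmatrix}
    34\\ 12\\ 42\\ 18\\ 44\\ 20\\ 25\\ 10\\ 32\\ 15\\ 38\\ 14
    \end{pmatrix}
    =
    \begin{pmatrix}
    1 & 1 & 1 & 1\\
    1 & 1 & 0 & 0\\
    1 & 1 & 1 & 1\\
    1 & 1 & 0 & 0\\
    1 & 1 & 1 & 1\\
    1 & 1 & 0 & 0\\
    1 & 0 & 1 & 0\\
    1 & 0 & 0 & 0\\
    1 & 0 & 1 & 0\\
    1 & 0 & 0 & 0\\
    1 & 0 & 1 & 0\\
    1 & 0 & 0 & 0
    \end{pmatrix}
    \begin{pmatrix}
    \mu\\ \alpha\\ \beta\\ \alpha * \beta
    \end{pmatrix}
    + \varepsilon_i
    $$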
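As a minimal sketch of what the four coefficients mean, the code below fits the saturated two-factor model in closed form. It assumes a log link (the usual choice for a Poisson GLM, not stated on the slide) and reads the slide's response vector as twelve two-digit counts, which is also an assumption because the digits run together in the source. With one parameter per cell, the maximum-likelihood fitted means equal the observed cell means, so no iterative fitting is needed.

```python
import math

# Counts read from the slide as two-digit values (assumption).
y = [34, 12, 42, 18, 44, 20, 25, 10, 32, 15, 38, 14]

# Design matrix columns as on the slide: intercept, alpha, beta, alpha*beta.
X = [
    [1, 1, 1, 1], [1, 1, 0, 0], [1, 1, 1, 1], [1, 1, 0, 0],
    [1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0],
    [1, 0, 1, 0], [1, 0, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0],
]

# Group the counts by the two factor indicators and average each cell.
cells = {}
for row, count in zip(X, y):
    cells.setdefault((row[1], row[2]), []).append(count)
cell_mean = {key: sum(vals) / len(vals) for key, vals in cells.items()}

# Coefficients on the log scale (assumed log link).
mu = math.log(cell_mean[(0, 0)])
alpha = math.log(cell_mean[(1, 0)]) - mu
beta = math.log(cell_mean[(0, 1)]) - mu
interaction = math.log(cell_mean[(1, 1)]) - mu - alpha - beta


def fitted(row):
    """Fitted Poisson mean for one design-matrix row."""
    eta = mu * row[0] + alpha * row[1] + beta * row[2] + interaction * row[3]
    return math.exp(eta)


# Score equations X^T (y - fitted) should all be zero at the optimum.
score = [
    sum(row[j] * (count - fitted(row)) for row, count in zip(X, y))
    for j in range(4)
]
print(cell_mean, interaction, score)
```

The interaction term is the log-scale counterpart of the difference-of-differences question from the experimental design: it is zero when the fold change for one factor is the same at both levels of the other.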

  • Identify enriched regions within Young or Old

    [Figure labels: GLM; Biorep 1, Biorep 2, Biorep 3]

    Pu M et al. (2015) Trimethylation of Lys36 on H3 restricts gene expression change during aging and impacts life span. Genes Dev 29(7):718-31

  • Identify enriched regions between young and old stages

    [Figure labels: Young (Biorep 1, Biorep 2, Biorep 3); Old (Biorep 1, Biorep 2, Biorep 3); GLM]

    Pu M et al. (2015) Trimethylation of Lys36 on H3 restricts gene expression change during aging and impacts life span. Genes Dev 29(7):718-31

  • Visualization & Annotation

  • Downstream analysis workflow

    Peak calling

    Enriched regions

    Visualization with IGV

    Nearest genes

    Relationship to gene features

    Density plotting of gene features

    Functional enrichment (e.g. GO categories)

    Motif analysis

    Other advanced analysis

  • Tools for visualization

    https://deeptools.readthedocs.io/en/latest/

    http://software.broadinstitute.org/software/igv/

    http://homer.ucsd.edu/homer/


  • deepTools

    Tools for BAM and bigWig file processing
    - multiBamSummary compute read coverages over bam files. Output used for plotCorrelation or plotPCA
    - multiBigwigSummary extract scores from bigwig files. Output used for plotCorrelation or plotPCA
    - correctGCBias corrects GC bias from bam file. Don't use it with ChIP data
    - bamCoverage computes read coverage per bins or regions
    - bamCompare computes log2 ratio and others of read coverage of two samples per bins or regions
    - bigwigCompare computes log2 ratio and others from bigwig scores of two samples per bins or regions
    - computeMatrix prepares the data from bigwig scores for plotting with plotHeatmap or plotProfile

    Tools for QC
    - plotCorrelation plots heatmaps or scatterplots of data correlation
    - plotPCA plots PCA
    - plotFingerprint plots the distribution of enriched regions

    Heatmaps and summary plots
    - plotHeatmap plots one or multiple heatmaps of user selected regions over different genomic scores
    - plotProfile plots the average profile of user selected regions over different genomic scores
    - plotEnrichment plots the read/fragment coverage of one or more sets of regions

  • deepTools

    [mingh@cbsumm16 ChIP_seq_workshop_2017]$ bamCoverage -h
    bamCoverage -b reads.bam -o coverage.bw
    Optional arguments:
    --help, -h show this help message and exit
    --scaleFactor SCALEFACTOR
    --binSize INT bp
    …

    usage: bamCompare -b1 treatment.bam -b2 control.bam -o log2ratio.bw
    Optional arguments:
    --help, -h show this help message and exit
    --scaleFactorsMethod
    --binSize INT bp
    --ratio {log2,ratio,subtract,add,mean,reciprocal_ratio,first,second}
    …
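To make the idea of per-bin coverage and a log2 ratio concrete, here is a small sketch. It is not deepTools code: it counts read start positions rather than fragment overlap, and the function names, scale factors and the pseudocount of 1 are assumptions chosen for illustration.

```python
import math


def bin_coverage(read_starts, region_length, bin_size):
    """Count reads per fixed-width bin from 0-based read start positions."""
    n_bins = math.ceil(region_length / bin_size)
    bins = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < region_length:
            bins[pos // bin_size] += 1
    return bins


def log2_ratio(treatment, control, scale_treatment=1.0, scale_control=1.0,
               pseudocount=1.0):
    """Per-bin log2((treatment + pseudocount) / (control + pseudocount))."""
    return [
        math.log2((t * scale_treatment + pseudocount)
                  / (c * scale_control + pseudocount))
        for t, c in zip(treatment, control)
    ]


# Hypothetical read starts on a 40 bp region, 10 bp bins.
treatment_bins = bin_coverage([5, 12, 18, 27, 33, 34, 36], 40, 10)
control_bins = bin_coverage([3, 14, 31], 40, 10)
ratios = log2_ratio(treatment_bins, control_bins)

print(treatment_bins, control_bins, ratios)
```

The pseudocount keeps bins with zero control reads from producing a division by zero, at the cost of shrinking ratios in low-coverage bins toward zero.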

  • [Figure labels: bigwig, peak_region]

  • HOMER

    makeTagDirectory <tag directory> <alignment file 1> [options] [alignment file 2]
    or
    makeTagDirectory reps-combined -d rep-1 rep-2 rep-3

    Treatment file:

    Optimizing tag files...
    Estimated genome size = 119638054
    Estimated average read density = 0.014636 per bp
    Total Tags = 1751052.0
    Total Positions = 1559937
    Average tag length = 50.0
    Median tags per position = 1 (ideal: 1)
    Average tags per position = 1.123
    Fragment Length Estimate: 291
    Peak Width Estimate: 345
    Autocorrelation quality control metrics:
    Same strand fold enrichment: 3.9
    Diff strand fold enrichment: 3.8
    Same / Diff fold enrichment: 1.0
    Guessing sample is ChIP-Seq or unstranded RNA-Seq - autocorrelation looks good.

    Control file:

    Optimizing tag files...
    Estimated genome size = 25588101
    Estimated average read density = 0.090854 per bp
    Total Tags = 2324782.0
    Total Positions = 754412
    Average tag length = 48.9
    Median tags per position = 2 (ideal: 1)
    Average tags per position = 3.082
    !! Might have some clonal amplification in this sample if sonication was used
    Fragment Length Estimate: 155
    Peak Width Estimate: 154
    !!! No reliable estimate for peak size
    Setting Peak width estimate to be equal to fragment length estimate
    Autocorrelation quality control metrics:
    Same strand fold enrichment: 1.2
    Diff strand fold enrichment: 1.2
    Same / Diff fold enrichment: 1.0
    Guessing sample is ChIP-Seq - may have low enrichment with lots of background
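Two of the summary values in the logs above can be reproduced from the other reported numbers, which helps when judging clonal amplification. The sketch below assumes that "Average tags per position" is total tags divided by total positions and that "Estimated average read density" is total tags divided by the estimated genome size. These formulas are inferred from the reported numbers, not taken from HOMER's source, and `tag_summary` is a hypothetical helper name.

```python
def tag_summary(total_tags, total_positions, genome_size):
    """Recompute two tag-directory summary ratios from the raw totals."""
    return {
        "tags_per_position": round(total_tags / total_positions, 3),
        "read_density_per_bp": round(total_tags / genome_size, 6),
    }


# Totals copied from the two logs shown on the slide.
first_sample = tag_summary(1751052.0, 1559937, 119638054)
second_sample = tag_summary(2324782.0, 754412, 25588101)

print(first_sample)
print(second_sample)
```

A tags-per-position value well above 1, as in the second log, means many reads share the same start position, which is what triggers the clonal amplification warning.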

  • HOMER

    makeUCSCfile <tag directory> [options] -bigWig <genome_size.file> -o output

    bedGraphToBigWig should be available in your executable path:

    wget https://github.com/ENCODE-DCC/kentUtils/blob/v302.1.0/bin/linux.x86_64/bedGraphToBigWig
    chmod 777 bedGraphToBigWig

    genome_size.file:

    1     30427671
    2     19698289
    3     23459830
    4     18585056
    5     26975502
    Mt    366924
    Pt    154478
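The genome size file is a two-column text file of chromosome name and length. As a rough sketch, the code below parses such a file and checks it before it is handed to a bigWig converter. The seven entries are read from the slide (the split of the run-together digits into name and length is an assumption), and `parse_chrom_sizes` is a hypothetical helper, not part of HOMER or the UCSC tools.

```python
# Chromosome sizes as read from the slide (name, length in bp).
CHROM_SIZES_TEXT = """\
1\t30427671
2\t19698289
3\t23459830
4\t18585056
5\t26975502
Mt\t366924
Pt\t154478
"""


def parse_chrom_sizes(text):
    """Parse 'name<whitespace>length' lines into a dict, rejecting bad rows."""
    sizes = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            raise ValueError(f"line {line_number}: expected 'name length'")
        name, length = fields
        if name in sizes:
            raise ValueError(f"line {line_number}: duplicate name {name!r}")
        sizes[name] = int(length)
    return sizes


chrom_sizes = parse_chrom_sizes(CHROM_SIZES_TEXT)
total_bp = sum(chrom_sizes.values())
print(len(chrom_sizes), total_bp)
```

The chromosome names in this file need to match the names used in the alignment files, otherwise the conversion has nothing to map the coverage onto.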

  • Tools for gene features analysis

  • HOMER

    annotatePeaks.pl <peak file> <genome> [options] > output
    annotatePeaks.pl test_results_peaks.narrowPeak_chr tair10 > out

    Columns: PeakID, Chr, Start, End, Strand, Pea…, FD…, Annotation, Detailed Annotation, Distance to TSS, Nearest PromoterID, Nearest Unigene, Nearest Refseq, Gene Name, Gene Alias (Pea… and FD… are empty in the rows shown; cell values are truncated as on the slide)

    PeakID                   Chr   Start     End       Strand  Annotation     Detailed Annotation  Distance to TSS  Nearest PromoterID  Nearest Unigene  Nearest Refseq  Gene Name  Gene Alias
    test_results_peak_14368  Chr5  6833504   6837577   +       exon (AT5G20   exon (AT5G20250      1881             AT5G20250.4         At.74986         NM_00103        DIN10      DARK IND
    test_results_peak_1382   Chr1  6971312   6973001   +       exon (AT1G20   exon (AT1G20110      671              AT1G20110.1         At.15444         NM_10186        AT1G20110  T20H2.10|
    test_results_peak_855    Chr1  4347808   4349969   +       promoter-TSS   promoter-TSS (A      390              AT1G12760.1         At.43884         NM_00103        AT1G12760  T12C24.29
    test_results_peak_15041  Chr5  15843775  15845935  +       exon (AT5G39   exon (AT5G39570      896              AT5G39570.1         At.20492         NM_12331        AT5G39570  MIJ24.6|M
    test_results_peak_154    Chr1  739488    742090    +       intron (AT1G0  intron (AT1G0309     1110             AT1G03090.2         At.24059         NM_10019        MCCA       -
    test_results_peak_6386   Chr2  16483892  16485127  +       exon (AT2G39   exon (AT2G39480      530              AT2G39480.1         At.63501         NM_12950        PGP6       F12L6.14|F
    test_results_peak_3313   Chr1  24046891  24048872  +       promoter-TSS   promoter-TSS (A      741              AT1G64720.1         At.74749         NM_10514        CP5        F13O11.4|
    test_results_peak_8490   Chr3  7116011   7117776   +       exon (AT3G20   exon (AT3G20410      692              AT3G20410.1         At.8182          NM_11293        CPK9       domain pro
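The "Distance to TSS" column relates each peak to its nearest transcription start site. The sketch below shows one simplified way to compute such a value: take the peak centre, find the closest TSS on the same chromosome, and report a strand-aware offset (negative means upstream). This is an illustration under those assumptions, not HOMER's implementation, and the gene coordinates are invented.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self):
        # The TSS is the 5' end: start on "+", end on "-".
        return self.start if self.strand == "+" else self.end


def nearest_tss(chrom, peak_start, peak_end, genes):
    """Return (gene name, signed distance from peak centre to nearest TSS)."""
    center = (peak_start + peak_end) // 2
    best = None
    for gene in genes:
        if gene.chrom != chrom:
            continue
        offset = center - gene.tss
        # Flip the sign on the minus strand so negative always means upstream.
        signed = offset if gene.strand == "+" else -offset
        if best is None or abs(signed) < abs(best[1]):
            best = (gene.name, signed)
    return best


# Hypothetical annotation, for illustration only.
genes = [
    Gene("geneA", "Chr1", 1000, 3000, "+"),
    Gene("geneB", "Chr1", 8000, 9500, "-"),
    Gene("geneC", "Chr2", 500, 1500, "+"),
]

hit_downstream = nearest_tss("Chr1", 1300, 1500, genes)
hit_upstream = nearest_tss("Chr1", 9600, 9800, genes)
print(hit_downstream, hit_upstream)
```

A linear scan over all genes is fine for a toy example; real annotation tools index the genes so that millions of lookups stay fast.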

  • PAVIS

    features distance to TSS

  • PeakAnalyzer

    Peak file, GTF file

  • peak

  • Tools for density plot

    https://github.com/shenlab-sinai/ngsplot


  • ngs.plot

    ngs.plot.r -G genome -R region -C [cov|config]file -O name [Options]

    -G Genome name. Use ngsplotdb.py list to show available genomes.
    -R Genomic regions to plot: tss, tes, genebody, exon, cgi, enhancer, dhs or bed
    -C Indexed bam file or a configuration file for multiplot
    -O Name for output: multiple files will be generated

    Options
    -L Flanking region size (will override flanking factor)
    -N Flanking region factor
    -RB The fraction of extreme values to be trimmed on both ends; default=0, 0.05 means 5% of extreme values will be trimmed
    -S Randomly sample the regions for plot, must be: (0, 1]
    -P #CPUs to use. Set 0 (default) for auto detection
    -AL Algorithm used to normalize coverage vectors: spline (default), bin
    -CS Chunk size for loading genes in batch (default=100)
    -MQ Mapping quality cutoff to filter reads (default=20)
    -FL Fragment length used to calculate physical coverage (default=150)

    https://github.com/shenlab-sinai/ngsplot/wiki/SupportedGenomes

  • ngs.plot.r -G hg19 -R tss -C hesc.H3k4me3.rmdup.sort.bam -O hesc.H3k4me3.tss -T H3K4me3 -L 3000 -FL 300

    ngs.plot

  • ngs.plot.r -G hg19 -R tss -C hesc.H3k4me3.rmdup.sort.bam:hesc.Input.rmdup.sort.bam -O hesc.H3k4me3vsInp.tss -T H3K4me3 -L 3000

    ngs.plot

  • ngs.plot.r -G hg19 -R genebody -C config.hesc.k36.txt -O hesc.k36.genebody -D ensembl -FL 300

    config.hesc.k36.txt:

    hesc.H3k36me3.rmdup.sort.bam high_expressed_genes.txt "High"
    hesc.H3k36me3.rmdup.sort.bam medium_expressed_genes.txt "Med"
    hesc.H3k36me3.rmdup.sort.bam low_expressed_genes.txt "Low"

    ngs.plot

  • deepTools

    computeMatrix scale-regions -S H3K27Me3-input.bigWig H3K4Me1-Input.bigWig H3K4Me3-Input.bigWig -R genes19.bed genesX.bed --beforeRegionStartLength 3000 --regionBodyLength 5000 --afterRegionStartLength 3000 --skipZeros -o matrix.mat.gz

    plotProfile -m matrix.mat.gz -out ExampleProfile1.png --numPlotsPerRow 2 --plotTitle "Test data profile"
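The density plots in this part of the deck all rest on the same idea: stack the signal from many regions on a common coordinate system and average it. The sketch below does this for fixed windows around anchor positions such as TSSs. It is a simplified illustration that ignores strand and region scaling, the coverage values are invented, and `average_profile` is a hypothetical name rather than a function from ngs.plot or deepTools.

```python
def average_profile(coverage, anchors, flank):
    """Mean coverage at each offset in [-flank, +flank] around the anchors."""
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for anchor in anchors:
        # Skip anchors whose window would run off either end of the sequence.
        if anchor - flank < 0 or anchor + flank >= len(coverage):
            continue
        for i in range(width):
            totals[i] += coverage[anchor - flank + i]
        used += 1
    if used == 0:
        raise ValueError("no anchor has a complete window")
    return [total / used for total in totals]


# Hypothetical per-base coverage with two identical bumps.
coverage = [0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 1, 3, 5, 3, 1, 0, 0]
anchors = [4, 12, 16]  # the last anchor is too close to the end and is skipped

profile = average_profile(coverage, anchors, flank=2)
print(profile)
```

A fuller version would reverse the window for minus-strand genes and rescale gene bodies to a common length, which is roughly what a scale-regions mode has to do.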

  • HOMER

    annotatePeaks.pl -hist …

  • ChIPseeker

  • Functional enrichment

    Over-represented functional annotations of nearest genes of peaks
    • Gene Ontology
    • Biological Pathways

    Typical tools
    • DAVID https://david.ncifcrf.gov/
    • GREAT http://bejerano.stanford.edu/great/public/html/
    • Blast2go https://www.blast2go.com/
    • Mapman http://mapman.gabipd.org
    • Homer http://homer.salk.edu/homer/ngs/annotation.html
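Over-representation is commonly assessed with a hypergeometric (one-sided Fisher) test: given N annotated genes of which K belong to a category, how surprising is it to see at least k category members among the n genes nearest to peaks? The sketch below is a generic version of that calculation with invented numbers. It is not the exact statistic used by any of the tools listed above, which differ in their background sets and corrections.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / comb(N, n)


# Hypothetical example: 20 genes in the background, 5 carry the GO term,
# 4 genes are nearest to peaks, and 3 of those carry the term.
p_value = hypergeom_upper_tail(k=3, K=5, n=4, N=20)
print(p_value)
```

When many categories are tested at once, the resulting p-values need a multiple-testing correction before any category is reported as enriched.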

  • Webpage tools

  • Oliver et al. (2004) The Plant J.

  • HOMER

    findGO.pl <target ids file> <organism> <output directory> [options]

    http://homer.ucsd.edu/homer/microarray/go.html


  • Motif pattern analysis

  • Motif analysis

    meme <dataset> [optional arguments]

    MEME (http://meme.sdsc.edu/meme/cgi-bin/meme.cgi)

  • HOMER

  • biomaRt & Bioconductor

  • ChIPseeker

  • ChIPpeakAnno

    [Figure labels: peak, promoter, genome]