How do we represent position-specific preference?

BID_MOUSE    I A R H L A Q I G D E M
BAD_MOUSE    Y G R E L R R M S D E F
BAK_MOUSE    V G R Q L A L I G D D I
BAXB_HUMAN   L S E C L K R I G D E L
BimS         I A Q E L R R I G D E F
HRK_HUMAN    T A A R L K A L G D E L
Egl-1        I G S K L A A M C D D F

Statistical representation

Residue counts at position 9 of the 7 aligned sequences:

G: 5 -> 71%

S: 1 -> 14%

C: 1 -> 14%
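The counts above can be reproduced with a short Python sketch. The sequences are copied from the alignment; the helper name `column_profile` and the choice to round percentages to whole numbers are assumptions made for illustration.

```python
from collections import Counter

# Aligned segments copied from the slide, one row per protein.
ALIGNMENT = {
    "BID_MOUSE":  "IARHLAQIGDEM",
    "BAD_MOUSE":  "YGRELRRMSDEF",
    "BAK_MOUSE":  "VGRQLALIGDDI",
    "BAXB_HUMAN": "LSECLKRIGDEL",
    "BimS":       "IAQELRRIGDEF",
    "HRK_HUMAN":  "TAARLKALGDEL",
    "Egl-1":      "IGSKLAAMCDDF",
}

def column_profile(sequences, column):
    """Return {residue: (count, percent)} for a 1-based alignment column."""
    residues = [seq[column - 1] for seq in sequences]
    counts = Counter(residues)
    total = len(residues)
    return {aa: (n, round(100 * n / total)) for aa, n in counts.most_common()}

profile = column_profile(ALIGNMENT.values(), 9)
for aa, (n, pct) in profile.items():
    print(f"{aa}: {n} -> {pct}%")
```

Running the same function over every column gives the full position-specific profile of the motif.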

Basic concept of motif identification 2.

Practice: identify potential transcription factor binding sites on a promoter sequence.

Using TESS: Transcription Element Search System

http://www.cbil.upenn.edu/cgi-bin/tess/tess33?RQ=WELCOME

TESS result

Why are there many false positives in a TF binding site scan?

Contextual dependency is not considered.

Stringency of the matrices.

Stringency of the matrices

A     C     G     T     Consensus
40    13    23    23    N
20    3     70    5     G
55    3     40    0     R
0     93    0     5     C
53    8     8     30    W
15    0     3     82    T
0     0     100   0     G
0     50    0     50    Y
0     68    0     30    C
12    35    3     48    Y
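How a matrix row collapses into a consensus letter can be sketched as follows. The rule used here is an assumption, a simplified form of a commonly described convention: a single base if it exceeds 50% and is at least twice the runner-up, a two-base IUPAC code if the top two bases together exceed 75%, otherwise N. It is not necessarily the rule used to build the slide, although it reproduces the consensus column for this matrix.

```python
# Matrix rows from the slide; columns are A, C, G, T.
MATRIX = [
    (40, 13, 23, 23), (20, 3, 70, 5), (55, 3, 40, 0), (0, 93, 0, 5),
    (53, 8, 8, 30), (15, 0, 3, 82), (0, 0, 100, 0), (0, 50, 0, 50),
    (0, 68, 0, 30), (12, 35, 3, 48),
]

# IUPAC codes for two-base ambiguity, keyed by the sorted base pair.
TWO_BASE = {"AG": "R", "CT": "Y", "AT": "W", "CG": "S", "AC": "M", "GT": "K"}

def consensus_letter(row):
    """Collapse one (A, C, G, T) row into a consensus letter (assumed rule)."""
    total = sum(row)
    ranked = sorted(zip(row, "ACGT"), reverse=True)
    (n1, b1), (n2, b2) = ranked[0], ranked[1]
    if n1 > 0.5 * total and n1 >= 2 * n2:
        return b1                                   # one dominant base
    if n1 + n2 > 0.75 * total:
        return TWO_BASE["".join(sorted(b1 + b2))]   # two bases share the site
    return "N"                                      # no clear preference

consensus = "".join(consensus_letter(row) for row in MATRIX)
print(consensus)
```

Every ambiguity letter in the consensus marks a position where the matrix tolerates more than one base, which is one source of extra hits in a scan.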

A     C     G     T     Consensus
4     0     13    0     G
5     0     12    0     G
15    0     2     0     A
0     17    0     0     C
17    0     0     0     A
0     0     0     17    T
0     0     17    0     G
0     13    0     4     C
0     17    0     0     C
0     17    0     0     C
0     0     17    0     G
0     0     17    0     G
2     0     15    0     G
0     17    0     0     C
17    0     0     0     A
0     0     0     17    T
0     0     17    0     G
0     2     0     15    T
0     13    0     4     C
0     7     2     7     Y

Matrix labels: P53_01, P53_02. Consensus lengths: 10 bp, 20 bp.
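The effect of stringency on a scan can be illustrated with a minimal sketch that slides the 10-position matrix along a sequence and keeps windows above a cutoff. The scoring scheme (sum of matrix values divided by the best possible sum), the cutoffs 0.95 and 0.70, and the test sequence are all assumptions chosen for illustration; tools such as TESS use their own scoring.

```python
# 10-position matrix from the slide; columns are A, C, G, T.
MATRIX = [
    (40, 13, 23, 23), (20, 3, 70, 5), (55, 3, 40, 0), (0, 93, 0, 5),
    (53, 8, 8, 30), (15, 0, 3, 82), (0, 0, 100, 0), (0, 50, 0, 50),
    (0, 68, 0, 30), (12, 35, 3, 48),
]
BASES = "ACGT"

def relative_score(site):
    """Score a site as a fraction of the best achievable matrix score."""
    got = sum(row[BASES.index(base)] for row, base in zip(MATRIX, site))
    best = sum(max(row) for row in MATRIX)
    return got / best

def scan(sequence, threshold):
    """Return (offset, window, score) for every window at or above threshold."""
    width = len(MATRIX)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = relative_score(window)
        if score >= threshold:
            hits.append((i, window, round(score, 3)))
    return hits

SEQ = "TTAGACATGCCTAAGGGCTTGTCCGA"  # made-up sequence, uppercase ACGT only
strict = scan(SEQ, 0.95)
loose = scan(SEQ, 0.70)
print(len(strict), "hits at 0.95;", len(loose), "hits at 0.70")
```

Lowering the cutoff can only add hits, never remove them, so a permissive matrix or threshold trades sensitivity for more false positives.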

DNA Pattern – Transcription factor binding site

• Pattern strings / matrices are extracted from known binding sequences.

• Core vs whole.

• Some short and/or ambiguous patterns will have many hits.
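Why short or ambiguous patterns produce many hits can be made concrete with a rough estimate of chance matches. This sketch assumes equal base frequencies, independent positions, and a single-strand scan, none of which holds exactly for real genomes; the function names are hypothetical. The two patterns are the consensus strings of the matrices shown earlier.

```python
# Number of bases matched by each IUPAC nucleotide code.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "W": 2, "S": 2, "M": 2, "K": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def match_probability(pattern):
    """Chance that a random window matches the pattern (uniform bases assumed)."""
    p = 1.0
    for code in pattern:
        p *= DEGENERACY[code] / 4
    return p

def expected_hits(pattern, sequence_length):
    """Expected number of chance matches on one strand of a random sequence."""
    windows = max(sequence_length - len(pattern) + 1, 0)
    return windows * match_probability(pattern)

short = expected_hits("NGRCWTGYCY", 1_000_000)
long_ = expected_hits("GGACATGCCCGGGCATGTCY", 1_000_000)
print(f"10-mer with ambiguity codes: {short:.1f} chance hits per Mb")
print(f"20-mer, nearly exact:        {long_:.2e} chance hits per Mb")
```

Under these assumptions the ambiguous 10-mer is expected about 61 times per megabase by chance alone, while the 20-mer is expected essentially never.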

Sequence logo

Position   Info    N    A    C    G    T    Consensus
1          0.679   27   0    5    17   5    G
2          0.883   27   6    2    19   0    G
3          1.771   27   1    0    26   0    G
4          1.619   27   25   2    0    0    A
5          2       27   0    0    0    27   T
6          1.771   27   0    0    1    26   T
7          1.771   27   26   0    0    1    A
8          0.192   27   8    2    11   6    R

[Sequence logo; y-axis: information content, ticks at 1.0 and 2.0]
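The Info column can be recomputed from the counts. This sketch assumes the values are Shannon information content in bits, i.e. log2(4) minus the entropy of the observed base frequencies, with no small-sample correction; that assumption reproduces the tabulated values to three decimals.

```python
import math

# (A, C, G, T) counts per position, copied from the table (N = 27 sites).
COUNTS = [
    (0, 5, 17, 5), (6, 2, 19, 0), (1, 0, 26, 0), (25, 2, 0, 0),
    (0, 0, 0, 27), (0, 0, 1, 26), (26, 0, 0, 1), (8, 2, 11, 6),
]

def information_content(counts):
    """Information content in bits: log2(4) minus the Shannon entropy."""
    total = sum(counts)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts if n > 0)
    return math.log2(4) - entropy

info = [round(information_content(c), 3) for c in COUNTS]
print(info)
```

A fully conserved position (position 5) reaches the maximum of 2 bits, while a position with a nearly even spread (position 8) contributes almost nothing; the height of each stack in the logo reflects this value.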

Comparing genomes

For understanding genome organization.

For identifying functionally conserved regions / sequences:

• 3’ and 5’ UTRs (e.g. microRNA binding sites)

• Transcription factor binding sites / regulatory modules

Vista Genome Browser

Practice & Observe: cross-genome comparison using the Vista browser

Identifying conserved regulatory modules

• Regulatory module: a set of TF binding sites that controls a particular aspect of transcriptional regulation.

• Functional requirements imply conservation at the binding site (sequence) level.

Ways to Identify conserved regulatory modules

• Based on sequence similarity: MEME, rVista, Whole genome rVista for model organisms…

• Based on binding site identity: BLISS

Practice: Identifying conserved TF binding sites using rVista

1.) Search for your gene in Whole genome rVista.

Or

2.) Compile the corresponding genomic regions from different species (can be >2) and load them into rVista. This can also be used to identify shared regulatory modules in related genes within the same organism.

rVista

Practice & Observe: Load genomic sequences from Human, Rat, and Opossum into rVista. Choose TF matrices (e.g. E2F, P53, ATF).

Representation of Deep Seq data

Chrom.   Start      End        Name   Score   Strand
chr2L    10000192   10000217   U0     0       +
chr2L    10000227   10000252   U1     0       -
chr2R    10000310   10000335   U2     0       +
chr3L    10000496   10000521   U1     0       -
chr21    10000556   10000581   U2     0       +
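Each row above is a six-column BED record. As a minimal sketch, one line can be parsed into named fields like this; the class and function names are hypothetical, and the code relies on the BED convention that the start coordinate is 0-based and the end coordinate is exclusive.

```python
from typing import NamedTuple

class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # exclusive (BED convention)
    name: str
    score: int
    strand: str

def parse_bed_line(line):
    """Parse the first six whitespace-separated columns of a BED line."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedRecord(chrom, int(start), int(end), name, int(score), strand)

rec = parse_bed_line("chr2L 10000192 10000217 U0 0 +")
length = rec.end - rec.start  # half-open interval, so no +1 is needed
print(rec.chrom, rec.strand, length)
```

With half-open coordinates the feature length is simply end minus start, which gives 25 bases for each read in the table.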

The importance of the reference genome

• All coordinates are only meaningful for a given genome assembly.

• One assembly may have multiple releases (annotations).

Manipulating Deep Seq data with Galaxy

Practice & Observe:

1. Load the PolII.H99.Bed file into Galaxy with the Get Data tool.

2. Sort the data by chromosome location (column c2, the start coordinate).

3. Filter out the U2 lines with the expression c4!='U2'.
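Outside Galaxy, the same sort and filter can be sketched in a few lines of Python. The rows below are the example records from the earlier slide, used here as stand-ins for the contents of PolII.H99.Bed, which is not reproduced in this document. Galaxy columns c2 and c4 correspond to list indices 1 and 3, and the expression c4!='U2' keeps every line whose fourth column is not U2.

```python
# Example rows from the slide, standing in for the real file contents.
ROWS = """chr2L 10000192 10000217 U0 0 +
chr2L 10000227 10000252 U1 0 -
chr2R 10000310 10000335 U2 0 +
chr3L 10000496 10000521 U1 0 -
chr21 10000556 10000581 U2 0 +""".splitlines()

records = [line.split() for line in ROWS]

# Step 2: sort numerically on column c2 (index 1, the start coordinate).
by_start = sorted(records, key=lambda cols: int(cols[1]))

# Step 3: keep rows where c4 (index 3) is not 'U2'.
kept = [cols for cols in by_start if cols[3] != "U2"]

for cols in kept:
    print("\t".join(cols))
```

Sorting on the start coordinate alone ignores the chromosome column; sorting on (chromosome, start) is a common alternative when the file spans several chromosomes.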

Visualizing Deep Seq data with UCSC genome browser

Practice & Observe I:

1. Load the PolII.H99.Bed file as a custom track in the browser by copying and pasting its URL.

2. View the ‘dense’ and then the ‘full’ presentation of the track.

Visualizing Deep Seq data with UCSC genome browser

Practice & Observe II:

1. Save the landmark.bed file to your local computer. View the contents with Notepad.

2. Load the local file into the UCSC browser.

3. Edit the color value, save, resubmit, and observe the differences.
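As an illustration of where a color value can live in a custom track file, the sketch below builds a file whose first line is a UCSC track line with a `color=R,G,B` attribute. This assumes the color is set in the track line; the actual landmark.bed file is not shown here and may set colors differently (for example per item). The track name, the row, and the function name are hypothetical.

```python
def make_custom_track(name, rgb, rows):
    """Build custom-track text: a track line with color=R,G,B, then BED rows."""
    r, g, b = rgb
    header = f'track name="{name}" color={r},{g},{b}'
    body = ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join([header, *body]) + "\n"

# Hypothetical single-row track drawn in red.
text = make_custom_track(
    "landmark",
    (255, 0, 0),
    [("chr2L", 10000192, 10000217, "U0", 0, "+")],
)
print(text, end="")
```

Changing the tuple to, say, (0, 0, 255) and resubmitting the file would redraw the track in blue.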

Apollo Genome annotation tools

Observe: Using Apollo to organize information for studying complex genomic regions.
