21
-Go over HW 4 -Q/A on projects -Q/A on guest lectures rd Marcotte/Univ. of Texas/BIO337/Spring 2014

Go over HW 4 Q/A on projects Q/A on guest lectures

  • Upload
    presley

  • View
    18

  • Download
    0

Embed Size (px)

DESCRIPTION

Go over HW 4 Q/A on projects Q/A on guest lectures. Edward Marcotte/Univ. of Texas/BIO337/Spring 2014. Motifs. BIO337 Systems Biology / Bioinformatics – Spring 2014. Edward Marcotte, Univ of Texas at Austin. Edward Marcotte/Univ. of Texas/BIO337/Spring 2014. - PowerPoint PPT Presentation

Citation preview

Page 1: Go over HW 4 Q/A on projects Q/A on guest lectures

- Go over HW 4- Q/A on projects- Q/A on guest lectures

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Page 2: Go over HW 4 Q/A on projects Q/A on guest lectures

Motifs

BIO337 Systems Biology / Bioinformatics – Spring 2014

Edward Marcotte, Univ of Texas at Austin

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Page 3: Go over HW 4 Q/A on projects Q/A on guest lectures

Nature Communications 4, Article number: 2078 doi:10.1038/ncomms3078

RamR represses the ramA gene, which encodes the activator protein

for the acrAB drug efflux pump genes.

An example transcriptional regulatory cascade Here, controlling Salmonella bacteria multidrug resistance

RamR dimer

Sequence-specific DNA binding

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Page 4: Go over HW 4 Q/A on projects Q/A on guest lectures

Antimicrob Agents Chemother. Feb 2012; 56(2): 942–948. Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Historically, DNA and RNA binding sites were defined biochemically (DNAse footprinting, gel shift assays, etc.)

Hydroxyl radical footprinting of ramR-ramA intergenic region with RamR

Page 5: Go over HW 4 Q/A on projects Q/A on guest lectures

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Historically, DNA and RNA binding sites were defined biochemically (DNAse footprinting, gel shift assays, etc.)

Now, many binding motifs are discovered bioinformatically

Isolate different nucleic acid segments bound by

copies of the protein

(e.g. all sites bound across a

genome)

Sequence

Search computationally

for recurring motifs

Image: Antimicrob Agents Chemother. Feb 2012; 56(2): 942–948.

Page 6: Go over HW 4 Q/A on projects Q/A on guest lectures

Transcription factor regulatory networks can be highly complex, e.g. as for embryonic stem cell regulators

http://www.pnas.org/content/104/42/16438

TF target

PPI

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

TF

Page 7: Go over HW 4 Q/A on projects Q/A on guest lectures

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

MOTIFS

Binding sites of the transcription factor ROX1

consensus

frequencies

scaled by informationcontentfreq of nuc b in genome

frequency of nuc b at position i

Page 8: Go over HW 4 Q/A on projects Q/A on guest lectures

So, here’s the challenge:

Given a set of DNA sequences that contain a motif (e.g., promoters of co-expressed

genes), how do we discover it computationally?

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Page 9: Go over HW 4 Q/A on projects Q/A on guest lectures

Could we just count all instances of each k-mer?

Why or why not?

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

promoters and DNA binding sites are not well

conserved

Page 10: Go over HW 4 Q/A on projects Q/A on guest lectures

How does motif discovery work?

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Page 11: Go over HW 4 Q/A on projects Q/A on guest lectures

How does motif discovery work?

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Update the motif model

Assign sites to motif

Update the motif model

Update the motif model

Assign sites to motif

Assign sites to motif

etc.

What does this process remind you of?

Page 12: Go over HW 4 Q/A on projects Q/A on guest lectures

How does motif discovery work?

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Motif finding often uses expectation-maximization(like the k-means clustering we already learned about),

i.e. alternating between building/updating a motif model and assigning sequences to that motif model.

Searches the space of possible motifs for optimal solutions without testing everything.

Most common approach = Gibbs sampling

Page 13: Go over HW 4 Q/A on projects Q/A on guest lectures

We will consider N sequences, each with a motif of length w:

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

N seqsk

Ak = position in seq k of motif

qij = probability of finding nucleotide (or aa) j at position i in motifi ranges from 1 to wj ranges across the nucleotides (or aa)

pj = background probability of finding nucleotide (or aa) j

w

Page 14: Go over HW 4 Q/A on projects Q/A on guest lectures

Start by choosing w and randomly positioning each motif:

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

N seqsk

Ak = position in seq k of motif

qij = probability of finding nucleotide (or aa) j at position i in motifi ranges from 1 to wj ranges across the nucleotides (or aa)

pj = background probability of finding nucleotide (or aa) j

Completely randomly

positioned!

NOTE: You won’t give any information at all about what or

where the motif should be!

Page 15: Go over HW 4 Q/A on projects Q/A on guest lectures

Predictive update step: Randomly choose one sequence,calculate qij and pj from N-1 remaining sequences

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

qij = probability of finding nucleotide (or aa) j at position i in motifi ranges from 1 to wj ranges across the nucleotides (or aa)

pj = background probability of finding nucleotide (or aa) j

Randomly choose

Update model w/ these

background frequency of

symbol jcount of symbol j at position i

bj

pj is calculated similarly from the counts outside

the motifs

Page 16: Go over HW 4 Q/A on projects Q/A on guest lectures

Stochastic sampling step: For withheld sequence, slide motif down sequence & calculate agreement with model

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Withheld sequence

Odds ratio of agreement with model

vs. background

Position in sequence

qij

pj

cxij

cxij

(see the paper for details)

Page 17: Go over HW 4 Q/A on projects Q/A on guest lectures

Stochastic sampling step: For withheld sequence, slide motif down sequence & calculate agreement with model

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Withheld sequence

Odds ratio of agreement with model

vs. background

Position in sequence

qij

pj

cxij

cxij

(see the paper for details)

Here’s the cool part. DON’T just choose the maximum. INSTEAD, select a new Ak position proportional to this odds ratio.

Then, choose a new sequence to withhold, and repeat everything.

Page 18: Go over HW 4 Q/A on projects Q/A on guest lectures

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Over many iterations, this magically converges to the most enriched motifs. Note, it’s stochastic:

3 runs on the same data

Page 19: Go over HW 4 Q/A on projects Q/A on guest lectures

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Measure mRNA abundances using DNA

microarrays

Search for motifs in promoters of glucose vs

galactose controlled genes

Discovered motifs Known motif

Galactose upstream activation sequence

“AlignAce”

Page 20: Go over HW 4 Q/A on projects Q/A on guest lectures

Edward Marcotte/Univ. of Texas/BIO337/Spring 2014Edward Marcotte/Univ. of Texas/BIO337/Spring 2014

Measure mRNA abundances using DNA

microarrays

Search for motifs in promoters of heat-

induced and repressed genes

Discovered motifs Known motif

Cell cycle activation motif, histone activator

“AlignAce”

Page 21: Go over HW 4 Q/A on projects Q/A on guest lectures

e.g., http://compbio.mit.edu/encode-motifs/

If you need them, we now know the binding motifs for 100’s of transcription factors at 1000’s of distinct sites in the human genome, including many new motifs.