Motif Detection in Yeast

Vishakh, Joe Bertolami, Nick Urrea, Jeff Weiss


Overview

1. Problem Statement
2. Motivation
3. History
4. Our Approach
5. Evaluation
6. Results
7. Discussion
8. References

1. The Problem

Find regulatory sequences in the upstream region of yeast DNA. Regulatory sequences are segments of DNA where proteins bind to regulate (enhance or repress) transcription of a gene.

The Problem

We are given:
- the upstream genome, which consists of
  - gene families, each of which consists of
    - individual genes, each represented as a string over the letters A, T, G and C.

The task is to find substrings that are unusually frequent in a gene family, given their distribution across the whole upstream genome.

The Problem

We emulated the techniques devised by Jacques van Helden: we worked on a similar data set and tried to reproduce, and even improve on, his findings.

2. Motivation

Organisms like yeast share many genes with humans, so many genes involved in human disease have counterparts in yeast. Finding regulatory sequences in yeast might therefore lead to medical advances, including therapies for diseases such as cystic fibrosis.

3. History

The previous century saw rapid advances in genetics, and the scientific community continues to work toward a better understanding of various genomes. The particular technique we use was developed by Jacques van Helden (van Helden, André & Collado-Vides, 1998).

4. Our Approach

Extract all substrings of length 6 to 8 from the upstream genome. Calculate the frequency of occurrence of each substring. Store these data in a table.
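A minimal Python sketch of this counting step, assuming the upstream genome is available as a list of plain A/C/G/T strings and that occurrences are counted on the given strand only, overlaps included. The function name `count_substrings` is illustrative, not taken from the project's code.

```python
from collections import Counter


def count_substrings(sequences, k_min=6, k_max=8):
    """Count every substring of length k_min..k_max across all sequences.

    Occurrences are counted with overlaps, on the given strand only
    (reverse complements are not added in this sketch).
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for k in range(k_min, k_max + 1):
            # Slide a window of width k over the sequence.
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    return counts


table = count_substrings(["ACGTGACGTGA"], k_min=6, k_max=6)
print(table["ACGTGA"])  # 2
```

A dictionary keyed by substring stays small here: with lengths 6 to 8 there are at most 4^6 + 4^7 + 4^8 = 86,016 distinct keys, however large the genome is.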

Our Approach

Consider a gene family. Find all substrings in it and their frequencies, and build a table. For each entry, add the substring's probability of occurrence, based on its frequency in the whole upstream genome. Use these data to calculate three scores.
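The per-family table can be sketched as follows. Assumption: a word's "probability of occurrence" is taken to be its genome-wide count divided by the number of positions in the upstream genome where a word of that length can start. `kmer_counts`, `n_positions` and `family_table` are hypothetical helper names.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Overlapping counts of every length-k substring."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def n_positions(sequences, k):
    """Number of length-k windows in the sequences."""
    return sum(max(len(seq) - k + 1, 0) for seq in sequences)


def family_table(genome, family, k):
    """Map each k-mer seen in the family to (family count, genome-wide probability)."""
    genome_counts = kmer_counts(genome, k)
    genome_windows = n_positions(genome, k)
    table = {}
    for word, n in kmer_counts(family, k).items():
        # Per-position probability of the word in the whole upstream genome.
        p = genome_counts[word] / genome_windows
        table[word] = (n, p)
    return table
```

Because the family's sequences are part of the upstream genome, every word in the family table should have a nonzero genome-wide probability.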

Our Approach

Score 1: Expected Occurrence / Actual Occurrence
Use the probability of occurrence and the size of the gene family to calculate the expected number of occurrences, then divide by the actual number of occurrences. Low score -> unusually frequent substring.
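A sketch of Score 1, assuming "size of the gene family" means the number of positions in the family's sequences where a word of the given length can start. The function names are illustrative.

```python
def expected_occurrences(p, family_windows):
    """Expected count of a word: per-position probability times number of positions."""
    return p * family_windows


def expected_over_actual(p, family_windows, actual):
    """Score 1: expected / actual. Values well below 1 flag unusually frequent words."""
    if actual <= 0:
        raise ValueError("actual must be positive for words observed in the family")
    return expected_occurrences(p, family_windows) / actual


# A specific 6-mer under uniform base composition has p = 1/4096.
print(expected_over_actual(1 / 4096, 8192, 8))  # 0.25
```

Here the word is expected about twice in 8,192 positions but seen 8 times, giving a score of 0.25.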

Our Approach

Score 2: Poisson Distribution
Use the expected and actual numbers of occurrences. If a substring occurs n times, calculate the probability of n occurrences using the Poisson distribution, with the expected number as its mean. Lower probability (when n exceeds the expected number) -> unusually frequent substring.
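A standard-library sketch of Score 2. It provides both the probability of exactly n occurrences, as described above, and the upper-tail probability P(X >= n). The tail is a common way to turn the Poisson model into a significance value, but using it here is our assumption, not something the slide states. Probabilities are computed in log space so that large counts do not overflow.

```python
import math


def poisson_pmf(n, lam):
    """P(X = n) for X ~ Poisson(lam), computed via logs to avoid overflow."""
    if lam <= 0:
        raise ValueError("expected count lam must be positive")
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


def poisson_upper_tail(n, lam):
    """P(X >= n) for X ~ Poisson(lam)."""
    if n <= lam:
        # The tail is not small here, so 1 - P(X < n) is accurate enough.
        return 1.0 - sum(poisson_pmf(i, lam) for i in range(n))
    # Beyond the mean the terms shrink, so sum upward until they are negligible.
    term = poisson_pmf(n, lam)
    total = term
    i = n
    while term > total * 1e-17:
        i += 1
        term *= lam / i  # P(X = i) = P(X = i - 1) * lam / i
        total += term
    return total
```

Summing upward instead of computing 1 - P(X < n) avoids cancellation when the tail is tiny, which is exactly the case that matters for unusually frequent words.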

Our Approach

Score 3: Binomial Distribution
Use the probability of occurrence, the sizes of the genome and the gene family, and the actual number of occurrences. If a substring occurs n times, calculate the probability of n occurrences using the binomial distribution. Lower probability (when n exceeds the expected number) -> unusually frequent substring.
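A sketch of Score 3 under these assumptions: the number of trials is the number of positions in the gene family where the word can start, p is the word's genome-wide per-position probability (so the genome size enters through p), and the upper tail P(X >= n) is provided alongside the point probability. When p is small and the number of positions is large, a Poisson distribution with mean trials * p approximates this binomial.

```python
import math


def binomial_pmf(n, trials, p):
    """P(X = n) for X ~ Binomial(trials, p), computed via logs."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if n < 0 or n > trials:
        return 0.0
    log_choose = math.lgamma(trials + 1) - math.lgamma(n + 1) - math.lgamma(trials - n + 1)
    return math.exp(log_choose + n * math.log(p) + (trials - n) * math.log1p(-p))


def binomial_upper_tail(n, trials, p):
    """P(X >= n) for X ~ Binomial(trials, p)."""
    if n <= 0:
        return 1.0
    if n > trials:
        return 0.0
    if n <= trials * p:
        # The tail is not small here, so the complement is accurate enough.
        return 1.0 - sum(binomial_pmf(i, trials, p) for i in range(n))
    # Above the mean the terms shrink; sum them upward until negligible.
    term = binomial_pmf(n, trials, p)
    total = term
    odds = p / (1.0 - p)
    i = n
    while i < trials and term > total * 1e-17:
        term *= (trials - i) / (i + 1) * odds  # P(X = i + 1) from P(X = i)
        i += 1
        total += term
    return total
```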

Our Approach

Sort the substrings by one of the scores. Take the top-ranked sequences and build a probability matrix from them. Iteratively refine the probability matrix to obtain a probabilistic model of the regulatory sequence.
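A sketch of the matrix step, under several assumptions. The top-ranked words all have the same length and are already aligned at their first base. A pseudocount keeps unseen bases from getting probability zero. One "iteration" means rescanning the family's sequences, keeping the best-scoring window in each, and rebuilding the matrix from those windows. The slides do not say how the iteration was done, so `refine` is a hypothetical illustration, not the project's procedure.

```python
import math

BASES = "ACGT"


def probability_matrix(words, pseudocount=0.0):
    """Per-position base probabilities for equal-length, pre-aligned words."""
    if not words:
        raise ValueError("need at least one word")
    width = len(words[0])
    if any(len(w) != width for w in words):
        raise ValueError("all words must have the same length")
    matrix = []
    for pos in range(width):
        column = {b: pseudocount for b in BASES}
        for w in words:
            column[w[pos]] += 1  # assumes words contain only A, C, G, T
        total = sum(column.values())
        matrix.append({b: column[b] / total for b in BASES})
    return matrix


def window_likelihood(matrix, window):
    """Probability of a window under the matrix (positions treated independently)."""
    return math.prod(col.get(base, 0.0) for col, base in zip(matrix, window))


def refine(matrix, sequences, pseudocount=1.0):
    """One hypothetical refinement round: best window per sequence, then rebuild."""
    width = len(matrix)
    best = []
    for seq in sequences:
        windows = [seq[i:i + width] for i in range(len(seq) - width + 1)]
        if windows:
            best.append(max(windows, key=lambda w: window_likelihood(matrix, w)))
    return probability_matrix(best, pseudocount)
```

Repeating `refine` until the matrix stops changing gives a simple iterative scheme; other schemes (for example, keeping every window above a threshold) are equally plausible readings of the slide.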

5. Evaluation Metrics

We compared our output to van Helden's results, both in his 1998 paper and on his website. The 1998 paper used old data, so it is not very reliable for evaluation. The website is more useful because it works on current data and calculates results dynamically.

Evaluation Metrics

We also compared the three score types to find the best method.

6. Results

Comparison of results for the MET family (entries are ranks)

Substring  van Helden's site  Binomial  Poisson  Expected/Actual  1998 paper
CACGTG     1                  1         3        4                1
ACGTGA     2                  2         1        2                3
TCACGT     3                  3         2        1                2
ATATAT     4                  4         N/A      N/A              5
TATATA     5                  5         N/A      N/A              10
AACTGT     6                  7         4        28               4
ACAGTT     7                  6         N/A      29               N/A
ACACAC     8                  9         7        N/A              N/A
GTGTGT     9                  8         6        N/A              N/A

Results

Probability matrices generated successfully!

7. Discussion

The 1998 paper's results are clearly outdated. Our results correlate closely with van Helden's site. The binomial distribution performed best, followed by the Poisson distribution and then Expected/Actual.

Discussion

Why don't the binomial results match van Helden's site perfectly? Van Helden's paper only outlines the general method. He uses many filters and adjustments, and his site gives limited information about them. We used similar, but not identical, filters. Example: purge sequences that appear twice in a row.
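The slide does not define the purge filter precisely. One possible reading, sketched here purely as an assumption, is that an occurrence is not counted if it overlaps the previously counted occurrence of the same word, so a repeat such as ATATATATAT is not credited with several overlapping copies of ATATAT. Both counting rules are shown for comparison; the function names are hypothetical.

```python
def count_overlapping(seq, word):
    """Count every start position of word in seq, overlaps included."""
    if not word:
        raise ValueError("word must be non-empty")
    return sum(1 for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i))


def count_non_overlapping(seq, word):
    """Count occurrences left to right, skipping any that overlap the last one counted."""
    if not word:
        raise ValueError("word must be non-empty")
    count = 0
    i = seq.find(word)
    while i != -1:
        count += 1
        i = seq.find(word, i + len(word))  # resume after the counted occurrence
    return count


print(count_overlapping("ATATATATAT", "ATATAT"))      # 3
print(count_non_overlapping("ATATATATAT", "ATATAT"))  # 1
```

For words that cannot overlap themselves, such as CACGTG, the two rules give the same count.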

Discussion: Future Work

- Find more filters.
- Try the genomes of other, similar organisms.
- Verify the results biologically!

Discussion: What We Learnt

Biology!
- A first-hand look at genetic data
- Became more familiar with genes
- Clearly understood what the fuss about genetics is about

Computer Science
- Teamwork
- Interfacing CS with other scientific disciplines

8. References

van Helden, J., André, B. & Collado-Vides, J. (1998). Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281(5), 827-842.

van Helden, J., Rios, A. F. & Collado-Vides, J. (2000). Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res. 28(8), 1808-1818.
