HISPIG – A Discriminative Model Refinement Approach with Iterations for Detecting Regulatory Regions

Takuma Tsukahara
ttsukaha@indiana.edu
kobestory@hotmail.com

Milton Taylor Laboratory
• Using microarrays and bioinformatics technologies to develop better treatments for HCV (Virahep-C project)
– The only known treatment for HCV is interferon-alpha (IFN-a) or, more recently, combination treatment with pegylated IFN-a and Ribavirin
– Interferons were discovered as proteins that inhibit virus replication, and are induced in mammalian cells in response to virus infection

PBMC Experiment
• PBMCs were isolated from a group of healthy individuals and treated with IFN-a alone or with Ribavirin.
• In the microarray results, the expression of a large number of genes was either up-regulated or down-regulated
– It was of interest to analyze the upstream regions of these genes for the presence of motifs (ISRE and GAS)

Goal of My Project
• Build a computer model that effectively searches for ISRE and GAS sequences in human genes
– ISRE and GAS both work as promoters
– ISRE drives the expression of most type I IFN-stimulated genes (and some gamma)
– GAS drives the expression of type II IFN-stimulated genes
– Genes that contain ISRE / GAS are expressed more with IFN than ones that do not
– Generalize to be able to search for any motif in the future

Type I IFN Signal Transduction
[Figure: signal transduction pathway diagram; labels: heterodimer, (IRF-9), cytoplasm, nucleus, transcription]

The Situation
• We have a list of known motifs to refer to
– Numerous ISRE and GAS motifs are known and published
• We have sets of sequences from microarray experiments that are
– likely to contain motifs… S1 (up-regulated genes)
– unlikely to contain motifs… S2 (down-regulated genes and random genes)
• To detect motifs, build a model M(+) using the list of known motifs
– Occurrences of the model will be detected in both S1 and S2

How to Solve
• Still, it is difficult to predict motifs accurately
– Motifs are short and divergent
– So, occurrences in S1 and S2 are difficult to distinguish
• We overcome this problem with a discriminative model refinement approach
– We make two models:
M(+)… from known motifs
M(-)… from false motifs
– Iteratively refine the models, and separate the occurrences in S1 and S2
HISPIG

Methods Used

• HMMER

• Log-likelihood Method

• Both with iterative model refinement approach

HMMER
• Detects ISRE and GAS sequences (up-regulated genes, down-regulated genes and random genes)
1. Build a model with hmmbuild from a list of known, functional motifs taken from journals
– hmm consensus sequence
2. Parse the promoter region of each gene
3. Look for occurrences of the consensus within the promoter regions of the three gene groups with hmmsearch

Alignment File (.aln)
• List of known motifs – as .aln file
• Example of ISRE:
IP10    AGGTTTCACTTTCCA
ISG15   CAGTTTCGGTTTCCC
Factor  CAGTTTCTGTTTCCT
Tla     TAGTTTCACTTTTTG
GBP     TACTTTCAGTTTCAT
ISG20   ATCTTTGACTTTGTC
           ***   ***
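The conservation marks under the ISRE example can be recomputed from the six motifs. The following is a minimal Python sketch, assuming the motifs are read as ungapped 15-mers; the function names are illustrative and are not part of HISPIG or HMMER.

```python
from collections import Counter

# ISRE example motifs copied from the slide (name -> aligned 15-mer).
ISRE_ALIGNMENT = {
    "IP10":   "AGGTTTCACTTTCCA",
    "ISG15":  "CAGTTTCGGTTTCCC",
    "Factor": "CAGTTTCTGTTTCCT",
    "Tla":    "TAGTTTCACTTTTTG",
    "GBP":    "TACTTTCAGTTTCAT",
    "ISG20":  "ATCTTTGACTTTGTC",
}

def column_counts(seqs):
    """Count nucleotides in each column of equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    return [Counter(s[i] for s in seqs) for i in range(length)]

def conserved_columns(seqs):
    """Return 0-based indices of columns where every sequence agrees."""
    return [i for i, c in enumerate(column_counts(seqs)) if len(c) == 1]

seqs = list(ISRE_ALIGNMENT.values())
conserved = conserved_columns(seqs)
# Build the "*" marker line shown under the alignment.
marker = "".join("*" if i in conserved else " " for i in range(len(seqs[0])))
print(marker)
```

The two fully conserved blocks correspond to the two "TTT" runs that the later base composition slide relies on.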

Result for INDO gene (2 ISREs)
Alignments of top-scoring domains:
INDO: domain 1 of 2, from 4901 to 4915: E = 0.0097
*-> g g g a a a . t g a a a c t a <-*
+ g a a a + t g a a a c + a
INDO 4901 TAGAAA a TGAAACCA 4915
INDO: domain 2 of 2, from 5370 to 5384: E = 0.18
*-> g g g a a a . t g a a a c t a <-*
g ++ a a + g a a a c t a
INDO 5370 TGAGAA a GGAAACTA 5384
negative strand

Iterative Model Refinement
Model built from S1 … Sm
1. look for more occurrences
2. rank the new sequences
3. add top k sequences
– n sequences were significant (may be functional), which would give a model of S1 … Sm+n
– But that is too many to add
– Let's add only the k most relevant sequences
– New model for the next iteration: S1 … Sm+k
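The refinement loop can be sketched in Python as follows. This is only an illustration under stated assumptions: `refine_model`, `toy_score` and the significance cutoff are hypothetical names and values, and the toy scorer (negative Hamming distance) stands in for the real hmmsearch or log-likelihood ranking.

```python
def refine_model(model_seqs, candidates, score, is_significant, k, iterations=2):
    """Grow a motif list by adding the top-k significant new hits per round.

    score and is_significant are caller-supplied (hypothetical) callables:
    score(model, seq) -> float, higher is better;
    is_significant(value) -> bool.
    """
    model = list(model_seqs)
    for _ in range(iterations):
        # 1. look for more occurrences that are not already in the model
        hits = [(score(model, s), s) for s in candidates if s not in model]
        hits = [(v, s) for v, s in hits if is_significant(v)]
        if not hits:
            break
        # 2. rank the new sequences, best first
        hits.sort(key=lambda pair: pair[0], reverse=True)
        # 3. add only the top k to form the model for the next iteration
        model.extend(s for _, s in hits[:k])
    return model

def toy_score(model, seq):
    """Toy stand-in for a real scorer: minus the smallest Hamming distance."""
    return -min(sum(a != b for a, b in zip(m, seq)) for m in model)

start = ["CAGTTTCGGTTTCCC", "CAGTTTCTGTTTCCT"]          # two motifs from the slides
pool = ["CAGTTTCAGTTTCCT", "TAGTTTCAGTTTCCA", "GGGCCCGGGCCCGGG"]  # made-up candidates
refined = refine_model(start, pool, toy_score, lambda v: v >= -3, k=1)
print(refined)
```

Adding only k sequences per round keeps a single noisy round from flooding the model; how k was chosen in HISPIG is not stated on the slides.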

hmmsearch results (ISRE)
group         iterations  up-regulated  random  down-regulated
e-val < 0.01  1           6             2       0
              2           22            4       1
e-val < 0.1   1           53            11      16
              2           82            25      28

hmmsearch results (GAS)
group         iterations  up-regulated  random  down-regulated
e-val < 0.1   1           0             0       0
              2           23            7       19
e-val < 0.3   1           9             2       7
              2           72            37      52

Problems of hmmsearch
• Number of significant motifs detected
– ISRE >>> GAS (in terms of e-value)
• Cannot tell whether the detected motifs are functional or not
– E-value is the only measure
• GAS overlap between different gene groups
– 25% between up-regulated and random
• As in previous slides, occurrences detected from the different gene groups are hard to distinguish

Log Likelihood Method
• Calculate a score for each detected motif to tell whether it is functional, and to discriminate gene groups
– Score = log (M(+) / M(-))
– M(+)… known motifs, M(-)… false motifs
– 1 pseudocount for each nucleotide per 10 sequences
• If the log-likelihood score for the given motif is
– positive… the motif is functional if it also has a significantly low e-value
– negative… the motif is not functional
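A minimal Python sketch of this score, under two assumptions that the slides do not confirm: each model is a per-column nucleotide frequency table, and "1 pseudo count per 10 sequences" means adding len(seqs)/10 to every nucleotide count in each column. The positive motifs are the ISRE examples from the alignment slide; the negative motifs are made-up toy sequences, not real false positives.

```python
import math

NUCLEOTIDES = "ACGT"

def column_probs(seqs, pseudo_per_10=1.0):
    """Per-column nucleotide probabilities with pseudocounts.

    Assumption: the pseudocount added to every nucleotide in a column
    is pseudo_per_10 * len(seqs) / 10.
    """
    pseudo = pseudo_per_10 * len(seqs) / 10.0
    probs = []
    for i in range(len(seqs[0])):
        counts = {n: pseudo for n in NUCLEOTIDES}
        for s in seqs:
            counts[s[i]] += 1
        total = sum(counts.values())
        probs.append({n: c / total for n, c in counts.items()})
    return probs

def log_likelihood_score(seq, pos_probs, neg_probs):
    """Sum over columns of log(P(+) / P(-)); positive favors Model(+)."""
    return sum(math.log(p[b] / q[b]) for b, p, q in zip(seq, pos_probs, neg_probs))

positives = ["AGGTTTCACTTTCCA", "CAGTTTCGGTTTCCC", "CAGTTTCTGTTTCCT",
             "TAGTTTCACTTTTTG", "TACTTTCAGTTTCAT", "ATCTTTGACTTTGTC"]
# Toy negatives invented for illustration only.
negatives = ["GGCATGCCATGGCAA", "TTACGGATCCGTAAC",
             "ACCGGTACGTTAGCA", "CGTAACCGGTTACGT"]

model_pos = column_probs(positives)
model_neg = column_probs(negatives)
score_known = log_likelihood_score("CAGTTTCGGTTTCCC", model_pos, model_neg)
score_false = log_likelihood_score("GGCATGCCATGGCAA", model_pos, model_neg)
print(score_known > 0, score_false < 0)
```

The pseudocounts keep every probability above zero, so the logarithm is always defined even for nucleotides never seen in a column.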

Concept of Models(+/-)
List of known & functional motifs:
ISRE1 CAGTTT..
ISRE2 TAGTTT..
GAS1 TTTCAA..
1. build model: Model(+)
2. search occurrences of M(+) in negative model
List of false positive motifs:
ISRE1 TACTTT..
ISRE2 AGGCTT..
GAS1 TATGAA..
3. build model: Model(-)

Base Composition Tweaking
• All known functional ISREs have two “TTT”s
– Without tweaking, a motif with a “TTT” and a “TCC” will receive a high log-likelihood score
• To solve this problem, we look for high-percentage nucleotides and make them dominant
– Example: base composition of a certain column
before: 3% / 14% / 12% / 71%
tweak!
after: 0.1% / 0.1% / 0.1% / 99.7%
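The column example can be reproduced with a short Python sketch. The `threshold` value and the function name are assumptions for illustration; the slides only show the single before/after example and do not state the cutoff that was used.

```python
def tweak_column(probs, threshold=0.7, floor=0.001):
    """Make a high-percentage nucleotide dominant in one column.

    probs: list of four probabilities summing to 1.
    threshold and floor are illustrative assumptions; the slide only shows
    71% becoming 99.7% while the other three become 0.1% each.
    """
    top = max(range(len(probs)), key=lambda i: probs[i])
    if probs[top] < threshold:
        return list(probs)  # no clearly dominant base: leave unchanged
    tweaked = [floor] * len(probs)
    tweaked[top] = 1.0 - floor * (len(probs) - 1)
    return tweaked

example = tweak_column([0.03, 0.14, 0.12, 0.71])
print(example)
```

With the tweak, a candidate that carries a different base in such a column is penalized far more heavily in the log-likelihood score than with the raw 71% frequency.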

Iteration and Model Refinement
[Figure: Model(+), built from S(+)1 … S(+)n, and Model(-), built from S(-)1 … S(-)n; stages: first iteration (model refinement), second iteration (model refinement)]

Up-regulated vs. Random
[Plot: x-axis: iterations; series: ISRE (positive), ISRE (negative), GAS (positive), GAS (negative); averages for up-regulated genes and random genes]

Search Result of HISPIG
• Numerous potentially functional ISRE and GAS motifs were detected in the 100 most up-regulated genes (both known and unknown)
– Approximately 80% of the genes had either a functional ISRE or a functional GAS
– Numerous genes contain unknown functional motifs that match other gene expression experiments previously reported in journals
• All motifs included in the model were concluded to be functional

Improvement of log-likelihood
• Re-aligning process of model refinement
– Rank sequences that match the criteria by
1. e-value
2. log-likelihood score
3. both (the algorithm is not easy to implement)
– Convincing if 2. works better than the others
• Which model to refine in each iteration
– Only positive? Only negative? Both?

Measuring the Reliability of the Program
• Best Way – Do wet lab experiments to see if a detected unknown motif is really functional
• Alternative
1. Remove some known and functional sequences from the initial model
2. See if the program still detects those in the end
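The alternative check is a hold-out test, which can be sketched in Python as below. `holdout_recovery` and `toy_pipeline` are hypothetical names; the toy pipeline (a Hamming-distance match against a fixed candidate list) only stands in for the full search, so its output says nothing about how HISPIG itself performs.

```python
def holdout_recovery(known, held_out, run_pipeline):
    """Rebuild the model without some known motifs and report which come back.

    run_pipeline is a hypothetical callable standing in for the full search:
    it takes the initial motif list and returns the detected motifs.
    """
    # 1. remove some known, functional sequences from the initial model
    initial = [m for m in known if m not in held_out]
    detected = set(run_pipeline(initial))
    # 2. see whether each removed motif is still detected in the end
    return {m: (m in detected) for m in held_out}

def toy_pipeline(initial, candidates, max_mismatches=3):
    """Toy detector: accept candidates close to any motif in the model."""
    def close(seq):
        return any(sum(a != b for a, b in zip(seq, m)) <= max_mismatches
                   for m in initial)
    return {c for c in candidates if close(c)}

known = ["AGGTTTCACTTTCCA", "CAGTTTCGGTTTCCC", "CAGTTTCTGTTTCCT",
         "TAGTTTCACTTTTTG", "TACTTTCAGTTTCAT", "ATCTTTGACTTTGTC"]
held_out = ["CAGTTTCTGTTTCCT", "ATCTTTGACTTTGTC"]
result = holdout_recovery(known, held_out, lambda init: toy_pipeline(init, known))
print(result)
```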

Reliability Experiment (ISRE)
gene name  detected  e-value  log-likelihood  result
INDO       YES       0.23     4.28            FAIR
INDO       YES       0.097    2.74            GOOD
ISG20      NO                                 BAD
BF         YES       0.057    5.90            GOOD
IFIT2      YES       0.011    5.88            GOOD
G1P3       YES       0.0033   5.06            GOOD
G1P3       YES       0.0039   5.54            GOOD
CXCL10     YES       0.43     4.31            FAIR
OAS1       YES       0.01     4.68            GOOD

Acknowledgements

Sun Kim
Milton Taylor
Stuart Young
