TimeSearcher: Interactive Querying for Identification of Patterns in Genetic Data
Harry Hochheiser, Eric Baehrecke, Stephen Mount, Ben Shneiderman
Harry Hochheiser is supported by a fellowship from America Online.
2
Time Series Data
• Real-valued function over time
• Goal: find patterns
– “Starts Low, Ends High”
– Outliers
– Periodic Patterns
– Laggards and Leaders
• Hypothesis generation
3
Microarray Data
Chu, et al. The transcriptional program of sporulation in budding yeast, Science 1998 Oct 23; 282(5389): 699-705.
4
Timeboxes
• Rectangular query regions
• Value must be in range for all time points in region
• Combine multiple timeboxes for conjunctive query
Sharp Rise Panic Reversal
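The timebox semantics listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not TimeSearcher's implementation: the `Timebox` class, the `query` function, and the sample data are all hypothetical names and values chosen for the example, and time points are assumed to be integer indices into each series.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Timebox:
    """Hypothetical rectangular query region.

    Covers time indices t_start..t_end (inclusive) and values v_min..v_max.
    """
    t_start: int
    t_end: int
    v_min: float
    v_max: float

    def matches(self, series: Sequence[float]) -> bool:
        # The value must be in range for every time point inside the box.
        window = series[self.t_start:self.t_end + 1]
        return all(self.v_min <= v <= self.v_max for v in window)


def query(data: Dict[str, Sequence[float]], boxes: List[Timebox]) -> List[str]:
    """Return names of series that satisfy all timeboxes (conjunctive query)."""
    return [name for name, series in data.items()
            if all(box.matches(series) for box in boxes)]


# Made-up expression profiles, for illustration only.
data = {
    "gene_a": [0.1, 0.2, 0.3, 2.0, 2.5],
    "gene_b": [1.5, 1.4, 1.2, 0.4, 0.2],
    "gene_c": [0.2, 0.1, 0.9, 1.0, 2.2],
}

# "Starts Low, Ends High": one box at the start, one at the end.
starts_low_ends_high = [Timebox(0, 1, 0.0, 0.5), Timebox(3, 4, 1.5, 3.0)]
result = query(data, starts_low_ends_high)
print(result)
```

Moving or resizing a timebox corresponds to changing the four fields of a `Timebox` and re-running `query`; an interactive tool would need an index or incremental filtering to keep that fast on large data sets.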
5
TimeSearcher/Microarray demo
6
TimeSearcher
• Interactive exploration of time-series data
• Dynamic queries (<100ms)
• Linear display of individual items
• Create queries on graph area
• Move, scale timeboxes to modify query
• Drag-and-Drop for query-by-example
7
Other Applications
• “Time”: linear ordered sequence
• Use TimeSearcher for general sequences
– E.g., DNA
8
TimeSearcher for analysis of weak signals in nucleotide sequences:
Application to the case of the Arabidopsis thaliana branch site consensus splicing signal.
Steve Mount, Cell Biology and Molecular Genetics
Harry Hochheiser and Ben Shneiderman, Human Computer Interaction Lab
Steven Salzberg, The Institute for Genomic Research
Splicing signals are recognized during early steps in the biochemical process of splicing.
[Diagram labels: Exon 1, U1, SF1, Branch Site, U2AF65, (Y)n, U2AF35, AG, Exon 2]
9
Two-step pre-mRNA splicing mechanism with branched intermediate:
Diagram courtesy of Dr. Martinez Hewlett
Consensus sequences:
Yeast (Saccharomyces cerevisiae) Invariant: TACTAAC
Humans (Homo sapiens) Consensus: TNYTRAYY
Fruit flies (Drosophila melanogaster) Invariant: WCTAATY
Weeds (Arabidopsis thaliana) Invariant: CTRAY
Here we sought to verify and extend the experimentally determined branch site consensus CTRAY determined by Simpson et al. (2002). Our long-term goal is the characterization of an even weaker signal, the ‘exonic splicing enhancer.’
Y = C or T; W = A or T; R = A or G; N = A, C, G or T
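The ambiguity codes in the legend map directly onto regular-expression character classes, which gives a quick way to test a sequence against any of the consensus strings above. The sketch below is illustrative only: `CODES`, `consensus_to_regex`, and `find_sites` are hypothetical names, and the test sequence is made up.

```python
import re

# Ambiguity codes exactly as given in the legend: Y, W, R, N.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]", "W": "[AT]", "R": "[AG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate a consensus string such as 'CTRAY' into a regex string."""
    return "".join(CODES[ch] for ch in consensus)


def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, including overlaps."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence)]


print(consensus_to_regex("CTRAY"))               # CT[AG]A[CT]
print(find_sites("CTRAY", "GGCTAACTTCTGATAA"))   # [2, 9]
print(find_sites("CTRAY", "TACTAAC"))            # [2]
```

The last call shows that the yeast sequence TACTAAC contains a match to the shorter Arabidopsis pattern CTRAY.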
10
11
12
13
14
15
16
Over-represented words: ACTAA ACTGA ATAAC ATTGA CTAAA CTAAC CTAAT CTCAT CTGAC TAACG TAACT TCTAA TGACT TGATT TTAAC
Consensus: WYTRAY
[Figure: number of over-represented words versus distance to 3’ splice site; annotations: Branch site, Pyrimidines, one sigma, two sigma]
Y = C or T; W = A or T; R = A or G; N = A, C, G or T
Conclusions:
TimeSearcher can be used to identify weak signals in aligned nucleotide sequences.
Analysis of 8,550 exons from Arabidopsis supports the branch site consensus WYTRAY.
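One simple way to look at such a signal in aligned sequences is to tally where matches to WYTRAY fall relative to the 3’ splice site. The sketch below is not the analysis behind the slides, whose method is not described here. It assumes each input string is the 3’ end of an intron, so the last character sits next to the splice site; `site_distances` is a hypothetical helper and the three sequences are made up.

```python
import re
from collections import Counter

# Ambiguity codes from the legend: Y = C or T; W = A or T; R = A or G; N = any.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]", "W": "[AT]", "R": "[AG]", "N": "[ACGT]",
}


def site_distances(consensus, intron_ends):
    """Tally consensus matches by distance from match start to the 3' end.

    Assumes every string in intron_ends ends at the 3' splice site.
    """
    regex = "".join(CODES[ch] for ch in consensus)
    pattern = re.compile("(?=" + regex + ")")  # lookahead: allow overlaps
    tally = Counter()
    for seq in intron_ends:
        for m in pattern.finditer(seq):
            tally[len(seq) - m.start()] += 1
    return tally


# Made-up intron 3' ends, for illustration only.
intron_ends = [
    "GGATCTAACTTTTTTCAG",
    "CCTTTGATGTTTTTGCAG",
    "AAGACTAATCTCTTTTAG",
]
tally = site_distances("WYTRAY", intron_ends)
for distance in sorted(tally):
    print(distance, tally[distance])
```

On real data, a peak in this tally at a consistent distance would be the kind of positional signal that the figure on this slide plots against distance to the 3’ splice site.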
17
Future Work: Extensions to query model
• Leaders and Laggards
– Identification of regulatory genes
• Multiple time-varying values
• Variable Time timeboxes
• Collaborations with biologists inform design
What sort of queries are of interest?
18
Conclusions
• TimeSearcher: interactive tool for graphical exploration of time series data
• Ongoing use for analyzing microarray data and sequence data
We’re interested in working with motivated users & real data sets
www.cs.umd.edu/hcil/timesearcher