Deliver what?
‘Comprehensive catalog’ of mammalian regulatory elements
‘Validated’, known accuracy
Clustered into similar groups - ‘TF models’
Annotated as known/novel
Modules identified, ‘specific to...’
Predictions extrapolated to remote regions
Predictive system
Mostly Java
Some Perl/bash
270 CPUs/OSCAR
TRANSFAC 9.1
Manual TFBS
EnsEMBL-based
Generalize...
OPTICS
Accuracy metrics
Multi-source orthologue resource
Compara, HomoloGene, Inparanoid, KEGG, …
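The slides do not say how calls from the different sources are combined. As a minimal sketch, assuming each source can be reduced to (human gene, species, orthologue) triples, one way to build a multi-source resource is to record which sources support each call. The function names and the gene identifiers in the usage note are hypothetical placeholders, not real annotations.

```python
from collections import defaultdict


def merge_orthologue_calls(calls_by_source):
    """Combine orthologue calls from several sources.

    calls_by_source maps a source name (e.g. "Compara") to an iterable of
    (human_gene, species, orthologue_gene) triples. The triple layout is an
    assumption made for this sketch.
    Returns {(human_gene, species, orthologue_gene): set of supporting sources}.
    """
    support = defaultdict(set)
    for source, calls in calls_by_source.items():
        for human_gene, species, orthologue in calls:
            support[(human_gene, species, orthologue)].add(source)
    return dict(support)


def orthologues_for(gene, merged, min_sources=1):
    """List orthologues of one gene, best-supported first.

    Each hit is (species, orthologue_gene, sorted list of sources).
    """
    hits = [
        (species, orthologue, sorted(sources))
        for (g, species, orthologue), sources in merged.items()
        if g == gene and len(sources) >= min_sources
    ]
    # Sort by descending source count, then by species and gene name.
    return sorted(hits, key=lambda h: (-len(h[2]), h[0], h[1]))
```

With placeholder input such as `{"Compara": [("geneA", "mouse", "mA")], "HomoloGene": [("geneA", "mouse", "mA")]}`, calling `orthologues_for("geneA", merged, min_sources=2)` keeps only calls that at least two sources agree on. Calls with a single supporting source are candidates for the visual checks described on the next slide.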
Visual comparative genomics: Assessing ortholog annotations
LAGAN alignment detects misannotated chicken gene
Orthologues of a human gene
Assess sequence conservation for a coding exon (MLAGAN).
Motif discovery with multiple methods/params
Methods: (W)CONSENSUS, MEME, MotifSampler, Gibbs Sampler, Bioprospector, MDmodule, …, Weeder, CisModule, NestedMICA, Sombrero, ...
‘Multiple’ means: methods, motif occurrence models, other parameters
Motif scores p-values
1 Discover with target and random sequences.
2 Apply method-independent score.
3 Use random distribution to assign p-value to a score.
[Figure: cumulative motif score distns, target vs. random, 1500b region; panels: no p-val threshold, p-val = 0.02]
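Step 3 (use the random distribution to assign a p-value to a score) can be sketched as an empirical p-value: the fraction of scores from random-sequence runs that are at least as high as the observed score. This is an illustration under stated assumptions, not the cisRED implementation: it assumes higher scores are better, adds a pseudocount of one, and borrows 0.02 from the figure label as the default threshold. The function names are hypothetical.

```python
from bisect import bisect_left


def empirical_p_value(score, random_scores_sorted):
    """Empirical p-value of a motif score against a null distribution.

    random_scores_sorted: ascending list of scores for motifs discovered
    in random sequence sets. Assumes higher score means a better motif.
    The +1 pseudocount keeps the p-value above zero.
    """
    n = len(random_scores_sorted)
    # Index of the first random score that is >= the observed score.
    first_ge = bisect_left(random_scores_sorted, score)
    n_at_least = n - first_ge
    return (n_at_least + 1) / (n + 1)


def filter_motifs(target_scores, random_scores, p_threshold=0.02):
    """Keep target motifs whose empirical p-value is <= p_threshold.

    target_scores: {motif_name: score} for motifs found in target sequences.
    Returns a list of (motif_name, score, p_value) tuples.
    """
    null = sorted(random_scores)
    kept = []
    for name, score in target_scores.items():
        p = empirical_p_value(score, null)
        if p <= p_threshold:
            kept.append((name, score, p))
    return kept
```

Because the score in step 2 is method-independent, one null distribution of this kind could in principle be shared across discovery methods. Whether the original pipeline pooled the random runs that way is not stated on the slide.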
Clustering with OPTICS
Reachability plot
JASPAR scan test: 50 PWMs, 100 target sequence sets
Labeled cluster contents
1 Pairwise motif similarity measure.
2 Scalable hierarchical clustering method with automatic stopping.
[32 CPUs, 96 GB RAM, 64-bit OS]
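The slide does not name the pairwise similarity measure. As one common choice, swapped in here purely for illustration, the sketch below averages the Pearson correlation of aligned PWM columns over the best ungapped alignment, optionally also trying the reverse complement. PWMs are assumed to be lists of [A, C, G, T] columns. `min_overlap`, `one_hot_pwm`, and the other names are hypothetical.

```python
from math import sqrt


def column_pearson(col_a, col_b):
    """Pearson correlation between two PWM columns ([A, C, G, T] values)."""
    n = len(col_a)
    mean_a = sum(col_a) / n
    mean_b = sum(col_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(col_a, col_b))
    var_a = sum((x - mean_a) ** 2 for x in col_a)
    var_b = sum((y - mean_b) ** 2 for y in col_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat column carries no signal
    return cov / sqrt(var_a * var_b)


def reverse_complement(pwm):
    """Reverse the column order and swap A<->T, C<->G within each column."""
    return [col[::-1] for col in reversed(pwm)]


def _best_ungapped(pwm_a, pwm_b, min_overlap):
    """Best mean column correlation over all ungapped offsets."""
    best = -1.0
    for offset in range(-(len(pwm_b) - min_overlap), len(pwm_a) - min_overlap + 1):
        pairs = [
            (pwm_a[i], pwm_b[i - offset])
            for i in range(len(pwm_a))
            if 0 <= i - offset < len(pwm_b)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_pearson(a, b) for a, b in pairs) / len(pairs)
        best = max(best, score)
    return best


def motif_similarity(pwm_a, pwm_b, min_overlap=4, check_reverse_complement=True):
    """Similarity in [-1, 1]; returns -1.0 if no alignment meets min_overlap."""
    best = _best_ungapped(pwm_a, pwm_b, min_overlap)
    if check_reverse_complement:
        best = max(best, _best_ungapped(pwm_a, reverse_complement(pwm_b), min_overlap))
    return best


def one_hot_pwm(consensus):
    """Illustration helper: build a 0/1 PWM from a consensus string."""
    return [[1.0 if base == b else 0.0 for b in "ACGT"] for base in consensus]
```

A density-based method such as OPTICS works on distances, so a value like `1 - motif_similarity(a, b)` could serve as the pairwise distance. For example, `motif_similarity(one_hot_pwm("AAACCC"), one_hot_pwm("GGGTTT"))` is high only because the second motif is the reverse complement of the first.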
www.cisred.org
v1.1: human, mouse
human: 6K genes, 120K motifs
Web database design and construction
Main competitors
Zhang - Cold Spring Harbor Lab
Lander/Kellis - MIT
Bolouri - Institute for Systems Biology
Hardison/Haussler - Penn State/UCSC
...
High throughput ... low throughput
Large scale’s here. Now what?
Production / R&D
Hi/lo throughput. Collaborators
Accuracy / complexity / data integration
ChIP-xxxx, expression specificity, chromatin state, 3’UTRs, LREs... ENCODE
Regulatory networks and cascades
Competitive opportunities
Monica - C. elegans, briggsae, unannotated
Erin - Drosophila, ..., unannotated
Han Hao / Jim Kronstad (UBC) - fungi
Generalize
SNPs - Stephen Montgomery
Repetitive regions - Dixie Mager
Competitive opportunities
Many target genes, many orthologues
Low-coverage/unannotated genomes
Accuracy - resources, methods, protocols, ...
Coexpression and orthology
Discovery input vs. co-occurrence/modules
Motif similarity, clustering - a superset?
cisRED annotations in EnsEMBL
‘Contextual’ motif/module resource...
‘Context’ in cisRED
Discovered motifs
Motif similarity measures
Clustering methods
‘Known’ motif resources
Annotate motifs as known/novel
Motif groups (specific to...)
Other result types
‘Accuracy’
Motif classification system
Competitive opportunities
Validated predictions: Myers/Stanford, collaborators
Be ‘on the short list’: collaborators, publications
GC3 - ChIP-SAGE, networks...
Acknowledgements
Misha Bilenky, Chris Fjell, Obi Griffith, Han Hao, Ann He, Bernard Li, Keven Lin, Stephen Montgomery, Mehrdad Oveisi, Erin Pleasance, Neil Robertson, Wenjia Pan, Monica Sleumer, Kevin Teague, Richard Varhol, Maggie Zhang, Asim Siddiqui, Steven Jones
Jianjun Zhou, Jörg Sander
Dept. Computing Science, University of Alberta
Tamara Astakhova, Maik Hassel, James Kennedy, Eddy Tsang, Tony Fu, ...
Funding
Genome Canada, BC Cancer Foundation, Michael Smith Foundation for Health Research