

THE EUPATHDB / GUS-WDK SEARCH STRATEGY SYSTEM

Cristina Aurrecoechea¹, Brian P. Brunk², Steve Fischer², Xin Gao², Omar S. Harb², Mark Heiges¹, Jessica C. Kissinger¹, Eileen T. Kraemer¹, Cary Pennington¹, David S. Roos², Chris Ross¹, Christian J. Stoeckert² & Charles Treatman²

¹Univ. Georgia, Athens GA & ²Univ. Pennsylvania, Philadelphia PA

User perspectives on Strategies
• Computer-human interaction (CHI) studies during prototyping drove the design and showed high user enthusiasm.
• Usage statistics show a 3-fold increase in use of Boolean operations in the two months since release.
• User feedback has been very positive.

WDK Implementation
• Runs on any relational database schema
• Model: configured by you in XML
  • Abstracts the database to high-level Records (Genes, ORFs, etc.)
  • Also specifies queries and returned columns
  • Automated sanity testing
  • Can talk to external processes (e.g., BLAST) via a Web Services (WS) framework
• View: Tomcat, JSP, tag library, JavaScript, Ajax, CSS
  • You embed JSP tags in your site and style them with CSS
• Controller: Struts
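The phrases "runs on any relational database schema" and "abstracts DB to high level Records" can be made concrete with a minimal Python sketch. Here a dict stands in for the XML model file, and an in-memory SQLite database stands in for a genomics schema. The names `MODEL`, `GeneRecord`, `GenesByProduct`, `run_query`, `sanity_test`, and the `gene` table are all hypothetical. They are not WDK's actual XML elements or Java API.

```python
import sqlite3

# Hypothetical stand-in for a WDK XML model: each record type names its
# queries, the SQL behind them, and the columns each query must return.
MODEL = {
    "GeneRecord": {
        "queries": {
            "GenesByProduct": {
                "sql": "SELECT source_id, product FROM gene WHERE product LIKE ?",
                "columns": ["source_id", "product"],
            },
        },
    },
}


def run_query(conn, record_type, query_name, params):
    """Run a configured query and return rows as records (dicts keyed by column)."""
    query = MODEL[record_type]["queries"][query_name]
    cursor = conn.execute(query["sql"], params)
    returned = [d[0] for d in cursor.description]
    return [dict(zip(returned, row)) for row in cursor.fetchall()]


def sanity_test(conn, record_type, query_name, params):
    """Check that a query runs and returns exactly the columns the model declares."""
    query = MODEL[record_type]["queries"][query_name]
    cursor = conn.execute(query["sql"], params)
    returned = [d[0] for d in cursor.description]
    return returned == query["columns"]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Toy schema and rows, invented for illustration only.
    conn.execute("CREATE TABLE gene (source_id TEXT, product TEXT)")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [("GENE_A", "protein kinase"), ("GENE_B", "hypothetical protein")],
    )
    print(run_query(conn, "GeneRecord", "GenesByProduct", ("%kinase%",)))
    print(sanity_test(conn, "GeneRecord", "GenesByProduct", ("%kinase%",)))
```

The model only names queries and their expected columns. Pointing it at a different schema therefore means editing the SQL strings, not the code that consumes records. The sanity check mirrors the idea of automated testing listed above, not its actual implementation.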

WDK Upcoming features
• Add genes to a “basket” to generate a report, add them to a strategy as a step, or send them to a tool (e.g., multiple sequence alignment)
• Web services access to queries
• Assign weights to results from individual steps for improved filtering
• Transform a set of one record type into another type based on genome span relations
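Step weighting is listed as an upcoming feature, and the poster does not say how weights would be combined. This sketch assumes one plausible rule: sum the weights of the steps that return each ID, then keep IDs whose total meets a threshold. The functions `weighted_scores` and `filter_by_score`, the step names, the weights, and the gene IDs are all made up.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def weighted_scores(step_results: Dict[str, Set[str]],
                    weights: Dict[str, float]) -> Dict[str, float]:
    """Assumed rule: an ID's score is the sum of the weights of the steps returning it."""
    scores: Dict[str, float] = defaultdict(float)
    for step, ids in step_results.items():
        for record_id in ids:
            scores[record_id] += weights.get(step, 0.0)
    return dict(scores)


def filter_by_score(scores: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Keep IDs scoring at least `threshold`, highest score first, ties broken by ID."""
    kept = [(rid, score) for rid, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up step names, weights, and gene IDs.
    results = {
        "kinase_activity": {"G1", "G2", "G3"},
        "trophozoite_expression": {"G2", "G3"},
        "absent_in_mammals": {"G3", "G4"},
    }
    weights = {"kinase_activity": 1.0, "trophozoite_expression": 2.0, "absent_in_mammals": 3.0}
    print(filter_by_score(weighted_scores(results, weights), threshold=3.0))
    # [('G3', 6.0), ('G2', 3.0), ('G4', 3.0)]
```

Unlike a strict intersection, a weighted score lets a gene that misses one low-weight step still rank highly.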

The EuPathDB suite of genome database web sites recently introduced a graphical search interface that motivates users to undertake dynamic computational experiments, exploring relationships across datasets to identify biologically meaningful genes and other entities. For example, users seeking novel therapeutic targets may wish to prioritize putative enzymes that distinguish pathogens from their hosts, and are expressed during appropriate developmental stages. Strategies are initiated by running one of 80+ queries, and extended by adding additional searches, linked via Boolean operators represented graphically as Venn diagrams. Sub-strategies allow modular construction and tree structures, and searches may be extended using filters (e.g. by strain or species) and transforms (e.g. orthologs). A graphical display makes the overall logic obvious, and facilitates revision of individual steps, with changes propagated forward through the strategy. Users may name and save their strategies, creating protocols that can be shared with colleagues. (See, e.g., http://plasmodb.org/plasmo/im.do?s=2aa0454db6a6cca0.)
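The core idea of a strategy can be sketched as a left-to-right fold over steps. The first search produces a set of IDs, and each later step runs another search and combines it with the running result through a Boolean operator. This is an illustrative Python model, not WDK code. The names `Step` and `run_strategy`, the `minus` operator name, and the toy gene IDs are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

Ids = FrozenSet[str]

# Boolean operators, drawn as Venn diagrams in the strategy panel.
OPERATORS = {
    "union": lambda left, right: left | right,
    "intersect": lambda left, right: left & right,
    "minus": lambda left, right: left - right,  # assumed operator name
}


@dataclass
class Step:
    """Run a search, then combine its IDs with the running result."""
    search: Callable[[], Ids]
    operator: str = "union"  # ignored for the first step


def run_strategy(steps: List[Step]) -> Ids:
    """Evaluate steps left to right, as in a linear strategy."""
    result = steps[0].search()
    for step in steps[1:]:
        result = OPERATORS[step.operator](result, step.search())
    return result


def enzymes() -> Ids:
    # Toy search returning made-up gene IDs.
    return frozenset({"PF_1", "PF_2", "PV_3"})


def stage_expressed() -> Ids:
    # Toy search returning made-up gene IDs.
    return frozenset({"PF_2", "PV_3", "PF_4"})


if __name__ == "__main__":
    result = run_strategy([Step(enzymes), Step(stage_expressed, "intersect")])
    # A filter (e.g., by species) acts as a predicate on the running result.
    print(sorted(i for i in result if i.startswith("PF_")))  # ['PF_2']
```

The species filter keys on an ID prefix only to keep the example small. A real filter would consult record attributes.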

The strategy system has been subjected to extensive usability studies and deployed on all EuPathDB databases (CryptoDB, GiardiaDB, PlasmoDB, ToxoDB, TrichDB and TriTrypDB). Although these sites have offered text-based Boolean operations for many years, usability analysis indicated that most users were not taking full advantage of that feature. Following release of the graphical Search Strategy system, the number of searches per visit increased dramatically. Response from our user community has been extremely positive, as investigators have discovered the power of combining datasets and making dynamic adjustments to define optimal parameters and highlight biologically relevant relationships. With the accelerating growth in the diversity and scale of available datasets, the potential for exploiting interrelationships increases dramatically, and we expect this interface to have a significant impact in bringing “genomic thinking” to a broad audience.

This system was developed using the GUS Web Development Kit (WDK), a schema-independent middleware system for generating genomics websites.

The EuPathDB suite of databases covers genomic and functional genomics datasets for a variety of eukaryotic pathogens.

Shown here is PlasmoDB, which covers the genus Plasmodium, including P. falciparum, the malaria parasite.

Use Case: Use data in PlasmoDB to find parasite (Plasmodium) drug target genes

This panel shows a schematic of a strategy built from queries and Boolean operations. The actual strategy is built below.

Transferases (E.C.)
[union] Kinase activity (GO)
[intersect] ------------------------------
[intersect] present in Haemosporida, not Mammals
[intersect] not under diversifying selection (SNPs)
[transform] orthology to any Plasmodium genes
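The schematic can be written out as plain set operations over toy data. In this sketch, every input set, gene ID, and the ortholog mapping are invented placeholders for the real PlasmoDB searches. The `drug_target_strategy` function is hypothetical. The step drawn as a dashed line is left out, and the transform is assumed to return only the orthologs of the surviving genes.

```python
def drug_target_strategy(transferases, kinases, haemosporida_only,
                         not_under_selection, orthologs):
    """Mirror the schematic's steps with plain Python sets (toy data only)."""
    step = transferases | kinases        # [union] Kinase activity (GO)
    step = step & haemosporida_only      # [intersect] present in Haemosporida, not Mammals
    step = step & not_under_selection    # [intersect] not under diversifying selection
    # [transform] replace each surviving gene by its orthologs (assumed mapping).
    return {o for gene in step for o in orthologs.get(gene, set())}


if __name__ == "__main__":
    result = drug_target_strategy(
        transferases={"PF_A", "PF_B", "PF_C"},
        kinases={"PF_C", "PF_D"},
        haemosporida_only={"PF_B", "PF_C", "PF_D"},
        not_under_selection={"PF_A", "PF_B", "PF_C"},
        orthologs={"PF_B": {"PV_B", "PY_B"}, "PF_C": {"PV_C"}},
    )
    print(sorted(result))  # ['PV_B', 'PV_C', 'PY_B']
```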

Build a Strategy
1. Run a query (choose from menu)
2. Add a step (another query)
3. Add more steps…
4. Revise steps at any time… Changes propagate forward.
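Forward propagation can be modeled by caching each step's cumulative result. Revising a step then invalidates that step and every step after it. The `Strategy` class and its methods are hypothetical. Re-running the searches of later steps on every revision is a simplification; a real system could also cache each raw search result.

```python
from typing import Callable, FrozenSet, List, Optional

Ids = FrozenSet[str]

OPERATORS = {
    "union": lambda left, right: left | right,
    "intersect": lambda left, right: left & right,
    "minus": lambda left, right: left - right,
}


class Strategy:
    """Linear strategy that caches each step's cumulative result."""

    def __init__(self) -> None:
        self.searches: List[Callable[[], Ids]] = []
        self.operators: List[str] = []
        self.cache: List[Optional[Ids]] = []
        self.runs = 0  # number of search executions, to show what is recomputed

    def add_step(self, search: Callable[[], Ids], operator: str = "union") -> None:
        self.searches.append(search)
        self.operators.append(operator)
        self.cache.append(None)

    def revise_step(self, index: int, search=None, operator=None) -> None:
        if search is not None:
            self.searches[index] = search
        if operator is not None:
            self.operators[index] = operator
        # Invalidate the revised step and every later step: changes propagate forward.
        for i in range(index, len(self.cache)):
            self.cache[i] = None

    def result(self) -> Ids:
        """Recompute only invalidated steps, reusing cached earlier results."""
        for i, search in enumerate(self.searches):
            if self.cache[i] is None:
                self.runs += 1
                current = search()
                if i > 0:
                    current = OPERATORS[self.operators[i]](self.cache[i - 1], current)
                self.cache[i] = current
        return self.cache[-1]


if __name__ == "__main__":
    s = Strategy()
    s.add_step(lambda: frozenset({"a", "b", "c"}))
    s.add_step(lambda: frozenset({"b", "c", "d"}), "intersect")
    s.add_step(lambda: frozenset({"c"}), "minus")
    print(sorted(s.result()), s.runs)   # ['b'] 3
    s.revise_step(1, operator="union")  # steps 1 and 2 are recomputed; step 0 is reused
    print(sorted(s.result()), s.runs)   # ['a', 'b', 'd'] 5
```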

A strategy can integrate data from genome annotation, expression, SNPs, proteomics, etc.

Nest strategies to add complexity.

View results from all or any species.

Use orthology to transform results to other species.

Download customized reports of results.

Choose from many available columns. Sort and move columns.

Dynamically revise, add or delete steps.

Email a strategy link to colleagues.
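A customized report amounts to projecting records onto the columns a user chose, in the chosen order, and then sorting them. This minimal sketch uses Python's standard `csv` module to emit tab-delimited text. The `build_report` function and the record fields are hypothetical, and this is not a description of the download formats the sites actually offer.

```python
import csv
import io


def build_report(records, columns, sort_by, descending=False):
    """Tab-delimited report: chosen columns in chosen order, sorted by one column."""
    rows = sorted(records, key=lambda record: record[sort_by], reverse=descending)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for record in rows:
        writer.writerow([record[column] for column in columns])
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up records; real column names depend on the site's record types.
    genes = [
        {"id": "G2", "product": "protein kinase", "length": 300},
        {"id": "G1", "product": "transferase", "length": 500},
    ]
    print(build_report(genes, ["id", "length"], sort_by="length", descending=True))
```

Passing the columns in a different order is how "move columns" maps onto this sketch.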

It’s Easy to Build a Strategy…

[Figure: WDK architecture, organized into Model, View (web), and Controller layers. Components: Genomics Database (WDK Engine Query Cache; Genomics Data Denormalized for Query Speed; Genomics Data; User Login and Search History); WDK Model (XML); WDK Model (Java Objects); WDK Query Engine (Java); Web Services Framework; JavaBeans (JSP compatible); JSP Tag Library; Struts controller; WDK Sanity Test.]
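The architecture lists a WDK Engine Query Cache but does not describe how it works. One common approach, assumed here, is to memoize query results keyed by query name plus parameter values, so a repeated search with the same parameters need not hit the database again. The `QueryCache` class and the stand-in query function are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Tuple

QueryFn = Callable[[str, Dict[str, str]], FrozenSet[str]]


class QueryCache:
    """Memoize query results keyed by query name and parameter values."""

    def __init__(self, run_query: QueryFn) -> None:
        self._run_query = run_query
        self._store: Dict[Tuple[str, Tuple[Tuple[str, str], ...]], FrozenSet[str]] = {}
        self.misses = 0

    def get(self, query_name: str, params: Dict[str, str]) -> FrozenSet[str]:
        # Sort parameters so the same values given in a different order share one entry.
        key = (query_name, tuple(sorted(params.items())))
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._run_query(query_name, params)
        return self._store[key]


if __name__ == "__main__":
    def fake_query(name: str, params: Dict[str, str]) -> FrozenSet[str]:
        # Stand-in for a database query; returns a made-up ID.
        return frozenset({f"{name}:{params['organism']}"})

    cache = QueryCache(fake_query)
    cache.get("GenesByGoTerm", {"organism": "Pf", "term": "kinase"})
    cache.get("GenesByGoTerm", {"term": "kinase", "organism": "Pf"})
    print(cache.misses)  # 1
```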

…Strategies are Powerful

Save and browse strategies.

Challenge: exploit the power of integrated genome annotation, expression data, proteomics data, SNPs, etc.
Solution: Strategies… A Graphical Query Interface for Genomics Databases

Nested Strategy
P.f. transcript expr. at 24 hours +/- 8
[union] P.f. transcript expr. in Trophozoites
[union] P.f. protein expr. in Trophozoites
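A nested strategy behaves like a single step whose result is the evaluated sub-strategy. The sketch below models strategies as a small tree in which each operand is either a search or another strategy. The classes `Search` and `Strategy`, the `evaluate` function, the gene IDs, and the kinase search used as the outer first step are hypothetical. The three inner search names follow the panel above.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple, Union

OPERATORS = {
    "union": lambda left, right: left | right,
    "intersect": lambda left, right: left & right,
    "minus": lambda left, right: left - right,
}


@dataclass
class Search:
    name: str
    ids: FrozenSet[str]  # toy stand-in for running a real query


@dataclass
class Strategy:
    first: "Node"
    steps: List[Tuple[str, "Node"]] = field(default_factory=list)


Node = Union[Search, Strategy]


def evaluate(node: Node) -> FrozenSet[str]:
    """A nested strategy is evaluated first, then used like any other operand."""
    if isinstance(node, Search):
        return node.ids
    result = evaluate(node.first)
    for operator, operand in node.steps:
        result = OPERATORS[operator](result, evaluate(operand))
    return result


if __name__ == "__main__":
    # Made-up gene IDs; search names follow the nested-strategy panel.
    expression = Strategy(
        Search("P.f. transcript expr. at 24 hours +/- 8", frozenset({"G1"})),
        [
            ("union", Search("P.f. transcript expr. in Trophozoites", frozenset({"G2"}))),
            ("union", Search("P.f. protein expr. in Trophozoites", frozenset({"G2", "G4"}))),
        ],
    )
    outer = Strategy(Search("kinases (hypothetical)", frozenset({"G1", "G2", "G3"})),
                     [("intersect", expression)])
    print(sorted(evaluate(outer)))  # ['G1', 'G2']
```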

[Architecture figure legend: You provide (JSP and CSS) / WDK provides / Optional]

Different types of strategies: Genes, Isolates, SNPs, Transcript assemblies, Chromosomes, Array Elements, ORFs, etc.
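Each strategy returns records of one type. This sketch assumes, rather than reports, two rules. First, Boolean steps only combine results of the same record type. Second, a transform converts a result to another type through an ID mapping, as the planned genome-span transform or an orthology transform might. The names `ResultSet`, `combine`, and `transform`, and all IDs, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class ResultSet:
    record_type: str  # e.g. "Gene", "SNP", "Isolate"
    ids: FrozenSet[str]


_OPS = {
    "union": frozenset.union,
    "intersect": frozenset.intersection,
    "minus": frozenset.difference,
}


def combine(left: ResultSet, right: ResultSet, operator: str) -> ResultSet:
    """Boolean step; assumed to require operands of the same record type."""
    if left.record_type != right.record_type:
        raise TypeError(f"cannot {operator} {left.record_type} with {right.record_type}")
    return ResultSet(left.record_type, _OPS[operator](left.ids, right.ids))


def transform(result: ResultSet, target_type: str,
              mapping: Dict[str, Iterable[str]]) -> ResultSet:
    """Convert IDs to another record type via a mapping (e.g. gene -> overlapping SNPs)."""
    new_ids = frozenset(t for i in result.ids for t in mapping.get(i, ()))
    return ResultSet(target_type, new_ids)


if __name__ == "__main__":
    genes = ResultSet("Gene", frozenset({"G1", "G2"}))
    snps = ResultSet("SNP", frozenset({"S1", "S9"}))
    gene_snps = transform(genes, "SNP", {"G1": ["S1", "S2"]})  # made-up mapping
    print(sorted(combine(gene_snps, snps, "intersect").ids))   # ['S1']
```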

Strategies Web Dev Kit (WDK): www.gusdb.org/wdk

EuPathDB is an NIAID Bioinformatics Resource Center supported by NIAID Contract No. HHSN266200400037C and The Bill & Melinda Gates Foundation.

[Architecture figure component: Processes (e.g., BLAST)]