Rachel Adams, Jerry Choate, Nathan Harrelson, Divya Mistry, and Whitney Smith

BINF 4360, Fall 2007

Overview

Goals

Implementation

Interface

Images

Final product

Conclusions

Goals

Create a dynamic map of the Shewanella oneidensis MR-1 genome

Populate local database with relevant information from web-based databases

Provide an efficient searching algorithm for key terms

Implement user-friendly navigation and readability

Implementation

SQL Schema

Parsing

Databases

Parsing

XPath: XPath was used to quickly parse XML documents generated from NCBI’s SOAP interface.

use XML::XPath;

my $xp = XML::XPath->new(filename => $file);
my ($name, $locus);

# Get the gene name and locus tag from each Gene-ref element
foreach my $var ($xp->find('//Gene-ref')->get_nodelist) {
    $name  = $var->find('Gene-ref_locus')->string_value;
    $locus = $var->find('Gene-ref_locus-tag')->string_value;
}

LWP::Simple: LWP::Simple was used to fetch content from a URL so that it could be easily written to an XML file.
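As a minimal sketch of this step, LWP::Simple's getstore function can save a response body straight to a file. The helper name fetch_to_file is hypothetical, and the slides do not show the actual NCBI request, so no real URL is assumed here.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical helper: download $url and write the response body to $file.
# Returns true if the request succeeded, false otherwise.
sub fetch_to_file {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);   # status code of the request
    return is_success($status);
}
```

A call such as fetch_to_file($url, 'record.xml') or warn "download failed\n"; would leave the XML on disk, ready for the XPath step.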

Regular Expressions: Regular expressions were used to parse HTML files, match specific string patterns, and manipulate text.
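The kind of pattern matching described here might look like the following sketch, which pulls the attributes of image-map area tags out of an HTML string. The function name parse_areas and the sample markup are illustrative assumptions, not the project's actual code, and the pattern only handles double-quoted attribute values.

```perl
use strict;
use warnings;

# Extract the attributes of each <area> tag in an HTML string.
# Assumes attribute values are double-quoted; this is a sketch,
# not a full HTML parser.
sub parse_areas {
    my ($html) = @_;
    my @areas;
    while ($html =~ /<area\b([^>]*)>/gi) {
        my $attrs = $1;
        my %area;
        # Collect every name="value" pair inside the tag.
        while ($attrs =~ /([\w-]+)\s*=\s*"([^"]*)"/g) {
            $area{lc $1} = $2;
        }
        push @areas, \%area;
    }
    return @areas;
}

# Made-up sample markup for illustration only.
my $html = '<area shape="rect" coords="10,20,56,37" href="/entry/example" title="example gene" />';
for my $area (parse_areas($html)) {
    print "$area->{title}: $area->{coords} -> $area->{href}\n";
}
```

Each returned hash carries the same kind of fields (href, title, coords) that the area table in the schema stores.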

Schema

kegg

area
    area_id        integer
    href           text
    title          text
    target         text
    coords         text
    img_id         integer

area_area_id_seq
    sequence_name  name
    last_value     bigint
    increment_by   bigint
    max_value      bigint
    min_value      bigint
    cache_value    bigint
    log_cnt        bigint
    is_cycled      boolean
    is_called      boolean

img_img_id_seq
    sequence_name  name
    last_value     bigint
    increment_by   bigint
    max_value      bigint
    min_value      bigint
    cache_value    bigint
    log_cnt        bigint
    is_cycled      boolean
    is_called      boolean

img
    img_id         integer
    map            varchar(5)

imgplacement
    img_id         integer
    tilex          integer
    tiley          integer
    gene_id        text
    kegg_id        text

ncbi_genes
    id             integer
    name           text
    locus_tag      text
    month          integer
    day            integer
    year           integer
    location       text
    description    text
    function       text
    cog_id         text
    gi             text
    img_id         text
    pdbid          text
    pdb            text

ncbi_proteins
    locus_tag      text
    date           date
    defintion      text
    description    text
    gene           text

Databases

NCBI: Local databases were populated using information retrieved from NCBI's web-based gene, protein, and 3D domain databases.

COG: Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in complete genomes, representing major phylogenetic lineages.

IMG: The goal of the Integrated Microbial Genomes (IMG) system is to facilitate the visualization and exploration of genomes from a functional and evolutionary perspective.

KEGG: The Kyoto Encyclopedia of Genes and Genomes (KEGG) supports knowledge-based methods for uncovering higher-order systemic behaviors of the cell and the organism from genomic information.

More Databases

MIST: The Microbial Signal Transduction database contains the signal transduction proteins for 591 complete bacterial and archaeal organisms.

ORNL: The Genome Analysis and System Modeling Group of the Life Sciences Division of ORNL provides bioinformatics and analytic services and resources to collaborators, predicts prospective gene and protein models for analysis, and provides user services for the general community.

PDB: The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease.

ShewCyc: ShewCyc is part of BioCyc, a collection of 371 Pathway/Genome Databases, and describes the genome and metabolic pathways of Shewanella oneidensis MR-1.

Interface

Functions provided by the Google Maps API were used to display pathways of the Shewanella genome.

A small overview map is provided to give a bird’s eye view of the entire image. The current view is indicated with a translucent box.

The user can view the pathways at five different zoom levels. Text balloons show information relevant to the user’s selected target.

A search bar lets the user jump quickly to a query of interest. The user can either pan over the images and click on areas of interest or enter a query in the search bar to find specific information. When a term is submitted, relevant targets are marked on the map with colored pins.

Images

ImageMagick is a free software suite to create, edit, and compose bitmap images. The main functions that we took advantage of included the ability to resize, sharpen, pad, and stitch together images.

We also created a composite image by combining 212 separate images.

Placing the images within a 16384 by 16384 pixel canvas required strategic manipulation and tedious offset calculations.
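The sketch below illustrates the kind of offset bookkeeping involved when a large canvas is cut into map tiles. It assumes 256-pixel square tiles, which the slides do not state, and the helper pixel_to_tile is hypothetical rather than the project's actual layout code.

```perl
use strict;
use warnings;

my $TILE_SIZE   = 256;      # assumed tile edge in pixels (not given in the slides)
my $CANVAS_SIZE = 16384;    # canvas edge in pixels, as stated in the slides

# Map a pixel position on the full-resolution canvas to the indices of the
# tile that contains it and the pixel offset inside that tile.
sub pixel_to_tile {
    my ($x, $y) = @_;
    die "point outside canvas\n"
        if $x < 0 || $y < 0 || $x >= $CANVAS_SIZE || $y >= $CANVAS_SIZE;
    return (
        tilex   => int($x / $TILE_SIZE),
        tiley   => int($y / $TILE_SIZE),
        offsetx => $x % $TILE_SIZE,
        offsety => $y % $TILE_SIZE,
    );
}

my %pos = pixel_to_tile(1000, 300);
print "tile ($pos{tilex}, $pos{tiley}), offset ($pos{offsetx}, $pos{offsety})\n";
```

Under these assumptions the full-resolution canvas is 64 tiles on each side (16384 / 256), and the tilex and tiley values correspond to the columns of the same name in the imgplacement table.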

Final Product

[Screenshot: zoomed image]

[Screenshot: query for glycogen]

[Screenshot: query for ATP]

Conclusions

Using the Google Maps API, we were able to create a searchable map of pathways in the Shewanella genome.

Efficient parsing methods made collecting and querying data far simpler.

With more time, additional improvements could be implemented to increase the usability of this application. Currently we offer links to images, but it would be better to have thumbnails of the pictures themselves readily viewable.

Google Web Toolkit has several functions that would make more information available to the user. Tabs on text balloons could separate data into topical subgroups. Overlaying a transparent map on top of the current map could be a useful tool for comparing two pathways.

Additionally, the overall scope of the project would be enhanced by deeper zoom levels at which the user could see the actual amino acid and nucleotide sequences.