BIOCMS: Resource Integration and Web Application Framework for Bioinformatics

DHUNDY R BASTOLA†,*, ANIL KHADKA†, MOHAMMAD SHAFIULLAH† AND HESHAM ALI†,*

†College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE 68182-0116
*Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE 68198-6495

Abstract

High-throughput technologies in biomedical research allow biological processes to be studied as integrated systems. The large amounts of data they generate need to be stored, managed and analyzed before they can contribute meaningfully. A resource facility whose members have diverse skill sets, including knowledge of programming and an educational background in biology, helps in resource integration. This project exemplifies such a relationship between the University of Nebraska at Omaha and multiple other INBRE units in Nebraska. The objective of the current work was to develop a mechanism that allows multiple researchers to access constantly evolving hardware and software resources that aid their biomedical research involving high-throughput data analysis. Custom applications and functionality were created within the open source web application framework WebGUI. A prototype framework, BIOCMS (BIOlogical Content Management System), was developed that allows multiple users within INBRE units in Nebraska to upload and manage data and to use a collection of open-source and ‘in-house’ computational tools. The features inherited from the parent WebGUI simplify content management for users, while the versioning and workflow systems allow system administrators to manage the system more efficiently. The development and availability of software tailored to the needs of an individual laboratory will ensure data quality and enhance data analysis and discovery.

Motivation

The use of informatics tools in the analysis of biological data has become essential in many biological fields. In addition to spreadsheets, word processors and statistical tools, biologists now use many commercial and in-house-developed bioinformatics software packages for sequence analysis, bio-data management, gene expression and proteomics. The most popular tools include those accessible through the web. While more and more such tools are freely available, it is not always easy to find the relevant ones. Biologists then turn to a Bioinformatics Core Facility for help, often asking it to assemble a list of such focused resources and make it available to its clientele. Motivated by this need, we developed a prototype framework, BIOCMS (BIOlogical Content Management System), that allows multiple users within INBRE units in Nebraska to upload and manage data and to use a collection of open-source and ‘in-house’ computational tools.

WebGUI

WebGUI is an open source content management system written in Perl and released under the GNU General Public License. In WebGUI, web pages are created and modified through an intuitive browser-based system. Users can create and modify content fairly easily, have tools for sharing files, and can choose software from a list of tools that were developed in-house, requested by the user, or deemed useful by the Bioinformatics Core Facility. This is an attractive feature for Bioinformatics Core service providers, who have to deal with multiple users (defined as groups) with similar needs but different content. This framework was especially attractive for the following reasons:

• It is the most customizable of the many CMSs available.
• It allows complex websites to be built and maintained.
• It is modular, pluggable and platform independent.
• It is open source, well documented and supported.

Fig 2: (right-center and far-right) Integration of features inspired by the molecular biology laboratory: an extension of WebGUI that incorporates the requirements of researchers in the life sciences. The screenshots show some of the graphical user interfaces available in BIOCMS: (A) introduction to the research group, (B) file sharing between group members, (C) management of figures, graphs and pictures with relevant captions, (D) collection of software available to group members, (E) research notebook supported by MediaWiki, and (F) the genomic workflow management tool Galaxy.

Fig 1. (above) BIOCMS and relations between different components

BIOCMS Features

1. User authentication
• A user can belong to one or more groups, with a specific level of privilege in each.
• Access to content in BIOCMS is controlled by the privileges assigned to the user.
• All assets are separated by group.
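
The group and privilege model described above can be illustrated with a small sketch. This is not the WebGUI or BIOCMS implementation; the `Privilege` levels, the `User` and `Asset` classes and the `can_access` function are hypothetical names chosen only to show how per-group privileges might gate access to group-owned assets.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Privilege(IntEnum):
    """Hypothetical privilege levels, ordered from least to most access."""
    NONE = 0
    READ = 1
    WRITE = 2
    ADMIN = 3


@dataclass
class User:
    name: str
    # Maps a group name to this user's privilege level within that group.
    memberships: dict[str, Privilege] = field(default_factory=dict)


@dataclass
class Asset:
    title: str
    group: str  # assumption: each asset is owned by exactly one group


def can_access(user: User, asset: Asset, needed: Privilege) -> bool:
    """Return True if the user's privilege in the asset's group is sufficient.

    A user who is not a member of the asset's group is treated as NONE.
    """
    return user.memberships.get(asset.group, Privilege.NONE) >= needed


# Usage: one user, two groups, different privilege in each.
alice = User("alice", {"lab_a": Privilege.WRITE, "lab_b": Privilege.READ})
gel_image = Asset("gel image", group="lab_b")
print(can_access(alice, gel_image, Privilege.READ))   # allowed
print(can_access(alice, gel_image, Privilege.WRITE))  # denied
```

Keeping the privilege on the membership rather than on the user lets the same person hold different rights in different groups, which matches the first bullet above.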

2. File sharing and management
• Manuscripts and documents can be managed, organized and shared in BIOCMS.
• Users can view the files shared by group members, along with their attributes.
• All files are maintained under a version control system, so users can revert to an older version of a file or retrieve deleted files with just a few clicks.
• File owners can control the attributes of a file and allow other members read-only access to it.
• Swish-e indexing technology is used for advanced searching. It allows full indexing of DOC, PDF, HTML, plain text and Excel files.
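
As a rough illustration of the versioning behaviour in the list above (reverting to an older version and retrieving deleted files), the following minimal sketch keeps every saved version in memory. It is an assumption-laden toy, not the versioning system that WebGUI provides; the `VersionedFile` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedFile:
    """Hypothetical in-memory file that never discards history."""
    name: str
    versions: list[bytes] = field(default_factory=list)  # oldest first
    deleted: bool = False

    def save(self, content: bytes) -> int:
        """Store a new version and return its 1-based version number."""
        self.versions.append(content)
        self.deleted = False
        return len(self.versions)

    def current(self) -> bytes:
        """Return the latest version, or fail if the file is deleted or empty."""
        if self.deleted or not self.versions:
            raise FileNotFoundError(self.name)
        return self.versions[-1]

    def revert(self, version: int) -> int:
        """Make an older version current by re-saving it as a new version."""
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"no version {version} of {self.name}")
        return self.save(self.versions[version - 1])

    def delete(self) -> None:
        """Soft delete: hide the file but keep its history."""
        self.deleted = True

    def restore(self) -> None:
        """Undo a soft delete."""
        self.deleted = False


# Usage: save two drafts, go back to the first, then delete and restore.
draft = VersionedFile("manuscript.doc")
draft.save(b"first draft")
draft.save(b"second draft")
draft.revert(1)
draft.delete()
draft.restore()
print(draft.current())
```

Reverting appends a copy of the old version instead of truncating the history, so no earlier state is lost and a revert can itself be reverted.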

3. Software
• Users can choose from a list of in-house and third-party software. An aesthetic, intuitive and customizable dashboard is provided for browsing the list.
• Authentication modules have been developed for a seamless user experience with third-party software. They allow a user's authentication token to be passed to the third-party software, avoiding multiple logins.
• Software is categorized as ‘Group only’, ‘Group requested third-party’ or ‘Generic’.
• A group notepad system is supported by MediaWiki, and Galaxy provides genomic workflow management.

Acknowledgement

This project was supported by NIH grant number P20 RR016469 from the INBRE Program of the National Center for Research Resources.