
RESEARCH POSTER PRESENTATION DESIGN © 2011

www.PosterPresentations.com

[Figure: concept diagram with the labels Web views of use cases, SharePoint, ELN, CDS, Genius, Documents and Servers / Databases]

Bridging the Knowledge Gap: Searching SharePoint, E-Notebook, Chromatography Data Systems and Unstructured Documents for Chemical and other Scientific Information to Enable Cooperation, Collaboration and Improved Decision Making

Introduction

The development of discrete information systems to capture scientific information, such as Document Management Systems (DMS, e.g. SharePoint), Electronic Lab Notebooks (ELN) and Chromatography Data Systems (CDS), has left information largely distributed across separate silos that reflect the scientific disciplines of their primary users. In-application search tools often perform poorly because these systems tend to be optimized for data capture rather than for search and retrieval. Growing demand for cross-disciplinary collaboration and decision making has increased the need for highly adaptable, cross-application, scientifically aware search tools that can aid project management and scientific discovery.

Collaboration with key industry leaders has identified three interfaces where sub-optimal scientific and chemical searchability has hindered information gathering and sharing, namely:

1) Cross-application searching between SharePoint and E-Notebook, including chemical structure searchability;
2) Rapid chemical reaction searching across multiple Electronic Lab Notebook systems; and
3) Chemically-aware data mining in a chromatography data system to connect method data and results to structural information.

E-Notebook: Rapid Reaction Searching & Federated Searches

Industry Problem Statement: Reaction information stored within an ELN can only be searched by accessing that ELN, and searches cannot be consolidated across multiple ELNs. Traditional searches return no hits until the entire search has completed, which slows performance on large data sets. Lab performance metrics are also difficult to extract.

Desired Outcomes: 1) An external interface to rapidly search reactions; 2) Federated searches across multiple ELNs or other reaction data sources; 3) Extraction of performance metrics, such as the number of reactions per scientist, project or site.

Solution: Reaction Genius was developed to extract reaction information from one or more ELNs, along with relevant metadata such as project, user, creation date, yield and temperature. The extracted data is consolidated within a single XML document, which is then transformed into new database tables optimized for search, allowing thousands of reactions to be searched in seconds. Results are returned in buckets of graded relevancy, so the most important hits are presented immediately in an intuitive web form. Metadata associated with each record (date, project, etc.) can be displayed graphically in a widget-based dashboard, giving performance metrics tailored to the specific needs of the organization.
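
The extract, consolidate and transform flow described above can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not Reaction Genius itself. The XML element and attribute names (`reactions`, `reaction`, `eln`, `yield`), the table layout, the index choices and the sample records are all hypothetical. Records from two ELN exports are consolidated into one XML document, flattened into an indexed SQLite table, and queried for one performance metric: reactions per scientist.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical reaction records as they might be exported from two ELNs.
ELN_EXPORTS = {
    "eln_a": [
        {"id": "A-001", "project": "P1", "scientist": "jdoe", "site": "Boston", "yield": "82.5"},
        {"id": "A-002", "project": "P1", "scientist": "jdoe", "site": "Boston", "yield": "64.0"},
    ],
    "eln_b": [
        {"id": "B-101", "project": "P2", "scientist": "asmith", "site": "Cambridge", "yield": "91.0"},
    ],
}


def consolidate(exports: dict) -> ET.Element:
    """Merge the records from every ELN into a single XML document."""
    root = ET.Element("reactions")
    for eln_name, records in exports.items():
        for rec in records:
            # Keep the source ELN as an attribute so hits can be traced back to it.
            ET.SubElement(root, "reaction", {"eln": eln_name, **rec})
    return root


def load_tables(root: ET.Element) -> sqlite3.Connection:
    """Transform the XML document into an indexed table for fast searching."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reaction (id TEXT PRIMARY KEY, eln TEXT, project TEXT,"
        " scientist TEXT, site TEXT, yield_pct REAL)"
    )
    for column in ("project", "scientist", "site"):
        conn.execute(f"CREATE INDEX idx_reaction_{column} ON reaction({column})")
    rows = [
        (r.get("id"), r.get("eln"), r.get("project"), r.get("scientist"),
         r.get("site"), float(r.get("yield")))
        for r in root.iter("reaction")
    ]
    conn.executemany("INSERT INTO reaction VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def reactions_per(conn: sqlite3.Connection, column: str) -> dict:
    """Performance metric: number of reactions per scientist, project or site."""
    if column not in {"scientist", "project", "site"}:
        raise ValueError(f"unsupported grouping column: {column}")
    cur = conn.execute(f"SELECT {column}, COUNT(*) FROM reaction GROUP BY {column}")
    return dict(cur.fetchall())


conn = load_tables(consolidate(ELN_EXPORTS))
metrics = reactions_per(conn, "scientist")
print(sorted(metrics.items()))  # [('asmith', 1), ('jdoe', 2)]
```

The grouping column is checked against a fixed whitelist before it is placed in the SQL string. Column names cannot be passed as query parameters, so this check keeps the metric query safe from injection.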

Chromatography Data Systems & Structural Searching

Industry Problem Statement: Analytical method development involves significant trial and error when creating new methods for analyzing and identifying pharmaceutical intermediates and impurities. Historical data on prior analytes is stored within Chromatography Data Systems (CDS), where peaks are labeled with unique identifying numbers; the correlation between those labels and their chemical structures is held elsewhere. The CDS does not support chemical similarity searching, which would be highly beneficial for predicting the best methods for analyzing the latest compounds.

Desired Outcomes: 1) A system that allows structure searching of analytical run data contained within a CDS, including substructure and structural similarity searching; 2) Presentation of run and method parameters within an intuitive interface; and 3) Additional chemical property search criteria, such as cLogP, to improve prediction.

Solution: Method Genius extracts peak information, along with method and run parameters, from a CDS. This data is merged with structural information held in a separate document or database, compiled into an XML document and transformed into database tables optimized for search. Results are returned in buckets of graded relevancy, so the most important hits are presented immediately in an intuitive web form. Chemical properties are calculated or predicted for the identified structures. Searches can combine structure (substructure or structural similarity), chemical properties (such as cLogP), and method and run parameters.
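
A search that combines structural similarity with property and method filters, and returns graded buckets, might look roughly like the sketch below. It rests on several assumptions.

- Fingerprints are modelled as sets of "on" bit indices that a cheminformatics toolkit would compute beforehand.
- Tanimoto similarity is used as the similarity measure.
- The bucket thresholds (0.9, 0.7, 0.5), the `PeakRecord` fields and the sample peaks are hypothetical.

Substructure matching requires a proper cheminformatics toolkit and is not shown.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PeakRecord:
    """An identified peak already merged with its structure (hypothetical fields)."""
    peak_label: str
    fingerprint: frozenset  # indices of the "on" bits of a precomputed fingerprint
    clogp: float
    column: str  # an example method parameter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Lower similarity bound of each relevancy bucket, most relevant first (assumed values).
BUCKETS = (0.9, 0.7, 0.5)


def bucketed_search(
    query_fp: frozenset,
    records: Sequence[PeakRecord],
    clogp_range: Tuple[float, float],
    column: Optional[str] = None,
) -> Iterator[Tuple[float, List[str]]]:
    """Yield (bucket threshold, peak labels) pairs, best bucket first."""
    lo, hi = clogp_range
    # Cheap property and method filters run before any structure comparison.
    candidates = [
        r for r in records
        if lo <= r.clogp <= hi and (column is None or r.column == column)
    ]
    scored = [(tanimoto(query_fp, r.fingerprint), r) for r in candidates]
    upper = float("inf")
    for threshold in BUCKETS:
        bucket = sorted(
            (pair for pair in scored if threshold <= pair[0] < upper),
            key=lambda pair: pair[0],
            reverse=True,
        )
        if bucket:
            # A caller can display this bucket before the next one is assembled.
            yield threshold, [r.peak_label for _, r in bucket]
        upper = threshold


RECORDS = [
    PeakRecord("PK-001", frozenset({1, 2, 3, 4, 5}), 2.1, "C18"),
    PeakRecord("PK-002", frozenset({1, 2, 3, 4}), 1.8, "C18"),
    PeakRecord("PK-003", frozenset({1, 2, 9, 10}), 2.5, "C18"),
    PeakRecord("PK-004", frozenset({1, 2, 3, 4, 5}), 4.2, "C18"),
    PeakRecord("PK-005", frozenset({1, 2, 3, 4, 5}), 2.0, "HILIC"),
    PeakRecord("PK-006", frozenset({1, 2, 3, 6}), 0.9, "C18"),
]
query = frozenset({1, 2, 3, 4, 5})
for threshold, labels in bucketed_search(query, RECORDS, (0.0, 3.0), column="C18"):
    print(f"similarity >= {threshold}: {labels}")
```

In this sketch, scores are computed once and the generator then releases buckets in order. A production system over large data sets would more likely issue one indexed query per bucket, so that the top bucket is available before the lower ones are even scored.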

Concept

A central data repository can be created to pull, merge and reorganize information from a combination of file shares and databases. The reorganized data can be optimized for faster retrieval and for combining related data sets. Data views can be tailored to specific use cases and delivered through either web pages or application-specific forms.
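
Below is a minimal sketch of such a repository. It assumes a CDS peak table exported as CSV to a file share and a compound registry held in a relational database. Both are simulated in memory here, and all table, column and label names are hypothetical. Peaks are merged with their registered structures into one indexed table. A view tailored to a single use case, the run history of a compound, is then exposed on top of it. This mirrors, in simplified form, the merge of CDS peak labels with structures held elsewhere that is described for Method Genius.

```python
import csv
import io
import sqlite3

# Hypothetical CDS peak export, as it might sit on a file share.
PEAKS_CSV = """run_id,peak_label,retention_min,method
R1,CMP-17,4.21,GRAD-A
R1,CMP-23,6.87,GRAD-A
R2,CMP-17,4.19,GRAD-B
"""


def build_repository(peaks_csv: str, registry: sqlite3.Connection) -> sqlite3.Connection:
    """Pull peaks from a file and structures from a database into one repository."""
    repo = sqlite3.connect(":memory:")
    repo.execute(
        "CREATE TABLE peak (run_id TEXT, peak_label TEXT, retention_min REAL,"
        " method TEXT, smiles TEXT)"
    )
    # Map each peak label to its structure using the (hypothetical) compound registry.
    structures = dict(registry.execute("SELECT label, smiles FROM compound"))
    rows = [
        (row["run_id"], row["peak_label"], float(row["retention_min"]),
         row["method"], structures.get(row["peak_label"]))  # None if unregistered
        for row in csv.DictReader(io.StringIO(peaks_csv))
    ]
    repo.executemany("INSERT INTO peak VALUES (?, ?, ?, ?, ?)", rows)
    repo.execute("CREATE INDEX idx_peak_label ON peak(peak_label)")
    # A view tailored to one use case: every run in which a compound was observed.
    repo.execute(
        "CREATE VIEW compound_history AS"
        " SELECT peak_label, smiles, run_id, method, retention_min FROM peak"
    )
    return repo


registry = sqlite3.connect(":memory:")
registry.execute("CREATE TABLE compound (label TEXT PRIMARY KEY, smiles TEXT)")
registry.executemany(
    "INSERT INTO compound VALUES (?, ?)",
    [("CMP-17", "Oc1ccccc1"), ("CMP-23", "CC(=O)Nc1ccc(O)cc1")],
)
repo = build_repository(PEAKS_CSV, registry)
for row in repo.execute(
    "SELECT run_id, method, retention_min FROM compound_history"
    " WHERE peak_label = ? ORDER BY run_id",
    ("CMP-17",),
):
    print(row)
```

Keeping the merged data in one denormalized table trades some storage for retrieval speed. Each new use case can then be served by adding a view rather than by reconnecting to the original file shares and databases.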

Philip J Skinner PhD, Phil McHale D. Phil, Rudy Potenzone PhD, Kate Blanchard, and Megean Schoenberg

PerkinElmer Informatics, 100 CambridgePark Drive, Cambridge, MA 02140

SharePoint and E-Notebook Cross Platform and Chemical Searching

Industry Problem Statement: Scientific information is dispersed both within and outside the organization, in E-Notebook, SharePoint and on the web. It cannot be searched simultaneously from a single interface, and SharePoint cannot be searched by chemical structure. Search results cannot readily be stored within E-Notebook as a record of the inventive thought process.

Desired Outcomes: 1) A search tool that enables simultaneous structure searching of SharePoint and E-Notebook content, and text searching of both plus the web; 2) An easily configurable interface for searching and presenting results; 3) The ability to record the thought process behind searches, supporting possible intellectual property claims around inventive steps.

Solution: Search Genius for SharePoint extends Microsoft SharePoint to support searching by chemical structure. Custom web parts expose widgets for searching and displaying results from within the SharePoint framework. All included data sources containing relevant chemical data are crawled and indexed by structure. Chemical hits are combined with relevant textual hits to produce a comprehensive result list for the user. This allows federated searches across the web, SharePoint and E-Notebook simultaneously, from either SharePoint or E-Notebook. Any search result can be stored, annotated and saved in the experimental record within E-Notebook as evidence of the thought process.
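
The step that merges per-source chemical and text hits into one result list can be sketched as below. Everything here is an assumption for illustration.

- The source callables stand in for SharePoint, E-Notebook and web back ends; they are not real connectors or APIs.
- Hits are (document id, score) pairs with scores in [0, 1].
- A weighted sum with a hypothetical `chem_weight` is just one simple way to combine chemical and textual relevance; the poster does not say how Search Genius ranks combined hits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (document id, relevance score in [0, 1])
Source = Callable[[str], List[Hit]]


def federated_search(
    structure: str,
    keywords: str,
    chemical_sources: Dict[str, Source],
    text_sources: Dict[str, Source],
    chem_weight: float = 0.6,
) -> List[Tuple[str, float, List[str]]]:
    """Query all sources in parallel and merge their hits into one ranked list."""
    jobs = [(name, fn, structure, chem_weight) for name, fn in chemical_sources.items()]
    jobs += [(name, fn, keywords, 1.0 - chem_weight) for name, fn in text_sources.items()]
    combined: Dict[str, float] = {}
    provenance: Dict[str, List[str]] = {}
    with ThreadPoolExecutor() as pool:
        futures = [(name, weight, pool.submit(fn, arg)) for name, fn, arg, weight in jobs]
        for name, weight, future in futures:
            for doc_id, score in future.result():
                # A document found by both structure and text accumulates both scores.
                combined[doc_id] = combined.get(doc_id, 0.0) + weight * score
                provenance.setdefault(doc_id, []).append(name)
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return [(doc_id, round(score, 3), provenance[doc_id]) for doc_id, score in ranked]


# Hypothetical stand-ins for real connectors; each returns canned hits.
def sharepoint_structure(smiles: str) -> List[Hit]:
    return [("sp:report-12", 0.9)]


def enotebook_structure(smiles: str) -> List[Hit]:
    return [("eln:EXP-0042", 1.0)]


def sharepoint_text(words: str) -> List[Hit]:
    return [("sp:report-12", 0.5), ("sp:memo-3", 0.8)]


def web_text(words: str) -> List[Hit]:
    return [("web:article-7", 0.7)]


results = federated_search(
    "Oc1ccccc1",
    "phenol impurity",
    {"sharepoint-structure": sharepoint_structure, "enotebook-structure": enotebook_structure},
    {"sharepoint-text": sharepoint_text, "web-text": web_text},
)
for doc_id, score, sources in results:
    print(doc_id, score, sources)
```

Each result carries the list of sources that found it. That list is the kind of provenance a user might want to keep when saving an annotated hit into the experimental record.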

[Figure label: ELN and/or Reaction Databases]

Conclusions

All three solutions were assessed by industrial partners and found to meet or exceed the desired outcomes. Production implementation is currently underway, along with extensions that incorporate data from custom in-house systems, such as robotics, and add further bidirectional data transfer.

Further information is available by email ([email protected]), at www.perkinelmer.com/informatics, from our booth (#219) or by scanning the QR code above with your mobile device.

[Figures: system diagrams with the labels CDS, Compound Registry, Reaction Genius, Method Genius, Web, FAST Science Parser, E-Notebook, SharePoint Front End and E-Notebook Client]

[QR code: Scan to download a copy of this poster]


[Figure callouts]

Federate chemical and text searching across SharePoint, the web and E-Notebook, from either a SharePoint or an E-Notebook front end.

Results can be viewed from either SharePoint or E-Notebook.

Collect links and articles as a thought experiment within E-Notebook to document and protect the invention process.

Performance metrics highlight the most productive scientists, teams, projects or sites.

The dashboard is built on a widget model to allow easy customization and hence institution-specific views into the data.

Widgets provide real-time views of the most recent additions.

Results are returned in buckets, with the most relevant results returned first.

Combined structure, chemical, experimental and hierarchical property search parameters.

Expandable reaction graph to explore precursors and products throughout the synthetic scheme.

Structural results are returned in buckets, with the most relevant results returned first.

All the identified peaks in a run are associated with their relevant structures.

Combined structure, property and metadata search parameters.

Reports consolidate structural information, chemical properties, and run and method parameters where relevant.