Semantic web technologies applied to bioinformatics and laboratory data management

Toni Hermoso

Bioinformatics Core Facility

http://biocore.crg.cat

THE CLASSICAL WEB

> Syntax: Markup languages (HTML, XHTML, etc.)

> Content: Text inside the tags (or as attributes)

> Style: HTML tags themselves; CSS (in content or as external files)

Robert Cailliau: former WWW logo

Tim Berners-Lee, Robert Cailliau. CERN (1990)

WEB 2.0

> Buzzword. Its coinage is usually associated with Tim O'Reilly.

> The term "Web 2.0" (2004-present) is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web.

> Examples of Web 2.0 include web-based communities, hosted services, web applications, social-networking sites, video-sharing sites, wikis, blogs, mashups, and folksonomies.

> AJAX, RSS, Web APIs

wikis may allow anyone to edit

wikis are intended to be easy to use

wiki content is easy to link

wikis support tracking of all changes

wikis may allow uploading different media

Wiki Wiki !

WikiWikiWeb. Ward Cunningham - 1994

MediaWiki

> Most popular wiki software

> Behind Wikimedia Foundation.

> The best-known implementation is Wikipedia: http://www.wikipedia.org

First version: 2002. Before that, Wikipedia ran on UseModWiki (a Perl wiki).

Gene Wiki: Gene annotation project in Wikipedia

http://en.wikipedia.org/wiki/Portal:Gene_Wiki

> Brings relevant human gene information closer to end users

> Manual collaborative annotation plus automated external references added by bot software

> Wikipedia portal within Molecular and Cellular Biology Project

Published September 2009

GENE WIKI

> Example of a wiki page: Reelin

> Example of a wiki category page: Human proteins

> Example of a wiki source page: Reelin

> Example of a wiki template page: Reelin

Web parsing / scraping

> To get information from an HTML source (wiki included)

Download tools: Lynx

Wget

Perl LWP

Perl WWW::Mechanize

Python Beautiful Soup
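As a minimal sketch of the parsing step, the following uses only Python's standard-library `html.parser` (swapped in for Beautiful Soup to stay dependency-free). The HTML fragment and the names `LinkCollector` and `extract_links` are illustrative assumptions, not taken from a real page or from the original slides.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up fragment standing in for a page fetched with Wget, LWP, etc.
SAMPLE_HTML = (
    '<p>See <a href="/wiki/Reelin">Reelin</a> and '
    '<a href="/wiki/DAB1">DAB1</a>.</p>'
)
links = extract_links(SAMPLE_HTML)
```

Separating download from parsing keeps the parser testable on saved pages without touching the network.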

Web parsing / scraping

> Processing content (example, EC: 3.4.21.-)

Regular expressions: s/

http://en.wikipedia.org/w/api.php
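The regular expression on the original slide did not survive extraction, so the following is only a sketch of the idea: pulling EC numbers such as 3.4.21.- out of scraped text. The pattern, the function name `find_ec_numbers` and the sample sentence are assumptions made for illustration.

```python
import re

# An EC number has four dot-separated fields; the last three may each be
# "-" when that level is not assigned. The "EC" prefix and colon are
# matched but not captured.
EC_PATTERN = re.compile(r"\bEC:?\s*(\d+(?:\.(?:\d+|-)){3})")


def find_ec_numbers(text):
    """Return every EC number found in the text, in order of appearance."""
    return EC_PATTERN.findall(text)


# Invented sentence standing in for text scraped from a page.
SAMPLE_TEXT = "Enzyme A (EC: 3.4.21.-) and enzyme B (EC 3.4.21.5) are listed."
ec_numbers = find_ec_numbers(SAMPLE_TEXT)
```

Regular expressions like this are brittle against markup changes, which is one reason the API route described next is usually easier.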

MediaWiki API

> Commonly scripted with Python or Perl, e.g. MediaWiki::Bot

> You can retrieve information from a wiki and store information in it.

MediaWiki API

> Easier to extract data: retrieves wiki syntax, not direct HTML content

Useful when templates are used

Can retrieve all pages from a category
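The two retrieval cases above can be sketched as request URLs against the endpoint quoted earlier. This example only builds the URLs and performs no network access. The parameter names (`action=query`, `prop=revisions`, `rvprop=content`, `list=categorymembers`, `cmtitle`, `cmlimit`) are standard MediaWiki API parameters, but the helper names and the default limit are assumptions, and newer MediaWiki versions may expect additional parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint quoted in the slides


def wikitext_url(title):
    """URL requesting the wiki syntax of the latest revision of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def category_members_url(category, limit=50):
    """URL requesting the pages that belong to a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


page_url = wikitext_url("Reelin")
category_url = category_members_url("Human proteins")

# Decode the query string again to check what would be sent.
category_params = parse_qs(urlsplit(category_url).query)
```

The resulting URLs can be fetched with any of the download tools listed earlier; the JSON response then carries wiki syntax rather than rendered HTML.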

SEMANTIC WEB

> "The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation."

Sir Tim Berners-Lee

The Berners-Lee Semantic Web Birthday Cake

http://www.mkbergman.com/231/from-data-federation-pyramid-to-the-semantic-web-birthday-cake/

The evolution of the Web

SNPedia: a semantic wiki for human genetic studies

> http://www.snpedia.com (started in 2007)

> Semantic MediaWiki (first releases 2005)

> Database of SNPs (Single Nucleotide Polymorphisms)

> In September 2009, the website claimed 7,938 SNPs in its database.

> Predictive medicine reports built on SNPedia using Promethease: an application that queries SNPedia against your genotyping data

SNPedia

> Example of a wiki page: Rs333

> Example of a wiki page's properties: Rs333

> Example of a page property (disease) value: HIV

Semantic MediaWiki Data Types

* Type:Page: links to pages (the default)
* Type:String: text strings that are not longer than 250 letters
* Type:Number: integer and decimal numbers with optional exponent
* Type:Boolean: restricts the value of a property to true/false (also 1/0 and yes/no)
* Type:Date: specifies particular points in time
* Type:Text: like Type:String but can have unlimited length; the trade-off is that values of this type cannot be selection or sort criteria in queries.
* Type:Code: like Type:Text but with additional precautions to preserve special formatting as used for technical texts. The value displays as regular text everywhere else (query results, factbox, "Pages using the property", etc.).
* Type:Temperature: variation of Type:Number that supports units of temperature (cannot be user-defined since converting temperature units is more complicated than multiplying by a conversion factor).
* Type:Telephone number: validates and stores international telephone numbers based on the RFC 3966 standard
* Type:Record: type for compound property values that consist of a short list of values with fixed type and order

Semantic MediaWiki Data Types

For specifying URLs and emails, there are some special variations of the string data type:

* Type:URL: displays an external link to its URL object
* Type:Email: displays an e-mail address as a link (with mailto:)
* Type:Annotation URI: similar to Type:URL but with some technical differences in SMW's RDF export

Some extensions provide further types:

* Type:Geographic coordinate (provided by Semantic Maps): describes geographic locations. Different forms of geographic coordinates are supported.

http://semantic-mediawiki.org/wiki/Help:Properties_and_types
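Typed properties are attached to pages with Semantic MediaWiki's inline annotation syntax, `[[Property::value]]`, optionally with a display label after a pipe. The sketch below extracts such annotations from raw wiki text with a regular expression. The property names `Chromosome` and `Disease`, the sample sentence and the helper name are invented for illustration and are not SNPedia's actual property names.

```python
import re

# Matches [[Property::value]] and [[Property::value|label]].
# Group 1 is the property name, group 2 the stored value; the label,
# if present, is matched but discarded.
ANNOTATION = re.compile(
    r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]"
)


def parse_annotations(wikitext):
    """Return (property, value) pairs found in a piece of wiki text."""
    return [
        (prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


# Invented wiki text using hypothetical property names.
SAMPLE = (
    "This SNP lies on [[Chromosome::3]] and is linked to "
    "[[Disease::HIV|HIV resistance]]. See also [[Main Page]]."
)
annotations = parse_annotations(SAMPLE)
```

Ordinary links such as `[[Main Page]]` are skipped because they lack the `::` separator.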

SNPedia RDF behind a wiki page

RDF (Resource Description Framework)

Triple {subject, property/predicate, object}

Defining & describing data and relations among data

Suitable to attach metadata to certain resources

Understood by machines (not so much by humans)

Normally in XML format

Alternative: RDFa (in XHTML pages directly)
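To make the triple structure concrete, here is a small sketch that models triples in plain Python and serialises them as N-Triples lines. N-Triples is used here instead of the XML format mentioned above only because it is easier to read. The `http://example.org/` URIs, the `Triple` type and `to_ntriples` are assumptions made for illustration, and the literal escaping covers only backslashes and double quotes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""

    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False  # False: obj is a URI; True: plain string


def to_ntriples(triple):
    """Serialise one triple as a single N-Triples line."""
    if triple.obj_is_literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = '"' + escaped + '"'
    else:
        obj = "<" + triple.obj + ">"
    return "<%s> <%s> %s ." % (triple.subject, triple.predicate, obj)


EX = "http://example.org/"  # placeholder namespace, not a real vocabulary
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

triples = [
    Triple(EX + "Rs333", EX + "property/Disease", EX + "HIV"),
    Triple(EX + "Rs333", RDFS_LABEL, "Rs333", obj_is_literal=True),
]
lines = [to_ntriples(t) for t in triples]
```

Each line reads as one machine-processable statement about the resource, which is what makes RDF suitable for attaching metadata.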

RDF: Gene Ontology

OWL: Gene Ontology

SPARQL

RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language.

Example query of Wikipedia: http://dbpedia.org/sparql

Example query of biological resources:

http://www.semantic-systems-biology.org/biogateway/querying
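As an illustration of what a query against the DBpedia endpoint might look like, the sketch below builds a SPARQL `SELECT` and the corresponding GET URL without sending it. The `query` parameter comes from the SPARQL protocol; the `format` parameter and the resource URI `http://dbpedia.org/resource/Reelin` are assumptions about that particular endpoint and its data.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint quoted in the slides

# Ask for the English label of one resource (assumed to exist in DBpedia).
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Reelin> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 1
"""


def sparql_get_url(endpoint, query,
                   result_format="application/sparql-results+json"):
    """Build a GET URL carrying the query; nothing is sent over the network."""
    params = {"query": query.strip(), "format": result_format}
    return endpoint + "?" + urlencode(params)


url = sparql_get_url(ENDPOINT, QUERY)

# Decode the URL again to confirm the query survives the round trip.
decoded = parse_qs(urlsplit(url).query)
```

The triple pattern inside `WHERE` mirrors the subject, predicate, object structure of RDF, with `?label` as the unknown to be filled in.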

Semantic MediaWiki vs MediaWiki (I)

Semantic MediaWiki (and other semantic add-ons) is an extension of MediaWiki. Anything you can do with MediaWiki, you can still do.

Better and more specific search capabilities:
Not only free-text search on pages
It resembles relational database searching
SPARQL =~ SQL

Semantic MediaWiki vs MediaWiki (II)

Better browsing interface (browsing through properties, not only categories)

Importing and exporting of logical mesh.

Easier exchange of information with 3rd party applications (through RDF)

Protein-Wiki

Semantic wiki-based system for the management of a protein production service.

Currently in testing phase

In collaboration with the CRG Protein Service

Customisation built up after a study of their present workflow and actual needs.

Intended for internal use

Protein-Wiki. Advantages

> Cheaper approach than most commercial similar solutions

> Open-source technology, with a thriving community behind it.

> Avoidance of vendor lock-in and abusive licensing.

> Customisable to specific needs. Can be extended to other cases.

Protein-Wiki. Example Workflow

[Workflow diagram. Roles: Researcher, Lab Manager, Lab Member, Finance Controller (order management system). Steps shown: access web interface, fill form, submit request, review scientific info, review financial info, create study, review study, assign study to core members, open study, retrieve SOP, perform all study steps, request review, review study results, sign-off, prepare report, send results/report, retrieve results/report, receive invoice. Accept/reject decisions and meetings link the steps. Annotations: (quotation), (order number), (communication).]

Design: Guglielmo Roma

Protein-Wiki: Users roles

Researcher: submits requests to the service using pre-defined templates, views the status of his/her requests at any time, and retrieves the study reports when experiments are complete

Lab member: can add and edit experimental data; cannot create or delete experiments

Lab manager: can create, edit and delete experiments associated with submitted requests, using pre-defined templates

Administrator: creation of new templates, user management and user training
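The four roles can be read as a simple permission table. The sketch below is not how Protein-Wiki implements permissions (that is done inside MediaWiki); it only restates the role descriptions as data. The action names are invented labels, and no inheritance between roles is assumed because the slides do not state one.

```python
from enum import Enum


class Role(Enum):
    RESEARCHER = "researcher"
    LAB_MEMBER = "lab member"
    LAB_MANAGER = "lab manager"
    ADMINISTRATOR = "administrator"


# Hypothetical action labels derived from the role descriptions.
PERMISSIONS = {
    Role.RESEARCHER: {
        "submit_request", "view_request_status", "retrieve_report",
    },
    Role.LAB_MEMBER: {
        "add_experimental_data", "edit_experimental_data",
    },
    Role.LAB_MANAGER: {
        "create_experiment", "edit_experiment", "delete_experiment",
    },
    Role.ADMINISTRATOR: {
        "create_template", "manage_users", "train_users",
    },
}


def is_allowed(role, action):
    """Return True if the role's own permission set contains the action."""
    return action in PERMISSIONS[role]


manager_can_create = is_allowed(Role.LAB_MANAGER, "create_experiment")
member_can_delete = is_allowed(Role.LAB_MEMBER, "delete_experiment")
```

Keeping the table as data makes it easy to audit who can do what before translating it into wiki group settings.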

Protein-Wiki: permissions & security

Login & role permissions: assigned automatically or by an administrator

Namespace-specific permissions:
Experiment:: (only lab members/managers)
Template:: (only administrators)

Page-specific permissions: by using the user functions and parser functions extensions

Network?

Protein-Wiki Homepage

Protein-Wiki. Request Form

Anonymous

Protein-Wiki. Request Form

Researcher

Protein-Wiki. Request result page

Researcher

Protein-Wiki. Enable experiment

Lab manager

Protein-Wiki. Experiment form

Lab member

Logical input restrictions, linked to the data type

Protein-Wiki

Experiment page

Lab member

Protein-Wiki

Browse experiment properties

Protein-Wiki. Semantic properties

Administrator

Allowed values

Invalid value

Protein-Wiki. Conditional syntax

Enable certain experiment sections if asked by the researcher or lab manager

Input value restriction at the form level
Example: Only nucleotides allowed in Primer sequences
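The primer restriction can be illustrated outside the wiki with a short check. This is a sketch of the rule only, not the form configuration used in Protein-Wiki: it assumes that only the four unambiguous DNA bases are accepted (no IUPAC ambiguity codes) and that case and surrounding whitespace are ignored. The function name is hypothetical.

```python
import re

# Assumption: a primer may contain only A, C, G and T.
NUCLEOTIDES = re.compile(r"[ACGT]+")


def is_valid_primer(sequence):
    """Return True if the whole sequence consists of A, C, G or T."""
    cleaned = sequence.strip().upper()
    return NUCLEOTIDES.fullmatch(cleaned) is not None


accepted = is_valid_primer("atgcGTCA")
rejected = is_valid_primer("ATGXCA")
```

Rejecting bad input at the form level means invalid values never reach the stored semantic properties.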

Protein-Wiki. List of tasks

May be visible or not to researchers. Workload.

Different fields depending on the user's role.

Protein-Wiki. List of tasks

Lab member

Any kind of customised list can be created from semantic properties.

Conclusions (I)

Semantic MediaWiki (and other MediaWiki extensions) in lab workflow environments

Efficient collaboration between different users
Role-specific permissions for each group: researchers, lab members, lab managers, administrators

Well-known interface. Everyone should have edited Wikipedia at least once!

Note-taking in wiki for future consultation

Conclusions (II)

Users can be both humans and robot script applications

Refined and specific queries

Logic connection with other semantic empowered software

Easy set-up of new environments (high-level programming)
Wiki templates, properties and forms vs coding and database design

Conclusions (III)

Tracking (page history and recent changes)

Only the wiki administrator can bypass the workflow.

Only the system administrator could forge the page history.

Allows third-party quality auditing

Acknowledgments

Bioinformatics Unit: Guglielmo Roma, Luca Cozzuto, Francesco Mancuso

Protein Service: Michela Bertero, Silvia Speroni, Miriam Alloza