Bioinformatics 2.0/3.0 Kei Cheung Yale Center for Medical Informatics


Page 1: Bioinformatics 2.0/3.0

Bioinformatics 2.0/3.0

Kei Cheung

Yale Center for Medical Informatics

Page 2: Bioinformatics 2.0/3.0

Outline

• Introduction

• Web 2.0

• Web 3.0
  – Semantic Web
  – Topic Map

• Merging Web 2.0 and Web 3.0

Page 3: Bioinformatics 2.0/3.0

Introduction

• The Human Genome Project (HGP) has transformed genome sciences from being experimental to being increasingly computational

• HGP has intensified the growth of bioinformatics

• The Web has become a popular medium for accessing information over the Internet

• Numerous bioinformatics databases and tools are Web accessible

• These databases and tools, as well as the Web, have become indispensable for modern-day genomic research

• Web 1.0 -> Web 2.0 -> Web 3.0

Page 4: Bioinformatics 2.0/3.0

Web 1.0

• It is read-only

• It is about a single person, organization, …

• It is document centric

• It is based on HTML

• It is for humans to read

Page 5: Bioinformatics 2.0/3.0

Web 2.0

Page 6: Bioinformatics 2.0/3.0

Web 2.0

• Social networking (wiki, blog, tagging, bookmarking, rating, etc.)

• Multimedia content (photo, audio, video, etc.)

• Interactive, responsive, and dynamic web interface (Facebook, Flickr, YouTube, etc.)

• Mashup (assembly tools and visualization tools)

Page 7: Bioinformatics 2.0/3.0

Folksonomy (Social Tagging)

• Folksonomy is the practice and method of collaboratively creating and managing tags to annotate and categorize content

• In contrast to traditional subject indexing, metadata is generated not only by experts but also by creators and consumers of the content

• Freely chosen keywords are used instead of a controlled vocabulary

Page 8: Bioinformatics 2.0/3.0

Tag Cloud

• A tag cloud (or weighted list in visual design) is a visual depiction of user-generated tags used typically to describe the content of web sites.
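One common way to weight a tag cloud is to scale each tag's font size linearly with how often the tag was used. The sketch below assumes that approach; the function name `tag_cloud_sizes`, the point-size bounds, and the sample tags are illustrative and not taken from any particular site.

```python
from collections import Counter


def tag_cloud_sizes(tags, min_pt=10, max_pt=32):
    """Map each tag to a font size that grows linearly with its frequency."""
    counts = Counter(tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # If every tag is equally frequent, fall back to the smallest size.
        scale = (n - lo) / span if span else 0.0
        sizes[tag] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


# Hypothetical tags applied by users to a set of bookmarks.
sample_tags = ["gene"] * 5 + ["protein"] * 3 + ["pathway"]
sizes = tag_cloud_sizes(sample_tags)
print(sizes)
```

Real tag clouds often use logarithmic scaling instead, so that a few very popular tags do not dwarf the rest.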

Page 9: Bioinformatics 2.0/3.0

Web 2.0 (cont’d)

• It is decentralized

• It is a community/collaborator model instead of an authority/consumer model

• It is fun

• It can be seriously used to share and integrate scientific datasets and algorithms

Page 10: Bioinformatics 2.0/3.0

Bioinformatics Applications of Web 2.0

Page 11: Bioinformatics 2.0/3.0

Wiki Proteins

Page 12: Bioinformatics 2.0/3.0

Nature Precedings (pre-publication research and preliminary findings)

Page 13: Bioinformatics 2.0/3.0

Scientific Podcasts

Page 14: Bioinformatics 2.0/3.0

Multimedia (cont’d)

Page 15: Bioinformatics 2.0/3.0

Journal of Visualized Experiments

Page 16: Bioinformatics 2.0/3.0

myExperiment

Page 17: Bioinformatics 2.0/3.0

Mashup (1): Assembly Tools

• Dapper (scrape web content and convert it into machine readable format)

• Yahoo! Pipes (fetch, filter, and integrate data)

Page 18: Bioinformatics 2.0/3.0

Yahoo! Pipes Demo

Page 19: Bioinformatics 2.0/3.0

Yahoo! Pipes Use Case

Page 20: Bioinformatics 2.0/3.0

GeoCommons: Mashup of Maps

Page 21: Bioinformatics 2.0/3.0

Mashup (2): Visualization Tools

• E.g., Google Earth

Page 22: Bioinformatics 2.0/3.0

Geo-Mashup: Google Earth (tracking H5N1 virus over time)

Page 23: Bioinformatics 2.0/3.0

Bioinformatics Mashups

• Mashup of biological entities of the same type
  – Protein network mashup
  – Sequence annotation mashup

• Mashup of biological entities of different types

Page 24: Bioinformatics 2.0/3.0

Mashup of pathway data and gene expression data

Calvin cycle pathway associated with gene expressions
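At its simplest, a mashup of two entity types is a join on a shared identifier. The following is a minimal sketch, assuming both sources already use the same gene identifiers. The gene IDs (`gene_a`, `gene_b`) and the expression value are made-up placeholders, and `mash_up` is a hypothetical helper name.

```python
def mash_up(pathway_steps, expression_by_gene):
    """Attach an expression value to each pathway step via its gene ID."""
    merged = []
    for step in pathway_steps:
        # .get() returns None when no expression measurement exists for the gene.
        value = expression_by_gene.get(step["gene"])
        merged.append({**step, "expression": value})
    return merged


# Source 1: pathway data (two Calvin cycle enzymes, placeholder gene IDs).
pathway_steps = [
    {"enzyme": "RuBisCO", "gene": "gene_a"},
    {"enzyme": "Phosphoglycerate kinase", "gene": "gene_b"},
]

# Source 2: gene expression data (placeholder value, not a real measurement).
expression_by_gene = {"gene_a": 2.5}

merged = mash_up(pathway_steps, expression_by_gene)
for row in merged:
    print(row)
```

The join only works because both sources agree on the identifier, which is rarely the case across real databases.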

Page 25: Bioinformatics 2.0/3.0

Challenges to Data Mashup

• Lack of annotation

• Lack of links

• Lack of link semantics

• Lack of data semantics

• Lack of standards or use of standards

Page 26: Bioinformatics 2.0/3.0

Lack of Semantic Annotation

Kei Tsi Daniel Cheng (this is not me!!)

Kei Cheung (16 years ago)

Kei Cheung (6 months ago)

Page 27: Bioinformatics 2.0/3.0

Lack of Links

collaborators

Page 28: Bioinformatics 2.0/3.0

Lack of Link Semantics

(?)prototyped

Page 29: Bioinformatics 2.0/3.0

Lack of Data Semantics

<html><body> …
<table>
<tr><td>Alcohol Dehydrogenase 1B (class I), beta polypeptide</td><td>ADH1B</td></tr> …
</table> …
</body></html>
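To see why such markup lacks data semantics, consider what a program can recover from it. The sketch below uses Python's standard `html.parser` on a cleaned-up copy of the table row (the `CellCollector` class is a hypothetical helper). The parser returns two strings, but nothing in the markup says which one is the gene name and which one is the symbol.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self._in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_cell:
            self.cells[-1] += data


page = (
    "<html><body><table><tr>"
    "<td>Alcohol Dehydrogenase 1B (class I), beta polypeptide</td>"
    "<td>ADH1B</td>"
    "</tr></table></body></html>"
)

collector = CellCollector()
collector.feed(page)
collector.close()
print(collector.cells)
```

Any meaning ("first column is the name, second is the symbol") has to be hard-coded by whoever writes the scraper.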

Page 30: Bioinformatics 2.0/3.0

Lack of Standards (Use of Standards)

• Different naming rules (based on phenotype, sequence, function, organism, etc.)
  – Armadillo (fruit flies) vs. β-catenin (mice)
  – PSM1 (human) = PSM2 (yeast); PSM1 (yeast) = PSM2 (human)
  – Sonic Hedgehog

• ID proliferation
  – Different ID schemes: 1OF1 (PDB ID) and P06478 (SwissProt ID) correspond to Herpes Thymidine Kinase
  – Lexical variation: GO1234, GO:1234, GO-1234

• Synonyms vs. homonyms
  – Dopamine receptor D2: DRD2, DRD-2, D2
  – PSA: prostate specific antigen, puromycin-sensitive aminopeptidase, psoriatic arthritis, pig serum albumin
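Lexical variation is the easiest of these problems to handle mechanically. A minimal sketch, assuming the variants differ only in the separator and in zero padding: `normalize_go_id` is a hypothetical helper that rewrites each variant into the `GO:` prefix followed by seven digits, the form the Gene Ontology uses for its identifiers.

```python
import re

# Accept "GO" followed by an optional separator and one to seven digits.
_GO_PATTERN = re.compile(r"GO[:\-_ ]?(\d{1,7})", re.IGNORECASE)


def normalize_go_id(raw):
    """Rewrite a lexical variant of a GO identifier as 'GO:' plus 7 digits."""
    match = _GO_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a recognizable GO identifier: {raw!r}")
    return f"GO:{int(match.group(1)):07d}"


variants = ["GO1234", "GO:1234", "GO-1234"]
normalized = [normalize_go_id(v) for v in variants]
print(normalized)
```

Synonyms and homonyms such as PSA cannot be resolved this way; they need context or a curated mapping.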

Page 31: Bioinformatics 2.0/3.0

Web 3.0

Page 32: Bioinformatics 2.0/3.0

Web 3.0

• It refers to a third generation of Internet-based services that emphasize machine-facilitated understanding of information in order to provide a more productive and intuitive user experience.
  – Semantic Web
  – Topic Map

Page 33: Bioinformatics 2.0/3.0

Semantic Web

• "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001

• It provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries

• It is based on the Resource Description Framework (RDF)
  – URIs for naming/identifying web objects
  – Graph structure (directed labeled graph) for connecting web objects

Page 34: Bioinformatics 2.0/3.0

Resource Description Framework (RDF)

• It is a standard data model (directed labeled graph) for representing information (metadata) about resources in the World Wide Web

• In general, it can be used to represent information about “things” or “resources” that can be identified (using URIs) on the Web

• It is intended to provide a simple way to make statements (descriptions) about Web resources

Page 35: Bioinformatics 2.0/3.0

RDF Statement

An RDF statement consists of:

• Subject: resource identified by a URI

• Predicate: property (as defined in a namespace identified by a URI)

• Object: property value (literal) or a resource

A resource can be described by multiple statements.
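The subject/predicate/object structure can be sketched with plain tuples. This is a minimal illustration rather than a real RDF library: `Triple` and `describe` are hypothetical names, and the URIs and values are the ones the deck uses for its ADH1B gene example.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # resource identified by a URI
    predicate: str  # property identified by a URI
    obj: str        # literal value or another resource URI


GENE = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&list_uids=125"
EN = "http://en.wikipedia.org/wiki/"

# Two statements about the same resource.
statements = [
    Triple(GENE, EN + "name", "Alcohol Dehydrogenase 1B (class I), beta polypeptide"),
    Triple(GENE, EN + "synonym", "ADH1B"),
]


def describe(resource, triples):
    """Group every statement about one resource by predicate."""
    description = defaultdict(list)
    for t in triples:
        if t.subject == resource:
            description[t.predicate].append(t.obj)
    return dict(description)


description = describe(GENE, statements)
for predicate, values in description.items():
    print(predicate, "->", values)
```

Because each statement is independent, statements from different sources about the same URI can simply be pooled into one list.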

Page 36: Bioinformatics 2.0/3.0

Graphical & XML Representation

XML:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:en="http://en.wikipedia.org/wiki/">
  <rdf:Description rdf:about="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&amp;cmd=Retrieve&amp;list_uids=125">
    <en:name>Alcohol Dehydrogenase 1B (class I), beta polypeptide</en:name>
    <en:synonym>ADH1B</en:synonym>
  </rdf:Description>
</rdf:RDF>

Graph:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&list_uids=125
  – http://en.wikipedia.org/wiki/name -> “Alcohol Dehydrogenase 1B (class I), beta polypeptide”
  – http://en.wikipedia.org/wiki/synonym -> “ADH1B”

Page 37: Bioinformatics 2.0/3.0

RDF Schema (RDFS)

• RDF Schema terms:
  – Class
  – Property
  – type
  – subClassOf
  – range
  – domain

• Example:
  <DNASequence, type, Class>
  <Promoter, subClassOf, DNASequence>
  <Protein, type, Class>
  <TranscriptionFactor, subClassOf, Protein>
  <Bind, type, Property>
  <Bind, domain, TranscriptionFactor>
  <Bind, range, Promoter>
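What a schema like this buys is inference. The sketch below is a simplified illustration, not a full RDFS reasoner: it applies only the domain, range, and subClassOf rules, and it assumes each class has at most one superclass. The function `infer_types` and the instance names `tf1` and `promoter1` are hypothetical.

```python
schema = [
    ("DNASequence", "type", "Class"),
    ("Promoter", "subClassOf", "DNASequence"),
    ("Protein", "type", "Class"),
    ("TranscriptionFactor", "subClassOf", "Protein"),
    ("Bind", "type", "Property"),
    ("Bind", "domain", "TranscriptionFactor"),
    ("Bind", "range", "Promoter"),
]


def infer_types(schema, facts):
    """Derive 'type' statements from domain, range, and subClassOf."""
    domain = {s: o for s, p, o in schema if p == "domain"}
    range_ = {s: o for s, p, o in schema if p == "range"}
    parent = {s: o for s, p, o in schema if p == "subClassOf"}

    inferred = set()
    for s, p, o in facts:
        if p in domain:   # the subject of p belongs to p's domain class
            inferred.add((s, "type", domain[p]))
        if p in range_:   # the object of p belongs to p's range class
            inferred.add((o, "type", range_[p]))

    # Walk up the subClassOf chain; 'seen' guards against accidental cycles.
    for resource, _, cls in list(inferred):
        seen = {cls}
        while cls in parent and parent[cls] not in seen:
            cls = parent[cls]
            seen.add(cls)
            inferred.add((resource, "type", cls))
    return inferred


# A single hypothetical fact: some factor binds some promoter.
facts = [("tf1", "Bind", "promoter1")]
inferred = infer_types(schema, facts)
for triple in sorted(inferred):
    print(triple)
```

From one asserted fact, four type statements follow without anyone having typed `tf1` or `promoter1` by hand.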

Page 38: Bioinformatics 2.0/3.0

Ontologies

• In both computer science and information science, an ontology is a representation of a set of concepts within a domain and the relationships between those concepts.

• It is a shared conceptualization of a domain

• Ontologies are commonly encoded using ontology languages.

Page 39: Bioinformatics 2.0/3.0

Web Ontology Language (OWL)

• Latest standard in ontology languages from the W3C

• Built on top of RDF

• OWL semantically extends RDF while it is syntactically the same as RDF

• Three species of OWL
  – OWL Lite
  – OWL DL
  – OWL Full

Page 40: Bioinformatics 2.0/3.0

OWL > RDF/RDFS

• Cardinality restrictions (e.g., a gene may have more than one transcription factor binding site)

• Disjointness of classes (e.g., a sequence region may be classified as either an intron or an exon, but not both)

• Other OWL constructs
  – uniqueness (e.g., a GO term can have only one GO identifier)
  – unionOf (e.g., a gene may be the unionOf introns and exons)
  – sameAs: specifying a synonymous relationship between classes (e.g., “Cerebellar Purkinje Cell” sameAs “Purkinje Neuron”)
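Synonym statements of this kind are transitive, so a consumer usually wants the full set of names that denote the same thing. A minimal sketch, assuming the statements arrive as plain pairs of names: `same_as_groups` is a hypothetical helper that uses a union-find structure, and "Purkinje Cell" and the ADH1B/ADH2 pair are added only to show chaining and a second group. In OWL itself, `owl:sameAs` relates individuals, while equivalence between classes is stated with `owl:equivalentClass`.

```python
def same_as_groups(pairs):
    """Merge pairwise sameAs statements into groups of equivalent names."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for term in parent:
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())


pairs = [
    ("Cerebellar Purkinje Cell", "Purkinje Neuron"),  # from the slide
    ("Purkinje Neuron", "Purkinje Cell"),             # hypothetical extra synonym
    ("ADH1B", "ADH2"),                                # hypothetical second group
]
groups = same_as_groups(pairs)
for group in groups:
    print(sorted(group))
```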

Page 41: Bioinformatics 2.0/3.0

Topic Map

• A topic map (an ISO standard) is used to represent information using topics (concepts), associations, and occurrences

• It is used to organize information in a way that can be optimized for navigation.

association

occurrence
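The three building blocks can be sketched as a small data structure. This is an illustrative model only, not an implementation of the ISO standard or of XTM: the class names, the `neighbors` method, and the `example.org` occurrence URL are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    occurrences: list = field(default_factory=list)  # links to resources about the topic


@dataclass(frozen=True)
class Association:
    kind: str
    source: str
    target: str


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)

    def add_topic(self, name, *occurrences):
        topic = self.topics.setdefault(name, Topic(name))
        topic.occurrences.extend(occurrences)
        return topic

    def associate(self, kind, source, target):
        self.add_topic(source)
        self.add_topic(target)
        self.associations.append(Association(kind, source, target))

    def neighbors(self, name):
        """Return (association kind, other topic) pairs for navigation."""
        found = []
        for a in self.associations:
            if a.source == name:
                found.append((a.kind, a.target))
            elif a.target == name:
                found.append((a.kind, a.source))
        return found


tm = TopicMap()
tm.add_topic("Purkinje Neuron", "http://example.org/purkinje-neuron")  # hypothetical URL
tm.associate("located in", "Purkinje Neuron", "Cerebellum")
print(tm.neighbors("Cerebellum"))
print(tm.topics["Purkinje Neuron"].occurrences)
```

Navigation then amounts to following associations from topic to topic and opening the occurrences attached to each one.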

Page 42: Bioinformatics 2.0/3.0

Neuroscience Topic Map

Page 43: Bioinformatics 2.0/3.0

Topic Map Encoding/Querying

• XML Topic Maps (XTM)

• Topic Map Query Language (TMQL)

Page 44: Bioinformatics 2.0/3.0

Visual Topic Maps

• A Visual Topic Map can be defined as a topic map including visual topics. A visual topic is defined by a topic name which refers to a visual content.

Page 45: Bioinformatics 2.0/3.0

NCBI Site Map

Page 46: Bioinformatics 2.0/3.0

Mosaic of Chinese Characters in Stories about the Meaning of Ideograms

Page 47: Bioinformatics 2.0/3.0
Page 48: Bioinformatics 2.0/3.0

Visualization of the del.icio.us Tags in an Interactive Graph

Page 49: Bioinformatics 2.0/3.0

Combining Semantic Web and Topic Map

Topic Map: visualization

Semantic Web: machine reasoning

Both: knowledge organization & representation (mapping between XTM and RDF/OWL)

Page 50: Bioinformatics 2.0/3.0

Web 2.0 Meets Web 3.0

• Folksonomy meets ontology
  – Tags can evolve into standard heavy-weight ontologies, while light-weight ontologies can be applied to tagging

• Human readability meets machine readability
  – Visual network vs. semantic network

• Social network meets semantic network
  – FOAF, semantic wiki

• Syntactic mashup meets semantic mashup
  – Dapper and Yahoo! Pipes may become ontologically aware

Page 51: Bioinformatics 2.0/3.0

Conclusions

• Web 2.0 and 3.0 provide a platform for data/tool sharing and integration (mashup) and scientific collaboration

• More use cases are needed

• Question?
  – While Web 1.0 has played an important role in organizing/disseminating information produced by HGP, can Web 2.0/3.0 offer more to present-day “big science” projects like ENCODE?

Page 52: Bioinformatics 2.0/3.0

The End