
Biotea poster biolinks at ISMB 2013


Biotea: RDFizing PubMed Central in Support for the Paper as an Interface to the Web of Data

Leyla Garcia Castro, Departamento de Lenguajes y Sistemas Informáticos, Universitat Jaume I

Alexander Garcia, Casey Mclaughlin, Institute for Digital Information, Florida State University, Tallahassee

Corresponding author: [email protected]

In a nutshell, Biotea at http://biotea.idiginfo.org

• Is a semantic dataset for the full-text, open-access subset of PubMed Central.

• Makes extensive use of existing ontologies and semantic enrichment services.

• Supports the generation of self-describing, machine-readable scholarly documents.

• Comprises a flexible and adaptable set of tools for metadata enrichment and semantic processing of biomedical documents.

• Provides a semantically rich and highly interconnected dataset with self-describing content.

AGC and CM have been funded by US DoD Grant MOMRP w81xwh-10-2-0181.

“Scholarly data and documents are of most value when they are interconnected rather than independent.” (Christine L. Borgman)

Consuming the dataset, a first prototype

Graph-based retrieval for the term “catalase”; only shared terms with more than 30 associated biological terms are included in the results.

Search and retrieval based on human gene names: the term is resolved with GeneWiki, and the associated UniProt accession is used in the query

RDF4PMC and Bio2RDF

[Figure: prototype screenshots. Panels: 1. Retrieval: metadata + cloud of annotations; 2. Enriched content, facts-based reading; 3. Navigating the neighborhood. Labels: interactive zone (enriched content based on annotations is displayed in the interactive zone), contextual reading, graphical tools.]

RDF4PMC, our workflow

[Figure: workflow diagram. Labels: NXML; RDFization (1. Metadata & content; 2. Semantic content enrichment); Metadata, BIBO; Content, CNT; Provenance, PROV-O, VOID; Annotation; Enriched content; RDFized article.]

Consuming the dataset, SPARQL and API Retrieval Service

A list of terms and their related topics http://biotea.idiginfo.org/api/terms

A list of topics and their related vocabularies http://biotea.idiginfo.org/api/topics

All topics related to a term e.g., http://biotea.idiginfo.org/api/topics?term=cancer

All vocabularies related to a term e.g., http://biotea.idiginfo.org/api/vocabularies?term=cancer

All terms that start with a specific string (for autocompletion) e.g., http://biotea.idiginfo.org/api/terms?prefix=canc

All topics related to a vocabulary e.g., http://biotea.idiginfo.org/api/topics?vocabulary=po

RDF of articles that include a term e.g., http://biotea.idiginfo.org/api/articles?term=cancer

Count of articles that include a term e.g., http://biotea.idiginfo.org/api/articles?term=cancer&count=true

A list of vocabularies and their prefixes http://biotea.idiginfo.org/vocabularies

RDF of articles that include a vocabulary e.g., http://biotea.idiginfo.org/api/articles?vocabulary=po
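The endpoints listed above share one pattern: a resource name under /api plus optional query parameters. The following is a minimal sketch of how a client might assemble those URLs. The helper name build_url and the ENDPOINTS set are hypothetical, introduced only for illustration; the base URL, resource names, and parameters (term, prefix, vocabulary, count) are the ones shown on the poster. The sketch only builds URLs and sends no request, since the service may not be reachable.

```python
from urllib.parse import urlencode

# Base path as printed on the poster.
BASE_URL = "http://biotea.idiginfo.org/api"

# Resource names that appear under /api in the list above.
ENDPOINTS = {"terms", "topics", "vocabularies", "articles"}


def build_url(endpoint, **params):
    """Return the request URL for one endpoint (hypothetical helper).

    Keyword arguments become query parameters, e.g. term="cancer".
    """
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    # The poster writes count=true in lowercase, so booleans are lowercased.
    cleaned = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    query = urlencode(cleaned)
    return f"{BASE_URL}/{endpoint}" + (f"?{query}" if query else "")


if __name__ == "__main__":
    print(build_url("terms", prefix="canc"))
    print(build_url("topics", vocabulary="po"))
    print(build_url("articles", term="cancer", count=True))
```

Using urlencode rather than string concatenation means that terms containing spaces or other reserved characters are escaped correctly.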

SPARQL query:

    SELECT DISTINCT ?pmid
    WHERE {
      ?article a bibo:AcademicArticle ;
               bibo:pmid ?pmid .
      ?annotation a aot:ExactQualifier ;
                  ao:annotatesResource ?article ;
                  ao:hasTopic <http://purl.obolibrary.org/obo/CHEBI_60004> .
    }

Query expressed in natural language:

Retrieve the PubMed identifiers of those articles that have been semantically annotated with the biological entity CHEBI:60004. The semantic annotation comes from the occurrence of the term “mixture” in any paragraph of the retrieved articles.
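The same query pattern can be reused for other ontology terms by swapping the topic URI. The sketch below builds the query text for an identifier such as CHEBI:60004. The function name build_query is hypothetical, and the namespace URIs bound to bibo, ao, and aot are assumptions: the poster does not print its PREFIX declarations, so they should be checked against the dataset before use. No SPARQL endpoint address is given on the poster, so the sketch stops at producing the query string.

```python
# Namespace URIs are assumptions (the poster omits its PREFIX lines);
# verify them against the Biotea dataset before running the query.
PREFIXES = {
    "bibo": "http://purl.org/ontology/bibo/",
    "ao": "http://purl.org/ao/",
    "aot": "http://purl.org/ao/types/",
}

# Matches the form of the topic URI used in the poster's query.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def build_query(curie):
    """Build the poster's query for an identifier such as 'CHEBI:60004'.

    Hypothetical helper: it only assembles text and contacts no endpoint.
    """
    prefix, _, local_id = curie.partition(":")
    if not prefix or not local_id:
        raise ValueError(f"expected PREFIX:ID, got {curie!r}")
    topic = f"{OBO_BASE}{prefix}_{local_id}"
    header = "\n".join(f"PREFIX {name}: <{uri}>" for name, uri in PREFIXES.items())
    body = (
        "SELECT DISTINCT ?pmid\n"
        "WHERE {\n"
        "  ?article a bibo:AcademicArticle ;\n"
        "           bibo:pmid ?pmid .\n"
        "  ?annotation a aot:ExactQualifier ;\n"
        "              ao:annotatesResource ?article ;\n"
        f"              ao:hasTopic <{topic}> .\n"
        "}"
    )
    return header + "\n" + body


if __name__ == "__main__":
    print(build_query("CHEBI:60004"))
```

The resulting string could be submitted to whichever SPARQL endpoint hosts the dataset.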