Linked data based semantic annotation using Drupal and Apache Stanbol

DESCRIPTION

My presentation from Drupalaton 2013: http://drupalaton.hu/schedule#speaker-30

This session focuses on implementing semantic services (automatic content enhancement, autotagging, content recommendation, reasoning) based on Linked Data datasets, using the integration of Drupal with Apache Stanbol. During the presentation the audience will learn about:

the main features of Apache Stanbol and its integration with Drupal

how to discover and use custom/domain-specific Linked Data datasets with Apache Stanbol/Drupal

how to build an advanced semantic processing chain in Apache Stanbol that automatically annotates Drupal entities

how to implement a content recommendation/reasoning feature for Drupal based on Apache Stanbol services

Apache Stanbol is an open source software stack designed to provide a powerful semantic engine via RESTful services that return results as RDF (Resource Description Framework) and JSON. Unlike existing proprietary, commercially oriented solutions such as OpenCalais, Apache Stanbol is highly customizable and can be trained to provide semantic services for virtually any language.

Drupal and Apache Stanbol

LINKED DATA BASED SEMANTIC ANNOTATION

Gabriel Dragomir

Sunday, August 18, 13

The Semantic Web

Tim Berners-Lee:

“The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a Web of data that can be processed directly or indirectly by machines.”

What’s the hype?

Most organizations need to organize, analyze and relate huge amounts of textual, unstructured, scattered data

Examples:

keyword extraction from content: annotate abstracts

text categorization: organize big volumes of text based on a thesaurus

media monitoring of tags: occurrences of a specific keyword on social media channels

Linked data

http://lod-cloud.net/

Linked data

The Linking Open Data project started in 2007

Aimed at building the Web of Data by:

identifying open access data sets

converting them into RDF vocabularies

publishing them as open access data sets

Linked data ecosystem

Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/

Provides a conceptual map of the vocabularies

Various providers: libraries, governmental actors, NGOs

Linked data ecosystem

Where to find other data sets?

http://www.w3.org/2001/sw/wiki/SKOS/Datasets

Swoogle: http://swoogle.umbc.edu/

PoolParty: http://vocabulary.semantic-web.at

Linked data at work!

Semantic annotation

Creates specific metadata that enable new ways to retrieve and aggregate information

Annotations are based on a conceptual scheme, an ontology (e.g. FOAF, Dublin Core)

For more on ontologies see: http://www.w3.org/wiki/Good_Ontologies

The annotations express semantic relationships, e.g. rdf:type, owl:sameAs
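As a minimal illustration (not Stanbol's own data model), the sketch below stores annotations as plain subject/predicate/object triples and follows owl:sameAs links to find every document annotated with an entity or any of its aliases, which is one of the "new ways to retrieve and aggregate information" mentioned above. The document and person URIs under http://example.org/ are hypothetical; the rdf, owl, foaf and dcterms namespace URIs are the standard ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Hypothetical annotation triples: two documents annotated with entities
# from different vocabularies that denote the same person.
TRIPLES = {
    ("http://example.org/node/1", DCT_SUBJECT, "http://dbpedia.org/resource/Tim_Berners-Lee"),
    ("http://example.org/node/2", DCT_SUBJECT, "http://example.org/people/timbl"),
    ("http://dbpedia.org/resource/Tim_Berners-Lee", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/people/timbl", OWL_SAME_AS, "http://dbpedia.org/resource/Tim_Berners-Lee"),
}


def aliases(entity, triples):
    """Return the entity plus everything reachable via owl:sameAs, in either direction."""
    seen = {entity}
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            # owl:sameAs is symmetric, so check both directions of each link.
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def documents_about(entity, triples):
    """Documents whose dcterms:subject is the entity or one of its owl:sameAs aliases."""
    names = aliases(entity, triples)
    return sorted(s for s, p, o in triples if p == DCT_SUBJECT and o in names)


print(documents_about("http://dbpedia.org/resource/Tim_Berners-Lee", TRIPLES))
# ['http://example.org/node/1', 'http://example.org/node/2']
```

Without the owl:sameAs link, a lookup for the DBpedia URI would only find node 1; the explicit relationship is what lets both annotations be aggregated.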

Semantic annotation

Most common uses:

Named Entity Linking: limited to recognizing entities of type person, organization and place (e.g. OpenCalais)

Entityhub Linking: annotation based on vocabularies, with no limitation on entity types. Requires more natural language processing before annotation.
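To make the difference concrete, the following simplified sketch links text against a vocabulary instead of a fixed set of entity types: any label in the vocabulary can match, multi-word labels are preferred over shorter ones, and a tokenization step runs first. The VOCABULARY dict and its URIs are hypothetical stand-ins for an EntityHub site, and the greedy longest-match strategy is an illustrative choice, not Stanbol's entity linking algorithm.

```python
import re

# Hypothetical vocabulary: lower-cased label -> concept URI.
# In Stanbol this role is played by an EntityHub site (e.g. a DBpedia or SKOS index).
VOCABULARY = {
    "linked data": "http://example.org/vocab/LinkedData",
    "semantic web": "http://example.org/vocab/SemanticWeb",
    "drupal": "http://example.org/vocab/Drupal",
    "rdf": "http://example.org/vocab/RDF",
}


def tokenize(text):
    """Lower-cased word tokens; a real chain would use a proper NLP tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_entities(text, vocabulary, max_len=3):
    """Greedy longest-match lookup of token n-grams against vocabulary labels."""
    tokens = tokenize(text)
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = " ".join(tokens[i:i + n])
            if label in vocabulary:
                links.append((label, vocabulary[label]))
                i += n
                break
        else:
            i += 1  # no label starts here; move to the next token
    return links
```

For example, `link_entities("Drupal publishes Linked Data as RDF for the Semantic Web.", VOCABULARY)` yields the labels "drupal", "linked data", "rdf" and "semantic web" with their URIs, matching "linked data" as a single two-word concept.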

Apache Stanbol on the fly

Here comes Apache Stanbol

A new approach:

modular semantic analysis of documents

processing components can be built for virtually any language

flexible workflows via semantic annotation chains

any vocabulary (Linked Data, custom) can be used
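An enhancement chain in Stanbol is an ordered set of enhancement engines, each adding its results to the content being processed. The sketch below mimics that idea in plain Python under stated assumptions: the ContentItem class, the engine functions, the toy stopword-based language detector and the capitalization heuristic are all hypothetical stand-ins, not Stanbol engines or its Java API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ContentItem:
    """Text being enhanced plus the metadata that engines attach to it."""
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


Engine = Callable[[ContentItem], None]

# Toy stopword lists; a real chain would use a language identification engine.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "with"},
    "de": {"der", "die", "und", "ist", "mit"},
}


def detect_language(item: ContentItem) -> None:
    words = item.text.lower().split()
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    item.metadata["language"] = max(scores, key=scores.get)


def tokenize(item: ContentItem) -> None:
    item.metadata["tokens"] = [w.strip(".,;:!?") for w in item.text.split()]


def link_capitalized(item: ContentItem) -> None:
    # Placeholder "linking" engine: treats capitalized tokens as candidate entities.
    # It depends on the tokenize engine having run earlier in the chain.
    tokens = item.metadata["tokens"]
    item.metadata["candidates"] = [t for t in tokens if t[:1].isupper()]


def run_chain(item: ContentItem, chain: List[Engine]) -> ContentItem:
    """Run engines in order; later engines may read what earlier ones produced."""
    for engine in chain:
        engine(item)
    return item


chain = [detect_language, tokenize, link_capitalized]
result = run_chain(ContentItem("Drupal integrates with Apache Stanbol and the Entityhub."), chain)
print(result.metadata["language"], result.metadata["candidates"])
# en ['Drupal', 'Apache', 'Stanbol', 'Entityhub']
```

Order matters here: `link_capitalized` reads the tokens written by `tokenize`, so swapping them raises a `KeyError`. Swapping in a different engine (another language, another vocabulary) only means changing the list.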

From IKS to Apache Stanbol

IKS (Interactive Knowledge Stack for small to medium CMS providers): an EU-funded consortium

An open source software stack written in Java

Goal: extract and process semantic data from documents

Project undergoing incubation at the Apache Software Foundation

http://stanbol.apache.org

Service-oriented architecture

Stanbol is designed to offer service-oriented integration

RESTful web services API returning RDF or JSON/JSON-LD

Each component exposes an endpoint independently

Compliant with the Open Services Gateway initiative (OSGi) via Apache Felix and Apache Sling

Remote component management

Implementation

OSGi layer: Apache Felix and Apache Sling

Build environment: Apache Maven

RDF framework: Apache Clerezza

Triple store, reasoning engine: Apache Jena

Indexing and semantic search: Apache Solr

Content analysis/metadata extraction: Apache Tika

Natural language processing: Apache OpenNLP

Architecture

Components

Semantic layer:

Enhancer, EntityHub, ContentHub

Enhancement engines: internal, 3rd party

User interfaces

Knowledge integration (rule sets, reasoners)

Storage integration

Content enhancement

Examples:

retrieve additional metadata for a piece of content

identify the language of a text

extract entities (persons, places, organizations)

create annotations to external sources

use 3rd party services for named entity recognition

Drupal meets Stanbol

Several modules implement RDF support, allowing data to be sent to Stanbol for semantic annotation

Taxonomy system allows for complex annotation

Fieldable taxonomy terms allow for storage of complex semantic data

User scenarios

Semantic indexing via Stanbol (SOLR yard)

Content enrichment with semantically related information (documents, factual data, images etc.)

Tag as you type: dynamic annotation of text in editors

How it works

a POST request sends content via the REST API

the content is processed by an enhancement chain

results are returned as JSON-LD, RDF/XML, RDF/JSON, etc.

JSON-LD (JavaScript Object Notation for Linked Data) is a simple, human-readable Linked Data transport format

for best results, an enhancement chain should perform language detection, tokenization and POS tagging before semantic annotation

http://drupalaton.jelastic.dogado.eu/stanbol/enhancer
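The POST step can be sketched with Python's standard library. This helper only builds the request so it can be inspected without a running server. Assumptions: the default base URL http://localhost:8080/enhancer (a local Stanbol launcher; the demo URL above can be passed instead), the /chain/&lt;name&gt; path for selecting a named enhancement chain, and "default" as an illustrative chain name. The Accept header picks the RDF serialization of the response; text/turtle is used here, and which other formats are available depends on the Stanbol installation.

```python
import urllib.request
from typing import Optional


def build_enhancer_request(text: str,
                           base_url: str = "http://localhost:8080/enhancer",
                           chain: Optional[str] = None,
                           accept: str = "text/turtle") -> urllib.request.Request:
    """Build (but do not send) a POST request to a Stanbol enhancer endpoint.

    base_url and the chain name are assumptions about the deployment;
    a named chain is addressed as <base_url>/chain/<name>.
    """
    url = base_url.rstrip("/")
    if chain:
        url += "/chain/" + chain
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # plain text body to be enhanced
        headers={"Content-Type": "text/plain; charset=UTF-8", "Accept": accept},
        method="POST",
    )


req = build_enhancer_request("Paris is the capital of France.", chain="default")
# To actually call a running Stanbol instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to switch chains or response formats per call, for example one chain per vocabulary.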

Drupal integration

Source: blog.iks-project.eu

Drupal distribution: IKS CE

IKS CE distribution by Wolfgang Ziegler (fago) and Stéphane Corlosquet (scor)

Components:

Search API Stanbol

VIE.js - semantic annotation UI

https://drupal.org/project/iksce

http://drupal.org/project/vie

http://drupal.org/project/search_api_stanbol

Search API Stanbol

enables the indexing of Drupal entities such as nodes, users, taxonomy terms, files, etc. in Stanbol EntityHub.

data sent as RDF

data can be mashed up with data from other sources (Managed Sites, Remote Sites)
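As an illustration of "data sent as RDF", the sketch below serializes a Drupal node, represented here as a plain dict, into N-Triples lines. It does not reproduce the Search API Stanbol module's actual mapping or its upload to the EntityHub: the dict layout, the base URI and the choice of sioc:Item, dcterms:title and dcterms:subject are hypothetical, picked only because they are common vocabularies for this kind of content.

```python
from typing import Any, Dict, List

DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC_ITEM = "http://rdfs.org/sioc/ns#Item"


def literal(value: str) -> str:
    """Quote a plain literal, escaping characters N-Triples does not allow raw."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def node_to_ntriples(node: Dict[str, Any], base: str) -> List[str]:
    """Serialize a hypothetical Drupal node dict as N-Triples lines."""
    subject = f"<{base}/node/{node['nid']}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{SIOC_ITEM}> .",
        f"{subject} <{DCT}title> {literal(str(node['title']))} .",
    ]
    # Taxonomy terms already linked to Linked Data URIs become dcterms:subject triples.
    for term_uri in node.get("tags", []):
        lines.append(f"{subject} <{DCT}subject> <{term_uri}> .")
    return lines


node = {"nid": 42, "title": 'Drupal meets "Stanbol"',
        "tags": ["http://dbpedia.org/resource/Drupal"]}
print("\n".join(node_to_ntriples(node, "http://example.org")))
```

Because the tag is a DBpedia URI rather than a local term ID, the indexed entity can be mashed up with any other source that uses the same URI.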

VIE.js

“Vienna IKS Editables”

JavaScript library for implementing decoupled Content Management Systems and semantic interaction in web applications.

Monolithic vs Decoupled Content Management Systems

source: Henri Bergius - http://bergie.iki.fi

Demo setup

we store Drupal entities in a SOLR index

annotations are to be made based on:

DBpedia - bundled with Apache Stanbol

a custom vocabulary of terms related to semantic web - Social Semantic Web Thesaurus

SemWeb is imported as a SOLR index into Apache Stanbol
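Importing a SKOS thesaurus for linking essentially means building an index from labels to concept URIs. The sketch below shows that idea on a few N-Triples lines: skos:prefLabel and skos:altLabel values in the chosen language are lower-cased and mapped to their concepts. It is an assumption-laden simplification: the concept URIs are hypothetical (not the real thesaurus URIs), the regular expression handles only IRI subjects and simple quoted literals, and it is not how Stanbol's indexing tool builds a Solr index.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

SKOS = "http://www.w3.org/2004/02/skos/core#"
LABEL_PREDICATES = {SKOS + "prefLabel", SKOS + "altLabel"}

# Matches: <subject> <predicate> "literal"(@lang)? .
# Deliberately minimal: no blank nodes, datatypes or escaped quotes.
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"(?:@([A-Za-z-]+))?\s*\.\s*$')


def build_label_index(lines: Iterable[str], lang: str = "en") -> Dict[str, Set[str]]:
    """Map lower-cased SKOS labels in `lang` to the concept URIs that carry them."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        m = TRIPLE.match(line.strip())
        if not m:
            continue  # IRI objects, comments and anything else the regex does not cover
        subject, predicate, label, label_lang = m.groups()
        # Untagged labels are accepted for any language.
        if predicate in LABEL_PREDICATES and (label_lang or lang) == lang:
            index[label.lower()].add(subject)
    return dict(index)


SAMPLE = [
    '<http://example.org/semweb/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Linked Data"@en .',
    '<http://example.org/semweb/1> <http://www.w3.org/2004/02/skos/core#altLabel> "LOD"@en .',
    '<http://example.org/semweb/2> <http://www.w3.org/2004/02/skos/core#prefLabel> "Ontologie"@de .',
    '<http://example.org/semweb/2> <http://www.w3.org/2004/02/skos/core#broader> <http://example.org/semweb/1> .',
]
index = build_label_index(SAMPLE)
```

Indexing alternative labels alongside preferred labels is what lets an abbreviation such as "LOD" in a text resolve to the same concept as "Linked Data".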

Custom vocabularies

Social Semantic Web Thesaurus

1959 concepts related to semantic web

Author: Andreas Blumauer

http://vocabulary.semantic-web.at/semweb.html

http://vocabulary.semantic-web.at/semweb/8.visual

Demo

index Drupal entities in Apache Stanbol

retrieve annotated entities via REST API

annotate entities using DBpedia and SemWeb indexes

edit Drupal entities and annotate on the fly

retrieve linked data tag recommendations
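The last demo step, turning enhancement results into tag recommendations, can be sketched as: keep entity annotations above a confidence threshold, deduplicate by entity URI and rank by confidence. The property names fise:entity-reference, fise:entity-label and fise:confidence come from Stanbol's enhancement vocabulary, but this sketch assumes the response has already been parsed and flattened into dicts keyed by full property IRIs, a simplification of the real JSON-LD output, which needs a proper JSON-LD/RDF parser. The sample data, threshold and limit are hypothetical.

```python
from typing import Dict, List, Tuple

FISE = "http://fise.iks-project.eu/ontology/"


def recommend_tags(annotations: List[Dict[str, object]], min_confidence: float = 0.5,
                   limit: int = 5) -> List[Tuple[str, str, float]]:
    """Return (label, entity URI, confidence) for the best-scoring distinct entities."""
    best: Dict[str, Tuple[str, float]] = {}
    for ann in annotations:
        ref = ann.get(FISE + "entity-reference")
        if ref is None:
            continue  # e.g. a text annotation without a linked entity
        uri = str(ref)
        conf = float(ann.get(FISE + "confidence", 0.0))
        label = str(ann.get(FISE + "entity-label", uri))
        # Keep only the highest-confidence suggestion per entity.
        if conf >= min_confidence and (uri not in best or conf > best[uri][1]):
            best[uri] = (label, conf)
    ranked = sorted(((lbl, uri, c) for uri, (lbl, c) in best.items()),
                    key=lambda t: t[2], reverse=True)
    return ranked[:limit]


SAMPLE = [
    {FISE + "entity-reference": "http://dbpedia.org/resource/Drupal",
     FISE + "entity-label": "Drupal", FISE + "confidence": 0.92},
    {FISE + "entity-reference": "http://dbpedia.org/resource/Drupal",
     FISE + "entity-label": "Drupal", FISE + "confidence": 0.60},
    {FISE + "entity-reference": "http://dbpedia.org/resource/Semantic_Web",
     FISE + "entity-label": "Semantic Web", FISE + "confidence": 0.81},
    {FISE + "entity-reference": "http://dbpedia.org/resource/Lake_Balaton",
     FISE + "entity-label": "Lake Balaton", FISE + "confidence": 0.30},
    {FISE + "selected-text": "Stanbol"},
]
print(recommend_tags(SAMPLE))
```

With the default threshold, only Drupal and Semantic Web are suggested; the low-confidence Lake Balaton match and the annotation without an entity reference are dropped.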

Questions?

Contact me

[email protected]

twitter: gabidrg

Thank you!

http://mures2013.drupalcamp.ro
