Copyright 2009 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute www.deri.ie Semantic Search on Heterogeneous Wiki Systems Fabrizio Orlandi, Alexandre Passant DERI – Galway Wikimania 2010 Gdansk – 10th July 2010

Semantic search on heterogeneous wiki systems - Wikimania 2010


DESCRIPTION

presented by Fabrizio Orlandi at the Wikimania 2010 conference in Gdansk


Page 1: Semantic search on heterogeneous wiki systems - Wikimania 2010

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Semantic Search on Heterogeneous Wiki Systems

Fabrizio Orlandi, Alexandre Passant

DERI – Galway

Wikimania 2010

Gdansk – 10th July 2010

Page 2: Semantic search on heterogeneous wiki systems - Wikimania 2010

Interlinking wikis

All wikis share a wide common knowledge, within many different wiki platforms:

MoinMoin, TWiki, DokuWiki

Widely used even in the workplace...

Atlassian Confluence, TracWiki, XWiki

All with different structures, platform dependent, all disconnected...

Page 3: Semantic search on heterogeneous wiki systems - Wikimania 2010

Many isolated communities of users and their data

* Source: Pidgin Technologies, www.pidgintech.com

Wikis are also disconnected from other social media websites

Page 4: Semantic search on heterogeneous wiki systems - Wikimania 2010

Interlinking wikis

We propose a new approach based on Linked Data principles to solve such issues and to enable semantic search across heterogeneous wiki systems

Page 5: Semantic search on heterogeneous wiki systems - Wikimania 2010

Wiki Models

Several semantic models have been implemented and used within specific semantic wiki platforms, e.g.: Semantic MediaWiki

as well as efforts to create generic ontology models:

• WikiOnt ontology (DERI)

• WIF (Wiki Interchange Format) ontology (Völkel, Oren - 1st Workshop on Semantic Wikis - 2006)

But they are all specific to wikis and not open to other social websites

Page 6: Semantic search on heterogeneous wiki systems - Wikimania 2010

SIOC: Semantically-Interlinked Online Communities

http://sioc-project.org

• A project developed by DERI to semantically describe the content and structure of community sites

• It aims to create new connections between online discussion posts and items, forums, blogs... and wikis.

• In particular the SIOC ontology is not specific to wikis and is widely used on the Web

• Adopted in a framework of more than 50 applications, deployed on over 400 sites, including Drupal 7 and Yahoo! SearchMonkey

Page 7: Semantic search on heterogeneous wiki systems - Wikimania 2010

Extending the SIOC ontology

We decided to extend the SIOC ontology to make it compliant with wikis and make wikis interoperable and linkable to other social objects.

Advantages:

• Integration with all the existing semantic data

• Ability to run the same queries to find items on:

– wikis, forums, blogs, social networking sites, etc.

First we considered the typical and relevant features of wikis in terms of structure and social interactions.

Page 8: Semantic search on heterogeneous wiki systems - Wikimania 2010

Relevant wiki features

• Multi-authoring: multiple users edit the same content collaboratively.

• Categories: hierarchical organization of articles. A solution: the SKOS vocabulary (a W3C recommendation to model hierarchical structures between various categories) and the sioct:Category class.

• Social Tagging: non-organized but dynamic organization process. The properties sioc:topic (using URIs) and dc:subject (using keywords) can be used to represent tags related to a particular wiki page.

[Figure: tagging example for the page http://wiki.../The_Clash, showing sioc:topic with the URI http://wiki.../Punk_rock, dc:subject with the keyword "Punk rock", and tag:hasTag.]
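The category and tagging choices on this slide can be sketched as plain triples. The snippet below is a minimal illustration in Python, not the authors' exporter. The base URL `http://wiki.example.org/`, the function names and the choice to attach the category to the page with sioc:topic are assumptions made for the example; only the namespace URIs are the published ones for SIOC, SIOC Types, SKOS and Dublin Core.

```python
# Namespace URIs of the vocabularies named on the slide.
SIOC = "http://rdfs.org/sioc/ns#"
SIOCT = "http://rdfs.org/sioc/types#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical wiki base URL, used only for this illustration.
WIKI = "http://wiki.example.org/"


def describe_page(page, tags, category=None, broader=None):
    """Return (subject, predicate, object, is_literal) tuples for one page."""
    subject = WIKI + page
    triples = []
    for tag in tags:
        # sioc:topic points to a URI, dc:subject keeps the plain keyword.
        triples.append((subject, SIOC + "topic", WIKI + tag.replace(" ", "_"), False))
        triples.append((subject, DC + "subject", tag, True))
    if category is not None:
        cat_uri = WIKI + "Category:" + category.replace(" ", "_")
        triples.append((cat_uri, RDF_TYPE, SIOCT + "Category", False))
        # Assumption: the page is attached to its category with sioc:topic.
        triples.append((subject, SIOC + "topic", cat_uri, False))
        if broader is not None:
            parent = WIKI + "Category:" + broader.replace(" ", "_")
            # skos:broader expresses the category hierarchy.
            triples.append((cat_uri, SKOS + "broader", parent, False))
    return triples


def to_ntriples(triples):
    """Serialize the tuples as N-Triples lines."""
    lines = []
    for s, p, o, is_literal in triples:
        if is_literal:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"%s"' % escaped
        else:
            obj = "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)


example = to_ntriples(
    describe_page("The_Clash", ["Punk rock"], category="Punk bands", broader="Bands")
)
```

Printing `example` gives one N-Triples statement per line, which any RDF store can load. Keeping both sioc:topic and dc:subject means a consumer can match either on the tag URI or on the bare keyword.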

Page 9: Semantic search on heterogeneous wiki systems - Wikimania 2010

Relevant wiki features

• Page versioning: each page has an associated page history. We use the sioc:next_version, sioc:previous_version and sioc:latest_version properties. We added two transitive (OWL) properties, sioc:earlier_version and sioc:later_version, and defined sioc:next_version and sioc:previous_version as subproperties of sioc:later_version and sioc:earlier_version respectively.

• Discussions: pages where people can discuss the subject of the article. We added a new sioc:has_discussion property, with domain sioc:Item and open range.

• Backlinks (or "what links here"): internal wiki links pointing to the same wiki article. We use the already existing sioc:links_to property.
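Because sioc:next_version is a subproperty of the transitive sioc:later_version, a reasoner can derive every later revision of a page from the single-step links alone. The sketch below imitates that inference in plain Python over a hypothetical revision chain. The revision identifiers and the function name are made up for illustration, and in practice this work would be left to an OWL reasoner in the triple store.

```python
def later_versions(next_version):
    """Derive sioc:later_version pairs from sioc:next_version links.

    next_version maps each revision to its direct successor. Every
    next_version statement is also a later_version statement, and
    later_version is transitive, so walking the chain yields all pairs.
    """
    inferred = set()
    for start in next_version:
        seen = {start}
        current = next_version.get(start)
        # Stop at the end of the chain, or if the data contains a cycle.
        while current is not None and current not in seen:
            inferred.add((start, current))
            seen.add(current)
            current = next_version.get(current)
    return inferred


# Hypothetical revision chain: rev1 -> rev2 -> rev3
chain = {"rev1": "rev2", "rev2": "rev3"}
pairs = later_versions(chain)
# pairs now also contains ("rev1", "rev3"), which was never stated directly.
```

The sioc:earlier_version pairs follow by swapping each tuple, since the previous/earlier properties mirror the next/later ones.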

Page 10: Semantic search on heterogeneous wiki systems - Wikimania 2010

SIOC-MediaWiki Exporter

An exporter from a popular wiki platform to expose data in RDF using our proposed model.

A webservice, written in PHP, that exports a MediaWiki article in RDF, publicly available at:

http://ws.sioc-project.org/mediawiki/

Page 11: Semantic search on heterogeneous wiki systems - Wikimania 2010


Page 12: Semantic search on heterogeneous wiki systems - Wikimania 2010

Browsing the generated data

RDF data extracted from a wiki page is browsable with tools such as The Tabulator

To offer a better browsing experience and ease the process of crawling SIOC exports of MediaWiki instances, the webservice automatically produces rdfs:seeAlso links between wiki pages, following the Linked Data practices;

Link to the corresponding DBpedia resource added automatically, if the article is from the [English] Wikipedia (with foaf:primaryTopic)

An RDF crawler can easily follow all the seeAlso links found on every document and continue to crawl, so it is possible to crawl an entire wiki site starting from a single URI.
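The crawling idea described on this slide can be sketched as a breadth-first walk over rdfs:seeAlso links. The code below is an illustration only, not the crawler used by the authors. It assumes each export is RDF/XML with seeAlso targets given as rdf:resource attributes, and it replaces HTTP access with an in-memory dictionary of hypothetical `wiki.example.org` documents so that it runs without a network.

```python
import xml.etree.ElementTree as ET
from collections import deque

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"


def see_also_links(rdf_xml):
    """Return the rdf:resource target of every rdfs:seeAlso element."""
    root = ET.fromstring(rdf_xml)
    links = []
    for element in root.iter("{%s}seeAlso" % RDFS_NS):
        target = element.get("{%s}resource" % RDF_NS)
        if target:
            links.append(target)
    return links


def crawl(start_uri, fetch, limit=1000):
    """Breadth-first crawl; fetch(uri) returns RDF/XML text or None."""
    visited = []
    seen = {start_uri}
    queue = deque([start_uri])
    while queue and len(visited) < limit:
        uri = queue.popleft()
        document = fetch(uri)
        if document is None:
            continue  # unreachable document, skip it
        visited.append(uri)
        for link in see_also_links(document):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def make_doc(links):
    """Build a tiny RDF/XML document holding only seeAlso links (test data)."""
    items = "".join('<rdfs:seeAlso rdf:resource="%s"/>' % link for link in links)
    return (
        '<rdf:RDF xmlns:rdf="%s" xmlns:rdfs="%s">'
        '<rdf:Description rdf:about="">%s</rdf:Description>'
        "</rdf:RDF>" % (RDF_NS, RDFS_NS, items)
    )


# Hypothetical three-page wiki standing in for real exports.
SITE = {
    "http://wiki.example.org/A": make_doc(
        ["http://wiki.example.org/B", "http://wiki.example.org/C"]
    ),
    "http://wiki.example.org/B": make_doc(["http://wiki.example.org/A"]),
    "http://wiki.example.org/C": make_doc([]),
}

pages = crawl("http://wiki.example.org/A", SITE.get)
```

Passing the fetch function in as a parameter keeps the traversal logic separate from transport, so the dictionary lookup could be swapped for an HTTP client. The `seen` set is what stops the A to B to A loop from running forever.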

Page 13: Semantic search on heterogeneous wiki systems - Wikimania 2010

Browsing the generated data

RDF data extracted from a wiki page is browsable with tools such as The Tabulator

The webservice automatically produces rdfs:seeAlso links between wiki pages, following the Linked Data principles;

An RDF crawler can easily follow all the seeAlso links found on every document and continue to crawl, so it is possible to crawl an entire wiki site starting from a single URI.

Page 14: Semantic search on heterogeneous wiki systems - Wikimania 2010

The DokuSIOC plugin

A plugin for DokuWiki that exports RDF data using popular lightweight ontologies (originally developed by M. Haschke, a SIOC contributor).

We modified and extended this plug-in in order to be compliant with our proposed model and to export all the needed wiki features.

It takes information from the metadata stored in the wiki system about pages, users, links, etc. and provides it as raw RDF/XML serialized data (instead of the usual HTML page).

Developed in PHP and easy to install in every DokuWiki system.

It uses the SIOC PHP API.
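To make the "metadata in, RDF/XML out" step concrete, here is a small sketch of that kind of serialization. It is written in Python with the standard library rather than PHP, it does not use the SIOC PHP API, and it is not the plugin's code. The metadata dictionary, its field names and the example URIs are hypothetical; the classes and properties used (sioct:WikiArticle, sioc:has_creator, sioc:links_to, dc:title) are taken from the vocabularies named in the deck.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "sioct": "http://rdfs.org/sioc/types#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Register prefixes so the output uses rdf:, sioc:, ... instead of ns0:, ns1:.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Build a namespace-qualified tag or attribute name for ElementTree."""
    return "{%s}%s" % (NS[prefix], name)


def page_to_rdfxml(meta):
    """Serialize one page's metadata (hypothetical dict layout) as RDF/XML."""
    root = ET.Element(q("rdf", "RDF"))
    article = ET.SubElement(
        root, q("sioct", "WikiArticle"), {q("rdf", "about"): meta["uri"]}
    )
    ET.SubElement(article, q("dc", "title")).text = meta["title"]
    ET.SubElement(
        article, q("sioc", "has_creator"), {q("rdf", "resource"): meta["creator"]}
    )
    for target in meta.get("links", []):
        ET.SubElement(article, q("sioc", "links_to"), {q("rdf", "resource"): target})
    return ET.tostring(root, encoding="unicode")


# Hypothetical page metadata, standing in for what a wiki stores about a page.
META = {
    "uri": "http://dokuwiki.example.org/start",
    "title": "Start page",
    "creator": "http://dokuwiki.example.org/user/js",
    "links": ["http://dokuwiki.example.org/syntax"],
}

rdf_xml = page_to_rdfxml(META)
```

Serving this string with an RDF media type instead of the HTML page is the behaviour the slide describes; how the plugin decides between the two responses is not covered here.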

Page 15: Semantic search on heterogeneous wiki systems - Wikimania 2010

The DokuSIOC plugin

Page 16: Semantic search on heterogeneous wiki systems - Wikimania 2010

Collecting Data

To evaluate our proposal, we exported and crawled 5 different MediaWiki and DokuWiki instances

Collecting more than: 1GB of RDF data, 3000 wiki articles and 700 users

Data loaded in a triple-store (Sesame + OWLIM)

On top of that it is possible to run cross-site queries by combining FOAF and SIOC, e.g.:

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX sioc: <http://rdfs.org/sioc/ns#>

    SELECT DISTINCT ?content
    WHERE {
      <http://example.org/js#me> foaf:account ?account .
      ?account rdf:type sioc:UserAccount .
      ?content sioc:has_creator ?account .
    }
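To show what the cross-site query on this slide returns, the sketch below evaluates the same three triple patterns over a handful of in-memory triples. It is a toy illustration, not a SPARQL engine and not the Sesame + OWLIM setup. The account and page URIs on `mediawiki.example.org` and `dokuwiki.example.org` are invented; only `http://example.org/js#me` and the vocabulary terms come from the query.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def content_created_by(person, triples):
    """Follow person -> foaf:account -> sioc:has_creator, as the query does."""
    # Pattern 1: <person> foaf:account ?account
    accounts = {o for s, p, o in triples if s == person and p == FOAF + "account"}
    # Pattern 2: ?account rdf:type sioc:UserAccount
    user_accounts = {
        a for a in accounts if (a, RDF_TYPE, SIOC + "UserAccount") in triples
    }
    # Pattern 3: ?content sioc:has_creator ?account (the set gives DISTINCT)
    content = {
        s for s, p, o in triples if p == SIOC + "has_creator" and o in user_accounts
    }
    return sorted(content)


ME = "http://example.org/js#me"
MW_ACCOUNT = "http://mediawiki.example.org/User:Js"   # hypothetical
DW_ACCOUNT = "http://dokuwiki.example.org/user/js"    # hypothetical

TRIPLES = {
    (ME, FOAF + "account", MW_ACCOUNT),
    (ME, FOAF + "account", DW_ACCOUNT),
    (MW_ACCOUNT, RDF_TYPE, SIOC + "UserAccount"),
    (DW_ACCOUNT, RDF_TYPE, SIOC + "UserAccount"),
    ("http://mediawiki.example.org/The_Clash", SIOC + "has_creator", MW_ACCOUNT),
    ("http://dokuwiki.example.org/punk_rock", SIOC + "has_creator", DW_ACCOUNT),
    # Created by someone else, so it must not appear in the result.
    ("http://dokuwiki.example.org/other", SIOC + "has_creator",
     "http://dokuwiki.example.org/user/someone"),
}

results = content_created_by(ME, TRIPLES)
```

The result contains pages from both wikis because the person's FOAF profile links the two accounts; neither wiki needs to know about the other.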

Page 17: Semantic search on heterogeneous wiki systems - Wikimania 2010

Collecting Data


Page 18: Semantic search on heterogeneous wiki systems - Wikimania 2010

Building the application

The data acquisition module is a PHP script that:

– queries the triple-store

– collects and parses the results

– translates the data in the correct format (JSON) for the visualization layer

The visualization layer has been built with the Exhibit framework by the MIT SIMILE Project

It is a set of Javascript files directly configurable on the HTML code of the page to display

It allows for faceted browsing capabilities
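The translation step of the data acquisition module can be sketched as a conversion from the standard SPARQL JSON results format into the `{"items": [...]}` structure that Exhibit loads. The original module is a PHP script; Python is used here purely for illustration, and this is not the authors' code. The function name, the `WikiPage` item type and the sample bindings are assumptions.

```python
import json


def sparql_json_to_exhibit(sparql_results, label_var, item_type="WikiPage"):
    """Turn SPARQL JSON results into an Exhibit-style JSON document."""
    items = []
    for binding in sparql_results["results"]["bindings"]:
        item = {"type": item_type}  # hypothetical Exhibit item type
        for var, cell in binding.items():
            # Each cell looks like {"type": "uri" | "literal", "value": "..."}.
            item[var] = cell["value"]
        # Exhibit displays and identifies each item through its "label".
        item["label"] = item.get(label_var, "")
        items.append(item)
    return json.dumps({"items": items}, indent=2)


# Sample input in the W3C SPARQL JSON results shape (made-up values).
SAMPLE = {
    "head": {"vars": ["content", "title"]},
    "results": {
        "bindings": [
            {
                "content": {"type": "uri", "value": "http://wiki.example.org/The_Clash"},
                "title": {"type": "literal", "value": "The Clash"},
            }
        ]
    },
}

exhibit_json = sparql_json_to_exhibit(SAMPLE, label_var="title")
```

Every query variable becomes an item property, so any of them can later be declared as a facet in the HTML page that configures Exhibit.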

Page 19: Semantic search on heterogeneous wiki systems - Wikimania 2010


Page 20: Semantic search on heterogeneous wiki systems - Wikimania 2010

Conclusions

Presented how the SIOC ontology and lightweight semantics can be used and extended to represent the structure of wikis;

How to interlink wikis to other online communities;

Demonstrated an overall benefit of applying SemWeb technologies to wikis:

– enabling end-users to access the information generated in a simple and transparent way,

– opening up possibilities that cannot be obtained using the traditional Web 2.0 instruments.

Page 21: Semantic search on heterogeneous wiki systems - Wikimania 2010

Thank you!

Any questions?