Publishing Technology Online Forum - Engineering the semantic web

Priya Parvatikar from Publishing Technology demonstrates how the company is implementing semantic web technologies in the online publishing website of new publisher GSE Research.

innovation. quality. service

“Enabling clients to realize the full potential of their content and increase efficiency throughout their enterprise.”

Engineering technology to deliver the revolution

Presentation to Online Publishers’ forum

November 29, 2011

Priya Parvatikar, Technical Architect

About this talk

• Features of the GSE Research website

• Overview of how the features have been achieved

• ‘Under the hood’ look at the technology

Improved search - Enhancing auto-suggest

Using taxonomy information for “did you mean”

Boosting relevant results

Guiding the user through facets

Guiding the user through suggestions

Concept homepages

Showing concepts on item homepages

Suggest related items

GSE Research – How?

• Built using the pub2web platform

• MetaStore used for metadata storage

• Apache Solr used for search indexing

• Semantic enrichment of content

• Apache UIMA used for entity extraction

MetaStore

• RDF triplestore for storing metadata

• Agnostic to the type of data being stored

• Able to store rich and very granular data

• Flexible to cater for future data enhancements

For the GSE Research site:

Content

Authors

Taxonomy concepts and relations

Federation of data from external datasets
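
As a rough illustration of why a triplestore is agnostic to the type of data it holds, here is a minimal in-memory sketch. It is not the MetaStore API, which the slides do not show; the class `TinyTripleStore`, the identifiers and the predicate names are all assumptions made up for the example.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])


class TinyTripleStore:
    """In-memory stand-in for a triplestore (hypothetical, not MetaStore)."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.object == obj)
        )


store = TinyTripleStore()

# Content, authors and taxonomy concepts all share one shape.
# Every identifier below is invented for this sketch.
store.add("article:1", "dc:title", "Oil spills in estuaries")
store.add("article:1", "dc:creator", "author:42")
store.add("article:1", "dc:subject", "concept:water-pollution")
store.add("author:42", "foaf:name", "A. N. Author")

# A new kind of statement needs a new predicate, not a schema change.
store.add("article:1", "ex:hasTable", "table:7")

subjects = [t.object for t in store.match("article:1", "dc:subject")]
```

Because every statement has the same subject, predicate, object shape, more granular data can be added later without restructuring what is already stored.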

Search

• Uses enterprise-grade Apache Solr

• Inbuilt support for rich features

• Faceted searching

• Synonyms

• Stemming

• Boosting

• ‘More like this’

• ‘Did you mean’
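
Several of these features are switched on through request parameters. The sketch below builds such a request URL using standard Solr parameters (`defType`, `qf`, `facet`, `facet.field`, `spellcheck`). The base URL and the field names `title`, `abstract`, `concept` and `author` are assumptions, not the GSE Research configuration, and `spellcheck` only has an effect if the request handler has a spellcheck component configured.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url, user_query, facet_fields, boosts):
    """Build a Solr select URL with faceting, field boosts and spellcheck."""
    params = [
        ("q", user_query),
        ("defType", "edismax"),
        # Boosting: weight matches in some fields above others.
        ("qf", " ".join(f"{field}^{weight}" for field, weight in boosts.items())),
        ("facet", "true"),
    ]
    # Faceted searching: one facet.field parameter per facet.
    params += [("facet.field", field) for field in facet_fields]
    # 'Did you mean': ask for spelling suggestions alongside results.
    params.append(("spellcheck", "true"))
    params.append(("wt", "json"))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core location and field names.
url = build_search_url(
    "http://localhost:8983/solr",
    "water pollution",
    facet_fields=["concept", "author"],
    boosts={"title": 3, "abstract": 1},
)
parsed = parse_qs(urlsplit(url).query)
```

Synonyms and stemming are not request parameters; in Solr they are normally configured as analysis filters on the field types in the schema.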

Content for GSE Research website

Provided by GSE

• Content XML

• Taxonomy prepared by GSE

Taxonomy enhancement

• Concepts mapped to Library of Congress classifications

• Taxonomy automatically enhanced with terms from this classification

GSE Research taxonomy - example

For example, the GSE taxonomy contains:

Climate change, pollution & environmental impacts

    Water pollution

    Air pollution

After enhancing with Library of Congress classification:

Climate change, pollution & environmental impacts

    Water pollution – variants: aquatic pollution, water contamination

        Marine pollution – variants: ocean pollution, sea pollution

        Oil pollution of water – variants: petroleum pollution of water

        Estuarine pollution – variants: estuary pollution

    Air pollution
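
One way the variant terms might be put to work is a lookup from any label a user types to the preferred concept. The sketch below uses the variants listed on the slide; the dictionary layout and the helper names `build_lookup` and `resolve` are assumptions, not the site's implementation.

```python
# Preferred labels and variants as listed on the slide.
# The flat dict layout is an assumption made for this sketch.
VARIANTS = {
    "Water pollution": ["aquatic pollution", "water contamination"],
    "Marine pollution": ["ocean pollution", "sea pollution"],
    "Oil pollution of water": ["petroleum pollution of water"],
    "Estuarine pollution": ["estuary pollution"],
    "Air pollution": [],
}


def build_lookup(variants):
    """Map every label, preferred or variant, to its preferred concept."""
    lookup = {}
    for concept, alternatives in variants.items():
        for label in [concept, *alternatives]:
            lookup[label.lower()] = concept
    return lookup


LOOKUP = build_lookup(VARIANTS)


def resolve(term):
    """Return the preferred concept for a term, or None if it is unknown."""
    return LOOKUP.get(term.strip().lower())


concept = resolve("Sea Pollution")
```

A query for "sea pollution" can then be treated as a query for the concept "Marine pollution", which is the kind of mapping that synonym handling and "did you mean" suggestions rely on.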

Content workflow in GSE Research

[Workflow diagram. Labelled components: Content, Images, Tables, Authors, Concepts, Additional concepts, External datasets, Text mining pipelines, MetaStore Loader, MetaStore, Search Index]

Entity extraction for GSE Research content

Apache UIMA

• Architectural framework to manage unstructured data

• Open-source project under the Apache License

• OASIS standard

Provides

• Framework

• Annotators – multiple annotators can be applied in a pipeline

• Ability to plug in external text-mining services as annotators
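
The idea of applying multiple annotators in a pipeline can be sketched as follows. This is plain Python and does not use the UIMA API (UIMA itself is a Java framework); the `Annotation` class, the two annotators and the sample sentence are assumptions. The shared `annotations` list loosely stands in for the common analysis structure that UIMA annotators read from and write to.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    label: str
    text: str


def make_term_annotator(label, terms):
    """Build an annotator that tags any of the given terms, ignoring case."""
    # Longest terms first so longer phrases win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)

    def annotate(text, annotations):
        for m in pattern.finditer(text):
            annotations.append(Annotation(m.start(), m.end(), label, m.group(0)))

    return annotate


def year_annotator(text, annotations):
    """Tag four-digit years from 1900 to 2099."""
    for m in re.finditer(r"\b(?:19|20)\d{2}\b", text):
        annotations.append(Annotation(m.start(), m.end(), "Year", m.group(0)))


def run_pipeline(text, annotators):
    """Apply each annotator in turn to the same text and shared results."""
    annotations = []
    for annotate in annotators:
        annotate(text, annotations)
    return sorted(annotations, key=lambda a: (a.start, a.end))


pipeline = [
    make_term_annotator("Concept", ["water pollution", "marine pollution"]),
    year_annotator,
]
found = run_pipeline("Marine pollution rose sharply in 2010.", pipeline)
```

An external text-mining service could be plugged in the same way, by wrapping its call in a function with the `annotate(text, annotations)` signature.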

Example of entity extraction

Editorial curation

Future possibilities for GSE Research

• Extraction of geographical concepts

• Federation of data from other external datasets, e.g. government datasets

• Semantic analysis of search queries to deliver better results

Summary

• Tagging drives discovery

• Provide multiple routes to content

• Provide external context to content

• Start simple and experiment

• Flexibility of underlying systems is key

Thank you!
