conTEXT -- Lightweight Text Analytics using Linked Data

Preview:

DESCRIPTION

The Web democratized publishing -- everybody can easily publish information on a Website, Blog, in social networks or microblogging systems. The more the amount of published information grows, the more important are technologies for accessing, analysing, summarising and visualising information. While substantial progress has been made in the last years in each of these areas individually, we argue, that only the intelligent combination of approaches will make this progress truly useful and leverage further synergies between techniques. In this paper we develop a text analytics architecture of participation, which allows ordinary people to use sophisticated NLP techniques for analysing and visualizing their content, be it a Blog, Twitter feed, Website or article collection. The architecture comprises interfaces for information access, natural language processing and visualization. Di erent exchangeable components can be plugged into this architecture, making it easy to tailor for individual needs. We evaluate the usefulness of our approach by comparing both the e ectiveness and eciency of end users within a task-solving setting. Moreover, we evaluate the usability of our approach using a questionnaire-driven approach. Both evaluations suggest that oridinary Web users are empowered to analyse their data and perform tasks, which were previously out of reach.

Citation preview

Lightweight Text Analytics using Linked Data

Ali Khalili, Sören Auer, Axel-Cyrille Ngonga Ngomo

Extended Semantic Web Conference May 27th, 2014Crete, Greece

http://context.aksw.org

2 Agenda

Motivation

How does conTEXT work?

Workflow

Features

Evaluation

Conclusion

Demo

3 Motivation: Analytical Information Imbalance

People should be able to find out what patterns can be discovered and what conclusions can be drawn from the information they share.

4 Motivation: Lightweight Text Analytics

Unstructured

Semi-structured

Structured

• IBM Content Analytics platform• GATE• Apache UIMA

• Attensity• Trendminer• MashMaker• Thomson

Data Analyzer

• Zoho Reports• SAP NetWeaver• Jackbe• Rapidminer

• Excel• DataWrangler• Google Docs Spreadsheets• Google Refine

• Alchmey• OpenCalais

• Facete• CubeViz

• TweetDeck• Topsy• Flumes

Lack of tools dealing with unstructured content, catering non-expert users and providing extensible analytics interfaces.

5 conTEXT

http://context.aksw.org A platform for lightweight text analytics Approach

No installation and configuration required

Access content from a variety of sources

Instantly show the results of analysis to users in a variety of visualizations

Allow refinement of automatic annotations and take feedback into account

Provide a generic architecture where different modules for content acquisition, natural language processing and visualization can be plugged together

How does it work?

6 Data Collection

Data tr

ansform

ation

Input Data Model

•Rest

APIs

•SPA

RQL endpoints

•RSS, A

tom

, RDF f

eeds

•Web C

rawlers

Handling different input types

- RDF-based- Relational

7 Data Analysis

Natural Language Processing (NLP)

• DBpedia Spotlight

• FOX

• Any other NLP services which support NIF

http://spotlight.dbpedia.org

http://fox.aksw.orgDBpedi

a

Nam

ed

entities NLP

Serv

ice Corp

ora

8 NLP Interchange Format (NIF)

http://nlp2rdf.org An RDF/OWL-based format

Provides Interoperability between Natural Language Processing (NLP) tools and services.

Standardize access parameters, annotations (e.g. tokenization), validation & log messages.

9 NLP Interchange Format (NIF)

10 Data Enrichment

De-referencing the DBpedia URIs of the recognized entities.

(e.g. longitude and latitudes for locations , birth and death dates for people, etc.)

Matching the entity co-occurrences with pre-defined natural language patterns for DBpedia predicates provided by BOA (BOotstrapping linked datA)

(e.g. authorship relation ) Catalyst

11 Data Mixing (Mashups)

NLP service integration Composite corpus

E.g. Twitter + Blog + Facebook

Helps to create a user model

12 Data Visualization & Exploration Different Views on Semantically-enriched

data

Using Exhibit & D3.js

13 Faceted browsing

14 Places map & People timeline

15 Tag cloud

16 Chordal graph view

17 Matrix view

18 Trend view

19 Sentiment view

20 Image view

21 Annotation refinement Lightweight text analytics as an incentive for users to

revise semantic annotations

RDFaCE WYSIWYM (What-You-See-Is-What-You-Mean) interface for manual content annotation in RDFa format

Feedback to NLP services NLP calibration

calibration

FOX Feedback APIhttp://139.18.2.164:4444/api/ner/feedback

DBPedia Spotlight Feedback APIhttp://spotlight.dbpedia.org/rest/feedback

22 Annotation refinement UI

23 conTEXT architecture overview

24 Other features: Interactive & Progressive Annotation

Interactive systems can be responsive despite low performance.

25 Other features: Real-time Semantic Analysis (ReSA)

https://github.com/ali1k/resa

26 Other features:

• Search Engine Optimization (SEO) using Schema.org & JSON-LD

• Drilling down results using a subgraph of DBpedia

• Changing the underlying DBpedia ontology

27 Evaluation: Usefulness study Task-driven usefulness study

25 Users

10 questions pertaining to knowledge discovery in corpora of unstructured data E.g. What are the five most mentioned countries by Bill Gates tweets?

28 Evaluation: Results of usefulness study

Measuring time & Jaccard similarity for answers using/without conTEXT

second

Avg. 136% more time without conTEXT

29 Evaluation: Usability study System Usability Scale (SUS) 82

http://www.measuringusability.com/

30

Lightweight Text Analytics using Linked Data Democratizing the NLP usage

Alleviating the Semantic Web's chicken-and-egg problem

Harnessing the power of feedback loops

Conclusions

31 Future Work

Improving the performance & scalability of views

Exposing APIs for third-parties Enable batch refinement of annotations More input source types More…

32 Any Questions?

33 Demo

Progressive data collection and annotation http://context.aksw.org

Different views LOD2 Blog http

://context.aksw.org/app/hub.php?corpus=6 Example of adding extra input types + changing the

DBpedia ontology + composite corpora LinkedIn Jobs

http://context.aksw.org/app/hub.php?corpus=242

Recommended