Transcript
Page 1: conTEXT -- Lightweight Text Analytics using Linked Data

Lightweight Text Analytics using Linked Data

Ali Khalili, Sören Auer, Axel-Cyrille Ngonga Ngomo

Extended Semantic Web Conference May 27th, 2014Crete, Greece

http://context.aksw.org

Page 2: conTEXT -- Lightweight Text Analytics using Linked Data

2 Agenda

Motivation

How does conTEXT work?

Workflow

Features

Evaluation

Conclusion

Demo

Page 3: conTEXT -- Lightweight Text Analytics using Linked Data

3 Motivation: Analytical Information Imbalance

People should be able to find out what patterns can be discovered and what conclusions can be drawn from the information they share.

Page 4: conTEXT -- Lightweight Text Analytics using Linked Data

4 Motivation: Lightweight Text Analytics

Unstructured

Semi-structured

Structured

• IBM Content Analytics platform• GATE• Apache UIMA

• Attensity• Trendminer• MashMaker• Thomson

Data Analyzer

• Zoho Reports• SAP NetWeaver• Jackbe• Rapidminer

• Excel• DataWrangler• Google Docs Spreadsheets• Google Refine

• Alchmey• OpenCalais

• Facete• CubeViz

• TweetDeck• Topsy• Flumes

Lack of tools dealing with unstructured content, catering non-expert users and providing extensible analytics interfaces.

Page 5: conTEXT -- Lightweight Text Analytics using Linked Data

5 conTEXT

http://context.aksw.org A platform for lightweight text analytics Approach

No installation and configuration required

Access content from a variety of sources

Instantly show the results of analysis to users in a variety of visualizations

Allow refinement of automatic annotations and take feedback into account

Provide a generic architecture where different modules for content acquisition, natural language processing and visualization can be plugged together

How does it work?

Page 6: conTEXT -- Lightweight Text Analytics using Linked Data

6 Data Collection

Data tr

ansform

ation

Input Data Model

•Rest

APIs

•SPA

RQL endpoints

•RSS, A

tom

, RDF f

eeds

•Web C

rawlers

Handling different input types

- RDF-based- Relational

Page 7: conTEXT -- Lightweight Text Analytics using Linked Data

7 Data Analysis

Natural Language Processing (NLP)

• DBpedia Spotlight

• FOX

• Any other NLP services which support NIF

http://spotlight.dbpedia.org

http://fox.aksw.orgDBpedi

a

Nam

ed

entities NLP

Serv

ice Corp

ora

Page 8: conTEXT -- Lightweight Text Analytics using Linked Data

8 NLP Interchange Format (NIF)

http://nlp2rdf.org An RDF/OWL-based format

Provides Interoperability between Natural Language Processing (NLP) tools and services.

Standardize access parameters, annotations (e.g. tokenization), validation & log messages.

Page 9: conTEXT -- Lightweight Text Analytics using Linked Data

9 NLP Interchange Format (NIF)

Page 10: conTEXT -- Lightweight Text Analytics using Linked Data

10 Data Enrichment

De-referencing the DBpedia URIs of the recognized entities.

(e.g. longitude and latitudes for locations , birth and death dates for people, etc.)

Matching the entity co-occurrences with pre-defined natural language patterns for DBpedia predicates provided by BOA (BOotstrapping linked datA)

(e.g. authorship relation ) Catalyst

Page 11: conTEXT -- Lightweight Text Analytics using Linked Data

11 Data Mixing (Mashups)

NLP service integration Composite corpus

E.g. Twitter + Blog + Facebook

Helps to create a user model

Page 12: conTEXT -- Lightweight Text Analytics using Linked Data

12 Data Visualization & Exploration Different Views on Semantically-enriched

data

Using Exhibit & D3.js

Page 13: conTEXT -- Lightweight Text Analytics using Linked Data

13 Faceted browsing

Page 14: conTEXT -- Lightweight Text Analytics using Linked Data

14 Places map & People timeline

Page 15: conTEXT -- Lightweight Text Analytics using Linked Data

15 Tag cloud

Page 16: conTEXT -- Lightweight Text Analytics using Linked Data

16 Chordal graph view

Page 17: conTEXT -- Lightweight Text Analytics using Linked Data

17 Matrix view

Page 18: conTEXT -- Lightweight Text Analytics using Linked Data

18 Trend view

Page 19: conTEXT -- Lightweight Text Analytics using Linked Data

19 Sentiment view

Page 20: conTEXT -- Lightweight Text Analytics using Linked Data

20 Image view

Page 21: conTEXT -- Lightweight Text Analytics using Linked Data

21 Annotation refinement Lightweight text analytics as an incentive for users to

revise semantic annotations

RDFaCE WYSIWYM (What-You-See-Is-What-You-Mean) interface for manual content annotation in RDFa format

Feedback to NLP services NLP calibration

calibration

FOX Feedback APIhttp://139.18.2.164:4444/api/ner/feedback

DBPedia Spotlight Feedback APIhttp://spotlight.dbpedia.org/rest/feedback

Page 22: conTEXT -- Lightweight Text Analytics using Linked Data

22 Annotation refinement UI

Page 23: conTEXT -- Lightweight Text Analytics using Linked Data

23 conTEXT architecture overview

Page 24: conTEXT -- Lightweight Text Analytics using Linked Data

24 Other features: Interactive & Progressive Annotation

Interactive systems can be responsive despite low performance.

Page 25: conTEXT -- Lightweight Text Analytics using Linked Data

25 Other features: Real-time Semantic Analysis (ReSA)

https://github.com/ali1k/resa

Page 26: conTEXT -- Lightweight Text Analytics using Linked Data

26 Other features:

• Search Engine Optimization (SEO) using Schema.org & JSON-LD

• Drilling down results using a subgraph of DBpedia

• Changing the underlying DBpedia ontology

Page 27: conTEXT -- Lightweight Text Analytics using Linked Data

27 Evaluation: Usefulness study Task-driven usefulness study

25 Users

10 questions pertaining to knowledge discovery in corpora of unstructured data E.g. What are the five most mentioned countries by Bill Gates tweets?

Page 28: conTEXT -- Lightweight Text Analytics using Linked Data

28 Evaluation: Results of usefulness study

Measuring time & Jaccard similarity for answers using/without conTEXT

second

Avg. 136% more time without conTEXT

Page 29: conTEXT -- Lightweight Text Analytics using Linked Data

29 Evaluation: Usability study System Usability Scale (SUS) 82

http://www.measuringusability.com/

Page 30: conTEXT -- Lightweight Text Analytics using Linked Data

30

Lightweight Text Analytics using Linked Data Democratizing the NLP usage

Alleviating the Semantic Web's chicken-and-egg problem

Harnessing the power of feedback loops

Conclusions

Page 31: conTEXT -- Lightweight Text Analytics using Linked Data

31 Future Work

Improving the performance & scalability of views

Exposing APIs for third-parties Enable batch refinement of annotations More input source types More…

Page 32: conTEXT -- Lightweight Text Analytics using Linked Data

32 Any Questions?

Page 33: conTEXT -- Lightweight Text Analytics using Linked Data

33 Demo

Progressive data collection and annotation http://context.aksw.org

Different views LOD2 Blog http

://context.aksw.org/app/hub.php?corpus=6 Example of adding extra input types + changing the

DBpedia ontology + composite corpora LinkedIn Jobs

http://context.aksw.org/app/hub.php?corpus=242


Recommended