Upload
ali-khalili
View
660
Download
2
Tags:
Embed Size (px)
DESCRIPTION
The Web democratized publishing -- everybody can easily publish information on a Website, Blog, in social networks or microblogging systems. The more the amount of published information grows, the more important are technologies for accessing, analysing, summarising and visualising information. While substantial progress has been made in the last years in each of these areas individually, we argue, that only the intelligent combination of approaches will make this progress truly useful and leverage further synergies between techniques. In this paper we develop a text analytics architecture of participation, which allows ordinary people to use sophisticated NLP techniques for analysing and visualizing their content, be it a Blog, Twitter feed, Website or article collection. The architecture comprises interfaces for information access, natural language processing and visualization. Dierent exchangeable components can be plugged into this architecture, making it easy to tailor for individual needs. We evaluate the usefulness of our approach by comparing both the eectiveness and eciency of end users within a task-solving setting. Moreover, we evaluate the usability of our approach using a questionnaire-driven approach. Both evaluations suggest that oridinary Web users are empowered to analyse their data and perform tasks, which were previously out of reach.
Citation preview
Lightweight Text Analytics using Linked Data
Ali Khalili, Sören Auer, Axel-Cyrille Ngonga Ngomo
Extended Semantic Web Conference May 27th, 2014Crete, Greece
http://context.aksw.org
2 Agenda
Motivation
How does conTEXT work?
Workflow
Features
Evaluation
Conclusion
Demo
3 Motivation: Analytical Information Imbalance
People should be able to find out what patterns can be discovered and what conclusions can be drawn from the information they share.
4 Motivation: Lightweight Text Analytics
Unstructured
Semi-structured
Structured
• IBM Content Analytics platform• GATE• Apache UIMA
• Attensity• Trendminer• MashMaker• Thomson
Data Analyzer
• Zoho Reports• SAP NetWeaver• Jackbe• Rapidminer
• Excel• DataWrangler• Google Docs Spreadsheets• Google Refine
• Alchmey• OpenCalais
• Facete• CubeViz
• TweetDeck• Topsy• Flumes
Lack of tools dealing with unstructured content, catering non-expert users and providing extensible analytics interfaces.
5 conTEXT
http://context.aksw.org A platform for lightweight text analytics Approach
No installation and configuration required
Access content from a variety of sources
Instantly show the results of analysis to users in a variety of visualizations
Allow refinement of automatic annotations and take feedback into account
Provide a generic architecture where different modules for content acquisition, natural language processing and visualization can be plugged together
How does it work?
6 Data Collection
Data tr
ansform
ation
Input Data Model
•Rest
APIs
•SPA
RQL endpoints
•RSS, A
tom
, RDF f
eeds
•Web C
rawlers
Handling different input types
- RDF-based- Relational
7 Data Analysis
Natural Language Processing (NLP)
• DBpedia Spotlight
• FOX
• Any other NLP services which support NIF
http://spotlight.dbpedia.org
http://fox.aksw.orgDBpedi
a
Nam
ed
entities NLP
Serv
ice Corp
ora
8 NLP Interchange Format (NIF)
http://nlp2rdf.org An RDF/OWL-based format
Provides Interoperability between Natural Language Processing (NLP) tools and services.
Standardize access parameters, annotations (e.g. tokenization), validation & log messages.
9 NLP Interchange Format (NIF)
10 Data Enrichment
De-referencing the DBpedia URIs of the recognized entities.
(e.g. longitude and latitudes for locations , birth and death dates for people, etc.)
Matching the entity co-occurrences with pre-defined natural language patterns for DBpedia predicates provided by BOA (BOotstrapping linked datA)
(e.g. authorship relation ) Catalyst
11 Data Mixing (Mashups)
NLP service integration Composite corpus
E.g. Twitter + Blog + Facebook
Helps to create a user model
12 Data Visualization & Exploration Different Views on Semantically-enriched
data
Using Exhibit & D3.js
13 Faceted browsing
14 Places map & People timeline
15 Tag cloud
16 Chordal graph view
17 Matrix view
18 Trend view
19 Sentiment view
20 Image view
21 Annotation refinement Lightweight text analytics as an incentive for users to
revise semantic annotations
RDFaCE WYSIWYM (What-You-See-Is-What-You-Mean) interface for manual content annotation in RDFa format
Feedback to NLP services NLP calibration
calibration
FOX Feedback APIhttp://139.18.2.164:4444/api/ner/feedback
DBPedia Spotlight Feedback APIhttp://spotlight.dbpedia.org/rest/feedback
22 Annotation refinement UI
23 conTEXT architecture overview
24 Other features: Interactive & Progressive Annotation
Interactive systems can be responsive despite low performance.
25 Other features: Real-time Semantic Analysis (ReSA)
https://github.com/ali1k/resa
26 Other features:
• Search Engine Optimization (SEO) using Schema.org & JSON-LD
• Drilling down results using a subgraph of DBpedia
• Changing the underlying DBpedia ontology
27 Evaluation: Usefulness study Task-driven usefulness study
25 Users
10 questions pertaining to knowledge discovery in corpora of unstructured data E.g. What are the five most mentioned countries by Bill Gates tweets?
28 Evaluation: Results of usefulness study
Measuring time & Jaccard similarity for answers using/without conTEXT
second
Avg. 136% more time without conTEXT
29 Evaluation: Usability study System Usability Scale (SUS) 82
http://www.measuringusability.com/
30
Lightweight Text Analytics using Linked Data Democratizing the NLP usage
Alleviating the Semantic Web's chicken-and-egg problem
Harnessing the power of feedback loops
Conclusions
31 Future Work
Improving the performance & scalability of views
Exposing APIs for third-parties Enable batch refinement of annotations More input source types More…
32 Any Questions?
33 Demo
Progressive data collection and annotation http://context.aksw.org
Different views LOD2 Blog http
://context.aksw.org/app/hub.php?corpus=6 Example of adding extra input types + changing the
DBpedia ontology + composite corpora LinkedIn Jobs
http://context.aksw.org/app/hub.php?corpus=242