AnnoMarket - Cloud-based text analytics

DESCRIPTION

Presentation introducing AnnoMarket which I gave at the Open Data Institute, Women in Data meetup, on 3rd December 2013

Page 1: AnnoMarket - Cloud-based text analytics

Extracting value from data: Introducing AnnoMarket - the cloud-based text annotation marketplace

Helen Lippell, Press Association

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n°296322

Page 2: AnnoMarket - Cloud-based text analytics

Women in Data: NLP edition, Open Data Institute, 3 December 2013

Getting started

AnnoMarket is a European research project

The Press Association (PA) is part of a consortium with the University of Sheffield, French start-up IMR and Bulgarian semantic specialists Ontotext

I work as a data wrangler within the PA technology team, working with linked data curation and semantic modelling

Agenda:

Brief overview of text analytics

Introducing the AnnoMarket platform

Page 3: AnnoMarket - Cloud-based text analytics

Text analytics isn’t that new

People have been trying to make sense of unstructured data for a long time

Rosetta Stone an early use case!

Experts compared patterns in the 3 texts and eventually could identify entities in the previously-incomprehensible hieroglyphics

This 1950s definition is startlingly accurate:

H.P. Luhn, IBM Journal, 1958:

"...utilize data-processing machines for auto-abstracting and auto-encoding of documents for creating interest profiles for each of the 'action points' in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points"

Now there’s an unprecedented buzz around text analytics

Big Data movement

Semantics gaining traction in business applications

Page 4: AnnoMarket - Cloud-based text analytics

Who uses text analytics?

Anyone who wants to derive value from unstructured data at scale

Not just spooks…

Scientific and technical

Media and publishing

Open data community

Researchers

Data-driven businesses

Customer experience

Page 5: AnnoMarket - Cloud-based text analytics

What text analytics can do

Named entity recognition

Disambiguation, e.g. Iceland!

Entity types, e.g. people, places, things, organisations

Relevance

Pattern-identified entities, e.g. amounts of money, postcodes

Co-occurrence

Classification and categorisation

Sentiment analysis
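
Two of the items above, pattern-identified entities and co-occurrence, can be illustrated with a minimal Python sketch. This is an assumption-laden toy, not how AnnoMarket or GATE implement them: the regular expressions, the function names `find_pattern_entities` and `co_occurrences`, and the sample sentence are all made up for illustration, and the postcode pattern is a simplified shape check rather than a full validator.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative patterns only (assumptions, not AnnoMarket or GATE rules).
# Money: currency symbol, digits with optional thousands separators,
# optional decimals, optional magnitude word such as "million" or "bn".
MONEY = re.compile(
    r"[£$€]\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:million|billion|m|bn)\b)?"
)
# Simplified UK postcode shape: outward code, optional space, inward code.
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b")


def find_pattern_entities(text):
    """Return (type, matched text) pairs for money amounts and postcodes."""
    found = [("Money", m.group()) for m in MONEY.finditer(text)]
    found += [("Postcode", m.group()) for m in POSTCODE.finditer(text)]
    return found


def co_occurrences(docs_entities):
    """Count how often each pair of entities appears in the same document."""
    counts = Counter()
    for entities in docs_entities:
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    sample = "The council at S1 4DP spent £2,500 on repairs and £1.5 million on roads."
    for entity_type, value in find_pattern_entities(sample):
        print(entity_type, value)

    docs = [["PA", "Sheffield"], ["PA", "Sheffield", "Ontotext"], ["PA"]]
    print(co_occurrences(docs).most_common(1))
```

Pattern rules like these suit entities with a predictable surface form. Names of people, places and organisations usually need gazetteers and context instead, which is where disambiguation comes in.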

Page 6: AnnoMarket - Cloud-based text analytics

AnnoMarket

The marketplace - An “App store” for text analysis services

Breaking down barriers to entry for SMEs (developers and end-users alike)

Built on robust, mature GATE applications (open-source with global community supported by the University of Sheffield)

Benefits to end-users:

- Affordable, pay-for-what-you-need model
- SaaS, cloud-based
- Flexible input and output formats

Benefits to suppliers:

- Payments system
- Access to user base

(A note on look and feel: It is basic at the moment!)

Page 7: AnnoMarket - Cloud-based text analytics

Running an annotation job

Find a service

Test it on site

Upload documents or specify a custom crawl

Manage server (GATE Teamware or Mimir)

Platform handles execution of job, keeps user updated

Download results or export to a GATE Mimir instance

Formats include XML, HTML, PDF, DOC

- GATE Teamware – web-based management platform for annotation
- GATE Mimir – open-source framework for integrated semantic search

Page 8: AnnoMarket - Cloud-based text analytics

Uploading pipelines

Straightforward process

Standard components:

- Pipeline – GATE saved application state
- Supporting files (e.g. gazetteers)
- Metadata for the platform and user-facing pages

Files checked then put live

Platform tracks usage and handles payments

Page 9: AnnoMarket - Cloud-based text analytics

AnnoMarket screenshots

[Screenshots: browsable portal, tag-based filtering, input config, output config]

Page 10: AnnoMarket - Cloud-based text analytics

News pipeline tool

- Customised pipeline which annotates named entities in the news domain (optimised for the UK)
- Leverages PA’s knowledgebase and Linked Data references, also other entity types

Page 11: AnnoMarket - Cloud-based text analytics

Get involved

Public beta

Register your interest now

We’ll email when it’s open

Free credit to early registrants

Ultimate aim: a sustainable platform that generates revenue for contributors who wouldn’t have an outlet otherwise

Play with the platform

Feed back to us – bugs, functionality, finding resources, what more you’d like to see, etc!

Page 12: AnnoMarket - Cloud-based text analytics

Get in touch

Public beta – http://annomarket.com

Project site – https://annomarket.eu

@AnnoMarket

[email protected]

@octodude
