Presentation introducing AnnoMarket, which I gave at the Open Data Institute's Women in Data meetup on 3 December 2013
Extracting value from data: Introducing AnnoMarket, the cloud-based text annotation marketplace
Helen Lippell, Press Association
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n°296322.
Women in Data: NLP edition, Open Data Institute, 3 December 2013
Getting started
AnnoMarket is a European research project
PA is part of a consortium with the University of Sheffield, French start-up IMR and Bulgarian semantic specialists Ontotext
I work as a data wrangler in the PA technology team, working on linked data curation and semantic modelling
Agenda:
Brief overview of text analytics
Introducing the AnnoMarket platform
Text analytics isn’t that new
People have been trying to make sense of unstructured data for a long time
The Rosetta Stone: an early use case!
Experts compared patterns across its three parallel texts (hieroglyphic, Demotic and Greek) and were eventually able to identify entities in the previously incomprehensible hieroglyphics
This 1950s definition is startlingly accurate:
H.P. Luhn, IBM Journal, 1958:
“...utilize data-processing machines for auto-abstracting and auto-encoding of documents for creating interest profiles for each of the 'action points' in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points”
Now there’s an unprecedented buzz around text analytics
Big Data movement
Semantics gaining traction in business applications
Who uses text analytics?
Anyone who wants to derive value from unstructured data at scale
Not just spooks…
Scientific and technical
Media and publishing
Open data community
Researchers
Data-driven businesses
Customer experience
What text analytics can do
Named entity recognition
Disambiguation, e.g. Iceland!
Entity types, e.g. people, places, things, organisations
Relevance
Pattern-identified entities, e.g. amounts of money, postcodes
Co-occurrence
Classification and categorisation
Sentiment analysis
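Of the capabilities above, pattern-identified entities are the easiest to illustrate in a few lines. Below is a minimal Python sketch that assumes plain regular expressions are good enough. `Entity`, `POSTCODE`, `MONEY` and `find_pattern_entities` are hypothetical names made up for this example and are not part of AnnoMarket or GATE. The postcode pattern is a simplified approximation of the UK format, not a full validator.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    kind: str   # entity type, e.g. "Postcode" or "Money"
    text: str   # matched surface string
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


# Simplified UK postcode: outward code (e.g. "SW1A") plus inward code (e.g. "1AA").
# An illustrative assumption, not an exhaustive validator.
POSTCODE = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}\b")

# Currency symbol, digits with optional thousands separators and decimals,
# and an optional multiplier such as "m" or "bn".
MONEY = re.compile(r"[£$€]\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:m|bn|million|billion)\b)?")

PATTERNS = {"Postcode": POSTCODE, "Money": MONEY}


def find_pattern_entities(text: str) -> list[Entity]:
    """Run every pattern over the text and return matches in document order."""
    entities = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            entities.append(Entity(kind, m.group(0), m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


ents = find_pattern_entities("Mr Smith of SW1A 1AA paid £1.5m, and EC4Y 0HQ got $2,000.")
print([(e.kind, e.text) for e in ents])
# [('Postcode', 'SW1A 1AA'), ('Money', '£1.5m'), ('Postcode', 'EC4Y 0HQ'), ('Money', '$2,000')]
```

Patterns like these work well when entities have a fixed shape. Names of people and organisations usually need gazetteers, context or trained models instead.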
AnnoMarket
The marketplace: an “App store” for text analysis services
Breaking down barriers to entry for SMEs (developers and end-users alike)
Built on robust, mature GATE applications (open-source, with a global community supported by the University of Sheffield)
Benefits to end-users: an affordable, pay-for-what-you-need model; SaaS, cloud-based; flexible input and output formats
Benefits to suppliers: a payments system; access to a user base
(A note on look and feel: it is basic at the moment!)
Running an annotation job
Find a service
Test it on site
Upload documents or specify a custom crawl
Manage server (GATE Teamware or Mimir)
Platform handles execution of job, keeps user updated
Download results or export to a GATE Mimir instance
Formats include XML, HTML, PDF, DOC
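The steps above imply a simple job lifecycle that the platform tracks for the user. The sketch below is illustrative only. The state names, the `AnnotationJob` class and the `"news-ner"` service name are all hypothetical and do not describe AnnoMarket's actual API. It models the lifecycle as a small state machine with a status log that could be shown to the user.

```python
from enum import Enum, auto


class JobState(Enum):
    DRAFT = auto()        # service chosen, no input yet
    INPUT_READY = auto()  # documents uploaded or a crawl specified
    RUNNING = auto()      # platform is executing the pipeline
    COMPLETED = auto()    # results ready to download or export
    FAILED = auto()       # execution stopped with an error


# Legal transitions between states; terminal states allow none.
ALLOWED = {
    JobState.DRAFT: {JobState.INPUT_READY},
    JobState.INPUT_READY: {JobState.RUNNING},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
}


class AnnotationJob:
    """Hypothetical job record: current state plus a history for status updates."""

    def __init__(self, service: str) -> None:
        self.service = service
        self.state = JobState.DRAFT
        self.history = [JobState.DRAFT]

    def advance(self, new_state: JobState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state
        self.history.append(new_state)


job = AnnotationJob("news-ner")  # hypothetical service name
for step in (JobState.INPUT_READY, JobState.RUNNING, JobState.COMPLETED):
    job.advance(step)
print([s.name for s in job.history])
# ['DRAFT', 'INPUT_READY', 'RUNNING', 'COMPLETED']
```

An explicit transition table makes out-of-order actions easy to reject, such as marking a job complete before it has started running.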
- GATE Teamware – web-based management platform for annotation
- GATE Mimir – open-source framework for integrated semantic search
Uploading pipelines
Straightforward process
Standard components: the pipeline (a GATE saved application state), supporting files (e.g. gazetteers), and metadata for the platform and user-facing pages
Files are checked, then put live
Platform tracks usage and handles payments
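Gazetteers are one of the supporting files a pipeline can ship with. In essence, a gazetteer is a list of known names, each mapped to an entity type. The sketch below does a longest-match gazetteer lookup in plain Python. It is a simplified illustration of that idea, not how GATE's own gazetteer components work, and `build_gazetteer` and `gazetteer_lookup` are hypothetical names.

```python
import re


def build_gazetteer(entries: dict[str, str]) -> dict[tuple[str, ...], str]:
    """Map each lower-cased token sequence (e.g. ('university', 'of', 'sheffield')) to its type."""
    return {tuple(re.findall(r"\w+", name.lower())): etype for name, etype in entries.items()}


def gazetteer_lookup(text: str, gazetteer: dict[tuple[str, ...], str]) -> list[tuple[str, str]]:
    """Scan the text left to right, preferring the longest gazetteer entry at each position."""
    tokens = re.findall(r"\w+", text)  # naive tokenisation: runs of word characters
    lowered = [t.lower() for t in tokens]
    max_len = max((len(k) for k in gazetteer), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Try longer candidates first so "University of Sheffield" beats "Sheffield".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = gazetteer.get(tuple(lowered[i:i + n]))
            if etype is not None:
                matches.append((" ".join(tokens[i:i + n]), etype))
                i += n
                break
        else:
            i += 1  # no entry starts at this token; move on
    return matches


gaz = build_gazetteer({
    "University of Sheffield": "Organisation",
    "Sheffield": "Location",
    "Ontotext": "Organisation",
})
text = "Ontotext and the University of Sheffield are AnnoMarket partners; the university is in Sheffield."
print(gazetteer_lookup(text, gaz))
# [('Ontotext', 'Organisation'), ('University of Sheffield', 'Organisation'), ('Sheffield', 'Location')]
```

Longest match is a common default because it stops a shorter entry (the city) from claiming part of a longer one (the university).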
AnnoMarket screenshots
[Screenshots: browsable portal; tag-based filtering; input config; output config]
News pipeline tool
- Customised pipeline that annotates named entities in the news domain (optimised for the UK)
- Leverages PA’s knowledge base and Linked Data references, and also annotates other entity types
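Attaching Linked Data references to recognised entities amounts to storing a URI on each annotated span. Below is a minimal sketch of that idea. It assumes standoff annotations, meaning character offsets into the unchanged source text, and a small lookup table. The `KB` entries and their `example.org` URIs are placeholders, not PA's knowledge base. Exact whole-word string matching here stands in for the disambiguation a real news pipeline would need.

```python
import json
import re
from dataclasses import asdict, dataclass, field


@dataclass
class Annotation:
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity
    type: str   # entity type, e.g. "Organisation"
    features: dict[str, str] = field(default_factory=dict)


# Placeholder knowledge base: surface form -> (entity type, Linked Data URI).
KB = {
    "Press Association": ("Organisation", "http://example.org/kb/press-association"),
    "London": ("Location", "http://example.org/kb/london"),
}


def link_entities(text: str, kb: dict[str, tuple[str, str]]) -> list[Annotation]:
    """Annotate every whole-word occurrence of a KB name with its type and URI."""
    annotations = []
    for name, (etype, uri) in kb.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            annotations.append(Annotation(m.start(), m.end(), etype, {"uri": uri}))
    return sorted(annotations, key=lambda a: a.start)


text = "The Press Association is based in London."
anns = link_entities(text, KB)
print(json.dumps([asdict(a) for a in anns], indent=2))
```

Offsets leave the original document untouched, so the same annotations can be rendered inline, exported to JSON as above, or indexed for search.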
Get involved
Public beta
Register your interest now
We’ll email when it’s open
Free credit to early registrants
Ultimate aim: a sustainable platform that generates revenue for contributors who wouldn’t have an outlet otherwise
Play with the platform
Feed back to us – bugs, functionality, finding resources, what more you’d like to see, etc!
Get in touch
Public beta – http://annomarket.com
Project site – https://annomarket.eu
@AnnoMarket
@octodude