Better Text Analytics with MeaningCloud
How our customization tools can boost text analysis accuracy
Webinar - Daedalus / MeaningCloud, May 14, 2015
Introduction
Presenter
Jarred McGinnis, PhD
Business Development, UK
Logistics
Send text questions, or
“Raise your hand” to speak and we’ll open your mic
A link to the recorded webinar will be published
Agenda
Text analytics: accuracy, precision, recall
Customized linguistic resources for improved accuracy
MeaningCloud customization tools
Conclusions, Q&A
Text analytics
Extract meaning and actionable insights from unstructured content
Automation of costly manual activities
[Diagram: Semantic Analysis extracts opinions, facts, concepts, organizations, people, relationships, and themes]
Just how precise is precise?
Precision is relative
Even experts aren’t 100% precise
Tests involving human analysts: 85-95% agreement
Along with precision, recall is also important
[Figure: items identified by the algorithm in three cases: high precision and high recall; high precision and low recall; low precision and high recall]
Accuracy: precision & recall
Precision and recall tend to be inversely related
Trade-off needed
Requirements are application-specific
Brand monitoring in social media: high precision, low recall
Counter-terrorism: high recall, low precision
Precision – Recall Curve
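The trade-off above can be illustrated with a small worked example. This is a minimal sketch, not MeaningCloud code: the document IDs, the confidence scores, and the two thresholds are made-up assumptions chosen only to show how a stricter cut-off raises precision while lowering recall.

```python
from typing import Dict, Set, Tuple


def precision_recall(identified: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = correct hits / everything identified.
    Recall = correct hits / everything that should have been identified."""
    true_positives = len(identified & relevant)
    precision = true_positives / len(identified) if identified else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: which documents are truly relevant, and the
# confidence score an imaginary classifier assigned to each document.
relevant = {"d1", "d2", "d3", "d4"}
scores: Dict[str, float] = {
    "d1": 0.9, "d2": 0.8, "d5": 0.6, "d3": 0.4, "d6": 0.3, "d4": 0.2,
}


def identified_at(threshold: float) -> Set[str]:
    """Documents the classifier would report at a given confidence cut-off."""
    return {doc for doc, score in scores.items() if score >= threshold}


# Strict cut-off: few results, all correct (brand-monitoring style).
strict = precision_recall(identified_at(0.7), relevant)
# Loose cut-off: everything relevant is found, plus noise (counter-terrorism style).
loose = precision_recall(identified_at(0.2), relevant)

print("strict:", strict)
print("loose: ", loose)
```

Sweeping the threshold over its full range and plotting each (recall, precision) pair would give a precision-recall curve for this toy data.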
State of the Art for Text Analysis
Precision Measurements
Topic Extraction: 70-85%
Classification: 70-80%
Sentiment Analysis: 60-70%
Quality improvement depends on the adaptation of the tools and resources to the application / task
MeaningCloud: cloud-based semantic APIs
Register and use it FREE at http://www.meaningcloud.com
MeaningCloud API services
Sentiment analysis: global, aspect-based
Classification: standard models
Topic extraction: entities, concepts, dates, addresses, economic quantities, time expressions…
https://www.meaningcloud.com/demos/media-analysis/
MeaningCloud: standard resources
Ontodaedalus (ontology) 437 nodes
78 themes
250,000+ lemmas/language
https://www.meaningcloud.com/developer/documentation/ontodaedalus
MeaningCloud: Standard classification models
‘out-of-the-box’ support of well-known classification standards
IPTC: news
Business Reputation: corporate reputation
EuroVoc: public administration
IAB (coming soon): advertising
https://www.meaningcloud.com/developer/resources/models
VoC / Customer Insights scenario
Social networks, forums
Survey verbatims
Contact Center interactions: voice, email…
Structure and extract meaning
What companies/ brands are they mentioning?
What are they talking about?
What’s their opinion?
Analysis
Insights
Opinions
The sentence “The highest interest rate in industry!” is…
Positive, if talking about savings
Negative, if talking about mortgages
Customized linguistic resources improve accuracy
Mentions
Names of banks and financial companies, e.g., Citibank, BBVA
Product names, e.g., Your Way℠ Account, Compass Account…
Themes
Example: analysis of a bank’s customer opinions
Products
Accounts
Checking
Savings
Borrowing
Credit
Mortgage
Channel
Office
Phone
Internet
Demo agenda
Bank dictionary
Bank names, product names (entities)
Generic product names, e.g., mortgage (concepts)
Classification models
Channel model: phone, web…
Sentiment models (preview)
Creating a new entity
Aliases: It is NOT necessary to explicitly include “trivial” aliases as the engine generates typical variants
Use your own ontology
Possible to include additional semantic info
Dictionary import
The best way to include a pre-existing dictionary in MeaningCloud
Columns: form, alias, ID, semantic info attributes
Outcome: APIs identify topics in dictionary
Identifies semantic info:
Product: Cash Card Account
Type: Current Accounts
Bank: Barclays
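To make the dictionary idea concrete, here is a minimal sketch of loading entries with a form, an alias, an ID, and semantic info, then spotting them in text. It does not use the MeaningCloud import format or API: the CSV column names, the ID `prod_001`, and the helper names `load_dictionary` and `find_mentions` are all hypothetical, and the naive substring matching only stands in for the engine's own variant generation.

```python
import csv
import io

# Hypothetical dictionary file; column names and the ID are illustrative only.
SAMPLE_CSV = """form,alias,id,type,bank
Cash Card Account,Cash Card,prod_001,Current Accounts,Barclays
"""


def load_dictionary(csv_text):
    """Map every surface string (form or alias, lowercased) to its entry."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = {
            "form": row["form"],
            "id": row["id"],
            "semantic_info": {"type": row["type"], "bank": row["bank"]},
        }
        for surface in (row["form"], row["alias"]):
            if surface:
                lookup[surface.lower()] = entry
    return lookup


def find_mentions(text, lookup):
    """Return each dictionary entry mentioned in the text, once per entry ID."""
    lowered = text.lower()
    found = {}
    for surface, entry in lookup.items():
        if surface in lowered:
            found[entry["id"]] = entry
    return list(found.values())


lookup = load_dictionary(SAMPLE_CSV)
mentions = find_mentions("I opened a Cash Card Account at Barclays last week", lookup)
for mention in mentions:
    print(mention["form"], mention["semantic_info"])
```

Attaching the semantic info to the entry rather than to each surface string means that the form and all of its aliases resolve to the same product, type, and bank.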
Defining a new category: hybrid approach
Rule-based
Training-based
Possible to opt for one of the approaches, or to combine both, depending on the application
Defining a category: training
Fed with pre-labeled training texts
Based on machine learning technology
Defining a category: rules
Terms that
Are indispensable
Are banned
Increase relevance
Reduce relevance
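The four kinds of terms above can be sketched as a small scoring function. This is an illustration under stated assumptions, not the MeaningCloud rule language: the `CategoryRule` class, the scoring formula (base score of 1, plus or minus 1 per matching term), and the sample terms are all invented for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CategoryRule:
    """Hypothetical rule for one category, with the four term types."""
    name: str
    required: set = field(default_factory=set)   # indispensable terms
    banned: set = field(default_factory=set)     # terms that rule the category out
    boost: set = field(default_factory=set)      # terms that increase relevance
    penalize: set = field(default_factory=set)   # terms that reduce relevance


def relevance(rule, text):
    """Return 0 if the rule does not apply, otherwise a positive relevance score."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    if not rule.required <= tokens:   # every indispensable term must appear
        return 0
    if rule.banned & tokens:          # any banned term vetoes the category
        return 0
    score = 1 + len(rule.boost & tokens) - len(rule.penalize & tokens)
    return max(score, 0)


mortgage = CategoryRule(
    name="Mortgage",
    required={"mortgage"},
    banned={"insurance"},
    boost={"interest", "rate"},
    penalize={"advert"},
)

print(relevance(mortgage, "The mortgage interest rate is too high"))
print(relevance(mortgage, "Mortgage insurance offer"))
print(relevance(mortgage, "Great savings account"))
```

Because the banned and indispensable terms act as hard filters, a rule like this never fires on a text that lacks its key terms, at the cost of missing texts that phrase the topic differently.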
Improving precision and recall using rules and training
Statistical
Benefits: Fast, provided tagged texts are available. Good accuracy for long texts.
Disadvantages: “Black box” approach. False positives difficult to correct. Bias in results, depending on training.
Rules
Benefits: No false positives. Very good accuracy for limited environments.
Disadvantages: Costly if starting from zero. False negatives, depending on rule quality. Difficult to scale.
Hybrid
Benefits: Can be easily started with training texts. Does not need exhaustive definition of rules.
Disadvantages: Requires deep domain knowledge.
Outcome: APIs classify according to model
Justifies each classification’s relevance by the terms that appear in the text
Custom sentiment dictionaries (COMING SOON)
Not all terms have the same polarity in all domains
E.g., in the luxury goods domain the term “cheap” doesn’t necessarily have the positive polarity it has in other domains
Define a luxury goods custom sentiment dictionary where “cheap” is negative (N)
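A domain dictionary of this kind can be pictured as an override layer on top of a general one. The sketch below is only an illustration: the general-purpose entries, the `P`/`N`/`NONE` labels, and the function name `term_polarity` are assumptions, and the feature described on this slide was still in private beta.

```python
# Assumed general-purpose polarities (illustrative, not MeaningCloud data).
GENERAL_POLARITY = {"cheap": "P", "expensive": "N"}

# Custom dictionary for the luxury goods domain: only the overrides are listed.
LUXURY_GOODS_POLARITY = {"cheap": "N"}


def term_polarity(term, custom=None):
    """Look the term up in the custom dictionary first, then fall back to the general one."""
    term = term.lower()
    if custom and term in custom:
        return custom[term]
    return GENERAL_POLARITY.get(term, "NONE")


print(term_polarity("cheap"))                         # general domain
print(term_polarity("cheap", LUXURY_GOODS_POLARITY))  # luxury goods domain
```

Listing only the overrides keeps the custom dictionary small: every term that is not redefined keeps its general polarity.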
A given term can have different polarities, according to context
We’re presently testing this feature. If you want to take part in the private beta send an email to [email protected]
Term    Context          Polarity
close   stock market     Neutral
close   deal, contract   Positive
close   company          Negative
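The “close” example can be sketched as a lookup keyed by term and context words. This is a simplified illustration rather than the product’s behavior: the data structure, the first-match-wins order, the `NONE` default, and the bag-of-words context test are all assumptions.

```python
import re

# Term -> ordered list of (context words, polarity), taken from the "close" example.
CONTEXT_POLARITY = {
    "close": [
        ({"stock", "market"}, "NEUTRAL"),
        ({"deal", "contract"}, "POSITIVE"),
        ({"company"}, "NEGATIVE"),
    ],
}


def polarity_in_context(term, sentence, default="NONE"):
    """Return the polarity of the first context whose words appear in the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    for context_words, polarity in CONTEXT_POLARITY.get(term, []):
        if tokens & context_words:
            return polarity
    return default


print(polarity_in_context("close", "They managed to close the deal"))
print(polarity_in_context("close", "The company will close next month"))
print(polarity_in_context("close", "Shares fell before the stock market close"))
```

A sentence that matches several contexts takes the first one listed, so the order of the entries is part of the dictionary design in this sketch.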
Conclusions
How to improve accuracy?
Graphical tools
Possibility to include own dictionaries and models
Broad coverage: mentions, themes, opinions…
Empowered users
High accuracy analysis is within your reach.
Democratizing the extraction of meaning
High-quality semantic analysis
Optimized technology mix
Continuously updated semantic resources
High-level APIs, e.g., Corporate Reputation
Customizable to customer domain: models, dictionaries, sentiment
Affordable, no risks
Mature, tested technology
Test and use for FREE (40,000 requests per month)
Pay per use
No commitment or permanence
Commercial plans beginning at $99 /mo
For developers and non-technical users
Add-in for Excel
Standard web services APIs
Plug-ins and SDKs for diverse environments and languages
Plug-and-play approach
Opinions, themes, facts, concepts, organizations, people, relationships
Thank you for your attention!
Questions, suggestions...
Jarred McGinnis, PhD
Business Development, UK
http://www.meaningcloud.com
http://www.daedalus.es