Better Text Analytics with MeaningCloud

How our customization tools can boost text analysis accuracy

Webinar - Daedalus / MeaningCloud, May 14, 2015

Boost Your Text Analytics Accuracy - MeaningCloud Webinar

Introduction

Presenter

Jarred McGinnis, PhD

Business Development, UK

Logistics

Send text questions, or

“Raise your hand” to speak and we’ll open your mic

We will publish a link to the recorded webinar

Agenda

Text analytics: accuracy, precision, recall

Customized linguistic resources for improved accuracy

MeaningCloud customization tools

Conclusions, Q&A

Text analytics

Extract meaning and actionable insights from unstructured content

Automation of costly manual activities

[Diagram: Semantic Analysis, with outputs labeled Opinions, Facts, Concepts, Organizations, People, Relationships, Themes]

Just how precise is precise?

Precision is relative

Even experts aren’t 100% precise

Tests involving human analysts: 85-95% agreement

Along with precision, recall is also important

[Figure: three panels showing the items identified by the algorithm under high precision / high recall, high precision / low recall, and low precision / high recall]
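To make the two measures concrete, here is a minimal sketch that computes precision and recall from the set of items an algorithm identified and the set a human marked as relevant. The function name and the example mentions are hypothetical and chosen only for illustration.

```python
def precision_recall(identified, relevant):
    """Return (precision, recall) for identified items against the relevant set."""
    identified, relevant = set(identified), set(relevant)
    true_positives = len(identified & relevant)
    # Precision: share of identified items that are actually relevant.
    precision = true_positives / len(identified) if identified else 0.0
    # Recall: share of relevant items that were actually identified.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical mentions that a human annotator marked as relevant.
relevant = {"Citibank", "BBVA", "Barclays", "mortgage"}

# High precision, low recall: everything identified is right, but much is missed.
cautious = {"Citibank"}
# Low precision, high recall: everything relevant is found, along with noise.
greedy = {"Citibank", "BBVA", "Barclays", "mortgage",
          "bank", "rate", "office", "phone"}

print(precision_recall(cautious, relevant))  # (1.0, 0.25)
print(precision_recall(greedy, relevant))    # (0.5, 1.0)
```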

Accuracy: precision & recall

Precision and recall tend to be inversely related

Trade-off needed

Requirements are application-specific

Brand monitoring in social media: high precision, low recall

Counter-terrorism: high recall, low precision

Precision – Recall Curve
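The trade-off behind a precision-recall curve can be sketched by sweeping a confidence threshold over scored results. The scores and relevance flags below are invented for illustration; a real system would supply its own confidence values.

```python
def precision_recall_at(threshold, scored_items):
    """scored_items: list of (confidence, is_relevant) pairs."""
    selected = [rel for score, rel in scored_items if score >= threshold]
    total_relevant = sum(1 for _, rel in scored_items if rel)
    true_positives = sum(selected)  # True counts as 1
    # By convention, precision is 1.0 when nothing is selected.
    precision = true_positives / len(selected) if selected else 1.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Hypothetical results: (confidence assigned by the algorithm, truly relevant?)
scored = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
          (0.60, False), (0.40, True), (0.30, False), (0.10, False)]

# Lowering the threshold accepts more items: recall rises, precision falls.
for threshold in (0.9, 0.7, 0.4, 0.1):
    precision, recall = precision_recall_at(threshold, scored)
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

A brand-monitoring application would sit at the high-threshold end of this sweep, while a counter-terrorism application would accept the low-threshold end.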

State of the Art for Text Analysis

Precision Measurements

Topic Extraction: 70-85%

Classification: 70-80%

Sentiment Analysis: 60-70%

Quality improvement depends on the adaptation of the tools and resources to the application / task

MeaningCloud: cloud-based semantic APIs

Register and use it FREE at http://www.meaningcloud.com

MeaningCloud API services

Sentiment analysis: global, aspect-based

Classification: standard models

Topic extraction: entities, concepts, dates, addresses, economic quantities, time expressions…

https://www.meaningcloud.com/demos/media-analysis/

MeaningCloud: standard resources

Ontodaedalus (ontology): 437 nodes

78 themes

250,000+ lemmas/language

https://www.meaningcloud.com/developer/documentation/ontodaedalus

MeaningCloud: Standard classification models

Out-of-the-box support for well-known classification standards

IPTC: news

Business Reputation: corporate reputation

EuroVoc: public administration

IAB (coming soon): advertising

https://www.meaningcloud.com/developer/resources/models

A practical example

A walk through MeaningCloud customization tools

VoC / Customer Insights scenario

Social networks, forums

Survey verbatims

Contact Center interactions: voice, email…

Structure and extract meaning

What companies/brands are they mentioning?

What are they talking about?

What’s their opinion?

Analysis

Insights

Customized linguistic resources improve accuracy

Opinions

The sentence “The highest interest rate in industry!” is…

Positive, if talking about savings

Negative, if talking about mortgages

Mentions

Names of banks and financial companies, e.g., Citibank, BBVA

Product names, e.g., Your Waysm Account, Compass Account…

Themes

Example: analysis of a bank’s customer opinions

Products

Accounts

Checking

Savings

Borrowing

Credit

Mortgage

Channel

Office

Phone

Internet

Demo agenda

Bank dictionary

Bank names, product names (entities)

Generic product names, e.g., mortgage (concepts)

Classification models

Channel model: phone, web…

Sentiment models (preview)

MeaningCloud customization tools

Customized dictionaries

Creating a new dictionary

Possible to import dictionary from file

Creating a new entity

Aliases: It is NOT necessary to explicitly include “trivial” aliases as the engine generates typical variants

Use your own ontology

Possible to include additional semantic info

Resulting dictionary

Entities

Concepts

Dictionary derived ontology

Dictionary import

The best way to include a pre-existing dictionary in MeaningCloud

Columns: form, alias, ID, semantic info attributes

Outcome: APIs identify topics in dictionary

Identifies semantic info: Product: Cash Card Account; Type: Current Accounts; Bank: Barclays
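As a rough illustration of what a custom dictionary contributes, the sketch below matches forms and aliases in a text and returns the attached semantic info. It is not MeaningCloud's engine or file format: the `Entry` structure, the IDs, the aliases, and the matching logic are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    form: str        # canonical form of the entity or concept
    entry_id: str    # hypothetical ID
    kind: str        # "entity" or "concept"
    aliases: list = field(default_factory=list)
    semantic_info: dict = field(default_factory=dict)


# Hypothetical bank dictionary, loosely following the webinar example.
DICTIONARY = [
    Entry("Cash Card Account", "P001", "entity",
          aliases=["Cash Card"],
          semantic_info={"Type": "Current Accounts", "Bank": "Barclays"}),
    Entry("Barclays", "B001", "entity", aliases=["Barclays Bank"]),
    Entry("mortgage", "C001", "concept", aliases=["home loan"]),
]


def find_topics(text, dictionary=DICTIONARY):
    """Return entries whose form or alias occurs in text (whole words, any case)."""
    found = []
    for entry in dictionary:
        for surface in [entry.form, *entry.aliases]:
            pattern = r"\b" + re.escape(surface) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                found.append(entry)
                break  # one hit per entry is enough
    return found


for hit in find_topics("I opened a Cash Card account at Barclays last week."):
    print(hit.kind, hit.form, hit.semantic_info)
```

The sketch only does literal matching; the real engine also generates typical alias variants, which this example does not attempt.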

Custom classification models

Creating a new model

Ability to import model from file

Defining a new category: hybrid approach

Rule-based

Training-based

Possible to opt for one of the approaches, or to combine both, depending on the application

Defining a category: training

Fed with precodified training texts

Based on machine learning technology
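The slides do not say which learning algorithm is used, so as a generic illustration of training-based classification, here is a minimal multinomial Naive Bayes sketch fed with precodified texts. The training sentences and category names are invented, and the algorithm is a stand-in rather than MeaningCloud's actual method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_texts):
    """labelled_texts: iterable of (text, category) pairs."""
    word_counts = defaultdict(Counter)   # category -> word frequencies
    doc_counts = Counter()               # category -> number of training texts
    for text, category in labelled_texts:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the category with the highest log-probability for text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)  # prior
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical precodified training texts for a "channel" model.
training = [
    ("I called the helpline and waited on the phone", "phone"),
    ("The agent on the call was helpful", "phone"),
    ("The website login page keeps failing", "internet"),
    ("Online banking on the web is fast", "internet"),
    ("The branch office staff were friendly", "office"),
    ("I visited the office and queued for an hour", "office"),
]
model = train(training)
print(classify("Nobody answered my call on the phone", model))
print(classify("The web page is slow", model))
```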

Defining a category: rules

Terms that

Are indispensable

Are banned

Increase relevance

Reduce relevance
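The four kinds of terms can be sketched as a small scoring function. This is a hypothetical reading of the rule types, not MeaningCloud's rule syntax: it assumes that every indispensable term must appear, that any banned term rejects the category, and that the other terms adjust a base score by invented weights.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CategoryRules:
    name: str
    indispensable: set = field(default_factory=set)  # all must appear
    banned: set = field(default_factory=set)         # none may appear
    boost: dict = field(default_factory=dict)        # term -> weight added
    reduce: dict = field(default_factory=dict)       # term -> weight subtracted


def tokenize(text):
    return set(re.findall(r"[a-z']+", text.lower()))


def relevance(text, rules):
    """Return a non-negative relevance score; 0.0 means the category is rejected."""
    tokens = tokenize(text)
    if not rules.indispensable <= tokens:
        return 0.0  # an indispensable term is missing
    if rules.banned & tokens:
        return 0.0  # a banned term is present
    score = 1.0
    score += sum(w for term, w in rules.boost.items() if term in tokens)
    score -= sum(w for term, w in rules.reduce.items() if term in tokens)
    return max(score, 0.0)


# Hypothetical rules for a "phone" channel category.
phone = CategoryRules(
    name="Channel/Phone",
    indispensable={"phone"},
    banned={"smartphone"},
    boost={"call": 0.5, "agent": 0.5},
    reduce={"app": 0.5},
)

print(relevance("I waited 20 minutes on the phone before an agent took my call.", phone))  # 2.0
print(relevance("The new smartphone app is great.", phone))                                # 0.0
print(relevance("The phone app keeps crashing.", phone))                                   # 0.5
```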

Improving precision and recall using rules and training

Approach: Statistical

Benefits: fast, provided tagged texts are available; good accuracy for long texts

Disadvantages: “black box” approach; false positives difficult to correct; bias in results, depending on training

Approach: Rules

Benefits: no false positives; very good accuracy for limited environments

Disadvantages: costly if starting from zero; false negatives, depending on rule quality; difficult to scale

Approach: Hybrid

Benefits: can be easily started with training texts; does not need exhaustive definition of rules

Disadvantages: requires deep domain knowledge

Resulting model

Classification model import

The best way to configure a pre-existing model in MeaningCloud

Outcome: APIs classify according to model

Justifies the classification relevance based on the terms that appear in the text

Application of Sentiment Analysis

Sentiment: use of custom entity and concept dictionaries

Polarity associated with Barclays

Custom sentiment dictionaries (COMING SOON)

Not all terms have the same polarity in all domains

E.g., in the luxury goods domain, the term “cheap” doesn’t necessarily have a positive polarity (as it does in other domains)

Define a custom sentiment dictionary for luxury goods where “cheap” has negative polarity (N)

A given term can have different polarities, according to context

We’re presently testing this feature. If you want to take part in the private beta send an email to [email protected]

Term: close; Context: stock market; Polarity: neutral

Term: close; Context: deal, contract; Polarity: positive

Term: close; Context: company; Polarity: negative
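The term/context/polarity rows for “close” can be read as a lookup keyed on the term and on the context words found in the same sentence. The sketch below is one hypothetical way to express that idea; the data structure, the exact-token matching, and the neutral default are assumptions, not the format of the upcoming feature.

```python
import re

# term -> list of (context words, polarity); rows mirror the "close" example.
CONTEXT_POLARITY = {
    "close": [
        ({"stock", "market"}, "NEUTRAL"),
        ({"deal", "contract"}, "POSITIVE"),
        ({"company"}, "NEGATIVE"),
    ],
}


def polarity_in_context(term, sentence, default="NEUTRAL"):
    """Return the polarity of term given the other words in sentence.

    Returns None when the term does not occur. Matching is on exact
    lowercase tokens, so inflected forms such as "closed" are not handled.
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    if term not in tokens:
        return None
    for context_words, polarity in CONTEXT_POLARITY.get(term, []):
        if context_words & tokens:
            return polarity  # first matching context wins
    return default


print(polarity_in_context("close", "We expect to close the deal on Friday."))    # POSITIVE
print(polarity_in_context("close", "They will close the company next year."))    # NEGATIVE
print(polarity_in_context("close", "The stock market will close early today."))  # NEUTRAL
```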

Conclusions

How to improve accuracy?

Graphical tools

Possibility to include own dictionaries and models

Broad coverage: mentions, themes, opinions…

Empowered users

High accuracy analysis is within your reach.

Democratizing the extraction of meaning

High-quality semantic analysis

Optimized technology mix

Continuously updated semantic resources

High-level APIs, e.g., Corporate Reputation

Customizable to customer domain: models, dictionaries, sentiment

Affordable, no risks

Mature, tested technology

Test and use for FREE (40,000 requests per month)

Pay per use

No commitment or permanence

Commercial plans beginning at $99 /mo

For developers and non-technical users

Add-in for Excel

Standard web services APIs

Plug-ins and SDKs for diverse environments and languages

Plug-and-play approach

Thank you for your attention!

Questions, suggestions...

Jarred McGinnis, PhD

Business Development, UK

[email protected]

http://www.meaningcloud.com

http://www.daedalus.es