Lexalytics Loves Machine Learning

Whitepaper

INTRODUCTION

We often get the question “why should we use Lexalytics software when there are so many great machine learning services out there?” Because it’s not a question of “or”; it’s an opportunity for “and.” Besides, Lexalytics outperforms any single-technique method by layering all of the proven approaches in a single package.

Lexalytics maintains almost 40 different machine-learning models, covering everything from part-of-speech tagging to language-specific sentiment. We use a number of different types of machine learning, both supervised and unsupervised. We combine those with hand-curated dictionaries, sophisticated natural language processing algorithms such as lexical chaining, and custom context-free grammars.

In other words, we don’t believe that one tool is right for all jobs, and there is no such thing as a free lunch.

Companies that are entirely focused on machine learning tend to harp on applications like extracting named entities, categorizing content into buckets, and document-level sentiment. Some of these cases can work really well with a “just machine learning” approach, but even these simple cases are often better handled by multiple layers.

This paper isn’t going to delve into different machine learning algorithms and the advantages and disadvantages of each; that’s a topic for a different time. We also have a separate build-vs.-buy paper that takes a more financial view, so we’re not going to discuss the build/buy conundrum here, at least not from a “total cost of ownership” or “return on investment” perspective.

There are a few main points that we’ll be talking about:

• Single purpose
• Opacity
• Customizability/trainability
• Multi-tenancy issues
• Using text mining results as part of your machine-learning models

MODELS ARE BUILT FOR ONE TASK

We’re going to focus on supervised machine learning algorithms for this part of the paper. The promise of these models is this: take a bunch of content that’s tagged however you want it classified, be it “here’s the sentiment” or “here are the entities” or “here’s the category this content belongs to,” pour it into your magic model-building system, and ta-da, you have a high-precision, high-recall, easy-to-use system for identifying the classes you indicated.

And it’s true, models can work really well for those three tasks, which are all subsets of the same task: classification. (Which is why we use machine learning models so extensively throughout our software.)
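
To make that promise concrete, here is a minimal sketch of the one-pass workflow, using scikit-learn rather than anything Lexalytics-specific; the documents, labels, and model choice are all invented for illustration.

```python
# A minimal "pour tagged content into a model builder" sketch.
# Not Lexalytics code; data and model choice are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "The battery life on this phone is fantastic.",
    "Terrible support experience, still waiting on a refund.",
    "Shipping was quick and the packaging was intact.",
    "The app crashes every time I open the camera.",
]
labels = ["positive", "negative", "positive", "negative"]

# One pass: vectorize the text, then fit a single-task classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)

print(model.predict(["Great phone, awful charger."]))  # one label per document
```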

The difficulty comes when you have what is effectively a multi-pass problem. Say you want to get entity sentiment. Take a sentence like “Pepsi is great but Coke sucks.” (Or vice versa; I’m not harshing on Coke here.) Entity sentiment would pull out that Pepsi is positive and Coke is negative. Or take something like “It’s not like I don’t like coffee, I just prefer tea,” which is a tad more complicated, but should still end up with a sentiment that is more positive for “tea” than for “coffee.” That’s still a relatively uncomplicated example; there are far more complex cases that involve many sentences in a paragraph.

For these cases, you need to extract the entities, and then you need to associate sentiment with the individual entities themselves. No simple model-building process is going to accomplish this task. Yes, there is active research in this area, but no off-the-shelf machine learning system is going to give you this functionality.
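
A hypothetical illustration of the multi-pass idea: find the entities first, then score only the clause each entity sits in. The entity list, phrase weights, and clause splitting below are toy stand-ins, not our implementation.

```python
# Pass 1: split into clauses. Pass 2: attach clause-level sentiment
# to each entity found in that clause. Everything here is a toy stand-in.
import re

KNOWN_ENTITIES = {"Pepsi", "Coke"}
PHRASE_SCORES = {"great": 1.0, "sucks": -1.0}

def entity_sentiment(text):
    scores = {}
    for clause in re.split(r"\bbut\b|,|;", text):
        clause_score = sum(v for k, v in PHRASE_SCORES.items()
                           if k in clause.lower())
        for entity in KNOWN_ENTITIES:
            if entity in clause:
                scores[entity] = scores.get(entity, 0.0) + clause_score
    return scores

print(entity_sentiment("Pepsi is great but Coke sucks."))
# {'Pepsi': 1.0, 'Coke': -1.0}
```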

Similar thinking applies to Entity Themes, Entity Summaries, Theme Sentiment, or Category Sentiment. These are all multi-layered problems that let you zoom into what’s actually being said, rather than just giving you a generic view of the content. It’s like spy satellites. Is it better to be able to identify that “there’s a bunch of people there,” or is it better to be able to say “there’s a known terrorist there, and he’s holding a gun”? Resolution is important, and lone models don’t give fine-grained resolution of what’s in content.

That’s why the Lexalytics text analytics system is built upon layers and layers of functionality.

As a side issue, even if you just want to handle a simple task, like document sentiment, you still need to train a model for every language you support. Right now, Lexalytics supports 18 languages, covering approximately 63% of the world’s native speakers. We’ve broken the layers down even further, to take into account some of the vagaries of social content vs. the neatness of news content.
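
A sketch of what that multiplication looks like in practice, with invented file names: one trained artifact per language-and-register pair.

```python
# Hypothetical model registry: each (language, register) combination
# needs its own trained artifact. Paths are invented for illustration.
MODELS = {
    ("en", "news"): "models/sentiment-en-news.bin",
    ("en", "social"): "models/sentiment-en-social.bin",
    ("fr", "news"): "models/sentiment-fr-news.bin",
    # ...one entry per supported language and register
}

def pick_model(language, register):
    return MODELS[(language, register)]

print(pick_model("en", "social"))  # models/sentiment-en-social.bin
```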

MODELS ARE OPAQUE

Models are black boxes. You feed them a bunch of content, they work their magic and spit out a result. You can’t really see why they gave you this result, but the result is there. Take sentiment as an example. Our primary mode of accomplishing document sentiment analysis takes a multi-step approach of:

1. Tokenization
2. Part-of-speech tagging
3. Chunking
4. Extraction of candidate sentiment-bearing phrases
5. Matching patterns to ensure that they are actually sentiment phrases
6. Evaluating intensifiers and negators
7. Considering the location of phrases within a sentence
8. Performing the final math to come up with an overall document score and confidence measure

There are several models involved in this process, as well as a very large dictionary of phrases; some pattern files that let us distinguish actual sentiment-bearing phrases from strings that look the same but have different part-of-speech patterns, and thus carry no sentiment (someone named “Rich” isn’t necessarily wealthy); and the final computation that rolls it all together.
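
Here is a toy, runnable compression of that staged flow, with stages 2, 3, 5, and 7 reduced to comments and every value invented; the point is only that each stage produces an intermediate result you can inspect.

```python
# Toy staged sentiment scorer. Tagging, chunking, pattern matching, and
# position weighting (stages 2, 3, 5, 7) are omitted; values are invented.
INTENSIFIERS = {"very": 1.5, "really": 1.3}
NEGATORS = {"not", "never", "hardly"}
PHRASES = {"great": 1.0, "terrible": -1.0}

def document_sentiment(text):
    tokens = text.lower().strip(".!?").split()       # 1. tokenize
    scored = []
    for i, tok in enumerate(tokens):
        if tok not in PHRASES:                       # 4. candidate phrases
            continue
        score = PHRASES[tok]
        if i and tokens[i - 1] in INTENSIFIERS:      # 6. intensifiers...
            score *= INTENSIFIERS[tokens[i - 1]]
        if NEGATORS & set(tokens[max(0, i - 3):i]):  # ...and negators
            score = -score
        scored.append(score)
    if not scored:
        return 0.0, 0.0
    # 8. final math: average score plus a crude confidence measure
    return sum(scored) / len(scored), min(1.0, len(scored) / 5)

print(document_sentiment("The service was really great."))  # (1.3, 0.2)
```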

In our system, the results of each stage are completely visible and transparent. You can see every layer that builds up to the final result. This is what “deep learning/neural network” models try to accomplish, but our approach has the added benefit of letting you get in and tune every single layer. And although deep learning is “deep,” it isn’t necessarily performing the steps above; there are just a bunch of layers, not clean, easily separable phases.

You can pry some models apart and see the vectors applied to certain words, but generally speaking these are black boxes, and it can be hard or impossible to tell just what went into the decision to score a piece of content a particular way.
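
For what it’s worth, here is about as far as prying goes for a simple linear classifier (a scikit-learn sketch with invented data): you can read the learned per-word weights, and that’s roughly the limit; deeper models rarely offer even that much.

```python
# Fit a small bag-of-words classifier, then inspect its per-word weights:
# the closest thing to an explanation a plain model gives you.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["great phone", "terrible support", "great service", "terrible battery"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (invented)

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(docs), labels)

order = np.argsort(clf.coef_[0])  # weights sorted from most negative
words = vec.get_feature_names_out()
print("most negative:", words[order[0]], "| most positive:", words[order[-1]])
```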

CUSTOMIZATION

Models are opaque. So what? It’s doing what I want it to do, and I don’t really care what’s inside of it.

Ah. But what happens if your model isn’t doing what you want it to do? For example, one of your users says “I don’t like how this article is scored, change it.” Some model-based systems have feedback built in, and if enough people score something differently, the model will eventually start to take that into consideration.

But the effect isn’t necessarily immediate or precise, because the model is going to take the whole article into account. In the Lexalytics system, you would check which sentiment phrases were contributing to the score, and then change them to match what you wanted. From then on, that change is reflected in all ongoing content.

There is a misperception that you don’t have to tune models. The very process of training a model is a tuning process. We could certainly train a model for you based on your tagged set, and we’ve done that for customers who want exactly that. But our system ships configured on a set of broadly interesting content, to give generically useful results, and it’s usually only a matter of changing a relatively small number of sentiment phrases to get it where you want it. Tuning the Lexalytics system can be done by someone who isn’t a machine-learning specialist, and the changes apply immediately and predictably.
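
A hypothetical sketch of what dictionary-style tuning buys you: flip one phrase weight and every subsequent score changes, immediately and predictably, with no retraining. The phrases and scoring function are stand-ins, not our API.

```python
# Stock phrase dictionary; weights and phrases are invented.
phrase_weights = {"sick": -0.8, "killer": -0.6}

def score(text, weights):
    return sum(w for phrase, w in weights.items() if phrase in text.lower())

review = "That show was sick!"
print(score(review, phrase_weights))  # -0.8 under the stock dictionary

# A user who means "sick" as praise flips one entry; no retraining involved.
phrase_weights["sick"] = 0.9
print(score(review, phrase_weights))  # 0.9 from here on
```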

MULTI-TENANCY

Different users have different viewpoints. Even inside the same organization, you can have two people wanting the same piece of content scored differently. We’ve put a lot of thought into supporting this, so that our software can run separate configurations for people with those differing views.

One model can’t handle this; you’d have to maintain two separate models for those two users. Our configurations handle both the contextualization of the content and the varying sentiment weights that, say, a security team might put on terms versus a social media team.
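
A hypothetical sketch of multi-tenancy done as configuration rather than duplicate models: one scoring engine with per-team overrides. The team names, phrases, and weights are invented.

```python
# One engine, many tenants: each team overrides only the phrases it
# cares about. All names and weights are invented for illustration.
BASE_WEIGHTS = {"breach": -0.5, "viral": 0.0}

TENANT_OVERRIDES = {
    "security_team": {"viral": -0.9},  # "viral" is bad news here...
    "social_team": {"viral": 0.9},     # ...and good news here
}

def score_for(tenant, text):
    weights = {**BASE_WEIGHTS, **TENANT_OVERRIDES.get(tenant, {})}
    return sum(w for phrase, w in weights.items() if phrase in text.lower())

post = "Our video went viral overnight."
print(score_for("security_team", post))  # -0.9
print(score_for("social_team", post))    #  0.9
```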

This is clearly most important for our OEM partners who are providing services to many customers, but you will see this inside of some of our large brand customers as well.

JOINING TEXT ANALYTICS AND MACHINE LEARNING

Like I said at the beginning, we’re big fans of machine learning, and we just want you to use it in the right way. We use it every single day. We process billions of pieces of content through our many different models, constantly.

In addition, we work really well as an input into a higher-level machine learning system.

Say you want to do some predictive analytics, or even backwards-looking analytics. You have a large amount of structured data already: sales, locations, customer IDs, demographic information, etc. You may also have a bunch of customer comments, tweets about your brand, support emails, and sales call logs. As an aside, the best unstructured information (for what we’re talking about now) is that which can be tied back to a customer identity, because you know all kinds of other stuff about them.

You could simply plop all of the structured and unstructured data together and let the model sort out what’s important and unimportant, based on what you want to predict, like sales. The problem is that the model’s view of the unstructured data is going to be simplistic.

Let us distill out the important points, like themes or sentiment, or roll the text up into broad “categories of conversation,” like mentions of your sales staff. Then take this extracted output, combine it with your other structured data, and you might see that improving the lighting would drive more sales in your Los Angeles location, or that customers who swear a lot are prone to buying fedoras.
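
A sketch of that “distill first, then model” idea using scikit-learn, with all data invented: text-derived features (a sentiment score and a lighting-complaint flag) sit alongside structured columns before the downstream model ever sees them.

```python
# Join distilled text features with structured data for a downstream
# predictive model. Every number here is invented for illustration.
from sklearn.linear_model import LinearRegression

# Structured columns: [store_size_sqft, weekly_foot_traffic]
structured = [[1200, 340], [800, 210], [1500, 400], [950, 260]]

# Distilled upstream from customer comments: [sentiment, mentions_lighting]
text_features = [[0.6, 0], [-0.4, 1], [0.8, 0], [-0.2, 1]]

weekly_sales = [52000, 31000, 67000, 36000]

X = [s + t for s, t in zip(structured, text_features)]
model = LinearRegression().fit(X, weekly_sales)

# The model can now weigh "customers complain about the lighting" directly,
# instead of having to re-derive it from raw text.
print(model.predict([[1000, 300, -0.5, 1]]))
```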

In other words, extensive, layered text mining allows for exploration and discovery. Machine learning-based classification can plop content into buckets for you, but sentiment phrases, themes, and summaries can tell you things you didn’t know you were looking for.

CONCLUSION

Machine learning is a tool. We use a number of different types of machine learning, both supervised and unsupervised, and we combine those layers with other tools: linguistic algorithms, patterns, and dictionaries. Results from our text analytics system are transparent, and each layer is easily tunable.

Machine learning is a tool that is really good at doing single-layer classification. Meaning, even if it is a so-called “deep learning” system, it’s still going to be aimed at “is this document positive” or “is this document about trees.” Multi-layer problems, like “entity sentiment” require a multi-pass/multi-leveled approach.

Not all machine-learning models require the same amount of effort. Training a generic “category” classifier is way less effort than training a named entity extraction model, and that, in turn, is way less effort than training something like a part-of-speech model.

We’re happy to train up your own custom sentiment model, entity extraction model, or other classification model. But it will likely be less work to tune the system in other ways, and you’ll keep access to the rich, multi-layered results that we provide.

Let us worry about extracting meaning from your text, and use our distilled output to enable your other models that are looking more broadly at the entirety of your business.

320 Congress St
Boston, MA 02210

General Inquiries: 1-800-377-8036
Sales: sales@lexalytics.com, 1-800-377-8036 x1
International: 1-617-249-1049