The Truth about Sentiment & Natural Language Processing (NLP) by Synthesio

Synthesio – The Truth About Natural Language Processing - March 2011


Summary

Introduction
Artificial Intelligence’s difficulties with sentiment
Human analysis is an obligatory step when analyzing web content
Current technological advances
The future of semantic technology
Conclusion


Introduction

The web has made it possible for brands to discover what people are saying about them online, whether in mainstream media such as online newspapers and magazines or on social media. Consumers now search for opinions online before, during, and after a purchase. The next step for brands is finding out whether people are talking positively or negatively about them, and why. Some online ratings provide a number but not the reasoning behind it, and so may present only half of the story. Some companies have been working on text mining for close to 30 years, so sentiment analysis is not a new area, but it has become a hot topic thanks to social media. Social media monitoring companies, PR practitioners, and digital marketers in general have debated whether sentiment should be analyzed by man or machine. Synthesio currently uses human analysts for sentiment analysis but can add natural language processing capabilities on a case-by-case basis. Although technology is quickly closing the gap with human analysis as we advance toward what is referred to as the singularity, the best option for now appears to be combining machine and man.


Artificial Intelligence’s difficulty with sentiment

One way that researchers have attempted to classify sentiment is by creating a “sentiment lexicon”

Sentiment is not analyzed via artificial intelligence, as some people may be tempted to think. Rather, it is analyzed via a systematic process built on a sentiment lexicon. The lexicon assigns each word, taken by itself, a degree of positivity or negativity, and those word-level scores are then combined to characterize the entire article. In other words, sentiment is analyzed by considering the inherent positivity or negativity of each word that someone might use to talk about your business or products. For example, “happy” would be deemed a positive word, as would “like” and “love”. At the opposite end of the spectrum are words like “hate” and “dislike”. There are two problems with this methodology, however. The first is that assigning positive or negative sentiment this way evaluates a word without the context around it, and the number of words that always lend a positive or negative sentiment to an expression is extremely limited. The second is that researchers may assign different degrees of positivity to the same word. Particularly in the case of ambiguous expressions, one researcher may rate a word as more positive, or less positive, than another would.
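To make the lexicon approach concrete, here is a minimal sketch in Python. The words and scores in LEXICON are illustrative assumptions, not values from any published lexicon, and the function names are hypothetical.

```python
import re

# Hypothetical lexicon: both the words and their scores are assumptions
# chosen for illustration only.
LEXICON = {"happy": 1.0, "like": 0.5, "love": 1.0, "dislike": -0.5, "hate": -1.0}


def lexicon_score(text):
    """Sum the lexicon scores of every known word, ignoring all context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0.0) for word in words)


def label(text):
    """Turn the summed score into a coarse label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("I love this phone"))      # scored on the word "love"
print(label("I don't like this phone"))  # "like" is scored, "don't" is ignored
```

The second call shows the first problem described above: because each word is scored in isolation, "don't like" is treated exactly as "like" would be.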

Text categorization classifies articles by topic [1]

Text categorization does not classically look at the various features mentioned within one article. Sentiment analysis has traditionally been performed using technology that evaluates an article at a global level. Within one text, however, the descriptors may not be linked to the topic in any simple way. For example, take the passage: “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.” Judged by the number of positive descriptors, the passage should be positive. It is only at the end that a human can identify that the final judgment is negative overall.

The dictionaries used are developed through analysis of various factors, including sentiment polarity and degrees of positivity (“like” vs “dislike”; relatedness of topics), identifying which parts of a document contain subjective content (subjectivity detection and opinion identification), identifying which parts of a document concern the same subject before analyzing (joint topic-sentiment analysis), and determining the political orientation of a text (viewpoints and perspectives). Other non-factual information in the text can also be taken into account. For example, six “universal” emotions (anger, disgust, fear, happiness, sadness, and surprise) [2] may be analyzed, as well as term presence, term frequency, syntax, and negation.
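Term presence, term frequency, and negation can be illustrated with a short Python sketch. The negator list and the "NOT_" prefix are assumptions made for this example (prefixing negated words is one common trick, not necessarily what any particular vendor does), and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed, deliberately incomplete list of negating words.
NEGATORS = {"not", "no", "never", "can't", "don't", "isn't"}
CLAUSE_END = {".", ",", "!", "?", ";"}


def mark_negation(tokens):
    """Prefix every word after a negator with NOT_ until the clause ends."""
    marked, negating = [], False
    for token in tokens:
        if token in CLAUSE_END:
            negating = False
        elif token in NEGATORS:
            negating = True
        else:
            marked.append("NOT_" + token if negating else token)
    return marked


def extract_features(text):
    """Return (term presence, term frequency) after negation marking."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    frequency = Counter(mark_negation(tokens))
    presence = set(frequency)
    return presence, frequency


presence, frequency = extract_features("Good plot, good cast. It is not good.")
print(frequency["good"], frequency["NOT_good"])
```

Presence records only whether a term occurs, while frequency records how often. Negation marking keeps "not good" from being counted as another "good".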

The majority of sentiment analysis literature has focused on text written in English

This means that, for the time being, most of the resources developed for automatic sentiment analysis have been developed in English and for the English language. We discuss this with Seth Grimes, a text analytics expert, in an exclusive interview later in this document, but there have traditionally been two types of solutions for other languages. One has been to use bilingual dictionaries to transfer the English resources, meaning finding equivalents for all of the rules that were applied to the English texts. The second has been to apply sentiment analysis to a translated version of the text, but accuracy rates may be questionable.

[1] Opinion mining and sentiment analysis, 2008
[2] Idem


Seth Grimes, expert in NLP

There are companies that offer sentiment analysis in one language (typically English), while others offer analysis in 10 different languages. Linguistic approaches (lexicons and dictionaries) may be used for several languages, but they have incomplete sentiment capabilities in most of them. Translating French or Chinese content before analyzing it, for example, can’t possibly offer the best results.


Human analysis is an obligatory step when analyzing web content

Machines are capable of deciphering meaning from large amounts of information

An advantage of automating text analysis is that computers can work through large bodies of text that are homogeneous in form and written in one language much more quickly than a human ever could. In much the same way that macros in Excel speed up a person’s work, having algorithms process information can accelerate sentiment analysis. To obtain high levels of accuracy, however, the text must be written using a specific vocabulary with very little variability.

Collocations and complex syntactic patterns have been found to be useful in detecting subjectivity [3]

Some technology experts have attempted to build syntactic relations into feature sets that are then tested on text corpora to “train” the software and allow it to detect subjective expressions. This is done by creating syntactic templates that are run through a training corpus, generating an extraction pattern every time a template matches. For example, the pattern <x> pleased me should match any time the word “pleased” appears in that construction. The technique has certain limitations, as the software then searches for specific syntactic expressions rather than exact sequences of words. When analyzing for sentiment, then, this is only the first step: identifying whether there is any sentiment present at all.
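As a rough illustration of how an extraction pattern such as <x> pleased me behaves, the Python sketch below approximates two patterns with regular expressions. This is a simplifying assumption: the systems described above match over parsed syntax, not raw strings. The second pattern and the function name are hypothetical.

```python
import re

# Hypothetical patterns; regular expressions stand in for syntactic templates.
PATTERNS = {
    "<x> pleased me": re.compile(r"\b(?P<x>[\w ]+?) pleased me\b", re.IGNORECASE),
    "<x> annoyed me": re.compile(r"\b(?P<x>[\w ]+?) annoyed me\b", re.IGNORECASE),
}


def find_subjective(sentence):
    """Return (pattern name, filler for <x>) for every pattern that matches."""
    hits = []
    for name, pattern in PATTERNS.items():
        match = pattern.search(sentence)
        if match:
            hits.append((name, match.group("x").strip()))
    return hits


print(find_subjective("The ending pleased me"))
print(find_subjective("I was pleased with the room"))
```

The second sentence contains "pleased" but not in the construction the pattern describes, so nothing is extracted. A match only flags that a subjective expression is present; deciding whether it is positive or negative is a separate step.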

Online reviews have had the most success with NLP

“Opinion-oriented information extraction” is advancing in identifying the subjects in a text and their relationship with the surrounding words that give them context. [4] Nouns in online reviews are particular in that they most likely, but not always, pertain to the product or service being reviewed. The context is similarly most likely, but not always, the reviewer’s opinion of that product or service. Whereas other online media, like blog posts, may voice various opinions throughout one post, with positive and negative sentiment attached accordingly, online reviews typically focus on a single subject. One heuristic for NLP software has been to detect adjectives that appear in the same sentence as the feature, product, or service being evaluated. These can then be analyzed using manual or semi-manual rules or lexicons. Public relations specialist KD Paine explains:

“Computers can do a lot of things well, but differentiating between positive and negative comments in consumer generated media isn’t one of them. The problem with consumer generated media is that it is filled with irony, sarcasm and non-traditional ways of expressing sentiment. That’s why we recommend a hybrid solution. Let computers do the heavy lifting, and let humans provide the judgment.” – KD Paine
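The adjective heuristic described before the quote (look for adjectives in the same sentence as the feature being reviewed) can be sketched in Python as follows. A real system would use a part-of-speech tagger. Here a tiny hand-made adjective list stands in for one, and both word lists and the function name are assumptions for illustration.

```python
import re

# Assumed stand-ins: a real system would tag adjectives automatically and
# would take the feature list from the product or service being reviewed.
ADJECTIVES = {"great", "friendly", "clean", "noisy", "slow"}
FEATURES = {"battery", "screen", "staff", "room"}


def feature_opinions(review):
    """Attach every adjective in a sentence to every feature in that sentence."""
    opinions = {}
    for sentence in re.split(r"[.!?]+", review):
        words = re.findall(r"[a-z]+", sentence.lower())
        adjectives = [word for word in words if word in ADJECTIVES]
        for word in words:
            if word in FEATURES:
                opinions.setdefault(word, []).extend(adjectives)
    return opinions


print(feature_opinions("The staff was friendly. The room was noisy."))
print(feature_opinions("The room was clean but the staff was slow."))
```

The first review works as intended. In the second, both features share one sentence, so each is credited with both adjectives, which is the kind of case where human judgment or finer rules are still needed.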

[3] Learning Extraction Patterns for Subjective Expression
[4] Opinion mining and sentiment analysis, 2008


Current technological advances

Technology is continually progressing

Mao and Lebanon are two researchers who proposed using “isotonic conditional random fields” to analyze sentiment at the sentence level. [5] They built a mathematical model to determine sentiment, given that certain words may be strongly positive or negative and thus push the “local sentiment” in one direction or the other. Models like this could offer new ways of programming machines to determine sentiment within certain probabilities, while also incorporating the author into the equation. Such approaches are interesting because human reviewers do not always agree, either.

“I, for one, welcome our new computer overlords” – Ken Jennings, Jeopardy contestant

Watson, a question-and-answer computer developed by IBM, made history this year on Jeopardy, an American game show renowned for its difficult question-and-answer format, by appearing against two top champions. Contestants typically study volumes of encyclopedias in order to reach the final round, but IBM put its supercomputer Watson to the test, and it won. Watson was programmed not only to buzz in according to its level of certainty for each question, but also to answer in the form of a question and to decipher the complex language that goes into a game of Jeopardy. The category names are often puns, as are the “answers” (which serve as questions). [6] IBM proved that its technology has advanced to the point where it can intelligently parse language and weigh different parts of a phrase. Researchers scanned some 200 million pages of content, or the equivalent of about one million books, into the system but were unable to teach it to avoid all traps. During the practice session, one wrong answer led to a string of wrong answers, as the machine veered in the wrong direction.

The web is composed of many different types of media, both mainstream and social

Some online media are more “fact-based,” such as newspapers or general news, while others are inherently more “opinion-based,” like Twitter, Facebook, and forums. Still other media, like blogs, may be one or the other, all of which makes it difficult for automated sentiment analysis technology to differentiate between subjective and objective information. For example, compare the sentence “the battery lasts 2 hours” with “the battery only lasts 2 hours”: the second implies a sentiment that the first does not. Social media has also engendered new forms of expression, such as “SMS-like” writing, that make text analysis more complicated. Emoticons may or may not help, and slang is more common in social media, along with misspellings, bad grammar, and missing or added characters. Take, for example, “oh my gooooooood WTF did you see Biebur’s concert? It was aewsome! I lved it.” New ways of expressing negative sentiment have also arisen, including ironic or sarcastic phrasing: “Another winner from the almighty Microsoft,” for example, or most recently “Charlie Sheen is a winner.”
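To show why “SMS-like” writing complicates analysis, here is a minimal Python sketch of one possible clean-up step: squeezing stretched-out letters and then matching misspellings against a known vocabulary using the standard library's difflib. The vocabulary, the 0.8 similarity cutoff, and the function names are assumptions for illustration.

```python
import re
from difflib import get_close_matches

# Assumed toy vocabulary; a real system would need a full dictionary.
VOCABULARY = ["awesome", "loved", "good", "concert", "see"]


def squeeze(word):
    """Collapse any character repeated three or more times down to two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word)


def normalize(word):
    """Map a noisy word to the closest vocabulary word, if one is close enough."""
    word = squeeze(word.lower())
    if word in VOCABULARY:
        return word
    close = get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
    return close[0] if close else word


for noisy in ["gooooooood", "aewsome", "lved", "WTF"]:
    print(noisy, "->", normalize(noisy))
```

Note that "gooooooood" comes out as "good" even though the writer almost certainly meant "god", and that slang such as "WTF" passes through untouched. Clean-up of this kind helps, but it can also introduce errors of its own.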

[5] Isotonic Conditional Random Fields and Local Sentiment Flow
[6] IBM and the Jeopardy! Challenge - Video - Wired


Automated sentiment analysis cannot understand sentiment in the context of your business goals

One challenge that many proponents of automation have struggled to answer is analyzing text in the context of a business. For example, “Nike reports that profits rose” and “Adidas reports that profits rose” are both positive sentences when evaluated with no context. If Nike is the firm listening to social media, however, the second sentence is suddenly not as positive. The “goodness” or “badness” depends on whether the client is Nike or Adidas. Beyond the question of whether information is positive or negative for a client, automated text analysis may extract information that the company already knows or does not wish to focus on. If a machine is told to analyze the top trends around a brand, for example, much of what it surfaces may be information the brand already has.
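The Nike and Adidas example can be expressed as a small Python sketch. The rule used here (reverse the polarity when the sentence is about a competitor, and treat anything else as neutral) is an assumption made for illustration, as are the function and parameter names. Real business context is rarely this simple.

```python
def polarity_for_client(subject, raw_polarity, client, competitors):
    """Re-read a context-free polarity score from one client's point of view.

    raw_polarity is the score a generic tool would assign (+1.0 good, -1.0 bad).
    """
    if subject == client:
        return raw_polarity
    if subject in competitors:
        return -raw_polarity  # assumed rule: a rival's good news is bad news
    return 0.0  # assumed rule: unrelated subjects are treated as neutral


# "Adidas reports that profits rose" scores +1.0 with no context.
print(polarity_for_client("Adidas", 1.0, client="Nike", competitors={"Adidas"}))
print(polarity_for_client("Adidas", 1.0, client="Adidas", competitors={"Nike"}))
```

The same sentence and the same raw score produce opposite results depending on who the client is, which is information a generic sentiment tool does not have.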

Automated analysis is limited in analyzing sentiment for several topics within an article

Only now are technologies emerging that can analyze sentiment at a feature level. In general, automated sentiment analysis technology has difficulty distinguishing the sentiment attached to one topic from that attached to another, particularly if more than one is mentioned in the same sentence. A blog post may be positive in the first sentence and negative in the second, or there may be one overall sentiment for the post along with both positive and negative comments. “Much work on analyzing sentiment and opinions in politically oriented text focuses on general attitudes expressed through texts that are not necessarily targeted at a particular issue or narrow subject.” [7] A blogger, for example, may compare two or more products within the same post. Posts on a forum are often responses to earlier posts, and without that context it is difficult for machines to decipher whether a post agrees or disagrees.

[7] Opinion mining and sentiment analysis, 2008


The future of semantic technology

An interview with Seth Grimes, an “Analytics visionary”

“Watson,” the IBM computer that won on the game show Jeopardy, created a huge buzz around “his” technology. Why do you think there was so much buzz?

Getting a computer to play Jeopardy was a great stunt. IBM made the technology do something that everyone can understand. It was a “stunt,” however, because the ability to win Jeopardy is not in high demand in business or society. Nonetheless, Watson’s Jeopardy playing helps the non-technologist public understand the potential and the reality of the technology. Question-answer systems are already out there, automating responses to business questions (for instance, for contact-center support, customer inquiries, and online commerce) with no requirement for a live person on the line. Right now Watson is focused on extracting factual information, but the technology could be working on sentiment via a sentiment “annotator.” Then we won’t be limited to asking questions about facts. We’ll be able to ask about opinions and emotions.

(An annotator analyzes text and marks up features in the text with meaning, or attributes. For example, a named entity annotator finds geographic locations and “marks them up,” attaching semantic meaning. Pattern-based entity annotators can find addresses and identify locations or numbers by looking for patterns, and other annotators can mark up other parts of the text.)

How accurate can this technology be?

Accuracy goals, and the amount of work you put into meeting them, should be decided in light of the business problem. Some problems will be solvable even with low levels of precision (e.g., positive versus negative sentiment classification), while you might need higher precision for other applications. “Recall,” the ability to identify all applicable cases, is also factored into accuracy measurements. My impression is that most sentiment tools that extract entities have out-of-the-box accuracy (without training) of something like 40-50% but can be “trained” (by having humans create marked-up samples or language rules or correct the tool) to reach above the 80% level. I saw one claim of 98% accuracy, which is laughable and ludicrous. The only way you can do this is by highly restricting the problem, tailoring the solution, and being more lenient about what counts as accurate.

What matters most is first identifying that there is sentiment there at all, without even identifying whether it is positive or negative, and then passing materials on for human or machine classification. With machine filtering and humans analyzing, for certain problems, you can reach high levels of accuracy. If you really want the machine to do everything, you need to do a lot more work or you will get much lower levels of accuracy overall, but again, decisions should be made based on business needs and also the nature of the source materials.

Let me add that while tools that analyze only at the message or document level may be accurate, the results they produce will often be far less than useful. Think about it. It might be helpful if you’re running, say, a hotel group with 4,200 hotels, to know that (making up numbers) 77% of reviews were overall positive, 17% neutral, and 6% negative. Wouldn’t it be far more helpful to know opinion details, by hotel? You want to know when a reviewer found that room cleanliness and staff friendliness were exemplary but that noise was a problem. The details in a net positive review are not typically going to be all positive, and only by knowing sentiment at a detailed, “feature,” level can you reinforce what’s great and correct what’s not.

By the way, let’s not overstate the accuracy of human sentiment analysis. The best study I’ve seen of accuracy was done at the University of Pittsburgh in 2005. While they found only 82% human agreement in annotating for sentiment, results jumped to over 90% when they removed uncertain cases (when they subtracted cases where people said they weren’t sure).
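For readers unfamiliar with the terms, the Python sketch below shows how precision, recall, and simple agreement between two human annotators could be computed. The labels are made-up examples, the "unsure" label is an assumed convention for uncertain cases, and plain percentage agreement is used here without any claim that it is the measure used in the study mentioned above.

```python
def precision_recall(predicted, actual, target="positive"):
    """Precision: how many flagged items were right. Recall: how many were found."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(p == target and a == target for p, a in pairs)
    false_pos = sum(p == target and a != target for p, a in pairs)
    false_neg = sum(p != target and a == target for p, a in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def agreement(labels_a, labels_b, skip=None):
    """Share of items two annotators label identically, optionally skipping a label."""
    pairs = [(a, b) for a, b in zip(labels_a, labels_b) if skip not in (a, b)]
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0


# Made-up labels for illustration only.
machine = ["positive", "positive", "negative", "negative"]
human = ["positive", "negative", "positive", "negative"]
print(precision_recall(machine, human))

annotator_1 = ["positive", "negative", "unsure", "positive"]
annotator_2 = ["positive", "positive", "negative", "positive"]
print(agreement(annotator_1, annotator_2))
print(agreement(annotator_1, annotator_2, skip="unsure"))
```

In this toy data, dropping the item one annotator marked "unsure" raises the agreement figure, which mirrors the effect Grimes describes.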


Are there certain online channels (among forums, blogs, Twitter, etc.) that are easier to analyze using text mining than others?

To really do it well you have to go to the feature level (to the individual item). You need strong natural language processing (NLP) to do that right. Twitter is interesting because it is very hard to express more than one idea in a given tweet. Most tweets focus on a single idea, which, in theory, should make them easy to analyze. The problem is that people use a lot of slang and abbreviation, which makes tweets harder to analyze than a blog or article. Also, a tweet is often part of a conversation. Very few tweets stand on their own; many include an article link or are responses to someone, for example. Others are part of multi-way conversations, and you very often need to understand the whole conversation to get the context. Most of the tools that are out there don’t do that; they don’t reach “through the tweet” to take into account the threaded nature of Twitter conversations. The more text there is, the easier it is to analyze, but at the same time, the shorter it is, the more focused it’s going to be.

But let’s move from ease of analysis to business value delivered. Applications like Synthesio’s get a lot of visibility because so many people use social media, but customer service is the sentiment-analysis application that has probably delivered the clearest business benefits, the greatest business value. Contact centers and surveys provide important data that is more focused than material out on the web and is associated with actual customers and transactions. You’ll get greater benefit from tying customer feedback to social media data than from spending your funds listening broadly to people who are expressing opinions in a void, without context.

There’s no denying the potential benefit in broad social-media monitoring and engagement, however. People will tell you what they like about your product (or don’t) and will post things that can be analyzed and shown to be indicators of their intent (to buy, to complain, to cancel their service, etc.). This information can be used to fix problems: the customer-service scenario. Answering a customer to make that person happy can turn them into a “net promoter,” and the information can be used to improve quality so the problems don’t happen to other people. Posted and analyzed information, meaning intent signals that go beyond polarity (positive/negative), can also be used by companies to identify and act on opportunities. This is engagement that does more than react to particular comments about products and services. It’s engagement that proactively creates new and higher-value customers.

What recent advances have you seen in sentiment analysis technology?

The latest advances in analysis do go beyond “polarity” or “valence” (positive, negative, neutral), and I don’t just mean rating sentiment on a scale from -10 to +10 to capture “intensity.” That is an advance, but we can do more. For example, you might look at sentiment in terms of emotional categories such as “angry,” “sad,” or “happy” about a hotel service. I’m sure we can all think of ways that automated understanding of emotional tone can be useful in business contexts. Then there are the “intent signals” I was just discussing: sentiment as an indicator of plans or actions.

You’re going to get the most flexibility in creating business-suited categorizations via statistical approaches. That is, the analyst sets up categories that make sense and drags and drops documents into the different categories for “training” purposes. The machine uses statistical similarity measures to discover what the items in a category have in common in order to automate classification.

Further, the market is beginning to understand that influence is best measured by ability to affect business. Certainly influence is correlated with the number of Facebook friends, Twitter followers, and retweets, but what should interest us far more is how those measures translate into inquiries, sales, and monetizable perceptions. A person is truly influential if he or she drives business transactions. And the market is understanding just how shallow many of the listening tools are (treating social media as a silo, completely unlinked to enterprise systems and actual business transactions, using simple keyword lists for sentiment classification, and applying sentiment analysis only at message, article, or document level) and that they can and should do better, including by joining the abilities of humans, who judge and discern, with the power of machines, which are fast, work 24 hours per day, and can tap huge volumes of social, online, and enterprise information that are beyond human analysis regardless of cost.
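The statistical approach Grimes describes, in which an analyst sorts example documents into categories and the machine classifies new items by similarity, can be sketched in Python as follows. This uses word counts and cosine similarity as one possible similarity measure; the category names, example documents, and function names are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Sum the word counts of the documents the analyst put in each category."""
    profiles = {}
    for category, documents in examples.items():
        profile = Counter()
        for document in documents:
            profile.update(vectorize(document))
        profiles[category] = profile
    return profiles


def classify(text, profiles):
    """Pick the category whose profile is most similar to the new text."""
    vector = vectorize(text)
    return max(profiles, key=lambda category: cosine(vector, profiles[category]))


# Hypothetical training documents sorted by an analyst.
examples = {
    "intent_to_cancel": ["I want to cancel my subscription", "please cancel my account"],
    "intent_to_buy": ["where can I buy this phone", "I want to order the new model"],
}
profiles = train(examples)
print(classify("how do I cancel my plan", profiles))
```

Because the categories come from the analyst's own examples, the same mechanism can serve emotional categories, intent signals, or any other business-specific scheme, at the cost of needing enough training documents for each.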


Conclusion

No social media monitoring vendor would dare to pretend that technology can accurately (or even near-accurately) assess sentiment on a specific topic. At the subtopic level (such as what we do at Synthesio), it is completely impossible. However, NLP can at least help identify trends at a macro level, such as hot topics or aggregate changes in sentiment over time. The theory is that even if the sentiment marking is inaccurate (even by an order of magnitude), tracking and trending it over time lets us watch the pattern for changes, on the assumption that the level of inaccuracy stays consistent over time. However, there is no proof of this yet.
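The reasoning behind that theory can be illustrated with a short Python sketch. It assumes, purely for illustration, that the tool labels truly positive items as positive at one fixed rate and wrongly labels other items as positive at another fixed rate. All numbers and names are made up, and the fixed rates are exactly the unproven assumption noted above.

```python
def observed_positive_share(true_share, hit_rate, false_alarm_rate):
    """Share of items labeled positive by a tool with fixed (assumed) error rates."""
    return true_share * hit_rate + (1 - true_share) * false_alarm_rate


# Made-up true share of positive posts over three months.
true_by_month = [0.60, 0.55, 0.40]
observed_by_month = [
    observed_positive_share(share, hit_rate=0.7, false_alarm_rate=0.3)
    for share in true_by_month
]
print([round(value, 2) for value in observed_by_month])
```

Under these assumptions the observed figures are wrong in absolute terms and the swings are flattened, but the downward movement is still visible. If the error rates themselves drift, for instance because the vocabulary of the conversation changes, this no longer holds.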

About Synthesio

Synthesio is a global, multi-lingual Social Media Monitoring and research company, utilizing a powerful hybrid of tech and human monitoring services to help Brands and Agencies collect and analyze consumer conversations online. The result is actionable analytics and insights that provide an accurate snapshot of a brand and help answer the ultimate questions: how are we really doing right now, and how can we make it better? Founded in 2006, the company has grown to include analysts who provide native-language monitoring and analytic services in over 30 languages worldwide. Brands such as Toyota, Microsoft, Sanofi, Accor Hotels, Orange Telecom and many other well-known companies turn to Synthesio for the data they need to engage with their markets, anticipate and prepare for emerging crisis situations, and prepare for new product or new campaign launches.

WWW.SYNTHESIO.COM