Getting started with and realising ROI on Text Analytics Ryan Stuart Founder & CTO

Introduction to Text Analytics for Customer Insights

Page 1: Introduction to Text Analytics for Customer Insights

Getting started with and realising ROI on Text Analytics

Ryan Stuart Founder & CTO

Page 2: Introduction to Text Analytics for Customer Insights

Who am I?

•  Software Engineer (previously?)
•  Founder & CTO of Kapiche
•  Working in the Text Analytics industry since 2008.
•  Interests: Distributed Computing, Database Design, Machine Learning.

@rstuart85 / @Kapiche Official

rstuart85 / Kapiche

Page 3: Introduction to Text Analytics for Customer Insights

Who are you?

Raise your hand if you are a…
•  Engineer / Developer / Technical;
•  Data Scientist;
•  Academic;
•  Market Researcher;
•  Statistician;
•  Have “analyst” or “risk” in your job title; or
•  Other.

Page 4: Introduction to Text Analytics for Customer Insights

Overview

•  Overview of Text Analytics
   –  What is it?
   –  Different Types
•  Who are Kapiche?
•  Solving Business Problems with Text Analytics
   –  Automation
   –  Enterprise Search
   –  Voice of the Customer (with demo)
   –  Machine Learning
•  Resources

Page 5: Introduction to Text Analytics for Customer Insights

Overview of Text Analytics

Page 6: Introduction to Text Analytics for Customer Insights

What is Text Analytics?

“…the process of analyzing unstructured text, extracting relevant information, and transforming it into useful business intelligence.”

•  Consider a big customer survey with two questions:
   –  How likely are you to recommend Microsoft to your family, friends or colleagues? (0-10)
   –  Why did you give us that score?
•  You get 10,000 responses to your survey. Now what?
•  Maybe add more structure to the survey?
•  Maybe send it offshore to be understood?
•  Enter Text Analytics.

Text Analytics performs some sort of dimensionality reduction, which results in a lower-dimensional representation of the data to serve the task of analytics.
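As a rough illustration of that idea, the sketch below collapses free-text survey answers into counts over a small fixed vocabulary. This is only one crude form of dimensionality reduction (keeping the most frequent terms), and the sample responses, function names and the choice of `k` are all assumptions made for this example, not something taken from the talk.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(responses, k):
    """Return the k most frequent terms across all responses."""
    counts = Counter()
    for response in responses:
        counts.update(tokenize(response))
    return [term for term, _ in counts.most_common(k)]


def to_vector(response, vocabulary):
    """Represent one response as term counts over a fixed vocabulary."""
    counts = Counter(tokenize(response))
    return [counts[term] for term in vocabulary]


# Hypothetical survey answers, invented for illustration only.
responses = [
    "Support was slow and price is too high",
    "Great support, fair price",
    "The price went up again",
]

# Every response becomes a 2-number vector instead of a free-text string.
vocabulary = top_terms(responses, k=2)
vectors = [to_vector(r, vocabulary) for r in responses]
```

Each answer is now a short numeric vector that can be counted, charted or fed to a model. Real systems use far richer reductions than a top-k word list.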

Page 7: Introduction to Text Analytics for Customer Insights

Types of Text Analytics?

•  Entity Extraction (NER):
   –  Mark up text with entity tags: Person, Organisation, Time etc.
   –  Used to improve processing/routing of text.
•  Classification:
   –  The process of classifying a piece of text with a fixed set of labels.
   –  Sentiment Analysis and Categorisation are both examples of classification.
•  Topic Modeling:
   –  Identifying the high-level constructs (topics or ideas) present in the text.
   –  Some approaches treat topics as abstract constructs useful for specific tasks (e.g. “more like this” search). Others use them as a mechanism for understanding data.
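To make "marking up text with entity tags" concrete, here is a toy rule-based tagger. It is a sketch only: production NER normally relies on trained statistical models, whereas this example uses a tiny hand-written organisation list and one regular expression for times. The patterns, labels and sample sentence are assumptions for illustration.

```python
import re

# Hypothetical patterns: a tiny organisation list and a simple time expression.
PATTERNS = [
    ("ORGANISATION", re.compile(r"\b(?:Microsoft|Kapiche)\b")),
    ("TIME", re.compile(r"\b\d{1,2}(?::\d{2})?\s?(?:am|pm)\b", re.IGNORECASE)),
]


def tag_entities(text):
    """Return (entity_type, matched_text, start_offset) tuples sorted by offset."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((label, match.group(0), match.start()))
    return sorted(found, key=lambda item: item[2])


entities = tag_entities("Call Microsoft support at 9:30 am about the invoice.")
```

The resulting tags could then drive routing, for example sending anything that mentions a known organisation to a specific team.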

Page 8: Introduction to Text Analytics for Customer Insights

Who are Kapiche?

Page 9: Introduction to Text Analytics for Customer Insights

What does Kapiche do?

•  Take away all the marketing lingo and Kapiche does automatic Topic Modeling.

•  Not the abstract variety. The understandable variety.

•  The goal is to understand large amounts of data quickly.

•  But what is a topic, and how are topics identified?

Page 10: Introduction to Text Analytics for Customer Insights

What is a Topic?

•  Remember, most text analytics is just noise reduction.

•  Kapiche uses a purely mathematical approach to determine which terms from a text corpus have high entropy.

•  This is done by combining the influence of a term with its frequency.

•  Once these nodes of information have been identified, we begin to build topics around them.
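The slides do not give Kapiche's actual formula, and the "influence" component is not defined here, so the following is not their method. It is only a generic sketch of one way to attach an entropy score to a term: the Shannon entropy of how the term's occurrences are spread across documents. The function name and the sample documents are assumptions.

```python
import math
from collections import Counter


def term_entropy(term, documents):
    """Shannon entropy (in bits) of a term's occurrence distribution over documents."""
    counts = [Counter(doc.lower().split())[term] for doc in documents]
    total = sum(counts)
    if total == 0:
        return 0.0  # The term never occurs, so there is nothing to measure.
    probabilities = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probabilities)


# Hypothetical mini-corpus, invented for illustration only.
documents = [
    "price price support",
    "price delivery",
    "delivery delivery delivery",
    "price support",
]

price_entropy = term_entropy("price", documents)
delivery_entropy = term_entropy("delivery", documents)
```

Under this particular definition, a term spread evenly over many documents scores higher than one concentrated in a single document. Whether that matches the notion of entropy used in the talk is an open question.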

Page 11: Introduction to Text Analytics for Customer Insights

Understand the Data using Topics

Understanding the Topic Model helps us understand the data.

Page 12: Introduction to Text Analytics for Customer Insights

Solving Business Problems with Text Analytics

Page 13: Introduction to Text Analytics for Customer Insights

Automation (prediction?)

•  Text Analytics can help automate a range of business processes.

•  NER and Classification can be used to:
   –  Assign support tickets to the right person (routing)
   –  Determine if email is spam
   –  Automatically tag new documents in a database
   –  Fraud detection
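Ticket routing of this kind can be sketched with a small text classifier. The example below is a minimal multinomial Naive Bayes with add-one smoothing, written from scratch so it has no dependencies. The class name, queue labels and training tickets are hypothetical, and a real deployment would need far more training data and proper evaluation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRouter:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of training texts
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.label_counts[label] += 1
        self.vocabulary.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_texts = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.label_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.label_counts[label] / total_texts)
            label_total = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training tickets and queue names.
router = NaiveBayesRouter()
router.train("cannot log in password reset", "accounts")
router.train("password expired locked out", "accounts")
router.train("invoice charged twice refund", "billing")
router.train("refund for wrong invoice amount", "billing")

queue = router.predict("please refund my invoice")
```

The same structure applies to spam detection or document tagging: only the labels and the training texts change.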

Page 14: Introduction to Text Analytics for Customer Insights

Enterprise Search

•  Using a combination of Topic Modeling and Classification / NER, it’s possible to come up with a bunch of different approaches to search.

•  NER can be used for “semantic search”.

•  Abstract Topic Modeling (the type where the topics are abstract constructs) is great for More Like This.

•  Concrete Topic Modeling is great for understanding the search results and finding what you are looking for (quick demo).
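A "More Like This" lookup can be sketched as a nearest-neighbour search over document vectors. The talk suggests abstract topic vectors for this; the example below swaps in plain word-count vectors with cosine similarity to stay dependency-free, so it illustrates the search mechanics rather than topic modeling itself. Document ids, texts and function names are assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(query_id, documents, top_n=2):
    """Return the ids of the top_n documents most similar to the query document."""
    vectors = {doc_id: vectorize(text) for doc_id, text in documents.items()}
    query = vectors[query_id]
    scored = [
        (cosine(query, vector), doc_id)
        for doc_id, vector in vectors.items()
        if doc_id != query_id
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_n]]


# Hypothetical document collection, invented for illustration only.
documents = {
    "a": "refund policy for late delivery",
    "b": "late delivery refund request",
    "c": "password reset instructions",
    "d": "delivery times and tracking",
}

similar = more_like_this("a", documents)
```

Replacing `vectorize` with a function that returns topic weights would give the topic-based variant the slide describes.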

Page 15: Introduction to Text Analytics for Customer Insights

Voice of the Customer

•  Perhaps the most powerful tool in sales and marketing is knowing what your customers think about your brand / product / business.

•  It has always been possible to just ask them of course, but what do you do with the responses? Read them all?

•  Actually, that is the exact approach most companies take. They develop complicated coding frameworks and offshore it all.

•  Obviously, that is a seriously flawed (human bias?) and expensive approach. So much so that surveys are tailored to be easier to extract knowledge from.

Page 16: Introduction to Text Analytics for Customer Insights

Sentiment Analysis for VotC

•  Sentiment Analysis is usually how people get started. It has problems though.

Gee, I really love the complimentary snacks on Virgin Airlines!

•  Sentiment analysis is traditionally just a classification problem using machine learning.

•  It generally requires a new model for each data domain.
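If the snack sentence above is read as sarcasm, it shows why surface wording can mislead a sentiment model. The sketch below uses a naive word-list scorer, which is simpler than the machine-learning classifiers the slide refers to, to make the failure visible. The word lists and function name are assumptions for illustration.

```python
import re

# Hypothetical sentiment word lists, far smaller than any real lexicon.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "awful", "slow"}


def lexicon_sentiment(text):
    """Label text by counting positive and negative words; ignores tone and context."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = lexicon_sentiment(
    "Gee, I really love the complimentary snacks on Virgin Airlines!"
)
```

The scorer sees "love" and answers "positive", regardless of whether the author meant it. A trained classifier can do better, but only if its training data covers the domain, which is why a new model is typically needed per domain.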

Page 17: Introduction to Text Analytics for Customer Insights

Topic Modeling for VotC

•  Companies like Kapiche (and Luminoso for example) are trying to make it easy to understand your customer.

•  The approach is generally based around some degree of automated insight extraction.

•  In the case of Kapiche, we are trying to reduce the noise to significantly decrease the time to understand customers.

•  This technology doesn’t replace the analyst! It does reduce the amount of expertise needed, though.

Page 18: Introduction to Text Analytics for Customer Insights

Demo!

Page 19: Introduction to Text Analytics for Customer Insights

Future of VotC

•  The current best practice for survey design, which is a bunch of structured multiple choice questions, is flawed.

•  It’s built around the idea that automating the extraction of insights from text is hard.

•  These complex surveys also result in low engagement rates.

•  Technology like this has the ability to change how we design customer surveys.

•  I propose simple surveys with only 2 questions.

•  Also consider how we are extracting value from social media, call centre data, etc.

Page 20: Introduction to Text Analytics for Customer Insights

Machine Learning

•  In Machine Learning terms, another name for dimensionality reduction is feature extraction.

•  Combining features extracted using Text Analytics techniques with structured data to build a classifier has lots and lots of uses.
   –  News reports and stock price changes?
   –  Book content and customer review scores?
   –  Movie scripts and critic ratings?

•  The traditional approach here has been Bag of Words.

•  New methods like Word2Vec and GloVe are emerging that don’t discard the structure of the text.
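A short sketch of what Bag of Words throws away: two sentences with opposite meanings produce identical word counts, while even a simple order-aware feature (adjacent word pairs) tells them apart. The bigram comparison is only a stand-in to show that word order carries information. It is not how Word2Vec or GloVe work, and the sentences and function names are assumptions for this example.

```python
from collections import Counter


def bag_of_words(text):
    """Word counts only; the order of the words is discarded."""
    return Counter(text.lower().split())


def bigrams(text):
    """Adjacent word pairs, a simple feature that keeps some word order."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


first = "the dog bit the man"
second = "the man bit the dog"

same_bag = bag_of_words(first) == bag_of_words(second)
same_bigrams = bigrams(first) == bigrams(second)
```

Here `same_bag` is true and `same_bigrams` is false, so a classifier fed only word counts cannot distinguish who bit whom.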

Page 21: Introduction to Text Analytics for Customer Insights

Resources

•  Word2Vec - https://en.wikipedia.org/wiki/Word2vec
•  GloVe - http://nlp.stanford.edu/projects/glove/
•  Sentiment Analysis - https://blog.monkeylearn.com/sentiment-analysis-apis-benchmark/
•  Kapiche for Research - https://research.kapiche.com
•  Gensim - https://radimrehurek.com/gensim/index.html
•  NLTK - http://www.nltk.org/

Page 22: Introduction to Text Analytics for Customer Insights

Questions?

W: http://kapiche.com  E: [email protected]

Slides: http://goo.gl/mPnArB