Text Analysis and Search Analytics

Outline

▪ Text Analysis
  □ Text analysis in the online environment
  □ Steps in text analysis
  □ OfficeStar illustration
  □ Analysis of Otto’s reviews

▪ Search Analytics
  □ Basic ideas of search analytics
  □ Google Analytics

Learning goals

▪ Appreciate the potential and challenges of text analysis (particularly in the online context)

▪ Know how to conduct a text analysis of online reviews using Enginius

▪ Explain the idea of search analytics

▪ Be familiar with the basics of Google Analytics

Text Analysis

▪ a lot of unstructured data are available in the online world:
  □ online communities (Twitter, Facebook, etc.)
  □ review sites (TripAdvisor, Yelp, etc.)
  □ retail sites (Amazon, Alibaba, etc.)

▪ marketers try to extract the useful information contained in their customers’ online conversations;

Steps in text analysis

▪ Social data collection and database setup

▪ Feature generation and pruning

▪ Opinion word extraction

▪ Sentiment analysis (valence and emotion analysis)

▪ Summary generation (visualization)

Creating a word cloud

▪ create a text file containing the text to be analyzed;

▪ clean the text (remove punctuation marks and numbers, eliminate common stopwords, use text stemming to reduce words to their root form, etc.);

▪ determine the frequency of occurrence of all the words and eliminate unwanted words;

▪ choose the minimum frequency of words to be used in the word cloud and the maximum number of words to be included;

▪ generate and plot the word cloud;
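
The cleaning and counting steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by Enginius: the stopword list is a tiny made-up sample, `crude_stem` is a hypothetical stand-in for a real stemmer, and the function names and defaults are assumptions. The resulting (word, count) pairs are what a plotting tool would turn into the word cloud.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "was", "is", "of", "to", "for", "in", "it", "very"}


def crude_stem(word):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_frequencies(text, min_freq=1, max_words=50, stem=True):
    """Return up to max_words (word, count) pairs with count >= min_freq."""
    # Lowercase and keep alphabetic runs only (drops punctuation and numbers).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stopwords and single letters left over from contractions such as "it's".
    tokens = [t for t in tokens if len(t) > 1 and t not in STOPWORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    counts = Counter(tokens)
    frequent = [(w, n) for w, n in counts.most_common() if n >= min_freq]
    return frequent[:max_words]


pairs = word_frequencies(
    "The burgers were great. Great burger, great service!", min_freq=2
)
```

Calling the function with `stem=False` keeps "burger" and "burgers" as separate entries, which is the difference between a stemmed and an unstemmed word cloud.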

Sentiment analysis

▪ if numeric ratings are available, they will (hopefully) summarize the overall emotional tone of the text;

▪ sometimes, it is necessary to extract the emotional tone of the text from the text itself;

▪ usually, an attempt is made to classify words in terms of their valence (or more specific emotions):

  □ positive words (great, good, delicious, loved, enjoyed, etc.)
  □ negative words (disappointed, etc.)
  □ ambiguous words (busy, crowded, etc.)

▪ a sentiment lexicon can be used to classify individual words as positive or negative, and then the overall sentiment of the text can be determined;
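
A lexicon-based classification of this kind can be sketched as follows. The two word sets are a toy lexicon built only from the example words on this slide (real lexicons contain thousands of entries), and the scoring rule, positive hits minus negative hits, is one simple assumption among many possible ones.

```python
import re

# Toy lexicon taken from the example words above; an assumption, not a real lexicon.
POSITIVE = {"great", "good", "delicious", "loved", "enjoyed"}
NEGATIVE = {"disappointed"}


def valence(text):
    """Return (label, score), where score = positive hits minus negative hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    score = pos - neg
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score


result = valence("Food was great and the cider was delicious")
```

Words that are in neither set, including ambiguous ones such as "busy" or "crowded", do not change the score. The sketch also ignores negation, so "not good" would still count as a positive hit.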

OfficeStar Text Analysis

▪ Brief reviews are available from 54 OfficeStar customers who commented on various aspects of their experience with the company;

▪ The reviews cover the period 8/21/13 to 10/29/13;

▪ Customers also provided an overall rating of their experience on a 1-5 scale;

Sample reviews for OfficeStar

Word cloud for OfficeStar

Word cloud for OfficeStar

(without stemming)

Daily frequency of posts for OfficeStar

Histogram of ratings for OfficeStar

Average ratings over time

Valence histogram for OfficeStar

(Number of posts by valence)

Daily post valence ratio

Valence word cloud for OfficeStar

Emotion histogram for OfficeStar

Emotion word cloud for OfficeStar

In-class exercise:

Analyzing TripAdvisor reviews of Otto’s

Look at the spreadsheet Ottos.xlsx showing all the Otto’s reviews on TripAdvisor from 11/07 to 3/20 (see the detailed description in the file Ottos.pdf):

□ How well is Otto’s performing in terms of overall ratings? Have the ratings changed over time?

□ What does the Word Cloud for both the quotes and the reviews tell you about how well Otto’s is doing?

□ What does the valence analysis tell you about customers’ satisfaction with Otto’s?

□ Does the emotion analysis provide useful information?
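
For the question about ratings over time, one simple approach is to average the ratings by year. The sketch below assumes each review has been reduced to a (date, rating) pair; the function name is hypothetical and the sample values are made up for illustration, not taken from Ottos.xlsx.

```python
from collections import defaultdict
from datetime import date


def average_rating_by_year(reviews):
    """reviews: iterable of (date, rating) pairs; returns {year: mean rating}."""
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of ratings, number of reviews]
    for review_date, rating in reviews:
        totals[review_date.year][0] += rating
        totals[review_date.year][1] += 1
    return {year: s / n for year, (s, n) in sorted(totals.items())}


# Made-up ratings for illustration only; these are not values from Ottos.xlsx.
sample = [
    (date(2018, 5, 1), 4),
    (date(2018, 9, 12), 5),
    (date(2019, 2, 3), 3),
]
yearly = average_rating_by_year(sample)
```

Grouping by month or quarter instead of year works the same way; only the grouping key changes.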

Otto’s reviews on TripAdvisor

Sample Otto’s reviews

▪ Good Variety of FOOD
  Very good variety of food for everyone in your group. Their house-made ROOTBEER is very good as well as their HARD CIDER. We ate here during our fly fishing trip to Spring Creek Trout Camp near Bellefonte, PA. It's usually too crowded during football season.

▪ Great food
  Food was fantastic! Our server Adam was amazing. I would highly recommend this place. My husband said one of the best burgers he has ever eaten.

Search analytics

▪ when somebody conducts an online search, the search will usually yield both organic listings (sorted by relevance to the query) and paid-for content (product listing ads and text ads);

▪ paid links are ranked based on how much advertisers have bid on specific keywords associated with particular search terms and on other factors (CTR, quality of the landing page, etc.);

▪ if somebody clicks on an organic listing, a display ad (e.g., banner ad) may appear on the landing page;

▪ by inserting tracking codes on the pages of a website, user visits to webpages and related information can be tracked and summary reports can be generated;

Results of a search for ‘kitchen knives’:

• Product listing ads
• Text ads
• Organic listings

Google Analytics reports

▪ Audience reports show you characteristics of your users, such as age and gender, where they’re from, their interests, how engaged they were, whether they’re new or returning users, and what technology they’re using.

▪ Acquisition reports show you which channels (such as advertising or marketing campaigns) brought users to your site.

▪ Behavior reports show how people engaged with your site, including which pages they viewed and their landing and exit pages.

▪ Conversion reports allow you to track website goals based on your business objectives.

Audience Overview Report

▪ “Sessions” are the total number of sessions for the given date range.

▪ “Users” are the total number of users that visited for the given date range.

▪ “Pageviews” are the total number of times pages that included your Analytics tracking code were displayed to users.

▪ “Pages per session” is the average number of pages viewed during each session.

▪ “Average session duration” is the average length of a session based on users that visited your site in the selected date range.

▪ “Bounce rate” is the percentage of sessions in which the user left after viewing a single page on your site and taking no additional action.

▪ “Percent of new sessions” is the percentage of sessions in your date range that came from users new to your site.
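
To make the definitions above concrete, the sketch below computes the same summary figures from a small list of session records. The record layout (`user_id`, `pageviews`, `duration_s`, `new_user`) is a hypothetical simplification and not the Google Analytics data model. As a further simplifying assumption, a bounce is counted per session, as any session with exactly one pageview.

```python
def audience_overview(sessions):
    """sessions: non-empty list of dicts with keys user_id, pageviews, duration_s, new_user."""
    n = len(sessions)
    if n == 0:
        raise ValueError("at least one session is required")
    pageviews = sum(s["pageviews"] for s in sessions)
    return {
        "sessions": n,
        "users": len({s["user_id"] for s in sessions}),
        "pageviews": pageviews,
        "pages_per_session": pageviews / n,
        "avg_session_duration_s": sum(s["duration_s"] for s in sessions) / n,
        # Assumption: a session with exactly one pageview counts as a bounce.
        "bounce_rate_pct": 100 * sum(s["pageviews"] == 1 for s in sessions) / n,
        "pct_new_sessions": 100 * sum(s["new_user"] for s in sessions) / n,
    }


# Made-up session records for illustration only.
overview = audience_overview([
    {"user_id": "u1", "pageviews": 1, "duration_s": 0, "new_user": True},
    {"user_id": "u1", "pageviews": 3, "duration_s": 120, "new_user": False},
    {"user_id": "u2", "pageviews": 5, "duration_s": 300, "new_user": True},
    {"user_id": "u3", "pageviews": 3, "duration_s": 180, "new_user": True},
])
```

With these four sessions from three users, one of the four sessions is a bounce and three of the four come from new users.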

Audience Overview Report (cont’d)

e.g., this provides data on languages:

Additional Audience Reports

Acquisition Reports

▪ information about the traffic medium, specific source, and name of the marketing campaign;

▪ different types of mediums:
  □ “Organic” is used to identify traffic that arrived on your site through unpaid search like a non-paid Google Search result.
  □ “CPC” indicates traffic that arrived through a paid search campaign like Google AdWords text ads.
  □ “Referral” is used for traffic that arrived on your site after the user clicked a link on a website other than a search engine.
  □ “Email” represents traffic that came from an email marketing campaign.
  □ “(none)” is applied for users that come directly to your site by typing your URL directly into a browser.

Behavior and Conversion Reports

▪ information about Pageviews, Average Time on Page, and Bounce Rate;

▪ additional information includes metrics for particular pages of the website, landing page, exit page, etc.;

▪ if website goals were defined, conversion rates can be assessed;

Tracking marketing campaigns

▪ by using campaign tagging, the performance of online marketing and advertising campaigns can be assessed (e.g., a monthly E-mail newsletter with a link offering a special promotion);

▪ AdWords can be used to generate text and display ads; advertisers have to bid on keywords, and they can assess how well keywords and individual ads are performing;

▪ bids can also be adjusted based on certain criteria (e.g., time of day or distance to the store), and the effectiveness of these adjustments can be analyzed;
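
Campaign tagging usually means appending the `utm_source`, `utm_medium` and `utm_campaign` query parameters to the links used in a campaign, so that Google Analytics can attribute the resulting visits. The sketch below builds such a link for the newsletter example; the helper name, the example URL and the parameter values are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_campaign_url(url, source, medium, campaign):
    """Append utm_source, utm_medium and utm_campaign to a landing-page URL."""
    parts = urlsplit(url)
    # Keep any query parameters the URL already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical landing page and campaign values for a monthly newsletter.
tagged = tag_campaign_url(
    "https://www.example.com/promo?ref=1",
    source="newsletter",
    medium="email",
    campaign="monthly_special",
)
```

Visits that arrive through the tagged link can then be reported under the given source, medium and campaign name.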

Keyword analysis

Metrics for assessing online advertising effectiveness

▪ CPM (cost per mille): cost per thousand impressions;

▪ CTR (click-through-rate): percentage of people who see an ad and click on it to go to the website advertised (i.e., clicks on an ad over total impressions);

▪ CPC (cost-per-click): price paid for each click on an ad;

▪ CPA (cost per conversion or action): cost of an ad per action of interest (e.g., purchase, subscription to a newsletter);
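
The four metrics follow directly from campaign totals, as the short sketch below shows. The function name and the campaign figures are made up for illustration; CPC is computed here as the average cost per click over the whole campaign.

```python
def ad_metrics(cost, impressions, clicks, conversions):
    """Compute CPM, CTR, CPC and CPA from campaign totals (all counts must be nonzero)."""
    return {
        "cpm": 1000 * cost / impressions,       # cost per thousand impressions
        "ctr_pct": 100 * clicks / impressions,  # clicks as a percentage of impressions
        "cpc": cost / clicks,                   # average cost per click
        "cpa": cost / conversions,              # cost per conversion or action
    }


# Made-up campaign totals for illustration only.
m = ad_metrics(cost=500.0, impressions=100_000, clicks=2_000, conversions=50)
```

For this hypothetical campaign, spending 500 on 100,000 impressions gives a CPM of 5, and 2,000 clicks give a CTR of 2% at an average CPC of 0.25; with 50 conversions the CPA is 10.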