1 SOCIAL MEDIA MINING WHAT IS IT GOOD FOR? AND WHEN IS IT GOOD
ENOUGH? Nick Buckley SoShall Consulting asc Funky Data 25th
September 2012
Slide 2
2 The Plan
What is Social Media Mining? [SMM]
How do Market Researchers tend to think about it?
Nuts & Bolts: practical outcomes
Challenges and Constraints
[How] Do these make Researchers re-think the place of SMM?
Where will it go from here?
BUT:
Assumption of a vendor/researcher distinction, even if in house
No naming or comparing of vendors/applications
Difficult to judge where to pitch the basics: too familiar vs. too abstract
Slide 3
3 1. What are we talking about?
Slide 4
4 What exactly are we talking about? What they say
Definition* of social media monitoring: "Social Media Monitoring (SMM) means the identification, observation, and analysis of user-generated social media content for the purpose of market research."
* http://www.social-media-monitoring.org
Slide 5
5 What are we talking about? Social Media... Review sites
Professional & Consumer Blogs/Microblogs Forums Client sites
Video sites Public Communities Newsgroups News sites
Slide 6
6 What's in a word? GfK NOP currently prefers "Mining".
User generated content in social media lays down a rich seam of activity, opinion, thought and information; mess, echoes and whimsy.
For some time marketing and PR professionals have been monitoring Social Media to capture headline buzz in real time, and to detect sudden changes requiring a response.
But collecting and counting this content is only the beginning of a process which can add value via many techniques, including integration with other sources such as market research data.
Slide 7
7 2. What happens when Market Researchers get hold of it?
Slide 8
8 Sony brand damage was driven by PlayStation breach (2011)
[Charts: Sony buzz this year; Sony sentiment this year; Sony buzz in April; Sony sentiment in April; PlayStation buzz; PlayStation sentiment]
Slide 9
9 Market Researchers believe that SMM can also give clients a window on other dimensions of online conversations
SMM provides insights into:
Category: Dynamics; Consumer needs; Problems and issues consumers discuss; Product usage discussions; New product entries & trends in purchase intention
Corporate: Corporate mentions related to reputation; Crises; Social issues
Brand/Product: Brand/sub-brand mentions, brand buzz; Number of positive vs. negative sentiments for each brand, including customer service; Brand content analysis, what's being said about brand; Advertising noticed most and related discussion, launch tracking; Source of mentions (specific sites) and the most influential sites
Competition: All the above for preference & competition
Slide 10
10 Market Researchers are fitting SMM into different places within method or process
As a precursor to traditional Market Research: Refining hypotheses for research design; Prioritising criteria, identifying new ones; Defining or qualifying the competitive set; Identifying niche respondents for small-scale studies
As a successor to traditional Market Research: Tracking the impact of implemented findings; Monitoring for events which may create discontinuities in this; Low intensity/low detail follow-up
As a companion to traditional Market Research: Compare and contrast, e.g. unconditioned; Add granularity to satisfaction drivers; Complement reach; Interpolate lengthy studies
So can SMM research stand alone?
Is there a hierarchy, within these hybrid uses, of best fit?
Does the story change if you get longitudinal with a category?
To what extent do some of these uses assume that the data can be treated like conventional MR data? In any case, should it be treated and analysed thus?
Slide 11
11 But inevitably they think about comparison with surveys
You can ask a new question without having to issue a new questionnaire*
Unconditioned by participant awareness of a research process, often more emotive than considered survey responses
Low cost - under certain circumstances
Spontaneously generated content - unconstrained by research frame
Offers insight into active social media users
Potentially global
Very immediate

Not necessarily representative of the general population
Difficult to weight back to general population, as demographic data is sparse
Automated sentiment analysis only as good as the algorithms [and these vary greatly]
Automated harvesting can capture a lot of noise for certain words or brands
No guarantee of sufficient data
Costs rise when we use supplementary analysis to overcome some of these issues
*within certain technical limitations
Slide 12
12 Different approaches for different client needs
For example - Precision Extraction vs Trawl & Filter
Crude mention & mood tracking
Quantitative - Brand tracking and integration with traditional research
Indicative Qual, e.g. using trends and volumes to guide focus of analysis
Exploratory Qual: more complex collection. Manually manageable volumes and tuning
Higher data volumes from simple search terms
Lower data volumes from targeted & compound search terms
More post processing, applied to data by GfK - to reduce noise and refine sentiment attribution
Accept raw data output from application
Slide 13
13 3. Too Abstract?
Slide 14
14 The raw material - Results from search terms
SMM applications extract results from wholesale supplies of data, conducting searches defined by search terms.
Search terms:
can be anything from a simple and distinctive brand or product name, to a complex expression configured to capture discussions about a category or concept
combine words or phrases via logical instructions such as AND, OR, NOT
by employing functions such as WITHIN to detect words in a certain proximity to each other
with brackets that can dictate the sequence in which instructions are applied, e.g. word1 AND ( word2 OR word3 )
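As a rough illustration of how such a search term behaves, the sketch below evaluates an expression of the form word1 AND ( word2 OR word3 ), plus a WITHIN-style proximity check, against plain-text posts. The helper names (tokens, contains, within, matches), the chosen words and the sample posts are all assumptions made for illustration; each real SMM application has its own query syntax.

```python
import re

def tokens(text):
    """Lower-case a post and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains(post_tokens, word):
    """True if the word appears anywhere in the post."""
    return word.lower() in post_tokens

def within(post_tokens, word_a, word_b, distance):
    """True if word_a and word_b occur at most `distance` tokens apart."""
    pos_a = [i for i, t in enumerate(post_tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(post_tokens) if t == word_b.lower()]
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)

def matches(post):
    """Hypothetical search term: sony AND ( playstation OR breach )."""
    t = tokens(post)
    return contains(t, "sony") and (contains(t, "playstation") or contains(t, "breach"))

# Invented sample posts, not real data.
posts = [
    "Sony says the PlayStation network is back online",
    "Thinking of buying a Sony television",
    "Another data breach in the news today",
]
hits = [p for p in posts if matches(p)]
print(hits)
```

Only the first post satisfies the whole expression: the second lacks both bracketed words and the third lacks the brand name. The brackets matter because without them the expression could be read as (word1 AND word2) OR word3, which would also return the third post.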
Slide 15
15 Typical SMM application offers a dashboard view of data
returned by these search terms and the facility to export the
underlying data
Slide 16
16 Analyses
Whatever the Search Terms define, here is what can be measured about the results returned, in combination or in isolation:
Volume: how much is it talked about, and how is this changing over time?
Channels: where on the web is it being talked about: Twitter, blogs, forums, comments?
Location: where in the world is it being talked about?
Themes: what other words and phrases are most regularly associated with it?
People: who is talking about it? That may be by influence according to various proprietary indices or by demographics [to be used with caution]
Sentiment: across all of these variables is superimposed automatically generated Sentiment analysis: positive, negative or neutral language associated with the subject of the posts
Verbatims: drill-down to individual posts; in their own words, what do people actually say?
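To make the first few of these measures concrete, here is a minimal sketch that computes Volume, Channels and Themes from an exported set of results. The record layout (date, channel, text), the search term "brandx", the stopword list and the posts themselves are assumptions invented for the example; real exports differ by application.

```python
from collections import Counter

# Hypothetical export: one dict per post returned for the search term "brandx".
posts = [
    {"date": "2012-09-03", "channel": "twitter", "text": "brandx advert was funny"},
    {"date": "2012-09-03", "channel": "forum", "text": "saw the brandx advert on tv"},
    {"date": "2012-09-04", "channel": "twitter", "text": "brandx delivery was slow"},
]

SEARCH_TERM = "brandx"
STOPWORDS = {"the", "was", "on", "saw"}

# Volume: number of posts per day.
volume = Counter(p["date"] for p in posts)

# Channels: number of posts per channel.
channels = Counter(p["channel"] for p in posts)

# Themes: other words that appear alongside the search term.
themes = Counter(
    word
    for p in posts
    for word in p["text"].split()
    if word != SEARCH_TERM and word not in STOPWORDS
)

print(volume.most_common())
print(channels.most_common())
print(themes.most_common(3))
```

Combining two counters, for example counting channels separately for each date, gives the kind of Volume + Channels story described on the next slide.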
Slide 17
17 Combinations of these basics tell different types of story
Volume + Channels: Brand A's new ad was mainly discussed on Forums when it was being shot by a famous pop star, but was mainly discussed on Twitter when it was being aired.
Themes + Sentiment: Automotive brand X is associated mainly with topics around performance, whereas brand Y is associated with comfort and style. Both enjoy roughly the same level of positive sentiment overall.
Volume vs. Offline Schedule: Beverage brand N enjoyed a bigger spike in its mentions when news of a future big game at a sponsored venue was announced than it got from a tournament sponsorship that was live at the time.
Channels + Themes + People: Some general social Forum sites enjoy bigger concentrations of discussion of a particular topic than specialist Forums dedicated to that same topic!
Slide 18
18 Examples of outcomes from SMM studies
Focus on the right social media channels at the right time.
Differentiate trade press buzz from real engagement.
Consumers don't always talk about the product features that you highlight.
Places where naturally occurring discussion of a category offers an opportunity for brands to intercept, rather than try to create competing social media conversations.
The world can sometimes throw up more interesting stories about you than you could hope to generate for yourself, but not always with the connotations you would like.
Slide 19
19 BUT!
Slide 20
20 There are many forces which erode this nice model
Accuracy? Reach? Relevance?
[Reach image from titletrack.com]
Slide 21
21 Accuracy
Is the searched-for phrase even in the returned snippet?
Is it real content, or is it: Navigation? Ticker or title content? Ad Content? Various species of spam [overlaps with Relevance]?
Is meta-data about the poster: Present? Reliable?
Understanding this, apart from making your own manual checks, is about understanding your 3rd party vendor and, often, their wholesale data suppliers in turn.
Slide 22
22 Reach
"[T]here are known knowns; there are things we know that we know. There are known unknowns; that is to say there are things that we now know we don't know. But there are also unknown unknowns; there are things we do not know we don't know." Donald Rumsfeld
Are these results from scrutiny of the entire [English speaking] social web? No
Are they results from a very large, sometimes stated, number of social sources? Yes
Could this range be skewed relative to the subject under scrutiny? Yes
Where it's Twitter data, is it from the whole of Twitter? Maybe
Is historical data always on the same basis as current data, or data gathered since the search was defined? Not always
Do we always have a good idea of what the Reach is? No
Slide 23
23 Relevance
Even when the application has collected exactly what we asked for, and it is legitimate content, with some nice useful data about the poster, it might not be relevant:
"Cats are great company."
"#EMT Bolt one cool cat!"
"Also, the Cat is a great resort"
"I love my aunt Cat!"
"I think Cat Stark is worse than any Lanister."
"I think this hurricane was a scam cooked up by the fat cats in Big Grocer."
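The Relevance problem can be reproduced in a few lines. The sketch below runs a naive search term for "cat" over the example posts from this slide, then applies a hand-tuned exclusion list. The exclusion list and the idea that the study is about pet cats are assumptions for illustration only; a list like this has to be rebuilt for every topic, which is where the manual effort goes.

```python
import re

posts = [
    "Cats are great company.",
    "#EMT Bolt one cool cat!",
    "Also, the Cat is a great resort",
    "I love my aunt Cat!",
    "I think Cat Stark is worse than any Lanister.",
    "I think this hurricane was a scam cooked up by the fat cats in Big Grocer.",
]

# Naive search term: any mention of "cat" or "cats", in any case.
term = re.compile(r"\bcats?\b", re.IGNORECASE)
matched = [p for p in posts if term.search(p)]

# Hypothetical exclusions, hand-tuned for a study about pet cats.
EXCLUDE = ["cool cat", "fat cats", "aunt cat", "cat stark", "resort"]
relevant = [p for p in matched if not any(x in p.lower() for x in EXCLUDE)]

print(len(matched), "posts matched the search term")
print(relevant)
```

Every post is a legitimate match for the search term, yet only one survives the exclusions.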
Slide 24
24 put another way: Oh s**t! I forgot it's still the
internet.
Slide 25
25 Other challenges include However, commencing too early
public smoking facts will just overstress your pet ; quite a fresh
pet will not learn everything from services. Just after he has
ended up perched for some a few moments, supply him with the
particular take care of, plus for instance in advance of, make sure
you compliment the pup. When dog house teaching your dog, continue
to keep the dog house in the vicinity of the spot where you as well
as the canine are usually conversing.
Slide 26
26 And I haven't mentioned automated Sentiment Analysis yet!
Irony: really?
Slang/Dialect/Register
Multiple meanings: 50 strong
Adjacent subjects: My beautiful FIAT next to a BMW
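The adjacent-subjects problem is easy to show with a toy example. The sketch below scores the slide's sample post in two ways: a post-level score that credits every brand mentioned, and a windowed score that only counts sentiment words close to each brand. The one-word lexicon, the brand list and the window size are assumptions made for illustration, not a description of how any particular SMM application works.

```python
import re

POSITIVE = {"beautiful"}   # toy lexicon, for illustration only
BRANDS = {"fiat", "bmw"}

def tokens(text):
    """Lower-case a post and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def post_level(text):
    """Assign the post's overall positive count to every brand it mentions."""
    t = tokens(text)
    score = sum(1 for w in t if w in POSITIVE)
    return {b: score for b in BRANDS if b in t}

def windowed(text, window=2):
    """Count only positive words within `window` tokens of each brand."""
    t = tokens(text)
    result = {}
    for i, w in enumerate(t):
        if w in BRANDS:
            nearby = t[max(0, i - window): i + window + 1]
            result[w] = sum(1 for n in nearby if n in POSITIVE)
    return result

post = "My beautiful FIAT next to a BMW"
print(post_level(post))
print(windowed(post))
```

The post-level score credits BMW with a compliment that was paid to the FIAT; the windowed score does not. A fixed window is still a crude rule, and it will fail on longer or differently ordered sentences.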
Slide 27
27 4. And what is Good, and what is not Good?
Slide 28
28 To Recap
SMM tools make it very easy to "Super Google" certain Brands, people, objects and even categories or concepts, quickly generating tables and charts.
But underneath there's a complex story about accuracy, reach and relevance, which you only really see when you drill down and which you only really understand by getting inside the providers' systems and sources.
The fact that this isn't blazoned across all dashboards is about the fact that many solution providers started out somewhere else, with monitoring. It's not that they should have anticipated our needs.
Sentiment analysis is only part of this story; it doesn't define it.
Slide 29
29 Relationships matter as much as technology
[Diagram labels: Social Media Content; 3rd Party System [e.g. SaaS]; 3rd party organisation; Vendor; Dashboard-wielding MR Agency; Clients; FEEDS; Queries and more refined requirements; Reports [inc post hoc analysis]; Results; Modified searches; Topic-specific feedback; Customise Engine; Customise Feeds; Wholesalers?]
Slide 30
30 Natural Language Processing [NLP] to the rescue?
Definition: "Specifically, it is the process of a computer extracting meaningful information from natural language input and/or producing natural language output"*
Many SMM applications now claim some level of NLP. Whilst this may legitimately be contrasted with simpler analysis of vocabulary combinations, and probabilistic methods, it sometimes means little. It may only mean that some rules of language have been attended to in what is still essentially a pattern-matching exercise.
*Warschauer, M., & Healey, D. (1998). Computers and language learning: An overview
Slide 31
31 But clearly sophisticated NLP can make a big difference
Improved Accuracy, including filtering out of unstructured spam
More tools available to achieve/check Relevance
Much-improved Sentiment Analysis
Trends: there's more NLP, not just in social media analysis; there's more commercially affordable NLP and it keeps getting better; some of it is even helpfully self-auditing.
Significantly, when NLP is set to retain only high-confidence classifications, volumes of results are dramatically reduced.
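The trade-off between confidence and volume can be sketched as a simple filter. The classifier output below (post id, label, confidence score) and the threshold values are invented for illustration; how confidence is reported, if at all, depends on the NLP engine in use.

```python
# Hypothetical classifier output: (post id, sentiment label, confidence 0-1).
classified = [
    (1, "positive", 0.95),
    (2, "negative", 0.55),
    (3, "neutral", 0.62),
    (4, "negative", 0.91),
    (5, "positive", 0.48),
]

def retain(results, threshold):
    """Keep only classifications at or above the confidence threshold."""
    return [r for r in results if r[2] >= threshold]

# Raising the threshold trades volume for reliability.
for threshold in (0.0, 0.6, 0.9):
    kept = retain(classified, threshold)
    print(f"threshold {threshold}: {len(kept)} of {len(classified)} kept")
```

In a real study the question is whether what remains at a high threshold is still enough data to answer the client's question.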
Slide 32
32 Barking up the wrong Tree?
Researchers' instincts have been to use, and so judge, SMM like survey data.
But what is "good", the ancient philosophers would tell us, is really about function and purpose.
I think we've now learned enough about SMM to stop and ask... what was it we were trying to do?
Slide 33
33 Remind me what we are trying to do?
Use the social web as a proxy for the population?
Understand how the social web is responding, for the benefit of those solely interested in this sub-set of the population as a channel or marketplace?
Access particular niches which are more concentrated online than off?
Detect significant events?
Measure shifts and changes?
Make rough comparisons?
Discover new insights, themes and connections?
Slide 34
2012 GfK NOP
34 Different client needs indicate different SMM approaches
For example - Precision Extraction vs Trawl & Filter
Crude mention & mood tracking
Quantitative - Brand tracking and integration with traditional research
Indicative Qual, e.g. using trends and volumes to guide focus of analysis
Exploratory Qual: more complex collection. Manually manageable volumes and tuning
Higher data volumes from simple search terms
Lower data volumes from targeted & compound search terms
More post processing, applied to data by MR agency - to reduce noise and refine sentiment attribution
Accept raw data output from application
Not radical enough! Too much like hard work? Sensible
Slide 35
35 Rather than wait for NLP utopia
Settle, for now, on:
1. SMM as a powerful and novel Qual exploration tool
2. Big number crunching, on single terms, that takes a hyena approach, i.e.
Accept all* occurrences of a brand or product name in posts as an indication of significance, even the trending spam and the adverts and the competitions
Look for pure correlations between words/phrases and other words/phrases
Or between trends in these numbers and classes of offline events such as sales, complaints and other behaviours, with a view to predicting, explaining or causing such events in the future.
*Except for the most obvious duplication errors such as over-indexing
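As a minimal sketch of the "pure correlation" idea, the code below computes a Pearson correlation between weekly mention counts and an offline series. The weekly figures are invented for illustration, and the choice of Pearson correlation is an assumption; the slide does not name a specific statistic. A high value only flags a relationship worth investigating, not a cause.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly figures, invented for illustration.
mentions = [120, 150, 90, 200, 170]   # all occurrences, spam included
complaints = [12, 14, 9, 21, 16]      # offline events in the same weeks

r = pearson(mentions, complaints)
print(f"correlation between mentions and complaints: {r:.2f}")
```

Shifting one series by a week or two before correlating is the obvious next step when the aim is prediction rather than explanation.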
Slide 36
36 5. Some Concquestions
Slide 37
37 Talking Points
How will commercial SMM applications and services with the best accuracy, reach and relevance capabilities be recognised, validated and promoted?
If you're a researcher and you want to use this stuff, for the first time, tomorrow, what must be done?
Fortunately there's enough to learn by super-googling, browsing and crude trend tracking to keep us going and learning for some time to come. Is that, whilst pragmatic, enough of an ambition?
Slide 38
38 Dr Nick Buckley SoShall Consulting Tel: 07958 516967 t:
@grimbold E: [email protected] Babita Earle Digital Strategy
Director GfK NOP Tel: 020 7890 9467 E: [email protected]