SOCIAL MEDIA MINING: WHAT IS IT GOOD FOR? AND WHEN IS IT GOOD ENOUGH? Nick Buckley, SoShall Consulting asc Funky Data, 25th September 2012

  • Slide 1
  • SOCIAL MEDIA MINING: WHAT IS IT GOOD FOR? AND WHEN IS IT GOOD ENOUGH? Nick Buckley, SoShall Consulting asc Funky Data, 25th September 2012
  • Slide 2
  • The Plan
    • What is Social Media Mining [SMM]?
    • How do Market Researchers tend to think about it?
    • Nuts & bolts: practical outcomes
    • Challenges and constraints
    • [How] do these make Researchers re-think the place of SMM?
    • Where will it go from here?
    • BUT: assumption of a vendor/researcher distinction, even if in-house; no naming or comparing of vendors/applications; difficult to judge where to pitch the basics (too familiar vs. too abstract)
  • Slide 3
  • 1. What are we talking about?
  • Slide 4
  • What exactly are we talking about? What they say: Definition* of social media monitoring: "Social Media Monitoring (SMM) means the identification, observation, and analysis of user-generated social media content for the purpose of market research." * http://www.social-media-monitoring.org
  • Slide 5
  • What are we talking about? Social Media: Review sites, Professional & Consumer Blogs/Microblogs, Forums, Client sites, Video sites, Public Communities, Newsgroups, News sites
  • Slide 6
  • What's in a word? GfK NOP currently prefers "Mining". User-generated content in social media lays down a rich seam of activity, opinion, thought and information, along with mess, echoes and whimsy. For some time marketing and PR professionals have been monitoring Social Media to capture headline buzz in real time, and to detect sudden changes requiring a response. But collecting and counting this content is only the beginning of a process which can add value via many techniques, including integration with other sources such as market research data.
  • Slide 7
  • 2. What happens when Market Researchers get hold of it?
  • Slide 8
  • Sony brand damage was driven by the PlayStation breach (2011). [Chart panels: Sony buzz this year; Sony sentiment this year; Sony buzz in April; Sony sentiment in April; PlayStation buzz; PlayStation sentiment]
  • Slide 9
  • Market Researchers believe that SMM can also give clients a window on other dimensions of online conversations. SMM provides insights into:
    • Category Dynamics: consumer needs; problems and issues consumers discuss; product usage discussions; new product entries and trends in purchase intention
    • Corporate: corporate mentions related to reputation; crises; social issues
    • Brand/Product: brand/sub-brand mentions and brand buzz; number of positive vs. negative sentiments for each brand, including customer service; brand content analysis (what's being said about the brand); advertising noticed most and related discussion, launch tracking; source of mentions (specific sites) and the most influential sites
    • Competition: all the above for preference and competition
  • Slide 10
  • Market Researchers are fitting SMM into different places within method or process
    • As a precursor to traditional Market Research: refining hypotheses for research design; prioritising criteria and identifying new ones; defining or qualifying the competitive set; identifying niche respondents for small-scale studies
    • As a successor to traditional Market Research: tracking the impact of implemented findings; monitoring for events which may create discontinuities in this; low intensity/low detail follow-up
    • As a companion to traditional Market Research: compare and contrast (e.g. unconditioned); add granularity to satisfaction drivers; complement reach; interpolate lengthy studies
    • So can SMM research stand alone? Is there a hierarchy of best fit within these hybrid uses? Does the story change if you get longitudinal with a category? To what extent do some of these uses assume that the data can be treated like conventional MR data? In any case, should it be treated and analysed thus?
  • Slide 11
  • But inevitably they think about comparison with surveys
    • You can ask a new question without having to issue a new questionnaire*
    • Unconditioned by participant awareness of a research process, often more emotive than considered survey responses
    • Low cost, under certain circumstances
    • Spontaneously generated content, unconstrained by research frame
    • Offers insight into active social media users
    • Potentially global
    • Very immediate
    • Not necessarily representative of the general population
    • Difficult to weight back to the general population, as demographic data is sparse
    • Automated sentiment analysis is only as good as the algorithms [and these vary greatly]
    • Automated harvesting can capture a lot of noise for certain words or brands
    • No guarantee of sufficient data
    • Costs rise when we use supplementary analysis to overcome some of these issues
    • *within certain technical limitations
  • Slide 12
  • Different approaches for different client needs. For example: Precision Extraction vs. Trawl & Filter
    • Crude mention & mood tracking
    • Quantitative: brand tracking and integration with traditional research
    • Indicative Qual: e.g. using trends and volumes to guide focus of analysis
    • Exploratory Qual: more complex collection, manually manageable volumes and tuning
    • Higher data volumes from simple search terms vs. lower data volumes from targeted & compound search terms
    • More post-processing, applied to data by GfK to reduce noise and refine sentiment attribution, vs. accepting raw data output from the application
  • Slide 13
  • 3. Too Abstract?
  • Slide 14
  • The raw material: results from search terms
    • SMM applications extract results from wholesale supplies of data, conducting searches defined by search terms
    • Search terms can be anything from a simple and distinctive brand or product name to a complex expression configured to capture discussions about a category or concept
    • Search terms combine words or phrases: via logical instructions such as AND, OR, NOT; by employing functions such as WITHIN to detect words in a certain proximity to each other; with brackets that dictate the sequence in which instructions are applied, e.g. word1 AND ( word2 OR word3 )
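To make the search-term logic concrete, here is a minimal Python sketch of how AND, OR, NOT, WITHIN and brackets might combine. It is an illustration only: the helper names (`has`, `AND`, `OR`, `NOT`, `WITHIN`, `search`), the crude tokenizer and the sample posts are assumptions made for this example, not the syntax of any actual SMM application.

```python
import re

def tokens(text):
    """Split a post into lowercase word tokens (deliberately crude)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has(word):
    """Predicate: the word appears somewhere in the post."""
    return lambda toks: word.lower() in toks

def AND(*preds):
    return lambda toks: all(p(toks) for p in preds)

def OR(*preds):
    return lambda toks: any(p(toks) for p in preds)

def NOT(pred):
    return lambda toks: not pred(toks)

def WITHIN(word_a, word_b, distance):
    """Predicate: word_a and word_b occur at most `distance` tokens apart."""
    a, b = word_a.lower(), word_b.lower()
    def pred(toks):
        pos_a = [i for i, t in enumerate(toks) if t == a]
        pos_b = [i for i, t in enumerate(toks) if t == b]
        return any(abs(i - j) <= distance for i in pos_a for j in pos_b)
    return pred

def search(posts, query):
    """Return the posts whose tokens satisfy the query predicate."""
    return [p for p in posts if query(tokens(p))]

# Hypothetical posts, invented for illustration.
posts = [
    "My new phone has a great camera",
    "The phone battery died again",
    "Great camera on this tablet",
]

# word1 AND ( word2 OR word3 ): nesting the calls plays the role of brackets.
query = AND(has("phone"), OR(has("camera"), has("battery")))
matches = search(posts, query)
```

Nesting function calls stands in for the brackets in a written search term; a real application would parse the expression from text and run it against an index rather than scanning every post.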
  • Slide 15
  • A typical SMM application offers a dashboard view of data returned by these search terms and the facility to export the underlying data
  • Slide 16
  • Analyses: whatever the Search Terms define, here is what can be measured about the results returned, in combination or in isolation
    • Volume: how much is it talked about, and how is this changing over time?
    • Channels: where on the web is it being talked about (Twitter, blogs, forums, comments)?
    • Location: where in the world is it being talked about?
    • Themes: what other words and phrases are most regularly associated with it?
    • People: who is talking about it? That may be by influence, according to various proprietary indices, or by demographics [to be used with caution]
    • Sentiment: across all of these variables is superimposed automatically generated Sentiment analysis: positive, negative or neutral language associated with the subject of the posts
    • Verbatims: drill-down to individual posts. In their own words, what do people actually say?
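As a rough illustration of these measures, the sketch below tallies volume, channels, themes and sentiment over a handful of exported posts. The record layout (`date`, `channel`, `sentiment`, `text`), the stopword list and the posts themselves are all assumptions invented for this example; real exports differ by application.

```python
from collections import Counter

# Hypothetical export rows; field names are assumed, not taken from any tool.
posts = [
    {"date": "2012-09-01", "channel": "twitter", "sentiment": "positive",
     "text": "love the new ad"},
    {"date": "2012-09-01", "channel": "forum", "sentiment": "negative",
     "text": "the new ad is annoying"},
    {"date": "2012-09-02", "channel": "twitter", "sentiment": "neutral",
     "text": "saw the ad today"},
]

STOPWORDS = {"the", "is", "a"}

def count_by(posts, field):
    """Volume broken down by one field, e.g. 'date' or 'channel'."""
    return Counter(p[field] for p in posts)

def themes(posts, top=3):
    """Most frequent non-stopword words across all posts."""
    words = Counter(
        w for p in posts for w in p["text"].lower().split()
        if w not in STOPWORDS
    )
    return words.most_common(top)

def sentiment_by_channel(posts):
    """Sentiment superimposed on another variable (here: channel)."""
    return Counter((p["channel"], p["sentiment"]) for p in posts)

volume_over_time = count_by(posts, "date")
volume_by_channel = count_by(posts, "channel")
top_theme = themes(posts, top=1)
```

Each dashboard view is, at heart, a count over one or two fields; the verbatims are simply the `text` values behind any count.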
  • Slide 17
  • Combinations of these basics tell different types of story
    • Volume + Channels: Brand A's new ad was mainly discussed on Forums when it was being shot by a famous pop star, but was mainly discussed on Twitter when it was being aired.
    • Themes + Sentiment: Automotive brand X is associated mainly with topics around performance, whereas brand Y is associated with comfort and style. Both enjoy roughly the same level of positive sentiment overall.
    • Volume vs. Offline Schedule: Beverage brand N enjoyed a bigger spike in its mentions when news of a future big game at a sponsored venue was announced than it got from a tournament sponsorship that was live at the time.
    • Channels + Themes + People: Some general social Forum sites enjoy bigger concentrations of discussion of a particular topic than specialist Forums dedicated to that same topic!
  • Slide 18
  • Examples of outcomes from SMM studies
    • Focus on the right social media channels at the right time.
    • Differentiate trade press buzz from real engagement.
    • Consumers don't always talk about the product features that you highlight.
    • Places where naturally occurring discussion of a category offers an opportunity for brands to intercept, rather than try to create competing social media conversations.
    • The world can sometimes throw up more interesting stories about you than you could hope to generate for yourself, but not always with the connotations you would like.
  • Slide 19
  • BUT!
  • Slide 20
  • There are many forces which erode this nice model: Accuracy? Reach? Relevance? [Reach image from titletrack.com]
  • Slide 21
  • Accuracy
    • Is the searched-for phrase even in the returned snippet?
    • Is it real content, or is it navigation? Ticker or title content? Ad content? Various species of spam [overlaps with Relevance]?
    • Is meta-data about the poster present? Reliable?
    • Understanding this, apart from making your own manual checks, is about understanding your 3rd party vendor and, often, their wholesale data suppliers in turn.
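Two of these checks are easy to automate on an export, and a minimal sketch follows. The record fields (`snippet`, `author`) and the sample rows are assumptions for illustration; whether content is navigation, ad copy or spam still needs manual inspection or vendor knowledge.

```python
def audit_results(results, term):
    """Count results whose snippet lacks the search term or whose
    poster meta-data is absent. Field names are hypothetical."""
    term = term.lower()
    missing_term = sum(
        1 for r in results if term not in (r.get("snippet") or "").lower()
    )
    missing_author = sum(1 for r in results if not r.get("author"))
    return {
        "checked": len(results),
        "missing_term": missing_term,
        "missing_author": missing_author,
    }

# Invented rows: one real mention, one navigation fragment, one anonymous post.
results = [
    {"snippet": "Sony announces a new console", "author": "gamer42"},
    {"snippet": "Home | News | Contact", "author": ""},
    {"snippet": "sony ps3 deal this weekend", "author": None},
]

report = audit_results(results, "sony")
```

A high `missing_term` share suggests the application matched on page content outside the snippet, which is a question to raise with the vendor.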
  • Slide 22
  • Reach. "[T]here are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don't know. But there are also unknown unknowns: there are things we do not know we don't know." (Donald Rumsfeld)
    • Are these results from scrutiny of the entire [English-speaking] social web? No
    • Are they results from a very large, sometimes stated, number of social sources? Yes
    • Could this range be skewed relative to the subject under scrutiny? Yes
    • Where it's Twitter data, is it from the whole of Twitter? Maybe
    • Is historical data always on the same basis as current data, or data gathered since the search was defined? Not always
    • Do we always have a good idea of what the Reach is? No
  • Slide 23
  • Relevance: even when the application has collected exactly what we asked for, and it is legitimate content, with some nice useful data about the poster, it might not be relevant
    • "Cats are great company."
    • "#EMT Bolt one cool cat!"
    • "Also, the Cat is a great resort"
    • "I love my aunt Cat!"
    • "I think Cat Stark is worse than any Lanister."
    • "I think this hurricane was a scam cooked up by the fat cats in Big Grocer."
  • Slide 24
  • ...put another way: "Oh s**t! I forgot it's still the internet."
  • Slide 25
  • Other challenges include: "However, commencing too early public smoking facts will just overstress your pet ; quite a fresh pet will not learn everything from services. Just after he has ended up perched for some a few moments, supply him with the particular take care of, plus for instance in advance of, make sure you compliment the pup. When dog house teaching your dog, continue to keep the dog house in the vicinity of the spot where you as well as the canine are usually conversing."
  • Slide 26
  • And I haven't mentioned automated Sentiment Analysis yet!
    • Irony: really?
    • Slang/Dialect/Register
    • Multiple meanings: "50 strong"
    • Adjacent subjects: "My beautiful FIAT next to a BMW"
  • Slide 27
  • 4. And what is Good, and what is not Good?
  • Slide 28
  • To Recap: SMM tools make it very easy to "Super Google" certain brands, people, objects and even categories or concepts, quickly generating tables and charts. But underneath there's a complex story about accuracy, reach and relevance, which you only really see when you drill down and which you only really understand by getting inside the providers' systems and sources. The fact that this isn't blazoned across all dashboards is about the fact that many solution providers started out somewhere else, with monitoring. It's not that they should have anticipated our needs. Sentiment analysis is only part of this story; it doesn't define it.
  • Slide 29
  • Relationships matter as much as technology. [Diagram labels: Social Media Content; 3rd Party System [e.g. SaaS]; 3rd party organisation; Vendor; Dashboard-wielding MR Agency; Clients; FEEDS; Queries and more refined requirements; Reports [inc. post hoc analysis]; Results; Modified searches; Topic-specific feedback; Customise Engine; Customise Feeds; Wholesalers?]
  • Slide 30
  • Natural Language Processing [NLP] to the rescue? Definition: "Specifically, it is the process of a computer extracting meaningful information from natural language input and/or producing natural language output."* Many SMM applications now claim some level of NLP. While this may legitimately be contrasted with simpler analysis of vocabulary combinations and with probabilistic methods, it sometimes means little. It may only mean that some rules of language have been attended to in what is still essentially a pattern-matching exercise. *Warschauer, M., & Healey, D. (1998). Computers and language learning: An overview
  • Slide 31
  • But clearly sophisticated NLP can make a big difference
    • Improved Accuracy, including filtering out of unstructured spam
    • More tools available to achieve/check Relevance
    • Much-improved Sentiment Analysis
    • Trends: there's more NLP, not just in social media analysis; there's more commercially affordable NLP, and it keeps getting better; some of it is even helpfully self-auditing.
    • Significantly, when NLP is set to retain only high-confidence classifications, volumes of results are dramatically reduced.
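The volume trade-off in the last point can be seen with a toy filter. The sketch assumes each classified post carries a `confidence` score between 0 and 1; that field name, the scores and the thresholds are all hypothetical, and real NLP engines expose confidence in their own ways, if at all.

```python
def keep_confident(classified, threshold):
    """Retain only classifications at or above the confidence threshold."""
    return [c for c in classified if c["confidence"] >= threshold]

# Invented sentiment classifications with made-up confidence scores.
classified = [
    {"text": "love it", "label": "positive", "confidence": 0.95},
    {"text": "yeah, great...", "label": "positive", "confidence": 0.55},
    {"text": "broke in a week", "label": "negative", "confidence": 0.80},
    {"text": "sick design", "label": "negative", "confidence": 0.40},
]

lenient = keep_confident(classified, 0.5)
strict = keep_confident(classified, 0.9)
```

Raising the threshold from 0.5 to 0.9 cuts this sample from three retained posts to one, which is the same effect, in miniature, as the dramatic reduction described above.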
  • Slide 32
  • Barking up the wrong Tree? Researchers' instincts have been to use, and so judge, SMM like survey data. But what is "good", the ancient philosophers would tell us, is really about function and purpose. I think we've now learned enough about SMM to stop and ask: what was it we were trying to do?
  • Slide 33
  • Remind me, what are we trying to do?
    • Use the social web as a proxy for the population?
    • Understand how the social web is responding, for the benefit of those solely interested in this sub-set of the population as a channel or marketplace?
    • Access particular niches which are more concentrated online than off?
    • Detect significant events?
    • Measure shifts and changes?
    • Make rough comparisons?
    • Discover new insights, themes and connections?
  • Slide 34
  • Different client needs indicate different SMM approaches. For example: Precision Extraction vs. Trawl & Filter
    • Crude mention & mood tracking
    • Quantitative: brand tracking and integration with traditional research
    • Indicative Qual: e.g. using trends and volumes to guide focus of analysis
    • Exploratory Qual: more complex collection, manually manageable volumes and tuning
    • Higher data volumes from simple search terms vs. lower data volumes from targeted & compound search terms
    • More post-processing, applied to data by the MR agency to reduce noise and refine sentiment attribution, vs. accepting raw data output from the application
    • Annotations: Not radical enough! Too much like hard work? Sensible
    • [2012 GfK NOP]
  • Slide 35
  • Rather than wait for NLP utopia, settle, for now, on:
    • 1. SMM as a powerful and novel Qual exploration tool
    • 2. Big number crunching, on single terms, that takes a "hyena" approach, i.e.:
    • Accept all* occurrences of a brand or product name in posts as an indication of significance, even the trending spam and the adverts and the competitions
    • Look for pure correlations between words/phrases and other words/phrases
    • Or between trends in these numbers and classes of offline events such as sales, complaints and other behaviours, with a view to predicting, explaining or causing such events in the future
    • *Except for the most obvious duplication errors such as over-indexing
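A minimal sketch of the "hyena" approach, under stated assumptions: posts arrive as `(week, text)` pairs, only exact duplicates within a week are dropped, and the correlation measure is Pearson's r, chosen here for illustration because the slide does not name one. The brand name, posts and all figures are invented.

```python
import math

def weekly_mentions(posts, term):
    """Count every post mentioning `term` per week, spam and adverts
    included, dropping only exact duplicates of the same post."""
    seen = set()
    counts = {}
    for week, text in posts:
        if (week, text) in seen:
            continue  # obvious duplication, e.g. the same post indexed twice
        seen.add((week, text))
        if term.lower() in text.lower():
            counts[week] = counts.get(week, 0) + 1
    return counts

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

# Invented data: raw mentions of a hypothetical brand and weekly sales.
posts = [
    (1, "BrandN rocks"), (1, "BrandN rocks"), (1, "win a BrandN prize"),
    (2, "brandn ad on tv"),
]
mentions_by_week = weekly_mentions(posts, "BrandN")

weekly_counts = [120, 150, 90, 200, 170]
weekly_sales = [12, 15, 9, 20, 17]
r = pearson(weekly_counts, weekly_sales)
```

A coefficient near 1 or -1 on real data would only flag a relationship worth investigating; whether mentions predict, explain or merely accompany the offline events is a separate question.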
  • Slide 36
  • 5. Some Concquestions
  • Slide 37
  • Talking Points
    • How will commercial SMM applications and services with the best accuracy, reach and relevance capabilities be recognised, validated and promoted?
    • If you're a researcher and you want to use this stuff, for the first time, tomorrow, what must be done?
    • Fortunately there's enough to learn by super-googling, browsing and crude trend tracking to keep us going and learning for some time to come. Is that, whilst pragmatic, enough of an ambition?
  • Slide 38
  • Dr Nick Buckley, SoShall Consulting. Tel: 07958 516967, t: @grimbold, E: [email protected]. Babita Earle, Digital Strategy Director, GfK NOP. Tel: 020 7890 9467, E: [email protected]