my presentation to NYU ITP Camp on 6-22-12
Moving from Unstructured Data into Structured Meanings and Data Stories
Marshall Sponder, WebMetricsGuru INC, for
NYU ITP Camp, 6-22-12
Marshall Sponder is the CEO/Founder of WebMetricsGuru Inc., a social solution design, social media analytics, web data analysis and SEO/SEM practice focusing on cutting edge market research and social media trend analysis.
He is the author of "Social Media Analytics: Effective Tools for Building, Interpreting, and Using Metrics", published by McGraw-Hill, 2012.
Marshall also teaches Social Media Analytics and Art at Rutgers University and UC Irvine Extension, and is a frequent speaker at analytics conferences internationally and in the United States.
Introduction – about me, besides being an ITP Camper….
Geo-located check-in data somehow got much harder to capture accurately via listening platforms, across the board, after last summer (but it was always fragmentary, at best).
Sysomos Map Query: 4sq.com/
Last year I did some data crunching using Radian6 and 4SQ check-ins and found that context and story were much easier to get via geo-local data.
My finding is that adding “dimensions” to the social data provides “context” that is often missing, because the social data is largely unstructured.
I was also able to look at “influencers” by the venues they habitually visited and their Twitter following.
Hitting people through multiple channels creates “meaning” and déjà vu.
NY Lottery online site
1. Pandora
2. NY Subway Train
Having cookies tracked across sites is probably doing something similar, but the idea is that awareness, relevance and meaning are “created” by repetition across varied channels within a certain time frame.
Finding /Creating Meaning / Creating the Story using Social Media
Moment of Receptivity
At the moment of receptivity, your message, argument or proposition has a chance of being received and acted on. The story you create will be a mixture of what you wish to create and what your recipients will make of it (how they will process it).
Some Types of Data (not all-inclusive)
Printed / Written Data (ledgers, lists)
Learning/ Training Data (Education/University)
Financial, manufacturing, Legal and Legislative Data
Large and medium business/marketing silo data (business intelligence)
Search Engine and Web Analytics Data (structured and Page Based)
Social Media, Video, Audio and Geo Local (Mobile) Data
Big Data, including machine generated data (and Big Analytics)
Offline Data (verbal, observed) recorded or non-recorded
Unstructured Data
Structured Data
More work is required for unstructured data
How much Unstructured Data is there?
Types of Unstructured Data (some of these types you deal with at ITP Camp)
Examples of Structured Data (businesses, governments and educational institutions have a lot of experience with this kind of data):
- Databases
- XML data
- Data warehouses
- Enterprise systems (CRM, ERP, etc.)
Examples of Unstructured Data (no one has cornered this type of data yet; everyone is struggling with it):
- Excel spreadsheets (one can argue this point, as Excel can have structure too)
- Word documents
- Email messages
- RSS feeds
- Audio files
- Video files
- Social Media Data (tweets, posts, photos, likes, shares, Near Field, Geo-fencing)
- Mobile Data (check-ins, SMS, etc.)
Add a Plan to Help Give Structure to Data
• Identity (who are you identifying?)
• What (are you measuring?)
• Where (are you monitoring?) (have enough data?)
• When (is it happening?)
• Why (is it happening?)
You (a business, non-profit, government, an official, an industry, etc.)
Someone else (depends how you want to define this)
Check-ins, mentions, posts, clicks, pulse data, etc
Visits, Page views, Unique Visitors, etc (perhaps a specific audience type)
Behavioral and Attitudinal data – much harder – some ITP experiments seem to go here -
Social Media Channels, Location (where), Venue Type, Situation/mindset
Real Time / Asynchronous
Seasonal, Specific event, Time not defined
Exploratory (don’t know – trying to find out)
You know why, but you want to know how much, be more tactical, effect specific changes
Business goals? Art goals? Effecting changes?
It would be nice if social data could interface with something like Isadora. I don’t know of anything that does that yet, or what the metaphors for the data would be (a good direction, though, if someone wants to take that on).
I suspect such an interface would lend itself to “Big Data”
Maybe this taxonomy would work (http://behaviorgrid.org/)... But you have to code the verbatims manually, unless you can program machine learning to do it for you (it won’t be that accurate, though).
What Platforms Provide (not a complete list by any means)
• Geolocation (location, venue type, other friends)
• Social – Sentiment, Volume, age/gender, some attempt at topic, usually inadequate, text analytics
• Web Analytics – Visit, Page view, Visitors (unique/new), Cookie Location, pathing (on site only), correlation tools, search keywords, links (referrers), ecommerce tracking (on site only)
• Audience Measurement – via Ad Exchanges, online panels, demographics, psychographics, and geo-demographics
• Census and Governmental data
• Financial Data – Wall Street
• Market Research – Traditional – Forums, Polls, opinion analysis based on sampling (e.g., political polls)
• Market Research – New – Big Data – tries to find hidden patterns (e.g., people who fix their roofs have fewer car accidents and get cheaper car insurance)
For ITP – Suggested Taxonomy
• Author / Artist / Subject / Activity
• Place (location)
• Time (timeframe)
• Type of Action / Behavior
• Persona (this will have to be defined)
• Medium (e.g., GSM/Mobile, Projection, etc.)
• Subject / Area
• Purpose (recreational, exploratory, consciousness raising, etc.)
• Etc., etc., etc. (these need to be further defined)
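As one way to picture the suggested taxonomy, here is a minimal Python sketch that stores each item (a post, a check-in, a project note) as a record with one field per dimension, then counts records along any dimension. The class name `TaggedItem`, the helper `count_by`, and every sample value are hypothetical, chosen only for illustration; the dimensions themselves still need to be defined further, as the slide says.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedItem:
    """One piece of unstructured content plus the taxonomy dimensions."""
    text: str
    author: str
    place: str
    timeframe: str
    action: str
    persona: str
    medium: str
    subject: str
    purpose: str


def count_by(items, dimension):
    """Count items along one taxonomy dimension (a field name)."""
    return Counter(getattr(item, dimension) for item in items)


# Hypothetical sample records; none of these values come from the talk.
items = [
    TaggedItem("Sound walk demo", "camper_a", "ITP floor", "June 2012",
               "performance", "artist", "Mobile", "audio", "exploratory"),
    TaggedItem("Wall mapping test", "camper_b", "ITP floor", "June 2012",
               "installation", "artist", "Projection", "video", "recreational"),
    TaggedItem("Data portrait", "camper_a", "Lobby", "June 2012",
               "installation", "researcher", "Projection", "data", "exploratory"),
]

by_medium = count_by(items, "medium")
by_purpose = count_by(items, "purpose")
```

Once every item carries the same fields, the same question ("how many per medium?", "how many per purpose?") can be asked of all of them, which is the practical meaning of adding structure.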
Two-Tiered Segmentation
Visitor Type
Product Directed
Category Directed
Early Stage Research
Discount Shopper
High-Ticket Buyer
Visitor Record
Prestige Giver
SMB Shopper
Brand Loyalist
Visit Types
Merging customer and visit type segmentation creates a two-tiered segmentation framework that becomes the core of our data model
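A minimal Python sketch of the two-tiered idea: each visit carries a visitor-type label and a visit-type label, and the pair becomes the segment key. The labels are taken from the slide, but which tier each label belongs to is an assumption here (the diagram does not survive in the text), and the sample visits are hypothetical.

```python
from collections import Counter

# Assumed tier membership; the slide text does not say which labels are which.
VISITOR_TYPES = {"Brand Loyalist", "Discount Shopper", "High-Ticket Buyer"}
VISIT_TYPES = {"Product Directed", "Category Directed", "Early Stage Research"}


def segment_key(visit):
    """Combine the two tiers into one (visitor type, visit type) key."""
    visitor, kind = visit["visitor_type"], visit["visit_type"]
    if visitor not in VISITOR_TYPES or kind not in VISIT_TYPES:
        raise ValueError(f"unknown segment labels: {visitor!r}, {kind!r}")
    return (visitor, kind)


def segment_counts(visits):
    """Count visits per two-tiered segment."""
    return Counter(segment_key(v) for v in visits)


# Hypothetical visit records for illustration only.
visits = [
    {"visitor_type": "Brand Loyalist", "visit_type": "Product Directed"},
    {"visitor_type": "Brand Loyalist", "visit_type": "Product Directed"},
    {"visitor_type": "Discount Shopper", "visit_type": "Early Stage Research"},
]

counts = segment_counts(visits)
```

The same visitor can show up under different visit types on different days, which is why the key is the pair rather than either label alone.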
Medium is the Message - McLuhan
• How you’re measuring and viewing affects what you see and what you find. Many ITP experiments are directly impacted by the method/medium being used. In emerging media especially, due to its “unstructured” aspect, tools shape the data (and insights).
• What tools or platforms are you using? What tools and platforms can you use (are at your disposal)?
• What is your budget for the tools, platforms and people?
• Are you in control of the measurement process yourself, or are you depending on others to execute it for you?
• Do you have a framework to put all this data in? That’s pretty important.
IMPLICATION: Choice of tool or platform profoundly shapes the results of your experiment or project
What’s the Use Case? Pick One (or add a new use case)
Type
Behavioral
Use cases from a Tools/Platform perspective
Consumer Research
Listening for Insights
Rich Categorization
NLP (machine learning)
Social Media Coverage*
PR Monitoring & Support
Listening and Engagement
Traditional Media Coverage*
Influencer Identification
Topic Categorization
Social Campaigns
Automating of the engagement
Workflow
Operational Metrics
Low-Latency
Care of Gary Angel – Semphonic.com
Full Service
Examples:
Source: Semphonic.com
Not so much an issue for ITP, but many organizations end up buying the same data from multiple vendors, over and over (something to avoid, if you can).
Goal(s): Audience:
Location: Timing :
Vehicle (how you’re going to do it): Venues (where you’re going to do it):
Message (Call(S) to Action):
Product / Service / Program
Metrics/KPIs
among
through/ with
ask fans and customers to
Regarding our
Where Success will be judged by
In some cases, a high-level plan (similar to a 1-minute pitch) might help to add structure and meaning to what you’re going to try to do (even here at ITP Camp or ITP, in general).
Salvage the reputation of the Romanian 20th century composer, George Enescu
Classical music institutions, enthusiasts, and musicians alike
Ideally GeorgeEnescu.com
A 6-month campaign period
Online videos, online networking, podcasts, musicological research
Personal blog, radio stations, YouTube, musicological conferences, etc.
Enescu’s art ought to be enjoyed and celebrated as the work of a deserving, 20th-century master
Program to promote the musicians and orchestras who wish to explore Enescu’s work
Popularity on Google Trends
New business connections and partnerships
Goal: Audience:
Location: Timing:
Vehicle: Venues:
Message:
Program:
Metrics/KPIs
New visitors to website
Among
Through/ With
Ask fans and customers to
Regarding the
Success will be judged by
Youtube statistics
Example of a Student’s Goal – Resurrecting George Enescu’s Work
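The plan template can also be held as a small data structure, so every project fills in the same slots. The sketch below is one possible reading in Python: the class name `Plan` and the method `pitch` are hypothetical, and the mapping of the Enescu example's values to the template's slots is inferred from the order in which they appear on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    """The slots of the high-level plan template."""
    goal: str
    audience: str
    location: str
    timing: str
    vehicle: str
    venues: str
    message: str
    program: str
    metrics: List[str]

    def pitch(self) -> str:
        """Render the plan as labeled lines, using the slide's connecting words."""
        rows = [
            ("Goal", self.goal),
            ("Among (audience)", self.audience),
            ("Location", self.location),
            ("Timing", self.timing),
            ("Through/with (vehicle)", self.vehicle),
            ("Venues", self.venues),
            ("Ask fans and customers to (message)", self.message),
            ("Regarding the (program)", self.program),
            ("Success will be judged by", ", ".join(self.metrics)),
        ]
        return "\n".join(f"{label}: {value}" for label, value in rows)


# Values copied from the student example; the slot assignment is an assumption.
enescu_plan = Plan(
    goal="Salvage the reputation of the Romanian 20th century composer, George Enescu",
    audience="Classical music institutions, enthusiasts, and musicians alike",
    location="Ideally GeorgeEnescu.com",
    timing="A 6 month campaign period",
    vehicle="Online videos, online networking, podcasts, musicological research",
    venues="Personal blog, radio stations, YouTube, musicological conferences, etc.",
    message="Enescu's art ought to be enjoyed and celebrated as the work of a deserving, 20th-century master",
    program="Program to promote the musicians and orchestras who wish to explore Enescu's work",
    metrics=[
        "Popularity on Google Trends",
        "New business connections and partnerships",
        "New visitors to website",
        "Youtube statistics",
    ],
)

pitch_text = enescu_plan.pitch()
```

A missing slot fails loudly when the `Plan` is constructed, which is a cheap way to notice that a project has no stated audience or no metrics before any data is collected.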
And what is a Plan, Anyway?
We’re drinking from the social media fire hose
Massive data to process and make sense of it all
But … we don’t need to boil the ocean!
New Solutions Lie in …
• Adding additional dimensions to the data (e.g., time, place)
• Adding custom taxonomies, lexicons and data mashups, which helps if done well and cleanly
• Customizing the source data feeds
• Customizing data extraction from pages crawled
• Defining what your goals are
• Defining what, when, where and how you’re going to accomplish your goals
• Defining the Key Performance Indicators that tell you if you hit or missed your goal targets
Internet Abundant with Predictive Signals
Beyond Listening: Reinventing Social Media Monitoring
If a status update reaches a social network but no one sees it, does it exist?
Are people using the wrong solutions to determine what people are saying?
@listening as a use case … Why Bother??
“The problem with social is that there is so much data - there’s 40 or 50 data points that you can measure and you have to figure out whether they are important. Some of those measurements are fundamentally not important.”
Pain!!
“I’m drowning in data and documents from the internet but I need actionable insights”
• Broad listening across the internet
• Focused on keyword matches
– Mentions of
• Brand name “Starbucks”
• Product names “Frappuccino”
• Produces valuable insights, but is exploratory in nature; as a result, it cannot answer tactical questions and is not scalable.
Blogs
Social Networks
News Sites Trade Sites
Forums Press
No Process
Success Undefined
90% Unstructured
Time Consuming
Hard to Scale
esp. at the beginning
Problems we all face with Social
The “Lens” approach using Boolean queries and saved datasets doesn’t seem to work very well
http://www.youtube.com/watch?v=4Y-SVxnVOv8
"housing solution"~2 AND "rhode island" AND "foreclosure", "road home program"~3 AND "foreclosure", "home loan modification"~4 AND "foreclosure", "jobless rate"~3 AND "foreclosure", "bankrupcy" AND "foreclosure" AND "housing" AND "obama", "rhode island housing"~3 AND "forclosure", "foreclosure prevention funds"~5, "bank foreclosures rhode island"~4 AND "obama", "selling house"~4 AND "foreclosure" AND "obama", "hardest hit fund"~4, "national foreclosure mitigation"~6, "homeowner stability initiative"~5 AND "obama", "roadhome program"~2, "hud homes rhode island"~3 AND "obama", "foreclosure settlement"~4 AND "25 billion"~2 AND "obama", "fannie mae freddie mac"~10 AND "foreclosure", "keeping people in their homes"~4
Radian6 Query on Foreclosures in Rhode Island
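To make the `"phrase"~N AND "term"` pattern in the query above more concrete, here is a minimal Python sketch of a proximity matcher. It assumes that `~N` means the words of the phrase must all occur within a window that allows up to N extra words, in any order; that is an assumption about the syntax, not a description of how Radian6 actually evaluates queries. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def proximity_match(text, phrase, slop=0):
    """True if every word of `phrase` occurs inside one window of
    len(phrase words) + slop consecutive tokens (order is ignored)."""
    words = tokenize(phrase)
    tokens = tokenize(text)
    window = len(words) + slop
    for start in range(len(tokens)):
        chunk = tokens[start:start + window]
        if all(word in chunk for word in words):
            return True
    return False


def matches_all(text, clauses):
    """AND together a list of (phrase, slop) clauses."""
    return all(proximity_match(text, phrase, slop) for phrase, slop in clauses)


# Hypothetical sample sentence, not taken from any real data set.
sample = ("Rhode Island announced a new housing relief solution "
          "as foreclosure filings rise")

first_clause = [("housing solution", 2), ("rhode island", 0), ("foreclosure", 0)]
hit = matches_all(sample, first_clause)
```

Even this toy version shows why such queries grow long: every clause is a guess about how people will phrase the topic, and each guess needs its own phrase and window.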
Monitoring has become too complex
And we don’t get our “Pie in the Sky”
Recorded Future
Web is Loaded with Events
Silicon Valley executives head to Vail, Colo. next week for the annual Pacific Crest Technology Leadership Forum
The carrier may select partners to set up a new carrier as early as next month
“2010 is the year when Iran will kick out Islam. Ya Ahura we will.”
“... Dr Sarkar says the new facility will be operational by March 2014...”
Drought and malnutrition hinder next year’s development plans in Yemen...
“...opposition organizers plan to meet on Thursday to protest...”
“Excited to see Mubarak speak this weekend...”
“According to TechCrunch China’s new 4G network will be deployed by mid-2010”
“Strange new Russian worm set to unleash botnet on 4/1/2012...”
Recorded Future Architecture
70,000 Real-time Sources
3+ Billion Time-tagged Facts
100,000 future events/day
Baghdad Next Week - Google
Baghdad Next Week – Recorded Future
Mobile and Tablets - Next three years
Huge market segments still emerging
• Over 75% of businesses plan on deploying tablets by 2013
• Revolutionizing health care delivery, on-site and mobile
• Disrupting software engineering and user expectations
VenueLabs – Geo-Location Analytics
Actionable intelligence gleaned from LBS instead of exploratory insights from SMM
Traditional Insights: Topic, Sentiment, Influence, Engagement
They mess up here A LOT! If I wasn’t in a rush nor a coffee addict I would go somewhere else!
New Insights: Location, Date/Time, Staff Working, Managers, Local Context, Unit Sales, Nearby Competitors
VenueLabs solves Local Data Gap Example
They mess up here A LOT! If I wasn’t in a rush nor a coffee addict I would go somewhere else!
70% data is UV
Verified the Local Data Gap
The text of the verbatim doesn’t help much, since we can’t tell where this was actually taking place without looking at the additional short URL and creating a context, which the software, today, usually isn’t able to do.
New York Art Instance - VenueLabs
Most active Museums?
Local Data Analytics of Museums – adding location automatically makes info more actionable (context)
Facebook & Twitter
Smoking Cessation Phases
Smoking Cessation
Patient Journey Stage 1 – Behavioral: Cold Turkey
Patient Journey Stage 2 – Behavioral: Other
Patient Journey Stage 3 – Over The Counter
Patient Journey Stage 4 – RX
Side Effects of Smoking Cessation
Craving comfort food and nicotine, fearing weight gain, quitting and wondering how friends and family will view the decision.
Respondents are actively seeking to quit smoking but fear the physical and psychological side effects.
Respondents in Stage 3 are settling on a treatment option presenting the fewest side effects possible, or going cold turkey.
Smoking cessation is the main choice at Stage 4 (based on our listening), but many respondents are having problems staying on the regimen due to side effects.
Choosing the right medication for Smoking Cessation
Respondents are looking for a way to stop smoking but are confused by the options and asking for advice.
Patients are suffering the side effects associated with Nicorette, nicotine patches, smoking cessation or cold turkey.
In Stage 3, smoking cessation decisions are complicated by product bans for e-cigarettes and smoking cessation in some communities and occupations.
In Stage 4, smoking cessation side effects are the main issue respondents have, with men appearing to do better with the treatment than women. Negative press over side effects of smoking cessation is upsetting, making many rethink their decision to stop smoking.
Available Options for Smoking Cessation are Confusing
In Stage 1 respondents are seeking guidance on all the available treatment options and making a decision on which one(s) to try.
In Stage 2, the overwhelming choice of smoking cessation treatment option is hypnosis, with the second most popular treatment being Nicorette and then smoking cessation.
In Stage 3, electronic cigarettes, followed by Nicorette gum, are the most popular treatments according to our listening reports.
In Stage 4, patients struggle with the side effects of the smoking cessation treatment itself. Some patients complete treatment successfully, but others do not and are dissatisfied with their progress.
Getting advice on the right treatment options for Smoking Cessation
Online respondents are going on blogs, Twitter and forums looking for people who have experience taking drugs for smoking cessation, so they can get information on the right approach to take.
In Stage 2, many side effects associated with each treatment are evident and respondents are grappling with which choice to make, often going with hypnosis first.
Patients in Stage 3 have tried treatments and are sharing their experiences, struggles and successes with smoking cessation.
In Stage 4, just about all the information on smoking cessation is negative. That does not stop many patients from taking the drug, but many are stopping once they experience side effects.
Defining Key Words
Patient Journey Stage 1 – Behavioral: Cold Turkey
Patient Journey Stage 2 – Behavioral: Other
Patient Journey Stage 3 – Over The Counter
Patient Journey Stage 4 – RX
"quit smoking"
"smoking cessation"
"cold turkey" AND "smoking" AND "quit on my own"
OR
"jotharmcoe"
"smoking cessation"
"mytimetoquit.com"
"smoking cessation.com"
"let’s quit together now"
"get quit clinic"
"qut-quit.com"
"counseling" AND "smoking"
"cutting back" AND "smoking"
"nicotine free-cigarettes"
"homeopathic remedies" AND "quit smoking" AND "stop smoking"
"hypnosis" AND "smoking" AND "-weed"
"embarrassed to" AND "doctor" AND "smoking"
"quit line" AND "smoking"
"herbal remedies" AND "quit smoking" AND "stop smoking"
"support group" AND "smoking"
"snus" AND "smoking"
"nicotrol" (1 mention)
"e-cigarette" AND "-buy" AND "-buying"
"stop smoking gum"
"nasal spray" AND "smoking" (1 mention)
"nicoderm"
"lozenge" AND "smoking"
"patch" AND "smoking"
"inhaler" AND "smoking"
"Nicolette" (strongest keyword)
"nicotine replacement therapy" AND "smoking"
"smoking cessation"
"buproban" AND "-order"
Location of Conversation for Each Stage of the Patient Journey
Stage 1 Stage 2
Stage 3 Stage 4
Behavioral: Cold Turkey Behavioral: Other
Over The Counter RX
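As a rough illustration of how the keyword lists defined for each patient-journey stage could be used to sort verbatims, here is a minimal Python sketch. Each rule is a set of required terms plus optional excluded terms (the `-term` entries in the keyword list). Which keyword belongs to which stage is an assumption made for this example, since the slide layout is lost in the text, and the rules are simplified versions of the listed queries. Matching is naive substring matching, so it will be noisy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    stage: str
    required: Tuple[str, ...]       # all of these must appear
    excluded: Tuple[str, ...] = ()  # none of these may appear


# Assumed stage assignments, for illustration only.
RULES = [
    Rule("Stage 1 (Behavioral: Cold Turkey)", ("cold turkey", "smoking")),
    Rule("Stage 2 (Behavioral: Other)", ("hypnosis", "smoking"), ("weed",)),
    Rule("Stage 2 (Behavioral: Other)", ("support group", "smoking")),
    Rule("Stage 3 (Over The Counter)", ("patch", "smoking")),
    Rule("Stage 3 (Over The Counter)", ("nicoderm",)),
    Rule("Stage 4 (RX)", ("buproban",), ("order",)),
]


def match_stages(text):
    """Return the stages whose rules match the text, in rule order."""
    lowered = text.lower()
    stages = []
    for rule in RULES:
        has_required = all(term in lowered for term in rule.required)
        has_excluded = any(term in lowered for term in rule.excluded)
        if has_required and not has_excluded and rule.stage not in stages:
            stages.append(rule.stage)
    return stages


# Hypothetical verbatims, written for this example.
example = match_stages("Tried hypnosis to stop smoking and it worked")
```

A verbatim can match more than one stage, or none; both cases are worth counting, because the unmatched share shows how much of the conversation the keyword lists miss.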
Putting targeting into action RI Primary = Create The Story (Be The Story)?
RI District 1
80,120? Over 50
Members of Facebook over 49 years old in RI District 1 = 102,920 (caveat: there are a few zip-codes in both districts)
1. Hit potential voters in District 1 with issue-targeted sponsored stories for AWARENESS only (expect few clicks, if any).
2. Blanket zip codes with a mailing (the post office now does this).
3. Use Venuelabs to sift check-in data and find voters; cross-link to the voting list when possible.
4. Categorize (data mining – persuadable?)
5. Reach out / Community management, etc.
6. Set up tracking (e.g., Campalyst – next slide)
Connecting Engagement To Conversions campalyst.com
Google Social Reports cannot connect the dots to ROI (yet), though Campalyst can.
Not enough information – Google cannot connect the dots back to the original post that generated the referral, but Campalyst does.
“Campalyst ties cause and effect for Twitter and Facebook better than any other platform I’ve yet seen.” Marshall Sponder, WebMetricsGuru.com
Campalyst can also find the brand advocates that generate the most traction and engagement for a brand or website.
Summary
• The Future of Analytics is with Actionable Data
• Actionable Data comes from adding contextual information and metadata in meaningful ways related to your business or organizational goals.
• You need a Plan (the right one) to execute, together with the metrics, audience, timing, venue, program/vehicle and KPIs, to succeed with Analytics of any kind.
Examples of Platforms (you can play with these later)
Radian6 Basic
Sysomos Map
Brandwatch
Campalyst
Infinigraph
6dgree
Netbase
PeekAnalytics
Traackr
mPact
Venuelabs
Marshall Sponder
WebMetricsGuru INC.
www.webmetricsguru.com
www.smabook.com
[email protected]
@webmetricsguru
@smanalyticsbook