Slides from my talk at LT-Accelerate 2014, Brussels
Social Media Analytics as a Service
Dr. Diana Maynard
University of Sheffield, UK
We are all connected to each other...
● Information, thoughts and opinions are shared prolifically on the social web these days
● 72% of online adults use social networking sites
● In Britain and the US, approx. 1 hour a day on social media
● 90% of marketers use social media channels for business
Popularity of Social Networking Sites
Twitter
● 284 million monthly active users
● 100 million daily active users
● 80% of world leaders use Twitter
Facebook
● 1.35 billion monthly active users, 864 million daily active, 10 billion messages a day
● 30% of Americans get their news from Facebook
● Facebook has more users than the whole of the Internet did in 2005
● Google+: 300 million monthly active users
● LinkedIn: 332 million users
● MySpace: 36 million users
● 44% of “users” have never sent a tweet
● 390 million users have no followers
● Google+: only 7 minutes per month
Your grandmother is three times as likely to use a social networking site now as in 2009
Why analyse social media?
● Contrary to popular belief, Twitter isn't just full of tweets about Justin Bieber.
● In an emergency, one in two people would use social media to let people know they were safe or to find out more information
● Less than 24 hours after the recent Nepal trekking disaster hit, Facebook and Twitter accounts had been set up to provide information channels, missing persons register etc.
● For companies, sentiment analysis tools are critical to keep track of the market pulse, customer feedback, etc.
● Fast-growing, highly dynamic and high-volume source of data
● Reflects language and current views of today's society
● Analysing social media is far more efficient than e.g. YouGov polls
Opinion mining from social media
● Understanding customer reviews and so on is a huge business
● But also:
● Tracking political opinions: what events make people change their minds?
● How does public mood influence the stock market, consumer choices etc?
● How are opinions distributed in relation to demographics?
● Who are the opinion influencers?
● SMA tools are crucial in order to make sense of all the information
Social media analysis for journalists
● Twitter is immensely valuable to news professionals
  ● gauging opinion on breaking news
  ● discovering new stories
  ● first-hand reports from disasters, war zones, ...
● Issues of veracity: London Eye on Fire!
Analysing language in social media is hard
● Grundman:politics makes #climatechange scientific issue,people don’t like knowitall rational voice tellin em wat 2do
● @adambation Try reading this article , it looks like it would be really helpful and not obvious at all. http://t.co/mo3vODoX
● Want to solve the problem of #ClimateChange? Just #vote for a #politician! Poof! Problem gone! #sarcasm #TVP #99%
● Human Caused #ClimateChange is a Monumental Scam! http://www.youtube.com/watch?v=LiX792kNQeE … F**k yes!! Lying to us like MOFO's Tax The Air We Breath! F**k Them!
We need tools for hashtag analysis
● Hashtags need unravelling:
  ● #gasprices
● And disambiguating:
  ● #therapist
  ● #nowthatcherisdead
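The two problems above can be made concrete with a small sketch. It enumerates every way of splitting a hashtag into known words, so #gasprices unravels to one reading while #therapist and #nowthatcherisdead each yield two. The vocabulary and the function name `segmentations` are assumptions made for illustration; this is not the GATE or TwitIE implementation, and a real system would need a large lexicon and some way of ranking the candidates.

```python
from functools import lru_cache

# Toy vocabulary, assumed for illustration only. A real system would use a
# large lexicon plus frequency or context information to rank candidates.
VOCAB = {"gas", "prices", "the", "rapist", "therapist",
         "now", "that", "thatcher", "cher", "is", "dead"}


def segmentations(hashtag: str) -> list[list[str]]:
    """Return every way of splitting a hashtag into vocabulary words."""
    text = hashtag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def split_from(start: int) -> tuple[tuple[str, ...], ...]:
        if start == len(text):
            return ((),)  # exactly one way to split the empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in VOCAB:
                # Prepend this word to every split of what follows it.
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(words) for words in split_from(0)]


for tag in ("#gasprices", "#therapist", "#nowthatcherisdead"):
    print(tag, segmentations(tag))
```

A hashtag with more than one segmentation is ambiguous and needs context (or world knowledge) to resolve; one with none contains words outside the vocabulary.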
Hijacking of hashtags
#earthhour2014
NER is dead! Long live NER!
NER on Tweets
● NER on Tweets much harder than on longer text
● Very short, so ambiguous terms hard to interpret
● Poor grammar and spelling, use of abbreviations, shorthands
● Twitter-specific features: hashtags, @mentions, etc.
● Tools designed for longer texts do very badly on Twitter
System       P      R      F1     F0.5
OpenCalais   68.59  67.17  67.87  68.30
Lupedia      70.93  44.17  54.44  63.27
TextRazor    59.12  83.83  69.34  62.82
TwitIE       69.69  61.03  65.07  67.76
Zemanta      29.64  29.31  29.47  29.57
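The F1 and F0.5 columns follow from precision (P) and recall (R) through the standard F-beta measure, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); F0.5 weights precision more heavily than recall. A minimal sketch that recomputes both columns from the rounded P and R values in the table (the helper name `f_beta` is ours, and because the inputs are already rounded the last digit can differ from the slide by 0.01):

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta < 1 favours precision, beta > 1 recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall (in %) copied from the table above.
scores = {
    "OpenCalais": (68.59, 67.17),
    "Lupedia": (70.93, 44.17),
    "TextRazor": (59.12, 83.83),
    "TwitIE": (69.69, 61.03),
    "Zemanta": (29.64, 29.31),
}

for system, (p, r) in scores.items():
    print(f"{system:<11} F1={f_beta(p, r):.2f}  F0.5={f_beta(p, r, 0.5):.2f}")
```

This also shows why TextRazor leads on F1 but not on F0.5: its high recall counts for less once precision is weighted more heavily.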
Tools for Sentiment Analysis
● There are lots of tools for sentiment analysis around
● Many of them don't work well at more than a very basic level
● They mainly use dictionary lookup for positive and negative words
● ML methods only work for text that's similar in style to the training data, and it's hard to understand when they go wrong
● Things like sarcasm tend not to get picked up
● They classify the tweets as positive or negative, but not with respect to the keyword you're searching for
  ● keyword search just retrieves any tweet mentioning it, but not necessarily about it as a topic
  ● no correlation between the keyword and the sentiment
Sentiment Analysis in GATE
● Knowledge-based linguistic approach based on entity detection for opinion holders and targets
● Sentiment words have to be in a linguistic relation to the opinion holder and target
● Use linguistic analysis to deal with scope issues (negation, hashtags, sarcasm etc)
● Sentiment word scores are modified incrementally
● Easy understanding of errors and adaptation of the rules
● Twitter-specific pre-processing using TwitIE
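To give a feel for what "scores are modified incrementally" can mean, here is a deliberately tiny sketch: each sentiment word starts from a lexicon score, preceding negators and intensifiers adjust it, and a #sarcasm hashtag flips the overall polarity. Everything in it (the lexicon, the weights, the one-word scope rule, the function name `score_tokens`) is an assumption for illustration. GATE's actual rules work over linguistic relations between opinion holders, targets and sentiment words, which this sketch does not model.

```python
# Toy resources, assumed for illustration only.
LEXICON = {"good": 1.0, "great": 2.0, "helpful": 1.0, "bad": -1.0, "scam": -2.0}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"really": 1.5, "very": 1.5}


def score_tokens(tokens: list[str]) -> float:
    """Score a tokenised tweet; positive means positive sentiment."""
    lowered = [t.lower() for t in tokens]
    total = 0.0
    modifier = 1.0  # built up by negators/intensifiers directly before a word
    for tok in lowered:
        if tok in NEGATORS:
            modifier *= -1.0
        elif tok in INTENSIFIERS:
            modifier *= INTENSIFIERS[tok]
        else:
            if tok in LEXICON:
                total += LEXICON[tok] * modifier
            modifier = 1.0  # scope ends at the first non-modifier word
    if "#sarcasm" in lowered:
        total = -total  # crude rule: sarcasm reverses the literal polarity
    return total


print(score_tokens("this is really helpful".split()))
print(score_tokens("not helpful".split()))
print(score_tokens("really helpful #sarcasm".split()))
```

Because every adjustment is an explicit rule, a wrong score can be traced back to the rule that produced it, which is the "easy understanding of errors" point above.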
This all sounds like it would be hard to set up on my system!
GATE Cloud to the Rescue
● What?
  ● end-to-end text and web processing solutions from the GATE family running on cloud computing infrastructures.
● Why?
  ● Solve any sort of text processing problem: web, text or opinion mining; indexing and search (fulltext, boolean, conceptual, structural); information extraction; semantic annotation; sentiment analysis; ontology population; etc.
  ● Run large-scale jobs without investing in server hardware or other fixed costs.
  ● Exploit a 15-year R&D programme, the expertise of the GATE community and a defined and repeatable process.
Benefits of GATE Cloud
Text Analytics Consumer
Cloud        Large scale, no CAPEX, no system admin, no commitment
Open Source  No vendor lock-in
TA Services  Twitter, News, BioMed, Sentiment, etc.; low-level pre-processing support (POS tagging etc.)
APIs         Integrate
Application Types
● Low-level: stemmers, PoS taggers, phrase chunkers, morphological analysers
● Coverage: tools for 18 languages including BG and RU
● General Purpose IE: named entities, numbers, measurements, language ID
● Domain-specific IE: News, TwitIE, Biomed
● LOD-based semantic annotation: DBpedia, GeoNames, Freebase
● Sentiment analysis
● Summarisation
● Includes many 3rd party tools also
On-demand document processing workflow
It's just like online shopping
● Click through to the online shop, browse products and add them to your shopping basket.
● Create an account and then buy credit vouchers
● Put the vouchers in your account, and go to checkout.
● We'll email you the login or job creation details for your cloud servers.
● Monitor and control your cloud machines on your dashboard.
● Use our existing applications:
  ● Just upload your documents and sit down with a cup of tea
● Create your own pipeline:
  ● Upload your own customised application along with your documents, and sit down with a cup of tea
Summary
● SMA tools are crucial, but it's hard to find out which are good
● Solutions are readily available in GATE
● Easy to test different versions and configurations
● Open source and easily customisable
● Big data and installation problems are solved with GATE Cloud:
  ● PaaS for text analytics
  ● Low barrier to entry
  ● Just pay for what you use
  ● State-of-the-art pipelines for news and social media
  ● More pipelines constantly being added
Acknowledgements and more information
● GATE: http://gate.ac.uk
● GATE Cloud: http://gatecloud.net
● AnnoMarket: http://www.annomarket.eu
● Research partially supported by the European Union/EU under the Information and Communication Technologies (ICT) theme of the 7th Framework Programme for R&D (FP7) DecarboNet (610829) and AnnoMarket (296322)
● Original GATE Cloud development supported by JISC/EPSRC, reference number EP/I034092/1
This document does not represent the opinion of the European Community, and the European Community is not responsible for any use that might be made of its content