
SocialSensor: Sensing User Generated Input for Improved Media Discovery and Experience

Social Multimedia Crawling & Mining

EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams

Dr. Yiannis Kompatsiaris, Project Coordinator

Samos 2013 Summit on Digital Innovation for Government, Business and Society


Overview

• Motivation
• Objectives
• Architecture
• Use Cases and Requirements
• News
  – Social Multimedia Crawling & Mining
• Infotainment
  – EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams

• Conclusions


What is SocialSensor?

• 3-year FP7 European Integrated Project
  – http://www.socialsensor.eu

• Members: CERTH, ATC (Greece), Deutsche Welle, University Koblenz, Research Center for Artificial Intelligence (Germany), The City University London, Alcatel-Lucent Bell Labs, JCP Consult (France), University of Klagenfurt (Austria), IBM Israel, Yahoo Iberia + Robert Gordon University Aberdeen (UK)

• 1.5 years into the project: user requirements, use case scenarios, architecture and implementation, and the first R&D components and prototypes have been developed. Currently: evaluation and 2nd round


Motivation: Social Networks as Sensors

• Social networks are an extremely dynamic data source that reflects events and the evolution of community focus (users’ interests)

• Transform individually rare but collectively frequent media to meaningful topics, events, points of interest, emotional states and social connections

• Mine the data and their relations and exploit them in the right context

• Scalable mining and indexing approaches taking into account the content and social context of social networks


Relevant Applications

Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han. The wisdom of social multimedia: using Flickr for prediction and forecast. International Conference on Multimedia (MM '10). ACM.

Federal Emergency Management Agency plans to engage the public more in disaster response by sharing data and leveraging reports from mobile phones and social media


“…if you're more than 100 km away from the epicenter [of an earthquake] you can read about the quake on twitter before it hits you…”


Objective

SocialSensor quickly surfaces trusted and relevant material from social media – with context

[Figure labels: DySCO (content, social context, location, time, behaviour, usage); massive social media and unstructured web; social media mining; aggregation & indexing; News - Infotainment; personalised access; ad-hoc P2P networks]


The SocialSensor Vision

SocialSensor quickly surfaces trusted and relevant material from social media – with context.

• “quickly”: in real time
• “surfaces”: automatically discovers, clusters and searches
• “trusted”: automatic support in verification process
• “relevant”: to the users, personalized
• “material”: any material (text, image, audio, video = multimedia), aggregated with other sources (e.g. web)
• “social media”: across many relevant social media platforms
• “with context”: location, time, sentiment, influence, trust


Conceptual Architecture and Main components

[Architecture diagram components: data collection / crawling; storage; indexing; mining; search & recommendation; user modelling & presentation; semantic middleware; data sources: public data, in-project data]

• Real time dynamic topic and event clustering

• Trend, popularity and sentiment analysis

• Calculate trust/influence scores around people

• Personalized search, access & presentation based on social network interactions

• Semantic enrichment and discovery of services


DySCO concept

• Integrate social content mining, search and intelligent presentation in a personalized, context and network-aware way, through the new concept of Dynamic Social COntainers (DySCOs)
• Composite objects containing a number of items (e.g. articles, tweets, images, videos)
• Focused on a particular topic of interest (e.g. an event, a story)
• Contain all available information about the topic
• Metadata can be added dynamically
• Ability to search for DySCOs by matching “DySCO features” with “search features”
• Ability to check and recommend similar DySCOs
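The container idea above can be sketched in a few lines of Python. This is only an illustration, not the SocialSensor data model: the class layout, the field names, and the use of Jaccard overlap to match "DySCO features" against "search features" are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DySCO:
    """Hypothetical container for items about one topic (names are illustrative)."""
    topic: str
    items: list = field(default_factory=list)     # e.g. tweets, articles, image URLs
    metadata: dict = field(default_factory=dict)  # can be extended at any time
    features: set = field(default_factory=set)    # "DySCO features" used for matching

    def add_item(self, item, item_features=()):
        """Add one item and fold its features into the container's features."""
        self.items.append(item)
        self.features.update(item_features)


def jaccard(a, b):
    """Overlap of two feature sets, in [0, 1] (an assumed matching function)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(dyscos, search_features, min_score=0.0):
    """Rank DySCOs by overlap between their features and the search features."""
    scored = [(jaccard(d.features, search_features), d) for d in dyscos]
    scored = [(s, d) for s, d in scored if s > min_score]
    return [d for s, d in sorted(scored, key=lambda p: p[0], reverse=True)]


def similar(dyscos, target, top_k=3):
    """Recommend the DySCOs whose features are closest to those of `target`."""
    others = [d for d in dyscos if d is not target]
    return search(others, target.features)[:top_k]
```

Searching and recommending share one ranking function here, which keeps the sketch small; a real system would likely use richer features (time, location, social context) and an index instead of a linear scan.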


Use Cases: News


“It has changed the way we do news” (MSN)

“Social media is the key place for emerging stories – internationally, nationally, locally” (BBC)

“Social media is transforming the way we do journalism” (New York Times)

Source: picture alliance / dpa


Source: Getty Images

“It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC)

“Things that aren’t relevant crowd out the content you are looking for” (MSN)

“The filters aren’t configurable enough” (CNN)


Verification was simpler in the past...

Source: Frank Grätz


An example: BBC Verification Procedure: Arab Spring Coverage

• Referencing locations against maps and existing images, in particular geo-located ones.

• Working with our colleagues in BBC Arabic and BBC Monitoring to ascertain that accents and language are correct for the location.

• Searching for the original source of the upload/sequences as an indicator of date.

• Examining weather reports and shadows to confirm that the conditions shown fit with the claimed date and time.

• Maintaining lists of previously verified material to act as reference for colleagues covering the stories.

• Checking scenery, weaponry, vehicles and licence plates against those known for the given country.


News application
• Real-time search/browse of news items crawled from different social media
• Automatically discovered trending topics
• Web analytics
• Sentiment scores for topics


Alethiometer
• Measuring the degree of truth behind tweets
• Overall trust score for a tweet
• Various Contributor, Content, Context validity metrics


Social Multimedia Crawling & Mining
Case study: #OccupyGezi

E. Schinas, S. Papadopoulos, I. Tsampoulatidis, K. Iliakopoulou, Y. Kompatsiaris


Multimedia crawling & mining

• Monitor/query multiple sources for shared media content: Twitter, Facebook, Flickr, YouTube, etc.

• Multiple indexing schemes:
  – text-based (Solr)
  – visual content-based (SURF+VLAD for feature extraction, ADC for similarity-based indexing)

• Clustering
  – geo-spatial (BIRCH)
  – visual (SCAN)

• Web-based presentation of results


Crawling & mining system deployment

[Deployment diagram; recoverable components and hosts:]
• StreamManager (Twitter, Facebook, Flickr, YouTube, RSS, Instagram): 160.xx.xx.207
• MongoDBWrapper: 160.xx.xx.207
• TextIndexer (Solr): 160.xx.xx.207
• MediaFetcher, FeatureExtractor (HDFS): 160.xx.xx.58, 160.xx.xx.107
• Social Focused Crawler (HDFS): 160.xx.xx.187
• FeatureIndexer (HDFS): 160.xx.xx.207
• Data Mining (visual clustering, geo clustering, statistics): 160.xx.xx.191
• Web server: 160.xx.xx.116
• Other labels: Nutch, VLAD, IVFADC, API (1) to API (4)


#OccupyGezi

• Monitors:
  – Keywords: gezipark, taksimgezipark, Taksim, Taksim Gezi Park
  – Location: Istanbul

• Current statistics:


Geographical spread of event

• The event does not appear to be localized (as several official Turkish news sources claimed): it spreads all over Turkey and even to major cities abroad


Different granularities


Trending media by use of clustering


Visual Memes


Statistics


Use Cases: Infotainment


Capturing & mining large-scale events

• Large-scale events attended by thousands of people are captured through mobile devices in the form of status updates, photos, ratings, etc.

• Challenge:
  – Organize information around entities of interest
  – Extract meaningful insights, obtain informative summaries

• EventSense framework


Infotainment

• Thessaloniki International Film Festival
  – 80,000 viewers / 100,000 visitors in 10 days
  – 150 films, 350 screenings

• Fête de la Musique Berlin
  – 100,000 visitors every year
  – 5,000 musicians


ThessFest

• Thessaloniki International Film Festival

• Support twitter/comment usage within the app

• Ratings and comments per film

• Feedback aggregation
  – Votes
  – Tweets

• Real-time feedback to the organisation and visitors


ThessFest

• Gather “realistic” user requirements
• Early showcase and evaluation of SocialSensor technologies at real-world event scale
• Engage users and create an informed user base
• TDF14, 15 + TIFF53
  – 1400+ users
  – 40K+ user sessions
  – positive response to social media

• Next version
  – Updated features based on SocialSensor prototype


Fête de la Musique Berlin app
• FETEberlin in App Store and Google Play
• More than 100K visitors
• About 5K musicians
• More than 5K app downloads

App features
• Browse and filter detailed program
• Interactive maps and routing
• Social sharing
• Artists' and stages' details
• Social monitoring

Main benefits for attendees
• Visitors can browse through maps and don't get lost, as stages are numerous
• Event schedule is always available, also per stage
  – Very useful when the server was down and there was no access to the online schedule


Fête de la Musique Berlin app


FETEberlin Facts & User Feedback

              Unique Users   Sessions   Frequency of Use
App Store     2904           13751      2.5 sessions per day
Google Play   2210           12097      2.9 sessions per day
Total         5114           25848      Avg: 2.7 sessions per day

Future Plans
• Enhanced event and visitor engagement
• Send last-minute updates
• Create buzz around the event and make users the event ambassadors
• Gain insightful knowledge on the impact that the event made via social media analysis
  – Organize better events


EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media Streams
Case study: Thessaloniki International Film Festival

E. Schinas, S. Papadopoulos, S. Diplaris, Y. Kompatsiaris, Y. Mass, J. Herzig, L. Boudakidis


Entity Detection

• Entities are defined as lists of properties:
  – a film consists of a title, description, names of director(s)/actors
• Matching status updates (tweets) to entities relies on representing both as vectors, using cosine similarity, and thresholding

Notation: m: message (tweet), f: feature (term), M: set of all event messages, boost(f): boosting factor when f is a named entity
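The weighting formula from the slide did not survive extraction, so the following is only a sketch of the matching step under stated assumptions: each term f gets a tf-idf style weight computed over the message set M and multiplied by boost(f) when f is a named entity. The exact weighting, the helper names, the default boost of 2.0 and the threshold of 0.2 are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a simplification)."""
    return text.lower().split()


def weights(tokens, doc_freq, n_messages, named_entities, boost=2.0):
    """Sparse vector with weight tf * idf * boost(f); the scheme is an assumption."""
    vec = {}
    for f, count in Counter(tokens).items():
        idf = math.log((1 + n_messages) / (1 + doc_freq.get(f, 0))) + 1.0
        vec[f] = count * idf * (boost if f in named_entities else 1.0)
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def match_entities(message, entities, messages, named_entities, threshold=0.2):
    """Return (entity_name, score) pairs whose similarity reaches the threshold.

    entities maps a name to its list of property strings
    (title, description, director/actor names).
    """
    tokenized = [tokenize(m) for m in messages]
    doc_freq = Counter(f for toks in tokenized for f in set(toks))
    n = len(messages)
    m_vec = weights(tokenize(message), doc_freq, n, named_entities)
    hits = []
    for name, properties in entities.items():
        e_vec = weights(tokenize(" ".join(properties)), doc_freq, n, named_entities)
        score = cosine(m_vec, e_vec)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda p: p[1], reverse=True)
```

Boosting named entities makes a shared director or title count for more than a shared common word, which matters because tweets are short and film descriptions are long.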


Topic detection

• For each new message, find the Nearest Neighbour (NN) using Locality Sensitive Hashing (LSH)

• If similarity exceeds an empirically selected threshold, assign to the topic of NN, otherwise create new topic

• Clusters of a single message are discarded as outliers
• Cluster-merging is conducted as a post-processing step to compensate for topic over-segmentation
• Similar approach to (Petrovic et al., 2010)

S. Petrovic, M. Osborne, V. Lavrenko. Streaming first story detection with application to Twitter. In Human Language Technologies, HLT ’10, pages 181–189, Stroudsburg, PA, USA, 2010. ACL.
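The single-pass procedure above can be sketched as follows. The slide only says "LSH"; the random-hyperplane variant used here, the raw term-count vectors, and all parameter values (threshold, number of bits and tables) are assumptions. The cluster-merging post-processing step is not shown.

```python
import math
import random
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity of two term-count vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class StreamingTopicDetector:
    """Assigns each incoming message to the topic of its approximate nearest neighbour."""

    def __init__(self, threshold=0.5, n_bits=8, n_tables=4, seed=0):
        self.threshold = threshold
        self.n_bits, self.n_tables = n_bits, n_tables
        self.rng = random.Random(seed)
        self.planes = {}  # term -> one random component per hyperplane
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors, self.topic_of = [], []
        self.next_topic = 0

    def _signatures(self, vec):
        """One bit signature per hash table, from the signs of random projections."""
        total = self.n_bits * self.n_tables
        proj = [0.0] * total
        for term, w in vec.items():
            if term not in self.planes:
                self.planes[term] = [self.rng.gauss(0, 1) for _ in range(total)]
            for i, c in enumerate(self.planes[term]):
                proj[i] += w * c
        bits = [p >= 0 for p in proj]
        return [tuple(bits[t * self.n_bits:(t + 1) * self.n_bits])
                for t in range(self.n_tables)]

    def add(self, text):
        """Process one message and return the id of the topic it was assigned to."""
        vec = Counter(text.lower().split())
        sigs = self._signatures(vec)
        candidates = set()
        for table, sig in zip(self.tables, sigs):
            candidates.update(table.get(sig, ()))
        best_id, best_sim = None, 0.0
        for idx in candidates:  # exact similarity only for bucket collisions
            sim = cosine(vec, self.vectors[idx])
            if sim > best_sim:
                best_id, best_sim = idx, sim
        if best_id is not None and best_sim > self.threshold:
            topic = self.topic_of[best_id]
        else:
            topic = self.next_topic
            self.next_topic += 1
        idx = len(self.vectors)
        self.vectors.append(vec)
        self.topic_of.append(topic)
        for table, sig in zip(self.tables, sigs):
            table[sig].append(idx)
        return topic

    def topics(self, min_size=2):
        """Topics as lists of message indices; single-message clusters are dropped."""
        groups = defaultdict(list)
        for idx, topic in enumerate(self.topic_of):
            groups[topic].append(idx)
        return {t: ids for t, ids in groups.items() if len(ids) >= min_size}
```

Because LSH is approximate, a near neighbour can be missed and a new topic opened too early; this is the over-segmentation that the merging step on the slide is meant to compensate for.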


Sentiment detection (1/2)

• Build positive/negative sentiment classifiers using emoticons

• Build neutral classifier using positive/negative classifiers

• Feature extraction:
  – Remove stop words, emoticons, terms occurring only once, trim repeated letters
  – Negation terms (“not”, “isn’t”) are attached to subsequent terms to form new unigrams (e.g. “nothappy”)
  – Treat user mentions, URLs, punctuation, repeated letters and all-caps words as additional features
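A minimal sketch of the feature extraction steps listed above, for a single message. The stop-word and negation lists, the emoticon pattern and the `HAS_*` feature names are illustrative assumptions; removing terms that occur only once needs corpus-level counts and is left out here.

```python
import re

# Illustrative word lists; a real system would use fuller, language-specific ones.
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "i", "it", "to", "of", "and"}
NEGATIONS = {"not", "no", "never", "isn't", "don't", "wasn't", "can't"}
EMOTICON = re.compile(r"^[:;=8][\-o']?[\)\(\]\[dDpP/\\|]+$")
REPEAT = re.compile(r"(.)\1{2,}")  # a character repeated three or more times


def extract_features(text):
    """Return (unigrams, special_features) for one message."""
    unigrams, special = [], set()
    negate = False
    for raw in text.split():
        if raw.startswith("@") and len(raw) > 1:
            special.add("HAS_MENTION")
            continue
        if raw.lower().startswith(("http://", "https://")):
            special.add("HAS_URL")
            continue
        if EMOTICON.match(raw):
            continue  # emoticons serve as training labels, not as features
        if "!" in raw or "?" in raw:
            special.add("HAS_PUNCT")
        word = raw.strip(".,!?;:\"()")
        if not word:
            continue
        if len(word) > 1 and word.isupper():
            special.add("HAS_ALLCAPS")
        word = word.lower()
        if REPEAT.search(word):
            special.add("HAS_REPEAT")
            word = REPEAT.sub(r"\1\1", word)  # "sooooo" becomes "soo"
        if word in NEGATIONS:
            negate = True  # attach to the next kept term
            continue
        if word in STOP_WORDS:
            continue
        unigrams.append("not" + word if negate else word)
        negate = False
    return unigrams, special
```

For example, `extract_features("This film is not happy :(")` yields the unigrams `["film", "nothappy"]` and no special features.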


Sentiment detection (2/2)

• Naïve Bayes classifier per sentiment
• P(f|c) estimated using maximum likelihood (ML) with Laplace correction
• Probability estimate for special features (e.g. user mentions) using Bernoulli model and Laplace correction
• Classification using maximum log-likelihood
• Neutral messages:
  – Mutual Information (MI)
  – MI of features
  – Sentiment intensity of message
  – Use thresholding for decision
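The classifier described above can be sketched as below: multinomial estimates with Laplace correction for terms, Bernoulli estimates with Laplace correction for special features, and a decision by maximum log-likelihood. The neutral rule is a stand-in: the slide bases it on mutual information and sentiment intensity, whereas this sketch simply thresholds the margin between the two best classes. The class name and parameters are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSentiment:
    """Naive Bayes over terms plus Bernoulli special features (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha        # Laplace correction
        self.term_counts = {}     # label -> Counter of term occurrences
        self.special_docs = {}    # label -> Counter of messages having each special feature
        self.doc_counts = Counter()
        self.vocab = set()
        self.specials = set()

    def fit(self, examples):
        """examples: iterable of (terms, special_features, label)."""
        for terms, special, label in examples:
            self.doc_counts[label] += 1
            self.term_counts.setdefault(label, Counter()).update(terms)
            self.special_docs.setdefault(label, Counter()).update(set(special))
            self.vocab.update(terms)
            self.specials.update(special)
        return self

    def log_likelihood(self, terms, special, label):
        """log P(message | label), without a class prior."""
        counts = self.term_counts[label]
        total = sum(counts.values())
        v = len(self.vocab)
        score = 0.0
        for t in terms:
            if t in self.vocab:  # terms never seen in training are ignored
                score += math.log((counts[t] + self.alpha) / (total + self.alpha * v))
        n_docs = self.doc_counts[label]
        present = set(special)
        for s in self.specials:
            p = (self.special_docs[label][s] + self.alpha) / (n_docs + 2 * self.alpha)
            score += math.log(p if s in present else 1.0 - p)
        return score

    def classify(self, terms, special=(), neutral_margin=0.0):
        """Best label, or "neut" when the top two classes are within neutral_margin."""
        scores = {c: self.log_likelihood(terms, special, c) for c in self.doc_counts}
        ranked = sorted(scores, key=scores.get, reverse=True)
        if len(ranked) > 1 and scores[ranked[0]] - scores[ranked[1]] < neutral_margin:
            return "neut"
        return ranked[0]
```

In the setting of the slides, the training examples would be tweets labelled by the emoticons they contain, and `neutral_margin` would be tuned on a manually annotated set.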


Evaluation

• Case study: 53rd Thessaloniki International Film Festival (TIFF53), Nov 2-11, 2012

• 168 films included in the program (titles, descriptions in Greek and English)

• 3974 tweets using #tiff53

• Manual annotation regarding:
  – film
  – sentiment (pos/neg/neut)

• Additional data using ThessFest mobile app:
  – #bookmarks per film (number of times a user added the film to their schedule)
  – #ratings + avg. rating per film


Tweet-film matching

• film = <title, description, directors, actors>
• Multiple entity representations using Greek/English/both, uni-/bi-grams
• Similarity threshold sensitivity analysis

[Figure: pooling multiple representations, threshold (0.1, 0.3)]


Topic analysis

• Top-10 topics
• Manual inspection of clusters:
  – 53.8% of topic titles considered informative
  – 98.5% of clusters were found to be “clean”

• Topics in time


Sentiment analysis

• Training (using emoticons and Twitter API)
  – 800K positive & negative tweets for English
  – 12K positive & negative tweets for Greek

• Tuning (for threshold)
  – Manually annotated dataset from Thessaloniki Documentary Festival (similar event)
  – 325/73/553 (pos/neg/neut) in English and 781/216/781 in Greek

• Testing
  – 324/33/724 (pos/neg/neut) in English and 901/315/1667 in Greek
  – Best accuracy (English) ~ 0.75
  – Performance in Greek much poorer compared to English: need for richer training corpus


Aggregation & summarization (1/2)

Table columns:
  – #T: number of tweets
  – Pol: polarity of film tweets
  – Subj: subjectivity of film tweets
  – R: average rating
  – #R: number of ratings
  – #F: number of times the film was bookmarked

• Films with positive polarity are rated higher.
• Films that are tweeted a lot are also more likely to be rated.
• Films that are tweeted a lot are also more likely to be added to the users’ bookmarks.


Aggregation & summarization (2/2)

Most active & influential Twitter accounts (+sentiment per user)

Most shared photos (+number of retweets)


Conclusions

• Great interest in both use cases
• In news, social media have transformed both news generation and consumption
• Social media data mining can provide interesting results in many applications
• Not all data always available (e.g. user queries, Facebook)
  – Infrastructure, policy issues
• Technical challenges
  – Fusion (multi-modality, context), real-time, noise, big data, aggregation (web, Linked Open Data)
• Application challenges
  – User engagement, visualization, becoming part of existing workflows, privacy, copyright, commercialization


Thank you!