Understanding RSM: Relief Social Media -William Murnane -Anand Karandikar 15 September 2009


Page 1: Understanding RSM: Relief Social Media

Understanding RSM: Relief Social Media

- William Murnane
- Anand Karandikar

15 September 2009

Page 2: Understanding RSM: Relief Social Media

Objective

Build better sensors into emerging social media environments. These environments are increasingly important in Humanitarian and Disaster Relief (HADR) and Security, Stability, Transition and Reconstruction (SSTR) scenarios, where they provide real-time situational awareness. Deliver an analytic toolkit that can be integrated into the Human, Social, Cultural and Behavioral (HSCB) computational infrastructure.

Page 3: Understanding RSM: Relief Social Media

Project Overview

• Joint venture by Lockheed Martin Advanced Technology Laboratories (LM ATL) and the University of Maryland, Baltimore County (UMBC)

• Team members

– Prof. Finin, Prof. Joshi – Principal Faculty, UMBC, CS Dept

– Dr. Brian Dennis – Staff Computer Scientist, LM ATL

– William Murnane, Anand Karandikar – Graduate students, UMBC, CS Dept

Page 4: Understanding RSM: Relief Social Media

Project Overview

What It’s Like Today

• HADR/SSTR response has focused on highly centralized, tightly coordinated organization.

• Responders

– Domestic: FEMA, DHS, National Guard, State and Local, NGOs

– International: Army, Navy, USAID, NGOs

• Centralization slows response, throttles critical information, limits situational awareness

Adapted from Dr. Brian Dennis's slides

Page 5: Understanding RSM: Relief Social Media

Project Overview

What’s Changing

• Response at the edge

• Affected populace is using Internet/Web for communication

– Assuming network availability

• Social media tools are being used for communication & coordination

• Example social media platforms:

– Twitter, Flickr, YouTube, open blogs

– Social visibility + coordination + content

Adapted from Dr. Brian Dennis's slides

Page 6: Understanding RSM: Relief Social Media

Technical Approach

• Harvesting of Data

i. Focus on social media like Twitter, Flickr

ii. Capture data that has relief contexts

• Computational models

i. Generative model of social connections that can help build forecasting tools

• Building Analytics Toolkits

i. Capabilities to analyze and mine sentiment

ii. Automated generation of appropriate confidence levels for information extracted

Page 7: Understanding RSM: Relief Social Media

Twitter

• Lots and lots of data:

– Lots and lots of stuff nobody cares about: "omg, when I get home I am so going to blog about your new haircut." --Nick Taylor

– ... but maybe some stuff someone might care about. People talk about getting sick, wildfires, floods, etc., so maybe we can track that.

Page 8: Understanding RSM: Relief Social Media

Dataset #1: Twitter

• Nicely segmented into tables: users, locations, statuses.

• Referential integrity needs work:

    select count(*) from (
        select follower_id from user_relationships
        except
        select id from users
    ) as missing_uids;

     Count
    -------
     24201

• Fairly big: roughly 1.5M users, 150M statuses, 1M locations. 30GB on disk.
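The referential-integrity check above can be tried on a toy database. The following is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL; the table layout is an assumption (only `users.id` and `user_relationships.follower_id` appear on the slide, and the `followed_id` column and sample rows are made up for illustration). Because `except` returns distinct values, the count is the number of distinct follower IDs with no matching user row.

```python
import sqlite3


def count_missing_followers(conn):
    """Count distinct follower_ids that have no matching row in users."""
    row = conn.execute(
        """
        SELECT count(*) FROM (
            SELECT follower_id FROM user_relationships
            EXCEPT
            SELECT id FROM users
        ) AS missing_uids
        """
    ).fetchone()
    return row[0]


def build_demo_db():
    """Build a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE user_relationships (follower_id INTEGER, followed_id INTEGER)"
    )
    conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])
    # Followers 99 and 42 do not exist in users; 99 appears twice.
    conn.executemany(
        "INSERT INTO user_relationships VALUES (?, ?)",
        [(1, 2), (2, 3), (99, 1), (99, 2), (42, 3)],
    )
    return conn
```

Calling `count_missing_followers(build_demo_db())` counts the two dangling IDs (99 and 42) once each, even though 99 appears in two relationship rows.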

Page 9: Understanding RSM: Relief Social Media

Current Progress

• Dataset loaded into PostgreSQL from MySQL

• Fixed corruption problems

• Gave full-text indexing on tweets a try in Postgres

– Too slow: 72 hours for CREATE INDEX and no progress

– May try again on new hardware

• Lucene-based app to build and search indices

Page 10: Understanding RSM: Relief Social Media

Current Progress

Status and speed of query

• Pretty good performance:

– ~35k rows/second while creating index on current hardware, quick queries

• Easy to write: 459 LOC counting the GUI, half that without it.

Page 11: Understanding RSM: Relief Social Media

Tweet index design

• Index only statuses: that's all we need to search quickly so far.

• Document ID: maps to SQL primary key on statuses

• Text: Analyze for words, do TF-IDF to order results.

• UID: Can filter by user at the query level rather than have to go ask the database. We don't know if this will be useful, but it doesn't hurt.
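The index design above can be illustrated with a toy in-memory version. This is a simplified sketch in Python, not the Lucene-based app and not Lucene's actual scoring formula: the `TweetIndex` class, its method names, the tokenizer, and the particular TF-IDF weighting are all assumptions made for illustration. It keeps the three fields from the slide: a document ID that would map to the SQL primary key, analyzed text, and a UID that can be filtered at query time without going back to the database.

```python
import math
import re
from collections import Counter, defaultdict


class TweetIndex:
    """Toy inverted index over statuses (hypothetical, not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_user = {}                 # doc_id -> uid

    @staticmethod
    def analyze(text):
        """Lowercase and split into word tokens."""
        return re.findall(r"[a-z0-9']+", text.lower())

    def add(self, doc_id, uid, text):
        """Index one status; doc_id stands in for the SQL primary key."""
        self.doc_user[doc_id] = uid
        for term, tf in Counter(self.analyze(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query, uid=None):
        """Return doc_ids ordered by a simple TF-IDF score, best first."""
        n_docs = len(self.doc_user)
        scores = defaultdict(float)
        for term in self.analyze(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + n_docs / len(docs))
            for doc_id, tf in docs.items():
                # Filter by user inside the index, without a database lookup.
                if uid is not None and self.doc_user[doc_id] != uid:
                    continue
                scores[doc_id] += tf * idf
        return sorted(scores, key=lambda d: (-scores[d], d))
```

A search returns only document IDs, so the full rows would still be fetched from the statuses table by primary key.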

Page 12: Understanding RSM: Relief Social Media

Raw data for events of interest

Example chosen here is ‘California Wildfires’

1. Twitter tweets for California wildfires

2. Technorati search for California wildfire videos

3. Yahoo! Pipes mashup for California wildfires using Flickr data

Page 13: Understanding RSM: Relief Social Media

Twitter API methods

• Search - Returns tweets that match a specified query.

• statuses/public_timeline - Returns the 20 most recent statuses from users

• statuses/show - Returns a single status, specified by the id parameter

• Trends - Returns the top ten topics that are currently trending on Twitter

• GeoLocation API from Twitter by October 2009

Page 14: Understanding RSM: Relief Social Media

Facebook API methods

• Users.getStandardInfo – Returns the user's current location, timezone, etc.

• Stream.get – If a user ID is specified, it can return the last 50 posts from that user's profile stream.

• Status.get - Returns the user's current and most recent statuses.

Page 15: Understanding RSM: Relief Social Media

YouTube Data API

• To search for videos, submit an HTTP GET request to the following URL: http://gdata.youtube.com/feeds/api/videos

• Example: California Fires

• Other parameters like location, location-radius can be added while building the query.
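A search URL for the feed above could be assembled as in the sketch below. It only builds the URL string and sends no request. The feed URL and the `location` and `location-radius` parameter names come from the slide; the `q` parameter name, the value formats used in the usage note, and the helper name `build_video_search_url` are assumptions for illustration.

```python
from urllib.parse import urlencode

# Feed URL as given on the slide.
YOUTUBE_VIDEO_FEED = "http://gdata.youtube.com/feeds/api/videos"


def build_video_search_url(query, location=None, location_radius=None):
    """Build a video search URL (hypothetical helper; sends no request)."""
    params = [("q", query)]  # assumed name of the search-term parameter
    if location is not None:
        params.append(("location", location))
    if location_radius is not None:
        params.append(("location-radius", location_radius))
    return YOUTUBE_VIDEO_FEED + "?" + urlencode(params)
```

For the slide's example, `build_video_search_url("California Fires")` encodes the space in the search term, and optional values such as `location="37.0,-120.0"` and `location_radius="100km"` (assumed formats) are appended as further query parameters.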

Page 16: Understanding RSM: Relief Social Media

GeoCoding API

• GeoCoding is a process of converting addresses like ‘1000 Hilltop Circle Baltimore MD’ to geographical co-ordinates which can be used to mark that address on the map.

• Google Map API: via GClientGeocoder object. Use GClientGeocoder.getLatLng() to convert a string address into latitudes and longitudes.

• Yahoo! Maps web service: Example: 701 First Ave Sunnyvale CA

Page 17: Understanding RSM: Relief Social Media

Similar Initiatives

AirTwitter (Started in August 2009)

• Designed to harvest user-generated content like tweets, delicious bookmarks, flickr pictures and youtube videos that are relevant to Air Quality

• Uses Yahoo! Pipes for aggregated feed generation.

• When events are identified, the location will be harvested from contextual information in the feed such as a place name or, as development evolves, the IP address of the tweet.

• To further automate event identification, Air Twitter feeds will be archived in order to conduct temporal trend analysis that can be used to separate the background noise from AQ events in the social media stream.
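One simple way to picture separating background noise from events in an archived feed is to compare each day's message count with a trailing baseline. The sketch below is an illustration only and is not AirTwitter's method: the function name `flag_spikes`, the 7-day window, the threshold of 3 standard deviations, and the floor on the spread are all assumptions.

```python
from statistics import mean, stdev


def flag_spikes(counts, window=7, threshold=3.0):
    """Return indices whose count is far above the trailing-window baseline.

    counts: per-period message counts (e.g. matching tweets per day).
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate spread")
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        # Floor the spread so a perfectly flat history does not flag
        # every small fluctuation as an event.
        spread = max(stdev(history), 1.0)
        if counts[i] > baseline + threshold * spread:
            spikes.append(i)
    return spikes
```

A period is flagged only when it stands well above the recent background level, so steady chatter about a topic is not reported as an event.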

Page 18: Understanding RSM: Relief Social Media

Similar Initiatives

Crisis Informatics

• ConnectivIT Research Group at University of Colorado, Boulder

• Investigates the evolving role of information and communication technologies (ICT) in emergency and disaster situations.

• Particular focus on information dissemination and the implications of ICT-supported public participation on informal and formal crisis response

Page 19: Understanding RSM: Relief Social Media

To Do

• Index locations, too? Lucene or SQL?

• Better Analyzer: discard non-English (tricky!) and do stemming (simple!)

• Test on new hardware: SSD versus disk, for what parts?

• Higher-level abstractions: what Tweets are similar? Build an ontology that things fit into, or search for particular things?

• Run human classifier for a while, then train machine classifier off that data.

• Geo-location in Twitter space
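The "human classifier, then machine classifier" item could look roughly like the sketch below: tweets labeled by hand become training data for a simple text classifier. The slides do not name a learning method, so the multinomial naive Bayes used here is a swapped-in choice for illustration, and the class name, tokenizer, and the "relief" / "other" labels in the usage note are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.label_counts = Counter()            # label -> number of examples
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def train(self, labeled):
        """labeled: iterable of (text, label) pairs produced by human raters."""
        for text, label in labeled:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        """Return the label with the highest log-probability, or None if untrained."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on a handful of hand-labeled tweets tagged "relief" or "other", `predict` would then label new tweets automatically; a real run would need far more labeled data than a toy example.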

Page 20: Understanding RSM: Relief Social Media

Thanks