Ubiquitous Human Computation KAIST KSE Uichin Lee May 11, 2011


Ubiquitous Human Computation

KAIST KSE Uichin Lee

May 11, 2011


Outline

• Review recent papers:
  – Crowd-Sourced Sensing and Collaboration Using Twitter, WoWMOM 2010
  – Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event Detection, WWW 2010
  – Location-based Crowdsourcing: Extending Crowdsourcing to the Real World, NordiCHI 2010
  – Social Sensors and Pervasive Services: Approaches and Perspectives, PerCol 2011
• Understand the potential of ubiquitous human computation (+social networking)


Crowd-Sourced Sensing and Collaboration Using Twitter

Murat Demirbas, Murat Ali Bayir, Cuneyt Gurcan Akcora, Yavuz Selim Yilmaz

SUNY Buffalo

WoWMOM 2010

Slides are based on http://www.cse.buffalo.edu/~demirbas/presentations/twitter.pdf


Cellphones!

• 3-4B cellphone users worldwide

• 1.13 billion phones sold in 2009 (36 per sec) vs 0.3 billion PCs

• 174M were smartphones
  – 15% (up from 12.8% in 2008)
  – Expected to exceed the number of feature phones


Status quo in cellphones

• Each device connects to the Internet
  – to download/upload data and
  – to accomplish a task that does not require collaboration and coordination


What is missing?

• An infrastructure to assist mobile users to perform collaboration and coordination ubiquitously

• Any user should be able to search & aggregate the data published by other users in a region


Our goal

• To provide a crowdsourced sensing and collaboration service using Twitter

• To enable aggregation and sharing of data; dynamically assign sensing tasks to other cellphone users


Why Twitter?

• Open publish-subscribe system: 105 million users (over 30 million in the US), 55 million tweets and 600 million search queries every day

• Each tweet has a 140-character limit
• Twitter provides an open source search API and a REST API (that enables developers to access tweets, timelines, and user data)

• Different actors may integrate published data differently and can offer new services in unanticipated ways


Crowdsourcing architecture


Sensweet

• Employs the smartphone’s ability to work in the background without distracting the mobile user
  – Senses the surrounding environment and sends the resulting data to Twitter
• To search and process sensor values on Twitter, we need to agree on a standard for publishing these sensor readings
  – Bio-code: uses Twitter bio sections and allows users to search for the sensors they are looking for on the fly
  – TweetML: uses pre-defined hashtags to improve searchability


Askweet

• Accepts a question from Twitter
  – tries to answer the question using the data on Twitter, potentially data published by Sensweets
  – if that is not possible, Askweet finds experts on Twitter and forwards the question to these experts (not clear how this was done in the paper)

• Parallelizable, easy to “cloudify” for scalable service provisioning


Applications

• Crowdsourced weather
• Noise map application
• Location-based queries (with Foursquare)


1. Crowdsourced weather

• For current weather, everybody on Twitter can be an expert
• Question to Askweet: “?Weather Loc:Buffalo,NY”
• Forwarded question: “How is the weather there now? reply 0 for sunny, 1 for cloudy, 2 for rainy, and 3 for snowy”
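
The question and reply format on this slide can be sketched in Python as follows. This is only an illustration, not the Askweet implementation: the function names `parse_question` and `tally_weather` and the majority-vote aggregation are assumptions; only the `?Weather Loc:...` syntax and the reply codes 0 to 3 come from the slide.

```python
import re
from collections import Counter

# Reply codes as stated on the slide.
WEATHER_CODES = {0: "sunny", 1: "cloudy", 2: "rainy", 3: "snowy"}


def parse_question(tweet):
    """Split a tweet like '?Weather Loc:Buffalo,NY' into (topic, location).

    Returns None if the tweet does not follow that pattern.
    """
    match = re.match(r"^\?(\w+)\s+Loc:(.+)$", tweet.strip())
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def tally_weather(replies):
    """Aggregate replies by majority vote (an assumed policy).

    Each reply counts once, using the first standalone digit 0-3 in it;
    replies without such a digit are ignored.
    """
    votes = Counter()
    for reply in replies:
        digit = re.search(r"\b[0-3]\b", reply)
        if digit:
            votes[int(digit.group())] += 1
    if not votes:
        return None
    code, _ = votes.most_common(1)[0]
    return WEATHER_CODES[code]


print(parse_question("?Weather Loc:Buffalo,NY"))
print(tally_weather(["2", "@askweet 2", "1", "no idea"]))
```

A real service would also need to decide how many replies to wait for and how to break ties; neither is specified on the slide.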


http://ubicomp.cse.buffalo.edu/rainradar


Experimental results for NYC in different time slices


2. Noise map application

• Implemented a Sensweet client for the Nokia N97 smartphone series
• The Sensweet client detects the noise level of the surrounding environment and forwards this data to Twitter in the TweetML format
• Each sound sample is classified into a Low, Medium, or High state
  – Each level is modeled using a normal distribution
  – The input signal is compared with the 3 distributions (Low, Medium, and High)
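
The classification step can be sketched as a maximum-likelihood choice among three normal distributions. The slide gives no parameters, so the means and standard deviations below (in dB) are hypothetical placeholders, and `classify_noise` is an illustrative name rather than part of the Sensweet client.

```python
import math

# Hypothetical (mean, standard deviation) per level, in dB.
# The slide does not give numbers; these are placeholders only.
NOISE_MODELS = {
    "Low": (40.0, 5.0),
    "Medium": (60.0, 5.0),
    "High": (80.0, 5.0),
}


def normal_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))


def classify_noise(level_db, models=NOISE_MODELS):
    """Return the level whose normal model gives the sample the highest density."""
    return max(models, key=lambda name: normal_pdf(level_db, *models[name]))


print(classify_noise(42.0))
print(classify_noise(63.0))
```

In practice the three distributions would be fitted from labeled recordings on the target device.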


Noise map application


Noise levels for a user


3. Location based queries

• Factual vs. non-factual queries
  – Factual: “hotels in Miami”
  – Non-factual: “Anyone knows any cheap, good hotel, price ranges between 100 to 200 dollars in Miami?”
• Traditional search engines perform poorly!
• A significant fraction of location-based queries (in Twitter) is non-factual
  – e.g., 63% of the queries were non-factual, while only 37% of them were factual (manual classification of 269 queries)

Crowdsourcing Location-based Queries, Bulut et al., Pervasive Collaboration and Social Networking, 2011
http://www.percom.org/proceedings/workshops/papers/p490-bulut.pdf


Location based queries

• Aardvark uses the asker's social network to find suitable answerers for the query, forwards the query to them, and returns any answer to the asker
• How about Twitter + Foursquare?
  – Use Foursquare to determine users that frequent the queried locale and that have an interest in the queried category (e.g., food, nightlife)
  – Find the right set of people to ask!
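
One way to picture the "right set of people" idea is to rank users by how often they checked in at the queried locale in the queried category. The sketch below assumes a hypothetical list of `(user, locale, category)` check-in tuples; it does not use the Foursquare API, and the ranking rule is an assumption rather than the method from the paper.

```python
from collections import Counter


def pick_answerers(checkins, locale, category, top_k=3):
    """Rank users by number of matching check-ins (assumed heuristic).

    checkins: iterable of (user, locale, category) tuples, a hypothetical schema.
    Returns up to top_k user names, most frequent first.
    """
    counts = Counter(
        user for user, loc, cat in checkins if loc == locale and cat == category
    )
    return [user for user, _ in counts.most_common(top_k)]


sample_checkins = [
    ("alice", "Miami", "food"),
    ("alice", "Miami", "food"),
    ("bob", "Miami", "food"),
    ("carol", "Miami", "nightlife"),
    ("dave", "Buffalo", "food"),
]
print(pick_answerers(sample_checkins, "Miami", "food"))
```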


[Figure: question-answering workflow with numbered steps. Roles: Asker, Moderator, Users. Stages: Questions to be asked, Questions detected, Valid questions, Answer detected, Valid answers, Answer to be forwarded.]

Annotations on the workflow:
• Question detection: tweet starting with “?”; keyword checking (anyone, suggestion, where)
• Moderator: label the category and quality of questions
• Forward validated questions to appropriate people (using Twitter bio or Foursquare info)
• Constantly poll the Twitter account to check for answers
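
The question-detection step of this workflow (a tweet starting with “?”, or keyword checking for words such as anyone, suggestion, where) can be sketched as below. Treating the two checks as alternatives is an assumption, and `looks_like_question` is an illustrative name, not a function from the system.

```python
import re

# Keywords listed on the slide.
QUESTION_KEYWORDS = ("anyone", "suggestion", "where")


def looks_like_question(tweet):
    """Return True if the tweet starts with '?' or contains a listed keyword."""
    text = tweet.strip().lower()
    if text.startswith("?"):
        return True
    words = re.findall(r"[a-z']+", text)
    return any(keyword in words for keyword in QUESTION_KEYWORDS)


print(looks_like_question("?Weather Loc:Buffalo,NY"))
print(looks_like_question("Having lunch now"))
```

Detected questions would still go to a moderator for labeling, as the workflow describes.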


Experiment Setup

• Question dataset consists of 269 questions that the system collected over Twitter and validated as acceptable by the moderators.

• Questions were manually categorized as factual or non-factual: 63% non-factual, 37% factual

• Some examples of questions for each type.


Foursquare Reply Rate vs. Random User Reply Rate


Response Time

• 13-minute median response time, which is comparable to Aardvark

• 50% of the answers were received within the first 20 minutes.


Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event Detection

Takeshi Sakaki (@tksakaki), Makoto Okazaki (@okazaki117), Yutaka Matsuo (@ymatsuo)

Tokyo University

WWW 2010 Conference


What’s happening?

• Twitter
  – is one of the most popular microblogging services
  – has received much attention recently
• Microblogging
  – is a form of blogging that allows users to send brief text updates
  – is a form of micromedia that allows users to send photographs or audio clips
• In this research, we focus on an important characteristic: its real-time nature


Real-time Nature of Microblogging

– Twitter users write tweets several times in a single day.
– There is a large number of tweets, which results in many reports related to events.
– We can know how other users are doing in real time.
– We can know what happens around other users in real time.

Social events: parties, baseball games, presidential campaigns
Disastrous events: storms, fires, traffic jams, riots, heavy rainfalls, earthquakes


Our Goals

• Propose an algorithm to detect a target event
  – do semantic analysis on tweets
    • to obtain tweets on the target event precisely
  – regard each Twitter user as a sensor
    • to detect the target event
    • to estimate the location of the target
• Produce a probabilistic spatio-temporal model for
  – event detection
  – location estimation
• Propose an earthquake reporting system using Japanese tweets


Twitter and Earthquakes in Japan

A map of earthquake occurrences worldwide

A map of Twitter users worldwide

The intersection consists of regions with many earthquakes and many Twitter users.


Twitter and Earthquakes in Japan

Other regions: Indonesia, Turkey, Iran, Italy, and Pacific coastal US cities


Event detection algorithms

• Do semantic analysis on tweets
  – to obtain tweets on the target event precisely
• Regard each Twitter user as a sensor
  – to detect the target event
  – to estimate the location of the target


Semantic Analysis on Tweet

• Search tweets including keywords related to a target event
  – Example: in the case of earthquakes
    • “shaking”, “earthquake”
• Classify tweets into a positive class or a negative class
  – Example:
    • “Earthquake right now!!” --- positive
    • “Someone is shaking hands with my boss” --- negative
  – Create a classifier


Semantic Analysis on Tweet

• Create a classifier for tweets
  – use a Support Vector Machine (SVM)
• Features (example: “I am in Japan, earthquake right now!”)
  – A: Statistical features (7 words, the 5th word): the number of words in a tweet message and the position of the query word within the tweet
  – B: Keyword features (I, am, in, Japan, earthquake, right, now): the words in a tweet
  – C: Word context features (Japan, right): the words before and after the query word
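
The three feature groups can be reproduced for the slide's example tweet with a short sketch. The tokenizer (a simple regular expression) and the function name `extract_features` are assumptions; the paper's actual preprocessing, which targets Japanese tweets, is not described on the slide, and the SVM training step is omitted here.

```python
import re


def extract_features(tweet, query):
    """Compute feature groups A, B, and C from the slide for one tweet.

    Raises ValueError if the query word does not occur in the tweet.
    """
    words = re.findall(r"[A-Za-z']+", tweet)
    lowered = [word.lower() for word in words]
    idx = lowered.index(query.lower())
    before = words[idx - 1] if idx > 0 else None
    after = words[idx + 1] if idx + 1 < len(words) else None
    return {
        # A: number of words and 1-based position of the query word
        "statistical": (len(words), idx + 1),
        # B: the words in the tweet
        "keywords": words,
        # C: the words directly before and after the query word
        "context": (before, after),
    }


features = extract_features("I am in Japan, earthquake right now!", "earthquake")
print(features["statistical"])
print(features["context"])
```

For the example tweet this yields 7 words with the query at position 5 and the context pair (Japan, right), matching the values on the slide.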


Tweet as Sensor Data

[Figure: the correspondence between tweet processing and sensor data processing for event detection.
Left panel, event detection from Twitter: tweets, classifier, observation by Twitter users, probabilistic model, target event.
Right panel, object detection in a ubiquitous environment: values, observation by sensors, probabilistic model, target object.]


Tweet as Sensor Data

[Figure: the same correspondence diagram, annotated with an earthquake example.
Event detection from Twitter: some users post “earthquake right now!!”; search and classify them into the positive class; detect an earthquake.
Object detection in a ubiquitous environment: some earthquake sensors respond with a positive value; detect an earthquake.
Target in both cases: earthquake occurrence.]

We can apply methods for sensory data detection to tweet processing


Tweet as Sensor Data

• We make two assumptions to apply methods for observation by sensors
• Assumption 1: Each Twitter user is regarded as a sensor
  – a tweet → a sensor reading
  – a sensor detects a target event and makes a report probabilistically
  – Example:
    • a user makes a tweet about an earthquake occurrence
    • the “earthquake sensor” returns a positive value
• Assumption 2: Each tweet is associated with time and location info
  – time: posting timestamp
  – location: GPS data or location information in the user’s profile

By processing time and location information, we can detect target events and find the events’ locations


Probabilistic Model

• Why do we need probabilistic models?
  – Sensor readings are noisy and sometimes sensors work incorrectly
  – We cannot judge whether a target event occurred or not from a single tweet
  – We have to calculate the probability of an event occurrence from a series of data
• We propose probabilistic models for
  – event detection from time-series data
  – location estimation from a series of spatial information


Temporal Model

• We must calculate the probability of an event occurrence from a set of sensor readings

• We examine the actual time-series data to create a temporal model


Temporal Model with Exponential Dist.

Example: Earthquake and Typhoon

[Figure: two time-series plots of the number of tweets. First plot: x-axis from Aug 9 to Aug 17, y-axis ticks from 0 to 160. Second plot: x-axis from Oct 9 to Oct 16, y-axis ticks from 0 to 1.]
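
To make the exponential temporal model concrete, the sketch below fits an exponential rate to the delays between an event and the tweets about it. It assumes the delays follow the density λ·exp(−λt), for which the maximum-likelihood estimate is λ = 1 / (mean delay). The delay values are made up for illustration, and the function names are hypothetical; the paper's fitted parameters are not given on the slide.

```python
import math


def fit_exponential_rate(delays):
    """Maximum-likelihood rate of an exponential distribution: n / sum(delays)."""
    total = sum(delays)
    if not delays or total <= 0:
        raise ValueError("need at least one positive delay")
    return len(delays) / total


def expected_fraction_by(t, rate):
    """Fraction of tweets expected within time t under the fitted model (the CDF)."""
    return 1.0 - math.exp(-rate * t)


# Hypothetical delays, in minutes, between an event and each related tweet.
delays_min = [1, 2, 3, 4, 10]
rate = fit_exponential_rate(delays_min)
print(rate)
print(expected_fraction_by(4, rate))
```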


Spatial Model

• We must calculate the probability distribution of the location of a target
• We apply Bayes filters, which are often used in location estimation by sensors, to this problem
  – Kalman filters
  – Particle filters


Bayesian Filters for Location Estimation

• Kalman filters
  – are the most widely used variant of Bayes filters
  – approximate the probability distribution with a representation that is virtually identical to a unimodal Gaussian
  – advantages: computational efficiency
  – disadvantages: limited to accurate sensors or sensors with high update rates


Bayesian Filters for Location Estimation

• Particle filters
  – represent the probability distribution by sets of samples, or particles
  – advantages: able to represent arbitrary probability densities
    • particle filters can converge to the true posterior even in non-Gaussian, nonlinear dynamic systems
  – disadvantages: difficult to apply to high-dimensional estimation problems
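
A minimal particle filter for estimating a fixed event location from geotagged observations is sketched below. It is a generic illustration of the technique, not the paper's algorithm: the Gaussian likelihood, its width `sigma`, the `jitter` added after resampling, the bounding box, and the sample coordinates are all assumptions.

```python
import math
import random


def particle_filter_location(observations, bounds, n_particles=1000,
                             sigma=1.0, jitter=0.05, seed=0):
    """Estimate a static (lat, lon) from noisy observations.

    observations: list of (lat, lon) pairs, e.g. tweet locations.
    bounds: ((lat_min, lat_max), (lon_min, lon_max)) for the initial particles.
    """
    rng = random.Random(seed)
    (lat_min, lat_max), (lon_min, lon_max) = bounds
    # Start with particles spread uniformly over the bounding box.
    particles = [(rng.uniform(lat_min, lat_max), rng.uniform(lon_min, lon_max))
                 for _ in range(n_particles)]
    for obs_lat, obs_lon in observations:
        # Weight each particle by a Gaussian likelihood of the observation.
        weights = [math.exp(-((lat - obs_lat) ** 2 + (lon - obs_lon) ** 2)
                            / (2 * sigma ** 2))
                   for lat, lon in particles]
        if sum(weights) == 0.0:
            continue  # observation too far from every particle; skip it
        # Resample in proportion to the weights, then add a little noise
        # so that the particle set does not collapse to a few points.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [(lat + rng.gauss(0, jitter), lon + rng.gauss(0, jitter))
                     for lat, lon in particles]
    est_lat = sum(lat for lat, _ in particles) / n_particles
    est_lon = sum(lon for _, lon in particles) / n_particles
    return est_lat, est_lon


# Hypothetical tweet locations scattered around (35.0, 135.0).
tweets = [(35.1, 135.2), (34.9, 134.8), (35.0, 135.1), (35.2, 134.9), (34.8, 135.0)]
estimate = particle_filter_location(tweets, bounds=((30.0, 40.0), (130.0, 140.0)))
print(estimate)
```

Because the posterior is carried by samples, the same loop works when the likelihood is not Gaussian, which is the advantage named on the slide.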


Information Diffusion Related to Real-time Events

• The proposed spatiotemporal models need to meet one condition:
  – sensors are assumed to be independent
• We check whether information diffusion about target events happens, because
  – if information diffusion happened among users, Twitter user sensors would not be independent; they would affect each other (correlation!)
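
The independence assumption matters because it lets per-sensor error probabilities multiply. As a worked example under that assumption: if each positive tweet is a false alarm with probability `p_false`, the probability that all n positive tweets are false alarms is `p_false ** n`, so the probability that the event really occurred is `1 - p_false ** n`. The parameter value used below is hypothetical.

```python
def event_probability(n_positive, p_false):
    """Probability that an event occurred, given n independent positive reports.

    p_false: assumed probability that a single positive report is a false alarm.
    """
    if not 0.0 <= p_false <= 1.0:
        raise ValueError("p_false must be between 0 and 1")
    if n_positive < 0:
        raise ValueError("n_positive must be non-negative")
    return 1.0 - p_false ** n_positive


for n in (1, 3, 10):
    print(n, event_probability(n, 0.5))
```

If users merely repeat each other's reports, the reports are correlated and this product overstates the evidence, which is why the diffusion check is needed.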


Information Diffusion Related to Real-time Events

[Figure: information flow networks on Twitter for three topics: Nintendo DS Game, an earthquake, a typhoon]

In the case of an earthquake and a typhoon, very little information diffusion takes place on Twitter, compared to the Nintendo DS Game
→ We assume that Twitter user sensors are independent with respect to earthquakes and typhoons


Experiments and Evaluation

• We demonstrate the performance of
  – tweet classification
  – event detection from time-series data → shown in “application”
  – location estimation from a series of spatial information


Evaluation of Semantic Analysis

• Queries
  – Earthquake query: “shaking” and “earthquake”
  – Typhoon query: “typhoon”
• Examples to create classifier
  – 597 positive examples


Evaluation of Semantic Analysis

• We obtain the highest F-value when we use statistical features alone or all features together.

• Keyword features and Word Context features don’t contribute much to the classification performance

• A user becomes surprised and might produce a very short tweet

• It’s apparent that the precision is not as high as the recall

Features      Recall    Precision   F-Value
Statistical   87.50%    63.64%      73.69%
Keywords      87.50%    38.89%      53.85%
Context       50.00%    66.67%      57.14%
All           87.50%    63.64%      73.69%
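
The F-values in the table are consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes them from the recall and precision columns; the function name `f_value` is illustrative.

```python
def f_value(recall, precision):
    """Harmonic mean of precision and recall (F1 score)."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs taken from the table, as fractions.
rows = {
    "Statistical": (0.8750, 0.6364),
    "Keywords": (0.8750, 0.3889),
    "Context": (0.5000, 0.6667),
    "All": (0.8750, 0.6364),
}
for name, (recall, precision) in rows.items():
    print(f"{name}: {100 * f_value(recall, precision):.2f}%")
```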


Evaluation of Spatial Estimation

• Target events
  – earthquakes
    • 25 earthquakes from August 2009 to October 2009
  – typhoons
    • name: Melor
• Baseline methods
  – weighted average
    • simply takes the average of latitudes and longitudes
  – median
    • simply takes the median of latitudes and longitudes
• Metric: distance from the epicenter
  – The smaller the better!
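
The two baselines and the distance metric can be sketched as follows. The average is implemented as a plain, unweighted mean, matching the slide's description ("simply takes the average"); the haversine great-circle formula and the Earth radius of 6371 km are assumptions about how the distance might be measured, and the sample coordinates are made up.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def average_estimate(points):
    """Baseline: mean latitude and mean longitude of the tweet locations."""
    return mean(lat for lat, _ in points), mean(lon for _, lon in points)


def median_estimate(points):
    """Baseline: median latitude and median longitude of the tweet locations."""
    return median(lat for lat, _ in points), median(lon for _, lon in points)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical tweet locations and epicenter.
points = [(35.0, 135.0), (36.0, 136.0), (40.0, 140.0)]
epicenter = (36.0, 136.0)
print(haversine_km(average_estimate(points), epicenter))
print(haversine_km(median_estimate(points), epicenter))
```

The outlier at (40.0, 140.0) pulls the average away from the epicenter while the median is unaffected, which is why both baselines are worth comparing.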


Evaluation of Spatial Estimation

[Figure: map showing Tokyo, Osaka, and Kyoto with the actual earthquake center, the estimation by median, and the estimation by particle filter. Balloons: individual tweets; color: post time.]


Evaluation of Spatial Estimation

Typhoon


Discussions of Experiments

• The particle filter performs better than the other methods
• If the center of a target event is in an oceanic area, it’s more difficult to locate it precisely from tweets
• It becomes more difficult to make a good estimation in less populated areas


Results of Earthquake Detection

JMA intensity scale    2 or more     3 or more     4 or more
Num of earthquakes     78            25            3
Detected               70 (89.7%)    24 (96.0%)    3 (100.0%)
Promptly detected*     53 (67.9%)    20 (80.0%)    3 (100.0%)

* Promptly detected: detected within a minute
JMA intensity scale: the original earthquake scale of the Japan Meteorological Agency

Period: Aug. 2009 to Sep. 2009
Tweets analyzed: 49,314 tweets
Positive tweets: 6,291 tweets by 4,218 users

We detected 96% of earthquakes of JMA intensity scale 3 or more during the period.


Conclusions

• We investigated the real-time nature of Twitter for event detection

• Semantic analyses were applied to tweet classification
• We consider each Twitter user as a sensor and set up the problem of detecting an event based on sensory observations
• Location estimation methods such as Kalman filters and particle filters are used to estimate the locations of events
• We developed an earthquake reporting system, which is a novel approach to notify people promptly of an earthquake event
• We plan to expand our system to detect events of various kinds, such as rainbows, traffic jams, etc.


Location-based Crowdsourcing: Extending Crowdsourcing to the Real World

Alt et al.

NordiCHI 2010


Motivation

• Crowdsourcing beyond the digital?
  – Seekers and solvers
  – Important aspects: right time and location for matchmaking
• Several scenarios:
  – Recommendations on demand (e.g., buying something?)
  – Recording on demand (e.g., missing lectures?)
  – Remotely looking around? (e.g., apartment?)
  – Real-time weather information
  – Translations on demand


System Architecture

Page 57: Ubiquitous Human Computation KAIST KSE Uichin Lee May 11, 2011
Page 58: Ubiquitous Human Computation KAIST KSE Uichin Lee May 11, 2011

The mobile client screenshots: (a) Main menu where users can search tasks. (b) A sample task retrieved from the database.


Lessons learned

• Users prefer address-based task selection (GPS is too hard to parse)

• Picture tasks are most popular (easy to handle)

• Tasks were mainly solved at or close to home
• Tasks are solved after work
• Response times vary


Lessons learned

• Informative tasks are as popular as picture tasks
• Time-critical tasks attract little interest
• A solution should be achievable in 10 minutes
• Tasks are still solved after work
• Mid-day breaks are good times to search for tasks
• Solving a task can take up to one day
• Home and surrounding areas are the most popular places for solving tasks
• Voluntary tasks have a lower chance of being solved (monetary rewards: 77%)
• Users search for tasks in their current location


Social Sensors and Pervasive Services: Approaches and Perspectives

Rosi et al., PerCol 2011


Social Sensors?

• Device intelligence with various on-board sensors such as GPS

• Human intelligence with “social sensors”
  – Twitter posts, Facebook status updates, pictures posted on Flickr
  – Personal information: shopping patterns, place visit patterns, etc. (with some potential social interactions)


Approaches to integrate social sensing and pervasive services


Approaches to integrate social sensing and pervasive services

• A: Extracting data from social networks
  – Detecting crowded sites (Fujisaka et al., 2010)
  – Mining landmarks from blogs (Ji et al., 2009)
  – Event detection using Flickr (Zhao et al., 2006)
• B: Exploiting social networks as a socio-pervasive middleware
  – Twitter with sensors (Demirbas et al., 2010)
  – S-Sensors with micro-blogging (Baqer et al., 2009)
  – Status update feeds to social networks (CenceMe)


Approaches to integrate social sensing and pervasive services

• C: Pervasive overlays on social networks
  – Interconnecting and sharing data sensed from personal devices with the rest of the world
  – SenseFace:
    • Capture and process (local), and disseminate data (social nets)
    • Dynamically mash up sensor data and social networks
• D: App-specific socio-pervasive networks
  – Fusing mobile, sensor, and social data to fully enable context-aware computing


Some Issues

• Key issues
  – Rich data, yet it comes at the cost of understanding the data
  – Sheer size (raw facts and data produced by sensors)
• Unstructured, noisy data
  – Unified data representation and interpretation
  – Overcoming uncertainty of data
• No guarantee on the delivery of specific info about facts and at specific times by social sensors
  – Systems require “critical mass”; heterogeneous popularity based on location (e.g., rural area vs. urban area)
• Completely out of the loop of system managers and app developers