21
Anand Padmanabhan 1,4 , Shaowen Wang 1,2,3,4 , Guofeng Cao 6 , MyungHwa Hwang 1 , Yanli Zhao 1 , Zhenhua Zhang 1,5 , Yizhao Gao 1 , Kiumars Soltani 1 1 CyberInfrastructure and Geospatial Information Laboratory (CIGI) Department of Geography and Geographic Information Science School of Earth, Society and Environment 2 Department of Computer Science 3 Department of Urban and Regional Planning 4 National Center for Supercomputing Applications (NCSA) University of Illinois at Urbana-Champaign 5 Computer Network Information Center, Chinese Academy of Sciences 6 Texas Tech University GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media Data 1

GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Anand Padmanabhan1,4, Shaowen Wang1,2,3,4, Guofeng Cao6, MyungHwa Hwang1, Yanli Zhao1, Zhenhua Zhang1,5, Yizhao Gao1 , Kiumars Soltani1

1CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Department of Geography and Geographic Information Science

School of Earth, Society and Environment 2Department of Computer Science

3Department of Urban and Regional Planning 4National Center for Supercomputing Applications (NCSA)

University of Illinois at Urbana-Champaign 5Computer Network Information Center, Chinese Academy of Sciences

6Texas Tech University

GISolve-FluMapper for Interactive Spatial

Analysis of Streaming Social Media Data

1

Page 2: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Motivations

• Conventional approaches to disease surveillance use indicators like physician hospital visits, and laboratory-test o Influenza like illnesses (ILI) surveillances are usually conducted on fixed

spatiotemporal scale

o Typically updated weekly for regions/states/cities

o Early detection and progression ILI are important in public health research

o Social media data could provide an additional indicator

• Recent studies with significant societal impacts have been conducted with social networking and media data o E.g. predictions of the stock market (Bollen et al. 2011), disaster responses

(Goodchild and Glennon 2010) to infectious disease tracking (Signorini and Segre 2011, Wang et al. 2013)

• Dramatic increase in the volume of spatial data available via social media o 2-4% of the 340M daily tweets sent via Twitter have GPS information available

o This number is growing with increase in usage of location-aware mobile devices

2

Page 3: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Using Location Based Social Media Data

• Advantages o Contains rich and dynamic information over space and time about various social

themes (Tsou et al. 2013)

o Cost efficient and can be collected in near real-time (Signorini et al. 2011)

o Individual-level observations allow exploratory analysis of dynamics on varying spatiotemporal scales (Wang et al. 2013)

o Far less restriction on use of location based social media Data (LBSMD) compared to clinical or survey data

• Challenges o Data sizes quickly grow beyond the capabilities of conventional geographic

information systems (GIS) (Wang et al. 2013)

o Data are in unstructured textual form so intensive data modeling and processing is required to transform LBSMD into structured spatial data entities (e.g., point patterns and trajectories) on which spatiotemporal analysis could be applied (Li et al. 2013)

o Sampled natured of LBSMD data raises questions of quality and statistical validity

3

Page 4: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

FluMapper Goals

• Provide an exploratory tool to public health researchers for

detect ILI activity early as indicated in social media data

• Efficiently manage the large volumes of LBSMD

• Capture the dynamics of flu risk across multiple

spatiotemporal scales o Provide ILI risk maps from national scale to very fine local scales

• Identify movement patterns of population o Aggregate individual user trajectory across different spatial and temporal scales to

generate flows

o Capture travel patterns across the country, regional, state, city and local levels

• Facilitate interactive and exploratory analysis of LBSMD o Update ILI activity in near real-time for the conterminous United States

o Provide an integrated view of flu risk indicators and mobility of users

4

Page 5: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

CyberGIS Software Ecosystem

5

Spatially Explicit Agent-Based Modeling of Disease

Spread

Viewshed

Analysis

CyberGIS Social Media Analytics

• CyberGIS Gateway

and Applications

• CyberGIS Toolkit

• GISolve Middleware

FluMapper

Page 6: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

FluMapper Architecture

6 After (Wang et al. 2013)

Page 7: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

FluMapper Components

• Data handling o Data collection and processing

o Collects, processes and stores data from Twitter in near real time

o Efficient and scalable services to query raw and derived data developed

o Spatiotemporal data model

o Spatiotemporal data cube is provides aggregated data and statistics at multiple scales for efficient spatial query

o At the finest scale, North America is represented as a uniform grid with cell size of 30 arc seconds

• Spatial Analysis o Exploratory spatial data analysis

o Kernel density estimation (KDE) , a spatial cluster detection technique, is used

o Monte-Carlo simulations with random labeling is applied

o Flow mapping analysis

o Scalable interactive multi-scale visualization of movement pattern is massive social media data was developed

7

Page 8: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Data Handling

• Provides near real-time access to data from Twitter o Data from Twitter collected using streaming application program interface (API)

o Applies techniques like support vector machine (SVM) and keyword filtering to identify

flu-related tweets

o A 7-day sliding window data is provided for interactive analysis

• GISolve middleware is enhanced to provide scalable service

interface to access the massive data stream o MongoDB, a scalable and open source NoSQL database used

o Set of distributed databases manage spatial data entities

o Spatiotemporal surveillance of flu is achieved by processing tweets continuously and

aggregating summary flow data in a hierarchical multi-scale structure using

spatiotemporal data cube

8

Page 9: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Data Handling - Architecture

9

Page 10: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Data Handling - Components

• Crawler o Use Twitter’s streaming API

o ~2.5M tweets daily (Data has been collected for a year)

o A spatial filter (no keyword filter) of North America is applied when data is collected

• Tweet importer o Reads in raw tweets and converts them into spatial events

o Uses text mining techniques to identify flu related tweets (Signorini et al. 2011)

• Cube generator o Generates trajectories of individual users

o Aggregates these trajectories into flows at multiple spatiotemporal scale

o Cube consisted of ten layers of spatial grids, and at the finest level contained a

spatial grid of 3072×7168 cells of about 30 arc-seconds (≈1 km) width

• Data filter service o Provides spatiotemporal query capabilities to the flu and flow databases

10

Page 11: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Spatiotemporal Data Model

• Spatiotemporal data cube is developed to support

efficient query of aggregated statistics o Data cube decomposes the spatiotemporal space into a lattice of multi-scale,

hierarchical cuboids

o Provides multiple scales of spatial indices for fast and efficient spatial query

• Given a region, statistics such as the number of flu occurrences, and number of

trajectories traveling out and in from this region are efficiently retrieved

Page 12: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Spatial Analysis - Flow Mapping

• Scalable interactive visualization of movement patterns in

massive streaming social media data was achieved o Implemented two independent but complementary approaches that show flows to all

destination from (a) single-source, and (b) all-sources (MovePattern) o Multi-scale visualization of massive graphs conducted

o A Hadoop-based strategy for data-intensive processing applied

o An adaptive and interactive user interface developed

• See two student posters for more-details

• Scalable GISolve data services are used to efficiently query

the data in spatiotemporal data cube and the mongodb

database

12

Page 13: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

MovePattern Architecture

Page 14: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Exploratory Spatial Data Analysis

• Kernel density estimation (KDE) is applied to detect concentration of flu events o Adaptive bandwidth is applied to account for inhomogeneous background

o Monte-Carlo simulations with random labeling are conducted (Shi. 2010)

• Spatiotemporal maps showing the flu risk produced o Input are individual tweets in the Twitter feeds indexed by both space and time

o Flu-related tweets represent infection case, with all tweets forming the background

• Analysis is conducted at multiple spatial scales o Might help to ameliorate the effect of modifiable areal unit problem

• Code developed to run on GPUs (Graphical Processing Units) and executed on XSEDE

• GISolve Open Service API used to conduct the analysis

14

Page 15: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

What was accomplished? Demo

15

http://flumapper.org/

Page 16: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

16

Gateway

Open Service API

Analysis

High Performance High Throughput

GISolve Middleware

Cyberinfrastructure

Computation

Management

Web Browser Desktop Software

Visualization Data

Computational

Intensity Analysis

CI Resource

Management

Integration

Services Build & Test

Service Infrastructure

Clouds

GISolve Middleware

Page 17: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

GISolve Open Service API

• Defines a set of REST Web service interfaces for CyberGIS

authentication, application integration, and

cyberinfrastructure-based computation o http://sandbox.cigi.illinois.edu/home/doc/gosapi/GISolveOpenServiceAPI.html

o https://wiki.cigi.illinois.edu/display/DOC/GISolve+Open+Service+API+User+Guide

• Manages complexity of CyberGIS software integration and

cyberinfrastructure-based computational workflows

• Supports broad access to integrated CyberGIS applications,

data, and services

• Facilitates software integration processes

• Client codes written in multiple programming languages,

including Java, Perl, PHP, and Python provided

17

Page 18: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

GISolve Open Service API (Cont.)

• Security API: provides an easy-to-use security mechanism

for CyberGIS software integration, programming access to

CyberGIS services, and cyberinfrastructure resources

• Integration API: provides a service oriented approach for

CyberGIS software registration, configuration, build & test,

and publication & subscription

• Computation API: provides an easy-to-use interface for

cyberinfrastructure-based geospatial computation

18

Page 19: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Concluding Summary

• By leveraging the scalable and easy-to-use Open Service API provided by GISolve middleware , CyberGIS software environment is able to resolve the compute and data challenges from massive streaming social media data • CyberGIS integrates computation, data and communication technologies from

XSEDE to resolve computational challenges

• FluMapper effectively handles big data and supports interactive spatial analysis on it

• Providing early ILI activity detection to public health researchers and practitioners is crucial o The techniques implemented in FluMapper are generalizable to other problems

(e.g. disaster response) where spatial analysis of location based social media data could provide valuable insights

19

Page 20: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Acknowledgements

• National Science Foundation o BCS-0846655

o OCI-1047916

o OCI-0503697

o TeraGrid/XSEDE SES070004

• Colleagues o http://www.cigi.illinois.edu/doku.php/people/index

20

Page 21: GISolve-FluMapper for Interactive Spatial Analysis of Streaming Social Media …cybergis.cigi.uiuc.edu/cyberGISwiki/lib/exe/fetch.php/... · 2013-10-04 · Using Location Based Social

Thank You

• Questions?

21