Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Anand Padmanabhan1,4, Shaowen Wang1,2,3,4, Guofeng Cao6, MyungHwa Hwang1, Yanli Zhao1, Zhenhua Zhang1,5, Yizhao Gao1 , Kiumars Soltani1
1CyberInfrastructure and Geospatial Information Laboratory (CIGI)
Department of Geography and Geographic Information Science
School of Earth, Society and Environment 2Department of Computer Science
3Department of Urban and Regional Planning 4National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign 5Computer Network Information Center, Chinese Academy of Sciences
6Texas Tech University
GISolve-FluMapper for Interactive Spatial
Analysis of Streaming Social Media Data
1
Motivations
• Conventional approaches to disease surveillance use indicators like physician hospital visits, and laboratory-test o Influenza like illnesses (ILI) surveillances are usually conducted on fixed
spatiotemporal scale
o Typically updated weekly for regions/states/cities
o Early detection and progression ILI are important in public health research
o Social media data could provide an additional indicator
• Recent studies with significant societal impacts have been conducted with social networking and media data o E.g. predictions of the stock market (Bollen et al. 2011), disaster responses
(Goodchild and Glennon 2010) to infectious disease tracking (Signorini and Segre 2011, Wang et al. 2013)
• Dramatic increase in the volume of spatial data available via social media o 2-4% of the 340M daily tweets sent via Twitter have GPS information available
o This number is growing with increase in usage of location-aware mobile devices
2
Using Location Based Social Media Data
• Advantages o Contains rich and dynamic information over space and time about various social
themes (Tsou et al. 2013)
o Cost efficient and can be collected in near real-time (Signorini et al. 2011)
o Individual-level observations allow exploratory analysis of dynamics on varying spatiotemporal scales (Wang et al. 2013)
o Far less restriction on use of location based social media Data (LBSMD) compared to clinical or survey data
• Challenges o Data sizes quickly grow beyond the capabilities of conventional geographic
information systems (GIS) (Wang et al. 2013)
o Data are in unstructured textual form so intensive data modeling and processing is required to transform LBSMD into structured spatial data entities (e.g., point patterns and trajectories) on which spatiotemporal analysis could be applied (Li et al. 2013)
o Sampled natured of LBSMD data raises questions of quality and statistical validity
3
FluMapper Goals
• Provide an exploratory tool to public health researchers for
detect ILI activity early as indicated in social media data
• Efficiently manage the large volumes of LBSMD
• Capture the dynamics of flu risk across multiple
spatiotemporal scales o Provide ILI risk maps from national scale to very fine local scales
• Identify movement patterns of population o Aggregate individual user trajectory across different spatial and temporal scales to
generate flows
o Capture travel patterns across the country, regional, state, city and local levels
• Facilitate interactive and exploratory analysis of LBSMD o Update ILI activity in near real-time for the conterminous United States
o Provide an integrated view of flu risk indicators and mobility of users
4
CyberGIS Software Ecosystem
5
Spatially Explicit Agent-Based Modeling of Disease
Spread
Viewshed
Analysis
CyberGIS Social Media Analytics
• CyberGIS Gateway
and Applications
• CyberGIS Toolkit
• GISolve Middleware
FluMapper
FluMapper Architecture
6 After (Wang et al. 2013)
FluMapper Components
• Data handling o Data collection and processing
o Collects, processes and stores data from Twitter in near real time
o Efficient and scalable services to query raw and derived data developed
o Spatiotemporal data model
o Spatiotemporal data cube is provides aggregated data and statistics at multiple scales for efficient spatial query
o At the finest scale, North America is represented as a uniform grid with cell size of 30 arc seconds
• Spatial Analysis o Exploratory spatial data analysis
o Kernel density estimation (KDE) , a spatial cluster detection technique, is used
o Monte-Carlo simulations with random labeling is applied
o Flow mapping analysis
o Scalable interactive multi-scale visualization of movement pattern is massive social media data was developed
7
Data Handling
• Provides near real-time access to data from Twitter o Data from Twitter collected using streaming application program interface (API)
o Applies techniques like support vector machine (SVM) and keyword filtering to identify
flu-related tweets
o A 7-day sliding window data is provided for interactive analysis
• GISolve middleware is enhanced to provide scalable service
interface to access the massive data stream o MongoDB, a scalable and open source NoSQL database used
o Set of distributed databases manage spatial data entities
o Spatiotemporal surveillance of flu is achieved by processing tweets continuously and
aggregating summary flow data in a hierarchical multi-scale structure using
spatiotemporal data cube
8
Data Handling - Architecture
9
Data Handling - Components
• Crawler o Use Twitter’s streaming API
o ~2.5M tweets daily (Data has been collected for a year)
o A spatial filter (no keyword filter) of North America is applied when data is collected
• Tweet importer o Reads in raw tweets and converts them into spatial events
o Uses text mining techniques to identify flu related tweets (Signorini et al. 2011)
• Cube generator o Generates trajectories of individual users
o Aggregates these trajectories into flows at multiple spatiotemporal scale
o Cube consisted of ten layers of spatial grids, and at the finest level contained a
spatial grid of 3072×7168 cells of about 30 arc-seconds (≈1 km) width
• Data filter service o Provides spatiotemporal query capabilities to the flu and flow databases
10
Spatiotemporal Data Model
• Spatiotemporal data cube is developed to support
efficient query of aggregated statistics o Data cube decomposes the spatiotemporal space into a lattice of multi-scale,
hierarchical cuboids
o Provides multiple scales of spatial indices for fast and efficient spatial query
• Given a region, statistics such as the number of flu occurrences, and number of
trajectories traveling out and in from this region are efficiently retrieved
Spatial Analysis - Flow Mapping
• Scalable interactive visualization of movement patterns in
massive streaming social media data was achieved o Implemented two independent but complementary approaches that show flows to all
destination from (a) single-source, and (b) all-sources (MovePattern) o Multi-scale visualization of massive graphs conducted
o A Hadoop-based strategy for data-intensive processing applied
o An adaptive and interactive user interface developed
• See two student posters for more-details
• Scalable GISolve data services are used to efficiently query
the data in spatiotemporal data cube and the mongodb
database
12
MovePattern Architecture
Exploratory Spatial Data Analysis
• Kernel density estimation (KDE) is applied to detect concentration of flu events o Adaptive bandwidth is applied to account for inhomogeneous background
o Monte-Carlo simulations with random labeling are conducted (Shi. 2010)
• Spatiotemporal maps showing the flu risk produced o Input are individual tweets in the Twitter feeds indexed by both space and time
o Flu-related tweets represent infection case, with all tweets forming the background
• Analysis is conducted at multiple spatial scales o Might help to ameliorate the effect of modifiable areal unit problem
• Code developed to run on GPUs (Graphical Processing Units) and executed on XSEDE
• GISolve Open Service API used to conduct the analysis
14
What was accomplished? Demo
15
http://flumapper.org/
16
Gateway
Open Service API
Analysis
High Performance High Throughput
GISolve Middleware
Cyberinfrastructure
Computation
Management
Web Browser Desktop Software
Visualization Data
Computational
Intensity Analysis
CI Resource
Management
Integration
Services Build & Test
Service Infrastructure
Clouds
GISolve Middleware
GISolve Open Service API
• Defines a set of REST Web service interfaces for CyberGIS
authentication, application integration, and
cyberinfrastructure-based computation o http://sandbox.cigi.illinois.edu/home/doc/gosapi/GISolveOpenServiceAPI.html
o https://wiki.cigi.illinois.edu/display/DOC/GISolve+Open+Service+API+User+Guide
• Manages complexity of CyberGIS software integration and
cyberinfrastructure-based computational workflows
• Supports broad access to integrated CyberGIS applications,
data, and services
• Facilitates software integration processes
• Client codes written in multiple programming languages,
including Java, Perl, PHP, and Python provided
17
GISolve Open Service API (Cont.)
• Security API: provides an easy-to-use security mechanism
for CyberGIS software integration, programming access to
CyberGIS services, and cyberinfrastructure resources
• Integration API: provides a service oriented approach for
CyberGIS software registration, configuration, build & test,
and publication & subscription
• Computation API: provides an easy-to-use interface for
cyberinfrastructure-based geospatial computation
18
Concluding Summary
• By leveraging the scalable and easy-to-use Open Service API provided by GISolve middleware , CyberGIS software environment is able to resolve the compute and data challenges from massive streaming social media data • CyberGIS integrates computation, data and communication technologies from
XSEDE to resolve computational challenges
• FluMapper effectively handles big data and supports interactive spatial analysis on it
• Providing early ILI activity detection to public health researchers and practitioners is crucial o The techniques implemented in FluMapper are generalizable to other problems
(e.g. disaster response) where spatial analysis of location based social media data could provide valuable insights
19
Acknowledgements
• National Science Foundation o BCS-0846655
o OCI-1047916
o OCI-0503697
o TeraGrid/XSEDE SES070004
• Colleagues o http://www.cigi.illinois.edu/doku.php/people/index
20
Thank You
• Questions?
21