Geographic Context Analysis of Volunteered Information
Frank Ostermann, Laura Spinsanti
European Commission – Joint Research Centre, Institute for Environment and Sustainability
Digital Earth and Reference Data Unit
April 11, 2023
Outline
1. Some background on social media in crisis contexts
2. Geographic context for credibility and relevance
3. Twitter and Flickr for fighting forest fires
4. Some examples and results
Opportunities for crisis response
California wildfires 2007
• Near real time
• Heterogeneous input
• More centralized information processing (basically one journalist)
Why not treat information from the citizens as another type of sensor data?
VGI during Crisis Events
What social media offer…                                          What crisis management needs…
rich up-to-date information                                       up-to-date information
new paths of communication                                        redundant paths of communication
noise and uncertain lineage and accuracy of the information       high-quality and reliable information
Big Data – Big Challenges
Human volunteered curation faces limits of
• Sustainability
• Scalability
Proposed Solution:
Automated filtering and quality assessment
Outline
1. Some background on social media in crisis contexts
2. Geographic context for credibility and relevance
3. Twitter and Flickr for fighting forest fires
4. Some examples and results
The two main challenges:
1. Flood of information -> What is relevant?
2. Quality of information -> What is credible and valid?
“Back at hotel. Fire skirted round village. Little evidence of significant damage. Helicopters still overhead damping scrub. Beer unaffected”
(Canada BCGovFireInfo): “Important notice from the Reg Dist of Bulkley-Nechako regarding evacuations due to wildfires in the area http://ow.ly/2sBxH”
“Are you a fireman? Cause you’re always there to extinguish the fire inside my heart.”
Elements of quality assessment
[Diagram labels: Source, Context, Content, Location; Credibility, Relevance]
Relevance
• Main criterion: Match with information needs of a user
• Three perspectives (Saracevic 1975, Hjorland 2010):
  • User (What s/he thinks is relevant)
  • System (Results matching query)
  • Subject knowledge (goal-, task-oriented)
• Geographic relevance vs. geographic aspect of relevance
• Aspects include
  • Topicality (Content)
  • Location/Origin (Location, Context)
  • Novelty (Content)
Credibility
• Two main characteristics:
  • Trustworthiness (affecting source credibility) (Source, Context)
  • Expertise (affecting information credibility, or accuracy) (Location, Source)
• Two main heuristics (Metzger et al. 2010):
  • Social confirmation (what do others do or say?) (Source, Context)
  • Expectancies (what do I already know?) (Location, Context)
Context, Context, Context
Tackle the context deficit:
• Source characteristics: high number of followers could indicate trustworthiness and/or expertise (social confirmation)
• Context characteristics: land cover, population density, other UGC, could indicate trustworthiness and accuracy (expectancy)
Our focus: Geographic context
Outline
1. Some background on social media in crisis contexts
2. Geographic context for credibility and relevance
3. Twitter and Flickr for fighting forest fires
4. Some examples and results
Project Overview
• Opportunistic sensing approach
• Geographic focus on Europe
• Collaboration with:
  • European Forest Fire Information System (effis.jrc.ec.europa.eu/)
  • European Media Monitor (emm.newsbrief.eu/overview.html)
  • Universitat Jaume I de Castellón (Institute of New Imaging Technologies: www.init.uji.es; Geographic Information research group: www.geoinfo.uji.es)
Research Objectives
1. Deploy a system for using UGC in crisis decision support on forest fires
  • Integration
  • Quality control
  • Communication of risk/uncertainty, alerts
2. Assess the added value of using UGC for forest fire response.
Forest fire characteristics
• Dynamics require near real-time processing
• Fewer signals, since fires often occur in sparsely populated areas
• Predictability and recurrence facilitate sensor and model calibration
Example pieces of information: Text Messages
Microblogging, e.g. Twitter
User: “Back at hotel. Fire skirted round village. Little evidence of significant damage. Helicopters still overhead damping scrub. Beer unaffected”
User: “The forest fire is sooo close! We are now on evacuation notice. SOO nervous!”
News: “Okanagan forest fire forces resort evacuation: About 60 people fled in boats from a forest fire in B.C.'s Okanagan…http://bit.ly/9cstp7”
Government (Canada BCGovFireInfo): “Important notice from the Reg Dist of Bulkley-Nechako regarding evacuations due to wildfires in the area http://ow.ly/2sBxH”
Example pieces of information: Visual and Annotations
Scoring VGI
• Sum of weighted scores: $QS(VGI_j) = \sum_{i=1}^{N} w_i s_{ji}$
• with $w_i$ being the weight for criterion $i$, and $s_{ji}$ being the score of VGI object $j$ on criterion $i$
• Topicality: keyword-based
• Proximity: nearest concurrently reported hotspot
• Land cover: Forest, no-Forest, Built-up
• Population Density: Risk factor
• Information clusters: Similar messages or lone signal?
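The weighted sum above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the criterion names mirror the bullets, but the weights, the example scores, and the idea that every score lies between 0 and 1 are placeholders chosen for this sketch.

```python
from typing import Mapping


def quality_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return QS = sum over criteria i of w_i * s_i for one VGI object.

    Raises KeyError if a weighted criterion has no score.
    """
    return sum(weight * scores[criterion] for criterion, weight in weights.items())


# Hypothetical weights (the slides do not state the values actually used).
WEIGHTS = {
    "topicality": 0.4,
    "proximity": 0.3,
    "land_cover": 0.1,
    "population_density": 0.1,
    "information_cluster": 0.1,
}

# Hypothetical per-criterion scores for a single VGI object.
example_scores = {
    "topicality": 1.0,
    "proximity": 0.5,
    "land_cover": 1.0,
    "population_density": 0.0,
    "information_cluster": 0.5,
}

print(round(quality_score(example_scores, WEIGHTS), 3))
```

Keeping the weights in a separate mapping makes it easy to re-weight criteria without touching the per-item scores.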
Workflow overview
Initial context info, tasks and data sources
• Spatio-temporal location -> Geocoding
• Topicality -> Natural Language Processing
• Landcover -> Corine Landcover Classification
• Known fires/hotspots -> MODIS hotspot data
• Population density -> Official LAU2 statistics
Detailed implemented 2011 CONAVI workflow
Data sources and services:
• Twitter Streaming API
• Flickr Search API
• EFFIS Hotspot Data
• European Media Monitor Geo-coding API
Database: Oracle DBMS
1.1 Retrieval: Scheduled Java code accessing APIs
1.2 Storage: Scheduled Java code writing to DBMS
2.1 Topicality: Scheduled PL/SQL job
2.2 Geo-Coding: a) Scheduled PL/SQL job, b) Scheduled Java code
2.3 Geographic context: Scheduled PL/SQL job
2.4 Quality Assessment: Scheduled PL/SQL job
3.1 Spatio-temporal clustering: Scheduled Python script calling SatScan job
3.2 Quality Re-Assessment: Scheduled PL/SQL job
Dissemination: SMS, WFS, WMS, RSS, SES
Detailed implemented 2012 CONAVI workflow
Data sources and services:
• Twitter Streaming API
• Flickr Search API
• EFFIS Hotspot Data
• European Media Monitor Geo-coding API
Database: Oracle DBMS
1.1 Retrieval: Scheduled Java code accessing APIs
1.2 Storage: Scheduled Java code writing to DBMS
2.1 Topicality: Scheduled PL/SQL job
2.2 Geo-Coding: a) Scheduled PL/SQL job, b) Scheduled Java code
2.3 Geographic context: Scheduled PL/SQL job
2.4 Quality Assessment: Scheduled PL/SQL job
3.1 Spatio-temporal clustering: Scheduled Python script calling SatScan job
3.2 Quality Re-Assessment: Scheduled PL/SQL job
Dissemination: SMS, WFS, WMS, RSS, SES
Scoring VGI Topicality
Four cases:
A: Probably about a forest fire
B: Possibly about a forest fire
C: Possibly not about a forest fire
D: Probably not about a forest fire

FOR EACH UGC
  CASE = D
  IF feu OR feux THEN
    CASE = C
    IF hectares THEN CASE = A
    IF foret OR forets OR fôret OR fôrets THEN CASE = A
  NEXT
  IF foret OR forets OR fôret OR fôrets THEN CASE = C
  NEXT
  IF hectares THEN CASE = C
  NEXT
  IF incendie THEN
    CASE = B
    IF hectares THEN CASE = A
    IF foret OR forets OR fôret OR fôrets THEN CASE = A

Only CASE = {A,B} gets sent to GISCO
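The keyword rules above can be sketched in Python. The meaning of NEXT in the pseudocode is not spelled out on the slide, so this sketch rests on an assumption: every rule proposes a case and the strongest proposal wins (A over B over C over D). That reading was chosen because a strict if/else-if chain would make the A branch under "incendie" unreachable, while plain sequential overwriting would erase the A assigned under "feu". The function names are hypothetical, and the keyword spellings are copied from the slide as they appear.

```python
import re

FIRE = {"feu", "feux"}
FOREST = {"foret", "forets", "fôret", "fôrets"}  # spellings as listed on the slide
RANK = "DCBA"  # position in this string = strength of the case


def topicality_case(text: str) -> str:
    """Return 'A', 'B', 'C' or 'D' for one message (assumed 'strongest rule wins')."""
    words = set(re.findall(r"\w+", text.lower()))
    has_forest = bool(words & FOREST)
    has_hectares = "hectares" in words
    supporting = has_forest or has_hectares

    proposals = ["D"]  # default: probably not about a forest fire
    if words & FIRE:
        proposals.append("A" if supporting else "C")
    if has_forest:
        proposals.append("C")
    if has_hectares:
        proposals.append("C")
    if "incendie" in words:
        proposals.append("A" if supporting else "B")
    return max(proposals, key=RANK.index)


def is_forwarded_to_geocoding(text: str) -> bool:
    """Only cases A and B are passed on to the geocoding step."""
    return topicality_case(text) in {"A", "B"}


for message in ["Feu de foret pres du village", "incendie en cours", "feu rouge", "bonjour"]:
    print(message, "->", topicality_case(message))
```

Tokenising into whole words, rather than searching for substrings, keeps "feu" from matching inside unrelated words such as "feuille".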
Geocoding VGI
Geocoders used:
• GISCO/LAU2 brute string matching
• European Media Monitor algorithms
• Yahoo! Placemaker (2010)
Aim: Exploit triplets of Location information (Source, Content, Message)

                            TWITTER                       FLICKR
                            August 2010   August 2011     August 2010   August 2011
Number of retrieved VGI     2,904,065     7,996,228       7,991         17,850
Percentage with toponym     35%           27%             53%           50%
Percentage with geocode     1.1%          0.92%           20%           21%
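As an illustration of what "brute string matching" against a place-name list might look like, here is a minimal Python sketch. It is not the GISCO/LAU2 matcher used in the project: the gazetteer entries and their identifiers are placeholders, the function names are hypothetical, and the field names in the usage example are one possible reading of the location triplet.

```python
import re
from typing import Dict, Mapping, Optional

# Hypothetical gazetteer: lower-case place name -> placeholder LAU2 identifier.
GAZETTEER: Dict[str, str] = {
    "marseille": "LAU2-0001",
    "aix-en-provence": "LAU2-0002",
}


def find_toponym(text: str, gazetteer: Mapping[str, str]) -> Optional[str]:
    """Return the identifier of the longest gazetteer name found in the text."""
    lowered = text.lower()
    # Try longer names first so that multi-word names win over shorter ones.
    for name in sorted(gazetteer, key=len, reverse=True):
        # The look-arounds stop a name from matching inside a longer word.
        if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", lowered):
            return gazetteer[name]
    return None


def geocode_fields(fields: Mapping[str, str], gazetteer: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Look for a toponym in each available location field separately."""
    return {field: find_toponym(text, gazetteer) for field, text in fields.items()}


print(geocode_fields(
    {"source": "Aix-en-Provence, France", "content": "Incendie près de Marseille ce soir"},
    GAZETTEER,
))
```

Matching each field separately keeps track of where a place name came from, which matters when the user's profile location and the place mentioned in the text differ.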
Geographic Context of VGI
• All context information aggregated on LAU2 GISCO level (because Geo-coding is done at this level)
• Each VGI is checked for information and scored on
Clustering VGI
• SatScan external software
• Scheduled Python script
  1. Reads new VGI from database
  2. Converts it to SatScan input format
  3. Calls SatScan from the command line with appropriate parameters
  4. Waits for SatScan to complete analysis
  5. Reads SatScan output
  6. Stores relevant information in database
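Steps 2 to 5 of the script could look roughly like the following Python sketch. It is not the project's script, and several details are assumptions to be checked against the SaTScan user guide: the plain-text layouts (location ID, case count, date in the case file; location ID, latitude, longitude in the coordinates file), passing a parameter file as the only command-line argument, and the executable name. The function names are hypothetical. Database access (steps 1 and 6) is left out, and the output is returned as raw lines rather than parsed.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# (location_id, case_count, date as YYYY/MM/DD, latitude, longitude)
Record = Tuple[str, int, str, float, float]


def write_inputs(records: Iterable[Record], case_path: Path, geo_path: Path) -> None:
    """Step 2: write a case file and a coordinates file as plain text."""
    case_lines: List[str] = []
    geo_lines: List[str] = []
    seen = set()
    for location, count, date, lat, lon in records:
        case_lines.append(f"{location} {count} {date}")
        if location not in seen:  # one coordinate row per location
            seen.add(location)
            geo_lines.append(f"{location} {lat} {lon}")
    case_path.write_text("\n".join(case_lines) + "\n")
    geo_path.write_text("\n".join(geo_lines) + "\n")


def run_clustering(executable: str, parameter_file: Path,
                   runner: Callable[..., object] = subprocess.run) -> None:
    """Steps 3 and 4: call the external program and block until it finishes."""
    # check=True makes subprocess.run raise if the program exits with an error.
    runner([executable, str(parameter_file)], check=True)


def read_output(result_path: Path) -> List[str]:
    """Step 5: return the non-empty lines of a result file, unparsed."""
    return [line for line in result_path.read_text().splitlines() if line.strip()]
```

The `runner` argument exists only so that the external call can be replaced by a stub in tests; in normal use the default `subprocess.run` blocks until the clustering run has finished.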
Clustering parameters
• Type of clustering algorithm
• Spatial location of clusters based on grid/locations or not
• Type of spatial overlap of clusters
• Maximum spatial cluster size
• Maximum temporal cluster size
Used in 2011: Discrete Poisson adjusting for population, no grid, no overlap, max radius 50 km, max temporal extent 10% of study period (9 days)
Outline
1. Some background on social media in crisis contexts
2. Geographic context for credibility and relevance
3. Twitter and Flickr for fighting forest fires
4. Some examples and results
Case Studies:
2010 and 2011 French forest fires
and their social media echo
2010 processing steps and corresponding VGI volume

Processing steps applied                     Tweets        Flickr images
(0) Keyword filtered retrieval from API      8 million     700 thousand
(1) Filtering for French keywords            611,274       61,697
(2) Filtering for incendie keyword           6,754         458
(3) Successfully geocoded VGI                1,123         293
(4) Filtering for location in France         437           243
Geocoding UGC
Geocoders used:
• GISCO/LAU2 brute string matching
• European Media Monitor algorithms
• Yahoo! Placemaker
• Exploit triplets of Location information (Source, Content, Message)

                            TWITTER                       FLICKR
                            August 2010   August 2011     August 2010   August 2011
Number of retrieved UGC     2,904,065     7,996,228       7,991         17,850
Percentage with toponym     35%           27%             53%           50%
Percentage with geocode     1.1%          0.92%           20%           21%
Tweeting on Forest Fires
WHEN
At least 7 Tweets, 35 fires
80% during first two days
96% during first three days
Spatio-temporal clustering for event detection

Case                                    Total number of clusters   True Positives   False Positives   Undetected Fires
1 (default, all locations)              6                          3                3                 3
2 (default, known fire locations)       4                          3                1                 3
3 (modified, all locations)             74                         7#               67                0
4 (modified, known fire locations)      8                          6                2                 0

Default: Space-Time Permutation, no spatial overlap, max spatial size unrestricted, max temporal size 50%
Modified: spatial overlap except sharing centers
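To compare the four cases in the table, one can derive a precision per case and a rough detection rate. The sketch below uses the counts from the table; the two derived measures are additions for illustration and are not reported on the slide. The detection rate assumes that every true-positive cluster corresponds to one distinct fire, which may not hold (the "7#" entry carries a footnote mark whose meaning is not given here), so it should be read as a proxy only.

```python
from typing import NamedTuple


class Case(NamedTuple):
    label: str
    clusters: int
    true_positives: int
    false_positives: int
    undetected_fires: int


# Counts copied from the table; the footnote mark on "7#" is dropped.
CASES = [
    Case("1 (default, all locations)", 6, 3, 3, 3),
    Case("2 (default, known fire locations)", 4, 3, 1, 3),
    Case("3 (modified, all locations)", 74, 7, 67, 0),
    Case("4 (modified, known fire locations)", 8, 6, 2, 0),
]


def precision(case: Case) -> float:
    """Share of detected clusters that correspond to a real fire."""
    return case.true_positives / case.clusters


def detection_rate_proxy(case: Case) -> float:
    """Rough recall, assuming one distinct fire per true-positive cluster."""
    return case.true_positives / (case.true_positives + case.undetected_fires)


for case in CASES:
    print(f"{case.label}: precision={precision(case):.2f}, "
          f"detection proxy={detection_rate_proxy(case):.2f}")
```

Read this way, the modified settings miss no fires, and restricting the analysis to known fire locations is what keeps the number of false positives low.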
Case study on French forest fires: spatio-temporal clustering (Case 4)
Space-time cube
[Figure: space-time cube, axis label TIME DIMENSION, showing un-clustered VGI as grey dots]
2011 processing steps and corresponding VGI volume

Processing steps applied                                   Data volume
(0) Keyword filtered retrieval from API                    21.9 million Tweets, 54,000 Flickr images
(1) Filtering for French keywords                          659,676 Tweets, 39,016 Flickr images
(2) Calculating topicality and filtering high scores       25,684 VGI items
(3) Successfully enriched VGI                              5,770 VGI items
(4) Spatio-Temporal Clustering                             129 clusters containing 2,682 VGI
(5) Excluding smaller clusters (<6 items)                  75 clusters containing 2,565 VGI
(6) Filtering for keywords in clusters                     11 clusters containing 469 VGI
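How strongly each step reduces the data can be read off by dividing each volume by the one before it. The short sketch below does this for steps (1) to (6), using the item counts from the table. Adding Tweets and Flickr images into a single count for step (1) is an assumption made here so that it can be compared with the combined "VGI items" reported from step (2) onward; the step labels are shortened for readability.

```python
from typing import List, Sequence, Tuple

# (step label, number of VGI items remaining after the step)
STEPS: List[Tuple[str, int]] = [
    ("(1) French keywords", 659_676 + 39_016),  # Tweets + Flickr images
    ("(2) Topicality filter", 25_684),
    ("(3) Enriched", 5_770),
    ("(4) Clustered", 2_682),
    ("(5) Clusters with at least 6 items", 2_565),
    ("(6) Keyword filter in clusters", 469),
]


def retention(steps: Sequence[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Return, for each step after the first, the share of items kept from the previous step."""
    return [
        (label, count / previous_count)
        for (_, previous_count), (label, count) in zip(steps, steps[1:])
    ]


for label, share in retention(STEPS):
    print(f"{label}: {share:.1%} of the previous step retained")
```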
2011 French geo-located VGI, MODIS hotspots, and forest cover
Geographic Context Analysis of Volunteered Information (GeoCONAVI)
(1) Containing French keywords: 659,676 Tweets and 39,016 Flickr images
(2) Machine-learned relevance filter: 25,684 items left
(3) Geocoded and context enriched: 5,770 items left
(4) Clustered in space and time: 129 clusters with 2,682 items
(5) Second relevance filter: 11 clusters left with 469 items
2011 French forest fires, VGI clusters, and forest cover
EFFIS
Results online:
http://s-jrciprap258p/vgi/vgitwitter/
http://geocommons.com/maps/183605
Some preliminary results:
• Simple keyword queries suffice
• Additional Geo-coding indispensable
• Topicality and context filtering plus spatio-temporal clustering crucial
• Able to detect fires from Tweets and Flickr images by spatio-temporal clustering
• Relevance, credibility and overall quality vary greatly, thus more rules and human assessment needed
Thank you for your attention!
{Frank.Ostermann, Laura.Spinsanti}@jrc.ec.europa.eu