View
212
Download
0
Tags:
Embed Size (px)
Citation preview
Karsten SteinhaeuserUniversity of Minnesota
joint work withNitesh Chawla & Auroop Ganguly
12th International Symposium onSpatial and Temporal Databases
Minneapolis, MN
August 24, 2011
Comparing Predictive Power in Climate Data: Clustering Matters
08/24/2011 University of Minnesota 2
Outline
• Motivation
• Networks Primer
• From Data to Networks
• Motivating Networks in Climate Science
• Descriptive Analysis and Predictive Modeling
• Empirical Evaluation & Comparison
• Conclusions
08/24/2011 University of Minnesota 3
Mining Complex Data
• Complex spatio-temporal data pose unique challenges
• Tobler’s First Law of Geography:
“Everything is related, but near
things more than distant.”– But are all near things equally related?
– Are there phenomena explained by
interactions among distant things?
(teleconnections)
08/24/2011 University of Minnesota 4
Networks Primer
What is a Network?
• Oxford English Dictionary:network, n.: Any netlike or complexsystem or collection of interrelatedthings, as topographical features,lines of transportation, ortelecommunications routes(esp. telephone lines).
• My working definition:Any set of items that are connected or related to each other.(“items” and “connections” can be concrete or abstract)
08/24/2011 University of Minnesota 5
Networks Primer
Community Detection in Networks
• Identify groups of nodes
that are relatively more tightly
connected to each other than
to other nodes in the network
• Computationally challenging
problem for real-world networks
08/24/2011 University of Minnesota 6
From Data to Networks
• Networks are pervasive in
social science, technology,
and nature
• Many datasets explicitly
define network structure
• But networks can also represent other types of data,
framework for identifying relationships, patterns, etc.
08/24/2011 University of Minnesota 7
Motivating Networks in Climate
Uncertainty derives from many known and
often many more unknown sources
08/24/2011 University of Minnesota 8
Motivating Networks in Climate
• Projections of climate
rely on many factors
– Understanding of the
physical processes
– Ability to implement
this understanding in
computational models
– Assumptions about
the futureSource: IPCC SRES and AR-4
08/24/2011 University of Minnesota 9
Motivating Networks in Climate
• Some processes well-understood and modeled,
others much less credible
• Comparison to observations shows varying skills
08/24/2011 University of Minnesota 10
Motivating Networks in Climate
• Models cannot capture some features/processes
• Comparison to observations illustrates severe
geographic variability, topographic bias
08/24/2011 University of Minnesota 11
Motivating Networks in Climate
Research Question:
Can we characterize the credible variables,
identify relationships to the relatively less
credible variables, and leverage them to
improve or refine our understanding?
Answer:
Stay Tuned…
08/24/2011 University of Minnesota 12
Basic OLAPCapabilities
Observations
GCM outputs
ARM Data
Data Storage & Management
Knowledge Discovery
HPSS
Oracle DB
Complex Networks
Data Mining
System Inputs
HPCCLens - Jaguar
High-Performance Computing
Visualization
ArcGIS
PowerWall
System Outputs
Novel Insights
Decision Support
Knowledge Discovery for Climate
08/24/2011 University of Minnesota 13
Historical Climate Data
• NCEP/NCAR Reanalysis (proxy for observation)
• Monthly for 60 years (1948-2007) on 5ºx5º grid
• Seven variables:Sea surface temperature (SST)Sea level pressure (SLP)Geopotential height (GH)Precipitable water (PW)Relative Humidity (RH)Horizontal wind speed (HWS)Vertical wind speed (VWS)
De-S
easonalize
Raw Data
AnomalySeries
08/24/2011 University of Minnesota 14
Network Construction
• View global climate system as a collection of interacting oscillators[Tsonis & Roebber, 2004]
– Vertices represent locations in space
– Edges denote correlation in variability
• Link strength estimated by correlation, low-weight edges are pruned from the network
• Construct networks only for
locations over the oceans
– Relatively better captured by models
08/24/2011 University of Minnesota 15
Geographic Properties
• Examine network structure in spatial context
– Link lengths computed as great-circle distance
– Compare autocorrelation / de-correlation lengths for
different variables, interpret within the domain
Sea Level Pressure Precipitable Water Vertical Wind Speed
Autocorrelation
Teleconnection
08/24/2011 University of Minnesota 16
Clustering Climate Networks
• Apply community detection to partition networks– Use Walktrap algorithm
[Pons & Latapy, 2006]
– Efficient and works well for dense networks
• Visualize spatial pattern using GIS tools
• Cluster structure suggests relationships within the climate system
Sea Level Pressure
Precipitable Water
08/24/2011 University of Minnesota 17
Update!
Research Question:
Can we characterize the credible variables,
identify relationships to the relatively less
credible variables, and leverage them to
improve or refine our understanding?
Revised Answer:
Yes… but that’s not all.
08/24/2011 University of Minnesota 18
Descriptive Predictive
• Network representation is able to capture interactions, reveal patterns in climate– Validate existing assumptions / knowledge
– Suggest potentially new insights or hypothesesfor climate science
• Want to extract the relationships between atmospheric dynamics over ocean and land– i.e., “Learn” physical phenomena from the data
08/24/2011 University of Minnesota 19
Predictive Modeling
• Use network clusters as candidate predictors
• Create response variables for target regions
around the globe (illustrated below)
• Build regression
model relating
ocean clusters
to land climate
08/24/2011 University of Minnesota 20
Illustrative Example
• Predictive model for air temperature in Peru
– Long-term variability highly predictable due to
well-documented relation to El Nino
• Small number of clusters have majority of skill
– Feature selection (blue line) improves predictionsRaw DataAll ClustersFeature Selection
08/24/2011 University of Minnesota 21
Update!
Research Question:
Can we characterize the credible variables,
identify relationships to the relatively less
credible variables, and leverage them to
improve or refine our understanding?
Revised Answer:
Yes and Yes… but wait, there’s more.
08/24/2011 University of Minnesota 22
Results on Train/Test
08/24/2011 University of Minnesota 23
Predictive Skill
08/24/2011 University of Minnesota 24
Update!
Research Question:
Can we characterize the credible variables,
identify relationships to the relatively less
credible variables, and leverage them to
improve or refine our understanding?
Revised Answer:
Yes, Yes, and Yes.
08/24/2011 University of Minnesota 25
Variations / Extensions
• Compare network approach to traditional
clustering methods
– k-means, k-medoids, spectral, EM, etc.
• Compare different types of predictive models
– (linear) regression, regression trees, neural nets,
support vector regression
08/24/2011 University of Minnesota 26
Compare Clustering Methods
08/24/2011 University of Minnesota 27
Compare Predictive Models
08/24/2011 University of Minnesota 28
Refining Model Projections
Conclusions
• Networks capture behavior of the climate system
• Clusters (or “communities”) derived from these
networks have useful predictive skill
– Statistically significantly better than predictors based
on clusters derived using traditional methods
• Potential for advancing climate science
– Understanding of physical processes
– Complement climate model simulations
08/24/2011 University of Minnesota 29
Upcoming Events
1. First International Workshop on Climate Informatics, New York, NY, August 26, 2011http://www.nyas.org/climateinformatics
2. NASA Conference on Intelligent Data Understanding (CIDU), Mountain View, CA, Oct 19-21, 2011https://c3.ndc.nasa.gov/dashlink/projects/43/
3. IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Vancouver, Canada, December 10, 2011http://www.nd.edu/~dial/climkd11/
08/24/2011 University of Minnesota 30
08/24/2011 University of Minnesota 31
Thanks & Questions
Contact
Personal Homepage
http://www.nd.edu/~ksteinha
NSF Expeditions on
Understanding Climate Change
http://climatechange.cs.umn.edu
This work was supported in part by the National Science Foundation
under Grants OCI-1029584 and BCS-0826958. This research was
also funded in part by the project entitled “Uncertainty Assessment
and Reduction for Climate Extremes and Climate Change Impacts”
under the initiative “Understanding Climate Change Impact: Energy,
Carbon, and Water Initiative” within the Laboratory Directed
Research and Development (LDRD) Program of the Oak Ridge
National Laboratory, managed by UT-Battelle, LLC for the U.S.
Department of Energy under Contract DE-AC05-00OR22725.