Indiana University School of Informatics
The LEAD Gateway
Dennis Gannon, Beth Plale, Suresh Marru, Marcus Christie
School of Informatics, Indiana University
Overview
• The LEAD ITR Project
  – Science Objectives
  – Adaptive CyberInfrastructure for Mesoscale Storm Prediction
• A tour of the LEAD project
  – Components of our approach to Data and Data Driven Adaptive Workflow
• Experience so far.
• The Gateway Lifecycle
Predicting Storms
• Hurricanes and tornadoes cause massive loss of life and damage to property
• Underlying physical systems involve highly non-linear dynamics, so prediction is computationally intense
• Data comes from multiple sources
  – “Real time” data derived from streams of sensor data
  – Archived data in databases of past storms
• Infrastructure challenges:
  – Data mine instrument radar data for storms
  – Allocate supercomputer resources automatically to run forecast simulations
  – Monitor results and retarget instruments.
  – Log provenance and metadata about experiments for auditing.
Traditional Methodology
[Figure: stages of the forecast process]
• STATIC OBSERVATIONS: Radar Data; Mobile Mesonets; Surface Observations; Upper-Air Balloons; Commercial Aircraft; Geostationary and Polar Orbiting Satellite; Wind Profilers; GPS Satellites
• Analysis/Assimilation: Quality Control; Retrieval of Unobserved Quantities; Creation of Gridded Fields
• Prediction/Detection: PCs to Teraflop Systems
• Product Generation, Display, Dissemination
• End Users: NWS; Private Companies; Students
The Process is Entirely Serial and Static (Pre-Scheduled): No Response to the Weather!
The LEAD Vision: Adaptive Cyberinfrastructure
[Figure: stages of the forecast process]
• DYNAMIC OBSERVATIONS
• Analysis/Assimilation: Quality Control; Retrieval of Unobserved Quantities; Creation of Gridded Fields
• Prediction/Detection: PCs to Teraflop Systems
• Product Generation, Display, Dissemination
• End Users: NWS; Private Companies; Students
• Models and Algorithms Driving Sensors
The CS challenge: Build cyberinfrastructure services that provide adaptability, scalability, availability, usability, and real-time response.
Change the Paradigm
• To make fundamental advances we need:
  – Adaptivity in computational model.
• But also Cyberinfrastructure to:
  – Execute complex scenarios in response to weather events
    • Stream processing, triggers
    • Close loop with the instruments.
  – Acquire computational resources on demand.
    • Need supercomputer-scale resources
    • Invoked in response to weather events
  – Deal with data deluge
    • User can no longer manage his/her own experiment products
The LEAD Gateway Portal
• To support three classes of users
  – Meteorology research scientists & grad students.
  – Undergrads in meteorology classes
  – People who want easy access to weather data.
Go to: http://www.leadproject.org
Gateway Components
• A Framework for Discovery
  – Four basic components
• Data Discovery
  – Catalogs and index services
• The experiment
  – Computational workflow managing on-demand resources
• Data analysis and visualization
• Data product preservation
  – Automatic metadata generation and experimental data provenance.
Data Search
• Select a region and a time range and desired attributes
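A search of this kind can be sketched as a filter over catalog records. This is a minimal illustration, not the LEAD catalog interface: the `Record` fields, the `search` function, and the sample entries are all hypothetical, and the simple bounding box assumes the region does not cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """One catalog entry (hypothetical fields, for illustration only)."""
    uid: str
    lat: float
    lon: float
    time: datetime
    attributes: frozenset


def search(catalog, south, north, west, east, start, end, wanted):
    """Return records inside the box and time range that carry every wanted attribute."""
    wanted = frozenset(wanted)
    return [
        r for r in catalog
        if south <= r.lat <= north
        and west <= r.lon <= east
        and start <= r.time <= end
        and wanted <= r.attributes  # subset test: all wanted attributes present
    ]


# Made-up sample catalog.
catalog = [
    Record("obs-1", 35.2, -97.4, datetime(2007, 5, 4, 18), frozenset({"reflectivity"})),
    Record("obs-2", 39.2, -86.5, datetime(2007, 5, 4, 18), frozenset({"reflectivity", "velocity"})),
    Record("obs-3", 35.5, -97.0, datetime(2006, 5, 4, 18), frozenset({"reflectivity"})),
]
hits = search(catalog, 34.0, 37.0, -99.0, -96.0,
              datetime(2007, 5, 1), datetime(2007, 5, 31), {"reflectivity"})
print([r.uid for r in hits])  # only obs-1 satisfies region, time, and attribute
```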
Portal: Experimental Data & Metadata Space
• CyberInfrastructure extends user’s desktop to incorporate vast data analysis space.
• As users go about doing scientific experiments, the CI manages back-end storage and compute resources.
  – Portal provides ways to explore this data and search and discover it.
• Metadata about experiments is largely automatically generated, and highly searchable.
  – Describes data object (the file) in application-rich terms, and provides URI to data service that can resolve an abstract unique identifier to real, on-line data “file”.
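The split between searchable metadata and a resolver for abstract identifiers can be illustrated with a toy in-memory service. This is only a sketch under assumptions: the `DataService` class, its method names, the `urn:example:` identifiers, and the file locations are hypothetical and do not describe the actual LEAD data service.

```python
class DataService:
    """Toy in-memory stand-in for a metadata catalog plus name resolver."""

    def __init__(self):
        self._entries = {}  # abstract id -> (location, metadata dict)

    def register(self, uid, location, **metadata):
        """Record where a data object lives and how it is described."""
        self._entries[uid] = (location, metadata)

    def find(self, **terms):
        """Return the ids whose metadata matches every given term."""
        return sorted(
            uid for uid, (_, md) in self._entries.items()
            if all(md.get(key) == value for key, value in terms.items())
        )

    def resolve(self, uid):
        """Map an abstract id to the current location of the data."""
        if uid not in self._entries:
            raise LookupError(f"unknown data object: {uid}")
        return self._entries[uid][0]


service = DataService()
# Identifiers, locations, and metadata terms below are invented for the example.
service.register("urn:example:run42:forecast", "file:///archive/run42/forecast.nc",
                 product="forecast", grid_km=2)
service.register("urn:example:run42:radar", "file:///archive/run42/radar.nc",
                 product="radar")

ids = service.find(product="forecast")
location = service.resolve(ids[0])
print(ids, location)
```

Because users hold only the abstract identifier, the back end is free to move the file and update the mapping without invalidating saved references.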
Workflow: Composing Computational Tools to build new Tools
• Workflow is a term that describes the process of moving data through a sequence of analysis and transformational steps to achieve a goal.
• Another Paradigm Shift for the users.
• Each activity a user initiates in LEAD is an Experiment which consists of
  – Data discovery and collection.
  – Applied analysis and transformation
    • A graph of activities (workflow)
  – Curated data products and results
• Each workflow activity is logged using an event system and stored as metadata in the user’s workspace.
  – Provides a complete provenance of work.
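The idea of a workflow as a graph of activities, with one logged event per activity, can be sketched as follows. This is an illustrative toy, not the LEAD workflow engine: `run_workflow`, the step names, and the event format are assumptions made for the example.

```python
from collections import deque


def run_workflow(steps, deps, data, log):
    """Run steps in dependency order, appending start/finish events to log.

    steps: name -> function that takes and returns a dict of data products
    deps:  name -> list of step names that must finish first
    """
    remaining = {name: set(deps.get(name, [])) for name in steps}
    ready = deque(sorted(name for name, waits in remaining.items() if not waits))
    done = []
    while ready:
        name = ready.popleft()
        log.append({"activity": name, "event": "started"})
        data = steps[name](data)
        log.append({"activity": name, "event": "finished", "products": sorted(data)})
        done.append(name)
        # Release any step whose last outstanding dependency just finished.
        for other in sorted(remaining):
            if name in remaining[other]:
                remaining[other].discard(name)
                if not remaining[other]:
                    ready.append(other)
    if len(done) != len(steps):
        raise ValueError("workflow graph has a cycle or a missing dependency")
    return data, done


# Hypothetical three-step experiment.
steps = {
    "ingest": lambda d: {**d, "observations": "raw"},
    "assimilate": lambda d: {**d, "gridded_fields": "analysis"},
    "forecast": lambda d: {**d, "forecast": "model output"},
}
deps = {"assimilate": ["ingest"], "forecast": ["assimilate"]}
events = []
products, order = run_workflow(steps, deps, {}, events)
print(order)
print(len(events), "events logged")
```

The event list is the provenance record: replaying it shows which activities ran, in what order, and which data products existed after each one.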
The Experiment Builder
• A Portal “wizard” that leads the user through the set-up of a workflow
• Asks the user:
  – “Which workflow do you want to run?”
• Once this is known, it can prompt the user for the required input data sources
• Then it “launches” the workflow.
Gateway Support for Adaptive Queries
LEAD requires ability to construct workflows that are
• Data Driven
  – Weather data streams define nature of computation
• Persistent and Agile
  – Data mining of a data stream detects an “interesting” feature; the event triggers a workflow scenario that has been waiting for months.
• Adaptive
  – In response to weather: weather changes.
  – Nature of workflow may have to change on-the-fly.
  – Resource and requirements change.
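A persistent, data-driven trigger of the kind described above can be sketched as a generator that sits on a stream and launches a workflow whenever a reading looks interesting. Everything here is hypothetical: the `watch` function, the reading fields, the site names, and the 50.0 threshold are invented for illustration and are not taken from LEAD.

```python
def watch(stream, is_interesting, launch):
    """Consume readings as they arrive; launch a workflow for each interesting one."""
    for reading in stream:
        if is_interesting(reading):
            yield launch(reading)


# Made-up readings standing in for a live sensor stream.
readings = [
    {"site": "radar-a", "reflectivity_dbz": 22.0},
    {"site": "radar-b", "reflectivity_dbz": 58.5},
    {"site": "radar-a", "reflectivity_dbz": 31.0},
]
launched = list(watch(
    iter(readings),
    is_interesting=lambda r: r["reflectivity_dbz"] >= 50.0,  # arbitrary example threshold
    launch=lambda r: f"forecast workflow centered on {r['site']}",
))
print(launched)
```

Because `watch` is lazy, it can wait on an unbounded stream indefinitely and only does work when an event arrives, which matches the "waiting for months" scenario in spirit.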
Experience with on-demand computing
• We use TeraGrid.
  – Actually “best effort” and not yet “on demand”
  – Use Grid technology for remote job execution and security.
• Reliability is critical.
• Workflow can automatically resubmit a failed task to another resource
• Urgent Computing handled by the Spruce Gateway.
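Resubmitting a failed task to another resource can be sketched as a loop over candidate resources. This is a minimal illustration, not the Gateway's actual mechanism: `run_with_failover`, `fake_submit`, and the resource names are hypothetical, and a real submission call would go through Grid middleware rather than a local function.

```python
def run_with_failover(task, resources, submit):
    """Try each resource in order; return the first result plus the failures seen."""
    failures = []
    for resource in resources:
        try:
            result = submit(task, resource)
        except RuntimeError as err:
            # Record the failure and move on to the next resource.
            failures.append((resource, str(err)))
        else:
            return result, failures
    raise RuntimeError(f"{task} failed on every resource: {failures}")


def fake_submit(task, resource):
    """Stand-in for remote job submission; the first cluster always fails."""
    if resource == "cluster-a":
        raise RuntimeError("node failure")
    return f"{task} completed on {resource}"


result, failures = run_with_failover(
    "forecast-step", ["cluster-a", "cluster-b"], fake_submit)
print(result)
print(failures)
```

Returning the failure list alongside the result keeps the retry history available for the provenance record.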
Validating Scientific Discovery
• The Gateway is becoming part of the process of science by being an active repository of data provenance
• Disks are cheap, so why not record everything?
• The system records each computational experiment that a user initiates
  – A complete audit trail of the experiment or computation
  – Published results can include link to provenance information for repeatability and transparency.
Experience so far
• First release to support “WxChallenge: the new collegiate weather forecast challenge”
  – The goal: “forecast the maximum and minimum temperatures, precipitation, and maximum sustained wind speeds for select U.S. cities.
  – to provide students with an opportunity to compete against their peers and faculty meteorologists at 64 institutions for honors as the top weather forecaster in the nation.”
  – 79 “users” ran 1,232 forecast workflows generating 2.6 TBytes of data.
    • Over 160 processors were reserved on Tungsten from 10am to 8pm EDT (EST), five days each week
• National Spring Forecast
  – First use of user-initiated 2 km forecasts as part of that program. Generated serious interest from National Severe Storm Center.
• Integration with CASA project scheduled for final year of LEAD ITR.
The LEAD Gateway Lifecycle
• Work began in 2003 with requirements analysis by the LEAD meteorology and CS teams.
• First 2 years of development supported by LEAD ITR and NMI Portals project.
• Year 3 & 4 support of 2 FTE from TG.
  – Public Release March 2007.
• Current Status
  – A new production release in July 2007.
  – Last year of LEAD ITR: hardened version of the Gateway to transition to community support
    • UCAR - UNIDATA may be the host.
    • Extensive planning underway.