Discovery Systems Program

Barney Pell, Ph.D.
RIACS / NASA Ames Research Center
[email protected]

Presentation to IJCAI-2003 Workshop on Information Integration Using the Web

Outline of Talk

• Discovery Systems Program Context
  – NASA’s Computing, Information, and Communications Technology Program
  – NASA Program Funding Philosophy

• Discovery Systems Project
  – Project Overview
  – Exploratory Environments and Collaboration
  – Distributed Data Search, Access, and Analysis
  – Machine-Assisted Model Discovery and Refinement
  – Demonstrations, Applications, and Infusions

• Schedule and participation

FY02-FY08 CICT Overall Project Structure Phasing

(Chart: timeline from FY02 to FY08 showing the phasing of the CICT projects: Computing, Networking, and Information Systems; Space Communications; Information Technology Strategic Research; Intelligent Systems; Advanced Networking & Communications; Discovery Systems; Collaborative Decision Systems; Reliable Software; Advanced Computing; Adaptive Embedded Information Systems.)

CICT Project Definition: Existing Projects

• Intelligent Systems – Smarter, more adaptive systems and tools that work collaboratively with humans in a goal-directed manner to achieve the mission/science goals

• Computing, Networking and Information Systems – Seamless access to ground-, air-, and space-based distributed information technology resources

• Space Communications – Innovative technology products for space data delivery enabling high data rates, broad coverage, and internet-like data access

• Information Technology Strategic Research – Fundamental information, biologically-inspired, and nanoscale technologies for infusion into NASA missions

CICT Project Definition: Proposed FY05-FY07 New-Start Projects

• Collaborative Decision Systems (FY05) – Information technologies enabling improved decision making for science and exploration missions

• Discovery Systems (FY05) – Knowledge management and discovery technologies accelerating the scientific process and engineering analysis

• Advanced Networking and Communications (FY05) – Integrated, intelligent, deeply networked ground and in-space system technologies to enable the next generation of NASA Enterprise communication architectures

• Advanced Computing (FY05) – Advanced ground- and space-based computing technologies to enable NASA’s science and engineering activities

• Reliable Software (FY07) – Software development, verification, and validation technologies to maintain and increase the reliability of increasingly complex NASA operational and analysis software systems

• Adaptive Embedded Information Systems (FY07) – Embedded information systems capable of adapting to evolving mission science requirements, system health, and environmental factors in support of improved science return with reduced mission risk

Funding Philosophy

• Cross-cutting Information Technologies
• “As Only NASA Can”
• NASA Relevance
  – Future needs of NASA Enterprises
  – Would not be filled without funding by NASA
• Research Excellence
  – Competitive Evaluation
• Technology Maturity Spectrum
  – Breakthrough research
  – Demonstrations of capability
  – Selective infusions for NASA-relevant efforts
• Milestones and Metrics
  – Failable
  – “So-what”-able

Discovery Systems Project Overview

• Objective
  – Create and demonstrate new discovery and analysis technologies
  – Make them easier to use
  – Extend them to complex problems in massive, distributed, diverse data
  – Thereby enable scientists and engineers to solve increasingly complex interdisciplinary problems in future data-rich environments

• Subprojects
  – Exploratory Environments and Collaboration
  – Distributed Data Search, Access, and Analysis
  – Machine-Assisted Model Discovery and Refinement
  – Demonstrations, Applications, and Infusions

Discovery Systems Project: WBS Technology Elements

– Distributed data search, access, and analysis
  • Grid-based computing and services
  • Information retrieval
  • Databases
  • Planning, execution, agent architecture, multi-agent systems
  • Knowledge representation and ontologies

– Machine-assisted model discovery and refinement
  • Information and data fusion
  • Data mining and machine learning
  • Modeling and simulation languages

– Exploratory environments and collaboration
  • Visualization
  • Human-computer interaction
  • Computer-supported collaborative work
  • Cognitive models of science

Discovery Systems Before/After

For each technical area: the state at the start of the project, and after 5 years.

Distributed Data Search, Access, and Analysis
– Start of project: Answering queries requires specialized knowledge of the content, location, and configuration of all relevant data and model resources. Solution construction is manual.
– After 5 years: Search queries are based on high-level requirements. Solution construction is mostly automated and accessible to users who are not specialists in all elements.

Machine Integration of Data / QA
– Start of project: Publishing a new resource takes 1-3 years. Assembling a consistent heterogeneous dataset takes 1-3 years. Automated data quality assessment uses limits and rules.
– After 5 years: Publishing a new resource takes 1 week. Consistent heterogeneous datasets are assembled in real time. Automated data quality assessment uses world models and cross-validation.

Machine-Assisted Model Discovery and Refinement
– Start of project: Physical models have hidden assumptions and legacy restrictions. Machine learning algorithms are separate from simulations, instrument models, and data manipulation codes.
– After 5 years: Prediction and estimation systems integrate models of the data collection instruments, simulation models, and observational data formatting and conditioning capabilities. Predictions and estimates come with known certainties.

Exploratory Environments and Collaboration
– Start of project: Co-located interdisciplinary teams jointly visualize multi-dimensional preprocessed data or ensembles of running simulations on wall-sized matrixed displays.
– After 5 years: Distributed teams visualize and interact with intelligently combined and presented data from such sources as distributed archives, pipelines, simulations, and instruments in networked environments.
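To make the contrast in the Machine Integration of Data / QA row concrete, here is a minimal Python sketch of the two styles of automated quality assessment. The `Reading` record, station names, limits, and reference values are all made up for illustration; a real system would take the reference from a world model or an independent instrument rather than a hard-coded dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One observation from an instrument (hypothetical record layout)."""
    station: str
    value: float


def check_limits(readings, low, high):
    """Limits-and-rules QA: flag values outside a fixed plausible range."""
    return [r for r in readings if not (low <= r.value <= high)]


def cross_validate(readings, reference, tolerance):
    """Cross-validation QA: flag readings that disagree with an independent
    estimate (e.g. a model prediction or a second instrument) by more than
    `tolerance`. Stations with no reference value are not flagged."""
    flagged = []
    for r in readings:
        expected = reference.get(r.station)
        if expected is not None and abs(r.value - expected) > tolerance:
            flagged.append(r)
    return flagged


# Illustrative data only: B exceeds the fixed limit but agrees with the
# reference; C passes the limit check but disagrees with the reference.
READINGS = [Reading("A", 12.0), Reading("B", 55.0), Reading("C", 18.0)]
REFERENCE = {"A": 12.5, "B": 54.0, "C": 9.0}

print([r.station for r in check_limits(READINGS, -40.0, 50.0)])    # ['B']
print([r.station for r in cross_validate(READINGS, REFERENCE, 2.0)])  # ['C']
```

The two checks disagree on purpose: a fixed limit can reject a plausible extreme value and accept a wrong but plausible one, which is one motivation for checking data against models and other sources.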

Distributed Search, Access and Analysis

• Objective
  – Develop and demonstrate technologies that enable investigating interdisciplinary science questions by finding, integrating, and composing models and data from distributed archives, pipelines, running simulations, and running instruments.
  – Support interactive, complex query formulation with constraints and goals in the queries, and resource-efficient, intelligent execution of these tasks in a resource-constrained environment.
  – Milestone: Enable novel what-if and predictive question answering
    • Across NASA’s complex and heterogeneous data and simulations
    • By non-data-specialists
    • Using world knowledge and metadata
    • Supporting query formulation and resource discovery
    • Example query: “Within 20%, what will be the water runoff in the creeks of the Comanche National Grassland if we seed the clouds over southern Colorado in July and August next year?”
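The resource-discovery step in the milestone can be illustrated with a small metadata match. The Python sketch below assumes a hypothetical catalogue in which each resource advertises the variable it provides, a lon/lat bounding box, and a time range; the provider names echo the next slide, but their coverage values and the query box are invented. Reasoning over world knowledge (for example, which variables a runoff question depends on) is not modeled; the needed variables are simply passed in.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Resource:
    """Hypothetical catalogue entry describing one data or model resource."""
    name: str
    variable: str   # what the resource provides, e.g. "rainfall"
    bbox: tuple     # (min_lon, min_lat, max_lon, max_lat)
    start: date
    end: date


def boxes_overlap(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover(catalogue, variables, bbox, start, end):
    """Map each requested variable to the resources whose spatial and
    temporal coverage intersects the query; an empty list marks a gap."""
    hits = {v: [] for v in variables}
    for r in catalogue:
        if (r.variable in hits
                and boxes_overlap(r.bbox, bbox)
                and r.start <= end and start <= r.end):
            hits[r.variable].append(r.name)
    return hits


# Invented entries: coordinates and dates are illustrative, not real coverage.
CATALOGUE = [
    Resource("NWS rainfall", "rainfall", (-5, -5, 5, 5),
             date(1990, 1, 1), date(2003, 12, 31)),
    Resource("USGS topography", "topography", (-20, -20, 20, 20),
             date.min, date.max),  # static dataset: treat as always valid
    Resource("DAAC snow coverage", "snow coverage", (50, 50, 60, 60),
             date(2000, 1, 1), date(2003, 12, 31)),
]

print(discover(CATALOGUE, ["rainfall", "topography", "snow coverage"],
               (0, 0, 10, 10), date(2003, 7, 1), date(2003, 8, 31)))
# {'rainfall': ['NWS rainfall'], 'topography': ['USGS topography'], 'snow coverage': []}
```

An empty list (here, snow coverage) is the kind of gap a query-formulation assistant would need to report back to a non-specialist user.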

Terrestrial Biogeoscience Involves Many Complex Processes and Data

(Figure: diagram linking climate, chemistry, biogeophysics, biogeochemistry, hydrology, phenology, vegetation dynamics, ecosystems, watersheds, and disturbance processes across time scales from minutes-to-hours through days-to-weeks to years-to-centuries.)

(Courtesy Tim Killeen and Gordon Bonan, NCAR)

Solution Construction via Composing Models

(Figure: a runoff prediction assembled from separately maintained models and data sources. Labels include a runoff model, an evaporation model, a climate model, snow melt, snow coverage from the Snow and Ice DAAC (NASA), rainfall from the National Weather Service, topography from USGS, data preparation, metadata, binary data streams, parameterized and modeled phenomena, and the surface water community. Each component sits behind a service interface: required inputs, provided outputs, data descriptions, events.)

Each model typically has a community of experts that deals with the complexity of the model and its environment.
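The service-interface idea on this slide (each model declares required inputs and provided outputs) is what makes automated composition possible. One simple way to sketch it is greedy backward chaining: start from the requested product and recursively pick a service that provides each missing input. The service names below borrow labels from the figure, but the wiring between them is hypothetical, and the planner ignores data descriptions, events, cost, and alternative providers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical service interface: required inputs and provided outputs."""
    name: str
    inputs: frozenset
    outputs: frozenset


def compose(goal, services, available):
    """Return service names in run order so that `goal` gets produced.

    Greedy backward chaining: each missing product is obtained from the first
    service that lists it as an output (no backtracking over alternatives).
    Raises ValueError if a product cannot be produced or a cycle is found.
    """
    have = set(available)  # products that exist now or will once planned steps run
    plan = []

    def need(product, pending):
        if product in have:
            return
        if product in pending:
            raise ValueError(f"cyclic dependency on {product!r}")
        provider = next((s for s in services if product in s.outputs), None)
        if provider is None:
            raise ValueError(f"no data or service provides {product!r}")
        for inp in sorted(provider.inputs):
            need(inp, pending | {product})
        plan.append(provider.name)
        have.update(provider.outputs)

    need(goal, frozenset())
    return plan


# Hypothetical wiring, loosely named after the figure's components.
SERVICES = (
    Service("data preparation", frozenset({"raw topography"}),
            frozenset({"topography grid"})),
    Service("climate model", frozenset({"boundary conditions"}),
            frozenset({"temperature"})),
    Service("snow melt model", frozenset({"snow coverage", "temperature"}),
            frozenset({"snow melt"})),
    Service("evaporation model", frozenset({"temperature"}),
            frozenset({"evaporation"})),
    Service("runoff model",
            frozenset({"rainfall", "snow melt", "evaporation", "topography grid"}),
            frozenset({"runoff"})),
)
AVAILABLE = {"rainfall", "snow coverage", "raw topography", "boundary conditions"}

print(compose("runoff", SERVICES, AVAILABLE))
# ['climate model', 'evaporation model', 'snow melt model', 'data preparation', 'runoff model']
```

Greedy selection keeps the sketch short; a fuller planner would backtrack over alternative providers and weigh cost and data quality.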

Virtual Data Grid Example

(Figure: sequence diagram among the Application, Metadata Catalogue, Materialized Data Catalogue, Virtual Data Catalogue, Abstract Planner (for materializing data), Concrete Planner (generates workflow), Grid workflow engine, Grid compute resources, Data Grid replica services, and Grid storage resources. The application needs three data types of interest: the first is derived from the second, which is derived from the third, which is primary data; interactions and operations proceed left to right. Messages include: contact the Materialized Data Catalogue for data that is already known; ask the Abstract Planner to materialize missing data; obtain from the Virtual Data Catalogue how to generate it (its PERS) and an estimate for generating it; confirm whether to proceed; have the Concrete Planner produce the exact steps as a workflow; resolve LFNs to PFNs; materialize with the PERS; store an archival copy if requested and record the existence of cached copies; and notify the application that the data exists, with its LFN.)

LFN = logical file name; PFN = physical file name; PERS = prescription for generating unmaterialized data.

As illustrated, it is easy to deadlock without QoS (quality of service) and SLAs (service-level agreements).
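The core logic of the virtual data example (reuse data if it is already materialized, otherwise derive it from its prescription, materializing inputs first) can be sketched in a few lines of Python. The dictionaries stand in for the Materialized Data Catalogue and the Virtual Data Catalogue's PERS entries, and `run` stands in for planning and executing the workflow; the product names and LFN strings are hypothetical. Replica resolution (LFN to PFN), cost estimates, the go-ahead step, and QoS/SLA handling are all omitted.

```python
def materialize(name, materialized, recipes, run):
    """Return the LFN of data product `name`, deriving it if necessary.

    materialized: dict name -> LFN (stand-in for the Materialized Data Catalogue)
    recipes: dict name -> (transformation, [input names]) (stand-in for PERS)
    run: callable(transformation, input_lfns) -> new LFN (stand-in for the
         concrete planner plus grid workflow engine)
    """
    if name in materialized:          # already exists: hand back its LFN
        return materialized[name]
    if name not in recipes:
        raise KeyError(f"{name!r} is neither materialized nor derivable")
    transformation, inputs = recipes[name]
    # Materialize inputs first (depth-first), so derived-from-derived chains work.
    input_lfns = [materialize(i, materialized, recipes, run) for i in inputs]
    lfn = run(transformation, input_lfns)
    materialized[name] = lfn          # record the new copy for later requests
    return lfn


# Hypothetical chain: product_a is derived from product_b, which is derived
# from raw_c (primary data that is already materialized).
CALLS = []


def fake_run(transformation, input_lfns):
    """Pretend to execute a workflow step and return a made-up LFN."""
    CALLS.append(transformation)
    return f"lfn:{transformation}({','.join(input_lfns)})"


CATALOGUE = {"raw_c": "lfn:raw_c"}
RECIPES = {
    "product_a": ("derive_a", ["product_b"]),
    "product_b": ("derive_b", ["raw_c"]),
}

print(materialize("product_a", CATALOGUE, RECIPES, fake_run))
# lfn:derive_a(lfn:derive_b(lfn:raw_c))
print(CALLS)  # ['derive_b', 'derive_a']
```

A second request for product_a returns the recorded LFN without re-running any step.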

Machine-Assisted Model Discovery and Refinement

• Develop and demonstrate methods to
  – assist the discovery and fitting of physically descriptive models with quantifiable uncertainty for estimation and prediction
  – improve the use of observational or experimental data for simulation and assimilation applied to distributed instrument systems (e.g., a sensor web)
  – integrate instrument models with physical domain modeling and with other instruments (fusion) to quantify error, correct for noise, and improve estimates and instrument performance

• Example metrics
  – 50% reduction in scientist time spent forming models
  – 10% reduction in uncertainty in parameter estimates, or a 10% reduction in effort to achieve current accuracies
  – 10% reduction in computational costs associated with a forward model
  – Ability to process data on the order of 1000s of dimensions
  – Ability to estimate parameters from tera-scale data
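As a small, concrete instance of fitting a model with quantifiable uncertainty, the Python sketch below does an ordinary least-squares fit of a straight line and reports a standard error for each parameter. The straight-line model and sample data are illustrative stand-ins, not anything from the program; a metric such as a reduction in parameter-estimate uncertainty would be tracked on quantities like `se_b`. The standard errors assume independent errors with constant variance.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x.

    Returns (a, b, se_a, se_b). The residual variance is estimated with
    n - 2 degrees of freedom, so at least 3 points are required.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate uncertainty")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    s2 = ssr / (n - 2)                                        # residual variance estimate
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return a, b, se_a, se_b


print([round(v, 4) for v in fit_line([0, 1, 2], [0, 1, 1])])
# [0.1667, 0.5, 0.3727, 0.2887]
```

More data or lower-noise data shrinks `se_a` and `se_b`, which is the sense in which such uncertainties can serve as a measurable target.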

A reasonable 15-month prediction of the 97/98 El Nino is achieved when ocean height, temperature, and surface wind data are combined to initialize the model.

(Figure: predicted precipitation for January-March (JFM) 1998, with time-axis labels 1997 and 1999.)

Prediction of the 97/98 El Nino

Observing System of the Future

(Figure: advanced sensors and a sensor web linked through information flows, information synthesis, and access to knowledge to the user community. Partners: NASA, DoD, other government, commercial, international.)

Exploratory Environments and Collaboration

• Objective
  – Develop exploratory environments in which interdisciplinary and/or distributed teams visualize and interact with intelligently combined and presented data from such sources as distributed archives, pipelines, simulations, and instruments in networked environments.
  – Demonstrate that these environments measurably improve scientists’ capability to answer questions, evaluate models, and formulate follow-on questions and predictions.

Multi-parameter Explorations

Conclusion

• Discovery Systems Program
  – Exciting NASA funding program
    • Follow-on to CNIS and IS/IDU
    • ~$250M total over 5 years
  – Information integration is highly relevant
  – Focus on NASA needs, but these are challenging

• Program funding starts in FY 2005
  – Targeting funding for the external community in FY05
    • So a broad call is likely sometime in FY04

• We’d like your help
  – Technical workshops in FY04
  – Advisors wanted for planning teams
  – Submissions to funding calls
  – Reviewers