2008-02-11: EPA DataFed Presentation

Page 1: 2008-02-11: EPA DataFed Presentation

The Federated Data System: DataFed

• Non-intrusive data integration infrastructure
• Based on standards-based web services
• Processing tools created from reusable components

Page 2: 2008-02-11: EPA DataFed Presentation

Local, Regional, Global Pollution

Before 1950s: Local (Smoke, Fly ash)

1970s-1990s: Regional, LRTP (Acid Rain, Haze)

Post-2000s: Global, HTAP (Ozone, PM, Global Change)

• The LRTP/HTAP flow of air pollutants is likely to increase as overseas economies grow.

• Pollutant influx leads to significant exceedances of the O3 and PM NAAQS in some regions.

• Even after domestic controls, some US areas will be non-compliant because of LRTP.

Page 3: 2008-02-11: EPA DataFed Presentation

Coordinating Earth Observing Systems

[Diagram: Vantage Points and Capabilities]

Vantage Points:
• Far-Space: L1/HEO/GEO TDRSS & Commercial Satellites
• Near-Space: LEO/MEO Commercial Satellites and Manned Spacecraft
• Airborne: Aircraft/Balloon Event Tracking and Campaigns
• Terrestrial

Other labels: Deployable; Permanent; Forecasts & Predictions; User Community

Page 4: 2008-02-11: EPA DataFed Presentation

Air Quality/Public Health NTO: Integrated Observed-Modeled Air Quality Fields

[Diagram of data providers, products and users; recoverable labels listed below]

Satellite and lidar data:
• TERRA MODIS (~10:30 local overpass) and AQUA MODIS (~1:30 local overpass)
• GOES-12: CONUS every 30 minutes
• REALM: Continuous Vertical Resolution Data

Products:
• Aerosol Optical Depth (MOD04_L2)
• Aerosol Optical Depth (GASP)
• CMAQ Forecast Data
• CMAQ Assessment Data
• SLAMS/NAMS Surface PM2.5 Data
• Spatial surface Predictions; Satellite/Model/Surface Data Fusion
• Studies and Impacts to human health
• Algorithms/QA

Organizations:
• NASA GSFC Science Team; NASA GSFC DAAC
• NOAA NESDIS; NOAA NESDIS/ORA & CREST Institutes
• NOAA NWS
• ?UMBC; CREST Institute
• US EPA AQS; EPA OAR & ORD; US EPA OAQPS/ORD/OEI
• HHS CDC-EPHTN
• State & Local; Canadian Provinces; State Public Health Departments
• RSI Gateway

*Note: The Regional East Atmospheric Lidar Mesonet (REALM) is a university-led federated network headed by UMBC and is identified as an NTO in the implementation plan.

Page 5: 2008-02-11: EPA DataFed Presentation

P. Dickerson, EPA

Page 6: 2008-02-11: EPA DataFed Presentation

http://www.igospartners.org/

http://www.fz-juelich.de/icg/icg-ii/iagos/

http://www.fz-juelich.de/icg/icg-ii/mozaic/home

http://earthobservations.org/

http://www.epa.gov/ttn/amtic/monstratdoc.html

National Ambient Air Monitoring Strategy, Office of Air Quality Planning and Standards, Research Triangle Park, NC, December 2005

http://www.al.noaa.gov/AQRS/reports/monitoring.html

http://www.empa.ch/gaw/gawsis/

http://www.nesdis.noaa.gov/

http://www.cmdl.noaa.gov/

Stations shown: Barrow, Mauna Loa, Trinidad Head, A. Samoa, S. Pole

NCORE: L1, L2, L3

http://www.emep.int/

Network labels: GAW; CENR/AQRS; GEOSS; NOAA CMDL; NOAA NESDIS; EMEP

R. Scheffe

Page 7: 2008-02-11: EPA DataFed Presentation

The Scheffe Challenge: Organizations - Programs – Data: A Mess

Organizations: EPA, NOAA, NASA, NPS, USDA, DOE, CDC, Private Sector, States/Tribes, RPOs/Interstate, Academia, NARSTO, NAS, CAAAC, CASAC, OMB, Enviros

Programs: GEOSS, Eco-informatics, Accountability/indicators, SIPs, nat. rules, designations, PHASE, PM research, Risk/exposure assessments, AQ forecasting, NAAQS setting

Data sources: Supersites, IMPROVE, NCore, PM monit, PAMS, CASTNET, Lidar systems, NADP, Satellite data, Intensive studies, PM centers, CMAQ, GEOS-CHEM, Emissions, Meteorology, Health/mort. records, Other networks: SEARCH, IADN..

Info System Challenges:

• What’s the overall dependency?

• Information Flow

• Forces and Controls on Data Flow

• Cooperation, Competition, Co-Opetition

Page 8: 2008-02-11: EPA DataFed Presentation

Relationship Between Organizations - Programs – Data (Version 0.1)

Diagram labels: Public; Goals, $$; Info needs, $$; Data need, $$; Judge, Decide, Act; Analyze, Report; Measure, Organize; Organized Data; Actionable Knowledge; Decision, Action

Flow of Information

• Data systems organize the measurements and models and provide them to programs.

• Programs analyze the data and provide actionable knowledge to organizations.

• Organizations evaluate multiple information sources, make decisions and act.

Flow of Control

• The public and special interest groups set up organizations and provide them with funding.

• Organizations develop programs and define their scope, governance and funding.

• Programs satisfy their information needs by monitoring or by using others’ data.

• Data sources acquire the data for their parent programs and also expose them for reuse.

Page 9: 2008-02-11: EPA DataFed Presentation

System of Systems

Global Earth Observing System of Systems - GEOSS

Characteristics of System of Systems (SoS)

• Autonomous constituents managed/operated independently

• Independent evolution of each constituent

• SoS displays emergent behavior

Must recognize, manage, exploit the characteristics:

• No stakeholder has complete SoS insight

• Central control is limited; distributed control is essential

• Users must be involved throughout the life of a SoS

Page 10: 2008-02-11: EPA DataFed Presentation
Page 11: 2008-02-11: EPA DataFed Presentation

GEOSS Architecture and Interoperability

Page 12: 2008-02-11: EPA DataFed Presentation

Screencast: Information Landscape

Page 13: 2008-02-11: EPA DataFed Presentation

Screencast: Info System Screencast

Page 14: 2008-02-11: EPA DataFed Presentation

Screencast: DataFed Technologies

Page 15: 2008-02-11: EPA DataFed Presentation

Screencast: DataFed Tools

Page 16: 2008-02-11: EPA DataFed Presentation
Page 17: 2008-02-11: EPA DataFed Presentation

KMZ: Google Earth-DataFed Mashup

GA Smoke Global Chem

Page 18: 2008-02-11: EPA DataFed Presentation

The Transformational Effect of Networking

“Networking has led to an unprecedented surge of productivity” Time Magazine, Person of the Year 2006, YOU

• Information has become the main driver of progress

• Time and place are no longer barriers to participation and interaction

• The Web has become a medium of participation: the ‘Web 2.0’ phenomenon

• These are opportunities to enable Earth Science through more networking

• But much resistance to networking remains and needs to be overcome

Page 19: 2008-02-11: EPA DataFed Presentation

Networking Multiplies Value Creation

Enclosed Value-Creating Process - ‘Stovepipe’

[Diagram: one Data source feeding one Application]

1 User, Stovepipe: Value = 1 (1 Data x 1 Program = 1)

Page 20: 2008-02-11: EPA DataFed Presentation

Networking Multiplies Value Creation

[Diagram: one Data source feeding five Applications; the original Data-Application pair is the stovepipe]

1 User, Stovepipe: Value = 1 (1 Data x 1 Program = 1)

5 Uses of Data: Value = 5 (1 Data x 5 Programs = 5)

Page 21: 2008-02-11: EPA DataFed Presentation

Networking Multiplies Value Creation

[Diagram: five Data sources, each linked to five Applications; one Data-Application pair is marked as the stovepipe]

1 User, Stovepipe: Value = 1 (1 Data x 1 Program = 1)

5 Uses of Data: Value = 5 (1 Data x 5 Programs = 5)

Open Network: Value = 25 (5 Data x 5 Programs = 25)

Merging data may create new, unexpected opportunities

Not all data are equally valuable to all programs
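
The value counts on the last three slides can be reproduced by enumerating data-program pairings. This is only an illustrative sketch: the function name and the dataset/program labels are made up here, and it assumes every pairing is worth exactly 1, which the slide itself cautions is not true for all data and programs.

```python
from itertools import product


def network_value(datasets, programs):
    """Count the data-program pairings, assuming each pairing has a value of 1."""
    return len(list(product(datasets, programs)))


# Hypothetical labels, used only to make the counting concrete.
datasets = ["D1", "D2", "D3", "D4", "D5"]
programs = ["P1", "P2", "P3", "P4", "P5"]

stovepipe = network_value(datasets[:1], programs[:1])  # 1 Data x 1 Program
shared_data = network_value(datasets[:1], programs)    # 1 Data x 5 Programs
open_network = network_value(datasets, programs)       # 5 Data x 5 Programs

print(stovepipe, shared_data, open_network)  # 1 5 25
```

The count is an upper bound on value: it grows as the product of the two list lengths, while the real value depends on which pairings are actually useful.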

Page 22: 2008-02-11: EPA DataFed Presentation

The Future

• AQ Science, Management
– Pollutant Characterization (Obs-Model-Emission Integration)
– Agile monitoring and assessment

• GEOSS, Collaboration, Informatics
– The future is bright, too bright?
– So many new things, so little time

• DataFed
– Continue promoting standards-based networking
– Enable IS users to create new, actionable knowledge faster
– Move data flow maintenance from R/D to operational

Page 23: 2008-02-11: EPA DataFed Presentation

Integrated observation-modeling complex – R. Scheffe

Inputs: Land AQ Monitors; Total column depth (through Satellites); AQ model results; Vertical Profiles

Integrated Observation-Modeling

Optimized PM2.5, O3 Characterizations

Uses: Health; Air management; Ecosystems

Page 24: 2008-02-11: EPA DataFed Presentation

Pollutant Characterization, Understanding

• Characterization – creating the best available pollutant pattern as distributed in space-time-parameter

• Characterization – achievable by Reanalysis with the ‘best available’ model and assimilated observations

• Understanding – gained from the model processes and applying previous/tacit knowledge

Goal: Pollutant Characterization and Understanding

Models

Observations

Emissions

Reanalysis

Forward model with assimilated observations

Data Interpretation

Use of previous & tacit knowledge to explain data

GOAL:

Knowledge Creation

Characterization of pattern; understanding of processes

Characterization

Page 25: 2008-02-11: EPA DataFed Presentation

NAAMS: National Ambient Air Monitoring Strategy and NCore

…coordinated multi-pollutant real-time monitoring network

Page 26: 2008-02-11: EPA DataFed Presentation

Public Information

Health/Exp. Assessment

Emissions Planning

AQ Trends and Accountability

Science Support

NAAQS

National Air Quality Information Integration

AQ Data Pool

National Air Quality Info Network

Re-examination of Data Access and processing Systems

Pooling of data/info resources for re-use in multiple applications; a la GEOSS

Page 27: 2008-02-11: EPA DataFed Presentation

Sensing Revolution

Web 2.0

Page 28: 2008-02-11: EPA DataFed Presentation

Summary

There is a slow ‘aligning of stars’ for integrating heterogeneous data

• System of Systems architecture is suitable for integrating data
– Standard data access is a key interoperability protocol
– Heterogeneous data can be non-intrusively standardized by mediators
– Service-based software architecture delivers tailored products to diverse uses

• Federated data and shared web-based tools are in use
– DataFed already includes over 100 datasets (emissions, ground, satellite)
– The system has been applied to EPA policy, regulatory and science development

• However,
– DataFed is just one of many mediator nodes, and these need to be connected
– Much more data would need to be federated
– HTAP model-data comparison would be an attractive use case
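
The idea of non-intrusive standardization by mediators can be sketched as below. This is a hedged illustration, not DataFed's actual implementation: the two sources, their field names, the `Observation` record and the `mediator` function are all hypothetical. Each provider keeps its native format, a small wrapper maps it to a common record, and the mediator answers one uniform parameter-time query across all wrapped sources.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Common record that every wrapper produces (hypothetical schema)."""
    parameter: str
    lat: float
    lon: float
    time: str
    value: float


# Two made-up providers with different native formats; neither is modified.
CSV_SOURCE = "site_lat,site_lon,date,pm25\n38.6,-90.2,2004-07-21,31.5\n"
RECORD_SOURCE = [{"where": [33.7, -84.4], "when": "2004-07-21", "bext": 0.12}]


def wrap_csv(text):
    """Wrapper for the CSV provider: rename columns and convert types."""
    for row in csv.DictReader(io.StringIO(text)):
        yield Observation("PM25", float(row["site_lat"]),
                          float(row["site_lon"]), row["date"],
                          float(row["pm25"]))


def wrap_records(records):
    """Wrapper for the record-style provider."""
    for rec in records:
        lat, lon = rec["where"]
        yield Observation("Bext", lat, lon, rec["when"], rec["bext"])


def mediator(parameter, day):
    """Answer one uniform query over all wrapped sources."""
    all_obs = list(wrap_csv(CSV_SOURCE)) + list(wrap_records(RECORD_SOURCE))
    return [o for o in all_obs if o.parameter == parameter and o.time == day]


bext_obs = mediator("Bext", "2004-07-21")
pm25_obs = mediator("PM25", "2004-07-21")
print(bext_obs)
print(pm25_obs)
```

Adding a new provider in this sketch means writing one more wrapper; the query side and the existing providers stay untouched.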

Page 29: 2008-02-11: EPA DataFed Presentation

DataFed Applications (2002-2007)

Science
– Mystery (Nitrate?) Events
– Data Integration (PM-Bext; NO2 Sat-Surf; AQ Event Detection Algorithms)

AQ Management
– Exceptional Event Analysis (EPA – N. Frank)
– Network Assessment (EPA – R. Scheffe)
– Fire-Smoke, Global Emissions (EPA – T. Keating)
– FASTNET, CATT Tools, S/R Analysis (RPO – R. Poirot)

IS Networking Infrastructure
– NASA/ESIP Web Services, SAO (NASA – L. Friedl, K. Moe)
– GEOSS Interoperability Demos (Wash. U)
– HTAP Network, Integration (EPA – Keating)

Page 30: 2008-02-11: EPA DataFed Presentation

FASTNET Report: 0409FebMystHaze (RPO – R. Poirot)

Mystery Winter Haze:

Natural? Nitrate/Sulfate? Stagnation?

Contributed by the FASTNET Community, Sep. 2004

Correspondence to R. Husar, R. Poirot

Coordination Support by:
– Inter-RPO WG Fast Aerosol Sensing Tools for Natural Event Tracking, FASTNET
– NSF Collaboration Support for Aerosol Event Analysis
– NASA REASON Coop
– EPA-OAQPS

AIRNOW PM25 - February

Sulfate-driven Jul-Aug peak

Feb-Mar peak, of unknown origin

Page 31: 2008-02-11: EPA DataFed Presentation

Data Fusion: AIRNOW PM25 - ASOS Bext

2004 July 20 14:00

[Map panels for July 21, 2004; July 22, 2004; July 23, 2004, each pairing AIRNOW PM25 with ASOS RHBext]

Page 32: 2008-02-11: EPA DataFed Presentation

PM Event Detection from Time Series

Event: Deviation > x * percentile
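
A minimal sketch of this event rule is given below. The slide does not say which percentile, window or multiplier is used, so the choices here are assumptions: the baseline is the median (50th percentile) of a centered moving window, the deviation is the value minus that baseline, and `x` defaults to 1. The function name and the sample series are also made up.

```python
from statistics import median


def detect_events(series, window=5, x=1.0):
    """Flag points whose deviation from a moving baseline exceeds x * baseline.

    Assumption: the 'percentile' is taken as the median of a centered window.
    """
    half = window // 2
    flags = []
    for i, value in enumerate(series):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        baseline = median(series[lo:hi])  # windows are truncated at the ends
        deviation = value - baseline
        flags.append(deviation > x * baseline)
    return flags


# Made-up daily PM2.5 values with one spike on day index 4.
pm25 = [10, 11, 9, 10, 40, 12, 10, 11, 9, 10]
event_days = [i for i, flagged in enumerate(detect_events(pm25)) if flagged]
print(event_days)  # [4]
```

A median baseline is used in this sketch because a single spike barely moves it, so the spike stands out as a deviation instead of raising its own threshold.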

Page 33: 2008-02-11: EPA DataFed Presentation

Speciated PM Network Assessment (EPA – R. Scheffe)

CIRA/VIEWS Database

CAPITA/DataFed Database

Network Assessment

PPT

IMPROVE

EPA SPEC

CIRA Tools and Processes

DataFed Tools and Processes

Analysis Tools and Processes

Speciated Data Flow and Processing

EPA NCore Process

Evaluation, Feedback

Page 34: 2008-02-11: EPA DataFed Presentation

Distributed Fire Data Sources (S. Falke, EPA, NASA)

Numerous state, regional, and national fire related databases and online access applications exist. The challenge is to bring them together, on-the-fly, without requiring substantial changes to the underlying systems.

Also need to access data sources that are not “Web-ready”.

Sources shown: BlueSkyRAINS, GeoMAC, WFAS, USGS, NOAA, UMaryland

Page 35: 2008-02-11: EPA DataFed Presentation

Combined Aerosol Trajectory Tool, CATT (RPO – R. Poirot)

[Diagram: value chain of processes, each feeding the next process]

AEROSOL: Aerosol Sensors; Collection (IMP., EPA); Aerosol Data; Integration (VIEWS); Integrated AerData; Aggreg. Aerosol; AerData Cube

TRANSPORT: Weather Data; Assimilate (NWS); Gridded Meteor.; Trajectory (ARL); Traject. Data; Aggreg. Traject.; TrajData Cube

CATT: CATT-In (CAPITA)

Tools: Trajectory Browser; Kitty: Simple CATT; CATT Transport Analyzer

Page 36: 2008-02-11: EPA DataFed Presentation

HTAP Data Network (EPA – T. Keating)

TF HTAP Workshop, Forschungszentrum Juelich, Oct 17-19, 2007, Juelich, Germany

Application Examples for NOx Analysis

Collaborators:

Rudolf Husar, Washington U. St. Louis

Stefan Falke, Northrop, Wash U.

Greg Leptoukh, NASA, Goddard

Martin Schultz, FZJ, Juelich

Page 37: 2008-02-11: EPA DataFed Presentation

GEOSS Interoperability Demos (Washington Univ.)

Beijing, Barcelona, Denver

Page 38: 2008-02-11: EPA DataFed Presentation
Page 39: 2008-02-11: EPA DataFed Presentation

Origin of Fine Dust Events over the US

Sulfate is local, no major spikes

Gobi dust transport in spring

Sahara dust import in summer

Fine dust spikes over the entire US are mainly from intercontinental transport

Page 40: 2008-02-11: EPA DataFed Presentation

Air Quality Management System: Components and Functions

Value Adding Processes and Human Agents:

• Organizing (Quality control, Formatting, Documenting, Displaying): Data Manager, Organizer

• Analyzing (Interpreting, Evaluating, Separating, Synthesizing): Technical Analysts, Program Manager

• Deciding (Evaluating options, Matching goals, Compromising, Choosing): Policy Analysts, Decision Maker

Other labels: Public; Decision Support System (DSS)

The primary purpose of data systems is to mediate between data providers and programs/projects

Programs perform analysis for Orgs.; the DSS is within programs

The big decisions of societal importance are made by Organizations

(This needs more wisdom from the practitioners)

Page 41: 2008-02-11: EPA DataFed Presentation

Flow of Data and Usage Control

[Diagram labels]

Obs. & Models: Sensors; Acquisition; processing

Info System (Negotiating Space): Data Sharing; Std. Interface; Gen. Processing; Std. Interface; Domain Processing

Decision Support System: Reports; Reporting

User Programs: NAAQS, SIPs, Forecast, GEOSS, …

User Agencies

Flows: Data; Control; Requesting Information; Providing Information

Page 42: 2008-02-11: EPA DataFed Presentation

DataFed Tools - Subset

Consoles: Data from diverse sources are displayed to create a rich context for exploration and analysis

CATT: Combined Aerosol Trajectory Tool for browsing back-trajectories for specified chemical conditions

Viewer: General purpose spatio-temporal data browser and view editor applicable for all DataFed datasets

Page 43: 2008-02-11: EPA DataFed Presentation

Summary

• Global Monitoring - Modeling Revolution
– ‘May you live in interesting times’
– We are in the midst of an observational revolution (satellites, monitoring networks)
– The global distribution and transport of some pollutants can be monitored daily
– Global models are maturing into effective analytical and predictive tools

• Results to Date:
– Compelling evidence for global-scale transport of PM and Ozone
– Qualitative evidence of ‘extra-jurisdictional’ impact on US air quality
– Potential for quantification of natural and non-US impact

• Such a massive job will require:
– International, interagency, interdisciplinary collaboration
– Open flow of data/knowledge
– Scientific ‘value-adding chains’

Page 44: 2008-02-11: EPA DataFed Presentation

FASTNET and DataFed

FASTNET (Fast Aerosol Sensing Tools for Natural Event Tracking) is an open communal information-sharing facility to study aerosol events, including detection, tracking and impact on PM and haze.

The main asset of FASTNET is the community of data analysts, modelers, managers participating in the production of actionable knowledge from data and models

The community is supported by a non-intrusive data integration infrastructure based on Internet standards (web services) and a set of web-tools evolving under the federated data system, DataFed

DataFed is supported by its community and is under the umbrella of the interagency Earth Science Information Partners, ESIP (NASA, NOAA and EPA)

Page 45: 2008-02-11: EPA DataFed Presentation

OGC WCS Data Access Protocol

GEOSS Provides SOA for Coupling of Autonomous Nodes

Facilitates Publishing, Finding and Accessing Data

Emerging Air Quality Data Flow Network

Page 46: 2008-02-11: EPA DataFed Presentation

Application of OGC WCS Data Access Protocol

Regardless of the data location, data type and format,

• the parameter-space-time query is the same

• the return is in user selectable format from the offerings

Query fields: Parameter (Coverage), Bounding Box (BBOX), Time Range (TIME), Out Format (FORMAT)

Grid: Coverage=THEEDDS.T &BBOX=-126,24,-65,52,0,0 &TIME=2002-07-07/2002-07-07 &FORMAT=NetCDF

Image: Coverage=SEAW.Refl &BBOX=-126,24,-65,52,0,0 &TIME=2002-07-07/2002-07-07 &FORMAT=GeoTIFF

Station Data: Coverage=SURF.Bext &BBOX=-126,24,-65,52,0,0 &TIME=2002-07-07/2002-07-07 &FORMAT=NetCDF-table
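
The uniform parameter-space-time query can be sketched as a small URL builder. The endpoint `http://example.org/wcs` and the function name are hypothetical, and only the four keys shown on the slide are included; a complete OGC WCS GetCoverage request carries further parameters (service, version, request type and others) that are left out of this sketch.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/wcs"  # hypothetical endpoint, not a real server


def coverage_query(coverage, bbox, time_range, out_format):
    """Build the query string from the four fields shown on the slide."""
    params = {
        "Coverage": coverage,
        "BBOX": ",".join(str(v) for v in bbox),
        "TIME": "/".join(time_range),
        "FORMAT": out_format,
    }
    # Keep ',' and '/' unescaped so the result matches the slide's notation.
    return BASE_URL + "?" + urlencode(params, safe=",/")


bbox = (-126, 24, -65, 52, 0, 0)
day = ("2002-07-07", "2002-07-07")

# Same space-time query for three data types; only coverage and format change.
grid_url = coverage_query("THEEDDS.T", bbox, day, "NetCDF")
image_url = coverage_query("SEAW.Refl", bbox, day, "GeoTIFF")
station_url = coverage_query("SURF.Bext", bbox, day, "NetCDF-table")

for url in (grid_url, image_url, station_url):
    print(url)
```

The three calls differ only in the coverage name and output format, which is the point of the slide: the caller does not need to know where or how each dataset is stored.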

Page 47: 2008-02-11: EPA DataFed Presentation

Web 1.0 -> Web 2.0 Transformation

• The Web is being transformed: It is becoming more participatory

• Its content is increasingly generated and distributed by individuals

• See the explosive growth of wikis, picture-sharing, blogs, Facebook

• This architectural, technological and cultural change is Web 2.0

• Web 2.0 is good for the Atmospheric Science community since it allows
– Better harvesting of current knowledge
– Collaborative creation of new knowledge

Page 48: 2008-02-11: EPA DataFed Presentation

Distribution of Responsibility

Distributed Responsibility: DataFed

• The data lies with the data providers
• The wrappers and mediators with the DataFed community
• Application programs with the end user
• Data discovery with data & service registries

Page 49: 2008-02-11: EPA DataFed Presentation

The Information Interoperability Stack

Page 50: 2008-02-11: EPA DataFed Presentation

Imagine… More Shared Obs & Models… At Your Fingertips or on Google Earth

2007: Global Data & Models

2007++: More Global Data & Models

Page 51: 2008-02-11: EPA DataFed Presentation

Regional Haze Rule: Natural Aerosol

Looking ahead to reach natural conditions … in 60+ years!!!

Page 52: 2008-02-11: EPA DataFed Presentation

Asian Dust Cloud over N. America

On April 27, 1998 the dust cloud arrived in North America.

Regional average PM10 concentrations increased to 65 µg/m3

In Washington State, PM10 concentrations exceeded 100 µg/m3

Asian Dust 100 µg/m3

Hourly PM10

Page 53: 2008-02-11: EPA DataFed Presentation

Aircraft Detection of Siberian Forest Smoke near Seattle, WA

Jaffe et al., 2003