Distributed Data Analysis & Dissemination System (D-DADS) Special Interest Group on Data Integration June 2000

Overview

Environmental data are collected by multiple, disparate data providers, such as individual EMPACT projects.

Each data provider presents its data in its own format, making it difficult to find, access, read, and integrate the data.

Standardized formats and data dissemination systems are required for data accessibility and integration of distributed data sets.

This proposal presents a distributed data analysis and delivery system that provides users with data access to multiple sources

The Data Flow Process: From Raw Data to Refined Knowledge

• Primary data are gathered from providers of sensory data

• Data are integrated, filtered, aggregated and fused into secondary data

• Reports are prepared for delivering environmental knowledge to the public

Data Flow Resistances

The data flow process is hampered by a number of resistances:

• The user does not know what data are available

• The available data are poorly described (metadata)

• There is a lack of QA/QC information

• The data come in various formats, requiring hand-crafted code to read and manipulate them

These resistances can be overcome through a distributed system that catalogs and standardizes the data, allowing easy access for data manipulation and analysis.

Interoperability

One requirement for an effective distributed environmental data system is interoperability, defined as

“the ability to freely exchange all kinds of spatial information about the Earth and about objects and phenomena on, above, and below the Earth’s surface; and to cooperatively, over networks, run software capable of manipulating such information.” (Buehler & McKee, 1996)

Such a system has two key elements:

• Exchange of meaningful information

• Cooperative and distributed data management

Distributed Data Analysis & Dissemination System: D-DADS

• Specifications:
  - Uses standardized forms of data, metadata and access protocols
  - Supports distributed data archives, each run by its own provider
  - Provides tools for data exploration, analysis and presentation

• Features:
  - Data are organized as multidimensional data cubes
  - Dimensional data cubes are distributed but shared
  - Analysis is supported by built-in and user functions
  - Supports other data types, such as images, GIS data layers, etc.

D-DADS Architecture

[Figure: D-DADS architecture diagram. Column labels: Data Providers; Standardized Description & Format; Data Access and Manipulation Tools; User Interaction. Component labels: ARC/INFO, Legacy Database, SQL Database, Database (SQL, Oracle, etc.), Custom OLAP Translator, ArcSDE Translator, OLAP Service Provider, Data Cube, GIS Table, GIS Map, OLAP, Virtual Data Cube, ArcIMS.]

The D-DADS Components

• Data Providers supply primary data to the system through SQL or other data servers.

• Standardized Description & Format populates and describes the data cubes and other data types using standard metadata.

• Data Access and Manipulation tools provide a unified interface to the data cubes and GIS data layers for accessing and processing (filtering, aggregating, fusing) data and for integrating data into virtual data cubes.

• Users are the analysts who access the D-DADS and produce knowledge from the data.

The multidimensional data access and manipulation component of D-DADS can be implemented using OLAP.
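
The component chain above can be illustrated with a minimal Python sketch. It is not the D-DADS implementation: the two provider layouts, the `translate_a` and `translate_b` functions, the site codes and the `(location, time, variable)` key are all hypothetical, chosen only to show per-provider translators feeding one virtual data cube that a single interface can then aggregate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records from two providers, each in its own layout.
provider_a = [  # dict rows, visual range already in km
    {"site": "STL", "date": "2000-06-01", "vr_km": 20.0},
    {"site": "STL", "date": "2000-06-02", "vr_km": 30.0},
]
provider_b = [  # tuple rows: (station, day, visual range in miles)
    ("CHI", "2000-06-01", 10.0),
]

KM_PER_MILE = 1.609344


def translate_a(row):
    """Map a provider A row onto the shared (location, time, variable, value) form."""
    return (row["site"], row["date"], "visual_range_km", row["vr_km"])


def translate_b(row):
    """Map a provider B row onto the shared form, converting miles to km."""
    station, day, miles = row
    return (station, day, "visual_range_km", miles * KM_PER_MILE)


def build_virtual_cube(sources):
    """Merge translated records from every provider into one dict keyed by dimensions."""
    cube = {}
    for rows, translate in sources:
        for row in rows:
            location, time, variable, value = translate(row)
            cube[(location, time, variable)] = value
    return cube


def mean_by_location(cube):
    """Aggregate the cube over time and variable, leaving one mean per location."""
    groups = defaultdict(list)
    for (location, _time, _variable), value in cube.items():
        groups[location].append(value)
    return {location: mean(values) for location, values in groups.items()}


cube = build_virtual_cube([(provider_a, translate_a), (provider_b, translate_b)])
print(mean_by_location(cube))
```

In this sketch, adding a provider only requires a new translator function; the aggregation code does not change.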

On-line Analytical Processing: OLAP

• A multidimensional data model making it easy to select, navigate, integrate and explore the data.

• An analytical query language providing power to filter, aggregate and merge data as well as explore complex data relationships.

• Ability to create calculated variables from expressions based on other variables in the database.

• Pre-calculation of frequently queried aggregated values, e.g., monthly averages, enables fast response times to ad hoc queries.
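
Two of the points above, pre-calculated aggregates and calculated variables, can be sketched in a few lines of Python. This is an illustration only, not how any particular OLAP server works; the observation tuples, the function names and the km-to-miles expression are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical observations: (site, "YYYY-MM-DD", visual range in km)
observations = [
    ("STL", "2000-05-30", 40.0),
    ("STL", "2000-06-01", 20.0),
    ("STL", "2000-06-02", 30.0),
    ("CHI", "2000-06-01", 10.0),
]


def precompute_monthly_means(rows):
    """Build the (site, month) -> mean table once, ahead of any query."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for site, date, value in rows:
        key = (site, date[:7])  # "YYYY-MM" prefix identifies the month
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


monthly_mean = precompute_monthly_means(observations)


def query_monthly_mean(site, month):
    """Answer an ad hoc query with a dictionary lookup instead of a scan."""
    return monthly_mean.get((site, month))


def derive(table, expression):
    """Create a calculated variable by applying an expression to each stored value."""
    return {key: expression(value) for key, value in table.items()}


# Calculated variable: monthly mean visual range expressed in miles.
monthly_mean_miles = derive(monthly_mean, lambda km: km / 1.609344)

print(query_monthly_mean("STL", "2000-06"))
```

The trade-off is the usual one: the aggregate table costs storage and must be refreshed when new data arrive, in exchange for constant-time answers to the common queries.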

Fast Analysis of Shared Multidimensional Information (FASMI)

(Pendse, N., “The OLAP Report”)

An OLAP system is characterized as:

being Fast – The system is designed to deliver relevant data to users quickly and efficiently; suitable for ‘real-time’ analysis.

facilitating Analysis – The capability to have users extract not only “raw” data but also data that they “calculate” on the fly.

being Shared – The data and its access are distributed.

being Multidimensional – The key feature. The system provides a multidimensional view of the data.

exchanging Information – The ability to disseminate large quantities of various forms of data and information.

Multi-Dimensional Data Cubes

• Multidimensional data models use inherent relationships in data to populate multidimensional matrices called data cubes.

• A cube's data can be queried using any combination of dimensions.

• Hierarchical data structures are created by aggregating the data along successively larger ranges of a given dimension, e.g., the time dimension can contain the aggregates year, season, month and day.
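
As a rough illustration of the last two points, the following Python sketch stores a small cube as a dictionary, selects cells by any combination of dimensions, and rolls values up along a time hierarchy. The dimension names, site codes and values are hypothetical, and the season level is left out because it does not map onto a simple date prefix.

```python
from collections import defaultdict
from statistics import mean

DIMENSIONS = ("location", "date", "variable")

# Hypothetical cube cells keyed by (location, ISO date, variable).
cells = {
    ("STL", "2000-06-01", "visual_range_km"): 20.0,
    ("STL", "2000-06-02", "visual_range_km"): 30.0,
    ("STL", "2000-07-01", "visual_range_km"): 40.0,
    ("CHI", "2000-06-01", "visual_range_km"): 10.0,
}


def select(cube, **fixed):
    """Return the cells matching every given dimension value.

    Dimensions that are not named stay unconstrained, so any
    combination of dimensions can be used in a query.
    """
    index = {name: i for i, name in enumerate(DIMENSIONS)}
    unknown = set(fixed) - set(index)
    if unknown:
        raise KeyError(f"unknown dimensions: {sorted(unknown)}")
    return {
        key: value
        for key, value in cube.items()
        if all(key[index[dim]] == wanted for dim, wanted in fixed.items())
    }


# Each level of the time hierarchy is a prefix of the ISO date string.
TIME_LEVELS = {"day": 10, "month": 7, "year": 4}


def roll_up(cube, level):
    """Average the cells along the time dimension at the requested level."""
    width = TIME_LEVELS[level]
    groups = defaultdict(list)
    for (location, date, variable), value in cube.items():
        groups[(location, date[:width], variable)].append(value)
    return {key: mean(values) for key, values in groups.items()}


print(select(cells, location="CHI", date="2000-06-01"))
print(roll_up(cells, "month"))
```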

User Interaction with D-DADS

[Figure: user interaction diagram. Labels: Query; Data View (Table, Map, Time Chart, etc.); Distributed Database; XML data (shown twice).]
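
The diagram labels suggest that data travel between the distributed database and the data view as XML. The slides do not give a schema, so the element and attribute names below (`result`, `obs`, `location`, `time`, `variable`) are purely hypothetical. Under that assumption, a data view could turn a response into table rows roughly as follows.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML response; the real D-DADS schema is not specified in the slides.
xml_response = """
<result variable="visual_range_km">
  <obs location="STL" time="2000-06-01T12:00">20.0</obs>
  <obs location="CHI" time="2000-06-01T12:00">10.0</obs>
</result>
"""


def parse_result(xml_text):
    """Convert an XML result document into a list of row dictionaries."""
    root = ET.fromstring(xml_text.strip())
    variable = root.get("variable")
    return [
        {
            "location": obs.get("location"),
            "time": obs.get("time"),
            "variable": variable,
            "value": float(obs.text),
        }
        for obs in root.findall("obs")
    ]


rows = parse_result(xml_response)

# A minimal table view: one line per observation.
for row in rows:
    print(f'{row["location"]:<5}{row["time"]:<18}{row["value"]:>6.1f}')
```

The same row list could feed a map or time chart view, since each row carries its own location and time.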

Example Application: Visibility D-DADS

Visibility observations (extinction coefficient) are an indicator of air quality and serve as an important data set in the public’s understanding of air quality.

A visibility D-DADS will consist of multiple forms of visibility data, such as visual range observations and digital images from web cameras.

Potential visibility data providers include:

- EMPACT projects and their hourly visual range data

- The IMPROVE database

- CAPITA, a warehouse for global surface observation data available every six hours

Possible Node in Geography Network

National Geographic and ESRI are establishing a geography network consisting of distributed spatial databases.

Some EMPACT projects are participating as nodes in the initial start-up phase.

The visibility distributed data and analysis system could link to and become another node in the geography network, making use of the geography network’s spatial viewers.

Other views, such as a time view, could be linked with the spatial viewer to take advantage of the multidimensional visibility data cubes.

Example Viewer

[Figure: example viewer with four panels: Map View, Variable View, Time View and WebCam View.]

The views are linked so that making a change in one view, such as selecting a different location in the map view, updates the other views.
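
One common way to link views like this is a shared selection object that notifies every subscribed view when it changes. The sketch below assumes that design; the slides do not say how the viewer is actually built, and the `Selection` and `View` classes are hypothetical names.

```python
class Selection:
    """Shared selection state (location, time, variable) that views subscribe to."""

    def __init__(self, **state):
        self._state = dict(state)
        self._listeners = []

    def subscribe(self, listener):
        """Register a callable that receives the state after each change."""
        self._listeners.append(listener)

    def update(self, **changes):
        """Apply the changes, then notify every subscribed view."""
        self._state.update(changes)
        for listener in self._listeners:
            listener(dict(self._state))  # pass a copy so views cannot mutate it


class View:
    """A stand-in for a map, variable, time or webcam panel."""

    def __init__(self, name, selection):
        self.name = name
        self.shown = None  # the state this view last rendered
        selection.subscribe(self.refresh)

    def refresh(self, state):
        self.shown = state


selection = Selection(location="STL", time="2000-06-01", variable="visual_range_km")
views = [View(name, selection) for name in ("map", "variable", "time", "webcam")]

# Selecting a different location in one view updates all of them.
selection.update(location="CHI")
for view in views:
    print(view.name, view.shown["location"])
```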

Summary

In the past, data analysis has been hampered by data flow resistances. Fortunately, the tools and framework to overcome these resistances now exist, including:

• World Wide Web

• XML

• OLAP

• ArcIMS

• Metadata standards

It appears timely to consider a distributed environmental data analysis and dissemination system.