ITSC/University of Alabama in Huntsville

ADaM System Architecture

Rahul Ramachandran, Sara Graves and Ken Keiser

Mathematical Challenges in Scientific Data Mining
IPAM, January 14-18, 2002

Information Technology and Systems Center
University of Alabama in Huntsville
rramachandran@itsc.uah.edu



Talk Overview

Mining System Requirements

ADaM System Architecture

ADaM Plan Builder

Research directions


Mining System Requirements: When, Where and Who

WHEN
• Real Time
• On-Ingest
• On-Demand
• Repeatedly

WHERE
• User Workstation
• Data Archive Center
• Data Mining Center

WHO
• Casual Users
• Domain Experts
• Mining Experts

Data Mining


Algorithm Development and Mining (ADaM) System

• The ADaM system was developed under a NASA research grant.
• The system provides knowledge discovery, feature detection and content-based searching for data values, as well as for metadata.
• It contains over 120 different operations that can be performed on the input data stream.
• Operations range from specialized, data-set-specific atmospheric science algorithms to digital image processing techniques and processing modules for automatic pattern recognition, machine perception, neural networks and genetic algorithms.


ADaM Features

• Handles science data set variability
  • Multiple resolution/multiple scales
  • Variability of formats
  • Granularity of data
  • Includes spatial/temporal dimensions
• Allows addition of new algorithms
• Allows scientists to select and sequence different operations


ADaM Engine Architecture

[Figure: ADaM engine architecture. Data stages shown: Data, Translated Data, Preprocessed Data, Patterns/Models, Results.]

Input
• HDF, HDF-EOS, GIF, PIP-2
• SSM/I Pathfinder, SSM/I TDR, SSM/I NESDIS Lvl 1B, SSM/I MSFC Brightness Temp
• US Rain, Landsat, ASCII Grass, Vectors (ASCII Text), Intergraph Raster
• Others...

Processing: Preprocessing
• Selection and Sampling: Subsetting, Subsampling, Select by Value, Coincidence Search
• Grid Manipulation: Grid Creation, Bin Aggregate, Bin Select, Grid Aggregate, Grid Select, Find Holes
• Image Processing: Cropping, Inversion, Thresholding
• Others...

Processing: Analysis
• Clustering: K Means, Isodata, Maximum
• Pattern Recognition: Bayes Classifier, Min. Dist. Classifier
• Image Analysis: Boundary Detection, Cooccurrence Matrix, Dilation and Erosion, Histogram Operations, Polygon Circumscript, Spatial Filtering, Texture Operations
• Genetic Algorithms
• Neural Networks
• Others...

Output
• GIF Images, TIFF Images
• HDF-EOS, HDF Raster Images, HDF SDS
• Polygons (ASCII, DXF)
• SSM/I MSFC Brightness Temp
• Others...


ADaM Mining Environment

[Figure: ADaM mining environment. Components shown:]
• Distributed Clients: Web-based, Workstation-based, Other Systems
• Common Client API
• Data Mining Server
• Mining Engine (ADaM): Input Modules, Analysis Modules, Output Modules
• Data Stores
• Knowledge Base
• Mining Results
• Analysis/Vis Tools
• Event/Relationship Search System


ADaM Architecture


ADaM Miner Engine

• Manages the processing of data through a series of specified operations
• Loads input, processing and output modules dynamically as needed at execution time
• Allows for the addition of newly developed modules without the need to rebuild the engine
• Interprets a mining plan script that provides the details about specified operations and the order in which they should be executed
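The engine behavior described above (interpreting a plan script and loading modules by name at execution time) can be sketched roughly as follows. This is only an illustration in Python: the slides do not show ADaM's actual plan syntax or module interface, so the `role module:function` line format and the helper names `load_callable`, `parse_plan` and `run_plan` are assumptions, and standard-library functions stand in for real input, processing and output modules.

```python
import importlib

def load_callable(spec):
    """Resolve a hypothetical 'module:function' spec, importing the module on demand."""
    module_name, _, func_name = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)

def parse_plan(text):
    """Return the ordered (role, spec) steps of a plan script, skipping comments and blanks."""
    steps = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        role, spec = line.split()
        steps.append((role, spec))
    return steps

def run_plan(text, data):
    """Pass the data through each step in plan order, loading each step only when reached."""
    for _role, spec in parse_plan(text):
        data = load_callable(spec)(data)
    return data

# Hypothetical plan: stdlib functions play the part of ADaM modules.
PLAN = """
input   json:loads       # stand-in for an input module
process builtins:sorted  # stand-in for a processing module
output  json:dumps       # stand-in for an output module
"""

result = run_plan(PLAN, "[3, 1, 2]")
print(result)
```

Because each step is resolved by name only when the plan reaches it, a new module can be used by naming it in a plan; the interpreter itself does not change.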


ADaM Miner Database

• Used to store information that includes the names, locations and related metadata for input data sets available on the server
• Includes information about users, jobs, mining results, and other related information
• Simple relational database
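A relational layout covering the kinds of information listed above might look like the sketch below. The slides do not give the real schema or name the database product, so every table and column name here is hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema: data sets, users, jobs and results, linked by foreign keys.
SCHEMA = """
CREATE TABLE datasets (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    location TEXT NOT NULL,
    metadata TEXT
);
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE jobs (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    status     TEXT NOT NULL DEFAULT 'queued'
);
CREATE TABLE results (
    id       INTEGER PRIMARY KEY,
    job_id   INTEGER NOT NULL REFERENCES jobs(id),
    location TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative rows only; the names and paths are made up.
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
conn.execute(
    "INSERT INTO datasets (name, location, metadata) VALUES (?, ?, ?)",
    ("sample.hdf", "/data/example", "format=HDF"),
)
conn.execute("INSERT INTO jobs (user_id, dataset_id) VALUES (?, ?)", (1, 1))

# Join the tables to see who queued which data set.
rows = conn.execute(
    """
    SELECT users.name, datasets.name, jobs.status
    FROM jobs
    JOIN users ON users.id = jobs.user_id
    JOIN datasets ON datasets.id = jobs.dataset_id
    """
).fetchall()
print(rows)
```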


ADaM Daemon and Scheduler

Scheduler
• Examines the list of jobs to be executed on the server and determines which job or jobs to execute at any given time
• Queues the requests and executes them sequentially

Daemon
• Handles all network communications with the mining system
• Is configured to listen on a specific port for any socket communications
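The queue-then-execute-sequentially behavior of the scheduler can be illustrated with a first-in, first-out queue. This is a minimal sketch, not ADaM's scheduler: the `Scheduler` class, its method names and the job identifiers are assumptions, and the daemon's socket handling is left out.

```python
from collections import deque

class Scheduler:
    """Hypothetical FIFO scheduler: requests are queued, then run one at a time."""

    def __init__(self):
        self._queue = deque()
        self.completed = []

    def submit(self, job_id, task):
        """Queue a job; `task` is any zero-argument callable."""
        self._queue.append((job_id, task))

    def run_pending(self):
        """Execute queued jobs sequentially, in submission order."""
        while self._queue:
            job_id, task = self._queue.popleft()
            self.completed.append((job_id, task()))

scheduler = Scheduler()
scheduler.submit("job-1", lambda: "plan A finished")
scheduler.submit("job-2", lambda: "plan B finished")
scheduler.run_pending()
print(scheduler.completed)
```

In a fuller version, the daemon would accept requests on its configured port and call `submit` for each one.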


ADaM Input/Operation Filters

• Input/Output Filters are data readers and writers
• Operations are the algorithms
• Each of the operations and (input/output) filters is implemented as a shared library
• New modules may be added to the system without recompiling or relinking
• All operations/filters either produce or operate on a data collection, which provides a common format for representing scientific data
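The idea of a common data collection shared by filters and operations can be sketched as below. ADaM implements these pieces as shared libraries and the slides do not show the actual data collection layout, so the `DataCollection` class and the three functions here are hypothetical Python stand-ins. Thresholding is used as the example operation because it appears in the engine's preprocessing list.

```python
from dataclasses import dataclass, field

@dataclass
class DataCollection:
    """Hypothetical common format: a 2-D grid of values plus free-form metadata."""
    values: list
    metadata: dict = field(default_factory=dict)

def ascii_input_filter(text):
    """Input filter: read whitespace-separated numbers into a data collection."""
    rows = [[float(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]
    return DataCollection(values=rows, metadata={"source_format": "ascii"})

def threshold_operation(collection, cutoff):
    """Operation: mark each cell 1.0 if it meets the cutoff, else 0.0."""
    rows = [[1.0 if v >= cutoff else 0.0 for v in row] for row in collection.values]
    return DataCollection(values=rows,
                          metadata={**collection.metadata, "threshold": cutoff})

def ascii_output_filter(collection):
    """Output filter: write the data collection back out as text."""
    return "\n".join(" ".join(f"{v:g}" for v in row) for row in collection.values)

grid = ascii_input_filter("0.2 0.9\n0.5 0.1\n")
mask = threshold_operation(grid, cutoff=0.5)
text = ascii_output_filter(mask)
print(text)
```

Since every stage consumes or produces the same structure, any input filter can feed any operation, and any operation can feed any output filter.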


General Mining Steps

Select data files to be mined

“Check-In” the data files into the Miner Database

Write a “Mining Plan” consisting of a sequence of input filters and operations

Execute the Mining Plan using the engine

Check and save results

Iterate


What is Check-In?

• Process of encoding information such as the names, locations and related metadata for input data sets available on the server
• Creates a complex data hierarchy in the database


ADaM Plan Builder: Check-In

Two Modes of Operation
• General: requires only minimal information
• Advanced: requires more detailed information and allows the user to set up a structured database

Check-In fields
• Path to the data file
• Data file name
• Input Filter associated with the data file
• Load an XML file containing existing Check-In specifications
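Loading Check-In specifications from XML could look something like the sketch below. The slides do not show the real Check-In XML layout, so the element names (`checkin`, `dataset`, `path`, `filename`, `inputfilter`), the sample values and the `CheckInRecord` class are all assumptions chosen to mirror the three fields listed above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class CheckInRecord:
    """Hypothetical record holding the three Check-In fields."""
    path: str
    file_name: str
    input_filter: str

# Made-up example document; the actual ADaM format is not given in the slides.
SAMPLE_XML = """
<checkin>
  <dataset>
    <path>/data/example</path>
    <filename>sample.hdf</filename>
    <inputfilter>HDF</inputfilter>
  </dataset>
</checkin>
"""

def load_checkin(xml_text):
    """Parse every <dataset> element into a CheckInRecord."""
    root = ET.fromstring(xml_text.strip())
    return [
        CheckInRecord(
            path=node.findtext("path", default=""),
            file_name=node.findtext("filename", default=""),
            input_filter=node.findtext("inputfilter", default=""),
        )
        for node in root.iter("dataset")
    ]

records = load_checkin(SAMPLE_XML)
print(records)
```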


ADaM Plan Builder – Layout

Plan Menu allows one to:
• Select a new plan
• Load existing plan
• Check-In data

Input Menu contains the list of Input Filters one can select

Operation Menu contains the list of operations one can select


ADaM Plan Builder – Layout

Panel where Mining Plan can be viewed either as text or a tree


ADaM Plan Builder – Layout

A description of the selected Operation/Input Filter can be viewed in this panel


ADaM Plan Builder – Layout

All the parameters needed for the Operation are described here


ADaM Plan Builder – Layout

Sample values for the Operation’s parameters are shown in this panel


ADaM Plan Builder – Layout

Go Mine the data using the Mining Plan

Allows the user to select the operation and add it to the Mining Plan


Research Directions

• Generic Data Reader for ADaM
  • ESML: Earth Science Markup Language
• Programmers Guide for ADaM
• Distributed Mining
• Grid Mining
  • Successful implementation and testing of the ADaM system on the NASA Information Power Grid
• Mining Onboard the Spacecraft
  • The EnVironmEnt for On-Board Processing (EVE) system


ADaM Information

Web site: datamining.itsc.uah.edu

ADaM Lite beta version download

Contact: [email protected]