38
European Grid of Solar Observations The need for Automatic Feature Recognition in the EGSO Project Bob Bentley, Valentina Zharkova, Jean Aboudarham & the EGSO Team Feature Recognition Workshop 23-24 October 2003, ROB

European Grid of Solar Observations The need for Automatic Feature Recognition in the EGSO Project Bob Bentley, Valentina Zharkova, Jean Aboudarham & the

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

The need for Automatic Feature Recognition in the EGSO Project

Bob Bentley, Valentina Zharkova, Jean Aboudarham & the EGSO

Team

Feature Recognition Workshop23-24 October 2003, ROB

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Outline

Overview of EGSO and its relationship with other projectsThe problem being addressed by EGSOEGSO Search capabilityFeature Recognition softwareCurrent status of EGSO

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

EGSO – European Grid of Solar Observations

EGSO is a Grid test-bed related to a particular application

Designed to improve access to solar data for the solar physics and other communitiesAddresses the generic problem of a distributed heterogeneous data set and a scattered user community

Funded under the Information Society Technologies (IST) thematic priority of the EC’s Fifth Framework Program (FP5)

Started March 2002; duration of 36 months

Involves 11 groups in Europe and the US, led by UCL-MSSL

4 in UK, 2 in France, 2 in Italy, 1 in Switzerland, 2 in USSeveral associate partners, mainly in the US

EGSO is interacting with many other projectsUS-VSO CoSEC and EGSO working closely together

Also working with the ILWS/SDO ProjectCollaborated with ESA’s study project SpaceGRIDInvolved with other EC funded Grid projects through GRIDSTART

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Objectives of EGSO

Support user community scattered around the worldCurrent and future projects are international collaborationsEGSO funded by EC, but has US partners

Provide access to solar data centres and observatories around the world

Data available in Europe (or US) not enough for many studies

Build enhanced search capability for solar dataAnalysis of solar data is event drivenSearch capability linked to this not currently available

Similar approach also required for STP, etc. data

Increasing data volumes, etc. require new methodology

Provide ability to process data at sourceBoth pipeline and more complex processing

Need for these capabilities not unique to solar physics…

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

The extended EGSO family

Partners provide expertise in solar physics and IT

UKUCL-MSSL, UCL-CS, RAL, Univ. Bradford, Astrium

FranceIAS (Orsay), Observatoire de Paris-Meudon

ItalyIstituto Nazionale di Astrofisico, Politecnico di Torino

INAf includes observatories of Turin, Florence, Naples and Trieste

SwitzerlandUniv. Applied Sciences (Windisch)

NetherlandsESA – Solar Group

USSDAC (NASA-GSFC), National Solar Observatory (VSO)Stanford University, Montana State University (VSO)Lockheed-Martin (CoSEC)

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Connections with related Projects

Virtual Solar Observatory (US-VSO)SDAC (NASA-GSFC) and NSO are partners in EGSO

Joe Gurman (SDAC) in VSO Project Scientist Frank Hill (NSO) is lead investigator

EGSO Coordinator in Chair of VSO Steering CommitteeDifferences in scale in objectives – big/small box…

Sun-Earth Connector (CoSEC)EGSO Coordinator is a CoI on the recently funded CoSEC continuation grantSignificant synergies between EGSO and CoSEC

The three project are trying to collaborate closelyJoint sessions at AAS and AGU; regular telecons.First joint meeting held at MSSL in Oct. 2002Joint Technical Meeting in conjunction with AGU (Dec. 2003)

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Potential expansion…

EGSO architecture is designed to be flexible, and the system should be able to handle potential areas of expansion. On the horizon are:

Living with a Star (ILWS)Solar Dynamic Observatory (SDO) will produce 2TB/dayEGSO could enable immediate access to the dataAnd optimize creation of “duplicate copy” in Europe

Modelling and SimulationsShould be possible if we can support uploading of code

Space WeatherImmediacy of access to data – few minutes to few hoursAccess to other types of data: STP, magnetospheric, etc.Use of models with parameters derived from data as input

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Generic Solar Physics Query

Identify suitable observations (many serendipitous)

As many different data sets as are availableShould be possible without accessing the data

Locate the dataData scattered, with differing means of access (some proprietary)

Often only need a subset of each data setProcess the data

Involves extraction and calibration of a subset of raw dataUses code defined by instrument teams (SolarSoft, C…)

Return results to the User

Compare results from different instrumentsSolarSoft (IDL) provides a standard platform for analysis

Note the exchange in order of the 3rd and 4th bullets in this Grid expression of the problem, as compared to current practice

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Nature of solar observations

For a complete picture of what is happening, we need to use as wide a range of observations as possible

The appearance of the Sun changes dramatically with wavelengthDifferent layers of the solar atmosphere and material at different temperatures are best seen at different wavelengths

For technical and practical reasons:UV, EUV, X-rays and -rays observed from spaceRadio and optical wavelengths observed from the ground

Issues related to coverage by each observatoryDifferences in approach to handling data have developed

The observations used to build up a picture of the plasma in multi-dimensional parameter space (incl. x, y, z, t, T & )

How plasma contained in 3d structures evolves with timeWhere and how energy released and how it affects the systemEtc…

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Some generic issues

We need to build on the existing situation

User community scattered around the worldCapabilities of users & their computing facilities vary greatlyUsers want to know if data addressing a problem exists

Not really interested in where the data are located Or, how the data are accessed, processed, etc.

Increasing desire for combined studies with other regimes Astrophysics, Climate Physics, Space Weather, etc.

Data centres and observatories located around the world

Large and small data providers (with varying resources)Need to make it as easy as possible to add new data sets

Planned data volumes much larger than for current instruments

Cataloguing differs in quality, contents, and dependenciesMust handle multiple copies of data and proprietary dataMust ensure integrity of data providers

Authentication an issue that needs serious consideration Need to minimize how it affects the user, etc.

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

The EGSO Search Engine

In order to provide an enhanced search capability, EGSO will improve the quality and availability of metadata

Enhanced cataloguing describes the data more fullyStandardized metadata versions of observing catalogues tie together the heterogeneous data setsNew types of catalogue allow searches on events, features and phenomena rather than just date & time, pointing, etc…

Ancillary data used to provide additional search criteriaImages, time series, derived products, etc.

Search Registry describes all metadata available for search

It will be possible to access to EGSO through:

A flexible Graphic User Interface (GUI ) – normal routeAn Application Program Interface (API) – this provides access for users from other applications, communities or Grids

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

The enhanced solar catalogues

Unified Observing Catalogues (UOC)Metadata form of observing catalogues used to tie together the heterogeneous data, leaving the data unchangedSelf describing (e.g. XML), quantised by time and instrument, with no dependencies on ancillary data or proprietary software and any errors correctedStandards defined for future data sets (e.g. STEREO, ILWS, Solar-B)

Solar Event Catalogues (SEC)Built from information contained in published listsFlare lists, CME lists, lists in SGD, etc.

Solar Feature Catalogue (SFC)Lists of the occurrence of events, phenomena and features provides an alternate means of selecting dataDerived using image recognition software developed in WP5

Similar hierarchical cataloguing required in other data Grid projects

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

The enhanced solar catalogues

Unified Observing Catalogues (UOC)Metadata form of observing catalogues used to tie together the heterogeneous data, leaving the data unchangedSelf describing (e.g. XML), quantised by time and instrument, with no dependencies on ancillary data or proprietary software and any errors correctedStandards defined for future data sets (e.g. STEREO, ILWS, Solar-B)

Solar Event Catalogues (SEC)Built from information contained in published listsFlare lists, CME lists, lists in SGD, etc.

Solar Feature Catalogue (SFC)Lists of the occurrence of events, phenomena and features provides an alternate means of selecting dataDerived using image recognition software

Similar hierarchical cataloguing required in other data Grid projects

Objective of the improved

metadata is to pose questions

like:

Identify events when a filament

eruption occurred within 30° of

the north-west limb and there

were good observations in H,

EUV and soft X-rays

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

One User Interface implementation

will not satisfy all user

requirements and users will be able

to tailor the interface to their needs

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Feature Recognition in EGSO

The enhanced search capability of EGSO requires development of new types of metadata – the Solar Feature Catalog (SFC) is a major part of this

Key to developing alternate routes into the data

EGSO has a work package (WP5) dedicated to developing tools needed to detect common solar features and then employing them to derive the feature catalog

Major undertaking! Where possible we need help from others in the community to:

Help verify the results and extend capability to as wide a range of features as possibleHelp refine ideas of how the results can be used

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Outline of progress on WP5

Software developed to prepare images for feature recognition codes

Removal of artifacts, regularize shape, etc.

Now working on codes to detect the featuresCodes for sunspots, active regions and filaments developed and under testCodes to recognize coronal holes and magnetic neutral lines under investigationDocument discussing techniques available shortly

Trying to define standard way of describing features for the feature catalogue

Preliminary version of SFC has been prepared Document on format available for discussion

Now starting experimenting with the results to determine if objectives can be realized with the stored information

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Image Preprocessing Toolkit

Difficulties with Images:Image shape (ellipse), centre and pole coordinatesWeather transparency (clouds) and different thickness of atmosphereCentre-to-limb darkeningDefects in data (strips, lines intensity)Errors in FITS header information

First part is code to clean images prior to further processing

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Sunspots detection in white light

Original image on the left and detected sunspots on the right

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Filament Detection in H

Original image on the left and detected filaments on the right

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Active Regions detection in H

Detected active regions on the right with corresponding result from Big Bear Solar Observatory on the left

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

AR detection in Ca K II, H and EUV images

04/04/2002 Ha07:23 + Ca K307:30 + EUV07:26

Ca KII

EUV

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Building the Solar Feature Catalog

For each feature we must first:Fully test the feature recognition code using images from a wide time period and several sourcesFinalize the format of how the information is described in the catalog

Then run code on a representative set of images

Summary & synoptic data gathered by SOHO one example of type, but not necessarily coherentIdeally need image cadence that allows us to have a reasonable idea of when things change

Probably requires use of images from several GBOs Raises issues of consistency related to image

quality, etc

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Use of the Solar Feature Catalog

The SFC can be used in at least three ways:Outline features recognized in one wavelength on an image taken in another (at a different time)Determine when events related to features have occurred – e.g. filament eruptions, flux emergenceTrack relative motion of features – e.g. sunspots

The SFC will be deployed as a Server addressed through Web Services

Not clear whether the SFC Server will be combined with the already deployed SEC ServerServer will be accessible to other VO projects

Feature Recognition software will be released under the EGSO’s Open Source software policy

Requirement of EC on IST Projects

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Current Status of EGSO

Extensive survey of requirements in 2002Working architecture defined and detailed during the first half of 2003

Release 1 of EGSO was demonstrated at IST2003 in Milan (October 2-4)

Demonstration of how the three roles work together

Solve simple query based on time and wavelength Access to data resources initially through SolarWeb

Working prototype of the Solar Event Catalog Server

New version of EGSO Data Model document was released recently

Describes both solar and heliospheric data

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Activities in the near future

First components of Feature Recognition software and documentation available shortly

Image Cleaning software is first part of toolkit

Release 2 of EGSO due at the end of November

Development of interface to Data Providers More complex query supported through SEC ServerGreater GUI capabilities

Release 3 of EGSO due mid-February 2004

Building profiles of data (etc.) providersDiscussing file formats and metadata with producers

Trying to finalize designs of UOC and Search Registry, and description of providers in Resource Registry

Format of synoptic maps discussed with NOAA and NSO

Discussing concerns and interfaces with data sources

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Conclusions

Of necessity the solar community needs to move towards a virtual environment to access solar and related data

EGSO is a Data/Computing GRID that will be a key part of the global virtual solar observatory and has already established close links with counterparts in the US

Feature recognition techniques are being used to detect solar features at a number of wavelengths

One of the ways that we are developing to provide innovative ways to search for solar observations

For more information on EGSO see:

http://www.egso.org

Or e-mail

[email protected]

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Grid Infrastructure

RD RI RD RI

RD RIRD RI

Brokers

User Interface

Role-specific Infrastructure

Connectors and Adapters

Internal Resources

Providers ConsumersE

GS

O

Architecture – Viewed as 3 Tiers

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Possible requirements for Data Providers

Need to register each dataset. Required info. could include:

Catalogue Information: All observations “should be summarised in observing catalogues” (prime source only)Data Map: This defines in broad outline what time intervals of the data are actually held. It should be considered dynamic. Type of storage: On-line, near on-line or off-line.

Means to retrieve data: The exact meaning of this information depends on whether the provider is active or passive.

Active source: address where process data can be retrieved from and the means of retrieval (ftp, http, etc.). Passive source: map of the physical location of data within the provider system and the means of retrieval.

Resource limits: The resource usage beyond this a provider switches from active to passive mode needs to be defined.

Details of access restrictions: If any part of a data set is proprietary (for some period) or otherwise restricted.Frequency of updates: So that system does not have to constantly monitor all data sources

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Access to Resources

EGSO is a Grid and activities depend on access to resources

Resources described by entries in a Resource Registry and managed by a Broker. Types include:

Metadata – from prime data providersData – from data centres, observatories, etcProcessing – simple, multi-instance processors, HPC(?) Storage – cache space, on-line mass storage, etc.Services – support of complex (meta)data products

Note: Some providers can support multiple capabilities

The Broker allocates resources and controls:How much being requested of a particular providerProcessing of data & staging of results

Processing may be at different site to data provider

Broker & Registries replicated to provide system resilience and permit load sharing

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Ancillary Data

This is “catch-all” term for all non-catalogue items

Items used to set the context for a searchImages, time series, derived parameters, etc.Processed products from data-intensive instrumentsSTP, etc. data could be incorporated in this way

Some data items will have to be derived on-the-flyNot possible for everything to have been prepared alreadyServers will provide this type of complex data products

Products include derived parameters, specialized actions, etc. e.g. GOES temperature, matched areas of images, etc.

SolarSoft packages, e.g. Chianti, could be brought up like this

Objective of the improved

metadata is to pose questions

like:

Identify events when a filament

eruption occurred within 30° of

the north-west limb and there

were good observations in H,

EUV and soft X-rays

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Handling the data

EGSO will dramatically enhance access to solar data

Data could be located anywhere in the worldUser only needs to know observations exist, not where locatedSystem able to optimize use of sources (closest, least used, etc) and handle of replicated data and aggregated sources Burden on provider minimized to encourage participation

As far as possible, process the data “at source”Involves extraction and calibration of a subset of the raw data

Software for processing defined by instrument team (IDL, C…)Processing reduces volumes of data moved aroundSimplifies requirements on user’s own system

Standard (pipe-line) processing adequate for many users

More complex problems require ability to upload codeUsed in analysis of extended data sets (helioseismology, etc)System allocates resources; Security an issueModels and simulations have similar requirements

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

EGSO Services & Products

EGSO will establish a number of services which can also be used by other communities and Grids

API access to the EGSO Search EngineSupports complex queries of all metadata as a service

Solar Event Catalogue serversSites that specialize in providing access to Solar Event data

Servers of complex data productsServers able to process requested items on the fly

Products include derived parameters, specialized actions, etc. e.g. GOES temperature, matched areas of images, etc.

SolarSoft packages, e.g. Chianti, could be brought up like this

A lot of scope for processing a number of things in this way

Extracted & processed data productsRequested data can be provided in a number of formats

Single file type/format will not satisfy all requirements

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Definition of Requirements

Solicitation of RequirementsUser Survey - in collaboration with SpaceGRID, in March 2002, with over 100 responsesUse Cases - covering multiple usage modes and scientific goalsUser Consultation - discussions at meetings and other discussionsBrainstorming - Systems Concepts Document, cross-fertilization from other GRID and Virtual Observatory projects

Total of 220 requirements definedRequirements refined to address system capabilities, not implementation

Specified in formal manner applicable in system design, but annotated to allow for meaning to be readily understood by stakeholdersCurrently soliciting feedback and finalizing priority of each requirement

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Architecture – Roles and Relations

Architecture defined in terms of rolesProvider, Broker and Consumer

provider

provider

provider

provider

consumerconsumer

consumer

Information Broker

Information Broker

Information Broker

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

Prototypes of Services

Solar Event Catalogue (SEC) ServerServer specializing in event catalogues

EGSO interested in things that could be added

Currently being tested with: Active regions, CMEs, and Flares using data from

NOAA, Hawaii and LASCO, etc. Can be seen through URL:

http://radiosun.ts.astro.it/sec/sec.php

Database for Solar Observatories (DSO)Registry needed within EGSO to provide more detailed information on possible data sourcesCurrently being populated and will shortly be accessible through the Web

Eur

opea

n G

rid

of S

olar

Obs

erva

tions

http://radiosun.ts.astro.it/sec/sec.php