European Grid of Solar Observations
The need for Automatic Feature Recognition in the EGSO Project
Bob Bentley, Valentina Zharkova, Jean Aboudarham & the EGSO
Team
Feature Recognition Workshop, 23-24 October 2003, ROB
Outline
Overview of EGSO and its relationship with other projects
The problem being addressed by EGSO
EGSO Search capability
Feature Recognition software
Current status of EGSO
EGSO – European Grid of Solar Observations
EGSO is a Grid test-bed related to a particular application
Designed to improve access to solar data for the solar physics and other communities
Addresses the generic problem of a distributed heterogeneous data set and a scattered user community
Funded under the Information Society Technologies (IST) thematic priority of the EC’s Fifth Framework Program (FP5)
Started March 2002; duration of 36 months
Involves 11 groups in Europe and the US, led by UCL-MSSL
4 in UK, 2 in France, 2 in Italy, 1 in Switzerland, 2 in US
Several associate partners, mainly in the US
EGSO is interacting with many other projects
US-VSO, CoSEC and EGSO working closely together
Also working with the ILWS/SDO Project
Collaborated with ESA’s study project SpaceGRID
Involved with other EC funded Grid projects through GRIDSTART
Objectives of EGSO
Support user community scattered around the world
Current and future projects are international collaborations
EGSO funded by EC, but has US partners
Provide access to solar data centres and observatories around the world
Data available in Europe (or US) not enough for many studies
Build enhanced search capability for solar data
Analysis of solar data is event driven
Search capability linked to this not currently available
Similar approach also required for STP, etc. data
Increasing data volumes, etc. require new methodology
Provide ability to process data at source
Both pipeline and more complex processing
Need for these capabilities not unique to solar physics…
The extended EGSO family
Partners provide expertise in solar physics and IT
UK: UCL-MSSL, UCL-CS, RAL, Univ. Bradford, Astrium
France: IAS (Orsay), Observatoire de Paris-Meudon
Italy: Istituto Nazionale di Astrofisica (INAF), Politecnico di Torino
INAF includes observatories of Turin, Florence, Naples and Trieste
Switzerland: Univ. Applied Sciences (Windisch)
Netherlands: ESA – Solar Group
US: SDAC (NASA-GSFC), National Solar Observatory (VSO)
Stanford University, Montana State University (VSO)
Lockheed-Martin (CoSEC)
Connections with related Projects
Virtual Solar Observatory (US-VSO)
SDAC (NASA-GSFC) and NSO are partners in EGSO
Joe Gurman (SDAC) is VSO Project Scientist; Frank Hill (NSO) is lead investigator
EGSO Coordinator is Chair of VSO Steering Committee
Differences in scale in objectives – big/small box…
Sun-Earth Connector (CoSEC)
EGSO Coordinator is a CoI on the recently funded CoSEC continuation grant
Significant synergies between EGSO and CoSEC
The three projects are trying to collaborate closely
Joint sessions at AAS and AGU; regular telecons
First joint meeting held at MSSL in Oct. 2002
Joint Technical Meeting in conjunction with AGU (Dec. 2003)
Potential expansion…
EGSO architecture is designed to be flexible, and the system should be able to handle potential areas of expansion. On the horizon are:
Living with a Star (ILWS)
Solar Dynamics Observatory (SDO) will produce 2TB/day
EGSO could enable immediate access to the data
And optimize creation of “duplicate copy” in Europe
Modelling and Simulations
Should be possible if we can support uploading of code
Space Weather
Immediacy of access to data – few minutes to few hours
Access to other types of data: STP, magnetospheric, etc.
Use of models with parameters derived from data as input
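As a rough sense of scale for the SDO figure quoted above, the following sketch converts 2 TB/day into a sustained transfer rate. It assumes decimal terabytes (10^12 bytes) and a perfectly even flow over 24 hours; both are simplifying assumptions for illustration, not figures from the project.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mbit_per_s(terabytes_per_day: float) -> float:
    """Average rate in Mbit/s needed to move the given daily volume."""
    # Assumption: 1 TB = 10**12 bytes (decimal units), 8 bits per byte.
    bits_per_day = terabytes_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


sdo_rate = sustained_rate_mbit_per_s(2.0)
print(f"{sdo_rate:.0f} Mbit/s")  # prints "185 Mbit/s"
```

Keeping a duplicate copy in Europe would therefore need roughly this rate on average, before any allowance for overheads or catching up after outages.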
Generic Solar Physics Query
Identify suitable observations (many serendipitous)
As many different data sets as are available
Should be possible without accessing the data
Locate the data
Data scattered, with differing means of access (some proprietary)
Often only need a subset of each data set
Process the data
Involves extraction and calibration of a subset of raw data
Uses code defined by instrument teams (SolarSoft, C…)
Return results to the User
Compare results from different instruments
SolarSoft (IDL) provides a standard platform for analysis
Note the exchange in order of the 3rd and 4th bullets in this Grid expression of the problem, as compared to current practice
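The effect of swapping "process" and "return" can be illustrated with a small sketch that compares the volume of data moved under the two orderings. The `Dataset` class, its fields, and the idea that the processed subset scales with a simple fraction of the raw volume are all assumptions made for illustration; they are not part of the EGSO design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    name: str               # hypothetical data set label
    raw_bytes: int          # size of the raw data held by the provider
    subset_fraction: float  # assumed fraction of the raw data a study needs


def bytes_moved_current_practice(datasets: List[Dataset]) -> int:
    """Raw data are returned to the user first, then processed locally."""
    return sum(d.raw_bytes for d in datasets)


def bytes_moved_grid(datasets: List[Dataset]) -> int:
    """Data are extracted and calibrated at source; only the subset is returned."""
    return sum(int(d.raw_bytes * d.subset_fraction) for d in datasets)


example = [Dataset("A", 1000, 0.1), Dataset("B", 2000, 0.5)]
print(bytes_moved_current_practice(example))  # prints 3000
print(bytes_moved_grid(example))              # prints 1100
```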
Nature of solar observations
For a complete picture of what is happening, we need to use as wide a range of observations as possible
The appearance of the Sun changes dramatically with wavelength
Different layers of the solar atmosphere and material at different temperatures are best seen at different wavelengths
For technical and practical reasons:
UV, EUV, X-rays and γ-rays observed from space
Radio and optical wavelengths observed from the ground
Issues related to coverage by each observatory
Differences in approach to handling data have developed
The observations are used to build up a picture of the plasma in multi-dimensional parameter space (incl. x, y, z, t, T & )
How plasma contained in 3D structures evolves with time
Where and how energy is released and how it affects the system
Etc…
Some generic issues
We need to build on the existing situation
User community scattered around the world
Capabilities of users & their computing facilities vary greatly
Users want to know if data addressing a problem exists
Not really interested in where the data are located
Or, how the data are accessed, processed, etc.
Increasing desire for combined studies with other regimes
Astrophysics, Climate Physics, Space Weather, etc.
Data centres and observatories located around the world
Large and small data providers (with varying resources)
Need to make it as easy as possible to add new data sets
Planned data volumes much larger than for current instruments
Cataloguing differs in quality, contents, and dependencies
Must handle multiple copies of data and proprietary data
Must ensure integrity of data providers
Authentication an issue that needs serious consideration
Need to minimize how it affects the user, etc.
The EGSO Search Engine
In order to provide an enhanced search capability, EGSO will improve the quality and availability of metadata
Enhanced cataloguing describes the data more fully
Standardized metadata versions of observing catalogues tie together the heterogeneous data sets
New types of catalogue allow searches on events, features and phenomena rather than just date & time, pointing, etc…
Ancillary data used to provide additional search criteria
Images, time series, derived products, etc.
Search Registry describes all metadata available for search
It will be possible to access EGSO through:
A flexible Graphical User Interface (GUI) – normal route
An Application Program Interface (API) – this provides access for users from other applications, communities or Grids
The enhanced solar catalogues
Unified Observing Catalogues (UOC)
Metadata form of observing catalogues used to tie together the heterogeneous data, leaving the data unchanged
Self describing (e.g. XML), quantised by time and instrument, with no dependencies on ancillary data or proprietary software and any errors corrected
Standards defined for future data sets (e.g. STEREO, ILWS, Solar-B)
Solar Event Catalogues (SEC)
Built from information contained in published lists
Flare lists, CME lists, lists in SGD, etc.
Solar Feature Catalogue (SFC)
Lists of the occurrence of events, phenomena and features provide an alternate means of selecting data
Derived using image recognition software developed in WP5
Similar hierarchical cataloguing required in other data Grid projects
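To make "self describing (e.g. XML), quantised by time and instrument" more concrete, here is a minimal sketch of what one UOC entry might look like when built with Python's standard library. The element names (`observation`, `instrument`, `start`, `end`, `wavelength`), the `unit` attribute and the sample values are all hypothetical; the slides do not give the actual UOC schema.

```python
import xml.etree.ElementTree as ET


def build_uoc_entry(instrument: str, start: str, end: str, wavelength_nm: float) -> ET.Element:
    """Build one self-describing catalogue entry (hypothetical layout)."""
    entry = ET.Element("observation")
    ET.SubElement(entry, "instrument").text = instrument
    ET.SubElement(entry, "start").text = start   # ISO 8601 start time
    ET.SubElement(entry, "end").text = end       # ISO 8601 end time
    # The unit travels with the value, so no external look-up table is needed.
    ET.SubElement(entry, "wavelength", unit="nm").text = str(wavelength_nm)
    return entry


# Placeholder values, not real catalogue contents.
entry = build_uoc_entry("EXAMPLE-IMAGER", "2003-01-01T00:00:00Z", "2003-01-01T00:00:10Z", 656.3)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Because each entry carries its own field names and units, a search over entries from different instruments does not depend on ancillary data or proprietary software.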
Objective of the improved metadata is to pose questions like:
Identify events when a filament eruption occurred within 30° of the north-west limb and there were good observations in Hα, EUV and soft X-rays
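A question of this kind could be expressed against catalogue records roughly as follows. This is a sketch under several assumptions: the record layout (`FeatureEvent`), the waveband labels, the `coverage` mapping, and the choice of 45° N, 90° W as the reference point for "the north-west limb" are all hypothetical, and the solar B0 tilt is ignored. The angular separation uses the standard spherical law of cosines.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class FeatureEvent:
    kind: str       # e.g. "filament eruption"
    time: str       # ISO 8601 time of the event
    lat_deg: float  # heliographic latitude, north positive
    lon_deg: float  # longitude from central meridian, west positive


def angular_separation_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle separation between two points on the solar sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_sep = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))


NW_LIMB = (45.0, 90.0)  # assumed reference point for the north-west limb
REQUIRED_BANDS = {"H-alpha", "EUV", "soft X-ray"}


def matching_events(events: List[FeatureEvent], coverage: Dict[str, Set[str]]) -> List[FeatureEvent]:
    """coverage maps an event time to the wavebands with good observations."""
    hits = []
    for ev in events:
        if ev.kind != "filament eruption":
            continue
        if angular_separation_deg(ev.lat_deg, ev.lon_deg, *NW_LIMB) > 30.0:
            continue
        if REQUIRED_BANDS <= coverage.get(ev.time, set()):
            hits.append(ev)
    return hits
```

The point of the sketch is that the query touches only metadata (event lists and observing catalogues); no image data need to be opened to answer it.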
One User Interface implementation will not satisfy all user requirements and users will be able to tailor the interface to their needs
Feature Recognition in EGSO
The enhanced search capability of EGSO requires development of new types of metadata – the Solar Feature Catalog (SFC) is a major part of this
Key to developing alternate routes into the data
EGSO has a work package (WP5) dedicated to developing tools needed to detect common solar features and then employing them to derive the feature catalog
Major undertaking! Where possible we need help from others in the community to:
Help verify the results and extend capability to as wide a range of features as possible
Help refine ideas of how the results can be used
Outline of progress on WP5
Software developed to prepare images for feature recognition codes
Removal of artifacts, regularize shape, etc.
Now working on codes to detect the features
Codes for sunspots, active regions and filaments developed and under test
Codes to recognize coronal holes and magnetic neutral lines under investigation
Document discussing techniques available shortly
Trying to define standard way of describing features for the feature catalogue
Preliminary version of SFC has been prepared
Document on format available for discussion
Now starting to experiment with the results to determine if objectives can be realized with the stored information
Image Preprocessing Toolkit
Difficulties with Images:
Image shape (ellipse), centre and pole coordinates
Weather transparency (clouds) and different thickness of atmosphere
Centre-to-limb darkening
Defects in data (strips, lines intensity)
Errors in FITS header information
First part is code to clean images prior to further processing
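One of the listed difficulties, centre-to-limb darkening, is often handled by dividing each on-disk pixel by a model darkening profile. The sketch below uses the textbook linear limb-darkening law, I(μ)/I(centre) = 1 − u(1 − μ) with μ = sqrt(1 − (r/R)²). It is only an illustration: the slides do not say which method the EGSO toolkit uses, the coefficient `u=0.6` is an arbitrary illustrative value, and the function names are hypothetical.

```python
import math
from typing import List


def limb_darkening_factor(r: float, radius: float, u: float = 0.6) -> float:
    """Linear law: intensity at distance r from disk centre, relative to the centre."""
    mu = math.sqrt(max(0.0, 1.0 - (r / radius) ** 2))
    return 1.0 - u * (1.0 - mu)


def flatten_disk(image: List[List[float]], cx: float, cy: float,
                 radius: float, u: float = 0.6) -> List[List[float]]:
    """Divide on-disk pixels by the model profile; set off-disk pixels to 0."""
    flat = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            r = math.hypot(x - cx, y - cy)
            if r >= radius:
                new_row.append(0.0)  # outside the solar disk
            else:
                new_row.append(value / limb_darkening_factor(r, radius, u))
        flat.append(new_row)
    return flat
```

After such a correction a quiet-Sun pixel near the limb has roughly the same value as one at disk centre, which makes a single intensity threshold usable across the disk.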
Sunspot detection in white light
Original image on the left and detected sunspots on the right
Filament Detection in Hα
Original image on the left and detected filaments on the right
Active region detection in Hα
Detected active regions on the right with corresponding result from Big Bear Solar Observatory on the left
AR detection in Ca II K, Hα and EUV images
04/04/2002: Hα 07:23 + Ca K3 07:30 + EUV 07:26
[Figure panels: Hα, Ca II K, EUV]
Building the Solar Feature Catalog
For each feature we must first:
Fully test the feature recognition code using images from a wide time period and several sources
Finalize the format of how the information is described in the catalog
Then run code on a representative set of images
Summary & synoptic data gathered by SOHO one example of type, but not necessarily coherent
Ideally need image cadence that allows us to have a reasonable idea of when things change
Probably requires use of images from several GBOs
Raises issues of consistency related to image quality, etc.
Use of the Solar Feature Catalog
The SFC can be used in at least three ways:
Outline features recognized in one wavelength on an image taken in another (at a different time)
Determine when events related to features have occurred – e.g. filament eruptions, flux emergence
Track relative motion of features – e.g. sunspots
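The second use, determining when an event occurred, can be sketched as a comparison of successive catalogue snapshots. The example assumes the catalogue assigns a persistent identifier to each feature across images, which the slides do not state; the function name and data layout are hypothetical.

```python
from typing import List, Set, Tuple


def find_disappearances(snapshots: List[Tuple[str, Set[str]]]) -> List[Tuple[str, str, str]]:
    """Report features present in one snapshot and absent from the next.

    snapshots: (time, feature_ids) pairs ordered by time.
    Returns (feature_id, last_seen, first_missing) tuples; the event happened
    somewhere in that interval, so the image cadence sets the time resolution.
    """
    events = []
    for (t_prev, ids_prev), (t_next, ids_next) in zip(snapshots, snapshots[1:]):
        for feature_id in sorted(ids_prev - ids_next):
            events.append((feature_id, t_prev, t_next))
    return events


catalogue = [
    ("t0", {"F1", "F2"}),
    ("t1", {"F1", "F2"}),
    ("t2", {"F1"}),
]
print(find_disappearances(catalogue))  # prints [('F2', 't1', 't2')]
```

A disappearance is only a candidate event: a filament may also vanish from the list because it rotated past the limb or because of poor image quality, so a real implementation would need further checks.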
The SFC will be deployed as a Server addressed through Web Services
Not clear whether the SFC Server will be combined with the already deployed SEC Server
Server will be accessible to other VO projects
Feature Recognition software will be released under the EGSO’s Open Source software policy
Requirement of EC on IST Projects
Current Status of EGSO
Extensive survey of requirements in 2002
Working architecture defined and detailed during the first half of 2003
Release 1 of EGSO was demonstrated at IST2003 in Milan (October 2-4)
Demonstration of how the three roles work together
Solve simple query based on time and wavelength
Access to data resources initially through SolarWeb
Working prototype of the Solar Event Catalog Server
New version of EGSO Data Model document was released recently
Describes both solar and heliospheric data
Activities in the near future
First components of Feature Recognition software and documentation available shortly
Image Cleaning software is first part of toolkit
Release 2 of EGSO due at the end of November
Development of interface to Data Providers
More complex query supported through SEC Server
Greater GUI capabilities
Release 3 of EGSO due mid-February 2004
Building profiles of data (etc.) providers
Discussing file formats and metadata with producers
Trying to finalize designs of UOC and Search Registry, and description of providers in Resource Registry
Format of synoptic maps discussed with NOAA and NSO
Discussing concerns and interfaces with data sources
Conclusions
Of necessity the solar community needs to move towards a virtual environment to access solar and related data
EGSO is a Data/Computing GRID that will be a key part of the global virtual solar observatory and has already established close links with counterparts in the US
Feature recognition techniques are being used to detect solar features at a number of wavelengths
One of the ways that we are developing to provide innovative ways to search for solar observations
For more information on EGSO see:
http://www.egso.org
Or e-mail
Architecture – Viewed as 3 Tiers
[Diagram: EGSO architecture. Labels: Providers; Consumers; Brokers; User Interface; Role-specific Infrastructure; Connectors and Adapters; Internal Resources; Grid Infrastructure; RD; RI]
Possible requirements for Data Providers
Need to register each dataset. Required info. could include:
Catalogue Information: All observations “should be summarised in observing catalogues” (prime source only)
Data Map: This defines in broad outline what time intervals of the data are actually held. It should be considered dynamic.
Type of storage: On-line, near on-line or off-line.
Means to retrieve data: The exact meaning of this information depends on whether the provider is active or passive.
Active source: address where processed data can be retrieved from and the means of retrieval (ftp, http, etc.).
Passive source: map of the physical location of data within the provider system and the means of retrieval.
Resource limits: The resource usage beyond which a provider switches from active to passive mode needs to be defined.
Details of access restrictions: If any part of a data set is proprietary (for some period) or otherwise restricted.
Frequency of updates: So that system does not have to constantly monitor all data sources
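The registration items listed above map naturally onto a small record type. The following is a sketch only: the class name, field names, types and the use of a single integer load figure as the "resource limit" are assumptions made for illustration, not the EGSO Resource Registry format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class DatasetRegistration:
    name: str
    catalogue_location: str                    # where the observing catalogue lives
    data_map: List[Tuple[datetime, datetime]]  # time intervals actually held (dynamic)
    storage: str                               # "on-line", "near on-line" or "off-line"
    retrieval: str                             # e.g. "ftp" or "http"
    active: bool                               # active or passive provider
    resource_limit: Optional[int] = None       # assumed unit: concurrent requests
    restricted_until: Optional[datetime] = None  # end of any proprietary period
    update_interval_hours: Optional[int] = None  # how often the data map changes

    def holds(self, when: datetime) -> bool:
        """True if the data map covers the requested time."""
        return any(start <= when <= end for start, end in self.data_map)

    def mode(self, current_load: int) -> str:
        """An active provider falls back to passive mode beyond its limit."""
        if not self.active:
            return "passive"
        if self.resource_limit is not None and current_load > self.resource_limit:
            return "passive"
        return "active"


# Placeholder values for illustration only.
reg = DatasetRegistration(
    name="example-dataset",
    catalogue_location="http://example.org/catalogue",
    data_map=[(datetime(2002, 1, 1), datetime(2002, 6, 30))],
    storage="on-line",
    retrieval="http",
    active=True,
    resource_limit=10,
)
```

Publishing the update interval alongside the data map is what lets the system poll a provider on a schedule instead of monitoring every source constantly.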
Access to Resources
EGSO is a Grid and activities depend on access to resources
Resources described by entries in a Resource Registry and managed by a Broker. Types include:
Metadata – from prime data providers
Data – from data centres, observatories, etc.
Processing – simple, multi-instance processors, HPC(?)
Storage – cache space, on-line mass storage, etc.
Services – support of complex (meta)data products
Note: Some providers can support multiple capabilities
The Broker allocates resources and controls:
How much is being requested of a particular provider
Processing of data & staging of results
Processing may be at different site to data provider
Broker & Registries replicated to provide system resilience and permit load sharing
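A Broker that controls how much is requested of each provider might choose among replicated sources along the following lines. This is a hedged sketch: the `ProviderStatus` record, the request-count load measure and the "lowest relative load" rule are illustrative assumptions, not the allocation policy described in the EGSO architecture documents.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderStatus:
    name: str              # hypothetical provider label
    pending_requests: int  # requests currently outstanding
    max_requests: int      # limit the provider has registered


def choose_provider(replicas: List[ProviderStatus]) -> Optional[ProviderStatus]:
    """Pick the replica with the lowest relative load that still has capacity."""
    available = [p for p in replicas if p.pending_requests < p.max_requests]
    if not available:
        return None  # every replica is at its limit; the request must wait
    return min(available, key=lambda p: p.pending_requests / p.max_requests)


replicas = [
    ProviderStatus("site-a", pending_requests=8, max_requests=10),
    ProviderStatus("site-b", pending_requests=3, max_requests=20),
    ProviderStatus("site-c", pending_requests=5, max_requests=5),
]
chosen = choose_provider(replicas)
print(chosen.name)  # prints "site-b"
```

Other criteria mentioned in the talk, such as choosing the closest source, would enter as additional terms in the selection key.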
Ancillary Data
This is a “catch-all” term for all non-catalogue items
Items used to set the context for a search
Images, time series, derived parameters, etc.
Processed products from data-intensive instruments
STP, etc. data could be incorporated in this way
Some data items will have to be derived on-the-fly
Not possible for everything to have been prepared already
Servers will provide this type of complex data product
Products include derived parameters, specialized actions, etc.
e.g. GOES temperature, matched areas of images, etc.
SolarSoft packages, e.g. Chianti, could be brought up like this
Handling the data
EGSO will dramatically enhance access to solar data
Data could be located anywhere in the world
User only needs to know observations exist, not where located
System able to optimize use of sources (closest, least used, etc.) and handle replicated data and aggregated sources
Burden on provider minimized to encourage participation
As far as possible, process the data “at source”
Involves extraction and calibration of a subset of the raw data
Software for processing defined by instrument team (IDL, C…)
Processing reduces volumes of data moved around
Simplifies requirements on user’s own system
Standard (pipe-line) processing adequate for many users
More complex problems require ability to upload code
Used in analysis of extended data sets (helioseismology, etc.)
System allocates resources; security an issue
Models and simulations have similar requirements
EGSO Services & Products
EGSO will establish a number of services which can also be used by other communities and Grids
API access to the EGSO Search Engine
Supports complex queries of all metadata as a service
Solar Event Catalogue servers
Sites that specialize in providing access to Solar Event data
Servers of complex data products
Servers able to process requested items on the fly
Products include derived parameters, specialized actions, etc.
e.g. GOES temperature, matched areas of images, etc.
SolarSoft packages, e.g. Chianti, could be brought up like this
A lot of scope for processing a number of things in this way
Extracted & processed data products
Requested data can be provided in a number of formats
Single file type/format will not satisfy all requirements
Definition of Requirements
Solicitation of Requirements
User Survey - in collaboration with SpaceGRID, in March 2002, with over 100 responses
Use Cases - covering multiple usage modes and scientific goals
User Consultation - discussions at meetings and other discussions
Brainstorming - Systems Concepts Document, cross-fertilization from other GRID and Virtual Observatory projects
Total of 220 requirements defined
Requirements refined to address system capabilities, not implementation
Specified in formal manner applicable in system design, but annotated to allow for meaning to be readily understood by stakeholders
Currently soliciting feedback and finalizing priority of each requirement
Architecture – Roles and Relations
Architecture defined in terms of roles
Provider, Broker and Consumer
[Diagram: providers, consumers and Information Brokers]
Prototypes of Services
Solar Event Catalogue (SEC) Server
Server specializing in event catalogues
EGSO interested in things that could be added
Currently being tested with active regions, CMEs and flares using data from NOAA, Hawaii and LASCO, etc.
Can be seen through URL:
http://radiosun.ts.astro.it/sec/sec.php
Database for Solar Observatories (DSO)
Registry needed within EGSO to provide more detailed information on possible data sources
Currently being populated and will shortly be accessible through the Web