
Helioseismic and Magnetic Imager for

Solar Dynamics Observatory

AIA Logo Forthcoming

HMI AIA JSOC

Ground Data System Plan Overview

HMI-S019

CDR Version 10 November 2004

Stanford University Hansen Experimental Physics Laboratory and

Lockheed-Martin Solar and Astrophysics Laboratory


The cover of the NASA 1984 report "Probing the Depth of a Star: The Study of Solar Oscillations from Space" featured Pomodoro's sphere sculpture at the Hirshhorn. That report led to the helioseismic study of the global Sun. Pomodoro's Cube at Stanford symbolizes HMI data cubes for investigation of localized regions in the Sun.



HMI & AIA Joint Science Operations Center Ground Data System

for Science Data Processing Plan Overview

a.k.a.

JSOC-GDS Overview

HMI-S019

10 November 2004 2:33 PM

First update for JSOC

version 0.7


Contents

1. Overview
   1.1. Scope of this document
   1.2. History of JSOC
   1.3. JSOC role in SDO
   1.4. Heritage from MDI
   1.5. Data Access Policy
   1.6. Related Documents
2. JSOC GDS Functions
   2.1. Basic JSOC Data Handling and Processing Concepts - Common Infrastructure
      2.1.1. Data Organization and Naming
      2.1.2. Meta-data
      2.1.3. Data Catalog
      2.1.4. Data formats
      2.1.5. Processing Control
      2.1.6. Five Kinds of Users
   2.2. Data Capture System
   2.3. Level-0 Data Processing
   2.4. Data Processing to Level-1
   2.5. HMI Higher Level Processing
   2.6. AIA Higher Level Processing
   2.7. Archives
   2.8. Export System
   2.9. Integration & Test
3. JSOC Data Volumes
4. JSOC Hardware Configuration
   4.1. Capture system
   4.2. Pipeline Processing system
   4.3. Archive (online, offline, nearline, shelf, offsite)
   4.4. Connectivity (web, grid, DDS, LMSAL, MOC, etc)
5. Development Plan
   5.1. Data EGSE
   5.2. Prototype System
   5.3. Pipeline
   5.4. Science Modules


1. Overview

The SDO HMI Contract Performance Specification states that:

9.3 Science Operations Control Center (SOC)

The Contractor shall design, develop and operate a Science Operations Center (SOC). The SOC will interface with the SDO Mission Operations Center (MOC) to provide operational interfaces for the on-orbit operation and monitoring of the instrument and the transfer of instrument housekeeping and science data collected by the spacecraft and transmitted to the MOC ground stations. The SOC shall subsequently convert this raw science data into valid research quality data and data products for archival and public access and distribution. Some key functions to be performed in the SOC include:

a. Instrument Flight Operations. The conduct of instrument on-orbit flight operations in concert with the SDO MOC and its associated Flight Operations Team (FOT). The Contractor shall be responsible for the command planning and the health and safety monitoring of the flight instrument. The specific extent of command authority will be resolved during mission operations development dialogues with the SDO Project, but as a minimum, any command activity that constitutes a hazardous command activity or requires coordination with the FOT for reconfiguring of spacecraft resources or reconfiguring another instrument shall be planned, integrated and authorized by the FOT. The SOC shall provide all the required operator interfaces to display, monitor and analyze the instrument operating state, operating condition and trended behavior; support the instrument operation and observation planning and the build of associated commands and command loads; and provide effective communication means for contact and coordination with the MOC and among the instrument operations team members.

b. Science Data Processing. The receipt, sorting, quality checking and processing of the instrument science data forwarded to the SOC by the SDO ground system. The Contractor shall provide the required software and computational algorithms to process this data into the required science data products on a regular, routine basis. This effort includes the need to monitor the calibration of the flight instrument and adjust the processing software accordingly. The Contractor shall store and archive the science data and perishable data products, and shall provide public access and distribution to the data and data products. The Contractor shall prepare and submit the plan for the architecture, data flow, processing, archival and distribution of the science data through the SOC (CDRL SD326).

This document is the first of the three parts of CDRL SD326.


1.1. Scope of this document

The HMI Science Operations Center (SOC) has been merged with the AIA SOC to become the HMI/AIA Joint Science Operations Center (JSOC). The JSOC consists of two parts. The Ground Data System (GDS) at Stanford University provides the HMI and AIA data capture from the SDO DDS at White Sands, the processing through level-0 and level-1 data products, and permanent local and offsite archiving of the telemetry data streams. The JSOC-GDS also provides higher level science data processing for HMI through to the level-2 data products identified as HMI standard data products. The JSOC-OPS center at the Lockheed Martin Solar and Astrophysics Laboratory provides the instrument commanding and health and safety monitoring.

This document contains the top level description of the JSOC-GDS.

1.2. History of JSOC

The HMI investigation was selected for SDO in August 2002. The SDO plan for operations, data capture and data analysis calls for each instrument investigation team to provide a Science Operations Center or SOC where the instrument will be operated and monitored and where the science data will be archived and processed into high level data products. The science analysis of these data products is also a responsibility of the selected investigations but is beyond the scope of the SOCs. The original plan for the HMI SOC was to build upon the successful SOHO/MDI SOI Science Support Center at Stanford University for the science data handling functions of the HMI SOC and to operate the HMI instrument jointly with our Co-I team at Lockheed Martin Solar and Astrophysics Laboratory (LMSAL).

In the fall of 2003 NASA decided to add the LMSAL-proposed AIA instrument to the SDO mission. At that time it became clear that the most cost-effective and science-efficient plan was to merge the AIA SOC with the HMI SOC to form the Joint Science Operations Center (JSOC) for the HMI and AIA investigations. For several reasons, including ease of access by visitors at Stanford, existing secure access areas at LMSAL, lower cost of procurement and staffing at Stanford, and the operations experience at LMSAL, we jointly concluded that the commanding and health monitoring for both instruments would best occur at LMSAL while the data archiving, low level processing, and data export functions would best be handled at Stanford. The higher level science analysis for each instrument will be centered at the lead institution for each instrument. Thus we have the present JSOC split into two parts, the JSOC-OPS at LMSAL and the JSOC-GDS at Stanford.

For many years the solar physics groups at Stanford University and LMSAL have cooperated on a number of missions under the umbrella of the Stanford-Lockheed Institute for Space Research. This agreement of cooperation between the two institutions helps to enable a seamless joint effort.


1.3. JSOC role in SDO

The SDO SOC roles include several functions which the HMI and AIA teams have divided into three distinct tasks. These are the JSOC-OPS center to be located at the LMSAL facility in Palo Alto, the JSOC-GDS center to be located on the Stanford University campus, and the AIA interactive science processing to be located at LMSAL. The JSOC-OPS functions and the low level JSOC-GDS functions for both HMI and AIA are essentially identical. The higher level processing differs and will be accomplished cooperatively but independently at Stanford and LMSAL for HMI and AIA respectively.

The HMI goals require processing of the HMI data into higher level helioseismology and magnetic field derived standard data products to enable the data analysis needed to advance the investigations. These processing tasks are very data intensive and require easy access to the whole volume of HMI data with a sequence of processing steps. The processing and data handling infrastructure required for the JSOC low level data processing is very well suited for these tasks. Therefore for HMI the higher level processing to standard data products will be accomplished in the same computational system as the combined lower level processing.

The AIA science goals require interactive processing of AIA data with easy access by scientists to movie-like sequences of data images and derived products. These processing steps will be built on the experience of the AIA team, which derives from the TRACE and Yohkoh missions. The interactive processing center will be located at LMSAL and connected to the Stanford JSOC-GDS center with dedicated high bandwidth networking.

In summary, the JSOC-OPS will for both HMI and AIA:

• Provide a secure workstation area for instrument commanding and status monitoring.
• Be the access point for the dedicated network connections to the SDO Mission Operations Center (MOC).
• Provide an isolated facility for reviewing quicklook science data provided by the JSOC-GDS.
• Implement the science observing plan developed by each instrument's Science Planning Team.
• Provide command history logs needed for data analysis to the JSOC-GDS.

The JSOC-GDS will for both HMI and AIA:

• Provide the JSOC-GDS Data Capture System including:
  o An isolated area for data receipt from the SDO Data Distribution System (DDS)
  o The access point for the dedicated network connections to the DDS
  o The data capture from the DDS with archiving and offsite copy of the DDS delivered data
  o Processing raw data to level-0 images with associated meta-data
  o Receive JSOC-OPS logs and SDO FDS and planning data
  o Deliver the level-0 data, logs, FDS data, and planning data to the JSOC-GDS pipeline processing system
• Provide the JSOC-GDS Pipeline Processing System including:
  o Provide the data center area for processing, archiving, and export functions
  o Archive and distribution on request of level-0 data
  o Produce quicklook images with required processing and deliver to JSOC-OPS
  o Produce level-1 calibrated science observables for HMI
  o Produce level-1a subset of calibrated observables for AIA
  o Process, archive, and distribute JSOC-OPS logs, SDO FDS and planning data
  o Produce level-2 HMI data products
  o Distribution on request of level-1, level-1a and level-2 data in standard data storage formats
  o Produce and distribute selected HMI and AIA Space Weather data products
  o Archive selected HMI level-1, AIA level-1a, and level-2 products
  o Produce, archive, and distribute selected HMI higher level products
  o Receive, archive, and distribute selected AIA higher level science analysis products
  o Provide data catalog with access to the broad user community
  o Support LMSAL-led development of AIA calibration and quicklook processing software
  o Support HMI and AIA Co-Investigator-led development of analysis software that will run in the JSOC processing pipeline
  o Support E/PO related access to selected data products

Figure 1 shows the big picture role of the JSOC in the SDO mission while Figure 2 shows the connectivity between the components of the JSOC, the SDO MOC, the SDO DDS, and the user community.

In line with the LWS Data Management Plan the term SOC as used here, and hopefully in subsequent documents, will refer to the instrument operations, data capture, and data processing through to science data products to be distributed to the science teams (including the PIs' science teams), data archive, and data export functions. The JSOC does not include processing for science investigations per se, although JSOC facilities will be used to support the investigations by the PI teams as well as Co-Is and other LWS investigators.

The essence of SDO is to point at the Sun, turn on, and observe. The basic observation sequence is to simply start a standard observing program and let it run for a solar cycle. HMI has some calibration modes that will be run and AIA has more than one standard mode, but the basic operation mode is near-autonomous and continuous operations. The JSOC operations components will be exceedingly simple as compared even to SOHO/MDI. The plan is to have two dedicated commanding workstations, one each for HMI and AIA. Each will be able to function as a redundant system for the other. These systems will be used to generate command loads, monitor instrument health and status, and capture the instrument and observatory housekeeping data from the MOC. The only communications to the MOC will be via these workstations. They will be located in a space at LMSAL that meets NASA data system IT-2 security requirements. They will be staffed normally only during the regular business day. The operator on duty will normally tend both instruments.

The bulk of the JSOC activities are concerned with the science data. HMI will generate 55 Mbps and AIA about 67 Mbps. While this is a significant dataflow, about 15 Mbytes/sec or 1.4 TB/day, the number of separate images (30 per minute for HMI and 48 per minute for AIA) and the number of image types (36 for HMI, i.e. 6 tunings at 6 polarizations, and 8-12 for AIA) are similar in scope to the SOHO/MDI program, which had 5-6 images per minute but dozens of distinct image types in normal operations. Thus, while the data volume is some 800 times larger than MDI, the complexity is comparable. Perhaps more important, the data volume relative to the state of the art in computer hardware a few years before launch is about the same. The hardware needed to deal with the dataflow is larger than that found in a typical solar physics analysis facility but significantly less than that at a supercomputer center. In terms of floor space needed to accommodate the computer hardware, the two projects, the HMI/AIA JSOC in 2008 and MDI in 1995, are similar.
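The aggregate figures above follow directly from the two instrument rates. As a back-of-the-envelope check (a standalone sketch in Python, not part of any JSOC software):

```python
# Back-of-the-envelope check of the combined HMI + AIA science data flow.
HMI_MBPS = 55.0   # HMI science telemetry rate, megabits per second
AIA_MBPS = 67.0   # AIA science telemetry rate, megabits per second

total_mbps = HMI_MBPS + AIA_MBPS                   # 122 Mbit/s combined
mbytes_per_sec = total_mbps / 8.0                  # about 15 Mbyte/s
gbytes_per_day = mbytes_per_sec * 86400 / 1000.0   # about 1.3 TB/day of science data,
                                                   # consistent with the ~1.4 TB/day total
                                                   # quoted above once overhead is included

print(f"{total_mbps:.0f} Mbps = {mbytes_per_sec:.2f} MB/s = {gbytes_per_day:.0f} GB/day")
```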


[Figure 1 is a block diagram of the SDO ground system. It shows the SDO Mission Operations Center (Telemetry & Command system ASIST/FEDS with telemetry monitoring, command management, HK data archival and level-0 processing, automated operations and anomaly detection; the Flight Dynamics System; Mission Planning & Scheduling; Trending; the Alert Notification System; and the FLATSAT flight software maintenance lab), the two SDO ground sites at White Sands with S-band and Ka-band ground systems and the Data Distribution System including 30-day science data storage, an external tracking station, the EVE SOC at LASP in Boulder (7 Mbps science data), and the HMI AIA JSOC in Palo Alto, split between Stanford science data processing (receiving the 55 Mbps HMI and 67 Mbps AIA Ka-band science data) and LMSAL instrument monitoring and control (exchanging instrument commands and loads, housekeeping telemetry, and science planning and FDS products with the MOC).]

Figure 1. Role of JSOC in SDO mission.

Figure 2. Basic Connectivity between SDO Mission Operations, Data Distribution, and instrument Science Operations Centers.

[Figure 2 is a connectivity diagram. Housekeeping from the MOC at GSFC and science data from the DDS at White Sands flow to the redundant Data Capture System and its 30-day archive at Stanford. The HMI JSOC Pipeline Processing System at Stanford, with its catalog, primary archive, offline and offsite archives, housekeeping database, and data export and web service, serves the world: the science team, forecast centers, EPO, and the public. At LMSAL, the HMI & AIA operations area, quicklook viewing, and the AIA analysis system with its local archive and high-level data import connect to the Stanford systems.]


1.4. Heritage from MDI

As a result of the similarity of the task to our experience with MDI and TRACE we have adopted similar approaches for HMI and AIA, respectively. The basic system can be described as a pipeline of processing steps managed by automatically generated scripts with the aid of a relational database to keep track of the locations and status of each dataset.

The current MDI components of data validation, pipeline execution, standard data product generation, parallel virtual machine, data storage management, database server, data catalog, Oracle DBMS, media archive server, quality reporter, job management, SOAP data query URL, and extensive data export methods will be retained with modifications for the HMI demands. As a result, large-scale system prototyping can begin immediately following the initial system engineering studies. Changes do need to be made to data formats and to remove hardware-specific dependencies from some components.

Particular changes will be made to handle the HMI/AIA telemetry stream and required data product generation. A small version of the MDI system has been adapted to serve the high speed data capture for the EGSE during instrument development. A new simulation subsystem was built to generate the telemetry formats and rates expected from HMI and AIA. The final data capture subsystem will be a limited version of the pipeline processing system with only sufficient processors, disk, and tape units to deal with the capture, level-0 processing, and archiving role.

There will be expanded provisions for inserting known data so that data validation at each processing level can be performed.

The keyword data associated with each dataseries will now be kept in database tables independently of the image data, so information can be extracted, or keywords reprocessed, without accessing the files containing the image data.

The binary format of the data inside the pipeline will be designed for efficient storage while the format of exported data will continue to be in community standard data formats.

Some terminology will be changed to ease the conversion to a more flexible handling of data collections, better suited to end-user requests while still allowing efficient pipeline processing.

1.5. Data Access Policy

HMI and AIA are committed to an open data policy. As with MDI and TRACE, JSOC data will be available to the science community in each level of reduction as soon as it is available at the JSOC. The current best calibration parameters and software will also be freely available. The HMI and AIA teams will coordinate with EVE investigators to determine appropriate formats for data exchange, catalog exchange, and archive locations. We suggest that the JSOC be the NASA designated mission archive for HMI and AIA data for the duration of the mission. At the conclusion of the mission the raw, reduced, and calibrated data will be deposited in an appropriate NASA specified data archive. A number of HMI and AIA data products will be of immediate value for space-weather analysis. These products will be computed from the best available data set in near real time for rapid delivery to users. The particular set will be determined in Phase-D but will certainly include full-disk magnetograms, continuum images, and farside activity images from HMI and selected lower resolution images and image sequences (movies) from AIA.

1.6. Related Documents

The basic requirements for the JSOC-GDS are derived from the HMI and AIA Contracts between NASA and Stanford University and Lockheed-Martin. These requirements originated in the SDO Definition Study published in 2002 and are reflected in the SDO Level 1 mission requirements. These documents and others listed below are all available via the HMI web site at http://hmi.stanford.edu/doc/index.html

The JSOC-GDS and related topics are described in more detail in:

• CDRL SD301: Instrument Science Requirements which is a section of the HMI Science Plan (HMI-S014)

• The AIA Instrument Science Requirements

• CDRL SD326b: Science Data Analysis, Processing, Archive and Distribution Plan;

• CDRL SD326c: Functional Descriptions of Data Products

• JSOC IT Security Plan


2. JSOC GDS Functions

The JSOC Ground Data System consists of several distinct components. Figure 2 shows a top level schematic of the planned JSOC system. The Stanford provided components are on the left and the LMSAL components are on the right. The connections to the SDO-MOC and SDO-DDS are shown at the top, with connections to the science teams and the world on the lower right. While the entire JSOC operation must meet high standards of data security, only specific components must meet the more stringent data security criteria. The dashed-shaded regions are components which must meet NASA data security control levels: IT-2 at LMSAL and possibly IT-3 at Stanford.

2.1. Basic JSOC Data Handling and Processing Concepts - Common Infrastructure

The data handling and processing environments for the data capture system and the pipeline processing system are built on a common infrastructure. This infrastructure is mostly inherited from the MDI processing system. The MDI system is built within the SOI Science Support Center (SSSC) at Stanford.

Because of its central role in the implementation of the JSOC an extended discussion of the infrastructure is included in this version of this document. Much of this discussion will be moved to a separate document in a later version.

HMI and AIA data is inherently a set of sequences in time of images of like quantities. In the case of the helioseismology data in particular these sequences are analyzed as time series of physical quantities. These time series grow in duration from the start of the mission to its conclusion. With the capability of current and projected computer technology it is not practical to maintain these sequences as single growing datasets. This was also the situation for SOHO/MDI data starting in 1995 and in fact for WSO data starting in 1975. In the case of WSO computer technology has long since obliterated the problem. In the case of MDI current trends suggest that the problem will be simpler in only another half decade. For SDO the mismatch of the size of the growing time series with affordable online storage will continue well into the baseline mission. Based on the then 15 years of experience with WSO we developed the MDI infrastructure for data handling. The plan for the JSOC is to make incremental modifications to the MDI data handling system to account for convenience issues in the MDI system, new capabilities enabled by lower cost online storage, the larger volume of data, and AIA needs for near-line storage.

One of the key concepts for the MDI system and the JSOC system is that data is labeled with descriptive names that are not strongly coupled with their present location within the file system of the computers. These descriptive names are designed to reflect the nature of the data as a collection of growing sequences of like data. When a user needs access to a particular time series segment the data system makes that data available on the disks and informs the user where it is. This decouples the storage and maintenance of the data from the working directory where it is made or used. It also removes those functions from the user's responsibility and places them under the control of the data storage and distribution system.


The infrastructure developed for MDI and being modified for the JSOC includes the Data Storage and Distribution System (DSDS) which owns all the data, the DataUnit catalog that binds descriptive names to the actual storage both online and offline, the programming libraries that provide services to the user programs for access to the data and its metadata, and the management of user program execution within the pipeline environment.

2.1.1. Data Organization and Naming

Each basic sequence of like images is called a dataseries. A dataseries consists of a sequence of datarecords. A datarecord is the basic atomic unit of a dataseries. Each datarecord is the data for one step in time. Most but certainly not all dataseries are sequences in time. They can be in principle any list of data items. The datarecords are each associated with a unique index within their dataseries. This index can be referred to as the time axis or simply as the principal index axis of the dataseries. The index is usually a simple ordinal number with a specified mapping onto time but it can, even for a time series, be viewed as simply an index number where there is an associated value of the principal axis (e.g. the time of the image in the datarecord).

A dataset is simply a collection of datarecords. Normally a dataset is a contiguous set of records from a single dataseries, but this is not a restriction on the scope of a dataset. As viewed by the user a dataset is the result of a query to the JSOC data catalog. As viewed by the DSDS a dataset is an ordered set of pointers to datarecords.

It is the job of the DSDS to bind sets of datarecords to the real world. The basic chunk of data that is managed by the DSDS is called a dataunit. A dataunit is a working directory and is the atomic unit of the DSDS. All dataunits are owned and managed by the DSDS. The datarecords from a particular dataseries will in general be stored in many dataunits. Each dataunit contains datarecords from a single dataseries. A dataunit may be stored online on magnetic disk, offline e.g. on a magnetic tape in a cabinet, or nearline on a tape in a robotic tape library. (The particular storage media is not important to the concept).

In response to a user request for access to a particular datarecord the JSOC catalog will identify the dataunit containing that datarecord. The DSDS catalog tracks the location of each dataunit. If that dataunit is not online the DSDS will allocate storage space, name a directory, and copy the dataunit into that directory. The DSDS will report the working directory pathname to the DataRecord catalog where it is accessible to the user.
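A minimal sketch of that staging flow is shown below. The catalog structure, function names, and tape interface are hypothetical illustrations only (the example directory and dataunit names anticipate the MDI example in the next section); the real DSDS interfaces are not specified in this document.

```python
import os

# Hypothetical in-memory stand-in for the DSDS DataUnit catalog.
dataunit_catalog = {
    "D6362810": {"online_dir": None, "tape_location": "TAPE_0042"},
}

def copy_from_tape(tape_label, destination):
    """Stand-in for the tape-library retrieval step; the real mechanism is site specific."""
    pass

def stage_dataunit(dataunit_id, staging_root="/tmp/PDS20"):
    """Return a working directory for a dataunit, staging it from tape if it is offline."""
    entry = dataunit_catalog[dataunit_id]
    if entry["online_dir"] is None:                      # dataunit is offline or nearline
        workdir = os.path.join(staging_root, dataunit_id)
        os.makedirs(workdir, exist_ok=True)              # allocate space and name a directory
        copy_from_tape(entry["tape_location"], workdir)  # retrieve the dataunit from tape
        entry["online_dir"] = workdir                    # report the location to the catalog
    return entry["online_dir"]

# The user never handles this path directly; the DataRecord catalog records it so that a
# descriptive dataset name can be resolved to files in the staged working directory.
print(stage_dataunit("D6362810"))
```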

While a dataset is a collection of datarecords there is no guarantee that all of the datarecords in a dataset will be in a single dataunit. In fact in the JSOC this is unlikely to be true, particularly at the higher levels of analysis. A dataset may span many dataunits. It may contain one or a few datarecords from many dataunits each of which contains many datarecords. On the other hand, a dataunit is a collection of datarecords so can in fact be described as a dataset. It is likely that at the lowest levels of processing, e.g. telemetry handling and perhaps level-0, the datasets actually specified will usually be single dataunits. The purpose of allowing dataunits to be larger than single datarecords is primarily to enable efficient storage on tapes. The physical record size must be larger than the inter-record gaps to achieve highly efficient storage. The desire for large dataunits is balanced by a desire to maintain simplicity in datarecord management. The blocking (number of datarecords per dataunit) will be specified for each dataseries and it is expected that the size of dataunits will vary between dataseries based on these compromises.

For MDI, datasets were defined to play the same role as dataunits since the term dataunit was not used. In the MDI system dataset names were constructed as a 5-part name consisting of a project or program name, a reduction level name, a series name, a series index number, and a version number. The series name contained user-readable indicators of the blocking of time into datasets and standardized series numbers were referenced to a common epoch (1 January 1993, 0 UT). Thus the dataset name for hour 0 of full disk velocity images for July 14 1996 is:

prog:mdi,level:lev1.5,series:fd_V_01h[30960]

After requesting the location of that dataset by a call to a service (e.g. the peq program) the user could learn that the dataset is now in the directory /PDS20/D6362810. If that dataset is not used for a few weeks its online copy will be deleted, so the next time it is requested it will be staged to disk and will appear in another directory. But the user never needs to see the actual storage working directory since the user refers to the data by its descriptive name. The velocity image for, say, the 10th minute of that hour will be in a file (e.g. 0009.fits) in that working directory. Again the actual file name is not seen by the user since she has used an SSSC-provided API function to open the file for the requested minute number.

The JSOC naming system is derived from the MDI system but has some key differences. The concept of a multipart name is eliminated to reflect the actual MDI experience. A new top level name will identify the JSOC data server as a whole to allow a common format for access to, e.g., the existing MDI data. The existing name parts simply become user-viewable parts of the dataseries name. Thus the same dataset in the example above might be:

jsoc:/mdi_fd_V_lev1.5/[t_rec>=1996.07.14_00,t_rec<1996.07.14_01]

or

jsoc:/mdi_fd_V_01h_lev1.5/[sn=30960]

where the syntax here is still TBR. When the user asks for this dataset in the JSOC system a function provided by the DataRecord catalog API will generate a list of all the datarecords in the dataset. That list will include the working directory (dataunit) and file name, and optionally the slice within a file, for each datarecord. The user will not normally need to see this information since she will simply use the open_record JSOC API function call.

After a dataunit is created it will be managed by the lower level DSDS system. The DSDS system will manage the disk space, copy dataunits to tape, retrieve them when needed, report their location to the JSOC database when needed, etc. When a dataunit is created it is tagged in the DSDS catalog as temporary, permanent online, to be archived, archived, etc. The user program (via the DataRecord catalog) will determine when a dataunit is complete and when it should be archived. Dataunits are locked while being modified or being archived. Once a dataunit has been archived it is write-protected and that version of the data is frozen.

In the JSOC system the role of the dataunit is to allow efficient storage of the datarecords on tape. Tape systems require large transfers for efficient storage and access. Often in practice, and usually in the lower levels of processing, there will be a one-to-one mapping between datasets and dataunits, but the system does not require that.
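For illustration, the series index in the first example above can be reproduced from the stated epoch and the one-hour blocking. The following is a hypothetical check in Python, not JSOC or MDI library code:

```python
from datetime import datetime

# MDI standardized series numbers count blocks from the common epoch, 1 January 1993, 0 UT;
# for an "01h" series each block is one hour long.
EPOCH = datetime(1993, 1, 1, 0, 0, 0)

def series_index_01h(t):
    """Index of the one-hour dataset containing time t."""
    return int((t - EPOCH).total_seconds() // 3600)

# Hour 0 of 14 July 1996 -> 30960, matching fd_V_01h[30960] in the example above.
print(series_index_01h(datetime(1996, 7, 14, 0, 0, 0)))
```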

All of the disk and tape storage is owned by the DSDS and it is allocated to dataunits when needed. Storage containing dataunits that have been archived to tape may be reused as needed. Storage containing dataunits that contain parts of dataseries that are declared to be computable on demand may also be reassigned as needed. The priority system that manages the online and nearline storage is only slightly modified from the MDI system to allow priorities to be set on a dataseries basis. Dataunits that are archived to tape are grouped on tapes according to storage groups depending on the dataseries. This is the same method as used in the MDI system to maintain compactness of dataseries on tapes even when the dataseries grows over years.

2.1.2. Meta-data

HMI and AIA data images are nearly useless unless accompanied by a number of ancillary data quantities. These are called metadata. There are several types of metadata including data-, processing-, and file-metadata.

Data metadata is the collection of keyword-value pairs typically found in the header of solar data as stored in FITS files. It describes the attributes of the data such as the time of observation, the spacecraft velocity at that time, the instrument parameters, the mean value of the data in the image, the dimensionality of the image, etc. For MDI data this metadata is stored in an internal-format ASCII file in each dataset (i.e. the record.rdb file). This metadata is not easily accessible when the dataset is offline (although online copies are kept for some dataseries) and is often the only part of a dataset that changes over several stages of processing. The inefficiency of binding the data metadata with the data is one of the prime reasons for modifying the MDI DSDS system for SDO. The JSOC data metadata is stored in the JSOC relational database on a per-datarecord basis.

Processing metadata is the information generated during processing such as the time and date of processing, the version of the telemetry unpacking program, the processing settable parameters (e.g. command line options), etc. The processing metadata usually refers to a group of dataunits. We are evaluating several choices for storage of processing metadata. For MDI it is mostly kept in the dataset (i.e. dataunit) created by each processing step and as a result it is unavailable unless the dataunit is online. For the JSOC we are evaluating maintaining a processing log system either outside or as part of the DataRecord catalog.

The file metadata includes such information as the file change date, the file size, the data protocol (e.g. FITS or CDF), parameters of that protocol (e.g. FITS as shorts or floats), compression protocols, etc. For MDI this information is determined by the actual datarecord file names and in a standard overview.fits file included in each conforming dataset, or in the FITS header of each image file.

2.1.3. Data Catalog

The JSOC system will have two primary database catalogs: the DataUnit catalog and the DataRecord catalog. The single MDI catalog is implemented as an Oracle database.


The baseline for the DataUnit catalog is also Oracle. We are examining Oracle, MySQL, and postgreSQL for the DataRecord catalog.

The DataUnit catalog has a prime table of dataunits with fields such as current working directory, total size, associated dataseries, tape location, storage group, retention priority, etc. The schema for this catalog is nearly identical to the MDI dsds_main table.

The JSOC catalog is a collection of tables including a master keyword catalog containing keyword definitions, a dataseries catalog with a list of dataseries names, a table for each dataseries containing the per record information, etc. The JSOC catalog has been prototyped in an initial version and preliminary timing tests suggest that we will indeed be able to manage the several hundred billion data entries anticipated.
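A minimal sketch of how such catalog tables might be laid out follows, using SQLite purely for illustration; the table and column names are hypothetical, and the DBMS choices actually under evaluation are those listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # illustration only; the real catalogs use a server DBMS
cur = conn.cursor()

# DataUnit catalog: one row per dataunit, similar in spirit to the MDI dsds_main table.
cur.execute("""CREATE TABLE dataunit (
    dataunit_id        TEXT PRIMARY KEY,
    dataseries         TEXT,
    working_directory  TEXT,      -- NULL while the dataunit is offline
    total_bytes        INTEGER,
    tape_location      TEXT,
    storage_group      TEXT,
    retention_priority INTEGER
)""")

# DataRecord catalog: a master keyword table plus one per-record table per dataseries.
cur.execute("""CREATE TABLE keyword_defs (
    dataseries TEXT, keyword TEXT, datatype TEXT, description TEXT
)""")
cur.execute("""CREATE TABLE mdi_fd_V_lev1_5 (
    record_index INTEGER PRIMARY KEY,  -- principal index of the dataseries
    t_rec        TEXT,                 -- time of the observation in this datarecord
    dataunit_id  TEXT REFERENCES dataunit(dataunit_id),
    filename     TEXT,                 -- file (and optionally slice) within the dataunit
    obs_vr       REAL                  -- example per-record keyword value
)""")
conn.commit()
```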

2.1.4. Data formats

The representation of image and other data as it moves from one storage area to another will vary. At the lowest level seen by the JSOC the data consists of compressed images produced by onboard processing and then split into telemetry frames. In level-0 processing it is uncompressed and details of the transmission process are removed. The level-0 data and all higher level products stored in the JSOC internal pipeline archive will be stored as compressed images using a lossless compression format. The metadata will be stored in the JSOC catalog. When data is exported to local disks, to Lockheed or other closely coupled users, or to general users via web exports it will be converted into the requested standard data format. There will likely be at least two choices including FITS and VOTables. Other formats such as CDF or HDF may be available if the need is convincingly identified. External GRID users will probably prefer to use the internal pipeline format since they should be able to seamlessly use the pipeline libraries.
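As an illustration of the export step only, the sketch below writes one datarecord's image and its catalog keywords into a standalone FITS file. It assumes a Python environment with the astropy package and hypothetical keyword names and values; it is not the JSOC export implementation.

```python
import numpy as np
from astropy.io import fits   # assumed available in the export environment

def export_record_to_fits(image, keywords, out_path):
    """Write one datarecord as a standalone FITS file with its metadata in the header."""
    hdu = fits.PrimaryHDU(data=image)
    for key, value in keywords.items():   # keywords as retrieved from the JSOC catalog
        hdu.header[key] = value
    hdu.writeto(out_path, overwrite=True)

# Hypothetical example: a small image with two catalog keywords embedded on export.
export_record_to_fits(
    np.zeros((1024, 1024), dtype=np.float32),
    {"T_REC": "1996.07.14_00:09:00", "OBS_VR": 123.4},
    "fd_V_0009.fits",
)
```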

2.1.5. Processing Control

Process control in the MDI SSSC system is managed by the Pipeline Execution system known as pe. Pe implements a shell-like script processor language that manages process execution, storage allocation, requests to the MDI catalog, and other services for program module execution. Pe is based on the PVM process communication system. The JSOC pipeline system will be a modified version of the pe system with enhancements to allow easier access to services from non-pe programs. We will also remove the module start-up services that parse dataset specifiers on the command line. These services will be provided as part of the JSOC user API library.

The flow of processing steps in the pipeline in the MDI system is governed by a program called pui, for Pipeline User Interface. Pui makes high-level products by determining which low-level products are needed, computing them first if they are not already available, then computing the desired product. Pui will need to be modified to reflect the continuous nature of SDO data vs the daily data receipt of the SOHO data. A similar capability will be built to control the JSOC pipeline dataflow.
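The recursive, make-like behavior described for pui can be sketched as follows; the product names and functions here are hypothetical and the sketch is not the pui implementation itself.

```python
# Hypothetical dependency graph: each product lists the lower level products it needs.
DEPENDS_ON = {
    "lev1.5_velocity": ["lev1_filtergrams"],
    "lev1_filtergrams": ["lev0_telemetry"],
    "lev0_telemetry": [],
}

available = {"lev0_telemetry"}   # products already present in the catalog

def run_module(name):
    """Stand-in for launching the pipeline module that produces one product."""
    print(f"running pipeline module for {name}")

def make_product(name):
    """Compute a product after first computing any missing lower level inputs."""
    if name in available:
        return
    for prerequisite in DEPENDS_ON[name]:
        make_product(prerequisite)   # compute missing inputs first
    run_module(name)                 # then compute the desired product
    available.add(name)

make_product("lev1.5_velocity")
```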

2.1.6. Five Kinds of Users


The JSOC needs to support five kinds of end users, defined by their location and ease of access to the JSOC computer system. These users are pipeline internal users, local users not using the pipeline, remote co-operating users such as the AIA team at LMSAL, remote investigators using remote systems, and GRID users such as the UK Co-I group. To support these user communities we will implement the user level API in two versions: one for internal pipeline use, where the data is stored as compressed images in datarecords in dataunits with metadata in the JSOC catalog, and one for outside users, where the data is in, e.g., FITS files with the metadata embedded in the data files. The goal is to make the C and the IDL library interface seen by the user identical in the two environments. This will allow remote users (as well as local users) to develop and test analysis code in their own environment and easily port the programs into the pipeline environment. Figure 3 shows the five user types and service layers provided by the JSOC for each.
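A minimal sketch of what such a dual-environment interface might look like from the user's side is given below, expressed in Python for brevity (the plan above specifies identical C and IDL interfaces); the function names and the environment test are hypothetical.

```python
import os

def open_record(dataseries, record_index):
    """Resolve a datarecord to (image, keywords) identically in both environments."""
    if os.environ.get("JSOC_PIPELINE") == "1":
        # Inside the pipeline: ask the DataRecord catalog for the dataunit working
        # directory and read the keywords from the catalog tables.
        return _open_from_catalog(dataseries, record_index)
    # Outside the pipeline: read an exported FITS file with keywords in its header.
    return _open_from_fits_export(dataseries, record_index)

def _open_from_catalog(dataseries, record_index):
    raise NotImplementedError("stand-in for the pipeline catalog lookup")

def _open_from_fits_export(dataseries, record_index):
    raise NotImplementedError("stand-in for reading a local FITS export")

# Analysis code written against open_record() can run unchanged when ported into the pipeline.
```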

NEED TO TALK ABOUT SOLARSOFT HERE

Figure 3. The five kinds of JSOC users. The pink shaded services will be provided as part of the JSOC.

2.2. Data Capture System

The data capture system is a separate computer system dedicated to capturing the HMI and AIA data delivered by the SDO DDS. The functions are to receive the telemetry data, archive it, report to the DDS that the data is received, and deliver the telemetry files to the pipeline processing system. The data capture system also serves as a pass-through for SDO Flight Dynamics data, mission planning, and other data originating at the SDO MOC and for HMI and AIA housekeeping data and operations logs originating at the JSOC-OPS facility at LMSAL.


The data capture system will be a redundant server which will connect to the dedicated links from the DDS and will capture the data and communicate with the DDS to ensure all the data is received. There will be a dedicated 30-day archive of the raw telemetry data which will hold data until it has been copied offline offsite and has been archived in the JSOC pipeline processing system.

The details of the offsite storage facility are not yet determined, but the requirement is that data must be delivered to it with certainty (by verification if tapes are sent by commercial carrier) before the data is released at the DDS.

Archiving the data to an offline media and storing that media in an offsite location and verifying that copy prior to the expiration of the data in the DDS 30-day cache will ensure that the JSOC never holds the only copy of the primary mission data.

Every minute the DDS transfers several .tlm files (one per virtual channel), each consisting of the stream of compressed VCDUs, less the R/S check symbols, to the SOC ftp directory. After each .tlm file, a .txt file is written containing the name, size and TBD quality information of the .tlm file. When the JSOC detects the .txt file it ingests the .tlm file and marks this file as received. Every hour the DDS sends a data status file (.dsf) containing all the .tlm file names it believes it has transferred in the last hour. The JSOC creates an acknowledgement status file (.asf) acknowledging all those .tlm files it has received and verified and marks those it has not for retransmission. Once every 24 hours the JSOC sends a .ack file listing all the .tlm files it has permanently archived with offsite copies, indicating that these files can now be deleted by the DDS. All the telemetry data are permanently archived and represent approximately 1,400 GB per day. The telemetry datasets can be retrieved at any time.
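The acknowledgement cycle described above can be sketched as follows. The directory, file names, and bookkeeping here are hypothetical; the actual .tlm/.dsf/.asf/.ack formats are defined by the DDS interface, not in this document.

```python
from pathlib import Path

INCOMING = Path("/tmp/soc_ftp")          # hypothetical directory the DDS writes into
received, archived_offsite = set(), set()

def archive_telemetry(tlm_path):
    pass   # stand-in for writing to the 30-day capture archive and queueing the offsite copy

def ingest_new_files():
    """Ingest each .tlm file once its companion .txt notification appears."""
    for note in INCOMING.glob("*.txt"):
        tlm = note.with_suffix(".tlm")
        if tlm.exists() and tlm.name not in received:
            archive_telemetry(tlm)
            received.add(tlm.name)

def answer_hourly_dsf(dsf_names):
    """Build the .asf reply: acknowledge verified files, flag the rest for retransmission."""
    return {name: ("ack" if name in received else "retransmit") for name in dsf_names}

def daily_ack():
    """List .tlm files safely archived with offsite copies, so the DDS may delete them."""
    return sorted(archived_offsite)
```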

2.3. Level-0 Data Processing

Level-0 processing converts the raw telemetry data into images that look like they did on the CCD; that is, the transmission-specific details are not needed to interpret the level-0 data and the images are in a form that is available for immediate use. The internal storage format for level-0 will be, e.g., FITZ (a lossless-compression version of FITS) files with one file per image. Data exported from level-0 will be FITS format files with one file per image. A level-0 dataunit will be a collection of level-0 images. The collection will be ordered as a time series of like images with some duration (e.g. minutes or hours) or as a series of a set of related images (e.g. the 20 filtergrams that will generate one set of vector field observables). The particular sets of standard data products at level-0 will be chosen to facilitate the level-1 processing. The level-0 data is changed in format and organization but will not have the values changed in a way that cannot be reversed. The level-0 data will be archived on permanent media and kept offline at the JSOC. The level-0 data will also be kept in the online archive for 30-60 days. The housekeeping data that is bundled into the high-rate channel will also be stored as level-0 data types. HMI level-0 data is not expected to be particularly useful as is for science analysis. AIA level-0 data will be useful and may in fact be the format used by many analyses, depending on the variability and complexity that may be desired. The AIA level-0 data will be maintained in near-line tape libraries for the duration of the mission.

2.4. Data Processing to Level-1


The level-1 processing converts the data into calibrated images of observable physical quantities. While the units of level-0 data are simply CCD DNs (digital numbers), the level-1 data is in m/s, gauss, etc. in the case of HMI. The velocity data will be referenced to the Sun with the orbit velocity removed. The level-1 data is the primary format for science analysis and for input to the higher level data product pipelines. The goal is to maintain all the level-1 data online for at least three months. Some key HMI level-1 products will be maintained online for the duration of the mission. For AIA the level-1 data will have passed through cosmic ray cleaning and flat fielding, offset removal, and brightness calibrations to standard units. AIA level-1a snapshots of extracted regions, lower resolution movies, etc. will be generated at Stanford or LMSAL depending on the particular product. These will be archived at Stanford (thus the 1a designation). Other AIA level-1 products will not be archived but will be recomputed at LMSAL or Stanford as requested. The system will be designed to hold the active data online at the beginning of the mission with the expectation that for the first few years seldom-accessed data will be recalled when needed. Some level-1 products will be archived offline but some will be recomputed from level-0 if the online version is lost or removed for storage space needs. The decision of offline or nearline archiving of higher level products will be made based on the compute resources needed to regenerate them as compared to the resources needed to store and retrieve them. This tuning need not be done until near launch.
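For the HMI Doppler observable mentioned above, referencing the velocity to the Sun involves, among other calibration steps, subtracting the known spacecraft line-of-sight velocity. The sketch below shows only that subtraction, treating the line-of-sight component as uniform across the image for simplicity; the values are hypothetical and this is not the actual level-1 algorithm.

```python
import numpy as np

def remove_orbit_velocity(dopplergram, obs_vr):
    """Subtract the spacecraft line-of-sight velocity (m/s, from the record's metadata)
    to reference a Doppler image to the Sun (uniform-component approximation)."""
    return dopplergram - obs_vr

# Hypothetical example: a uniform 500 m/s observed signal with a 123.4 m/s orbit component.
raw = np.full((4, 4), 500.0)
print(remove_orbit_velocity(raw, 123.4))   # -> 376.6 m/s everywhere
```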

2.5. HMI Higher Level Processing

Both HMI and AIA will have standard automatic higher level processing to produce higher level science data products that will be the dominant source of data for science analysis.

The HMI higher level processing will be accomplished in the same pipeline control system as the level-0 and level-1 processing. The AIA specific processing will be split between this system and specific capability located at LMSAL which allows interactive access to the data via advanced visualization tools.

The HMI higher level processing concept is shown in Figure 4. The filtergrams are produced onboard and are the Level-0 data products. The second column in Figure 4 contains the physical observables: velocity, magnetograms, and brightness. These are the Level-1 products. The steps shown in the Processing columns are intermediate data processing and data products and are referred to as level-2 here. The column labeled Data Product contains the higher-level data products that will be archived and made available for science analysis and are referred to as level-3 here. These, in addition to the Level-1 data, are the primary sources of data for the HMI science community. (The names of processing levels have not been used consistently and will be defined in the data products documentation at a later date.)

The AIA higher level processing done at Stanford will involve generating lower resolution images suitable for most users who wish to examine the evolution of features via movie sequences and correlations with magnetic field regions. This is the data most likely to be of immediate use to space-weather forecast models. Each NOAA active region will also be extracted into a tracked data-cube as it crosses the disk. These images, combined with the same regions from the magnetogram database, will be used for many individual analyses contributing to the HMI and AIA science goals.
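
A minimal sketch of the tracked-cube idea, assuming the per-frame pixel bounding boxes that follow the region are already known (real tracking would remap for solar rotation); all names and numbers below are illustrative only.

    import numpy as np

    def extract_region_cube(frames, boxes):
        """frames: sequence of 2-D images; boxes: per-frame (y0, y1, x0, x1)
        pixel bounds that follow one region as it crosses the disk."""
        return np.stack([frame[y0:y1, x0:x1]
                         for frame, (y0, y1, x0, x1) in zip(frames, boxes)])

    frames = [np.zeros((512, 512)) for _ in range(3)]          # stand-in image sequence
    boxes = [(100, 200, 100 + i, 200 + i) for i in range(3)]   # toy drift of the box
    cube = extract_region_cube(frames, boxes)                  # shape (3, 100, 100)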


An important set of the AIA science goals will rely on additional special processing capability that will be part of the LMSAL AIA Visualization Center. This activity will be located at LMSAL and will receive all of the Level-0/1 AIA data and all of the HMI magnetograms for additional processing and interactive visualization studies. As part of the processing at LMSAL, an event database will be generated which will catalog solar features and events of interest for further study. This catalog will also contain the bounding box information for the NOAA regions as extracted in the Stanford processing. The catalog will be available at both LMSAL and Stanford facilities and for export and online analysis by the science team at large.

Level 2 data are intermediate products that are the results of reorganization of the Level 1 data such as sampling, filtering, projections/mapping, transposition, and spatial or temporal transforms. Broken down by main area the level 2 data products include:

a) Global helioseismology
• Heliographic maps of Doppler velocity
• Spherical harmonic time series
• Mode frequencies and splitting

b) Local-area helioseismology
• Tracked tiles of Doppler velocity
• Local wave frequency shifts derived by ring diagram analysis
• Wave travel times derived by time-distance analysis
• Wave phase shift maps derived by helioseismic holography

c) Line-of-sight and vector magnetography
• Full-disk averaged maps of Stokes parameters
• Tracked tiles of Stokes parameters and/or vector magnetograms
• Vector magnetograms

d) Continuum intensity data products
• Tracked full-disk 1-hour averaged continuum maps
• Solar limb parameters
• Brightness feature maps

Level 3 data represent the results of scientific model analysis, such as helioseismic mode fits, mode inversions, magnetic and velocity field reconstruction, and feature identification. Broken down by main area the level 3 data products include:

a) Global helioseismology
• Internal rotation and large scale flows
• Internal sound speed

b) Local-area helioseismology
• Full-disk maps of velocity and sound speed at depths of 0-30 Mm
• High resolution maps of velocity and sound speed near active regions at depths of 0-30 Mm
• Deep focus maps of velocity and sound speed at depths of 0-200 Mm
• Far-side activity index

c) Line-of-sight and vector magnetography
• Line of sight magnetic flux maps
• Full-disk vector magnetic field, filling factor, and other thermodynamic parameters
• Coronal magnetic field extrapolations
• Coronal and solar wind models

d) Continuum intensity data products
• Brightness images

The principal data flows and products are summarized in Figure 4. Level 2 data are not normally archived but are generated on demand as needed. If the higher level data products are not archived, documentation of the algorithms, of the actual code, and of the calibration data used to create them from lower level data will accompany the higher-level data products as ancillary information. This will typically include version and configuration information for a pipeline analysis module, in addition to references to the lower level data products and calibration data needed to recreate the higher level data product in question.
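
As a purely illustrative sketch of such an ancillary provenance record (every field name and value here is invented and is not the JSOC schema):

    provenance = {
        "product": "example.higher_level_product",            # made-up series name
        "module": "example_analysis_module",                  # pipeline module that produced it
        "module_version": "0.7-prototype",                     # code version
        "configuration": {"tile_size_deg": 15, "tracking": "Carrington"},
        "inputs": ["example.lev1_doppler[2004.11.10/1d]"],     # lower-level inputs (illustrative)
        "calibration": ["flatfield_example_v1"],               # calibration data referenced
    }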


[Figure 4 (diagram): the HMI Data Analysis Pipeline. Level-0 filtergrams feed the Level-1 observables (Doppler velocity, Stokes I,V and I,Q,U,V, line-of-sight and vector magnetograms, continuum brightness); intermediate processing (heliographic Doppler velocity maps, tracked tiles, spherical harmonic time series to l=1000, ring diagrams, time-distance cross-covariance functions, egression and ingression maps, mode frequencies and splitting, wave travel times, wave phase shift maps, local wave frequency shifts, fast-algorithm and inversion-algorithm vector magnetograms) leads to the data products (internal rotation Ω(r,Θ) and internal sound speed c_s(r,Θ) for 0<r<R, full-disk velocity and sound speed maps at 0-30 Mm, high-resolution and Carrington synoptic maps at 0-30 Mm, deep-focus maps at 0-200 Mm, far-side activity index, line-of-sight and vector magnetic field maps, coronal magnetic field extrapolations, coronal and solar wind models, and brightness images).]

Figure 4. HMI pipeline processing


Figure 5. AIA higher level processing.


2.6. AIA Higher Level Processing

The AIA higher level processing is described in the AIA science plan. An overview of that plan is shown here in Figure 5 to show the links into the Stanford part of the JSOC.

LEVEL | Description | Examples | RATE [GB/day] | RATE [TB/year] | Cache [day] | % Archive
Raw   | Telemetry | | 723 | 264 | 30 | 100 (Primary)
0     | Loss-less data | | 1,100 | 400 | 30 | 100
1a    | Organized Sets, Current Calibration | Movies, Extracted Regions | 55 | 20 | 100 (mission) | 100
1     | Best calibrations | On demand | 1,100 | 400 | 100 | 10
2     | Higher Level Products | DEM, Movies | 28 | 10 | 1900 (mission) | 100

2.7. Archives

The term JSOC Archive is used inclusively to cover the data online on magnetic disk, nearline in robotic tape libraries, and offline on tapes in cabinets. The online component of the archive will be large enough for at least 30 days of all levels of data, with a goal of up to 90 days for level-1 HMI and level-0 AIA data. After the online interval the HMI data will migrate to nearline tape units and then to offline status, depending on usage. The AIA level-0 data will remain in nearline storage for the mission duration.
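
A toy sketch of the kind of age and usage policy described here; the thresholds are invented for illustration and are not the actual JSOC policy values.

    def storage_tier(age_days, days_since_last_access):
        """Return the archive tier for an HMI dataunit (illustrative thresholds)."""
        if age_days <= 30:
            return "online"        # on magnetic disk
        if days_since_last_access <= 90:
            return "nearline"      # robotic tape library
        return "offline"           # tape in cabinets, recalled on request

    print(storage_tier(age_days=45, days_since_last_access=120))   # -> "offline"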

In addition to the JSOC archive we will maintain an offsite archive for the raw telemetry files. If it is ever needed, the offsite data will remain offsite; copies will be made there for import into the JSOC archive. The details of the offsite archive are TBD.

2.8. Export System

There are two issues for exported data. The first is the catalog interface that allows identification and specification of the desired data, together with tools to export it to the user's system. The second is the tools needed by the user to conveniently use the data.

The export catalog tools will be compatible with the CoSEC and VSO access tools that have been built in recent years with LWS and other NASA support. Archived data requested via the web will often be available immediately, although for large requests of nearline or offline data the request may take up to 24 hours to complete. Since most level-1 HMI data will be online for its period of most use and all AIA level-0 data will be nearline, requests for these datasets will be very fast.

In addition to the web portals to the data catalog, local and remote co-operating users will be able to use a program interface to the catalog via a set of JSOC-lite tools that will use the SOAP interface to access the data. These tools will support a table-driven mapping of datasets into a directory structure. This will allow these users to access the data as if it were local; when it is not local it will be requested automatically. Thus such a user can write a processing script that first checks the online status of the needed data and then uses it, without needing to explicitly export the data via a web request. Facilities will also be provided for importing datasets from these users back into the JSOC catalog so that they are available to other HMI and AIA team members for pipeline and export purposes.
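
Purely as an illustration of the usage pattern described above, the sketch below checks whether a dataset is staged locally and requests it if not. The helper functions and path are stand-ins for the planned JSOC-lite tools (which will use the SOAP catalog interface); none of these names are the real API.

    import os

    def is_online(path):
        """Stand-in for a JSOC-lite catalog query of a dataset's online status."""
        return os.path.isdir(path)

    def request_retrieval(path):
        """Stand-in for a catalog request that stages nearline/offline data."""
        print("requesting retrieval of", path)

    dataset_dir = "/jsoc/hmi/lev1_magnetogram/2004.11.10"   # hypothetical mapped directory
    if not is_online(dataset_dir):
        request_retrieval(dataset_dir)
    # ...processing then proceeds as if the data were local...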

We expect to provide two semi-separated web portals into the data export system: one for the public at large via the E/PO interface and another for researchers. The more public site will likely be mirrored through a commercial service to support large volumes of requests for limited sets of data, such as we have found to be useful after solar activity events during the TRACE and SOHO missions.

2.9. Integration & Test

By injecting known telemetry data into the JSOC front end, a complete test of data flow and integrity can be performed for each stage of the processing. Standard regression test suites and validation procedures will be developed to verify processing at each stage through all the system additions and revisions. Throughput and load-balancing tuning will be performed. All development source code is under a Configuration Management (CM) system. The CM is based on CVS, which provides configuration management functionality as well as the means for multiple users to work concurrently on a common source tree with minimal conflict.
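
A minimal sketch of the kind of regression check envisioned here, assuming a stage's product can be compared numerically against a stored reference; the arrays and tolerance are placeholders.

    import numpy as np

    def regression_check(produced, reference, tolerance=0.0):
        """Compare a pipeline product against a stored reference result."""
        worst = np.max(np.abs(produced - reference))
        if worst > tolerance:
            raise AssertionError(f"regression failed: max difference {worst}")

    regression_check(np.ones((16, 16)), np.ones((16, 16)))   # passes: identical arrays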


3. JSOC Data Volumes

The data flow estimates for the combined system are shown in Figure 6. The estimates here are based on the model described in Table 1. The raw telemetry data is highly compressed onboard SDO and will be archived in the compressed form. The level-1 and higher level data products will probably be kept uncompressed while being used for near-term processing (this may change depending on compute costs vs. storage costs in 4 years). These higher level products will be compressed with normal software compression methods when stored to the (probably tape) near-line/offline archive. A factor of two is assumed for this compression.

4. JSOC Hardware Configuration

4.1. Capture system

The capture system hardware will be a pair of multi-processor computers operated as a single redundant system to receive both HMI and AIA data. The two OC3 lines from the DDS, one for HMI and one for AIA, will both be connected to the redundant system.

Table 1. Data Flow Assumptions

Data Path     | Assumptions                                 | Volume (GB/day) | Combined (GB/day) | Online cache (days) | Cache size (TB) | Perm TB/yr
In from DDS   | HMI: 55,000,000 bps                         | 553             | 1227              | 30                  | 16              |
              | AIA: 67,000,000 bps                         | 674             |                   | 30                  | 20              |
Level-0       | HMI: 4k*4k*2 bytes / 2-seconds * (pi/4)     | 530             | 1610              | 30                  | 16              |
              | AIA: 4k*4k*2 bytes * 8 imgs per 10          | 1080            |                   | 30                  | 32              |
Level-1       | HMI: V, M, Ic @ 45s & B, ld, ff @           | 130             | 1210              |                     | 46              |
              | AIA: same as level-0                        | 1080            |                   | 100                 | 105             |
Higher level  | HMI: see HMI-S015 and Figure 5              | 20              | 74                | 0                   | 7               |
              | AIA (lev1a): movies & extracted regions     | 54              |                   | 0                   | 19              |
LMSAL Link    | HMI: magnetograms (M, B)                    | 59              | 1193              | 100                 | 6               |
              | AIA: full Level-0 data + lev1_extract       | 1134            |                   | 100                 | 111             |
Export        | HMI: 2 * higher level products + 5*10       | 49              | 211               | 60                  | 3               |
              | AIA: 3 * higher level products (TRACE       | 162             |                   | 60                  | 9               |
Offsite tape  | HMI: tlm                                    | 553             | 1227              |                     |                 | 197
              | AIA: tlm                                    | 674             |                   | 30                  | 20              | 240
Offline tape  | HMI: Lev0, Lev-1, all higher                | 680             | 1814              | 100                 | 66              | 242
              | AIA: Lev0, Lev1a                            | 1134            |                   | 365                 | 404             | 404
Totals        | HMI disk totals                             |                 |                   |                     | 40              | 53
              | AIA disk totals                             |                 |                   |                     | 277             | 19
              | HMI tape totals                             |                 |                   |                     |                 | 439
              | AIA tape totals                             |                 |                   |                     |                 | 644
              | Combined disk (TB)                          |                 |                   |                     | 317             | 73
              | Combined tape per year (TB)                 |                 |                   |                     |                 | 1084
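
As a rough cross-check of the telemetry rows (the figures appear to assume 1 GB = 2^30 bytes): 55×10^6 bits/s × 86,400 s/day ÷ 8 ≈ 5.9×10^11 bytes/day ≈ 553 GB/day for HMI, and 67×10^6 bits/s × 86,400 s/day ÷ 8 ≈ 7.2×10^11 bytes/day ≈ 674 GB/day for AIA, consistent with the In-from-DDS entries above.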


The system will be supported by a UPS to assure availability. The disk subsystem will employ RAID to assure availability. The tape subsystem will be a robotic library with capacity for at least 30 days of data and the ability to load older tapes if requested by the pipeline processing system.

[Figure 6 (diagram): JSOC data flow in GB/day. Elements shown include Joint Ops, the redundant data capture system (2 processors each, 30-day cache of 40 TB each), Level 0 processing (HMI & AIA, 2 processors), Level 1 processing (HMI, 16 processors), HMI high level processing (c. 200 processors), online data (325 TB + 50 TB/yr), the HMI science analysis archive (650 TB/yr), the offsite science archive (440 TB/yr), data exports, the LMSAL link carrying AIA Level 0 and HMI magnetograms, quick-look data to the LMSAL secure host, housekeeping (0.04 GB/day), and the SDO scientist & user interface (2 processors). Principal flows shown are roughly 1230 GB/day in from the DDS, 1610 GB/day of level-0, 1210 GB/day of level-1, 1200 GB/day of exports, and 240 GB/day over the LMSAL link.]

Figure 6. HMI-AIA SOC data volume. Estimates of data flow in Gigabytes/day. See Table 1 for assumptions.

The offline archive media is yet to be determined but will likely be one of the non-proprietary high-density half-inch formats. The media used for the telemetry data archive need not be the same format as the primary archive offline media. There are actually advantages in choosing a different media type if there are two or more equally attractive options, none of which has a long enough history to allow certainty of long-term stability. This choice will be made in 2006.

The connections in and out of the data capture system will at a minimum be gigabit Ethernet. The connections from the DDS will be OC3 dedicated lines into the Stanford central communications center and distributed on Stanford fiber from there.

The present HMI facilities in the Hansen Experimental Physics Laboratory (HEPL) will be razed in the fall of 2007. The new facilities will be a combination of space in the new Varian II HEPL and KIPAC building (Hansen Experimental Physics Laboratory and Kavli Institute for Particle Astrophysics and Cosmology) and another temporary building at a site TBD. Space for the JSOC data center has been allocated in the first basement of the Varian-II building, which is scheduled to be completed by May 2006. If the final site for the data center is Varian-II rather than the new temporary building (our preferred choice), then it is likely that the OC3 lines will be in place before the first DDS-JSOC testing. However, it is apparently not difficult to move the terminal location of these lines once they reach the Stanford communications hub.


4.2. Pipeline Processing system

Currently MDI data production and analysis is done on an 8-CPU SGI 2200 server (SPECfp_rate 26) and a 12-node dual-Xeon cluster (total SPECfp_rate 168), with an overall utilization of less than 50% (SPECfp_rate under 100). By comparison, the HMI data production and analysis requirement is expected to be roughly 50 times larger, or SPECfp_rate 5000, based on the estimate that local helioseismology analysis and vector magnetogram inversion, the two dominant tasks in terms of CPU usage, would require a combined processing power of SPECfp_rate 2500.

Processing requirements for the various steps can be estimated based on experience with MDI and TRACE and with model calculations that have been done in Phase-A. We estimate that, in units of projected year 2007 high-end 64-bit PC processors, we will need 2 processors for data capture, 2 processors for Level-0, 16 processors for combined Level-1 and AIA higher level (at SU), and about 190 processors for the higher level HMI products. The Stanford processor "farm" will contain about 210 processors, about 10 more than needed for HMI alone.
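
For reference, the total is simply the sum of the per-step estimates: 2 (capture) + 2 (Level-0) + 16 (Level-1 and AIA higher level) + 190 (HMI higher level) = 210 processors.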

4.3. Archive (online, offline, nearline, shelf, offsite)

Disks: For HMI we plan to keep a 30-day cache of raw telemetry (20 TB), a 90-day cache of filtergrams (90 TB), and observables for the life of the mission (50 TB/yr) on RAID arrays. For a 5-year mission the required total disk capacity is 400 TB.

Currently the best price for a fibrechannel RAID array with IDE disks is about $3.3 per GB (Nexsan ATABeast, 12 TB for $40K). Assuming a factor of 4 improvement in price/performance, we estimate the cost of the disk to be $320K for the first 5 years.
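
A quick consistency check of these figures (rounding is approximate, and the capacity comparison assumes the 400 TB total includes some margin): the caches sum to 20 TB + 90 TB + 5 yr × 50 TB/yr = 360 TB, in line with the roughly 400 TB quoted; and $40K / 12 TB ≈ $3.3/GB, so a factor-of-4 improvement gives about $0.8/GB, or roughly $320-330K for 400 TB, consistent with the $320K estimate.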

Online data storage will need to contain 60 days of level-0, at least 100 days of level-1, and all higher level products. If this model does not change we must start with 450 TB of disk and grow to 550 TB. We desire to maintain the level-1 data online for the full mission; this would increase the initial configuration to 900 TB and require an additional 450 TB per year. It is unlikely that this volume of disk can be obtained within our cost limits at the start of the mission, but it may well be possible after a few years if disk capacities and prices continue their present trends.

Tapes: We plan on using high-density, high-capacity cartridge tape systems. The prototype system uses 500 GB SAIT tapes. We expect 1 TB tapes to be available at or shortly after launch.

Tape Libraries: For HMI we plan to keep a 90-day cache of raw telemetry (60 TB), a 270-day cache of filtergrams (270 TB), and enough blank tapes for 30 days' archiving needs (60 TB) near-line. This makes the nearline storage requirement about the same as the online disk capacity. For AIA the tape library requirement is much larger: the need will start at 300 TB for a 9-month capacity and grow by XXX TB per year.

Today a midsize tape library with two fibrechannel LTO-2 drives and 150 tape slots (200 GB each) costs about $40K (Exabyte Magnum 20). The fourth generation of LTO (or competing formats such as SDLT) is expected to reach 800 GB per cartridge by 2007. If tape drive and robotics prices remain constant, a set of 4 libraries will offer approximately 500 TB of near-line storage for $160K.
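
As a rough check of that estimate: 150 slots × 800 GB per cartridge ≈ 120 TB per library, so 4 libraries give about 480 TB (roughly the 500 TB quoted) at 4 × $40K = $160K.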


4.4. Connectivity (web, grid, DDS, LMSAL, MOC, etc)

[Figure 7 (diagram): network routers connecting the JSOC disk array at Stanford to the DDS, to LMSAL and the MOC over a 1 Gb private line, and to NASA AMES and WhiteNet.]

Figure 7. Network connectivity of the JSOC. The connection to the DDS is 2 OC-3 lines, one for HMI and one for AIA. The LMSAL-MOC connections are dedicated secure lines.

Planned connectivity: 2 OC-3 lines to the DDS, gigabit to LMSAL, XXX to GRID.


5. Development Plan

The JSOC data system is being developed at Stanford as an evolution of the existing, operating MDI data system. The development plan is basically as follows:

2004 - 2005: First 2 Years
• Procure development system with most likely components.
• Modify pipeline and catalog infrastructure and implement on prototype system.
• Modify analysis module API for greater simplicity and compliance with pipeline.
• Develop calibration software modules.

2006 - 2007: Two years prior to launch
• Complete Level-1 analysis development, verify with HMI test data.
• Populate prototype system with MDI data to verify performance.
• Procure, install, verify computer hardware.
• Implement higher-level pipeline processing modules with Co-I support.

During Phase-E
• Add media and disk farm capacity in a staged plan, in half-year or yearly increments.
• First two years of mission: continue Co-I pipeline testing support.

Figure 8. JSOC development schedule.


5.1. Data EGSE

The needs of the high speed bus component of the EGSE used for instrument development and testing are nearly the same as those of the JSOC capture system. The data EGSE system has been completed in its first simple version and now functions with data delivered from the SDO spacecraft simulator. The first version has 2 processors and XXX GB of disk. The second version, HMI-2 and AIA-2, will have the same software system but will have additional disk and a tape subsystem so that test data may be imported into the JSOC pipeline system prototype for analysis.

5.2. Prototype System

The prototype system consists of a tape library, disks, a quad-processor Opteron system and use of an available Altix 8-processor system. This system has been ordered and will be in place shortly after HMI CDR. It will be running a prototype of the JSOC catalog in winter 2005 and will be used for high volume simulated dataflow shortly after that. After initial testing it will be populated with MDI data and the MDI DSDS compatibility system will be installed. This will allow the basic DataUnit Catalog and dataunit handling system to be fully exercised in a realistic user environment for several months while the JSOC catalog system is being tuned.

5.3. Pipeline

The pipeline infrastructure code will first be developed as part of the prototype system. After experience is gained it will be revised as necessary, with a goal of freezing the basic capabilities before the beginning of science module development. The semi-automatic processing control system will be developed in parallel with the science modules.

The pipeline will produce a set of standard data products for each instrument. For HMI it will proceed through to the higher level standard products. For AIA it will be a restricted set needed for quicklook and for users to determine which full datasets they need for detailed analysis.

The HMI pipeline is illustrated in Figure 4.

5.4. Science Modules

The development of the science processing modules will, for the most part, be delayed until two years prior to launch when the Co-Investigators first receive support funding. The basic pipeline for helioseismology and coronal field inferences is based on code developed for MDI analysis. That code will be ported to the JSOC environment earlier as needed for instrument testing.
