NGAS – The Next Generation Archive System
Jens Knudstrup
Motivation
Motivation for NGAS:
- Handle huge amount of data streams in real time.
- Reduce operational costs (man-power).
- Decrease expenses in general.
- Provide online and offline processing capabilities.
- Ease integration of archive facility with external clients/applications.
- Provide a common concept for the online archive and the long-term storage facilities (NGAS ≈ OLAS + ASTO + Jukebox SW + more). Note, no plan to replace OLAS for now.
- Simplify and unify the overall infrastructure of the archive system.
- Increase data security.
Main Objectives
Main Objectives of NGAS:
Provide an archive facility with services for handling all stages in the life-time of data files:
- Archiving files (+ on-the-fly checking and processing).
- Retrieving & on-the-fly processing of files.
- Ensuring data consistency.
- Providing services for managing data.
- (Executing complex, parallel data processing - TBD)
In addition, to provide a system:
- Which is adaptable to specific contexts.
- With a high performance + scalable.
NGAS: History
History of NGAS:
- April 2001: Project started.
- Mid June 2001: First operational prototype.
- June 2001: Review + approval of design/concept.
- Beginning July 2001: Installation/commissioning at La Silla (2.2m/WFI).
- Mid July 2001: Entered operation at La Silla.
- August 2001: Started operation of Garching NGAS Cluster.
- February 2001: Upgrade from Suse to RedHat Linux.
- August 2003: Installation/commissioning at Paranal (VLTI).
- January 2004: Installation of second archive system for 3.6m/LS.
- March 2004: First integration of NGAS on new HW (SATA).
- September 2004: First tests using NGAS together with RAID5 Arrays.
- September 2004: Archiving of HARPS pipeline products.
- December 2004: Archiving of WFCAM frames from Cambridge/UK.
NGAS: Components
Main Components of the NGAS Project:
1. NGAS SW – NG/AMS (Next Generation Archive Management System).
2. NGAS WEB Interfaces.
3. HW – (low cost) PCs with removable ATA disks.
4. NGAS OS (Linux).
5. NGAS Utilities.
6. NGAS Installation and Configuration Tools.
NG/AMS: Basic Concepts
Basic Concepts of the NGAS SW (NG/AMS):
• NG/AMS is a platform/framework providing basic services.
• No information is hard-coded to support specific types of data – NG/AMS ‘does not know’ what e.g. a FITS file is.
• No information is hard-coded to support specific HW configurations.
• The specific behavior and the specific knowledge have to be added to the NGAS system, which makes it customizable.
• Based on standard protocols and formats wherever possible – can be used as a building block.
• Simple: advanced features can be added in front-end applications, which give clients a different view of the data and provide specific services.
NG/AMS: Main Features/1
Main Features of NG/AMS (1):
• Multi-threaded server.
• Standard communication protocol (HTTP) + HTTP Authentication.
• Data file archiving via Push and Pull Techniques.
• Subscription Service including filter mechanism.
• DB synchronization (DB Snapshot Feature).
• Easy adaptation to different kinds of DBMSs (ANSI SQL Engine/DB Driver).
• Flexible/adaptable due to usage of 10 different kinds of plug-ins.
• Many configurable parameters.
• XML information exchange.
• Email Notification Service.
NG/AMS: Main Features/2
Main Features of NG/AMS (2):
• Advanced logging service (Verbose, Local Log File, Syslog).
• Background Data Consistency Checking.
• Operation in Cluster Mode.
• Transparent data retrieval & on-the-fly processing.
• APIs in ANSI-C and Python + two client applications based on these.
• Archive Client for secure and simple, remote data file archiving.
• Many commands to interact with and control the system.
• Portable.
• Unit/Functional Tests.
NG/AMS: Server
[Figure: NG/AMS Server context. Labels: NG/AMS Server on the NGAS Host with NG/AMS Configuration, Main Disks Array and Replication Disk Array; NGAS DB on the DBMS Host; outputs to Stdout, Operations Logs, Log and UNIX Syslog; Data Provider on the Data Provider Host (Archive Push Request via HTTP POST Request, Archive Pull Request); Data Requestor on the Data Requestor Host; Info Requestor on the Info Requestor Host; Data Subscriber Client on the Data Subscriber Host (HTTP POST Request); NG/AMS Server on the NGAS Subscriber Host.]
NG/AMS: Storage Media Infrastructure
Basic Infrastructure of Storage Media:
NG/AMS: XML Information Exchange
Interprocess Data Exchange:
- Most information exchanged between NG/AMS Servers, and between the NG/AMS Server and clients, is based on XML.
- Example: NgasDiskInfo Document (NG/AMS Status XML Document):
<?xml version="1.0" ?>
<NgamsStatus>
  <Status Date="2003-01-02T08:40:23.350"
          HostId="acngast1"
          Message="Disk status file"
          Version="v2.0-Beta2/2002-12-04T09:22:53"/>
  <DiskStatus Archive="ESO-ARCHIVE"
              AvailableMb="32300"
              BytesStored="8709834319"
              Checksum=""
              Completed="0"
              CompletionDate=""
              DiskId="IC35L040AVER07-0-SXPTX093675"
              InstallationDate="2002-11-25T09:48:25.000"
              LogicalName="FITS-M-000001"
              Manufacturer="IBM"
              NumberOfFiles="163"
              TotalDiskWriteTime="905.324898006"
              Type="MAGNETIC DISK/ATA"/>
</NgamsStatus>
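A client can read such a status document with any XML parser. The following is a minimal sketch in Python using only the standard library and the element and attribute names from the example above. The function name `parse_disk_status` is hypothetical (it is not part of the NG/AMS API), and treating `Completed="1"` as "disk completed" is an assumption.

```python
import xml.etree.ElementTree as ET

# Status document taken from the example above (quotes normalised).
STATUS_XML = """<?xml version="1.0" ?>
<NgamsStatus>
  <Status Date="2003-01-02T08:40:23.350" HostId="acngast1"
          Message="Disk status file"
          Version="v2.0-Beta2/2002-12-04T09:22:53"/>
  <DiskStatus Archive="ESO-ARCHIVE" AvailableMb="32300"
              BytesStored="8709834319" Checksum="" Completed="0"
              CompletionDate=""
              DiskId="IC35L040AVER07-0-SXPTX093675"
              InstallationDate="2002-11-25T09:48:25.000"
              LogicalName="FITS-M-000001" Manufacturer="IBM"
              NumberOfFiles="163" TotalDiskWriteTime="905.324898006"
              Type="MAGNETIC DISK/ATA"/>
</NgamsStatus>"""


def parse_disk_status(xml_text):
    """Return selected DiskStatus attributes as a dict of Python values.

    Hypothetical helper for illustration, not an NG/AMS function.
    """
    root = ET.fromstring(xml_text)
    status = root.find("Status")
    disk = root.find("DiskStatus")
    if status is None or disk is None:
        raise ValueError("not a disk status document")
    return {
        "host_id": status.get("HostId"),
        "disk_id": disk.get("DiskId"),
        "logical_name": disk.get("LogicalName"),
        "number_of_files": int(disk.get("NumberOfFiles")),
        "bytes_stored": int(disk.get("BytesStored")),
        "available_mb": int(disk.get("AvailableMb")),
        # Assumption: "1" marks a completed disk, "0" a disk still in use.
        "completed": disk.get("Completed") == "1",
    }


info = parse_disk_status(STATUS_XML)
print(info["logical_name"], info["number_of_files"], info["bytes_stored"])
```

Because every value travels as an XML attribute string, the client has to convert counts and sizes to numbers itself, as the sketch does.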
NG/AMS: HTTP Command Interface
ngasmgr@acngast1:/opsw/NGAS/ngams> telnet acngast1 7777
Trying 134.171.21.30...
Connected to acngast1.
Escape character is '^]'.
GET STATUS
HTTP/1.0 200 OK
<?xml version="1.0" ?>
<!DOCTYPE NgamsStatus SYSTEM "http://acngast1.hq.eso.org:7777/RETRIEVE?internal=ngamsStatus.dtd">
<NgamsStatus>
  <Status Date="2002-12-23T14:59:42.724"
          HostId="acngast1"
          Message="Successfully handled command STATUS"
          State="ONLINE"
          Status="SUCCESS"
          SubState="IDLE"
          Version="v2.0-Beta2/2002-12-04T09:22:53"/>
</NgamsStatus>
Connection closed by foreign host.
NG/AMS: DB Synchronization
DB Synchronization:
• NGAS DBs replicated from Paranal/La Silla to Garching (Unidirectional).
• Synchronization between DBs of the various NGAS sites also carried out by NGAS.
• NG/AMS maintains snapshot (DBM) on the disks with info about the files stored on it.
• Local DB synchronized with this info when the disk reappears on a site.
• DB Snapshot can be used as a table of contents for the disk.
[Figure: NGAS DBs at La Silla (LS), Paranal (PAR) and Garching, with a DB Snapshot. Two synchronization paths are labelled: NG/AMS DB Synchronization and DBMS Synchronization (Sybase).]
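The snapshot idea can be sketched as follows. This is an illustration only: plain Python dictionaries stand in for both the on-disk DBM snapshot and the local NGAS DB, the function name `sync_from_snapshot` is hypothetical, and the rule "add every file the local DB does not yet know" is an assumed reading of the synchronization step.

```python
def sync_from_snapshot(local_db, disk_id, snapshot):
    """Bring local_db up to date with the snapshot carried on a disk.

    local_db : dict mapping (disk_id, file_id) -> file info (stand-in for the NGAS DB)
    snapshot : dict mapping file_id -> file info (stand-in for the DBM on the disk)
    Returns the list of file IDs that were added to local_db.
    """
    added = []
    for file_id in sorted(snapshot):
        key = (disk_id, file_id)
        if key not in local_db:
            # The disk knows a file that this site has not registered yet.
            local_db[key] = dict(snapshot[file_id])
            added.append(file_id)
    return added


# A disk written at one site reappears at another site whose DB is empty.
snapshot = {
    "FILE-0001": {"version": 1, "size": 1024},
    "FILE-0002": {"version": 1, "size": 2048},
}
site_db = {("DISK-A", "FILE-0001"): {"version": 1, "size": 1024}}
added = sync_from_snapshot(site_db, "DISK-A", snapshot)
print(added)
```

The same snapshot doubles as a table of contents: listing its keys shows what is on the disk without consulting any DBMS.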
NG/AMS: Plug-Ins
NG/AMS Plug-Ins:
- Ten different kinds of plug-ins provided. These make it possible to adapt the system to different kinds of hardware and different types of data – nothing is hard-coded:
1. Online Plug-In.
2. Offline Plug-In.
3. Data Archiving Plug-In.
4. Checksum Plug-In.
5. Data Processing Plug-In.
6. Registration Plug-In.
7. Label Printer Plug-In.
8. Filter Plug-In.
9. Suspension Plug-In.
10. Wake-Up Plug-In.
- Standard plug-ins delivered with the system. Possible to replace these or add new plug-ins when needed.
- The plug-ins delivered with a distribution of NGAS should be viewed as belonging to the core of the system when it comes to testing.
- A normal user does not need to know about the plug-ins used.
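The "nothing is hard-coded" principle can be illustrated with a small registry that maps a data type to a plug-in. This is a sketch under stated assumptions, not the NG/AMS plug-in interface: the registry, the decorator `archiving_plugin`, the function `handle_archive_request` and the mime-type string `image/x-fits` are all hypothetical names. The only format fact used is that a FITS file starts with the `SIMPLE` keyword.

```python
# Hypothetical registry: mime-type -> Data Archiving Plug-In.
ARCHIVING_PLUGINS = {}


def archiving_plugin(mime_type):
    """Decorator registering a plug-in for one mime-type."""
    def register(func):
        ARCHIVING_PLUGINS[mime_type] = func
        return func
    return register


@archiving_plugin("image/x-fits")  # mime-type name is an assumption
def fits_plugin(data):
    # All FITS knowledge lives here; the server core never sees it.
    return {"ok": data.startswith(b"SIMPLE  ="), "size": len(data)}


def handle_archive_request(mime_type, data):
    """Server core: look up the plug-in and delegate, knowing no format."""
    plugin = ARCHIVING_PLUGINS.get(mime_type)
    if plugin is None:
        raise ValueError("no archiving plug-in configured for " + mime_type)
    return plugin(data)


result = handle_archive_request("image/x-fits", b"SIMPLE  =                    T")
print(result)
```

Supporting a new data type then means registering one more function; the server core stays unchanged.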
NG/AMS: Plug-Ins
Data Archiving Plug-In – Basic Functioning:
[Figure: Data Archiving Plug-In (DAPI) flow. Labels: NG/AMS Server, DAPI, Data File, NGAS DB, Staging Area, Bad Files Area, Target Storage Set, Main Disk, Replication Disk, Storage Area, NgasDiskInfo.]
1. Archive Request.
2. Reception in Staging Area.
3. DAPI Invocation.
4. Data Checking/Processing, Parameter Extraction.
5. DAPI Return Status.
6. Storage of Main File in Final Location.
7. DB Update, Main File.
8. Replication of File.
9. DB Update, Replication File.
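The nine steps can be sketched in a few lines of Python. This is an illustration, not the NG/AMS implementation: directories stand in for the Staging Area, the Main and Replication Disks and the Bad Files Area, a dictionary stands in for the NGAS DB, and `archive_file` and the toy `dapi` callable are hypothetical. Moving a rejected file to the Bad Files Area is an assumption based on the figure labels.

```python
import os
import shutil
import tempfile


def archive_file(name, data, dirs, db, dapi):
    """Run one Archive Request (step 1) through the flow; True if archived."""
    staged = os.path.join(dirs["staging"], name)
    with open(staged, "wb") as f:            # 2. reception in Staging Area
        f.write(data)
    status = dapi(staged)                    # 3-5. DAPI invocation, checks, status
    if not status["ok"]:
        # Assumption: rejected files end up in the Bad Files Area.
        shutil.move(staged, os.path.join(dirs["bad"], name))
        return False
    main_path = os.path.join(dirs["main"], name)
    shutil.move(staged, main_path)           # 6. main file in final location
    db[("main", name)] = status              # 7. DB update, main file
    shutil.copy2(main_path, os.path.join(dirs["replication"], name))  # 8. replication
    db[("replication", name)] = status       # 9. DB update, replication file
    return True


def dapi(path):
    """Toy plug-in: accept any non-empty file and extract its size."""
    size = os.path.getsize(path)
    return {"ok": size > 0, "size": size}


with tempfile.TemporaryDirectory() as root:
    dirs = {}
    for area in ("staging", "main", "replication", "bad"):
        dirs[area] = os.path.join(root, area)
        os.mkdir(dirs[area])
    db = {}
    accepted = archive_file("frame1.fits", b"data", dirs, db, dapi)
    rejected = archive_file("empty.fits", b"", dirs, db, dapi)
    main_files = sorted(os.listdir(dirs["main"]))
    replica_files = sorted(os.listdir(dirs["replication"]))
    bad_files = sorted(os.listdir(dirs["bad"]))
    staging_left = os.listdir(dirs["staging"])

print(accepted, rejected, main_files, replica_files, bad_files)
```

The DB is only updated after the corresponding file is in place, so a DB entry never refers to a file that was not written.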
NG/AMS: XML Configuration
NG/AMS Configuration (1):
• About 110 different configurable parameters.
• Configuration can be loaded from an XML document, from the DB, or from a combination of these.
• Possible to re-use DB based parameters to compose specific configurations (easier to handle many, slightly different installations).
• Main groups of configurable parameters (1):
- Basic Parameters: Port number, simulation mode, proxy mode, root mount point, …
- Plug-Ins: The various plug-ins the system should use e.g. to handle data of a specific type.
- DB Connection: The DB connection parameters.
- Permissions: Archive, Retrieve, Processing, Remove Requests allowed.
- Archive Handling Parameters: Parameters for handling Archive Requests.
- Accepted Data Types: Types of data (mime-types) the system can handle.
NG/AMS: XML Configuration
NG/AMS Configuration (2):
• Main groups of configurable parameters (2):
- Storage Sets: The disk configuration.
- Streams: Defines how the different kinds of data should be streamed onto the Storage Sets.
- Available Processing Capabilities: Defines the types of data that can be processed and which Data Processing Plug-Ins to use.
- Data Check/Janitor Thread Configuration: Parameters to tune the Data Checking and Janitor Threads.
- Logging Parameters: E.g. name of log files + intensity to apply when logging.
- Email Notification Parameters: Recipients of the various types of Email Notification Messages.
- Host Suspension Parameters: Parameters for suspending a host + for waking up suspended hosts.
- Subscription Parameters: Parameters to define if a server should subscribe for data.
- Authorization Parameters: Defines the known users and their access code.
NG/AMS: Data Consistency Checking
Data Consistency Checking:
- The condition of the data in the archive must be monitored constantly.
- Data Consistency Checking – Thread running in background.
- Possible to tune the amount of resources occupied by the service.
- A check run can be scheduled to run periodically via the configuration.
- Checksum check, file availability, unregistered files on storage media.
- A check sub-thread is started per disk (max. number configurable).
- Info about files on the system dumped once in a DBM, retrieved file by file during checking.
- Possible to resume a checking from where the previous was interrupted.
- Email Notification sent to subscribers in case problems are found, e.g.:

Subject: NGAS-arcus2-7778: DATA INCONSISTENCY(IES) FOUND
Date: Fri, 25 Jan 2002 01:06:26 +0100 (MET)
From: [email protected]
Message:

DATA INCONSISTENY(IES) FOUND IN DATA HOLDING:

Date: 2002-02-12T15:32:05.424
NGAS Host: arcus2
Inconsistencies: 1

Problem Description                  File ID                        Version
----------------------------------------------------------------------------------
ERROR: Inconsistent checksum found   TEST.2001-05-08T15:25:00.123   3
----------------------------------------------------------------------------------
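The three kinds of check named above (checksum, file availability, unregistered files) can be sketched for a single disk as follows. This is an illustration, not the NG/AMS checking thread: CRC-32 from Python's `zlib` is an assumed checksum (in NGAS the checksum comes from a Checksum Plug-In), a dictionary stands in for the list of registered files, and `crc32_of` and `check_disk` are hypothetical names.

```python
import os
import tempfile
import zlib


def crc32_of(path, chunk_size=65536):
    """CRC-32 of a file, read in chunks so large files need little memory."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def check_disk(mount_point, registered):
    """Compare the files on one disk with the registered name -> checksum map."""
    problems = []
    on_disk = set(os.listdir(mount_point))
    for name, expected in sorted(registered.items()):
        if name not in on_disk:
            problems.append((name, "file not available"))
        elif crc32_of(os.path.join(mount_point, name)) != expected:
            problems.append((name, "inconsistent checksum"))
    for name in sorted(on_disk - set(registered)):
        problems.append((name, "unregistered file"))
    return problems


# Demo on a temporary directory playing the role of a disk.
with tempfile.TemporaryDirectory() as disk:
    for name, data in [("a.fits", b"good"), ("b.fits", b"corrupted"), ("stray.tmp", b"x")]:
        with open(os.path.join(disk, name), "wb") as f:
            f.write(data)
    registered = {
        "a.fits": zlib.crc32(b"good"),
        "b.fits": zlib.crc32(b"original"),
        "c.fits": zlib.crc32(b"lost"),
    }
    problems = check_disk(disk, registered)

for name, description in problems:
    print(name, description)
```

Running one such check per disk in its own thread, with a configurable maximum, corresponds to the sub-thread scheme described in the list above.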
NG/AMS: Operation in Cluster Mode/1
Example:
[Figure: NGAS cluster with NGAS Super Nodes (Proxy Mode), a Cluster Back-Bone Network with network switches, NGAS Main Nodes 1 to 3, and NGAS Sub-Nodes (10.X.X.X) on a Private Network. Numbered arrows 1 to 6 trace a Retrieve Request.]
NG/AMS: Operation in Cluster Mode/2
Example:
[Figure: NGAS cluster with NGAS Main Nodes, network switches and NGAS Nodes. Numbered arrows 1 to 4 trace a Retrieve Request.]
Garching NGAS Cluster
NGAS Cluster
NG/AMS: Data Processing
Data Processing at Retrieval:
• Simple processing supported when retrieving files.
• Possible to request the system to apply a Processing Plug-In on the data and to send back the result of the plug-in rather than the data itself.
• Processing performed on the sub-node hosting the data.
• Possible for clients to use the NGAS Cluster as a ‘number cruncher’ to carry out parallel data processing in a simple manner.
• Reduces the amount of data to be transferred to the client. I.e., a floating point number may be returned rather than the entire data file.
• Can be extended by providing new Data Processing Plug-Ins for specific contexts.
• Could be used to integrate NGAS with the AVO or other archive services.
NG/AMS: APIs
NG/AMS APIs + Clients:
• Two APIs implemented in C (C library) and Python (class) provided.
• Facilitates implementation of client applications communicating with NGAS, e.g. to retrieve data files.
• Two command line utilities are provided, based on the C and Python API, which can be used to interact with an NG/AMS Server.
• A standalone Archive Client is provided, based on the C-API:
  - Independent of any DBMS.
  - Can be used to archive files from any remote host which can access the NGAS Archive via HTTP.
  - Attempts to archive a file are retried until success is returned or the file is classified as bad by the remote NGAS system.
  - Files are not cleaned up before cross-checking that they are really in the remote NGAS Archive (CHECKFILE Command).
  - First applications: Archiving of HARPS pipeline products and WFCAM files from Cambridge/UK.
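The retry behaviour of the Archive Client can be sketched as a small queue loop. This is an illustration of the logic described above, not the C-API client: `process_queue`, the result strings and the injected `archive` and `check_file` callables are hypothetical, and `max_rounds` is an assumption added only to keep the sketch finite.

```python
def process_queue(queue, archive, check_file, max_rounds=10):
    """Archive every queued file, retrying failures.

    archive(name)    -> "SUCCESS", "BAD" or "FAILURE" (assumed result values)
    check_file(name) -> True once the remote archive confirms the file
    Returns (archived, bad, pending).
    """
    archived, bad = [], []
    pending = list(queue)
    for _ in range(max_rounds):
        if not pending:
            break
        still_pending = []
        for name in pending:
            result = archive(name)
            if result == "BAD":
                bad.append(name)            # goes to the Bad Files Area
            elif result == "SUCCESS" and check_file(name):
                archived.append(name)       # safe to clean up locally
            else:
                still_pending.append(name)  # retry in the next round
        pending = still_pending
    return archived, bad, pending


# Fake remote system: one file is rejected, one succeeds on the third attempt.
attempts = {}


def fake_archive(name):
    attempts[name] = attempts.get(name, 0) + 1
    if name == "broken.fits":
        return "BAD"
    if name == "flaky.fits" and attempts[name] < 3:
        return "FAILURE"
    return "SUCCESS"


archived, bad, pending = process_queue(
    ["ok.fits", "flaky.fits", "broken.fits"], fake_archive, lambda name: True
)
print(archived, bad, pending)
```

A file leaves the local queue only after the cross-check succeeds, which mirrors the CHECKFILE step in the list above.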
NG/AMS Client Applications
NG/AMS Archive Client
[Figure: NG/AMS Archive Client on the Data Provider Host, with Archive Queue, Archived Files Area, Bad Files Area (BAD) and Log Files Area (Log Info, Log Rotation Control). The client sends Archive Requests + Commands to the NG/AMS Server of the Remote NGAS System, which has an NGAS DB.]
NG/AMS: Server Commands
NG/AMS Server Commands (HTTP Protocol):
- Commands issued as URLs: http://<Host>:<Port>/<Command>[?<Par=Val>[&<Par=Val>]]
- Commands:
• ARCHIVE: Archive data with Archive Push or Archive Pull Technique.
• CHECKFILE: Execute an explicit file check of the given file.
• CLONE: Clone an entire disk or individual files.
• CONFIG: Configure an online system.
• DISCARD: Force removal of file from disk and/or DB independent of number of copies.
• EXIT: Make the NG/AMS Server exit.
• INIT: Re-initialize the NG/AMS Server.
• LABEL: Print out disk labels.
• OFFLINE: Bring server to Offline State.
• ONLINE: Bring server Online.
• REGISTER: Register a file or a set of files already stored on an ‘NGAS Disk’.
• REMDISK: Remove a disk from the archive (only allowed if at least 3 copies of each file are available).
• REMFILE: Remove a file from the archive.
• RETRIEVE: Retrieve a file, transparently, from the archive.
• STATUS: Query status about the server or another component in the NGAS system/cluster.
• SUBSCRIBE: Subscribe to new data or a set of data.
• UNSUBSCRIBE: Unsubscribe a previously created subscription.
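Because commands are plain URLs, a client needs little more than string formatting. The following minimal sketch builds a command URL according to the pattern above, using only the Python standard library. The helper names `build_command_url` and `send_command` are hypothetical and not part of the NG/AMS Python API; the host, port and `internal` parameter are taken from the DOCTYPE line of the STATUS response shown earlier in this presentation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_command_url(host, port, command, **params):
    """Build http://<Host>:<Port>/<Command>[?<Par=Val>[&<Par=Val>]]."""
    url = "http://{}:{}/{}".format(host, port, command)
    if params:
        url += "?" + urlencode(params)  # escapes values where needed
    return url


def send_command(host, port, command, timeout=30, **params):
    """Issue the command and return the raw reply bytes.

    Not called in this sketch, since it needs a reachable NG/AMS Server.
    """
    with urlopen(build_command_url(host, port, command, **params), timeout=timeout) as reply:
        return reply.read()


status_url = build_command_url("acngast1", 7777, "STATUS")
dtd_url = build_command_url(
    "acngast1.hq.eso.org", 7777, "RETRIEVE", internal="ngamsStatus.dtd"
)
print(status_url)
print(dtd_url)
```

Keeping URL construction separate from the network call makes the former easy to test without a server.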
Unit/Functional Tests - Features
Unit/Functional Tests:
- Extensive set of automatic tests provided, consisting of:
- 30 Test Suites.
- ~130 Test Cases.
- Tests portable (platform/HW independent).
- Testing the business logic of the system and correct functioning (simulation mode).
- Need to add more Test Cases for testing correct and consistent behavior under abnormal conditions and stress tests.
- Needs to be enhanced with ~200 Test Cases before next release.
- Possible to generate Test Plan from test code (next slide - overhaul ongoing).
Unit/Functional Tests - Test Plan
Example:
NGAS WEB Interfaces
NGAS WEB Interfaces:
• WEB Interfaces provided to assist operators in querying the status of the system and to search for various components (data files, disks, machines).
• Used at all sites by the operators (Garching, Paranal, La Silla).
• Based on Zope, a WEB management system providing editing via WEB browser (http://www.zope.org).
• Local Zope WEB Servers available on each site.
• Tools provided to list disks, find specific files and get an overview of the nodes and their status.
• Also the so-called Operator’s Log Book is provided. The operators use this to log all actions carried out.
• Used by the operators at Paranal/La Silla to monitor the online archiving activities.
• Services missing for interacting with the system. Only possible to control the disk label printing for now.
• An enhancement is planned in the near future.
NGAS System/OS
NGAS OS Distribution:
- Started on a Suse Linux distribution and migrated to RedHat Linux (ESO standardization).
- OS distribution prepared/managed by OTS-SOS.
- Support for single-processor and multi-processor configurations.
- Support for old HW (PATA) and new HW (SATA).
- Limited installation, many packages removed to reduce the size of system.
- Special packages needed by NGAS (Python, Sybase interface, Zope, …) installed by the NGAS Installation Tool.
- Special driver SW needed for the 3ware controller.
- Zope WEB server running on some nodes (optional).
- 3ware disk controller WEB server running on every host.
- Possibility to back-up/restore complete system by means of the Mondo/Mindi tool kit (from a single CDROM) in 10 minutes.
- From July 2004 NGAS OS platform installed with kickstart installation script.
NGAS HW
NGAS HW (1):
- Started with 8 slots parallel ATA systems.
- 8 x 80 GB storage capacity per node (640 GB/node, ~1.2 TB compressed).
- Since March 2004 a 24 slot serial ATA system in operation (up to 24 * 400 GB = 9.6 TB/node, 19.2 TB compressed).
- Reduces price per GB.
- More robust HW, among other things due to serial ATA (cleaner cabling).
- Disk handling easier, more robust disk frames.
- Overall HW stability (hopefully) better and less intervention needed (TBC).
- Amount of data/CPU should be balanced to be able to process the data in a limited time.
- TBD when to use new HW in operation at observatory sites.
- Investigating usage of RAID5 rather than JBOD disks.
NGAS HW
NGAS HW (2):
NGAS Utilities
NGAS Operator’s Utilities/Installation Utilities:
- Small module provided (NGAS Utilities) with utilities for the daily work of the operators:
  - Limited time invested in this so far, however essential tools for the operation provided (e.g. Clone Verification Tool, Check File List Tool, Clone File List Tool, …).
  - The function of many of these tools should be taken over by the NGAS WEB Interfaces when these have been enhanced.
- The module NGAS Installation Tools provides some utilities to install and check the system:
  - Tool provided to build ‘NGAS layer’ on top of the ‘basic’ NGAS Linux distribution.
  - Functionality still to be implemented.
NGAS Infrastructure
Present ESO NGAS Infrastructure:
[Figure: Present ESO NGAS infrastructure. Labels: LS, PAR, GAR, INS; NGAS DB (three instances); Archive Unit; Buffering Unit; Archive Handling Unit; Cluster Unit; Replication Archive Disk Sets; Ext. Archive Client (two instances).]
NGAS: Future Plans
(Near) Future Plans for NGAS:
• Received detailed requirements from archive operations.
• Enhance NGAS WEB Management Interfaces.
• Enhancement of services for operation in cluster (extended proxy mode).
• Enhancement of installation utilities.
• Enhancement of unit tests (simulation of archive cluster operation).
• Implement load balancing/archive cluster operation for high availability/high data rates (VST/ΩCam: up to 300 GB/night, VISTA/VistaCAM up to 1 TB/night - TBC).
• Support for advanced data processing, utilizing an NGAS Cluster as a parallel processing engine (specify complex recipes, which are executing parallel data processing) – will be analyzed in the near future.
• Support for the Astrophysical Virtual Observatory/GRID?
Status - December 2004
Status of NGAS Project December 2004:
- In operation since July 2001.
- Used heavily on a daily basis by archive operators in Garching.
- Data archived daily at La Silla, Paranal and at ESO HQ.
- Data archived directly into NGAS Archive in Garching from Paranal and Cambridge/WFCAM.
- Some statistics:
- Total number of nodes: ~25.
- Total number of disks in use: ~260.
- Total number of files in NGAS Archive: ~1,500,000.
- Amount of compressed data in NGAS Archive: ~27 TB.
- Amount of uncompressed data in NGAS Archive: ~45 TB.
- Maximum throughput per node (archiving): ~400 GB/24 hours (including compression).
- Major Issues to Address:
- Need to invest more resources in implementing automatic tests, in particular for testing robustness and handling of abnormal conditions.
- Need to invest resources in implementing an enhanced user interface - not very user-friendly at the moment.
- Need to update the design document to reflect present status of system (not updated since it was written in spring 2001).
- Should investigate improved ways of ensuring data consistency and means for recovering lost data.