Ten Years of Software Sustainability at The Infrared Processing and Analysis Center G. Bruce Berriman and John Good NASA Exoplanet Science Institute, Infrared Processing and Analysis Center, Caltech, USA Ewa Deelman Information Sciences Institute, University of Southern California, USA Anastasia Alexov Astronomical Institute Anton Pannekoek, Amsterdam, Netherlands Presentation at AHM 2010, Cardiff, September 2010.


Page 1: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

Ten Years of Software Sustainability at The Infrared Processing and Analysis Center

G. Bruce Berriman and John Good
NASA Exoplanet Science Institute, Infrared Processing and Analysis Center, Caltech, USA

Ewa Deelman
Information Sciences Institute, University of Southern California, USA

Anastasia Alexov
Astronomical Institute Anton Pannekoek, Amsterdam, Netherlands

Presentation at AHM 2010, Cardiff, September 2010.

Page 2: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

The Role of IPAC in Astronomy

http://www.ipac.caltech.edu

Long-term archive

Curation of data

Dissemination to the community

Page 3: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

Size and Usage Have Grown

Archives contain data from 30 missions and projects

Space based, ground based and knowledge based

Archives Built on a Common Hardware And Software Architecture

85 million queries

3 TB/month downloaded

Page 4: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

A Common Software Architecture

Application is usually a CGI program

Each component is a module with a standard interface that communicates with other components and fulfills one general function

Modules are stand-alone, portable ANSI-C tools

Components are plugged together and controlled by an executive library

Executive starts components as child services and parses return values


Applications are generally simple web forms or Web services that search for data

The “smarts” are on the server side; optimize complex queries on large data sets

Component-based architecture, which enables strong re-use and adaptation

Optimized for astronomical spatial searches and complex, general queries regardless of wavelength and type of mission

All services are integrated into the Infrared Science Information System (ISIS)

Components are generic; minimize dependencies on third-party software or environments

Avoid shared memories or system calls

All database queries are performed in one module

300 KLOC

New projects automatically inherit functionality

Supports efficient development and controls maintenance costs
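
The executive pattern on this slide (start a component as a child, then parse what it returns) can be sketched as follows. This is a minimal illustration, not IPAC's executive library: the slides describe ANSI-C modules, Python is used here only for brevity, and the reply format (space-separated key=value pairs with a "stat" field) and the names run_component and make_demo_component are assumptions made for the example.

```python
import subprocess
import sys


def run_component(argv):
    """Start one component as a child process and parse its reply.

    Assumed (hypothetical) protocol: the component prints one line of
    space-separated key=value pairs, for example "stat=OK count=3".
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    reply = {}
    for token in completed.stdout.split():
        key, sep, value = token.partition("=")
        if sep:  # keep only well-formed key=value tokens
            reply[key] = value
    # A non-zero exit code or a status other than OK counts as a failure.
    if completed.returncode != 0 or reply.get("stat") != "OK":
        raise RuntimeError(f"component failed: {argv!r} -> {completed.stdout!r}")
    return reply


def make_demo_component(message):
    """Build the argv of a stand-in component: a Python one-liner that prints a reply."""
    return [sys.executable, "-c", f"print({message!r})"]


if __name__ == "__main__":
    result = run_component(make_demo_component("stat=OK count=3"))
    print(result["count"])
```

Because each component is a separate program with a text interface, the executive needs no shared memory with it, which is in the spirit of the "avoid shared memories" point above.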

Page 5: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

Engage Your Users!

Concerted program of user engagement to attract new users and build a user community

Method

User Surveys

End User Group (drawn from the community)

Exhibits and demos

Coffee pot conversations

Advertise in newsletters

Number of end-users has increased to 18,000

12% of peer-reviewed papers cited IPAC archives or data

Actively seek feedback, e.g.

Watch users as they try services; see where they get stuck

User Surveys ask respondents to write down their views rather than answer questions

Page 6: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

Listen to the advice you don’t want to hear


Page 7: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

Speed Is King In An Archive

Image data sets are becoming very large: Spitzer Space Telescope will deliver over 100 million images, with varying footprints on the sky.

Searches for spatially extended images are slow: a scan of Spitzer images can take 2,000 s

… results pages are becoming more complex.

What matters more – fast access? Or interactivity? Speed won hands down.

Page 8: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

R-tree Indexing

Uses hierarchically nested minimum bounding boxes

Performance scales as log(N)

Performance gain of x1000 over table scan
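
The idea of hierarchically nested minimum bounding boxes can be sketched with a small, statically built box tree. This is a simplification rather than a full R-tree (there is no dynamic insertion or node splitting), and it is not the archive's implementation; the names build and search, the fanout of 4, and the flat (x, y) footprints are assumptions for the example.

```python
from dataclasses import dataclass, field


def union(boxes):
    """Minimum bounding box (xmin, ymin, xmax, ymax) enclosing all boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def overlaps(a, b):
    """True when two boxes intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    box: tuple
    children: list = field(default_factory=list)  # empty for leaf entries
    item: object = None                           # payload of a leaf entry


def build(items, fanout=4):
    """Build the tree bottom-up from a list of (box, payload) pairs."""
    if not items:
        raise ValueError("need at least one item")
    # Sort by xmin so that neighbouring entries tend to share a parent.
    level = [Node(box=b, item=p) for b, p in sorted(items, key=lambda it: it[0][0])]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Node(box=union([n.box for n in g]), children=g) for g in groups]
    return level[0]


def search(node, query):
    """Return payloads whose boxes overlap the query, skipping disjoint subtrees."""
    if not overlaps(node.box, query):
        return []  # nothing below this node can match
    if not node.children:
        return [node.item]
    hits = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits


if __name__ == "__main__":
    # An 8 x 8 grid of unit-square image footprints, labelled by grid position.
    footprints = [((i, j, i + 1, j + 1), (i, j)) for i in range(8) for j in range(8)]
    tree = build(footprints)
    print(search(tree, (2.2, 3.2, 2.8, 3.8)))
```

The search discards a whole subtree as soon as its bounding box misses the query region, which is where the gain over a table scan comes from.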

Memory-mapped files: a segment of virtual memory is assigned a byte-for-byte correlation with part of a file.

Parallelization / cluster processing

REST-based web services
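
Memory mapping as described on this slide can be tried with Python's standard mmap module. The sketch below is illustrative only: the record layout (two little-endian doubles, labelled ra and dec) and the function names write_catalog and read_record are assumptions, not the archive's file format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: two little-endian doubles (ra, dec).
RECORD = struct.Struct("<dd")


def write_catalog(path, rows):
    """Write (ra, dec) rows as consecutive fixed-width binary records."""
    with open(path, "wb") as handle:
        for ra, dec in rows:
            handle.write(RECORD.pack(ra, dec))


def read_record(path, index):
    """Fetch one record through a read-only memory map of the file.

    The map gives byte-for-byte access to the file contents, so the record
    is located by offset instead of by reading the preceding records.
    """
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return RECORD.unpack_from(view, index * RECORD.size)


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    catalog = os.path.join(folder, "catalog.bin")
    write_catalog(catalog, [(10.0, 1.0), (20.0, 2.5), (30.0, -5.0)])
    print(read_record(catalog, 2))
```

Fixed-width records make the byte offset of any row a simple multiplication, which is what lets a mapped file be indexed like an array.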

Page 9: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

Modernization of Scanpi

Written in 1983, Scanpi co-adds scans from the far-infrared IRAS survey.

15 papers per year on average by 2007.

Sensitivity gain of x5 over survey data products

Improve spatial resolution of extended or confused sources

User panel strongly recommended modernization because of its value in supporting interpretation of data from current IR missions Spitzer and Herschel.

But it was coughing up blood and was a classic legacy program

Written in F66, it had become a patchwork of scripts and bug fixes and was a maintenance nightmare.

Dependent modules for data compression etc. no longer supported.

Stranded on Solaris 2.8

Developer retiring

Page 10: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

Scanpi Workflow

[Figure: Scanpi workflow diagram. Input: source info. Steps shown: get scans, co-register scans, background fitting, co-add all scans, source fitting. Output: results and files on Web. Re-usable components: plotting, background, table manipulation, bulk download, coordinate transformation.]

Rewritten from ground up in C

Developed as a workflow application that gives visibility into the processing steps

Calls existing components, reducing the code base to 21 KLOC (cf. 102 KLOC)

1.25 FTE development cf. 0.5 FTE for maintenance

Page 11: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

The Montage Image Mosaic Engine

Montage Workflow (http://montage.ipac.caltech.edu)

[Figure: Montage workflow diagram. Stages: Input, Reprojection, Background Rectification, Co-addition, Output. Inputs shown: Image1, Image2, Image3. Modules shown: Project, Diff, Fitplane, BgModel, Background, Add.]

Creates science-grade image mosaics

Scalable, modular design

ANSI-C code (300 MB) runs on all common *nix platforms – desktops, clusters, grids and supercomputers.

Processes 40 million 2MASS pixels in 32 min on 128 nodes of 1.2 GHz Linux cluster
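
The stages named in the workflow diagram (Diff, Fitplane, BgModel, Background, Add) can be illustrated with a toy one-dimensional example. This is not Montage's algorithm or API: images are assumed to be already reprojected onto a common pixel grid, a constant offset stands in for the plane fit, offsets are chained from the first image rather than solved globally, and every function name is hypothetical.

```python
def diff(a, b):
    """Pixel-wise difference a - b over the pixels both images cover."""
    return [a[k] - b[k] for k in a.keys() & b.keys()]


def fit_offset(differences):
    """Toy stand-in for a plane fit: the mean difference (a constant offset)."""
    return sum(differences) / len(differences)


def background_model(images):
    """Background correction per image, relative to the first image.

    Assumes each image overlaps the one before it in the list.
    """
    corrections = [0.0]
    for previous, current in zip(images, images[1:]):
        step = fit_offset(diff(current, previous))
        corrections.append(corrections[-1] + step)
    return corrections


def apply_background(image, correction):
    """Subtract the modelled background offset from every pixel."""
    return {k: v - correction for k, v in image.items()}


def add(images):
    """Co-add by averaging all values that land on the same pixel."""
    totals, counts = {}, {}
    for image in images:
        for k, v in image.items():
            totals[k] = totals.get(k, 0.0) + v
            counts[k] = counts.get(k, 0) + 1
    return {k: totals[k] / counts[k] for k in sorted(totals)}


def mosaic(images):
    """Run the toy pipeline: model backgrounds, correct each image, co-add."""
    corrections = background_model(images)
    flattened = [apply_background(img, c) for img, c in zip(images, corrections)]
    return add(flattened)


if __name__ == "__main__":
    # Three overlapping strips of a flat sky (value 10) with different backgrounds.
    image1 = {k: 10.0 for k in range(0, 6)}
    image2 = {k: 12.0 for k in range(4, 10)}
    image3 = {k: 15.0 for k in range(8, 13)}
    print(mosaic([image1, image2, image3]))
```

Each stage only consumes the output of the previous one, so the per-image and per-overlap steps are independent of each other; that independence is what a modular, parallel design relies on.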


Page 12: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

How Is It Used?

Science Analysis

Support Production of Data Sets, Data Products and Preview Products

Incorporate into Workflows and Pipelines

Spitzer Space Telescope teams

Quality Assurance of data products

5,000 downloads by bona-fide astronomers

Users now contributing to the project

Scripts for generating mosaics

Python front ends

MPI version

Contributed Script (Dr. Inseok Song)

Page 13: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

Development of Cyber Infrastructure

Task scheduling in distributed environments (performance focused)

Designing job schedulers for the grid

Designing fault tolerance techniques for job schedulers

Exploring issues of data provenance in scientific workflows

Exploring applicability of scientific applications running on Clouds

Developing high-performance workflow restructuring techniques

Developing application performance frameworks

Developing workflow orchestration techniques

Cost of running workflows on Amazon EC2 cloud

Page 14: Ten Years of Software Sustainability at  The Infrared Processing and Analysis Center

Best Practices for Software Sustainability

Design for sustainability, extensibility, re-use and portability

Build an engaged user community that encourages users to contribute to sustainability

Be careful about new technologies: do a cost-benefit analysis before adopting them

Use rigorous software engineering practices to ensure well-organized and well-documented code.

Control and manage your interfaces.

Make source code and test and validation data available