Towards Self-Describing Workflows for Climate Models

Kathy Saint – UCAR

Ufuk Utku Turuncoglu – ITU

Sylvia Murphy – NCAR

Cecelia DeLuca – NCAR

Outline

• Motivation
• Application
• Implementation
• Collecting Provenance
• Future Steps
• Analysis of Kepler

Motivation

• Problems in a typical Earth system modeling application:
– Changing the science in complex Earth system models can involve numerous parameter changes that are hard to record and track
– HPC is complex and involves many technologies, each with its own learning curve
– Reproducibility is becoming increasingly important
– It is not easy to share information (configuration parameters, results, post-processing scripts)

Motivation (cont.)

• Approach:
– The user can create a different case with only minor changes in the workflow
– The workflow layer can hide the details of different technologies, such as the computing environment, model, and post-processing tools
– Users can query collected, standardized provenance information to compare, debug, or reproduce the results
– Users can share information easily:
  • They can run the same case with different inputs and parameters

Components of Workflow Environment

The workflow encapsulates the technical details of the compute platform and allows the user to focus on the science of the model.

Conceptual Workflow

The workflow includes uploading source code; creating, building, and running a case; and collecting provenance data.
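
The stages named above can be sketched as an ordered pipeline. This is only an illustrative outline: the stage functions, the `case` dictionary, and the recorded fields are hypothetical placeholders, not the actual Kepler actors, which the slides do not detail.

```python
from typing import Callable, Dict, List

Case = Dict[str, object]


def upload_source(case: Case) -> Case:
    # Placeholder: a real stage would transfer the model source to the HPC system.
    return {**case, "source_uploaded": True}


def create_case(case: Case) -> Case:
    # Placeholder for creating the model case.
    return {**case, "created": True}


def build_case(case: Case) -> Case:
    # Placeholder for compiling the model.
    return {**case, "built": True}


def run_case(case: Case) -> Case:
    # Placeholder for submitting and running the model.
    return {**case, "ran": True}


def collect_provenance(case: Case) -> Case:
    # Record which stages completed so the run can be described afterwards.
    completed = [key for key, value in case.items() if value is True]
    return {**case, "provenance": {"completed_stages": completed}}


# The order mirrors the conceptual workflow: upload, create, build, run, collect.
STAGES: List[Callable[[Case], Case]] = [
    upload_source,
    create_case,
    build_case,
    run_case,
    collect_provenance,
]


def run_workflow(case: Case) -> Case:
    for stage in STAGES:
        case = stage(case)
    return case


result = run_workflow({"name": "demo_case"})
```

Keeping each stage as a separate function is one way to let a user change a single step (for example, the build) without touching the rest of the sequence.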

Implementation

The implementation can be mapped back directly to the conceptual workflow.

Collecting Provenance

• Provenance is defined as structured information that keeps track of the origin and derivation of the workflow.

• The basic types of provenance information:
– System (system environment, OS, CPU architecture, compiler versions, etc.)
– Data (history or lineage of data, data flows, inputs and outputs, data transformations)
– Process (statistics about the workflow run, transferred data size, elapsed time, etc.)
– Workflow (version, modifications, etc.)
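
As a rough illustration of the four provenance types listed above, the sketch below groups them into one record, fills the system part from Python's standard `platform` module, and compares two runs. The class name, field names, file names, and values are assumptions made for this example; the Python version stands in for the compiler versions a real collector would record, and this is not the provenance schema used by the authors.

```python
import json
import platform
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceRecord:
    # One dictionary per provenance type named on the slide (hypothetical layout).
    system: Dict[str, str] = field(default_factory=dict)
    data: Dict[str, List[str]] = field(default_factory=dict)
    process: Dict[str, float] = field(default_factory=dict)
    workflow: Dict[str, str] = field(default_factory=dict)


def collect_system_info() -> Dict[str, str]:
    # System provenance gathered from the local machine.
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "cpu_architecture": platform.machine(),
        "python_version": platform.python_version(),
    }


def diff_records(a: ProvenanceRecord, b: ProvenanceRecord) -> Dict[str, tuple]:
    # Return {"category.key": (value_in_a, value_in_b)} for entries that differ.
    changes = {}
    da, db = asdict(a), asdict(b)
    for category in da:
        for key in sorted(set(da[category]) | set(db[category])):
            va, vb = da[category].get(key), db[category].get(key)
            if va != vb:
                changes[f"{category}.{key}"] = (va, vb)
    return changes


# Two illustrative runs that differ in elapsed time and workflow version.
run_a = ProvenanceRecord(
    system=collect_system_info(),
    data={"inputs": ["forcing.nc"], "outputs": ["history.nc"]},
    process={"elapsed_seconds": 120.0, "transferred_mb": 35.5},
    workflow={"version": "1.0"},
)
run_b = ProvenanceRecord(
    system=run_a.system,
    data=run_a.data,
    process={"elapsed_seconds": 98.0, "transferred_mb": 35.5},
    workflow={"version": "1.1"},
)

changes = diff_records(run_a, run_b)
serialized = json.dumps(asdict(run_a), indent=2)  # shareable, queryable form
```

Storing the record in a plain, serializable structure is one way to make runs comparable: `diff_records` reports only the entries that changed between two runs.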

Collecting Provenance

• CCSM is a multi-component model, which makes it complicated to collect provenance information.

pymake – provided by ORNL and NCSU [2,7,10]

tgwrapper.pl – uses the SoftEnv [9] and Modules [8] applications

Future Steps

• Integration with Web Services
– Move logic from Kepler platform to Web Server platform
– Simplifies client, so user doesn’t have to build a custom Kepler with custom actors
– Takes advantage of existing actors for communicating with SOAP services
  • WebServices – for handling simple message types
  • WSWithComplexType – for handling complex message types
– An extension of the ESMF Web Services
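
To make "simple message types" concrete, the sketch below builds a SOAP 1.1 request whose parameters are plain string values, using only Python's standard library. The operation name `RunCase`, its parameters, and the service namespace are hypothetical; they do not describe the ESMF Web Services interface, which the slides do not specify.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:climate-workflow"  # hypothetical service namespace


def build_request(operation: str, params: dict) -> str:
    # Wrap one operation element, with flat string parameters, in a SOAP envelope.
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameter names, for illustration only.
request = build_request("RunCase", {"caseName": "demo_case", "numProcs": 16})
```

A message with nested or structured parameters, rather than flat values like these, is the kind of case the slide assigns to WSWithComplexType.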

Future Steps

An idea of what the new, simplified workflow will look like, utilizing web service actors.

Analysis of Kepler

• Pros
– Ease of Use
– Customization
• Cons
– WSWithComplexType limited & hard to debug
• Suggestions
– Better discussion boards (searchable)

References

[1] Altintas, I., Berkley, C., Jaeger, E., Jones, M., Ludäscher, B., Mock, S., 2004, Kepler: An Extensible System for Design and Execution of Scientific Workflows, 16th Intl. Conf. on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.

[2] Altintas, I., Chin, G., Crawl, D., Critchlow, T., Koop, D., Ligon, J., Ludaescher, B., Mouallem, P., Nagappan, M., Podhorszki, N., Silva, C., Vouk, M., 2007, Provenance in Kepler-based Scientific Workflow Systems. Microsoft e-Science Workshop, poster.

[3] Barton, T., Basney, J., Freeman, T., Scavo, T., Siebenlist, F., Welch, V., Ananthakrishnan, R., Baker, B., Goode, M., and Keahey, K. 2006, Identity Federation and Attribute-based Authorization through the Globus Toolkit, Shibboleth, Gridshib, and MyProxy. 5th Annual PKI R&D Workshop, April 2006.

[4] Catlett, C. et al. "TeraGrid: Analysis of Organization, System Architecture, and Middleware Enabling New Types of Applications," HPC and Grids in Action, Ed. Lucio Grandinetti, IOS Press 'Advances in Parallel Computing' series, Amsterdam, 2007.

[5] Furlani J. L., "Modules: Providing a Flexible User Environment", Proceedings of the Fifth Large Installation Systems Administration Conference (LISA V), pp. 141-152, San Diego, CA, September 30 - October 3, 1991.

[6] Hill, C., C. DeLuca, V. Balaji, M. Suarez, and A. da Silva, (2004). Architecture of the Earth System Modeling Framework. Computing in Science and Engineering, Volume 6, Number 1.

[7] Klasky, S., Barreto, R., Kahn, A., Parashar, M., Podhorszki, N., Parker, S., Silver, D., Vouk, M. A., 2008, Collaborative visualization spaces for petascale simulations, International Symposium on Collaborative Technologies and Systems (CTS 2008), pp. 203-211, 19-23 May 2008.

[8] Modules, http://modules.sourceforge.net/

[9] SoftEnv, http://www.mcs.anl.gov/hs/software/systems/msys/

[10] Vouk, M., Altintas, I., Klasky, S., Ludaescher, B., Silva, C., 2008, On SDM Provenance Framework, SDM Provenance White Paper, V3