
Page 1:

Presented by: Marlon Bright

19 June 2008

Advisor: Masoud Sadjadi, Ph.D.

REU – Florida International University

Page 2:

Outline

- Grid Enablement of the Weather Research and Forecasting (WRF) Code
- Profiling and Prediction Tools
- Research Goals
- Project Timeline
- Current Progress
- Challenges


Page 3:

Motivation

Weather prediction can:
- Save lives
- Help business owners & emergency response

How?
- Accurate and timely results
- Precise location information

What do we have? WRF: "The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs."


Page 4:

Motivation (Cont.) – WRF

WRF status:
- Over 160,000 lines of code (mostly Fortran and C)
- Compatible with a single machine or a homogeneous cluster
- Single domain
- Fine resolution -> steep resource requirements

How to overcome this? Through grid enablement.

Expected benefits to WRF:
- More available resources – different domains
- Faster results
- Improved accuracy


Page 5:

Grid Enablement

"Grid-enabling is the practice of taking existing applications, which currently run on a single node or on a cluster of homogeneous nodes, and adapting them (either automatically or manually) so that they can be deployed over non-homogeneous computing resources connected through the Internet across multiple organizational boundaries (e.g., multiple clusters from different organizations) without major modifications to the underlying source code."

The grid-enablement process is successful if the resulting grid-enabled application "performs better" than the original application.

"Performs better" can be interpreted in different ways: improved execution time, better resource utilization, enabling collaboration, …

Page 6:

System Overview

- Web-Based Portal
- Grid Middleware (Plumbing)
  - Job-Flow Management
  - Meta-Scheduling
    ○ Performance Prediction
  - Profiling and Benchmarking
- Development Tools and Environments
  - Transparent Grid Enablement (TGE)
    ○ TRAP: static and dynamic adaptation of programs
    ○ TRAP/BPEL, TRAP/J, TRAP.NET, etc.
  - GRID superscalar: a programming paradigm for dynamically parallelizing a sequential application in a computational Grid


Page 7:

Meta-Scheduling

IMPORTANT: WRF cannot be gridified trivially!

- The "global" scheduler of the grid environment, sitting above the local resource managers
- Selects the resources a job runs on when it does not run on local resources
- Submits user jobs to the optimal remote resources (in a different domain/Virtual Organization):
  - Analyzes application and hardware characteristics to find the best match
  - Uses application performance prediction models


Page 8:

Performance Prediction

Allows for:
- Optimal usage of grid resources through "smarter" meta-scheduling
  - Many users overestimate job requirements
  - Reduced idle time for compute resources
  - Could save costs and energy
- Optimal resource selection for the most expedient job return time
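As an illustration only, and not the project's actual meta-scheduler, the sketch below ranks candidate clusters by predicted runtime using a model of the form introduced on the following slide; the clock speeds, the MareNostrum node count, and all coefficients are invented for the example.

# Hypothetical sketch: rank candidate clusters by predicted WRF runtime.
# Coefficients, clock speeds, and the MareNostrum node count are invented.

def predict_runtime(nodes, clock_ghz, b0=50.0, b1=4000.0, g0=0.2, g1=2.5):
    """Predicted execution time: T_exe = (b0 + b1/#nodes) * (g0 + g1/clock)."""
    return (b0 + b1 / nodes) * (g0 + g1 / clock_ghz)

clusters = {               # name: (available nodes, CPU clock in GHz)
    "GCB": (8, 2.0),
    "Mind": (16, 1.8),
    "MareNostrum": (64, 2.3),
}

# The meta-scheduler would submit the job to the fastest predicted resource.
best = min(clusters, key=lambda name: predict_runtime(*clusters[name]))
for name, (n, ghz) in clusters.items():
    print(f"{name:12s} predicted runtime: {predict_runtime(n, ghz):7.1f} s")
print("best choice:", best)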


Page 9:

Better Scheduling by Modeling WRF Behavior

T_exe = (β₀ + β₁ / #nodes) × (γ₀ + γ₁ / clock)

[Figure: modeling WRF behavior is an iterative, incremental process, cycling from a starting point through mathematical modeling, parameter estimation, and profiling / code inspection & modeling; the full model also includes terms for CPU, cache, memory, disk, and network usage.]
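As a concrete illustration of the parameter-estimation step, here is a minimal sketch that fits the model above to made-up (nodes, clock, runtime) samples with SciPy's curve_fit; none of the numbers come from the actual WRF experiments.

# Illustrative parameter estimation for T_exe = (b0 + b1/n) * (g0 + g1/clock).
# The "measurements" are fabricated; note that the two factors of the model
# are only identifiable up to a common scale, so a good starting guess helps.
import numpy as np
from scipy.optimize import curve_fit

def t_exe(x, b0, b1, g0, g1):
    nodes, clock = x
    return (b0 + b1 / nodes) * (g0 + g1 / clock)

nodes = np.array([2, 4, 8, 12, 16, 2, 4, 8, 12, 16], dtype=float)
clock = np.array([1.8] * 5 + [2.3] * 5)              # GHz
truth = t_exe((nodes, clock), 50.0, 4000.0, 0.2, 2.5)
noise = 1 + 0.03 * np.random.default_rng(0).standard_normal(nodes.size)
runtime = truth * noise                              # synthetic observations

params, _ = curve_fit(t_exe, (nodes, clock), runtime, p0=[40, 3000, 0.3, 2.0])
print("estimated b0, b1, g0, g1:", np.round(params, 2))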

Page 10:

Page 11:

Amon / Aprof

- Amon – a monitoring program that runs on each compute node, recording the resource usage of its processes
- Aprof – a regression-analysis program that runs on the head node; it takes input from Amon to make execution-time predictions (within a cluster and between clusters)
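The regression step can be pictured with a small sketch: given runtime samples collected by a per-node monitor (fabricated here), fit runtime as a linear function of 1/nodes and 1/cpu_share and predict an unseen configuration. The feature set is an assumption for illustration, not Aprof's actual model.

# Hypothetical Aprof-style regression over monitored samples (fabricated data).
import numpy as np

# (nodes, cpu_share, observed runtime in seconds)
samples = np.array([
    (2, 0.5, 2100.0), (4, 0.5, 1150.0), (8, 0.5, 640.0),
    (2, 1.0, 1100.0), (4, 1.0,  590.0), (8, 1.0, 330.0),
])
nodes, share, runtime = samples.T

# Design matrix: intercept, 1/nodes, 1/cpu_share, and their product.
X = np.column_stack([np.ones_like(nodes), 1 / nodes, 1 / share,
                     1 / (nodes * share)])
coef, *_ = np.linalg.lstsq(X, runtime, rcond=None)

def predict(n, s):
    """Predict runtime for n nodes at CPU share s (0..1)."""
    return coef @ np.array([1.0, 1 / n, 1 / s, 1 / (n * s)])

print(f"predicted runtime at 6 nodes, 80% CPU: {predict(6, 0.8):.0f} s")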


Page 12:

Amon / Aprof Monitoring and Prediction


Page 13:

Amon / Aprof Approach to Modeling Resource Usage

[Figure: workflow for modeling WRF resource usage, including this benchmarking-script excerpt:]

# move this run's WRF job output into the results directory for the current CPU limit
mv wrfjob.${jobid}.out ${RESULTS_DIR}/${cpu_limit}/${i}.out
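In the same spirit, a benchmarking driver could loop over CPU limits and repetitions and file each run's output the way the excerpt does; this is a hypothetical sketch, and every path and command in it is an invented placeholder.

# Hypothetical benchmarking driver; paths and the job command are placeholders.
import shutil
import subprocess
from pathlib import Path

RESULTS_DIR = Path("results")
CPU_LIMITS = [20, 40, 60, 80, 100]        # percent CPU share per run
REPETITIONS = 3

for cpu_limit in CPU_LIMITS:
    out_dir = RESULTS_DIR / str(cpu_limit)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(REPETITIONS):
        # placeholder for the actual WRF job submission
        subprocess.run(["./run_wrf.sh", str(cpu_limit)], check=True)
        # file this run's output, mirroring the mv line above
        shutil.move("wrfjob.out", str(out_dir / f"{i}.out"))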

Page 14:

Previous Findings for Amon / Aprof

- Experiments were performed on two clusters at FIU: Mind (16 nodes) and GCB (8 nodes).
- Experiments were run to predict for different numbers of nodes and CPU loads (i.e., 2, 3, …, 14, 15 nodes and 20%, 30%, …, 90%, 100% load).
- Aprof predictions were within 10% error of the actual recorded runtimes, both within Mind and GCB and between Mind and GCB.
- Conclusion: the first-step assumption was valid -> move on to extending the research to a higher number of nodes.
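"Within 10% error" reads as relative error against the recorded runtime; a tiny sketch of that check, with invented numbers:

# Relative-error check of predicted vs. recorded runtimes (invented values).
def within_tolerance(predicted, actual, tol=0.10):
    return abs(predicted - actual) / actual <= tol

print(within_tolerance(612.0, 580.0))   # ~5.5% error  -> True
print(within_tolerance(700.0, 580.0))   # ~20.7% error -> False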


Page 15:

Paraver / Dimemas

- Dimemas – a simulation tool for the parametric analysis of the behavior of message-passing applications on a configurable parallel platform
- Paraver – a tool for performance visualization and analysis of trace files generated both from actual executions and by Dimemas
- Trace files are generated by MPItrace, which is linked into the executable
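To convey the idea behind Dimemas-style parametric simulation, without reproducing its actual trace or configuration formats, the toy sketch below replays a fabricated sequence of compute and message events under different assumed network parameters:

# Toy parametric replay (NOT Dimemas's real formats): predict total time
# for a fabricated compute/send event sequence under assumed network costs.

trace = [
    ("compute", 1.20),        # seconds of computation
    ("send", 8_000_000),      # message size in bytes
    ("compute", 0.80),
    ("send", 2_000_000),
]

def replay(events, latency_s, bandwidth_bps):
    """Total time if each send costs latency + size/bandwidth."""
    total = 0.0
    for kind, value in events:
        total += value if kind == "compute" else latency_s + value / bandwidth_bps
    return total

for lat, bw in [(1e-4, 1e9), (1e-3, 1e8)]:   # fast vs. slow interconnect
    print(f"latency={lat:g} s, bw={bw:g} B/s -> {replay(trace, lat, bw):.2f} s")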


Page 16:

Paraver/Dimemas – DiP Environment


Page 17:

Goals

1. Extend the Amon/Aprof research to a larger number of nodes, a different architecture, and a different version of WRF (Version 2.2.1).
2. Compare and contrast Aprof predictions with Dimemas predictions in terms of accuracy and prediction computation time.
3. Analyze if and how Amon/Aprof could be used in conjunction with Dimemas/Paraver for optimized application performance prediction and, ultimately, meta-scheduling.


Page 18:

Timeline

End of June:
- Get MPItrace linking properly with the WRF version compiled on GCB, then Mind
- Install Amon and Aprof on MareNostrum and ensure they function properly
- Run benchmarks on MareNostrum

Early July:
- Use Amon/Aprof to predict within MareNostrum (and possibly between MareNostrum, GCB, and Mind)
- Use the generated MPI/OpenMP trace files (Paraver/Dimemas) to predict within and between Mind, GCB, and MareNostrum

Late July/Early August:
- Experiment with how well Amon and Aprof relate to, and could possibly be combined with, Dimemas
- Analyze how the findings relate to the bigger picture; make optimizations to the grid enablement of WRF
- Compose a paper presenting significant findings


Page 19:

Current Progress

- Familiarized with and up to speed on the current state of the research
- Completed reading of the most essential related-works papers
- Functional user of Paraver; in the final stages of becoming fully functional on the Linux platform
- Amon/Aprof installed on MareNostrum


Page 20:

Current Progress (cont'd)

- Becoming a functional Amon/Aprof driver on the MareNostrum supercomputer
- Developing a research plan for the experiments
- Developing benchmarking scripts for executing the experiments
- Working out bugs / becoming a functional user of Dimemas on GCB and Mind
- Working to properly generate Dimemas trace files on GCB


Page 21:

Current Challenges

- Compiling version 2.2 of WRF on Mind (and possibly MareNostrum), or compiling version 2.2.1 of WRF on GCB and Mind
- Linking MPItrace into the compiled WRF on the GCB/Mind clusters to generate accurate Paraver/Dimemas trace files
- Adapting/developing benchmarking scripts for the new architecture of MareNostrum


Page 22:

References

S. Masoud Sadjadi, Liana Fong, Rosa M. Badia, Javier Figueroa, Javier Delgado, Xabriel J. Collazo-Mojica, Khalid Saleem, Raju Rangaswami, Shu Shimizu, Hector A. Duran Limon, Pat Welsh, Sandeep Pattnaik, Anthony Praino, David Villegas, Selim Kalayci, Gargi Dasgupta, Onyeka Ezenwoye, Juan Carlos Martinez, Ivan Rodero, Shuyi Chen, Javier Muñoz, Diego Lopez, Julita Corbalan, Hugh Willoughby, Michael McFail, Christine Lisetti, and Malek Adjouadi. Transparent grid enablement of weather research and forecasting. In Proceedings of the Mardi Gras Conference 2008 - Workshop on Grid-Enabling Applications, Baton Rouge, Louisiana, USA, January 2008. http://www.cs.fiu.edu/~sadjadi/Presentations/Mardi-Gras-GEA-2008-TGE-WRF.ppt

S. Masoud Sadjadi, Shu Shimizu, Javier Figueroa, Raju Rangaswami, Javier Delgado, Hector Duran, and Xabriel Collazo. A modeling approach for estimating execution time of long-running scientific applications. In Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium (IPDPS-2008), Fifth High-Performance Grid Computing Workshop (HPGC-2008), Miami, Florida, April 2008. http://www.cs.fiu.edu/~sadjadi/Presentations/HPGC-2008-WRF%20Modeling%20Paper%20Presentationl.ppt

"Performance/Profiling". Presented by Javier Figueroa in the Special Topics in Grid Enablement of Scientific Applications class, 13 May 2008.


Page 23:

Acknowledgements

- REU
- PIRE
- BSC
- Masoud Sadjadi, Ph.D. – FIU
- Rosa Badia, Ph.D. – BSC
- Javier Delgado – FIU
- Javier Figueroa – UM
