National Institute of Advanced Industrial Science and Technology
Introduction of PRAGMA routine-basis experiments
Yoshio Tanaka ([email protected])
Grid Technology Research Center, AIST, Japan




  • Grid Communities in Asia Pacific at a glance
    ApGrid: Asia Pacific Partnership for Grid Computing
    - Open community as a focal point
    - More than 40 member institutions from 15 economies
    - Kick-off meeting: July 2000; 1st workshop: Sep. 2001
    PRAGMA: Pacific Rim Applications and Grid Middleware Assembly
    - NSF-funded project led by UCSD/SDSC
    - 19 member institutions
    - Establish sustained collaborations and advance the use of grid technologies
    - 1st workshop: Mar. 2002; 10th workshop: next month!
    APAN (Asia Pacific Advanced Network) Grid Committee
    - Bridging APAN application communities and grid communities outside of APAN
    - Grid WG was launched in 2002, re-organized as a committee in 2005
    APGrid PMA: Asia Pacific Grid Policy Management Authority
    - General Policy Management Authority in the Asia Pacific region
    - 16 member CAs
    - A founding member of the IGTF (International Grid Trust Federation)
    - Officially started in June 2004
    APEC/TEL APGrid
    - Building a social framework
    - Semi-annual workshops
    APAN (Asia Pacific Advanced Network) Middleware WG
    - Shares experiences on middleware
    - Recent topics include ID management and national middleware efforts
    - Approved in January 2006

    National Institute of Advanced Industrial Science and Technology

    PRAGMA routine-basis experiments

    Most slides are courtesy of Mason Katz and Cindy Zheng (SDSC/PRAGMA)

  • PRAGMA Grid Testbed
    AIST, Japan; TITECH, Japan; CNIC, China; GUCAS, China; KISTI, Korea; ASCC, Taiwan; NCHC, Taiwan; UoHyd, India; MU, Australia; BII, Singapore; KU, Thailand; USM, Malaysia; NCSA, USA; SDSC, USA; UMC, USA; CICESE, Mexico; UNAM, Mexico; UChile, Chile; UZurich, Switzerland
    http://pragma-goc.rocksclusters.org

  • Application vs. Infrastructure Middleware

  • PRAGMA Grid resources http://pragma-goc.rocksclusters.org/pragma-doc/resources.html

  • Features of PRAGMA Grid
    Grass-roots approach
    - No single source of funding for testbed development
    - Each site contributes its resources (computers, networks, human resources, etc.)
    - Operated/maintained by the administrators of each site; most site admins are not dedicated to this operation
    Small-scale clusters (several tens of CPUs) geographically distributed across the Asia Pacific region
    - Networking is in place (APAN/TransPAC), but performance (throughput and latency) is not sufficient
    - The aggregate CPU count is more than 600 and still increasing
    A truly international grid that crosses national boundaries
    - Gives middleware developers, application developers, and users many valuable insights through experiments on this real grid infrastructure

  • Why Routine-basis Experiments?
    Resources group missions and goals
    - Improve interoperability of grid middleware
    - Improve usability and productivity of the global grid
    PRAGMA from March 2002 to May 2004
    - Computation resources: 10 countries/regions, 26 institutions, 27 clusters, 889 CPUs
    - Technologies (Ninf-G, Nimrod, SCE, Gfarm, etc.)
    - Collaboration projects (GAMESS, EOL, etc.)
    The grid is still hard to use, especially a global grid. How can we make a global grid easy to use?
    - More organized testbed operation
    - Full-scale and integrated testing/research
    - Long daily application runs
    - Find problems; develop/research/test solutions

  • Routine-basis Experiments
    Initiated in May 2004 at the PRAGMA6 workshop
    Testbed
    - Voluntary contribution (grew from 8 to 17 sites)
    - Computational resources first
    - A production grid is the goal
    Applications
    - TDDFT, mpiBLAST-g2, Savannah, QM/MD
    - iGAP over Gfarm
    - Ocean science, geoscience (proposed)
    Learn requirements and issues; research and implement solutions
    Improve application/middleware/infrastructure integration
    Collaboration, coordination, consensus

  • Rough steps of the experiment
    Players: a conductor, an application driver, and site administrators
    1. Select an application and an application driver.
    2. The application driver prepares a web page that describes the software requirements of the application (prerequisite software, architecture, public/private IP addresses, disk usage, etc.), then informs the conductor that the web page is ready.
    3. The conductor asks the site administrators to (1) create an account for the driver (including adding an entry to the grid-mapfile and installing the CA certificate/policy file; an example grid-mapfile entry follows this list), and (2) install the necessary software listed on the web page.
    4. Each site admin lets the conductor and the application driver know when account creation and software installation are done.
    5. The application driver logs in and tests the new site. If she/he finds any problems, she/he contacts the site admin directly.
    6. The application driver starts the main (long-run) experiment once she/he decides the environment is ready.
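    For reference, a Globus grid-mapfile entry maps a user's certificate subject (DN) to a local account, one entry per line. The DN and account name below are made-up examples, not actual PRAGMA entries:

        # /etc/grid-security/grid-mapfile -- "certificate subject (DN)"  local-account
        "/C=JP/O=AIST/OU=GRID/CN=Example Driver" pragmauser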

  • Progress at a Glance
    [Timeline figure, May 2004 to Dec-Mar: PRAGMA6 (May) -> 1st app. start -> 1st app. end -> PRAGMA7 -> 2nd app. start -> SC04 -> 2nd user starts executions -> 3rd app. start (Dec-Mar); participation grows from 2 to 5, 8, 10, 12, and 14 sites; resource monitor (SCMSWeb) and Grid Operation Center set up along the way; on-going works continue]
    Steps for each site:
    1. Site admins install required software
    2. Site admins create user accounts (CA, DN, SSH, firewall)
    3. Users test access
    4. Users deploy application codes
    5. Users perform simple tests at local sites
    6. Users perform simple tests between 2 sites
    Join the main executions (long runs) after all of the above is done

  • 1st application: Time-Dependent Density Functional Theory (TDDFT)
    A computational quantum chemistry application
    Driver: Yusuke Tanimura (AIST, Japan)
    Requires GT2, Fortran 7 or 8, and Ninf-G2
    Run period: 6/1/04 ~ 8/31/04
    http://pragma-goc.rocksclusters.org/tddft/default.html
    [Figure: a sequential GridRPC client program of TDDFT; each grpc_call goes through a site's gatekeeper, which executes tddft_func on the backend nodes of that cluster (Clusters 1-4)]
    Client code fragment from the slide:
        main(){
            :
            grpc_function_handle_default(&server, "tddft_func");
            :
            grpc_call(&server, input, result);
            :
        }
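    Below is a fuller, self-contained client sketch using the standard GridRPC client API that the slide's fragment comes from (Ninf-G implements this API). The configuration file name, the argument types of tddft_func, and the error handling are assumptions for illustration; the slide does not specify them:

        #include <stdio.h>
        #include "grpc.h"   /* GridRPC client API header (Ninf-G) */

        int main(int argc, char *argv[])
        {
            grpc_function_handle_t server;
            double input[1024], result[1024];  /* assumed argument types */

            /* Read the client configuration file (remote server list, etc.). */
            if (grpc_initialize("client.conf") != GRPC_NO_ERROR) {
                fprintf(stderr, "grpc_initialize failed\n");
                return 1;
            }

            /* Bind a handle to the remote function on the default server. */
            if (grpc_function_handle_default(&server, "tddft_func") != GRPC_NO_ERROR) {
                fprintf(stderr, "could not create a function handle\n");
                grpc_finalize();
                return 1;
            }

            /* Synchronous remote call: blocks until the server side returns. */
            if (grpc_call(&server, input, result) != GRPC_NO_ERROR)
                fprintf(stderr, "grpc_call failed\n");

            /* Release the handle and shut down the client library. */
            grpc_function_handle_destruct(&server);
            grpc_finalize();
            return 0;
        }

    The point of the GridRPC model shown in the figure is that the client stays a small sequential program; to use several clusters at once, the client creates one handle per cluster and issues asynchronous calls (grpc_call_async in the GridRPC API), while the gatekeeper at each site launches the actual computation on its backend nodes.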

  • Routine Use Applications

  • QM/MD simulation of atomic-scale stick-slip phenomenon
    [Figure: H-saturated Si(100) surface and a nanoscale tip under strain/motion; snapshots of the initial state and the state after 520 fs]

  • Lessons Learned http://pragma-goc.rocksclusters.org/
    Grid operator's point of view
    - Preparing a web page is a good way to understand the necessary operations, but the requirements should be described in as much detail as possible
    - Grid-enabled MPI is ill-suited to the grid: co-allocation is difficult to support, nodes with private IP addresses are not usable, and there are further issues (performance, fault tolerance, etc.)
    Middleware developer's point of view
    - Observed many kinds of faults (some of them were difficult to detect)
    - Improved the capabilities for fault detection, e.g. heartbeat, timeout, etc. (a sketch follows this list)
    Application user (driver)'s point of view
    - Heterogeneity in various layers: hardware/software configuration of clusters (front node, compute nodes, compile nodes; public vs. private IP), file system, configuration of the batch system
    - Need to check the configuration when the driver accesses a site for the first time
    - Not easy to trace jobs (check the status of jobs/queues)
    - Clusters were sometimes not clean (zombie processes were left behind)
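    As an illustration of the heartbeat/timeout idea (a generic sketch, not Ninf-G's actual implementation; all names below are hypothetical): the client records when the last heartbeat from a remote server arrived and declares the server faulty once the gap exceeds a threshold, instead of blocking forever on a dead connection.

        #include <stdio.h>
        #include <time.h>

        #define HEARTBEAT_TIMEOUT 60  /* seconds without a heartbeat before giving up */

        /* Hypothetical per-server state kept by the client. */
        typedef struct {
            const char *hostname;
            time_t      last_heartbeat;  /* arrival time of the last heartbeat */
        } server_state_t;

        /* Called whenever a heartbeat message arrives from the server. */
        void on_heartbeat(server_state_t *s)
        {
            s->last_heartbeat = time(NULL);
        }

        /* Called periodically; returns 1 if the server should be treated as dead. */
        int check_timeout(const server_state_t *s)
        {
            if (difftime(time(NULL), s->last_heartbeat) > HEARTBEAT_TIMEOUT) {
                fprintf(stderr, "no heartbeat from %s for more than %d s\n",
                        s->hostname, HEARTBEAT_TIMEOUT);
                return 1;  /* caller can cancel the call and retry elsewhere */
            }
            return 0;
        }

        int main(void)
        {
            server_state_t s = { "cluster1.example.org", 0 };
            on_heartbeat(&s);                           /* a heartbeat just arrived */
            printf("faulty? %d\n", check_timeout(&s));  /* prints: faulty? 0 */
            return 0;
        }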

  • Summary: Collaboration is the key
    Non-technical issues are the most important:
    - Different funding sources; how to get enough resources
    - How to get people to act together; how to motivate them to participate
    - Mutual interests, collective goals
    - Cultivating a collaborative spirit is the key to PRAGMA's success
    Experience from the routine-basis experiments helped the multi-grid interoperation experiments between PRAGMA and TeraGrid.
    Details will be presented by Phil P. this afternoon.