Grid Challenge - programming competition on the Grid -

Kento Aida, Tokyo Institute of Technology
22nd APAN Meeting in Singapore


Page 1: Grid Challenge -  programming competition on the Grid -

Kento Aida, Tokyo Institute of Technology

Grid Challenge - programming competition on the Grid -

Kento Aida
Tokyo Institute of Technology

22nd APAN Meeting in Singapore

Page 2: Grid Challenge -  programming competition on the Grid -

What is Grid Challenge?

  • a programming competition to develop high-performance programs on the Grid
      • The organizer operates a Grid testbed.
      • Participants develop and run programs on the testbed.
  • a special event in the Annual Symposium on Advanced Computing Systems and Infrastructures (SACSIS)
  • history
      • 1st Grid Challenge in SACSIS 2005
      • 2nd Grid Challenge in SACSIS 2006

Page 3: Grid Challenge -  programming competition on the Grid -

Category

  • compulsory
      • programming competition on the Grid testbed
      • solving the problem provided by the organizer
          • Graph Partitioning Problem
      • students (university and high school)
  • free
      • giving opportunities to perform experiments on the Grid
      • presentations during the conference
      • students, engineers and researchers

Page 4: Grid Challenge -  programming competition on the Grid -

Compulsory

  • Graph Partitioning Problem
      • For a given undirected graph G(V, E), |V| = 2n.
      • L and R are disjoint partitions generated by dividing the vertices of G equally, so that |L| = |R| = n.
      • Find the partition that minimizes the number of edges with one endpoint in L and the other in R.

[Figure: example graph with vertices 1 to 6 divided into partitions L and R]
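The objective above can be made concrete with a small sketch. It counts the edges crossing between L and R and, for a tiny graph only, tries every equal-size split. The function names `cut_size` and `best_bisection` and the 6-vertex edge list are illustrative assumptions, not taken from the competition material.

```python
from itertools import combinations


def cut_size(edges, left):
    """Count edges with exactly one endpoint in `left`."""
    left = set(left)
    return sum((u in left) != (v in left) for u, v in edges)


def best_bisection(vertices, edges):
    """Exhaustively search all equal-size splits (feasible for tiny graphs only)."""
    vertices = sorted(vertices)
    half = len(vertices) // 2
    first = vertices[0]
    best = None
    # Fix the first vertex in L so mirror-image splits are not tried twice.
    for rest in combinations(vertices[1:], half - 1):
        left = {first, *rest}
        size = cut_size(edges, left)
        if best is None or size < best[0]:
            best = (size, left)
    size, left = best
    return size, left, set(vertices) - left


# Hypothetical example: two triangles joined by the single edge (3, 4).
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
size, left, right = best_bisection(range(1, 7), EDGES)
print(size, sorted(left), sorted(right))
```

Exhaustive search grows combinatorially with |V|, so it is only a way to check the definition; the problem sizes used in the competition call for heuristics.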

Page 5: Grid Challenge -  programming competition on the Grid -

Compulsory (cont’d)

  • qualifying runs (3 weeks)
      • Solve early!
          • to find a solution within a given threshold
          • shared resources
          • problem size: |V| = 500 - 1500
  • final runs (2 weeks)
      • Solve fast!
          • dedicated time slots for finalists (2.5 h per team)
          • to find a solution within a given period (10 min)
          • The finalist with the best solution wins!
          • problem size: |V| = 30000 - 35000
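The "solve fast" rule in the final runs suggests an anytime search: keep a valid balanced partition at all times, improve it until the deadline, and report whatever is best then. The sketch below is a single-process illustration with pairwise swaps. It is an assumption about one possible approach, not the method any finalist used, and the names `anytime_bisection`, `swap_gain` and the sample graph `ADJ` are hypothetical.

```python
import random
import time


def cut_size(adj, in_left):
    """Number of edges whose endpoints lie on different sides."""
    return sum(in_left[u] != in_left[v] for u in adj for v in adj[u] if u < v)


def swap_gain(adj, in_left, u, v):
    """Reduction in cut size if u (in L) and v (in R) trade places."""
    def d(x):
        # Edges of x that cross the cut minus edges that stay inside.
        return sum(1 if in_left[n] != in_left[x] else -1 for n in adj[x])
    return d(u) + d(v) - 2 * (v in adj[u])


def anytime_bisection(adj, time_limit_s, seed=0):
    """Improve a random balanced split by swaps until the time limit expires.

    Assumes a non-empty graph with an even number of comparable vertices.
    """
    rng = random.Random(seed)
    vertices = sorted(adj)
    rng.shuffle(vertices)
    half = len(vertices) // 2
    left, right = vertices[:half], vertices[half:]
    in_left = {v: True for v in left}
    in_left.update({v: False for v in right})
    cut = cut_size(adj, in_left)
    deadline = time.monotonic() + time_limit_s
    while time.monotonic() < deadline:
        i, j = rng.randrange(half), rng.randrange(half)
        u, v = left[i], right[j]
        gain = swap_gain(adj, in_left, u, v)
        if gain > 0:  # accept only swaps that shrink the cut
            left[i], right[j] = v, u
            in_left[u], in_left[v] = False, True
            cut -= gain
    return cut, set(left), set(right)


# Hypothetical 6-vertex graph; the final runs allowed 10 minutes, not 0.05 s.
ADJ = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
cut, left, right = anytime_bisection(ADJ, time_limit_s=0.05)
print(cut, sorted(left), sorted(right))
```

Because only improving swaps are accepted, the current partition is always the best one found so far, so stopping at the deadline never loses a result. A search like this can stall in a local minimum.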

Page 6: Grid Challenge -  programming competition on the Grid -

Free

  • experiments of research projects (1 month)
      • shared resources
  • projects
      • tools
          • a monitoring tool, a message passing system, a programming tool, volunteer computing
      • applications
          • physics simulation, bioinformatics, simulation of a diesel engine, optimization problems

Page 7: Grid Challenge -  programming competition on the Grid -

Participants (number per group, from two pie charts)

category      D    M    U    H
compulsory    2   12    6    1
free          2    5    1    -

Page 8: Grid Challenge -  programming competition on the Grid -

Testbed

  • Grid Challenge Federation
      • AIST
      • Tokyo Institute of Technology
      • The University of Tokyo
      • Doshisha University
  • more than 1,200 CPUs

Page 9: Grid Challenge -  programming competition on the Grid -

Resources

  • collection of PC clusters
  • spec of a PC cluster
      • a gateway node
          • gateway, compiling
      • computing nodes
          • computation
      • global IP address/private IP address
      • NFS
          • “/home” is shared among nodes

Page 10: Grid Challenge -  programming competition on the Grid -

Resources (cont’d)

name        site                  compt. node                                              #compt. nodes (#CPUs)
F32         AIST (Tsukuba)        Xeon 3GHz x2, 4GB mem., 1000BASE-T                       128 (256)
SAKURA      AIST (Tsukuba)        Opteron 1.8GHz x2, 3GB mem., 1000BASE-T                  16 (32)
DIS         TITECH (Yokohama)     Athlon MP 2000+ 1.6GHz x2, 512MB mem., 100BASE-TX        50 (100)
PrestoIII   TITECH (Tokyo)        Opteron 246/242 2/1.6GHz x2, 4/3/2GB mem., 1000BASE-T    103 (206)
Tau         U. Tokyo (Tokyo)      Xeon 2.4/2.8GHz x2, 2GB mem., 1000BASE-T                 175 (350)
Chikayama   U. Tokyo (Chiba)      Xeon 2.4GHz x2, 2GB mem., 1000BASE-T                     64 (128)
Xenia       Doshisha U. (Kyoto)   Xeon 2.4GHz x2, 1GB mem., 100BASE-TX                     63 (126)

Page 11: Grid Challenge -  programming competition on the Grid -

Internet Connection

[Figure: network diagram connecting the clusters F32, SAKURA, PrestoIII, Chikayama, Tau, DIS and Xenia through TsukubaWAN, SINET and WIDE]

Page 12: Grid Challenge -  programming competition on the Grid -

Software

  • Grid middleware
      • Globus Toolkit 2.4
  • batch queueing system
      • Sun Grid Engine, PBS
  • remote process invocation
      • SSH, GXP
  • monitoring
      • Ganglia
  • programming
      • MPICH 1.2.7, Ninf-G 2.4

Page 13: Grid Challenge -  programming competition on the Grid -

GXP

  • shell for distributed multi-cluster environment
      • fast simultaneous command submissions
      • parallel job pipes
      • interactive selection of nodes to execute commands
  • no cumbersome per-node operations!
      • installation and deployment
      • invocation of parallel processes
      • monitoring, trouble diagnosis, debugging
      • dead processes clean-up

http://www.logos.ic.i.u-tokyo.ac.jp/phoenix/gxp_quick_man.shtml

Page 14: Grid Challenge -  programming competition on the Grid -

Ninf-G

  • reference implementation of GridRPC
      • GridRPC: a simple RPC-based programming model for the Grid
      • Client invokes remote libraries installed on remote servers on the Grid.
      • utilizing task parallelism

http://ninf.apgrid.org/

[Figure: a client program calls grpc_call(…), sends data to the libraries of server programs on the servers, and receives the results]
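The task-parallel pattern behind GridRPC can be sketched by analogy: a client hands independent chunks of data to library routines and collects the results. The sketch below is not the Ninf-G API (which is a C library); it uses Python's standard `concurrent.futures` with local threads standing in for remote servers, and `remote_library` and `call_on_servers` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def remote_library(data):
    """Stand-in for a library routine installed on a server (hypothetical)."""
    return sum(x * x for x in data)


def call_on_servers(func, chunks, n_servers=2):
    """Send each chunk to a worker and gather the results in input order.

    Local threads play the role of remote servers in this analogy.
    """
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        return list(pool.map(func, chunks))


chunks = [[1, 2, 3], [4, 5, 6]]
results = call_on_servers(remote_library, chunks)
print(results)
```

The point of the analogy is only the shape of the program: the client stays sequential, and parallelism comes from issuing several independent calls at once.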

Page 15: Grid Challenge -  programming competition on the Grid -

Ganglia

  • a distributed monitoring tool for high-performance computing systems such as PC clusters and Grids
      • CPU load
      • memory usage
      • network traffic

http://ganglia.sourceforge.net/

Page 16: Grid Challenge -  programming competition on the Grid -

Operation

  • The testbed is operated by volunteers!
      • researchers/technical staff/students
  • What we need to do
      • installation and its training for students
      • user management
      • job management

Page 17: Grid Challenge -  programming competition on the Grid -

User Management

  • local account
      • the same UID and login name for a user on all sites
  • remote login via ssh
      • public key
  • Globus account
      • temporary CA for the Grid Challenge
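Keeping the same UID and login name on every site is easy to get wrong when accounts are created by hand at several sites. The sketch below shows one way such a consistency check could look; the function `find_uid_mismatches`, the site labels and the account data are all made-up assumptions, not the tooling used by the organizers.

```python
def find_uid_mismatches(site_accounts):
    """Return logins whose UID differs between sites or is missing somewhere.

    `site_accounts` maps a site name to a {login: uid} dictionary.
    The result maps each problematic login to its per-site UIDs
    (None where the account does not exist).
    """
    all_logins = set().union(*(a.keys() for a in site_accounts.values()))
    problems = {}
    for login in sorted(all_logins):
        uids = {site: accounts.get(login) for site, accounts in site_accounts.items()}
        if len(set(uids.values())) != 1:
            problems[login] = uids
    return problems


# Hypothetical account tables for three sites.
accounts = {
    "siteA": {"team01": 5001, "team02": 5002},
    "siteB": {"team01": 5001, "team02": 5020},
    "siteC": {"team01": 5001},
}
print(find_uid_mismatches(accounts))
```

In this made-up data, `team01` is consistent, while `team02` has a different UID at one site and no account at another.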

Page 18: Grid Challenge -  programming competition on the Grid -

Job Management

  • interactive or batch
      • All sites provide both environments for job execution.
  • dedicated slot
      • Finalists are assigned dedicated slots for their application runs.
      • the gentlemen’s agreement

Page 19: Grid Challenge -  programming competition on the Grid -

Troubles …

  • computing nodes
      • OS hang-ups, troubles with hard disk drives
  • power supply
      • failure in balancing the power supply
  • servers
      • troubles with NFS, batch queueing systems
  • monitoring
      • troubles collecting monitoring data with Ganglia

Page 20: Grid Challenge -  programming competition on the Grid -

Troubles … (cont’d)

  • jobs being out of control
      • waste of CPU/memory resources by jobs being out of control
  • dedicated slots
      • jobs running beyond their slot

Page 21: Grid Challenge -  programming competition on the Grid -

Operational Issue

  • trouble on computing nodes
      • monitoring tools to identify the computing nodes in trouble
  • power supply
      • critical problem for small groups, e.g., a lab in a university
      • tools for power monitoring
      • low-power processor
  • servers
      • redundancy

Page 22: Grid Challenge -  programming competition on the Grid -

Operational Issue (cont’d)

  • user/process management
      • tools to control user processes
          • monitoring user processes
          • detecting unusual behavior
          • suspending/killing jobs being out of control
      • tools for reservation
          • reserving dedicated slots for users
          • controlling user jobs
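The two tool ideas above, reserving dedicated slots and stopping jobs that run beyond them, can be sketched as plain bookkeeping. This is a minimal illustration under assumed data shapes, not an existing tool: `Slot`, `reserve` and `jobs_to_stop` are hypothetical names, times are plain seconds, and actually suspending or killing a process is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    user: str
    start: float  # seconds from an arbitrary reference time
    end: float


def reserve(slots, new):
    """Return a new slot list including `new`, or raise if it overlaps one."""
    for s in slots:
        if new.start < s.end and s.start < new.end:
            raise ValueError(f"slot for {new.user} overlaps slot of {s.user}")
    return slots + [new]


def jobs_to_stop(slots, running, now):
    """Return PIDs of jobs whose owner holds no slot covering `now`.

    `running` is a list of (pid, user) pairs, e.g. gathered by a monitor.
    """
    owners = {s.user for s in slots if s.start <= now < s.end}
    return [pid for pid, user in running if user not in owners]


# Two back-to-back 2.5 h slots (9000 s each) for hypothetical teams.
slots = reserve([], Slot("teamA", 0, 9000))
slots = reserve(slots, Slot("teamB", 9000, 18000))
running = [(101, "teamA"), (202, "teamB")]
print(jobs_to_stop(slots, running, now=9500))
```

Here the job of `teamA` is flagged because its slot ended at 9000 s, while the job of `teamB` is inside its own slot.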

Page 23: Grid Challenge -  programming competition on the Grid -

Snapshots

  • qualifying runs
  • final runs

Page 24: Grid Challenge -  programming competition on the Grid -

Snapshots (cont’d)

Page 25: Grid Challenge -  programming competition on the Grid -

Conclusions

  • Grid Challenge is a programming competition to develop high-performance programs on the Grid.
      • compulsory and free categories
  • Grid testbed for Grid Challenge
      • 6 sites, 7 PC clusters, >1200 CPUs
      • Globus, SGE, PBS, GXP, Ganglia, Ninf-G, MPICH, …
  • discussion about operational issues
      • tools for monitoring, power supply, user/process management

Page 26: Grid Challenge -  programming competition on the Grid -

Acknowledgements

  • Information Processing Society of Japan
  • Sun Microsystems
  • Soum Corporation
  • Grid Consortium Japan

Page 27: Grid Challenge -  programming competition on the Grid -

Thank you.