Kento Aida, Tokyo Institute of Technology
Grid Challenge - programming competition on the Grid -
Kento Aida, Tokyo Institute of Technology
22nd APAN Meeting in Singapore
What is Grid Challenge?
- A programming competition to develop high-performance programs on the Grid
  - The organizer operates a Grid testbed.
  - Participants develop and run programs on the testbed.
- A special event in the Annual Symposium on Advanced Computing Systems and Infrastructures (SACSIS)
- History
  - 1st Grid Challenge in SACSIS 2005
  - 2nd Grid Challenge in SACSIS 2006
Category
- Compulsory
  - A programming competition on the Grid testbed: solving the problem provided by the organizer
  - Problem: Graph Partitioning Problem
  - Participants: students (university and high school)
- Free
  - Giving opportunities to perform experiments on the Grid
  - Presentations during the conference
  - Participants: students, engineers and researchers
Compulsory: Graph Partitioning Problem
- For a given undirected graph G(V, E) with |V| = 2n, let L and R be disjoint partitions obtained by dividing V equally, so that |L| = |R| = n.
- Find the partition that minimizes the number of edges with one endpoint in L and the other in R.
[Figure: example graph with vertices 1 to 6 divided into partitions L and R]
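The objective above can be made concrete with a small sketch. It assumes the graph is given as a list of vertex pairs and uses a simple best-improving pairwise-swap local search in the spirit of Kernighan-Lin; it is not the organizer's reference solver or any participant's program, and the names `cut_size` and `greedy_swap` are hypothetical.

```python
import random


def cut_size(edges, left):
    """Number of edges with exactly one endpoint in the vertex set `left`."""
    return sum((u in left) != (v in left) for u, v in edges)


def greedy_swap(vertices, edges, seed=0):
    """Balanced bipartition by best-improving pairwise swaps.

    Starts from a random equal split (|L| = |R|) and repeatedly applies the
    swap of a in L with b in R that lowers the cut the most, stopping when
    no swap improves it (a local minimum, not necessarily the optimum).
    """
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    half = len(order) // 2
    left, right = set(order[:half]), set(order[half:])
    best = cut_size(edges, left)
    while True:
        best_pair, best_cut = None, best
        for a in left:
            for b in right:
                c = cut_size(edges, (left - {a}) | {b})
                if c < best_cut:
                    best_pair, best_cut = (a, b), c
        if best_pair is None:
            return left, right, best
        a, b = best_pair
        left = (left - {a}) | {b}
        right = (right - {b}) | {a}
        best = best_cut
```

Each pass re-evaluates every swap from scratch, costing O(|L|·|R|·|E|). That is fine for a six-vertex example (two triangles joined by one edge, whose best cut is 1), but far too slow for the final-run sizes, where incremental gain bookkeeping as in Kernighan-Lin or Fiduccia-Mattheyses would be needed.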
Compulsory (cont’d)
- Qualifying runs (3 weeks)
  - Solve early! Find a solution within a given threshold.
  - Shared resources
  - Problem size: |V| = 500 to 1,500
- Final runs (2 weeks)
  - Solve fast!
  - Dedicated time slots for finalists (2.5 h per team)
  - Find a solution within a given period (10 min).
  - The finalist with the best solution wins!
  - Problem size: |V| = 30,000 to 35,000
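The two rounds differ mainly in the stopping rule: qualifying runs reward reaching a threshold early, while final runs reward the best cut found within a fixed period. The following sketch, a plain random-swap search that keeps the best balanced split seen so far, shows both rules side by side. The function name, the acceptance rule (non-worsening swaps are kept) and the `max_iters` cap are illustrative assumptions, not the competition's rules.

```python
import random
import time


def cut_size(edges, left):
    """Number of edges with exactly one endpoint in the vertex set `left`."""
    return sum((u in left) != (v in left) for u, v in edges)


def anytime_search(vertices, edges, threshold=None, time_limit=None,
                   max_iters=100_000, seed=0):
    """Random-swap local search returning (best_cut, best_left).

    Stops when the best cut reaches `threshold` (qualifying-run style),
    when `time_limit` seconds have elapsed (final-run style), or after
    `max_iters` swap attempts, whichever comes first.
    """
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    half = len(order) // 2
    left, right = order[:half], order[half:]
    current = cut_size(edges, set(left))
    best = (current, set(left))
    deadline = None if time_limit is None else time.monotonic() + time_limit
    for _ in range(max_iters):
        if threshold is not None and best[0] <= threshold:
            break
        if deadline is not None and time.monotonic() >= deadline:
            break
        i, j = rng.randrange(len(left)), rng.randrange(len(right))
        left[i], right[j] = right[j], left[i]
        new = cut_size(edges, set(left))
        if new <= current:
            current = new
            if new < best[0]:
                best = (new, set(left))
        else:
            left[i], right[j] = right[j], left[i]  # undo a worsening swap
    return best
```

For a final run one would call it with `time_limit=600` (the 10-minute period) and `threshold=None`, and report the best split when the clock runs out.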
Free
- Experiments of research projects (1 month)
  - Shared resources
- Projects
  - Tools: a monitoring tool, a message passing system, a programming tool, volunteer computing
  - Applications: physics simulation, bioinformatics, simulation of a diesel engine, optimization problems
Participants
- Compulsory: D 2, M 12, U 6, H 1
- Free: D 2, M 5, U 1
(D: doctoral students, M: master’s students, U: undergraduates, H: high school students)
Testbed: Grid Challenge Federation
- AIST, Tokyo Institute of Technology, The University of Tokyo, Doshisha University
- More than 1,200 CPUs
Resources
- A collection of PC clusters
- Spec of a PC cluster
  - A gateway node: gateway, compiling
  - Computing nodes: computation
  - Global IP address / private IP address
  - NFS: “/home” is shared among nodes
Resources (cont’d)
Name | Site | Computing node | #Computing nodes (#CPUs)
F32 | AIST (Tsukuba) | Xeon 3GHz x2, 4GB mem., 1000BASE-T | 128 (256)
SAKURA | AIST (Tsukuba) | Opteron 1.8GHz x2, 3GB mem., 1000BASE-T | 16 (32)
DIS | TITECH (Yokohama) | Athlon MP 2000+ 1.6GHz x2, 512MB mem., 100BASE-TX | 50 (100)
PrestoIII | TITECH (Tokyo) | Opteron 246/242 2/1.6GHz x2, 4/3/2GB mem., 1000BASE-T | 103 (206)
Tau | U. Tokyo (Tokyo) | Xeon 2.4/2.8GHz x2, 2GB mem., 1000BASE-T | 175 (350)
Chikayama | U. Tokyo (Chiba) | Xeon 2.4GHz x2, 2GB mem., 1000BASE-T | 64 (128)
Xenia | Doshisha U. (Kyoto) | Xeon 2.4GHz x2, 1GB mem., 100BASE-TX | 63 (126)
Internet Connection
[Figure: network diagram connecting the clusters F32, SAKURA, PrestoIII, Chikayama, Tau, DIS and Xenia via Tsukuba WAN, SINET and WIDE]
Software
- Grid middleware: Globus Toolkit 2.4
- Batch queueing system: Sun Grid Engine, PBS
- Remote process invocation: SSH, GXP
- Monitoring: Ganglia
- Programming: MPICH 1.2.7, Ninf-G 2.4
GXP: a shell for distributed multi-cluster environments
- Fast simultaneous command submissions
- Parallel job pipes
- Interactive selection of nodes to execute commands
- No cumbersome per-node operations!
  - Installation and deployment, invocation of parallel processes, monitoring, trouble diagnosis, debugging, clean-up of dead processes
http://www.logos.ic.i.u-tokyo.ac.jp/phoenix/gxp_quick_man.shtml
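To illustrate what simultaneous command submission means, here is a minimal Python sketch that starts one command on many nodes at once over SSH and gathers the outputs. It does not use GXP or reproduce its interface; `run_everywhere`, the launcher functions and the node names are hypothetical, and key-based SSH login to every node is assumed.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def ssh_launcher(node, command):
    """argv that runs `command` on `node` over ssh (key-based login assumed)."""
    return ["ssh", "-o", "BatchMode=yes", node, command]


def local_launcher(node, command):
    """Stand-in that runs locally, for trying the sketch without a cluster."""
    return [sys.executable, "-c", f"print({node!r}, {command!r})"]


def run_everywhere(nodes, command, launcher=ssh_launcher, max_workers=32,
                   timeout=60):
    """Launch `command` on every node concurrently; return {node: (rc, stdout)}.

    rc is None when a node did not finish within `timeout` seconds.
    """
    def run_one(node):
        try:
            proc = subprocess.run(launcher(node, command), capture_output=True,
                                  text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return node, (None, "")
        return node, (proc.returncode, proc.stdout)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, nodes))
```

Passing `launcher=local_launcher` runs everything on the local machine, which makes the fan-out and result collection easy to try before pointing it at real nodes.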
Ninf-G: reference implementation of GridRPC
- GridRPC: a simple RPC-based programming model for the Grid
  - A client invokes remote libraries installed on remote servers on the Grid.
  - Utilizing task parallelism
http://ninf.apgrid.org/
[Figure: a client program calls grpc_call(…); data is sent to server programs running the server libraries, and results are returned to the client]
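The GridRPC style of task parallelism can be mimicked locally: a handle binds a function name to a server, and the client issues several calls concurrently and then waits for all of them (in the C API this corresponds to `grpc_call_async` followed by `grpc_wait_all`). The Python below is only a simulation under that assumption; `FunctionHandle`, `call_all`, `square_sum` and the dictionary standing in for a server are hypothetical and are not part of Ninf-G.

```python
from concurrent.futures import ThreadPoolExecutor


class FunctionHandle:
    """Hypothetical stand-in for a GridRPC function handle.

    It binds a function name to a 'server'; here the server is just a dict
    of local Python callables rather than a remote Ninf-G server.
    """

    def __init__(self, server, name):
        self.server = server
        self.name = name

    def call(self, *args):
        """Blocking call, playing the role of grpc_call(...)."""
        return self.server[self.name](*args)


def call_all(handles, arg_lists):
    """Start one call per (handle, args) pair concurrently, then wait for all.

    Results come back in the same order as `handles`.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(handles))) as pool:
        futures = [pool.submit(h.call, *args)
                   for h, args in zip(handles, arg_lists)]
        return [f.result() for f in futures]


def square_sum(lo, hi):
    """Example 'remote library' routine: sum of i*i for lo <= i < hi."""
    return sum(i * i for i in range(lo, hi))
```

Splitting a parameter range into chunks, one per server, is the typical pattern: each call is independent, so the client only has to combine the partial results.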
Ganglia: a distributed monitoring tool for high-performance computing systems such as PC clusters and Grids
- CPU load
- Memory usage
- Network traffic
http://ganglia.sourceforge.net/
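As an example of consuming Ganglia data, the sketch below parses the XML that a Ganglia daemon reports (gmond serves it on TCP port 8649 by default) and flags hosts whose 1-minute load is well above their CPU count, using the standard `load_one` and `cpu_num` metrics. The sample document, the threshold `factor` and the function name are assumptions for illustration; a real check would read the XML from the daemon rather than from a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down gmond report for two hosts.
SAMPLE = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
<CLUSTER NAME="example">
 <HOST NAME="node01">
  <METRIC NAME="load_one" VAL="1.95" TYPE="float"/>
  <METRIC NAME="cpu_num" VAL="2" TYPE="uint16"/>
 </HOST>
 <HOST NAME="node02">
  <METRIC NAME="load_one" VAL="7.40" TYPE="float"/>
  <METRIC NAME="cpu_num" VAL="2" TYPE="uint16"/>
 </HOST>
</CLUSTER>
</GANGLIA_XML>"""


def overloaded_hosts(xml_text, factor=1.5):
    """Return names of hosts whose 1-minute load exceeds factor * #CPUs.

    Hosts missing either metric are skipped.
    """
    root = ET.fromstring(xml_text)
    flagged = []
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        if "load_one" not in metrics or "cpu_num" not in metrics:
            continue
        if float(metrics["load_one"]) > factor * int(metrics["cpu_num"]):
            flagged.append(host.get("NAME"))
    return flagged
```

With the sample data, only `node02` (load 7.40 on 2 CPUs) exceeds the default limit of 3.0.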
Operation
- The testbed is operated by volunteers! (researchers, technical staff, students)
- What we need to do
  - Installation, including training students to do it
  - User management
  - Job management
User Management
- Local account
  - The same UID and login name for a user on all sites
  - Remote login via SSH with public-key authentication
- Globus account
  - A temporary CA for the Grid Challenge
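Keeping the same UID and login name on every site is easy to get wrong when each site manages its own accounts. A small consistency check might look like the following, assuming each site can export its accounts in `/etc/passwd` format; how the organizers actually synchronized accounts is not described, and `parse_passwd` and `uid_mismatches` are hypothetical helpers.

```python
def parse_passwd(text):
    """Map login name -> UID from /etc/passwd-formatted text."""
    accounts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        accounts[fields[0]] = int(fields[2])
    return accounts


def uid_mismatches(site_passwd):
    """Given {site: passwd_text}, report users whose UID differs between
    sites or who are missing from some site (shown as None there)."""
    per_site = {site: parse_passwd(t) for site, t in site_passwd.items()}
    users = set().union(*(a.keys() for a in per_site.values()))
    problems = {}
    for user in sorted(users):
        uids = {site: a.get(user) for site, a in per_site.items()}
        if len(set(uids.values())) > 1:
            problems[user] = uids
    return problems
```

An empty result means every login name maps to the same UID everywhere, which is what NFS-shared home directories and cross-site SSH logins rely on.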
Job Management
- Interactive or batch
  - All sites provide both environments for job execution.
- Dedicated slots
  - Finalists are assigned dedicated slots for their application runs.
  - Based on a gentlemen’s agreement
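Because dedicated slots relied on a gentlemen’s agreement, a shared schedule is the minimum any tool would need. The sketch assigns back-to-back 2.5-hour slots to finalists and answers which team holds the testbed at a given time; the function names, team names, start time and the optional gap between slots are hypothetical.

```python
from datetime import datetime, timedelta

SLOT = timedelta(hours=2.5)  # dedicated time per finalist team


def assign_slots(teams, start, slot=SLOT, gap=timedelta(0)):
    """Give each team a back-to-back dedicated slot beginning at `start`.

    Returns {team: (begin, end)}; `gap` leaves time between slots,
    e.g. for cleaning up processes left by the previous team.
    """
    schedule = {}
    t = start
    for team in teams:
        schedule[team] = (t, t + slot)
        t += slot + gap
    return schedule


def owner_at(schedule, when):
    """Return the team holding the testbed at `when`, or None if no slot covers it."""
    for team, (begin, end) in schedule.items():
        if begin <= when < end:
            return team
    return None
```

Slots are half-open intervals, so the moment one slot ends belongs to the next team (or to nobody), which avoids two teams owning the same instant.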
Troubles …
- Computing nodes: OS hang-ups, hard disk drive troubles
- Power supply: failure in balancing the power supply
- Servers: troubles with NFS and batch queueing systems
- Monitoring: troubles collecting monitoring data with Ganglia
Troubles … (cont’d)
- Jobs out of control
  - Waste of CPU/memory resources by jobs out of control
- Dedicated slots
  - Jobs running beyond their slots
Operational Issue
- Troubles on computing nodes
  - Monitoring tools to identify faulty computing nodes
- Power supply
  - A critical problem for small groups, e.g., a lab in a university
  - Tools for power monitoring, low-power processors
- Servers
  - Redundancy
Operational Issue (cont’d)
- User/process management
  - Tools to control user processes: monitoring user processes, detecting unusual behavior, suspending/killing jobs out of control
  - Tools for reservation: reserving dedicated slots for users, controlling user jobs
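The process-control tools listed above could start from a rule as simple as the following sketch, which decides an action for each process from a snapshot: during a dedicated slot, other users' processes are marked for killing, and processes over a memory limit are marked for suspension. The `Proc` record, the limit and the exempt accounts are assumptions; the talk gives no concrete policy.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Snapshot of one process on a computing node (hypothetical record)."""
    pid: int
    user: str
    rss_mb: float  # resident memory in MB


def classify(procs, slot_owner=None, mem_limit_mb=1024, exempt=("root",)):
    """Return {pid: action} for processes that look out of control.

    - Processes of exempt accounts are never touched.
    - While a dedicated slot is active (slot_owner is set), processes of
      any other user are marked "kill".
    - Any remaining process above mem_limit_mb is marked "suspend".
    """
    actions = {}
    for p in procs:
        if p.user in exempt:
            continue
        if slot_owner is not None and p.user != slot_owner:
            actions[p.pid] = "kill"
        elif p.rss_mb > mem_limit_mb:
            actions[p.pid] = "suspend"
    return actions
```

On POSIX systems a tool could then deliver each decision with `os.kill(pid, signal.SIGKILL)` or `os.kill(pid, signal.SIGSTOP)`; the sketch deliberately stops short of sending signals.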
Snapshots
[Photos: qualifying runs; final runs]
Snapshots (cont’d)
Conclusions
- Grid Challenge is a programming competition to develop high-performance programs on the Grid.
  - Compulsory and free categories
- Grid testbed for Grid Challenge
  - 6 sites, 7 PC clusters, more than 1,200 CPUs
  - Globus, SGE, PBS, GXP, Ganglia, Ninf-G, MPICH, …
- Discussion about operational issues
  - Tools for monitoring, power supply, user/process management
Acknowledgements
- Information Processing Society of Japan
- Sun Microsystems
- Soum Corporation
- Grid Consortium Japan
Thank you.