Recent Efforts on the Ninf Project and the Asia-Pacific
Grid (ApGrid)
Satoshi Matsuoka
Tokyo Inst. Technology/JST, [email protected]@is.titech.ac.jp
SEKIGUCHI, Satoshi
Electrotechnical Laboratory, AIST(TACC), MITI, [email protected]@etl.go.jp
Several slides are courtesy of Grid people
What is ApGrid?
A meeting point for all Asia-Pacific HPCN researchers doing grid-related work
Communication channel to the Global Grid Forum, and other grid communities
Pool for finding international project partners
Not a single source funded “project”!
APAN: http://apan.net
[Figure: APAN network map. Legend: Exchange Point, Access Point, Current Status, Planned. Sites: South Korea, Japan, China, Hong Kong, Malaysia, Singapore, Indonesia, Australia, Philippines, Thailand. Other labels: TransPAC (100 Mbps), North America (STARTAP), Latin America, Europe, Australia-Japan Link (1.5 Mbps Frame Relay), ACSys.]
Success2: Tsukuba Advanced Computing Center (TACC): SC99 HPC Games w/Pittsburgh, Stuttgart, Manchester, etc.
National Backbones for Japanese Academia
nGrid/eGrid Partners
TACC: Tsukuba Advanced Computing Center
Osaka: Osaka University
RWCP: Real World Computing Partnership
TIT: Tokyo Institute of Technology
Waseda: Waseda University
[Figure: network diagram. Site labels: APAN Tokyo, RWCP, TIT, Waseda, Osaka, TACC, STAR TAP Chicago, Australia. Network labels: TransPAC 100Mbps, vBNS, IMnet, WIDE, SINET. Link speeds shown: 10Mbps, 100Mbps, 135Mbps, 155Mbps, 384Kbps, 1.5Mbps (Frame Relay).]
Super SINET
Similar to Internet 2
10 Gbps backbone interconnecting major Japanese Universities
10/2.4 Gbps link to each Univ.
Collaboration with other national 10 Gbps backbone projects
  E.g., 10 Gbps backbone in Tsukuba area
APGrid Locations/Potential Partners
Japan
  AIST/TACC/ETL (National Institute of Advanced Industrial Science and Technology)
  Tokyo Institute of Technology
  Waseda U
  Osaka-u
  Nara Advanced Institute of S & T
  HEPL (DataGrid)
Australia
  ANU, Monash U
United States
  PNNL
Korea (KORDIC, )
Singapore (NUS)
Malaysia
Thailand
ROC, Hong Kong, Taiwan
Other APAN members
ApGRID: motivations (1)
Establish a region-wide testbed for global computing (Grid and/or Meta)
Disseminating research activities
Providing an easy-access environment for researchers, students, vendors, etc.
Improving interoperability of existing tools
Testbed for software development and trials, to evaluate usability and to archive performance numbers
Finding demonstrative applications
ApGRID: motivations (2)
Create a competitive/collaborative community to the iGRID and the eGRID for:
  Making international collaborations
  Supporting and collaborating with network people, e.g. APAN, IM-net, etc.
  Attempting to negotiate standardization backed by real experience (in the Global Grid Forum)
Also, domestic (intra-country) service
  Nation-wide
    Several “non-cooperative” network communities
    Seeking governmental and/or industrial funding
  Campus-wide
    Find volunteers among our friends
ApGrid Resources
Gov. Lab. and Univ. Supercomputing centers
  MITI TACC, HEPL (DataGrid), etc.
Individual Univ. Lab
  TITECH
  Waseda Univ.
  Nara AIST
  Etc.
Network configuration at TACC (AIST Supercomputing Center)
[Figure: network diagram. Labels: FEx8GbE, Firewall, Internet 135Mbps/2.4Gbps, RS6000/SP/128 (200+ GFlops), SR8000/64 (512 GFlops), Clusters.]
TACC Resources
High Performance Computing System
  SR8000, RS/6000 SP, UE10000, etc.
  Super Clusters (Alpha Ev56x40, Ev6x256, …)
High Speed Network
  Gigabit campus backbone
  ATM Megalink national backbone
    15 national laboratories across Japan
High Speed Internet Access
  IM net 135Mbps, StarTAP 100Mbps via TransPAC/APAN
Highly functional Data Base
  RIO DB
  Visit http://www.aist.go.jp/RIODB
Hitachi SR8000/64 (sr8k)
Power PC + PVP + HXB
64 nodes
512 Gflops (peak)
449.7 Gflops (linpack)
512GB memory
2D cross bar network
1.98TB Disks
R&D, Parallel Program development
+ 8 nodes Front-end for interactive usage (ex. Global Computing)
IBM RS/6000 SP (rssp)
Power3 SMP 2CPU, Winterhawk
128 nodes
205 Gflops (peak)
149.36 Gflops (linpack)
256GB memory
High speed switch
3.3TB (user 873GB) Disks
ISV applications’ platform
+ 4way/350MHz P3 x 8nodes front-end WH-II
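As a quick reading aid for the SR8000 and RS/6000 SP figures above, the ratio of Linpack to peak performance can be worked out directly. The sketch below only uses the peak and Linpack numbers quoted on the two slides; the dictionary layout and the function name `linpack_efficiency` are illustrative assumptions, not part of the original talk.

```python
# Peak and Linpack figures (GFlops) as quoted on the two machine slides.
machines = {
    "Hitachi SR8000/64": {"peak": 512.0, "linpack": 449.7},
    "IBM RS/6000 SP": {"peak": 205.0, "linpack": 149.36},
}


def linpack_efficiency(peak_gflops, linpack_gflops):
    """Return the fraction of peak performance achieved on Linpack."""
    return linpack_gflops / peak_gflops


# Efficiency per machine, keyed by the same names as above.
efficiencies = {
    name: linpack_efficiency(m["peak"], m["linpack"])
    for name, m in machines.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%} of peak")
```

Under these numbers the SR8000 sustains a noticeably larger share of its peak than the RS/6000 SP.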
Bamboo Alpha Cluster
256 Alpha EV6 500MHz, 512MB, 256 GFlops Peak
Two-stage Gigabit Ethernet Switch (Myrinet 2K?)
Special Compact-PCI Packaging by Alta Tech.
Linux-based, Commodity Software (Beowulf)
$5 mil
Production Cluster, Operational RSN
TITECH Matsuoka Lab. Grid Clusters
“Very” Commodity clusters as Grid Resources and Research platform
Current 2 clusters
  192 procs total
6 clusters, over 400 procs/400 GFlops by 1Q 2001, Gigabit linkage
The PRESTO Grid Clusters at Matsuoka Lab, TITECH for 2000
Presto I
  64 PII-350, 256MB/node
  Linux + RWC Score + our stuff
  Semi production, parallel OR algorithm on the Grid
Presto II
  64 Celeron-900, 512MB/node, multiple interconnect
  Grid Simulation, HP Java
Prospero
  64nodex2proc SMP PIII-824, 640MB, 3-trunked 100Base-T (will be 192proc RSN w/6TB disks)
  General-purpose cluster research, Grid simulation, app. Run (incl. Mcell over the Pacific)
Pronto
  > 64 Athlon, > 1.1GHz, > 512MB DDR-DRAM, Hybrid 1000/100Base-T
  Semi-production, 1Q2001
Porto
  Plug & Play Clustering
  32 High-Performance Notebooks (600MHz Mobile Celeron)
Pinto
  16-32 node Alpha cluster
  Heterogeneous Clustering over the Grid
Total >400 nodes
Grid Cluster Research at TITECH
Grid Simulation and Performance Benchmarking
Cluster Federation w/Grid
  Commodity High-Performance Networking
  Incl. OpenMP (w/RWCP)
  Fault Tolerance and Security
  Dynamic Plug&Play Cluster
  Downloadable Self-tuning Java Libs and Apps
Applications
  Operation Research/Control
  Netsolve MCell run Resource (w/UCSD)
Java/Jini-based Grid&Cluster computing
  Migratory Code
  Jini-based Cluster Grid Services
    Resource Publication and Resource Discovery
  JiPANG Jini-based Grid Portals Architecture (w/UTK)
Performance Portability
  High-Performance Portable Java DSM
  Open-ended, downloadable JIT Compiler
    The OpenJIT Proj (w/Fujitsu)
Bricks Grid Simulator (HPDC’99)
Consists of simulated Global Computing Environment and Scheduling Unit.
Allows simulation of various behaviors of
  resource scheduling algorithms
  programming modules for scheduling
  network topology of clients and servers
  processing schemes for networks and servers (various queuing schemes)
using the Bricks script.
Makes benchmarks of existing global scheduling components available
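To make the idea of simulating scheduling algorithms over queued servers concrete, here is a minimal sketch in Python. It is not Bricks and does not use the Bricks script; the function names (`simulate`, `random_choice`, `earliest_finish`), the FIFO queue model, and the workload parameters are all assumptions chosen for illustration. It compares a random server choice with a scheduler that predicts the earliest finish time.

```python
import random


def simulate(jobs, server_speeds, choose):
    """Run (arrival_time, work) jobs through FIFO servers; return mean response time."""
    free_at = [0.0] * len(server_speeds)  # time at which each server's queue drains
    total_response = 0.0
    for arrival, work in jobs:
        s = choose(arrival, work, free_at, server_speeds)
        start = max(arrival, free_at[s])          # wait if the server is busy
        finish = start + work / server_speeds[s]  # service time scales with speed
        free_at[s] = finish
        total_response += finish - arrival
    return total_response / len(jobs)


def random_choice(arrival, work, free_at, speeds):
    """Baseline scheduler: pick any server uniformly at random."""
    return random.randrange(len(speeds))


def earliest_finish(arrival, work, free_at, speeds):
    """Predictive scheduler: pick the server with the earliest predicted finish."""
    return min(range(len(speeds)),
               key=lambda s: max(arrival, free_at[s]) + work / speeds[s])


# Hypothetical workload: 200 jobs with exponential inter-arrival times.
random.seed(0)
jobs = []
t = 0.0
for _ in range(200):
    t += random.expovariate(1.0)
    jobs.append((t, random.uniform(0.5, 2.0)))

speeds = [1.0, 2.0, 4.0]  # assumed relative server speeds
mean_random = simulate(jobs, speeds, random_choice)
mean_greedy = simulate(jobs, speeds, earliest_finish)
print(f"random: {mean_random:.3f}  earliest-finish: {mean_greedy:.3f}")
```

Swapping in another `choose` function is the analogue of plugging a different scheduling module into the simulated environment.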
The Bricks Architecture
[Figure: architecture diagram. Scheduling Unit: Scheduler, NetworkMonitor, ServerMonitor, ResourceDB, Predictor, NetworkPredictor, ServerPredictor. Global Computing Environment: Client, Network, Server.]
Applications on PRESTO Clusters – Op. Research
SCRM (Generalized Quadratic Optimization Algorithm)
Iterative execution of multiple SDP solvers w/Ninf via Master-Worker
Some problems 100-fold speedup/128 procs (exec. time world record)
Other difficult OR problems also very positive -> Larger execution on Cluster Federation resources
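The Master-Worker pattern mentioned above can be sketched as follows. This is only an illustration of the control flow: `solve_subproblem` is a hypothetical stand-in that does arithmetic instead of an SDP solve, local threads stand in for remote Ninf calls, and none of the names come from SCRM or Ninf. The helper `parallel_efficiency` simply divides the quoted speedup by the processor count.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_subproblem(current_bound, k):
    """Hypothetical stand-in for one SDP solve; returns a candidate bound."""
    return current_bound * (0.5 + 0.1 * k)


def master(initial_bound, n_workers, n_iterations):
    """Each iteration, farm out n_workers subproblems and keep the best result."""
    bound = initial_bound
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_iterations):
            # Workers run independently; the master waits for all of them.
            results = list(pool.map(solve_subproblem,
                                    [bound] * n_workers,
                                    range(n_workers)))
            bound = min(results)  # master combines results before the next round
    return bound


def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


print(master(8.0, 4, 3))
print(f"{parallel_efficiency(100, 128):.1%}")
```

With the figures on the slide, a 100-fold speedup on 128 processors corresponds to roughly 78% parallel efficiency.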
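[Figure: Parallel execution of nonconvex quadratic programming problems with the SCRM method on the PRESTO cluster. x-axis: #Processors (1, 2, 4, 8, 16, 32, 64); y-axis: execution time in seconds (0 to 8000); series: NQP15_1.dat, NQP12_1.dat.]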
Titanium Terascale Grid Cluster
Proposal for 10TF-scale “commodity” cluster at the TITECH computing center
  2 x 500 Itanium-class “commodity” cluster on two TITECH campuses
  Interconnect via 2.4 Gigabit WAN
Campus-wide usage with Grid software
  Centerpiece of Grid infrastructure within TITECH campus
  ApGrid and Global Grid collaboration
2002-3? W/restructuring of computing center
Titanium Cluster Overview
Goal: Construct as “cheap” as possible
Semi-reliable service
Use Grid technology to federate and manage the clusters
[Figure: campus Grid diagram. Labels: Ookayama-Nagatsuta link, 2.4G-10Gbps; to Grid infrastructure at home and abroad (NPACI/Alliance/IPG, J-Grid, E-Grid, etc.); distributed ImmersaDesk; Titanium cluster unit 1, 1024 processors, 100TB storage; cluster OS/Grid middleware; campus Grid users; high-speed wireless LAN APs (classrooms, labs, etc.); free access to Grid resources for campus users; Ookayama area; Gigabit campus LAN.]
ApGrid: Services (1)
Grid computing service
  Deploy major grid software packages ready to use
    Ninf v.2.0 (Another talk)
    Globus, Netsolve, NWS, Nimrod, Condor, Legion, etc.
    MPICH/G(2), PACX-MPI, Harness, etc.
System resources
  US220R x 2CPU x 4 from ETL
  ORIGIN 2000/16CPU, J90/16CPU, CS6400/64
  SR8000/8node, WH-II 8node
  Clusters (Pentium, Alpha), etc. in many places
[Figure: ASP-like ApGrid Ninf Service diagram. Labels: request (“lapack”, ”dgesv”, .., ..); lapack.ApGrid.org, murata.ApGrid.org, lapack.eGrid.org; 3-DNS; hpcc.gr.jp 192.50.75.0/24; ninf.org 150.29.218.0/23; 150.29.219.128 (VIP); BIG/IP Selector/scheduler (two instances); Res DB with package and routine entries.]
ASP-Like ApGrid Ninf Service
Simpler architecture than the Ninf Metaserver
  Limit the # of known Servers
  Load balancing with L4 switch technology
  Central administration of servers and DB
  Transaction support
Resource access and Load balancing w/VIP
  Different VIP per package
    e.g. linpack.apgrid.org
  Grouping of libraries via VIP
  VIP expands the URL to address of appropriate server
  Ninf 2.0/Netsolve etc.
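The "one VIP per package" idea can be illustrated with a small sketch: a package name maps to a virtual host name, and a selector behind that name hands each request to one of several real servers. Everything here is assumed for illustration: the class `PackageSelector`, the round-robin policy (an L4 switch may balance differently), and the backend addresses, which are taken from the reserved documentation range 192.0.2.0/24. Only the host name pattern follows the slide's example, linpack.apgrid.org.

```python
from itertools import cycle


class PackageSelector:
    """Toy model of a per-package virtual address with round-robin backends."""

    def __init__(self, domain="apgrid.org"):
        self.domain = domain
        self._pools = {}  # package name -> endless iterator over backend addresses

    def register(self, package, servers):
        """Group the servers offering one library package behind a single VIP."""
        self._pools[package] = cycle(list(servers))

    def vip_name(self, package):
        """Virtual host name clients use for this package."""
        return f"{package}.{self.domain}"

    def resolve(self, package, routine):
        """Pick the backend that should serve this (package, routine) call."""
        if package not in self._pools:
            raise KeyError(f"no servers registered for package {package!r}")
        # The routine is not used for selection in this sketch.
        return next(self._pools[package])


selector = PackageSelector()
selector.register("linpack", ["192.0.2.10", "192.0.2.11"])  # hypothetical backends
print(selector.vip_name("linpack"))
```

Because clients only ever see the virtual name, servers can be added or removed centrally without changing client code.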
ApGrid: Services (2)
Grid information service
  Maintain name servers and databases
  ASP-like portal service
    Handling users, micro economics
Grid security support service (Plan)
  PKI: Public Key Infrastructure
  Certificate Authority
ApGrid Information Services
Resource Info
Performance Monitoring and Archive
Would like to collaborate w/other Grid partners
[Figure: ApGrid Testbed. Labels: ApGrid nodes in Japan (APAN Tokyo, RWCP, TITECH, ETL/TACC, Osaka-U); TransPAC 100Mbps; STAR TAP Chicago; ApGrid - Korea, Singapore, Australia, etc.; US and European Partners; NWS Sensors; Virtual/Real Client.]
ApGrid: Current Status
Just kicked off, and some of the resources are ready, but still we need:
  Hiring people to maintain and to install the regular services initially
  Enrolling more partners
    Reserved: apgrid.org, Web site will be open shortly
    Find international partners
  Creating much stronger relation with APAN activities
Summary
Some success stories
  Collaboration with Application Scientists
  International Collaborations
    Osaka-U/UCSD (Globus)
    NetSolve/Ninf Collaboration
  WGCC2000, Grid Forum, metacomputing WS
  Government funded several small projects
the Asia-Pacific Grid (ApGrid)
  TACC is ready for providing computing resources
  National, Regional testbed
  International Collaborations Efforts a MUST!
TACC Overview
Missions
  Providing world leadership in advanced computing science and technology through the development and application of computing science and engineering
Organization
  MITI/AIST operates directly since 1981
  2 executive, 7 technical, 2 admin + SEs
  Annual budget 2,400M JPY (=20M USD)
    Incl. Supercomputer rental, SE, network maintenance, electricity, etc.
Collaborative activities with partners
  RWCP, Tsukuba Univ., NAL, Jaeri, KEK,
  HRLS, CSAR, SDSC, UTK, LANL, NIST, ETHZ, ANU...
ITBL is NOT
  ApGrid nor Japan Grid nor Tokyo Grid nor Tsukuba Grid nor…
  An Infrastructure-oriented project
  An Application-oriented project
  An Earth Simulator-related project
  A successor to RWCP
  A Grid project
  An internationally collaborative project
  A domestically collaborative project
  A huge project
  A Good project (at least in our opinion)
Then, what is IT?
  Nobody really knows (or cares)
  And thus its objective must be top secret (even to us)
  Probably upgrades several supercomputer boxes (“hakomono”, public-works facilities) and network links (a measure for the general contractors)