TRIUMF SITE REPORT – Corrie Kost, April 15-19, Catania (Italy)
Update since last HEPiX/HEPNT meeting
NETWORK
DNS / BOOTP / DHCP
PRINTERS / PLOTTERS
LEGACY VMS ALPHA/TRU64 SERVERS
PC on X-WINDOWS
PC PRINT/DOMAIN SERVER
UNIX PRINT SERVER
MAIL SERVER
WEB SERVER
LINUX/TRU64 APPLICATION SERVER
LINUX COMPUTE SERVER
UNIX BACKUP SERVER
PC BACKUP SERVER
Extruded Offsite Subnet
FUTURE DEVELOPMENTS – WestGrid
TRIUMF NETWORK
Components:
- 100 Mbit Fore Powerhub to UBC (route01)
- Passport 8003 Routing Switch – 8*1Gb ports (2Q02)
- 32 BayStack 450-24T Gigabit Switches
- FDDI Ring (phased out Oct 1/01)
Function:
- TRIUMF's Internet connection
- Site-wide network connections
- Class B with 26 active (class C) subnets, ~800 active connections
Module Name: REG/ERICH
Components: Dual DSSI / Dual VAX 4000, VMS 6.2, 128 Mb memory each, 172 Gbytes disk
Function: Legacy VMS server; VMS Mail terminated Dec 31/2000; logins disabled Sep 1/2002; power-off Jan 1/2003
Module Name: TNTSRV00
Components: Dual 1 GHz P3, Windows 2000, 1 GB memory, 3*62 GB SCSI disks; old TNTSRV01 acts as backup
Function: Windows Primary Domain Controller (Active Directory); file server for Windows; print server for PCs/Macs; application server for Macs
Module Name: TRBACKUP
Components: 667 MHz Pentium III (VA Linux, RH 6.2), 128 Mb memory, 10 Gbytes disk; DLT-8000 (40/80 GB tapes); ATL PowerStor L200 SDLT220 (8 slots, 110/220 GB); SDLT native-capacity goals: 160 GB (2002Q1), 320 GB (2003Q4), 640 GB (2005), 1280 GB (2006Q4)
Function: Central Backup/Restore Utility (BRU) server
The TRIUMF – BC Research Extruded Subnet (diagram)
Secure tunnel between VPN1 and VPN2: FreeS/WAN IPsec gateways on Linux RH 7.2, VPN1 on the TRIUMF side (142.90.100.18) and VPN2 at BC Research (207.23.243.188, behind router 207.23.243.254), linked over the Internet. The extruded subnet 142.90.126.0/24 (GATEWAY 142.90.126.254, NETMASK 255.255.255.0, BROADCAST 142.90.126.255) remains part of the TRIUMF network.
Recipe: Steve McDonald ([email protected])
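The tunnel between VPN1 and VPN2 could be described to FreeS/WAN roughly as below; a minimal sketch only, assuming VPN1's public address is 142.90.100.18 and VPN2's is 207.23.243.188 as shown in the diagram, with keying details (authby, RSA keys) left as placeholders rather than the actual configuration.

```
# Hypothetical FreeS/WAN ipsec.conf fragment for the VPN1-VPN2 tunnel.
# Addresses follow the diagram; authentication settings are placeholders.
conn triumf-bcresearch
        left=142.90.100.18              # VPN1 (TRIUMF side)
        leftsubnet=142.90.0.0/16        # TRIUMF Class B network
        right=207.23.243.188            # VPN2 (BC Research side)
        rightsubnet=142.90.126.0/24     # extruded subnet behind VPN2
        authby=rsasig                   # placeholder; real keys not shown
        auto=start
```

With `rightsubnet` set to the extruded 142.90.126.0/24 block, hosts at BC Research keep TRIUMF addresses while all their site traffic rides the encrypted tunnel.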
SUPPORT SUMMARY
FAST – FLEXIBLE – MANAGEABLE NETWORK
- All servers / network components on UPS/diesel
- RedHat Linux V6.2 / V7.1 (CERN library)
- RedHat Linux V7.2 EXT3 (non-CERN)
- Migrate X-window terminals to RedHat Linux boxes
- Centralized mail/print/backup services
- Centralized Linux updates/security patches
- More wireless (Lucent Orinoco AP-1000)
- Extruded subnet support
- RedHat Linux cluster/Grid research
Works in Progress
Problems:
- Need for small local cluster
- Need to consolidate disk storage
  - More efficient use of disk space
  - Move to rack-mountable
  - Add more with little impact
  - Improved reliability (RAID, hot-swap)
TRIUMF Proposed Compute/Fileserver Cluster (diagram)
- Nine ProLiant DL360 servers, each with two 9.1 GB 10k Ultra2 SCSI disks, serving Windows NT/2000/9x (SMB), Linux (NFS), and Mac (atalk) clients
- BayStack 450-24T switch uplinked to the TRIUMF network (142.90.0.0)
- Mosix master, secondary, and client nodes at 142.90.100.210-212, using IP aliasing
- Heartbeat(s) over serial lines ttyS0/ttyS1 and private links 192.168.0.1/192.168.0.2, plus private network 192.168.1.0 (192.168.1.1/192.168.1.2)
- ARENA INDY-2400 IDE disk array: ~1.4 TB, hot-swap ~120 GB IDE drives, SCSI interconnects (SCSI IDs 0, 6, 7)
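The IP aliasing mentioned for the Mosix nodes can be done on Red Hat with a virtual interface file; a sketch only, assuming the alias eth0:0 carries the secondary address 142.90.100.211 and that the site netmask is the Class B default (neither mapping is stated on the slide).

```
# /etc/sysconfig/network-scripts/ifcfg-eth0:0
# Hypothetical alias interface: lets one Mosix node answer on a
# second TRIUMF address without a second physical NIC.
DEVICE=eth0:0
IPADDR=142.90.100.211
NETMASK=255.255.0.0
ONBOOT=yes
```

Bringing the alias up with `ifup eth0:0` makes the node reachable on both its primary address and 142.90.100.211.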
IDE Box Details
- 512 Mb SDRAM
- Dual-channel Ultra 160
- Hot-swappable IDE drives
- RAID 0, 1, 0+1, 3, or 5
- Dual power supplies
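The RAID level chosen determines how much of the array's raw space is usable; a rough sketch, assuming twelve ~120 GB drives (a drive count inferred from the ~1.4 TB total quoted earlier, not stated on the slide):

```python
# Usable capacity (GB) for n identical drives of size c under common RAID levels.
def usable_gb(n, c, level):
    if level == "0":           # striping: full capacity, no redundancy
        return n * c
    if level in ("1", "0+1"):  # mirroring: half the capacity
        return n * c // 2
    if level in ("3", "5"):    # one drive's worth of parity
        return (n - 1) * c
    raise ValueError(level)

n, c = 12, 120  # hypothetical: twelve hot-swap ~120 GB IDE drives -> ~1.4 TB raw
print(usable_gb(n, c, "0"))   # 1440
print(usable_gb(n, c, "5"))   # 1320
```

So moving from RAID 0 to RAID 5 trades one drive's capacity for the ability to survive a single drive failure, which matches the reliability goal (RAID, hot-swap) listed under Works in Progress.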
Status of WestGrid
- Federal funding of $12M approved
- Waiting for BC/Alberta matching funds
- Request for Information is in draft mode
- Request for Proposals upon matching funding
(Map of WestGrid participants; legend: Grid Storage, Scientific Visualization, Advanced Collaboration, Computational Resources)
Computational Sites
Alberta: University of Alberta, University of Calgary, University of Lethbridge, The Banff Centre
British Columbia: University of British Columbia, Simon Fraser University, New Media Innovation Centre (NewMIC), TRIUMF
WestGrid Site Facilities
- Univ. of Alberta: a 128-CPU / 64 GB memory SMMP machine + 5 TB disk, 25 TB tape (fine-grained concurrency)
- Univ. of Calgary: a 256-node 64-bit / 256 GB Cluster of Multi-Processors (CluMP) + 4 TB disk (medium-grained concurrency)
- UBC/TRIUMF: 1000/1500 CPU "naturally" parallel commodity cluster, 512 Mb/CPU, with 10 TB of (SAN) disk, 70-100 TB of variable tape storage (coarse-grained concurrency)
- SFU (Harbour Centre): network storage facility, 25 TB of disk, 200 TB of tape (storage and database serving for the above sites)
TRIUMF Computing Needs
- TWIST: 72 TB/yr of data; 20 TB/yr of Monte Carlo; ~125 CPUs (1 GHz)
- E949: 100 TB/yr of data (50% at TRIUMF/UBC); ~100 CPUs (part time)
- ISAC: data set not huge, but analysis requires large-scale parallel computing
- PET: 3-D image reconstruction, parallel computing
- Large-scale Monte Carlo: BaBar (230 CPUs), ATLAS (200 CPUs), HERMES (100 CPUs)

Summary:
- 250-300 CPUs for data analysis
- ~500 CPUs for Monte Carlo
- 150-200 TB/yr of storage
- Access to parallel computing
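The summary totals above can be cross-checked against the per-experiment figures; a quick tally (the ISAC and PET CPU needs are not quantified on the slide, which is presumably where the analysis headroom comes from):

```python
# Tally of the per-experiment CPU counts quoted on this slide.
mc_cpus = {"BaBar": 230, "ATLAS": 200, "HERMES": 100}
analysis_cpus = {"TWIST": 125, "E949": 100}

total_mc = sum(mc_cpus.values())              # 530, near the ~500 quoted
total_analysis = sum(analysis_cpus.values())  # 225; the 250-300 summary leaves room for ISAC/PET
print(total_mc, total_analysis)
```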
Management Issues
- Limit of 3-4 people/site (hardware/system, not software development)
- SAN management
- Tape management
- Security
- Single logins across WestGrid
624 * 10/100 ports: 8 x 48-port + 5 x 48-port switches
144 * 10/100/1000 ports: 3 * 48 Gigabit ports
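The port totals follow directly from the switch counts; a one-line sanity check:

```python
# Sanity check of the switch-port totals on this slide.
fast_ethernet = (8 + 5) * 48   # thirteen 48-port 10/100 switches
gigabit = 3 * 48               # three 48-port gigabit switches
print(fast_ethernet, gigabit)  # 624 144
```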
Sample Rack Configuration
• 3U blades
• 20 blades / 3U
• 280 servers in a 42U rack
• 512 Mbytes memory/server
• 9 GB disk/server
• Two 10/100 Ethernets/server
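The 280-server figure is just the blade arithmetic spelled out: fourteen 3U chassis fit in 42U, at 20 blades each.

```python
# Blade arithmetic behind "280 servers in a 42U rack".
rack_u, chassis_u, blades_per_chassis = 42, 3, 20
chassis = rack_u // chassis_u           # 14 chassis of 3U each
servers = chassis * blades_per_chassis
print(servers)  # 280
```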