November 2, 2000 HEPiX/HEPNT
FERMI SAN Effort
Lisa Giacchetti
Ray Pasetes
GFS information contributed by Jim Annis
Overview
- Motivation
  - Current problems
  - Future goals
- Evaluation
  - CXFS
  - SANergy
  - GFS
- Current status
Motivation
Current problems:
- Unbalanced use of the central UNIX cluster
- Large datasets need to be shared across a large distributed compute environment
- Current solutions lack throughput
Future goals:
- A Linux analysis cluster with an SMP feel
Evaluation: CXFS
- Currently SGI-only
- Currently requires RAID
- Closer to a true SAN file system
- Commitment to a Linux port
Equipment:
- 1 Origin 2200
- 2 Origin 200s
- 1 Brocade Fibre Channel switch
- 1 SGI (CLARiiON) RAID, ~1 TB raw
Evaluation: SANergy
- Heterogeneous solution: Solaris, Windows NT, Windows 2000, IRIX, Tru64, Mac OS, AIX
- Works with RAID or JBOD
- Pseudo SAN file system with an NFS look
- Linux port expected 11/00, for both the MDC (metadata controller) and the client
Equipment:
- 1 Sun SPARCstation 20: RAID management box
- 1 Ultra 60, 3 Linux, 1 NT4: MDC and client
- 1 Origin 2200 (client only)
- 1 16-port Brocade switch
- 1 MetaStor E4400 RAID, ~720 GB raw
Evaluation: GFS
- Open source (GPL’d)
- Sistina Software (GFS originated at the University of Minnesota)
- High performance
- 64-bit files and file system
- Distributed, serverless metadata
- Data synchronization via global, disk-based locks
- Journaling and node cast-out
- Three major pieces:
  - The network storage pool driver
  - The file system
  - The locking modules
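GFS has no lock server: nodes coordinate through locks held on the shared storage itself. The toy model below is only meant to illustrate that idea. It assumes each lock is a slot supporting an atomic test-and-set, plus a lease expiry that lets other nodes reclaim a lock from a failed ("cast-out") holder. The class and method names are hypothetical. This is not the device-level locking protocol GFS actually uses, and a `threading.Lock` stands in for the atomicity that the storage device would provide.

```python
import threading
import time


class DiskLockTable:
    """Toy model of a lock table living on shared storage (hypothetical).

    Each slot records which node holds the lock and when its lease expires.
    A real device would perform test-and-set atomically in its controller;
    here a threading.Lock stands in for that atomicity (assumption).
    """

    def __init__(self, n_locks, lease_seconds=30.0):
        self._atomic = threading.Lock()
        self._holder = [None] * n_locks
        self._expires = [0.0] * n_locks
        self.lease_seconds = lease_seconds

    def test_and_set(self, lock_id, node, now=None):
        """Try to acquire lock_id for node; return True on success."""
        now = time.monotonic() if now is None else now
        with self._atomic:
            holder = self._holder[lock_id]
            # Grant if free, already ours, or the holder's lease has expired
            # (the expired holder is treated as cast out; an assumption here).
            if holder is None or holder == node or now >= self._expires[lock_id]:
                self._holder[lock_id] = node
                self._expires[lock_id] = now + self.lease_seconds
                return True
            return False

    def release(self, lock_id, node):
        """Release lock_id if node holds it; return True if released."""
        with self._atomic:
            if self._holder[lock_id] == node:
                self._holder[lock_id] = None
                self._expires[lock_id] = 0.0
                return True
            return False
```

For example, with `lease_seconds=10.0`, node "a" acquires lock 0 at time 0. Node "b" is refused at time 5 and succeeds at time 11, after a's lease has lapsed. A real system would also have to replay the failed node's journal before reusing its locks, and this sketch does not model that step.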
Evaluation: GFS (equipment)
- System integrator: Linux NetworX
- Cluster control box
- Compute nodes: Linux NetworX, dual 600 MHz Pentium III, ASUS motherboard, 1 GB RAM, 2 x 36 GB EIDE disks, QLogic 2100 HBA
- Ethernet: Cisco Catalyst 2948G
- Fibre Channel: Gadzoox Capellix 3000
- Global disk: Dot Hill SANnet 4200, dual Fibre Channel controllers, 10 x 73 GB Seagate Cheetah SCSI disks
- Software: Linux 2.2.16, QLogic drivers, GFS v3.0, Condor
Current Status: CXFS
Config: 1 file system on two 9-disk RAID 5 LUNs (36 GB disks) striped together; each system with 1 HBA
Test: 3 writes and 3 reads simultaneously, 1 GB files, 64 KB blocks (6 different files)
- Read: 11.5 / 11.6 / 11.9 MB/s
- Write: 36.5 / 28.4 / 28.4 MB/s
- The SGI CLARiiON RAID biases towards writes
- Aggregate: 128 MB/s of 200 MB/s (64% utilization)
Peak single write, 2 GB file, 64 KB blocks: 45 MB/s
Peak single read, 2 GB file, 64 KB blocks: 28 MB/s
Simultaneous writes to the same file: 0.165455 MB/s
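The aggregate figure is the sum of the six concurrent streams, and utilization divides that sum by the 200 MB/s ceiling quoted on the slide. The slide does not say what sets that ceiling. Below is a small Python check of the arithmetic. The `aggregate` helper is illustrative only and is not part of any tool used in these tests.

```python
def aggregate(read_mb_s, write_mb_s, capacity_mb_s):
    """Return (total MB/s across all streams, utilization as a fraction of capacity)."""
    total = sum(read_mb_s) + sum(write_mb_s)
    return total, total / capacity_mb_s


# Per-stream CXFS results from the slide (MB/s).
reads = [11.5, 11.6, 11.9]
writes = [36.5, 28.4, 28.4]
total, util = aggregate(reads, writes, capacity_mb_s=200.0)
print(f"aggregate {total:.1f} MB/s, utilization {util:.0%}")
```

The same helper applied to the GFS results (two 30.0 MB/s reads plus a 5.1 MB/s write, against 90 MB/s) gives 65.1 MB/s and about 72%.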
Current Status: CXFS Stability Issues
- The cluster can hang when unmounting file systems
- A problem on one machine can affect all nodes, forcing a reboot of the entire cluster
- A simple reboot often does not work, and a hard reset is needed
- Java GUI:
  - Occasionally hangs
  - Occasionally reports erroneous cluster status
Current Status: SANergy
- Equipment almost in place
- MetaStor hardware RAID tested without SANergy:
  - Pleased with performance
  - Worked as the central disk store for an AFS file server
  - This hardware was also used in the CXFS test
  - Config: two 9-disk RAID 5 arrays
  - Results: 95+ MB/s read; 90+ MB/s write, limited by the HBA
- Software yet to be received
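The results are described as limited by the HBA. The slide does not give the link speed. Assuming 1 Gb/s Fibre Channel HBAs, a rough ceiling follows from the 1.0625 Gbaud line rate and 8b/10b encoding, which carries 8 data bits in every 10 transmitted bits. Frame headers and other protocol overhead lower the usable rate further, which is why about 100 MB/s is the usual working figure. The names below are illustrative.

```python
# Assumption: 1 Gb/s Fibre Channel HBAs (link speed not stated on the slide).
LINE_RATE_BAUD = 1.0625e9      # 1 Gb/s Fibre Channel signalling rate
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 data bits per 10 line bits


def payload_mb_per_s(line_rate_baud, efficiency):
    """Upper bound on data throughput in MB/s (10**6 bytes/s), before framing overhead."""
    return line_rate_baud * efficiency / 8 / 1e6


ceiling = payload_mb_per_s(LINE_RATE_BAUD, ENCODING_EFFICIENCY)
for label, measured in (("read", 95.0), ("write", 90.0)):
    print(f"{label}: {measured} MB/s is {measured / ceiling:.0%} of a {ceiling:.2f} MB/s link")
```

Under this assumption, the measured 95+ MB/s read and 90+ MB/s write come to roughly 89% and 85% of the raw payload rate. That is consistent with a single HBA being the bottleneck.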
Current Status: GFS
Config: 5 machines, one 5-disk RAID 5
Test: 2 reads and 1 write simultaneously, 1 GB files, 64 KB blocks
- Write: 5.1 MB/s
- Read: 30.0 / 30.0 MB/s
- Aggregate: 65 MB/s of 90 MB/s (72% utilization)
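The CXFS and GFS tests run several sequential streams at once, each on its own file, with 64 KB blocks. The sketch below shows one way to build such a harness in Python. It is not the tool used for these measurements. It assumes one thread per stream and plain buffered file I/O, and the function and file names are hypothetical. Python threads release the GIL during file I/O, so the streams do overlap. Reads of files that were just created may be served from the page cache rather than the SAN, so a real run would need files larger than RAM or a cache flush between phases.

```python
import os
import tempfile
import threading
import time

BLOCK = 64 * 1024  # 64 KB blocks, as in the tests above


def _rate(nbytes, start):
    """MB/s (10**6 bytes per second) since `start`, guarding against a zero interval."""
    return nbytes / 1e6 / max(time.perf_counter() - start, 1e-9)


def write_stream(path, size, results, key):
    """Sequentially write `size` bytes to `path` in BLOCK-sized chunks."""
    buf = os.urandom(BLOCK)
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < size:
            n = min(BLOCK, size - written)
            f.write(buf[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # include the time to push data to the device
    results[key] = _rate(size, start)


def read_stream(path, results, key):
    """Sequentially read `path` in BLOCK-sized chunks."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            total += len(chunk)
    results[key] = _rate(total, start)


def run(directory, n_read=2, n_write=1, size=10**9):
    """Run n_read readers and n_write writers at once, each on its own file."""
    results = {}
    read_paths = [os.path.join(directory, f"read{i}.dat") for i in range(n_read)]
    for p in read_paths:  # create the files the readers will stream (untimed)
        write_stream(p, size, {}, None)
    threads = [threading.Thread(target=read_stream, args=(p, results, f"read{i}"))
               for i, p in enumerate(read_paths)]
    threads += [threading.Thread(target=write_stream,
                                 args=(os.path.join(directory, f"write{i}.dat"),
                                       size, results, f"write{i}"))
                for i in range(n_write)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # per-stream MB/s keyed "read0", "read1", ..., "write0", ...


if __name__ == "__main__":
    # Small files for a quick local try; point `directory` at the SAN mount for a real test.
    with tempfile.TemporaryDirectory() as d:
        for name, mb_s in sorted(run(d, size=8 * 2**20).items()):
            print(f"{name}: {mb_s:.1f} MB/s")
```

The defaults mirror the GFS test (2 readers, 1 writer, 1 GB files). To follow the CXFS pattern instead, call `run(directory, n_read=3, n_write=3)`.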