November 2, 2000 HEPiX/HEPNT FERMI SAN Effort Lisa Giacchetti Ray Pasetes GFS information contributed by Jim Annis


November 2, 2000 HEPiX/HEPNT

FERMI SAN Effort

Lisa Giacchetti

Ray Pasetes

GFS information contributed by Jim Annis


Overview
  Motivation
    Current Problems
    Future goals
  Evaluation
    CXFS
    SANergy
    GFS
  Current Status


Motivation
  Current Problems
    Unbalanced use of central UNIX cluster
    Large dataset(s) need to be shared in a large distributed compute environment
    Current solutions lack performance throughput
  Future goals
    Linux analysis cluster with SMP feel


Evaluation: CXFS
  Currently SGI-only
  Currently requires RAID
  True(er) SAN FS
  Commitment to Linux port
  Equipment
    1 Origin 2200
    2 Origin 200s
    1 Brocade F-C switch
    1 SGI (Clarion) RAID ~1TB raw


Evaluation: SANergy
  Heterogeneous solution: Solaris, WinNT, Win2K, IRIX, Tru64, MacOS, AIX
  Works with RAID or JBOD
  Pseudo SAN FS with NFS look
  Linux port in future (11/00, both MDC and Client)
  Equipment
    1 Sun Sparc20: RAID management box
    1 Ultra 60, 3 Linux, 1 NT4: MDC and client
    1 O2200 (client only)
    1 16 port Brocade switch
    1 Metastor E4400 RAID ~720GB raw


Evaluation: GFS
  Open source (GPL'd)
  Sistina Software (ex-University of Minnesota)
  High performance
  64-bit files and file system
  Distributed, server-less metadata
  Data synchronization via global, disk based locks
  Journaling and node cast-out
  Three major pieces:
    The network storage pool driver
    The file system
    The locking modules


Evaluation: GFS (equipment)
  System integrator
    Linux NetworX
    Cluster control box
  Compute Nodes
    Linux NetworX
    Dual 600 MHz Pentium III
    ASUS motherboard
    1 Gig RAM
    2x36 Gig EIDE disks
    Qlogic 2100 HBA
  Ethernet
    Cisco Catalyst 2948G
  Fibre Channel
    Gadzoox Capellix 3000
  Global Disk
    DotHill SanNet 4200
    Dual Fibre Channel controllers
    10x73 Gig Seagate Cheetah SCSI disk
  Software
    Linux 2.2.16
    Qlogic drivers
    GFS V3.0
    Condor


Current Status: CXFS
  Config: 1 file system, 2-9 36GB disk RAID 5 LUNs striped together; each system w/ 1 HBA
  3 writes and 3 reads simultaneous of 1GB files at 64K blocks (6 different files)
    READ 11.5/11.6/11.9 MB/s
    WRITE 36.5/28.4/28.4 MB/s
  SGI Clarion RAID biases towards writes
  Aggregate: 128 MB/s / 200 MB/s, 64% utilization
  Peak single write for 2GB file, 64K blk = 45 MB/s
  Peak single read for 2GB file, 64K blk = 28 MB/s
  Simultaneous writes to same file = 0.165455 MB/s
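The aggregate and utilization figures on this slide follow from summing the six per-stream rates and dividing by the 200 MB/s figure quoted alongside them. A minimal sketch of that arithmetic is below; the function and variable names are illustrative, and the slide does not say how the 200 MB/s ceiling was derived, so it is treated here simply as a given constant.

```python
# Per-stream rates (MB/s) as reported on the CXFS status slide.
cxfs_reads = [11.5, 11.6, 11.9]
cxfs_writes = [36.5, 28.4, 28.4]

# Denominator quoted on the slide; its derivation is not stated there.
CXFS_CEILING_MB_S = 200.0


def aggregate_utilization(rates, ceiling):
    """Return (sum of stream rates, fraction of the ceiling used)."""
    total = sum(rates)
    return total, total / ceiling


total, fraction = aggregate_utilization(cxfs_reads + cxfs_writes, CXFS_CEILING_MB_S)
print(f"aggregate {total:.1f} MB/s, {fraction:.0%} of {CXFS_CEILING_MB_S:.0f} MB/s")
```

The three reads contribute 35.0 MB/s and the three writes 93.3 MB/s, which is consistent with the slide's remark that the RAID is biased towards writes.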


Current Status: CXFS
  Stability Issues
    Cluster can hang when unmounting file systems
    Problem on one machine can affect all nodes, resulting in need to reboot entire cluster
    Simple reboot often does not work; a hard reset is then needed
  Java GUI
    Occasionally hangs
    Occasionally reports erroneous cluster status


Current Status: SANergy
  Equipment almost in place
  MetaStor hardware RAID tested w/out SANergy
    Pleased with performance
    Worked as AFS file server central disk store
    Used this hardware with CXFS test
      Config: 2 - 9 disk RAID 5
      Results: 95+ MB/s read; 90+ MB/s write
      Limited by HBA
  Software yet to be received


Current Status: GFS
  Config: 5 machines, 1 5-disk RAID-5
  2 reads and 1 write, simultaneous of 1 GB files at 64k blocks
    Write: 5.1 MB/s
    Read: 30.0, 30.0 MB/s
  Aggregate 65 MB/s / 90 MB/s, 72% utilization
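The slides do not say which tool produced these numbers. As a rough illustration of the kind of test described (concurrent sequential reads and writes of separate files in 64K blocks, with a rate reported per stream), here is a minimal sketch. Everything in it is an assumption made for illustration: the function names, the use of threads on one host instead of separate cluster nodes, and the small default file size. It also does nothing to bypass the operating system's page cache, so read rates from it will be far higher than real disk rates.

```python
import os
import tempfile
import threading
import time

BLOCK = 64 * 1024            # 64K blocks, as on the slide
FILE_SIZE = 4 * 1024 * 1024  # 4 MB for a quick demo; the slide used 1 GB files


def timed_write(path, size):
    """Write `size` bytes sequentially in BLOCK-sized chunks; return MB/s."""
    buf = b"\0" * BLOCK
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size // BLOCK):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # include the time to push data to the device
    return size / 2**20 / (time.perf_counter() - start)


def timed_read(path):
    """Read a file sequentially in BLOCK-sized chunks; return MB/s."""
    start = time.perf_counter()
    nbytes = 0
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            nbytes += len(chunk)
    return nbytes / 2**20 / (time.perf_counter() - start)


def run_mixed(directory, readers=2, writers=1, size=FILE_SIZE):
    """Run readers and writers at the same time, each on its own file."""
    # Create the files the readers will use before timing starts.
    read_paths = [os.path.join(directory, f"read{i}.dat") for i in range(readers)]
    for path in read_paths:
        timed_write(path, size)

    results = {}

    def record(name, fn, *args):
        results[name] = fn(*args)

    threads = [
        threading.Thread(target=record, args=(f"read{i}", timed_read, p))
        for i, p in enumerate(read_paths)
    ]
    threads += [
        threading.Thread(
            target=record,
            args=(f"write{i}", timed_write,
                  os.path.join(directory, f"write{i}.dat"), size),
        )
        for i in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        rates = run_mixed(workdir)
        for name in sorted(rates):
            print(f"{name}: {rates[name]:.1f} MB/s")
        print(f"aggregate: {sum(rates.values()):.1f} MB/s")
```

With the slide's own figures, the aggregate is 5.1 + 30.0 + 30.0 = 65.1 MB/s, and 65 / 90 is about 72%, matching the utilization quoted above.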