LLNL-PRES-724397
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
ZFS Monitoring and Management at LLNL
Tony Hutter
ZFS User Conference 2017
March 16, 2017
Meet “Zinc”, our new 18PB filesystem

Contract awarded to RAID Inc
Lustre 2.8 on top of ZFS
Wanted a vendor-agnostic software stack
16 MDS nodes, 36 OSS nodes
2880 8TB HDDs, 96 800GB SSDs
8 24-bay SSD JBODs, 36 84-bay HDD JBODs
Smaller configuration for “Brass” and “Jet” systems, but same RAID Inc. hardware.
Glamour shot
Node configuration

[Rack diagram: the metadata rack holds MDS nodes 0–3 sharing MDT enclosures U and L; the object store rack holds OSS nodes 0–1 sharing OST enclosures U and L.]
Multipath drives

Multipath = each disk has two SAS connections
Increases bandwidth and provides link failover
Each disk shows up twice (e.g. /dev/sda and /dev/sdb), plus as a multipath device /dev/dm-N
We use the ZFS 'vdev_id' script to make friendly aliases for the drives in /dev/disk/by-vdev/

/dev/disk/by-vdev/L0  -> ../../dm-50
/dev/disk/by-vdev/L1  -> ../../dm-81
/dev/disk/by-vdev/L2  -> ../../dm-99
...
/dev/disk/by-vdev/L83 -> ../../dm-157
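The aliases come from /etc/zfs/vdev_id.conf. A minimal sketch of one common form of that file is below; the alias names match this deck, but the WWNs are made up, and real multipath setups often use the topology-based `channel` syntax instead:

```
# /etc/zfs/vdev_id.conf -- illustrative sketch only; WWNs are invented.
# vdev_id reads this file to create the /dev/disk/by-vdev/ symlinks.
multipath  yes
alias L0   /dev/disk/by-id/wwn-0x5000cca012345678
alias L1   /dev/disk/by-id/wwn-0x5000cca012345679
```

After editing the file, re-running udev (e.g. `udevadm trigger`) regenerates the symlinks.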
Monitoring with Splunk

Splunk is a syslog processing engine with a web front-end.
It has a query language that lets you construct tables and graphs from syslog values.
You can group multiple graphs and tables into a “dashboard”, for a single-pane-of-glass view of your systems.
Just log key=value pairs to syslog and Splunk can graph them.
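A minimal sketch of the “just log key=value pairs” idea, using the stock `logger` utility. The tag and field names mirror the example in this deck; the pool/vdev values here are made up:

```shell
# Build a key=value line in the same shape this deck logs, then send it
# to syslog with logger.  Splunk can extract the fields automatically.
pool="mypool"; vdev="B7"; state="ONLINE"
line="pool=${pool}, vdev=${vdev}, state=${state}, read_errors=0, write_errors=0, chksum_errors=0"
# logger tags the syslog entry so Splunk queries can select on it:
command -v logger >/dev/null && logger -t zpool_status "$line"
echo "$line"
```

In cron-driven logging scripts, the `echo` would be dropped and only the `logger` call kept.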
Splunk example
syslog:
  Dec 6 14:00:41 jet21 zpool_status: pool=mypool, vdev=B7, state=FAULTED, read_errors=0, write_errors=3, chksum_errors=0, resilver=0

Splunk query:
  zpool_status host=* pool=* | where state!="ONLINE" OR read_errors!=0 OR write_errors!=0 OR chksum_errors!=0 ...

syslog + Splunk query = table/graph
zpool status across all filesystems
Graphing zpool status over time
SMART stats (smartctl -a)
We log SMART status, read and write uncorrectable errors, and the Grown Defect List (GLIST). All of our drives report SAS stats.
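One way to turn `smartctl -a` output into loggable key=value pairs is a small awk pass. The here-doc below stands in for real output from `smartctl -a /dev/sdX` on a SAS drive; the values are made up:

```shell
# Sketch: extract the GLIST size and drive temperature from smartctl -a
# output and emit them as key=value pairs (pipe to logger for syslog).
sample_smartctl() {
cat <<'EOF'
Current Drive Temperature:     38 C
Elements in grown defect list: 4
EOF
}
# GLIST count is everything after the colon, with spaces stripped:
glist=$(sample_smartctl | awk -F: '/grown defect list/ {gsub(/ /, "", $2); print $2}')
# Temperature is the next-to-last field on its line ("38" before "C"):
temp=$(sample_smartctl | awk '/Current Drive Temperature/ {print $(NF-1)}')
echo "glist=${glist}, temp_c=${temp}"
```

On a live system, `sample_smartctl` would be replaced by `smartctl -a` against each multipath device.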
SMART Grown Defect List (smartctl -a)
We don't yet have enough data to know whether GLIST is a predictor of pending drive failure.
SMART Drive Temperatures (smartctl -a)
We've noticed on occasion that a few of our disks run hotter than spec.
SMART Drive Temperatures (smartctl -a)
Drives at the back of the enclosure run hotter, so we adjusted our raidz2 configuration to include a mix of drives from the front and the back.
Enclosure sensor values (sg_ses)
We graph enclosure fan speed, temperature, voltage, and current.
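Enclosure sensor readings can be scraped from sg_ses output with a similar awk pass. Enclosure output formats vary by vendor, so the here-doc below is only an illustrative stand-in for something like `sg_ses --page=2 /dev/sg5`, not verbatim output:

```shell
# Sketch: pull a fan speed and a temperature reading out of SES status
# output and log them as key=value pairs.  Sample lines are invented.
sample_sg_ses() {
cat <<'EOF'
      Element 0 descriptor: Cooling
        Actual speed=4740 rpm
      Element 1 descriptor: Temperature sensor
        Temperature=38 C
EOF
}
# Split on '=' and runs of spaces so the numeric field is easy to grab:
fan=$(sample_sg_ses  | awk -F'[= ]+' '/Actual speed/ {print $4}')
temp=$(sample_sg_ses | awk -F'[= ]+' '/Temperature=/ {print $3}')
echo "fan_rpm=${fan}, temp_c=${temp}"
```

A real script would loop over every /dev/sg device that maps to an enclosure.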
Enclosure sensor values (sg_ses)
We graph the number of SES values reporting “Critical” to look for potential hardware problems.
Enclosure sensor values (sg_ses)
SAS PHY Errors (/sys/class/sas_phy/...)
Bad SAS PHYs can create ZFS read/write errors, and cause drives to disappear and re-appear.
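The kernel exposes per-PHY error counters as plain sysfs files, so logging them is a short loop. The four counter filenames below are standard sas_phy sysfs attributes; the output shape and the directory parameter are our own:

```shell
# Sketch: dump each SAS PHY's error counters as key=value pairs.
# Pipe the output to logger to get it into syslog for Splunk.
log_sas_phy_errors() {
    base="${1:-/sys/class/sas_phy}"          # parameterized for testing
    for phy in "$base"/*/; do
        [ -d "$phy" ] || continue            # no SAS PHYs on this host
        name=$(basename "$phy")
        for c in invalid_dword_count loss_of_dword_sync_count \
                 phy_reset_problem_count running_disparity_error_count; do
            [ -r "$phy$c" ] && echo "phy=$name $c=$(cat "$phy$c")"
        done
    done
}
log_sas_phy_errors
```

Rising dword/disparity counts on one PHY are a good hint that the read/write errors ZFS sees are a cabling or expander problem rather than a bad disk.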
Disk history by drive serial number
Periodically logging drive serial numbers lets you see if and when drives were replaced, and helps locate a drive that has been moved to another enclosure.
You can also build a record of any SMART errors associated with a serial number in case you need to RMA the drive.
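Serial numbers come from `smartctl -i`. The here-doc below stands in for its output on a SAS drive (the serial and model are made up), and the vdev alias is just the example naming from this deck:

```shell
# Sketch: record a vdev -> serial-number mapping as a key=value pair so
# Splunk can track a physical drive across slots and enclosures.
sample_smartctl_i() {
cat <<'EOF'
Vendor:               HGST
Product:              HUH728080AL5200
Serial number:        2EGXYZ1A
EOF
}
# The serial is the last field on its line:
serial=$(sample_smartctl_i | awk '/[Ss]erial [Nn]umber:/ {print $NF}')
echo "vdev=L0, serial=${serial}"
```

A cron job would run this per drive, e.g. `smartctl -i /dev/disk/by-vdev/L0`, and send each line to logger.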
zpool iostat bandwidth
zpool iostat latency
Logging scripts

We log stats with cron every hour. We also log zpool stats on every vdev state change via a zedlet.
'zpool status -c' can be useful for grabbing stats:

# zpool status -c 'smartctl -a $VDEV_UPATH | grep "Drive Temp"' ...
  NAME       STATE     READ WRITE CKSUM
  jet18      DEGRADED     0     0     0
    raidz2-0 ONLINE       0     0     0
      L0     ONLINE       0     0     0  Current Drive Temperature: 26 C
      L1     ONLINE       0     0     0  Current Drive Temperature: 25 C
      L14    ONLINE       0     0     0  Current Drive Temperature: 33 C
      L15    ONLINE       0     0     0  Current Drive Temperature: 30 C
      L28    ONLINE       0     0     0  Current Drive Temperature: 37 C
      L29    ONLINE       0     0     0  Current Drive Temperature: 35 C
      L42    ONLINE       0     0     0  Current Drive Temperature: 40 C
      L43    ONLINE       0     0     0  Current Drive Temperature: 39 C
      L56    ONLINE       0     0     0  Current Drive Temperature: 43 C
      L70    ONLINE       0     0     0  Current Drive Temperature: 46 C
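The hourly cron side of this is just a crontab entry. A hypothetical sketch (the script path and name are invented, not LLNL's actual script):

```
# Run the stat-logging script at the top of every hour:
0 * * * * /usr/local/sbin/log_zfs_stats.sh
```

The script itself would gather the zpool/smartctl/sg_ses values and emit them to syslog as key=value pairs.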
Slot fault LEDs

We use zed to automatically turn on/off slot LEDs when vdevs go FAULTED/DEGRADED/UNAVAIL.
We enable auto-replace on our pools so we can swap in a new disk for an old one and have it auto-resilver.
This allows operations staff to replace bad drives without being root.
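The auto-replace piece is a standard pool property. An illustrative sketch (the pool name is made up; the LED behavior itself comes from a zed statechange zedlet rather than this property):

```
# Enable auto-replace so a new disk inserted into a faulted drive's
# slot is automatically brought in and resilvered:
zpool set autoreplace=on mypool
zpool get autoreplace mypool
```

With this set, the physical swap is the only step that needs hands in the machine room; no root shell is required.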