
Advanced Lustre Infrastructure Monitoring

(Resolving the Storage I/O Bottleneck and managing the beast)
Torben Kling Petersen, PhD
Principal Architect, High Performance Computing

The Challenge

File system
- Up/down
- Slow
- Fragmented
- Capacity planning
- HA (fail-overs etc.)

Hardware
- Nodes crashing
- Components breaking
- FRUs
- Disk rebuilds
- Cables ??

The REAL challenge

Software
- Upgrades / patches ??
- Bugs
- Clients
- Quotas
- Workload optimization

Other
- Documentation
- Scalability
- Power consumption
- Maintenance windows
- Back-ups

Tightly integrated solutions
- Hardware
- Software
- Support
- Extensive testing
- Clear roadmaps
- In-depth training
- Even more extensive testing ...

The Answer ??

ClusterStor Software Stack Overview

ClusterStor 6000 Embedded Application Server
- Intel Sandy Bridge CPU, up to 4 DIMM slots
- FDR & 40GbE front-end (F/E), SAS-2 (6G) back-end (B/E)
- SBB v2 form factor, PCIe Gen-3
- Embedded RAID & Lustre support

CS 6000 SSU software stack
- Lustre File System (2.x)
- ClusterStor Manager
- Data Protection Layer (RAID 6 / PD-RAID)
- Linux OS
- Unified System Management (GEM-USM)

Embedded server modules


ClusterStor dashboard

Problems found

Hardware inventory ...


Finding problems ???

But things break ... especially disk drives

What then ???

Most enterprise NL-SAS HDDs have an AFR of 0.7 - 0.8%, and some companies use S-ATA drives with a stated 3% AFR.

Large systems use many HDDs to deliver both performance and capacity:
- NCSA Blue Waters uses 17,000+ HDDs for the main scratch FS
- At 3% AFR this means 531 HDDs fail annually
- That's ~1.5 drives per day !!!!
- RAID 6 rebuild time under use is 24 - 36 hours

Bottom line: the scratch system would NEVER be fully operational, and there would constantly be a risk of losing additional drives, leading to data loss !!
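A minimal sketch of the failure arithmetic above. The exact drive count of 17,700 is an assumption chosen so that 3% AFR reproduces the slide's 531-failures figure; the slide itself only says "17,000+":

```python
# Back-of-the-envelope failure math for a large scratch file system.
# Assumption: 17,700 drives (consistent with "17,000+" and the 531-failure figure).
DRIVES = 17_700
AFR = 0.03                                   # 3% annual failure rate quoted for some S-ATA drives

failures_per_year = DRIVES * AFR             # ~531 drives per year
failures_per_day = failures_per_year / 365   # ~1.45 drives per day

print(f"Annual failures:  {failures_per_year:.0f}")
print(f"Failures per day: {failures_per_day:.2f}")
```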

Let's do some math ...

Xyratex pre-tests all drives used in ClusterStor solutions:
- Each drive is subjected to 24-28 hours of intense I/O
- Reads and writes are performed to all sectors
- Ambient temperature cycles between 40°C and 5°C
- Any drive surviving goes on to additional testing
- As a result, Xyratex disk drives deliver proven reliability with less than 0.3% annual failure rate

Real Life Impact
- On a large system such as NCSA Blue Waters with 17,000+ disk drives, this means a predicted failure of 50 drives per year
- Other vendors publicly state a failure rate of 3%*, which (given an equivalent number of disk drives) means 500+ drive failures per year
- With a fairly even distribution, the file system will ALWAYS be in a state of rebuild
- In addition, as a file system with wide stripes performs according to the slowest OST, the entire system will always run in degraded mode ...

Drive Technology / Reliability
*DDN, Keith Miller, LUG 2012

Annual Failure Rate of Xyratex Disks: Actual AFR Data (2012/13) Experienced by Xyratex Sourced SAS Drives
- Xyratex drive failure rate is less than half of industry standard !
- At 0.3%, the annual failure would be 53 HDDs

As areal density growth slows (
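To illustrate the "always in a state of rebuild" claim, here is a rough Little's-law estimate of how many drives are being rebuilt at any given moment. The 17,700-drive count and the 30-hour mean rebuild time are illustrative assumptions (the slides give "17,000+" drives and a 24-36 hour rebuild window):

```python
# Expected number of drives in rebuild at any instant (Little's law: L = lambda * W).
# Assumptions: 17,700 drives, mean rebuild time of 30 hours (midpoint of 24-36 h).
DRIVES = 17_700
REBUILD_HOURS = 30.0
HOURS_PER_YEAR = 8760

def concurrent_rebuilds(afr: float) -> float:
    """Average number of simultaneous RAID rebuilds for a given annual failure rate."""
    failures_per_hour = DRIVES * afr / HOURS_PER_YEAR
    return failures_per_hour * REBUILD_HOURS

print(f"3%   AFR: {concurrent_rebuilds(0.03):.2f} rebuilds in flight on average")   # ~1.8
print(f"0.3% AFR: {concurrent_rebuilds(0.003):.2f} rebuilds in flight on average")  # ~0.18
```

With a 3% AFR there is, on average, more than one rebuild running at all times, so wide-striped files essentially never see a fully healthy set of OSTs; at 0.3% a rebuild is in progress well under a quarter of the time.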