© 2014 IBM Corporation
Enterprise2014
GPFS with Flash840 on PureFlex and Power8 (AIX & Linux)
Chris Churchey – Principal ATS Group, LLC
[email protected] (610-574-0207)
October 2014
Why Monitor? (Clusters, Servers, Storage, Net, etc.)
Ensure the services and apps are available to our users (customers)
Ensure they perform optimally
Identify constraints, problems or configuration concerns
Learn from past behaviors and trends
Anticipate and avoid capacity constraints, rather than “reacting” to them after users are impacted
It’s our job… I hope…
What to Monitor (for starters)
CPU
– (User + System) >= 80%
– Waiting on I/O >= 10%: possible I/O bottleneck
Memory
– Paging: page-ins/swap-ins >= 5 per second
– Scan/free ratio >= 4: thrashing
– Page/swap space used >= 80% (warning), > 90% (critical)
– Huge/large pages allocated > 0 but used = 0: wasted memory
Network & Fibre Channel Adapters
– Running speed should equal supported speed
– Read/write throughput >= 80% of running speed
– Load balanced across adapters
– HBA queue depth and transfer size settings can give huge gains
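The CPU and memory thresholds on this slide translate directly into a rule check. This is a minimal Python sketch, not part of Galileo or any IBM tool: the `NodeSample` fields are hypothetical names for whatever your collector (vmstat, nmon, etc.) reports per interval, and the thresholds are the ones listed above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeSample:
    """One interval of node metrics (hypothetical field names)."""
    cpu_user_pct: float
    cpu_sys_pct: float
    cpu_iowait_pct: float
    page_ins_per_sec: float
    scan_rate: float        # pages scanned per second
    free_rate: float        # pages freed per second
    swap_used_pct: float
    large_pages_allocated: int
    large_pages_used: int


def check_cpu_memory(s: NodeSample) -> list[str]:
    """Return one alert string per CPU/memory threshold crossed."""
    alerts = []
    if s.cpu_user_pct + s.cpu_sys_pct >= 80:
        alerts.append("CPU: user+system >= 80%")
    if s.cpu_iowait_pct >= 10:
        alerts.append("CPU: I/O wait >= 10% (possible I/O bottleneck)")
    if s.page_ins_per_sec >= 5:
        alerts.append("Memory: page-ins >= 5/s")
    # Skip the ratio when nothing was freed in the interval (avoids dividing by zero).
    if s.free_rate > 0 and s.scan_rate / s.free_rate >= 4:
        alerts.append("Memory: scan/free ratio >= 4 (thrashing)")
    if s.swap_used_pct > 90:
        alerts.append("Memory: page/swap space > 90% (critical)")
    elif s.swap_used_pct >= 80:
        alerts.append("Memory: page/swap space >= 80% (warning)")
    if s.large_pages_allocated > 0 and s.large_pages_used == 0:
        alerts.append("Memory: large pages allocated but unused (waste)")
    return alerts
```

For example, a sample with 70% user, 15% system, 12% I/O wait, a scan/free ratio of 8, 85% swap used and 16 unused large pages trips five of the rules. The network and HBA items are left out because they need per-adapter data rather than one node sample.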
What to Monitor (continued)
Filesystems
– Space used >= 90%: traditional check
– Space used >= 90% and free < 1 GB: fewer alerts
– “/” and “/var” space used > 95% and free < 512 MB: critical
– I-nodes used >= 90%
Disks
– Writes smaller than 64 KB at more than 20 writes/s should have a service time < 1 ms
– With write cache, today’s SAN storage should service all small to medium size writes in < 1 ms on average
– Queue depth, algorithm and transfer size settings can give huge gains
Processes
– High CPU and/or memory consumers
– Runaway long-running processes
– Gradual memory growth in long-running processes (memory leak?)
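The file system and disk rules above can be sketched the same way. This Python sketch is illustrative only: the function names are hypothetical, and it assumes you already have size/free bytes per mount, i-node counts, and per-disk average write size, write rate and service time from your collector.

```python
from __future__ import annotations

MB = 1024 ** 2
GB = 1024 ** 3


def fs_space_alert(mount: str, size_bytes: int, free_bytes: int) -> str | None:
    """Apply the slide's file system space rules; return a severity or None."""
    used_pct = 100.0 * (size_bytes - free_bytes) / size_bytes
    # "/" and "/var" get the tighter critical rule.
    if mount in ("/", "/var") and used_pct > 95 and free_bytes < 512 * MB:
        return "critical"
    # Requiring free < 1 GB as well as >= 90% used stops large file systems,
    # which can be 90% full yet still have plenty of room, from alerting.
    if used_pct >= 90 and free_bytes < 1 * GB:
        return "warning"
    return None


def inode_alert(inodes_used: int, inodes_total: int) -> bool:
    """True when 90% or more of the i-nodes are in use."""
    return inodes_total > 0 and inodes_used / inodes_total >= 0.90


def slow_small_writes(avg_write_kb: float, writes_per_sec: float, avg_service_ms: float) -> bool:
    """True when small writes (< 64 KB) at > 20/s are not serviced in under 1 ms on average."""
    return avg_write_kb < 64 and writes_per_sec > 20 and avg_service_ms >= 1.0
```

Under these rules a 10 TB GPFS file system that is 95% full but has 500 GB free raises nothing, while the traditional 90% check alone would alert on it.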
What to Monitor (continued)
GPFS
– All of the above, plus:
– NSDs are distributed equally and balanced across NSD servers, unless you have assigned specific roles to NSD server pairs
– GPFS-specific node and file system statistics on server and client nodes (mmpmon, etc.)
– Special tuning cases arise with large clusters, millions to billions of files, and mixed large and small files; how they are accessed often determines special design considerations
– Use metadata-only NSDs on dedicated SSD or flash disks, with dedicated adapters, to keep small, IOPS-intensive access away from large-throughput I/O
– Contact IBM or the Galileo Performance team for assistance
Worker Threads
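mmpmon’s parseable output (`-p`) is a single line of `_key_ value` pairs per record, so the GPFS node and file system statistics mentioned above are easy to post-process. The sketch below parses such a line into a dict and derives read/write rates from two samples, assuming the byte counters are cumulative since the last reset. The sample line in the test is modeled on the `fs_io_s` example in IBM’s mmpmon documentation; treat the field meanings (`_br_` bytes read, `_bw_` bytes written, `_t_`/`_tu_` timestamp seconds/microseconds) as my reading of that documentation and confirm them on your release.

```python
from __future__ import annotations


def parse_mmpmon_line(line: str) -> dict[str, str]:
    """Parse one line of `mmpmon -p` output into {field: value}.

    The first token names the record (e.g. _fs_io_s_); the rest alternate
    _key_ value. Underscores around names are stripped.
    """
    tokens = line.split()
    if not tokens:
        return {}
    record = {"_type": tokens[0].strip("_")}
    rest = tokens[1:]
    for key, value in zip(rest[0::2], rest[1::2]):
        record[key.strip("_")] = value
    return record


def io_rates(prev: dict[str, str], curr: dict[str, str]) -> tuple[float, float]:
    """Bytes/s read and written between two samples of the same file system."""
    # _t_ is whole seconds and _tu_ microseconds; combine them for the interval.
    dt = (int(curr["t"]) - int(prev["t"])) + (int(curr["tu"]) - int(prev["tu"])) / 1e6
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    read_bps = (int(curr["br"]) - int(prev["br"])) / dt
    write_bps = (int(curr["bw"]) - int(prev["bw"])) / dt
    return read_bps, write_bps
```

Keeping every field as a string until it is needed avoids guessing types for fields such as the cluster name or node IP.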
Daily Monitoring Steps (Methodology)
1. Cluster view: check the dashboard
2. Identify candidates to investigate (e.g., using “What to Monitor”)
3. Follow the data: charts, views...
4. View over a period of time
5. Determine the usage mix and observed peaks
* Make it easy with Galileo Performance Explorer GPFS and Storage agents, and the new automated Analytics capability!
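Steps 4 and 5 (view over time, determine observed peaks) amount to summarizing a metric’s time series. A minimal Python sketch, assuming you have (timestamp, value) pairs for one metric; the nearest-rank 95th percentile is my choice here, shown alongside the raw peak so a single spike can be told apart from a sustained high level.

```python
from __future__ import annotations

from statistics import mean


def summarize(series: list[tuple[int, float]]) -> dict[str, float]:
    """Average, 95th percentile (nearest-rank), peak and time of peak for (timestamp, value) pairs."""
    if not series:
        raise ValueError("empty series")
    values = sorted(v for _, v in series)
    n = len(values)
    rank = (95 * n + 99) // 100  # integer ceil(0.95 * n), always >= 1
    peak_at, peak = max(series, key=lambda tv: tv[1])
    return {"avg": mean(values), "p95": values[rank - 1], "peak": peak, "peak_at": peak_at}
```

With 24 hourly samples at 20% and one hour (hour 14) at 90%, the peak is 90% at hour 14 while the 95th percentile stays at 20%, which points to a short burst rather than a constantly busy node.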
Cluster view
Three observations stand out immediately! (They may be OK, or they may not be.)
Investigate High CPU %Busy: Which Node?
Find out which node it is (Top: 1): gvicp8gpfsRH05. Let’s look at its processes next.
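The “Top: 1” view is a ranking by average %busy over the chosen interval. A short Python sketch of that ranking, assuming (name, %busy) samples; only gvicp8gpfsRH05 comes from the slide, while `node02`, `node03` and all values in the test are made up. The same function ranks processes on a node if you feed it (process, %CPU) samples instead.

```python
from __future__ import annotations

import heapq
from collections import defaultdict
from typing import Iterable


def top_busy(samples: Iterable[tuple[str, float]], n: int = 1) -> list[tuple[str, float]]:
    """Rank names (nodes or processes) by average %busy over the sampled interval."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for name, busy_pct in samples:
        totals[name] += busy_pct
        counts[name] += 1
    averages = {name: totals[name] / counts[name] for name in totals}
    # nlargest returns the n highest averages in descending order.
    return heapq.nlargest(n, averages.items(), key=lambda kv: kv[1])
```

Averaging before ranking avoids promoting a node that spiked in one sample over one that was busy throughout.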
Investigate High CPU %Busy: Found the Node, Which Process?
Find which process(es) it is (Top: 2): runaway and every2hrs (3 and 1 threads).
* Checked with the user: runaway is bad; every2hrs is scheduled (good).
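On this slide the runaway/scheduled decision came from asking the user. A history-based heuristic can at least suggest which question to ask. This Python sketch is my own heuristic, not a Galileo feature: it labels a process’s CPU history as sustained (runaway candidate), recurring (possibly scheduled), single burst, or idle; the 80% busy level and 90% sustained fraction are assumptions.

```python
from __future__ import annotations


def classify_cpu_pattern(cpu_pct: list[float], busy_pct: float = 80.0,
                         sustained_frac: float = 0.9) -> str:
    """Heuristically label a process's CPU history (assumed thresholds)."""
    if not cpu_pct:
        return "no data"
    busy = [v >= busy_pct for v in cpu_pct]
    if sum(busy) / len(busy) >= sustained_frac:
        return "sustained"  # busy almost all the time: runaway candidate
    # A burst starts at a busy sample whose predecessor was not busy.
    bursts = sum(1 for i, b in enumerate(busy) if b and (i == 0 or not busy[i - 1]))
    if bursts >= 2:
        return "recurring"  # repeated bursts: possibly a scheduled job
    return "single burst" if bursts == 1 else "idle"
```

With 30-minute samples, a job that runs every 2 hours shows up as a busy sample every fourth interval and is labeled "recurring", while a process pinned near 100% the whole time is labeled "sustained". The owner still makes the final call.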
Investigate High I/O Wait: Which Node?
Find out which node it is: gvicp8gpfsaix04. Next, look at the node’s details.
Investigate High I/O: Found the Node, Is the Problem the HBAs or the Disks?
Found 4 HBAs: fcs0 and fcs1 at 500 MB/s each, fcs2 at 100 MB/s, fcs3 at 0.
* The problem was that fcs3 was not zoned. After correcting it, let’s see what improved.
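The “load balanced across adapters” check from “What to Monitor” would have flagged this node. A Python sketch, assuming per-adapter throughput in MB/s; the “below 50% of the active average” cut-off is an assumption, and the test uses the before numbers from this slide plus the fcs2/fcs3 figures reported after the zoning fix.

```python
from __future__ import annotations


def hba_findings(mb_per_sec: dict[str, float], low_frac: float = 0.5) -> list[str]:
    """Flag idle or under-used adapters relative to the average of the active ones."""
    active = [rate for rate in mb_per_sec.values() if rate > 0]
    if not active:
        return []
    avg_active = sum(active) / len(active)
    findings = []
    for name, rate in sorted(mb_per_sec.items()):
        if rate == 0:
            findings.append(f"{name}: no traffic while other adapters are active (check zoning/paths)")
        elif rate < low_frac * avg_active:
            findings.append(f"{name}: {rate:g} MB/s vs. active average {avg_active:.0f} MB/s")
    return findings
```

For fcs0/fcs1 at 500 MB/s, fcs2 at 100 MB/s and fcs3 at 0, it reports fcs2 as under-used and fcs3 as carrying no traffic; an idle adapter next to busy peers is a hint to check zoning or paths, as it turned out to be here.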
Investigate High I/O: Found the Node, Is the Problem the HBAs or the Disks?
With fcs3 zoning corrected, fcs2 and fcs3 now push 250 MB/s each.
Investigate High I/O: Found the Node, Is the Problem the HBAs or the Disks?
Fixing the zoning increased I/O throughput, BUT it has now caused a memory paging problem.
* As the old saying goes: fixing one performance problem often exposes another!
E.g., NSD Servers Not Balanced (Clients Constrained)
It looks like one NSD server (gvicp8gpfsaix01) is doing all the work.
NSD Servers Not Balanced (Clients Constrained), continued
Identify which file system is heavily used, and by which client node(s).
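Both questions on these two slides (which NSD server carries the load, and which file system and clients drive it) come down to grouping byte counts. A Python sketch, assuming per-interval records of (NSD server, file system, client node, bytes) have already been collected; that record layout and all names in the test (`nsdsrv01`, `fs1`, `client1`, …) are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def io_breakdown(records: Iterable[tuple[str, str, str, int]], top: int = 3):
    """Return each NSD server's share of bytes and the busiest (file system, client) pairs.

    Each record is (nsd_server, file_system, client_node, bytes) for one interval.
    """
    by_server: Counter = Counter()
    by_fs_client: Counter = Counter()
    for server, fs, client, nbytes in records:
        by_server[server] += nbytes
        by_fs_client[(fs, client)] += nbytes
    total = sum(by_server.values())
    shares = {server: b / total for server, b in by_server.items()} if total else {}
    return shares, by_fs_client.most_common(top)
```

A server share near 1.0 for one NSD server reproduces the “one server doing all the work” picture, and the top (file system, client) pairs show where to look next.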
Round-Robin the NSD Server List to Balance Load
Changed the NSD server order to balance load between gvicp8gpfsaix01 and …aix02
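In GPFS an NSD’s server list is ordered and, as I understand the documentation, clients use the first available server in that list; if every NSD lists the same server first, that server carries the load. The Python sketch below rotates the server list per NSD and emits `%nsd:` stanza lines of the kind accepted by `mmchnsd -F`. NSD and server names are hypothetical; check the mmchnsd documentation for your release (including any requirement to unmount the file system) before applying such a file.

```python
from __future__ import annotations


def round_robin_stanzas(nsds: list[str], servers: list[str]) -> list[str]:
    """Rotate the server list per NSD so each server is first (preferred) for an equal share."""
    if not servers:
        raise ValueError("need at least one NSD server")
    stanzas = []
    for i, nsd in enumerate(nsds):
        k = i % len(servers)
        order = servers[k:] + servers[:k]  # rotate so servers[k] comes first
        stanzas.append(f"%nsd: nsd={nsd} servers={','.join(order)}")
    return stanzas
```

With four NSDs and two servers, each server ends up first in the list for two NSDs, while the other server stays in every list as the fallback.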
Switched the 2 Clients to Direct-Attached Nodes
Data-intensive nodes can now go directly to storage: a major throughput improvement.
Yes, we could also have used an all-InfiniBand network.
Galileo Analytics engine: minutes instead of the hours spent on the previous 11 slides
Galileo Analytics engine: Booth #22
E.g., sequential 50/50 read/write, 256 KB, 8 threads: V7K SAS
E.g., sequential 50/50 read/write, 256 KB, 8 threads: Flash 840
We are seeking use cases as input to the Galileo PE Analytics engine for ‘automation’
– Lessons Learned / Best Practices / Thresholds as well
We have an Innovation Center lab where we test, demo and showcase technology
– Ideas you would like to see us demo, prove out (POC) or verify claims for, and then share!
[email protected] or [email protected] or [email protected]
Please contact us!
Booth #22
Questions and Answers
We can help analyze and implement. Contact us!
Check out Galileo Performance Explorer™
– Visit Booth #22 for a hands-on demo
– Sign up for a trial at www.GalileoSuite.com
– Complimentary*, no-strings-attached 3-month use for conference attendees
[email protected] (484-320-4302)
www.GalileoSuite.com
* First-time Galileo users
Referenced Material
Deploying a big data solution using IBM GPFS-FPO: http://public.dhe.ibm.com/common/ssi/ecm/en/dcw03051usen/DCW03051USEN.PDF
GPFS tuning guidelines for deploying SAS: http://www.sas.com/content/dam/SAS/en_us/doc/partners/ibm-gpfs-tuning-guidelines.pdf
GPFS Wiki – IBM DeveloperWorks: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29
GSS / ESS: https://www.ibm.com/developerworks/community/blogs/5things/entry/gpfs_storage_server?lang=en
Galileo Performance Explorer: http://www.GalileoSuite.com