Red Hat Gluster Storage Performance
Manoj Pillai and Ben England, Performance Engineering
June 25, 2015

New or improved features (in the last year)
- Erasure Coding
- Snapshots
- NFS-Ganesha
- RDMA
- SSD support

Erasure Coding
- Distributed software "RAID": an alternative to RAID controllers or 3-way replication
- Cuts storage cost/TB, but computationally expensive
- Better sequential write performance for some workloads
- Roughly the same sequential read performance (depends on mountpoints)
- In RHGS 3.1, avoid Erasure Coding for pure-small-file or pure random I/O workloads
- Example use cases are archival and video capture
- The disperse translator spreads EC stripes in a file across hosts
- Example: EC4:2

Snapshots
- Based on device-mapper thin-provisioned snapshots
- Simplified space management for snapshots
- Allows a large number of snapshots without performance degradation
- Required a change from traditional LV to thin LV for RHGS brick storage
- Performance impact? Typically 10-15% for large-file sequential read, as a result of fragmentation
- Snapshot performance impact is mainly due to writes to "shared" blocks: copy-on-write is triggered on the first write to a region after a snapshot
- Independent of the number of snapshots in existence

Improved rebalancing
- Rebalancing lets you add/remove hardware from an online Gluster volume
- Important for scalability and redeployment of hardware resources
- The existing algorithm had shortcomings:
  - Did not work well for small files
  - Was not parallel enough
  - No throttle
- The new algorithm solves these problems:
  - Executes in parallel on all bricks
  - Gives you control over the number of concurrent I/O requests per brick

Best practices for sizing, install, administration

Configurations to avoid with Gluster (today)
- Super-large RAID volumes (e.g. RAID60)
  - Example: RAID60 with 2 striped RAID6 12-disk components
  - A single glusterfsd process ends up serving a large number of disks
  - Recommend separate RAID LUNs instead
- JBOD configuration with very large server count
  - Gluster directories are still spread across every brick; with JBOD, that means every disk!
  - 64 servers x 36 disks/server = ~2300 bricks
  - Recommendation: use RAID6 bricks of 12 disks each
  - Even then, 64x3 = 192 bricks, still not ideal for anything but large files

Test methodology
- How well does RHGS work for your use case?
- Some benchmarking tools (use tools with a distributed mode, so multiple clients can put load on the servers):
  - iozone (large-file sequential workloads), the smallfile benchmark, fio (better than iozone for random I/O testing)
- Beyond micro-benchmarking: SPECsfs2014 provides an approximation to some real-life workloads
  - Being used internally; requires a license
  - Provides mixed-workload generation in different flavors: VDA (video data acquisition), VDI (virtual desktop infrastructure), SWBUILD (software build)
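To make the tool list above concrete, here is a minimal sketch of a micro-benchmark run from a client. The mount point /mnt/glustervol, the client host names, and the sizes are hypothetical, and the flags should be checked against the fio and smallfile versions you actually have installed.

    # Random-I/O test with fio against a (hypothetical) glusterfs mount
    fio --name=randread --directory=/mnt/glustervol --rw=randread \
        --bs=4k --size=1g --numjobs=8 --ioengine=libaio --direct=1 \
        --runtime=60 --time_based --group_reporting

    # Small-file workload with the smallfile benchmark in distributed mode,
    # driving load from two client hosts at once (host names are examples)
    python smallfile_cli.py --top /mnt/glustervol/smf \
        --host-set client1,client2 --threads 8 \
        --files 20000 --file-size 64 --operation create

Running the same job from several clients at once, as suggested above, is what actually loads the servers rather than a single FUSE mountpoint.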
Application filesystem usage patterns to avoid with Gluster
- Single-threaded application
  - One-file-at-a-time processing
  - Uses only a small fraction (1 DHT subvolume) of the Gluster hardware
- Tiny files
  - Cheap on local filesystems, expensive on distributed filesystems
- Small directories
  - Creation/deletion/read/rename/metadata-change cost x brick count!
  - A large file-to-directory ratio is not bad as of glusterfs-3.7
- Using repeated directory scanning to synchronize processes on different clients
  - Gluster 3.6 (RHS 3.0.4) does not yet invalidate metadata cache on clients

Initial data ingest
- Problem: applications often have previous data that must be loaded into the Gluster volume
- Typical methods are excruciatingly slow
  - Example: single mountpoint, rsync -ravu
- Solutions (sketched below):
  - For large files on glusterfs, use the largest transfer size
  - Copy multiple subdirectories in parallel
  - Multiple mountpoints per client
  - Multiple clients
  - Mount option "gid-timeout=5"
  - For glusterfs, increase client.event-threads to 8
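As a rough illustration of the ingest advice above, the sketch below is one possible shape for it. The paths, the volume name myvol, the server name, and the parallelism level are hypothetical; gid-timeout and client.event-threads are the glusterfs-3.7-era option names mentioned on the slide.

    # Mount with the gid-timeout option (server and volume names are examples)
    mount -t glusterfs -o gid-timeout=5 server1:/myvol /mnt/glustervol

    # Raise client-side event threads on the volume
    gluster volume set myvol client.event-threads 8

    # Copy top-level subdirectories 4 at a time instead of one big rsync
    cd /data/to-ingest
    ls -d */ | xargs -P 4 -I{} rsync -ra {} /mnt/glustervol/{}

Spreading the copies across several mountpoints or several clients, as recommended above, parallelizes the FUSE and network path as well as the rsync processes.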
SSDs as bricks
- Avoid use of the storage controller writeback cache
- Separate volume for SSD
- Check "top -H", look for hot glusterfsd threads on a server with SSDs
- Gluster tuning for SSDs: server.event-threads > 2
- SAS SSD
  - Sequential I/O: relatively low sequential write transfer rate
  - Random I/O: avoids seek overhead, good IOPS
  - Scaling: more SAS slots => greater TB/host, high aggregate IOPS
- PCI SSD
  - Sequential I/O: much higher transfer rate since the data path is shorter
  - Random I/O: lowest latency yields highest IOPS
  - Scaling: more expensive, aggregate IOPS limited by PCI slots

High-speed networking (> 10 Gbps)
- Don't need RDMA for a 10-Gbps network; better with >= 40 Gbps
- Infiniband alternative to RDMA: IPoIB
  - Jumbo Frames (MTU=65520)
  - All switches must support "connected mode"
  - TCP will get you to about ... of 40-Gbps line speed
- 10-GbE bonding
  - See the gluster.org how-to
  - Default bonding mode 0: don't use it
  - Best modes are 2 (balance-xor), 4 (802.3ad), 6 (balance-alb)
- FUSE (glusterfs mountpoints)
  - No 40-Gbps line speed from one mountpoint
  - Servers don't run FUSE => best with multiple clients per server
  - NFS/SMB servers use libgfapi, no FUSE overhead

Networking: putting it all together

Features coming soon to a Gluster volume near you (i.e. glusterfs-3.7 and later)
- Lookup-unhashed fix
- Bitrot detection (in glusterfs-3.7 = RHS 3.1)
  - Provides greater durability for Gluster data (JBOD)
  - Protects against silent loss of data
  - Requires a signature on the replica recording the original checksum
  - Requires a periodic scan to verify that data still matches the checksum
  - Need more data on the cost of the scan
  - TBS: diagrams, any data?

A tale of two mountpoints: sequential write performance
- And the result... drum roll...

Balancing storage and networking performance
- Based on workload
- Transactional or small-file workloads: don't need > 10 Gbps, but need lots of IOPS (e.g. SSD)
- Large-file sequential workloads (e.g. video capture): don't need so many IOPS, but need network bandwidth
- When in doubt, add more networking; its cost < storage

Cache tiering
- Goal: performance of SSD with the cost/TB of spinning rust
- Savings from Erasure Coding can pay for the SSD!
- Definition: a Gluster tiered volume consists of two subvolumes (see the volume-creation sketch at the end of this section):
  - "hot" tier subvolume: low capacity, high performance
  - "cold" tier subvolume: high capacity, low performance
  - Promotion policy: migrates data from the cold tier to the hot tier
  - Demotion policy: migrates data from the hot tier to the cold tier
  - New files are written to the hot tier initially, unless the hot tier is full

Perf enhancements (unless otherwise stated, UNDER CONSIDERATION, NOT IMPLEMENTED)
- Lookup-unhashed=auto: in glusterfs-3.7 today, in RHGS 3.1 soon
  - Eliminates a LOOKUP per brick during file creation, etc.
- JBOD support: Glusterfs 4.0 DHT V2 intended to eliminate the spread of directories across all bricks
- Sharding: spread a file across more bricks (like Ceph, HDFS)
- Erasure Coding: Intel instruction support, symmetric encoding, bigger chunk size
- Parallel utilities: examples are parallel-untar.py and parallel-rm-rf.py
- Better client-side caching: cache invalidation starting in glusterfs-3.7

YOU CAN HELP DECIDE!
- Express interest and opinion on this
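The volume-creation sketch referenced from the cache-tiering slide above: a dispersed (EC4:2) cold volume with an SSD hot tier attached. The host names, brick paths, and volume name are hypothetical, and the tiering CLI changed between releases (attach-tier in early glusterfs 3.7, "gluster volume tier VOLNAME attach" in later releases), so treat this as the shape of the commands rather than exact syntax and check "gluster volume help" on your version.

    # Cold tier: EC4:2 dispersed volume across 6 hosts (4 data + 2 redundancy bricks)
    gluster volume create tiervol disperse 6 redundancy 2 \
        host1:/bricks/hdd/b1 host2:/bricks/hdd/b1 host3:/bricks/hdd/b1 \
        host4:/bricks/hdd/b1 host5:/bricks/hdd/b1 host6:/bricks/hdd/b1
    gluster volume start tiervol

    # Hot tier: two SSD bricks attached as a replica-2 hot tier
    gluster volume attach-tier tiervol replica 2 \
        host1:/bricks/ssd/b1 host2:/bricks/ssd/b1

Promotion and demotion between the two subvolumes then proceed according to the policies described in the cache-tiering slide.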