Toward new HSM solution using GPFS/TSM/StoRM integration
Vladimir Sapunenko (INFN, CNAF), Luca dell’Agnello (INFN, CNAF), Daniele Gregori (INFN, CNAF), Riccardo Zappi (INFN, CNAF), Luca Magnoni (INFN, CNAF), Elisabetta Ronchieri (INFN, CNAF), Vincenzo Vagnoni (INFN, Bologna)





Slide 2 (07/05/2008, HEPiX 2008, Geneva)

Storage classes @ CNAF
Implementation of the 3 storage classes needed for LHC:
- Disk0Tape1 (D0T1): CASTOR
  - Space managed by the system
  - Data migrated to tape and deleted from disk when the staging area is full
- Disk1Tape0 (D1T0): GPFS/StoRM (in production)
  - Space managed by the VO
- Disk1Tape1 (D1T1): CASTOR (production), GPFS/StoRM (production prototype for LHCb only)
  - Space managed by the VO (i.e. if the disk is full, the copy fails)
  - Large permanent disk buffer with tape back-end and no garbage collection (gc)

Slide 3 (07/05/2008, HEPiX 2008, Geneva)

Looking into an HSM solution based on StoRM/GPFS/TSM
Project developed as a collaboration between:
- the GPFS development team (US)
- the TSM HSM development team (Germany)
- the end users (INFN-CNAF)
The main idea is to combine new features of GPFS (v3.2) and TSM (v5.5) with SRM (StoRM) to provide a transparent, Grid-friendly HSM solution.
- Information Lifecycle Management (ILM) is used to drive the movement of data between disks and tapes
- The interface between GPFS and TSM is on our shoulders
Improvements and development are needed from all sides:
- transparent recall vs. massive (list-ordered, optimized) recalls

Slide 4 (07/05/2008, HEPiX 2008, Geneva)

What we have now
- GPFS and TSM are widely used as separate products
- Both products have built-in functionality to implement backup and archiving from GPFS
- In GPFS v3.2 the concept of “external storage pool” extends policy-driven ILM to tape storage
- Some groups in the HEP world are starting to investigate this solution, or have expressed interest in starting

Slide 5 (07/05/2008, HEPiX 2008, Geneva)

GPFS Approach: “External Pools”
- External pools are really interfaces to external storage managers, e.g. HPSS or TSM
- An external pool “rule” defines the script to call to migrate/recall/etc. files:
  RULE EXTERNAL POOL 'PoolName' EXEC 'InterfaceScript' [OPTS 'options']
- The GPFS policy engine builds candidate lists and passes them to the external pool scripts (a sketch of such a script follows below)
- The external storage manager actually moves the data
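To make the call convention concrete, here is a minimal sketch of such an interface script. It is not the hsmControl script actually used at CNAF; the operation names follow the GPFS external-pool convention, while the record format of the policy list (path after a "--" separator) and the use of the TSM HSM commands dsmmigrate/dsmrecall are assumptions to be checked against the GPFS and TSM releases in use.

  #!/bin/bash
  # Sketch of an external-pool interface script (not the production hsmControl).
  # GPFS invokes it as:  InterfaceScript <operation> <policy-file-list> [<OPTS string>]
  OP=$1        # e.g. TEST, MIGRATE, RECALL
  LIST=$2      # list built by the policy engine, one candidate file per record
  OPTS=$3      # value of OPTS from the EXTERNAL POOL rule (here: the space token)

  case "$OP" in
    TEST)
      exit 0 ;;                       # tell the policy run that the pool is usable
    MIGRATE|RECALL)
      # Assume each record ends with "-- /path/to/file"; keep only the path.
      sed -n 's/.*-- //p' "$LIST" | while read -r f; do
        if [ "$OP" = "MIGRATE" ]; then
          dsmmigrate "$f"             # copy/migrate the file to TSM
        else
          dsmrecall "$f"              # bring the file back from tape
        fi
      done ;;
    *)
      exit 1 ;;                       # unsupported operation
  esac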

Slide 6 (07/05/2008, HEPiX 2008, Geneva)

Storage class Disk1-Tape1
- A D1T1 prototype in GPFS/TSM was tested for about two months
- Quite simple when there is no competition between migration and recall:
  - D1T1 requires that every file written to disk is copied to tape (and remains resident on disk)
  - recalls are needed only in case of data loss (on disk)
- Although D1T1 is a living concept…
- Some adjustments were needed in StoRM
  - Basically, to place a file on hold for migration until the write operation is completed (SRM “putDone” on the file)
- Definitely positive results of the tests with the current testbed hardware
  - Need more tests at a larger scale
  - Need to establish a production model

Slide 7 (07/05/2008, HEPiX 2008, Geneva)

Storage class Disk0-Tape1
- The prototype is ready and is being tested now
- More complicated logic is needed:
  - Define the priority between reads and writes
    - For example, in the current version of CASTOR migration to tape has absolute priority
  - Logic for reordering recalls (“list-optimized recall”): by tape, and by file position inside each tape (a sketch follows below)
- The logic is realized by means of special scripts
- First tests are encouraging, even considering the complexity of the problem
- Modifications were requested in StoRM to implement the recall logic and file pinning for files in use
  - The identified solutions are simple and linear
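The core of a list-optimized recall is grouping the requested files by tape. The sketch below only illustrates the idea and is not the CNAF script: tsm_volume_of is a hypothetical helper (for instance wrapping a query against the TSM server) that prints the tape volume holding a file, and ordering by position within each tape would need additional information from TSM.

  #!/bin/bash
  # Illustrative "list-optimized recall": group requested files by tape volume
  # so that each tape is mounted only once. tsm_volume_of is hypothetical.
  REQUESTS=$1                       # text file with one file path per line

  while read -r f; do
      echo "$(tsm_volume_of "$f") $f"
  done < "$REQUESTS" \
    | sort -k1,1 \
    | while read -r vol f; do
          dsmrecall "$f"            # recalls now arrive tape by tape
      done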

Slide 8 (07/05/2008, HEPiX 2008, Geneva)

GPFS+TSM tests
- So far we have performed full tests of a D1T1 solution (StoRM+GPFS+TSM), and the D0T1 implementation is being developed in close contact with the IBM GPFS and TSM developers
- The D1T1 is now entering its first production phase, being used by LHCb during this month’s CCRC08
  - As is the D1T0, which is served by the same GPFS cluster but without migrations
- The GPFS/StoRM-based D1T0 has also been in use by ATLAS since February

Slide 9 (07/05/2008, HEPiX 2008, Geneva)

D1T0 and D1T1 @ CNAF using StoRM/GPFS/TSM
- 3 StoRM instances
- 3 major HEP experiments
- 2 storage classes
- 12 servers, 200 TB of disk space
- 3 LTO-2 tape drives

Slide 10 (07/05/2008, HEPiX 2008, Geneva)

Hardware used for the tests
- 40 TB GPFS file system (v3.2.0-3) served by 4 NSD I/O servers (the SAN devices are EMC CX3-80)
  - FC (4 Gbit/s) interconnection between servers and disk arrays
- TSM v5.5
- 2 servers (1 Gb Ethernet) as HSM front-ends, each one acting as:
  - GPFS client (reads and writes on the file system via LAN)
  - TSM client (reads and writes from/to tape via FC)
- 3 LTO-2 tape drives
- The tape library (STK L5500) is shared between CASTOR and TSM
  - i.e. they work together on the same tape library

Slide 11 (07/05/2008, HEPiX 2008, Geneva)

LHCb D1T0 and D1T1 details
[Figure: architecture diagram. 2 EMC CX3-80 controllers, 4 GPFS servers, 2 StoRM servers, 2 GridFTP servers, 2 HSM front-end nodes (GPFS/TSM clients), 1 TSM server (DB) plus a backup TSM server (DB mirror), 3 LTO-2 tape drives. Nodes are connected via 1/10 Gbps Ethernet (Gigabit LAN); disk arrays are reached over a 2/4 Gbps FC SAN and the tape drives over an FC TAN.]

Slide 12 (07/05/2008, HEPiX 2008, Geneva)

How it works
- GPFS performs file system metadata scans according to ILM policies specified by the administrators
  - The metadata scan is very fast (it is not a find…) and is used by GPFS to identify the files which need to be migrated to tape
- Once the list of files is obtained, it is passed to an external process which runs on the HSM nodes and actually performs the migration to TSM (see the invocation sketch below)
  - This is in particular what we implemented
- Note:
  - The GPFS file system and the HSM nodes can be kept completely decoupled, in the sense that it is possible to shut down the HSM nodes without interrupting file system availability
  - All components of the system have intrinsic redundancy (GPFS failover mechanisms)
    - No need to put in place any kind of HA features (apart from the unique TSM server)
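A minimal sketch of how such a policy-driven scan could be launched. mmapplypolicy is the standard GPFS command for ILM policy scans; the policy file path below is a hypothetical example, while the file system path and the HSM node names are taken from the configuration file shown later.

  #!/bin/bash
  # Run a policy scan; the EXTERNAL POOL rules hand the candidate lists to the
  # interface script running on the HSM nodes (sketch; paths are examples).
  POLICY=/var/mmfs/etc/lhcb_t1d1.pol           # hypothetical policy file
  FS=/storage/gpfs_lhcb                        # GPFS file system
  HSMNODES=diskserv-san-14,diskserv-san-16     # HSM front-end nodes

  mmapplypolicy "$FS" -P "$POLICY" -N "$HSMNODES"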

Slide 13 (07/05/2008, HEPiX 2008, Geneva)

Example of an ILM policy

/* Policy implementing T1D1 for LHCb:
   -) 1 GPFS storage pool
   -) 1 SRM space token: LHCb_M-DST
   -) 1 TSM management class
   -) 1 TSM storage pool */

/* Placement policy rules */
RULE 'DATA1' SET POOL 'data1' LIMIT (99)
RULE 'DATA2' SET POOL 'data2' LIMIT (99)
RULE 'DEFAULT' SET POOL 'system'

/* We have 1 space token: LHCb_M-DST. Define 1 external pool accordingly. */
RULE EXTERNAL POOL 'TAPE MIGRATION LHCb_M-DST'
  EXEC '/var/mmfs/etc/hsmControl' OPTS 'LHCb_M-DST'

/* Exclude from migration hidden directories (e.g. .SpaceMan),
   baby files, hidden and weird files. */
RULE 'exclude hidden directories' EXCLUDE WHERE PATH_NAME LIKE '%/.%'
RULE 'exclude hidden file' EXCLUDE WHERE NAME LIKE '.%'
RULE 'exclude empty files' EXCLUDE WHERE FILE_SIZE=0
RULE 'exclude baby files' EXCLUDE
  WHERE (CURRENT_TIMESTAMP-MODIFICATION_TIME)<INTERVAL '3' MINUTE

Slide 14 (07/05/2008, HEPiX 2008, Geneva)

Example of an ILM policy (cont.)

/* Migrate to the external pool according to space token (i.e. fileset). */

RULE 'migrate from system to tape LHCb_M-DST'
  MIGRATE FROM POOL 'system' THRESHOLD(0,100,0)
  WEIGHT(CURRENT_TIMESTAMP-ACCESS_TIME)
  TO POOL 'TAPE MIGRATION LHCb_M-DST'
  FOR FILESET('LHCb_M-DST')

RULE 'migrate from data1 to tape LHCb_M-DST'
  MIGRATE FROM POOL 'data1' THRESHOLD(0,100,0)
  WEIGHT(CURRENT_TIMESTAMP-ACCESS_TIME)
  TO POOL 'TAPE MIGRATION LHCb_M-DST'
  FOR FILESET('LHCb_M-DST')

RULE 'migrate from data2 to tape LHCb_M-DST'
  MIGRATE FROM POOL 'data2' THRESHOLD(0,100,0)
  WEIGHT(CURRENT_TIMESTAMP-ACCESS_TIME)
  TO POOL 'TAPE MIGRATION LHCb_M-DST'
  FOR FILESET('LHCb_M-DST')
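For completeness, a minimal sketch of how a policy like the one above is typically activated on the file system. The file names are examples; placement rules must be installed with mmchpolicy, while the migration rules are exercised during the periodic scans.

  #!/bin/bash
  # Install and verify the policy on the file system (names are examples).
  mmchpolicy gpfs_lhcb /var/mmfs/etc/lhcb_t1d1.pol
  mmlspolicy gpfs_lhcb        # show which policy is currently installed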

Slide 15 (07/05/2008, HEPiX 2008, Geneva)

Example of configuration file

# HSM node list (comma separated)
HSMNODES=diskserv-san-14,diskserv-san-16

# system directory path
SVCFS=/storage/gpfs_lhcb/system

# filesystem scan minimum frequency (in sec)
SCANFREQUENCY=1800

# maximum time allowed for a migrate session (in sec)
MIGRATESESSIONTIMEOUT=4800

# maximum number of migrate threads per node
MIGRATETHREADSMAX=30

# number of files for each migrate stream
MIGRATESTREAMNUMFILES=30

# sleep time for lock file check loop
LOCKSLEEPTIME=2

# pin prefix
PINPREFIX=.STORM_T1D1_

# TSM admin user name
TSMID=xxxxx

# TSM admin user password
TSMPASS=xxxxx

# report period (in sec)
REPORTFREQUENCY=86400

# report email addresses (comma separated)
REPORTEMAILADDRESS=xxxxx

# alarm email addresses (comma separated)
ALARMEMAILADDRESS=xxxxx

# alarm email delay (in sec)
ALARMEMAILDELAY=7200
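As an illustration of how this configuration could drive the service, here is a sketch of a scan loop that sources the file. The configuration file path and the loop itself are assumptions, not the actual hsmControl logic; only the variable names come from the file above.

  #!/bin/bash
  # Illustrative scan loop driven by the configuration file (sketch only).
  CONF=/var/mmfs/etc/hsmControl.conf       # hypothetical location of the file above
  . "$CONF"

  FS=${SVCFS%/system}                      # e.g. /storage/gpfs_lhcb

  while true; do
      start=$(date +%s)
      # One migrate session: policy scan plus migration through the external pool.
      # The real interface also enforces MIGRATESESSIONTIMEOUT, MIGRATETHREADSMAX
      # and MIGRATESTREAMNUMFILES within the session; only the cadence is shown here.
      mmapplypolicy "$FS" -P /var/mmfs/etc/lhcb_t1d1.pol -N "$HSMNODES"
      elapsed=$(( $(date +%s) - start ))
      # Respect the minimum scan frequency between two sessions.
      [ "$elapsed" -lt "$SCANFREQUENCY" ] && sleep $(( SCANFREQUENCY - elapsed ))
  done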

Slide 16 (07/05/2008, HEPiX 2008, Geneva)

Example of a report
A first automatic reporting system has been implemented

----------------------------------------------------------------------------
Start: Sun 04 May 2008 11:38:48 PM CEST
Stop:  Mon 05 May 2008 08:03:15 AM CEST    Seconds: 30267
----------------------------------------------------------------------------
Tape               Files  Failures  File throughput   Total throughput
L00595                 5         0  31.0798 MiB/s     0.702259 MiB/s
L00599                10         0  32.4747 MiB/s     1.41891 MiB/s
L00611                57         0  29.0862 MiB/s     6.59165 MiB/s
L00614                47         0  31.5084 MiB/s     6.61944 MiB/s
L00615                46         0  30.3926 MiB/s     6.57133 MiB/s
L00617                47         0  31.1735 MiB/s     6.5116 MiB/s
L00618                62         0  28.4119 MiB/s     6.06469 MiB/s
L00619                44         0  27.0226 MiB/s     4.10937 MiB/s
L00620                53         0  27.1009 MiB/s     7.13976 MiB/s
L00621                66         0  28.9043 MiB/s     6.67269 MiB/s
L00624                44         0  11.4347 MiB/s     5.82468 MiB/s
L00626                62         0  30.4792 MiB/s     6.53114 MiB/s
----------------------------------------------------------------------------
Drive              Files  Failures  File throughput   Total throughput
DRIVE3               218         0  30.2628 MiB/s     25.7269 MiB/s
DRIVE4               197         0  29.5188 MiB/s     23.6487 MiB/s
DRIVE5               128         0  21.5395 MiB/s     15.3819 MiB/s
----------------------------------------------------------------------------
Host               Files  Failures  File throughput   Total throughput
diskserv-san-14      285         0  29.9678 MiB/s     34.0331 MiB/s
diskserv-san-16      258         0  25.6928 MiB/s     30.7245 MiB/s
----------------------------------------------------------------------------
                   Files  Failures  File throughput   Total throughput
Total                543         0  27.9366 MiB/s     64.7575 MiB/s
----------------------------------------------------------------------------

The alarm part is being developed
An email with the report is sent every day (the period is configurable in the option file)

Slide 17 (07/05/2008, HEPiX 2008, Geneva)

Description of the tests
- Test A
  - Data transfer of LHCb files from CERN Castor-disk to CNAF StoRM/GPFS using the File Transfer Service (FTS)
  - Automatic migration of the data files from GPFS to TSM while the data was being transferred by FTS
  - This is a realistic scenario
- Test B
  - 1 GiB zero’ed files created locally on the GPFS file system with the migration turned off, then migrated to tape once the writes were finished (see the prefill sketch after this list)
  - The migration of zero’ed files to tape is faster due to compression; this measures the physical limits of the system
- Test C
  - Similar to Test B, but with real LHCb data files instead of dummy zero’ed files
  - Realistic scenario, e.g. when a long queue of files to be migrated accumulates in the file system during maintenance
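For reference, the zero-filled files used in Test B can be produced with standard tools; the sketch below is only an illustration and the target directory is an example.

  #!/bin/bash
  # Prefill the file system with 1000 files of 1 GiB each, filled with zeroes
  # (Test B style prefill; target directory is an example).
  DEST=/storage/gpfs_lhcb/testB
  for i in $(seq -w 1 1000); do
      dd if=/dev/zero of="$DEST/zero_$i.dat" bs=1M count=1024
  done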

Slide 18 (07/05/2008, HEPiX 2008, Geneva)

Test A: input files
- Most of the files are of 4 and 2 GiB size, with a few other sizes in addition
- The data files are LHCb stripped DSTs
- 2477 files
- 8 TiB in total
[Figure: file size distribution]

Slide 19 (07/05/2008, HEPiX 2008, Geneva)

Test A: results
[Figure: net data throughput vs. time. Black curve: throughput from CERN to CNAF; red curve: throughput from GPFS to TSM. Annotations mark where FTS transfers were temporarily interrupted, where just two LTO-2 drives were in use, where a third LTO-2 drive was added, and where a drive was removed.]
- 8 TiB in total were transferred to tape in 150k seconds (almost 2 days) from CERN
- About 50 MiB/s to tape with two LTO-2 drives and 65 MiB/s with three LTO-2 drives
- Zero tape migration failures, zero retrials

Slide 20 (07/05/2008, HEPiX 2008, Geneva)

Test A: results (II)
[Figure: distribution of retention time on disk, i.e. the time from when a file is written until it is migrated to tape]
- Most of the files were migrated within less than 3 hours, with a tail up to 8 hours
  - The tail comes from the fact that at some point the CERN-to-CNAF throughput rose to 80 MiB/s, exceeding the maximum tape migration performance at that time, so GPFS/TSM accumulated a queue of files with respect to the FTS transfers

Slide 21 (07/05/2008, HEPiX 2008, Geneva)

Test A: results (III)
[Figure: distribution of throughput per migration to tape]
- The distribution peaks at about 33 MiB/s, which is the maximum sustainable for LHCb data files by the LTO-2 drives
  - Due to compression, the actual performance depends on the content of the files…
- The tail is mostly due to the fact that some of the tapes showed much smaller throughputs
  - For this test we reused old tapes no longer used by Castor
- What is this secondary peak? It is due to files written at the end of a tape which TSM splits onto a subsequent tape (i.e. it must dismount and mount a new tape to continue writing the file)

Slide 22 (07/05/2008, HEPiX 2008, Geneva)

Intermezzo
- Between Test A and Test B we realized that the interface logic was not perfectly balancing the load between the two HSM nodes
- The logic of the interface was then slightly changed in order to improve the performance

Slide 23 (07/05/2008, HEPiX 2008, Geneva)

Test B: results
- File system prefilled with 1000 files of 1 GiB each, all filled with zeroes
  - Migration to tape turned off while writing data to disk
- Migration to tape turned on when the prefilling finished
- Hardware compression is very effective for such files
- About 100 MiB/s observed over 10k seconds
- No tape migration failures and no retrials observed
[Figure: net throughput to tape versus time]
- What is this valley here? Explained in the next slide, where the valleys are more visible

Slide 24 (07/05/2008, HEPiX 2008, Geneva)

Test C: results
- Similar to Test B, but with real LHCb data files taken from the same sample as Test A instead of zero’ed files
- The valleys clearly visible here have a period of exactly 4800 seconds
  - They were also partially present in Test A, but not clearly visible in the plot due to the larger binning
- The valleys are due to a tunable feature of our interface
  - Each migration session is timed out if not finished within 4800 seconds
  - After the timeout GPFS performs a new metadata scan and a new migration session is initiated
  - 4800 seconds is not a magic number; it could be larger or even infinite
[Figure: net throughput to tape versus time]
- About 70 MiB/s on average, with peaks up to 90 MiB/s
- No tape migration failures and no retrials observed

Slide 25 (07/05/2008, HEPiX 2008, Geneva)

Conclusions and outlook
- First phase of tests for the T1D1 StoRM/GPFS/TSM-based solution concluded
  - LHCb is now starting the first production experience with such a T1D1 system
- Work is ongoing for a T1D0 implementation in collaboration with the IBM GPFS and TSM HSM development teams
  - T1D0 is more complicated since it must include active recall optimization, concurrency between migrations and recalls, etc.
  - IBM will introduce an efficient ordered-recall feature in the next major release of TSM
  - Waiting for that release, in the meantime we are implementing it through an intermediate layer of intelligence between GPFS and TSM, driven by StoRM
  - A first proof-of-principle prototype already exists, but this is something to be discussed in a future talk… stay tuned!
- A new library has recently been acquired at CNAF
  - Once the new library is online and the old data files have been repacked to it, the old library will be devoted entirely to TSM production systems and testbeds
    - About 15 drives, a much more realistic and interesting scale than 3 drives