May 2008 John Mycroft – WAVV 2008
VSE/VSAM – Under the covers
John Mycroft
Product Development Manager
CSI International
www.csi-international.com
Acknowledgement
With grateful thanks to Dan Janda, The Swami of VSAM, from whom much of this presentation was stolen
To CSI for providing me with Data-Miner, CSI-Sort and a machine to create the examples
To my fellow developers at CSI who put up with my hogging the machine for hours on end
Abstract
Overview of VSAM & its components. We take a look at what a VSAM file really looks like and how to soup up its performance. We also look at some common mistakes and how to avoid them.
This presentation and its materials are copyrighted and developed by John Mycroft from a presentation originally copyrighted by Dan Janda. Permission is granted for WAVV to reproduce this presentation for distribution to its members at no charge.
Trademarks:
IBM, VSE, VSE/ESA, zVSE, CICS & DL/I are trademarks or registered trademarks of the IBM Corporation
The Swami of VSAM is a trademark of Dan Janda.
VSE/VSAM Overview
Virtual Storage Access Method
For disk files
Sequential – “Entry-Sequenced Data Set” or ESDS
  Begin at the beginning, go on till you get to the end and then stop
Indexed – “Key-Sequenced Data Set” or KSDS
  Process by key or sequentially or a mixture
Direct – “Relative Record Data Set” or RRDS (fixed) or VRDS (variable)
  Calculate a record’s location in the file to access it
Alternate index (AIX) – gives an alternative route to a KSDS or ESDS
  Allows unique & non-unique keys
VSE/VSAM Functional areas
Catalog
  Volume & file information
  Usage statistics
Disk space management
  Space allocation including secondary allocations
  VSAM and VSAM/SAM files
  System files
  Libraries
VSE/VSAM Functional areas
Integrity
Performance
  Data transfer size
  Buffering
Backup / restore
File sharing between jobs and systems
Processing a VSAM file
Sequentially (ESDS)
  Forward or backward
Keyed access (KSDS)
  Direct by full or partial (generic) key
  Sequentially, forward or backward
  Skip sequential, forward or backward
Addressed access (RRDS, VRDS)
  Direct, by record address
  Sequential & skip sequential
Alternate Index Access
  Same as keyed access
  Also direct access by non-unique key
How VSAM stores data
We’re going to look at
How VSAM stores records logically on disk
  Performance considerations
How VSAM physically stores data on disk
  Disk space usage calculations
  Optimizing disk capacity
  Performance considerations
VSAM jargon
  Control Interval
  Control Area
  CI & CA splits
  Freespace
  RDF, CIDF
VSAM Jargon
Control Interval (CI)
“Smallest unit of data transfer between main & disk storage”
  In other words, when you read a record, VSAM reads the whole CI that contains that record
  Think of it as the same as a block of records in a sequential file if you like (though it’s laid out differently)
A CI can initially contain 1 or more records
  More can be inserted
  Some or all can be deleted
  When you try to add a new record to a CI with no room, a “CI split” takes place – more about that later
Layout of a control interval
ALL VSAM FILES ARE VARIABLE LENGTH
Even if all the records are the same size

Rec 1 – Rec n   1 to n logical records of any length
Freespace       Unused space in CI for inserting records or making existing records longer
RDFs            3 byte record descriptor field
                  ESDS/KSDS   1 per LRECL, 1 for all consecutive records of same length
                  RRDS        one per numbered record slot
CIDF            4 byte Control Interval Descriptor Field

Rec 1 | Rec 2 | Rec 3 | Rec … | Freespace | RDFs | CIDF
Control Area (CA)
CA size is the smallest of:
  One cylinder, or
  The size of the primary allocation
  The size of the secondary allocation
The number of CIs per CA depends on the device and the CI and CA sizes
It is generally a good idea to go for the biggest CA possible
A CA is a group of CIs. In a KSDS, all the data CIs in a CA are indexed by one index CI
CI 0 CI 1 CI 2 CI 3 CI 4 CI 5 CI 6 CI 7 CI 8 CI 9
CI10 CI11 CI12 CI13 CI14 CI15 CI16 CI17 CI18 CI19
CI20 CI21 CI22 CI23 CI24 CI25 CI26 CI27 CI28 CI29
Index Control Interval (Index CI)
CI 0 CI 1 CI 2 CI 3 CI 4 CI 5 CI 6 CI 7 CI 8 CI 9
CI10 CI11 CI12 CI13 CI14 CI15 CI16 CI17 CI18 CI19
CI20 CI21 CI22 CI23 CI24 CI25 CI26 CI27 CI28 CI29
A CI in an index containing pointers to
  The next level in the index, or
  The Data CI in the CA – this is referred to as a Sequence Set CI
Index CI
Index and data structure
Balanced tree
Sparse index
Always just 1 high-level index CI
There can be 0 to many intermediate level index CIs
There can be one or more low-level (sequence set) index CIs.
If there is only 1 sequence set CI, it is also the high-level index CI
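A minimal sketch of what "sparse index" means in practice: the sequence set holds one entry per data CI (here, the highest key in that CI), not one entry per record. The names `data_cis`, `sequence_set` and `find` are made up for illustration, and real VSAM index entries are compressed and laid out quite differently.

```python
from bisect import bisect_left

# Toy data CIs in one CA: each CI holds its records in key order.
data_cis = [
    ["ADAMS", "BAKER", "CLARK"],
    ["DAVIS", "EVANS", "FROST"],
    ["GREEN", "HILL", "JONES"],
]

# Sparse index: one entry per data CI (its highest key), not one per record.
sequence_set = [ci[-1] for ci in data_cis]


def find(key):
    """Return (ci_number, key) if key is on file, else None."""
    # First CI whose highest key is >= the search key.
    ci_no = bisect_left(sequence_set, key)
    if ci_no == len(sequence_set):
        return None  # key is higher than anything on file
    # Only this one data CI has to be read and searched.
    return (ci_no, key) if key in data_cis[ci_no] else None
```

With this toy data, `find("EVANS")` returns `(1, "EVANS")` and `find("ZEBRA")` returns `None`. Either way only one data CI is examined.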
And now the bit you’ve all been waiting for……
Performance rules of thumb
Use largest data CI possible, especially for sequential work
Use as small an index CI as you can (but not too small!)
Use large data CA – allocate primary and secondary as at least 1 cylinder
Avoid too many extents / allocations
Allocation calculations
CI freespace = CI Size * Freespace %
Number of records per CI
  “Fixed” length:
    (CI Size – 10 – Freespace) / LRECL
  Variable length:
    (CI Size – 7 – Freespace) / (Average LRECL + 3)
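The two formulas above can be sketched as a small calculator. This is an illustration only: the function name is made up, and truncating the freespace bytes and the record count to whole numbers is an assumption rather than something the slide states.

```python
def records_per_ci(ci_size, freespace_pct, lrecl, fixed=True):
    """Estimate how many records are loaded into one data CI.

    ci_size       -- data CI size in bytes
    freespace_pct -- CI freespace percentage from DEFINE
    lrecl         -- record length (average length if fixed=False)
    """
    # CI freespace = CI Size * Freespace % (whole bytes, truncation assumed)
    freespace = ci_size * freespace_pct // 100
    if fixed:
        # 10 bytes of control information for same-length records
        return (ci_size - 10 - freespace) // lrecl
    # 7 bytes of control information, plus a 3 byte RDF per record
    return (ci_size - 7 - freespace) // (lrecl + 3)
```

For a 4096 byte CI with 10% freespace, `records_per_ci(4096, 10, 100)` gives 36 fixed-length records, and `records_per_ci(4096, 10, 100, fixed=False)` gives 35 records averaging 100 bytes.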
What’s in a CI?
Data and control info (end of CI)
CI control information
At the end of each data CI
Data records
The CIDF
Note – (back 2 slides) free space has data in it from earlier CI split
The Index
Allocation calculations
Calculate Freespace in each CA
Get number of CIs per CA from LISTCAT or device characteristics (3390, 12 x 4K CIs/track, 180/cyl)
CA freespace = No of CIs per CA * CA Freespace %, rounded up
Number of CIs loaded per CA = CIs per CA – CA freespace
Number of records loaded per CA = Loaded CIs in CA * No of recs in CI
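As a rough sketch, the CA-level arithmetic above looks like this. The function name and return layout are hypothetical; the rounding up of free CIs follows the slide.

```python
def ca_load(cis_per_ca, ca_freespace_pct, records_per_ci):
    """Return (free_cis, loaded_cis, records_loaded) for one CA."""
    # CA freespace = CIs per CA * CA freespace %, rounded up
    # (integer ceiling, so no floating point surprises)
    free_cis = -(-cis_per_ca * ca_freespace_pct // 100)
    # CIs loaded per CA = CIs per CA - CA freespace
    loaded_cis = cis_per_ca - free_cis
    # Records loaded per CA = loaded CIs * records per CI
    return free_cis, loaded_cis, loaded_cis * records_per_ci
```

Using the 3390 figure of 180 4K CIs per cylinder, 5% CA freespace and 36 records per CI, `ca_load(180, 5, 36)` returns `(9, 171, 6156)`.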
VSAM Catalogs
Exactly one master catalog
  Assigned at IPL with DEF CAT or
  DEFINE MCAT IDCAMS command
User catalogs – 0 to many
  No more than 1 per volume
  Catalog can own multiple spaces on a volume
  Many catalogs can own space on a volume
VSAM Catalogs
Catalog contains:
  Self-describing records
  User catalog pointers
  Volume definitions
  Space definitions
  Cluster (file) definitions
  Component (data, index) definitions
  AIX & Path definitions
Catalog recommendations
Use naming conventions
  Name Cluster, Data and Index components explicitly
  Use partition / system independent names where applicable
Separate
  Files seldom defined or deleted
  Files often defined or deleted
  Online critical files
  Batch files
Multiple baskets – all the eggs won’t get broken
More recommendations
Don’t use recoverable catalogs
  Hangover from 2314 / 3330
Backup is vastly better
  IDCAMS, Faver, Maxback, Dr D, user-written …
CI & CA splits and freespace
You try to insert a record in a CI or extend a record already there
If there is enough free space in the CI, everyone moves up, record is inserted and CI rewritten
BUT what if there isn’t enough free space????
CI & CA splits
CI split – 4 physical IOs
  Set “Split in progress”, write CI
  Move half of records to new CI & write it
  Update sequence set, write index CI
  Erase moved records from old CI, turn off “Split in progress”, write old CI
BUT…..
Failure in CI split
System failure
  Corrected next time CI is updated
No free CI in the CA
  CA split is needed
Remember – 1 physical IO = 30,000 – 40,000 CPU instructions…
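The insert path described in the last few slides (room in the CI, CI split, or CA split when no free CI is left) can be sketched as a toy model. Everything here is hypothetical: a CA is a list of CIs, each CI is a sorted list of keys, `CI_CAPACITY` stands in for "does the record fit", and the index update and "split in progress" flag are left out.

```python
from bisect import insort

CI_CAPACITY = 4  # records that fit in one CI (made-up figure)


def insert(ca, ci_no, key):
    """Insert key into CI number ci_no of control area ca.

    Returns "inserted", "ci_split" or "ca_split_needed".
    """
    ci = ca[ci_no]
    if len(ci) < CI_CAPACITY:
        insort(ci, key)  # enough free space: records move up, CI rewritten
        return "inserted"
    # CI is full: look for a completely free CI in the same CA
    free = next((i for i, c in enumerate(ca) if not c), None)
    if free is None:
        return "ca_split_needed"  # no free CI left in this CA
    insort(ci, key)
    half = len(ci) // 2
    ca[free] = ci[half:]  # move the upper half of the records to the new CI
    del ci[half:]         # erase the moved records from the old CI
    return "ci_split"
```

After a split, both CIs are roughly half empty, which is why later inserts in the same key range go in without further splits.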
CA Split
MANY physical reads and writes
  Set “Split in progress”, write sequence set CI
  Maybe get new extent
  Format new CA at HURBA position
  Read / write half of CIs to new CA
  Write new sequence set CI for new CA
  Update higher level index CIs
  Erase moved CIs from old CA, write empty CIs
  Write updated original sequence set CI
Recommendations
Don’t worry about CI splits
Avoid excessive CA splits by defining CA freespace
Don’t do a reorg just because you have done n CI / CA splits
To reorg or not to reorg?
“We’ve done 1000 CA splits – better reorg!”
Inserts tend to be clustered
CI / CA split creates freespace where it is needed, allows faster inserts
Reorg gets rid of freespace, causing more CI / CA splits
My house
Buy a 3 bedroom house
Have 2 kids
Ma-in-law moves in – add a room
Ma-in-law moves out – demolish room
Have another kid – add a bedroom
Oldest kid goes to college – demolish bedroom
Oldest kid brings home girlfriend……
My KSDS
Get some space
Insert records causing CI splits
REORG!!
Delete some records, freeing space
REORG!!!
Add records, causing CA splits
REORG!!!!
Recommendations
Avoid frequent reorgs
Once a split has occurred, the processing cost has been paid
Don’t reorg to compress out free space
Reorgs
Understand your application
1 “hot spot”
  Little distributed freespace – let it split
Many hot spots
  Little distributed freespace – let it split
Even distribution – no hot spots
  Use distributed freespace
Freespace
•3% of each CI is empty
•5% of CIs in each CA are empty
•3% of 2048 = 61 bytes = 0 records (or, at most, 1)
•5% of 315 CIs per CA = 16 CIs
Freespace
3% CI freespace where CISZ=2048 and average LRECL=120
No room in this CI for an average length record
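A quick way to sanity-check a CI freespace percentage before using it in DEFINE is to see whether it leaves room for even one average record. The sketch below uses the slide's figures; the variable names are made up and the 3 byte RDF is ignored for simplicity.

```python
ci_size = 2048    # CISZ from the slide
pct = 3           # CI freespace percentage from the slide
avg_lrecl = 120   # average record length from the slide

# Bytes left free in each CI at load time (truncated to whole bytes)
free_bytes = ci_size * pct // 100

# How many average-length records could later be inserted into that space
insertable = free_bytes // avg_lrecl

# Smallest whole percentage that leaves room for one average record
# (integer ceiling of avg_lrecl * 100 / ci_size)
min_useful_pct = -(-avg_lrecl * 100 // ci_size)
```

Here `free_bytes` is 61 and `insertable` is 0, so the 3% buys nothing. `min_useful_pct` comes out as 6, the lowest setting at which the freespace could hold one 120 byte record.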
Altering freespace
Initial freespace set via DEFINE, e.g. 10% of CI and 5% of CA
If inserts are clustered, consider
  DEFINE with 0% freespace, then
  Load the “fixed” part of the file, then
  ALTER freespace to non-zero
  Load the “variable” part of the file
Freespace ain’t free space
Freespace is empty, not used
You still have to pay IBM for it
Strings
VSAM allows multiple concurrent processing e.g.
  CICS transactions
  Browsing
  Updating
Placeholders (“strings”) hold file location info
Shared / non-shared resources
Non-shared resources (NSR)
  Each string has its own buffers
  Multiple copies of a CI may be in memory
  Works well for batch
Local Shared Resources (LSR)
  Many strings share a pool of buffers
  Only 1 copy of a CI in the pool
  Ideal for online
Recommendations - NSR
Non-shared resources
Each string must have enough index buffers
  Bad – 1 buffer (old default)
  OK – 1 buffer per index level (new default)
  Good – enough buffers for all high level indexes + 1 more
  Best – enough buffers to hold entire index
Recommendations - LSR
Local Shared Resource buffers
  Same index buffer needs as NSR (buffers are per pool, not per string)
  Monitor VSAM LSR stats to make sure BUFNI keeps up with index growth
  Monitor data buffers for high hit rates
IO with NSR
VSAM uses chained IO to read ahead and write behind
  Better to read many CIs in one IO
Block big
  Large CI sizes
  Be aware that VSAM will split CIs into smaller blocks to save space
    E.g. 3390 with 32K CI gets written as 2 x 16K blocks giving 1.5 CIs = 48K/track
Buffer big
  ½ to 1 cyl of BUFND to minimize IO
IO with LSR
VSAM reads 1 CI at a time, even for sequential processing
Monitor your stats
LISTCAT before and after critical job
  Data & Index EXCPs – the fewer the better. Index EXCPs should be close to number of index CIs.
Job Accounting data
  IO count by device
  Overall CPU & IO activity
CICS stats
  Shows logical / physical IO counts by file
  LSR pool hits and misses
VSAM buffer stats – in VSE/ESA examples doc
LSR is in 31 bit – use LOTS but don’t page
Sharing VSAM datasets
VSAM can share files among partitions
And among VSE systems
BUT
  TANSTAAFL (Robert Heinlein)
  Sharing is not a performance option (Dan Janda)
  It’s your gun and your foot (Steve Huggins)
Sharing VSAM datasets
Sharing is based on
  The type of sharing you ask for (SHAREOPTIONS)
  VSE Lock Table within a single VSE system
  VSE Lock File when sharing across VSE systems
VSE sharing mechanism is not compatible with zOS or zVM
Sharing VSAM datasets
Sharing at OPEN / CLOSE time
  Entries checked and placed in / removed from lock table
  If DASD volume is added as shared (ADD cuu,SHR), it is added to lock file
VSE & VSAM allow concurrent processing but protect against concurrent updates messing up the file
Sharing VSAM datasets
Integrity classes – your choice
  NO INTEGRITY – VSE & VSAM provide no data protection: it’s all up to you. Your data can be messed up.
  WRITE INTEGRITY – VSE & VSAM protect against concurrent updates
  READ INTEGRITY – VSE & VSAM make sure your programs always see the latest version of a record
The price
  Higher levels & broader scopes of integrity lead to more CPU and IO activity
SHAREOPTIONS
Ready – Fire – Aim
Set in DEFINE CLUSTER
Get it wrong & be prepared to suffer
If a disk drive isn’t shared between VSEs, don’t ADD it with SHR as this causes lock file IO
SHAREOPTIONS & Locking
SHR(1)     1 output OR many input
             External lock at OPEN, unlock at CLOSE
SHR(2)     1 output AND many input
             External lock at OPEN, unlock at CLOSE
SHR(3)     No checking or locking
             Prepare for garbage data
SHR(4)     Many output in one VSE & many input OPENs across all VSEs
             External lock at OPEN, unlock at CLOSE
             External lock at access, unlock at release
SHR(4 4)   Many output OPENs across all VSEs + many input OPENs
             Locks same as SHR(4)
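The table above boils down to a check made at OPEN time. The sketch below is a toy model of that check, not VSE's lock manager: the function name, the `(system, mode)` tuples and the string form of the share option are all made up, and it assumes the "1 output" limit of SHR(1) and SHR(2) is counted across all VSE systems.

```python
def open_allowed(shr, current, system, mode):
    """Decide whether a new OPEN would be accepted.

    shr     -- "1", "2", "3", "4" or "4 4"
    current -- list of (system, mode) tuples for existing OPENs
    system  -- name of the VSE system issuing the new OPEN
    mode    -- "input" or "output"
    """
    outputs = [s for s, m in current if m == "output"]
    if shr in ("3", "4 4"):
        return True  # SHR(3): no checking at all; SHR(4 4): anyone may open
    if shr == "1":
        # 1 output OR many input: output needs the file to itself
        return not current if mode == "output" else not outputs
    if shr == "2":
        # 1 output AND many input
        return mode == "input" or not outputs
    if shr == "4":
        # many outputs, but all from one VSE system
        return mode == "input" or all(s == system for s in outputs)
    raise ValueError("unknown share option: " + shr)
```

For example, with one existing output OPEN from a system called VSE1, a second output OPEN from VSE2 is refused under SHR(2) and SHR(4) but accepted under SHR(3) and SHR(4 4).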
Alternate indexes (AIX)
An AIX is a VSAM KSDS, acting as a “pointer file” for another file
Target file (“Base Cluster”) can be
  KSDS – pointers are KSDS key values
  ESDS – pointers are Relative Byte Addrs
Great for multiple or non-unique keys
BUT
Processing via an AIX needs IO to both the AIX and to the base cluster
Setting up an AIX
DEFINE CLUSTER for base cluster
DEFINE AIX for the alternate index
  Give base cluster’s name & alternate key
  Data & Index CI sizes
DEFINE PATH
  Allows specifying of NOUPGRADE paths
BLDINDEX
  Reads primary & alternate key info from base cluster
  Sorts into alternate key sequence
  Loads alternate index
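The three BLDINDEX steps can be pictured with a toy model. This is only a sketch of the idea: the record layout, field names and `build_aix` function are invented, the base cluster is assumed to be a KSDS (so the pointers are primary keys), and a real AIX record has its own internal format.

```python
from itertools import groupby

# Toy base cluster (KSDS): primary key plus one alternate key field
base_cluster = [
    {"key": "0001", "surname": "SMITH"},
    {"key": "0002", "surname": "JONES"},
    {"key": "0003", "surname": "SMITH"},
]


def build_aix(records, primary="key", alternate="surname"):
    """Return {alternate key: [primary keys]} in alternate key sequence."""
    # 1. Read primary & alternate key info from the base cluster
    pairs = [(r[alternate], r[primary]) for r in records]
    # 2. Sort into alternate key sequence
    pairs.sort()
    # 3. Load the alternate index: one entry per alternate key,
    #    holding every pointer for a non-unique key
    return {
        alt: [prim for _, prim in group]
        for alt, group in groupby(pairs, key=lambda pair: pair[0])
    }
```

`build_aix(base_cluster)` gives `{"JONES": ["0002"], "SMITH": ["0001", "0003"]}`. Reading via the AIX means one lookup here and then a separate read of the base cluster for each pointer.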
AIX recommendations
To process the base cluster in AIX order, it is better to sort it and use the SORTOUT file
Remember VSAM processes base clusters directly based on AIX values
Base cluster will need lots of index buffers for batch processing.
Give Base cluster large BUFFERSPACE on DEFINE or ALTER
AIX and CICS
“SPHERE” – a base cluster and all the AIXs related to it
Requirements
  Each sphere must be wholly within one LSR pool
  Use Dataset Name Sharing
    In CICS 2.3, add BASE= to FCT entry for
      Base cluster file entry
      Each related path file entry
    This is automatic in CICS TS
  SHR(2) is usually best
  Make sure your CICS and VSAM service is current!
•MYTH 1 - RECOVERY is a good option for a dataset
•Oh yeah? RECOVERY makes it possible for you to write a recovery routine to restart loading.
•COPY 50,000 record KSDS-
•SPEED = 6 secs, 1512 I/Os
•RECOVERY = 10 secs, 1925 I/Os
•BUSTED!!!
•MYTH 2 – No need to sort before loading KSDS
•Load 100,000 record KSDS with Data-Miner COPY
•Elapsed = 7:11, CPU = 51”, EXCP = 294412, CIsplit = 2011, CAsplit = 63
•Sort to KSDS with CSI-Sort
•Elapsed = 0:27, CPU = 6”, EXCP = 4314, CIsplit = 0, CAsplit = 0
•BUSTED!!!
And now the most burning question of the day……
How do you delete an unwanted slide from a PowerPoint presentation?
Contacting the presenter
You can contact me by email at [email protected], if you want to find me this evening…
You’ll find me here