NUG Meeting
File and Data Conversion
Jonathan Carter
NERSC User Services
[email protected]
510-486-7514

Introduction
Converting files and data for use on the IBM SP
    IBM uses IEEE data representation
    Industry standard Fortran unformatted file structure
Tools available on the Cray systems
Tools available on the IBM SP

Demand for File Conversion
Currently, CTSS text files
    ctou, rlib will be available on the IBM SP
After decommissioning the Cray Systems in October 2002
    Cray Fortran unformatted files
    Cray C binary files

Tools on the Cray Systems - FFIO
Flexible File I/O - general system of specifying how data should be written or read
Can be used without recompiling or linking (Fortran)
Can be changed at runtime
Various layers available to convert both file structure and data
Controlled via the assign command

assign Command
Can specify how I/O is done
    On a Fortran unit basis: assign -F f77 u:10
    On a filename basis: assign -F f77 f:filename
Common options
    Clear assigns: assign -R
    See current assigns in effect: assign -V

Fortran Unformatted Sequential-access Files
Cray uses a vendor-specific format called COS blocked, or simply blocked
IBM (and most Unix vendors) use f77 blocking
Use -F f77 option to have the FFIO f77 blocking layer used instead of the default COS blocking:
    assign -F f77 u:10
T3E already uses IEEE arithmetic, so -F f77 is sufficient
    Note that default real and integer data types on the T3E are 64 bit
SV1 data needs to be converted, so an IEEE conversion layer is needed
    -N ieee performs basic conversion
    assign -F f77 -N ieee f:filename

Fortran Unformatted Direct-access Files
Files are not blocked on Cray or IBM
Data conversion layers can be used as in sequential-access files for the SV1 machines
assign -N ieee u:20
T3E files don’t need any conversion
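For reference, the kind of work an IEEE conversion layer does for SV1 data can be sketched in Python. The sketch assumes the Cray 64-bit floating-point layout of one sign bit, a 15-bit exponent biased by 16384 (octal 40000), and a 48-bit mantissa with no hidden bit. The function names are hypothetical, and the FFIO layer itself may treat edge cases differently.

```python
import math
import struct

EXPONENT_BIAS = 0o40000  # 16384, the assumed Cray exponent bias


def cray_word_to_float(word):
    """Decode one 64-bit Cray floating-point word into a Python float (IEEE double)."""
    sign = word >> 63
    exponent = (word >> 48) & 0x7FFF
    mantissa = word & 0xFFFFFFFFFFFF  # 48 bits, binary point left of the top bit
    if mantissa == 0:
        return 0.0
    # value = 0.mantissa * 2**(exponent - bias). ldexp raises OverflowError for
    # magnitudes beyond the IEEE double range, which Cray values can exceed.
    value = math.ldexp(mantissa, exponent - EXPONENT_BIAS - 48)
    return -value if sign else value


def cray_bytes_to_floats(data):
    """Decode a byte string made of big-endian 64-bit Cray words."""
    words = struct.unpack(f">{len(data) // 8}Q", data)
    return [cray_word_to_float(w) for w in words]


one = cray_word_to_float(0x4001800000000000)          # 0.5 * 2**1
minus_three = cray_word_to_float(0xC002C00000000000)  # -(0.75 * 2**2)
```

Because the Cray exponent field is wider than the IEEE one, a conversion has to decide what to do with values that do not fit; this sketch simply lets the error propagate.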

C Binary Files
Files are not blocked on Cray or IBM
FFIO conversion layer not easy to use
Use library routines such as cry2cri

Using FFIO to Convert a File
Isolate the I/O statements for the file from the program to make a simple conversion program
Pair each read with a write
Use assign to have all written data converted, or use data conversion routines

Tools on the IBM SP - NCARU Library
Library developed by the SCD at NCAR
    Read COS blocked file
    Convert Cray data to IEEE data
Does not use Fortran API, so program modification is required
    Basic calls are crayopen, crayread, crayrew, crayback, crayclose
Calls to crayread can convert data if record is composed of one data type only, otherwise user must handle explicitly
    Conversion routines are ctodpf, ctospf, ctospi
Cray Fortran I/O sometimes inserts padding, user must handle explicitly

Using the NCARU Library
To use:
module load ncaru
xlf -o a.out b.f $NCARU
Limitations
    2GB limit for unblocked files
    Currently no 64 bit address space support
    Not thread-safe
    No support for 128 bit data

Dealing with Different Files
Open using the blocked option to crayopen for Fortran unformatted sequential access; open with the unblocked option for Fortran unformatted direct access
If written on the SV1 use conversion option on read, or call conversion routines directly
C binary files can be read by the unblocked I/O calls or by usual C I/O followed by data conversion routines

Records with Mixed Data Types
Read into a buffer and convert items one by one
    real x(50)
    integer n(50)
    real*8 buffer(100)
    ! open in blocked mode
    ifc = crayopen('filename',10,0)
    ! read record without converting
    nwds = crayread(ifc,buffer,100,0)
    ! convert data
    call ctospf(buffer,x,50)
    call ctospi(buffer(51),n,50)
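The same buffer-then-convert pattern can be sketched in Python for a record that is already in memory. This is an illustration only: it assumes big-endian 64-bit words, integers stored as 64-bit two's complement, and an integer conversion that narrows values to 32 bits. The helper name is hypothetical and is not part of the NCARU library. The real half is left as raw words, since decoding it depends on the Cray floating-point format.

```python
import struct

WORD = 8  # bytes per Cray word


def split_mixed_record(record, n_reals, n_ints):
    """Split a raw record into its real words (undecoded) and decoded integers."""
    expected = (n_reals + n_ints) * WORD
    if len(record) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(record)}")
    real_part = record[: n_reals * WORD]
    int_part = record[n_reals * WORD:]
    # Reals: keep the raw 64-bit words for a separate float conversion step.
    real_words = struct.unpack(f">{n_reals}Q", real_part)
    # Integers: decode as signed 64-bit, then check that they fit in 32 bits.
    ints = struct.unpack(f">{n_ints}q", int_part)
    for value in ints:
        if not -2**31 <= value < 2**31:
            raise OverflowError(f"{value} does not fit in a 32-bit integer")
    return list(real_words), list(ints)


# A small stand-in record: 2 real words followed by 3 integers.
raw = struct.pack(">2Q3q", 0x4001800000000000, 0, 7, -1, 123456)
real_words, ints = split_mixed_record(raw, n_reals=2, n_ints=3)
```

The point of reading without conversion first is that only the program knows where one data type ends and the next begins within the record.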

Data Padding
With Cray Fortran I/O, extra bytes are sometimes inserted into the user data.
Where padding occurs, bytes are inserted so that any datum of length 8 bytes starts at a byte offset, measured from the beginning of the record, that is a multiple of 8. The end of the record is then padded so that the whole record length is a multiple of 8.
Padding occurs only if you have used character variables whose lengths are not a multiple of 8, or real*4 or integer*4 data on the T3E (on the SV1 systems, 8 bytes are used).

Example
A Fortran record is written on an SV1:
real a(50)
integer n(50)
character*17 label
write(50) n, a, label
The elements of n and a are 8 bytes each, and label is 17 bytes. Within the Fortran record, n starts at offset 0, a at offset 400, and label at offset 800. The only padding that occurs is at the end of the record, where 7 bytes are added after the 817 bytes of data to make the total record length 824 bytes, which is a multiple of 8.

Example
A Fortran record is written on an SV1:
real a(50)
integer n(50)
character*17 label
write(50) label, n, a
Without padding, the alignments are label at offset 0, n at offset 17, and a at offset 417. Since n has elements of length 8 bytes, it must be written at an offset that is a multiple of 8 bytes; therefore a pad of 7 bytes is inserted between the end of label and the beginning of n. In the record that is written to the file, the alignments are label at offset 0, n at offset 24, and a at offset 424.

Example
A Fortran record is written on the T3E:
real a(40), b(40)
integer*4 n(13), m(13)
character*12 label
write(50) label, n, a, m, b
The data has lengths: label 12 bytes, n and m 52 bytes each, and a and b 320 bytes each. Without padding, the alignments are label at offset 0, n at offset 12, a at offset 64, m at offset 384, and b at offset 436. a and b need to be at offsets that are a multiple of 8 bytes; the offset of a is already correct, but 4 bytes must be inserted before b, so that it starts at offset 440.
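The padding rule can be checked with a short Python sketch that walks the items of a record in write order. It implements only the rule as stated in the Data Padding slide (align 8-byte data to a multiple of 8, then pad the record end); the function name and the tuple layout are assumptions made for illustration.

```python
def record_layout(items):
    """Return ({name: byte offset}, padded record length) for one written record.

    items: (name, element_size_in_bytes, element_count) tuples in write order.
    """
    offsets = {}
    position = 0
    for name, element_size, count in items:
        # Only 8-byte data is aligned; pad up to the next multiple of 8.
        if element_size == 8 and position % 8:
            position += 8 - position % 8
        offsets[name] = position
        position += element_size * count
    # Pad the end so that the whole record length is a multiple of 8.
    total = position + (-position) % 8
    return offsets, total


# SV1: write(50) n, a, label  (8-byte integers and reals, character*17)
sv1_first = record_layout([("n", 8, 50), ("a", 8, 50), ("label", 1, 17)])
# SV1: write(50) label, n, a  (items listed in the order of the write statement)
sv1_second = record_layout([("label", 1, 17), ("n", 8, 50), ("a", 8, 50)])
# T3E: write(50) label, n, a, m, b  (integer*4 n and m, 8-byte reals, character*12)
t3e = record_layout([("label", 1, 12), ("n", 4, 13), ("a", 8, 40),
                     ("m", 4, 13), ("b", 8, 40)])
```

For the T3E record this places b at offset 440 and gives a record length of 760 bytes, so no trailing pad is needed there.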

crayconv Utility
crayconv automatically converts files written on the SV1 to IBM compatible format
    Basic Fortran data types only
    Sequential access unformatted files only
    Possible problem if compiler option -Onofastint used, or integer*8 explicitly declared and written
        Integers over 2^46 not correctly interpreted
    Pad data not removed
Extension to T3E data and direct access unformatted files planned

More Information
http://hpcf.nersc.gov/computers/SP/ffio.html (by Mike Stewart)
http://hpcf.nersc.gov/computers/crayretire.html
man ncaru