January 19, 2005 SGI® Altix™ Application I/O Reiner Vogelsang SGI GmbH [email protected]


Module Objectives

After you complete this module, you will be able to:
–Profile I/O from an application
–Use the lsof command to see what files are open
–Use strace to determine I/O characteristics
–List common I/O system calls
–Determine what types of I/O an application uses
–Explain the advantages of direct I/O or buffered I/O
–Determine default library buffer sizes
–Use FFIO to modify application I/O
–Use MPI-I/O essentials to modify application I/O


Characterize Application I/O

–Read vs write ratio
–Transfer size
–Positioning: sequential vs random
–Buffered vs direct
–sync, async
–Formatted vs unformatted
–Memory mapped mmap(2)
–read-write-write
–Bandwidth vs IOPs vs metadata
–fsync
–Parallel I/O


lsof Report

linux% /usr/sbin/lsof | fgrep mpi_IO

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME

.

.

mpi_IO 18066 reiner 0u CHR 136,0 2 /dev/pts/0

mpi_IO 18066 reiner 1w FIFO 0,9 30140 pipe

mpi_IO 18066 reiner 2w FIFO 0,9 30141 pipe

mpi_IO 18066 reiner 3u CHR 136,0 2 /dev/pts/0

mpi_IO 18066 reiner 4u CHR 136,0 2 /dev/pts/0

mpi_IO 18066 reiner 5w REG 8,19 0 436941586 /tmp/.arraysvcs/errlog0a900000425d04b3

mpi_IO 18066 reiner 6u CHR 10,59 134751937 /dev/xpmem

mpi_IO 18066 reiner 7u IPv4 30107 TCP dcm27.munich.sgi.com:32794->dcm27.munich.sgi.com:32792 (ESTABLISHED)

mpi_IO 18066 reiner 8u REG 252,0 17179869184 268438051 /mnt/fcscratch/reiner/matrix_8.dat

mpi_IO 18066 reiner 9u REG 252,0 17179869184 268438051 /mnt/fcscratch

.

.

.


I/O System Calls

–open    Open a file and return the file descriptor
–read    Read n bytes into user memory from a file
–write   Write n bytes to a file from user memory
–lseek   Position file offset pointer n bytes into a file
–pread   Read n bytes into user memory from a given file offset
–pwrite  Write n bytes from user memory to a file at a given offset
–readv   Read n bytes into user buffer vector from a file
–writev  Write n bytes to a file from user buffer vector
–close   Close the file and release pointers like fildes
–fcntl   Control file and file descriptor attributes
–ioctl   Control device-specific parameters
–mmap    Map a file into memory; the file is then accessed through memory references
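To make the difference between the positioned and the vectored calls concrete, here is a minimal sketch in C. The function name `syscall_demo`, the payload strings and the file path passed in by the caller are illustrative assumptions, not part of the original slides. It gathers two buffers with one `writev`, rewinds with `lseek`, and then shows that `pread` reads at an explicit offset without moving the file offset pointer.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: writes two buffers with one writev(2) call, then
 * reads 7 bytes back with pread(2). Returns 0 on success, -1 on error.
 * `out` must hold at least 8 bytes. */
int syscall_demo(const char *path, char *out, size_t out_len)
{
    char head[] = "HEAD";
    char body[] = "payload";
    struct iovec iov[2] = {
        { .iov_base = head, .iov_len = 4 },
        { .iov_base = body, .iov_len = 7 },
    };

    assert(out_len >= 8);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0640);
    if (fd < 0)
        return -1;

    /* writev gathers both buffers into one system call: 11 bytes. */
    if (writev(fd, iov, 2) != 11) { close(fd); return -1; }

    /* lseek moves the file offset pointer, here back to the start. */
    if (lseek(fd, 0, SEEK_SET) != 0) { close(fd); return -1; }

    /* pread reads 7 bytes at offset 4 and leaves the pointer alone. */
    if (pread(fd, out, 7, 4) != 7) { close(fd); return -1; }
    out[7] = '\0';

    /* The pointer is still at 0 because pread did not move it. */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos == 0 ? 0 : -1;
}
```

Running a program built around this function under strace should show exactly one `writev`, two `lseek` and one `pread` call on the file.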


I/O Hierarchy

[Figure: I/O hierarchy. User space holds a.out (text, stack, library, library buffer, data array); system calls cross into kernel space with the filesystem buffer cache (biosize); below that are the disk and its disk cache.]


Buffered I/O

–Default C I/O library
–Goes through file system buffer cache
  • Read ahead could be bad for small random I/O
–Lowest-bandwidth I/O path, but good latency for small I/O operations
  • bcopy is a bandwidth-restrictive routine
  • bcopy is used for memory-to-memory transfers
–Delayed write: data is written to disk by bdflush, pagebufd, xfsbuf
  • Lazy writes: the kernel chunks the data so it is ready for disk
  • Kernel is charged for the I/O; the program could already have exited
  • Sectorizes data presented to disk


Direct I/O

–Direct I/O gives better bandwidth than buffered I/O
–Bypasses the kernel filesystem buffer cache
  •Still uses filesystem control meta-data
–Direct DMA access to/from disks from/to user space
–Very low CPU utilization, high bandwidth
  •Often combined with asynchronous I/O
–XFS filesystems only
  •open (path, O_DIRECT)
  •ioctl checks instead of IRIX's way with an fcntl call
    »ioctl(fd, XFS_IOC_DIOINFO, &dioinfo)
–User alignment requirements
  • Whole multiple of the filesystem block size (mkfs -b)
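The alignment requirement is easiest to see in code. The following is a minimal sketch, not the method from the slides: it assumes a 4096-byte alignment (`DIO_ALIGN` is a made-up constant; the real value would come from the `XFS_IOC_DIOINFO` query, which is deliberately not used here), and the function name `direct_write` is hypothetical. If the filesystem refuses `O_DIRECT`, the sketch falls back to a buffered open so it still runs.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes O_DIRECT in <fcntl.h> on Linux */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0            /* assumption: no direct I/O, stay buffered */
#endif

#define DIO_ALIGN 4096        /* assumed block size, not queried */

/* Hypothetical helper: writes `nblocks` aligned blocks. Sets
 * *used_direct to 1 if the file was opened with O_DIRECT, else 0.
 * Returns the number of bytes written, or -1 on error. */
long direct_write(const char *path, int nblocks, int *used_direct)
{
    void *buf = NULL;
    long total = 0;

    assert(nblocks > 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0640);
    *used_direct = (fd >= 0) && (O_DIRECT != 0);
    if (fd < 0)               /* filesystem rejected O_DIRECT */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0640);
    if (fd < 0)
        return -1;

    /* Buffer address, transfer length and file offset must all be
     * multiples of the alignment for direct I/O. */
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', DIO_ALIGN);

    for (int i = 0; i < nblocks; i++) {
        ssize_t n = write(fd, buf, DIO_ALIGN);
        if (n != DIO_ALIGN) { total = -1; break; }
        total += n;
    }

    free(buf);
    close(fd);
    return total;
}
```

A buffer obtained from plain `malloc` would typically not satisfy the alignment, and the direct write would then fail with `EINVAL`.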


Direct I/O Alignment


Sequential I/O

–Uses C stdio buffering, 16KB library buffer by default
  • Changeable with library call setbuf(3)
–Each record is written with its length stored before and after the data
–Buffer characteristics same as those with stdio for C
–One user library buffer per logical unit
  • Unless readv or writev is used, which operates on a list of multiple buffers
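The length-before-and-after framing can be illustrated from the C side. This is a sketch under a stated assumption: the record markers are 4-byte unsigned integers in host byte order, which is a common compiler default but is not stated on the slide. The function names `write_record` and `read_record` are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes one record framed as <length> <payload> <length>.
 * Assumption: 4-byte markers in host byte order. Returns 0 on success. */
int write_record(FILE *fp, const void *data, uint32_t len)
{
    if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
    if (len > 0 && fwrite(data, 1, len, fp) != len) return -1;
    if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
    return 0;
}

/* Reads the next record into buf. Returns the payload length, or -1 on
 * EOF, short read, a record larger than cap, or mismatched markers. */
long read_record(FILE *fp, void *buf, uint32_t cap)
{
    uint32_t head, tail;

    assert(buf != NULL);
    if (fread(&head, sizeof head, 1, fp) != 1) return -1;
    if (head > cap) return -1;
    if (head > 0 && fread(buf, 1, head, fp) != head) return -1;
    if (fread(&tail, sizeof tail, 1, fp) != 1) return -1;
    return head == tail ? (long)head : -1;
}
```

The trailing marker is what lets a reader step backwards over a record, and comparing it with the leading marker is a cheap consistency check.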


Direct Access Random I/O

•Record referencing WRITE(rec=,unit)
•Doesn’t use C stdio
•Random I/O with lseeks to a position in a file
  –Could be sequential rewinds or appends, watch the position
  –Located in memory of CPU that created it


Formatted Vs Unformatted

•Formatted
  –Human readable
  –Requires additional CPU time
  –Goes through file system buffer cache to sectorize
•Unformatted
  –Less CPU overhead
  –Post processed


C Unformatted I/O

•Default user library buffer is 16KB
  –Can be changed with setbuf(3)
•Writes store up to 16KB before the write system call is issued
•Read system call size is 16KB
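A minimal sketch of changing the library buffer follows. It uses `setvbuf(3)`, the sized variant documented on the same man page as `setbuf(3)`, because `setbuf` itself cannot set a size. The function name `buffered_write` and the record layout (one `double` per record) are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes `count` 8-byte records through a stdio
 * stream whose library buffer is `bufsize` bytes. Returns the number of
 * records written, or -1 on error. */
long buffered_write(const char *path, size_t bufsize, long count)
{
    assert(bufsize > 0);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    char *buf = malloc(bufsize);
    if (buf == NULL) { fclose(fp); return -1; }

    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(fp, buf, _IOFBF, bufsize) != 0) {
        fclose(fp);
        free(buf);
        return -1;
    }

    long written = 0;
    for (long i = 0; i < count; i++) {
        double rec = (double)i;
        if (fwrite(&rec, sizeof rec, 1, fp) != 1)
            break;
        written++;
    }

    /* fclose flushes what is left; free the buffer only afterwards. */
    if (fclose(fp) != 0)
        written = -1;
    free(buf);
    return written;
}
```

Tracing such a program with strace for different values of `bufsize` is a direct way to see how the library buffer size determines the size of each `write` system call.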


FORTRAN Unformatted I/O

•Direct access and sequential behave differently
  –open(form='unformatted',access='direct') uses Fortran buffering
  –open(form='unformatted') uses std C IO
•Not OpenMP safe
  –Places lock on logical unit
  –Multiple threads can write to multiple files connected to different logical units in parallel
  –Multiple threads can safely use the same logical unit
  –Multiple threads cannot safely write to different logical units associated with the same file


Synchronous I/O

•read(2), write(2) are synchronous calls
  –Process goes to sleep until the I/O is done
  –Good if there is nothing to do until the I/O is done
  –Writes are the exception: with delayed-write buffering the transfer to disk is truly asynchronous
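Because a buffered `write(2)` returns once the data is in the buffer cache, a program that needs the data on disk has to ask for it. A minimal sketch, with the hypothetical function name `durable_write`, combines the write with the `fsync` call listed among the I/O characteristics:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes msg to path and does not return until the
 * kernel reports the data is on stable storage. Returns 0 on success. */
int durable_write(const char *path, const char *msg)
{
    assert(msg != NULL);
    size_t len = strlen(msg);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0640);
    if (fd < 0)
        return -1;

    /* write(2) blocks only until the data is copied into the cache. */
    ssize_t n = write(fd, msg, len);
    if (n < 0 || (size_t)n != len) { close(fd); return -1; }

    /* fsync(2) blocks until the delayed write has actually happened. */
    if (fsync(fd) != 0) { close(fd); return -1; }

    return close(fd);
}
```

The cost of the `fsync` is the latency the delayed write would otherwise have hidden, so it is usually reserved for checkpoints rather than issued after every write.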


Asynchronous I/O

•POSIX 1003.1b
•Database (DBM) improves transaction performance
•FFIO also uses async I/O
•See info libc -> low level I/O -> Asynchronous
•Library Support
  –aio_read    Asynchronous I/O reads
  –aio_write   Asynchronous I/O writes
  –lio_listio  Queue arbitrary list of I/O requests
•XFS kernel support with SGI 2.6 Kernel


Foreign Dataset Conversion

•IRIX is Big Endian
•Linux/Intel is Little Endian
•Foreign Datasets, see ifort man page
•Intel 8.0 Compilers have a -convert option
  –big_endian
  –little_endian
  –cray
  –Vax - fdx, fgx, vaxd, vaxg
  –ibm
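For C programs, which have no `-convert` option, the big-endian to little-endian case comes down to reversing the bytes of each value. The sketch below shows this for 64-bit doubles; the function names `swap64` and `swap_doubles` are made up for the illustration, and the other formats in the list (Cray, VAX, IBM) need real floating-point conversion that is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverses the byte order of one 64-bit word. */
uint64_t swap64(uint64_t v)
{
    v = ((v & 0x00000000FFFFFFFFULL) << 32) | (v >> 32);
    v = ((v & 0x0000FFFF0000FFFFULL) << 16) |
        ((v >> 16) & 0x0000FFFF0000FFFFULL);
    v = ((v & 0x00FF00FF00FF00FFULL) << 8) |
        ((v >> 8) & 0x00FF00FF00FF00FFULL);
    return v;
}

/* Reverses the byte order of n doubles in place. On a little-endian
 * host this turns big-endian data into host order; applying it twice
 * restores the original values. */
void swap_doubles(double *x, size_t n)
{
    assert(sizeof(double) == sizeof(uint64_t));
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &x[i], sizeof bits);   /* avoid aliasing issues */
        bits = swap64(bits);
        memcpy(&x[i], &bits, sizeof bits);
    }
}
```

For Fortran sequential unformatted files the record length markers need the same treatment as the payload.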


Flexible File I/O Layering

•SGI ProPack contains the FFIO package
•Allows I/O attribute modification from the command line
•Carried over from Cray UNICOS and IRIX
•C programs must use ffopen, ffread, ffwrite to recognize assign attributes
•See Application Programmer I/O Guide for details
•NO INTEL FORTRAN I/O LIBRARY SUPPORT
  –However, you can use the --wrap loader option to override references to open, read, write, lseek
•NO FOREIGN DATASET CONVERSION


C FFIO Example

#include "ffio.h"
#include <fcntl.h>
#include <unistd.h>

/* Redirect the POSIX calls to their FFIO counterparts.
   No space between macro name and "(", or the macro takes no arguments. */
#define open(n,o,p)  ffopen(n,o,p)
#define close(f)     ffclose(f)
#define read(f,b,l)  ffread(f,b,l)
#define write(f,b,l) ffwrite(f,b,l)
#define lseek(f,o,w) ffseek(f,o,w)

int main(void)
{
    int fd, ret;
    char *data_ptr = "abcd";

    fd  = open("file.dat", O_RDWR|O_CREAT, 0640);
    ret = lseek(fd, 3001, SEEK_SET);   /* position at byte offset 3001 */
    ret = write(fd, data_ptr, 4);      /* write 4 bytes there */
    close(fd);
    return 0;
}


FF_IO Environment Variables

•FF_IO_LOGFILE            Profile statistics
•FF_IO_OPEN_DIAGS         Open diagnostics
•FF_IO_OPTS               Verbose configure FFIO
•FF_IO_DEFAULTS           Short configure FFIO
•FF_IO_AIO_THREADS        Num of IO threads
•FF_IO_AIO_LOCKS          Num of locks
•FF_IO_AIO_NUMUSERS       Num of users
•FF_IO_TRACE_FILE         Trace file
•FF_IO_RECOVER_CMD        I/O Recovery
•FF_IO_FILESIZEESTIMATE   Preallocate (Deferred)


eie Layer

•direct                 Layer style
•diag | nodiag          Diagnostics
•wb | nowb | hldwb      Write behind, hold WB
•save | scr             Save or remove file
•rls | norls            Release or norelease FD
•bpons | nobpons        Bypass
•pagesize               Page size
•numpages               Num of pages
•max_lead               Pages read-ahead
•share                  Private or shared
•stride                 Page stride (1 is default)
•alloc                  Prealloc (Deferred)


event Layer

•trace | notrace                    Trace I/O
•rtc | cpc                          RTC or CPU clock
•diag | nodiag | brief | summary    Report verbosity


FFIO Compile Example

Old approach:
•Fortran
ifort -Wl,-u _ffopen -Wl,--wrap open -Wl,--wrap open64 \
    -Wl,--wrap lseek64 -Wl,--wrap lseek -Wl,--wrap read \
    -Wl,--wrap write -Wl,--wrap close test.f \
    ./libeag_ffio.a ./libffio.a -lrt
•Read man page ld(1) with respect to --wrap
•C/C++
icc -D_LITTLE_ENDIAN -g \
    -o nastbio -Wl,-u _ffopen \
    -leag_ffio -lffio -lrt nastbio.c

New approach:
•Fortran
ifort test.f
•C/C++
icc -D_LITTLE_ENDIAN -g nastbio.c
setenv LD_PRELOAD /usr/local/lib/libFFIO.so
./a.out
.....


FFIO Examples

limit stacksize 655360

setenv FF_IO_OPTS '*.SCR*(eie.direct.diag.mbytes:1024:256:2:1:1:0,event.summary.mbytes.notrace)'

nast2001 jid=test mem=200m ...

export FF_IO_OPTS='cachea.mem:256:32:2:1,event.summary'

nastbio test.bio


FFIO eie Layer Numerics

setenv FF_IO_OPTS '*.SCR*(eie.direct:1024:256:2:1:1:0)'

The numeric fields, in order:
page_size: unit is 4k pages
num_page
max_lead: Number of pages read ahead.
share: 0: cache is private, 1: shared by a couple of files
stride
alloc: Cache requests alloc pages from the kernel if writes from the cache extend file. (On Altix deferred.)


Event Layer Report

event_close(SCRATCH16698 ) eie <-->syscall ( 39 mbytes)/( 0.12 s)= 315.29 mbytes/s
oflags=0x0000000000004242=RDWR+CREAT+TRUNC+DIRECT
sector size =4096(bytes)
cblks =0 cbits =0x0000000000000000
current file size =21 mbytes high water file size =21 mbytes

function  times   wall  all     mbytes     mbytes     min      max      avg      ill
          called  time  hidden  requested  delivered  request  request  request  formed
open      1       0.00
seek      5       0.00
writea    8       0.00  0       39         39         1        15       5        0
fcntl
recall
writea    8       0.12
other     6       0.00
flush     1       0.00
close     1       0.00
extends   4


FFIO Performance Example

•NASTRAN jobs under tight memory conditions
  –4 serial jobs on 4 CPUs and 64 GB memory. Each job needs ~16 GB.

•FFIO
job1:13615.13user 290.99system 5:24:32elapsed 71%CPU
job2:13576.73user 301.75system 5:14:24elapsed 73%CPU
job3:13576.88user 214.44system 4:59:04elapsed 76%CPU
job4:13562.33user 215.55system 5:00:03elapsed 76%CPU

•Linux I/O Buffer Cache
job1:10658.47user 4699.10system 7:11:44elapsed 59%CPU
job2:10519.47user 4728.90system 7:05:25elapsed 59%CPU
job3:10460.77user 4303.55system 6:45:08elapsed 60%CPU
job4:10460.63user 4317.87system 6:45:38elapsed 60%CPU


FFIO: When Should I Use It?

•You know the data access patterns of your application
•You are running your application under high memory pressure
  –Memory pages of the buffer cache are recycled for the memory demands of the application
  –Data sets exceed the memory available in your current working set
•Libraries like libnetcdf can cause ill-conditioned access patterns
  –Slicing and concatenating NetCDF files are performance hogs
  –NCO tools can be significantly enhanced using the cache layer of FFIO


MPI-I/O

•Implemented on top of all I/O schemes discussed so far:
  –synchronous
  –asynchronous
•Performs additional scheduling of tasks on collective I/O operations like reading/writing from/to one file
•Expect MPI-I/O files not to be portable between different hosts and MPI-I/O implementations


MPI-I/O

•I/O was not part of MPI-1
  –Asynchronous I/O only by means of different MPI tasks or AIO with threads
•MPI-2 provided an I/O interface implemented on top of the methods discussed
  –Asynchronous and synchronous I/O possible
  –Additional buffer management and locking mechanism to allow for collective operation on ONE file
  –MPI hints describe
    • File access methods and file system layout like stripe size and stripe unit
    • Array layout and sizes (if not given as a derived type)
    • Configuration of MPI internal buffer caches for data sieving
•Implementation on Altix based on ROMIO
  –http://www-unix.mcs.anl.gov/romio/
  –http://www-unix.mcs.anl.gov/~thakur/papers/romio-coll.pdf


MPI-I/O Essentials

•File handle manipulation
  –MPI_FILE_OPEN(comm, filename, amode, info, fh, ierror)
•Synchronous
  –MPI_FILE_WRITE(fh, buf, count, datatype, status, ierror)
  –MPI_FILE_READ(fh, buf, count, datatype, status, ierror)
•Asynchronous
  –MPI_FILE_IWRITE(fh, buf, count, datatype, request, ierror)
  –MPI_FILE_IREAD(fh, buf, count, datatype, request, ierror)
  –MPI_WAIT(request, status(MPI_STATUS_SIZE), ierror)
•Declaring access patterns, file system specs, etc
  –MPI_FILE_SET_VIEW(fh, disp, etype, filetype, datarep, info, ierror)
  –MPI_Info_create(info), MPI_Info_set(info, key, value), MPI_INFO_NULL


MPI-I/O And Direct I/O

•You can turn on direct I/O globally via the environment:
  –setenv MPI_DIRECT_READ true
  –setenv MPI_DIRECT_WRITE true
•Don't mix different I/O mechanisms during a run!


MPI I/O And Data Sieving

Reserved File Hints:
cb_block_size
cb_buffer_size
cb_nodes
collective_buffering


MPI-IO Tuning: Step 0

•Many independent, contiguous requests
  –No access information available to MPI system at runtime

MPI_File_open(MPI_COMM_SELF, "filename", ..., &fh);
for (i = 0; i < n_rows; i++) {
    MPI_File_seek(fh, ...);
    MPI_File_read(fh, row[i], ...);
}


MPI-IO Tuning: Step 1

•Many collective, contiguous requests
  –MPI implementation expects to see the same access patterns at multiple sites
  –Can lead to good read-ahead and prefetching decisions when the implementation sees patterns repeat at different processors

MPI_Info_set(info, "collective_buffering", "true");
MPI_File_open(MPI_COMM_WORLD, "filename", ..., info, &fh);
for (i = 0; i < n_rows; i++) {
    MPI_File_seek(fh, ...);
    MPI_File_read_all(fh, row[i], ...);   /* collective variant of MPI_File_read */
}


MPI-IO Tuning: Step 2

•Single independent, non-contiguous requests
  –Data sieving can be used
  –Based on an application-defined data type

MPI_Type_create_subarray(..., &subarray, ...);
MPI_Type_commit(&subarray);
MPI_File_open(MPI_COMM_SELF, "filename", ..., &fh);
MPI_File_set_view(fh, ..., subarray, ...);   /* the datatype is passed by value */
MPI_File_read(fh, local_array, ...);


MPI-IO Tuning: Step 3

•Single collective, non-contiguous requests
  –Data sieving can be used
  –Collective I/O can be used

MPI_Type_create_subarray(..., &subarray, ...);
MPI_Type_commit(&subarray);
MPI_File_open(MPI_COMM_WORLD, "filename", ..., &fh);
MPI_File_set_view(fh, ..., subarray, ...);   /* the datatype is passed by value */
MPI_File_read_all(fh, local_array, ...);     /* collective variant of MPI_File_read */


MPI-IO Performance

MPI-IO Benchmark

================

16384 MBytes written/read in 32 blocks a 64 MBytes

by 8 Processes to /fastfs/reiner

using the min. time of 5 repetitions

WRITE: Time 16.211s => 1010.675 MBytes/s

READ: Time 9.789s => 1673.723 MBytes/s


MPI-I/O Hints for Performance

void set_info(MPI_Info myinfo)
{
    char key[80], value[80];

    sprintf(key, "striping_unit");
    sprintf(value, "524288");
    MPI_Info_set(myinfo, key, value);

    sprintf(key, "striping_factor");
    sprintf(value, "16");
    MPI_Info_set(myinfo, key, value);

    sprintf(key, "collective_buffering");
    sprintf(value, "true");
    MPI_Info_set(myinfo, key, value);

    sprintf(key, "cb_block_size");
    sprintf(value, "131072");
    MPI_Info_set(myinfo, key, value);

    sprintf(key, "cb_buffer_size");
    sprintf(value, "1048576");
    MPI_Info_set(myinfo, key, value);

    sprintf(key, "cb_nodes");
    sprintf(value, "8");
    MPI_Info_set(myinfo, key, value);
}

main(){
    MPI_Barrier(MPI_COMM_WORLD);
    ...
    MPI_Info_create(&myinfo);
    set_info(myinfo);
    ...
}


strace Command

linux# strace cp install.log /tmp
execve("/bin/cp", ["cp", "install.log", "/tmp"], [/* 22 vars */]) = 0
uname({sys="Linux", node="pc-daw", ...}) = 0
brk(0) = 0x8054ea8
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=73450, ...}) = 0
old_mmap(NULL, 73450, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40013000
close(3) = 0
...snip.....
open("install.log", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=23685, ...}) = 0
open("/tmp/install.log", O_WRONLY|O_CREAT|O_LARGEFILE, 0100644) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=23685, ...}) = 0
read(3, "Installing 773 packages\n\nInstall"..., 4096) = 4096
write(4, "Installing 773 packages\n\nInstall"..., 4096) = 4096
read(3, "nstalling apmd-3.0.2-12.\nInstall"..., 4096) = 4096
write(4, "nstalling apmd-3.0.2-12.\nInstall"..., 4096) = 4096
read(3, "lling hotplug-2002_04_01-13.\nIns"..., 4096) = 4096
write(4, "lling hotplug-2002_04_01-13.\nIns"..., 4096) = 4096
read(3, "talling alchemist-1.0.24-4.\nInst"..., 4096) = 4096
write(4, "talling alchemist-1.0.24-4.\nInst"..., 4096) = 4096
read(3, "-3.\nInstalling gal-devel-0.19.2-"..., 4096) = 4096
write(4, "-3.\nInstalling gal-devel-0.19.2-"..., 4096) = 4096
read(3, "ksnapshot-3.0.3-3.\nInstalling kt"..., 4096) = 3205
write(4, "ksnapshot-3.0.3-3.\nInstalling kt"..., 3205) = 3205
read(3, "", 4096) = 0
close(4) = 0
close(3) = 0
_exit(0)
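The read/write pattern in the trace corresponds to a simple copy loop. The following is a sketch of such a loop, not the actual source of cp; the function name `copy_file` and the `CHUNK` constant are illustrative, with the 4096-byte transfer size taken from the trace above.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK 4096   /* matches the transfer size seen in the trace */

/* Hypothetical helper: copies src to dst with alternating read(2) and
 * write(2) calls. Returns the number of bytes copied, or -1 on error. */
long copy_file(const char *src, const char *dst)
{
    char buf[CHUNK];
    long total = 0;
    ssize_t n;

    assert(src != NULL && dst != NULL);

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }

    /* read returns 0 at end of file, which ends the loop. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                 /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) { close(in); close(out); return -1; }
            off += w;
        }
        total += n;
    }

    close(out);
    close(in);
    return n < 0 ? -1 : total;
}
```

For the 23685-byte install.log this gives five full 4096-byte transfers, one final transfer of 3205 bytes, and a last `read` that returns 0, which is the sequence visible in the trace.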


Tracing I/O

[Figure: the I/O hierarchy diagram (a.out with text, stack, library, library buffer and data array; biosize), shown twice and annotated with the point at which strace observes the application's system calls.]


Default Buffer Size History

1) Buffers set to the disk geometry (like sectors per track)
   C/H/S: Cylinder, Head, Sector
   Before there were system caches, slow on reuse

2) Cache pages set to the disk geometry (like sectors per track)
   When writing through buffer cache more than reading

3) Cache pages set to request size
   When cache is for reading more than writing
   ZBR and RAID devices have different geometries

4) Privately cache within user space
   Avoids someone else polluting what you have in cache
   Takes more user memory
   User time increases instead of system time

5) Negotiate with sync, stat, fcntl system calls
   Portable from one filesystem to another


Summary

•Know your I/O characteristics
  –cached vs direct
  –sync vs async
  –sequential vs random
•strace shows the read and write system calls
•FFIO in C can modify I/O characteristics without rewriting the application
  –FF_IO environment variables checked at run time
•FFIO provides user level cache management
•FFIO has a trace of library I/O calls


Additional References

•open(2), read(2), write(2), lseek(2), stat(2), fcntl(2), setbuf(3), ioctl(2), strace(1), lsof(1) man pages
•info libc
•info libc -> low level I/O -> Asynchronous
•Application Programmer I/O Guide 007-3695-004
