Outline

Introduction to File Caching
Page Cache and Virtual Memory System
File System Performance
Introduction to File Caching
File Caching

File caching is one of the most important features of a file system.
In traditional Unix, file system caching is implemented in the I/O subsystem, which keeps copies of recently read or written blocks in a block cache.
In Solaris, it is implemented in the virtual memory system.
Solaris Page Cache

The page cache is a method of caching file system data developed at Sun as part of the virtual memory system used by System V Release 4 Unix; it is now also used in Linux and Windows NT.
Major differences from the old caching method: it is dynamically sized and can use all memory that is not being used by applications, and it caches file blocks rather than disk blocks.
The key difference is that the page cache is a virtual file cache rather than a physical block cache.
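The virtual-versus-physical distinction can be made concrete by the lookup key each cache uses. The C sketch below is purely conceptual, not the actual kernel structures: the old buffer cache is indexed by device and disk block, while the page cache is indexed by file (vnode) and file offset.

#include <sys/types.h>

struct block_cache_key {     /* old buffer cache: physical */
        dev_t   device;      /* which disk */
        daddr_t disk_block;  /* which physical block on that disk */
};

struct page_cache_key {      /* page cache: virtual */
        void   *vnode;       /* which file (its vnode) */
        off_t   file_offset; /* which page-aligned offset in the file */
};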
The Solaris Page Cache

Two caches are involved: the block buffer cache, for internal file system data (metadata items: direct/indirect blocks, inodes), and the page cache, for file data.
Block Buffer Cache

Used for caching inodes and file metadata.
In old versions of Unix, it was fixed in size by the nbuf parameter, which specified the number of 512-byte buffers.
It is now dynamically sized: it can grow, nbuf buffers at a time, as needed, until it reaches a ceiling specified by the bufhwm parameter.
By default, it is allowed to grow until it uses 2 percent of physical memory.
We can look at the upper limit for the buffer cache with the sysdef command.
The sysdef command

# sysdef
*
* Tunable Parameters
*
 7757824  maximum memory allowed in buffer cache (bufhwm)
    5930  maximum number of processes (v.v_proc)
      99  maximum global priority in sys class (MAXCLSYSPRI)
    5925  maximum processes per user id (v.v_maxup)
      30  auto update time limit in seconds (NAUTOUP)
      25  page stealing low water mark (GPGSLO)
       5  fsflush run rate (FSFLUSHR)
      25  minimum resident memory for avoiding deadlock (MINARMEM)
      25  minimum swapable memory for avoiding deadlock (MINASMEM)
Buffer cache size needed

Rule of thumb: about 300 bytes per inode, and about 1 MB per 2 GB of files.
Example: a database system with 100 files totaling 100 GB of storage space, of which only 50 GB is accessed at the same time. We need:
100 * 300 bytes = 30 KB for inodes
50/2 * 1 MB = 25 MB for metadata (direct and indirect blocks)
On a system with 5 GB of physical memory, the default bufhwm (2 percent of memory) is 102 MB, which is more than enough. The same arithmetic is sketched below.
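The following C sketch reproduces the example's arithmetic; the 300-byte and 1-MB-per-2-GB figures are the slide's rules of thumb, and the file counts are the example's, not universal constants.

#include <stdio.h>

int main(void)
{
        long files       = 100;        /* files in the database */
        long active_gb   = 50;         /* GB accessed at the same time */
        long phys_mem_mb = 5 * 1024;   /* 5 GB of physical memory */

        long inode_bytes = files * 300;           /* 300 bytes per inode */
        long metadata_mb = active_gb / 2;         /* 1 MB per 2 GB of files */
        long bufhwm_mb   = phys_mem_mb * 2 / 100; /* default: 2% of memory */

        printf("inodes:   %ld bytes (~30 KB)\n", inode_bytes);
        printf("metadata: %ld MB\n", metadata_mb);      /* 25 MB */
        printf("default bufhwm: %ld MB\n", bufhwm_mb);  /* 102 MB */
        return 0;
}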
Monitor the buffer cache hit statistics

# sar -b 3 333
SunOS zangief 5.7 Generic sun4u    06/27/99

22:01:51 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
22:01:54       0    7118     100       0       0     100       0       0
22:01:57       0    7863     100       0       0     100       0       0
22:02:00       0    7931     100       0       0     100       0       0
22:02:03       0    7736     100       0       0     100       0       0
22:02:06       0    7643     100       0       0     100       0       0
22:02:09       0    7165     100       0       0     100       0       0
22:02:12       0    6306     100       8      25      68       0       0
22:02:15       0    8152     100       0       0     100       0       0
22:02:18       0    7893     100       0       0     100       0       0
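The %rcache column is derived from the two read counters: bread/s is physical reads and lread/s is logical reads, so the hit rate is (1 - bread/lread) x 100. A minimal sketch of that calculation:

#include <stdio.h>

/* Read-cache hit percentage from sar -b counters. */
static double rcache_pct(double bread, double lread)
{
        if (lread <= 0.0)
                return 100.0;   /* no logical reads, so no misses */
        return (1.0 - bread / lread) * 100.0;
}

int main(void)
{
        /* First sample above: 0 bread/s against 7118 lread/s. */
        printf("%%rcache = %.0f\n", rcache_pct(0.0, 7118.0));  /* 100 */
        return 0;
}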
Outline

Introduction to File Caching
Page Cache and Virtual Memory System
File System Performance
File system caching behavior

Physical memory is divided into pages, and a file is "paged in" page by page.
To read data from a file into memory, the virtual memory system reads in one page at a time.
The page scanner searches for the least recently used (LRU) pages and puts them back on the free list.
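This page-at-a-time behavior can be observed from user space by mapping a file and touching one byte per page; each first touch faults the corresponding page of the file into the page cache. A minimal POSIX sketch, with error handling trimmed and the file name purely illustrative:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        int fd = open("testfile", O_RDONLY);  /* illustrative file name */
        struct stat st;
        fstat(fd, &st);

        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        long pagesize = sysconf(_SC_PAGESIZE);  /* 8 KB on UltraSPARC */

        /* Touch one byte per page; each first touch pages in that
         * page of the file through the page cache. */
        volatile char sink = 0;
        for (off_t off = 0; off < st.st_size; off += pagesize)
                sink += p[off];

        munmap(p, st.st_size);
        close(fd);
        return 0;
}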
File System Paging Optimizations

Goal: reduce the amount of memory pressure.
Invoke free-behind with sequential access; free pages when free memory falls to lotsfree.
Limit the file system's use of the page cache with pages_before_pager (default 200 pages), which reflects the amount of memory above the point where the page scanner starts (lotsfree). When free memory falls to 1.6 megabytes (200 x 8-KB pages on UltraSPARC) above lotsfree, the file system throttles back its use of the page cache. The arithmetic is sketched below.
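The 1.6-megabyte figure follows from the defaults: 200 pages times 8 KB per page on UltraSPARC. A sketch of the arithmetic, using the lotsfree value that appears later in this deck as an example:

#include <stdio.h>

int main(void)
{
        long pages_before_pager = 200;    /* default */
        long page_size          = 8192;   /* 8 KB on UltraSPARC */
        long lotsfree           = 730;    /* example value from this deck */

        long margin_kb = pages_before_pager * page_size / 1024;  /* 1600 KB */
        long throttle  = lotsfree + pages_before_pager;          /* in pages */

        printf("throttle margin: %ld KB above lotsfree\n", margin_kb);
        printf("throttling starts below %ld free pages\n", throttle);
        return 0;
}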
File System Paging Optimizations (Cont.)

When free memory falls to lotsfree + pages_before_pager:
Solaris file systems free all pages after they are written.
UFS and NFS enable free-behind on sequential access.
NFS disables read-ahead.
NFS writes synchronously rather than asynchronously.
VxFS enables free-behind (some versions only).
Outline

Introduction to File Caching
Page Cache and Virtual Memory System
File System Performance
Paging affects the user's application

The page scanner puts too much pressure on the user application's private process memory.
If the scan rate is several hundred pages a second, the time the scanner takes to check whether a page has been accessed falls to a few seconds.
Any page that has not been used in the last few seconds will be taken.
This behavior negatively affects application performance.
Example

Consider an OLTP application that makes heavy use of the file system.
The database is generating file system I/O, making the page scanner actively steal pages from the system.
A user of the OLTP application pauses for 15 seconds to read the contents of a screen from the last transaction.
During this time, the page scanner finds that the pages associated with the user application have not been referenced and makes them available for stealing.
The pages are stolen; when the user types the next keystroke, he is forced to wait until the application is paged back in, usually several seconds.
Our user is forced to wait for the application to page in from the swap device, even though the application is running on a system with sufficient memory to keep all of it in physical memory!
The priority paging algorithm

Places a boundary around the file cache so that file system I/O does not cause unnecessary paging of applications.
Prioritizes the different types of pages in the page cache, in order of importance:
Highest: pages associated with executables and shared libraries, including application process memory (anonymous memory)
Lowest: regular file cache pages
As long as the system has sufficient memory, the scanner steals only pages associated with regular files. A simplified model of this policy is sketched below.
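The following C sketch is a toy model of the decision, not the actual kernel scanner code; it assumes a cachefree boundary sitting above lotsfree, as set up on the next slide.

/* Toy model of the priority paging decision -- illustrative only. */
typedef enum { PAGE_EXECUTABLE, PAGE_ANONYMOUS, PAGE_FILE } page_type_t;

int scanner_may_steal(page_type_t type, long freemem,
                      long lotsfree, long cachefree)
{
        if (freemem >= cachefree)
                return 0;                  /* ample memory: steal nothing */
        if (freemem > lotsfree)
                return type == PAGE_FILE;  /* mild pressure: file pages only */
        return 1;                          /* true shortage: any page type */
}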
Enable priority paging

Set the parameter priority_paging in /etc/system:

set priority_paging=1

To enable priority paging on a live 32-bit system, set the following with adb:

# adb -kw /dev/ksyms /dev/mem
lotsfree/D
lotsfree: 730             <- value of lotsfree is printed
cachefree/W 0t1460        <- insert 2 x value of lotsfree, preceded with 0t (decimal)
dyncachefree/W 0t1460     <- insert 2 x value of lotsfree, preceded with 0t (decimal)
cachefree/D
cachefree: 1460
dyncachefree/D
dyncachefree: 1460
Enable priority paging (Cont.)

To enable priority paging on a live 64-bit system, set the following with adb:

# adb -kw /dev/ksyms /dev/mem
lotsfree/E
lotsfree: 730             <- value of lotsfree is printed
cachefree/Z 0t1460        <- insert 2 x value of lotsfree, preceded with 0t (decimal)
dyncachefree/Z 0t1460     <- insert 2 x value of lotsfree, preceded with 0t (decimal)
cachefree/E
cachefree: 1460
dyncachefree/E
dyncachefree: 1460
Paging types

The execute bit associated with the address space distinguishes executable files from regular files.
Three paging types result: executable, application (anonymous), and file.
The memstat command's output is similar to that of vmstat, but with extra fields that differentiate the paging types: epi epo epf (executable), api apo apf (anonymous), and fpi fpo fpf (file), alongside the usual pi po fr sr.
Paging caused by an application memory shortage

# ./readtest testfile &
# memstat 3
memory ---------- paging ----------- -executable- -anonymous-- --filesys--- ---cpu-----
 free  re  mf  pi   po   fr   de sr   epi epo epf  api apo apf  fpi fpo fpf  us sy wt id
 2080  1   0   749  512  821  0  264  0   0   269  0   512 549  749 0   2    1  7  92 0
 1912  0   0   762  384  709  0  237  0   0   290  0   384 418  762 0   0    1  4  94 0
 1768  0   0   738  426  610  0  1235 0   0   133  0   426 434  738 0   42   4  14 82 0
 1920  0   2   781  469  821  0  479  0   0   218  0   469 525  781 0   77   24 54 22 0
 2048  0   0   754  514  786  0  195  0   0   152  0   512 597  754 2   37   1  8  91 0
 2024  0   0   741  600  850  0  228  0   0   101  0   597 693  741 2   56   1  8  91 0
 2064  0   1   757  426  589  0  143  0   0   72   8   426 498  749 0   18   1  7  92 0
Paging through the file system

# ./readtest testfile &
# memstat 3
memory ---------- paging ----------- -executable- -anonymous-- --filesys--- ---cpu-----
 free  re  mf   pi   po  fr   de sr    epi epo epf  api apo apf  fpi fpo fpf  us sy wt id
 3616  6   0    760  0   752  0  673   0   0   0    0   0   0    760 0   752  2  3  95 0
 3328  2   198  816  0   925  0  1265  0   0   0    0   0   0    816 0   925  2  10 88 0
 3656  4   195  765  0   792  0  263   0   0   0    2   0   0    762 0   792  7  11 83 0
 3712  4   0    757  0   792  0  186   0   0   0    0   0   0    757 0   792  1  9  91 0
 3704  3   0    770  0   789  0  203   0   0   0    0   0   0    770 0   789  0  5  95 0
 3704  4   0    757  0   805  0  205   0   0   0    0   0   0    757 0   805  2  6  92 0
 3704  4   0    778  0   805  0  266   0   0   0    0   0   0    778 0   805  1  6  93 0
Paging parameters affecting performance

When priority paging is enabled, the file system scan rate is higher.
High scan rates should therefore not be used as a factor for determining a memory shortage.
If file system activity is heavy, the default scanner parameters are insufficient and will limit file system performance.
Set the scanner parameters fastscan and maxpgio to allow the scanner to scan at a high enough rate to keep up with the file system.
Scanner parameters

fastscan: the number of pages per second the scanner can scan.
It defaults to 1/4 of memory per second, limited to 64 MB per second, which limits file system throughput.
When free memory is at lotsfree, the scanner runs at half of fastscan, limited to 32 MB per second.
If only 1/3 of the physical memory pages are file pages, the scanner will only be able to put 32/3, or about 11 MB per second, of memory on the free list, as sketched below.
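The 11-MB-per-second figure is just the 32-MB-per-second ceiling scaled by the fraction of scanned pages that are file pages; a one-line sketch:

#include <stdio.h>

int main(void)
{
        double scan_mb_s      = 32.0;      /* fastscan/2 ceiling at lotsfree */
        double file_page_frac = 1.0 / 3.0; /* share of pages that are file pages */

        /* Only file pages can be freed, so the achievable free rate
         * scales with the fraction of file pages the scanner meets. */
        printf("freeable: ~%.0f MB/s\n", scan_mb_s * file_page_frac);  /* ~11 */
        return 0;
}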
Scanner parameters (Cont.)

maxpgio: the maximum number of pages per second the page scanner can push.
It limits the write performance of the file system.
If memory is sufficient, set maxpgio large, e.g., 1024.
Example: on a 4-GB machine:

set fastscan=131072
set handspreadpages=131072
set maxpgio=1024
Direct I/O

Unbuffered I/O: bypasses the file system page cache.
UFS direct I/O allows reads and writes to files in a regular file system to bypass the page cache and access the file at near raw-disk performance.
It can be advantageous when accessing a file in a manner where caching is of no benefit, e.g., copying a very large file from one disk to another.
It eliminates the double copy that is performed when the read and write system calls are used, by arranging for the DMA transfer to occur directly into the user's address space.
Enable direct I/O

Direct I/O will only bypass the buffer cache if all of the following are true:
The file is not memory mapped.
The file is not on a logging file system.
The file does not have holes.
The read/write is sector aligned (512 bytes).

Enable direct I/O by mounting an entire file system with the forcedirectio mount option:

# mount -o forcedirectio /dev/dsk/c0t0d0s6 /u1

or with the directio system call, on a per-file basis:

int directio(int fildes, int advice);   /* advice is DIRECTIO_ON or DIRECTIO_OFF */

A usage sketch follows.
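A short C sketch of the per-file interface, with an illustrative path; directio(3C) is advisory, so the call can fail on file systems that do not support it.

#include <sys/types.h>
#include <sys/fcntl.h>   /* directio(), DIRECTIO_ON, DIRECTIO_OFF (Solaris) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/u1/bigfile.dat", O_RDONLY);  /* illustrative path */
        if (fd == -1) {
                perror("open");
                return 1;
        }

        /* Ask UFS to bypass the page cache for this file. */
        if (directio(fd, DIRECTIO_ON) == -1)
                perror("directio");  /* e.g., unsupported file system */

        /* ... sector-aligned reads here go direct to disk ... */

        (void) directio(fd, DIRECTIO_OFF);  /* back to normal cached I/O */
        (void) close(fd);
        return 0;
}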
UFS direct I/O

Direct I/O can provide extremely fast transfers when moving data with big block sizes (>64 KB), but it can be a significant performance limitation for smaller sizes.
The structure ufs_directio_kstats holds the direct I/O statistics:

struct ufs_directio_kstats {
        uint_t logical_reads;   /* Number of fs read operations */
        uint_t phys_reads;      /* Number of physical reads */
        uint_t hole_reads;      /* Number of reads from holes */
        uint_t nread;           /* Physical bytes read */
        uint_t logical_writes;  /* Number of fs write operations */
        uint_t phys_writes;     /* Number of physical writes */
        uint_t nwritten;        /* Physical bytes written */
        uint_t nflushes;        /* Number of times cache was cleared */
} ufs_directio_kstats;
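These counters can also be read programmatically with libkstat. In the sketch below, the module and name strings passed to kstat_lookup() are assumptions (the name mirrors what netstat -k reports); verify the actual names with kstat(1M) before relying on this.

/* Sketch: read the UFS direct I/O counters via libkstat (Solaris).
 * Build with:  cc read_dio.c -lkstat
 * The "ufs" / "ufs_directio_kstats" strings are assumptions. */
#include <sys/types.h>
#include <kstat.h>
#include <stdio.h>

/* Local copy of the structure shown above. */
struct ufs_directio_kstats {
        uint_t logical_reads;
        uint_t phys_reads;
        uint_t hole_reads;
        uint_t nread;
        uint_t logical_writes;
        uint_t phys_writes;
        uint_t nwritten;
        uint_t nflushes;
};

int main(void)
{
        kstat_ctl_t *kc = kstat_open();
        if (kc == NULL) {
                perror("kstat_open");
                return 1;
        }

        kstat_t *ksp = kstat_lookup(kc, "ufs", 0, "ufs_directio_kstats");
        if (ksp == NULL || kstat_read(kc, ksp, NULL) == -1) {
                fprintf(stderr, "direct I/O kstat not found\n");
                kstat_close(kc);
                return 1;
        }

        /* Raw kstat: ks_data points at the counter structure. */
        struct ufs_directio_kstats *dio = ksp->ks_data;
        printf("logical reads %u, physical reads %u\n",
               dio->logical_reads, dio->phys_reads);
        printf("logical writes %u, physical writes %u\n",
               dio->logical_writes, dio->phys_writes);

        kstat_close(kc);
        return 0;
}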
Directory Name Cache

The DNLC (Directory Name Lookup Cache) caches path names for vnodes.
Each time we find the path name for a vnode, we store it in the DNLC.
ncsize, a system-tunable parameter set at boot time, sets the number of entries in the DNLC:
ncsize = (17 * maxusers) + 90 in Solaris 2.4, 2.5, 2.5.1
ncsize = (68 * maxusers) + 360 in Solaris 2.6, 2.7
maxusers is equal to the number of megabytes of memory installed in the system, up to a maximum of 1024; it can also be overridden to 2048.
Hit rate: the number of times a name was looked up and found in the name cache.
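The two formulas translate directly into code; the sketch below computes ncsize for both generations given maxusers (512 here, i.e., a 512-MB machine, purely as an example).

#include <stdio.h>

/* DNLC size as a function of maxusers, per the formulas above. */
static long ncsize_s251(long maxusers) { return 17 * maxusers + 90;  }
static long ncsize_s26(long maxusers)  { return 68 * maxusers + 360; }

int main(void)
{
        long maxusers = 512;  /* e.g., 512 MB of installed memory */
        printf("Solaris 2.4-2.5.1: ncsize = %ld\n", ncsize_s251(maxusers)); /*  8794 */
        printf("Solaris 2.6-2.7:   ncsize = %ld\n", ncsize_s26(maxusers));  /* 35176 */
        return 0;
}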
Inode Caches

UFS keeps a number of inodes in memory to minimize disk inode reads and to keep each inode's vnode in memory.
ufs_ninode sizes the tables for the expected number of inodes and affects the number of inodes kept in memory.
How UFS maintains inodes:
Inodes are created when a file is first referenced.
States: referenced, or on an idle queue.
Inodes are destroyed when pushed off the end of the idle queue.
Inode Caches (Cont.)

The number of inodes in memory is dynamic: there is no upper bound on the number of inodes open at a time.
The idle queue: when an inode is no longer referenced, it is placed on the idle queue. The queue's size is controlled by the ufs_ninode parameter and is limited to 1/4 of ufs_ninode.
An inode also stays in memory while it is still referred to by another subsystem.
Inode Caches (Cont.)

# sar -v 3 3
SunOS devhome 5.7 Generic sun4u    08/01/99

11:38:09  proc-sz    ov  inod-sz      ov  file-sz  ov  lock-sz
11:38:12  100/5930   0   37181/37181  0   603/603  0   0/0
11:38:15  100/5930   0   37181/37181  0   603/603  0   0/0
11:38:18  101/5930   0   37181/37181  0   607/607  0   0/0

# netstat -k ufs_inode_cache
ufs_inode_cache:
buf_size 440 align 8 chunk_size 440 slab_size 8192 alloc 1221573 alloc_fail 0
free 1188468 depot_alloc 19957 depot_free 21230 depot_contention 18 global_alloc 48330
global_free 7823 buf_constructed 3325 buf_avail 3678 buf_inuse 37182
buf_total 40860 buf_max 40860 slab_create 2270 slab_destroy 0 memory_class 0
hash_size 0 hash_lookup_depth 0 hash_rescale 0 full_magazines 219
empty_magazines 332 magazine_size 15 alloc_from_cpu0 579706 free_to_cpu0 588106
buf_avail_cpu0 15 alloc_from_cpu1 573580 free_to_cpu1 571309 buf_avail_cpu1 25
Inode Caches (Cont.)

A hash table is used to look up inodes; its size is also controlled by ufs_ninode.
By default, ufs_ninode is set to the size of the directory name cache (ncsize).
To set ufs_ninode separately, add to /etc/system:

set ufs_ninode = new_value