
HP-UX 11i LVM Mirroring Features and Multi-threads by Dusan Baljevic


Dusan Baljevic

a) LVM mirror consistency recovery uses two methods:

MWC (Mirror Write Cache)
MCR (Mirror Consistency Record)

MCR and MWC are methods of keeping mirrors in synch and tracking writes to disk.

MCR is kept on the disks, in Volume Group Restricted Area (VGRA).

MWC is kept in core memory.

MWC/MCR runs permanently, with the MWC in memory communicating with the MCR on disk.

This has a cost in run-time performance, but it is used because it allows quick recovery after a crash.

b) The purpose of the mirror write consistency cache (MWC) is to provide a list of possibly out of sync mirrored areas. When a volume group is activated, the LVM copies all areas with an entry in the MWC from one of the good copies to all the other copies. This ensures that the mirrors are consistent, but makes no claims about the quality of the data.

c) On each write request to a mirrored logical volume that requests MWC, the LVM checks to see if there is already an entry for the data area in the current MWC. If so, it just sends the write to the underlying device driver. If there isn't an entry, it gets one and then waits for the now updated MWC to be written to disk.

So each write to one of these logical volumes can introduce one extra serial disk access. Whether this occurs depends on how random the accesses are: the more random, the higher the probability of missing the MWC!
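The effect of access randomness can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name mwc_extra_writes is made up, the cache is modelled as a fixed number of entries keyed by a region number, and entries are reused in FIFO order. The real LVM replacement policy and entry granularity are not described here, and the model ignores the queueing of requests waiting for a free entry.

```shell
# Toy model only, not LVM code: count how many writes would need an extra
# MWC update on disk before the data write can be issued.
# Usage: mwc_extra_writes CACHE_ENTRIES REGION [REGION ...]
mwc_extra_writes() {
    size=$1
    shift
    printf '%s\n' "$@" | awk -v size="$size" '
        BEGIN { head = 0; tail = 0; count = 0; misses = 0 }
        NF == 0 { next }                    # no regions given
        {
            if ($1 in cached) next          # hit: write goes straight to disk
            misses++                        # miss: MWC is updated on disk first
            if (count == size) {            # cache full: reuse the oldest entry (FIFO assumed)
                delete cached[order[head]]
                delete order[head]
                head++
            } else {
                count++
            }
            order[tail++] = $1
            cached[$1] = 1
        }
        END { print misses }'
}

mwc_extra_writes 4 1 1 1 1 2 2 2 2    # clustered writes: prints 2
mwc_extra_writes 4 1 2 3 4 5 1 2 3    # scattered writes: prints 8
```

With the same number of writes and the same cache size, the clustered pattern pays for two extra disk accesses and the scattered pattern pays for eight.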

d) Getting an MWC entry can involve waiting for one to be available. If all the MWC entries are currently being used by I/O in progress, a given request might have to wait in a queue of requests until an entry becomes available.

Notice that the MWC entry is never freed on disk when a request returns to the LVM; it is merely marked as available to be used by another outgoing request.

e) Whether or not you use the MWC will depend on which aspect of system performance is more important to your environment:

run-time or recovery-time

You can disable MWC to improve run-time performance. The price is that the entire data space will be resynched after a crash. This may be done when a database is doing transaction logging for itself.


f) You can disable both MCR and MWC only if the application can maintain mirror consistency itself (for example, database)! Mirrors will not be resynched by LVM after a crash.

MWC disabled gives better I/O performance.

If MCR is also disabled, the mirrors will not synch at reboot. It is up to you to decide whether you want these features in use or not.

With MCR enabled (that is the default), the LVM will not keep run-time records of modified extents as MWC does, but in the event of a crash (followed by reboot and re-activation), the LVM will copy all extents from one non-stale copy of the mirror to all other mirrored copies of that extent. This is similar to the synchronization strategy used by DataPair/UX. The "good" copy of the data is chosen arbitrarily from the non-stale extents as there is no record kept as to which disk has the most recent copy of the data, so if a mirrored write is in progress during a crash, it is possible that old data could be copied over new data during the mirrored recovery at activation time. If this behavior is unacceptable, MWC should be chosen. For example, this behavior would be preferred in situations where a database will re-write all incomplete transactions after a crash, but relies on the file system as underlying structures: the consistent mirrors will allow fsck to cleanly fix the file system, after which the database can update any of its out-of-date data files.

g) If both mirrors are enabled, I/O is redirected to another mirror if one is busy - so it improves performance. This should balance the I/O cost of MWC. The cost of disabling MWC and MCR is a slower recovery after a crash.

h) In HP-UX 11.31, the MWC is larger in size than in previous releases. This leads to a better logical volume I/O performance by allowing more concurrent writes. MWC has also been enhanced to support large I/O sizes.

i) Logical volumes belonging to shared volume groups (those activated with "vgchange -a s") of LVM version 1.0 and 2.0 must have the consistency recovery set to NOMWC or NONE.

Versions 1.0 and 2.0 do not support MWC for logical volumes belonging to shared volume groups. This might have changed with some patches, but I did not check this yet...

With the September 2008 release of HP-UX 11i v3, LVM supports MWC for logical volumes belonging to LVM version 2.1 shared volume groups. This ensures faster recovery following a system crash.

j) Note that one cannot change MWC on an active logical volume. Here is an example for primary paging device (swap):

Problem: While attempting to disable the "Mirror Write Cache" and "Mirror Consistency" for primary swap (/dev/vg00/lvol2 ) which was mirrored, the following error message is shown:

The command used to modify logical volumes, /sbin/lvchange, has failed.

The stderr output from the command is shown below. The logical volume has not been modified.


lvchange: Could not change MirrorWriteCache while Logical Volume is opened or being synchronized.

Solution: Since primary swap is activated when the system boots, even in single user mode, the only way to successfully use lvchange on the primary swap logical volume is from LVM maintenance mode.

To boot into LVM maintenance mode, reboot the machine and interrupt the boot sequence.

> hpux -lm (PA-RISC)

Or

> boot -lm (IA64)

This will boot the machine into LVM maintenance mode. Use lvchange(1M) with the "-M" and "-c" options to modify the mirror write cache and consistency settings.

# lvchange -M n -c n /dev/vg00/lvol2
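The "-M" and "-c" switches combine into the three consistency recovery policies that lvdisplay(1M) reports. A minimal sketch, assuming the usual mapping (MWC is "-M y -c y", NOMWC is "-M n -c y", NONE is "-M n -c n"); the function name lvchange_for_policy is hypothetical, and it only prints the command so that nothing is modified until you run the output yourself:

```shell
# Hypothetical helper: print (do not run) the lvchange command for a policy.
# Usage: lvchange_for_policy MWC|NOMWC|NONE LV_PATH
lvchange_for_policy() {
    policy=$1
    lv=$2
    case "$policy" in
        MWC)   echo "lvchange -M y -c y $lv" ;;   # cache on, recovery on
        NOMWC) echo "lvchange -M n -c y $lv" ;;   # cache off, recovery on
        NONE)  echo "lvchange -M n -c n $lv" ;;   # cache off, recovery off
        *)     echo "unknown policy: $policy" >&2; return 1 ;;
    esac
}

lvchange_for_policy NONE /dev/vg00/lvol2
```

For the primary swap case above, the printed command is the same lvchange invocation shown for LVM maintenance mode.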

k) A quick check of the system's lvol configurations will show if this parameter is misconfigured. Assuming we are interested in vg00:

# lvdisplay /dev/vg00/lvol* | more

Look (or grep) for the lines which describe each lvol's "Consistency Recovery":

Consistency Recovery MWC
Consistency Recovery NOMWC
Consistency Recovery NONE

If the "Consistency Recovery" is set to NONE for anything other than a swap device (or a raw database volume as stated above), it will need to be changed. Note that if the lvol is not currently mirrored, this is not an issue, and can safely be ignored until the customer wants to mirror that lvol.

It doesn't hurt to change the parameter early, and it could prevent stumbling later if they forget about this problem by the time they go to mirroring.
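The check can be scripted rather than read by eye. A minimal sketch, assuming lvdisplay prints an "LV Name" line before each volume's "Consistency Recovery" line; the function name lvols_without_recovery is hypothetical, and it only reads text from standard input:

```shell
# Hypothetical filter: read lvdisplay output on stdin and print the LV Name
# of every logical volume whose "Consistency Recovery" is NONE.
lvols_without_recovery() {
    awk '
        /^[ \t]*LV Name/              { lv = $3 }
        /^[ \t]*Consistency Recovery/ { if ($3 == "NONE") print lv }
    '
}

# On a live system (not run here):
#   lvdisplay /dev/vg00/lvol* | lvols_without_recovery
```

Anything it lists that is not a swap device or a raw database volume is a candidate for changing before mirroring is set up.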

l) If we need to change the MWC for a logical volume that is already mirrored, the process is a little more complex.

After determining which mirrored logical volumes need to have their consistency recovery changed, the steps to take are: reduce the mirror to only one good copy (non-mirrored), change the consistency recovery parameter, then recreate the mirroring configuration.

The simplest way to reduce a mirroring configuration to one without mirroring is to use "lvreduce -m 0" to simply eliminate the mirror copies. Then use the lvchange(1M) to turn on consistency followed by lvextend(1M) to re-add the mirrors. This reduction will minimize downtime, as it can safely be done while the system is fully operational, but it has two drawbacks:


It allows the user less control over which copy of the mirror will remain, and it may require more reconstruction to recreate any specialized mirroring configuration such as striped extents.

Although the logical volume can remain in use during the operation, it would be best to avoid using the logical volume until integrity checks can be made on the data.

Another way of getting to a non-mirrored state is to split-off the mirrored copies using lvsplit(1M).
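The reduce, change, re-mirror sequence can be written down as a dry-run plan. A minimal sketch: the function name remirror_plan is hypothetical, it only prints the commands, and it assumes a single mirror copy is re-added unless a different count is given. It does not choose which physical volume holds the new copy.

```shell
# Hypothetical dry-run helper: print the three steps for changing the
# consistency recovery of an already-mirrored logical volume.
# Usage: remirror_plan LV_PATH MWC_FLAG CONSISTENCY_FLAG [MIRROR_COPIES]
remirror_plan() {
    lv=$1
    mwc=$2
    mcr=$3
    copies=${4:-1}
    echo "lvreduce -m 0 $lv"                # 1. drop the mirror copies
    echo "lvchange -M $mwc -c $mcr $lv"     # 2. change consistency recovery
    echo "lvextend -m $copies $lv"          # 3. re-add the mirror copies
}

remirror_plan /dev/vgapp/lvol1 y y
```

Review the printed commands before running them, since the first step leaves the data on a single copy until the last step has finished.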

m) If importing a volume group from a previous release of HP-UX, there will be a full resynchronization because the format of the MWC changed at HP-UX 11i v3. If the volume group contains mirrored logical volumes using MWC, LVM converts the MWC at import time. It also performs a complete resynchronization of all mirrored logical volumes, which can take substantial time.

n) Now, let's list some of typical rules for MWC:

Disable MWC and set MCR to "none" for the database logical volume because the database logging mechanism already provides consistency recovery.

Disable MWC and MCR on mirrored logical volumes where the data is not needed after a crash, such as paging device (swap space) or other raw scratch data.

Logical volumes containing database data or file systems with few or infrequently written large files (greater than 256K) must not use the MWC when runtime performance is more important than crash recovery time.

Use fast disks for the most intensive applications if they use mirrored logical volumes.

Ensure that all physical volumes for mirrored logical volumes are active because MWC and other I/O will be redirected to another mirror if one is busy, so it improves performance.

Spread the data space across as many physical volumes as possible.

The number of volume groups is directly related to the MWC. Since there is only one MWC per volume group, disk space that is used for many small random write requests must be kept in distinct volume groups if possible when the MWC is used.

If possible, ensure that physical volumes in volume groups that contain mirrored logical volumes reside on different controllers. For example, in a system with several disk devices on each card and several cards on each bus converter, create volume groups so that all disks off of one bus converter are in one group and all the disks on the other are in another group (one way is via physical volume groups). This configuration ensures that all mirrors are created with devices accessed through different I/O paths.


Since mirroring is typically used for root volume group only (these days all other data is on SAN), it is strongly recommended not to allow any third-party applications or software to run in it. I go to such an extreme that I even force customers to use their own areas for temporary files:

1. Set TMPDIR variable to point to some other non-boot-volume. I always encourage application admins to use their own areas for temporary files.

Some applications look at TMPDIR environment variable. Others look at two other variables: Try setting TEMP and TMP as well as TMPDIR.

2. Mount /tmp file system with "tmplog" option in /etc/fstab.

/tmp is DESIGNED for temporary files, so it should not be abused for other choices.

In "tmplog" mode, the intent log is almost always delayed. This improves performance, but recent changes may disappear if the system crashes.

3. Have /tmp cleaned up at boot time (not really a performance issue, but useful for maintenance, especially if the number of temporary files keeps growing). By default I always enable it in /etc/rc.config.d/clean_tmps:

CLEAR_TMP=1
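For item 1 above, the three variables can be set together in an application's start-up script or profile. A minimal sketch: the function name use_app_tmp and the example path are made up, and whether a given application honours TMPDIR, TEMP or TMP has to be checked per application.

```shell
# Hypothetical helper: point the common temp-dir variables at an
# application-owned directory instead of the boot volume group.
# Usage: use_app_tmp DIRECTORY
use_app_tmp() {
    dir=$1
    mkdir -p "$dir" || return 1     # create the directory if it is missing
    TMPDIR=$dir
    TEMP=$dir
    TMP=$dir
    export TMPDIR TEMP TMP
}

# Example call; /appdata/tmp is a made-up path on a non-boot volume:
#   use_app_tmp /appdata/tmp
```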

A final comment concerns multi-threaded synchronization of mirrors in LVM on HP-UX.

Option 1

lvsync(1M) recognizes the following option:

-T Perform mirror synchronization of logical volumes within a volume group using multiple parallel threads. Logical volumes belonging to different volume groups will be synchronized serially. It is possible that logical volumes start and/or complete their synchronization in a different order than specified on the command line. The maximum number of threads used can be controlled using the PTHREAD_THREADS_MAX system tunable. NOTE: This option has no effect if the volume group is activated in shared mode.

For example, you can extend the logical volumes and then issue parallel threads:

# lvextend -m 1 -s /dev/vgapp/lvol1
# lvextend -m 1 -s /dev/vgapp/lvol2
# lvextend -m 1 -s /dev/vgapp/lvol3
# lvsync -T /dev/vgapp/lvol1 /dev/vgapp/lvol2 /dev/vgapp/lvol3
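For a longer list of logical volumes the same pattern can be generated. A minimal sketch: the function name parallel_mirror_plan is hypothetical and it only prints the commands. It relies on "-s" telling lvextend to add the mirror copy without synchronizing it, so that all the copying is left to the single "lvsync -T" at the end.

```shell
# Hypothetical dry-run helper: print one "lvextend -m 1 -s" per logical
# volume, then a single "lvsync -T" covering all of them.
# Usage: parallel_mirror_plan LV_PATH [LV_PATH ...]
parallel_mirror_plan() {
    [ $# -gt 0 ] || return 1        # nothing to do without logical volumes
    for lv in "$@"; do
        echo "lvextend -m 1 -s $lv"
    done
    echo "lvsync -T $*"
}

parallel_mirror_plan /dev/vgapp/lvol1 /dev/vgapp/lvol2 /dev/vgapp/lvol3
```

Keep all logical volumes passed in one call within the same volume group, since volumes in different volume groups are synchronized serially.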

Option 2

Check the fragmentation of the file system that resides on the logical volumes you need to mirror. For example:

# fsadm -F vxfs -DEde -t 600 /mydata

… and take action if necessary.

Another piece of advice is to do it on weekends, when user activity decreases.

Note the following on HP-UX 11.31:

# getconf PTHREAD_THREADS_MAX
3002

# kctune -v max_thread_proc
Tunable             max_thread_proc
Description         Maximum number of threads in each process
Module              pm_proc
Current Value       3002
Value at Next Boot  3002
Value at Last Boot  3002
Default Value       256
Constraints         max_thread_proc >= 64
                    max_thread_proc <= nkthread
Can Change          Immediately or at Next Boot