
Introduction to Linux features for disk I/O

Linux on System z Performance Evaluation

Martin Kammerer, 3/22/11

Visit us at http://www.ibm.com/developerworks/linux/linux390/perf/index.html

© 2011 IBM Corporation


Considerations for applications

 Decide when, and for which data, to make I/O system calls
 Keep frequently used file content in user data space or, if needed, in shared memory – this is the best place to cache the data
 Choose the right size of the I/O request: it may be better to issue a few larger requests containing more data than to issue many I/O requests with little data (see the example below)
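The effect of request size can be seen from user space, for example with dd. A minimal sketch, assuming a scratch file on /mnt/data with enough free space (file name, mount point and sizes are illustrative); oflag=direct bypasses the page cache so that the chosen request size actually reaches the block layer:

  # write the same 1 GiB of data with many small requests and with few large ones
  dd if=/dev/zero of=/mnt/data/testfile bs=4k count=262144 oflag=direct   # 262144 requests of 4 KiB
  dd if=/dev/zero of=/mnt/data/testfile bs=1M count=1024 oflag=direct     # 1024 requests of 1 MiB

dd reports the elapsed time and throughput; the variant with larger requests typically completes with far fewer I/O operations per byte transferred.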


Linux kernel components involved in disk I/O

The components of the I/O stack and their tasks:

 Application program – issues reads and writes
 Virtual File System (VFS) – dispatches requests to the different devices and translates to sector addressing
 Page cache – contains all file I/O data; direct I/O bypasses the page cache
 Logical Volume Manager (LVM) – defines the physical-to-logical device relation
 Multipath – sets the multipath policies
 Device mapper (dm) – holds the generic mapping of all block devices and performs 1:n mapping for logical volumes and/or multipath
 I/O scheduler (in the block device layer) – merges, orders and queues requests, and starts the device drivers via (un)plug device
 Device drivers – handle the data transfer:
 – Direct Access Storage Device (DASD) driver
 – Small Computer System Interface (SCSI) driver
 – z Fibre Channel Protocol (zFCP) driver
 – Queued Direct I/O (qdio) driver


Logical volumes

 Linear logical volumes allow an easy extension of the file system

 Striped logical volumes (an example lvcreate invocation is sketched below)
 – Provide the capability to perform simultaneous I/O on different stripes
 – Allow load balancing
 – Are also extendable
 – May lead to smaller I/O request sizes than linear or no logical volumes

 Don't use logical volumes for "/" or "/usr"
 – If the logical volume gets corrupted, your system is gone

 Logical volumes require more CPU cycles than physical disks
 – Consumption increases with the number of physical disks used for the logical volume
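As an illustration, a striped logical volume can be built with the LVM command line tools. A minimal sketch, assuming four DASD partitions /dev/dasdb1 through /dev/dasde1 are already formatted and online (device names, volume group name, stripe count and stripe size are examples only):

  pvcreate /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1          # prepare the physical volumes
  vgcreate vgdata /dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1   # combine them into a volume group
  lvcreate -i 4 -I 64 -L 10G -n lvstriped vgdata                    # -i = number of stripes, -I = stripe size in KiB
  lvcreate -L 10G -n lvlinear vgdata                                 # linear logical volume, for comparison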


Linux multipath

 Connects a volume over multiple independent paths and/or ports

 multibus mode
 – for load balancing
 – to overcome the bandwidth limitation of a single path
 – uses all paths in a priority group in a round-robin manner
 – after a configurable number of I/Os the path is switched
 • See rr_min_io in multipath.conf (a configuration sketch follows after this list)
 – method to increase throughput for
 • ECKD volumes with static PAV devices in Linux distributions older than SLES11 and RHEL6
 • SCSI volumes in all Linux distributions

 failover mode
 – for high availability
 – the volume remains reachable in case of a single point of failure in the connecting hardware
 – should one path fail, the operating system will route I/O through one of the remaining paths, with no changes visible to the applications
 – will not increase performance
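A hedged multipath.conf fragment illustrating both policies. This is a sketch only; keyword names and defaults differ between multipath-tools versions, so check the documentation of the installed level:

  defaults {
      path_grouping_policy  multibus   # use all paths of a priority group round-robin
      rr_min_io             100        # switch to the next path after 100 I/Os
  }

  # for a pure high-availability setup, failover would be chosen instead:
  # defaults {
  #     path_grouping_policy  failover
  # }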


Page cache – good or bad?

 the page cache helps Linux to economize I/O
 requires Linux memory pages
 changes its size depending on how much memory is used by processes
 writes are delayed, and data in the page cache can receive multiple updates before being written to disk
 write requests in the page cache can be merged into larger I/O requests
 reads can be made faster by adding a read-ahead quantity, depending on the historical behavior of file system accesses by applications
 consider the settings of the configuration parameters swappiness, dirty_ratio, dirty_background_ratio
 Linux does not know which data the application really needs next – it can only guess


Page cache administration

 /proc/sys/vm/swappiness
 – Is one of several parameters that influence the swap tendency
 – Can be set individually with percentage values from 0 to 100
 – The higher the swappiness value, the more the system will swap
 – A tuning knob to let the system prefer shrinking the Linux page cache over swapping processes
 – Different kernel levels may react differently to the same swappiness value

 pdflush
 – These processes write data out of the page cache
 – /proc/sys/vm/dirty_background_ratio
 • controls whether writes are done more steadily to the disk or more in a batch
 – /proc/sys/vm/dirty_ratio
 • if this limit is reached, writing to disk needs to happen immediately
 – Both can be set individually with percentage values from 0 to 100 (an example follows below)
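These parameters can be inspected and changed at run time through sysctl. A minimal sketch with illustrative values; suitable settings depend on workload and memory size:

  cat /proc/sys/vm/swappiness            # show the current swap tendency (often 60 by default)
  sysctl -w vm.swappiness=10             # prefer shrinking the page cache over swapping processes
  sysctl -w vm.dirty_background_ratio=5  # pdflush starts writing at 5% dirty pages
  sysctl -w vm.dirty_ratio=20            # writers must write out synchronously at 20% dirty pages
  # to make the values persistent, add the vm.* lines to /etc/sysctl.conf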


Consider using ...

 direct I/O:
 – bypasses the page cache
 – is a good choice wherever the application does not want Linux to economize I/O and/or buffers larger amounts of file content itself

 async I/O:
 – prevents the application from being blocked in the I/O system call until the I/O completes
 – allows Linux to merge reads when the page cache is used
 – can be combined with direct I/O

 temporary files:
 – should not reside on real disks; a RAM disk or tmpfs allows the fastest access to these files
 – since they don't need to survive a crash, don't place them on a journaling file system

 file system:
 – use ext3 and select the appropriate journaling mode (journal, ordered, writeback)
 – turning off atime is only suitable if no application makes decisions based on "last read" time; consider relatime instead
 (some illustrative mount examples follow below)
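A few illustrative mount invocations for these points. The devices, mount points and sizes are assumptions; the ext3 journaling mode and atime behaviour are selected per mount:

  mount -t tmpfs -o size=512m tmpfs /tmp              # keep temporary files in memory
  mount -t ext3 -o data=writeback /dev/dasdc1 /data   # journaling mode: journal | ordered (default) | writeback
  mount -o remount,relatime /home                     # update atime only if it is older than mtime/ctime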


Linux under z/VM

 Minidisk cache
 – requires extra storage in VM; can be in central storage, in expanded storage, or both
 – has a fixed size
 – contains data from all guest systems that have this option set on
 – can be great for shared minidisks with significant read I/O
 – not usable for SCSI LUNs or dedicated ECKD dasd
 – VM does not know which data the Linux guests really need next

 For very I/O-intensive Linux guests consider to
 – use dedicated devices
 – move to LPAR


Storage server

 provide a big cache to avoid real disk I/O
 provide Non-Volatile Storage (NVS) to ensure that cached data which has not yet been destaged to the disk devices is not lost in case of a power outage
 provide algorithms to optimize disk access and cache usage, optimizing for sequential I/O or for random I/O
 have the biggest challenge with heavy random writes, because these are limited by the destage speed once the NVS is full


DS8000 storage server – ranks

 The smallest unit is an array of disk drives
 A RAID 5 or RAID 10 technique is used to increase the reliability of the disk array, defining a rank
 The volumes for use by the host systems are configured on the rank
 Each volume has stripes on each disk drive
 All configured volumes share the same disk drives

[Diagram: disk drives combined by RAID technology into a rank, with volumes striped across the drives]


DS8000 storage server – components

[Diagram: server 0 and server 1, each connected through device adapters to ranks 1–8 on which the volumes are configured; each server has its own read cache and NVS]

 A rank
 – represents a RAID array of disk drives
 – is connected to the servers by device adapters
 – belongs to one server for primary access

 A device adapter connects ranks to servers

 Each server controls half of the storage server's disk drives, read cache and NVS


DS8000 storage server – extent pool

[Diagram: server 0 and server 1 with ranks 1–8 attached through device adapters; the ranks are grouped into extent pools, each configured either for ECKD or for SCSI volumes]

 Extent pools
 – consist of one or several ranks
 – can be configured for one type of volumes only: either ECKD dasds or SCSI LUNs
 – are assigned to one server

 Some possible extent pool definitions for ECKD and SCSI are shown in the diagram above


DS8000 storage pool striped volume

 A storage pool striped volume (rotate extents) is defined on an extent pool consisting of several ranks
 It is striped over the ranks in stripes of 1 GiB
 As shown in the example:
 – The 1 GiB stripes are rotated over all ranks
 – The first storage pool striped volume starts in rank 1, the next one would start in rank 2


DS8000 storage pool striped volume


 A storage pool striped volume
 – uses disk drives of multiple ranks
 – uses several device adapters
 – is bound to one server


DS8000 striped volumes comparison (1)

 Comparison of the DS8000 setup for Linux LVM striping vs. DS8000 storage pool striping:

 striping is done in: Linux vs. storage server
 disk placement: need to plan for vs. don't care
 effort to construct the volume: take care of picking subsequent disks from different ranks vs. configure the storage server
 usual disk sizes: LV = many disks, SCSI 20 GiB to 50 GiB, ECKD mod27 or mod54 vs. volume = 1 disk, SCSI unlimited (e.g. 500 GiB), ECKD max. 180 GiB (EAV)
 extent pool: 1 rank vs. multiple ranks
 maximum number of ranks for the constructed volume: total number of ranks vs. total number of one server side (50%)
 administrating disks within Linux: can be challenging (e.g. several hundred disks for a database) vs. simple
 volume extendable?: yes vs. no
 maximum I/O request size: stripe size (e.g. 64 KiB) vs. maximum provided by the device driver (e.g. 512 KiB)
 multipathing: SCSI multipath multibus or multipath failover, ECKD path group vs. SCSI multipath multibus, ECKD path group


DS8000 striped volumes comparison (2)


 Equivalent resource usage with
 – storage pool striped volumes

– volumes from single rank extents

Storage pool striping allows consolidating the space of several smaller volumes of single rank extents in one big volume

For volumes from single rank extents the operating system must take care of the striping


XIV storage server

 Volumes can't be placed in single ranks (i.e. one single interface module or data module)
 Offers only storage pool striping
 Stripe size is 1 MiB
 The volumes in the storage pool can use the entire storage server
 Currently max. 4 Gbps FCP links
 Provides only SCSI volumes
 Recommendation for Linux
 – Use multipath multibus to access the volumes


Disk I/O attachment and DS8000 storage subsystem

[Diagram: FICON Express channel paths 1–8 connect through a switch to host bus adapters 1–8 of the DS8000; each host bus adapter reaches both servers; each server connects through a device adapter to its ranks 1–8, which hold ECKD and SCSI volumes]

 Each disk is a configured volume in an extent pool (here extent pool = rank)
 Host bus adapters connect internally to both servers


Connection to the storage server

 FICON Express4 or FICON Express8: consider that not all ports of the same feature can be used simultaneously at full speed, both on the host side and on the storage server side
 Consider using switches which do not limit the bandwidth of the channels

[Diagram: System z connected through a switch to a DS8K; the legend distinguishes FICON ports, FCP ports and unused ports]


General layout for FICON/ECKD

[Diagram: the Linux I/O stack (application program, VFS, page cache, LVM, multipath, dm, I/O scheduler, block device layer, dasd driver) issues I/O through subchannels a–d in the channel subsystem; chpids 1–4 run through a switch to host bus adapters 1–4, which connect to both servers and their ranks]

The dasd driver starts the I/O on a subchannel

Each subchannel connects to all channel paths in the path group

Each channel connects via a switch to a host bus adapter

A host bus adapter connects to both servers

Each server connects to its ranks



FICON/ECKD dasd I/O to a single disk


Assume that subchannel a corresponds to disk 2 in rank 1

The full choice of host adapters can be used

Only one I/O can be issued at a time through subchannel a

All other I/Os need to be queued in the dasd driver and in the block device layer until the subchannel is no longer busy with the preceding I/O



FICON/ECKD dasd I/O to a single disk with PAV (SLES10 / RHEL5)


VFS sees one device

The device mapper sees the real device and all alias devices

Parallel Access Volumes solve the I/O queuing in front of the subchannel

Each alias device uses its own subchannel

Additional processor cycles are spent to do the load balancing in the device mapper

The next slowdown is the fact that only one disk is used in the storage server. This implies the use of only one rank, one device adapter, one server



FICON/ECKD dasd I/O to a single disk with HyperPAV (SLES11 / RHEL6)


VFS sees one device

The dasd driver sees the real device and all alias devices

Load balancing with HyperPAV and static PAV is done in the dasd driver. The aliases only need to be added to Linux (see the sketch after this list). The load balancing works better than on the device mapper layer.

Less additional processor cycles are needed than with Linux multipath.

The next slowdown is the fact that only one disk is used in the storage server. This implies the use of only one rank, one device adapter, one server
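A hedged example of making a base DASD and its HyperPAV aliases available to Linux; the device bus-IDs 0.0.7000 (base) and 0.0.70f0/0.0.70f1 (aliases) are assumptions, and the alias-to-base assignment itself is configured on the storage server:

  chccwdev -e 0.0.7000              # set the base DASD online
  chccwdev -e 0.0.70f0,0.0.70f1     # set the HyperPAV alias devices online
  lsdasd                            # aliases show up with status "alias"; the dasd driver balances I/O over them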



FICON/ECKD dasd I/O to a linear or striped logical volume


VFS sees one device (logical volume)

The device mapper sees the logical volume and the physical volumes

Additional processor cycles are spent to map the I/Os to the physical volumes.

Striped logical volumes require more additional processor cycles than linear logical volumes

With a striped logical volume the I/Os can be well balanced over the entire storage server and overcome limitations from a single rank, a single device adapter or a single server

To ensure that I/O to one physical disk is not limited by one subchannel, PAV or HyperPAV should be used in combination with logical volumes



FICON/ECKD dasd I/O to a storage pool striped volume with HyperPAV (SLES11)


A storage pool striped volume only makes sense in combination with PAV or HyperPAV

VFS sees one device

The dasd driver sees the real device and all alias devices

The storage pool striped volume spans over several ranks of one server and overcomes the limitations of a single rank and / or a single device adapter

Storage pool striped volumes can also be used as physical disks for a logical volume to use both server sides.

To ensure that I/O to one dasd is not limited by one subchannel, PAV or HyperPAV should be used



General layout for FCP/SCSI

[Diagram: the Linux I/O stack (application program, VFS, page cache, LVM, multipath, dm, I/O scheduler, block device layer, SCSI/zFCP/qdio drivers) issues I/O over chpids 5–8 through a switch to host bus adapters 5–8, which connect to both servers and their ranks]

The SCSI driver finalizes the I/O requests

The zFCP driver adds the FCP protocol to the requests

The qdio driver transfers the I/O to the channel

A host bus adapter connects to both servers

Each server connects to its ranks



FCP/SCSI LUN I/O to a single disk


Assume that disk 3 in rank 8 is reachable via channel 6 and host bus adapter 6

Up to 32 I/O requests (the default queue depth) can be sent out to disk 3 before the first completion is required (see the sketch after this list)

The throughput will be limited by the rank and / or the device adapter

There is no high availability provided for the connection between the host and the storage server
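The per-LUN limit can be inspected, and on kernels where the zfcp/SCSI stack supports it also changed, through sysfs. A minimal sketch; the SCSI address 0:0:1:1 and the new value are assumptions:

  cat /sys/bus/scsi/devices/0:0:1:1/queue_depth        # shows the current limit, 32 by default
  echo 64 > /sys/bus/scsi/devices/0:0:1:1/queue_depth  # raise the number of outstanding requests per LUN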



FCP/SCSI LUN I/O to a single disk with multipathing


VFS sees one device

The device mapper sees the multibus or failover alternatives to the same disk

Administrative effort is required to define all paths to one disk

Additional processor cycles are spent to do the mapping to the desired path for the disk in the device mapper



FCP/SCSI LUN I/O to a linear or striped logical volume


VFS sees one device (logical volume)

The device mapper sees the logical volume and the physical volumes

Additional processor cycles are spent to map the I/Os to the physical volumes.

Striped logical volumes require more additional processor cycles than linear logical volumes

With a striped logical volume the I/Os can be well balanced over the entire storage server and overcome limitations from a single rank, a single device adapter or a single server

To ensure high availability the logical volume should be used in combination with multipathing



FCP/SCSI LUN I/O to a storage pool striped volume with multipathing


Storage pool striped volumes make no sense without high availability

VFS sees one device

The device mapper sees the multibus or failover alternatives to the same disk

The storage pool striped volume spans over several ranks of one server and overcomes the limitations of a single rank and / or a single device adapter

Storage pool striped volumes can also be used as physical disks for a logical volume to make use of both server sides.


Summary - I/O processing characteristics

 FICON/ECKD:
 – 1:1 mapping host subchannel:dasd
 – Serialization of I/Os per subchannel
 – I/O request queue in Linux
 – Disk blocks are 4 KiB
 – High availability by FICON path groups
 – Load balancing by FICON path groups and Parallel Access Volumes

 FCP/SCSI:
 – Several I/Os can be issued against a LUN immediately
 – Queuing in the FICON Express card and/or in the storage server
 – Additional I/O request queue in Linux
 – Disk blocks are 512 bytes
 – High availability by Linux multipathing, type failover or multibus
 – Load balancing by Linux multipathing, type multibus


Summary - Performance considerations

 To speed up the data transfer to one or a group of volumes:
 – use more than 1 rank, because then simultaneously
 • more than 8 physical disks are used
 • more cache and NVS can be used
 • more device adapters can be used
 – use more than 1 FICON Express channel
 – in case of ECKD, use
 • more than 1 subchannel per volume in the host channel subsystem
 • High Performance FICON and/or Read Write Track Data

 Speed-up techniques:
 – use more than 1 rank: storage pool striping, Linux striped logical volume
 – use more channels: ECKD logical path groups, SCSI Linux multipath multibus
 – ECKD: use more than 1 subchannel: PAV, HyperPAV


Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.


Backup slides


Monitoring disk I/O

Output from iostat -dmx 2 /dev/sda

Device:  rrqm/s   wrqm/s     r/s     w/s   rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00 10800.50    6.00 1370.00    0.02  46.70    69.54     7.03   5.11   0.41  56.50   sequential write
sda        0.00 19089.50   21.50  495.00    0.08  76.59   304.02     6.35  12.31   1.85  95.50   sequential write
sda        0.00 15610.00  259.50 1805.50    1.01  68.14    68.58    26.07  12.23   0.48 100.00   random write
sda     1239.50     0.00  538.00    0.00  115.58   0.00   439.99     4.96   9.23   1.77  95.00   sequential read
sda      227.00     0.00 2452.00    0.00   94.26   0.00    78.73     1.76   0.72   0.32  78.50   random read

Reading the columns: rrqm/s and wrqm/s are merged requests, r/s and w/s are I/Os per second, rMB/s and wMB/s are throughput, avgrq-sz is the request size, avgqu-sz is the queue length, await and svctm are service times, and %util is the utilization.