SCSI Mid-layer Eric Youngdale 2nd Annual Linux Storage Management Workshop October 2000

Introduction

Main points of this talk:

– Historical evolution of Linux SCSI.

– Explain the state of the art in Linux 2.2.

– Discuss changes for 2.4.

– Discuss pending changes in the 2.5 kernel.

Block devices and Linux

• Linux has a generic block device layer with which all filesystems will interact.

• SCSI is no different in this regard – it registers itself with the block device layer so it can receive requests.

• SCSI also handles character device requests and ioctls that do not originate in the block device layer.

What is the “Mid-Layer”?

• Linux SCSI support can be viewed as 3 levels.

• Upper level handles device types such as tape, CD-ROM, disk, etc.

• Lower level talks to host adapters.

• Middle layer is essentially a traffic cop, handling requests from the rest of the kernel and dispatching them to the rest of SCSI.
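The three levels can be pictured as a chain of calls. In the userspace sketch below, an upper-level disk routine builds a command, the middle layer routes it to a host adapter, and the low-level driver completes it through a callback. Everything is simplified and hypothetical: `struct scsi_cmd`, `struct host_adapter`, `mid_layer_dispatch` and `disk_read` are illustrative names, and although the hook is named after the low-level `queuecommand` entry point, its signature here is invented for the example.

```c
#include <assert.h>

/* Hypothetical, simplified command: an opcode, a target ID, a result. */
struct scsi_cmd {
    unsigned char opcode;   /* e.g. 0x28 = READ(10) */
    int target;
    int result;             /* filled in by the low level on completion */
};

/* Lower level: a host adapter driver exposes a hook for issuing commands. */
struct host_adapter {
    const char *name;
    int (*queuecommand)(struct scsi_cmd *cmd, void (*done)(struct scsi_cmd *));
};

/* Mid-level completion callback: the "traffic cop" sees every result. */
static int completed;
static void mid_layer_done(struct scsi_cmd *cmd)
{
    (void)cmd;
    completed++;
}

/* A toy low-level driver that completes every command immediately. */
static int toy_queuecommand(struct scsi_cmd *cmd, void (*done)(struct scsi_cmd *))
{
    cmd->result = 0;        /* pretend the device reported GOOD status */
    done(cmd);
    return 0;
}

/* Middle level: route a command from an upper-level driver to its host. */
static int mid_layer_dispatch(struct host_adapter *host, struct scsi_cmd *cmd)
{
    return host->queuecommand(cmd, mid_layer_done);
}

/* Upper level: the disk driver turns a block read into a SCSI command. */
static int disk_read(struct host_adapter *host, int target)
{
    struct scsi_cmd cmd = { .opcode = 0x28, .target = target, .result = -1 };
    mid_layer_dispatch(host, &cmd);
    return cmd.result;
}
```

Only the middle layer knows both ends: the upper level never names a host adapter driver, and the low level never sees where a command originated.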

State of the art in Linux-2.2

• Error handling is better for drivers that use the new error handling code, which was introduced in 2.2.

• Queue management fundamentally unchanged since the Linux 1.x days. “The Code that Time Forgot”. Lots of dinosaurs running around in the code.

• Rest of mid-level largely stagnant.

What was wrong in 2.2?

• The elevator algorithms in 2.2 allowed requests to grow regardless of the capabilities of the underlying device.

• All SCSI disks were handled in a single queue.

• The disk driver had to split requests that had become too large.

• Only one set of common logic verified that requests had not become too large.
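To make the splitting problem concrete: if the elevator merges a request up to 300 sectors and the device accepts at most 128 per command, the disk driver has to carve it into device-sized pieces. A hypothetical sketch (`split_request`, `struct chunk` and `max_sectors` are illustrative names, not identifiers from the 2.2 driver):

```c
#include <assert.h>
#include <stddef.h>

/* A contiguous run of sectors, as a merged request might describe it. */
struct chunk {
    unsigned long start;     /* first sector */
    unsigned int  nsectors;  /* length in sectors */
};

/*
 * Split a request of `nsectors` starting at `start` into pieces no larger
 * than `max_sectors`. Writes at most `cap` chunks into `out` and returns
 * the number of chunks the full split needs.
 */
static size_t split_request(unsigned long start, unsigned int nsectors,
                            unsigned int max_sectors,
                            struct chunk *out, size_t cap)
{
    size_t n = 0;

    assert(max_sectors > 0);
    while (nsectors > 0) {
        unsigned int len = nsectors < max_sectors ? nsectors : max_sectors;
        if (n < cap) {
            out[n].start = start;
            out[n].nsectors = len;
        }
        n++;
        start += len;
        nsectors -= len;
    }
    return n;
}
```

For example, 300 sectors at sector 1000 with a 128-sector limit become pieces of 128, 128 and 44 sectors. In 2.4 the size limit is checked when requests are merged, so this step is no longer needed.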

What was wrong in 2.2 (cont)

• Character device requests not in queue.

• SMP safety was clumsily handled, leading to race conditions and poor performance.

• Poor scalability.

• Many drivers continue to use old error handling code.

Queue handling in 2.2

Figure: all SCSI disks share one disk queue; starting from the queue head, requests for Disk1, Disk2, Disk1, Disk3 and Disk1 are interleaved in a single list.
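With every disk sharing one queue, the head request may target a disk that is already busy, so the dispatcher has to walk the list looking for something it can issue. A hypothetical sketch of that scan (`struct req`, `device_busy` and `find_queueable` are illustrative names, not 2.2 code):

```c
#include <assert.h>
#include <stddef.h>

#define NDISKS 4

/* One entry in the shared queue; `disk` says which device it targets. */
struct req {
    int disk;
    struct req *next;
};

/* Per-disk flag: nonzero when the device cannot take another command. */
static int device_busy[NDISKS];

/*
 * Walk the single shared queue from the head and unlink the first request
 * whose disk is idle. Returns NULL if every queued request targets a busy
 * disk. The cost grows with the whole queue, not with one device's share.
 */
static struct req *find_queueable(struct req **head)
{
    for (struct req **pp = head; *pp != NULL; pp = &(*pp)->next) {
        struct req *r = *pp;
        if (!device_busy[r->disk]) {
            *pp = r->next;      /* unlink from the shared queue */
            r->next = NULL;
            return r;
        }
    }
    return NULL;
}
```

With Disk1 busy and the queue ordered Disk1, Disk2, Disk1, Disk3, Disk1, the scan has to skip past Disk1 entries to reach Disk2 and then Disk3. Giving each physical device its own queue, as 2.4 does, removes this walk.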

Changes for Linux-2.4

• Block device layer was generalized to support a “request_queue_t” abstract datatype that represents a queue.

• Contains function pointers that drivers can use for managing the size of requests inserted into queues.

• Requests can no longer grow too large to be handled at one time.
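The key idea is that the driver owning a queue, not the generic elevator, decides whether two requests may be merged. A minimal sketch, assuming a simplified queue with a single merge callback and a per-device `max_sectors` limit; `struct queue`, `merge_fn_t`, `size_limited_merge` and `try_merge` are hypothetical names, and the real 2.4 `request_queue_t` has more hooks with different signatures.

```c
#include <assert.h>
#include <stdbool.h>

struct request {
    unsigned long sector;     /* first sector */
    unsigned int  nsectors;   /* length in sectors */
};

struct queue;

/* Driver-supplied policy: may `next` be appended to `req`? */
typedef bool (*merge_fn_t)(struct queue *q, struct request *req,
                           struct request *next);

struct queue {
    unsigned int max_sectors;   /* largest request the device accepts */
    merge_fn_t   merge_fn;      /* set by the driver that owns the queue */
};

/* A size-aware policy: refuse any merge that would exceed the limit. */
static bool size_limited_merge(struct queue *q, struct request *req,
                               struct request *next)
{
    return req->nsectors + next->nsectors <= q->max_sectors;
}

/*
 * Generic block-layer code: merge only if the requests are contiguous and
 * the driver's callback agrees. Returns true if `next` was absorbed.
 */
static bool try_merge(struct queue *q, struct request *req,
                      struct request *next)
{
    if (req->sector + req->nsectors != next->sector)
        return false;
    if (!q->merge_fn(q, req, next))
        return false;
    req->nsectors += next->nsectors;
    return true;
}
```

Because an oversized merge is refused up front, no request in the queue ever exceeds what the device can take in one command.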

Changes for 2.4 (cont)

• No longer any need for splitting requests.

• No need for ugly logic to scan a queue for a queueable request.

• SMP locking in mid-layer cleaned up to provide finer granularity.

Changes for 2.4 (cont)

• A SCSI queuing library was created – a set of functions for queue management that are tailored to different sets of requirements.

• SCSI was modified to use a single queue for each physical device.

• Character device requests and ioctls are inserted into the same queue at the tail, and handled the same as other requests.
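With one queue per physical device, each device's dispatcher only looks at its own head, and character requests and ioctls simply join that queue at the tail. A sketch under simplifying assumptions: `struct dev_queue`, `enqueue_tail` and `dispatch_one` are hypothetical names, a request is tagged only with its kind, and the elevator's sorted insertion of block requests is left out.

```c
#include <assert.h>
#include <stddef.h>

enum req_kind { REQ_BLOCK, REQ_CHAR, REQ_IOCTL };

struct dq_req {
    enum req_kind kind;
    struct dq_req *next;
};

/* One FIFO per physical device. */
struct dev_queue {
    struct dq_req *head, *tail;
    int busy;                   /* device cannot accept a command now */
};

/* Character requests and ioctls go in at the tail, like any other entry. */
static void enqueue_tail(struct dev_queue *q, struct dq_req *r)
{
    r->next = NULL;
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

/* No scanning: either the head request can go now, or nothing can. */
static struct dq_req *dispatch_one(struct dev_queue *q)
{
    struct dq_req *r = q->head;
    if (q->busy || r == NULL)
        return NULL;
    q->head = r->next;
    if (q->head == NULL)
        q->tail = NULL;
    return r;
}
```

Since an ioctl waits behind earlier block requests for the same device, it is ordered with respect to them without any special-case code path.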

Queuing library

Maintainability is a problem if multiple copies of code perform similar functions, so a single inline function takes the differences as parameters:

```c
__inline static int __scsi_merge_requests_fn(request_queue_t * q,
                                             struct request * req,
                                             struct request * next,
                                             int use_clustering,
                                             int dma_host)
{
        /*
         * Appropriate contents
         */
}
```

Queuing library (cont)

```c
/*
 * Generate a wrapper that forwards to the shared inline body with the
 * clustering and DMA flags fixed at compile time.
 */
#define MERGEREQFCT(_FUNCTION, _CLUSTER, _DMA)                  \
static int _FUNCTION(request_queue_t * q,                       \
                     struct request * req,                      \
                     struct request * next)                     \
{                                                               \
        return __scsi_merge_requests_fn(q, req, next,           \
                                        _CLUSTER, _DMA);        \
}

/* One wrapper per (clustering, DMA) combination. */
MERGEREQFCT(scsi_merge_requests_fn_, 0, 0)
MERGEREQFCT(scsi_merge_requests_fn_d, 0, 1)
MERGEREQFCT(scsi_merge_requests_fn_c, 1, 0)
MERGEREQFCT(scsi_merge_requests_fn_dc, 1, 1)
```
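The same pattern can be tried outside the kernel. In this self-contained sketch, `merge_ok` stands in for `__scsi_merge_requests_fn`: one inline body takes the two flags, a macro stamps out one wrapper per flag combination, and a host's capabilities select which wrapper to install. The request fields, the segment limit, the address-limit rule and `pick_merge_fn` are invented for illustration and do not reproduce the kernel's merge logic.

```c
#include <assert.h>

struct mreq {
    int nr_segments;      /* scatter-gather entries used */
    int contiguous_tail;  /* 1 if its last buffer abuts the next request */
    int high_mem;         /* 1 if any buffer lies above a DMA address limit */
};

#define MAX_SEGMENTS 4

/* Shared body: the two flags select behaviour (toy rules). */
static inline int merge_ok(struct mreq *req, struct mreq *next,
                           int use_clustering, int dma_host)
{
    int segs = req->nr_segments + next->nr_segments;

    /* With clustering, physically adjacent buffers share one segment. */
    if (use_clustering && req->contiguous_tail)
        segs--;
    /* A DMA-limited host cannot reach buffers above its address limit. */
    if (dma_host && (req->high_mem || next->high_mem))
        return 0;
    return segs <= MAX_SEGMENTS;
}

typedef int (*merge_fn_t)(struct mreq *, struct mreq *);

/* Stamp out one wrapper per (cluster, dma) combination. */
#define MERGEFCT(_FUNCTION, _CLUSTER, _DMA)                  \
static int _FUNCTION(struct mreq *req, struct mreq *next)    \
{                                                            \
    return merge_ok(req, next, _CLUSTER, _DMA);              \
}

MERGEFCT(merge_fn_, 0, 0)
MERGEFCT(merge_fn_d, 0, 1)
MERGEFCT(merge_fn_c, 1, 0)
MERGEFCT(merge_fn_dc, 1, 1)

/* Choose the specialized wrapper once, from the host's capabilities. */
static merge_fn_t pick_merge_fn(int use_clustering, int dma_host)
{
    static const merge_fn_t table[2][2] = {
        { merge_fn_,  merge_fn_d  },
        { merge_fn_c, merge_fn_dc },
    };
    return table[use_clustering != 0][dma_host != 0];
}
```

Because each wrapper passes constants, the compiler can fold the flag tests inside its copy of the inlined body, so there is one source of truth for the merge rules without re-testing host flags on every merge.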

Changes for 2.4 (cont)

• In 2.2, there were separate functions and code paths for initializing SCSI, depending on whether it was compiled into the kernel or loaded as modules.

• In 2.4, this was cleaned up – redundant code was removed, and the same code now initializes SCSI whether it is compiled into the kernel or loaded as modules.

Upcoming changes for 2.5

• All drivers will be forced to use new error handling code.

• Disk driver will be updated to handle larger number of disks.

• SMP locking will be cleaned up some more to improve scalability.

Old error handling code

• Essentially a bad state machine.

• Has tons of SMP problems that are not easily fixed.

• Tries to resolve errors while still allowing new requests to be queued.

• Many kernel reliability problems stem from the old error handling code.

• Needs to be discarded in the worst way.

New error handling code

• The new error handling code has been available since the 2.1.75 kernel.

• To force driver authors to update their drivers, the old error handling code will simply be removed. Drivers that have not been updated will fail to compile.

• Orphaned drivers will be handled on a case-by-case basis.
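The key contrast with the old code is ordering: new requests are held back as soon as something fails, and recovery starts only once nothing is still in flight. The sketch below models that discipline; `struct host_state`, `try_start_recovery`, `recover` and `needs_bus_reset` are hypothetical names, and while the escalation order (abort, then device reset, bus reset and host reset) reflects the general strategy of the new handler, this is a simplified illustration rather than its code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-host bookkeeping seen by the mid-layer. */
struct host_state {
    int  busy;          /* commands currently issued to the adapter */
    int  failed;        /* of those, commands that timed out or errored */
    bool in_recovery;   /* while set, no new commands are issued */
};

/* Mid-layer gate: hold new commands back while recovery is pending. */
static bool can_issue(const struct host_state *h)
{
    return !h->in_recovery;
}

/*
 * Block new work as soon as something has failed, but report that
 * recovery may start only when nothing is still in flight, i.e. every
 * outstanding command is one of the failed ones (busy == failed).
 */
static bool try_start_recovery(struct host_state *h)
{
    if (h->failed == 0)
        return false;
    h->in_recovery = true;
    return h->busy == h->failed;
}

/* Recovery steps, from least to most disruptive. */
enum eh_step { EH_ABORT, EH_DEVICE_RESET, EH_BUS_RESET, EH_HOST_RESET,
               EH_OFFLINE };

/* Escalate until a step succeeds; EH_OFFLINE means nothing worked. */
static enum eh_step recover(bool (*step_works)(enum eh_step))
{
    for (int s = EH_ABORT; s < EH_OFFLINE; s++)
        if (step_works((enum eh_step)s))
            return (enum eh_step)s;
    return EH_OFFLINE;
}

/* Example device model: only a bus reset (or worse) clears the fault. */
static bool needs_bus_reset(enum eh_step s)
{
    return s >= EH_BUS_RESET;
}
```

Waiting for the host to go quiet avoids the old code's problem of recovering a device while fresh commands keep arriving for it.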

Further SMP cleanups

• All low-level drivers currently use io_request_lock for SMP safety.

• This lock is also used by all other block devices on the system to protect their queues.

• Plans are in the works to switch the block device layer to use a per-queue lock, thereby isolating SCSI from other devices.

SMP Cleanups (cont)

• Low-level drivers don’t need to protect the queue – they don’t have access to it.

• Each low-level driver should have a separate lock – ideally one per host instance, though a driver-wide lock could be used initially. The choice should be left to the low-level driver.
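The intended split can be sketched with ordinary mutexes: the block layer guards each queue with its own lock, and each host instance carries a private lock that only its low-level driver takes. This is a userspace illustration using POSIX threads; `struct blk_queue`, `struct host_instance`, `queue_add` and `issue_one` are hypothetical names, and kernel code would use spinlocks rather than `pthread_mutex_t`.

```c
#include <assert.h>
#include <pthread.h>

/* Per-queue state, protected by its own lock (not one global lock). */
struct blk_queue {
    pthread_mutex_t lock;
    int pending;                 /* requests waiting in this queue */
};

/* Per-host-adapter state, protected by a lock private to the driver. */
struct host_instance {
    pthread_mutex_t lock;
    int commands_in_flight;
};

/* Block layer: only this queue's lock is needed to add a request. */
static void queue_add(struct blk_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->pending++;
    pthread_mutex_unlock(&q->lock);
}

/*
 * Mid-layer hands a request to the driver: take it off the queue under
 * the queue lock, drop that lock, then let the driver take its own lock.
 * The driver never touches the queue, so it never needs the queue lock.
 */
static int issue_one(struct blk_queue *q, struct host_instance *h)
{
    pthread_mutex_lock(&q->lock);
    if (q->pending == 0) {
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    q->pending--;
    pthread_mutex_unlock(&q->lock);

    pthread_mutex_lock(&h->lock);       /* low-level driver's own lock */
    h->commands_in_flight++;
    pthread_mutex_unlock(&h->lock);
    return 1;
}
```

Two hosts serving two queues then contend on nothing shared, which is the scalability gain that a single lock such as io_request_lock rules out.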

SMP Cleanups (cont)

• Block device layer has a number of arrays, indexed by major/minor:

blksize_size[MAJOR(dev)][MINOR(dev)]

• Access is not protected by any locks.

• It is impossible for block drivers to resize these arrays without introducing a race condition.
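The hazard is that readers index the arrays with no lock while a resize would have to replace them. One conventional remedy, shown here only as an illustration and not as the fix the kernel adopted, is to route every access through lock-protected accessors. `struct blksize_table`, `blksize_get`, `blksize_set` and `blksize_resize` are hypothetical names, and the mutex stands in for whatever kernel lock would actually be used.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* A growable per-minor table of block sizes for one major number. */
struct blksize_table {
    pthread_mutex_t lock;
    int *sizes;          /* sizes[minor], in bytes */
    size_t nminors;
};

/* Read one entry; returns 0 for minors beyond the current table. */
static int blksize_get(struct blksize_table *t, size_t minor)
{
    int v = 0;
    pthread_mutex_lock(&t->lock);
    if (minor < t->nminors)
        v = t->sizes[minor];
    pthread_mutex_unlock(&t->lock);
    return v;
}

/* Write one entry; silently ignores minors beyond the current table. */
static void blksize_set(struct blksize_table *t, size_t minor, int size)
{
    pthread_mutex_lock(&t->lock);
    if (minor < t->nminors)
        t->sizes[minor] = size;
    pthread_mutex_unlock(&t->lock);
}

/*
 * Replace the table with one of `nminors` entries, keeping old values.
 * Readers take the same lock, so none can be using the old array when
 * it is freed; raw indexing such as blksize_size[MAJOR(dev)][MINOR(dev)]
 * offers no such guarantee.
 */
static int blksize_resize(struct blksize_table *t, size_t nminors)
{
    int *n = calloc(nminors, sizeof *n);
    if (n == NULL)
        return -1;
    pthread_mutex_lock(&t->lock);
    if (t->sizes != NULL)
        memcpy(n, t->sizes,
               (t->nminors < nminors ? t->nminors : nminors) * sizeof *n);
    free(t->sizes);
    t->sizes = n;
    t->nminors = nminors;
    pthread_mutex_unlock(&t->lock);
    return 0;
}
```

The price is a lock acquisition on every lookup; the benefit is that a driver can grow its table while I/O is in progress.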

Large numbers of disks

• Current disk driver allocates 8 majors, allowing for only 128 disks.

• Plans are in the works to allow disk driver to dynamically allocate major numbers.

• This would support up to about 4000 disks before major numbers are exhausted.

• Possible to go beyond this by using fewer bits for partitions.
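The disk counts follow from the minor-number layout: an 8-bit minor gives 256 minors per major, and the disk driver reserves 16 minors per disk (4 partition bits: the whole-disk node plus 15 partitions), so each major covers 16 disks. A small helper makes the arithmetic explicit; `max_disks` is a hypothetical name, and treating all 256 majors as available is an upper-bound assumption.

```c
#include <assert.h>

/* 8-bit minor numbers: 256 minors under each major. */
#define MINORS_PER_MAJOR 256u

/*
 * Disks addressable with `majors` major numbers when each disk reserves
 * 2^partition_bits minors (the whole-disk node plus its partitions).
 */
static unsigned int max_disks(unsigned int majors, unsigned int partition_bits)
{
    unsigned int minors_per_disk = 1u << partition_bits;
    return majors * (MINORS_PER_MAJOR / minors_per_disk);
}
```

With 4 partition bits, 8 majors give the 128 disks mentioned above and all 256 majors give 4096, presumably the "about 4000" figure once majors held by other drivers are set aside. Using 3 partition bits (the whole disk plus 7 partitions) doubles each figure.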

Wish list

• Implement some SCSI-3 features (larger commands, sense buffers).

• Improve support for shared busses.

• Support target-mode.

• Check module add/remove code for SMP safety, implement locks.

• Improvements related to high-availability.

Conclusions

The major goal of a rewrite of SCSI queuing has been accomplished. A number of architectural problems were resolved at the same time.

There are still some interesting tasks to be addressed for 2.5.

See http://www.andante.org/scsi.html for more info, and http://www.andante.org/scsi_todo.html for “todo” list.

Contacts

Email: eric@andante.org

Web: http://www.andante.org

The notes for this talk are on the website.
