
Tinker Twins – TB2095 - Technical tactics for enterprise storage

DESCRIPTION

Presentation deck from HP Discover 2012 Las Vegas, "Technical tactics for enterprise storage," by HP Master Technologists Greg and Chris Tinker.

Page 1: Tinker Twins – TB2095 - Technical tactics for enterprise storage

© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Tinker, Greg & Chris – HP Master Technologists

Page 2: Tinker Twins – TB2095 - Technical tactics for enterprise storage

HP P9000 (XP), P10000 (3PAR), P6000 (EVA), P4000 (LeftHand), and X9000 (IBRIX)

Technical tactics for enterprise storage

Tinker, Greg & Chris – HP Master Technologists
June 2012

Page 3: Tinker Twins – TB2095 - Technical tactics for enterprise storage


This session is high level and is subject to change without notice.

Forward-looking statements

This document contains forward-looking statements regarding future operations, product development, and product capabilities. This information is subject to substantial uncertainties and is subject to change at any time without prior notification. Statements contained in this document concerning these matters reflect only Hewlett-Packard's predictions and/or expectations as of the date of this document, and actual results and future plans of Hewlett-Packard may differ significantly as a result of, among other things, changes in product strategy resulting from technological, internal corporate, market, and other changes. This is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.

Page 4: Tinker Twins – TB2095 - Technical tactics for enterprise storage

The HP storage portfolio

Solutions infrastructure: HP VirtualSystem, HP CloudSystem, HP AppSystem; SAN connection portfolio (B, C & H Series FC switches & directors, HP Networking enterprise switches); HP Networking wired, wireless, data center, security & management

Online: P2000, X1000/X3000, P9500 XP, X9000, P6000 EVA, P4000, P10K/3PAR, X5000, E5000 for Exchange

Nearline: D2D Backup Systems, virtual library systems, ESL, EML, and MSL tape libraries, RDX, tape drives & tape autoloaders

Software: Storage Array Software, Storage Essentials, Storage Mirroring, Data Protector, Data Protector Express, Business Copy, Continuous Access, Cluster Extension

HP Services

Page 5: Tinker Twins – TB2095 - Technical tactics for enterprise storage


Where to begin

Page 6: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Agenda

Technical tactics for enterprise storage

Device server – the array:
• Front end (CHA/FA, cache, MP/ASIC, bus, CMD IOCTL, etc.) – overhead and/or saturation
• Back end – hot disks, slow disks, array groups, storage tiers, external storage

Host / application:
• I/O profile – stride, reverse, random, sequential, buffered/non-buffered, sync/async...
• CPU – interrupts, context switches, CPU frequency...
• Parallelism – keeping the pipe full

Storage connectivity:
• Flow control
• Latency

Debugging:
• Know the layers

Page 7: Tinker Twins – TB2095 - Technical tactics for enterprise storage


Device server

Page 8: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Device server

P2000 MSA – storage consolidation
• Architecture: dual controller
• Connectivity: SAS, iSCSI, FC
• Performance: 30K random read IOPS; 1.5 GB/s sequential reads
• Application sweet spot: SMB, enterprise ROBO, consolidation/virtualization, server attach, video surveillance
• Capacity: 600 GB – 192 TB; 6 TB average
• Key features: price/performance, controller choice, replication, server attach
• OS support: Windows, vSphere, HP-UX, Linux, OVMS, Mac OS X, Solaris, Hyper-V

P4000 (LeftHand) – virtual IT
• Architecture: scale-out cluster
• Connectivity: iSCSI
• Performance: 35K random read IOPS; 2.6 GB/s sequential reads
• Application sweet spot: SMB, ROBO, and enterprise – virtualized (incl. VDI), Microsoft apps, BladeSystem SAN (P4800)
• Capacity: 7 TB – 768 TB; 72 TB average
• Key features: all-inclusive SW, multi-site DR included, virtualization, VM integration, Virtual SAN Appliance
• OS support: vSphere, Windows, Linux, HP-UX, Mac OS X, AIX, Solaris, XenServer

P6000 EVA – application consolidation
• Architecture: dual controller
• Connectivity: FC, iSCSI, FCoE
• Performance: 55K random read IOPS; 1.7 GB/s sequential reads
• Application sweet spot: enterprise – Microsoft, virtualized, OLTP
• Capacity: 2 TB – 480 TB; 36 TB average
• Key features: ease of use and simplicity, integration/compatibility, multi-site failover
• OS support: Windows, VMware, HP-UX, Linux, OVMS, Mac OS X, Solaris, AIX

3PAR – utility storage
• Architecture: mesh-active cluster
• Connectivity: iSCSI, FC, (FCoE)
• Performance: >400K random IOPS; >10 GB/s sequential reads
• Application sweet spot: enterprise and service provider, utilities, cloud, virtualized environments, OLTP, mixed workloads
• Capacity: 5 TB – 1600 TB; 120 TB average
• Key features: multi-tenancy, efficiency (thin provisioning), performance, autonomic tiering and management
• OS support: vSphere, Windows, Linux, HP-UX, AIX, Solaris

P9500 – mission-critical consolidation
• Architecture: fully redundant
• Connectivity: FC, FCoE
• Performance: >300K random IOPS (ThP); >10 GB/s sequential reads
• Application sweet spot: large enterprise – mission critical with extreme availability, virtualized environments, multi-site DR
• Capacity: 10 TB – 2000 TB; 150 TB average
• Key features: constant data availability, heterogeneous virtualization, multi-site disaster recovery, application QoS (APEX), Smart Tiers
• OS support: all major OSes, including mainframe and NonStop

Page 9: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Front end: midrange controller architecture – ALUA

Device server

Typical midrange controller design: clustered dual controllers, host ports, mirrored cache, and a set of software features.

[Diagram: two controllers (each with CPU and cache) behind FC host ports, with a Fibre Channel or SAS back end.]

Page 10: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Front end: integrated processors vs. SMP distributed processors

Device server

XP24000:
• Up to 128 MPs were located directly on the CHAs and DKAs.
• Each MP had a specific task set and limited performance.
• All MPs competed for Shared Memory (SM) access and locks.

P9500:
• The much faster multi-core MPs reside on Microprocessor Blades (MPBs) and are independent of specific responsibilities.
• All MPs share responsibility for the operation of the whole array.
• The MP Blades have Local Memory (LM) to hold Shared Memory content, reducing SM traffic.

[Diagram: XP24000 MPs on CHA/DKA boards contending for Shared Memory, vs. P9500 MPBs with Local Memory, cache, ESW/CSW switches, and SM.]

Page 11: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Front end: integrated processors vs. distributed processors (SMP)

Device server

XP24000 (control of all LDEVs and SW features):
• LDEV ownership and SW features are shared among all MPs.
• All MPs compete for Shared Memory access and locks.
• Shared Memory may become a bottleneck.

P9500 (control of dedicated LDEVs and SW features per blade):
• LDEV ownership and SW features are dedicated to one multi-core Microprocessor Blade (MPB), where all cores share responsibility and load.
• Most Shared Memory traffic occurs locally in the blade's Local Memory (LM), eliminating Shared Memory as the bottleneck.

[Diagram: XP24000 CHAs/DKAs sharing one Shared Memory region, vs. P9500 MPBs each with Local Memory controlling their own LDEVs.]

Page 12: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Back end: physical to logical layout

Device server

Physical-to-logical mapping example: parity group PG 1-1 maps to LDEV 0:00, presented on CHIP ports CL1A and CL2A, through cache and shared memory (plus LM on the MP board of the P9500's SMP architecture).

[Diagram: CHIP ports CL1A/CL2A, cache, shared (& LM on P9500) memory, and the Phy – LDEV – Port mapping.]

Page 13: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Back end: HDD understanding

Device server

• The number and type of disk drives installed in an array is flexible, and actual counts vary by model.
• Average IOPS/drive varies across array models due to factors beyond the physical drive, such as the ASIC, MP/SMP roles, and cache slot boundaries, to name a few.
• Average IOPS per drive or per array group gives only a small glimpse into what an array can do. One must also consider the cache slot size (256 KB for P9000, 16 KB for P10000) and the average expected latency per drive (designs usually target around 8 ms).

Notes: 1) Max # of SSD drives: 128 with one DKC; 256 with two DKCs. 2) Each disk logs in to the SAS switch at maximum SAS speed.
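A rough sanity check (illustrative arithmetic, not an array specification): designing around an average of 8 ms per disk I/O implies

$$\text{IOPS per drive} \approx \frac{1}{8\,\text{ms}} = 125$$

so a 100-drive pool sized this way sustains on the order of 12,500 back-end IOPS before queuing starts to push latency up.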

Page 14: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Back end: high-level maximums

Device server

Maximum values by model:

P10000 (3PAR): 1,920 internal disks; 1,600 TB internal capacity; 192 FC host ports; 768 GB cache; >360,000 disk IOPS (SPC-1: 450,213).

XP24000: 1,152 internal disks; 688/2,260¹ TB internal capacity; 247 PB subsystem capacity (internal + external); 224 FC host ports; 65,280 LDEVs; 512 GB cache; 11 GB/s disk throughput; >160K disk IOPS.

P9500: 2,048 internal disks; 1,840/2,000² TB internal capacity; 247 PB subsystem capacity (internal + external); 192 FC host ports; 65,280 LDEVs; 1,024 GB cache; >15 GB/s disk throughput; >350K disk IOPS.

P4000 (LeftHand): >1,120 internal disks; 2,240 TB internal capacity; ~64 1GbE host ports; 32 GB cache / 768 GB memory; 80 GB/s disk throughput (32 nodes).

P6000 (EVA): 450 (SFF) / 240 (LFF) internal disks; 480 TB internal capacity; 8 FC host ports; 2,047 LUNs; 22 GB cache per controller; ~1.6 GB/s disk throughput; ~55K disk IOPS.

P2000 MSA: 96 (LFF) / 149 (SFF) internal disks; ~130 TB internal capacity; ~4 host ports (4 Gb) per controller; 512 LUNs; 2 GB cache per controller; ~1.6 GB/s disk throughput; ~16K disk IOPS.

X9000 (IBRIX): 2,048 internal disks; 16 PB subsystem capacity; 512 LUNs.

¹ 600 GB FC / 2 TB SATA 3.5" disks. ² 900 GB SAS / 1 TB NL SAS 2.5" disks.

Page 15: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Back end: avoid disk access on critical applications

Device server

[Chart: time in ms (0–10) for an 8 KB transfer broken into SCSI controller (HSx80), bus (Ultra-SCSI; about half for an FC bus), seek, and rotate components for 5400, 7200, 10K, and 15K RPM drives; the 8 KB transfer time is the same for FC drives. Mechanical seek and rotation dominate the service time.]
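The rotational component is easy to bound with simple physics (added here for reference): average rotational latency is half a revolution,

$$t_{\text{rot}} = \frac{1}{2} \cdot \frac{60}{\text{RPM}}$$

which gives roughly 5.6 ms at 5400 RPM, 4.2 ms at 7200 RPM, 3.0 ms at 10K RPM, and 2.0 ms at 15K RPM, before any seek or transfer time is added.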

Page 16: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Back end: HDD, LDEV, ThP, V-Vol, and pool-vol

Device server

A physical device (PDEV), here 1 of 4 in a 4-disk RAID set, is carved into logical devices (LDEVs) within an array group (RSS, etc.). LDEV #1, #2, ... #n each occupy a distinct region; the space between them represents a clear divide between the LDEVs on a single PDEV.

Page 17: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Back end: queue depth overview

Device server

The read-ahead algorithm for sequential contiguous blocks runs before HDD access. Past that point, each block requires a physical disk access and a seek to a physical location. Good designs try to hold each drive at an average queue depth between 2 and 4, depending on load. Queue depth at the HDD level is not captured by all array performance tools and varies by model.

Note: it is very important to understand that each I/O block may or may not be equal to a cache slot, depending on array model; this plays a large role in performance characteristics.

[Diagram: four active I/Os of 4 KB, 32 KB, 8 KB, and 16 KB queued against a single drive.]
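Queue depth, service time, and latency tie together through Little's law (a general queuing identity, applied here as an illustration):

$$L = \lambda W$$

With ~8 ms average service time, a drive completing ~125 IOPS held at a steady queue depth of 4 carries roughly 4 × 8 ms = 32 ms of wait for the newest request; deeper host queues mostly add latency rather than spindle throughput.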

Page 18: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Back end: latency with respect to utilization

Device server

[Chart: response time (ms, 0–25) rising non-linearly as resource utilization goes from 0% to 100%; maximum IOPS comes with the highest response time because of queuing.]

Design for roughly <10 ms per I/O. Designing for and maintaining an average disk utilization of about 60% provides the best overall performance while leaving headroom for spike loads.
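The 60% guidance falls out of basic queuing behavior (using the single-server M/M/1 approximation purely as an illustration): with service time S and utilization ρ,

$$R = \frac{S}{1-\rho}$$

so at ρ = 0.6 the response time is 2.5S, while at ρ = 0.9 it balloons to 10S, the knee of the curve sketched above.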

Page 19: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Back end: traditional (thick) vs. thin provisioning

Device server

Server view: six hosts project requirements of 2 TB, 3.5 TB, 1.8 TB, 3 TB, 1.9 TB, and 2.1 TB (14.3 TB visible to the OS in both cases), but have actually written only 0.6, 0.4, 0.5, 0.7, 0.5, and 0.4 TB (3.1 TB total).

Traditional: with no pool, the array must physically provision the full 14.3 TB net on the array groups/disk drives up front. With only 3.1 TB of data actually written, 11.2 TB sits stranded.

HP ThP: the array presents 14.3 TB of logically provisioned capacity (V-Volumes) backed by a 5 TB ThP pool: 3.1 TB used/written, 1.9 TB free/unused.

Page 20: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Back end: cache partitioning – depends on model; P9500 below

Device server

• You can divide your P9500 into up to 32 independent "sub-arrays."
• Partitions array resources (CHAs, host groups, cache, and disk groups).
• Allows array resources to be focused on critical host requirements.
• Allows for service-provider-oriented deployments.
• Array partitions can be independently managed.
• Can be used in a mixed deployment (cache, array, traditional, ThP, Smart).
• You can set a maximum of 4 independent "sub-arrays" with full hardware isolation down to the MP board.

[Diagram: a P9500 divided into cache partitions 1–n under a super admin, each with its own CHAs, host groups, and LDEVs/V-Vols (groups A, B, C + D), plus external storage.]

Page 21: Tinker Twins – TB2095 - Technical tactics for enterprise storage


Host / application

Overview – I/O

Page 22: Tinker Twins – TB2095 - Technical tactics for enterprise storage

I/O profile

Host / application

• The I/O subsystem is the slowest component of the system and is often the cause of performance problems.
• Distinguishing between the many different layers of an I/O request is key.
• Delays often occur before and after the physical I/O request is dispatched to the disk device(s).
• Throughput (MB/sec) and responsiveness (ms/IO) are directly related.

Page 23: Tinker Twins – TB2095 - Technical tactics for enterprise storage

I/O profile

Host / application

Fundamental causes of I/O performance problems:
• Slow response in communicating with the device server
• Bottleneck / saturation / queuing in the I/O stack
• Contention for locks at all levels of the I/O stack
• I/O access patterns
• Inefficient I/O – logical vs. physical I/O (scatter/gather, read-ahead)

Page 24: Tinker Twins – TB2095 - Technical tactics for enterprise storage

I/O profile: elements of an I/O request

Host / application

The layers an I/O request passes through:
• File system
• Volume management
• Buffer cache
• Device driver
• I/O channel
• Disk device

File-system I/O through the buffer cache constitutes logical I/O; raw I/O bypasses the file system and buffer cache; what the device driver dispatches down the I/O channel to the disk device is physical I/O.

Page 25: Tinker Twins – TB2095 - Technical tactics for enterprise storage

I/O profile layers

Host / application

[Diagram: a 64 KB logical I/O request (max buffered I/O = 64 KB) lands across four file-system extents of 8 KB, 32 KB, 16 KB, and 8 KB, and is reissued as physical I/O requests of 16 KB, 16 KB, 16 KB, and 8 KB at an LVM physical extent boundary, plus 192 KB of read-ahead issued as 64 KB requests.]

Page 26: Tinker Twins – TB2095 - Technical tactics for enterprise storage

I/O profile layers: IBRIX

Host / application

Extremely high aggregate performance from a single directory (and a single file).

[Diagram (US Patent #6,782,389): a directory tree (Dir with files F1...Fn, Subdir with S1...Sn) is striped across up to 100 segments, which are in turn distributed across segment servers, so the files in one directory, and even a single file, can be served in parallel.]

Page 27: Tinker Twins – TB2095 - Technical tactics for enterprise storage

I/O profile: stride, reverse, random, sequential, buffered/non-buffered, sync/async, ...

Host / application

Regardless of the SCSI operation, CPU cycles and time will be required, and depending on queue depth, latency will be felt.

Example: the P6000 (EVA) port queue depth is 2048. If the running queue depth holds at 825 with an average latency of 23 ms/IO, all the I/O behind the port is scheduled for a single LUN, and a SCSI-2 reserve is issued (ESX), latency could theoretically reach 18,975 ms. The SCSI conflict-retry counter would then count down from its default of 80 to 0 and fail any outstanding I/O. This is not a fault of the array, but a misunderstanding of the architecture: too many apples in one basket.
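The theoretical figure is simply the queue drained serially behind the reserve (illustrative arithmetic):

$$825 \times 23\,\text{ms} \approx 18{,}975\,\text{ms}$$

nearly 19 seconds for the request at the back of the queue, far beyond typical SCSI timeout budgets.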

Page 28: Tinker Twins – TB2095 - Technical tactics for enterprise storage

CPU: interrupts, switches, and Q-depth behavior

Host / application

An I/O completion raises an interrupt on the CPU to which the HBA is assigned. Interrupt coalescing is an HBA driver vendor's ability to interrupt the CPU once for multiple I/O completions within a period of time, reducing CPU context switches.

[Diagram: a host with two HBAs, four active I/Os, and further I/Os queued behind them.]
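The effect is easy to size (illustrative numbers, not a vendor specification): at 100,000 IOPS with one interrupt per completion, the CPU fields 100,000 interrupts/s; coalescing 16 completions per interrupt cuts that to

$$\frac{100{,}000}{16} = 6{,}250\ \text{interrupts/s}$$

at the cost of holding completions slightly longer, which matters for latency-sensitive workloads.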

Page 29: Tinker Twins – TB2095 - Technical tactics for enterprise storage

OS CPU: interrupt processing, switches, CPU frequency...

Host / application

• The HBA is assigned a CPU for I/O completion interrupts.
• That CPU may be uninterruptible.

[Diagram: four I/O completions from one HBA all land on cpu 0 (busy) while cpu 1 sits idle.]

Page 30: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Array CPU: busy due to IOCTL()

Host / application

A CPU (MP) on the array can also become "busy" or over-utilized by processing requests from third-party APIs that manipulate the array through IOCTL(), not just normal SCSI reads, writes, reserves, TURs, etc.

[Diagram: cpu 0 busy, cpu 1 idle; IOCTL() requests from disk-array performance measurement tools trigger internal processing to dump performance registers.]

Page 31: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Parallelism – keeping the pipe full through threading

Host / application

[Chart: theoretical single-stream throughput for an 8 KB read stream; MB/s falls as milliseconds per I/O rise (x-axis 0.1–7.0 ms/IO, y-axis 0–90 MB/s), because a single stream keeps only one I/O in flight.]
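For one synchronous stream, throughput is simply block size over response time (the curve above; illustrative values):

$$\text{MB/s} = \frac{\text{block size}}{t_{\text{IO}}},\qquad \frac{8\,\text{KB}}{1\,\text{ms}} \approx 8\,\text{MB/s}$$

Running n independent streams scales this roughly n-fold until a port, spindle, or CPU saturates, hence "keeping the pipe full."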

Page 32: Tinker Twins – TB2095 - Technical tactics for enterprise storage


Storage connectivity

Page 33: Tinker Twins – TB2095 - Technical tactics for enterprise storage

FCIP, iFCP, or iSCSI

Storage connectivity

• iFCP and FCIP are intended for interconnecting Storage Area Network (SAN) devices. iFCP's claim to fame is providing SAN fabric segmentation (a form of routing), but it does not have much vendor hardware backing.
• FCIP tunnels FC through IP and allows for fabric merges.
• iSCSI can almost always reside on the same router as FCIP, but not on the same GbE port.
• iFCP routes between fabrics and can also be used for storage and server connectivity.

Page 34: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Flow control: FC

Storage connectivity

• Primary mechanism: buffer credits.
• Each transmitting port has a given credit, bb_credit, which represents the maximum number of receive buffers (outstanding frames) it can use.
• Slow drain: one slow component in the SAN can exhaust the buffer credits of an initiator or target, severely inflating service times.

[Diagram: eight 2 KB frames in flight from Tx to Rx.]

Example: transmitting eight 2 KB frames across 32 km takes ~160 µs (light in fibre covers roughly 5 µs/km).
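Long links need credits sized to the bandwidth-delay product or the transmitter idles waiting for R_RDYs. A sketch, assuming ~800 MB/s of payload on 8 Gb FC and full 2 KB frames:

$$\text{bb\_credit} \gtrsim \frac{\text{rate} \times \text{RTT}}{\text{frame size}} = \frac{800\,\text{MB/s} \times 320\,\mu\text{s}}{2\,\text{KB}} \approx 128$$

for 32 km (≈ 320 µs round trip), far more than the 8 credits sketched above.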

Page 35: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Flow control: FCIP, iFCP, and iSCSI

Storage connectivity

All three have one thing in common: they utilize TCP/IP protocols for moving block data.

• Buffer credits at the FC layer
• Sliding windows at the TCP layer
• QoS (at both layers)
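The TCP window plays the same role as bb_credit: it must cover the bandwidth-delay product (illustrative math):

$$\text{window} \geq \text{bandwidth} \times \text{RTT} = 1\,\text{Gb/s} \times 20\,\text{ms} = 2.5\,\text{MB}$$

so a long-haul FCIP tunnel needs TCP window scaling well beyond the classic 64 KB default to keep the link full.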

Page 36: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Connectivity issues

Storage connectivity

• TCP/IP network RTT
• TCP congestion control – to avoid overrunning the receiver
• TCP/IP retransmissions severely impact the SCSI exchange

Example: oversubscribing an FCIP tunnel will result in TCP retransmissions. Anything greater than 1% TCP retransmissions will greatly inflate the SCSI exchange RTT, depleting FC buffer-to-buffer credits and stalling FC communication: the FC "slow drain" effect. On a 10 Gbit link, even 0.1% retransmissions mean never achieving more than 80% throughput.

Page 37: Tinker Twins – TB2095 - Technical tactics for enterprise storage


Debugging

Page 38: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Layer overview

Debugging

User space:
• Applications
• GNU C library

Kernel space:
• System call interface
• VFS (ext3, NTFS, VxFS, etc.)
• Buffer cache
• MPIO – device mapper
• RAW
• LVM, VxVM, sd<alpha>
• Block devices: SCSI, IDE, etc.

Page 39: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Layer overview – SCSI

Debugging

Below the block layers shown on the previous slide, the SCSI stack itself has three layers:
• Upper layer: SD, ST, SR, SG
• SCSI mid layer: the "glue"
• SCSI lower layer: FC, iSCSI, SAS, etc.

Page 40: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Layer overview – SCSI

Debugging

Upper layer:

Converts the requests from the layers above into SCSI commands. This is where the SCSI encapsulation begins...

Files to remember: ./linux/drivers/scsi/sd.c

/**
 * init_sd - entry point for this driver (both when built in or when
 * a module).
 *
 * Note: this function registers this driver with the scsi mid-level.
 **/
static int __init init_sd(void)
{
	int majors = 0, i, err;

	SCSI_LOG_HLQUEUE(3, printk("init_sd: sd driver entry point\n"));

	for (i = 0; i < SD_MAJORS; i++)
		if (register_blkdev(sd_major(i), "sd") == 0)
			majors++;
	...

Page 41: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Layer overview – SCSI

Debugging

Mid layer:

Error handlers, queuing, timeouts, etc. live in this layer. Because it holds the LLD layer together with the SCSI protocol, it is referred to as the GLUE layer.

SCSI logging ("/sys/module/scsi_mod/parameters/scsi_logging_level") is enabled at this layer.

Files: ~/scsi/scsi.c, ~/scsi/scsi_error.c

static int __init init_scsi(void)
{
	int error;

	error = scsi_init_queue();
	if (error)
		return error;
	error = scsi_init_procfs();
	if (error)
		goto cleanup_queue;
	error = scsi_init_devinfo();
	if (error)
		goto cleanup_procfs;
	error = scsi_init_hosts();
	if (error)
		goto cleanup_devlist;
	error = scsi_init_sysctl();
	if (error)
		goto cleanup_hosts;
	error = scsi_sysfs_register();
	if (error)
		goto cleanup_sysctl;

	scsi_netlink_init();

	printk(KERN_NOTICE "SCSI subsystem initialized\n");
	return 0;
	...

int scsi_error_handler(void *data)
{
	struct Scsi_Host *shost = data;
	...

Page 42: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Layer overview – SCSI

Debugging

Lower layer:

The low-level device drivers (LLDDs), such as Qlogic's, reside at this layer. We can also enable debugging here.

Files: qla_dbg.h, iscsi_tcp.c (software iSCSI), be_iscsi.h, scsi_host.h

Page 43: Tinker Twins – TB2095 - Technical tactics for enterprise storage

SCSI -- LLDD

Lower layer device drivers – connectivity – FC

Page 44: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Connectivity

Lower layer device drivers

Low layer device drivers:
• Qlogic
• Emulex
• Brocade
• ...

Page 45: Tinker Twins – TB2095 - Technical tactics for enterprise storage

SANsurfer ~ Qlogic

Lower layer device drivers

Page 46: Tinker Twins – TB2095 - Technical tactics for enterprise storage

scsi_host via systool

Lower layer device drivers

systool -c scsi_host -v output for this system:

Class = "scsi_host"

  Class Device = "host0"
  Class Device path = "/sys/class/scsi_host/host0"
    cmd_per_lun        = "1"
    host_busy          = "0"
    proc_name          = "ata_piix"
    scan               = <store method only>
    sg_tablesize       = "128"
    state              = "running"
    uevent             = <store method only>
    unchecked_isa_dma  = "0"
    unique_id          = "1"

The attributes are defined in ./include/scsi/scsi_host.h.

Page 47: Tinker Twins – TB2095 - Technical tactics for enterprise storage

SCSI -- LLDD

Lower layer device drivers – FC debug

Page 48: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Debug

Lower layer device drivers

Enabling debug of the lower-layer drivers depends on the driver's options:
• Qlogic
• Emulex

# modinfo <driver>

Page 49: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Debug Qlogic

Lower layer device drivers

Dynamic: the exact parameter path varies with kernel version. In short, to enable:

$ echo 1 > /sys/module/qla2xxx/ql2xextended_error_logging

or

$ echo 1 > /sys/module/qla2xxx/parameters/ql2xextended_error_logging

Full details at:
http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/Enable-verbose-debugging-with-Emulex-and-Qlogic-on-Linux/ba-p/89957

Page 50: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Debug lpfc ~ Emulex

Lower layer device drivers

# modinfo lpfc
depends:    scsi_mod,scsi_transport_fc
vermagic:   2.6.18-194.el5 SMP mod_unload gcc-4.1
parm:       lpfc_log_verbose:Verbose logging bit-mask (int)

Log mask definitions (message range, verbose bit, description):

LOG_ELS         100–199    0x1      ELS events
LOG_DISCOVERY   200–299    0x2      Link discovery events
LOG_INIT        400–499    0x8      Initialization events
LOG_FCP         700–799    0x40     FCP traffic history
Reserved        800–899
LOG_NODE        900–999    0x80     Node table events
LOG_SECURITY    1000–1099  0x8000   FC security
Reserved        1100–1199
LOG_MISC        1200–1299  0x400    Miscellaneous events
LOG_LINK_EVENT  1300–1399  0x10     Link events
...             ...        ...      ...

We (Greg and Chris) find that logging level 0xdb works GREAT in situations where you need to see what is going on: lpfc_log_verbose=0xdb (depends on the driver; check the source). Here is how the value breaks down in case you wish to log something else:

hex:  D     B
dec:  13    11
bin:  1101  1011
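Reading 0xdb against the mask table is simple bit arithmetic on the values above:

0xdb = 0x80 + 0x40 + 0x10 + 0x08 + 0x02 + 0x01
     = LOG_NODE | LOG_FCP | LOG_LINK_EVENT | LOG_INIT | LOG_DISCOVERY | LOG_ELS

i.e. the link, discovery, node, and FCP traffic events you typically want, without the noise of LOG_MISC.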

Page 51: Tinker Twins – TB2095 - Technical tactics for enterprise storage

SCSI -- LLDD

Lower layer device drivers – iSCSI

Page 52: Tinker Twins – TB2095 - Technical tactics for enterprise storage

iSCSI

Lower layer device drivers

Low layer device drivers:
• iSCSI transport (software or hardware)
• iscsi_tcp.c

Page 53: Tinker Twins – TB2095 - Technical tactics for enterprise storage

iSCSI

Lower layer device drivers

This is the default print of the kernel ring buffer on a RHEL system where the iSCSI drivers are installed:

Loading iSCSI transport class v2.0-871.
cxgb3i: tag itt 0x1fff, 13 bits, age 0xf, 4 bits.
iscsi: registered transport (cxgb3i)
Broadcom NetXtreme II CNIC Driver cnic v2.1.0 (Oct 10, 2009)
cnic: Added CNIC device: eth0
cnic: Added CNIC device: eth1
Broadcom NetXtreme II iSCSI Driver bnx2i v2.1.0 (Dec 06, 2009)
iscsi: registered transport (bnx2i)
scsi4 : Broadcom Offload iSCSI Initiator
scsi5 : Broadcom Offload iSCSI Initiator
iscsi: registered transport (tcp)
iscsi: registered transport (iser)
iscsi: registered transport (be2iscsi)
scsi6 : iSCSI Initiator over TCP/IP
  Vendor: LEFTHAND  Model: iSCSIDisk  Rev: 9500
  Type: Direct-Access  ANSI SCSI revision: 05
SCSI device sda: 2147483648 512-byte hdwr sectors (1099512 MB)
sda: Write Protect is off
sda: Mode Sense: 77 00 00 08
SCSI device sda: drive cache: none
SCSI device sda: 2147483648 512-byte hdwr sectors (1099512 MB)
sda: Write Protect is off
sda: Mode Sense: 77 00 00 08
SCSI device sda: drive cache: none
sda: unknown partition table
sd 6:0:0:0: Attached scsi disk sda
sd 6:0:0:0: Attached scsi generic sg0 type 0

It should be noted that not all of these drivers are needed:
• iscsi_tcp – software initiator (needed if no hardware offload adapters are installed)
• cxgb3i – Chelsio hardware adapters
• bnx2i – Broadcom, with cnic used for offload
• iser – a.k.a. ib_iser, InfiniBand
• be2iscsi – ServerEngines (acquired by Emulex)

Page 54: Tinker Twins – TB2095 - Technical tactics for enterprise storage

iSCSI

Lower layer device drivers

Enable debug at the iSCSI connection layer (not the TCP interconnect level). In /etc/modprobe.conf, add (depends on the driver used):

options iscsi_tcp debug_iscsi_tcp=1

# /sbin/iscsid -c /etc/iscsi/iscsid.conf -i /etc/iscsi/initiatorname.iscsi -d 1

# iscsiadm -m node --loginall=automatic
Logging in to [iface: default, target: iqn.2003-10.com.lefthandnetworks:labtest:39:test, portal: 10.1.0.42,3260]
Login to [iface: default, target: iqn.2003-10.com.lefthandnetworks:labtest:39:test, portal: 10.1.0.42,3260] successful

Though the CLI shows the same information on stdout, the log file will have far more...

Page 55: Tinker Twins – TB2095 - Technical tactics for enterprise storage

SCSI

Mid and upper layers

Page 56: Tinker Twins – TB2095 - Technical tactics for enterprise storage

SCSI Architecture – t10.org

SCSI – mid and upper layers

Page 57: Tinker Twins – TB2095 - Technical tactics for enterprise storage

Review of LLDD inquiry

SCSI – mid and upper layers

The LLDD layer has discovered targets and initialized its structures. It has issued the SCSI_SCAN() or REPORT_LUN().

Example: an RSCN comes in over the wire, triggering error handling at the HBA, and the HBA issues a LIP to rescan.

LLDD (qla2xxx) call chain:

qla2x00_do_dpc(void *data)
  --> qla2x00_rescan_fcports
  --> qla2x00_update_fcport
  --> qla2x00_lun_discovery()
  --> qla2x00_rpt_lun_discovery()
  --> qla2x00_report_lun()

The same holds for iSCSI: once communication is established for the session, a REPORT_LUN() is issued.

• Devices are returned to the SCSI mid and upper layers for driver registration and context building.
• udev wakes up on the uevent and builds the dynamic user-level device files.

Page 58: Tinker Twins – TB2095 - Technical tactics for enterprise storage

SCSI

Mid and upper layers -- DEBUG

Page 59: Tinker Twins – TB2095 - Technical tactics for enterprise storage


SCSI logging

SCSI mid and upper: DEBUG

Many levels exist; a good starting point follows.

To enable:
echo 0x9411 > /proc/sys/dev/scsi/logging_level

To disable:
echo 0 > /proc/sys/dev/scsi/logging_level

See more details at:
http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/Enable-verbose-debugging-with-native-SCSI-drivers-on-Linux/ba-p/89955
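The level word packs one 3-bit verbosity field per logging facility (see drivers/scsi/scsi_logging.h; the field order below matches mainline kernels of that era, but check it against your source tree). A minimal sketch in C to decode a level before you set it:

#include <stdio.h>

/* One 3-bit field per facility, lowest shift first, per the
 * SCSI_LOG_*_SHIFT definitions in drivers/scsi/scsi_logging.h. */
static const char *facilities[] = {
	"error", "timeout", "scan", "mlqueue", "mlcomplete",
	"llqueue", "llcomplete", "hlqueue", "hlcomplete", "ioctl",
};

int main(void)
{
	unsigned int level = 0x9411;	/* the starting point suggested above */
	unsigned int i;

	for (i = 0; i < sizeof(facilities) / sizeof(facilities[0]); i++) {
		unsigned int v = (level >> (3 * i)) & 0x7;	/* extract the 3-bit field */
		if (v)
			printf("%-10s = %u\n", facilities[i], v);
	}
	return 0;
}

For 0x9411 this prints error=1, timeout=2, mlqueue=2, mlcomplete=1, llqueue=1: moderate mid-layer queue/completion tracing plus error and timeout detail.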

Upper Layer SD ST SR SG

SCIS MID LAYER: GLUE

SCSI Lower Layer FCISCS

ISAS

Etc…

Page 60: Tinker Twins – TB2095 - Technical tactics for enterprise storage


SCSI debug runneth over

SCSI mid and upper: DEBUG

WARNING: the above logging level will fill up /var/log/messages at a rapid pace. Make sure you know what you are looking for, and try to keep the debug window down to 4–6 hours.

The levels are described in scsi_logging.h, found in any distribution's kernel source and on:
http://linuxdb.corp.hp.com

Example:
http://brunel.gbr.hp.com/suse/lxr/http-SLES10-x86_64/source/drivers/scsi/scsi_logging.h


Page 61: Tinker Twins – TB2095 - Technical tactics for enterprise storage


Thank you