HP Master Technologists: Greg & Chris Tinker, presentation deck from HP Discover 2012 Las Vegas “Technical tactics for enterprise storage”
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Tinker, Greg & Chris – HP Master Technologists
HP P9000 (XP), P10000 (3PAR), P6000 (EVA), P4000 (LeftHand), and X9000 (IBRIX)
Technical tactics for enterprise storage
Tinker, Greg & Chris – HP Master Technologists
June 2012
This session is high level and is subject to change without notice.
Forward-looking statements
This document contains forward-looking statements regarding future operations, product development, and product capabilities. This information is subject to substantial uncertainties and is subject to change at any time without prior notification. Statements contained in this document concerning these matters only reflect Hewlett-Packard's predictions and/or expectations as of the date of this document, and actual results and future plans of Hewlett-Packard may differ significantly as a result of, among other things, changes in product strategy resulting from technological, internal corporate, market, and other changes. This is not a commitment to deliver any material, code, or functionality and should not be relied upon in making purchasing decisions.
The HP storage portfolio

Solutions: HP VirtualSystem, HP CloudSystem, HP AppSystem, X5000, E5000 for Exchange
Infrastructure: HP Networking (wired, wireless, data center, security & management), HP Networking enterprise switches, B, C & H Series FC switches & directors, SAN connection portfolio
Online: P2000, X1000/X3000, P4000, P6000 EVA, P9500 XP, P10K/3PAR, X9000
Nearline: D2D backup systems, virtual library systems, ESL/EML/MSL tape libraries, RDX, tape drives & tape autoloaders
Software: Data Protector Express, Data Protector, Storage Essentials, storage array software, Storage Mirroring, Business Copy, Continuous Access, Cluster Extension
HP Services
Where to begin
Agenda
Technical tactics for enterprise storage

Device server – the array:
• Front end (CHA/FA, cache, MP/ASIC, bus, CMD IOCTL, etc.) – overhead and/or saturation
• Back end – hot disks, slow disks, array groups, storage tiers, external storage
Host / application:
• I/O profile – stride, reverse, random, sequential, buffered/non-buffered, sync/async...
• CPU – interrupts, switches, frequency of CPU...
• Parallelism – keeping the pipe full
Storage connectivity:
• Flow control
• Latency
Debugging:
• Know the layers
Device server
Device server

P2000 MSA – storage consolidation
• Architecture: dual controller
• Connectivity: SAS, iSCSI, FC
• Performance: 30K random read IOPS; 1.5 GB/s sequential reads
• Sweet spot: SMB, enterprise ROBO, consolidation/virtualization, server attach, video surveillance
• Capacity: 600GB – 192TB; 6TB average
• Key features: price/performance, controller choice, replication, server attach
• OS support: Windows, vSphere, HP-UX, Linux, OVMS, Mac OS X, Solaris, Hyper-V

P4000 (LeftHand) – virtual IT
• Architecture: scale-out cluster
• Connectivity: iSCSI
• Performance: 35K random read IOPS; 2.6 GB/s sequential reads
• Sweet spot: SMB, ROBO, and enterprise – virtualized incl. VDI, Microsoft apps, BladeSystem SAN (P4800)
• Capacity: 7TB – 768TB; 72TB average
• Key features: all-inclusive SW, multi-site DR included, virtualization, VM integration, Virtual SAN Appliance
• OS support: vSphere, Windows, Linux, HP-UX, Mac OS X, AIX, Solaris, XenServer

P6000 (EVA) – application consolidation
• Architecture: dual controller
• Connectivity: FC, iSCSI, FCoE
• Performance: 55K random read IOPS; 1.7 GB/s sequential reads
• Sweet spot: enterprise – Microsoft, virtualized, OLTP
• Capacity: 2TB – 480TB; 36TB average
• Key features: ease of use and simplicity, integration/compatibility, multi-site failover
• OS support: Windows, VMware, HP-UX, Linux, OVMS, Mac OS X, Solaris, AIX

P10000 (3PAR) – utility storage
• Architecture: mesh-active cluster
• Connectivity: iSCSI, FC, (FCoE)
• Performance: >400K random IOPS; >10 GB/s sequential reads
• Sweet spot: enterprise and service provider, utilities, cloud, virtualized environments, OLTP, mixed workloads
• Capacity: 5TB – 1600TB; 120TB average
• Key features: multi-tenancy, efficiency (thin provisioning), performance, autonomic tiering and management
• OS support: vSphere, Windows, Linux, HP-UX, AIX, Solaris

P9500 (XP) – mission-critical consolidation
• Architecture: fully redundant
• Connectivity: FC, FCoE
• Performance: >300K random IOPS (ThP); >10 GB/s sequential reads
• Sweet spot: large enterprise – mission critical with extreme availability, virtualized environments, multi-site DR
• Capacity: 10TB – 2000TB; 150TB average
• Key features: constant data availability, heterogeneous virtualization, multi-site disaster recovery, application QoS (APEX), Smart Tiers
• OS support: all major OSes including mainframe and NonStop
Front end: midrange controller architecture – ALUA
Device server

Typical midrange controller design: a clustered dual-controller pair (each controller with its own CPU and mirrored cache), host-facing FC ports, a Fibre Channel or SAS back end, and a set of software features.
Front end: integrated processors vs. SMP distributed processors
Device server

• With the XP24000, up to 128 MPs were located directly on the CHAs (channel adapters) and DKAs (disk adapters).
• Each MP had a specific task set and limited performance.
• All MPs competed for Shared Memory access and locks.

• With the P9500, the much faster multi-core MPs reside on Micro Processor Blades (MPB) and are independent of specific responsibilities.
• All MPs share responsibility for the operation of the whole array.
• The MP blades have Local Memory (LM) to store Shared Memory content, reducing SM traffic.

[Diagram: XP24000 – MPs on each CHA/DKA contending for shared memory and cache; P9500 – MP blades (MPB) with local memory (LM) attached through cache switches (CSW/ESW).]
Front end: integrated processors vs. distributed processors (SMP)
Device server

XP24000 – control of all LDEVs and SW features:
• LDEV ownership and SW features are shared between all MPs.
• All MPs compete for Shared Memory access and locks; Shared Memory may become a bottleneck.

P9500 – control of dedicated LDEVs and SW features per blade:
• LDEV ownership and SW features are dedicated to one multi-core Micro Processor Blade (MPB), where all cores share responsibility and load.
• Most Shared Memory traffic occurs locally in the Local Memory (LM), eliminating Shared Memory as the bottleneck.
Back end: physical to logical layout
Device server
[Diagram: I/O enters through CHIP ports CL1A/CL2A, passes through cache and shared memory (plus LM on the MP board in the P9500's SMP architecture), and maps physical to logical as Phy -- LDEV -- Port, e.g. PG 1-1 -- 0:00 -- CL1A and 0:00 -- CL2A.]
Back end: HDD understanding
Device server
• The number and type of disk drives installed in an array is flexible, and actual counts vary by model.
• Average IOPS/drive varies across array models due to factors other than the physical drive – such as the ASIC, MP/SMP roles, and cache slot boundaries, to name a few.
• Average IOPS per drive or per array group gives only a small glimpse into what an array can do. One must also consider the cache slot size (256KB for the P9000, or 16KB for the P10000) and the average expected latency per drive (designs usually target around 8 ms).
1) Max # of SSD drives: 128 with one DKC; 256 with 2 DKC 2) Each disk logs in to the SAS switch at max SAS speed
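The per-drive averages above can be sanity-checked with a back-of-the-envelope service-time model. This is a rough sketch, and the seek and transfer numbers below are illustrative assumptions, not figures from the deck:

```python
# Rough per-drive random IOPS from mechanical latencies: one random I/O
# costs roughly an average seek + half a rotation + the transfer time.
def drive_iops(seek_ms, rpm, xfer_ms=0.1):
    half_rotation_ms = 0.5 * 60_000 / rpm   # ms for half a revolution
    service_ms = seek_ms + half_rotation_ms + xfer_ms
    return 1000.0 / service_ms

# An assumed 15K RPM drive with a ~3.5 ms average seek:
print(round(drive_iops(seek_ms=3.5, rpm=15000)))   # → 179 IOPS (~5.6 ms/IO)
```

Note how close the per-IO service time lands to the ~8 ms design latency mentioned above once queuing is added.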
Back end: High level maximums
Device server
Maximum values per model (P10000 | XP24K | P9500 | P4000 LeftHand | P6000 EVA | P2000 MSA | X9000 IBRIX):
• Internal disks: 1920 | 1152 | 2048 | >1120 | 450 (SFF) / 240 (LFF) | 96 (LFF) / 149 (SFF) | 2048
• Internal capacity (TB): 1600 | 688/2260¹ | 1840/2000² | 2240 | 480 | ~130 | --
• Subsystem capacity, PB (internal + external): -- | 247 | 247 | -- | -- | -- | 16
• Host ports: FC 192 | FC 224 | FC 192 | 1GbE ~64 | 8 | -- | ~4 4Gb/controller
• # of LUNs/LDEVs: -- | 65280 | 65280 | -- | 2047 | 512 | 512
• Cache (GB): 768 | 512 | 1024 | 32 cache / 768 memory | 22/controller | 2/controller | --
• Disk performance (GB/s): -- | 11 | >15 | 80 (32 nodes) | ~1.6 | ~1.6 | --
• Disk performance (IOPS): >360,000 (SPC-1: 450,213) | >160k | >350k | -- | ~55k | ~16k | --
¹ 600GB FC / 2TB SATA 3.5” disks
² 900GB SAS / 1TB NL SAS 2.5” disks
Back end: avoid disk access on critical application
Device server

[Chart: disk service-time components on a 0–10 ms scale – seek, rotational latency, and an 8KB transfer (same for FC drives) for 5400, 7200, 10K, and 15K RPM drives, plus SCSI controller (HSx80) and Ultra-SCSI bus overhead (roughly half for an FC bus).]
Backend: HDD, LDEV, ThP, V-Vol, & pool-vol
Device server
PDEV 1 of 4 in 4 disk raid set
Logical Device (LDEV)
Array Group, RSS, etc…
LDEV #n
LDEV #2
LDEV #1
The space represents a clear divide between the LDEVs on a single Physical DEVice (PDEV)
Back end: queue depth overview
Device server
Read-ahead for sequential contiguous blocks is performed before HDD access. Beyond that point, each block requires a physical disk access and a seek to its physical location. Good designs try to hold each drive at an average queue depth between 2 and 4, depending on load. Queue depth at the HDD level is not captured by all array performance tools, and it varies by model.
Note: it is very important to understand that each I/O block may or may not be equal to a cache slot, depending on array model – this plays a large role in performance characteristics.
[Diagram: four active I/Os of 4KB, 32KB, 8KB, and 16KB queued against a single drive.]
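The 2–4 average queue depth above relates directly to throughput and latency via Little's law. A minimal sketch (the 250 IOPS figure is an illustrative assumption):

```python
# Little's law: average outstanding I/Os = throughput * service time.
def queue_depth(iops, service_ms):
    return iops * service_ms / 1000.0

# A drive sustaining 250 IOPS at the ~8 ms design latency mentioned earlier:
print(queue_depth(250, 8))   # → 2.0 outstanding I/Os, inside the 2-4 band
```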
Back end: latency with respect to utilization
Device server

[Chart: increase of response time (0–25 ms) versus utilization of resource (0–100%).]

Maximum IOPS is reached at the highest response time (queuing). Design for roughly <10 ms/IO.
Designing for and maintaining an average disk usage of 60% provides the best overall performance while leaving room for spike loads.
Back end: traditional (thick) vs. thin provisioning
Device server

Server view: in both cases the OS sees 14.3TB (projected requirements).

Traditional: 14.3TB is physically provisioned up front from array groups/disk drives (LUNs of 2TB, 3.5TB, 1.8TB, 3TB, 1.9TB, and 2.1TB; no pool). Only 3.1TB of data is actually written, leaving 11.2TB stranded. Physical capacity required: 14.3TB net.

HP ThP: the same 14.3TB is logically provisioned as V-Volumes (with 0.6TB, 0.4TB, 0.5TB, 0.7TB, 0.5TB, and 0.4TB written to them), backed by a 5TB ThP pool – 3.1TB used/written, 1.9TB free/unused. Net physical pool capacity: 5TB.
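The arithmetic behind the slide, using the volume sizes from the figure:

```python
# Six volumes are presented to hosts, but only a fraction is ever written.
presented_tb = [2.0, 3.5, 1.8, 3.0, 1.9, 2.1]   # thick LUN sizes
written_tb   = [0.6, 0.4, 0.5, 0.7, 0.5, 0.4]   # data written per volume
pool_tb      = 5.0                               # net physical ThP pool

logical = round(sum(presented_tb), 1)   # OS-visible capacity
used    = round(sum(written_tb), 1)     # data actually written

print(logical)                   # → 14.3 TB visible in both cases
print(used)                      # → 3.1 TB written
print(round(pool_tb - used, 1))  # → 1.9 TB free in the ThP pool
print(round(logical - used, 1))  # → 11.2 TB stranded in the thick case
```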
Back end: cache partitioning – depends on model… P9500 below
Device server
[Diagram: a P9500 divided by a super admin into cache partitions 1 through n, plus external storage.]
• You can divide your P9500 in up to 32 independent “sub arrays”
• Partitions array resources (CHA, Host Groups, Cache and disk groups)
• Allows array resources to be focused to meet critical host requirements
• Allows for service provider oriented deployments
• Array partitions can be independently managed
• Can be used in a mixed deployment (Cache, Array, Traditional, ThP, Smart)
• A maximum of 4 independent “sub arrays” can be configured with full hardware isolation down to the MP board.
[Diagram: CHAs, host groups, and LDEVs/V-Vols assigned per partition – groups A, B, and C + D.]
Host / application
Overview – I/O
I/O profile
Host / application
• The I/O subsystem is the slowest component of the system and is often the cause of performance problems.
• Distinguishing between the many different layers of an I/O request is key.
• Delays often occur before and after the physical I/O request is dispatched to the disk device(s).
• Throughput (MB/sec) and responsiveness (ms/IO) are directly related.
I/O Profile
Host / application
Fundamental causes of I/O performance problems:
• Slow response communicating with the device server
• Bottleneck / saturation / queuing in the I/O stack
• Contention for locks at all levels of the I/O stack
• I/O access patterns
• Inefficient I/O – logical vs. physical I/O (scatter/gather, read-ahead)
I/O profile: elements of an I/O request
Host / application
[Stack: File System → Volume Management → Buffer Cache (file-system and logical I/O) → Device Driver → I/O Channel → Disk Device (physical I/O); raw I/O bypasses the file-system layers.]
I/O Profile layers
Host / application
[Diagram: a 64KB logical I/O request spans four FS extents (8KB, 32KB, 16KB, 8KB – excluding the read-ahead amount) and crosses an LVM physical extent boundary; it is issued as physical I/O requests of 16KB, 16KB, 16KB, and 8KB, while read-ahead fetches a further 192KB in 64KB chunks; max buffered I/O = 64KB.]
I/O profile layers: IBRIX
Host / application
Extremely high aggregate performance from a single directory (and single file): a directory's files (F1…Fn) and subdirectories (S1…Sn) are distributed across segments (1…100 in the figure) owned by multiple segment servers (US Patent # 6,782,389).
I/O profile: stride, reverse, random, sequential, buffered/non-buffered, sync/async, ...
Host / application
Regardless of the SCSI operation, CPU cycles and time will be required, and depending on queue depth, latency will be felt.

Example: the P6000 (EVA) port queue depth is 2048. If the running queue depth is held at 825, the average latency is 23 ms/IO, all I/O behind the port is scheduled for a single LUN, and a SCSI-2 Reserve is issued (ESX), latency could theoretically go as high as 18,975 ms – during which the SCSI conflict-retry counter would decrement from its default of 80 down to 0 and fail any outstanding I/O. This is not a fault of the array, but a misunderstanding of the architecture: too many apples in one basket.
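The 18,975 ms figure in the example is simply the queued I/O count times the per-I/O latency – while the reserve is held, an arriving I/O may wait behind every queued one:

```python
# Worst-case wait behind a held SCSI-2 reserve = queue depth * latency.
queued = 825        # running queue depth behind the port
latency_ms = 23     # average per-I/O latency

worst_case_ms = queued * latency_ms
print(worst_case_ms)   # → 18975 ms, long enough to exhaust the
                       # conflict-retry counter (default 80)
```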
CPU: interrupts, switches, and Q-depth behavior
Host / application
An IOCTL raises an interrupt on the CPU to which the HBA is assigned. Interrupt coalescing is an (HBA driver) vendor's ability to interrupt a CPU once to handle multiple IOCTLs in a period of time, thus reducing CPU context switches.
[Diagram: a host with two HBAs, four active I/Os, and further I/Os queued behind them.]
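The effect of coalescing reduces to simple division – servicing N completions per interrupt divides the interrupt (and context-switch) rate by N. The 100K IOPS figure below is an illustrative assumption:

```python
# Interrupt rate with and without coalescing.
def interrupts_per_sec(iops, completions_per_interrupt=1):
    return iops / completions_per_interrupt

print(interrupts_per_sec(100_000))      # → 100000.0 interrupts/s, uncoalesced
print(interrupts_per_sec(100_000, 8))   # → 12500.0 with 8 completions per IRQ
```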
OS CPU: Interrupt processing, switches, Freq of CPU…
Host / application
• The HBA is assigned a CPU for I/O completion interrupts
• That CPU may be uninterruptible
[Diagram: I/O completions 1–4 from the HBA all land on cpu 0 (busy) while cpu 1 sits idle.]
Array CPU: Busy due to IOCTL()
Host / application
The CPU (MP) on the array can also become “busy” or over-utilized by processing requests from 3rd-party APIs that manipulate the array through IOCTLs – not just the normal SCSI reads, writes, reserves, TURs, etc.

[Diagram: cpu 0 busy, cpu 1 idle – an IOCTL request from a disk-array performance measurement tool results in internal processing to dump performance registers.]
Parallelism – keeping the pipe full through threading
Host / application
[Chart: single-stream MB/sec (in theory) for an 8KB read stream – throughput (MB/sec) falls as milliseconds/IO rises from 0.1 to 7.0 ms; larger KB/IO sizes yield proportionally more MB/sec at the same latency.]
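The chart's shape follows from one relation: with a single outstanding I/O, throughput is block size divided by per-I/O latency, and parallel streams multiply it. A minimal sketch:

```python
# Single-stream throughput bound: MB/s = block size / latency; N parallel
# streams (threads keeping the pipe full) scale it by N.
def mb_per_sec(kb_per_io, ms_per_io, streams=1):
    return streams * (kb_per_io / 1024.0) / (ms_per_io / 1000.0)

print(round(mb_per_sec(8, 1.0), 1))      # one 8KB stream at 1 ms → 7.8 MB/s
print(round(mb_per_sec(8, 1.0, 8), 1))   # eight parallel streams → 62.5 MB/s
```

This is why threading, not faster single-stream latency, is usually the lever for filling the pipe.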
Storage connectivity
FCIP, iFCP, or iSCSI
Storage connectivity
• iFCP and FCIP are intended for interconnecting Storage Area Network (SAN) devices. iFCP's claim to fame is SAN fabric segmentation (a form of routing), but it does not have much vendor hardware backing.
• FCIP tunnels FC through IP and allows fabric merges.
• iSCSI can almost always reside on the same router as FCIP, but not on the same GbE port.
• iFCP routes between fabrics and can also be used for storage and server
Flow Control: FC
Storage connectivity
• Primary mechanism: buffer credits.
• Each transmitting port has a given credit, bb_credit, which represents the maximum number of receive buffers (outstanding frames) it can use.
• Slow drain – a slow component in the SAN can exhaust the buffer credits of an initiator or target, resulting in severely inflated service times.

[Diagram: eight 2KB frames in flight between Tx and Rx.]
Example: time to transmit 8 2KB frames over 32 km takes 160 µs.
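Distance is what ties buffer credits to throughput: to keep a long link full, a port needs enough credits to cover the bits in flight over one round trip. A sizing sketch, under assumed constants (~5 µs/km propagation in fiber, 2048-byte frames, a nominal 8 Gb/s line rate):

```python
import math

# BB credits needed = bits in flight over one round trip / bits per frame.
def bb_credits(distance_km, rate_gbps, frame_bytes=2048, us_per_km=5.0):
    rtt_s = 2 * distance_km * us_per_km * 1e-6
    bits_in_flight = rtt_s * rate_gbps * 1e9
    return math.ceil(bits_in_flight / (frame_bytes * 8))

print(bb_credits(32, 8))   # → 157 credits to keep the 32 km example link full
```

With fewer credits than this, the transmitter stalls waiting for R_RDYs and the link never reaches line rate.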
Flow Control: FCIP, IFCP, & ISCSI
Storage connectivity
All three have one thing in common: they all utilize TCP/IP protocols for moving block data.
• Buffer credits @ FC layer
• Sliding windows @ TCP layer
• QoS (@ both layers)
Connectivity issues
Storage connectivity
• TCP/IP network RTT
• TCP congestion control – to avoid overrunning the receiver
• TCP/IP retransmissions severely impact SCSI exchanges

Example: oversubscription of an FCIP tunnel will result in TCP retransmissions. Anything greater than 1% TCP retransmissions will greatly inflate the SCSI exchange RTT, depleting FC buffer-to-buffer (B2B) credits and stalling FC communication – the FC “slow drain” effect. On 10Gbit, a 0.1% retransmission rate will result in never achieving more than 80% throughput.
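Why a small loss rate costs so much can be sketched with the classic Mathis et al. approximation for sustained TCP throughput – an assumption brought in here, not a formula from the deck:

```python
import math

# Mathis approximation: BW <= MSS / (RTT * sqrt(p)), where p is the loss
# (retransmission) rate. Loss enters under a square root, so even small
# p caps throughput hard on high-RTT links.
def tcp_bw_bps(mss_bytes, rtt_s, loss_rate):
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))

# Assumed 1460-byte MSS, 10 ms RTT, 0.1% retransmissions:
print(round(tcp_bw_bps(1460, 0.010, 0.001) / 1e6, 1))   # ~36.9 Mb/s
```

On a 10Gbit tunnel that bound is a tiny fraction of line rate, which is consistent with the slide's warning about sub-1% retransmission rates.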
Debugging
Layer overview
Debugging
User space: Applications → GNU C lib
Kernel space: System Call Interface → VFS (ext3, NTFS, VxFS, etc.) → Buffer Cache → MPIO (device mapper) → LVM, VxVM, sd<alpha> → block devices (SCSI, IDE, etc.); raw I/O bypasses the buffer cache.
Layer overview -- SCSI
Debugging
Upper layer: SD, ST, SR, SG
SCSI mid layer: GLUE
SCSI lower layer: FC, iSCSI, SAS, etc.
Layer Overview -- SCSI
Debugging
Upper layer:
Converts the requests from the upper layers into SCSI commands. This is where the SCSI encapsulation begins.

Files: ./linux/drivers/scsi/sd.c

/**
 * init_sd - entry point for this driver (both when built in or when
 * a module).
 *
 * Note: this function registers this driver with the scsi mid-level.
 **/
static int __init init_sd(void)
{
	int majors = 0, i, err;

	SCSI_LOG_HLQUEUE(3, printk("init_sd: sd driver entry point\n"));

	for (i = 0; i < SD_MAJORS; i++)
		if (register_blkdev(sd_major(i), "sd") == 0)
			majors++;
Layer overview -- SCSI
Debugging
Mid layer:
Error handlers, queuing, timeouts, etc. are implemented in this layer. Because this layer holds the LLD layer together with the SCSI protocol, it is referred to as the GLUE layer.

SCSI logging (“/sys/module/scsi_mod/parameters/scsi_logging_level”) is enabled at this layer.

Files: ~/scsi/scsi.c, ~/scsi/scsi_error.c

static int __init init_scsi(void)
{
	int error;

	error = scsi_init_queue();
	if (error)
		return error;
	error = scsi_init_procfs();
	if (error)
		goto cleanup_queue;
	error = scsi_init_devinfo();
	if (error)
		goto cleanup_procfs;
	error = scsi_init_hosts();
	if (error)
		goto cleanup_devlist;
	error = scsi_init_sysctl();
	if (error)
		goto cleanup_hosts;
	error = scsi_sysfs_register();
	if (error)
		goto cleanup_sysctl;

	scsi_netlink_init();

	printk(KERN_NOTICE "SCSI subsystem initialized\n");
	return 0;
…
int scsi_error_handler(void *data)
{
	struct Scsi_Host *shost = data;
…
Layer overview -- SCSI
Lower layer:
The Low Level Device Drivers (LLDD), such as Qlogic's, reside at this layer. Debugging can also be enabled at this layer.

Files: qla_dbg.h, iscsi_tcp.c (software), be_iscsi.h, scsi_host.h
SCSI -- LLDD
Lower layer device drivers – connectivity – FC
Connectivity
Lower layer device drivers
Low layer device drivers• Qlogic• Emulex• Brocade• …
SanSurfer ~ Qlogic
Lower layer device drivers
scsi_host via systool
Lower layer device drivers
# systool -c scsi_host -v
##################################################
Class = "scsi_host"

  Class Device = "host0"
  Class Device path = "/sys/class/scsi_host/host0"
    cmd_per_lun       = "1"
    host_busy         = "0"
    proc_name         = "ata_piix"
    scan              = <store method only>
    sg_tablesize      = "128"
    state             = "running"
    uevent            = <store method only>
    unchecked_isa_dma = "0"
    unique_id         = "1"
…

./include/scsi/scsi_host.h
SCSI -- LLDD
Lower layer Device Drivers – FC Debug
Debug
Lower layer device drivers
Enabling debug on the lower-layer devices depends on the driver's options:
• Qlogic
• Emulex

# modinfo <driver>
Debug qlogic
Lower layer device drivers
Dynamic: the actual parameter varies depending on kernel version. In short, to enable:

$ echo 1 > /sys/module/qla2xxx/ql2xextended_error_logging
or
$ echo 1 > /sys/module/qla2xxx/parameters/ql2xextended_error_logging
Full details at: http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/Enable-verbose-debugging-with-Emulex-and-Qlogic-on-Linux/ba-p/89957
Debug lpfc ~ Emulex
Lower layer device drivers
# modinfo lpfc
…
depends: scsi_mod,scsi_transport_fc
vermagic: 2.6.18-194.el5 SMP mod_unload gcc-4.1
parm: lpfc_log_verbose:Verbose logging bit-mask (int)
SCSI Lower Layer FC ISCSISAS
Etc…
Log mask definitions (message number range, verbose bit, description):
LOG_ELS         100–199    0x1     ELS events
LOG_DISCOVERY   200–299    0x2     Link discovery events
LOG_INIT        400–499    0x8     Initialization events
LOG_FCP         700–799    0x40    FCP traffic history
Reserved        800–899
LOG_NODE        900–999    0x80    Node table events
LOG_SECURITY    1000–1099  0x8000  FC security
Reserved        1100–1199
LOG_MISC        1200–1299  0x400   Miscellaneous events
LOG_LINK_EVENT  1300–1399  0x10    Link events
…

0xdb by digit position: hex D = dec 13 = bin 1101; hex B = dec 11 = bin 1011.

We (Greg and Chris) find that logging level 0xdb works GREAT in situations in which you need to see what is going on. The breakdown above shows how to calculate the logging level in case you wish to log something else: lpfc_log_verbose=0xdb. Exact masks depend on the driver (check the source).
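The 0xdb value is just the OR of the mask bits from the table above, which a few lines make explicit:

```python
# Composing lpfc_log_verbose=0xdb from the mask table: OR together the
# bits for the event classes you want logged.
LOG_ELS, LOG_DISCOVERY, LOG_INIT = 0x1, 0x2, 0x8
LOG_LINK_EVENT, LOG_FCP, LOG_NODE = 0x10, 0x40, 0x80

mask = LOG_ELS | LOG_DISCOVERY | LOG_INIT | LOG_LINK_EVENT | LOG_FCP | LOG_NODE
print(hex(mask), bin(mask))   # → 0xdb 0b11011011
```

Drop or add classes to taste; 0xdb simply covers ELS, discovery, init, link, FCP, and node-table events.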
SCSI -- LLDD
Lower layer device drivers – iSCSI
ISCSI
Lower layer device drivers
Low layer device drivers:
• iSCSI transport (software or hardware)
• iscsi_tcp.c
ISCSI
Lower layer device drivers
Loading iSCSI transport class v2.0-871.
cxgb3i: tag itt 0x1fff, 13 bits, age 0xf, 4 bits.
iscsi: registered transport (cxgb3i)
Broadcom NetXtreme II CNIC Driver cnic v2.1.0 (Oct 10, 2009)
cnic: Added CNIC device: eth0
cnic: Added CNIC device: eth1
Broadcom NetXtreme II iSCSI Driver bnx2i v2.1.0 (Dec 06, 2009)
iscsi: registered transport (bnx2i)
scsi4 : Broadcom Offload iSCSI Initiator
scsi5 : Broadcom Offload iSCSI Initiator
iscsi: registered transport (tcp)
iscsi: registered transport (iser)
iscsi: registered transport (be2iscsi)
scsi6 : iSCSI Initiator over TCP/IP
  Vendor: LEFTHAND  Model: iSCSIDisk  Rev: 9500
  Type: Direct-Access  ANSI SCSI revision: 05
SCSI device sda: 2147483648 512-byte hdwr sectors (1099512 MB)
sda: Write Protect is off
sda: Mode Sense: 77 00 00 08
SCSI device sda: drive cache: none
SCSI device sda: 2147483648 512-byte hdwr sectors (1099512 MB)
sda: Write Protect is off
sda: Mode Sense: 77 00 00 08
SCSI device sda: drive cache: none
sda: unknown partition table
sd 6:0:0:0: Attached scsi disk sda
sd 6:0:0:0: Attached scsi generic sg0 type 0

This is the default print of the kernel ring buffer on a RH system where the iSCSI drivers are installed. Note that not all of these drivers are needed:
• iscsi_tcp – software (needed if there are no hardware offload adapters installed)
• cxgb3i – Chelsio hardware adapters
• bnx2i – Broadcom, with cnic used for offload
• iser – aka ib_iser – InfiniBand
• be2iscsi – ServerEngines (acquired by Emulex)
ISCSI
Lower layer device drivers
Enable debug at the iSCSI connection layer (not the TCP interconnect level).

In modprobe.conf, add “options iscsi_tcp debug_iscsi_tcp=1” (NOTE: depends on the driver used)

# /sbin/iscsid -c /etc/iscsi/iscsid.conf -i /etc/iscsi/initiatorname.iscsi -d 1
# iscsiadm -m node --loginall=automatic
Logging in to [iface: default, target: iqn.2003-10.com.lefthandnetworks:labtest:39:test, portal: 10.1.0.42,3260]
Login to [iface: default, target: iqn.2003-10.com.lefthandnetworks:labtest:39:test, portal: 10.1.0.42,3260] successful

Though the CLI shows the same information on STDOUT, the log file will have far more.
SCSI
Mid and upper layers
SCSI Architecture – t10.org
SCSI – mid and upper layers
Review of LLDD inquiry
SCSI – mid and upper layers
The LLDD layer has discovered targets and initialized its structures. It has issued a SCSI_SCAN() or REPORT_LUN().

Example: an RSCN comes in over the wire, triggers error handling at the HBA, and the HBA issues a LIP to rescan.

LLDD driver:
qla2x00_do_dpc(void *data) --> qla2x00_rescan_fcports --> qla2x00_update_fcport --> qla2x00_lun_discovery() --> qla2x00_rpt_lun_discovery() --> qla2x00_report_lun()

Same for iSCSI: once communication is established for the session, a REPORT_LUN() is issued.

• Devices are returned to the SCSI mid and upper layers for driver registration and context building.
• udev wakes up on an interrupt and builds the dynamic user-level device files from the uevent.
SCSI
Mid and upper layers -- DEBUG
SCSI logging
SCSI mid and upper: DEBUG
Many levels exist; a good starting point:

To enable:
echo 0x9411 > /proc/sys/dev/scsi/logging_level
To disable:
echo 0 > /proc/sys/dev/scsi/logging_level

See more details at:
http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/Enable-verbose-debugging-with-native-SCSI-drivers-on-Linux/ba-p/89955
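The 0x9411 starting point is not arbitrary: scsi_logging_level packs a 3-bit verbosity per logging facility. A decomposition sketch – the shift values below match scsi_logging.h in kernels of this era, but verify them against your own tree:

```python
# scsi_logging_level packs a 3-bit level per facility at these shifts
# (assumed from scsi_logging.h; check your kernel source).
SHIFTS = {"error": 0, "timeout": 3, "scan": 6,
          "mlqueue": 9, "mlcomplete": 12, "llqueue": 15}

def pack(levels):
    return sum(level << SHIFTS[name] for name, level in levels.items())

mask = pack({"error": 1, "timeout": 2, "mlqueue": 2,
             "mlcomplete": 1, "llqueue": 1})
print(hex(mask))   # → 0x9411
```

So 0x9411 turns up mid-layer queuing/completion and LLD queuing tracing with moderate timeout detail, without flooding every facility at maximum verbosity.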
SCSI debug runneth over
SCSI mid and upper: DEBUG
WARNING: the above logging level will fill up /var/log/messages at a rapid pace. Make sure you know what you are looking for, and try to keep the debug window down to 4–6 hours.

The levels are described in scsi_logging.h, found in any distribution and on:
http://linuxdb.corp.hp.com

Example:
http://brunel.gbr.hp.com/suse/lxr/http-SLES10-x86_64/source/drivers/scsi/scsi_logging.h
Thank you