Copyright DataDirect Networks - All Rights Reserved - Not reproducible without express written permission

Adventures Installing Infiniband Storage

Randy Kreiser, Chief Architect

Sonoma OpenFabrics Workshop, 1 May 2007

Meet the Players (Hardware)

Host Channel Adapters & Switches
– Mellanox
– QLogic
– Voltaire
– Cisco

Storage
– DataDirect Networks
– Engenio
– Texas Memory (SSD)
– Others?

Meet the Players (Software)

InfiniBand Drivers
– OFED
– Mellanox IBGLD
– QLogic
– Voltaire
– Cisco

Subnet Manager
– OpenSM
– QLogic
– Voltaire
– Cisco

Decisions, Decisions, Decisions

What operating system am I using?
– SUSE
– Red Hat
– Other?

What HCA should I use?
– PCI-X
– PCIe

What switch should I use?
– Port count?

What initiator driver should I use?
– Performance ???
– Compatibility
– Failover

What storage should I use?
– Performance ???
   IOPS
   Bandwidth

Decisions, Decisions, Decisions

SRP or iSER drivers?

Which subnet manager should I use?

Where should the subnet manager run?
– Switch
– Host

Troubleshooting
– I can’t see any LUNs

Benchmarking
– 600 MB/s
– 800 MB/s
– 1000 MB/s
– 2000 MB/s

Direct Connect

[Diagram: direct-connect test setup. Labels: Test Host; HCA (six); DCE; IB 4X; FCAL; S2A Controller 1; S2A Controller 2; Tier 1 through Tier 8.]

Benchmarking

O_DIRECT I/O vs. non-O_DIRECT I/O
– Large sequential I/O
– Small random I/O

Software striping
– Chunk size

Block device max sectors
– MAX SECT
– SG_TABLE_SIZE

Block device read-ahead
– hdparm
– blockdev

Queue depth
– Setting

RAID controller settings
– Cache size
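
The block-layer settings listed above can be inspected on Linux through sysfs. The sketch below is a minimal illustration, not the procedure used for the benchmarks in this talk. It assumes a Linux host; the device name `sdc` is only an example, and `read_queue_settings` is a hypothetical helper. The attributes it reads are the standard `queue/max_sectors_kb`, `queue/max_hw_sectors_kb`, `queue/read_ahead_kb` and `queue/nr_requests` files and, for SCSI devices, `device/queue_depth`.

```python
from pathlib import Path

# Standard Linux sysfs attributes, relative to /sys/block/<device>/.
ATTRIBUTES = (
    "queue/max_sectors_kb",     # largest request the block layer will issue (KB)
    "queue/max_hw_sectors_kb",  # upper limit imposed by the hardware/driver (KB)
    "queue/read_ahead_kb",      # read-ahead window (KB)
    "queue/nr_requests",        # size of the block-layer request queue
    "device/queue_depth",       # SCSI device queue depth
)


def read_queue_settings(device, sysfs_root="/sys/block"):
    """Return {attribute: int or None} for one block device.

    Hypothetical helper for illustration. Attributes that are missing
    or not numeric are reported as None instead of raising.
    """
    base = Path(sysfs_root) / device
    settings = {}
    for attr in ATTRIBUTES:
        try:
            settings[attr] = int((base / attr).read_text().strip())
        except (OSError, ValueError):
            settings[attr] = None
    return settings


if __name__ == "__main__":
    # "sdc" is an example device name, as used in the benchmark tables.
    for name, value in read_queue_settings("sdc").items():
        print(f"{name}: {value}")
```

Note that `blockdev --getra /dev/sdc` reports read-ahead in 512-byte sectors, whereas `read_ahead_kb` is in kilobytes, so the two values differ by a factor of two for the same setting.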

Benchmarking

Write performance

blk size   /dev/sdc   c+d+e+f
256MB      686.56     2527.49
128MB      684.54     2473.39
64MB       677.64     2375.96
32MB       673.22     2223.60
16MB       660.31     1967.58
8MB        638.19     1614.75
4MB        587.30     1336.12
2MB        523.75      792.44
1MB        419.26      420.73
512KB      314.54      317.76
256KB      217.89      221.72
128KB      151.55      154.67

Read performance

blk size   /dev/sdc   c+d+e+f
256MB      616.66     1793.89
128MB      603.98     1677.27
64MB       596.96     1573.50
32MB       583.34     1461.18
16MB       594.86     1414.46
8MB        575.79     1298.77
4MB        535.69     1112.40
2MB        476.80      672.72
1MB        386.84      366.45
512KB      295.09      288.99
256KB      213.43      208.64
128KB      158.39      158.00
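
To read the two tables side by side, it helps to compute how much the `c+d+e+f` column gains over the single `/dev/sdc` column at each block size. The sketch below does only that arithmetic on the figures above. The slides do not state the units or spell out what `c+d+e+f` means; treating it as four devices used together is an assumption, and `scaling` is a hypothetical helper name.

```python
# Figures copied from the tables above:
# block size -> (/dev/sdc column, c+d+e+f column)
WRITE = {
    "256MB": (686.56, 2527.49), "128MB": (684.54, 2473.39),
    "64MB": (677.64, 2375.96), "32MB": (673.22, 2223.60),
    "16MB": (660.31, 1967.58), "8MB": (638.19, 1614.75),
    "4MB": (587.30, 1336.12), "2MB": (523.75, 792.44),
    "1MB": (419.26, 420.73), "512KB": (314.54, 317.76),
    "256KB": (217.89, 221.72), "128KB": (151.55, 154.67),
}
READ = {
    "256MB": (616.66, 1793.89), "128MB": (603.98, 1677.27),
    "64MB": (596.96, 1573.50), "32MB": (583.34, 1461.18),
    "16MB": (594.86, 1414.46), "8MB": (575.79, 1298.77),
    "4MB": (535.69, 1112.40), "2MB": (476.80, 672.72),
    "1MB": (386.84, 366.45), "512KB": (295.09, 288.99),
    "256KB": (213.43, 208.64), "128KB": (158.39, 158.00),
}


def scaling(results):
    """Ratio of the c+d+e+f column to the /dev/sdc column, per block size."""
    return {blk: combined / single for blk, (single, combined) in results.items()}


if __name__ == "__main__":
    for label, table in (("write", WRITE), ("read", READ)):
        print(label)
        for blk, ratio in scaling(table).items():
            print(f"  {blk:>6}  {ratio:.2f}x")
```

With these figures the ratio stays close to 1 at block sizes of 1MB and below and grows with block size above that, reaching roughly 3.7x for writes and 2.9x for reads at 256MB.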

S2A 9900 Hardware Specifications (What’s Next)

Specification               S2A9900 Couplet               S2A9550 Couplet
Supported Disk Technology   SAS & SATA                    FibreChannel & SATA
RAID Parity Protection      RAID6 8+2 only                RAID3 (8+1+1), RAID6 8+2
Sustained Throughput        5.6 GB/s – 6.0 GB/s           2.4 GB/s – 2.8 GB/s
Maximum Cache               5.0 GB ECC protected          2.5 GB RAID protected
Minimum Cache               2.5 GB ECC protected          2.5 GB RAID protected
Disk Side Ports             20 x SAS 4 Lane               20 x FC-2
Host Side FC Ports          8 x IB 4x DDR or 8 x FC-8     8 x FC-4 or 8 x IB 4x
Dimensions                  7 x 19 x 28 in. (4U)          7 x 19 x 25 in. (4U)
Certifications              UL, CE, CUL, C-Tick, FCC      UL, CE, CUL, C-Tick, FCC
Release Date                1Q/2008                       September 2005

SRP

SRP (SCSI RDMA Protocol)

Advantages
– InfiniBand native protocol
– No new hardware required
– Requests carry buffer information
– All data transfer through InfiniBand RDMA
– No need for multiple packets
– No flow control for data packets necessary

Direct Connect Example

• IB ports with direct connections
• Data distribution through servers
• Asymmetrical file systems (Lustre, etc.)

SRP General

SCSI RDMA Protocol
– SCSI over IB
– Similar to FCP (SCSI over Fibre Channel), except that the CMD Information Unit includes the addresses to get/place data.
– Initiator drivers are available from IB software vendors and in OFED.
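
To make "the CMD Information Unit includes addresses to get/place data" concrete, the sketch below packs one SRP direct data buffer descriptor as it is commonly laid out (for example in `struct srp_direct_buf` in the Linux kernel's `scsi/srp.h`): a 64-bit virtual address, a 32-bit memory handle (the RDMA key), and a 32-bit length, all big-endian. The function names and the sample values are illustrative assumptions, and this is not a full SRP_CMD encoder.

```python
import struct

# One direct data buffer descriptor: 8-byte virtual address,
# 4-byte memory handle (R_Key), 4-byte length, big-endian (16 bytes).
DIRECT_DESC = struct.Struct(">QII")


def pack_direct_descriptor(virtual_address, memory_handle, length):
    """Encode the buffer the target should RDMA to or from."""
    return DIRECT_DESC.pack(virtual_address, memory_handle, length)


def unpack_direct_descriptor(raw):
    """Decode a 16-byte descriptor into (address, handle, length)."""
    return DIRECT_DESC.unpack(raw)


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    desc = pack_direct_descriptor(0x7F0000001000, 0x1234, 1 << 20)
    print(len(desc), desc.hex())
```

Because the request itself names the remote buffer, the target can move the data with RDMA Read or RDMA Write without any further exchange with the initiator's CPU.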

SRP Command Request

iSER

iSER (iSCSI Extensions for RDMA)

iSER leverages iSCSI management and discovery
– Zero-configuration, global storage naming (SLP, iSNS)
– Change notifications and active monitoring of devices and initiators
– High availability, and 3 levels of automated recovery
– Multi-pathing and storage aggregation
– Industry-standard management interfaces (MIB)
– 3rd-party storage managers
– Security (partitioning, authentication, central login control, ...)

Working with iSER over IB doesn’t require changes!
– Enables investment protection (software, education, training, ...)
– Reduces the fear factor of IB

iSCSI Mapping to iSER / RDMA Transport

• iSER eliminates the traditional iSCSI/TCP bottlenecks :

– Zero copy using RDMA

– CRC calculated by hardware

– Work with message boundaries instead of streams

– Transport protocol implemented in hardware (minimal CPU cycles per IO)

[Diagram: iSCSI mapping to iSER. Labels: iSCSI PDU (BHS, AHS, HD, Data, DD); Protocol frames (RDMA): RC Send, RC RDMA Read/Write; "In HW" (twice, each marked with an X).]

iSER Protocol (Read)

• SCSI Reads
– Initiator sends a Command PDU (protocol data unit) to the target
– Target returns the data using RDMA Write
– Target sends a Response PDU back when the transaction has completed
– Initiator receives the Response and completes the SCSI operation

[Diagram: iSER read sequence. Columns: iSCSI Initiator, iSER, HCA, HCA, iSER Target, Target Storage. Message labels: Send_Control (SCSI Read Cmd); RDMA Write for Data; Send_Control + Buffer advertisement; Control_Notify; Data_Put (Data-In PDU) for Read; Control_Notify; Send_Control (SCSI Response).]
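
The four read steps listed on this slide can be walked through with a toy model. The sketch below is not an iSER implementation: the classes, method names, and the in-memory "disk" are assumptions for illustration, and a plain buffer copy stands in for the RDMA Write that the HCAs would perform.

```python
from dataclasses import dataclass, field


@dataclass
class Initiator:
    """Toy initiator: owns the read buffer it advertises to the target."""
    buffer: bytearray = field(default_factory=bytearray)
    status: str = ""

    def send_read_command(self, target, offset, length):
        # Step 1: Command PDU to the target, advertising a buffer for the data.
        self.buffer = bytearray(length)
        target.handle_read(self, offset, length)

    def receive_response(self, status):
        # Step 4: the Response completes the SCSI operation.
        self.status = status


@dataclass
class Target:
    """Toy target backed by an in-memory 'disk'."""
    disk: bytes
    log: list = field(default_factory=list)

    def handle_read(self, initiator, offset, length):
        self.log.append("command received")
        data = self.disk[offset:offset + length]
        # Step 2: stand-in for RDMA Write. The data lands directly in the
        # initiator's advertised buffer; no data message is sent.
        initiator.buffer[:len(data)] = data
        self.log.append("rdma write")
        # Step 3: Response PDU once the data transfer is done.
        self.log.append("response sent")
        initiator.receive_response("GOOD")


if __name__ == "__main__":
    target = Target(disk=b"abcdefghij")
    initiator = Initiator()
    initiator.send_read_command(target, 2, 4)
    print(bytes(initiator.buffer), initiator.status, target.log)
```

The point of the model is the ordering: the initiator only sees the command go out and the response come back, while the data arrives in its buffer in between without any per-packet handling on the initiator side.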

iSCSI Discovery-Direct SLP

Client broadcasts: "I’m xx, where is my storage?"

FC routers discover the FC SAN

Relevant iSCSI targets & FC gateways respond

Client may record multiple possible targets & portals

[Diagram labels: iSCSI Client, GbE Switch, FC Switch, IB to IP Router, IB to FC Routers, Native IB RAID]

Portal – a network end-point (IP + port), indicating a path

iSCSI Discovery-iSNS

FC routers discover the FC SAN

iSCSI targets & FC gateways report to the iSNS server

Client asks the iSNS server: "I’m xx, where is my storage?"

iSNS responds with targets and portals

Resources may be divided into domains

Changes are notified immediately (SCNs)

[Diagram labels: iSCSI Client, iSNS Server, GbE Switch, FC Switch, IB to IP Router, IB to FC Routers, Native IB RAID]

iSNS or SLP run over IPoIB or GbE, and can span both networks
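
The roles on this slide (targets report in, clients ask, resources are divided into domains, changes are pushed out) can be sketched with a small in-memory registry. This is only a model of the idea, not the iSNS wire protocol. `ToyNameServer` and its methods are hypothetical, the names and addresses are made up, and 3260 is used because it is the default iSCSI port.

```python
from collections import defaultdict


class ToyNameServer:
    """In-memory stand-in for the iSNS server's role (not the iSNS protocol)."""

    def __init__(self):
        self._portals = {}                   # target name -> list of (ip, port)
        self._domains = defaultdict(set)     # domain -> member names
        self._listeners = defaultdict(list)  # domain -> change callbacks

    def register_target(self, name, portals, domain):
        """A target or gateway reports itself and its portals."""
        self._portals[name] = list(portals)
        self._domains[domain].add(name)
        # Notify interested clients right away, loosely modelling SCNs.
        for notify in self._listeners[domain]:
            notify(name)

    def join(self, client, domain, on_change=None):
        """Place a client in a domain, optionally with a change callback."""
        self._domains[domain].add(client)
        if on_change is not None:
            self._listeners[domain].append(on_change)

    def query(self, client):
        """Targets and portals in the domains the client belongs to."""
        visible = {}
        for members in self._domains.values():
            if client in members:
                for name in members:
                    if name in self._portals:
                        visible[name] = self._portals[name]
        return visible


if __name__ == "__main__":
    server = ToyNameServer()
    server.join("host-a", "dom1", on_change=lambda name: print("changed:", name))
    server.register_target("raid-1", [("192.0.2.10", 3260)], "dom1")
    server.register_target("raid-2", [("192.0.2.20", 3260)], "dom2")
    print(server.query("host-a"))
```

A client in `dom1` sees only `raid-1`, which is the effect the slide describes as dividing resources into domains.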

Conclusion

Both SRP and iSER support RDMA
– Source and destination addresses in the SCSI transfer
– Zero memory copy

SRP uses
– Direct server connections
– Small controlled environments

iSER uses
– Large switch-connected networks
– Discovery fully supported
