© 2007 Cisco Systems, Inc. All rights reserved. Cisco PublicPresentation_ID 1
Writing RDMA applications on Linux
Roland Dreier <[email protected]>

RDMA?

RDMA: Remote DMA (Remote Direct Memory Access)
one-sided operations
get/put semantics
direct data placement
...but wait, there's more...
asynchronous work/completion queues
kernel bypass

RDMA:
InfiniBand
iWARP

RDMA Verbs

Verbs?

Verbs: not quite an API; “abstract definition of functionality”

Verbs: resources (objects) operated on by verbs (functions)

Verbs:
create object
destroy object
more interesting things...

Objects: device context

Objects: queue pair (QP)
send queue & receive queue
post send
post receive
modify state

Objects: completion queue (CQ)
work request completions reported as CQ entries
poll CQ
request notification

Objects: completion channel

Objects: memory region (MR)

Objects: protection domain (PD)

Objects: shared receive queue (SRQ)
multiple QPs can share a receive queue
cleaning up is a little tricky
post receive

Objects:
address handle (AH)
memory window (MW)

Work processing:
requests from WQs get executed
completions are reported to CQs
mostly things stay in order
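
The work-processing flow above can be sketched with a small toy model in plain C. This is only an illustration under stated assumptions: every name here (`toy_queue`, `toy_post`, `toy_execute_one`, `toy_poll`) is hypothetical and none of it is the libibverbs API. The model is strictly first-in, first-out, which simplifies the slide's "mostly" in order.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DEPTH 8

/* Hypothetical stand-ins for illustration; these are not libibverbs types. */
struct toy_wr { unsigned long wr_id; };
struct toy_wc { unsigned long wr_id; int status; };

struct toy_queue {
    struct toy_wr wq[TOY_DEPTH];   /* posted, not yet executed */
    size_t wq_head, wq_count;
    struct toy_wc cq[TOY_DEPTH];   /* completed, not yet polled */
    size_t cq_head, cq_count;
};

/* Post a work request. Returns 0 on success, -1 if the work queue is full. */
int toy_post(struct toy_queue *q, unsigned long wr_id)
{
    if (q->wq_count == TOY_DEPTH)
        return -1;
    q->wq[(q->wq_head + q->wq_count) % TOY_DEPTH].wr_id = wr_id;
    q->wq_count++;
    return 0;
}

/* "Hardware" step: execute the oldest request and report its completion.
 * Returns 1 if a request was executed, 0 if the WQ is empty or the CQ is full. */
int toy_execute_one(struct toy_queue *q)
{
    struct toy_wc *wc;

    if (q->wq_count == 0 || q->cq_count == TOY_DEPTH)
        return 0;
    wc = &q->cq[(q->cq_head + q->cq_count) % TOY_DEPTH];
    wc->wr_id = q->wq[q->wq_head].wr_id;
    wc->status = 0;                /* 0 means success in this toy model */
    q->wq_head = (q->wq_head + 1) % TOY_DEPTH;
    q->wq_count--;
    q->cq_count++;
    return 1;
}

/* Copy up to max completions into out[], oldest first. Returns the count. */
int toy_poll(struct toy_queue *q, int max, struct toy_wc *out)
{
    int n = 0;

    assert(max >= 0);
    while (n < max && q->cq_count > 0) {
        out[n++] = q->cq[q->cq_head];
        q->cq_head = (q->cq_head + 1) % TOY_DEPTH;
        q->cq_count--;
    }
    return n;
}
```

The application only ever touches the two ends: it posts to the work queue and polls the completion queue, matching each completion back to its request through `wr_id`. Execution in between happens without the application being involved.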

Linux & RDMA

librdmacm

librdmacm
Linux library to abstract connection setup
same code runs on IB and iWARP

librdmacm
mimics TCP socket model
“cm_id” is socket analog
IP addressing used even on InfiniBand
additional address/route resolution steps

librdmacm
events reported through “channels”
rdma_create_event_channel()
rdma_get_cm_event()
rdma_ack_cm_event()

librdmacm
active connection steps
rdma_resolve_addr()
rdma_resolve_route()
rdma_connect()
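
The order of the active-side steps can be sketched as a small state machine. This is a toy model, not librdmacm code: the enums and functions are hypothetical, and the event names only mirror the library's `RDMA_CM_EVENT_ADDR_RESOLVED`, `RDMA_CM_EVENT_ROUTE_RESOLVED` and `RDMA_CM_EVENT_ESTABLISHED` events. The comments name the real call an application would typically make on each transition.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for illustration; not the library's own definitions. */
enum act_event {
    ACT_EV_ADDR_RESOLVED,
    ACT_EV_ROUTE_RESOLVED,
    ACT_EV_ESTABLISHED,
    ACT_EV_OTHER              /* errors, rejects, disconnects, ... */
};

enum act_state {
    ACT_ST_RESOLVING_ADDR,    /* rdma_resolve_addr() has been called */
    ACT_ST_RESOLVING_ROUTE,   /* rdma_resolve_route() has been called */
    ACT_ST_CONNECTING,        /* rdma_connect() has been called */
    ACT_ST_CONNECTED,
    ACT_ST_FAILED
};

/* Advance the active side by one event taken from the event channel. */
enum act_state act_step(enum act_state s, enum act_event ev)
{
    if (s == ACT_ST_RESOLVING_ADDR && ev == ACT_EV_ADDR_RESOLVED)
        return ACT_ST_RESOLVING_ROUTE;   /* next: rdma_resolve_route() */
    if (s == ACT_ST_RESOLVING_ROUTE && ev == ACT_EV_ROUTE_RESOLVED)
        return ACT_ST_CONNECTING;        /* next: rdma_connect() */
    if (s == ACT_ST_CONNECTING && ev == ACT_EV_ESTABLISHED)
        return ACT_ST_CONNECTED;         /* ready to post work requests */
    return ACT_ST_FAILED;                /* anything else is unexpected */
}

/* Feed a sequence of events, starting right after rdma_resolve_addr(). */
enum act_state act_run(const enum act_event *evs, size_t n)
{
    enum act_state s = ACT_ST_RESOLVING_ADDR;
    size_t i;

    assert(evs != NULL || n == 0);
    for (i = 0; i < n && s != ACT_ST_FAILED; i++)
        s = act_step(s, evs[i]);
    return s;
}
```

In a real program each event would be retrieved with `rdma_get_cm_event()` and released with `rdma_ack_cm_event()` before moving to the next step.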

librdmacm
passive connection steps
rdma_bind_addr()
rdma_listen()
rdma_accept()
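
On the passive side, each incoming connection request arrives as an event that carries a new cm_id, much as `accept()` returns a new socket. The sketch below models that bookkeeping in plain C. It is a toy under stated assumptions: `pas_listener`, `pas_handle` and the enums are hypothetical names, slots in an array stand in for the new cm_ids, and the event names only mirror librdmacm's `RDMA_CM_EVENT_CONNECT_REQUEST` and `RDMA_CM_EVENT_ESTABLISHED`.

```c
#include <assert.h>
#include <stddef.h>

#define PAS_MAX_CONN 4

/* Local stand-ins for illustration; not the library's own definitions. */
enum pas_event { PAS_EV_CONNECT_REQUEST, PAS_EV_ESTABLISHED };
enum pas_conn_state { PAS_FREE = 0, PAS_ACCEPTING, PAS_CONNECTED };

/* State kept by a listener after rdma_bind_addr() and rdma_listen(). */
struct pas_listener {
    enum pas_conn_state conn[PAS_MAX_CONN];
};

/* Handle one event.
 * CONNECT_REQUEST: id is ignored; a free slot is claimed and its index is
 *                  returned (a real program would call rdma_accept() here).
 * ESTABLISHED:     id is the slot returned earlier; it becomes connected.
 * Returns the slot index, or -1 if the event cannot be handled. */
int pas_handle(struct pas_listener *l, enum pas_event ev, int id)
{
    int i;

    assert(l != NULL);
    if (ev == PAS_EV_CONNECT_REQUEST) {
        for (i = 0; i < PAS_MAX_CONN; i++) {
            if (l->conn[i] == PAS_FREE) {
                l->conn[i] = PAS_ACCEPTING;
                return i;
            }
        }
        return -1;   /* no room: a real program could call rdma_reject() */
    }
    if (ev == PAS_EV_ESTABLISHED && id >= 0 && id < PAS_MAX_CONN &&
        l->conn[id] == PAS_ACCEPTING) {
        l->conn[id] = PAS_CONNECTED;
        return id;
    }
    return -1;
}
```

The listener itself never becomes "connected"; it keeps listening while each accepted connection moves through its own states.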

libibverbs

libibverbs
Linux implementation of RDMA verbs

libibverbs
Loads device-specific drivers for hardware support
IB: libmthca, libmlx4, libipathverbs, libehca
iWARP: libcxgb3, libamso

libibverbs
Creating QP can be confusing
rdma_create_qp() vs. ibv_create_qp()
all those parameters!

libibverbs
Posting work requests is tricky too
send opcodes
iWARP doesn't have immediate data or atomics
signaled and unsignaled completions
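
Two of the points above lend themselves to a small sketch: which send opcodes are portable across transports, and how often to ask for a completion. The helpers below are hypothetical and use local enums whose names only mirror the libibverbs `IBV_WR_*` opcodes; the transport rule is taken directly from the slide. The "signal every Nth send" policy is one common idiom for unsignaled completions, shown here as an assumption rather than as the speaker's recommendation.

```c
#include <assert.h>

/* Local stand-ins for illustration; not the libibverbs definitions. */
enum toy_transport { TOY_IB, TOY_IWARP };

enum toy_opcode {
    TOY_SEND,
    TOY_SEND_WITH_IMM,
    TOY_RDMA_WRITE,
    TOY_RDMA_WRITE_WITH_IMM,
    TOY_RDMA_READ,
    TOY_ATOMIC_CMP_AND_SWP,
    TOY_ATOMIC_FETCH_AND_ADD
};

/* Returns 1 if the opcode can be used on the transport, 0 otherwise.
 * Per the slide, iWARP has no immediate data and no atomics. */
int toy_opcode_ok(enum toy_transport t, enum toy_opcode op)
{
    if (t == TOY_IB)
        return 1;
    switch (op) {
    case TOY_SEND_WITH_IMM:
    case TOY_RDMA_WRITE_WITH_IMM:
    case TOY_ATOMIC_CMP_AND_SWP:
    case TOY_ATOMIC_FETCH_AND_ADD:
        return 0;
    default:
        return 1;
    }
}

/* Decide whether the next send should request a completion.
 * posted is the number of sends posted so far; every interval-th send is
 * signaled, and the ones in between are left unsignaled. */
int toy_should_signal(unsigned posted, unsigned interval)
{
    assert(interval > 0);
    return (posted + 1) % interval == 0;
}
```

With the real library, a signaled send would set the `IBV_SEND_SIGNALED` flag on the work request, assuming the QP was created without signaling every send. Code meant to run on both InfiniBand and iWARP would stick to the opcodes for which `toy_opcode_ok()` returns 1 on both transports.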
Q and A