© 2007 Cisco Systems, Inc. All rights reserved. Cisco Public

Writing RDMA applications on Linux

Roland Dreier <[email protected]>

RDMA?

RDMA:

Remote Direct Memory Access

one-sided operations

get/put semantics

direct data placement
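As a rough illustration of one-sided get/put semantics, here is a toy model in plain C. It is not the libibverbs API: `mem_region`, `mem_put`, `mem_get` and the `rkey` field are hypothetical names chosen for this sketch. The point it tries to show is that the initiator names the remote location itself, and the data lands there (or is fetched from there) without any matching call on the target side.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a registered buffer on the target side.
 * Hypothetical type for illustration; not a libibverbs structure. */
struct mem_region {
    uint8_t *base;     /* start of the registered buffer */
    size_t len;        /* length in bytes */
    uint32_t rkey;     /* key the initiator must present */
    int remote_write;  /* nonzero if remote puts are allowed */
    int remote_read;   /* nonzero if remote gets are allowed */
};

/* Key must match and [offset, offset + n) must lie inside the region. */
static int mem_check(const struct mem_region *r, uint32_t rkey,
                     size_t offset, size_t n)
{
    return rkey == r->rkey && offset <= r->len && n <= r->len - offset;
}

/* "Put": data is placed directly at the offset the initiator names. */
int mem_put(struct mem_region *r, uint32_t rkey, size_t offset,
            const void *src, size_t n)
{
    assert(r != NULL && src != NULL);
    if (!r->remote_write || !mem_check(r, rkey, offset, n))
        return -1;
    memcpy(r->base + offset, src, n);
    return 0;
}

/* "Get": the initiator reads target memory; the target posts nothing. */
int mem_get(const struct mem_region *r, uint32_t rkey, size_t offset,
            void *dst, size_t n)
{
    assert(r != NULL && dst != NULL);
    if (!r->remote_read || !mem_check(r, rkey, offset, n))
        return -1;
    memcpy(dst, r->base + offset, n);
    return 0;
}
```

In this sketch a put with the wrong key or an out-of-range offset simply fails with -1; real adapters report such errors through completions instead.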

RDMA:

Remote Direct Memory Access

...but wait, there's more...

RDMA:

Remote Direct Memory Access

Asynch work/completion queues

Kernel bypass

RDMA:

InfiniBand

iWARP

RDMA Verbs

Verbs?

Verbs:

not quite an API;

“abstract definition of functionality”

Verbs:

resources (objects)

operated on by

verbs (functions)

Verbs:

create object

destroy object

more interesting things...

Objects:

device context

Objects:

queue pair (QP)

send queue & receive queue

Objects:

queue pair (QP)

post send

post receive

modify state

Objects:

completion queue (CQ)

work request completions reported as CQ entries

Objects:

completion queue (CQ)

poll CQ

request notification
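The difference between polling a CQ and requesting notification can be sketched with a small model in plain C. The types and the `cq_` functions below are hypothetical stand-ins, not libibverbs structures. The one detail the model tries to capture is that a notification request is one-shot: after it fires once, the CQ has to be re-armed before another event is delivered (in libibverbs the corresponding calls are `ibv_poll_cq()` and `ibv_req_notify_cq()`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CQ_DEPTH 16

/* Toy completion queue; hypothetical, for illustration only. */
struct cq_model {
    uint64_t wr_ids[CQ_DEPTH]; /* FIFO of completed work request IDs */
    size_t head, count;
    int armed;   /* set by cq_req_notify(), cleared when an event fires */
    int events;  /* notification events delivered so far */
};

/* Arm the CQ: the next completion added raises exactly one event. */
void cq_req_notify(struct cq_model *cq)
{
    assert(cq != NULL);
    cq->armed = 1;
}

/* Adapter side: report a finished work request as a CQ entry. */
int cq_add_completion(struct cq_model *cq, uint64_t wr_id)
{
    if (cq->count == CQ_DEPTH)
        return -1;                 /* CQ overrun */
    cq->wr_ids[(cq->head + cq->count) % CQ_DEPTH] = wr_id;
    cq->count++;
    if (cq->armed) {               /* notification is one-shot */
        cq->armed = 0;
        cq->events++;
    }
    return 0;
}

/* Application side: non-blocking poll, returns number of entries copied. */
int cq_poll(struct cq_model *cq, int max, uint64_t *out)
{
    int n = 0;
    while (n < max && cq->count > 0) {
        out[n++] = cq->wr_ids[cq->head];
        cq->head = (cq->head + 1) % CQ_DEPTH;
        cq->count--;
    }
    return n;
}
```

A consequence visible in the model: two completions arriving after a single arm produce one event, so the handler should poll until the CQ is empty rather than assume one entry per event.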

Objects:

completion channel

Objects:

memory region (MR)

Objects:

protection domain (PD)

Objects:

shared receive queue (SRQ)

multiple QPs can share a receive queue

cleaning up is a little tricky

Objects:

shared receive queue (SRQ)

post receive

Objects:

address handle (AH)

memory window (MW)

Work processing:

requests from WQs get executed

completions are reported to CQs

mostly things stay in order
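The flow from work queue to completion queue can be sketched as two FIFOs in plain C. Everything here (`sim_ring`, `sim_post`, `sim_execute_one`, `sim_poll`) is a hypothetical model, not the verbs API; it only illustrates that, within a single work queue, requests are executed and reported in the order they were posted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_DEPTH 8

/* FIFO ring of work request IDs (hypothetical, for illustration). */
struct sim_ring {
    uint64_t ids[SIM_DEPTH];
    size_t head, count;
};

static int sim_push(struct sim_ring *r, uint64_t id)
{
    if (r->count == SIM_DEPTH)
        return -1;
    r->ids[(r->head + r->count) % SIM_DEPTH] = id;
    r->count++;
    return 0;
}

static int sim_pop(struct sim_ring *r, uint64_t *id)
{
    if (r->count == 0)
        return -1;
    *id = r->ids[r->head];
    r->head = (r->head + 1) % SIM_DEPTH;
    r->count--;
    return 0;
}

/* Application side: queue a work request on the work queue. */
int sim_post(struct sim_ring *wq, uint64_t wr_id)
{
    assert(wq != NULL);
    return sim_push(wq, wr_id);
}

/* Adapter side: execute the oldest request and report its completion. */
int sim_execute_one(struct sim_ring *wq, struct sim_ring *cq)
{
    uint64_t id;
    if (cq->count == SIM_DEPTH || sim_pop(wq, &id) != 0)
        return -1;
    return sim_push(cq, id);
}

/* Application side: reap one completion, if any is available. */
int sim_poll(struct sim_ring *cq, uint64_t *wr_id)
{
    return sim_pop(cq, wr_id);
}
```

The model covers one work queue feeding one CQ. When several queues report to the same CQ, the relative order of their completions is not something this sketch (or an application) should rely on.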

Linux & RDMA

librdmacm

Linux library to abstract connection setup

same code runs on IB and iWARP

librdmacm

mimics TCP socket model

“cm_id” is socket analog

IP addressing used even on InfiniBand

additional address/route resolution steps

librdmacm

events reported through “channels”

rdma_create_event_channel()

rdma_get_cm_event()

rdma_ack_cm_event()

librdmacm

active connection steps

rdma_resolve_addr()

rdma_resolve_route()

rdma_connect()
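Each of the three active-side calls is asynchronous: it returns at once, and the result arrives later as an event on the channel. The sketch below models only that ordering in plain C so it can run without RDMA hardware. The `act_` enums and `act_next_call()` are hypothetical; in a real program the events would be `RDMA_CM_EVENT_ADDR_RESOLVED`, `RDMA_CM_EVENT_ROUTE_RESOLVED` and `RDMA_CM_EVENT_ESTABLISHED`, read with `rdma_get_cm_event()`.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the librdmacm events involved (hypothetical names). */
enum act_event {
    ACT_ADDR_RESOLVED,   /* result of rdma_resolve_addr() */
    ACT_ROUTE_RESOLVED,  /* result of rdma_resolve_route() */
    ACT_ESTABLISHED,     /* result of rdma_connect() */
    ACT_OTHER            /* any error or unexpected event */
};

enum act_state {
    ACT_WAIT_ADDR,       /* rdma_resolve_addr() has been called */
    ACT_WAIT_ROUTE,
    ACT_WAIT_CONNECT,
    ACT_CONNECTED,
    ACT_FAILED
};

/* Advance the state machine on an event.
 * Returns the name of the next call to make, or NULL if there is none. */
const char *act_next_call(enum act_state *st, enum act_event ev)
{
    assert(st != NULL);
    if (*st == ACT_WAIT_ADDR && ev == ACT_ADDR_RESOLVED) {
        *st = ACT_WAIT_ROUTE;
        return "rdma_resolve_route";
    }
    if (*st == ACT_WAIT_ROUTE && ev == ACT_ROUTE_RESOLVED) {
        *st = ACT_WAIT_CONNECT;
        return "rdma_connect";
    }
    if (*st == ACT_WAIT_CONNECT && ev == ACT_ESTABLISHED) {
        *st = ACT_CONNECTED;
        return NULL;
    }
    *st = ACT_FAILED;    /* out-of-order or error event */
    return NULL;
}
```

Starting in `ACT_WAIT_ADDR` and feeding the three events in order walks the model to `ACT_CONNECTED`; any other sequence ends in `ACT_FAILED`, which is where a real client would tear the cm_id down.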

librdmacm

passive connection steps

rdma_bind_addr()

rdma_listen()

rdma_accept()
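On the passive side, a connection request arrives as an event carrying a new cm_id, much as `accept()` hands back a new socket; `rdma_accept()` is then called on that new id, not on the listening one. The model below is a plain C sketch of that bookkeeping with hypothetical `srv_` names and small integer slots standing in for cm_ids. The corresponding librdmacm events are `RDMA_CM_EVENT_CONNECT_REQUEST`, `RDMA_CM_EVENT_ESTABLISHED` and `RDMA_CM_EVENT_DISCONNECTED`.

```c
#include <assert.h>
#include <stddef.h>

#define SRV_MAX_CONN 4

/* Local stand-ins for the librdmacm events involved (hypothetical names). */
enum srv_event { SRV_CONNECT_REQUEST, SRV_ESTABLISHED, SRV_DISCONNECTED };

enum srv_conn_state { SRV_UNUSED = 0, SRV_ACCEPTING, SRV_UP };

/* A listener plus the per-connection ids it has handed out. */
struct srv_listener {
    enum srv_conn_state conn[SRV_MAX_CONN];
};

/* Handle one event. For SRV_CONNECT_REQUEST the conn argument is ignored
 * and a new connection slot is returned; otherwise conn names the slot.
 * Returns the slot acted on, or -1 if the event could not be applied. */
int srv_handle(struct srv_listener *l, enum srv_event ev, int conn)
{
    int i;

    assert(l != NULL);
    if (ev == SRV_CONNECT_REQUEST) {
        for (i = 0; i < SRV_MAX_CONN; i++) {
            if (l->conn[i] == SRV_UNUSED) {
                l->conn[i] = SRV_ACCEPTING;  /* real code: rdma_accept() */
                return i;
            }
        }
        return -1;                           /* no free slot */
    }
    if (conn < 0 || conn >= SRV_MAX_CONN)
        return -1;
    if (ev == SRV_ESTABLISHED && l->conn[conn] == SRV_ACCEPTING) {
        l->conn[conn] = SRV_UP;
        return conn;
    }
    if (ev == SRV_DISCONNECTED && l->conn[conn] != SRV_UNUSED) {
        l->conn[conn] = SRV_UNUSED;
        return conn;
    }
    return -1;
}
```

The listener itself never changes state in this sketch, which mirrors the socket analogy: after `rdma_bind_addr()` and `rdma_listen()`, it keeps producing new ids for as long as requests arrive.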

libibverbs

Linux implementation of RDMA verbs

libibverbs

Loads device-specific drivers for hardware support

IB: libmthca, libmlx4, libipathverbs, libehca

iWARP: libcxgb3, libamso

libibverbs

Creating QP can be confusing

rdma_create_qp() vs. ibv_create_qp()

all those parameters!

libibverbs

Posting work requests is tricky too

send opcodes

iWARP doesn't have immed data or atomics

signaled and unsignaled completions
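One reason signaled versus unsignaled completions is tricky: a send posted without `IBV_SEND_SIGNALED` (on a QP created with `sq_sig_all` set to 0) produces no completion entry on success, and send queue slots are commonly reclaimed only when a later signaled completion is polled. The plain C model below illustrates that effect under exactly this assumption. The `sq_` names are hypothetical, and real provider behaviour may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

#define SQ_DEPTH 4

/* Toy send queue (hypothetical). Assumption of this sketch: slots are
 * freed only when a signaled completion is polled. */
struct sq_model {
    int signaled[SQ_DEPTH]; /* per-slot flag, in ring order */
    size_t head;            /* oldest outstanding slot */
    size_t posted;          /* requests posted and not yet reclaimed */
    size_t done;            /* of those, how many the adapter finished */
};

/* Post a send; fails when every slot is still outstanding. */
int sq_post_send(struct sq_model *sq, int signaled)
{
    assert(sq != NULL);
    if (sq->posted == SQ_DEPTH)
        return -1;          /* send queue full */
    sq->signaled[(sq->head + sq->posted) % SQ_DEPTH] = signaled;
    sq->posted++;
    return 0;
}

/* Adapter side: finish every request posted so far. */
void sq_finish_all(struct sq_model *sq)
{
    sq->done = sq->posted;
}

/* Poll for one send completion. Returns 1 if a signaled completion was
 * reaped, 0 otherwise. Reaping frees that slot and all unsignaled slots
 * queued before it. */
int sq_poll(struct sq_model *sq)
{
    size_t i;

    for (i = 0; i < sq->done; i++) {
        if (sq->signaled[(sq->head + i) % SQ_DEPTH]) {
            size_t freed = i + 1;
            sq->head = (sq->head + freed) % SQ_DEPTH;
            sq->posted -= freed;
            sq->done -= freed;
            return 1;
        }
    }
    return 0;
}
```

In the model, filling the queue with unsignaled sends leaves it stuck even after the adapter has finished them all, while signaling the last request of each batch lets a single poll free the whole batch. That is the usual motivation for signaling every Nth send rather than none.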

Q and A
