53
Use only what you really need – Kernel space buffers in server process Rafal Szczesniak DellEMC / Isilon

Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 1

Use only what you really need –Kernel space buffers in server process

Rafal SzczesniakDellEMC / Isilon

Page 2: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 2

To copy or not to copy?Available alternativesIntroducing the io vectorUtilising the io vectorUse casesThe future

Page 3: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 3

Typical communication with the network

Arriving packets are held in the kernel buffers User process reading from the bsd socket -

copyout Writing the data to a file (descriptor) – copyin Lots of CPU cycles just to copy the data back

and forth

Page 4: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 4

Typical communication with the network

NETWORK INTERFACE

USER SPACE

FILE

Receive

KERNEL SPACE

Buffer

Page 5: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 5

Typical communication with the network

NETWORK INTERFACE

CopyoutUSER SPACE

FILE

Receive

KERNEL SPACE

Buffer

Buffer

Page 6: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 6

Typical communication with the network

NETWORK INTERFACE

CopyoutUSER SPACE

FILE

Receive

KERNEL SPACE

Processing

Buffer

Buffer Buffer

Page 7: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 7

Typical communication with the network

NETWORK INTERFACE

CopyoutUSER SPACE

FILE

Receive

KERNEL SPACE

Processing

Copyin

Buffer

Buffer Buffer

Buffer

Page 8: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 8

Typical communication with the network

NETWORK INTERFACE

CopyoutUSER SPACE

FILE

Receive

KERNEL SPACE

Processing

Copyin

Write

Buffer

Buffer Buffer

Buffer

Page 9: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 9

Networks are faster and more reliable

The larger the MTU (Maximum Transmission Unit) the more data to copy

Significant percentage of the CPU time can be spent on copying alone

Page 10: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 10

To copy or not to copy?Available alternativesIntroducing the io vectorUtilising the io vectorUse casesThe future

Page 11: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 11

Zero-copy An attempt to avoid wasting the CPU cycles Not a new concept Typically utilising file descriptors for accessing

the data Various implementations among various OSes

Page 12: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 12

Zero-copy and splice(2) One side of the transfer has to be a pipe fd This is mostly due to what pipe buffers support Two extra file descriptors per transfer Linux-specific syscall

Page 13: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 13

Zero-copy and sendfile(2) Input file descriptor must allow mmap (typically a

file) Obviously, it cannot receive

Page 14: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 14

To copy or not to copy?Available alternativesIntroducing the io vectorUtilising the io vectorUse casesThe future

Page 15: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 15

Engaging a kernel-level structure - mbuf Simple structure holding a buffer pointer in

FreeBSD Primarily used in IPC, so it facilitates certain

network communication features mbufs can be chained together in a list (mbuf

chain) Chains and also be linked (packet queue)

Page 16: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 16

mbuf chainsmbuf

m_nextm_nextpkt

m_nextm_nextpkt

m_nextm_nextpkt

m_nextm_nextpkt

chain

m_nextm_nextpkt

m_nextm_nextpkt

m_nextm_nextpkt

m_nextm_nextpkt

chain

packet queue

Page 17: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 17

Entering user space – io vector Follows the same scatter-gather pattern utilising

a list of entries Gets translated to an mbuf chain when in the

kernel space Not everything is needed in the user space –

different buffer types can be treated differently

Page 18: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 18

Entering user space – io vector (contd.)struct iov_entry {

struct iov_entry *next;enum iov_entry_type type;size_t size;size_t length;

};

struct iovector {struct iov_entry *first;struct iov_entry *last;

};

Page 19: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 19

IO Vector

iovector

firstlast

nexttype

iov_entry

nexttype

nexttype

iov_entry iov_entry

Page 20: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 20

Memory entry Represents a classic memory buffer (void*, size) The buffer is allocated in the user space, so it

can be modified Passing it down to the kernel is going to make its

copy (in mbufs)

Page 21: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 21

Memory entry (contd.)struct iov_entry_memory {

struct iov_entry *next;enum iov_entry_type type; /* = memory */size_t size;size_t length;char data[];

};

Page 22: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 22

Memory entry (contd.)

next

type = memory

struct iov_entry_memory

size

length

data

Page 23: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 23

Kernel entry Represents a buffer living in the kernel space Not accessible from the user space directly Passed down to the kernel space does not need

a copy

Page 24: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 24

Kernel entry Represents a buffer living in the kernel space Not accessible from the user space directly Passed down to the kernel space does not need

a copy Accessible through a file descriptor

Page 25: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 25

Kernel entry (contd.)struct iov_entry_kernel {

struct iov_entry *next;enum iov_entry_type type; /* = kernel */size_t size;size_t length;int fd;

};

Page 26: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 26

Kernel entry (contd.)

next

type = kernel

struct iov_entry_kernel

size

length

fd

Page 27: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 27

Kernel entry (contd.) A file descriptor needs its struct file* in the kernel An mbuf chain can be treated as a pseudo-file

and support the basic operations At the user level it is just passed around At the kernel level it can be used for:Reading an writing (file)Receiving and sending (socket)

Page 28: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 28

Kernel entry (contd.)

next

type = kernel

struct iov_entry_kernel

size

length

m_nextm_nextpkt

m_nextm_nextpkt

m_nextm_nextpkt

m_nextm_nextpkt

mbuf chain pseudo-file

fd

USER SPACE

KERNEL SPACE

struct file*

Page 29: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 29

To copy or not to copy?Available alternativesIntroducing the io vectorUtilising the io vectorUse casesThe future

Page 30: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 30

Flexibility through chaining Both entry types can represent any blob of data:Packet header allocated in the user spaceContents read from a file

Building a packet is much easier (header, request/response, padding can all be separate)

Page 31: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 31

Flexibility through chaining (contd.)int error;

struct iovector vec;

struct iov_entry_memory hdr, res;

struct iov_entry_kernel data;

error = IovInit(&vec);

CLEANUP_ON_ERROR(error);

error = IovCreateMemoryEntry(&hdr, /* size */);

CLEANUP_ON_ERROR(error);

/* Prepare the header */

error = IovPushBack(&vec, (struct iov_entry*)&hdr);

CLEANUP_ON_ERROR(error);

Page 32: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 32

Flexibility through chaining (contd.)error = IovCreateMemoryEntry(&res, /* size */);

CLEANUP_ON_ERROR(error);

/* Prepare the response */

error = IovPushBack(&vec, (struct iov_entry*)&hdr);

CLEANUP_ON_ERROR(error);

error = IovCreateKernelEntry(&data, /* size */);

CLEANUP_ON_ERROR(error);

/* Read the data */

error = IovPushBack(&vec, (struct iov_entry*)&data);

CLEANUP_ON_ERROR(error);

Page 33: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 33

Flexibility through chaining (contd.)

iovector

firstlast

nexttype

iov_entry_memory

nexttype

nexttype

iov_entry_kernel

HEADER RESPONSE DATA

Page 34: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 34

Flexibility through splitting and converting An io vector can have its parts split off and freed

when no longer needed Kernel entries can be pulled up to the user

space Memory entries can be pushed down, too New syscalls are required

Page 35: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 35

Flexibility through splitting and convertingint error;

struct iovector received, hdr_vec, req_vec, data;

struct smb2_header *header;

/* Vector received – now, get the header */

error = IovPullup(&received, sizeof(*header));

CLEANUP_ON_ERROR(error);

error = IovSplit(&receive, sizeof(*header), &hdr_vec);

CLEANUP_ON_ERROR(error);

header = IovMemOrigin(&hdr_vec);

/* Process the header */

Page 36: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 36

Replacing a buffer with an io vector send/recv/read/write could take an io vector

instead of a buffer This enables keeping a part or all of the data

read/received in the kernel We can always pull up when needed

Page 37: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 37

To copy or not to copy?Available alternativesIntroducing the io vectorUtilising the io vectorUse casesThe future

Page 38: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 38

Use case – large SMB2 write (1/5)

HEADER THE REST

mbufs

USER SPACE

KERNEL SPACE

A packet is received in a minimum-sized iovector The header says it’s a write request

Page 39: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 39

Use case – large SMB2 write (2/5)

HEADER REQUEST FILE DATA

mbufs

USER SPACE

KERNEL SPACE

Pull up the request (fixed-size)

Page 40: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 40

Use case – large SMB2 write (3/5)

HEADER REQUEST FILE DATA

mbufs

USER SPACE

KERNEL SPACE

Split off the SMB2 packet to a separate iovector File data is separate now, too

Page 41: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 41

Use case – large SMB2 write (4/5)

HEADER REQUEST FILE DATA

mbufs

USER SPACE

KERNEL SPACE

Free what’s no longer needed Write to the file – mbufs will be moved in-kernel

Page 42: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 42

Use case – large SMB2 write (5/5) The data written to the file has never left the

kernel The packet can be received to the “minimum

necessary” iovector (only the SMB2 header in user space)

What’s been processed and no longer needed can be released early

Page 43: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 43

Use case – SMB3 decryption (1/6)

TH THE REST

mbufs

USER SPACE

KERNEL SPACE

A packet is received in a minimum-sized iovector The header is a TRANSFORM_HEADER

Page 44: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 44

Use case – SMB3 decryption (2/6)

TH THE REST

mbufs

USER SPACE

KERNEL SPACE

Decrypt the rest first Split off and free TRANSFORM_HEADER

Page 45: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 45

Use case – SMB3 decryption (3/6)

HEADER THE REST

mbufs

USER SPACE

KERNEL SPACE

Pull up SMB2 header (it’s decrypted now) The header says it’s a change notify request

Page 46: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 46

Use case – SMB3 decryption (4/6)

HEADER REQUEST

USER SPACE

KERNEL SPACE

Pull up the request

Page 47: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 47

Use case – SMB3 decryption (5/6)

HEADER REQUEST

USER SPACE

KERNEL SPACE

Execute and free the packet

Page 48: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 48

Use case – SMB3 decryption (6/6) Decryption can be done both over the user and

kernel space buffers Sometimes we end up pulling up everything

Page 49: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 49

To copy or not to copy?Available alternativesIntroducing the io vectorUtilising the io vectorUse casesThe future

Page 50: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 50

This is today, what’s tomorrow? How difficult would it be with the Linux kernel? RDMA and the like

Page 51: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 51

Linux kernel There is no mbuf in Linux, some sources say

sk_buff is the equivalent sk_buff is much bigger and not necessarily

useful here iov_iter appears to fit the bill much better BSD-style UIO is in place It would still require new syscalls

Page 52: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 52

RDMA If an mbuf chain can be a pseudo-file, then any

data coming out of an RDMA-enabled device could be, too

Functions implementing the file ops would have to be different

Another iovector entry type would be needed Perhaps something for a recommendation?

Page 53: Use only what you really need – Kernel space buffers in ... · Entering user space – io vector Follows the same scatter-gather pattern utilising a list of entries Gets translated

2018 Storage Developer Conference. © DellEMC / Isilon. All Rights Reserved. 53

Thank you!Questions?

Credits: Brian Koropoff(for laying the groundwork of the kernel code)

[email protected]@emc.com