OpenSSL acceleration using Graphics Processing Units

Pedro Miguel Costa Saraiva

Introduction

•Cryptography: The study of security techniques

•SSL: A set of rules governing authentication and encrypted client/server communication

• De facto standard for secure electronic communications

• Computationally intensive

• Large volumes of SSL traffic impact performance

Introduction

•GPU: A specialised processing unit designed to manipulate graphics

• Originally used solely for graphics calculations

• Recent developments enable its use for general purpose computing

• Massive computational power

Introduction

•OpenSSL

• Open-source implementation of the SSL and TLS protocols

• Core library implements a variety of cryptographic functions

• Used heavily by a large number of open-source and proprietary applications

Introduction

•Objectives

• Efficiently offload cryptographic operations onto a GPU

• Add GPU functionality to OpenSSL

• Lighten the load on the CPU

Introduction

•Structure

• State of the art

• OpenSSL

• GPU

• Programming the GPU

• OpenCL

• CUDA

• OpenCL vs CUDA

• Main challenges

• Implementation

• Results

• Conclusion

State of the art

•OpenSSL

• Commercial-grade full-featured open source toolkit

• Divided into libssl and libcrypto

• Core library written in C

• Supports accelerator hardware via engines

State of the art

GPU

• Massive parallel processing power

• Roughly ten times the floating-point capability of a high-end CPU

• Faster growth rate than CPUs

State of the art

GPU - Programming

• At the end of the 1990s, graphics cards could not be programmed

• Things changed around 2001, when programmable shaders arrived through DirectX 8 and OpenGL extensions

• Programmers had to express their computations in terms of textures, vertices and shader programs

State of the art

GPU - Programming

• 2006: NVIDIA created the CUDA framework

• ATI created the CTM (Close to Metal) low-level framework

• 2008: NVIDIA and ATI joined the Khronos Group

• Development of an industry standard for hybrid computing

• OpenCL version 1.0 released in December 2008

State of the art

GPU - OpenCL

• Open, royalty-free standard for general-purpose programming

• Supports CPUs, GPUs, and other types of processors

• Maintained by the non-profit consortium Khronos Group

• Adopted by Intel, AMD, NVIDIA, and ARM Holdings

State of the art

GPU - OpenCL

• API for coordinating parallel computation across different processors

• Cross-platform programming language

• Based on a subset of ISO C99

• Lower performance than CUDA on NVIDIA GPUs

State of the art

GPU - CUDA

• Proprietary hardware and software architecture

• Designed by NVIDIA

• Manages computations on a GPU

• Programmed in “C for CUDA”, C with NVIDIA extensions

• Third-party wrappers available for other languages

State of the art

GPU - Main Challenges

• Well suited to extremely parallel problems

• Interaction between threads should be minimal

• Diverging execution paths are slow

• Limited memory

• Slow memory swapping

• Data-intensive operations are discouraged

• No file or standard I/O operations

Implementation

Structure

• OpenSSL

• AES

• RSA Key Generation

• RSA Cipher

Implementation

OpenSSL

• ENGINE component supports alternative cryptography implementations

• Supports dynamic loading of external engines

Implementation

OpenSSL Engine

• Binding function defines the supported algorithms

• It registers pointers to the functions implementing those algorithms
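The binding idea can be illustrated with a small language-neutral sketch. This is not the OpenSSL ENGINE API: `ToyEngine`, `bind_gpu_engine`, `run_cipher` and the two cipher stubs are hypothetical names invented for illustration, and the stubs only tag the data instead of encrypting it.

```python
class ToyEngine:
    """Hypothetical engine: an identifier plus a table of function pointers."""

    def __init__(self, engine_id):
        self.engine_id = engine_id
        self.ciphers = {}  # algorithm name -> callable implementing it


def software_aes_ecb(data):
    # Stand-in for the default software implementation (no real encryption).
    return ("software", data)


def gpu_aes_ecb(data):
    # Stand-in for a routine that would hand the data to the GPU.
    return ("gpu", data)


def bind_gpu_engine(engine):
    """Binding function: declares the supported algorithms and registers
    the functions that implement them."""
    engine.ciphers["aes-128-ecb"] = gpu_aes_ecb
    return True


def run_cipher(engine, algorithm, data, default_impl):
    # Use the engine's implementation when it advertises one,
    # otherwise fall back to the default implementation.
    impl = engine.ciphers.get(algorithm, default_impl)
    return impl(data)
```

After `bind_gpu_engine(engine)`, a request for "aes-128-ecb" reaches the GPU stub, while an algorithm the engine did not register falls through to the software stub.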

Implementation

AES

• CBC mode encryption cannot be parallelised

• Previous ciphertext block is required to begin encryption of the next one

• CBC mode decryption can be parallelised

• All blocks are decrypted in parallel

• ECB mode can be parallelised
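The standard mode definitions make the dependency explicit. The notation is an assumption, not taken from the slides: P_i and C_i are the i-th plaintext and ciphertext blocks, E_K and D_K are AES encryption and decryption under key K, and IV is the initialisation vector.

```latex
\begin{align*}
\text{ECB:} \quad & C_i = E_K(P_i), \qquad P_i = D_K(C_i) \\
\text{CBC encryption:} \quad & C_i = E_K(P_i \oplus C_{i-1}), \qquad C_0 = \mathit{IV} \\
\text{CBC decryption:} \quad & P_i = D_K(C_i) \oplus C_{i-1}
\end{align*}
```

In CBC encryption, C_i cannot be computed before C_{i-1} exists, so the blocks form a chain. In CBC decryption every C_{i-1} is already part of the input, so each P_i can be computed independently, as in ECB.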

Implementation

AES

• Initialisation

• Key expansion is performed on the CPU

• Cipher

• Initialises the GPU

• Allocates host and GPU memory for input and output data

Implementation

AES

• Cipher

• Input data is transferred to GPU memory

• All data is transferred at once

• GPU kernel is called

• Output data is transferred back from GPU memory

Implementation

AES

• GPU Kernel

• For CBC encryption, a single thread is launched

• It encrypts every block serially

• For CBC decryption and ECB operations, one thread is launched per 16-byte AES block

• All blocks are processed in parallel
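The thread mapping can be mimicked on the CPU with a rough sketch. Everything here is an assumption for illustration: the block function is a trivially invertible toy, not AES, a thread pool stands in for GPU threads, and the input length must be a multiple of 16 bytes because no padding is applied.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # AES block size in bytes


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block, key):
    # NOT AES: XOR with the key, then rotate the bytes left by one.
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]


def toy_decrypt_block(block, key):
    # Inverse of toy_encrypt_block: rotate right by one, then XOR with the key.
    unrotated = block[-1:] + block[:-1]
    return xor(unrotated, key)


def split_blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def cbc_encrypt(plaintext, key, iv):
    # Serial: block i needs ciphertext block i-1, so one worker walks the chain.
    prev, out = iv, []
    for block in split_blocks(plaintext):
        prev = toy_encrypt_block(xor(block, prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt_parallel(ciphertext, key, iv, workers=4):
    blocks = split_blocks(ciphertext)
    prevs = [iv] + blocks[:-1]  # every previous ciphertext block is already known

    def decrypt_one(i):
        # One unit of work per block, mirroring one GPU thread per block.
        return xor(toy_decrypt_block(blocks[i], key), prevs[i])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(decrypt_one, range(len(blocks))))
```

Encryption walks the chain in one loop, while decryption hands each block to a separate worker because none of them waits on another.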

Implementation

RSA Key Generation

• Generation function (CPU side)

• Calls the GPU to generate a large number of prime candidates

• No more numbers are generated until the initial pool is exhausted

Implementation

RSA Key Generation

• Generation function (GPU call)

• GPU RNG is initialised

• Device memory is allocated

• A large number of threads is launched to generate prime BIGNUMs

Implementation

RSA Key Generation

• Generation function (GPU kernel)

• Random BIGNUM p is generated

• p is tested for primality

• Miller-Rabin probabilistic primality test

• BIGNUMs determined to be prime are written into global memory

• Each thread tests one BIGNUM
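The per-thread work can be sketched on the CPU as follows. This is a minimal illustration under assumptions: Python integers replace BIGNUMs, a list comprehension replaces the thread grid, and the function names, round count and small-prime pre-filter are illustrative choices rather than the deck's implementation. Real key generation would need a cryptographically secure random source.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def random_candidate(bits, rng):
    # Set the top bit for an exact bit length and the low bit to make it odd.
    return rng.getrandbits(bits) | (1 << (bits - 1)) | 1


def generate_prime_pool(bits, batch_size, rng):
    # Each candidate corresponds to one GPU thread: generate, test, keep if prime.
    candidates = [random_candidate(bits, rng) for _ in range(batch_size)]
    return [c for c in candidates if is_probable_prime(c, rng=rng)]
```

A caller would draw primes from the returned pool and request a new batch only once the pool is empty, which matches the CPU-side behaviour described earlier.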

Implementation

RSA Key Generation

• Generation function (GPU call)

• Output data copied back to the host

• Required implementing the entire OpenSSL BIGNUM library on the GPU

Implementation

RSA Cipher

• BIGNUMs used in RSA must be broken down into small words

• Multiple threads can each process a word

• The Chinese Remainder Theorem splits each private-key operation into two half-size exponentiations
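The Chinese Remainder Theorem split can be written out as a short sketch. It uses the textbook formulation with Garner's recombination and plain Python integers; the function names are illustrative, and the modular inverse is computed with Fermat's little theorem, which assumes p is prime.

```python
def rsa_private_plain(c, d, n):
    # One full-size modular exponentiation.
    return pow(c, d, n)


def rsa_private_crt(c, p, q, d):
    # Two half-size exponentiations, one modulo each prime factor of n.
    dp = d % (p - 1)
    dq = d % (q - 1)
    q_inv = pow(q, p - 2, p)  # inverse of q modulo p (p is prime)
    m1 = pow(c % p, dp, p)
    m2 = pow(c % q, dq, q)
    # Garner's recombination of the two partial results.
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q
```

The two exponentiations are independent of each other, so they can run concurrently, and each works on operands half the size of the modulus.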

Implementation

RSA Cipher

• Multi-Precision Algorithm

• A k-bit integer A is broken into s = k/64 words

• O(s) parallel implementation

• Runs s threads in two phases

Implementation

RSA Cipher

• First phase accumulates s partial products in 2s steps

• Carries are accumulated in a separate array

• Second phase adds the carries to the intermediate result

• Worst case scenario is s-1 iterations

• Usually only one or two
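One way to read the two-phase scheme is sketched below as a sequential simulation. It is a reconstruction under assumptions, not the deck's kernel: the word size of 64 bits follows from s = k/64 (so a 4096-bit operand gives s = 64), while splitting each step into a low-half and a high-half sub-step is a guess at why the first phase takes 2s steps. Each inner loop over j plays the role of s threads that write to distinct columns.

```python
W = 64
MASK = (1 << W) - 1


def to_words(x, s):
    # Little-endian list of s words of W bits each.
    return [(x >> (W * i)) & MASK for i in range(s)]


def from_words(words):
    return sum(w << (W * i) for i, w in enumerate(words))


def two_phase_multiply(a, b, s):
    """Multiply two s-word integers; returns (product, carry iterations)."""
    A, B = to_words(a, s), to_words(b, s)
    result = [0] * (2 * s)
    carry = [0] * (2 * s + 1)  # deferred carries, kept apart from the result

    def add_word(col, value):
        total = result[col] + value
        result[col] = total & MASK
        carry[col + 1] += total >> W  # overflow is recorded, not propagated

    # Phase 1: s iterations of two sub-steps each (2s steps in total).
    for i in range(s):
        for j in range(s):  # low halves: thread j writes column i + j
            add_word(i + j, (A[j] * B[i]) & MASK)
        for j in range(s):  # high halves: thread j writes column i + j + 1
            add_word(i + j + 1, (A[j] * B[i]) >> W)

    # Phase 2: fold the carries back in until none remain.
    iterations = 0
    while any(carry):
        pending, carry = carry, [0] * (2 * s + 1)
        for col in range(2 * s):
            add_word(col, pending[col])
        iterations += 1
    return from_words(result), iterations
```

Deferring the carries keeps the threads independent during the first phase; the second phase then repeats only as long as an addition still overflows.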

Results

Testing Framework

• Intel Core i7 950 CPU, 3.07 GHz

• NVIDIA GeForce GTX 580

• Stress tool used for the heavy CPU load tests

• 300 threads looping on sqrt, malloc/free and sync

Results

[Figure: AES – CBC Decryption]

Results

[Figure: AES – CBC Encryption]

Results

[Figure: AES – ECB Encryption]

Results

[Figure: AES – ECB Decryption]

Results

[Figure: RSA Key Generation]

Results

[Figure: RSA Key Generation – Heavy CPU load]

Results

RSA Cipher

[Figure panels: Single message, heavy CPU load; Single message; Multiple messages (4096-bit)]

Results

[Figure: RSA Key Generation – Heavy CPU load]

Results

[Figure: RSA Key Generation – Heavy CPU load]

Conclusion

• Significant performance boost for AES ECB and for AES CBC decryption

• AES CBC Encryption is slower, but significantly lighter on the CPU

• RSA Key Generation is significantly faster for multiple keys

• RSA Cipher is significantly slower

Future Work

• AES CTR Cipher Mode

• OpenSSL implementation still unstable

• Manager to cache RSA requests for more effective use of the GPU