Integration of Hardware Cryptography Acceleration on Embedded Systems under Linux 2.6

Olaf Christ
mycable GmbH
Gartenstrasse 10
24534 Neumünster
Germany

[email protected]

Tel: +49 4321 559 56-22
http://www.mycable.de

Introduction

With the advent of embedded devices being connected to each other and to the internet, the need for secure transmissions is increasing steadily. Not only industrial, telecom and network devices, but also consumer and automotive embedded devices are being connected to networks. Establishing encrypted VPN connections with IPsec, for example, puts a heavy load on current embedded processors. Technologies such as DRM (Digital Rights Management) also create a need for cryptographic operations.

The problem with encryption algorithms on embedded processors is that today's embedded processors are scalar and therefore perform poorly on these algorithms. This calls for dedicated cryptography hardware, to which the operating system should offload cryptographic operations.

This paper describes the problems encountered while developing a driver to access such devices under Linux 2.6.

Linux Kernel 2.6's Cryptographic API

The Cryptographic API (see Figure 1) is a core part of Linux Kernel 2.6 and was introduced to provide cryptographic functionality to the whole kernel. Several other kernel components, for example cryptoloop and the IPsec stack, use these functions in their cryptographic routines.

The Cryptographic API is divided into three parts: the Algorithm API, the Transform Operations (OPS) (split into ciphers, digests and compressions) and the Transform API. The Algorithm API provides functions for registering algorithms with the Cryptographic API. These algorithms can be compiled statically or as modules, which call the function crypto_register_alg() (in crypto/api.c) on initialisation. Every algorithm can register with only one facility within the API and is therefore a cipher, a digest or a compression algorithm. Clients access these transform operations through the Transform API, which maintains transformation state and handles common logical operations (e.g. HMAC). The available functions are crypto_alg_available() for checking whether the desired algorithm is registered with the API, crypto_alloc_tfm() for allocating a transformation session and crypto_free_tfm() for freeing sessions (all in crypto/api.c).
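The interplay between the Algorithm API and the Transform API can be illustrated with a small userspace model. This is only a sketch: all model_* names, the linked-list registry and the structure layouts are assumptions made for illustration and are not the kernel's implementation. It shows an algorithm being registered under a name and a client checking for it, allocating a session and freeing it again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Userspace model only; the model_* names are hypothetical, not kernel symbols. */
enum model_alg_type { MODEL_CIPHER, MODEL_DIGEST, MODEL_COMPRESS };

struct model_alg {
    const char *name;          /* name clients ask for, e.g. "des3_ede" */
    enum model_alg_type type;  /* exactly one facility per algorithm */
    size_t ctxsize;            /* size of the per-session state in bytes */
    struct model_alg *next;    /* link in the registry list */
};

struct model_tfm {
    const struct model_alg *alg;  /* algorithm behind this session */
    void *ctx;                    /* per-session state, ctxsize bytes */
};

static struct model_alg *model_registry;

/* Look up a registered algorithm by name; NULL if it is unknown. */
static const struct model_alg *model_find(const char *name)
{
    const struct model_alg *it;

    for (it = model_registry; it != NULL; it = it->next)
        if (strcmp(it->name, name) == 0)
            return it;
    return NULL;
}

/* "Algorithm API": add an algorithm; a duplicate name is rejected with -1. */
int model_register_alg(struct model_alg *alg)
{
    assert(alg != NULL && alg->name != NULL);
    if (model_find(alg->name) != NULL)
        return -1;
    alg->next = model_registry;
    model_registry = alg;
    return 0;
}

/* "Transform API": is an algorithm with this name registered? */
int model_alg_available(const char *name)
{
    return model_find(name) != NULL;
}

/* "Transform API": allocate a session with zeroed state; NULL on failure. */
struct model_tfm *model_alloc_tfm(const char *name)
{
    const struct model_alg *alg = model_find(name);
    struct model_tfm *tfm;

    if (alg == NULL)
        return NULL;
    tfm = malloc(sizeof(*tfm));
    if (tfm == NULL)
        return NULL;
    tfm->ctx = calloc(1, alg->ctxsize ? alg->ctxsize : 1);
    if (tfm->ctx == NULL) {
        free(tfm);
        return NULL;
    }
    tfm->alg = alg;
    return tfm;
}

/* "Transform API": release a session; NULL is accepted and ignored. */
void model_free_tfm(struct model_tfm *tfm)
{
    if (tfm == NULL)
        return;
    free(tfm->ctx);
    free(tfm);
}
```

In this model a client would first call model_alg_available("des3_ede"), then model_alloc_tfm("des3_ede"), and finally model_free_tfm() on the returned session. Whether the algorithm behind the name runs in software or on a chip is invisible to the client, which is what makes the name-based registry a natural place to plug in hardware.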

Figure 1: Cryptographic API Block Diagram

All cryptographic operations (transformations) are performed on scatterlists, which are arrays of scatterlist structures. A scatterlist structure describes a specific memory area via a pointer to the memory page it is located on, its offset within the page and its length. Linux divides its main memory into zones, which are then split up into equally sized pages. The size of these pages depends on the processor’s architecture: 4 KB on i386 and x86-64, and one of 4 KB, 8 KB, 16 KB or 64 KB on MIPS machines.

Scatterlists are defined in include/asm/scatterlist.h.

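    struct scatterlist {
        struct page *page;        /* page the data is located on */
        unsigned int offset;      /* offset of the data within the page */
        dma_addr_t dma_address;   /* bus address for DMA (optional) */
        unsigned int length;      /* length of the data in bytes */
    };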

For memory directly accessible by other bus masters (DMA memory), the optional entry “dma_address” exists in the scatterlist structure.

Scatterlists were introduced to speed up cryptographic operations. The best cryptographic performance is achieved when the data is located on a single page. This ensures that the data is contiguous and does not have to be copied before processing. Furthermore, a scatterlist should contain an amount of data that is a multiple of the cipher’s block size (typically 8 bytes).
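How a buffer maps onto page-bounded scatterlist entries can be sketched in userspace as follows. This is a minimal model under stated assumptions: a fixed 4 KB page size, a plain page index in place of the kernel's struct page pointer, and hypothetical model_* names throughout. It splits a byte range at page boundaries and checks whether every entry is a multiple of the cipher's block size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumption: 4 KB pages */

/* Userspace stand-in for a scatterlist entry; names are hypothetical. */
struct model_sg {
    uintptr_t page;       /* page index (stand-in for struct page *) */
    unsigned int offset;  /* byte offset within the page */
    unsigned int length;  /* bytes described by this entry */
};

/*
 * Describe the byte range [addr, addr + len) with one entry per page touched.
 * Returns the number of entries written, or -1 if more than max are needed.
 */
int model_sg_from_buf(uintptr_t addr, size_t len, struct model_sg *sg, int max)
{
    int n = 0;

    assert(sg != NULL);
    while (len > 0) {
        unsigned int off = (unsigned int)(addr % MODEL_PAGE_SIZE);
        size_t room = MODEL_PAGE_SIZE - off;      /* bytes left on this page */
        size_t chunk = len < room ? len : room;

        if (n == max)
            return -1;
        sg[n].page = addr / MODEL_PAGE_SIZE;
        sg[n].offset = off;
        sg[n].length = (unsigned int)chunk;
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}

/* Returns 1 if every entry holds a whole number of cipher blocks, else 0. */
int model_sg_block_aligned(const struct model_sg *sg, int n,
                           unsigned int blocksize)
{
    int i;

    assert(sg != NULL && blocksize > 0);
    for (i = 0; i < n; i++)
        if (sg[i].length % blocksize != 0)
            return 0;
    return 1;
}
```

A 200-byte buffer starting 4000 bytes into a page, for instance, needs two entries of 96 and 104 bytes. Both are multiples of 8, so an 8-byte block cipher can process each entry in place, whereas a 16-byte block would straddle the page boundary.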

Hardware support and the Cryptographic API

Knowing the architecture of the Cryptographic API, implementing hardware cryptography drivers is straightforward. There is just one issue to resolve: the Cryptographic API is a synchronous interface. This means that the kernel waits until a function in the Cryptographic API has finished processing a request. For algorithms implemented in software this may be fine, since the CPU has to do the work anyway. Hardware cryptography chips, however, may take some time to process a request. During this time, the CPU has to wait until the Cryptographic API function call returns.

The best way to overcome this problem would be to use acrypto [1], a kernel patch that was created to allow asynchronous cryptographic operations. It makes the Cryptographic API more hardware crypto friendly by providing call-back functions and notifications to the API upon completion of a request. Additional features include load balancing between multiple hardware chips and software implemented algorithms, and a priority mechanism for selecting which implementation to use first.
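The idea of completion by call-back can be sketched with a small single-threaded userspace model. It does not reproduce acrypto's actual interface: every model_* name, the fixed-size queue and the XOR placeholder standing in for the cipher are assumptions made for illustration. Submitting a request returns immediately, and the owner is notified only when the (simulated) hardware finishes.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_LEN 8u

/* Call-back invoked once a request has been processed. */
typedef void (*model_done_fn)(void *cookie, int status);

/* Hypothetical request descriptor; not acrypto's real structure. */
struct model_request {
    const unsigned char *src;  /* input data */
    unsigned char *dst;        /* output buffer, at least len bytes */
    size_t len;
    model_done_fn done;        /* completion call-back */
    void *cookie;              /* passed back to the call-back unchanged */
};

static struct model_request model_queue[MODEL_QUEUE_LEN];
static unsigned int model_head, model_tail;

/* Asynchronous submit: enqueue and return at once; -1 if the queue is full. */
int model_submit(const struct model_request *req)
{
    assert(req != NULL && req->done != NULL);
    if (model_tail - model_head == MODEL_QUEUE_LEN)
        return -1;
    model_queue[model_tail % MODEL_QUEUE_LEN] = *req;
    model_tail++;
    return 0;
}

/*
 * Stands in for the hardware's completion interrupt: process the oldest
 * pending request and notify its owner. Returns 1 if a request was
 * completed, 0 if the queue was empty.
 */
int model_complete_one(void)
{
    struct model_request req;
    size_t i;

    if (model_head == model_tail)
        return 0;
    req = model_queue[model_head % MODEL_QUEUE_LEN];
    model_head++;
    for (i = 0; i < req.len; i++)
        req.dst[i] = req.src[i] ^ 0x5a;  /* placeholder, NOT a real cipher */
    req.done(req.cookie, 0);
    return 1;
}

/* Example call-back: counts successful completions in *cookie. */
void model_count_done(void *cookie, int status)
{
    if (status == 0)
        (*(int *)cookie)++;
}
```

Between model_submit() and the call-back the caller is free to do other work, which is exactly what a synchronous interface cannot offer.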

Although the acrypto patch would be the best approach, it still has some disadvantages. First of all, the acrypto patch is very large and thus not guaranteed to work with future kernels. This makes supporting drivers for several kernels difficult. Additionally, acrypto’s API and internals are subject to change before being included in the kernel. Drivers based on this API would have to be rewritten every time a change is made.

Therefore another solution has to be found. In the following, two patches for the Cryptographic API from the community are presented.

•  OCF-Linux
OCF-Linux [2] is a port of OpenBSD’s Cryptographic Framework (OCF) to Linux. It aims at bringing asynchronous hardware cryptography support to Linux Kernel 2.6. Unfortunately, it currently only provides acceleration for OpenSSL and programs that rely on this library. Future plans include direct processing of skbuffs and IPsec. OCF-Linux currently supports only i386, SuperH and ARM platforms.

•  Eugene Surovegin’s Hardware Crypto Patches
Eugene’s first patch [3] extends Linux Kernel 2.6’s built-in Cryptographic API to support hardware cryptography accelerators. This patch is simple, since it only extends the Cryptographic API’s structures and functions to know about hardware cryptography chips. His second patch is somewhat more complex. It changes the function esp_hmac_digest() to be hardware cryptography friendly by allowing several blocks of data to be processed at a time. This increases performance significantly if supported by the crypto hardware.

Eugene’s Hardware Crypto Patches seem best suited for adding hardware crypto support to Linux Kernel 2.6 quickly. They allow writing a driver that plugs into the existing Cryptographic API, enabling hardware cryptographic support throughout the whole kernel. The driver can also be written in a modular way, so that it can easily be adapted (and enhanced) to acrypto when it becomes available.

Example implementation on AMD™'s Au1550™ processor

In this section, an example solution for AMD™'s Au1550™ processor is presented. This processor includes a so-called “security engine”, which supports the encryption algorithms DES, 3DES, AES and ARC-4 and the hash algorithms MD5 and SHA-1 in hardware. It also features a DMA engine and an interrupt controller. This example can easily be ported to other hardware.

First of all, Eugene Surovegin’s hardware cryptography patches have to be applied to a current kernel (Linux-mips version 2.6.5 rc4 in this case). The Cryptographic API is then ready to support hardware cryptography chips.

Registration with the Cryptographic API is easy. First, a structure with information about the module’s capabilities has to be created:

    static struct crypto_alg des3_ede_alg = {
        .cra_name      = "des3_ede",               /* name clients ask for */
        .cra_flags     = CRYPTO_ALG_TYPE_CIPHER,   /* registers as a cipher */
        .cra_blocksize = DES3_EDE_BLOCK_SIZE,
        .cra_ctxsize   = sizeof(struct au1550_crypto_cipher_ctx),
        .cra_module    = THIS_MODULE,
        .cra_list      = LIST_HEAD_INIT(des3_ede_alg.cra_list),
        .cra_u         = {
            .cipher = {
                .cia_min_keysize = DES3_EDE_KEY_SIZE,
                .cia_max_keysize = DES3_EDE_KEY_SIZE,
                /* driver entry points called by the Cryptographic API */
                .cia_setkey      = au1550_crypto_des3_ede_setkey,
                .cia_encrypt     = au1550_crypto_des3_ede_encrypt,
                .cia_decrypt     = au1550_crypto_des3_ede_decrypt
            }
        }
    };

It is then passed to the Cryptographic API by calling its function crypto_register_alg() during module initialisation.

The Cryptographic API places crypto requests by calling one of the module’s registered functions. In these functions the request is converted into a (proprietary) request packet that the security engine understands. This request packet is then transmitted to the security engine, which fetches the payload from main memory via DMA, performs the requested cryptographic operations, writes the result back to main memory via DMA and raises an interrupt on the CPU. After handing the request over to the security engine, the driver goes to sleep; when the interrupt signals completion, it resumes processing at that point.

This approach blocks the caller inside the Cryptographic API, but since processing is synchronous anyway, this issue can be neglected.

Performance

To verify that the driver works correctly and to measure its benefit, performance tests were carried out. Figure 2 shows the test network used for benchmarking. Two private networks, 10.0.0.0/8 on the left and 192.168.128.0/24 on the right, are connected via two routers. The left router is a regular personal computer with an Intel™ Pentium™ 4 1.3 GHz CPU, 512 MB RAM and two 100 Mbit/s network interfaces; the one on the right is AMD™’s DBAu1550™ development board. The computers in the private networks have their routing tables set up to find the other networks.

Figure 2: Test network

The two test machines (10.0.0.2 and 192.168.128.71) are ordinary personal computers, too. They run SuSE Linux 9.2 standard installations without any netfilter rules set and with their X servers shut down. Cron jobs and unnecessary servers were disabled as well. The same applies to the “router PC” (10.0.0.1/5.0.0.1). On both routers, the current version of the IPsec-Tools (0.6.3) was used.

Since no hardware network testing equipment was available for these benchmarks, software tools were used instead. FTP transfers of random data were measured to obtain a general overview of throughput at the end-user level (excluding protocol overhead).

Figure 3: FTP throughput

FTP throughput results are shown in Figure 3. They are divided into unencrypted, IPsec (software) and IPsec (hardware) connections. Unencrypted communication led to a throughput of 63.65 Mbit/s, software encryption to 4.91 Mbit/s and hardware supported encryption to 24.32 Mbit/s.
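To put the three throughput figures in relation to each other, the ratios can be computed directly. The helper names below are hypothetical and the sketch only restates the arithmetic on the values reported above.

```c
#include <assert.h>

/* Ratio of two throughput figures; 2.0 means "twice the baseline". */
double throughput_ratio(double measured, double baseline)
{
    assert(baseline > 0.0);
    return measured / baseline;
}

/* Relative change in percent; positive means faster than the baseline. */
double throughput_gain_percent(double measured, double baseline)
{
    return (throughput_ratio(measured, baseline) - 1.0) * 100.0;
}
```

With the values above, throughput_ratio(24.32, 4.91) is about 4.95, so hardware supported encryption is roughly five times as fast as software encryption, while throughput_ratio(24.32, 63.65) is about 0.38, so it still reaches only about 38% of the unencrypted rate.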

Figure 4: CPU load IPsec

Figure 4 shows the results of the CPU load measurements. The first values show the connection establishment phase, where CPU loads are small. The results have to be split into two independent measurements: software-only connections and hardware supported ones on the DBAu1550™. For software-only connections the CPU load is about 31% on the “router PC” and almost 100% on the DBAu1550™. For hardware supported connections it drops to about 52% on the DBAu1550™, but increases to about 74% on the PC. This is because the PC now has to encrypt in software at a higher rate, since the DBAu1550™ increases its encryption speed by using the security engine.

Since both values for the PC are below 90%, the PC can be excluded as the bottleneck. With hardware supported encryption, CPU load on the DBAu1550™ decreases significantly.

Conclusion

The benchmark shows a significant increase in performance when using hardware encryption support for IPsec links. Measured FTP throughput rises from 4.91 Mbit/s with pure software encryption to 24.32 Mbit/s with hardware supported encryption, roughly a factor of five. Additionally, CPU load on the embedded processor drops from almost 100% to about 52%, leaving processing power for other tasks. All this is achieved with a synchronous Cryptographic API. With an asynchronous interface, the results should improve further.

References

[1] Evgeniy Polyakov: Asynchronous Crypto Layer [http://lists.logix.cz/pipermail/cryptoapi/2004/000163.html]

[2] David McCullough: OCF-Linux [http://ocf-linux.sourceforge.net/]

[3] Eugene Surovegin: HW Crypto Patches [http://kernel.ebshome.net/]

- All trademarks herein before mentioned are the property of their respective owners. -