Cryptography and secure* systems
* in the real world

Vsevolod Stakhov, [email protected]
December 12, 2014


How expensive is encryption nowadays

▶ New hardware:
– specialized encryption instructions (AES-NI)
– vectorized operations (SSE, AVX, AVX2, AVX512)
▶ New algorithms:
– optimized chaining mode (e.g. CTR instead of CBC)
– optimized algorithms (from 3DES to ChaCha20)
▶ New protocols


How expensive is encryption nowadays
Hardware performance

2011: Westmere (SSE4, AES-NI)

Figure: Encryption throughput (Mbit/s, axis range 0 to 12000) versus CPU core count (1 to 8) for AES-128-GCM, AES-256-GCM and ChaCha20-Poly1305. Xeon E7, 2.1 GHz, 8 CPU cores.

How expensive is encryption nowadays
Hardware performance

2012: Sandy Bridge (AVX, AES-NI)

Figure: Encryption throughput (Mbit/s, axis range 5000 to 35000) versus CPU core count (1 to 8) for AES-128-GCM, AES-256-GCM and ChaCha20-Poly1305. Xeon E3, 3.4 GHz, 8 CPU cores.

How expensive is encryption nowadays
Hardware performance

2013: Haswell (AVX2, AES-NI)

Figure: Encryption throughput (Mbit/s, axis range 10000 to 55000) versus CPU core count (1 to 8) for AES-128-GCM, AES-256-GCM and ChaCha20-Poly1305. Core i7-4770, 3.5 GHz, 8 CPU cores.

How expensive is encryption nowadays
Hardware performance

Pre-historic ages: Core2 quad (SSE3)

Figure: Encryption throughput (Mbit/s, axis range 0 to 8000) versus CPU core count (1 to 4) for AES-128-GCM, AES-256-GCM and ChaCha20-Poly1305. Core2 quad, 1.5 GHz, 4 CPU cores.

How expensive is encryption nowadays
Algorithm performance

Important properties:
▶ rekeying interval
▶ hardware- vs software-oriented algorithms
▶ constant-time implementation
▶ space required (mostly significant for embedded systems)
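Of the properties above, constant-time implementation is the one most often broken in application code. A minimal sketch of comparing two authentication tags without an early exit; the helper name `ct_memeq` is an assumption made for illustration and is not taken from the talk or from any particular library:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare two buffers of equal length without returning early.
 * Every byte is always inspected, so the running time does not depend
 * on the position of the first mismatch.
 * Returns 1 if the buffers are equal, 0 otherwise.
 */
int ct_memeq(const void *a, const void *b, size_t len)
{
    const volatile uint8_t *pa = (const volatile uint8_t *)a;
    const volatile uint8_t *pb = (const volatile uint8_t *)b;
    uint8_t diff = 0;
    size_t i;

    for (i = 0; i < len; i++)
        diff |= (uint8_t)(pa[i] ^ pb[i]);   /* accumulate any difference */

    return diff == 0;
}
```

A plain `memcmp` may stop at the first differing byte, so its timing can reveal how many leading bytes of a forged tag were correct.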


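How expensive is encryption nowadays
Algorithm performance

ChaCha20 round:

    /* 32-bit left rotation */
    #define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

    /* Quarter round: mixes four 32-bit words of the state in place */
    #define qr(a, b, c, d) do {               \
        a += b; d ^= a; d = ROTL32(d, 16);    \
        c += d; b ^= c; b = ROTL32(b, 12);    \
        a += b; d ^= a; d = ROTL32(d, 8);     \
        c += d; b ^= c; b = ROTL32(b, 7);     \
    } while (0)

    /* x0..x15 are the sixteen 32-bit words of the block */
    for (i = chacha_rounds; i > 0; i -= 2) {
        /* column round */
        qr(x0, x4, x8,  x12);
        qr(x1, x5, x9,  x13);
        qr(x2, x6, x10, x14);
        qr(x3, x7, x11, x15);
        /* diagonal round */
        qr(x0, x5, x10, x15);
        qr(x1, x6, x11, x12);
        qr(x2, x7, x8,  x13);
        qr(x3, x4, x9,  x14);
    }

▶ Each round modifies the whole block
▶ Each quarter-round operation is independent of the others in the round
▶ Efficient diagonal optimizations for SSE/AVX/AVX512
▶ Needs an even number of rounds (typically 20 or 12)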
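The quarter round can be checked in isolation. A minimal sketch, assuming the four 32-bit words are passed by pointer; the names `rotl32` and `chacha_qr` are illustrative choices, not taken from the talk or from a specific library:

```c
#include <stdint.h>

/* 32-bit left rotation; n must be in the range 1..31 */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* ChaCha quarter round on four 32-bit words, updated in place */
void chacha_qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

With the quarter-round test vector from RFC 7539, section 2.1.1 (a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567), the function should return a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb.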


How expensive is encryption nowadays
Algorithm performance

Advantages of ChaCha20:
▶ clear and simple design
▶ 512-bit block size (up to 2^70 bytes before rekeying)
▶ fits vectorized operations very well (especially AVX)

Current usage of ChaCha20-Poly1305:
▶ OpenSSH
▶ LibreSSL and BoringSSL
▶ libottery fast pseudo-random generator
▶ OpenBSD arc4random
▶ Chrome browser
▶ proposed IETF standard for TLS and IPsec
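The limit on data encrypted under one key and nonce follows from the counter size. Assuming the original ChaCha20 layout, with a 64-bit block counter and 64-byte (512-bit) blocks, the arithmetic is:

```latex
\[
2^{64}\ \text{blocks} \times 2^{6}\ \text{bytes per block} = 2^{70}\ \text{bytes}
\]
```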

How expensive is encryption nowadays
Protocols performance

Figure: TLS connection establishment

Building secure systems

General problems when building secure systems:
▶ Curse of backward compatibility
▶ Complex and inconsistent API (OpenSSL)
▶ Bad default settings
▶ Permit everything by default


Building secure systems
Backward compatibility

Practical example: replacing OpenSSL with LibreSSL
▶ Legacy API: DES support in OpenLDAP
▶ Invalid random number generators: RAND_egd, which is valid only on Linux (Python, Curl, Wget and many other ports)
▶ SSL3 and even SSL2 support (Curl)
▶ Unnecessary engine functions (many ports, including Apache)

Practically all the issues listed also apply to the upcoming OpenSSL 1.0.2 branch.

Building secure systems
Backward compatibility

▶ Protocol design flaws
▶ Poor algorithmic choices:
– Reduce security
– Remove important properties, e.g. forward secrecy
– Can be very slow (3DES) and hence lead to computational DoS


Building secure systems
Design flaws example: TLS

The MAC-then-encrypt choice:
▶ Sensitive to side-channel attacks (padding oracle, Lucky Thirteen)
▶ Inefficient computation
▶ No integrity on the ciphertext (can be dangerous if used with malleable ciphers)
▶ No protection against attacks on the cipher itself

Proposed: the encrypt-then-MAC extension in RFC 7366.


Building secure systems
CBC mode

▶ Assumes that the cipher is secure for all data, even data controlled by an attacker
▶ Decryption differs from encryption
▶ Cannot be computed in parallel
▶ Needs careful padding (POODLE attack)

Proposed: use ciphers in counter mode only (not compatible with old browsers)
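The padding bullet is worth a concrete look. POODLE works because SSLv3 checks only the final padding-length byte and leaves the remaining padding bytes unspecified, whereas TLS requires every padding byte to equal the padding length. A minimal sketch of the stricter TLS-style check, assuming `rec` holds an already decrypted record body; the function name `tls_cbc_padding_ok` is hypothetical, and a production check would also need to run in constant time together with MAC verification to avoid Lucky Thirteen style timing leaks:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Validate TLS-style CBC padding on a decrypted record body.
 * The last byte holds the padding length `pad`; the `pad` bytes before
 * it must all carry the same value.
 * Returns 1 and stores the length of the unpadded body (data plus MAC)
 * on success, 0 on malformed padding.
 * NOTE: illustrative only. The loop length depends on `pad`, so this
 * sketch is NOT constant-time.
 */
int tls_cbc_padding_ok(const uint8_t *rec, size_t len, size_t *payload_len)
{
    uint8_t pad, diff = 0;
    size_t i;

    if (len == 0)
        return 0;
    pad = rec[len - 1];
    if ((size_t)pad + 1 > len)          /* padding longer than the record */
        return 0;
    for (i = 0; i <= pad; i++)          /* length byte plus `pad` bytes */
        diff |= (uint8_t)(rec[len - 1 - i] ^ pad);
    if (diff != 0)
        return 0;
    *payload_len = len - 1 - (size_t)pad;
    return 1;
}
```

An SSLv3-style check would accept any record whose last byte is a plausible length, which is exactly the freedom the attack exploits.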


Building secure systems
Counter mode

▶ The cipher accepts only deterministic counter input
▶ Decryption is identical to encryption
▶ Can be computed in parallel
▶ Need to ensure that the counter never repeats
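These properties are easy to see in code. A minimal sketch of counter-mode encryption that uses the ChaCha20 block function as the keyed function (AES would play the same role in AES-CTR). It assumes the original ChaCha20 layout with a 64-bit counter and a 64-bit nonce; the names `chacha20_block` and `ctr_xor` are illustrative, and the code is meant for reading rather than production use:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do {                 \
        a += b; d ^= a; d = ROTL32(d, 16);  \
        c += d; b ^= c; b = ROTL32(b, 12);  \
        a += b; d ^= a; d = ROTL32(d, 8);   \
        c += d; b ^= c; b = ROTL32(b, 7);   \
    } while (0)

static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Produce the 64-byte keystream block number `counter` for (key, nonce). */
void chacha20_block(const uint8_t key[32], const uint8_t nonce[8],
                    uint64_t counter, uint8_t out[64])
{
    uint32_t s[16], x[16];
    int i;

    s[0] = 0x61707865; s[1] = 0x3320646e;   /* "expand 32-byte k" */
    s[2] = 0x79622d32; s[3] = 0x6b206574;
    for (i = 0; i < 8; i++)
        s[4 + i] = load32_le(key + 4 * i);
    s[12] = (uint32_t)counter;              /* the only input that changes */
    s[13] = (uint32_t)(counter >> 32);
    s[14] = load32_le(nonce);
    s[15] = load32_le(nonce + 4);

    memcpy(x, s, sizeof(x));
    for (i = 20; i > 0; i -= 2) {
        QR(x[0], x[4], x[8],  x[12]); QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]); QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]); QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]); QR(x[3], x[4], x[9],  x[14]);
    }
    for (i = 0; i < 16; i++) {
        uint32_t v = x[i] + s[i];           /* feed-forward of the input */
        out[4 * i]     = (uint8_t)v;
        out[4 * i + 1] = (uint8_t)(v >> 8);
        out[4 * i + 2] = (uint8_t)(v >> 16);
        out[4 * i + 3] = (uint8_t)(v >> 24);
    }
}

/*
 * Encrypt or decrypt: the same call does both. `counter` is the index of
 * the first 64-byte block and must never repeat under one (key, nonce).
 */
void ctr_xor(const uint8_t key[32], const uint8_t nonce[8], uint64_t counter,
             const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t ks[64];
    size_t i, n;

    while (len > 0) {
        chacha20_block(key, nonce, counter++, ks);
        n = len < 64 ? len : 64;
        for (i = 0; i < n; i++)
            out[i] = in[i] ^ ks[i];
        in += n; out += n; len -= n;
    }
}
```

Because each keystream block depends only on the counter value, blocks can be produced in any order or in parallel, and decryption is the same call with the same starting counter. Reusing a counter under the same key and nonce repeats the keystream, which is why the last bullet matters.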

Building secure systems
API design flaws

▶ all software contains mistakes
▶ a complicated API increases the chance of mistakes, some of which are security vulnerabilities
▶ an inconsistent API invites misuse
▶ always prefer simple and widely used libraries (meaning: do not implement your own cryptographic library)


Building secure systems
API design flaws: OpenSSL

Certificate verification:
▶ Terribly complicated: just look at the documentation of SSL_set_verify and the auxiliary function SSL_set_ex_data:

    ssl = X509_STORE_CTX_get_ex_data(ctx, SSL_get_ex_data_X509_STORE_CTX_idx());
    mydata = SSL_get_ex_data(ssl, mydata_index);

▶ You need to check the certificate CN manually (not even covered in the example in the manual page)
▶ You need to check all extensions such as SNI or ALPN manually (and many OpenSSL users fail to do it correctly)

Building secure systems
API design flaws: OpenSSL

Macro-based API.
▶ Inconsistent:
1. Many ways to do the same thing, for example EVP and the legacy, obsolete interfaces: EVP_PKEY_encrypt and RSA_public_encrypt.
2. Confusing names: PEM_read_RSAPublicKey, PEM_read_RSA_PUBKEY and PEM_read_PUBKEY
▶ Dangerous pointer API:

    unsigned char *sk, *p;
    int sklen;                      /* i2d_* returns int, negative on error */

    /* A call with NULL only reports the required length */
    sklen = i2d_ECPrivateKey(ec_key, NULL);
    if (sklen <= 0) { /* handle error */ }
    sk = malloc(sklen);
    p = sk;                         /* keep sk: the call advances p */
    i2d_ECPrivateKey(ec_key, &p);
    /* p is now at the end of sk */

Building secure systems
API design flaws: some practical advice

Higher-level libraries:
▶ libtls from OpenBSD (formerly ressl):

    #include <tls.h>

    struct tls *tls_client(void);
    int tls_connect(struct tls *ctx, const char *host, const char *port);
    int tls_read(struct tls *ctx, void *buf, size_t buflen, size_t *outlen);
    int tls_write(struct tls *ctx, const void *buf, size_t buflen, size_t *outlen);
    int tls_close(struct tls *ctx);

▶ libsodium (for generic encryption/signing):

    /* Encrypt: the plaintext must be preceded by crypto_box_ZEROBYTES zero bytes */
    memset(out, '\0', crypto_box_ZEROBYTES);
    memcpy(out + crypto_box_ZEROBYTES, plain, length);
    rc = crypto_box(enc, out, crypto_box_ZEROBYTES + length, nonce, pk, sk);

    /* Decrypt: the ciphertext must be preceded by crypto_box_BOXZEROBYTES zero bytes */
    memset(in, '\0', crypto_box_BOXZEROBYTES);
    memcpy(in + crypto_box_BOXZEROBYTES, enc, length);
    rc = crypto_box_open(plain, in, crypto_box_BOXZEROBYTES + length, nonce, pk, sk);
    if (rc != 0) { /* Verification error: reject the message */ }

Building secure systems
Insecure default settings

▶ Enable all possible extensions even if they are not used (Heartbleed)
▶ Be compatible with old and broken systems, notably Windows XP (SSL POODLE)
▶ Prefer client settings
▶ Too many sources of trust

Building secure systems
Secure defaults: basics

▶ Explicitly set the order of ciphersuites and disable weak ones:
– Disable SSLv3 if possible (or enable TLS_FALLBACK_SCSV in both client and server)
– Always prefer key exchange with ephemeral keys
– Disable 3DES to avoid computational DoS
– Prefer strong cipher suites, e.g. EECDH+ECDSA+CHACHA20-POLY1305:EECDH+ECDSA+AESGCM
▶ Set sane expiration for keys
▶ Use elliptic curve cryptography (if possible)
▶ Be very careful when choosing an entropy source: e.g. prefer the getentropy(2) call in OpenBSD to /dev/random, especially with sandboxing or containers
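For the entropy recommendation, a minimal sketch of drawing key material with `getentropy`. It assumes a platform that declares the call in `<unistd.h>` (OpenBSD, or Linux with glibc 2.25 or newer); the wrapper name `fill_random_key` is hypothetical:

```c
#define _DEFAULT_SOURCE   /* exposes getentropy() on glibc under strict -std= modes */
#include <stddef.h>
#include <unistd.h>

/*
 * Fill `key` with `len` bytes from the kernel's random source.
 * getentropy() needs no file descriptor, so it keeps working in a
 * chroot, sandbox or container where /dev/random is not available.
 * The call rejects requests above 256 bytes.
 * Returns 0 on success, -1 on failure.
 */
int fill_random_key(unsigned char *key, size_t len)
{
    if (len > 256)
        return -1;
    return getentropy(key, len);
}
```

A failure here should be treated as fatal rather than papered over with a weaker fallback source.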

Performance: current progress

▶ Experimental support for AES-GCM in the OpenBSD and FreeBSD* kernels; there are problems with saving additional registers on context switch
▶ Support for ChaCha20-Poly1305 in LibreSSL, BoringSSL and libgcrypt (and GnuTLS)
▶ Deprecation of RC4 in OpenBSD (proposed in FreeBSD and NetBSD as well) and a switch to ChaCha20 for arc4random

* by John-Mark Gurney


Building secure systems
Advanced topics

Some additional modern technologies for building secure systems.
▶ DANE/DNSSEC allows moving from CA and PKI trust relationships to DNS trust:
– DNSSEC requires recursion to be used for all DNS clients
– Has some risk of replay-based attacks (especially in clouds)
– Increases the risk of DNS amplification attacks (can be mitigated by rate limits)
– Implies ECC if you want to store the whole certificate in DNS (for speed)
– Requires additional DNS requests
– Can be filtered by middleboxes
– Still fragile if your first-level zone provider is not trusted
▶ Use opportunistic transport layer encryption: tcpcrypt
▶ Use type-safe TLS libraries: ocaml-tls, miTLS

Building secure systems
Advanced topics: DNS security

DNSCurve: complete DNS traffic encryption
▶ Encrypts and authenticates all traffic between the client and the authoritative server
▶ Breaks all intermediate caches if enabled
▶ Requires global deployment

DNSCrypt: encrypts the last mile of the DNS exchange
▶ Encrypts and protects all communications between the client and the recursor
▶ Can be used with DNSSEC
▶ Can be used even without explicit authentication and still protects from passive snooping (e.g. in open wireless networks)

Building secure systems
Advanced topics: Transport encryption

tcpcrypt: transport layer encryption (a MAC is added to each packet as a TCP option)
▶ Opportunistically encrypts all TCP connections, invisibly to applications
▶ Compatible† with middleboxes on the Internet
▶ Requires zero setup and establishes encrypted connections automatically
▶ Faster than SSL

Disadvantages:
▶ No authentication checks (can be done via DANE, however)
▶ Not supported by most TCP stacks (experimental Linux kernel support and a proof-of-concept version based on divert sockets)
▶ Increases TCP handshake duration to 4 RTT
▶ Incompatible with TSO

† almost

Questions?

[email protected]
