Side-channels Tom Ristenpart CS 6431. Pick target(s) Choose launch parameters for malicious VMs Each...

Preview:

Citation preview

Side-channels

Tom RistenpartCS 6431

Pick target(s) Choose launch parametersfor malicious VMs

Each VM checks for co-residence

Frequently achieve advantageous placement

“Cloud cartography”

Recall from last time:

Hardware

Hypervisor

OS1

P1 P2

OS2

P1 P2

This shouldn’t matter if VMM provides good isolation!

Today

• RSA remote side-channel attack• Flush+Reload side-channel attack

RSA reminder

ZN = { i | gcd(i,N) = 1 }*

Claim: Suppose e,d Zφ(N) satisfying ed mod φ(N) = 1 then for any x ZN we have that (xe)d mod N = x

*

*

(xe)d mod N = x(ed mod φ(N)) mod N = x1 mod N = x mod N

First equality isby Euler’s Theorem

φ(N) = |ZN|

RSA reminder

ZN = { i | gcd(i,N) = 1 }*

Claim: Suppose e,d Zφ(N) satisfying ed mod φ(N) = 1 then for any x ZN we have that (xe)d mod N = x

*

*

Z15 = { 1,2,4,7,8,11,13,14 }*

e = 3 , d = 3 gives ed mod 8 = 1

Zφ(15) = { 1,3,5,7 }*

x 1 2 4 7 8 11 13 14

x3 mod 15 1 8 4 13 2 11 7 14

y3 mod 15 1 2 4 7 8 11 13 14

φ(N) = |ZN|

X fpk(X)

easy given N,e

hard given N,eeasy given N,d

The RSA trapdoor permutation

pk = (N,e) sk = (N,d) with ed mod φ(N) = 1

fN,e(x) = xe mod N gN,d(y) = yd mod N

pk = (N,e) sk = (N,d) with ed mod φ(N) = 1

fN,e(x) = xe mod N gN,d(y) = yd mod N

But how do we find suitable N,e,d ?

If p,q distinct primes and N = pq then φ(N) = (p-1)(q-1)

Why?

φ(N) = |{1,…,N-1}| - |{ip : 1 ≤ i ≤ q-1}| - |{iq : 1 ≤ i ≤ p-1}| = N-1 - (q-1) - (p-1) = pq – p – q + 1 = (p-1)(q-1)

The RSA trapdoor permutation

pk = (N,e) sk = (N,d) with ed mod φ(N) = 1

fN,e(x) = xe mod N gN,d(y) = yd mod N

But how do we find suitable N,e,d ?

If p,q distinct primes and N = pq then φ(N) = (p-1)(q-1)

Given φ(N), choose e Zφ(N) and calculate d = e-1 mod φ(N)

The RSA trapdoor permutation

*

We don’t know if inverse is true, whether inverting RSA implies ability to factor

Learning p,q from N is the factoring problem

Textbook exponentiation

Exp(h,x,N)X’ = hFor i = 2 to x do

X’ = X’*hReturn X’

SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do

f = f2 mod NIf bi = 1 then

f = f*h mod NReturn f

How do we compute hx for any h ZN?

Requires time O(|G|) in worst case.

Requires time O(k) multiplies and squares in worst case.

*

SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do

f = f2 mod NIf bi = 1 then

f = f*h mod NReturn f

f3 = 1 h

f2 = h2 f1 = (h2)2 h

b3 = 1

b2 = 0

b1 = 1

f0 = (h4 h)2 h b1 = 1 = h8 h2 h

h11 = h8+2+1 = h8 h2 h

SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do

f = f2 mod NIf bi = 1 then

f = f*h mod NReturn f

What side-channels might arise?• Timing• CPU state (caches, branch predictors,…)• Power

“Remote” attack setting

Hardware

Hypervisor

OS1

RSA-Decrypt

OS2(Attacker)

Co-located placement on cloud instance

RSA-Decrypt takes adversarially supplied ciphertext c ZN and computes cd mod N

Attacker can time operation

Where would this come up in practice?

*

(Part of) TLS handshake forRSA transportClient Server

PMS <- D(sk,C)

ClientHello, MaxVer, Nc, Ciphers/CompMethods

ServerHello, Ver, Ns, SessionID, Cipher/CompMethod

CERT = (pk of bank, signature over it)Check CERTusing CA publicverification key

Pick random Nc

Pick random Ns

Pick random PMSC <- E(pk,PMS)

C

SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do

f = f2 mod NIf bi = 1 then

f = f*h mod NReturn f

C1

t1

C2

t2

Prior work before BB: Kocher 96 timing attackx = bk bk-1 … bk-i+1 uk-i uk-i-1 … u0

Guess some bits at front Remaining exponent bits are unknown

Predict how long it should take to decrypt (random) value C• If guess is right, predictions will correlate with observed timings• If guess is wrong, no correlations• As more bk … bk-i+1 known, predictions get strongerKocher shows can do with q = O(k)

Kocher attack in practice?

• BB point out:– OpenSSL widely used implementation of TLS

• Kocher attack doesn’t work against it:– OpenSSL uses CRT, sliding windows, two different

modular multiplication algorithms– CRT: Chinese Remainder Theorem– Sliding window: exponentation by handling

exponent in chunks, not one bit at a time

Chinese Remainder Theorem

For n = n1n2…nk with gcd(ni,nj) = 1 System of congruences x = x1 mod n1 = … = xk mod nk

has exactly one solution and we can find it efficiently

For RSA with N = pq d1 = d mod (p-1) d2 = d mod (q-1)q-1 = q-1 mod pCompute m1 = hd1 mod p m2 = hd2 mod qCompute m = m1 + (q-1 *(m1 – m2) mod p) * q

Towards a timing attackDecryption requires m2 = hd2 mod q• Square-and-multiply (lots of multiplies mod q)• Sliding window (fewer multiplies mod q)Multiply x*y mod q uses Montgomery reduction• Let R = 2z for some z• Compute (xR*yR)*R-1 = zR mod q– Fast to compute R-1

– At end of computation: zR > q then subtract q

Schindler’s observation: Pr[subtract q] ≈ (h mod q ) / 2R

R-1 * hlo mod N

tlo

R-1 *hhi mod Nthi

Guess first few bits of q = bk … b1

Let hlo = bk bk-1 … bk-i+1 0 0 0 … 0Let hhi = bk bk-1 … bk-i+1 1 0 0 … 0

If bk-i = 0 then: hlo < q < hhi

If bk-i = 1 then: hlo < hhi < q

q

Dec

rypti

on ti

me

hlo

hhi if bk-1 = 0

hhi if bk-1 = 1

RSA-Decrypt

Many more details

• Secondary timing effect– Karatsuba versus standard multiplication– Opposite timing difference for hhi - hlo – Use absolute values: one effect always dominates

• Sliding window (few multiplications)– Amplify by querying a bunch of values:

hhi , hhi + 1 , hhi + 2, …– Called a neighborhood

Timing differences measurable

The attack worked…

• ~1.5 million queries for 1024 bit RSA• 2 hours on that era’s computers• Blinding countermeasure:

y = re * c mod N m = yd / r mod N

SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do

f = f2 mod NIf bi = 1 then

f = f*h mod NReturn f

Hardware

Hypervisor

OS1

RSA-Decrypt

OS2(Attacker)

Cache-based side channel attacks• A long literature on cache side channel attacks– Percivel 2005: RSA side channels – Tromer et al. 2005: AES side channels

• Today: particularly simple one by Yarom and Faulkner useful in PaaS clouds

Attacker VM

Victim VM

Main memory

L1 instr cache

Prime+Probe protocol

Attacker VM

(each row represents cache set)

Schedulingorder on CPU core

Victim VM

Attacker VM

• Timings correlated to (distinct) cache usage patterns of S, M operations

• Can spy frequently (every ~16 μs) by exploiting scheduling

Interrupt

Interrupt

Slow

Fast

Runs (S)operation

Runs (M)operation

Prime+Probe limitations

• Works only for L1 caches– Some recent work extending to LLC in certain

settings– Multi-core settings difficult (Zhang et al. 2012)

• Lots of noise from various sources

Towards Flush+ReloadDeduplication-based memory page sharing (Linux, KVM, VMWare)• Duplicate memory pages

detected, physical pages coalesced

• Virtual address spaces different, but mapped to same physical addresses

Main memory

Libgcrypt square( ) instructions

Inclusive cache architecture

Towards Flush+ReloadDeduplication-based memory page sharing (Linux, KVM, VMWare)• Duplicate memory pages

detected, physical pages coalesced

• Virtual address spaces different, but mapped to same physical addresses

Main memory

Libgcrypt square( ) instructions

Inclusive cache architecture

Flush+Reload protocol

• Flush from LLC memory line of interest• Wait• Time reloading memory line

Not-accessed by victim

Accessed by victim

Attacking Square and Multiply

SqrAndMulExp(h,x,N)bk,…,b0 = xf = 1For i = k down to 0 do

f = f2 mod NIf bi = 1 then

f = f*h mod NReturn f

Flush+Reload type attacks damaging

• Immediately used in a large number of follow-up papers to break various things– Cryptographic targets– Non-cryptographic targets

• Requires memory deduplication– Turned off in Amazon EC2

• But not in Linux by default– PaaS services vulnerable [Zhang et al. 2014]

32

General victim execution tracing

#include "stdio.h”int b;int inc(int number) {

return number + 1;}int main() {

int a = 9;if (a % 2 == 1)

a = inc(a);b = a;return 0;

}

33

Control-Flow Graph#include "stdio.h”int b;int main() {

int a = 9;if (a % 2 == 1)

a = inc(a); int inc(int number) {return number + 1;

}

b = a;return 0;

}

34

Control-Flow Graph

4004b6: mov $0x9,%edi 4004bb: callq 4004b4 <inc>

400324: lea 0x1(%rdi),%eax400327: retq

4004c0: mov %eax,0x200b60(%rip) 4004c6: mov $0x0,%eax4004cb: retq

chunk 1: [400480-4004bf]

chunk 3: [4004c0-4004ff]

chunk 2: [400300-40033f]

35

Control-Flow Graph

chunk1[400480-4004bf]

chunk2[400300-40033f]

chunk 3[4004c0-4004ff]

36

An Attack NFA

{c1}

{c2, c3}

{c3}

Flush-Reload

Flush-Reload

Flush-Reload

{}

(c1, t1, t2)

(c2, t3, t4)

chunk1[400480-4004bf]

chunk2[400300-40033f]

chunk 3[4004c0-4004ff]

start

(c3, t7, t8)

(c3, t5, t6)

37

Three Example Attacks

• Inferring Sensitive User Data

• SAML-based Single Sign-on Attacks

• Password-Reset Attack

38

Three Example Attacks

• Inferring Sensitive User Data

• SAML-based Single Sign-on Attacks

• Password-Reset Attack

39

Password Reset

Password Reset

Token Verification

Pseudo-Random Number

GeneratorUser

Token

40

Password Reset Attack

Password Reset

Token Verification

Pseudo-Random Number

Generator

User

Token Pseudo-Random Number

Generator

Shared Library

gettimeofday()

Attacker

Attack NFA

Library calls

TokenPrediction

Shared OSApplication Application

41

Call Graph of Password Resetting

42

Attacker’s StrategyAttacker

applicationVictim

applicationAttacker’s

email account

Password reset against his own account

Password reset token(attack NFA)

Offline: Bruteforce value of getpid()

Password reset against victim account

Online: token guessing

(attack NFA)

(HTTP keepAlive)

43

The Attack NFA

44

Evaluation

• Demonstrated successful attacks against Magento (controlled by ourselves) in a public PaaS cloud.

• After 220 offline computation, the attacker can narrow down the password reset token to 22 possible values---easy to brute-force online.

Side-channel countermeasures

• Constant time code– Input-independent memory accesses, timing

• Very difficult to truly get!– Input-dependent instruction timing on CPU

• Best bet: don’t share physical resources

Recommended