Accelerating Memory Decryption and Authentication With Frequent Value Prediction
Weidong Shi (Motorola Labs), Hsien-Hsin Sean Lee (Georgia Tech)
Shi and Lee, Accelerating Memory Decryption and Authentication (CF’07)
Security Frontier
Transistor, Leaf Cell, Register/Unit, Processor, SoC
Embedded Secrets, Counterfeit Detection, Authentication/Secure Token, Isolation, Content Confidentiality
Circuit Camouflage/Obfuscation/Private Circuit (Eurocrypt 02/06)
Secure MMU/Buses/Memory (CASES-04, ASPLOS-04, PACT-06)
Secure Processor (e.g., IBM 06, MICRO-36/37/39, ASPLOS 02/04, ISCA 32/33)
Secure SoC
Chip De-lidding, Die Analysis, Probing PCB, Side-channel, Clocking-Timing, Backdoor
Secure Processor Architecture
[MICRO-36, 37, 39; ASPLOS-02, 04; ISCA-32, 33; IBM SecureBlue]
[Diagram: a Trusted Secure Processor containing the Processor Core, the L2, and a Memory Enc/Dec, Integrity Verification Engine, connected to Encrypted Memory]
Agenda
• Counter Mode Cipher
• “Direct Memory” Block Ciphers
• Frequent Value Speculation
• Performance Analysis
• Conclusion
Counter Mode Encryption
[Diagram: Nonce/IV and Counter are encrypted by Block Cipher (AES) under the Secret Key to produce a One Time Pad; Plaintxt0 XOR One Time Pad gives Ciphertxt0]
• Use Counter to generate a secret keystream that encrypts a memory block with a simple XOR
• Turn a block cipher into a stream cipher
Counter Mode Encryption
[Diagram: for blocks 0 to N, the Nonce together with Counter, Counter+1, ..., Counter+N is encrypted by Block Cipher (AES); each result is XORed with Plaintxt0, Plaintxt1, ..., PlaintxtN to give Ciphertxt0, Ciphertxt1, ..., CiphertxtN]
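The counter mode construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: `pad_block` uses truncated HMAC-SHA256 as a stand-in for the AES block cipher (counter mode only ever runs the cipher in the forward direction), and the key, nonce, counter width, and function names are all hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per cipher block (128-bit, as for AES)

def pad_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Stand-in for AES_k(nonce || counter): a keyed PRF, NOT real AES."""
    msg = nonce + counter.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK]

def ctr_xcrypt(key: bytes, nonce: bytes, counter: int, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR block i with the pad for counter + i."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        pad = pad_block(key, nonce, counter + i // BLOCK)
        chunk = data[i:i + BLOCK]
        out += bytes(a ^ b for a, b in zip(chunk, pad))
    return bytes(out)

key, nonce = b"k" * 16, b"n" * 8     # hypothetical key and nonce
line = bytes(range(64))              # one 64-byte cache line
ct = ctr_xcrypt(key, nonce, 7, line)
pt = ctr_xcrypt(key, nonce, 7, ct)   # the same XOR operation decrypts
```

Because the pad depends only on the key, nonce, and counter, it can be computed before the ciphertext is available.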
Parallelization for Counter Mode Secure Arch
• OTP generation and data fetch are done in parallel
• How to obtain Counter values
– Counter Cache [MICRO36]
– Prediction & Precomputation [ISCA32]
[Diagram: inside the Secure Processor, Nonce and Counter feed Block Cipher (AES) to produce the One Time Pad while Ciphertxt cache line X is fetched from Memory; the two are XORed to give Plaintxt cache line X]
Block Cipher (ECB)
• “Direct” Memory Encryption
• Electronic Code Book
[Diagram: Plaintxt0 is encrypted by Block Cipher (AES) under the Secret Key to give Ciphertxt0]
Block Cipher (ECB)
[Diagram: each block, Plaintxt0 through PlaintxtN, is encrypted separately by Block Cipher (AES) under the same Secret Key to give Ciphertxt0 through CiphertxtN]
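A minimal sketch of direct (ECB) encryption follows. It shows the property the rest of the talk relies on: under one key, equal plaintext blocks always produce equal ciphertext blocks. The cipher here is a toy Feistel permutation built from HMAC-SHA256, an assumption standing in for AES and not secure; all names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # bytes; 128-bit blocks as with AES

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function of the toy cipher (HMAC-SHA256, truncated).
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:len(half)]

def toy_encrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Toy Feistel permutation standing in for AES; NOT secure."""
    h = len(block) // 2
    left, right = block[:h], block[h:]
    for rnd in range(rounds):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right

def toy_decrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Inverse of toy_encrypt: undo the rounds in reverse order."""
    h = len(block) // 2
    left, right = block[:h], block[h:]
    for rnd in reversed(range(rounds)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right

def ecb_encrypt(key: bytes, data: bytes) -> list:
    # Every block is encrypted on its own, with no chaining.
    return [toy_encrypt(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def ecb_decrypt(key: bytes, blocks: list) -> bytes:
    return b"".join(toy_decrypt(key, c) for c in blocks)

key = b"k" * 16                                # hypothetical key
data = bytes(16) + b"A" * 16 + bytes(16)       # blocks 0 and 2 are equal
ct = ecb_encrypt(key, data)
```

Here `ct[0]` equals `ct[2]` because the two all-zero plaintext blocks are equal, while `ct[1]` differs.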
Block Cipher (CBC)
• Cipher-Block Chaining
• Decrypting a target block depends on the neighboring ciphertext block
[Diagram: Plaintxt0 is XORed with the Init. Vector and encrypted by Block Cipher (AES) under the Secret Key to give Ciphertxt0; Plaintxt1 is XORed with Ciphertxt0 before encryption to give Ciphertxt1, and Plaintxt2 with Ciphertxt1 to give Ciphertxt2]
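The CBC dependency can be made concrete with a short sketch: decrypting block i needs ciphertext block i and its predecessor (or the initialization vector for block 0), and nothing else. As an assumption, a toy Feistel permutation built from HMAC-SHA256 stands in for AES; it is not secure, and the key, IV, and function names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # bytes; 128-bit blocks as with AES

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:len(half)]

def toy_encrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Toy Feistel permutation standing in for AES; NOT secure."""
    h = len(block) // 2
    left, right = block[:h], block[h:]
    for rnd in range(rounds):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right

def toy_decrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    h = len(block) // 2
    left, right = block[:h], block[h:]
    for rnd in reversed(range(rounds)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> list:
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        # Each plaintext block is XORed with the previous ciphertext block.
        prev = toy_encrypt(key, _xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return out

def cbc_decrypt_block(key: bytes, iv: bytes, blocks: list, i: int) -> bytes:
    # Only the target block and its neighbor are needed.
    prev = iv if i == 0 else blocks[i - 1]
    return _xor(toy_decrypt(key, blocks[i]), prev)

key, iv = b"k" * 16, b"i" * 16                 # hypothetical key and IV
data = b"A" * 16 + b"B" * 16 + b"C" * 16
ct = cbc_encrypt(key, iv, data)
p2 = cbc_decrypt_block(key, iv, ct, 2)
```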
Authenticated Encryption
• The same cipher protects
– Confidentiality (tamper-resistance)
– Message Integrity (tamper-evidence)
• Offset Codebook (OCB)
– One of the authenticated encryption methods
– Non-malleable under chosen-ciphertext attack, to which counter mode is vulnerable
– 802.11i currently specifies AES-OCB as an alternative to CCM for confidentiality and integrity
Authenticated Encryption: OCB Encryption
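[Diagram: L is a pseudo-random number; Nonce || mem addr is XORed with L and encrypted by Block Cipher (AES) under the Secret Key to give R; PlaintxtN is XORed with the offset aL+R, encrypted by Block Cipher (AES) under the Secret Key, and XORed with aL+R again to give CiphertxtN]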
Authenticated Encryption: OCB Authentication
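[Diagram: Plaintxt0, Plaintxt1, Plaintxt2, and Plaintxt3 are combined by a Hash; the result is XORed with the offset 5L+R and encrypted by Block Cipher (AES) under the Secret Key to give the Message Authentication Code (MAC)]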
OCB ─ Decryption and Integrity Verification
• Decryption can start after the encrypted memory blocks are fetched.
• A decrypted block cannot be issued until its integrity is verified.
• MAC verification can take longer than decryption.
[Timeline: Memory Fetch of E(B0), E(B1), E(B2), E(B3) and the MAC; Decryption to B0, B1, B2, B3; MAC Verification; then Issue of each of the four blocks]
Speculations in Secure Processor
Examples of Prediction | Applicable Cipher Scenario | What can be Predicted | Why Predictable?
Counter Prediction [ISCA-32] | Counter Mode Encryption | Counter Values | Coherence of Counter Values
Value Prediction [CF-07] | “Direct” Encryption mode | Encrypted Value | Existence of Frequent Values
• Improve performance by taking advantage of
– The nature of the data, or
– Statistical properties of the data.
• Security is not compromised, because speculation is performed only within the secure boundary.
Analysis of Frequent Values
[Figure: two bar charts, "Frequent value - 256K L2" (y-axis 0 to 90) and "Frequent value - 1M L2" (y-axis 0 to 80), one group of bars per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average; series: 8, 16, 32]
• 40 to 60% of encrypted memory data are frequent values
• 8 to 32 frequent values account for over 40% of encrypted data
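The kind of measurement summarized above can be sketched as a coverage count: take the k most common values in a stream of memory words and ask what fraction of the stream they account for. The function name and the tiny trace below are hypothetical and made up for illustration; they are not data from the talk.

```python
from collections import Counter

def frequent_value_coverage(words, k):
    """Return the k most common values and the fraction of words they cover."""
    top = Counter(words).most_common(k)
    covered = sum(count for _, count in top)
    return [value for value, _ in top], covered / len(words)

# Hypothetical trace of 100 words: many zeros and all-ones, a few small
# constants, and 20 values that each occur once.
trace = [0] * 50 + [0xFFFFFFFF] * 20 + [1] * 10 + list(range(100, 120))
values, frac = frequent_value_coverage(trace, 2)
```

For this made-up trace the two most frequent values are 0 and 0xFFFFFFFF, and together they cover 70 of the 100 words.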
Speculation Using Idle Pipelined Crypto Engine
• Generate “encrypted” frequent values using otherwise idle crypto engines
• Integrity verification can also be speculated.
• Generate MAC for speculated frequent values
[Diagram: Time Line T1 to T7. While the Memory Pipeline is retrieving the encrypted cache line Ek(X), the Encryption Pipeline produces the encrypted frequent values Ek(A), Ek(B), Ek(C), Ek(D), Ek(E), Ek(F), Ek(G), one per time step; each is compared (=?) with Ek(X), and Ek(E) matches]
Value Prediction Based Decryption
[Diagram of the secure processor; components shown: WB Buffer, two Pipelined Encryption Engines, a Pipelined Decryption Engine, a Scheduler, the Cache, the Returned Encrypted Data, a Frequent Value Table holding X, Y, Z, W, and a CAM holding E(X), E(Y), E(Z), E(W)]
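The idea of matching returned ciphertext against pre-encrypted frequent values can be sketched as a lookup table. This is an illustration under assumptions, not the hardware design: a Python dict plays the role of the CAM, a toy Feistel permutation built from HMAC-SHA256 stands in for the block cipher (not secure), blocks are 64-bit, and all names are hypothetical.

```python
import hashlib
import hmac

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:len(half)]

def toy_encrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Toy Feistel permutation standing in for the block cipher; NOT secure."""
    h = len(block) // 2
    left, right = block[:h], block[h:]
    for rnd in range(rounds):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right

def build_cam(key: bytes, frequent_values: list) -> dict:
    """Done ahead of time on an otherwise idle encryption engine: E(v) -> v."""
    return {toy_encrypt(key, v): v for v in frequent_values}

def speculative_decrypt(cam: dict, fetched: bytes):
    """Return the plaintext on a match; None means use the decryption engine."""
    return cam.get(fetched)

key = b"k" * 16                                        # hypothetical key
freq = [bytes(8), b"\xff" * 8, (1).to_bytes(8, "big")]  # hypothetical table
cam = build_cam(key, freq)
hit = speculative_decrypt(cam, toy_encrypt(key, b"\xff" * 8))
miss = speculative_decrypt(cam, toy_encrypt(key, b"\x12" * 8))
```

On a hit the plaintext is known without running the decryption pipeline; on a miss the block takes the normal decryption path.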
Handle Large Block Size
[Diagram: a cache line drawn as 64-bit blocks, each marked Freq Value or Non-Freq Value, grouped in pairs under four 128-bit Ciphers; caption "Four 64-bit frequent value blocks"]
• Under a 128-bit cipher, a 128-bit block made of two 64-bit frequent values is predictable; a block that pairs a frequent value with a non-frequent value is not.
[Figure: bar chart "Predictable Blocks of Freq Value Blocks (%), L2=256KB", y-axis 0 to 60, one bar per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average]
Block Re-ordering
[Diagram: the 64-bit blocks of a cache line, each marked Freq Value or Non-Freq Value, are re-ordered so that frequent values sit next to each other and form Predictable Freq Value Pairs]
[Figure: bar chart "Predictable Blocks of Frequent Values Blocks (%) L2=256KB", y-axis 0 to 100, one group of bars per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average; series: without reorder, with_reorder]
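The effect of pairing and re-ordering can be sketched with one flag per 64-bit block. A 128-bit cipher block is treated as predictable only when both of its 64-bit halves are frequent values. The slides do not spell out the re-ordering policy, so the stable "frequent values first" partition below is an assumption, as are the function names and the example line.

```python
def predictable_fraction(is_freq):
    """Fraction of frequent 64-bit blocks that land in an all-frequent pair.

    is_freq holds one flag per 64-bit block; blocks 2i and 2i+1 share one
    128-bit cipher block.
    """
    freq = sum(is_freq)
    paired = sum(2 for i in range(0, len(is_freq) - 1, 2)
                 if is_freq[i] and is_freq[i + 1])
    return paired / freq if freq else 0.0

def reorder(is_freq):
    """Stable partition, frequent blocks first (assumed policy).

    Returns the permutation: entry j is the original index of the block
    placed at position j, which is what would be needed to undo it.
    """
    return sorted(range(len(is_freq)), key=lambda i: not is_freq[i])

# Hypothetical 64-byte line: eight 64-bit blocks, four of them frequent.
line = [True, False, True, False, False, True, True, False]
perm = reorder(line)
reordered = [line[i] for i in perm]
before = predictable_fraction(line)
after = predictable_fraction(reordered)
```

In this made-up line no frequent value has a frequent partner before re-ordering, and all four do afterwards.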
Frequent Value Map
• Speculation targeted only for frequent value blocks
• Overhead
– 1 frequent value map bit per encrypted block (128 bits)
– 8 bits per cache line (64B cache line size)
– 512 bits per page
– Total 64K bits for 128-entry TLB
• Can be shared for many other purposes
– frequent value based cache compression
– power saving cache
[Diagram: a Cache line FV bit map is one row of bits (for example 1 0 0 0 0 1 0 1); the rows for the cache lines of a Page are stacked, and the maps of the Pages in TLB together form the Frequent Value Map for All TLB Pages]
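The overhead arithmetic can be checked with a small helper that takes the map granularity as a parameter. The page size is not given on the slide, so 4 KB is an assumption here, and the function name is hypothetical. Under that assumption, the per-line, per-page, and TLB totals quoted above come out exactly when one map bit covers a 64-bit block; at 128-bit granularity the same line and page sizes would need half as many bits.

```python
def fv_map_bits(block_bits, line_bytes, page_bytes, tlb_entries):
    """Map bits per cache line, per page, and for all pages in the TLB."""
    per_line = line_bytes * 8 // block_bits
    per_page = page_bytes * 8 // block_bits
    return per_line, per_page, per_page * tlb_entries

# Assumed 4 KB pages, 64-byte lines, 128-entry TLB.
at_64 = fv_map_bits(64, 64, 4096, 128)    # one bit per 64-bit block
at_128 = fv_map_bits(128, 64, 4096, 128)  # one bit per 128-bit block
```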
MAC Speculation
[Diagram: during the Memory Fetch, four Speculated Encrypted Blocks go through MAC Speculation one after another, each followed by a Comparison]
• Compute MAC for speculated frequent value blocks
• Compare
– fetched encrypted block with speculated encrypted block
– fetched MAC with speculated MAC
• If both match, issue the fetched instruction/data
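The two comparisons can be sketched as a single check. This is an illustration only: truncated HMAC-SHA256 is an assumption standing in for the OCB MAC, the byte strings are placeholders rather than real cipher output, and all names are hypothetical.

```python
import hashlib
import hmac

def toy_mac(key: bytes, block: bytes) -> bytes:
    """Stand-in MAC (truncated HMAC-SHA256); NOT the OCB MAC."""
    return hmac.new(key, block, hashlib.sha256).digest()[:8]

def can_issue(fetched_ct: bytes, fetched_mac: bytes,
              spec_ct: bytes, spec_mac: bytes) -> bool:
    """Issue only if the encrypted block and the MAC both match."""
    return (hmac.compare_digest(fetched_ct, spec_ct)
            and hmac.compare_digest(fetched_mac, spec_mac))

key = b"k" * 16                    # hypothetical key
spec_ct = b"\x5a" * 16             # placeholder speculated encrypted block
spec_mac = toy_mac(key, spec_ct)   # computed while the fetch is in flight

ok = can_issue(spec_ct, spec_mac, spec_ct, spec_mac)
bad_mac = bytes([spec_mac[0] ^ 1]) + spec_mac[1:]      # one bit flipped
tampered = can_issue(spec_ct, bad_mac, spec_ct, spec_mac)
other = can_issue(b"\x00" * 16, spec_mac, spec_ct, spec_mac)
```

A mismatch in either comparison means the speculation cannot be used for that block.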
Experimental Setup
Parameters | Value
L1 I/D Cache | DM, 16KB
L2 Cache | 4-way, unified, 256KB and 1MB
Memory Bus | 8B wide; 1:4, 1:5, 1:6 Ratio
CPU Clock | 1GHz
L1 Latency | 1 cycle
L2 Latency | 8 cycles (1MB), 4 cycles (256KB)
TDES Decryption Latency | 96ns
AES Decryption Latency | 65ns
Block Size | 64-bit (Triple DES), 128-bit (AES)
Results – Value Prediction
[Figure: two bar charts, "IPC Speedup L2=256KB" and "IPC Speedup L2=1MB", y-axis 1 to 1.35, one bar per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average]
Performance ― Number of Frequent Values
• 64-bit block size
[Figure: bar chart "IPC Speedup L2=256KB", y-axis 1 to 1.45, one group of bars per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average; series: 8_freq_values, 16_freq_values, 32_freq_values]
Sensitivity to Memory Speed
[Figure: two bar charts, "IPC Speedup, L2=256KB, Ratio=1:4" and "IPC Speedup, L2=256KB, Ratio=1:6", y-axis 1 to 1.4, one bar per benchmark: applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise, average]
Conclusion
• For direct memory block ciphers, frequent value speculation can hide both
– Decryption latency
– Integrity verification latency
• Encrypted values demonstrate predictability.
• We propose block re-ordering to consolidate the predictability.
• Memory-bound benchmark programs show 10% to 30% performance improvement.
Thank You!
Georgia Tech ECE MARS Labs
http://arch.ece.gatech.edu