The Emerging Secure Processor Designs
Youtao Zhang
University of Pittsburgh
Outline
Background: Why design secure processors?
Secure Processor Design: What can secure processors do?
Future trends: What will future secure processors be?
Cyber Security is Important
Widespread Internet and network access
  Good connectivity, but easier attacks
Cyber attacks are a real problem
  More and more people are being affected
  Leading cause of financial losses
    Software piracy: $29B in 2003
    Viruses: $55B in 2003
More severe for mission-critical applications
  Battlefields
Various Types of Security Attacks
Intellectual property theft
  Illegally copying software
Viruses/Worms
  Triggered by a special event, a malicious program can do harmful things
Trojans
  Accessing the computer through a back door
Denial of service
Limitation of Software-based Designs
Current practices:
Using serial codes
  Makes piracy a little more difficult, but not much
Installing/updating anti-virus software
  Defends against known viruses
Firewall/intrusion detection
  Configuration is not easy; false positives/negatives
Secure software design
  Performance is an issue
Enhance Security through Hardware
Secure co-processors
  Speed up cryptographic operations
  IBM 4758 secure coprocessor
Secure embedded processors
  Tamper-resistant registers/buffers
  Storing sensitive information
  Smartcard
Emerging Secure Processors
Trusted Computing Group (TCG)
IBM: SecureBlue, 4/2006
Intel: LaGrande, 2002
Microsoft: NGSCB
What Can Secure Processors Do?
Design objectives
  Protect sensitive data, program execution, intellectual property, network communication
    Confidentiality: an adversary cannot learn the data
    Integrity: an adversary cannot tamper with the data
  Ease of use, execution speed, backward compatibility, manageability
  Performance
The First Design: Let us Encrypt ALL
XOM (execute-only memory) model, ASPLOS 2000
Design goal
  Protect intellectual property
  Secure execution even if the hardware and OS are captured
Design strategy
  Everything off-chip is encrypted
What is Trusted?
[Figure: the trusted computing base (TCB) is the processor boundary, containing the CPU core and cache; the OS, memory, and disk lie outside it.]
Only the CPU is trusted
  Hacking a high-end CPU chip is difficult
  It is possible to scan an embedded chip using a microscope
Other components are not trusted
XOM Design
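Use the processor's public key to encrypt the session key
Use the session key to encrypt the program
[Figure: the processor contains the CPU core, cache, en-/decryption unit, and private key; memory holds the encrypted program and the session key encrypted under the processor's public key.]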
Problem 1: Slow Encryption
Encryption only: memory latency = 100 cycles, encryption latency = 50 cycles
[Figure: Performance Degradation for XOM. Execution time increase [%] for ammp, art, bzip2, equake, gcc, gzip, mcf, mesa, parser, vortex, vpr, and the average; y-axis from 0 to 50. Bar labels in extraction order: 34, 42, 23, 16, 29, 2, 39, 2, 13, 8, 22, 21.]
The Lengthened Critical Path
Encryption lies on the memory access critical path
[Figure: the encryption/decryption unit sits between the L2 cache (with its write buffer) and main memory, on both the read path and the write path.]
Our Work
Memory access and crypto-operation are both slow
XOM design: sequential
  Performance degradation: ~20% for SPEC2000 benchmark programs
Our design: parallel
  Performance degradation: 2%
However
  Direct encryption is no longer secure
  Adopt a one-time-pad scheme
One-Time-Pad Encryption
XOM: clear-data = AES(cipher-data)
Our design: clear-data = cipher-data ⊕ OTP
  OTP = AES(address || counter || …)
[Figure: side-by-side datapaths for XOM and our design, each showing the CPU, L2 cache, encryption/decryption unit, and main memory.]
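The one-time-pad relation (clear-data = cipher-data XOR OTP, with the OTP derived from the address and a counter) can be sketched as follows. This is a minimal illustration, not the hardware design: HMAC-SHA256 from the Python standard library stands in for AES, and the names make_pad, encrypt_block, BLOCK_BYTES, and the demo key are assumptions introduced here.

```python
import hashlib
import hmac

BLOCK_BYTES = 32  # hypothetical block size, chosen to match one SHA-256 output


def make_pad(key: bytes, address: int, counter: int) -> bytes:
    """Derive a pad from (address || counter).

    HMAC-SHA256 is only a stand-in for the AES unit on the slide.
    """
    seed = address.to_bytes(8, "big") + counter.to_bytes(8, "big")
    return hmac.new(key, seed, hashlib.sha256).digest()[:BLOCK_BYTES]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, address: int, counter: int, clear: bytes) -> bytes:
    """cipher-data = clear-data XOR OTP; the pad never depends on the data."""
    return xor_bytes(clear, make_pad(key, address, counter))


# XOR is its own inverse, so decryption is the same operation.
decrypt_block = encrypt_block

key = b"demo-key-not-secret"  # hypothetical key for illustration only
clear = b"sensitive data".ljust(BLOCK_BYTES, b"\0")
cipher = encrypt_block(key, 0x1000, 7, clear)
recovered = decrypt_block(key, 0x1000, 7, cipher)
```

Because make_pad takes only the address and the counter, the pad can be computed before the data arrives from memory, which is the property the parallel design relies on.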
Offloading the Crypto-Computation
Original scheme
  Encryption input depends on memory accesses
  Carried out serially with memory accesses
  Latency: 100 cycles + 50 cycles = 150 cycles
Our scheme
  Decouple en-/decryption from memory accesses
  Carried out in parallel
  Hide the crypto-computation latency
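The latency arithmetic can be checked with a short calculation. The 100-cycle and 50-cycle figures come from the slides; the one-cycle cost of the final XOR is an assumption added here, and the parallel case holds only when the pad inputs are already available on chip.

```python
MEMORY_LATENCY = 100  # cycles, from the slides
CRYPTO_LATENCY = 50   # cycles, from the slides
XOR_LATENCY = 1       # assumption: one cycle for the final XOR


def serial_latency(memory: int, crypto: int) -> int:
    """Original scheme: decryption starts only after the data arrives."""
    return memory + crypto


def parallel_latency(memory: int, crypto: int, xor: int = XOR_LATENCY) -> int:
    """Pad generation overlaps the memory access; only the XOR is added."""
    return max(memory, crypto) + xor


serial = serial_latency(MEMORY_LATENCY, CRYPTO_LATENCY)
parallel = parallel_latency(MEMORY_LATENCY, CRYPTO_LATENCY)
```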
One-Time Pad (OTP) Encryption
AES normal mode: cleartext → AES → ciphertext
OTP encryption: seed → AES → random value (pad); ciphertext = cleartext ⊕ pad
Seed Selection
Independent of the data value, known before the data is available
  Use the memory address
Multiple accesses to the same location use different seeds
  Use a one-time sequence number
The Sequence Numbers
Write V → A
time               t0             t1             t2
V                  1              2              3
(1) Use A only     P(A) ⊕ 1       P(A) ⊕ 2       P(A) ⊕ 3
(2) Use A and t    P(A,t0) ⊕ 1    P(A,t1) ⊕ 2    P(A,t2) ⊕ 3
OTP = AES(Address, one-time-seq)
OTP = AES(Seed) = AES(nonce, Address, one-time-seq)
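A small experiment shows why the address alone is not a sufficient seed. If the pad P(A) is reused for every write to A, XORing two ciphertexts cancels the pad and exposes the XOR of the two cleartexts; adding a per-write sequence number removes that leak. In this sketch HMAC-SHA256 again stands in for AES, and the function name pad, the key, and the address are hypothetical.

```python
import hashlib
import hmac


def pad(key: bytes, address: int, seq=None) -> int:
    """32-bit pad from the address, optionally extended by a sequence number."""
    seed = address.to_bytes(8, "big")
    if seq is not None:
        seed += seq.to_bytes(8, "big")
    digest = hmac.new(key, seed, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


key = b"demo-key"    # hypothetical key
A = 0x2000           # hypothetical address
values = [1, 2, 3]   # V written at t0, t1, t2, as on the slide

# Scheme (1): the pad depends on the address only, so it is reused.
address_only = [v ^ pad(key, A) for v in values]

# Scheme (2): the pad depends on the address and a one-time sequence number.
with_sequence = [v ^ pad(key, A, t) for t, v in enumerate(values)]

# An observer XORs the ciphertexts of the first two writes.
leak_address_only = address_only[0] ^ address_only[1]
leak_with_sequence = with_sequence[0] ^ with_sequence[1]
```

With scheme (1) the observer recovers 1 XOR 2 without knowing the key; with scheme (2) the XOR of the two ciphertexts still contains two different pads.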
Comparing XOM and OTP Based Encryption
                          Cleartext    XOM          XOM w/ OTP
Spatially
  A1                      100          Ekey(100)    Ekey(A1,t1) ⊕ 100
  A2                      100          Ekey(100)    Ekey(A2,t2) ⊕ 100
Temporally (address A)
  t1                      100          Ekey(100)    Ekey(A,t1) ⊕ 100
  t2                      100          Ekey(100)    Ekey(A,t2) ⊕ 100
Our scheme better randomizes encrypted data in memory
More Intuitively
Our Architectural Design
[Figure: the L2 cache, write buffer, sequence number cache (SNC), and encryption unit sit inside the security boundary; main memory lies outside. Labels: read, write, virtual address, physical address.]
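The role of the sequence number cache can be sketched as a small LRU map from block address to sequence number. On a hit, the pad can be generated while the memory access is in flight; on a miss, the sequence number must first be fetched. The class below is a toy model under stated assumptions: the names SequenceNumberCache, lookup, and bump are introduced here, and the write-through update of the backing store is a simplification.

```python
from collections import OrderedDict


class SequenceNumberCache:
    """Toy LRU cache mapping a block address to its sequence number."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, address: int, backing_store: dict) -> int:
        """Return the sequence number for address, loading it on a miss."""
        if address in self.entries:
            self.entries.move_to_end(address)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.entries[address] = backing_store.get(address, 0)
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return self.entries[address]

    def bump(self, address: int, backing_store: dict) -> int:
        """Advance the sequence number on a write (write-through for simplicity)."""
        seq = self.lookup(address, backing_store) + 1
        self.entries[address] = seq
        backing_store[address] = seq
        return seq


memory_seq = {}                       # sequence numbers held off chip
snc = SequenceNumberCache(capacity=2)
snc.bump(0xA0, memory_seq)            # miss, then sequence number becomes 1
snc.bump(0xB0, memory_seq)            # miss, then sequence number becomes 1
snc.lookup(0xA0, memory_seq)          # hit
snc.lookup(0xC0, memory_seq)          # miss, evicts 0xB0
reloaded = snc.lookup(0xB0, memory_seq)  # miss, reloads 0xB0 from the store
```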
Experimental Results
Settings
  SimpleScalar toolset
  SPEC2000 programs
Simulation baseline
  4-issue out-of-order
  Caches
    Separate 32KB 4-way L1 I-cache and D-cache
    256KB 4-way L2 cache
Performance Comparison
[Figure: program slowdown [%], XOM versus SNC-LRU; y-axis from 0 to 40.]
Equal Area Comparison
[Figure: normalized execution time for XOM-256KL2, XOM-384KL2, and SNC-32way-LRU-256KL2; y-axis from 0.0 to 1.2. Bar labels in extraction order: 1.21, 1.12, 1.02.]
SNC of Different Size
[Figure: slowdown for different SNC sizes (32KB, 64KB, 128KB); y-axis from 0 to 8, with one off-scale bar labeled 17.89.]
SNC of Different Associativity
[Figure: slowdown [%] for different SNC associativity (fully associative versus 32-way set associative) on ammp, bzip2, gcc, mcf, parser, and vpr; y-axis from 0 to 12.]
II: Tamper Resistant Memory Model
Replay attack
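[Figure: replay attack on the processor and memory. The attacker writes a previously observed (stale) value back into memory; the example values are $1000 and $50.]
Merkle Tree
[Figure: Merkle tree over memory data, with intermediate MACs (MAC1_1, MAC1_2, MAC2_1, MAC2_2) and a root MAC.]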
Ensuring Data Integrity
Any change in memory
  Results in a new root-MAC
Update for each memory write
  Detect illegal modification of memory
Format of released code
  Code/data segments
  An encrypted session key
  An encrypted root-MAC
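The root-MAC check can be sketched with a small Merkle tree over memory blocks. This is an illustration under stated assumptions: HMAC-SHA256 stands in for the MAC function, the whole tree is recomputed on each write (a real design would update only the path to the root), odd levels duplicate their last node, and the names mac, root_mac, and the sample blocks are introduced here.

```python
import hashlib
import hmac


def mac(key: bytes, data: bytes) -> bytes:
    """Keyed MAC; HMAC-SHA256 is a stand-in for the scheme on the slides."""
    return hmac.new(key, data, hashlib.sha256).digest()


def root_mac(key: bytes, blocks: list) -> bytes:
    """Fold the per-block MACs pairwise until a single root remains.

    Assumes a non-empty list of blocks.
    """
    level = [mac(key, block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate last node if odd
        level = [mac(key, level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


key = b"demo-key"  # hypothetical key
memory = [b"$1000", b"block1", b"block2", b"block3"]

memory[0] = b"$50"                      # legitimate write by the processor
trusted_root = root_mac(key, memory)    # root-MAC kept inside the processor

replayed = list(memory)
replayed[0] = b"$1000"                  # attacker restores the stale value
replay_detected = root_mac(key, replayed) != trusted_root
```

Because the trusted root reflects the latest write, a stale but once-valid block no longer verifies, which is what defeats the replay attack.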
III: Protecting Multiprocessors
CPU/Memory is protected
  Confidentiality
  Integrity
CPU/CPU is NOT
  Requires both
[Figure: processors A, B, and C connected to main memory over a shared bus.]
Bus Attacks in Real Life
Mod-chip: modify a game console to boot up all CD/DVDs!
[Figure: mod-chip example, with the DSP chip and BIOS chip labeled.]
The Algorithm
Secure
  Protect both confidentiality and integrity
Performance: slowdown < 1%
  Fast crypto operations
  Pad update in parallel
[Figure: the originating processor and the snooping processors each maintain a pad via AES; plaintext P is combined with the pad to form ciphertext C on the data bus.]
Potential Attacks on the Bus
Type 1: Dropping
Type 2: Reordering
Type 3: Spoofing
[Figure: message sequences 1, 2, 3 on the bus under each attack; under spoofing, the receiver sees 1, 2, 2 instead of 1, 2, 3.]
Secure SMP multiprocessors
Design goals
  Secure cache-to-cache bus transfers
  Security
    Confidentiality
    Integrity
  Efficiency
    Fast
Cryptographic Algorithms
[Figure: block diagrams of CBC-AES and CFB-AES encryption and decryption, with labels P, C, M, and Mlast.]
Bus Encryption Scheme
Crypto operations
  Fast: one XOR operation
Pad update
  Done in parallel with bus transfer
[Figure: the originating processor and the snooping processors XOR bus data with a pad maintained via AES; an attack point is marked on the data bus.]
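The idea of a single XOR per transfer with a pad updated off the critical path can be sketched as below. The pad update rule is an assumption made for illustration (next pad = PRF(key, current pad)); the slides do not spell out the exact rule. HMAC-SHA256 stands in for AES, and BusEndpoint, transfer, and the initial-pad derivation are hypothetical names.

```python
import hashlib
import hmac


class BusEndpoint:
    """One processor's view of the shared bus pad (toy model)."""

    def __init__(self, key: bytes):
        self.key = key
        # Hypothetical initial pad derived from the shared key.
        self.pad = hmac.new(key, b"initial-pad", hashlib.sha256).digest()

    def _update_pad(self) -> None:
        # Assumed update rule; in hardware this could overlap the bus transfer.
        self.pad = hmac.new(self.key, self.pad, hashlib.sha256).digest()

    def transfer(self, data: bytes) -> bytes:
        """XOR data with the current pad, then advance the pad.

        The same call encrypts on the sender and decrypts on a receiver.
        """
        if len(data) > len(self.pad):
            raise ValueError("data longer than pad")
        out = bytes(d ^ p for d, p in zip(data, self.pad))
        self._update_pad()
        return out


key = b"shared-session-key"   # hypothetical key shared by the processors
originator = BusEndpoint(key)
snooper = BusEndpoint(key)

line = b"cache line payload"
on_bus_1 = originator.transfer(line)
seen_1 = snooper.transfer(on_bus_1)
on_bus_2 = originator.transfer(line)
seen_2 = snooper.transfer(on_bus_2)
```

Every processor that observes every transfer keeps its pad in step with the originator, so the same payload sent twice appears differently on the bus.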
Basic Authentication Scheme
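Detect an attack
  Each processor checks the MAC it computed against the one received
Issues:
  Periodic authentication
  Previous data sequence (chaining mode)
  PID should also be included
  ….
[Figure: MACt = AES(Pt, MACt-1); the MAC received over the bus (MACt') is compared with the locally computed MACt.]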
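The chained check MACt = AES(Pt, MACt-1) can be sketched as follows, with the processor ID folded into each step as the slide suggests. HMAC-SHA256 stands in for AES, and BusAuthenticator, chain_mac, the all-zero initial chain value, and the sample messages are assumptions introduced for illustration.

```python
import hashlib
import hmac

KEY = b"shared-session-key"  # hypothetical key shared by the processors


class BusAuthenticator:
    """Keeps a running MAC over every bus transfer a processor observes."""

    def __init__(self, key: bytes):
        self.key = key
        self.mac = b"\0" * 32  # hypothetical initial chain value

    def observe(self, pid: int, payload: bytes) -> None:
        """MAC_t depends on the sender's PID, the payload, and MAC_{t-1}."""
        message = pid.to_bytes(2, "big") + payload
        self.mac = hmac.new(self.key, self.mac + message,
                            hashlib.sha256).digest()


def chain_mac(transfers: list) -> bytes:
    """Final chain value after observing the transfers in the given order."""
    auth = BusAuthenticator(KEY)
    for pid, payload in transfers:
        auth.observe(pid, payload)
    return auth.mac


sent = [(0, b"msg1"), (1, b"msg2"), (2, b"msg3")]
reference = chain_mac(sent)

dropped = chain_mac([sent[0], sent[2]])             # Type 1: dropping
reordered = chain_mac([sent[1], sent[0], sent[2]])  # Type 2: reordering
spoofed = chain_mac([sent[0], sent[1], sent[1]])    # Type 3: replayed message
```

A processor whose view of the bus differs in any of these three ways ends up with a chain value that does not match the one broadcast during periodic authentication.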
Defending Type 1: Dropping Attack
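[PACT'04]
[Figure: each processor chains its MAC over the cumulative message history (1; 1,2; …; 1..n) using MACt = AES(Pt, MACt-1); when a message is dropped for one processor, its history diverges from the others and the MACs no longer match.]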
Defending Type 2: Reordering Attacks
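[Figure: a processor that receives the messages in order 1, 2, …, n accumulates the history 1; 1,2; …; 1..n, while one that receives 2, 1, …, n accumulates 2; 2,1; …; 2,1,..,n, so the chained MACs differ.]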
Defending Type 3: Spoofing Attacks
Replaying
Insertion
[Figure: replaying shows the sequence 1, 2, 2 in place of 1, 2, 3; insertion shows messages x, y, z exchanged among processors 1 to 4, tagged with PIDs.]
Architectural Design
[Figure: Application1 and Application2 run on CPU0 through CPUn, which share a bus to main memory; each processor has a cryptographic unit with a private/public key pair and an additional info table with entries SK1 … G1 through SKn … Gn.]
Experiment Environment
Tools
  Simics full-system multiprocessor simulator
  5 benchmarks from the SPLASH2 suite
Configuration
  Machine: 1GHz, SPARC V9, Solaris 9
  Cache
    Separate L1 I- and D-cache: write-through, 64K, 32B line
    Integrated L2 cache: write-back, 1M/4M, 64B line
    MESI coherence protocol
  Latency
    Cache-to-cache: 120 cycles; cache-to-memory: 180 cycles
    AES: 80 cycles
Performance Slowdown
Write Invalidate Model + 4M Write Back L2 Cache
[Figure: percentage slowdown (%) for fft, radix, barnes, lu, ocean, and the average, 2P versus 4P; y-axis from 0 to 0.2.]
Bus Traffic Increase
Write Invalidate Model + 4M Write Back L2 Cache
[Figure: bus activity increase (%) for fft, radix, barnes, lu, ocean, and the average, 2P versus 4P; y-axis from 0 to 0.5.]
Integrated System
Write Invalidate Coherence Model + 1M Write Back L2 Cache
[Figure: percentage slowdown (%) for fft, radix, barnes, lu, ocean, and the average, SENSS versus SENSS+Mem_OTP_Chash; y-axis from 0 to 18. Bar labels in extraction order: 0.05, 0.03, 0.04, 0.01, 0.08, 0.04.]
IV: Defending Software Vulnerability
Buffer overflow attack
  The most common attack
  Procedure:
    Insert malicious code into the user space
    Identify a vulnerable program point
    Change control flow to the malicious code
Still possible on secure processors
  Need software support
Hardware alone cannot defend against all attacks
Need Compiler Support
How?
  Adopt existing approaches
    StackGuard, array bounds checking
  New approaches
    Hardware-supported information tracking
    Alarm when insecure input is used as a return address
    Efficient, without significant performance loss
    Possible to include OS support for better protection
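The information-tracking idea can be sketched as a stack whose slots carry a taint bit: data that originates from untrusted input is marked, and using a marked value as a return address raises an alarm. This toy model is an assumption-laden illustration, not the hardware mechanism; TaintedStack, TaintAlarm, and the sample addresses are hypothetical.

```python
class TaintAlarm(Exception):
    """Raised when untrusted data is about to be used as a return address."""


class TaintedStack:
    """Toy stack whose slots carry a taint bit alongside each value."""

    def __init__(self):
        self.slots = []

    def push(self, value: int, tainted: bool = False) -> None:
        self.slots.append((value, tainted))

    def overwrite(self, index: int, value: int, tainted: bool) -> None:
        """Model a store; the taint bit travels with the data."""
        self.slots[index] = (value, tainted)

    def return_to(self, index: int) -> int:
        """Use the slot as a return address, checking its taint bit first."""
        value, tainted = self.slots[index]
        if tainted:
            raise TaintAlarm(
                f"return address {value:#x} derived from untrusted input")
        return value


stack = TaintedStack()
stack.push(0x400123)   # return address written by the program itself
stack.push(0)          # local buffer slot
safe_target = stack.return_to(0)

# A buffer overflow copies attacker-controlled input over the return address.
stack.overwrite(0, 0xDEADBEEF, tainted=True)
try:
    stack.return_to(0)
    alarm_raised = False
except TaintAlarm:
    alarm_raised = True
```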
Future Trends
Wide adoption of secure processors
  Business transactions, mission-critical applications
  General-purpose applications
Achieving more ambitious security goals
  Protect inter-process communication
  Defend against viruses, worms, and DoS attacks
Collaborating with OS, compilers
Conclusion
Secure processors are
  A promising solution in an insecure world
  Currently an active topic in the research community and industry