Support for Symmetric Shadow Memory in Multiprocessors
Vijay Nagarajan Rajiv Gupta
University of California, Riverside
Runtime Monitoring
• Applications of monitoring
  – Security
    • DIFT
  – Debugging
    • Memcheck, Redux, OnTrac
  – Performance
    • Speculation
• Requirements of monitoring
  – Shadow Memory (SM)
    • Meta-data associated with memory locations
  – Shadow memory instructions (SMIs)
    • Instructions for maintenance of meta-data
DIFT: Example
• Each word/register is associated with a “taint” value
  – Data from input channels are considered tainted
  – Flow of tainted data is tracked
  – Usage of tainted data in a “malicious” fashion is detected
Original Instruction    Shadow Memory Operation
Ld reg, mem             Taint-val[reg] ← Taint-val[mem]
St reg, mem             Taint-val[mem] ← Taint-val[reg]
Add reg1, reg2          Taint-val[reg1] ← Taint-val[reg1] or Taint-val[reg2]
Jmp reg1                If Taint-val[reg1], raise exception
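The propagation rules in the table above can be sketched as a small interpreter. This is only an illustration, not the authors' implementation: the `TaintTracker` class, its method names, the boolean taint values, and the addresses used are all assumptions made for the example.

```python
class TaintViolation(Exception):
    """Raised when tainted data is used as a jump target."""


class TaintTracker:
    """Toy DIFT model: one boolean taint value per register and memory word."""

    def __init__(self):
        self.reg_taint = {}   # register name -> bool
        self.mem_taint = {}   # memory address -> bool

    def taint_input(self, mem):
        # Data arriving from an input channel is considered tainted.
        self.mem_taint[mem] = True

    def load(self, reg, mem):
        # Ld reg, mem: Taint-val[reg] <- Taint-val[mem]
        self.reg_taint[reg] = self.mem_taint.get(mem, False)

    def store(self, reg, mem):
        # St reg, mem: Taint-val[mem] <- Taint-val[reg]
        self.mem_taint[mem] = self.reg_taint.get(reg, False)

    def add(self, reg1, reg2):
        # Add reg1, reg2: Taint-val[reg1] <- Taint-val[reg1] or Taint-val[reg2]
        self.reg_taint[reg1] = (self.reg_taint.get(reg1, False)
                                or self.reg_taint.get(reg2, False))

    def jmp(self, reg):
        # Jmp reg: raise an exception if the target register is tainted.
        if self.reg_taint.get(reg, False):
            raise TaintViolation(f"jump through tainted register {reg}")


tracker = TaintTracker()
tracker.taint_input(0x1000)      # hypothetical input buffer address
tracker.load("r1", 0x1000)       # r1 becomes tainted
tracker.load("r2", 0x2000)       # r2 is clean
tracker.add("r2", "r1")          # taint flows from r1 into r2
try:
    tracker.jmp("r2")
    detected = False
except TaintViolation:
    detected = True
```

Each original instruction has exactly one shadow counterpart here, which is the single-shadow-value case that DIFT represents.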
Shadow Memory Observations
• Single vs. multiple shadow values
  – DIFT associates one taint value
  – Other applications associate multiple shadow values
    • DDG computes the dynamic dependence graph on the fly
    • For each memory word, it maintains the (instruction, instance) pair that wrote to it last
• Symmetric SMIs
  – Original stores (loads) are associated with shadow stores (loads)
• Atomic SMIs
  – OMI and SMIs must be executed atomically
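The DDG bookkeeping described above, where each memory word remembers the (instruction, instance) pair that last wrote it, can be sketched as follows. The `DDGShadow` class, its field names, and the instruction labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class DDGShadow:
    """Toy DDG shadow memory: each word remembers its last writer."""

    def __init__(self):
        self.last_writer = {}                   # address -> (instruction, instance)
        self.instance_count = defaultdict(int)  # instruction -> executions so far
        self.edges = []                         # (reader, writer) dependence edges

    def _next_instance(self, instr):
        self.instance_count[instr] += 1
        return (instr, self.instance_count[instr])

    def store(self, instr, addr):
        # Shadow store: record this dynamic instance as the last writer.
        self.last_writer[addr] = self._next_instance(instr)

    def load(self, instr, addr):
        # Shadow load: add a dependence edge to the last writer, if any.
        reader = self._next_instance(instr)
        writer = self.last_writer.get(addr)
        if writer is not None:
            self.edges.append((reader, writer))
        return writer


ddg = DDGShadow()
ddg.store("S1", 0x10)            # first execution of store S1
ddg.store("S1", 0x10)            # second execution overwrites the shadow value
writer = ddg.load("L1", 0x10)    # the load depends on the second instance of S1
```

The sketch is symmetric in the sense used on the slide: an original store is paired with a shadow store, and an original load with a shadow load.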
Atomic SMIs
[Figure: Proc A executes St1, S St1, St2, S St2 while Proc B executes Ld, S Ld. Panels: inconsistent view; atomicity.]
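The inconsistent view in the figure can be reproduced by enumerating interleavings of the two processors. The sketch below is an assumption-laden toy: store i writes the value i and shadow store i writes the shadow value i, so Proc B's view is consistent only when the two values it reads match. All names are hypothetical.

```python
from itertools import combinations

STORES = {"St1": ("data", 1), "S St1": ("shadow", 1),
          "St2": ("data", 2), "S St2": ("shadow", 2)}
LOADS = {"Ld": "data", "S Ld": "shadow"}
PROC_A = ["St1", "S St1", "St2", "S St2"]
PROC_B = ["Ld", "S Ld"]


def run(schedule):
    """Execute one interleaving; return the (data, shadow) pair seen by Proc B."""
    mem = {"data": 0, "shadow": 0}
    seen = {}
    for op in schedule:
        if op in STORES:
            loc, val = STORES[op]
            mem[loc] = val
        else:
            loc = LOADS[op]
            seen[loc] = mem[loc]
    return seen["data"], seen["shadow"]


def all_interleavings():
    """Every interleaving that preserves each processor's program order."""
    n = len(PROC_A) + len(PROC_B)
    for b_slots in combinations(range(n), len(PROC_B)):
        a_ops, b_ops = iter(PROC_A), iter(PROC_B)
        yield [next(b_ops) if i in b_slots else next(a_ops) for i in range(n)]


def atomic_interleavings():
    """Interleavings in which each original/shadow pair executes as one block."""
    a_blocks = [PROC_A[0:2], PROC_A[2:4]]
    for pos in range(len(a_blocks) + 1):
        blocks = a_blocks[:pos] + [PROC_B] + a_blocks[pos:]
        yield [op for block in blocks for op in block]


inconsistent = [s for s in all_interleavings() if run(s)[0] != run(s)[1]]
atomic_views = [run(s) for s in atomic_interleavings()]
```

In this toy model, some unconstrained interleavings give Proc B a data value whose shadow value belongs to a different store, while every atomic interleaving yields a matching pair.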
Robust & Efficient SM
• Each SM access involves
  – Calculating the effective and shadow addresses
  – Accessing the shadow values
• Half-and-Half scheme
  – Reserves half of the virtual address space for shadow memory
  – Efficient SM access
  – Not robust [Nethercote and Seward VEE ’07]
• Valgrind’s software page-table-like scheme
  – Robust
  – Inefficient (Valgrind’s Memcheck causes a 22x slowdown)
• Need to be efficient and robust!
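The trade-off between the two addressing schemes can be illustrated with a rough sketch. It assumes a 32-bit address space, 4 KiB pages, and one shadow byte per original byte. The function and class names are hypothetical, and the page table is a simplified stand-in rather than Valgrind's actual data structure.

```python
PAGE_BITS = 12                      # assumed 4 KiB pages
SHADOW_BASE = 1 << 31               # assumed: upper half of a 32-bit space is shadow


def half_and_half_shadow(addr):
    """Half-and-half: the shadow address is a fixed offset from the original."""
    if addr >= SHADOW_BASE:
        # One way the scheme can fail: the program itself uses the reserved half.
        raise ValueError("address falls in the half reserved for shadow memory")
    return addr + SHADOW_BASE


class PageTableShadow:
    """Software page-table-like scheme: shadow pages are allocated on demand."""

    def __init__(self):
        self.table = {}             # page number -> bytearray of shadow bytes

    def _page(self, addr):
        page_no = addr >> PAGE_BITS
        if page_no not in self.table:
            self.table[page_no] = bytearray(1 << PAGE_BITS)
        return self.table[page_no]

    def read(self, addr):
        return self._page(addr)[addr & ((1 << PAGE_BITS) - 1)]

    def write(self, addr, value):
        self._page(addr)[addr & ((1 << PAGE_BITS) - 1)] = value


low_shadow = half_and_half_shadow(0x1000)   # a single add: cheap
table = PageTableShadow()
table.write(0x9000_0000, 1)                 # works for any address, but needs a lookup
high_value = table.read(0x9000_0000)
```

The fixed-offset version costs one addition per access but rejects part of the address space, whereas the table version accepts any address at the price of an extra lookup on every access.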
Research Question
• Can we make SMIs and OMIs atomic?
• Can we make SM accesses efficient without sacrificing robustness?
• Can we do the above with minimal HW support?
Our Approach
• Convey the atomic block to the processor
  – Simple ISA support: shadow-start, shadow-end
  – SMIs implicitly identified
• Coupled Coherence
  – Coherence of SMIs and OMIs is coupled
  – Enforces the effect of atomicity
• OS Support
  – Couple allocation of original and shadow pages
  – Efficient addressing without sacrificing robustness
ISA Support
• Shadow-start / Shadow-end instructions
  – OMIs and SMIs enclosed
  – Conveys the atomic block to the processor
  – Guides actions of the cache-coherence protocol
• Implicitly distinguishing SMIs
  – First instruction is an OMI
  – All others with the same VA are treated as SMIs
  – Multiple accesses are implicitly assumed to access different shadow values
EXAMPLE
shadow-start
ld reg1, vaddr    // original load
ld reg2, vaddr    // 1st shadow load
ld reg3, vaddr    // 2nd shadow load
shadow-end
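The implicit identification rule in the example can be sketched as a classifier over the instructions enclosed by shadow-start and shadow-end. The sketch assumes that the first access to a virtual address is the OMI and that the k-th repeat access to the same address targets the k-th shadow value. The function name and tuple layout are hypothetical.

```python
def classify_block(instructions):
    """Label accesses between shadow-start and shadow-end.

    instructions: list of (opcode, register, virtual address) tuples.
    Returns one label per instruction: "OMI" or "SMI<k>" for the k-th shadow value.
    """
    seen = {}                       # virtual address -> accesses seen so far
    labels = []
    for _opcode, _reg, vaddr in instructions:
        count = seen.get(vaddr, 0)
        labels.append("OMI" if count == 0 else f"SMI{count}")
        seen[vaddr] = count + 1
    return labels


# The three loads from the example, all using the same virtual address.
block = [("ld", "reg1", 0x4000), ("ld", "reg2", 0x4000), ("ld", "reg3", 0x4000)]
labels = classify_block(block)
```

Because the shadow accesses reuse the original virtual address, no new opcodes or operand encodings are needed for SMIs in this model.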
Coupled Coherence
• Dependence Mirroring
  – Dependences among SMIs mirror those of the OMIs
  – If OMI2 → OMI1 then SMI2 → SMI1
  – Coupled coherence enforces this
[Figure: Proc A executes St1, S St1, St2, S St2; Proc B executes Ld, S Ld.]
Coupled Coherence
• Coupled coherence involves
  – No explicit shadow coherence messages
    • SMIs do not trigger coherence messages
    • Shadow stores do not trigger invalidates
    • Shadow loads do not cause misses
  – Co-transfer
    • Data replies of original blocks are piggybacked with shadow blocks
  – Co-existence
    • Original blocks and shadow blocks co-exist in the cache
    • Brought in together
    • Replaced together
Dependence Mirroring: RAW
[Figure: block B and shadow block B’ are initially Shared in both Proc A and Proc B. Proc A executes St, S St; Proc B executes Ld, S Ld. After Proc A’s St, B and B’ are Exclusive in Proc A and Invalid in Proc B.]
• Proc A sends invalidate for B and B’
• Proc B sends read miss for B and B’
• Proc A waits until its ready bit is set (0 until shadow-end, then 1)
• Proc A sends blocks B and B’
Dependence Mirroring: WAR
[Figure: Proc A executes St1, S St1, St2, S St2; Proc B executes Ld, S Ld.]
• Proc A sends invalidates
• Proc B sends read miss for B and B’
• Proc A sends blocks B and B’
Coupled Coherence
• On a cache miss
  – Original Ld / St
    • Place read miss for original and shadow block(s)
    • Write back dirty blocks
  – Shadow Ld / St
    • No coherence events
• Shadow-start
  – Set ready bit to 0
• Shadow-end
  – Set ready bit to 1
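The protocol actions above can be modelled with a toy two-cache bus. It is a heavily simplified sketch, not the simulated protocol: the `Cache` and `Bus` classes are hypothetical, write-backs are omitted, memory is assumed to start zeroed, and "the owner waits until its ready bit is set" is modelled as the requester getting `None` and retrying.

```python
class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}         # block id -> {"state", "data", "shadow"}
        self.ready = 1          # ready bit: 0 inside shadow-start/shadow-end


class Bus:
    """Toy bus protocol in which shadow blocks travel with original blocks."""

    def __init__(self, caches):
        self.caches = caches
        self.messages = []      # log of coherence messages

    def _others(self, cache):
        return [c for c in self.caches if c is not cache]

    def shadow_start(self, cache):
        cache.ready = 0

    def shadow_end(self, cache):
        cache.ready = 1

    def store(self, cache, block, data):
        # Original store: one invalidate covers both B and B'.
        self.messages.append((cache.name, "invalidate", block))
        for other in self._others(cache):
            other.lines.pop(block, None)
        line = cache.lines.setdefault(block, {"data": None, "shadow": None})
        line["state"], line["data"] = "Exc", data

    def shadow_store(self, cache, block, shadow):
        # Shadow store: no coherence message; B' is already co-resident with B.
        cache.lines[block]["shadow"] = shadow

    def load(self, cache, block):
        # Original load: on a miss, one read miss fetches B and B' together.
        if block not in cache.lines:
            self.messages.append((cache.name, "read-miss", block))
            owner = next((c for c in self._others(cache) if block in c.lines), None)
            if owner is None:
                data, shadow = 0, 0             # assumed: memory starts zeroed
            elif owner.ready == 0:
                return None                     # owner is mid-block; retry later
            else:
                owner.lines[block]["state"] = "Shared"
                data, shadow = owner.lines[block]["data"], owner.lines[block]["shadow"]
            cache.lines[block] = {"state": "Shared", "data": data, "shadow": shadow}
        return cache.lines[block]["data"]

    def shadow_load(self, cache, block):
        # Shadow load: never misses, because B' arrived with B.
        return cache.lines[block]["shadow"]


a, b = Cache("A"), Cache("B")
bus = Bus([a, b])
bus.shadow_start(a)
bus.store(a, "B", 42)
early = bus.load(b, "B")                # deferred: A has not reached shadow-end
bus.shadow_store(a, "B", "tainted")
bus.shadow_end(a)
bus.shadow_start(b)
value = bus.load(b, "B")
shadow = bus.shadow_load(b, "B")
bus.shadow_end(b)
```

In this run the read-after-write dependence on B is mirrored on B’: Proc B receives the shadow value written by Proc A's shadow store, and every logged message names the original block only.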
Symmetric/General SM
• Symmetric SM
  – Original loads (stores) are accompanied by shadow loads (stores)
• General SM
  – An original load can be accompanied by both shadow loads and stores
    • E.g., Eraser: online race detection
  – Need to enforce shadow coherence for RAR
    • Typically no coherence events for RAR
    • Future work
Addressing Support
• Shadow pages are allocated adjacent to original pages
  – Virtual memory space unaffected
  – Retains robustness
  – OS treats them as a single “superpage”
    • Swapped in and swapped out together
• Address Translation
  – During address translation, add an offset to access the shadow page
  – Provides efficiency
  – No separate TLB for shadow pages
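[Figure: address translation. An OMI and an SMI each present a virtual page and offset to the TLB, which yields the physical page; memory holds the original page followed by shadow page 1 and shadow page 2. Labels: shadow value count.]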
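The offset-based translation can be sketched as follows. The sketch assumes 4 KiB pages and that the shadow pages sit directly after the original page in physical memory. The `CoupledTLB` class and the `shadow_index` parameter are hypothetical names for illustration.

```python
PAGE_SIZE = 4096                    # assumed page size


class CoupledTLB:
    """Toy TLB: one entry covers an original page plus its adjacent shadow pages."""

    def __init__(self, shadow_pages_per_page):
        self.shadow_pages = shadow_pages_per_page
        self.entries = {}           # virtual page -> physical base of original page
        self.next_free = 0

    def map_page(self, vpage):
        # Coupled allocation: reserve the original page and its shadow pages together.
        self.entries[vpage] = self.next_free
        self.next_free += (1 + self.shadow_pages) * PAGE_SIZE

    def translate(self, vaddr, shadow_index=0):
        # shadow_index 0 is the OMI; k >= 1 selects the k-th shadow page.
        if not 0 <= shadow_index <= self.shadow_pages:
            raise ValueError("no such shadow page")
        base = self.entries[vaddr // PAGE_SIZE]
        return base + shadow_index * PAGE_SIZE + vaddr % PAGE_SIZE


tlb = CoupledTLB(shadow_pages_per_page=2)
tlb.map_page(5)
vaddr = 5 * PAGE_SIZE + 0x10
original = tlb.translate(vaddr)         # same entry serves the OMI...
shadow1 = tlb.translate(vaddr, 1)       # ...and each SMI, via an added offset
shadow2 = tlb.translate(vaddr, 2)
```

A single TLB entry serves the original access and all of its shadow accesses, which is why no separate TLB for shadow pages is needed in this model.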
Experiments
• Implementation in the SESC simulator
  – Cycle-accurate, targets the MIPS architecture
    • Shadow-start, Shadow-end instructions
  – Models cache coherence protocol
    • Coupled coherence implementation
    • Bus-based protocol
  – Models basic OS services
    • Coupled page allocation
• Monitoring Applications
  – DIFT: detection of security attacks
  – DDG: computes the dynamic dependence graph online
• Benchmarks
  – SPLASH-2
Efficiency of SM
• Three versions:
  – SM
    • Our SM implementation
    • ISA support
    • OS support for address translation
    • Coupled coherence protocol for atomicity
  – VAL:serial
    • Valgrind’s SM support
    • Address translation: involves software page table accesses
    • Atomicity: enforced by thread serialization
  – VAL:lb
    • Valgrind’s SM support with no atomicity guarantees
    • Means of comparison for our address translation support
Efficiency of SM: DIFT
[Figure: normalized execution overhead (axis 0 to 60) for barnes, fmm, ocean, radiosity, raytrace, water-nsq, water-sp, and the average. Series: VAL:serial, VAL:lb, SM.]
• VAL:serial causes 41x overhead on average
  – Effect of serialization
• SM causes only 7x overhead
  – Efficient address translation + coupled coherence
• Even without serialization, VAL:lb causes 12x overhead
  – With coupled coherence this reduces to 7x
Efficiency of SM: DDG
[Figure: normalized execution overhead (axis 0 to 120) for barnes, fmm, ocean, radiosity, raytrace, water-nsq, water-sp, and the average. Series: VAL:serial, VAL:lb, SM.]
• VAL:serial causes 78x overhead on average
  – Effect of serialization
• SM causes only 23x overhead
  – Efficient address translation + coupled coherence
• Even without serialization, VAL:lb causes 27x overhead
  – With coupled coherence this reduces to 23x
  – Effect not as pronounced as in DIFT
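The average overheads quoted for DIFT and DDG can be compared directly by taking ratios against SM. The snippet below only restates the reported averages; the dictionary layout and the `relative_gain` helper are illustrative.

```python
# Average normalized execution overheads as reported on the slides.
overheads = {
    "DIFT": {"VAL:serial": 41, "VAL:lb": 12, "SM": 7},
    "DDG":  {"VAL:serial": 78, "VAL:lb": 27, "SM": 23},
}


def relative_gain(app, baseline):
    """Ratio of a baseline's overhead to SM's overhead for one application."""
    return overheads[app][baseline] / overheads[app]["SM"]


for app in overheads:
    for baseline in ("VAL:serial", "VAL:lb"):
        print(f"{app}: {baseline} overhead is {relative_gain(app, baseline):.2f}x that of SM")
```

For DIFT the ratios are roughly 5.9 (VAL:serial) and 1.7 (VAL:lb); for DDG they are roughly 3.4 and 1.2, which matches the remark that the effect is less pronounced for DDG.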
Effect of Coupled Coherence
[Figure: percentage overhead (axis 0 to 1.2) for barnes, fmm, ocean, radiosity, raytrace, water-nsq, water-sp, and the average. Series: DIFT, DDG.]
• Performance overhead < 0.6% for DIFT and DDG
  – Total amount of traffic is about the same
  – Coupled coherence sees more bursts in traffic
Related Work
• Enforcing Atomicity
  – Valgrind [Nethercote et al. PLDI ’07] through thread serialization
    • Not efficient
  – TM [Chung et al. HPCA ’08] can be used
    • Requires additional HW changes
    • Support for rollback and re-execution
• Address Translation
  – Valgrind [Nethercote VEE ’07] software page table structure
    • Proposed application-specific optimizations
    • Still inefficient
  – Half-and-Half scheme [Qin et al. MICRO ’07]
    • Divides the virtual address space
    • Not robust
Conclusion
• SM is used extensively for monitoring
  – Performance
  – Security
  – Debugging
• Support for improving SM performance
  – ISA support
  – Coupled coherence for atomicity
  – Coupled allocation for efficient addressing
  – Significant performance advantage
• Future Work
  – Extend the system beyond symmetric SMIs
  – Look at other techniques for providing atomicity without changes to the coherence protocol
Questions?