Upload
tamas-k-lengyel
View
2.269
Download
0
Embed Size (px)
Citation preview
Virtual machine introspection on
modern hardware
Tamas K Lengyel
@tklengyel
2/19/2015 – CrySys Lab, Budapest
Agenda
1. VMI intro
2. Intel’s split-TLB
3. Intel’s EPT and its limitations
4. Intel’s #VE / VMFUNC / EPTP-switching
5. Intel’s SMM/DMM
6. ARM
7. Conclusion
Virtual Machine Introspection (VMI)
Interpret virtual hardware
● Network, Disk, vCPU & Memory
● The semantic gap problem:
Reconstruct high-level state information from low-
level data-sources.
Virtual Machine Introspection (VMI)
Bridging the semantic gap
● The guest OS is in charge of managing the virtual
hardware
● How to get the info from it?
o Install in-guest agent to query using standard
interfaces
o If OS is compromised in-guest agent can be
disabled / tampered with
o Just as vulnerable as your AntiVirus
Virtual Machine Introspection (VMI)
Bridging the semantic gap
● Replicate guest OS functions externally
o Requires expert knowledge on OS internals and
hardware behavior
o Requires debug data to understand in-memory
structures
● Need access to VM memory, vCPU registers, etc.
o Hypervisor support required (or custom VMM)
o Need to emulate hardware v2p translation
Translation lookaside buffer (TLB) poisoning
Translation lookaside buffer (TLB)● Virtual to physical address translation is
expensive
● Hardware managed transparent cache
of the results
● Separate cache for data read/write and instruction fetch
(Harvard-type architecture)!
● The OS can flush it, but cannot query it
o Opportunity to whack it out of sync!
o Shadow Walker / FU rootkit
TLB poisoning
Original algorithm:Input: Splitting Page Address (addr)
Pagetable Entry for addr (pte)
invalidate_instr_tlb (pte); // flush TLB
pte = the_shadow_code_page (addr); // replace PTE in memory
mark_global (pte); // disable auto-flush
reload_instr_tlb (pte); // load it into TLB
pte = the_orig_code_page (addr); // put original entry back
TLB poisoning and virtualization
VMENTRY/VMEXIT automatically
flushes the entire TLB
● TLB poisoning is impossible
● Performance hit
Introduction of TLB tagging (VPID) in Intel Nehalem (2008)
● 16-bit field specified in the VMCS for each vCPU
● Performance boost!
● VM TLB entries invisible to the VMM
● The problem is not the split TLB, it’s the TLB itself
TLB poisoning with Windows / Linux
TLB poisoning uses global pages
● CR4.PGE (bit 7)
● Makes PTE’s marked as global survive context-switches
● Great performance boost for kernel pages!
Windows 7
● Regularly flushes global pages by disabled & re-enabling
CR4.PGE
Linux
● Doesn’t touch CR4.PGE after boot
The tagged TLB in Xen
The TLB tag is assigned to the vCPU from a global counter● asid->asid = data->next_asid++;
No flushes, just assign a new tag when needed!
When counter is overflown, flush everything and restart
from 1
A new TLB tag is assigned to the vCPU every time a MOV-
TO-CR3 is caught!
● The use of global pages is negated in the guest!
● The TLB needs to be primed on each context-switch!
The tagged TLB in KVM
Tag is assigned when vCPU structure is created
• Doesn’t matter if the vCPU is activated or not.
• Disable tagging and revert to old VMENTRY/VMEXIT TLB
flushing if out of tags
Priming the TLB in Linux guests on KVM is a problem!
• However, the split TLB still has issues
The sTLB!
Intel Nehalem introduced second-level (victim) cache
The problem:
● Split-TLB relies on a custom page-fault handler being
called to re-split the TLB when it’s evicted
● With sTLB, the entry is brought back..
o Both into the iTLB & the dTLB
o Split-TLB becomes unsplit!
● Split-TLB poisoning is unreliable in VMs!
Disabling the sTLB
2014: MoRE Shadow Walker - The evolution of TLB
splitting on x86
● sTLB doesn’t merge entries with conflicting PTE
permissions
● Definitely applies to EPT PTEs but hasn’t been tested
with regular PTEs
● Make shadow code-page X only before loading and the
custom #PF handler will be triggered when evicted
Reconstructing the guest OS’s view
Use standard OS structures at known locations
● KPCR, KDBG
● Linux Sysmap
Scan for objects of interest with signatures
● Memory is in fast-flux
● False positives
● Weak signatures
● Cross-view validation is costly
Interposition on x86 Intel
Need to trap VM execution for coherent view
● Avoids memory fast-flux problems
● Can avoid DKOM/DKSM
Two main options
● Tracing via EPT
o Granularity is that of a page but can trap R/W/X
● Tracing via #BP
o Only traps X on direct hit but is guest-visible
o Can be protected with EPT to hide it
Extended Page Tables (EPT)
Speed up guest virtual to machine
physical address translation.
Two sets of tables:
1st layer managed by guest OS
2nd layer managed by the VMM
Permissions can be different in the two layers!
Extended Page Tables (EPT)
Up to 512 EPTs per VM!
● Most VMMs just assign 1 EPT / VM
● Value defined in VMCS for each vCPU
● Xen 4.6 will allow different EPT / vCPU
Permission stored in bit 0-2 of EPT PTE● r : 1, /* bit 0 - Read permission */
● w : 1, /* bit 1 - Write permission */
● x : 1, /* bit 2 - Execute permission */
Lots of software programmable bits in EPT PTE available● access : 4, /* bits 61:58 - p2m_access_t */
Extended Page Tables (EPT)
Unlike normal PTEs, permission can be:
• R
• X
• R/W
• R/X
• R/W/X
• W and W/X triggers EPT misconfiguration
• Still traps to the VMM but viewer info transfered
EPT limitations
● When violation is trapped,
the information only gives
the start address and type
of the violation.
● A single R/W violation may touch up to 8-bytes!
● Not sufficient to match violation offset against known
locations
● Violations in the vicinity of a watched area need to be
treated as potential hits as well!
EPT limitations
Read/Write violation ambiguities"An EPT violation that occurs during as a result of execution of a read-
modify-write operation sets bit 1 (data write). Whether it also sets bit
0 (data read) is implementation-specific and, for a given
implementation, may differ for different kinds of read-modify-write
operations.“ - Intel SDM
It is possible to siphon data using r-m-w operations from a
page that doesn’t allow reading!
Fixed in Xen 4.5
EPT limitations
How to let the VM progress without missing a potential
event?
Emulate the instruction!
• Xen comes with a baked in x86 emulator!
• Option to “emulate with no write”
• Perform the instruction but don’t let it write to
memory
Intel #VE (Virtualization Exceptions)
EPT based tracing has a 4k granularity
● Too much overhead when we only care about certain
points
● Handle EPT violations in the guest!
#VE Interrupt Service Routine (ISR)
● Interrupt handler within the guest
● Defined in guest IDT #20
● EPT PTE bit 63 determines if violation triggers #VE or
VMEXIT
VMFUNC and EPTP switching
“This instruction allows software in VMX non-root operation to invoke
a VM function, which is processor functionality enabled and
configured by software in VMX root operation. No VM exit occurs.”
VMFUNC with EAX=0 => EPTP switching
● Remember how we can have up to 512 EPT’s per VM?
● Pass EPT IDX as parameter in ECX (0-512)
● Faulting translation of GPFN is performed via the new
table!
● Guest never really “knows” the value of EPTP
EPTP switching
How do we go “back” to the original EPTP afterwards?
Single-step and restore?
“The key observation is that at any single point in time, a
given hardware thread can be fetching an instruction or
reading data, but not both. There are ways of avoiding the
single-step too. [...] It's an optimization certainly, but it's
not required, and it's not a technique we have placed in the
public domain. You could try talking to us under NDA or
figure it out for yourself.” Ed White (Intel) xen-devel, 1/16/2015
1-setting of the “EPT-violation #VE” VM-
execution control
Optional CPU feature baked into the chip
During VMFUNC[0] saves the updated EPT index (ECX[15:0])
and gives it to the next #VE
Allows you to juggle the “views” without missing anything!
Index 0 := selective X traps; 1 := only X; 2 := only R/W
• If #VE with IDX=0, switch to IDX=1 (VMEXIT if of interest)
• If #VE with IDX=1, switch to IDX=2
• If #VE with IDX=2, switch to IDX=0
System Management Mode (SMM)
SMM intended for low-level services, such as:
● thermal (fan) control
● USB emulation
● hw errata workarounds
Can be used for:
● VMI
● Anti-VMI!
Hard to take control of it on (most) Intel devices as it is
loaded by the signed BIOS.
SMM
● Normal mode SMM is
triggered by interrupts
(SMI)
● Can be configured to
happen periodically
● Always returns to the
same execution mode
afterwards
SMM
The problem for SMM based VMI systems:
“A limitation of any SMM-based solution [...] is that a
malicious hypervisor could block SMI interrupts on every
CPU in the APIC, effectively starving the introspection tool.
For VMI, trusting the hypervisor is not a problem, but the
hardware isolation from the hypervisor is incomplete.”Jain et al. “SoK: Introspections on Trust and the Semantic Gap” 2014, IEEE S&P
Intel Dual-monitor mode SMM – the DMM
Available on all CPUs with VT-x
SMM can become an independent
hypervisor!
SMIs are still available but..
A VMCALL executed by the VMM
automatically and unconditionally
traps into the DMM
Intel DMM
The VMCALL instruction can be used to instrument the
VMM.
● Same way the VMM can use #BP to instrument a VM.
The DMM can enter any execution mode on the system!
● Full control over the execution flow
● Hidden VMs
The DMM can disable SMIs for a VM!
● Forced execution
● Nothing in the system can preempt it (SMIs, NMIs, etc.)
ARM 2-stage paging
Very similar to EPT
• Fewer software available bits available
• We need to store our custom permissions somewhere so
we know if a violation was induced by us or not
• With EPT its just stored directly in the PTE
Xen mem_access for ARM
• VMM maintains a Radix tree to store custom permissions
• During violation the tree is queried
• If violation induced by monitor, forward to monitor
ARM 2-stage paging
How do we let the VM progress without potentially missing
the next event?
No MTF-like singlestep available
• Emulation?
• On x86 Xen already comes with a built-in emulator
• Not so much on ARM..
The same trick we did in the VMFUNC!
• Only this time it traps into the VMM
ARM SMC is trappable to the VMM!
ARM has no #BP that traps to the VMM
• Secure Monitor Call (SMC) can be configured to do so!
• SMC allowed only from VMM or guest kernel
• Better than nothing
What to use the TrustZone for?
• Integrity check the VMM!
• Make all VMM pages non-writable
• VMM #PF handler executes SMC
• Verify if change is allowed