Execution Replay for Multiprocessor Virtual Machines


George W. Dunlap, Dominic Lucchetti, Michael A. Fetterman, Peter M. Chen

Big ideas

• Detection and replay of memory races is possible on commodity hardware

• Overhead high for some workloads

• …but surprisingly low for other workloads

Execution Replay

[Figure: a machine and its memory, with non-deterministic inputs arriving from the network, keyboard and mouse, and interrupts]

Uses of Execution Replay

• Reconstructing state
  – Fault tolerance

• Reconstructing execution
  – Debugging
  – Realistic trace generation

• Both
  – Intrusion analysis

Single-processor Replay
• Basic principles well understood
  – Log all non-deterministic inputs
  – Log the timing of asynchronous events

• Minimal overhead (Dunlap02)
  – 13% worst case
  – Log for months or years

• Available commercially
  – VMware: Record/Replay
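The single-processor recipe (log every non-deterministic input together with when it arrived, then re-inject it at the same point) can be sketched as a toy model. This is not ReVirt's implementation: the guest is a hypothetical `step` function, and a simple step counter stands in for the precise timing (such as an instruction count) that a real system would log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    icount: int   # logical time: steps executed so far (stand-in for an instruction count)
    kind: str     # hypothetical event labels, e.g. "input" or "interrupt"
    value: int    # data the outside world delivered to the guest


def step(state: int, event: Optional[LogEntry]) -> int:
    # Toy deterministic guest: the next state depends only on the current
    # state and on whatever non-deterministic event arrives at this step.
    if event is None:
        return state + 1
    return state * 31 + event.value


def record(n_steps: int, live_events: dict) -> tuple:
    """Run the guest, logging each non-deterministic event and when it arrived."""
    state, log = 0, []
    for icount in range(n_steps):
        event = None
        if icount in live_events:
            kind, value = live_events[icount]
            event = LogEntry(icount, kind, value)
            log.append(event)
        state = step(state, event)
    return state, log


def replay(n_steps: int, log: list) -> int:
    """Re-run the guest from the log alone, injecting each event at its logged time."""
    pending = {entry.icount: entry for entry in log}
    state = 0
    for icount in range(n_steps):
        state = step(state, pending.get(icount))
    return state


# Usage: events arrive at steps 3 and 10 while recording.
final_state, event_log = record(20, {3: ("input", 7), 10: ("interrupt", 2)})
assert replay(20, event_log) == final_state
```

Because the toy guest is otherwise deterministic, the inputs plus their timing are all the log needs; on a multiprocessor that stops being true.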

Replay for Multiprocessors
• Memory races in multiprocessor VMs
• The Ordering Requirement
• The CREW Protocol
  – Implementing with page protections
  – Relation to the Ordering Requirement
  – Generating constraints from CREW events
• DMA-capable devices and CREW
• Performance

The Multiprocessor Challenge

• Interleaved reads and writes
  – Fine-grained non-determinism
  – Much more difficult

• Existing solutions
  – Hardware modification
  – Software instrumentation

• SMP-ReVirt
  – Hardware MMU to detect sharing

Multiprocessor Replay

[Figure: shared memory accessed concurrently: writes n=3 and n=5, and a read of n in if (n<4)]
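The race in the figure can be made concrete by enumerating interleavings. In this sketch (hypothetical names; the starting value `n = 0` is an assumption, since the slide does not give one), P1 writes n=3, P2 writes n=5 and P3 evaluates `if (n<4)`. The branch outcome depends on the order of the accesses, which is exactly what a log of external inputs alone does not capture.

```python
from itertools import permutations

# One memory access per processor, as in the figure.
OPS = ("P1: n=3", "P2: n=5", "P3: if (n<4)")


def branch_taken(order, initial_n=0):
    """Execute one interleaving; return whether P3's branch is taken.

    initial_n is an assumption: the slide does not give n's starting value.
    """
    n = initial_n
    for op in order:
        if op == "P1: n=3":
            n = 3
        elif op == "P2: n=5":
            n = 5
        else:  # P3 reads n; writes after this point cannot affect the branch
            return n < 4
    raise ValueError("interleaving must contain P3's read")


# Map every possible interleaving of the three accesses to the branch outcome.
outcomes = {order: branch_taken(order) for order in permutations(OPS)}

for order, taken in outcomes.items():
    print(" -> ".join(order), ":", "taken" if taken else "not taken")
```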

Ordering Memory Accesses

• Preserving order will reproduce execution
  – a→b: “a happens-before b”
  – Ordering is transitive: a→b and b→c imply a→c

• Two instructions must be ordered if:
  – they both access the same memory, and
  – one of them is a write
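The ordering rule can be written as a predicate. This sketch models each access at a single address (a simplification; `Access`, `must_order` and `cross_cpu_conflicts` are illustrative names) and keeps only pairs on different processors, on the reasoning that accesses on the same processor are already ordered by program order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    cpu: int        # processor that executed the instruction
    addr: int       # memory location touched (a single address, for simplicity)
    is_write: bool


def must_order(a: Access, b: Access) -> bool:
    """Same memory, and at least one of the two accesses is a write."""
    return a.addr == b.addr and (a.is_write or b.is_write)


def cross_cpu_conflicts(trace):
    """Index pairs (i, j), i < j, whose recorded order a replay log must preserve.

    Pairs on the same processor are skipped: program order already fixes them.
    """
    return [
        (i, j)
        for i in range(len(trace))
        for j in range(i + 1, len(trace))
        if trace[i].cpu != trace[j].cpu and must_order(trace[i], trace[j])
    ]


trace = [
    Access(cpu=0, addr=100, is_write=True),
    Access(cpu=1, addr=100, is_write=False),   # read after write on another CPU
    Access(cpu=1, addr=200, is_write=False),
    Access(cpu=0, addr=200, is_write=False),   # read/read: no ordering needed
]
print(cross_cpu_conflicts(trace))   # prints [(0, 1)]
```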

Constraints: Enforcing order

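• To guarantee a→d (with a before b on one processor and c before d on another), any one of these constraints suffices:
  – a→d
  – b→d
  – a→c
  – b→c

• Suppose we also need b→c
  – b→c is necessary
  – a→d is then redundant: logging it as well would overconstrain replay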
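Whether a constraint is redundant is a reachability question over the happens-before graph. Assuming, as the slide's figure appears to show, that a precedes b on one processor and c precedes d on another, this sketch checks that each of the four candidate constraints yields a→d, and that once b→c is recorded, a→d adds nothing. The helper names are hypothetical.

```python
def hb_graph(*edges):
    """Build a happens-before graph from (earlier, later) pairs."""
    graph = {}
    for earlier, later in edges:
        graph.setdefault(earlier, set()).add(later)
    return graph


def happens_before(graph, u, v):
    """True if a chain of one or more edges leads from u to v (transitivity)."""
    stack, seen = list(graph.get(u, ())), set()
    while stack:
        node = stack.pop()
        if node == v:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def is_redundant(graph, u, v):
    """A new constraint u->v is redundant if the graph already implies it."""
    return happens_before(graph, u, v)


# Assumed layout: a then b on P1, c then d on P2 (program order).
PROGRAM_ORDER = [("a", "b"), ("c", "d")]

# Each candidate constraint is enough on its own to guarantee a->d.
for candidate in [("a", "d"), ("b", "d"), ("a", "c"), ("b", "c")]:
    assert happens_before(hb_graph(*PROGRAM_ORDER, candidate), "a", "d")

# Once the necessary b->c is logged, logging a->d too would overconstrain.
with_bc = hb_graph(*PROGRAM_ORDER, ("b", "c"))
print(is_redundant(with_bc, "a", "d"))   # prints True
```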

CREW Protocol

• Each shared object is in one of two states:
  – Concurrent-Read: all processors can read, none can write
  – Exclusive-Write: one processor (the owner) can read and write; others have no access
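The two CREW states translate directly into per-processor permissions. A minimal sketch, with one `CrewState` per shared object and `owner = None` standing for Concurrent-Read; the class and method names are hypothetical.

```python
from enum import Enum


class Perm(Enum):
    NONE = "none"
    READ_ONLY = "read-only"
    READ_WRITE = "read/write"


class CrewState:
    """CREW state of one shared object, tracked for n processors."""

    def __init__(self, n_cpus: int):
        self.n_cpus = n_cpus
        self.owner = None          # None means Concurrent-Read

    def make_concurrent_read(self):
        self.owner = None

    def make_exclusive(self, cpu: int):
        if not 0 <= cpu < self.n_cpus:
            raise ValueError(f"no such processor: {cpu}")
        self.owner = cpu

    def perm(self, cpu: int) -> Perm:
        if self.owner is None:     # Concurrent-Read: everyone reads, nobody writes
            return Perm.READ_ONLY
        # Exclusive-Write: the owner reads and writes; the others have no access
        return Perm.READ_WRITE if cpu == self.owner else Perm.NONE


state = CrewState(n_cpus=3)
state.make_exclusive(1)
print([state.perm(c).value for c in range(3)])   # prints ['none', 'read/write', 'none']
```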

CREW protocol, cont’d
• Enforced with hardware MMU
  – Read/write
  – Read-only
  – None

• Change CREW states on demand
  – Fault, fixup, re-execute

• CREW event
  – Increasing or reducing permission due to CREW state changes
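The fault, fixup, re-execute cycle and the CREW events it produces can be simulated on one page. This is a sketch under stated assumptions, not SMP-ReVirt's code: a write fault makes the faulting processor the Exclusive-Write owner, a read fault on a page owned by another processor returns the page to Concurrent-Read, and every processor whose permission changes records a CREW event.

```python
from enum import IntEnum


class Perm(IntEnum):
    # Ordered so that a larger value means more access.
    NONE = 0
    READ_ONLY = 1
    READ_WRITE = 2


class Page:
    """One page under CREW, shared by n_cpus processors."""

    def __init__(self, n_cpus: int):
        self.n_cpus = n_cpus
        self.owner = None      # None = Concurrent-Read, else the Exclusive-Write owner
        self.value = 0
        self.events = []       # CREW events as (cpu, old_perm, new_perm)

    def perm(self, cpu: int) -> Perm:
        if self.owner is None:
            return Perm.READ_ONLY
        return Perm.READ_WRITE if cpu == self.owner else Perm.NONE

    def _fixup(self, new_owner):
        """Change CREW state, recording a CREW event for every permission change."""
        before = [self.perm(c) for c in range(self.n_cpus)]
        self.owner = new_owner
        for c in range(self.n_cpus):
            after = self.perm(c)
            if after != before[c]:
                self.events.append((c, before[c], after))

    def access(self, cpu: int, write: bool = False, value: int = 0) -> int:
        needed = Perm.READ_WRITE if write else Perm.READ_ONLY
        if self.perm(cpu) < needed:
            # Fault: the MMU would trap the access here.
            # Fixup (assumed policy): write -> Exclusive-Write for this CPU,
            # read -> Concurrent-Read.
            self._fixup(cpu if write else None)
        # Re-execute: the access is now permitted.
        if write:
            self.value = value
        return self.value


page = Page(n_cpus=2)
page.access(1, write=True, value=5)   # P1 writes: P0 reduced, P1 increased
page.access(0)                        # P0 reads: page back to Concurrent-Read
for cpu, old, new in page.events:
    print(f"P{cpu}: {old.name} -> {new.name}")
```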

CREW Property

• If two instructions on different processors
  – access the same page,
  – and one of them is a write,
  then there will be a CREW event on each processor between them.

Generating Constraints
• State: Concurrent Read
  – All processors read-only

• d*: CREW fault
• New state: P2 Exclusive
• r: privilege reduction
  – Read to None

• i: privilege increase
  – Read to Read/write

• Log timing of r and i
• Constraint:
  – r → i
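The constraint step can be sketched as follows. Assumptions: each CREW event carries its processor's logical time (an instruction count here), every reduction in one state change must precede each increase on a different processor, and during replay a processor may not pass the logged point of an increase until each processor it depends on has reached its logged reduction. `CrewEvent`, `constraints_for_transition` and `may_proceed` are hypothetical names, and the instruction counts in the usage are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrewEvent:
    cpu: int       # processor whose permission changed
    icount: int    # that processor's logical time (assumed: instruction count)
    kind: str      # "reduce" or "increase"


def constraints_for_transition(events):
    """For one CREW state change, each reduction r must happen before each
    increase i on a different processor: constraint r -> i."""
    reductions = [e for e in events if e.kind == "reduce"]
    increases = [e for e in events if e.kind == "increase"]
    return [(r, i) for r in reductions for i in increases if r.cpu != i.cpu]


def may_proceed(cpu, icount, constraints, progress):
    """During replay, may `cpu` perform its logged increase at `icount`?

    progress maps each processor to the logical time it has reached so far.
    """
    return all(
        progress.get(r.cpu, 0) >= r.icount
        for r, i in constraints
        if i.cpu == cpu and i.icount == icount
    )


# The slide's case: P2 faults; another processor (P1 here) drops Read -> None (r),
# and P2 gets Read -> Read/write (i). The counts are hypothetical.
r = CrewEvent(cpu=1, icount=1200, kind="reduce")
i = CrewEvent(cpu=2, icount=880, kind="increase")
log = constraints_for_transition([r, i])
print(may_proceed(2, 880, log, progress={1: 1100}))   # prints False: P1 not yet at r
print(may_proceed(2, 880, log, progress={1: 1200}))   # prints True
```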

Direct Memory Access

• Device accesses memory directly

• Logically another processor
  – Reads and writes need to be ordered
  – IOMMU: can’t fault/fixup/re-execute

• Observation: Transaction model

• Device: non-preemptible actor
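One way to read the transaction observation is sketched below under explicit assumptions: the device is treated as an extra CREW participant that, because it cannot fault and re-execute, takes Exclusive-Write ownership of all its target pages when a transfer starts and gives them back only at completion, so a processor that touches those pages in between must wait. This models a device that writes memory; `CrewTable`, `DEVICE` and the method names are hypothetical, and SMP-ReVirt's actual mechanism is described in the paper.

```python
from dataclasses import dataclass, field

DEVICE = "dev"   # hypothetical id for the DMA-capable device as a CREW participant


@dataclass
class CrewTable:
    owner: dict = field(default_factory=dict)   # page -> exclusive owner; absent = Concurrent-Read
    log: list = field(default_factory=list)     # (time, action, actor, page) records for replay

    def begin_dma(self, pages, now):
        # The device cannot fault, fix up and re-execute mid-transfer, so it
        # takes Exclusive-Write ownership of every target page up front.
        for p in pages:
            self.log.append((now, "acquire", DEVICE, p))
            self.owner[p] = DEVICE

    def end_dma(self, pages, now):
        # Transaction complete: release the pages to Concurrent-Read.
        for p in pages:
            self.log.append((now, "release", DEVICE, p))
            self.owner.pop(p, None)

    def cpu_must_wait(self, page):
        # The device is non-preemptible: a CPU that touches one of its pages
        # waits for the transaction to finish instead of revoking access.
        return self.owner.get(page) == DEVICE


table = CrewTable()
table.begin_dma([7, 8], now=10)
print(table.cpu_must_wait(7), table.cpu_must_wait(9))   # prints True False
table.end_dma([7, 8], now=20)
print(table.cpu_must_wait(7))                           # prints False
```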

Prototype: SMP-ReVirt

• Modified Xen hypervisor

• Implements logging and the CREW protocol

• Details in paper

Evaluation questions

• What is the overhead?

• What affects performance?
  – In paper

• When might I want to use multiple processors?
  – Log with 1, 2, or N CPUs?

Evaluation Workloads

• SPLASH2 parallel application suite
  – FMM, LU, ocean, radix, water-spatial, radiosity

• Kernel-build

• Dbench

Predicting results
• Key changes in sharing attributes
  – 4096-byte sharing granularity
  – A “miss” (CREW fault) is very expensive

• SPLASH2
  – Good: high spatial locality / low false sharing
  – Bad: random access patterns / high false sharing

• The Linux kernel
  – Tuned to 16-byte cacheline
  – Involving the kernel may be expensive
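The cost of page-granularity sharing shows up in simple address arithmetic: two variables on different cache lines never conflict in hardware, yet under CREW they do if they sit on the same 4096-byte page and one processor writes while another uses the other variable. In this sketch the cache-line size is a parameter; 64 bytes is only an assumed example value, and the function names are hypothetical.

```python
PAGE_SIZE = 4096   # CREW sharing granularity (page protections)


def same_unit(addr_a: int, addr_b: int, unit: int) -> bool:
    return addr_a // unit == addr_b // unit


def false_sharing_at_page_level(addr_a: int, addr_b: int, line_size: int) -> bool:
    """Different cache lines (no hardware conflict) but the same page, so CREW
    must move the page between processors that write one and use the other."""
    return same_unit(addr_a, addr_b, PAGE_SIZE) and not same_unit(addr_a, addr_b, line_size)


LINE = 64   # assumed cache-line size, for illustration only
print(PAGE_SIZE // LINE, "cache lines share one CREW unit")   # prints 64 cache lines share one CREW unit
print(false_sharing_at_page_level(0x1000, 0x1100, LINE))      # prints True
print(false_sharing_at_page_level(0x1000, 0x2000, LINE))      # prints False
```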

Single-processor Xen guests

[Figure: bar chart for FMM, LU, ocean, radix, water-spatial, kernel-build, radiosity and dbench comparing an unmodified 1-cpu guest with a logging 1-cpu guest; surviving bar labels range from 1.00 to 1.05]

Log Growth Rate (1-cpu guests)

Workload        Log growth (GB/day)   Days to fill 300 GB
FMM             0.234                 1280
LU              0.237                 1261
Ocean           0.232                 1295
Radix           0.292                 1025
Water-spatial   0.232                 1296
Kernel-build    0.564                 531
Radiosity       0.231                 1295
Dbench          0.557                 538

2-processor Xen guests

[Figure: bar chart for FMM, LU, ocean, radix, water-spatial and kernel-build comparing an unmodified 2-cpu guest, a logging 2-cpu guest and a logging 1-cpu guest; surviving bar labels range from 1.00 to 1.83]

2-processor, cont’d

[Figure: the same comparison (unmodified 2-cpu guest, logging 2-cpu guest, logging 1-cpu guest) for radiosity and dbench; surviving bar labels 1.85 and 1.88]

Log Growth Rate (2-cpu guests)

Workload        Log growth (GB/day)   Days to fill 300 GB
FMM             34.5                  8.7
LU              3.2                   92.7
Ocean           4.3                   69.1
Radix           39.8                  7.5
Water-spatial   36.3                  8.25
Kernel-build    43.3                  6.9
Radiosity       88.4                  3.4
Dbench          77.0                  3.9
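The "Days to fill" column is capacity divided by growth rate. The sketch below recomputes it and compares the two log-growth tables, assuming (as the slide order suggests) that the first table is for 1-cpu guests and the second for 2-cpu guests. Small differences from the printed figures presumably come from rounding of the listed growth rates.

```python
CAPACITY_GB = 300

# Log growth in GB/day, copied from the two tables
# (assumed: first table = 1-cpu guests, second = 2-cpu guests).
ONE_CPU = {"FMM": 0.234, "LU": 0.237, "Ocean": 0.232, "Radix": 0.292,
           "Water-spatial": 0.232, "Kernel-build": 0.564,
           "Radiosity": 0.231, "Dbench": 0.557}
TWO_CPU = {"FMM": 34.5, "LU": 3.2, "Ocean": 4.3, "Radix": 39.8,
           "Water-spatial": 36.3, "Kernel-build": 43.3,
           "Radiosity": 88.4, "Dbench": 77.0}


def days_to_fill(rate_gb_per_day: float, capacity_gb: float = CAPACITY_GB) -> float:
    return capacity_gb / rate_gb_per_day


for name, rate in TWO_CPU.items():
    growth = rate / ONE_CPU[name]
    print(f"{name:14} {days_to_fill(rate):6.1f} days to fill   {growth:5.0f}x the 1-cpu rate")
```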

4-processor Xen guests

[Figure: bar chart for FMM, LU, ocean, radix, water-spatial and kernel-build; series: unmodified domain, 4 cpus; CREW logging, 4 cpus; CREW logging, 2 cpus*; CREW logging, 1 cpu; surviving bar labels 1.12 and 1.28]

Recap
• Memory races in multiprocessor VMs
• The Ordering Requirement
• The CREW Protocol
  – Implementing with page protections
  – Relation to the Ordering Requirement
  – Generating constraints from CREW events
• DMA-capable devices and CREW
• Performance

Big ideas

• Detection and replay of memory races is possible on commodity hardware

• Overhead high for some workloads

• …but surprisingly low for other workloads

Questions
