Isca Presentation Final - Illinoisiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca08_1.pdfIsca...

DeLorean: Recording and Deterministically Replaying Shared

Memory Multiprocessor Execution EfficientlyPablo Montesinos, Luis Ceze* and Josep Torrellas

Department of Computer ScienceUniversity of Illinois at Urbana-Champaign

http://iacoma.cs.uiuc.edu

*Department of Computer Science and EngineeringUniversity of Washington

Pablo Montesinos DeLorean

Motivation

Debugging parallel applications on a multicore is challenging:

Bugs can be very timing dependent

Their effects might manifest after many instructions

Need effective debugging techniques for parallel applications on MPs

One such technique: HW-assisted deterministic replay

HW-Assisted Deterministic Replay

Phase I: Initial execution

HW records into a log the order of dependences between threads

This log captures the “interleaving” of threads

Phase II: Replay

Re-run the parallel application

Enforce the dependence orders stored in the log

Bacon and Goldstein. WPDD 91

Contributions

Propose DeLorean: a new scheme for hw-assisted deterministic replay of parallel programs based on “chunk-based” execution

Records initial execution at production-run speeds

Requires only small log (0.6% of current schemes)

Replays at high-speed

Has multiple modes of execution with different speed/logging requirements

delorean

Outline

Background on replay schemes

DeLorean: a Chunk-based replay scheme

Advanced DeLorean Execution

DeLorean Replay

Evaluation

Point-to-Point Schemes: FDR and RTR

Record dependences between individual instructions of different threads

Optimizations to reduce log size

Transitive reduction

Regulated transitive reduction

Use Sequential Consistency (SC)

RTR proposes an algorithm to support TSO:

Record value of loads that violate SC

P2’s LogP2’s LogP2’s Log

P1 n m

FDR (ISCA 03)

RTR (ASPLOS 06)

Vector-based Schemes: Strata

Stratum: vector of as many counters as processors:

Count # of memory operations executed

Log stores sequence of strata

It also uses SC

Log size is similar to RTR’s

Strata (ASPLOS 06)

P1 P2 P3Wa

Wc 1 1 2

Stratum

Episode-Based Schemes: ReRun8

ReRun (ISCA 08)

delorean

Outline

DeLorean: a Chunk-based replay scheme

DeLorean Replay

Evaluation

Chunk-Based Execution

Memory ops can be reordered and overlapped within and across chunks

Memory access interleaving happens at chunk boundaries

Chunk-based systems: TCC (ISCA 04), IT (ICPS 05), BulkSC (ISCA 07)

ld Xld Zld Xcommit

st Xst Y

ld Tld Zst W

re-execute

ld Xld Zld X

st Xst Y

ld Tld Zst W

Group consecutive dynamic instructions

commit

Execute chunks atomically

Execute chunks in isolation

chunk chunk

Limit possible interleavings

P0collision

A Great Substrate for MP Replay

Overlapping/reordering enables high-speed execution and replay

Performance similar to a system with Release Consistency (RC)

Limited memory interleaving minimizes logging requirements

No need to record individual dependences

Chunk-Based Replay: DeLorean

Replaying a chunk-based system means:

Generate same chunks during initial execution and replay

And commit them in the same order

DeLorean’s Log stores total order of chunk commits and their size

DeLorean’s log is very small:

Each entry is short: (ProcID, ChunkSize)

Log updated infrequently

Aggressive designs can make it even smaller

commit

st Xst Y

ld Ald Bld Ccommitst A

st Xld Zst Zst T

commit

ld Yst Bst Acommit

Basic DeLorean Operation13

Proc 0

Commit Request Commit Request

PI Log

CS LogA Size

ProcID = 0

ProcID = 1

Chunk A

Proc 1

CS LogB Size

Chunk BArbiter

Log implemented as two different structures:

Chunk Size (CS) Log

Processor Interleaving (PI) Log

delorean

Outline

DeLorean: a Chunk-Based replay scheme

DeLorean Replay

Evaluation

Reducing DeLorean’s Log

Use large chunks1

Fixed-size chunking2

Predefined interleaving3

PI LogP0

CS Log

P1CS Log

Reducing DeLorean’s Log

Use large chunks1Fewer entries in log

More collisions and cache overflows

Fixed-size chunking2No CS log

Need to handle cache overflows

Predefined interleaving3No PI log

Performance degradation

Almost no

DeLorean Execution Modes

PicoNano

Predefined interleaving3

Use large chunks1

(tiny CS log) (tiny CS log)

(no PI Log)

delorean

Outline

DeLorean: Chunk-Based replay scheme

DeLorean Replay

Evaluation

DMA Log

Network

Overall System

Proc 0

PI Log

CS Log

Chunk A Arbiter

Interrupt Log

I/O Log

Proc 1

CS Log

Chunk B

Interrupt Log

I/O Log

Interrupt, I/O and DMA logs also appear in all past replay schemes

DeLorean Replay

Virtual machine monitor loads initial system checkpoint

Interrupt Log provides initial-execution interrupts

I/O Log provides initial-execution I/O inputs

Processors execute normally (they execute in parallel):

Use CS Log to generate chunks with exceptional sizes (from cache overflows)

Arbiter uses PI Log to enforce initial-execution commit order (or predefined order in Pico)

Also in the paper

Rules for deterministic chunking

How to deal with interrupts, traps, cache overflows, I/O ops

Scalability of Pico

A detailed view of DeLorean’s architecture

Proof of why DeLorean’s replay is deterministic

Combination with other HW replay schemes

delorean

Outline

DeLorean: Chunk-Based replay scheme

DeLorean Replay

Evaluation

Evaluation Setup

Chunk-based simulator supports both execution and replay

8-processor CMP

1000, 2000, 3000 instructions per chunk

PI Log: 4-bit entries

CS Log: 32-bit entries

Applications: SPLASH, SPECjbb 2000 and SPECweb 2005

1000 2000 30001000 2000 30000

1000 2000 3000

Log Space Requirements

Nano’s log is 16% of RTR’s (2000-instruction configuration)

With one additional optimization, Nano’s log is 7.5% of RTR’s

Picos’s log is 0.6% of RTR’s (1000-instruction configuration)

PI log (Compressed)CS log (Compressed)

Average RTRcompressedlog size(estimated)

1000 2000 3000

CS log

Instructions per chunk

Average RTRcompressedlog size(estimated)

Instructions per chunk

PicoNano

(Compressed)

If We Were To Record a Day of Execution

FDR RTR Nano Pico

Execution and replay of long periods of time is feasible

10000.0

Performance

Initial Execution

Initial execution: Nano almost as fast as RC, Pico faster than SC

Replay: comparable to initial execution speed

Initial and Replay Execution

SCPicoNanoRC Pico ReplayNano Replay

R PO SR R

Conclusions

DeLorean is a new approach to HW replay of parallel applications

Advantages over traditional HW replay systems:

Executes at high speed (like most aggressive consistency models)

Replays at comparable speed

Summarizes execution into a very small log (0.6% of current schemes)

Great help for programmers that need to debug parallel codes

Future Work

Adapt DeLorean to conventional multiprocessors without chunk-based execution

Take conventional replay schemes, use very aggressive SC implementations, and compare to DeLorean

Combine the best aspects of the different recording approaches (DeLorean, RTR and Strata)

DeLorean: Recording and Deterministically Replaying Shared

Memory Multiprocessor Execution EfficientlyPablo Montesinos, Luis Ceze* and Josep Torrellas

Department of Computer ScienceUniversity of Illinois at Urbana-Champaign

http://iacoma.cs.uiuc.edu

*Department of Computer Science and EngineeringUniversity of Washington

Isca Presentation Final - Illinoisiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_isca08_1.pdfIsca...

Documents

SANTOS MONTESINOS consumo de acociles como fuente importante de macronutrientes.pdf

"Peribanez" en "El Teatro en Lope de Vega" - Aubrun, Charles; Montesinos, José

Montesinos, Jose - Ciencia y Religion en La Edad Moderna

Caso Montesinos Torres, López Meneses - 6

Josep Torrellas - iacoma.cs.uiuc.eduiacoma.cs.uiuc.edu/iacoma-papers/cv_torrellas_nov2016.pdf2014 Distinguished Paper Award, International Conference on Programming Language Design

Sede Central: C/Antonio Montesinos 14, 37003 Salamanca

Vallejo Campos, Alvaro - Platon. El Filosofo de Atenas Ed. Montesinos 1996

“Et in Utopia ego”: Sir Thomas More and “Montesinos,” a ...writerly identity, and his Romantic poetics of history. The first part explores “Montesinos” as a byword for

Tesis Javier Galdos y Eddie Montesinos

Copyright by Lupita J. Limage-Montesinos 2004 · The Dissertation Committee for Lupita J. Limage-Montesinos certifies that this is the approved version of the following dissertation:

Literary Time in the Cueva de Montesinos

Josep Torrellas - Illinoisiacoma.cs.uiuc.edu/iacoma-papers/cv_torrellas_july2015.pdf · 2015-11-26 · Fryman, Ivan Ganev, Roger A. Golliver, Rob Knauerhase, Richard Lethin, Benoit

Antonio R Montesinos - Inopias

El SERMÓN de ANTONIO MONTESINOS

Caso Montesinos Torres, Bedoya de Vivanco

Alicia Marienne Basant Montesinos · 2004-11-18 · Alicia Marienne Basant Montesinos There is no justification from the medicolegal standpoint not to follow all scientific procedures

Montesinos Orphanage, Titanyen, Haiti

Cache-OnlyMemoryArchitecture - Illinoisiacoma.cs.uiuc.edu/iacoma-papers/encyclopedia_coma.pdfCache-OnlyMemoryArchitecture JosepTorrellas UniversityofIllinoisatUrbana-Champaign Siebel

President's Daily Diary, April 1, 1967 · H. E. The Amb of Venezuela (OAS)Dr. Pedro Paris -Montesinos and ' Mrs. Paris-Montesinos H. E. The Amb of Argentina (OAS) Dr. Eduardo Alejandro

BOUNDARY SLOPES FOR MONTESINOS KNOTS - pi.math…