19
Runway Diego Ongaro Lead Software Engineer, Compute Infrastructure @ongardie https://runway.systems A new tool for distributed systems design

Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Runway

Diego OngaroLead Software Engineer, Compute Infrastructure

@ongardiehttps://runway.systems

A new tool for distributed systems design

Page 2: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Outline

1. Why we need new tools for distributed systems design

2. Overview and demo of Runway

3. Building a Runway model

Page 3: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Distributed Systems Are Hard

● Concurrency and message delays

● Failures, failures during failures

● Many possible interleavings of events

● Little visibility, poor debugging environments

Page 4: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Raft Background / Difficult Bug

Raft: fault-tolerant consensus algorithm

Used in many examples in this talk

Quick summary:

1. Use majority voting to elect a leader2. Leader replicates its log to followers

Difficult design bug:

Page 5: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Typical Approaches Find Design Issues Too Late

These are good techniques for implementation errors

● Localized: easy to fix

Too expensive for design errors

● May require large changes

● May cause unforeseen consequences

Let’s find the right design sooner...

Code reviews

Unit tests

System tests

Randomized tests, fuzzing, Jepsen

Benchmarks

Metrics

Dark launches

Bug reports

Page 6: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Communication:

● Build intuition quickly● Unambiguous● Reviewable: discuss major issues and

consider alternatives

Design Phase

Evaluation:

● Simplicity● Correctness● Performance● Availability

State of the art:

● Visualization (animation)● Specification● Model checking● Simulation

Commonly used today:

Tool

sG

oals

Page 7: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Visualization

Specification

Model checking

Simulation

A model is a representation of a system that captures its essential concepts and omits irrelevant details.

Design Tools Use System Models

Page 8: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

A Tour of Runway

Page 9: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Runway Overview

Integrated into one tool: write one model, get many benefits

Specify, simulate, visualize, and check system models

model(spec)

(error)interaction

graphs, data

visualization(animation)

randomized simulator

modelchecker

executionS2:recv

S3:proc

S1:send

Page 10: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Runway DemoToo many bananas, elevators, and Raft

Page 11: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Runway Integration

Independent tools: create independent models

● Write similar models for different tools● Change the design: revise them all

Runway: reuse the same model

● Lower cost, additional benefit ⇒ create models sooner● More likely to find modeling bugs

TLA+

500 LOC

JS

300 LOC

Rust

550 LOC

pseudo

150 LOC

Specification, simulation, and model checking all benefit from visualization

Page 12: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Building a Runway Model

Page 13: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Developing a Model

Idealized steps:

1. Sketch view by hand2. Define types, state variables3. Create view based on sketch4. Write invariants5. Write transition rules

visualization aids with

debugging

specification

view

Tip: set convenient starting state

Page 14: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

● Specification is code

● Define starting state, transition rules, and invariants

○ Labeled Transition System

● Rules encode behavior + failures

● Applying a rule is atomic (one at a time)

● A rule is active if applying it would change the state

● If multiple rules are active, system decides

○ Simulator: random choice

○ Model checker: walk the tree

Runway’s Specification Language

Page 15: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Example: Too Many Bananas (1)Type and variable declarations, invariant

type-safe variant: can’t access unless ReturningFromStore

Page 16: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Example: Too Many Bananas (2)Transition rule

no state changed:

inactive until readset changes

Page 17: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

It’s About Time

Developers: each server tries to approximate “the global clock”

Physicists: Ha! Blah blah blah, blah, blah! Blah blah blah blah. Blah!

Want some safety properties to hold even if clocks misbehave

Need time to describe availability and performance

Runway’s current approach: global clock, conditionally

server.timeoutAt <= clocktrue

Page 18: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

Summary

● Let’s apply tools to help us design distributed systems

● Modeling helps focus our attention on concepts, leaving out unimportant details

● Runway combines spec, model checking, simulation, and interactive visualization

● Go view the models, build your own, and help develop Runway

Page 19: Runway - Stanford EE Computer Systems Colloquiumee380.stanford.edu/Abstracts/161102-slides.pdf A new tool for distributed systems design. Outline 1. Why we need new tools for distributed

solve design problems in the design phasehttps://runway.systems