Selling an executable data flow graph based IR John Yates

MathWorks Interview Lecture


Page 1: MathWorks Interview Lecture

Selling an executable data flow graph based IR

John Yates

Page 2: MathWorks Interview Lecture

Order of presentation

• Who am I and why am I here?
• 2010: Netezza needs a new architecture
• A family of statically typed acyclic DFG IRs
• (Time permitting: Some engineering details)
• Q&A

Page 3: MathWorks Interview Lecture

“Who am I and why am I here?”

(with apologies to Adm. Stockdale)

Page 4: MathWorks Interview Lecture

1970: Maybe I’ll be a programmer

• NYC hippie, ponytail, curled handlebar mustache
• Liberal arts high school, lousy student
• Wanted to build things, real things
• Computers seemed interesting and intuitive
• Luckily in 1970 programmers were scarce

Page 5: MathWorks Interview Lecture

40 years…
– 1970: learning the craft, various jobs (all in assembler)
– 1978: Digital Equipment Corp
  • Pascal frontend, dynamic programming code selector
– 1983: Apollo Computer
  • Designed RISC ISP w/ explicit parallel dispatch (pre-VLIW)
  • Lead architect for RISC backend optimizer; built team
  • 1st commercial: SSA IR, SW pipeliner, lattice const prop
– 1992: Binary translation: DEC (sw), Chromatic (hw-support)
  • More SSA IR, lowering; built teams; lots of patents (many hw)
– 1999: Everfile - NFS-like Win32 internet file system
– 2002: Netezza, badge #26
  • Storage: compression, indices, access methods, txns, CBTs

20+ years

Page 6: MathWorks Interview Lecture

2010: Netezza needs a new architecture

Page 7: MathWorks Interview Lecture

Data parallel analytics engine

• Data partitioned across a cluster of nodes
  – Multiple “slices” per node to exploit multi-core

• Execution model:
  – Leader accepts query, produces an execution plan
  – Leader broadcasts plan’s parallel components
  – Cluster performs data parallel work
  – Leader performs work requiring a single locus

• Competition: Teradata, Greenplum, DB2, …

Page 8: MathWorks Interview Lecture

Netezza’s architecture

[Diagram: pipeline of PG Plan, Split 1, Split 2, Gen FPGA, Gen C++, Compile, Load DLL, Broadcast, Load FPGA, and Execute stages, spanning the leader and N workers]

Page 9: MathWorks Interview Lecture

Netezza’s problems

[Diagram: the same pipeline as the previous slide, annotated with two problem callouts: “Latency” and “Hardware development time scales”]

Very simplistic code generator:
- Lowering across an enormous semantic gulf
- No intermediate representation
- Very complex, very fragile
- Difficult to implement much more than general-case code patterns

Page 10: MathWorks Interview Lecture

Garth’s incomplete Marlin vision

• What is the real input to the interpreter?
• How do we get from query plan to that form?

[Diagram: PG Plan → Split → Broadcast → Interpret (faster?) on the leader and on N workers; the step from plan to interpreter input is labeled “Unspecified miracle”, with an open question “Multi-core?”]

Page 11: MathWorks Interview Lecture

A family of statically typed acyclic data flow graph IRs

Page 12: MathWorks Interview Lecture

Working backwards

• Graph
• Dataflow
• Acyclic
• Statically typed
• A family of … IRs

Page 13: MathWorks Interview Lecture

Graph

• Operators
  – Label names a function
  – Edge connections in and out

• Edges
  – Directed (“dataflow”)

Page 14: MathWorks Interview Lecture

Dataflow

• Dataflow machines
  – Apply history, wisdom, insights to the interpreter

• Value semantics
  – All edges carry data
  – No other kinds of edges (i.e. no anti-dependence)
  – No updatable shared state (i.e. no store)

• Expose all opportunities for concurrency

Page 15: MathWorks Interview Lecture

Acyclic

• No backedges ≡ no cycles
• Can exploit topological ordering
  – Fact propagation: rDFS (forward) or DFS (reverse)
  – No iteration, guaranteed termination
  – Linear algorithms, O(graph)
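The following is a minimal sketch of why acyclicity makes fact propagation linear. It assumes a hypothetical adjacency-list DAG; `Dag`, `topoOrder`, and `depths` are illustrative names, not XG's API. It also swaps in Kahn's algorithm for the DFS-based ordering named on the slide. Both produce an order in which every producer precedes its consumers, so a single forward pass computes each fact without iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <queue>
#include <vector>

// Hypothetical adjacency-list DAG: succ[v] lists the operators fed by v.
using Dag = std::vector<std::vector<std::size_t>>;

// Kahn's algorithm, O(V + E). Returns std::nullopt if the graph has a cycle.
std::optional<std::vector<std::size_t>> topoOrder(const Dag& succ) {
    std::vector<std::size_t> indeg(succ.size(), 0);
    for (const auto& outs : succ)
        for (std::size_t w : outs) ++indeg[w];
    std::queue<std::size_t> ready;
    for (std::size_t v = 0; v < succ.size(); ++v)
        if (indeg[v] == 0) ready.push(v);
    std::vector<std::size_t> order;
    order.reserve(succ.size());
    while (!ready.empty()) {
        std::size_t v = ready.front();
        ready.pop();
        order.push_back(v);
        for (std::size_t w : succ[v])
            if (--indeg[w] == 0) ready.push(w);
    }
    if (order.size() != succ.size()) return std::nullopt;  // leftovers lie on a cycle
    return order;
}

// One forward pass: depth[v] = longest path from any source to v.
// Every predecessor of v is visited before v, so no fixpoint iteration is needed.
std::vector<std::size_t> depths(const Dag& succ, const std::vector<std::size_t>& order) {
    std::vector<std::size_t> depth(succ.size(), 0);
    for (std::size_t v : order)
        for (std::size_t w : succ[v])
            if (depth[w] < depth[v] + 1) depth[w] = depth[v] + 1;
    return depth;
}
```

A reverse (backward) analysis would walk the same order from the end.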

Page 16: MathWorks Interview Lecture

Statically typed

• Edges initially have unknown type
• A well-formed graph can be statically typed
  – Linear pass over topologically ordered Operators
  – Assign edge types per Operator descriptors
  – Inconsistencies can be diagnosed and reported
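Here is a sketch of the single typing pass. It assumes the operators are already in topological order and that each one points at a table-driven descriptor listing fixed input and output types. `Type`, `Descriptor`, `Op`, and `typeCheck` are hypothetical names. Real descriptors presumably also handle polymorphic or derived result types, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Type { Unknown, Int64, Double, Bool };

// Hypothetical table-driven descriptor: required input types, produced output types.
struct Descriptor {
    std::string name;
    std::vector<Type> in;
    std::vector<Type> out;
};

struct Op {
    const Descriptor* desc;
    std::vector<std::size_t> inEdges;   // indices into edgeType
    std::vector<std::size_t> outEdges;
};

// Linear pass over topologically ordered ops: each input edge was typed by its
// producer before the consumer is visited, so one sweep types the whole graph.
std::vector<std::string> typeCheck(const std::vector<Op>& ops, std::vector<Type>& edgeType) {
    std::vector<std::string> errors;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        const Op& op = ops[i];
        assert(op.inEdges.size() == op.desc->in.size());
        assert(op.outEdges.size() == op.desc->out.size());
        for (std::size_t k = 0; k < op.inEdges.size(); ++k)
            if (edgeType[op.inEdges[k]] != op.desc->in[k])
                errors.push_back(op.desc->name + " (op " + std::to_string(i) +
                                 "): input " + std::to_string(k) + " has the wrong type");
        for (std::size_t k = 0; k < op.outEdges.size(); ++k)
            edgeType[op.outEdges[k]] = op.desc->out[k];  // assign per descriptor
    }
    return errors;
}
```

Errors are collected rather than thrown, so a single pass can report every inconsistency.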

Page 17: MathWorks Interview Lecture

A family of … IRs

• Well-nested subsets of edge type vocabularies
• Constraining edge types constrains operators

[Diagram: PG Plan → Split → three “Lower and Opt” stages driven by tree patterns, graph 1 patterns, and graph 2 patterns (all written in a common pattern notation) → Broadcast → Interpret on the leader and on N workers, with “Topo, expand, insert CLONEs” steps on the leader and worker sides]

IR levels (diagram legend):
• High level tree - tuples
• High level graph - tuples
• Mid level graph - nullable values
• Low level graph - values
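One way to read “well-nested subsets” is that each lower IR level admits a subset of the edge-type vocabulary of the level above. The sketch below assumes exactly that, and every name in it is hypothetical. An operator is legal at a level only if all of its edges are. Narrowing the vocabulary therefore automatically narrows the set of legal operators.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical edge-type kinds, one bit each.
enum EdgeKind : std::uint32_t {
    kTuple         = 1u << 0,
    kNullableValue = 1u << 1,
    kValue         = 1u << 2,
};

enum class Level { HighGraph, MidGraph, LowGraph };

// Assumed nesting: each lower level's vocabulary is a subset of the level above.
constexpr std::uint32_t vocabulary(Level l) {
    switch (l) {
        case Level::HighGraph: return kTuple | kNullableValue | kValue;
        case Level::MidGraph:  return kNullableValue | kValue;
        case Level::LowGraph:  return kValue;
    }
    return 0;
}

// Compile-time check that the vocabularies really nest.
static_assert((vocabulary(Level::LowGraph) & ~vocabulary(Level::MidGraph)) == 0, "Low nests in Mid");
static_assert((vocabulary(Level::MidGraph) & ~vocabulary(Level::HighGraph)) == 0, "Mid nests in High");

// An operator is legal at a level only if every edge it touches is in that vocabulary.
bool legalAt(Level l, const std::vector<EdgeKind>& edges) {
    for (EdgeKind k : edges)
        if ((vocabulary(l) & k) == 0) return false;
    return true;
}
```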

Page 18: MathWorks Interview Lecture

Nothing convinces like working code

• First delivery
  – Table-driven operator semantics
  – Utilities: build, edit & expand
  – Topological sort
  – Type check & report errors

[Diagram: a graph assembly program feeds a graph assembler; its output passes through Split, Broadcast, and “Topo, expand, insert CLONEs” to Interpret on the leader and on N workers]

Page 19: MathWorks Interview Lecture

Sold!

• Working code made my successive-lowerings idea credible

• Overall Marlin added ~10 engineers; I got 3
• My team got its first end-to-end test case working

[Diagram: PG Plan → Split → “Lower and Opt” stages (tree patterns, graph 1 patterns, graph 2 patterns) → Broadcast → “Topo, expand, insert CLONEs” → Interpret on the leader and on N workers]

Page 20: MathWorks Interview Lecture

IBM killed the Marlin program…

• Marlin was a cleanup project promising…
  – Performance and shorter development cycles
  – But no new features or functionality

• It is always hard to fund significant cleanup
  – Especially if not legitimately tied to a coveted feature

• Harder if your company is under duress
• Harder still if DB2 is gunning for your headcount

Page 21: MathWorks Interview Lecture

Questions?

Page 22: MathWorks Interview Lecture

Some engineering details

Page 23: MathWorks Interview Lecture

Why clone?

• After expansion all edges are point-to-point
  – No output is multiply-consumed

• Chunk handoff along an edge becomes trivial
  – Think C++11’s new move semantics

• So only clones implement reference counting
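Here is a sketch of the ownership split described above. It assumes a hypothetical `Chunk` type. Along a point-to-point edge, a `std::unique_ptr` is handed off by move, so no count is touched. Only the clone converts ownership into a reference-counted, read-only `std::shared_ptr` shared by its consumers. `OwnedChunk` and `cloneChunk` are illustrative names, not the engine's API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical unit of data flowing along an edge.
struct Chunk {
    std::vector<int> rows;
};

// A point-to-point edge owns at most one chunk; handing it off is a move, no refcount.
using OwnedChunk = std::unique_ptr<Chunk>;

// A clone is the only place sharing happens: it turns the single owned chunk into
// a reference-counted, read-only chunk held by each of its fanOut consumers.
std::vector<std::shared_ptr<const Chunk>> cloneChunk(OwnedChunk in, std::size_t fanOut) {
    std::shared_ptr<const Chunk> shared(std::move(in));
    return std::vector<std::shared_ptr<const Chunk>>(fanOut, shared);
}
```

The chunk is freed when the last consumer drops its `shared_ptr`. Every other edge pays nothing for sharing.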

Page 24: MathWorks Interview Lecture

Broadcast

• Serialize / deserialize
• On the network, size matters
• Graph object
  – Small number of scalar members
  – Handful of C++ vectors (some ephemeral)
  – Position independent (no pointers in vectors)
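This sketch shows why pointer-free vectors make broadcast cheap. Each vector is written as a 32-bit count followed by its raw bytes and read back with a single `memcpy`, with no pointer fix-ups. `MiniGraph`, `Bytes`, `putVec`, and `getVec` are hypothetical stand-ins, not the deck's `BinStream` operators. The sketch assumes the leader and workers share endianness and struct layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for two graph vectors: indices only, no pointers.
struct EdgeIn  { std::uint32_t dstOp; std::uint32_t src; };
struct EdgeOut { std::uint32_t srcOp; std::uint32_t dst; };

struct MiniGraph {
    std::uint32_t flags = 0;   // a scalar member
    std::vector<EdgeIn> vecIn;
    std::vector<EdgeOut> vecOut;
};

using Bytes = std::vector<unsigned char>;

void putRaw(Bytes& out, const void* p, std::size_t n) {
    const unsigned char* b = static_cast<const unsigned char*>(p);
    out.insert(out.end(), b, b + n);
}

template <class T>
void putVec(Bytes& out, const std::vector<T>& v) {
    static_assert(std::is_trivially_copyable<T>::value, "elements must be raw-copyable");
    std::uint32_t n = static_cast<std::uint32_t>(v.size());
    putRaw(out, &n, sizeof n);
    putRaw(out, v.data(), n * sizeof(T));   // position independent: bytes are valid anywhere
}

template <class T>
std::size_t getVec(const Bytes& in, std::size_t pos, std::vector<T>& v) {
    std::uint32_t n = 0;
    std::memcpy(&n, in.data() + pos, sizeof n);
    pos += sizeof n;
    v.resize(n);
    if (n != 0) std::memcpy(v.data(), in.data() + pos, n * sizeof(T));
    return pos + n * sizeof(T);
}

Bytes serialize(const MiniGraph& g) {
    Bytes out;
    putRaw(out, &g.flags, sizeof g.flags);
    putVec(out, g.vecIn);
    putVec(out, g.vecOut);
    return out;
}

MiniGraph deserialize(const Bytes& in) {
    MiniGraph g;
    std::memcpy(&g.flags, in.data(), sizeof g.flags);
    std::size_t pos = sizeof g.flags;
    pos = getVec(in, pos, g.vecIn);
    pos = getVec(in, pos, g.vecOut);
    assert(pos == in.size());   // consumed exactly what was written
    return g;
}
```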

Page 25: MathWorks Interview Lecture

No pointers

• Pointers index the linear address space
  – Implicit context (there is only one address space)

• Unsigned as vector index
  – User must provide explicit context (vector base)
  – 32-bit indices are ½ the size of 64-bit pointers
  – Position independence simplifies serialization
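The index-instead-of-pointer idea can be sketched as below, assuming a hypothetical strongly typed 32-bit `Index` wrapper. A distinct tag per vector stops an operator index from subscripting an edge vector by mistake. The caller supplies the context, the vector, explicitly, so the same index stays valid after the vector is copied or shipped elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical strongly typed 32-bit index; the Tag keeps index kinds apart.
template <class Tag>
struct Index {
    std::uint32_t value;
};

struct OperatorTag {};
using OperatorIndex = Index<OperatorTag>;

struct Operator { std::uint32_t opcode; };

// Half the size of a pointer on a typical 64-bit target.
static_assert(sizeof(OperatorIndex) == 4, "index is 32 bits");

// Explicit context: the vector plays the role of the base address.
inline Operator& at(std::vector<Operator>& vecOp, OperatorIndex x) {
    assert(x.value < vecOp.size());
    return vecOp[x.value];
}
```

A raw pointer into one vector would dangle or point at the wrong copy. The index resolves correctly against any copy of the vector.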

Page 26: MathWorks Interview Lecture

The graph object

• Exposed read-only data
  – Vector of Operator objects
  – Vector of EdgeIn objects
  – Vector of EdgeOut objects
  – Literal table and pool

• Private data (may be missing or elided)
  – Vector of EdgeIn next links
  – Vector of Operator BreadCrumbs

Page 27: MathWorks Interview Lecture

Discardable elements

• vecBc: BreadCrumbs vector
• vecNxt: EdgeIn sibling links
• LiteralPool hash table array

Page 28: MathWorks Interview Lecture

Graph vector details

Vector      Index type       Element type    Element size
g.vecOp     OperatorIndex    Operator        16 bytes
g.vecOut    EdgeOutIndex     EdgeOut         8 bytes
g.vecIn     EdgeInIndex      EdgeIn          8 bytes
g.lit       LiteralKey       Literal         multiple of 8 bytes
g.vecNxt    EdgeInIndex      EdgeInIndex     4 bytes
g.vecBc     OperatorIndex    BreadCrumb      4 bytes
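The element sizes in the table can be pinned down at compile time. The sketch below assumes 32-bit index types. The table gives only sizes, so the two Operator payload fields beyond `baseIn_` and `baseOut_` are invented placeholders. `graphBytes` is a hypothetical helper that totals the fixed-size vectors (literals excluded) with or without the discardable `vecNxt` and `vecBc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using OperatorIndex = std::uint32_t;
using EdgeInIndex   = std::uint32_t;
using EdgeOutIndex  = std::uint32_t;
using BreadCrumb    = std::uint32_t;

struct Operator {
    EdgeInIndex   baseIn_;
    EdgeOutIndex  baseOut_;
    std::uint32_t opcode_;     // hypothetical payload
    std::uint32_t typeInfo_;   // hypothetical payload
};
struct EdgeIn  { OperatorIndex dstOp_; EdgeOutIndex src_; };
struct EdgeOut { OperatorIndex srcOp_; EdgeInIndex dst_; };

static_assert(sizeof(Operator) == 16, "Operator: 16 bytes");
static_assert(sizeof(EdgeIn) == 8 && sizeof(EdgeOut) == 8, "edges: 8 bytes");
static_assert(sizeof(EdgeInIndex) == 4 && sizeof(BreadCrumb) == 4, "links, crumbs: 4 bytes");

// Bytes for vecOp, vecOut, vecIn, plus the discardable vecNxt and vecBc if kept.
constexpr std::size_t graphBytes(std::size_t nOp, std::size_t nOut, std::size_t nIn,
                                 bool keepDiscardable) {
    return nOp * sizeof(Operator) + nOut * sizeof(EdgeOut) + nIn * sizeof(EdgeIn) +
           (keepDiscardable ? nIn * sizeof(EdgeInIndex) + nOp * sizeof(BreadCrumb) : 0);
}
```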

Page 29: MathWorks Interview Lecture

Connectivity: Operator objects

• Operator private members
  – Operator’s edges are sub-vectors of g.vecIn, g.vecOut
  – Start of EdgeIn objects: EdgeInIndex baseIn_;
  – Start of EdgeOut objects: EdgeOutIndex baseOut_;

• Number of connections
  – Inputs: vecOp[x+1].baseIn_ - vecOp[x].baseIn_
  – Outputs: vecOp[x+1].baseOut_ - vecOp[x].baseOut_
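The arity formulas only need two offsets per operator. The sketch below shows how vecOp could be laid out from per-operator arities, with each operator's edges stored contiguously. The slide does not say how the `x+1` formula works for the last operator. As an assumption, the sketch appends a sentinel entry. `layout`, `numInputs`, and `numOutputs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal Operator: only the two base offsets from the slide.
struct Operator {
    std::uint32_t baseIn_;    // first EdgeIn of this operator in g.vecIn
    std::uint32_t baseOut_;   // first EdgeOut of this operator in g.vecOut
};

// Prefix sums of the arities; the trailing sentinel (an assumption) makes
// vecOp[x+1] valid for the last real operator.
std::vector<Operator> layout(const std::vector<std::uint32_t>& nIn,
                             const std::vector<std::uint32_t>& nOut) {
    assert(nIn.size() == nOut.size());
    std::vector<Operator> vecOp;
    vecOp.reserve(nIn.size() + 1);
    std::uint32_t in = 0, out = 0;
    for (std::size_t x = 0; x < nIn.size(); ++x) {
        vecOp.push_back({in, out});
        in += nIn[x];
        out += nOut[x];
    }
    vecOp.push_back({in, out});   // sentinel
    return vecOp;
}

inline std::uint32_t numInputs(const std::vector<Operator>& vecOp, std::size_t x) {
    return vecOp[x + 1].baseIn_ - vecOp[x].baseIn_;
}

inline std::uint32_t numOutputs(const std::vector<Operator>& vecOp, std::size_t x) {
    return vecOp[x + 1].baseOut_ - vecOp[x].baseOut_;
}
```

Storing offsets rather than counts keeps Operator small and lets an operator's edges be addressed as one contiguous slice.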

Page 30: MathWorks Interview Lecture

Connectivity: EdgeIn objects

• EdgeIn private members
  – Sink Operator: OperatorIndex dstOp_;
  – Source EdgeOut: EdgeOutIndex src_;

• EdgeIn connection position
  – Use pointer arithmetic: this - (vecIn + vecOp[dstOp_].baseIn_);

Page 31: MathWorks Interview Lecture

Connectivity: EdgeOut objects

• EdgeOut private members
  – Source Operator: OperatorIndex srcOp_;
  – Sink EdgeIn: EdgeInXIndex dst_;

• EdgeOut connection position
  – Use pointer arithmetic: this - (vecOut + vecOp[srcOp_].baseOut_);
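This sketch covers the connection-position trick for both edge kinds. An edge's port number is its distance from the first edge of its owning operator, so no position field is stored. On the slides the computation is a member using `this` with `vecIn`/`vecOut` in scope. Here it is written as a method of a hypothetical `Graph` that owns the vectors. The sketch assumes 32-bit indices, uses `EdgeInIndex`-sized `dst_`, and keeps a trailing sentinel in `vecOp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Operator { std::uint32_t baseIn_, baseOut_; };
struct EdgeIn   { std::uint32_t dstOp_; std::uint32_t src_; };
struct EdgeOut  { std::uint32_t srcOp_; std::uint32_t dst_; };

struct Graph {
    std::vector<Operator> vecOp;   // includes a trailing sentinel
    std::vector<EdgeIn>   vecIn;
    std::vector<EdgeOut>  vecOut;

    // Input port of the sink operator that this EdgeIn occupies.
    std::ptrdiff_t position(const EdgeIn& e) const {
        const EdgeIn* base = vecIn.data() + vecOp[e.dstOp_].baseIn_;
        return &e - base;
    }

    // Output port of the source operator that this EdgeOut occupies.
    std::ptrdiff_t position(const EdgeOut& e) const {
        const EdgeOut* base = vecOut.data() + vecOp[e.srcOp_].baseOut_;
        return &e - base;
    }
};
```

The edge must be an element of the graph's own vector (not a copy) for the subtraction to be meaningful.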

Page 32: MathWorks Interview Lecture

Working with XG

Page 33: MathWorks Interview Lecture

Thin graph construction

Method                                                  Effect
graph.add(BreadCrumb, Op, Locus, Expansion,             Add an Operator and its Edge resources
          unsigned nVarIn = 0, unsigned nVarOut = 0);
graph.connect(OperatorIndex srcOp, unsigned srcPos,     Guarantee a srcOp[srcPos] to dstOp[dstPos] edge exists
              OperatorIndex dstOp, unsigned dstPos);
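The “guarantee an edge exists” semantics can be sketched with a hypothetical `ThinGraph` class, loosely modeled on `graph.add` and `graph.connect` but not their real implementation. The BreadCrumb, Locus, and Expansion arguments are omitted. Connecting the same pair twice leaves one edge. An input port accepts a single source, while an output may feed several consumers, as thin graphs allow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using OperatorIndex = std::uint32_t;

class ThinGraph {
public:
    OperatorIndex add(std::string op, unsigned nIn, unsigned nOut) {
        ops_.push_back({std::move(op), std::vector<Port>(nIn), nOut});
        return static_cast<OperatorIndex>(ops_.size() - 1);
    }

    // Idempotent: re-connecting the same source to the same input is a no-op.
    // Connecting a different source to an already-connected input is an error.
    void connect(OperatorIndex srcOp, unsigned srcPos, OperatorIndex dstOp, unsigned dstPos) {
        assert(srcPos < ops_.at(srcOp).nOut);
        Port& in = ops_.at(dstOp).inputs.at(dstPos);
        assert(!in.connected || (in.srcOp == srcOp && in.srcPos == srcPos));
        in = {true, srcOp, srcPos};
    }

    std::size_t edgeCount() const {
        std::size_t n = 0;
        for (const auto& op : ops_)
            for (const auto& p : op.inputs) n += p.connected ? 1 : 0;
        return n;
    }

private:
    struct Port { bool connected = false; OperatorIndex srcOp = 0; unsigned srcPos = 0; };
    struct Op { std::string name; std::vector<Port> inputs; unsigned nOut; };
    std::vector<Op> ops_;
};
```

Recording each edge at its input port makes idempotence a cheap check: each port has exactly one slot.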

Page 34: MathWorks Interview Lecture

Whole graph operations

Operation                                              Effect
Graph();                                               Construct an empty Graph
void done();                                           Topo sort and type check
Graph(Graph const& thinGraph, bool forSpu);            Partitioning constructor
BinStream& operator << (BinStream&, Graph const&);     Put to a BinStream (cheap)
BinStream& operator >> (BinStream&, Graph&);           Get from a BinStream (cheap)
void expand(bool forSpu, Environment const& env);      Expand, insert clones, etc.

Page 35: MathWorks Interview Lecture

Graph states and conversions

• Start with a “thin” graph
• Leader plus one representative node and dataslice
• Operators tagged with a locus and expansion rule
• Outputs can have multiple consumers

• Partition into leader-side & node-side subsets
• Expand based on loci and system topology

• Duplicate operators, adjust in and out arities, add sites
• Expand edges: fan-in, fan-out, parallel
• Introduce clones as needed
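This sketch covers only the last expansion step, clone insertion. Operator duplication and edge expansion across nodes are not modeled. It assumes a hypothetical edge list keyed by (operator, port). Every output with more than one consumer is rerouted through a new Clone operator whose k output ports feed the original consumers, so afterwards every edge is point-to-point. `Edge`, `insertClones`, and `pointToPoint` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical edge: (srcOp, srcPort) feeds (dstOp, dstPort).
struct Edge { int srcOp; int srcPort; int dstOp; int dstPort; };

// New Clone operators get ids starting at nextOp (which is advanced).
std::vector<Edge> insertClones(const std::vector<Edge>& edges, int& nextOp) {
    std::map<std::pair<int, int>, std::vector<std::pair<int, int>>> consumers;
    for (const Edge& e : edges) consumers[{e.srcOp, e.srcPort}].push_back({e.dstOp, e.dstPort});
    std::vector<Edge> out;
    for (const auto& kv : consumers) {
        const auto& src = kv.first;
        const auto& dsts = kv.second;
        if (dsts.size() == 1) {   // already point-to-point
            out.push_back({src.first, src.second, dsts[0].first, dsts[0].second});
            continue;
        }
        int clone = nextOp++;     // one input, dsts.size() outputs
        out.push_back({src.first, src.second, clone, 0});
        for (std::size_t k = 0; k < dsts.size(); ++k)
            out.push_back({clone, static_cast<int>(k), dsts[k].first, dsts[k].second});
    }
    return out;
}

// True if no (op, port) output is consumed more than once.
bool pointToPoint(const std::vector<Edge>& edges) {
    std::map<std::pair<int, int>, int> uses;
    for (const Edge& e : edges)
        if (++uses[{e.srcOp, e.srcPort}] > 1) return false;
    return true;
}
```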

Page 36: MathWorks Interview Lecture

Graph overlay

• Template object publicly derived from Graph
• Macro hides lots of template boilerplate
• User-supplied types for parallel vectors
  – MyOperator ovOp[OperatorIndex]
  – MyEdgeIn ovIn[EdgeInIndex]
  – MyEdgeOut ovOut[EdgeOutIndex]

• Constructor shares vectors and LiteralTable
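This sketch shows the overlay pattern for the operator vector only. It assumes the base graph holds its vectors through `std::shared_ptr`, so copying the base shares them instead of duplicating them. The real sharing mechanism is not described, and `Graph` and `GraphOverlay` here are simplified hypothetical stand-ins. A user type indexed in parallel with `vecOp` lets a pass attach its own per-operator data without touching the shared graph.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Operator { unsigned opcode; };

// Hypothetical base graph; shared_ptr lets overlays share rather than copy.
struct Graph {
    std::shared_ptr<std::vector<Operator>> vecOp = std::make_shared<std::vector<Operator>>();
};

// ovOp[x] annotates (*vecOp)[x]; ovIn and ovOut would follow the same pattern.
template <class MyOperator>
struct GraphOverlay : public Graph {
    explicit GraphOverlay(const Graph& g) : Graph(g), ovOp(vecOp->size()) {}
    std::vector<MyOperator> ovOp;
};
```

Because the overlay derives publicly from Graph, code written against Graph can also operate on the overlay.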

Page 37: MathWorks Interview Lecture

1973: Began a 2-axis controller; I wrote every line of code (in assembler)

Page 38: MathWorks Interview Lecture

1975: First installation: 0.5 megawatt torch cutting up to ¾” steel plate at Marion Power Shovel

Page 39: MathWorks Interview Lecture

1975: Torch on… I was hooked!