High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov 10, 2009

High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware
Adrian Ludwin, Vaughn Betz & Ketan Padalia

FPGA Seminar Presentation

Nov 10, 2009

Overview

Motivation
Review simulated annealing
Approaches
Summary

Motivation

Simulated Annealing Placement

Probabilistic approach to finding an optimal solution
Behavior
  Moves through the solution space
    Greedily
    Randomly
  Balance between greediness and randomness is controlled by a temperature
  Temperature evolves through time based on a cooling schedule

Simulated Annealing Placement

For a single move
  Compute change in cost: ΔC
  Accept move:
    If ΔC < 0
    If ΔC > 0, with probability e^(-ΔC/T)
Repeat while gradually decreasing T and window size

[Figure: placement example; surviving labels: c1, c2, c3, c4, c5, t3]
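
The acceptance rule and cooling loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the placer discussed in the talk: the names `accept_move` and `anneal`, the geometric cooling factor `alpha`, the start and end temperatures, and the toy cost function are all assumptions made for the example. The shrinking move window mentioned on the slide is left out for brevity.

```python
import math
import random


def accept_move(delta_c, temperature, rng):
    """Acceptance rule from the slide: always take improving moves,
    take worsening moves with probability e^(-delta_c / T)."""
    if delta_c <= 0:  # delta_c == 0 is accepted too, since e^0 = 1
        return True
    return rng.random() < math.exp(-delta_c / temperature)


def anneal(cost, propose, state, t_start=10.0, t_end=0.01, alpha=0.9,
           moves_per_t=100, seed=0):
    """Generic annealing loop with a geometric cooling schedule (assumed)."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    temperature = t_start
    current_cost = cost(state)
    while temperature > t_end:
        for _ in range(moves_per_t):
            candidate = propose(state, rng)
            delta_c = cost(candidate) - current_cost
            if accept_move(delta_c, temperature, rng):
                state = candidate
                current_cost += delta_c
        temperature *= alpha  # gradually decrease T
    return state, current_cost


# Toy problem (hypothetical): walk along the integers to minimize (x - 3)^2.
def toy_cost(x):
    return (x - 3) ** 2


def toy_propose(x, rng):
    return x + rng.choice([-1, 1])


final_state, final_cost = anneal(toy_cost, toy_propose, state=0)
```

Because the only source of randomness is the seeded `random.Random` instance, two runs with the same seed return the same result, which is the kind of repeatability the talk asks of the parallel versions.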

Constraints

Runs on commodity hardware
Good quality of results
Robust
Determinism
  Bug reporting
  Consistent regression results

Selected Previous Work

Closely related
  Move acceleration
  Parallel moves
Other methods
  Independent sets
  Partitioned placements
  Speculative

Algorithm #1

Algorithm #2

Objective

Determine efficacy
Analyze runtime and categorize
  Memory
  Synchronization
  Infrastructure
  Evaluation
  Proposal

Methodology

Parallel equivalent flow
  Serial flow which mimics the parallel flow
  Emulates behavior of a multithreaded application by using only one thread/core
  Useful for comparison
    Accounts for infrastructure overhead

Methodology

Attributing runtime
Two types of measurements
  Bottom-up (bu): measure each component of a move
  End-to-end (e2e): measure runtime for the entire run
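
The two measurement styles can be illustrated with a small timing harness. This is only a sketch under assumptions: the function name `run_with_attribution`, the two-component split into "proposal" and "evaluation", and the stand-in callables are hypothetical, and the talk's actual instrumentation is not shown in the slides. Bottom-up timers accumulate per-component time inside the move loop, while a single end-to-end timer wraps the whole run; whatever the bottom-up timers miss shows up as the difference between the two.

```python
import time
from collections import defaultdict


def run_with_attribution(num_moves, propose, evaluate):
    """Time a run two ways: per component (bottom-up) and as a whole (end-to-end)."""
    bottom_up = defaultdict(float)
    run_start = time.perf_counter()
    for _ in range(num_moves):
        t0 = time.perf_counter()
        move = propose()
        t1 = time.perf_counter()
        evaluate(move)
        t2 = time.perf_counter()
        bottom_up["proposal"] += t1 - t0
        bottom_up["evaluation"] += t2 - t1
    end_to_end = time.perf_counter() - run_start
    # Time not covered by any component timer (loop and timer overhead here).
    unattributed = end_to_end - sum(bottom_up.values())
    return dict(bottom_up), end_to_end, unattributed


# Stand-in workloads (hypothetical) so the harness can be run as-is.
components, total, leftover = run_with_attribution(
    num_moves=1000,
    propose=lambda: 7,
    evaluate=lambda move: sum(range(move * 10)),
)
```

Comparing `sum(components.values())` against `total` is a quick sanity check that the bottom-up breakdown accounts for most of the run.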

Methodology

Methodology

Test sets
  Set of 11 Stratix® II FPGA benchmark designs
    IP and customer circuits
    10k to 100k logic cells
  Also tested on 40 Stratix II FPGA circuits
    Obtained similar results

Results for Algorithm #1

Moves attribution

Overhead analysis

Observations

Theoretical speedup: 1.7x
  Measured: 1.3x (best)
Increase in evaluation runtime
  Due to reduced cache locality
Proposal time is “hidden”
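
The slides do not say how the 1.7x theoretical figure was derived. One simple model that produces a number of this kind, offered here purely as an assumption, is an ideal two-stage pipeline: when the stages overlap perfectly, the shorter stage is hidden behind the longer one, so the speedup is the total serial time divided by the longest stage. The function name and the stage times below are hypothetical.

```python
def ideal_pipeline_speedup(stage_times):
    """Upper bound on speedup when stages overlap perfectly:
    serial time (sum of stages) over pipelined time (slowest stage)."""
    return sum(stage_times) / max(stage_times)


# Hypothetical split: proposal takes 0.7 time units, evaluation takes 1.0.
speedup = ideal_pipeline_speedup([0.7, 1.0])
```

Under this model a perfectly balanced pair of stages would reach 2x, and any imbalance or growth in the slower stage (for example from worse cache locality) pulls the achievable speedup below the bound.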

Analysis

Time spent on stalls is negligible
Evaluation accounts for most of the overhead
Little to gain by removing determinism
  Serial equivalency costs less than 3% of runtime

Summary for Algorithm #1

Speedup: 1x to 1.3x
Memory inefficiency is the biggest bottleneck
Theoretically, the algorithm should scale
  However, it is difficult to partition and balance two stages

Speedups for Algorithm #2

Attribution on 2 cores

Attribution on 2 cores

Attribution on 4 cores

Attribution on 4 cores

Observations

Memory latency due to inter-processor communication
  Worsens with more cores

Summary for Algorithm #2

Parallel moves scale better than pipelined moves
Bottleneck is still memory
Again, serial equivalency costs little

Take Home Messages

Memory is important
Good algorithms are even more important