Synthesis of Transaction-Level Models to FPGAs
Prof. Jason Cong
Yiping Fan, Guoling Han, Wei Jiang, Zhiru Zhang
VLSI CAD Lab
Computer Science Department
University of California, Los Angeles
Outline
• Transaction-level model (TLM)
  • SystemC TLM
  • Metropolis Meta Model
• Synthesis from TLM
  • RDR/MCAS: our existing architectural synthesis approach
  • xPilot: ongoing synthesis infrastructure for TLM
SystemC Framework
• SystemC history
  • OO system/HW modeling and simulation
  • SystemC under development by CAD vendors/researchers
    • Synopsys
    • Frontier Design
    • CoWare (Belgium)
  • Released to public Sept. '99
    • Open source distribution @ www.systemc.org
    • Version 2 out July '01
Channels and Modules
• Basic building blocks: module (class) instances, communicating via channel (class) instances
• Modules' functionality coded as concurrent processes
  • Processes communicate via channels or events
Primitive Channels in SystemC Library
• Ordinary signal (wire) of type <T>
  • Fill in data type T when instantiated
  • Point-to-point or multi-point (1 writer, n readers)
• Signal bus (arbitrary width)
• FIFO, for producer/consumer connection
• Pseudo-channels
  • Mutex & semaphore, for interprocess sync
  • Accessed using channel syntax
• Complex "hierarchical" channels composed of primitive channels, processes, modules
Events and Processes
• Events: abstract occurrences used for
  • Process triggering (like VHDL sensitivity list)
  • Channel communication
  • Interprocess synchronization
• Process can call wait() to block on event
• Event occurrence tells simulator to schedule simulation of relevant process
• Process execution
  • Not called directly from your code
  • Triggered for simulation by events on ports, channels, or explicit named events
  • Registered in constructor of enclosing module (associate method with events)
  • Thread process → infinite loop
    • Must call wait() to yield control
  • Method process → runs to completion
    • Less scheduling overhead
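The triggering rule above (processes are registered once, never called directly, and run only when an event they are sensitive to occurs) can be sketched in plain C++. This is not the SystemC kernel or its API; Event, Kernel, add_process, make_sensitive, notify, and run_demo are hypothetical names chosen for illustration, and only method-style (run-to-completion) processes are modeled.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a simulation event: it remembers which
// processes are sensitive to it.
struct Event {
    std::vector<int> sensitive;  // indices of registered processes
};

// Hypothetical miniature kernel (not the SystemC scheduler).
class Kernel {
public:
    // Register a method-style process; it runs to completion when triggered.
    int add_process(std::function<void()> body) {
        processes_.push_back(std::move(body));
        return static_cast<int>(processes_.size()) - 1;
    }
    // Associate a process with an event, like a sensitivity list.
    void make_sensitive(Event& e, int pid) { e.sensitive.push_back(pid); }
    // An event occurrence schedules every process sensitive to it.
    void notify(const Event& e) {
        for (int pid : e.sensitive) runnable_.push(pid);
    }
    // Run scheduled processes until none remain.
    void run() {
        while (!runnable_.empty()) {
            int pid = runnable_.front();
            runnable_.pop();
            processes_[pid]();
        }
    }

private:
    std::vector<std::function<void()>> processes_;
    std::queue<int> runnable_;
};

// Demo: the consumer process runs only after its event is notified.
inline std::vector<std::string> run_demo() {
    Kernel k;
    Event data_ready;
    std::vector<std::string> log;
    int consumer = k.add_process([&log] { log.push_back("consumer ran"); });
    k.make_sensitive(data_ready, consumer);
    k.run();               // nothing scheduled yet, so nothing runs
    k.notify(data_ready);  // event occurrence schedules the consumer
    k.run();
    return log;
}
```

A thread process would additionally need a way to suspend in the middle of its body at wait(), which this sketch leaves out.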
Data Types in SystemC
• SystemC supports
  • Native C/C++ types
  • SystemC types
• SystemC types
  • Data types for system modeling
  • 2-value ('0','1') logic/logic vector
  • 4-value ('0','1','Z','X') logic/logic vector
  • Arbitrary-sized integer (signed/unsigned)
  • Fixed-point types (templated/untemplated)
• Objective: to reflect HW registers & ALU operations
Functional Level and RTL Modeling in SystemC
• Functional level
  • Sequential, algorithmic, software-like
  • Explore HW/SW architectures, proof of algorithms, performance modeling & analysis
• Register transfer level
  • Complete detailed functional description of hardware
    • Every register, bus, bit for every clock cycle
    • Use C++ switch/case for FSM implementation
  • At this point, can switch to HDL, but staying in SystemC leverages test benches
  • Prepare for HW synthesis step by using only synthesizable constructs
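As a rough illustration of the "C++ switch/case for FSM implementation" point, the sketch below models a small controller in RTL style, where one call to step() stands for one clock edge. The controller, its three states, and the three-cycle busy period are all assumptions made for the example; they do not come from the slides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical controller states.
enum class State : std::uint8_t { Idle, Busy, Done };

struct Controller {
    State state = State::Idle;  // state register
    std::uint8_t count = 0;     // cycle counter register

    // One call models one clock edge.
    void step(bool start) {
        switch (state) {
        case State::Idle:
            if (start) { count = 0; state = State::Busy; }
            break;
        case State::Busy:
            // Stay busy for three cycles, then signal completion.
            if (++count == 3) state = State::Done;
            break;
        case State::Done:
            state = State::Idle;
            break;
        }
    }
};

// Counts the clock edges from the start request until Done is reached.
inline int cycles_until_done() {
    Controller c;
    int cycles = 0;
    c.step(true);  // Idle -> Busy
    ++cycles;
    while (c.state != State::Done) {
        c.step(false);
        ++cycles;
    }
    return cycles;
}
```

Every register (state, count) is updated explicitly on each step, which is the level of detail the RTL bullet describes.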
Transaction Level Modeling in SystemC
• Transaction level
  • Model includes architectural components
  • Maintain component interface accuracy
    • E.g., buses modeled as channels (read/write operations)
  • Behavioral style inside a component
  • Simulates 100-10,000x faster than RTL
  • Provide execution platform for SW development
TLM – Raise the Level of Architectural Modeling
• What is TLM?
  • Communication uses function calls
    • burst_read(char* buf, int addr, int len);
• Why is TLM interesting?
  • Simulation: fast and compact
  • Integrate HW and SW models
  • Early platform for SW development
  • Early system exploration and verification
  • Verification reuse
  • Synthesis …
Reference: www.systemc.org
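To make "communication uses function calls" concrete, here is a minimal plain C++ sketch of a bus modeled as an interface. Only the burst_read signature comes from the slide; BusInterface, SimpleMemory, burst_write, transactions, and demo_roundtrip are hypothetical names added for illustration, and no SystemC library is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical bus interface: a whole transfer is one function call,
// with no pin-level or cycle-level signalling.
class BusInterface {
public:
    virtual ~BusInterface() = default;
    virtual void burst_read(char* buf, int addr, int len) = 0;
    // burst_write is an assumption; the slide shows only burst_read.
    virtual void burst_write(const char* buf, int addr, int len) = 0;
};

// Hypothetical memory target that implements the interface.
class SimpleMemory : public BusInterface {
public:
    explicit SimpleMemory(int size)
        : data_(static_cast<std::size_t>(size), '\0') {}

    void burst_read(char* buf, int addr, int len) override {
        assert(in_range(addr, len));
        std::memcpy(buf, data_.data() + addr, static_cast<std::size_t>(len));
        ++transactions_;
    }
    void burst_write(const char* buf, int addr, int len) override {
        assert(in_range(addr, len));
        std::memcpy(data_.data() + addr, buf, static_cast<std::size_t>(len));
        ++transactions_;
    }
    int transactions() const { return transactions_; }

private:
    bool in_range(int addr, int len) const {
        return addr >= 0 && len >= 0 &&
               static_cast<std::size_t>(addr) + static_cast<std::size_t>(len)
                   <= data_.size();
    }
    std::vector<char> data_;
    int transactions_ = 0;  // one count per whole burst
};

// Demo: an initiator sees only the interface, never the memory internals.
inline bool demo_roundtrip() {
    SimpleMemory mem(64);
    BusInterface& bus = mem;
    const char msg[] = "TLM";
    bus.burst_write(msg, 16, static_cast<int>(sizeof msg));
    char out[sizeof msg] = {};
    bus.burst_read(out, 16, static_cast<int>(sizeof out));
    return std::strcmp(out, msg) == 0 && mem.transactions() == 2;
}
```

Because each burst is a single call rather than many clocked signal updates, a model in this style does far less work per transfer than an RTL model.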
Typical Design Flow Using TLM
• Functional model
  • Captures system behavior
• TLM, Transaction Level Model
  • Bus transactions
  • Accurate interaction with SW portion
  • Simulates rapidly
• Can create TLM model initially
Introduction to Metropolis
• A UCB and GSRC project, http://www.gigascale.org/metropolis/
• Platform-based design [ASV]
  • Platforms have sufficient flexibility to support a series of applications/products
  • Choose a platform by design space exploration
  • Both of the above require models to be reusable
• Orthogonalization of concerns
  • Computation vs. Communication
  • Behavior vs. Coordination
  • Behavior vs. Architecture
  • Capability vs. Cost
Metropolis Meta Model
• A combination of imperative program and declarative constraints
• Imperative program:
  • objects (process, media, quantity, statemedia)
  • netlist
  • await
  • block and label
  • interface function call
  • quantity annotation
• Declarative constraints
  • Linear Temporal Logic (LTL)
  • (synch)
  • Logic of Constraints (LOC)
A Metropolis Design Tutorial
[Figure: MyMapNetlist, containing MyFncNetlist (P1, P2, M, Env1, Env2) and MyArchNetlist (mP1, mP2, Cpu, OsSched, Bus, ArbiterBus, Mem); other figure labels: Y2T, T2Y, Th,Wk, write(), read()]
Mapping constraints:
B(P1, M.write) <=> B(mP1, mP1.writeCpu); E(P1, M.write) <=> E(mP1, mP1.writeCpu);
B(P1, P1.f) <=> B(mP1, mP1.mapf); E(P1, P1.f) <=> E(mP1, mP1.mapf);
B(P2, M.read) <=> B(mP2, mP2.readCpu); E(P2, M.read) <=> E(mP2, mP2.readCpu);
B(P2, P2.f) <=> B(mP2, mP2.mapf); E(P2, P2.f) <=> E(mP2, mP2.mapf);
Outlook of the First Metropolis Release
[Figure: meta model infrastructure. Labels: meta model language, front end, abstract syntax trees, back ends 1 to N, SystemC simulation, SPIN interface, LOC checking, meta model debugger]
• Sample architectural libraries:
  • coarse-simple cpu, bus, memory, arbiters
  • time quantity
• Sample MoC:
  • multi-media (Yapi, TTL)
  • Synchronous
• A design tutorial
http://www.gigascale.org/metropolis/
TLM Conclusions
• SystemC is the de facto system-level-design standard
  • Pushed by many CAD tool vendors
  • Used widely in industry and academia
    • E.g., Intel handheld system project [ICCAD'04]
  • Unified language to model a system at different levels
  • Improving path to HW synthesis from SystemC source code
  • Fits with trend to take system design to higher level
• Metropolis is a novel academic framework for models of computation
  • Capable of representing TLM as well
  • Provides a comprehensive starting point for synthesis
Outline
• Transaction-level model (TLM)
  • SystemC TLM
  • Metropolis Meta Model
• Synthesis from TLM
  • xPilot: our ongoing synthesis infrastructure for TLM
  • RDR/MCAS: our existing architectural synthesis approach
xPilot: TLM to RTL Synthesis Flow
[Figure: TLM in SystemC/Metropolis → frontend → SSDM → synthesis passes → RTL → FPGAs]
• Arch-independent passes
  • SSDM checking
  • Loop unrolling/pipelining
  • Strength reduction/bitwidth analysis
  • Speculative-execution transformation …
• Arch-dependent passes
  • Scheduling/binding
  • Memory analysis/allocation
  • Register/port binding
  • Traditional/low power/RDR-pipe or placement driven …
• Arch-generation passes: RTL/constraints generation
  • Verilog/VHDL/SystemC
  • Altera/Xilinx
  • General/Synopsys/Magma …
Integrating xPilot with Metropolis
[Figure: block diagram. Labels: meta model infrastructure (meta model language, front end, abstract syntax trees); back ends: SystemC simulation, LOC checking, SPIN interface, synthesis; xPilot/SSDM; HW implementation (FPGA, ASICs, …); IP assembly; predictable RTL synthesis; latency insensitive design; GALS; RDR/MCAS; IP library; RTL handoff (RTL, timing constraints, physical constraints); compilation for RP (extended instruction, reconfigurable interconnect, reconfigurable coprocessor, …); simulation]
SSDM Zoomed In – CDFG
    if (cond1) bb1();
    else bb2();
    bb3();
    switch (test1) {
    case c1: bb4(); break;
    case c2: bb5(); break;
    case c3: bb6(); break;
    }
    bb7();
[Figure: control flow graph of the code above. cond1 branches to bb1() (T) or bb2() (F), both followed by bb3(); test1 branches to bb4() (c1), bb5() (c2), or bb6() (c3), all followed by bb7()]
• 2-level CDFG representation
  • 1st level: control flow graph
  • 2nd level: data flow graph
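One way to picture the 2-level representation is a control flow graph whose nodes each own a small data flow graph. The sketch below is an assumption about how such a structure could look, not the actual SSDM data structure; DfgNode, BasicBlock, Cdfg, and build_example are hypothetical names. Only the control-flow level is filled in, using the block names from the example code. The edge from test1 straight to bb7 is added here because the switch has no default case.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Second level: one data-flow operation inside a basic block.
struct DfgNode {
    std::string op;             // e.g. "add", "mul"
    std::vector<int> operands;  // indices of producer nodes in the same block
};

// First level: one node of the control flow graph.
struct BasicBlock {
    std::string name;
    std::vector<DfgNode> dfg;  // data flow graph of this block
    std::vector<int> succs;    // indices of successor blocks
};

struct Cdfg {
    std::vector<BasicBlock> blocks;

    int add_block(const std::string& name) {
        blocks.push_back(BasicBlock{name, {}, {}});
        return static_cast<int>(blocks.size()) - 1;
    }
    void add_edge(int from, int to) { blocks[from].succs.push_back(to); }
};

// Builds the control-flow level for the if/switch example.
inline Cdfg build_example() {
    Cdfg g;
    int cond1 = g.add_block("cond1");
    int bb1 = g.add_block("bb1");
    int bb2 = g.add_block("bb2");
    int bb3 = g.add_block("bb3");
    int test1 = g.add_block("test1");
    int bb4 = g.add_block("bb4");
    int bb5 = g.add_block("bb5");
    int bb6 = g.add_block("bb6");
    int bb7 = g.add_block("bb7");

    g.add_edge(cond1, bb1);  // T
    g.add_edge(cond1, bb2);  // F
    g.add_edge(bb1, bb3);
    g.add_edge(bb2, bb3);
    g.add_edge(bb3, test1);
    g.add_edge(test1, bb4);  // c1
    g.add_edge(test1, bb5);  // c2
    g.add_edge(test1, bb6);  // c3
    g.add_edge(test1, bb7);  // no case matches (the switch has no default)
    g.add_edge(bb4, bb7);
    g.add_edge(bb5, bb7);
    g.add_edge(bb6, bb7);
    return g;
}
```

Scheduling and binding passes would then work on each block's dfg, while control-level transformations such as loop unrolling would work on the block graph.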
SSDM Features Different from Software IR
• Top-level: netlist of concurrent processes
• Process port/interface semantics
  • FIFO: FifoRead() / FifoWrite()
  • BUFF: BuffRead() / BuffWrite()
  • Memory: MemRead() / MemWrite()
• Bit vector manipulation
  • Bit extraction / concatenation / insertion
  • Bit-width property for every value
• Cycle-level notation
  • Scheduling / binding information / delay
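The three bit-vector operations listed above can be illustrated on 64-bit words in plain C++. These helpers are a sketch for explanation only, not SSDM operations; the names mask, extract, concat, and insert and the [hi:lo] convention are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `width` bits set (1 <= width <= 64).
inline std::uint64_t mask(unsigned width) {
    return width >= 64 ? ~std::uint64_t{0}
                       : ((std::uint64_t{1} << width) - 1);
}

// Extraction: bits [hi:lo] of v, right-aligned (hi >= lo, hi < 64).
inline std::uint64_t extract(std::uint64_t v, unsigned hi, unsigned lo) {
    return (v >> lo) & mask(hi - lo + 1);
}

// Concatenation {a, b}, where b is lo_width bits wide (lo_width < 64).
inline std::uint64_t concat(std::uint64_t a, std::uint64_t b,
                            unsigned lo_width) {
    return (a << lo_width) | (b & mask(lo_width));
}

// Insertion: replace bits [hi:lo] of v with the low bits of field.
inline std::uint64_t insert(std::uint64_t v, std::uint64_t field,
                            unsigned hi, unsigned lo) {
    const std::uint64_t m = mask(hi - lo + 1) << lo;
    return (v & ~m) | ((field << lo) & m);
}
```

For example, extract(0xABCD, 11, 4) yields 0xBC, an 8-bit value. Tracking that width for every value is what lets a synthesis tool size datapath operators and registers to the bits actually needed.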
Our Architectural Synthesis Approaches – RDR / MCAS
• Consideration of multi-cycle communication during architectural (or behavioral) synthesis
• Regular Distributed Register (RDR) micro-architecture [Cong et al., ISPD'03]
  • Highly regular
  • Direct support of multi-cycle on-chip communication
• MCAS: Architectural Synthesis for Multi-cycle Communication
  • Efficiently maps the behavioral descriptions to RDR uArch
  • Integrates architectural synthesis (e.g. resource binding, scheduling) with physical planning
RDR/MCAS: Support for Heterogeneous Integration with Multi-cycle Communication & Automatic Interconnect Pipelining
• Distribute registers to each "island"
• Choose the island size such that
  • Single cycle for intra-island computation and communication
  • Multi-cycle communication between islands
• Support interconnect pipelining
  • Inter-island pipeline register station (PRS) for global communications
  • PRS performs autonomous store-and-forward
• MCAS: Multi-cycle architectural synthesis integrated with global placement
• Experimental results
  • MCAS vs. conventional flow:
    • 36% reduction in clock period
    • 30% reduction in total latency
  • MCAS-Pipe vs. MCAS:
    • 28.8% long global wirelength reduction
    • 19.3% total wirelength reduction
• Can also support IP integration using latency insensitive technique [Carloni, ICCAD'99]
[Figure: RDR islands 1 to 4 separated by H and V channels with pipeline register stations (PRS); labels inside the islands: LCC, FSM, Reg. File; IP library connected through an adaptor]
Synthesis Flow: MCAS-Pipe System
[Figure: synthesis flow. Input: C / VHDL. Stages: CDFG generation; resource allocation & functional unit binding; scheduling-driven placement; placement-driven rescheduling & rebinding; global interconnect sharing; register and port binding; datapath & FSM generation. Intermediate data: CDFG, ICG, locations. Output: RTL VHDL & floorplan constraints]
• Global interconnect sharing
  • Enables multiple data communications to share one physical link (a wire with pipeline registers)
Related Publications
• Regular distributed register (RDR) architecture and MCAS synthesis algorithms
  • ISPD'03, ICCAD'03
• RDR-Pipe and MCAS-Pipe synthesis algorithms
  • DAC'04
• Lopass: high-level synthesis for low-power FPGAs
  • ISLPED'03
• Multiplexor optimization through register/port binding
  • ASPDAC'04
• Bitwidth-aware scheduling and binding algorithms
  • ASPDAC'05
Conclusions
• Higher-level abstraction is needed in the current SO(P)C design flow
  • SystemC is becoming the SLD standard; TLM in particular is widely used
  • Metropolis is a platform-based design framework
• It is time to build a new generation of behavioral synthesis systems from TLM
• xPilot:
  • Ongoing project
  • An architectural synthesis infrastructure from TLM to RTL (FPGAs)