Transforming a FAST simulator into RTL implementation
Nikhil A. Patil & Derek Chiou
FAST Research group,
University of Texas at Austin
Outline
• Research Goal
• Motivation
• Quick introduction to FAST
• Going from FAST to RTL
– Data-path
– Microcode Compiler
– Golden Models
– Optimizing to single-cycle
• Benefits
• Conclusions
Research Goal
• Simplify the design, development, and verification of computer systems
• Significantly reduce overall architecture, RTL, verification, and software effort
• Eliminate wasted work; enable code-reuse
Motivation
Information duplication in traditional design flow
[Figure: design-flow diagram with boxes labeled Architectural Simulator, RTL, Verification, Low Accuracy Software Simulator, Compiler, Synthesis Flow, and Software]
Pre-silicon S-RTL Bugs in Pentium 4
Bob Bentley, “Validating the Intel® Pentium® 4 Microprocessor”, DAC 2001
Vision of an ideal design flow
[Figure: diagram with boxes labeled Architectural & Micro-architectural Specification, Architectural Simulator, RTL, Verification, and Software]
Shared specification reduces information duplication
Vision of an ideal design flow
• Single central source (“code-base”) for all of the following:
– Architectural studies
– Micro-architectural tuning
– RTL implementation
– RTL level power modeling
– RTL Verification
– Software development
• Note: For now, we don’t address anything beyond synthesizable RTL (physical design, etc.)
Points to note about FAST
• The FM (functional model) is ISA specific, but micro-architecture agnostic
– Trace sent from FM to TM (timing model) is ISA-specific, not micro-architecture specific; e.g., x86 opcode, not x86 microcode
• TM implements a (potentially inaccurate) microcode table to “decode” the meaning of the trace
– For a simpler ISA, table is an identity mapping
• Currently, our FM can model x86 and PowerPC targets
• TM written in Bluespec SystemVerilog
• TM is composed of modules connected with FAST Connectors, which manage latency, throughput and buffering (built upon the theory of Asim A-Ports)
• FAST methodology itself does not introduce any inherent inaccuracies; all inaccuracies are due to lower fidelity models (or bugs)
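The trace-decoding point above can be sketched in C++. This is only an illustration: the names `TraceEntry`, `MicroOp`, `MicrocodeTable`, `example_table`, and `decode_trace`, as well as the opcodes and latencies, are hypothetical and are not taken from the FAST code base.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical ISA-level trace entry sent from the FM to the TM.
struct TraceEntry {
    std::string opcode;  // ISA opcode (not microcode)
    std::uint64_t pc;    // address of the instruction
};

// Hypothetical micro-op as seen by the timing model.
struct MicroOp {
    std::string name;
    int latency;  // target cycles; illustrative values only
};

using MicrocodeTable = std::map<std::string, std::vector<MicroOp>>;

// Illustrative table: a complex opcode expands to several micro-ops.
inline MicrocodeTable example_table() {
    return {
        {"ADD",     {{"add", 1}}},
        {"ADD_MEM", {{"load", 3}, {"add", 1}, {"store", 3}}},
    };
}

// "Decode" a trace entry through the table. An opcode that is missing from
// the table falls back to one micro-op named after the opcode, which is the
// identity mapping a simpler ISA would use for every instruction.
inline std::vector<MicroOp> decode_trace(const MicrocodeTable& table,
                                         const TraceEntry& entry) {
    auto it = table.find(entry.opcode);
    if (it == table.end()) {
        return {{entry.opcode, 1}};
    }
    return it->second;
}
```

Because the trace carries only ISA opcodes, the table can be replaced or refined without touching the functional model.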
Vision for FAST
• Single central codebase will consist of the following three sub-modules:
– ISA simulator (C/C++)
– Micro-op definition (C/C++)
– Micro-architectural definition (Bluespec/C)
• Note that the information contained in each is mutually exclusive
– Eliminates possibility of inconsistency
From FAST to RTL
• Add data-paths to the timing model
– ALU, cache data-stores, forwarding paths
• Magically move the ISA from the FM to the TM
• Detach trace-buffers; use internal data-path
• For each TM module, improve fidelity
– @ 100% fidelity, we have a Golden model
• For each TM module, improve host/target-cycle ratio
– @ 1:1 h/t-cycle ratio, we have RTL
– Will need changes to FAST connector
Caveats
• Fidelity of the simulation models is transferred to the implementation
• Depending on the model fidelity, it may or may not be possible to run actual software on the implementation
• Use software that uses only the subset of features supported with 100% fidelity; examples of features that may fall outside that subset:
– Self-modifying code
– Unaligned accesses
From FAST to RTL
• Add Data-path
• Add Functionality
• Detach trace-buffers
• Improve fidelity
• Improve host performance
Data-path
• Assuming a sufficiently high fidelity model:
– Adding data-path does not change the module interfaces significantly
– It is simple enough to do manually (TASK)
• This process can sometimes unearth fidelity bugs in the simulator; e.g., not accounting for limited number of ports on a register file
• The data-path can be trivially removed for simulation flows
• Data-path also needed for power modeling of certain modules
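`ifdef DATA_PATH
typedef Bit#(32) Data_t;   // data-path present: the field carries real data
`else
typedef Bit#(0) Data_t;    // data-path removed: zero-width field
`endif

typedef struct {
    Bool   write;
    Addr_t addr;
    Data_t data;
} DCacheReq_t deriving (Bits);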
Functionality
• ISA simulation (in FM) can be summarized as:
– Fetch: fetch instructions, advancing PC
  • Modeled in the TM already (with very high fidelity)
– Decode: identifies an instruction with a function
  • Not modeled in TM at all
  • Can be written manually or auto-generated (TASK)
– Execute: calls the function
  • Corresponds to target microcode and data-path
  • Microcode needs to be made 100% accurate (TASK)
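The fetch/decode/execute summary above can be sketched as a toy C++ interpreter. Everything here is an assumption made for illustration: the two-byte instruction encoding, the `State` fields, and the three opcodes are invented and do not correspond to bochs or to any real ISA.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical architectural state of a toy accumulator machine.
struct State {
    std::uint32_t pc = 0;
    std::int32_t acc = 0;
    bool halted = false;
};

// Toy encoding: opcode in the high byte, immediate in the low byte.
enum Opcode : std::uint8_t { OP_ADDI = 0, OP_SUBI = 1, OP_HALT = 2 };

using ExecFn = std::function<void(State&, std::uint8_t)>;

// Decode: identify an instruction with a function.
// Any unknown opcode is treated as HALT in this sketch.
inline ExecFn decode(std::uint16_t insn) {
    switch (insn >> 8) {
        case OP_ADDI: return [](State& s, std::uint8_t imm) { s.acc += imm; };
        case OP_SUBI: return [](State& s, std::uint8_t imm) { s.acc -= imm; };
        default:      return [](State& s, std::uint8_t) { s.halted = true; };
    }
}

// Run until HALT or the end of the program; returns the final state.
inline State run(const std::vector<std::uint16_t>& program) {
    State s;
    while (!s.halted && s.pc < program.size()) {
        std::uint16_t insn = program[s.pc++];           // Fetch, advancing PC
        ExecFn fn = decode(insn);                       // Decode
        fn(s, static_cast<std::uint8_t>(insn & 0xFF));  // Execute
    }
    return s;
}
```

In this picture, the decode switch is the part the TM does not model at all, and the lambda bodies are what the target microcode and data-path would have to reproduce exactly.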
Microcode Compiler
• Microcode Compiler (MCC) maps each instruction onto one or more micro-ops
• Takes two software (C/C++) simulators as its input:
– ISA simulator (currently, bochs)
– Micro-op simulator
• Compiles the specification of each instruction/micro-op into a data-flow graph
• Uses exhaustive search to statically map instruction execution onto one or more micro-ops based on a cost table
• In case of a failure, says why a mapping is not possible
• Work in progress
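The exhaustive, cost-driven mapping step can be illustrated with a heavily simplified C++ sketch. It is not the MCC algorithm: an instruction is flattened here to a linear sequence of primitive steps rather than a data-flow graph, failure reporting is reduced to a boolean, and all names (`MicroOpSpec`, `Mapping`, `map_from`) and costs are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical micro-op: the primitive steps it performs and its cost.
struct MicroOpSpec {
    std::string name;
    std::vector<std::string> prims;
    int cost;
};

// Result of a successful mapping: micro-op names in order, and total cost.
struct Mapping {
    std::vector<std::string> micro_ops;
    int cost = 0;
};

// Exhaustively try every micro-op that matches the primitives starting at
// `pos`, recurse on the remainder, and keep the cheapest complete cover.
// Returns false if no sequence of micro-ops covers insn[pos..end).
inline bool map_from(const std::vector<std::string>& insn, std::size_t pos,
                     const std::vector<MicroOpSpec>& uops, Mapping& out) {
    if (pos == insn.size()) {
        out = Mapping{};
        return true;
    }
    bool found = false;
    for (const MicroOpSpec& u : uops) {
        if (u.prims.empty() || pos + u.prims.size() > insn.size()) continue;
        bool match = true;
        for (std::size_t i = 0; i < u.prims.size(); ++i) {
            if (insn[pos + i] != u.prims[i]) { match = false; break; }
        }
        if (!match) continue;
        Mapping rest;
        if (!map_from(insn, pos + u.prims.size(), uops, rest)) continue;
        int total = u.cost + rest.cost;
        if (!found || total < out.cost) {
            Mapping best;
            best.micro_ops.push_back(u.name);
            best.micro_ops.insert(best.micro_ops.end(),
                                  rest.micro_ops.begin(), rest.micro_ops.end());
            best.cost = total;
            out = best;
            found = true;
        }
    }
    return found;
}
```

With micro-ops `load` (cost 3), `add` (cost 1), `store` (cost 3), and a fused `load_add` (cost 3), the sequence load, add, store maps to `load_add` followed by `store`, because that cover costs 6 instead of 7.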
From FAST to RTL
• Add Data-path √
• Add Functionality √
• Detach trace-buffers
• For each TM module, improve fidelity
– @ 100% fidelity, we have a Golden model
• For each TM module, improve host/target-cycle ratio
– @ 1:1 h/t-cycle ratio, we have RTL
– Will need changes to FAST connector
Golden models
• A 100% cycle-accurate model
• May still take multiple FPGA cycles to model a single target cycle
• It is in fact a legitimate implementation
• Serves as a golden reference model for the next step (optimization) as well as for writing and debugging verification suites
• Traditionally, verification teams have written golden models from the architectural specs
• Likely to use FPGA structures efficiently
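To illustrate how a model can be cycle-accurate for the target while spending several host (FPGA) cycles per target cycle, here is a small C++ sketch. The module, its name `SerialTagLookup`, and its interface are hypothetical: it stands for a fully associative tag lookup that the target completes in one cycle, modeled by checking one entry per host cycle.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical timing-model module: one target cycle of an associative
// lookup is modeled by scanning one tag entry per host cycle.
class SerialTagLookup {
public:
    explicit SerialTagLookup(std::vector<std::uint32_t> tags)
        : tags_(std::move(tags)) {}

    // Begin modeling one target cycle for the given tag.
    void start(std::uint32_t tag) {
        query_ = tag;
        index_ = 0;
        hit_ = false;
        busy_ = true;
    }

    // Advance one host cycle; returns true once the target cycle is done.
    bool host_tick() {
        if (!busy_) return true;
        ++host_cycles_;
        if (index_ < tags_.size() && tags_[index_] == query_) hit_ = true;
        ++index_;
        if (index_ >= tags_.size()) {
            busy_ = false;
            ++target_cycles_;
        }
        return !busy_;
    }

    bool hit() const { return hit_; }
    std::size_t host_cycles() const { return host_cycles_; }
    std::size_t target_cycles() const { return target_cycles_; }

private:
    std::vector<std::uint32_t> tags_;
    std::uint32_t query_ = 0;
    std::size_t index_ = 0;
    bool hit_ = false;
    bool busy_ = false;
    std::size_t host_cycles_ = 0;
    std::size_t target_cycles_ = 0;
};
```

With four entries, each lookup costs four host cycles but advances the target by exactly one cycle, so the host/target-cycle ratio of this module is 4:1 while its target-visible result is unchanged.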
Optimizing to single-cycle
• Automatic transformation of modules may be possible for some simple modules using algorithms to
– Unroll a “loop” in hardware
– Collapse a multi-state FSM into a single state
• Can Bluespec help here?
• Manual optimization is certainly feasible
• Currently, FAST Connectors don’t allow this optimization (TASK)
– Connector interface cannot support modules that take exactly 1 host cycle for every target cycle
– Work in progress
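The idea of collapsing a multi-state FSM into a single state can be illustrated in C++ with a shift-and-add multiplier, chosen here only as a familiar example. `SerialMultiplier` and `multiply_unrolled` are hypothetical names; in the second function the software loop stands in for eight copies of the add/shift logic laid out combinationally.

```cpp
#include <cstdint>

// Multi-state version: one partial product per host cycle (8 steps).
struct SerialMultiplier {
    std::uint16_t acc = 0;
    std::uint16_t a = 0;
    std::uint8_t b = 0;
    int step = 8;  // 8 means idle / done

    void start(std::uint8_t x, std::uint8_t y) {
        acc = 0;
        a = x;
        b = y;
        step = 0;
    }

    // Advance one host cycle; returns true once the product is ready.
    bool host_tick() {
        if (step >= 8) return true;
        if (b & 1) acc = static_cast<std::uint16_t>(acc + a);
        a = static_cast<std::uint16_t>(a << 1);
        b = static_cast<std::uint8_t>(b >> 1);
        ++step;
        return step >= 8;
    }
};

// Single-state version: the same eight steps evaluated in one call,
// corresponding to one host cycle per target cycle.
inline std::uint16_t multiply_unrolled(std::uint8_t x, std::uint8_t y) {
    std::uint16_t acc = 0;
    for (int i = 0; i < 8; ++i) {
        if ((y >> i) & 1) {
            acc = static_cast<std::uint16_t>(acc + (static_cast<std::uint16_t>(x) << i));
        }
    }
    return acc;
}
```

Keeping the multi-state version around gives a reference to compare the collapsed version against, which is the role the golden model plays for the optimization step.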
From FAST to RTL
• Add Data-path √
• Add Functionality √
• Detach trace-buffers √
• For each TM module, improve fidelity √
– @ 100% fidelity, we have a Golden model
• For each TM module, improve host/target-cycle ratio √
– @ 1:1 h/t-cycle ratio, we have RTL
– Will need changes to FAST connector
Alternative path
• Design the original TM modules as 1-host-cycle implementations
• Automatically convert to n-host-cycle for the simulator
– Using Bluespec?
• Without automatic conversion, we would end up with RTL before FAST simulator!
– Almost like prototyping
Potential benefits
• Provides a way to verify FAST simulators
• Golden models can be generated for the verification teams
– Verify resulting implementation
• Provide working implementation to RTL designers
– Replace one component at a time
– Provides a test-rig
– Runs software
• Improves communication between teams
• Eliminates SIM-RTL calibration
• Potentially faster than the simulator
– Early versions can be made available to software team
Conclusions
• This technology provides a way to use a “single codebase” to meet a variety of needs, from simulation to implementation to verification.
• Single central codebase will consist of the following three sub-modules:
– ISA simulator (C/C++)
– Micro-op definition (C/C++)
– Micro-architectural definition (Bluespec/C)