CRAFT Generator-Based Hardware Design - Digital

Elad Alon

Department of Electrical Engineering and Computer Sciences University of California at Berkeley, Berkeley, CA

[email protected]

Abstract—Under DARPA’s CRAFT program, we are developing an “agile” methodology to substantially accelerate the design and verification of advanced SoCs. The crux of the approach is to improve re-use by focusing designers’ efforts on capturing their knowledge in executable generators (rather than on delivering instances). This paper highlights the key technologies underlying our approach (Chisel and FIRRTL) and how they were used to facilitate generator-based digital design of a representative SoC.

Keywords—digital, SoC, agile design, design productivity, generators, Chisel, FIRRTL

I. INTRODUCTION: AGILE DESIGN

Despite the well-established energy-efficiency and performance benefits of custom ASICs/SoCs, many end-applications simply do not have the volume necessary to justify the typically quoted NRE investment of $50-100M needed to develop such a design. There are a number of factors underlying these high costs, but as highlighted in Fig. 1, typically quoted reasons include:

• Long development cycles (>1 year)
• Verification, validation, and software dominate costs
• Requirement for one or more respins
• Re-use largely limited to black-box IP

Interestingly, software developers face similar productivity challenges in tackling complex projects. However, software developers have generally been very successful in adopting new workflows to substantially improve their productivity. In particular, as shown in Fig. 2, over the past few years software engineers have transitioned away from the traditional “waterfall” development flow to an “agile” methodology. Agile development establishes a few key principles:

• Functional (albeit incomplete) prototypes over fully featured models
• Collaborative, flexible teams over rigid silos
• “Sprinting” to successively more complete prototypes in order to encounter (and resolve) issues in later portions of the development cycle as quickly as possible
• Improving tools and generators over improving the instance
• Responding to change over following a plan

As described by Fox and Patterson in [1], switching to an agile approach in software development resulted in substantially more projects being completed on time (76% in agile vs. 10% in waterfall), fewer being completed late (20% in agile vs. 52% in waterfall), and far fewer being cancelled (4% in agile vs. 38% in waterfall). The question that then naturally arises is whether we can adopt agile methodologies to accelerate hardware design as well.

In order to be compatible with the agile paradigm, SoC design requires methodologies and flows that enable designs to be scalable, to be iterated upon rapidly (so that one can sprint through all phases of the process and then quickly iterate by incrementally adding features), and to support aggressive re-use. Of all these requirements, the one most lacking in current hardware design flows is, unfortunately, re-use.

While it is certainly true that hardware designs are currently re-used (especially in the industrial context, where large SoCs are mostly assembled out of existing IP blocks), the mechanism for this re-use is mostly black-box IP. Typically, if a project has committed to building a custom SoC, there is some particular set of specializations/features the ASIC needs to include that are not necessarily included in or supported by the black-box IP. Since the IP itself does not contain any information about how or why it was designed the way it was, the dilemma is then whether to (a) redesign the IP from scratch using an internal design team (resulting in no re-use), or (b) (re-)hire the IP vendor to modify their design (which they may or may not be willing to do at reasonable cost).

Fig. 1: Breakdown of typical ASIC development NRE costs.

Fig. 2: Waterfall vs. agile development.

Distribution A: Approved for public release; distribution unlimited.

To address this dearth of re-use in SoC/ASIC design, we propose to alter designers’ (and their managers’) fundamental approach such that their goal is no longer to deliver a particular instance of a design meeting some specific set of specifications. Instead, designers should focus on capturing their knowledge about how to realize such designs (i.e., their design methodology) in the form of an executable generator. This generator-based approach explicitly facilitates re-use in two ways. First, generators are parameterized, so that permutations/modifications already envisioned by the designer (or, in this context, the person or team acting as a “generator writer”) can be realized simply by re-executing the generator software (which would be done by a “generator user”). Second, in the (common) situation where one realizes that an additional feature is required, the feature is added by incrementally modifying the generator code (i.e., updating only the parts of the design procedure that actually need to be changed) rather than by attempting to modify the instance (which in and of itself contains no information about how it was designed).
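To make the distinction between a generator and an instance concrete, the following is a minimal sketch in Python, used here only as a neutral stand-in for Chisel/Scala; the function `delay_line` and its parameters are hypothetical. The generator writer encodes how a register delay line is built; a generator user obtains different instances simply by re-running it with different parameters, and a newly required feature (here, an optional enable) is added by editing the generator rather than any one emitted instance.

```python
def delay_line(width: int, stages: int, enable: bool = False) -> str:
    """Toy generator: emit Verilog for a `stages`-deep pipeline of `width`-bit registers."""
    if width < 1 or stages < 1:
        raise ValueError("width and stages must both be >= 1")
    name = f"delay_w{width}_s{stages}" + ("_en" if enable else "")
    ports = ["  input clk,"]
    if enable:  # feature added later by editing the generator, not an instance
        ports.append("  input en,")
    ports += [f"  input [{width - 1}:0] in,", f"  output [{width - 1}:0] out"]
    lines = [f"module {name}("] + ports + [");"]
    lines += [f"  reg [{width - 1}:0] r{i};" for i in range(stages)]
    guard = " if (en)" if enable else ""
    lines.append(f"  always @(posedge clk){guard} begin")
    lines.append("    r0 <= in;")
    lines += [f"    r{i} <= r{i - 1};" for i in range(1, stages)]
    lines.append("  end")
    lines.append(f"  assign out = r{stages - 1};")
    lines.append("endmodule")
    return "\n".join(lines)


# Two instances from one generator, differing only in their parameters.
print(delay_line(width=8, stages=3))
print(delay_line(width=16, stages=2, enable=True))
```

Because each instance is derived from the parameters, no emitted Verilog file ever needs to be edited by hand; changing the design means changing either the parameters or the generator.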

Enabling agile development for hardware using a generator-based approach, and evaluating the benefits of this approach, have been the goals of our team (which includes UC Berkeley, Cadence, and Northrop Grumman) under the DARPA CRAFT program. The rest of this paper, which focuses on the digital side of SoC design, therefore briefly describes a few of the key technologies we have developed and used for this purpose.

II. FACILITATING DIGITAL GENERATORS: CHISEL

As mentioned in the introduction, the crux of a generator is that it should capture as much as possible of the designer’s knowledge of the approach used to realize (in an optimal way) some specific hardware block/sub-system/system.

Zooming in for the moment on the RTL design portion of an overall SoC, our team has developed and made use of a domain-specific language called “Chisel” to facilitate the development of digital RTL generators [2]. Chisel stands for “Constructing Hardware in a Scala Embedded Language” and is essentially a software library whose classes represent hardware primitives. Methods are used to connect instances of those classes together, such that executing the software “constructs” a graph that represents the RTL.
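The idea that executing ordinary software “constructs” a hardware graph can be illustrated with a toy embedded DSL. The sketch below is Python, not Chisel, and the names `Signal`, `Input`, and `elaborate` are hypothetical. The overloaded operators do not compute values; they return new graph nodes, and a separate elaboration step walks the graph and emits one RTL-like assignment per operator node.

```python
from __future__ import annotations


class Signal:
    """A node in the hardware graph; operators build new nodes instead of computing values."""

    def __init__(self, op: str, *operands: Signal, name: str | None = None):
        self.op, self.operands, self.name = op, operands, name

    def __add__(self, other: Signal) -> Signal:
        return Signal("+", self, other)

    def __and__(self, other: Signal) -> Signal:
        return Signal("&", self, other)

    def __xor__(self, other: Signal) -> Signal:
        return Signal("^", self, other)


def Input(name: str) -> Signal:
    return Signal("input", name=name)


def elaborate(out: Signal, out_name: str = "out") -> list[str]:
    """Topologically walk the graph from `out` and emit one assignment per operator node."""
    order: list[Signal] = []
    seen: set[int] = set()

    def visit(node: Signal) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        for operand in node.operands:
            visit(operand)
        order.append(node)  # post-order: operands always precede their users

    visit(out)
    names: dict[int, str] = {}
    lines: list[str] = []
    for node in order:
        if node.op == "input":
            names[id(node)] = node.name
            continue
        names[id(node)] = out_name if node is out else f"t{len(lines)}"
        lhs, rhs = (names[id(o)] for o in node.operands)
        lines.append(f"{names[id(node)]} = {lhs} {node.op} {rhs}")
    return lines


a, b, c = Input("a"), Input("b"), Input("c")
x = a & b
print(elaborate(x + (x ^ c)))
```

The shared subexpression `x` is emitted only once, because the graph records structure rather than re-evaluating expressions.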

To clear up some common misconceptions regarding Chisel, it is important to note that Chisel does not inherently change the level of abstraction at the hardware level: one can control the RTL produced by Chisel at as low a level as one could in, e.g., Verilog. In other words, unlike high-level synthesis, Chisel itself does not make use of any sophisticated compiler; it simply constructs the RTL exactly as specified by the generator (and written by the generator writer). What Chisel does offer beyond existing RTL languages, however, is a substantially higher level of software abstraction, which can make it easier for a designer to capture their methodology. While it is true that from the perspective of a generator user, both high-level synthesis and generator-based approaches “magically” produce designs at the push of a button, the difference is that in a generator-based approach (like Chisel), the generator writer does not fundamentally need to relinquish control over the lower-level details of the RTL and how it is constructed. A few key examples of the higher-level features offered by Chisel/Scala include powerful parameterization systems and Scala’s support for both functional and object-oriented programming. Equally importantly, Scala has an enormous base of existing software libraries that hardware designers using Chisel can (and, in our experience, extensively do) make use of.
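The following sketch (again Python standing in for Scala; all names are hypothetical) illustrates both points at once: higher-order functions let a reduction generator be written once for any operator and any number of inputs, yet the writer still decides exactly which structure is emitted. Here the same inputs yield either a linear chain or a balanced tree, with correspondingly different logic depth.

```python
from __future__ import annotations

from functools import reduce


def chain(terms: list[str], op: str = "+") -> str:
    """Linear reduction: ((x0 op x1) op x2) ..., logic depth n - 1."""
    return reduce(lambda acc, t: f"({acc} {op} {t})", terms)


def tree(terms: list[str], op: str = "+") -> str:
    """Balanced reduction: split in half recursively, logic depth ceil(log2 n)."""
    if len(terms) == 1:
        return terms[0]
    mid = len(terms) // 2
    return f"({tree(terms[:mid], op)} {op} {tree(terms[mid:], op)})"


def depth(expr: str) -> int:
    """Operator levels on the longest path (every operator is parenthesized above)."""
    level = deepest = 0
    for ch in expr:
        if ch == "(":
            level += 1
            deepest = max(deepest, level)
        elif ch == ")":
            level -= 1
    return deepest


xs = [f"x{i}" for i in range(8)]
print(f"assign sum_chain = {chain(xs)};  // depth {depth(chain(xs))}")
print(f"assign sum_tree  = {tree(xs)};  // depth {depth(tree(xs))}")
```

In the generator, the choice between the two structures remains an explicit decision of the generator writer, in keeping with the point above about not relinquishing control over the RTL.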

Right from its inception in ~2010, Chisel has been used to design a large number of academic processors and, later, SoCs at UC Berkeley (Fig. 3). These designs have generally achieved record energy efficiency and/or performance for academic processors, and have spanned a number of different process technologies, including IBM (now Global Foundries) 45nm PD-SOI, TSMC 28nm, ST 28nm FD-SOI, and most recently TSMC 16nm FinFET (under the CRAFT program). Realizing these results hinged substantially on reuse, and all of these designs indeed made substantial use of the RocketChip hardware library (or early versions thereof). The RocketChip library includes generators that allow one to instantiate parameterized (by instruction-set support, cache size, etc.) RISC-V processors, vector co-processors, DMA engines, etc., all within an SoC context that accommodates additional external peripherals.

III. FIRRTL

Since the entire goal of the generator-based methodology is to facilitate as much reuse as possible, it is important to recognize that portions of any given RTL design may be dictated by the specific requirements/libraries of a given implementation technology. For example, as shown in Fig. 4, the SRAM macros in ST’s 28nm FD-SOI process may behave very differently (or have very different options available) than the memories in, e.g., an FPGA implementation or another silicon process technology. Baking such process-specific constraints into the Chisel generator would thus inhibit re-use.

Fig. 3: Chisel-designed chips at UC Berkeley.

The issue mentioned above really boils down to separation of concerns between higher-level RTL structure and process-specific implementation constraints. Fortunately, we can once again borrow a solution that nicely addresses this issue in software. In particular, many modern compilers make use of a technology known as “LLVM” (Low-Level Virtual Machine) [3] and an associated intermediate representation (IR) precisely to address a similar separation-of-concerns issue in software (between high-level program flow and optimizing binaries for a given platform). The key idea is to take the high-level description (in our case, Chisel) and, rather than directly producing an implementation optimized for a specific platform (e.g., process technology, FPGA vs. ASIC, etc.), first produce an IR (which in our case we have named FIRRTL, for Flexible Intermediate Representation for RTL) to which a series of transformations can be applied to eventually yield the optimized implementation. This facilitates reuse both because it separates the process-specific from the process-independent portions of the code and because the transformations are often independent of the design itself (e.g., converting generic memories into specific memories available in the process) and can hence themselves be reused across multiple different generators/blocks.
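As a rough illustration of such a transformation, the pass below rewrites technology-independent memories into whatever a target library provides. It is a toy in Python: this is not FIRRTL syntax or the actual FIRRTL memory-replacement pass, and the macro names and the `min_bits` threshold are hypothetical. The design description is identical for both targets; only the library handed to the pass changes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mem:
    """Technology-independent memory, as a generator would describe it."""
    name: str
    depth: int
    width: int


@dataclass(frozen=True)
class MacroInst:
    """Instance of a technology-specific SRAM macro."""
    name: str
    macro: str


@dataclass(frozen=True)
class RegArray:
    """Fallback: the memory built from flip-flops."""
    name: str
    depth: int
    width: int


def lower_memories(stmts: list, library: dict[str, tuple[int, int]], min_bits: int = 1024) -> list:
    """Map each Mem onto the smallest library macro (depth, width) that fits; keep tiny
    memories, or memories no macro can hold, as flip-flop arrays. Other statements pass through."""
    lowered = []
    for s in stmts:
        if not isinstance(s, Mem):
            lowered.append(s)
            continue
        fits = [(d * w, macro) for macro, (d, w) in library.items() if d >= s.depth and w >= s.width]
        if fits and s.depth * s.width >= min_bits:
            lowered.append(MacroInst(s.name, min(fits)[1]))  # smallest total bit count wins
        else:
            lowered.append(RegArray(s.name, s.depth, s.width))
    return lowered


design = [Mem("icache_data", 512, 64), Mem("tiny_fifo", 8, 32)]
asic_lib = {"SRAM_1024x64": (1024, 64), "SRAM_512x64": (512, 64)}  # hypothetical macros
print(lower_memories(design, asic_lib))
print(lower_memories(design, {}))  # e.g., a target with no usable macros
```

The pass never inspects what the memories are for, so the same code could be applied unchanged to the output of any generator, which is the second form of reuse described above.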

The latest version of Chisel (3.0) adopts this approach and emits FIRRTL instead of Verilog, and as of this writing, more than 30 FIRRTL transformations have been written. These include SRAM insertion, dead-code elimination, and power estimation, among others.
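Dead-code elimination is perhaps the simplest of these to illustrate. The sketch below operates on a toy netlist (a Python dict, not FIRRTL; the node and port names are hypothetical) and keeps only the nodes that some output transitively depends on. Like the memory pass, it knows nothing about the design it is applied to.

```python
def eliminate_dead_code(netlist: dict, outputs: list) -> dict:
    """netlist maps node name -> (op, [operand names]); names absent from it are primary inputs.
    Returns the sub-netlist reachable backwards from `outputs`."""
    live: set = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in live or node not in netlist:
            continue  # already visited, or a primary input
        live.add(node)
        stack.extend(netlist[node][1])
    return {name: body for name, body in netlist.items() if name in live}


netlist = {
    "sum": ("add", ["a", "b"]),
    "dbg": ("xor", ["sum", "a"]),  # debug logic that no output reads any more
    "out": ("reg", ["sum"]),
}
print(eliminate_dead_code(netlist, ["out"]))
```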

IV. VERIFICATION AND OVERALL FLOW

Fig. 5 shows the overall generator-based flow developed by our team. As mentioned previously, verification and validation consume just as much time and effort as the design itself (if not more), and it is therefore critical that these activities benefit from the same level of re-use as the design activities do. In this context, it should be clear that if the design itself is being generated, all of the verification collateral (environments and tests) must be generated as well. In other words, every design generator must have an associated verification generator.

Following standard best practice, it is desirable for the verification and design generators to be developed by two independent individuals. A mechanism is therefore needed for the design generator to inform the verification generator about the details of the instance it has actually produced, i.e., which parameters were set and what their values were. To facilitate this communication, we have chosen to use the IP-XACT standard [4]. IP-XACT is an industry-standard XML schema for capturing metadata about an IP block; in addition to passing generator parameters, it is also used to capture information such as the registers, interfaces, and memory maps of a given instance.
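The hand-off can be pictured as a round trip through XML. The following sketch uses Python's standard `xml.etree.ElementTree`; the element and attribute names are a deliberately simplified, hypothetical stand-in and do not follow the actual IP-XACT schema, which is namespaced and considerably richer. It also assumes integer-valued parameters, and the FIR-filter parameters shown are illustrative only.

```python
import xml.etree.ElementTree as ET


def emit_metadata(name: str, params: dict, registers: dict) -> str:
    """Design-generator side: record the parameters and register offsets of one instance."""
    comp = ET.Element("component", name=name)
    params_el = ET.SubElement(comp, "parameters")
    for key, value in params.items():
        ET.SubElement(params_el, "parameter", name=key, value=str(value))
    regs_el = ET.SubElement(comp, "registers")
    for reg, offset in registers.items():
        ET.SubElement(regs_el, "register", name=reg, offset=hex(offset))
    return ET.tostring(comp, encoding="unicode")


def read_metadata(xml_text: str) -> tuple:
    """Verification-generator side: recover what was actually generated."""
    comp = ET.fromstring(xml_text)
    params = {p.get("name"): int(p.get("value")) for p in comp.iter("parameter")}
    regs = {r.get("name"): int(r.get("offset"), 16) for r in comp.iter("register")}
    return comp.get("name"), params, regs


xml_text = emit_metadata("fir", {"taps": 16, "width": 12}, {"CTRL": 0x0, "COEFF_BASE": 0x40})
print(xml_text)
print(read_metadata(xml_text))
```

The verification side never needs to know how the design generator chose its parameters; it only reads what was recorded for this particular instance.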

The task of developing a verification generator can be conceptually split into two pieces. The first is to construct the scoreboard, i.e., the set of vectors used to exercise the design along with their expected values (or ranges), all of which may be functions of the design parameters. As indicated in Fig. 5, our initial flow implemented this with a Python script, but almost any general-purpose programming language would suffice. The second task is to actually construct the testbench while ensuring that all of the interfaces are connected correctly. Although conceptually straightforward, this “simple” step, when done manually, can actually turn out to be a dominant contributor to the time spent on verification and debugging; with a complex SoC containing 100’s to 1000’s of blocks and millions of interface signals, it is perhaps not surprising that this is the case.
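As an illustration of a scoreboard that is a function of design parameters, the sketch below produces corner-case and seeded random vectors, together with expected results, for any width. It is written in the spirit of the Python script mentioned above but is not that script, and the block under test (a W-bit unsigned saturating adder) is hypothetical.

```python
from __future__ import annotations

import random


def saturating_add_scoreboard(width: int, n_random: int = 20, seed: int = 0) -> list[tuple[int, int, int]]:
    """(a, b, expected) vectors for a hypothetical `width`-bit unsigned saturating adder."""
    max_val = (1 << width) - 1
    rng = random.Random(seed)  # seeded, so a regression rerun sees the same vectors
    corners = [(0, 0), (max_val, 0), (max_val, max_val), (max_val // 2 + 1, max_val // 2 + 1)]
    randoms = [(rng.randint(0, max_val), rng.randint(0, max_val)) for _ in range(n_random)]
    # The expected value depends on the parameter: the sum clamps at 2**width - 1.
    return [(a, b, min(a + b, max_val)) for a, b in corners + randoms]


for w in (4, 12):
    vectors = saturating_add_scoreboard(w, n_random=2)
    print(w, vectors[:4])
```

In the flow described here, `width` would be read from the instance's metadata rather than typed in by hand, so a new instance automatically gets a matching scoreboard.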

Fortunately, Cadence recently released a product named Verification Workbench that precisely addresses this issue of automated verification environment generation. Given an IP-XACT description of the block, Verification Workbench programmatically makes all of the connections and constructs the testbench environment, avoiding the errors introduced by a manual approach and naturally meshing with the overall generator-based approach. In addition, Verification Workbench includes VIPs for a wide variety of standard interfaces (e.g., AXI-4 and AXI-4 Stream), as well as the ability both to automatically verify correctness (compliance with the standard) and to characterize performance (peak throughput) of crossbars that use such standard interfaces. These facilities once again lend themselves perfectly to an agile generator-based flow, where one may be iteratively producing 10’s or even 100’s of instances from a generator, and where quick feedback about the correctness and performance of those instances is critical.
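To show why this connection step benefits from automation, here is a toy version of metadata-driven connection. It is a hypothetical Python sketch and does not reflect how Verification Workbench is implemented or any of its interfaces: ports are paired by name, and direction clashes, width mismatches, and unconnected ports are reported instead of being silently wired. Only input/output directions are modeled.

```python
from __future__ import annotations

FLIP = {"input": "output", "output": "input"}


def auto_connect(dut_ports: dict[str, tuple[str, int]],
                 tb_ports: dict[str, tuple[str, int]]) -> tuple[list[str], list[str]]:
    """Pair DUT and testbench ports by name. Each dict maps port name -> (direction, width).
    Returns (wire declarations, error messages)."""
    wires, errors = [], []
    for name, (direction, width) in sorted(dut_ports.items()):
        if name not in tb_ports:
            errors.append(f"unconnected DUT port {name}")
            continue
        tb_direction, tb_width = tb_ports[name]
        if tb_direction != FLIP.get(direction):
            errors.append(f"direction clash on {name}")
        elif tb_width != width:
            errors.append(f"width mismatch on {name}: {width} vs {tb_width}")
        else:
            vector = f"[{width - 1}:0] " if width > 1 else ""
            wires.append(f"wire {vector}{name}_w;")
    errors += [f"dangling testbench port {n}" for n in sorted(set(tb_ports) - set(dut_ports))]
    return wires, errors


dut = {"clk": ("input", 1), "data_in": ("input", 12), "data_out": ("output", 12)}
tb = {"clk": ("output", 1), "data_in": ("output", 12), "data_out": ("input", 12)}
print(auto_connect(dut, tb))
print(auto_connect(dut, {**tb, "data_out": ("input", 16), "irq": ("input", 1)}))
```

With port lists taken from generated metadata on both sides, every new instance is checked the same way, which is what makes iterating over many instances practical.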

Fig. 4: Interaction between implementation technology and RTL.

Fig. 5: Overall generator-based design and verification flow.


V. CONCLUSION

In order to address the NRE bottleneck associated with today’s ASIC design flows, we have developed a generator-based flow that facilitates dramatically improved re-use over black-box IP, and hence an agile approach to hardware design. Generator technology, in particular Chisel (for RTL generation), FIRRTL (for separation of concerns and process-specific optimizations), and Verification Workbench (to automatically construct and run testbenches for generated instances), is applied across all portions of an SoC design. (Although not described in this paper, an associated paper describes our approach to generating analog/mixed-signal circuitry as well.) We have used this generator-based flow to tape out multiple SoCs in TSMC’s 16nm process that contain both general-purpose compute capabilities and custom DSP and analog/mixed-signal blocks. Despite the flow and methodology being developed at the same time as the SoC generators and instances, these SoCs were designed in ~60% of the time that an industrial chip-effort estimator predicted for an SoC of similar complexity in 28nm. It is further worth noting that, after a high-level parameter is changed (e.g., addition or removal of some specific DSP block, or modification of the sample rate/bitwidths), producing a new SoC instance with brand-new RTL, memory maps, and verification collateral takes a matter of hours. The only remaining potential bottleneck is therefore digital physical design, to which we are currently in the midst of applying generator technology as well.

ACKNOWLEDGMENT

This work was funded by the DARPA CRAFT program (HR0011-16-C-0052). The author, who is merely representing the results of the team in this paper, would like to acknowledge the entire UCB-Cadence-NGC CRAFT team for their talents, efforts, and accomplishments, including the Co-PIs Borivoje Nikolic and Jonathan Bachrach from UC Berkeley, Matthew Doerflein and Steven Shauck from Northrop Grumman, and Michael Stellfox and Joseph Cole from Cadence.

REFERENCES

[1] A. Fox and D. Patterson, “Engineering Software as a Service: An Agile Approach Using Cloud Computing,” 1st edition, May 2016.
[2] J. Bachrach, H. Vo, B. Richards, Y. Lee, A. Waterman, R. Avizienis, J. Wawrzynek, and K. Asanovic, “Chisel: Constructing Hardware in a Scala Embedded Language,” IEEE/ACM Design Automation Conference, Jun. 2012.
[3] https://llvm.org/
[4] http://accellera.org/activities/working-groups/ip-xact/
