S2CBench: Synthesizable SystemC Benchmark Suite for High-Level Synthesis
Benjamin Carrion Schafer, Anushree Mahapatra
The Hong Kong Polytechnic University, Department of Electronic and Information Engineering
[email protected], [email protected]
@DARC_lab
Outline
• Motivation for a Synthesizable SystemC Benchmark Suite
• Requirements for S2CBench
• A review of past HLS benchmarks
• S2CBench overview
– 12 synthesizable designs
– 1 non-synthesizable design (tests floating point and trigonometric functions)
• Benchmark composition overview
• Detailed benchmark characteristics
– Size
– Complexity
– Arithmetic operations
• How to download
• Conclusions
Motivation for S2CBench
HLS tool input languages
• Different HLS tools support different input languages
• One common language supported by all HLS tools: SystemC
Vendor | Tool Name | Supported Languages
Cadence (Forte) | Cynthesizer | SystemC
Cadence | C-to-Silicon | C, C++, SystemC
Calypto | CatapultC | C++, SystemC
NEC | CyberWorkBench | C, SystemC
Xilinx | Vivado HLS | C, C++, SystemC
HLS tool evaluation
• No evaluation standards
• Not enough HLS models for RTL designs to compare QoR
Requirements for an HLS Benchmark
• Must cover different domains of applications
• Must test specific optimization techniques of HLS tools
• Must avoid re-writing descriptions for tool evaluations
S2CBench enables the direct comparison of commercial HLS tools
HLS Benchmarks: A Review
CHStone
• HLS benchmark based on ANSI-C
• ANSI-C not supported by all HLS tools
• C does not support fixed-point operations
HLS92,95
• Written in behavioral VHDL
• Targets various optimization options
• Currently not supported by any HLS tool
MiBench, MediaBench
• Written in C
• Do not support fixed-point operations
• Do not test specific HLS features
S2CBench: An Overview
12+1 SystemC benchmarks which comply with the latest SystemC synthesizable subset draft
• Cover different domains:
– Automotive
– Security
– Telecommunication
– Consumer
• Contain control-dominant and data-dominant designs
• 12 synthesizable designs
• 1 non-synthesizable design
– Contains trigonometric and floating-point operations
– Helps check the ability of tools to support them
S2CBench: Features
• Synthesis optimizations (e.g. loop unrolling, pipelining, function inlining, array synthesis)
• Tool language support (e.g. templates, structures, fixed point data types)
• Tool performance (e.g. synthesis run-time, accuracy of synthesis report)
Every application features a testbench along with associated test vectors
S2CBench model
• Testbench functions:
– TB sends stimuli data stored in files (test vectors) to the UUT
– TB receives the output data from the UUT and compares it against the golden output (stored in a file)
– TB reports whether the results match
• Option to dump VCD file
• Test vectors are modifiable
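The testbench flow described above (send stimuli, collect results, compare against the golden output, report) can be sketched in plain C++. This is only an illustration under stated assumptions: the UUT is modelled as an ordinary function instead of a SystemC module, the files are modelled as generic input streams holding whitespace-separated integers, and the names `TbReport`, `read_values` and `run_testbench` are hypothetical, not taken from S2CBench.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <vector>

// Summary of one testbench run.
struct TbReport {
    std::size_t compared = 0;    // number of values compared
    std::size_t mismatches = 0;  // number of values that differ from golden
    bool passed() const { return compared > 0 && mismatches == 0; }
};

// Reads whitespace-separated integers from a stream (test vectors or golden output).
inline std::vector<int> read_values(std::istream& in) {
    std::vector<int> values;
    int v = 0;
    while (in >> v) values.push_back(v);
    return values;
}

// Feeds every stimulus to the UUT and compares each result with the golden value.
// Assumption: the UUT is a plain function here, not a SystemC module.
inline TbReport run_testbench(std::istream& stimuli, std::istream& golden,
                              const std::function<int(int)>& uut) {
    assert(static_cast<bool>(uut));  // a UUT callback must be provided
    const std::vector<int> in = read_values(stimuli);
    const std::vector<int> expected = read_values(golden);
    const std::size_t n = in.size() < expected.size() ? in.size() : expected.size();
    const std::size_t longer = in.size() > expected.size() ? in.size() : expected.size();

    TbReport report;
    for (std::size_t i = 0; i < n; ++i) {
        ++report.compared;
        if (uut(in[i]) != expected[i]) ++report.mismatches;
    }
    // Entries present in only one of the two files count as mismatches.
    report.mismatches += longer - n;
    return report;
}
```

Because the inputs are streams, the same function works with `std::ifstream` objects opened on files such as a test vector file and a golden output file, or with `std::istringstream` objects when experimenting with modified test vectors.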
S2CBench: 12+1 designs
Type: dd = data dominant, cd = control dominant
Design | Type | Domain | Optimizations Tested
qsort | dd | Auto/Ind | Loops, arrays, function pointers
sobel | dd | Auto/Ind | Loops, functions, IO array expansion, multi-dimensional array expansion, fixed arrays (ROM, logic)
aes cipher | dd | Security | IO array expansion, multi-dimensional array expansion, large fixed arrays
kasumi | dd | Security | Multi-processes, delay report accuracy
md5c | dd | Security | #define macros, delay report accuracy
snow3G | dd | Security | Templates, delay report accuracy, function synthesis
adpcm | cd | Telecom | Structure synthesis
FFT | dd | Telecom | Floating point, trigonometric functions
FIR | dd | Consumer | IO array expansion, arrays, loops, functions, sum of products
Decimation | dd | Consumer | Resource sharing across loops, fixed point data types
Interp | dd | Consumer | Polynomial decomposition, fixed point data types, sum of products
IDCT | dd | Consumer | #include statement to initialize arrays, loops, functions
Disparity | cd/dd | Consumer | Hierarchical design, multi-dimensional array expansion, synthesis running time
Detailed Benchmark Contents for each Design
SystemC files
• Top module including UUT and TB (Main.cpp)
• Design description (<benchmark>.cpp/.h)
• Testbench module (tb_<benchmark>.cpp/.h)
Stimuli data
• Test vector (input.txt)
• Golden output (outputs_golden.txt)
• Image files (some applications)
Makefile
• Allows different make options
– make: generates executable binary (default)
– make wave: binary that generates a VCD file
– make debug: debug version of the binary
Qsort – Quick Sort
• Description
– Sorts data in ascending order using the well-known quick sort algorithm
• Optimization options to be tested
– Loop unrolling
– Array synthesis (register or memory)
– Function synthesis with pointer argument support
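As an illustration of the constructs listed above (loops, a fixed-size array and functions taking pointer arguments), the following is a minimal quick sort sketch. It is not the S2CBench source: the iterative form with an explicit stack, the Lomuto partition scheme, the bound `kMaxSize` and all function names are assumptions chosen to avoid recursion, which HLS tools commonly do not accept.

```cpp
#include <cassert>

const int kMaxSize = 64;  // hypothetical upper bound on the number of elements

// Swaps two values through pointer arguments.
inline void swap_values(int* a, int* b) {
    const int t = *a;
    *a = *b;
    *b = t;
}

// Lomuto partition of data[lo..hi]; returns the final index of the pivot.
inline int partition_range(int* data, int lo, int hi) {
    const int pivot = data[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j) {
        if (data[j] < pivot) {
            swap_values(&data[i], &data[j]);
            ++i;
        }
    }
    swap_values(&data[i], &data[hi]);
    return i;
}

// Sorts data[0..n-1] in ascending order without recursion.
// Pending sub-ranges are kept in two fixed-size arrays used as a stack.
inline void quick_sort(int* data, int n) {
    assert(n >= 0 && n <= kMaxSize);
    int stack_lo[kMaxSize];
    int stack_hi[kMaxSize];
    int top = 0;
    if (n < 2) return;
    stack_lo[top] = 0;
    stack_hi[top] = n - 1;
    ++top;
    while (top > 0) {
        --top;
        const int lo = stack_lo[top];
        const int hi = stack_hi[top];
        const int p = partition_range(data, lo, hi);
        // Only sub-ranges with at least two elements still need sorting.
        if (lo < p - 1) {
            stack_lo[top] = lo;
            stack_hi[top] = p - 1;
            ++top;
        }
        if (p + 1 < hi) {
            stack_lo[top] = p + 1;
            stack_hi[top] = hi;
            ++top;
        }
    }
}
```

Each partition step places one pivot and adds at most one net entry to the stack, so the stack never holds more than `n` ranges and the fixed bound `kMaxSize` is sufficient.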
Sobel
• Description
– Edge-detection algorithm that takes a bitmap image directly as the input and returns a new bitmap image consisting solely of the edges of the original image
• Optimization options to be tested
– Nested loop unrolling and pipelining optimizations
– I/O port expansion (expand inputs specified as arrays to individual ports)
– Multi-dimensional array expansion
– Fixed arrays synthesized as logic or ROMs
– Pointer arguments to functions
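The kernel of a Sobel edge detector shows why this design exercises nested loops, multi-dimensional arrays and fixed (constant) arrays. The sketch below computes one output pixel from a 3x3 window using the standard Sobel masks. The magnitude approximation |gx| + |gy|, the clamping to 8 bits and the names `kGx`, `kGy` and `sobel_pixel` are assumptions for illustration and may differ from the benchmark code.

```cpp
#include <cstdlib>

// Standard 3x3 Sobel masks, held in fixed two-dimensional arrays
// (candidates for synthesis as logic or ROM).
const int kGx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
const int kGy[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

// Computes the edge strength of the centre pixel of a 3x3 window.
inline unsigned char sobel_pixel(const unsigned char window[3][3]) {
    int gx = 0;
    int gy = 0;
    // Nested loops with constant bounds: candidates for full unrolling.
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            gx += kGx[r][c] * window[r][c];
            gy += kGy[r][c] * window[r][c];
        }
    }
    // Assumed magnitude approximation |gx| + |gy|, clamped to the 8-bit range.
    const int mag = std::abs(gx) + std::abs(gy);
    return static_cast<unsigned char>(mag > 255 ? 255 : mag);
}
```

A uniform window produces 0 (no edge), while a window whose right column is brighter than the other two produces a positive response from the horizontal mask only.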
AES - Advanced Encryption Standard Cipher
• Description
– Performs encryption using the Advanced Encryption Standard (AES) cipher algorithm
• Optimization options to be tested
– Contains a large number of small for loops having inter-loop data dependencies
– Input port expansion
– Array synthesis (memory or registers)
– Function synthesis (inline, goto)
– Large fixed arrays synthesized as logic or ROMs
Kasumi
• Description
– Block cipher algorithm used in mobile communication systems
– Composed of two sc_threads and multiple functions
• Optimization options to be tested
– Ability to synthesize large logic operations
– Accuracy of estimating the critical path
– Multi-process systems verification
MD5C - Message Digest Algorithm
• Description
– Generates hash values used to check data integrity
• Optimization options to be tested
– Function synthesis
– Arrays of different bit widths
– Different levels of loop nesting
– Extensive use of #define macros (language support)
Snow3G
• Description
– Stream cipher that produces a key stream consisting of 32-bit blocks using a 128-bit key
– Contains a variable length multiplication operation
• Optimization options to be tested
– Support of templates
– Loops, functions and array synthesis
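To give a feel for what template support means for a synthesizer, the sketch below expresses the MULx and MULxPOW operations from the SNOW 3G specification with the repetition count as a template parameter, so that it is resolved at compile time. Using a template for this purpose and the name `MulxPow` are assumptions made for illustration; the slides do not say how the benchmark itself uses templates.

```cpp
#include <cstdint>

// MULx as defined in the SNOW 3G specification: shift the byte left by one
// and XOR with the constant c if the most significant bit of v was set.
inline std::uint8_t mulx(std::uint8_t v, std::uint8_t c) {
    const std::uint8_t shifted = static_cast<std::uint8_t>(v << 1);
    return (v & 0x80) ? static_cast<std::uint8_t>(shifted ^ c) : shifted;
}

// MULxPOW(v, I, c): applies MULx I times. The count I is a template
// parameter (an illustrative choice), so the recursion is expanded at
// compile time and no run-time loop counter is needed.
template <int I>
struct MulxPow {
    static std::uint8_t apply(std::uint8_t v, std::uint8_t c) {
        return mulx(MulxPow<I - 1>::apply(v, c), c);
    }
};

// Base case: zero applications return the input unchanged.
template <>
struct MulxPow<0> {
    static std::uint8_t apply(std::uint8_t v, std::uint8_t) { return v; }
};
```

For example, `MulxPow<2>::apply(0x40, 0xA9)` first yields 0x80 and then, because the top bit is set, 0x00 XOR 0xA9 = 0xA9.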
ADPCM - Adaptive Differential Pulse-Code Modulation (encoder part only)
• Description
– Accepts 16-bit Pulse Code Modulation (PCM) samples as input and converts them into 4-bit samples
• Optimization options to be tested
– Loop unrolling, function synthesis, array synthesis
– Support for structure synthesis
FFT – Fast Fourier Transform
• Description
– Converts time/space to frequency and vice versa
– Not synthesizable as per latest synthesis draft
• Optimization options to be tested
– Support for floating point operations and trigonometric functions
• Why is it included
– Most HLS vendors do support floating-point and trigonometric operations
– Helps evaluation engineers analyze the extent of support of these operations
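The kind of code that makes this design fall outside the synthesizable subset can be seen in a small radix-2 FFT written with `float` data and calls to `cos` and `sin`. The sketch below is a generic in-place decimation-in-time FFT, not the benchmark source; the function name `fft_radix2` and the use of `std::complex` and `std::vector` are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::complex<float> Cf;

// In-place radix-2 decimation-in-time FFT. The size must be a power of two.
inline void fft_radix2(std::vector<Cf>& x) {
    const std::size_t n = x.size();
    assert(n != 0 && (n & (n - 1)) == 0);

    // Reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }

    const float pi = 3.14159265358979f;
    // Combine sub-transforms of length len/2 into transforms of length len.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const float angle = -2.0f * pi / static_cast<float>(len);
        for (std::size_t start = 0; start < n; start += len) {
            for (std::size_t k = 0; k < len / 2; ++k) {
                // Twiddle factor: floating-point cos/sin evaluated at run time.
                const float a = angle * static_cast<float>(k);
                const Cf w(std::cos(a), std::sin(a));
                const Cf u = x[start + k];
                const Cf v = x[start + k + len / 2] * w;
                x[start + k] = u + v;
                x[start + k + len / 2] = u - v;
            }
        }
    }
}
```

The twiddle-factor line is where a tool's support for floating-point and trigonometric operations is exercised. A unit impulse as input transforms to a spectrum of all ones, which makes a convenient sanity check.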
FIR – Finite Impulse Response Filter
• Description
– 10-tap FIR filter algorithm designed for 8-bit integer operations
• Optimization options to be tested
– loop unrolling and pipelining
– automatic array expansion of the I/O ports
– pointers to functions
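A 10-tap FIR filter on 8-bit integers reduces to a shift of the delay line followed by a sum of products, which is what makes it a natural target for loop unrolling and pipelining. The sketch below is an assumption-based illustration: signed 8-bit samples and coefficients, a 32-bit accumulator, and the names `kTaps` and `fir_step` are not taken from the benchmark.

```cpp
#include <cstdint>

const int kTaps = 10;  // number of filter taps, as stated in the description

// Processes one input sample: shifts it into the delay line and returns the
// sum of products of the delay line and the coefficients.
inline std::int32_t fir_step(std::int8_t sample,
                             std::int8_t delay_line[kTaps],
                             const std::int8_t coeffs[kTaps]) {
    // Shift loop: with a fixed bound, it can be fully unrolled into registers.
    for (int i = kTaps - 1; i > 0; --i) {
        delay_line[i] = delay_line[i - 1];
    }
    delay_line[0] = sample;

    // Sum-of-products loop: candidate for unrolling or pipelining.
    std::int32_t acc = 0;
    for (int i = 0; i < kTaps; ++i) {
        acc += static_cast<std::int32_t>(delay_line[i]) * coeffs[i];
    }
    return acc;
}
```

Feeding a single 1 followed by zeros returns the coefficients one after another (the impulse response), which is an easy way to build a golden output for such a filter.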
Decimation Filter
• Description
– 5-stage decimation filter. Consists of 5 FIR filters cascaded together, where the output of one stage is the input to the next stage
• Optimization options to be tested
– Resource sharing of the Multiply Accumulate (MAC) operations across loops
– Fixed-point data types and their different rounding and saturation modes
– Ability to preserve the sum of products (SoP) constructs
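The effect of rounding and saturation modes can be emulated in plain C++ to see what a tool must implement when a wide accumulator is narrowed to a smaller fixed-point type. The sketch below narrows a value to a signed 8-bit result. The four modes are modelled on the truncate/round and wrap/saturate behaviours offered by SystemC fixed-point types (SC_TRN, SC_RND, SC_WRAP, SC_SAT), but the enum names, the function `quantize` and the 8-bit target width are hypothetical choices for this example.

```cpp
#include <cassert>
#include <cstdint>

enum RoundMode { kTruncate, kRoundHalfUp };  // how dropped fraction bits are handled
enum OverflowMode { kWrap, kSaturate };      // how out-of-range values are handled

// Converts `value`, which has `in_frac` fractional bits, to a signed 8-bit
// result with `out_frac` fractional bits (out_frac <= in_frac).
inline std::int8_t quantize(std::int32_t value, int in_frac, int out_frac,
                            RoundMode round, OverflowMode overflow) {
    const int shift = in_frac - out_frac;
    assert(shift >= 0 && shift < 31);
    const std::int64_t scale = std::int64_t(1) << shift;

    std::int64_t v = value;
    // Rounding: add half of the new least significant bit before dropping bits.
    if (round == kRoundHalfUp && shift > 0) v += scale / 2;

    // Drop the extra fraction bits, rounding toward minus infinity.
    std::int64_t q = v / scale;
    if (v % scale < 0) --q;

    if (overflow == kSaturate) {
        if (q > 127) q = 127;
        if (q < -128) q = -128;
        return static_cast<std::int8_t>(q);
    }
    // Wrap: keep only the low 8 bits, interpreted as a two's complement value.
    std::int64_t w = ((q % 256) + 256) % 256;
    if (w > 127) w -= 256;
    return static_cast<std::int8_t>(w);
}
```

With one fractional bit dropped, the value 2.5 (stored as 5) becomes 2 under truncation and 3 under rounding, while an out-of-range value such as 300 becomes 127 under saturation and 44 under wrap-around.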
Interpolation Filter
• Description
– 4-stage interpolation filter
• Optimization options to be tested
– automatic polynomial decompositions (mathematical optimization of the HLS tool)
– fixed-point data types and their different rounding and saturation modes
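One well-known example of restructuring a polynomial to reduce the number of multipliers is Horner's rule. The slides do not state which decompositions a given HLS tool applies, so the sketch below is only an illustration of the general idea: the same cubic is evaluated in its direct form and in Horner form, and both must produce the same result. The function names and the fixed degree of 3 are assumptions.

```cpp
#include <cstdint>

// Direct form: c0 + c1*x + c2*x^2 + c3*x^3.
// As written, this uses six multiplications.
inline std::int64_t poly_direct(const std::int64_t c[4], std::int64_t x) {
    return c[0] + c[1] * x + c[2] * x * x + c[3] * x * x * x;
}

// Horner form: ((c3*x + c2)*x + c1)*x + c0.
// Same value, but only three multiplications, one per loop iteration.
inline std::int64_t poly_horner(const std::int64_t c[4], std::int64_t x) {
    std::int64_t acc = c[3];
    for (int i = 2; i >= 0; --i) {
        acc = acc * x + c[i];
    }
    return acc;
}
```

For coefficients {1, 2, 3, 4} and x = 2, both forms give 1 + 4 + 12 + 32 = 49. A tool that performs this kind of decomposition automatically could allocate fewer multipliers for the direct description.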
IDCT - Inverse Discrete Cosine Transform
• Description
– Describes a finite sequence of data points as a sum of cosine functions of different frequencies
• Optimization options to be tested
– initialization of an array using #include statement
– loops, functions, array synthesis
Disparity – Stereoscopic Disparity Estimator
• Description
– estimates the disparity in a stereoscopic image.
– It is the largest of all the designs and consists of 4 processes executed in parallel
• Optimization options to be tested
– Synthesis running time of the HLS tool
– Verification of Multi-process (threads) systems
Detailed Benchmark Characteristics
[Figure: Variety of operations]
[Figure: Distribution of number of program lines. Note: only lines of code of the synthesizable description are counted (not including testbench)]
[Figure: Complexity distribution of designs. Horizontal axis: applications (qsort, sobel, aes cipher, kasumi, md5c, snow3G, adpcm, fft, fir, decim, interp, idct, disparity). Vertical axis: number (0 to 25). Series: processes, functions, number of arrays, if statements, for loops, while loops]
Publicly Available for Download
• www.s2cbench.org
• http://sourceforge.net/projects/s2cbench/
More Information
B. Carrion Schafer and A. Mahapatra, "S2CBench: Synthesizable SystemC Benchmark Suite for High-Level Synthesis," IEEE Embedded Systems Letters, accepted for publication.
YouTube Channel: DARClabify
Summary and Conclusions
• A benchmark suite in a common language supported by all major HLS vendors
• Each benchmark tests unique HLS features
1. Tool language support
2. Synthesis optimizations
3. Tool performance
• Benchmarks include testbench with inputs, golden outputs and option to generate VCD file
• Publicly available at www.s2cbench.org or sourceforge.net