
Copyright Notice and Proprietary Information

© 2020 Silexica GmbH. All rights reserved. This software documentation contains confidential and proprietary information that is the property of Silexica GmbH. The software documentation is furnished under a license agreement and may be used or copied only in accordance with the terms of the license agreement. No part of the software documentation may be reproduced, transmitted, or translated, in any form or by any means, electronic, mechanical, manual, optical, or otherwise, without prior written permission of Silexica GmbH, or as expressly provided by the license agreement.

Destination Control Statement

All technical data contained in this publication is subject to the export control laws of the German Federal Republic. Disclosure to nationals of other countries contrary to German law is prohibited. It is the reader’s responsibility to determine the applicable regulations and to comply with them.

Disclaimer

SILEXICA GMBH MAKES NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, WITH REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Trademarks

Silexica and certain Silexica product names are trademarks of Silexica GmbH. All other product or company names may be trademarks of their respective owners.

Third-Party Links

Any links to third-party websites included in this document are for your convenience only. Silexica does not endorse and is not responsible for such websites and their practices, including privacy practices, availability, and content.

Third-Party Open-source Software

SLX contains third-party open-source software components. The full list of third-party software components and their licenses can be found in the data/third-party-licenses sub-directory of the SLX installation and online at https://www.silexica.com/tps/.

Silexica GmbH
Lichtstr. 25
50825 Cologne, Germany
www.silexica.com


Contents

Preface
Revision History
Typographic Conventions

1 Introduction
1.1 The SLX FPGA Solution

2 HLS Application Development with SLX FPGA
2.1 User Workflow
2.2 Starting SLX
2.3 SLX FPGA Projects
2.3.1 Creating a New SLX FPGA Project
2.3.2 Importing an Existing SLX FPGA Project
2.3.3 Importing and Converting an Existing SLX C/C++ Project
2.3.4 A Look at the Project Files
2.3.5 Importing an Existing C/C++ Source Code Tree
2.3.6 Importing a Xilinx Vivado HLS Project
2.4 Preparation and Setup
2.4.1 SLX Preference Configuration Page
2.4.2 Configuring SLX to Use Xilinx
2.5 A Look at the Toolbar and Menus
2.6 Build, Run and Debug Applications
2.6.1 Configure the Build System
2.6.2 Run Code
2.6.3 Debug Application Code
2.7 Application Transformation towards Optimized IP Block
2.7.1 Configure Project
2.7.2 Function Mapping Editor
2.7.3 Discovering and Resolving Synthesizability Issues
2.7.4 Finding and Optimizing Parallel Loops
2.7.5 Hardware Optimization
2.7.6 Generate HLS-Aware Code
2.7.7 Synthesize
2.8 Code Analysis Tools and Views
2.8.1 Trace Source
2.8.2 Line Profile
2.8.3 Code Coverage
2.8.4 SLX Hints
2.8.5 Software Call Graph
2.8.6 Analysis Graph
2.8.7 Memory Analysis

3 Provided Examples

4 Application and Parallelization Hints
4.1 Hint Format and Syntax
4.2 Application Hints
4.2.1 Function Selected for Parallelization
4.2.2 Function Excluded from Parallelization
4.2.3 Function Identified as a Hotspot
4.2.4 Function Rejected as a Hotspot
4.2.5 No Valid Parallelization Candidates Found
4.2.6 Function does not Exist
4.2.7 Function Requested for Exclusion does not Exist
4.2.8 Function cannot be both a user-defined candidate and requested for exclusion
4.2.9 Function has Dynamic Data Dependencies that Hinder its Parallelization
4.2.10 Number of Functions not Selected by the User
4.2.11 Number of Functions not Automatically Selected
4.2.12 A Requested Hardware Function was not Executed
4.2.13 A Requested Hardware Function cannot be Excluded
4.2.14 A Requested Hardware Function will be Automatically Added to User-Defined Candidates
4.2.15 Function main is not Supported as a Hardware Function
4.2.16 No Hardware Functions Will be Selected
4.3 Parallelization Hints
4.3.1 The loop carries dependencies that can be ignored
4.3.2 Considered Unroll Factors
4.3.3 The loop will not be unrolled because it has not been iterated
4.3.4 The loop carries dependencies
4.3.5 Pipelining will be considered

5 HLS Code Generation Hints
5.1 HLS Hints
5.1.1 Inserted unroll Pragma
5.1.2 Inserted pipeline Pragma
5.1.3 Inserted loop_tripcount Pragma
5.1.4 Inserted array_partition Pragma
5.1.5 Inserted array_reshape Pragma
5.1.6 Inserted inline Pragma
5.1.7 Inserted interface Pragma
5.1.8 Non-synthesizable Function Call was Substituted
5.1.9 Function main() will not be considered for synthesis
5.1.10 Synthesizability Check
5.1.11 No Code With HLS Pragmas was Generated
5.1.12 There are No Files to be Extended with HLS Information
5.1.13 Latency and Initiation Interval of Hardware Function
5.1.14 Ignore a Top-Level Hardware Function that is Optimized Away
5.1.15 Time was Not Estimated for a Top-Level Hardware Function
5.1.16 No Pragma Generated because a Loop is Declared in a Macro
5.1.17 Cycle Time was Adjusted to Meet Timing
5.1.18 Clock Constraint was Violated for a Hardware Function
5.1.19 No Valid Solution to the Hardware Optimization Problem was Found
5.1.20 Cannot proceed due to exceptions
5.2 HLS pragmas
5.2.1 Unroll pragma
5.2.2 Pipeline pragma
5.2.3 Loop Trip Count pragma
5.2.4 Array Partition pragma
5.2.5 Array Reshape pragma
5.2.6 Inline pragma
5.2.7 Interface pragma

6 HLS Messages
6.1 Infos
6.1.1 No Synthesizability Checks for OpenMP
6.1.2 Lightweight Synthesizability Checks
6.1.3 Advanced Synthesizability Checks
6.1.4 Area exhausted but more optimizations are available
6.2 Warnings
6.2.1 Hardware Function Has Been Optimized Away
6.2.2 Area for the Hardware Functions Is Exceeded
6.2.3 Running Limited Synthesizability Checks Without the Xilinx Tools Enabled
6.2.4 Function Argument cannot be a PS/PL Interface
6.2.5 HW_FUNCTION not Found
6.3 Errors
6.3.1 Early Synthesizability Checks Failed
6.3.2 Hardware Functions with Synthesizability Errors Cannot Be Synthesized
6.3.3 Vivado HLS Failure
6.3.4 No Hardware Implementation If There Are No Files Under the BASEPATH

7 Configuration Variables
7.1 Build Options
7.1.1 USER_INIT
7.1.2 USER_SLX_MODE_SWITCH
7.1.3 USER_CLEAN
7.1.4 USER_BUILD_AND_RUN
7.1.5 USER_BUILD
7.1.6 USER_RUN
7.1.7 OPT_LVL_AFTER_INSTRUMENTATION
7.1.8 TARGET_LDFLAGS
7.2 Analysis Configuration Variables
7.2.1 TARGET
7.2.2 PLATFORM
7.2.3 CANDIDATE_THRESHOLD
7.2.4 FUNCTIONS
7.2.5 FUNCTION_EXCLUDES
7.2.6 BASEPATH
7.2.7 OPTIMISTIC_ANALYSIS
7.2.8 FIND_PLP, FIND_DLP
7.2.9 ANY_EXIT_STATUS
7.2.10 VERBOSITY_LEVEL
7.3 Code Generation
7.3.1 BASEPATH
7.4 HLS Code Generation
7.4.1 SYNTHESIS_FLOW
7.4.2 FPGA_PART
7.4.3 HW_FUNCTION
7.4.4 FPGA_CLOCK_FREQUENCY
7.4.5 DM_CLOCK_FREQUENCY
7.4.6 SYNTHESIS_MODE
7.4.7 EST_MODE
7.4.8 AUTO_COMPLETE_PARTITION_MAX_ELEMS
7.4.9 SYNCHK_CXXFLAGS
7.4.10 SKIP_XILINX_SYNTH_CHECKS
7.5 Minimal defines.mk
7.5.1 Sample Project with Makefile

8 SLX Views, Editors and Dialogs
8.1 Console View
8.1.1 Console view toolbar
8.2 SLX Tables
8.2.1 Table Interactions
8.2.2 Table Types
8.2.3 Column Types
8.3 Path mapping dialog

9 Debugging SLX Tools
9.1 Debug Mode
9.2 IDE Console Logs
9.3 Updating the Xilinx Device Totals file

10 Platform and Core Model Files
10.1 Cores

11 Supported Xilinx Libraries
11.1 Support for ap_int, ap_cint and ap_fixed
11.2 Support for hls::stream
11.3 C++ template class caveat

12 Supported Xilinx Versions, Platforms, and Parts
12.1 Xilinx Versions
12.2 Xilinx Platforms
12.3 Xilinx FPGA Parts

13 Known issues and limitations
13.1 Compatibility with Previous Results
13.2 Support for C++14 and later applications
13.3 IDE Rendering
13.4 GUI Font Sizes
13.5 Minimum Memory Requirements
13.6 Unsupported constructs in input code
13.6.1 fork() and vfork()
13.6.2 C++ exceptions
13.7 Non-Blocking Reads or Writes on hls::stream
13.8 Combined use of sdx::complex and ap_fixed
13.9 Modulo and division on ap_int larger than 128bits
13.10 Specifying Multiple Top-Level Hardware Functions
13.11 Templated code with multiple instantiations
13.12 Support for CMake
13.13 Support for hls_math library

Appendix

A High Level Synthesis Introduction

B Xilinx Vivado Installation and Setup
B.1 Setting up Xilinx HLS 2019.2
B.1.1 Downloading Xilinx HLS
B.1.2 Installing Xilinx HLS
B.1.3 Configuring SLX to use Xilinx
B.1.4 Configuring the Color Scheme for Xilinx Messages
B.2 Synthesizable code guidelines for C and C++

C Glossary

Acronyms and Abbreviations

List of Figures

List of Tables

Preface

Revision History

Date        Version     Revision
12/03/2019  2019.1      Official Release
08/08/2019  2019.2-sp1  Service Pack
27/09/2019  2019.2-sp2  Service Pack
31/01/2020  2019.4-sp1  Service Pack
06/05/2020  2020.1-sp1  Service Pack


Typographic Conventions

This manual uses specific typographic conventions. The following table summarizes how font styles are used to emphasize important elements throughout the text.

Style       Usage
Bold        Names of Silexica products, such as SLX for FPGA
Typewriter  Literal input, e.g., user input from the command line and actual code listings
Slanted     To introduce terminology used in the Silexica (SLX) series of products

Special symbols are used to denote useful information, a remark, a warning, an expected failure or a requirement for user interaction. They are used within a shaded box with rounded corners.

Symbol          Meaning
i               Useful tip
Z               A noteworthy point or remark
[warning icon]  A warning on tool-specific behavior
E               An expected error or failure
b               Required input
X               Required selection of choices

1 Introduction

Contemporary user applications, ranging from wireless communication, artificial intelligence, and robotics to embedded vision, often require performance enhancements that cannot be reached solely by deploying commercial off-the-shelf (COTS) processors. In such cases, dedicated or specialized hardware implementations of key system parts on an FPGA are essential.

The task of hand-designing optimized IP (Intellectual Property) blocks for implementing application hot-spots or critical algorithms is expensive, time consuming and dependent on highly specialized hardware design expertise. Traditional RTL-based design usually requires high development costs and a relatively long time to market. High-level synthesis (HLS) is an automated design process that employs high-level abstractions for algorithmic descriptions. HLS allows a developer to write C/C++ code to describe the behaviour of the hardware, data type specifications and interfaces. Synthesizable Verilog or VHDL code is generated as the result. HLS enables developers to optimize resources and processing speed by isolating them from low-level design decisions (i.e., creating control signals, declaring port directions and widths, declaring intermediate wires and registers).
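As an illustration only, the following minimal sketch shows the kind of C function an HLS flow might take as input. The function name `fir4` and the constant `N_TAPS` are hypothetical and do not come from the SLX examples. The sketch assumes nothing beyond standard C: fixed-size array arguments, explicit integer widths and a compile-time loop bound give a synthesis tool the information it needs to derive port widths and a static schedule.

```c
#include <stdint.h>

/* Hypothetical filter length, fixed at compile time (an assumption
 * made for this sketch, not a value from the SLX documentation). */
#define N_TAPS 4

/* Hypothetical top-level function: computes one output sample of a
 * 4-tap FIR filter. The 16-bit inputs and the 32-bit accumulator
 * describe the data types; the array arguments describe the interface. */
int32_t fir4(const int16_t window[N_TAPS], const int16_t coeff[N_TAPS])
{
    int32_t acc = 0;
    /* The loop bound is a constant, so the trip count is known statically. */
    for (int i = 0; i < N_TAPS; ++i) {
        acc += (int32_t)window[i] * (int32_t)coeff[i];
    }
    return acc;
}
```

The same function compiles and runs as ordinary software, which is what allows the behaviour to be validated on a workstation before any hardware is generated.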

Transforming arbitrary software code into an optimized and synthesizable HLS FPGA-ready representation, however, is a challenge that requires major application rewrite investments and a wealth of software and FPGA expertise. To successfully utilize HLS technology, the input specification must meet certain coding guidelines (see Xilinx Ultra Fast Design Methodology, Chapter 4: https://www.xilinx.com/support/documentation/sw_manuals/ug1197-vivado-high-level-productivity.pdf). The causes for these inefficiencies are inherent to shortcomings of the HLS engine, such as:

• No early estimations of performance as well as of any violations of design and platform constraints

• Not taking advantage of all possible ways of data communication

• The absence of optimizing transformations on the user’s code

• Unsupported code and inefficient code styles being reported too late
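To illustrate the last point, the sketch below contrasts a coding style that HLS compilers commonly reject with a rewritten form. It assumes the restriction documented for Vivado HLS that recursive functions are not synthesizable. The function names and `MAX_LEN` are hypothetical and chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical upper bound on the input length (assumption for this sketch). */
#define MAX_LEN 64

/* Recursive form: valid C, but the call depth depends on run-time data,
 * so an HLS compiler typically cannot map it to a fixed hardware structure. */
int sum_recursive(const int *data, size_t n)
{
    return (n == 0) ? 0 : data[n - 1] + sum_recursive(data, n - 1);
}

/* Iterative form with a compile-time loop bound: the guard keeps the
 * result identical for n <= MAX_LEN while the loop structure stays static. */
int sum_iterative(const int data[MAX_LEN], size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < MAX_LEN; ++i) {
        if (i < n) {
            acc += data[i];
        }
    }
    return acc;
}
```

Both functions return the same value for inputs of up to `MAX_LEN` elements. Finding such constructs before synthesis starts is the kind of early feedback the list above describes as missing.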

1.1 The SLX FPGA Solution

The SLX FPGA tool provides an iterative design approach that helps developers produce optimal designs while automating and expediting repetitive steps. An overview of the flow is shown in Figure 1.1, where its main steps are summarized.

Figure 1.1: SLX FPGA flow overview

The main input to the flow is an application whose execution needs to be optimized for a device that contains an FPGA fabric. Additionally, a model of the underlying platform including hardened processors, soft processors, and a logical model of the FPGA is required.

SLX FPGA is a full-fledged C/C++ editor based on the Eclipse CDT. SLX FPGA enables the development, testing and debugging of the application before code analysis and optimization, which can be time consuming.

After the application functionality has been confirmed, SLX FPGA enables checking the selected functions for issues that would prevent the Xilinx HLS compiler from synthesizing the application into valid hardware. If portions of the code are not fully synthesizable, SLX FPGA provides hints on how the code can be rewritten to be synthesizable. The code can then be modified or refactored using SLX FPGA synthesizability guidance (see HLS Hints Synthesizability Guidance).

The tool then performs a deep analysis of the code. It identifies performance hotspots and parallelism patterns that will benefit from concurrent execution. The detected parallelism patterns can be utilized to generate efficient IP blocks.
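As a rough illustration of what a parallelism pattern means at the source level, the sketch below shows two loops: one whose iterations are independent, and one that carries a dependency from each iteration to the next. The function names and `LEN` are hypothetical. The sketch does not describe how SLX FPGA performs its analysis; it only shows the distinction such an analysis has to make.

```c
/* Hypothetical array length used by both loops (assumption for this sketch). */
#define LEN 8

/* Independent iterations: out[i] depends only on in[i], so the
 * iterations could run concurrently, for example through unrolling. */
void scale(const int in[LEN], int out[LEN], int factor)
{
    for (int i = 0; i < LEN; ++i) {
        out[i] = in[i] * factor;
    }
}

/* Loop-carried dependency: each iteration needs the running total
 * produced by the previous one, which limits straightforward parallelism. */
void prefix_sum(const int in[LEN], int out[LEN])
{
    int running = 0;
    for (int i = 0; i < LEN; ++i) {
        running += in[i];
        out[i] = running;
    }
}
```

The first loop is a natural candidate for concurrent execution in hardware, while the second would typically need restructuring or pipelining rather than plain unrolling.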

SLX FPGA uses a hardware optimization engine for design space exploration of real-life industrial problems. Optimization is achieved in seconds or minutes, instead of days or weeks. The user is always in control and can edit the results of the optimization to customize the amount of parallelism exploited in the resulting IP blocks. The goal is to achieve the best trade-off between performance and area.

2

1.1. The SLX FPGA Solution

SLX FPGA provides automatic HLS-compatible code generation, which is seamlessly handed over to HLS-based third-party tools (Vivado HLS) and creates projects that can be used within the native development environments.

i This guide is based on the graphical user interface (GUI) of SLX. To use the tools from a command-line or integrate them into an in-house tool workflow, please contact Silexica for more details.

i Warnings and errors that are logged while using SLX are listed in the Problems View (at the bottom center of the GUI). They are also highlighted in color in the Console View.


2 HLS Application Development with SLX FPGA

This chapter explains how SLX FPGA can be used to optimize a sample application called workshop_fpga that is shipped with the tool. This application is tailored to demonstrate the tool features in a simplified, understandable fashion.

The application consists of a series of function calls, and the goal is to generate an optimized IP block for the hotspots by inserting the right combination of HLS pragmas in the hotspot source code. Figure 2.1 shows the typical workflow for using SLX FPGA to optimize an application function for execution on a Xilinx device.

2.1 User Workflow

Figure 2.1 displays the detailed workflow for using SLX FPGA to optimize an application function for execution on a Xilinx device.


Figure 2.1: User workflow with SLX FPGA


• First, a Workspace is specified or created that contains all project files, including source files and optimization results. New projects can be created or existing projects imported into the Workspace (Section 2.3). The application code can then be compiled and run. To compile the application, a minimal set of configuration options needs to be set (Section 2.6.1). The application can then be validated and debugged (if necessary) within the SLX FPGA IDE, as explained in Section 2.6.2 and Section 2.6.3.

• Once the application behavior has been validated, the SLX FPGA capabilities are ready to be used. These capabilities help transform the application so it can be understood by the Xilinx HLS compiler (Section 2.7). The application is refactored to expose parallelism and annotated with HLS compiler directives to produce an optimized IP block. The application is then synthesized and packaged for further use in a complex system. The SLX FPGA Function Mapping Editor is at the center of all these steps, providing a centralized interface to interact with the IP block design (see Section 2.7.2).

• The functions for which IP blocks need to be created are inspected for language constructs that are not supported by the Xilinx HLS compiler (the workflow step Check for Synthesizability, Section 2.7.3). SLX FPGA provides an understanding of the structure of the application by first creating a static call graph in the form of the Function Mapping Graph. Using this graph, unsupported language constructs in target functions can be checked using SLX FPGA synthesizability guidance. The code can be refactored to remove such constructs, then compiled and validated until there are no more synthesizability issues.

• Once the target functions are free of synthesizability issues, the Function Mapping Graph is used to map functions to the programmable logic (i.e., the FPGA). This step in the workflow is known as HW Optimization (Section 2.7.5). The Find and Optimize Parallel Loops functionality is used to analyze these functions for parallelism and collect dynamic execution information to provide details for their memory accesses. The information is used to extract parallelism for the functions (Section 2.7.4).

• SLX FPGA design space exploration algorithms will then determine the optimal combination of HLS pragmas that will allow Xilinx tools to produce an optimized IP block for the target functions. This step is known as HLS Pragma Generation in the workflow (Section 2.7.6). (Note: the result of this action will vary depending on whether all functions found to be Synthesizable have been mapped to the FPGA. If not all Synthesizable functions have been mapped to the FPGA, running Find and Optimize Parallel Loops may lead to suboptimal generated pragmas.) By selecting a function in the Function Mapping Graph, the decisions made by the design space exploration algorithm can be customized for finer control over the pragmas inserted for the target functions (see Section 2.7.2.3). Manual exploration can improve the quality of the resulting IP blocks.



Figure 2.2: Setting up a workspace directory

• Once the configurations for each of the target functions have been finalized, the next step is to invoke the SLX FPGA source code transformation engine (Section 2.7.6), which will insert Xilinx HLS pragmas into the original source code. This step in the workflow is called Synthesize Project (Section 2.7.7). Even during this stage, the code can be manually modified via the Code Transformation Wizard, as described in Section 2.7.6.1. Finally, SLX FPGA will automatically create a single Vivado HLS project per target function, and call the synthesis process for them in order to obtain the VHDL/Verilog implementation of their corresponding IP blocks.
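The workflow steps above end with pragma-annotated source code. As a hedged sketch of what such code can look like, the example below annotates a small function with `array_partition`, `loop_tripcount` and `unroll` pragmas in Vivado HLS syntax. The function `dot`, the bound `MAX_N` and the chosen factors are assumptions made for illustration; they are not the output of SLX FPGA, whose actual choices depend on the design space exploration.

```c
/* Hypothetical maximum vector length (assumption for this sketch). */
#define MAX_N 64

/* Hypothetical hardware function: dot product of the first n elements. */
int dot(const int a[MAX_N], const int b[MAX_N], int n)
{
/* Split each array into 4 banks so that 4 elements can be read per cycle. */
#pragma HLS array_partition variable=a cyclic factor=4 dim=1
#pragma HLS array_partition variable=b cyclic factor=4 dim=1
    int acc = 0;
    for (int i = 0; i < n; ++i) {
/* The bound n is only known at run time; the trip count range is
 * supplied for latency reporting and does not change the hardware. */
#pragma HLS loop_tripcount min=1 max=64
/* Replicate the loop body 4 times to match the partitioning factor. */
#pragma HLS unroll factor=4
        acc += a[i] * b[i];
    }
    return acc;
}
```

A standard C compiler ignores the HLS pragmas (possibly with an unknown-pragma warning), so the annotated function still behaves like the original when compiled and run as software.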


2.2 Starting SLX

Please follow the steps provided in the Installation Guide to prepare the workstation. Note that a valid license for Vivado HLS is required to fully exploit the capabilities of the flow. SLX can then be started by clicking the SLX icon from the Desktop (Windows) or by typing the "SLX" command inside a console (Linux).


The first time SLX is started, a workspace location must be provided (Figure 2.2). The following Welcome Screen (Figure 2.3) is then displayed. The default tab displayed is the Quick Start tab, which provides the options to Create and Import FPGA or SDx projects, as well as open the product documentation. The Learning Center tab provides additional product training material, when available.

Figure 2.3: SLX Welcome Screen

2.3 SLX FPGA Projects

The analysis of a C/C++ application with SLX FPGA is organized using an SLX FPGA Project.

2.3.1 Creating a New SLX FPGA Project

To create a new SLX FPGA project, navigate to the CREATE section of the Welcome Screen and click New FPGA Project (Figure 2.4), or click the New SLX Project button if already in the Workspace. This opens a menu listing all the possible project types that can be created (see Figure 2.5). Select SLX FPGA Project.



Figure 2.4: Selecting create New Project from the Welcome Screen

Figure 2.5: Selecting create New Project from the workspace

The selection of a project type will open another window where the new project name must be specified (in this case, sample), as seen in Figure 2.6.

Z Existing projects can be imported into the workspace, as described in Section 2.3.2. Sample projects are provided with SLX FPGA for your convenience.

Z An existing SLX C/C++ Project can be imported into the workspace and converted into an SLX FPGA Project. Similarly, a new C/C++ Project can also be converted to an FPGA one. The process and its limitations are described in Section 2.3.3.

After clicking Finish, a minimal project is created and the configuration editor for the SLX FPGA project type is opened, as seen in Figure 2.7. Now, choose the platform that is being targeted. The wizard creates a C/C++ source file with a trivial application that can be used to start development with the tool. The example can be tried and modified to test the product features.



Figure 2.6: Creating an SLX FPGA project

Figure 2.7: Configuration editor to specify platform and relevant build details

Recently opened projects can be seen in the Recent Projects section of the Welcome Screen (Figure 2.8).



Figure 2.8: Recent projects section of the Welcome screen

i Note that if the SLX UI window is shrunk, the Recent Projects section may be hidden.

To prevent the Welcome Screen from being displayed every time SLX starts, uncheck the checkbox at the bottom right of the screen (Figure 2.9).

Figure 2.9: Disable Welcome Screen at startup

2.3.2 Importing an Existing SLX FPGA Project

To import an existing project or a Silexica sample project, click the down arrow symbol next to the Import SLX Project button, and select the SLX FPGA Project option from the drop-down menu. This is illustrated in Figure 2.10. The Silexica sample projects can be found under the examples/fpga subfolder in the local SLX installation.

Z An SLX FPGA Project is a folder that contains a spec/ subfolder and the file defines.mk within the subfolder that is edited with the configuration editor. Only folders of this form are suitable for importing.




Figure 2.10: Importing an SLX FPGA project

To import the sample SLX FPGA project workshop_fpga, scroll to the application examples/fpga/workshop_fpga as shown in Figure 2.11. Check the desired application and then press Finish (bottom right corner). For the FPGA projects under examples/fpga which are distributed with SLX, the Copy projects into workspace entry (bottom left) is preselected and cannot be altered. This ensures that work is done on a fresh copy of the project and does not directly affect the files in the SLX distribution. After this step, the SLX FPGA Project is displayed in the workspace.

Figure 2.11: Selecting an SLX FPGA Project to import



Note: For external applications, the Copy projects into workspace option is not enabled by default. If kept disabled, the project files are referenced from their original location and any changes made will be applied to the original.

2.3.3 Importing and Converting an Existing SLX C/C++ Project

An existing C/C++ application that was already imported as an SLX C/C++ Project can be converted by SLX for use with SLX FPGA. The conversion enables certain configuration variables, disables others that are not applicable to SLX FPGA and changes the toolbar menus accordingly.

• After importing a C/C++ Project (refer to the SLX C/C++ manual at ./SLXC-Cxx.pdf on how this can be done) right-click on the project’s icon in the SLX Projects tab.

• Next, select Configure Project in the pop-up menu. From the submenu, select Add FPGA Nature as shown in Figure 2.12.

The project will remain a C/C++ Project for which the SLX FPGA features are now also available.

2.3.3.1 Converting an SLX C/C++ Project to SLX FPGA using Platform Selection

It is also possible to convert a C/C++ Project to an SLX FPGA Project when selecting a platform model that supports SLX FPGA (i.e., a platform that has an FPGA fabric). As a prerequisite, ensure that the confirmation dialog option in the SLX FPGA preference page is checked as shown in Figure 2.13.

Note: The FPGA Preference Page can be opened by going to menu Window > Preferences and selecting SLX FPGA Project.

In the existing or newly imported C/C++ project, open the Configuration Editor and select a platform that supports FPGA (e.g., zcu102). A confirmation dialog for the option to convert from C/C++ to FPGA (Figure 2.14) is displayed. The conversion can be accepted by pressing the Yes button.

2.3.4 A Look at the Project Files

SLX FPGA projects have a specific directory organization. A spec subfolder contains the defines.mk configuration file, which is used to control the SLX FPGA features and options. More information about the defines.mk configuration file can be found in Chapter 7. The Configure Project editor (Section 2.7.1) is a guided interface to edit the project configuration.



Figure 2.12: Converting a C/C++ project into an SLX FPGA Project

2.3.5 Importing an Existing C/C++ Source Code Tree

The minimal example project created in Section 2.3.1 can be repurposed to analyze an existing application for which the source code and build system (e.g., GNU Make) already exist. The following procedure can be used to import an existing source tree.

1. Create an empty SLX FPGA Project (see Section 2.3.1)

2. Remove the trivial example code generated by SLX FPGA

3. Import your source tree into the SLX FPGA Project using the Import wizard accessible from the File menu or by directly dragging and dropping the source files and folders on the main window of SLX FPGA.

4. Configure SLX FPGA to call the relevant commands to build and run your application using the Configuration Editor (see Section 2.6.1).



Figure 2.13: Show confirmation dialog for C/C++ to FPGA project conversion checkbox

Figure 2.14: C/C++ to FPGA confirmation dialog

2.3.6 Importing a Xilinx Vivado HLS Project

Note: The following section is only relevant to Xilinx Vivado HLS Projects.

To import an existing Xilinx Vivado HLS project from the Welcome screen, click "Import Vivado HLS Project" from the Quick Start: Import section.

To import an existing Xilinx Vivado HLS project from the Workspace, click the down arrow symbol (▼) next to the Import SLX Project button on the main toolbar, and select the Xilinx Vivado HLS Project option from the drop-down menu.



Figure 2.15: Import Xilinx Vivado HLS Project from the Welcome Screen

Figure 2.16: Import Xilinx Vivado HLS Project from the Main Toolbar

In order to import a Vivado HLS project, the project file vivado_hls.app must be present. In addition, all design files, testbench files and settings (e.g. compiler and linker flags) must be included in the file.



Ideally, the solution file <solution_name>/<solution_name>.aps should also be present. Otherwise, the FPGA Part and the clock frequency need to be set manually in the SLX FPGA Configuration Editor after import.

SLX FPGA must be configured with the location of Xilinx tools. Click the link that opens up the Preferences page. The path to Xilinx tools can be entered manually or selected using the "Browse..." button. For detailed instructions, see Section 2.4.2: Configuring SLX to use Xilinx.

Figure 2.17: Select the path to Xilinx Tools

After the path to Xilinx tools has been set, browse for the Xilinx project to be imported. Select the desired project from the list and click Finish.



Figure 2.18: Selecting the Xilinx project to be imported

SLX will then start the import process. The progress, along with any errors, warnings and notes, is displayed in the Console window.

After the import process is complete, the project files are imported into SLX FPGA. A project directory is created with the spec and silexica project directories. SLX FPGA will attempt to import all source files, header files and data input files into the new SLX FPGA project directory in the current workspace. The settings (compiler flags, linker flags, FPGA part, clock frequency) are automatically configured from the information in the active solution of the Vivado HLS project.



Figure 2.19: Vivado HLS project imported as an SLX FPGA project

If a Vivado HLS project does not contain project or solution files (i.e., only the source files and the TCL script are available), the project and solution files can be created by running the TCL script in Vivado HLS with the command vivado_hls -f <name>.tcl.

Please make sure that the TCL script executes at least the following TCL commands:

• open_project: command to create the project file.

• add_files: command to add the design and testbench files and optionally set compiler flags.

• set_top: command to set the top-level hardware function.

• open_solution: command to create a solution file.

• set_part: command to configure the FPGA part in the solution.

• create_clock: command to configure the clock in the solution.

• csim_design: command to set linker flags and the command line arguments for C simulation. Only needed for SLX FPGA import if linker flags or command line arguments are needed.


After the TCL script containing the above commands has finished execution, the project and solution files are generated and the Vivado HLS project can then be imported into SLX FPGA.
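To make the required command sequence concrete, the following C++ sketch assembles the text of a minimal script that issues the commands listed above in order. The struct, the function name and every value passed to it are hypothetical placeholders. The option spellings shown (add_files -tb for testbench files, create_clock -period with a period in nanoseconds) are the commonly used Vivado HLS forms and should be checked against the Xilinx documentation for the installed version. csim_design is left out because, as noted above, it is only needed when linker flags or command line arguments must be recorded.

```cpp
#include <sstream>
#include <string>

// Hypothetical description of a Vivado HLS project; all fields are placeholders.
struct HlsProjectInfo {
    std::string project;      // project name passed to open_project
    std::string solution;     // solution name passed to open_solution
    std::string top;          // top-level hardware function
    std::string design_file;  // design source file
    std::string tb_file;      // testbench source file
    std::string part;         // FPGA part name
    double clock_period_ns;   // target clock period in nanoseconds
};

// Builds the text of a minimal Tcl script containing the commands
// listed above, in the order in which they are listed.
std::string make_hls_tcl(const HlsProjectInfo& p) {
    std::ostringstream tcl;
    tcl << "open_project " << p.project << "\n"        // creates the project file
        << "add_files " << p.design_file << "\n"       // design file
        << "add_files -tb " << p.tb_file << "\n"       // testbench file
        << "set_top " << p.top << "\n"                 // top-level hardware function
        << "open_solution " << p.solution << "\n"      // creates the solution file
        << "set_part {" << p.part << "}\n"             // FPGA part of the solution
        << "create_clock -period " << p.clock_period_ns << "\n"  // clock of the solution
        << "exit\n";
    return tcl.str();
}
```

Writing the returned string to a file such as <name>.tcl and running it with vivado_hls -f <name>.tcl would then produce the project and solution files, assuming the referenced source files exist.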

2.4 Preparation and Setup

This section describes the configurations that should be set before working on a project.

2.4.1 SLX Preference Configuration Page

General preferences applicable to every project can be configured using the dialog shown in Figure 2.20 by clicking Window, Preferences and selecting SLX from the list to the left of the dialog.

Figure 2.20: General SLX preferences applicable to every project

The following options can be configured:

• Save required dirty editors before launching: controls the prompt to save any dirty editors before an action is launched. The settings are:

– Always - saving dirty editors will not display a prompt, and editors are automatically saved.



– Prompt - saving dirty editors will display a prompt before launching an action.

• Automatically trace when required: when selected, no dialog will be displayed when a project needs to be traced; that is, tracing will always be performed. When the option is deselected, a confirmation dialog is displayed every time an action requires a project to be traced.

• Debug Mode: displays the internal commands triggered by the graphical interface in the console. Debug mode provides additional information to Silexica Support when reporting a bug.

• Always do backup before clean: when enabled, a copy of the project is backed up every time the project is cleaned, saving its former state before erasing it. The backups are saved in directories named backup_ followed by the backup date, directly in the project root directory. When the option is disabled, no backup is performed before cleaning.

• Show confirmation dialog for backup: complements the previous option by showing a confirmation dialog before every backup. Using the dialog, it is possible to decide for every backup whether it should be aborted or performed.

• Call Graph filter threshold: sets the maximum number of nodes displayed by default in the callgraph view. More nodes can be displayed afterward if needed using other options in the view. This option avoids excessive memory consumption with unexpectedly large callgraphs.

• Experimental Features: lists the experimental features currently enabled with the configured license.

2.4.2 Configuring SLX to Use Xilinx

SLX FPGA must be configured with the location of the Xilinx tools in order to properly use them. Navigate to the general preferences window (Window > Preferences).

A path to the Xilinx tools can either be entered manually or selected using the Browse button. Additionally, the release name for one of the supported toolchains can be selected using the version selector. Any path to Xilinx or Vivado installations can be selected, and SLX FPGA will automatically select the path to the Vivado toolchain when it is available.


Figure 2.21: The Xilinx Setup Preference page

2.5 A Look at the Toolbar and Menus

It is worth becoming familiar with the location of the main commands used throughout the flow. These commands can be accessed from several locations, such as the main toolbar (see Figure 2.22), and the SLX menu, which can be accessed from the menu bar, or as a context menu from any SLX FPGA Project (see Figure 2.23). Each button or command has a well-defined functionality.

Figure 2.22: SLX FPGA toolbar, showing the buttons for the different flow steps



Figure 2.23: SLX FPGA Eclipse menu, showing the commands for the different flow steps

The first set of commands corresponds to the initial steps that a user would execute when developing an application with SLX FPGA, and is explained in Section 2.6. These commands are used in the initial development cycle to ensure that the application can be compiled and executed, and that it exhibits the intended behaviour. They are:

• Configure Project

• Clean Project

• Run Code

The next set of commands corresponds to the tools in the recommended workflow for using SLX FPGA to optimize the application for execution. These commands are explained in more detail in Section 2.7.6.

• The Function Mapping editor is a centralized interface that contains a static callgraph, a filter view, and a properties view for every function of the application.

• The Find and Optimize Parallel Loops command invokes the analysis and optimization phase to produce optimized IP blocks for selected application functions.

• The Generate HLS-Aware Code and Synthesize commands aid in annotating the application code with HLS pragmas, and call Silexica’s integration layer with Xilinx tools (Vivado HLS).

The SLX Hints, SW Call Graph, Code Analysis Graph, and Memory Analysis buttons open additional views to gain application understanding and remove performance bottlenecks and HLS issues. These views are explained in Section 2.8.

2.6 Build, Run and Debug Applications

Once a project has been imported, based on the project type, a new set of buttons that enable the development and optimization of the target application is displayed. The first group of buttons is intended to provide a convenient way to configure the project, clean intermediate files, develop, test and debug the application on the host before any detailed analysis and optimization is performed by SLX FPGA. From left to right, these buttons are:

Figure 2.24: Invoking the Configure Project editor

Configure Project: edits the configuration of the project. Details of the available configuration options are provided in Section 2.7.1.

Clean Project: cleans the generated files for the project. A backup of prior results can optionally be performed.

Run Code: executes the compiled application code. Development, optimization, debugging and validation of the target application is performed first on the host. It is recommended to ensure that the application functionality is correct before performing the more powerful (and time-consuming) analyses supported by SLX.

2.6.1 Configure the Build System

The Configure Project editor is the primary mechanism used to configure and control SLX FPGA. To open the configuration editor, click Configure Project (Figure 2.24) from the SLX menu or the icon from the main toolbar.

The options to configure how an application is compiled and executed are grouped in the Build Options group. The following fields are available:

• Prepare: defines an optional command used to set up the environment before starting the compilation. See Section 7.1.1 for more details.

• Clean: the command to clean up the build system and output files (Clean Project). See Section 7.1.3 for more details.

• Change Tracing Mode: an optional command used for transitioning between the instrumented (Silexica) or non-instrumented (user) link modes. See Section 7.1.2 for more details.



Figure 2.25: Build Options section for the Configure Project editor

• Build: location where source files will be built, for both the instrumented (Silexica) and non-instrumented (user) modes. See Section 7.1.5 for more details.

• Run: location for running the compiled code (executable). See Section 7.1.6 for more details.

Note: For complex projects, special compilation flags or paths to specific directories may be required. All supported configuration flags can be found in Chapter 7. For the example workshop_fpga project, the aforementioned settings are sufficient.

Figure 2.26 shows the dialog box displayed when attempting to proceed when Configure Project editor settings are not saved.

Note: To see how a Makefile can be used to conveniently execute build and run commands, refer to the sample project at Section 7.5.1.

2.6.2 Run Code

To execute and test the functionality of the application, click Run Code (Figure 2.27). This will build and run the application on the host in which SLX FPGA is installed, and display the output in the SLX console.

Note: The C/C++ code is assumed to be executed in compliance with the C/C++ standard. The main() function must terminate by returning 0, either by return or by a call to exit(). Programs that return a non-zero value are supported if Allow Non-zero Exit is set in the Configure Project editor. The same holds for compliant programs that terminate unsuccessfully by calling exit with a value other than EXIT_SUCCESS.

Figure 2.26: Saving the Configure Project options

Important: An example of a non-compliant program is void main(){} as the actual return value is implementation dependent and thus defined by the compiler. This kind of program can lead to an invalid result comparison between a plain host run and a tracing run used to get dynamic data for application analysis.
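The exit-status requirement above can be illustrated with a small sketch. The function name run_application and its behaviour are hypothetical; the point is only that the status handed back to the host is explicit, 0 (EXIT_SUCCESS) on success and non-zero otherwise.

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical application body: reports success or failure through its
// return value so that the entry point can pass it on unchanged.
int run_application(int argc) {
    if (argc > 1) {
        std::cerr << "unexpected arguments\n";
        // Non-zero status: only accepted when Allow Non-zero Exit is set
        // in the Configure Project editor.
        return EXIT_FAILURE;
    }
    std::cout << "result: " << 6 * 7 << "\n";
    return EXIT_SUCCESS;
}

// A compliant entry point is declared as `int main` and returns the status, e.g.:
//   int main(int argc, char** argv) { return run_application(argc); }
// By contrast, `void main() {}` is not standard C++ and leaves the exit
// status to the compiler, which is the problem described above.
```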

2.6.3 Debug Application Code

During the development process, the application should be first tested on the host to ensure that the functionality is correct before more expensive tracing and analysis are performed using SLX. It is common to discover unexpected conditions that cause the application to fail with an error or behave incorrectly.

Silexica leverages debug capabilities included in the Eclipse IDE for debugging the application. This section explains basic steps to create a customized Eclipse debug configuration for the target application (extensive documentation is available as part of the Eclipse IDE). The workshop_fpga application is used as an example.

Figure 2.27: Run Code for the application




Figure 2.28: Context menu to reach the Debug Configurations dialog.

1. The first step required to debug an application on the host is to create a new debug configuration. The debug configuration creation dialog can be reached by right-clicking on any file of the project and selecting Debug As, and then selecting Debug Configurations from the context menu. The process to reach the Debug Configurations is displayed in Figure 2.28.

2. The next step is to create a new Debug configuration for the C/C++ application. This is achieved by selecting C/C++ Application from the left-most panel, and by clicking on the New button as shown in Figure 2.28. Note that as Silexica utilizes the existing application’s build system, the Disable auto build option in the Main tab of the newly created debug configuration should be selected.




Additional instructions are provided in the dialog. An empty debug configuration is created where the binary to debug should be selected, along with any other parameters that are necessary for the correct execution of the application. Among these parameters are:

• Application arguments

• Environment variables that the application reads

• The debugger to be used

• The initial breakpoint

• The location of the application source files

• ...and other details

Figure 2.29 shows a newly created debug configuration, where a user has selected the workshop_fpga binary as the target to debug. As this application does not take any command line arguments, and does not read any configuration variables, the configuration can be saved and switched to the debug mode. For more details of the available configurations for the debug mode, please consult the Eclipse IDE documentation.

Note: Before debugging the application, make sure that the host binary has been generated by going over the steps presented in Section 2.6.2. The application binary should be automatically filled; it can also be found by clicking on the Search Project or the Browse buttons shown in Figure 2.29.

After saving the debug configuration by clicking the Debug button, a message prompts to switch to the Debug perspective (Figure 2.30). In the debug perspective (see Figure 2.31), it is possible to step through an application, set breakpoints, print variables and read the stack trace, among others. This allows evaluation of application functionality in detail and makes it easy to spot and solve unexpected or unwanted behavior. It is possible to switch back to the SLX perspective from the Debug one by selecting it from the upper right corner of the Debug view.



Figure 2.29: Newly created debug configuration for the workshop_fpga project

Figure 2.30: Prompt to confirm switching to debug perspective



Figure 2.31: Debug perspective for the workshop_fpga application.



2.7 Application Transformation towards Optimized IP Block

The function for which a hardware accelerator will be created, in the form of an IP block, is named the Top-Level Hardware function, in Xilinx terminology.

First, a Top-Level Hardware Function can be selected to generate an IP block in the configure project area. Next, the focus goes to the Function Mapping editor. This feature provides a centralized interface to facilitate access to the most important SLX FPGA features: checking for synthesizability, finding and optimizing parallelism and performing manual design space exploration for the supported HLS pragmas. The Function Mapping editor enables an iterative flow that provides early feedback to the developer.

Once the results are satisfactory, the generated code is synthesized using Vivado HLS. The following code analysis tools can further aid in refining the code: SLX Hints, SW Call Graph, Code Analysis Graph and Memory Analysis, which are explained in a later section (Section 2.8).

Configure Project: edits the configuration of the project. Details of the available configuration options are provided in Section 2.7.1.

Function Mapping: provides access to an editor for performing the main tasks related to creating an optimized IP block (i.e., checking for synthesizability, analyzing selected functions for parallelism, triggering SLX FPGA optimization to properly exploit HLS pragmas, and performing manual design space exploration, if desired). It allows the selective mapping of synthesizable functions to the FPGA.

Find and Optimize Parallel Loops: analyzes all functions in the program to discover hidden parallelism, and runs optimization algorithms to use HLS pragmas to create an IP block providing the least latency for the available resources.

Generate HLS-Aware Code: generates C/C++ code with HLS pragmas that were automatically calculated by SLX FPGA, or manually set up by the user, using the Function Mapping editor. Unsupported (non-synthesizable) functions, for which a synthesizable alternative is available, are automatically rewritten by the code generation engine. The generated files are placed under the output/codegen/hls directory. Finally, a set of hints explaining the design trade-offs taken can be viewed in the SLX Hints tab.

Synthesize: combines generated code with integration scripts for Vivado HLS. As a result of this process, a bitstream for the hardware accelerators is produced.


2.7.1 Configure Project

The Configure Project editor is the primary mechanism for the user to configure and control the behavior of the SLX FPGA tools. To open the Configuration Editor, click on the button Configure Project (Figure 2.32).

Figure 2.32: Invoking the Configure Project editor for the project

The default appearance of the SLX FPGA project Configuration Editor is shown in Figure 2.33. The Configuration Editor groups the options into five categories:

• Basic Options

• Build Options

• HLS Options

• Array Partitioning Options

• Find Parallelism Options



Figure 2.33: The Configuration Editor

Each option in the Configuration Editor corresponds to a configuration variable, which can affect the application compilation, the analysis performed by SLX, and the presentation of the results. For a detailed explanation of the configuration variables, refer to Chapter 7. The first portion of the editor shows the Basic Options.

Note that if Xilinx is not configured yet, the following warning is displayed (Figure 2.34). To configure Xilinx, see Section 2.4.2.

Figure 2.34: Message when the Xilinx location isn’t configured

2.7.1.1 Configure Project - Basic Options

The Basic Options include:

• Synthesis Flow: enables the selection of the desired flow, either Vivado HLS or SDSoC.



– Vivado HLS: This flow is intended for designers seeking to create a customized IP block for a given C/C++ algorithm implementation, as explained in this user guide. The IP block will later be connected to a system using Vivado IPI. This flow is selected by default when a new project is created and when an existing C/C++ project is converted to an FPGA project.

– SDSoC: This flow is intended to accelerate a complete application by choosing which functions to map in the FPGA, and which ones to run in the processing subsystem of a Xilinx device that includes CPUs. For more information on this flow, refer to the SLX FPGA SDSoC User Guide. As SDSoC is only supported on Xilinx 2019.1, this flow is disabled if Xilinx 2019.2 is configured as the version that SLX FPGA should use.

• FPGA Part: corresponds to one of the supported Xilinx devices, which belongs to a device family and an architecture. In order to use the device, the platform files associated with the device must be installed as part of the Xilinx tools. Please see Section 9.3 for details. The supported architectures are:

– kintexu (Kintex UltraScale)

– kintexuplus (Kintex UltraScale+)

– virtexu (Virtex UltraScale)

– virtexuplus (Virtex UltraScale+)

– virtexuplusHBM (Virtex UltraScale+)

– virtexuplus58g (Virtex UltraScale+)

– zynq (Zynq-7000)

– zynquplus (Zynq UltraScale+)

– zynquplusRFSOC (Zynq UltraScale+ RFSOC)

• Xilinx Platform Archive: enables the import of an existing platform created using Vivado IP Integrator (IPI). The Platform Archive consists of a pre-implemented system in a DSA (Xilinx Device Support Archive) file or a collection of DSA files inside a compressed archive. DSA files contain the part name (FPGA Part) for the pre-implemented system they target.

• Xilinx Platform Name: corresponds to the target development name that was given by the creator of the DSA to the platform. It can correspond to a development kit that comes pre-installed with Xilinx tools, or an arbitrary name. The name is selected from a list generated by loading a Xilinx Platform Archive.

The following steps describe the process of configuring the Basic Options. At the start of the configuration, when no FPGA Part is selected, the configuration details display the warning FPGA Part is not valid and the FPGA Part box is highlighted in red. The user can:



1. Select an FPGA Part (see Section 2.7.1.2)

2. Select a Xilinx Platform Archive (see Section 2.7.1.3)

2.7.1.2 Select an FPGA Part

Clicking the Select button opens the Part Selection dialog (Figure 2.35), which displays a list of selectable parts along with the following columns:

• Part

• Architecture

• Device

• Package

• Speed

• Temperature

• LUT (Lookup Table)

• FF (Flip-Flops)

• DSP (Digital Signal Processing)

• BRAM

• URAM

Each column of the Part Selection dialog can be filtered individually (Figure 2.36). To select the desired Part, double-click on the Part or select the Part and click "OK".



Figure 2.35: Selecting an FPGA Part

Figure 2.36: Column filters in the FPGA part selector



Clicking the Show button after an FPGA part has been selected opens the FPGA Part details screen, which displays the properties of the chosen platform, such as:

• Voltage Domains

• Frequency Domains

• Scheduling Policy List

• Schedulers

• Processors

• Processor Power Models

• Memories

• Caches

• Logical Links

• Communications

• Programmable Logic

For more details on the FPGA Part details screen, refer to the Platform and Core Modeling Guide.

Figure 2.37: Show FPGA Part



Figure 2.38: FPGA Part Details

Note that if the path to Xilinx tools is not configured yet, the following warning is displayed (Figure 2.39). Configuring the path will fix errors and warnings related to missing include paths in the code editor window when source files are displayed. To configure the path to Xilinx tools, see Section 2.4.2.

Figure 2.39: Message when the Xilinx location isn’t configured

2.7.1.3 Select a Xilinx Platform Archive

Instead of selecting an SLX Platform manually, a Xilinx Platform Archive (as explained in Section 2.7.1.1) can be selected as displayed in Figure 2.40. Select the platform by clicking on the Select button and navigating to the location of a DSA file or DSA archive, and click Open. The Xilinx Platform Name dropdown is populated with the list extracted from the DSA file or archive. Choose a Platform Name from the dropdown (Figure 2.41).

The FPGA Part is assigned from the information within the DSA file or archive if it is an installed SLX platform.

If the FPGA part is changed (i.e., a different FPGA Part is selected) after a valid DSA archive has already been imported, the Xilinx Platform Archive and Xilinx Platform Name are cleared. Re-select the values for these fields as in Section 2.7.1.1. If the Synthesis Flow is changed to Xilinx SDSoC, some of the selections under Basic Options are invalidated and must be re-selected to match the Xilinx SDSoC workflow (refer to the SLX FPGA SDSoC User Guide for more information).

Figure 2.40: Select Xilinx Platform Archive

Figure 2.41: Choose Xilinx Platform



Figure 2.42: Basic Options configuration complete

Important: An SLX FPGA Project can only be linked to one platform at a time. Furthermore, the Xilinx device in a Xilinx platform must match the one in the corresponding SLX platform, for the analysis, optimization, and integration processes to work properly.

This finishes the Basic Options configuration within the Configuration Editor. Internally, the FPGA_PART, PLATFORM, SDX_PLATFORM_ARCHIVE_PATH and SDX_PLATFORM_NAME configuration variables are written in the defines.mk configuration file (see Chapter 7 for details).

2.7.1.4 Configure Project - HLS Options

The configuration for the workshop_fpga project, with the HLS Options view expanded, is shown in Figure 2.44. The HLS Options are explained below:

• Top-Level Hardware Function: comma-separated list that specifies the functions to be implemented as optimized IP blocks. All function callees will also be inlined and moved to hardware.

Note: For C++ applications, it is recommended to use the link (mangled) names that uniquely identify a function (e.g., _ZN4myns10mytmplfuncIiiEET_T0_). If a fully qualified name (e.g., myns::mytmplfunc<int, int>), a display name (e.g., mytmplfunc<int, int>) or a base name (e.g., mytmplfunc) is able to uniquely identify the function, these can be used as well.

If a function name is ambiguous (e.g., the base name is specified, but there are multiple functions with this base name), the function is not accepted and a warning similar to the following is displayed:

slxcmd:0:0: warning: Multiple alternatives found for function 'thefunc' in FUNCTIONS: '_ZN3ns17thefuncEi', '_ZN3ns27thefuncEi'. Selecting none of those by default. Please use a non-ambiguous function name in the configuration.

One may also use extern "C" at the function declaration (for both function definition and function prototype). This allows the compiler to treat the function as a C routine, allowing the actual name of the SLX "Top-Level Hardware Function" to be used. See Figure 2.43 for details.

Note: Synthesizable functions can also be set as Top-Level Hardware Functions by right-clicking on the function in the Function Mapping Graph and selecting "Change to Top-Level Function". See Section 2.7.2.1 for details.

• FPGA Clock Frequency in MHz (megahertz): used to specify the frequency constraint for the hardware implementation that the HLS tool will attempt to derive. The default setting is 100 MHz.

Important: The actual Clock Frequency accepted by the system is a rounded whole-number value based on the user input:

* Entering a Clock Frequency lower than 100 MHz will register as 100 MHz internally.

* Entering a Clock Frequency between 100 MHz and 600 MHz will round up to the nearest hundred: e.g., entering 105 MHz will register as 200 MHz internally.

* Entering a Clock Frequency greater than 600 MHz will register as an error.

• Synthesis Strategy: specifies whether only the high-level synthesis estimates from the third-party high-level synthesis tool will be considered, or whether a full logic synthesis and implementation including place-and-route should be performed.

• Estimation Mode: specifies whether the optimization should focus on improving the best case or the worst case. In many situations, performance is estimated over a range of scenarios (e.g., when loops have variable iteration counts).

• Code Generation Mode: specifies the programming paradigm for which code will be generated. Currently, only the generation of Xilinx HLS pragmas is supported.
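The rounding applied to the FPGA Clock Frequency field, described in the list above, can be sketched as a small function. This is an illustration of the documented rules only, not SLX code: the function name is hypothetical, the input is assumed to be a whole number of MHz, and exact multiples of 100 are assumed to be left unchanged.

```cpp
#include <optional>

// Returns the frequency in MHz that would be registered internally,
// or std::nullopt for inputs that the rules above treat as an error.
// Hypothetical helper; assumes integer MHz input.
std::optional<int> effective_clock_mhz(int requested_mhz) {
    if (requested_mhz > 600) return std::nullopt;  // above 600 MHz: error
    if (requested_mhz < 100) return 100;           // below 100 MHz: registers as 100 MHz
    return ((requested_mhz + 99) / 100) * 100;     // round up to the next multiple of 100
}
```

With these assumptions, a request of 105 MHz yields 200 MHz, matching the example given for the field, while 601 MHz yields no value.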



Figure 2.43: Using extern "C" at function declaration for top-level function.
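As a textual counterpart to the figure, the following sketch shows the two naming situations discussed for the Top-Level Hardware Function. The function accel_top and the body of myns::mytmplfunc are hypothetical. The template signature is chosen so that, under the Itanium C++ ABI used by GCC and Clang, the instantiation myns::mytmplfunc<int, int> corresponds to the mangled name quoted earlier (_ZN4myns10mytmplfuncIiiEET_T0_).

```cpp
#include <cstddef>

namespace myns {
// Template function inside a namespace: its link name is mangled, so the
// configuration would need the mangled name or another unambiguous name.
template <typename T, typename U>
T mytmplfunc(U value) {
    return static_cast<T>(value) * 2;
}
}  // namespace myns

// extern "C" on both the prototype and the definition: the link name is
// the plain identifier accel_top, which can be entered as written.
extern "C" int accel_top(const int* data, std::size_t n);

extern "C" int accel_top(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += myns::mytmplfunc<int, int>(data[i]);  // callee of the top-level function
    }
    return total;
}
```

Note that extern "C" functions cannot be overloaded, which is what makes the plain name unambiguous.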

Figure 2.44: Project configuration dialog with the HLS Options view expanded



The Xilinx HLS compiler allows the partitioning of small arrays into individual ele-ments that can be accessed in a single clock cycle (i.e., registers). SLX FPGA allowsthe user to configure the maximum size of the arrays that are automatically split (parti-tioned). This configuration (described below) is available in the Array Partitioning optiongroup.

• Complete Partitioning Limit (elems): defines when complete array partitioning must be forced, depending on how many elements are in array variables. When set to 0, complete array partitioning is never forced and the Xilinx toolchains are left free to automatically partition arrays when needed. Other values define the maximum size, in number of elements, of arrays that should always be completely partitioned.
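To make the effect of complete partitioning concrete, the sketch below applies the Xilinx array_partition pragma by hand to a small local array. The function weighted_sum, the array size, and the coefficient values are hypothetical and chosen only for illustration; a standard C compiler ignores the pragma, so the function behaves the same in software.

```c
#define TAPS 8  /* small array: a typical candidate for complete partitioning */

/* Hypothetical example function, not taken from an SLX FPGA project. */
int weighted_sum(const int samples[TAPS])
{
    int coeff[TAPS] = {1, 2, 3, 4, 4, 3, 2, 1};
    /* Request that coeff be split into TAPS individual registers so that
     * all elements can be read in the same clock cycle. */
#pragma HLS array_partition variable=coeff complete dim=1
    int acc = 0;
    for (int i = 0; i < TAPS; i++) {
        acc += coeff[i] * samples[i];
    }
    return acc;
}
```

Under the description above, an array of this size would be partitioned automatically whenever the Complete Partitioning Limit is set to 8 or more.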

Attempting to proceed without saving the selections made in the Configure Project editor will result in a dialog box prompting whether to save the configuration options, as shown in Figure 2.45.

Figure 2.45: Saving the Configure Project options

2.7.1.5 Configure Project - Occupied Area Resources

The Occupied Area Resources section is used to tweak the amount of FPGA area resources available for the optimization of the hardware to be implemented on the FPGA. These options are only visible when HLS Options is unfolded. The fields are disabled if the FPGA Part is not selected.

The configurable options include Lookup Tables (LUTs), Flip-Flops (FFs), Block Random Access Memories (BRAMs) and Digital Signal Processing Slices (DSPs). UltraRAM Blocks (URAMs) are only available for some Xilinx devices.

The following methods can be used to specify the amount of available area resources:

• Using the increment/decrement (plus/minus) buttons to select a value, or entering the desired value in the resource field.



• If a Xilinx platform has been imported and specifies the available area resources, these values will be used to automatically fill in the relevant available area resource fields.

The maximum available area per resource on the FPGA is specified at the end of the row. Note that it is possible to enter more than the maximum number to explore hypothetical devices; if a value beyond the maximum is entered, a yellow warning is displayed with the message "More resource are used than what is available on the selected device", where resource is the specific FPGA resource in question.

Figure 2.46: The Available Area section of the Configuration Editor with a value input for BRAM that’s higher than the available resources on the FPGA

The following sections explain the entries for specifying occupied area resources in more detail.

Lookup Tables (LUT): the LUT variable is the number of combinational logic resources (lookup tables or LUTs) available in the FPGA for this design. Specify a value higher than the device total to investigate the outcome on hypothetical larger devices.

Flip-Flops (FF): the FF variable is the number of register resources (flip-flops or FFs) available in the FPGA for this design. Specify a value higher than the device total to investigate the outcome on hypothetical larger devices.



Block Random Access Memory (BRAM): the BRAM variable is the number of block RAM resources (BRAMs) available in the FPGA for this design. Specify a value higher than the device total to investigate the outcome on hypothetical larger devices.

Digital Signal Processing Slices (DSP): the DSP variable is the number of hardwired DSP blocks (DSPs) available in the FPGA for this design. Specify a value higher than the device total to investigate the outcome on hypothetical larger devices.

Ultra Random Access Memory (URAM): the URAM variable is the number of UltraRAM memory resources available in the FPGA for this design. Specify a value higher than the device total to investigate the outcome on hypothetical larger devices. This option is only available for some Xilinx devices.

2.7.2 Function Mapping Editor

The Function Mapping editor allows all the mapping parameters of the FPGA design to be configured. In particular, the editor allows selected functions to be mapped to the FPGA and their hardware implementation to be tweaked by factors such as utilizing parallel loops. The most important parameters of the design can be configured manually; SLX FPGA can also automatically determine an efficient configuration for these parameters.

To open the function mapping editor, click on the function mapping toolbar button ( ). It is recommended to start the workflow by first opening the function mapping editor, as it provides tools relevant to the early stages of hardware implementation. The Function Mapping editor consists of:

• The application call graph, called the Function Mapping Graph

• The Function Mapping View, providing a table-based representation of the application functions

• The Function Properties View, exposing the properties and design configuration of particular functions

The functionality of the views is interconnected. The Function Mapping editor is illustrated in Figure 2.47 and its main views are presented in depth in the following subsections.

2.7.2.1 Function Mapping Graph

The Function Mapping Graph is a static call graph: its nodes are functions in the application and its edges represent function calls in the source code. Functions in the application are represented using rounded rectangles and contain several properties:

• Function names are shown at the top of the rectangle.



Figure 2.47: Function Mapping Editor

• Border colors of the nodes represent where functions are mapped: the non-mapped parts of the testbench have a blue border, while functions mapped to hardware have a red border.

• The top-left icon represents the synthesizability status of functions. A question mark ( ) represents a function that has not yet been checked for synthesizability. Functions which can be mapped to the FPGA as Top-Level Hardware Functions display a green check mark ( ), while those which need to be rewritten to be made synthesizable display a red cross ( ).

• A function mapped to the FPGA as a top-level function also contains a star at the top-right part of the node ( ). More than one function can be set as top-level function, resulting in a distinct Vivado HLS project for every top-level function.

Figure 2.47 shows a sample Function Mapping Graph. In the figure, hwscale_accum is mapped to the FPGA as a top-level function, while swscale_accum is a function that has not yet been checked for synthesizability.

It is possible to select one or more functions by clicking on the graph nodes. Depending on the selection, different actions are available in the context menu displayed. The following actions relate to the overall graph and are available everywhere in the graph, independently from the selection:

• Auto-select FPGA functions: automatically maps functions to FPGA and optimizes all synthesizable functions whose execution time is above a configurable threshold (Section 7.2.3).

• Find and Optimize parallel loops: runs the application, analyzes the outcome to uncover parallelism for all functions mapped to the FPGA, and optimizes the resulting FPGA design. This action can also be triggered by clicking on the related button in the toolbar ( ).

When one or more functions are selected, the following actions are available through the context menu of the Function Mapping Graph:

• Check Synthesizability: triggers synthesizability checks for the selected function(s), identifying which functions can be made top-level FPGA functions and which ones cannot due to incompatible constructs. The action can only be triggered on functions which have not yet been checked for synthesizability.

• Map to FPGA: maps the selected function to the FPGA. This action is available only for functions which are synthesizable.

• Change to top-level: transforms an FPGA function into a top-level FPGA function. This action is only available for synthesizable non top-level FPGA functions.

• Un-map from FPGA: transforms a top-level FPGA function into a non top-level one. This is only available for top-level FPGA functions.

• Show related hints: opens the hints panel and pre-filters the view to focus on the selected function only.

• Hide this: temporarily hides the selected function from the graph. The function can be made visible again by using the Function Mapping view.

• Go to source...: navigates to the source code of the selected function.

Figure 2.48 illustrates the actions available in the context menu for the given example. The different actions are enabled or disabled according to the current status of the selected functions.

Note: Functions initially mapped to hardware in the project configuration are immediately tested for synthesizability and mapped to FPGA when opening the Function Mapping Editor. To use this feature, please refer to Section 7.4.3.



Figure 2.48: Context menu actions in the Function Mapping Graph

Figure 2.49: Notification on top of the Function Mapping Graph guiding towards the next step in the flow.

SLX FPGA will display notifications at the top of the Function Mapping Graph at each step of the design process. The notifications are displayed using a blue banner and provide information relevant to the current stage of the design flow. Figure 2.49 shows one such banner. The links (underscored text) in the banner can be used to trigger actions or open views relevant to the current state.

2.7.2.2 Function Mapping View

The Function Mapping View displays an alternate representation of the application by listing all the functions in a table. The table displays all the properties associated with functions in different columns that can be filtered and sorted. The following properties are available for the functions:

• Selected: whether the function is part of the current selection. The checkboxes in this column can be clicked to modify the selection. Unlike rows that are simply clicked to highlight them in the Function Mapping Graph, selected rows are preserved after filtering or sorting the table. The selection can be used to trigger actions on a group of functions, as detailed below.



• Visible (Yes/No): whether the function is currently displayed or hidden in the Function Mapping Graph.

• Synthesizability (Yes/No/Maybe): synthesizability status of the function. The "Maybe" status indicates that the function has not yet been checked for synthesizability.

• Mapping (FPGA/Unmapped): indicates whether the function is mapped to the FPGA.

Figure 2.50: Function Mapping View displaying the functions in the application in a table

Clicking (highlighting) rows on the Function Mapping View also highlights them in the Function Mapping Graph. Multiple rows can be highlighted at the same time using the usual keyboard shortcuts.

Note: SLX FPGA does not perform validation on user selections, and performing an invalid selection can result in sub-optimal implementations that may lead to failures during the synthesis process.

The following actions are available through different buttons above the table:

• Select: selects all the rows currently displayed. The rows hidden by the current filters will not be selected.

• Deselect: deselects all the rows currently displayed. The rows hidden by the current filters will not be deselected.

• Deselect All: deselects all the rows in the table, including those currently hidden by filters.



• Apply selection: opens the menu of actions available for the current selection (rows whose Selected checkbox is checked). Depending on the functions selected, different actions may be available. For instance, Check synthesizability is available only if the selection contains "Maybe Synthesizable" functions.

2.7.2.3 Function Properties View

The Function Properties View presents details about the function currently selected in the Function Mapping Graph.

When no functions are selected, the Function Properties View displays the Layout options where the visual layout of the graph can be modified.

• The layout can be switched between RADIAL or TREE format

• The node separation can be customized to be spread out or compact

• For the tree layout, the spacing between layers and the direction of the tree can be customized

• Several node placement strategies are available for the radial layout of the graph

When a function is selected in the graph, its properties and configuration are presented in the Function Properties View, which is divided into several sections, each providing different information. Note that the properties section is interactive, and the information that it contains is progressively filled in through the workflow (see Section 2.1). Therefore, it is recommended to revisit the properties section for the target functions after SLX FPGA executes one of the supported analyses.

Note: Changes made to fields in the Function Properties View are saved immediately.

General: this section displays basic function details such as the name of the function and its location in the code.

Figure 2.51: General function properties in the Function Mapping Editor



Call Edges: this section presents all the function calls related to the selected function (as a caller or callee). Every entry in the table is a call instruction in the source code. Similarly to the other tables, the entries can be filtered or sorted according to the properties displayed in the table.

Figure 2.52: Call Edges of a function in the Function Mapping Editor

Parallel Loops: this section displays the parallelism identified by SLX FPGA in the FPGA functions picked for analysis. The section is displayed after a function has been found to be synthesizable, Find and Optimize Parallel Loops has been executed, and parallel loops have been detected. The loops found within the selected function are listed in the region on the left.

Figure 2.53: Parallel loops of a function in the Function Mapping Editor from the Workshop_FPGA project.

The loops are sorted by the order in which they appear in the source code. If there are nested loops, the inner loops in the source code are grouped inside their outer loop in hierarchical order. The loops are expanded by default to reveal all inner loops; they can be retracted by clicking the downward arrow. All loops can be expanded by clicking the ( ) button at the top right, or retracted by clicking the ( ) button.

Clicking on a loop populates the details of the loop to the right. The loop number and its line number are displayed at the top; for example, Loop 2 [workshop_fpga.c:38] indicates that the loop is in the workshop_fpga.c file and begins at line 38.



Figure 2.54: Parallel loops of a function in the Function Mapping Editor displaying nested loops.

A tooltip is displayed when hovering over the loop name. Clicking the name of the loop navigates to the source of the loop in the code.

Under the loop name are configurations that help take advantage of available parallelism. These include Pragmas, Unrolling and Pipelining options, which are explained in Section 2.7.4.

All available Unrolling and Pipelining options can be expanded or retracted using the ( ) or ( ) icons at the top right corner of the loop details section.

Interfaces: this section allows interfaces to be configured for the return and argument ports. In HLS, input and output operations must be performed through a port in the design interface using a specific I/O (input/output) protocol. This is accomplished via Interface Synthesis, where the parameters to the function are synthesized into RTL ports.

The Port Bitwidth parameter is used by SLX FPGA to generate the proper array_reshape directive, which combines multiple elements from the original array to create one with greater word-width. This is useful for improving block RAM accesses without using more block RAM. The Bitwidth of the m_axis, axis, s_axilite, ap_memory and bram interface types can be configured via the width field of the Interfaces section.
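As an illustration of the array_reshape directive mentioned above, the sketch below reshapes an array of 8-bit elements so that four elements form one 32-bit word. The function sum_bytes, the array size, and the choice of factor are hypothetical; the assumption that the factor equals the port bitwidth divided by the element width is made only for this example.

```c
#include <stdint.h>

#define N 16

/* Hypothetical example function. With a cyclic factor of 4, four 8-bit
 * elements are combined into one 32-bit word (assumed port width: 32). */
uint32_t sum_bytes(const uint8_t data[N])
{
#pragma HLS array_reshape variable=data cyclic factor=4 dim=1
    uint32_t acc = 0;
    for (int i = 0; i < N; i++) {
        acc += data[i];   /* software behavior is unchanged by the pragma */
    }
    return acc;
}
```

A standard C compiler ignores the pragma, so the function can still be exercised from the testbench.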

• On the left side of the view, all the arguments and return values from the selected function are listed. A precondition is that the function is mapped to the FPGA.

• A filter field above the list allows easy search for a particular argument.

• When an argument is selected in the list, its detailed information is displayed on the right side of the view. The detailed information includes the interface type to use when implementing the port for the selected argument, as well as its configuration options. Most of the interface options correspond directly to options of the HLS interface pragma available in Xilinx Vivado HLS.




For Interface type axis, four types of register modes are provided to control how the AXI Stream interface registers are implemented. Check the register checkbox to enable the selection of the register_mode. The selection options are:

• default: both (see below)

• forward: only the TDATA and TVALID signals are registered

• reverse: only the TREADY signal is registered

• both: signals are registered in both the forward and the reverse path

• off: None of the port signals are registered

Figure 2.55: Interfaces section showing the register modes for axis

For Interface type m_axi, the max_read_burst_length and max_write_burst_length can be configured with values 2, 4, 8, 16, 32, 64, 128 or 256. These are the number of data elements that can be transferred (read or written) by giving only the base address.
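The interface options described above map onto options of the Vivado HLS interface pragma. The following sketch shows how such pragmas might look when written by hand; the functions scale_buffer and copy_stream, the port names, and the chosen values are hypothetical, and the exact option spelling should be checked against the Xilinx documentation for the tool version in use.

```c
#define LEN 64

/* Hypothetical top-level function using AXI master ports. */
void scale_buffer(const int *in, int *out, int gain)
{
    /* Burst lengths limited to 16 elements per base address. */
#pragma HLS interface m_axi port=in offset=slave depth=64 max_read_burst_length=16
#pragma HLS interface m_axi port=out offset=slave depth=64 max_write_burst_length=16
    /* Scalar argument and block control through an AXI4-Lite interface. */
#pragma HLS interface s_axilite port=gain
#pragma HLS interface s_axilite port=return
    for (int i = 0; i < LEN; i++) {
        out[i] = in[i] * gain;
    }
}

/* Hypothetical top-level function using AXI Stream ports, with both the
 * forward (TDATA, TVALID) and reverse (TREADY) paths registered. */
void copy_stream(const int in_stream[LEN], int out_stream[LEN])
{
#pragma HLS interface axis register register_mode=both port=in_stream
#pragma HLS interface axis register register_mode=both port=out_stream
    for (int i = 0; i < LEN; i++) {
        out_stream[i] = in_stream[i];
    }
}
```

In an SLX FPGA project these settings would normally be chosen in the Interfaces section rather than typed into the source.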



Figure 2.56: Interfaces section with configuration for the maximum read and write burst lengths

Bandwidth: this section is used to set the amount of data that the IP block can consume, divided into input and output bandwidths. The unit of the value (bps, kbps, etc.) can be selected in the drop-down on the right.

By default, no constraints on the input and output bandwidths are applied. The fields are highlighted in red. The default value appearing in both fields is 0.0; this value is ignored as long as neither field is changed by the user.

• If no values are set via the IN and OUT fields, SLX FPGA recognizes that there are no constraints on the bandwidths. Note that both fields are highlighted in red in this case (Figure 2.57).

• If either the IN or OUT field is filled, SLX FPGA recognizes the constraint value on that field in addition to a constraint value of 0.0 on the field with no user input (Figure 2.58).

• If both the IN and OUT fields are filled, SLX FPGA recognizes the constraint values of both fields filled by the user (Figure 2.59).



Figure 2.57: Default state of the Bandwidth section without user input

Figure 2.58: Bandwidth section with user defined OUT field

Figure 2.59: Bandwidth section with user defined IN and OUT fields

Note: SLX FPGA computes the data rate (also known as the effective bandwidth) for each function array argument as the minimum of the values derived from the port bitwidth constraints and the bandwidth specification. This means that SLX FPGA will limit the amount of parallelism that can be exploited in a given parallel loop by considering both bandwidth and interface bitwidth limitations. If both are configured, the smaller of the two values is chosen as the limiting factor.
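The minimum rule in the note above can be sketched in a few lines of C. This is only a simplified model: the function names are hypothetical, and converting the port bitwidth into a rate by assuming one word per clock cycle is an assumption, not a description of the internal SLX FPGA computation.

```c
/* Hypothetical model. Assumption: a port transfers one word of
 * port_bitwidth bits per clock cycle, giving a rate in bits per second. */
double port_rate_bps(unsigned port_bitwidth, double clock_hz)
{
    return (double)port_bitwidth * clock_hz;
}

/* The effective data rate is the smaller of the two limits. */
double effective_rate_bps(double port_limit_bps, double bandwidth_limit_bps)
{
    return (port_limit_bps < bandwidth_limit_bps) ? port_limit_bps
                                                  : bandwidth_limit_bps;
}
```

For example, under these assumptions a 32-bit port at 100 MHz could carry 3.2 Gbps, so a bandwidth specification of 1 Gbps would be the limiting factor.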



2.7.3 Discovering and Resolving Synthesizability Issues

In the Function Mapping Graph, SLX FPGA allows individual functions to be checked for synthesizability. This is achieved by right-clicking on the desired function and selecting Check for Synthesizability, as depicted in Figure 2.60. SLX FPGA combines its internal checker with the one provided in Xilinx tools to ensure that no issues are left undetected. The icon on the left of the function node in the Function Mapping Graph shows the result of the checks.

• A green checkmark ( ) means that the function can be translated into hardware by the HLS compiler (Synthesizable)

• A red cross ( ) means that the compiler will not be able to generate an IP block for the function (Not Synthesizable)

• A question mark ( ) represents a function that has not yet been checked for synthesizability

Figure 2.60: Context menu to discover synthesizability issues in a function

If the function is found to be not synthesizable:

• The following notification is displayed: Status is “Not synthesizable”: “<name of function>” is not synthesizable, click here to investigate. See Figure 2.61.



Figure 2.61: Click on hint to investigate synthesizability

• Clicking the link opens a source panel with the function highlighted and centered,as well as synthesizability hints for the function.

• In general, a code rewrite has to be performed for this case to ensure that the source code can be synthesized.

• Guidance is available by clicking on the question mark icon at the right of the hint, as shown in Figure 2.63.

• More details on the meaning of the different synthesizability hints can be found in Section 2.8.4.

Note: If the message "No valid Xilinx Vivado installation found. Only a few vendor-independent synthesizability checks will be performed" is displayed in the console, please follow the instructions given in Section 2.4.2 to make sure that the Xilinx Installation Directory is properly specified.

To map a synthesizable function to the FPGA, right-click on the function in the Function Mapping Graph and click Map to FPGA (see Figure 2.64).



Figure 2.62: Synthesizability issues discovered for the swscale_accum function

Note: A function that is non-synthesizable as a top-level function can still be called inside another function that is synthesizable. This is because Xilinx tools can, for example, infer the sizes of arrays passed as parameters to functions that accept variable-sized arrays. The caller is then synthesizable as a top-level function, while the callee on its own remains non-synthesizable. For more details on the specific behavior of the Vivado HLS compiler, please refer to the appropriate Xilinx documentation.
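The situation described in the note above can be pictured with the following sketch. Both functions are hypothetical; whether a given tool version accepts or rejects each of them as a top-level function should be confirmed with the synthesizability check and the Xilinx documentation.

```c
#define N 32

/* Callee: the array length is only known through the run-time value n,
 * so the size of the data port cannot be determined when this function
 * is considered on its own as a top-level function. */
static int sum_n(const int *data, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++) {
        acc += data[i];
    }
    return acc;
}

/* Caller: passes a fixed-size array and a constant length, so the tools
 * can infer the array size at the call site. */
int sum_fixed(const int data[N])
{
    return sum_n(data, N);
}
```

In this sketch, sum_fixed would be the candidate for Map to FPGA, with sum_n synthesized as part of it.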

2.7.4 Finding and Optimizing Parallel Loops

Exploiting parallelism is essential to producing optimized IP blocks for functions that are mapped to the FPGA. SLX FPGA is able to automatically extract Data-Level Parallelism (DLP) and Pipeline-Level Parallelism (PLP) in the application. The tool is able to flag parallelism blockers and point to them in the source code to ease their removal.



Figure 2.63: Guidance for the transformation from non-synthesizable to synthesizable code

Figure 2.64: Action to map a synthesizable function to the FPGA

The parallelism uncovered in the application is also used to automatically optimize the FPGA design, as explained in the next section.

Note: The result of Find and Optimize Parallel Loops will vary depending on whether all functions found to be Synthesizable have been mapped to the FPGA. If not all Synthesizable functions have been mapped to the FPGA, the generated pragmas may be suboptimal.

Parallelism detection and optimization can be invoked when a function is first mapped to the FPGA, by:

• Clicking the notification displayed at the top of the Function Mapping Graph.

• Right-clicking on the graph and selecting the Find and Optimize parallel loops option in the context menu (Figure 2.66).

• Clicking on the Find and Optimize Parallel Loops button (Figure 2.65).

Figure 2.65: Use Find and Optimize Parallel Loops to find parallelism and optimize the current mapping

Figure 2.66: Find and optimize parallelism for a function mapped to the FPGA in the Function Mapping Editor

Tracing the source files (deep analyzing the source code by running the application) is required to find parallelism. As SLX FPGA utilizes internal algorithms to look for parallelism, the progress is displayed in the Console (Figure 2.67). Note that, depending on the application, both tracing and analysis may not be trivial, and thus require some time to complete. Please be patient and wait for the process to finish, as not all internal details of the progress will be displayed.

Figure 2.67: Console output during finding parallelism

Figure 2.68: Parallel Loops of a function in the Function Mapping Editor

After the parallelism detection process is complete, useful information is extracted from the application, including parallelization hints (e.g. for DLP, PLP or TLP). These parallelization hints are presented via the SLX Hints view. See Section 2.8.4 for more details on the parallelization hints.

The parallelism uncovered and the initial configuration automatically computed by SLX FPGA are visible in the Parallel Loops section under the Properties view of the Function Mapping Editor. On the left, all the discovered loops are displayed. On the right, configuration options for each loop are displayed:



• Unrolling

– Enable: checkbox to enable or disable loop unrolling. When disabled, the loop will not receive any pragma for unrolling purposes.

– Unroll Factor: only used for DLP loops. The unroll factor n is the number of times the loop body is replicated. Consequently, the number of iterations is divided by n.

– Pragma: an overview of the HLS pragma which will be generated to exploit the unroll parallelism available for the loop.

• Pipelining

– Enable: checkbox to enable or disable pipeline-level parallelism and the associated pragma.

– Initiation Interval: the number of clock cycles between the start times of consecutive loop iterations.

– Pragma: an overview of the HLS pragma which will be generated to exploit the pipelining parallelism available for the loop.

Note: If both PLP and DLP are available for a loop, SLX FPGA allows selecting both. The system is not responsible for failures during synthesis due to multiple selections.
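The Unrolling and Pipelining options above correspond to the Xilinx unroll and pipeline pragmas. The sketch below shows what such pragmas might look like in source code; the functions scale and accumulate, the array size, and the chosen factor and initiation interval are hypothetical examples rather than output produced by SLX FPGA.

```c
#define N 64

/* DLP example: the loop body is replicated 4 times, so the hardware
 * loop performs N/4 iterations. */
void scale(const int in[N], int out[N], int gain)
{
    for (int i = 0; i < N; i++) {
#pragma HLS unroll factor=4
        out[i] = in[i] * gain;
    }
}

/* PLP example: a new loop iteration is started every clock cycle
 * (initiation interval of 1). */
int accumulate(const int in[N])
{
    int acc = 0;
    for (int i = 0; i < N; i++) {
#pragma HLS pipeline II=1
        acc += in[i];
    }
    return acc;
}
```

A standard C compiler ignores both pragmas, so the software behavior of the testbench is unchanged.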

2.7.5 Hardware Optimization

SLX FPGA will try to maximize the performance of each function that needs to be implemented as an IP block, while respecting the resource constraints set by the FPGA, or manually by the user. SLX FPGA applies heuristics to improve the performance of the detected parallel code patterns while enforcing the constraints imposed by the limited platform resources. The suggested design is computed during a global optimization that simultaneously considers:

• unrolling

• pipelining

• array partitioning / reshaping

• interfaces

The effects of Hardware Optimization can be seen in the timing of the design and in the hints that expose the achieved application speedup (see Section 2.8.5).



The result of the optimization can be visualized and modified. If adjustments to the design configuration are needed, navigate to the Properties section of the Function Mapping Editor to re-configure the Parallel regions, Interfaces and Bandwidth to achieve the desired effect.

2.7.5.1 Source-Level Partition Highlighting

Based on the parallelism information extracted from the application, SLX FPGA provides a convenient highlighting of source code lines to examine suggested DLP or PLP parallelism. This allows visualization of source code lines that reside in the same parallel region with a unique background color in the source code editor. To utilize this functionality, use the Navigator view (left) and browse the workshop_fpga.c source code file for the application.

With the source file focused in the editor, color highlighting can be performed by right-clicking on any region within the source code window (to the right) and selecting Show Task from the context menu. This action will load the workshop_fpga.task partitioning file, and highlight the detected parallel tasks (Figure 2.69).

Figure 2.69 shows the functions hwscale_accum() and swscale_accum(). Each DLP partition or PLP stage is highlighted with a distinct color. For instance, the loops at lines 25–29 and 35–39 have been identified as DLP partitions. Note that the colors displayed are selected at random, and are not guaranteed to be consistent through multiple invocations.

2.7.6 Generate HLS-Aware Code

SLX FPGA provides precise implementation hints and code generation to help create an accelerated version of the application using the HLS compiler from Xilinx. This section describes the processes behind this stage. Code that is augmented with HLS pragmas is generated for the input application based on the final partitioning and optimization result from the previous section.

The option to generate code is provided as a separate button at the right side of the Find and Optimize Parallel Loops button, as shown in Figure 2.70. Clicking the button starts the HLS code generation process (divided into two steps) as seen in Figure 2.71.

1. The first step allows all the code transformations suggested by SLX to be visualized, and enabled or disabled if desired. In this step, the generated code can be refreshed and inspected to see how the code transformations will affect the original application before applying them.

2. In the second step, the generated code can be manually modified (i.e., modifying the pragmas), extra HLS pragmas can be inserted, and the code can be formatted.

This two-step process is guided by the Code Transformation Wizard, shown in Figure 2.72. The functionality of the Code Transformation Wizard is explained in Section 2.7.6.1.



Figure 2.69: Enabling source-level highlighting by loading a task model

Figure 2.70: Clicking Generate HLS-Aware Code

2.7.6.1 SLX FPGA Code Transformation Wizard

As depicted in Figure 2.72, this wizard provides a list of HLS pragmas that can be applied to the source code. All pragmas are applied to the source code by default, but can be disabled individually by clicking on the corresponding checkbox if desired.

In order to visualize the differences between the original source code and the pragma-annotated version, the Code Transformation Wizard displays both versions side by side as shown in Figure 2.73.

• Clicking on an HLS pragma highlights, in turquoise, the source location in the original file where the pragma is inserted.

• By following the editor, one can inspect the generated code.



Figure 2.71: HLS-aware code generation process

Figure 2.72: List of HLS pragmas displayed by the Code Transformation Wizard

• If desired, the code generation process can be canceled from both phases of the Code Transformation Wizard by clicking on the Cancel button.

Clicking the Refresh button applies the currently selected code transformations (Figure 2.74). In the figure, the HLS inline pragma has been disabled, and the generated code is updated accordingly by clicking on Refresh.

1. The first step of the Code Transformation Wizard is completed when the selected code transformations have been applied.



Figure 2.73: Clicking on HLS Inline Pragma

2. Clicking on the Next button moves to the second phase, which allows the user to manually modify the generated code. Please note that further pragma selections are not allowed during the second step, as illustrated in Figure 2.75.

Going back to the selection phase is possible by clicking on the Back button. However, all manual modifications performed to the generated code will be lost.

Clicking on the Finish button closes the Code Transformation Wizard and saves the annotated version of the application source in the output/codegen/hls subfolder of the project. This is presented in Figure 2.76.

After closing the code refactoring wizard, the SLX Hints view is updated with HLS hints that show the generated pragmas and provide links to the source code of the generated files. Please note that when manual changes are made from the Code Transformation Wizard, these links may not match the actual source code locations where the pragmas are inserted. Furthermore, manual modifications in the generated code are not reflected in the SLX Hints. For more details about the meaning of each individual hint, please refer to Section 2.8.4 and Chapter 5.

Figure 2.76 is an example where a number of HLS annotations have been added to the generated code for the program. In function hwscale_accum(), for the loop at lines 38 to 48, two pragmas have been inserted: an unroll pragma (line 43), as this loop was identified with DLP, and a loop_tripcount pragma (line 40) providing its number of iterations. These pragmas have been inserted in the body of the corresponding loop.

Figure 2.74: Disabling HLS Inline Pragma from Code Transformation Wizard

Note: The current release of SLX FPGA supports the generation of HLS pragmas for Data-Level (#pragma HLS unroll) and Pipeline-Level (#pragma HLS pipeline) parallelism patterns. Additionally, pragmas that enable parallel access to application arrays (#pragma HLS array_partition), inlining of functions (#pragma HLS inline), interface selection (#pragma HLS interface), and loop annotation (#pragma HLS loop_tripcount) are supported. New pragmas are added with each release of SLX FPGA. For a full description of each supported pragma, please refer to Section 5.2.
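To give a feel for the loop annotation and inlining pragmas listed above, the sketch below places them by hand in a small function. The functions clamp and clamp_all, the bounds, and the trip count values are hypothetical and are not generated output; loop_tripcount only informs the latency reports and does not change the synthesized hardware.

```c
#define MAX_LEN 128

/* Hypothetical helper: inlined into its caller to remove call overhead. */
static int clamp(int v, int lo, int hi)
{
#pragma HLS inline
    return (v < lo) ? lo : ((v > hi) ? hi : v);
}

/* Hypothetical function with a variable iteration count. */
void clamp_all(int data[MAX_LEN], int len)
{
    for (int i = 0; i < len; i++) {
        /* The loop bound depends on len, so the expected range of
         * iterations is annotated for best-case and worst-case estimates. */
#pragma HLS loop_tripcount min=1 max=128
        data[i] = clamp(data[i], 0, 255);
    }
}
```

As in the generated code described above, the loop pragma sits inside the body of the loop it applies to.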

2.7.6.2 Automatic Code Refactoring For Synthesizability

Another important feature of Silexica’s code rewriting engine is the capability to automatically replace unsupported, non-synthesizable functions with custom implementations that are compatible with Xilinx tools. The replacement of unsupported functions is performed for functions that are chosen to be mapped to the FPGA, via the mapping editor or with the Top-Level Hardware Function variable in the SLX FPGA project configuration. In the example, the code generator replaces the function call to rand(), made in hwscale_accum(), with the supported equivalent, slx_fpga_rand(), as illustrated in Figure 2.77. The synthesizable function is included in the user’s program by automatically inserting the corresponding include, slx_fpga_synth.h.

Figure 2.75: Second phase of HLS-Aware Code Generation Process

This process is transparent to the user and requires no manual intervention. The list of supported non-synthesizable functions for automatic replacement is:

• srand, rand nominally included from stdlib.h

• isalnum, isalpha, isblank, iscntrl, isdigit, islower, isprint, ispunct, isgraph, isspace, isupper, isxdigit, tolower, toupper nominally included from ctype.h

As shown in Figure 2.77, a hint to the generated source code is also created by the automatic rewriting process, which can be clicked by the user to directly inspect the code.
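To illustrate what a synthesizable replacement can look like, the following sketch shows library-free stand-ins for srand(), rand() and isdigit(). The names my_synth_srand(), my_synth_rand() and my_synth_isdigit() are hypothetical; this is not the implementation shipped in slx_fpga_synth.h, whose contents are not described here. The generator used is the example linear congruential generator given in the C standard.

```c
#include <stdint.h>

/* Hypothetical names: these are NOT the SLX implementations. They only
 * illustrate replacements that do not depend on the C library. */
static uint32_t my_synth_state = 1u;

void my_synth_srand(uint32_t seed)
{
    my_synth_state = seed;
}

/* One linear congruential step; returns a value in the range 0..32767. */
int my_synth_rand(void)
{
    my_synth_state = my_synth_state * 1103515245u + 12345u;
    return (int)((my_synth_state >> 16) & 0x7FFFu);
}

/* Library-free equivalent of isdigit() for the "C" locale. */
int my_synth_isdigit(int c)
{
    return c >= '0' && c <= '9';
}
```

Because the state is a single 32-bit word and each step is one multiply and one add, a function of this shape has no dependency on operating system services or library state.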

Note: Adding custom implementations for library functions to the automatic code refactoring mechanism is possible with minor effort. If there is such a need, please contact Silexica customer support for detailed instructions.

Figure 2.76: The workshop_fpga application with HLS pragmas automatically inserted in the portions of the code mapped to the FPGA

Note: Automatic substitution of non-synthesizable functions is currently attempted only on C applications.

Figure 2.77: Automatic replacement of rand() with its synthesizable equivalent slx_fpga_rand()

2.7.7 Synthesize

In this step, the generated code is combined with integration scripts for Vivado HLS. As a result of this process, a bitstream for the hardware accelerators is produced.

Figure 2.78: Clicking Synthesize.

To proceed with project synthesis, click the Synthesize button (Figure 2.78). This will run Xilinx Vivado HLS until completion. The time required for the task depends on the chosen Synthesis Strategy. The execution time will be shorter for Pre-synthesis, as this strategy only generates the packaged RTL for the implemented hardware functions; in contrast, Place & Route takes much longer.



Figure 2.79: Partial output from Synthesize for the Vivado HLS flow.

After the synthesis process is completed, the Synthesis Report is displayed. The report can also be opened by clicking the Show Synthesis Report button on the main toolbar (Figure 2.80). The report shows statistics such as Performance Estimates, Utilization Estimates and an Interface Summary (Figure 2.81). In addition, SLX FPGA generates HLS hints. These hints are accessible in the SLX Hints tab.

Figure 2.80: Clicking Show Synthesis Report.

Clicking on "Show detailed report" at the top of the Synthesis Report opens a .rpt file that displays detailed information including Performance Estimates (such as Timing and Latency), as well as Utilization Estimates and Interface information (Figure 2.83).

Figure 2.81: SLX Viewer for the Synthesis Report created as a consequence of the invocation of Vivado HLS. The report displays the version, product family, device, performance and utilization estimates.

Figure 2.82: The Synthesis Report (continued) displaying the interface implementation details

Figure 2.83: The detailed Synthesis Report in the form of a .rpt file as generated by Xilinx Vivado HLS

2.8 Code Analysis Tools and Views

SLX FPGA provides a flow to gain insights into the application, which can be used to iteratively improve the application until the user’s requirements are met. The following sections guide the user through this process with the workshop_fpga example application, which is optimized to run on the zcu102 platform.

Application Analysis involves tracing and understanding the application characteristics. The information that is provided by each analysis can be used to remove performance and resource bottlenecks from the application, to achieve the performance/area requirements.

SLX Hints: navigates through the generated hints in the SLX Hints view. Initially, hints come from the application analysis performed by Trace Source. Later, parallelization, speedup, and HLS code generation hints are added to the same tab. All hints allow cross-referencing back to the relevant location in the application sources.

SW Call Graph: displays the dynamic call graph of the application. For SLX FPGA, two different versions of the application call graph can be visualized. The first one corresponds to the SW execution times of the application, while the second one shows the application behavior once HW acceleration has been exploited. More details about the call graphs can be found in Section 2.8.5.

Code Analysis Graph: displays a graphical representation that combines the function execution times with the variable accesses in the entire program, which can be filtered over multiple criteria.

Memory Analysis: displays detailed statistics for all the variables in the application. This includes read and write accesses and the source code location of each access. It covers local, global, and heap variables.

2.8.1 Trace Source

Internally, SLX FPGA will instrument the source code, such that the resulting binary produces a trace file that records detailed information on actions that took place during the application execution. As source code tracing is a process that is automatically invoked during most analysis phases, executing this process in isolation is not possible within the GUI. Note that depending on the application, this step may be time consuming.

Note: In order to keep the tracing times manageable, the user should modify the application to be executed with an input data set that is not too large. The significance of the chosen data set can be validated with the line profile and code coverage features, explained in Sections 2.8.2 and 2.8.3.



2.8.2 Line Profile

The line profile of the application provides a convenient view of the contribution of each source line to the total execution time. To enable line profile information, right-click on the grey area (or on the source line numbers, if visible), just to the left of the source code window. A set of selections will be displayed; select Show Profile to enable this specific feature. Figure 2.84 illustrates the action required by the user.

A screenshot for function main() is shown in Figure 2.85. Source lines with the execution time percentage are indicated on the left side column with a shaded background. All source lines that correspond to source-level statements and have an execution percentage smaller than 1% are visible as light cyan.

Figure 2.84: Enabling line profile results.



Figure 2.85: Visualizing line profile results.

2.8.3 Code Coverage

The code coverage report of the application shows the source code lines that were visited during the execution of the application, and the ones that were not. To enable code coverage information, right-click on the line number or column area, just to the left of the source code window. A set of selections will appear; select Show Coverage to enable this specific feature. Figure 2.86 illustrates the action required by the user.

A screenshot of the function main() is shown in Figure 2.87. Source lines that have been visited are indicated with a light green color. As seen in the figure, all lines of function main() have been visited. Unvisited source lines are indicated by the absence of color.



Figure 2.86: Enabling code coverage results.

Figure 2.87: Visualizing code coverage results for main().



2.8.4 SLX Hints

The SLX Hints view provides an overview of all hints generated for the application. Hints are organized in a hierarchical manner as a table, reflecting the application structure. A detailed description of the hint format as well as the supported application and parallelization hints can be found in Chapter 4. Chapter 5 also lists the supported HLS hints. Section 8.2.2 provides more details on tree tables.

Figure 2.88: Overview of the SLX Hints view.

Figure 2.89: Expanded parallel partition icon.

The SLX Hints view consists of columns that summarize and categorize the important properties of the insights gained:

• The Name column displays icons that either correspond to a hint or structural information about the application. These icons help distinguish between the different classes of information. Figure 2.89 shows the DLP and PLP partitions of a loop in the main function, after Find and Optimize Parallel Loops has been executed, which have been expanded to show the hints that belong to these partitions. In addition to the icons mentioned in Chapter 4 and Chapter 5, the SLX Hints view uses the following additional icons to represent structural information about the application:

– [icon] corresponds to a function in the source code.

– [icon] corresponds to a loop.

– [icon] corresponds to a group of code statements or variables that feature special properties detected during parallelization analyses.

– [icon] corresponds to a parallel partition, and groups information relating to different parallelization strategies, specifically DLP and PLP.

• The Status column indicates the status of a parallel partition by a small check-box:

– [icon]: the parallelization strategy associated with the code section (i.e., a loop) is feasible, meaning that the code in the loop can be executed concurrently.

– [icon]: the parallelization strategy associated with the code section is not feasible, and there are blockers that hinder parallelism.

• The Description column shows the description of each hint. This corresponds to the descriptions listed in Chapter 4 and Chapter 5 and provides detailed information about the parallelization opportunities.

• The Execution Percentage column shows the execution percentage of functions and loops. The execution percentage is also represented as a histogram, which makes it easier to identify parts of the application with a high execution percentage.

• The Help column provides links to a detailed explanation of a hint by opening the corresponding section of the SLX user manual and accompanying documents.

The SLX Hints view allows filtering hints according to multiple criteria. The filter can be accessed by clicking the Filter icon of the table (below the column name; see Section 8.2 for more details on column filtering), and can be observed in Figures 2.88 and 2.89. In addition to predefined filter options for filtering by type of a hint (e.g., showing only HLS hints), it also allows configuring a list of filter criteria combined by an and operator.

Remember that only the hints related to relevant locations (within BASEPATH) will be shown. See Section 7.2.6 for more details.

Columns on the SLX Hints view can also be sorted by clicking on the column name header; see Section 8.2.3 for more details on column sorting.

2.8.4.1 Parallelization Hints

As mentioned in Section 2.7.4, a number of parallelization hints will be generated. To see these hints, type "PARTITIONING" in the SLX Hints Name filter. Figure 2.90 shows the produced parallelization hints for Data-Level Parallelism (DLP) present in the application. The view depicted in the figure can be obtained by clicking on a specific hint in the SLX Hints tab, which will highlight the source line for the specific hint.

Reference information on the syntax and meaning of the parallelization hints can be found in Chapter 4.



Figure 2.90: Generated DLP hint

Figure 2.91: Hints tab with a filter configured to show only the ones corresponding to HLS

2.8.4.2 Show HLS Hints

In addition to generating HLS code for the functions that are mapped to the FPGA, specific hints are produced that explain the positive decisions for pragma generation, as well as the negative decisions and the rationale for not generating a certain pragma. The code generation hints provide a cross-reference to the modified application sources, reflecting what kind of pragma was inserted. The SLX Hints view provides the user with an efficient way to inspect the different kinds of hints. You can choose to filter out everything except the HLS hints, as shown in Figure 2.91.



Figure 2.92: Tooltip revealing full text in SLX Hints

Figure 2.93: Expanded row revealing full text in SLX Hints

To show the HLS hints, set the HLS filter in the Name column in the SLX Hints tab. To jump to the related code (original or generated), double-click on any of the HLS hint entries in the SLX Hints tab. This will highlight the corresponding source line in the generated code and move the focus there (see Figure 2.76). Hovering over the hint icon on the left side of the highlighted source line will display an information box containing the HLS hint. Reference information on the syntax and meaning of the HLS hints can be found in Section 5.1.

To see the full text of descriptions that are cut off due to their line length, hover over the description (Figure 2.92) or expand the row (Figure 2.93) to reveal the entire text.

2.8.5 Software Call Graph

Silexica generates a dynamic call graph for the application, which assumes that the entire application is executed on the processors of the target platform. This call graph takes into account the characteristics of the device processors (e.g., the ARM cores in the MPSoC for the zcu102 board), and can be used to focus optimization efforts on the computational hotspots of the application.

This section provides instructions for navigating through the generated SLX call graphs. The dynamic call graph can be displayed by clicking on the SW Call Graph button, as shown in Figure 2.94. For workshop_fpga this produces the graph depicted in Figure 2.95.




Figure 2.94: Clicking SW Call Graph

Figure 2.95: Dynamic call graph diagram visualization as requested by SW Call Graph.

Depending on the number of functions and the connectivity of the call graph, it might become cumbersome to visualize the call graph in its entirety in a single view. To navigate the dynamic call graph with ease, the user can utilize the Outline or the Overview modes, which can be selected in the bottom left corner (next to the Problems, Console, and Properties tabs).

Figure 2.96 illustrates the use of the Outline functionality of this tab, which appears by default. This window also provides the list of executed functions with their contribution to the total execution cost of the application, including the cost of any functions they call. For example, function hwscale_accum() has a contribution to total execution cost of 49.99%. Clicking on the entry for this function automatically highlights that function in the dynamic call graph (on the right). Figure 2.97 shows the Overview view with functions hwscale_accum() and main() highlighted.
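The notion of a contribution that includes the cost of callees can be sketched in C as follows. The struct cg_node type, its field names and the helper functions are hypothetical and only illustrate the arithmetic; they do not describe how SLX stores or computes its call graph. The sketch assumes an acyclic call graph (no recursion).

```c
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical call-graph node; names are illustrative only. */
struct cg_node {
    const char *name;
    double self_cost; /* cost spent in the function body itself */
    const struct cg_node *callees[MAX_CALLEES];
    size_t n_callees;
};

/* Inclusive cost: own cost plus the inclusive cost of every callee. */
double total_cost(const struct cg_node *fn)
{
    double sum = fn->self_cost;
    for (size_t i = 0; i < fn->n_callees; i++)
        sum += total_cost(fn->callees[i]);
    return sum;
}

/* Contribution of a function to the whole application, in percent. */
double contribution_percent(const struct cg_node *fn,
                            const struct cg_node *root)
{
    return 100.0 * total_cost(fn) / total_cost(root);
}
```

With this definition the root function always has a contribution of 100%, and a function's percentage can be large even when its own body is cheap, because its callees are counted as well.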



Figure 2.96: SW Call graph view in Outline mode

Figure 2.97: SW Call graph view in Overview mode

Note: Double-clicking on a function in the SW call graph (workshop_fpga.cgd) opens this function in the source file editor view.

It is possible to view different facets of the SW call graph composed of only a subset of the functions in the application. In this way, the user can extract a subset of the dynamic call graph to focus on specific functions. This is done with the filtering options provided in the Properties tab shown right below the call graphs in Figure 2.98. Figure 2.98 illustrates how to select swscale_accum() and compMulScale() to only view the relevant part of the SW call graph. To select multiple functions, first check them in the Properties tab, then click on Show selection only. The filtering feature also supports function search and filtering via regular expressions.

Figure 2.98: Select multiple functions to focus on in the SW call graph using the filters in the Properties tab

In the Outline view, the user can also view all outgoing edges of a caller function in the call graph to all its callees. These outgoing edges represent different static call sites. For example, as shown in Figure 2.99, function main() has two outgoing edges listed below its entry in the Outline tab (bottom-left corner). Selecting any edge will highlight it in the SW call graph. In the figure shown, both edges have been chosen from the list.

For a specific call graph edge, the source and the target nodes can be highlighted. This is achieved by right-clicking on the call graph edge entry in the Outline and selecting Show Source or Show Target, respectively. In Figure 2.100 main() is selected as the source of the edge matching the static call site of swscale_accum(), which can be found at line 50 of workshop_fpga.c. This highlights the call graph node for main() in the call graph diagram, indicating it is the source function.

Figure 2.99: Navigate over the outgoing call graph edges of a function node

Figure 2.100: Select to show the source function of a call graph edge using Show Source. Show Target is used in the same way for highlighting a target function.

Figure 2.101: Additional information for a function

More specific information on call graph nodes (representing functions) and edges (representing function call sites) can be accessed by hovering the mouse over the selected items. Figure 2.101 shows this additional information in the form of a pop-up message that gives the ideal application speedup achievable for this function, if it could be fully parallelized (i.e., contains no sequential parts). For the zcu102 platform, the maximum speedup is given for utilizing 2, 4 or 8 CPUs. Figure 2.102 shows the specific call site in the program. In the presented case, the caller function is main(), the callee is hwscale_accum(), the function call occurs 100 times and the call is located in file workshop_fpga.c at line 51.
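This manual does not state the formula behind the reported ideal speedup. A common way to obtain such an upper bound is Amdahl's law, which is assumed in the following sketch; the function name ideal_speedup() and its parameters are hypothetical.

```c
#include <assert.h>

/* Amdahl-style upper bound (an assumption, not the documented SLX formula).
 * 'fraction' is the share of total execution time spent in the function
 * (0.0 to 1.0). The function is assumed to scale perfectly across 'cpus'
 * cores while the rest of the application stays sequential. */
double ideal_speedup(double fraction, int cpus)
{
    assert(cpus > 0 && fraction >= 0.0 && fraction <= 1.0);
    return 1.0 / ((1.0 - fraction) + fraction / (double)cpus);
}
```

Under this assumption, a function that accounts for half of the execution time can never speed up the application by more than a factor of two, regardless of the number of CPUs.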

2.8.5.1 Function analysis focus

The BASEPATH variable can be used to focus the analysis on functions defined inside a particular path (see Section 7.2.6). The call graph displayed reflects this behavior by hiding unfocused functions defined in a file outside of BASEPATH, even if they were called during the profiled program execution. Additionally, calls via unfocused functions may appear as dashed arrows in the call graph when a function indirectly calls another one or itself via a function located outside of the BASEPATH.

For instance, consider the call chain where F1 calls F2 and F2 calls F3. If F2 is defined outside of BASEPATH, it will not be shown in the call graph and a dashed arc will be shown between F1 and F3 to represent the call path between the two functions.

Calls via unfocused functions are illustrated in Figure 2.103.
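The call chain from the example can be written down as the following C sketch. The file names in the comments and the function bodies are hypothetical; they only indicate which definitions are assumed to lie inside or outside of BASEPATH.

```c
/* Assumed to be defined in a file inside BASEPATH (hypothetical app.c). */
int F3(int x)
{
    return x + 1;
}

/* Assumed to be defined in a file outside BASEPATH (hypothetical lib.c).
 * This function is hidden from the displayed call graph. */
int F2(int x)
{
    return F3(x) * 2;
}

/* Assumed to be defined in a file inside BASEPATH (hypothetical app.c).
 * F1 never calls F3 directly; the dashed arc F1 -> F3 stands for the
 * indirect path through the hidden F2. */
int F1(int x)
{
    return F2(x);
}
```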



Figure 2.102: Additional information for a call site (an edge in the call graph)

Figure 2.103: F1 calls F2 (defined outside of BASEPATH) which calls F3, resulting in a dashed arc from F1 to F3.

2.8.6 Analysis Graph

The Analysis Graph draws the relationship between functions and the variables they access, and the calling relationship (call graph) between functions, for the application.



This graph can be very large, and not all nodes are equally interesting. With this in mind, local variables are initially not displayed in the graph. What is to be displayed or not can be controlled by filtering (see Section 2.8.6.3).

The Analysis Graph becomes visible by clicking on the Analysis Graph button, as shown in Figure 2.104.

Figure 2.104: Clicking Analysis Graph.

Figure 2.105 shows a graph of the variables accessed by the main function. Functions are represented as rectangles. Variables are represented either as circles or as squares with rounded edges (see Section 2.8.6.5). Edges between a function and variables represent the accesses made in that function. The edge arrow direction indicates if the function reads and/or writes the variable.
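As a small illustration of the three kinds of access an edge can represent, consider the following C sketch. All names are hypothetical and unrelated to workshop_fpga.

```c
int gain = 3;    /* global that scale() only reads */
int last_result; /* global that scale() only writes */
int call_count;  /* global that scale() both reads and writes */

int scale(int x)
{
    int r = x * gain;            /* read of gain; r is a local variable */
    last_result = r;             /* write of last_result */
    call_count = call_count + 1; /* read and write of call_count */
    return r;
}
```

In a graph of this function, scale() would be connected to the three globals by edges whose arrow directions differ according to these accesses, while the local r would initially not be displayed.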

Along with function and variable nodes, the Analysis Graph can illustrate relationships between multiple process and thread nodes when the analyzed application is multi-threaded or multi-process.

Remember that only the variables and functions defined in files located within BASEPATH will be displayed (see Section 7.2.6 for more details). In addition, when all functions called for a thread are hidden, the thread is also hidden.

The Analysis Graph makes use of the Outline View. This lists all the functions in the graph. Functions can be expanded to see the variables they access. An example of the outline view is shown in Figure 2.106.

The Analysis Graph makes use of the context menu (Section 2.8.6.1), the Properties view (Section 2.8.6.2) and the Filter view (Section 2.8.6.3) to present details on parts of the graph and to allow control over what is displayed.

2.8.6.1 Context Menu

Right-clicking on the Analysis Graph opens a context menu (see Figure 2.107). The contents of the menu depend on what you clicked on.

The context menu of a node or edge contains the menu item Go to source.... This opens source editors where related source lines are highlighted. Nodes additionally have a Hide This menu item to provide an easy way to remove the node from the displayed graph. Nodes also have menu items to reveal (add), hide or highlight connected nodes; for examples, see Figure 2.108 and Figure 2.109.



Figure 2.105: View of the Analysis Graph for the function main.

Right-clicking on the background of the graph displays the context menu which provides menu items to open the Properties (see Section 2.8.6.2) and Filter (see Section 2.8.6.3) views, along with a way to reset the graph to the initial contents.

2.8.6.2 Properties View

When an item is clicked on in the Analysis Graph, the Properties view updates to provide access to related information or settings.

When the graph background is clicked on, the Properties view displays a tabbed set of settings for the graph. The tabs are:

• General: controls the edges to be displayed, the metrics to display in function nodes (see Figure 2.110), and the scalability behavior if the set of nodes to be displayed is large (see Section 2.8.6.4).

• Highlight: the graph uses colors and node size to indicate relative metric values, and the Highlight tab allows selection of the metrics to use for this (see Figure 2.111, Figure 2.116 and Section 2.8.6.6).



Figure 2.106: Outline view of the Analysis Graph.

Figure 2.107: Context menu for a function node.



Figure 2.108: The context menu for function nodes in the graph, demonstrating the possibilities to view callers and callees of the function.

Figure 2.109: The context menu for variable nodes in the graph, demonstrating the possibilities to highlight or reveal the functions accessing the variable.

• Layout: different layout strategies are possible for graphs and the Layout tab allows selection of layouts and parameters that affect them.

The associated Properties view provides a set of filters to allow a more selective display of information on the graph, as well as layout and highlighting control over the graph. Filtering can be applied for processes, threads, functions and variable nodes.



Figure 2.110: Function Display configuration within the General Settings of Analysis Graph.

2.8.6.3 Filter View

Applications can involve numerous function and variable nodes. The Analysis Graph allows the nodes displayed in the graph to be filtered in a number of ways. BASEPATH is one (see Section 7.2.6), which removes nodes entirely from consideration in the Analysis Graph. A more flexible way of filtering nodes is provided via the Filter view.

The Filter view provides a tabbed set of tables to select the nodes that can be visible in the graph, with one table per type of node in the application: variables and functions, and optionally threads and processes.

The tables are all SLX tables; see Section 8.2 for a general introduction to such tables and how they operate.

Each table presents information for a specific type of node. Functions or variables that are defined outside of BASEPATH are not listed in the tables. The Visible True/False enumerated column has a check-box to indicate if the node is considered visible (note: it may not actually be visible if it is off-screen or, for multi-threaded or multi-process applications, it is part of a different relationship sub-graph than the one currently being displayed). Several buttons are provided at the top of each tab. The rightmost is Apply, and clicking this causes the Visible settings to have the appropriate effect on the Analysis Graph. The Apply action is also on the context menu of the table’s cells. The Visible settings and effects can also be controlled with the context menu of nodes currently visible in the graph.

Figure 2.111: Highlight view of the Analysis Graph.

To the left of the Apply button are 4 buttons labeled:

• Select All, Deselect All, Invert Selection: to adjust the Visible setting of the rows displayed in the table. These actions are also on the context menu of the cells, along with the action to select or deselect the individual row.

• Deselect Others: to clear the Visible setting of the rows hidden from the current table view using column filters.

The table can be exported as a tab-separated CSV file by clicking on the export option on the context menu of the cells or via the view’s toolbar.

Clicking on a row in a table causes that row to be highlighted in the table. If the associated node is displayed in the current Analysis Graph it is also highlighted there.



Details of the table contents are:

• Processes:

– Visible: contains a checkbox to indicate if a node is considered visible. See the section above for details.

– Name: a simple-text column displaying the name SLX has for the process.

• Threads:


– Name: a simple-text column displaying the name SLX has for the thread. The name is preceded by an icon with the same color as the thread when displayed in the thread relationship sub-graph it belongs to.

– Location: a simple-text column displaying the file and line number where the thread was created, if known, or otherwise the first line executed by the program.

– Process: an enumerated column naming the process that executed the thread.

Note: if the process relationship sub-graph is being displayed, then threads for all processes are listed in the unfiltered table. Otherwise only the threads of the process being examined currently in the Analysis Graph are displayed.

• Functions:

– Selected: selected functions for further operations (e.g. show selection only on graph)


– Name: a simple-text column displaying the name SLX has for the function.

– Location: a simple-text column displaying the file and line number where the function was defined.

– Self Cost: averaged estimated cost for function execution (and as a percentage of the entire application)

– Total Cost: averaged estimated cost for a function and its callees (and as a percentage of the entire application)

– Reads: average number of reads performed by the function (and as a percentage of the entire application).

– Writes: average number of writes performed by the function (and as a percentage of the entire application).

– Sync. Calls: averaged estimated cost of synchronous function calls



– Self Cost ARM: averaged estimated self cost of function when running on an ARM processor (i.e. ARM_CORTEX_A9)

– Total Cost ARM: averaged estimated total cost of function when running on an ARM processor (i.e. ARM_CORTEX_A9)

– Process: an enumerated column naming the process that executed the function. This column is hidden by default.

– Threads: an enumerated column naming the threads that executed the function. This column is hidden by default.

The context menu of a function row is extended with actions to select callers and callees of the function. Note: if the process relationship graph is being displayed, functions used by all processes are listed in the unfiltered table. Otherwise only the functions used for the process being examined currently in the Analysis Graph are displayed.

• Variables:


– Name: a simple-text column displaying the name SLX has for the variable. The name is preceded by an icon to indicate if the variable is a local, global, heap or shared-memory variable.

– Type: a simple-text column displaying a representative name for the type of the variable, if SLX knows it. A type is not usually known by SLX for heap and shared-memory variables.

– Function: a simple-text column displaying the function where a local variable was defined, or a heap or shared-memory variable was created.

– Size: a number field displaying the size of a variable; see Table 2.2 for details.

– Kind: an enumerated column displaying if the variable is a local, global, heap or shared-memory variable.

– Process: an enumerated column naming, for multi-process applications, the process that executed the function.

– Threads: an enumerated column naming, for multi-threaded applications, the threads that executed the function.

– Thread Shared: a True/False enumerated column indicating if two or more threads in the application access the variable.

2.8.6.4 Graph Scalability

The Analysis Graph is equipped with scale settings. The scale setting limits the number of nodes that are visualized within the Analysis Graph. This includes both Function and Variable nodes. The setting can help to reduce the load time for a large Analysis Graph with thousands of nodes.

Figure 2.112 demonstrates the possible scale settings that can be applied. The alerts shown when opening a graph that reaches the scalability setting can be turned on or off through the preference dialog.

Figure 2.113 demonstrates an example of such a warning. The warning dialog reports the number of nodes present in the graph and the current maximum number of nodes that will be visualized. The number of nodes currently displayed in the graph can also be seen just below the Analysis Graph tab, to give an idea of the number of functions being visualized. If the default visualization is not satisfactory, the graph can be expanded or reduced with graph highlighting (Section 2.8.6.6) or through the node’s context menus (Section 2.8.6.1).

Figure 2.112: The scalability preference menu for the Analysis Graph.

Figure 2.113: When opening a large Analysis Graph, a warning message will appear when the graph contains more than the maximum number of nodes.



Figure 2.114: Analysis Graph with an expanded structure to display its members.

2.8.6.5 Subvariables

Some types of variables can be expanded to display accesses to their subvariables (see Section 2.8.7 for a description of subvariables). These variables have a square shape with round corners. For those variables, the context menu contains an Expand entry which will replace the node by its subvariables. For convenience, double-clicking such a variable will also expand the node.

Similarly, the context menu for subvariables has a Collapse entry which will replace that variable and all siblings, or subvariables thereof, by its parent. Subvariables that can’t be further expanded are represented by a circle, like normal scalar variables, but with a division along its horizontal. The name of a subvariable includes the name of the variable it is part of and a designation of the part that the subvariable represents. If subvariables are further expanded, intermediate designations may be truncated to fit in. The full designation is available in the tooltip for the subvariable. Figure 2.114 shows an example with a simple fp_complex structure and an array. The structure has been expanded to display its .real and .imag members. The outline view also reflects the subvariable hierarchy, as one can see in Figure 2.114.
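In source terms, a structure variable and its subvariables could look like the following C sketch. The manual only names the fp_complex type and its .real and .imag members; the member types, the variable names acc and taps, the size N_TAPS and the accumulate() function are assumptions made for illustration.

```c
/* Hypothetical definition; only the type and member names are given above. */
typedef struct {
    float real;
    float imag;
} fp_complex;

#define N_TAPS 4 /* illustrative array size */

fp_complex acc;     /* structure variable with members acc.real and acc.imag */
float taps[N_TAPS]; /* array variable */

void accumulate(void)
{
    for (int i = 0; i < N_TAPS; i++) {
        acc.real += taps[i]; /* reads taps[i], reads and writes acc.real */
        acc.imag -= taps[i]; /* reads taps[i], reads and writes acc.imag */
    }
}
```

Once a node such as acc is expanded, accesses are attributed to acc.real and acc.imag separately instead of to acc as a whole.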

2.8.6.6 Highlighting

The Analysis Graph supports multiple metrics to highlight functions and variables. Table 2.1 lists and explains all supported metrics. Values of the metrics are displayed in the Functions and Variables tabs of the Filter view (see Section 2.8.6.3). An example of the Functions filter tab is shown in Figure 2.115. Highlighting by metric in the graph provides an alternative visualization of the values.

Figure 2.115: Function filter table of the Analysis Graph

Highlighting can be changed via the graph’s background Properties (see Section 2.8.6.2). The highlighting of functions is done by coloring them. A more saturated function means that the function has a higher value for the selected metric. Variables are highlighted by resizing. A larger variable means that the variable has a higher value for the selected metric. An example that highlights functions according to their Total Cost and variables according to their Data Size is shown in Figure 2.116.

If the highlighting rules are applied to a graph that has the scaling setting enabled, a warning message will appear. Figure 2.117 shows the warning message. Two options are given for updating the graph to the new highlighting rule. The Replace option will only visualize the nodes meeting both the scalability criteria and the highlighting criteria, while the Add option will simply add any new nodes that are not already visualized in the graph.

Table 2.1: Highlight metrics for the Analysis Graph.

| Metric name | Supported node types | Description |
|---|---|---|
| None | Function/Variable | Default metric. |
| Self Cost | Function | Averaged estimated costs for a function running on the platform. |
| Total Cost | Function | Averaged estimated costs for a function, including total costs of callees of the function. |
| Reads | Function | Number of reads performed by a function. |
| Writes | Function | Number of writes performed by a function. |
| Access Count | Variable | Number of accesses made to the variable. |
| Data Size | Variable | Size of the variable type. |
| Self Cost (CORE, FREQ) | Function | Self Cost for a function running on a specific type of processor and clock frequency. |
| Total Cost (CORE, FREQ) | Function | Total Cost for a function running on a specific type of processor and clock frequency. |



Figure 2.116: Analysis Graph with functions highlighted by Total Cost and variables highlighted by Data Size.

Figure 2.117: Warning message when a highlighting rule is applied to a scaled graph.



Figure 2.118: Menu to display all functions that access a selected global variable.

2.8.6.7 Advanced Options

This subsection explains and visualizes more advanced filter possibilities related to the Analysis Graph.

For a function that accesses a global variable, it is important to know which other functions also access that global variable. As shown in Figure 2.118, SLX supports an easy way to visualize this information. Right-click a global variable in the Analysis Graph and select the Display Related Functions entry of the context menu that is displayed. The Analysis Graph view will now display all functions that access the selected global variable.

Another common task is to see the Analysis Graph for a given function in the source code file. Figure 2.119 shows how to navigate directly to the Analysis Graph of a given function from the SLX source code editor. First select the function name in the source code editor and right-click it. In the context menu, select the Show in Analysis Graph entry. The Analysis Graph view will open with the function corresponding to the previously highlighted function name selected.



Figure 2.119: Menu to navigate to the Analysis Graph of a highlighted function.

2.8.7 Memory Analysis

The Memory Analysis view displays detailed information on the usage of variables during the execution of the application. By clicking on the Memory Analysis button (see Figure 2.120), the Memory Analysis view is opened in the lower right area of the graphical interface. The Variables table groups variables under the categories Local, Global, and Dynamic.

Table 2.2 shows the different attributes presented and the exact measurements taken during dynamic tracing for SLX Memory Analysis. The last two columns of Table 2.2 indicate whether the column of each attribute can be sorted and filtered. For details on sorting and filtering, see Section 8.2. The L, G, and D columns indicate whether an attribute applies to Local (L), Global (G), and Dynamic (D) variables, as obtained by dynamic execution and presented by Memory Analysis. If a given attribute applies to a certain type of variable, the corresponding cell is set to Y; otherwise a dash (–) denotes that the attribute is inapplicable to that type.



Figure 2.120: Clicking Memory Analysis.

Table 2.2: Memory Analysis table column attributes

| Attribute | Description | L | G | D | Sort | Filter |
|---|---|---|---|---|---|---|
| Name | The name of the variable or subvariable. Dynamic variables use the allocation function as their name. It is also used to show the names of categories, processes and threads. The name can have multiple subentries that can be expanded. The subentries are either the different locations where an access to a variable occurs (see Figure 2.123), or subvariables that in turn can have subentries. An access location provides information about the Line where the access occurs, its Size (which can change across different allocations), and the Writes, Reads, and Total accesses for that Line. Variable names are grouped under the categories Local, Global, and Dynamic. | Y | Y | Y | Y | Y |
| Type | Variable type (e.g., array, struct, pointer, int). | Y | Y | – | Y | Y |
| File | File name (prepended with the relative path to access it) where the variable was declared (L, G) or created (D). The file is reported as external for extern or standard C library functions. | Y | Y | Y | N | Y |
| Function | The function where the variable is accessed. | Y | Y | Y | Y | Y |
| Line | Line number where the (sub-)variable has been declared. extern and standard C library functions don’t have an implementation in the original source code, and all accesses made to arguments are associated with line -1 (no line specified). | Y | Y | Y | Y | Y |
| Size (Bytes) | Size of the variable in bytes. For variable length arrays (VLAs) no size information is shown. For dynamic variables the allocation size is shown. | Y | Y | Y | Y | Y |
| Writes | Number of write accesses (definitions) to the variable. Expressed as an integer count. Hover over the cell to see the ratio to the total number of writes as a percentage in parentheses. E.g., 26 (2.0%) | Y | Y | Y | Y | Y |
| Reads | Number of read accesses (uses) to the variable. Expressed as an integer count. Hover over the cell to see the ratio to the total number of reads as a percentage in parentheses. E.g., 52 (4.0%) | Y | Y | Y | Y | Y |
| Total | Total number of accesses, either write or read, to the variable. Expressed as an integer count. Hover over the cell to see the ratio to the total number of accesses as a percentage in parentheses. E.g., 78 (6.0%) | Y | Y | Y | Y | Y |

Note: Only the variables defined in files located within BASEPATH will be shown. See Section 7.2.6 for more details.

By choosing to Sync with Analysis Graph (see Figure 2.121), accesses to variables will not be shown if the function where the access occurs has been hidden in the Analysis Graph (see Section 2.8.6). If a subvariable is hidden in the Analysis Graph or if none of its accesses would be shown, the subvariable is not shown.

For the workshop_fpga application, the default view appears as in Figure 2.122. The attributes for any variable appear in columns. By switching focus between the different types of variables, the columns change to present only the kind of information that is relevant to this type of variable.



Figure 2.121: Synchronize filtering with hidden elements of the Analysis Graph.

Figure 2.122: Code analysis information for the variables in the program, as requested by Memory Analysis.

A named entry for a given variable can have multiple subentries that can be expanded in the Memory Analysis tab. The subentries are either the different locations where an access to a variable occurs (see Figure 2.123), or subvariables that in turn can have subentries. Subvariables can be:

• struct members of structs that do not contain unions or bit-fields. The accessed members can be further expanded (see Figure 2.124). If a single access in the program accesses multiple members, that access will be shown under all those members.

• array index ranges of one dimension of an array. Different index ranges may overlap. The subentries are either subvariables of the elements in the range, such as inner dimensions of an array or struct members if the elements are structs, or accesses to the elements of the range from different locations (see Figure 2.125). If all elements in the range are accessed, the range is denoted by a dash (e.g. [0-42]). Otherwise, the range is denoted by a tilde (e.g. [0~42]), indicating that the first and last elements of the range are accessed, but some of the elements in the range may be untouched.
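As an illustration of the two range notations, the following C sketch accesses a 43-element array in two ways. The names (samples, fill_all, sum_even) are hypothetical and are not taken from the SLX examples, and the ranges mentioned in the comments are what the description above suggests would be shown, not captured tool output.

```c
#include <stddef.h>

#define N 43

/* Writes every element, so the description above suggests
 * a range shown as [0-42]. */
void fill_all(int samples[N])
{
    for (size_t i = 0; i < N; ++i)
        samples[i] = (int)i;
}

/* Reads only even indices: the first (0) and last (42) elements are
 * touched but the odd ones are not, which would correspond to [0~42]. */
int sum_even(const int samples[N])
{
    int sum = 0;
    for (size_t i = 0; i < N; i += 2)
        sum += samples[i];
    return sum;
}
```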

An access location provides information about the Line where the access occurs, its Size (which can change across different allocations), and the Writes, Reads, and Total accesses for that Line.



Figure 2.123: Expand the view to see multiple accesses to local variable b.

Figure 2.124: Expanded view to see the fields of struct fp_complex for the workshop_fpga example application.

The local variable b is accessed in five source lines: lines 3, 7, 9, 13, and 15 in file fp_complex.c. For each source line, the number of writes (0 or 204800), reads (204800 or 0), and total accesses (204800) is reported. Double-clicking a variable causes the GUI to highlight the declaration of, and the accesses to, that variable in the source code editor, as depicted in Figure 2.126 for variable scaled_vector_hw.



Figure 2.125: Expanded view to see the access ranges of the scaled_vector_hw array for the workshop_fpga example application.

Figure 2.126: Source code highlighting by double clicking on local variable scaled_vector_hw.



Figure 2.127: Hovering over the cell to see the percentage as a total number of Reads or Writes


3 Provided Examples

Some sample projects to use with SLX FPGA are available under the examples/fpga subfolder in your local SLX installation.

With an SLX installation, the following SLX FPGA examples are provided:

direct_connect
A simple example of matrix multiplication with matrix addition (Out = (A × B) + C) to demonstrate direct connection, which helps to increase system parallelism and concurrency. This is a Xilinx example obtained from https://github.com/Xilinx/SDSoC_Examples and is licensed under the 3-clause BSD license.

keccak
An SHA-3 implementation written for HLS by George Mason University’s Cryptographic Engineering Research Group. It was obtained from https://cryptography.gmu.edu/athena/index.php?id=source_codes and is licensed under the GPL v3 license.

systolic_array
A simple example of matrix multiplication to help developers learn systolic array based algorithm design. This is a Xilinx example obtained from https://github.com/Xilinx/SDSoC_Examples and is licensed under the 3-clause BSD license.

workshop_fpga
Application where the computation presents data level parallelism in a hardware function.

Sample Xilinx SDx projects are also provided under the examples/fpga/sdx folder. The user can import them with the Xilinx SDx Project... selection in the Import SLX Project menu.

array_zero
A simple example, allocating data with sds_alloc, freeing data with sds_free, and moving data between the Processing System and the Programmable Logic in a Xilinx Zynq UltraScale/UltraScale+ FPGA device.


color_space_conv
RGB-to-HSV and HSV-to-RGB color space converter.

FIR
A finite-impulse response (FIR) filter application.


4 Application and Parallelization Hints

SLX generates a wealth of information in order to guide the user to better understand parallelization opportunities of a given application. This information, falling under the categories of application hints (ahints and shints, each marked by its own icon in the SLX Hints view) and parallelization hints (phints, also marked by an icon), is visible as pop-up boxes in the GUI editor. Such hints provide information about:

• Hotspot detection during the construction of the Program Model for the application based on dynamic analysis of the application’s execution. Further interesting information at the application or function level that can be assessed once the discovered parallelization opportunities are exploited.

• The kind of detected parallelism (DLP, PLP) through application analysis.

• Estimated application speedup if the detected parallelism (DLP, PLP) is realized under a suitable parallel programming model and executed on the specified target platform.

4.1 Hint Format and Syntax

A hint is a sequence of statements expressed in natural language. This chapter covers application and parallelization hints when they haven’t been matched yet to a particular programming language. Its purpose is to provide information on results from SLX using a fixed format.

The following sections summarize the format of each application or parallelization hint along with a short explanation. Application hints can come from either analysis or exposing general partitioning information and are prefixed by (APP). Partitioning hints are prefixed by either DLP or PLP depending on the type of detected parallelism.

As a part of a hint’s description, regular expressions are used for denoting repetition and optionality of a token:


• The Kleene star (a*) is used for a token, a, repeated zero or more times.
• The Kleene plus (a+) is used for a repeated one or more times. It is equivalent to the concatenation a(a)*.
• The question mark (a?) is used for optionality, i.e., a occurs exactly zero or one times.
• Braces ({ }) are used for distinguishing between groups of tokens.
• The vertical bar (|) separates alternatives in a brace-enclosed group of alternatives.
• In addition, brackets ([ ]) indicate information that is not part of the fixed text of a hint. Such information is function and variable names as well as numbers.

This syntax is also used in code generation hints for particular programming languages.

4.2 Application Hints

The following subsections summarize the hints that convey information generated bythe analysis of the application.

4.2.1 Function Selected for Parallelization

APP - [function_name] was selected by the user for parallelization. Total execution percentage: [pct]%

Function function_name has been preselected by the user for auto-selection as a top-level hardware function. It is reported during analysis that the contribution of this function to the total application execution cost is equal to pct%.

The user can manually force a function to be considered for parallelization by specifying it in the FUNCTIONS command. How to use the FUNCTIONS command is explained in Section 7.2.4.
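Because the hint follows a fixed format, its bracketed fields can be extracted mechanically. The following is a minimal sketch in C, assuming the hint is available as a single line of text and that [pct] is printed as a decimal number; parse_selected_hint is a hypothetical helper, not part of SLX.

```c
#include <stdio.h>

/* Hypothetical helper: extracts [function_name] and [pct] from one
 * "selected for parallelization" hint line. Returns 1 on a full match,
 * 0 otherwise. Assumes function names contain no whitespace. */
int parse_selected_hint(const char *line, char name[64], double *pct)
{
    int consumed = 0;
    int fields = sscanf(line,
                        "APP - %63s was selected by the user for "
                        "parallelization. Total execution percentage: "
                        "%lf%% %n",
                        name, pct, &consumed);

    /* %n is only stored when the whole template, including '%', matched. */
    return fields == 2 && consumed > 0 && line[consumed] == '\0';
}
```

For example, a line naming a function fir_filter with a percentage of 42.5 would yield name "fir_filter" and pct 42.5, while a line with different fixed text is rejected.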

4.2.2 Function Excluded from Parallelization

APP - [function_name] was requested not to be considered for parallelization. Total execution percentage: [pct]%

Function function_name has been excluded by the user from auto-selection of top-level hardware functions. It is reported during analysis that the contribution of this function to the total application execution cost is equal to pct%.

The user can request this in the Configuration Editor by specifying it in the FUNCTION_EXCLUDES configuration variable. How to use the FUNCTION_EXCLUDES variable is explained in Section 7.2.5.


4.2.3 Function Identified as a Hotspot

APP - [function_name] has been identified automatically as a hotspot. Total execution percentage: [pct]%

The given function, function_name, has been automatically detected as a performance hotspot and therefore deserves detailed analysis. At this point it is reported that the contribution of this function to the total application execution cost is equal to pct%.

The user can alter the CANDIDATE_THRESHOLD, which controls the threshold for detecting execution hotspots. CANDIDATE_THRESHOLD is discussed in more detail in Section 7.2.3.

4.2.4 Function Rejected as a Hotspot

APP - [function_name] is rejected as a hotspot as it is compiler generated. Total execution percentage: [pct]%

The given function, function_name, has been automatically detected as a performance hotspot, but is rejected from further analysis as the code is compiler-generated. The hint draws attention to code that is needed by the program but cannot be directly modified. At this point it is reported that the contribution of this function to the total application execution cost is equal to pct%.

4.2.5 No Valid Parallelization Candidates Found

APP - No valid parallelization candidates were identified for application [app_name]

Application app_name does not contain any function that is identified as a valid parallelization candidate. This implies that no parallelism will be extracted for this application.

4.2.6 Function does not Exist

APP - Function [function_name] is listed in FUNCTIONS but was not instrumented or was not executed

The user has specified, using the FUNCTIONS command, the name of a function that either does not exist in the program source or was never instrumented or executed. Therefore, it does not exist in the Program Model either, and any reference to it is ignored.



4.2.7 Function Requested for Exclusion does not Exist

APP - Function [function_name] is listed in FUNCTION_EXCLUDES but was not instrumented or was not executed

The user has specified, using the FUNCTION_EXCLUDES command, the name of a function that does not exist in the program source or was never instrumented or executed. Therefore, it does not exist in the Program Model either, and any reference to it is ignored.

4.2.8 Function cannot be both a user-defined candidate and requested for exclusion

APP - Function [function_name] is listed in FUNCTION_EXCLUDES and FUNCTIONS but this is not supported

The user has specified function function_name to be a parallelization candidate while also specifying it as a function to be excluded from the Program Model. This is not supported and for this reason SLX exits with an error.

4.2.9 Function has Dynamic Data Dependencies that Hinder its Parallelization

APP - Function [function_name] cannot be parallelized because it has data dependencies that change depending on the call site

The dynamic analysis for function function_name concluded that there are data dependency edges that differ between different function invocations. This implies that it is not possible to parallelize this function in the same way for the entire lifetime of the program.
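One way such call-site dependent dependencies can arise is pointer aliasing. The following C sketch is a hypothetical illustration (the function names are invented and it is not claimed that SLX emits this hint for exactly this code): the same loop has no loop-carried dependency at one call site and a RAW dependency at the other.

```c
#include <stddef.h>

/* Hypothetical kernel: dst[i] is computed from src[i - 1]. */
void shift_add(int *dst, const int *src, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        dst[i] = src[i - 1] + 1;
}

/* Call site 1: distinct buffers, so no iteration reads a value
 * written by an earlier iteration. */
void call_disjoint(int *out, const int *in, size_t n)
{
    shift_add(out, in, n);
}

/* Call site 2: the same buffer is passed twice, so iteration i reads
 * the element written by iteration i - 1 (a loop-carried RAW dependency). */
void call_aliased(int *buf, size_t n)
{
    shift_add(buf, buf, n);
}
```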

4.2.10 Number of Functions not Selected by the User

APP - Not analyzing [num_functions] other functions as they were not selected by the user

Reports the number of functions that were not examined due to not being selected by the user. This means that the user provided a list of interesting functions by the FUNCTIONS configuration variable. This number is a count of all the functions in the program that were not provided in that list.


4.2.11 Number of Functions not Automatically Selected

APP - Not analyzing [num_functions] other functions as their execution percentage is less than CANDIDATE_THRESHOLD ([cand_thresh]%)

Reports the number of functions that were automatically excluded from being considered as parallelization candidates. In this case, interesting functions were selected if their execution percentage exceeded CANDIDATE_THRESHOLD. By default it is set to 0%. Since the user might have overridden the default value, the value actually used is mentioned as cand_thresh.

4.2.12 A Requested Hardware Function was not Executed

APP - Function [function_name] is listed in HW_FUNCTION but was not instrumented or was not executed

Function function_name was requested to be moved to hardware but this function does not exist in the program source or it exists but was never instrumented or executed. This causes the function to be ignored.

4.2.13 A Requested Hardware Function cannot be Excluded

APP - Function [function_name] is listed in FUNCTION_EXCLUDES and HW_FUNCTION. The function will be excluded

A function requested to be moved to hardware with the HW_FUNCTION command is also requested to be excluded with the FUNCTION_EXCLUDES command. Excluding the function is given priority, as reported by this hint.

4.2.14 A Requested Hardware Function will be Automatically Added to User-Defined Candidates

APP - Function [function_name] is specified in HW_FUNCTION but is not originally listed in FUNCTIONS. It will be considered as a user-specified function

This hint is produced when a function requested to be moved to hardware is not given in the explicit list of user-defined parallelization candidates provided by the FUNCTIONS command. In this case, the function defined by HW_FUNCTION will be added to the user-defined parallelization candidates.



4.2.15 Function main is not Supported as a Hardware Function

APP - Function main is not supported as a HW_FUNCTION

The main() function is not supported as a hardware function and the request is ignored. main() will also be ignored in the automatic selection of hardware functions.

4.2.16 No Hardware Functions Will be Selected

APP - Requested to not consider hardware functions

The hint is emitted when using the deprecated special value __none__ as top-level hardware function in the configuration editor. This hint is now deprecated.

4.3 Parallelization Hints

The following summarizes the hints that convey parallelization information during the analysis of the application.

4.3.1 The loop carries dependencies that can be ignored

The loop carries dependencies that can be ignored.
Induction variable: <variable-name>...
Reduction: <variable-name> (reduced over <operation>)...
Considered private: <variable-name>...

SLX FPGA ignores certain LCDs (Loop Carried Dependencies) when considering unrolling as they usually have a low negative impact.

• Induction variable: a variable with a value derived from the number of iterations that have been executed by the enclosing loop.

• Reduction variable: a scalar variable whose value in each iteration is computed with an operation on its value from the previous iteration and another value. The variable may not be used for other purposes in the loop, and the operation must be associative. For further details on reduction variables, see reduction variable.


• Private variable: one that has been assigned a value in an iteration of the loop before the first use of its value. Here, the last-produced value is not used after the loop.
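The three kinds of ignorable dependencies can be seen together in a single loop. The following is a hypothetical C example (sum_of_squares is not taken from the SLX examples); the comments mark which variable plays which role according to the definitions above.

```c
#include <stddef.h>

/* Hypothetical loop showing the three kinds of ignorable dependencies. */
int sum_of_squares(const int *data, size_t n)
{
    int sum = 0;   /* reduction variable: only updated via an associative '+' */
    int tmp;       /* private variable: written before it is read in each iteration */

    for (size_t i = 0; i < n; ++i) {   /* i is the induction variable */
        tmp = data[i] * data[i];
        sum += tmp;
    }
    return sum;    /* tmp is not used after the loop */
}
```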

4.3.2 Considered Unroll Factors

Unroll factor(s) <factor-list> will be considered.

SLX FPGA will consider the effectiveness of unrolling the loop with each of the listed unroll factors. You may try other factors; the ones listed are the ones that will be considered during optimization by SLX FPGA, assuming that they will result in the greatest benefit.

4.3.3 The loop will not be unrolled because it has not been iterated

The loop will not be unrolled because it has not been iterated.

There was no dynamic information collected for the loop because the loop has not been completely iterated. Therefore, the examined loop is not considered for parallelization.

4.3.4 The loop carries dependencies

The loop carries dependencies.
Variable: <variable-name> [<dependence-type:RAW|WAR|WAW>]...

A loop-carried dependency is one which exists between different iterations of the same loop. The listed variables are used in a way that causes such dependencies in the loop. In this case, SLX FPGA does not expect partial unrolling of the loop to be beneficial, although it will consider full unrolling. The loop-carried dependencies are listed with the dependency type, which can be any of the following:

• RAW: Read-After-Write dependency, also called a true or flow dependence, indicates the variable is read in an iteration after the one in which it was written.

• WAW: Write-After-Write dependency, also called an output dependence, indicates a variable is written to by two different iterations without an intermediate read.

• WAR: Write-After-Read dependency, also called an anti-dependence, indicates a value is written to a variable in an iteration after the one in which the old value was read.
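The three dependency types can be illustrated with one small loop each. These C functions are hypothetical examples written for this purpose and are not taken from the SLX examples.

```c
#include <stddef.h>

/* RAW: iteration i reads a[i - 1], which iteration i - 1 wrote. */
void prefix_sum(int *a, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i] + a[i - 1];
}

/* WAW: every iteration overwrites *last without reading it in between. */
void keep_last(const int *a, size_t n, int *last)
{
    for (size_t i = 0; i < n; ++i)
        *last = a[i];
}

/* WAR: iteration i reads a[i + 1], which iteration i + 1 then overwrites. */
void shift_left(int *a, size_t n)
{
    for (size_t i = 0; i + 1 < n; ++i)
        a[i] = a[i + 1];
}
```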



4.3.5 Pipelining will be considered

Pipelining will be considered.

SLX FPGA considers every loop during optimization for a pipelinable implementation in FPGA.

5 HLS Code Generation Hints

In addition to the application and parallelization hints, SLX FPGA generates informational hints for the purpose of generating HLS annotated code from the given sequential C/C++ implementation.

• HLS hints, also called hhints, which locate the code segments in the source file for which an HLS pragma was generated or give other interesting information about the generated code, for instance, any non-synthesizable application code. In the latter case, the hint explains the reason why the relevant construct cannot be synthesized and may suggest manual changes that could help. All generated HLS hints are described in Section 5.1. They are marked by an icon in the SLX Hints view.

• HLS pragmas that are supported by Xilinx Vivado HLS. The original sequential C code is automatically annotated with the identified pragmas. All supported HLS pragmas for automatic generation using SLX FPGA are described in Section 5.2.

5.1 HLS Hints

The following subsections summarize the hints that convey information about positive decisions that lead to automatic HLS pragma insertion and negative decisions giving the rationale for not inserting an HLS pragma.

5.1.1 Inserted unroll Pragma

DLP - HLS unroll pragma {{with unroll factor of [n]} {and skip exit check}?}? inserted for loop ([low1:high1]) originally at lines ([low2:high2])

This hint informs that the loop at lines low2 to high2 in the original source code was identified as a parallel loop that can be unrolled with a factor of n. The location of the loop in the generated code is at lines low1 to high1. In case n divides perfectly the number of iterations of the loop, then checking for the exit condition is not necessary and this check will be skipped from the hardware implementation.

Currently, full loop unrolling will be reported with an explicit unroll factor, even though this is strictly not necessary, as not providing an unroll factor implies a request for a full unrolling. Annotating all unroll pragmas with an explicit unroll factor facilitates the full loop unrolling of dynamic loops. Otherwise, Xilinx Vivado HLS would not be able to fully unroll these loops.

The description of the inserted pragma is given in Section 5.2.1.
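As a rough illustration of what an annotated loop can look like, the following C sketch places a Vivado HLS unroll pragma with an explicit factor and the skip_exit_check option inside a loop. The kernel, its name, and the chosen factor are hypothetical and are not actual SLX FPGA output; a regular C compiler ignores the pragma.

```c
#define LEN 64

/* Hypothetical kernel, not SLX output: LEN is divisible by 4,
 * so the exit check between unrolled copies can be skipped. */
void scale(int out[LEN], const int in[LEN], int k)
{
    for (int i = 0; i < LEN; ++i) {
#pragma HLS unroll factor=4 skip_exit_check
        out[i] = in[i] * k;
    }
}
```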

5.1.2 Inserted pipeline Pragma

PLP - HLS pipeline pragma inserted for loop ([low1:high1]) originally at lines ([low2:high2])

The loop at lines low2 to high2 in the original source code was identified with pipeline-level parallelism. The location of the loop in the generated code is at lines low1 to high1.

The pipeline pragma is issued without an explicit initiation interval, i.e., without an estimate of the number of cycles required for starting a new iteration of the loop. Omitting this forces Xilinx Vivado HLS to explore the feasible initiation intervals for the loop according to loop dependencies and select the lowest possible one.
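A pipelined loop without an explicit initiation interval might look like the following C sketch. The dot-product kernel is a hypothetical example, not SLX FPGA output; the pragma carries no II option, in line with the description above.

```c
#define TAPS 8

/* Hypothetical dot product; the pipeline pragma has no II option,
 * leaving the choice of initiation interval to the HLS tool. */
int dot(const int a[TAPS], const int b[TAPS])
{
    int acc = 0;
    for (int i = 0; i < TAPS; ++i) {
#pragma HLS pipeline
        acc += a[i] * b[i];
    }
    return acc;
}
```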


5.1.3 Inserted loop_tripcount Pragma

HLS loop_tripcount pragma reporting {[average] | {a minimum of [minimum], a maximum of [maximum]{, and an average of [average]}?}} iterations, inserted for loop ([low1]:[high1]) originally at lines ([low2]:[high2])

This hint reports the minimum, maximum, and average number of iterations for each source-level loop in the program. The description of the inserted pragma is given in Section 5.2.3.
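For a loop whose bound is only known at run time, a loop_tripcount annotation could look like the following C sketch. The function and the min, max, and avg values are invented for illustration; in SLX FPGA they would come from the dynamic analysis.

```c
/* Hypothetical variable-bound loop; the min/max/avg values are made up. */
int sum_first(const int *data, int n)
{
    int sum = 0;
    for (int i = 0; i < n; ++i) {
#pragma HLS loop_tripcount min=1 max=16 avg=8
        sum += data[i];
    }
    return sum;
}
```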

5.1.4 Inserted array_partition Pragma

[DLP|PLP] - HLS array_partition pragma for variable [varname] inserted for function [function_name]

Incoming array variables to a parallel loop will be partitioned so that there are enough memory ports for accessing the necessary data from each DLP worker. This hint reports that this knowledge has been applied to array variable varname which is read or written in the loop region at lines low to high in the original source code. A separate hint is produced for each variable.

Note: Partitioning array variables comes with some limitations. The following cases of statically-allocated array variables are not currently supported for partitioning:

– Arrays declared in the local scope, e.g., inside a loop body.

– Arrays for which there exists no parallel loop (DLP or PLP) where they can be accessed by their syntactic name.

– Arrays declared with extern.

– Subvariables, for example, struct fields.

Function argument arrays may obtain an array_reshape pragma, see Section 5.1.5 for more details.
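The following C sketch shows the general shape of an array_partition annotation in Vivado HLS syntax. The function, the array buf, and the cyclic factor of 4 are hypothetical choices made to match the unroll factor of the consuming loop; whether SLX FPGA would pick this array or these options is not claimed here.

```c
#define N 16

/* Hypothetical example: buf is split cyclically into 4 banks so that the
 * 4 copies of the unrolled loop body can each read a different bank. */
int sum_squares_buffered(const int in[N])
{
    int buf[N];
#pragma HLS array_partition variable=buf cyclic factor=4 dim=1
    int sum = 0;

    for (int i = 0; i < N; ++i)
        buf[i] = in[i] * in[i];

    for (int i = 0; i < N; ++i) {
#pragma HLS unroll factor=4
        sum += buf[i];
    }
    return sum;
}
```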

5.1.5 Inserted array_reshape Pragma

[DLP|PLP] - HLS array_reshape pragma for argument [varname] inserted for function [function_name]

Array variables passed as function arguments will be partitioned. The variable partitioning will ensure that, for each DLP worker, enough memory ports are available to access the data the worker reads or writes. Data will be concatenated so that multiple elements of the original arrays can be accessed in parallel. This hint reports that this knowledge has been applied to function argument varname which is accessed in function function_name. A separate hint is produced for each argument.

Note: Only function arguments are considered for array reshaping. Subvariables, for example, struct fields in function arguments will not be reorganized by array reshaping.
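For comparison with array partitioning, the following C sketch applies array_reshape in Vivado HLS syntax to function arguments. The function and the cyclic factor of 4 are hypothetical and chosen only to match the unroll factor; this is not SLX FPGA output.

```c
#define N 16

/* Hypothetical example: each argument is reshaped so that 4 consecutive
 * elements are packed into one wider word and can be accessed together. */
void add_vectors(int out[N], const int a[N], const int b[N])
{
#pragma HLS array_reshape variable=a cyclic factor=4 dim=1
#pragma HLS array_reshape variable=b cyclic factor=4 dim=1
#pragma HLS array_reshape variable=out cyclic factor=4 dim=1
    for (int i = 0; i < N; ++i) {
#pragma HLS unroll factor=4
        out[i] = a[i] + b[i];
    }
}
```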

5.1.6 Inserted inline Pragma

HLS inline pragma inserted for function [function_name]

It is often, if not always, worthwhile to inline a non-top-level function that is called from a single static call site. The reason is that it has to be implemented fully in hardware anyway. If it is not inlined, unnecessary data copies from the caller to the callee might be needed. If the callee is inlined inside the caller, it is likely that many of these copies will be eliminated. Further, storage requirements should be less than if the code is not inlined because the function is called only from one site.
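A function with a single static call site, annotated for inlining in Vivado HLS syntax, might look like the following C sketch. Both functions are hypothetical examples and not SLX FPGA output.

```c
/* Hypothetical helper with exactly one static call site (in clamp_all). */
static int clamp(int v, int lo, int hi)
{
#pragma HLS inline
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* The only caller of clamp(); after inlining, no separate hardware
 * block is needed for the helper. */
void clamp_all(int *data, int n, int lo, int hi)
{
    for (int i = 0; i < n; ++i)
        data[i] = clamp(data[i], lo, hi);
}
```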


5.1.7 Inserted interface Pragma

[mode] HLS interface pragma inserted

The hint points at an HLS interface pragma inserted in a function. The pragma configures how interfaces are implemented in hardware for top-level hardware functions. The mode of the interface corresponds to the interface mode expected by Vivado HLS.

More details about the inserted pragma can be found in Section 5.2.7.
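The following C sketch shows interface pragmas in Vivado HLS syntax on a hypothetical top-level function, using the m_axi and s_axilite modes. The function, the bundle names gmem and control, and the choice of modes are assumptions for illustration; the modes that SLX FPGA actually inserts are described in Section 5.2.7.

```c
/* Hypothetical top-level function: the arrays are accessed over an AXI
 * master interface, while the pointer offsets, the scalar argument, and
 * the block control signals use an AXI4-Lite slave interface. */
void vec_double(const int *in, int *out, int n)
{
#pragma HLS interface m_axi port=in offset=slave bundle=gmem
#pragma HLS interface m_axi port=out offset=slave bundle=gmem
#pragma HLS interface s_axilite port=in bundle=control
#pragma HLS interface s_axilite port=out bundle=control
#pragma HLS interface s_axilite port=n bundle=control
#pragma HLS interface s_axilite port=return bundle=control
    for (int i = 0; i < n; ++i)
        out[i] = 2 * in[i];
}
```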

5.1.8 Non-synthesizable Function Call was Substituted

Wrapper - Non-synthesizable call to function: [function_name] originally at↪→ line [linenum] has been replaced with call to [new_function_name] [↪→ WARNING: Check functional equivalence]

This hint is produced when SLX FPGA automatically replaces a call to a non-synthesizable function from the standard C library with a call to a synthesizable func-tion providing the same functionality. The synthesizable function is included in theuser’s program by inserting the corresponding include, slx_hls_synth.h, automati-cally. Again, the process is completely transparent to the user and requires no manualintervention.

The list of supported non-synthesizable functions for automatic replacement is:

• srand, rand nominally included from stdlib.h

• isalnum, isalpha, isblank, iscntrl, isdigit, islower, isprint, ispunct, isgraph,isspace, isupper, isxdigit, tolower, toupper nominally included from ctype.h

The use of the synthesizable version of rand() and srand() have some limitations.The synthesizable version implementation is a POSIX-compliant one, but the POSIXstandard does not specify the implementation of rand() meaning that a different ver-sion might be provided by the POSIX implementation on the CPU subsystem. Further,even if the same implementation is used from both CPUs and FPGA, if calls to rand()and/or srand() occur from both CPU and FPGA functions, the resulting sequence forany given seed(s) will not be reproducible due to using independent seed, and thusstate, variables to seed and progress the pseudo-random number generator.

For the ctype.h functions, the synthesizable implementations are restricted to the Unicode 0x0000-0x007F range, which corresponds to 7-bit ASCII. Language-specific locales are not supported.
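As a rough illustration of what an ASCII-only, locale-independent implementation looks like, consider the following sketch. The function names ascii_isdigit and ascii_toupper are hypothetical; the actual replacement functions shipped with SLX FPGA may be written differently.

```c
/* Hypothetical ASCII-only helpers; values outside 0x00-0x7F are never
   classified and are returned unchanged by the conversion. */
static int ascii_isdigit(int c) {
    return c >= '0' && c <= '9';
}

static int ascii_toupper(int c) {
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}
```

A character such as 0xE9 (e with acute accent in Latin-1) is left untouched by ascii_toupper, whereas a locale-aware toupper on the CPU could convert it.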

Note: Adding custom implementations for library functions to the automatic code refactoring mechanism is possible with minor effort. If you need this, please contact Silexica customer support for detailed instructions.

5.1.9 Function main() will not be considered for synthesis

Function 'main' is considered not synthesizable as it is assumed that the execution environment cannot be entered from the FPGA.

The function main() is not considered for synthesis in SLX FPGA. It is recommended to keep it as a wrapper running on the host that calls the functions that can potentially be mapped to the FPGA.
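The recommended structure can be sketched as follows. The kernel scale_by_two and the wrapper host_main are hypothetical names; host_main stands in for main(), which would contain the same body in a real application.

```c
#define N 8

/* Candidate for the FPGA: plain computation on fixed-size arrays. */
void scale_by_two(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) {
        out[i] = 2 * in[i];
    }
}

/* Stands in for main(): stays on the host, prepares the data, and
   calls the function that may be mapped to the FPGA. */
int host_main(void) {
    int in[N];
    int out[N];
    for (int i = 0; i < N; ++i) {
        in[i] = i;
    }
    scale_by_two(in, out);
    return out[N - 1];
}
```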

5.1.10 Synthesizability Check

Synthesizability check as a top-level function failed for the function ([function_name])[context_specific_message]

Each function that is a candidate for acceleration is checked for synthesizability. SLX FPGA uses Xilinx Vivado HLS to check whether a function is synthesizable. If non-synthesizable code is detected, the user is informed with an SLX hint comprising two parts. The first part states that the synthesizability checks failed for the given function. The second part contains the context-specific message emitted by the call to Vivado HLS. The location of the offending code segment is given, and the user can click the hint in the SLX Hints tab view to go to the exact location in the code.

Further, the user can click the help link on a given hint to open a detailed description of the issue in HLS Hints Synthesizability Guidance, with suggestions on how to resolve each issue and produce synthesizable code.

Note: A function may not be synthesizable as a top-level function, but it may be part of (be directly or indirectly called from, or inlined in the body of) a top-level function that is synthesizable. For example, a function is not synthesizable as a top-level function if it has array arguments of unknown size. However, if it is called from another function that passes arrays of fixed and known sizes as these arguments, the function in question will be synthesizable as part of this other function. In this case, Vivado HLS can infer the type and synthesize the top-level function successfully even if the called or inlined function cannot be synthesized by itself.
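The situation described in the note can be sketched as follows. The function names sum_any and sum_top are hypothetical.

```c
#define N 16

/* Not synthesizable as a top-level function: the size of 'data'
   is unknown at compile time. */
static int sum_any(const int data[], int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

/* Top-level candidate: the array size is fixed and known, so the
   callee can be synthesized as part of this function. */
int sum_top(const int data[N]) {
    return sum_any(data, N);
}
```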

Table 5.1: Synthesizability check codes and the corresponding source-level constructs that are unsupported for synthesis.

Phase   Code     Description
SYNCHK  200-11   Non-synthesizable type
SYNCHK  200-17   Unsupported union type
SYNCHK  200-21   Fused multiply-add (fma) is unsupported and throws a compiler error
SYNCHK  200-22   memcpy() is only supported for transferring data over AXI
SYNCHK  200-31   Dynamic memory (de)allocation
SYNCHK  200-41   Pointer reinterpretation
SYNCHK  200-42   Pointer comparison
SYNCHK  200-43   Use of non-static pointers
SYNCHK  200-61   Arrays with unknown size at compile time
SYNCHK  200-64   C99 flexible array members
SYNCHK  200-71   Math functions that are not supported for synthesis
SYNCHK  200-72   Unsupported C library functions, such as strlen, malloc, free
SYNCHK  200-73   Functions with variable number of arguments
SYNCHK  200-74   Recursive functions
SYNCHK  200-75   Functions returning a pointer
SYNCHK  200-76   Calling a function pointer
SYNCHK  200-79   Missing top-level function
SYNCHK  200-80   Overloaded member function cannot be a top-level function
SYNCHK  200-91   FIFO access is not possible on read-write ports
SYNCHK  200-92   AXIS access is not possible on read-write ports
XFORM   203-733  An internal stream used in a non-dataflow region may lead to a deadlock

Table 5.1 lists the constructs that cannot be synthesized with Xilinx Vivado HLS and that, when detected, generate an SLX synthesizability check hint. For more details about each unsupported construct and HLS code guidelines, please check the Vivado HLS user guides and application notes.
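As one example of resolving an entry of Table 5.1, dynamic memory allocation (SYNCHK 200-31) can often be replaced by a buffer with a compile-time bound. The sketch below is illustrative only; the bound MAX_LEN and the function name are assumptions.

```c
#define MAX_LEN 64

/* Instead of allocating 'n' elements with malloc(), a local buffer of
   fixed maximum size is used and 'n' is clamped to that bound. */
int sum_of_squares(const int in[MAX_LEN], int n) {
    int tmp[MAX_LEN];
    int acc = 0;
    if (n > MAX_LEN) {
        n = MAX_LEN;
    }
    for (int i = 0; i < n; ++i) {
        tmp[i] = in[i] * in[i];
    }
    for (int i = 0; i < n; ++i) {
        acc += tmp[i];
    }
    return acc;
}
```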

5.1.11 No Code With HLS Pragmas was Generated

No HLS{+OpenMP}? code was generated for the application

This hint is generated when no pragmas could be generated for the entire application. This occurs when the application has no parallelization candidates for which any of the supported HLS pragmas can be inserted. In this case, there are no generated files under output/codegen/hls.

5.1.12 There are No Files to be Extended with HLS Information

There are no files to be extended with HLS{+OpenMP}? information

This hint is generated when neither pragma insertion nor source code rewriting was performed on the source files of the application. This happens when the application has no source-level loops and no functions to be wrapped by synthesizable replacement functions.

In this case, there are no generated files under output/codegen/hls.

5.1.13 Latency and Initiation Interval of Hardware Function

Function [function_name] implemented in hardware with a latency of [latency] cycles ([num] {ns|us|ms|s}) and an initiation interval of [initiation_interval] cycles ([num] {ns|us|ms|s})

The hint reports the latency and initiation interval for function_name. Both are given in FPGA clock cycles and in time units.

5.1.14 Ignore a Top-Level Hardware Function that is Optimized Away

A cost of zero for the top-level hardware function [function_name] means that it is trivial or may have been optimized away

The HLS tool has detected a user-selected hardware function with zero cost. This may mean that the function is trivial and may not be a profitable candidate. In such a case, the produced call graph does not show any speedup obtained from moving this function to hardware.

5.1.15 Time was Not Estimated for a Top-Level Hardware Function

A time estimate for hardware function [function_name] is not available. Potential reasons are: multiple induction variables in a loop of the hardware function, manually deleted or missing loop_tripcount pragmas. The available software time estimate will be used instead

This hint is emitted when the Vivado HLS cycle estimator was not able to determine the performance of a hardware function.

In such cases, the execution time estimated by SLX for the software function is used instead, as shown in the Pure Software call graph of the application. Because there are no accurate cycle estimates for the hardware function, SLX cannot deduce the execution time of the top-level hardware function, and thus no speedup is computed for it.

5.1.16 No Pragma Generated because a Loop is Declared in a Macro

Skipping annotation '[pragma_text]' because the loop is declared in a macro

The hint is emitted when SLX could not generate HLS pragmas because the loop to which the pragmas relate is declared in a preprocessor macro.

SLX FPGA does not attempt preprocessor macro modifications and will thus not generate pragmas in loops declared inside macros. When a loop needs to be annotated, it is recommended to declare it without using preprocessor macros. Alternatively, the preprocessed application code can be used instead of the original source code.
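The difference between the two styles can be sketched as follows. The macro FOR_EACH and the function names are hypothetical.

```c
#define N 8

/* Loop header hidden in a macro: pragma insertion is skipped here. */
#define FOR_EACH(i, n) for (int i = 0; i < (n); ++i)

static int sum_macro(const int v[N]) {
    int acc = 0;
    FOR_EACH(i, N) {
        acc += v[i];
    }
    return acc;
}

/* Same loop written out explicitly: eligible for annotation. */
static int sum_plain(const int v[N]) {
    int acc = 0;
    for (int i = 0; i < N; ++i) {
        acc += v[i];
    }
    return acc;
}
```

Both functions compute the same result; only the second exposes the loop at the source level.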

5.1.17 Cycle Time was Adjusted to Meet Timing

The clock period for all hardware functions was adjusted to a {supported|custom} period of [num] ns in order to meet timing

Some hardware function implementations may not meet timing, i.e., the computed cycle time from Vivado HLS exceeds the clock constraint applied by the FPGA Clock Frequency in the Configuration Editor. If the lowest clock frequency is already the one used, then the computed cycle time will be used for this purpose, and this is considered a custom clock period. The Vivado HLS Synthesis Flow always reports the latter.

If the implementations of more than one hardware function violate timing, the longest cycle time is used as the common clock period of all hardware functions.

Note: The execution times shown by SLX FPGA with adjusted clock periods are estimates based on the hardware generated for the frequency initially configured. Because the hardware implementation may change with different frequency settings, SLX FPGA can produce more accurate estimates by re-generating the hardware for a lower frequency.

5.1.18 Clock Constraint was Violated for a Hardware Function

The hardware implementation of function [function_name] did not meet the clock constraint of [num] ns ([num] MHz) but achieved a cycle time of [num] ns instead

When function function_name does not meet the timing constraints during Vivado HLS synthesis, SLX FPGA reports the actual cycle time for which the hardware schedule was achieved.

5.1.19 No Valid Solution to the Hardware Optimization Problem was Found

No valid solution to the hardware optimization problem was found because the functions mapped to the FPGA may not fit in the device. Parallelism patterns are disabled by default.

No valid solution to the hardware optimization problem was found because the area required by the hardware implementation is greater than the area available in the device. As a consequence, SLX FPGA disables all parallelism patterns for each hardware function. If desired, the patterns can be enabled manually before moving to the synthesis phase using the CPU/FPGA Function Partition Editor.

5.1.20 Cannot proceed due to exceptions

Cannot use '[try|throw]': Exceptions and their corresponding C++ constructs (throw, try, etc) are not yet supported in SLX FPGA. Click here for more details. Please contact [email protected] for more information.

SLX FPGA currently does not support the use of exceptions or exception-handling code outside of system source files. In the presence of such constructs, SLX FPGA will abort the analysis of the application and display the above error message. This behavior is identical to using the -fno-exceptions compiler flag when compiling the source code.

In order to enable the analysis of the application, please remove all the try-catch blocks and throw statements in the source code. Errors must be handled by returning error codes and checking them in callers. Exception handling code (including code in the test bench or CPU code) must be removed from the entire code base.
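A minimal sketch of the error-code style is shown below. The type status_t and the functions safe_div and average are hypothetical and only illustrate how a throw and its handler can be replaced by a returned status that the caller checks or propagates.

```c
typedef enum { STATUS_OK = 0, STATUS_DIV_BY_ZERO = 1 } status_t;

/* Reports failure through the return value instead of throwing. */
static status_t safe_div(int a, int b, int *out) {
    if (b == 0) {
        return STATUS_DIV_BY_ZERO;
    }
    *out = a / b;
    return STATUS_OK;
}

/* The caller propagates the status instead of using try/catch. */
static status_t average(const int *v, int n, int *out) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += v[i];
    }
    return safe_div(sum, n, out);
}
```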

5.2 HLS pragmas

The following subsections summarize the HLS pragmas that can be generated by SLX FPGA. They are placed in the annotated version of users’ C code, with the annotated code placed under the <project>/output/codegen/hls. Pragmas will not be generated for the source code which is not located below <project>.

5.2.1 Unroll pragma

// Loop annotated with HLS directives by SLX
#pragma HLS unroll {{skip_exit_check}? factor=[n]}?

This directive annotates the body of a loop with the HLS unroll pragma. The loop will be partially unrolled if a factor less than the number of iterations of the loop is provided. For n that evenly divides the number of iterations, the skip_exit_check attribute will eliminate any intermediate checks for an early exit from the loop.
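For illustration, an annotated loop might look as follows. The function and the factor of 4 are assumptions chosen for this sketch; a standard C compiler ignores the pragma, so the code behaves identically in software.

```c
#define N 16

void scale(int out[N], const int in[N], int k) {
    for (int i = 0; i < N; ++i) {
        /* 4 divides the 16 iterations evenly, so the exit check
           between the unrolled copies can be skipped. */
#pragma HLS unroll skip_exit_check factor=4
        out[i] = in[i] * k;
    }
}
```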

5.2.2 Pipeline pragma

// Loop annotated with HLS directives by SLX
#pragma HLS pipeline

This directive annotates the body of a loop with the HLS pipeline pragma. The loop will be pipelined automatically by Xilinx Vivado HLS with an initiation interval that is computed by the HLS tool. This can be 1 or more.
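A pipelined loop might look as in the following sketch; the dot-product function is a hypothetical example.

```c
#define N 16

int dot(const int a[N], const int b[N]) {
    int acc = 0;
    for (int i = 0; i < N; ++i) {
        /* No initiation interval is requested; the HLS tool picks it. */
#pragma HLS pipeline
        acc += a[i] * b[i];
    }
    return acc;
}
```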

5.2.3 Loop Trip Count pragma

// Loop annotated with HLS directives by SLX
#pragma HLS loop_tripcount min=[minimum] max=[maximum] {avg=[average]}?

SLX FPGA generates a loop_tripcount for every source-level loop in the program. This directive provides the minimum, maximum, and average number of iterations for the loop. This pragma helps the HLS tool obtain more accurate cycle estimates for each loop individually. If there are no other sources of variation for the data the program is operating on (e.g., different input arguments enabling different execution paths), this is more likely to lead to a specific cycle estimate for the entire execution of the top-level hardware function on the FPGA.
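The sketch below shows a loop whose bound depends on an input argument, which is where the trip count annotation matters most. The function and the min, max, and avg values are made up for illustration and do not come from a real analysis.

```c
#define MAX_N 64

int sum_first(const int v[MAX_N], int n) {
    int acc = 0;
    for (int i = 0; i < n && i < MAX_N; ++i) {
        /* Hypothetical iteration counts observed for this loop. */
#pragma HLS loop_tripcount min=1 max=64 avg=32
        acc += v[i];
    }
    return acc;
}
```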

5.2.4 Array Partition pragma

// Function annotated with HLS directives by SLX
#pragma HLS array_partition variable=[var] {cyclic factor=[n]|complete {dim=0}?}

This directive annotates the body of a function with the HLS array_partition pragma for incoming array variables that are read or written in parallel loops in the function.

An array partitioning pragma forces a specific memory organization for the array variable which cannot be changed during the execution of the hardware function. To avoid conflicts between different array partition pragmas for the same array variable, SLX generates this pragma once per array.

When the dimension clause, dim, is not specified, the array is partitioned across its inner loop dimension by default. For completely partitioning a small multi-dimensional array, dim=0 is given, which means that such array will be partitioned completely across all its dimensions. A cyclic partitioning for a factor equal to the number of elements in a single-dimensional array is automatically converted to a complete one.

The array_partition pragma is usually combined with an unroll or pipeline pragma to specify how arrays will be split in local memory storage so that more parallel accesses to them from the DLP workers are possible.
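A possible combination of array_partition with unroll is sketched below. The function name and the factor of 4 are assumptions made for this example.

```c
#define N 16

int sum4(const int in[N]) {
    /* Split 'in' cyclically over 4 memories so that the 4 unrolled
       loop copies can read their elements in parallel. */
#pragma HLS array_partition variable=in cyclic factor=4
    int acc = 0;
    for (int i = 0; i < N; ++i) {
#pragma HLS unroll factor=4
        acc += in[i];
    }
    return acc;
}
```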

5.2.5 Array Reshape pragma

// Function annotated with HLS directives by SLX
#pragma HLS array_reshape variable=[var] {cyclic factor=[n]|complete}

This directive annotates the body of a function with the HLS array_reshape pragma for those of its arguments that are arrays read or written in DLP loops in the function.

An array reshaping pragma combines array partitioning (Section 5.2.4) with concatenating elements in order to provide parallel access to the data. A cyclic reshaping for a factor equal to the number of elements in a single-dimensional array is automatically converted to a complete one.

The array_reshape pragma is usually combined with an unroll or pipeline pragma to specify how arrays will be split in local memory storage so that more parallel accesses to them from the DLP workers are possible.
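The following sketch shows array_reshape used together with unroll. The function name and the factor of 4 are assumptions made for this example.

```c
#define N 16

void add_one(int out[N], const int in[N]) {
    /* Reshape both arrays so that 4 consecutive-in-cycle elements are
       packed into one wider word and accessed together. */
#pragma HLS array_reshape variable=in cyclic factor=4
#pragma HLS array_reshape variable=out cyclic factor=4
    for (int i = 0; i < N; ++i) {
#pragma HLS unroll factor=4
        out[i] = in[i] + 1;
    }
}
```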

5.2.6 Inline pragma

// Function annotated with HLS directives by SLX
#pragma HLS inline

This directive annotates the body of a function with the HLS inline pragma so that Vivado HLS will inline the function inside its caller.

A callee function is inlined inside its caller when it is not a top-level hardware function and it has a single static call site for each top-level hardware function. Restricting inlining to a single static call site guarantees that the area consumption does not exceed what would occur without inlining.
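A callee with a single static call site might be annotated as in the following sketch; the function names are hypothetical.

```c
#define N 8

/* Helper with exactly one static call site in the top-level function. */
static int square(int x) {
#pragma HLS inline
    return x * x;
}

int sum_squares(const int v[N]) {
    int acc = 0;
    for (int i = 0; i < N; ++i) {
        acc += square(v[i]); /* the only call site of square() */
    }
    return acc;
}
```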

5.2.7 Interface pragma

// Function annotated with HLS directives by SLX
#pragma HLS interface [mode] port=[var]

The directive configures how the arguments of top-level hardware functions should be synthesized. SLX FPGA generates interface pragmas for two main purposes: synthesizability and performance. In some cases, the default interface configuration is not synthesizable and a specific configuration is required to synthesize the hardware. In other cases, SLX FPGA determines that a particular interface configuration will lead to increased performance and generates the appropriate directive.
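As an illustration, a top-level function whose arguments are accessed sequentially could carry interface pragmas such as the ones below. The ap_fifo mode is only an example chosen for this sketch; the mode that SLX FPGA actually selects depends on its analysis of the function.

```c
#define N 16

void negate(const int in[N], int out[N]) {
    /* 'in' is only read and 'out' is only written, each in order. */
#pragma HLS interface ap_fifo port=in
#pragma HLS interface ap_fifo port=out
    for (int i = 0; i < N; ++i) {
        out[i] = -in[i];
    }
}
```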

6 HLS Messages

SLX FPGA detects a number of user program attributes that can affect the quality of results or the feasibility of the functionality moved to the FPGA. These are formatted as errors, warnings and infos.

• An error has a catastrophic outcome to a certain phase or phases that should normally be executed or accommodated by SLX FPGA.

• A warning will not have a catastrophic outcome but it is highly possible that the quality of results will be affected.

• An info is a piece of information that may be useful to the user.

6.1 Infos

The following subsections list the messages registered as info that are produced by SLX FPGA.

6.1.1 No Synthesizability Checks for OpenMP

info: In the openmp CODEGEN_MODE, no functions need be checked for synthesizability.

This is a reminder that the synthesizability checks will not run. The reason is that in the openmp code generation mode no function is mapped to hardware. Whether a function is synthesizable or not is therefore irrelevant.

6.1.2 Lightweight Synthesizability Checks

Synthesizability is checked in two steps: first, lightweight tests are performed to quickly check for trivial problems in all the functions and second, more advanced tests are performed on demand to check all problems that may prevent the synthesis of functions. This strategy allows SLX FPGA to quickly dismiss functions which obviously cannot be synthesized.

info: Checking synthesizability of '[function_name]'

This message is emitted when the synthesizability of a function is checked.

info: '[function_name]' is not synthesizable

This message is emitted when a function is found to be non-synthesizable.

info: '[function_name]' is assumed synthesizable

This message is emitted when a function is assumed to be synthesizable for the rest of the flow because no further checks are possible. This is the case, for instance, when no Vivado HLS installation is found. Please refer to Section 6.1.3 for more details about how Vivado HLS is used to check synthesizability and Section 2.4.2 for directions to configure Vivado HLS in SLX FPGA.

info: Queuing advanced synthesizability checks for '[function_name]'

This message is emitted when a function needs more advanced checks to determine its synthesizability. The function will be retested later when requested from the function mapping editor.

6.1.3 Advanced Synthesizability Checks

This section describes the messages that can occur during the more advanced synthesizability checks. In this second step (triggered from actions in the function mapping editor), Vivado HLS is invoked in pre-synthesis mode to evaluate the synthesizability of the functions. A valid Vivado HLS installation is therefore required to run these checks. See Section 6.1.2 for details about the first batch of synthesizability checks.

info: Starting advanced synthesizability checks

This message is emitted when advanced synthesizability checks are performed on functions.

info: The synthesizability tests for hardware function [function_name] passed.

This message is emitted once a function was found to be synthesizable.

info: The synthesizability tests for hardware function [function_name] failed.

This message is emitted when a function was found not to be synthesizable. Please refer to the hints view to rewrite your program and make it synthesizable.

info: Advanced synthesizability checks done

This message is emitted when all the synthesizability checks have been performed.

6.1.4 Area exhausted but more optimizations are available

info: more parallelism is available but cannot be exploited because all the area is exhausted. More aggressive optimizations may however be available for this application. For more information, please contact [email protected]

SLX FPGA stops exploring parallelization opportunities for two main reasons: when all the identified parallelism is exploited and when all the FPGA resources have been used. In the latter case, it may be possible to reach a better design by re-calibrating the area model of SLX FPGA for the application to optimize. Please contact [email protected] for help in performing this procedure.

6.2 Warnings

The following subsections list the warning messages that are produced by SLX FPGA.

6.2.1 Hardware Function Has Been Optimized Away

warning: Total execution time for the hardware function is zero. Verify that hardware function [function_name] does useful work so that it is not optimized away.

This message results from analyzing the synthesis results of the hardware function. If the maximum number of execution cycles is zero, the hardware implementation was optimized away. This is possible if it was reduced to plain wiring, or if it was reduced to an asynchronous ROM (LUT-based) structure.

6.2.2 Area for the Hardware Functions Is Exceeded

warning: Number of {BRAMs | DSPs | FFs | LUTs} for {hardware function [function_name] | all hardware functions} exceeds the available resources: ([num] / [total])

This warning is reported when the pre-synthesis utilization estimates from Vivado HLS exceed the resources available on the device. The number of block RAMs (BRAMs), DSP blocks (DSPs), flip-flops (FFs), and look-up tables (LUTs) is checked for each hardware function. If the available resources on the FPGA device are exceeded by any single hardware function or by all hardware functions in aggregate, this message is produced.

6.2.3 Running Limited Synthesizability Checks Without the Xilinx Tools Enabled

warning: No valid Xilinx Vivado installation found. Only a few vendor-independent synthesizability checks will be performed.

When the Xilinx installation is not configured in the SLX FPGA Preferences page, the functionality of SLX FPGA is limited. Vivado HLS will not be invoked and the accelerated call graph will be based on best effort estimations from the dynamic analysis. To decide which functions are not synthesizable independent of the vendor synthesis tool used, SLX FPGA runs a limited set of synthesizability checks on the user code.

6.2.4 Function Argument cannot be a PS/PL Interface

warning: Argument [varname] of [function_name] cannot be a PS/PL interface. Is it aliased in the function? Aborting interface generation (unsafe)

Arguments of top-level hardware functions are synthesized as interfaces in the generated hardware. Because of the specific status of these function arguments, SDx imposes restrictions on how they are used in the C/C++ program. In particular, it is expected that all uses of the arguments occur directly in the top-level hardware function, using the variable name explicitly and excluding any sort of alias. For instance, array parameters cannot be passed on through a function call and used in a callee. When such incorrect use of a top-level function argument is detected, SLX FPGA emits the message above and aborts any interface generation. Although it is not certain, the generated hardware is likely to be incorrect in this case.

The recommended workaround is to explicitly copy the content of array arguments into local copies and perform all operations on these copies instead.
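The workaround can be sketched as follows. The function names top and helper_sum are hypothetical.

```c
#define N 32

static int helper_sum(const int v[N]) {
    int acc = 0;
    for (int i = 0; i < N; ++i) {
        acc += v[i];
    }
    return acc;
}

int top(const int in[N]) {
    int local[N];
    /* The interface argument 'in' is only used here, by name,
       directly in the top-level function. */
    for (int i = 0; i < N; ++i) {
        local[i] = in[i];
    }
    /* The callee operates on the local copy, not on the interface. */
    return helper_sum(local);
}
```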

6.2.5 HW_FUNCTION not Found

warning: could not find the function '[function_name]' from HW_FUNCTION, ignoring it.

This warning is displayed when a top-level hardware function specified in HW_FUNCTION (Section 7.4.3) could not be found in the application. To resolve this, update HW_FUNCTION with the correct function name.

6.3 Errors

The following subsections list the error messages that are produced by SLX FPGA.

6.3.1 Early Synthesizability Checks Failed

error: Early synthesizability check failed: [reason]

SLX FPGA runs some synthesizability checks prior to invoking Xilinx Vivado HLS. This error message is emitted when SLX FPGA determines that a function cannot be synthesized. Most of the time, a function is deemed not synthesizable because its name is not supported by Xilinx HLS tools. The exact reason for the failure is shown at the end of the error message.

6.3.2 Hardware Functions with Synthesizability Errors Cannot Be Synthesized

error: [function_name] cannot be implemented in hardware because of unsynthesizable constructs. Aborting.

This message is produced when the user decides to circumvent the graphical function mapping editor and to provide a wrong property (true instead of false) for the Synthesizability of a function mapped to the FPGA partition. The error is detected because the user-specified value for the synthesizability would, in this case, contradict the results from the synthesizability checking process.

6.3.3 Vivado HLS Failure

error: Vivado HLS failure detected. Please check the Vivado HLS output for further details

This error is triggered when an invocation of Xilinx Vivado HLS used for synthesis failed for an unidentified reason. The error may happen for well-formed programs that can be compiled and analyzed by SLX but cannot be handled by Vivado HLS. The proper way to fix the problem depends on the reason for the Vivado HLS failure. Useful information can be found in the generated vivado_hls.log file.

6.3.4 No Hardware Implementation If There Are No Files Under the BASEPATH

error: The application has no copied or generated source files. Please verify that all needed files are copied or generated under the BASEPATH. Aborting

A hardware implementation is not possible if SLX cannot determine any source files to copy unmodified or any generated files with HLS pragmas and/or wrapped function calls. In such a case, there are no target files to be passed to Vivado HLS and the flow must be aborted.

A likely cause of the problem is not specifying the BASEPATH correctly, as files outside of the BASEPATH are not considered further in the flow.

7 Configuration Variables

SLX supports configuration variables so that you can specify the information needed to build your application, run it with a representative workload, and control where SLX should focus its efforts when searching for parallelism opportunities.

These variables are set through the Configure Project editor. This chapter describes the variables in detail. The names used are those seen in the configuration file defines.mk, instead of the ones seen in the user interface. These variables are used as make variables, and therefore follow make syntax. Note that defines.mk is only used to store the variable assignments and is not a true make file.

7.1 Build Options

The build option variables allow users to specify the commands needed to build and run the application. SLX provides a dedicated compiler and related build tools that the user’s build system must use instead of the standard ones for their system.

When the SLX compiler is used, it stores the compiler’s representation of the program, instead of machine code, in object files (.o, .a, etc.). While linking, it can then add instrumentation to the combined representation when producing machine code for the whole application. When the application is run, the added instrumentation code stores tracing information in the SLX FPGA project.

With this approach, SLX FPGA does not have to know in advance how to build and run the application, but performs its work by being used by the user’s own build system. SLX FPGA calls the user’s build system by executing the command specified in the USER_BUILD variable to build the application and the USER_RUN variable to execute it.

While the USER_BUILD and USER_RUN variables are sufficient, it can be time-consuming to fully rebuild the application each time the analysis is performed. Three other configuration variables allow more control for re-building only the minimum necessary for a correct result.

• USER_INIT (see Section 7.1.1) is an optional command that is called once for a freshly imported or a cleaned project. It should prepare the application build system so that the next use of USER_BUILD will compile and link the sources to be instrumented and USER_RUN will run the resulting executable. This command is run with the SLX compiler in a mode that does not instrument the code it compiles.

• USER_CLEAN (see Section 7.1.3) is an optional command that is called whenever the Clean Project button is used.

• USER_SLX_MODE_SWITCH (see Section 7.1.2) is an optional command that is called before USER_BUILD when swapping between the non-instrumented Run Code and the instrumented Trace Source. The only difference between these two build contexts is in whether or not instrumentation is added when linking the executable; in both cases the sources are compiled identically. It is sufficient for USER_SLX_MODE_SWITCH to make sure USER_BUILD will re-link all executables (to add instrumentation code as appropriate) and USER_RUN will run the re-linked application.

7.1.1 USER_INIT

The USER_INIT command is used once for a clean SLX project, to perform any initialization or non-instrumented build steps, before the first call of any other USER_* commands.

USER_INIT := <shell_commands>

The command is executed from a Python script with bash. However, if the command begins with python or python3, the Python script is called directly, without bash. In either case, the current working directory is the spec/ folder.

It can be used to initialize the user’s build system, generate input data and other setup operations that can be scripted.

SLX sets the same environment variables as for USER_BUILD (see Section 7.1.5). Sources compiled with the tools during USER_INIT will not be instrumented, but otherwise the result is fully compatible with the instrumenting variants. This means USER_INIT can be used to compile sources that should not be instrumented.

After USER_INIT is finished, the build system must be left in a state that the sources to be instrumented will be built by the next use of the USER_BUILD command. For example, if the build system is make-based then removing the matching object files is usually sufficient.

7.1.2 USER_SLX_MODE_SWITCH

As discussed in Section 7.1.5, SLX builds and runs the application in different modes depending on the need to add instrumentation when linking or not. Whenever SLX will call USER_BUILD in a different mode, it first calls the USER_SLX_MODE_SWITCH command so the command can prepare the build system to relink the application on the next use of the USER_BUILD command. For many build systems, all that is needed to force re-linking is to delete the application’s binary before executing the build command.

USER_SLX_MODE_SWITCH := <clean_command>


When executed, the environment variable SLX_MODE is set as described in Section 7.1.5.

After USER_SLX_MODE_SWITCH is finished, the build system must be left in a state that the application executable will be re-linked and run by the next use of the USER_BUILD and the USER_RUN commands.

If USER_SLX_MODE_SWITCH is not set, it defaults to USER_CLEAN.

7.1.3 USER_CLEAN

SLX calls the USER_CLEAN command to allow optional clean up of the user’s build system when the Clean Project is requested. It is a discretionary action, and not essential to the working of SLX.

USER_CLEAN := <clean_command>


7.1.4 USER_BUILD_AND_RUN

USER_BUILD_AND_RUN should be a command line that compiles the application and executes it with a representative workload. This variable is deprecated in favor of USER_BUILD and USER_RUN. If USER_BUILD is not set, it defaults to USER_BUILD_AND_RUN.

7.1.5 USER_BUILD

USER_BUILD should be a command line that compiles the application.

USER_BUILD := <build_command>

The command is executed from a Python script with bash. However, if the command begins with python or python3, the Python script is called directly, without bash. In either case, the current working directory is the spec/ folder. The exit status of the command should be zero only if it succeeds.

Unless the build command is very simple, it may be more convenient to add a script to your project with the necessary commands and set USER_BUILD to execute that script.

SLX sets the following environment variables before calling the command:

CCthe SLX Clang/LLVM C compiler.

CXXthe SLX Clang/LLVM C++ compiler.

ARthe SLX llvm-ar archive utility.

RANLIBthe SLX llvm-ranlib archive-indexing utility.

SLX_MODEthe instrumentation mode, see below.

It is essential that these tools are used when building the application instead of the system defaults. The make build system will use these environment variables automatically in the default rules, though the application's makefiles should be checked to see that the variables are used and that an alternative definition is not provided. Other build systems may require other mechanisms to make sure they use the correct tools.

The command is called for different purposes by SLX. The purpose is indicated by an environment variable, SLX_MODE, that is set before calling the command. While more modes may be introduced, the current modes are:

• noninstrumented: this is the mode used by Run Code to check that everything works without the addition of SLX instrumentation.

• instrumented: this is the mode used by Trace Source to collect dynamic traces from the execution of the application with SLX instrumentation added to it.
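As an illustration of the points above, the following is a minimal sketch of a Python build script that USER_BUILD could invoke (for example, USER_BUILD := python3 build.py). The script name build.py, the source file app.c and the output name app are assumptions made for this example only; the environment variables CC and SLX_MODE are the ones described in this section.

```python
import shlex
import subprocess


def build_command(env, source="app.c", output="app"):
    # CC is set by SLX before the command is called. Falling back to a
    # system compiler would defeat the purpose, so fail loudly instead.
    if "CC" not in env:
        raise RuntimeError("CC is not set; run this script through SLX")
    # shlex.split tolerates a CC value that carries extra arguments.
    return shlex.split(env["CC"]) + [source, "-o", output]


def main(env):
    # SLX_MODE tells the script why it is being called.
    mode = env.get("SLX_MODE", "unknown")
    print("building with SLX_MODE=" + mode)
    # The compiler's return code becomes the exit status of the script,
    # so it is zero only if the build succeeds.
    return subprocess.call(build_command(env))


# A real script would end with:
#     import os, sys
#     sys.exit(main(os.environ))
```

Keeping the command construction in its own function makes it easy to check that the SLX compiler, and not a hard-coded one, ends up on the command line.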

7.1.6 USER_RUN

USER_RUN should be a command line that executes the application with a representative workload.

It is called by SLX after USER_BUILD if the application must be executed to complete information gathering for a requested action.

USER_RUN := <run_command>

The command is executed from a Python script with bash. However, if the command begins with python or python3, the Python script is called directly, without bash. In either case, the current working directory is the spec/ folder. The exit status of the command should be zero only if it succeeds.

When executed, the environment variable SLX_MODE is set as described in Section 7.1.5.

Unless the execute command is very simple, it may be more convenient to add a script to your project with the necessary commands and set USER_RUN to execute that script.

The command is called only when the previous call to USER_BUILD was successful.
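A run script can be equally small. The sketch below assumes a hypothetical wrapper that launches the application with its workload arguments and passes the application's exit status back to SLX unchanged; the executable name ./app and the input file name are placeholders, not names defined by SLX.

```python
import subprocess
import sys


def run_workload(command):
    # Run the application and return its exit status unchanged, so that
    # the wrapper reports success (zero) only if the application does.
    return subprocess.call(command)


# A real script would end with something like:
#     sys.exit(run_workload(["./app", "input.dat"]))
```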

7.1.7 OPT_LVL_AFTER_INSTRUMENTATION

The OPT_LVL_AFTER_INSTRUMENTATION variable specifies which optimization level is used after instrumentation. If OPT_LVL_AFTER_INSTRUMENTATION is not defined, the optimization level detected in the user build command is used.

OPT_LVL_AFTER_INSTRUMENTATION := <0|1|2|3|s|z>

The value given by this variable overrides the optimization level specified in the original build command. Specifying 0 here turns off the optimization pipeline entirely.

7.1.8 TARGET_LDFLAGS

TARGET_LDFLAGS contains the set of linker flags transmitted to the compilation toolchains generating code for the target hardware platform. This includes any cross-compiler generating software for the target platform, but some flags may also be used by hardware generators.

7.2 Analysis Configuration Variables

The configuration variables used for analyzing and parallelizing the application are summarized in Table 7.1, together with their defaults. M/O stands for mandatory or optional parameter, the former needing to be explicitly specified by the user.


Table 7.1: Configuration variables for both models.

Variable name | Description | Default value | M/O
TARGET (Section 7.2.1) | Target application name | N/A | M
PLATFORM (Section 7.2.2) | Target platform model to consider for application partitioning | N/A | M
CANDIDATE_THRESHOLD (Section 7.2.3) | Execution percentage ratio for a function to be accounted for analysis | empty string | O
FUNCTIONS (Section 7.2.4) | A comma-separated list of specific functions to be analyzed. For C++ applications, it is the list of the linkage (mangled) names of the functions or fully qualified, display or base names if they are able to uniquely identify the function | empty string | O
FUNCTION_EXCLUDES (Section 7.2.5) | A comma-separated list of specific functions. For C++ applications, it is the list of the linkage (mangled) names of the functions or fully qualified, display or base names if they are able to uniquely identify the function | empty string | O
BASEPATH (Section 7.2.6) | Root directory on which to focus the analysis | . (project root) | O
OPTIMISTIC_ANALYSIS (Section 7.2.7) | Perform data-dependency analysis optimistically with respect to loop-carried dependencies | 0 | O
FIND_PLP (Section 7.2.8) | Enable pipeline-level parallelism analysis | 1 | O
FIND_DLP (Section 7.2.8) | Enable data-level parallelism analysis | 1 | O
ANY_EXIT_STATUS (Section 7.2.9) | Allow applications to terminate with non-zero exit status | 0 | O
VERBOSITY_LEVEL (Section 7.2.10) | Amount of diagnostic information to produce | VERBO | O
REMOTE_PROF_DIR | (Not available for SLX FPGA) Indicates the location where files are generated on the target | slxprofile | O

7.2.1 TARGET

TARGET is a meaningful name for the application that will be used as the project name. It must always be provided by the user, as it has no default value. Its value will be used to name generated binaries.

TARGET := workshop_xlp

7.2.2 PLATFORM

PLATFORM is a mandatory setting that specifies the target platform to be used. It refers to an SLX platform model file that is normally distributed with the tools. This variable can be given either as an absolute path or as a basename (with or without file extension). For the latter, the corresponding platform file will be selected from the installation directory. SLX ships with a set of platform descriptions located under the $SLX_HOME/data/platform folder in your SLX installation.

For example, to specify that the exynos platform file will be used, any of the following can be used:

# either basename without extension
PLATFORM := exynos

# or basename with extension
PLATFORM := exynos.platform

# or absolute path
PLATFORM := $(SLX_HOME)/data/platform/exynos.platform
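The three forms above name the same file. The following sketch shows one way to picture the mapping; it is an illustration written for this section, not the resolution code used by SLX, and it assumes the .platform extension and the data/platform folder shown in the example.

```python
import posixpath


def platform_file(value, slx_home):
    # An absolute path is taken as given.
    if posixpath.isabs(value):
        return value
    # A basename may be written with or without the file extension.
    name = value if value.endswith(".platform") else value + ".platform"
    # Basenames are looked up in the installation's platform folder.
    return posixpath.join(slx_home, "data", "platform", name)
```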

7.2.3 CANDIDATE_THRESHOLD

CANDIDATE_THRESHOLD defines which functions are interesting for analysis purposes based on their contribution to the total execution time. The criterion is only applied when FUNCTIONS is not set, that is, when hotspots are automatically selected by SLX.

The following example shows how the variable can be used to restrict the analysis to functions whose execution time is greater than or equal to 50% of the total application execution time.

FUNCTIONS :=
CANDIDATE_THRESHOLD := 50 # user-defined threshold

CANDIDATE_THRESHOLD reduces the set of candidates to those that can actually bring a reasonable speedup. It can take any integer value between 0 and 100, inclusive.

Amdahl's law helps to estimate how much speedup can be expected from every function selected based on the value of CANDIDATE_THRESHOLD. Amdahl's law states that the execution time of a parallel program is bounded by the execution time of its sequential part plus that of its parallelizable part divided by the number of available processor cores:

$$t_{\text{parallelized}} \geq t_{\text{sequential}} + \frac{t_{\text{parallelizable}}}{N_{\text{processors}}}$$

Setting CANDIDATE_THRESHOLD to 75% on a platform with 4 processor cores restricts the analysis to functions whose execution time could potentially be reduced to 25 + 75/4 = 43.75% of the sequential execution time (a 2.29× speedup). A larger value of CANDIDATE_THRESHOLD would focus the analysis on functions with even greater potential, if any exist.
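The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the bound given by Amdahl's law; the function names are chosen for this illustration and are not part of SLX.

```python
def parallel_time_fraction(parallel_share, cores):
    # Lower bound on the parallelized execution time, expressed as a
    # fraction of the sequential execution time (Amdahl's law).
    sequential_share = 1.0 - parallel_share
    return sequential_share + parallel_share / cores


def max_speedup(parallel_share, cores):
    # Best-case speedup that corresponds to the bound above.
    return 1.0 / parallel_time_fraction(parallel_share, cores)


# The example from the text: 75% parallelizable, 4 cores.
fraction = parallel_time_fraction(0.75, 4)
speedup = max_speedup(0.75, 4)
print("{:.2f}% of sequential time, {:.2f}x speedup".format(fraction * 100, speedup))
```

Trying other values shows how quickly the sequential part dominates: with the same 75% share, adding more cores can never push the speedup beyond 4×.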

Note: If CANDIDATE_THRESHOLD is set to 0, it is effectively disabled.

7.2.4 FUNCTIONS

FUNCTIONS is used for setting a comma-separated list of user-defined top-level hardware function candidates that auto-selection will propose regardless of their workload.

Note: Usually it is more convenient to specify these directly in HW_FUNCTION.

FUNCTIONS := main,foo,bar # this is the list of candidates
CANDIDATE_THRESHOLD := 50 # no effect as FUNCTIONS is not empty

The FUNCTIONS variable gives users flexibility when selecting candidates.

Note: The CANDIDATE_THRESHOLD and FUNCTIONS variables are mutually exclusive. This means that the user should only use one of the two methods for constraining the parts of the input application that will be analyzed. If both are specified, priority is given to the function list provided by FUNCTIONS.

Note: For C++ applications, it is recommended to use the link (mangled) names that uniquely identify a function (e.g., _ZN4myns10mytmplfuncIiiEET_T0_). If a fully qualified name (e.g., myns::mytmplfunc<int, int>), a display name (e.g., mytmplfunc<int, int>) or a base name (e.g., mytmplfunc) is able to uniquely identify the function, these can be used as well. An ambiguous name produces a warning such as:

slxcmd:0:0: warning: Multiple alternatives found for function 'thefunc' in FUNCTIONS: '_ZN3ns17thefuncEi', '_ZN3ns27thefuncEi'. Selecting none of those by default. Please use a non-ambiguous function name in the configuration.


7.2.5 FUNCTION_EXCLUDES

FUNCTION_EXCLUDES is used for explicitly excluding functions from being considered for auto-selection of top-level hardware functions. In this case, the user defines a comma-separated list of functions to ignore:

FUNCTION_EXCLUDES := foo,bar # this is the list of functions to ignore
CANDIDATE_THRESHOLD := 50 # applied

The FUNCTION_EXCLUDES variable gives users flexibility when removing candidates. FUNCTION_EXCLUDES is ignored if the user has also specified FUNCTIONS to force a list of user-defined candidates. It is only considered if CANDIDATE_THRESHOLD (the default mechanism for deriving a list of candidate functions) is applied.

Note: Excluding a function with FUNCTION_EXCLUDES also affects the call graph, where the function is treated as having zero cost.

Note: For C++ applications, it is recommended to use the link (mangled) names that uniquely identify a function (e.g., _ZN4myns10mytmplfuncIiiEET_T0_). If a fully qualified name (e.g., myns::mytmplfunc<int, int>), a display name (e.g., mytmplfunc<int, int>) or a base name (e.g., mytmplfunc) is able to uniquely identify the function, these can be used as well. An ambiguous name produces a warning such as:

slxcmd:0:0: warning: Multiple alternatives found for function 'thefunc' in FUNCTIONS: '_ZN3ns17thefuncEi', '_ZN3ns27thefuncEi'. Selecting none of those by default. Please use a non-ambiguous function name in the configuration.
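The way FUNCTIONS, FUNCTION_EXCLUDES and CANDIDATE_THRESHOLD interact, as described in Sections 7.2.3 to 7.2.5, can be summarized in a short sketch. It models only the priority rules stated in the text; the profile dictionary (function name mapped to its percentage of the total execution time) is a hypothetical input, and the code is not how SLX selects candidates internally.

```python
def select_candidates(profile, functions=(), excludes=(), threshold=0):
    # A non-empty FUNCTIONS list takes priority: the threshold and the
    # exclude list are both ignored.
    if functions:
        return list(functions)
    excluded = set(excludes)
    # Otherwise keep every function that is not excluded and whose share
    # of the execution time reaches the threshold. A threshold of 0
    # lets every function through, which effectively disables it.
    return [name for name, share in profile.items()
            if name not in excluded and share >= threshold]


profile = {"main": 100.0, "foo": 60.0, "bar": 10.0}
print(select_candidates(profile, threshold=50))
print(select_candidates(profile, excludes=["foo"], threshold=50))
print(select_candidates(profile, functions=["bar"], excludes=["bar"], threshold=50))
```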

7.2.6 BASEPATH

Many source projects use code external to the project, such as system or support header files and library code. As these are combined to form the application, SLX will also analyze those sources to better understand the application, and will generate optimization hints when it discovers potential opportunities there. But often the user has little influence on external code and would like to focus on the project sources. The BASEPATH variable allows specifying the root path of the interesting files for which SLX should display analysis results.

Consider the following setting:

BASEPATH := /home/me/myproject

This would mean that only sources within /home/me/myproject or any of its subdirectories will be considered in the different graphical representations of the program.

The BASEPATH variable is optional. If it is absent or empty, then all the source files are analyzed. When a relative path is assigned to BASEPATH, it is interpreted as being relative to the <project> path. To focus the analysis on all the files in the SLX project (and only those files), you can set BASEPATH to ".".
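The rules above can be pictured with the following sketch. It is an illustration of the described behavior under stated assumptions, not SLX code: the function names are hypothetical, paths are treated as POSIX paths, and ".." components are not normalized.

```python
from pathlib import PurePosixPath


def resolve_basepath(basepath, project_root):
    # A relative BASEPATH is interpreted relative to the project path.
    base = PurePosixPath(basepath)
    if not base.is_absolute():
        base = PurePosixPath(project_root) / base
    return base


def is_displayed(source, basepath, project_root):
    # An absent or empty BASEPATH places no restriction on the sources.
    if not basepath:
        return True
    base = resolve_basepath(basepath, project_root)
    src = PurePosixPath(source)
    # Keep the file if it lies in BASEPATH or in any subdirectory of it.
    return src == base or base in src.parents
```

The comparison is made per path component, so a sibling directory such as /home/me/myproject2 is not mistaken for a subdirectory of /home/me/myproject.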

7.2.7 OPTIMISTIC_ANALYSIS

OPTIMISTIC_ANALYSIS allows the user to select between two approaches to dealing with some cases of array and struct access patterns:

• optimistically assuming these patterns will not inhibit parallelization, or

• conservatively assuming they will.

The analysis of some dependencies is inexact. For example, if an array is accessed non-sequentially, the access information on that array is generally inexact, and this may give rise to dependencies that block certain parallelization patterns. These inexact dependencies are ignored if OPTIMISTIC_ANALYSIS is set.

Exact dependencies will never be ignored. Accessing scalar variables in a normal way will typically generate exact dependencies, as will accessing arrays sequentially and consecutively.

Optimistic Analysis is most useful in exploring optimization possibilities, but requires greater care in assessing that opportunities for parallelization are valid. By default, it is disabled (OPTIMISTIC_ANALYSIS is set to 0).

The user can enable Optimistic Analysis by using the Configure Project editor and selecting the corresponding option. This results in modifying defines.mk by setting:

OPTIMISTIC_ANALYSIS := 1

7.2.8 FIND_PLP, FIND_DLP

These options allow for selectively disabling the identification of specific kinds of parallelism. Setting FIND_DLP to 0 disables automatic unrolling, and setting FIND_PLP to 0 disables automatic pipelining. These are useful to focus on a subset of the parallelization possibilities that SLX would provide by default. In the Configure Project editor, all kinds of parallelism (PLP and DLP) are enabled by default and the corresponding checkboxes are marked. If, for instance, DLP is not to be explored, the corresponding checkbox is deselected. Then, the relevant entry in defines.mk is automatically modified to:

FIND_DLP := 0

7.2.9 ANY_EXIT_STATUS

ANY_EXIT_STATUS can be used to allow for EXIT_FAILURE or an implementation-defined exit status from main(). This feature allows applications to exit with a non-zero status intentionally. By default, SLX expects that an application returns zero or EXIT_SUCCESS.

Supporting any exit status is enabled by adding the following to defines.mk:

ANY_EXIT_STATUS := 1

The exit statuses of non-instrumented execution (Run Code, described in Section 2.6.2) and dynamic analysis (Trace Source, described in Section 2.8.1) are compared. If the exit statuses do not match, SLX will stop. The typical use case is to first invoke Run Code, and then Trace Source. This will fail if, for some reason, the execution on the host and the execution of the instrumented executable do not return the same exit status to the environment.

Note: If Run Code is not executed, the exit status of Trace Source will be accepted as is, as there is no reference from host execution to compare with.

7.2.10 VERBOSITY_LEVEL

VERBOSITY_LEVEL is an optional setting used for specifying the verbosity of generated messages from SLX logged under the ../log folder relative to the spec folder. The default value is VERBO, which stands for highly verbose output. The accepted values for this setting are:

• VERBO: The default setting for highly verbose output.

• INFOR: Only output informational messages.

• FUNCT: Only output messages about functions performed by SLX, e.g., analysis or partitioning hints.

• WARNG: Only output warnings.

• EXCEP: Only output messages about exceptions met when SLX is run.

• ERROR: Only generate messages about errors.

Messages of the EXCEP or ERROR kind will terminate SLX with an error condition. Messages of any other kind will not affect the operation of SLX.

To set verbosity to only show warnings, the following is used:

VERBOSITY_LEVEL := WARNG

7.3 Code Generation

These variables are used for controlling the behavior of code generation in SLX.

7.3.1 BASEPATH

The BASEPATH variable allows specifying the root path of interest for the code annotated by SLX (see Section 7.2.6).

Consider the following setting:

BASEPATH := /home/me/myproject

This would mean that only source files within /home/me/myproject will be considered when generating pragmas. Additionally, the variable is also used to determine where annotated source files are written: if the file /home/me/myproject/src/t.c is parallelized, the corresponding annotated file will be produced in <project>/codegen/hls/src/t.c (see also Section 2.7.6).

When the BASEPATH variable is left empty, it is assumed to be the project root directory during code generation. The project directory structure will thus be replicated under <project>/codegen/hls/.

7.4 HLS Code Generation

These variables are used for controlling the behavior of HLS code generation in SLX FPGA.

7.4.1 SYNTHESIS_FLOW

This configuration variable is used to select which synthesis flow will be used as the backend of SLX for FPGA. There are three possibilities:

• vivado_hls: Xilinx Vivado HLS

• sdsoc: Xilinx SDSoC

• est: SLX estimation flow

The first two options, vivado_hls and sdsoc, will use the Xilinx tools, provided that they have been properly set up for use with SLX.

The est option is only available for command-line usage. When specified, it will use a high-level estimation flow for cycles and area estimation of hardware functions.

The default value is vivado_hls.

7.4.2 FPGA_PART

To specify the FPGA part to use in an SLX platform, the FPGA_PART variable can be used. This is used to configure the corresponding programmable logic element in the SLX platform description. It is necessary if the SLX platform does not specify an FPGA part name already, but an FPGA architecture name instead. For instance, xc7z045-ffg900-2 is the name of an FPGA part that is one of many xc7z045 devices, and all of these devices belong to the zynq architecture (FPGA family). If in the SLX platform description the Logic element's core attribute has the name of an FPGA part, then the core attribute value will be used. However, if it only specifies an architecture, e.g., zynq, then FPGA_PART should be set to the specific FPGA part.

Example of use:

FPGA_PART := xc7z045-ffg900-2

7.4.3 HW_FUNCTION

To specify the Top-Level Hardware Function or functions, the HW_FUNCTION variable can be used. The specified functions will be implemented as accelerators on the FPGA. All their callees will also be moved to hardware. If a function is called both by the remaining software part and, directly or indirectly, from the Top-Level Hardware Function, a software version of that function will remain in use. A hardware function may include calls to certain supported library functions of the standard C libraries, but not everything is supported. Please check the documentation of Xilinx Vivado HLS for more details.

If HW_FUNCTION is left unspecified, no functions are initially mapped to FPGA, but functions can later be mapped to FPGA using the function mapping editor. To specify more than one Top-Level Hardware Function, a comma-separated list of their names should be provided.

HW_FUNCTION := myfunc

7.4.4 FPGA_CLOCK_FREQUENCY

This variable allows the user to specify the clock frequency constraint for the hardware implementation of Top-Level Hardware Functions (accelerators). This value sets the frequency constraint that the third-party high-level synthesis tool, in this case Xilinx Vivado HLS, will try to satisfy. The clock frequency is specified in megahertz (MHz). The default setting is 100 (MHz).

FPGA_CLOCK_FREQUENCY := 200

7.4.5 DM_CLOCK_FREQUENCY

As with FPGA_CLOCK_FREQUENCY (Section 7.4.4), the data motion network clock frequency is specified in megahertz (MHz). The default setting is the value of FPGA_CLOCK_FREQUENCY.

For the vivado_hls synthesis flow, DM_CLOCK_FREQUENCY has no effect.

DM_CLOCK_FREQUENCY := 300

7.4.6 SYNTHESIS_MODE

With this variable, the user can specify whether only the high-level synthesis estimates from the third-party high-level synthesis tool will be considered, or whether a full logic synthesis and implementation including place-and-route must be done. The result of using the former option (specified by the string presyn) is that the original estimates from the HLS tool will be used. The estimates will be relatively fast to obtain but might deviate from the final implementation statistics. The latter option (specified by pandr), when synthesized with Xilinx Vivado HLS, provides the actual area for the implemented designs of the Top-Level Hardware Functions and their associated interconnect logic, but is much slower (by up to several orders of magnitude). It should be noted that the originally estimated FPGA cycles are used for both the presyn and the pandr options; no C/RTL co-simulation is performed alongside pandr.

The default setting is presyn.

SYNTHESIS_MODE := pandr

7.4.7 EST_MODE

With this variable, the user specifies whether the best-case or the worst-case high-level synthesis estimates will be considered. These often differ when a hardware function uses certain mathematical functions from the standard C library, as their duration in cycles may be data-dependent.

There are two possible settings, worst and best, for specifying that the maximum or the minimum interval cycles will be used for the estimated hardware function. The interval cycles give the number of cycles needed between two consecutive calls to the hardware function in order to consume the new inputs.

The default setting is worst.

EST_MODE := worst

7.4.8 AUTO_COMPLETE_PARTITION_MAX_ELEMS

All statically allocated arrays with this number of elements or fewer are completely partitioned across all dimensions by SLX FPGA.

The Xilinx tools impose an upper limit of 1024 for this, so a number between 0 and 1024 should be given. Setting the value of this variable to 0 means that the feature is disabled.

The following example sets this limit to 128 elements.

AUTO_COMPLETE_PARTITION_MAX_ELEMS := 128

Note: This feature only affects arrays for which the array_partition pragma is controlled and generated by SLX FPGA. Arrays are considered for partitioning only if they are accessed in a parallel loop.

7.4.9 SYNCHK_CXXFLAGS

Extra compiler flags used only during synthesizability checks. These flags are ignored everywhere else; during the synthesizability checks they are passed, along with the other flags, to the underlying third-party compiler performing the checks.

7.4.10 SKIP_XILINX_SYNTH_CHECKS

When set to 1, Xilinx Vivado is not invoked to check synthesizability of the selected HW_FUNCTION before SLX parallelism selection runs. Instead, it is assumed that the selected function is synthesizable. This helps to save time.

7.5 Minimal defines.mk

Based on the information from Section 7.1 and Table 7.1, a representative minimal defines.mk would contain the following variables:

TARGET := app
PLATFORM := $(SLX_HOME)/data/platform/clustered_arm.platform
USER_BUILD := $$CC app.c -o app
USER_RUN := ./app
USER_CLEAN := rm -f ./app

For SLX FPGA, an FPGA-enabled platform, PLATFORM := zc706, can be specified instead.

TARGET corresponds to the name of the source code project. PLATFORM can point to any valid platform model that is available to the user. In the SLX FPGA example, it is the Xilinx ZC706 Zynq-7000 board, which features a dual-core Cortex-A9-based Programmable System with Programmable Logic; the platform model zc706.platform at the installation location is inferred, and it is provided with the release of SLX.

The USER_BUILD, USER_RUN, and USER_CLEAN configuration variables are described in detail in Section 7.1. For the minimal example they are defined as the simplest commands to build the application executable, run it, and delete it, respectively, for a "Hello, World" application on a Linux system. Strictly, USER_CLEAN can be left unspecified, which would mean that the user does not intend to enable an automatic action for removing the project's executable.

The SLX_HOME environment variable has already been defined during the initial invocation of SLX.

Figure 7.1: Sample project utilizing a Makefile

7.5.1 Sample Project with Makefile

The example project "makefile_cpp_example", which can be found under the directory examples/fpga, illustrates how a Makefile can be used to conveniently execute build and run commands. In this example, the Clean, Build and Run commands will execute the Makefile's contents with the respective options and targets as defined in the Configuration Editor.

8 SLX Views, Editors and Dialogs

This section describes views and editors of the SLX perspective.

8.1 Console View

SLX Tools contributes a customized console to the Console view, which displays the output of a process. Each application is assigned a console view page, which the user can change from the Console dropdown list. The console page is activated when the user selects the application in the navigator or executes any process using the toolbar item. The console page is removed when the user closes or deletes the project. Renaming the project also changes the name of the console.

The output console shows several different kinds of text, each in a different color:

• Standard output

• Standard error

• Standard input

Figure 8.1: SLX Console view reporting DLP/PLP partitions with links to the source code

Figure 8.2: SLX Console color preferences

The build console also highlights build problems. You can use the hyperlink on a highlighted line to open the code in an editor when the error parsers are able to determine the file and line from the build output. You can control console highlighting with the Parse console output option on the SLX preference page.

You can choose the different colors for these kinds of text on the preferences panel.

8.1.1 Console view toolbar

The table below lists the toolbar options displayed in the Console view.

Name | Description
Scroll Lock | Toggles the Scroll Lock.
Clear Console | Clears the current console.
Pin Console | Forces the Console view to remain on top of other views in the window area.
Display Selected Console | If multiple consoles are open, you can select the one to display from a list.

8.2 SLX Tables

SLX presents information in various views using tables. Many tables in SLX share common behaviors. This section provides an overview of these behaviors. Individual tables are described in other sections, providing additional details specific to those individual tables.

8.2.1 Table Interactions

Tables are made of cells arranged in rows and columns. The topmost cells are column title cells. Clicking on a column title sorts the table by that column. The column sort has three modes: ascending, descending, and unsorted. The precise behavior of sorting depends both on the table type and the column type.

The second cell in each column is a filter specifier with a tooltip to hint at how that column can be filtered. More than one column can have a filter specified. Rows satisfying all filters will be selected. See column types in Section 8.2.3 to understand filter specifications, and table types in Section 8.2.2 to understand which rows will be displayed for a given filtered selection.

Columns and rows are initially created with preset sizes. If the table is too wide or tall for the view it is in, use the scroll bars to see the other parts of the table. Columns can be resized by dragging the right-hand side of the column to the left or the right, or can be auto-sized or hidden using the context menu for that column title. The Show all columns option on a column title context menu makes hidden columns visible again.

Individual tables may have specific actions associated with interactions such as selecting a row, double-clicking, and choosing context menu items.

155


8.2.2 Table Types

There are two table types in SLX:

• Simple tables, which have independent rows of information.

• Tree tables, where some rows are grouped together in a tree hierarchy.

The difference between the two shows up in:

• Navigation: this is extended for tree tables by allowing trees of rows to be:

– Collapsed, displaying outer rows preceded by a black triangle to indicate that there are inner rows that are not displayed. Clicking on the black triangle expands the row to display the inner rows directly under that row.

– Expanded, so inner rows are all displayed.

• Sorting:

– Rows can be sorted by clicking on the column header. The header will display a black triangle indicating the direction of the sort (ascending or descending order). To clear the sort, click on the header until the sorting icon (black triangle) disappears.

– Simple table rows will be sorted relative to all other rows.

– Tree tables will not move rows to different trees but only move them relative to other inner rows of the same outer row.

• Filtering:

– Rows can be filtered by clicking on the filter icon (funnel icon) under the column header for the column to be filtered by. Hover over the icon for a tooltip hint regarding the valid filter criteria.

– Applying simple table filters displays only the rows that match the column filters.

– Applying tree table filters identifies a set of rows that match all filters. Those rows, their inner rows, and the outer rows that lead to them are displayed.

8.2.3 Column Types

The table columns are used to present different types of information. The main types are:

• Simple text:

– Sorting is case-sensitive alphabetical.

– Filters for these columns can be toggled (see Figure 8.3) for the table to be either:

* Simple: case-insensitive, partial match is sufficient, no wildcard characters are supported.

* Regular expressions: to filter by regular expressions, click on the Enable Regular Expression icon (white square) on the table's toolbar (top right). Note that this enables filtering by regular expressions for all table columns that support them. To toggle back to simple text, click on the icon again. The expression must match the full cell contents. For details of regular expression usage, see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#sum. To filter names within a tree structure, the following syntax must be used: \Qtoplevel\E/.* where "toplevel" represents the top-level tree name (e.g. Top-Level Function), and "/.*" matches all child nodes under the tree. The "/.*" is required to match a hidden part of the branch name that cannot be seen: "toplevel" is actually labeled as "toplevel/id<num>" internally. The \Q and \E quote the enclosed text so that it is treated as a literal string, and must be added if the string contains characters special to Java regular expressions (e.g. f<int>). See Figure 8.4 for an example of how to insert regular expressions for tree structures.

• Number:

– Sorting is by numerical value.

– Filters must be specified with an initial operator (=, !=, <=, <, >, >=) followed by a number. Typing in a value to be filtered without the operator will fail to generate any results.

– Visualization can be:

* Simple numbers

* Numbers displayed with additional information: units (e.g. ns, ms, ...) or a percentage indication, displayed both in brackets after the number and by coloring an appropriate portion of the cell background.

Note: When filtering for non-integer values (e.g. 100.01), make sure to type the entire value, including the decimal part, for exact matches to be displayed.

• Enumerated:

– Sorting follows the order of the values as presented for filtering.

– Filters are specified using check-boxes in the drop-down list that is presented when the filter box is clicked for this column.


Figure 8.3: Regular expression matching in text columns (can be toggled on or off)

Figure 8.4: Regular expression matching the tree structure for function main

• Other:

– Sorting and filtering are specific to the use of the column for a particular table.
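For the tree-structure filter syntax described under Simple text columns, the following sketch builds the text to type into the filter box for a given top-level name. The helper names are hypothetical. Because Python's re module has no \Q...\E quoting, the second function uses re.escape as a stand-in to show which cell contents such a filter would match; it is an approximation of the Java behavior, not the code used by SLX.

```python
import re


def tree_filter(top_level_name):
    # Text for the filter box: quote the name literally with \Q...\E,
    # then match the hidden "/id<num>" part and all child nodes.
    # Note: a name that itself contains \E would end the quoting early.
    return r"\Q" + top_level_name + r"\E/.*"


def python_equivalent(top_level_name):
    # re.escape plays the role of \Q...\E in this stand-in pattern.
    return re.compile(re.escape(top_level_name) + r"/.*")


print(tree_filter("main"))
print(bool(python_equivalent("f<int>").fullmatch("f<int>/id7")))
```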

8.3 Path mapping dialog

When SLX is used to navigate source code in an environment other than the one in which the trace was run, SLX discovers that the source code for the given path does not exist and opens a path mapping dialog.

The user can add a new mapping that resolves either a path prefix or the full file path. The dialog pre-fills the "from" field of a new mapping (see Figure 8.5).

Figure 8.5: File Mapping Dialog when the source file is non-existent

9 Debugging SLX Tools

Working with embedded software can be highly complex. Therefore, errors may occur during processing. The SLX IDE provides means to investigate those situations further and obtain more details about them.

9.1 Debug Mode

Most processing steps started from the IDE will call one or multiple Silexica tools in the background. The tool flow logic is implemented as a custom Python layer whose commands are encapsulated behind GNU Make rules. In order to see all the details of tool executions in the background, it is possible to activate the debug mode of SLX by selecting the Debug Mode check box on the SLX page of the Eclipse preferences dialog (see Figure 9.1). To open this dialog, please select Window > Preferences from the Eclipse menu.

9.2 IDE Console Logs

The output of the tools run in the background is shown in the Console window of the Eclipse project in the SLX IDE. For performance reasons, the Console window does not keep an infinite number of lines. If the number of lines exceeds a certain threshold, the first parts of the output are removed. Further, the Console is cleared before a new background action is started from the IDE.

The entire output of the tools run in the background is additionally written to a log file in the silexica/guiConsoleLog subdirectory of the Eclipse project. The output from each background action started from the IDE is written to a separate file. Only the last ten log files are kept in order to limit disk usage. The file slxGUIwrap.0.log always contains the output of the most recent background operation. Files with larger numbers correspond to actions executed further in the past. All lines in the log files are prefixed with a timestamp and the type of the line. Lines of type cmd inform about the command line being executed. The output of the tools is contained in lines of type out.

Figure 9.1: Activating Debug Mode.

When contacting Silexica support with questions regarding the background actions and/or the output appearing in the Console window of the SLX IDE, please include the log file from the silexica/guiConsoleLog subdirectory. In case the log file contains confidential information that cannot be shared with Silexica, please send the log files with just the confidential information removed.

9.3 Updating the Xilinx Device Totals file

To update the Xilinx devicetotals.pp file, open a command console window. Navigate to the silexica/data/fpga/characterization directory of your SLX FPGA installation and run the following script:

run_device_totals.py

This updates the version of the devicetotals file that corresponds to the $XILINX_VIVADO variable.




10 Platform and Core Model Files

Silexica uses abstract, XML-based platform models to realistically mimic the behavior of target platforms. A platform model contains information about available processors, memories, interconnects, and communication costs. Additionally, power information can be given for the different system components.

Processors are considered instances of so-called cores. For example, on a sample embedded platform with two processors (labelled P0 and P1), P0 and P1 could be instances of an ARM Cortex A15 core. This separation of cores and platforms makes it possible to easily construct new platforms with a minimum amount of redundancy. For more information on cores, see Section 10.1.

With each release Silexica provides a set of target platform models that range from real-world to virtual platforms. If none of the platforms provided with this release fits your needs, you can either change an existing platform model or create a new platform model on your own using the Platform Editor (see the Platform and Core Modeling Guide).

10.1 Cores

A core mainly contains information on execution timing: functional units (supported operations, bit widths, etc.), library costs, and so on.


11 Supported Xilinx Libraries

SLX FPGA supports the following Xilinx libraries.

11.1 Support for ap_int, ap_cint and ap_fixed

ap_int, ap_cint and ap_fixed provide arbitrary precision data types: ap_cint in the form of a fixed set of C typedefs, ap_int and ap_fixed as sets of C++ templates. The following features are supported for these data types:

• Tracing and Synthesizability Checking

• Pragma Generation

• Array Partitioning and Reshaping

Note: Reductions on ap_int or ap_fixed are not currently supported. Using reductions here will block parallel (unrolled) loops from being generated because scalar variables declared with these types are not recognized as reduction variables. For ap_cint, scalar variables are recognized as reduction variables, similar to normal integer variables, and are thus supported. For details on reduction variables, see the Glossary entry reduction variable.

Note: The std::complex template can only be used with these data types on Linux; Windows does not support usage in this manner.

11.2 Support for hls::stream

SLX FPGA understands the semantics of the hls::stream library. The hls::stream library is a C++ template class provided by Vivado HLS for modelling streaming data structures. SLX FPGA can be used to optimize code that uses hls::stream; keep in mind that non-blocking read/write operations are not supported. In addition, #pragma HLS STREAM is not currently supported.

11.3 C++ template class caveat

Note that when using C++ template functions, SLX FPGA supports only one instantiation of the function. Applications with multiple instantiations may end up with suboptimal optimization of the resulting IP blocks and with conflicting pragmas inserted for the individual template instances. If an application has more than one call to a templated function instance, please manually duplicate the code so that SLX FPGA can perform appropriate optimizations for each call.
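The manual duplication described above could look roughly like the following sketch. The function names sum_n, sum_4 and sum_8 are hypothetical and only serve to show one template being replaced by one hand-written copy per former instantiation.

```cpp
#include <cassert>

// Original style: a single template that would be instantiated twice
// (N = 4 and N = 8), which is the situation described above.
template <int N>
int sum_n(const int* data) {
    assert(data != nullptr);
    int sum = 0;
    for (int i = 0; i < N; ++i) {
        sum += data[i];
    }
    return sum;
}

// Workaround sketch: one non-template copy per former instantiation, so
// that each copy can be analyzed and annotated on its own.
int sum_4(const int* data) {
    assert(data != nullptr);
    int sum = 0;
    for (int i = 0; i < 4; ++i) {
        sum += data[i];
    }
    return sum;
}

int sum_8(const int* data) {
    assert(data != nullptr);
    int sum = 0;
    for (int i = 0; i < 8; ++i) {
        sum += data[i];
    }
    return sum;
}
```

The copies compute the same results as the template instances; the cost is duplicated source code that has to be kept in sync by hand.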

12 Supported Xilinx Versions, Platforms, and Parts

12.1 Xilinx Versions

SLX FPGA is tested with the newest and the oldest officially supported versions of Xilinx Vivado HLS (see footnote 2).

In the current release, these versions are 2019.1 and 2019.2. Older versions are not supported.

12.2 Xilinx Platforms

SLX FPGA supports the development kit platforms that are installed with the respective Xilinx version. It also supports user platforms that use FPGA parts available in the Xilinx Vivado installation, provided that the corresponding SLX platform and logical model of the FPGA are available. For the supported development kit platforms, SLX FPGA supports the clock operating frequencies shown in Table 12.1. User platforms can be generated using the Xilinx tools with the following process:

• The hardware IP subsystem is implemented using Vivado IP Integrator (see footnote 3) and a description of it, called the DSA file (Device Support Archive), is exported.

2 https://www.xilinx.com/support/download.html
3 https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug995-vivado-ip-subsystems-tutorial.pdf


Table 12.1: Supported clock frequencies for the development kit platforms.

Platform                  Clock ID   Frequency (MHz)
zed, zc702, zc706         0          166
                          1          142
                          2          100
                          3          200
zcu102, zcu104, zcu106    1          100
                          3          200
                          4          300
                          5          400
                          6          600
u96v2_avnet               0          100
                          1          25

12.3 Xilinx FPGA Parts

SLX FPGA supports a limited number of Xilinx FPGA devices, namely the devices necessary to support the official Xilinx development kits in addition to u96v2_avnet (marketed as Ultra96) from Avnet.

Synthesize with Vivado HLS will not work if the FPGA device is not installed in the Xilinx tool suite and the platform is neither installed nor provided in a separate folder. For platforms distributed by Xilinx, consult the Xilinx documentation. For the u96v2_avnet platform specifically, consult the Avnet page for the Ultra96 development kit (see footnote 4).

The following lists the supported Xilinx architectures. All FPGA parts for these device families are supported.

• kintexu (Kintex UltraScale)

• kintexuplus (Kintex UltraScale+)

• virtexu (Virtex UltraScale)

• virtexuplus (Virtex UltraScale+)

• virtexuplusHBM (Virtex UltraScale+)

• virtexuplus58g (Virtex UltraScale+)

• zynq (Zynq-7000)

• zynquplus (Zynq UltraScale+)

• zynquplusRFSOC (Zynq UltraScale+ RFSOC)

4 http://zedboard.org/product/ultra96


13 Known issues and limitations

This section describes known limitations that might be encountered when using features of SLX FPGA, as well as potential workarounds.

13.1 Compatibility with Previous Results

This release of SLX FPGA uses file formats for profiling and analysis results that are incompatible with previous releases of SLX FPGA. In addition, the internal structure of these files in the SLX internal analysis directories has changed and may cause incompatibilities with older workspaces. If you experience difficulties migrating an existing project that was created with an earlier version of SLX FPGA, perform a Clean of the project and try again. If the issue is not resolved, follow the steps below:

Step 1: In SLX FPGA 2020.2, create a new project with the same name as the projectto be imported.

Step 2: Replace the ".project" file from the project to be imported with the ".project" file from the newly created project.

Step 3: Delete the newly created project.

Step 4: Import the original project.

If the issue is still not resolved, please contact Silexica through the support portal (http://support.silexica.com).

13.2 Support for C++14 and later applications

Applications utilizing C++ version 14 (C++14) and later are not supported for synthesis because of limitations in the Xilinx compiler. Such applications can still be profiled and analyzed in SLX FPGA, but their code cannot be synthesized.


13.3 IDE Rendering

If there is a large amount of data to be visualized, there may be BIRT (see footnote 5) or GTK (see footnote 6) related rendering issues in SLX. We are sorry for the inconvenience and are working with the BIRT/GTK teams to solve these problems for the specific library for your platform.

13.4 GUI Font Sizes

If font sizes appear small in Linux (e.g., on an ultra-high-definition monitor), use the following command to start the tools from a console:

GDK_DPI_SCALE=1.5 ./SLX

13.5 Minimum Memory Requirements

The minimum memory required to run SLX is 8 GB. This is the default setup for SLX; reconfigure the RAM as needed. If the memory is insufficient, SLX may freeze. To work around the issue, close and reopen SLX; if the issue persists, follow the steps below to increase the heap available to the GUI:

1. Close any open SLX windows

2. Open (SLX_installation_path)/bin/gui/SLX.ini

3. Edit the line starting with -Xmx. Change it to the form -XmxDDDDm, where DDDD is the heap size in MB, given as a multiple of 1024 (1 GB), e.g. 2048, 4096, 8192, 16384, etc. The current default is -Xmx8192m (8 GB).

4. Close the .ini file editor and restart SLX

13.6 Unsupported constructs in input code

13.6.1 fork() and vfork()

Using fork() or vfork() in the input code will result in a runtime failure during tracing.

13.6.2 C++ exceptions

Exceptions and the associated keywords (try, catch, throw) for C++ are not supported. SLX FPGA will fail with an explicit error message during instrumentation if exceptions are present in the testbench or design code.

5 http://www.eclipse.org/birt/
6 http://www.gtk.org/
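One common way to avoid exceptions in the testbench or design code is to report errors through return values. The following is a minimal sketch of that idea, not a recommendation taken from the SLX tool itself; the Status type and the safe_divide function are hypothetical.

```cpp
#include <cassert>

// Hypothetical status type used instead of throwing an exception.
enum class Status { Ok, DivideByZero };

// Reports a failure through the return value; the result is written
// through `quotient` only when the division is valid.
Status safe_divide(int numerator, int denominator, int* quotient) {
    assert(quotient != nullptr);
    if (denominator == 0) {
        return Status::DivideByZero;  // Where `throw` would have been used.
    }
    *quotient = numerator / denominator;
    return Status::Ok;
}
```

The caller checks the returned Status where a catch block would otherwise have handled the error.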


13.7 Non-Blocking Reads or Writes on hls::stream

write_nb() and read_nb() are not supported on hls::stream objects. SLX FPGA will fail with an explicit error message during instrumentation if these methods are called.

13.8 Combined use of std::complex and ap_fixed

Using both std::complex and ap_fixed in the same source file will cause SLX FPGA to fail during instrumentation. The failure occurs when the corresponding headers are included in a source or header file.

13.9 Modulo and division on ap_int larger than 128 bits

Modulo and division operations on ap_int larger than 128 bits are not supported. Using these operators will cause SLX FPGA to fail during instrumentation.

13.10 Specifying Multiple Top-Level Hardware Functions

A user can specify multiple top-level hardware functions as long as:

• one is not called inside the other, or

• the local variables of the function calling a top-level function (i.e., the caller) are only passed as arguments to that top-level function (i.e., the callee)

13.11 Templated code with multiple instantiations

In some cases, SLX FPGA may generate invalid pragmas on templated code if the code is instantiated multiple times in the design. The resulting design can be inefficient or even non-synthesizable. A possible workaround is to remove the template by replicating and specializing it.

13.12 Support for CMake

Currently, SLX FPGA does not support projects with CMake as the build system.

13.13 Support for hls_math library

The hls_math library is not fully supported by SLX FPGA.

In some cases, SLX FPGA may support the use of hls_math by manually linking against the library. The following link flags should be added to the link invocation in the Configure Project or build system: -L${XILINX_VIVADO}/lnx64/lib/csim/ and -lhlsmc++-GCC46.

If linking C code instead of C++, the relevant flags are: -L${XILINX_VIVADO}/lnx64/lib/csim/ and -lhlsm-GCC46.

Additionally, the Run command in the Configure Project should be preceded with LD_LIBRARY_PATH="${XILINX_VIVADO}/lnx64/lib/csim/".

A High Level Synthesis Introduction

The term high-level synthesis refers to any design automation methodology that aims to generate high-performance digital designs from descriptions coded in a high-level language, such as C or C++. The input to this process is usually an algorithmic-level description, from which synthesizable RTL designs are generated that can be implemented on FPGAs or ASICs.

The motivation for high-level synthesis, and for the research behind it, is to shorten the time-to-market of new designs, reduce the number of required iterations, ease the verification process, and achieve first-time success for the designs. It is considered an unavoidable raise of the design entry abstraction level and is usually compared to the earlier transition from gate-level design to RTL through hardware description languages such as VHDL or Verilog.

A silicon compiler, hardware compiler, behavioral synthesis tool or high-level synthesis tool is a core element of an HLS-enabled flow. Most of the steps involved in generating synthesizable RTL code from a source language are the same as those for compiling source code in a systems programming language for a fixed processor system. However, there are more degrees of freedom in some of the steps, and some are only relevant to an HLS tool. For instance, one might be able to select from more than one possible implementation of an operator, e.g., addition. Scheduling has to account for the FPGA real estate, but there is much more room for feasible schedules. Finally, the HLS core engine cannot model precisely what will happen in the late binding and place and route stages of the logic synthesis flow that derive the final implementation of the design. This means that HLS must be more conservative when approaching the limit of the available resources in the device, to ensure that the synthesized design fits in the actual FPGA.

There are still many limitations in state-of-the-art commercial HLS:

• Source language constructs that are non-synthesizable need to be worked around. For Xilinx Vivado HLS, a guide to the most important non-synthesizable constructs and general coding approaches to write synthesizable code is given in Section B.2.

• HLS cannot automatically generalize a design to create the appropriate parameters for controlling the sizes of input widths and functional unit sizes. Workarounds with limited applicability involve the use of typedef for C input and the use of templates for C++.

• Profiling for understanding the application hotspots and its performance requirements is not available. Therefore, there is little to no help available to derive the guidelines that can steer synthesis to better performance in the form of automatic code rewriting or vendor-specific pragmas.
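The template workaround mentioned in the second item can be sketched as follows: the element type T stands in for the data width and N for the number of elements, so both are fixed at compile time for each use. The function name dot_product is hypothetical, and the built-in integer types stand in for whatever width-specific types a design would use.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: T selects the data width, N the (compile-time) number of elements.
template <typename T, int N>
T dot_product(const T* a, const T* b) {
    static_assert(N > 0, "N must be positive");
    assert(a != nullptr && b != nullptr);
    T acc = 0;
    for (int i = 0; i < N; ++i) {
        // The cast keeps the accumulator at the selected width.
        acc = static_cast<T>(acc + a[i] * b[i]);
    }
    return acc;
}
```

A 16-bit, three-element variant would be requested as dot_product<std::int16_t, 3>(a, b). Keep in mind that, as noted in Section 11.3, SLX FPGA supports only one instantiation of a template function, so a design analyzed with SLX FPGA would use a single parameter set or hand-written copies.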

B Xilinx Vivado Installation and Setup

This section provides setup instructions for the HLS tool from Xilinx.

B.1 Setting up Xilinx HLS 2019.2

This section details the procedure to download and install Xilinx HLS 2019.2. The instructions given in this tutorial were tested on Linux Ubuntu-18.04, and are provided as a convenience for the setup of SLX FPGA.

B.1.1 Downloading Xilinx HLS

The web installer binary file (Xilinx_Unified_2019.2_1106_2127_Lin64.bin), or the equivalent latest version from the Xilinx website, is required in order to begin the installation. The file can be downloaded from the Xilinx portal, as given in footnote 7. After filling in the corresponding forms, the installation file should be placed under the home directory of the system on which the tools are to be installed.

B.1.2 Installing Xilinx HLS

The first step is to run the downloaded installer. In order to do this, the following command should be executed in the folder where the installation file was downloaded:

sudo ./Xilinx_Unified_2019.2_1106_2127_Lin64.bin

Please note that the installation file must have executable permissions. Superuser rights (i.e., sudo) are important, otherwise the installation cannot fully complete. Once the aforementioned command is executed, a GUI dialog is displayed.

7 https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/vivado-design-tools/2019-2.html


The first screen that requires an input can be seen in Figure B.1. A Xilinx account is required in this step; if a Xilinx account is not available, it should be created by clicking on Please Create One, which will redirect to the Xilinx registration web site. After filling in the registration form, an e-mail is sent to the registration e-mail address with instructions for the account activation. After account activation, the installation can continue.

Figure B.1: User authentication dialog in the Xilinx HLS installer


Figure B.2: Xilinx installer welcome page

Figure B.3: Accept License Agreements



Figure B.4: Select Product to Install

Figure B.5: Select Edition to Install



Figure B.6: Vivado HLS Design Edition

Figure B.7: Select Destination to Install



Figure B.8: Summary



After the above steps, the installation is ready to proceed. From this point on, it is assumed that a valid license for the usage of the tools has been obtained from Xilinx. This can be a node-locked license (tied to the MAC address of a given host) or a floating license served by a license server. In the first case, the FlexLM license file, e.g., license.lic, should be put somewhere on the filesystem where the tools are installed. The LM_LICENSE_FILE environment variable should point to it, as shown below.

export LM_LICENSE_FILE="$LM_LICENSE_FILE:/home/username/license.lic"

In the case that a floating license has been granted, LM_LICENSE_FILE needs to point to the correct server and port. This guide assumes that the name of the license server is licserver and the port used is 34000. Then, the following suffices to point to the license server.

export LM_LICENSE_FILE="$LM_LICENSE_FILE:34000@licserver"

Please note that in order to use a floating license, the host where the tools will execute requires an active connection to the network.

At this point, and given that all the previous steps were successful, the vivado_hls tool should be available for execution. This can be tested by executing the following command:

~/Xilinx/Vivado/2019.2/bin/vivado_hls -version

and observing the output, which should be similar to the following:

Vivado(TM) HLS - High-Level Synthesis from C, C++ and SystemC v2019.2 (64-bit)
SW Build 2708876 on Wed Nov 6 21:39:14 MST 2019
IP Build 2700528 on Thu Nov 7 00:09:20 MST 2019
Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

The license export command should be added to the ~/.bashrc file (or the equivalent for other OSes), in order to automate the aforementioned steps.

B.1.3 Configuring SLX to use Xilinx

SLX FPGA must be configured with the location of the Xilinx tools in order to properly detect and use them. A configuration panel is available for that purpose in the general preference window shown in Figure B.9.

A path to the Xilinx tools can be either entered manually or selected using the "Browse..." button. Additionally, the version of the toolchain to use can then be selected using the version selector. Any path to Xilinx HLS can be selected: SLX FPGA will automatically deduce the path to the Vivado toolchain when it is available. The toolchain path can also be left empty to use the estimates of SLX FPGA instead of those provided by the Xilinx toolchain (this is not recommended).



Figure B.9: The Xilinx Setup Preference page

B.1.4 Configuring the Color Scheme for Xilinx Messages

The messages of the Xilinx tools that are called by SLX are shown in the so-called secondary console. This is not a separate console window, but a separate color scheme that is used when displaying messages directed to the secondary console. In that way the messages can be distinguished without losing the connection between the two message streams.

The color scheme of this secondary console can be adjusted in the configuration panel available for that purpose in the general preference window shown in Figure B.10.

The color scheme for Xilinx messages can be configured under the Secondary Console category for error, general, info and warning messages.

B.2 Synthesizable code guidelines for C and C++

While not definitive, this list of guidelines should be kept in mind when writing or adjusting code to be synthesizable with the Xilinx high-level synthesis tools.


Figure B.10: The Colors and Fonts Preference page

• The top-level C function must contain the entire functionality of the design

• No system calls and file I/O. printf and assert are allowed

• No array sizes unknown at compile time, all arrays must be static

• There are limitations in the use of pointers, e.g., there is no support for general pointer casting and for function pointers

• Recursive functions are not supported

• C++ classes and templates can be synthesized, but not as the top-level (i.e., the principal) function of the design

• There is limited support for virtual functions

• Overall, C++ synthesis is challenging for commercial applications

• Use cc -Wall -Wextra -pedantic, valgrind, and cppcheck to check your code for potential logical problems

• Allow labels and pragmas: -Wno-unused-label -Wno-unused-pragma

• Use golden reference data to compare the computation results from the generated hardware after synthesis
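Several of the guidelines above can be illustrated with a small sketch: array sizes fixed at compile time, an iterative loop instead of recursion, assert as an allowed check, and a comparison against golden reference data. All names (kSize, factorial_iterative, factorial_table, matches_golden) are hypothetical, and whether a particular function is accepted by the synthesis tool still has to be checked with the tool itself.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kSize = 8;  // Array size known at compile time.

// Iterative replacement for a recursive factorial.
std::uint32_t factorial_iterative(std::uint32_t n) {
    assert(n <= 12);  // 13! no longer fits into 32 bits.
    std::uint32_t result = 1;
    for (std::uint32_t i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}

// Hypothetical top-level function: fixed-size arrays, no system calls,
// no dynamic allocation, no recursion.
void factorial_table(const std::uint32_t in[kSize], std::uint32_t out[kSize]) {
    for (int i = 0; i < kSize; ++i) {
        out[i] = factorial_iterative(in[i]);
    }
}

// Testbench-side check of the results against golden reference data.
bool matches_golden(const std::uint32_t result[kSize],
                    const std::uint32_t golden[kSize]) {
    for (int i = 0; i < kSize; ++i) {
        if (result[i] != golden[i]) {
            return false;
        }
    }
    return true;
}
```

In a testbench, the same golden array would be compared both against the software run and against the results produced by the generated hardware.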

C Glossary

This chapter establishes definitions for terms used throughout the SLX FPGA documentation, as well as in the IDE. They are encountered in static and dynamic analysis, performance estimation and parallelism extraction from sequential applications.

accelerator
A processor core onto which code and data can be copied from a host core. The purpose is to increase execution performance of a program, often in terms of speed, but other criteria could also be involved. At completion of a task on the accelerator core, data may be transferred from the accelerator back to the host core.

average trip count
The average number of times a loop executes.

candidate threshold
The lowest execution percentage of a function for which it will be considered as a parallelization candidate. The execution percentage is calculated as the ratio of the total execution time of the function to the total execution time of the program, multiplied by 100.0.
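The calculation in this definition can be written out as a short sketch. The function names are hypothetical, and the comparison with >= is an assumption derived from the wording "lowest execution percentage"; it is not taken from the SLX implementation.

```cpp
#include <cassert>

// Execution percentage as defined above: function time relative to the
// total program time, multiplied by 100.0.
double execution_percentage(double function_time, double program_time) {
    assert(program_time > 0.0);
    return function_time / program_time * 100.0;
}

// Assumption: a function qualifies when its execution percentage reaches
// the candidate threshold (given in percent).
bool is_parallelization_candidate(double function_time, double program_time,
                                  double candidate_threshold) {
    return execution_percentage(function_time, program_time) >=
           candidate_threshold;
}
```

With a threshold of 0, every function with a non-negative execution time qualifies under this sketch.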

data dependency
If a value computed by a given statement S_i is used by another statement S_j, it is said that S_j is data dependent on S_i. This is indicated by S_i ≺ S_j.

data level parallelism
A parallelism pattern where a given computation is replicated into multiple processes that operate on different input data sets in parallel. When detected in a loop, its main goal is to split the iteration space of the loop across multiple workers as long as there are no loop-carried dependencies.

hardware function
A function in the program that can be implemented as hardware in programmable logic.


host
A processor core on which a program begins execution.

initiation interval
The initiation interval of a pipelined loop is the number of clock cycles between the start times of consecutive loop iterations.

iteration
A single execution of the loop body of a loop.

loop
A loop comprises a set of statements in a given source language that is executed repeatedly a number of times. The set of statements is often referred to as the loop body.

loop-carried dependency
If a statement executed in one iteration is data dependent on a statement executed in another iteration, the data dependency is said to be a loop-carried dependency. For a given loop and two distinct iterations a and b with a ≠ b, such a dependency between statements S_i and S_j is denoted by S_i^a ≺ S_j^b.
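As an illustration of this definition, the following sketch contrasts a loop with a loop-carried dependency and one without. The function names and the array length kLen are hypothetical.

```cpp
#include <cassert>

constexpr int kLen = 8;

// Loop-carried dependency: iteration i reads out[i - 1], which was
// written by iteration i - 1.
void running_total(const int* in, int* out) {
    assert(in != nullptr && out != nullptr);
    out[0] = in[0];
    for (int i = 1; i < kLen; ++i) {
        out[i] = out[i - 1] + in[i];
    }
}

// No loop-carried dependency: every iteration only touches index i,
// so the iterations are independent of each other.
void add_elementwise(const int* a, const int* b, int* out) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    for (int i = 0; i < kLen; ++i) {
        out[i] = a[i] + b[i];
    }
}
```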

loop nest
See nested loop.

loop unrolling
A loop optimization causing the body of a loop to be replicated n times, where n is larger than 1 and less than or equal to t, the trip count of the loop.
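A hand-written version of this transformation, with n = 4 and a trip count of 8, might look like the sketch below. The names are hypothetical; in practice the unrolling is performed by the tools rather than by hand.

```cpp
#include <cassert>

constexpr int kTripCount = 8;

// Rolled loop: the body executes kTripCount times.
void scale_rolled(const int* in, int* out, int factor) {
    assert(in != nullptr && out != nullptr);
    for (int i = 0; i < kTripCount; ++i) {
        out[i] = in[i] * factor;
    }
}

// Unrolled with n = 4: the body is replicated four times and the loop
// executes kTripCount / 4 = 2 times.
void scale_unrolled_by_4(const int* in, int* out, int factor) {
    static_assert(kTripCount % 4 == 0, "trip count must be a multiple of 4");
    assert(in != nullptr && out != nullptr);
    for (int i = 0; i < kTripCount; i += 4) {
        out[i] = in[i] * factor;
        out[i + 1] = in[i + 1] * factor;
        out[i + 2] = in[i + 2] * factor;
        out[i + 3] = in[i + 3] * factor;
    }
}
```

Both functions produce the same output; the unrolled form exposes four independent operations per iteration.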

minimum trip count
See trip count.

nested loop
A loop that contains other loops in the loop body.

offloading
The process of copying code and data from a host to an accelerator core.

oversubscription
Oversubscription occurs when locally optimal decisions in parallelism extraction lead to a performance degradation or slowdown of the system. SLX FPGA avoids this problem when annotating parallel loops in software functions by selecting the outermost loop in a loop nest for parallelization. Oversubscription may not be a problem in nested data level parallelism for high-performance computing systems, given that they support multiple levels of hierarchical parallelism and communication costs justify the parallelization of non-outermost loops.

parallelization candidate
A function that has been deemed from dynamic analysis as having a significant contribution to the program’s execution time. By default, the threshold for considering a function as a parallelization candidate is 0%. You can change this value by setting the candidate threshold in the SLX FPGA Configuration Editor. Only parallelization candidates are considered for parallelism pattern detection.

parallelism pattern
A property of a part of a program when this part can be structured or interpreted in a way that allows parallel execution. SLX FPGA detects data level parallelism and pipeline level parallelism patterns in source code.

pipeline level parallelism
A parallelism pattern where a computation is broken down into a sequence of processes, also called pipeline stages, which are contained within a loop. By doing this, it is possible to execute pipeline stages in parallel.

private variable
A variable that gets redefined in every iteration of a loop. Conditionally updated variables are not private.

reduction
A reduction has the form rv = rv ⊕ expr, where rv is a scalar variable that is defined exactly once in the loop and is not used elsewhere in it, and expr is typically a loop-variant expression. The operation ⊕ is commutative and associative.

reduction variable
A reduction variable, var, is a scalar variable that is both defined and used exactly once in a loop, in a statement of the form var = var ⊕ expr, and is not used elsewhere in the loop; expr is typically a loop-variant expression. The operation ⊕ is commutative and associative. For example, accumulation through reduction is a typical idiom in scientific processing codes using floating-point arithmetic.

for (int i = 0; i < 100; ++i) {
    int temp = array0[i] * array1[i] + array2[i] + array3[i];
    sum = sum + temp;
}

In the above example, the calculation of the sum has a loop-carried dependency. However, the dependencies induced by reductions do not completely prevent the parallelization of the loops carrying them. SLX FPGA accounts for this property and assumes that reductions do not prevent loops from being identified as data level parallelism (DLP).
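Why a reduction does not rule out parallel execution can be seen in the following sketch, which splits an integer sum into two partial accumulators that could be computed by independent workers. The function names are hypothetical, and the split into even and odd indices is only one of many possible partitions.

```cpp
#include <cassert>

constexpr int kCount = 100;

// Sequential reduction: each iteration depends on the previous value of sum.
int sum_sequential(const int* data) {
    assert(data != nullptr);
    int sum = 0;
    for (int i = 0; i < kCount; ++i) {
        sum = sum + data[i];
    }
    return sum;
}

// Two partial sums over disjoint halves of the iteration space. Combining
// them at the end relies on + being commutative and associative.
int sum_two_partials(const int* data) {
    static_assert(kCount % 2 == 0, "element count must be even");
    assert(data != nullptr);
    int even_sum = 0;
    int odd_sum = 0;
    for (int i = 0; i < kCount; i += 2) {
        even_sum += data[i];
        odd_sum += data[i + 1];
    }
    return even_sum + odd_sum;
}
```

For integers both versions return the same value. For floating-point types the reordering can change rounding, since floating-point addition is not strictly associative.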

The following are cases that SLX FPGA detects as reduction variables:

• Arithmetic operations: additions, multiplications and subtractions (when they can be rewritten as additions)

• Bitwise (and, or, xor) operations on integer types

• Arithmetic operations on floating point types, but only in the case where the -ffast-math compiler option is given in the USER_BUILD command

The following are limitations of SLX FPGA for reduction detection:

• SLX FPGA does not detect reductions on non-scalar variables, such as reductions to array elements or to sub-objects of a bigger object (a class or a struct)

• SLX FPGA does not detect reductions for variables whose lifetime is the same as the program (global or static locals)

• SLX FPGA does not detect reductions for local variables if their address is taken in any way (passed by reference to another function or passed by pointer with the & operator)
In the case that a loop has reduction variables, they are separately reported as var along with the operation name (opname) that they are involved with. An example of a reduction variable is an accumulator, in which case the operation is an addition.

stage
In case that pipeline level parallelism has been identified, a computation such as a loop body is split into a sequence of intermediate steps, with each step feeding data into the next one. Each step then constitutes a stage of a pipeline, servicing a specific part of the entire functionality. Data moves from one stage to the following one in the same way that an assembly line operates. One stage ideally corresponds to one processor core, although this is not necessary, since a pipeline can be serviced with a number of cores different from the number of stages.

subvariable
A variable that is contained in another variable, i.e., a subset of a variable. In SLX, subvariables can be members of a struct that do not contain unions or bit-fields, or array index ranges of one dimension of an array.

task level parallelism
A parallelism pattern where a computation is divided into multiple processes that operate in parallel on different input data sets.

Top-Level Hardware Function
A function implemented in programmable logic which is an entry point of execution for the programmable logic. See also hardware function.

trip count
The minimum number of times a loop executes. Sometimes referred to as minimum trip count.

unroll factor
The number of times, n, that the body of a loop has been replicated to create n copies. Consequently, the number of iterations is reduced by a factor of n. See also loop unrolling.

untracked memory definition
A synthetic global object that represents untracked memory that is accessed during dynamic profiling.

worker
When data level parallelism is identified in the context of a loop, the loop’s iteration space is divided among multiple processes. These can run in parallel on multiple processor cores, which are termed workers or worker cores.


Acronyms and Abbreviations

ARM     Advanced RISC Machines
COTS    Commercial Off-the-Shelf
FPGA    Field-Programmable Gate Array
IP      Intellectual Property
XML     eXtensible Markup Language


List of Figures

1.1 SLX FPGA flow overview

2.1 User workflow with SLX FPGA
2.2 Setting up a workspace directory
2.3 SLX Welcome Screen
2.4 Selecting create New Project from the Welcome Screen
2.5 Selecting create New Project from the workspace
2.6 Creating an SLX FPGA project
2.7 Configuration editor to specify platform and relevant build details
2.8 Recent projects section of the Welcome screen
2.9 Disable Welcome Screen at startup
2.10 Importing an SLX FPGA project
2.11 Selecting an SLX FPGA Project to import
2.12 Converting a C/C++ project into an SLX FPGA Project
2.13 Show confirmation dialog for C/C++ to FPGA project conversion checkbox
2.14 C/C++ to FPGA confirmation dialog
2.15 Import Xilinx Vivado HLS Project from the Welcome Screen
2.16 Import Xilinx Vivado HLS Project from the Main Toolbar
2.17 Select the path to Xilinx Tools
2.18 Selecting the Xilinx project to be imported
2.19 Vivado HLS project imported as an SLX FPGA project
2.20 General SLX preferences applicable to every project
2.21 The Xilinx Setup Preference page
2.22 SLX FPGA toolbar, showing the buttons for the different flow steps
2.23 SLX FPGA Eclipse menu, showing the commands for the different flow steps
2.24 Invoking the Configure Project editor
2.25 Build Options section for the Configure Project editor
2.26 Saving the Configure Project options
2.27 Run Code for the application
2.28 Context menu to reach the Debug Configurations dialog.
2.29 Newly created debug configuration for the workshop_fpga project
2.30 Prompt to confirm switching to debug perspective
2.31 Debug perspective for the workshop_fpga application.


2.32 Invoking the Configure Project editor for the project
2.33 The Configuration Editor
2.34 Message when the Xilinx location isn’t configured
2.35 Selecting an FPGA Part
2.36 Column filters in the FPGA part selector
2.37 Show FPGA Part
2.38 FPGA Part Details
2.39 Message when the Xilinx location isn’t configured
2.40 Select Xilinx Platform Archive
2.41 Choose Xilinx Platform
2.42 Basic Options configuration complete
2.43 Using extern "C" at function declaration for top-level function.
2.44 Project configuration dialog with the HLS Options view expanded
2.45 Saving the Configure Project options
2.46 The Available Area section of the Configuration Editor with a value input for BRAM that’s higher than the available resources on the FPGA
2.47 Function Mapping Editor
2.48 Context menu actions in the Function Mapping Graph
2.49 Notification on top of the Function Mapping Graph guiding towards the next step in the flow.
2.50 Function Mapping View displaying the functions in the application in a table
2.51 General function properties in the Function Mapping Editor
2.52 Call Edges of a function in the Function Mapping Editor
2.53 Parallel loops of a function in the Function Mapping Editor from the Workshop_FPGA project.
2.54 Parallel loops of a function in the Function Mapping Editor displaying nested loops.
2.55 Interfaces section showing the register modes for axi
2.56 Interfaces section with configuration for the maximum read and write burst lengths
2.57 Default state of the Bandwidth section without user input
2.58 Bandwidth section with user defined OUT field
2.59 Bandwidth section with user defined IN and OUT fields
2.60 Context menu to discover synthesizability issues in a function
2.61 Click on hint to investigate synthesizability
2.62 Synthesizability issues discovered for the swscale_accum function
2.63 Guidance for the transformation from non-synthesizable to synthesizable code
2.64 Action to map a synthesizable function to the FPGA
2.65 Use Find and Optimize Parallel Loops to find parallelism and optimize the current mapping
2.66 Find and optimize parallelism for a function mapped to the FPGA in the Function Mapping Editor


2.67 Console output during finding parallelism
2.68 Parallel Loops of a function in the Function Mapping Editor
2.69 Enabling source-level highlighting by loading a task model
2.70 Clicking Generate HLS-Aware Code
2.71 HLS-aware code generation process
2.72 List of HLS pragmas displayed by the Code Transformation Wizard
2.73 Clicking on HLS Inline Pragma
2.74 Disabling HLS Inline Pragma from Code Transformation Wizard
2.75 Second phase of HLS-Aware Code Generation Process
2.76 The workshop_fpga with HLS pragmas automatically inserted in the portions of the code mapped to the FPGA
2.77 Automatic replacement of rand() with its synthesizable equivalent slx_fpga_rand()
2.78 Clicking Synthesize.
2.79 Partial output from Synthesize for the Vivado HLS flow.
2.80 Clicking Show Synthesis Report.
2.81 SLX Viewer for the Synthesis Report created as a consequence of the invocation of Vivado HLS. The report displays the version, product family, device, performance and utilization estimates.
2.82 The Synthesis Report (continued) displaying the interface implementation details
2.83 The detailed Synthesis Report in the form of a .rpt file as generated by Xilinx Vivado HLS
2.84 Enabling line profile results.
2.85 Visualizing line profile results.
2.86 Enabling code coverage results.
2.87 Visualizing code coverage results for main().
2.88 Overview of the SLX Hints view.
2.89 Expanded parallel partition icon.
2.90 Generated DLP hint
2.91 Hints tab with a filter configured only to show the ones corresponding to HLS
2.92 Tooltip revealing full text in SLX Hints
2.93 Expanded row revealing full text in SLX Hints
2.94 Clicking SW Call Graph
2.95 Dynamic call graph diagram visualization as requested by SW Call Graph.
2.96 SW Call graph view in Outline mode
2.97 SW Call graph view in Overview mode
2.98 Select multiple functions to focus in the SW call graph using the filters in the Properties tab
2.99 Navigate over the outgoing call graph edges of a function node


2.100 Select to show the source function of a call graph edge using Show Source. Show Target is used in the same way for highlighting a target function.
2.101 Additional information for a function
2.102 Additional information for a call site (an edge in the call graph)
2.103 F1 calls F2 (defined outside of BASEPATH) which calls F3, resulting in a dashed arc from F1 to F3.
2.104 Clicking Analysis Graph.
2.105 View of the Analysis Graph for the function main.
2.106 Outline view of the Analysis Graph.
2.107 Context menu for a function node.
2.108 The context menu for function nodes in the graph. Demonstrating the possibilities to view callers and callees of the function.
2.109 The context menu for variable nodes in the graph. Demonstrating the possibilities to highlight or reveal the functions accessing the variable.
2.110 Function Display configuration within the General Settings of Analysis Graph.
2.111 Highlight view of the Analysis Graph.
2.112 The scalability preference menu for the Analysis Graph.
2.113 When opening a large Analysis Graph, a warning message will appear when the graph contains more than the maximum number of nodes.
2.114 Analysis Graph with an expanded structure to display its members.
2.115 Function filter table of the Analysis Graph
2.116 Analysis Graph with functions highlighted by Total Cost and variables highlighted by Data Size.
2.117 Warning message when a highlighting rule is applied to a scaled graph.
2.118 Menu to display all functions that access a selected global variable.
2.119 Menu to navigate to the Analysis Graph of a highlighted function.
2.120 Clicking Memory Analysis.
2.121 Synchronize filtering with hidden elements of the Analysis Graph.
2.122 Code analysis information for the variables in the program, as requested by Memory Analysis.
2.123 Expand the view to see multiple accesses to local variable b.
2.124 Expanded view to see the fields of struct fp_complex for the workshop_fpga example application.
2.125 Expanded view to see the access ranges of the scaled_vector_hw array for the workshop_fpga example application.
2.126 Source code highlighting by double clicking on local variable scaled_vector_hw.
2.127 Hovering over the cell to see the percentage as a total number of Reads or Writes

7.1 Sample project utilizing a Makefile


8.1 SLX Console view reporting DLP/PLP partitions with links to the source code
8.2 SLX Console color preferences
8.3 Regular expression matching in text columns (can be toggled on or off)
8.4 Regular expression matching the tree structure for function main
8.5 File Mapping Dialog when the source file is non-existent

9.1 Activating Debug Mode.

B.1 User authentication dialog in the Xilinx HLS installer
B.2 Xilinx installer welcome page
B.3 Accept License Agreements
B.4 Select Product to Install
B.5 Select Edition to Install
B.6 Vivado HLS Design Edition
B.7 Select Destination to Install
B.8 Summary
B.9 The Xilinx Setup Preference page
B.10 The Colors and Fonts Preference page


List of Tables

2.1 Highlight metrics for the Analysis Graph.
2.2 Memory Analysis table column attributes

5.1 Synthesizability check codes and the corresponding source-level constructs that are unsupported for synthesis.

7.1 Configuration variables for both models.

12.1 Supported clock frequencies for the development kit platforms.
