Project: EURETILE – European Reference Tiled Architecture Experiment
Grant Agreement no.: 247846
Call: FP7-ICT-2009-4
Objective: FET - ICT-2009.8.1 Concurrent Tera-device Computing
Deliverable number: D9.4
Deliverable name: Fourth Report on Training, Exploitation and Dissemination
File name: EURETILE-D9-4-Dissemination-v20141112b.docx

SEVENTH FRAMEWORK PROGRAMME
Call FP7-ICT-2009-4 Objective ICT-2009.8.1
[FET Proactive 1: Concurrent Tera-device Computing]

Project acronym: EURETILE
Project full title: European Reference Tiled Architecture Experiment
Grant agreement no.: 247846

WP9 – Training, Exploitation and Dissemination
D9.4 – Fourth Report on Training, Exploitation and Dissemination

Lead contractor for deliverable: INFN
Due date: 2014-09-30
Revision: See document footer.

Project co-funded by the European Commission within the Seventh Framework Programme

Dissemination Level: PU
PU Public
PP Restricted to other programme participants (including the Commission Services)
RE Restricted to a group specified by the consortium (including the Commission Services)
CO Confidential, only for members of the consortium (including the Commission Services)

TABLE OF CONTENTS

1. Consolidated and Potential Impact
1.1 Industrial Exploitation and Impact of ASIP Design Tools
1.2 Industrial Exploitation and Impact of Simulation and Debugging Technologies
1.3 Exploitation of the neuro-synaptic DPSNN-STDP simulator and of the QUonG platform in the CORTICONIC FET Project
1.4 Public release of DAL many-process dynamic development environment
1.5 Public Release and/or Industrial Impact Foreseen by TIMA
1.6 From EURETILE HPC to High-Energy Physics real-time applications
1.7 EURETILE Exploitation via TETRACOM technology transfer project
1.8 Workshop on Multi-Core Debugging
1.9 Workshop on Computing on Heterogeneous Many-Tile Computing Systems

2. Publications / Presentations
2.1 Joint Publications
2.2 Publications during the Last Reporting Period (Jan-Sep 2014), Individually Produced by Each Beneficiary
2.3 Presentations of Project Results during Last Reporting Period (Jan – Sep 2014)
2.4 Publications during the Previous Reporting Periods (2010 - 2013), Individually Produced by Each Beneficiary
2.5 A Few Presentations of Project Results during Previous Reporting Periods (2010-2013)
2.6 Books
2.7 CASTNESS'11 Workshop

This document is composed of two sections.

• The first section is about the consolidated and potential industrial and academic impact of the project, including the main dissemination activities and exploitation of results. This section is also included in the Final Publishable Report, for public disclosure.

• The second section is a list of publications and presentations produced by the project.

1. Consolidated and Potential Impact

1.1 Industrial Exploitation and Impact of ASIP Design Tools

Euretile’s software tools for the design and programming of application-specific processors (ASIPs) were developed by project partner Target Compiler Technologies (TARGET) in cooperation with INFN. Since its incorporation in 1996, TARGET has continuously invested in R&D, partly based on collaborations within EC and nationally funded projects, which resulted in the company becoming the world’s leading vendor of ASIP design tools. Today, TARGET’s IP Designer tool-suite is in production use by many of the world’s tier-1 and tier-2 semiconductor and system companies, including public references like Broadcom, Conexant, Dialog Semiconductor, Freescale Semiconductor, GN ReSound, Huawei, NXP Semiconductors, Olympus, ON Semiconductor, Sanyo, Silicon Labs, STMicroelectronics, and Texas Instruments. We estimate that more than 150 unique chips containing IP Designer-made ASIPs are on the market.

Already before the completion of the Euretile project, TARGET initiated the commercial exploitation of certain technologies developed in the project. The current exploitation status is as follows:

• After an initial trial period by key customers, the new integrated development environment (IDE) named “ChessDE” has been released as the standard IDE in the commercial IP Designer product.

• The new multicore IDE named “ChessMP” has been tested by key customers and is currently being rolled out commercially.

• After an initial trial period by key customers, the new methodology and tools for modeling the behavior of the ASIP’s datapath operators and program control unit, based on bit-accurate C code (so-called “PDG” language), have been released as the standard method in the commercial IP Designer product.

• The subsequent extension of the behavioral modeling methodology in the PDG language towards modeling of I/O interfaces has been tested by key customers and is currently being rolled out commercially. Additionally TARGET is extending the initial set of example I/O interface models that were developed as demonstrators in the project, into a complete library of I/O interface models that will be shipped to commercial IP Designer users.

This exploitation work is expected to continue for 2-3 years following the completion of Euretile.

On February 7, 2014, TARGET was successfully acquired by Synopsys, Inc., the world’s leading vendor of design tools for the semiconductor industry, headquartered in Mountain View, California.

The acquisition underscores Synopsys’ strong belief in the future of ASIP tools and in TARGET’s IP Designer product in particular. The following statements are quoted from Synopsys’ press release announcing the acquisition [Synopsys, 2014b]:

“Today's SoCs rely more on heterogeneous multi-core architectures. Designers are turning to ASIPs to implement their unique data plane and digital signal processing requirements. (…) Target's leading IP Designer and MP Designer software tools perfectly complement Synopsys' offerings for ASIP developers, enabling design teams to develop ASIPs that meet their performance, power, and flexibility requirements more efficiently and with less risk. (…) The acquisition of Target strengthens Synopsys' existing ASIP tools portfolio while bringing a world-class team of ASIP experts into the company.”

Nine months after the acquisition, the following can be said:

• Synopsys is combining TARGET’s IP Designer and MP Designer products with its pre-existing product called “Processor Designer”, in order to build an even stronger next-generation tool-suite for ASIP-based system design.

• Synopsys has kept TARGET’s entire R&D team based in Leuven (Belgium), as well as the former Processor Designer R&D team based in Leuven (Belgium), Aachen (Germany) and Noida (India), completely intact. In fact, these two teams have been merged, without any reduction of personnel, into a new sizable R&D group for ASIP-based system design technologies, led by Gert Goossens (TARGET’s former CEO and now R&D Group Director in Synopsys). This ASIP R&D group is also cooperating with Synopsys’ R&D group for DSP platforms based in Eindhoven (Netherlands). As a result, the Leuven, Aachen and Eindhoven sites now form the center of gravity for ASIP and DSP technologies in the worldwide Synopsys organization. They also form one of the strongest concentrations of Synopsys R&D engineers in system-level methodologies outside of the US.

• Synopsys continues to invest in new R&D activities to expand the functionality of its ASIP-based system design tools, building on TARGET’s R&D activities in Euretile. In particular, new related focus areas include compilation support for high code density and for additional programming languages such as C++ and OpenCL (used for parallel programming). We will also explore cooperation options with Euretile partner RWTH Aachen, which already had an active relationship with Synopsys.

1.2 Industrial Exploitation and Impact of Simulation and Debugging Technologies

The simulation and parallel debugging technologies created by RWTH to support the development of the EURETILE system have been transferred to third-party organizations via different means, and are being used in academia and industry in different contexts. During early project stages, the hybrid simulation (HySim) technology and the parSC SystemC kernel were used by Huawei Technologies Co. Ltd. and Synopsys Inc., respectively. HySim and parSC were adapted to their needs and evaluated in an industrial setting for a period of approx. 1 year. Both technology transfer projects concluded to the satisfaction of the partners and yielded the expected result of higher simulation speed for electronic system-level (ESL) simulation.

Besides these activities, RWTH acts as an external services provider of simulation technologies for the Chist-Era GEMSCLAIM (GreenEr Mobile System by Cross LAyer Integrated energy Management, http://www.gemsclaim.eu/) project. The SCope SystemC kernel currently supports the GEMSCLAIM system simulator and the software development activities performed on it, which include the development of an energy-aware optimizing and parallelizing compiler (OpenMP+) and an energy-efficient OS kernel. Currently, the University of Innsbruck (AT), Queen’s University Belfast (UK) and the Polytechnic University of Timisoara (RO) are users of the SCope technology through the GEMSCLAIM simulator.

Finally, cooperation plans are being studied with other projects and organizations that have shown interest in our simulation and debugging technologies. In particular, RWTH is currently in conversations with Synopsys Inc. and Sigma Designs Inc. regarding technology transfer agreements. Additionally, HySim, parSC, SCope, WSDB and SWAT are being used, or are planned to be used, by other internal RWTH projects as well as in several PhD and Master theses.

1.3 Exploitation of the neuro-synaptic DPSNN-STDP simulator and of the QUonG platform in the CORTICONIC FET Project

Starting from October 2014, the techniques of distributed simulation of neural activity and synaptic plasticity developed by INFN in EURETILE will be adapted and improved to support the simulations planned by the CORTICONIC project, in cooperation with ISS (Istituto Superiore di Sanità). The CORTICONIC project is about “Computations and Organization of Retes Through the Interaction of Computational, Optical and Neurophysiological Investigations of the Cerebral cortex” and is funded by the FET Grant Agreement 600806. The DPSNN-STDP application will be the starting point for the development, and the QUonG platform will be used to perform the large-scale simulations required by the CORTICONIC project.

1.4 Public release of DAL many-process dynamic development environment

The distributed application layer (DAL) framework [Schor, 2012], which has been developed by ETH Zurich as part of the EURETILE project, significantly affected the activities of the Computer Engineering Group at ETH Zurich in the past few years. Since 2013, the DAL framework has been made available to interested industrial and academic institutions, and since summer 2014 it has been publicly and freely available for download at http://www.dal.ethz.ch. Since then, the DAL framework has been successfully applied in several academic and industrial projects, both as front-end and back-end. In the following, we describe the most important activities.

Besides the framework itself, the website provides various tutorials to help interested users get started. Furthermore, a set of advanced embedded system benchmarks is available for download, including a distributed implementation of a ray-tracing algorithm, a video-processing application, and an H.264 decoder and encoder. Finally, DALipse, the graphical software development environment for the DAL framework, is available for download at the website.

At ETH Zurich, two separate projects have adopted the DAL framework. The EU FP7 project CERTAINTY (http://www.certainty-project.eu) is using the DAL framework as a baseline to develop its own software management framework for mixed-criticality systems. The UltrasoundToGo project (http://www.nano-tera.ch/projects/359.php), which is funded by the Swiss National Science Foundation, uses the DAL framework to prototype software for embedded medical devices. In particular, the DAL framework has recently been used to demonstrate that process networks can be used to develop complex medical applications [Tretter, 2014].

Furthermore, the DAL framework has been the basis for various PhD, Master, and Bachelor theses at ETH Zurich.

Besides internal projects, the DAL framework has been applied by other academic and industrial users to prototype embedded streaming applications. For instance, the DAL framework has been used by Huawei Technologies Co. Ltd. and by Broadcom Corporation for prototyping new design methodologies. In addition, the following academic institutions have been using the DAL framework for either teaching or research activities: McGill University, Halmstad University, the University of Amsterdam, and the University of Oulu.

Among these collaborations, the integration of the DAL framework into the Open RVC CAL (ORCC) compiler is the most elaborate one. The CAL actor language is a popular dataflow language to specify complex multi-processor systems. The ORCC compiler converts CAL applications into conventional languages such as C, C++, and Verilog. In order to automatically generate code for the DAL framework, the University of Oulu recently added a code generator back-end to the ORCC compiler [Boutellier, 2014] as part of the Finnish Academy of Science project “Dataflow oriented automated design toolchain (DORADO)”, which is carried out between 2011 and 2015 (http://www.cse.oulu.fi/CMV/Research/DORADOproject). The current plan is to extend this interface during an additional research project that will be carried out between 2015 and 2019.

1.5 Public Release and/or Industrial Impact Foreseen by TIMA

The capabilities of DNA-OS have been proved in this project, targeting both the embedded and HPC domains. Both targeted architectures are highly specific and cannot be exploited directly. Nevertheless, the port of DNA-OS for x86, its multi-core and multi-tile support, the IGB driver and some other features could be reused in the context of industrial transfer to third parties, either as specific service developments or as an all-in-one solution for a multi-core or multi-tile architecture. This work opens new industrial and research activities, as DNA-OS is delivered with our own simulation environment (ARM-based architecture for the embedded domain). For instance, work with the company Kalray in Grenoble started one month ago to port DNA-OS to the Kalray processor.

The results obtained concerning task migration are new and have so far been validated only by simulation. They could have an industrial impact once task migration runs on the hardware, following the publication of these results. The proposed methodology, which does not require any modification in the OS (even for a light OS without virtual memory or dynamic loading support), is an interesting innovation.

The port of DNA-OS for x86 is freely available and can be used with our own simulation environment, which is also freely available, so any other designer can use it. The task migration methodology will be published so that others can use it. However, all the developments are quite specific and depend on the specification and on the architecture (memory map, I/O, drivers, …), which makes them difficult to use as they are.

This work opens perspectives in the HPC domain. DNA-OS has been developed for the embedded domain and for research activities. The work done in the EURETILE project and its results highlight that there is no restriction on the use of DNA-OS, and that the HPC domain could be targeted. The added value of task migration for critical tasks (load balancing or thermal management) is of prime interest in the HPC domain.

Contributions to H2020 projects are a possible perspective.

1.6 From EURETILE HPC to High-Energy Physics real-time applications

The GPGPU computational paradigm, i.e. general-purpose computing on graphics processing units, is nowadays well established in the High Performance Computing arena: several GPU-accelerated clusters have been present in the highest ranks of the TOP500 list for a few years. Virtually any Computational Physics simulation application, such as Lattice Quantum Chromo-Dynamics or Fluid Dynamics, has been adapted to take advantage of the processing power of GPUs, exploiting their fine-grained parallelism and showing significant speedups in execution time. Similar efforts are ongoing in several fields of Experimental Physics, ranging from Radio Astronomy to High Energy Physics. These contexts are often characterized by real-time constraints in processing data streams coming from experimental apparatuses. GPUs show a stable processing latency once data are available in their own internal memories, so a deterministic-latency data transport mechanism becomes crucial when implementing a GPGPU system with real-time constraints.

The NaNet dissemination activity aimed at designing a Network Interface Card (NIC) implementing such deterministic-latency transport of experimental data to processor or GPU memories. NaNet reuses several IPs developed for the APEnet+ board in the framework of the EURETILE project, retailoring the design to integrate it in experimental setups, i.e. adding support for several standard and custom link technologies and developing dedicated modules to ensure the real-time characterization of the system. The result is a modular design of a low-latency PCIe Gen2 x8 RDMA NIC supporting standard 1GbE (1000BASE-T) and 10GbE (10GBASE-R), besides custom 34 Gbps APElink [Ammendola, 2013a] and 2.5 Gbps deterministic-latency optical KM3link [Aloisio, 2011] links.
The design includes a network stack protocol offload engine yielding a very stable communication latency, a feature that makes NaNet suitable for use in real-time contexts; NaNet's GPUDirect P2P/RDMA capability, inherited from the APEnet+ design, extends its real-time behaviour into the world of GPGPU heterogeneous computing. NaNet has been employed, with different design configurations and physical device implementations, in two different High Energy Physics (HEP) experiments: the NA62 experiment at CERN and the KM3Net underwater neutrino telescope.

The NA62 experiment at CERN aims at measuring the branching ratio of the ultra-rare decay of the charged kaon into a pion and a neutrino-antineutrino pair. The NA62 goal is to collect ~100 events with a signal-to-background ratio of 10:1, using a novel technique with a high-energy (75 GeV) unseparated hadron beam decaying in flight. The trigger system plays a crucial role in any HEP experiment by deciding whether a particular event observed in a detector should be recorded or not, based on limited and partial information. Since every experiment features a limited amount of network bandwidth and storage for data acquisition, the use of real-time selections is fundamental to make the experiment affordable while maintaining its discovery potential. Such selections reject only uninteresting events, and therefore selectively reduce data throughput. In the NA62 experiment, to manage the high-rate data stream due to a ~10 MHz rate of particle decays illuminating the detectors, the trigger system is designed as a set of three trigger levels reducing this rate by three orders of magnitude. The low-level trigger (L0), implemented in hardware by means of FPGAs on the readout boards, reduces the stream bandwidth by a factor of 10 and is a synchronous real-time system: the decision whether data in the readout board buffers are to be sent to higher levels must be made within 1 ms to avoid loss of data.
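The rate figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function and constant names are hypothetical, the L0 factor of 10 is assumed to apply to the event rate as well as to the bandwidth, and the combined L1+L2 factor is derived here rather than stated in the text.

```python
# Figures quoted in the text; all names are illustrative, not project code.
INPUT_RATE_HZ = 10_000_000   # ~10 MHz rate of particle decays
L0_REDUCTION = 10            # L0 reduces the stream by a factor of 10
TOTAL_REDUCTION = 1_000      # three levels together: three orders of magnitude
L0_DEADLINE_US = 1_000       # L0 decision deadline of 1 ms, in microseconds


def trigger_budget(input_rate_hz=INPUT_RATE_HZ,
                   l0_reduction=L0_REDUCTION,
                   total_reduction=TOTAL_REDUCTION,
                   l0_deadline_us=L0_DEADLINE_US):
    """Return the rates implied by the quoted reduction factors."""
    return {
        # Rate leaving L0 (ASSUMPTION: factor 10 applies to the event rate).
        "l0_output_hz": input_rate_hz // l0_reduction,
        # Rate after all three trigger levels.
        "final_output_hz": input_rate_hz // total_reduction,
        # Derived: factor left for L1 and L2 combined.
        "l1_l2_reduction": total_reduction // l0_reduction,
        # Decays occurring during one L0 decision window.
        "decays_per_l0_deadline": input_rate_hz * l0_deadline_us // 1_000_000,
    }


if __name__ == "__main__":
    for name, value in trigger_budget().items():
        print(f"{name}: {value}")
```

Under these assumptions, roughly 10,000 decays occur within a single 1 ms decision window, which gives a sense of how much data the readout buffers hold while L0 decides.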
The upper trigger levels (L1 and L2) are software-implemented on a commodity PC farm for further reconstruction and event building. In the baseline implementation, the FPGAs on the readout boards compute simple trigger primitives on the fly, such as hit multiplicities and rough hit patterns, which are then timestamped and sent to a central trigger processor for matching and trigger decision. A pilot project within NA62 is investigating the possibility of using a GPGPU system as L0 trigger processor (GL0TP): this system processes unfiltered data from detectors, exploiting GPU computing power to implement more selective trigger algorithms. The first prototype of the GL0TP has recently been deployed in the experiment. The prototype integrates NaNet-1, a version of the NaNet design configured to receive data from readout boards over a 1GbE link with the UDP protocol and implemented on an Altera Stratix IV development board. The GL0TP will operate in parasitic mode with respect to the main L0 trigger processor and, at least in the initial phases of the study, will process data from only one detector (the RICH - Ring Imaging CHerenkov detector). Although the single 1GbE link of NaNet-1 does not have enough bandwidth to cope with the requirements of the experiment, which are in the order of tens of Gbps, we are using this reduced-bandwidth setup to assess the feasibility of our approach and to collect latency, bandwidth and throughput measurements that are driving the development of the multi-port 10GbE version of the board (NaNet-10). Preliminary results of this activity, available in [twepp2014-ArXiV], show that the NaNet design meets the real-time requirements of the system, yielding low and stable communication latencies between readout boards and GPU internal memory, which makes it possible to use a GPGPU system as low-level trigger for the NA62 experiment.

NaNet3 represents the customization of the NaNet design for KM3Net, an underwater experimental apparatus for the detection of high-energy neutrinos in the TeV/PeV range based on the Cherenkov technique. The detector measures the visible Cherenkov photons induced by charged particles propagating in sea water at a speed greater than that of light in the medium, and consists of a 3D array of photomultipliers (PMTs). The charged particle track can be reconstructed by measuring the time of arrival of the Cherenkov photons on the PMTs, whose positions are continuously monitored with a precision better than 40 cm. The KM3Net detection unit is called Tower and consists of 14 floors vertically spaced 20 meters apart. The floor arms are about 8 m long and support 6 glass spheres called Optical Modules (OMs): 2 OMs are located at each floor end and 2 OMs in the middle of the floor; each OM contains one 10-inch PMT and the front-end electronics needed to digitize the PMT signal and to format and transmit the data.
Each floor also hosts two hydrophones, used to reconstruct the OM positions in real time, and oceanographic instrumentation to monitor site conditions relevant for the detector. All data produced by OMs, hydrophones, and instruments are collected by an electronic board contained in a vessel at the centre of the floor; this board, called Floor Control Module (FCM), manages the communication between the on-shore laboratory and the underwater devices, also distributing the timing information and signals. Timing resolution is fundamental in track reconstruction, i.e. for the pointing accuracy in reconstructing the source position in the sky. An overall time resolution of about 3 ns yields an angular resolution of 0.1 degrees for neutrinos with energies greater than 1 TeV. This requirement implies that the readout electronics, which is spatially distributed, must have a common timing with a known delay with respect to a fixed reference. These constraints led to the choice of a synchronous link protocol which embeds clock and data with a deterministic latency; due to the distance between the apparatus and the shore, the transmission medium has to be an optical fiber.

Data produced by the OMs, the hydrophones and other devices in a single floor are collected by the FCM board, packed together and transmitted through an optical bidirectional point-to-point link to one port of a NaNet3 board hosted in a server in the on-shore laboratory. A single NaNet3 board manages four optical links, and hence four floors, sending GPS timing data and slow-control commands to off-shore devices and receiving experimental data and device slow-control data from them. For this particular NaNet application we implemented a synchronous protocol with deterministic latency at the link physical level in order to ensure correct event timestamping, along with a data-level offload module handling the Time Division Multiplexing protocol adopted for the data transport.
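The Tower geometry and link figures given above lend themselves to a quick back-of-the-envelope check. This is a hedged sketch only: the names are hypothetical, the number of NaNet3 boards per Tower is derived here under the assumption that each floor uses exactly one optical link, and the refractive index of sea water is an approximate value assumed for illustration, not a figure from the report.

```python
import math

# Figures quoted in the text.
FLOORS_PER_TOWER = 14
OMS_PER_FLOOR = 6
LINKS_PER_NANET3 = 4          # one optical link per floor
TIME_RESOLUTION_S = 3e-9      # overall time resolution of about 3 ns

# Physical constants.
SPEED_OF_LIGHT_M_S = 299_792_458
REFRACTIVE_INDEX_SEA_WATER = 1.35  # ASSUMPTION: approximate, not from the text


def tower_summary():
    """Derive simple per-Tower quantities from the quoted figures."""
    boards = math.ceil(FLOORS_PER_TOWER / LINKS_PER_NANET3)
    return {
        "optical_modules": FLOORS_PER_TOWER * OMS_PER_FLOOR,
        # Derived under the one-link-per-floor assumption.
        "nanet3_boards": boards,
        "spare_links": boards * LINKS_PER_NANET3 - FLOORS_PER_TOWER,
        # Distance light travels in sea water during one resolution interval.
        "light_path_m": SPEED_OF_LIGHT_M_S / REFRACTIVE_INDEX_SEA_WATER
                        * TIME_RESOLUTION_S,
    }


if __name__ == "__main__":
    for name, value in tower_summary().items():
        print(f"{name}: {value}")
```

With the assumed refractive index, light covers roughly 0.67 m of sea water in 3 ns, which is the same order of magnitude as the 40 cm precision quoted for the PMT positions.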
NaNet3 is implemented on a Terasic DE5-Net board, which is based on an Altera Stratix V GX FPGA device with straight connections to four external 10G SFP+ modules and a PCIe x8 edge connector, fully supporting the four optical KM3link channels and the PCIe Gen2 x8 I/O interface of the design. The GPUDirect P2P/RDMA module is also included in the NaNet3 design, paving the way for the development, already started, of a GPU-based trigger for the experiment. At the present stage of development, a single optical KM3link has been thoroughly tested, with 48 hours of continuous error-free acquisition of OM data; slow-control functionalities have been tested as well. Further details on the activities carried out so far are reported in [twepp2014-ArXiV]. The deployment of the first KM3Net Tower is expected by November 2014; NaNet3 will be used to perform data acquisition and off-shore device monitoring and control tasks from the first day of operation.


1.7 EURETILE Exploitation via TETRACOM technology transfer project

A more general, yet important, path to technology transfer is enabled via the TETRACOM FP7 project (www.tetracom.eu). TETRACOM, coordinated by Rainer Leupers from RWTH, provides a key instrument to improve industrial uptake of research results via so-called Technology Transfer Projects (TTPs). TTPs provide a novel and systematic incentive for small to medium scale TT at European level. As an important support measure, the TTPs are backed by Technology Transfer Infrastructures (TTIs), such as regular workshops, trainings, and TT consultation services by experts, which are widely announced via the HiPEAC network of excellence. All computing systems community members can apply for TTP funding in TETRACOM using a very efficient proposal scheme. In short, the “TTP algorithm” works as follows: All TTPs are based on bilateral academia-industry TT partnerships. One academic partner A teams up with one industry partner B, who is interested in taking up a specific technology or IP developed by A for internal use, evaluation, or productization. The total volume of the intended TTP is between 10k and 200k Euros, and the total TTP duration is between 3 and 12 months. Partner A, assisted by B, submits a lightweight three-page TTP proposal to TETRACOM that will be efficiently evaluated by experts according to several well-defined and public criteria. Following a positive evaluation, TETRACOM can provide funding of up to 50% of the total TTP volume. This funding will be received only by partner A, but it will of course also indirectly benefit partner B. Three calls for TTP proposals are being issued in TETRACOM over the project duration. The 2nd call is open from Nov 15 to Dec 31, 2014.
While TETRACOM, as an open coordination action, naturally cannot provide any preference for specific projects, the EURETILE-TETRACOM link has been used to make all partners aware of the TTP opportunities regarding their newly developed foreground technologies. In fact, the first “EURETILE TTP” (INFN collaborating with EUROTECH) has already been accepted.
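The numeric constraints of the “TTP algorithm” described above (total volume between 10k and 200k Euros, duration between 3 and 12 months, funding of up to 50% of the volume) can be summarised in a short sketch. The class and function names are hypothetical and the code reflects only the figures quoted in this section; the actual evaluation is performed by experts against the public criteria and is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTPProposal:
    """Hypothetical container for the two figures quoted in the text."""
    volume_eur: float      # total volume of the intended TTP
    duration_months: int   # total TTP duration


def within_stated_bounds(p: TTPProposal) -> bool:
    """Check the volume and duration ranges stated for a TTP."""
    return 10_000 <= p.volume_eur <= 200_000 and 3 <= p.duration_months <= 12


def max_funding_eur(p: TTPProposal) -> float:
    """Upper bound on the funding: up to 50% of the total TTP volume.

    Returns 0.0 when the proposal falls outside the stated ranges. The funding
    would go to the academic partner A only, as stated in the text.
    """
    if not within_stated_bounds(p):
        return 0.0
    return 0.5 * p.volume_eur
```

For example, a six-month TTP with a total volume of 80k Euros could receive at most 40k Euros, while a proposal of 250k Euros falls outside the stated range.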

1.8 Workshop on Multi-core debugging

Regarding debugging technologies, the Multicore Application Debugging workshop (MAD, http://www.mad-workshop.de) was created by RWTH (in cooperation with the Technical University of Munich) in 2013, largely inspired by the challenges, opportunities and experience derived from the EURETILE debugging activities. MAD was created to foster the exchange of ideas, approaches, solutions and requirements between industry and academia in the field of debugging for parallel embedded systems. The community response to the MAD workshop has been exceptional, as it targets a critical need which is highly underrepresented in research. The first edition of MAD was held in Munich (DE) on Nov. 14-15, 2013, with approx. 50 participants, largely from industry. The second MAD workshop was organized in Athens (GR) in the context of the HiPEAC Fall Computing Systems Week (Oct. 8-10, 2014). MAD’14 had approx. 90 registered participants, but it was accessible overall to the approx. 200 people attending the HiPEAC event. Across its two editions, MAD has involved companies like Bosch, Intel, Infineon, Samsung, ARM, Freescale, Synopsys, Lauterbach and ST as well as academic institutions like ETH Zurich, Barcelona Supercomputing Center, University of Grenoble, TU Vienna, University of Luebeck, National University of Singapore, and many others. The success of MAD’13 and MAD’14 calls for a new edition in 2015, which is already being planned. Furthermore, a possible merger with the S4D (System, Software, SoC and Silicon Debug) conference from ECSI is being studied by the MAD organizers.


1.9 Workshop on computing on heterogeneous many-tile computing systems

The International Workshop “Perspectives of GPU Computing in Physics and Astrophysics” (http://www.roma1.infn.it/conference/GPU2014/) was held in Rome on 15-17 September 2014. The aim of the meeting was to present and discuss some of the modern applications in Physics and Astrophysics (and related subfields) of hybrid computational systems based on multicore CPUs governing a set of Graphic Processing Units (GPUs) acting as number crunchers; a significant amount of time was also specifically devoted to the hardware and software aspects involved. The meeting was organized by INFN jointly with several Italian research agencies, “Sapienza” University of Rome and primary industrial partners (Nvidia, HP, E4-Intel,…). More than 80 attendees participated from Europe, Asia and the USA, with internationally well-known keynote and invited speakers from academia and industry. Piero Vicini was the workshop co-chair and the EURETILE team was (minimally) involved in the workshop organization. The EURETILE achievements were presented with 2 plenary talks and 1 poster.

2. Publications / Presentations

This section includes lists of:

• joint papers and publications

• papers produced during the last project period (Jan-Sep 2014) by individual beneficiaries

• presentations performed during the last project period (Jan-Sep 2014) by each beneficiary

• list of papers and presentations during previous project periods (2010 – 2013)

2.1 Joint Publications

• L. Schor, I. Bacivarov, L. G. Murillo, P. S. Paolucci, F. Rousseau, A. El Antably, R. Buecs, N. Fournel, R. Leupers, D. Rai, L. Thiele, L. Tosoratto, P. Vicini, and J. Weinstock, EURETILE Design Flow: Dynamic and Fault Tolerant Mapping of Multiple Applications onto Many-Tile Systems, Parallel and Distributed Processing with Applications (ISPA), 2014 IEEE International Symposium on, Milan, Italy, Aug. 2014 http://dx.doi.org/10.1109/ISPA.2014.32.

• J. H. Weinstock, C. Schumacher, R. Leupers, G. Ascheid and L. Tosoratto, Time-Decoupled Parallel SystemC Simulation, in Proceedings of the Conference on Design, Automation & Test in Europe (DATE), 2014, Dresden, Germany http://dx.doi.org/10.7873/DATE.2014.204

• C. Schumacher, J. H. Weinstock, R. Leupers, G. Ascheid, L. Tosoratto, A. Lonardo, D. Petras, A. Hoffmann: legaSCi: Legacy SystemC Model Integration into Parallel Simulators, ACM Transactions on Embedded Computing Systems, Special Issue on Virtual Prototyping of Parallel and Embedded Systems. http://dx.doi.org/10.1109/IPDPSW.2013.34

• F. Rousseau, C. Deschamps, N. Fournel, E. Pastorelli, F. Simula, L. Tosoratto, P.S. Paolucci, An Efficient, Deterministic Environment for Distributed Polychronous Plastic Spiking Neural Nets, submitted to DATE'15, NeuComp2015, 2nd Int. Workshop on Neuromorphic and Brain-Based Computing Systems, 13 March 2015, Grenoble, France


• P. S. Paolucci, I. Bacivarov, D. Rai, L. Schor, L. Thiele, H. Yang, E. Pastorelli, R. Ammendola, A. Biagioni, O. Frezza, F. Lo Cicero, A. Lonardo, F. Simula, L. Tosoratto, P. Vicini, EURETILE D7.3 - Dynamic DAL benchmark coding, measurements on MPI version of DPSNN-STDP (distributed plastic spiking neural net) and improvements to other DAL codes, (Aug 2014), arXiv:1408.4587 [cs.DC], http://arxiv.org/abs/1408.4587

• Paolucci, P.S., Bacivarov, I., Goossens, G., Leupers, R., Rousseau, F., Schumacher, C., Thiele, L., Vicini, P., EURETILE 2010-2012 summary: first three years of activity of the European Reference Tiled Experiment, (2013), arXiv:1305.1459 [cs.DC], http://arxiv.org/abs/1305.1459 - then published as ISBN: 978-88-908488-0-3 (2013), http://dx.doi.org/10.12837/2013T01

• EURETILE 2010-2014: final technical report under preparation, a summary of technical highlights will be submitted as Journal Paper.

2.2 Publications during the last reporting period (Jan-Sep 2014), individually produced by each beneficiary

2.2.1 INFN - 2014

• A. Lonardo, F. Ameli, R. Ammendola, A. Biagioni, O. Frezza, G. Lamanna, F. Lo Cicero, M. Martinelli, P. S. Paolucci, E. Pastorelli, L. Pontisso, D. Rossetti, F. Simeone, F. Simula, M. Sozzi, L. Tosoratto, P. Vicini, "NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features", preprint arXiv:1406.3568 [physics.ins-det].

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini, "A Hierarchical Watchdog Mechanism for Systemic Fault Awareness on Distributed Systems", to be published, Future Generation Computer Systems

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Werner Geurts, Gert Goossens, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini "ASIP Acceleration for Virtual-to-Physical Address Translation on RDMA-Enabled FPGA-Based Network Interfaces", to be published, Future Generation Computer Systems

• Matteo Bauce, Andrea Biagioni, Alessandro Lonardo, Andrea Messina, Pier Stanislao Paolucci, Francesco Simula, Piero Vicini, Stefano Chiozzi, Angelo Cotta Ramusino, Massimiliano Fiorini, Alberto Gianoli, Ilaria Neri, Riccardo Fantechi, Gianluca Lamanna, Roberto Piandani, Marco Sozzi, Mauro Piccini, Cristiano Santoni "GPUs for Online Processing in Low-Level Trigger Systems", to be published, Proceedings of Science (TIPP) Technology and Instrumentation in Particle Physics 2014, 2-6 June, 2014 Amsterdam, the Netherlands

• L. Tosoratto, R. Ammendola, A. Biagioni, O. Frezza, F. Lo Cicero, A. Lonardo, M. Martinelli, P.S. Paolucci, E. Pastorelli, D. Rossetti, F. Simula, P. Vicini "Architectural Improvements and Technological Enhancements for the APEnet+ Interconnect System", Poster, to be published, JINST, Journal of Instrumentation, Proceedings of Topical Workshop on Electronics for Particle Physics (TWEPP) 2014, IOP Publishing, 2014.


• L. Tosoratto, A. Lonardo, F. Ameli, R. Ammendola, A. Biagioni, A. Cotta Ramusino, M. Fiorini, O. Frezza, G. Lamanna, F. Lo Cicero, M. Martinelli, I. Neri, P.S. Paolucci, E. Pastorelli, L. Pontisso, D. Rossetti, F. Simeone, F. Simula, M. Sozzi and P. Vicini, "NaNet: a Configurable NIC Bridging the Gap Between HPC and Real-time HEP GPU Computing", Poster, to be published, JINST, Journal of Instrumentation, Proceedings of Topical Workshop on Electronics for Particle Physics (TWEPP) 2014, IOP Publishing, 2014.

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto and Piero Vicini "LO|FA|MO: Fault Detection and Systemic Awareness for the QUonG computing system", to be published, IEEE Proceedings (SRDS) International Symposium on Reliable Distributed Systems, Nara, Japan, October 6-9, 2014

• A. Biagioni, R. Ammendola, O. Frezza, F. Lo Cicero, A. Lonardo, M. Martinelli, P.S. Paolucci, E. Pastorelli, D. Rossetti, F. Simula, L. Tosoratto, P. Vicini, "Evolution of FPGA-based network acceleration for GPUs", poster at the conference Perspectives of GPU Computing in Physics and Astrophysics, Rome, Italy, September 15-17, 2014

• A. Biagioni, R. Ammendola, O. Frezza, F. Lo Cicero, A. Lonardo, M. Martinelli, P.S. Paolucci, E. Pastorelli, D. Rossetti, F. Simula, L. Tosoratto, P. Vicini, "Development of a GPU aware NIC: from HPC to HEP experiments", poster at the conference (GTC) GPU Technology Conference, March 24-26, 2013 - San Jose (California)

• P.S. Paolucci, R. Ammendola, A. Biagioni, O. Frezza, F. Lo Cicero, A. Lonardo, E. Pastorelli, F. Simula, L. Tosoratto, P. Vicini, "Distributed simulation of polychronous and plastic spiking neural networks: strong and weak scaling of a mini app benchmark", Poster at the Workshop 'Dagli atomi al cervello', Politecnico di Milano, 27 Jan 2014

2.2.2 ETHZ - 2014 Publications

• L. Schor: "Programming Framework for Reliable and Efficient Embedded Many-Core Systems", PhD Thesis, ETH Zurich, October 2014.

• Gomez, L. Schor, P. Kumar, and L. Thiele: "SF3P: A Framework to Explore and Prototype Hierarchical Compositions of Real-Time Schedulers", Proc. International Symposium on Rapid System Prototyping (RSP), New Delhi, October 2014 [[1]].

• L. Schor, I. Bacivarov, H. Yang, and L. Thiele: "AdaPNet: Adapting Process Networks in Response to Resource Variations", Proc. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), New Delhi, India, October 2014 [[2]].

• P. Kumar and L. Thiele: "Worst-Case Guarantees on a Processor with Temperature-based Feedback Control of Speed", ACM Trans. Embed. Comput. Syst. 13, 4s, Article 122, July 2014 [[3]].

• L. Schor, H. Yang, I. Bacivarov, and L. Thiele: "AdaPNet: Adapting the Structure of Process Networks in Response to Resource Variations at Run-Time (Poster)", DAC Work-in-Progress Session (WIP), San Francisco, CA, USA, June 2014 [[4]].

• D. Rai, P. Huang, N. Stoimenov, and L. Thiele: "An Efficient Real Time Fault Detection and Tolerance Framework Validated on the Intel SCC Processor". Proc. Design Automation Conference (DAC), San Francisco, CA, USA, June 2014 [[5]].

• S.-H. Kang, H. Yang, S. Kim, I. Bacivarov, S. Ha, and L. Thiele: "Static Mapping of Mixed-Critical Applications for Fault-Tolerant MPSoCs", Proc. Design Automation Conference (DAC), pages 31:1-31:6, San Francisco, CA, USA, June 2014 [[6]].


• S.-H. Kang, H. Yang, S. Kim, I. Bacivarov, S. Ha, and L. Thiele, "Reliability-Aware Mapping Optimization of Multi-Core Systems with Mixed-Criticality," Proc. IEEE/ACM Design Automation and Test in Europe (DATE), pages 327:1-327:4, 2014 [[7]].

2.2.3 RWTH - 2014 Publications

• L. G. Murillo, R. Buecs, D. Hincapie, R. Leupers, and G. Ascheid, SWAT: Assertion-based Debugging of Concurrency Issues at System Level, in ASP-DAC‘15, Chiba/Tokyo, Japan, Jan. 2015, (accepted for publication)

• L. G. Murillo, R. Buecs, D. Hincapie, R. Leupers and G. Ascheid, Assertion-based Debugging of Concurrency Issues in Many-core Systems across HW/SW Boundaries, in DAC‘14 Work in Progress Session (WIP), June 2014, San Francisco, USA

• L. G. Murillo, S. Wawroschek, J. Castrillon, R. Leupers and G. Ascheid: "Automatic Detection of Concurrency Bugs through Event Ordering Constraints", in Proceedings of the Conference on Design, Automation & Test in Europe (DATE), 2014, Dresden, Germany

• J. H. Weinstock, C. Schumacher, R. Leupers, G. Ascheid and L. Tosoratto: "Time-Decoupled Parallel SystemC Simulation", in Proceedings of the Conference on Design, Automation & Test in Europe (DATE), 2014, Dresden, Germany

2.2.4 TIMA - 2014 Publications

• A. ElAntably, N. Fournel, F. Rousseau, "Lightweight task migration in embedded multi-tiled architectures using task code replication", Rapid System Prototyping symposium, part of the Embedded System week (ESWeek) 2014, Greater Noida, India, 16 and 17th of Oct. 2014.

2.3 Presentations of project results during last reporting period (Jan – Sep 2014)

2.3.1 ETHZ - 2014 Presentations

• Panel discussion - Self-Adaptive Computing Systems: myths and successes, what to expect in the next 10 years. Organizers: Marco D. Santambrogio, Politecnico di Milano, Italy, Hank Hoffmann, University of Chicago, IL, USA; Panelists: Iuliana Bacivarov - ETH Zürich, Switzerland; Ayse K. Coskun - Boston Univ., Boston, MA; Steven Hofmeyr - Lawrence Berkeley National Lab, Berkeley, CA; Gianluca Durelli - Politecnico di Milano, Italy; Oliver Pell - Univ. of California, Berkeley; Hank Hoffmann - Univ. of Chicago, IL; Alessandro Nacci - Politecnico di Milano, Italy; Christian Pilato - Columbia Univ., New York, NY.

• Iuliana Bacivarov, AdaPNet Runtime System: Adapting Process Networks to Resource Variations. CHANGE Workshop co-located with Parallel and Pervasive Computing Week 2014, 25 August 2014, Milan, Italy.

• Iuliana Bacivarov, Adapting Process Networks to Dynamic Resource Changes – the AdaPNet Approach. CHANGE Workshop co-located with 51st Design Automation Conference (DAC), 1 June 2014, San Francisco, CA, USA.

• Iuliana Bacivarov, Mastering the Design of Modern Complex Distributed Systems. Public seminar at EPF Lausanne, 7 March 2014, Lausanne, Switzerland.


2.3.2 RWTH - 2014 Presentations

• L.G. Murillo. "System-level Debugging". Joint RWTH/Synopsys workshop, Nov 2014, Aachen, Germany

• J. Weinstock. "Flexible Time-Decoupling for Electronic System Level Simulators". Joint RWTH/Synopsys workshop, Nov 2014, Aachen, Germany

• L. G. Murillo. SW Debugging for Multi-tile Systems: The EURETILE Methodology and Tools. 2nd International Workshop on Multi-core Applications Debugging. HiPEAC Fall Computing Systems Week, October 2014, Athens, Greece.

• R. Leupers. HiPEAC Workshop for New EU Member States. Zagreb (Croatia) and Ljubljana (Slovenia), Sept 2014

• R. Leupers, L. G. Murillo. 2nd International Workshop on Multi-core Applications Debugging (co-organization with TU Munich and HiPEAC). October 2014, Athens, Greece.

• R. Leupers. Joint Seminar RWTH/Kadir Has Univ., Jun 2014, Istanbul, Turkey

• R. Leupers. Embedded Processor Design, course. Thai-German Graduate School (TGGS), Apr 2014, Bangkok, Thailand

• J. Weinstock. GEMSCLAIM Bi-annual Meeting. "A Parallel GEMSCLAIM Simulator using Flexible Time-Decoupling", Feb 2014, Timisoara, Romania

• R. Leupers. Academia to industry technology transfer and IP in domain of Computer Systems, HiPEAC Workshop, Feb 2014, Timisoara, Romania

• R. Leupers. Embedded Processor Design, Block lecture ALaRI, Jan-Feb 2014, University of Lugano, Lugano, Switzerland

• R. Leupers. TISU Workshop, Jan 2014, Vienna, Austria

2.3.3 INFN - 2014 Presentations

• Roberto Ammendola, LO|FA|MO: Fault Detection and Systemic Awareness for the QUonG computing system, IEEE Proceedings (SRDS) International Symposium on Reliable Distributed Systems, Nara, Japan, October 6-9, 2014 - talk at Conference

• Alessandro Lonardo, A FPGA-based Network Interface Card with GPUDirect enabling real-time GPU computing in HEP experiments, Perspectives of GPU Computing in Physics and Astrophysics, Rome, Italy, September 15-17, 2014 - talk at Conference

• Francesco Simula, Distributed simulation of Polychronous and plastic Spiking Neural Networks: experiments with GPUs, in Perspectives of GPU Computing in Physics and Astrophysics, Rome, Italy, September 15-17, 2014 - talk at Conference

• Alessandro Lonardo, A FPGA-based Network Interface Card with GPUDirect enabling real-time GPU computing in HEP experiments, GPUs in High Energy Physics, Pisa, Italy, September 10-12, 2014 - talk at Conference

• Piero Vicini, NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features, at (RT2014) Real-Time Conference, Nara, Japan, May 26-30, 2014

2.4 Publications during the previous reporting periods (2010 - 2013), individually produced by each beneficiary

2.4.1 INFN - 2013 Publications

• R. Ammendola, M. Bernaschi, A. Biagioni, M. Bisson, M. Fatica, O. Frezza, F. Lo Cicero, A. Lonardo, E. Mastrostefano, P. S. Paolucci, D. Rossetti, F. Simula, L. Tosoratto, and P. Vicini, “GPU Techniques Applied to a Cluster Interconnect,” in Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), 2013 IEEE 27th International, pp. 806–815, 2013, [[18]]

• R. Ammendola, A. Biagioni, O. Frezza, A. Lonardo, F. Lo Cicero, P.S. Paolucci, D. Rossetti, F. Simula, L. Tosoratto and P. Vicini, “APEnet+ 34 Gbps data transmission system and custom transmission logic,” in JINST, Journal of Instrumentation, Proceedings of Topical Workshop on Electronics for Particle Physics (TWEPP) 2013, IOP Publishing, 2013. [[19]]

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini, “Virtual-to-Physical Address Translation for an FPGA-based Interconnect with Host and GPU Remote DMA Capabilities.,” in Field-Programmable Technology (FPT), 2013 International Conference on, 2013. [[20]]

• R. Ammendola, A. Biagioni, O. Frezza, F. Lo Cicero, A. Lonardo, P. S. Paolucci, D. Rossetti, F. Simula, L. Tosoratto and P. Vicini, “Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network“, Journal of Physics: Conference Series, Workshop on Advanced Computing & Analysis Techniques in Physics Research (ACAT) 2013, published as 2014 J. Phys.: Conf. Ser. 523 012013 doi:10.1088/1742-6596/523/1/012013

• R. Ammendola, A. Biagioni, L. Deri, M. Fiorini, O. Frezza, G. Lamanna, F. Lo Cicero, A. Lonardo, A. Messina, M. Sozzi, F. Pantaleo, P.S. Paolucci, D. Rossetti, F. Simula, L. Tosoratto and P. Vicini, “GPU for Real Time processing in HEP trigger systems“, Journal of Physics, Conference Series, Workshop on Advanced Computing & Analysis Techniques in Physics Research (ACAT) 2013, published as 2014 J. Phys.: Conf. Ser. 523 012007 doi: 10.1088/1742-6596/523/1/012007

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Pier Stanislao Paolucci, Alessandro Lonardo, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini, “Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems“, Journal of Physics: Conference Series, International Conference on Computing in High Energy and Nuclear Physics (CHEP) 2013, doi:10.1088/1742-6596/513/5/052002 arXiv:1311.1741

• M. Bauce, A. Biagioni, R. Fantechi, M. Fiorini, S. Giagu, E. Graverini, G. Lamanna, A. Lonardo, A. Messina, F. Pantaleo, R. Piandani , M. Rescigno, F. Simula, M. Sozzi and P. Vicini “GPUs for real-time processing in HEP trigger systems“, Journal of Physics: Conference Series, International Conference on Computing in High Energy and Nuclear Physics (CHEP) 2013, doi:10.1088/1742-6596/513/1/012017

• Roberto Ammendola, Andrea Biagioni, Riccardo Fantechi, Ottorino Frezza, Gianluca Lamanna, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Felice Pantaleo, Roberto Piandani, Luca Pontisso, Davide Rossetti, Francesco Simula, Marco Sozzi, Laura Tosoratto, Piero Vicini “NaNet: a low-latency NIC enabling GPU-based, real-time low level trigger systems“ , Journal of Physics: Conference Series, International Conference on Computing in High Energy and Nuclear Physics (CHEP) 2013, doi:10.1088/1742-6596/513/1/012018 arXiv:1311.1010

• M. Bauce, A. Biagioni, R. Fantechi, M. Fiorini, S. Giagu, E. Graverini, G. Lamanna, A. Lonardo, A. Messina, F. Pantaleo, R. Piandani, M. Rescigno, F. Simula, M. Sozzi and P. Vicini, “GPU for Real Time processing in HEP trigger systems“, Proceedings of Science, European Physical Society Conference on High Energy Physics (EPS-HEP) 2013, PoS EPS-HEP2013 (2013) 503.

• R. Ammendola, M. Bauce, A. Biagioni, R. Fantechi, M. Fiorini, S. Giagu, E. Graverini, G. Lamanna, A. Lonardo, A. Messina, F. Pantaleo, R. Piandani, M. Rescigno, F. Simula, M. Sozzi, P. Vicini “The GAP Project - GPU for Realtime Applications in High Energy Physics and Medical Imaging“, IEEE Xplore, Nuclear Science Symposium and Medical Imaging Conference workshop (NSS/MIC) 2013, DOI: 10.1109/NSSMIC.2013.6829757.

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto and Piero Vicini, “Design and implementation of a modular, low latency, fault-aware, FPGA-based Network Interface" IEEE Xplore, track on International Conference on Reconfigurable Computing and FPGAs (ReConFig) 2013, doi:10.1109/ReConFig.2013.6732275.

• R. Ammendola, A. Biagioni, O. Frezza, G. Lamanna, A. Lonardo, F. Lo Cicero, P. S. Paolucci, F. Pantaleo, D. Rossetti, F. Simula, M. Sozzi, L. Tosoratto, P.Vicini “NaNet: a flexible and configurable low-latency NIC for real-time trigger systems based on GPUs“, in JINST, Journal of Instrumentation, Proceedings of Topical Workshop on Electronics for Particle Physics (TWEPP) 2013, IOP Publishing, 2013, doi:10.1088/1748-0221/9/02/C02023.

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto and Piero Vicini, “’Mutual Watch-dog Networking’: Distributed Awareness of Faults and Critical Events in Petascale/Exascale systems,” arXiv:1307.0433, July 2013.

• Pier Stanislao Paolucci, Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Elena Pastorelli, Francesco Simula, Laura Tosoratto and Piero Vicini, “Distributed simulation of polychronous and plastic spiking neural networks: strong and weak scaling of a representative mini-application benchmark executed on a small-scale commodity cluster“ arXiv:1310.8478

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Werner Geurts, Gert Goossens, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini "A heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications: Vol. II", 2012 technical report - arXiv preprint arXiv:1307.1270

2.4.2 INFN - 2012 Publications

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini. APEnet+: a 3-D Torus network optimized for GPU-based HPC Systems. New York, NY. Proceedings on 2012 J. Phys.: Conf. Ser. 396 042059 doi:10.1088/1742-6596/396/4/042059 [[21]]. (CHEP 2012).

• R. Ammendola, A. Biagioni, O. Frezza, A. Lonardo, F. Lo Cicero, P. S. Paolucci, D. Rossetti, A. Salamon, F. Simula, L. Tosoratto, P. Vicini. A 34 Gbps Data Transmission System with FPGAs Embedded Transceivers and QSFP plus Modules. 2012 IEEE NUCLEAR SCIENCE SYMPOSIUM AND MEDICAL IMAGING CONFERENCE RECORD (NSS/MIC). Book Series: IEEE Nuclear Science Symposium Conference Record. Pages: 872-876. Published: 2012.

• R Ammendola, A Biagioni, O Frezza, F Lo Cicero, A Lonardo, PS Paolucci, D Rossetti, F Simula, L Tosoratto, P Vicini. APEnet+: a 3D Torus network optimized for GPU-based HPC Systems. Journal of Physics: Conference Series - 396 042059 doi:10.1088/1742-6596/396/4/042059 CHEP 2012 proceedings [[22]]

• S. Amerio, R. Ammendola, A. Biagioni, D. Bastieri, D. Benjamin, O. Frezza, S. Gelain, W. Ketchum, Y. K. Kim, F. Lo Cicero, A. Lonardo, T. Liu, D. Lucchesi, P. S. Paolucci, S. Poprocki, D. Rossetti, F. Simula, L. Tosoratto, G. Urso, P. Vicini, and P. Wittich. Applications of GPUs to online track reconstruction in HEP experiments. - NSS-MIC 2012 Proceedings DOI: 10.1109/NSSMIC.2012.6551422

• I. Bacivarov, I. Belaid, A. Biagioni, A. El Antably, N. Fournel, O. Frezza, W. Geurts, G. Goossens, J. Jovic, R. Leupers, F. Lo Cicero, A. Lonardo, L. Murillo, P. S. Paolucci, D. Rai, D. Rossetti, F. Rousseau, L. Schor, C. Schumacher, F. Simula, L. Thiele, L. Tosoratto, P. Vicini and H. Yang. EURETILE: Unified Networking Infrastructure for Embedded and HPC many-tile platforms. Poster at HIPEAC12. Jan 2012 Paris, France [[23]]

• P. S. Paolucci. Brain Simulation Benchmark: Inspiring and benchmarking the scalability and fault-tolerance of future many-tile systems. Poster at HIPEAC12. Jan 2012 Paris, France, [[24]]

2.4.3 INFN - 2011 Publications

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini. QUonG: A GPU-based HPC System Dedicated to LQCD Computing. 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC 2011), pp. 113-122. http://doi.ieeecomputersociety.org/10.1109/SAAHPC.2011.15

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto and Piero Vicini. APEnet+ project status. Proceedings of XXIX International Symposium on Lattice Field Theory (Lattice 2011). July 10-16, 2011. Squaw Valley, Lake Tahoe, CA http://pos.sissa.it/archive/conferences/139/045/Lattice%202011_045.pdf

2.4.4 INFN - 2010 Publications

• Pier Stanislao Paolucci. FP7 EURETILE Project: EUropean REference TILed architecture Experiment. HipeacInfo, Quarterly Newsletter, Number 24, page 11, October 2010 (http://www.Hipeac.net/newsletter) File:PaolucciEuretileHipeacInfo24October2010.pdf

• Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Paolucci, Roberto Petronzio, Davide Rossetti, Andrea Salamon, Gaetano Salina, Francesco Simula, Nazario Tantalo, Laura Tosoratto, Piero Vicini. APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters. Proceedings of Science, PoS(Lattice 2010)022, http://pos.sissa.it/archive/conferences/105/022/Lattice%202010_022.pdf, Proceedings of The XXVIII International Symposium on Lattice Field Theory, Lattice 2010. arXiv:1012.0253 [hep-lat].

• R. Ammendola, A. Biagioni, O. Frezza, F. Lo Cicero, A. Lonardo, P.S. Paolucci, D. Rossetti, A. Salamon, G. Salina, F. Simula, L. Tosoratto, P. Vicini. apeNET+: High Bandwidth 3D Torus Direct Network for PetaFLOPS Scale Commodity Clusters. 2011 J. Phys.: Conf. Ser. 331 052029 doi:10.1088/1742-6596/331/5/052029, Proceedings of International Conference on Computing in High Energy and Nuclear Physics (CHEP 2010), October 2010, Taipei, Taiwan

• R. Ammendola, A. Biagioni, O. Frezza, F. Lo Cicero, A. Lonardo, P.S. Paolucci, D. Rossetti, A. Salamon, G. Salina, F. Simula, L. Tosoratto, P. Vicini. Mastering multi-GPU computing on a torus network. GPU Technology Conference 2010 (GTC2010). http://www.nvidia.com/content/GTC/posters/2010/I09-Mastering-Multi-GPU-Computing-on-a-Torus-Networki.pdf (poster)

• R. Ammendola, A. Biagioni, G. Chiodi, O. Frezza, F. Lo Cicero, A. Lonardo, R. Lunadei, P.S. Paolucci, D. Rossetti, A. Salamon, G. Salina, F. Simula, L. Tosoratto and P. Vicini. High speed data transfer with FPGAs and QSFP+ modules. JINST 5 C12019 doi:10.1088/1748-0221/5/12/C12019

• Ammendola, R.; Biagioni, A.; Chiodi, G.; Frezza, O.; Lo Cicero, F.; Lonardo, A.; Lunadei, R.; Paolucci, P.; Rossetti, D.; Salamon, A.; Salina, G.; Simula, F.; Tosoratto, L.; Vicini, P. High speed data transfer with FPGAs and QSFP+ modules. 2010 IEEE Nuclear Science Symposium Conference Record (NSS/MIC), pp. 1323-1325, Oct. 30 - Nov. 6, 2010. doi: 10.1109/NSSMIC.2010.5873983

2.4.5 ETHZ - 2013 Publications

• H. Yang, I. Bacivarov, D. Rai, J.-J. Chen, and L. Thiele: Real-Time Worst-Case Temperature Analysis with Temperature-Dependent Parameters. Real-Time Systems. Volume 49, Issue 6, p. 730-762, November 2013 [[8]].

• L. Schor, A. Tretter, T. Scherer and L. Thiele. Exploiting the Parallelism of Heterogeneous Systems using Dataflow Graphs on Top of OpenCL. Proc. IEEE Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia), Montreal, Canada, p. 41-50, October 2013 [[9]].

• L. Schor, H. Yang, I. Bacivarov and L. Thiele. Expandable Process Networks to Efficiently Specify and Explore Task, Data, and Pipeline Parallelism. Proc. International Conference on Compilers Architecture and Synthesis for Embedded Systems (CASES), Montreal, Canada, Oct. 2013 [[10]].

• L. Schor, I. Bacivarov, H. Yang and L. Thiele. Efficient Worst-Case Temperature Evaluation for Thermal-Aware Assignment of Real-Time Applications on MPSoCs. Journal of Electronic Testing: Theory and Applications. Volume 29, Issue 4, p. 521-535, August 2013 [[11]].

• L. Schor, D. Rai, H. Yang, I. Bacivarov and L. Thiele. Reliable and Efficient Execution of Multiple Streaming Applications on Intel's SCC Processor. Proc. Workshop on Runtime and Operating Systems for the Many-core Era (ROME), Aachen, Germany, August 2013 [[12]].

• D. Rai, L. Schor, N. Stoimenov, I. Bacivarov and L. Thiele. Designing Applications with Predictable Runtime Characteristics for the Baremetal Intel SCC. Proc. Workshop on Runtime and Operating Systems for the Many-core Era (ROME), Aachen, Germany, August 2013 [[13]].

• D. Rai, L. Schor, N. Stoimenov and L. Thiele. Distributed Stable States for Process Networks - Algorithm, Analysis, and Experiments on the Intel SCC. Proceedings of the 50th Annual Design Automation Conference, Austin, TX, USA, p. 167:1--167:10, June 2013 [[14]].

• L. Schor, H. Yang, I. Bacivarov, D. Rai and L. Thiele. Distributed Application Layer - Adaptive Mapping of Multiple Streaming Applications onto On-Chip Many-Core Systems (Poster). Joint Switzerland-Korea Symposium 2013, May 2013 [[15]].

• L. Thiele, L. Schor, I. Bacivarov, and H. Yang. Predictability for Timing and Temperature in Multiprocessor System-on-Chip Platforms. ACM Transactions on Embedded Computing Systems (TECS), Volume 12, Mar. 2013 [[16]].

• P. Kumar, D. Chokshi, and L. Thiele. A Satisfiability Approach to Speed Assignment for Distributed Real-Time Systems. Proc. Design, Automation & Test in Europe Conference (DATE), Grenoble, France, Mar. 2013 [[17]].

2.4.6 ETHZ - 2012 Publications

• P. Kumar and L. Thiele. Behavioural Composition: Constructively Built Server Algorithms. Proc. 5th Workshop on Compositional Theory and Technology for Real-Time Embedded Systems, San Juan, Puerto Rico, p. 9-12, Dec. 2012.

• P. Kumar and L. Thiele. Quantifying the Effect of Rare Timing Events with Settling-Time and Overshoot. Proc. IEEE Real-Time Systems Symposium (RTSS), San Juan, Puerto Rico, p. 149-160, Dec. 2012.

• S.-H. Kang, H. Yang, L. Schor, I. Bacivarov, S. Ha, and L. Thiele. Multi-Objective Mapping Optimization via Problem Decomposition for Many-Core Systems. Proc. IEEE Symposium on Embedded Systems for Real-Time Multimedia (ESTIMedia), Tampere, Finland, p. 28-37, Oct. 2012.

• L. Schor, I. Bacivarov, D. Rai, H. Yang, S.-H. Kang, and L. Thiele. Scenario-Based Design Flow for Mapping Streaming Applications onto On-Chip Many-Core Systems. Proc. Int'l Conf. on Compilers Architecture and Synthesis for Embedded Systems (CASES), Tampere, Finland, p. 71-80, Oct. 2012.

• D. Rai, H. Yang, I. Bacivarov, and L. Thiele. Power Agnostic Technique for Efficient Temperature Estimation of Multicore Embedded Systems. Proc. Int'l Conf. on Compilers Architecture and Synthesis for Embedded Systems (CASES), Tampere, Finland, p. 61-70, Oct. 2012.

• L. Schor, H. Yang, I. Bacivarov, and L. Thiele. Thermal-Aware Task Assignment for Real-Time Applications on Multi-Core Systems. Proc. Int'l Symposium on Formal Methods for Components and Objects (FMCO) 2011, Turin, Italy, Volume 7542 of LNCS, p. 294-313, Oct. 2012.

• L. Schor, H. Yang, I. Bacivarov, and L. Thiele. Worst-Case Temperature Analysis for Different Resource Models. IET Circuits, Devices & Systems, Volume 6, Issue 5, p. 297-307, Sep. 2012.

• K. Huang, W. Haid, I. Bacivarov, M. Keller, L. Thiele. Embedding Formal Performance Analysis into the Design Cycle of MPSoCs for Real-time Streaming Applications. ACM Transactions on Embedded Computing Systems (TECS), ACM, Volume 11, Issue 1, p. 8:1-8:23, 2012.

• L. Schor, I. Bacivarov, H. Yang, and L. Thiele. Fast Worst-Case Peak Temperature Evaluation for Real-Time Applications on Multi-Core Systems. Proc. IEEE Latin American Test Workshop (LATW), Quito, Ecuador, p. 1-6, Apr. 2012.

• L. Schor, I. Bacivarov, H. Yang, and L. Thiele. Worst-Case Temperature Guarantees for Real-Time Applications on Multi-Core Systems. Proc. IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Beijing, China, p. 87-96, Apr. 2012.

• P. Kumar and L. Thiele. Timing Analysis on a Processor with Temperature-Controlled Speed Scaling. Proc. IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Beijing, China, p. 77-86, Apr. 2012.

• I. Bacivarov, I. Belaid, A. Biagioni, A. El Antably, N. Fournel, O. Frezza, J. Jovic, R. Leupers, F. Lo Cicero, A. Lonardo, L. Murillo, P.S. Paolucci, D. Rai, D. Rossetti, F. Rousseau, L. Schor, C. Schumacher, F. Simula, L. Thiele, L. Tosoratto, P. Vicini, H. Yang – “DAL: Programming Efficient and Fault-Tolerant Applications for Many-Core Systems” - Poster at HIPEAC12 - Jan 23-25, 2012 Paris, France

2.4.7 ETHZ - 2011 Publications

• P. Kumar, J.-J. Chen, and L. Thiele. Demand Bound Server: Generalized Resource Reservation for Hard Real-Time Systems. Proc. Int'l Conference on Embedded Software (EMSOFT), pages 233-242, Oct. 2011.

• L. Schor, H. Yang, I. Bacivarov, and L. Thiele. Worst-Case Temperature Analysis for Different Resource Availabilities: A Case Study. Proc. Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), Lecture Notes in Computer Science (LNCS), Springer, Vol. 6951, pages 288-297, Sep. 2011.

• P. Kumar, J.-J. Chen, L. Thiele, A. Schranzhofer, and G. C. Buttazzo. Real-Time Analysis of Servers for General Job Arrivals. Proc. Intl. Conf. on Embedded and Real-Time Computing Systems and Applications (RTCSA), pages 251-258, Aug. 2011.

• L. Thiele, L. Schor, H. Yang, and I. Bacivarov. Thermal-Aware System Analysis and Software Synthesis for Embedded Multi-Processors. Proc. Design Automation Conference (DAC), pages 268-273, Jun. 2011.

• D. Rai, H. Yang, I. Bacivarov, JJ. Chen, L. Thiele. Worst-Case Temperature Analysis for Real-Time Systems. In Proceedings of Design, Automation and Test in Europe (DATE), Grenoble, France, March 2011.

• S. Perathoner, K. Lampka, L. Thiele. Composing Heterogeneous Components for System-wide Performance Analysis. In Proceedings of Design, Automation and Test in Europe (DATE), Grenoble, France, March 2011 (invited paper).

• I. Bacivarov, H. Yang, L. Schor, D. Rai, S. Jha, L. Thiele, Poster: Distributed Application Layer - Towards Efficient and Reliable Programming of Many-Tile Architectures. Design, Automation and Test in Europe (DATE) Friday Workshop, Grenoble, France, March 2011.

• K. Huang, L. Santinelli, JJ. Chen, L. Thiele, and G. C. Buttazzo. Applying Real-Time Interface and Calculus for Dynamic Power Management in Hard Real-Time Systems. Real-Time Systems Journal, Springer Netherlands, Vol. 47, No. 2, pages 163-193, Mar. 2011.

2.4.8 ETHZ - 2010 Publications

• A. Schranzhofer, JJ. Chen, L. Thiele. Dynamic Power-Aware Mapping of Applications onto Heterogeneous MPSoC Platforms. IEEE Transactions on Industrial Informatics, IEEE, Vol. 6, No. 4, pages 692-707, November 2010.

2.4.9 RWTH - 2013 Publications

• C. Schumacher, J. H. Weinstock, R. Leupers, G. Ascheid, L. Tosoratto, A. Lonardo, D. Petras, A. Hoffmann: "legaSCi: Legacy SystemC Model Integration into Parallel Simulators", ACM Transactions on Embedded Computing Systems, Special Issue on Virtual Prototyping of Parallel and Embedded Systems. (to appear)

• J. H. Weinstock, C. Schumacher, R. Leupers and G. Ascheid: "SCandal: SystemC Analysis for Nondeterminism Anomalies", in Models, Methods, and Tools for Complex Chip Design (Haase, J., ed.) vol. 265 of Lecture Notes in Electrical Engineering. Springer, 2013.

• C. Schumacher, J. H. Weinstock, R. Leupers, G. Ascheid, L. Tosoratto, A. Lonardo, D. Petras, A. Hoffmann. “legaSCi: Legacy SystemC Model Integration into Parallel SystemC Simulators”. 1st Workshop on Virtual Prototyping of Parallel and Embedded Systems (ViPES), 2013, Boston, USA.

2.4.10 RWTH - 2012 Publications

• C. Schumacher, J. H. Weinstock, R. Leupers and G. Ascheid. Cause and effect of nondeterministic behavior in sequential and parallel SystemC simulators. IEEE International High Level Design Validation and Test Workshop (HLDVT'12). Nov 2012, Huntington Beach (California-USA).

• C. Schumacher, J. H. Weinstock, R. Leupers and G. Ascheid: Scandal: SystemC Analysis for NonDeterminism AnomaLies. Forum on Specification and Design Languages (FDL '12), Sep 2012, Vienna (Austria)

• L. G. Murillo, J. Harnath, R. Leupers and G. Ascheid. Scalable and Retargetable Debugger Architecture for Heterogeneous MPSoCs. System, Software, SoC and Silicon Debug Conference (S4D '12), Sep 2012, Vienna (Austria)

• L. G. Murillo, J. Eusse, J. Jovic, S. Yakoushkin, R. Leupers and G. Ascheid: Synchronization for Hybrid MPSoC Full-System Simulation. Design Automation Conference (DAC '12), Jun 2012, San Francisco (USA)

• R. Leupers, F. Schirrmeister, G. Martin, T. Kogel, R. Plyaskin, A. Herkersdorf and M. Vaupel. Virtual platforms: Breaking new grounds (More Real Value for Virtual Platforms). Design, Automation and Test in Europe (DATE '12), Mar 2012, Dresden (Germany)

• J. Jovic, S. Yakoushkin, L. G. Murillo, J. Eusse, R. Leupers and G. Ascheid: Hybrid Simulation for Extensible Processor Cores. Design, Automation and Test in Europe (DATE '12), Mar 2012, Dresden (Germany)

2.4.11 RWTH - 2011 Publications

• S. Kraemer, R. Leupers, D. Petras, T. Philipp, A. Hoffmann. Checkpointing SystemC-Based Virtual Platforms. International Journal of Embedded and Real-Time Communication Systems (IJERTCS), vol. 2, no. 4, 2011

• L. G. Murillo, W. Zhou, J. Eusse, R. Leupers, G. Ascheid. Debugging Concurrent MPSoC Software with Bug Pattern Descriptions. System, Software, SoC and Silicon Debug Conference (S4D '11), Oct 2011, Munich (Germany)

• R. Leupers, G. Martin, N. Topham, L. Eeckhout, F. Schirrmeister, X. Chen: Virtual Manycore Platforms: Moving Towards 100+ Processor Cores. Design Automation & Test in Europe (DATE), Mar 2011, Grenoble (France)

• J. Castrillon, A. Shah, L. G. Murillo, R. Leupers, G. Ascheid. Backend for Virtual Platforms with Hardware Scheduler in the MAPS Framework. 2nd IEEE Latin America Symp. on Circuits and Systems, Feb 2011, Bogota (Colombia)

• S. Kraemer, Design and analysis of efficient MPSoC simulation techniques, dissertation, 2011, Aachen, Germany (http://darwin.bth.rwth-aachen.de/opus3/frontdoor.php?source_opus=3769&la=de)

2.4.12 RWTH - 2010 Publications

• C. Schumacher, R. Leupers, D. Petras and A. Hoffmann. parSC: Synchronous Parallel SystemC Simulation on Multi-Core Host Architectures. In Proceedings of CODES/ISSS '10, October 2010, Scottsdale, Arizona, USA (http://dx.doi.org/10.1145/1878961.1879005)

2.4.13 TIMA - 2013 Publications

• M. Jaber, A. Chagoya-Garzon, F. Rousseau, From System Model Formalization Towards Correct and Efficient HW/SW Design, DTIS Conference, March 2013, pp. 88-94, Abu Dhabi, UAE.

2.4.14 TIMA - 2012 Publications

• A. Chagoya-Garzon, F. Rousseau, F. Pétrot. Multi-Device Driver Synthesis Flow for Heterogeneous Hierarchical Systems. Euromicro Conference on Digital System Design, Sept 2012, pp. 389-396, Izmir, Turkey.

• Ashraf Elantably, Frédéric Rousseau. Task migration in multi-tiled MPSoC: Challenges, state-of-the-art and preliminary solutions. Journée National du Réseau Doctoral en Microélectronique, Marseille, France, June 2012, poster and 4-page paper (in English).

2.4.15 TIMA - 2011 Publications

• A. Chagoya-Garzon, N. Poste, F. Rousseau. Semi-Automation of Configuration Files Generation for Heterogeneous Multi-Tile Systems, Computer Software and Applications Conference (COMPSAC 2011), Munich, Germany, 18-21 July 2011.

• H. Chen, G. Godet-Bar, F. Rousseau, F. Petrot. Me3D: A Model-driven Methodology expediting Embedded Device Driver Development, International Symposium on Rapid System Prototyping (IEEE RSP 2011), pp. 171-177, May 2011, Karlsruhe, Germany.

2.5 A few presentations of project results during previous reporting periods (2010-2013)

2.5.1 ETHZ - 2013 Presentations

• Iuliana Bacivarov, How model-based design simplifies the debugging of many-core systems. 1st International Workshop on Multicore Application Debugging (MAD 2013). Nov. 2013, Munich, Germany.

• Iuliana Bacivarov, Distributed Application Layer: mapping dynamic applications on many-core systems. CASTNESS'13. June 28th, 2013, Barcelona, Spain.

• Iuliana Bacivarov, Distributed Application Layer - Run-time mapping of streaming applications on heterogeneous many-core systems. DAC CHANGE - Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments, June 2013, Austin, TX, USA.

2.5.2 ETHZ - 2012 Presentations

• Iuliana Bacivarov, System-Level Thermal Aware Design of Real-Time Embedded Systems, Lecture at Design and Test Summer School 2012, Oct. 2012, Puebla, Mexico.

• Iuliana Bacivarov, Distributed Application Layer: Scenario-Based Design Flow for Mapping Streaming Applications onto On-Chip Many-Core Systems, Invited talk at Thales Paris, Aug. 2012, Paris, France.

• Iuliana Bacivarov, EU FP7 project EURETILE - EUropean REference TILed architecture Experiment (2010-2013): Distributed Application Layer, in Special Session on Ongoing EU Projects at ASAP 2012, Jul. 2012, Delft, Netherlands.

• Iuliana Bacivarov, Distributed Application Layer: Efficient Programming of Reliable Many-Core Systems, Invited talk at TU Delft, Jun. 2012, Delft, Netherlands.

• Iuliana Bacivarov, Distributed Application Layer: Mapping Dynamic Streaming Applications onto Many-Core Systems, Invited talk at Map2MPSoC/SCOPES, May 2012, Schloss Rheinfels, St. Goar, Germany.

• Iuliana Bacivarov, Management of Process Network Dynamism in the Distributed Application Layer, CASTNESS 2012, Jan. 2012, Paris, France.

2.5.3 ETHZ - 2011 Presentations

• Iuliana Bacivarov, Thermal-Aware Design of Real-Time Multi-Core Embedded Systems, Invited talk at Mapping Applications to MPSoCs, Jun. 2011.

• Iuliana Bacivarov, Temperature Predictability in Multi-Core Real-Time Systems, Invited talk at DAC Workshop on Multiprocessor System-on-Chip for Cyber Physical Systems: Programmability, Run-Time Support, and Hardware Platforms for High Performance Embedded Applications, Jun. 2011.

• Iuliana Bacivarov, Distributed Application Layer - Towards Seamless Programming of Many-Tile Architectures, CASTNESS 2011, 17 and 18 January 2011, Rome, Italy, http://euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-IulianaBacivarov.pdf.

2.5.4 ETHZ - 2010 Presentations

• Iuliana Bacivarov, Distributed Operation Layer: An Efficient and Predictable KPN-Based Design Flow, invited talk at Workshop on Compiler-Assisted System-On-Chip Assembly 2010, in conjunction with Embedded Systems Week, Scottsdale, AZ, US, October 2010, http://www12.cs.fau.de/ws/casa10.

• Iuliana Bacivarov, Efficient Execution of Kahn Process Networks on CELL BE, invited talk at Summer School on Models for Embedded Signal Processing Systems at Lorentz Center, Leiden, Netherlands, 30 Aug - 3 Sep 2010, http://www.lorentzcenter.nl/lc/web/2010/427/presentations/Iuliana-cell.pdf.

• Iuliana Bacivarov, Distributed Operation Layer: A Practical Perspective, tutorial at Summer School on Models for Embedded Signal Processing Systems at Lorentz Center, Leiden, Netherlands, 30 Aug - 3 Sep 2010, http://www.lorentzcenter.nl/lc/web/2010/427/presentations/Iuliana-demo.pdf.

• Iuliana Bacivarov, Distributed Operation Layer: Efficient Design Space Exploration of Scalable MPSoC, invited talk at Combinatorial Optimization for Embedded System Design workshop 2010 in conjunction with CPAIOR2010, 7th International Conference on Integration of Artificial Intelligence and Operations Research techniques in Constraint Programming, Bologna, Italy, June 2010, http://www.artist-embedded.org/artist/Overview,2022.html.

• Iuliana Bacivarov, Efficient Execution of Kahn Process Networks on MPSoC, invited talk at Mapping Applications to MPSoCs 2010, June 29-30, 2010, St. Goar, Germany, http://www.artist-embedded.org/artist/Program,1822.html.

2.5.5 RWTH - 2013 Presentations

• J. H. Weinstock. Time-decoupled Parallel SystemC Simulation. Joint RWTH/Synopsys Seminar. Dec 2013, Aachen, Germany.

• L. G. Murillo. Simulation-based Concurrent Software Debugging. Joint RWTH/Synopsys Seminar. Dec 2013, Aachen, Germany.

• L. G. Murillo. Automatic Exploration of SW Concurrency Bugs through Deterministic Behavior Control. 1st International Workshop on Multi-core Applications Debugging. November 2013, Munich, Germany.

• R. Leupers, L. G. Murillo. 1st International Workshop on Multi-core Applications Debugging (co-organization with TU Munich). November 2013, Munich, Germany.

• R. Leupers. Keynote at Sabanci Univ, Oct 2013, Istanbul, Turkey

• R. Leupers. Two new Use Cases for Virtual Platforms. 13th International Forum on Embedded MPSoC and Multicore (MPSoC'13). July 2013, Otsu, Japan

• J. H. Weinstock. The EURETILE Parallel Simulation Environment. CASTNESS'13, June 2013, Barcelona, Spain.

• L. G. Murillo. Simulation-based MPSoC Software Debugging. Joint RWTH/TU Munich Research Seminar, May 2013, Munich, Germany.

• R. Leupers. System-Level Design Technologies for Embedded Multicore Devices. HiPEAC workshop, Apr 2013, Sibiu, Romania

• R. Leupers. HiPEAC booth at DATE'13, Mar 2013, Grenoble, France

• R. Leupers. System-Level Design Technologies. Joint RWTH/Intel Lab seminar, Mar 2013, Aachen, Germany

• R. Leupers. Embedded Processor Design, Block lecture ALaRI, Jan-Feb 2013, University of Lugano, Lugano, Switzerland.

• R. Leupers (organizer). Programming Embedded Multiprocessor Systems: Application Code Mapping and Performance Estimation Technologies. Tutorial. 8th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 2013. Yokohama, Japan

• R. Leupers. Joint RWTH/U Ghent Seminar, Jan 2013, Aachen, Germany

• R. Leupers. Embedded Processor Design, course. Thai-German Graduate School (TGGS), Jan 2013, Bangkok, Thailand

2.5.6 RWTH - 2012 Presentations

• R. Leupers. Multicore Platform Design: Tackling a Grand Challenge in Embedded Computing. Keynote at the 15th Euromicro Conference on Digital System Design (DSD '12), Sep 2012, Izmir, Turkey

• R. Leupers. Embedded Multicore Design Technologies: The Next Generation. Keynote at the IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP '12), Jul 2012, Delft, The Netherlands

• R. Leupers. Design Technologies for Wireless Multiprocessor Systems-on-Chip, PhD course, University of Pisa, Jul 2012, Pisa, Italy

• Christoph Schumacher. Virtual EURETILE Platform: a Platform for Many-tiled Systems Simulation. Joint RWTH/TU Poznan Seminar, May 2012, Poznan, Poland

• R. Leupers (session chair). Workshop at DATE 2012: "Quo Vadis, Virtual Platforms? Challenges and Solutions for Today and Tomorrow". Mar 2012, Dresden, Germany

• R. Leupers (organized by). Special session at DATE 2012: "Virtual Platforms: Breaking New Grounds". Mar 2012, Dresden, Germany

• R. Leupers, H. Meyr. Embedded Processor Design, Block lecture ALaRI, Feb 2012, University of Lugano, Lugano, Switzerland

• R. Leupers. What's hot on Embedded System Design. Keynote at the Embedded Technology Conference (ETC '12), Feb 2012, San Pedro, Costa Rica

• Luis Murillo. Simulation-based software debugging in many-tile systems. CASTNESS 2012, January 26, 2012, Paris, France

2.5.7 RWTH - 2011 Presentations

• Juan Eusse. Hybrid Simulation Technology for Extensible Cores and Full System Simulation of Complex MPSoCs. Presentation at HiPEAC Computing Systems Week, Nov 2011, Barcelona, Spain

• R. Leupers. SoC Design Research in the UMIC Excellence Cluster, Seminar, TU Berlin, Sep 2011, Berlin, Germany

• S. Yakoushkin. Advanced Simulation Techniques, Joint RWTH/TU Tampere Seminar, June 2011, Tampere, Finland

• R. Leupers (organized by). ICT Technology Transfer Workshop targeting Horizon 2020, Apr 2011, Brussels, Belgium

• R. Leupers and G. Martin (organized by). Special session at DATE 2011: Virtual Manycore Platforms: Moving Towards 100+ Processor Cores, March 2011, Grenoble, France

• R. Leupers, H. Meyr. Embedded Processor Design, Block lecture ALaRI, Feb 2011, University of Lugano, Lugano, Switzerland

• Jovana Jovic, Simulation Challenges in the EURETILE Project, CASTNESS 2011, January 17-18, 2011, Rome, Italy

2.5.8 RWTH - 2010 Presentations

• Christoph Schumacher, Virtual Platform Technologies for Multi-core Platforms, UMIC Day, 19 October, 2010, RWTH Aachen, Germany

• Rainer Leupers, HiPEAC Cluster Meeting (Design and Simulation Cluster), October 2010, Barcelona, Spain

• Rainer Leupers, Design Technologies for Wireless Systems-On-Chip, Huawei ESL Symposium, September 2010, Shenzhen, People's Republic of China

• Christoph Schumacher, Stefan Kraemer and Rainer Leupers, demonstration at DAC 2010 exhibition: parSC: parallel SystemC simulation, deterministic, accurate, fast, June 14-16, 2010, Anaheim, USA

• Rainer Leupers, MPSoC Design for Wireless Multimedia, Tutorial, MIXDES, June 2010, Wroclaw, Poland

• Stefan Kraemer, Advanced Simulation Techniques for Virtual Platforms, May 26, 2010, Imperial College London, London, United Kingdom

• Rainer Leupers, Cool MPSoC Design, ASCI Winter School on Embedded Systems, March 2010, Soesterberg, Netherlands

• Rainer Leupers, Embedded Processor Design and Implementation, course in MSc in Embedded Systems track at ALaRI Institute, March 1-4, 2010, University of Lugano, Switzerland

2.5.9 INFN - 2013 Presentations

• Roberto Ammendola - Virtual-to-Physical address translation for an FPGA-based interconnect with host and GPU remote DMA capabilities - 2013 International Conference on Field-Programmable Technology (ICFPT) 2013 - 9-11 Dec - Kyoto, Japan

• Alessandro Lonardo - Building a Low-latency, Real-time, GPU-based Stream Processing System - GTC 2013 - March 20, 2013 - San Jose (California)

• Andrea Biagioni - The EURETILE hardware experimental platform - CASTNESS 2013 - 28 June 2013 - Barcelona, Spain

• Laura Tosoratto - Fault and Critical Event Awareness: a no-single-point-of-failure approach for distributed systems - CASTNESS 2013 - 28 June 2013 - Barcelona, Spain

• Andrea Biagioni - APEnet+ 34 Gbps Data Transmission System and Custom Transmission Logic - TWEPP 2013 - Sept 23-27, 2013 - Perugia, Italy

• Francesco Simula - From GPU-accelerated computing to GPU-accelerated data acquisition for physics experiments; the QUonG cluster, the APEnet+ network card and the APE project evolution - X Seminar on Software for Nuclear, Subnuclear and Applied Physics 2013 - Alghero, Italy

• Piero Vicini - Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network - XV International Workshop on Advanced Computing and Analysis Techniques in Physics (ACAT) - Beijing, China - May 2013

• Piero Vicini - GPU for Real Time processing in HEP trigger systems - XV International Workshop on Advanced Computing and Analysis Techniques in Physics (ACAT) - Beijing, China - May 2013

• Davide Rossetti - GPU Techniques Applied to a Cluster Interconnect - Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW) 2013 - Boston, MA.

• Alessandro Lonardo - Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems - International Conference on Computing in High Energy and Nuclear Physics (CHEP) 2013, 14-18 Oct 2013, Amsterdam, Netherlands.

• Ottorino Frezza - Design and implementation of a modular, low latency, fault-aware, FPGA-based Network Interface - ReConFig 2013 - Dec 2013 - Cancun, Mexico.

2.5.10 INFN – 2012 Presentations

• R. Ammendola, apeNET+: a 3D toroidal network enabling petaFLOPS scale Lattice QCD simulations on commodity clusters, Lattice 2010, The XXVIII International Symposium on Lattice Field Theory, Villasimius, Italy, June 2010, http://agenda.infn.it/contributionDisplay.py?contribId=335&sessionId=70&confId=2128

• D. Rossetti, Leveraging NVIDIA GPUDirect on APEnet+ 3D Torus Cluster Interconnect - GTC 2012 - GPU Technology Conference - May 2012 - San Jose, CA, [[25]]

• R. Ammendola, Comunicazioni Peer to Peer tra GPU remote con APENet+ - E4 Workshop 2012 - Sept 2012 - Bologna, Italy [[26]]

• D. Rossetti, Multi GPU simulations: status and perspectives - New Frontiers in Lattice Gauge Theory, GGI Firenze, Italy - [[27]]

• Davide Rossetti, Breadth First Search on APEnet+ - talk at IA^3 Workshop on Irregular Applications at SC12 conference, 10 Nov 2012.

• Pier Stanislao Paolucci, Brain Inspired Many-Tile Experiment: second year overview of EURETILE, CASTNESS'12, 26 January 2012, Paris, France

• P. Vicini, Peer-to-peer GPGPU-APENet+ connectivity on HPC EURETILE platform, CASTNESS'12, 26 January 2012, Paris, France

2.5.11 INFN – 2011 Presentations

• D. Rossetti, apeNET+ Project Status, Lattice 2011, The XXIX International Symposium on Lattice Field Theory, Squaw Village, Lake Tahoe, CA, USA, July 2011 - to be published on https://latt11.llnl.gov/html/proceedings.php

• P. Vicini, QUonG: A GPU-based HPC System Dedicated to LQCD Computing - Symposium on Application Accelerators in High-Performance Computing, Knoxville, TN - USA, July 2011 http://doi.ieeecomputersociety.org/10.1109/SAAHPC.2011.15

• D. Rossetti, Remote Direct Memory Access between NVIDIA GPUs with the APEnet 3D Torus Interconnect - SC11 - International Conference for High Performance Computing, Networking, Storage and Analysis - Seattle, WA http://nvidia.fullviewmedia.com/fb/nv-sc11/tabscontent/archive/304-wed-rossetti.html

• Pier Stanislao Paolucci, EURETILE: Brain-Inspired many-tile SW/HW Experiment, CASTNESS 2011, 17 and 18 January 2011, Rome, Italy, http://euretile.roma1.infn.it/mediawiki/img_auth.php/3/3a/EURETILE-1-PierStanislaoPaolucci.pdf

• P. Vicini, EURETILE: The HPC and Embedded Experimental HW Platform, CASTNESS 2011, 17 and 18 January 2011, Rome, Italy, http://euretile.roma1.infn.it/mediawiki/img_auth.php/6/69/EURETILE-6-PieroVicini.ppt

• P.S. Paolucci, Future Emerging Technologies High Performance Computing Projects, 17 Jan 2011, Lab. Naz. Legnaro, Italy

2.5.12 TIMA – 2014 Presentations

• Frédéric Rousseau, "Lightweight task migration in multi-tiles architectures – communication consistency", MPSoC forum, Margaux – France, 7-11 July 2014.

2.5.13 TIMA – 2013 Presentations

• Ashraf Elantably, Frédéric Rousseau. Lightweight task migration in embedded multi-tiled architectures using task code replication, CASTNESS 2013, Barcelona, Spain.

2.5.14 TIMA – 2012 Presentations

• Frédéric Rousseau, Requirements in Communication Synthesis for EURETILE: The use of Communication Path Formalization, CASTNESS 2012, January 26th, 2012, Paris, France

• Frédéric Rousseau, presentation of the EURETILE project in front of the Board of directors of the University Joseph Fourier, March 2011, Grenoble, France (in French)

• Frédéric Rousseau, Communication Synthesis in Low Level Software for Hierarchical Heterogeneous Systems, CASTNESS 2011, January 17-18, 2011, Rome, Italy

2.5.15 Target – 2011 Presentations

• G. Goossens, “Why Compilation Tools are the Catalyst for Multicore SoC Design”, Electronic Design and Solutions Fair, Yokohama (Japan), January 27-28, 2011.

• G. Goossens, “Why Compilation Tools are a Catalyst for Multicore SoC Design”, Third Friday Workshop on Designing for Embedded Parallel Computing Platforms: Architectures, Design Tools, and Applications, Design Automation and Test in Europe (DATE-2011), Grenoble (France), March 18, 2011.

• G. Goossens, P. Verbist, “Enabling the Design and Programming of Application-Specific Processors”, Sophia-Antipolis Micro-Electronics Conference (SAME-2011), Sophia-Antipolis (France), October 12-13, 2011.

• G. Goossens, “Building Multicore SoCs with Application-Specific Processors”, Electronic Design and Solutions Fair, Yokohama (Japan), November 16-18, 2011.

• P. Verbist, “Building software-programmable accelerators for ARM-based subsystems”, ARM Technical Symposium, Taipei and Hsinchu (Taiwan), November 17-18, 2011.

• G. Goossens, “Building Multicore SoCs with Application-Specific Processors”, Intl. Conf. on IP-Based SoC Design (IP-SoC-2011), Grenoble (France), December 7-8, 2011.

• G. Goossens, “Design Tools for Building Software-Programmable Accelerators in Multicore SoCs”, Workshop on Tools for Embedded System Design, Sint-Michielsgestel (Netherlands), December 13, 2011.

2.5.16 Target – 2010 Presentations

• G. Goossens, "How ASIP Technology can Make your RTL Blocks More Flexible", Electronic Design and Solutions Fair, Yokohama (Japan), January 28-29, 2010.

• S. Cox, G. Goossens, "Hardware Accelerator Performance in a Programmable Context: Methodology and Case Study", Embedded Systems Conference, San Jose (CA, USA), April 26-29, 2010.

• G. Goossens, "Design of Programmable Accelerators for Multicore SoCs", First Artemis Technology Conference, Budapest (Hungary), June 29-30, 2010.

• W. Geurts, G. Goossens, "Ideas for the Design of an ASIP for LQCD", CASTNESS 2011, Rome (Italy), January 17-18, 2011, http://euretile.roma1.infn.it/mediawiki/img_auth.php/3/3c/EURETILE-5-WernerGeurts.ppt

• G. Goossens, E. Brockmeyer, W. Geurts, “Application-Specific Instruction-set Processors (ASIPs) and related design tools for tiled systems”, CASTNESS 2012, Paris (France), January 26, 2012.

2.6 Books

• ETHZ

o Book chapter: I. Bacivarov, W. Haid, K. Huang, L. Thiele. Methods and Tools for Mapping Process Networks onto Multi-Processor Systems-On-Chip. Handbook of Signal Processing Systems, Springer, pages 1007-1040, October, 2010.

• RWTH

o Book chapter: J. H. Weinstock, C. Schumacher, R. Leupers and G. Ascheid: "SCandal: SystemC Analysis for Nondeterminism Anomalies", in Models, Methods, and Tools for Complex Chip Design (Haase, J., ed.) vol. 265 of Lecture Notes in Electrical Engineering. Springer, 2013.

o Book: T. Kempf, G. Ascheid, R. Leupers: Multiprocessor Systems on Chip: Design Space Exploration, Springer, Feb 2011, ISBN 978-1441981523

o Book: R. Leupers and O. Temam (Eds.), Processor and System-On-Chip Simulation, Springer, September 2010, ISBN 978-1441961747

• TIMA

o Book: Katalin Popovici, Frederic Rousseau, Ahmed A. Jerraya, Marilyn Wolf: Embedded Software Design and Programming of Multiprocessor System-on-Chip, Simulink and SystemC Case Studies, Springer, April 2010, ISBN 978-1-4419-5566-1

o Book chapter: Xavier Guerin, Frederic Petrot, Operating System Support for Applications targeting Heterogeneous Multi-Core System-on-Chip, in the book Multi-Core Embedded Systems, CRC Press, Chapter 9, 24 pages, April 2010.

2.7 CASTNESS'11 Workshop

• EURETILE organized the CASTNESS'11 Workshop (Computer Architectures, Software tools and nano-Technologies for Numerical and Embedded Scalable Systems) in Rome on 17-18 January 2011. The agenda and the presentations are stored on http://euretile.roma1.infn.it/mediawiki/index.php/CASTNESS_11

This list of publications and presentations has been retrieved from "http://euretile.roma1.infn.it/mediawiki/index.php/EURETILE_publications"