Hardware-Assisted Secure Communication in Embedded and

Heriot-Watt University Research Gateway

Hardware-Assisted Secure Communication in Embedded andMulti-Core Computing Systems

Citation for published version:Saeed, A, Ahmadinia, A & Just, M 2018, 'Hardware-Assisted Secure Communication in Embedded andMulti-Core Computing Systems', Computers, vol. 7, no. 2, 31. https://doi.org/10.3390/computers7020031

Digital Object Identifier (DOI):10.3390/computers7020031

Link:Link to publication record in Heriot-Watt Research Portal

Document Version:Publisher's PDF, also known as Version of record

Published In:Computers

Publisher Rights Statement:© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributedunder the terms and conditions of the Creative Commons Attribution (CC BY) license(http://creativecommons.org/licenses/by/4.0/).

General rightsCopyright for the publications made accessible via Heriot-Watt Research Portal is retained by the author(s) and /or other copyright owners and it is a condition of accessing these publications that users recognise and abide bythe legal requirements associated with these rights.

Take down policyHeriot-Watt University has made every reasonable effort to ensure that the content in Heriot-Watt ResearchPortal complies with UK legislation. If you believe that the public display of this file breaches copyright pleasecontact [email protected] providing details, and we will remove access to the work immediately andinvestigate your claim.

Download date: 24. May. 2022

https://doi.org/10.3390/computers7020031

https://doi.org/10.3390/computers7020031

https://researchportal.hw.ac.uk/en/publications/e8931dd7-85a3-475a-8e40-a957ab786bcf

Article

Hardware-Assisted Secure Communication inEmbedded and Multi-Core Computing Systems

Ahmed Saeed 1 ID , Ali Ahmadinia 2,* ID and Mike Just 3 ID

1 Department of Electrical Engineering, COMSATS Institute of Information Technology,54000 Lahore, Pakistan; [email protected]

2 Department of Computer Science and Information Systems, California State University San Marcos,San Marcos, CA 92096, USA

3 School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK;[email protected]

* Correspondence: [email protected]; Tel.: +1-760-750-8502

Received: 16 April 2018; Accepted: 12 May 2018; Published: 15 May 2018��

Abstract: With the sharp rise of functionalities and connectivities in multi-core embedded systems,these systems have become notably vulnerable to security attacks. Conventional software securitymechanisms fail to deliver full safety and also affect the system performance significantly. In thispaper, a hardware-based security procedure is proposed to handle critical information in real-timethrough comprehensive separation without needing any help from the software. To evaluatethe proposed system, an authentication system based on an image procession solution has beenimplemented on a reconfigurable device. In addition, the proposed security mechanism is evaluatedfor the Networks-on-chips, where minimal area, power consumption and performance overheadsare achieved.

Keywords: secure communication; embedded computing; hardware security; network-on-chip

1. Introduction

Security and trust are becoming crucial design considerations for all contemporary embeddedreal-time systems as they are extensively put to use in crucial and decisive applications that requireprocessing of data in a given time frame. Such embedded systems are now typically found intelecommunication devices, power grids, satellites, planes, automobiles, ATM machines, militaryequipment and medical diagnostic devices. Due to an increase in network connectivity and softwarecontent such devices are now becoming more prone to rapidly spreading security attacks throughmalicious software which were previously known to target servers, desktop systems and Internetconnected devices [1].

Malicious software mainly targets embedded systems to get unauthorized access to critical data(such as confidential files and cryptographic keys), to modify sensitive information saved in thesystem’s storage unit and to reduce performance or drain power source (e.g., through execution ofsome valid tasks uselessly). These types of attacks are more commonly known as data confidentialityattacks, data integrity attacks, and performance degradation attacks, respectively. Embedded systemsare usually more susceptible to these types of attacks due to two main features. Primarily, the reducedsystem resources and simplified processing cores architecture leave such systems prone to differentkinds of physical attacks. Secondly, the ability to communicate to the outside world, without anybuilt-in security mechanism, also expose such systems to software attacks leading to unauthorizedaccesses. Moreover, download or update of applications and other data by the embedded devices overinsecure network connection or other mechanisms (such as unencrypted Wi-Fi and blue-tooth) always

Computers 2018, 7, 31; doi:10.3390/computers7020031 www.mdpi.com/journal/computers

http://www.mdpi.com/journal/computers

http://www.mdpi.com

https://orcid.org/0000-0002-0275-7133

https://orcid.org/0000-0003-4612-1142

https://orcid.org/0000-0002-9669-5067

http://www.mdpi.com/2073-431X/7/2/31?type=check_update&version=1

http://dx.doi.org/10.3390/computers7020031

http://www.mdpi.com/journal/computers

Computers 2018, 7, 31 2 of 22

make them vulnerable to security attacks. More importantly, real-time embedded systems are designedand implemented in such a way that their main objective is to complete the specific task in the giventime frame without engaging in the execution of irrelevant security applications such as an anti-virussoftware. In contrast to desktop computers that can afford software-based security mechanisms,resource-limited embedded systems require to use a considerable amount of their resources suchas processing power and battery life to support such solutions. Furthermore, the existing solutionsrelying purely on software cannot detect unreported attacks without prior knowledge and regularupdates of virus and malware definitions. Therefore, embedded systems must deploy such securitycountermeasures that are self-contained and have the ability to detect such attacks on the run withoututilizing system’s dedicated resources.

In this paper, a new security mechanism has been proposed by isolating and positioningpivotal processing units in a secure zone to process sensitive data in a independent and secure way.The proposed secure zone-based solution operates independently without requiring modificationsin the system software. The feasibility of the proposed mechanism is explained by implementingan authentication system on a reconfigurable device based on an image processing technique. It isfurther shown that the given system, when equipped with the presented security mechanism, canidentify software-based attacks swiftly with nominal increase in area and power consumption.The scalability and functionality of the proposed solution is further verified by implementing itfor the Network on Chip (NoC)-based communication systems and evaluating its performance, powerconsumption and area overhead when configured in different network sizes.

The paper has been arranged as follows: The existing hardware-based security mechanismsare outlined in Section 2. The key concept of managing data by the proposed security mechanismand its implementation details are discussed in Section 3. Section 4 deals with the assessment of theproposed security mechanism when utilized within a existing FPGA-based authentication system.The scalability and performance evaluation for the NoC-based communication systems has beendiscussed in Section 5 while conclusion is presented in Section 6 of this paper.

2. Related Work

Various security solutions have been presented in the literature based on the configurationsof the system, functional specifications, types of attacks, and performance requirements. Physicalattacks targeting embedded systems require direct access such as power consumption analysis througheavesdropping [2], timing analyses [3] and chemical-based attacks [2] but these kinds of attacks havenot been covered here and we have targeted hardware-assisted security solutions capable of identifyingsoftware related attacks.

Presently, different hardware-assisted techniques related to secure processing of informationhave been presented ranging from cryptography, reference monitors, hardware virtualization [4] todynamic information flow tracking (DIFT) [5]. Solutions related to reference monitors are based oncomparing processor-executed code with a valid model whereas the security module works in parallelwith the processing core as discussed by Kornaros et al. [6]. For instance, the secure reference monitorswith hardware support [7–10] sense divergence in program execution on the run by analyzing systemresponse against a predefined behavior in order to detect any attacks targeting data integrity. The staticmodels are normally obtained through static analysis and profiling of the programs to be loaded onthe system.

Yan et al. in [11] have presented a concept of hardware isolation for mobile devices which is basedon disabling the dangerous hardware components when sensitive applications are running. They havenot verified this approach by practically implementing it for a realistic system. Rathgeb et al. [12]presented a secure processing technique by implementing generic modules as secure blockswhere interconnections among them can be reconfigured using run-time partial reconfiguration.Such approaches cannot be deployed where system switches rapidly between secure and insecuremode, as partial reconfiguration results in significant performance overhead.

Computers 2018, 7, 31 3 of 22

Security systems based on cryptography such as encryption devices have been used extensivelyfor the implementation of strong isolation. Thekkath et al. [13] provide a security mechanism atprocess level by isolating trusted process’s data and instruction code from untrusted and vulnerableprocesses. To prevent illegal extraction of information, dedicated tags are assigned to each processand cryptographic algorithms are used to encrypt the data and code. Similarity, Suh et al. [14] havepresented an encryption-based security technique by isolating the processor resources from externalmemory unit and other untrusted I/O devices. They have implemented a security kernel within theuntrusted operating system and secure context manager.

Hardware-assisted solutions based on DIFT techniques [5,15,16] are also quite capable to handlesoftware related attacks. These solutions mainly rely on flagging data arriving from insecure inputperipherals and then following the trail as the system processes the data. These kinds of solutionsare based on making modifications in the source code of the application, system memory and thearchitecture. Such requirements have made these solutions unlikely choice to secure embeddedsystems. Furthermore, techniques achieving security through hardware virtualization [17–19] dependon dedicated software support more commonly known as “hypervisor” to execute tasks requiringsecurity. Moreover, latest surveys [20,21] have presented that even these types of refined techniquesare vulnerable to security breaches and constrained in terms of code and data memory. Therefore, it isvery essential to come up with a new security solution that is capable of processing critical informationin a secured manner with no or little help from the software running on the system.

Fiorin et al. [22] have presented a data protection module, for multi-core systems, to detect illegalaccesses to specific blocks of the memory unit by verifying access rules and using address lookup tables.Similarly, another solution has been presented by Lukovic et al. [23] to detect buffer overflow-basedattacks by including a dedicated security module within the network interface attached to each of theprocessing cores. Following the same pattern, Porquet et al. [24] have presented a security mechanismfor network on chip architectures with a shared memory unit. Their basic idea is to segregate severalapplications based on the required trust level and process information securely using both dedicatedsoftware and hardware modules. More recently, Wassel et al. [25] proposed a low-latency andnon-interfering security mechanism, called SurfNoC, to protect system against side-channel attackssuch as timing analysis. This is achieved by maximizing temporal and spatial isolation of differentcommunication flows in the system through network scheduling in a wave manner. Wassel et al.extended this work [26] and proposed a partitioning mechanism through packet scheduling to passpackets between different domains with lower latency. Grammatikakis et al. [27] have proposeda self-contained NoC firewall within the network interface by validating the memory requests againstthe predefined set of rules. Hu et al. [28] have also presented a similar solution by incorporatingaccess rules verification and authentication mechanism for certain regions of the memory units. In ourprevious work [29], we have presented an identity and address verification-based security module formulti-core systems in order to detect security attacks and stop illegal accesses to the memory. Kornarosand Leivadaros [6] have proposed a system level framework for updating firmware in interconnectedembedded devices by classifying data types to transmitted, processed and stored data and tailoringa solution for each type of data by encryption or isolation.

The reference monitoring and DIFT-based security solutions either require changes in mainhardware resources (such as processor and memory architecture) or partial software support whichare not ideal for embedded systems especially for battery operated devices where power consumptionand available resources are one of the main design concerns. In one of our previous work [30],it is shown that security mechanisms that rely on cryptography are not realistic and scalable formulti-core based SoC devices due to their limited resources, high performance and power consumptionoverheads. In contrast, our proposed security solution does not require modifications in the processoror memory architecture, operates independently without any help from software side and bearsminimal performance overhead for NoC-based systems [31]. Security mechanisms for NoC-basedmulti-core systems (such as [22,24,27–29] have only been designed specifically for shared-memory

Computers 2018, 7, 31 4 of 22

architectures and provide protection to memory units only whereas our proposed solution can be usedto isolate any processing block ranging from memory controllers to other I/O peripherals. Similarly,the hardware-based security solutions, as discussed by Yan et al. [11], are based on disabling someunreliable processing cores when sensitive applications are running. For real-time embedded systems,such approaches are clearly not feasible. On the other hand, our proposed security mechanism worksalongside unreliable processing cores without disabling them. Furthermore, these existing solutionstarget specific security attacks whereas our hardware-assisted secure mechanism can handle dataintegrity, confidentiality and denial of service attacks as well.

3. Design Approach and Architecture

Software attacks have become increasingly widespread and a buffer overflow is one of themajor causes of security outbreaks. Although buffer overflows may arise unintentionally due tocommon programming errors, it is one of the most common types of software attacks targeting dataintegrity. In the case of buffer overflow-based attacks, the additional data may point to the maliciouscode designed to execute new instructions and initiating specific tasks on the system under attack.This research work is based on the hypothesis that the buffer overflow-based attacks can be addressedthrough dedicated security modules. The main aim of this research is to explore and develop a genericand scalable security hardware module for multi-core systems that can detect buffer overflow-basedattacks, identify the compromised processing cores under attack and stop its propagation in order tokeep the system secure and infection free.

This section explains the design steps of our proposed security solution along with its basicarchitecture layout considering a conventional embedded system based on a FPGA device. Normallyan embedded system consists of a microprocessor, Input/output devices, memory unit and otherperipherals which are essentially required to run the given task. The main objective of designinga hardware-assisted security module is to restrain the unreliable hardware modules from accessingtrusted processing units when critical tasks are under execution.

In order to execute the critical tasks in a protective environment, a secure zone has been designedwhereas its functionality has been tested for a FPGA-based embedded system and a NoC-basedmulti-core architecture as well. The basic concept is to adjust processing blocks requiring security inan isolated environment such that they are only accessible via a security checking module. The blocklevel implementation of a conventional embedded system accommodating a secure zone is presentedin Figure 1. The processing blocks executing or storing critical data are to be implemented separatelyinside the secure zone from untrusted processing blocks (such as peripherals and controllers).

Input peripheral

DDR Memory

Microblazeprocessor

Outputperipheral

Processor Local bus (PLB)

output controller

input controller

secure controller

secure zone

Block RAM

Input

Unsecure region

Figure 1. Prototype of embedded system enabled with secure zone.

Computers 2018, 7, 31 5 of 22

As shown in Figure 1, the secure zone communicates directly with the Processor Local Bus (PLB)and the secure controller only. The permission is granted by the secure zone after confirming the accesscontrol rules (e.g., write access only, read access only, and/or both write and read access). These accessrules are initialized at the time of configuration. The processing blocks implemented inside the securezone are only accessible by the untrusted processing blocks via specialized interface as indicated inFigure 2.

Processing block

Secure Zone

secure interface

sec_controller_in

systembus_plb_inplb data_out

controller_out

processing block

processing block

Figure 2. Block level implementation of secure zone.

The control input, sec_controller_in, to the secure zone is produced by the secure controllerwhich operates independently and has no linkage with the other processing blocks or software runningon the system. On the contrary, the second control input, systembus_plb_in is produced by PLBwhich is managed directly by the task/software under execution. In order to access the processingblocks within the secure zone, the secure interface must receive the same access control signals fromboth of the control inputs.

The internal working of the interfacing module is presented in Figure 3. This module is responsiblefor verifying the incoming data before authorizing access to the processing blocks within the securezone. For instance, in order to update the storage module within the secure zone, the software willgenerate signal requesting write access whereas the user is also required to generate the same signalusing secure controller. In the event of any security attack, the secure interface will fail to verify the accesscontrol signals and in that case an alert signal will be generated, preventing access to the secure zoneprocessing blocks. This has been further elaborated in the next section.

Input port Output Port

data_in data_out

Secure mode

Access rules check

Port_inPort_out

To secure zone blocks

Controlleroutput

From secure zone blocks

PLB mode

Figure 3. Internal working of secure interface.

Computers 2018, 7, 31 6 of 22

4. FPGA-Based Embedded System

As a proof of concept, the proposed security mechanism is implemented and tested byincorporating it in a signature-based authentication system [31] as shown in Figure 4. Such systemshave many applications where security is the primary concern. For instance, banking institutes can useit to validate signatures automatically on the checks. The prototype design has been implemented ona specialized FPGA device, Spartan-3A DSP Video Starter Kit (VSK), which has dedicated hardwareresources to process video data in real-time. This VSK comes with a reference design which iscustomized to implement an authentication system according to our proposed solution with the helpof Xilinx Embedded Development Kit (EDK) [32].

Video frame buffer

Camera controller MPMC

(Multiport memory

controller)

Microblazeprocessor

DDR2RAM

Video frame buffer

DVI out

Processor Local Bus (PLB)

Display controller

Video controller

Secure-zone controller

Secure Zone

Block RAM

Mode input

Camera input

Figure 4. Prototype of image processing-based authentication system enabled with secure zone.

The secure zone has been implemented for the given authentication system as shown in Figure 5.Here, the image library block is used to store images of valid signatures whereas the signature matchingblock is the primary module that has been used to process signature images arriving from the cameracontroller and perform verification based on template matching technique. Considering the sensitivityof the data, the image library and signature matching block have been placed inside the secure zone toperform authentication procedure in a protected environment.

The processing blocks (signature matching and image library), handling sensitive images, are placedseparately from untrusted blocks (e.g., DDR2_RAM, PLB, display controller). In general, the signaturematching block performs authentication whereas the image library is essentially a database ofreference images. The blocks placed within this secure zone are accessible to the untrusted blocksafter performing access rules checks. In order to access image library or signature matching block,the secure-zone controller and the PLB must pass the same access rights signals. After successfulverification by the secure interface, the camera_controller_in and systembus_plb_in signals are transferredto the signature matching block implemented inside the secure zone. For instance, to update images inthe image library, software will generate write access request through systembus_plb_in and at the sametime it is also required to have a write access request from secure_controller_in input signal. In the eventof any security attack, the access control signals will mismatch and an alert signal will be generated,stopping access to the secure zone. This is achieved through the secure interface as shown in Figure 3.The permission is granted by the secure zone after confirming the access control rules (e.g., write accessonly, read access only, and/or both write and read access) from both of the control inputs are identical.As the attacker does not have access to secure interface, even trying to send an access request justthrough PLB will be denied as secure module access control signals indicating no access or a differentauthorized access to the user such as read access instead of write access.

Computers 2018, 7, 31 7 of 22

Image library

Img controller

signature matching

Secure Zone

secure interface

secure_controller_insystembus_plb_in

camera_controller_in

plb_out

video_controller_out

Preprocessing

Edge detection

Edge thinning

Template Matching

Input portOutput

Port

From_untrusted blocks

To_untrusted blocks

Denial of Service check block

Input Portoutput

Port

To signature matching

Controller output

From signature matching

Secure mode

PLB mode

Access rules check

Figure 5. Secure zone with secure interface.

The flowchart, as shown in Figure 6, is being implemented inside the secure zone to performsignature matching by verifying the incoming image frames from the camera sensor. The user signson the specific sheet, which is placed in front of the camera at the fixed distance. After verifying theaccess rights through secure interface module, the system performs signature matching by following thesteps as presented in this figure.

camera sensor

Get image frame

Preprocessing

Edge detection

Edge thinning

Image library

Template matching

get template

Template available

YES

Signature match

Signature verified

yes

No

Signature not verified

NO

Signature sheet

Img Controller

Image p

rocessin

g blo

ck

Figure 6. Image Processing-based authentication (signature matching) system flowchart.

4.1. Security Effectiveness

The vulnerabilities in the software, executing on an embedded system, can be exploited atrun-time by different security attacks. Usually, the application data is susceptible to attacks whensystem resources are accessible through untrusted communication medium. Typical examples arebuffer overflow attacks [33], in which a part of the code is modified and the return address is replacedwhich in turn transfer the control to another region pointing to a harmful piece of code. As per theSourcefire report [34], 14% of all vulnerabilities and 35% of critical ones in the last two decades arebuffer overflow attacks. For this purpose, the effectiveness of the proposed security mechanism isevaluated by analyzing the system behavior under various buffer-overflow-based attacks.

Computers 2018, 7, 31 8 of 22

To modify the image library contents, we instantiated a buffer overflow attack as shown inFigure 7. During application execution, the control register is overwritten at the stack frame andadditional data is inserted for illegal branch instruction, which in turn overwrites the return pointer ofthe given function call. Therefore, the control flow is switched to the location where illegal image framehas already been placed by the attacker. Using the standard interface, without any protection mechanism,the image library is modified illegally through buffer-overflow-based attack. On the contrary, when thesecure zone is enabled with secure interface, the attack alert signal is generated successfully.

Program instructions

Program instructions

Stack directio

n

Invalid image frame

(a) (b)

Malicious code

“lib_update” call frame

Control register and return

address register are overwritten

ret addr reg

“lib_update” call frame

control reg

ret addr reg

control reg

Figure 7. Code Execution (a) Without buffer overflow attack; (b) With buffer overflow attack.

Beside handling buffer-overflow-based attacks that target integrity of the data, our securitymechanism is capable of detecting data confidentiality and denial of service attacks as well.For instance, if the system software is get infected with malware and the attacker tries to extractthe data, the secure interface will terminate the data flow as the sensitive information from the securezone can only be extracted by providing the identical access rights through the secure-zone controllerand PLB modules as explained earlier in Section 3. Similarly, denial of service attacks are avoided bymonitoring the incoming requests and granting access to the secure zone modules when required suchas updating the image library module implemented within the secure zone. Although the modulesinside the secure zone can only be activated through secure-zone controller, the attacker can generaterepeated requests deliberately to slow down the system performance that ultimately results in batterydrainage. In our security mechanism, we have monitored the access rights, being generated by thePLB data_in, through a specialized counter for a specific period of time. For instance, if the attackergenerates unnecessary accesses to the secure region modules continuously then such requests will befiltered out. The control flow of the system with and without secure zone implementation is presentedin Figure 8. Without secure zone, the control flow as shown in Figure 8a will be followed and theimage library will be updated by the software by transferring the write_access control signal. In theevent of a buffer overflow, as mentioned earlier, the attacker can modify the access rights within thesystembus_plb_in input signal and acquire unauthorized access to the image library block.

Consequently, the access signal is modified illegally through a buffer-overflow-based attack andreturn address of the function is updated leading to the malicious code. In this way, the images withinthe image library block are updated illegally without any authorization. Alternatively, the controlflow as presented in Figure 8b will be followed when secure zone-based solution is in use. In ourproposed security mechanism, the access control signal extracted from the systembus_plb_in inputwill be compared with the control signal derived from the secure_controller_in input. In the case of

Computers 2018, 7, 31 9 of 22

buffer-overflow-based attack, the access control signal generated by the software is overwritten but thesecure-zone controller output remain unaffected as it has direct link with the software. When controlflow comes to the “access rule check”, stage, a mismatch is successfully discovered and the secureinterface of the secure zone alerts the system.

Get Mode

Get access mode

Access rules check

Update image library

write_access

signature matching

read_access

Generate attack alert

mismatch

Get access mode

Update image library

write_access

Signature matching

read_access

(a) (b)

Check denial of service

secure_controller_in systembus_plb_insystembus_plb_in

Figure 8. System control flow diagram. (a) Normal control flow without secure interface; (b) Controlflow with secure interface.

4.2. Area and Power Consumption Overhead

In order to evaluate overhead of the proposed solution, the area and power consumption hasbeen measured in two steps. In the first step, the authentication system is implemented without securezone. In the second step, after testing and functional verification of the basic system, the secure zone(secure-zone controller and interface) is implemented. Xilinx Synthesis Tool (XST) is used to generatethe area and power consumption overhead results at both steps.

For fair comparison, the area overhead results are obtained by disabling the Block RAM (BRAM)and DSP blocks for the secure interface and secure-zone controller implementation. XST is configuredto use flip flops and lookup tables only during synthesis phase. In our case, the area overhead iscalculated by measuring number of slices utilized when the modified system, equipped with theproposed security mechanism, is synthesized for the given FPGA device. As presented in Figure 9a,our proposed solution has incurred an area overhead of 1.23% only.

Today’s FPGA-based embedded systems have increased substantially in computational powerand capacity which has resulted in significant rise in power consumption. Efficient management ofsystem resources with minimal impact on the power consumption has now become one of the primedesign concerns. For this purpose, we have measured the increase in dynamic power consumption ofthe modified system when enabled with the secure controller and interface as compared to the standardsystem. The XPE (Xilinx Power Estimator) tool [35] has been to measure the power consumptionoverhead. As indicated in Figure 9b, our proposed security mechanism has resulted a rise of 2 mW inpower consumption as compared to standard design with no dedicated security modules.

Computers 2018, 7, 31 10 of 22

2325 2354

0

800

1600

2400

Without securezone

With securezone

nu

mb

er o

f sl

ices

(a)

21 23

0

5

10

15

20

25

Without securezone

With securezone

Pow

er C

on

sum

pti

on

(m

W)

(b)

Figure 9. (a) FPGA slice utilization with and without secure zone; (b) Power Consumption with andwithout secure zone.

4.3. Performance Evaluation

The performance of the given system always gets affected whenever a dedicated hardware moduleis included serially within the processing blocks. In order to assess how the proposed security solutionimpacts the execution time of the system, the performance overhead is calculated for two differenttest cases.

In the first case, we have evaluated the delay introduced by our proposed solution in theimage processing-based authentication system. The secure controller and secure interface have beenimplemented in a pipelined architecture to minimize their impact on the system execution time.To measure the delay, at first we have synthesized our design with and then without securitymechanism. The timing information is produced with the help of XST. The time taken by the securezone to update image library, utilizing secure interface, is defined as the secure execution time (SET) andit is measured by using Equation (1).

SET(Cycles) = NET + [pd2 − pd1

pd1× NET] (1)

where pd1 and pd2 are the worst-case times required by the standard interface and the secure interfacerespectively to generate output signals. Normal execution time (NET) is defined as the total numberof clock cycles required to update the image library using standard interface without any securitymechanism. The performance overhead in terms of total number of cycles required to update imagelibrary through secure interface is presented in Figure 10. It is observed that the delay introduced by oursecurity mechanism remains constant and it is directly proportional to the number of images beingupdated. For instance, the secure interface induces 21.64% increase in the total number of cycles toupdate 25 images as compared to the standard interface.

7.2 14.4

28.8

57.6

115.2

180

9.2 18.4

36.8

73.5

147.0

229.7

0

50

100

150

200

250

1 2 4 8 16 25

Tim

e t

ake

n (

Cyc

les)

x 10,000

Number of images updated

Without secure-zone With secure-zone

Figure 10. Performance overhead: Number of cycles required to update image library.

Computers 2018, 7, 31 11 of 22

In the second case, we have implemented the proposed solution for a NoC-based multi-coresystems which has been discussed in detail in the next section.

5. NoC-Based Communication Architecture

NoC-based communication architectures in multi-core systems have become apparent asa promising substitute to shared bus architecture and have been investigated extensively in termsof system performance, network topologies, application-specific implementations and routingmechanisms. Security aware architectures for such multi-core systems have come forth lately inthe research literature as the network connectivity and complex embedded cores have also madethese systems more susceptible to the software-based attacks. The protection of sensitive informationhas become the main design goal, especially in shared memory-based architectures where, withoutrequired protection, the application data can be modified by the compromised processing cores throughillegitimate access to the security critical areas of the memory.

In order to determine the scalability of the proposed solution, we have implemented the secureinterface for NoC-based communication architecture. A standard five-port router is designed forthis purpose and the secure interface unit is embedded into the router’s local input port as shown inFigure 11. It is assumed that each router has secure_mode input signals attached to its local channel’sinput port. The processing core requiring higher level security can be configured to such routerswhereas normal cores can be attached to standard routers. In this way, the trusted processing cores canbe isolated from other untrusted cores.

Data_in

req_in

ack_in

Input FIFO

Alert

Control Logic

Secure interface

Co

ntr

ol s

ign

als

Enab

leRouting

Logic

Local Channel Input Port

id

route

18 16

12

4

Enable

Data_out

swt_ack

4swt_req

2access_rights

16

2

secu

re_m

od

e

South channel

Local Channel North channel

Wes

t ch

ann

el East

ch

ann

el

Inp

ut

po

rto

utp

ut

po

rt

Input port output port

Inp

ut

po

rto

utp

ut

po

rt

Input port output portInput port output port

Crossbar switch

Data_in

ack_in

req_in

Data_in

ack_in

req_in

Dat

a_o

ut

ack_

ou

t

req

_ou

t

Data_o

ut

ack_ou

t

req_o

ut

Dat

a_o

ut

ack_

ou

t

req

_ou

tData_out

ack_out

req_out

Dat

a_in

req

_in

ack_

in

Dat

a_in

req

_in

ack_

in

Data_out

ack_out

req_out

Data_in

req_in

ack_in

Figure 11. Secure interface embedded inside local input port of the router.

The processing core is attached with the local channel of the router and other four channels areconnected with its adjoining routers. As the processing core communicates with other cores throughthe local channel, it is the perfect place to configure secure interface in order to verify the access rightsand detect any compromised packet as soon as it is injected. The packet being injected comprisesfour flits, which will be stored at the buffer of the input port. XY routing algorithm and a cut-throughflow control mechanism has been implemented in order to route the packet flits. The header flit storesthe address of the source and destination cores along with the access control rules (e.g., write accessonly, read access only, and/or both write and read access). The secure interface module receivesaccess_rights input from the header flit of the packet and verifies it with the secure_mode input asdiscussed earlier in Section 3). If both inputs are correctly verified then an enable signal is generatedand in case of any mismatch an alert signal will be generated by secure interface module for further

Computers 2018, 7, 31 12 of 22

action. After getting the alert signal, based on the security policy, the manager core can either disablethe compromised core or reset the system to its default configuration.

For hardware resources evaluation, the router is designed and developed for a Xilinx Virtex-7FPGA where target technology is 28 nm. To evaluate the impact of the secure interface on the router,the increase in area and power consumption is measured in two steps. At first, a standard routerwithout secure interface is synthesized and configured for 16 and 32 bit channel width, and networksizes of 16, 64 and 256 nodes in mesh topology. In the second step, after implementing and verifyingthe functionality of a standard router, the secure interface module is embedded in the local input portof the router. The modified router is again synthesized for the same channel widths and networkconfigurations, using XST, as mentioned earlier.

5.1. Area and Power Consumption Overhead

Area overhead is calculated by observing increase in the slice utilization when the router isenabled with the secure interface and synthesized and compared to a standard router without thesecurity feature. The slice utilization of various configurations of the router has been presented inFigure 12. In our case, the secure interface has presented an area overhead of four slices only whenconfigured for different number of nodes.

Similarly, the power consumption overhead has also been measured. In our case, the main focusis in the power consumed by the when the secure interface enabled router is in operation. The increasein power consumption has been measured by calculating the dynamic power consumed by modifiedrouter and compared to the same router architecture without secure interface. This overhead is evaluatedusing the XPE tool [35]. As illustrated in Figure 13, the power consumption overhead remains constantfor different network sizes. This is expected as the secure interface module working does not depend onthe network size.

724

1252

728

1256

0

800

1600

16-bit router 32-bit-router

nu

mb

er o

f sl

ices

Without secure zone

With secure zone

(a)

794

1356

797

1359

0

800

1600


nu

mb

er o

f sl

ices

Without secure zone

With secure zone

(b)

796

1366

800

1370

0

800

1600


nu

mb

er o

f sl

ices

Without secure zone

With secure zone

(c)

Figure 12. Area overhead of security interface on the routers for different network sizes. (a) 4 × 4 meshnetwork; (b) 8 × 8 mesh network; (c) 16 × 16 mesh network.

16

28

17

31

0

10

20

30


Pow

er C

on

sum

pti

on

(m

W)

Without secure zoneWith secure zone

(a)

17

28

18

31

0

10

20

30


Pow

er C

on

sum

pti

on

(m

W)


(b)

17

28

18

31

0

10

20

30


Pow

er C

on

sum

pti

on

(m

W)


(c)

Figure 13. Power overhead of security interface on the routers for different network sizes; (a) 4 × 4 meshnetwork; (b) 8 × 8 mesh network; (c) 16 × 16 mesh network.

Computers 2018, 7, 31 13 of 22

5.2. Performance and Energy Evaluation

Beside measuring the area and power consumption overhead at router level, hardware levelchanges in the router architecture have certain impacts on the overall system performance, therefore itis very critical to evaluate the performance and energy consumption overhead at the network level aswell. Firstly, to evaluate impact of the secure interface on the performance of each router, the propagationdelay of the modified router is calculated by using Equation (2).

Delay(Cycles) =pd2 − pd1

pd1× packet_pd_cycles (2)

where pd1 and pd2 are the respective propagation delays required by a router to generate output whenconfigured without and with a secure interface module. The packet_pd_cycles is the propagation delay,measured in clock cycles, incurred by the standard router to transmit one complete packet from oneinput port to the output port. To get performance evaluation results, a cycle accurate interconnectionnetwork simulator [36] is used. The secure interface is configured inside each router using the simulatorfor a mesh network of 4 × 4, 8 × 8 and 16 × 16 nodes with XY routing scheme and virtual cut-throughpacket switching technique with a packet consisting of four flits. The simulator is configured to usefirst 2000 cycles as warm-up period and run the experiment up to 200,000 cycles. The simulator is thenset up to inject packets at different rates until the network reaches to a saturation point where injectionof new packets is not possible in the system unless existing packets are processed. During warm-upperiod the simulation results are ignored and collected for active cycles only.

Total packet latency has been used to measure performance overhead which is the number of clockcycles required for all the injected packets to reach their corresponding destinations. At first, the resultsare collected by measuring the packet latency for the standard router architecture. After getting thebase results, the router configurations are adjusted to incorporate the propagation delay presented bysecure interface block.

Network energy consumption overhead is measured by configuring the Orion power model [37]within the simulation framework. In the first step, extra energy required by the secure interface modulewithin the router is measured with the help of XPE tool. In the second step, to get the total networkenergy consumption overhead, the simulator framework is modified to add the secure interface moduleenergy overhead for each router.

The network energy consumption and packet latency are calculated by first simulating thenetwork in various network configuration settings with and without the secure interface. For thisexperimental evaluation, we have used both various traffic patterns as explained below.

Three different synthetic traffic patterns are used for the simulations: uniform, transpose andhotspot traffic patterns.

Uniform Traffic Pattern: Under this traffic pattern, packets are uniformly distributed to all theprocessing cores, which result in high inter-node communication density. The simulation results arepresented in Figures 14 and 15. The proposed modifications in the router architecture have negligibleimpact on the packet latency, where maximum percentage increase of 9.92% is reported for 16 nodenetwork as compared to 3.38% increase for the network of 256 nodes. As can be seen in Figure 15 thatthe proposed security solution has not contributed significantly in the network energy consumption.For instance, as presented in Figure 20b, the secure interface enabled routers have increased networkenergy consumption by 0.73% for 16 nodes network in the worst case scenario.

Transpose Traffic Pattern: In transpose, as a non-uniform traffic pattern, packets are generatedbased on a fixed pairing of sources and destinations (a processing core with value an−1, an−2, ..., a1, a0

communicates with the processing core an/2−1, ..., a0, an−1, ..., an/2). Under such non-uniform trafficpatterns, communication between remote cores occurs more frequently and source to destination hopcount varies in a non-uniform manner which results in earlier network saturation at a lower packet

Computers 2018, 7, 31 14 of 22

injection rate when compared to uniform traffic pattern. The detailed simulation results are presentedin Figures 16 and 17.

Under transpose traffic pattern our proposed solution has a lower impact on the packet latencyand energy consumption as compared to uniform traffic pattern, as will be discussed later in the paper.

Hotspot Traffic Pattern: As realistic applications might generate communication traffic where oneor more processing cores accept a greater number of packets as oppose to other processing corescausing hotspot regions. This behavior can be best described synthetically under hotspot traffic patternwhere different processing cores are designated as hotspot cores with higher packet injection rate.To evaluate the behavior of our proposed security mechanism under hotspot traffic pattern, a variousnumber of cores are labeled as hotspots and configured to accept double packets as compared to theremaining cores. Hotspots are placed and simulated in three different configurations: one hotspotin the center, three hotspots placed in close proximity and four distributed hotspots with utmosthop-count. For example, the detailed simulation results have been illustrated in Figures 18 and 19 forthe four hotspots that are configured in distributed manner and being placed from each other withminimum hop-count of 3, 6 and 14 nodes for the network of 16, 64 and 256 nodes respectively.

6.2

17

.1

46

.6

78

.5

12

2.6

48

7.6

67

2.9

11

33

.9

20

35

.0

27

37

.4

6.8

18

.7

49

.7

82

.2

12

6.6

49

1.9

67

7.2

11

38

.3

20

39

.4

27

41

.7

0

500

1000

1500

2000

2500

3000

0.01 0.025 0.05 0.06 0.065 0.07 0.0705 0.071 0.0715 0.072

Tota

l Pac

ket

Late

ncy

(C

ycle

s)

x 100,000

Injection Rate(Packet/Node/Cycle)

Without secure zone

with secure zone

(a)

19

.2

40

.0

63

.2

90

.0

12

5.0

14

8.7

18

3.4

25

5.0

71

8.0

16

02

.0

25

71

.7

20

.5

42

.5

66

.9

94

.9

13

1.1

15

5.5

19

0.7

26

3.0

72

6.6

16

10

.8

25

80

.5

0

500

1000

1500

2000

2500

3000

0.005 0.01 0.015 0.02 0.025 0.0275 0.03 0.0325 0.035 0.0357 0.0358

Tota

l Pac

ket

Late

ncy

(C

ycle

s)

x 100,000


Without secure zone

with secure zone

(b)

26

.0

65

.7

13

4.9

25

7.4

29

1.9

38

8.6

51

8.0

82

8.8

12

38

.1

19

23

.3

27

.0

68

.2

13

9.8

26

6.3

30

1.7

40

0.9

53

2.8

84

6.1

12

56

.0

19

41

.4

0

500

1000

1500

2000

2500

0.001 0.0025 0.005 0.009 0.01 0.0125 0.015 0.0175 0.0182 0.0184

Tota

l Pac

ket

Late

ncy

(C

ycle

s)

x 100,000


Without secure zone

with secure zone

(c)

Figure 14. Average packet latency with uniform traffic pattern. (a) 4 × 4 mesh network; (b) 8 × 8 meshnetwork; (c) 16 × 16 node network.

Computers 2018, 7, 31 15 of 22

5.0

8x10

-5

1.2

8x10-4

2.5

5x10-4

3.0

7x10-4

3.3

3x10-4

3.5

8x10-4

3.6

1x10-4

3.6

3x10-4

3.6

4x10-4

3.6

5x10-4

5.1

2x10-5

1.2

9x10-4

2.5

7x10-4

3.0

9x10-4

3.3

5x10-4

3.6

1x10-4

3.6

4x10-4

3.6

6x10-4

3.6

7x10-4

3.6

8x10-4

0

1.0x10-4

2.0x10-4

3.0x10-4

4.0x10-4

0.01 0.025 0.05 0.06 0.065 0.07 0.0705 0.071 0.0715 0.072

Tota

l En

ergy

Co

nsu

mti

on

(Jo

ule

s)

Injection Rate(Message/Node/Cycle)

Without secure zone

with secure zone

(a)

1.7

8x10 -4

3.5

6x10-4

5.3

4x10-4

7.1

1x10-4

8.9

0x10-4

9.7

9x10-4

1.0

7x10-3

1.1

6x10-3

1.2

5x10-3

1.2

7x10-3

1.2

7x10-3

1.7

9x10-4

3.5

8x10-4

5.3

6x10-4

7.1

4x10-4

8.9

4x10-4

9.8

3x10-4

1.0

7x10-3

1.1

6x10-3

1.2

5x10-3

1.2

7x10-3

1.2

7x10-3

0

2.0x10-4

4.0x10-4

6.0x10-4

8.0x10-4

1.0x10-3

1.2x10-3

1.4x10-3

0.005 0.01 0.015 0.02 0.025 0.0275 0.03 0.0325 0.035 0.0357 0.0358

Tota

l En

ergy

Co

nsu

mti

on

(Jo

ule

s)


Without secure zone

with secure zone

(b)

2.6

4x10-4

6.5

9x10-4

1.3

2x10-3

2.3

7x10-3

2.6

3x10-3

3.2

9x10-3

3.9

5x10-3

4.6

1E-0

3

4.7

9x10-3

4.8

5x10-3

2.6

4x10-4

6.6

1x10-4

1.3

2x10-3

2.3

7x10-3

2.6

4x10-3

3.3

0x10-3

3.9

6x10-3

4.6

2E-0

3

4.8

0x10-3

4.8

6x10-3

0

1.0x10-3

2.0x10-3

3.0x10-3

4.0x10-3

5.0x10-3

6.0x10-3

0.001 0.0025 0.005 0.009 0.01 0.0125 0.015 0.0175 0.0182 0.0184

Tota

l En

ergy

Co

nsu

mti

on

(Jo

ule

s)


Without securezone

with secure zone

(c)

Figure 15. Network energy consumption with uniform traffic pattern. (a) 4 × 4 mesh network;(b) 8 × 8 mesh network; (c) 16 × 16 mesh network.

11

.8

23

.2

50

.6

23

0.4

26

7.2

32

1.3

61

6.0

11

72

.7

16

78

.9

22

32

.2

12

.7

24

.6

52

.2

23

2.1

26

9.0

32

3.0

61

7.7

11

74

.5

16

80

.6

22

34

.0

0

500

1000

1500

2000

2500

0.02 0.03 0.035 0.037 0.0371 0.0372 0.0375 0.038 0.0385 0.039

Tota

l Pac

ket

Late

ncy

(C

ycle

s)

x 100,000


Without secure zone

with secure zone

(a)

Figure 16. Cont.

Computers 2018, 7, 31 16 of 22

9.1

18

.5

28

.5

40

.0

55

.7

91

.0

13

2.9

21

9.9

41

8.5

88

6.8

13

94

.3

9.6

19

.6

30

.1

42

.2

58

.4

94

.2

13

6.2

22

3.2

42

1.9

89

0.3

13

97

.8

0

200

400

600

800

1000

1200

1400

0.0025 0.005 0.0075 0.01 0.0125 0.015 0.0155 0.0157 0.016 0.0162 0.0165

Tota

l Pac

ket

Late

ncy

(C

ycle

s)

x 100,000


Without secure zone

with secure zone

(b)

25

.7

38

.5

51

.7

65

.2

13

8.4

17

5.5

25

3.6

41

7.3

86

5.0

13

34

.5

26

.6

39

.9

53

.5

67

.5

14

3.0

18

1.0

26

0.0

42

4.1

87

1.9

13

41

.5

0

200

400

600

800

1000

1200

1400

0.001 0.0015 0.002 0.0025 0.005 0.006 0.007 0.0073 0.0075 0.00755

Tota

l Pac

ket

Late

ncy

(C

ycle

s)

x 100,000


Without secure zone

with secure zone

(c)

Figure 16. Average packet latency with transpose traffic pattern. (a) 4 × 4 mesh network; (b) 8 × 8 meshnetwork; (c) 16 × 16 node network.

9.0

1x10-5

1.3

5x10-4

1.5

8x10-4

1.6

7x10-4

1.6

7x10-4

1.6

7x10-4

1.6

8x10-4

1.6

8x10-4

1.6

9x10-4

1.6

9x10-4

9.0

6x10 -5

1.3

6x10-4

1.5

9x10-4

1.6

8x10-4

1.6

8x10-4

1.6

8x10-4

1.6

9x10-4

1.6

9x10-4

1.7

0x10-4

1.7

1x10-4

7.0x10-5

9.0x10-5

1.1x10-4

1.3x10-4

1.5x10-4

1.7x10-4

1.9x10-4

0.02 0.03 0.035 0.037 0.0371 0.0372 0.0375 0.038 0.0385 0.039

Tota

l En

ergy

Co

nsu

mti

on

(Jo

ule

s)


Without secure zone with secure zone

(a)

8.6

1x10-5

1.7

1x10-4

2.5

6x10-4

3.4

2x10-4

4.2

8x10-4

5.1

2x10-4

5.3

1x10-4

5.3

7x10-4

5.4

5x10-4

5.4

8x10-4

5.5

5x10-4

8.6

4x10-5

1.7

2x10-4

2.5

7x10-4

3.4

4x10-4

4.3

0x10-4

5.1

4x10-4

5.3

3x10-4

5.4

0x10-4

5.4

7x10-4

5.5

0x10-4

5.5

7x10-4

0

1.0x10-4

2.0x10-4

3.0x10-4

4.0x10-4

5.0x10-4

6.0x10-4

0.0025 0.005 0.0075 0.01 0.0125 0.015 0.0155 0.0157 0.016 0.0162 0.0165

Tota

l En

ergy

Co

nsu

mti

on

( J

ou

les)


Without secure zone

with secure zone

(b)

Figure 17. Cont.

Computers 2018, 7, 31 17 of 22

2.6

1x10-4

3.8

9x10-4

5.1

9x10-4

6.5

0x10-4

1.3

0x10-3

1.5

6x10-3

1.8

2x10-3

1.9

0x10-3

1.9

4x10-3

1.9

5x10-3

2.6

1x10-4

3.9

0x10-4

5.2

0x10-4

6.5

1x10-4

1.3

0x10-3

1.5

7x10-3

1.8

2x10-3

1.9

0x10-3

1.9

4x10-3

1.9

5x10-3

0

5.0x10-4

1.0x10-3

1.5x10-3

2.0x10-3

2.5x10-3

0.001 0.0015 0.002 0.0025 0.005 0.006 0.007 0.0073 0.0075 0.00755

Tota

l En

ergy

Co

nsu

mti

on

(Jo

ule

s)


Without securezone

with secure zone

(c)

Figure 17. Network energy consumption with transpose traffic pattern. (a) 4 × 4 mesh network;(b) 8 × 8 mesh network; (c) 16 × 16 mesh network.

6.9

15

.6

29

.8

86

.9

14

5.1

67

7.5

74

0.8

12

60

.2

14

94

.5

19

86

.8

7.5

16

.8

31

.7

89

.2

14

7.5

67

9.9

74

3.3

12

62

.7

14

96

.9

19

89

.3

0

200

400

600

800

1000

1200

1400

1600

1800

2000

0.01 0.02 0.03 0.038 0.039 0.04 0.0402 0.0404 0.0406 0.041

Tota

l Pac

ket

Late

ncy

(C

ycle

s)

x 100,000


Without secure zone

with secure zone

(a)

9.2

24

.9

52

.1

10

3.3

27

8.0

56

2.9

94

3.7

13

87

.3

19

47

.1

9.7

26

.2

54

.1

10

5.5

28

0.3

56

5.2

94

6.1

13

89

.6

19

49

.4

0

200

400

600

800

1000

1200

1400

1600

1800

2000

0.002 0.005 0.008 0.009 0.0094 0.0096 0.0097 0.00975 0.0098

Tota

l Pac

ket

Late

ncy

(C

ycle

s)

x 100,000


Without secure zone

with secure zone

(b)

33

.9

85

.9

14

9.1

23

3.8

35

1.4

73

8.7

98

2.7

11

87

.2

21

68

.5

34

.9

87

.8

15

1.2

23

6.0

35

3.6

74

0.9

98

4.9

11

89

.5

21

70

.7

0

200

400

600

800

1000

1200

1400

1600

1800

2000

2200

0.001 0.002 0.0022 0.00225 0.00227 0.0023 0.00231 0.00232 0.00235

Tota

l Pac

ket

Late

ncy

(C

ycle

s)

x 100,000


Without secure zone

with secure zone

(c)

Figure 18. Average packet latency with four distributed hotspots. (a) 4 × 4 mesh network; (b) 8 × 8 meshnetwork; (c) 16 × 16 node network.

Computers 2018, 7, 31 18 of 22

x 100,000

5.7

2x10-5

1.1

5x10-4

1.7

2x10-4

2.1

7x10-4

2.2

4x10-4

2.2

9x10-4

2.3

0x10-4

2.3

0x10-4

2.3

1x10-4

2.3

2x10-4

5.7

5x10-5

1.1

5x10-4

1.7

3x10-4

2.1

9x10-4

2.2

5x10-4

2.3

0x10-4

2.3

1x10-4

2.3

2x10-4

2.3

2x10-4

2.3

4x10-4

0

5.0x10-5

1.0x10-4

1.5x10-4

2.0x10-4

2.5x10-4

0.01 0.02 0.03 0.038 0.039 0.04 0.0402 0.0404 0.0406 0.041

Tota

l En

ergy

Co

nsu

mti

on

(Jo

ule

s)


Without secure zone

with secure zone

(a)

8.7

3x10-5

2.1

9x10-4

3.4

8x10-4

3.9

3x10-4

4.1

0x10-4

4.1

9x10-4

4.2

0x10-4

4.2

0x10-4

4.2

1x10-4

8.7

6x10-5

2.2

0x10-4

3.4

9x10-4

3.9

4x10-4

4.1

2x10-4

4.2

1x10-4

4.2

3x10-4

4.2

3x10-4

4.2

3x10-4

0

1.0x10-4

2.0x10-4

3.0x10-4

4.0x10-4

5.0x10-4

0.002 0.005 0.008 0.009 0.0094 0.0096 0.0097 0.00975 0.0098

Tota

l En

ergy

Co

nsu

mti

on

(Jo

ule

s)


Without secure zone

with secure zone

(b)3

.39x10

-4

6.7

8x10-4

7.4

3x10-4

7.6

1x10-4

7.7

0x10-4

7.7

5x10-4

7.7

6x10-4

7.8

0x10-4

7.8

1x10-4

3.4

0x10-4

6.7

9x10-4

7.4

5x10-4

7.6

2x10-4

7.7

2x10-4

7.7

7x10-4

7.7

8x10-4

7.8

2x10-4

7.8

3x10-4

2.0x10-4

3.0x10-4

4.0x10-4

5.0x10-4

6.0x10-4

7.0x10-4

8.0x10-4

9.0x10-4

0.001 0.002 0.0022 0.00225 0.00227 0.0023 0.00231 0.00232 0.00235

Tota

l En

ergy

Co

nsu

mti

on

(Jo

ule

s)


Without securezone with secure zone

(c)

Figure 19. Network energy consumption with four distributed hotspots. (a) 4 × 4 mesh network;(b) 8 × 8 mesh network; (c) 16 × 16 mesh network.

As summarized in Figure 20, the proposed security solution has presented approximatelycomparable or slightly increased overhead both for packet latency and energy consumption underhotspot traffic pattern when compared with the remaining traffic patterns. For example, with fourdistributed hotspots in 16, 64 and 256 nodes network, the packet latency is increased by 8.83%,5.38% and 2.91% respectively as compared to 9.72%, 6.51% and 4.10% increase for three closely placedhotspots under similar network settings. Similarly, the network energy consumption overhead fordifferent synthetic patterns is summarized in Figure 20b. These overheads can be reduced further byanalysing the communication pattern and the nodes with recurring traffic pattern can be placed ina neighboring area.

The scalability of the proposed security mechanism is further evaluated by measuring itsimpact on the performance by simulating multi-threaded applications for a system having 64 cores.For this purpose, the traffic patterns for a number of applications are generated through the Netracelibrary [38,39]. Netrace is a utility that generates packet traces by simulating a 64-core system executingmulti-threaded applications from the PARSEC V2.1 benchmark suite [40]. The traffic patterns for

Computers 2018, 7, 31 19 of 22

two applications from this benchmark are used in our simulator in a 8 × 8 mesh network environment.The impact of embedding secure interface inside each router on average message latency is shownin Figure 21. From these results, it is clearly indicated that the network performance is not muchaffected when such benchmark applications data is simulated. Moreover, our proposed mechanismhas negligible performance overhead when simulated for the benchmark applications as compared tothe results generated for the different synthetic traffic patterns.

0

2

4

6

8

10

12

Uniform transpose hotspot-1node hotspot-3node hotspot-4node

% In

crea

se in

Pac

ket

Late

ncy

Synthetic Traffic Patterns

16 nodes

64 nodes

256 nodes

(a)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Uniform transpose hotspot-1node hotspot-3node hotspot-4node

% In

crea

se in

En

ergy

Co

nsu

mp

tio

n

Synthetic Traffic Patterns

16 nodes

64 nodes

256 nodes

(b)

Figure 20. Maximum percentage increase in packet latency and total network energy consumptionunder different synthetic traffic patterns. (a) Percentage increase in packet latency. (b) Percentageincrease in total network energy consumption.

(a) (b)

Figure 21. Message latency for 64 node network using multithreaded applications from PARSECbenchmark suite. (a) blackscholes application; (b) fluidanimate application.

6. Conclusions and Future Work

In this paper, a practical hardware-based security solution is proposed for a FPGA-basedembedded systems and NoC-based communication architecture. The security mechanism has beenimplemented as a secure zone and it is able to efficiently detect software attacks by identifyingunauthorized accesses to the blocks configured within the protected zone. The effectiveness ofthe proposed mechanism is verified by initiating a buffer-overflow-based software attack whichis eventually detected successfully. The feasibility of our solution is illustrated by implementingan authentication system based on a FPGA device with the area and power consumption overheadsof 1.23% and 9.5% respectively. Additionally, for NoC-based communication architecture, the secureinterface has presented a negligible area and power consumption overhead against a standard router.Moreover, experimental results have been obtained through simulation to present the impact of ourapproach on the packet latency and total energy consumption for different network sizes and it isshown that the performance and energy overhead is minimal.

Computers 2018, 7, 31 20 of 22

Currently, the secure-zone controller module is configured to take input from system usermanually. In future work, different ways of autonomous configuration will be investigated anda mechanism will be devised to configure the secure-zone controller remotely over a dedicated securecommunication medium.

Author Contributions: A.A. and A.S. conceived the design method and developed the experiments. M.J. providedfeedback on fine tuning the concepts and experiment methods. A.S. performed the experiments. All authorsanalyzed the results. A.A. led the writing with contributions from all authors.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Schaumont, P.; Raghunathan, A. Guest Editors’ Introduction: Security and Trust in Embedded-SystemsDesign. IEEE Des. Test Comput. 2007, 24, 518–520. [CrossRef]

2. Ravi, S.; Raghunathan, A.; Kocher, P.; Hattangady, S. Security in embedded systems: Design challenges.ACM Trans. Embed. Comput. Syst. 2004, 3, 461–491. [CrossRef]

3. Song, D.X.; Wagner, D.; Tian, X. Timing Analysis of Keystrokes and Timing Attacks on SSH. In Proceedingsof the USENIX Security Symposium, Washington, DC, USA, 13–17 August 2001; Volume 2001.

4. Perez, R.; van Doorn, L.; Sailer, R. Virtualization and hardware-based security. IEEE Secur. Priv. 2008, 6,24–31. [CrossRef]

5. Suh, G.E.; Lee, J.W.; Zhang, D.; Devadas, S. Secure Program Execution via Dynamic Information FlowTracking. SIGARCH Comput. Archit. News 2004, 32, 85–96. [CrossRef]

6. Kornaros, G.; Pnevmatikatos, D. A survey and taxonomy of on-chip monitoring of multicore systems-on-chip.ACM Trans. Des. Autom. Electron. Syst. 2013, 18, 17. [CrossRef]

7. Arora, D.; Ravi, S.; Raghunathan, A.; Jha, N. Secure embedded processing through hardware-assistedrun-time monitoring. In Proceedings of the Design, Automation and Test in Europe, Munich, Germany,7–11 March 2005; Volume 1, pp. 178–183 .

8. Mao, S.; Wolf, T. Hardware Support for Secure Processing in Embedded Systems. IEEE Trans. Comput. 2010,59, 847–854. [CrossRef]

9. Rahmatian, M.; Kooti, H.; Harris, I.; Bozorgzadeh, E. Hardware-Assisted Detection of Malicious Software inEmbedded Systems. IEEE Embed. Syst. Lett. 2012, 4, 94–97. [CrossRef]

10. Yoon, M.K.; Mohan, S.; Choi, J.; Kim, J.E.; Sha, L. SecureCore: A multicore-based intrusion detectionarchitecture for real-time embedded systems. In Proceedings of the 2013 IEEE 19th Real-Time and EmbeddedTechnology and Applications Symposium (RTAS), Philadelphia, PA, USA, 9–11 April 2013; pp. 21–32.

11. Yan, Q.; Li, Y.; Li, T.; Deng, R. Insights into malware detection and prevention on mobile phones. In SecurityTechnology; Springer: New York, NY, USA, 2009; pp. 242–249.

12. Rathgeb, C.T.; Peterson, G.D. Secure Processing Using Dynamic Partial Reconfiguration. In Proceedingsof the 5th Annual Workshop on Cyber Security and Information Intelligence Research: Cyber Security andInformation Intelligence Challenges and Strategies, CSIIRW ’09, Oak Ridge, TN, USA, 13–15 April 2009; p. 58.

13. Lie, D.; Thekkath, C.; Mitchell, M.; Lincoln, P.; Boneh, D.; Mitchell, J.; Horowitz, M. Architectural Supportfor Copy and Tamper Resistant Software. SIGPLAN Not. 2000, 35, 168–177. [CrossRef]

14. Suh, G.E.; Clarke, D.; Gassend, B.; van Dijk, M.; Devadas, S. AEGIS: Architecture for Tamper-evident andTamper-resistant Processing. In Proceedings of the 17th Annual International Conference on Supercomputing,ICS ’03, San Francisco, CA, USA, 23–26 June 2003; ACM: New York, NY, USA, 2003; pp. 160–171.

15. Doudalis, I.; Clause, J.; Venkataramani, G.; Prvulovic, M.; Orso, A. Effective and Efficient Memory ProtectionUsing Dynamic Tainting. IEEE Trans. Comput. 2012, 61, 87–100. [CrossRef]

16. Kemerlis, V.P.; Portokalidis, G.; Jee, K.; Keromytis, A.D. Libdft: Practical Dynamic Data Flow Tracking forCommodity Systems. SIGPLAN Not. 2012, 47, 121–132. [CrossRef]

17. Trusted Computing Group. TPM Spectfications for Embedded Systems. Available online: http://www.trustedcomputinggroup.org/developers/embedded_systems (accessed on 15 March 2018).

18. ARM. TrustZone. Available online: http://arm.com/products/processors/technologies/trustzone(accessed on 15 March 2018).

http://dx.doi.org/10.1109/MDT.2007.188

http://dx.doi.org/10.1145/1015047.1015049

http://dx.doi.org/10.1109/MSP.2008.135

http://dx.doi.org/10.1145/1037947.1024404

http://dx.doi.org/10.1145/2442087.2442088

http://dx.doi.org/10.1109/TC.2010.32

http://dx.doi.org/10.1109/LES.2012.2218630

http://dx.doi.org/10.1145/356989.357005

http://dx.doi.org/10.1109/TC.2010.215

http://dx.doi.org/10.1145/2365864.2151042

http://www.trustedcomputinggroup.org/developers/embedded_systems

http://www.trustedcomputinggroup.org/developers/embedded_systems

http://arm.com/products/processors/technologies/trustzone

Computers 2018, 7, 31 21 of 22

19. Intel. VT-d. Available online: http://software.intel.com/en-us/articles/intel-virtualization-technology-for-directed-io-vt-d (accessed on 15 March 2018).

20. Pék, G.; Buttyán, L.; Bencsáth, B. A survey of security issues in hardware virtualization. ACM Comput. Surv.2013, 45, 40. [CrossRef]

21. La Polla, M.; Martinelli, F.; Sgandurra, D. A survey on security for mobile devices. IEEE Commun. Surv. Tutor.2013, 15, 446–471. [CrossRef]

22. Fiorin, L.; Palermo, G.; Lukovic, S.; Catalano, V.; Silvano, C. Secure Memory Accesses on Networks-on-Chip.IEEE Trans. Comput. 2008, 57, 1216–1229. [CrossRef]

23. Lukovic, S.; Christianos, N. Enhancing network-on-chip components to support security of processingelements. In Proceedings of the 5th Workshop on Embedded Systems Security, WESS ’10, Scottsdale, AZ,USA, 24 October 2010; p. 12.

24. Porquet, J.; Greiner, A.; Schwarz, C. NoC-MPU: A secure architecture for flexible co-hosting on sharedmemory MPSoCs. In Proceedings of the Design, Automation Test in Europe Conference Exhibition (DATE),Grenoble, France, 14–18 March 2011; pp. 1–4.

25. Wassel, H.M.G.; Gao, Y.; Oberg, J.K.; Huffmire, T.; Kastner, R.; Chong, F.T.; Sherwood, T. SurfNoC: A LowLatency and Provably Non-interfering Approach to Secure Networks-on-chip. In Proceedings of the 40thAnnual International Symposium on Computer Architecture, ISCA ’13, Tel-Aviv, Israel, 23–27 June 2013 ;ACM: New York, NY, USA, 2013; pp. 583–594, doi:10.1145/2485922.2485972. [CrossRef]

26. Wassel, H.; Gao, Y.; Oberg, J.; Huffmire, T.; Kastner, R.; Chong, F.; Sherwood, T. Networks on Chip withProvable Security Properties. IEEE Micro 2014, 34, 57–68. [CrossRef]

27. Grammatikakis, M.D.; Papadimitriou, K.; Petrakis, P.; Papagrigoriou, A.; Kornaros, G.; Christoforakis, I.;Tomoutzoglou, O.; Tsamis, G.; Coppola, M. Security in MPSoCs: A NoC Firewall and an Evaluation Framework.IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2015, 34, 1344–1357, doi:10.1109/TCAD.2015.2448684.[CrossRef]

28. Hu, Y.; Müller-Gritschneder, D.; Sepulveda, M.J.; Gogniat, G.; Schlichtmann, U. Automatic ILP-basedFirewall Insertion for Secure Application-Specific Networks-on-Chip. In Proceedings of the 2015 NinthInternational Workshop on Interconnection Network Architectures: On-Chip, Multi-Chip (INA-OCMC),Amsterdam, The Netherlands, 19 January 2015; pp. 9–12, doi:10.1109/INA-OCMC.2015.9. [CrossRef]

29. Saeed, A.; Ahmadinia, A.; Just, M. Secure On-Chip Communication Architecture for ReconfigurableMulti-Core Systems. J. Circuits Syst. Comput. 2016, 25, 1650089, doi:10.1142/S0218126616500894. [CrossRef]

30. Saeed, A.; Ahmadinia, A.; Javed, A.; Larijani, H. Intelligent Intrusion Detection in Low-Power IoTs.ACM Trans. Internet Technol. (TOIT) 2016, 16, 27. [CrossRef]

31. Saeed, A.; Ahmadinia, A.; Just, M. Hardware-assisted secure communication for FPGA-based embeddedsystems. In Proceedings of the 2015 11th Conference on Ph.D. Research in Microelectronics and Electronics(PRIME), Glasgow, UK, 29 June–2 July 2015; pp. 216–219, doi:10.1109/PRIME.2015.7251373. [CrossRef]

32. Platform Studio and the Embedded Development Kit (EDK). Available online: https://www.xilinx.com/products/design-tools/platform.html (accessed on 15 March 2108).

33. Lhee, K.S.; Chapin, S.J. Buffer overflow and format string overflow vulnerabilities. Softw. Pract. Exp.2003, 33, 423–460. [CrossRef]

34. Younan, Y. 25 Years of Vulnerabilities: 1988–2012. Available online: http://labs.snort.org/ (accessed on15 March 2018).

35. Xilinx Power Estimator. Available online: http://xilinx.com/products/technology/power/xpe (accessedon 15 March 2108).

36. Hu, J.; Marculescu, R. Energy- and performance-aware mapping for regular NoC architectures. IEEE Trans.Comput.-Aided Des. Integr. Circuits Syst. 2005, 24, 551–562.

37. Wang, H.S.; Zhu, X.; Peh, L.S.; Malik, S. Orion: A power-performance simulator for interconnection networks.In Proceedings of the 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35),Istanbul, Turkey, 18–22 November 2002; pp. 294–305.

38. Hestness, J.; Grot, B.; Keckler, S.W. Netrace: Dependency-driven trace-based network-on-chip simulation.In Proceedings of the Third International Workshop on Network on Chip Architectures, Atlanta, GA, USA,4 December 2010; ACM: New York, NY, USA, 2010; pp. 31–36.

http://software.intel.com/en-us/articles/intel-virtualization-technology-for-directed-io-vt-d

http://software.intel.com/en-us/articles/intel-virtualization-technology-for-directed-io-vt-d

http://dx.doi.org/10.1145/2480741.2480757

http://dx.doi.org/10.1109/SURV.2012.013012.00028

http://dx.doi.org/10.1109/TC.2008.69

https://doi.org/10.1145/2485922.2485972

http://dx.doi.org/10.1145/2485922.2485972

http://dx.doi.org/10.1109/MM.2014.46

https://doi.org/10.1109/TCAD.2015.2448684

http://dx.doi.org/10.1109/TCAD.2015.2448684

https://doi.org/10.1109/INA-OCMC.2015.9

http://dx.doi.org/10.1109/INA-OCMC.2015.9

https://doi.org/10.1142/S0218126616500894

http://dx.doi.org/10.1142/S0218126616500894

http://dx.doi.org/10.1145/2990499

https://doi.org/10.1109/PRIME.2015.7251373

http://dx.doi.org/10.1109/PRIME.2015.7251373

https://www.xilinx.com/products/design-tools/platform.html

https://www.xilinx.com/products/design-tools/platform.html

http://dx.doi.org/10.1002/spe.515

http://labs.snort.org/

http://xilinx.com/products/technology/power/xpe

Computers 2018, 7, 31 22 of 22

39. Hestness, J.; Keckler, S.W. Netrace: Dependency-Tracking Traces for Efficient Network-on-Chip Experimentation;Technical Report TR-10-11; The University of Texas at Austin, Department of Computer Science: Austin, TX,USA, 2011.

40. Bienia, C.; Kumar, S.; Singh, J.P.; Li, K. The PARSEC benchmark suite: Characterization and architecturalimplications. In Proceedings of the 17th International Conference on Parallel Architectures and CompilationTechniques, Toronto, ON, Canada, 25–29 October 2008; ACM: New York, NY, USA, 2008; pp. 72–81.

c© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open accessarticle distributed under the terms and conditions of the Creative Commons Attribution(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

http://creativecommons.org/

http://creativecommons.org/licenses/by/4.0/.

Documents

Hardware-Assisted Secure Communication in Embedded and