
A Dual-Core 8051 Microcontroller System based on Synchronous-logic and Asynchronous-logic

¹Kok-Leong Chang, ²Tong Lin, ²Weng-Geng Ho, ²Kwen-Siong Chong, ²Bah-Hwee Gwee and ²Joseph S. Chang
¹Synthesis and Integration, Institute of Materials Research and Engineering (IMRE), A*STAR, Singapore

²Centre for Integrated Circuits and Systems, Nanyang Technological University, Singapore

Abstract—We describe a dual-core 8051 microcontroller system featuring synchronous and asynchronous (clockless) modes of operation. The synchronous mode is realized by a synchronous 8051 microcontroller core, while the asynchronous mode is realized by an asynchronous 8051 microcontroller core. The two cores share embedded program and data memories, which enables switching between them during program execution. The measured energy, speed and electromagnetic interference of both cores are compared at different operating workloads.

I. INTRODUCTION

Current Systems-on-Chip (SoCs) face three challenges: energy efficiency, adaptability and form factor [1]. Energy efficiency determines the battery lifetime of SoCs and is particularly important for remote surveillance and monitoring applications where energy sources are scarce [2]. Adaptability refers to the SoC’s ability to operate robustly under varying conditions. Well-established factors that challenge the operation of SoCs include process, voltage and temperature (PVT) variations [3], noise and soft errors. In particular, PVT variations are becoming larger with nanometer-sized transistors and are thus receiving the most attention from researchers. Form factor refers to the portability and aesthetics of a device, which in turn limit the area of the SoC and the size of the battery deployed in the device. The three challenges are inter-related; e.g. the minimum battery size is set by the energy efficiency of the SoC, while the maximum battery size is limited by the form factor of the device.

Asynchronous logic offers SoCs an alternative design methodology to synchronous logic that potentially addresses the energy efficiency and adaptability challenges [4]. Asynchronous logic is potentially more energy efficient than synchronous logic for two reasons. First, asynchronous logic features average-case rather than worst-case performance (i.e. a higher average speed), which translates to lower leakage energy. Second, asynchronous logic features innate clock gating (which requires deliberate effort in synchronous logic) at gate-level (fine-grained) granularity, which translates to lower dynamic energy. Asynchronous logic is also potentially more adaptable than synchronous logic because it is self-timed; it innately adjusts its own operating speed to varying conditions (e.g. PVT). Despite these advantages, it is well established that asynchronous logic circuits generally incur a larger integrated circuit area, which constrains the form factor of the device [5]. Given these general advantages and disadvantages, there is strong motivation to compare asynchronous logic and synchronous logic by means of a practical design.
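The average-case versus worst-case argument can be illustrated with a toy timing model. This is a minimal sketch, not the authors' analysis: the per-operation delays, the 20% PVT margin, the 1ns handshake overhead and the 50µW leakage power are all hypothetical illustration values, and the function names are invented for this example.

```python
# Toy comparison of worst-case (clocked) and average-case (self-timed) timing.
# All delays, margins and powers below are hypothetical illustration values.

def clocked_time_ns(delays_ns: list[float], margin: float = 0.2) -> float:
    """Clocked logic: every operation occupies one clock period, which must cover
    the slowest operation plus a PVT safety margin."""
    period_ns = max(delays_ns) * (1.0 + margin)
    return period_ns * len(delays_ns)

def self_timed_time_ns(delays_ns: list[float], handshake_ns: float = 1.0) -> float:
    """Self-timed logic: each operation takes only its own delay plus a
    completion/handshake overhead."""
    return sum(d + handshake_ns for d in delays_ns)

def leakage_energy_pj(time_ns: float, leak_uw: float) -> float:
    """Leakage energy = leakage power x time (1 uW x 1 ns = 1e-15 J = 0.001 pJ)."""
    return leak_uw * time_ns * 1e-3

delays = [8.0, 12.0, 9.0, 20.0, 10.0]   # hypothetical per-operation delays (ns)
t_sync = clocked_time_ns(delays)         # 5 x (20 x 1.2) = 120 ns
t_async = self_timed_time_ns(delays)     # (8 + 12 + 9 + 20 + 10) + 5 x 1 = 64 ns
print(f"clocked: {t_sync:.1f} ns, self-timed: {t_async:.1f} ns")
print(f"leakage at 50 uW: {leakage_energy_pj(t_sync, 50):.2f} pJ vs "
      f"{leakage_energy_pj(t_async, 50):.2f} pJ")
```

The clocked period is set by the single slowest operation, so every faster operation wastes time (and leaks) waiting for the clock edge; the self-timed version pays a per-operation handshake cost instead.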

In this paper, we propose a dual-core 8051 microcontroller system (DC8051). The DC8051 has two modes of operation: synchronous and asynchronous. The synchronous mode of operation is based on the Synopsys DW8051 IP core [6] (S8051). The asynchronous mode of operation is based on our proposed asynchronous 8051 core (A8051). The DC8051 has a 1kbyte Read-Only Memory (ROM) for program memory, and a 128byte Random-Access Memory (RAM) and a 1kbyte External Random-Access Memory (XRAM) for data memory. These memories are shared between the S8051 and A8051.

We compare the S8051 and A8051 under two operating conditions. For high-speed applications, we compare the S8051 and A8051 at full workload, where workload is defined as the minimum required computation speed (the speed that satisfies a given computation task) divided by the maximum speed of the microcontroller. For medium- to low-speed applications, we further compare the S8051 and A8051 at medium and low workloads to investigate energy efficiency. The design is fabricated in a 130nm CMOS process using standard cell libraries and IP memories. Measured results show that at 100% workload the S8051 and A8051 are comparable: the proposed A8051, on average, features 10% higher speed, dissipates 10% lower energy and features 12dB lower electromagnetic interference than the S8051. At 10% workload, the proposed A8051 dissipates a significant ~1.8× less energy than the S8051. The area cost of the A8051 is ~2× that of the S8051.
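The workload definition above can be made concrete with a short calculation. In this sketch the periodic task (74,000 instructions every 100 ms) is hypothetical, the function names are invented for illustration, and the maximum speeds are the Dhrystone v2.1 MIPS figures reported in Table II.

```python
# Workload as defined in the text: minimum required computation speed divided by
# the maximum speed of the microcontroller. Task figures are hypothetical.

def required_speed_ips(instr_per_task: int, task_period_s: float) -> float:
    """Minimum speed (instructions/s) that completes one task per period."""
    if instr_per_task <= 0 or task_period_s <= 0:
        raise ValueError("task size and period must be positive")
    return instr_per_task / task_period_s

def workload(required_ips: float, max_ips: float) -> float:
    """Fraction of the core's maximum speed that the task needs, in (0, 1]."""
    if max_ips <= 0 or not 0 < required_ips <= max_ips:
        raise ValueError("required speed must be positive and at most the maximum speed")
    return required_ips / max_ips

# Hypothetical task: 74,000 instructions every 100 ms -> 0.74 MIPS required.
need = required_speed_ips(74_000, 0.1)
# Maximum speeds taken from the Dhrystone v2.1 row of Table II.
print(f"A8051 (7.4 MIPS): {workload(need, 7.4e6):.1%}")
print(f"S8051 (6.3 MIPS): {workload(need, 6.3e6):.1%}")
```

Under this definition the same task is a 10% workload on the A8051 but about 11.7% on the S8051, because the denominator is each core's own maximum speed.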

II. ARCHITECTURE OF THE DUAL-CORE 8051 MICROCONTROLLER SYSTEM

Fig. 1 depicts the architecture of the DC8051, which consists of the S8051, the proposed A8051, and the shared embedded 1kbyte ROM, 128byte RAM and 1kbyte XRAM. The S8051 and A8051 also share three groups of inputs and outputs: the Control I/O (main control signals), the Program I/O (programming and debug signals) and the General-Purpose I/O (GP_I/O). They also share two controller blocks: the Prog Port Control (Programmable Port Controller) and the GP Port Control (General-Purpose Port Controller). The former allows the on-chip ROM to be initialized during power-up by an off-chip mnemonic programmer through the Program I/O. The latter serves as the interface controller between SPX and APX (the I/O of the S8051 and A8051, respectively) and the GP_I/O, denoted PX, which comprises PX_I_ON, PX_R, PX_A and PX_D (where X is 0-3).

Table I tabulates the operation modes of the S8051, the A8051 and their shared memories. The master reset (RSTN), program enable (PROGN), microcontroller core selector (A8051_S8051N) and external interrupt (INTN) inputs apply to both the S8051 and A8051. The active-low input PROGN disables both the S8051 and A8051 and allows the ROM to be programmed via the Prog Port Control block through the Program I/O. The input A8051_S8051N activates the A8051 when high and the S8051 when low. The active-low input INTN triggers the interrupt system of the active core (S8051 or A8051).

978-1-4673-5762-3/13/$31.00 ©2013 IEEE 3022

A. Synchronous Microcontroller Core – S8051

The design of synchronous microcontrollers, including the S8051, is mature and extensively reported in the literature [7]. The S8051 herein is based on the technology-independent Synopsys microcontroller core macro cell [6] and is synthesized for a low-to-medium speed (50MHz). As a practical design, the synthesized S8051 would embody some delay margin to accommodate PVT variations at the nominal operating condition.

Fig. 2 depicts the block diagram of the S8051; this block diagram is a continuation of Fig. 1. The S8051 is a 2-stage pipeline system in which the Instruction Fetch (IF) and Decode and Execute (D&X) blocks form the two pipeline stages. The first pipeline stage (IF block) manages the fetching, decoding and grouping of instructions; it includes the IF, Instruction Pointer Arithmetic Unit (IPAU) and Instruction Pointer (IP) blocks. The second pipeline stage (D&X block) manages the fetching of operands, the execution of operations and the writing back of results; it includes the D&X, Arithmetic and Logic Unit (ALU) and Register File (ReF) blocks. The I/O wires (SCLK, SRSTN and SINTN) and I/O buses (SPX, SROM, SRAM and SXRAM) of the S8051 form the interface to the system in Fig. 1 (consisting of the S8051, the A8051 and the shared blocks).

Fig. 1. Block diagram of the dual-core 8051 microcontroller system embodying the synchronous 8051 microcontroller core, the proposed asynchronous 8051 microcontroller core and the shared blocks

TABLE I. OPERATION MODES OF THE SYNCHRONOUS 8051 CORE, ASYNCHRONOUS 8051 CORE AND THE SHARED BLOCKS

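RSTN | PROGN | A8051_S8051N | INTN | Mode
-----|-------|--------------|------|----------------
0    | X     | X            | X    | Reset
1    | 0     | X            | X    | Program
1    | 1     | 0            | 1    | S8051 active
1    | 1     | 1            | 1    | A8051 active
1    | 1     | 0            | 0    | S8051 interrupt
1    | 1     | 1            | 0    | A8051 interrupt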
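Table I behaves as a priority decode: RSTN dominates, then PROGN, and only in normal operation do A8051_S8051N and INTN select the core and its interrupt state ("X" is a don't-care). A minimal Python sketch of that decode follows; the function name and the mode strings are illustrative and do not come from the design's RTL.

```python
# Decodes the DC8051 control inputs into an operation mode, following Table I.
# The function name and return strings are illustrative, not from the design.

def dc8051_mode(rstn: int, progn: int, a8051_s8051n: int, intn: int) -> str:
    """All inputs are logic levels (0 or 1); RSTN, PROGN and INTN are active low."""
    for name, level in (("RSTN", rstn), ("PROGN", progn),
                        ("A8051_S8051N", a8051_s8051n), ("INTN", intn)):
        if level not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")
    if rstn == 0:                 # master reset overrides everything (other inputs are "X")
        return "Reset"
    if progn == 0:                # both cores disabled; ROM programmed via Prog Port Control
        return "Program"
    core = "A8051" if a8051_s8051n == 1 else "S8051"   # high selects A8051, low selects S8051
    return f"{core} interrupt" if intn == 0 else f"{core} active"

for inputs in [(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1), (1, 1, 0, 0), (1, 1, 1, 0)]:
    print(inputs, "->", dc8051_mode(*inputs))
```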


B. Proposed Asynchronous Microcontroller Core – A8051

Fig. 3 depicts the block diagram of the A8051; this block diagram is a continuation of Fig. 1. The A8051 is partly specified in Balsa [8] and partly handcrafted (the Flow Controller (FCont) and Memory Controller (MemCont) blocks). Balsa is an asynchronous-logic behavioral synthesis EDA tool based on the syntax-directed translation approach. The FCont block synchronizes the Control I/O with the A8051 via the synchronized ARSTN (Asynchronous Reset) and AINTN (Asynchronous Interrupt) signals, while the MemCont block synchronizes the embedded memories with the A8051 via the AROM (Asynchronous ROM), ARAM (Asynchronous RAM) and AXRAM (Asynchronous XRAM) buses.

The proposed A8051 is also designed as a 2-stage pipeline system. The IF block forms the first pipeline stage and manages the fetching and grouping of instructions, including the handling of exceptions (e.g. initialization, interrupts and branching); this stage includes the IF, FCont, IP, Instruction Pointer Arithmetic Unit (IPAU) and MemCont blocks. The D&X block forms the second pipeline stage and manages the fetching of operands, the execution of operations and the writing back of results; this stage includes the D&X, Register File (ReF), ALU and MemCont blocks. The ReF block drives the APX I/O buses directly, bypassing the synchronization blocks (FCont and MemCont), to preserve the local handshake protocols at the I/O. As shown in the next section, this asynchronous I/O mode yields higher energy efficiency at low and medium workloads. The I/O wires (ARSTN and AINTN) and I/O buses (APX, AROM, ARAM and AXRAM) of the A8051 form the interface to the system in Fig. 1 (consisting of the S8051, the A8051 and the shared blocks).

III. IMPLEMENTATION AND MEASUREMENT RESULTS

The DC8051 microcontroller system is realized in 130nm CMOS; the chip microphotograph is shown in Fig. 4. The complete system occupies 4.1mm², with the A8051 (1.1mm²) occupying ~2× the area of the S8051 (0.6mm²). This overhead is largely due to the dual-rail-encoded asynchronous logic built from standard library cells; it can be mitigated with custom cells [9].

Six benchmark programs are used to evaluate the energy per instruction (EPI) and millions of instructions per second (MIPS) of the S8051 and A8051: Arithmetic, Logical, Data Transfer, Boolean Variable, Branching and Dhrystone v2.1. The first five benchmark programs each evaluate the performance of one instruction type, whereas the last evaluates the overall performance. The measurement results on the prototype ICs are tabulated in Table II; for ease of comparison, the results normalized to the A8051 are shown in parentheses. Based on Dhrystone v2.1 (last row of Table II), the A8051 achieves ~1.1× higher MIPS than the S8051, equivalent to a 59MHz clock (7.4/6.3 × 50MHz), while its EPI is ~1.1× lower, a composite ~1.2× advantage in MIPS×EPI. The EPI of the A8051 is generally higher for the arithmetic and branching benchmark programs than for the other benchmark programs, owing to the extensive use of the dual-rail-encoded, datapath-dominant IPAU (branching) and ALU (arithmetic) blocks; these EPIs are nevertheless still lower than those of the S8051.

Fig. 2. Block diagram of the synchronous 8051 microcontroller core

Fig. 3. Block diagram of the asynchronous 8051 microcontroller core

Fig. 4. Chip microphotograph of the dual-core 8051 microcontroller system

TABLE II. MEASURED ENERGY PER INSTRUCTION (EPI) AND MILLIONS OF INSTRUCTIONS PER SECOND (MIPS) OF THE SYNCHRONOUS 8051 CORE AND THE PROPOSED ASYNCHRONOUS 8051 CORE

Benchmark Program | S8051 EPI (pJ) | S8051 MIPS (MInstr/s) | A8051 EPI (pJ) | A8051 MIPS (MInstr/s)
------------------|----------------|-----------------------|----------------|----------------------
Arithmetic        | 374 (1.1)      | 6.4 (0.9)             | 352 (1)        | 6.9 (1)
Logical           | 369 (1.4)      | 7.0 (0.7)             | 267 (1)        | 9.5 (1)
Data Transfer     | 357 (1.1)      | 6.8 (0.9)             | 327 (1)        | 7.7 (1)
Boolean Variable  | 358 (1.3)      | 6.7 (0.7)             | 271 (1)        | 9.6 (1)
Branching         | 547 (1.0)      | 4.4 (1.3)             | 528 (1)        | 3.5 (1)
Dhrystone v2.1    | 401 (1.1)      | 6.3 (0.9)             | 349 (1)        | 7.4 (1)
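The parenthesized entries in Table II and the 59MHz equivalence can be recomputed from the absolute values in the table. The short Python check below does that; the equivalent-clock figure assumes the S8051's MIPS would scale linearly with its clock frequency, and the function names are illustrative.

```python
# Table II values: benchmark -> ((S8051 EPI pJ, S8051 MIPS), (A8051 EPI pJ, A8051 MIPS)).
TABLE_II = {
    "Arithmetic":       ((374, 6.4), (352, 6.9)),
    "Logical":          ((369, 7.0), (267, 9.5)),
    "Data Transfer":    ((357, 6.8), (327, 7.7)),
    "Boolean Variable": ((358, 6.7), (271, 9.6)),
    "Branching":        ((547, 4.4), (528, 3.5)),
    "Dhrystone v2.1":   ((401, 6.3), (349, 7.4)),
}
S8051_CLOCK_MHZ = 50.0

def normalized_to_a8051(row):
    """S8051 EPI and MIPS divided by the A8051 values (the parenthesized entries)."""
    (s_epi, s_mips), (a_epi, a_mips) = row
    return round(s_epi / a_epi, 1), round(s_mips / a_mips, 1)

def equivalent_clock_mhz(row, clock_mhz=S8051_CLOCK_MHZ):
    """Clock the S8051 would need to match the A8051's MIPS, assuming linear scaling."""
    (_, s_mips), (_, a_mips) = row
    return a_mips / s_mips * clock_mhz

for name, row in TABLE_II.items():
    print(f"{name:17s} normalized (EPI, MIPS) = {normalized_to_a8051(row)}")
print(f"Dhrystone-equivalent clock: {equivalent_clock_mhz(TABLE_II['Dhrystone v2.1']):.1f} MHz")
```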

Figs. 5(a) and 5(b) depict the power spectra (0Hz – 1GHz) of the S8051 and A8051, respectively. Each power spectrum is obtained from the Fast Fourier Transform (FFT) of the current drawn from VDD at 100% workload. The 50MHz clock of the S8051 causes peaks at its harmonic frequencies, the highest being at 400MHz. In comparison, and not unexpectedly, the A8051 features a more evenly distributed power spectrum without the clock harmonics seen in the S8051, and its highest peak (at 330MHz) is ~12dB (~16× in power) lower than that of the S8051. A low and evenly distributed power spectrum is desirable in many applications, e.g. those with sensitive RF receivers.
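Why a clocked core concentrates its supply-current spectrum at clock harmonics, while a self-timed core spreads it out, can be seen with a toy model. The sketch treats each core's supply current as a train of identical unit pulses: one per 50MHz clock edge for the clocked case, and the same number of pulses at irregular, hypothetical intervals for the self-timed case. It assumes a 2GS/s sample rate so that the spectrum spans 0Hz to 1GHz, and it uses a plain DFT rather than an FFT for brevity. It is an illustration only, not a model of the measured chip; the dB helper also confirms that 12dB corresponds to a power ratio of about 16×.

```python
import cmath
import math

FS_HZ = 2e9        # assumed sample rate; positive-frequency bins span 0 Hz to 1 GHz
N = 400            # samples in the window -> bin spacing FS_HZ / N = 5 MHz
F_CLK_HZ = 50e6    # S8051 clock frequency

def dft_magnitudes(x: list[float]) -> list[float]:
    """|X[k]| of a plain O(n^2) DFT, k = 0..n-1 (an FFT gives the same result faster)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def pulse_train(positions) -> list[float]:
    """Unit current pulse at each listed sample index, zero elsewhere."""
    x = [0.0] * N
    for p in positions:
        x[p] = 1.0
    return x

def highest_peak(mags: list[float]) -> tuple[float, float]:
    """(frequency in Hz, magnitude) of the largest bin, excluding DC, up to FS/2."""
    k = max(range(1, N // 2 + 1), key=lambda i: mags[i])
    return k * FS_HZ / N, mags[k]

def db_to_power_ratio(db: float) -> float:
    return 10 ** (db / 10)

# Clocked core: one switching burst per clock edge (every 40 samples -> 10 pulses).
sync_mags = dft_magnitudes(pulse_train(range(0, N, int(FS_HZ / F_CLK_HZ))))

# Self-timed core: same 10 bursts (same total charge), hypothetical irregular spacing.
intervals = [27, 53, 36, 44, 30, 50, 38, 42, 32, 48]   # sums to N
async_positions = [sum(intervals[:i]) for i in range(len(intervals))]
async_mags = dft_magnitudes(pulse_train(async_positions))

sync_lines = [k * FS_HZ / N for k in range(1, N // 2 + 1) if sync_mags[k] > 1e-9]
print("clocked spectral lines (MHz):", [round(f / 1e6) for f in sync_lines])
print("clocked peak:", highest_peak(sync_mags), "self-timed peak:", highest_peak(async_mags))
print(f"12 dB = {db_to_power_ratio(12):.1f}x in power")
```

With ideal impulses every clock harmonic has the same height; real switching currents have finite pulse shapes, which is why a measured spectrum such as Fig. 5(a) peaks at one particular harmonic instead.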

Consider now the performance when the workload is varied from 10% to 100%. Fig. 6 depicts the measured energy per computation (EPC) of the S8051 and A8051. EPC is used here rather than EPI to evaluate the energy efficiency per operation at low and medium workloads, where energy is also dissipated while a core waits for its next computation. As the workload reduces, the EPC of both the S8051 and the A8051 (GP_I/O synchronous) increases, largely due to increasing leakage energy. Not unexpectedly, the A8051 (GP_I/O asynchronous) features a lower EPC than both the S8051 and the A8051 (GP_I/O synchronous) as the workload reduces, because it stalls the IF block, thereby reducing the dynamic energy. At 10% workload, the EPC of the A8051 with its GP_I/O in the asynchronous mode is ~1.8× lower than that of the S8051.
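The trend in Fig. 6 can be reasoned about with a simple energy model. In this sketch, a core computes for the fraction of each period set by the workload and waits for the rest; leakage is drawn over the whole period, and a core that keeps switching while it waits (modelled here as a fixed idle dynamic power, an assumption of this sketch) also dissipates dynamic energy while idle, whereas a stalled core does not. All parameter values and names are hypothetical and not fitted to the measurements, so the printed ratios should not be compared with the measured ~1.8×.

```python
from dataclasses import dataclass

@dataclass
class CoreModel:
    """Illustrative energy model of a core; all values used below are hypothetical."""
    epi_j: float        # energy per executed instruction (J)
    max_ips: float      # maximum speed (instructions/s)
    leak_w: float       # leakage power, drawn for the whole period (W)
    idle_dyn_w: float   # dynamic power while waiting (W); 0.0 for a stalled core

def epc_j(core: CoreModel, workload: float, instr_per_computation: int) -> float:
    """Energy per computation (J) at a workload in (0, 1]."""
    if not 0.0 < workload <= 1.0:
        raise ValueError("workload must be in (0, 1]")
    busy_s = instr_per_computation / core.max_ips   # time spent computing at full speed
    period_s = busy_s / workload                     # time available per computation
    idle_s = period_s - busy_s                       # time spent waiting
    return (instr_per_computation * core.epi_j       # dynamic energy of the computation
            + core.leak_w * period_s                 # leakage energy over the whole period
            + core.idle_dyn_w * idle_s)              # dynamic energy dissipated while waiting

# Hypothetical parameters: the same core, either switching or stalled while it waits.
switching = CoreModel(epi_j=350e-12, max_ips=7.4e6, leak_w=20e-6, idle_dyn_w=0.5e-3)
stalled = CoreModel(epi_j=350e-12, max_ips=7.4e6, leak_w=20e-6, idle_dyn_w=0.0)

for w in (1.0, 0.5, 0.25, 0.1):
    p, s = epc_j(switching, w, 1000), epc_j(stalled, w, 1000)
    print(f"workload {w:4.0%}: switching {p * 1e9:7.1f} nJ, stalled {s * 1e9:7.1f} nJ")
```

In this model both EPCs coincide at 100% workload and both rise as the workload falls (leakage accrues over a longer period), but the stalled core rises more slowly because it adds no dynamic energy while waiting.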

IV. CONCLUSIONS

A dual-core 8051 microcontroller system featuring a synchronous and an asynchronous core has been described. At 100% workload, the two microcontroller cores are comparable, with the asynchronous core slightly advantageous in energy, speed and electromagnetic interference, at the cost of ~2× area. At 10% workload, the asynchronous core dissipates a significant ~1.8× less energy than the synchronous core. The dual-core 8051 microcontroller system has shown that the synchronous-logic approach is advantageous at high workloads, where both cores perform comparably and the synchronous core occupies about half the area, while the asynchronous-logic approach is advantageous at medium and low workloads.

REFERENCES

[1] T.-C. Chen, "Where CMOS is going: trendy hype vs. real technology," in Proc. IEEE International Solid-State Circuits Conference, 2006, pp. 1-18.

[2] R. D. Jorgenson, L. Sorensen, D. Leet, M. S. Hagedorn, D. R. Lamb, T. H. Friddell, and W. P. Snapp, "Ultralow-Power Operation in Subthreshold Regimes Applying Clockless Logic," Proceedings of the IEEE, vol. 98, no. 2, pp. 299-314, Feb. 2010.

[3] Z. Bo, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in Proc. International Symposium on Low Power Electronics and Design, 2005, pp. 20-25.

[4] P. A. Beerel, R. O. Ozdag, and M. Ferretti, A Designer's Guide to Asynchronous VLSI. Cambridge University Press, 2010.

[5] L. A. Plana, S. Taylor, and D. Edwards, "Attacking control overhead to improve synthesised asynchronous circuit performance," in Proc. IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2005, pp. 703-710.

[6] DesignWare Library: DW8051 MacroCell. [Online]. Available: http://www.synopsys.com.

[7] I. S. MacKenzie and R. C.-W. Phan, The 8051 Microcontroller. Pearson Prentice Hall, 2007.

[8] A. Bardsley, "Implementing Balsa Handshake Circuits," PhD Thesis, Department of Computer Science, University of Manchester, 2000.

[9] K. S. Chong, B. H. Gwee, and J. S. Chang, "Energy-efficient synchronous-logic and asynchronous-logic FFT/IFFT processors," IEEE Journal of Solid-State Circuits, vol. 42, no. 9, pp. 2034-2045, Sep. 2007.

Fig. 6. Measured energy per computation (EPC) of the synchronous 8051 microcontroller core and the asynchronous 8051 microcontroller core; the asynchronous 8051 microcontroller core features both the asynchronous and synchronous GP_I/O operating modes

Fig. 5. Measured power spectrum from 0Hz to 1GHz of (a) the synchronous 8051 microcontroller core and (b) the asynchronous 8051 microcontroller core
