1
POLITECNICO DI TORINO Dipartimento di Automatica e Informatica PhD in Computer and Control Engineering Supervisor cycle PhD Candidate: 1. Context The key success for the Internet-of-Thing (IoT) is the availability of always-on smart objects with embedded Integrated Circuits (ICs) that can process/transmit sensor data ceaseless. Due to limited budget of energy made available by small batteries, such ICs must show ultra-high energy efficiency thus to guarantee reasonable throughput. Overcoming classical low-power techniques, Speculative Computing trades Quality-of-Results (QoR) for energy exploiting error resilience of Data-intensive applications. Unlike the state-of-the-art methodologies (like Razor-based speculation), two unconventional knobs for speculative computing were explored: Timing-Speculation via Approximate Error Detection-Correction (AED-C) and Functional-Speculation via Inferential-Circuits (InfCs) for efficient Adaptive Voltage Over-Scaling (AVOS). 2. Timing-Speculation via AED-C The main intuition behind this technique [1][2][3][4] is that the availability of an AED-C mechanism represents a smart option to control the QoR-energy tradeoff. The proposed AED-C scheme leverages the resolution of in- situ timing error detection mechanisms as a knob to dynamically accelerate (Fig. 1a) or slow-down (Fig. 1b) AVOS, thus to achieve lower energy consumption or higher QoR. An Error Management Unit (EMU) implements an error-driven clock-gating enabling the error logic masking, while a Power Management Unit (PMU) uses the Error Rate (N e ) to handle voltage scaling (refer to Fig. 2). A parametric analysis on a set of representative benchmarks discloses AED-C eciency (Fig. 3), providing an assessment of the QoR-energy trade-o. Also, when applied to a real-life application, i.e., FDCT into a JPEG compressor, aggressive AED-C shows 51.9% energy savings (vs. baseline FDCT) and a PSNR of 48.5dB (vs. baseline JPEG). Worst and best quality images are shown in Fig. 4a and 4b. Energy-Efficient Speculative Computing for IoT ICs Prof. Andrea Calimera Prof. Enrico Macii XXXI 3. Functional-Speculation via InfCs InfCs, as design-level speculative architectures, are able to skip arithmetic operations inferring output values by evaluating the key features of a function learned during a training stage, i.e., Machine Learning (ML). Since InfCs leverage statistical inference engine as Classification Trees (CT), an additional stage for CT building is integrated into the standard synthesis flow (Fig. 5). Being inference faster than arithmetic computation, InfCs show larger margins to push VDD under the “safety” point, trading QoR for energy. As a case study, we implemented an Inferential Multiplier (I-MULT): it presents a 22% area savings being 2.2x faster than a Booth Mult. [5]. We tested the efficiency of I-MULT on Image Blending application, over a set of 56 blended images. I-MULT enables an aggressive VOS pushing the VDD from 1.10V to 0.64V, with 80% power savings (w.r.t. Booth Mult.) at the cost of 5.4% average error (NRMSE). 4. References 1. R.G. Rizzo et al. “Early Bird Sampling: a Short-Paths Free Error Detection- Correction Strategy for Data-Driven VOS”, Proc. VLSI-SoC, 2017 2. R.G. Rizzo et al. “Tunable Error Detection-Correction for Efficient Adaptive Voltage Over-Scaling”, Proc. NGCAS, 2017 3. R.G. Rizzo et al. “On the Efficiency of Early Bird Sampling (EBS) An Error Detection-Correction Scheme for Data-Driven Voltage Overs-Scaling”, VLSI-SoC Book, Springer, 2018 (In Press) 4. R.G. Rizzo et al. “Approximate Error Detection-Correction for Efficient Adaptive Voltage Over-Scaling”, Integration, the VLSI Journal, 2018 5. R.G. Rizzo et al. “Multiplication by Inference using Classification Trees: A Case-Study Analysis”, Proc. ISCAS, 2018 Roberto Giorgio Rizzo DW t paths T clk 1.5T clk Vdd nom Vdd Errors DW t paths T clk 1.25T clk Vdd nom Vdd Errors Miss Figure 1. AED-C Working Principle (a) Aggressive AED-C (b) Slow AED-C 19.36% 9.90% 6.60% 28.23% 14.22% 11.53% 55.38% 47.26% 45.09% Testbench1 Testbench2 Testbench3 0 10 20 30 40 50 60 Razor AED-C Slow AED-C Aggres. 0.17% 0.04% 0.01% NRMSE 50 45 40 35 30 25 20 15 DW [%Tclk] 0.6 0.8 1 Vdd avg [V] 0 0.5 1 1.5 2 NRMSE [%] Vdd avg NRMSE Figure 3. Parametric Analysis on AED-C applications (a) MAC: Vdd avg / Error avg vs. TDW (b) FIR Filter: Energy savings CLK Q ERROR 0 1 AED-C D FF Q FF D L Q L R L RST TDL Errors OR-TREE …. D MAIN TDW V delay * * * Errors Counter PMU VDD Clock-Gating REF_CLK CLK 0 1 EMU N e Figure 2. Error detection and logic masking circuitry in AED-C Truth Table Training data-set CT CT.v CT RTL Logic Synthesis & Optimization Gate-level netlist RTL Description Technology Mapping Model Extraction Standard Design Flow ML-driven Design Flow Figure 5. InfC ML-driven Design Flow through CTs (a) Baseline (b) AED-C Aggressive Figure 4. AED-C for FDCT in JPEG compression (a) PSNR=42dB (b) PSNR=52dB

Supervisor PhD in Computer and Control Engineering Prof. Andrea … · 2018. 10. 16. · POLITECNICO DI TORINO Dipartimento di Automatica e Informatica PhD in Computer and Control

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • POLITECNICODI TORINO

    Dipartimento diAutomatica e Informatica

    PhD in Computer and Control EngineeringSupervisor

    cycle

    PhD Candidate:

    1. ContextThe key success for the Internet-of-Thing (IoT) is theavailability of always-on smart objects with embeddedIntegrated Circuits (ICs) that can process/transmit sensordata ceaseless. Due to limited budget of energy madeavailable by small batteries, such ICs must show ultra-highenergy efficiency thus to guarantee reasonable throughput.Overcoming classical low-power techniques, SpeculativeComputing trades Quality-of-Results (QoR) for energyexploiting error resilience of Data-intensive applications.

    Unlike the state-of-the-art methodologies (like Razor-basedspeculation), two unconventional knobs for speculativecomputing were explored: Timing-Speculation viaApproximate Error Detection-Correction (AED-C) andFunctional-Speculation via Inferential-Circuits (InfCs) forefficient Adaptive Voltage Over-Scaling (AVOS).

    2. Timing-Speculation via AED-CThe main intuition behind this technique [1][2][3][4] is thatthe availability of an AED-C mechanism represents a smartoption to control the QoR-energy tradeoff.

    The proposed AED-C scheme leverages the resolution of in-situ timing error detection mechanisms as a knob todynamically accelerate (Fig. 1a) or slow-down (Fig. 1b) AVOS,thus to achieve lower energy consumption or higher QoR.

    An Error Management Unit (EMU) implements an error-drivenclock-gating enabling the error logic masking, while a PowerManagement Unit (PMU) uses the Error Rate (Ne) to handlevoltage scaling (refer to Fig. 2). A parametric analysis on a setof representative benchmarks discloses AED-C efficiency (Fig.3), providing an assessment of the QoR-energy trade-off. Also,when applied to a real-life application, i.e., FDCT into a JPEGcompressor, aggressive AED-C shows 51.9% energy savings(vs. baseline FDCT) and a PSNR of 48.5dB (vs. baseline JPEG).Worst and best quality images are shown in Fig. 4a and 4b.

    Energy-Efficient Speculative Computing for IoT ICs

    Prof. Andrea CalimeraProf. Enrico MaciiXXXI

    3. Functional-Speculation via InfCsInfCs, as design-level speculative architectures, are able to skiparithmetic operations inferring output values by evaluating thekey features of a function learned during a training stage, i.e.,Machine Learning (ML). Since InfCs leverage statistical inferenceengine as Classification Trees (CT), an additional stage for CTbuilding is integrated into the standard synthesis flow (Fig. 5).

    Being inference faster than arithmetic computation, InfCs showlarger margins to push VDD under the “safety” point, tradingQoR for energy. As a case study, we implemented anInferential Multiplier (I-MULT): it presents a 22% area savingsbeing 2.2x faster than a Booth Mult. [5]. We tested the efficiencyof I-MULT on Image Blending application, over a set of 56blended images. I-MULT enables an aggressive VOS pushingthe VDD from 1.10V to 0.64V, with 80% power savings (w.r.t.Booth Mult.) at the cost of 5.4% average error (NRMSE).

    4. References1. R.G. Rizzo et al. “Early Bird Sampling: a Short-Paths Free Error Detection-

    Correction Strategy for Data-Driven VOS”, Proc. VLSI-SoC, 2017

    2. R.G. Rizzo et al. “Tunable Error Detection-Correction for EfficientAdaptive Voltage Over-Scaling”, Proc. NGCAS, 2017

    3. R.G. Rizzo et al. “On the Efficiency of Early Bird Sampling (EBS) An ErrorDetection-Correction Scheme for Data-Driven Voltage Overs-Scaling”,VLSI-SoC Book, Springer, 2018 (In Press)

    4. R.G. Rizzo et al. “Approximate Error Detection-Correction for EfficientAdaptive Voltage Over-Scaling”, Integration, the VLSI Journal, 2018

    5. R.G. Rizzo et al. “Multiplication by Inference using Classification Trees: ACase-Study Analysis”, Proc. ISCAS, 2018

    Roberto Giorgio Rizzo

    DW

    t

    paths

    Tclk 1.5∙Tclk

    Vddnom Vdd ↓ Errors

    DW

    t

    paths

    Tclk 1.25∙Tclk

    Vddnom Vdd ↓ Errors

    Miss

    Figure 1. AED-C Working Principle(a) Aggressive AED-C (b) Slow AED-C

    19.36%

    9.90% 6.60%

    28.23%

    14.22%11.53%

    55.38%

    47.26%45.09%

    Testbench1 Testbench2 Testbench30

    10

    20

    30

    40

    50

    60

    RazorAED-C SlowAED-C Aggres.

    0.17% 0.04%0.01%

    NRMSE

    50 45 40 35 30 25 20 15DW [%Tclk]

    0.6

    0.8

    1

    Vdd

    avg

    [V

    ]

    0

    0.5

    1

    1.5

    2

    NR

    MS

    E [%

    ]

    Vddavg

    NRMSE

    Figure 3. Parametric Analysis on AED-C applications (a) MAC: Vddavg / Erroravg vs. TDW (b) FIR Filter: Energy savings

    CLK

    Q

    ERROR

    01

    AED-CDFF QFF

    DL QLRLRST

    TDL

    Errors OR-TREE

    ….

    D

    MAIN

    TDW

    Vdelay

    *

    *

    *Errors Counter

    PMUVDD

    Clock-Gating

    REF_CLK

    CLK

    0 1

    EMU

    Ne

    Figure 2. Error detection and logic masking circuitry in AED-C

    TruthTable

    Trainingdata-set

    CTCT.v

    CT RTL

    Logic Synthesis&

    OptimizationGate-level

    netlistRTL

    DescriptionTechnology

    Mapping

    ModelExtraction

    Stan

    dard

    Desig

    n Fl

    ow

    ML-

    driv

    en

    Desig

    n Fl

    ow

    Figure 5. InfC ML-driven Design Flow through CTs

    (a) Baseline (b) AED-C Aggressive

    Figure 18: Worst Case JPEG image

    (a) Baseline (b) AED-C Aggressive

    Figure 19: Best Case JPEG image

    compression, energy savings achieve a remarkable 51.9% (w.r.t.Baseline FDCT) with astounding QoR, PSNR 48.45 dB (w.r.t.Baseline JPEG pictures).

    References

    [1] L. Benini, G. Castelli, A. Macii, B. Macii, R. Scarai, Battery-driven dy-namic power management of portable systems, in: Proceedings 13th In-ternational Symposium on System Synthesis, 2000, pp. 25–30.

    [2] P. K. Krause, I. Polian, Adaptive voltage over-scaling for resilient ap-plications, in: 2011 Design, Automation Test in Europe, 2011, pp. 1–6.doi:10.1109/DATE.2011.5763153.

    [3] V. Peluso, R. G. Rizzo, A. Calimera, E. Macii, M. Alioto, Beyond idealDVFS through ultra-fine grain vdd-hopping, in: IFIP/IEEE InternationalConference on Very Large Scale Integration-System on a Chip, Springer,2016, pp. 152–172.

    [4] D. Ernst, S. Das, S. Lee, D. Blaauw, T. Austin, T. Mudge, N. S. Kim,K. Flautner, Razor: circuit-level correction of timing errors for low-poweroperation, IEEE Micro 24 (6) (2004) 10–20.

    [5] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler,D. Blaauw, T. Austin, K. Flautner, et al., Razor: A low-power pipelinebased on circuit-level timing speculation, in: Microarchitecture, 2003.MICRO-36. Proceedings. 36th Annual IEEE/ACM International Sympo-sium on, IEEE, 2003, pp. 7–18.

    [6] S. Das, D. Roberts, S. Lee, S. Pant, D. Blaauw, T. Austin, K. Flautner,T. Mudge, A self-tuning dvs processor using delay-error detection andcorrection, IEEE Journal of Solid-State Circuits 41 (4) (2006) 792–804.

    [7] S. Das, C. Tokunaga, S. Pant, W.-H. Ma, S. Kalaiselvan, K. Lai, D. M.Bull, D. T. Blaauw, RazorII: In situ error detection and correction for pvtand ser tolerance, IEEE Journal of Solid-State Circuits 44 (1) (2009) 32–48.

    [8] S. Valadimas, Y. Tsiatouhas, A. Arapoyanni, Timing error tolerance innanometer ICs, in: On-Line Testing Symposium (IOLTS), 2010 IEEE16th International, IEEE, 2010, pp. 283–288.

    [9] R. G. Rizzo, V. Peluso, A. Calimera, J. Zhou, X. Liu, Early bird sampling:a short-paths free error detection-correction strategy for data-driven VOS,in: International Conference on Very Large Scale Integration (VLSI-SoC)), 2017 IEEE 25th International, IEEE, 2017.

    [10] G. Karakonstantis, K. Roy, Voltage over-scaling: A cross-layer designperspective for energy e�cient systems, in: Circuit Theory and Design(ECCTD), 2011 20th European Conference on, IEEE, 2011, pp. 548–551.

    [11] K. A. Bowman, J. W. Tschanz, N. S. Kim, J. C. Lee, C. B. Wilkerson, S.-L. L. Lu, T. Karnik, V. K. De, Energy-e�cient and metastability-immuneresilient circuits for dynamic variation tolerance, IEEE Journal of Solid-State Circuits 44 (1) (2009) 49–63.

    [12] K. A. Bowman, J. W. Tschanz, S.-L. L. Lu, P. A. Aseron, M. M. Khel-lah, A. Raychowdhury, B. M. Geuskens, C. Tokunaga, C. B. Wilkerson,T. Karnik, et al., A 45 nm resilient microprocessor core for dynamic vari-ation tolerance, IEEE Journal of Solid-State Circuits 46 (1) (2011) 194–208.

    [13] S. Ghosh, S. Bhunia, K. Roy, Crista: A new paradigm for low-power,variation-tolerant, and adaptive circuit synthesis using critical path isola-tion, IEEE Transactions on Computer-Aided Design of Integrated Circuitsand Systems 26 (11) (2007) 1947–1956.

    [14] D. Mohapatra, G. Karakonstantis, K. Roy, Low-power process-variationtolerant arithmetic units using input-based elastic clocking, in: Proceed-ings of the 2007 international symposium on Low power electronics anddesign, ACM, 2007, pp. 74–79.

    [15] J. Carmona, J. Cortadella, M. Kishinevsky, A. Taubin, Elastic circuits,IEEE Transactions on Computer-Aided Design of Integrated Circuits andSystems 28 (10) (2009) 1437–1455.

    [16] B. Shim, S. R. Sridhara, N. R. Shanbhag, Reliable low-power digital sig-nal processing via reduced precision redundancy, IEEE Transactions onVery Large Scale Integration (VLSI) Systems 12 (5) (2004) 497–510.

    [17] D. J. Pagliari, A. Calimera, E. Macii, M. Poncino, An automated designflow for approximate circuits based on reduced precision redundancy, in:Computer Design (ICCD), 2015 33rd IEEE International Conference on,IEEE, 2015, pp. 86–93.

    [18] A. B. Kahng, S. Kang, R. Kumar, J. Sartori, Slack redistribution forgraceful degradation under voltage overscaling, in: Proceedings of the2010 Asia and South Pacific Design Automation Conference, IEEE Press,2010, pp. 825–831.

    [19] A. Calimera, R. I. Bahar, E. Macii, M. Poncino, Temperature-insensitivedual- vrmth synthesis for nanometer cmos technologies under inverse tem-perature dependence, IEEE Transactions on Very Large Scale Integration(VLSI) Systems 18 (11) (2010) 1608–1620.

    [20] R. Vattikonda, W. Wang, Y. Cao, Modeling and minimization of pmosnbti e↵ect for robust nanometer design, in: Proceedings of the 43rd annualDesign Automation Conference, ACM, 2006, pp. 1047–1052.

    [21] R. G. Rizzo, A. Calimera, Tunable error detection-correction fo e�cientvoltage over-scaling, in: New Generation of Circuits and Systems Con-ference (NGCAS), 2017 IEEE 1st International, IEEE, 2017.

    [22] S. Kim, M. Seok, Variation-tolerant, ultra-low-voltage microprocessorwith a low-overhead, within-a-cycle in-situ timing-error detection andcorrection technique, IEEE Journal of Solid-State Circuits 50 (6) (2015)1478–1490.

    [23] I. Kwon, S. Kim, D. Fick, M. Kim, Y.-P. Chen, D. Sylvester, Razor-lite: alight-weight register for error detection by observing virtual supply rails,IEEE Journal of Solid-State Circuits 49 (9) (2014) 2054–2066.

    [24] Y.-M. Yang, I. H.-R. Jiang, S.-T. Ho, Pushpull: Short-path padding fortiming error resilient circuits, IEEE Transactions on Computer-Aided De-sign of Integrated Circuits and Systems 33 (4) (2014) 558–570.

    [25] S. Das, G. S. Dasika, K. Shivashankar, D. Bull, A 1 GHz hardware loop-accelerator with razor-based dynamic adaptation for energy-e�cient op-eration, IEEE Transactions on Circuits and Systems I: Regular Papers61 (8) (2014) 2290–2298.

    [26] M. Fojtik, D. Fick, Y. Kim, N. Pinckney, D. M. Harris, D. Blaauw,D. Sylvester, Bubble razor: Eliminating timing margins in an arm cortex-m3 processor in 45 nm cmos using architecturally independent error de-tection and correction, IEEE Journal of Solid-State Circuits 48 (1) (2013)66–81.

    [27] K. A. Bowman, et al., Energy-e�cient and metastability-immune resilientcircuits for dynamic variation tolerance, IEEE Journal of Solid-State Cir-cuits 44 (1) (2009) 49–63.

    [28] J. Zhou, X. Liu, Y.-H. Lam, C. Wang, K.-H. Chang, J. Lan, M. Je,Hepp: A new in-situ timing-error prediction and prevention techniquefor variation-tolerant ultra-low-voltage designs, in: Solid-State CircuitsConference (A-SSCC), 2013 IEEE Asian, IEEE, 2013, pp. 129–132.

    [29] A. Chakraborty, K. Duraisami, A. Sathanur, P. Sithambaram, L. Benini,A. Macii, E. Macii, M. Poncino, Dynamic thermal clock skew compensa-tion using tunable delay bu↵ers, IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems 16 (6) (2008) 639–649.

    [30] A. Calimera, A. Pullini, A. V. Sathanur, L. Benini, A. Macii, E. Macii,M. Poncino, Design of a family of sleep transistor cells for a clusteredpower-gating flow in 65nm technology, in: Proceedings of the 17th ACMGreat Lakes symposium on VLSI, ACM, 2007, pp. 501–504.

    12

    (a) Baseline (b) AED-C Aggressive

    Figure 18: Worst Case JPEG image

    (a) Baseline (b) AED-C Aggressive

    Figure 19: Best Case JPEG image

    compression, energy savings achieve a remarkable 51.9% (w.r.t.Baseline FDCT) with astounding QoR, PSNR 48.45 dB (w.r.t.Baseline JPEG pictures).

    References

    [1] L. Benini, G. Castelli, A. Macii, B. Macii, R. Scarai, Battery-driven dy-namic power management of portable systems, in: Proceedings 13th In-ternational Symposium on System Synthesis, 2000, pp. 25–30.

    [2] P. K. Krause, I. Polian, Adaptive voltage over-scaling for resilient ap-plications, in: 2011 Design, Automation Test in Europe, 2011, pp. 1–6.doi:10.1109/DATE.2011.5763153.

    [3] V. Peluso, R. G. Rizzo, A. Calimera, E. Macii, M. Alioto, Beyond idealDVFS through ultra-fine grain vdd-hopping, in: IFIP/IEEE InternationalConference on Very Large Scale Integration-System on a Chip, Springer,2016, pp. 152–172.

    [4] D. Ernst, S. Das, S. Lee, D. Blaauw, T. Austin, T. Mudge, N. S. Kim,K. Flautner, Razor: circuit-level correction of timing errors for low-poweroperation, IEEE Micro 24 (6) (2004) 10–20.

    [5] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler,D. Blaauw, T. Austin, K. Flautner, et al., Razor: A low-power pipelinebased on circuit-level timing speculation, in: Microarchitecture, 2003.MICRO-36. Proceedings. 36th Annual IEEE/ACM International Sympo-sium on, IEEE, 2003, pp. 7–18.

    [6] S. Das, D. Roberts, S. Lee, S. Pant, D. Blaauw, T. Austin, K. Flautner,T. Mudge, A self-tuning dvs processor using delay-error detection andcorrection, IEEE Journal of Solid-State Circuits 41 (4) (2006) 792–804.

    [7] S. Das, C. Tokunaga, S. Pant, W.-H. Ma, S. Kalaiselvan, K. Lai, D. M.Bull, D. T. Blaauw, RazorII: In situ error detection and correction for pvtand ser tolerance, IEEE Journal of Solid-State Circuits 44 (1) (2009) 32–48.

    [8] S. Valadimas, Y. Tsiatouhas, A. Arapoyanni, Timing error tolerance innanometer ICs, in: On-Line Testing Symposium (IOLTS), 2010 IEEE16th International, IEEE, 2010, pp. 283–288.

    [9] R. G. Rizzo, V. Peluso, A. Calimera, J. Zhou, X. Liu, Early bird sampling:a short-paths free error detection-correction strategy for data-driven VOS,in: International Conference on Very Large Scale Integration (VLSI-SoC)), 2017 IEEE 25th International, IEEE, 2017.

    [10] G. Karakonstantis, K. Roy, Voltage over-scaling: A cross-layer designperspective for energy e�cient systems, in: Circuit Theory and Design(ECCTD), 2011 20th European Conference on, IEEE, 2011, pp. 548–551.

    [11] K. A. Bowman, J. W. Tschanz, N. S. Kim, J. C. Lee, C. B. Wilkerson, S.-L. L. Lu, T. Karnik, V. K. De, Energy-e�cient and metastability-immuneresilient circuits for dynamic variation tolerance, IEEE Journal of Solid-State Circuits 44 (1) (2009) 49–63.

    [12] K. A. Bowman, J. W. Tschanz, S.-L. L. Lu, P. A. Aseron, M. M. Khel-lah, A. Raychowdhury, B. M. Geuskens, C. Tokunaga, C. B. Wilkerson,T. Karnik, et al., A 45 nm resilient microprocessor core for dynamic vari-ation tolerance, IEEE Journal of Solid-State Circuits 46 (1) (2011) 194–208.

    [13] S. Ghosh, S. Bhunia, K. Roy, Crista: A new paradigm for low-power,variation-tolerant, and adaptive circuit synthesis using critical path isola-tion, IEEE Transactions on Computer-Aided Design of Integrated Circuitsand Systems 26 (11) (2007) 1947–1956.

    [14] D. Mohapatra, G. Karakonstantis, K. Roy, Low-power process-variationtolerant arithmetic units using input-based elastic clocking, in: Proceed-ings of the 2007 international symposium on Low power electronics anddesign, ACM, 2007, pp. 74–79.

    [15] J. Carmona, J. Cortadella, M. Kishinevsky, A. Taubin, Elastic circuits,IEEE Transactions on Computer-Aided Design of Integrated Circuits andSystems 28 (10) (2009) 1437–1455.

    [16] B. Shim, S. R. Sridhara, N. R. Shanbhag, Reliable low-power digital sig-nal processing via reduced precision redundancy, IEEE Transactions onVery Large Scale Integration (VLSI) Systems 12 (5) (2004) 497–510.

    [17] D. J. Pagliari, A. Calimera, E. Macii, M. Poncino, An automated designflow for approximate circuits based on reduced precision redundancy, in:Computer Design (ICCD), 2015 33rd IEEE International Conference on,IEEE, 2015, pp. 86–93.

    [18] A. B. Kahng, S. Kang, R. Kumar, J. Sartori, Slack redistribution forgraceful degradation under voltage overscaling, in: Proceedings of the2010 Asia and South Pacific Design Automation Conference, IEEE Press,2010, pp. 825–831.

    [19] A. Calimera, R. I. Bahar, E. Macii, M. Poncino, Temperature-insensitivedual- vrmth synthesis for nanometer cmos technologies under inverse tem-perature dependence, IEEE Transactions on Very Large Scale Integration(VLSI) Systems 18 (11) (2010) 1608–1620.

    [20] R. Vattikonda, W. Wang, Y. Cao, Modeling and minimization of pmosnbti e↵ect for robust nanometer design, in: Proceedings of the 43rd annualDesign Automation Conference, ACM, 2006, pp. 1047–1052.

    [21] R. G. Rizzo, A. Calimera, Tunable error detection-correction fo e�cientvoltage over-scaling, in: New Generation of Circuits and Systems Con-ference (NGCAS), 2017 IEEE 1st International, IEEE, 2017.

    [22] S. Kim, M. Seok, Variation-tolerant, ultra-low-voltage microprocessorwith a low-overhead, within-a-cycle in-situ timing-error detection andcorrection technique, IEEE Journal of Solid-State Circuits 50 (6) (2015)1478–1490.

    [23] I. Kwon, S. Kim, D. Fick, M. Kim, Y.-P. Chen, D. Sylvester, Razor-lite: alight-weight register for error detection by observing virtual supply rails,IEEE Journal of Solid-State Circuits 49 (9) (2014) 2054–2066.

    [24] Y.-M. Yang, I. H.-R. Jiang, S.-T. Ho, Pushpull: Short-path padding fortiming error resilient circuits, IEEE Transactions on Computer-Aided De-sign of Integrated Circuits and Systems 33 (4) (2014) 558–570.

    [25] S. Das, G. S. Dasika, K. Shivashankar, D. Bull, A 1 GHz hardware loop-accelerator with razor-based dynamic adaptation for energy-e�cient op-eration, IEEE Transactions on Circuits and Systems I: Regular Papers61 (8) (2014) 2290–2298.

    [26] M. Fojtik, D. Fick, Y. Kim, N. Pinckney, D. M. Harris, D. Blaauw,D. Sylvester, Bubble razor: Eliminating timing margins in an arm cortex-m3 processor in 45 nm cmos using architecturally independent error de-tection and correction, IEEE Journal of Solid-State Circuits 48 (1) (2013)66–81.

    [27] K. A. Bowman, et al., Energy-e�cient and metastability-immune resilientcircuits for dynamic variation tolerance, IEEE Journal of Solid-State Cir-cuits 44 (1) (2009) 49–63.

    [28] J. Zhou, X. Liu, Y.-H. Lam, C. Wang, K.-H. Chang, J. Lan, M. Je,Hepp: A new in-situ timing-error prediction and prevention techniquefor variation-tolerant ultra-low-voltage designs, in: Solid-State CircuitsConference (A-SSCC), 2013 IEEE Asian, IEEE, 2013, pp. 129–132.

    [29] A. Chakraborty, K. Duraisami, A. Sathanur, P. Sithambaram, L. Benini,A. Macii, E. Macii, M. Poncino, Dynamic thermal clock skew compensa-tion using tunable delay bu↵ers, IEEE Transactions on Very Large ScaleIntegration (VLSI) Systems 16 (6) (2008) 639–649.

    [30] A. Calimera, A. Pullini, A. V. Sathanur, L. Benini, A. Macii, E. Macii,M. Poncino, Design of a family of sleep transistor cells for a clusteredpower-gating flow in 65nm technology, in: Proceedings of the 17th ACMGreat Lakes symposium on VLSI, ACM, 2007, pp. 501–504.

    12

    Figure 4. AED-C for FDCT in JPEG compression

    (a) PSNR=42dB (b) PSNR=52dB