CONFIDENTIAL | MODERN 1st Year Review | June 30, 2010
WP4: Relationship between workpackages
WP4 Presentation Outline
Resources and funding
Deliverables in the review period
Task Structure
Task activity review
Summary
Jun. 22, 2010
WP4 Resources and Funding
| Partner | Total effort (m/m) | 2009 effort (m/m) | 2010 effort, planned (m/m) | Contract signed | Advance payment |
|---|---|---|---|---|---|
| LETI | 36 | 11.5 (9 T4.1 / 2.5 T4.2) | 12 (8 T4.1 / 4 T4.2) | Y | Y |
| CSEM | 24 | 0 (T4.2) | 12 (T4.2) | N | N |
| TMPO | 57.6 | 18.5 (16 T4.2 / 2.5 T4.4) | 22.3 (15.8 T4.2 / 6.5 T4.4) | Y | Y |
| ELX | 36 | 12 (T4.2) | 12 (T4.2) | Y | Y |
| TEKL | 12 | 9 (T4.2) | 3 (T4.2) | N | Y |
| ST I | 72 | 20 (4 T4.2 / 16 T4.4) | 30 (9 T4.2 / 21 T4.4) | N | N |
| ST F | 66 | 5.5 (T4.3) | 30 (T4.3) | Y | Y |
| ISD | 123 | 55 (T4.3) | 40 (T4.3) | Y | N |
| LIRM | 30 | 14 (T4.5) | 16 (T4.5) | Y | Y |
| NMX | 30 | 5 (T4.3) | 12 (T4.3) | N | N |
| THL | 60 | 14 (13 T4.3 / 1 T4.5) | 15 (12 T4.3 / 3 T4.5) | Y | Y |
| UPC | 42 | 12 (3 T4.1 / 9 T4.4) | 15 (10 T4.1 / 5 T4.4) | Y | Y |
| UNBO | 23 | 8 (T4.4) | 8 (T4.4) | N | N |
| POLI | 36 | 15.3 (T4.2) | 12 (T4.2) | N | N |
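The effort figures above can be cross-checked with a few lines of Python. This is a minimal sketch: the data are transcribed from the table, and the "remaining" figure assumes that any effort not spent in 2009 or planned for 2010 falls in the last project year, which the table itself does not state.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PartnerEffort:
    partner: str
    total: float   # person-months over the whole project
    y2009: dict    # task -> person-months spent in 2009
    y2010: dict    # task -> person-months planned for 2010


# Figures transcribed from the WP4 resources table.
TABLE = [
    PartnerEffort("LETI", 36, {"T4.1": 9, "T4.2": 2.5}, {"T4.1": 8, "T4.2": 4}),
    PartnerEffort("CSEM", 24, {"T4.2": 0}, {"T4.2": 12}),
    PartnerEffort("TMPO", 57.6, {"T4.2": 16, "T4.4": 2.5}, {"T4.2": 15.8, "T4.4": 6.5}),
    PartnerEffort("ELX", 36, {"T4.2": 12}, {"T4.2": 12}),
    PartnerEffort("TEKL", 12, {"T4.2": 9}, {"T4.2": 3}),
    PartnerEffort("ST I", 72, {"T4.2": 4, "T4.4": 16}, {"T4.2": 9, "T4.4": 21}),
    PartnerEffort("ST F", 66, {"T4.3": 5.5}, {"T4.3": 30}),
    PartnerEffort("ISD", 123, {"T4.3": 55}, {"T4.3": 40}),
    PartnerEffort("LIRM", 30, {"T4.5": 14}, {"T4.5": 16}),
    PartnerEffort("NMX", 30, {"T4.3": 5}, {"T4.3": 12}),
    PartnerEffort("THL", 60, {"T4.3": 13, "T4.5": 1}, {"T4.3": 12, "T4.5": 3}),
    PartnerEffort("UPC", 42, {"T4.1": 3, "T4.4": 9}, {"T4.1": 10, "T4.4": 5}),
    PartnerEffort("UNBO", 23, {"T4.4": 8}, {"T4.4": 8}),
    PartnerEffort("POLI", 36, {"T4.2": 15.3}, {"T4.2": 12}),
]


def remaining(row: PartnerEffort) -> float:
    """Effort left after 2010, assuming total = 2009 + 2010 + remainder."""
    return round(row.total - sum(row.y2009.values()) - sum(row.y2010.values()), 1)


def per_task(year: str) -> dict:
    """Sum person-months per task for year 'y2009' or 'y2010'."""
    totals = defaultdict(float)
    for row in TABLE:
        for task, pm in getattr(row, year).items():
            totals[task] += pm
    return {task: round(pm, 1) for task, pm in sorted(totals.items())}


if __name__ == "__main__":
    for row in TABLE:
        print(f"{row.partner:5s} remaining after 2010: {remaining(row):5.1f} m/m")
    print("2009 per task:", per_task("y2009"))
    print("2010 per task:", per_task("y2010"))
```

Keeping the per-task split as a dictionary per year makes it easy to check that each partner's sub-task figures add up to the yearly totals shown in the table.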
WP4 Meetings
Face-to-face meetings
– ST I, Agrate Brianza, Italy, April 3rd, 2009
  • Participants: all WP4 partners
Web meetings
– November 20th, 2009
  • Participants: all WP4 partners
– February 26th, 2010
  • Participants: all WP4 task leaders and several partners
– May 21st, 2010
  • Participants: all WP4 partners
WP4 Deliverables in the Review Period (1/2)
| No. | Planned | Status | Explanation |
|---|---|---|---|
| D4.1.1 | M24 | On track | Reports on PV-aware (self-)adaptive compensation and optimization techniques, including on-chip monitors |
| D4.1.2 | M30 | | Tape-out of prototype on-chip sensors and level shifter circuits for (self-)adaptive design |
| D4.1.3 | M36 | | Report on trade-off metrics for (self-)adaptive compensation and optimization techniques |
| D4.2.1 | M12 | Delivered | Reports on PV-tolerant asynchronous blocks and on ultra-low-power circuits/architectures. Prototype asynchronous/de-synchronization flow |
| D4.2.2 | M24 | On track | Reports on PV-tolerant noise and EMI reduction techniques, and on asynchronous and de-synchronized communication scheme benchmarking |
| D4.2.3 | M24 | On track | Advanced asynchronous/de-synchronization flow. Delivery of the first de-synchronized design, and ultra-low-power circuits/architectures |
| D4.2.4 | M36 | | Report on PV-tolerant architectures and circuit performance analysis and current profile estimation. Synthesis and simulation of ultra-low-power circuits/architectures |
| D4.2.5 | M36 | | High-level asynchronous synthesis tool and exploitation on high-performance advanced industrial de-synchronized design. Advanced power shaping methodology and tool for low-EMI design |
WP4 Deliverables in the Review Period (2/2)
| No. | Planned | Status | Explanation |
|---|---|---|---|
| D4.3.1 | M12 | Delivered | Robust architecture design specification, and SystemC model for a multi-core SoC virtual platform |
| D4.3.2 | M24 | On track | High-level models for robust and predictable blocks and architectures, also including NVM design, and robustness assessment report |
| D4.3.3 | M24 | On track | Functional and test specs for a validated controller for ADC and PLL components. Fault-tolerant on-chip global communication scheme on a multi-core SoC virtual platform |
| D4.3.4 | M36 | | Validated macro blocks for controllers implementation on the multi-core SoC virtual platform. Report on signal coding for robust NVM design |
| D4.4.1 | M24 | On track | Report on yield prediction tool and regular structures for PV-tolerant blocks |
| D4.4.2 | M24 | On track | Report on customizable and regular architectures for homogeneous multi-threading and signal processing. Design flow for mapping on mask-programmable blocks |
| D4.4.3 | M30 | | Tape-out of a chip based on regular transistor arrays |
| D4.4.4 | M36 | | Exploitation of the design flow for signal processing application mapping on the proposed regular architectures. Report on regular design impact on yield improvement |
| D4.5.1 | M24 | On track | Report on programming methods and tools for PV-tolerant, reliable, and predictable MPSoC architectures |
WP4 Task Structure
Task T4.1: Variability-aware design
– Partners: LETI, UPC, ST F
– Definition and development of (self-)adaptive compensation and optimization techniques to cope with the increasing impact of process variations (PV)
– New adaptive voltage and frequency scaling (AVFS) techniques, which can be exploited either after testing or at run-time, will be developed
Task T4.2: Variation-tolerant, robust, low-noise and low-EMI architectures/micro-architectures
– Partners: ELX, CSEM, TMPO, LETI, POLI, ST I, TEKL
– Development and design of advanced macro-blocks for robust and reliable systems
– Adaptive architectures based on asynchronous and de-synchronization techniques
– On-chip communication schemes (GALS paradigm)
– Synthesis of PV-tolerant asynchronous/de-synchronized functional blocks and architectures for low-EMI design
– Design of ultra-low-power applications
WP4 Task Structure
Task T4.3: Design of reliable systems
– Partners: ISD, THL, NMX, ST F
– Design of highly reliable analog, mixed-mode, digital, and Non-Volatile Memory (NVM) systems built on unreliable foundations subject to large PV and degradation
Task T4.4: Design of regular architectures and circuits for high manufacturability and yield
– Partners: ST I, TMPO, UPC, UNBO
– Development of customizable circuits, macro-blocks, and architectures based on regular structures, in order to improve manufacturability and predictability
Task T4.5: Distributed reconfigurable PV-robust architectures
– Partners: THL, LIRM
– Development of MPSoC design and distributed and reconfigurable PV-tolerant architectures
– Programming methods and tools for predictable and PV-robust computing architectures
T4.1: Local Adaptive Voltage and Frequency Scaling (LETI, STF)
A Local Adaptive Voltage and Frequency Scaling approach is proposed:
- allowing local variability management
- requiring isolated voltage islands
- requiring isolated frequency islands
Variability is managed at fine grain by dynamically tuning V/F (T4.1) according to on-chip diagnostics (WP3). A Globally Asynchronous, Locally Synchronous (GALS) architecture is proposed (T4.2).
[Figure: local variability control loop for a digital block, with sensors (S), actuators (K), a parameter control circuit and a decision maker; diagnostic and action paths]
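The sensor / decision maker / actuator loop can be illustrated with a small closed-loop adaptive voltage scaling sketch for one voltage island. Everything numeric here is hypothetical: `path_delay_ps` is a toy delay model standing in for an on-chip timing sensor, and the threshold voltage, delay constants, voltage grid and margin are illustrative values, not LETI or ST data. The decision maker lowers Vdd while the sensed slack stays above a margin and raises it when the margin is violated, so each island settles at the lowest grid voltage that meets its own timing.

```python
VTH = 0.35  # hypothetical threshold voltage (V) for the toy delay model


def path_delay_ps(vdd: float, k: float) -> float:
    """Toy critical-path delay (ps); a larger k mimics a slower island."""
    return k / (vdd - VTH)


def adapt_voltage(f_mhz: float, k: float, v_mv: int = 1100, step_mv: int = 25,
                  v_min_mv: int = 600, v_max_mv: int = 1200,
                  margin_ps: float = 20.0, max_iter: int = 100) -> tuple:
    """Decision maker for one voltage island at a fixed target frequency.

    Returns (settled Vdd in mV, whether the timing margin is met).
    Voltages are kept in integer millivolts to avoid float drift on the grid.
    """
    period_ps = 1e6 / f_mhz

    def slack(mv: int) -> float:
        return period_ps - path_delay_ps(mv / 1000, k)   # "sensor" reading

    for _ in range(max_iter):
        if slack(v_mv) < margin_ps:
            if v_mv + step_mv > v_max_mv:
                break  # timing cannot be met: a real controller would lower F instead
            v_mv += step_mv                       # actuator: raise Vdd
        elif v_mv - step_mv >= v_min_mv and slack(v_mv - step_mv) >= margin_ps:
            v_mv -= step_mv                       # actuator: trade spare margin for power
        else:
            break                                 # lowest grid voltage meeting the margin
    return v_mv, slack(v_mv) >= margin_ps


# Two islands at 500 MHz: the slower one (k=700) settles at a higher Vdd.
print(adapt_voltage(500, 600))   # (675, True)
print(adapt_voltage(500, 700))   # (725, True)
```

Because each island runs its own loop, a slow island gets a higher supply while a fast one keeps its power low, which is the point of managing variability locally rather than with a single chip-wide margin.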
T4.1: Design of efficient Level Shifters (UPC, LETI)
► Isolated voltage islands require efficient level shifters, developed by UPC:
- Early work was completed with some delay because first-year funding was not available
- A test-chip design is planned for the second year so that deliverables will be completed on time
► Dynamic tuning of voltage and frequency requires new local actuators, developed by LETI, which:
- reduce dynamic power through DVFS
- act as a regulator, using an adaptive technique to trade timing margins against the power budget
[Figure: static + dynamic power (Pstat + dyn) and clock frequency (Fclk) under DFS and DVS, with a reduced clock frequency (Fclk_reduite) and operating points {Vhigh, Fhigh}, {Vlow, Flow} and {Vlow, 0}]
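The DFS/DVS comparison can be made concrete with the usual CMOS power model, P_dyn = α·C·V²·f, plus a static term. The sketch below uses hypothetical values (1 nF effective switched capacitance, 10 mW static power, 1.1 V / 500 MHz and 0.8 V / 250 MHz operating points) to compute the energy of a fixed workload; the clock-stopped {Vlow, 0} point is not modelled. Lowering only the frequency (DFS) reduces power but not the dynamic energy per task, and it stretches the static energy, whereas lowering the voltage as well (DVS/DVFS) cuts dynamic energy quadratically.

```python
def task_energy_uj(vdd: float, f_mhz: float, cycles: float,
                   c_eff_nf: float = 1.0, p_static_mw: float = 10.0,
                   activity: float = 1.0) -> float:
    """Energy (µJ) to run `cycles` clock cycles at one operating point.

    P_dyn = activity * C * V^2 * f ; with C in nF and f in MHz this is in mW.
    All default values are hypothetical, for illustration only.
    """
    p_dyn_mw = activity * c_eff_nf * vdd ** 2 * f_mhz
    runtime_ms = cycles / (f_mhz * 1e3)           # f MHz = f * 1e3 cycles per ms
    return (p_dyn_mw + p_static_mw) * runtime_ms  # mW * ms = µJ


CYCLES = 1e6
points = {
    "{Vhigh, Fhigh}": (1.1, 500.0),
    "DFS  {Vhigh, Flow}": (1.1, 250.0),
    "DVFS {Vlow, Flow}": (0.8, 250.0),
}
for name, (v, f) in points.items():
    print(f"{name:20s} {task_energy_uj(v, f, CYCLES):8.1f} µJ")
```

With these numbers the workload costs about 1230 µJ at {Vhigh, Fhigh}, about 1250 µJ with DFS alone, and about 680 µJ with DVFS.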
T4.2: Architectures to mitigate PV (CSEM)
Block architecture: for an adder, which architecture (e.g., ripple carry, carry look-ahead, carry select) and which VDD best reduce the effect of PV?
At 500 mV, the RCA is about 2X slower than the CSL, but the σ/μ of its delay is about 28% smaller, owing to its longer critical path.
However, comparing the CSL at 500 mV with the RCA at 600 mV, the RCA is about 4% faster, needs about 6% less energy per operation, and is 1.8X less sensitive to intra-die device-to-device process variations.
Comparing the RCA at 500 mV with the RCA at 600 mV shows that the higher VDD alone reduces σ/μ by about 21% (from 5.8% to 4.6%).
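Effect of power supply on circuit variability (EPO and delay normalized to CSL@500mV)
| | CSL@500mV | RCA@500mV | RCA@600mV |
|---|---|---|---|
| EPO | 1 | 0.67 | 0.94 |
| Delay | 1 | 2.09 | 0.96 |
| σ/μ (delay) | 8.1% | 5.8% | 4.6% |
CSL: Carry Select Adder; RCA: Ripple Carry Adder; EPO: Energy Per Operation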
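The argument that a longer critical path averages out random intra-die variation can be checked with a Monte Carlo sketch. It assumes each gate delay is an independent Gaussian (a simplification: real variability also has correlated, systematic components), so the relative spread σ/μ of an N-stage path shrinks roughly as 1/√N. The gate delay and sigma values are hypothetical, and the sketch shows the trend only; it does not reproduce the CSEM measurements.

```python
import random
import statistics


def path_delay_cv(n_stages: int, gate_mu: float = 1.0, gate_sigma: float = 0.2,
                  trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of sigma/mu for a chain of n_stages gates with
    independent Gaussian delays (hypothetical, normalized units)."""
    rng = random.Random(seed)
    samples = [sum(rng.gauss(gate_mu, gate_sigma) for _ in range(n_stages))
               for _ in range(trials)]
    return statistics.pstdev(samples) / statistics.fmean(samples)


for n in (1, 4, 16):
    # for independent stages the expected value is (gate_sigma / gate_mu) / sqrt(n)
    print(f"{n:3d} stages: sigma/mu = {path_delay_cv(n):.3f} "
          f"(1/sqrt(N) prediction {0.2 / n ** 0.5:.3f})")
```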
T4.2: Desynchronization flow and EMI reduction (ELX, POLI, ST I) 1/2
Set up and tested an automatic flow for de-synchronization
– Inherits the properties of asynchronous circuits with little effort
– Uses a mix of EDA vendors
  • Magma for the back end; Synopsys for the front end and sign-off
Applied the paradigm to EMI reduction
– Analyzed the supply current to estimate the EMI improvement
– To further improve EMI reduction, multiplexed delay lines were used to introduce local clock jitter in the elastic clocks
Fully integrated de-synchronization into the ST implementation flow:
– SOC Encounter for physical design
– PrimeTime for sign-off
– Apache RedHawk for power-rail analysis
Tested on two designs:
– An H.264 video encoder circuit
– An ST7 8-bit microcontroller
– 10-15 dB EMI improvement
– Increased robustness
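The effect of the multiplexed delay lines can be illustrated with a simple model: if each clock edge is shifted by a delay picked from a set of taps, the coherent spectral line at harmonic k of the clock is scaled by the magnitude of the average phasor exp(-j2πk·d/T) over the chosen delays, and the removed energy is spread into a lower, broadband floor. The tap values below are hypothetical, and the model covers only that coherent line, so it is not a reproduction of the reported 10-15 dB improvement.

```python
import cmath
import math
import random


def pick_delays(taps, n_edges, seed=0):
    """Emulate a multiplexed delay line: each edge uses a randomly selected tap."""
    rng = random.Random(seed)
    return [rng.choice(taps) for _ in range(n_edges)]


def harmonic_attenuation_db(delays, period, harmonic):
    """Attenuation (dB) of the clock's spectral line at `harmonic` * f_clk when
    successive edges are shifted by `delays` (same time unit as `period`)."""
    phasor = sum(cmath.exp(-2j * math.pi * harmonic * d / period)
                 for d in delays) / len(delays)
    return -20 * math.log10(max(abs(phasor), 1e-12))  # guard against log10(0)


# Hypothetical: 8 taps spaced by 5 % of the clock period.
TAPS = [0.05 * m for m in range(8)]
edges = pick_delays(TAPS, 10000)
for k in (1, 2, 3):
    print(f"harmonic {k}: {harmonic_attenuation_db(edges, 1.0, k):.1f} dB")
```

The attenuation grows with the harmonic index, because a given time shift is a larger phase shift at higher frequency.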
T4.2: Desynchronization flow and EMI reduction (ELX, POLI, ST I) 2/2
[Figure: Elastix design flow alongside the customer flow.
Customer flow: RTL, Synthesis, Floorplan/Placement, CTS, Routing, Chip finish, ECO, Sign-off.
Elastix design flow: identify regions to elasticize (from RTL and UPF), Elastix circuit transformations, delay line synthesis (multi-corner), Elastix timing closure, using an STA tool DB.
Data exchanged: voltage domains, netlist, SDC, DEF, TCL scripts, timing reports.]
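The "delay line synthesis (multi-corner)" step can be pictured as a sizing problem: a matched delay line must cover the critical path of its elastic region, with margin, in every corner. The corner names, delays and margin below are hypothetical and this is not the Elastix algorithm; the point is that because logic and delay elements need not scale identically across corners, the corner that sets the line length is not necessarily the slowest one.

```python
import math


def size_delay_line(path_delay_ps: dict, element_delay_ps: dict, margin: float = 1.1) -> int:
    """Smallest number of delay elements whose total delay is >= margin * path delay
    in every corner (hypothetical per-corner data)."""
    if path_delay_ps.keys() != element_delay_ps.keys():
        raise ValueError("path and element delays must cover the same corners")
    return max(math.ceil(margin * path_delay_ps[c] / element_delay_ps[c])
               for c in path_delay_ps)


paths = {"ss": 1200.0, "tt": 800.0, "ff": 550.0}   # region critical path per corner (ps)
elements = {"ss": 70.0, "tt": 45.0, "ff": 35.0}    # one delay element per corner (ps)
n = size_delay_line(paths, elements)
print(n, {c: n * elements[c] for c in elements})   # 20 elements; 'tt' is the limiting corner
```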
T4.2: Variability-tolerant low-EMI asynchronous circuits (TMPO)
Tiempo's contribution is to enable the design of variability-tolerant, low-EMI asynchronous circuits and to evaluate/predict their EMC behavior at design time.
Tiempo first-year achievements:
► Set up a flow to design PVT-tolerant asynchronous cells
► Set up a flow to estimate current consumption profile and estimate EMI
► Tiempo demonstrated the flows on its asynchronous AES ciphering IP
[Figure: AES asynchronous circuit current curve (extracted post-P&R) and the corresponding current spectrum, below -30 dB]
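Estimating EMI from a current profile, as in the Tiempo flow, amounts to taking the spectrum of the sampled supply-current waveform. The sketch below uses a plain DFT on a uniformly sampled trace; the sample spacing, the test waveform and the dB reference (1 unit of current) are assumptions, and it is not Tiempo's tool.

```python
import cmath
import math


def current_spectrum_db(samples, dt):
    """Single-sided amplitude spectrum of a sampled current trace.

    Returns a list of (frequency in Hz, amplitude in dB re 1 unit) pairs.
    A direct O(N^2) DFT keeps the sketch dependency-free; use an FFT for long traces.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        # DC and Nyquist bins are not doubled in a single-sided spectrum
        amp = abs(acc) / n if k in (0, n / 2) else 2 * abs(acc) / n
        spectrum.append((k / (n * dt), 20 * math.log10(max(amp, 1e-15))))
    return spectrum


# Hypothetical trace: 0.1-unit sinusoidal ripple, 64 samples at 1 ns spacing.
trace = [0.1 * math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]
spec = current_spectrum_db(trace, 1e-9)
peak_f, peak_db = max(spec, key=lambda p: p[1])
print(f"peak at {peak_f / 1e6:.1f} MHz, {peak_db:.1f} dB")   # peak at 62.5 MHz, -20.0 dB
```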
T4.2: Robust asynchronous QDI communication for NoC (LETI)
Within a Local Adaptive Voltage and Frequency Scaling architecture for dynamic variation compensation:
- Isolated voltage islands are required (T4.1 work)
- Isolated frequency islands are also required, so a GALS architecture is proposed
In this GALS context, an asynchronous NoC is being developed by CEA in T4.2:
- During the first year, an asynchronous cell library was developed
- 32 nm technology, about 40 cells, fully characterized
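QDI communication typically relies on a delay-insensitive code combined with a four-phase, return-to-zero handshake. The slide does not detail the LETI protocol, so the sketch below shows the generic dual-rail variant as an assumption: each bit travels on a (true, false) rail pair, the receiver detects completion when every pair has exactly one rail high, and the sender returns all rails to zero (the spacer) before the next word.

```python
from typing import List, Optional, Tuple

Rail = Tuple[int, int]   # (true rail, false rail)
SPACER: Rail = (0, 0)    # "no data" state between words


def encode(bits: List[int]) -> List[Rail]:
    return [(1, 0) if b else (0, 1) for b in bits]


def complete(word: List[Rail]) -> bool:
    """Completion detection: every pair carries valid data (exactly one rail high)."""
    return all(t + f == 1 for t, f in word)


def empty(word: List[Rail]) -> bool:
    """All pairs have returned to the spacer."""
    return all(pair == SPACER for pair in word)


def decode(word: List[Rail]) -> Optional[List[int]]:
    return [t for t, _ in word] if complete(word) else None


def four_phase_transfer(bits: List[int]) -> Tuple[Optional[List[int]], List[str]]:
    """Simulate one four-phase transfer; return (received bits, event trace)."""
    trace = []
    wires = encode(bits)                 # phase 1: sender drives a valid codeword
    ack = complete(wires)                # phase 2: receiver detects completion, raises ack
    received = decode(wires)
    trace.append(f"data valid, ack={int(ack)}")
    wires = [SPACER] * len(bits)         # phase 3: sender returns all rails to zero
    ack = not empty(wires)               # phase 4: receiver sees the spacer, lowers ack
    trace.append(f"spacer, ack={int(ack)}")
    return received, trace


print(four_phase_transfer([1, 0, 1, 1]))
```

Because validity is encoded in the data itself, the receiver never relies on a timing assumption about when the word has arrived, which is what makes this style of link tolerant to delay variations.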
T4.2: Integration of Power Shaping technology into EDA flows (TEKL, STI)
Seamless integration into mainstream flows is achieved by analysing a given design in standard industry formats such as Verilog, SDF and SDC, and exporting modified Verilog together with flow-specific clock tree synthesis directives.
TEKL has integrated its power shaping technology into a Cadence Encounter-based as well as a Synopsys ICC-based ASIC backend flow. Part of this work has been done in close collaboration with ST I.
[Figure: FloorDirector® integrated between Synthesis and Place&Route]
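Since the tool exports clock tree synthesis directives, one way to picture power shaping is as clock-skew scheduling: register groups receive small clock offsets within their timing slack so that their switching-current peaks no longer coincide. The sketch below is a generic greedy illustration of that idea, not TEKL's FloorDirector algorithm; the per-group current waveforms, the skew window (in time samples) and the greedy ordering are all assumptions.

```python
def assign_skews(groups, window):
    """Greedy clock-skew assignment to flatten the summed supply current.

    groups: per-register-group current waveforms (lists on a common time base)
    window: maximum extra clock delay, in samples, each group may receive
            (assumed to be absorbable by that group's timing slack)
    Returns (offset per group, peak of the summed waveform).
    """
    length = max(len(g) for g in groups) + window
    total = [0.0] * length
    offsets = [0] * len(groups)
    # place the groups with the largest current peak first
    for i in sorted(range(len(groups)), key=lambda g: max(groups[g]), reverse=True):
        wave = groups[i]

        def peak_with(shift):
            shifted = (total[shift + t] + v for t, v in enumerate(wave))
            return max(max(total), max(shifted))

        offsets[i] = min(range(window + 1), key=peak_with)  # earliest shift wins ties
        for t, v in enumerate(wave):
            total[offsets[i] + t] += v
    return offsets, max(total)


# Hypothetical: three register groups that all switch right after the clock edge.
groups = [[4.0, 0.0, 0.0, 0.0]] * 3
unshaped = [sum(col) for col in zip(*groups)]
offsets, peak = assign_skews(groups, window=2)
print(offsets, peak, "vs unshaped peak", max(unshaped))   # [0, 1, 2] 4.0 vs unshaped peak 12.0
```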
T4.3: SystemC virtual platform for multicore SoC (ISD, THL) 1/2
ISD develops a clock-accurate, transaction-level SystemC virtual platform (VP) of a multicore SoC.
Once validated, the VP is extended to incorporate fault tolerance.
ISD also designs highly-reliable AMS blocks, e.g. PLL
[Figure: virtual platform architecture: shared memory (ISD), NoC (ISD), and a cluster of PEs (Thales) connected through an interconnect; blocks SE1, SE2, …, SEN and CPE1, CPE2, …, CPEN]
T4.3: SystemC virtual platform for multicore SoC (ISD) 2/2
Multilayered fault-tolerant approach to diagnose and recover from permanent and transient node/link faults.
Methodology includes:
– packet encoding/retransmission
– fault-tolerant routing
– offline static reconfiguration
Study of performance degradation due to static and dynamic faults.
[Figure: speedup vs. hypercube size for parallel sorting (no faults, first version of the virtual platform); series 8192 mp, 8192 sh_mem, 16384 mp, 16384 sh_mem, 32768 mp, 32768 sh_mem]
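Fault-tolerant routing with offline static reconfiguration can be illustrated on the hypercube topology used in the speedup plot: nodes are d-bit addresses, neighbours differ in exactly one bit, and once faulty nodes or links are known, a breadth-first search over the healthy graph yields a shortest detour that could be loaded into routing tables. This is a generic sketch under that assumption; the slide does not specify ISD's routing algorithm.

```python
from collections import deque


def hypercube_route(dim, src, dst, faulty_nodes=frozenset(), faulty_links=frozenset()):
    """Shortest path from src to dst in a dim-dimensional hypercube avoiding faults.

    faulty_links holds frozenset({a, b}) pairs. Returns a node list, or None if
    the faults disconnect src from dst.
    """
    if src in faulty_nodes or dst in faulty_nodes:
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:          # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for bit in range(dim):               # neighbours differ in one address bit
            nxt = node ^ (1 << bit)
            if nxt in parent or nxt in faulty_nodes or frozenset((node, nxt)) in faulty_links:
                continue
            parent[nxt] = node
            queue.append(nxt)
    return None


print(hypercube_route(3, 0, 7))                      # [0, 1, 3, 7]
print(hypercube_route(3, 0, 7, faulty_nodes={1}))    # [0, 2, 3, 7]
```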
T4.3: Multi-Core Architecture (THL) 1/2
Goal
– Definition and development of a flexible, highly parameterized, user-friendly framework for exploring performance, power consumption and reliability trade-offs (different architectural and algorithmic solutions and technology process variations) in future multi-core systems
Results
– Integration of the Thales customized processor tile, consistent with the fault-tolerance scenarios selected for preliminary platform reliability evaluation
– Preliminary VP models and specifications exchanged between ISD and THL
– Experimentation at Thales with processor model integration and platform simulation based on preliminary test-benches
T4.3: Multi-Core Architecture (THL) 2/2
The Network Interface Module is in charge of network protocol translation. Because the iNoC and the NoC may use different protocols or run at different frequencies, the tile must be isolated from the NoC.
Interfaces between modules are defined so that any module can be tested with the SystemC model of this architecture. The simulator is based on OCP TL2.
T4.3: Fault Tolerant NoC architecture (STF)
Extended Spidergon STNoC to support fault-tolerant routing through direction and destination reprogramming
Both node and link faults have been considered
Industrial application in STMicroelectronics products using SSTNoC technology
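Spidergon connects N nodes (N even) in a bidirectional ring plus an "across" link from node i to node i + N/2. The sketch below assumes a simple form of direction reprogramming: candidate routes (clockwise, counter-clockwise, or across first and then either ring direction) are enumerated and the shortest one that avoids known faulty links is chosen. It is an illustration only, not the STNoC routing implementation.

```python
def ring_walk(n, start, dst, step):
    """Nodes visited walking the ring from start to dst in direction step (+1 or -1)."""
    path = [start]
    while path[-1] != dst:
        path.append((path[-1] + step) % n)
    return path


def spidergon_route(n, src, dst, faulty_links=frozenset()):
    """Shortest candidate route in an n-node Spidergon avoiding faulty links.

    faulty_links holds frozenset({a, b}) pairs. Returns a node list or None.
    """
    if n % 2 or not (0 <= src < n and 0 <= dst < n):
        raise ValueError("need an even node count and valid node indices")
    across = (src + n // 2) % n
    candidates = [
        ring_walk(n, src, dst, +1),                # clockwise
        ring_walk(n, src, dst, -1),                # counter-clockwise
        [src] + ring_walk(n, across, dst, +1),     # across, then clockwise
        [src] + ring_walk(n, across, dst, -1),     # across, then counter-clockwise
    ]

    def healthy(path):
        return all(frozenset(hop) not in faulty_links for hop in zip(path, path[1:]))

    usable = [p for p in candidates if healthy(p)]
    return min(usable, key=len) if usable else None


print(spidergon_route(8, 0, 2))                               # [0, 1, 2]
print(spidergon_route(8, 0, 2, faulty_links={frozenset((1, 2))}))  # [0, 4, 3, 2]
```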
T4.3: Design of Highly-reliable Non-Volatile Memory systems (NUM) 1/2
T4.3: Design of Highly-reliable Non-Volatile Memory systems (NUM) 2/2
T4.4: Design of Mask Programmable IPs for Fast SoC Development (STI, UNBO)
Regular Transistor Array: customization through metal layers (M1/M2 connections)
• Base cell developed (logic)
• P&R to be achieved with standard CAD
• 2-metal customization (M1 + M2)
• Standard CORE library compliance
Regular Tile Datapath: customization through via connections (Via 4 connections)
• Base tile developed (logic + routing)
• Tile logic fully synthesizable
• 1-via customization (Via 4)
• All tiles identical
Customization flow (same front end for both solutions):
– Front end: pseudo-C code (Griffy), syntax checks, pipelined architecture distillation; outputs: .dot for GraphViz, ANSI C emulation, HDL netlist
– Back end, Transistor Array: standard P&R CAD flow and sign-off
– Back end, Tile Datapath: automatic Via 4 layer generation for the GDS view and standard sign-off
– Result: IP ready for integration
Front-end flow already implemented; back-end flow under development.
T4.4: Definition of customizable MP architecture (UNBO, STI)
Programming model for application mapping on a regular multiprocessor architecture:
– Results:
  • Implementation of a compilation flow based on the CUDA programming model
– Future activities:
  • High-level management of memory transfers through automated programming of DMA channels
Hardware/software design methodology for mapping accelerators on a customizable multiprocessor architecture:
– Results:
  • Implementation of a scalable and parametric SystemC (TLM) multiprocessor architecture
– Future activities:
  • Integration of accelerator emulation functions with the SystemC model
  • High-level management of heterogeneous distributed hardware acceleration
[Figures: template architecture; design flow]
T4.4: Development of a via-configurable regular transistor array (UPC, STI)
Main target:
To develop a via-configurable regular transistor array (VCTA). The performance-area-power trade-offs of this approach to regular design will be evaluated, along with its impact on random defectivity, parametric yield, and manufacturability.
Highlights:
• VCTA basic architecture studied and implemented
• Basic elements and blocks implemented
• Regularity evaluation (part of D4.4.1, M24) using verification tools to compute geometrical regularity characteristics
• A paper has been submitted to the VLSI-SoC 2010 conference
Lowlights: funding delays have affected the development of activities.
Work plan and ongoing activities:
• Integrating the regular cell fabric (placement and routing) as automatically as possible, using state-of-the-art CAD tools
T4.4: Regular structures for variability-tolerant asynchronous circuits (TMPO, STI)
Main target:
Study regular structures for variability-tolerant asynchronous circuits and evaluate their benefits for manufacturability and yield.
Highlights:
• Study for characterization of asynchronous cells and macro-blocks completed
• Study of the effect of variability on asynchronous circuits completed
• Flow set up to characterize asynchronous building blocks: about 40 different cells designed, with different drive strengths
• Characterization of the designed asynchronous cells completed
• CAD views (.lib, functional, Verilog, schematic, symbols, layout) defined
Work plan and ongoing activities:
• Characterization based on technology data to evaluate the manufacturability benefits of the designed circuits (support from an industrial partner for technology data access)
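Characterized cells are typically delivered as .lib views, where timing is stored as non-linear delay model (NLDM) tables indexed by input transition time and output load and read back by bilinear interpolation. The sketch below shows that lookup; the table values are made up for illustration and are not characterization data from the Tiempo cells.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i such that axis[i] <= x <= axis[i+1] (clamped to the table edges)."""
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in an NLDM-style table[slew_index][load_index].

    Outside the characterized range the edge cells are extrapolated linearly.
    """
    i, j = _bracket(slews, slew), _bracket(loads, load)
    s0, s1 = slews[i], slews[i + 1]
    l0, l1 = loads[j], loads[j + 1]
    u = (slew - s0) / (s1 - s0)
    v = (load - l0) / (l1 - l0)
    return ((1 - u) * (1 - v) * table[i][j] + u * (1 - v) * table[i + 1][j]
            + (1 - u) * v * table[i][j + 1] + u * v * table[i + 1][j + 1])


# Hypothetical cell delay table (ps): rows = input slew (ps), columns = load (fF)
SLEWS = [10.0, 50.0, 200.0]
LOADS = [1.0, 4.0, 16.0]
DELAY = [[20.0, 30.0, 70.0],
         [25.0, 36.0, 78.0],
         [40.0, 52.0, 98.0]]
print(nldm_lookup(SLEWS, LOADS, DELAY, 30.0, 2.5))   # 27.75
```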
T4.5: Distributed reconfigurable PV-robust architectures (THL)
– Definition of fault scenarios
– Definition of specifications for handling the faults described in the fault scenarios:
  • Fault-tolerance operating library allowing the user to detect faults and to resolve detected problems
  • Definition of a set of functions (called through an API) used by the operating system running on the architecture to detect faults, and by the user to receive fault reports
– From this information, the user computes a new tile mapping for the running processes
  • After a chip reset, the new mapping is used and the chip continues working as before the fault, without using the faulty part
  • The new mapping implies new communication schemes
– Next step: development of re-mapping generation tools
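The re-mapping step can be sketched as follows: given a fault report listing faulty tiles, each process mapped on a faulty tile is moved to the nearest healthy tile with free capacity, and the new mapping is applied after reset. The mesh coordinates, Manhattan distance, per-tile capacity and greedy policy are assumptions for illustration; the slide leaves the re-mapping tools as future work.

```python
def remap(mapping, faulty_tiles, tiles, capacity=1):
    """Move processes off faulty tiles.

    mapping:      process name -> (x, y) tile coordinate
    faulty_tiles: set of (x, y) tiles reported faulty
    tiles:        all (x, y) tiles of the mesh
    capacity:     hypothetical maximum number of processes per tile
    Returns a new mapping dict; raises RuntimeError if no healthy tile is free.
    """
    new_map = {p: t for p, t in mapping.items() if t not in faulty_tiles}
    load = {t: 0 for t in tiles if t not in faulty_tiles}
    for t in new_map.values():
        load[t] += 1
    for proc in sorted(p for p, t in mapping.items() if t in faulty_tiles):
        x, y = mapping[proc]
        free = [t for t in load if load[t] < capacity]
        if not free:
            raise RuntimeError(f"no healthy tile left for {proc}")
        # nearest free tile (Manhattan distance); ties broken by coordinate order
        target = min(free, key=lambda t: (abs(t[0] - x) + abs(t[1] - y), t))
        new_map[proc] = target
        load[target] += 1
    return new_map


TILES = [(x, y) for y in range(2) for x in range(2)]   # hypothetical 2x2 mesh
mapping = {"A": (0, 0), "B": (1, 0), "C": (0, 1)}
print(remap(mapping, {(0, 0)}, TILES))   # {'B': (1, 0), 'C': (0, 1), 'A': (1, 1)}
```

Moving a process changes which tiles talk to each other, which is why the slide notes that a new mapping implies new communication schemes.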
T4.5: Distributed reconfigurable PV-robust architectures (LIRM) 1/2
► Distributed, homogeneous MPSoC architecture (HS-Scale architecture), from model to hardware
► Run-time task remapping (self-adaptive task migration)
► Distributed OS developed
► Monitors (e.g., CPU load) used
[Figure: the Network Processor Unit: a hardware processing layer (MIPS R3000, RAM, NI) under a packet-switching network layer]
– Processor: 32-bit MIPS R3000-type CPU
– No MMU, OS kernel…
– Simple memory interface
– gcc 4.0.1 cross-compiler
T4.5: Distributed reconfigurable PV-robust architectures (LIRM) 2/2
[Figure: task migration performance: monitored values against a threshold; FIFO monitoring throughput (KB/s) vs. time (s) for the IVLC, IQ and IDCT FIFOs, with reference (REF)]
Validation: SystemC model; architecture model exploration (e.g., game theory)
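The self-adaptive task migration shown in these results relies on monitors crossing thresholds (for example FIFO occupancy or CPU load). A minimal sketch of such a policy, with assumed names and values: when a task's input FIFO stays above a threshold for several consecutive samples, the task is migrated to the least-loaded processor. The threshold, persistence window and load figures are hypothetical, not LIRM's parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    threshold: float          # hypothetical FIFO occupancy threshold (0..1)
    persistence: int = 3      # consecutive samples above threshold before acting
    _above: int = field(default=0, init=False)

    def update(self, occupancy: float) -> bool:
        """Return True once occupancy has exceeded the threshold long enough."""
        self._above = self._above + 1 if occupancy > self.threshold else 0
        return self._above >= self.persistence

    def reset(self) -> None:
        self._above = 0


def migrate_if_needed(task, placement, cpu_load, monitor, occupancy):
    """Move `task` to the least-loaded CPU if its monitor fires; return its CPU."""
    current = placement[task]
    if monitor.update(occupancy):
        target = min(cpu_load, key=cpu_load.get)
        if target != current and cpu_load[target] < cpu_load[current]:
            placement[task] = target
            monitor.reset()   # restart observation after the migration
    return placement[task]


placement = {"IDCT": "cpu0"}
loads = {"cpu0": 0.9, "cpu1": 0.4, "cpu2": 0.6}   # e.g. from CPU-load monitors
mon = Monitor(threshold=0.75)
for occ in [0.5, 0.8, 0.85, 0.9, 0.6]:
    migrate_if_needed("IDCT", placement, loads, mon, occ)
print(placement)   # {'IDCT': 'cpu1'}
```

Requiring several consecutive samples above the threshold is a simple form of hysteresis that keeps a task from bouncing between processors on short bursts.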
WP4 Summary
All WP4 activities are on track and progressing according to milestones
D4.2.1 and D4.3.1 delivered on time (M12)
All other deliverables on track
The funding situation is poor: several national public authorities (PAs) have not yet signed the contract or granted the expected funding
Many WP4 partners are affected by this situation; although some activities were initially delayed, the strong commitment of WP4 partners to MODERN has kept all WP4 activities and deliverables on track
However, if the lack of funding from national PAs persists in 2010, it will impact WP4 activities and deliverables
WP4 is delivering innovative and outstanding scientific work, with prompt industrial exploitation and good cooperation among the partners