Core-Based Testing of Multiprocessor System-on-Chips Utilizing Hierarchical Functional Buses
Fawnizu Azmadi Hussin (1), Tomokazu Yoneda (1), Alex Orailoglu (2), Hideo Fujiwara (1)
(1) Nara Institute of Science and Technology, Japan
(2) University of California San Diego, U.S.A.
SoC Testing with a Dedicated Test Access Mechanism (TAM)
[Figure: an SoC whose cores (processor, MPEG, DSP, DMA, network interface, I/O, DRAM, SRAM, USB, UART) each sit inside a test wrapper and connect to the ATE through a dedicated TAM.]
Functional Bus as a TAM
[Figure: bus-based System-on-Chip in which the cores (processor, MPEG, DSP, DMA, network interface, I/O, DRAM, SRAM, USB, UART) are reached over the functional network, each through a test buffer.]
Source of test vector/response:
- pseudo-random / deterministic SBTG
- loaded progressively through a dedicated memory bus and a DMA
Hussin et al., ICCD 2006, “PAcket Set Scheduling (PASS)”
Presentation Outline
- Buffer-based test support architecture
- Hierarchical bus consideration
- Power-constrained test scheduling
- Packet set scheduling (PASS)
- Multiprocessor PASS (MPPASS)
- Experimental results
- Conclusion
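Buffer-based Test Architecture
[Figure: the core under test (CUT) with boundary cells on its primary inputs (PI1 to PIn+u) and primary outputs (PO1 to POn+v) and m scan chains (SC1 to SCm). An input buffer and an output buffer connect the core to the functional bus through a bus protocol interface, and a test controller selects test or normal mode. Functional inputs come from other cores / PIs and functional outputs go to other cores / POs.]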
Packet Based Test Data Transportation
[Figure: Core A and Core B, each with boundary scan cells, attach to the functional bus through a bus interface and a two-stage buffer (Buffer A and Buffer B, each with Stage 1 and Stage 2). A timing diagram shows the bus, Core A, and Core B over time, with test vector data (V) and test response data (R) packets alternating.]
Hierarchical Bus-based MPSoC
[Figure: an MPSoC with processors P0, P1, and P2 and cores C1 to C32 attached to buses b0 to b8, which are joined through bridges (B) into a hierarchy.]
- Which processor tests which core?
- What is the test data delivery schedule for each processor?
- How much buffer is needed for each core?
Test Configuration Graph
A Test Configuration Graph (TCG) pairs a processor (test source) with a core (test sink) and indicates the delivery path on the functional bus(es) used to transport the test data.
[Figure: TCG types (I) to (V), built from a processor Pq, cores Ci and Cj, buses bu and bv, and a bridge B. Edge labels include a single bus (bu), a conjunction of buses (bu ∧ bv), and a disjunction of buses (bu ∨ bv).]
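The TCG idea can be illustrated with a minimal Python sketch that derives the bus path between a processor and a core by breadth-first search over bridged buses. The topology literals, the names `bridges`, `attach`, `bus_path`, and `make_tcg`, and the dictionary layout of a TCG are assumptions made for illustration; they are not the authors' data structures or extraction procedure.

```python
from collections import deque


def bus_path(bridges, src_bus, dst_bus):
    """Return the shortest list of buses from src_bus to dst_bus, or None."""
    adjacency = {}
    for a, b in bridges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    queue = deque([[src_bus]])
    seen = {src_bus}
    while queue:
        path = queue.popleft()
        if path[-1] == dst_bus:
            return path
        for nxt in sorted(adjacency.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def make_tcg(processor, core, attach, bridges):
    """Build a (source, sink, buses) record; buses are used in conjunction."""
    path = bus_path(bridges, attach[processor], attach[core])
    if path is None:
        return None
    return {"source": processor, "sink": core, "buses": path}


# Hypothetical excerpt of a bus hierarchy, not the full SoC from the slides.
bridges = [("b0", "b1"), ("b0", "b3"), ("b7", "b0")]
attach = {"P0": "b0", "P2": "b7", "C1": "b1", "C4": "b0", "C11": "b0"}

tcg = make_tcg("P0", "C1", attach, bridges)
print(tcg["source"], "->", tcg["sink"], "via", " AND ".join(tcg["buses"]))
```

A path with more than one bus corresponds to a conjunction label such as bu ∧ bv; the sketch does not model the disjunction case (bu ∨ bv), where alternative paths exist.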
Power-Constrained Scheduling (1/2)
[Figure: extraction of TCGs from the SoC architecture. The hierarchical bus-based MPSoC (P0 to P2, C1 to C32, b0 to b8) is converted into a set of TCGs, for example p0 to C4 over b0, p0 to C1 over b0 ∧ b1, p1 to C6 over b5, p2 to C20 over b7, and p2 to C11 over b7 ∧ b0, …]
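Power-Constrained Scheduling (2/2)
[Figure, two panels, for “d695” from the ITC’02 benchmark. Power dissipation: core tests m3 to m7 drawn as rectangles of power dissipation against core test time under the limit Pmax, with total TAT on the time axis; the annotation “Halve f” on m3 changes its rectangle from P3 × T3 to P3/2 × 2 × T3. Bus utilization: buses b0, b1, …, bk over time, where m6 occupies b0 ∧ bk for Tm6.]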
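The trade-off behind the “Halve f” annotation can be sketched in Python under a simple assumed model: power scales linearly with test frequency and test time scales inversely. The greedy session packing below is a generic stand-in chosen for brevity, not the scheduling method of this work, and the test names and numbers are hypothetical.

```python
def scale_frequency(test, factor):
    """Assumed model: power is proportional to frequency, time to 1/frequency."""
    name, power, time = test
    return (name, power * factor, time / factor)


def greedy_sessions(tests, p_max):
    """Pack tests into concurrent sessions whose summed power stays <= p_max."""
    sessions = []
    # Longest tests first, so the first test placed in a session sets its length.
    for name, power, time in sorted(tests, key=lambda t: t[2], reverse=True):
        if power > p_max:
            raise ValueError(f"{name} alone exceeds the power limit")
        for session in sessions:
            if session["power"] + power <= p_max:
                session["tests"].append(name)
                session["power"] += power
                break
        else:
            sessions.append({"tests": [name], "power": power, "time": time})
    return sessions


def total_time(sessions):
    """Sessions run one after another, so the TAT is the sum of their lengths."""
    return sum(session["time"] for session in sessions)


# Hypothetical (name, power, time) triples and power limit.
P_MAX = 10
m3 = ("m3", 8, 10)
m4 = ("m4", 6, 18)

tat_full_speed = total_time(greedy_sessions([m3, m4], P_MAX))
tat_m3_halved = total_time(greedy_sessions([scale_frequency(m3, 0.5), m4], P_MAX))
print(tat_full_speed, tat_m3_halved)
```

In this toy case m3 and m4 cannot overlap at full speed, while halving the frequency of m3 lets both run concurrently within the limit, so the slower test shortens the overall TAT.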
Test Groups and Subgroups
G0 = {C1, C4, C7, C8, C11, C14, C17, C20, C21, C23, C27, C29}
GP00 = {C1, C4, C11, C14}
GP10 = {C20, C21, C23, C27, C29}
GP20 = {C7, C8, C17}
Delivery schedule? Use PASS for each subgroup.
[Figure: the hierarchical bus-based MPSoC (P0 to P2, C1 to C32, b0 to b8).]
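A minimal Python sketch of the grouping step, using the sets listed on this slide: the group G0 is partitioned into its subgroups, and a per-subgroup scheduler is then applied to each. The function names and the `subgroup_of` mapping are assumptions for illustration, and `sorted` is only a placeholder for a real PASS scheduler.

```python
def split_into_subgroups(group, subgroup_of):
    """Partition a test group into subgroups, preserving the core order."""
    subgroups = {}
    for core in group:
        subgroups.setdefault(subgroup_of[core], []).append(core)
    return subgroups


def schedule_group(group, subgroup_of, schedule_subgroup):
    """Apply a per-subgroup scheduler (a stand-in for PASS) to every subgroup."""
    parts = split_into_subgroups(group, subgroup_of)
    return {name: schedule_subgroup(cores) for name, cores in parts.items()}


# Sets copied from the slide.
G0 = ["C1", "C4", "C7", "C8", "C11", "C14", "C17",
      "C20", "C21", "C23", "C27", "C29"]
slide_subgroups = {
    "GP00": ["C1", "C4", "C11", "C14"],
    "GP10": ["C20", "C21", "C23", "C27", "C29"],
    "GP20": ["C7", "C8", "C17"],
}
# Hypothetical lookup table: which subgroup each core belongs to.
subgroup_of = {core: name
               for name, cores in slide_subgroups.items()
               for core in cores}

parts = split_into_subgroups(G0, subgroup_of)
schedules = schedule_group(G0, subgroup_of, sorted)  # sorted() is a placeholder
print(parts)
```

The three subgroups are disjoint and together cover all twelve cores of G0, so each core is scheduled exactly once.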
Packet Set Scheduling (PASS)
[Figure: PASS schedule for cores C1 to C10 over time. The packets p of each core are arranged into packet sets across the bus and buffer Stage 1 and Stage 2; the layout is annotated with 4 × Split-1, Split-4, and Split-8.]
Hussin et al., ICCD 2006, “PAcket Set Scheduling (PASS)”
Optimizing PASS: MPPASS
[Figure: cores C1 to C10 arranged in three subgroups ({C1, C2, C3}, {C4, C5, C6}, {C7, C8, C9, C10}) as candidates for scheduling, shown above the packet-set layout.]
Random ordering within subgroup is allowed.
Packet Delivery (Timing Diagram)
[Figure: timing diagrams for delivering a packet from processor pq to core mi, with vectors, responses, and arbitration marked over time. On a single bus bu, the sequence is the bus arbitration period, the vector delivery period (pq sending, mi receiving), the test application period (scan in), and the response return period (mi sending, pq receiving); the bus owner is Pq, then Bx or Py. On a path bu ∧ bv, Pq owns bus bu while sending and a bridge owns bus bv while forwarding, with bridge ports Bpu and Bpv alternating between receiving and sending.]
Packet Delivery and Forwarding
[Figure, two panels. TCGs: processor p1 reaches cores mi and mk over b0, mj and ml over b0 ∧ b1, and mm over b0 ∧ b1 ∧ b2. Bus activities during delivery: packets i, j, k, l, m and their responses Ri, Rj, Rk, Rl, Rm on buses b0, b1, and b2 over time, for cores C1 to C4.]
Hierarchical Data Delivery (Timing Diagram)
[Figure: timing diagrams, starting at t0, of packets i, j, k, l, m and responses Ri, Rj, Rk, Rl, Rm on buses b0, b1, and b2 for different orderings of cores C1 to C4.]
MPPASS Algorithm Flowchart
START
- Initialize: NPASS = 0, bestTATPS = ∞
- Decision: More PASS? or NPASS < NPASS,max
  - Y: Generate a new PASS not yet simulated.
    Simulate the new PASS: newTATPS = total TAT for k consecutive packet sets.
    If (newTATPS < bestTATPS): bestTATPS = newTATPS, bestPASS = newPASS. Else: NPASS++.
    Return to the decision.
  - N: Return best PASS.
FINISH
Event-driven simulator: “MPSim”
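The flowchart can be read as a bounded search loop; the following Python sketch is one interpretation of it. It assumes the loop continues while untried candidates remain and the count of non-improving candidates (NPASS) is below its maximum. The `simulate` callback stands in for the MPSim simulator, and `toy_tat`, the core names, and the packet lengths are hypothetical.

```python
import itertools
import math


def mppass_search(candidates, simulate, n_pass_max):
    """Keep the candidate PASS with the lowest simulated TAT."""
    best_tat = math.inf
    best_pass = None
    n_pass = 0  # number of candidates that did not improve on the best
    for new_pass in candidates:
        if n_pass >= n_pass_max:
            break
        new_tat = simulate(new_pass)
        if new_tat < best_tat:
            best_tat, best_pass = new_tat, new_pass
        else:
            n_pass += 1
    return best_pass, best_tat


# Hypothetical stand-in for the simulator: position-weighted packet lengths.
lengths = {"C1": 3, "C2": 1, "C3": 2}


def toy_tat(order):
    return sum((i + 1) * lengths[core] for i, core in enumerate(order))


# Candidate orderings within one subgroup (any ordering is allowed).
orderings = itertools.permutations(["C1", "C2", "C3"])
best_order, best_value = mppass_search(orderings, toy_tat, n_pass_max=10)
print(best_order, best_value)
```

Because candidates are consumed from an iterator, each PASS is simulated at most once, matching the "not yet simulated" condition in the flowchart.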
Experimental Setup
ITC’02 benchmark circuits

Method              Scan frequency   Bus frequency
TAM-based           fs = Fs          fb = fs < Fb
IPASS, fb = fs      fs = Fs          fb = fs < Fb
IPASS, fb = 2*fs    fs = Fs          fb = 2 × fs < Fb

Fs = Maximum scan frequency (100 MHz)
Fb = Maximum bus frequency
fs = Selected scan frequency
fb = Selected bus frequency
Experimental Results
* TAT in milliseconds

p93791h1, flat-bus
          BW = 32                            BW = 64
Pmax      Pouget [1]  IPASS     IPASS        Pouget [1]  IPASS     IPASS
                      fb = fs   fb = 2*fs                fb = fs   fb = 2*fs
10,000    18.28       18.44     9.04         11.17       8.94      5.34
15,000    18.28       17.34     8.85         10.15       8.85      4.70
20,000    18.28       17.35     8.89         9.58        8.93      4.59
25,000    18.28       17.63     9.07         9.65        9.05      4.75
30,000    18.28       17.78     9.08         9.45        9.07      4.67

p93791h2, hierarchy
          BW = 32                                       BW = 64
Pmax      Pouget [1]  IPASS, fb = fs   IPASS, fb = 2*fs Pouget [1]  IPASS, fb = fs   IPASS, fb = 2*fs
                      P@b0    P@All    P@b0    P@All                P@b0    P@All    P@b0    P@All
10,000    18.28       26.97   15.51    13.47   7.83     11.17       13.47   7.83     7.13    5.69
15,000    18.28       20.15   9.51     10.07   4.83     10.15       10.07   4.83     5.05    3.79
20,000    18.28       20.39   7.37     10.20   4.27     9.58        10.21   4.23     5.11    3.51
25,000    18.28       18.95   5.31     9.47    3.24     9.65        9.50    3.20     4.72    2.82
30,000    18.28       18.89   5.31     9.44    3.24     9.45        9.44    3.20     4.78    2.82

[1] Pouget et al. (JETTA’05), “Hierarchy constraint”
(a) P@b0: one processor in level-0 bus
(b) P@All: one processor in every bus region
Test Time Minimization
[Figure: two plots of TAT (ms) against bus frequency (× scan frequency fs), from 0.5 to 3, each comparing 32-bit and 64-bit functional TAMs.]
Area Overhead (Buffer)
Buffer sizes per SoC core (#FF), averaged over all Pmax
Bus width = 32 bits

Circuit   p93791h1   p93791h2   p22810h1   p22810h2
Min.      99.20      89.79      106.06     107.65
Max.      99.39      98.00      112.00     113.15
Conclusions
- An integrated test scheduling approach for flat-bus and hierarchical-bus SoCs
- Offers a test methodology for SoCs with
  - multiple embedded processors
  - flat or hierarchical buses
- Experimental results illustrate the potential of the proposed approach
  - Relax dedicated TAM requirements
  - Minimize TAT