WMC8, June 2007
Membrane Computing in the Connex Environment
Gheorghe Stefan
BrightScale Inc., Sunnyvale, CA & Politehnica University of Bucharest
Outline
Integral Parallel Architecture
The Connex Chip
The Connex Architecture
How to Use the Connex Environment
Concluding Remarks
Integral Parallel Architecture
• The Ubiquitousness of Parallelism Asks for Integral Parallel Architectures
• Partial Recursive Functions & Parallel Computation
• A Functional Taxonomy of Parallel Computation
Parallelism cannot be avoided anymore
Intel's approach:
• Multi-processors:
  • the best approach for multi-threading on MIMD architecture
  • inefficient on SIMD architecture
  • ignores the MISD architecture
• Many-processors asking for another taxonomy:
  • they work as accelerators
  • they perform critical functions
Berkeley’s 13 dwarfs is a functional approach for many-processors
Real applications ask for all kinds of parallelism to solve corner cases – the places where the devil hides
Partial Recursive Functions & Parallel Computation
• Composition Rule & the Basic Parallel Structures
• Primitive Recursive Rule
• Minimalization Rule
Composition & the Associated Structure
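f(x_0, …, x_{n-1}) = g(h_0(x_0, …, x_{n-1}), h_1(x_0, …, x_{n-1}), …, h_{m-1}(x_0, …, x_{n-1}))
[Figure: the inputs x_0, …, x_{n-1} feed the blocks h_0, h_1, …, h_{m-1} in parallel; their outputs feed the block g(h_0, h_1, …, h_{m-1}), which produces f(x_0, …, x_{n-1})]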
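The composition rule can be illustrated with a minimal Python sketch. The helper `compose` and the example functions `h_sum`, `h_prod` and `g_diff` are hypothetical names chosen for illustration; they are not part of the Connex environment.

```python
from typing import Callable, Sequence


def compose(g: Callable[..., int],
            hs: Sequence[Callable[..., int]]) -> Callable[..., int]:
    """Return f with f(*xs) = g(h_0(*xs), ..., h_{m-1}(*xs))."""
    def f(*xs: int) -> int:
        # Every h_i depends only on the inputs, so the m evaluations are
        # independent and could run on m separate units at the same time.
        intermediate = [h(*xs) for h in hs]
        return g(*intermediate)
    return f


# Hypothetical example functions (assumptions, not taken from the slides).
def h_sum(x0: int, x1: int) -> int:
    return x0 + x1


def h_prod(x0: int, x1: int) -> int:
    return x0 * x1


def g_diff(a: int, b: int) -> int:
    return a - b


f = compose(g_diff, [h_sum, h_prod])
result = f(3, 4)  # (3 + 4) - (3 * 4)
```

The list comprehension evaluates the h functions one after another; it only models the first level of the structure, where the hardware would evaluate them simultaneously.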
Data Parallel Composition
X = {x_0, …, x_{n-1}} → {h(x_0), h(x_1), …, h(x_{n-1})}
[Figure: n identical blocks h work in parallel; block i receives x_i and produces h(x_i)]
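As a rough sketch, data parallel composition behaves like a map of one function over a vector. The name `data_parallel` is an assumption made for this example.

```python
def data_parallel(h, xs):
    """Apply the same function h to every component of the vector xs."""
    # Each component is processed independently of the others, which is
    # what lets a SIMD-like machine handle all of them in one step.
    return [h(x) for x in xs]


squares = data_parallel(lambda x: x * x, [0, 1, 2, 3])
```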
Speculative Composition
function vector: H = [h_0, h_1, …, h_{n-1}], scalar: x
H(x) = {h_0(x), h_1(x), …, h_{n-1}(x)}
[Figure: the scalar x is sent to all the blocks h_0, h_1, …, h_{n-1}, which produce h_0(x), h_1(x), …, h_{n-1}(x) in parallel]
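Speculative composition is the dual case: a vector of functions applied to one scalar. The sketch below assumes a hypothetical helper `speculative` and three arbitrary example functions.

```python
def speculative(hs, x):
    """Apply every function of the function vector hs to the same scalar x."""
    # All candidate functions see the same input, so they can be
    # evaluated at the same time on distinct units.
    return [h(x) for h in hs]


# Hypothetical function vector H = [h_0, h_1, h_2].
H = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2]
outputs = speculative(H, 5)
```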
Serial Composition
f(x) = g(h(x))
[Figure: x feeds the block h, whose output feeds the block g, producing f(x) = g(h(x))]
Time parallelism
The general case: f(x) = g_1(g_2(g_3(… g_p(x) …)))
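Time parallelism can be sketched as a clocked pipeline in which each stage works on a different element of an input stream. The simulator below is an illustrative assumption, not the Connex pipe: `run_pipeline`, the `EMPTY` marker and the three example stages are hypothetical. The list `stages` is ordered by application, so `stages[0]` plays the role of g_p and `stages[-1]` the role of g_1.

```python
EMPTY = object()  # marks a pipeline register that holds no value


def run_pipeline(stages, stream):
    """Simulate a p-stage pipe; return (results, number of clock cycles)."""
    p = len(stages)
    regs = [EMPTY] * p            # regs[i] is the output register of stage i
    pending = list(stream)
    results, cycles = [], 0
    # Run while inputs remain or values are still travelling through the pipe.
    while pending or any(r is not EMPTY for r in regs[:-1]):
        new_regs = [EMPTY] * p
        for i in range(p - 1, 0, -1):
            if regs[i - 1] is not EMPTY:
                new_regs[i] = stages[i](regs[i - 1])
        if pending:
            new_regs[0] = stages[0](pending.pop(0))
        regs = new_regs
        cycles += 1
        if regs[-1] is not EMPTY:
            results.append(regs[-1])
    return results, cycles


# Hypothetical stages computing f(x) = ((x + 1) * 2) - 3.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
results, cycles = run_pipeline(stages, [0, 1, 2, 3])
```

In this model, n inputs pass through p stages in n + p - 1 cycles instead of n * p, which is the gain that serial composition offers on streams.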
Reduction Composition
f(x_0, …, x_{m-1}) = g(x_0, …, x_{m-1})
[Figure: the inputs x_0, x_1, …, x_{m-1} feed a single block g, which produces g(x_0, x_1, …, x_{m-1})]
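One common way to realize a reduction is a balanced tree of binary operations. The slide does not say how g is implemented, so the tree structure, the name `tree_reduce` and the requirement that the operator be associative are all assumptions of this sketch.

```python
import operator


def tree_reduce(op, xs):
    """Reduce a non-empty sequence with an associative binary operator.

    Returns (result, number of tree levels used).
    """
    level = list(xs)
    steps = 0
    while len(level) > 1:
        # Combine neighbouring pairs; all pairs of one level are independent.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # odd element is carried to the next level
        level = nxt
        steps += 1
    return level[0], steps


total, depth = tree_reduce(operator.add, range(1, 9))
```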
Primitive recursive rule
f(x, y) = h(x, f(x, y-1)), where f(x, 0) = g(x)
f(x, y) = h(x, h(x, h(x, … h(x, g(x)) …)))
A parallel solution makes sense only if the function must be computed many times.
Implementations:
1. Data parallel composition
2. Loop in a serial composition
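The unfolded form of the rule can be sketched as a loop that applies h exactly y times to g(x). The helper `primitive_recursion` and the two example definitions (`add`, `mul`) are hypothetical illustrations.

```python
def primitive_recursion(g, h):
    """Build f with f(x, 0) = g(x) and f(x, y) = h(x, f(x, y - 1))."""
    def f(x, y):
        acc = g(x)
        for _ in range(y):       # y nested applications of h
            acc = h(x, acc)
        return acc
    return f


# Hypothetical examples: addition and multiplication by primitive recursion.
add = primitive_recursion(lambda x: x, lambda x, prev: prev + 1)
mul = primitive_recursion(lambda x: 0, lambda x, prev: prev + x)

# Computing the function many times is where parallelism pays off:
# each (x, y) pair is independent of the others.
batch = [mul(7, y) for y in range(5)]
```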
Data Parallel Composition for the Primitive Recursive Rule
x, Y = {y_0, …, y_{n-1}} → {f(x, y_0), f(x, y_1), …, f(x, y_{n-1})}
[Figure: n blocks labeled h work in parallel; block i receives (x, y_i) and produces f(x, y_i)]
Serial Composition for the Primitive Recursive Rule
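x, <Y> = <y_0, …, y_{n-1}> → <F> = <f(x, y_0), f(x, y_1), …, f(x, y_{n-1})>
[Figure: the stream x, <Y> enters a chain of serially connected h blocks and a selector (sel), which outputs the stream <F>]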
Minimalization rule
f(x) = min(y)[m(x,y) = 0]
Implementations:
1. Speculative composition & reduction composition
2. Serial composition & reduction composition
Speculative Composition & Reduction Composition for Minimalization
[Figure: x is sent to n blocks that compute m(x, 0), m(x, 1), …, m(x, n-1) in parallel; the pairs {m(x, 0), 0}, …, {m(x, n-1), n-1} feed the reduction block first{0, i}, which returns f(x) = i]
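A bounded version of this scheme can be sketched in two steps: evaluate m(x, y) for all candidate values y, then reduce the pairs to the first index whose value is 0. The bound `n`, the name `minimalize` and the example predicate `m` are assumptions made for illustration.

```python
def minimalize(m, x, n):
    """Return the smallest y in [0, n) with m(x, y) == 0, or None if absent."""
    # Speculative step: the n candidates are evaluated independently.
    pairs = [(m(x, y), y) for y in range(n)]
    # Reduction step ("first"): keep the lowest index whose value is 0.
    for value, y in pairs:
        if value == 0:
            return y
    return None


# Hypothetical example: m(x, y) is 0 as soon as y * y >= x.
def m(x, y):
    return 0 if y * y >= x else 1


root = minimalize(m, 10, 16)
```

The bound matters: with only n speculative units, a solution at or beyond n is not found, and the search would have to continue on the next range of candidates.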
Serial Composition & Reduction Composition for Minimalization
Example of dynamic reconfiguration (P_i: the i-th pipe stage)
[Figure: pipe stages P_{i-5}, P_{i-4}, P_{i-3}, P_{i-2}, P_{i-1}, P_i; in stage P_i a selector, driven by a selection code, chooses among y_{i-1}, y_{i-2}, …, y_{i-s} the input of f_i, which produces y_i]
Functional Taxonomy of Parallel Computation
• Data Parallel Computation: uses SIMD-like machines
• Time Parallel Computation: is a very special sort of MIMD used to compute only one function
• Speculative Computation: uses MISD machines, completely ignored by current implementations
Integral Parallel Architecture
An Integral Parallel Architecture (IPA) uses all kinds of parallelism to build a real machine, in two versions:
• complex IPA: all types of parallel mechanisms tightly interleaved on the same physical structure (pipelined superscalar speculative general purpose processors)
• intensive IPA: all types of parallel mechanisms highly separated, implemented on specific physical structures (accelerators for embedded computation in a SoC approach)
Intensive IPA
Intensive IPA are used as accelerators for complex IPA
1. Monolithic intensive IPA: the same machine works in two modes:
• Data parallel
• Time parallel
2. Segregated intensive IPA: two distinct machines are used, one for data parallel computation and the other for time parallel (i.e. speculative) computation
The Connex Chip
The organization of BA1024:
• multi-core area of 4 MIPS
• many-core data parallel area of 1024 simple PEs
• speculative time parallel pipe of 8 PEs
• interfaces (DDR, PCI, video & audio interfaces for 2 HDTV channels)
The Connex System
Connex Array: 1,024 linearly connected 16-bit Processing Cells
Sequencer: 32-bit stack machine with program memory and data memory (4KB data & 32Kb program memory); issues in each cycle (on a 2-stage pipe) one 64-bit instruction for the Connex Array and a 24-bit instruction for itself
IO Controller: 32-bit stack machine (4KB data & 4KB program memory); controls a 3.2 GB/s IO channel
Processing Cell: Integer unit & data memory & Boolean unit
I/O channel works in parallel with code running on the Connex Array
[Figure: block diagram of a Processing Cell with a 16-bit ALU, registers R0–R7, a 16-bit RAM for data (locations 0–255) with Address and Index, the Boolean unit, and the Connex I/O and AUX ports, shown next to the Sequencer and the I/O Controller]
Connex Array Structure
Processing Cells are linearly connected using only the register R0
The IO Plane consists of all R1 registers, supervised mainly by the IO Controller
Conditional execution is based on the state of the Boolean unit
Integer unit, Boolean unit and Data memory execute in each cycle command fields from a 64-bit instruction issued by the Sequencer
Vector reduction operations return scalar results in the TOS (top of stack) of the Sequencer, which receives data from the array of cells through a 3-stage pipe
[Figure: cells 0 to 1023, each with a 16-bit ALU, registers R0–R7, a data memory with locations 0–255, and an on/off state]
I/O System
[Figure: block diagram with the Connex Array and its I/O Plane, the IOC, the IS, Interrupts, a Switch Fabric (128-bit word), and a DDR-DRAM Controller connected to DRAMs]
[Figure: BA1024 block diagram. ConnexArray™ Programmable Media Processor (multi-codec processing, pre-analysis, 3D filter, scaling, video merge/blend, motion adaptive de-interlacing) with Instruction Sequencer and I/O Sequencer; Host CPU, Audio CPU, TS/Sec CPU, Video CPU and Stream Accelerator; Configurable Switch Fabric; DDR-DRAM Ctrl (400 MHz data rate) with 64-bit wide DRAM; interfaces: HOST I/F (PCI v2.2 or generic), Ext. Bus (Flash), Video In and Video Out (BT.656/1120), Audio In and Audio Out (I2S, S/PDIF), EJTAG, GPIO, I2C, Test ICE]
The Connex Architecture
• Vectors & selections
• Programming Connex
• Performances
Vectors & Selections
• Linear array of processing elements → vectors
• Local data memory in each processing element → array of vectors
• Data dependency operations at the level of each processing element → selections
Full Line Operations
[Figure: array with columns 0–1023 and lines 0–255; Line k = Line i OP Line j, with OP one of +, -, *, XOR, etc.; 16-bit data operands]
Line k = Line i OP Line j
Line k = Line i OP scalar value (repeated for all elements)
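A full line operation can be modeled as an element-wise operation between two lines, or between a line and a scalar repeated for all elements. In the sketch below the helper `line_op` is hypothetical, short lists stand in for the 1,024 cells, and wrapping the result to 16 bits is an assumption about overflow behavior that the slide does not state.

```python
import operator


def line_op(op, line_i, operand):
    """Return Line k = Line i OP operand, where operand is a line or a scalar."""
    if isinstance(operand, int):
        # A scalar is repeated for all elements of the line.
        operand = [operand] * len(line_i)
    # Assumption: results wrap around to 16 bits.
    return [op(a, b) & 0xFFFF for a, b in zip(line_i, operand)]


line_k = line_op(operator.add, [1, 2, 3, 65535], [10, 20, 30, 1])
line_x = line_op(operator.xor, [1, 2, 3], 1)
```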
Columns Active Based On Repeating Patterns
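[Figure: the same array (columns 0–1023, lines 0–255); Line k = Line i OP Line j, with OP one of +, -, *, XOR, etc., computed only in the active columns, which follow a regular pattern]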
Mark all odd columns active. Or mark every third column active. Or mark every third and fourth column active, etc.
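Pattern-based activity can be sketched as a Boolean mask derived from the column index: active columns receive the new value, inactive ones keep the old content of Line k. The helper `masked_op`, the 8-column lines and the phase of the "every third" pattern (columns 0, 3, 6, …) are assumptions of this example.

```python
def masked_op(op, line_i, line_j, line_k, active):
    """Update line_k with op(line_i, line_j) only in the active columns."""
    return [op(a, b) if on else old
            for a, b, old, on in zip(line_i, line_j, line_k, active)]


N = 8                                     # 8 columns stand in for 1,024
line_i = [1] * N
line_j = list(range(N))
line_k = [0] * N

odd = [c % 2 == 1 for c in range(N)]          # all odd columns active
every_third = [c % 3 == 0 for c in range(N)]  # assumed phase: 0, 3, 6, ...

on_odd = masked_op(lambda a, b: a + b, line_i, line_j, line_k, odd)
on_third = masked_op(lambda a, b: a + b, line_i, line_j, line_k, every_third)
```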
Columns Active Based On Data Content
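[Figure: the same array (columns 0–1023, lines 0–255); Line k = Line i OP Line j, with OP one of +, -, *, XOR, etc., computed only in the active columns, which are irregularly distributed]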
Apparently random columns are active, marked, based on data-dependent results of previous operations.
Outer-Loop Parallelism
Example: 128 sets of 8x8 run in parallel in a 1024-cell array
[Figure: lines i to j of the array (columns 0–1023, lines 0–255) hold a row of 8x8 blocks, each spanning 8 columns (0–7) and 8 lines (0–7)]
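The layout in this example can be sketched as follows: 8 lines of 1,024 columns hold 128 blocks of 8x8, so one line operation updates the same row in all blocks at once. The fill values, the helper `block` and the chosen operation are hypothetical; only the sizes come from the slide.

```python
CELLS = 1024                  # cells in the array
BLOCK = 8                     # each set is an 8x8 block
NUM_BLOCKS = CELLS // BLOCK   # blocks that fit side by side

# lines[r][c] holds row r of the block that owns column c.
# Hypothetical fill pattern: block index plus row index.
lines = [[(c // BLOCK) + r for c in range(CELLS)] for r in range(BLOCK)]


def block(lines, b):
    """Extract block b as a list of 8 rows of 8 values."""
    return [row[b * BLOCK:(b + 1) * BLOCK] for row in lines]


# One line operation: row 0 += row 1, applied to all blocks at once.
lines[0] = [a + b for a, b in zip(lines[0], lines[1])]
```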
Programming Connex
VectorC is an extension/restriction of C++
Code that operates on scalar data written in regular C notation
Connex-specific operators defined as functions for features not available in C++, e.g. operations on vectors and selections (Boolean vectors)
VectorC uses sequential operators and control structures on vector and select data-types
Using VectorC the Connex Machine is programmed the same way as conventional sequential machines
// Find the absolute difference between two vectors
vector mm_absdiff(vector V1, vector V2) {
    vector V;
    V = V1 - V2;
    WHERE (V < 0) {
        V = -V;                         // V = abs(V);
    }
    ENDW
    return V;
}

int main() {
    vector V1 = 2;                      // V1 = {2, 2, … 2}
    vector V2 = 3;                      // V2 = {3, 3, … 3}
    vector V;                           // V = {0, 0, … 0}
    vector Index = indexvector();       // Index = {0, 1, … }
    V = mm_absdiff(V1, V2);             // V = {1, 1, … 1}
    return 0;
}
Vectors are arrays of scalar components.
Selections are arrays of Boolean values that dictate what vector components are active.
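For readers without access to VectorC, the behavior of the WHERE construct in mm_absdiff can be approximated in plain Python. This is an emulation under assumptions: Python lists stand in for vectors, a list of Booleans stands in for the selection, and 8 components replace the full array width.

```python
def mm_absdiff(v1, v2):
    """Component-wise absolute difference, mirroring the VectorC example."""
    v = [a - b for a, b in zip(v1, v2)]          # V = V1 - V2
    selection = [x < 0 for x in v]               # WHERE (V < 0)
    # Only the selected (active) components are negated.
    return [-x if on else x for x, on in zip(v, selection)]


N = 8                    # 8 components stand in for the array width
V1 = [2] * N
V2 = [3] * N
V = mm_absdiff(V1, V2)
```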
Overall performances of BA1024
200 GOP/sec
3.2 GB/sec: external bandwidth
400 GB/sec: internal bandwidth
> 60 GOP/Watt
> 2 GOP/mm2
Note: 1 OP = 16-bit simple integer operation (excluding multiplication)
How to Use the Connex Environment for Membrane Computation
Example (G. Paun):
• the initial configuration: [_1 [_2 [_3 a f c ]_3 ]_2 ]_1 ...
• R1: e → (e, out), f → f
• R2: b → d, d → de, ff → f, cf → cdδ
• R3: a → ab, a → bδ, f → ff
The first example of processing
Initial vector: (1,[) (2,[) (3,[) (0,a) (0,f) (0,c) (3,]) (2,]) (1,]) ...
[[[a f c] ] ]...
a → ab, f → ff: [[[a b f f c] ] ]... // 11 clock cycles
a → ab, f → ff: [[[a b b f f f f c] ] ]... // 15 clock cycles
a → bδ, f → ff: [[b b b f f f f f f f f c ] ]... // 27 clock cycles
b → d, ff → f: [[d d d f f f f c ] ]... // 10 clock cycles
d → de, ff → f: [[d e d e d e f f c ] ]... // 10 clock cycles
d → de, cf → cdδ: [d e e d e e d e e d f c ]... // 10 clock cycles
e → (e, out), f → f: [d d d d f c ] e e e e e e... // 15 clock cycles
total: 98 clock cycles
The second example of processing
Initial vector: (1,[) (2,[) (3,[) (1,a) (1,f) (1,c) (3,]) (2,]) (1,]) ...
[[[1a 1f 1c] ] ]...
[[[1a 1b 2f 1c] ] ]... // in 5 clock cycles
[[[1a 2b 4f 1c] ] ]... // in 5 clock cycles
[[3b 8f 1c ] ]... // in 10 clock cycles
[[3d 4f 1c ] ]... // in 7 clock cycles
[[3d 3e 2f 1c ] ]... // in 8 clock cycles
[4d 3e 1f 1c ]... // in 8 clock cycles
[4d 1f 1c ] 3e... // in 5 clock cycles
total: 48 clock cycles
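The multiplicity-based representation can be sketched with a multiset of (symbol, count) entries. The code below models only the first three steps inside membrane 3, for rules with a single symbol on the left-hand side; membrane dissolution (δ), multi-symbol rules such as ff → f, and the choice between a → ab and a → bδ are not modeled, and `apply_rules` is a hypothetical helper.

```python
from collections import Counter


def apply_rules(multiset, rules):
    """One maximally parallel step for rules with one-symbol left-hand sides."""
    result = Counter()
    for symbol, count in multiset.items():
        rhs = rules.get(symbol, symbol)   # symbols without a rule are kept
        for produced in rhs:
            # Working on counts rewrites all copies of a symbol in one
            # operation, independent of how many copies there are.
            result[produced] += count
    return result


membrane3 = Counter({"a": 1, "f": 1, "c": 1})
grow = {"a": "ab", "f": "ff"}             # a -> ab, f -> ff
last = {"a": "b", "f": "ff"}              # a -> b(delta), f -> ff

step1 = apply_rules(membrane3, grow)
step2 = apply_rules(step1, grow)
step3 = apply_rules(step2, last)
```

The three results match the configurations 1a 1b 2f 1c, 1a 2b 4f 1c and 3b 8f 1c listed for this example.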
The third example of processing
The third membrane is duplicated (multiplied), but the content can be different
[[[1a 1f 1c] [2a 1f 1c] ] ]...
[[[1a 1b 2f 1c] [2a 2b 2f 1c] ] ]... // in 5 clock cycles
[[[1a 2b 4f 1c] [2a 4b 4f 1c] ] ]... // in 5 clock cycles
[[3b 8f 1c 6b 8f 1c ] ]... // in 10 clock cycles
[[3d 4f 1c 6d 4f 1c ] ]... // in 7 clock cycles
[[3d 3e 2f 1c 6d 6e 2f 1c ] ]... // in 8 clock cycles
[4d 3e 1f 1c 7d 6e 1f 1c]... // in 8 clock cycles
[4d 1f 1c 7d 1f 1c ] 9e... // in 10 clock cycles
total: 53 clock cycles
For up to 200 level 3 membranes the number of clock cycles remains 53.
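The totals quoted for the three examples can be checked by summing the per-step cycle counts given on the slides. The dictionary keys are descriptive labels added for this sketch, and the per-membrane figure simply divides the constant total by the 200 membranes mentioned above.

```python
# Per-step clock cycles as listed in the three processing examples.
examples = {
    "first (one symbol per cell)": [11, 15, 27, 10, 10, 10, 15],
    "second (symbol multiplicities)": [5, 5, 10, 7, 8, 8, 5],
    "third (duplicated level-3 membrane)": [5, 5, 10, 7, 8, 8, 10],
}

totals = {name: sum(cycles) for name, cycles in examples.items()}

# With 200 level-3 membranes processed in the same 53 cycles,
# the cost per membrane drops well below one cycle.
cycles_per_membrane = totals["third (duplicated level-3 membrane)"] / 200
```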
Concluding Remarks
1. Functional taxonomy vs. Flynn taxonomy
2. Connex architecture accelerates membrane computation
3. An efficient P-architecture asks for a few additional features on top of the Connex architecture
4. Why not a P-language?
Main technical contributors to the Connex project:
Emanuele Altieri, BrightScale Inc., CA
Lazar Bivolarski, BrightScale Inc., CA
Frank Ho, BrightScale Inc., CA
Mihaela Malita, St. Anselm College, NH
Bogdan Mitu, BrightScale Inc., CA
Dominique Thiebaut, Smith College, MA
Tom Thomson, BrightScale Inc., CA
Dan Tomescu, BrightScale Inc., CA