Membrane Computing in the Connex Environment

Gheorghe Stefan

BrightScale Inc., Sunnyvale, CA & Politehnica University of Bucharest

[email protected]

WMC8, June 2007


Outline

Integral Parallel Architecture

The Connex Chip

The Connex Architecture

How to Use the Connex Environment

Concluding Remarks


Integral Parallel Architecture

• The Ubiquitousness of Parallelism Asks for Integral Parallel Architectures

• Partial Recursive Functions & Parallel Computation

• A Functional Taxonomy of Parallel Computation


Parallelism can no longer be avoided

Intel’s approach:

• Multi-processors:
  • the best approach for multi-threading on MIMD architectures
  • inefficient on SIMD architectures
  • ignore the MISD architecture

• Many-processors, which ask for another taxonomy:
  • they work as accelerators
  • they perform critical functions

Berkeley’s 13 dwarfs are a functional approach for many-processors

Real applications ask for all kinds of parallelism to solve corner cases, the places where the devil hides


Partial Recursive Functions & Parallel Computation

• Composition Rule & the Basic Parallel Structures

• Primitive Recursive Rule

• Minimalization Rule


Composition & the Associated Structure

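f(x_0, …, x_{n-1}) = g(h_0(x_0, …, x_{n-1}), h_1(x_0, …, x_{n-1}), …, h_{m-1}(x_0, …, x_{n-1}))

[Figure: the inputs x_0, …, x_{n-1} feed the blocks h_0, h_1, …, h_{m-1} in parallel; their outputs feed the block g(h_0, h_1, …, h_{m-1}), which delivers f(x_0, …, x_{n-1})]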
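A minimal sketch of the composition rule, written in Python rather than VectorC and assuming nothing about the Connex tools. The names `compose`, `h0`, `h1`, `h2` and `g` are illustrative and do not come from the slides. Because the m functions h_i do not depend on one another, the sketch submits them to a thread pool; g is the only step that has to wait for all of them.

```python
from concurrent.futures import ThreadPoolExecutor


def compose(g, hs):
    """Return f such that f(*xs) = g(h_0(*xs), ..., h_{m-1}(*xs))."""
    def f(*xs):
        # The h_i are mutually independent, so they may run concurrently.
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(h, *xs) for h in hs]
            intermediate = [future.result() for future in futures]
        # g combines the m intermediate values into the final result.
        return g(*intermediate)
    return f


# Hypothetical functions, chosen only to make the sketch concrete.
def h0(a, b):
    return a + b


def h1(a, b):
    return a * b


def h2(a, b):
    return a - b


def g(u, v, w):
    return max(u, v, w)


f = compose(g, [h0, h1, h2])
print(f(3, 4))  # max(3 + 4, 3 * 4, 3 - 4)
```

On a sequential host the thread pool brings no speed-up; it is there only to mark which part of the rule is parallel.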


Data Parallel Composition

X = {x_0, …, x_{n-1}} → {h(x_0), h(x_1), …, h(x_{n-1})}

[Figure: n copies of the block h work in parallel; the i-th copy receives x_i and delivers h(x_i)]


Speculative Composition

function vector: H = [h_0, h_1, …, h_{n-1}], scalar: x

H(x) = {h_0(x), h_1(x), …, h_{n-1}(x)}

[Figure: the same scalar x feeds the blocks h_0, h_1, …, h_{n-1} in parallel; the i-th block delivers h_i(x)]


Serial Composition

f(x) = g(h(x))

[Figure: x enters the block h, whose output enters the block g, which delivers f(x) = g(h(x))]

Time parallelism

The general case: f(x) = g_1(g_2(g_3(… g_p(x) …)))


Reduction Composition

f(x_0, …, x_{m-1}) = g(x_0, …, x_{m-1})

[Figure: the inputs x_0, x_1, …, x_{m-1} enter a single block g, which delivers the scalar g(x_0, x_1, …, x_{m-1})]
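The four special forms of composition (data parallel, speculative, serial, reduction) can be sketched side by side in plain Python. This is an illustration under stated assumptions, not Connex code: the function names and the sample functions are hypothetical, and the list comprehensions stand for work that a parallel machine would do in a single step.

```python
from functools import reduce


def data_parallel(h, xs):
    # One function, many data items: {h(x_0), ..., h(x_{n-1})}.
    return [h(x) for x in xs]


def speculative(hs, x):
    # Many functions, one data item: {h_0(x), ..., h_{n-1}(x)}.
    return [h(x) for h in hs]


def serial(gs, x):
    # f(x) = g_1(g_2(... g_p(x) ...)): g_p is applied first, g_1 last.
    return reduce(lambda acc, g: g(acc), reversed(gs), x)


def reduction(g, xs):
    # A vector goes in, a scalar comes out.
    return g(xs)


squares = data_parallel(lambda x: x * x, [1, 2, 3])
guesses = speculative([abs, lambda x: x + 1, lambda x: 2 * x], -3)
piped = serial([lambda x: x + 1, lambda x: 2 * x], 5)  # (2 * 5) + 1
total = reduction(sum, [1, 2, 3])
print(squares, guesses, piped, total)
```

In the serial form the parallelism appears only when a stream of x values is fed to the pipe, so that each g_i works on a different item at the same time.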


Primitive Recursive Rule

f(x, y) = h(x, f(x, y-1)), where f(x, 0) = g(x)

f(x, y) = h(x, h(x, h(x, … h(x, g(x)) …)))

A parallel solution makes sense only if the function must be computed many times.

Implementations:

1. Data parallel composition

2. Loop in a serial composition
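A small Python sketch of the rule and of the first implementation listed above, assuming a hypothetical pair g, h that turns the rule into multiplication by repeated addition. The helper name `primitive_recursion` is illustrative, not part of any Connex tool.

```python
def primitive_recursion(g, h):
    """Build f with f(x, 0) = g(x) and f(x, y) = h(x, f(x, y - 1))."""
    def f(x, y):
        acc = g(x)
        for _ in range(y):  # y nested applications of h, innermost first
            acc = h(x, acc)
        return acc
    return f


# Hypothetical choice of g and h: f(x, y) = x * y.
times = primitive_recursion(lambda x: 0, lambda x, acc: x + acc)

# Data parallel use: the same x, many values of y, one f per value.
ys = [0, 1, 2, 5]
products = [times(7, y) for y in ys]
print(products)
```

A single evaluation is inherently sequential (each h needs the previous result), which is why the gain comes only from evaluating f for many values of y.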


Data Parallel Composition for the Primitive Recursive Rule

x, Y = {y_0, …, y_{n-1}} → {f(x, y_0), f(x, y_1), …, f(x, y_{n-1})}

[Figure: n blocks, each labeled h, work in parallel; the i-th block receives (x, y_i) and delivers f(x, y_i)]


Serial Composition for the Primitive Recursive Rule

x, <Y> = <y_0, …, y_{n-1}> → <F> = <f(x, y_0), f(x, y_1), …, f(x, y_{n-1})>

[Figure: x and the stream <Y> enter a pipe of h blocks; a selector (sel) extracts the output stream <F>]


Minimalization rule

f(x) = min(y)[m(x,y) = 0]

Implementations:

1. Speculative composition & reduction composition

2. Serial composition & reduction composition


Speculative Composition & Reduction Composition for Minimalization

[Figure: x feeds n blocks that compute m(x,0), m(x,1), …, m(x,n-1) in parallel; the pairs {m(x,0), 0}, …, {m(x,n-1), n-1} enter the reduction block first{0, i}, which delivers f(x) = i]
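The speculative-plus-reduction scheme can be sketched in Python as follows. The sketch assumes a bounded search range n (the array is finite) and a hypothetical predicate m; `minimalization` is an illustrative name. All n candidates are evaluated, then a `first` reduction keeps the lowest index whose value is 0.

```python
def minimalization(m, x, n):
    """Smallest y in range(n) with m(x, y) == 0, or None if there is none."""
    # Speculative step: every candidate y is evaluated and tagged with its index.
    tagged = [(m(x, y), y) for y in range(n)]
    # Reduction step: 'first' returns the index of the first zero value.
    return next((y for value, y in tagged if value == 0), None)


# Hypothetical m: it is 0 exactly when (y + 1)^2 exceeds x, so the
# minimal y is the integer square root of x.
def m(x, y):
    return 0 if (y + 1) * (y + 1) > x else 1


print(minimalization(m, 17, 8))
print(minimalization(m, 100, 8))
```

The second call shows the price of a finite array: when no candidate in the range satisfies the condition, the search has to be continued on the next range.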


Serial Composition & Reduction Composition for Minimalization

Example of dynamic reconfiguration (P_i: the i-th pipe stage)

[Figure: in stage P_i a selector, driven by a selection code, chooses among the outputs y_{i-1}, y_{i-2}, …, y_{i-s} of the preceding stages P_{i-1}, P_{i-2}, …; the block f_i then delivers y_i]


Functional Taxonomy of Parallel Computation

• Data Parallel Computation: uses SIMD-like machines

• Time Parallel Computation: uses a very special sort of MIMD machine, one that computes only one function

• Speculative Computation: uses MISD machines, which current implementations ignore completely


Integral Parallel Architecture

An Integral Parallel Architecture (IPA) uses all kinds of parallelism to build a real machine, in two versions:

• complex IPA: all types of parallel mechanisms tightly interleaved on the same physical structure (pipelined superscalar speculative general purpose processors)

• intensive IPA: all types of parallel mechanisms highly separated, implemented on specific physical structures (accelerators for embedded computation in a SoC approach)


Intensive IPA

Intensive IPAs are used as accelerators for complex IPAs

1. Monolithic intensive IPA: the same machine works in two modes:
   • Data parallel
   • Time parallel

2. Segregated intensive IPA: two distinct machines are used, one for data parallel computation and the other for time parallel (i.e. speculative) computation


The Connex Chip

The organization of BA1024:

• multi-core area of 4 MIPS processors

• many-core data parallel area of 1024 simple PEs

• speculative time parallel pipe of 8 PEs

• interfaces (DDR, PCI, video & audio interfaces for 2 HDTV channels)


The Connex System

• Connex Array: 1,024 linearly connected 16-bit Processing Cells

• Sequencer: 32-bit stack machine with program memory and data memory; in each cycle it issues (on a 2-stage pipe) one 64-bit instruction for the Connex Array and one 24-bit instruction for itself

• IO Controller: 32-bit stack machine that controls a 3.2 GB/s IO channel

• Processing Cell: integer unit, data memory and Boolean unit

The I/O channel works in parallel with the code running on the Connex Array

[Figure: block diagram of the Connex system. Labels: I/O Controller (4KB data & 4KB program memory); Sequencer (4KB data & 32Kb program memory); Connex Array; Connex I/O; AUX. Each processing cell shows a 16-bit ALU, registers R0 to R7, a Boolean unit, an index, an address, and a 16-bit RAM for data with lines 0 to 255]


Connex Array Structure

Processing Cells are linearly connected using only the register R0

The IO Plane consists of all the R1 registers, supervised mainly by the IO Controller

Conditional execution is based on the state of the Boolean unit

In each cycle the integer unit, the Boolean unit and the data memory execute command fields from a 64-bit instruction issued by the Sequencer

Vector reduction operations deliver scalar results to the top of stack (TOS) of the Sequencer, which receives data from the array of cells through a 3-stage pipe

[Figure: the array of cells 0 to 1023; each cell has a 16-bit ALU, registers R0 to R7, a data memory with lines 0 to 255, and a Boolean unit that marks the cell on or off]


I/O System

[Figure: block diagram of the I/O system. Labels: Connex Array, I/O Plane, IOC, IS, Interrupts, Switch Fabric (128-bit word), DDR-DRAM Controller, DRAM]


[Figure: BA1024 block diagram. Labels: ConnexArray™ Programmable Media Processor (Multi-Codec Processing, Pre-Analysis, 3D Filter, Scaling, Video Merge/Blend, Motion Adaptive De-interlacing); Instruction Sequencer; I/O Sequencer; Stream Accelerator; Host CPU; Audio CPU; TS/Sec CPU; Video CPU; Configurable Switch Fabric; DDR-DRAM Ctrl (400 MHz Data Rate) with 64-bit wide DRAM; Host I/F (PCI v2.2 or Generic); Ext. Bus (Flash); Video In and Video Out (BT.656/1120); Audio In and Audio Out (1x-I2S, 4xI2S, S/PDIF); Test (ICE, EJTAG); GPIO; I2C]


The Connex Architecture

• Vectors & selections

• Programming Connex

• Performance


Vectors & Selections

• Linear array of processing elements → vectors

• Local data memory in each processing element → array of vectors

• Data-dependent operations at the level of each processing element → selections


Full Line Operations

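Line k = Line i OP Line j

Line k = Line i OP scalar value (repeated for all elements)

OP: +, -, *, XOR, etc., on 16-bit data operands

[Figure: the array memory seen as 256 lines (0 to 255) by 1,024 columns (0 to 1023); Line i and Line j are combined into Line k]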


Columns Active Based On Repeating Patterns

[Figure: Line k = Line i OP Line j (OP: +, -, *, XOR, etc.) in the 256-line by 1,024-column array, performed only in the active columns]

Mark all odd columns active. Or mark every third column active. Or mark every third and fourth column active, etc.
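A pattern-based selection can be sketched in Python as a Boolean list derived from the column index. Everything here is an illustrative assumption: the 8-column array (the real one has 1,024 columns), the helper name `line_op`, the phase chosen for "every third column", and the convention that inactive columns keep the previous content of Line k.

```python
import operator

N_COLS = 8  # hypothetical small array, for readability only

index = list(range(N_COLS))
odd_columns = [i % 2 == 1 for i in index]
every_third = [i % 3 == 0 for i in index]  # assumed to start at column 0


def line_op(op, line_k, line_i, line_j, active):
    """Line k = Line i OP Line j in active columns; other columns are kept."""
    return [
        op(a, b) if on else old
        for old, a, b, on in zip(line_k, line_i, line_j, active)
    ]


line_i = [10] * N_COLS
line_j = index
line_k = [0] * N_COLS

odd_result = line_op(operator.add, line_k, line_i, line_j, odd_columns)
third_result = line_op(operator.xor, line_k, line_i, line_j, every_third)
print(odd_result)
print(third_result)
```

Because the pattern depends only on the column index, it can be computed once and reused for any number of line operations.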


Columns Active Based On Data Content

[Figure: Line k = Line i OP Line j (OP: +, -, *, XOR, etc.) in the 256-line by 1,024-column array, performed only in the active columns]

Apparently random columns are marked active, based on the data-dependent results of previous operations.


Outer-Loop Parallelism

Example: 128 sets of 8x8 run in parallel in a 1024-cell array

[Figure: between Line i and Line j the array holds a row of 8x8 blocks side by side, each block spanning 8 columns (0 to 7) and 8 lines (0 to 7)]


Programming Connex

VectorC is an extension/restriction of C++

Code that operates on scalar data is written in regular C notation

Connex-specific operators are defined as functions for features not available in C++, e.g. operations on vectors and selections (Boolean vectors)

VectorC uses sequential operators and control structures on vector and select data types

Using VectorC, the Connex Machine is programmed the same way as conventional sequential machines

    // Find the absolute difference between two vectors
    vector mm_absdiff(vector V1, vector V2) {
        vector V;
        V = V1 - V2;
        WHERE (V < 0) {
            V = -V;                      // V = abs(V)
        } ENDW
        return V;
    }

    int main() {
        vector V1 = 2;                   // V1 = {2, 2, … 2}
        vector V2 = 3;                   // V2 = {3, 3, … 3}
        vector V;                        // V = {0, 0, … 0}
        vector Index = indexvector();    // Index = {0, 1, … }
        V = mm_absdiff(V1, V2);          // V = {1, 1, … 1}
        return 0;
    }

Vectors are arrays of scalar components.

Selections are arrays of Boolean values that dictate what vector components are active.
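For readers without access to VectorC, the effect of a WHERE block can be imitated in plain Python. This is only an emulation under stated assumptions: the helper `where` is hypothetical, lists stand for vectors, and a list of Booleans stands for the selection; nothing here is the VectorC API.

```python
def where(selection, then_values, else_values):
    """Take then_values in selected positions and else_values elsewhere."""
    return [t if s else e for s, t, e in zip(selection, then_values, else_values)]


def mm_absdiff(v1, v2):
    """Component-wise absolute difference of two equally long vectors."""
    v = [a - b for a, b in zip(v1, v2)]
    negative = [c < 0 for c in v]  # the selection: active where V < 0
    negated = [-c for c in v]      # computed everywhere, kept only where selected
    return where(negative, negated, v)


print(mm_absdiff([2, 5, 7, 0], [3, 1, 7, 9]))
```

The selection is data dependent: it is derived from V itself, and components outside the selection keep the value they had before the WHERE block.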


Overall performance of the BA1024

200 GOP/sec

3.2 GB/sec: external bandwidth

400 GB/sec: internal bandwidth

> 60 GOP/Watt

> 2 GOP/mm²

Note: 1 OP = 16-bit simple integer operation (excluding multiplication)


How to Use the Connex Environment for Membrane Computation

Example (G. Paun):

• the initial configuration: [_1 [_2 [_3 a f c ]_3 ]_2 ]_1 ...

• R1: e → (e, out), f → f

• R2: b → d, d → de, ff → f, cf → cdδ

• R3: a → ab, a → bδ, f → ff


The first example of processing

Initial vector: (1,[) (2,[) (3,[) (0,a) (0,f) (0,c) (3,]) (2,]) (1,]) ...

                     [[[a f c] ] ]...
a → ab, f → ff:      [[[a b f f c] ] ]...                 // 11 clock cycles
a → ab, f → ff:      [[[a b b f f f f c] ] ]...           // 15 clock cycles
a → bδ, f → ff:      [[b b b f f f f f f f f c ] ]...     // 27 clock cycles
b → d, ff → f:       [[d d d f f f f c ] ]...             // 10 clock cycles
d → de, ff → f:      [[d e d e d e f f c ] ]...           // 10 clock cycles
d → de, cf → cdδ:    [d e e d e e d e e d f c ]...        // 10 clock cycles
e → (e, out), f → f: [d d d d f c ] e e e e e e...        // 15 clock cycles

total: 98 clock cycles


The second example of processing

Initial vector: (1,[) (2,[) (3,[) (1,a) (1,f) (1,c) (3,]) (2,]) (1,]) ...

[[[1a 1f 1c] ] ]...
[[[1a 1b 2f 1c] ] ]...   // in 5 clock cycles
[[[1a 2b 4f 1c] ] ]...   // in 5 clock cycles
[[3b 8f 1c ] ]...        // in 10 clock cycles
[[3d 4f 1c ] ]...        // in 7 clock cycles
[[3d 3e 2f 1c ] ]...     // in 8 clock cycles
[4d 3e 1f 1c ]...        // in 8 clock cycles
[4d 1f 1c ] 3e...        // in 5 clock cycles

total: 48 clock cycles
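The multiplicity encoding used in this example can be mimicked with a multiset in Python. The sketch below is a simplification under stated assumptions: it follows only the contents of the innermost live membrane, it leaves out the membrane structure and the dissolution δ, the choice between a → ab and a → bδ is made by hand, and `apply_rules` (a hypothetical helper) is valid only when the left-hand sides of the rules applied in one step share no symbol. It reproduces the first five steps listed above.

```python
from collections import Counter


def apply_rules(contents, rules):
    """One maximally parallel step; rule left-hand sides must not share symbols."""
    remaining = Counter(contents)
    produced = Counter()
    for lhs, rhs in rules:
        need = Counter(lhs)
        # Apply the rule as many times as the available objects allow.
        times = min(remaining[s] // k for s, k in need.items())
        for s, k in need.items():
            remaining[s] -= k * times
        for s in rhs:
            produced[s] += times
    produced.update(remaining)  # objects not consumed by any rule carry over
    return +produced            # unary plus drops zero counts


steps = [
    [("a", "ab"), ("f", "ff")],  # a → ab, f → ff
    [("a", "ab"), ("f", "ff")],  # a → ab, f → ff
    [("a", "b"), ("f", "ff")],   # a → bδ, f → ff (δ itself is not modeled)
    [("b", "d"), ("ff", "f")],   # b → d, ff → f
    [("d", "de"), ("ff", "f")],  # d → de, ff → f
]

config = Counter("afc")  # 1a 1f 1c
history = [config]
for rules in steps:
    config = apply_rules(config, rules)
    history.append(config)

for c in history:
    print(" ".join(f"{c[s]}{s}" for s in "abdefc" if c[s]))
```

With multiplicities, the cost of a step no longer depends on how many copies of an object are present, which is the difference between the first and the second example.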


The third example of processing

The third membrane is duplicated (multiplied), but the contents of the copies can differ

[[[1a 1f 1c] [2a 1f 1c] ] ]...
[[[1a 1b 2f 1c] [2a 2b 2f 1c] ] ]...   // in 5 clock cycles
[[[1a 2b 4f 1c] [2a 4b 4f 1c] ] ]...   // in 5 clock cycles
[[3b 8f 1c 6b 8f 1c ] ]...             // in 10 clock cycles
[[3d 4f 1c 6d 4f 1c ] ]...             // in 7 clock cycles
[[3d 3e 2f 1c 6d 6e 2f 1c ] ]...       // in 8 clock cycles
[4d 3e 1f 1c 7d 6e 1f 1c]...           // in 8 clock cycles
[4d 1f 1c 7d 1f 1c ] 9e...             // in 10 clock cycles

total: 53 clock cycles

For up to 200 level-3 membranes the number of clock cycles remains 53.


Concluding Remarks

1. Functional taxonomy vs. Flynn taxonomy

2. Connex architecture accelerates membrane computation

3. An efficient P-architecture asks for only a few features added to the Connex architecture

4. Why not a P-language?


Main technical contributors to the Connex project:

Emanuele Altieri, BrightScale Inc., CA
Lazar Bivolarski, BrightScale Inc., CA
Frank Ho, BrightScale Inc., CA
Mihaela Malita, St. Anselm College, NH
Bogdan Mitu, BrightScale Inc., CA
Dominique Thiebaut, Smith College, MA
Tom Thomson, BrightScale Inc., CA
Dan Tomescu, BrightScale Inc., CA


Thank You

Mihaela’s webpage on VectorC: www.anselm.edu/homepage/mmalita/

Q&A