Page 1

November 18, 2005 PACL and ASC Processor Research Overview

Research Overview
Parallel and Associative Computing Group and the ASC Processor Group

Kent State University

Dr. Johnnie Baker, Dr. Robert Walker, and Dr. Jerry Potter (Emeritus), Michael Scherger, Wittaya Chantamas, Hong Wang

Sabegh Singh Virdi, Shannon Steinfadt, Kevin Schaffer

Department of Computer Science

Kent State University

Kent, Ohio

Page 2

Presentation Outline

Associative and Parallel Algorithms and Applications
  Design of an Associative Parallel Database Server – Michael Scherger
  Longest Common Subsequence Algorithm on ASC Processors Using a Coterie Network – Sabegh Singh Virdi
  Shannon’s Presentation Title Goes Here…
  Multithreaded ASC – Kevin Schaffer

Page 3

MASC Parallel Database Server

[Figure: array of PEs, each holding one record of the table below]

AS    HDG   ALT   LON     LAT     AID      R   Busy/Idle
330   90    70    84.13   39.54   CO2341   T   T
…     …     …     …       …       …        …   F
…     …     …     …       …       …        …   F
…     …     …     …       …       …        …   F
…     …     …     …       …       …        …   F
305   315   290   82.53   40.0    AA1223   F   T
525   0     150   81.51   41.24   UA722    F   T
275   225   60    84.31   39.9    AA123    F   T
450   0     270   81.26   40.55   CO56     T   T

Page 4

MASC Parallel Database Server

Database is resident in parallel array memory.

Parallel memory map is similar to table layout.

Size of parallel array is fixed during execution. Only physical PEs are used.

PE memory size is fixed.

Table records are placed in the array memory in “folds”.

Parallel array memory is allocated dynamically.

Coalescing parallel memory manager (CPMM).

Tables will also have a set of parallel control bits.

Page 5

MASC Parallel Database Server

[Figure: database structure. The Database holds a Table List; each table has a Field List and a Fold List, shown for folds Table A1 and Table A2.]

Field List: stores relative offsets of table fields.
Fold List: stores base memory address of each fold.
Light shaded regions are empty records for table A.
Dark shaded regions are occupied records for table A.
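The structure in the figure can be sketched as a small data model. This is only an illustration under stated assumptions, not the server's code: the class names `Fold`, `Table`, and `Database`, the field offsets, and the array size of 4 PEs are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Fold:
    base_address: int      # base address of this fold in PE memory
    busy: list             # one Busy/Idle flag per PE (True = occupied)

@dataclass
class Table:
    name: str
    field_offsets: dict    # field list: field name -> offset relative to fold base
    folds: list = field(default_factory=list)   # fold list

    def address_of(self, fold_index, field_name):
        """Address of a field inside a given fold (the same in every PE)."""
        return self.folds[fold_index].base_address + self.field_offsets[field_name]

@dataclass
class Database:
    tables: dict = field(default_factory=dict)  # table list: name -> Table

NUM_PES = 4  # hypothetical array size
flights = Table("FLIGHTS", {"AID": 0, "LAT": 8, "LON": 12, "ALT": 16, "AS": 20})
flights.folds.append(Fold(base_address=0, busy=[True] * NUM_PES))
flights.folds.append(Fold(base_address=64, busy=[True, False, False, False]))
db = Database({"FLIGHTS": flights})

# Fold base plus relative field offset gives the location of LON in fold 1.
lon_address = db.tables["FLIGHTS"].address_of(1, "LON")
```

Keeping field offsets relative to the fold base means a new fold only needs one extra base address in the fold list.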

Page 6

Multiple Tables and Folds

[Figure: parallel memory holding folds Table A1, Table B1, and Table B2, followed by Free Parallel Memory]

Page 7

Parallel Database Operations

SQL INSERT Algorithm
  Perform associative search on BI (Busy/Idle) field to find first/next cell for insert.
  If none found, create new fold.
  Copy data from sequential to parallel memory.
  O(1)…(O(#folds))
  Uses broadcast / reduction network.

[Figure: folds Table A1 and Table A2]

INSERT INTO FLIGHTS( AID, LAT, LON, ALT, AS )

VALUES( ‘CO128’, 43.39, 83.67, 190, 450 )
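The INSERT steps above can be sketched sequentially. This is a hedged emulation, not the MASC implementation: the fold is modeled as a Python list with one cell per PE, `NUM_PES`, `new_fold`, and `insert` are hypothetical names, and the per-fold search that the associative array would do in one parallel step is emulated with a loop.

```python
NUM_PES = 4  # hypothetical number of PEs; each fold holds one record per PE

def new_fold():
    # One cell per PE: a Busy/Idle flag plus the record stored in that PE.
    return [{"busy": False, "record": None} for _ in range(NUM_PES)]

def insert(folds, record):
    """Insert a record into the first idle cell; return (fold index, PE index)."""
    for f, fold in enumerate(folds):              # at most #folds iterations
        # On the associative array this search is a single parallel step;
        # the generator below only emulates it.
        pe = next((i for i, cell in enumerate(fold) if not cell["busy"]), None)
        if pe is not None:
            fold[pe] = {"busy": True, "record": record}
            return f, pe
    folds.append(new_fold())                      # no idle cell: create a new fold
    folds[-1][0] = {"busy": True, "record": record}
    return len(folds) - 1, 0

folds = []
row = {"AID": "CO128", "LAT": 43.39, "LON": 83.67, "ALT": 190, "AS": 450}
first = insert(folds, row)
for n in range(NUM_PES):
    last = insert(folds, {"AID": f"X{n}"})        # filler records force a second fold
```

The loop over folds is what gives the O(1) to O(#folds) range quoted on the slide: one step if the first fold has an idle cell, one step per fold otherwise.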

Page 8

Parallel Database Operations

SQL DELETE Algorithm
  Perform associative search on WHERE clause. (Flag responders)
  Reset BI field where responder flag is set true. Data remains in memory.
  If all PE records are idle, CPMM marks block as free.
  O(1)…(O(#folds))

[Figure: folds Table A1 and Table A2]

DELETE FROM FLIGHTS

WHERE AID = ‘NW 545’
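A minimal sequential sketch of the DELETE steps follows. It assumes the same hypothetical cell layout as a Busy/Idle flag plus a record per PE; `delete_where` is an illustrative name, and the list of freed folds stands in for whatever the memory manager would actually do.

```python
def delete_where(folds, predicate):
    """Mark matching records idle; return indices of folds with no busy cells."""
    freed = []
    for f, fold in enumerate(folds):
        # Associative search: every PE evaluates the WHERE clause on its record.
        responders = [cell["busy"] and predicate(cell["record"]) for cell in fold]
        for cell, hit in zip(fold, responders):
            if hit:
                cell["busy"] = False   # reset BI only; the data stays in memory
        if not any(cell["busy"] for cell in fold):
            freed.append(f)            # a memory manager could reclaim this block
    return freed

folds = [
    [{"busy": True, "record": {"AID": "NW 545"}},
     {"busy": True, "record": {"AID": "CO56"}}],
    [{"busy": True, "record": {"AID": "NW 545"}},
     {"busy": False, "record": None}],
]
freed = delete_where(folds, lambda r: r["AID"] == "NW 545")
```

Because only the flag is reset, a later INSERT can reuse the cell without any data movement.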

Page 9

Future Work

MASC Parallel Database Server
  Change the parallel memory manager to support multiple instruction streams

[Figure: parallel memory holding Table A, Table B, and Table A x B, with regions of Free Parallel Memory and Internally Fragmented Parallel Memory]

Page 10

Longest Common Subsequence Algorithm on ASC Processors using Coterie Network

Sabegh Singh Virdi

ASC Processor Group
Computer Science Department

Kent State University

Page 11

Presentation Outline

Introduction to String matching and its variations
Role of LCS in Molecular Biology
Overview of LCS
Brief introduction on Coterie Network
Longest Common Subsequence on Coterie Network
  Exact match
  Approximate match
Summary and Future work

Page 12

String Matching

One of the most fundamental operations in computing.
Comparing two linear arrays of characters
Applications in bioinformatics, searching genetic databases
The strings involved are, however, enormous; efficient string processing is therefore a requirement

Page 13

String Matching Variations

Is exact match the only solution?
What if the pattern does not occur in the text?
It still makes sense to find the longest subsequence that occurs both in the pattern and in the text.
  This is the longest common subsequence problem.
Longest common subsequence, longest common substring, sequence alignment, and edit distance problems are all variations of the string matching (SM) problem.


Page 15

Role of LCS in Molecular biology

DNA sequences (genes) can be represented as sequences of four letters A, C, G, and T corresponding to the four submolecules forming DNA

When biologists find a new sequence, they typically want to know what other sequences it is most similar to

One way of computing how similar (homologous) two sequences are is to find the length of their longest common subsequence

Page 16

Role of LCS in Molecular biology

This is a simplification, since in the biological situation one would typically take into account not only the length of the LCS, but also e.g. how gaps occur when the LCS is embedded in the two original sequences.

An obvious measure for the closeness of two strings is to find the maximum number of identical symbols (preserving symbol order)

This, by definition, is the length of the longest common subsequence of the strings

Page 17

Overview of LCS Algorithm

Given two strings, find the LCS common to both strings.
Example:
  String 1: AGACTGAGGTA
  String 2: ACTGAG

  List of possible alignments:
  AGACTGAGGTA
  --ACTGAG---
  --ACTGA-G--
  A--CTGA-G--
  A--CTGAG---

The time complexity of this algorithm is clearly O(nm), where n and m are the lengths of the two strings.
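The O(nm) bound is what the standard dynamic-programming formulation gives: one table cell per pair of positions, each filled in constant time. The sketch below uses that textbook recurrence on the example strings; the function name `lcs_table` is chosen for illustration, and the slides do not state which sequential variant the authors had in mind.

```python
def lcs_table(text, pattern):
    """Fill the (m+1) x (n+1) LCS length table in O(n*m) time."""
    n, m = len(text), len(pattern)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if pattern[i - 1] == text[j - 1]:
                # A match extends the LCS of the two shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise carry over the better of the two neighbors.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table

table = lcs_table("AGACTGAGGTA", "ACTGAG")
lcs_length = table[-1][-1]   # bottom-right cell holds the LCS length
```

Every cell is visited exactly once regardless of the characters involved, so the running time depends only on the two lengths.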

Page 18

Overview of LCS Algorithm

Actually, this time does not depend on the sequences u and v themselves but only on their lengths

The bottleneck in efficient parallelization of the LCS problem is calculating the values of the diagonal elements, as shown

As seen, the value of element {i,j} depends upon the previous element {i-1,j-1} when a match is found

Page 19

Overview of LCS Algorithm

Possibility of more than one LCS
Associate some parameters
Take gap penalties into account
The Smith-Waterman algorithm uses the same concept as the LCS algorithm, but gives us the optimal result

Page 20

Overview of LCS Algorithm

        A  G  A  C  T  G  A  G  G  T  A
     0  0  0  0  0  0  0  0  0  0  0  0
  A  0  1  1  1  1  1  1  1  1  1  1  1
  C  0  1  1  1  2  2  2  2  2  2  2  2
  T  0  1  1  1  2  3  3  3  3  3  3  3
  G  0  1  2  2  2  3  4  4  4  4  4  4
  A  0  1  2  3  3  3  4  5  5  5  5  5
  G  0  1  2  3  3  3  4  5  6  6  6  6

Page 21

Communication between PEs

In a 2D mesh network, communication between the PEs themselves takes place in two different ways:
  By using the nearest-neighbor mesh interconnection network
  By using a powerful variation on the nearest-neighbor mesh called the “Coterie network”, developed in response to the requirement for nonlocal communication
    Properties significantly different from the usual mesh


Page 23

Coteries [Weems & Herbordt]

“A small often selected group of persons who associate with one another frequently”
Features:
  Related to other reconfigurable broadcast networks
  Describable using hypergraphs
  Dynamic in nature
Advantages:
  Propagation of information quickly over long distances at electrical speed
  Support of one-to-many communication within a coterie, reconfigurability of the coterie

Page 24

PEs form Coteries

[Figure: 5 x 5 coterie network with switches shown in “arbitrary” settings. Shaded areas denote coteries (the sets of PEs sharing the same circuit).]

Page 25

Coterie’s Physical Structure

In the physical implementation, each PE controls a set of switches:
  Four of these switches control access in the different directions (N, S, E, W)
  Two switches, H and V, are used to emulate horizontal and vertical buses
  The last two switches, NE and NW, are used to create eight-way connected regions

[Figure: switch layout within one PE, with labels N, S, E, W, H, V, NE, NW, ES, and WS]
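One way to see how switch settings partition the array into coteries is to treat them as connected components. The sketch below is a simplified model, not the hardware's behavior: it considers only the N, S, E, and W switches, assumes two neighboring PEs share a circuit only when both facing switches are closed, and the function name `coteries` is hypothetical.

```python
def coteries(switches):
    """Group PEs into coteries given each PE's set of closed switches.

    switches[r][c] is a set drawn from {"N", "S", "E", "W"}.
    Assumption: neighbors share a circuit only if both facing switches are closed.
    """
    rows, cols = len(switches), len(switches[0])
    parent = {(r, c): (r, c) for r in range(rows) for c in range(cols)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and "E" in switches[r][c] and "W" in switches[r][c + 1]:
                parent[find((r, c))] = find((r, c + 1))
            if r + 1 < rows and "S" in switches[r][c] and "N" in switches[r + 1][c]:
                parent[find((r, c))] = find((r + 1, c))

    groups = {}
    for pe in parent:
        groups.setdefault(find(pe), []).append(pe)
    return sorted(groups.values())

grid = [
    [{"E"}, {"W", "S"}, set()],
    [set(), {"N"}, set()],
]
result = coteries(grid)
```

Changing a single switch can merge or split components, which is the sense in which the network is reconfigurable.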


Page 27

LCS Algorithm on Coterie Network

A G A C T G A G G T A

Page 28

LCS Algorithm on Coterie Network

A G A C T G A G G T A

Page 29

LCS Algorithm on Coterie Network

A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A

Content of each PE after MULTICAST operation

Page 30

LCS Algorithm on Coterie Network

A

C

T

G

A

G

Page 31

LCS Algorithm on Coterie Network

A

C

T

G

A

G

Page 32

LCS Algorithm on Coterie Network

A A A A A A A A A A A
C C C C C C C C C C C
T T T T T T T T T T T
G G G G G G G G G G G
A A A A A A A A A A A
G G G G G G G G G G G

Content of each PE after MULTICAST operation

Page 33

LCS Algorithm on Coterie Network

     A  G  A  C  T  G  A  G  G  T  A
  A  1  0  1  0  0  0  1  0  0  0  1
  C  0  0  0  1  0  0  0  0  0  0  0
  T  0  0  0  0  1  0  0  0  0  1  0
  G  0  1  0  0  0  1  0  1  1  0  0
  A  1  0  1  0  0  0  1  0  0  0  1
  G  0  1  0  0  0  1  0  1  1  0  0
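After the two MULTICAST steps, each PE holds one text character and one pattern character, so the grid of 0s and 1s is consistent with every PE comparing its two characters locally. The sketch below emulates that comparison sequentially; `match_matrix` is an illustrative name, and the nested loops stand in for what the PE array would presumably do in a single parallel step.

```python
def match_matrix(text, pattern):
    """M[i][j] is 1 where pattern[i] equals text[j], else 0."""
    return [[1 if p == t else 0 for t in text] for p in pattern]

text, pattern = "AGACTGAGGTA", "ACTGAG"
matrix = match_matrix(text, pattern)

# Print the grid with the pattern character labeling each row.
print(" ", *text)
for row_char, row in zip(pattern, matrix):
    print(row_char, *row)
```

Since no PE needs a neighbor's result for this step, the comparison itself does not grow with the string lengths, as long as there is one PE per cell.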

Page 34

LCS Algorithm on Coterie Network

     A  G  A  C  T  G  A  G  G  T  A
  A  1  0  1  0  0  0  1  0  0  0  1
  C  0  0  0  1  0  0  0  0  0  0  0
  T  0  0  0  0  1  0  0  0  0  1  0
  G  0  1  0  0  0  1  0  1  1  0  0
  A  1  0  1  0  0  0  1  0  0  0  1
  G  0  1  0  0  0  1  0  1  1  0  0

Inject unique token

Page 35

LCS Algorithm on Coterie Network

We try to refine the exact match algorithm to support approximate matching
We make use of tokens
The next example demonstrates this problem for the strings:
  Text: AGACTGAGGTA
  Pattern: ACTAAG


Page 37

LCS Algorithm on Coterie Network

     A  G  A  C  T  G  A  G  G  T  A
  A  1  0  1  0  0  0  1  0  0  0  1
  C  0  0  0  1  0  0  0  0  0  0  0
  T  0  0  0  0  1  0  0  0  0  1  0
  A  1  0  1  0  0  0  1  0  0  0  1
  A  1  0  1  0  0  0  1  0  0  0  1
  G  0  1  0  0  0  1  0  1  1  0  0

Inject unique token

Page 38

Token method

In this method, we explicitly close the W-S switch based on some condition
Inject unique token symbols
Where the H and V switches are set within a PE, we close the W-S switch as shown
As the token traverses, keep track of gaps and matches
Resulting in a path from the first row to the last row
Value in CR gives the length of the LCS, and in SR the number of gaps that occurred (first row of PEs)

Page 39

LCS Algorithm on Coterie Network

     A  G  A  C  T  G  A  G  G  T  A
  A  1  0  1  0  0  0  1  0  0  0  1
  C  0  0  0  1  0  0  0  0  0  0  0
  T  0  0  0  0  1  0  0  0  0  1  0
  A  1  0  1  0  0  0  1  0  0  0  1
  A  1  0  1  0  0  0  1  0  0  0  1
  G  0  1  0  0  0  1  0  1  1  0  0

Inject unique token


Page 41

Summary and Future work

We have presented two variations of the LCS algorithm
We have explored a new network for this problem
Constant time algorithm for exact match
Approximate algorithm depends upon the diameter of the network

Page 42

Summary and Future work

Future Work:
  Optimize the algorithm for approximate match
  Implement the algorithm on an FPGA model
  Incorporate the don’t care symbol
  Extend the idea to support sequence alignment
  Conserve memory by using an encoding scheme
  We can use virtual simulation of PEs in case we run out of PEs

Page 43

Acknowledgements

Professor Walker
Professor Baker
Professor Weems
Professor Herbordt
Professor Piontkivska
Kevin Schaffer, Hong Wang, Shannon Steinfadt, Jalpesh Chitalia, and Michael Scherger

Page 44

THANK YOU

Page 45

Multithreaded ASC

Kevin Schaffer

ASC Processor Group
Computer Science Department

Kent State University

Page 46

Motivation

As we scale up the number of PEs, broadcast and reduction operations become more expensive due to wire delays

Deeper pipelines can absorb wire delays, but the cost is higher operational latencies

It becomes more difficult to keep the pipeline fully utilized, especially in associative programs
  Maximum/minimum search
  Branch if any responders

Possible solution: use instructions from multiple threads to keep the pipeline full

Page 47

Execution Time vs. Latency

[Figure: execution time (cycles, axis 0 to 450) versus communication latency (cycles, 1 to 8) for ASC, Multithreaded ASC, and MASC]

Page 48

Throughput vs. Latency

[Figure: normalized throughput (instructions/cycle, axis 0 to 1) versus communication latency (cycles, 1 to 8) for ASC, Multithreaded ASC, and MASC]

Page 49

Designing a SIMD Pipeline

Separate scalar and parallel pipelines
Parallel pipeline includes extra stages to handle broadcast and reduction delays
Allows scalar and parallel instructions to complete out of order with respect to each other

[Figure: pipeline diagram. Scalar pipeline: IF, ID, EX, M, WB. Parallel pipeline: IF, ID, B1, B2, RF, EX, M, R3, R4, WB.]

Page 50

Broadcast Dependency

Due to the shorter scalar pipeline, forwarding can eliminate stalls caused by broadcast dependencies

A longer scalar pipeline would have caused a stall

[Figure: pipeline timing diagram, cycles 1 to 12]

Cycle              1   2   3   4   5   6   7   8   9   10  11  12
ADD  S2, S3, S4    IF  ID  EX  M   WB
PSUB P5, P6, S2        IF  ID  B1  B2  RF  EX  M   R3  R4  WB
PAND P7, P8, S2            IF  ID  B1  B2  RF  EX  M   R3  R4  WB

Page 51

Reduction Dependency

No amount of forwarding hardware can eliminate stalls due to reduction dependencies

Must use instructions from other threads to fill in the unused execution slots

[Figure: pipeline timing diagram, cycles 1 to 11]

RMAX S2, P3       IF  ID  B1  B2  RF  R1  R2  R3  R4  WB
ADD  S5, S6, S2   IF, then ID repeated (stalled) while waiting for S2, then EX and WB

Page 52

Instruction Issue

Use instructions from multiple threads to keep the pipeline fully utilized
Instructions within a thread are issued in order
Scalar and parallel instructions may complete out of order, so we check for hazards before issue
Fine-grain multithreading interleaves threads at the instruction level
For now, at most one instruction issues each cycle
With separate scalar and parallel pipelines, we could issue one instruction to each
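The effect of interleaving threads can be illustrated with a toy issue model. This is a sketch under simplifying assumptions, not a model of the actual processor: every instruction depends on its predecessor in the same thread, a result becomes usable `latency` cycles after issue, one instruction issues per cycle, and ready threads are chosen round-robin. The function name `run` and all parameters are hypothetical.

```python
def run(num_threads, instrs_per_thread, latency):
    """Cycles needed when every instruction depends on its predecessor.

    Assumptions: one issue slot per cycle, in-order issue within a thread,
    and a result is usable `latency` cycles after issue.
    """
    remaining = [instrs_per_thread] * num_threads
    ready_at = [0] * num_threads    # earliest cycle each thread may issue again
    cycle = 0
    last = 0                        # thread that issued most recently
    while any(remaining):
        # Round-robin: start with the thread after the last one that issued.
        for k in range(1, num_threads + 1):
            t = (last + k) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + latency
                last = t
                break
        cycle += 1                  # the slot is wasted if no thread was ready
    return cycle

single = run(num_threads=1, instrs_per_thread=8, latency=4)
multi = run(num_threads=4, instrs_per_thread=8, latency=4)
```

In this model one thread issues 8 instructions in 29 cycles, while four threads issue 32 instructions in 32 cycles, because another thread always has a ready instruction during each stall.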

Page 53

Working with Threads

Management
  Fork and join instructions start and end threads
  Threads are allocated dynamically by the hardware
Communication
  Send and receive instructions can transmit small amounts of data quickly
  Used to synchronize register contents between parent and child threads
  Can use shared memory for large data

Page 54

Multithreaded ASC and MASC

All processors execute the same instruction at the same time in lock step
Each thread can use all the processors
Multiple threads improve pipeline utilization, but not processor utilization
Multithreading and MASC can be used together in order to improve both