Research Overview: Parallel and Associative Computing Group and the ASC Processor Group, Kent State University. Dr. Johnnie Baker, Dr. Robert Walker, and Dr. Jerry Potter (Emeritus), Michael Scherger, Wittaya Chantamas, Hong Wang, Sabegh Singh Virdi, Shannon Steinfadt, Kevin Schaffer
November 18, 2005 PACL and ASC Processor Research Overview
Research Overview
Parallel and Associative Computing Group and the ASC Processor Group
Kent State University
Dr. Johnnie Baker, Dr. Robert Walker, and Dr. Jerry Potter (Emeritus), Michael Scherger, Wittaya Chantamas, Hong Wang,
Sabegh Singh Virdi, Shannon Steinfadt, Kevin Schaffer
Department of Computer Science
Kent State University
Kent, Ohio
Presentation Outline
Associative and Parallel Algorithms and Applications
Design of an Associative Parallel Database Server – Michael Scherger
Longest Common Subsequence Algorithm on ASC Processors Using a Coterie Network – Sabegh Singh Virdi
Shannon’s Presentation Title Goes Here…
Multithreaded ASC – Kevin Schaffer
MASC Parallel Database Server
[Figure: a column of PEs (PE … PE), each holding one record of the table in its local memory. The table appears twice in the original slide; one copy is shown here.]
AS   HDG  ALT  LON    LAT    AID     R  Busy/Idle
330  90   70   84.13  39.54  CO2341  T  T
…    …    …    …      …      …       …  …
                                        F
                                        F
                                        F
                                        F
305  315  290  82.53  40.0   AA1223  F  T
525  0    150  81.51  41.24  UA722   F  T
275  225  60   84.31  39.9   AA123   F  T
450  0    270  81.26  40.55  CO56    T  T
MASC Parallel Database Server
Database is resident in parallel array memory.
Parallel memory map is similar to table layout.
Size of parallel array is fixed during execution. Only physical PEs are used.
PE memory size is fixed.
Table records are placed in the array memory in “folds”.
Parallel array memory is allocated dynamically.
Coalescing parallel memory manager.
Tables will also have a set of parallel control bits.
MASC Parallel Database Server
[Figure: the Database points to a Table List; each table has a Field List and a Fold List; table A is stored in folds A1 and A2]
Field List: stores relative offsets of table fields
Fold List: stores base memory address of each fold
Light shaded regions are empty records for table A
Dark shaded regions are occupied records for table A
Multiple Tables and Folds
[Figure: parallel memory holding folds Table A1, Table B1, and Table B2, plus free parallel memory]
Parallel Database Operations
SQL INSERT Algorithm
Perform associative search on the busy/idle (BI) field to find the first/next cell for insert.
If none found, create a new fold.
Copy data from sequential to parallel memory.
Cost: O(1) … O(#folds)
Uses the broadcast / reduction network.
[Figure: Table A1, Table A2]
INSERT INTO FLIGHTS( AID, LAT, LON, ALT, AS )
VALUES( 'CO128', 43.39, 83.67, 190, 450 )
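The INSERT steps above can be sketched in plain Python. This is a minimal sequential model, not the MASC implementation: the class name FoldedTable, the dictionary-based record slots, and the choice of the lowest-numbered idle PE as the first responder are assumptions made for illustration. The list comprehension stands in for a single associative search that the hardware would perform on all PEs at once.

```python
class FoldedTable:
    """Toy model of one table stored in parallel memory as a list of folds.

    Hypothetical structure: each fold is a list with one record slot per PE,
    and each slot is a dict whose "BI" entry is the busy/idle flag.
    """

    def __init__(self, num_pes):
        self.num_pes = num_pes
        self.folds = []

    def insert(self, record):
        """Place record in the first idle slot and return (fold, pe)."""
        for f, fold in enumerate(self.folds):  # at most O(#folds) searches
            # Stand-in for one associative search on the BI field.
            idle = [pe for pe, slot in enumerate(fold) if not slot["BI"]]
            if idle:
                pe = idle[0]  # assumed "first responder" rule
                fold[pe] = {"BI": True, **record}
                return f, pe
        # No idle slot anywhere: create a new fold of idle slots.
        self.folds.append([{"BI": False} for _ in range(self.num_pes)])
        self.folds[-1][0] = {"BI": True, **record}
        return len(self.folds) - 1, 0


flights = FoldedTable(num_pes=2)
row = {"AID": "CO128", "LAT": 43.39, "LON": 83.67, "ALT": 190, "AS": 450}
positions = [flights.insert(row) for _ in range(3)]
print(positions)  # [(0, 0), (0, 1), (1, 0)]
```

With two PEs, the third insert finds no idle slot in the first fold and therefore creates a second fold, which is the O(#folds) case on the slide.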
Parallel Database Operations
SQL DELETE Algorithm
Perform associative search on the WHERE clause (flag responders).
Reset the BI field where the responder flag is set true. Data remains in memory.
If all PE records are idle, the coalescing parallel memory manager (CPMM) marks the block as free.
Cost: O(1) … O(#folds)
[Figure: Table A1, Table A2]
DELETE FROM FLIGHTS
WHERE AID = 'NW 545'
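A similar sketch for DELETE, again as a sequential stand-in under stated assumptions: the function name delete_where, the predicate argument, and the dict-based slots are hypothetical, and the returned list only reports which folds the memory manager could reclaim rather than modeling the CPMM itself.

```python
def delete_where(folds, predicate):
    """Clear the BI flag of every busy record that matches predicate.

    folds is a list of folds; each fold is a list of dict slots with a
    "BI" (busy/idle) flag. Returns the indices of folds that are now
    entirely idle, which a memory manager could mark as free.
    """
    reclaimable = []
    for f, fold in enumerate(folds):  # one associative search per fold
        responders = [s for s in fold if s["BI"] and predicate(s)]
        for slot in responders:
            slot["BI"] = False  # only the flag is reset; data stays in place
        if not any(s["BI"] for s in fold):
            reclaimable.append(f)
    return reclaimable


folds = [
    [{"BI": True, "AID": "NW 545"}, {"BI": True, "AID": "CO56"}],
    [{"BI": True, "AID": "NW 545"}, {"BI": False}],
]
freed = delete_where(folds, lambda rec: rec.get("AID") == "NW 545")
print(freed)  # [1]
```

The deleted records keep their field values, matching the slide's note that data remains in memory until the slot is reused.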
Future Work
MASC Parallel Database Server
Change the parallel memory manager to support multiple instruction streams
[Figure: parallel memory holding Table A, Table B, and Table A x B, with regions labeled free parallel memory and internally fragmented parallel memory]
Longest Common Subsequence Algorithm on ASC Processors using
Coterie Network
Sabegh Singh Virdi
ASC Processor Group
Computer Science Department
Kent State University
Presentation Outline
Introduction to String matching and its variations
Role of LCS in Molecular Biology
Overview of LCS
Brief introduction on Coterie Network
Longest Common Subsequence on Coterie Network
  Exact match
  Approximate match
Summary and Future work
String Matching
One of the most fundamental operations in computing.
Comparing two linear arrays of characters.
Applications in bioinformatics, such as searching genetic databases.
The strings involved are, however, enormous; efficient string processing is therefore a requirement.
String Matching Variations
Is exact match the only solution?
What if the pattern does not occur in the text?
It still makes sense to find the longest subsequence that occurs both in the pattern and in the text.
This is the longest common subsequence problem.
Longest common subsequence, longest common substring, sequence alignment, and edit distance are all variations of the string matching (SM) problem.
Presentation Outline
String matching and its variations
Role of LCS in Molecular Biology
Overview of LCS
Brief introduction on Coterie Network
Longest Common Subsequence on Coterie Network
  Exact match
  Approximate match
Summary and Future work
Role of LCS in Molecular biology
DNA sequences (genes) can be represented as sequences of four letters A, C, G, and T corresponding to the four submolecules forming DNA
When biologists find a new sequence, they typically want to know what other sequences it is most similar to
One way of computing how similar (homologous) two sequences are is to find the length of their longest common subsequence
Role of LCS in Molecular biology
This is a simplification, since in the biological situation one would typically take into account not only the length of the LCS, but also e.g. how gaps occur when the LCS is embedded in the two original sequences.
An obvious measure for the closeness of two strings is to find the maximum number of identical symbols (preserving symbol order)
This, by definition, is the longest common subsequence of the strings
Overview of LCS Algorithm
Given two strings, find the LCS common to both strings. Example:
String 1: AGACTGAGGTA
String 2: ACTGAG
List of possible alignments:
AGACTGAGGTA
--ACTGAG---
--ACTGA-G--
A--CTGA-G--
A--CTGAG---
The time complexity of the standard dynamic-programming LCS algorithm is O(nm), where n and m are the lengths of the two strings.
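For reference, the O(nm) bound comes from filling one table cell per pair of positions. The following is a minimal sketch of the textbook dynamic-programming formulation, with the function name lcs_table chosen here for illustration; it is the sequential baseline, not the coterie-network algorithm presented later.

```python
def lcs_table(u, v):
    """Return the (len(u)+1) x (len(v)+1) table of LCS lengths.

    table[i][j] is the LCS length of u[:i] and v[:j]; row 0 and
    column 0 are the all-zero border.
    """
    n, m = len(u), len(v)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if u[i - 1] == v[j - 1]:
                # Match: extend the LCS of the two shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # No match: carry over the better of the two neighbors.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


table = lcs_table("ACTGAG", "AGACTGAGGTA")
print(table[-1][-1])  # 6
```

The two nested loops visit each of the n x m cells once and do constant work per cell, which is why the running time depends only on the string lengths.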
Overview of LCS Algorithm
Actually, this time does not depend on the sequences u and v themselves but only on their lengths
The bottleneck in efficient parallelization of the LCS problem is calculating the values of the diagonal elements, as shown
As seen, the value of element {i,j} depends on the previous element {i-1,j-1} when a match is found
Overview of LCS Algorithm
Possibility of more than one LCS
Associate some parameters
Take gap penalties into account
The Smith-Waterman algorithm uses the same concept as the LCS algorithm, but gives us the optimal result
Overview of LCS Algorithm
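[Figure: LCS length table; columns are the text AGACTGAGGTA, rows are the pattern ACTGAG]
      A  G  A  C  T  G  A  G  G  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
A  0  1  1  1  1  1  1  1  1  1  1  1
C  0  1  1  1  2  2  2  2  2  2  2  2
T  0  1  1  1  2  3  3  3  3  3  3  3
G  0  1  2  2  2  3  4  4  4  4  4  4
A  0  1  2  3  3  3  4  5  5  5  5  5
G  0  1  2  3  3  3  4  5  6  6  6  6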
Communication between PEs
In a 2D mesh network, communication between the PEs themselves takes place in two different ways:
By using the nearest-neighbor mesh interconnection network
By using a powerful variation on the nearest-neighbor mesh called the “Coterie network”, developed in response to the requirement for nonlocal communication
Its properties are significantly different from the usual mesh
Presentation Outline
Introduction to String matching and its variations
Role of LCS in Molecular Biology
Overview of LCS
Brief introduction on Coterie Network
Longest Common Subsequence on Coterie Network
  Exact match
  Approximate match
Summary and Future work
Coteries [Weems & Herbordt]
“A small often selected group of persons who associate with one another frequently”
Features:
  Related to other reconfigurable broadcast networks
  Describable using hypergraphs
  Dynamic in nature
Advantages:
  Propagation of information quickly over long distances at electrical speed
  Support of one-to-many communication within a coterie, and reconfigurability of the coterie
PEs form Coteries
[Figure: 5 x 5 coterie network with switches shown in “arbitrary” settings. Shaded areas denote coteries (the sets of PEs sharing the same circuit).]
Coterie’s Physical Structure
In the physical implementation, each PE controls a set of switches
Four of these switches control access in the different directions (N, S, E, W)
Two switches, H and V, are used to emulate horizontal and vertical buses
The last two switches, NE and NW, are used to create eight-way connected regions
[Figure: a PE and its switches. Labels in the diagram: N, S, E, W, H, V, NW, NE, and “WSES” (as extracted).]
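To make the idea of a coterie concrete, the sketch below groups PEs into coteries from per-PE switch settings using a union-find structure. It is a simplified model under stated assumptions: only the N, S, E, and W switches are modeled (H, V, NE, and NW are ignored), two neighboring PEs are treated as sharing a circuit only when both facing switches are closed, and the function name coteries and the set-of-letters encoding are hypothetical.

```python
def coteries(switches):
    """Group PEs into coteries (sets of PEs on the same circuit).

    switches[r][c] is the set of closed switches of the PE at row r,
    column c, drawn from "N", "S", "E", "W". Assumed rule: neighbors are
    connected only when both switches facing each other are closed.
    """
    rows, cols = len(switches), len(switches[0])
    parent = {(r, c): (r, c) for r in range(rows) for c in range(cols)}

    def find(pe):
        # Follow parent links to the representative, halving paths as we go.
        while parent[pe] != pe:
            parent[pe] = parent[parent[pe]]
            pe = parent[pe]
        return pe

    for r in range(rows):
        for c in range(cols):
            here = switches[r][c]
            if c + 1 < cols and "E" in here and "W" in switches[r][c + 1]:
                parent[find((r, c))] = find((r, c + 1))
            if r + 1 < rows and "S" in here and "N" in switches[r + 1][c]:
                parent[find((r, c))] = find((r + 1, c))

    groups = {}
    for pe in parent:
        groups.setdefault(find(pe), []).append(pe)
    return sorted(sorted(group) for group in groups.values())


grid = [
    [{"E"}, {"W", "S"}, set()],
    [set(), {"N", "E"}, {"W"}],
]
result = coteries(grid)
print(result)
# [[(0, 0), (0, 1), (1, 1), (1, 2)], [(0, 2)], [(1, 0)]]
```

Changing any switch set and calling coteries again yields a different grouping, which mirrors the dynamic, reconfigurable nature of the network described on the earlier slide.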
Presentation Outline
Introduction to String matching and its variations
Role of LCS in Molecular Biology
Overview of LCS
Brief introduction on Coterie Network
Longest Common Subsequence on Coterie Network
  Exact match
  Approximate match
Summary and Future work
LCS Algorithm on Coterie Network
A G A C T G A G G T A
LCS Algorithm on Coterie Network
A G A C T G A G G T A
LCS Algorithm on Coterie Network
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
Content of each PE after the MULTICAST operation
LCS Algorithm on Coterie Network
A
C
T
G
A
G
LCS Algorithm on Coterie Network
A
C
T
G
A
G
LCS Algorithm on Coterie Network
A A A A A A A A A A A
C C C C C C C C C C C
T T T T T T T T T T T
G G G G G G G G G G G
A A A A A A A A A A A
G G G G G G G G G G G
Content of each PE after the MULTICAST operation
LCS Algorithm on Coterie Network
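[Figure: match flags held by the PE grid; columns are the text AGACTGAGGTA, rows are the pattern ACTGAG; 1 marks a PE whose text and pattern characters are equal]
   A  G  A  C  T  G  A  G  G  T  A
A  1  0  1  0  0  0  1  0  0  0  1
C  0  0  0  1  0  0  0  0  0  0  0
T  0  0  0  0  1  0  0  0  0  1  0
G  0  1  0  0  0  1  0  1  1  0  0
A  1  0  1  0  0  0  1  0  0  0  1
G  0  1  0  0  0  1  0  1  1  0  0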
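The grid of match flags can be reproduced sequentially as a sanity check. In the sketch below, match_grid mirrors the comparison each PE performs after both multicasts; exact_match_offsets is one way to read an exact match off the grid, namely as an unbroken diagonal of 1s from the first pattern row to the last. Both function names are hypothetical, and the diagonal scan is an interpretation of the figure rather than the constant-time coterie procedure itself.

```python
def match_grid(text, pattern):
    """One row per pattern character, one column per text character.

    Cell [i][j] is 1 when pattern[i] == text[j], as each PE would
    compute locally after receiving its text and pattern characters.
    """
    return [[1 if p == t else 0 for t in text] for p in pattern]


def exact_match_offsets(text, pattern):
    """Return the text offsets where a full diagonal of 1s starts."""
    grid = match_grid(text, pattern)
    last_start = len(text) - len(pattern)
    return [
        k
        for k in range(last_start + 1)
        if all(grid[i][k + i] for i in range(len(pattern)))
    ]


grid = match_grid("AGACTGAGGTA", "ACTGAG")
print(grid[0])  # [1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
print(exact_match_offsets("AGACTGAGGTA", "ACTGAG"))  # [2]
print(exact_match_offsets("AGACTGAGGTA", "ACTAAG"))  # []
```

The empty result for ACTAAG is the situation that motivates approximate matching: no diagonal is complete, so a path through the grid has to tolerate gaps.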
LCS Algorithm on Coterie Network
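[Figure: match flags held by the PE grid; columns are the text AGACTGAGGTA, rows are the pattern ACTGAG]
   A  G  A  C  T  G  A  G  G  T  A
A  1  0  1  0  0  0  1  0  0  0  1
C  0  0  0  1  0  0  0  0  0  0  0
T  0  0  0  0  1  0  0  0  0  1  0
G  0  1  0  0  0  1  0  1  1  0  0
A  1  0  1  0  0  0  1  0  0  0  1
G  0  1  0  0  0  1  0  1  1  0  0
Inject unique token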
LCS Algorithm on Coterie Network
We try to refine the exact match algorithm to support approximate matching
We make use of tokens
The next example demonstrates this problem for the strings:
Text: AGACTGAGGTA
Pattern: ACTAAG
Presentation Outline
Introduction to String matching and its variations
Role of LCS in Molecular Biology
Overview of LCS
Brief introduction on Coterie Network
Longest Common Subsequence on Coterie Network
  Exact match
  Approximate match
Summary and Future work
LCS Algorithm on Coterie Network
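[Figure: match flags held by the PE grid; columns are the text AGACTGAGGTA, rows are the pattern ACTAAG]
   A  G  A  C  T  G  A  G  G  T  A
A  1  0  1  0  0  0  1  0  0  0  1
C  0  0  0  1  0  0  0  0  0  0  0
T  0  0  0  0  1  0  0  0  0  1  0
A  1  0  1  0  0  0  1  0  0  0  1
A  1  0  1  0  0  0  1  0  0  0  1
G  0  1  0  0  0  1  0  1  1  0  0
Inject unique token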
Token method
In this method, we explicitly close the W-S switch based on some condition
Inject unique token symbols
Where the H and V switches are set within a PE, we close the W-S switch as shown
As the token traverses, keep track of gaps and matches
The result is a path from the first row to the last row
The value in CR gives the length of the LCS, and the value in SR gives the number of gaps that occurred (first row of PEs)
LCS Algorithm on Coterie Network
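[Figure: match flags held by the PE grid; columns are the text AGACTGAGGTA, rows are the pattern ACTAAG]
   A  G  A  C  T  G  A  G  G  T  A
A  1  0  1  0  0  0  1  0  0  0  1
C  0  0  0  1  0  0  0  0  0  0  0
T  0  0  0  0  1  0  0  0  0  1  0
A  1  0  1  0  0  0  1  0  0  0  1
A  1  0  1  0  0  0  1  0  0  0  1
G  0  1  0  0  0  1  0  1  1  0  0
Inject unique token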
Presentation Outline
Introduction to String matching and its variations
Role of LCS in Molecular Biology
Overview of LCS
Brief introduction on Coterie Network
Longest Common Subsequence on Coterie Network
  Exact match
  Approximate match
Summary and Future work
Summary and Future work
We have presented two variations of the LCS algorithm
We have explored a new network for this problem
Constant-time algorithm for exact match
The approximate match algorithm depends upon the diameter of the network
Summary and Future work
Future Work:
Optimize the algorithm for approximate match
Implement the algorithm on an FPGA model
Incorporate the don’t-care symbol
Extend the idea to support sequence alignment
Conserve memory by using an encoding scheme
Use virtual simulation of PEs in case we run out of PEs
Acknowledgements
Professor Walker
Professor Baker
Professor Weems
Professor Herbordt
Professor Piontkivska
Kevin Schaffer, Hong Wang, Shannon Steinfadt, Jalpesh Chitalia, and Michael Scherger
THANK YOU
Multithreaded ASC
Kevin Schaffer
ASC Processor Group
Computer Science Department
Kent State University
Motivation
As we scale up the number of PEs, broadcast and reduction operations become more expensive due to wire delays
Deeper pipelines can absorb wire delays, but the cost is higher operational latencies
It becomes more difficult to keep the pipeline fully utilized, especially in associative programs
  Maximum/minimum search
  Branch if any responders
Possible solution: use instructions from multiple threads to keep the pipeline full
Execution Time vs. Latency
[Figure: line chart of execution time (cycles, axis 0 to 450) against communication latency (cycles, 1 to 8). Legend: ASC, Multithreaded ASC, MASC.]
Throughput vs. Latency
[Figure: line chart of normalized throughput (instructions/cycle, axis 0 to 1) against communication latency (cycles, 1 to 8). Legend: ASC, Multithreaded ASC, MASC.]
Designing a SIMD Pipeline
Separate scalar and parallel pipelines
Parallel pipeline includes extra stages to handle broadcast and reduction delays
Allows scalar and parallel instructions to complete out of order with respect to each other
[Figure: pipeline diagram. Stages IF and ID, followed by a scalar path (EX, M, WB) and a longer parallel path (B1, B2, RF, EX, M, R3, R4, WB).]
Broadcast Dependency
Due to the shorter scalar pipeline, forwarding can eliminate stalls caused by broadcast dependencies
A longer scalar pipeline would have caused a stall
[Figure: pipeline timing diagram over cycles 1 to 12]
ADD S2, S3, S4: IF ID EX M WB
PSUB P5, P6, S2: IF ID B1 B2 RF EX M R3 R4 WB
PAND P7, P8, S2: IF ID B1 B2 RF EX M R3 R4 WB
Reduction Dependency
No amount of forwarding hardware can eliminate stalls due to reduction dependencies
Must use instructions from other threads to fill in the unused execution slots
[Figure: pipeline timing diagram over cycles 1 to 11]
RMAX S2, P3: IF ID B1 B2 RF R1 R2 R3 R4 WB
ADD S5, S6, S2: IF, ID (held for several cycles while stalled), EX, WB
Instruction Issue
Use instructions from multiple threads to keep the pipeline fully utilized
Instructions within a thread are issued in order
Scalar and parallel instructions may complete out of order, so we check for hazards before issue
Fine-grain multithreading interleaves threads at the instruction level
For now, at most one instruction issues each cycle
With separate scalar and parallel pipelines, we could issue one instruction to each
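The effect of interleaving threads can be illustrated with a toy issue model. This is a sketch under strong assumptions, not the simulator behind the charts in this talk: every instruction is assumed to depend on the previous instruction of the same thread through a reduction, so a thread must wait stall_cycles extra cycles before issuing again; threads are picked round-robin; and at most one instruction issues per cycle. The function name simulate and its parameters are hypothetical.

```python
def simulate(num_threads, instrs_per_thread, stall_cycles):
    """Return the number of cycles needed to issue every instruction.

    Toy model: in-order issue within a thread, round-robin selection
    among ready threads, and at most one instruction issued per cycle.
    """
    remaining = [instrs_per_thread] * num_threads
    ready_at = [0] * num_threads  # first cycle each thread may issue again
    next_thread = 0
    cycle = 0
    while any(remaining):
        for k in range(num_threads):
            t = (next_thread + k) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + 1 + stall_cycles
                next_thread = (t + 1) % num_threads
                break  # only one instruction issues this cycle
        cycle += 1
    return cycle


single = simulate(num_threads=1, instrs_per_thread=4, stall_cycles=3)
four = simulate(num_threads=4, instrs_per_thread=4, stall_cycles=3)
print(single, four)  # 13 16
```

In this model one thread issues 4 instructions in 13 cycles, while four threads issue 16 instructions in 16 cycles: the other threads fill the slots that a single thread would leave empty while waiting on its reduction.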
Working with Threads
Management
  Fork and join instructions start and end threads
  Threads are allocated dynamically by the hardware
Communication
  Send and receive instructions can transmit small amounts of data quickly
  Used to synchronize register contents between parent and child threads
  Can use shared memory for large data
PACL and ASC Processor Research Overview
54November 18, 2005
Multithreaded ASC and MASC
All processors execute the same instruction at the same time in lock step
Each thread can use all the processors
Multiple threads improve pipeline utilization, but not processor utilization
Multithreading and MASC can be used together in order to improve both