Transcript
Page 1: Josep Torrellas (University of Illinois at Urbana ...iacoma.cs.uiuc.edu/m3t/poster/poster_sandiego.pdf

Josep Torrellas (University of Illinois at Urbana-Champaign), Ben Abbott (Southwest Research Institute), Ted Bapty (Vanderbilt University), Bob Bassett, David Ngo (BAE SYSTEMS), Hubertus Franke, Jose Moreira (IBM Research)

Architecture

Compiler Support

Software Productivity


[Figure: M3T, a grid of tiles labeled M and P]

Novel Inter-Task Optimizations

Front End

High Level Transformations

Task Selection

Inter-Task Optimizations

Code Generation

Intra-Task Optimizations

Novel compiler algorithms to build tasks

[Figure labels: Sync Bus, CPU+L1, Banked L2, On-Chip Network, Off-Chip Memory, TST, PTW, task]

TST: Task State Table

PTW: Pending Task Window

TaskScalar Morph Evaluation

Applications: [illegible in source]

Effect of Task Size: Speedups are high even for small-sized tasks.

Effect of Number of Processors: Scalability of the speedups significantly depends on the application.

Effect of Network Latency: Speedups are very tolerant to network latency.

Timeline of Tasks (Matrix): Smooth execution of tasks.

Timeline of Tasks (Bubble): Bursty spawn and execution of tasks.

Timeline of Tasks (Pathological): High load imbalance, waste of resources.

Code coarse critical section. Insert perhaps unnecessary barriers.

Detect conflicts, roll back offending threads. Use caches to store speculative state.

Lock: owner

Flag: producer

Barrier: lagging tasks

Debugging Data Races [ISCA03]

… LD A; INC; ST A …

… lock(L); LD A; INC; ST A; unlock(L) …

Task X, Task Y

[Figure: two CPUs, each with a cache, and a memory; location A]
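
The LD A / INC / ST A sequence in this panel is a read-modify-write of A, and it is only atomic if every task that touches A takes the same lock. A minimal sketch of that point follows. The round-robin interleaving, the function name `run_tasks`, and the operation lists are assumptions made for illustration; they are not the hardware mechanism described on the poster.

```python
# Deterministic toy model: two tasks share location A and are interleaved
# one operation at a time. Everything here is illustrative.

UNLOCKED = ["LD", "INC", "ST"]                    # no synchronization
LOCKED = ["lock", "LD", "INC", "ST", "unlock"]    # same update inside lock(L)


def run_tasks(programs):
    """Interleave the programs round-robin and return the final value of A."""
    memory = {"A": 0}
    lock_owner = None                  # task id currently holding L, if any
    regs = [0] * len(programs)         # one private register per task
    pcs = [0] * len(programs)          # next operation index per task

    while any(pc < len(prog) for pc, prog in zip(pcs, programs)):
        for tid, prog in enumerate(programs):
            if pcs[tid] >= len(prog):
                continue               # this task has finished
            op = prog[pcs[tid]]
            if op == "lock":
                if lock_owner is not None:
                    continue           # L is held: spin, retry next round
                lock_owner = tid
            elif op == "unlock":
                lock_owner = None
            elif op == "LD":
                regs[tid] = memory["A"]
            elif op == "INC":
                regs[tid] += 1
            elif op == "ST":
                memory["A"] = regs[tid]
            pcs[tid] += 1
    return memory["A"]


print(run_tasks([UNLOCKED, UNLOCKED]))  # 1: one increment is lost
print(run_tasks([LOCKED, LOCKED]))      # 2: both increments survive
print(run_tasks([UNLOCKED, LOCKED]))    # 1: a lock in only one task does not help
```

The third call mirrors the panel: one sequence is wrapped in lock(L) and the other is not, so the update can still be lost.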

M3T Architecture

TaskScalar Morph

[Figure labels: CPU+L1, TST, PTW]

No explicit order between `` `` and `` ``


Task Ordering

[Figure: synchronization pairs: Unlock L and Lock L, Set F and Wait F, Barrier and Barrier]
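
One possible reading of the task-ordering labels is that each synchronization pair (Unlock L then Lock L, Set F then Wait F, a barrier) orders one task before another, and that conflicting accesses from tasks with no such order are race candidates. The sketch below assumes that reading. The task names, the edge list, and the helpers `ordered_before` and `find_races` are hypothetical and are not taken from the poster or from the ISCA03 paper.

```python
from collections import defaultdict


def ordered_before(edges, a, b):
    """Return True if task a reaches task b by following ordering edges."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def find_races(edges, accesses):
    """accesses is a list of (task, address, is_write) tuples.

    Two accesses are reported when they come from different tasks, touch the
    same address, at least one is a write, and neither task is ordered
    before the other.
    """
    races = []
    for i, (t1, addr1, w1) in enumerate(accesses):
        for t2, addr2, w2 in accesses[i + 1:]:
            if t1 == t2 or addr1 != addr2 or not (w1 or w2):
                continue
            if not ordered_before(edges, t1, t2) and not ordered_before(edges, t2, t1):
                races.append((t1, t2, addr1))
    return races


# X1 ends with Unlock L and Y1 begins with Lock L, so X1 is ordered before Y1.
# Z never synchronizes with either of them.
edges = [("X1", "Y1")]
accesses = [("X1", "A", True), ("Y1", "A", True), ("Z", "A", True)]
print(find_races(edges, accesses))  # [('X1', 'Z', 'A'), ('Y1', 'Z', 'A')]
```

Graph reachability is used here only because it is the simplest way to express "no explicit order"; a real detector would more likely use timestamps or vector clocks.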


Effectiveness

[Figure: Speculative Lock: threads A to E around ACQUIRE and RELEASE, each marked Safe or Speculative]

[Figure: Speculative Barrier: threads A to C around BARRIER, each marked Safe or Speculative]

[Figure: Normalized Time Lost to Synchronization, Base vs. Spec]

17.7%

Sync Time Reduction


TaskScalar attempts to run the section in parallel: speculate past synchronization

Result: performance appears as if we had invested more man-hours
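
As a rough illustration of speculating past a lock, the sketch below lets every critical section run against the same starting snapshot, buffers its writes privately, and commits the sections in order. A section is squashed and re-executed if it read a location that an earlier section wrote. This is a simplified, sequential software model built on assumptions: the names `execute` and `run_speculatively` are hypothetical, and the private write buffer only stands in for speculative state that the hardware would hold in caches.

```python
def execute(section, state):
    """Run one critical section against `state`, buffering its writes."""
    reads, writes = set(), {}

    def load(addr):
        if addr in writes:             # read back our own buffered write
            return writes[addr]
        reads.add(addr)
        return state[addr]

    def store(addr, value):
        writes[addr] = value           # stays private until commit

    section(load, store)
    return reads, writes


def run_speculatively(memory, sections):
    """Commit sections in order and return one squash flag per section."""
    snapshot = dict(memory)            # state every section speculates on
    dirty = set()                      # addresses written by committed sections
    squashed_flags = []
    for section in sections:
        reads, writes = execute(section, snapshot)
        squashed = bool(reads & dirty)           # conflict with an earlier commit
        if squashed:
            # Roll back: discard the buffer and re-run on up-to-date memory.
            reads, writes = execute(section, memory)
        memory.update(writes)                    # commit
        dirty.update(writes)
        squashed_flags.append(squashed)
    return squashed_flags


def inc_x(load, store):
    store("x", load("x") + 1)


def inc_y(load, store):
    store("y", load("y") + 1)


memory = {"x": 0, "y": 0}
flags = run_speculatively(memory, [inc_x, inc_y, inc_x])
print(flags)   # [False, False, True]
print(memory)  # {'x': 2, 'y': 1}
```

The first two sections touch different locations, so the second commits without waiting even though a single coarse lock would have serialized it. Only the third section, which really conflicts with the first, pays for a rollback.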

Reducing Parallel Programming Effort [ASPLOS02]

[Figure: Performance vs. Parallelism for Superscalar, SMT, CMP, and TaskScalar; workloads: SpecInt, SpecFP, Scientific]

[Figure: Overhead (0% to 15%) vs. Rollback Distance [Instructions per CPU]; annotations: Better, Chosen, Recommended]