Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
�
�
Lecture 2:Overview of High-Performance Computing
-DFN�'RQJDUUD�8QLYHUVLW\�RI�7HQQHVVHH�
�
Most Important Slide
�1HWOLE�� VRIWZDUH�UHSRVLWRU\¾*R�WR�KWWS���ZZZ�QHWOLE�RUJ�
� 5HJLVWHU�IRU�WKH�QD�GLJHVW¾*R�WR�KWWS���ZZZ�QHWOLE�RUJ�QD�QHW�¾5HJLVWHU�WR�UHFHLYH�WKH�QD�GLJHVW
� MYYU���\\\�SJYQNG�TWL�SF�SJY�OTNSDRFNQDKTW\�MYRQ
�
�
Computational Science
�+3&�RIIHUV�D�QHZ�ZD\�WR�GR�VFLHQFH�¾ ([SHULPHQW�� 7KHRU\�� &RPSXWDWLRQ
� &RPSXWDWLRQ�XVHG�WR�DSSUR[LPDWH�SK\VLFDO�V\VWHPV�� $GYDQWDJHV�LQFOXGH�¾3OD\LQJ�ZLWK�VLPXODWLRQ�SDUDPHWHUV�WR�VWXG\�HPHUJHQW�WUHQGV
¾3RVVLEOH�UHSOD\�RI�D�SDUWLFXODU�VLPXODWLRQ�HYHQW
¾6WXG\�V\VWHPV�ZKHUH�QR�H[DFW�WKHRULHV�H[LVW�
�
Why Turn to Simulation?�:KHQ�WKH�SUREOHP�LV�WRR������
¾&RPSOH[¾/DUJH���VPDOO¾([SHQVLYH¾'DQJHURXV
� WR�GR�DQ\�RWKHU�ZD\�
�
�
Why Turn to Simulation?� &OLPDWH���:HDWKHU�0RGHOLQJ� 'DWD�LQWHQVLYH�SUREOHPV�
�GDWD�PLQLQJ��RLO�UHVHUYRLU�VLPXODWLRQ�
� 3UREOHPV�ZLWK�ODUJH�OHQJWK�DQG�WLPH�VFDOHV��FRVPRORJ\�
�
Units of High Performance Computing
1 Mflop/s 1 Megaflop/s 106 Flop/sec
1 Gflop/s 1 Gigaflop/s 109 Flop/sec
1 Tflop/s 1 Teraflop/s 1012 Flop/sec
1 Pflop/s 1 Petaflop/s 1015 Flop/sec
1 MB 1 Megabyte 106 Bytes
1 GB 1 Gigabyte 109 Bytes
1 TB 1 Terabyte 1012 Bytes
1 PB 1 Petabyte 1015 Bytes
�
�
•Why Parallel Computing
� 'HVLUH�WR�VROYH�ELJJHU��PRUH�UHDOLVWLF�DSSOLFDWLRQV�SUREOHPV�
� )XQGDPHQWDO�OLPLWV�DUH�EHLQJ�DSSURDFKHG�
� 0RUH�FRVW�HIIHFWLYH�VROXWLRQ
�
Automotive Industry� +XJH�XVHUV�RI�+3&�WHFKQRORJ\�
¾ )RUG�LV���WK�ODUJHVW�XVHU�RI�+3&�LQ�WKH�ZRUOG� 0DLQ�XVHV�RI�VLPXODWLRQ�
¾ $HURG\QDPLFV��VLPLODU�WR�DHURVSDFH�¾ &UDVK�VLPXODWLRQ¾ 0HWDO�VKHHW�IRUPDWLRQ¾ 1RLVH�YLEUDWLRQ�RSWLPL]DWLRQ¾ 7UDIILF�VLPXODWLRQ
� 0DLQ�EHQHILWV�¾ 5HGXFHG�WLPH�WR�PDUNHW�RI�QHZ�FDUV¾ ,QFUHDVHG�TXDOLW\¾ 5HGXFHG�QHHG�WR�EXLOG�SURWRW\SHV¾ PRUH�HIILFLHQW��LQWHJUDWHG�PDQXIDFWXULQJ�SURFHVVHV
�
�
Technology Trends: Microprocessor Capacity
2X transistors/Chip Every 1.5 years
Called “Moore’s Law”
Moore’s Law
Microprocessors have become smaller, denser, and more powerful.Not just processors, bandwidth, storage, etc
Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months.
��
Internet –4th Revolution in Telecommunications
� 7HOHSKRQH��5DGLR��7HOHYLVLRQ� *URZWK�LQ�,QWHUQHW�RXWVWULSV�WKH�RWKHUV� ([SRQHQWLDO�JURZWK�VLQFH������ 7UDIILF�GRXEOHV�HYHU\�����GD\V
�
��
The Web Phenomenon
� ���² ���:HE�LQYHQWHG� 8�RI�,OOLQRLV�0RVDLF�UHOHDVHG�
0DUFK�����a������WUDIILF� 6HSWHPEHU����a����WUDIILF�
Z�����VLWHV� -XQH����a�����RI�WUDIILF�
Z�������VLWHV� 7RGD\�����RI�WUDIILF�
Z�����������VLWHV� (YHU\�RUJDQL]DWLRQ��FRPSDQ\��
VFKRRO
��
Peer to Peer Computing� 3HHU�WR�SHHU�LV�D�VW\OH�RI�QHWZRUNLQJ�LQ�ZKLFK��������������
D�JURXS�RI��FRPSXWHUV�FRPPXQLFDWH�GLUHFWO\�ZLWK�����������HDFK�RWKHU�
� :LUHOHVV�FRPPXQLFDWLRQ� +RPH�FRPSXWHU�LQ�WKH�XWLOLW\�URRP��QH[W�WR�WKH�ZDWHU�KHDWHU�
DQG�IXUQDFH��� :HE�WDEOHWV�� ,PEHGGHG�FRPSXWHUV�LQ�WKLQJV������������������������������������
DOO�WLHG�WRJHWKHU�¾ %RRNV��IXUQLWXUH��PLON�FDUWRQV��HWF
� 6PDUW�$SSOLDQFHV¾ 5HIULJHUDWRU��VFDOH��HWF
�
��
Internet On Everything
��
SETI@home� 8VH�WKRXVDQGV�RI�,QWHUQHW�
FRQQHFWHG�3&V�WR�KHOS�LQ�WKH�VHDUFK�IRU�H[WUDWHUUHVWULDO�LQWHOOLJHQFH�
� 8VHV�GDWD�FROOHFWHG�ZLWK�WKH�$UHFLER 5DGLR�7HOHVFRSH��LQ�3XHUWR�5LFR�
� :KHQ�WKHLU�FRPSXWHU�LV�LGOH�RU�EHLQJ�ZDVWHG�WKLV�VRIWZDUH�ZLOO�GRZQORDG�D�����NLORE\WH�FKXQN�RI�GDWD�IRU�DQDO\VLV��
� 7KH�UHVXOWV�RI�WKLV�DQDO\VLV�DUH�VHQW�EDFN�WR�WKH�6(7,�WHDP��FRPELQHG�ZLWK�WKH�FUXQFKHG�GDWD�IURP�WKH�PDQ\�WKRXVDQGV�RI�RWKHU�6(7,#KRPH�SDUWLFLSDQWV�
�
��
Next Generation Web
� 7R�WUHDW�&38�F\FOHV�DQG�VRIWZDUH�OLNH�FRPPRGLWLHV�� (QDEOH�WKH�FRRUGLQDWHG�XVH�RI�JHRJUDSKLFDOO\�GLVWULEXWHG�UHVRXUFHV�² LQ�WKH�DEVHQFH�RI�FHQWUDO�FRQWURO�DQG�H[LVWLQJ�WUXVW�UHODWLRQVKLSV��
� &RPSXWLQJ�SRZHU�LV�SURGXFHG�PXFK�OLNH�XWLOLWLHV�VXFK�DV�SRZHU�DQG�ZDWHU�DUH�SURGXFHG�IRU�FRQVXPHUV�
� 8VHUV�ZLOO�KDYH�DFFHVV�WR�´SRZHUµ�RQ�GHPDQG�� 7KLV�LV�RQH�RI�RXU�HIIRUWV�DW�87�
��
High-Performance Computing Today
� ,Q�WKH�SDVW�GHFDGH��WKH�ZRUOG�KDV�H[SHULHQFHG�RQH�RI�WKH�PRVW�H[FLWLQJ�SHULRGV�LQ�FRPSXWHU�GHYHORSPHQW�
�0LFURSURFHVVRUV�KDYH�EHFRPH�VPDOOHU��GHQVHU��DQG�PRUH�SRZHUIXO�
� 7KH�UHVXOW�LV�WKDW�PLFURSURFHVVRU�EDVHG�VXSHUFRPSXWLQJ�LV�UDSLGO\�EHFRPLQJ�WKH�WHFKQRORJ\�RI�SUHIHUHQFH�LQ�DWWDFNLQJ�VRPH�RI�WKH�PRVW�LPSRUWDQW�SUREOHPV�RI�VFLHQFH�DQG�HQJLQHHULQJ�
�
0.010.1
110
1001000
10000
1980
1982
1986
1988
1990
1992
1994
1996
Per
form
ance
in M
flop
/s
MicrosSupers
8087 80287
6881
80387
R2000
i860
RS6000/540
AlphaRS6000/590
Alpha
Cray 1S
Cray X-MP
Cray 2 Cray Y-MP Cray C90
Cray T90
Growth of Microprocessor Performance
Other Examples: Sony PlayStation2
� (PRWLRQ�(QJLQH������*IORS�V�����PLOOLRQ�SRO\JRQV�SHU�VHFRQG��0LFURSURFHVVRU�5HSRUW�������¾6XSHUVFDODU 0,36�FRUH���YHFWRU�FRSURFHVVRU���JUDSKLFV�'5$0¾&ODLP��´7R\�6WRU\µ�UHDOLVP�EURXJKW�WR�JDPHV¾$ERXW�����
��
��
Sony PlayStation2 Export Limits?
��
Processor PerformanceTrends
Microprocessors
Minicomputers
Mainframes
Supercomputers
Year
0.1
1
10
100
1000
1965 1970 1975 1980 1985 1990 1995 2000
��
Performance vs. Time
Mips25 mhz
0.1
1.0
10
100P
erfo
rman
ce (
VA
X 7
80s)
1980 1985 1990
MV10K
68K
7805 Mhz
RISC 60% / yr
uVAX 6K(CMOS)
8600
TTL
ECL 15%/yr
CMOS CISC
38%/yr
o ||MIPS (8 Mhz)
o9000
Mips(65 Mhz)
uVAXCMOS Will RISC continue on a
60%, (x4 / 3 years)? Moore’s speed law?
4K
��
Trends in Computer PerformanceT
$6&,�5HG
��
��
Where Has This Performance Improvement Come From?
� 7HFKQRORJ\"�2UJDQL]DWLRQ"� ,QVWUXFWLRQ�6HW�$UFKLWHFWXUH"� 6RIWZDUH"� 6RPH�FRPELQDWLRQ�RI�DOO�RI�WKH�DERYH"
��
How fast can a serial computer be?
� &RQVLGHU�WKH���7IORS�VHTXHQWLDO�PDFKLQH¾ GDWD�PXVW�WUDYHO�VRPH�GLVWDQFH��U��WR�JHW�IURP�PHPRU\�WR�&38
¾ WR�JHW���GDWD�HOHPHQW�SHU�F\FOH��WKLV�PHDQV������WLPHV�SHU�VHFRQG�DW�WKH�VSHHG�RI�OLJKW��F� ��[��� P�V
¾ VR�U���F����� ����PP� 1RZ�SXW���7%�RI�VWRUDJH�LQ�D����PP� DUHD�
¾ HDFK�ZRUG�RFFXSLHV�DERXW���$QJVWURPV���WKH�VL]H�RI�D�VPDOO�DWRP
r = .3 mm1 Tflop 1 TB sequential machine
��
Microprocessor Transistors & Paralle
Name
Bit-LevelParallelism
Instruction-LevelParallelism
Thread-LevelParallelism?
��
Processor-DRAM Gap (latency)
µProc60%/yr.
DRAM7%/yr.
1
10
100
1000
1980
1981
1983
1984
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
DRAM
CPU
1982
Processor-MemoryPerformance Gap:(grows 50% / year)
Per
form
ance
Time
“Moore’s Law”
��
1st Principles
�:KDW�KDSSHQV�ZKHQ�WKH�IHDWXUH�VL]H�VKULQNV�E\�D�IDFWRU�RI�[ "
�&ORFN�UDWH�JRHV�XS�E\�[¾DFWXDOO\�OHVV�WKDQ�[��EHFDXVH�RI�SRZHU�FRQVXPSWLRQ
�7UDQVLVWRUV�SHU�XQLW�DUHD�JRHV�XS�E\�[��'LH�VL]H�DOVR�WHQGV�WR�LQFUHDVH
¾W\SLFDOO\�DQRWKHU�IDFWRU�RI�a[�5DZ�FRPSXWLQJ�SRZHU�RI�WKH�FKLS�JRHV�XS�E\�a�[���
¾RI�ZKLFK�[��LV�GHYRWHG�HLWKHU�WR�SDUDOOHOLVP RU�ORFDOLW\
��
Principles of Parallel Computing
� 3DUDOOHOLVP�DQG�$PGDKO·V�/DZ� *UDQXODULW\� /RFDOLW\� /RDG�EDODQFH� &RRUGLQDWLRQ�DQG�V\QFKURQL]DWLRQ� 3HUIRUPDQFH�PRGHOLQJ
All of these things makes parallel programming even harder than sequential programming.
��
��
“Automatic” Parallelism in Modern Machines
� %LW�OHYHO�SDUDOOHOLVP¾ ZLWKLQ�IORDWLQJ�SRLQW�RSHUDWLRQV��HWF�
� ,QVWUXFWLRQ�OHYHO�SDUDOOHOLVP��,/3�¾ PXOWLSOH�LQVWUXFWLRQV�H[HFXWH�SHU�FORFN�F\FOH
� 0HPRU\�V\VWHP�SDUDOOHOLVP¾ RYHUODS�RI�PHPRU\�RSHUDWLRQV�ZLWK�FRPSXWDWLRQ
� 26�SDUDOOHOLVP¾ PXOWLSOH�MREV�UXQ�LQ�SDUDOOHO�RQ�FRPPRGLW\�603V
Limits to all of these -- for very high performance, need user to identify, schedule and coordinate parallel tasks
��
Finding Enough Parallelism
� 6XSSRVH�RQO\�SDUW�RI�DQ�DSSOLFDWLRQ�VHHPV�SDUDOOHO
� $PGDKO·V�ODZ¾ OHW�V�EH�WKH�IUDFWLRQ�RI�ZRUN�GRQH�VHTXHQWLDOO\��V����V��LV�IUDFWLRQ SDUDOOHOL]DEOH
¾3� �QXPEHU�RI�SURFHVVRUV
Speedup(P) = Time(1)/Time(P)
<= 1/(s + (1-s)/P)
<= 1/s
� (YHQ�LI�WKH�SDUDOOHO�SDUW�VSHHGV�XS�SHUIHFWO\�PD\�EH�OLPLWHG�����������������������������E\�WKH�VHTXHQWLDO�SDUW
��
��
Overhead of Parallelism
� *LYHQ�HQRXJK�SDUDOOHO�ZRUN��WKLV�LV�WKH�ELJJHVW�EDUULHU�WR�JHWWLQJ�GHVLUHG�VSHHGXS
� 3DUDOOHOLVP�RYHUKHDGV�LQFOXGH�¾ FRVW�RI�VWDUWLQJ�D�WKUHDG�RU�SURFHVV¾ FRVW�RI�FRPPXQLFDWLQJ�VKDUHG�GDWD¾ FRVW�RI�V\QFKURQL]LQJ¾ H[WUD��UHGXQGDQW��FRPSXWDWLRQ
� (DFK�RI�WKHVH�FDQ�EH�LQ�WKH�UDQJH�RI�PLOOLVHFRQGV�� PLOOLRQV�RI�IORSV��RQ�VRPH�V\VWHPV
� 7UDGHRII��$OJRULWKP�QHHGV�VXIILFLHQWO\�ODUJH�XQLWV�RI�ZRUN�WR�UXQ�IDVW�LQ�SDUDOOHO��,�H��ODUJH�JUDQXODULW\���EXW�QRW�VR�ODUJH�WKDW�WKHUH�LV�QRW�HQRXJK�SDUDOOHO�ZRUN�
Locality and Parallelism
�/DUJH�PHPRULHV�DUH�VORZ��IDVW�PHPRULHV�DUH�VPDOO�6WRUDJH�KLHUDUFKLHV�DUH�ODUJH�DQG�IDVW�RQ�DYHUDJH�3DUDOOHO�SURFHVVRUV��FROOHFWLYHO\��KDYH�ODUJH��IDVW��
¾WKH�VORZ�DFFHVVHV�WR�´UHPRWHµ�GDWD�ZH�FDOO�´FRPPXQLFDWLRQµ
�$OJRULWKP�VKRXOG�GR�PRVW�ZRUN�RQ�ORFDO�GDWD
ProcCache
L2 Cache
L3 Cache
Memory
Conventional Storage Hierarchy
ProcCache
L2 Cache
L3 Cache
Memory
ProcCache
L2 Cache
L3 Cache
Memory
potentialinterconnects
��
��
Load Imbalance
� /RDG�LPEDODQFH�LV�WKH�WLPH�WKDW�VRPH�SURFHVVRUV�LQ�WKH�V\VWHP�DUH�LGOH�GXH�WR¾ LQVXIILFLHQW�SDUDOOHOLVP��GXULQJ�WKDW�SKDVH�¾XQHTXDO�VL]H�WDVNV
� ([DPSOHV�RI�WKH�ODWWHU¾DGDSWLQJ�WR�´LQWHUHVWLQJ�SDUWV�RI�D�GRPDLQµ¾ WUHH�VWUXFWXUHG�FRPSXWDWLRQV�¾IXQGDPHQWDOO\�XQVWUXFWXUHG�SUREOHPV�
� $OJRULWKP�QHHGV�WR�EDODQFH�ORDG
��
Performance Trends Revisited(Architectural Innovation)
Microprocessors
Minicomputers
Mainframes
Supercomputers
Year
0.1
1
10
100
1000
1965 1970 1975 1980 1985 1990 1995 2000
CISC/RISC
��
��
Performance Trends Revisited (Microprocessor Organization)
Year
1000
10000
100000
1000000
10000000
100000000
1970 1975 1980 1985 1990 1995 2000
r4400
r4000
r3010
i80386
i4004
i8080
i80286
i8086
• Bit Level Parallelism
• Pipelining
• Caches
• Instruction Level Parallelism
• Out-of-order Xeq
• Speculation
• . . .
��
� *UHDWHU�LQVWUXFWLRQ�OHYHO�SDUDOOHOLVP"� %LJJHU�FDFKHV"� 0XOWLSOH�SURFHVVRUV�SHU�FKLS"� &RPSOHWH�V\VWHPV�RQ�D�FKLS"��3RUWDEOH�6\VWHPV�
� +LJK�SHUIRUPDQFH�/$1��,QWHUIDFH��DQG�,QWHUFRQQHFW
What is Ahead?
��
��
Directions
�0RYH�WRZDUG�VKDUHG�PHPRU\¾603V�DQG�'LVWULEXWHG�6KDUHG�0HPRU\¾6KDUHG�DGGUHVV�VSDFH�Z�GHHS�PHPRU\�KLHUDUFK\
� &OXVWHULQJ�RI�VKDUHG�PHPRU\�PDFKLQHV�IRU�VFDODELOLW\
� (IILFLHQF\�RI�PHVVDJH�SDVVLQJ�DQG�GDWD�SDUDOOHO�SURJUDPPLQJ¾+HOSHG�E\�VWDQGDUGV�HIIRUWV�VXFK�DV�03,�DQG�+3)
��
Performance Numbers on RISC Processors
� 8VLQJ�/LQSDFN�%HQFKPDUNMachine MHz Linpack
n=100Mflop/s
Ax=b n=1000Mflop/s
Peak Mflop/s
DEC Alpha 612 287(23%) 764 (59%) 1224IBM RS/397 160 315 (49%) 532 (83%) 640HP PA (Expl V) 240 203 (21%) 731 (76%) 960SGI Origin 2K 195 114 (29%) 344 (88%) 390SUN Ultra II 336 154(23%) 461 (69%) 672Intel Pent. P6 200 63 (31%) 97 (49%) 200
Cray T90 454 705 (39%) 1603 (89%) 1800Cray C90 238 387 (41%) 902 (95%) 952Cray Y-MP 166 161 (48%) 324 (97%) 333
��
��
High Performance Computers� a����\HDUV�DJR
¾ �[����)ORDWLQJ�3RLQW�2SV�VHF��0IORS�V��� 8HFQFW�GFXJI
� a����\HDUV�DJR¾ �[��� )ORDWLQJ�3RLQW�2SV�VHF��*IORS�V��
� ;JHYTW���8MFWJI�RJRTW^�HTRUZYNSL��GFSI\NIYM�F\FWJ� 'QTHP�UFWYNYNTSJI��QFYJSH^�YTQJWFSY
� a�7RGD\¾ �[�����)ORDWLQJ�3RLQW�2SV�VHF��7IORS�V��
� -NLMQ^�UFWFQQJQ��INXYWNGZYJI�UWTHJXXNSL��RJXXFLJ�UFXXNSL��SJY\TWP�GFXJI� IFYF�IJHTRUTXNYNTS��HTRRZSNHFYNTS�HTRUZYFYNTS
� a����\HDUV�DZD\¾ �[�����)ORDWLQJ�3RLQW�2SV�VHF��3IORS�V��
� 2FS^�RTWJ�QJ[JQX�2-��HTRGNSFYNTS�LWNIX�-5(� 2TWJ�FIFUYN[J��19�FSI�GFSI\NIYM�F\FWJ��KFZQY�YTQJWFSY�����������J]YJSIJI�UWJHNXNTS��FYYJSYNTS�YT�825�STIJX
��
TOP500TOP500- Listing of the 500 most powerfulComputers in the World
- Yardstick: Rmax from LINPACK MPPAx=b, dense problem
- Updated twice a yearSC‘xy in the States in NovemberMeeting in Mannheim, Germany in June
All data available from www.top500.org
8N_J
7FYJ
955�UJWKTWRFSHJ
��
,Q������D�FRPSXWDWLRQ�WKDW�WRRN���IXOO�\HDU�WR�FRPSOHWHFDQ�QRZ�EH�GRQH�LQ�a����PLQXWHV�
Fastest Computer Over Time
TMCCM-2(2048)
Fujitsu VP-2600
Cray Y-MP (8)
05
101520253035404550
1990 1992 1994 1996 1998 2000
Year
GF
lop/
s
,Q������D�FRPSXWDWLRQ�WKDW�WRRN���IXOO�\HDU�WR�FRPSOHWHFDQ�QRZ�EH�GRQH�LQ�a�����PLQXWHV�
Fastest Computer Over Time
HitachiCP-PACS
(2040)IntelParagon
(6788)
FujitsuVPP-500
(140)TMC CM-5(1024)
NEC SX-3(4)
TMCCM-2
(2048)Fujitsu
VP-2600Cray
Y-MP (8)
050
100150200250300350400450500
1990 1992 1994 1996 1998 2000
Year
GF
lop/
s
��
,Q������D�FRPSXWDWLRQ�WKDW�WRRN���IXOO�\HDU�WR�FRPSOHWHFDQ�WRGD\�EH�GRQH�LQ�a��VHFRQG�
Fastest Computer Over TimeASCI White
Pacific(7424)
ASCI BluePacific SST
(5808)
SGI ASCIBlue
Mountain(5040)
Intel ASCI Red
(9152)Hitachi
CP-PACS(2040)
IntelParagon(6788)
FujitsuVPP-500
(140)
TMC CM-5(1024)
NEC SX-3
(4)
TMCCM-2(2048)
Fujitsu VP-2600
Cray Y-MP (8)
Intel ASCIRed Xeon
(9632)
0500
100015002000250030003500400045005000
1990 1992 1994 1996 1998 2000
Year
GFl
op/s
Rank Company Machine Procs Gflop/s Place Country Year
1 Intel ASCI Red 9632 2380Sandia National Labs
AlbuquerqueUSA 1999
2 IBMASCI Blue-Pacific SST, IBM SP 604e
5808 2144Lawrence Livermore National
Laboratory LivermoreUSA 1999
3 SGIASCI Blue Mountain
6144 1608Los Alamos National Laboratory
Los AlamosUSA 1998
4 Hitachi SR8000-F1/112 112 1035Leibniz Rechenzentrum
MuenchenGermany 2000
5 Hitachi SR8000-F1/100 100 917High Energy Accelerator
Research Organization /KEK Tsukuba
Japan 2000
6 Cray Inc. T3E1200 1084 892 Government USA 1998
7 Cray Inc. T3E1200 1084 892US Army HPC Research Center
at NCS MinneapolisUSA 2000
8 Hitachi SR8000/128 128 874 University of Tokyo Tokyo Japan 19999 Cray Inc. T3E900 1324 815 Government USA 1997
10 IBMSP Power3
375 MHz1336 723
Naval Oceanographic Office (NAVOCEANO) Poughkeepsie
USA 2000
Top 10 Machines (June 2000)
��
Performance Development
0.1
1
10
100
1000
10000
100000
1000000
Jun-9
3
Nov-9
4
Jun-9
6
Nov-9
7
Jun-9
9
Nov-0
0
Jun-
02
Nov-0
3
Jun-
05
Nov-06
Jun-0
8
Nov-0
9
Per
form
ance
[G
Flo
p/s
]
N=1
N=500
Sum
1 TFlop/s
1 PFlop/s
ASCI
Earth Simulator
*SYW^���9������FSI���5�����
My Laptop
Chip Technology
Inmos Transputer
Alpha
IBMHP
intel
MIPS
SUN
Other
0
100
200
300
400
500
Jun-
93
Nov-9
3
Jun-9
4
Nov-9
4
Jun-9
5
Nov-9
5
Jun-
96
Nov-9
6
Jun-9
7
Nov-9
7
Jun-
98
Nov-9
8
Jun-9
9
Nov-9
9
Jun-0
0
��
��
Architectures
Single Processor
SMP
MPP
SIMD ConstellationCluster
0
100
200
300
400
500
Jun-9
3
Nov-93
Jun-9
4
Nov-94
Jun-9
5
Nov-95
Jun-9
6
Nov-96
Jun-9
7
Nov-97
Jun-9
8
Nov-98
Jun-9
9
Nov-99
��
The Maturation of Highly Parallel Technology
� $IIRUGDEOH�SDUDOOHO�V\VWHPV�QRZ�RXW�SHUIRUP�WKH�EHVW�FRQYHQWLRQDO�VXSHUFRPSXWHUV��
� 3HUIRUPDQFH�SHU�GROODU�LV�SDUWLFXODUO\�IDYRUDEOH��� 7KH�ILHOG�LV�WKLQQLQJ�WR�D�IHZ�YHU\�FDSDEOH�V\VWHPV�
� 5HOLDELOLW\�LV�JUHDWO\�LPSURYHG��� 7KLUG�SDUW\�VFLHQWLILF�DQG�HQJLQHHULQJ�DSSOLFDWLRQV�DUH��DSSHDULQJ�
� %XVLQHVV�DSSOLFDWLRQV�DUH�DSSHDULQJ�� &RPPHUFLDO�FXVWRPHUV��QRW�MXVW�UHVHDUFK�ODEV��DUH�DFTXLULQJ�V\VWHPV��
��
��
Performance Development
88.0 TF/s
1.167 TF/s
59.7 GF/s
4.94 TF/s
0.4 GF/s
55.1 GF/s
Jun-
93
Nov-93
Jun-
94
Nov-94
Jun-9
5
Nov-9
5
Jun-9
6
Nov-9
6
Jun-9
7
Nov-9
7
Jun-9
8
Nov-98
Jun-
99
Nov-99
Jun-
00
Nov-00
Intel XP/S140Sandia
Fujitsu’NWT’ NAL
SNI VP200EXUni Dresden
Hitachi/TsukubaCP-PACS/2048
IntelASCI Red
Sandia
IBMASCI White
LLNLN=1
N=500
SUM
IBM SP PC604e130 processors
Alcatel
1 Gflop/s
1 Tflop/s
100 Mflop/s
100 Gflop/s
100 Tflop/s
10 Gflop/s
10 Tflop/s
��
Manufacturer
Cray
SGI
IBM
Sun
Convex/HP
TMC
Intel
FujitsuNEC
Hitachiothers
0
100
200
300
400
500
Jun-9
3
Nov-93
Jun-9
4
Nov-94
Jun-9
5
Nov-95
Jun-9
6
Nov-96
Jun-9
7
Nov-97
Jun-9
8
Nov-98
Jun-9
9
Nov-99
Jun-0
0
Nov-00
��
��
Top500 Conclusions
�0LFURSURFHVVRU�EDVHG�VXSHUFRPSXWHUV�KDYH�EURXJKW�D�PDMRU�FKDQJH�LQ�DFFHVVLELOLW\�DQG�DIIRUGDELOLW\�
�033V FRQWLQXH�WR�DFFRXQW�RI�PRUH�WKDQ�KDOI�RI�DOO�LQVWDOOHG�KLJK�SHUIRUPDQFH��FRPSXWHUV�ZRUOGZLGH�
High-Performance Computing Directions: Beowulf-class PC Clusters
� &276�3&�1RGHV¾ 3HQWLXP��$OSKD��3RZHU3&��
603� &276�/$1�6$1�
,QWHUFRQQHFW¾ (WKHUQHW��0\ULQHW��
*LJDQHW��$70� 2SHQ�6RXUFH�8QL[
¾ /LQX[��%6'� 0HVVDJH�3DVVLQJ�&RPSXWLQJ
¾ 03,��390¾ +3)
� %HVW�SULFH�SHUIRUPDQFH� /RZ�HQWU\�OHYHO�FRVW� -XVW�LQ�SODFH�
FRQILJXUDWLRQ� 9HQGRU�LQYXOQHUDEOH� 6FDODEOH� 5DSLG�WHFKQRORJ\�WUDFNLQJ
Definition: Advantages:
Enabled by PC hardware, networks and operating system achieving capabilities of scientific workstations at a fraction of the cost and availability of industry standard message passing libraries.
��
��
Clusters on the Top500
Distributed and Parallel Systems
Distributedsystemshetero-geneous
Massivelyparallelsystemshomo-geneous
Grid
Com
putin
gB
eow
ulf
Ber
kley
NO
WS
NL
Cpl
ant
Ent
ropi
a
AS
CI T
flops
� *DWKHU��XQXVHG��UHVRXUFHV
� 6WHDO�F\FOHV
� 6\VWHP�6:�PDQDJHV�UHVRXUFHV
� 6\VWHP�6:�DGGV�YDOXH
� ����� ����RYHUKHDG�LV�2.
� 5HVRXUFHV�GULYH�DSSOLFDWLRQV
� 7LPH�WR�FRPSOHWLRQ�LV�QRW�FULWLFDO
� 7LPH�VKDUHG
� %RXQGHG�VHW�RI�UHVRXUFHV�
� $SSV�JURZ�WR�FRQVXPH�DOO�F\FOHV
� $SSOLFDWLRQ�PDQDJHV�UHVRXUFHV
� 6\VWHP�6:�JHWV�LQ�WKH�ZD\
� ���RYHUKHDG�LV�PD[LPXP
� $SSV�GULYH�SXUFKDVH�RI�HTXLSPHQW
� 5HDO�WLPH�FRQVWUDLQWV
� 6SDFH�VKDUHG
SET
I@ho
me
Par
alle
l Dis
t mem
��
��
Virtual Environments����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
����(��������(��������(��������(��������(��������(��������(��������(��������(��������(���
'R�WKH\�PDNH�DQ\�VHQVH"
��
��
Performance Improvements for Scientific Computing Problems
Derived from Computational Methods
1
10
100
1000
10000
1970 1975 1980 1985 1990 1995
Spee
d-Up F
actor
Multi-grid
Conjugate Gradient
SOR
Gauss-Seidel
Sparse GE
1
10
100
1000
10000
1970 1975 1980 1985 1995
��
Highly Parallel Supercomputing: Where Are We?� 3HUIRUPDQFH�
¾ 6XVWDLQHG�SHUIRUPDQFH�KDV�GUDPDWLFDOO\�LQFUHDVHG�GXULQJ�WKH�ODVW\HDU�
¾ 2Q�PRVW�DSSOLFDWLRQV��VXVWDLQHG�SHUIRUPDQFH�SHU�GROODU�QRZ�H[FHHGV�WKDW�RI�FRQYHQWLRQDO�VXSHUFRPSXWHUV��%XW���
¾ &RQYHQWLRQDO�V\VWHPV�DUH�VWLOO�IDVWHU�RQ�VRPH�DSSOLFDWLRQV�� /DQJXDJHV�DQG�FRPSLOHUV�
¾ 6WDQGDUGL]HG��SRUWDEOH��KLJK�OHYHO�ODQJXDJHV�VXFK�DV�+3)��390�DQG�03,�DUH�DYDLODEOH��%XW����
¾ ,QLWLDO�+3)�UHOHDVHV�DUH�QRW�YHU\�HIILFLHQW�¾ 0HVVDJH�SDVVLQJ�SURJUDPPLQJ�LV�WHGLRXV�DQG�KDUG
WR�GHEXJ�¾ 3URJUDPPLQJ�GLIILFXOW\�UHPDLQV�D�PDMRU�REVWDFOH�WR
XVDJH�E\�PDLQVWUHDP�VFLHQWLVW�
��
��
Highly Parallel Supercomputing: Where Are We?
�2SHUDWLQJ�V\VWHPV�¾5REXVWQHVV�DQG�UHOLDELOLW\�DUH�LPSURYLQJ�¾1HZ�V\VWHP�PDQDJHPHQW�WRROV�LPSURYH�V\VWHP�XWLOL]DWLRQ��%XW���
¾5HOLDELOLW\�VWLOO�QRW�DV�JRRG�DV�FRQYHQWLRQDO�V\VWHPV�
� ,�2�VXEV\VWHPV�¾1HZ�5$,'�GLVNV� +L33, LQWHUIDFHV��HWF��SURYLGH�VXEVWDQWLDOO\�LPSURYHG�,�2�SHUIRUPDQFH��%XW���
¾,�2�UHPDLQV�D�ERWWOHQHFN�RQ�VRPH�V\VWHPV�
��
The Importance of Standards -Software� :ULWLQJ�SURJUDPV�IRU�033�LV�KDUG����� %XW�����RQH�RII�HIIRUWV�LI�ZULWWHQ�LQ�D�VWDQGDUG�ODQJXDJH� 3DVW�ODFN�RI�SDUDOOHO�SURJUDPPLQJ�VWDQGDUGV����
¾ ����KDV�UHVWULFWHG�XSWDNH�RI�WHFKQRORJ\��WR��HQWKXVLDVWV��¾ ����UHGXFHG�SRUWDELOLW\��RYHU�D�UDQJH�RI�FXUUHQW
DUFKLWHFWXUHV�DQG�EHWZHHQ�IXWXUH�JHQHUDWLRQV�� 1RZ�VWDQGDUGV�H[LVW���390��03,��+3)���ZKLFK����
¾ ����DOORZV�XVHUV��PDQXIDFWXUHUV�WR�SURWHFW�VRIWZDUH�LQYHVWPHQW¾ ����HQFRXUDJH�JURZWK�RI�D��WKLUG�SDUW\��SDUDOOHO�VRIWZDUH�LQGXVWU\�
�SDUDOOHO�YHUVLRQV�RI�ZLGHO\�XVHG�FRGHV
��
��
The Importance of Standards -Hardware
� 3URFHVVRUV¾ FRPPRGLW\�5,6&�SURFHVVRUV
� ,QWHUFRQQHFWV¾ KLJK�EDQGZLGWK��ORZ�ODWHQF\�FRPPXQLFDWLRQV�SURWRFRO¾ QR�GH�IDFWR�VWDQGDUG�\HW��$70� )LEUH &KDQQHO��+33,��)'',�
� *URZLQJ�GHPDQG�IRU�WRWDO�VROXWLRQ�¾ UREXVW�KDUGZDUH���XVDEOH�VRIWZDUH
� +3&�V\VWHPV�FRQWDLQLQJ�DOO�WKH�SURJUDPPLQJ�WRROV��HQYLURQPHQWV���ODQJXDJHV���OLEUDULHV���DSSOLFDWLRQVSDFNDJHV�IRXQG�RQ�GHVNWRSV
��
The Future of HPC� 7KH�H[SHQVH�RI�EHLQJ�GLIIHUHQW�LV�EHLQJ�UHSODFHG�E\�WKH�HFRQRPLFV�RI�EHLQJ�WKH�VDPH
� +3&�QHHGV�WR�ORVH�LWV��VSHFLDO�SXUSRVH��WDJ� 6WLOO�KDV�WR�EULQJ�DERXW�WKH�SURPLVH�RI�VFDODEOH�JHQHUDO�SXUSRVH�FRPSXWLQJ����
� ����EXW�LW�LV�GDQJHURXV�WR�LJQRUH�WKLV�WHFKQRORJ\� )LQDO�VXFFHVV�ZKHQ�033�WHFKQRORJ\�LV�HPEHGGHG�LQ�GHVNWRS�FRPSXWLQJ
� <HVWHUGD\V�+3&�LV�WRGD\V�PDLQIUDPH�LV�WRPRUURZV�ZRUNVWDWLRQ
��
��
Different Architectures
� 6,0'�¾ 9HFWRU�&RPSXWHUV
� 0,0'¾ 6KDUHG�0HPRU\
� 3*(�8=�;JHYTW�(TRUZYJW� (WF^�/���9���;JHYTW�(TRUZYJW
¾ 'LVWULEXWHG�6KDUHG�0HPRU\� 8,.�4WNLNS�)NXYWNGZYJI�8MFWJI�2JRTW^� -5�(TS[J]�)NXYWNGZYJI�8MFWJI�2JRTW^� 8:3�*SYJWUWNXJ�8MFWJI�2JRTW^
¾ 'LVWULEXWHG�0HPRU\� (WF^�9�*�)NXYWNGZYJI�2JRTW^� .'2��85�)NXYWNGZYJI�2JRTW^� +ZONYXZ�;55�;JHYTW�)NXYWNGZYJI�2JRTW^� (QZXYJW�TK�5WTHJXXTWX� (4<8��34<8��'JT\ZQK�
��
SISD
� 5,6&�� 6XSHUVFDODU
��
��
Architectures�0,0'�6KDUHG�PHPRU\
¾6*,�2ULJLQ¾+3�&RQYH[�([HPSOHU¾&UD\�7���-��¾1(&�6;��¾6XQ�(QWHUSULVH
��
Distributed Memory
¾,%0�63¾&5$<�6*,�7�(¾6*,�2ULJLQ�����¾+3�&RQYH[�([HPSOHU¾6XQ�(QWHUSULVH¾ �,QWHO�
��
��
Vector Architecture (SIMD)
��
SIMD
� �7KLQNLQJ�0DFKLQH�&0���� �0DV3DU�
��
��
SGI/Cray Vector ComputersSystem parameters:
Model Cray J90(se) Cray T90 Clock cycle 10 ns (100 MHz) 2.2 ns (455 MHz)Theor. peak performance Per processor
200 Mflop/s 1.8 Gflop/s No. of processors 4-32 1-32Maximal 6.4 Gflop/s 58 Gflop/s Main memory <= 4 GB <= 8 GB Memory bandwidth Single proc. bandwidth 1.6 GB/s 24 GB/s
��
Achieving TeraFlops
� ,Q���������*IORS�V� �����IROG�LQFUHDVH
¾$UFKLWHFWXUH� J]UQTNYNSL�UFWFQQJQNXR
¾3URFHVVRU��FRPPXQLFDWLRQ��PHPRU\� 2TTWJgX�1F\
¾$OJRULWKP�LPSURYHPHQWV� GQTHP�UFWYNYNTSJI�FQLTWNYMRX
��
��
Future: Petaflops ( fl pt ops/s)
� $�3IORS�IRU���VHFRQG��O D�W\SLFDO�ZRUNVWDWLRQ�FRPSXWLQJ�IRU���\HDU�
� )URP�DQ�DOJRULWKPLF�VWDQGSRLQW¾ FRQFXUUHQF\¾ GDWD�ORFDOLW\¾ ODWHQF\��V\QF¾ IORDWLQJ�SRLQW�DFFXUDF\
1015
¾ G\QDPLF�UHGLVWULEXWLRQ�RI������ZRUNORDG
¾ QHZ�ODQJXDJH�DQG�FRQVWUXFWV
¾ UROH�RI�QXPHULFDO�OLEUDULHV
¾ DOJRULWKP�DGDSWDWLRQ�WR�KDUGZDUH�IDLOXUH
Today flops for our workstations≈ 1015
��
A Petaflops Computer System
� ��3IORS�V�VXVWDLQHG�FRPSXWLQJ� %HWZHHQ��������DQG�����������SURFHVVRUV� %HWZHHQ����7%�DQG��3%�PDLQ�PHPRU\� &RPPHQVXUDWH�,�2�EDQGZLGWK��PDVV�VWRUH��HWF�� ,I�EXLOW�WRGD\��FRVW�����%�DQG�FRQVXPH���7:DWW�
� 0D\�EH�IHDVLEOH�DQG�´DIIRUGDEOHµ�E\�WKH�\HDU�����