CS 370 Dr. Young 1
Parallel Processing: Past, Present and Future
Dr. G. Young
What is a Supercomputer?
Let us run a contest: who gives the most up-to-date explanation?
Supercomputer (AllWords.com)
A very fast, powerful mainframe computer, used in advanced military and scientific applications.
Supercomputer (M-W.com, Merriam-Webster's Collegiate Dictionary)
A large, very fast mainframe used especially for scientific computations.
Supercomputer (Dictionary.com)
A mainframe computer that is among the largest, fastest, or most powerful of those available at a given time.
Supercomputer (FOLDOC.doc.ic.ac.uk)
A broad term for one of the fastest computers currently available. Such computers are typically used for number crunching, including scientific simulations, (animated) graphics, analysis of geological data (e.g. in petrochemical prospecting), structural analysis, computational fluid dynamics, physics, chemistry, electronic design, nuclear energy research and meteorology. Perhaps the best known supercomputer manufacturer is Cray Research. A less serious definition, reported from about 1990 at The University Of New South Wales, states that a supercomputer is any computer that can outperform IBM's current fastest, thus making it impossible for IBM to ever produce a supercomputer.
Supercomputer (ComputerUser.com)
A very fast and powerful computer, outperforming most mainframes, and used for intensive calculation, scientific simulations, animated graphics, and other work that requires sophisticated and high-powered computing. Cray Research and Intel are well-known producers of supercomputers.
Supercomputer (PCWebopaedia.com)
The fastest type of computer. Supercomputers are very expensive and are employed for specialized applications that require immense amounts of mathematical calculations. For example, weather forecasting requires a supercomputer. Other uses of supercomputers include animated graphics, fluid dynamic calculations, nuclear energy research, and petroleum exploration. The chief difference between a supercomputer and a mainframe is that a supercomputer channels all its power into executing a few programs as fast as possible, whereas a mainframe uses its power to execute many programs concurrently.
Supercomputer (PrenHall.com)
The category that includes the largest and most powerful computers.
Supercomputer (Geek.com)
This refers to a computer that is able to operate at a speed that places it at or near the top speed of currently produced computers. Most supercomputers cost millions of dollars, and the traditional model of using one large computer with proprietary hardware is being challenged by using a cluster of cheaper computers with more standard hardware.
Supercomputer Contest
Who is the winner?
AllWords.com
M-W.com (Merriam-Webster's Collegiate Dictionary)
Dictionary.com
FOLDOC.doc.ic.ac.uk
ComputerUser.com
PCWebopaedia.com
PrenHall.com
Geek.com
Contest Winner: Geek.com (2001)
(Led by Chief Geek Joel Evans)
The site was used to tell people all about being a geek, for example by letting you check whether you are a Beginner Geek, Intermediate Geek, Advanced Geek, or Super Geek.
Winner Highlight (Geek.com, 2001)
This refers to a computer that is able to operate at a speed that places it at or near the top speed of currently produced computers. Most supercomputers cost millions of dollars, and the traditional model of using one large computer with proprietary hardware is being challenged by using a cluster of cheaper computers with more standard hardware.
Topics of Discussion
Introduction
Computer Networks
Parallel and Distributed Processing
Affordable Supercomputer
Future Trend and Challenge
Conclusion
Q&A
Introduction
Why do we need Supercomputers?
Supercomputer Vendors
Supercomputer Products
Top Supercomputers
How to evaluate the power of a supercomputer?
Top 10 Supercomputers
Theoretical Implication of Parallel Machines
Areas of Research in Supercomputing
Supercomputing Journals
Why do we need Supercomputers?
Processor speed has increased dramatically, but it is still not fast enough for our needs. Using multiple processors is the way to go.
Areas that need supercomputers generally involve intensive computation:
Aerospace, Weather, Finance, Defense, Energy, Internet, Government, Chemistry, Geophysics, Telecom, Academic, Database, Mechanics, Automotive, Transportation, Electronics, Manufacturing, Fluid Dynamics, Petroleum
Supercomputer Vendors
Supercomputer Products
The Avalon A12
The Cambridge Parallel Processing Gamma II Plus
The Compaq AlphaServer SC Series
The Fujitsu AP3000
The Fujitsu VPP5000 series
The Hitachi SR8000 system
The HP Exemplar V2600
The IBM RS/6000 SP
The NEC Cenju-4
The NEC SX-5
The SGI Origin 2000 series
The Sun E10000 Starfire
The Tera/Cray SV1
The Tera/Cray T3E
They use different technologies: processor, OS, connection structure, proprietary hardware and software.
How to evaluate the power of a supercomputer?
Peak performance
Theoretical
Run-time
Benchmarks
  LINPACK benchmark (Top500)
  Finding the largest Mersenne prime number
Benchmarks: LINPACK
The LINPACK Benchmark (introduced by Jack Dongarra) solves a dense system of linear equations. It is used to rank the Top500 supercomputers.
This performance does not reflect the overall performance of a given system, as no single number ever can.
Since the problem is very regular, the performance achieved is quite high, and the performance numbers give a good correction of peak performance.
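To make the idea of a LINPACK-style measurement concrete, here is a minimal sketch in plain Python. It is not the real LINPACK or HPL code: the function names `solve_dense` and `linpack_like_rate` are made up for illustration, and the rate is computed with the conventional operation count of 2/3 n^3 + 2 n^2 for a dense solve.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_like_rate(n, seed=0):
    """Return (flop/s, max residual) for one random n-by-n solve (hypothetical helper)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-12)  # avoid division by zero
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # conventional operation count
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return flops / elapsed, residual
```

Calling `linpack_like_rate(200)` gives a rough flop rate for the Python interpreter on one core. The residual is returned as well because a benchmark run only counts if the answer is numerically correct.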
Prime Numbers
The Greek mathematician Euclid proved that there are infinitely many prime numbers.
Primes do not occur in a regular sequence, and there is no simple formula for generating them.
Discovering new primes requires generating and testing millions of candidate numbers.
Largest known Mersenne prime numbers* before 2000

Prime          Digits     Year  Discoverer
2^21701-1      6533       1978  Landon Curt Noll (with Laura Nickel, Ariel Glenn)
2^23209-1      6987       1979  Landon Curt Noll
2^44497-1      13395      1979  David Slowinski (with Harry Nelson)
2^86243-1      25962      1982  David Slowinski
2^132049-1     39751      1983  David Slowinski
2^216091-1     65050      1985  David Slowinski
2^756839-1     227832     1992  David Slowinski, Paul Gage
2^859433-1     258716     1994  David Slowinski, Paul Gage
2^1257787-1    378632     1996  David Slowinski, Paul Gage
2^1398269-1    420921     1996  Joel Armengaud (GIMPS)
2^2976221-1    895932     1997  Gordon Spence (GIMPS)
2^3021377-1    909526     1998  Roland Clarkson (GIMPS)
2^6972593-1    2098960 #  1999  Nayan Hajratwala (GIMPS)

* Mersenne prime numbers are prime numbers of the form 2^n - 1, where n is an integer.
# 67 pages long if printed on newspaper.
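The slides do not say how a Mersenne candidate is tested. The standard method is the Lucas-Lehmer test, sketched below in Python as an illustration rather than as the program the record holders actually ran. The helper names are made up for this example. The digit counts in the table can be checked with the formula floor(p * log10(2)) + 1.

```python
import math


def is_mersenne_prime(p):
    """Lucas-Lehmer test: is 2**p - 1 prime? Assumes the exponent p is itself prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1
    s = 4
    # Iterate s -> s^2 - 2 (mod m) exactly p - 2 times.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_digits(p):
    """Number of decimal digits of 2**p - 1 (same as the digit count of 2**p)."""
    return math.floor(p * math.log10(2)) + 1
```

For example, `mersenne_digits(6972593)` reproduces the 2098960 digits listed for the 1999 record. Running `is_mersenne_prime` on exponents of that size is the part that needs a very fast machine, since every step squares a number with millions of digits.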
The current largest known Mersenne prime numbers (of the form 2^n - 1) can be found at http://www.mersenne.org/
The Electronic Frontier Foundation is offering a $100,000 award for discovering a prime number with at least ten million digits.
Finding the Largest Mersenne Prime Number
Slowinski (SGI, Cray):
"The prime finder program rigorously tests all elements of a system -- from the logic of the processors, to the memory, the compiler and the operating and multitasking systems. For high performance systems with multiple processors, this is an excellent test of the system's ability."
Top 10 Supercomputers

Country   2006  2007  2008
USA       6     8     6
Japan     2     -     -
Spain     -     1     -
India     -     -     1
Germany   1     1     1
France    1     -     2
Top 10 Supercomputers

Country      2012 (Nov)  2013 (June)  2013 (Nov)
USA          5           5            5
China        1           2            1
Japan        1           2            1
Germany      2           1            2
Italy        1           -            -
Switzerland  -           -            1
Top Supercomputers
Timeline: http://www.top500.org/timeline/
Top #1 System: http://www.top500.org/featured/top-systems/
Theoretical Implication of Parallel Machines
A parallel machine with an unbounded number of processors behaves like a nondeterministic machine.
A statement like Guess({S1,S2}) can be added to our familiar deterministic programs.
Suddenly, NP-complete problems (e.g. the decision version of the Traveling Salesman Problem) can be solved in polynomial time.
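The following Python sketch shows what a deterministic machine has to do to simulate Guess({S1,S2}). It uses the subset-sum problem instead of the Traveling Salesman Problem only to keep the code short, and the function name and branch counter are illustrative assumptions. Each guess is simulated by trying both alternatives in turn, so n guesses can cost up to 2^n branches on one processor, while a machine with enough processors could follow all branches at once in n steps.

```python
def subset_sum(values, target):
    """Decide whether some subset of values sums to target.

    Returns (found, branches), where branches counts the complete guess
    sequences that the deterministic simulation had to examine.
    """
    branches = 0

    def run(i, total):
        nonlocal branches
        if i == len(values):
            branches += 1  # one complete sequence of guesses has been checked
            return total == target
        # Guess({take values[i], skip values[i]}): simulated by trying both.
        return run(i + 1, total + values[i]) or run(i + 1, total)

    found = run(0, 0)
    return found, branches
```

When no subset works, for instance `subset_sum([3, 5, 7], 4)`, all 2^3 = 8 guess sequences are examined before the answer is known.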
Areas of Research in P&D Computing
Parallel and Distributed Architectures
Parallel and Distributed Algorithms
Parallel Programming Languages
Scientific Computing
Signal & Image Processing Systems
Special Purpose Processors
VLSI and Configurable Logic Systems
Performance Modeling/Evaluation
Memory Hierarchy Issues in Parallel and Distributed Processing
Programming Environments and Tools for Parallel and Distributed Platforms
Compilers and Optimizations for Parallel and Distributed Processing
Operating System and Runtime Support for Parallel and Distributed Computing
Parallel and Distributed Network Protocols and Implementations
Applications of Parallel and Distributed Computing
Nontraditional Processor Technologies (Optical, Quantum, DNA, etc.)
Supercomputing Journals
ACM J. of Experimental Algorithmics
BIT
Cluster Computing
Computing and Visualization in Science
IEEE Trans. on Computers
IEEE Trans. on Parallel and Distributed Systems
International J. of Computer Research
International J. of Computers and Their Applications
International J. of High Performance Computing and Networking
International J. of High Speed Computing
International J. of Parallel Programming
J. of Interconnection Networks
J. of Parallel and Distributed Computing
J. of Performance Evaluation and Modeling of Computer Systems
J. of Supercomputing
J. of Visual Languages & Computing
Parallel Algorithms and Applications
Parallel Computing
Parallel and Distributed Computing Practices
Parallel Processing Letters
SIAM J. of Computing
SIAM J. of Scientific Computing
Computer Networks
Homogeneity
  Same kind of computers
  Examples: a network of PCs, a network of Sun workstations, …
Heterogeneity
  A mixture of different computers
  Example: Internet
Computer Networks: Network/Parallel Computer Architecture
Chain, Ring, Mesh, Torus
Tree, Star, Cube, Hypercube
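What separates these topologies is which nodes are directly linked. The short Python sketch below computes the neighbors of a node in a ring, a 2-D torus, and a hypercube. The node numbering schemes and function names are assumptions chosen for illustration, since the slides only name the topologies.

```python
def ring_neighbors(node, n):
    """Neighbors of a node in a ring of n nodes numbered 0..n-1."""
    return [(node - 1) % n, (node + 1) % n]


def torus_neighbors(row, col, rows, cols):
    """Neighbors in a 2-D torus: a mesh whose edges wrap around."""
    return [
        ((row - 1) % rows, col),
        ((row + 1) % rows, col),
        (row, (col - 1) % cols),
        (row, (col + 1) % cols),
    ]


def hypercube_neighbors(node, dim):
    """Neighbors in a dim-dimensional hypercube of 2**dim nodes.

    Two nodes are linked when their binary labels differ in exactly one bit.
    """
    return [node ^ (1 << k) for k in range(dim)]
```

In a hypercube every node has `dim` links and any two nodes are at most `dim` hops apart, whereas in a ring of the same size the worst-case distance grows linearly with the number of nodes.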
Computer Networks: Proprietary Parallel Computers
Ring: HP Exemplar V2600
Mesh: Cambridge Parallel Processing Gamma II Plus
Torus: Fujitsu AP3000, Tera/Cray Research Inc. T3E
Hypercube: SGI Origin series
Parallel and Distributed Processing
Hardware Structure of Parallel Computers
Architectural Classes
Memory Systems
Distributed Processing
PVM & MPI
Parallel Applications
Task Assignment
Hardware Structure of Parallel Computers
Classification is based on how the instruction and data streams are manipulated.
4 main architectural classes [Flynn, 1972], combining:
  Multiple/Single Instruction (MI/SI)
  Multiple/Single Data (MD/SD)
M.J. Flynn, Some computer organizations and their effectiveness, IEEE Transactions on Computers, C-21, pp. 948-960, 1972.
Architectural Classes
SISD machines:
  Accommodate one instruction stream that is executed serially. These are the conventional systems that contain one CPU.
SIMD machines:
  Such systems often have thousands of processing units that execute the same instruction on different data.
  Example: Hitachi S3600
MISD machines:
  Multiple instructions act on a single stream of data.
  No practical machine exists.
MIMD machines:
  Execute multiple instruction streams in parallel on different data.
  Run many sub-tasks in parallel.
  There is a large variety of MIMD systems.
Memory Systems
Shared memory systems:
  Have multiple CPUs, all of which share the same address space.
Distributed memory systems:
  Each CPU has its own associated memory.
Distributed Processing
Takes the DM-MIMD concept one step further.
Instead of many integrated processors in one or several boxes, workstations are connected by (Gigabit) Ethernet, FDDI, or otherwise and set to work concurrently on tasks in the same program.
Communication between processors is often slower by orders of magnitude.
PVM & MPI
Packages to realize Distributed Processing:
  PVM (Parallel Virtual Machine) [Geist et al., 1994]
  MPI (Message Passing Interface) [Snir et al. and Gropp et al., 1998]
A. Geist, A. Beguelin, J. Dongarra, R. Manchek, W. Jiang, and V. Sunderam, PVM: A Users' Guide and Tutorial for Networked Parallel Computing, MIT Press, Cambridge, MA, 1994.
M. Snir, S. Otto, S. Huss-Lederman, D. Walker, J. Dongarra, MPI: The Complete Reference, Vol. 1, The MPI Core, MIT Press, Cambridge, MA, 1998.
W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir, M. Snir, MPI: The Complete Reference, Vol. 2, The MPI Extensions, MIT Press, Cambridge, MA, 1998.
PVM & MPI
This style of programming, called the "message passing" model, has been widely accepted.
PVM and MPI have been adopted by virtually all major vendors of distributed-memory MIMD systems and even on shared-memory MIMD systems for compatibility reasons.
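The message passing model can be illustrated without installing PVM or MPI. The sketch below is a stand-in that uses Python threads and queues, not the PVM or MPI API: each worker owns an inbox, shares no data with the others, and cooperates only through explicit send (`put`) and receive (`get`) calls. The names `worker` and `parallel_sum` are made up for this example.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of numbers and send back (rank, partial sum)."""
    chunk = inbox.get()              # blocking "receive" from the master
    outbox.put((rank, sum(chunk)))   # "send" the partial result back


def parallel_sum(data, n_workers=4):
    """Master: scatter data to the workers, then gather and combine partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()
    # Scatter: worker r receives every n_workers-th element starting at index r.
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])
    # Gather: exactly one message comes back from each worker.
    total = sum(results.get()[1] for _ in range(n_workers))
    for t in threads:
        t.join()
    return total
```

A real MPI program has the same scatter, compute, gather shape, but the processes run on separate machines and the messages travel over the network, which is where the communication cost mentioned earlier comes from.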
Parallel Applications
Parallel Algorithms
  Fine grain/Coarse grain
Parallel Programming
  ParBegin/ParEnd
  PVM/MPI APIs
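ParBegin/ParEnd is a pseudocode construct: the statements between the two markers may run concurrently, and execution continues past ParEnd only when all of them have finished. A minimal Python sketch of that behavior, using the standard `concurrent.futures` module and a hypothetical helper named `par`, might look like this:

```python
from concurrent.futures import ThreadPoolExecutor


def par(*tasks):
    """Run zero-argument callables concurrently and return their results in order."""
    if not tasks:
        return []
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:  # ParBegin
        futures = [pool.submit(task) for task in tasks]
        # Waiting on every future plays the role of ParEnd.
        return [f.result() for f in futures]
```

For example, `par(lambda: sum(range(10)), lambda: max(3, 7))` runs both computations concurrently and returns their results as a list once both are done.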
Task Assignment
Performance Measures
  Completion Time
  Throughput
Overheads for P&D Processing
  Execution Time for tasks (E)
  Intra-task Interference cost (ITI)
  Inter-task Communication cost (ITC)
Task Assignment: Throughput (Stone, 1977)
E + ITI + ITC
H. Stone, Multiprocessor Scheduling with the Aid of Network Flow Algorithms, IEEE Transactions on Software Engineering, Vol. 3, No. 1, pp. 85-93, 1977.
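The slide gives only the sum E + ITI + ITC, so the following Python sketch rests on stated assumptions rather than on Stone's paper: E is a per-task, per-processor execution cost, ITI is charged to each pair of tasks placed on the same processor, and ITC is charged to each pair placed on different processors. The search is plain brute force over all assignments, not the network flow algorithm from the reference, and the function names are illustrative.

```python
from itertools import product


def assignment_cost(assign, exec_cost, iti, itc):
    """Total cost E + ITI + ITC when task t runs on processor assign[t].

    Assumed model: exec_cost[t][p] is the execution cost of task t on processor p,
    iti[a][b] applies when tasks a and b share a processor,
    itc[a][b] applies when tasks a and b sit on different processors.
    """
    n = len(assign)
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    e = sum(exec_cost[t][assign[t]] for t in range(n))
    interference = sum(iti[a][b] for a, b in pairs if assign[a] == assign[b])
    communication = sum(itc[a][b] for a, b in pairs if assign[a] != assign[b])
    return e + interference + communication


def best_assignment(n_procs, exec_cost, iti, itc):
    """Try every assignment of tasks to processors and keep the cheapest one."""
    n_tasks = len(exec_cost)
    return min(
        product(range(n_procs), repeat=n_tasks),
        key=lambda assign: assignment_cost(assign, exec_cost, iti, itc),
    )
```

Brute force examines n_procs^n_tasks assignments, so it is usable only for tiny instances. That growth is the reason more efficient assignment algorithms are of interest.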
Affordable Supercomputer
Computer networks with off-the-shelf hardware
Powered by parallel and distributed software tools
Advantages over conventional supercomputers
System of Homogeneous Network
  A network of PCs with SCSI link
  SPVM
System of Heterogeneous Network
  Internet
  JMPI
Computer Networks with Off-the-Shelf Hardware Powered by Parallel and Distributed Processing Tools
Advantages over Conventional Supercomputer
Decomposable
Reusable
Scale up and down easily
Off-the-shelf
Third World friendly
Economical
Reconfigurable Interconnection Topology
Easy to upgrade: bus, processor, software
Collaborative R&D Environment
General-purpose
Multi-usage
Homogeneous Network
A network of Pentium PCs
Heterogeneous Network
Future Trend and Challenge
PVM and MPI community continues to grow
Cheaper and faster processors and interconnections
More employment of Clusters of Workstations for High Performance Computing
More freely available software tools
Future Trend and Challenge
Race between proprietary supercomputers and cluster computers
How fast can a supercomputer go?
How will heterogeneous computing evolve?
Will a cluster of computers over the Internet become the fastest computer in the world?
Processing power as an on-demand service?
Processor sharing?
Conclusion
Powered by state-of-the-art parallel and distributed processing tools, a high-speed computer network of powerful workstations will become a very attractive, affordable, highly scalable and highly available solution for the High Performance Computing world.
Conclusion
Such an exciting area of research
  Practical
  Affordable
  Educational
Knowledge sharing through major forums (e.g. IEEE TFCC, Top500, TopClusters)
One key issue is how to compare, evaluate, and rank their performance
Conclusion: Research Topics
Build Your Own Supercomputer (Cluster)
Heterogeneous System
Employ new COTS (Commercial Off-the-Shelf)
Classification
Benchmarks
Performance Tracking Tools
System Administration Software
Top 500 Supercomputers Update
Trend of Cluster Computers Versus Proprietary Supercomputers.
The TOP 500 Supercomputer List http://www.top500.org/
Q&A