Seminar on parallel computing

• Goal: provide environment for exploration of parallel computing
• Driven by participants
• Weekly hour for discussion, show & tell
• Focus primarily on distributed memory computing on linux PC clusters
• Target audience:
– Experience with linux computing & Fortran/C
– Requires parallel computing for own studies
• 1 credit possible for completion of ‘proportional’ project
Main idea
• Distribute a job over multiple processing units
• Do bigger jobs than is possible on single machines
• Solve bigger problems faster
• Resources: e.g., www-jics.cs.utk.edu
Sequential limits
• Moore’s law
• Clock speed physically limited
– Speed of light
– Miniaturization; dissipation; quantum effects
• Memory addressing
– 32-bit addresses in PCs: 2^32 bytes = 4 Gbyte RAM max.
Machine architecture: serial
– Single processor
– Hierarchical memory (see the sketch below):
• Small number of registers on CPU
• Cache (L1/L2)
• RAM
• Disk (swap space)
– Operations require multiple steps
• Fetch two floating point numbers from main memory
• Add and store
• Put back into main memory
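A small C sketch (illustrative, not from the original slides) of why the memory hierarchy matters: both loops do the same additions, but the row-wise pass walks through contiguous memory and reuses each cache line, while the column-wise pass strides N*8 bytes between accesses and keeps going back to main memory. N and the timing method are assumptions for the example.

/* Illustrative cache-behavior comparison; compile without aggressive
   optimization so the loops are not removed. */
#include <stdio.h>
#include <time.h>

#define N 2048

static double a[N][N];

int main(void)
{
    double sum = 0.0;
    clock_t t0, t1;

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;

    /* Row-wise: consecutive elements are adjacent in memory (C is row-major). */
    t0 = clock();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    t1 = clock();
    printf("row-wise:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    /* Column-wise: successive accesses are N*8 bytes apart, so most of them
       miss the cache and wait on main memory. */
    t0 = clock();
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    t1 = clock();
    printf("column-wise: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    return sum > 0.0 ? 0 : 1;  /* use sum so the loops are not optimized away */
}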
Vector processing
• Speed up single instructions on vectors
– E.g., while adding two floating point numbers, fetch two new ones from main memory
– Pushing vectors through the pipeline
• Useful in particular for long vectors
• Requires good memory control:
– Bigger cache is better
• Common on most modern CPUs
– Implemented in both hardware and software
SIMD
• Same instruction works simultaneously on different data sets
• Extension of vector computing
• Example:

DO IN PARALLEL
  for i=1,n
    x(i) = a(i)*b(i)
  end
DONE PARALLEL
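A hedged C version of the loop above (the names x, a, b, n follow the slide; the function wrapper, OpenMP directive, and compile flags are illustrative, not from the original): the simd directive asks the compiler to apply the same multiply instruction to several array elements at once.

/* Sketch: elementwise product with an explicit SIMD hint (OpenMP 4+).
   Compile with e.g. gcc -O2 -fopenmp-simd (flags are an assumption). */
void multiply(int n, double *x, const double *a, const double *b)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        x[i] = a[i] * b[i];   /* same instruction applied across the vector */
}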
MIMD

• Multiple instruction, multiple data
• Most flexible, encompasses SIMD/serial
• Often best for ‘coarse grained’ parallelism
• Message passing
• Example: domain decomposition
– Divide computational grid in equal chunks
– Work on each domain with one CPU
– Communicate boundary values when necessary (see the sketch below)
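A minimal sketch (not from the slides) of message-passing domain decomposition in C with MPI: each process owns a chunk of a 1-D grid plus two ghost cells and swaps boundary values with its neighbours before updating. NLOC and the dummy data are assumptions for the example.

/* Hypothetical 1-D domain decomposition with ghost-cell exchange. */
#include <mpi.h>

#define NLOC 100                         /* interior points per process (assumed) */

int main(int argc, char **argv)
{
    double u[NLOC + 2];                  /* u[0] and u[NLOC+1] are ghost cells */
    int rank, size, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (i = 0; i < NLOC + 2; i++) u[i] = rank;   /* dummy initial data */

    /* Communicate boundary values: send my edge points, receive the
       neighbours' edges into my ghost cells (domain ends become no-ops). */
    MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                 &u[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[NLOC], 1, MPI_DOUBLE, right, 1,
                 &u[0], 1, MPI_DOUBLE, left, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* ...each process now updates its own points u[1..NLOC] independently... */

    MPI_Finalize();
    return 0;
}

The same pattern generalizes directly to 2-D/3-D grids, where each process exchanges ghost values with more neighbours.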
Historical machines

• 1976 Cray-1 at Los Alamos (vector)
• 1980s Control Data Cyber 205 (vector)
• 1980s Cray X-MP
– 4 coupled Cray-1s
• 1985 Thinking Machines Connection Machine
– SIMD, up to 64k processors
• 1984+ NEC/Fujitsu/Hitachi
– Automatic vectorization
Sun and SGI (90s)
• Scaling between desktops and compute servers
– Use of both vectorization and large scale parallelization
• RISC processors
– SPARC for Sun
– MIPS for SGI: PowerChallenge/Origin
Happy developments
• High performance Fortran / Fortran 90
• Definitions for message-passing libraries
– PVM
– MPI
• Linux
• Performance increase of commodity CPUs
• Combination leads to affordable cluster computing
Who’s the biggest
• www.top500.org
• LINPACK benchmark (dense linear system solve)
• June 2003:
– Earth Simulator, Yokohama, NEC, 36 Tflops
– ASCI Q, Los Alamos, HP, 14 Tflops
– Linux cluster, Livermore, 8 Tflops
Parallel approaches
• Embarrassingly parallel (see the sketch below)
– “Monte Carlo” searches
– SETI@home
• Analyze lots of small time series
• Parallelize DO-loops in dominantly serial code
• Domain decomposition
– Fully parallel
– Requires complete rewrite/rethinking
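For the embarrassingly parallel case, a hedged sketch in C/MPI (sample counts and naming are illustrative): every process runs an independent Monte Carlo estimate of pi, and the only communication is a single reduction at the end.

/* Hypothetical embarrassingly parallel Monte Carlo estimate of pi. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const long n = 1000000;              /* samples per process (assumed) */
    long hits = 0, total = 0;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    srand(rank + 1);                     /* crude per-process random streams */
    for (long i = 0; i < n; i++) {
        double x = rand() / (double)RAND_MAX;
        double y = rand() / (double)RAND_MAX;
        if (x * x + y * y <= 1.0) hits++;
    }

    /* The only communication: add up the per-process counts on rank 0. */
    MPI_Reduce(&hits, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %f\n", 4.0 * total / (double)(n * size));

    MPI_Finalize();
    return 0;
}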
Example: seismic wave propagation
• 3D spherical wave propagation modeled with high order finite element technique (Komatitsch and Tromp, GJI, 2002)
• Massively parallel computation on linux PC clusters
• Approx. 34 Gbyte RAM needed for 10 km average resolution
• www.geo.lsa.umich.edu/~keken/waves
Resolution
• Spectral elements: 10 km average resolution
• 4th order interpolation functions
• Reasonable graphics resolution: 10 km or better
• 12 km: 1024³ = 1 GB
• 6 km: 2048³ = 8 GB
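(These byte counts presumably assume one byte per grid point, consistent with the 256-color rendering on the following slides: 1024³ bytes ≈ 1.1 × 10⁹ ≈ 1 GB, and 2048³ bytes ≈ 8.6 × 10⁹ ≈ 8 GB.)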
[Figure: simulated EQ (d = 15 km) after 17 minutes; 512x512, 256 colors, positive only, truncated max, log10 scale, particle velocity. Labeled phases: P, PP, PPP, PKIKP, SK, PKPPKPab.]

[Figure: same rendering (512x512, 256 colors, positive only, truncated max, log10 scale, particle velocity), showing some S component. Labeled phases: R, PKS, PcSS, SS.]
Resources at UM
• Various linux clusters in Geology
– Agassiz (Ehlers): 8 Pentium 4 @ 2 Gbyte each
– Panoramix (van Keken): 10 P3 @ 512 Mbyte
– Trans (van Keken, Ehlers): 24 P4 @ 2 Gbyte
• SGIs
– Origin 2000 (Stixrude, Lithgow, van Keken)
• Center for Advanced Computing @ UM
– Athlon clusters (384 nodes @ 1 Gbyte each)
– Opteron cluster (to be installed)
• NPACI
Software resources
• GNU and Intel compilers
– Fortran/Fortran 90/C/C++
• MPICH www-fp.mcs.anl.gov
– Primary implementation of MPI (minimal example below)
– “Using MPI”, 2nd edition, Gropp et al., 1999
• Sun Grid Engine
• PETSc www-fp.mcs.anl.gov
– Toolbox for parallel scientific computing
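To make the MPICH entry concrete, a minimal MPI program (illustrative, not from the slides) and the usual MPICH-style build/run commands; batch submission through Sun Grid Engine wraps the same binary.

/* hello.c -- smallest useful MPI program: report rank and size. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Typically compiled and launched with something like
  mpicc hello.c -o hello
  mpirun -np 4 ./hello
(exact launcher and flags depend on the local MPICH installation).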