High Performance Computing and CyberGIS Keith T. Weber, GISP
GIS Director, ISU
Slide 2
Goal of this presentation Introduce you to an another world of
computing, analysis, and opportunity Encourage you to learn
more!
Slide 3
Some Terminology Up-Front Supercomputing HPC HTC CI
Slide 4
AcknowledgementsAcknowledgements Much of the material presented
here, was originally designed by Henry Neeman at Oklahoma
University and OSCER
Slide 5
5 What is Supercomputing? Supercomputing is the biggest,
fastest computing right this minute. Likewise, a supercomputer is
one of the biggest, fastest computers right this minute. So, the
definition of supercomputing is constantly changing. Rule of Thumb:
A supercomputer is typically 100 X as powerful as a PC.
Slide 6
Fastest Supercomputer and Moores Law
Slide 7
7 What is Supercomputing About? Size Speed Laptop
Slide 8
8 SizeSize Many problems that are interesting to scientists and
engineers cant fit on a PC usually because they need more than a
few GB of RAM, or more than a few 100 GB of disk.
Slide 9
SpeedSpeed Many problems that are interesting to scientists and
engineers would take a long time to run on a PC. months or even
years. But a problem that would take 1 month on a PC might take
only a few hours on a supercomputer
Slide 10
What can Supercomputing be used for? Data Mining Modeling
Simulation Visualization [1]
Slide 11
What is a Supercomputer? A cluster of small computers, each
called a node, hooked together by an interconnection network
(interconnect for short). A cluster needs software that allows the
nodes to communicate across the interconnect. But what a cluster is
is all of these components working together as if theyre one big
computer... a super computer.
Slide 12
1,076 Intel Xeon CPU chips/4288 cores 8,800 GB RAM ~130 TB
globally accessible disk QLogic Infiniband Force10 Networks Gigabit
Ethernet Red Hat Enterprise Linux 5 Peak speed: 34.5 TFLOPs*
*TFLOPs: trillion floating point operations (calculations) per
second For example: Dell Intel Xeon Linux Cluster
sooner.oscer.ou.edu
Slide 13
Quantifying a Supercomputer Number of cores Your workstation
(4?) ISU cluster (800) Blue Waters (300,000) TeraFlops
Slide 14
PARALLELISMPARALLELISM How a cluster works together:
Slide 15
ParallelismParallelism Less fish More fish! Parallelism means
doing multiple things at the same time
Slide 16
THE JIGSAW PUZZLE ANALOGY Understanding Parallel Processing
16
Slide 17
Serial Computing We are very accustom to serial processing. It
can be compared to building a jigsaw puzzle by yourself. In other
words, suppose you want to complete a jigsaw puzzle that has 1000
pieces. We can agree this will take a certain amount of timelets
just say, one hour
Slide 18
Shared Memory Parallelism If Scott sits across the table from
you, then he can work on his half of the puzzle and you can work on
yours. Once in a while, youll both reach into the pile of pieces at
the same time (youll contend for the same resource), which will
cause you to slowdown. And from time to time youll have to work
together (communicate) at the interface between his half and yours.
The speedup will be nearly 2-to-1: Together it will take about 35
minutes instead of 30.
Slide 19
The More the Merrier? Now lets put Paul and Charlie on the
other two sides of the table. Each of you can work on a part of the
puzzle, but therell be a lot more contention for the shared
resource (the pile of puzzle pieces) and a lot more communication
at the interfaces. So you will achieve noticeably less than a
4-to-1 speedup. But youll still have an improvement, maybe
something like 20 minutes instead of an hour.
Slide 20
Diminishing Returns If we now put Dave, Tom, Horst, and Brandon
at the corners of the table, theres going to be a much more
contention for the shared resource, and a lot of communication at
the many interfaces. The speedup will be much less than wed like;
youll be lucky to get 5-to-1. We can see that adding more and more
workers onto a shared resource is eventually going to have a
diminishing return.
Slide 21
Amdahls Law Source:
http://codeidol.com/java/java-concurrency/Performance-and-Scalability/Amdahls-Law/http://codeidol.com/java/java-concurrency/Performance-and-Scalability/Amdahls-Law/
CPU utilization
Slide 22
Distributed Parallelism Lets try something a little different.
Lets set up two tables You will sit at one table and Scott at the
other. We will put half of the puzzle pieces on your table and the
other half of the pieces on Scotts. Now you can work completely
independently, without any contention for a shared resource. BUT,
the cost per communication is MUCH higher, and you need the ability
to split up (decompose) the puzzle correctly, which can be
tricky.
Slide 23
23 More Distributed Processors Its easy to add more processors
in distributed parallelism. But you must be aware of the need to:
decompose the problem and communicate among the processors. Also,
as you add more processors, it may be harder to load balance the
amount of work that each processor gets.
Why Parallelism Is Good The Trees: We like parallelism because,
as the number of processing units working on a problem grows, we
can solve the same problem in less time. The Forest: We like
parallelism because, as the number of processing units working on a
problem grows, we can solve bigger problems.
Slide 26
JargonJargon Threads are execution sequences that share a
single memory area Processes are execution sequences with their own
independent, private memory areas Multithreading: parallelism via
multiple threads Multiprocessing: parallelism via multiple
processes Shared Memory Parallelism is concerned with threads
Distributed Parallelism is concerned with processes.
Slide 27
Basic Strategies Data Parallelism: Each processor does exactly
the same tasks on its unique subset of the data jigsaw puzzles or
big datasets that need to be processed now! Task Parallelism: Each
processor does different tasks on exactly the same set of data
which algorithm is best?
Slide 28
An Example: Embarrassingly Parallel An application is known as
embarrassingly parallel if its parallel implementation: Can
straightforwardly be broken up into equal amounts of work per
processor, AND Has minimal parallel overhead (i.e., communication
among processors) FYIEmbarrassingly parallel applications are also
known as loosely coupled.
Slide 29
Monte Carlo Methods Monte Carlo methods are ways of simulating
or calculating actual phenomena based on randomness within known
error limits. In GIS, we use Monte Carlo simulations to calculate
error propagation effects How? Monte Carlo simulations are
typically, embarrassingly parallel applications.
Slide 30
Monte Carlo Methods In a Monte Carlo method, you randomly
generate a large number of example cases (realizations), and then
compare the results of these realizations When the average of the
realizations converges that is, your answer doesnt change
substantially if new realizations are generated, then the Monte
Carlo simulation can stop.
Slide 31
Embarrassingly Parallel Monte Carlo simulations are
embarrassingly parallel, because each realization is independent of
all other realizations
Slide 32
A Quiz Q: Is this an example of Data Parallelism or Task
Parallelism? A: Task Parallelism: Each processor does different
tasks on exactly the same set of data
Slide 33
Questions so far?
Slide 34
WHAT IS A GPGPU? OR THANK YOU GAMING INDUSTRY A bit more to
know
Slide 35
Its an Accelerator No, not this....
Slide 36
AcceleratorsAccelerators In HPC, an accelerator is hardware
whose role it is to speed up some aspect of the computing workload.
In the olden days (1980s), PCs sometimes had floating point
accelerators (aka, the math coprocessor)
Slide 37
Why Accelerators are Good They make your code run faster.
Slide 38
Why Accelerators are Bad Because: Theyre expensive (or they
were) Theyre harder to program (NVIDIA CUDA) Your code may not be
portable to other accelerators, so the labor you invest in
programming may have a very short life.
Slide 39
The King of the Accelerators The undisputed king of
accelerators is the graphics processing unit (GPU).
Slide 40
Why GPU? Graphics Processing Units (GPUs) were originally
designed to accelerate graphics tasks like image rendering for
gaming. They became very popular with gamers, because they produced
better and better images, at lightning fast refresh speeds As a
result, prices have become extremely reasonable, ranging from three
figures at the low end to four figures at the high end.
Slide 41
GPUs Do Arithmetic GPUs render images This is done through
floating point arithmetic As it turns out, this is the same stuff
people use supercomputing for!
Slide 42
Interested? Curious? To learn more, or to get involved with
supercomputing there is a host of opportunities awaiting you Get to
know your Campus Champions Visit
http://giscenter.isu.edu/research/Techpg/CI/XSEDE/
http://giscenter.isu.edu/research/Techpg/CI/XSEDE/ Visit
https://www.xsede.org/https://www.xsede.org/ Ask about internships
(BWUPEP) Learn C (not C++, but C) or Fortran Learn UNIX