High Performance Computing and CyberGIS. Keith T. Weber, GISP, GIS Director, ISU


  • Slide 1
  • High Performance Computing and CyberGIS. Keith T. Weber, GISP, GIS Director, ISU
  • Slide 2
  • Goal of this presentation: Introduce you to another world of computing, analysis, and opportunity. Encourage you to learn more!
  • Slide 3
  • Some Terminology Up-Front: Supercomputing, HPC (High Performance Computing), HTC (High Throughput Computing), and CI (Cyberinfrastructure)
  • Slide 4
  • Acknowledgements: Much of the material presented here was originally designed by Henry Neeman at the University of Oklahoma and OSCER
  • Slide 5
  • What is Supercomputing? Supercomputing is the biggest, fastest computing right this minute. Likewise, a supercomputer is one of the biggest, fastest computers right this minute. So the definition of supercomputing is constantly changing. Rule of thumb: a supercomputer is typically about 100 times as powerful as a PC.
  • Slide 6
  • Fastest Supercomputer and Moore's Law
  • Slide 7
  • What is Supercomputing About? Size and speed.
  • Slide 8
  • Size: Many problems that are interesting to scientists and engineers can't fit on a PC, usually because they need more than a few GB of RAM or more than a few hundred GB of disk.
  • Slide 9
  • Speed: Many problems that are interesting to scientists and engineers would take a very long time to run on a PC: months or even years. But a problem that would take a month on a PC might take only a few hours on a supercomputer.
  • Slide 10
  • What can Supercomputing be used for? Data mining, modeling, simulation, and visualization [1]
  • Slide 11
  • What is a Supercomputer? A cluster of small computers, each called a node, hooked together by an interconnection network ("interconnect" for short). A cluster needs software that allows the nodes to communicate across the interconnect. What makes it a cluster is all of these components working together as if they're one big computer... a super computer.
  • Slide 12
  • For example: the Dell Intel Xeon Linux Cluster, sooner.oscer.ou.edu. 1,076 Intel Xeon CPU chips / 4,288 cores; 8,800 GB RAM; ~130 TB globally accessible disk; QLogic InfiniBand; Force10 Networks Gigabit Ethernet; Red Hat Enterprise Linux 5; peak speed 34.5 TFLOPs*. *TFLOPs: trillion floating point operations (calculations) per second.
  • Slide 13
  • Quantifying a Supercomputer: number of cores (your workstation: 4?; the ISU cluster: 800; Blue Waters: 300,000) and TeraFLOPs.
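As a rough illustration of how core counts relate to TeraFLOPs (the clock rate and FLOPs-per-cycle numbers below are hypothetical, not the actual specifications of any of these machines): theoretical peak performance is roughly cores x clock rate x floating-point operations per cycle. A hypothetical 800-core cluster running at 2.5 GHz and doing 4 FLOPs per cycle would therefore peak at about 800 x 2.5 GHz x 4 = 8 TFLOPs.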
  • Slide 14
  • PARALLELISM: How a cluster works together
  • Slide 15
  • Parallelism: Less fish... More fish! Parallelism means doing multiple things at the same time.
  • Slide 16
  • THE JIGSAW PUZZLE ANALOGY: Understanding Parallel Processing
  • Slide 17
  • Serial Computing: We are very accustomed to serial processing. It can be compared to building a jigsaw puzzle by yourself. In other words, suppose you want to complete a jigsaw puzzle that has 1,000 pieces. We can agree this will take a certain amount of time; let's just say one hour.
  • Slide 18
  • Shared Memory Parallelism: If Scott sits across the table from you, then he can work on his half of the puzzle and you can work on yours. Once in a while, you'll both reach into the pile of pieces at the same time (you'll contend for the same resource), which will cause you to slow down. And from time to time you'll have to work together (communicate) at the interface between his half and yours. The speedup will be nearly 2-to-1: together it will take about 35 minutes instead of an hour.
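A minimal shared-memory sketch of the same idea in C with OpenMP (the array size, the "pieces" array, and the shared counter are illustrative assumptions, not part of the original slides):

```c
/* Shared-memory parallelism sketch: several threads work on ONE array
   that lives in a single shared memory space ("one table, one pile").
   Compile with: gcc -fopenmp puzzle.c */
#include <stdio.h>
#include <omp.h>

#define N 1000              /* number of "puzzle pieces" -- illustrative */

int main(void)
{
    static int pieces[N];   /* shared data, visible to every thread */
    long placed = 0;

    /* Each thread takes part of the loop; the reduction coordinates
       their updates to the shared counter (the "contention" point). */
    #pragma omp parallel for reduction(+:placed)
    for (int i = 0; i < N; i++) {
        pieces[i] = 1;      /* "place" piece i */
        placed++;
    }

    printf("Placed %ld pieces using up to %d threads\n",
           placed, omp_get_max_threads());
    return 0;
}
```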
  • Slide 19
  • The More the Merrier? Now let's put Paul and Charlie on the other two sides of the table. Each of you can work on a part of the puzzle, but there'll be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. So you will achieve noticeably less than a 4-to-1 speedup, but you'll still have an improvement: maybe something like 20 minutes instead of an hour.
  • Slide 20
  • Diminishing Returns: If we now put Dave, Tom, Horst, and Brandon at the corners of the table, there's going to be much more contention for the shared resource and a lot of communication at the many interfaces. The speedup will be much less than we'd like; you'll be lucky to get 5-to-1. We can see that adding more and more workers onto a shared resource eventually yields diminishing returns.
  • Slide 21
  • Amdahl's Law (chart: CPU utilization). Source: http://codeidol.com/java/java-concurrency/Performance-and-Scalability/Amdahls-Law/
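Amdahl's Law quantifies this limit. If P is the fraction of the work that can be parallelized and N is the number of processors, the best possible speedup is

Speedup(N) = 1 / ((1 - P) + P / N)

As a worked, purely illustrative example: if 90% of a job parallelizes (P = 0.9), then even with 8 processors the speedup is only 1 / (0.1 + 0.9/8), or about 4.7, echoing the diminishing returns at the puzzle table.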
  • Slide 22
  • Distributed Parallelism: Let's try something a little different. Let's set up two tables. You will sit at one table and Scott at the other. We will put half of the puzzle pieces on your table and the other half on Scott's. Now you can work completely independently, without any contention for a shared resource. BUT the cost per communication is MUCH higher, and you need the ability to split up (decompose) the puzzle correctly, which can be tricky.
  • Slide 23
  • More Distributed Processors: It's easy to add more processors in distributed parallelism, but you must be aware of the need to decompose the problem and to communicate among the processors. Also, as you add more processors, it may be harder to load balance the amount of work that each processor gets.
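A hedged sketch of the same pattern in C with MPI (the toy problem, the even decomposition, and names like local_n are illustrative assumptions; real applications need careful decomposition and load balancing):

```c
/* Distributed-memory parallelism sketch: each process owns its own
   private chunk of the data ("its own table"), works independently,
   then partial results are combined with explicit communication.
   Compile with: mpicc work.c ; run with: mpirun -np 4 ./a.out */
#include <stdio.h>
#include <mpi.h>

#define N 1000000           /* total problem size -- illustrative */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes? */

    /* Decompose: each process takes a contiguous slice. (If N does not
       divide evenly, the leftover work would need load balancing.) */
    long local_n = N / size;
    long local_sum = 0;
    for (long i = rank * local_n; i < (rank + 1) * local_n; i++)
        local_sum += 1;     /* stand-in for real work on element i */

    /* Communicate: gather the partial results onto process 0. */
    long total = 0;
    MPI_Reduce(&local_sum, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Processed %ld of %d elements on %d processes\n",
               total, N, size);

    MPI_Finalize();
    return 0;
}
```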
  • Slide 24
  • FYI: Kinds of Parallelism: Instruction-Level Parallelism, Shared Memory Multithreading, Distributed Memory Multiprocessing, GPU Parallelism, and Hybrid Parallelism (Shared + Distributed + GPU)
  • Slide 25
  • Why Parallelism Is Good The Trees: We like parallelism because, as the number of processing units working on a problem grows, we can solve the same problem in less time. The Forest: We like parallelism because, as the number of processing units working on a problem grows, we can solve bigger problems.
  • Slide 26
  • Jargon: Threads are execution sequences that share a single memory area. Processes are execution sequences with their own independent, private memory areas. Multithreading is parallelism via multiple threads; multiprocessing is parallelism via multiple processes. Shared Memory Parallelism is concerned with threads; Distributed Parallelism is concerned with processes.
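A small C sketch of the distinction (the counter and the specific POSIX calls are illustrative, not from the slides; compile with gcc -pthread):

```c
/* Threads share one memory area; processes each get a private copy. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <pthread.h>

int counter = 0;                    /* one copy, visible to every thread */

void *thread_body(void *arg)
{
    (void)arg;
    counter++;                      /* a thread changes the shared copy */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, thread_body, NULL);
    pthread_join(t, NULL);
    printf("after thread:  counter = %d\n", counter);   /* prints 1 */

    pid_t pid = fork();             /* the child process gets its OWN copy */
    if (pid == 0) {
        counter++;                  /* changes only the child's copy */
        return 0;
    }
    wait(NULL);
    printf("after process: counter = %d\n", counter);   /* still 1 */
    return 0;
}
```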
  • Slide 27
  • Basic Strategies. Data Parallelism: each processor does exactly the same tasks on its own unique subset of the data (jigsaw puzzles, or big datasets that need to be processed now!). Task Parallelism: each processor does different tasks on exactly the same set of data (which algorithm is best?).
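A brief C/OpenMP sketch contrasting the two strategies (the eight-element array and the toy statistics are illustrative assumptions):

```c
/* Data parallelism vs. task parallelism, sketched with OpenMP.
   Compile with: gcc -fopenmp strategies.c */
#include <stdio.h>
#include <omp.h>

#define N 8

int main(void)
{
    double data[N] = {3, 1, 4, 1, 5, 9, 2, 6};
    double sum = 0.0, minv = data[0], maxv = data[0];

    /* Data parallelism: every thread runs the SAME operation (summing)
       on its own subset of the data. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += data[i];

    /* Task parallelism: different threads run DIFFERENT operations
       (finding the min vs. finding the max) on the SAME data. */
    #pragma omp parallel sections
    {
        #pragma omp section
        for (int i = 1; i < N; i++) if (data[i] < minv) minv = data[i];

        #pragma omp section
        for (int i = 1; i < N; i++) if (data[i] > maxv) maxv = data[i];
    }

    printf("sum=%.1f min=%.1f max=%.1f\n", sum, minv, maxv);
    return 0;
}
```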
  • Slide 28
  • An Example: Embarrassingly Parallel. An application is known as embarrassingly parallel if its parallel implementation can straightforwardly be broken up into equal amounts of work per processor AND has minimal parallel overhead (i.e., communication among processors). FYI: Embarrassingly parallel applications are also known as loosely coupled.
  • Slide 29
  • Monte Carlo Methods: Monte Carlo methods are ways of simulating or calculating actual phenomena based on randomness, within known error limits. In GIS, we use Monte Carlo simulations to calculate error propagation effects. How? Monte Carlo simulations are typically embarrassingly parallel applications.
  • Slide 30
  • Monte Carlo Methods: In a Monte Carlo method, you randomly generate a large number of example cases (realizations) and then compare their results. When the average of the realizations converges (that is, when your answer doesn't change substantially as new realizations are generated), the Monte Carlo simulation can stop.
  • Slide 31
  • Embarrassingly Parallel: Monte Carlo simulations are embarrassingly parallel because each realization is independent of all other realizations.
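A minimal sketch of an embarrassingly parallel Monte Carlo run in C with OpenMP (the pi-estimation example, the realization count, and the per-thread seeding are illustrative assumptions, not the GIS error-propagation workflow itself):

```c
/* Embarrassingly parallel Monte Carlo: estimate pi by generating random
   points (realizations); each realization is independent, so threads
   never need to communicate while the work is under way.
   Compile with: gcc -fopenmp -O2 montecarlo.c */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    const long realizations = 10000000;   /* illustrative count */
    long hits = 0;

    #pragma omp parallel reduction(+:hits)
    {
        unsigned int seed = 1234u + omp_get_thread_num();  /* per-thread seed */

        #pragma omp for
        for (long i = 0; i < realizations; i++) {
            double x = (double)rand_r(&seed) / RAND_MAX;    /* one realization */
            double y = (double)rand_r(&seed) / RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;                   /* point fell inside the quarter circle */
        }
    }

    /* The average (hits / realizations) converges as realizations grow. */
    printf("pi is approximately %f\n", 4.0 * hits / realizations);
    return 0;
}
```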
  • Slide 32
  • A Quiz Q: Is this an example of Data Parallelism or Task Parallelism? A: Task Parallelism: Each processor does different tasks on exactly the same set of data
  • Slide 33
  • Questions so far?
  • Slide 34
  • A bit more to know: WHAT IS A GPGPU? (or: THANK YOU, GAMING INDUSTRY)
  • Slide 35
  • It's an Accelerator. No, not this...
  • Slide 36
  • Accelerators: In HPC, an accelerator is hardware whose role is to speed up some aspect of the computing workload. In the olden days (the 1980s), PCs sometimes had floating point accelerators (a.k.a. the math coprocessor).
  • Slide 37
  • Why Accelerators are Good They make your code run faster.
  • Slide 38
  • Why Accelerators are Bad: Because they're expensive (or they were), they're harder to program (NVIDIA CUDA), and your code may not be portable to other accelerators, so the labor you invest in programming may have a very short life.
  • Slide 39
  • The King of the Accelerators The undisputed king of accelerators is the graphics processing unit (GPU).
  • Slide 40
  • Why GPU? Graphics Processing Units (GPUs) were originally designed to accelerate graphics tasks like image rendering for gaming. They became very popular with gamers because they produced better and better images at lightning-fast refresh speeds. As a result, prices have become extremely reasonable, ranging from three figures at the low end to four figures at the high end.
  • Slide 41
  • GPUs Do Arithmetic: GPUs render images, and this is done through floating point arithmetic. As it turns out, this is the same stuff people use supercomputing for!
  • Slide 42
  • Interested? Curious? To learn more or to get involved with supercomputing, there is a host of opportunities awaiting you: Get to know your Campus Champions. Visit http://giscenter.isu.edu/research/Techpg/CI/XSEDE/. Visit https://www.xsede.org/. Ask about internships (BWUPEP). Learn C (not C++, but C) or Fortran. Learn UNIX.
  • Slide 43
  • Questions?