Parallel Computation

Mark Greenstreet

CpSc 418 – September 9, 2020

• Why parallel programming matters
• Key aspects of parallel computing:
  ◦ Architecture, performance, algorithms, paradigms.
• Course overview
• Our first parallel program

Unless otherwise noted or cited, these slides are copyright 2020 by Mark Greenstreet and are made available under the terms of the Creative Commons Attribution 4.0 International license http://creativecommons.org/licenses/by/4.0/

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 1 / 30


Scholar Strike

• I am aware of Scholar Strike.
  ◦ I wholeheartedly agree that we are all people and deserve equal opportunities regardless of our race, gender, ethnicity, religious beliefs, etc.
  ◦ I also understand that, sadly, we fall short of these ideals.
• Anyone who is participating in Scholar Strike is welcome to review these slides and/or watch the recorded lecture at a time of their choosing.
  ◦ I am giving this lecture at the scheduled time because this class has a waitlist, and the waitlist ordering will be prioritized based on completion of the first two pre-class assignments. To delay this lecture would put students on the waitlist at an unfair disadvantage.
  ◦ You will not be penalized for not attending this lecture and reviewing the content later.
  ◦ If you have any questions, please send me e-mail: [email protected].

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 2 / 30

A few more announcements

• The waitlist: Priority will be given to those who complete the first two pre-class assignments.
  ◦ See https://www.students.cs.ubc.ca/~cs-418/2020-1/pika.html
• CPSC 418 is all on-line.
  ◦ The class outline part of these slides is based on previous, in-person offerings.
  ◦ I will update how we actually teach the course based on your feedback and what works.
  ◦ We’ll make this work for everyone.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 3 / 30

Why Parallel Computation Matters: Clock Limit

[Figure: Clock Speed of Intel Processors vs. Year Released. Clock frequency (100KHz to 1THz) plotted against year (1970 to 2020), with points marked by core count (single, double, triple, quad, hex core). Annotations: 26% annual growth; 51% annual growth; 3.3GHz in 2003.]

• In the good-old days (1993-2003), processor performance doubled roughly every 1.5 to 2 years.
• Single-thread performance has seen only small gains since 2003.
• If the trend had continued, we would have > 2THz CPUs today.
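As a rough check on the last point, the extrapolation can be worked out explicitly. This sketch assumes that the 51% annual growth label on the plot describes the period ending at the 3.3GHz mark in 2003, and that the same rate had continued for the 17 years from 2003 to 2020. Both are readings of the plot labels, not figures stated on the slide.

```latex
\[
f_{2020} \approx 3.3\,\text{GHz} \times 1.51^{17}
         \approx 3.3\,\text{GHz} \times 1.1 \times 10^{3}
         \approx 3.6\,\text{THz}
\]
```

Under those assumptions the result is consistent with the "> 2THz" estimate.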

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 4 / 30

Why Parallel Computation Matters: The Power Wall

[Figure: Clock Speed and Power of Intel Processors vs. Year Released (a). Clock frequency (100MHz to 10GHz) and power consumption in watts (25W to 150W) plotted against year (1990 to 2010), with points marked by core count (single, double, triple, quad, hex core). Annotations: 51% annual clock freq. growth; 3.3GHz in 2003.]

• Power grew with clock speed.
• The post-2003 data is for the highest clock frequency processor announced that year.
  ◦ There is large variation, depending on whether that processor was for the laptop, desktop, gaming, or server product line.
• While clock frequency hit a limit, transistors continued to shrink.
  ◦ We can’t make an individual processor faster, but we can put more processors on each chip.

(a) See http://en.wikipedia.org/wiki/List_of_CPU_power_dissipation

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 5 / 30

Welcome to Parallel Computing

• Sequential computing is all the same.
  ◦ Architecture: IBM 360 = PDP 11 = M68000 = x86 = ARM = RISC V . . .
  ◦ Programming languages: Cobol = Fortran = C = C++ = Java = Python = . . .
• Parallel computing doesn’t have a dominant architecture or programming paradigm.
  ◦ Architecture: multi-core, super-scalar, multi-threading, shared-memory, clusters, GPUs, . . .
  ◦ Programming paradigms: message passing, shared memory, data-parallel, and more.
  ◦ Programming frameworks: MPI, PThreads, CUDA, Hadoop, TensorFlow, . . .
  ◦ Platforms: multi-core mobile devices, multi-core desktops and laptops, GPU-based accelerators, data centers, clouds, scientific supercomputers.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 6 / 30

Example: GPUs

[Screenshot: NVIDIA product page “GeForce RTX 3090 Graphics Card” (captured 2020-09-05, 8:19 PM), https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090/. Headline features: NVIDIA Ampere architecture, 2nd generation RT Cores (2X throughput), 3rd generation Tensor Cores (up to 2X throughput), new SM (2X FP32 throughput), 24 GB of G6X memory; starting at $1,499.00, available on September 24th.]

• 10496 cores.
• Data parallel programming model: CUDA – *lots* of threads.

Citation: https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/?nvid=nv-int-cwmfg-49069#cid=_nv-int-cwmfg_en-us

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 7 / 30

Example: Cell-Phone platforms

[Screenshot: AnandTech article “Qualcomm Announces Snapdragon 865 and 765(G): 5G For All in 2020, All The Details” (captured 2020-09-05, 9:02 PM), https://www.anandtech.com/show/15178/qualcomm-announces-snapdragon-865-and-765-5g-for-all-in-2020-all-the-details]

Qualcomm Snapdragon Flagship SoCs 2019-2020 (table from the screenshot)

CPU
  Snapdragon 865: 1x Cortex A77 @ 2.84GHz 1x512KB pL2; 3x Cortex A77 @ 2.42GHz 3x256KB pL2; 4x Cortex A55 @ 1.80GHz 4x128KB pL2; 4MB sL3 @ ?MHz
  Snapdragon 855: 1x Kryo 485 Gold (A76 derivative) @ 2.84GHz 1x512KB pL2; 3x Kryo 485 Gold (A76 derivative) @ 2.42GHz 3x256KB pL2; 4x Kryo 485 Silver (A55 derivative) @ 1.80GHz 4x128KB pL2; 2MB sL3 @ 1612MHz
GPU
  Snapdragon 865: Adreno 650 @ ? MHz; +25% perf, +50% ALUs, +50% pixel/clock, +0% texels/clock
  Snapdragon 855: Adreno 640 @ 585 MHz
DSP / NPU
  Snapdragon 865: Hexagon 698; 15 TOPS AI (Total CPU+GPU+HVX+Tensor)
  Snapdragon 855: Hexagon 690; 7 TOPS AI (Total CPU+GPU+HVX+Tensor)
Memory Controller
  Snapdragon 865: 4x 16-bit CH @ 2133MHz LPDDR4X / 33.4GB/s or @ 2750MHz LPDDR5 / 44.0GB/s; 3MB system level cache
  Snapdragon 855: 4x 16-bit CH @ 1866MHz LPDDR4X 29.9GB/s; 3MB system level cache
ISP/Camera
  Snapdragon 865: Dual 14-bit Spectra 480 ISP; 1x 200MP or 64MP with ZSL or 2x 25MP with ZSL; 4K video & 64MP burst capture
  Snapdragon 855: Dual 14-bit Spectra 380 ISP; 1x 48MP or 2x 22MP
Encode/
  Snapdragon 865: 8K30 / 4K120 10-bit H.265
  Snapdragon 855: 4K60 10-bit H.265

• Heterogeneous cores for power, performance, and parallelism trade-offs.
• Many dedicated processors for performance and energy efficiency.

Citation: https://www.anandtech.com/show/16050/qualcomm-announces-snapdragon-732g-speed-bump

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 8 / 30

And more. . .

• Multi-core CPUs: from cellphones and laptops to large servers.
  ◦ A high-end server CPU may have 20 or more cores, plus hardware support for data-parallel computing, dedicated AI hardware, cryptography, security features, etc.
• Large compute clusters (e.g. cloud servers, scientific supercomputers)
  ◦ Thousands of machines
  ◦ Large amounts of networking equipment (routers and cables)
  ◦ Distributed programming model (i.e. message passing)
• FPGA = field-programmable gate array.
  ◦ Reconfigurable accelerator.
  ◦ E.g., Amazon AWS: https://aws.amazon.com/ec2/instance-types/f1/

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 9 / 30

Key Concepts of Parallel Computing

• The last few slides described the diversity of parallel computing.
  ◦ There are unifying concepts.
• Scalable parallelism: divide your task in a way that scales with problem size.
  ◦ If the problem is twice as large, ideally you’ll have twice as many subtasks.
• Communication and task coordination (C&C) are expensive.
  ◦ The many parallel architectures can be understood by how they try to make C&C efficient.
  ◦ The many parallel programming paradigms can be understood by how they support C&C.
  ◦ When devising a parallel solution, look for ways to minimize the costs of C&C.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 10 / 30

Parallel Architectures

• There isn’t one, standard, parallel architecture for everything. We have:
  ◦ Multi-core CPUs with a shared-memory programming model. Used for mobile device application processors, laptops, desktops, and many large database servers.
  ◦ Networked clusters, typically running Linux. Used for web servers and data mining. Scientific supercomputers are typically huge clusters with dedicated, high-performance networks.
  ◦ GPUs: hundreds or a few thousand simple cores that all execute the same instruction sequence.
  ◦ Domain-specific processors:
    ▪ video codecs, WiFi interfaces, image and sound processing, crypto engines, network packet filtering, and so on.
• As a consequence, there isn’t one, standard, parallel programming paradigm.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 11 / 30

Parallel Performance

The incentive for parallel computing is to do things that wouldn’t be practical on a single processor.

• Performance matters.
• We need good models:
  ◦ Counting operations can be very misleading – “adding is free.”
  ◦ Communication and coordination are often the dominant costs.
• Know Big-O, and measure reality.
  ◦ Parallel computing makes inefficient algorithms worse because we care about large N.
  ◦ Communication and coordination costs are critical.
    ▪ These costs involve the program, its run-time, the operating system, the network, contention with other tasks, and other factors.
    ▪ Measure, identify the real issues, and model what matters.
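One way to make the point about models concrete is a deliberately simple cost model. This is only an illustrative sketch: the symbols $t_{\text{op}}$ (time per operation) and $T_{\text{cc}}(P)$ (total communication and coordination time with $P$ processors) are assumptions introduced here, not notation from the course.

```latex
\[
T_{\text{par}}(N, P) \approx \frac{N}{P}\, t_{\text{op}} + T_{\text{cc}}(P),
\qquad
\text{SpeedUp}(N, P) = \frac{T_{\text{seq}}(N)}{T_{\text{par}}(N, P)}
  \approx \frac{N\, t_{\text{op}}}{\frac{N}{P}\, t_{\text{op}} + T_{\text{cc}}(P)}
\]
```

Under this model, counting operations alone would predict a speed-up of $P$. The $T_{\text{cc}}(P)$ term is what pulls the speed-up below that, and it matters most when $N/P$ is small.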

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 12 / 30

Parallel Algorithms

• We’ll explore some old friends in a parallel context:
  ◦ Sum of the elements of an array
  ◦ Matrix multiplication
  ◦ Dynamic programming
• And we’ll explore some uniquely parallel algorithms:
  ◦ Bitonic sort
  ◦ Mutual exclusion
  ◦ Producer consumer

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 13 / 30

Parallel Programming Paradigms

CPSC 418 will take a multiparadigm approach:

• There is no single dominant paradigm for parallel computing.
• We’ll focus on two:
  ◦ Message passing – using Erlang
    ▪ Functional programming avoids many common concurrency errors.
    ▪ It’s easy to illustrate many concepts – remember CPSC 110.
    ▪ Message passing makes communication explicit, and communication is usually the critical design issue.
  ◦ Data parallel – using CUDA
    ▪ Your graphics card is a super-computer.
    ▪ It’s a fairly restrictive model, but . . .
    ▪ It works very well for many problems from graphics, scientific computing, and machine learning.
• We’ll mention other paradigms as the course goes on.
  ◦ Our goal is to have you prepared so you can quickly pick up new parallel programming paradigms – especially those that don’t exist yet!

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 14 / 30

Course Overview

Blah, blah, blah, . . .

• Every course has an instructor and usually there are TAs, a textbook, and a syllabus. This course has all that stuff. You can read all about it on some slides after the end of the lecture.
• This course has its own, cool stuff:
  ◦ PIKAs – pre-class activities (maybe a few in-class as well).
  ◦ Early Bird bonuses
  ◦ Plagiarism policy: please don’t.
  ◦ Bug Bounties
• Mark can’t hear (what?!)

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 15 / 30

PIKAs – Pre-Class Activities

• Worth 20% of points missed from HW and the midterm exams.
  ◦ If your pre-final grade is 90%, you can raise that at most 2% through the pikas. Missing one or two isn’t a big deal.
  ◦ If your pre-final grade is 70%, you can raise that 6% from the pikas. This might move your letter grade up a notch (e.g. C+ to B−).
  ◦ If your pre-final grade is 45%, you can raise that 11% from the pikas. So do the pikas – we hate turning in failing grades.
• The first has just been posted (Sept. 9) and is due before lecture on Sept. 14.
• Why pika?
  ◦ Pica is a serious eating disorder – if you suffer from an eating disorder, it’s an important medical issue. Get help. Vancouver Coastal Health has excellent programs.
  ◦ A pika is a cute (but unfortunately endangered) mammal that is native to British Columbia.

Photo from http://pixdaus.com/northern-pika-photographer-sandro-animals-pika-yakutia/items/view/281604/.
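The three grade examples on this slide follow from the “20% of points missed” rule. As a sketch, write $g$ for the pre-final grade as a percentage and assume full marks on every pika; the symbol $g$ and the function name are introduced here for illustration only.

```latex
\[
\text{bonus}(g) = 0.20 \times (100\% - g)
\]
\[
\text{bonus}(90\%) = 2\%, \qquad
\text{bonus}(70\%) = 6\%, \qquad
\text{bonus}(45\%) = 11\%
\]
```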

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 16 / 30

Early Bird Bonuses

• Late policy: Late homework will not be accepted.
• Early policy: Early homework gets a 5% bonus.
  ◦ Typically, the early-bird deadline is two days before the hard deadline.
• No worms.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 17 / 30

Plagiarism

• I have a very simple criterion for plagiarism: Submitting the work of another person, whether that be another student, something from a book, or something off the web, and representing it as your own is plagiarism and constitutes academic misconduct.
• If the source is clearly cited, then it is not academic misconduct. If you tell me “This is copied word for word from Jane Foo’s solution”, that is not academic misconduct. It will be graded as one solution for two people and each will get half credit. I guess that you could try telling me how much credit each of you should get, but I’ve never had anyone try this before.
• I encourage you to discuss the homework problems with each other. If you’re brainstorming with some friends and the key idea for a solution comes up, that’s OK. In this case, add a note to your solution that lists who you collaborated with.
• More details at:
  ◦ http://www.students.cs.ubc.ca/~cs-418/plagiarism.html
  ◦ http://learningcommons.ubc.ca/guide-to-academic-integrity/

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 18 / 30

Collaboration is good

• Talk to your friends.
• Talk with your enemies; if this class brings peace between you, all the better.
• Figure things out together, but write your own code, write your own solutions, and clearly state who you worked with.
• We will give full credit if you work out your own solution after discussing the problem with others.
• If you and a friend work out one solution, turn in two copies of the same solution, and clearly state that you did so:
  ◦ This is not academic misconduct. No prosecution.
  ◦ Of course, we’ll only give credit for one solution; so, you’ll each get half.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 19 / 30

Bug Bounties

• If I make a mistake when stating a homework problem, then the first person to report the error gets extra credit.
  ◦ If the error would have prevented solving the problem, then the extra credit is the same as the value of the problem.
  ◦ Smaller errors get extra credit in proportion to their severity.
• Likewise, bug bounties are awarded (as homework extra credit) for finding errors in pikas, lecture slides, the course web-pages, code I provide, etc.
• The midterm and final have bug bounties awarded in midterm and final exam points respectively.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 20 / 30

Mark Can’t Hear (very well)

• When working with the whole class, Mark will often move to the person who’s asking or answering a question so he can hear (and see) them better.
  ◦ Besides, the exercise is good for him.
• When you have a question, post it on the Canvas chat.
  ◦ We’ll have a TA attending the lecture if I need help with follow-up questions.
  ◦ I’ll figure out a way to ask you questions during lecture as well.
  ◦ We’re sure to update and evolve these plans as we go.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 21 / 30

Learning Objectives (1/2)

• Parallel Algorithms
  ◦ Familiar with parallel patterns such as reduce, scan, and tiling and can apply them to common parallel programming problems.
  ◦ Can describe parallel algorithms for matrix operations, sorting, dynamic programming, and process coordination.
• Parallel Architectures
  ◦ Can describe shared-memory, message-passing, and SIMD architectures.
  ◦ Can describe a simple cache-coherence protocol.
  ◦ Can identify how communication latency and bandwidth are limited by physical constraints in these architectures.
  ◦ Can describe the difference between bandwidth and inverse latency, and how these impact parallel architectures.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 22 / 30

Learning Objectives (2/2)

• Parallel Performance
  ◦ Understands the concept of “speed-up”: can calculate it from simple execution models or measured execution times.
  ◦ Can identify key bottlenecks for parallel program performance including communication latency and bandwidth, synchronization overhead, and intrinsically sequential code.
• Parallel Programming Frameworks
  ◦ Can implement simple parallel programs in Erlang and CUDA.
  ◦ Can describe the differences between these paradigms.
  ◦ Can identify when one of these paradigms is particularly well-suited (or badly suited) for a particular application.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 23 / 30

Count 3s: a simple example

Given an array (or list) with N items, return the number of those elements that have the value 3.

• In Erlang, we’ll use lists.
  ◦ [] is the empty list.
  ◦ [3] is a list with one element, where that element has the value 3.
  ◦ [1, 2, 3, 4, 5] is a list of five elements.
  ◦ [1 | List2] is the list whose first element has the value 1 and whose tail is the list List2.
• We can write functions using pattern matching:

    count3s([]) -> 0;  % The empty list has 0 threes

    % A list whose head is 3 has one more 3 than its tail
    count3s([3 | Tail]) -> 1 + count3s(Tail);

    % A list whose head is not 3 has the same number of 3s as its tail
    % (the leading underscore marks the head as deliberately unused)
    count3s([_Other | Tail]) -> count3s(Tail).
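To try the function from the Erlang shell, the clauses need to live in a module. The following is a minimal sketch: the module name `count3s_seq` and the accumulator variant `count3s_acc` are names chosen here for illustration and are not taken from the course code.

```erlang
-module(count3s_seq).
-export([count3s/1, count3s_acc/1]).

%% Direct recursion, following the three pattern-matching clauses on the slide.
count3s([]) -> 0;
count3s([3 | Tail]) -> 1 + count3s(Tail);
count3s([_Other | Tail]) -> count3s(Tail).

%% Tail-recursive variant: the running total is carried in an accumulator,
%% so no addition is left pending after the recursive call.
count3s_acc(List) -> count3s_acc(List, 0).

count3s_acc([], Acc) -> Acc;
count3s_acc([3 | Tail], Acc) -> count3s_acc(Tail, Acc + 1);
count3s_acc([_Other | Tail], Acc) -> count3s_acc(Tail, Acc).
```

After compiling with `c(count3s_seq).` in the shell, `count3s_seq:count3s([3, 1, 3]).` evaluates to 2.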

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 24 / 30

Counting Threes – In Parallel

Design:

• Use P worker processes.
  ◦ Each worker generates a list with N/P elements.
• A master process sends a request to each worker asking it to count the 3s in its piece of the list and send its subtotal back to the master.
• The master adds up the subtotals from each worker and reports the grand total.
• The code is in http://www.students.cs.ubc.ca/~cs-418/2020-1/lecture/src/course_intro.erl
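The design above can be sketched in a few lines of Erlang. This is not the course's `course_intro.erl`; it is a minimal illustration under stated assumptions: the module name `count3s_par`, the function names, the message format `{count3s, Master, Ref}`, and the choice of random integers in 1..10 as data are all hypothetical, and P is assumed to divide N.

```erlang
-module(count3s_par).
-export([count3s/1, par_count3s/2]).

%% Sequential count, as on the previous slide.
count3s([]) -> 0;
count3s([3 | Tail]) -> 1 + count3s(Tail);
count3s([_Other | Tail]) -> count3s(Tail).

%% Worker: build a local chunk of random integers in 1..10 (assumed data),
%% then answer a single request from the master with its subtotal.
worker(ChunkSize) ->
    Chunk = [rand:uniform(10) || _ <- lists:seq(1, ChunkSize)],
    receive
        {count3s, Master, Ref} -> Master ! {Ref, count3s(Chunk)}
    end.

%% Master: spawn P workers, send each one a request, then sum the subtotals.
par_count3s(N, P) ->
    ChunkSize = N div P,  % assumes P divides N
    Workers = [spawn(fun() -> worker(ChunkSize) end) || _ <- lists:seq(1, P)],
    Master = self(),
    Refs = [begin
                Ref = make_ref(),
                W ! {count3s, Master, Ref},
                Ref
            end || W <- Workers],
    %% Each reference is unique, so each receive matches exactly one reply.
    lists:sum([receive {R, SubTotal} -> SubTotal end || R <- Refs]).
```

Only the requests and the integer subtotals cross process boundaries; the list data never leaves the worker that generated it. Because the data is random, `count3s_par:par_count3s(1000000, 8).` returns a different total on each run.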

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 25 / 30

The Results (part 1)

              lulu                    thetis
    P   Time (ms)  Speed-up    Time (ms)  Speed-up
    1     12.69      1.00        15.40      1.00
    2      7.97      1.59         9.82      1.57
    4      6.26      2.03         5.98      2.58
    8      3.28      3.87         3.11      4.95
   16      2.93      4.33         2.04      7.55
   32      2.91      4.36         1.66      9.28
   64      2.80      4.53         1.62      9.51
  128      2.78      4.56         2.24      6.87
  256      2.83      4.48         1.64      9.39

Approach:

• This slide: Set N, the total number of data values, to something “large” (i.e. 10^6), and sweep P to find the optimal value.
• Next slide: Use the optimal value for P and sweep N to show trends.
• Notes: lulu.students.cs.ubc.ca has four two-way multithreaded cores; thetis.students.cs.ubc.ca has thirty-two two-way multithreaded cores.
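The speed-up columns appear to be the time with one worker divided by the time with P workers. That definition is inferred from the table values, not stated on the slide, and the notation $T(P)$ is introduced here for illustration.

```latex
\[
\text{SpeedUp}(P) = \frac{T(1)}{T(P)}
\]
\[
\text{lulu: } \frac{12.69}{7.97} \approx 1.59, \quad \frac{12.69}{3.28} \approx 3.87;
\qquad
\text{thetis: } \frac{15.40}{9.82} \approx 1.57, \quad \frac{15.40}{1.62} \approx 9.51
\]
```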

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 26 / 30

The Results (part 2)

TBD

                     lulu                  thetis
            N     Time     Speed-up     Time     Speed-up
        1,000     0.12ms     0.10       0.01ms     0.08
       10,000     0.18ms     0.73       0.14ms     1.04
      100,000     0.48ms     2.78       0.30ms     2.91
    1,000,000     2.88ms     4.55       1.27ms     6.73
   10,000,000    48.67ms     2.62      14.66ms     7.04
  100,000,000   246.99ms     5.10      55.54ms    13.82

• Speed-up increases with the problem size.
• To utilize more processors (e.g. thetis) we need a bigger problem size to approach peak speed-up.
• Speed-up can be greater than the number of cores!
• This is a consequence of “multithreading”.
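The claim about exceeding the number of cores can be checked against the lulu column for N = 100,000,000. The core count comes from the notes on the previous slide (four two-way multithreaded cores, so eight hardware threads); attributing the extra speed-up to the second hardware thread per core is a reading of the slide's “multithreading” remark.

```latex
\[
\frac{5.10}{4\ \text{cores}} \approx 1.28\ \text{per core},
\qquad
\frac{5.10}{8\ \text{hardware threads}} \approx 0.64\ \text{per hardware thread}
\]
```

So the measured speed-up exceeds the core count but stays below the number of hardware threads.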

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 27 / 30

Variations

• Start with all the data on the “master” process – send each worker its chunk of the list (or array) in which to count 3s.
  ◦ This is a common “newbie” error motivated by a desire to make the code “look sequential” (i.e. familiar, comfortable, and safe).
  ◦ In practice, this makes the parallel version slower than the sequential version.
  ◦ Sending the data takes more time than counting the threes.
  ◦ Communication is very often the bottleneck for parallel computing. See slide 10.
• Combine the sub-totals using a tree structure.
  ◦ This is a great idea!
  ◦ It allows communication to be done in parallel.
  ◦ This version achieves near-peak speed-up for smaller problem sizes than the simple method described on slide 25.
  ◦ We will explore this approach in more detail starting with the Sept. 12 lecture.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 28 / 30
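The tree-structured combine can be sketched as a recursive split of the leaf processes. This is an illustration only, not the implementation the course develops later: the module name `count3s_tree`, the function `tree_count/2`, and the zero-argument fun `MakeChunk` (which each leaf calls to build its own local data) are all hypothetical.

```erlang
-module(count3s_tree).
-export([count3s/1, tree_count/2]).

count3s([]) -> 0;
count3s([3 | Tail]) -> 1 + count3s(Tail);
count3s([_Other | Tail]) -> count3s(Tail).

%% tree_count(Leaves, MakeChunk): total number of 3s over Leaves leaf chunks.
%% A leaf builds its own chunk locally and counts it.
tree_count(1, MakeChunk) ->
    count3s(MakeChunk());
%% An inner node hands half of the leaves to a new child process, handles the
%% other half itself, and then adds the child's subtotal to its own.
tree_count(Leaves, MakeChunk) when Leaves > 1 ->
    Left = Leaves div 2,
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, tree_count(Left, MakeChunk)} end),
    RightTotal = tree_count(Leaves - Left, MakeChunk),
    receive
        {Ref, LeftTotal} -> LeftTotal + RightTotal
    end.
```

Each process receives at most one subtotal per level of the tree, so the additions at one level happen in parallel instead of queuing up at a single master. With a deterministic chunk, `count3s_tree:tree_count(8, fun() -> [3, 1, 3, 2] end).` evaluates to 16.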

Preview of the next lecture

September 11: Introduction to Erlang

• Read Learn You Some Erlang: “Introduction” through “Functionally Solving Problems” – you can skip “Errors and Exceptions” (until next week).
• Erlang is functional.
• Erlang uses message passing.
• Abstraction of programming patterns using higher-order functions.
• There will be in-class programming activities:
  ◦ Bring your laptop or share with someone who brings theirs.

Peeking ahead:

• The first PIKA is on the web: http://www.students.cs.ubc.ca/~cs-418/2020-1/pika/pika1/pika1.erl. It is due on Sept. 14 at 1pm (use handin).
• Reading for Sept. 10, Learn You Some Erlang: “A short visit to common data structures” through “More on Multiprocessing”.
• The first homework will go out on Sept. 16. Due on Sept. 23 at 11:59pm (Early-Bird deadline: Sept. 21 at 11:59pm).

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 29 / 30

Review Questions

• Name one, or a few, key reasons that parallel programming is moving into mainstream applications.
• What is a pika?
• How does the impact of your pika score on your final grade depend on how you did on the other parts of the class?
• What are bug-bounties?
• What is the count 3s problem?
• How did we measure running times to compute speed-up?
  ◦ Why did one approach show a speed-up greater than the number of cores used?
  ◦ Why did the other approach show that the parallel version was slower than the sequential one?

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 30 / 30

Supplementary Material

• Course Organization
• Getting Started with Erlang
• Slide Index

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 31 / 30

Course Organization

• Topics: see slide 10
• Syllabus
• The instructor and TAs
• The textbook(s)
• Grades
  ◦ October 14 and November 9 Midterms – in class
• Plagiarism: see slide 18
• Learning Objectives: see slide 22

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 32 / 30

Syllabus

• September: Erlang & an intro. to everything
  ◦ Sep. 9–14: Course overview, intro. to Erlang programming.
  ◦ Sep. 16–23: Parallel programming in Erlang, reduce and scan.
  ◦ Sep. 25–Oct. 2: Parallel architectures
• October: Performance, Sorting, and the midterm
  ◦ Oct. 3–16: Performance analysis
  ◦ October 14: First Midterm
  ◦ Oct. 19–23: Sorting
• Oct. 26 – Nov. 25: Data Parallel computing with CUDA.
  ◦ November 9: Second Midterm
• Nov. 27 – Dec. 2: More fun with parallel computing.
• Note: We’ll make adjustments to this schedule as we go.

Greenstreet Parallel Computation CpSc 418 – Sep. 9, 2020 33 / 30

Administrative Stuff – Who

• The instructor
  ◦ Mark Greenstreet, [email protected]
    ▪ ICCS 323, (604) 822-3065
    ▪ Office hours: Tuesdays, 1pm – 2:30pm. Or, just e-mail a request and we’ll find a time.
• The TAs
  ◦ Justin Reiher, [email protected]
  ◦ Yang, [email protected]
• Office & Tutorial hours: TBA
• Course webpage: http://www.students.cs.ubc.ca/~cs-418.
• Online discussion group: on Piazza.

Textbook(s)

For Erlang: Learn You Some Erlang For Great Good, Fred Hebert.
  - Free! On-line at http://learnyousomeerlang.com.
For CUDA: Programming Massively Parallel Processors: A Hands-on Approach (2nd or 3rd ed.), D.B. Kirk and W-M.W. Hwu.
  - Free! On-line through the UBC Library

And more. . . I’ll keep a list at .

Why so many texts?

There isn’t one dominant parallel architecture or programming paradigm.
The Lin & Snyder book is a great, paradigm-independent introduction.
But I’ve found that descriptions of real programming frameworks lack the details that help you write real code.
So, I’m using several texts.

Grades

Homework (roughly every two weeks): 30%

Midterm exams (October 14 and November 9, in class): 30%

Final exam (Date, time and location to be determined): 40%

PIKA and bug bounties

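\begin{align*}
\mathit{PreFinalGrade} &= 0.30 \cdot \mathit{HW} + 0.30 \cdot \mathit{MidTerm} \\
\mathit{PikaBonus} &= 0.20 \cdot (0.60 - \min(0.60, \mathit{PreFinalGrade})) \cdot \mathit{Pika} \\
\mathit{FinalGrade} &= 0.40 \cdot \mathit{Final} \\
\mathit{BB} &= 0.30 \cdot \mathit{BB}_{\mathit{HW}} + 0.30 \cdot \mathit{BB}_{\mathit{MT}} + 0.40 \cdot \mathit{BB}_{\mathit{Final}} \\
\mathit{CourseGrade} &= \min(1.00,\ \mathit{PreFinalGrade} + \mathit{FinalGrade} + \mathit{PikaBonus} + \mathit{BB}) \cdot 100\%
\end{align*}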
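The grading formula on this slide can be sketched as a small Erlang module. This is only an illustration, not an official grade calculator: the module name `grade`, the function names, and the map keys are all made up here, and it assumes every input (HW, MidTerm, Final, Pika, and the three bug-bounty components) is a fraction between 0.0 and 1.0, which the slide does not state.

```erlang
-module(grade).
-export([course_grade/1, pika_bonus/2, bug_bounty/3]).

%% Hypothetical helper names; all inputs are assumed to be fractions in [0.0, 1.0].

%% PikaBonus = 0.20 * (0.60 - min(0.60, PreFinalGrade)) * Pika
%% The bonus shrinks to zero as PreFinalGrade approaches its maximum of 0.60.
pika_bonus(PreFinal, Pika) ->
    0.20 * (0.60 - min(0.60, PreFinal)) * Pika.

%% BB = 0.30 * BB_HW + 0.30 * BB_MT + 0.40 * BB_Final
bug_bounty(BBHW, BBMT, BBFinal) ->
    0.30 * BBHW + 0.30 * BBMT + 0.40 * BBFinal.

%% CourseGrade as a percentage, capped at 100.
course_grade(#{hw := HW, midterm := MidTerm, final := Final, pika := Pika,
               bb_hw := BBHW, bb_mt := BBMT, bb_final := BBFinal}) ->
    PreFinal   = 0.30 * HW + 0.30 * MidTerm,
    FinalGrade = 0.40 * Final,
    Bonus      = pika_bonus(PreFinal, Pika),
    BB         = bug_bounty(BBHW, BBMT, BBFinal),
    100.0 * min(1.00, PreFinal + FinalGrade + Bonus + BB).
```

For example, with HW = 0.8, MidTerm = 0.7, Final = 0.75, full PIKA credit, and no bug bounties, PreFinalGrade is 0.45, the PIKA bonus is 0.20 * 0.15 = 0.03, and the course grade comes out at roughly 78%.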

Scaling

Don’t ask.
A loose interpretation of CS department guidelines.
If the course average is less than 73%, I will probably revise the grading formula to scale up.
  - I’ll examine the distributions of each component of the grading formula to determine the appropriate adjustment.
If the course average is greater than 82%, I will probably revise the grading formula to scale down.
“Probably” leaves room for instructor discretion.
  - If the whole class clearly outperforms every offering of this class in the past 5 years, then I’ll turn in grades that reflect this fact.

Homework

Collaboration policy
  - You are welcome and encouraged to discuss the homework problems with other students in the class, with the TAs and me, and to find relevant material in the textbooks, other books, on the web, etc.
  - You are expected to work out your own solutions and write your own code. Discussions as described above are to help you understand the material. Your solutions must be your own.
  - You must properly cite your collaborators and any outside sources that you used. You don’t need to cite material from class, the textbooks, or meetings with the TAs or instructor. See slide 18 for more on the plagiarism policy.
Late policy
  - Each assignment has an “early bird” date before the main date. Turn in your assignment by the early-bird date to get a 5% bonus.
  - No late homework accepted.

Exams

Two midterms, in class, on October 14 and November 9.
Final exam will be scheduled by the registrar.
Both exams are open book, open notes, open homework and solutions – open anything printed on paper.
  - You can bring a calculator.
  - No communication devices: laptops, tablets, cell-phones, etc.

Erlang Resources

Learn You Some Erlang
http://learnyousomeerlang.com
  An on-line book that gives a very good introduction to Erlang. It has great answers to the “Why is Erlang this way?” kinds of questions, and it gives realistic assessments of both the strengths and limitations of Erlang.

Erlang Examples:
http://www.students.cs.ubc.ca/~cs-418/2012-1/lecture/09-08.pdf
  My lecture notes that walk through the main features of Erlang with examples for each. Read them with an Erlang interpreter running in another window so you can try the examples and make up your own as you go. This will cover everything you’ll need to make it through all (or most) of what we’ll do in class, but it doesn’t explain how to think in Erlang as well as “Learn You Some Erlang” or Armstrong’s Erlang book (next slide).

More Erlang Resources

The erlang.org tutorial
http://www.erlang.org/doc/getting_started/users_guide.html
  Somewhere between my “Erlang Examples” and “Learn You Some Erlang.”

Erlang Language Manual
http://www.erlang.org/doc/reference_manual/users_guide.html
  My go-to place when looking up details of Erlang operators, etc.

On-line API documentation: http://www.erlang.org/erldoc.

The book: Programming Erlang: Software for a Concurrent World, Joe Armstrong, 2007,
http://pragprog.com/book/jaerlang/programming-erlang
  Very well written, with lots of great examples. More than you’ll need for this class, but great if you find yourself using Erlang for a big project.

More resources listed at http://www.erlang.org/doc.html.

Getting Erlang

You can run Erlang by giving the command erl on any departmental machine. For example:
  - Linux: bowen, lulu, thetis, lin01, ..., lin25, ...
    All machines above are in the students.cs.ubc.ca domain, e.g. remote.students.cs.ubc.ca, etc.
You can install Erlang on your computer.
  - Erlang Solutions provides packages for Windows, OSX, and the most common Linux distros:
    https://www.erlang-solutions.com/resources/download.html
  - Note: some Linux distros come with Erlang pre-installed, but it might be an old version. You should probably install from the link above.

Starting Erlang

Start the Erlang interpreter.

    thetis % erl
    Erlang/OTP 20 [erts-9.0] [source] ...

    Eshell V9.0 (abort with ^G)
    1> 2+3.
    5
    2>

The Erlang interpreter evaluates expressions that you type.
Expressions end with a “.” (period).
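A few more expressions to try at the Eshell prompt, as a minimal sketch of the "expressions end with a period" rule. The variable name `X` and the particular expressions are just illustrations; type them one at a time and the shell prints the value noted in each comment.

```erlang
%% Type each expression at the Eshell prompt; the period ends it.
2 + 3.                    % the shell prints 5
X = 2 + 3.                % binds X to 5 and prints 5
X * 2.                    % prints 10
lists:sum([1, 2, 3]).     % prints 6
1 +
  2.                      % prints 3: an expression may span lines,
                          % and nothing is evaluated until the period
```

If you press Enter without the period, the shell simply waits for more input, which is a common source of confusion when starting out.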

Index – Main Lecture

Why Parallel Computation Matters
Topics covered
  - Topics covered
  - Parallel Architectures
  - Key Concepts of Parallel Computing
  - Parallel Architectures
  - Parallel Performance
  - Parallel Algorithms
  - Parallel Programming Paradigms
Course Overview
  - PIKAs – Pre/In-Class Activities
  - Early Bird Bonuses
  - Plagiarism
  - Bug Bounties
  - Mark Can’t Hear (very well)
  - Learning Objectives
Count 3s: a simple example
Preview & Review

Index – Supplementary Material

Course Organization
  - Syllabus
  - Administrative Stuff – Who
  - Textbook(s)
  - Grades
  - Homework
  - Exams
Erlang Resources
Index
