Lecture 2: Parallel Computing Fundamentals

Presented by Simon Winberg

Digital Systems EEE4084F (planned for double period)

Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)



Lecture Overview

• Review of Quiz 0
• UML blurb
• Parallel computing fundamentals
• Automatic parallelism
• Performance benchmarking
• Trends


Quiz 0 Review…

37 students wrote the quiz


Quiz0 review: Q1

Q1: How good is your VHDL?

Most of the class (about 45%) answered [3] Reasonably good.

Slightly fewer (about 35%) answered [2] A little.

Others said [4] Excellent (about 1 or 2 students), and some left it blank.


Quiz0 review: Q2

Q2: The 'Internet of Things' has become a catchy term. Explain briefly what this refers to.

My thoughts…

0/3: but thanks for the honesty!... so 100 brownie points for you

1/3: Sounds sensible, not quite right though

2/3 maybe more: This is more correct, but IoT isn't limited to households.


Quiz0 review: Q2

Q2: The 'Internet of Things' has become a catchy term. Explain briefly what this refers to.

The 'internet of things' (IoT) refers to a larger scale internet in which everyday objects, like alarm clocks, light bulbs and such like, are connected to a network that allows data to be sent to or received from them.

Sample solution

Yebo! It is 100% right.

Mark? … 3/3


Quiz0 review: Q3

Q3: What is Xilinx ISE?

[1] No idea - I would probably delete it from my PC if I found it there
[2] It is a software tool for analyzing Inter-Species Expressions (like the face your dog makes when he sees your neighbor's cat and then barks like mad).
[3] It is an application for converting C code into BASIC
[4] It is an application used to develop HDL code and programme FPGAs

Class response: about 80% of the class chose the right answer


Quiz0 review: Q4

Q4: Briefly motivate why computer engineers, planning to work on large and complex FPGA projects, should understand both Verilog and VHDL.

My thoughts…

Nice story 2/3 ?

Sounds good, didn’t say much 2/3

Ag shame! Missed the mark. 0/3


Quiz0 review: Q4

Q4: Briefly motivate why computer engineers, planning to work on large and complex FPGA projects, should understand both Verilog and VHDL.

Answer: The main reason is reuse of existing gateware. For example, there might be an excellent digital filter out there on the web (e.g. opencores) that does almost what you want, but it's in Verilog instead of your favorite VHDL. So you would have to either start from scratch or learn Verilog.

Now you're talking: big savings & benefits

3/3


Quiz0 review: Q5

Q5: What is spatial computing?

[1] A new programming language
[2] A programming paradigm whereby computation is described as happening in different spaces instead of different times.
[3] This term (if rather informal) refers to an algorithm implementation that has certain awkward imperfections; kind of like when someone accidentally snorts uproariously at a good joke in polite company.
[4] It refers to fitting computing infrastructure into a limited space.

How did the class respond?...

Nice try, wise-guy!

Most of the class got it right though!

A technically wrong response:


My thoughts…

Quiz0 review: Q6

My thoughts on the topic: high performance computers are becoming more task-specific – the trend is moving away from supercomputers for general applications, towards platforms that are more custom-designed (or tailored) to a particular domain of application (or even to a specific application – e.g., reconfigurable computing platforms).

The story seems to be on the right track, but I don't really find it all that clear.


Quiz0 review: Q6 – elaboration

Good:

An HPC system can be in the form of an embedded system: it's special-purpose, must be fast and real-time, and performs tasks concurrently.

An HPC with a well-defined task can be optimised for that task, hence akin to an embedded system.

Bad:

An embedded system can be very complex, such as an HPC placed into a bigger system, which still makes it embedded.

HPCs are trending towards embedded systems because they are becoming more task-specific. To enhance performance and increase functionality (?)


Quiz0 review: Q7

Q: In computer engineering terminology, what does 'co-design' mean?

A: Software/hardware co-design is generally considered to be the simultaneous design of both hardware and software to implement the required functions. It often refers to a hardware team and a software team needing to work closely together while developing the software and hardware parts of an embedded system, together with the activities involved in achieving this, such as defining clear hardware/software interfaces, functional decomposition, etc.


Some Answers from Previous Quiz0's

See the extra slides at the end of this presentation.


UML Review


UML Review*

General errors that students often make:
• Not using the UML syntax (at all): block-and-line models, things like flowcharts
• Using the UML syntax wrongly: invalid modelling constructs (i.e., the wrong types of connectors/associations and blocks)
• Vague models: insufficient information

Some examples of common mistakes…

* No longer a UML assessment in Quiz0 due to time constraints


UML demonstration

JOE'S STORY: Joe Baggits is a developer in the Construction Division of Freezit Commercial Products (FCP) Pty. Ltd. The FCP corporation produces a variety of cooling systems, grouped into two product ranges, namely: 1) the portafridge range and 2) the coolzone range. The portafridge range includes the Icecart™ and the Chillblock™. The coolzone range includes the ShopFrige™ and Coolroom™. Andy Wrapp is also a developer in the Construction Division and is Joe's manager. Joe works on portafridge products, whereas Andy works on coolzone products.

Task: Produce a UML model describing the products and people related to Joe Baggits, according to Joe's story.

Look at this if your UML understanding is lacking.


Sample Solution

Notice use of roles, aggregation and inheritance.

In this situation, it is handy to use objects instead of classes to clearly capture things that exist as physical entities (e.g., the people).


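Containment vs. Aggregation

• Containment: the container has a solid diamond on its end of the association, linking it to each item in the container.
• Aggregation: the group has an outline diamond on its end of the association, linking it to each item in the group.
• The arity (e.g. n) goes next to the item, not next to the container.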
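The solid and outline diamond distinction can also be seen in code, where it shows up as ownership. The sketch below is a hypothetical Python illustration; the class names `Item`, `Container` and `Group` are invented for this example. The container creates its items itself. The group only refers to items that already exist, and those items may also belong to other groups. Python does not enforce ownership, so this is a convention rather than a guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str


@dataclass
class Container:
    """Containment (solid diamond): the container creates and owns its items,
    so an item is not meant to be shared outside its container."""
    items: list = field(default_factory=list)

    def add_new(self, name: str) -> Item:
        item = Item(name)            # created by, and private to, the container
        self.items.append(item)
        return item


@dataclass
class Group:
    """Aggregation (outline diamond): the group only refers to items that
    exist independently and may belong to other groups as well."""
    members: list = field(default_factory=list)

    def add_existing(self, item: Item) -> None:
        self.members.append(item)    # a reference only; the group does not own it


# Usage: a container that owns one item, and one person shared by two groups
box = Container()
box.add_new("widget")

alice = Item("Alice")
team_a, team_b = Group(), Group()
team_a.add_existing(alice)
team_b.add_existing(alice)
```

The number of entries in `items` or `members` corresponds to the arity written next to the item end of the association.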


Super fast class exercise (for Thurs double)

Briefly describe the following as a UML diagram:

A new computer system called PMB (Parallel Multicore Beast) contains eight CPUs and two blocks of 8GB memory. It can run multiple programs from its built-in Linux O/S. Each program can spawn multiple threads. The programs each contain one or more data blocks.

Take around 5 minutes to draw a rough UML class diagram.


Sample solution

[Figure: UML class diagram for the PMB exercise, with classes PMB, CPU, 8GB Memory, Linux, Program, Thread and Data block. The multiplicities shown are 8 (CPU), 2 (Memory), 1..* (Thread), 1..* (Data block) and *.]

Assumptions:
• Threads are contained in a program, and there must be at least one thread in a program.
• Stating that Linux is built-in indicates that this O/S is an instrumental, non-optional aspect, so this is probably more a containment relation than an aggregation.


Parallel Systems

EEE4084F: Digital Systems

[Figure: section graphic with inputs A?, B? and C?, operators +, * and -, and outputs X! and Y!]


Anyone else besides me feel that way?

Question: Do you sometimes feel that, despite having a whizbang multicore PC, it still just isn't keeping up well with the latest software demands?...

[Figure: processor cores vs. major software]


It may be so because your software isn't designed to leverage the full potential of the available hardware.

[Figure: CPUs at idle]
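The sketch below roughly illustrates software designed to use the available cores. It splits a summation into one chunk per core and then combines the partial sums. It is a minimal Python sketch built on two hypothetical helpers, `split_into_chunks` and `parallel_sum`. CPython's global interpreter lock stops threads from running Python bytecode truly in parallel. A real CPU-bound speedup would therefore need `ProcessPoolExecutor`, which has the same `map` interface, or native threads such as Pthreads in C. The point here is the decomposition itself.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_chunks):
    """Divide data into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)  # first 'extra' chunks get one more
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, workers=None):
    """Give each worker one chunk, then combine the partial sums."""
    workers = workers or os.cpu_count() or 1          # default: one worker per logical CPU
    chunks = split_into_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))    # one partial sum per chunk
    return sum(partial_sums)


if __name__ == "__main__":
    numbers = list(range(1_000_000))
    print(parallel_sum(numbers) == sum(numbers))      # True
```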


Computation Methods

Hardware (e.g. PCBs, ASICs)
Advantages:
• High speed & performance
• Efficient (possibly lower power than an idle processor)
• Parallelizable
Drawbacks:
• Expensive
• Static (cannot change)

Reconfigurable Computer (e.g. IBM Blade, FPGA-based computing platform)
Advantages:
• Faster than software alone
• More flexible than software
• More flexible than hardware
• Parallelizable
Drawbacks:
• Expensive
• Complex (both s/w & h/w)

Software Processor (e.g. PC, embedded software on microcontroller)
Advantages:
• Flexible
• Adaptable
• Can be much cheaper
Drawbacks:
• The hardware is static
• Limit of clock speed
• Sequential processing


Short Video: Latest Intel Chipsets


Intermission


Quick view: Intel's Latest CPUs

Intel® Core™ i7
• i7 Extreme Edition vs. (regular) i7 processor

Intel® Xeon™ Processors
• 4/8 cores; 8/16 threads; 40W to 130W (more cores, exponential growth in power)
• 1.86-3.3 GHz CPU clock, 1066-1600 MHz bus

Intel® Itanium®
• Scalable: 1/2/4 cores; hyperthreading*
• Allows designs of up to 512 cores, 32/64-bit
• Power use starts at around 75W
• 1-2.53 GHz (Itanium 9500 'Poulson'); QPI 6.4 GT/s bus

* Hyperthreading: two virtual/logical processors per core (more: http://www.techopedia.com/definition/2866/hyperthreading-ht)

Gigatransfers per second (GT/s) or megatransfers per second (MT/s): (somewhat informal) the number of data-transferring operations that occur each second in a given data-transfer channel. Also known as the sample rate, i.e. the number of data samples captured per second, each sample normally occurring at a clock edge. [1]
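To turn a GT/s rating into a data rate, multiply the transfer rate by the number of bytes moved per transfer. The worked example below uses the commonly quoted figure of 16 data bits (2 bytes) per transfer in each direction for a full-width QPI link. That width is an assumption here; the slide does not state it.

```latex
\text{peak bandwidth} = \text{transfer rate} \times \text{bytes per transfer}
  = 6.4 \times 10^{9}\,\tfrac{\text{transfers}}{\text{s}} \times 2\,\tfrac{\text{B}}{\text{transfer}}
  = 12.8\,\text{GB/s per direction}
```

A QPI link carries data in both directions at once, so under the same assumption this gives about 25.6 GB/s in total.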


Mainstream parallel computing

Most server-class machines today are:
• PC-class SMPs (Symmetric Multi-Processors*): 2, 4 or 8 processors; cheap; run Windows & Linux
• Deluxe SMPs: 8 to 64 processors; expensive (a 16-way SMP costs ≈ 4 × 4-way SMPs)

Applications: databases, web servers, internet commerce / OLTP (online transaction processing)

Newer applications: technical computing, threat analysis, credit card fraud detection...

An SMP has all its processors and memory on a common front side bus (FSB: the bus that connects the CPU and motherboard subsystems).

* Also termed “Shared Memory Processor”
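Every processor in an SMP sees the same memory, so threads can communicate simply by reading and writing shared variables. Concurrent updates must then be synchronised, because an unprotected read-modify-write can lose updates. The minimal sketch below uses Python's `threading.Lock` as a stand-in for the `pthread_mutex_lock`/`pthread_mutex_unlock` pattern used with Pthreads. The name `count_with_threads` is hypothetical, chosen for this example.

```python
import threading


def count_with_threads(n_threads, increments_per_thread):
    """Let n_threads threads each add 1 to one shared counter many times."""
    counter = [0]                   # shared state, visible to every thread
    lock = threading.Lock()         # plays the role of a pthread mutex

    def worker():
        for _ in range(increments_per_thread):
            with lock:              # the read-modify-write happens under the lock
                counter[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()                   # cf. pthread_create
    for t in threads:
        t.join()                    # cf. pthread_join
    return counter[0]


print(count_with_threads(4, 10_000))  # 40000
```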


Large scale parallel computing systems
• Hundreds of processors, typically built as clusters of SMPs
• Often custom built with government funding (costly! 10 to 100 million USD)
• National / international resource
• Total sales are a tiny fraction of PC server sales
• Few independent software developers
• Programmed by a small set of majorly smart people


Large scale parallel computing systems

Some applications:
• Code breaking (CIA, FBI)
• Weather and climate modeling / prediction
• Pharmaceutical: drug simulation, DNA modeling and drug design
• Scientific research (e.g., astronomy, SETI)
• Defense and weapons development

Large-scale parallel systems are often used for modelling.


Software development for parallel computers

Important software sections (frequently run sections) are usually hand-crafted (often from initial sequential versions) to run on the parallel computing platform.

Why this happens:
• Parallel programming is difficult
• Parallel programs run poorly on sequential machines (they need to be simulated)
• Automatic parallelization is difficult (& messy)

This leads to high utilization expenses.


Do parallel languages and compilers exist?

Yes!

… some other examples…


Do parallel languages and compilers exist?

Yes! MatLab, Simulink, SystemC, UML*, NESL** and others.

“Automatic parallelization”:
Def: converting sequential code into multi-threaded or vectorised code (or both) to utilize multiple processors simultaneously (e.g., on an SMP machine).

… short powwow on the topic…

* Model-Driven Development using CASE tools, e.g. Rhapsody for RT-UML
** NESL: try the interactive tutorial on http://www.cs.cmu.edu/~scandal/nesl/tutorial2.html
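The automatic parallelization transformation is easiest when every loop iteration is independent. The Python sketch below shows a sequential loop alongside the same loop after a parallelising transformation applied by hand; both function names are hypothetical. Iteration i only reads `x[i]` and `y[i]` and only writes `out[i]`. The iterations can therefore be handed to a pool of workers in any order without changing the result. An automatic paralleliser would first have to prove that independence before making the same change.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_add_sequential(a, x, y):
    """Sequential loop: out[i] = a*x[i] + y[i]."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def scale_add_parallel(a, x, y, workers=4):
    """The same loop body, with its iterations spread over a worker pool.

    Safe only because each iteration touches a distinct out[i] and reads
    nothing that another iteration writes."""
    out = [0.0] * len(x)

    def body(i):
        out[i] = a * x[i] + y[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, range(len(x))))   # wait for every iteration to finish
    return out


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
print(scale_add_parallel(2.0, x, y) == scale_add_sequential(2.0, x, y))  # True
```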


Powwow* moment

Consult the four winds, and your neighbouring classmates, as to: Why is it probably not easy to automate conversion from sequential code (e.g. BASIC or std C program) to parallel code?

HINT: Perhaps start by clarifying the difference between sequential and parallel code.

PS: You're also welcome to get up, stretch, move about, infiltrate a more intelligent-looking tribe, and so on. Note the approx. 5 min. time limit!

Time limit

TIME UP! The next slide provides some reasons...

* It's a term from North America's Native people used to refer to a cultural gathering.


Return from the Powwow

What the winds decree…


What the wind has to say…

Some thoughts on why it is probably not easy to automate conversion from sequential code (e.g. BASIC or C) to parallel code:
• Data hazards
• Timing issues
• Difficulty in figuring out data dependencies
• Figuring out timing dependencies
• Deciding how to break up data to distribute
• Deciding when to implement semaphores and locks
• When code needs to block and when not
• How to split up a loop into parallel parts
• Having to convert blocks of statements or functions into inter-process calls
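A tiny example shows the data-dependence difficulty. In the prefix-sum loop below, each iteration needs the result of the one before it (a loop-carried dependence). Naively cutting the loop into chunks for separate workers therefore gives wrong answers. Here the chunks run one after another to keep the sketch deterministic, and the function names are hypothetical. A correct parallel version needs extra work, such as a second pass that adds each earlier chunk's total to the later chunks.

```python
def prefix_sum_sequential(b):
    """a[i] = a[i-1] + b[i]: each iteration needs the previous one's result."""
    a = [0] * len(b)
    running = 0
    for i, value in enumerate(b):
        running += value
        a[i] = running
    return a


def prefix_sum_naive_split(b, n_chunks=2):
    """Naively give each chunk to a separate worker (simulated sequentially).

    Each worker starts its running total at 0 because it cannot see the
    not-yet-computed result of the previous chunk. The split ignores the
    loop-carried data dependence."""
    size = max(1, (len(b) + n_chunks - 1) // n_chunks)
    result = []
    for start in range(0, len(b), size):
        result.extend(prefix_sum_sequential(b[start:start + size]))
    return result


b = [1, 2, 3, 4]
print(prefix_sum_sequential(b))    # [1, 3, 6, 10]
print(prefix_sum_naive_split(b))   # [1, 3, 3, 7]  (wrong from the second chunk on)
```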


Why automatic parallelism isn’t “quite there yet”

An accumulation of 30+ years of research… Only limited success in parallelism detection and program transformations:
• Instruction-level parallelism at the basic-block level (e.g., pipelining of instructions)
• Parallelism in nested for-loops containing arrays with simple index expressions
• Analysis methods: data dependence analysis, pointer analysis, abstraction back to a more optimized implementation, flow analysis
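The GCD test is one classic form of data dependence analysis for arrays with simple (linear) index expressions. Suppose a loop writes `A[c1*i + k1]` and reads `A[c2*j + k2]`. The two accesses can only touch the same element if `c1*i - c2*j = k2 - k1` has an integer solution, which requires gcd(c1, c2) to divide k2 - k1. The sketch below is illustrative only and is not taken from any particular compiler. It ignores loop bounds, so a `True` result means only that a dependence *may* exist.

```python
from math import gcd


def gcd_test_may_depend(c1, k1, c2, k2):
    """GCD test for a write to A[c1*i + k1] and a read of A[c2*j + k2].

    Returns False only when no integers i, j can make the subscripts equal,
    i.e. the accesses are provably independent. True is conservative."""
    g = gcd(c1, c2)
    if g == 0:                       # both subscripts are constants
        return k1 == k2
    return (k2 - k1) % g == 0


# A[2*i] = ...  and  ... = A[2*i + 1]: evens vs. odds, never the same element
print(gcd_test_may_depend(2, 0, 2, 1))   # False
# A[i] = A[i - 1] + ...: each iteration reads the previous one's element
print(gcd_test_may_depend(1, 0, 1, -1))  # True
```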


Why automatic parallelism isn’t “quite there yet”

Main reasons (user perspective):
• It tends to take too long.
• It tends to be too fragile (i.e., it breaks down after small changes to the code).
• It tends to miss many things that a human would notice and provide an effective solution for, thanks to human intellect, underlying knowledge of the application, and training in writing and problem-solving parallel code.

So: instead of training compilers to recognize parallelism, people are being trained to write parallel code (i.e., a no “middle man” approach).


Seminar Groups

Try to start on formalizing groups. These will be posted as a sign-up on Vula.

TERM 1: Microprocessor-based parallel systems
• Group 1 (24 Feb 2015; S. Winberg, done by lecturer): The landscape of parallel computing research: a view from Berkeley
• Group 2: CH1 A Retrospective on High Performance Embedded Computing and CH2 Representative Example of a High Performance Embedded Computing System
• Group 3: CH3 System Architecture of a Multiprocessor System
• Group 4: CH5 Computational Characteristics of High Performance Embedded Algorithms and Applications (optional additional reading: CH15 Performance Metrics and Software Architecture)
• Group 5: CH13 Computing Devices

TERM 2: FPGA / Reconfigurable parallel systems
• Group 6: CH9 Application-Specific Integrated Circuits and CH10 Field Programmable Gate Arrays
• Group 7: CH7 Analog-to-Digital Conversion
• Group 8: CH14 Interconnection Fabrics
• Group 9: CH24 Application and HPEC System Trends (NOTE: this last seminar is on a Thursday as the Tuesday is a holiday)
• Group 10: CH20 Radar Applications (probably discard)


Next lecture
• Moore’s law and related trends, from a high performance computing angle
• Discussion of Prac1 & Pthreads
• Conceptual Assignment planning
• Finalizing Seminar Groups
• Benchmarking, etc.

No quiz next week


Some Answers from Previous Quiz0’s

(supplementary slides)


Quiz0 review: Q1

How good is your VHDL? (tick answer) [1] None [2] A little [3] Reasonably good [4] Excellent

[Figure: pie charts of responses for 2012 and "this year" (2013), with categories 1: not used, 2: A little, 3: Reasonably, 4: Excellent. One chart shows 17%, 79% and 3%; the other shows 89% and 11%, with None: 0% and Excellent: 0% noted.]


Quiz0 review: Q3

Have you used Altera Quartus II? Or Xilinx ISE? Check appropriate boxes.

[Figures: "Question 3: Quartus II Experience" and "Question 3: Xilinx ISE Experience" response charts (None / A Little / Reasonably Good / Excellent), comparing responses from 2012, 2013 and 2014; annotation: "(i.e. almost everyone has used it)".]


Quiz0 review: Q4

Close! But no cigar

My thoughts?...

More ‘intelligence’?! My cellphone is probably as intelligent as my PC… as in not at all intelligent based on the Turing test. Heading into troubled waters here.


Suggested sample answer…

Quiz0 review: Q4

I’d be happy with something clear & general. E.g.: It is a computer that is able to perform multiple computations in parallel.

Pretty good, if a bit wordy

Not precisely… That’s a special case


A student’s answer…

Quiz0 review: Q6

Q6: In computer engineering terminology, what is meant by sampling? Include a short example and possibly an image to aid your explanation.

my description …

Pretty good!!


Quiz0 review: Q6

Q6: In computer engineering terminology, what is meant by sampling? Include a short example and possibly an image to aid your explanation.

Sampling, used by computer engineers, typically refers to the process of digitizing an analogue signal, or looking at discrete instances of a continuous signal. A sample is basically a value or set of related values representing an instance in time of an analogue/real event. Usually a fixed sample period is used.

[Figure: an analogue / real signal plotted against time, with samples taken at a fixed sample period; one sample is highlighted.]
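The fixed-sample-period idea can be sketched in a few lines. In the hypothetical Python example below, a 1 Hz sine wave stands in for the analogue signal. The code records (time, value) pairs every `sample_period` seconds, so the sample rate is 1 / sample_period. It does not model quantisation, which a real analogue-to-digital converter would also perform.

```python
import math


def sample_signal(signal, sample_period, n_samples, t0=0.0):
    """Return n_samples (time, value) pairs taken every sample_period seconds."""
    return [(t0 + n * sample_period, signal(t0 + n * sample_period))
            for n in range(n_samples)]


def analogue(t):
    """Stand-in for a continuous 'real' signal: a 1 Hz sine wave."""
    return math.sin(2 * math.pi * 1.0 * t)


samples = sample_signal(analogue, sample_period=0.25, n_samples=4)
# samples are taken at t = 0.0, 0.25, 0.5 and 0.75 s (sample rate 1/0.25 = 4 Hz)
print(samples[:2])  # [(0.0, 0.0), (0.25, 1.0)]
```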


Image sources:
• Clipart sources: public domain CC0 (http://pixabay.com/)
• commons.wikimedia.org
• images from flickr

Disclaimers and copyright/licensing details

I have tried to follow the correct practices concerning copyright and licensing of material, particularly image sources that have been used in this presentation. I have put much effort into trying to make this material open access so that it can be of benefit to others in their teaching and learning practice. Any mistakes or omissions with regard to these issues I will correct when notified. To the best of my understanding the material in these slides can be shared according to the Creative Commons “Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)” license, and that is why I selected that license to apply to this presentation (it’s not because I particularly want my slides referenced but more to acknowledge the sources and generosity of others who have provided free material such as the images I have used).