7/25/2019 6963 Midterm Review
CS6235
Review for Midterm
Administrative
Pascal will meet the class on Wednesday
- I will join at the beginning for questions on test
Midterm
- In class March 28, can bring single page of notes
- Review notes, readings and review lecture
- Prior exams covered, will be discussed today
Design Review
- Intermediate assessment of progress on project, oral and short
- In class April 4
Final projects
- Poster session, April 23 (dry run April 18)
- Final report, May 3
Parts of Exam
I. Definitions
- A list of terms you will be asked to define
II. Short Answer (4 questions, 20 points)
- Understand basic GPU architecture: processors and memory hierarchy
- High level questions on more recent "pattern and application" lectures
III. Problem Solving: types of questions
- Analyze data dependences and data reuse in code and use this to guide CUDA parallelization and memory hierarchy mapping
- Given some CUDA code, indicate whether global memory accesses will be coalesced and whether there will be bank conflicts in shared memory
- Given some CUDA code, add synchronization to derive a correct implementation
- Given some CUDA code, provide an optimized version that will have fewer divergent branches
IV. (Brief) Essay
Syllabus
L1: Introduction and CUDA Overview
- Not much there...
L2: Hardware
L3 & L4: Memory Hierarchy: Locality and Data Placement
- Memory latency and memory bandwidth optimizations
- Reuse and locality
- What are the different memory spaces on the device, who can read/write them?
- How do you tell the compiler that something belongs in a particular memory space?
- Tiling transformation (to fit data into constrained storage): safety and profitability
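As a reminder of how memory spaces are named in source code, here is a minimal sketch using the standard CUDA qualifiers `__constant__`, `__device__` and `__shared__`. The kernel `reverse_scale`, the variables `scale` and `blocks_done`, and the host helper `run_memory_space_demo` are hypothetical names invented for this illustration, and the helper assumes a device that supports `cudaMallocManaged`.

```cuda
#include <cuda_runtime.h>

__constant__ float scale;     // constant memory: written by the host, read-only in kernels
__device__ int blocks_done;   // global memory: readable and writable by every thread

__global__ void reverse_scale(const float *in, float *out) {
    __shared__ float tile[256];          // shared memory: one copy per thread block
    int tx = threadIdx.x;                // automatic scalars normally live in registers
    int base = blockIdx.x * blockDim.x;
    tile[tx] = in[base + tx];            // in/out point to global memory
    __syncthreads();                     // tile must be complete before it is read
    out[base + tx] = scale * tile[blockDim.x - 1 - tx];
    if (tx == 0) atomicAdd(&blocks_done, 1);
}

// Hypothetical host helper: returns true if the device result matches the host check.
bool run_memory_space_demo() {
    const int n = 1024, threads = 256;   // threads must match the size of tile[]
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);
    float h_scale = 2.0f;
    int zero = 0;
    cudaMemcpyToSymbol(scale, &h_scale, sizeof(float));
    cudaMemcpyToSymbol(blocks_done, &zero, sizeof(int));
    reverse_scale<<<n / threads, threads>>>(in, out);
    cudaDeviceSynchronize();
    int done = 0;
    cudaMemcpyFromSymbol(&done, blocks_done, sizeof(int));
    bool ok = (done == n / threads);
    for (int i = 0; i < n; ++i) {
        int b = i / threads, t = i % threads;
        ok = ok && (out[i] == 2.0f * in[b * threads + threads - 1 - t]);
    }
    cudaFree(in);
    cudaFree(out);
    return ok;
}
```

Calling `run_memory_space_demo()` from a host `main` exercises all four spaces: registers, shared, constant and global memory.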
Syllabus
L5 & L6: Memory Hierarchy III: Memory Bandwidth Optimization
- Tiling (for registers)
- Bandwidth: maximize utility of each memory cycle
- Memory accesses in scheduling (half-warp)
- Understanding global memory coalescing (for compute capability < 1.2 and > 1.2)
- Understanding shared memory bank conflicts
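Coalescing and bank conflicts are often illustrated together with a tiled matrix transpose. The sketch below is one common formulation, not code from the lectures: `transpose_tiled` and `run_transpose_demo` are hypothetical names, the matrix size is assumed to be a multiple of the 32x32 tile, and the helper assumes `cudaMallocManaged` is available.

```cuda
#include <cuda_runtime.h>

constexpr int TILE = 32;

// Transposes an n x n matrix; n is assumed to be a multiple of TILE.
__global__ void transpose_tiled(const float *in, float *out, int n) {
    // The extra column shifts each row by one bank, so reading a tile
    // column below does not hit the same bank 32 times.
    __shared__ float tile[TILE][TILE + 1];
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * n + x];    // consecutive threads read consecutive addresses
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;               // swap the block coordinates
    y = blockIdx.x * TILE + threadIdx.y;
    out[y * n + x] = tile[threadIdx.x][threadIdx.y];   // consecutive threads write consecutive addresses
}

// Hypothetical host helper: returns true if out is the transpose of in.
bool run_transpose_demo() {
    const int n = 128;
    float *in, *out;
    cudaMallocManaged(&in, n * n * sizeof(float));
    cudaMallocManaged(&out, n * n * sizeof(float));
    for (int i = 0; i < n * n; ++i) in[i] = float(i);
    dim3 block(TILE, TILE), grid(n / TILE, n / TILE);
    transpose_tiled<<<grid, block>>>(in, out, n);
    cudaDeviceSynchronize();
    bool ok = true;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            ok = ok && (out[j * n + i] == in[i * n + j]);
    cudaFree(in);
    cudaFree(out);
    return ok;
}
```

Without the shared-memory tile, either the read or the write would be strided by `n` across a warp and would not coalesce.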
L7: Writing Correct Programs
- Race condition, dependence
- What is a reduction computation and why is it a good match for a GPU?
- What does __syncthreads() do? (barrier synchronization)
- Atomic operations
- Memory Fence Instructions
- Device emulation mode
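The reduction, barrier and atomic topics above can be seen in one small kernel. This is a minimal sketch rather than the lecture's code: `block_sum` and `run_sum_demo` are hypothetical names, the block size is assumed to be exactly 256 (a power of two), and `atomicAdd` on `float` assumes compute capability 2.0 or later.

```cuda
#include <cuda_runtime.h>

// Each block reduces 256 elements in shared memory, then adds its partial sum to *total.
__global__ void block_sum(const float *in, float *total, int n) {
    __shared__ float partial[256];
    int tx = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tx;
    partial[tx] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tx < stride) partial[tx] += partial[tx + stride];
        __syncthreads();   // kept outside the if so every thread reaches the barrier
    }
    // Blocks cannot synchronize with each other here, so the cross-block
    // update of the shared total uses an atomic operation.
    if (tx == 0) atomicAdd(total, partial[0]);
}

// Hypothetical host helper: sums n ones and returns the device result.
float run_sum_demo() {
    const int n = 100000, threads = 256;
    float *in, *total;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&total, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    *total = 0.0f;
    block_sum<<<(n + threads - 1) / threads, threads>>>(in, total, n);
    cudaDeviceSynchronize();
    float result = *total;
    cudaFree(in);
    cudaFree(total);
    return result;
}
```

Removing the first `__syncthreads()` or replacing `atomicAdd` with a plain `*total += partial[0]` would each introduce a race condition of the kind the lecture discusses.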
Syllabus
L8: Control Flow
- Divergent branches
L11: Sparse Linear Algebra on GPUs
- Different sparse matrix representations
- Stencil computations using sparse matrices
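As one concrete example of a sparse matrix representation, here is a sketch of sparse matrix-vector multiply in compressed sparse row (CSR) form with one thread per row. The choice of CSR is an assumption made for illustration, and `spmv_csr` and `run_spmv_demo` are hypothetical names; the helper assumes `cudaMallocManaged` is available.

```cuda
#include <cuda_runtime.h>

// y = A * x with A in CSR form: row r owns entries row_ptr[r] .. row_ptr[r+1]-1.
__global__ void spmv_csr(int rows, const int *row_ptr, const int *col_idx,
                         const float *val, const float *x, float *y) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < rows) {
        float sum = 0.0f;
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[r] = sum;
    }
}

// Hypothetical host helper: multiplies [[2,0,1],[0,3,0],[4,0,5]] by [1,2,3].
bool run_spmv_demo() {
    const int rows = 3, nnz = 5;
    const int h_row_ptr[rows + 1] = {0, 2, 3, 5};
    const int h_col_idx[nnz] = {0, 2, 1, 0, 2};
    const float h_val[nnz] = {2, 1, 3, 4, 5};
    int *row_ptr, *col_idx;
    float *val, *x, *y;
    cudaMallocManaged(&row_ptr, (rows + 1) * sizeof(int));
    cudaMallocManaged(&col_idx, nnz * sizeof(int));
    cudaMallocManaged(&val, nnz * sizeof(float));
    cudaMallocManaged(&x, rows * sizeof(float));
    cudaMallocManaged(&y, rows * sizeof(float));
    for (int i = 0; i <= rows; ++i) row_ptr[i] = h_row_ptr[i];
    for (int i = 0; i < nnz; ++i) { col_idx[i] = h_col_idx[i]; val[i] = h_val[i]; }
    for (int i = 0; i < rows; ++i) x[i] = float(i + 1);
    spmv_csr<<<1, 32>>>(rows, row_ptr, col_idx, val, x, y);
    cudaDeviceSynchronize();
    bool ok = (y[0] == 5.0f) && (y[1] == 6.0f) && (y[2] == 19.0f);
    cudaFree(row_ptr); cudaFree(col_idx); cudaFree(val); cudaFree(x); cudaFree(y);
    return ok;
}
```

Rows with different numbers of nonzeros make the inner loop run different trip counts within a warp, which is where the control-flow topic and the sparse-matrix topic meet.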
Syllabus
L12, L13 and L14: Application case studies
- Host tiling for constant cache (plus data structure reorganization)
- Replacing trig function intrinsic calls with hardware implementations
- Global synchronization for MPM/GIMP
L15: Dynamic Scheduling
- Task queues
- Static queues, dynamic queues
- Wait-free synchronization
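A very small illustration of dynamic scheduling is a static task list with a shared counter: each thread claims the next unprocessed task with `atomicAdd` instead of being assigned a fixed index. This is a simplified sketch, not the scheme from the lecture; `next_task`, `worker` and `run_task_demo` are hypothetical names, and squaring an integer stands in for real work.

```cuda
#include <cuda_runtime.h>

__device__ int next_task;   // index of the next unclaimed task

// Each thread repeatedly claims a task index until the list is exhausted.
__global__ void worker(int num_tasks, const int *in, int *out) {
    while (true) {
        int t = atomicAdd(&next_task, 1);   // returns the old value, so each index is claimed once
        if (t >= num_tasks) break;
        out[t] = in[t] * in[t];             // placeholder for the real task body
    }
}

// Hypothetical host helper: 10000 tasks served by only 256 threads.
bool run_task_demo() {
    const int num_tasks = 10000;
    int *in, *out;
    cudaMallocManaged(&in, num_tasks * sizeof(int));
    cudaMallocManaged(&out, num_tasks * sizeof(int));
    for (int i = 0; i < num_tasks; ++i) { in[i] = i; out[i] = -1; }
    int zero = 0;
    cudaMemcpyToSymbol(next_task, &zero, sizeof(int));
    worker<<<4, 64>>>(num_tasks, in, out);
    cudaDeviceSynchronize();
    bool ok = true;
    for (int i = 0; i < num_tasks; ++i) ok = ok && (out[i] == i * i);
    cudaFree(in);
    cudaFree(out);
    return ok;
}
```

No thread ever waits on a lock held by another thread; a thread that draws an expensive task simply claims fewer tasks overall.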
L16: Sorting
- Using a hybrid algorithm for different sized lists
- Avoiding synchronization
- Tradeoff between additional computation and eliminating costly synchronization
2010 Exam: Problem III.a
a. Managing memory bandwidth
Given the following CUDA code, how would you rewrite to improve bandwidth to global memory and, if applicable, shared memory?
N = 512; NUMBLOCKS = 512/64;
float a[512], b[512], c[512][512];
__global__ compute(float *a, float *b, float *c) {
  int tx = threadIdx.x;
  int bx = blockIdx.x;
  for (j = bx*64; j < (bx*64)+64; j++)
    a[tx] = a[tx] - c[tx][j] * b[j];
}
How to solve?
- Copy c to shared memory in coalesced order
- Tile in shared memory
- Copy a to register
- Copy b to shared memory, constant memory or texture memory
Exam: Problem III.a
N = 512; NUMBLOCKS = 512/64;
float a[512], b[512], c[512][512];
float tmpa;
__global__ compute(float *a, float *b, float *c) {
  __shared__ float ctmp[1024+32];   // let's use 32x32
                                    // pad for bank conflicts
  int tx = threadIdx.x;
  int bx = blockIdx.x;
  tmpa = a[tx];
  pad1 = tx/32; pad2 = j/32;
  for (jj = bx*64; jj < (bx*64)+64; jj += 32)
    for (j = jj; j < jj+32; j++)
      ctmp[j*512+tx+pad1] = c[j][tx];
  __syncthreads();
  tmpa = tmpa - ctmp[tx*512 + j + pad2] * b[j];
}
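To make the four hints concrete, here is a self-contained sketch that applies them to the same computation, a[i] = a[i] - sum over j of c[i][j] * b[j]. It is one possible arrangement and not the official exam solution: it assigns 32 rows to each block (one thread per row, so no two blocks update the same a[i]) and walks across all 512 columns in 32-wide tiles. The names `ctile`, `bs`, `row0`, `col0` and `run_matvec_demo` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

constexpr int N = 512;
constexpr int TILE = 32;

// Launch with N/TILE blocks of TILE threads; thread tx owns row row0 + tx.
__global__ void compute(float *a, const float *b, const float *c) {
    __shared__ float ctile[TILE][TILE + 1];   // +1 column of padding avoids bank conflicts
    __shared__ float bs[TILE];
    int tx = threadIdx.x;
    int row0 = blockIdx.x * TILE;
    float tmpa = a[row0 + tx];                // a[i] is kept in a register
    for (int col0 = 0; col0 < N; col0 += TILE) {
        // Coalesced: for each row r, consecutive threads read consecutive columns of c.
        for (int r = 0; r < TILE; ++r)
            ctile[r][tx] = c[(row0 + r) * N + col0 + tx];
        bs[tx] = b[col0 + tx];
        __syncthreads();
        // Thread tx reads row tx of the tile; the padding spreads rows across banks.
        for (int k = 0; k < TILE; ++k)
            tmpa -= ctile[tx][k] * bs[k];
        __syncthreads();                      // tile is reused in the next iteration
    }
    a[row0 + tx] = tmpa;
}

// Hypothetical host helper: returns the largest absolute difference from a CPU reference.
float run_matvec_demo() {
    float *a, *b, *c;
    cudaMallocManaged(&a, N * sizeof(float));
    cudaMallocManaged(&b, N * sizeof(float));
    cudaMallocManaged(&c, N * N * sizeof(float));
    std::vector<float> ref(N, 1.0f);
    for (int i = 0; i < N; ++i) { a[i] = 1.0f; b[i] = 0.001f * (i % 7); }
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) c[i * N + j] = 0.01f * ((i + j) % 5);
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) ref[i] -= c[i * N + j] * b[j];
    compute<<<N / TILE, TILE>>>(a, b, c);
    cudaDeviceSynchronize();
    float err = 0.0f;
    for (int i = 0; i < N; ++i) err = fmaxf(err, fabsf(a[i] - ref[i]));
    cudaFree(a); cudaFree(b); cudaFree(c);
    return err;
}
```

A block of only 32 threads keeps the example short; a tuned version would use more threads per block and might place b in constant or texture memory instead, as the last hint suggests.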
2010 Exam: Problem III.b
b. Divergent Branch
Given the following CUDA code, describe how you would modify this to derive an optimized version that will have fewer divergent branches.
Main() {
  float h_a[1024], h_b[1024];
  ... /* assume appropriate cudaMalloc called to create d_a and d_b, and d_a is */
  /* initialized from h_a using appropriate call to cudaMemcpy */
  dim3 dimblock(256); dim3 dimgrid(4);
  compute<<<dimgrid, dimblock, 0>>>(d_a, d_b);
  /* assume d_b is copied back from the device using call to cudaMemcpy */
}
__global__ compute (float *a, float *b) {
  float a[4][256], b[4][256];
  int tx = threadIdx.x; bx = blockIdx.x;
  if (tx % 16 == 0)
    (void) starting_kernel (a[bx][tx], b[bx][tx]);
  else /* (tx % 16 > 0) */
    (void) default_kernel (a[bx][tx], b[bx][tx]);
}
Key idea: Separate multiples of 16 from others
Problem III.b
Approach:
Renumber threads to concentrate the case where the index is not divisible by 16
if (tx < 240) t = tx + (tx/15) + 1;
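The renumbering can be checked with a complete kernel. In the sketch below, threads 0..239 are mapped onto the 240 indices that are not multiples of 16, and threads 240..255 onto the 16 multiples. The divisor is 15 because each run of 15 consecutive non-multiples skips one multiple of 16. The `else` branch and the bodies of `starting_kernel` and `default_kernel` are assumptions added so the example can run; they are not given in the slides.

```cuda
#include <cuda_runtime.h>

// Hypothetical bodies, chosen only so the two paths are distinguishable.
__device__ void starting_kernel(float a, float &b) { b = a + 100.0f; }
__device__ void default_kernel(float a, float &b)  { b = 2.0f * a; }

// Launch with 4 blocks of 256 threads over 1024 elements.
__global__ void compute(const float *a, float *b) {
    int tx = threadIdx.x, bx = blockIdx.x;
    int t;
    if (tx < 240) t = tx + tx / 15 + 1;   // 0..239 -> 1..15, 17..31, ..., 241..255
    else          t = (tx - 240) * 16;    // 240..255 -> 0, 16, ..., 240
    int i = bx * 256 + t;
    // Warps 0..6 take only the default path; only the last warp contains both paths.
    if (tx < 240) default_kernel(a[i], b[i]);
    else          starting_kernel(a[i], b[i]);
}

// Hypothetical host helper: checks that every element took the intended path exactly once.
bool run_renumber_demo() {
    const int n = 1024;
    float *a, *b;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = float(i); b[i] = -1.0f; }
    compute<<<4, 256>>>(a, b);
    cudaDeviceSynchronize();
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        float expect = ((i % 256) % 16 == 0) ? a[i] + 100.0f : 2.0f * a[i];
        ok = ok && (b[i] == expect);
    }
    cudaFree(a);
    cudaFree(b);
    return ok;
}
```

In the original numbering every one of the 8 warps in a block contains two multiples of 16 and therefore diverges; after renumbering only one warp per block does. The price is that the remapped index is no longer perfectly contiguous across a warp.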
2010 Exam: Problem III.c
c. Tiling
The following sequential image correlation computation compares a region of an image to a template. Show how you would tile the image and threshold data to fit in 128MB global memory and the template data to fit in a 16KB shared memory?
View of Computation
- Perform correlation of template with portion of image
- Move "window" horizontally and downward and repeat
[Figure: the template window positioned over the image]
2010 Exam: Problem III.c
i. How big is image and template data?
- Image = 5120^2 * 4 bytes/int = 100 Mbytes
- Th = 100 Mbytes
- Template = 64^2 * 4 bytes/int = exactly 16 KBytes
- Total data set size > 200 Mbytes
- Cannot have both image and Th in global memory: must generate 2 tiles
- Template data does not fit in shared memory due to other things placed there...
ii. Partitioning to support tiling for shared memory
- Hint to exploit reuse on template by copying to shared memory
- Could also exploit reuse on portion of image
- Dependences only on th (a reduction)
2010 Exam: Problem III.c
iii. Need to show tiling for template
- Can copy into shared memory in coalesced order
- Copy half or less at a time
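The template tiling can be sketched as follows: a 64x64 template of ints is exactly 16 KB, so the kernel stages 32 rows (8 KB) at a time in shared memory, copying in coalesced order. The exam's sequential code is not reproduced in these notes, so the correlation measure used here (a plain sum of products) is an assumption, as are the names `correlate`, `ts` and `run_correlate_demo` and the small 80x80 test image.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int T = 64;          // template is T x T ints = 16 KB
constexpr int HALF = T / 2;    // stage 32 rows = 8 KB at a time

// One thread per window position (x, y); blockIdx.y selects the window row.
__global__ void correlate(const int *image, int img_w, int out_w,
                          const int *tmpl, int *th) {
    __shared__ int ts[HALF][T];
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y;
    int sum = 0;
    for (int half = 0; half < 2; ++half) {
        // Coalesced copy: consecutive threads load consecutive template elements.
        for (int idx = threadIdx.x; idx < HALF * T; idx += blockDim.x)
            ts[idx / T][idx % T] = tmpl[half * HALF * T + idx];
        __syncthreads();
        if (x < out_w)
            for (int k = 0; k < HALF; ++k)
                for (int l = 0; l < T; ++l)
                    sum += image[(y + half * HALF + k) * img_w + x + l] * ts[k][l];
        __syncthreads();   // all threads are done with this half before it is overwritten
    }
    if (x < out_w) th[y * out_w + x] = sum;   // each thread owns one th element
}

// Hypothetical host helper: compares against a sequential version on a small image.
bool run_correlate_demo() {
    const int img = 80, out = img - T + 1;    // 17 x 17 window positions
    int *image, *tmpl, *th;
    cudaMallocManaged(&image, img * img * sizeof(int));
    cudaMallocManaged(&tmpl, T * T * sizeof(int));
    cudaMallocManaged(&th, out * out * sizeof(int));
    for (int i = 0; i < img; ++i)
        for (int j = 0; j < img; ++j) image[i * img + j] = (i * 7 + j) % 5;
    for (int k = 0; k < T; ++k)
        for (int l = 0; l < T; ++l) tmpl[k * T + l] = (k + l) % 3;
    correlate<<<dim3(1, out), 32>>>(image, img, out, tmpl, th);
    cudaDeviceSynchronize();
    bool ok = true;
    for (int y = 0; y < out; ++y)
        for (int x = 0; x < out; ++x) {
            int ref = 0;
            for (int k = 0; k < T; ++k)
                for (int l = 0; l < T; ++l)
                    ref += image[(y + k) * img + x + l] * tmpl[k * T + l];
            ok = ok && (th[y * out + x] == ref);
        }
    cudaFree(image); cudaFree(tmpl); cudaFree(th);
    return ok;
}
```

Because each thread accumulates its own `sum` in a register and writes a single `th` element, the reduction on th needs no synchronization beyond the two barriers that protect the shared template tile.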
2010 Exam: Problem III.d
d. Parallel partitioning and synchronization (LU Decomposition)
Without writing out the CUDA code, consider a CUDA mapping of the LU Decomposition sequential code below. Answer should be in three parts, providing opportunities for partial credit: (i) where are the data dependences in this computation? (ii) how would you partition the computation across threads and blocks? (iii) how would you add synchronization to avoid race conditions?
float a[1024][1024];
for (k=0; k<1023; k++) {
  for (i=k+1; i<1024; i++)
    a[i][k] = a[i][k] / a[k][k];
  for (i=k+1; i<1024; i++)
    for (j=k+1; j<1024; j++)
      a[i][j] = a[i][j] - a[i][k]*a[k][j];
}
Key Features of Solution:
i. Dependences:
- True <a[i][j], a[k][k]>, <a[i][j], a[i][k]> carried by k
- True <a[i][j], a[i][j]>, <a[i][j], a[k][j]> carried by k
ii. Partition:
- Merge i loops, interchange with j, partition j across blocks/threads (sufficient ...)
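For comparison, here is a deliberately simple mapping of the same loop nest that can be run and checked. It is not the merged-loop partitioning described above: it keeps the k loop on the host and launches one kernel per inner loop, so each kernel boundary acts as a global barrier between dependent steps. The names `scale_column`, `update_trailing`, `lu_device` and `run_lu_demo`, the 64x64 test size and the diagonally dominant test matrix (no pivoting is performed) are all assumptions.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// a[i][k] = a[i][k] / a[k][k] for i > k. Reads a[k][k], which no thread here writes.
__global__ void scale_column(float *a, int n, int k) {
    int i = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i * n + k] /= a[k * n + k];
}

// a[i][j] -= a[i][k] * a[k][j] for i, j > k. Row k and column k are only read.
__global__ void update_trailing(float *a, int n, int k) {
    int j = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;
    int i = k + 1 + blockIdx.y * blockDim.y + threadIdx.y;
    if (i < n && j < n) a[i * n + j] -= a[i * n + k] * a[k * n + j];
}

void lu_device(float *a, int n) {
    dim3 block(16, 16);
    for (int k = 0; k < n - 1; ++k) {
        int rem = n - k - 1;
        // Kernels issued to the same stream run in order, so every step
        // sees the completed result of the previous one.
        scale_column<<<(rem + 255) / 256, 256>>>(a, n, k);
        dim3 grid((rem + 15) / 16, (rem + 15) / 16);
        update_trailing<<<grid, block>>>(a, n, k);
    }
    cudaDeviceSynchronize();
}

// Hypothetical host helper: returns the largest difference from the sequential loop nest.
float run_lu_demo() {
    const int n = 64;
    float *a;
    cudaMallocManaged(&a, n * n * sizeof(float));
    std::vector<float> ref(n * n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            a[i * n + j] = ref[i * n + j] = (i == j) ? float(n) : 1.0f / (1 + abs(i - j));
    for (int k = 0; k < n - 1; ++k) {
        for (int i = k + 1; i < n; ++i) ref[i * n + k] /= ref[k * n + k];
        for (int i = k + 1; i < n; ++i)
            for (int j = k + 1; j < n; ++j)
                ref[i * n + j] -= ref[i * n + k] * ref[k * n + j];
    }
    lu_device(a, n);
    float err = 0.0f;
    for (int i = 0; i < n * n; ++i) err = fmaxf(err, fabsf(a[i] - ref[i]));
    cudaFree(a);
    return err;
}
```

The launch overhead of two kernels per k iteration is the cost of this simplicity; within each kernel no thread writes a location that another thread reads, so no `__syncthreads()` is needed.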
2011 Exam: Examples of short answers
a. Describe how you can exploit spatial reuse in optimizing for memory bandwidth on a GPU. (Partial credit: what are the memory bandwidth optimizations we studied?)
b. Given examples we have seen of control flow in GPU kernels, describe ONE way to reduce divergent branches for ONE of the following: consider tree-structured reductions, even-odd computations, or boundary conditions.
c. Regarding floating point support in GPUs, how does the architecture permit trading off precision and performance?
d. What happens if two threads assigned to different blocks write to the same memory location in global memory?
2011 Exam: Examples of Essay
Pick one of the following three topics and write a very brief essay about it, no more than 3 sentences.
a. We talked about sparse matrix computations with respect to linear algebra, graph coloring and program analysis. Describe a sparse matrix representation that is appropriate for a GPU implementation of one of these applications and explain why it is well suited.
b. We talked about how to map tree-structured computations to GPUs. Briefly describe features of this mapping that would yield an efficient GPU implementation.
c. We talked about dynamic scheduling on a GPU. Describe a specific strategy for dynamic scheduling (static task list, dynamic task list, wait-free synchronization) and when it would be appropriate to use it.