
PowerPoint Presentationcs520/2019/slides/18-gpu.pdf · 7 4 9 2 Same shared memory device 7 9 1 location per group device 16 9 Same device memory group 1, thread 1 ... PowerPoint Presentation



Thread

Thread Group

Grid (Kernel)

[Figure: Grid (Kernel 0) consists of four thread groups, Thread Group (0, 0), (1, 0), (0, 1), and (1, 1). One thread group is expanded to show its 16 threads, indexed (0, 0) through (3, 3).]

Thread + group identifiers uniquely specify a thread within a kernel
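As a minimal sketch of this statement, the CUDA kernel below combines the group identifier (`blockIdx`) and the thread identifier (`threadIdx`) into one linear index that is unique within the kernel. The launch shape (a 2 x 2 grid of 4 x 4 thread groups) follows the figure; the names `write_unique_ids` and `unique_ids` are hypothetical and do not come from the slides.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread derives a unique linear index from its group and thread coordinates.
__global__ void write_unique_ids(int* out) {
    // Global 2D coordinates: group index * group size + thread index in the group.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int width = gridDim.x * blockDim.x;  // threads per row of the whole grid
    int id = y * width + x;
    out[id] = id;
}

// Hypothetical host helper: launches 2 x 2 thread groups of 4 x 4 threads (64 threads).
std::vector<int> unique_ids() {
    const int n = 2 * 2 * 4 * 4;
    int* d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemset(d_out, 0xff, n * sizeof(int));  // every slot starts as -1
    write_unique_ids<<<dim3(2, 2), dim3(4, 4)>>>(d_out);
    std::vector<int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

If every slot of the returned vector holds its own index, each of the 64 threads wrote exactly one distinct location.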


Private Memory: per Thread

Shared Memory: per Thread Group

Device Memory: shared across Grid (Kernel 0) and Grid (Kernel 1)


[Figure: hardware overview, built up over several slides. Components shown: CPU (with Cores), Host Memory, Device Memory, and multiprocessors (MP …), each containing processing elements (PE …), Shared Memory, and Registers. Legend: Connections, Memory, Processors, Units/groupings. MP: Multiprocessor. PE: Processing Element.]

if (tid.x % 2 == 0) {
    a += 5;
} else {
    a += 4;
}
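A runnable version of this branch might look as follows in CUDA. This is a sketch under assumptions: `tid.x` is taken to mean `threadIdx.x`, `a` is stored in an array so the result can be inspected on the host, and the names `divergent_add` and `run_divergent_add` are hypothetical. Because neighbouring threads take different paths, the threads of one warp typically execute the two paths one after the other.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Even-numbered threads add 5, odd-numbered threads add 4.
__global__ void divergent_add(int* a) {
    int tid = threadIdx.x;
    if (tid % 2 == 0) {
        a[tid] += 5;
    } else {
        a[tid] += 4;
    }
}

// Hypothetical host helper: one thread group of n threads, all values start at 0.
std::vector<int> run_divergent_add(int n) {
    std::vector<int> h(n > 0 ? n : 0, 0);
    if (n <= 0 || n > 1024) return h;  // a single group holds at most 1024 threads
    int* d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    divergent_add<<<1, n>>>(d);
    cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

Calling `run_divergent_add(4)` should return 5, 4, 5, 4.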

[Figure: a single multiprocessor with its processing elements (PE …), Shared Memory, and Registers; a later slide adds Device Memory.]

Input: 0 2 4 2 3 2 3 0

COUNT  SUM  AVG  MAX  MIN
8      16   2    4    0

device: 0 2 4 2 3 2 3 0

2 thread groups, 2 threads/group = 4 threads

Step 1: 1 location per thread (shared). The four threads read the input in two accesses (access #1, access #2), and each thread stores its sum in its own shared-memory location. The values are consistent with thread i adding elements i and i + 4:

          group 1    group 2
shared:   3  4       7  2

  group 1, thread 1: 0 + 3 = 3
  group 1, thread 2: 2 + 2 = 4
  group 2, thread 1: 4 + 3 = 7
  group 2, thread 2: 2 + 0 = 2

shared synchronization (__syncthreads())

Step 2: Same shared memory. Thread 1 of each group adds the group's values into its own location:

          group 1    group 2
shared:   7  4       9  2

Step 3: 1 location per group (device). Thread 1 of each group writes the group's sum to device memory:

device:   7  9

device synchronization (kernel boundary)

Step 4: Same device memory. Group 1, thread 1 adds the per-group sums:

device:   16  9
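The reduction traced on these slides can be sketched as two CUDA kernels separated by a kernel boundary. This is an illustrative reconstruction, not the lecture's code: the kernel names `partial_sums` and `final_sum`, the helper `sum_two_stage`, and the strided read pattern are assumptions chosen to reproduce the intermediate values shown (3 4 7 2, then 7 9, then 16).

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel 1: per-thread sums in shared memory, then one sum per group in device memory.
__global__ void partial_sums(const int* in, int n, int* group_out) {
    extern __shared__ int scratch[];  // 1 location per thread
    int total_threads = gridDim.x * blockDim.x;
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int acc = 0;
    for (int i = gid; i < n; i += total_threads) {
        acc += in[i];  // thread gid reads elements gid, gid + total_threads, ...
    }
    scratch[threadIdx.x] = acc;
    __syncthreads();  // shared synchronization
    if (threadIdx.x == 0) {
        for (int t = 1; t < blockDim.x; ++t) {
            scratch[0] += scratch[t];  // same shared memory
        }
        group_out[blockIdx.x] = scratch[0];  // 1 location per group
    }
}

// Kernel 2: runs after the kernel boundary; one thread adds the per-group sums.
__global__ void final_sum(int* group_out, int groups) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        for (int g = 1; g < groups; ++g) {
            group_out[0] += group_out[g];  // same device memory
        }
    }
}

// Hypothetical host helper that runs both kernels and returns the total.
int sum_two_stage(const std::vector<int>& input, int groups, int threads_per_group) {
    int n = static_cast<int>(input.size());
    int* d_in = nullptr;
    int* d_groups = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_groups, groups * sizeof(int));
    cudaMemcpy(d_in, input.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    size_t shared_bytes = threads_per_group * sizeof(int);
    partial_sums<<<groups, threads_per_group, shared_bytes>>>(d_in, n, d_groups);
    // Kernels on the same stream run in order, so this launch is the device-wide sync.
    final_sum<<<groups, threads_per_group>>>(d_groups, groups);
    int result = 0;
    cudaMemcpy(&result, d_groups, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_groups);
    return result;
}
```

With the slide's input `{0, 2, 4, 2, 3, 2, 3, 0}`, 2 groups, and 2 threads per group, `sum_two_stage` should return 16. The second kernel is used because `__syncthreads()` only synchronizes the threads of one group, whereas a kernel boundary orders all groups.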


• .reg

• .sreg

• tid ctaid

• .global .local .param

• .global

.version 6.1
.target sm_61
.address_size 64

.visible .entry AddTest(.param .u64 AddTest_0)
{
    […]
}


• .param .reg

.visible .entry AddTest(.param .u64 AddTest_0) { […] }

.func (.reg .u64 Return) Function(.param .f64 Return_0) { […] }


add.sat.s32 %r2, %r0, %r1

%r0-%r1

fma.rn.f16 %f3, %f0, %f1, %f2

%f0-%f3


mov.u32 %r0, %tid.x

    %r0: .reg, %tid.x: .sreg

ld.param.u64 %rd0, [ExampleParam_0]

    %rd0: .reg, ExampleParam_0: .param


@p add.sat.s32 %r2, %r0, %r1

    p: .pred

setp.eq.u32 p, %tid.x, 0

@p bra target_label


bar.warp.sync 0xffffffff
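At the CUDA source level, the counterpart of this instruction is `__syncwarp(mask)`, where the mask `0xffffffff` names all 32 threads of a warp. The sketch below is only an illustration under assumptions: the kernel `rotate_within_warp` and the helper `rotate_one_warp` are hypothetical, and the kernel is meant to be launched with exactly one warp.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread publishes a value, waits for the whole warp, then reads its neighbour's value.
__global__ void rotate_within_warp(const int* in, int* out) {
    __shared__ int buf[32];
    int lane = threadIdx.x;  // assumes a launch with exactly 32 threads
    buf[lane] = in[lane];
    __syncwarp(0xffffffff);  // all 32 threads arrive before anyone reads buf
    out[lane] = buf[(lane + 1) % 32];
}

// Hypothetical host helper: expects exactly 32 input values.
std::vector<int> rotate_one_warp(const std::vector<int>& input) {
    std::vector<int> result(32, 0);
    if (input.size() != 32) return result;
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, 32 * sizeof(int));
    cudaMalloc(&d_out, 32 * sizeof(int));
    cudaMemcpy(d_in, input.data(), 32 * sizeof(int), cudaMemcpyHostToDevice);
    rotate_within_warp<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(result.data(), d_out, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

For an input of 0 through 31, element i of the result should be (i + 1) % 32.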


.version 6.1
.target sm_61
.address_size 64

.visible .entry AddTest(.param .u64 .ptr.align 8 AddTest_0)
{
    .reg .u32 %r;                       // variable declarations
    .reg .u64 %rd<4>;                   // multiple declarations %rd0, %rd1…
    .reg .f64 %f<2>;

    ld.param.u64 %rd0, [AddTest_0];     // load ptr param
    cvta.to.global.u64 %rd1, %rd0;      // convert generic to global ptr
    mov.u32 %r, %tid.x;                 // load thread id
    mul.wide.u32 %rd2, %r, 8;           // calc offset for thread
    add.u64 %rd3, %rd1, %rd2;           // calc position for thread
    ld.global.f64 %f0, [%rd3];          // load current value
    add.f64 %f1, %f0, 2.000000;         // increment by 2
    st.global.f64 [%rd3], %f1;          // write value
    ret;
}
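For orientation, a CUDA kernel that does the same work as this PTX could look like the sketch below. It is one plausible source-level equivalent, not necessarily the code the listing was generated from; the parameter name `data` and the helper `run_add_test` are assumptions. `extern "C"` is used only so that the kernel keeps the unmangled name `AddTest`.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread adds 2 to the double at its own thread index (8 bytes per element).
extern "C" __global__ void AddTest(double* data) {
    data[threadIdx.x] += 2.0;
}

// Hypothetical host helper: one thread per element, single thread group.
std::vector<double> run_add_test(std::vector<double> values) {
    if (values.empty() || values.size() > 1024) return values;
    size_t bytes = values.size() * sizeof(double);
    double* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, values.data(), bytes, cudaMemcpyHostToDevice);
    AddTest<<<1, static_cast<unsigned int>(values.size())>>>(d);
    cudaMemcpy(values.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return values;
}
```

The single statement `data[threadIdx.x] += 2.0` covers the PTX sequence of computing the offset (`mul.wide.u32`), forming the address (`add.u64`), and the load, add, and store.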


.version 6.1
.target sm_61
.address_size 64

.visible .entry ConditionalTest(.param .u64 ConditionalTest_0)
{
    .reg .pred p;                       // predicate register
    […]
    rem.u32 %r1, %tid.x, 2;             // check if even or odd
    setp.ne.u32 p, %r1, 0;              // set predicate
@p  bra false;                          // conditionally branch
    add.u32 %r2, %r0, 1;
    bra end;
false:
    add.u32 %r2, %r0, 2;
end:                                    // converge point
    st.global.u32 [%rd3], %r2;
    ret;
}
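Read back into CUDA, the control flow of this listing corresponds roughly to the sketch below. The elided part of the PTX (`[…]`) is not recoverable, so two things are assumptions here: `%r0` is taken to be the value previously loaded from this thread's element, and `%rd3` is taken to be that element's address. The helper `run_conditional_test` is hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Even threads fall through (+1); odd threads take the branch to "false" (+2).
extern "C" __global__ void ConditionalTest(unsigned int* data) {
    unsigned int value = data[threadIdx.x];  // assumed stand-in for %r0
    unsigned int result;
    if (threadIdx.x % 2 != 0) {  // predicate p: thread index is odd
        result = value + 2;      // label "false"
    } else {
        result = value + 1;      // fall-through path
    }
    data[threadIdx.x] = result;  // converge point, then store
}

// Hypothetical host helper: one thread per element, single thread group.
std::vector<unsigned int> run_conditional_test(std::vector<unsigned int> values) {
    if (values.empty() || values.size() > 1024) return values;
    size_t bytes = values.size() * sizeof(unsigned int);
    unsigned int* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, values.data(), bytes, cudaMemcpyHostToDevice);
    ConditionalTest<<<1, static_cast<unsigned int>(values.size())>>>(d);
    cudaMemcpy(values.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return values;
}
```

For the input 10, 10, 10, 10 the result should be 11, 12, 11, 12.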