In the Name of Performance Sam Coleman

Page 1

In the Name of Performance

Sam Coleman

Page 2

Who am I?

Page 3

Who am I?

Page 4

What is this not?
• Not about overall architecture

Page 5

What is this not?
• Not about rescuing an entire codebase

Page 6

What is this, then?
• Let’s look at something gross
• Let’s understand it
• Let’s never do it unless it needs to be done
• …“Performance”

Page 7

Duff’s Device
• Live coding
• Please excuse the mess
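The live-coded version is not captured in the slides. As a rough stand-in, here is a minimal sketch of the classic Duff’s Device in C: a `switch` jumps into the middle of an 8-way unrolled `do`/`while` loop, so the leftover `count % 8` copies are handled without a separate cleanup loop. The function name `duff_copy`, the `short` element type, and the incrementing destination pointer are assumptions for illustration; Tom Duff’s original wrote every value to a single memory-mapped output register instead.

```c
#include <stddef.h>

/* Hypothetical helper: copy `count` shorts from `from` to `to` using
 * Duff's Device. Unlike Duff's original (which wrote to one output
 * register), this variant increments `to` like an ordinary memory copy. */
void duff_copy(short *to, const short *from, size_t count)
{
    if (count == 0)
        return;                  /* the do-while below assumes count > 0 */

    size_t n = (count + 7) / 8;  /* passes through the unrolled body */

    /* Enter the loop body part-way through to copy the count % 8
     * leftovers first, then keep looping over full groups of 8. */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

A call such as `duff_copy(dst, src, n)` should leave `dst` holding the same values that `memcpy(dst, src, n * sizeof(short))` would.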

Page 8

Duff’s Device
• Why?
• Don’t do this.
• …unless you have to.
• memcpy is available almost everywhere.

Page 9

Bit Counting
• Counting the 1 bits in a word, aka “population count”, aka “Hamming weight”
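Before the tricks, a baseline helps. This is a minimal sketch, assuming a 32-bit word; the name `popcount_naive` is hypothetical and not from the talk. It checks one bit position per iteration.

```c
#include <stdint.h>

/* Hypothetical reference bit count: examine the lowest bit, shift, repeat.
 * Obviously correct, but loops once per bit up to the highest set bit. */
unsigned popcount_naive(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        count += x & 1u;  /* add the lowest bit (0 or 1) */
        x >>= 1;          /* bring the next bit down to position 0 */
    }
    return count;
}
```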

Page 10

But First, Bit Operations
• Bitwise: & (AND), >> (right shift)
• Masking: “covering up” bits we don’t care about

  h g f e d c b a
& 0 0 1 1 1 1 0 0
= 0 0 f e d c 0 0
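The masking diagram translates directly into C. In this sketch the helper names `keep_middle_bits` and `middle_field` are hypothetical; the mask 0x3C is just the `0 0 1 1 1 1 0 0` row written in hex, with bit `a` as bit 0.

```c
#include <stdint.h>

/* Hypothetical helper: keep bits 5..2 (f e d c) and clear the rest,
 * exactly the "& 0 0 1 1 1 1 0 0" step. */
uint8_t keep_middle_bits(uint8_t x)
{
    return x & 0x3Cu;          /* 0x3C == 0b00111100 */
}

/* Hypothetical helper: masking is often paired with a shift, moving the
 * kept field down to bit 0 so it reads as a small integer. */
uint8_t middle_field(uint8_t x)
{
    return (x >> 2) & 0x0Fu;   /* same four bits, now as a 4-bit value */
}
```

For example, 0xA5 (`1 0 1 0 0 1 0 1`) masks to 0x24, and its middle field is 9.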

Page 11

Bit Counting
• Live coding

Page 12

Bit Counting

  h g f e d c b a
- 0 h 0 f 0 d 0 b      (shifted right by one, every other bit masked off)
= (h & g) (h ^ g) (f & e) (f ^ e) (d & c) (d ^ c) (b & a) (b ^ a)

• The left column stays on only if both bits were on (a borrow from the right column clears it when only the left bit was set).
• The right column is “on” only if exactly one of d and c was on.
• So across both columns, possible results are “10”, “01”, or “00”.
• …you may recognize these as “2”, “1”, or “0”.
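The pairwise step fits in one expression. This sketch assumes a 32-bit word (the slide shows one byte), so the "every other bit" mask becomes 0x55555555; that constant is the conventional one for this technique rather than something shown on the slide. The name `pair_counts` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical step 1: replace each 2-bit field of x with the number of
 * 1 bits it contains. A pair's value is 2*high + low; subtracting its
 * high bit (lined up by >> 1 and isolated by the 0x55... mask) leaves
 * high + low. No pair borrows from its neighbour, since each pair's
 * value is always at least its own high bit. */
uint32_t pair_counts(uint32_t x)
{
    return x - ((x >> 1) & 0x55555555u);
}
```

With all bits set, every pair becomes `10` (two), giving 0xAAAAAAAA.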

Page 13

Bit Counting

  h g f e d c b a
+     h g     d c      (shifted right by two)

Page 14

Bit Counting
• The sum of two 2-bit numbers can require 3 bits
• Need to clean (mask) the inputs so we know which bits of the output are relevant
• The result is a set of clean 4-bit integers

  0 0 f e 0 0 b a
+ 0 0 h g 0 0 d c      (shifted right by two, then masked)
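A sketch of the pairs-to-nibbles step on a 32-bit word. The mask 0x33333333 (`0 0 1 1` repeated) is the conventional constant for this step and an assumption here; the name `nibble_counts` is hypothetical. Both inputs are masked, as the slide requires, because a sum can grow to 3 bits.

```c
#include <stdint.h>

/* Hypothetical step 2: add neighbouring 2-bit counts into 4-bit counts.
 * Each sum can need 3 bits (2 + 2 = 4), so both operands are masked with
 * 0x33333333 first; otherwise bits from the neighbouring pair would be
 * added in. Each nibble of the result holds a count from 0 to 4. */
uint32_t nibble_counts(uint32_t pairs)
{
    return (pairs & 0x33333333u) + ((pairs >> 2) & 0x33333333u);
}
```

Feeding in 0xAAAAAAAA (every pair holding 2) gives 0x44444444 (every nibble holding 4).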

Page 15

Bit Counting
• Our “4-bit” inputs will only have up to 3 bits in use (to store “4”), so the sum will take at most 4 bits (to store “8”)
• Only need to mask the output to get a clean 8-bit result, which saves an operation vs. masking both inputs

  p o n m l k j i h g f e d c b a
+         p o n m         h g f e      (shifted right by four)
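The nibbles-to-bytes step in C, again assuming a 32-bit word; the mask 0x0F0F0F0F is the conventional constant and `byte_counts` is a hypothetical name. Only the output is masked, which is the saved operation the slide describes.

```c
#include <stdint.h>

/* Hypothetical step 3: add neighbouring 4-bit counts into 8-bit counts.
 * Each nibble count is at most 4, so a sum is at most 8 and still fits in
 * 4 bits: nothing carries into the next nibble. The operands can therefore
 * be added unmasked, and one mask on the output clears the leftover high
 * nibble of every byte. */
uint32_t byte_counts(uint32_t nibbles)
{
    return (nibbles + (nibbles >> 4)) & 0x0F0F0F0Fu;
}
```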

Page 16

Bit Counting

  - - - - p o n m - - - - l k j i - - - - h g f e - - - - d c b a
* 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1

  - - - - p o n m - - - - l k j i - - - - h g f e - - - - d c b a
+ - - - - l k j i - - - - h g f e - - - - d c b a
+ - - - - h g f e - - - - d c b a
+ - - - - d c b a

… >> 24
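Putting the steps together gives the well-known "SWAR" population count. This is a sketch for 32-bit words, not the code from the live session; the name `popcount_swar` and the mask constants are the conventional choices and are assumptions here. The multiply by 0x01010101 is the diagram on this slide: it adds the word shifted left by 0, 8, 16 and 24 bits, so the top byte collects the sum of all four byte counts.

```c
#include <stdint.h>

/* Hypothetical 32-bit population count assembled from the steps above. */
unsigned popcount_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit counts, 0..2 */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit counts, 0..4 */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit counts, 0..8 */
    /* x * 0x01010101 == x + (x << 8) + (x << 16) + (x << 24) (mod 2^32).
     * The top byte receives the sum of all four bytes; the total is at
     * most 32, so no byte overflows. The cast keeps the product 32 bits
     * wide before >> 24 extracts that top byte. */
    return (uint32_t)(x * 0x01010101u) >> 24;
}
```

For 0xFFFFFFFF the byte counts are 0x08080808, the product is 0x20181008, and shifting right by 24 leaves 0x20, i.e. 32.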

Page 17

Bit Counting
• Why?
• Don’t do this.
• …unless you have to.
• __builtin_popcount (GCC, Clang) is available almost everywhere.
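In practice the builtin is the better choice. A minimal sketch, assuming a GCC- or Clang-compatible compiler; the wrapper name `popcount_builtin` is hypothetical. `__builtin_popcount` takes an `unsigned int` and returns an `int`.

```c
#include <stdint.h>

/* Hypothetical wrapper: let the compiler pick the instruction sequence.
 * On x86 with a suitable target flag (e.g. -mpopcnt) GCC and Clang can
 * emit a single POPCNT instruction; otherwise they fall back to a
 * library routine or an inline bit-twiddling sequence. */
unsigned popcount_builtin(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}
```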

Page 18

Concluding Remarks
• Old code can be a pain
• Embedded code can be a pain
• Progress can also be a pain
• Boy Scout rule: leave the campground cleaner than you found it
• Caveat: don’t clean the campground without understanding it!

Page 19

?