
What Do You Mean by “Cache Friendly”?

Björn Fahller

code::dive 2019, © Björn Fahller (@bjorn_fahller)

typedef uint32_t (*timer_cb)(void*);

struct timer {
  uint32_t deadline;
  timer_cb callback;
  void* userp;
  struct timer* next;
  struct timer* prev;
};

static timer timeouts = { 0, NULL, NULL, &timeouts, &timeouts };

timer* schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
  timer* iter = timeouts.prev;
  while (iter != &timeouts && is_after(iter->deadline, deadline))
    iter = iter->prev;
  return add_behind(iter, deadline, cb, userp);
}

void cancel_timer(timer* t)
{
  t->next->prev = t->prev;
  t->prev->next = t->next;
  free(t);
}
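is_after and add_behind are used above but never shown on the slides; the following is a hypothetical sketch of what they might look like (my assumption, not the author's code; in a real file they would be declared before schedule_timer):

#include <cstdint>
#include <cstdlib>

// Assumed semantics: "does deadline a expire after deadline b?", tolerant of wrap-around.
static bool is_after(uint32_t a, uint32_t b)
{
  return static_cast<int32_t>(a - b) > 0;
}

// Assumed semantics: allocate a node (freed later in cancel_timer) and link it in
// directly after pos, returning it so schedule_timer can hand it back to the caller.
static timer* add_behind(timer* pos, uint32_t deadline, timer_cb cb, void* userp)
{
  timer* t = static_cast<timer*>(std::malloc(sizeof(timer)));
  t->deadline = deadline;
  t->callback = cb;
  t->userp    = userp;
  t->next = pos->next;
  t->prev = pos;
  pos->next->prev = t;
  pos->next = t;
  return t;
}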


Simplistic model of cache behaviour

Includes

● The cache is small
● and consists of fixed size lines
● and a data access hit is very fast
● and a data access miss is very slow

Excludes

● Multiple levels of caches
● Associativity
● Threading

All models are wrong, but some are useful
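To make the model concrete, here is a toy simulation of exactly these four "Includes" bullet points (my illustration, not from the talk; the line size and line count are arbitrary example values):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <deque>

// Toy version of the simplistic model: a tiny, fully associative cache of
// fixed-size lines with LRU replacement; a hit is cheap, a miss fetches a line.
struct toy_cache {
  static constexpr std::uintptr_t line_size = 64;  // bytes per line (example value)
  static constexpr std::size_t    num_lines = 4;   // "the cache is small"
  std::deque<std::uintptr_t> lines;                // tags of resident lines, MRU first
  unsigned hits = 0, misses = 0;

  void access(const void* p) {
    auto tag = reinterpret_cast<std::uintptr_t>(p) / line_size;
    auto it = std::find(lines.begin(), lines.end(), tag);
    if (it != lines.end()) {                       // data access hit: very fast
      ++hits;
      lines.erase(it);
    } else {                                       // data access miss: very slow
      ++misses;
      if (lines.size() == num_lines) lines.pop_back();  // evict the LRU line
    }
    lines.push_front(tag);                         // the line is now most recently used
  }
};

int main() {
  int a[32] = {};
  toy_cache c;
  for (int& x : a) c.access(&x);                   // sequential walk: one miss per line
  std::printf("hits=%u misses=%u\n", c.hits, c.misses);
}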


const int* hot = 0x4001;
const int* cold = 0x4042;
int* also_cold = 0x4080;

int a = *hot;
int c = *cold;
*also_cold = a;
also_cold[1] = c;

[Figure: the simplistic cache model as a diagram: a small cache holding four 16-byte lines (initially 0x3A10, 0x4010, 0x4000, 0x4FF0) next to memory lines 0x4000..0x40F0. Over the following animation steps, reading *hot hits the already cached 0x4000 line; reading *cold misses, evicting 0x3A10 and loading 0x4040; the write through also_cold misses again, evicting 0x4010 and loading 0x4080; the second write lands in the same, now cached, 0x4080 line.]

Analysis of implementation

int main()
{
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<uint32_t> dist;

  for (int k = 0; k < 10; ++k) {
    timer* prev = nullptr;
    for (int i = 0; i < 20'000; ++i) {
      timer* t = schedule_timer(dist(gen), [](void*){ return 0U; }, nullptr);
      if (i & 1) cancel_timer(prev);
      prev = t;
    }
    while (shoot_first())
      ;
  }
}

bool shoot_first()
{
  if (timeouts.next == &timeouts) return false;

  timer* t = timeouts.next;
  t->callback(t->userp);
  cancel_timer(t);
  return true;
}

valgrind --tool=callgrind --cache-sim=yes --dump-instr=yes --branch-sim=yes

Essentially a profiler that collects info about call hierarchies, number of calls, and time spent. The CPU simulator is not cycle accurate, so see timing results as a broad picture.

--cache-sim=yes: simulates a CPU cache, flattened to 2 levels, L1 and LL. It shows you where you get cache misses. L1 is by default a model of your host CPU's L1, but you can change size, line-size, and associativity.

--dump-instr=yes: collects statistics per instruction instead of per source line. Can help pinpointing bottlenecks.

--branch-sim=yes: simulates a branch predictor.

Very slow!
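After the run, the collected output file is typically inspected with callgrind_annotate or interactively in KCachegrind (standard callgrind workflow, not shown on the slides):

callgrind_annotate callgrind.out.<pid>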

Live demo


typedef uint32_t (*timer_cb)(void*);

typedef struct timer {
  uint32_t deadline;        // 4 bytes
                            // 4 bytes padding for alignment
  timer_cb callback;        // 8 bytes
  void* userp;              // 8 bytes
  struct timer* next;       // 8 bytes
  struct timer* prev;       // 8 bytes
} timer;                    // sum = 40 bytes


66% of all L1d cache misses


Rule of thumb: follow pointer => cache miss


33% of all L1d cache misses


Chasing pointers is expensive. Let's get rid of the pointers.


typedef uint32_t (*timer_cb)(void*);
typedef uint32_t timer;

struct timer_data {
  uint32_t deadline;
  timer    id;
  void*    userp;
  timer_cb callback;
};

std::vector<timer_data> timeouts;
uint32_t next_id = 0;


24 bytes per entry. No pointer chasing.
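A quick compile-time check of that number (my addition, not on the slide; assumes a typical 64-bit ABI where pointers are 8 bytes):

static_assert(sizeof(timer_data) == 24, "4 + 4 + 8 + 8 bytes, no padding expected");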


Linear structure


timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
  auto idx = timeouts.size();
  timeouts.push_back({});
  while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline)) {
    timeouts[idx] = std::move(timeouts[idx-1]);
    --idx;
  }
  timeouts[idx] = timer_data{deadline, next_id, userp, cb};
  return next_id++;
}


Linear insertion sort


void cancel_timer(timer t)
{
  auto i = std::find_if(timeouts.begin(), timeouts.end(),
                        [t](const auto& e) { return e.id == t; });
  timeouts.erase(i);
}


Linear search


Analysis of implementation

perf stat -e cycles,instructions,l1d-loads,l1d-load-misses

Presents statistics from a whole run of the program, using counters from the hardware and the Linux kernel. The number of cycles per instruction is a proxy for how much the CPU is working or waiting. l1d-loads and l1d-load-misses give the number of reads from the L1d cache and the number of misses; speculative execution can make these numbers confusing.

Very fast!


perf record -e cycles,instructions,l1d-loads,l1d-load-misses --call-graph=lbr

Records where in your program the counters are gathered. --call-graph=lbr records call graph info instead of just the location; LBR requires no special compilation flags.

Very fast!
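The recorded data is afterwards browsed with the standard perf workflow (my note, not on the slides):

perf report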


Live demo


Linear search is expensive. Maybe try binary search?


typedef uint32_t (*timer_cb)(void*);

struct timer_data {
  uint32_t deadline;
  uint32_t id;
  void*    userp;
  timer_cb callback;
};

struct timer {
  uint32_t deadline;
  uint32_t id;
};

std::vector<timer_data> timeouts;
uint32_t next_id = 0;
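is_after is used below but not defined on these slides; presumably (an assumption on my part) it orders timer_data by deadline, matching the comparator shown later for the heap-based version:

// Sketch of the assumed comparator: earlier deadlines sort first.
static bool is_after(const timer_data& lh, const timer_data& rh)
{
  return lh.deadline < rh.deadline;
}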


timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
  timer_data element{deadline, next_id, userp, cb};
  auto i = std::lower_bound(timeouts.begin(), timeouts.end(),
                            element, is_after);
  timeouts.insert(i, element);
  return {deadline, next_id++};
}


Binary search for insertion point


Linear insertion


void cancel_timer(timer t)
{
  timer_data element{t.deadline, t.id, nullptr, nullptr};
  auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(),
                                   element, is_after);
  auto i = std::find_if(lo, hi,
                        [t](const auto& e) { return e.id == t.id; });
  if (i != hi) {
    timeouts.erase(i);
  }
}


Binary search for timers with the same deadline


Linear search for matching id


Linear removal


Live demo


Searches not visible in profiling. Number of reads reduced.

Number of cache misses high. memmove() dominates.


Failed branch predictions can lead to cache entry eviction!


Maybe try a map?


typedef uint32_t (*timer_cb)(void*);

struct timer_data {
  void* userp;
  timer_cb callback;
};

struct is_after {
  bool operator()(uint32_t lh, uint32_t rh) const { return lh < rh; }
};

using timer_map = std::multimap<uint32_t, timer_data, is_after>;
using timer     = timer_map::iterator;

static timer_map timeouts;


timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
  return timeouts.insert(std::make_pair(deadline, timer_data{userp, cb}));
}

void cancel_timer(timer t)
{
  timeouts.erase(t);
}


bool shoot_first()
{
  if (timeouts.empty()) return false;
  auto i = timeouts.begin();
  i->second.callback(i->second.userp);
  timeouts.erase(i);
  return true;
}


Live demo


Faster, but lots of cache misses when comparing keys and rebalancing the tree.

What did I say about chasing pointers?

[Chart: execution time in seconds (log scale, 1e-8 to 1e-2) vs number of elements (1 to 10000) for the linear, bsearch and map versions.]

[Chart: performance relative to linear: bsearch/linear and map/linear execution-time ratios vs number of elements (y axis 0 to 2).]


Can we get log(n) lookup without chasing pointers?


Enter the HEAP

[Figure: an example min-heap drawn as a tree: 3 at the top; 5 and 8 below; then 6, 10, 10, 14; then 9, 15, 13, 12, 11 at the bottom.]

● Perfectly balanced, partially sorted tree
● Every node is sorted after or same as its parent
● No relation between siblings
● At most one node with only one child, and that child is the last node

[Figure: the same heap during insertion of the value 7: a space is created at the next free leaf position, the greater values on the path toward the root (10, then 8) trickle down into it, and 7 is inserted into the freed space, ending up as a child of 3.]

Insertion:
● Create space
● Trickle down greater nodes
● Insert into space

[Figure: popping the top of the heap: 3 is removed, the lesser child on each level trickles up into the hole, and the last node is move-inserted into the freed position.]

Pop top:
● Remove top
● Trickle up lesser child
● Move-insert last into hole

(A code sketch of both operations follows below.)
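A minimal sketch of the two operations just described (my illustration, not the slide's code), on a 1-based std::vector<int> min-heap where v[1] is the top, the parent of v[i] is v[i/2], and v[0] is an unused slot:

#include <cstddef>
#include <utility>
#include <vector>

// "Create space, trickle down greater nodes, insert into space"
void heap_insert(std::vector<int>& v, int value)
{
  v.push_back(value);                          // create space at the end
  std::size_t i = v.size() - 1;
  while (i > 1 && v[i / 2] > value) {          // parent sorts after the new value?
    v[i] = std::move(v[i / 2]);                // trickle the greater node down
    i /= 2;
  }
  v[i] = value;                                // insert into the space
}

// "Remove top, trickle up lesser child, move-insert last into hole"
void heap_pop_top(std::vector<int>& v)
{
  int last = std::move(v.back());              // the node that will fill the hole
  v.pop_back();
  std::size_t n = v.size() - 1;                // remaining number of elements
  if (n == 0) return;                          // heap is now empty
  std::size_t hole = 1;                        // removing the top leaves a hole here
  while (2 * hole <= n) {
    std::size_t child = 2 * hole;              // pick the lesser of the two children
    if (child + 1 <= n && v[child + 1] < v[child]) ++child;
    if (!(v[child] < last)) break;             // lesser child not smaller: stop here
    v[hole] = std::move(v[child]);             // trickle the lesser child up
    hole = child;
  }
  v[hole] = std::move(last);                   // move-insert the last node into the hole
}

int main()
{
  std::vector<int> v{0};                       // index 0 unused, the heap starts at v[1]
  for (int x : {5, 3, 8, 1}) heap_insert(v, x);
  heap_pop_top(v);                             // removes 1; v[1] is now 3
}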

[Figure: the heap from the pop example laid out in an array, 1-based: indexes 1..12 hold 5, 6, 7, 9, 10, 8, 14, 10, 12, 13, 11, 15.]

Addressing: the index of a parent node is half (rounded down) of that of a child.
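In code (my illustration, not the slide's), the addressing rule is plain index arithmetic:

#include <cstddef>

// With the slide's 1-based numbering, a node's children sit at 2*i and 2*i + 1:
constexpr std::size_t parent(std::size_t i)      { return i / 2; }   // rounds down
constexpr std::size_t left_child(std::size_t i)  { return 2 * i; }
constexpr std::size_t right_child(std::size_t i) { return 2 * i + 1; }

static_assert(parent(12) == 6 && parent(13) == 6);   // both children of node 6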


Array indexes! No pointer chasing!


The heap is not searchable, so how to handle cancellation?

[Figure: an "actions" array of timer_action entries alongside the heap of timeout entries; each timeout refers to its action by index.]

struct timer_action {
  uint32_t (*callback)(void*);
  void* userp;
};

struct timeout {
  uint32_t deadline;
  uint32_t action_index;
};

Only 8 bytes per element of working data in the heap.

Cancel by setting callback to nullptr.


struct timer_data {
  uint32_t deadline;
  uint32_t action_index;
};

struct is_after {
  bool operator()(const timer_data& lh, const timer_data& rh) const
  {
    return lh.deadline < rh.deadline;
  }
};

std::priority_queue<timer_data, std::vector<timer_data>, is_after> timeouts;

timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
  auto action_index = actions.push(cb, userp);
  timeouts.push(timer_data{deadline, action_index});
  return action_index;
}


Container adapter that implements a heap


bool shoot_first()
{
  while (!timeouts.empty()) {
    auto& t = timeouts.top();
    auto& action = actions[t.action_index];
    if (action.callback) break;
    actions.remove(t.action_index);
    timeouts.pop();
  }
  if (timeouts.empty()) return false;
  auto& t = timeouts.top();
  auto& action = actions[t.action_index];
  action.callback(action.userp);
  actions.remove(t.action_index);
  timeouts.pop();
  return true;
}

Pop off any cancelled items first.
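cancel_timer is not shown for this version. Under the scheme described earlier (cancel by setting the callback to nullptr), and assuming that timer is the action index returned by schedule_timer and that the hypothetical actions store is indexable by it, a sketch could be as small as:

void cancel_timer(timer t)
{
  actions[t].callback = nullptr;   // shoot_first() pops the stale heap entry lazily
}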


Live demo


A lot fewer of everything! And nearly twice as fast too.

[Chart: execution time in seconds vs number of elements for the linear, bsearch, map and heap versions.]

[Chart: relative execution time, heap/linear and heap/map, vs number of elements.]


But there are many cache misses in the adjust-heap functions.


Can we do better?


How do the entries fit in cache lines?


Every generation is on a new cache line.
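A rough back-of-the-envelope sketch of why, assuming 64-byte cache lines and the 8-byte timer_data above: walk one root-to-leaf path of a classic 1-based array heap (children of node i at 2i and 2i+1) and print which cache line each node lands on.

#include <cstddef>
#include <cstdio>

int main() {
  constexpr std::size_t entry_size = 8;    // sizeof(timer_data): two uint32_t
  constexpr std::size_t line_size = 64;    // typical cache line size (assumption)
  // Follow left children from the root; each iteration is one tree level.
  for (std::size_t i = 1; i < 100000; i *= 2) {
    std::printf("node %6zu sits in cache line %5zu\n", i, i * entry_size / line_size);
  }
  // Beyond the first few levels, every level of the walk touches a new line.
}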


Can we do better?


Three generations per cache line!
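The arithmetic, assuming 64-byte cache lines: 64 B / 8 B per timer_data = 8 slots per line, and a three-level subtree holds 1 + 2 + 4 = 7 nodes, so one whole subtree (three generations) fits in a single line with one slot to spare; that spare slot is the unused offset 0 in the block layout below.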

[Diagram: the timer heap laid out in an array: deadlines 5, 6, 7, 9, 10, 8, 14 at indices 1-7, index 0 unused, with subsequent cache-line-sized blocks covering indices 8-15, 16-23, and so on.]

class timeout_store {
  static constexpr size_t block_size = 8;
  static constexpr size_t block_mask = block_size - 1U;

  static size_t block_offset(size_t idx) {
    return idx & block_mask;
  }
  static size_t block_base(size_t idx) {
    return idx & ~block_mask;
  }
  static bool is_block_root(size_t idx) {
    return block_offset(idx) == 1;
  }
  static bool is_block_leaf(size_t idx) {
    return (idx & (block_size >> 1)) != 0U;
  }
  ...
};

[Diagram: one 8-slot block: offset 1 is the block root, offsets 2-3 are its children, offsets 4-7 are the leaves, offset 0 is unused.]
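A quick sanity check of the helpers (the definitions are repeated as free functions so the snippet compiles on its own):

#include <cassert>
#include <cstddef>

constexpr std::size_t block_size = 8;
constexpr std::size_t block_mask = block_size - 1U;
constexpr std::size_t block_offset(std::size_t i) { return i & block_mask; }
constexpr std::size_t block_base(std::size_t i) { return i & ~block_mask; }
constexpr bool is_block_root(std::size_t i) { return block_offset(i) == 1; }
constexpr bool is_block_leaf(std::size_t i) { return (i & (block_size >> 1)) != 0U; }

int main() {
  assert(block_offset(13) == 5);   // 13 & 7
  assert(block_base(13) == 8);     // 13 belongs to the block starting at index 8
  assert(is_block_root(9));        // offset 1: the root slot of the block at 8..15
  assert(is_block_leaf(13));       // offset 5: bottom row of its block
  assert(!is_block_leaf(10));      // offset 2: middle row of its block
}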



class timeout_store {
  static constexpr size_t block_size = 8;
  static constexpr size_t block_mask = block_size - 1U;

  static size_t block_offset(size_t idx);
  static size_t block_base(size_t idx);
  static bool is_block_root(size_t idx);
  static bool is_block_leaf(size_t idx);

  static size_t left_child_of(size_t idx) {
    if (!is_block_leaf(idx)) return idx + block_offset(idx);
    auto base = block_base(idx) + 1;
    return base * block_size + child_no(idx) * block_size * 2 + 1;
  }

  static size_t parent_of(size_t idx) {
    auto const node_root = block_base(idx);
    if (!is_block_root(idx)) return node_root + block_offset(idx) / 2;
    auto parent_base = block_base(node_root / block_size - 1);
    auto child = ((idx - block_size) / block_size - parent_base) / 2;
    return parent_base + block_size / 2 + child;
  }

  ...
};
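The index math is easy to get wrong, so a standalone round-trip check helps. child_no is not shown in this part of the talk, so its definition here (the leaf's position within its block, 0-3) is an assumption; the other helpers are repeated so the snippet compiles on its own:

#include <cassert>
#include <cstddef>

namespace bheap {
constexpr std::size_t block_size = 8;
constexpr std::size_t block_mask = block_size - 1U;
std::size_t block_offset(std::size_t i) { return i & block_mask; }
std::size_t block_base(std::size_t i)   { return i & ~block_mask; }
bool is_block_root(std::size_t i) { return block_offset(i) == 1; }
bool is_block_leaf(std::size_t i) { return (i & (block_size >> 1)) != 0U; }
std::size_t child_no(std::size_t i) { return block_offset(i) - block_size / 2; } // assumption

std::size_t left_child_of(std::size_t i) {
  if (!is_block_leaf(i)) return i + block_offset(i);
  auto base = block_base(i) + 1;
  return base * block_size + child_no(i) * block_size * 2 + 1;
}

std::size_t parent_of(std::size_t i) {
  auto const node_root = block_base(i);
  if (!is_block_root(i)) return node_root + block_offset(i) / 2;
  auto parent_base = block_base(node_root / block_size - 1);
  auto child = ((i - block_size) / block_size - parent_base) / 2;
  return parent_base + block_size / 2 + child;
}
}  // namespace bheap

int main() {
  using namespace bheap;
  for (std::size_t i = 1; i < 100000; ++i) {
    if (block_offset(i) == 0) continue;           // slot 0 of every block is unused
    assert(parent_of(left_child_of(i)) == i);     // one level down, then back up
  }
}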


class timeout_store {
  ...
  using allocator = align_allocator<64>::type<timer_data>;
  std::vector<timer_data, allocator> bheap_store;
};

template <size_t N>
struct align_allocator {
  template <typename T>
  struct type {
    using value_type = T;
    static constexpr std::align_val_t alignment{N};
    T* allocate(size_t n) {
      return static_cast<T*>(operator new(n * sizeof(T), alignment));
    }
    void deallocate(T* p, size_t) {
      operator delete(p, alignment);
    }
  };
};


Aligned operator new and delete came with C++17.
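A minimal usage sketch of the allocator; the element count and the printed check are just for illustration, and timer_data and align_allocator are repeated from above so it compiles standalone (build with -std=c++17 or later):

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <new>
#include <vector>

struct timer_data { uint32_t deadline; uint32_t action_index; };

template <size_t N>
struct align_allocator {
  template <typename T>
  struct type {
    using value_type = T;
    static constexpr std::align_val_t alignment{N};
    T* allocate(size_t n) { return static_cast<T*>(operator new(n * sizeof(T), alignment)); }
    void deallocate(T* p, size_t) { operator delete(p, alignment); }
  };
};

int main() {
  // The vector's buffer starts on a cache-line boundary, so the 8-entry
  // blocks of the B-heap never straddle two lines.
  std::vector<timer_data, align_allocator<64>::type<timer_data>> v(100);
  auto addr = reinterpret_cast<std::uintptr_t>(v.data());
  std::printf("64-byte aligned: %s\n", addr % 64 == 0 ? "yes" : "no");
}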


Live demo


[Chart: execution time in seconds vs. number of elements for the linear, bsearch, map, heap, and bheap versions.]

[Chart: execution time relative to map (factor) vs. number of elements, showing the heap/map and bheap/map ratios.]


Rules of thumb

● Following a pointer is a cache miss, unless you have information to the contrary

● Smaller working data set is better

● Use as much of a cache entry as you can

● Sequential memory accesses can be very fast due to prefetching (see the sketch after this list)

● Fewer evicted cache lines means more data in hot cache for the rest of the program

● Mispredicted branches can evict cache entries

● Measure, measure, measure
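As a concrete (and deliberately unscientific) illustration of the prefetching bullet, here is a sketch that sums the same array once sequentially and once by chasing a random single-cycle chain of indices; the array size and the use of steady_clock are arbitrary choices:

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

int main() {
  constexpr std::size_t n = 1u << 22;              // ~4M elements, far bigger than cache
  std::vector<std::uint32_t> values(n, 1);

  // Sequential pass: the hardware prefetcher can run ahead of the loads.
  auto t0 = std::chrono::steady_clock::now();
  std::uint64_t seq = std::accumulate(values.begin(), values.end(), std::uint64_t{0});
  auto t1 = std::chrono::steady_clock::now();

  // Build a random single-cycle permutation (Sattolo's algorithm) so that
  // following next[] visits every element exactly once.
  std::vector<std::uint32_t> next(n);
  std::iota(next.begin(), next.end(), 0U);
  std::mt19937 rng{42};
  for (std::size_t k = n - 1; k > 0; --k) {
    std::uniform_int_distribution<std::size_t> pick(0, k - 1);
    std::swap(next[k], next[pick(rng)]);
  }

  // Chasing pass: every load depends on the previous one, so almost every
  // step is a cache miss and the prefetcher cannot help.
  auto t2 = std::chrono::steady_clock::now();
  std::uint64_t chased = 0;
  std::uint32_t i = 0;
  for (std::size_t steps = 0; steps != n; ++steps) {
    chased += values[i];
    i = next[i];
  }
  auto t3 = std::chrono::steady_clock::now();

  using ms = std::chrono::milliseconds;
  std::printf("sequential: %lld ms (sum %llu)\n",
              static_cast<long long>(std::chrono::duration_cast<ms>(t1 - t0).count()),
              static_cast<unsigned long long>(seq));
  std::printf("chasing:    %lld ms (sum %llu)\n",
              static_cast<long long>(std::chrono::duration_cast<ms>(t3 - t2).count()),
              static_cast<unsigned long long>(chased));
}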


Resources

Ulrich Drepper - "What every programmer should know about memory"
http://www.akkadia.org/drepper/cpumemory.pdf

Milian Wolff - "Linux perf for Qt Developers"
https://www.youtube.com/watch?v=L4NClVxqdMw

Travis Downs - "Cache counters rant"
https://gist.github.com/travisdowns/90a588deaaa1b93559fe2b8510f2a739

Emery Berger - "Performance Matters"
https://www.youtube.com/watch?v=r-TLSBdHe1A


bjorn@fahller.se

@bjorn_fahller

@rollbear

Björn Fahller

What Do You Mean by “Cache Friendly”?
