ASLR on the Line · L1 L2 L3 (Last Level Cache), shared between cores DDR Memory CPU Core L1 L2 CPU...

Preview:

Citation preview

ASLR on the LineBen Gras, Kaveh Razavi, Erik Bosman, Herbert Bos, Cris�ano Giuffrida

VUSec

Erik Bosman @brainsmoke

Ben Gras @bjg

Kaveh Razavi @gober

Stephan van Schaik

ASLR

Address Space Layout Randomiza�on

Widely deployed exploit mi�ga�on strategy:

Choose a different loca�on for code and data

every �me a process is run.

lower addresses

higher addresses

0

248-1

lower addresses

higher addresses

0

248-1

lower addresses

higher addresses

0

248-1

lower addresses

higher addresses

0

248-1

lower addresses

higher addresses

0

248-1

lower addresses

higher addresses

0

248-1

lower addresses

higher addresses

0

248-1

Address Space Layout Randomiza�on

Makes life for exploit writers a bit more

difficult.

Usually exploits need to know the loca�on

of certain data in memory.

A Single Leak Reveals

-- Joshua Drake

Address Space Layout Randomiza�on

Exploit writers need to find a bug which leaks

addresses without crashing the program.

... or do they?

This Presenta�on:

A side-channel a�ack on processes bakedinto the hardware to discover ASLRinforma�on from Javascript in the browser.

ASLR Cache (AnC)⊕

CPU CoreL1

L2

L3 (Last Level Cache), shared between cores

DDR Memory

CPU CoreL1

L2

CPU CoreL1

L2

CPU CoreL1

L2

Modern CPU architectures

CPU Core

L1 code / L1 data

L2

L3 (Last Level Cache), shared between cores ......

CPU Core

L1 code / L1 data

L2

L3 (Last Level Cache), shared between cores ......

memory access

data

CPU Core

L1 code / L1 data

L2

L3 (Last Level Cache), shared between cores ......

memory access

data

virtualaddress

CPU Core

L1 code / L1 data

L2

L3 (Last Level Cache), shared between cores ......

memory access

data

MMU

virtualaddress

physical address

CPU Core

L1 code / L1 data

L2

L3 (Last Level Cache), shared between cores ......

memory access

data

virtualaddress

physical address

MMUTLB

cache

CPU Core

L1 code / L1 data

L2

L3 (Last Level Cache), shared between cores ......

memory access

data

virtualaddress

physical address

MMUTLB

cachePT

walk

miss

CPU Core

L1 code / L1 data

L2

L3 (Last Level Cache), shared between cores ......

memory access

data

virtualaddress

physical address

MMUTLB

cachePT

walk

miss

Timers in Javascript

t0=performance.now();

operation();

t1=performance.now();t = t1-t0; m

easu

red

�m

e

real �me

t0=performance.now();

operation();

t1=performance.now();t = t1-t0; m

easu

red

�m

e

real �me

t0=performance.now();

operation();

t1=performance.now();t = t1-t0; m

easu

red

�m

e

real �me

a�er an�- side-channel mi�ga�ons (firefox)a�er an�- side-channel mi�ga�ons (firefox)

c = 0;t0 = p.now();while(t0 == p.now());t1 = p.now();

operation();

while(t1 == p.now()){ c++; }

mea

sure

d �

me

real �me

a�er an�- side-channel mi�ga�ons (firefox)

c = 0;t0 = p.now();while(t0 == p.now());t1 = p.now();

operation();

while(t1 == p.now()){ c++; }

mea

sure

d �

me

real �me

a�er an�- side-channel mi�ga�ons (firefox)

c = 0;t0 = p.now();while(t0 == p.now());t1 = p.now();

operation();

while(t1 == p.now()){ c++; }

mea

sure

d �

me

real �me

a�er an�- side-channel mi�ga�ons (firefox)

c = 0;t0 = p.now();while(t0 == p.now());t1 = p.now();

operation();

while(t1 == p.now()){ c++; }

mea

sure

d �

me

real �me

a�er an�- side-channel mi�ga�ons (firefox)

c = 0;t0 = p.now();while(t0 == p.now());t1 = p.now();

operation();

while(t1 == p.now()){ c++; }

mea

sure

d �

me

real �me

a�er an�- side-channel mi�ga�ons (firefox)a�er an�- side-channel mi�ga�ons (firefox)

c = 0;t0 = p.now();while(t0 == p.now());t1 = p.now();

operation();

while(t1 == p.now()){ c++; }

mea

sure

d �

me

real �me

a�er an�- side-channel mi�ga�ons (chrome)a�er an�- side-channel mi�ga�ons (chrome)

new SharedArrayBuffer()

memory which may be shared between

mul�ple worker threads.

new SharedArrayBuffer()

enabled by default by Firefox, Chromeand Edge since 2017

memory which may be shared betweenmul�ple worker threads.

new SharedArrayBuffer()

let SharedRowhammerBuffer = SharedArrayBuffer;

c=0;while (buf[0] == 0);

while (buf[0] == 1){ c++; }

buf[0]=1;operation();buf[0]=0;

1

2 mea

sure

d �

me

real �me

using SharedArrayBuffer and worker threads

c=0;while (buf[0] == 0);

while (buf[0] == 1){ c++; }

buf[0]=1;operation();buf[0]=0;

1

2 mea

sure

d �

me

real �me

using SharedArrayBuffer and worker threads

c=0;while (buf[0] == 0);

while (buf[0] == 1){ c++; }

buf[0]=1;operation();buf[0]=0;

1

2 mea

sure

d �

me

real �me

using SharedArrayBuffer and worker threads

c=0;while (buf[0] == 0);

while (buf[0] == 1){ c++; }

buf[0]=1;operation();buf[0]=0;

1

2 mea

sure

d �

me

real �me

using SharedArrayBuffer and worker threads

c=0;while (buf[0] == 0);

while (buf[0] == 1){ c++; }

buf[0]=1;operation();buf[0]=0;

1

2 mea

sure

d �

me

real �me

using SharedArrayBuffer and worker threads

c=0;while (buf[0] == 0);

while (buf[0] == 1){ c++; }

buf[0]=1;operation();buf[0]=0;

1

2 mea

sure

d �

me

real �me

using SharedArrayBuffer and worker threads

c=0;while (buf[0] == 0);

while (buf[0] == 1){ c++; }

buf[0]=1;operation();buf[0]=0;

1

2 mea

sure

d �

me

real �me

using SharedArrayBuffer and worker threads

Cache Side-Channels

cache line (64 bytes)

memory

memory access

data physical address

L3 cache

N-way associa�vecache set

1 cache set

memory

memory access

data physical address

L3 cache

...2048 cache sets with 64 byte cache lines

memory

memory access

data physical address

L3 cache

...

as many slices as cores

...

...

...memory

memory access

data physical address

L3 cache

...

memory

memory access

data physical address

L3 cache

...cache_set = (addr >> 6) % 2048,

direct mapping,repeated every 128KB

memory

memory access

data physical address

L3 cache

...cache_set = (addr >> 6) % 2048,

cache_slice = xor_hash(addr)

direct mapping,

repeated every 128KB

memory

memory access

data physical address

L3 cache

cache_set = (addr >> 6) % 2048,direct mapping,

repeated every 128KB

memory

memory access

data physical address

L3 cache

cache_set = (addr >> 6) % 2048,

two cache lines mapping to the same cache sethave the same physical address modulo 128KB

direct mapping,repeated every 128KB

memory

memory access

data physical address

L3 cache

cache_set = (addr >> 6) % 2048,

two cache lines mapping to the same cache sethave the same physical address modulo 4KB

direct mapping,repeated every 128KB

memory

memory access

data physical address

L3 cache

two cache linesmapping to thesame cache sethave the sameoffset into theirmemory page

memory

memory access

data physical address

1 page =64 cache lines

L3 cacheEVICT + TIME

(does an opera�on use a specific cache line?)

L3 cacheEVICT + TIME

(does an opera�on use a specific cache line?)

evict(line_x);time();t0 = time();operation();t = time()-t0;

L3 cacheEVICT + TIME

(does an opera�on use a specific cache line?)

evict(line_x);time();t0 = time();operation();t = time()-t0;

L3 cacheEVICT + TIME

(does an opera�on use a specific cache line?)

Xmybuf

X

X

X

...

evict(line_x);time();t0 = time();operation();t = time()-t0;

L3 cacheEVICT + TIME

(does an opera�on use a specific cache line?)

Xmybuf

X

X

X

...

evict(line_x);time();t0 = time();operation();t = time()-t0;X X X X X X X X X X X X

...

L3 cacheEVICT + TIME

(does an opera�on use a specific cache line?)

Xmybuf

X

X

X

...

evict(line_x);time();t0 = time();operation();t = time()-t0;X X X X X X X X X X X X

...

L3 cacheEVICT + TIME

(does an opera�on use a specific cache line?)

Xmybuf

X

X

X

...

evict(line_x);time();t0 = time();operation();t = time()-t0;X X X X X X X X X X X X

...

trigger memory access (or not)

CPU Core

L1 code / L1 data

L2

L3 (Last Level Cache), shared between cores ......

memory access

data

virtualaddress

physical address

MMUTLB

cachePT

walk

miss

Page Tables

lower addresses

higher addresses

0

248-1

CR3

512 entriescovering512GB each

0

248-1

CR3512 entriescovering1GB each

0

248-1

CR3512 entriescovering2MB each

0

248-1

CR3

512 entriespoin�ngto 4096 byteregions inmemory 0

248-1

CR3

0

248-1

7F83B6372040virtual address lookup (x86_64)

7F83B6372040virtual address lookup (x86_64)

7 F 8 3 B 6 3 7 3 0 4 0

TLB miss!

CR3

CR3

512entries

0

511

255

CR3

512entries

0

511

255

CR3

512entries

255

CR3

14

512entries

255

CR3

14

512entries

255

CR3

14 433

512entries

255

CR3

14 433

512entries

255

CR3

14 433 370

512entries

255

CR3

14 433 370

actualdata

255

CR3

14 433 370

actualdata

64

255

CR3

14 433 370

actualdata

4K page

64

255

CR3

14 433 370

4K page

64

255

CR3

14 433 370

4K page

64

address informa�on is directly encoded

into the page table lookups, and pagetables are pages themselves.

Observa�on:

255

CR3

14 433 370

4K page

64

255

CR3

14 433 370

4K page

64

4K page 4K page4K page 4K page

255

CR3

14 433 370

4K page

64

4K page 4K page4K page 4K page

CR3

......

255254253252251

250249248

256

247

255

?

......

255254253252251

250249248

256

247

255

1 Cache line =64 bytes =8 possible page table entries

?

......

255254253252251

250249248

256

247

255

1 Cache line =64 bytes =8 possible page table entries

?

......

255254253252251

250249248

256

247

255

1 Cache line =64 bytes =8 possible page table entries

cache line reveals 6 address bits

6437043314255

6437043314255

6437043314255

loca�on withinthe page known

by studyingbrowser

memory allocator

6437043314255

6437043314255

max entropy le�:

? ? ? ?

max entropy le�:

? ? ? ?

max entropy le�: 4*3 bits + ...

? ? ? ?

which hit belongs to which cache line?

max entropy le�: 4*3 bits + ...

? ? ? ?

which hit belongs to which cache line?

max entropy le�: 4*3 bits + log2( 4 * 3 * 2 * 1 )

? ? ? ?

which hit belongs to which cache line?

max entropy le�: ~ 16.6 bits

allocate a buffer

perform this side-channel a�ack on bufferentries 4096 bytes apart

measure when the page table lookupcrosses a cache line boundary

Sliding

......

375374373372371

370369368

376

367

370

......

375374373372371

370369368

376

367

371

+4096 bytes

......

375374373372371

370369368

376

367

372

+4096 bytes+4096 bytes

......

375374373372371

370369368

376

367

373

+4096 bytes+4096 bytes

+4096 bytes

......

375374373372371

370369368

376

367

374

+4096 bytes+4096 bytes

+4096 bytes

+4096 bytes

......

375374373372371

370369368

376

367

375

+4096 bytes+4096 bytes

+4096 bytes

+4096 bytes

+4096 bytes

376

+4096 bytes+4096 bytes

+4096 bytes

+4096 bytes

+4096 bytes+4096 bytes

......

375374373372371

370369368367

376

we can do the same thing for the 2ndlevel page table

Sliding

......

439438437436435

434433432

440

431

433

......

439438437436435

434433432

440

431

434

+2MB

......

439438437436435

434433432

440

431

435

+2MB

+2MB

......

439438437436435

434433432

440

431

436

+2MB+2MB+2MB

......

439438437436435

434433432

440

431

437

+2MB+2MB+2MB+2MB

......

439438437436435

434433432

440

431

438

+2MB+2MB+2MB+2MB+2MB

......

439438437436435

434433432

440

431

439

+2MB+2MB+2MB+2MB+2MB+2MB

......

439438437436435

434433432

440

431

440

+2MB+2MB+2MB+2MB+2MB+2MB

+2MB

? ?

? ?

? ?

max entropy le�: 2*3 + log2(2 * 1) = 7 bits

......

1514131211

1098

16

7

+1GB

14

......

255254253252251

250249248

256

247

255

+512GB

Firefox (on Linux) does not ini�alizeArrayBuffers, so linux does not allocatespace for the actual pages

We can allocate huge chunks and usesliding to recover the whole address

Alloca�ng large chunks of memory

Chrome does ini�alize memory, butjumps ahead in the address space every�me it creates a new heap

3rd level address bits can be recovered,4'th level bits needs chrome toini�alize/free up to 4TB :-)

Alloca�ng large chunks of memory

CPU Model Microarchitecture Year

Intel Xeon E3-1240 v5 Skylake 2015

Intel Core i7-6700K Skylake 2015

Intel Celeron N2840 Silvermont 2014

Intel Xeon E5-2658 v2 Ivy Bridge EP 2013

Intel Atom C2750 Silvermont 2013

Intel Core i7-4500U Haswell 2013

Intel Core i7-3632QM Ivy Bridge 2012

Intel Core i7-2620QM Sandy Bridge 2011

Intel Core i5 M480 Westmere 2010

Intel Core i7 920 Nehalem 2008

AMD FX-8350 8-Core Piledriver 2012

AMD FX-8320 8-Core Piledriver 2012

AMD FX-8120 8-Core Bulldozer 2011

AMD Athlon II 640 X4 K10 2010

AMD E-350 Bobcat 2010

AMD Phenom 9550 4-Core K10 2008

Allwinner A64 ARM Cortex A53 2016

Samsung Exynos 5800 ARM Cortex A15 2014

Samsung Exynos 5800 ARM Cortex A7 2014

Nvidia Tegra K1 CD580M-A1 ARM Cortex A15 2014

Nvidia Tegra K1 CD570M-A1 ARM Cortex A15; LPAE 2014

This side-channelwas detected on22 out of 22 testedarchitectures!

Demo video

Conclusions

- Browser vendors seem to have given up on protec�ng against side-channel a�acks in favor of adding features :,-(

- It's possible to perform cache side-channel a�acks from Javascript on the Memory Managment Unit to recover ASLR informa�on

Any Ques�ons?

VUSecproject page:h�ps://vusec.net/projects/anc

Recommended