Hardware-assisted software tracing - eLinux.org

Hardware-assisted software tracing

Adrien Vergé
adrienverge@gmail.com

talk about tracing

improve tracing

hardware

1 Tracing
2 Hardware
3 Improvements

Tracing

“a technique used to understand what is going on in a system in order to debug or monitor it”

recording events

from the kernel: IRQ handlers, system calls, scheduling activity, network activity, etc.

in user-space: tracepoints inside your application

Why is my software crashing?
Where are the bottlenecks?

How to improve performance?

use less resources
run faster
save battery

a process spawns 2 threads: #1 produces chunks of data that #2 consumes

[Figure: a process with two threads, thread 1 and thread 2]

example: LTTng + TMF

[Figure: TMF trace views over 12:40:48.500 to 12:40:48.700. Control flow view (columns: Process, TID, PTID) and resources view (CPU 1 to CPU 7; IRQ 43, 44, 46; SOFT_IRQ 1, 4, 7, 9). State legend: WAIT_BLOCKED, WAIT_FOR_CPU, USERMODE, SYSCALL, INTERRUPTED. The bottleneck is highlighted; labelled system calls include execve, read and write.]

tracing: recording events

for use in further analysis

So it's just logging?

tracing vs. logging

compact binary trace format
buffering: avoid disk I/O
lockless algorithms
low-level optimizations

result: ~200 μs / event (logging) vs. ~200 ns / event (tracing)
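The first two points (compact binary format, buffering in memory) can be illustrated with a minimal Python sketch. The record layout, the `RingBuffer` class and the `text_log_line` helper are assumptions made up for illustration; this is not LTTng's actual trace format, and the lockless and low-level parts are not modelled.

```python
import struct
import time

# Hypothetical fixed-size record: 64-bit timestamp (ns), 16-bit event id,
# 32-bit payload, little-endian, no padding.
EVENT = struct.Struct("<QHI")


class RingBuffer:
    """Preallocated in-memory buffer: events are packed in place, no disk I/O."""

    def __init__(self, capacity_events):
        self.buf = bytearray(capacity_events * EVENT.size)
        self.capacity = capacity_events
        self.count = 0  # total number of events recorded so far

    def record(self, event_id, payload, timestamp_ns=None):
        if timestamp_ns is None:
            timestamp_ns = time.monotonic_ns()
        slot = self.count % self.capacity  # overwrite the oldest slot when full
        EVENT.pack_into(self.buf, slot * EVENT.size,
                        timestamp_ns, event_id, payload)
        self.count += 1

    def read(self, slot):
        """Decode the record stored in a given slot."""
        return EVENT.unpack_from(self.buf, slot * EVENT.size)


def text_log_line(event_id, payload, timestamp_ns):
    """What a text log might store for the same event."""
    return f"[{timestamp_ns}] event={event_id} payload={payload}\n".encode()


if __name__ == "__main__":
    ring = RingBuffer(1024)
    ring.record(7, 42, timestamp_ns=1_000_000_000)
    print("binary record:", EVENT.size, "bytes")
    print("text line:", len(text_log_line(7, 42, 1_000_000_000)), "bytes")
```

Decoding is deferred to analysis time, so the traced program only pays for packing a few integers into memory it already owns.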

tracing users

heavy workload servers
embedded systems
real-time
intrusion detection

Google, IBM, Ericsson, Freescale, Montavista, Siemens, STMicroelectronics, Wind River, Autodesk, CAE, OPAL-RT

Beyond Heisenberg: observe without altering

need:
- perform light (size) and fast (time)
- don't pollute memory space
- thousands of events / s
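A quick back-of-the-envelope calculation shows why per-event cost matters at this rate. The sketch below reuses the two per-event costs quoted earlier in the talk (~200 μs and ~200 ns); the rate of 10,000 events per second and the function name are assumptions chosen for illustration.

```python
def recording_overhead(events_per_second, cost_per_event_ns):
    """Fraction of each second of CPU time spent recording events."""
    return events_per_second * cost_per_event_ns / 1e9


if __name__ == "__main__":
    rate = 10_000  # hypothetical event rate, events per second
    for label, cost_ns in (("~200 us / event", 200_000),
                           ("~200 ns / event", 200)):
        share = recording_overhead(rate, cost_ns)
        print(f"{label}: {share:.1%} of one CPU")
```

A result above 100 % means the recording alone would need more than one full CPU, so that event rate cannot be sustained at that cost.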

Hardware

Microchips are no longer just CPUs

Intel (x86): BTS, LBR, PT...
Freescale (PowerPC): Nexus Program Trace, Data Acquisition...
ARM CoreSight: ETM, ETB, STM...

lots of tracing units

STM (event tracing)

ETM (execution tracing)

BTS (execution tracing)

supported by (probably good) proprietary software

Do you have one of these?

widely spread

Is your Intel CPU newer than this one?

Improvements

Improvements 1/3: STM on ARM

System Trace Module (STM)

Goal: help software recording events

Provides dedicated resources: bus, buffer, timestamping

Need to instrument software

[Figure: system-on-chip diagram with CPU, ETM, system bus and timestamping]

implementation: “LTTng-equivalent”

The traced process is instrumented: calling tracepoint() writes to the STM. Embedding payload is possible.

A consumer process retrieves generated traces and stores them.

Traces are encoded in STP: an optimized, compact but proprietary format.
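The producer/consumer split described above can be modelled in a few lines of Python. This is only a software stand-in: `FakeSTM`, the packet layout and the function signatures are assumptions for illustration, and nothing here talks to real hardware or produces STP.

```python
import struct
from collections import deque


class FakeSTM:
    """Software stand-in for the trace unit: a bounded FIFO of raw packets."""

    def __init__(self, depth=256):
        # With maxlen set, the oldest packet is dropped when the FIFO is full.
        self.fifo = deque(maxlen=depth)

    def write(self, packet):
        self.fifo.append(bytes(packet))


def tracepoint(stm, event_id, payload=b""):
    """Instrumentation call: emit event id, payload length, then the payload."""
    stm.write(struct.pack("<HH", event_id, len(payload)) + payload)


def consume(stm, sink):
    """Consumer side: drain pending packets into `sink`; return how many."""
    drained = 0
    while stm.fifo:
        sink.append(stm.fifo.popleft())
        drained += 1
    return drained


if __name__ == "__main__":
    stm = FakeSTM()
    tracepoint(stm, 1)
    tracepoint(stm, 2, b"chunk ready")
    stored = []
    print(consume(stm, stored), "packets stored")
```

The traced code only performs a write; storing and later decoding the packets is left to the separate consumer.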

result

[Figure: benchmark chart comparing no tracing, LTTng-UST and STM + ETB for two workloads: only tracepoints, and computation + tracepoints]

indicative benchmark: overhead mostly depends on the traced application!

Improvements 2/3: ETM on ARM

Embedded Trace Macrocell (ETM)

Goal: trace execution, i.e. save every executed instruction address

address comparators, buffer, timestamping

No need to instrument software

Can focus on a specific process or function; triggers upon custom conditions

[Figure: system-on-chip diagram with CPU, ETM, system bus and timestamping]

implementation

ETM not meant to trace events: do execution tracing on event addresses

set address comparators to trigger in [event, event+4]

needed to write kernel support for process and function tracing
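The address-comparator trick can be sketched as a filter over a stream of executed instruction addresses. This is a behavioural model only: the function name and the sample addresses are hypothetical, and the range is treated as inclusive at both ends, following the [event, event+4] notation above. An event is reported once each time execution enters its range.

```python
def trace_events(executed_addresses, event_addresses, width=4):
    """Return the event addresses hit, in order, for a stream of PCs.

    An event counts once each time execution enters [event, event + width].
    """
    inside = {event: False for event in event_addresses}
    hits = []
    for pc in executed_addresses:
        for event in event_addresses:
            in_range = event <= pc <= event + width
            if in_range and not inside[event]:
                hits.append(event)  # entering the range: report the event
            inside[event] = in_range
    return hits


if __name__ == "__main__":
    # Hypothetical 4-byte instruction stream passing twice through 0x1008.
    pcs = [0x1000, 0x1004, 0x1008, 0x100C, 0x2000, 0x1008]
    print([hex(e) for e in trace_events(pcs, [0x1008])])
```

Only the address of the event is observed, which is consistent with events traced this way carrying no payload.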

result

[Figure: benchmark chart comparing no tracing, LTTng-UST and ETM + ETB for two workloads: only computation, and more computation]

Improvements 3/3: BTS on x86

Branch Trace Store (BTS)

[Figure: x86 host emitting branch records, shown as addresses such as 4015a8, 4015b0, 4015b4, 40ef26, 7f2aac77e012 and 7f2aac77e024]

does not provide dedicated buffers

cannot focus on a specific process or function: traces every branch!

$ perf record -e branches:u -c 1 -d ./myprogram
$ perf script -f time,ip,addr
101918.272364: ffffffff814a6f2c => 7f8d7b9b3180
101918.272364: ffffffff814a6f2c => 7f8d7b9b3180
101918.272364: 7f8d7b9b3183 => 7f8d7b9b6730
101918.272364: ffffffff814a6f2c => 7f8d7b9b6730
101918.272364: ffffffff814a6f2c => 7f8d7b9b674f
101918.272364: ffffffff814a6f2c => 7f8d7b9b6756
101918.272364: 7f8d7b9b67c2 => 7f8d7b9b67df
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67ef => 7f8d7b9b6a30
101918.272364: 7f8d7b9b6a38 => 7f8d7b9b6a58
101918.272364: 7f8d7b9b6a62 => 7f8d7b9b6bc0
101918.272364: 7f8d7b9b6bd7 => 7f8d7b9b67d3
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8
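Branch records in the `time: from => to` form shown above are easy to post-process. The following Python sketch parses such lines and counts the most frequent branches; the regular expression and the function names are assumptions based only on the sample output in this slide, not on a documented perf format.

```python
import re
from collections import Counter

# Matches lines such as "101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8".
RECORD = re.compile(
    r"^\s*(?P<time>\d+\.\d+):\s+(?P<src>[0-9a-f]+)\s+=>\s+(?P<dst>[0-9a-f]+)\s*$"
)


def parse_branches(lines):
    """Yield (time, source address, destination address) per matching line."""
    for line in lines:
        match = RECORD.match(line)
        if match:
            yield (float(match["time"]),
                   int(match["src"], 16),
                   int(match["dst"], 16))


def hottest_branches(lines, n=3):
    """Return the n most frequent (source, destination) pairs with counts."""
    counts = Counter((src, dst) for _, src, dst in parse_branches(lines))
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "101918.272364: ffffffff814a6f2c => 7f8d7b9b3180",
        "101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8",
        "101918.272364: 7f8d7b9b67e3 => 7f8d7b9b67c8",
    ]
    for (src, dst), count in hottest_branches(sample):
        print(f"{src:x} => {dst:x}: {count}")
```

Lines that do not match the pattern are skipped silently, so the shell prompt lines can be fed in along with the records.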

“Is hardware-assisted branch tracing faster than pure-software event tracing?”

BTS not meant to trace events

if enabled, traces every branch

implementation

hardware-traced with BTS: simple program, every branch recorded

software-traced with LTTng: same program, add a tracepoint() at every branch

result

[Figure: program branching rate (branch/s) for no tracing, LTTng-UST and BTS with perf]

original perf

[Figure: per-core buffers (core 0, core 1, ... core 6, core 7) feeding a system buffer]

BTS writes trace to a dedicated buffer

trace is copied to a bigger memory zone upon buffer full or context switch

user stores trace to disk using the write system call

possible copy in another buffer because no O_SYNC flag

new “spliced” perf

[Figure: per-core buffers (core 0, core 1, ... core 6, core 7); total size 512K × number of cores]

BTS writes trace to a dedicated buffer

upon buffer full or context switch, move to the next sub-buffer

filled sub-buffers are labeled to be written to disk later

writing is done by a kernel task in user context
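The sub-buffer scheme can be modelled in Python as below. It is a single-threaded sketch under stated assumptions, not the kernel patch: the `SubBufferRing` class, its sizes, and the choice to drop records when the writer falls behind are all hypothetical.

```python
class SubBufferRing:
    """One core's trace buffer, split into sub-buffers written out later."""

    def __init__(self, n_sub=4, sub_size=16):
        self.subs = [bytearray() for _ in range(n_sub)]
        self.sub_size = sub_size
        self.current = 0
        self.pending = []  # indices of filled sub-buffers, oldest first
        self.lost = 0      # records dropped because no sub-buffer was free

    def _advance(self):
        """Label the current sub-buffer for write-out and move to the next."""
        if not self.subs[self.current]:
            return  # nothing recorded yet: keep using this sub-buffer
        nxt = (self.current + 1) % len(self.subs)
        if nxt in self.pending:
            return  # the writer is behind: stay on the current sub-buffer
        self.pending.append(self.current)
        self.current = nxt

    def record(self, data):
        """Append a record; return False if it had to be dropped."""
        if len(self.subs[self.current]) + len(data) > self.sub_size:
            self._advance()  # buffer full: try the next sub-buffer
        if len(self.subs[self.current]) + len(data) > self.sub_size:
            self.lost += 1
            return False
        self.subs[self.current] += data
        return True

    def context_switch(self):
        self._advance()

    def flush(self, sink):
        """Deferred writer: store pending sub-buffers, then free them."""
        while self.pending:
            index = self.pending.pop(0)
            sink.append(bytes(self.subs[index]))
            self.subs[index].clear()


if __name__ == "__main__":
    ring = SubBufferRing(n_sub=2, sub_size=4)
    for chunk in (b"ab", b"cd", b"ef"):
        ring.record(chunk)
    out = []
    ring.flush(out)
    print(out, "lost:", ring.lost)
```

Recording never copies data between buffers; it only changes which sub-buffer is current, and the copy to storage happens once, in `flush`.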

result

[Figure: program branching rate (branch/s) for no tracing, LTTng-UST, BTS with perf and BTS with “spliced” perf]

Results

STM: -75 % overhead compared to LTTng-UST; needs post-decoding
ETM: -30 % to -50 % overhead; limited number of tracepoints; no payload
BTS: not suited for event tracing (not flexible); compared to vanilla perf, 2× faster

other hardware
Freescale: Data Acquisition, Program Trace
Intel: Processor Trace

last words

tracing helps you build efficient software

using LTTng: very low footprint

Cortex-A9: ~5 μs / event
Core i7: ~200 ns / event

using hardware: almost zero footprint

trace in production!

Links

LTTng and TMF: https://lttng.org/

STM libraries: https://github.com/adrienverge/libcoresightomap4430

ETM patch: https://lkml.org/lkml/2014/1/30/259

BTS patch: https://github.com/adrienverge/linux/tree/patch_perf_bts_splice

Thank you

Questions?
