Introduction to Perf Hsiangkai

Page 1: Introduction to Perf

Introduction to Perf

Hsiangkai

Page 2: Introduction to Perf

What is Perf?

• Perf is a profiler tool for Linux 2.6+ based systems that abstracts away CPU hardware differences in Linux performance measurements and presents a simple command line interface.

• Perf is based on the perf_events interface exported by recent versions of the Linux kernel.

Page 3: Introduction to Perf

Events

• software events - pure kernel counters

• context-switches

• hardware events - Performance Monitoring Unit (PMU)

• measure micro-architectural events such as the number of cycles, instructions retired, L1 cache misses and so on

• hardware cache events - events provided by the CPU

• tracepoint events - kernel ftrace infrastructure

Page 4: Introduction to Perf

perf stat

• For any of the supported events, perf can keep a running count during process execution.

• Events are designated using their symbolic names followed by optional unit masks and modifiers.

• perf stat -e cycles <command>

• perf stat -e cycles:u <command>

• perf stat -e cycles,instructions,cache-misses <command>

Page 5: Introduction to Perf

Modifiers

Page 6: Introduction to Perf

Multiplexing and Scaling Events

• If there are more events than counters, the kernel uses time multiplexing (switch frequency = HZ, generally 100 or 1000) to give each event a chance to access the monitoring hardware.

• Multiplexing only applies to PMU events.

• At the end of the run, the tool scales the count based on total time enabled vs time running.

• final_count = raw_count * time_enabled/time_running
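The scaling formula above can be tried out with a few lines of Python. This is a minimal sketch, not perf's own code: the function name `scale_count` and the sample numbers are made up for illustration, and the only assumption is that `time_enabled` and `time_running` are given in the same unit.

```python
def scale_count(raw_count: int, time_enabled: int, time_running: int) -> float:
    """Estimate the full-run count of a multiplexed event.

    Implements final_count = raw_count * time_enabled / time_running.
    Both times must use the same unit.
    """
    if time_running == 0:
        # The event never got onto a counter, so there is nothing to scale.
        return 0.0
    return raw_count * time_enabled / time_running


# Hypothetical numbers: the event was on a counter for 50 out of 200 time units.
estimate = scale_count(raw_count=1_000_000, time_enabled=200, time_running=50)
print(estimate)  # 4000000.0
```

The result is an estimate, not an exact count: it assumes the event occurred at the same rate while it was not being measured.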

Page 7: Introduction to Perf

• The perf tool can be used to count events on a per-thread, per-process, per-cpu or system-wide basis.

Page 8: Introduction to Perf

• per-thread

• the counter only monitors the execution of a designated thread.

• When the thread is scheduled out, monitoring stops.

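• By default, perf stat counts in per-thread mode with inheritance enabled, so child threads and processes of the measured command are counted as well.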

• Attaching to a running kernel thread

• perf stat -e cycles -t <thread-id>

Page 9: Introduction to Perf

• per-process

• all threads of the process are monitored

• Counts and samples are aggregated at the process level.

• The perf_events interface allows for automatic inheritance on fork() and pthread_create().

• Attaching to a running process

• perf stat -e cycles -p <pid>

Page 10: Introduction to Perf

• per-cpu

• all threads running on the designated processors are monitored.

• perf stat -e cycles:u,instructions:u -a <command>

• perf stat -e cycles:u,instructions:u -a -C 0,2-3 <command>
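To make the CPU list passed to -C concrete, the sketch below expands a list such as 0,2-3 into individual CPU numbers. It is an illustration only: `parse_cpu_list` is a hypothetical helper, not part of perf, and it assumes the list contains only comma-separated numbers and inclusive ranges.

```python
from typing import List


def parse_cpu_list(spec: str) -> List[int]:
    """Expand a CPU list such as "0,2-3" into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "2-3" is inclusive on both ends.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,2-3"))  # [0, 2, 3]
```

So the second command above counts events on CPUs 0, 2 and 3 only.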

Page 11: Introduction to Perf

perf record

• collect profiles on per-thread, per-process and per-cpu basis

• This generates an output file called perf.data.

Page 12: Introduction to Perf

Event-Based Sampling

• By default, perf record uses the cycles event as the sampling event.

• The perf_events interface allows two modes to express the sampling period:

• the number of occurrences of the event (period)

• perf record -e retired_instructions:u -c 2000 <command>

• the average rate of samples/sec (frequency)

• The perf tool defaults to the average rate. It is set to 1000Hz, or 1000 samples/sec, in the perf version described here; newer releases use a higher default (4000Hz).

• perf record -e instructions:u -F 250 <command>
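The difference between the two modes is easiest to see as arithmetic. The sketch below estimates how many samples each mode would produce; the function names, the event total and the run time are hypothetical values chosen for illustration, not measurements.

```python
def samples_in_period_mode(event_count: int, period: int) -> int:
    """Period mode (-c): one sample every `period` occurrences of the event."""
    return event_count // period


def samples_in_frequency_mode(duration_sec: float, frequency_hz: int) -> float:
    """Frequency mode (-F): an average of `frequency_hz` samples per second."""
    return duration_sec * frequency_hz


# Hypothetical run: 10 million instructions retired, sampled with -c 2000.
print(samples_in_period_mode(10_000_000, 2000))  # 5000

# Hypothetical run: 2 seconds of execution, sampled with -F 250.
print(samples_in_frequency_mode(2.0, 250))  # 500.0
```

In period mode the number of samples grows with the amount of work done, while in frequency mode it grows with elapsed time. The frequency figure is an average target, so the actual sample count of a real run can differ somewhat.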

Page 13: Introduction to Perf

perf report

• Samples collected by perf record are saved into a binary file called, by default, perf.data. The perf report command reads this file and generates a concise execution profile.