Detecting System Management Interrupts
3 min read
System Management Interrutps (SMIs)
- high priority interrupts caused by the hardware
- transparent to the operating system
- can be used by the mainboard for power management, thermal management, or other system-level functions independent of the OS
- can take a long time to execute, causing a CPU core to be blocked from other work
Detecting SMIs
- compile a kernel with hwlat tracing capabilities; usually, a typical Linux kernel has this enabled; if not, the config can be found in the appendix
- after starting the machine with a trace-capable image
- check available tracers
cat /sys/kernel/debug/tracing/available_tracers
- enable the tracer
echo hwlat > /sys/kernel/debug/tracing/current_tracer
- there now should be a process “hwlat” running that takes up 50% of one CPU
-
- output of the hwlat tracer available
cat /sys/kernel/debug/tracing/trace
orcat /sys/kernel/debug/tracing/trace_pipep
Example Output
# tracer: hwlat
#
# entries-in-buffer/entries-written: 1/1 #P:32
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<...>-30395 [010] d... 533.362276: #1 inner/outer(us): 0/388 ts:1711469939.505579595 count:1
- inner/outer: where the latency was detected, see next section
How does it work?
- this hwlat process is taking timestamps in a loop
- if distance between two timestamps is unreasonably large (bigger than ns), there was an SMI
- we should lower the threshold of this distance to 1us by executing
echo 1 > /sys/kernel/debug/tracing/tracing_thresh
- the hwlat process is migrated over all the cores to catch SMIs there
- inner vs outer latency
while (run) {
start_ts = trace_local_clock();
end_ts = trace_local_clock();
if (!first && start_ts - last_ts > thresh)
record_outer();
if (end_ts - start_ts > thresh)
record_inner();
last_ts = end_ts;
first = 0;
}
Further options
- by default, only 50% CPU time is used
- this can be increased by echoing into
echo 9999999 > /sys/kernel/debug/tracing/hwlat_detector/width
, where the value is smaller than the set windowcat /sys/kernel/debug/tracing/hwlat_detector/window
to avoid starving the system. - from my experience, this, however, is not necessary to catch SMIs. The default option is “good enough”.
Firing an SMI manually
- There is a nice small kernel module here for manually triggering an SMI to verify the setup
- follow the instructions in the readme to compile and load the module
Hardware Registers for counting SMIs
- Intel: MSR0x34, can be read out with turbostat / perf
- AMD: ls_msi_rx, can be used with
perf stat -e ls_smi_rx -I 60000
However, doesn’t seem to count everything; counts seem incorrect
Sources, Appendix
Custom Kernel Fragment: files/kernel_config_fragments/trace
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_RETHOOK=y
CONFIG_RETHOOK=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_RETVAL=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_NO_PATCHABLE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_OBJTOOL_MCOUNT=y
CONFIG_HAVE_OBJTOOL_NOP_MCOUNT=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_HAVE_BUILDTIME_MCOUNT_SORT=y
CONFIG_BUILDTIME_MCOUNT_SORT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_FUNCTION_GRAPH_RETVAL=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_FPROBE=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_TRACE_PREEMPT_TOGGLE=y
CONFIG_IRQSOFF_TRACER=y
CONFIG_PREEMPT_TRACER=y
CONFIG_SCHED_TRACER=y
CONFIG_HWLAT_TRACER=y
CONFIG_OSNOISE_TRACER=y
CONFIG_TIMERLAT_TRACER=y
CONFIG_MMIOTRACE=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP=y
CONFIG_BRANCH_PROFILE_NONE=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_FPROBE_EVENTS=y
CONFIG_KPROBE_EVENTS=y
CONFIG_UPROBE_EVENTS=y
CONFIG_DYNAMIC_EVENTS=y
CONFIG_PROBE_EVENTS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_MCOUNT_USE_CC=y
CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
CONFIG_ARCH_HAS_DEVMEM_IS_ALLOWED=y
CONFIG_STRICT_DEVMEM=y