Latest revision as of 23:34, 21 April 2024

Linux-related notes

Shell, admin, and both:

Shell - command line and bash notes · shell login - profiles and scripts ·· find and xargs and parallel · screen and tmux ·· Shell and process nitty gritty ·· Isolating shell environments ·· Shell flow control notes

Linux admin - disk and filesystem · Linux networking · Init systems and service management (upstart notes, systemd notes) · users and permissions · Debugging · security enhanced linux · PAM notes · health and statistics · Machine Check Events · kernel modules · YP notes · unsorted and muck

Logging and graphing - Logging · RRDtool and munin notes

Network admin - Firewalling and other packet stuff ·

Remote desktops

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

What-does-it-spit-out debugging

print statements

...While various IDEs let you hook into the execution environment and allows you do step lines at a time and sees what happens in memory, and that is sometimes necessary...

...for many bugs, just printing out what happens in the crucial steps simple, effective, and almost universally available.

If this is something that might regress, consider doing it via logging instead - it makes it easier to filter the output, to check it after the fact, and it's handier when debugging is likelier to happen via helpdesks rather than the programmer who wrote the thing.

checking that invariants are actually invariant

strace

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

A trace of system calls: finds syscalls a program makes, prints to stderr as they happen, no instrumentation/alteration of your program necessary.

For example, things file related will use fstat and probably open - so you can see what files are being accessed - and you can see reads and writes, sockets opening and closing, and such.

Good for certain debugging when programs don't say enough, decent for certain workload characterization for things that seem slow.

strace can run a command for you, or you can attach to a running process via its PID (-p)

For things that fork off, you'll want -f to follow that.

To compare time spent in each syscall, use -c.

You can filter what syscalls are reported using their names, and with a few existing categorizations:

file: any file-related calls (access, state, read, write, etc.)
desc: file descriptor related
process: process management, e.g. fork, wait,l exec
network
signal
ipc
memory

For example:

# only mention open() calls
strace -eopen,openat ls

# see whether there are a bulk of stat()s done (for how many, use -c)
strace -estat,fstat,lstat,newfstatat  find /tmp

# see what progrms are invoked
strace -f -eprocess  service apache2 status

# summarize time spent in each syscall
strace -c  ls -l /proc

# comparing e.g. ls on small and large dirs
strace -c ls -l /data/largedir

For example, say you notice that ls -l seems to do a lot of lgetxattr()s. -c would show that for large directories getdents dwarfs everything, and in general it's nearly free assuming that you've just lstat()ted.

Logging

System logging

Windows Event Log

Live debugging

From the IDE

Terms

Step into: Go into the called function's workings. (does the same as 'over' if not a call)
Step over: Perform evaluation/call, but don't go into it, and don't display ("I trust this, give me the next thing at the current level")
Step out: "I've seen enough of this function's guts, go (back) to its caller."

Note: while 'over' may suggest skipping code execution, none of the options skip execution - it just means skipping its debugger display.

Debug and performance inspection tools

ltrace

An execution shim that shows calls into shared libraries.

https://en.wikipedia.org/wiki/Ltrace

pcstat

Page Cache statistics

For individual files, find how much is in the page cache

`echo l > /proc/sysrq-trigger`

...gives the backtrace for what's on each CPU.

Writes it to dmesg, which often also goes to something like kern.log, syslog / messages

kernel profiler sources

kprobes

(kernel functions)

kernel tracepoints

dtrace

userspace profiler sources

uprobes

(userspace functions)

USDT (Userland Statically Defined Tracing)

A way of embedding dtrace probes into an app

LTTng userspace

profiler tools / frontends

(Note that some basic things can be gotten from /proc)

dtrace (solaris, also freebsd, linux, osx, smartos)

Very cool tools originating on Solaris.

The rest are ports, some close to the Solaris version, some further away.

perf (linux)

a.k.a. perf_events, since 2.6 kernels.

It basically abstracted away hardware when doing performance counting/measurements.

It also introduced a few CLI tools that are easier to use than many that came before.

https://perf.wiki.kernel.org/index.php/Tutorial

http://www.brendangregg.com/perf.html

https://www.brendangregg.com/blog/2014-07-03/perf-counting.html

https://github.com/brendangregg/perf-tools.git

systemtap (linux)

systemtap seems to imitate dtrace decently

https://en.wikipedia.org/wiki/SystemTap

https://sourceware.org/systemtap/

https://sourceware.org/systemtap/tutorial.pdf

ptrace

ftrace

BPF (linux, bsd, more?)

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

extended Berkeley Packet Filter ('eBPF, often just BPF) originated in network packet filtering, but grew so flexible that it is also very useful for system tracing (and never got a new name).

It compiles user requests into sandboxed bytecode to be run in the kernel, which makes it a mix of safe, flexible, and fast.

LTTng

OProfile

https://en.wikipedia.org/wiki/OProfile

Unsorted

Instruments (osx)

https://en.wikipedia.org/wiki/Instruments_(software)

Xperf (windows)

https://blogs.msdn.microsoft.com/ntdebugging/2008/04/03/windows-performance-toolkit-xperf/

flame graphs

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

For context (though I think they came later): Firefox and Chrome has Flame Charts, which basically just stack over time. Since this is usually shown at a scale useful to find code that slow down the framerate, and JS is single-threaded, this tends to look fairly clean.

CPU flame graphs instead are primarily geared to show how common a particular stack is.

While it does not show a timeline (x axis is sorted alphabetically), or really collect amount of time spent, the amount of times the sample mentions the specific function is still a good indication of where most time is spent (particularly when things are unusually slow, which often means one function is dominating).

It's regularly used with statistical profiling, which takes samples frequently enough (though ideally not in step with anything else), which for specific functions is more approximate than cycle-accurate profiling , but can be a more real-world-accurate view of the general execution due to the profiling having minimal impact itself (on CPU, but also cache).

The file format is basically just lines of (verify)

stackframe;stackframe;stackframe count

Notes:

the count is technically optional, but whenever you can aggregate it can make these files a lot shorter.
a second count column is used for the differential variant
variants use some annotation on the function name
See flamegraph.pl's comments for more detail

It was recently popularized by Brendan Gregg, who specializes in digging into lower levels of system so there are various relevant tools for kernel use, like syscalls or network efficiency - basically scripts that use dtrace, perf, SystemTap, etc.

Various runtimes can also output things useful for these.

There is sometimes value to producing them from code, e.g. get a better idea of database use with entries like stack;SQL milliseconds

Variations on the theme:

Off-CPU flame graphs - intercept mainly the syscalls related to why something is not scheduled for CPU, such as general scheduling, IO,
Hot/Cold flame graphs - combine CPU and Off-CPU, to get an indication of how a program is scheduled
Memory flame graph flame graphs - intercept mainly the syscalls related to memory allocation
differential flame graphs - mainly meant to indicate performance regressions/improvements; takes two CPU flame graphs, and show changes in time spent

since color is based on newstackvalue-oldstackvalue, this has issues like renames making things disappear

The original tool is [1], and see also [2], and there are various that imitate it, like [https://www.npmjs.com/package/stackvis stackvis]

http://www.brendangregg.com/flamegraphs.html

Unsorted

http://www.brendangregg.com/linuxperf.html
- https://www.youtube.com/watch?v=CbmEDXq7es0
- https://www.youtube.com/watch?v=OPio8V-z03c

http://dtrace.org/blogs/brendan/

NVidia profiler tools

Post-mortem debugging

Debug symbols

GNU debugger (gdb)

Unsorted

@@ Line 1: / Line 1: @@
+{{#addbodyclass:tag_tech}}
+{{#addbodyclass:tag_prog}}
 {{ShellsAndAdmin}}
 {{stub}}
@@ Line 6: / Line 8: @@
 ==print statements==
-Yes, you can use an IDE that hooks into the execution environment and allows you do step lines at a time and sees what happens in memory.
+{{comment|...While various IDEs let you hook into the execution environment and allows you do step lines at a time and sees what happens in memory, and that is sometimes necessary...}}
-Yes, that is sometimes necessary.
+...for many bugs, just printing out what happens in the crucial steps simple, effective, and almost universally available.
-But for most bugs, just print statement is simple, effective.
-...though if this is something that might regress, consider doing it via [[logging]] instead - it makes it easier to filter the output, to check it after the fact, and it's handier when debugging is likelier to happen via helpdesks rather than the programmer who wrote the thing.
+If this is something that might regress, consider doing it via [[logging]] instead - it makes it easier to filter the output, to check it after the fact, and it's handier when debugging is likelier to happen via helpdesks rather than the programmer who wrote the thing.
@@ Line 20: / Line 21: @@
 -->
 ==strace==
-A trace of system calls.
+{{stub}}
+A trace of system calls: finds syscalls a program makes, prints to stderr as they happen, no instrumentation/alteration of your program necessary.
+For example, things file related will use fstat and probably open - so you can see what files are being accessed - and you can see reads and writes, sockets opening and closing, and such.
+Good for certain debugging when programs don't say enough,
+decent for certain workload characterization for things that seem slow.
+strace can run a command for you, or you can attach to a running process via its PID (-p)
+For things that [[fork]] off, you'll want -f to follow that.
+To compare time spent in each syscall, use {{inlinecode|-c}}.
+You can filter what syscalls are reported using their names, and with a few existing categorizations:
+* file: any file-related calls (access, state, read, write, etc.)
+* desc: file descriptor related
+* process: process management, e.g. fork, wait,l exec
+* network
+* signal
+* ipc
+* memory
+For example:
+ # only mention open() calls
+ strace -eopen,openat ls
+ # see whether there are a bulk of stat()s done (for how many, use -c)
+ strace -estat,fstat,lstat,newfstatat  find /tmp
+ # see what progrms are invoked
+ strace -f -eprocess  service apache2 status
+ # summarize time spent in each syscall
+ strace -c  ls -l /proc
+ # comparing e.g. ls on small and large dirs
+ strace -c ls -l /data/largedir
+For example, say you notice that {{inlinecode|ls -l}} seems to do a ''lot'' of lgetxattr()s.
+{{inlinecode|-c}} would show that for large directories getdents dwarfs everything, and in general it's nearly free assuming that you've just lstat()ted.
-When those are the things you are interested in, this is just one command away, no alteration of your program necessary.
 <!--
-strace (pretty) prints each system call as it makes them.
+It tries to pretty-print calls, e.g. figuring the names of the constants used.
-For example, things file related will use fstat and probably open - so you can see what files are being accessed - and you can see reads and writes, sockets opening and closing, and such.
 If you always wondered where vmstat got its information from, you can do a
@@ Line 55: / Line 103: @@
 ==Logging==
-=Live debugging=
+=System logging=
+==Windows Event Log==
+<!--
+So, not a lot of programs use it, but you can - see [https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-reporteventw ReportEventW docs] which is also wrapped by other things (e.g. python has win32evtlogutil.ReportEvent(), part of pywin32)
-==From the IDE==
+Upsides include that
+: the log viewer is basically always installed.
+: you can do remote event logging
+: you can log data
-===Terms===
+Arguables/limitations include that
-* Step into: Go into the called function's workings. (does the same as 'over' if not a call)
+: there's not a lot of useful filtering
-* Step over: Perform evaluation/call, but don't go into it, and don't display (''"I trust this, give me the next thing at the current level"'')
+: applications may not be allowed to do things like create logs (file-in-your-profile is somewhat surer)
-* Step out: ''"I've seen enough of this function's guts, go (back) to its caller."''
+: not portable to non-windows
-Note: while 'over' may suggest skipping code execution, none of the options skip execution - it just means skipping its debugger display.
+eventvwr.msc
-==Debug and performance inspection tools==
+Custom Views
+Windows Logs
-===strace===
-{{stub}}
-(On solaris, see [[truss]])
+https://stackoverflow.com/questions/51385195/writing-to-windows-event-log-using-win32evtlog-from-pywin32-library
+-->
-finds syscalls a program makes, prints to stderr.
-Great for debugging when programs don't say enough,
-and for workload characterization for things that seem slow.
-strace can run a command for you, or attach to a running process via its PID (-p)
-For things that fork off, you'll want -f
-To compare time spent in each syscall, use {{inlinecode|-c}}.
+=Live debugging=
-You can filter what syscalls are reported using their names, and with a few existing categorizations:
+==From the IDE==
-* file: any file-related calls (access, state, read, write, etc.)
-* desc: file descriptor related
-* process: process management, e.g. fork, wait,l exec
-* network
-* signal
-* ipc
-* memory
+===Terms===
+* Step into: Go into the called function's workings. (does the same as 'over' if not a call)
+* Step over: Perform evaluation/call, but don't go into it, and don't display (''"I trust this, give me the next thing at the current level"'')
+* Step out: ''"I've seen enough of this function's guts, go (back) to its caller."''
-For example:
+Note: while 'over' may suggest skipping code execution, none of the options skip execution - it just means skipping its debugger display.
- # only mention open() calls
- strace -eopen,openat ls
- # see whether there are a bulk of stat()s done (for how many, use -c)
- strace -estat,fstat,lstat,newfstatat  find /tmp
- # see what progrms are invoked
- strace -f -eprocess  service apache2 status
- # summarize time spent in each syscall
+==Debug and performance inspection tools==
- strace -c  ls -l /proc
- # comparing e.g. ls on small and large dirs
- strace -c ls -l /data/largedir
-For example, say you notice that {{inlinecode|ls -l}} seems to do a ''lot'' of lgetxattr()s.
-{{inlinecode|-c}} would show that for large directories getdents dwarfs everything, and in general it's nearly free assuming that you've just lstat()ted.
 ===ltrace===
@@ Line 266: / Line 306: @@
 https://sourceware.org/systemtap/tutorial.pdf
+====ptrace====
+<!--
+'process trace' puts one process in control of another.
+https://en.wikipedia.org/wiki/Ptrace
+https://man7.org/linux/man-pages/man2/ptrace.2.html
+-->
 ====ftrace====
@@ Line 357: / Line 410: @@
   stackframe;stackframe;stackframe count
 Notes:
-* the count is technically optional, but can make these files a lot shorter.
+* the count is technically optional, but whenever you can aggregate it can make these files a lot shorter.
 * a second count column is used for the differential variant
 * variants use some annotation on the function name
@@ Line 409: / Line 462: @@
 =Post-mortem debugging=
+<!--
+Post-mortem debugging refers to "program crashed? Well dump all of its memory to a file, and figure out later what it was doing"
+-->
 ==Debug symbols==
 <!--
-In compiled languages, there is no need for names to be stored.
+In compiled languages, there is no need for names to be stored at all.
 Once the compiler settles where a function sits in memory, it only ever needs to jump to that address.
 Names are only really a small bit of bloat and smaller versions of the same code tends to runs faster.
-Debug symbols give names to addresses, because when debugging, they're a boon, basically necessary, to locate where a crash originates.
+You would rarely or never use that information, and it just makes things bigger (so caches less effective)
+Except when you ''do'' want it to identify the name of what it's doing.
+Debug symbols give names to addresses, because during debugging that makes things ''so'' much easier.
@@ Line 425: / Line 487: @@
 <!--
 <tt>gdb</tt> is the GNU Debugger, a text-mode utility which allows tracing and altering program execution.
@@ Line 447: / Line 510: @@
+Post-mortem debugging means that, instead of running gdb on an executable, it to give it a 'core dump', which you can configure unices to generate when something crashes (<tt>ulimit -c unlimited</tt>) {{comment|(or with some libraries, to save program state without terminating[http://code.google.com/p/google-coredumper/])}}.
-An alternative to running gdb on an executable is to load a 'core dump', which you can configure unices to generate when something crashes (<tt>ulimit -c unlimited</tt>), or with some libraries, to save program state without terminating[http://code.google.com/p/google-coredumper/].
+A core dump records the working memory state of the of a computer program at crash time, allowing gdb to view what happened after the fact.
+In this case, you start gdb with the executable, then load the state from the core dump.
-A core dump records the working memory state of the of a computer program at crash time, allowing gdb to view what happened after the fact. In this case, you start gdb with the executable, then load the state from the core dump.
   gdb ls core.3463
 You cannot backtrace core dumps{{verify}}, so you will have to reproduce the fault
@@ Line 483: / Line 546: @@
 http://cs.baylor.edu/~donahoo/tools/gdb/tutorial.html
+-->
+=Unsorted=
+<!--
+The gdb/softice sort of 'I love my assembly' way is intriguing, but when you can get around things just checking invariants it probably has a slightly steep learning curve.
+Actually, things like [http://www.gnu.org/software/ddd/ ddd] and visual studio's debugging watches make that a lot easier while debugging, which makes learning it worth it. (I still have to learn it properly. I have a few times, but it never quite stuck.)
+Still, when there are simpler ways, they're useful.

Debugging: Difference between revisions

Latest revision as of 23:34, 21 April 2024

What-does-it-spit-out debugging

print statements

checking that invariants are actually invariant

strace

Logging

System logging

Windows Event Log

Live debugging

From the IDE

Terms

Debug and performance inspection tools

ltrace

pcstat

echo l > /proc/sysrq-trigger

kernel profiler sources

kprobes

kernel tracepoints

dtrace

userspace profiler sources

uprobes

USDT (Userland Statically Defined Tracing)

LTTng userspace

profiler tools / frontends

dtrace (solaris, also freebsd, linux, osx, smartos)

perf (linux)

systemtap (linux)

ptrace

ftrace

BPF (linux, bsd, more?)

LTTng

OProfile

Unsorted

Instruments (osx)

Xperf (windows)

flame graphs

NVidia profiler tools

Post-mortem debugging

Debug symbols

GNU debugger (gdb)

Unsorted

Navigation menu

`echo l > /proc/sysrq-trigger`