Awk notes
📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.
Intro
Awk is useful for simple transforms of text, particularly text structured by fields, such as comma or tab separated columns.
Awk does matching (grep-like),
splitting into fields to match,
reformatting (cut/paste/sed/printf/whatnot-like).
It's actually a fairly minimal-syntax programming language (with few types - most things are strings) that for some basic tasks is more succinct than shell scripting, and sometimes it can be handier than sed.
...but if you're going to need any real logic you might as well use none of awk/sed/shell, and go for a more serious scripting language.
Basics
At its most basic, awk parses and walks whitespace-separated (by default) fields within newline-separated records.
For example, it could selectively present parts of the output of ps:
$ ps aux | awk '{ print "User "$1" has been running "$11" since "$9"." }'
User root has been running bash since Nov24.
User me has been running -bash since 10:24.
...
You can prepend a regexp, like /regexp/ {command}, often for grep-like filtering.
For example:
- "When passwd's shell field (7) contains /bin/false or /nologin" (you can also match against $0, the full current record, to make it act rather like egrep):
$ awk -F':' '$7~/(\/bin\/false|\/nologin)/ {print $1" is not a real login"}' /etc/passwd
daemon is not a real login
...
- "If the home dir field (6) doesn't have 'home' in it":
$ awk -F':' '$6!~/home/ {print $1" is a non-regular user -- homedir is "$6}' /etc/passwd
mysql is a non-regular user -- homedir is /var/lib/mysql
...
- seeing who is currently logged in, and since when:
$ last | awk '/still logged/ {print $1"\tsince "$4" "$5" "$6"\t(on "$2")" }'
root since Wed Dec 5 (on pts/0)
root since Aug 10 00:06 (on :0)
root since Aug 9 03:38 (on tty1)
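A pattern with no action block prints the matching record, which is the grep-like use mentioned above. A minimal sketch, using made-up passwd-style lines rather than a real /etc/passwd (the variable names are only for illustration):

```shell
# Made-up passwd-style sample, for illustration only
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
me:x:1000:1000:Me:/home/me:/bin/bash'

# A pattern without an action prints every matching record (like grep 'nologin')
matched=$(printf '%s\n' "$sample" | awk '/nologin/')
echo "$matched"

# Matching one field instead: names of users whose shell (field 7) ends in bash
bash_users=$(printf '%s\n' "$sample" | awk -F: '$7 ~ /bash$/ { print $1 }')
echo "$bash_users"
```

The first command prints the whole daemon line; the second prints root and me, one per line.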
Input parsing, output formatting
Variables related to input and output (there are others that are sometimes useful in the middle of processing):
- FS: Field Separator (on input). The default is a single space, which awk treats specially: fields are split on runs of spaces and tabs, and leading and trailing whitespace is ignored.
- can also be handed in with -F on the command line. A value longer than one character is treated as a regular expression, so "[/~]" and "[ ]+" both work.
- RS: Record Separator (on input, initially \n)
- OFS: Output Field Separator (initially a single space)
- ORS: Output Record Separator (initially \n)
For example, a list of all users on one line:
$ awk 'BEGIN {FS=":"; ORS=", "} {print $1}' /etc/passwd
root, daemon, bin, sys, sync, games, man, lp, mail, ...
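Two details are easy to miss here: a multi-character field separator is a regular expression, and OFS only shows up when awk rebuilds the record (comma-separated print arguments, or an assignment to a field). A small sketch on made-up input:

```shell
# Made-up input with mixed separators, for illustration only
line='usr/local~bin/awk'

# -F with a bracket expression splits on either / or ~ ;
# the commas in print put OFS between the arguments
joined=$(printf '%s\n' "$line" | awk -F'[/~]' 'BEGIN { OFS="," } { print $1, $2, $3, $4 }')
echo "$joined"     # usr,local,bin,awk

# Assigning to a field (even to itself) rebuilds $0 using OFS
rebuilt=$(printf 'a b   c\n' | awk 'BEGIN { OFS="-" } { $1 = $1; print }')
echo "$rebuilt"    # a-b-c
```

Without the `$1 = $1` assignment, a bare print would output the record unchanged, original spacing included.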
Blocks, processing in separate steps
You can define multiple blocks:
- a BEGIN block will be executed before line processing
- one (or more, see below) main block(s), the line processing, which are unmarked
- an END block after line processing
This helps do setup, processing, and reporting, respectively.
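A minimal sketch of the three kinds of block working together, on made-up "name size" records (the data and variable names are assumptions for illustration):

```shell
# Made-up "name size" records, for illustration only
data='a.txt 10
b.txt 25
c.txt 7'

report=$(printf '%s\n' "$data" | awk '
BEGIN { print "Summing sizes" }                   # setup: runs once, before any input
      { total += $2; count++ }                    # processing: runs for every record
END   { print count " files, " total " units" }   # reporting: runs once, after all input
')
echo "$report"
```

This prints "Summing sizes" followed by "3 files, 42 units". Variables such as total and count need no declaration; they start out empty, which counts as 0 in arithmetic.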
User summary:
cat /etc/passwd | awk 'BEGIN {FS=":"; OFS=""; ORS="\n\n"} $7!~/false/ {print "User: "$1" (UID "$3", in group "$4")\n Shell: "$7"\n Home dir: "$6"\n Name: "$5 }'
User: backup (UID 34, in group 34)
Shell: /bin/sh
Home dir: /var/backups
Name: backup
...
Using associative arrays to summarize how many users use each shell:
cat /etc/passwd | awk '
BEGIN {FS=":"}
{shells[$7]++}
END { for (shell in shells) printf("%15s users: %d\n", shell, shells[shell]) }'
/bin/false users: 11
/bin/sh users: 16
/bin/bash users: 3
/bin/sync users: 1
#!/bin/bash
# mentions users that use more than negligible CPU and/or memory
ps --no-headers -axeo user,%cpu,%mem | \
awk '{usercpu[$1]+=$2; usermem[$1]+=$3}
END { for (u in usercpu) { if (usercpu[u]>5 || usermem[u]>5)
printf("%15s using %4d%% CPU and %4d%% resident memory\n",
u, usercpu[u], usermem[u]) } }'
postgres using 4% CPU and 83% resident memory
liquids+ using 60% CPU and 1% resident memory
www-data using 4% CPU and 27% resident memory
root using 52% CPU and 11% resident memory
Using several main blocks makes sense when each has its own filter.
Example from munin:
netstat -s | awk '
/active connections ope/ { print "active.value " $1 }
/passive connection ope/ { print "passive.value " $1 }
/failed connection/ { print "failed.value " $1 }
/connection resets/ { print "resets.value " $1 }
/connections established/ { print "established.value " $1 }'
passive.value 181432543
failed.value 91976
resets.value 2954810
established.value 142
Seeing what programs are listening for connections, and what connections are currently established. (can be put into a shell script as-is):
netstat -pan | grep -Ev '\bunix\b' | awk -F'[/ ]+' '
$0~/LISTEN/ {printf("%20s listening on %-20s (PID %s) \n", $8, $4, $7)}
$0~/ESTABLISHED/ {printf("%30s <--connection--> %s \n", $4, $5)}'
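Note that every rule is tested against every record, so one record can trigger more than one block. A small sketch with made-up log lines (data and names are assumptions for illustration):

```shell
# Made-up log lines, for illustration only
log='error: disk full
warning: disk almost full
info: all fine'

counts=$(printf '%s\n' "$log" | awk '
/error/   { errors++ }
/warning/ { warnings++ }
/disk/    { disk++ }      # also matches the error and warning lines
END { printf("errors=%d warnings=%d disk=%d\n", errors, warnings, disk) }')
echo "$counts"
```

This prints errors=1 warnings=1 disk=2. To make filters mutually exclusive, end a block with `next`, which skips the remaining rules for the current record.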
Or some more passwd related summary stuff:
#!/usr/bin/awk -f
BEGIN {
FS=":"
printf("Looking at line ")
}
{
printf("%d, ", NR)
shells[$7]++;
if ($3<1000) subthousand++
else supthousand++
}
$6~/\/home/ { home++ }
$6!~/\/home/ { nothome++ }
END {
printf("Done.\n")
for (shell in shells) {
printf("%15s users: %d\n", shell, shells[shell])
}
printf("Users with /home directories: %d. Users based elsewhere: %d\n", home, nothome)
printf("Users with UID <1000: %d, >=1000: %d\n", subthousand, supthousand)
}
Put this into a file called 'passwdparser' and chmod +x it. Now you can run it like ./passwdparser /etc/passwd.
(...NR is the number of records seen so far, which is frequently used as line numbering)
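A quick sketch of NR used as line numbering, alongside NF (the number of fields in the current record), on made-up input:

```shell
# Prefix each line with its record number and field count
numbered=$(printf 'alpha beta\ngamma\n' | awk '{ print NR ": " NF " field(s): " $0 }')
echo "$numbered"

# In an END block, NR holds the total number of records read (like wc -l)
total=$(printf 'alpha beta\ngamma\n' | awk 'END { print NR }')
echo "$total"
```

The first command prints "1: 2 field(s): alpha beta" and "2: 1 field(s): gamma"; the second prints 2.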
Language notes
AWK has while, do-while, for, and for-in loops, associative arrays (the only array type; functions such as split() fill them with indices starting at 1), a bunch of operators (~ and !~ seem to be the only unusual ones), and more.
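A small sketch of a counting loop over an array filled by split(), showing the one-based indexing (the color list is made up for illustration):

```shell
parts=$(awk 'BEGIN {
    n = split("red,green,blue", colors, ",")    # fills colors[1] .. colors[n]
    for (i = 1; i <= n; i++)                     # counting loop keeps the order
        printf("%s%d=%s", (i > 1 ? " " : ""), i, colors[i])
    print ""
}')
echo "$parts"     # 1=red 2=green 3=blue
```

A `for (i in colors)` loop would visit the same elements, but in an unspecified order, so the counting loop is the safer choice when order matters.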
String functions include length, sub and gsub (replacing), substr, match, index, and sprintf (printf itself is a statement rather than a function).
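A quick tour of these on a made-up string, as a sketch rather than a full reference. Note that string positions start at 1, and that sub and gsub modify their third argument in place:

```shell
strs=$(awk 'BEGIN {
    s = "hello world"
    print length(s)              # 11
    print substr(s, 1, 5)        # hello
    print index(s, "world")      # 7
    t = s; sub(/o/, "0", t)      # replaces the first match only
    print t                      # hell0 world
    u = s; gsub(/o/, "0", u)     # replaces all matches
    print u                      # hell0 w0rld
    if (match(s, /w[a-z]+/))     # sets RSTART and RLENGTH
        print RSTART, RLENGTH    # 7 5
    print sprintf("%03d", 7)     # 007
}')
echo "$strs"
```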
Math functions include sin, cos, log, etc., though not min or max.
You can define your own functions.
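Since there is no built-in max, it makes a handy example of a user-defined function. A minimal sketch on made-up numbers (the function and variable names are hypothetical):

```shell
biggest=$(printf '3\n17\n5\n' | awk '
# awk has no built-in max, so define one
function max(a, b) { return (a > b) ? a : b }

NR == 1 { m = $1 + 0 }            # adding 0 forces a numeric value
        { m = max(m, $1 + 0) }
END     { print m }')
echo "$biggest"     # 17
```

The `+ 0` matters: compared as strings, "5" would sort after "17".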