Awk notes
Revision as of 19:30, 15 July 2023
📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.
Intro
Awk is useful for simple transforms of text, particularly text structured by fields, such as comma- or tab-separated columns.
Awk does:
- matching (grep-like),
- splitting into fields to match on,
- reformatting (cut/paste/sed/printf/whatnot-like).
It's actually a fairly minimal-syntax programming language (with few types - most things are strings) that for some basic tasks is more succinct than shell scripting, and sometimes it can be handier than sed.
...but if you're going to need any real logic you might as well use none of awk/sed/shell, and go for a more serious scripting language.
Basics
At its most basic, awk parses and walks whitespace-separated (by default) fields within newline-separated records.
For example, it could selectively present parts of the output of ps:
$ ps aux | awk '{ print "User "$1" has been running "$11" since "$9"." }'
User root has been running bash since Nov24.
User me has been running -bash since 10:24.
...
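To experiment with field splitting without depending on what ps prints on a given system, here is a minimal sketch on made-up input. The sample lines and the function name show_fields are assumptions for illustration only.

```shell
# Hypothetical helper: reports the field count, first field and last field of each line.
show_fields() {
    # NF is the number of fields in the current record, so $NF is the last field
    awk '{ print NF " fields; first=" $1 ", last=" $NF }'
}

# Runs of spaces and tabs both count as one separator by default
printf 'root  bash Nov24\nme\t-bash 10:24 extra\n' | show_fields
# 3 fields; first=root, last=Nov24
# 4 fields; first=me, last=extra
```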
You can prepend a pattern to a block, as in /regexp/ {command}. The block then runs only for records that match, which is often used for grep-like filtering.
For example:
- "When passwd's shell field (7) contains /bin/false or /nologin" (you can also match against $0, the full current record, to make it act rather like egrep):
$ awk -F':' '$7~/(\/bin\/false|\/nologin)/ {print $1" is not a real login"}' /etc/passwd
daemon is not a real login
...
- "If the home dir field (6) doesn't have 'home' in it":
$ awk -F':' '$6!~/home/ {print $1" is a non-regular user -- homedir is "$6}' /etc/passwd
mysql is a non-regular user -- homedir is /var/lib/mysql
...
- seeing who is currently logged in, and since when:
$ last | awk '/still logged/ {print $1"\tsince "$4" "$5" "$6"\t(on "$2")" }'
root since Wed Dec 5 (on pts/0)
root since Aug 10 00:06 (on :0)
root since Aug 9 03:38 (on tty1)
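Patterns are not limited to a regexp against the whole record. A small sketch, on made-up passwd-style lines, of the three forms that probably come up most often (the function name classify and the sample data are assumptions):

```shell
# Hypothetical helper: three kinds of patterns applied to passwd-style lines.
classify() {
    awk -F: '
        /nologin/   { print $1 ": record mentions nologin" }   # regexp against $0
        $7 ~ /sh$/  { print $1 ": shell ends in sh" }          # regexp against one field
        $3 >= 1000  { print $1 ": UID is 1000 or higher" }     # plain comparison
    '
}

printf 'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\nme:x:1000:1000:Me:/home/me:/bin/bash\n' | classify
# daemon: record mentions nologin
# me: shell ends in sh
# me: UID is 1000 or higher
```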
Input parsing, output formatting
Variables related to input and output (there are others that are sometimes useful in the middle of processing):
- FS: Field Separator (on input; initially a single space, which makes awk split on runs of spaces and tabs).
  It can also be handed in with -F on the command line. An FS longer than one character is treated as a regular expression, so "[/~]" and "[ ]+" both work.
- RS: Record Separator (on input, initially \n)
- OFS: Output Field Separator (initially a single space)
- ORS: Output Record Separator (initially \n)
For example, a list of all users on one line:
$ awk 'BEGIN {FS=":"; ORS=", "} {print $1}' /etc/passwd
root, daemon, bin, sys, sync, games, man, lp, mail, ...
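One detail that is easy to miss: OFS is only inserted between the comma-separated arguments of print (or when a field is assigned and $0gets rebuilt). Strings that are merely written next to each other are concatenated as-is. A minimal sketch on made-up input, where the function name to_tsv is an assumption:

```shell
# Hypothetical helper: picks three passwd fields and joins them with tabs.
to_tsv() {
    # -F: sets FS; OFS is placed between the comma-separated print arguments
    awk -F: 'BEGIN { OFS = "\t" } { print $1, $3, $7 }'
}

printf 'root:x:0:0:root:/root:/bin/bash\n' | to_tsv

# A regexp as FS: split on runs of "/" or space
printf 'usr/local  bin\n' | awk -F'[/ ]+' '{ print $2 "," $3 }'
# local,bin
```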
Blocks, processing in separate steps
You can define multiple blocks:
- a BEGIN block, executed before any input is processed
- one or more unmarked main blocks (see below), which do the per-line processing
- an END block, executed after all input has been processed
This helps do setup, processing, and reporting, respectively.
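As a minimal sketch of that setup/processing/reporting split, the following averages the first column of its input. The function name average_column and the sample numbers are made up for illustration.

```shell
# Hypothetical helper: prints the count and average of the first column.
average_column() {
    awk '
        BEGIN { sum = 0; n = 0 }       # setup, runs before any input is read
              { sum += $1; n++ }       # main block, runs for every record
        END   { if (n > 0) printf("n=%d avg=%.2f\n", n, sum / n) }   # report
    '
}

printf '10\n20\n40\n' | average_column
# n=3 avg=23.33
```

The explicit initialisation in BEGIN is only there to show the block; uninitialised awk variables already act as 0 in arithmetic.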
User summary:
cat /etc/passwd | awk 'BEGIN {FS=":"; OFS=""; ORS="\n\n"} $7!~/false/ {print "User: "$1" (UID "$3", in group "$4")\n Shell: "$7"\n Home dir: "$6"\n Name: "$5 }'
User: backup (UID 34, in group 34)
Shell: /bin/sh
Home dir: /var/backups
Name: backup
...
Using associative arrays to summarize how many users use each shell:
cat /etc/passwd | awk '
BEGIN {FS=":"}
{shells[$7]++}
END { for (shell in shells) printf("%15s users: %d\n", shell, shells[shell]) }'
/bin/false users: 11
/bin/sh users: 16
/bin/bash users: 3
/bin/sync users: 1
The same idea, summing CPU and memory use per user:
#!/bin/bash
# mentions users that use more than negligible CPU and/or memory
ps --no-headers -axeo user,%cpu,%mem | \
awk '{usercpu[$1]+=$2; usermem[$1]+=$3}
END { for (u in usercpu) { if (usercpu[u]>5 || usermem[u]>5)
printf("%15s using %4d%% CPU and %4d%% resident memory\n",
u, usercpu[u], usermem[u]) } }'
postgres using 4% CPU and 83% resident memory
liquids+ using 60% CPU and 1% resident memory
www-data using 4% CPU and 27% resident memory
root using 52% CPU and 11% resident memory
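The order in which for (key in array) visits the keys is unspecified, so summaries like the ones above can come out in a different order between awk implementations. A common workaround, sketched here on made-up input, is to print the count first and pipe the result through sort. The function name count_values is an assumption.

```shell
# Hypothetical helper: counts how often each distinct line occurs, most frequent first.
count_values() {
    awk '{ seen[$0]++ } END { for (k in seen) print seen[k], k }' |
        sort -k1,1nr -k2
}

printf '/bin/sh\n/bin/bash\n/bin/sh\n/bin/false\n/bin/sh\n/bin/bash\n' | count_values
# 3 /bin/sh
# 2 /bin/bash
# 1 /bin/false
```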
Using several main blocks makes sense when each block has its own filter.
Example from munin:
netstat -s | awk '
/active connections ope/ { print "active.value " $1 }
/passive connection ope/ { print "passive.value " $1 }
/failed connection/ { print "failed.value " $1 }
/connection resets/ { print "resets.value " $1 }
/connections established/ { print "established.value " $1 }'
passive.value 181432543
failed.value 91976
resets.value 2954810
established.value 142
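Every main block whose pattern matches is run for each record, in the order written, and the next statement skips the remaining blocks for the current record. A minimal sketch with made-up log lines (the function name triage is illustrative):

```shell
# Hypothetical helper: counts lines mentioning "error" and "disk", ignoring comments.
triage() {
    awk '
        /^#/     { next }          # comment lines: skip all later blocks
        /error/  { errors++ }
        /disk/   { disk++ }        # one line can match several blocks
        END      { printf("errors=%d disk=%d\n", errors, disk) }
    '
}

printf '# disk error in a comment\ndisk error\nnet error\ndisk ok\n' | triage
# errors=2 disk=2
```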
Seeing which programs are listening for connections, and which connections are currently established (can be put into a shell script as-is):
netstat -pan | grep -Ev '\bunix\b' | awk -F'[/ ]+' '
$0~/LISTEN/ {printf("%20s listening on %-20s (PID %s) \n", $8, $4, $7)}
$0~/ESTABLISHED/ {printf("%30s <--connection--> %s \n", $4, $5)}'
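The column alignment in the printf calls above comes from the width in each format: %20s pads on the left (right-aligned), %-20s pads on the right (left-aligned). A small sketch on made-up input, with an assumed function name:

```shell
# Hypothetical helper: shows right-aligned, left-aligned and numeric padding.
align_demo() {
    awk '{ printf("[%8s] [%-8s] [%5d]\n", $1, $1, $2) }'
}

printf 'sshd 22\nnginx 443\n' | align_demo
# [    sshd] [sshd    ] [   22]
# [   nginx] [nginx   ] [  443]
```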
Some more passwd-related summary stuff, this time as a standalone awk script:
#!/usr/bin/awk -f
BEGIN {
    FS=":"
    printf("Looking at line ")
}
{
    printf("%d, ", NR)
    shells[$7]++
    if ($3<1000) subthousand++
    else supthousand++
}
$6~/\/home/  { home++ }
$6!~/\/home/ { nothome++ }
END {
    printf("Done.\n")
    for (shell in shells) {
        printf("%15s users: %d\n", shell, shells[shell])
    }
    printf("Users with /home directories: %d. Users based elsewhere: %d\n", home, nothome)
    printf("Users with UID <1000: %d, >=1000: %d\n", subthousand, supthousand)
}
Save this into a file 'passwdparser' and chmod +x it. Now you can run it like ./passwdparser /etc/passwd.
(NR is the number of records read so far, so it is frequently used for line numbering.)
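For instance, numbering the lines of a file or of standard input takes a single block. This is a sketch; the function name number_lines is made up.

```shell
# Hypothetical helper: prefixes each line with its record number.
number_lines() {
    # with no file arguments awk reads standard input
    awk '{ printf("%3d  %s\n", NR, $0) }' "$@"
}

printf 'alpha\nbeta\n' | number_lines
#   1  alpha
#   2  beta
```

When several files are given, NR keeps counting across them, while FNR restarts at 1 for each file.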
Language notes
AWK has while, do-while, for loops, for-in loops, arrays (which are always associative; functions like split() fill them starting at index 1), a bunch of operators (~ and !~ seem to be the only unusual ones), and more.
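A short sketch of a counting for loop over an array filled by split(), which numbers its pieces from 1. The function name join_reversed and the input are assumptions for illustration.

```shell
# Hypothetical helper: reverses the comma-separated items on each line.
join_reversed() {
    awk '{
        n = split($0, parts, ",")        # fills parts[1] .. parts[n], returns n
        out = parts[n]
        for (i = n - 1; i >= 1; i--)     # walk the indices downward
            out = out "," parts[i]
        print n ": " out
    }'
}

printf 'a,b,c\n' | join_reversed
# 3: c,b,a
```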
String functions include length, sub and gsub (replacing), substr, match, index, printf, sprintf.
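A few of these in action, as a sketch on a made-up line (string_demo is an illustrative name). Note that positions are counted from 1, and that sub and gsub modify their target in place rather than returning the new string.

```shell
# Hypothetical helper: exercises a handful of string functions on each input line.
string_demo() {
    awk '{
        s = $0
        print length(s)                     # number of characters
        print index(s, "wor")               # 1-based position, 0 if absent
        print substr(s, 1, 5)               # 5 characters starting at position 1
        gsub(/o/, "0", s)                   # replace every match, in place
        print s
        print sprintf("%05.1f", 3.14159)    # format into a string
    }'
}

printf 'hello world\n' | string_demo
# 11
# 7
# hello
# hell0 w0rld
# 003.1
```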
Math functions include sin, cos, log, etc., though not min or max.
You can define your own functions.
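Since min and max are missing, they make a reasonable first example of user-defined functions. A minimal sketch, with made-up input and an assumed function name column_range:

```shell
# Hypothetical helper: prints the smallest and largest value in the first column.
column_range() {
    awk '
        function max(a, b) { return (a > b) ? a : b }
        function min(a, b) { return (a < b) ? a : b }
        NR == 1 { lo = hi = $1 + 0 }     # adding 0 forces a numeric value
                { lo = min(lo, $1 + 0); hi = max(hi, $1 + 0) }
        END     { if (NR > 0) print "min=" lo " max=" hi }
    '
}

printf '7\n-2\n15\n3\n' | column_range
# min=-2 max=15
```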