Awk notes
Latest revision as of 19:31, 15 July 2023

📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Intro

Awk is useful for simple transforms of text, particularly text structured by fields, such as comma or tab separated columns.


Awk does matching (grep-like), splitting into fields to match, reformatting (cut/paste/sed/printf/whatnot-like).


It's actually a fairly minimal-syntax programming language (with few types - most things are strings) that for some basic tasks is more succinct than shell scripting, and sometimes it can be handier than sed.

...but if you're going to need any real logic you might as well use none of awk/sed/shell, and go for a more serious scripting language.


Basics

At its most basic, awk parses and walks whitespace-separated (by default) fields within newline-separated records.

For example, it could selectively present parts of the output of ps:

$ ps aux | awk '{ print "User "$1" has been running "$11" since "$9"." }'
User root has been running bash since Nov24.
User me has been running -bash since 10:24.
...


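You can prepend a pattern, usually a regexp, like /regexp/ {command}, often for grep-like filtering. A bare /regexp/ is matched against the whole record; $7~/regexp/ is matched against a single field.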

For example:

  • "When passwd's shell field (7) contains /bin/false or /nologin" (you can also match against $0, the full current record, to make it act rather like egrep):
$ awk -F':' '$7~/(\/bin\/false|\/nologin)/ {print $1" is not a real login"}' /etc/passwd
daemon is not a real login
...


  • "If the home dir field (6) doesn't have 'home' in it":
$ awk -F':' '$6!~/home/ {print $1" is a non-regular user -- homedir is "$6}' /etc/passwd
mysql is a non-regular user -- homedir is /var/lib/mysql
...


  • seeing who is currently logged in, and since when:
$ last | awk '/still logged/ {print $1"\tsince "$4" "$5" "$6"\t(on "$2")" }'
root    since Wed Dec 5 (on pts/0)
root    since Aug 10 00:06      (on :0)
root    since Aug 9 03:38       (on tty1)


Input parsing, output formatting

Variables related to input and output (there are others that are sometimes useful in the middle of processing):

  • FS: Field Separator (on input). Initially a single space, which is special-cased to mean "split on runs of spaces and tabs, ignoring leading and trailing whitespace".
It can also be handed in with -F on the command line. A value longer than one character is treated as a regular expression, so "[/~]" splits on either character, and "[ ]+" on runs of spaces.
  • RS: Record Separator (on input, initially \n)
  • OFS: Output Field Separator (initially a single space)
  • ORS: Output Record Separator (initially \n)
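
A small sketch of how these variables behave, using made-up input instead of a real file so the result does not depend on your system. It assumes a POSIX-style awk; the variable names (split_demo and so on) are only there for illustration.

```shell
# FS as a bracket expression: split on either "/" or "~".
# The commas in print are what insert OFS between the fields.
split_demo=$(printf 'usr/local~bin\n' |
  awk 'BEGIN { FS="[/~]"; OFS=" | " } { print $1, $2, $3 }')
echo "$split_demo"      # usr | local | bin

# Assigning to a field makes awk rebuild $0 using OFS.
rebuild_demo=$(printf 'a:b:c\n' |
  awk 'BEGIN { FS=":"; OFS="-" } { $1 = $1; print }')
echo "$rebuild_demo"    # a-b-c

# RS changes what counts as a record: here records end at ";".
record_demo=$(printf 'one;two;three' |
  awk 'BEGIN { RS=";" } { print NR ": " $0 }')
echo "$record_demo"     # three lines: "1: one", "2: two", "3: three"
```

Note that OFS only shows up where print is given comma-separated arguments (or where $0 is rebuilt); fields glued together like $1" "$2 are plain string concatenation and ignore it.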


For example, a list of all users on one line:

$ awk 'BEGIN {FS=":"; ORS=", "} {print $1}' /etc/passwd
root, daemon, bin, sys, sync, games, man, lp, mail, ...

Blocks, processing in separate steps

You can define multiple blocks:

  • a BEGIN block, executed once before any input is read
  • one (or more, see below) main block(s), unmarked, executed for each record
  • an END block, executed once after all input has been processed

This helps do setup, processing, and reporting, respectively.
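
As a minimal sketch of those three steps, here is a sum over made-up "name:amount" records (the data and the variable name report are hypothetical, chosen so the result is predictable):

```shell
report=$(printf 'alice:3\nbob:4\nalice:5\n' | awk '
  BEGIN { FS=":"; total=0 }            # setup: runs once, before any input
        { total += $2; count++ }       # main block: runs for every record
  END   { printf("%d records, total %d\n", count, total) }   # report: runs once at the end
')
echo "$report"    # 3 records, total 12
```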


User summary:

cat /etc/passwd | awk 'BEGIN {FS=":"; OFS=""; ORS="\n\n"} $7!~/false/ {print "User: "$1"  (UID "$3", in group "$4")\n  Shell:     "$7"\n  Home dir:  "$6"\n  Name:      "$5  }'
 User: backup  (UID 34, in group 34)
   Shell:     /bin/sh
   Home dir:  /var/backups
   Name:      backup
 ...


Using associative arrays to summarize how many users use each shell:

cat /etc/passwd | awk '
   BEGIN {FS=":"} 
   {shells[$7]++}  
   END { for (shell in shells) printf("%15s users: %d\n", shell, shells[shell]) }' 

    /bin/false users: 11
       /bin/sh users: 16
     /bin/bash users: 3
     /bin/sync users: 1


Summing CPU and memory use per user from ps output:

#!/bin/bash
# mentions users that use more than negligible CPU and/or memory
ps --no-headers -axeo user,%cpu,%mem | \
  awk '{usercpu[$1]+=$2; usermem[$1]+=$3} 
       END { for (u in usercpu) { if (usercpu[u]>5 || usermem[u]>5) 
             printf("%15s using  %4d%% CPU  and %4d%% resident memory\n", 
                        u, usercpu[u], usermem[u]) }  }'

       postgres using     4% CPU  and   83% resident memory
       liquids+ using    60% CPU  and    1% resident memory
       www-data using     4% CPU  and   27% resident memory
           root using    52% CPU  and   11% resident memory


Using several main blocks makes sense when each has its own filter.
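
Every record is tested against each pattern in turn, so more than one block can fire for the same line. A minimal sketch with made-up numbers (the counters big and even are just illustrative names):

```shell
classified=$(printf '5\n12\n20\n' | awk '
  $1 > 10      { big++ }     # fires for 12 and 20
  $1 % 2 == 0  { even++ }    # also fires for 12 and 20
  END          { printf("big=%d even=%d\n", big, even) }
')
echo "$classified"    # big=2 even=2
```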


Example from munin:

netstat -s | awk '
/active connections ope/  { print "active.value " $1 }
/passive connection ope/  { print "passive.value " $1 }
/failed connection/       { print "failed.value " $1 }
/connection resets/       { print "resets.value " $1 }
/connections established/ { print "established.value " $1 }'
passive.value 181432543
failed.value 91976
resets.value 2954810
established.value 142


Seeing what programs are listening for connections, and what connections are currently established. (can be put into a shell script as-is):

 netstat -pan | grep -Ev '\bunix\b' | awk -F'[/ ]+' '
   $0~/LISTEN/      {printf("%20s  listening on  %-20s  (PID %s) \n", $8, $4, $7)}
   $0~/ESTABLISHED/ {printf("%30s  <--connection-->  %s \n", $4, $5)}'


Or some more passwd related summary stuff:

#!/usr/bin/awk -f
BEGIN {
   FS=":"
   printf("Looking at line ")
}

{
   printf("%d, ", NR)
   shells[$7]++;
   if ($3<1000) subthousand++
   else         supthousand++
}
$6~/\/home/  { home++ }
$6!~/\/home/ { nothome++ }

END {
   printf("Done.\n")
   for (shell in shells) {
      printf("%15s users: %d\n", shell, shells[shell])
   }
   printf("Users with /home directories: %d.  Users based elsewhere: %d\n", home, nothome)
   printf("Users with UID <1000: %d,   >=1000: %d\n", subthousand, supthousand)
}

Save this as a file named 'passwdparser' and chmod +x it. Now you can run it like ./passwdparser /etc/passwd.


(NR is the number of records read so far, which is why it is frequently used for line numbering.)
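
For instance, a sketch of NR used for numbering and for picking out one line, on made-up input (the variable names are only for illustration):

```shell
# Prefix each record with its number.
numbered=$(printf 'first\nsecond\nthird\n' | awk '{ print NR ": " $0 }')
echo "$numbered"

# NR also works as a pattern; with no action, matching records are printed as-is.
second=$(printf 'first\nsecond\nthird\n' | awk 'NR == 2')
echo "$second"    # second
```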

Language notes

AWK has while, do-while, for loops, for-in loops, associative arrays (the only kind of array; functions such as split() number the elements from 1), a bunch of operators (~ and !~ seem to be the only unusual ones), and more.
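
A rough sketch touching a few of these in one go, run entirely from a BEGIN block so it needs no input. The array name colour and the sample string are made up:

```shell
loops=$(awk 'BEGIN {
  n = split("red green blue", colour, " ")    # fills colour[1]..colour[n]
  for (i = 1; i <= n; i++)                    # indices start at 1, not 0
    printf("%d=%s ", i, colour[i])
  i = n
  while (i > 0) { printf("%s ", colour[i]); i-- }    # same array, walked backwards
  if ("green" ~ /^gr/) printf("match ")       # ~ tests a string against a regexp
  for (k in colour) seen++                    # for-in visits every key, in no fixed order
  printf("%d\n", seen)
}')
echo "$loops"    # 1=red 2=green 3=blue blue green red match 3
```

Because for-in order is unspecified, the sketch only counts keys there; when order matters, loop over 1..n as in the first loop.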

String functions include length, sub and gsub (replacing), substr, match, index, split, and sprintf (printf itself is a statement rather than a function).
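
A quick sketch of what these return, on a made-up string; the comments show the value each line prints:

```shell
strings=$(awk 'BEGIN {
  s = "hello world"
  print length(s)                       # 11
  print substr(s, 1, 5)                 # hello (positions count from 1)
  print index(s, "world")               # 7
  t = s; sub(/o/, "0", t);  print t     # first match only: hell0 world
  t = s; gsub(/o/, "0", t); print t     # every match: hell0 w0rld
  if (match(s, /w[a-z]+/)) print RSTART, RLENGTH    # 7 5
  print sprintf("%05.1f", 3.14159)      # 003.1
}')
echo "$strings"
```

sub and gsub modify their third argument in place (or $0 when it is omitted) and return the number of replacements, which is why the sketch copies s into t first.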

Math functions include sin, cos, log, etc., though not min or max.

You can define your own functions.
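
For example, since max is not built in, a minimal sketch of defining it yourself (the function name and the sample numbers are made up):

```shell
biggest=$(printf '3\n17\n5\n' | awk '
  function max(a, b) { return (a > b) ? a : b }
  NR == 1 { best = $1 + 0 }              # adding 0 forces a numeric value
          { best = max(best, $1 + 0) }
  END     { print best }
')
echo "$biggest"    # 17
```

The $1 + 0 is there to make sure the comparison is numeric rather than a string comparison, under which "5" would sort after "17".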