Awk notes

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Intro

Awk is useful for simple transforms of text, particularly text structured by fields (like csv and such). It seems awk is mostly useful for powerful one-liners.

Awk does matching (grep-like), splits each record into fields that you can match on, and reformats output (cut/paste/sed/printf/whatnot-like).


It's actually a fairly minimal-syntax programming language (with few types, most things are strings) that for some tasks is easier than shell scripting, and sometimes it can be handier than sed.

...but if you're going to need any real logic you might as well use none of those and go for a more serious scripting language.

Basics

At its most basic it parses and walks space-separated (by default) fields within newline-separated records. For example, it could selectively present parts of the output of ps:
$ ps aux | awk '{ print "User "$1" has been running "$11" since "$9"." }'
User root has been running bash since Nov24.
User me has been running -bash since 10:24.
...


The command can take various forms, including /regexp/ {command} and some simple variations with logic and filters.

For example:

  • "When passwd's shell field (7) contains /bin/false" (you can also match against $0, the full current record, to make it act rather like egrep):
$ awk -F':' '$7~/\/bin\/false/ {print $1" is not a real login"}' /etc/passwd
daemon is not a real login
...
  • "If the home dir field (6) doesn't have 'home' in it":
$ awk -F':' '$6!~/home/ {print $1" is a non-regular user -- homedir is "$6}' /etc/passwd
mysql is a non-regular user -- homedir is /var/lib/mysql
...
  • seeing who is currently logged in, and since when:
$ last | awk '/still logged/ {print $1"\tsince "$4" "$5" "$6"\t(on "$2")" }'
root    since Wed Dec 5 (on pts/0)
root    since Aug 10 00:06      (on :0)
root    since Aug 9 03:38       (on tty1)
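
The pattern does not have to be a regexp match on a field. As a minimal sketch of a few other forms (the sample data and the shell variable names are made up for illustration):

```shell
# Made-up sample data: name, department, hours
data='alice dev 12
bob ops 7
carol dev 3'

# 1. Pattern only: the default action prints the whole record (grep-like)
only_dev=$(printf '%s\n' "$data" | awk '/dev/')
echo "$only_dev"

# 2. Action only: runs for every record
names=$(printf '%s\n' "$data" | awk '{ print $1 }')
echo "$names"

# 3. Numeric comparison on one field combined with a regexp on another
result=$(printf '%s\n' "$data" | awk '$3 > 5 && $2 ~ /^dev$/ { print $1 " logged " $3 " hours" }')
echo "$result"
```

The third command should print only "alice logged 12 hours", since bob is not in dev and carol has too few hours.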

Parsing, printing, and memory

Variables related to parsing and printing (there are others that are sometimes useful in the middle of processing):

  • FS: Field Separator (on input; initially a single space, which is a special case: it splits on runs of spaces and tabs, and ignores leading and trailing whitespace).
    Same as -F on the command line. A value longer than one character is treated as a regular expression, so you can split on several characters, e.g. "[/~]" or "[ ]+".
  • RS: Record Separator (on input; initially \n)
  • OFS: Output Field Separator (initially a single space)
  • ORS: Output Record Separator (initially \n)
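
A small sketch of how the input and output separators behave, on made-up input (the shell variable names are only for illustration):

```shell
# A multi-character FS acts as a regular expression: split on "/" or "~"
split_out=$(echo 'home/me~backup/file' | awk -F'[/~]' '{ print $2, $NF, NF }')
echo "$split_out"

# OFS is used between comma-separated print arguments, and when the record
# is rebuilt after assigning to a field ($1 = $1 forces a rebuild)
ofs_out=$(echo 'a b  c' | awk 'BEGIN { OFS="-" } { $1 = $1; print }')
echo "$ofs_out"
```

Note that print $1 $2 (no comma) concatenates the fields directly and does not insert OFS.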


Also

  • a BEGIN block will be done before line processing
  • the END block after line processing
  • the main block (the line processing) is unmarked

This lets you do setup, processing, and reporting, respectively.


For example, a list of all users:

$ awk 'BEGIN {FS=":"; ORS=", "} {print $1}' /etc/passwd
 
root, daemon, bin, sys, sync, games, man, lp, mail, ...


In more readable form:

$ cat /etc/passwd | awk 'BEGIN {FS=":"; OFS=""; ORS="\n\n"} $7!~/false/ {print "User: "$1"  (UID "$3", in group "$4")\n  Shell:     "$7"\n  Home dir:  "$6"\n  Name:      "$5  }'
 
User: backup  (UID 34, in group 34)
  Shell:     /bin/sh
  Home dir:  /var/backups
  Name:      backup
...


Summarizing how many users use each shell (and spreading across lines for visibility - which is mostly your shell allowing this)

$ cat /etc/passwd | awk '
  BEGIN { FS=":" }  
  { shells[$7]++ }  
  END { for (shell in shells) 
           printf("%15s users: %d\n", shell, shells[shell]) 
  }' 
 
    /bin/false users: 11
       /bin/sh users: 16
     /bin/bash users: 3
     /bin/sync users: 1


There are other built-in variables; see the man page. One example is NR, the number of records read so far. You could use it to number lines:

cat file.txt | awk '{print NR" "$0}'
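
NR is often used together with NF, the number of fields in the current record. A minimal sketch on made-up input (the variable name numbered is just for illustration):

```shell
# For each record, print its number, field count and last field;
# in the END block NR still holds the total number of records read
numbered=$(printf 'one two\nthree\nfour five six\n' | awk '
    { print NR ": " NF " field(s), last is " $NF }
    END { print NR " records in total" }')
echo "$numbered"
```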

Scripting

You can have several main blocks, which makes sense when you have filters. While we're at it, though, you might as well make it a script. Put:

#!/usr/bin/awk -f
BEGIN {
   FS=":"
   printf("Looking at line ")
}
 
{
   printf("%d, ", NR)
   shells[$7]++;
   if ($3<1000) subthousand++
   else         supthousand++
}
$6~/\/home/  { home++ }
$6!~/\/home/ { nothome++ }
 
END {
   printf("Done.\n")
   for (shell in shells) {
      printf("%15s users: %d\n", shell, shells[shell])
   }
   printf("Users with /home directories: %d.  Users based elsewhere: %d\n", home, nothome)
   printf("Users with UIDs under and over 1000: %d and %d\n", subthousand, supthousand)
}
...into a file 'passwdparser' and chmod +x it. Now you can run it like ./passwdparser /etc/passwd.

Language notes

AWK has while, do-while, for loops, for-in loops, arrays (all arrays are associative; functions such as split() fill them with indices starting at 1), a bunch of operators (~ and !~ seem to be the only unusual ones), and more.
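
A rough sketch of loops and arrays working together, using made-up input (the array names colours and seen are arbitrary):

```shell
loops=$(echo 'red,green,blue' | awk '{
    n = split($0, colours, ",")          # fills colours[1] .. colours[n]
    for (i = 1; i <= n; i++)
        seen[colours[i]] = i             # array keyed by string
    i = n
    while (i > 0) {                      # walk the list backwards
        printf("%s%d=%s", (i < n ? " " : ""), i, colours[i])
        i--
    }
    printf("\n")
    if ("green" in seen) print "green is at index " seen["green"]
}')
echo "$loops"
```

A for (key in array) loop visits keys in an unspecified order, which is why the sketch uses a counting loop where the order matters.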

String functions include length, sub and gsub (replacing), substr, match, index, printf, sprintf. Math functions include sin, cos, log, etc., though not min or max.
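
A quick tour of some of the string functions, as a sketch on a made-up sentence (the comments show what each line is expected to print):

```shell
strings=$(echo 'The quick brown fox' | awk '{
    print length($0)                 # 19
    print substr($0, 5, 5)           # quick (start position 5, length 5)
    print index($0, "brown")         # 11
    if (match($0, /q[a-z]+/))        # match sets RSTART and RLENGTH
        print RSTART, RLENGTH        # 5 5
    s = $0
    n = gsub(/o/, "0", s)            # replaces within s, returns the count
    print n, s                       # 2 The quick br0wn f0x
    print sprintf("%03d", 7)         # 007
}')
echo "$strings"
```

Positions are counted from 1, and sub/gsub modify their third argument in place ($0 if it is omitted).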

You can define your own functions.
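
For instance, min and max are easy to write yourself. A minimal sketch, with made-up numbers as input (the function and variable names are my own choice):

```shell
maxmin=$(printf '3\n17\n-4\n9\n' | awk '
    function max(a, b) { return (a > b) ? a : b }
    function min(a, b) { return (a < b) ? a : b }
    NR == 1 { hi = lo = $1 + 0 }     # + 0 forces a numeric value
    { v = $1 + 0; hi = max(hi, v); lo = min(lo, v) }
    END { print "min " lo ", max " hi }')
echo "$maxmin"
```

This should print "min -4, max 17".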

Examples

Seeing what programs are listening for connections, and what connections are currently established. (can be put into a shell script as-is):

netstat -pan | egrep -v '\bunix\b' | awk -F'[/ ]+' '
   $0~/LISTEN/      {printf("%20s  listening on  %-20s  (PID %s) \n", $8, $4, $7)}
   $0~/ESTABLISHED/ {printf("%30s  <--connection-->  %s \n", $4, $5)}'


Example from munin:

netstat -s | awk '
/active connections ope/  { print "active.value " $1 }
/passive connection ope/  { print "passive.value " $1 }
/failed connection/       { print "failed.value " $1 }
/connection resets/       { print "resets.value " $1 }
/connections established/ { print "established.value " $1 }'


List processes in 'D' state (uninterruptible sleep, usually waiting on IO):

ps auxw | awk '$8~/D/ { print $0 }'
# although you can get a slightly more controlled version without needing awk:
ps -eo stat,user,comm,pid | egrep '^D'



See also awk usage elsewhere, e.g. that on Harvesting wikipedia, and Linux_admin_notes_-_health_and_statistics#ps