Command line and bash notes


Safer scripts

Shell expansion

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Introduction

Bash shell expansion comes in the kinds covered in the following sections, which are applied in that order.


While powerfully brief, shell expansion is also hard to truly understand, and it depends on environment settings (environment variables, shell options), so it will bite you; if you want something robust, it is best avoided.

Usually the suggestion is to use a scripting language, one where it is easier to be correct, clear, and still brief (...so not perl). Python is an option, since it is ubiquitous on modern linux.


For example, can you say why

for fn in `ls *.txt`; do echo $fn; done

is a problem while

for fn in *.txt; do echo $fn; done

is mostly-fine-except-for-a-footnote-or-two, and what the better-form-yet is?
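For the impatient: the ls version word-splits its output (so filenames with spaces break into pieces), while the plain glob hands filenames over intact - but the unquoted $fn still word-splits at use, and a non-matching glob is passed through as the literal *.txt. A commonly recommended form looks something like:

```shell
# safer loop sketch: glob directly, quote the variable at use,
# and skip the literal '*.txt' you get when nothing matches
for fn in *.txt; do
  [ -e "$fn" ] || continue   # handles the no-match case without bash-only shopt
  echo "$fn"
done
```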


Some examples below are demonstrated via a command called something like argshow. Make this yourself with the following contents and a chmod +x:

#!/bin/bash
printf "%d args:" $#
printf " <%s>" "$@"
echo

brace expansion

Combinatorial expansion:

# echo {a,b,c}{3,2,1}
a3 a2 a1 b3 b2 b1 c3 c2 c1
# echo a{d,c,b}e
ade ace abe
# ls /usr/{bin,sbin}/h*
/usr/bin/h2ph
/usr/bin/h2xs
/usr/bin/h5c++
/usr/bin/h5cc
...
/usr/sbin/httxt2dbm

Sequence expression (integers):

# echo {1..6}
1 2 3 4 5 6
# echo {1..10..2}                                                                                                                                                                           
1 3 5 7 9


Sequence expression (characters, in C locale):

# echo {a..f}
a b c d e f                 


Notes:

  • Expanded left to right.
  • Order is preserved as specified, not sorted.
  • Things stuck to the braces on the outside are treated as preamble (prepended to each result) and postscript (appended to each result), see the second example.
  • A single item without a comma or sequence is not expanded: echo a{b}c gives the literal a{b}c.
  • When using it for filenames, keep in mind that
it generates names without requiring they exist
it happens before pathname expansion (meaning you can combine with globs - and that you should consider cases where they don't expand)
  • May be nested, and is treated flattened(verify)
# echo {1,2}-{{_,-},{X,Y,Z}}                                                                                                                                                                
1-_ 1-- 1-X 1-Y 1-Z 2-_ 2-- 2-X 2-Y 2-Z 
# echo {a,b}{{_,-}{X,Y,Z}}                                                                                                                                                                  
a{_X} a{_Y} a{_Z} a{-X} a{-Y} a{-Z} b{_X} b{_Y} b{_Z} b{-X} b{-Y} b{-Z}
# echo {00{1..9},0{10..50}}
001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 
026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050 
  • can't combine sequence and set (e.g. {1,3..5} works as two string elements)


tilde expansion

The two best-known:

  • ~
    is your shell's $HOME
    (with some footnotes around su)
  • ~username
    is that user's home path


And these can start path expressions, e.g.
ls -l ~/.ssh ~nobody/.ssh

Keep in mind these paths come from the account database and do not necessarily exist (though they usually do).


There are a few others, such as
~+
and
~-
for the current and previous working directory ($PWD and $OLDPWD).
Related to the latter is the special-cased
cd -
(which also prints the directory it switches to).

parameter/variable expansion

On delimiting

The bracket style ${var} delimits unambiguously (it doesn't need whitespace to separate it from other things).

It also allows the conditional replacement mentioned below.


The $var is fine in basic cases.


Example:

o="One";t="Two" ; echo $otfoo ; echo $o t foo ; echo ${o}t foo ; echo ${o}${t}foo


Conditional replacement

Error if not set

${VAR:?message}
where if VAR is unset or null, bash complains with the message.
Note that in an interactive shell this just prints the message; in a non-interactive script it also exits with a nonzero status.
# yn="";echo ${yn:?Missing value}
-bash: yn: Missing value


Return this value if not set

${VAR:-word}
where if VAR is unset or null, (the expansion of) word is returned instead
# Take device from first command line argument, default to eth0 if not given
DEVICE=${1:-eth0}
#!/bin/bash
# Reports all files containing a certain pattern. Call like:  
#   fileswith greppattern [file [file...]]
 
PATTERN=$1
shift # consume that pattern from the command-line argument list
FILES=${@:-*}
grep -l "$PATTERN" $FILES | tr '\n' ' '


"Return this value if not set, also assign to the variable":

${ans:=no}
#If ans was set, keep its value. 
#If ans was not set, will return no and set ans to it.
#nice in that later code can safely assume it is set
echo $ans


"Use given value when set at all"

For example "any actual answer is taken as 'yes', non-answers are unchanged"

yn="";echo ${yn:+yes}
 
yn="wonk";echo ${yn:+yes}
yes

arithmetic expansion

Basically, using
$(( expr ))
evaluates expr according to shell arithmetic rules
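For example (note that shell arithmetic is integer-only, and variables inside (( )) don't need a $):

```shell
i=5
echo $(( i + 3 ))       # 8
echo $(( 7 / 2 ))       # 3   (integer division truncates)
echo $(( 2 ** 10 ))     # 1024
echo $(( 0x10 + 010 ))  # 24  (hex and octal literals work)
```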

command substitution

The following will be replaced by stdout from that command

$(command)
`command`

(the former is mildly preferred in that it has fewer edge cases in parsing characters)


Notes:

  • it's executed in a subshell
  • trailing newlines are stripped
  • note that word splitting applies, except when this appears in double quotes (single quotes would avoid evaluation)
# argshow $(echo a b)
2 args: <a> <b>            
# argshow "$(echo a b)"
1 args: <a b>
# argshow '$(echo a b)'
1 args: <$(echo a b)>
  • $(< file)
    is done without subshell(verify) so is faster than
    $(cat file)
  • can be nested
(backquote style needs escaped backquotes to do so)
echo $(echo $(ls)) 
echo `echo \`ls\``
  • evaluated left-to-right

word splitting


Most of what you need to know:

  • Word splitting is performed on almost all unquoted expansions
  • if no expansion occurs, no splitting will occur either (verify)
  • Will split on any run of the characters in $IFS
if unset, default is whitespaces
if set to empty string (a.k.a null), no splitting occurs


If IFS isn't set, it acts as if it were
 \t\n
(space, tab, newline),
which is why word splitting misbehaves around filenames with spaces in them. One partial workaround is to remove the space from $IFS, i.e. set it to tab-and-newline:
IFS=$'\t\n'   # $'...' interprets the escapes; plain IFS="\t\n" would actually be those four literal characters
for fn in `ls *.txt`; do echo $fn; done
unset IFS # unless you want everything later to behave differently


These delimiters are ignored at the edges (so empty-argument results are avoided)


You can use IFS for other tricks, like:

IFS=":"
while read username pwd uid gid gecos home shell 
do
   echo $username
done < "/etc/passwd"
unset IFS


Notes:

  • Double-quoting suppresses word splitting,
  • ...except for "$@" and "${array[@]}"
  • no word splitting in
    • bash keywords such as ... and case (verify)
    • expansions in assignments
  • You can see what's in IFS currently with something like
    echo -n "$IFS" | od -An -tx1
20 09 0a is space, tab, newline
(echo -n means it won't add its own newline; the doublequoting avoids word splitting :)

pathname expansion
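(This section is empty in these notes; the short version: unquoted words containing *, ?, or [...] are matched against existing pathnames. A small demonstration, run in a scratch directory so the globs are predictable:)

```shell
cd "$(mktemp -d)"
touch a.txt b.txt c.dat
echo *.txt       # a.txt b.txt
echo ?.dat       # c.dat
echo *.missing   # *.missing  (by default, a non-matching glob is left as-is)
shopt -s nullglob
echo *.missing   # (now expands to nothing)
```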

Other notes

Shell stuff I occasionally look up

For



Fixed values and variables

for arg in "$var1" "$var2"; do 
  echo $arg
done

The doublequoting is good practice because you usually want to avoid word splitting on arbitrary values.


Which is mostly a variant of something like:

for planet in Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune; do
   echo $planet
done

or

for lastoct in `seq 2 254`; do
  echo 192.168.0.$lastoct
done

Both rely on word splitting of the unquoted list.


While

Mostly: See test, and a few of the notes for for


syntax error near unexpected token done
...often means you didn't put a semicolon/newline between the condition and do


A poor man's watch, which I use to get shell colors without forcing them:

while true
do
  ls
  sleep 1
done 
 
# Or as a one-liner
while true; do ls; sleep 1; done 
 
# You can use 
#while :           
#while [ 1 ] 
# ...if you find them easier to remember

Notes:

  • :
    is a historical shorthand for
    true
    , and is also sometimes useful as a short no-op

Redirecting, basic

  • <
    feed file into stdin
  • >
    write stdout to file (overwrite contents)
  • >>
    write stdout to file, appending if it already exists

For example:

ls dir1 > listing       # would overwrite each time
ls dir2 >> listing      # would append if exists
sort  <listing  >sorted_listing


By default this applies to stream 1, stdout, because that's where most programs put their most pertinent output.


The standard streams are numbered, and (unless redirected) are:

  • stdin is 0
  • stdout is 1
  • stderr is 2

So e.g.

find >output 2>errors
# or, equivalently
find 1>output 2>errors

Piping


Piping is redirecting between programs.

When starting multiple processes, you redirect an output stream from one to the input stream of another.

For example:

locate etc/ | less
 
cat infile | sort | tee sorted_list | uniq > unique_list

This can also be combined with redirection, e.g.

find . 2>&1 | less   # don't ignore the errors

Redirecting, fancier

You'll want to know that there is some syntax variation (particularly between shells). In bash,

&> filename
>& filename
are equivalent, and short for:
>filename 2>&1
i.e. stdout and stderr are written to the same file, because it says:
write stdout to filename
write stderr to what stdout currently points to



Also, some of this is specific to bash

e.g. dash will trip over
>&
saying Syntax error: Bad fd number.



Multiple redirections are processed in order, left to right. Consider:

prog >x 2>&1 >y
This means:
connect stdout to file named x
connect stderr to what stdout currently points to (which is the file named x) (actually duplicates the file descriptor(verify))
connect stdout to file named y
The net effect is "connect stderr to a file named x, and stdout to a file named y".
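You can check this ordering yourself; emit here is a hypothetical stand-in for any program that writes to both streams:

```shell
# emit writes one line to stdout and one to stderr
emit() { echo OUT; echo ERR >&2; }

cd "$(mktemp -d)"
emit >x 2>&1 >y
cat x    # ERR   (stderr was pointed at x while stdout still pointed there)
cat y    # OUT   (stdout was then moved on to y)
```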

Redirection, less common

The
tee
utility
copies stdin to stdout verbatim, and also writes it to the named file.

This is sometimes a nice streaming thing, though usually just used for command brevity.

# log output and show it live
find / 2>&1 | tee allfiles
 
# writes both sorted and unique list
cat infile | sort | tee sorted_list | uniq > unique_list


<<
pipes in a here document (these are standard bourne/POSIX shell, not bash-specific)
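For example, feeding a few literal lines to sort's stdin (quoting the delimiter, as in <<'EOF', would additionally suppress $-expansion inside the document):

```shell
sort <<EOF
pear
apple
banana
EOF
# prints:
# apple
# banana
# pear
```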


<<<
a here string (bash-specific; also in ksh and zsh)
    • the string undergoes most expansions, but is not word-split or pathname-expanded(verify). Some use it mainly as a shorter alternative to echo "$var" | cmd
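A classic use is read with a here string: because no pipe (and thus no subshell) is involved, the variables survive into the rest of the script:

```shell
read first rest <<< "alpha beta gamma"
echo "$first"   # alpha
echo "$rest"    # beta gamma
```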


The
pv
utility copies stdin to stdout and prints on stderr how fast data is moving through, which can be nice to watch.
It can also show multiple streams. E.g. to test how people's homedirs would compress on average:
tar cvf - /home 2>/dev/null | pv -c -N RAW | pigz -3 - | pv -c -N COMP > /dev/null



piping/catching both stdout and stderr

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.

When you call an external program and read from one stream, you typically use blocking reads for simple 'wait until it does something' logic.

Doing that from both stdout and stderr is a potential problem, in that you can have output on one while not getting any on the other. Usually you can get away with this, but it can produce deadlock-like situations.


Generally, you want to either:

  • use non-blocking reads (probably in a loop with a small sleep to avoid hammering the system with IO)
  • test streams with select() before read()ing
    • In some cases, your OS or language (standard) library does not expose select(), you cannot find the file descriptor to select on, it does not let you select on pipes, or there is some other problem.


Other workarounds:

  • redirect both to the same stream (but that can be annoying to do from an exec()-style call, because you need to wrap it in a shell - redirection is shell stuff)
  • for non-interactive stuff, write both streams to a file, read those after the programs exit

Shell escaping



You'll occasionally create a string to be evaluated in another context (or immediately via expr or backticks) -- and run into problems with escaping/delimiting.

'Not safe' below tends to mean one of:

  • Will open some interpreted, to-be-closed range (e.g. `)
  • Interpreted differently if in script or on command line (e.g. "\")
  • terminates some parse by odd tokenization, such as spaces in filenames

In various cases I prefer a scripting language that more or less forces you to do things in a stricter (if longer) way, simply because I won't spend as much time convincing myself that the bash script is correct, or at least good enough.


single quotes: 'string'

  • Not safe to dump in:
    '
    , possibly more
  • Safe:
    `"$
    (safe as in "not interpreted as anything more than a character")


double quotes: "string"

  • Not safe to dump in:
    !$"`\
    and probably more
  • Safe:
    '
    (a single quote is just a character inside double quotes)


backslashing\ each\ necessary\ character

  • Potentially safer than the above (solves mentioned nonsafe character problems)
  • But: interpretation of backslashes unsafe themselves - or rather, they depend on quotes again:
    • 'single quotes' (no interpretation?)
      • echo '\z' → \z
      • echo '\\z' → \\z
      • echo '\\\z' → \\\z
      • echo '\\\\z' → \\\\z
    • outside quotes
      • echo \z → z
      • echo \\z → \z
      • echo \\\z → \z
      • echo \\\\z → \\z
    • "double quotes"
      • echo "\z" → \z
      • echo "\\z" → \z
      • echo "\\\z" → \\z
      • echo "\\\\z" → \\z


Further notes:

Here documents (those
<<EOF
things) act differently from the above descriptions, apparently behaving like escaping inside backquotes (command substitution) -- but frankly, if your shell scripting gets that complex, you're dangerous to begin with :)

Using escaping from the shell (in most shells, anyway) gets a layer of pre-interpretation that would not be applied in a script (!)

Shell conditionals and scripting

Conditional execution

Say that you have a regularly-running script conceptually like:

collectdata
graphdata > file.png
mv file.png /var/www/mywebserver

...and you want to do some parts only if the earlier bit succeeds.


Basically: Make programs return meaningful return codes (most do), and test for them and use the result.

The short syntax is
&&
('if success') and
||
('if failed').

You can even use both (though this is cheating a little bit), like in:

/bin/true  && echo "Jolly good." || echo Drat.
/bin/false && echo "Jolly good." || echo Drat.


A brief one-liner with bash syntax is to use
&&
, for example:
collectdata && graphdata > file.png && mv file.png /var/www/mywebserver

If this is not a one-liner (e.g. in your crontab) but a longer script, it's probably cleaner to do something like:

collectdata                       || { echo "Data collection failed"; exit 1; }
graphdata > file.png              || { echo "Data graphing failed"; exit 2; }
mv file.png /var/www/mywebserver  || { echo "Moving graph failed"; exit 3; }



For the pedantic: The && and || essentially mean 'if zero return code' and 'if nonzero' -- which is inverted from the way true and false works within almost all programming languages. It's often less confusing if you don't think about the values :)


Control

if, test

In bourne-style scripts you frequently see lines like:

if [ "$val" -lt 2 ]; then
 
if test "$val" -lt 2; then
These two are functionally equivalent. The difference is that
[
(which is an executable, with that somewhat unusual name) looks for a closing ]. People seem to prefer this form for its brevity.


Note that you can negate tests with
!
if test ! -r ~/.hushlogin; then
  echo "La la you haven't shut up motd yet"
fi
 
test ! -d /var/run/postgresql && mkdir -p /var/run/postgresql


Actual tests include: (list needs to be (verify)'d)

integers

  • -eq
    equal
  • -ne
    not equal
  • -lt
    ,
    -gt
    less than, greater than
  • -le
    ,
    -ge
    less than or equal to, greater than or equal to

filesystem

  • -r
    exists and can be read
  • -w
    exists and can be written
  • -x
    exists and can be executed
  • -s
    file exists and isn't empty (size isn't zero)
  • -e
    file exists (may not appear in all implementations(verify))
  • -f
    exists and is regular file
  • -d
    exists and is directory
  • -h
    or -L: exists and is a symbolic link
  • -p
    exists and is a pipe
  • -b
    exists and is a block device
  • -c
    exists and is a character device
  • -S
    exists and is a socket

strings

  • -n
    nonzero string length (you probably want doublequotes around a variable)
  • -z
    zero string length (you probably want doublequotes around a variable)
  • =
    string equality
  • !=
    string inequality
  • Nonstandard: 'lexically comes before' and 'lexically comes after', \< and \>, but be careful: without correct escaping these become file redirection.

boolean combinations -- which are nonstandard

  • -a
    and
  • -o
    or

Other operators test ownership by set or effective user or group, by relative age, by inode equality and others.



On empty/missing arguments

Things can get a little finicky in this case.


Common mistake #1: Unquoted empty variables

Consider that if $var is not set, or an empty string, then

[ $var = '' ]
[ -n $var ]
[ -z $var ]

would expand into:

[ = '' ]
[ -n ]
[ -z ]

The first is an error (bash complains about a unary operator). The second isn't, but doesn't do what you want (a single non-empty argument tests as true). The third happens to work for the same reason. Regardless, you should be in the habit of always using quotes (probably doublequotes):

[ "$var" = '' ]
[ -n "$var" ]
[ -z "$var" ]


Common annoyance #1: No substring test

It's not there.

But it sort of is -- in bash and sh (recent/all?(verify)), you can use case for this, e.g.:

case "$var" in
   *error*)
      echo "Saw error, stopping now"
      exit 0 ;;
   *)
      echo "We're probably good, doing stuff"
      ;;
esac


You could also use the extended test command (see below) but it's less standard.

The following is arguably more generic, because it uses something external (that we know the behaviour of):

grep -o "pattern1" <<< "$var"   && echo "do something"

test and conditional commands

Since
test
and
[
also set the exit code, you see shell script lines like:
# stop now if we are not running interactively
[ -z "$PS1" ]      && return           
 
# source this if it exists
[ -f /etc/bashrc ] && . /etc/bashrc

While experimenting with command success/failure, you may find it useful to show test's exit status, for example:

test -n "`find . -name '*.err' -print0`" ; echo $?


...but this can be confusing -- the logic is wrapped around program exit codes, so 0 is true and nonzero is false, which is the opposite of how most programming logic works. You usually don't need to think about that until you're consuming it as a number. For example:

test 2 -eq 2 ; echo $?
0
test 4 -eq 2 ; echo $?
1



Extended test command

There is the extended test command,

Keep in mind that while test and [ are POSIX, [[ is not.

[[ was adopted by bash from ksh. However, sh, the default shell, may not, depending on whether that's a symlink to bash, dash, ash, or whatever else. This means you probably should not use extended test if you want wide portability. If you can count on your context being bash (e.g. mention it in the hashbang), you can probably get away with it.


In general, you can trust [[ to do less weird crap than [ or test

One of the major reasons is that [[ parses its expression before expansion happens, so the results of expansions are not word-split or glob-expanded. Meaning you don't have to worry about mangling (e.g. unquoted empty variables are not errors).


This also means you can use && and || as boolean operators within a test expression.

Other additions:

  • glob matching, which includes substring matching
[[ abc = a* ]]
[[ $var = *a* ]]
  • regexp pattern matching (≥Bash 3.1)
[[ abb =~ ab+ ]]
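When a =~ match succeeds, capture groups land in the BASH_REMATCH array (putting the regex in a variable first avoids quoting surprises):

```shell
line="error 42 at line 7"
re='error ([0-9]+)'
if [[ $line =~ $re ]]; then
  echo "whole match: ${BASH_REMATCH[0]}"   # error 42
  echo "first group: ${BASH_REMATCH[1]}"   # 42
fi
```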

More notes

If you want to take out some block of code via an if (faster than commenting a lot of lines) then:

if [ false ] # or whatever string in there 

...won't work because bash uses string variables, and the default operation is -n ("is string non-empty").


The shortest thing that does what you want is:

if [ ]

or perhaps:

if [ "" ]

or, if you want something more obvious to passing readers, you could do:

if [ ignore = block ]

case


Note: only in the bourne-style shells (verify)

For example:

case "$var" in
   *pattern1* )            echo "seeing pattern 1" ;;
   *pattern2*|*pattern3* ) echo "seeing pattern 2 or 3" ;;
   * )                     echo "fallback case" ;;
esac

The thing this has over test/if is proper wildcard behaviour.

for

You'll know that a bash script can act as a batch file, running one command after the other in the hope nothing will screw up. Bash, however, offers more useful functionality, in and out of scripts (there is, in fact, no noticeable difference). For example:

for pid in `pidof swipl pl`; do renice 5 $pid; done 
29180: old priority 0, new priority 5
26858: old priority 0, new priority 5

...will re-nice the named processes: for expects a whitespace-separated list, pidof returns a list of pids, and the backquotes (`) mean "replace this with the output of the command".

The above could have been spread among lines:

bash-2.05b $ for pid in `pidof swipl pl`
> do 
>   renice 5 $pid
> done

Something similar goes for if-else constructs. These allow you to construct scripts that catch errors, run differently depending on how other commands fared, on environment variables, and whatnot. Scripting tends to beat real programming for simple little jobs.


You can also loop over files by using a wildcard, but the naive version is generally a bad habit to learn, because it won't work in two situations:

  • when filenames contain spaces (and possibly other less usual but legal characters)
  • when there are so many files that bash expands the command to something longer than it can use (see Argument list too long), although this is less of a problem now

If you want to do it robustly / properly, learn to use find and xargs.

while

while is a conditional loop.

You can do things like

while [ 1 ]; do (clear; df; sleep 5); done
#which imitates   watch -n 5 df

or

let c=0
while [ $c -lt 10 ]; do  # better served by a for
  echo $c; 
  let c=c+1 
done

User input

read reads user input into a variable, for example:

read -p "Do you want to continue? " usercont
echo $usercont

There are some useful options; see help read.
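Flags worth knowing (all standard bash read options): -p for an inline prompt, -s to not echo typed characters (passwords), -t for a timeout in seconds, and -n to return after a number of characters:

```shell
read -p "Continue? [y/n] " -n 1 answer; echo
read -s -p "Password: " pw; echo
read -t 5 -p "Quick, answer within 5s: " quick || echo "(timed out)"
```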

Strings

Substring (by position,length):

# s="foobarquu";echo ${s:3:5}
barqu

Regexp is possible, but strange and limited. Use of awk and/or sed is probably handier.
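That said, plain parameter expansion covers a lot of common string surgery without external tools:

```shell
s="archive.tar.gz"
echo "${s%.gz}"       # archive.tar    (strip shortest matching suffix)
echo "${s%%.*}"       # archive        (strip longest matching suffix)
echo "${s#*.}"        # tar.gz         (strip shortest matching prefix)
echo "${s/tar/zip}"   # archive.zip.gz (replace first match)
echo "${#s}"          # 14             (string length)
```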

sourcing scripts

Usually, running a script means creating a process, and running the listed commands in that process.


When you want to alter the current shell's environment, it is useful to run another script's commands in your shell's own context. This is what
source
is for.
source /etc/profile
 
# (bourne-style?) shorthand:
. /etc/profile



Backgrounding processes

Ctrl-Z, fg and bg

I occasionally see people use a shell only to run a program, typing e.g.
firefox
and then minimizing the now-useless shell. It is simple to background a program instead:
firefox &

(In the case of KDE, you can of course use the run dialog, Alt-F2. The parent of the process will be kdeinit then, I believe.)

If you want the same effect as the & after you didn't initially use it, you can use Ctrl-Z to pause the current foreground process, which should print something like:

[1]+ Stopped       firefox

...which is a shell-specific (bash, here) job management list. You can then run
bg
to have the same effect as the ampersand, or
fg
to continue the program in the foreground as before - making Ctrl-Z an effective pause. You can use the job ids if you want detailed control over more than one process, though I've never needed this.

When a backgrounded program's parent shell is terminated, the program should keep running, although there are likely details there that I've never checked out. For a more certain, permanent, and convenient running-in-the-background solution, use screen (or tmux), which is probably more useful in the first place.

Directory of script being run

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

http://stackoverflow.com/questions/59895/getting-the-current-present-working-directory-of-a-bash-script-from-within-the-s

Console scrolling

Shift-PgUp and Shift-PgDown (often)

Useful for those happy-go-verbose programs, you can scroll back as far as the screen history goes. This usually works in text consoles, and is usually imitated by X terminal consoles.

Note that various things (PuTTy/Konsole/xterm, but also screen) may have their own configurable limit to how many past lines they keep, and in the case of screen, their own way of looking at it (screens are not really regular terminals, after all...)

Shell aliases and functions

Aliases

Aliases are short identifiers that expand to longer things. For bash, the syntax is like setting a variable. Potentially useful examples:

alias webdir="cd /var/www/www/htdocs"                # go to some directory you regularly work in
alias weblogtail="tail -F /var/log/apache2/*"        # watch web server log
alias logs="tail -F /var/log/*.log /var/log/*/*.log /var/log/syslog"  # watch various current logs
alias ..='cd ..'                                     # funky shortcut
Customized alternatives, such as an ls that uses ISO date formats, hides
.
and
..
, hides the group name, hides backups (*~), adds a / to directory names, sorts by mtime (most recent last), uses human-readable sizes, and uses color when appropriate:
alias l="ls -lAhGBptr --time-style=long-iso --color=auto "

Some examples:

alias vf='cd'                            # catch typo
alias duh="du -h "                       # use human-readable sizes
alias dud="du --max-depth=1 -h "         # human, one-deep (often more readable)
alias lslast="ls -lrt "                  # show last modified last
alias lsd='find * -prune -type d -ls'    # 'list directories under curdir'
alias hexdump='od -t x1z'                # "show hex representation, of single bytes at a time, show text alongside"
alias dlpage="wget -r -l 1 "             # save page and direct links
alias lesscol="less -R"                  # less that allows color (...control codes)
alias psgrep="ps aux | grep"             # short way of grepping through process list
alias go-go-gadget=sudo
# change default verbosity
alias df="df -hT "                       # use human-readable sizes and show filesystem type
alias bzip2="bzip2 -v "                  # always be verbose (show compression ratio) when bzipping
alias pstree='pstree -pu '               # always show pid, and show usernames where UID changes
# change default behaviour:
alias grep="egrep "                      # always use extended grep (always have regexp)
alias bc="bc -lq "                       # bc always does float calculations

Notes:

  • aliases can be removed with
    unalias
  • naming an alias the same as the command is possible, but can mean default arguments are hard to negate, or arrive twice, which can cause confusion. You may want to know about
    unalias
  • aliases don't take arguments as such - they expand, and any arguments you give come at the end (note that bash functions can take arguments, so they can be a better choice)
  • if instead of an alias you can use an environment variable (e.g. GREP_COLOR for grep's --color=), that may be preferable, as it is more flexible.
  • some distributions change default behaviour via aliases, such as having rm do "rm -i".

Functions

Bash functions have a somewhat flexible syntax. They look like the following, although the 'function' keyword is optional:

# alternative to cd that lists content when you switch directory
function cdd() { cd "${1}" ; echo "$PWD" ; ls -FC --color ; };


More adaptively:

rot13 () { 
   if [ $# -eq 0 ]; then  #no arguments? eternal per-line translation
      tr '[a-m][n-z][A-M][N-Z]' '[n-z][a-m][N-Z][A-M]'
   else                   #translate all arguments
      echo "$*" | tr '[a-m][n-z][A-M][N-Z]' '[n-z][a-m][N-Z][A-M]'
   fi
}

Notes:

  • Functions can be removed with
    unset -f name
  • Neither aliases nor functions run in a separate (forked) process


Renaming many files

There are two different things called rename out there - one is a perl script, the other a simpler utility from util-linux. Which one you get varies per distribution.



If running
rename
without arguments says:
Usage: rename [-v] [-n] [-f] perlexpr [filenames]


You use it something like:

rename 's/[.]htm$/.html/' *.htm
 
rename 's/[a-z]+_([0-9]+)[.]html$/frame_$1.html/' *.html

The perl expression is typically a regex substitution, but could be any perl code that alters $_.

In a regex, you may often want /g or you'll get only one replacement.




If running it without argument says:

call: rename from to files...

It's a simple string replace and you use it like:

rename '.htm' '.html' *.htm

And e.g:

rename '' 'PrependMe ' *
rename 'RemoveMe' '' *




There's also mmv, which I've never used.

Configurable autocompletion

Bash >=2.04 has configurable autocompletion.

See
man bash
, somewhere under SHELL BUILTIN COMMANDS.


Actions are pre-made completion behaviour.

complete -A directory  cd rmdir         # complete only with directories for these two
complete -A variable   export           # assist re-exports
complete -A user       mail su finger   # complete usernames
complete -A hostname   ping scp         # complete hostnames (presumably from /etc/hosts)


Filter patterns are usually for filenames (-f), to filter out completion candidates, for example to filter out everything that doesn't end in '.(zip|ZIP)' when the command is unzip.

This can be helpful but also potentially really annoying: if I know a file is an archive but doesn't have the exact extension the completion is expecting, you will have to type out the filename (or change the command temporarily).
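As a sketch of that (the exact pattern here is my assumption; distributions ship far more elaborate versions in their bash-completion packages), limiting unzip's completion to zip files:

```shell
shopt -s extglob                        # the @(...) pattern below needs extglob
complete -f -X '!*.@(zip|ZIP)' unzip    # filter out everything NOT ending in .zip/.ZIP
```

Note that -X removes names matching the pattern; the leading ! inverts it, so only the zip files survive.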


Manual filters: You can use a bash function, and inside it do whatever you want, including calling applications to get and process your options (just don't make them heavy ones).

The following example (found somewhere, and rewritten) allows killall completion with the names of the processes that the current user owns:

_processnames() {
   local cur=${COMP_WORDS[COMP_CWORD]}    #the partial thing you typed already
   COMPREPLY=(                                    \
       $( ps --no-headers -u $USER -o comm      | \
       awk '{if($0  ~ /^'$cur'/)    print $0}'  | \
       awk '{if($0 !~ /\/[0-9]$/)   print $0}' )  \
   )
   return 0
}
complete -F _processnames  killall

The first awk takes out things that don't start with what you typed so far; the second filters out the names of some 2.6 kernel threads (in the form processname/0) that you probably can't and don't want to kill.



shopt

shopt sets some of bash's optional shell behaviours (it is a bash builtin, not part of the wider sh family).

Shopt things are set (-s) or unset (-u).

You can see the current state with a
shopt -p
.


Most of the settings are low-level and are probably already set to sensible values.

Things that might interest you include:

  • checkwinsize
    update LINES and COLUMNS environment variables whenever the shell has control. Useful for resizeable shell windows, e.g. remote graphical ones.
  • histappend
    appends to the history file instead of overwriting it. Useful when you often have multiple shells open on the same host.
  • dotglob
    considers .dotfiles in filename completion
  • nocaseglob
    ignores case while completing. This can be useful if you, say, want '*.jpg' to include '.JPG', '.Jpg', etc. files too. (You may wish to be a bit more careful when you have this set, though)



If you generally want case sensitive matching, but sometimes case insensitive matching, say,

ncg ls *.jpg    # case insensitive 
ls *.jpg        # case sensitive

...then you can use a trick to temporarily disable e.g. nocaseglob:

alias ncg='shopt -s nocaseglob; ncgf'
ncgf() {
  shopt -u nocaseglob
  "$@"
}

This works because an alias is evaluated before the main command, a function after.

Key shortcuts

To get an old command, instead of pressing Up a lot, you can search for a substring with Control-R. When you get the one you want, use enter to run it, or most anything else to change it first.



See also

Some shell-fu exercise

Links and sites

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

These are general sites, here partly because they need some place. You may find some of them interesting to read, but none are of the "Read this before you go on" type.


Things to look at:


Here documents

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

You've probably seen scripts with something like:

wall <<EOF
Hello there.
Please be aware the system is going down in half an hour.
EOF

<< feeds the data that follows into stdin of the preceding command: everything up to the token mentioned immediately after it. People often use EOF as a recognizable convention, but it could be xx62EndOfMessageZorp just as easily.

Here documents can be easier than trying to construct an echo command to do your multi-line escaped bidding.
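One detail worth knowing: with an unquoted delimiter, the body undergoes variable and command substitution; quoting the delimiter makes the body literal. A minimal comparison:

```shell
# Unquoted delimiter: $HOME is expanded in the body
cat <<EOF
home is $HOME
EOF

# Quoted delimiter: the body is passed through literally
cat <<'EOF'
home is $HOME
EOF
```

The first prints your actual home directory; the second prints the text $HOME. (There is also <<-, which strips leading tabs so here documents can be indented along with surrounding code.)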



Combination with shell arguments (redirection, piping) look weirdly positioned

wall <<EOF &
Test
EOF

...until you realize that the here-document start is really just a trigger for behaviour that starts after the rest of the command is parsed and evaluated:

strace -eopen workhard <<EOF 2>&1 | grep datafile
Test
EOF 


See also:

Quick and dirty utilities

du with better kilo, mega, giga behaviour

Written to use
du
with size sorting and human-readable size output.

Made to be used as bash functions (sort of like aliases, but allowing further arguments):

function duk()  { du --block-size=1 "${1:-.}" | sort -n | kmg; }
function duk1() { du --block-size=1 --max-depth=1 "${1:-.}" | sort -n | kmg; }
function duk2() { du --block-size=1 --max-depth=2 "${1:-.}" | sort -n | kmg; }


That kmg script (e.g. put it in /usr/local/bin and chmod +x it):

#!/usr/bin/python                                                                                           
""" Looks for initial number on a line. If large, it is summarized in kilo/mega/giga """
import sys, re
 
def kmg(bytes, kilo=1024):
    """ Readable size formatter.                                                                            
        Binary-based kilos by default. Specify kilo=1000 if you want decimal kilos.                         
    """
    mega = kilo * kilo
    giga = mega * kilo
    tera = giga * kilo
    if abs(bytes) > 0.95 * tera:
        return "%.1fT" % (bytes / float(tera))
    if abs(bytes) > 0.95 * giga:
        return "%.0fG" % (bytes / float(giga))
    if abs(bytes) > 0.9 * mega:
        return "%.0fM" % (bytes / float(mega))
    if abs(bytes) > 0.85 * kilo:
        return "%.0fK" % (bytes / float(kilo))
    return "%d" % bytes
 
firstws = re.compile(r'^[0-9]+(?=[\t ])')  # look for initial number, followed by space or tab
for line in sys.stdin:
    m = firstws.match(line)
    if m:
        bytesize = int(line[m.start():m.end()], 10)
        # for du uses, we could filter out below a particular size (if argument given)
        sys.stdout.write("%s %s" % (kmg(bytesize), line[m.end():]))  # using stdout.write saves a rstrip()
    else:
        sys.stdout.write(line)

technical notes

Return codes

Return codes a.k.a. exit status are a number that a process returns.

Often either

  • via a return from the main() function
  • via a function called exit() that also causes the termination

It's regularly treated as an 8-bit value. (It seems to be 32-bit on Windows. In POSIX it's 32-bit internally, and part of it is used by the wait/waitid/waitpid syscalls, but it is masked to 8 bits in what you see elsewhere.)


This can be used in simple shell logic, backing && and || (where 0 means success and anything else an error), and a shell can typically read out the most recent exit code, e.g. in bash:
diff one two ; echo $?


There is no real standard beyond stdlib.h's definition of:

EXIT_SUCCESS as 0
EXIT_FAILURE as 1

(which is a bit funny, since booleans usually work the other way around. It's how if, test, and shell logic all work, though.)


So you can do something meaningful/useful in any specific program

...though you should not expect that meaning to carry around,
...or even stay constant, unless programmers have said so (explicit documentation also helps)
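For example, diff documents its own convention: 0 for identical input, 1 for differences, 2 for trouble. A small sketch that branches on that (the sample files are created just for the demonstration):

```shell
printf 'a\n' > one.txt    # sample files for the comparison
printf 'b\n' > two.txt
diff -q one.txt two.txt >/dev/null
status=$?                 # capture immediately; the next command overwrites $?
case $status in
    0) echo "same" ;;
    1) echo "different" ;;
    *) echo "trouble (exit $status)" ;;
esac
```

This prints "different". Note the immediate capture of $? into a variable: every command resets it, so testing it two commands later is a classic bug.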


Some conventions include:

  • using 1, 2, 3, 4, etc.. for specific reasons as you invent them
  • using -1, -2 in a similar way (and/or)
  • passing through errno (though note those could exceed 255 in theory)
  • using 128+ only for serious errors
  • sysexits.h added some entries a bunch of years later (originating from mail servers, apparently). You see them around, but not very widely.
64       command line usage error 
65       data format error 
66       cannot open input 
67       addressee unknown 
68       host name unknown 
69       service unavailable 
70       internal software error 
71       system error (e.g., can't fork) 
72       critical OS file missing 
73       can't create (user) output file 
74       input/output error 
75       temp failure; user is invited to retry 
76       remote error in protocol 
77       permission denied 
78       configuration error 


  • Bash (mainly meaning bash scripts) seems to add:
126	Command invoked cannot execute	(e.g. Permission problem or command is not an executable)
127	"command not found"		
128	Invalid argument to bash's exit
128+n Fatal error signal n (signals go up to 64 on Linux), e.g.
130 terminated by SIGINT (Ctrl-C)
137 terminated by SIGKILL
143 terminated by SIGTERM
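You can see the 128+n convention directly by having a child shell kill itself:

```shell
bash -c 'kill -TERM $$'   # the child bash dies from SIGTERM (signal 15)
echo $?                   # prints 143, i.e. 128 + 15
```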


See also


tty, pty, pts, and such

  • tty - teletypewriter.
    • broad term: can include physical terminals, virtual terminals (e.g. the text-mode terminals in various unices), and pseudoterminals (see below)
    • also regularly refers to 'the terminal that this process is wrapped in' (which is what the
      tty
      command reports - see its man page).


  • pty - pseudoterminal
    • ...which is a pair of a ptmx (pseudoterminal master) and pts (pseudoterminal slave)
    • see
      man pts
    • most recognizably used in cases like remote logins (e.g. sshd) and graphical terminals


On linux, /dev/tty* are text-mode terminals (getty), while pts entries typically belong to graphical terminal emulators and ssh sessions

(may be useful when inspecting the output of things like
last
)


Be lazy, type less

Tab autocompletion

You don't have to type out long names. Most shells will autocomplete both command names and file names, up to the point of ambiguity.

For example, if you have three files in your current directory:

Iamalongfilenameyoubet
Iamalongfilenametoo.verboseout
inane-innit 

You can complete to the third with e.g.

cat in<Tab>

and the second with:

cat I<Tab>t<Tab>

Pressing tab twice at a point of ambiguity will show the options. For example, ps<Tab><Tab> will likely list ps2pdf, ps2pdf13, ps2pdfwr, ps2ps, psscale, pslatex and more.


Using your history

In bash, there are two basic tools to use commands from your history.


I prefer to use only the search feature: Ctrl-R, then type a substring. If you want to change it before running it, make sure to accept your choice with some key that is not Enter.


In bash, there is also the exclamation mark. Other shells have similar functionality, but the details will differ.

It does not autocomplete, so the most use I see in it is repeating a very recent command you know you can safely use verbatim. For example, if you recently used a long command line (e.g. pdflatex, bibtex, pdflatex, pdflatex) you can repeat the entire thing with:

!pdfl


Backgrounding, pausing, and detaching processes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


backgrounding, job control (Ctrl-Z, bg, fg, effective pausing)


Most shells have job control. The following mentions bash's

Job control means you can run multiple things from one shell, and the shell need not be occupied and useless while it's running something.


Say you want to update the database that backs the
locate
command. The update command is
updatedb
, and it takes a while. Running
updatedb &
will start the program,

but disconnect its stdin from your terminal. (Not that this particular program asks for any further input via stdin, but in other cases that can be a problem.)

In other words, it now runs in the background.

Its stdout and stderr are still connected to your terminal (so it'll spout any output while you're doing other things -- in updatedb's case mostly warnings), and the process is still the shell's direct child (so will be killed when your terminal quits).


When a program occupies your terminal, Ctrl-Z will suspend it (via SIGTSTP), disconnecting it from your input and effectively pausing it.

If you follow that with a
fg
, it continues running in the foreground. (Sometimes this is a convenient way to pause a program, though anything watching the time may get confused, so this mostly makes sense for simple shell utilities) If you follow it up with a
bg
, it will continue running as a background process,

which is functionally equivalent to having started the process with & (note: bg and fg are bash-specific. other shells do job control differently)
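The same mechanics in a non-interactive sketch (Ctrl-Z itself only makes sense at an interactive prompt):

```shell
sleep 2 &            # start in the background; the shell is free immediately
pid=$!               # $! holds the PID of the most recent background job
jobs                 # list this shell's jobs (should show the sleep as Running)
wait "$pid"          # block until it finishes; wait returns the job's exit status
echo "finished with status $?"
```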



avoiding dependency on starting process

When a shell starts a process, it is the child of that shell. Normally, killing a process with children means the HUP signal is sent to each child, a message meaning "controlling terminal is closed". The default signal handler for HUP terminates a process.

A shell is itself the child of something -- with SSH login it's the sshd process for the network connection, with local graphical login it's the xterm, itself a child of your window manager, which is a child of your login session, etc. Particularly in the graphical login case, this default to clean up is a useful thing.


There are actually two relevant ways a child and parent are related. One is the process tree cleanup described above. The other is how the stdin/stdout/stderr streams are attached (by default to the controlling terminal), because closing one end tends to break the program on the other end.


When you want to run a job that may take a while, both of the above mean it may quit purely because its startup shell was closed. If you are running long-term jobs, this is too fragile.

This is where nohup is useful. Nohup tells the process it is about to start to ignore the HUP signal, which means that when its parent stops, the process becomes a child of the init process (which is always running) instead of dying with it.

The nohup utility will also not connect the standard streams to the controlling terminal. Instead, stdin is connected to /dev/null, stdout is written to a file called nohup.out (in the current directory, or the home directory as a fallback), and stderr goes to stdout (so that errors also end up in nohup.out).
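In practice it is usually clearer to redirect output explicitly rather than rely on nohup.out. A runnable sketch (the echoed messages stand in for a real long-running job):

```shell
# start a job immune to HUP, with output somewhere predictable
nohup sh -c 'echo started; sleep 1; echo done' > job.log 2>&1 &
echo "backgrounded as PID $!"
wait            # in a real session you would just carry on instead
cat job.log
```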


If you started a process that you want to become immune to HUP without having to restart it, your shell may provide for this. In bash, the command is
disown
(with a jobspec)



on bash jobspecs
In bash,
jobs
will give you a list like:
[1]   Running                 sleep 200 &
[2]   Running                 sleep 200 &
[4]   Running                 sleep 200 &
[5]-  Running                 sleep 200 &
[7]+  Running                 sleep 200 &

Use of jobspecs looks like:

disown %4
kill %1      # this kill is a bash builtin, /bin/kill won't understand this

There's more, but I've never needed the complex stuff.



Notes:

  • some shells have their own nohup builtin, which supersedes the nohup executable. (example: csh's built-in nohup acts differently from the standalone utility. In particular, it does not redirect stdout and stderr)


See also:



Limitations and problems

TODO


Changing to common directories

When working on a project or dataset is likely to take a while, I like to have a few-key method of going there.

Using tmux/screen solves that half of the time (because you return to a shell in the right directory), but it's still nice to allow new shells to move quickly.

The simple way is an alias:

alias work="cd /home/proj/code/mybranch"


I've worked at a place where you would commonly want to go to directories with known names that were annoying to type out, or even to complete.

While you can't get a subprocess to change directory for you (its environment is its own), a cd in a bash function applies to the shell it's called from, so you can do:

workdir () {
    cd "$(/usr/local/bin/resolve-workdir.py "$@")"
}

My version of that script

  • was hardcoded to glob a few directories for subdirectories you might want to go to,
  • did a case-insensitive substring match,
  • ...and when that was ambiguous, or matched nothing, it output the current directory (to effectively not change directories) and printed some friendly information on stderr.
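As a sketch of that idea (the roots and matching rules here are my guesses, not the original script):

```python
#!/usr/bin/python3
# Hypothetical resolve-workdir sketch: print a directory for the shell function to cd into.
import os
import sys

ROOTS = ['/home/proj', '/data']     # placeholder roots; adjust to taste

def resolve(query, roots=ROOTS):
    """ Case-insensitive substring match against subdirectories of the roots.
        On exactly one hit, return it; otherwise explain on stderr and
        return the current directory, so the caller's cd is a no-op. """
    hits = []
    for root in roots:
        if not os.path.isdir(root):
            continue
        for name in sorted(os.listdir(root)):
            full = os.path.join(root, name)
            if os.path.isdir(full) and query.lower() in name.lower():
                hits.append(full)
    if len(hits) == 1:
        return hits[0]
    if hits:
        print('ambiguous: %s' % ', '.join(hits), file=sys.stderr)
    else:
        print('no match for %r' % query, file=sys.stderr)
    return os.getcwd()

if __name__ == '__main__':
    print(resolve(' '.join(sys.argv[1:])))
```

Printing the current directory on failure keeps the calling shell function simple: cd always gets a valid argument.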


tweaking less

You can set other behaviour via options in the LESS environment variable, for example:

export LESS="-SaXz-5j3R"

Some options that I've used:

  • -S: crop lines, don't wrap. Means it acts consistently as if you have a rectangular window on the text. (By default, less does line wrapping when you are positioned to the left, and disables line wrapping once you look to the right at all. Since I'm usually looking at data or code, not man-page-like text, I find this annoying)
  • -a: while searching and asking for the next match, those currently visible are considered as having been seen, and won't be paused at. Useful when there are a lot of hits on the same screen
  • -X: don't clear the screen (meaning very short files are shown inline in the terminal)
  • -z-5: PgUp/PgDn will scroll (a screenful minus five) lines instead of an exact screenful. I like having this context when reading code and text.
  • -j3: search results show up on third line instead of top line, for some readable context (negative number has different meaning; see man page)
  • -R: Allow through ANSI control codes (mostly for colors), and try to estimate how that affects layout so that we don't mess up layout too much. It's probably a good idea to set this conditionally, e.g. have bashrc include something like:
case "$TERM" in
    xterm*|rxvt*|screen*)
        # allow raw ANSI too
        export LESS="-SaXz-5j3R" ;;
    * )
    export LESS="-SaXz-5j3" ;;
esac
  • -n: Suppress line numbering. Very large files load faster. Does disable some line-related features, but I rarely use them.
Note that you can also cancel the line-number calculation while less is busy doing it, with a single Ctrl-C


Further notes:

  • less filename and cat filename | less are not entirely identical.

When less takes input from stdin (the second way above), it will show contents more or less verbatim. When invoked directly, less may apply preprocessing.

  • Preprocessing basically means less runs something on the data, depending on what it is. For example to show the decompressed version of files, rendering HTML via a text-mode browser, showing the text from a PostScript file, showing music's metadata, colorizing code, and more.
see 'input preprocessor' in the man page for more details
  • less is usually the default PAGER, a variable which contains the executable that programs can call to show long content. For example, man uses PAGER.