Command line and bash notes

From Helpful
Revision as of 02:01, 10 January 2011 by Helpful


xargs' job is to take arguments from stdin, build commands, and execute them. It is often used in combination with find, and it is one solution to the "argument list too long" problem, as well as a way to handle filenames with spaces and such. For example, to print all filenames (because xargs's default program to run is /bin/echo):

 find . | xargs

find and xargs

find tricks

File and path

File name (globs)

A simple and useful filter can be something like:

find . -name '*.gz' -print0 | xargs -0 gunzip

The -name and -iname arguments allow you to use case sensitive and case-insensitive globs that find itself evaluates (when you prevent bash from doing so, hence the single quotes), and -type will allow you to look for directories, files, symlinks and more. For example:

find /etc -iname 'co*.*' -type f | xargs file

A simpler but less exact alternative is:

find /etc | grep co | xargs file

For filename-safe handling, see Character robustness below. It's less typing (but more thinking escapewise) to do the same via find's -exec option. I don't like it because the escaping is hard to get right, but it does work:

find /etc -exec ls '{}' \;

However, xargs is a more powerful and probably easier alternative.

Filename (regex)

You can get regular expression treatment with -regex and -iregex.


Full path (globs)
-wholename, -iwholename


Date/time

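You can test a file's access time, status change time (ctime, which is not creation time: it also changes on chmod, chown, rename and the like), or data modification time: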

  • in days using -atime, -ctime and -mtime, or in minutes using -amin, -cmin and -mmin
  • Negative numbers mean 'in the last ...' (relative to current system time).

For example, "find files whose status changed within the last day" and "find files modified within the last ten minutes":

find /etc -ctime -1
find /etc -mmin -10
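
The sign conventions are easy to mix up, so here is a small sketch that tries them on throwaway files. It assumes GNU touch (for -d with a relative date) and GNU find (for -printf); the file names are made up for the illustration.

```shell
# Sketch: time-based tests on throwaway files (GNU touch and GNU find assumed).
d=$(mktemp -d)
touch "$d/fresh"                     # modified just now
touch -d '2 hours ago' "$d/stale"    # GNU touch: backdate atime and mtime
touch -d '3 days ago' "$d/ancient"

# -N means "less than N ago", +N means "more than N ago"
recent=$(find "$d" -type f -mmin -10 -printf '%f\n' | sort | xargs)
older=$(find "$d" -type f -mmin +60 -printf '%f\n' | sort | xargs)
# -mtime counts whole 24-hour periods, so +1 means "at least two days old"
days=$(find "$d" -type f -mtime +1 -printf '%f\n' | sort | xargs)

echo "modified in the last 10 minutes:  $recent"
echo "modified more than an hour ago:   $older"
echo "modified more than a day ago:     $days"
rm -rf "$d"
```

Note that the -atime/-ctime/-mtime tests discard fractional days before comparing, which is why +1 does not match something that is, say, 30 hours old.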

Permissions, ownership

Permissions

You can look for exact permissions (an exact bit test), such as things within your home directory that are readable, writeable and executable by everyone:

find ~ -perm 777            #wide open things in your home directory

Query style variation:

  • has at least all of the mentioned bits (note the leading -; without a prefix the test is exact), e.g. 'writeable by group and other'
find ~ -perm -g+w,o+w
  • any of the mentioned bits, e.g. "writeable by group or other" (which is probably the test you wanted instead of the last):
find ~ -perm /g+w,o+w
  • any of the mentioned bits, numerically (identical to the last):
find ~ -perm /022

Note: some systems / setups create a unique group for each user. In this case, /002 (o+w) would be more useful. Older versions of GNU find also accepted +022 as a synonym for /022, but that form was deprecated and later removed.

  • 'does not have bits' has to be handled with a negation. For example, things that are not group-writable:
find ~ ! -perm /g+w
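
To see how the exact, "all of" (-) and "any of" (/) forms differ, the following sketch runs them against four throwaway files with known modes. It assumes GNU find (symbolic modes in -perm, and -printf); the file names and the names helper are made up for the illustration.

```shell
# Sketch: compare the -perm query styles on files with known modes.
d=$(mktemp -d)
touch "$d/plain" "$d/groupw" "$d/bothw" "$d/otherw"
chmod 644 "$d/plain"     # nobody but the owner can write
chmod 664 "$d/groupw"    # group-writable only
chmod 666 "$d/bothw"     # group- and other-writable
chmod 646 "$d/otherw"    # other-writable only

# Hypothetical helper: print matching base names on one line
names() { find "$d" -type f "$@" -printf '%f\n' | sort | xargs; }

exact=$(names -perm 664)          # mode is exactly 664
all=$(names -perm -g+w,o+w)       # has both bits (and maybe more)
any=$(names -perm /g+w,o+w)       # has at least one of the bits
nogw=$(names ! -perm /g+w)        # negation: not group-writable

echo "exact 664:        $exact"
echo "all of g+w,o+w:   $all"
echo "any of g+w,o+w:   $any"
echo "not g+w:          $nogw"
rm -rf "$d"
```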


Ownership

Say you administer your web content as root. You'll inevitably have some things owned not by the web server's user but by root. This doesn't always matter - web servers often just read content and files generally have world-read permissions. Still, if you have any CGI scripts, there may be suid/sgid bits and you may as well play safe. A simple and drastic fix is to hand anything that isn't owned by apache to apache:

find /var/www ! -user apache -print0 | xargs -0 chown apache:

Note that if you just want everything owned by user apache, and group apache, a recursive chown like chown -R apache:apache /var/www is a lot simpler.


I omit groups from that example because I personally have one case where the group not being apache is functional. Still, I wouldn't want it to be root, or an id that doesn't exist on this system. I would use something like:

find /var/www \( -group root -o -nogroup \) -print0 | xargs -0 chown :apache

In other words, "make the group apache in case it was in group 'root' or it had none".
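
One thing to watch with -o: find's implicit "and" binds tighter than "or", so an action such as -print0 written after an -o chain only applies to the last alternative unless the alternatives are grouped with \( \). A small sketch of the difference, using -name tests on made-up throwaway files so it can run without root:

```shell
# Sketch: why an -o chain followed by an action needs parentheses.
d=$(mktemp -d)
touch "$d/one.a" "$d/two.b" "$d/three.c"

# Parsed as:  -name '*.a'  -o  ( -name '*.b' -a -print )
# so only the *.b match is ever printed.
ungrouped=$(find "$d" -name '*.a' -o -name '*.b' -print | wc -l)

# Grouped: the action applies to whatever either test matched.
grouped=$(find "$d" \( -name '*.a' -o -name '*.b' \) -print | wc -l)

echo "without parentheses: $ungrouped file(s)"
echo "with parentheses:    $grouped file(s)"
rm -rf "$d"
```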

-nouser and -nogroup match files whose numeric ids do not map to a user or group on this system. This happens mostly when you extract from tars, because the numeric ids recorded in the archive may not exist locally. (Yes, this is potentially a security problem when you extract as root - but only if you can't trust the archive contents anyway, in which case you're not being a careful root).


Security

When there are files that don't have a valid user or group (these items show up as numbers instead of names in listings), this often comes from unpacking a tar archive as root, which restores the numeric IDs recorded in the archive instead of making the files owned by the unpacking user. It could also mean that you clean /tmp less often than you remove users, or even that someone has intruded. Try:

find ~ -nouser -or -nogroup


A file's mode consists not only of permissions but also includes entry type and things like:

1000 is sticky
2000 is sgid
4000 is suid

If you remember one form, it's probably the 'has any of the bits mentioned' one, for example looking for files that have SUID or SGID set by using:

find / -perm /6000 -type f

Type

You can ask find for a specific type of entry. This can be useful if, say, you want to avoid handing directory names to tar, which would recurse into them (see backup for example).


Example: "What does the /proc directory tree look like, ignoring those PID-process things?"

find /proc -type d | egrep -v '/proc/[0-9]*($|/)' | less

The interesting types are directory (d), regular file (f) and symlink (l). (The others are: character device (c), block device (b), FIFO, a.k.a. named pipe (p), and socket (s))

If you want to dereference symlinks and get the type of whatever it points to, use -xtype instead of -type.
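
A minimal sketch of the -type versus -xtype difference, on a throwaway directory with made-up names (GNU find assumed for -xtype, -mindepth and -printf):

```shell
# Sketch: -type looks at the entry itself, -xtype at what a symlink points to.
d=$(mktemp -d)
mkdir "$d/realdir"
touch "$d/file"
ln -s realdir "$d/link"        # symlink to a directory
ln -s missing "$d/dangling"    # symlink whose target does not exist

links=$(find "$d" -mindepth 1 -type l  -printf '%f\n' | sort | xargs)
dirs=$(find "$d"  -mindepth 1 -type d  -printf '%f\n' | sort | xargs)
xdirs=$(find "$d" -mindepth 1 -xtype d -printf '%f\n' | sort | xargs)

echo "-type l:  $links"
echo "-type d:  $dirs"
echo "-xtype d: $xdirs"
rm -rf "$d"
```

A dangling symlink cannot be dereferenced, so it does not show up under -xtype d.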

Size

Size can be used in queries like "find large logs" and "find large temporary files that haven't changed in a month"

find /var/log -size +10M -ls
find /tmp /var/tmp -size +30M -mtime +30 -ls

In cases like these, getting the file listing with the -ls option is useful and saves a -print0 | xargs -0 ls -l.


"I can't remember what my small script file was called but want to grep as little of my filesystem for it as possible," that is, "grep only regular files smaller than, say, 10KB":

find / -type f -size -10k -print0 2>/dev/null | xargs -0 grep -E '\bsomeknowncontent\b'

Note that without k, M or G as a byte-implying unit, the number is in 512-byte blocks. You can explicitly indicate bytes by using c, which you need in order to test for particularly small files:

find /tmp -size -100c


Note you can easily get a range test because arguments are anded together. For example, between 10kB and 20kB:

find /tmp -size +10k -size -20k
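
The size tests can be tried on files of known size. This is a sketch assuming GNU find, which rounds sizes up to whole units before comparing; the file names and the names helper are made up for the illustration.

```shell
# Sketch: size tests against throwaway files of known size.
d=$(mktemp -d)
head -c 50    /dev/zero > "$d/tiny"
head -c 5000  /dev/zero > "$d/small"
head -c 15000 /dev/zero > "$d/medium"
head -c 25000 /dev/zero > "$d/large"

# Hypothetical helper: print matching base names on one line
names() { find "$d" -type f "$@" -printf '%f\n' | sort | xargs; }

under100=$(names -size -100c)            # fewer than 100 bytes
range=$(names -size +10k -size -20k)     # both tests must hold
big=$(names -size +20k)

echo "under 100 bytes:  $under100"
echo "10k to 20k:       $range"
echo "over 20k:         $big"
rm -rf "$d"
```

Because of the rounding, tests with small numbers of large units can surprise you (a 1-byte file counts as 1k), which is another reason to use c for small files.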

Unsorted

With some additional shell-fu you can get creative, such as "grep text and html files - but not index.html - and report things that contain the word elevator in four or more lines"

find . -type f -print0 | egrep -iazZ '(\.txt|\.html?)$' | grep -vazZ 'index.html' | \
     xargs -n 1 -0 grep -c -Hi elevator | egrep -v ':[0123]$'

--

I prefer not to use the and and or logic that find provides. It doesn't react the way I think it should (and the file stat() will happen anyway, whether the filename filter happens now or later). Of course, you do need to keep track of your greps and egreps, and arguably it's too much manual work, but still...

--

When you're not going for xargs, you may want to get detailed results. When you use find -ls you get a listing with more detail than you'd get from ls -l.


Issues

Number of files per command, argument ordering

Xargs hands a fair number of arguments to the same command. This is faster than running the command once per filename, because each run has process startup overhead.


This does mean that it only works with programs that take arguments in the form

command input input input

A good number of programs work like that, but certainly not all. Some take:

command singleinput 
command input explicitoutputfilename
command input input output

In these cases, xargs's default of adding many arguments will cause errors in the best case, or overwrite files in the worst (think of cp and mv).

The simplest solution is to force xargs to do an execution for each input:

find . | xargs -n 1 echo
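
The batching is easiest to see with echo and a fixed input instead of find output. A minimal sketch:

```shell
# Sketch: how -n changes the number of command invocations.
# Each output line below corresponds to one run of echo.
batched=$(printf '%s\n' a b c | xargs echo | wc -l)       # one run: echo a b c
single=$(printf '%s\n' a b c | xargs -n 1 echo | wc -l)   # three runs
pairs=$(printf '%s\n' a b c | xargs -n 2 echo)            # echo a b, then echo c

echo "default: $batched run(s), -n 1: $single run(s)"
echo "-n 2 output:"
echo "$pairs"
```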


In cases where something pre-determined has to come last, like cp or mv to a directory, you can do something like:

find / -iname '*.txt' | xargs --replace=@ cp @ /tmp/txt

You could use any character (instead of @ used here) that you don't otherwise use in the command to xargs (whether it appears in the filename data it hands around is irrelevant).
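
A sketch of the same idea on throwaway directories, combined with null-separated names so that a filename with a space survives. It assumes GNU xargs (--replace is the long form of -I); the directory and file names are made up.

```shell
# Sketch: copy matching files into a fixed target directory, one cp per file.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/plain.txt" "$src/with space.txt" "$src/skip.log"

# @ is replaced by each input item; the target directory stays last.
find "$src" -type f -name '*.txt' -print0 | xargs -0 --replace=@ cp @ "$dest"

copied=$(find "$dest" -type f | wc -l)
has_space=no
[ -f "$dest/with space.txt" ] && has_space=yes

echo "copied $copied file(s); name with a space arrived intact: $has_space"
rm -rf "$src" "$dest"
```

Using a replacement string implies one command execution per input item, so this is as slow as -n 1.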

Character robustness

Unix filenames can have almost any character, including spaces and even newlines, which command line tools don't always like. For robustness, hand xargs filenames separated not with spaces or newlines, but with null (0x00) characters. Find, xargs, grep and some other utilities can do this with the appropriate arguments:

find /etc -print0 | xargs -0 file

Various things you can usually use for piping may not accept and/or produce this format, so try to do what you can with find's options; see find tricks.

Grep can work, since you can tell it to treat its input and output as null-delimited (--null-data, or -z). The -Z (--null) option additionally null-terminates any file names grep itself prints, and is harmless here. You should also disable the 'is this binary data?' guess with -a, since it would otherwise sometimes say "binary file ... matches" instead of giving you matching filenames:

find /etc -print0 | grep -azZ test | xargs -0 file

Notes:

  • For xargs, -0 is the shorter form of --null. For other tools, look for -0, --null, and -z
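
To see why the null separation matters, this sketch creates one file with a newline in its name (names made up) and counts what a newline-based and a null-based consumer would see:

```shell
# Sketch: a newline inside a filename breaks line-based counting.
d=$(mktemp -d)
touch "$d/plain" "$d/$(printf 'two\nlines')"    # second name contains a newline

# Line-based: the odd name is seen as two separate entries.
by_newline=$(find "$d" -type f | wc -l)
# Null-based: count the terminators, one per real file.
by_null=$(find "$d" -type f -print0 | tr -cd '\0' | wc -c)
# xargs -0 runs the command once per real file here (-n 1).
runs=$(find "$d" -type f -print0 | xargs -0 -n 1 sh -c 'echo run' sh | wc -l)

echo "newline-separated entries: $by_newline"
echo "null-separated entries:    $by_null"
echo "commands run by xargs -0:  $runs"
rm -rf "$d"
```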

Null aware convenience aliases

It's tedious to always write all the above out, which leads to lazy 'eh, I can probably get away with it' decisions. Me, I want convenient shortcuts that make it simple to do things right. I settled on:

_find() { `which find` "$@" -print0; };
alias find0="_find "
alias xargs0="xargs -0 -n 1 "
alias grep0="grep -azZ "
alias egrep0="grep -azZE "
_locate() { `which locate` "$@" | tr '\n' '\0'; };
alias locate0="_locate "
alias sort0="sort -z "

You can use these like their originals:

find0 /dev | grep0 sd | xargs0 file

Notes:

  • ...these definitions probably need tweaking in terms of arguments, particularly for find0
  • _find has to be a function because the print0 needs to come after the expression (I'm not sure whether there are side effects; I haven't really used bash functions before). The which is there to be sure we use the actual find executable.
  • find0 and locate0 are defined indirectly (aliases wrapping the _find and _locate functions). This isn't strictly needed: bash function names may contain digits, they just can't start with one.
  • You could overwrite the standard utilities behaviour to have them always be null-handling (e.g. alias xargs="xargs -0") but:
    • it may break other use (probably not scripts, since bash only expands aliases in interactive shells unless expand_aliases is set), but more pressingly:
    • you need to always worry about whether you have them in place on a particular computer. I'd rather have it tell me xargs0 doesn't exist than randomly use safe and unsafe versions depending on what host and what account I run it.
  • I added one-at-a-time behaviour on xargs (-n 1) because it's saner and safer for a few cases (primarily commands that expect 'inputfile outputfile' arguments). And you can always add another -n in the actual command. (I frankly think this should be the default, because as things are, in the worst case you can destroy half your argument files, which I dislike even if it's unlikely.)
  • I use locate fairly regularly, and it doesn't seem to know about nulls and apparently assumes newlines don't appear in filenames (which is not strictly a guarantee, but neither is it a strange assumption). Hence the locate0 above, even if it's sort of an afterthought.
  • sort0 is there pretty much because it's easier to remember the added 0 than to remember whether the command needs to be told -0, -z, -Z, --null, or --null-something.
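
As one possible variation, the same shortcuts can be written purely as shell functions, which also work inside scripts (where aliases are normally not expanded). This is a sketch, not a tested replacement for the definitions above; it uses the command builtin instead of which to skip function lookup, and the file names are made up.

```shell
# Sketch: function-only versions of the null-aware shortcuts.
find0()  { command find "$@" -print0; }        # -print0 goes after the expression
xargs0() { command xargs -0 -n 1 "$@"; }       # one item per command run
grep0()  { command grep -az "$@"; }            # -z: null-delimited in and out

d=$(mktemp -d)
touch "$d/sda report" "$d/other"

# Run from inside the directory so only the file names can match the pattern.
hits=$(cd "$d" && find0 . -type f | grep0 sda | xargs0 basename)

echo "matched: $hits"
rm -rf "$d"
```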

I'm open to further suggestions.

Further experiments in convenience functions: find+grep

These could be a little simpler for trivial cases:

# Note that the first argument goes to find, the rest to grep 
# (so that you can easily use -i and other grep options)
 
findgrep() {
  if [ $# -lt 2 ]; then 
    echo "Needs (at least) two arguments: path, grep argument(s)." >&2
    return 1   # non-zero status signals failure
  fi
  local p=$1; shift
  # quote the expansions so paths and patterns with spaces survive
  `which find` "$p" -print0 | `which grep` -azZ "$@"
}
 
findegrep() {
  if [ $# -lt 2 ]; then 
    echo "Needs (at least) two arguments: path, grep argument(s)." >&2
    return 1
  fi
  local p=$1; shift
  # grep -E is the same thing as egrep
  `which find` "$p" -print0 | `which grep` -azZE "$@"
}

Which allows easily wading through files without worrying about odd filenames:

findgrep . issn                          
findegrep . 'beaut.*py' -i               #case insensitive regex

And of course:

findgrep . issn | xargs0 file



Further experiments in convenience functions: mv and cp helpers

When you want to use certain operations a lot, you could write things like:

xcp() { 
  if [ $# -ne 1 ]; then echo "Needs one argument, the target path" >&2; return 1; fi
  `which xargs` -0 --replace=@ cp @ "$1"
}
xmv() {
  if [ $# -ne 1 ]; then echo "Needs one argument, the target path" >&2; return 1; fi
  `which xargs` -0 --replace=@ mv @ "$1"
}
#Allowing things like (note the patterns are regexes, not globs):
findgrep . '\.txt$' -i | xmv ./textfiles
findegrep . '\.html?$' -i | xmv ./htmlfiles

Notes:

  • untested; it's your fault if you kill everything you have :)


Grep and alternatives

grep and variants

Standard are:

  • grep
  • egrep is grep -E (regexp match, ERE style)
  • fgrep is grep -F
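
A small sketch of how the three differ on the same made-up input (-F treats the pattern as a fixed string, -E as an extended regular expression, and plain grep as a basic one):

```shell
# Sketch: the same input matched as basic regex, fixed string, and extended regex.
input=$(printf 'a.c\nabc\na|c\n')

basic=$(printf '%s\n' "$input" | grep -c 'a.c')         # . matches any character
fixed=$(printf '%s\n' "$input" | grep -cF 'a.c')        # literal a.c only
ext=$(printf '%s\n' "$input" | grep -cE 'a(b|\.)c')     # ERE grouping and alternation
bre_bar=$(printf '%s\n' "$input" | grep -c 'a|c')       # in a basic regex, | is literal

echo "grep 'a.c':          $basic line(s)"
echo "grep -F 'a.c':       $fixed line(s)"
echo "grep -E 'a(b|\.)c':  $ext line(s)"
echo "grep 'a|c':          $bre_bar line(s)"
```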



variants and aliases

aliases for grep with options

ack

http://betterthangrep.com/


grin

http://pypi.python.org/pypi/grin