Sed

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Sed is a stream editor, meaning it does things to a stream of data, which makes it interesting for some shell-fu and text editing purposes. I rarely use it: by the time I need it, the task is usually complex enough to warrant a simple script.

See also Awk.


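Note: you often want the -r argument, for extended regexps (-E is the same option under its portable name). sed's default is classical unix (POSIX basic) regexps, in which things you probably expect, like +, ?, and |, are ordinary characters unless backslash-escaped (GNU sed accepts \+, \?, and \|). Backreferences work in both modes; in basic mode the groups are written \( \) rather than ( ).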
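
The difference between the two regexp modes is easiest to see side by side. The following is a small sketch, assuming GNU sed; the sample strings and variable names are made up for illustration.

```shell
# Assumes GNU sed. Sample inputs and variable names are illustrative only.

# Basic mode (no -r): + is an ordinary character, so nothing matches.
basic=$(echo "aaa bbb" | sed 's/a+/X/')

# Extended mode: + means "one or more".
extended=$(echo "aaa bbb" | sed -r 's/a+/X/')

# Basic mode again, using GNU's backslash-escaped form of +.
escaped=$(echo "aaa bbb" | sed 's/a\+/X/')

# Backreferences work in both modes; only the group syntax differs.
swap_basic=$(echo "key=value" | sed 's/\(.*\)=\(.*\)/\2=\1/')
swap_ext=$(echo "key=value" | sed -r 's/(.*)=(.*)/\2=\1/')

printf '%s\n' "$basic" "$extended" "$escaped" "$swap_basic" "$swap_ext"
```

Only the first command leaves its input unchanged; the last two produce the same swapped result.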


Basics

sed processes one line at a time. It lets you apply regular expressions and related operations to a stream of data, e.g. doing replacement:

> echo "All day and all night" | sed 's/day/night/'
All night and all night

The three things you will use most:

  • s/substitute/text/
  • y/transliterate/text/ (equal-length strings in both parts; this is basically a dumb tr)
  • /match text/, which is only interesting in combination with other commands; more about that later.



First (default) versus all (/g)

By default, sed will only operate on the first match:

> echo "All day and all day" | sed 's/day/night/'
All night and all day


To work on all matches, add /g:

> echo "All day and all day" | sed 's/day/night/g'
All night and all night

Delimiters

You can use an arbitrary character as the delimiter. For example:

find /etc | sed 's_/etc/__'
...which is less confusing than s/\/etc\///.

-e: multiple expressions

You can apply multiple expressions per line. They are applied in order, each one operating on the result of the previous, much like connected pipes.

This means you can do things like double output, sequential transformations, simple conditionals, and such.

This also makes things like -n (don't automatically print lines) and /p ('print this line') more interesting (see below).
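
A short sketch of sequential application and double output, assuming GNU sed (the sample text and variable names are illustrative):

```shell
# Each -e operates on the result of the previous one:
# day -> night, then night -> week.
chained=$(echo "All day" | sed -e 's/day/night/' -e 's/night/week/')

# The same two commands in a single expression, separated by ;
semi=$(echo "All day" | sed 's/day/night/; s/night/week/')

# Double output: p prints the line as it is at that point, then the
# substitution runs and sed's automatic print emits the changed line.
doubled=$(echo "All day" | sed -e 'p' -e 's/day/night/')

printf '%s\n' "$chained" "$semi" "$doubled"
```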

Grouping


Groups capture parts of a match so that the replacement can refer to them as \1, \2, and so on. With -r (extended regexp) groups are written ( ); without it they must be written \( \).

For example, to selectively rewrite lines like:

   'key':'value',

to

   'value':'key',

You could use something like:

sed -r 's/\s+([^:]+):([^,]+),/    \2:\1,/g'

(The [^something] matches not-something characters. sed has no non-greedy quantifiers such as +?, so having a match run up to some point you know it should stop is one simple-and-stupid way of making a long match non-greedy)



When you s/replace//, you can also insert the entire matched text using & (in the example below that includes the leading dot):

find /etc/ | sed -n -r 's/\.(conf|cnf|cf|ini|rc)$/: configuration file (type: &)/p'
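
To see what & and \1 each expand to, here is a sketch run on a fixed list instead of a real find. It assumes GNU sed; the three file names are made up for illustration.

```shell
# Hypothetical file list standing in for the output of find /etc/
files=$(printf '%s\n' /etc/fstab /etc/my.cnf /etc/sysctl.conf)

# & is the whole match (".cnf", ".conf"), which the replacement consumes,
# so the extension moves into the description.
with_amp=$(printf '%s\n' "$files" |
  sed -n -r 's/\.(conf|cnf|cf|ini|rc)$/: configuration file (type: &)/p')

# Matching the whole line keeps the file name intact via &,
# while \1 gives just the extension without the dot.
with_group=$(printf '%s\n' "$files" |
  sed -n -r 's/.*\.(conf|cnf|cf|ini|rc)$/&: configuration file (type: \1)/p')

printf '%s\n' "$with_amp" "$with_group"
```

In both cases /etc/fstab is dropped, because -n suppresses lines on which the substitution did not fire.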

Advanced features you may not need

Addresses: line-based conditions

You can make expressions apply according to what line of input you're on. For example, you can do:

find /etc/ | sed '3   s_/etc_/foo_'   #does the replacement only on the third line
find /etc/ | sed '1~5 s_/etc_/foo_'   #does the replacement every fifth line
find /etc/ | sed '$   s_/etc_/foo_'   #does the replacement on the last line
find /etc/ | sed '2,5 s_/etc_/foo_'   #does the replacement on line two to five

You can add a ! between the address and the command to negate the logic of the address. For the examples above, that would respectively mean 'every line but the third', 'four of every five lines', 'everything but the last line', and 'line 1, and from 6 onwards'.

See the sed man page for more.
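
The address forms are easier to check on numbered input. A sketch, assuming GNU sed (the first~step form is a GNU extension); the helper nums and the variable names are made up for this example, and -n with p is used so that only the addressed lines are shown.

```shell
# Hypothetical helper: emits the numbers 1..7, one per line.
nums() { printf '%s\n' 1 2 3 4 5 6 7; }

third=$(nums | sed -n '3p' | paste -sd' ' -)        # a single line
every3=$(nums | sed -n '1~3p' | paste -sd' ' -)     # every third line, starting at line 1
last=$(nums | sed -n '$p' | paste -sd' ' -)         # the last line
range=$(nums | sed -n '2,5p' | paste -sd' ' -)      # lines two to five
negated=$(nums | sed -n '2,5!p' | paste -sd' ' -)   # everything outside two to five

printf '%s\n' "$third" "$every3" "$last" "$range" "$negated"
```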

-n and /p: grep-like behaviour

When you use the -n option on sed, it does not print what it reads, only what you tell it to using /p.

For example, "print only lines that contain 'All' or 'all', and replace 'all' with 'boo'":

> echo -e "All day \n and all day\n  not too long" | sed -n -e '/All/p' -e 's/all/boo/p'
All day
 and boo day

Note: the -e option on echo forces it to interpret the escape, so that it outputs the \n as an actual newline

# A slightly more useful example:
#   From the log of initial install (debian/ubuntu), 
#   use just the lines mentioning the package
#   and print only the name part of that line:
zcat /var/log/installer/initial-status.gz | sed -n 's/^Package: //p'

y///: transliteration

y/sourcelist/targetlist/ transliterates single characters from the first list into the second, for example:

> echo "All day and all day" | sed 'y/aA/Aa/'
all dAy And All dAy

This is basically a simple version of what the tr utility does, though in combination with other expressions you can make it more advanced than tr.

Matching as a condition for operations

Useful for things like character substitution/transliteration.

The following means 'if the line matches the expression, apply operation to whole line' (not 'apply only to the match').

sed '/[A-Z]/y/0123456789/          /'   # target list must be ten spaces, one per digit
sed '/[A-Z]/s/[0-9]/ /g'

Both of these do the nonsense operation of replacing digits with spaces only when there is a capital letter on the line.
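
A visible variant of the same idea, replacing digits with # instead of spaces so the effect shows up in the output. This is a sketch assuming GNU sed; the sample lines and variable names are invented, and LC_ALL=C is set so that [A-Z] means exactly the ASCII capitals.

```shell
# Two illustrative input lines: one with a capital letter, one without.
input=$(printf '%s\n' "Room 101" "room 102")

# The substitution applies to the whole line, but only on lines matching /[A-Z]/.
on_match=$(printf '%s\n' "$input" | LC_ALL=C sed '/[A-Z]/s/[0-9]/#/g')

# With ! the condition is inverted: only lines without a capital are changed.
on_nonmatch=$(printf '%s\n' "$input" | LC_ALL=C sed '/[A-Z]/!s/[0-9]/#/g')

printf '%s\n' "$on_match" "$on_nonmatch"
```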

/i and /a: inserting text before and after a line

For example, the following decorates 'Section ...':

cat mytext | sed -e '/^[Ss]ection [0-9]*$/i #####' -e '/^[Ss]ection [0-9]*$/a ###'

With this,

Section 1

would become

#####
Section 1
###

(this may have uses in the context of LaTeX, sometimes XML)


Halfway useful examples

Extract single value by regexp, here for the CUDA version:

nvcc -V 2>/dev/null | sed -n -r 's/.*release ([0-9]+[.][0-9]+),.*/\1/p'


To grep ps - and always let through first line (ps's header):

ps aux | sed -n -e '1 p' -e "/$1/p"

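Note this is somewhat fragile because of the way it places a command line arg in a string: $1 is interpolated into the sed expression as-is, so a value containing / or regexp metacharacters will break the expression or change its meaning.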


To replace something in one or more files, such as to change naming conventions in code, do an in-place replace:

sed -i.bak 's/\bgetSession\b/get_session/g' *.py

The \b ('only if at word boundary') avoids potentially nasty substring replaces.

The .bak argument to -i makes sed make a backup (with that extension), just in case this messed up everything and you want to restore the old copy/copies. If you're happy (you could diff the two to check) you can remove these.
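
To try this without touching real code, the following sketch runs the same command in a scratch directory. It assumes GNU sed (\b is a GNU extension); the file name and its contents are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is modified.
tmp=$(mktemp -d)
printf 'getSession()\ngetSessionId()\n' > "$tmp/example.py"

# In-place edit, keeping the original as example.py.bak
sed -i.bak 's/\bgetSession\b/get_session/g' "$tmp/example.py"

edited=$(cat "$tmp/example.py")
backup=$(cat "$tmp/example.py.bak")

# Review what changed; diff exits non-zero when the files differ, hence || true.
diff "$tmp/example.py.bak" "$tmp/example.py" || true

rm -r "$tmp"
```

Only the first line changes: in getSessionId the name is followed by another word character, so \b does not match there.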

Note: other ways of doing this include perl -pi -e (perl -p -i -e; writing it as -pie makes perl take the e as the backup extension), and replace. Using sed seems a decent balance of brief and powerful. See also replacing text in multiple files.


Indent an entire file

sed 's/^/    /'


Double-space a file:

sed 'G'
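
Why this works: G appends a newline plus the contents of the hold space to the current line, and since nothing was ever put in the hold space, that amounts to an empty line after each input line. A minimal sketch (sample input and variable name are illustrative):

```shell
# Two input lines; G adds an empty line after each.
spaced=$(printf 'one\ntwo\n' | sed 'G')

# The command substitution strips the trailing empty line from the variable.
printf '%s\n' "$spaced"
```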


Insert newlines, e.g. into an XML file that had all its data on a single line:

cat isbns.xml | sed 's_</isbn><isbn>_</isbn>\n<isbn>_g' > isbns_viewable.xml

Common errors

invalid reference on `s' command's RHS

Usually you forgot the -r (extended regexp) option: without it, ( ) are literal characters, so \1 refers to a group that was never defined.
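
A sketch that reproduces the error and shows the two ways around it. The exact wording of the message is GNU sed's and will differ in other implementations; the sample input and variable names are illustrative.

```shell
# Without -r the parentheses are literal, so \1 and \2 have nothing to refer to.
# sed fails here; the message is captured and the failure ignored.
err=$(echo "ab" | sed 's/(a)(b)/\2\1/' 2>&1) || true

# Fix 1: switch to extended regexps.
fixed_ext=$(echo "ab" | sed -r 's/(a)(b)/\2\1/')

# Fix 2: stay in basic mode and escape the group parentheses.
fixed_basic=$(echo "ab" | sed 's/\(a\)\(b\)/\2\1/')

printf '%s\n' "$err" "$fixed_ext" "$fixed_basic"
```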
