Advanced Bash-Scripting Guide: An in-depth exploration of the art of shell scripting
Prev		Next

Appendix C. A Sed and Awk Micro-Primer

This is a very brief introduction to the sed and awk text processing utilities. We will deal with only a few basic commands here, but that will suffice for understanding simple sed and awk constructs within shell scripts.

sed: a non-interactive text file editor

awk: a field-oriented pattern processing language with a C-style syntax

For all their differences, the two utilities share a similar invocation syntax, both use regular expressions , both read input by default from stdin, and both output to stdout. These are well-behaved UNIX tools, and they work together well. The output from one can be piped into the other, and their combined capabilities give shell scripts some of the power of Perl.

One important difference between the utilities is that while shell scripts can easily pass arguments to sed, it is more complicated for awk (see Example 33-5 and Example 9-25).

C.1. Sed

Sed is a non-interactive [1] line editor. It receives text input, whether from stdin or from a file, performs certain operations on specified lines of the input, one line at a time, then outputs the result to stdout or to a file. Within a shell script, sed is usually one of several tool components in a pipe.

Sed determines which lines of its input that it will operate on from the address range passed to it. [2] Specify this address range either by line number or by a pattern to match. For example, 3d signals sed to delete line 3 of the input, and /windows/d tells sed that you want every line of the input containing a match to "windows" deleted.

Of all the operations in the sed toolkit, we will focus primarily on the three most commonly used ones. These are printing (to stdout), deletion, and substitution.

Table C-1. Basic sed operators

Operator	Name	Effect
`[address-range]/p`	print	Print [specified address range]
`[address-range]/d`	delete	Delete [specified address range]
`s/pattern1/pattern2/`	substitute	Substitute pattern2 for first instance of pattern1 in a line
`[address-range]/s/pattern1/pattern2/`	substitute	Substitute pattern2 for first instance of pattern1 in a line, over `address-range`
`[address-range]/y/pattern1/pattern2/`	transform	replace any character in pattern1 with the corresponding character in pattern2, over `address-range` (equivalent of tr)
`g`	global	Operate on every pattern match within each matched line of input

Unless the g (global) operator is appended to a substitute command, the substitution operates only on the first instance of a pattern match within each line.

From the command line and in a shell script, a sed operation may require quoting and certain options.

1 sed -e '/^$/d' $filename 2 # The -e option causes the next string to be interpreted as an editing instruction. 3 # (If passing only a single instruction to sed, the "-e" is optional.) 4 # The "strong" quotes ('') protect the RE characters in the instruction 5 #+ from reinterpretation as special characters by the body of the script. 6 # (This reserves RE expansion of the instruction for sed.) 7 # 8 # Operates on the text contained in file $filename.

In certain cases, a sed editing command will not work with single quotes.

1 filename=file1.txt 2 pattern=BEGIN 3 4 sed "/^$pattern/d" "$filename" # Works as specified. 5 # sed '/^$pattern/d' "$filename" has unexpected results. 6 # In this instance, with strong quoting (' ... '), 7 #+ "$pattern" will not expand to "BEGIN".

Sed uses the -e option to specify that the following string is an instruction or set of instructions. If there is only a single instruction contained in the string, then this may be omitted.

1 sed -n '/xzy/p' $filename 2 # The -n option tells sed to print only those lines matching the pattern. 3 # Otherwise all input lines would print. 4 # The -e option not necessary here since there is only a single editing instruction.

Table C-2. Examples of sed operators

Notation	Effect
`8d`	Delete 8th line of input.
`/^$/d`	Delete all blank lines.
`1,/^$/d`	Delete from beginning of input up to, and including first blank line.
`/Jones/p`	Print only lines containing "Jones" (with -n option).
`s/Windows/Linux/`	Substitute "Linux" for first instance of "Windows" found in each input line.
`s/BSOD/stability/g`	Substitute "stability" for every instance of "BSOD" found in each input line.
`s/ *$//`	Delete all spaces at the end of every line.
`s/00*/0/g`	Compress all consecutive sequences of zeroes into a single zero.
`/GUI/d`	Delete all lines containing "GUI".
`s/GUI//g`	Delete all instances of "GUI", leaving the remainder of each line intact.

Substituting a zero-length string for another is equivalent to deleting that string within a line of input. This leaves the remainder of the line intact. Applying s/GUI// to the line
The most important parts of any application are its GUI and sound effects
results in
The most important parts of any application are its and sound effects

A backslash forces the sed replacement command to continue on to the next line. This has the effect of using the newline at the end of the first line as the replacement string.
1 s/^ */\ 2 /g
This substitution replaces line-beginning spaces with a newline. The net result is to replace paragraph indents with a blank line between paragraphs.

An address range followed by one or more operations may require open and closed curly brackets, with appropriate newlines.
1 /[0-9A-Za-z]/,/^$/{ 2 /^$/d 3 }
This deletes only the first of each set of consecutive blank lines. That might be useful for single-spacing a text file, but retaining the blank line(s) between paragraphs.

The usual delimiter that sed uses is /. However, sed allows other delimiters, such as %. This is useful when / is part of a replacement string, as in a file pathname. See Example 10-9 and Example 15-32.

A quick way to double-space a text file is sed G filename.

For illustrative examples of sed within shell scripts, see:

For a more extensive treatment of sed, check the appropriate references in the Bibliography.

[1]	Sed executes without user intervention.
[2]	If no address range is specified, the default is all lines.

Prev	Home	Next
Reference Cards		Awk

Appendix C. A Sed and Awk Micro-Primer

C.1. Sed

Notes