Chapter 17. Regular Expressions

 

. . . the intellectual activity associated with software development is largely one of gaining insight.

--Stowe Boyd

To fully utilize the power of shell scripting, you need to master Regular Expressions. Certain commands and utilities commonly used in scripts, such as grep, expr, sed and awk, interpret and use REs. As of version 3, Bash has acquired its own RE-match operator: =~.

17.1. A Brief Introduction to Regular Expressions

An expression is a string of characters. Those characters having an interpretation above and beyond their literal meaning are called metacharacters. A quote symbol, for example, may denote speech by a person, ditto, or a meta-meaning [1] for the symbols that follow. Regular Expressions are sets of characters and/or metacharacters that match (or specify) patterns.

A Regular Expression contains one or more of the following:

The main uses for Regular Expressions (REs) are text searches and string manipulation. An RE matches a single character or a set of characters -- a string or a part of a string.

Note

Some versions of sed, ed, and ex support escaped versions of the extended Regular Expressions described above, as do the GNU utilities.

Sed, awk, and Perl, used as filters in scripts, take REs as arguments when "sifting" or transforming files or I/O streams. See Example A-12 and Example A-17 for illustrations of this.

The standard reference on this complex topic is Friedl's Mastering Regular Expressions. Sed & Awk, by Dougherty and Robbins, also gives a very lucid treatment of REs. See the Bibliography for more information on these books.

Notes

[1]

A meta-meaning is the meaning of a term or expression on a higher level of abstraction. For example, the literal meaning of regular expression is an ordinary expression that conforms to accepted usage. The meta-meaning is drastically different, as discussed at length in this chapter.

[2]

Since sed, awk, and grep process single lines, there will usually not be a newline to match. In those cases where there is a newline in a multiple line expression, the dot will match the newline.
   1 #!/bin/bash
   2 
   3 sed -e 'N;s/.*/[&]/' << EOF   # Here Document
   4 line1
   5 line2
   6 EOF
   7 # OUTPUT:
   8 # [line1
   9 # line2]
  10 
  11 
  12 
  13 echo
  14 
  15 awk '{ $0=$1 "\n" $2; if (/line.1/) {print}}' << EOF
  16 line 1
  17 line 2
  18 EOF
  19 # OUTPUT:
  20 # line
  21 # 1
  22 
  23 
  24 # Thanks, S.C.
  25 
  26 exit 0