REGULAR EXPRESSIONS

This description was copied from the TCL documentation which in turn was copied from a manual page written by Henry Spencer. The definition below is very succinct but accurate and worth a study, alternatively you may find a more readable description on UNIX systems by doing "man ed" and looking up regular expressions!

A regular expression is zero or more branches, separated by "|". It matches anything that matches one of the branches.

A branch is zero or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.

A piece is an atom possibly followed by "*", "+", or "?". An atom followed by "*" matches a sequence of 0 or more matches of the atom. An atom followed by "+" matches a sequence of 1 or more matches of the atom. An atom followed by "?" matches a match of the atom, or the null string.

An atom is a regular expression in parentheses (matching a match for the regular expression), a range (see below), "." (matching any single character), "^" (matching the null string at the beginning of the input string), "$" (matching the null string at the end of the input string), a "\" followed by a single character (matching that character), or a single character with no other significance (matching that character).

A range is a sequence of characters enclosed in "[]". It normally matches any single character from the sequence. If the sequence begins with "^", it matches any single character not from the rest of the sequence. If two characters in the sequence are separated by "-", this is shorthand for the full list of ASCII characters between them (e.g. "[0-9]" matches any decimal digit). To include a literal "]" in the sequence, make it the first character (following a possible "^"). To include a literal "-", make it the first or last character.

Here are some examples to get you going.

(abc|def) Matches the string "abc" or the string "def"
^U + Matches the beginning of the line followed directly by a U then one or more spaces
F.REWORKS Matches the strings FOREWORKS FIREWORKS F@REWORKS etc.
A[^BC]D Matches anything A.D matches but excluding ABC and ACD.
^[ \t]*$ Matches any lines with only spaces or tabs on them.