A regular expression is a pattern. Some parts of the pattern match a single character of a particular kind in the string. Other parts match a run of several characters, or repetitions of such runs.

Single-character Patterns:

  • a single character matches itself
  • a dot "." matches any single character except a newline
  • a character class is enclosed in [ ]. One and only one of these characters must be present at the corresponding part of the string to match
    e.g. [aeiouAEIOU] matches any upper- or lowercase vowel in a string
    e.g. [0123456789] matches any single digit (the range [0-9] is an equivalent shorthand)
  • a negated character class, written with a ^ immediately after the [, matches any single character that is not in the list
    e.g. [^0-9] matches any single non-digit
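
The single-character patterns above can be tried out in a few lines. This is a minimal sketch that assumes Python's re module (these notes do not name a language); the sample strings and variable names are made up for illustration.

```python
import re

# A literal character matches itself.
literal = re.findall(r"a", "banana")

# A dot matches any single character except a newline,
# so "b\nt" is not matched here.
dot = re.findall(r"b.t", "bat bit b\nt")

# A character class matches exactly one character from the list.
vowels = re.findall(r"[aeiouAEIOU]", "Regular Expression")
digits = re.findall(r"[0123456789]", "Room 42")

# A negated class matches one character that is NOT in the list.
non_digits = re.findall(r"[^0-9]", "a1b2")

print(literal, dot, vowels, digits, non_digits)
```

Each call to re.findall returns every non-overlapping match, which makes it easy to see that each of these patterns consumes exactly one character at a time.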

Grouping Patterns:

  • the first pattern is the sequence: abc matches an a followed by a b followed by a c
  • the asterisk "*" means "zero or more" of the immediately preceding character or character class
  • similarly, the "+" sign means "one or more"
  • and the "?" means "zero or one" of the immediately preceding character or character class
  • another grouping construct is alternation, as in "a|b|c". This matches exactly one of the alternatives (a or b or c in this case). It works even if the alternatives have multiple characters, as in song|blue, which matches either song or blue.
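
The grouping patterns above can be checked the same way. Again this is only a sketch assuming Python's re module; the test strings are illustrative, and re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

samples = ("ac", "abc", "abbbc")

# Sequence: an a, then a b, then a c.
sequence = re.search(r"abc", "xxabcxx").group()

# "*" allows zero or more b's between the a and the c.
star = [bool(re.fullmatch(r"ab*c", s)) for s in samples]

# "+" requires at least one b.
plus = [bool(re.fullmatch(r"ab+c", s)) for s in samples]

# "?" allows at most one b.
optional = [bool(re.fullmatch(r"ab?c", s)) for s in samples]

# A multiplier can also follow a character class.
numbers = re.findall(r"[0-9]+", "10 apples, 250 pears")

# Alternation matches exactly one of the alternatives.
alternation = re.findall(r"song|blue", "a blue song")

print(sequence, star, plus, optional, numbers, alternation)
```

Comparing the three lists side by side shows how the multipliers differ only in how many repetitions of the preceding item they accept.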

Anchoring Patterns:

  • a \b requires a word boundary at the indicated point in order for the pattern to match
  • \B requires that there not be a word boundary at the indicated point
    e.g. \bFred\B matches "Frederick" but not "Fred Flintstone"
  • a ^ matches the beginning of the string if it's the first character in the expression to match
  • a $ matches the end of the string if it's the last character in the expression to match
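
The anchors above match positions rather than characters. The following sketch, again assuming Python's re module and using made-up variable names, reproduces the Fred example and adds one case each for ^ and $.

```python
import re

# \b needs a word boundary before "Fred"; \B forbids one after it.
inside_word = re.search(r"\bFred\B", "Frederick") is not None
whole_word = re.search(r"\bFred\B", "Fred Flintstone") is not None

# ^ anchors the match to the beginning of the string.
starts_with = re.search(r"^Fred", "Fred Flintstone") is not None
not_at_start = re.search(r"^Fred", "Hello Fred") is not None

# $ anchors the match to the end of the string.
ends_with = re.search(r"stone$", "Fred Flintstone") is not None
not_at_end = re.search(r"stone$", "stonework") is not None

print(inside_word, whole_word, starts_with, not_at_start, ends_with, not_at_end)
```

In "Fred Flintstone" the space after the d creates a word boundary, so \B fails there, which is why only "Frederick" matches the first pattern.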