Pattern Matching

Shell Expressions
In the shell you can use the following to match file names

Example / Meaning
? / matches any single character
??? / matches exactly 3 characters
* / matches any number of characters, including none
[abcxyz] / matches any one of the characters inside []
[a-rS-Z5-9P] / matches any one character from the listed ranges (a-r, S-Z, 5-9) or the single character P
[^abcxyz123] / matches any one character except those listed
[!abcxyz1-9] / same, with a range of characters excluded
! and ^ only have this meaning when they are the first character after the opening [. ! is the portable form; bash also accepts ^
[[:alpha:]] / matches a character class. See Extended Regular expressions
{Jan,Feb,March,Oct} / expands to each of the alternative values in turn
report{10..20} / expands to every value in the range: report10, report11, and so on up to report20
~ / your home directory
~smith / the home directory of another user (here, smith)
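
The shell expressions above can be tried out safely in a scratch directory. The following is a minimal sketch, assuming bash (brace expansion is not available in every shell); the file names and the variable names are made up for illustration only.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the patterns only see our sample files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a1 a2 b1 notes report10 report11   # hypothetical file names

single=$(echo ??)              # names that are exactly two characters long
star=$(echo report*)           # names that start with "report"
set_match=$(echo [ab]1)        # "a" or "b", followed by "1"
negated=$(echo [!a]?)          # two characters, the first is not "a"
class=$(echo [[:alpha:]]2)     # any letter, followed by "2"
braces=$(echo report{10..12})  # brace expansion generates names even if no such file exists
home=$(echo ~)                 # tilde expands to your home directory

printf '%s\n' "$single" "$star" "$set_match" "$negated" "$class" "$braces" "$home"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Using echo instead of ls shows exactly what the shell expanded each pattern to before any command ran, which is usually the quickest way to check a pattern.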

Extended Regular Expressions (Extended regex)
Regular expressions are more powerful than shell expressions and are designed to handle the more complex patterns needed to search inside text and binary files.

Example / Meaning
. (dot) / Matches any single character
^ / Matches the beginning of a line
$ / Matches the end of a line
[abcxyz123] / Matches any one of the characters inside []
[a-rS-Z5-9P] / Matches any one character from the listed ranges or single characters
Important note: a comma in the list is not a separator; it literally matches a comma. To match a - (dash) in POSIX tools such as grep and sed, put it first or last in the list, as in [abc-]. Perl, Python and Java also let you escape it with a backslash: \-
[^abcxyz123] / Matches any one character except those listed
[^abcxyz1-9] / Same, with a range of characters excluded
^ only has this meaning when it is the first character after the opening [. Unlike in the shell, ! does not negate the list in a regex
[[:alpha:]][[:digit:]] / Character classes, which must be used inside square brackets. This example matches A9: an alphabetic letter followed by a digit
Other classes:
[:alnum:] / alphanumeric
[:blank:] / space or tab
[:cntrl:] / control character
[:graph:] / printable, except for space
[:lower:] / lowercase
[:print:] / printable, including space
[:punct:] / punctuation only
[:space:] / whitespace character, including form feed, carriage return, vertical and horizontal tabs
[:upper:] / uppercase
[:xdigit:] / hexadecimal digit
(pattern1|pattern2) / Alternation: the | means “or”, so either pattern can match
(pattern){3} / Exactly 3 copies of a letter or a pattern
Example: .{10}$ matches the last 10 characters of a line, so only lines with at least 10 characters match
Example: [[:digit:]]{5} matches 5 digits in a row
(pattern){3,5} / 3 to 5 copies of a pattern; the match is greedy, so it finds the longest match
(pattern){3,} / 3 or more copies of a letter or a pattern
(pattern)* / Any number of copies of a pattern, including 0
Example: [0-9]*9 matches any run of digits followed by a 9, or a 9 by itself
(pattern)+ / 1 or more copies of a pattern; the match is greedy
(pattern)? / 0 or 1 copies of a pattern, but not more than one
Example: [0-9]?9 matches a single digit followed by a 9, or a 9 by itself
\< / Matches the beginning of a word
\> / Matches the end of a word
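\n / Back reference, where n is a digit from 1 to 9: matches again whatever the nth parenthesized group matched
Example: (TECH154|TECH152).*\1 matches a line where the text matched by the first group appears again later in the line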
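
A few of the table entries can be checked from the command line. This is a minimal sketch, assuming GNU grep (support for back references under -E can vary between implementations); the sample lines and variable names are invented for the example, and only the patterns come from the table.

```shell
#!/usr/bin/env bash
# Hypothetical sample input, one candidate per line.
sample=$(printf '%s\n' 'A9' '12345' 'abc' '9' \
    'TECH154 then TECH154' 'TECH154 then TECH152')

# Character classes: a letter followed by a digit, anchored to the whole line.
letter_digit=$(printf '%s\n' "$sample" | grep -E '^[[:alpha:]][[:digit:]]$')

# Repetition count: the line consists of exactly five digits.
five_digits=$(printf '%s\n' "$sample" | grep -E '^[[:digit:]]{5}$')

# Optional item: at most one digit, then a 9.
optional=$(printf '%s\n' "$sample" | grep -E '^[0-9]?9$')

# Alternation plus a back reference: the same code must appear twice on the line.
repeated=$(printf '%s\n' "$sample" | grep -E '(TECH154|TECH152).*\1')

printf '%s\n' "$letter_digit" "$five_digits" "$optional" "$repeated"
```

The ^ and $ anchors are added here so that each pattern has to account for the whole line; without them grep prints any line that contains a match anywhere.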

The textbook doesn’t discuss extended regex, only basic regex. The extended version is easier to use. The differences are as follows: in basic regex the special characters ( ) { } and | have to be preceded by a backslash (the escape character) to get their special meaning, which often makes a regex harder to read. In extended regex you just type them in, but if you want to match one as an actual character, you need to put a backslash in front of it. Basic regex also doesn’t support + or ?
Commands that use basic regex: expr, grep, sed, vim
Commands that use extended regex: egrep (grep -E), sed -E, awk, Perl, Python, Oracle, Java
There can be slight differences in behaviour and notation in how regex has been implemented in each of these tools. POSIX defines both basic and extended regex, but many tools add their own extensions or follow their own dialect.
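
The difference between the two flavours is easiest to see by running the same search both ways. The following is a small sketch, assuming GNU grep; the three input lines and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical input: three short lines.
lines=$(printf '%s\n' 'cat' 'dog' 'a|b')

# Extended regex: a bare | means "or", so both animals are found.
ere_or=$(printf '%s\n' "$lines" | grep -E 'cat|dog')

# Basic regex: a bare | is an ordinary character, so only the literal text a|b is found.
bre_literal=$(printf '%s\n' "$lines" | grep 'a|b')

# Extended regex: put a backslash in front of | to match it as an actual character.
ere_literal=$(printf '%s\n' "$lines" | grep -E 'a\|b')

# Repetition counts: braces need backslashes in basic regex, none in extended regex.
bre_count=$(printf '%s\n' 'fo' 'foo' | grep 'o\{2\}')
ere_count=$(printf '%s\n' 'fo' 'foo' | grep -E 'o{2}')

printf '%s\n' "$ere_or" "$bre_literal" "$ere_literal" "$bre_count" "$ere_count"
```

Single quotes around each pattern keep the shell from interpreting |, { and \ before grep sees them.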

In vim, the default when you use / or ? to search is close to basic regex; vim calls this “magic” mode. You can switch to an extended-style syntax, known as “very magic” mode, by immediately following the search operator with \v (backslash v), for example /\v(cat|dog)