
Awk is a powerful command language that allows the user to manipulate files containing columns of data. Many applications of awk resemble those done on PC spreadsheets.
There are two ways to run awk. A simple awk command can be run from a single command line. More complex awk scripts should be written to a command file.

Awk takes each line of input and tries to match the 'pattern' (see below), and if it succeeds it will do whatever you tell it to do within the {} (called the action). Awk works best on files that have columns of numbers or strings that are separated by white space (tabs or spaces), though on most machines you can use the -F option if your columns are set apart by another character. Awk refers to the first column as $1, the second column as $2, etc., and the whole line as $0.
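As a small illustration of the column variables and the -F option, the sketch below builds a colon-separated sample file and prints its second and first columns. The file name people.txt and its contents are made up for this example.

```shell
# Create a small colon-separated sample file (hypothetical data).
printf 'alice:30:NY\nbob:25:LA\n' > people.txt

# -F: tells awk that columns are separated by ':' rather than white space.
# $2 is the second column and $1 the first; the comma in print inserts a space.
result=$(awk -F: '{ print $2, $1 }' people.txt)
echo "$result"
```

This should print '30 alice' and '25 bob' on separate lines. Without -F:, each whole line would be treated as a single column, because the sample lines contain no white space.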

Example Problem 1: Suppose a file called 'file1' that has 2 columns of numbers, and you want to make a new file called 'file2' that has columns 1 and 2 as before, but also adds a third column which is the ratio of the numbers in columns 1 and 2. Further, suppose you want the new 3-column file (file2) to contain only those lines with column 1 smaller than column 2.

Solution:

awk '$1 < $2 {print $0, $1/$2}' file1 > file2
Or equivalently,
cat file1 | awk '$1 < $2 {print $0, $1/$2}' > file2
Let's look at the second one. 'cat file1' prints the contents of file1 to your screen. The | (called a pipe) directs the output of 'cat file1', which normally goes to your screen, to the command awk. Awk considers the input from 'cat file1' one line at a time, and tries to match the 'pattern'. The pattern is whatever is between the first ' and the {; in this case the pattern is $1 < $2, which asks awk to check whether the first column is less than the second. If the pattern is false, awk goes on to the next line. If the pattern is true, awk does whatever is in the {}. If there is no pattern, awk assumes the pattern is true and goes on to the action contained in the {}.
What is the action? Almost always it is a print statement of some sort. In this case we want awk to print the entire line, i.e. $0, and then print the ratio of columns 1 and 2, i.e. $1/$2. We close the action with a }, and close the awk command with a '. Finally, to store the final 3-column output into file2 (otherwise it prints to the screen), we add a '> file2'.
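To see the solution at work, here is a minimal run on a three-line input. The numbers in file1 are invented for illustration; only the awk command itself comes from the solution above.

```shell
# Sample two-column input (hypothetical values).
printf '1 2\n5 4\n3 6\n' > file1

# Keep only lines where column 1 < column 2, and append the ratio as column 3.
awk '$1 < $2 { print $0, $1/$2 }' file1 > file2

# Show the result: the line '5 4' is dropped because 5 is not less than 4.
cat file2
```

With this input, file2 should contain the two lines '1 2 0.5' and '3 6 0.5'.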
Example Problem 2: Suppose you have many files (e.g., hundreds) that you want to move into a new directory and rename by appending the extension .old to the filenames. You could do this one by one (several hours), or use awk (several seconds). Suppose the files are named hw* (* is a wildcard for any sequence of characters), and need to be moved to a subdirectory called 620 of the current directory and have the extension '.old' appended to the name.

Solution:
ls hw* | awk '{print "mv "$0" ./620/"$0".old"}' | csh
ls hw* lists the filenames, and this output is piped into awk instead of going to your screen. There is no pattern (nothing between the ' and the {), so awk proceeds to print something for each line. For example, if the first three lines from 'ls hw*' produced hw1, hw2 and hw3, respectively, then awk would print:
mv hw1 ./620/hw1.old
mv hw2 ./620/hw2.old
mv hw3 ./620/hw3.old
At this point the mv commands are simply printed to the screen. To execute them, we take the output of awk and pipe it into a shell (here the C shell, csh), which runs each line as a command. Hence, to finish the statement we add a ' | csh'.
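Because the generated commands are executed blindly, it can help to preview them first. The sketch below tries the whole procedure in a scratch directory. It makes two assumptions not in the original: the commands are piped to sh rather than csh (in case csh is not installed), and the filenames contain no spaces or other characters that are special to the shell.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"
mkdir 620
touch hw1 hw2 hw3

# Step 1: preview. Without the final pipe, the mv commands are only printed.
preview=$(ls hw* | awk '{ print "mv " $0 " ./620/" $0 ".old" }')
echo "$preview"

# Step 2: execute. sh is used here in place of csh (an assumption).
ls hw* | awk '{ print "mv " $0 " ./620/" $0 ".old" }' | sh

# List the renamed files in the subdirectory.
moved=$(ls 620)
echo "$moved"
```

After the second step, the subdirectory 620 should contain hw1.old, hw2.old and hw3.old.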
More complex awk scripts need to be run from a file. The syntax for such cases is:
cat file1 | awk -f commands.awk > file2
where file1 is the input file, file2 is the output file, and commands.awk is a file containing awk commands. Examples below that contain more than one line of awk need to be run from files.
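As a minimal sketch of this workflow, the lines below create a command file and run it. The input values are invented for illustration, and the guard against an empty input file is an addition made here. awk can also read file1 directly, which is equivalent to piping it in with cat.

```shell
# Sample input: one number per line in column 1 (hypothetical values).
printf '10\n20\n30\n' > file1

# Write the awk program to a command file. Quoting 'EOF' keeps the
# shell from expanding $1 before awk sees it.
cat > commands.awk <<'EOF'
# Add up column 1, then print the sum and the average.
{ s += $1 }
END { if (NR > 0) print "sum is", s, "average is", s/NR }
EOF

# Run the command file on file1 and store the output in file2.
awk -f commands.awk file1 > file2
cat file2
```

With this input, file2 should contain the single line 'sum is 60 average is 20'.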
Awk defines some useful built-in variables and functions for you: NF (the number of columns in the current line), NR (the number of the line that awk is currently working on), and length (the number of characters in a line or a string). There are also two special patterns: BEGIN (matched once, before awk reads anything) and END (matched once, after awk reaches the EOF). There is also looping capability, a search (/) command, a substring command (extremely useful), and formatted printing available. The logical operators || (or) and && (and) can be used in 'pattern'. You can define and manipulate your own user-defined variables. More examples are given below.
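The following sketch exercises several of these features together on a made-up three-line file: BEGIN and END, NF, NR, length, the substring function substr, formatted printing with printf, a search pattern, and the && operator. The file name sample.txt and its contents are hypothetical.

```shell
# Three lines with varying numbers of columns (hypothetical data).
printf 'alpha 1\nbeta 2 extra\ngamma 3\n' > sample.txt

result=$(awk '
BEGIN   { print "start" }                  # runs once, before any input is read
NF == 2 && $2 > 1 {                        # exactly two columns AND column 2 above 1
          printf "%d %s %.1f\n", NR, substr($1, 1, 3), $2
        }
/beta/  { print "line", NR, "has", NF, "fields, length", length($0) }
END     { print "lines read:", NR }        # runs once, after the last line
' sample.txt)
echo "$result"
```

Here substr($1, 1, 3) takes the first three characters of column 1, and %.1f prints column 2 with one decimal place. The output should be four lines: 'start', 'line 2 has 3 fields, length 12', '3 gam 3.0', and 'lines read: 3'.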

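EXAMPLES

In the examples below, # is the comment character for awk, and 'field' means 'column'. Examples shown without a leading 'awk' command are the contents of a command file, to be run with awk -f.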

# Print first two fields in opposite order:

awk '{ print $2, $1 }' file

# Print lines longer than 72 characters:

awk 'length > 72' file

# Print length of string in 2nd column

awk '{print length($2)}' file

# Add up first column, print sum and average:

{ s += $1 }

END { if (NR > 0) print "sum is", s, "average is", s/NR }

# Print fields in reverse order, one field per output line:

awk '{ for (i = NF; i > 0; --i) print $i }' file

# Print the last line

{line = $0}

END {print line}

# Print the total number of lines that contain the word Pat

/Pat/ {nlines = nlines + 1}

END {print nlines + 0}

# Print all lines between start/stop pairs:

awk '/start/, /stop/' file

# Print all lines whose first field is different from previous one:

awk '$1 != prev { print; prev = $1 }' file

# Print column 3 if column 1 > column 2:

awk '$1 > $2 {print $3}' file

# Print line if column 3 > column 2:

awk '$3 > $2' file

# Count number of lines where col 3 > col 1

awk '$3 > $1 {n++} END {print n + 0}' file

# Print sequence number and then column 1 of file:

awk '{print NR, $1}' file

# Print every line after erasing the 2nd field

awk '{$2 = ""; print}' file

# Replace every field by its absolute value

{ for (i = 1; i <= NF; i = i + 1) if ($i < 0) $i = -$i; print }
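
To check what some of these examples print, the sketch below runs three of them (the start/stop range, the changed first field, and the absolute value) on small made-up inputs. The file names range.txt, keys.txt and nums.txt and their contents are hypothetical.

```shell
# Hypothetical sample data for trying out three of the examples.
printf 'a\nstart\nb\nstop\nc\n' > range.txt
printf 'x 1\nx 2\ny 3\nx 4\n' > keys.txt
printf '%s\n' '-1 2 -3' '4 -5 6' > nums.txt

# Lines between start/stop pairs (the start and stop lines are included).
between=$(awk '/start/, /stop/' range.txt)

# Lines whose first field differs from the first field of the previous line.
changes=$(awk '$1 != prev { print; prev = $1 }' keys.txt)

# Every field replaced by its absolute value; print runs once per line,
# after the loop over the fields has finished.
absval=$(awk '{ for (i = 1; i <= NF; i++) if ($i < 0) $i = -$i; print }' nums.txt)

printf '%s\n' "$between" "$changes" "$absval"
```

The range example should print start, b and stop; the second should print 'x 1', 'y 3' and 'x 4'; the third should print '1 2 3' and '4 5 6'.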

References: These notes on awk are a modification of a discussion given on the website sparky.rice.edu/~hartigan/awk.html