Unix Lab Assignment 8
CSC332
UNIX Operating System

Name ______

This lab will discuss the sort, diff, uniq, and grep commands.

Log on to your UNIX account. Type in:

pwd

What was the response to this command? ______

You should have the "absolute path name" from the root to your home directory. In the directory structure in Lab 3, the person's home directory was listed as /home/srp. In this example, the "/home" is the partition or group name and the "/srp" is the login name.

You will need the absolute path name that you received when you typed in the pwd above to copy the file employee to another directory. First, make a new directory. Type in:

mkdir lab8

Change your working directory to lab8 by typing in:

cd lab8

Verify that you are in subdirectory lab8. What command did you use to do this? ______

Use vi or another editor to create a new file named employee with the following contents:

mgt Cooper    John      06151995 66000
mgt Davidson  Darla     04151992 69500
mgt MacDonald George    06151985 70000
act Smith     Thomas    04102002 56000
act Smith     Alecia    04121991 65000
mis MacLeod   Janice    01021977 90000
mis Mack      Joe       02252003 85000
mis Winslow   Sarah     02151995 58000
adm Smith     Dexter    01021975 100000
mis Bennett   Joan      08152001 79000
mgt Neason    Elizabeth 10251998 65500
act NeSmith   Donald    11301966 99500

Then save and close the file and type in:

ls

The file employee consists of information on employees. The column titles and tables are listed only for your information and do not appear in the file. The file follows:

Dept / Last Name First Name / DateHired / Salary
mgt / Cooper John / 06151995 / 66000
mgt / Davidson Darla / 04151992 / 69500
mgt / MacDonald George / 06151985 / 70000
act / Smith Thomas / 04102002 / 56000
act / Smith Alecia / 04121991 / 65000
mis / MacLeod Janice / 01021977 / 90000
mis / Mack Joe / 02252003 / 85000
mis / Winslow Sarah / 02151995 / 58000
adm / Smith Dexter / 01021975 / 100000
mis / Bennett Joan / 08152001 / 79000
mgt / Neason Elizabeth / 10251998 / 65500
act / NeSmith Donald / 11301966 / 99500

In order to see what the file looks like, type in:

cat employee

Sort Command

Sort sorts the file on a line-by-line basis. If the first characters on two lines are the same, sort looks at the second characters to determine the proper order. This process continues until sort finds a character that differs between the lines. If lines are identical, it does not matter which one sort puts first. Sort uses the machine collating sequence, which in this case is ASCII code. Two important features of ASCII code are that capital letters come before lowercase letters and that numbers come before letters. A copy of the ASCII code chart is attached to the end of this lab for your reference.

An important point to remember is that sort is a filter and does not change the contents of the input file. It takes the contents of the specified input and outputs them in sorted order. If the output is not redirected to a file, it goes to stdout, which means the terminal screen, and the sorted output is not saved.
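The two points above (ASCII order, and sort acting as a filter) can be tried on a tiny scratch file. This is only a sketch: the file names sample.txt and sample.sorted are made up for illustration, and LC_ALL=C is an assumption added here so that sort uses plain ASCII order no matter how the machine's language settings are configured.

```shell
# Work in a scratch directory so nothing in lab8 is touched.
work=$(mktemp -d) && cd "$work"

# sample.txt is a hypothetical file: a lowercase letter, an uppercase letter, a digit.
printf 'b\nB\n7\n' > sample.txt

# Sorted output goes to stdout (the screen) unless it is redirected.
LC_ALL=C sort sample.txt

# Redirect the output to keep it; sample.txt itself is not changed.
LC_ALL=C sort sample.txt > sample.sorted
cat sample.sorted    # 7, then B, then b
cat sample.txt       # still b, B, 7
```

The digit comes first, then the capital letter, then the lowercase letter, which is the order of their codes on the ASCII chart.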

First, type the following command:

sort employee

What is the order that employee is sorted in? ______

Give a brief description of how the file is sorted. ______

The first sort puts the file in alphabetic order by the Dept field. If no options are specified, the sort command compares lines starting from their first character, using the machine's collating sequence (ASCII sequence on the SUN system). Listed below are some options that you will use to control the way in which the sort command works. There are other options available that are explained in the man page for sort.

-b ignore leading blanks - Blanks (TAB and SPACE characters) are normally field delimiters in the input file. Unless you use this option, sort also considers leading blanks to be part of the field they precede.

-f fold lowercase into uppercase - This considers all lowercase letters to be uppercase letters. Use this option when you are sorting a file that contains both uppercase and lowercase text.

-n numeric sort - When you use this option, minus signs and decimal points take on their arithmetic meaning and the -b option is implied. The sort utility does not order lines or order sort fields in the machine collating sequence but in arithmetic order.

-r reverse - Reverses the order of the sort. If it is a numerical sort, the output will be in descending order.
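A short sketch of the -n and -r options listed above, using three made-up values rather than the employee file. LC_ALL=C is an assumption added so that the default comparison is plain ASCII.

```shell
# Three made-up salary values, one per line.
values=$(printf '100000\n66000\n9\n')

# Default: character-by-character comparison, so "1" < "6" < "9".
by_chars=$(printf '%s\n' "$values" | LC_ALL=C sort)

# -n compares the lines as numbers; adding -r reverses the result.
by_number=$(printf '%s\n' "$values" | LC_ALL=C sort -n)
descending=$(printf '%s\n' "$values" | LC_ALL=C sort -n -r)

echo "$by_chars"     # 100000, 66000, 9
echo "$by_number"    # 9, 66000, 100000
echo "$descending"   # 100000, 66000, 9
```

Without -n, 100000 sorts ahead of 9 because only the first characters "1" and "9" are compared.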

You can sort on the different line fields. There are five line fields (dept, last name, first name, date hired, and salary). These fields are sequences of characters bounded by blanks or by a blank and the beginning or end of a line. You can use these line fields to define a sort field, and you can instruct sort to skip several fields if you wish. Even though there are five fields, the first field is considered to be +0, the second field is +1, the third is +2, etc. Think of this number as the number of fields to skip before beginning the sort. If you wish to sort on the first name field, you would use sort +2 employee.
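Some current versions of sort document the +number form as obsolete and may treat +2 as a file name. If that happens on your machine, the -k option names the same field, counting from 1 instead of 0, so +2 becomes -k 3. The sketch below assumes such a system; the file name staff and the use of LC_ALL=C (to force ASCII order) are choices made for this illustration.

```shell
work=$(mktemp -d) && cd "$work"

# Three rows in the style of the employee file, single blanks between fields.
cat > staff <<'EOF'
mgt Cooper John 06151995 66000
act Smith Alecia 04121991 65000
mis Mack Joe 02252003 85000
EOF

# Old form: sort +2 staff   (skip two fields, sort from the third)
# -k form:  the key starts at field 3, the first name.
by_first=$(LC_ALL=C sort -k 3 staff)
echo "$by_first"    # Alecia, then Joe, then John
```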

Now, sort on the field for last name.

sort +1 employee

Look at the sorted file. Are all the names sorted in alphabetical order? ______

Give a brief description of the output.______

There is a problem because MacLeod comes before Mack. Sort put "L" before "k" because it arranges lines in the order of ASCII character codes. See the last page of this lab to see the ASCII values associated with the alpha characters. Also NeSmith comes before Neason.

In this ordering, uppercase letters come before lowercase letters. You can use the -f option to have the sort command ignore this and sort alphabetically. This option will fold the lowercase letters into uppercase letters and correct the problem from the sort above. The option is placed before the field number that is to be used in the sort. Sort the file again using the following command:

sort -f +1 employee

What happened when you sorted it this time? ______
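The effect of -f can be seen in isolation on the four last names discussed above. This is a sketch on a list of names only, not on the employee file, and LC_ALL=C is an assumption added to force ASCII order.

```shell
# The four last names that caused trouble, one per line.
names=$(printf 'Mack\nMacLeod\nNeason\nNeSmith\n')

# Plain ASCII order: "L" (capital) comes before "k" (lowercase).
plain=$(printf '%s\n' "$names" | LC_ALL=C sort)

# -f folds lowercase into uppercase before comparing.
folded=$(printf '%s\n' "$names" | LC_ALL=C sort -f)

echo "$plain"     # MacLeod, Mack, NeSmith, Neason
echo "$folded"    # Mack, MacLeod, Neason, NeSmith
```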

The next sort will be on the "date hired" field. Also save these next three sort routines into files by using the redirect symbol. Remember that the redirect symbol saves, to a file, output that would normally go to the screen (stdout). Type in:

sort +3 employee > hired1

Use the cat command to list out the file hired1 to see the results. Are the hire dates sorted in order? ______

If not, what has happened? ______

This sort did not put the numbers in order but put the shortest name first in the sorted list and longest name last. With the +3, sort skips the first three line fields and counts the spaces or blanks after the third line field as part of the sort field. The ASCII value of a space character is less than that of any other printable character, so sort puts the date hired that is preceded by the greatest number of spaces first.

One solution to this problem is to eliminate the leading blanks by using the -b option. However, since dates hired is a numeric field, you can use the -n option.
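The leading-blank problem can be reproduced on three rows. This sketch assumes a sort that accepts the -k form (where -k 4 corresponds to +3); the file name staff, the column widths, and LC_ALL=C are choices made for this illustration.

```shell
work=$(mktemp -d) && cd "$work"

# printf pads each name to a fixed width, giving aligned columns.
printf '%-3s %-9s %-9s %s %s\n' \
    mis Bennett  Joan   08152001 79000 \
    mis MacLeod  Janice 01021977 90000 \
    mgt Davidson Darla  04151992 69500 > staff

# Without b, the blanks in front of the date are part of the sort field,
# so the row with the shortest first name (most blanks) comes first.
with_blanks=$(LC_ALL=C sort -k 4 staff)

# The b modifier on the key skips those leading blanks.
no_blanks=$(LC_ALL=C sort -k 4b staff)

echo "$with_blanks"    # Joan, Darla, Janice
echo "$no_blanks"      # Janice, Darla, Joan
```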

Type in:

sort -n +3 employee > hired2

What is the result of the sort? ______

When you sorted on the date hired, it basically sorted on the month hired. The numeric sort treats this date field as a single number and sorts in true numeric order. Hence, the month 01 (Jan) will come before the month 02 (Feb) regardless of the day and year of hire. You may want to sort on the year hired instead. This can also be done. You can skip not only line fields but also characters within a line field. The +3.4 skips three line fields and then skips four characters before it sorts. Remember that you must take care of the blank spaces also. Type in:

sort -nb +3.4 employee > hired3

What was the result? ______

Briefly explain what happened. ______
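The difference between sorting on the whole date and sorting on the year can be sketched on three rows. The sketch assumes a sort that accepts the -k form, in which +3.4 is written -k 4.5 (field 4, starting at its 5th character); the file name staff and LC_ALL=C are choices made for this illustration.

```shell
work=$(mktemp -d) && cd "$work"

printf '%-3s %-9s %-9s %s %s\n' \
    mis Mack     Joe    02252003 85000 \
    mgt Davidson Darla  04151992 69500 \
    act NeSmith  Donald 11301966 99500 > staff

# Whole date as one number: 02252003 < 04151992 < 11301966 (month order).
by_date=$(LC_ALL=C sort -n -k 4 staff)

# Start at the 5th character of field 4, which is the year.
# -b makes the character count begin after the leading blanks.
by_year=$(LC_ALL=C sort -n -b -k 4.5 staff)

echo "$by_date"    # Joe (02), Darla (04), Donald (11)
echo "$by_year"    # Donald (1966), Darla (1992), Joe (2003)
```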

Unfortunately, the sort works differently on different machines. You will need to be careful using the sort until you understand how the machine you work on will use the sort.

Sorting on more than one field

UNIX also allows you to sort on more than one field. One example would be to sort on the department and on salary. We can then see if people in one department are getting paid more than people in other departments. Of course, there are other things that may be factors in salary, such as years on the job. Type in:

sort +0 +4n employee

What was the result? ______

Were both columns sorted? ______

As you can see, there are complications! First of all, note that the n was used after the 4. That is because the first field is alphabetic and we do not want to sort the entire file with a numeric sort. So in this case, the n is used after the field number.

Next, the +0 with no ending position instructs sort to use the entire line, from the first field to the end of the line, as the first sort field. Look at the first field and second field in the first two lines. The sort routine compares the first field and then goes on to the second field. It settles the order on the names and does not go further, so the salary is never used. In order to stop the sort routine from going past the +0 field, you need to define where the first sort ends. In this case, it will be -1 (one). This will stop the first sort before it goes to the next field, and sort will then go on to the salary field. Type in the next command.

sort +0 -1 +4n employee

What were the results of this output? Was the file sorted on both the department and the salary field? ______
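The effect of ending the first sort field can be sketched on four rows. The sketch assumes a sort that accepts the -k form, in which "+0 +4n" is written "-k 1 -k 5n" and "+0 -1 +4n" is written "-k 1,1 -k 5n"; the file name staff and LC_ALL=C are choices made for this illustration.

```shell
work=$(mktemp -d) && cd "$work"

cat > staff <<'EOF'
mgt Cooper John 06151995 66000
mgt Neason Elizabeth 10251998 65500
act Smith Thomas 04102002 56000
mgt Davidson Darla 04151992 69500
EOF

# First key runs to the end of the line, so the names decide the
# order inside mgt and the salary key is never needed.
open_key=$(LC_ALL=C sort -k 1 -k 5n staff)

# First key stops at the end of field 1, so salary breaks the ties.
closed_key=$(LC_ALL=C sort -k 1,1 -k 5n staff)

echo "$open_key"      # salaries: 56000 66000 69500 65500
echo "$closed_key"    # salaries: 56000 65500 66000 69500
```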

Sorting data with more than one word in a field

In the data set example above, each new field was started after a blank space. Sometimes you want to have a field that contains two or more words. Examples would be names of books or their authors. The file books contains fields with multiple words in it.

Copy the file books from /tmp/csc3321/books to your lab8 directory by using one of the following commands:

cp /tmp/csc3321/books books
or
cp /tmp/csc3321/books .

The file books contains the following information:

Subject / Book Title / Author's Last Name / Author's First Name / Pub. Date / Price
UNIX: / Introduction to UNIX: / Wrightson: / Kate: / 2003: / 45.00:
UNIX: / Just Enough UNIX: / Anderson: / Paul: / 2003: / 39.00:
UNIX: / Bulletproof UNIX: / Gottleber: / Timothy: / 2002: / 48.00:
UNIX: / Learning the Korn Shell: / Rosenblatt: / Bill: / 1994: / 35.95:
UNIX: / A Student's Guide to UNIX: / Hahn: / Harley: / 1993: / 24.50:
UNIX: / Unix Shells by Example: / Quigley: / Ellie: / 1997: / 49.95:
UNIX: / UNIX and Shell Programming: / Forouzan: / Behrouz: / 2002: / 80.00:
UNIX: / UNIX for Programmers and Users: / Glass: / Graham: / 1993: / 50.00:
SAS: / SAS Software Solutions: / Miron: / Thomas: / 1993: / 25.95:
SAS: / The Little SAS Book, A Primer: / Delwiche: / Lora: / 1998: / 35.00:
SAS: / Painless Windows for SAS Users: / Gilmore: / Jodie: / 1999: / 40.00:
SAS: / Getting Started with SAS Learning: / Smith: / Ashley: / 2003: / 99.00:
SAS: / The How to for SAS/GRAPH Software: / Miron: / Thomas: / 1995: / 45.00:
SAS: / The Output Delivery System: / Haworth: / Lauren: / 2001: / 48.00:
SAS: / Proc Tabulate by Example: / Haworth: / Lauren: / 1999: / 42.00:
SAS: / SAS Application Programming: / Dilorio: / Frank: / 1991: / 35.00:
SAS: / Applied Statistics & SAS Programming: / Cody: / Ronald: / 1991: / 29.50:

Notice that you have book titles that contain more than one word. The names of the books have spaces in the titles. In this case, the entire title is one field. In order to sort a file on fields of this type, you need to add field delimiters. When you enter the data into a data set, you would use some character to tell where the fields end and the next one begins. In this case the delimiter character is the : (colon). It could be some other character. However, the delimiter character must be unique and not a character that will be in the regular field. You must use the -t option when sorting this file. This option is:

-tx set field delimiter - When you use this option, replace the x with the character that is the field delimiter in the input file. This character will be interpreted as an end of field during a sort.
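A sketch of the -t option on three colon-delimited rows. The file name titles and the exact layout of each row are assumptions made for this illustration (the real books file may be laid out differently), and the field is named with the -k form, which counts fields from 1. LC_ALL=C is added to keep the comparison independent of the machine's language settings.

```shell
work=$(mktemp -d) && cd "$work"

cat > titles <<'EOF'
UNIX:Just Enough UNIX:Anderson:Paul:2003:39.00
UNIX:Learning the Korn Shell:Rosenblatt:Bill:1994:35.95
SAS:The Output Delivery System:Haworth:Lauren:2001:48.00
EOF

# With : as the delimiter, a title with blanks in it is still one field.
# Field 5 is the publication date.
by_date=$(LC_ALL=C sort -t : -k 5n titles)

# Field 6 is the price; r reverses the numeric order.
by_price=$(LC_ALL=C sort -t : -k 6nr titles)

echo "$by_date"     # 1994, 2001, 2003
echo "$by_price"    # 48.00, 39.00, 35.95
```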

In order to sort this file on the publish date, issue the command:

sort -n -t: +4 books

What is the result? ______

Try another sort using the books file. Sort on the price field in reverse. Type in the following:

sort -nr -t: +5 books

What was the result? ______

Try one more sort, this time saving the sort to a file. This sort will be on two fields. Put it into a new file called newbooks. Type in:

sort -t: +0 +1 books > newbooks

Look at the file, newbooks. What does the sorted file look like now?

______

Use the vi editor to view and edit newbooks. Add your name to the top of the file. Save the file.

***************************************************************
Print out the file "newbooks" and attach it to this lab.
***************************************************************

Diff command

The diff command displays differences between two files on a line-by-line basis. It displays the differences as instructions that you can use to edit one of the files (using the vi editor) to make it the same as the other. When you use diff, it produces a series of lines containing Append (a), Delete (d), and Change (c) instructions. Each of these lines is followed by the lines from the file that you need to append, delete, or change. A less than symbol (<) precedes lines from file1. A greater than symbol (>) precedes lines from file2.
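The three kinds of instructions can be seen on very small scratch files. The file names file1, file2, and file3 and their contents are made up for this sketch.

```shell
work=$(mktemp -d) && cd "$work"

printf 'one\ntwo\nthree\n' > file1
printf 'one\nthree\n'      > file2
printf 'one\nTWO\nthree\n' > file3

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script that stops on errors from stopping here.
deleted=$(diff file1 file2 || true)    # line 2 of file1 is not in file2
appended=$(diff file2 file1 || true)   # file1 has an extra line after line 1 of file2
changed=$(diff file1 file3 || true)    # line 2 differs between the files

echo "$deleted"     # 2d1 followed by "< two"
echo "$appended"    # 1a2 followed by "> two"
echo "$changed"     # 2c2, "< two", "---", "> TWO"
```

In each instruction the number on the left refers to the first file named on the command line and the number on the right refers to the second.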

You will now need four files. These are telnos, telnos2, telnos3, telnos4. These files are all short files that contain names, departments, and telephone numbers. This is what they look like.

telnos

Hale Elizabeth Bot 744-6892
Harris Thomas Stat 744-7623
Davis Paulette Phys 744-9579
Cross Joseph MS 744-0320
Holland Tod A&S 744-8368

telnos2

Hale Elizabeth Bot 744-6892
Harris Thomas Stat 744-7623
Davis Paulette Phys 744-9579
Holland Tod A&S 744-8368

telnos3

Hale Elizabeth Bot 744-6892
Harris Thomas Stat 744-7623
Smith John Comsc 744-4444
Davis Paulette Phys 744-9579
Cross Joseph MS 744-0320
Holland Tod A&S 744-8368

telnos4

Hale Elizabeth Bot 744-6892
Smith John Comsc 744-4444
Davis Paulette Phys 744-9579
Cross Joseph MS 744-0320
Holland Tod A&S 744-8368

To make it easier to copy you can use the * (wildcard) to copy these files. Type in the command:

cp /tmp/csc3321/telnos* .

Remember that the . (period) means the current directory. This command will copy all of the telnos files at one time and assign them the names that they have in the instructor's directory.
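The wildcard copy can be tried safely on empty scratch files. In this sketch the directories src and dest are temporary stand-ins for the instructor's directory and your lab8 directory; they are not the real paths.

```shell
src=$(mktemp -d)     # stands in for the instructor's directory
dest=$(mktemp -d)    # stands in for your lab8 directory
touch "$src/telnos" "$src/telnos2" "$src/telnos3" "$src/books"

cd "$dest"

# telnos* matches every name that begins with telnos; books does not match.
cp "$src"/telnos* .

copied=$(ls)
echo "$copied"    # telnos, telnos2, telnos3
```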

In order to see how diff works, type in:

diff telnos telnos2

What was the result?
______

The difference between these two files (telnos and telnos2) is that the 4th line in telnos is missing from telnos2. The first line that diff displays (4d3) indicates that you need to delete the 4th line from file telnos to make the two files match. The 4 is the line number and the (d) is delete. The line number to the left of each of the a, c, or d instructions always pertains to file1. Numbers to the right of the instructions apply to file2. The diff command assumes that you are going to change file1 to file2. The next line that diff displays starts with a less than (<) symbol, indicating that this line of text is from file1. Next, type in:

diff telnos telnos3

What was the result?______

In this case, the second line has the (>) greater than sign which means that the extra line is in file2. The a means you must append a line to the file telnos after line 2 to make it match telnos3. Append means to add on to the end. Next is an example of the change feature. Type in the following command:

diff telnos telnos4

What was the result? ______

What lines do you need to change in order to make the two files alike?______

Notice that the three hyphens indicate the end of the text in the first file that needs to be changed and the start of the second file that needs to be changed. Next, copy telnos to telnos5. What command did you use to do this? ______

Next, type in:

diff telnos5 telnos2

What was the answer that you received? ______

Use the vi editor to change telnos5 to match the file telnos2. Then check to see if they are now alike.

What command did you use? ______

What was the result? ______

What is the output of the diff command when the files match?
______

When the two files are alike, there is no response. Unfortunately, Unix is not always user-friendly.
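Although diff prints nothing when the files match, it still reports the result through its exit status (0 when the files are alike, 1 when they differ), which a shell if statement can test. A minimal sketch, with the hypothetical file names a and b:

```shell
work=$(mktemp -d) && cd "$work"

printf 'Hale Elizabeth Bot 744-6892\n' > a
cp a b

# No output means no differences; the exit status says the same thing.
if diff a b; then
    result="files match"
else
    result="files differ"
fi
echo "$result"    # files match
```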

Uniq Command

The uniq command displays a file, removing all but one copy of successive repeated lines. If the file has been sorted, uniq ensures that no two lines that it displays are the same. Sort telnos and telnos3 and then send them to a new file called tel2. Type in the following: