Sorting and Merging

SORTING AND MERGING (CONTINUED)

What if we need to process records after they have been sorted, but before they are sent to a file or printed report?

We might want to sort a file and after sorting, read the sorted records and generate a report with control breaks, summary lines, etc., where the input is the sorted file.

We can do these things and much more via an "Output procedure."

Consider the full syntax of the sort statement (approximately as given in your book):

sort file-name-1 on {ascending | descending} key data-name-1 ...
    {input procedure is procedure-name-1 [{through | thru} procedure-name-2]
     | using file-name-2 ...}
    {output procedure is procedure-name-3 [{through | thru} procedure-name-4]
     | giving file-name-3 ...}

1

Just as with an input procedure, where the programmer elects to open, read, move, release, and close the input files, in an output procedure the programmer takes on the responsibility of opening, returning, moving, writing, and closing the appropriate files.

1

Let's take a look at the return statement:

return sort-file-name-1 [record] [into identifier-1]
    at end imperative-statement-1
    [not at end imperative-statement-2]    (Cobol 85)
[end-return]

Note the return appears like a "read" statement.

This is exactly what it is... a "read" of the sortfile!

Recall: sortfile is never opened or closed.

Therefore, when the sorted records are to be massaged, there is, in fact, no open or close of the sortfile, but we do still "return" records from it!!

1

The normal procedure in an output procedure (no pun intended) is to

open the output file (the file that would have been the "giving" file, i.e., the file to which the sorted records are to be written…)

return the first sorted record (just like the priming read in normal processing...)

perform a paragraph that processes that sorted record, writing the record somewhere, and terminate that performed paragraph with a trailing "return", just as you would have with a “read.”

(Of course, an in-line Perform with an embedded Return is okay too)

(Perform this paragraph until the sortfile is empty)

Close the output file (printer file, disk file, etc.)

Recognize that after the output procedure has completed, control returns to the sort statement, which then terminates; execution continues with the statement following the sort.
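The steps above can be sketched as a small, self-contained program. This is a minimal sketch, assuming GnuCOBOL in its default fixed format; every name (OUTPROC-DEMO, SORT-FILE, SR-NAME, 1000-LOAD, 2000-UNLOAD, and so on) is hypothetical. So that it runs without an input file, a hypothetical input procedure releases three hard-coded records in place of a "using" file, and DISPLAY stands in for moving fields and writing to the real output file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OUTPROC-DEMO.
      * Sketch: SORT with an OUTPUT PROCEDURE (hypothetical names).
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-FILE ASSIGN TO "SORTWK".
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-REC.
           05  SR-NAME          PIC X(10).
           05  SR-SALARY        PIC 9(6).
       WORKING-STORAGE SECTION.
       01  F-EOF                PIC 9 VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
           SORT SORT-FILE ON ASCENDING KEY SR-NAME
               INPUT PROCEDURE IS 1000-LOAD
               OUTPUT PROCEDURE IS 2000-UNLOAD
           STOP RUN.
       1000-LOAD.
      * Stand-in for a USING file: release 3 fixed records.
           MOVE "WILSON" TO SR-NAME
           MOVE 50000 TO SR-SALARY
           RELEASE SORT-REC
           MOVE "ADAMS" TO SR-NAME
           MOVE 62000 TO SR-SALARY
           RELEASE SORT-REC
           MOVE "LOPEZ" TO SR-NAME
           MOVE 71000 TO SR-SALARY
           RELEASE SORT-REC.
       2000-UNLOAD.
      * Step 1 would be: OPEN OUTPUT the real output file.
      * Step 2: priming RETURN, just like a priming READ.
           RETURN SORT-FILE
               AT END MOVE 1 TO F-EOF
           END-RETURN
      * Step 3: process until the sort file is exhausted.
           PERFORM 2100-PROCESS UNTIL F-EOF = 1.
      * Step 4 would be: CLOSE the output file.
       2100-PROCESS.
      * DISPLAY stands in for MOVEs and a WRITE.
           DISPLAY SR-NAME " " SR-SALARY
      * Trailing RETURN, just like a trailing READ.
           RETURN SORT-FILE
               AT END MOVE 1 TO F-EOF
           END-RETURN.
```

The records come back in name order (ADAMS, LOPEZ, WILSON) even though they were released in a different order, which is the whole point of returning them from the sortfile rather than reading the original file.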

Let's consider an example (not in the book) using the UNF faculty file again.

(Realize, this example is pertinent to any organization broken down by division, cost centers, pay-grade, name, etc.)

Assume that I want to sort the entire input file, called College-File, by college, rank, and name ... as before.

(Note: when I say that I want to sort the 'entire' file, this is a tip-off that I don't want to do anything special on the input side and, further, that I want all the records processed. This ==> means I will use "using"....)

But I want to generate a print file in which each full professor gets a 10% raise, each associate professor gets an 8% raise, and each assistant professor gets a 5% raise. No other faculty members get raises.

Now, I do need the entire file sorted in the desired sequence.

Further, no records are to be eliminated from the printout - but some of the records have additional processing to be done on them before printing them out.

This is the perfect candidate for the output procedure...

Consider the code:

1

FD College-File
    .....
01 college-file-record.
    ...

SD Sortfile
    ....
01 sortfile-record.
    05 sf-fac-name ...
    05 (other fields)
    05 sf-fac-rank ...
    05 (other fields)
    05 sf-salary ...
    05 sf-college ...
    05 the-rest (still more other fields) ...

FD Salary-Report
    ....
01 salary-report-record.
    ... {formatted print record}

procedure division.
    ...
    sort Sortfile
        on ascending key sf-college
        descending key sf-fac-rank
        ascending key sf-fac-name
        using College-File
        output procedure is 2000-get-raise.
    …..

1

2000-get-raise.
    open output Salary-Report
    move 0 to f-eof                      (eof flag)
    return Sortfile                      (note: the sd name...)
        at end move 1 to f-eof
    end-return
    perform 2100-process-raise
        until f-eof = 1
    close Salary-Report.

2100-process-raise.
    {move name and other fields to print rec}
    evaluate sf-fac-rank
        when "fullprof"                  (or encoded)
            multiply sf-salary by 1.10 giving p-sr-salary
        when "assocprof"
            multiply sf-salary by 1.08 giving p-sr-salary
        when "asstprof"
            multiply sf-salary by 1.05 giving p-sr-salary
        when other                       (no raise, but still printed)
            move sf-salary to p-sr-salary
    end-evaluate
    write salary-report-record
    return Sortfile
        at end move 1 to f-eof
    end-return.

next procedure…

1

======Older COBOL – but you may still see this ======

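Bear in mind that in Cobol 74 and older versions, the output procedure had to be a section, so you would have to code a go to that branches around the remaining paragraphs of the section to an exit paragraph at its end in order to terminate the section. (Those versions also lack scope terminators such as end-if and end-return, so statement groups are ended with periods.)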

2000-get-raise section.                      <====
2050-get-sorted-records.                     <====
    open output salary-report.
    move 0 to f-eof.
    return sortfile                          (note: the sd name...)
        at end move 1 to f-eof.
    perform 2100-process-raise
        until f-eof = 1.
    close salary-report.
    go to 2200-end-of-sort.                  <=====
2100-process-raise.
    {move name and other fields to print rec}
    if sf-fac-rank = "fullprof"
        multiply sf-salary by 1.10 giving p-sr-salary
    else
        if sf-fac-rank = "assocprof"
            multiply sf-salary by 1.08 giving p-sr-salary
        else
            if sf-fac-rank = "asstprof"
                multiply sf-salary by 1.05 giving p-sr-salary
            else
                move sf-salary to p-sr-salary.
    write salary-report-record.
    return sortfile
        at end move 1 to f-eof.
2200-end-of-sort.                            <======
    exit.                                    <======
3000-next section (or physical end of program...)

1

======end older-COBOL ======

When to use input and/or output procedures

Depending upon the type of task you wish to do, it may be more advantageous to use an input procedure rather than an output procedure, or vice versa.

choose an input procedure:

briefly, if a large number of records will be eliminated from qualifying for the sort, then the sort is much more efficient if we eliminate these records before the sort. (why sort many records that we are not interested in?)

choose an output procedure:

if, however, there are only a few records we are not interested in, it is probably not worth scanning the entire input file just to eliminate them; drop them in the output procedure instead.

choose an input or an output procedure:

if the format of the receiving file is to be different from that of the input file, then an input or an output procedure must be used.

(somewhere along the line, we must reformat the record fields...)

choose an output procedure:

e.g., the output from a sort is to go to a report with control breaks based on the sort fields: an output procedure is the ticket.

1

your sorted file is in order, but you must (presumably) reformat the print records with spaces in between the output fields for readability (not present in the input records).

merely use an output procedure and as part of the procedure, reformat the record into the print format for printing. Under program control, you may handle control breaks, as appropriate...
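A control break inside an output procedure might look like the following. This is a minimal sketch, assuming GnuCOBOL in fixed format; the program, field, and paragraph names and the four faculty records are hypothetical. Each literal moved to SORT-REC is laid out as college (4 characters), name (10), and salary (6 digits), and DISPLAY stands in for writing the formatted print record.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BREAK-DEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-FILE ASSIGN TO "SORTWK".
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-REC.
           05  SR-COLLEGE       PIC X(4).
           05  SR-NAME          PIC X(10).
           05  SR-SALARY        PIC 9(6).
       WORKING-STORAGE SECTION.
       01  F-EOF                PIC 9 VALUE 0.
       01  WS-PREV-COLLEGE      PIC X(4).
       01  WS-COLLEGE-TOTAL     PIC 9(8) VALUE 0.
      * Print line with spacing not present in the sort record.
       01  P-LINE.
           05  P-COLLEGE        PIC X(4).
           05  FILLER           PIC X(2) VALUE SPACES.
           05  P-NAME           PIC X(10).
           05  FILLER           PIC X(2) VALUE SPACES.
           05  P-AMOUNT         PIC ZZ,ZZZ,ZZ9.
       PROCEDURE DIVISION.
       0000-MAIN.
           SORT SORT-FILE ON ASCENDING KEY SR-COLLEGE SR-NAME
               INPUT PROCEDURE IS 1000-LOAD
               OUTPUT PROCEDURE IS 2000-REPORT
           STOP RUN.
       1000-LOAD.
      * Stand-in for a USING file.
           MOVE "BUS KING      060000" TO SORT-REC
           RELEASE SORT-REC
           MOVE "ARTSYOUNG     050000" TO SORT-REC
           RELEASE SORT-REC
           MOVE "ARTSBAKER     055000" TO SORT-REC
           RELEASE SORT-REC
           MOVE "BUS CHEN      070000" TO SORT-REC
           RELEASE SORT-REC.
       2000-REPORT.
      * Priming RETURN; remember the first control field.
           RETURN SORT-FILE
               AT END MOVE 1 TO F-EOF
           END-RETURN
           IF F-EOF = 0
               MOVE SR-COLLEGE TO WS-PREV-COLLEGE
               PERFORM 2100-DETAIL UNTIL F-EOF = 1
      * Final break for the last college.
               PERFORM 2200-SUBTOTAL
           END-IF.
       2100-DETAIL.
      * A change in college triggers the control break.
           IF SR-COLLEGE NOT = WS-PREV-COLLEGE
               PERFORM 2200-SUBTOTAL
               MOVE SR-COLLEGE TO WS-PREV-COLLEGE
           END-IF
           MOVE SR-COLLEGE TO P-COLLEGE
           MOVE SR-NAME TO P-NAME
           MOVE SR-SALARY TO P-AMOUNT
           DISPLAY P-LINE
           ADD SR-SALARY TO WS-COLLEGE-TOTAL
           RETURN SORT-FILE
               AT END MOVE 1 TO F-EOF
           END-RETURN.
       2200-SUBTOTAL.
           MOVE WS-PREV-COLLEGE TO P-COLLEGE
           MOVE "SUBTOTAL" TO P-NAME
           MOVE WS-COLLEGE-TOTAL TO P-AMOUNT
           DISPLAY P-LINE
           MOVE 0 TO WS-COLLEGE-TOTAL.
```

Because the sort has already grouped the records by college, the output procedure only has to compare each record's college with the previous one; no extra pass over the data is needed.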

Alternatives to input and output procedures

Your text goes into a couple of routines where there is a requirement to count the number of records in a file.

The main idea here is that it is generally inefficient to process an entire file just to count the records, close the file, and then submit the file to the sort.

it is more efficient to incorporate a count of the records within an input or output procedure.
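Counting inside the procedures might look like the following. This is a minimal sketch, assuming GnuCOBOL in fixed format; all names and the four-entry table (a stand-in for an input file, laid out as name (7 characters) plus an active flag Y/N) are hypothetical. The input procedure counts every record it examines and every record it releases, and the output procedure counts the records it returns, so no separate counting pass is needed. It also shows the in-line perform with an embedded return mentioned earlier.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COUNT-DEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-FILE ASSIGN TO "SORTWK".
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-REC.
           05  SR-NAME          PIC X(7).
       WORKING-STORAGE SECTION.
      * Stand-in for an input file: name(7) + active flag(1).
       01  WS-RAW.
           05  FILLER           PIC X(8) VALUE "LOPEZ  Y".
           05  FILLER           PIC X(8) VALUE "ADAMS  N".
           05  FILLER           PIC X(8) VALUE "WILSON Y".
           05  FILLER           PIC X(8) VALUE "CHEN   Y".
       01  WS-TABLE REDEFINES WS-RAW.
           05  WS-ENTRY         OCCURS 4 TIMES.
               10  WS-NAME      PIC X(7).
               10  WS-ACTIVE    PIC X.
       01  WS-I                 PIC 9(4).
       01  WS-READ-COUNT        PIC 9(4) VALUE 0.
       01  WS-RELEASE-COUNT     PIC 9(4) VALUE 0.
       01  WS-RETURN-COUNT      PIC 9(4) VALUE 0.
       01  F-EOF                PIC 9 VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
           SORT SORT-FILE ON ASCENDING KEY SR-NAME
               INPUT PROCEDURE IS 1000-SELECT
               OUTPUT PROCEDURE IS 2000-LIST
           DISPLAY "READ:     " WS-READ-COUNT
           DISPLAY "RELEASED: " WS-RELEASE-COUNT
           DISPLAY "RETURNED: " WS-RETURN-COUNT
           STOP RUN.
       1000-SELECT.
      * Count every input record; release only active ones.
           PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 4
               ADD 1 TO WS-READ-COUNT
               IF WS-ACTIVE (WS-I) = "Y"
                   MOVE WS-NAME (WS-I) TO SR-NAME
                   RELEASE SORT-REC
                   ADD 1 TO WS-RELEASE-COUNT
               END-IF
           END-PERFORM.
       2000-LIST.
      * Count the sorted records as they are returned.
           RETURN SORT-FILE
               AT END MOVE 1 TO F-EOF
           END-RETURN
           PERFORM UNTIL F-EOF = 1
               ADD 1 TO WS-RETURN-COUNT
               DISPLAY SR-NAME
               RETURN SORT-FILE
                   AT END MOVE 1 TO F-EOF
               END-RETURN
           END-PERFORM.
```

Here ADAMS is examined and counted but never released, so it never reaches the sort; the three counters are available as soon as the sort statement finishes.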

1

Sort Options

format: result

1. using / giving: the file is sorted; no special handling.

2. input procedure / giving: used for processing the unsorted records before they are sorted. Writes records to the sort file with the release verb; after the input procedure is completed, the records are automatically sorted at that time…

3. using / output procedure: used for processing the sorted records before writing them to the output file. Access (read) records from the sortfile with the return verb.

4. input procedure / output procedure: used for processing the data both before and after the desired (qualified) records are sorted (restricting / qualifying numbers of records; counts, etc.; reporting, computed fields, ...).

1

The Merge Statement

There is a merge capability within Cobol that allows us to merge two or more files that are already in the same sequence into one merged file.

Observe the syntax of the merge statement:

merge file-name-1 on {ascending | descending} key data-name-1 ...
    using file-name-2 file-name-3 ...
    {output procedure is procedure-name-1 [{through | thru} procedure-name-2]
     | giving file-name-4}

Note that it appears like the sort statement.

Note: files are already sorted in same sequence.

Note also that file-name-1 is an SD file and that there are multiple input files to be merged (there must be at least two...).

But, unlike the sort, there is no input procedure option: the input files are always named in the using phrase. The output options, however, include a

giving option (as in the sort), and an

output procedure option (again, as in the sort).

So, what this means is that you may merge records only after they have been sorted!

Now, because we have the output procedure option as well as the giving option,

we can massage the merged records prior to delivering them to their final destination.

1

The merge statement automatically opens, reads, and closes the input (using) files and moves (releases) their records into the merge file; you never code these steps yourself.

Merged files, of course, are merged in the sequence specified in the merge statement: ascending, descending, or a combination of the two.

So when is it appropriate to use the merge statement?

1. to flag duplicate records as errors

We may want to eliminate records that have the same unique key, such as ssan, stock number, etc.

2. to retain duplicate records

there are instances when duplicate records may be desirable, as in the example in your text.

(upstate and downstate records...take a look...)
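Flagging duplicates during a merge might look like the following. This is a minimal sketch, assuming GnuCOBOL in fixed format with line sequential files; the file names UPSTATE.DAT and DOWNSTATE.DAT, the 5-character key, and all other names are hypothetical. The program first writes two small input files that are already in key order (a merge requires this), then merges them with an output procedure that compares each key with the previous one.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MERGE-DEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT UPSTATE-FILE ASSIGN TO "UPSTATE.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT DOWNSTATE-FILE ASSIGN TO "DOWNSTATE.DAT"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT MERGE-FILE ASSIGN TO "MERGEWK".
       DATA DIVISION.
       FILE SECTION.
       FD  UPSTATE-FILE.
       01  UP-REC               PIC X(15).
       FD  DOWNSTATE-FILE.
       01  DOWN-REC             PIC X(15).
       SD  MERGE-FILE.
       01  MERGE-REC.
           05  MR-KEY           PIC X(5).
           05  MR-NAME          PIC X(10).
       WORKING-STORAGE SECTION.
       01  F-EOF                PIC 9 VALUE 0.
       01  WS-PREV-KEY          PIC X(5) VALUE SPACES.
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 1000-BUILD-TEST-FILES
      * MERGE opens, reads, and closes the USING files itself.
           MERGE MERGE-FILE ON ASCENDING KEY MR-KEY
               USING UPSTATE-FILE DOWNSTATE-FILE
               OUTPUT PROCEDURE IS 2000-CHECK-DUPS
           STOP RUN.
       1000-BUILD-TEST-FILES.
      * Each input file must already be in MR-KEY order.
           OPEN OUTPUT UPSTATE-FILE
           MOVE "10001ALVAREZ   " TO UP-REC
           WRITE UP-REC
           MOVE "10005BROOKS    " TO UP-REC
           WRITE UP-REC
           CLOSE UPSTATE-FILE
           OPEN OUTPUT DOWNSTATE-FILE
           MOVE "10003CARTER    " TO DOWN-REC
           WRITE DOWN-REC
           MOVE "10005DIAZ      " TO DOWN-REC
           WRITE DOWN-REC
           CLOSE DOWNSTATE-FILE.
       2000-CHECK-DUPS.
      * Priming RETURN from the merge file.
           RETURN MERGE-FILE
               AT END MOVE 1 TO F-EOF
           END-RETURN
           PERFORM 2100-COMPARE UNTIL F-EOF = 1.
       2100-COMPARE.
      * Equal keys end up adjacent after the merge.
           IF MR-KEY = WS-PREV-KEY
               DISPLAY MR-KEY " " MR-NAME " *** DUPLICATE KEY"
           ELSE
               DISPLAY MR-KEY " " MR-NAME
           END-IF
           MOVE MR-KEY TO WS-PREV-KEY
           RETURN MERGE-FILE
               AT END MOVE 1 TO F-EOF
           END-RETURN.
```

The same output procedure could instead skip the flagged record (to eliminate duplicates) or write both (to retain them); only the action inside the IF changes.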

Using a utility in lieu of the Sort/Merge verb

System-supplied utilities versus Cobol-embedded facilities:

Rest assured that system-provided sort utilities are generally much more efficient than the sort facilities built into a language.

True, you must

1. describe the file,

2. identify the fields in the record to sort/merge on,

3. give the relative positions in each record where the sort fields are located, and

4. specify whether the sort is ascending or descending, etc.,

but the utilities are streamlined, language independent, and very efficient in their use of system resources.

The advantage of the Sort/Merge facilities within Cobol is that they are part of a program (not a separate program) and can be invoked without leaving the Cobol program to run the system utility.

If you are given the option of using a sort utility as a separate job step or the Cobol sort within a program, I would select the system-provided utility.

(for sorting entire files and not qualifying records….)

1