Programming Project: Iterative List Merge Sort

Programming Project: Iterative List Merge Sort

Timothy Rolfe, “Iterate Through the Tulips”Page 1

Iterate Through the Tulips

Timothy J. Rolfe

Article being prepared for submission to Dr. Dobb’s Journal

Within Java, there is a powerful pair of object types built for going through the elements of a collection, the Iterator and the ListIterator. They provide an implementation-independent method of access, in which the user does not need to know whether the underlying implementation is some form of array or of linked list.

For both, the iterator keeps track of what will be examined next: conceptually, it is always sitting between elements within the collection. The Interface Iterator is the simpler of the two. It only mandates three methods: boolean hasNext(), whose purpose is quite clear in its name; Object next(), which returns the next element in the collection; and void remove(), which removes from the collection the last element returned by next(). This facilitates the traversal of the collection, without worrying about linkages within linked lists or subscripting within arrays.

Note that the Interface Iterator only traverses the collection in one direction, and only supports a single modifier for the collection, the remove. If you require a more powerful way to manipulate the collection, you need to use the Interface ListIterator. This contains methods for traversing the collection in both directions (previous() as well as next()). In addition, it contains more methods for modifying the collection, besides add() and remove(), it also allows modification of the collection by replacing one element with another — set(Object). Further, the ListIterator gives you access to the index within the collection of the element that will be accessed via either next() or previous(): nextIndex() and previousIndex().

To "learn by doing", I'm going to work through three sorting algorithms, looking at what you need to do to implement them by directly manipulating a doubly-linked list, and then what you need to do to implement them through a ListIterator (since the Iterator itself is not powerful enough).

Selection Sort

About the simplest and most intuitive sorting algorithm is the one that poker players might use to put their card hands in order: initially, hold all of your jumbled cards in one hand, and then repeatedly move the largest available card to the back of the cards held in the other hand.

Listing One shows the implementation of this in Java 1.5.x (that is, using a generic object (similar to use of a template within C++) within a linked list. Since it moves values around among the nodes of the list, the fact that it is a doubly-linked list does not enter into the code. Specifically, this is a method added to the class KWLinkedList that accompanies Koffman and Wolfgang's Objects, Abstraction, Data Structures and Design Using Java Version 5.0.

Listing One
// The list structure is unchanged, while data values are moved
// around within it. Since the class Node is a nested class, we
// have access to its private fields.
public void selectSort()
{ Node<E> front = head;
if ( front == null ) // Empty list
return;
// When only one item remains, we are finished: it is the largest
while ( front.next != null )
{ Node<E> check = front.next,
small = front;
E smallValue = small.data;
while ( check != null )
{ if ( smallValue.compareTo(check.data) > 0 )
{ small = check;
smallValue = small.data;
}
check = check.next;
}
// Finish the swap begun by setting smallValue
small.data = front.data;
front.data = smallValue;
// Advance to the next position in the list
front = front.next;
}
}

If, however, you were using the class java.util.LinkedList, you would not have access to the nodes. That means you can't just keep your finger on the node with the smallest value; all you can capture is what and where that smallest value is. That means that after you have found the smallest value, you have to send an iterator down to that position to access the node. Remember that the generation of an iterator at a given index requires the traversal of the linked list from its beginning down to that position —linked list is not a random-access structure. While you could have one iterator acting like the outer loop (keeping track of the front of the remaining list), the iterator accomplishing the task of the nested loop would always need to traverse the entire list. It first traverses down to the starting point and then traverses to the end as you look for the smallest value.

If, however, you use an iterator, you can avoid many of these traversals by actually building a new list. To accomplish this, you first clone the existing list; that is, you generate a "shallow copy", a linked list that references the same nodes as the original list. Once you've done that, you can use the clear() method on the original list (that is, set its head and tail references to null and its size to zero) and build your new list there. Now you make a slight shift: repeatedly find the largest element in the old list, remove it, and put it at the front of the new list. This way each traversal of the old list gets ever shorter. Since you're building the new list in a stack-like manner, you effectively reverse the order, so that the resulting list goes from smallest to largest.

Listing Two shows the implementation of the algorithm described above. While it references the KWLinkedList, it is in fact using methods that also exist in java.util.LinkedList. Since this is a static method (within a little library of ListIterator sorting methods), it does not jump through the hoops necessary to use generics, but instead depends on using Comparable objects. The Java version 1.5.x compiler does get rather unhappy with the unchecked references!

Listing Two
public static void selectSort ( KWLinkedList work )
{ KWLinkedList oldList = work.clone();
ListIterator finder;
Comparable maxValue;
if ( oldList.size() < 2 ) // Empty or single-element list
return;
work.clear(); // Empty out the list to receive the new contents
// When only one item remains, it is the smallest and goes first
while ( oldList.size () > 1 )
{ int maxIndex = 0;
finder = oldList.listIterator(0);
maxValue = (Comparable) finder.next();
while ( finder.hasNext() )
{ Comparable testValue = (Comparable) finder.next();
if ( maxValue.compareTo(testValue) < 0 )
{ maxValue = testValue;
// We've just stepped PAST the item, so correct for that
maxIndex = finder.nextIndex() - 1;
}
}
// Note: this generates an extra traversal
finder = oldList.listIterator(maxIndex);
maxValue = (Comparable) finder.next();
finder.remove(); // Eliminate the element returned by finder.next()
work.addFirst(maxValue); // Add to the FRONT of the new list.
}
// Move the last value --- the smallest item
finder = oldList.listIterator(0);
maxValue = (Comparable) finder.next();
finder.remove(); // not really required, given garbage collection
work.addFirst(maxValue);
}

Insertion Sort

Another fairly intuitive sorting algorithm is the one that bridge players might use to put their card hands in order: from the pile of cards in front of you, pick up one card at a time and put into its proper place with the cards that you already have picked up.

Listing Three shows the implementation of this in Java 1.5.x within a linked list. Since the method does move nodes around, it is important that you correctly deal with the doubly-linked list. The head of the existing list is copied to a node that will be used to walk through that list. Then the initial list is transformed into a single-entry list. As you walk through the old list, you remove each node and then insert it into its proper place in the new list. This task is made somewhat complex by the doubly-linked list and the cases of inserting into a doubly-linked list at the front, middle, and back of the existing list. As often happens with linked lists, the code gets long handling all the cases.

Listing Three
public void insertSort()
{ Node<E> working = head;
// No work to do if the list is empty or has just one element
if ( head == null || head.next == null )
return;
// The nodes to be inserted begin with the SECOND
working = working.next;
// Transform the existing list into a one-node list
head.prev = head.next = null;
tail = head;
working.prev = null; // I.e., this is now the front of the list
while ( working != null )
{ Node<E> current = head;
Node<E> hold = working;
E holdData = hold.data;
working = working.next; // New front of the remaining list
if ( working != null ) // Maintain appropriate back link
working.prev = null;
// Find where this element belongs in the list being generated
while ( current != null ) // "<= 0" to guarantee a stable sort
if ( holdData.compareTo(current.data) <= 0 )
break;
else
current = current.next;
// The "hold" node belongs BEFORE current
hold.next = current;
if ( current == null ) // Add to end of list
{ hold.prev = this.tail;
this.tail = this.tail.next = hold;
}
else // Check whether this is the beginning or farther along
{ if ( current.prev != null )
{ hold.prev = current.prev;
current.prev.next = hold;
current.prev = hold;
}
else // We must be adding this to the front of the list
{ hold.prev = null;
current.prev = hold;
head = hold;
}
}
}
}

The iterator implementation, on the other hand, doesn't have to deal with all that messy doubly-linked list stuff. It can trust the ListIterator methods to remove and add elements correctly. The spirit is the same, though. Remove from the old list and insert into proper position within the new list. Start by cloning the list to be sorted, and then use the clear() method on the list received so that it ends up with the sorted contents. Move one element directly over from the old list to the new list (using the LinkedList method addFirst). Then walk through the old list, removing elements and explicitly inserting them into proper position within the new list (using the ListIterator method add). Since the ListIterator methods handle the messing linked list stuff, the program is briefer and clearer. Again, as a static method it cannot use generics and so uses Comparable objects. The methods of KWLinkedList used are identical with those in java.util.LinkedList.

Listing Four
public static void insertSort ( KWLinkedList work )
{ KWLinkedList oldList = work.clone();
ListIterator oldIter = oldList.listIterator();
Comparable hold;
if ( ! oldIter.hasNext() ) // Empty list. Return.
return;
hold = (Comparable) oldIter.next(); // Pull first item
if ( ! oldIter.hasNext() ) // One-item list. Return
return;
work.clear(); // Empty out the list to receive the new contents
work.addFirst(hold); // First entry in the new list
oldIter.remove();
while ( oldIter.hasNext() )
{ ListIterator finder = work.listIterator(0);
hold = (Comparable) oldIter.next();
oldIter.remove(); // Explicit REMOVE this from the list.
while ( finder.hasNext() )
{ Comparable test = (Comparable) finder.next();
if ( test.compareTo(hold) > 0 )
{ finder.previous(); // Move past the in-position item
break;
}
}
finder.add(hold);
}
}

Merge Sort

Within the computing community, these two algorithms are well known as not being appropriate for large data sets (unless, for Insertion Sort, the data set is nearly in proper order). The time required increases as the square of the problem size. When dealing the linked lists, the Merge Sort algorithm is preferable — the time required increases only slightly faster than the problem size. In the jargon, Merge Sort is an order(n log n) algorithm. Here the fundamental operation is merging two sorted lists into a single, larger sorted list. Here also you can invoke my favorite expression while teaching computing: "Recursion is your friend!"

Merge Sort is best viewed as a recursive method. When you get the list to be sorted inside of Merge Sort, you check to see if it has size 0 or 1 — so that it is already sorted and you don't have anything to do. Otherwise you break the list into two lists of equal or nearly equal size, calling the first half the "left" list and the second half the "right" list. Recursively sort those two lists and then merge them into a final list. To do that, repeatedly check the beginning of the two lists. If the left entry is less than or equal to the right entry, move the left entry over to the final list, otherwise move the right. Once one of the two lists has gone empty, just move the rest of the other list over to the final list. [Note: we select from the left list on equal values to make this a "stable" sorting algorithm, namely one that does not undo an existing order. In other words, if in the original list item A has a lower index than item B, then if A and B are equal A still has a lower index than B in the sorted list.]

As you build more powerful sorting methods, you have to work harder. Listing Five shows the implementation of this in Java 1.5.x within a doubly-linked list. It also uses a non-standard constructor within the KWLinkedList — one that receives the references for the head, the tail, and the size and constructs a list with those fields, in the process insuring that the resulting list is a correct doubly-linked list. That means that the head node has a null previous link, and the tail node has a null next link. If the left-list size is size/2 (with truncation, if appropriate), you break the list in half by taking the appropriate number of steps from the head node to be sitting on the last node in the left list. From there you know what will be the head node in the right sublist (the one immediately after this one), and you know the last node in the right sublist since it is the last node in the input list. Once you've built the right sublist, you can easily build the left sublist, using the input list's head node as first and the node you're sitting on as the last.

Recursion is your friend: make the calls and these two lists have become sorted. Now you just have the merge process to go through. First, though, you can use a technique to avoid having the first entry as a special case: you use a dummy header. The real list begins after that single node. You keep track of the node that the next entry comes after — call it the current node. You then set up to step through the left and right lists. You repeatedly check how the nodes at the front of those two lists compare, moving from the right list if that is strictly less than the left entry, or else from the left list. Once one of the two lists has gone empty, you can just tack the non-empty list to the back of the list you're building — make that entire list come after the current node. Once you're finished, you just discard the dummy header. Since Java does your garbage collection for you, all you need to do is to update what head refers to.

Listing Five
public void mergeSort()
{ KWLinkedList<E> leftList, rightList;
Node<E> current = this.head;
int leftSize = this.size / 2,
rightSize = this.size - leftSize,
k; // loop variable
if ( this.size < 2 ) return;
// Traverse to the _end_ of the left sublist
// Start at 1 to go (leftSize-1) steps
for ( k = 1; k < leftSize; k++ )
current = current.next;
// NOTE: it is CRITICAL that the right list be generated first, since
// creation of the left list sets current.next to null.
rightList = new KWLinkedList<E> ( current.next, this.tail, rightSize );
leftList = new KWLinkedList<E> ( this.head, current, leftSize );
this.head = this.tail = null; this.size = 0;
// Recursively sort these two sublists.
leftList.mergeSort(); rightList.mergeSort();
// Start out the list with a dummy header for convenience ---
// that way the first entry is not a special case.
current = this.head = new Node<E>(null);
// Merge these two lists into a single list
// Execute until one of the two lists goes empty.
while (leftList.head != null & rightList.head != null )
{ // ? Pull from the right list ?
if (leftList.head.data.compareTo(rightList.head.data) > 0 )
{ current.next = rightList.head; // Link in the node
rightList.head = rightList.head.next;// Advance this list
}
else // pull from the left list.
{ current.next = leftList.head; // Link in the node
leftList.head = leftList.head.next; // Advance this list
}
current.next.prev = current; // Set the back link
current = current.next; // Update the current node
}
// Append the non-empty list at the back of the current list
if ( leftList.head != null ) // Left sublist has contents
{ current.next = leftList.head;
current.next.prev = current; // Set the back link
this.tail = leftList.tail; // and grab the tail setting
}
else // Right sublist has contents
{ current.next = rightList.head;
current.next.prev = current; // Set the back link
this.tail = rightList.tail; // and grab the tail setting
}
// Regenerate the size
this.size = leftList.size + rightList.size;
// REMOVE the dummy header --- the garbage collector will recover it.
this.head = this.head.next;
}