Laboratory 13

Searching

Introduction

We now turn our attention to a second fundamental list-processing algorithm, that of searching a list for a desired object. In this lab, the objects belong to the class Player. Specifically, each Player object corresponds to a record containing the name, team, and some batting statistics for a major league baseball player. The class LeagueData allows us to manipulate the entire list of Player objects for a given league, performing operations such as searching for a particular player’s record, and listing the top ten hitters in the league. As an exercise, you will be asked to add more member functions to this class. Our primary goal, however, is to use this setting to compare the two algorithms for searching: linear searching vs. binary searching.

Key Concepts

Abstract Data Types

Arrays

Problem Solving: Linear vs. Binary Searching

Complexity: Average case and worst case measures

Before the Lab

Read Section 9.8 in Cohoon and Davidson.

Preliminary

Turn the PC on, if necessary.

Access the network.

Copy the source files from the Lab 13 Source Files folder.

Start up CodeWarrior.

Open the LabWork project.

Linear and Binary Searches

Observation

The purpose of this program is to compare the efficiency of the linear search to the binary search. The difference is dramatic, as you will see, indicating that the binary search is much better than the linear search. On the other hand, the binary search requires that the list be sorted, which itself takes more time than a linear search.

Open the LabWork project.

Open the files PlayerSearch.cpp, leaguedata.cpp, and player.cpp. and add them to the LabWork project.

Read the program PlayerSearch.cpp to see what the program does.

In this program, we perform a linear search for Jeff Kent’s record and a binary search for Mike Piazza’s record, by sending to the appropriate search member functions as a string constant the last name of the Player we are interested in. Then, to see what happens when a record is not found, we perform a linear search for Valente and a binary search for Koether.

The LinSearch() and BinSearch() functions return the number of comparisons made during the search. The program PlayerSearch.cpp reports the count after each search. Each function also has a second parameter, verbose. If this parameter is true, then the search will print the record if it is found, and it will print a message if the record is not found. If this parameter is false or omitted, then the search functions will print nothing.

Run PlayerSearch.cpp.

How many comparisons were required to find Jeff Kent’s record?

How many comparisons were required to find Mike Piazza’s record?

Assume this particular list has 341 Player objects.

How many comparisons, in the worst case, should we expect of the linear search?

How many comparisons, on the average, should we expect the linear search to perform?

How many comparisons, in the worst case, should we expect of the binary search? Note that with each comparison, the number of records remaining under consideration is cut in half.

How many comparisons, on the average, should we expect of binary search to locate an object in this list of 341 objects?

Write down your answers to the above questions. Later in this lab we will experiment to see if your answers are close to the empirical results.

Open the file leaguedata.cpp.

Read the functions LinSearch() and BinSearch() to see how the search functions are implemented.

Open the file PlayerRandomSearch.cpp.

Add PlayerRandomSearch.cpp to the LabWork project.

Remove PlayerSearch.cpp from the LabWork project. Leave player.cpp and leaguedata.cpp in the project.

Earlier in this lab, you guessed the number of comparisons in the average and worst cases for the linear and binary search algorithms performed on a list of 341 records. Now we will test your hypotheses by running a program that will conduct 1000 random linear searches and reporting the average and the maximum number of comparisons. The maximum and the average should be good approximations to the worst case and the true average number of comparisons, respectively.

Run PlayerRandomSearch.cpp.

What are the maximum and average numbers of comparisons?

Is this result surprising? How does it compare to your guesses?

If your guesses were not close, then ask your instructor for an explanation.

Now we will repeat this experiment for binary searches by performing 1000 random binary searches and reporting the results. In the program PlayerRandomSearch.cpp,

Change the function call LinSearch() to the call BinSearch().

Run PlayerRandomSearch.cpp.

Is this result surprising? How does it compare to your guesses?

If your guesses were not close, then ask your instructor for an explanation.

Sorting an Array by a Field

Observation

Note that the class LeagueData stores two different lists of Players, one called alphaPlayerList and the other called playerList. These lists contain exactly the same Players, but in different orders. The array alphaPlayerList always remains in alphabetical order according to the Players’ last names. That is why the BinSearch() member functions is applied only to alphaPlayerList.

Normally, the records stored in a database contain many individual fields. In this lab, some of the fields are the player’s name, team, and batting average. It is possible to sort the database by any of these fields. In our next example, we would like to find the players who have the highest batting averages. To do this, we will sort the records by the field of batting averages.

Open the file BALeaders.cpp.

Add BALeaders.cpp to the LabWork project.

This program will sort the players in the playerList array into order according to batting average. Then it will print the names of the players with the top ten batting averages.

Note that, in sorting the Players from highest to lowest batting average, the SortByBA() member function is careful to sort the playerList array, not the alphaPlayerList array. Were it instead to sort alphaPlayerList by batting average, it would render useless all further calls to BinSearch() since BinSearch() depends on the names in alphaPlayerList being in alphabetical order.

As a good review of sorting,

Study the SortByBA() function.

Run BALeaders.cpp.

Application

Open the MS Word document Lab 13 App.doc, found in the Lab 13 App subfolder, and follow the instructions.

Summary

In this lab, we conducted experiments with two different searching algorithms: linear search and binary search. We obtained evidence to support the theory that a linear search of a list with n objects requires, on average, (n + 1)/2 comparisons. For example, our list of 341 objects requires 170.5 comparisons on average. In effect, if one doubles the size of the list, the linear search algorithm will require twice the time to search it. This is a property of all “linear” or O(n) algorithms.

Our experiments demonstrated that a binary search is vastly superior to a linear search. Of course, a binary search can work properly only if the list is ordered. To appreciate the differences in performance of the two searching algorithms, we note that if the list contains 341 names, the binary search algorithm requires, in the worst case, just nine comparisons of the search name to names in the list. This number comes about as the number of terms in the sequence of numbers that represent the sizes of the sublists under consideration:

341 ® 170 ® 85 ® 42 ® 21 ® 10 ® 5 ® 2 ® 1

The above sequence suggests that if the list size is doubled, the binary search algorithm’s worst case will involve only 1 additional comparison! This is a property of all O(log n) algorithms.