Previous Section
 < Free Open Study > 
Next Section

10.2 Searching

As discussed in Chapter 2, for each particular structure used to hold data, the functions that allow access to elements in the structure must be defined. In some cases, access is limited to the elements in specific positions in the structure, such as the top element in a stack or the front element in a queue. Often, when data are stored in a list or a table, we want to be able to access any element in the structure.

Sometimes the retrieval of a specified element can be performed directly. For instance, the fifth element of the list stored sequentially in an array-based list called list is found in[4]. Often, however, you want to access an element according to some key value. For instance, if a list contains student records, you may want to find the record of the student named Suzy Brown or the record of the student whose ID number is 203557. In cases such as these, you need some kind of searching technique to retrieve the desired record.

For each of the techniques we review or introduce, our algorithm must meet the following specifications. Note that we are talking about techniques within the class, not client code.

This specification applies to both array-based and linked lists, where location would be either an index in an array-based list or a pointer in a linked list, and NULL would be either -1 in an array-based list or the null pointer in a linked list.

Linear Searching

We cannot discuss efficient ways to find an element in a list without considering how the elements were inserted into the list. Therefore, our discussion of search algorithms must deal with the issue of the list's InsertItem operation. Suppose that we want to insert elements as quickly as possible, and we are not equally concerned with how long it takes to find them. We would put the element into the last slot in an array-based list and the first slot in a linked list. These are O(1) insertion algorithms. The resulting list is sorted according to the time of insertion, not according to key value.

To search this list for the element with a given key, we must use a simple linear (or sequential) search. Beginning with the first element in the list, we search for the desired element by examining each subsequent item's key until either the search is successful or the list is exhausted.

Based on the number of comparisons, it should be obvious that this search is O(N), where N represents the number of elements. In the worst case, in which we are looking for the last element in the list or for a nonexistent element, we make N key comparisons. On the average, assuming that there is an equal probability of searching for any item in the list, we make N/2 comparisons for a successful search; that is, we search half of the list.

High-Probability Ordering

The assumption of equal probability for every element in the list is not always valid. Sometimes certain list elements are in much greater demand than others. This observation suggests a way to improve the search: Put the most-often-desired elements at the beginning of the list. Using this scheme, you are more likely to have a hit in the first few tries, and rarely do you have to search the whole list.

If the elements in the list are not static or if you cannot predict their relative demand, you need some scheme to keep the most frequently used elements at the front of the list. One way to accomplish this goal is to move each element accessed to the front of the list. Of course, this scheme offers no guarantee that this element is later frequently used. If the element is not retrieved again, however, it drifts toward the end of the list as other elements move to the front. This scheme is easy to implement for linked lists, requiring only a couple of pointer changes, but it is less desirable for lists kept sequentially in arrays, because of the need to move all the other elements down to make room at the front.

A second approach, which causes elements to move toward the front of the list gradually, is appropriate for either linked or sequential list representations. As an element is found, it is swapped with the element that precedes it. Over many list retrievals, the most frequently desired elements tend to be grouped at the front of the list. To implement this approach, we need to modify only the end of the algorithm to exchange the found element with the one before it in the list (unless it is the first element). If the search operation was implemented as a const function and we made this modification, we would have to remove the const declaration, as the search operation actually changes the list. This modification should be documented; it is an unexpected side effect of searching the list.

Keeping the most active elements at the front of the list does not affect the worst case; if the search value is the last element or is not in the list, the search still takes N comparisons. It is still an O(N) search. The average performance on successful searches should improve, however. Both of these algorithms depend on the assumption that some elements in the list are used much more often than others. If this assumption does not hold true, a different ordering strategy is needed to improve the efficiency of the search technique.

Lists in which the relative positions of the elements are changed in an attempt to improve search efficiency are called self-organizing or self-adjusting lists.

Key Ordering

If a list is sorted according to the key value, we can write more efficient search routines. To support a sorted list, we must either insert the elements in order or sort the list before searching it. (Inserting the elements in order is an O(N2) process, as each insertion is O(N). If we insert each element into the next free slot and then sort the list with a "good" sort, the process has O(Nlog2N) complexity.)

If the list is sorted, a sequential search no longer needs to search the whole list to discover that an element does not exist. Rather, it needs to search only until it has passed the element's logical place in the list-that is, until it encounters an element with a larger key value. Versions of the Sorted List ADT in Chapters 3 and 5 implement this search technique.

The advantage of linear searching of a sorted list is the ability to stop searching before the list is exhausted if the element does not exist. Again, the search is O(N)-the worst case, searching for the largest element, still requires N comparisons. The average number of comparisons for an unsuccessful search is now N/2, however, instead of a guaranteed N.

The advantage of linear searching lies in its simplicity. The disadvantage relates to its performance: In the worst case, you make N comparisons. If the list is sorted and stored in an array, however, you can improve the search time to a worst case of O(log2N) with a binary search. In this instance, efficiency is improved at the expense of simplicity.

Binary Searching

We have seen a way to improve searching efficiency from O(N) to O(log2N). If the data elements are sorted and stored sequentially in an array, we can use a binary search. The binary search algorithm improves the search efficiency by limiting the search to the area where the element might be. It takes a divide-and-conquer approach, continually paring down the area to be searched until either the element is found or the search area is gone (the element is not in the list). We developed the BinarySearch function in Chapter 3 and converted it to a recursive function in Chapter 7.

The binary search is not guaranteed to be faster for searching very small lists. Even though the binary search generally requires fewer comparisons, each comparison involves more computation. When N is very small, this extra work (the constants and smaller terms that we ignore in determining the Big-O approximation) may dominate. Although the algorithm requires fewer comparisons, each involves more processing. For instance, in one assembly-language program, the linear search required 5 time units per comparison, whereas the binary search took 35. For a list containing 16 elements, therefore, the worst-case linear search would require 5 * 16 = 80 time units. The worst-case binary search requires only 4 comparisons, but at 35 time units each, the comparisons take 140 time units. In cases where the list contains a small number of elements, a linear search is certainly adequate and sometimes faster than a binary search.

As the number of elements increases, however, the disparity between the linear search and the binary search grows very quickly. Look back at Table 3.1 to compare the rates of growth for the two algorithms.

The binary search discussed here is appropriate only for list elements stored in a sequential array-based representation. After all, how can you efficiently find the midpoint of a linked list? However, you already know of a structure that allows you to perform a binary search on a linked data representation, the binary search tree. The operations used to search a binary tree are discussed in Chapter 8.

Previous Section
 < Free Open Study > 
Next Section
Converted from CHM to HTML with chm2web Pro 2.85 (unicode)