Previous Section
 < Free Open Study > 
Next Section

10.4 Radix Sort

We have placed the radix sort in a section by itself after sorting, searching, and hashing for two reasons.

First, the radix sort is not a comparison sort; that is, the algorithm does not compare two items in a list. Therefore, we cannot analyze the amount of work done in terms of comparisons. In fact, the only thing that the radix sort has in common with the other sorts is that it takes an unsorted list as input and returns a sorted list as output.

Second, the radix sort is to sorting as hashing is to searching. That is, it makes use of the values in the individual keys to order the items just as hashing makes use of the values in the individual keys to determine where to place items. As in the other sorting algorithms, the number of values to be sorted and the array in which they are stored serve as parameters.

The idea behind the radix sort is to divide the values to be sorted into as many subgroups as there are possible alternatives for each position in the key. For example, if the key is an integer number, each position is a digit and has ten possibilities: 0 .. 9. If the key is a string of letters and case is not important, then each position has 26 possibilities: "a" .. "z". The number of possibilities is called the radix. After subdividing the values into radix subgroups, we combine them again into one array and repeat the process. If we begin with the least-significant position in the key, regroup the values in order, and repeat the process as many times as there are positions in the key, moving one position to the left each time, the array will be sorted when we finish.

Radix The number of possibilities for each position; the digits in a number system

Let's illustrate this algorithm by sorting three-digit positive integers. Within a three-digit number, let's refer to the ones, tens, and hundreds positions as positions 1, 2, and 3, respectively. We divide the values into ten subgroups based on the digit in the ones position (position 1). Let's create an array of queues, queues [0]..queues [9], to hold the groups. All items with 0 in the ones position are enqueued into queues [0]; all items with a 1 in the ones position are enqueued into queues [1]; and so on. After the first pass through the array, we collect the subgroups (queues) with the queues [0] subgroup on top and the queues [9] subgroup on the bottom. We repeat the process using the tens position and the hundreds position. When we collect the queues the last time, the values in the array are in order. This algorithm is illustrated in Figures 10.26 and 10.27.

Click To expand
Figure 10.26: Array after each pass
Click To expand
Figure 10.27: Queues after each pass

Look at the array after each pass; the digits in the position that corresponds to the pass number are sorted (Figure 10.26). Likewise, the digits in the pass-number position are the same as the index of the queue that it is in (Figure 10.27).

Let's first write the algorithm for the radix sort that matches our example and then examine ways to make it more general.

In this algorithm, each iteration of the outer loop corresponds to one pass in Figures 10.26 and 10.27. In the first pass, we use the ones digit of an integer item to determine the appropriate queue for the item. In the second pass, we use the tens digit. In the third pass, we use the hundreds digit. Next, we need to write the Collect Queues step of the algorithm. Here we collect the items from all of the queues and put them back into the values array.

Now that we understand the algorithm for three-digit integer keys, let's look at how we can make it more general before we code it. When we examined the insertion of an item into a sorted list in Chapter 3, we required the comparison of two items to be a member function of ItemType. The corresponding idea here is to make accessing the correct position in the key become a function. For example, with an integer key, we must extract the digits using / and %. If the key is a string, then we need access into an array of characters. The point is that only the user knows, so the user should provide a member function for ItemType to access successive positions in the key. This function (SubKey) takes the position number as a parameter.

The radix sort function itself, however, must know the number of positions in the key (numPositions) and the number of possible values for each position in the key (radix). We make numPositions and radix be parameters to the function.

template<class ItemType>
void RadixSort(ItemType values [], int numValues,
     int numPositions, int radix)
// Post: Elements in values are in order by key.
  QueType<ItemType> queues [radix];
  // With default constructor, each queue size is 500.
  int whichQueue;

  for (int position = 1; position <= numPositions; position++)
    for (int counter = 0; counter < length; counter++)
      whichQueue = values[counter].SubKey(position);
    CollectQueues(values, queues, radix);

template<class ItemType>
void CollectQueues(ItemType values[], QueType<ItemType> queues[],
     int radix)
// Post: queues are concatenated with queue[0]'s on top and
//       queue[9]'s on the bottom and copied into values.
  int index = 0;
  ItemType item;

  for (int counter = 0; counter < radix; counter++)
    while (!queues[counter].IsEmpty())
      values[index] = item;

If the keys are integer values, then the function SubKey must take the position number and extract the digit in that position. Let's calculate a few positions and look for a pattern. Assume itemKey is the four-digit integer 8749.

Position is 1: itemKey % 10

= 9

Position is 2: (itemKey / 10) % 10

= 4

Position is 3: (itemKey / 100) % 10

= 7

Position is 4: (itemKey / 1000) % 10

= 8

Notice that as the position number gets larger, the second operand of the / operation increases. If we rewrite the first calculation as

Position is 1: (itemKey / 1) % 10

the pattern becomes even clearer:

Result = (itemKey/10position - 1)% 10

If the key is alphabetic, then SubKey must take each character and convert it to a number between 0 and 25 (if case does not count) or between 0 and 51 (if case does matter). The algorithm that you use depends on the character set of the machine you are using.

Analyzing the Radix Sort

The amount of work done by the radix sort is more complicated than any scenario we have examined so far. Each item in the array is processed numPositions times, making the Big-O analysis a function of two variables: N (the number of items to be sorted) and P (the number of positions in the key). The processing includes extracting a value from the key, inserting the item into a queue, dequeueing each item, and copying each item back into the array. We know that each operation is O(1). So an approximation is O(N * P). However, when N is large, it dominates P. (N is the elephant and P is the goldfish, to use our familiar analogy.)

In each iteration of the radix sort, the queues are collected, meaning that each item to be sorted is processed twice on each iteration: once to put it into a queue and once when the queues are collected. We could streamline the processing in the radix sort somewhat by using the linked queue implementation and accessing the queues directly to re-create the intermediate list in linked form. However, this approach would require copying the final linked version back into the array-based form.

What about space requirements? Our RadixSort function requires space for at least two copies of each element: one place in the array and one place in the queue. If the queues are array-based, the amount of space is prohibitive because each queue must have room for every element. If the queues are linked, additional space for N pointers is required. We can cut the space requirements if we realize that this algorithm works just as well if the values to be sorted are in linked form. Nodes can be removed from the linked structure and moved to the appropriate queue; then the linked structure can be re-created by concatenating the queues. In this way, only one copy of an item (plus a pointer) exists: either in the linked structure or in a subgroup (queue).

Hence, both time and space requirements can be improved in the radix sort, if we use the linked versions of the queue and list.

Previous Section
 < Free Open Study > 
Next Section
Converted from CHM to HTML with chm2web Pro 2.85 (unicode)