Previous Section
 < Free Open Study > 
Next Section


9.4 Sets

Graphs and sets differ from the other ADTs that we have studied in that they are modeled on mathematical objects. The operations that are defined on them come from mathematics. In fact, we use the concept of a set to define a graph: A graph is a set of nodes and a set of edges that relate nodes to one another. In this definition, we rely on our intuitive definition of a set as a collection. In this section, however, we will view a set as an abstract data type.

Logical Level

In mathematics, a set is a collection or group of items, each of which can itself be a set or an item. For our ADT, we define a set as an unordered collection of distinct values, chosen from the possible values of an atomic data type or a composite data type called the component or base type. All of the items in the set are of the same data type.

Three special types of sets are important:

  • The subset, a set contained within another set

  • The universal set, a set that contains all the values of the base type

  • The empty set, which contains no values

We define two transformers for the Set ADT: , which puts an item into a set, and , which removes an item from the set. Three mathematical operations are defined on sets: union, intersection, and difference. Union takes two sets and creates a third set that contains all the elements in either of the input sets. Intersection takes two sets and creates a third set that contains only the elements in both sets. Difference takes two sets and creates a third set that contains all the items in the first set that are not in the second set. In set terminology, the number of items in the set is called the set's cardinality.

Set An unordered collection of distinct values (items or components), chosen from the possible values of a single data type, called the component (base) type

Component (base) type The data type of the components or items in a set

Subset A set X is a subset of set Y if each item of X is an element of Y; if at least one element of Y is not in X, then X is a proper subset of Y

Universal set The set containing all the values of the component type

Empty set A set with no members

Union A binary set operation that returns a set made up of all the items that are in either of the input sets

Intersection A binary set operation that returns a set made up of all the items that are in both of the input sets

Difference A binary set operation that returns a set made up of all the items that are in the first set but not the second set

Cardinality The number of items in a set

We need observer operations to test whether the set is empty or full, to return the cardinality, and to make the set empty. Here is the CRC card summarizing our operations that mirror the mathematical definitions:

Click To expand

Here is the specification for these responsibilities:

Notice that the definitions for the mathematical set say nothing about the set being full. However, to implement the Set ADT, we must consider that possibility. and are actually not necessary. We could create an empty set and a set with one item and perform a union of the two to implement . We could create a set containing the item to be deleted and perform a difference of the set and the set with the one item to implement . However, having and makes processing easier.

Application Level

A set could be the implementation structure for the vertices and/or edges in a graph. One property of a set is that putting an item, which is already there, into the set does not change the set. (Did you realize that is what the specification said?) Also, deleting an item that is not there does not change the set. These properties can prove useful in determining whether items occur in a passage of text, for example. All the characters in the text could be put into a set. When the process is finished, the set contains exactly one copy of every item that appeared in the text.

Implementation Level

There are two basic ways to implement sets. The first explicitly records the presence or absence of each item in the base type (ItemType) in the representation of the set variable. The second records only those items that are in a set variable at a particular time. If an item is not listed as being in the set, it is not in the set. That is, the presence of each item in the set is explicitly recorded; the absence of an item is implicit.

Explicit Representation The explicit representation of each item in the base type in each set variable is called a bit vector representation. A one-to-one mapping matches each item in the base type (ItemType) with a Boolean flag. If the item is in a set variable, the flag is true; if the item is not in the set variable, the flag is false. Languages that have a built-in set data type use this technique where the Boolean flag is represented by a bit-hence the name "bit vector."

Bit vector A representation that maps each item in the component type to a Boolean flag

We can use an array-based implementation for the bit vector because the cardinality of the base type gives us the size of the array. Of course, this technique does not work for infinite base types or even fairly large base types. However, it would work very well for a limited base type. For example, if we want to implement a set type, with a base type of the uppercase letters, we could represent each set instance with a Boolean array indexed from 0 through 25. The flag in position 0 represents the presence or absence of the letter "A," the flag in position 1 represents the presence or absence of the letter "B," and so on. The empty set would be an array of all false values; the universal set would be an array of all true values. The constructor would create an array and set each position to false. All of the binary operations could be accomplished using Boolean operators.

Here is the class definition for the explicit implementation. Because the base type can be anything, we need a function to map the item into an index within the array size. We leave this mapping function to the user of the set.

#include "map.h"
// File map.h must include a definition of ItemType and a function named
//  "map" that maps an item of ItemType into an index between 0 and max - 1
//  if the parameterized constructor is used and between 0 and 399 if
//  the default constructor is used.
class SetType
{
public:
  SetType();        // Default constructor: Array size is 400.
  SetType(int max); // Paramaterized constructor
  ~SetType();       // Destructor
  SetType(const SetType anotherSet);  // Copy constructor
  void MakeEmpty();
  void Store(ItemType item);
  void Delete(ItemType item);
  bool IsEmpty();
  bool IsFull();
  int CardinalityIs();
  SetType Union(SetType setB);
  SetType Intersection(SetType setB);
  SetType Difference(SetType setB);
  void Print(std::ofstream& outFile);
private:
  int maxItems;
  ItemType* items;
};

In the following algorithms, setA is self.

IsFull has no meaning in this implementation, but IsEmpty and MakeEmpty do. We leave the rest of the algorithms as exercises.

Implicit Representation Set variables implemented as bit vectors (explicit set representation) use the same amount of space regardless of how many items are in the set. The space is proportional to the cardinality of the universal set. (Actually, the space equals the size of the universal set.) This fact can limit the base type to comparatively small finite sets.

A second approach to implementing sets involves keeping a list of the items in the set. The empty set is an empty list. If an item is stored in a set, we store the item in the list. Implicit set representation requires only space that is proportional to the number of items in a set at any one time. The limit in this case is the cardinality of the individual sets rather than the cardinality of the universal set.

With implicit set representation, we implement the set algorithms as list algorithms. We use ListType in the following definition, because we do not at the moment know which implementation we should use.

class SetType         // Set ItemType using a typedef statement.
{
public:
  SetType();          // Default constructor: Array size is 400.
  SetType(int max);   // Parameterized constructor.
  ~SetType();         // Destructor.
  void MakeEmpty();
  void Store(ItemType item);
  void Delete(ItemType item);
  bool IsEmpty();
  bool IsFull();
  int CardinalityIs();
  SetType Union(SetType setB);
  SetType Intersection(SetType setB);
  SetType Difference(SetType setB);
private:
  ListType items;
};

The SetType constructor calls the ListType constructor. MakeEmpty, IsEmpty, IsFull, and CardinalityIs just call the corresponding list operations. The constraint on the Store operation says that only one copy exists. Thus the RetrieveItem list operation must be called before Store. If the item is in the set, the function returns without doing anything. If the item is not in the set, InsertItem is called. The constraint on Delete says to delete the item if it is in the set. Again RetrieveItem is called. If the item is in the set, it is deleted; if it is not there, the function returns without doing anything further.

Let's consider how to implement Intersection, one of the binary operations using ListType. So far, we have not decided which of the lists we should use. In our algorithm, we say setA, but we are referring to the items member of setA, which is a list.

Recall that the intersection of two sets consists of the set made up of those items that are found in both of the other sets. If we were doing this operation by hand, one algorithm might be to take an item from the first set and scan the second set to see if it is there. If it is, the item goes into the result. We repeat this algorithm for every item in the first set. That's easy to express using our list operations.

The Big-O complexity of this algorithm is O(N * N) or O(NlogN), depending on how RetrieveItem is implemented. A little thought produces an algorithm that has O(N) complexity. Assume that the lists are sorted. Rather than looking at one item from the first list (set) and checking whether it is in the second list, let's look at the first item from each list. Three situations can occur:

  • The item from setA comes before the item from setB.

  • The item from setB comes before the item from setA.

  • The items are equal.

If the items are equal, the item from one of the sets goes into the result, and we get a new item from both setA and setB. If the item from setA comes before the item from setB, we know that the item from setA cannot be in the result, so we get another item from setA. If the item from setB comes before the item from setA, we know that the item from setB cannot be in the result, so we get another item from setB.

Here is the revised algorithm based on the assumption that the items are sorted within the container:

Be very careful: Sets are unordered collections, but the implementation structure that contains the sets can be ordered-and we can take advantage of that ordering to produce an O(N) algorithm for the intersection operation. This process of moving down two ordered lists in parallel is called a merge. This very useful algorithm can be used to implement the other binary operations in O(N) time. We leave the design of these algorithms as exercises.

Both implementations are left as programming assignments.



Previous Section
 < Free Open Study > 
Next Section
Converted from CHM to HTML with chm2web Pro 2.85 (unicode)