Previous Section
 < Free Open Study > 
Next Section


2.2 Abstraction and Built-In Types

In the last section, we suggested that a built-in simple type such as int or float could be viewed as an abstraction whose underlying implementation is defined in terms of machine-level operations. The same perspective applies to built-in composite data types provided in programming languages to build data objects. A composite data type is one in which a name is given to a collection of data items. Composite data types come in two forms: unstructured and structured. An unstructured composite type is a collection of components that are not organized with respect to one another. A structured data type is an organized collection of components in which the organization determines the method used to access individual data components.

Composite data type A data type that allows a collection of values to be associated with an object of that type

For instance, C++ provides the following composite types: records (structs), classes, and arrays of various dimensions. Classes and structs can have member functions as well as data, but it is the organization of the data we are considering here. Classes and structs are logically unstructured; arrays are structured.

Let's look at each of these types from our three perspectives. First, we examine the abstract view of the structure-how we construct variables of that type and how we access individual components in our programs. Next, from an application perspective, we discuss what kinds of things can be modeled using each structure. These two points of view are important to you as a C++ programmer. Finally, we look at how some of the structures may be implemented-how the "logical" accessing function is turned into a location in memory. For built-in constructs, the abstract view is the syntax of the construct itself, and the implementation level remains hidden within the compiler. So long as you know the syntax, you as a programmer do not need to understand the implementation view of predefined composite data types. As you read through the implementation sections and see the formulas needed to access an element of a composite type, you should appreciate why information hiding and encapsulation are necessary.

Records

The record is not available in all programming languages. FORTRAN, for instance, does not support records; conversely, COBOL, a business-oriented language, uses records extensively. In C++, records are implemented by structs. C++ classes are another implementation of a record. For the purposes of the following discussion, we use the generic term "record," but both structs and classes behave as records.

Logical Level A record is a composite data type made up of a finite collection of not necessarily homogeneous elements called members or fields. Accessing is done directly through a set of named member or field selectors.

We illustrate the syntax and semantics of the component selector within the context of the following struct declaration:

struct CarType
{
  int year;
  char maker[10];
  float price;
};
CarType myCar;

The record variable myCar is made up of three components. The first, year, is of type int. The second, maker, is an array of characters. The third, price, is a float number. The names of the components make up the set of member selectors. A picture of myCar appears in Figure 2.5.

Click To expand
Figure 2.5: Record myCar

The syntax of the component selector is the record variable name, followed by a period, followed by the member selector for the component in which you are interested:

Click To expand

If this expression appears on the right-hand side of an assignment statement, a value is being extracted from that place (for example, pricePaid = myCar.price). If it appears on the left-hand side, a value is being stored in that member of the struct (for example, myCar.price = 20009.33).

Here myCar.maker is an array whose elements are of type char. You can access that array member as a whole (for example, myCar.maker), or you can access individual characters by using an index.

Click To expand

In C++, a struct may be passed as a parameter to a function (either by value or by reference), one struct may be assigned to another struct of the same type, and a struct may be a function return value.

C++: Parameter Passing
Start example

C++ supports two types of formal parameters: value parameters and reference parameters. A value parameter is a formal parameter that receives a copy of the contents of the corresponding actual parameter (also called argument). Because the formal parameter holds a copy of the actual parameter, the actual parameter cannot be changed by the function to which it is a parameter. On the other hand, a reference parameter is a formal parameter that receives the location (memory address) of the corresponding actual parameter. Because the formal parameter holds the memory address of the actual parameter, the function can change the contents of the actual parameter. By default in C++, arrays are passed by reference, and nonarray parameters are passed by value.

To specify that a formal nonarray parameter is a reference parameter, append an ampersand (&) to the right of the type name on the formal parameter list. Look at the following examples:

void AdjustForInflation(CarType& car, float perCent)
// Increases price by the amount specified in perCent.
{
  car.price = car.price * perCent + car.price;
}

bool LateModel(CarType car, int date)
// Returns true if the car's model year is later than or
//  equal to date; returns false otherwise.
{
  return car.year >= date;
}

The function AdjustForInflation changes the price data member of the formal parameter car, so car must be a reference parameter. Within the body of the function, car.price is the price member of the actual parameter. The function LateModel examines car without changing it, so car should be a value parameter. Within the function, car.year is a copy of the caller's actual parameter.

End example

Application Level Records (structs) are very useful for modeling objects that have a number of characteristics. This data type allows us to collect various types of data about an object and to refer to the whole object by a single name. We also can refer to the different members of the object by name. You probably have seen many examples of records used in this way to represent objects.

Records are also useful for defining other data structures, allowing programmers to combine information about the structure with the storage of the elements. We make extensive use of records in this way when we develop representations of our own programmer-defined data structures.

Implementation Level Two things must be done to implement a built-in composite data type: (1) memory cells must be reserved for the data, and (2) the accessing function must be determined. An accessing function is a rule that tells the compiler and run-time system where an individual element is located within the data structure. Before we examine a concrete example, let's look at memory. The unit of memory that is assigned to hold a value is machine dependent. Figure 2.6 shows several different memory configurations. In practice, memory configuration is a consideration for the compiler writer. To be as general as possible, we will use the generic term cell to represent a location in memory rather than "word" or "byte." In the examples that follow, we assume that an integer or character is stored in one cell and a floating-point number in two cells. (This assumption is not accurate in C++, but we use it here to simplify the discussion.)

Click To expand
Figure 2.6: Memory configurations

The declaration statements in a program tell the compiler how many cells are needed to represent the record. The name of the record then is associated with the characteristics of the record. These characteristics include the following:

  • The location in memory of the first cell in the record, called the base address of the record

  • A table containing the number of memory locations needed for each member of the record

A record occupies a block of consecutive cells in memory.[2] The record's accessing function calculates the location of a particular cell from a named member selector. The basic question is, Which cell (or cells) in this consecutive block do you want?

The base address of the record is the address of the first member in the record. To access any member, we need to know how much of the record to skip to get to the desired member. A reference to a record member causes the compiler to examine the characteristics table to determine the member's offset from the beginning of the record. The compiler then can generate the member's address by adding the offset to the base. Figure 2.7 shows such a table for CarType. If the base address of myCar were 8500, the fields or members of this record would be found at the following addresses:

Address of myCar.year

=

8500 + 0 = 8500

Address of myCar.maker

=

8500 + 1 = 8501

Address of myCar.price

=

8500 + 11 = 8511

Click To expand
Figure 2.7: Implementation-level view of CarType

We said that the record is a nonstructured data type, yet the component selector depends on the relative positions of the members of the record. This is true: A record is a structured data type if viewed from the implementation perspective. However, from the user's view, it is unstructured. The user accesses the members by name, not by position. For example, if we had defined CarType as

struct CarType
{
  char make[10];
  float price;
  int year;
};

the code that manipulates instances of CarType would not change.

One-Dimensional Arrays

Logical Level A one-dimensional array is a structured composite data type made up of a finite, fixed-size collection of ordered homogeneous elements to which direct access is available. Finite indicates that a last element is identifiable. Fixed size means that the size of the array must be known in advance; it doesn't mean that all slots in the array must contain meaningful values. Ordered means that there is a first element, a second element, and so on. (The relative position of the elements is ordered, not necessarily the values stored there.) Because the elements in an array must all be of the same type, they are physically homogeneous; that is, they are all of the same data type. In general, it is desirable for the array elements to be logically homogeneous as well-that is, for all the elements to have the same purpose. (If we kept a list of numbers in an array of integers, with the length of the list-an integer-kept in the first array slot, the array elements would be physically, but not logically, homogeneous.)

The component selection mechanism of an array is direct access, which means we can access any element directly, without first accessing the preceding elements. The desired element is specified using an index, which gives its relative position in the collection. Later we discuss how C++ uses the index and some characteristics of the array to figure out exactly where in memory to find the element. That's part of the implementation view, and the application programmer using an array doesn't need to be concerned with it. (It's encapsulated.)

Which operations are defined for the array? If the language we were using lacked predefined arrays and we were defining arrays ourselves, we would want to specify at least three operations (shown here as C++ function calls):

CreateArray(anArray, numberOfSlots);
// Create array anArray with numberOfSlots locations.

Store(anArray, value, index);
// Store value into anArray at position index.

Retrieve(anArray, value, index);
// Retrieve into value the array element found at position index.

Because arrays are predefined data types, the C++ programming language supplies a special way to perform each of these operations. C++'s syntax provides a primitive constructor for creating arrays in memory, with indexes used as a way to directly access an element of an array.

In C++, the declaration of an array serves as a primitive constructor operation. For example, a one-dimensional array can be declared with this statement:

int numbers[10];

The type of the elements in the array comes first, followed by the name of the array with the number of elements (the array size) in brackets to the right of the name. This declaration defines a linearly ordered collection of 10 integer items. Abstractly, we can picture numbers as follows:

Click To expand

Each element of numbers can be accessed directly by its relative position in the array. The syntax of the component selector is described as follows:

array-name[index-expression]

The index expression must be of an integral type (char, short, int, long, or an enumeration type). The expression may be as simple as a constant or a variable name, or as complex as a combination of variables, operators, and function calls. Whatever the form of the expression, it must result in an integer value.

In C++, the index range is always 0 through the array size minus 1; in the case of numbers, the value must be between 0 and 9. In some other languages, the user may explicitly give the index range.

The semantics (meaning) of the component selector is "Locate the element associated with the index expression in the collection of elements identified by array-name." The component selector can be used in two ways:

  1. To specify a place into which a value is to be copied:

    numbers[2] = 5;
    

    or

    cin >> numbers[2];
    
    
  2. To specify a place from which a value is to be retrieved:

    value  =  numbers[4];
    

    or

    cout << numbers[4];
    

If the component selector appears on the left-hand side of the assignment statement, it is being used as a transformer: The data structure is changing. If the component selector appears on the right-hand side of the assignment statement, it is being used as an observer: It returns the value stored in a place in the array without changing it. Declaring an array and accessing individual array elements are operations predefined in nearly all high-level programming languages.

In C++, arrays may be passed as parameters (by reference only), but cannot be assigned to one another or serve as the return value type of a function.

C++: One-Dimensional Arrays as Parameters
Start example

In C++, arrays can only be reference parameters; it is not possible to pass an array by value. Therefore, the ampersand (&) to the right of the type is omitted. When an array is the formal parameter, the base address of the array (the memory address of the first slot in the array) is actually passed to a function. This is true whether the array has one or more dimensions. When declaring a one-dimensional array parameter, the compiler needs to know only that the parameter is an array; it does not need to know its size. If the size of the formal parameter is listed, the compiler ignores it. The code in the function that processes the array is responsible for ensuring that only legitimate array slots are referenced. Therefore, a separate parameter often is passed to the function to specify how many array slots will be processed.

int SumValues(int values[], int numberOfValues)
// Returns the sum of values[0] through values[numberOfValues-1].
{
  int sum = 0;

  for (int index = 0; index < numberOfValues; index++)
    sum = sum + values[index];
  return sum;
}

If arrays are always passed as reference parameters, how can we protect the actual parameter from inadvertent changes? For example, in SumValues the parameter values is only to be inspected but not modified. How can we protect it from being changed? We can declare it to be a const parameter as follows:

int SumValues(const int values[], int numberOfValues)

Within the function body, trying to change the contents of values now causes a syntax error.

End example

Application Level A one-dimensional array is the natural structure for the storage of lists of like data elements. Examples include grocery lists, price lists, lists of phone numbers, lists of student records, and lists of characters (a string). You have probably used one-dimensional arrays in similar ways in some of your programs.

Implementation Level Of course, when you use an array in a C++ program you do not have to be concerned with all of the implementation details. You have been dealing with an abstraction of the array from the time the construct was introduced, and you will never have to consider all the messy details described in this section.

An array declaration statement tells the compiler how many cells are needed to represent that array. The name of the array then is associated with the characteristics of the array. These characteristics include the following:

  • The number of elements (Number)

  • The location in memory of the first cell in the array, called the base address of the array (Base)

  • The number of memory locations needed for each element in the array (SizeOfElement)

The information about the array characteristics is often stored in a table called an array descriptor or dope vector. When the compiler encounters a reference to an array element, it uses this information to generate code that calculates the element's location in memory at run time.

How are the array characteristics used to calculate the number of cells needed and to develop the accessing functions for the following arrays? As before, we assume for simplicity that an integer or character is stored in one cell and a floating-point number is stored in two cells.

int data [10];
float money[6];
char letters[261;

These arrays have the following characteristics:

 

data

money

letters

Number

10

6

26

Base

unknown

unknown

unknown

SizeOfElement

1

2

1

Let's assume that the C++ compiler assigns memory cells to variables in sequential order. If, when the preceding declarations are encountered, the next memory cell available to be assigned is, say, 100, the memory assignments are as follows. (We have used 100 to make the arithmetic easier.)

Click To expand

Now we have determined the base address of each array: data is 100, money is 110, and letters is 122. The arrangement of these arrays in memory gives us the following relationships:

Click To expand

In C++ the accessing function that gives us the position of an element in a one-dimensional array associated with the expression Index is

Address(Index) = Base + Offset of the element at position Index

How do we calculate the offset? The general formula is

Offset = Index *SizeOfElement

The whole accessing function becomes

Address(Index) = Base + Index *SizeOfElement

Let's apply this formula and see if we do get what we claimed we should.

 

Base + Index * SizeOfElement

Address

data [0]

100 + (0 * 1)

= 100

data[8]

100 +(8*1)

= 108

letters[1]

122 + (1 * 1)

= 123

letters[25]

122 + (25 * 1)

= 147

money[0]

110 + (0 * 2)

= 110

money[3]

110 + (3 * 2)

= 116

The calculation of an array element address in C++ is much simpler than it is in many other languages because C++ assumes that the index range is from 0 through the maximum size minus 1. Languages such as Pascal and Ada allow the user to specify the lower and upper bounds on the index range rather than giving the size. This extra flexibility complicates the indexing process considerably but leaves the abstraction cleaner.

Earlier, we noted that an array is a structured data type. Unlike with a record, whose logical view is unstructured but whose implementation view is structured, both views of an array are structured. The structure is inherent in the logical component selector.

As we mentioned at the beginning of this section, when you use an array in a C++ program you do not have to be concerned with all of these implementation details. The advantages of this approach are very clear: You can think of the data and the operations in a logical sense and can consider their use without having to worry about implementation details. The lower levels are still there-they just remain hidden from you. We strive for this same sort of separation of the abstract and implementation views in the programmer-defined classes discussed in the remainder of this book.

Two-Dimensional Arrays

Logical Level Most of what we have said about the abstract view of a one-dimensional array applies as well to arrays of more than one dimension. A two-dimensional array is a composite data type made up of a finite, fixed-size collection of homogeneous elements ordered in two dimensions. Its component selector is direct access: a pair of indexes specifies the desired element by giving its relative position in each dimension.

A two-dimensional array is a natural way to represent data that is logically viewed as a table with columns and rows. The following example illustrates the syntax for declaring a two-dimensional array in C++.

int table [10] [6];

The abstract picture of this structure is a grid with rows and columns.

Click To expand

The component selector for the two-dimensional array is as follows:

Click To expand
C++: Two-Dimensional Arrays as Parameters
Start example

Two-dimensional arrays are stored in row order in C++. That is, all of the elements in one row are stored together, followed by all of the elements in the next row. To access any row other than the first, the compiler must be able to calculate where each row begins; this calculation depends on how many elements are present in each row. The second row begins at the base address plus the number of elements in each row, and each succeeding row begins at the address of the previous row plus the number of elements in each row. The second dimension-the number of columns-tells us how many elements are in each row; therefore the size of the second dimension must be included in the declaration of the formal parameter for a two-dimensional array.

int ProcessValues(int values[][5])
{
.
.
.
}

ProcessValues works for an array with any number of rows as long as it has exactly five columns. That is, the size of the second dimension of both the actual and formal array parameters must be identical. To ensure that formal and actual two-dimensional array parameters have the same size, use the typedef statement to define a two-dimensional array type and then declare both the actual and the formal parameters to be of that type. For example,

const int NUM_ROWS = 5;
const int NUM_COLS = 4;
typedef float TableType[NUM_ROWS] [NUM_COLS];

int ProcessValues(TableType table):

TableType mine;
TableType yours;

The typedef statement associates a two-dimensional float array with five rows and four columns with the type name TableType; mine and yours are two such arrays. Any actual parameter for ProcessValues should be of type TableType. By setting up the types this way, no possible mismatch can occur.

End example

Application Level As mentioned in the previous section, a two-dimensional array is the ideal data structure for modeling data that are logically structured as a table with rows and columns. The first dimension represents rows, and the second dimension represents columns. Each element in the array contains a value, and each dimension represents a relationship. For example, we usually represent a map as a two-dimensional array.

Click To expand

As with the one-dimensional array, the operations available for a two-dimensional array object are so limited (only creation and direct access) that the major application is the implementation of higher-level objects.

Implementation Level The implementation of two-dimensional arrays involves the mapping of two indexes to a particular memory cell. The mapping functions are more complicated than those for one-dimensional arrays. We do not give them here, as you will learn to write these accessing functions in later courses. Our goal is not to teach you to be a compiler writer but rather to give you an appreciation of the value of information hiding and encapsulation.

[2]In some machines this statement may not be exactly true, because boundary alignment (full- or half-word) may require that some space in memory be skipped so that the next member starts on an address that is divisible by 2 or 4. See Figure 2.6.



Previous Section
 < Free Open Study > 
Next Section
Converted from CHM to HTML with chm2web Pro 2.85 (unicode)