
2.1 Different Views of Data

What Do We Mean by Data?

When we talk about the function of a program, we use words such as "add," "read," "multiply," "write," "do," and so on. The function of a program describes what it does in terms of the verbs in the programming language.

The data are the nouns of the programming world: the objects that are manipulated, the information that is processed by a computer program. In a sense, this information is just a collection of bits that can be turned on or off. The computer itself needs to have data in this form. Humans, however, tend to think of information in terms of somewhat larger units such as numbers and lists, so we want at least the human-readable portions of our programs to refer to data in a way that makes sense to us. To separate the computer's view of data from our own view, we use data abstraction to create other views. Whether we use functional decomposition to produce a hierarchy of tasks or object-oriented design to produce a hierarchy of cooperating objects, data abstraction is essential.

Data abstraction: The separation of a data type's logical properties from its implementation

Data Abstraction

Many people feel more comfortable with things that they perceive as real than with things that they think of as abstract. As a consequence, "data abstraction" may seem more forbidding than a more concrete entity such as an "integer." But let's take a closer look at that very concrete (and very abstract) integer you've been using since you wrote your earliest programs.

Just what is an integer? Integers are physically represented in different ways on different computers. In the memory of one machine, an integer may be a binary-coded decimal. In a second machine, it may be a sign-and-magnitude binary. And in a third one, it may be represented in one's complement or two's complement notation. Although you may not know what any of these terms mean, that lack of knowledge hasn't stopped you from using integers. (You learn about these terms in an assembly language course, so we do not explain them here.) Figure 2.1 shows several representations of an integer number.

Figure 2.1: The decimal equivalents of an 8-bit binary number
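
To make the idea concrete, here is a minimal sketch that reads one 8-bit pattern under each of the conventions named above. The function names and the sample pattern 10011001 are illustrative assumptions, not something C++ itself provides.

```cpp
#include <cassert>
#include <cstdint>

// Each function interprets the same 8 bits under a different convention.
// All names here are hypothetical helpers written for this illustration.

int AsUnsigned(std::uint8_t bits) { return bits; }

int AsSignMagnitude(std::uint8_t bits) {
    int magnitude = bits & 0x7F;                     // low 7 bits hold the magnitude
    return (bits & 0x80) ? -magnitude : magnitude;   // high bit holds the sign
}

int AsOnesComplement(std::uint8_t bits) {
    // With the high bit set, the value is minus the bitwise-inverted pattern.
    if (bits & 0x80) return -static_cast<int>(static_cast<std::uint8_t>(~bits));
    return bits;
}

int AsTwosComplement(std::uint8_t bits) {
    // With the high bit set, subtract 256 from the unsigned value.
    return (bits & 0x80) ? static_cast<int>(bits) - 256 : bits;
}

int AsBinaryCodedDecimal(std::uint8_t bits) {
    int tens = bits >> 4;
    int ones = bits & 0x0F;
    assert(tens <= 9 && ones <= 9);   // each 4-bit group must encode one decimal digit
    return tens * 10 + ones;
}
```

For the pattern 10011001 (hexadecimal 99), these functions yield 153, -25, -102, -103, and 99, respectively: one set of bits, five different integers.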

The way that integers are physically represented determines how the computer manipulates them. As a C++ programmer, you rarely get involved at this level; instead, you simply use integers. All you need to know is how to declare an int type variable and what operations are allowed on integers: assignment, addition, subtraction, multiplication, division, and modulo arithmetic.

Consider the statement

distance = rate * time;

It's easy to understand the concept behind this statement. The concept of multiplication doesn't depend on whether the operands are, say, integers or real numbers, despite the fact that integer multiplication and floating-point multiplication may be implemented in very different ways on the same computer. Computers would not be so popular if every time we wanted to multiply two numbers we had to get down to the machine-representation level. But that isn't necessary: C++ has surrounded the int data type with a nice, neat package and has given you just the information you need to create and manipulate data of this type.
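
As a small sketch of this point, the same logical operation can be written once and applied to either numeric type. The function name Distance and its template parameter are assumptions made for illustration.

```cpp
// One logical description of "rate times time" that works for int and double alike.
// The compiler chooses the integer or floating-point multiply; the caller never
// sees the difference in machine representation.
template <typename Number>
Number Distance(Number rate, Number time) {
    return rate * time;
}
```

Calling Distance(60, 3) uses integer multiplication, while Distance(2.5, 2.0) uses floating-point multiplication, yet the statement reads the same in both cases.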

Another word for "surround" is "encapsulate." Think of the capsules surrounding the medicine you get from the pharmacist when you're sick. You don't have to know anything about the chemical composition of the medicine inside to recognize the big blue-and-white capsule as your antibiotic or the little yellow capsule as your decongestant. Data encapsulation means that the physical representation of a program's data is surrounded. The user of the data doesn't see the implementation, but deals with the data only in terms of its logical picture, its abstraction.

Data encapsulation: The separation of the representation of data from the applications that use the data at a logical level; a programming language feature that enforces information hiding

If the data are encapsulated, how can the user get to them? Operations must be provided to allow the user to create, access, and change data. Let's look at the operations C++ provides for the encapsulated data type int. First, you can create ("construct") variables of type int using declarations in your program. Then you can assign values to these integer variables by using the assignment operator or by reading values into them, and you can perform arithmetic operations using +, -, *, /, and %. Figure 2.2 shows how C++ has encapsulated the type int in a tidy package.

Figure 2.2: A black box representing an integer
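
The operations just listed can be exercised in a few lines. The sketch below is only an illustration: the struct Results and the function Exercise are hypothetical names, and the input is read from a string rather than the keyboard so that the example is self-contained.

```cpp
#include <sstream>
#include <string>

// Holds the outcome of each arithmetic operator applied to two ints.
struct Results {
    int sum, difference, product, quotient, remainder;
};

Results Exercise(const std::string& text) {
    int a = 0;                       // construct: a declaration creates the variable
    int b;
    b = 3;                           // assign with the assignment operator
    std::istringstream in(text);
    in >> a;                         // assign by reading a value into the variable
    return {a + b, a - b, a * b, a / b, a % b};   // the five arithmetic operators
}
```

Nothing in this code depends on how the machine stores an int; only the logical operations appear.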

The point of this discussion is that you have been dealing with a logical data abstraction of "integer" since the very beginning. The advantages of doing so are clear: You can think of the data and the operations in a logical sense and can consider their use without having to worry about implementation details. The lower levels are still there; they're just hidden from you.

Remember that the goal in design is to reduce complexity through abstraction. We can extend this goal further: to protect our data abstraction through encapsulation. We refer to the set of all possible values (the domain) of an encapsulated data "object," plus the specifications of the operations that are provided to create and manipulate the data, as an abstract data type (ADT for short).

Abstract data type (ADT): A data type whose properties (domain and operations) are specified independently of any particular implementation

Data Structures

A single integer can be very useful if we need a counter, a sum, or an index in a program, but generally we must also deal with data that have lots of parts, such as a list. We describe the logical properties of such a collection of data as an abstract data type; we call the concrete implementation of the data a data structure. When a program's information is made up of component parts, we must consider an appropriate data structure. Data structures have a few features worth noting. First, they can be "decomposed" into their component elements. Second, the arrangement of the elements is a feature of the structure that affects how each element is accessed. Third, both the arrangement of the elements and the way they are accessed can be encapsulated.

Data structure: A collection of data elements whose organization is characterized by accessing operations that are used to store and retrieve the individual data elements; the implementation of the composite data members in an abstract data type

Let's look at a real-life example: a library. A library can be decomposed into its component elements: books. The collection of individual books can be arranged in a number of ways, as shown in Figure 2.3. Obviously, the way the books are physically arranged on the shelves determines how one would go about looking for a specific volume. The particular library with which we're concerned doesn't let its patrons get their own books, however; if you want a book, you must give your request to the librarian, who retrieves the book for you.

Figure 2.3: A collection of books ordered in different ways

The library "data structure" is composed of elements (books) in a particular physical arrangement; for instance, it might be ordered on the basis of the Dewey decimal system. Accessing a particular book requires knowledge of the arrangement of the books. The library user doesn't have to know about the structure, however, because it has been encapsulated: Users access books only through the librarian. The physical structure and the abstract picture of the books in the library are not the same. The card catalog provides logical views of the library (ordered by subject, author, or title) that differ from its physical arrangement.

We use the same approach to data structures in our programs. A data structure is defined by (1) the logical arrangement of data elements, combined with (2) the set of operations we need to access the elements.

Notice the difference between an abstract data type and a data structure. The former is a high-level description: the logical picture of the data and the operations that manipulate them. The latter is concrete: a collection of data elements and the operations that store and retrieve individual elements. An abstract data type is implementation independent, whereas a data structure is implementation dependent. A data structure is how we implement the data in an abstract data type whose values have component parts. The operations on an abstract data type are translated into algorithms on the data structure.
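
One way to picture this distinction in C++ is an abstract class that states the operations, paired with a concrete class that chooses a representation. This is a minimal sketch under assumed names (IntCollection, ArrayCollection, and a capacity of 100); many other data structures could implement the same abstract data type.

```cpp
#include <cassert>
#include <cstddef>

// Logical level: what a collection of integers can do, with no mention of storage.
class IntCollection {
public:
    virtual ~IntCollection() = default;
    virtual void Insert(int item) = 0;            // Precondition: collection is not full
    virtual bool Contains(int item) const = 0;
    virtual std::size_t Size() const = 0;
    virtual bool IsFull() const = 0;
};

// Implementation level: one possible data structure, a fixed-size array.
class ArrayCollection : public IntCollection {
public:
    void Insert(int item) override {
        assert(!IsFull());                        // enforce the precondition
        items_[length_++] = item;
    }
    bool Contains(int item) const override {
        // The abstract operation becomes a linear search on the array.
        for (std::size_t i = 0; i < length_; ++i)
            if (items_[i] == item) return true;
        return false;
    }
    std::size_t Size() const override { return length_; }
    bool IsFull() const override { return length_ == kCapacity; }

private:
    static constexpr std::size_t kCapacity = 100;   // assumed capacity
    int items_[kCapacity];
    std::size_t length_ = 0;
};
```

Code written against IntCollection would keep working if the array were later replaced by a linked structure, because only the implementation-dependent class would change.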

Another view of data focuses on how they are used in a program to solve a particular problem, that is, their application. If we were writing a program to keep track of student grades, we would need a list of students and a way to record the grades for each student. We might take a by-hand grade book and model it in our program. The operations on the grade book might include adding a name, adding a grade, averaging a student's grades, and so on. Once we have written a specification for our grade book data type, we must choose an appropriate data structure to implement it and design the algorithms to implement the operations on the structure.
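
As a rough sketch of where that process might lead, the class below offers the three grade book operations mentioned above. The class name, the member names, and the choice of a map of vectors as the underlying data structure are all assumptions made for illustration; returning 0.0 for a student with no grades is likewise just one possible design decision.

```cpp
#include <map>
#include <string>
#include <vector>

class GradeBook {
public:
    // Adds a student with no grades; has no effect if the name is already present.
    void AddName(const std::string& name) {
        grades_.emplace(name, std::vector<int>());
    }

    // Records a grade, adding the student first if necessary.
    void AddGrade(const std::string& name, int grade) {
        grades_[name].push_back(grade);
    }

    // Returns the student's average, or 0.0 if the student is unknown or has no grades.
    double Average(const std::string& name) const {
        auto it = grades_.find(name);
        if (it == grades_.end() || it->second.empty()) return 0.0;
        double total = 0.0;
        for (int grade : it->second) total += grade;
        return total / it->second.size();
    }

private:
    // Implementation choice hidden from users of the class.
    std::map<std::string, std::vector<int>> grades_;
};
```

A program that uses GradeBook sees only the operations; the map could be swapped for another structure without changing that program.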

In modeling data in a program, we wear many hats. That is, we must determine the logical picture of the data, choose the representation of the data, and develop the operations that encapsulate this arrangement. During this process, we consider data from three different perspectives, or levels:

  1. Application (or user) level: A way of modeling real-life data in a specific context; also called the problem domain

  2. Logical (or abstract) level: An abstract view of the data values (the domain) and the set of operations to manipulate them

  3. Implementation level: A specific representation of the structure to hold the data items, and the coding of the operations in a programming language (if the operations are not already provided by the language)

In our discussion, we refer to the second perspective as the "abstract data type." Because an abstract data type can be a simple type such as an integer or character, as well as a structure that contains component elements, we also use the term "composite data type" to refer to abstract data types that may contain component elements. The third level describes how we actually represent and manipulate the data in memory: the data structure and the algorithms for the operations that manipulate the items on the structure.

Let's see what these different viewpoints mean in terms of our library analogy. At the application level, we focus on entities such as the Library of Congress, the Dimsdale Collection of Rare Books, and the Austin City Library.

At the logical level, we deal with the "what" questions. What is a library? What services (operations) can a library perform? The library may be seen abstractly as "a collection of books" for which the following operations are specified:

  • Check out a book

  • Check in a book

  • Reserve a book that is currently checked out

  • Pay a fine for an overdue book

  • Pay for a lost book

How the books are organized on the shelves is not important at the logical level, because the patrons don't have direct access to the books. The abstract viewer of library services is not concerned with how the librarian actually organizes the books in the library. Instead, the library user needs to know only the correct way to invoke the desired operation. For instance, here is the user's view of the operation to check in a book: Present the book at the check-in window of the library from which the book was checked out, and receive a fine slip if the book is overdue.

At the implementation level, we deal with the "how" questions. How are the books cataloged? How are they organized on the shelf? How does the librarian process a book when it is checked in? For instance, the implementation information includes the fact that the books are cataloged according to the Dewey decimal system and arranged in four levels of stacks, with 14 rows of shelves on each level. The librarian needs such knowledge to be able to locate a book. This information also includes the details of what happens when each operation takes place. For example, when a book is checked back in, the librarian may use the following algorithm to implement the check-in operation:
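
One plausible version of such an algorithm is sketched here in C++. Every detail is an assumption made for illustration: the record types, the fine rate of 25 cents per day, and the use of sets of titles to stand in for the library's records and reserve list.

```cpp
#include <set>
#include <string>

struct Book {
    std::string title;
    int dueDay;              // day number on which the book is due
};

struct CheckInResult {
    int fineCents;           // 0 when the book is returned on time
    bool toReserveShelf;     // true when someone is waiting for the book
};

const int kFinePerDayCents = 25;   // assumed fine rate

CheckInResult CheckInBook(const Book& book, int today,
                          std::set<std::string>& checkedOut,
                          const std::set<std::string>& reserved) {
    CheckInResult result{0, false};

    // Examine the due date; if the book is late, calculate the fine.
    if (today > book.dueDay)
        result.fineCents = (today - book.dueDay) * kFinePerDayCents;

    // Update the library records to show that the book has been returned.
    checkedOut.erase(book.title);

    // A reserved book goes to the reserve shelf; otherwise it is reshelved
    // according to the library's shelf arrangement.
    result.toReserveShelf = reserved.count(book.title) > 0;
    return result;
}
```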

All of this activity, of course, is invisible to the library user. The goal of our design approach is to hide the implementation level from the user.

Picture a wall separating the application level from the implementation level, as shown in Figure 2.4. Imagine yourself on one side and another programmer on the other side. How do the two of you, with your separate views of the data, communicate across this wall? Similarly, how do the library user's view and the librarian's view of the library come together? The library user and the librarian communicate through the data abstraction. The abstract view provides the specification of the accessing operations without telling how the operations work. It tells what but not how. For instance, the abstract view of checking in a book can be summarized in the following specification:

Figure 2.4: Communication between the application level and implementation level

The only communication from the user into the implementation level occurs in terms of input specifications and allowable assumptions: the preconditions of the accessing routines. The only output from the implementation level back to the user is the transformed data structure described by the output specifications, or postconditions, of the routines. The abstract view hides the data structure, but provides windows into it through the specified accessing operations.

When you write a program as a class assignment, you often deal with data at all three levels. In a job situation, however, you may not. Sometimes you may program an application that uses a data type that has been implemented by another programmer. Other times you may develop "utilities" that are called by other programs. In this book we ask you to move back and forth between these levels.

Abstract Data Type Operator Categories

In general, the basic operations that are performed on an abstract data type are classified into four categories: constructors, transformers (also called mutators), observers, and iterators.

A constructor is an operation that creates a new instance (object) of an abstract data type. It is almost always invoked at the language level by some sort of declaration. Transformers are operations that change the state of one or more of the data values, such as inserting an item into an object, deleting an item from an object, or making an object empty. An operation that takes two objects and merges them into a third object is a binary transformer.[1]

An observer is an operation that allows us to observe the state of one or more of the data values without changing them. Observers come in several forms: predicates that ask if a certain property is true, accessor or selector functions that return a copy of an item in the object, and summary functions that return information about the object as a whole. A Boolean function that returns true if an object is empty and false if it contains any components is an example of a predicate. A function that returns a copy of the last item put into the structure is an example of an accessor function. A function that returns the number of items in the structure is a summary function.

An iterator is an operation that allows us to process all components in a data structure sequentially. Operations that print the items in a list or return successive list items are iterators. Iterators are only defined on structured data types.
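
The four categories can be seen side by side in a small class. This is a sketch only: the class ItemList and its member names are assumptions chosen for illustration, and a std::vector stands in for whatever data structure an implementation might use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ItemList {
public:
    // Constructor: creates a new, empty instance.
    ItemList() : cursor_(0) {}

    // Transformers: change the state of the object.
    void Insert(int item) { items_.push_back(item); }
    void MakeEmpty() { items_.clear(); cursor_ = 0; }

    // Observers: report on the state without changing it.
    bool IsEmpty() const { return items_.empty(); }        // predicate
    int LastItem() const {                                 // accessor (returns a copy)
        assert(!IsEmpty());
        return items_.back();
    }
    std::size_t Length() const { return items_.size(); }   // summary

    // Iterator: processes the components one at a time, in order.
    void ResetList() { cursor_ = 0; }
    bool HasNext() const { return cursor_ < items_.size(); }
    int GetNextItem() {
        assert(HasNext());
        return items_[cursor_++];
    }

private:
    std::vector<int> items_;
    std::size_t cursor_;    // position of the next item to return
};
```

A client would typically call ResetList once and then call GetNextItem while HasNext returns true, visiting every item without knowing how the items are stored.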

In later chapters, we use these ideas to define and implement some useful data types that may be new to you. First, however, let's explore the built-in composite data types C++ provides for us.

[1] In some of the literature, operations that create new instances are called primitive constructors, and transformers are called nonprimitive constructors.
