According to Date

Decent Exposure

A brief look at some fundamental ideas from The Third Manifesto

In my August column, I mentioned that The Third Manifesto--the Manifesto for short--requires, for any given scalar type, certain "THE_ operators" to be defined that expose some possible representation for values and variables of the type in question. I gave the example of a type POINT, for which we might define operators THE_X and THE_Y to expose a Cartesian coordinates possible representation. However, I also emphasized the fact that Cartesian coordinates were only a possible representation--the actual representation might be Cartesian coordinates, polar coordinates, or something else entirely. Possible representations are relevant to the model; actual representations, by contrast, are relevant to the implementation merely.

This business of "possible representations" constitutes a fundamental part of the thinking underlying the Manifesto. In particular, it's highly relevant to an issue that I'd like to discuss in some detail over the months to come--namely, the (surprisingly complex) issue of type inheritance. This month, therefore, I'd like to lay some groundwork by explaining the matter of possible representations in a little more depth. I'll begin by discussing the notion of selector operators.

Scalar Selectors

For every data type (for every scalar type in particular), the Manifesto requires, among other things, that an operator be defined whose purpose is simply to "select"--that is, specify--a particular value of the type in question. Such selector operators are a generalization of the familiar notion of a literal. (A literal is a special case of a selector invocation, but not all selector invocations are literals.) For example, consider the following code fragment:

VAR X RATIONAL INIT ( +4.0 ) ;
VAR Y RATIONAL INIT ( -3.0 ) ;
VAR P POINT ;
P := POINT ( X, Y ) ;

(The example is expressed in Tutorial D; recall from three months back that Tutorial D is a language defined principally as a vehicle for illustrating and discussing features of the Manifesto. I should also explain that throughout the Manifesto we use the more accurate RATIONAL in place of the more traditional REAL; after all, floating-point numbers are by definition rational numbers specifically, rather than real numbers in general.)

The effect of the code fragment is to set the variable P to contain a particular POINT value--namely, the point with Cartesian coordinates (4.0,-3.0). The expression on the right-hand side of the assignment is, precisely, an invocation of a selector for type POINT; its effect is, precisely, to select the point with the specified Cartesian coordinates. (Note: If I'd written simply POINT (4.0,-3.0) instead of POINT (X,Y), then I would have been using a selector invocation that was in fact a literal. Also, for the benefit of readers who might be familiar with object systems, I should emphasize the fact that in our model the variable P now really does contain a point as such, not a "reference to" or "object ID of" such a point. In fact, our model explicitly proscribes object IDs.)

Observe, therefore, that the parameters to a given selector S together constitute--necessarily--a possible representation PR for objects of the pertinent type T. In the example, Cartesian coordinates X and Y constitute a possible representation for points.

Now, object systems do have something analogous to our selector operators (the more usual object term is constructor functions), and so users of those systems are necessarily aware of certain possible representations. However, object systems do not, in general, go on to insist that those possible representations be exposed for arbitrary purposes. For example, users might know from the format of the corresponding constructor function that points have a Cartesian coordinates possible representation, but if the system doesn't provide operators to "get" both the X and the Y coordinate of any given point, they won't be able to perform all kinds of simple operations. Moving on from the code fragment already shown, if no "get Y" operation exists, then the user won't be able to ask what the Y coordinate of point P is, even though he or she knows it's -3.0! In other words, object orientation seems to permit a design policy that isn't very sensible.

In view of the foregoing observations, we decided in the Manifesto to insist on some appropriate discipline. To be specific, we insist that:

Here's an example (Tutorial D again):

TYPE POINT 
POSSREP POINT ( X RATIONAL, Y RATIONAL )
POSSREP POLAR ( R RATIONAL, THETA RATIONAL ) ;

This statement defines the type POINT already used in earlier examples. Type POINT has two possible representations called POINT (Cartesian coordinates) and POLAR (polar coordinates), respectively, and two corresponding selectors with the same names. (Note: We also have another convention in Tutorial D according to which a possible representation with no explicit name of its own inherits the name of the corresponding type by default; thus, the first of the two POSSREP specifications in the example could optionally have omitted the explicit name POINT.)
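For readers who find executable code helpful, here is a rough Python sketch of the same idea. It is only an analogy, not Tutorial D, and every name in it (Point, point, polar, the_x, the_y, the_r, the_theta) is an assumption made for illustration. In particular, the choice to store polar coordinates internally is arbitrary; the sketch is meant to suggest only that both selectors yield values of the same type and that the stored form need not match either possible representation as seen by the user.

```python
import math

class Point:
    """Hypothetical stand-in for type POINT. Users are meant to go through
    the selectors and THE_-style functions below, never the stored fields."""

    def __init__(self, r, theta):
        # Actual representation, chosen arbitrarily for this sketch: polar.
        self._r = r
        self._theta = theta

def point(x, y):
    """Selector for the Cartesian possible representation."""
    return Point(math.hypot(x, y), math.atan2(y, x))

def polar(r, theta):
    """Selector for the polar possible representation."""
    return Point(r, theta)

# Read-only operators exposing each component of each possible representation.
def the_x(p): return p._r * math.cos(p._theta)
def the_y(p): return p._r * math.sin(p._theta)
def the_r(p): return p._r
def the_theta(p): return p._theta

p = point(4.0, -3.0)                   # select a point by Cartesian coordinates
q = polar(the_r(p), the_theta(p))      # select the same point by polar coordinates
print(the_x(p), the_y(p), the_r(p))    # components, up to floating-point rounding
```

Swapping the stored form to Cartesian coordinates would change only the bodies of these functions, not the way they are invoked.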

Of course, the Manifesto requires that tuple and relation types have selectors as well. In the interests of simplicity, however, I'm concentrating here on scalar types specifically.

THE_ Operators

As I noted previously, the Manifesto also requires that for each specified possible representation of a given scalar type, a set of operators be "automatically" defined whose purpose is to expose the possible representation in question. And it specifically suggests that THE_ operators be used for this task. Here's the relevant excerpt from the Manifesto (slightly edited):

Let PR be a possible representation for scalar type T, and let PR have components C1, C2, ..., Cn. Define THE_C1, THE_C2, ..., THE_Cn to be a family of operators such that, for each i (i = 1, 2, ..., n), the operator THE_Ci has the following properties:

Note: The term pseudovariable is taken from PL/I. Be aware, however, that PL/I pseudovariables can't be nested, but THE_ pseudovariables can. In other words, we do regard pseudovariable invocations as references to variables, implying among other things that they can appear as arguments to other such invocations.

Here's an example:

TYPE TEMPERATURE POSSREP CELSIUS ( C RATIONAL ) ;
VAR TEMP TEMPERATURE ;
VAR CEL RATIONAL ;
CEL := THE_C ( TEMP ) ; 
THE_C ( TEMP ) := CEL ;

The first of these assignments assigns the temperature denoted by the current value of the TEMPERATURE variable TEMP, converted if necessary to degrees Celsius, to the RATIONAL variable CEL; the second uses the current value of the RATIONAL variable CEL, considered as a temperature in degrees Celsius, to update the TEMPERATURE variable TEMP appropriately. The operator THE_C thus effectively exposes the "degrees Celsius" possible representation for temperatures (for both read-only and update purposes). However, this possible representation is not necessarily an actual representation. For example, temperatures might actually be represented in degrees Fahrenheit, not degrees Celsius.
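The distinction between possible and actual representation in this example can be sketched in Python as follows. This is an analogy under stated assumptions, not Tutorial D: the class name TemperatureVar, the property name c, and the decision to store degrees Fahrenheit are all hypothetical. A Python property with a setter is used as a rough counterpart of a THE_ operator that serves for both read-only and update purposes; note that it updates an object in place, which is only loosely comparable to updating a variable in the Manifesto's sense.

```python
class TemperatureVar:
    """Hypothetical stand-in for a variable of type TEMPERATURE."""

    def __init__(self, celsius):
        # Actual representation (an assumption of this sketch): Fahrenheit.
        self._fahrenheit = celsius * 9.0 / 5.0 + 32.0

    @property
    def c(self):
        """Read-only use of THE_C: convert the stored form to Celsius."""
        return (self._fahrenheit - 32.0) * 5.0 / 9.0

    @c.setter
    def c(self, celsius):
        """Pseudovariable use of THE_C: update from a Celsius value."""
        self._fahrenheit = celsius * 9.0 / 5.0 + 32.0

temp = TemperatureVar(100.0)
cel = temp.c      # analog of CEL := THE_C ( TEMP ) ;
temp.c = 37.0     # analog of an assignment to THE_C ( TEMP )
```

Code that uses only temp.c never learns that the stored value is in Fahrenheit.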

Here's a slightly more complex example, using the type POINT defined earlier:

VAR Z RATIONAL ;
VAR P POINT ;
Z := THE_X ( P ) ; 
THE_X ( P ) := Z ;

The first of these assignments assigns the X coordinate of the point denoted by the current value of the POINT variable P to the RATIONAL variable Z. The second uses the current value of the RATIONAL variable Z to update the X coordinate of the POINT variable P (speaking a trifle loosely). As I noted earlier, therefore, the operators THE_X and THE_Y effectively expose the Cartesian coordinates possible representation for points, for both read-only and update purposes; again, however, this possible representation is not necessarily the same as any corresponding actual representation.

And one more example, building on the previous one (LINESEG here stands for line segments):

TYPE LINESEG POSSREP ( BEGIN POINT, END POINT ) ;
/* begin and end points--corresponding */
/* selector is called LINESEG by default */
VAR Z RATIONAL ;
VAR LS LINESEG ;
Z := THE_X ( THE_BEGIN ( LS ) ) ; 
THE_X ( THE_BEGIN ( LS ) ) := Z ;

The first of these assignments assigns the X coordinate of the begin point of the current value of LS to the variable Z. The second uses the current value of Z to update the X coordinate of the begin point of the variable LS. (Note the pseudovariable nesting in this second assignment.) The operators THE_BEGIN and THE_END thus effectively expose the "begin and end points" possible representation for line segments--yet again, for both read-only and update purposes. Once again, however, this possible representation is not necessarily the same as any corresponding actual representation.

By the way, the Manifesto also requires support for a multiple form of assignment. Thus, for example, you can use the statement:

THE_BEGIN ( LS ) := P , 
THE_END ( LS ) := Q ;

to update the begin and end points of the line segment variable LS in a single operation.

THE_ Pseudovariables Are Just Shorthand

I now observe that THE_ pseudovariables are logically unnecessary! Consider the "updating" assignment from the first of the three examples in the previous section:

THE_C ( TEMP ) := CEL ;

This assignment, which uses a pseudovariable, is logically equivalent to the following one, which doesn't:

TEMP := CELSIUS ( CEL ) ; /* invoke CELSIUS selector */

Similarly, the updating assignment in the second example was as follows:

THE_X ( P ) := Z ;

Here's a logical equivalent that doesn't use a pseudovariable:

P := POINT ( Z, THE_Y ( P ) ) ; /* invoke POINT selector */

Third example:

THE_X ( THE_BEGIN ( LS ) ) := Z ;

Logical equivalent:

LS := LINESEG /* invoke LINESEG selector */
( POINT ( Z, THE_Y ( THE_BEGIN ( LS ) ) ), THE_END ( LS ) ) ;

In other words, pseudovariables per se aren't strictly necessary in order to support the kind of component-at-a-time updating under discussion (where by "component" I mean possible representation component, of course). However, the pseudovariable approach does seem intuitively more attractive than the alternative (for which it can be regarded as shorthand); moreover, it's potentially even more attractive--though still not logically necessary--if type inheritance is supported, as we'll see in a future installment. It also provides a higher degree of imperviousness to changes in the syntax of the corresponding selector. And it might possibly perform better--though performance has nothing to do with the model, of course.
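The selector-based expansion can be mimicked in Python with immutable values. The sketch below is an analogy only: the frozen dataclasses Point and LineSeg are hypothetical, and for brevity their fields simply mirror the possible representations, which the model itself doesn't require. Because the values can't be changed in place, the only way to "update a component" is to select a new value and assign it to the variable, which is what the expansion does.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    """Hypothetical immutable point value."""
    x: float
    y: float

@dataclass(frozen=True)
class LineSeg:
    """Hypothetical immutable line segment value."""
    begin: Point
    end: Point

ls = LineSeg(Point(1.0, 2.0), Point(5.0, 6.0))
z = 9.0

# Analog of the expanded form of THE_X ( THE_BEGIN ( LS ) ) := Z ;
# select a new begin point, then a new line segment, and assign it to ls.
ls = LineSeg(Point(z, ls.begin.y), ls.end)
```

Everything other than the X coordinate of the begin point is carried over unchanged from the old value of ls.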

Why Not GET_ and SET_ Operators?

As you might know, it's more usual in contexts such as the one at hand to speak not in terms of THE_ operators, as the Manifesto does, but in terms of GET_ and SET_ operators instead. For example:

Z := GET_X ( P ) ; 
/* get the X coordinate of P into Z */
CALL SET_X ( P, Z ) ;
/* set the X coordinate of P from Z */

GET_ and SET_ are examples of what the Manifesto calls read-only and update operators, respectively.

Why, then, do we prefer our THE_ operators over the more conventional GET_ and SET_ operators? The answer involves the fact that SET_ operators are update operators, and in our model update operators don't return a value. Now, we impose this latter rule because we don't want the possibility of apparently read-only expressions producing side effects; in particular, we don't want the possibility of apparently "simple retrievals" having the side effect of updating the database! However, the rule does have the consequence that update operator invocations can't be considered as scalar expressions (because they have no value) and therefore can't be nested; instead, they must be thought of as statements--typically CALL statements, as in the example.

Because SET_ operators can't be nested, it follows that a SET_ operator analog of (for example) the assignment:

THE_X ( THE_BEGIN ( LS ) ) := Z ;

would have to look something like this:

VAR TP POINT ; /* temporary holding variable for begin point */
TP := GET_BEGIN ( LS ) ; /* make a copy of the begin point; */
CALL SET_X ( TP, Z ) ; /* update that copy appropriately; */
CALL SET_BEGIN ( LS, TP ) ; /* now update the begin point */

This example shows why we prefer THE_ pseudovariables to SET_ operators. For symmetry, therefore, we also prefer THE_ operators to GET_ operators (although here we're really talking about a purely syntactic issue, not a logical difference, because GET_ operators, unlike SET_ operators, can be nested).
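The nesting problem can be reproduced in a small Python sketch. It is an analogy under assumptions: the classes Point and LineSeg and the functions get_begin, set_x, and set_begin are hypothetical, and get_begin is written to return a copy so that it behaves as a read-only operator. The update functions return nothing, in line with the rule that update operators have no value.

```python
import copy

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

class LineSeg:
    def __init__(self, begin, end):
        self.begin, self.end = begin, end

def get_begin(ls):
    """Read-only operator: returns a copy of the begin point."""
    return copy.copy(ls.begin)

def set_x(p, z):
    """Update operator: changes its first argument and returns no value."""
    p.x = z

def set_begin(ls, p):
    """Update operator: changes its first argument and returns no value."""
    ls.begin = copy.copy(p)

ls = LineSeg(Point(1.0, 2.0), Point(5.0, 6.0))
z = 9.0

tp = get_begin(ls)    # make a copy of the begin point
set_x(tp, z)          # update that copy
set_begin(ls, tp)     # now update the begin point of ls

# An attempt at nesting has nothing to pass along: set_x yields None,
# and the copy it updated is simply discarded.
nested_result = set_x(get_begin(ls), 0.0)
```

The three-step sequence is what the nested pseudovariable form avoids having to spell out.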

A Note on Syntax

One last point: An easier way to support our "THE_ operator" requirements might use some kind of dot qualification syntax. Here are some examples (revised versions of certain of the examples shown earlier):

Z := LS.BEGIN.X ; 
LS.BEGIN.X := Z ;
LS := LINESEG ( POINT ( Z, LS.BEGIN.Y ), LS.END ) ;
LS.BEGIN := P , 
LS.END := Q ;

In this series, however, I'll stay with our THE_ notation. (The reason is that in our book on the Manifesto we already use dot qualification syntax for another purpose, not directly related to the topic of this month's column, and I want to stay consistent with that book as much as possible.)

C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database systems. His most recent books are Foundation for Object/Relational Databases: The Third Manifesto, cowritten with Hugh Darwen, and Relational Database Writings 1994-1997, both published by Addison-Wesley in 1998. You can send correspondence to him in care of Database Programming & Design Online at dbpd@mfi.com.


 
 

Copyright © 1998 Miller Freeman Inc. All Rights Reserved
Redistribution without permission is prohibited.
