In my August column, I mentioned that The Third Manifesto--the Manifesto
for short--requires, for any given scalar type, certain "THE_ operators"
to be defined that expose some possible representation for values and
variables of the type in question. I gave the example of a type POINT,
for which we might define operators THE_X and THE_Y to expose a
Cartesian coordinates possible representation. However, I also
emphasized the fact that Cartesian coordinates were only a possible
representation--the actual representation might be Cartesian coordinates,
polar coordinates, or something else entirely. Possible representations
are relevant to the model; actual representations, by contrast, are
relevant to the implementation merely.
This business of "possible representations" constitutes a fundamental part of the thinking underlying the Manifesto. In particular, it's highly relevant to an issue that I'd like to discuss in some detail over the months to come--namely, the (surprisingly complex) issue of type inheritance. This month, therefore, I'd like to lay some groundwork by explaining the matter of possible representations in a little more depth. I'll begin by discussing the notion of selector operators.
For every data type (for every scalar type in particular), the Manifesto requires, among other things, that an operator be defined whose purpose is simply to "select"--that is, specify--a particular value of the type in question. Such selector operators are a generalization of the familiar notion of a literal. (A literal is a special case of a selector invocation, but not all selector invocations are literals.) For example, consider the following code fragment:
    VAR X RATIONAL INIT ( +4.0 ) ;
    VAR Y RATIONAL INIT ( -3.0 ) ;
    VAR P POINT ;
    P := POINT ( X, Y ) ;
(The example is expressed in Tutorial D; recall from three months back
that Tutorial D is a language defined principally as a vehicle for
illustrating and discussing features of the Manifesto. I should also
explain that throughout the Manifesto we use the more accurate RATIONAL
in place of the more traditional REAL; after all, floating point numbers
are by definition rational numbers specifically, rather than real
numbers in general.)
The effect of the code fragment is to set the variable P to contain a
particular POINT value--namely, the point with Cartesian coordinates
(4.0,-3.0). The expression on the right hand side of the assignment is,
precisely, an invocation of a selector for type POINT; its effect is,
precisely, to select the point with the specified Cartesian coordinates.
(Note: If I'd written simply POINT (4.0,-3.0) instead of POINT (X,Y),
then I would have been using a selector invocation that was in fact a
literal. Also, for the benefit of readers who might be familiar with
object systems, I should emphasize the fact that in our model the
variable P now really does contain a point as such, not a "reference to"
or "object ID of" such a point. In fact, our model explicitly proscribes
object IDs.)
Observe, therefore, that the parameters to a given selector S together constitute--necessarily--a possible representation PR for objects of the pertinent type T. In the example, Cartesian coordinates X and Y constitute a possible representation for points.
Now, object systems do have something analogous to our selector operators (the more usual object term is constructor functions), and so users of those systems are necessarily aware of certain possible representations. However, object systems do not, in general, go on to insist that those possible representations be exposed for arbitrary purposes. For example, users might know from the format of the corresponding constructor function that points have a Cartesian coordinates possible representation, but if the system doesn't provide operators to "get" both the X and the Y coordinate of any given point, they won't be able to perform all kinds of simple operations. Moving on from the code fragment already shown, if no "get Y" operation exists, then the user won't be able to ask what the Y coordinate of point P is, even though he or she knows it's -3.0! In other words, object orientation seems to permit a design policy that isn't very sensible.
In view of the foregoing observations, we decided in the Manifesto to insist on some appropriate discipline. To be specific, we insist that:
Here's an example (Tutorial D again):
    TYPE POINT
         POSSREP POINT ( X RATIONAL, Y RATIONAL )
         POSSREP POLAR ( R RATIONAL, THETA RATIONAL ) ;
This statement defines the type POINT already used in earlier examples.
Type POINT has two possible representations called POINT (Cartesian
coordinates) and POLAR (polar coordinates), respectively, and two
corresponding selectors with the same names. (Note: We also have another
convention in Tutorial D according to which a possible representation
with no explicit name of its own inherits the name of the corresponding
type by default; thus, the first of the two POSSREP specifications in
the example could optionally have omitted the explicit name POINT.)
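As a rough illustration of a type with two possible representations and two selectors, here is a minimal sketch in Python rather than Tutorial D. It is an analogy only: the class name Point, the selector names cartesian and polar, and the choice of polar coordinates as the hidden actual representation are all assumptions made for the sketch, not anything the Manifesto prescribes.

```python
import math


class Point:
    """A point value. The stored (actual) representation is polar,
    an arbitrary choice for this sketch that users never see."""

    def __init__(self, r: float, theta: float) -> None:
        # Actual representation: hidden from users of the type.
        self._r = r
        self._theta = theta

    @classmethod
    def cartesian(cls, x: float, y: float) -> "Point":
        # Plays the role of the POINT selector: POINT ( X, Y ).
        return cls(math.hypot(x, y), math.atan2(y, x))

    @classmethod
    def polar(cls, r: float, theta: float) -> "Point":
        # Plays the role of the POLAR selector: POLAR ( R, THETA ).
        return cls(r, theta)

    # Read-only counterparts of THE_X, THE_Y, THE_R, and THE_THETA.
    @property
    def the_x(self) -> float:
        return self._r * math.cos(self._theta)

    @property
    def the_y(self) -> float:
        return self._r * math.sin(self._theta)

    @property
    def the_r(self) -> float:
        return self._r

    @property
    def the_theta(self) -> float:
        return self._theta


# Select the same point through each possible representation.
p = Point.cartesian(4.0, -3.0)
q = Point.polar(p.the_r, p.the_theta)
```

Either selector yields the same point, and neither tells the user which representation is actually stored.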
Of course, the Manifesto requires that tuple and relation types have selectors as well. In the interests of simplicity, however, I'm concentrating here on scalar types specifically.
THE_ Operators
As I noted previously, the Manifesto also requires that for each specified
possible representation of a given scalar type, a set of operators be
"automatically" defined whose purpose is to expose the possible
representation in question. And it specifically suggests that THE_
operators be used for this task. Here's the relevant excerpt from the
Manifesto (slightly edited):
Let PR be a possible representation for scalar type T, and let PR have components C1, C2, ..., Cn. Define THE_C1, THE_C2, ..., THE_Cn to be a family of operators such that, for each i (i = 1, 2, ..., n), the operator THE_Ci has the following properties:
- Its sole parameter is of declared type T.
- If an invocation of the operator appears in a "source" position (in particular, on the right hand side of an assignment), then it returns the Ci component of its argument. (More precisely, it returns the value of the Ci component of the possible representation PR(v) of its argument value v.)
- If an invocation of the operator appears in a "target" position (in particular, on the left hand side of an assignment), then, first, that argument must be explicitly specified as a scalar variable, not as an arbitrary scalar expression; second, the invocation acts as a pseudovariable, which means that it actually designates--rather than just returning the value of--the Ci component of its argument. (More precisely, it designates the Ci component of the possible representation PR(V) of its argument variable V.)
Note: The term pseudovariable is taken from PL/I. Be aware, however, that PL/I pseudovariables can't be nested, but THE_ pseudovariables can. In other words, we do regard pseudovariable invocations as references to variables, implying among other things that they can appear as arguments to other such invocations.
Here's an example:
    TYPE TEMPERATURE POSSREP CELSIUS ( C RATIONAL ) ;
    VAR TEMP TEMPERATURE ;
    VAR CEL RATIONAL ;
    CEL := THE_C ( TEMP ) ;
    THE_C ( TEMP ) := CEL ;
The first of these assignments assigns the temperature denoted by the
current value of the TEMPERATURE variable TEMP, converted if necessary
to degrees Celsius, to the RATIONAL variable CEL; the second uses the
current value of the RATIONAL variable CEL, considered as a temperature
in degrees Celsius, to update the TEMPERATURE variable TEMP
appropriately. The operator THE_C thus effectively exposes the "degrees
Celsius" possible representation for temperatures (for both read-only
and update purposes). However, this possible representation is not
necessarily an actual representation. For example, temperatures might
actually be represented in degrees Fahrenheit, not degrees Celsius.
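The gap between the exposed and the stored representation can be sketched as follows. This is Python, not Tutorial D, and only a loose analogy: the class Temperature and its property the_c are hypothetical names, the Fahrenheit storage is assumed purely for illustration, and a Python object is a mutable reference rather than a variable in the Manifesto's sense.

```python
class Temperature:
    """Exposes a 'degrees Celsius' possible representation while
    storing degrees Fahrenheit (an assumed actual representation)."""

    def __init__(self, celsius: float) -> None:
        # Plays the role of the CELSIUS selector: CELSIUS ( C ).
        self._fahrenheit = celsius * 9.0 / 5.0 + 32.0

    @property
    def the_c(self) -> float:
        # Source position: CEL := THE_C ( TEMP ) ;
        return (self._fahrenheit - 32.0) * 5.0 / 9.0

    @the_c.setter
    def the_c(self, celsius: float) -> None:
        # Target position: THE_C ( TEMP ) := CEL ;
        self._fahrenheit = celsius * 9.0 / 5.0 + 32.0


temp = Temperature(100.0)
cel = temp.the_c   # read through the Celsius possible representation
temp.the_c = 0.0   # update through the same possible representation
```

The user reads and writes degrees Celsius throughout; the conversion to and from the stored form is the implementation's business.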
Here's a slightly more complex example, using the type POINT defined
earlier:
    VAR Z RATIONAL ;
    VAR P POINT ;
    Z := THE_X ( P ) ;
    THE_X ( P ) := Z ;
The first of these assignments assigns the X coordinate of the point
denoted by the current value of the POINT variable P to the RATIONAL
variable Z. The second uses the current value of the RATIONAL
variable Z to update the X coordinate of the POINT variable P (speaking
a trifle loosely). As I noted earlier, therefore, the operators THE_X and
THE_Y effectively expose the Cartesian coordinates possible
representation for points, for both read-only and update purposes;
again, however, this possible representation is not necessarily the same
as any corresponding actual representation.
And one more example, building on the previous one (LINESEG here stands
for line segments):
    TYPE LINESEG POSSREP ( BEGIN POINT, END POINT ) ;
         /* begin and end points--corresponding   */
         /* selector is called LINESEG by default */
    VAR Z RATIONAL ;
    VAR LS LINESEG ;
    Z := THE_X ( THE_BEGIN ( LS ) ) ;
    THE_X ( THE_BEGIN ( LS ) ) := Z ;
The first of these assignments assigns the X coordinate of the begin
point of the current value of LS to the variable Z. The second uses the
current value of Z to update the X coordinate of the begin point of the
variable LS. (Note the pseudovariable nesting in this second
assignment.) The operators THE_BEGIN and THE_END thus effectively expose
the "begin and end points" possible representation for line segments--yet
again, for both read-only and update purposes. Once again, however, this
possible representation is not necessarily the same as any corresponding
actual representation.
By the way, the Manifesto also requires support for a multiple form of assignment. Thus, for example, you can use the statement:
THE_BEGIN ( LS ) := P , THE_END ( LS ) := Q ;
to update the begin and end points of the line segment variable LS in a single operation.
THE_ Pseudovariables Are Just Shorthand
I now observe that THE_ pseudovariables are logically unnecessary!
Consider the "updating" assignment from the first of the three examples
in the previous section:
THE_C ( TEMP ) := CEL ;
This assignment, which uses a pseudovariable, is logically equivalent to the following one which doesn't:
TEMP := CELSIUS ( CEL ) ; /* invoke CELSIUS selector */
Similarly, the updating assignment in the second example was as follows:
THE_X ( P ) := Z ;
Here's a logical equivalent that doesn't use a pseudovariable:
P := POINT ( Z, THE_Y ( P ) ) ; /* invoke POINT selector */
Third example:
THE_X ( THE_BEGIN ( LS ) ) := Z ;
Logical equivalent:
    LS := LINESEG    /* invoke LINESEG selector */
          ( POINT ( Z, THE_Y ( THE_BEGIN ( LS ) ) ), THE_END ( LS ) ) ;
In other words, pseudovariables per se aren't strictly necessary in order to support the kind of component-at-a-time updating under discussion (where by "component" I mean possible representation component, of course). However, the pseudovariable approach does seem intuitively more attractive than the alternative (for which it can be regarded as shorthand); moreover, it's potentially even more attractive--though still not logically necessary--if type inheritance is supported, as we'll see in a future installment. It also provides a higher degree of imperviousness to changes in the syntax of the corresponding selector. And it might possibly perform better--though performance has nothing to do with the model, of course.
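The longhand form, in which an update is just an assignment of a freshly selected value, can be sketched in Python with immutable values. This is an analogy under stated assumptions, not Tutorial D: the frozen dataclasses Point and LineSeg and the helper with_begin_x are hypothetical names introduced only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    # Cartesian possible representation; values are immutable.
    x: float
    y: float


@dataclass(frozen=True)
class LineSeg:
    # "Begin and end points" possible representation.
    begin: Point
    end: Point


def with_begin_x(ls: LineSeg, z: float) -> LineSeg:
    # Counterpart of the right-hand side of
    #   LS := LINESEG ( POINT ( Z, THE_Y ( THE_BEGIN ( LS ) ) ), THE_END ( LS ) ) ;
    # A new value is selected; nothing is updated in place.
    return LineSeg(Point(z, ls.begin.y), ls.end)


ls = LineSeg(Point(1.0, 2.0), Point(3.0, 4.0))
ls = with_begin_x(ls, 9.0)   # the only "update" is assignment to the variable
```

Because the values themselves never change, the one thing that is updated is the variable ls, which is what the pseudovariable form abbreviates.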
GET_ and SET_ Operators?
As you might know, it's more usual in contexts such as the one at hand
to speak not in terms of THE_ operators, as the Manifesto does, but in
terms of GET_ and SET_ operators instead. For example:
    Z := GET_X ( P ) ;      /* get the X coordinate of P into Z */
    CALL SET_X ( P, Z ) ;   /* set the X coordinate of P from Z */
GET_ and SET_ are examples of what the Manifesto calls read-only and
update operators, respectively.
Why, then, do we prefer our THE_ operators over the more conventional
GET_ and SET_ operators? The answer involves the fact that SET_
operators are update operators, and in our model update operators don't
return a value. Now, we impose this latter rule because we don't want
the possibility of apparently read-only expressions producing side
effects; in particular, we don't want the possibility of apparently
"simple retrievals" having the side effect of updating the database!
However, the rule does have the consequence that update operator
invocations can't be considered as scalar expressions (because they have
no value) and therefore can't be nested; instead, they must be thought
of as statements--typically CALL statements, as in the example.
Because SET_ operators can't be nested, it follows that a SET_ operator
analog of (for example) the assignment:
THE_X ( THE_BEGIN ( LS ) ) := Z ;
would have to look something like this:
    VAR TP POINT ;                /* temporary holding variable for begin point */
    TP := GET_BEGIN ( LS ) ;      /* make a copy of the begin point;            */
    CALL SET_X ( TP, Z ) ;        /* update that copy appropriately;            */
    CALL SET_BEGIN ( LS, TP ) ;   /* now update the begin point                 */
This example shows why we prefer THE_ pseudovariables to SET_ operators.
For symmetry, therefore, we also prefer THE_ operators to GET_ operators
(although here we're really talking about a purely syntactic issue, not
a logical difference, because GET_ operators, unlike SET_ operators, can
be nested).
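The three-step SET_ sequence can be mimicked in Python as a rough sketch. The function names get_begin, set_x, and set_begin are hypothetical, and the sketch assumes that get_begin returns a copy (a value) rather than a reference, in keeping with the model described in this column.

```python
class Point:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


class LineSeg:
    def __init__(self, begin: Point, end: Point) -> None:
        self.begin = begin
        self.end = end


def get_begin(ls: LineSeg) -> Point:
    # Read-only operator: returns a copy of the begin point.
    return Point(ls.begin.x, ls.begin.y)


def set_x(p: Point, z: float) -> None:
    # Update operator: changes its argument and returns no value.
    p.x = z


def set_begin(ls: LineSeg, p: Point) -> None:
    # Update operator: replaces the begin point with a copy of p.
    ls.begin = Point(p.x, p.y)


ls = LineSeg(Point(1.0, 2.0), Point(3.0, 4.0))

# Updating the copy returned by get_begin leaves ls itself untouched.
set_x(get_begin(ls), 9.0)
x_after_copy_update = ls.begin.x

# The three-step sequence: copy, update the copy, write the copy back.
tp = get_begin(ls)
set_x(tp, 9.0)
set_begin(ls, tp)
x_after_three_steps = ls.begin.x
```

Since set_x returns nothing, its result can't feed another call, so the temporary variable and the final write-back are both needed.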
One last point: An easier way to support our "THE_
operator" requirements might use some kind of dot qualification syntax.
Here are some examples (revised versions of certain of the examples
shown earlier):
    Z := LS.BEGIN.X ;
    LS.BEGIN.X := Z ;
    LS := LINESEG ( POINT ( Z, LS.BEGIN.Y ), LS.END ) ;
    LS.BEGIN := P , LS.END := Q ;
In this series, however, I'll stay with our THE_ notation. (The reason
is that in our book on the Manifesto we already use dot qualification
syntax for another purpose, not directly related to the topic of this
month's column, and I want to stay consistent with that book as much as
possible.)
C. J. Date is an independent author, lecturer, researcher, and consultant, specializing in relational database systems. His most recent books are Foundation for Object/Relational Databases: The Third Manifesto, cowritten with Hugh Darwen, and Relational Database Writings 1994-1997, both published by Addison-Wesley in 1998. You can send correspondence to him in care of Database Programming & Design Online at dbpd@mfi.com.
Copyright © 1998 Miller Freeman Inc. All Rights Reserved
Redistribution without permission is prohibited.