According to Date

Predicates and Propositions

Hugh Darwen

What a database really is

In my last two columns, I looked at various redundancies in SQL (especially with respect to the GROUP BY and HAVING clauses). You'll find a follow-up column entitled "Fifty Ways to Quote Your Query" on the Database Programming & Design Web site (www.dbpd.com). That makes room in this month's column for a guest contribution from Hugh Darwen: an open letter he wrote to students he tutors on an Open University database course in the United Kingdom. His letter is an important one--I've often thought of trying to write something similar myself--and his explanation for writing it intrigues me: "Almost every year, about halfway through the course, some student or other asks, in a manner that suggests the question has been burning: What's a predicate? I'm always so pleased that anybody has even realized this question might be an important one that I take pains over my careful but ad hoc answer. And then I think: Why didn't we get this out of the way right at the very beginning? This year, I'm giving it a try."

Well, I hope it works for Hugh's students, for I sincerely believe that if only more people--especially DBMS implementers--thought this way about databases, we might stand a chance of seeing the emergence of respectable database systems that people could enjoy and not have to fight with. --Chris Date

INTRODUCTION

Hello. Let me introduce myself. My name is Hugh Darwen, and I'm the tutor (staff number 44525) in region 4 for certain students on course M357, entitled Data Models and Databases.

The preceding paragraph contains three sentences, and those sentences are different in kind as well as in content:

•The first, "Hello," is a mere signal, establishing contact.

•The second is in the form of an imperative, demanding something of you, the reader (although of course we all know that it's just a common courtesy in this particular instance).

•The third is a plain statement of fact.

Each of these three kinds of sentence has an analog in communications between people and computers. We often have to do something special to establish contact with a computer; we often use the imperative style to give commands to the computer; and sometimes such a command includes a statement of fact that we want the computer to remember (because that fact is both true and relevant), or forget (because it is now either false or irrelevant). For the rest of this discussion, I'll focus on this "statement of fact" kind of sentence.

STATEMENTS OF FACT

How do we distinguish statements of fact from other kinds of sentences, such as greetings, imperatives, and questions? Well, here's another example that might help:
 

I'm writing this letter in my study at home in Warwickshire on February 9, 1998.
 

Now, you can't tell whether what I've just told you is a true statement of fact or a false one, but you do know from its form that it is either true or false. By contrast, we can't say of utterances such as "Hello," "Let me introduce myself," or "What's the time?" that they must be either true or false.

By the way, note that I do entertain the notion that a "statement of fact" might be false. If you think the very term statement of fact connotes undeniable truth, please don't worry too much--I could have used some other term, such as assertion or declaration. As always, the concept is more important than the terminology, and sometimes it's difficult to choose the most appropriate term in everyday speech to match a concept one is trying to communicate, especially when that concept is a very precise one. I'll continue to use statement of fact in what follows, but I'm about to introduce an alternative term also, one that's conventionally used to mean precisely the concept I'm trying to convey. That term is proposition.

PROPOSITIONS

The term proposition is what logicians use for the "statement of fact" concept. Aristotle (384-322 B.C.) understood the importance of propositions, and he worked out a formal system of reasoning whereby, from an assumption of the truth of certain given propositions, the truth of certain other derived propositions could be concluded. The given propositions are called axioms, and the method of reasoning is called logic. The axioms and the derived propositions concluded from those axioms are collectively called theorems. For example, given certain propositions already discussed, you could use logic to obtain the following logical consequence (or derived proposition):

The home of the Open University tutor identified by staff number 44525 is in Warwickshire.

Furthermore, if the given propositions are in fact true, then you can be sure the logical consequence (the conclusion) is true, too.
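A minimal Python sketch of that derivation follows. The tuple encoding of the propositions and the single inference rule in `derive` are assumptions made for illustration; the letter states the propositions only in English.

```python
# Given propositions (axioms), encoded as (kind, subject, value) tuples.
# The encoding and the names used here are assumptions for illustration.
axioms = {
    ("staff_number_of", "Hugh Darwen", 44525),
    ("home_is_in", "Hugh Darwen", "Warwickshire"),
}


def derive(facts):
    """Apply one inference rule: if person P has staff number S and P's home
    is in county C, then the home of the tutor identified by S is in C."""
    derived = set()
    for kind1, person1, staff in facts:
        if kind1 != "staff_number_of":
            continue
        for kind2, person2, county in facts:
            if kind2 == "home_is_in" and person2 == person1:
                derived.add(("home_of_staff_is_in", staff, county))
    return derived


# The axioms together with everything derived from them.
theorems = axioms | derive(axioms)
```

If either axiom were false, the rule would still fire, but nothing would guarantee that the conclusion was true.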

A DATABASE IS A SET OF TRUE PROPOSITIONS

It's useful to view a database as a set of propositions (assumed to be true ones) concerning some enterprise of which the database is supposed to provide some kind of account or record. If we take that view, there are some important questions that arise immediately:

•How do we choose which propositions should be stated to form the record of our enterprise?

•In what form should those propositions be stated?

•How can we instruct the computer to remember or forget a given proposition or set of propositions?

•Can we get the computer to prevent us from stating propositions that are ridiculous or contradictory? (A ridiculous proposition might state that a certain person is 200 years old; a contradictory one might state that a person is both male and female.)

•In what form can we present a question (or "query") to the computer, the response to which would be a proposition or a set of propositions derived by logic from a given database? And in what form should we expect to find that response?

Course M357 answers these questions. Indeed, there's little in that course that's not related to at least one of them, though you might note that I didn't bother to mention certain subsidiary matters, such as who's allowed to access a database, how the computer checks authorizations, how databases might be protected from accidental loss or damage, and so on.
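To make the third and fourth questions concrete, here is a rough Python sketch of a toy "database" that remembers and forgets propositions and refuses ridiculous or contradictory ones. The `Database` class, the tuple encoding, and the age limit of 130 are all hypothetical choices for this sketch, not anything prescribed by the course.

```python
class Database:
    """A toy database: a set of propositions assumed to be true."""

    def __init__(self, constraints=()):
        self.propositions = set()
        self.constraints = list(constraints)

    def remember(self, proposition):
        # Every constraint must accept the new proposition before it is stored.
        for check in self.constraints:
            problem = check(self.propositions, proposition)
            if problem:
                raise ValueError(problem)
        self.propositions.add(proposition)

    def forget(self, proposition):
        self.propositions.discard(proposition)


def plausible_age(existing, new):
    """Reject ridiculous ages (the limit of 130 is an arbitrary assumption)."""
    kind, person, value = new
    if kind == "age" and not 0 <= value <= 130:
        return f"ridiculous: {person} cannot be {value} years old"
    return None


def one_sex_per_person(existing, new):
    """Reject a proposition that contradicts one already remembered."""
    kind, person, value = new
    if kind == "sex":
        for k, p, v in existing:
            if k == "sex" and p == person and v != value:
                return f"contradictory: {person} is already recorded as {v}"
    return None


db = Database([plausible_age, one_sex_per_person])
db.remember(("age", "Tom", 45))
db.remember(("sex", "Tom", "male"))
```

Calling `db.remember(("age", "Jane", 200))` or `db.remember(("sex", "Tom", "female"))` now raises `ValueError`, and the set of remembered propositions is left unchanged.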

PREDICATES

There's a word in my title that I haven't used yet, and I come to it now: predicate. The concept of predicates is very important, for an understanding of them could underpin everything you'll be asked to learn in course M357.

Consider two things:

•First, logicians from Aristotle onward found that reasoning based just on the notion of propositions had certain severe limitations, which they eventually overcame by studying certain generalized forms of such propositions. They found that, when several propositions were of the same generalized form, various impressive shortcuts could be taken by reasoning in terms of those general forms instead of in terms of individual propositions per se.

•Second, commercial databases can contain billions of propositions; if propositions of the same general form can't somehow be lumped together, such databases will surely be unmanageable and unusable.

Logicians use the term predicate for the "general form" in question, and it's predicates that have made databases, database management, and database queries tractable to computerization. Consider once again the proposition from my opening paragraph, which I'll now restate in a slightly different way:
 

Hugh Darwen is the name of a tutor (staff number 44525) in region 4 for certain students on course M357, entitled Data Models and Databases.
 

It's easy to see that this statement has a certain form that could be common to a whole set of propositions we might wish to state in some record of the enterprise called the Open University. For example, we could replace the course number M357 by M355, thereby obtaining a proposition which makes the same kind of sense as the original one, and might even be true. (In fact it isn't true, for two reasons: First, I'm not a tutor for any students on course M355; second, course M355 isn't entitled Data Models and Databases.)

Here's what's probably the most general form of the original proposition that we might all agree on:
 

... is the name of a tutor (staff number ... ) in region ... for certain students on course ..., entitled ....
 

And that's a predicate!
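One way to picture this, as a sketch only, is a Python format string in which each "..." becomes an unnamed hole `{}`. The variable names are assumptions made for the example.

```python
# The general form, with each "..." written as an unnamed hole "{}".
general_form = (
    "{} is the name of a tutor (staff number {}) in region {} "
    "for certain students on course {}, entitled {}."
)

# Filling every hole yields a proposition: a sentence that is either
# true or false.
proposition = general_form.format(
    "Hugh Darwen", 44525, 4, "M357", "Data Models and Databases"
)

# Replacing M357 by M355 gives another proposition of the same form
# (one that the letter says happens to be false).
other = general_form.format(
    "Hugh Darwen", 44525, 4, "M355", "Data Models and Databases"
)
```

The string `general_form` itself is neither true nor false; only the sentences produced by filling all its holes are.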

Here now are some important points to note and questions to be asked:

1. The predicate as shown can be broken down in various ways into smaller pieces, each of which is a predicate in turn. For example, "... is the name of a tutor" is a predicate, and so are "... is the name of a tutor (staff number ... )" and "course ... [is] entitled ..." (and so on).

2. It might be useful to give the predicate a name, such as TUTOR_INFO. Such names are used a great deal in database designs. Indeed, many of the names used in databases are really predicate names, though they aren't often referred to as such.

3. The holes marked by "..." are known as placeholders. It might be useful to give them names, too. In fact, predicates are often written using such names. For example:
 

TUTOR is the name of a tutor (staff number STAFF#) in region REGION# for certain students on course COURSE#, entitled TITLE.
 

4. Notice that the placeholder names are often accompanied by text indicating the kind of thing they stand for: TUTOR is the name, staff number STAFF#, region REGION#, and so on. Staff number and name here are both common nouns, standing for anything or everything of the kind indicated. Region is perhaps a little sloppy, considering that what follows is really a region number, not a region per se, but "the region identified by region number REGION#" seemed just a little heavy-handed for my present purpose.

Now if those "indicators of kind" are common nouns, then the accompanying placeholders themselves can be thought of as pronouns. For example, consider the statement "He is her father." This statement contains two pronouns, he and her; he stands for some unspecified person who's the father of some other unspecified person, her. In normal discourse, the context would provide referents for these pronouns, and we would know precisely who's being asserted to be whose father. Because there aren't any referents here, we can't tell which people they actually stand for. However, imagine a context in which the referents are Tom and Jane. Then we'll understand that we need to substitute Tom and Jane for the pronouns to obtain "Tom is Jane's father." In a like manner, when we substitute an appropriate name or proper noun for each placeholder in a predicate, we obtain a proposition. For example, if we substitute Hugh Darwen for TUTOR and 44525 for STAFF# (and so on) in the TUTOR_INFO predicate, we obtain once again the proposition:
 

Hugh Darwen is the name of a tutor (staff number 44525) in region 4 for certain students on course M357, entitled Data Models and Databases.
 

In general, if there are n placeholders and we substitute a proper noun for one of them, we obtain a predicate with n-1 placeholders (and so on). When there are no placeholders left at all, the predicate degenerates to a pure proposition--it is now true or false, unequivocally.

5. The presence of at least one placeholder in a predicate means it can't be the kind of statement of which we can say categorically that it's either true or false. Although we can make propositions out of predicates, a predicate is not, in general, a proposition. (The exception is the degenerate case of a predicate with no placeholders at all.)

6. It's important to agree on which proper nouns, in general, are appropriate for each placeholder. For example, we might not wish to form propositions such as:
 

3.14159 is the name of a tutor (staff number Camembert) in region CV35 7AY for certain students on course Aintree, entitled Jurassic Park.
 

Indeed, the proper nouns agreed upon will almost certainly bear a close relationship to the kind of thing the placeholder represents, often indicated by the presence of a common noun in the predicate (as discussed in point 4).

7. The connectives and and or can be used with predicates to make longer predicates, just as they can be used with propositions to make longer propositions. Not can be used, too.

8. Would it actually be a good idea to use the suggested form, TUTOR_INFO, to hold information about tutors in the Open University database? Can you think of any problems that might arise if we did?

9. The form of some predicate might also be used to formulate a question ("query") to be presented to the computer. The answer to that question would be the set of all propositions that (a) can be formed by substitution of proper nouns for placeholders in the form and (b) can be shown to be true. Of course, if the database itself uses the form of that same predicate to hold the original given statements of fact, then "showing to be true" will be, for the computer, a trivial task of mere regurgitation.
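Points 4, 6, and 9 can be sketched together in Python. Everything here is an assumption made for illustration: the tuple representation, the use of Python types as a crude stand-in for "appropriate proper nouns," and every row other than the one about Hugh Darwen, which is invented.

```python
PLACEHOLDERS = ("TUTOR", "STAFF#", "REGION#", "COURSE#", "TITLE")

# A crude notion of which values are appropriate for each placeholder.
DOMAINS = {"TUTOR": str, "STAFF#": int, "REGION#": int,
           "COURSE#": str, "TITLE": str}

# Stored propositions of the TUTOR_INFO form, one tuple per proposition.
# Only the first row comes from the letter; the others are invented.
TUTOR_INFO = {
    ("Hugh Darwen", 44525, 4, "M357", "Data Models and Databases"),
    ("A. N. Other", 10001, 4, "M999", "An Invented Course"),
    ("J. Doe", 10002, 7, "M357", "Data Models and Databases"),
}


def remaining_placeholders(given):
    """Substituting values for some placeholders leaves a predicate with
    fewer placeholders; return the names still unfilled."""
    return tuple(name for name in PLACEHOLDERS if name not in given)


def is_appropriate(values):
    """Check each value against the domain agreed for its placeholder."""
    return len(values) == len(PLACEHOLDERS) and all(
        isinstance(v, DOMAINS[n]) for n, v in zip(PLACEHOLDERS, values)
    )


def query(stored, given):
    """Return every stored proposition consistent with the given values."""
    position = {name: i for i, name in enumerate(PLACEHOLDERS)}
    return {
        row for row in stored
        if all(row[position[n]] == v for n, v in given.items())
    }
```

For example, `query(TUTOR_INFO, {"COURSE#": "M357"})` returns the two stored propositions about course M357. Because the stored data uses the same form as the query, answering it is mere lookup, as point 9 observes.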

QUANTIFICATION

In point 4 in the previous section, I showed how to make a proposition out of a predicate by substituting a proper noun for each placeholder. However, there's another way to dispose of a placeholder; it goes by the name of quantification, meaning "saying how many." Consider for example the simple predicate:

ARTIST painted a portrait of PERSON.

Instead of just substituting appropriate names for both ARTIST and PERSON (as in, for example, "Holbein painted a portrait of Henry VIII"), we can obtain propositions by saying how many artists painted a portrait of a certain person, or how many people had their portraits painted by a certain artist.

There's one particular form of quantification that's both fundamental and very common: It's called existential quantification, and it involves replacing the placeholder in question by a phrase involving something like "at least one" or "some" or "there exists." For example, the following are all propositions that can be obtained from the predicate "ARTIST painted a portrait of PERSON":

•Holbein painted a portrait of some person.

•Some artist painted a portrait of Henry VIII.

•Some artist painted a portrait of some person.

Aristotle studied propositions of a certain form that includes quantifiers. He realized that if propositions of the form "a is x" and "a is not x" are interesting, then we might also want to consider the truth of "Some a is x," "Every a is x," and "No a is x." Some (as we have seen) is the existential quantifier. Every is what we now call the universal quantifier. No is a negated form of the existential quantifier, for "No a is x" clearly means the same as "It's not the case that some a is x." However, while these observations might justify a claim that Aristotle started the study of predicates, that study didn't come to fruition until the late 19th century, with the contributions of Frege, Boole, Peirce, and others.
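The three quantifiers can be sketched in Python with the built-in functions `any` and `all`. The small set of stored propositions below is sample data chosen for the example, not a complete record, and the function names are assumptions.

```python
# Sample propositions of the form "ARTIST painted a portrait of PERSON".
PAINTED = {
    ("Holbein", "Henry VIII"),
    ("Holbein", "Thomas More"),
    ("Van Dyck", "Charles I"),
}


def painted_some_person(artist):
    """'ARTIST painted a portrait of some person': PERSON is quantified."""
    return any(a == artist for a, _ in PAINTED)


def some_artist_painted(person):
    """'Some artist painted a portrait of PERSON': ARTIST is quantified."""
    return any(p == person for _, p in PAINTED)


def no_artist_painted(person):
    """'No a is x' means 'it's not the case that some a is x'."""
    return not some_artist_painted(person)


# Both placeholders quantified: a proposition with no placeholders left.
some_artist_painted_some_person = any(True for _ in PAINTED)

# Universal quantification over a chosen set of artists.
ARTISTS = {"Holbein", "Van Dyck"}
every_artist_painted_some_person = all(painted_some_person(a) for a in ARTISTS)
```

Each function still has one placeholder (its parameter), so it is a predicate; the last two values have none, so they are propositions, true or false with respect to the sample data.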

CONCLUDING REMARKS

The M357 course material does use the term predicate a little, but not much, for there are other ways of saying what's going on in databases--ways that are often more appealing, in their own special contexts, than always talking in terms of predicates. However, if you have difficulty understanding those other ways, try referring them back to the predicates and simple logic that underpin the whole subject.

To close, here are three observations that will give you a feel for that universal underpinning:

•In Block I of the course we study a method of analysis known as "Entity-Attribute-Relationship modeling," this activity being a common preliminary step in database design. Deciding what entity types to describe and what types of relationship might exist among instances of those entity types is really just deciding what kinds of statements of fact we would like to use to make a formal record of the enterprise being modeled. Kinds of statements of fact, as we have seen, are otherwise known as predicates. The concept known as attribute in this modeling method is just placeholder by another name.

•In Block II we study a theory called "The Relational Model of Data" and a simple computer language called RAS based on that theory. We learn how the ideas of predicate names and placeholder names are used in the formulation of such languages and see how queries can be presented to the computer by constructing predicates from predicates, using substitution, quantification, and the connectives and, or, and not. Incidentally, in this theory we'll find that the mathematical term relation is used to refer collectively both to the general form of a predicate and to the set of true propositions that can be formed from it.

•RAS isn't a commercially used language--it has been designed especially for the Open University for tutorial purposes. In Block III we study the industry's most widely accepted attempt to implement the theory we learned in Block II, a language called SQL. SQL has become so prevalent since its first commercial appearance in 1979 that it has been characterized by one authority as "intergalactic dataspeak." Alas, the industry's most widely accepted attempt isn't a very good one, as we'll see, but a good understanding of predicates will help us to use SQL wisely in spite of its traps and shortcomings.
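As a rough illustration of the second observation, constructing predicates from predicates, here is a Python sketch using plain sets of tuples. It is neither RAS nor SQL; the relation names and every row not taken from the letter are invented for the example.

```python
# "Staff member STAFF# tutors course COURSE#."
TUTORS = {(44525, "M357"), (10001, "M999")}
# "Course COURSE# is entitled TITLE."
TITLED = {("M357", "Data Models and Databases"),
          ("M999", "An Invented Course")}
# "Staff member STAFF# examines course COURSE#." (entirely invented rows)
EXAMINES = {(44525, "M999"), (10001, "M999")}

# Substitution: "STAFF# tutors course M357."
tutors_m357 = {s for (s, c) in TUTORS if c == "M357"}

# Existential quantification: "STAFF# tutors some course."
tutors_some_course = {s for (s, _) in TUTORS}

# And: "STAFF# tutors COURSE# and COURSE# is entitled TITLE."
tutors_titled = {
    (s, c, t) for (s, c) in TUTORS for (c2, t) in TITLED if c == c2
}

# Or: "STAFF# tutors COURSE# or STAFF# examines COURSE#."
tutors_or_examines = TUTORS | EXAMINES

# And not: "STAFF# tutors COURSE# and does not examine COURSE#."
tutors_not_examines = TUTORS - EXAMINES
```

Each result is again a set of true propositions of some derived predicate, which is why results can be fed into further queries.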

Hugh Darwen is a database specialist with IBM United Kingdom Ltd. He was one of the chief architects and developers of an IBM relational DBMS called Business System 12, a product that faithfully embraced the principles of the relational model. You can reach him at hugh_darwen@uk.ibm.com.
 


 
 

Copyright © 1998 Miller Freeman Inc. All Rights Reserved
Redistribution without permission is prohibited.
