Null values in a relational data base


Volume 6, number 5 INFORMATION PROCESSING LETTERS October 1977

NULL VALUES IN A RELATIONAL DATA BASE

John GRANT Computer and Information Sciences, University of Florida, Gainesville, FL 32611, USA

Received 9 October 1976, revised version received 14 May 1977

Information systems, relational data base

In [1] Codd explains the treatment of null values in a relational data sublanguage for retrieval and update. The purpose of this note is to show that Codd’s method may not always give the desired result and to modify his definition to take care of this problem. In [1] a null value is interpreted only as a missing value. Codd applies a three-valued logic where (changing his notation) the truth values are 1 (TRUE), 1/2 (UNKNOWN), 0 (FALSE), and the logical connectives are defined as follows: NOT(A) = 1 - A, A AND B = min(A, B), A OR B = max(A, B).

Consider now the following example with a relation taken from [2] page 64. S is the relation with domains S#, SNAME, STATUS, CITY, and suppose that the 4-tuple p = (25, JONES, PARIS) is an element of S. Note that the value of STATUS for p is missing. Let us now try the retrieval

GET W(S.S#) : (S.SNAME = ‘JONES’ AND S.STATUS = 10) OR
              (S.STATUS ≠ 10 AND S.CITY = ‘PARIS’).

According to the truth-value evaluation using the three-valued logic, 25 would not be in the retrieved set, as the truth-value of 25 ∈ W(S.S#) is (1 AND 1/2) OR (1/2 AND 1) = 1/2. Yet 25 should be in the retrieved set, since the conditions S.STATUS = 10 and S.STATUS ≠ 10 are mutually exclusive (exactly one of them holds for any actual value of STATUS) and both S.SNAME = ‘JONES’ and S.CITY = ‘PARIS’ are TRUE for p.
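The evaluation above can be reproduced with a minimal Python sketch. The encoding of a null value as None, the dictionary representation of p, and the helper names are assumptions made for illustration only; they are not taken from [1] or [2].

```python
from fractions import Fraction

# Truth values in the numeric notation used in this note.
TRUE, UNKNOWN, FALSE = Fraction(1), Fraction(1, 2), Fraction(0)


def t_not(a):
    return 1 - a


def t_and(a, b):
    return min(a, b)


def t_or(a, b):
    return max(a, b)


def eq(value, constant):
    """Atomic formula value = constant; UNKNOWN if the value is null (None)."""
    if value is None:
        return UNKNOWN
    return TRUE if value == constant else FALSE


def ne(value, constant):
    """Atomic formula value != constant, defined as NOT(value = constant)."""
    return t_not(eq(value, constant))


# The tuple p, with the missing STATUS encoded as None (an assumption).
p = {"S#": 25, "SNAME": "JONES", "STATUS": None, "CITY": "PARIS"}


def qualification(t):
    """Truth-functional evaluation of the example retrieval condition."""
    return t_or(
        t_and(eq(t["SNAME"], "JONES"), eq(t["STATUS"], 10)),
        t_and(ne(t["STATUS"], 10), eq(t["CITY"], "PARIS")),
    )


result = qualification(p)  # (1 AND 1/2) OR (1/2 AND 1)
```

Here result equals UNKNOWN, so a retrieval that keeps only tuples evaluating to TRUE would omit 25.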

We suggest the use of a “non-truth-functional” three-valued logic. Assume that an expression such as 25 ∈ W(S.S#) is given. For each occurrence of a null value substitute a possible value (a member of the corresponding domain). If the value of the expression is TRUE (respectively FALSE) for all possible substitutions we assign the truth-value TRUE (respectively FALSE) to the expression; otherwise we assign it the truth-value UNKNOWN. Note that if there are k occurrences of null values in a record being processed and there are n1, ..., nk values in the corresponding domains, then according to this method n1 · n2 · ... · nk evaluations are needed.

However, in general only a few of the evaluations are really necessary. We need to isolate those atomic formulas in the expression which involve null values for the n-tuple under consideration. Thus in the example above, since STATUS for p is missing, the atomic formulas S.STATUS = 10 and S.STATUS ≠ 10 would be the ones to consider. So even though the domain STATUS may take on many possible values, in this case only 2 evaluations are needed. More generally, if there are i relevant atomic formulas in the expression (the ones which involve null values), then at most 2^i evaluations are needed. Another example might involve the atomic formulas S.STATUS > 10, S.STATUS = 10, S.STATUS < 10. In this case only 3 evaluations are needed. Note that if the sets of values which satisfy the k relevant atomic formulas form a partition of the domain of values of a field, then k evaluations are needed rather than 2^k.
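The substitution method and the reduction in the number of evaluations can be sketched in Python as follows. The STATUS domain used here, the None encoding of nulls, and the function names are illustrative assumptions. The helper representatives finds its cases by scanning the domain, which is only one possible way of obtaining them; the saving it illustrates is in the number of evaluations of the full expression.

```python
from itertools import product

TRUE, UNKNOWN, FALSE = "TRUE", "UNKNOWN", "FALSE"


def qualification(t):
    """The example qualification, evaluated in ordinary two-valued logic."""
    return (t["SNAME"] == "JONES" and t["STATUS"] == 10) or (
        t["STATUS"] != 10 and t["CITY"] == "PARIS"
    )


def evaluate_by_substitution(record, domains, predicate):
    """TRUE/FALSE if every substitution for the nulls agrees, else UNKNOWN."""
    null_fields = [f for f, v in record.items() if v is None]
    outcomes = set()
    # n1 * n2 * ... * nk candidate records, one per combination of values.
    for values in product(*(domains[f] for f in null_fields)):
        candidate = {**record, **dict(zip(null_fields, values))}
        outcomes.add(bool(predicate(candidate)))
    if outcomes == {True}:
        return TRUE
    if outcomes == {False}:
        return FALSE
    return UNKNOWN


def representatives(domain, atoms):
    """Keep one domain value per distinct truth pattern of the relevant atoms."""
    chosen = {}
    for value in domain:
        pattern = tuple(atom(value) for atom in atoms)
        chosen.setdefault(pattern, value)
    return list(chosen.values())


status_domain = range(0, 101, 10)  # assumed domain for STATUS
p = {"S#": 25, "SNAME": "JONES", "STATUS": None, "CITY": "PARIS"}

# Full substitution: one evaluation per value of the assumed domain.
full = evaluate_by_substitution(p, {"STATUS": status_domain}, qualification)

# Reduced: the atoms STATUS = 10 and STATUS != 10 split the domain in two.
two_cases = representatives(status_domain, [lambda s: s == 10, lambda s: s != 10])
reduced = evaluate_by_substitution(p, {"STATUS": two_cases}, qualification)

# Three atoms forming a partition give three cases, not 2**3.
three_cases = representatives(
    status_domain, [lambda s: s > 10, lambda s: s == 10, lambda s: s < 10]
)
```

Under these assumptions both full and reduced are TRUE for p, so 25 is placed in the retrieved set, while a tuple whose CITY is not ‘PARIS’ would evaluate to UNKNOWN.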

Finally we have a suggestion concerning another kind of null value, not one where the null value indicates an unknown value, but one where the null value indicates a domain which does not apply to the tuple.

For example in a relation


MAIDENNAME does not apply to male employees. Our suggestion is that in this case two-valued logic can be used in such a way that where a relevant atomic formula occurs we assign it the value FALSE. Our idea is that in this case no value of the domain can possibly be substituted for the missing value, hence both MAIDENNAME = ‘SMITH’ and MAIDENNAME ≠ ‘SMITH’ would be FALSE. Clearly, if different kinds of null values may occur, then for our truth-value evaluation we need to know whether the null value is a missing value or an inapplicable one.

Thus if the value of TELEPHONENUMBER is null for a tuple, then either that individual has no telephone or his telephone number is unknown.
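The treatment of inapplicable null values suggested above can be sketched in Python as follows. The INAPPLICABLE marker, the helper names, and the sample tuple are hypothetical and serve only to illustrate that both atomic formulas evaluate to FALSE.

```python
INAPPLICABLE = object()  # hypothetical marker for "domain does not apply"


def eq(value, constant):
    """value = constant; FALSE whenever the field is inapplicable."""
    return value is not INAPPLICABLE and value == constant


def ne(value, constant):
    """value != constant; also FALSE whenever the field is inapplicable."""
    return value is not INAPPLICABLE and value != constant


# A made-up tuple for a male employee, for whom MAIDENNAME does not apply.
employee = {"NAME": "JONES", "MAIDENNAME": INAPPLICABLE}

is_smith = eq(employee["MAIDENNAME"], "SMITH")
is_not_smith = ne(employee["MAIDENNAME"], "SMITH")
```

Both is_smith and is_not_smith are False here, whereas for an ordinary value exactly one of the two comparisons holds.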

References

[1] E.F. Codd, Understanding relations (Installment #7), FDT Bulletin of ACM-SIGMOD 7 (3-4) (1975) 23-28.

[2] C.J. Date, An Introduction to Database Systems (Addison-Wesley, Reading, MA, 1975).
