Using Nested Relvalues to Handle Unknown Values 14thSeptember 2011

David Livingstone

Examples of Using Nested Relvalues in RAQUELto Handle Unknown Values

Introduction

There is no single, universally-approved method of dealing with every kind of missing value in a relational or SQL database. A number of methods have been proposed in the literature which have varying degrees of utility.

In a pure relational database system, the data type of truth values contains only2 values, true and false. There is no NULL of any kind available to represent a missing value. C. J. Date has pointed out the usefulness of using empty nested relvalues to represent missing values. A straightforward way of applyinghis proposal in RAQUEL is considered here.

The Basic Concept

The simplest way to exploit Date’s suggestion is as follows. Let there be a relvar attribute Athat may have values missing, and let X be the scalar type of the values to be held in attribute A. Instead of A being defined as having type X, let it have the type of a nested unary relvaluewhoseonly attribute hastypeX. Whenever a tuplehas a single scalar value of type X to be held in attribute A, instead of it being held directly as a scalar value in A, it will be held as the sole value of a nested relvalue of cardinality one held inA. Whenever a tuple has no attribute A value - i.e. the value is missing -then that tuple’s attribute A will consist of a relvalue of cardinality zero.

For example, consider an Employee relvar with an attribute Bonus, such that Bonusholds the current bonus payment for each employee. Let there be some employees who do not receive bonuses- i.e. their bonus payments are always missing. Furthermorethey must be distinguished from employees with a zero bonus - i.e. the employeeswho receive bonuses but who have currently not yet earned one. Hence a zero bonus payment cannot be used to represent a missing bonus payment[1]. For simplicity let monetary values be expressedvia the Number scalar type. The Bonusattribute would have a unary relvalue type, such that the relvalue’s sole attribute has the scalar type Number. All employees who receive bonus payments will have their current bonus represented as theNumbervalue of a nested relvalue with acardinality of one held in attribute Bonus - if they currently have not yet earned a bonus, then thatNumbervaluewill be zero. All employees who do not receive bonuses will have unary relvalues of cardinality zero in the Bonusattribute.

Let the Employee relvar have just three attributes, the Bonus attribute as described above, together with aName attribute of type Text and aSalary attribute of type Number, both of which must always contain values. Then, assuming some sample tuple values, the Employee relvar could be illustrated in a tabular format as follows :-

Name
Text / Salary
Number / Bonus
Relvalue
‘John’
/ 3000 / Amount
Number
500
‘James’
/ 3500 / Amount
Number
‘Jane’
/ 3200 / Amount
Number
300
‘Joanne’
/ 3300 / Amount
Number

Note that the nested relvalue in attribute Bonusneed not be restricted to a cardinalityof zero or one; it could be allowed to have zero, one, or more tuples. This would not make any difference to the use of empty nested relvars to represent missing values. If necessary, an integrity constraint could be added to prohibit the nested relvalue from having a cardinality greater than x, where x is 1 or any other desired maximum.

CreatingRelations that Handle Missing Values Via Nested Relvalues

Such a relvar is created by nesting the use of the Attribute Assignment. This can be illustrated with respect tothe above example as follows :-

Employee <==Attribute[ Name <== Text; Salary <== Number;
Bonus <==Attribute[ Amount <== Number] ]

The following illustrates the insertion of the 4 sample tuplesinto Employee :-

Employee --Insert {
{ Name <--‘John’; Salary <-- 3000; Bonus <--Insert { { Amount <-- 500 } } }
{ Name <--‘James’; Salary <-- 3500; Bonus <--Insert { } }
{ Name <--‘Jane’; Salary <-- 3200; Bonus <--Insert { { Amount <-- 300 } } }
{ Name <--‘Joanne’; Salary <-- 3300; Bonus <--Insert { } }
}

The insertion can be abbreviated by eliding theattribute value assignments, as long as the attribute values are given in the same order as that in which their attributes were declared when creating the relvar :-

Employee --Insert {
{ ‘John’; 3000; { {500 } } }
{ ‘James’; 3500; { } }
{ ‘Jane’; 3200; { {300 } } }
{ ‘Joanne’; 3300; { } }
}

Using Relations that Handle Missing Values Via Nested Relvalues

Using nested relvalues to handle missing values provides both an opportune means and a necessity. The opportunity is that of representing the absence of a value simply and explicitly by the absence of a value - i.e. a relvalue of cardinality zero - rather than by a completely different mechanism(such as NULL) which has to be interpreted as an absence but may be interpreted as something else - e.g. NULL is sometimes interpreted as an existing (i.e. non-absent) but unknown value. The necessity is that because the attribute value is a relvalue not a scalar value, the attribute value must always be treated differently,as a relvalue not a scalar value, even though it only contains a scalar value whenever that value is not missing.

When referencing an attribute that may contain missing values, it is useful to be able to omit tuples containing missing values, or to be able to pick out tuples with missing values, before proceeding further since the nature of the ‘further proceedings’ will be affected by whether one is dealing with scalar values or relvalues.

Toomit missing values can be done via the Unnest operator. Consider the exemplar Employee relvar. The expression
Employee Unnest[ Bonus ]
yields a relvalue that contains no tuples derived from those that contained empty Bonus attribute relvalues; i.e. its value will be :-

{ { ‘John’; 3000; 500 }
{ ‘Jane’; 3200; 300 } }

This is because, for each tuple in a relation, Unnest creates a cartesian product of the tuples in its nested relvalue (Bonus in the example) and a tuple formed from the remainder of thattuple’s attributes (Name and Salary in the example), and then unions all the cartesian product results to form the final result. So if a Bonus attribute relvalue contains zero tuples, the cartesian product contributes zero tuples to the final result.

To pick out missing valuescan be done by checking on the cardinality of the nested relvalues. For example
Employee Restrict[ Bonus Meta[ Card ] = 0 ]
returns all those Employeetuples whose Bonus relvalues contain no tuples. The Meta operator returns an item of meta data about its operand - in this case Bonus - the specific meta item being determined by the parameter; ‘Card’ is an abbreviation of cardinality. The result of the meta operation can be used in any comparison, so
Employee Restrict[ Bonus Meta[ Card ] > 0 ]
returns all the tuples which do not contain missing values. The difference between this result and that obtained via unnesting is that the latter returns the Bonus values as scalar values while a restriction using cardinalities returns the Bonus values as nested relvalues.

Suppose a restriction is to be applied to a relation via an attribute that may have missing values. One must first consider whether the restriction is to be applied to actual attribute valuesor to missing attribute values.

For example, to obtain all the Employee tuples with a bonus greater than 320 means that all the tuples for those with no bonuses should first be omitted, and then the actual bonus values checked. This result is given by the expression
Employee Unnest[ Bonus ] Restrict[ Amount > 320 ]
Note that the unnesting returns a result with no nested relvalues in it, so that the restriction only needs to handle scalar values. Furthermore the attribute that the restriction deals with is called Amount because that is the name of the attribute within the nested Bonus relvalues[2].

Suppose the restriction is to pick out em[ployees with salaries greater than 3200 whodo not rececive bonuses. Then the two conditions can be combined :-
Employee Restrict[ Bonus Meta[ Card ] = 0 And ( Salary > 3200 )]
returns the required result.

If the tuples with ‘missing’ bonuses are required, it may be useful to simply remove the Bonus attribute from the result to make the result clearer :-
Employee Restrict[ Bonus Meta[ Card ] = 0 ] Project[ ~ Bonus ]
The projection yields all but the Bonus attribute.

Suppose a join is to be applied to a relation, either a natural or generalised join, via an attribute that may have missing values, then againone must first consider whether the join is to use actual attribute values or to missing attribute values. The same techniques as described for restrictions can be applied to the join operands before the join operation is executed.

The join itself can be carried out in one of two ways :

  1. By comparing nested relvalues. ‘=’ and ‘~=’ relational comparisons are available as well as the set versions of relational comparisons, i.e. subset, proper subset, superset, and proper superset. It means that empty relvalues, representing missing values, will be included in the comparison and can appear in the result.
  2. By comparing scalar values. This requires that Unnest is applied to the nested relvalues of both join operands. It means that no missing values will be included in the join comparison and will not appear in the result.

Apart from the above considerations, in general it may or may not be necessary to pay special attention when applying a relational algebra operator to a relation that uses nested relvalues to handle missing values.

Certain relational operators refer explicitly to one or more attributes in the operand relvar to which they are applied. Therefore their invocation has to take account of the types of these attributes. These operators are the Generalised Join operators, Restrict, Extend and GroupBy. In RAQUEL the parameters of these operators contain attribute expressions. An expression must contain operators pertinent to the type of the attributes concerned, so for an attribute containing relvalues these operators must be relational operators. Thus while special attention is required, this is necessary anyway in order to meaningfully apply the operatorand nested relvalues to handle missing values require no additional overhead.

The remaining relational operators do notrequire specific considerationof operand attributes when they are invoked. These operators are the set operators, Project, the Natural Join operators, Divide, Rename, plus Nest and Unnest which are explicitly designed to handle nested relvalues. They are all operators which in RAQUEL have no parameters, parameters that are attribute names, or parameters that are based only on attribute names. In their case, no special consideration needs to be given to the fact that missing values are handled by means of nested relvalues.

Conclusion

In treating attributes that use nested relvalues to handle missing data, the user must always be aware of that fact. This awareness is an overhead for the user to handle. On the other hand, making the handling of missing values explicit avoids any unwitting errors arising, errors which may go unnoticed while still returning wrong answers, and is consequently often eventually required in practice anyway. Furthermore the treatment of missing values is straightforward and so the actual overhead is quite small. On balance it is considered beneficialfor missing values toalways be handled explicitly in a simple manner, and using nested relvalues provides that simple manner.

Page 1 of 5

[1] If a zero bonus is also construed as a missing bonus payment, then it is missing in a different way, and the two kinds of missing values should be represented differently, as indeed they are under the proposal considered here.

[2] One might consider whether the attribute within the Bonus relvalues should also be called Bonus instead of Amount.