3. TYPES (Part 3)
3.6 AGGREGATE TYPES
3.6.1 ARRAY TYPES
Array Bounds as Subtype Property
Our objective in the design of the array type facility was to provide a simple,
efficient mechanism that both satisfies the Steelman requirements (3-3A through
3-3F) and meshes well with the language's type/subtype and type equivalence
conventions. What distinguishes arrays in RED from arrays in languages such as
Pascal are, first, that size determination may be deferred until data allocation
time, and second, that routines may be defined to accept arrays of different sizes
( at different invocations. Both properties are implicit in RED's type/subtype rules.
An array type is defined by the types of its indices and the type of its component.
For example, with the declaration
VAR x: ARRAY INT(m..n) OF FLOAT(5, -1.0E7 .. 1.0E7);
VAR y: ARRAY INT(1..10) OF FLOAT(4, 0.0 .. 1.0 );
both x and y are of the single type ARRAY INT OF FLOAT. (As is the case uniformly
in RED, the type is derived by removing "round bracketed" -- i.e., subtype --
information.) Since the bounds of the index are subtype constraints, they may be
specified at run time, as in the above declaration of x. Moreover, since formal
parameters to routines may be declared with a type (rather than a subtype)
specification, it is straightforward to declare a routine which takes arrays of
different sizes at different times. A procedure p could be declared
PROC p(VAR a: ARRAY INT 0F FLOAT); ... END PROC p;
and invoked by both p(x) and p(Y).
It may be noted that an array indexed by integers is of a different type than
an array indexed by an enumeration type. For example, z declared as follows
VAR z: ARRAY ENUM['a,'b,'c] OF FLOAT(4, 0.0 .. 1.0);
has type ARRAY ENUM['a,'b,'c] OF FLOAT, which is different from the type of x and y.
Having the index type as part of the array type is not required by Steelman (SM
3-3D), but this choice is compatible with the language's rules for deriving types
from subtypes and, further, avoids an anomalous situation in which the type of the
index of a formal array parameter could vary from one invocation of the routine to
another.
Subarray Selection
RED permits the user to select and catenate subarrays of one-dimensional arrays
(e.g., y(1..4)) in compliance with Steelman (SM 3-3E). A number of alternative
approaches were considered in this area, with the major issues concerning whether
such subarrays should be allowed to be used as variables and, if so, whether
"overlapping" assignment should be allowed. The following alternatives were
examined:
- Restrict subarrays to be used as constants only.
- Allow subarrays to be used as variables, but prohibit overlapping
assignments (such as y(1..4) := y(3..6), or, in general, y(low1..high1) :=
y(low2..high2) when low1<low2<=high1 or low2<low1<=high2).
- Allow subarrays to be used as variables, allow overlapping assignments, but
leave the semantics undefined in such cases.
- Allow subarrays to be used as variables, allow overlapping assignments, and
define the semantics.
Alternative D was selected for RED. When an overlapping assignment occurs, the
semantics is that of a componentwise assignment where it is guaranteed that no
target component is modified before it has been used as a source component (i.e.,
the final value of the target is the same as the initial value of the source).
For example, assume that w has been declared as ARRAY INT(1..10) OF lNT(1..10)
and that w(i) has the value i for i from 1 through 10. Then the assignment
w(1..4) := w(2..5)
will cause the first M elements of w to become 2,3,4, and 5; and the assignment
w(7..10) := w(6..9)
will cause the last 4 elements to become 6,7,8,and 9.
The reasons for choosing Alternative D over the other alternatives will now be summarized.
Alternative A would comply with SM3-3E and would simplify some aspects of the
implementation, but it has potential drawbacks with respect to efficiency and
readability. If subarrays cannot be used as variables, then assignment to a
subarray could only be realized through assignment of a catenation of subarrays to a
whole array, or through explicit iteration on subarray elements. For example, to
assign v(3..7) to w(3..7), where v has been appropriately declared, the user would
have either to perform the assignment
w := w(1..2) & v(3..7) & w(8..10)
which is less efficient (and more obscure) than the assignment
w(3..7) := v(3..7)
or to explicitly generate the iteration
FOR i:INT(3..7) REPEAT w(i) := v(i); END REPEAT;
which is also more obscure and likely to be less efficient than the direct subarray
assignment (since the latter can use the target machine's block transfer instruction
if one exists). Note also that using an explicit iteration defeats the purpose of
having a subarray selection at all in such cases. For these reasons and for
uniformity with array component selection, we chose to reject A1 and thus to allow
assignment to subarrays. In particular, subarrays can be selected as variables via
the subarray selection form, consistent with conventional usage.
With the decision to allow subarrays as variables comes the problem of
assigning overlapping arrays. One alternative is to make such assignments illegal
(Alternative B). However, this would be overly pre-emptive,
since in many cases overlapping
assignments are useful. Thus Alternative B was rejected and the issue now became
the semantics of overlapping assignment. The problem is that with the most direct
and efficient implementation, differences in behavior can occur depending on whether
the low or high end of the target is overlapped. For example, assume that the
implementation has a block transfer instruction which takes two addresses and a
count -- L1, L2, C -- whose runtime effect is to transfer the contents of the element
at L1 to L2, from L1+1 to L2+1, and so on up to L1+C-1 to L2+C-1 in that order. In
the case of the assignment
w(1..4) := w(2..5)
this would work as expected. However, in the case
w(7..10) := w(6..9);
the effect would be to replicate the single value w(6) four times, assigning this
value to w(7), w(8), w(9), and w(10).
The most efficient scheme is for the language not to guarantee the semantics in
such cases (Alternative C), since in general run-time code is needed to check the
direction of the overlap. However, such an implementation dependency would not be
acceptable (e.g., it is in violation of SM 11F and 13A), Alternative C was therefore rejected,
and Alternative D adopted.
3.6.2 RECORD TYPES
A data item having a record type is a possibly heterogeneous collection of
components, each of which is selectable via a "dot qualification" form with the
field name as selector. For example, if x is declared as follows:
VAR x : RECORD[a:BOOL, b:INT(O..10), c:STRING[ASCII](5)];
then x.b is a variable of subtype INT(O..10). The type of x is
RECORD[a:BOOL, b:INT, c:STRING[ASCII]];
i.e., the field names and their order are
significant in determining type (see Section 3.2.5 for further discussion).
The Pascal notion of "variant" records is not part of the RED record facility,
but can be easily achieved by embedding a union inside a record. See Section 3.6.3
for a discussion of union types.
The Steelman requirement for non-assignable record components (SM 3-3F) is met
by RED through selection functions (see also Section 9.3).
For example, suppose the
user wanted to declare a variable x with the following structure:
- two variable INT(O..10) fields named c1 and c2,
- a constant BOOL field named b which is set only when the whole variable x is
the target of an assignment, and
- an INT field (not necessarily physically present) named s whose value is the
sum of the c1 and c2 fields.
The declarations below carry out this purpose:
As desired, x.c1 and x.c2 may only be used as constants. Note that the values of
x.b and x.s can be changed by an assignment to x. If the user wants to define
record components which are constant over the lifetime of the data item, such fields
can be declared as subtype constraints. For example, with the type declaration
TYPE t1(b: BOOL) : RECORD[c1, c2: INT(O..10)];
VAR x1: t1(TRUE);
the value of x1.b will be TRUE regardless of the value of the components c1 and c2
of x1.
3.6.3 UNION TYPES
Introduction
The union type specification has its origins in the desire to reuse the same
storage area for objects of different types. For example, FORTRAN realizes unions
via its EQUIVALENCE statement, COBOL via its REDEFINES and RENAMES clauses, PL/l via
the DEFINED attribute, and TACPOL via the CELL structure. In each of these cases,
however, the concept realized is that of "free union", and there is no protection
against defining a data object to be of one data type and then using it (via the
name of the overlaid object) as though it were of another type.
The RED language provides a "discriminated" as opposed to "free" union
facility, which enables implementation by storage overlaying but which includes
restrictions on the usage of union items so that type safety is not compromised and
so that type checking can still be carried out at translation time. The facility
provided is in compliance with Steelman requirements in this area (SM 3-3G and
3-3H).
With a union type, the user can define data structures which have different
compositions at different times during program elaboration. For example, the
structure of a message might depend on whether or not it is classified:
At any point in time, msg has two components: a constant field called the tag whose
value is an enumeration element of a type derived from the union's field names:
ENUM['unclassified, 'classified]
and a component which may be selected via dot selection using the field name
reflected in the tag.
It is frequently useful to interrogate the value of the tag field through a
CASE statement:
Note that the tag field can only be changed by an assignment to the entire object
(SM 3-3H). Thus, an attempt to assign
msg.TAG := 'unclassified;
will be recognized as a translation-time error. It is a run-time error (which
raises the X_TAG exception) if an attempt is made to select a component not present
as indicated by the value of the tag field. For example, if the user creates msg as
classified:
msg := [classified: [authorization: 255, data: make_string]];
and then attempts to select msg.unclassified, the X_TAG exception will be raised.
As is the case with exceptions in general, efficiency may be increased (at the
expense of reliability) if detection of the exception is suppressed through the
SUPPRESS pragmat.
Constraining the Tag
It is possible to restrict a variable having a union type so that it can only
have a given tag value. This is accomplished by the specification of the desired
tag as a subtype constraint for the variable. For example, let message_type be an
abbreviation for the union type shown earlier:
With the following variable declarations:
VAR msg : message_type;
VAR u_msg : message_type('unc1assified);
VAR c_msg : message_type('classified);
the variable msg can have either the classified or the unclassified alternative
component, whereas u_msg can have only the latter and c_msg only the former. Thus
the assignments:
msg := u_msg;
msg := c_msg;
are always permitted; the assignment
u_msg := msg;
will be permitted only if msg.TAG = 'unclassified (8 run-time check in general); and
the assignment
u_msg := c_msg;
is detected as illegal at translation-time. The ability to constrain the tag of a
union variable increases efficiency since only as much storage need be reserved for
the variable as is required by the specified alternative.
Variant Records
An alternative to unions is a "variant record" mechanism as used in Pascal. we
rejected this approach because it unnecessarily combines two independent notions --
that of a record, which contains all components denoted in the type, and that of a
union, which contains exactly one of its components. A variant record is achieved
in RED by specifying a union type for a component of a record. (Note that in RED
there may be several components with union types.) For example
where the message_type is the union type defined earlier.
Union Subtypes
Another approach to unions is to capture the tag concept through the type
parameterization facility -- e.g., to define the tag as a subtype property. As a
general rule, however, this is not appropriate. One of the basic principles of the
RED type mechanism is that the subtype of a data item cannot change during the
lifetime of the item. However, the tag of a union variable may change through
assignment to the entire variable. Thus the tag should not be considered a subtype
property. The tag range, though, should be so considered, for those cases where the
user wishes explicitly to constrain the variable so that its tag cannot change. The
approach adopted by RED to meet these objectives is to treat union types in a
fashion analogous to enumeration types. That is, an enumeration type has subtype
constraints for MIN and MAX values of its range. Given the declaration
ABBREV e_type : ENUM['a,'b,'c,'d,'e];
the user may declare e_type variables with or without an explicit specification of
these constraints:
VAR e1 : e_type('b..'d);
VAR e2 : e_type; % subtype is e_type('a..'e) by implication
Similarly, a union type has a tag range as subtype constraint which may or may not
be explicitly specified for variable declarations
Thus it is the tag range which is checked on union assignments; if the target tag has
not been constrained, then it may be changed by an assignment to the whole data
item. If the target tag has been constrained, then only a source which has the
given tag value can be assigned to the item.
Component Selection
Union component selection raises a number of issues in connection with safety,
efficiency, and language simplicity. Consider the u_type variable u1 declared
above. Safety considerations dictate that whenever either the value of u1.a is
accessed or an attempt is made to assign to u1.a, a check should be made that u1.TAG
= 'a. This is necessary so that the language's strong type checking is not
compromised.
The language might attempt to constrain the uses of union selection to minimize
the run-time overhead involved in performing such checks. For example, consider
occurrences of union selection within the branches of a CASE statement which
discriminates on the basis of the tag field of the union variable:
It might seem safe for the translator to omit the tag checking on the case branches
(e.g., to assume that u1 has tag 'a in the 'a branch), but in general this is not
possible. For example, the 'a branch might be:
WHEN 'a => p; write(u1.a);
where the procedure p imports u1 and assigns it a new value with tag 'b. To avoid
such a situation, and to enable union selection on CASE branches to omit the tag
check, the language could adopt one of several possible approaches.
- Define the CASE statement so that the tag of a union variable is "frozen"
during elaboration of the branches. In the above case, p would be in error
when it attempted to assign a value to u1 with a tag other than 'a.
- Make a local copy of u1 upon entry to each branch, with the tag of the copy
constrained to the indicated tag value. At normal termination of the
branch, the value of the copy is assigned back to the original variable. ln
the above case, the assignment to u1 by p would not affect the local copy in
the case branch and would be nullified by the copy-back on normal
termination of the branch.
The main problem with A is that it introduces ad hoc semantics for the CASE
statement when tag discrimination is performed. Moreover, "freezing" the tag of a
variable effectively changes the subtype of the variable, in conflict with the
language's conventions regarding subtypes, and it interacts with exception handling
through the need to "unfreeze" the variable's tag upon abnormal termination from the
branch. Further, this approach does not completely solve the problem of run-time
overhead, since it now induces a tag check when an assignment is made to the entire
variable (e.g, externally in the procedure p).
Alternative B has similar deficiencies with respect to language simplicity.
As with A, specialized semantics are created for the CASE statement when
discriminating union tags. In addition, there is the resulting inefficiency of
copying to and then from the local variable in the branch every time a branch is
elaborated. If the item is large, the copying overhead can easily outweigh any
savings obtained from the avoidance of tag checking.
As a result, RED does not attempt to solve the tag checking safety/efficiency
problem through special interpretations of the CASE statement. Instead, it is the
translator's responsibility to decide whether tag checking can safely be omitted.
In practice, this optimization will frequently be applicable. Moreover, the user
may explicitly suppress tag checking through a pragmat, with the understanding that
safety is thereby compromised.
3.6.4 STRING TYPES
The RED language allows the user to declare character strings with either the
predeclared ASCII character set or user-defined character sets. Moreover, just as
an enumeration literal may belong to several enumeration types, so a string literal
I may belong to several string types. For example, suppose that the user has declared
ebcdic as an ABBREV for an enumeration type corresponding to the appropriate
character set. Ebcdic strings can then be declared and manipulated via the standard
string operations (catenation, comparison, relationals, component selection,
substring selection). For example,
If the output device supports the ebcdic character set, the string AXY¢E is
produced.
String types conceptually result from an instantiation of a generic type:
The only properties of strings which are not definable through the generic mechanism
are the string literal form (i.e., the user writes "ABC" instead of [l:'A, 2:'B,
3:'C]), and the substring selector (s(2..4) is a STRING[ebcdic](3) and not an ARRAY
1NT(2..4) OF ebcdic).
The RED string facility satisfies the Steelman requirements for string literals
(SM 2H). It is a richer facility than that required by
SM 3-2D, since
considerations of language applicability caused us to reject the alternative of
allowing user-defined character sets without providing for their convenient use in
strings.
Further discussion relating to character sets and strings may be found in
Sections 2.2.4, 2.2.5, and 3.5.4. An example showing how "varying length" strings
can be realized in RED appears in Section 9.1.1.
3.6.5 SET TYPES
RED provides for set types with respect to INT and enumeration subtypes. For
example, with the enumeration type switch_type as defined below:
the user can declare a variable which represents any of the 16 possible switch
settings ('switch_a present or absent, ... 'switch_d present or absent):
VAR switch_set : SET[switch_type];
switch_set := ['switch_b, 'switch_d];
At this point elements 'switch_b and 'switch_d are present in switch_set, whereas
'switch_a and 'switch_c are absent. Note than switch_set can be represented in 4
bits (one per potential element), and the set operations can be implemented
efficiently via logical operations supported on the target machine.
The RED set mechanism satisfies Steelman requirements SM 3-4A and 3-4B,
supplying operations for equality, subset, membership, complement, intersection,
union, symmetric difference, and assignment. Note that the semantics of the
operations provided is oriented around the abstract properties of sets as opposed to
the underlying properties of boolean arrays. For example, one determines if
'switch_a is in switch_set by invoking 'switch_a IN switch_set, and not by
subscripting (the attempted selection switch_set('switch_a) is a translation time
error). Subarray selection and oatenation are not defined for sets, since they are
not appropriate for such data.
A useful programming convention is to iterate over a set, performing an
appropriate action for each element present. This is easily done in RED; for
example, to invoke procedure p on each element in switch_set the user can write the
following:
The insertion of elements into a set is accomplished by assignment and union.
For example, the assignment
switch_set := switch_set OR ['switch_a];
adds element 'switch_a to switch_set if it is not already present.
As with strings, sets can be regarded conceptually as though they were defined
through the generic mechanism:
GENERIC st: SUBTYPE(INT, ENUM)
TYPE SET[st] : ARRAY st OF BOOL;
with the operations declared likewise as generic. For example,
The only behavioral property of set types which cannot be defined through generics
is the constructor form.
An example of the use of set types is Warshall's algorithm [Wa62] for computing
the transitive closure of a binary relation: