. navigate
 Navigate
3. Types (Part 2) left arrow
x
right arrow 3. Types (Part 4)
Red Rationale
3.

TYPES
(Part 3)

Aggregate Types     Array     Array Bounds     Subarray Selection     Record
Union     Intro     Constraining the Tag     Variant Records     Union Subtypes     Component Selection     String     Set    



3.   TYPES (Part 3)


3.6  AGGREGATE TYPES


3.6.1  ARRAY TYPES


Array Bounds as Subtype Property

Our objective in the design of the array type facility was to provide a simple, efficient mechanism that both satisfies the Steelman requirements (3-3A through 3-3F) and meshes well with the language's type/subtype and type equivalence conventions. What distinguishes arrays in RED from arrays in languages such as Pascal are, first, that size determination may be deferred until data allocation time, and second, that routines may be defined to accept arrays of different sizes ( at different invocations. Both properties are implicit in RED's type/subtype rules. An array type is defined by the types of its indices and the type of its component. For example, with the declaration

      VAR x: ARRAY INT(m..n) OF FLOAT(5, -1.0E7 .. 1.0E7);
      VAR y: ARRAY INT(1..10) OF FLOAT(4, 0.0 .. 1.0 );

both x and y are of the single type ARRAY INT OF FLOAT. (As is the case uniformly in RED, the type is derived by removing "round bracketed" -- i.e., subtype -- information.) Since the bounds of the index are subtype constraints, they may be specified at run time, as in the above declaration of x. Moreover, since formal parameters to routines may be declared with a type (rather than a subtype) specification, it is straightforward to declare a routine which takes arrays of different sizes at different times. A procedure p could be declared

      PROC p(VAR a: ARRAY INT 0F FLOAT); ... END PROC p;

and invoked by both p(x) and p(Y).

It may be noted that an array indexed by integers is of a different type than an array indexed by an enumeration type. For example, z declared as follows

      VAR z: ARRAY ENUM['a,'b,'c] OF FLOAT(4, 0.0 .. 1.0);

has type ARRAY ENUM['a,'b,'c] OF FLOAT, which is different from the type of x and y. Having the index type as part of the array type is not required by Steelman (SM 3-3D), but this choice is compatible with the language's rules for deriving types from subtypes and, further, avoids an anomalous situation in which the type of the index of a formal array parameter could vary from one invocation of the routine to another.


Subarray Selection

RED permits the user to select and catenate subarrays of one-dimensional arrays (e.g., y(1..4)) in compliance with Steelman (SM 3-3E). A number of alternative approaches were considered in this area, with the major issues concerning whether such subarrays should be allowed to be used as variables and, if so, whether "overlapping" assignment should be allowed. The following alternatives were examined:

  1. Restrict subarrays to be used as constants only.

  2. Allow subarrays to be used as variables, but prohibit overlapping assignments (such as y(1..4) := y(3..6), or, in general, y(low1..high1) := y(low2..high2) when low1<low2<=high1 or low2<low1<=high2).

  3. Allow subarrays to be used as variables, allow overlapping assignments, but leave the semantics undefined in such cases.

  4. Allow subarrays to be used as variables, allow overlapping assignments, and define the semantics.

Alternative D was selected for RED. When an overlapping assignment occurs, the semantics is that of a componentwise assignment where it is guaranteed that no target component is modified before it has been used as a source component (i.e., the final value of the target is the same as the initial value of the source).

For example, assume that w has been declared as ARRAY INT(1..10) OF lNT(1..10) and that w(i) has the value i for i from 1 through 10. Then the assignment

      w(1..4) := w(2..5)

will cause the first M elements of w to become 2,3,4, and 5; and the assignment

      w(7..10) := w(6..9)

will cause the last 4 elements to become 6,7,8,and 9.

The reasons for choosing Alternative D over the other alternatives will now be summarized.

Alternative A would comply with SM3-3E and would simplify some aspects of the implementation, but it has potential drawbacks with respect to efficiency and readability. If subarrays cannot be used as variables, then assignment to a subarray could only be realized through assignment of a catenation of subarrays to a whole array, or through explicit iteration on subarray elements. For example, to assign v(3..7) to w(3..7), where v has been appropriately declared, the user would have either to perform the assignment

      w := w(1..2) & v(3..7) & w(8..10)

which is less efficient (and more obscure) than the assignment

      w(3..7) := v(3..7)

or to explicitly generate the iteration

      FOR i:INT(3..7) REPEAT w(i) := v(i); END REPEAT;

which is also more obscure and likely to be less efficient than the direct subarray assignment (since the latter can use the target machine's block transfer instruction if one exists). Note also that using an explicit iteration defeats the purpose of having a subarray selection at all in such cases. For these reasons and for uniformity with array component selection, we chose to reject A1 and thus to allow assignment to subarrays. In particular, subarrays can be selected as variables via the subarray selection form, consistent with conventional usage.

With the decision to allow subarrays as variables comes the problem of assigning overlapping arrays. One alternative is to make such assignments illegal (Alternative B). However, this would be overly pre-emptive, since in many cases overlapping assignments are useful. Thus Alternative B was rejected and the issue now became the semantics of overlapping assignment. The problem is that with the most direct and efficient implementation, differences in behavior can occur depending on whether the low or high end of the target is overlapped. For example, assume that the implementation has a block transfer instruction which takes two addresses and a count -- L1, L2, C -- whose runtime effect is to transfer the contents of the element at L1 to L2, from L1+1 to L2+1, and so on up to L1+C-1 to L2+C-1 in that order. In the case of the assignment

      w(1..4) := w(2..5)

this would work as expected. However, in the case

      w(7..10) := w(6..9);

the effect would be to replicate the single value w(6) four times, assigning this value to w(7), w(8), w(9), and w(10).

The most efficient scheme is for the language not to guarantee the semantics in such cases (Alternative C), since in general run-time code is needed to check the direction of the overlap. However, such an implementation dependency would not be acceptable (e.g., it is in violation of SM 11F and 13A), Alternative C was therefore rejected, and Alternative D adopted.




3.6.2  RECORD TYPES

A data item having a record type is a possibly heterogeneous collection of components, each of which is selectable via a "dot qualification" form with the field name as selector. For example, if x is declared as follows:

      VAR x : RECORD[a:BOOL, b:INT(O..10), c:STRING[ASCII](5)];

then x.b is a variable of subtype INT(O..10). The type of x is

      RECORD[a:BOOL, b:INT, c:STRING[ASCII]];

i.e., the field names and their order are significant in determining type (see Section 3.2.5 for further discussion). The Pascal notion of "variant" records is not part of the RED record facility, but can be easily achieved by embedding a union inside a record. See Section 3.6.3 for a discussion of union types. The Steelman requirement for non-assignable record components (SM 3-3F) is met by RED through selection functions (see also Section 9.3). For example, suppose the user wanted to declare a variable x with the following structure:

  1. two variable INT(O..10) fields named c1 and c2,

  2. a constant BOOL field named b which is set only when the whole variable x is the target of an assignment, and

  3. an INT field (not necessarily physically present) named s whose value is the sum of the c1 and c2 fields.

The declarations below carry out this purpose:

example

As desired, x.c1 and x.c2 may only be used as constants. Note that the values of x.b and x.s can be changed by an assignment to x. If the user wants to define record components which are constant over the lifetime of the data item, such fields can be declared as subtype constraints. For example, with the type declaration

      TYPE t1(b: BOOL) : RECORD[c1, c2: INT(O..10)];
      VAR x1: t1(TRUE);

the value of x1.b will be TRUE regardless of the value of the components c1 and c2 of x1.




3.6.3  UNION TYPES


Introduction

The union type specification has its origins in the desire to reuse the same storage area for objects of different types. For example, FORTRAN realizes unions via its EQUIVALENCE statement, COBOL via its REDEFINES and RENAMES clauses, PL/l via the DEFINED attribute, and TACPOL via the CELL structure. In each of these cases, however, the concept realized is that of "free union", and there is no protection against defining a data object to be of one data type and then using it (via the name of the overlaid object) as though it were of another type.

The RED language provides a "discriminated" as opposed to "free" union facility, which enables implementation by storage overlaying but which includes restrictions on the usage of union items so that type safety is not compromised and so that type checking can still be carried out at translation time. The facility provided is in compliance with Steelman requirements in this area (SM 3-3G and 3-3H).

With a union type, the user can define data structures which have different compositions at different times during program elaboration. For example, the structure of a message might depend on whether or not it is classified:

example

At any point in time, msg has two components: a constant field called the tag whose value is an enumeration element of a type derived from the union's field names:

      ENUM['unclassified, 'classified]

and a component which may be selected via dot selection using the field name reflected in the tag.

It is frequently useful to interrogate the value of the tag field through a CASE statement:

example

Note that the tag field can only be changed by an assignment to the entire object (SM 3-3H). Thus, an attempt to assign

      msg.TAG := 'unclassified;

will be recognized as a translation-time error. It is a run-time error (which raises the X_TAG exception) if an attempt is made to select a component not present as indicated by the value of the tag field. For example, if the user creates msg as classified:

      msg := [classified: [authorization: 255, data: make_string]];

and then attempts to select msg.unclassified, the X_TAG exception will be raised. As is the case with exceptions in general, efficiency may be increased (at the expense of reliability) if detection of the exception is suppressed through the SUPPRESS pragmat.




Constraining the Tag

It is possible to restrict a variable having a union type so that it can only have a given tag value. This is accomplished by the specification of the desired tag as a subtype constraint for the variable. For example, let message_type be an abbreviation for the union type shown earlier:

example

With the following variable declarations:

      VAR msg : message_type;
      VAR u_msg : message_type('unc1assified);
      VAR c_msg : message_type('classified);

the variable msg can have either the classified or the unclassified alternative component, whereas u_msg can have only the latter and c_msg only the former. Thus the assignments:

      msg := u_msg;
      msg := c_msg;

are always permitted; the assignment

      u_msg := msg;

will be permitted only if msg.TAG = 'unclassified (8 run-time check in general); and the assignment

      u_msg := c_msg;

is detected as illegal at translation-time. The ability to constrain the tag of a union variable increases efficiency since only as much storage need be reserved for the variable as is required by the specified alternative.




Variant Records

An alternative to unions is a "variant record" mechanism as used in Pascal. we rejected this approach because it unnecessarily combines two independent notions -- that of a record, which contains all components denoted in the type, and that of a union, which contains exactly one of its components. A variant record is achieved in RED by specifying a union type for a component of a record. (Note that in RED there may be several components with union types.) For example

example

where the message_type is the union type defined earlier.




Union Subtypes

Another approach to unions is to capture the tag concept through the type parameterization facility -- e.g., to define the tag as a subtype property. As a general rule, however, this is not appropriate. One of the basic principles of the RED type mechanism is that the subtype of a data item cannot change during the lifetime of the item. However, the tag of a union variable may change through assignment to the entire variable. Thus the tag should not be considered a subtype property. The tag range, though, should be so considered, for those cases where the user wishes explicitly to constrain the variable so that its tag cannot change. The approach adopted by RED to meet these objectives is to treat union types in a fashion analogous to enumeration types. That is, an enumeration type has subtype constraints for MIN and MAX values of its range. Given the declaration

      ABBREV e_type : ENUM['a,'b,'c,'d,'e];

the user may declare e_type variables with or without an explicit specification of these constraints:

      VAR e1 : e_type('b..'d);
      VAR e2 : e_type; % subtype is e_type('a..'e) by implication

Similarly, a union type has a tag range as subtype constraint which may or may not be explicitly specified for variable declarations

example

Thus it is the tag range which is checked on union assignments; if the target tag has not been constrained, then it may be changed by an assignment to the whole data item. If the target tag has been constrained, then only a source which has the given tag value can be assigned to the item.




Component Selection

Union component selection raises a number of issues in connection with safety, efficiency, and language simplicity. Consider the u_type variable u1 declared above. Safety considerations dictate that whenever either the value of u1.a is accessed or an attempt is made to assign to u1.a, a check should be made that u1.TAG = 'a. This is necessary so that the language's strong type checking is not compromised.

The language might attempt to constrain the uses of union selection to minimize the run-time overhead involved in performing such checks. For example, consider occurrences of union selection within the branches of a CASE statement which discriminates on the basis of the tag field of the union variable:

example

It might seem safe for the translator to omit the tag checking on the case branches (e.g., to assume that u1 has tag 'a in the 'a branch), but in general this is not possible. For example, the 'a branch might be:

      WHEN 'a => p; write(u1.a);

where the procedure p imports u1 and assigns it a new value with tag 'b. To avoid such a situation, and to enable union selection on CASE branches to omit the tag check, the language could adopt one of several possible approaches.

  1. Define the CASE statement so that the tag of a union variable is "frozen" during elaboration of the branches. In the above case, p would be in error when it attempted to assign a value to u1 with a tag other than 'a.

  2. Make a local copy of u1 upon entry to each branch, with the tag of the copy constrained to the indicated tag value. At normal termination of the branch, the value of the copy is assigned back to the original variable. ln the above case, the assignment to u1 by p would not affect the local copy in the case branch and would be nullified by the copy-back on normal termination of the branch.

The main problem with A is that it introduces ad hoc semantics for the CASE statement when tag discrimination is performed. Moreover, "freezing" the tag of a variable effectively changes the subtype of the variable, in conflict with the language's conventions regarding subtypes, and it interacts with exception handling through the need to "unfreeze" the variable's tag upon abnormal termination from the branch. Further, this approach does not completely solve the problem of run-time overhead, since it now induces a tag check when an assignment is made to the entire variable (e.g, externally in the procedure p).

Alternative B has similar deficiencies with respect to language simplicity. As with A, specialized semantics are created for the CASE statement when discriminating union tags. In addition, there is the resulting inefficiency of copying to and then from the local variable in the branch every time a branch is elaborated. If the item is large, the copying overhead can easily outweigh any savings obtained from the avoidance of tag checking.

As a result, RED does not attempt to solve the tag checking safety/efficiency problem through special interpretations of the CASE statement. Instead, it is the translator's responsibility to decide whether tag checking can safely be omitted. In practice, this optimization will frequently be applicable. Moreover, the user may explicitly suppress tag checking through a pragmat, with the understanding that safety is thereby compromised.




3.6.4  STRING TYPES

The RED language allows the user to declare character strings with either the predeclared ASCII character set or user-defined character sets. Moreover, just as an enumeration literal may belong to several enumeration types, so a string literal I may belong to several string types. For example, suppose that the user has declared ebcdic as an ABBREV for an enumeration type corresponding to the appropriate character set. Ebcdic strings can then be declared and manipulated via the standard string operations (catenation, comparison, relationals, component selection, substring selection). For example,

example

If the output device supports the ebcdic character set, the string AXY¢E is produced.

String types conceptually result from an instantiation of a generic type:

example

The only properties of strings which are not definable through the generic mechanism are the string literal form (i.e., the user writes "ABC" instead of [l:'A,  2:'B,   3:'C]), and the substring selector (s(2..4) is a STRING[ebcdic](3) and not an ARRAY 1NT(2..4) OF ebcdic).

The RED string facility satisfies the Steelman requirements for string literals (SM 2H). It is a richer facility than that required by SM 3-2D, since considerations of language applicability caused us to reject the alternative of allowing user-defined character sets without providing for their convenient use in strings.

Further discussion relating to character sets and strings may be found in Sections 2.2.4, 2.2.5, and 3.5.4. An example showing how "varying length" strings can be realized in RED appears in Section 9.1.1.




3.6.5  SET TYPES

RED provides for set types with respect to INT and enumeration subtypes. For example, with the enumeration type switch_type as defined below:

example

the user can declare a variable which represents any of the 16 possible switch settings ('switch_a present or absent, ... 'switch_d present or absent):

      VAR switch_set : SET[switch_type];
      switch_set := ['switch_b, 'switch_d];

At this point elements 'switch_b and 'switch_d are present in switch_set, whereas 'switch_a and 'switch_c are absent. Note than switch_set can be represented in 4 bits (one per potential element), and the set operations can be implemented efficiently via logical operations supported on the target machine.

The RED set mechanism satisfies Steelman requirements SM 3-4A and 3-4B, supplying operations for equality, subset, membership, complement, intersection, union, symmetric difference, and assignment. Note that the semantics of the operations provided is oriented around the abstract properties of sets as opposed to the underlying properties of boolean arrays. For example, one determines if 'switch_a is in switch_set by invoking 'switch_a IN switch_set, and not by subscripting (the attempted selection switch_set('switch_a) is a translation time error). Subarray selection and oatenation are not defined for sets, since they are not appropriate for such data.

A useful programming convention is to iterate over a set, performing an appropriate action for each element present. This is easily done in RED; for example, to invoke procedure p on each element in switch_set the user can write the following:

example

The insertion of elements into a set is accomplished by assignment and union. For example, the assignment

      switch_set := switch_set OR ['switch_a];

adds element 'switch_a to switch_set if it is not already present.

As with strings, sets can be regarded conceptually as though they were defined through the generic mechanism:

      GENERIC st: SUBTYPE(INT, ENUM)
          TYPE SET[st] : ARRAY st OF BOOL;

with the operations declared likewise as generic. For example,

example

The only behavioral property of set types which cannot be defined through generics is the constructor form.

An example of the use of set types is Warshall's algorithm [Wa62] for computing the transitive closure of a binary relation:

example






Aggregate Types     Array     Array Bounds     Subarray Selection     Record
Union     Intro     Constraining the Tag     Variant Records     Union Subtypes     Component Selection     String     Set    

3. Types (Part 2) left arrow
x
right arrow 3. Types (Part 4)


Overview

Requirements
      Strawman
      Woodenman
      Tinman
      Ironman
      Steelman

RED Reference
RED Rationale

Types in RED
Time/Life Computer Languages
Memories

Site Index

Overview             Reference ToC             Rationale ToC             Site Index


Home   Favorites   Map

IME logo Copyright © 2009, Mary S. Van Deusen