Red Rationale
3.   TYPES (Part 2)


3.3  TYPE OPACITY

Type definitions serve a variety of purposes in programs. When some of these purposes conflict, the language design must provide reasonable solutions. A case in point is the issue of "type opacity"; in this section we describe the underlying problems and the RED approach to solving them.


3.3.1  INTRODUCTION

The main issues connected with type opacity may be summarized fairly readily. When an identifier is declared as a type, does it simply serve as a shorthand notation for the underlying type used in its declaration, or does it denote a new type, distinct from its underlying type? If the latter, are the behavioral properties of the underlying type (e.g., literal/constructor forms, component selectors, predefined and/or user-defined routines) automatically extended so that they apply to the declared type?

The following terms are used throughout the discussion. If a type declaration introduces a shorthand notation, this will be called the abbreviation approach. Alternatively, if a type declaration causes a new type to be created, this will be called the new-type approach. In the latter case, a new type is said to be transparent with respect to a behavioral property of the underlying type if this property is automatically extended to the new type and is said to be opaque otherwise.

RED uses both the abbreviation and new-type approaches. New direct types are transparent with respect to selector, assignment, and equality operations and opaque with respect to all others. The remainder of this section provides a justification for this strategy and describes the interactions of these features with other parts of the language. Section 3.7 will discuss type opacity issues related to indirect types.




3.3.2  ABBREVIATION APPROACH

One of the ways in which a type definition facility can be used is as a convenient notational shorthand. For example, suppose that the subtype FLOAT(8, -1.0 .. 1.0) is required for a large number of data declarations. It does not help either the reading, the writing, or the maintaining of the program to repeat the specification FLOAT(8, -1.0 .. 1.0) at each declaration -- e.g., if the precision must be increased from 8 to 10, to do so on an individual basis would be tedious and error-prone. What is desired in this case is a means for introducing an identifier that serves as an abbreviation for a subtype specification, but which would not introduce a new type.

The abbreviation declaration in RED provides the facility just described. For example, the declaration

      ABBREV fraction : FLOAT(8, -1.0 .. 1.0);

defines the name "fraction" as a subtype abbreviation, so that the user could subsequently declare

      VAR x : fraction;

and then use x as a FLOAT variable with the subtype constraints specified in the declaration of fraction.

Consistent with type declarations (see Section 3.1.1), abbreviation declarations may be parameterized, thus providing both language uniformity and a useful notational flexibility. For example, if the user wants to constrain a FLOAT range to, say, -1.0 to 1.0, but allow variation with respect to precision, the parameterized abbreviation declaration

      ABBREV fraction(p : INT) : FLOAT(p, -1.0 .. 1.0);

may be used as in the following examples:

      VAR x1 : fraction (8);
      VAR x2 : fraction (10);

Some readers may wonder why the keyword "ABBREV" was adopted instead of "TYPE" or "SUBTYPE", either of which might seem appropriate. The reason that "TYPE" was rejected for the abbreviation keyword is that it carries a connotation of creating a new type. In fact, the new-type declaration in RED is introduced by the keyword "TYPE". The reason that we decided against the keyword "SUBTYPE" in abbreviation declarations is that the term "subtype" is used with a specific meaning in RED -- a subtype is the result of invoking a type. In particular, to say that an object has a given subtype is to imply that all subtype constraints have been determined. ABBREViations, on the other hand, may be parameterized and hence are not subtypes. For example, fraction in the preceding paragraph is not a subtype; fraction must be invoked, as in fraction (8), to yield a subtype -- in this case, FLOAT(8, -1.0 .. 1.0).
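The abbreviation idea can be sketched outside RED as well. The following Python analogy (all names hypothetical) models a subtype as a plain value and a parameterized abbreviation as a function yielding a subtype -- no new type is created in either case:

```python
# Python analogy for RED's ABBREV: an abbreviation is just a name for a
# subtype, and a parameterized abbreviation is a function yielding a subtype.
from dataclasses import dataclass

@dataclass(frozen=True)
class FloatSubtype:        # stands in for FLOAT(p, lo .. hi)
    precision: int
    lo: float
    hi: float

def fraction(p: int) -> FloatSubtype:
    # ABBREV fraction(p : INT) : FLOAT(p, -1.0 .. 1.0);
    return FloatSubtype(p, -1.0, 1.0)

x1 = fraction(8)           # VAR x1 : fraction (8);
x2 = fraction(10)          # VAR x2 : fraction (10);

# Invoking the abbreviation yields an ordinary FLOAT subtype -- it is
# indistinguishable from writing the underlying subtype directly.
assert x1 == FloatSubtype(8, -1.0, 1.0)
```

The key point mirrored here is that fraction itself is not a subtype; only its invocation fraction(8) denotes one.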




3.3.3  NEW TYPE APPROACH


Mechanism for Realizing New Types

In contrast to the situation where an identifier is declared simply to denote an existing subtype, there is frequently the need to introduce new types in programs. Two of the main reasons for defining new types are to obtain efficiency and to achieve protection. For example, consider the type declaration

      TYPE access_key : INT(0 .. 255);

Since the bounds of the underlying subtype are fixed (and manifest), there is no need to store run-time "dope" information with objects of type access_key, nor is there a need for routines taking access_key parameters to perform run-time bounds checking on their arguments. From the point of view of protection, new types help the programmer distinguish those operations relevant to the new type from those which pertain to the underlying type. In the above example, it should be possible for the declaration of access_key to be replaced by

      TYPE access_key : ARRAY INT(1 .. 8) OF BOOL;

and affect only the definition of the (abstract) operations on access_key; i.e., no uses of these operations should have to be changed.
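The representation-change argument can be made concrete with a small Python sketch (hypothetical names): as long as clients use only the abstract operations, swapping the underlying representation of access_key from an integer to an array of booleans touches no client code.

```python
# Two representations of access_key behind the same abstract operation.
class AccessKeyInt:
    """access_key as INT(0 .. 255)."""
    def __init__(self, v: int):
        self._v = v
    def bit(self, i: int) -> bool:     # abstract operation on access keys
        return bool((self._v >> i) & 1)

class AccessKeyBits:
    """access_key as ARRAY INT(1 .. 8) OF BOOL."""
    def __init__(self, v: int):
        self._bits = [bool((v >> i) & 1) for i in range(8)]
    def bit(self, i: int) -> bool:
        return self._bits[i]

# Client code is representation-independent: only bit() is used.
for K in (AccessKeyInt, AccessKeyBits):
    k = K(0b10110010)
    assert [k.bit(i) for i in range(8)] == \
        [False, True, False, False, True, True, False, True]
```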

It is possible to achieve the semantics of the new-type approach through a combination of abbreviation and capsule declarations. In fact, this was essentially the method adopted by the BLUE language during Phase 1 [Go78]. For example, the user could define

example

with the rule that outside the capsule the INT operations would not be applicable to data declared as access_keys.

Although in principle the above method is valid, RED provides a different notation for new-type declarations than for abbreviations. To see why this was done, consider the following implications of the "encapsulated abbreviation" approach:

  1. Having two types treated as identical in some contexts (inside a capsule) but as different types in others (outside a capsule) would be confusing and cause complications in the semantics for overloading.

  2. If the user declared a capsule in all contexts where a new type was desired, there would be a proliferation of capsules in the program. Syntactically, this would be cumbersome. Moreover, since the capsule mechanism is relatively new in programming languages, we were unwilling to impose this as the only style with which new types could be obtained.

  3. Efficiency would be impaired. It is often the case that a new type is introduced for the purpose of pinning down the value of a subtype constraint for the underlying type; e.g., declaring a type t to be ARRAY INT(1..3) OF FLOAT(5, -1.0 .. 1.0) with the expectation that no "dope vector" information will be stored with data of type t. With the encapsulated abbreviation approach, such an optimization would be difficult to perform; inside the capsule, data declared to be t would have the same type as arbitrary ARRAY INT OF FLOAT data.

As a result of these considerations, RED provides the type declaration as the means for realizing the new-type approach. The issue then becomes whether new types should be transparent or opaque. For example, given the type declaration

      TYPE access_key : INT(0 .. 255);

do we have the INT behavior (literals, operators, conversion functions, user-defined routines with INT parameters, usage as index types) automatically extended (overloaded) so that they also apply to access_key data? In RED, the answer is "no": new types are transparent only with respect to selector, assignment, and equality operations and are opaque with respect to all others. The reasons for our decision will now be explained.


Problems with Complete Transparency

We should first point out that complete transparency does not imply that the new type is just an abbreviation. For example, access_key could be transparent with respect to all INT behavioral properties yet be a different type from INT: cross assignments between access_key and INT variables would be prohibited, and routines defined to take access_key parameters could not be passed INT arguments.

The main reasons for rejecting the complete transparency approach to new types are, first, that it provides default behavior for new types which is in many cases inappropriate and, second, that confusion arises from the interaction between this approach and overloading.

To see the first problem, consider that access_key as a transparent type would have the full battery of arithmetic operations provided; e.g., K1 * K2 would yield an access_key, where K1 and K2 are access_key data, and IFLOAT(K) would convert access_key K to type FLOAT. In some cases -- in particular, when the new type is to be an extension of an existing one -- the provision of such defaults might be appropriate. More frequently, however, a new type is intended to restrict rather than extend the behavioral properties of an existing type; the default inheritance of the underlying behavior would be inappropriate and error-prone. Although these problems could be suppressed outside the capsule in which the new type is declared (by not exporting the inappropriate routines from this capsule), they would still arise within the capsule.

The second problem arises from an interaction between transparency and overloading. Let us look more closely at what INT operations would be overloaded for type access_key. Presumably "+" would take two access_keys and return an access_key (a run-time check being performed to guarantee that the result is in range). Unary "-" would apply to an access_key and raise an exception whenever its actual parameter is non-zero. Suppose the user defined a function F which takes a formal parameter with subtype INT(m .. n) and returns a result of the same subtype. If F is automatically overloaded so as to take and return an access_key, how are the subtype bounds m and n used (note that they might be run-time computable)? Does the implicit overloading apply only to those INT routines which are known at the point where the type declaration of access_key occurs, or does it also occur in inner scopes when the user there declares a new routine with an INT parameter? If the latter alternative is chosen, does the redefinition (in an inner scope) of an outer INT routine implicitly redefine there the corresponding access_key routine?

While answers might be formulated to all of these questions, the issues serve to illustrate that language complexity would result if new types are defined to be transparent with respect to the behavioral properties of the underlying type. This complexity, coupled with the desire to minimize defaults in the language, caused us to reject the "complete transparency" approach to type definition.


Problems with Complete Opacity

In the case of complete opacity, no behavioral properties of the underlying type are inherited by the new type. Thus, in the following examples:

      TYPE access_key : INT(0 .. 255);

      TYPE vector : ARRAY INT(1 .. 3) OF FLOAT(5, -1.0E7 .. 1.0E7);

none of the INT behavior automatically applies to access_key data, and none of the properties of ARRAY INT OF FLOAT automatically apply to vector data; all that is available in either case is the ".ALL" selector. In the first example this is not a problem, since the user can overload explicitly any INT routines which are to apply to access_key and can define a "constructor" function to resolve INT literals to type access_key. In the second example, however, there is no way to overload the array selector operations that would permit a subscripted vector variable to be used as the target of an assignment. Thus with the declaration

      VAR v : vector;

the user would have to write

      v.ALL(i) := 0.0;

instead of the more traditional and convenient

      v(i) := 0.0;

To allow such familiar notation, the RED language provides for new types to be transparent with respect to selector operations when the underlying type is an aggregate (e.g., array, record, or union). If the selectors are to be available outside the capsule in which the type is declared, they must be explicitly exported -- consistent with the language's rules for all other encapsulated properties.

In order to comply with SM 5F ("Assignment ... shall be automatically defined for each variable"), the RED language provides an assignment operation for each new (direct) type in terms of the assignment operation for the underlying type. In cases where different semantics are desired, assignment can be defined by the user (see Section 9.1). If the type is to be "non-assignable", the programmer can realize this effect simply by not exporting assignment from the capsule where the type is declared.

New types in RED are transparent with respect to equality as well as assignment. Although this conflicts with SM 3-3C, our decision was motivated by considerations of language uniformity. Either assignment and equality should both be automatically provided, or neither should be, since the abstract properties of these two operations are interrelated.
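The resulting mixture -- transparent for equality and the underlying view, opaque for everything else -- can be sketched in Python (a hypothetical analogy, not RED semantics):

```python
# Python analogy for RED's partial transparency: a new type over
# INT(0 .. 255) that inherits equality and a ".ALL" view of its underlying
# value, but none of the INT arithmetic.
class AccessKey:
    def __init__(self, v: int):
        if not 0 <= v <= 255:          # subtype constraint INT(0 .. 255)
            raise ValueError(v)
        self._v = v

    # transparent: equality is defined in terms of the underlying type
    def __eq__(self, other):
        return isinstance(other, AccessKey) and self._v == other._v

    # transparent: the ".ALL" selector exposes the underlying INT view
    @property
    def ALL(self) -> int:
        return self._v

    # opaque: no __add__, __mul__, ... -- INT arithmetic is not inherited

assert AccessKey(63) == AccessKey(63)
assert AccessKey(63).ALL + 1 == 64     # arithmetic only via the .ALL view
try:
    AccessKey(1) + AccessKey(2)        # INT "+" does not carry over
    assert False
except TypeError:
    pass
```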




Conversion Between New and Underlying Types

RED provides a simple and efficient mechanism for converting between a new type and its underlying type (SM 3B); viz., a selector notation (".ALL") which implies sharing rather than copying. For example, with the declaration

      VAR k : access_key;

the object k.ALL is a variable with subtype INT(0 .. 255). Thus

      k.ALL := 63;

converts the INT 63 to type access_key, and

      VAR i : INT(0 .. 255);
      i := k.ALL;

converts k to type INT.

The export of the ".ALL" selector from capsules is allowed (again, consistent with the language rules for export of encapsulated properties). This is useful in contexts where a type is introduced for efficiency rather than protection considerations, or where the intent of a capsule is to define conversion operations between two types, each of which is defined in a separate inner capsule. In the second case, each inner capsule could define and export its individual type and the ".ALL" selector, while the outer capsule defines the conversion operators (taking advantage of the ".ALL") and then exports the two types together with the conversions. The ".ALL" selectors would not be exported from the outer capsule.

The user can define an explicit constructor operation which takes a value from the underlying type and converts it to the new type. For example, the declaration

example

would enable the user to write statements such as

example

An alternative to the ".ALL" conversion scheme is to automatically provide "copying" conversion operations between the new and underlying types. For example, the language could automatically supply a "#" operator as an explicit conversion from the underlying to the new type whenever a new type is declared, thus saving the user the effort of defining such a function. The main reason for not doing this is that in many cases the effect of the conversion depends on the user's intent, especially when the new type has subtype attributes. Rather than attempt to draw the fine line required between those cases where a language-defined conversion would be appropriate and those where the user would have to define one explicitly, we opted for the simpler (and safer) position that a "copying" conversion from underlying type to new type be the user's responsibility to define. As an example, consider the type declaration

example

This type might be useful where arbitrarily long strings are expected, only the first 5 characters of which are to be retained, but whose original lengths are relevant. The problem with trying to provide automatically a conversion from STRING[ASCII] to t are that first, the destination subtype could not be uniquely determined from the source string (i.e., "ABCDE" could be converted to t(5), t(6), etc.), and second, the semantics of the conversion must be user-specified when the source string has length greater than 5 (whether an exception is raised, whether the first 5 characters are used, etc.).
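A user-defined conversion resolves both ambiguities at once, because the user fixes the destination subtype and the overflow semantics. Here is one possible reading in Python (hypothetical names; the choice to truncate silently rather than raise is the user's, which is exactly the point):

```python
# User-defined conversion for a type that keeps at most the first 5
# characters of a string but remembers the original length.
from dataclasses import dataclass

@dataclass
class T:
    text: str           # at most 5 characters retained
    original_len: int   # the source string's full length

def to_t(s: str) -> T:
    # the user decides: truncate (no exception) and record the length
    return T(s[:5], len(s))

v = to_t("ABCDEFG")
assert v.text == "ABCDE" and v.original_len == 7
```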

It should be noted that programmer definition of conversion functions allows the use of constructor notations which are independent of the underlying type. For example, a capsule which defines the data type complex could export a conversion operator which takes a RECORD[real_part, imag_part : FLOAT] and constructs a complex. Thus, outside the capsule one can write

      VAR z : complex;
      z := [real_part: 0.5, imag_part: 0.75] # complex;

with no commitment that the underlying type is that used in the constructor. Inside the capsule the type complex might be defined (via polar coordinates) as

      TYPE complex : RECORD[angle, magnitude : FLOAT(5, -1.0 .. 1.0)]

even though the conversion function takes as its input a record in Cartesian coordinates. Thus a change in the underlying type (prompted, e.g., by efficiency considerations) need not impact the use of conversion functions outside the encapsulation.
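The decoupling of constructor notation from representation can be sketched as follows (a Python analogy with hypothetical names): the exported constructor accepts Cartesian parts, while the "capsule" is free to store polar coordinates internally.

```python
import math

class Complex:
    """Internal representation: polar coordinates."""
    def __init__(self, angle: float, magnitude: float):
        self.angle, self.magnitude = angle, magnitude

def make_complex(real_part: float, imag_part: float) -> Complex:
    # the exported conversion: Cartesian record in, polar representation out
    return Complex(math.atan2(imag_part, real_part),
                   math.hypot(real_part, imag_part))

z = make_complex(0.5, 0.75)
# Clients never observe whether the representation is polar or Cartesian;
# switching Complex to store (re, im) would change only make_complex.
assert math.isclose(z.magnitude * math.cos(z.angle), 0.5)
assert math.isclose(z.magnitude * math.sin(z.angle), 0.75)
```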




3.3.4  LITERALS AND CONSTRUCTORS

One of the most difficult issues in the design of a strongly-typed language with rich data definition facilities is the choice of the conventions for literals and constructors. Tradeoffs must be made between language simplicity and notational convenience, and interaction with other language rules -- such as type opacity and overloading of routines -- must be considered. The issue is: given the occurrence of a literal or constructor in a program, what is its type, and what is its subtype?

Two extreme positions are possible. The conservative approach requires that the subtype (and thus also the type) of each literal and constructor be determinable independent of the context of its use. This determination would have to be made either by the form of the literal or constructor -- e.g., TRUE always has type (and subtype) BOOL -- or by explicit resolution -- e.g., 'A#ebcdic resolves the literal 'A to type ebcdic.

At the other extreme, the liberal approach assumes that literals and constructors are intrinsically ambiguous with respect to type and subtype, allows literal and constructor forms to be inherited from an underlying type when a new type is declared, and permits unresolved literals and constructors whenever the type (or subtype, if required) can be determined from the context of their use. With this approach, the following examples would be legitimate:

example

The conservative approach achieves simplicity of language rules, but at a considerable price in notational convenience. For example, the literal NIL could never be used without explicit resolution in any scope where more than one indirect type is known:

example

With NIL (and with enumeration literals appearing in more than one type) the problem is type ambiguity. An even more severe problem notationally is that of subtype ambiguity, especially in the case of floating point literals. What is the subtype (in particular, the precision) of, say, the literal 1.23? In languages such as FORTRAN, where there are only as many different floating point types as there are different actual precisions on the target machine, this is not really an issue. Even though only three decimal digits appear in the literal, a single-precision approximation will be guaranteed. However, such an implementation dependency is not satisfactory, and in the absence of an explicit resolution of precision the conservative approach must define an apparent precision for each floating point literal. Counting the digits which follow the leading 0's in the literal is a reasonable method; thus the literal 1.23 will be represented with at least 3 decimal digits of precision. It may be noted that the technique of defining and then using constants set to floating point literals expanded to extremely high precision will not work well under the conservative approach, since the computation of expressions referencing such constants would be carried out at the maximal target precision.
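The digit-counting rule just described can be sketched directly. The following Python helper is our reading of the text, not an official algorithm:

```python
# "Count the digits which follow the leading 0's" -- a sketch of the rule
# for the apparent precision of a floating point literal.
def apparent_precision(literal: str) -> int:
    digits = [c for c in literal if c.isdigit()]
    # leading zeros carry no precision information
    while digits and digits[0] == "0":
        digits.pop(0)
    return len(digits)

assert apparent_precision("1.23") == 3    # at least 3 decimal digits
assert apparent_precision("0.0042") == 2  # leading zeros do not count
```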

The liberal approach gives the user a fair amount of notational flexibility, but it is not simple to state the circumstances under which the context can suffice to resolve the (sub)type of a literal or constructor. For example, the rules for overloading might have to be extended so that an overloaded function could be selected on the basis of the result type required in a given context. Consider as illustration the type angle, where the user has overloaded the "+" operator so that the sum of two angles yields an angle (modulo 2 times pi):

example

The issue here is the interpretation of pi+4.12, for which two equally valid possibilities arise.

  1. With the "bottom-up" approach, the manifest FLOAT sum is computed (i.e., pi and 4.12 are treated as FLOAT), with 7.26159 as a result. An attempt to interpret this value as an angle fails (since it is out of the range 0.0 .. 2.0*pi), and the assignment a := pi+4.12 is flagged as erroneous.

  2. With the "top down" approach, a target type for the sum is determined -- viz., angle, the type of the variable on the left side of the assignment. Thus an overloading of "+" is sought for which the result type is angle. The only candidate is the user-defined "+" function whose two formal parameters have type angle. Thus an attempt is made to interpret pi and 4.12 individually as angles. This succeeds, and the translator compiles a call of this user-defined "+" function with the two given operands. At run-time, a will be assigned the value 0.97841 (qua angle).

To the reader of the program it is not obvious which of the above two interpretations is valid.
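The two interpretations can be checked numerically. The sketch below assumes (as in the example above) that the user-defined angle "+" reduces its result modulo 2*pi:

```python
import math

two_pi = 2 * math.pi

def angle_add(x: float, y: float) -> float:
    # the user-defined "+" on type angle: sum reduced modulo 2*pi
    return (x + y) % two_pi

# Bottom-up: compute the FLOAT sum first, then range-check it as an angle.
s = math.pi + 4.12
assert round(s, 5) == 7.26159
assert s > two_pi          # out of range 0.0 .. 2*pi: assignment rejected

# Top-down: both operands are read as angles, so the modular "+" applies.
a = angle_add(math.pi, 4.12)
assert round(a, 5) == 0.97841
```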

The RED language strikes a compromise between the conservative and liberal approaches. In the interest of program readability, we are closer to the conservative position; as a result, explicit type resolution is required in some contexts where the liberal approach would have allowed bare literals or constructors. We can summarize the RED conventions as follows:

  1. Opacity of new types regarding literals and constructors
    Literals and constructors are not inherited by new types. Thus if one wanted to assign the constant pi to angle a, the attempted assignment

          a := pi;

    would be illegal. Instead, the user should define "#" as a conversion from FLOAT to angle and then write

          a := pi#angle;

    Thus the assignment

          a := pi+4.12;

    would be illegal and have to be written

          a := (pi+4.12)#angle;

    to realize the "bottom-up" approach, or

          a := pi#angle + 4.12#angle;

    to realize the "top-down" approach. It should be noted that the non-inheritance of literals is simply part of the language's type opacity rules (see Section 3.3.3).


  2. Type resolution of ambiguous literals.
    Uniform conventions apply to the literal NIL and to an element appearing in more than one enumeration type. If the routine to which the literal or enumeration element is passed is overloaded so that different versions of the routine could be selected under different type interpretations of NIL or the enumeration element, then an explicit resolution of the type must appear. For example, if the user defines a character set ebcdic as an ABBREV for an enumeration type, and the element 'A is in this set, then it is legitimate to write

          VAR e : ebcdic;
          e := 'A;

    since the assignment routine for ebcdic requires an ebcdic source value. However, it is not correct to write PRED('A) since the PRED function is defined for both ASCII and ebcdic; the user would have to write either PRED('A#ASCII) or PRED('A#ebcdic). Note that it is permissible to resolve a literal even if the context does not require it.

          VAR a : ASCII;
          a := 'A#ASCII;

    is always legal.


  3. Type resolution of ambiguous constructors
    Some constructor forms may be ambiguous, either because they contain ambiguous literals or because they can apply to different aggregate types. For example, [a : 10] could have any union type with an INT field named "a". The rules for such ambiguous constructors are identical to those in (2) for NIL and ambiguous enumeration elements.


  4. Subtype resolution of literals and constructors
    Since routines can interrogate subtype attributes of their formal parameters, the issue of the subtype of a literal or constructor passed as an actual parameter is relevant. If a literal or constructor is passed as an actual parameter, and the formal parameter is declared with a type (rather than a subtype), then the user may either explicitly specify the subtype via a resolution form or pass the bare literal or constructor and obtain the subtype determined by the language rules for the particular literal or constructor.




3.4  PREDEFINED TYPES


3.4.1  BOOL TYPE

The primary issue with respect to the Boolean type is the set of operations which are provided and, in particular, the role of "short-circuited" conjunction and disjunction. Steelman (SM 3-2C and 6D) requires a type for Boolean values and "infix control operations for short circuit conjunction and disjunction of the controlling Boolean expression in conditional and iterative control structures." (A "short-circuited" operation is one whose operands are guaranteed to be elaborated in left to right order, and where only as many operands are elaborated as are necessary to determine the result of the operation.)

RED provides the BOOL type and defines a unary operator for complement (NOT) and binary operators for conjunction (AND), disjunction (OR), exclusive disjunction (XOR), and comparison (=, /=). AND and OR are conditional (i.e., short-circuited).

The following alternative approaches were considered but rejected:

  A. Define the Boolean type with none of the above operators and, instead, introduce the logical operators as part of the syntax for assertions and IF and REPEAT statements.

  B. Provide operators for non-conditional (as well as conditional) conjunction and disjunction.

The main rationale for A is that since Boolean expressions are most frequently used to determine control flow, programs may be easier to understand if the control flow dependencies are localized to the control structures themselves, instead of appearing (implicitly) in the form of Boolean variables to which controlling expressions have been assigned elsewhere. That is, the intent of

      IF (I < J) AND (J < K) THEN ... END IF;

is clearer than that of

      VAR flag : BOOL;
      ...
      flag := (I < J) AND (J < K);
      ...
      IF flag THEN ... END IF;

However, the anomaly of a type with no operators would be potentially confusing; Boolean operators would appear to be present in the language, yet their use would be restricted to only certain contexts. Moreover, the definition of "predicate functions" -- functions which return Boolean values -- would be difficult and error-prone to write, since Boolean expressions would be disallowed as the source of an assignment. Further, A represents an attempt to over-dictate programming style via language restrictions. Some embedded applications require the manipulation of Boolean values as data and not simply to guide control flow, and the adoption of A would seriously impair the programming of such applications.

Although Alternative A was rejected, the programmer can still adhere to the style which A would have set. An analogy with the GOTO statement is relevant here. Although the RED language provides the GOTO statement (required by SM 6G), its usage is expected to be minimal in light of the availability of more structured control constructs. Similarly, although Boolean expressions are allowed outside conditional and iterative control structures, readability considerations imply that in most circumstances such usage should be minimized.

As to the issue of which operators to provide for the BOOL type, there are arguments in favor of supplying more operators than have been defined in RED (e.g., non-conditional conjunction and disjunction as mentioned in Alternative B) and arguments in favor of supplying fewer (e.g., inequality).

The exclusion of non-conditional conjunction and disjunction was motivated by the desire for language simplicity. In environments where non-conditional elaboration is more efficient than conditional elaboration (e.g., where both operands can be elaborated concurrently, as on a "pipeline" machine architecture) the user (or an implementation library) can define functions with the non-conditional elaboration semantics.
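The difference between the built-in conditional operators and such a user-defined non-conditional function can be demonstrated in Python, whose "and" is short-circuited while function arguments are always evaluated before the call:

```python
# Short-circuit "and" vs. a user-defined non-conditional conjunction.
def strict_and(x: bool, y: bool) -> bool:
    # both arguments were already elaborated before the call -- no short circuit
    return x and y

calls = []
def probe(v: bool) -> bool:
    calls.append(v)     # record which operands actually get elaborated
    return v

# short-circuit: the right operand is never elaborated
calls.clear()
_ = probe(False) and probe(True)
assert calls == [False]

# non-conditional: both operands are elaborated, left to right
calls.clear()
_ = strict_and(probe(False), probe(True))
assert calls == [False, True]
```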

Although the inequality operator is redundant, since it computes the same result as XOR, any attempt to eliminate this redundancy would result in non-uniform language rules:

  1. If the XOR and equality operations were provided for BOOL but inequality was not, this would violate the language rule that the definition of the equality operator automatically implies the existence of the inequality operator.

  2. If the XOR operation was provided for BOOL but neither of the comparison operations were, BOOL would be the only one of the basic built-in types not to have these operations.

  3. If the comparison operations were provided for BOOL but XOR was not, this would violate the uniformity between SET and BOOL operations (exclusive disjunction is required for sets by SM 3-4B).
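The redundancy in question is easy to verify exhaustively: over the two Boolean values, inequality and exclusive disjunction compute the same function.

```python
# For Booleans, "/=" and XOR coincide: check all four operand combinations.
for p in (False, True):
    for q in (False, True):
        assert (p != q) == (p ^ q)
```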




3.4.2  INT TYPE

RED provides the INT type with MIN and MAX attributes to satisfy the Steelman requirements for integer arithmetic (SM 3-1B,  3-1F,  3-1H). Since subtype constraints must always be given on variable declarations, RED automatically complies with the Steelman requirement that variable ranges be explicitly specified by the user, rather than implicitly determined by the implementation (SM 3-1C). Thus, the attempted declaration

      VAR x : INT;

is incorrect, and the user must instead supply a declaration such as

      VAR x : INT(m .. n);

with explicit range information. If it is necessary frequently to declare INT variables whose range is the maximum supported on the target machine, this can be done via an ABBREViation:

      ABBREV integer : INT(LOW_INT .. HIGH_INT);
      VAR x : integer;

A standard set of operations applies to INT operands: unary and binary + and -; binary *, **, MOD, and DIV; comparison and relational operators; absolute value (ABS); and conversion to floating point (IFLOAT).

The main issue concerning the INT type is one of implementation when the target machine supports arithmetic for different integer lengths (e.g., halfword, word, doubleword), especially since the determination of a range for an INT might not be known until runtime. A detailed discussion of this issue is found in Section 3.1.2.




3.4.3  FLOAT TYPE


Alternative Approaches

Among the problems that must be faced in the design of a language intended for a variety of target machines, one of the most vexing is the choice of the language facilities for approximate arithmetic. There is no obvious solution, and tradeoffs must be made among program portability, language simplicity, and run-time efficiency.

The design choices considered were the following:

  A. Provide a set of floating point types, each corresponding to an actual precision supported on the particular target machine.

  B. Provide a set of floating point types, each corresponding to a specified precision chosen by the user.

  C. Provide a single floating point type, with the programmer required to supply a specified precision at the declaration of each floating point variable.

The RED language has adopted the third alternative, (C). The main reason for this choice is that it comes closest to meeting the Steelman; there was not a clear enough advantage to either of the other two alternatives to justify deviating from the DoD requirements.

In the remainder of this section we shall look at the three choices with respect to the following criteria:

  1. Satisfying the Steelman requirements

  2. Portability

  3. Simplicity

  4. Efficiency


Satisfying Steelman

There are strong implications in Steelman that precision should not distinguish type (i.e., that there should be a single type for approximate arithmetic):

  1. Steelman 3D states: "The constraints that characterize subtypes shall include range [and] precision...." That is, just as integer objects of different ranges have the same type, so floating point objects of different precisions shall have the same type.

  2. Steelman 3B states: "There shall be no implicit conversions between types" and Steelman 3-1D states: "Explicit conversions shall not be required between precisions." If precision distinguished type, one of these requirements would have to be violated.

Both (A) and (B) are in violation of these requirements, whereas (C) is in compliance.


Portability

(A) clearly entails a sacrifice with regard to portability, since differences in word lengths imply that a variable declared to be single precision for one target machine (e.g., the CDC 6600) should be declared to be double precision for another (e.g., the IBM 360).

(B) and (C) facilitate program portability, since the user specifies the precision for floating point variables explicitly in terms of the minimum number of decimal digits required in the mantissa. If the user declares a variable to have precision 10, this would be implemented as single-precision on some machines, and double-precision on others.


Simplicity

(C) is somewhat simpler than the other alternatives, since the existence of several floating point types (with (A) or (B)) implies a need for special rules to handle mixed-precision arithmetic or explicit conversions. With (C) the approach is straightforward: binary FLOAT operations can take operands of different precisions, with the result having the greater of the two precisions. No explicit conversions are required between FLOAT data of different precisions.
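The mixed-precision rule can be sketched as follows, using the declaration forms shown elsewhere in this section:

      VAR a : FLOAT(5, 0.0 .. 100.0);
      VAR b : FLOAT(10, 0.0 .. 100.0);
      VAR c : FLOAT(10, 0.0 .. 10000.0);

      c := a * b;

The product a * b is carried to the greater of the two specified precisions (here, 10 digits), and no explicit conversion appears anywhere in the assignment.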

Literals are handled more smoothly with (C) than with (A) or (B). With (C), a simple rule can be formulated for the precision of a literal (e.g., count the number of digits after all leading 0's), and the programmer may assign the literal to variables with either less or greater specified precision. For example, it is useful to be able to declare a constant which is equal to pi expanded to some large precision, and then use this constant in computations independent of the precisions of the other operands. This is easily done with (C):

      CONST pi := 3.14159265359;
      VAR X : FLOAT(5, 0.0 .. 1.0) := SIN(pi/6.0);

With (A), the form of the literal must indicate the actual precision intended (e.g., the use of "E" vs. "D" in FORTRAN exponents). With (B), either an explicit conversion must be applied when the target has a different precision than the literal, or a special rule is needed to explain why such an assignment is legal.


Efficiency

Among the three alternatives, (A) is the simplest to implement efficiently. However, the characteristic of (A) which facilitates the production of good code is the same property which defeats portability; viz., the requirement that floating point variables be declared with their actual (rather than logical) precision.

There are no significant differences in efficiency between (B) and (C), since the same implementation techniques may be applied to both approaches. In particular, by regarding a routine with a formal FLOAT parameter as though it were generic with respect to specified precision, the translator can obtain the effect of (B) even though the language has adopted (C). Moreover, through the generic facility the user may define a new type with a translation-time constraint corresponding to the specified precision attribute of the underlying FLOAT type and thus realize the semantics of (B) via (C). A detailed discussion of these issues is found in Section 3.1.2.


Conclusions

RED provides a single floating point data type, FLOAT, and requires the user to supply a manifest specified precision whenever a FLOAT variable is declared. Precision serves as a "subtype" rather than a "type" property. This approach comes closer to meeting the Steelman requirements than the other two alternatives considered, is as good or better with respect to the goals of portability and simplicity, and permits the generation of efficient object code.




3.4.4  FIXED-POINT ARITHMETIC

Section 3-1F of the Steelman specifies a fixed point facility which provides exact numeric values. Such a concept is appropriate when dealing with quantized objects for which the natural unit is not one quantum. The classical example of such objects is money, where the natural unit is dollars while the quantum size is one hundredth of a dollar. As another example, measurements for graphics devices are naturally expressed in inches, but the quantum unit depends upon the particular device and might be tenths, hundredths, or thousandths of an inch.

This fixed point concept presents some difficulties, but they are not insurmountable. The requirement of "no implicit truncation or rounding" (3-1F) means that arithmetic operations must be performed so as to maintain all low order bits, with the resulting possibility of an overflow exception. The only real problem posed by the requirements is that arbitrary scale factors may be used (3-1G) and that "operations for exact arithmetic shall apply between arbitrary scales." These requirements imply scaling rules which are sometimes counter-intuitive. For instance, if X is measured in units of 1/1000 and Y is measured in units of 1/1001, then X+Y must be measured in units of 1/1001000. If both X and Y are in the range (-1,1), they can easily be represented in a mini-computer's sixteen-bit word (or twelve-bit word, for that matter). Nonetheless, X+Y requires 21 bits to accommodate the range (-1,1) without sacrificing low order bits. Thus, although addition reduces accuracy, the sum of two single precision numbers may require double precision! Depending upon the language design, the compiler will either generate very inefficient code or accept the likelihood of a run-time overflow.
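The 21-bit figure follows from a short computation. The common scale for exact addition must divide both operand scales, and since 1000 and 1001 are relatively prime:

      common scale  =  1/(1000 * 1001)  =  1/1001000
      magnitude     <  1001000 quanta, where 2**19 < 1001000 <= 2**20
      bits required =  20 (magnitude) + 1 (sign)  =  21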

Although it is possible to write inefficient code using an exact fixed point facility, one can also write very reasonable and efficient programs. Furthermore, the compiler can often warn the user about unexpected inefficiencies. The question is not so much whether it is possible to meet the Steelman requirements, but whether a complying language will fulfill the needs of the military community.

Through the several reviews of the Ada design project, we have become increasingly aware of, first, a concern over the Steelman requirements for fixed point and, second, the absence of a DoD consensus on what such requirements should be. As a result, the RED language does not define a fixed point facility; instead, it is our intention to integrate such a facility into the language when the DoD requirements in this area are clarified. See Appendix A for a discussion of the main issues surrounding fixed point arithmetic and some suggestions for a viable approach.




3.5  ENUMERATION TYPES


3.5.1  INTRODUCTION

Many applications use sets of constant data values which are most conveniently denoted by mnemonic names. For example, a program that processes messages of potentially different security clearances will be easier to read and maintain if the clearances are represented by names such as "secret" and "confidential" rather than by numeric encodings. Although languages such as FORTRAN allow the user to define variables preset to the desired encodings, program reliability suffers in three ways:

  1. The preset variables may be inadvertently modified during execution (i.e., there is no guarantee that they remain constant).

  2. The preset variables may be used arbitrarily as integer data; e.g., it would be legitimate to take the sum "secret+confidential", which should be an error.

  3. The programmer must explicitly encode the representations for the data (e.g., "secret" being preset to 1, "confidential" to 2, etc.), which in almost all cases is irrelevant to the program's behavior.

Languages such as CMS-2, JOVIAL, and Pascal have solved these problems by providing enumeration (or "status") types, with which the user can directly specify a set of mnemonic names to denote constant data values. Such a facility is required by Steelman 3-2A and 3-2B, and is part of the RED type definition scheme.

In short, an enumeration type in RED comprises an ordered list of enumeration elements. By placing range constraints on an enumeration type, the user can obtain a subtype. Successor (SUCC), predecessor (PRED), and ordinal position (POS) functions are provided for each enumeration type, together with the binary comparison and ordering operators (=, /=, <, <=, >, >=). Enumeration types may be used as index types for arrays and repeat-variable declarations, and may appear in SET and STRING specifications. As an example:

example
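The example itself is not reproduced here; declarations of the following kind would exercise the features just listed (the ENUM[...] form, the 'name literal syntax, and the BOOL type name are assumptions extrapolated from the prose of this section):

      ABBREV clearance : ENUM['unclassified, 'confidential, 'secret];
      VAR c : clearance := 'confidential;
      VAR next : clearance := SUCC(c);
      VAR low : BOOL := c < 'secret;

Since clearance is introduced with ABBREV rather than as a new type, the ordering operators and the SUCC, PRED, and POS functions of the underlying enumeration apply to it directly.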

The main issues affecting the design of the enumeration type mechanism in RED concern meshing the facility with the language's rules for type equivalence, type opacity, and namescopes, and the realization of character sets as required by Steelman 3-2D.




3.5.2  ENUMERATION ELEMENTS

In Pascal, enumeration elements are treated as scoped identifiers; i.e., given the Pascal declaration

      TYPE vehicle = (tank, ship, plane);

the translator treats the identifiers tank, ship, and plane as though they were declared as constants in the scope immediately containing the type declaration. This approach is undesirable in RED for the following reasons.

  1. Enumeration types must be allowed to "overlap" -- i.e., the same element may appear in several different enumeration types in the same scope -- in order that character sets be realizable as required by Steelman 3-2D. This precludes treating enumeration elements as implicitly declared and scoped constants.

  2. RED is much richer in namescopes than Pascal -- e.g., the branches of an if statement are namescopes in RED. As a result, the scope of declaration for enumeration elements would be somewhat harder to perceive.

  3. The Pascal approach has the same disadvantages as the "single instance generator" approach to type equivalence (see Section 3.2.4).

In light of these considerations, we adopted the convention that enumeration elements have the properties of literals rather than scoped identifiers. A consequence of this choice is that, since it is possible for enumeration types to overlap, the type of an enumeration literal will not always be immediately determinable from the literal itself. (This is analogous to the situation with NIL, a literal belonging to any indirect type.) Thus it will sometimes be necessary to "disambiguate" an enumeration element; in particular, when it is passed as an argument to an overloaded routine for which more than one match could be found based on the possible types to which the element belongs. For example:

example
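This example is likewise missing; the following sketch suggests the situation described (all syntax beyond the forms already shown is an assumption, and binding the literal to a typed constant is only one possible way of disambiguating it):

      ABBREV primary : ENUM['red, 'yellow, 'blue];
      ABBREV stoplight : ENUM['red, 'green];

      CONST r : primary := 'red;
      display(r);

If display is overloaded for both primary and stoplight operands, the bare call display('red) is ambiguous; declaring r with an explicit type resolves the element 'red to primary before the call.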




3.5.3  ORDERED VS UNORDERED ENUMERATIONS

Enumeration types in RED are ordered, as required by SM 3-3B. However, it is straightforward to obtain unordered enumeration types through the declaration of new types. As an example, suppose that the programmer intends color to have the semantics of an unordered enumeration of 'red, 'yellow, 'green, and 'blue. The following declarations will serve this purpose:

example
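The declarations referred to might have resembled the sketch below (the NEW form for introducing a distinct type is an assumption based on the "new direct types" terminology of Section 3.3):

      TYPE color : NEW ENUM['red, 'yellow, 'green, 'blue];

A user-defined conversion function #[color], taking an element of the underlying enumeration to a color value, completes the declarations; its body is omitted here.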

Color is an unordered type because of the language's type opacity conventions: the ordering operations defined for the underlying enumeration type are not automatically overloaded for color. Since literals, similarly, are not extended to new types, a conversion function #[color] is defined. Thus, if c1 and c2 have been declared of type color, the following usages are legal:

example
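Assignment and equality are transparent for new types, so usages of the following kind are among those the example would show, given c1 and c2 of type color (the BOOL type name is an assumption):

      c1 := #[color]('red);
      c2 := c1;
      VAR same : BOOL := c1 = c2;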

whereas the following are illegal:

example
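Conversely, since neither the literals nor the ordering operations of the underlying enumeration are extended to color, usages such as these are rejected (the BOOL type name is an assumption):

      c1 := 'red;
      VAR less : BOOL := c1 < c2;
      c1 := SUCC(c2);

The first fails because 'red is not a literal of type color; the others fail because the ordering operators and the SUCC, PRED, and POS functions are not automatically overloaded for the new type.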




3.5.4  REALIZATION OF CHARACTER SETS

Character sets are achieved in a straightforward fashion in RED via enumeration types. Since ordering operations are desired for character sets, ABBREViation declarations will typically be used; the definition of the built-in ASCII character set is given in Section 2.2.4.

***********

[1] The actual precision of a floating point value is the number of decimal digits in its (normalized) mantissa.

[2] The specified precision of a floating point variable is the minimum number of decimal digits needed in its (normalized) mantissa.





