3. TYPES (Part 2)
3.3 TYPE OPACITY
Type definitions serve a variety of purposes in programs. When some of these
purposes conflict, the language design must provide reasonable solutions. A case in
point is the issue of "type opacity"; in this section we describe the underlying
problems and the RED approach to solving them.
3.3.1 INTRODUCTION
The main issues connected with type opacity may be summarized fairly readily.
When an identifier is declared as a type, does it simply serve as a shorthand
notation for the underlying type used in its declaration, or does it denote a new
type, distinct from its underlying type? If the latter, are the behavioral
properties of the underlying type (e.g., literal/constructor forms, component
selectors, predefined and/or user-defined routines) automatically extended so that
they apply to the declared type?
The following terms are used throughout the discussion. If a type declaration
introduces a shorthand notation, this will be called the abbreviation approach.
Alternatively, if a type declaration causes a new type to be created, this will be
called the new-type approach. In the latter case, a new type is said to be
transparent with respect to a behavioral property of the underlying type if this
property is automatically extended to the new type and is said to be opaque
otherwise.
RED uses both the abbreviation and new type approaches. New direct types are
transparent with respect to selector, assignment, and equality operations and opaque
with respect to all others. The remainder of this section provides a justification
for this strategy and describes the interactions of these features with other parts
of the language. Section 3.7 will discuss
type opacity issues related to indirect types.
3.3.2 ABBREVIATION APPROACH
One of the ways in which a type definition facility can be used is as a
convenient notational shorthand. For example, suppose that the subtype
FLOAT(8, -1.0 .. 1.0) is required for a large number of data declarations. It does
not help either the reading, the writing, or the maintaining of the program to
repeat the specification FLOAT(8, -1.0 .. 1.0) at each declaration -- e.g., if the
precision must be increased from 8 to 10, to do so on an individual basis would be
tedious and error-prone. What is desired in this case is a means for introducing an
identifier that serves as an abbreviation for a subtype specification, but which
would not introduce a new type.
The abbreviation declaration in RED provides the facility just described. For
example, the declaration
ABBREV fraction : FLOAT(8, -1.0 .. 1.0);
defines the name "fraction" as a subtype abbreviation, so that the user could
subsequently declare
VAR x : fraction;
and then use x as a FLOAT variable with the subtype constraints specified in the
declaration of fraction.
Consistent with type declarations (see Section 3.1.1), abbreviation
declarations may be parameterized, thus providing both language uniformity and a
useful notational flexibility. For example, if the user wants to constrain a FLOAT
range to, say, -1.0 to 1.0, but allow variation with respect to precision, the
parameterized abbreviation declaration
ABBREV fraction(p : INT) : FLOAT(p, -1.0 .. 1.0);
may be used as in the following examples:
VAR x1 : fraction (8);
VAR x2 : fraction (10);
Some readers may wonder why the keyword "ABBREV" was adopted instead of "TYPE"
or "SUBTYPE", either of which might seem appropriate. The reason that "TYPE" was
rejected for the abbreviation keyword is that it carries a connotation of creating a
new type. In fact, the new-type declaration in RED is introduced by the keyword
"TYPE". The reason that we decided against the keyword "SUBTYPE" in abbreviation
declarations is that the term "subtype" is used with a specific meaning in RED -- a
V subtype is the result of invoking a type. In particular, to say that an object has
a given subtype is to imply that all subtype constraints have been determined.
ABBREViations, on the other hand, may be parameterized and hence are not subtypes.
For example, fraction in the preceding paragraph is not a subtype; fraction must be
invoked, as in fraction (8), to yield a subtype -- in this case, FLOAT(8, -1.0 .. 1.0).
3.3.3 NEW TYPE APPROACH
Mechanism for Realizing New Types
In contrast to the situation where an identifier is declared simply to denote
an existing subtype, there is frequently the need to introduce new types in
programs. Two of the main reasons for defining new types are to obtain efficiency
and to achieve pfcfecgicg. For example, consider the type declaration
TYPE access_key : INT(0 .. 255)
Since the bounds of the underlying subtype are fixed (and manifest), there is no
need to store run-time "dope" information with objects of type access_key, nor is
there a need for routines taking access_key parameters to perform run-time bounds
checking on their arguments. From the point of view of protection, new types help
the programmer distinguish those operations relevant to the new type from those
which pertain to the underlying type. In the above example, it should be possible
for the declaration of access_key to be replaced by
TYPE access_key : ARRAY INT(1 .. 8) OF BOOL
and affect only the definition of the (abstract) operations on access_key; i.e., no
uses of these operations should have to be changed.
It is possible to achieve the semantics of the new-type approach through a
combination of abbreviation and capsule declarations. In fact, this was essentially
the method adopted by the BLUE language during Phase 1 [Go78]. For example, the
user could define
with the rule that outside the capsule the INT operations would not be applicable to
data declared as access_keys.
Although in principle the above method is valid, RED provides a different
notation for new-type declarations than for abbreviations. To see why this was
done, consider the following implications of the "encapsulated abbreviation"
approach:
- Having two types treated as identical in some contexts (inside a capsule)
but as different types in others (outside a capsule) would be confusing and
cause complications in the semantics for overloading.
- If the user declared a capsule in all contexts where a new type was desired,
there would be a proliferation of capsules in the program. Syntactically,
this would be cumbersome. Moreover, since the capsule mechanism is
relatively new in programming languages, we were unwilling to impose this as
the only style with which new types could be obtained.
- Efficiency would be impaired. It is often the case that a new type is
introduced for the purpose of pinning down the value of a subtype constraint
for the underlying type; e.g., declaring a type t to be
ARRAY INT(1..3) OF FLOAT(5, -1.0 .. 1.0) with the expectation that no "dope
vector" information will be stored with data of type t with the
encapsulated abbreviation approach, such an optimization would be difficult
to perform; inside the capsule, data declared to be t would have the same
type as arbitrary ARRAY INT OF FLOAT data.
As a result of these considerations, RED provides the type declaration as the
means for realizing the new-type approach. The issue then becomes whether new types
should be transparent or opaque. For example, given the type declaration
TYPE access_key : INT(0 .. 255);
do we have the INT behavior (literals, operators, conversion functions, user-defined
routines with INT parameters, usage as index types) automatically extended
(overloaded) so that they also apply to access_key data? In RED, the answer is
"no": new types are transparent only with respect to selector, assignment, and
equality operations and are opaque with respect to all others. The reasons for our
decision will now be explained.
Problems with Complete Transparency
We should first point out that complete transparency does not imply that the
new type is just an abbreviation. For example, access_key could be transparent with
respect to all INT behavioral properties yet be a different type from INT: cross
assignments between access_key and INT variables would be prohibited, and routines
defined to take access_key parameters could not be passed INT arguments.
The main reasons for rejecting the complete transparency approach to new types
are, first, that it provides default behavior for new types which in many cases are
inappropriate and, second, that confusion arises from the interaction between this
approach and overloading.
To see the first problem, consider that access_key as a transparent type would
have the full battery of arithmetic operations provided; e.g., K1 * K2 would yield
an access_key, where K1 and K2 are access_key data, and IFLOAT(K) would convert
access_key K to type FLOAT. In some cases -- in particular, when the new type is to
be an extension of an existing one -- the provision of such defaults might be
appropriate. More frequently, however, a new type is intended to restrict rather
than extend the behavioral properties of an existing type; the default inheritance
of the underlying behavior would be inappropriate and error-prone. Although these
problems could be suppressed outside the capsule in which the new type is declared
(by not exporting the inappropriate routines from this capsule), they would still
arise within the capsule.
The second problem arises from an interaction between transparency and
overloading. Let us look more closely at what INT operations would be overloaded
for type access_key. Presumably "+" would take two acoess_keys and return an
access_key (a run-time check being performed to guarantee that the result is in
range). Unary "-" would apply to an access_key and raise an exception whenever its
actual parameter is non-zero. Suppose the user defined a function F which takes a
formal parameter with subtype INT(m .. n) and returns a result of the same subtype.
If F is automatically overloaded so as to take and return an access_key, how are the
subtype bounds m and n used (note that they might be run-time computable)? Does the
implicit overloading apply only to those INT routines which are known at the point
where the type declaration of access_key occurs, or does it also occur in inner
scopes when the user there declares a new routine with an INT parameter? If the
latter alternative is chosen, does the redefinition (in an inner scope) of an outer
INT routine implicitly redefine there the corresponding access_key routine?
While answers might be formulated to all of these questions, the issues serve
to illustrate that language complexity would result if new types are defined to be
transparent with respect to the behavioral properties of the underlying type. This
complexity, coupled with the desire to minimize defaults in the language, caused us
to reject the "complete transparency" approach to type definition.
Problems with Complete Opacity
In the case of complete opacity, no behavioral properties of the underlying
type are inherited by the new type. Thus, in the following examples:
TYPE access_key : INT(0 .. 255);
TYPE vector : ARRAY INT(1 .. 3) OF FLOAT(5, -1.0E7 .. 1.0E7);
none of the INT behavior automatically applies to access_key data, and none of the
properties of ARRAY INT OF FLOAT automatically apply to vector data; all that is
available in either case is the ".ALL" selector. In the first example this is not a
problem, since the user can overload explicitly any INT routines which are to apply
to access_key and can define a "constructor" function to resolve INT literals to
type access_key. In the second example, however, there is no way to overload the
array selector operations that would permit a subscripted vector variable to be used
as the target of an assignment. Thus with the declaration
VAR v : vector;
the user would have to write
v.ALL(i) := 0.0;
instead of the more traditional and convenient
v(i) := 0.0;
To allow such familiar notation, the RED language provides for new types to be
transparent with respect to selector operations when the underlying type is an
aggregate (e.g., array, record, or union). If the selectors are to be available
outside the capsule in which the type is declared, they must be explicitly exported
-- consistent with the language's rules for all other encapsulated properties.
In order to comply with SM 5F ("Assignment ... shall be automatically defined
for each variable"), the RED language provides an assignment operation for each new
(direct) type in terms of the assignment operation for the underlying type. In
cases where different semantics are desired, assignment can be defined by the user
(see Section 9.1). If the type is to be "non-assignable", the programmer can
realize this effect simply by not exporting assignment from the capsule where the
type is declared.
New types in RED are transparent with respect to equality as well as
assignment. Although this conflicts with SM 3-3C, our decision was motivated by
considerations of language uniformity. Either assignment and equality should both
be automatically provided, or neither should be, since the abstract properties of
these two operations are interrelated.
Conversion Between New and Underlying Types
RED provides a simple and efficient mechanism for converting between a new type
and its underlying type (SM 3B); viz., a selector notation (".ALL") which implies
sharing rather than copying. For example, with the declaration
VAR k : access_key;
the object k.ALL is a variable with subtype INT(O .. 255). Thus
k.ALL := 63;
converts the INT 63 to type access_key, and
VAR i :: INT (O .. 255);
i := k.ALL;
converts k to type INT.
The export of the ".ALL" selector from capsules is allowed (again, consistent
with the language rules for export of encapsulated properties). This is useful in
contexts where a type is introduced for efficiency rather than protection
considerations, or where the intent of a capsule is to define conversion operations
between two types, each of which is defined in a separate inner capsule. In the
second case, each inner capsule could define and export its individual type and the
".ALL" selector, while the outer capsule defines the conversion operators (taking
advantage of the ".ALL") and then exports the two types together with the
conversions. The ".ALL" selectors would not be exported from the outer capsule.
The user can define an explicit constructor operation which takes a value from
the underlying type and converts it to the new type. For example, the declaration
would enable the user to write statements such as
An alternative to the ".ALL" conversion scheme is to automatically provide
"copying" conversion operations between the new and underlying types. For example,
the language could automatically supply a "#" operator as an explicit conversion
from the underlying to the new type whenever a new type is declared, thus saving the
user the effort of defining such a function. The main reason for not doing this is
that in many cases the effect of the conversion depends on the user's intent,
especially when the new type has subtype attributes. Rather than attempt to draw
the fine line required between those cases where a language-defined conversion would
be appropriate and those where the user would have to define one explicitly, we
opted for the simpler (and safer) position that a "copying" conversion from
underlying type to new type be the user's responsibility to define. As an example,
consider the type declaration
This type might be useful where arbitrarily long strings are expected, only the
first 5 characters of which are to be retained, but whose original lengths are
relevant. The problem with trying to provide automatically a conversion from
STRING[ASCII] to t are that first, the destination subtype could not be uniquely
determined from the source string (i.e., "ABCDE" could be converted to t(5), t(6),
etc.), and second, the semantics of the conversion must be user-specified when the
source string has length greater than 5 (whether an exception is raised, whether the
first 5 characters are used, etc.).
It should be noted that programmer definition of conversion functions allows
the use of constructor notations which are independent of the underlying type. For
example, a capsule which defines the data type complex could export a conversion
operator which takes a RECORD[real_part, imag_part : FLOAT] and constructs a
complex. Thus, outside the capsule one can write
VAR z : complex;
Z := [real_part: 0.5, imag_part: 0.75] # complex;
with no commitment that the underlying type is that used in the constructor. Inside
the capsule the type complex might be defined (via polar coordinates) as
TYPE complex : RECORD[angle, magnitude : FLOAT(5, -1.0 .. 1.0)]
even though the conversion function takes as its input a record in Cartesian
coordinates. Thus a change in the underlying type (prompted, e.g., by efficiency
considerations) need not impact the use of conversion functions outside the
encapsulation.
3.3.4 LITERALS AND CONSTRUCTORS
One of the most difficult issues in the design of a strongly-typed language
with rich data definition facilities is the choice of the conventions for literals
and constructors. Tradeoffs must be made between language simplicity and notational
convenience, and interaction with other language rules -- such as type opacity and
overloading of routines -- must be considered. The issue is: given the occurrence
of a literal or constructor in a program, what is its type, and what is its subtype?
Two extreme positions are possible. The conservative approach requires that the
subtype (and thus also the type) of each literal and constructor be determinable
independent of the context of its use. This determination would have to be made
either by the form of the literal or constructor -- e.g., TRUE always has type (and
subtype) BOOL -- or by explicit resolution -- e.g., 'A#ebcdic resolves the literal
'A to type ebcdic. At the other extreme, the liberal approach assumes that literals
and constructors are intrinsically ambiguous with respect to type and subtype,
allows literal and constructor forms to be inherited from an underlying type when a
new type is declared, and permits unresolved literals and constructors whenever the
type (or subtype, if required) can be determined from the context of their use.
With this approach, the following examples would be legitimate:
The conservative approach achieves simplicity of language rules, but at a
considerable price in notational convenience. For example, the literal NIL could
never be used without explicit resolution in any scope where more than one indirect
type is known:
With NIL (and with enumeration literals appearing in more than one type) the problem
is type ambiguity. An even more severe problem notationally is that of subtype
ambiguity, especially in the case of floating point literals. What is the subtype
(in particular, the precision) of, say, the literal 1.23? In languages such as
FORTRAN, where there are only as many different floating point types as there are
different actual precisions on the target machine, this is not really an issue.
Even though only three decimal digits appear in the literal, a single-precision
approximation will be guaranteed. However, such an implementation dependency is not
satisfactory, and in the absence of an explicit resolution of precision the
conservative approach must define an apparent precision for each floating point
literal. Counting the digits which follow the leading 0's in the literal is a
reasonable method; thus the literal 1.23 will be represented with at least 3 decimal
digits of precision. It may be noted that the technique of defining and then using
constants set to floating point literals expanded to extremely high precision will
not work well under the conservative approach, since the computation of expressions
referencing such constants would be carried out at the maximal target precision.
The liberal approach gives the user a fair amount of notational flexibility,
but it is not simple to state the circumstances under which the context can suffice
to resolve the (sub)type of a literal or constructor. For example, the rules for
overloading might have to be extended so that an overloaded function could be
selected on the basis of the result type required in a given context. Consider as
illustration the type angle, where the user has overloaded the "+" operator so that
the sum of two angles yields an angle (modulo 2 times pi):
The issue here is the interpretation of pi+4.12, for which two equally valid
possibilities arise.
- With the "bottom-up" approach, the manifest FLOAT sum is computed (i.e., pi
and M.12 are treated as FLOAT), with 7.26159 as a result. An attempt to
interpret this value as an angle fails (since it is out of the range
0.0 .. 2.0*pi), and the assignment a := pi+4.12 is flagged as erroneous.
- With the "top down" approach, a target type for the sum is determined --
viz., angle, the type of the variable on the left side of the assignment.
Thus an overloading of "+" is sought for which the result type is angle.
The only candidate is the user-defined "+" function whose two formal
parameters have type angle. Thus an attempt is made to interpret pi and
H.12 individually as angles. This succeeds, and the translator compiles a
call of this user-defined "+" function with the two given operands. At
run-time, a will be assigned the value 0.07841 (qua angle).
To the reader of the program it is not obvious which of the above two
interpretations is valid.
The RED language strikes a compromise between the conservative and liberal
approaches. In the interest of program readability, we are. closer to the
conservative position; as a result, explicit type resolution is required in some
contexts where the liberal approach would have allowed bare literals or
constructors. We can summarize the RED conventions as follows:
- Opacity of new types regarding literals and constructors
Literals and
constructors are not inherited by new types. Thus if one wanted to assign
the constant pi to angle a, the attempted assignment
a := pi;
would be illegal. Instead, the user should define "#" as a conversion from l
FLOAT to angle and then write
a := pi#angle;
Thus the assignment
a := pi+4.12;
would be illegal and have to be written
a := (pi+4.12)#angle;
to realize the "bottom-up" approach, or
a := pi#angle + 4.12#angle;
to realize the "top-down" approach. It should be noted that the
non-inheritance of literals is simply part of the language's type opacity
rules (see Section 3.3.3).
- Type resolution of ambiguous literals.
Uniform conventions apply to the
literal NIL and to an element appearing in more than one enumeration type.
If the routine to which the literal or enumeration element is passed is
overloaded so that different versions of the routine could be selected under
different type interpretations of NIL or the enumeration element, then an
explicit resolution of the type must appear. For example, if the user
defines a character set ebcdic as an ABBREV for an enumeration type, and the
element 'A is in this set, then it is legitimate to write
VAR e : ebcdic;
e := 'A;
since the assignment routine for ebcdic requires an ebcdic source value.
However, it is not correct to write PRED('A) since the PRED function is
defined for both ASCII and ebcdic; the user would have to write either
PRED('A#ASCII) or PRED('A#ebcdic). Note that it is permissible to resolve a
literal even if the context does not require it.
VAR a : ASCII;
a := 'A#ASCII;
is always legal.
- Type resolution of ambiguous constructors
Some constructor forms may be
ambiguous, either because they contain ambiguous literals or because they
can apply to different aggregate types. For example, [a : 10] could have
any union type with an INT field named "a". The rules for such ambiguous
constructors are identical to those in (2) for NIL and ambiguous enumeration
elements.
- Subtype resolution of literals and constructors
Since routines can
interrogate subtype attributes of their formal parameters, the issue of the
subtype of a literal or constructor passed as an actual parameter is
relevant. If a literal or constructor is passed as an actual parameter, and
the formal parameter is declared with a type (rather than a subtype), then
the user may either explicitly specify the subtype via a resolution form or
pass the bare literal or constructor and obtain the subtype determined by
the language rules for the particular literal or constructor.
3.4 PREDEFINED TYPES
3.4.1 BOOL TYPE
The primary issue with respect to the Boolean type is the set of operations which are provided and,
in particular, the role of "short-circuited" conjunction and disjunction. Steelman (SM 3-2C and 6D)
requires a type for Boolean values and "infix control operations for short circuit conjunction and disjunction
of the controlling Boolean expression in conditional and iterative control structures." (A
"short-circuited" operation is one whose operands are guaranteed to be elaborated in left to right order,
and where only as many operands are elaborated as are necessary to determine the result of the operation.)
RED provides the BOOL type and defines a unary operator for complement (NOT) and binary operators for
conjunction (AND), disjunction (OR), exclusive disjunction (XOR), and comparison (=, /=). AND and OR
are conditional (i.e., short circuited.)
The following alternative approaches were considered but rejected:
- Define the Boolean type with none of the above operators and, instead, introduce the logical
operators as part of the syntax for assertions and IF and REPEAT statements.
- Provide operators for non-conditional (as well as conditional) conjunction and disjunction.
The main rationale for A is that since Boolean expressions are most
frequently used to determine control flow, programs may be easier to understand if the control
flow dependencies are localized to the control structures themselves, instead of appearing
(implicitly) in the form of Boolean variables to which controlling expressions have been
assigned elsewhere. That is, the intent of
IF (I < J) AND (J < K) THEN ... END IF;
is clearer than that of
VAR flag : BOOL;
...
flag := (I < J) AND (J < K);
...
IF flag THEN ... END IF;
However, the anomaly of a type with no operators would be potentially confusing; Boolean operators
would appear to be present in the language, yet their use would
be restricted to only certain contexts. Moreover, the definition of "predicate
functions" -- functions which return Boolean values -- would be difficult and
error-prone to write, since Boolean expressions would be disallowed as the source of
an assignment. Further, A1 represents an attempt to over-dictate pnogramming style
via language restrictions. Some embedded applications require the manipulation of
Boolean values as data and not simply to guide control flow, and the adoption of A1
would seriously impair the programming of such applications.
Although Alternative A was rejected the programmer can still adhere to the style which
A would have
set. An analogy with the GOTO statement is relevant here. Although the RED language
provides the GOTO statement (required by SM 6G), its usage is expected to be minimal
in light of the availability of more structured control constructs. Similarly,
although Boolean expressions are allowed outside conditional and iterative control
structures, readability considerations imply that in most circumstances such usage should
be minimized.
As to the issue of which operators to provide for the BOOL type, there are
arguments in favor of supplying more operators than have been defined in RED (e.g.,
non-conditional conjunction and disjunction as mentioned in Alternative B) and
arguments in favor of supplying fewer (e.g., inequality).
The exclusion of non-conditional conjunction and disjunction was motivated by the
desire for language simplicity. In environments where non-conditional elaboration is
more efficient than conditional elaboration (e.g., where both operands
can be elaborated concurrently, as on a "pipeline" machine architecture) the user
(or an implementation library) can define functions with the
non-conditional elaboration semantics.
Although the inequality operator is redundant, since it computes the same result as
XOR, any attempt to eliminate this redundancy would result in non-uniform language rules:
- If the XOR and equality operations were provided for BOOL but inequality was not,
this would violate the language rule that the definition of the equality operator
automatically implies the existence of the inequality operator.
- If the XOR operation was provided for BOOL but neither of the comparison operations
were, BOOL would be the only one of the basic built-in types not to have these operations.
- If the comparison operations were provided for BOOL but XOR was not, this
would violate the uniformity between SET and BOOL operations (exclusive
disjunction is required for sets by SM 3-4B).
3.4.2 INT TYPE
RED provides the INT type with MIN and MAX attributes to satisfy the Steelman
requirements for integer arithmetic (SM 3-1B, 3-1F, 3-1H). Since subtype
constraints must always be given on variable declarations, RED automatically
complies with the Steelman requirement that variable ranges be explicitly specified
by the user, rather than implicitly determined by the implementation (SM 3-1C).
Thus, the attempted declaration
VAR x : INT;
is incorrect, and the user must instead supply a declaration such as
VAR x : INT(m .. n);
with explicit range information. If it is necessary frequently to declare INT
variables whose range is the maximum supported on the target machine, this can be
done via an ABBREViation:
ABBREV integer : INT(LOW_INT .. HIGH_INT);
VAR x : integer;
A standard set of operations apply to INT operands: unary and binary + and -;
binary *, **, MOD, DIV, comparison operators and relational operators; absolute
value (ABS); and conversion to floating point (IFLOAT).
The main issue concerning the INT type is one of implementation when the target
machine supports arithmetic for different integer lengths (e.g., halfword, word,
doubleword), especially since the determination of a range for an INT might not be
known until runtime. A detailed discussion of this issue is found in Section 3.1.2.
3.4.3 FLOAT TYPE
Alternative Approaches
Among the problems that must be faced in the design of a language intended for
a variety of target machines, one of the most vexing is the choice of the language
facilities for approximate arithmetic. There is no obvious solution, and tradeoffs
must be made among program portability, language simplicity, and run-time
efficiency.
The design choices considered were the following:
- Provide a set of floating point types, each corresponding to an actual
precision supported on the particular target machine.
- Provide a set of floating point types, each corresponding to a specified
precision1 chosen by the user.
- Provide a single floating point type, with the programmer required to
supply a specified precision2 at the declaration of each floating point
variable.
The RED language has adopted the third alternative. The main reason for this
choice is that it comes closest to meeting the Steelman; there was not a clear
enough advantage to either of the other two alternatives to justify deviating from
the DoD requirements.
In the remainder of this section we shall look at the three choices with
respect to the following criteria:
- Satisfying the Steelman requirements
- Portability
- Simplicity
- Efficiency
Satisfying Steelman
There are strong implications in Steelman that precision should not distinguish
type (i.e., that there should be a single type for approximate arithmetic):
- Steelman 3D states: "The constraints that characterize subtypes shall
include range [and] precision .... " That is, just as integer objects of
different range shall have the same type, so floating point objects of
different precisions shall have the same type.
- Steelman 3B states: "There shall be no implicit conversions between types"
and Steelman 3-1D states: "Explicit conversions shall not be required
between precisions." If precision distinguished type, one of these
requirements would have to be violated.
Both (A) and (B) are in violation of these requirements, whereas (C) is in compliance.
Portability
(A) clearly entails a sacrifice with regard to portability, since differences in
word lengths imply that a variable declared to be single precision for one target
machine (e.g., the CDC 6600) should be declared to be double precision for another
(e.g., the IBM 360).
(B) and (C) facilitate program portability, since the user specifies the
precision for floating point variables explicitly in terms of the minimum number of
decimal digits required in the mantissa. If the user declares a variable to have
precision 10, this would be implemented as single-precision on some machines, and
double-precision on others.
Simplicity
(C) is somewhat simpler than the other alternatives, since the existence of
several floating point types (with (A) or (B)) implies a need for special rules to
handle mixed-precision arithmetic or explicit conversions. With (C) the approach is
straightforward: binary FLOAT operations can take operands of different precisions,
with the result having the greater of the two precisions. No explicit conversions
are required between FLOAT data of different precisions.
Literals are handled more smoothly with (C) than with (A) or (B) with (C), a
simple rule can be formulated with respect to the precision of a literal (e.g.,
count the number of digits after all leading 0's), and the programmer may assign the
literal to variables with either less or greater specified precision. For example,
it is useful to be able to declare a constant which is equal to pi expanded to some
large precision, and then use this constant in computations independent of the
precisions of the other operands. This is easily done with (C):
CONST pi := 3.14159265359;
VAR X : FLOAT(5, 0.0 .. 1.0) := SIN(pi/6.0);
With (A), the form of the literal must indicate the actual precision intended (e.g.,
the use of "E" vs. "D" in FORTRAN exponents). with (B), either an explicit
conversion must be applied when the target has a different precision than the
literal, or a special rule is needed to explain why such an assignment is legal.
Efficiency
Among the three alternatives, (A) is the simplest to implement efficiently.
However, the characteristic of (A) which facilitates the production of good code is
the same property which defeats portability; viz., the requirement that floating
point variables be declared with their actual (rather than logical) precision.
There are no significant differences in efficiency between (B) and (C), since the
same implementation techniques may be applied to both approaches. In particular, by
regarding a routine with a formal FLOAT parameter as though it were generic with
respect to specified precision, the translator can obtain the effect of (B) even
though the language has adopted (C). Moreover, through the generic facility the user
may define a new type with a translation-time constraint corresponding to the
specified precision attribute of the underlying FLOAT type and thus realize the
semantics of (B) via (C).
A detailed discussion of these issues is found in Section 3.1.2.
Conclusions
RED provides a single floating point data type, FLOAT, and requires the user to
supply a manifest specified precision whenever a FLOAT variable is declared.
Precision serves as a "subtype" rather than a "type" property. This approach comes
closer to meeting the Steelman requirements than the other two alternatives
considered, is as good or better with respect to the goals of portability and
simplicity, and permits the generation of efficient object code.
3.4.4 FIXED-POINT ARITHMETIC
Section 3-1F of the Steelman specifies a fixed point facility which provides
exact numeric values. Such a concept is appropriate when dealing with quantized
objects for which the natural unit is not one quantum. The classical example of
such objects is money, where the natural unit is dollars while the quantum size is
one hundredth of a dollar. As another example, measurements for graphics devices
are naturally expressed in inches, but the quantum unit depends upon the particular
device and might be tenths, hundredths or thousandths of an inch. This fixed point
concept presents some difficulties, but they are not insurmountable. The
requirement of "no implicit truncation or rounding" (3-1F) determines that
arithmetic operations must be performed to maintain all low order bits with the
resulting possibility of an overflow exception. The only real problem posed by the
requirements is that arbitrary scale factors may be used (3-1G) and that operations
for exact arithmetic shall apply between arbitrary scales." These requirements imply
scaling rules which are sometimes counter-intuitive. For instance, if X is measured
in units of 1/1000 and Y is measured in units of 1/1001, then X+Y must be measured
in units of 1/1001000. If both X and Y are in the range (-1,1), they can be easily
represented in a. mini-computer's sixteen bit word (or twelve bit word for that
matter). Nonetheless, X+Y requires 21 bits to accommodate the range (-1,1) without
sacrificing low order bits. Thus, although addition reduces accuracy, the sum of
two single precision numbers may require double precision! Depending upon the
language design, the compiler will either generate very inefficient code or accept
the likelihood of a run-time overflow.
Although it is possible to write inefficient code using an exact fixed point
facility, one can also write very reasonable and efficient programs. Furthermore,
the compiler can often warn the user about unexpected inefficiencies. The question
is not so much whether it is possible to meet the Steelman requirements, but whether
a complying language will fulfill the needs of the military community.
Through the several reviews of the Ada design project, we have become
increasingly aware of, first, a concern over the Steelman requirements for fixed
point and, second, the absence of a DoD consensus on what such requirements should
be. As a result, the RED language does not define a fixed point facility; instead,
it is our intention to integrate such a facility into the language when the DoD
requirements in this area are clarified. See
Appendix A for a discussion of the
main issues surrounding fixed point arithmetic and some suggestions for a viable
approach.
3.5 ENUMERATION TYPES
3.5.1 INTRODUCTION
Many applications use sets of constant data values which are most conveniently
denoted by mnemonic names. For example, a program that processes messages of
potentially different security clearances will be easier to read and maintain if the
clearances are represented by names such as "secret" and "confidential" rather than
by numeric encodings. Although languages such as FORTRAN allow the user to define
variables preset to the desired encodings, program reliability suffers in three
ways:
- The preset variables may be inadvertently modified during execution (i.e.,
there is no guarantee that they remain constant).
- The preset variables may be used arbitrarily as integer data; e.g., it would
be legitimate to take the sum "secret+confidential", which should be an
error.
3) The programmer must explicitly encode the representations for the data
(e.g., "secret" being preset to 1, "confidential" to 2, etc.), which in
almost all cases is irrelevant to the program's behavior.
Languages such as CMS-2, JOVIAL, and Pascal have solved these problems by
providing enumeration (or "status") types, with which the user can directly specify
a set of mnemonic names to denote constant data values. Such a facility is required
by Steelman 3-2A and 3-2B, and is part of the RED type definition scheme.
In short, an enumeration type in RED comprises an ordered list of enumeration
elements. By placing range constraints on an enumeration type, the user can obtain
a subtype. Successor (SUCC), predecessor (PRED), and ordinal position (POS)
functions are provided for each enumeration type, together with the binary
comparison and ordering operators (=, /=, <, <=, >, >= ). Enumeration types may be
used as index types for arrays and repeat-variable declarations, and may appear in
SET and STRING specifications. As an example:
The main issues affecting the design of the enumeration type mechanism in RED
concern meshing the facility with the language's rules for type equivalence, type
opacity, and namescopes, and the realization of character sets as required by
Steelman 3-2D.
3.5.2 ENUMERATION ELEMENTS
In Pascal, enumeration elements are treated as scoped identifiers; i.e., given
the Pascal declaration
TYPE vehicle = (tank, ship, plane)
the translator treats the identifiers tank, ship, and plane as though they were
declared as constants in the scope immediately containing the type declaration.
This approach is undesirable in RED for the following reasons.
- Enumeration types must be allowed to "overlap" -- i.e., the same element may
appear in several different enumeration types in the same scope -- in order
that character sets be realizable as required by Steelman 3-2D. This
precludes treating enumeration elements as implicitly declared and scoped
constants.
- RED is much richer in namescopes than Pascal -- e.g., the branches of an if
statement are namescopes in RED. As a result, the scope of declaration for
enumeration elements would be somewhat harder to perceive.
- The Pascal approach has the same disadvantages as the "single instance
generator" approach to type equivalence (see Section 3.2.4).
In light of these considerations, we adopted the convention that enumeration
elements have the properties of literals rather than scoped identifiers. A
consequence of this choice is that, since it is possible for enumeration types to
overlap, the type of an enumeration literal will not always be immediately
determinable from the literal itself. (This is analogous to the situation with NIL,
a literal belonging to any indirect type.) Thus it will sometimes be necessary to
"disambiguate" an enumeration element; in particular, when it is passed as an
argument to an overloaded routine for which more than one match could be found based
on the possible types to which the element belongs. For example:
3.5.3 ORDERED VS UNORDERED ENUMERATIONS
Enumeration types in RED are ordered, as required by SM 3-3B. However, it is
straightforward to obtain unordered enumeration types through the declaration of new
types. As an example, suppose that the programmer intends color to have the
semantics of an unordered enumeration of 'red, 'yellow, 'green, and 'blue. The
following declarations will serve this purpose:
Color is an unordered type because of the language's type opacity conventions: the
ordering operations defined for the underlying enumeration type are not
automatically overloaded for color. Since literals, similarly, are not extended to
new types, the conversion function #[color] was defined. Thus, if c1 and c2 have
been declared of type color, the following usages are legal:
whereas the following are illegal:
3.5.4 REALIZATION OF CHARACTER SETS
Character sets are achieved in a straightforward fashion in RED via enumeration
types. Since ordering operations are desired for character sets, ABBREViation
declarations will typically be used; the definition of the built-in ASCII character
set is given in Section 2.2.4.
***********
1
The actual precision of a floating point value is the number of decimal digits in
its (normalized) mantissa.
2
The specified precision of a floating point variable is the minimum number of
decimal digits needed in its (normalized) mantissa.