Intermetrics, Inc. - RED Rationale - March 1979 - Overloading and Generic Definitions

.	Navigate

7. Exception Handling & Multitasking

9. Advanced & Low-Level Facilities

OVERLOADING &
GENERIC DEFINITIONS

Overloading     Providing Overloaded Definitions     Legality of Overloaded Definitions     Selecting the Overloaded Definition
Generic Definitions     Major Technical Issues     Overloading via Generics
Constraint Clauses     Subtypes     Manifest Expressions     Implementation

8.   OVERLOADING AND GENERIC DEFINITIONS

8.1  OVERLOADING

In accordance with SM 7A, RED permits routines to be overloaded. Such a facility is useful because it permits a single name to be given a standard meaning that transcends type boundaries. For example the name "+" can be chosen to mean addition. Addition is an action that is relevant to many types; e.g., integers, reals, vectors, matrices, etc. Overloading permits many definitions of + to be given. These definitions will be distinguished by the types of their actual parameters (i.e., they will have different signatures); also, they will have different bodies that perform the addition operation in accordance with the semantics of the corresponding types.
All typed languages provide a certain degree of overloading. At the very least, operations like + are overloaded for built-in types. In languages with user- defined abstract types, some means of permitting two types to have the same operation is required.
In RED, the declaration and invocation of an overloaded routine have the same form as for a non-overloaded routine, and the selection of the invoked routine is done in a simple manner. Furthermore, selection can always be done at translation time, since it is based on type information.
Various issues associated with overloading are discussed below. We will use the term "overloaded definition" to mean the declaration that overloads some routine name, and the term "use form" to mean the form that is used to invoke the overloaded routine.
In the interest of language consistency, RED does not restrict overloading to routines. For example, tasks and routines are quite similar; they are the deferred units invoked for their actions. To permit overloading for routines and not tasks would make an artificial distinction between them. Thus, overloading of tasks is allowed. Moreover, overloaded ABBREViations are useful. in dealing with related types and subtypes. This is particularly true in the case of enumeration types; e.g., the built-in ASCII ABBREViation is overloaded so that it may be used either as a type or as a subtype (in the former case it may be invoked with range bounds to yield a subtype, such as ASCII('A..'Z)). As a result of these considerations, RED permits any deferred declaration (other than a type declaration) to be overloaded. (Note that it is not possible to allow overloading of types, because of the potential ambiguity regarding the underlying type.)
Overloading can be produced by explicitly providing several definitions of the same name, or by means of a generic definition. In this section, only explicit overloading is discussed; generic overloading is discussed below in Section 8.2.

8.1.1  PROVIDING OVERLOADED DEFINITIONS

In RED, overloaded definitions are introduced as needed. For example, when a new abstract type is defined, some of the operations in the capsule defining the new type may be overloaded definitions. In the course of using the type, the programmer may provide further overloaded definitions. For example, if no "print" operation was provided in the capsule defining abstract type t, a programmer using t may, provide such a definition.
Overloaded definitions appear in the program text when it is convenient to introduce them. This is in contrast to ECL [Ha74] and PL/I, where it is necessary to provide "overload lines" in the single definition of the overloaded routine. The method in RED provides much better modularity.

8.1.2  LEGALITY OF OVERLOADED DEFINITIONS

When an overload use is processed to select an overloaded definition, it should not be the case that more than one overloaded definition could be selected. Such a situation must be considered an error: the alternative of having the translator make an arbitrary selection among the possibilities would certainly have a negative impact on readability and understandability. Furthermore, it is desirable that the error be detected at the time the overloaded definitions are translated rather than when they are used. This permits error detection to occur as early as possible, and also in the proper context, since it is really the definitions, and not the uses, that are in error.
RED satisfies these objectives by requiring that all overloaded definitions local to a scope must not conflict with one another. Two overloaded definitions conflict if their signatures are equal and their translation-time properties (if any) are equal. Recall that signatures describe the order and types of the formal parameters. Translation-time properties are described below, in Section 8.2.2. Since only types and not subtypes are involved, conflict between overloaded definitions can be detected at the time the overloaded definitions are translated.

8.1.3  SELECTING THE OVERLOADED DEFINITION

When the translator processes an invocation; e.g., "READ(a)", or "x+y", it determines the signature of the invocation, and selects the overloaded definition that has that signature. Note that:

There is no need to resolve ambiguity in selection, since conflicting overloaded definitions are a translation-time error. (The problem of ambiguous literals and constructors is addressed in Section 3.3.4.)

The selection can be done at translation time, since only types are involved.

More general selection rules can easily be imagined. For example, for an overloaded function, the result type could also be considered part of the signature. However, this leads to problems. For example, suppose f and g both have two (conflicting) overloaded definitions:
      FUNC g(x:INT) => INT;
      FUNC g(x:INT) => BOOL;
      FUNC f(x:BOOL) => BOOL;
      FUNC f(x:INT) => BOOL;
These definitions would not conflict if the result type were considered part of the signature. However, the invocation f(g(x)) does not unambiguously correspond to specific definitions of f and g. It is to avoid such problems that the result type is not included in the signature.

8.2  GENERIC DEFINITIONS

Generic definitions are a way of providing many definitions with a single form. Generic definitions have generic parameters; these differ from ordinary parameters in that the actuals must have values that are manifest. Furthermore, actual generic parameters are passed at translation time as compared to ordinary actual parameters, which are passed at run time. The generic provides a separate definition for every possible combination of actual generic parameters; since these parameters are manifest, the definitions can be produced at translation time.
Generic definitions can be given for each of the deferred units: procedures, functions, tasks, capsules, types, and abbreviations. Generic definitions are used when a class of definitions ranging over many types is wanted, or when a routine or task parameter is needed.

8.2.1  MAJOR TECHNICAL ISSUES

The major technical issues are the following: what is a legal generic E definition, what definitions does it produce, and when are they produced. At first consideration, it may appear that generics should simply provide a macro facility; i.e., that the names of the actuals should be substituted for the formals to obtain an instantiation of the macro; this instantiation would then be translated in the using environment.
However, careful study of a macro facility reveals some difficulties. The most serious of these is the lack of a definition environment. For example, suppose that a generic definition contains an invocation of a routine f to achieve a certain effect. with macro-like semantics it would be impossible for the definer of the generic to guarantee that the proper f will be invoked. Since the meaning of the generic definition is dependent upon the routines it invokes, it would be impossible for the definer of the generic to ensure that its intended meaning is realized. Therefore the macro approach is not acceptable; it conflicts directly with the higher level goal of modular program construction.
Generics in RED have aspects of both routines and macros. Like routines, a definition environment supplies the interpretation of all non-local identifiers. Also like routines, the values of actual parameters (not the names) are passed. However, generics resemble macros in that they are instantiated at translation time to produce the actual entity that is used at run time. Instantiation can occur at translation time because all the actuals are manifest.

Meaning of a Generic Definition

A generic definition comprises a set of generic parameters, an optional NEEDS list (see Section 8.2.3), and a declaration of a deferred unit. It can be thought of as providing an instance of the deferred declaration for all possible values that the formal generic parameters can assume. For ease of exposition, we will call the deferred declaration the "body" of the generic definition.
For example, consider the generic definition

This definition provides a function declaration for r[i,f] for every integer value i in the range 1..10 and function f from FLOAT to FLOAT.
Of course, it would be inappropriate to produce all instantiations at the time the generic definition is processed by the translator, since few of those instantiations would actually be used. In fact, it would be impossible to produce all instantiations, since the translator does not know all possible functions from FLOAT to FLOAT. Instead, the instantiations are produced when they are needed.
One possibility, similar to macros, would be to require an explicit form that instantiates a generic. Such a requirement would simplify the translator's job somewhat, but it would complicate the use of generics and result in the introduction of superfluous names by requiring the user to name each different instantiation. Therefore, we do not require explicit instantiation. In fact, it is transparent to the user whether a generic or a non-generic construct is being invoked. The first time a generic is used with particular actual values, the instantiation is actually constructed. When the generic is used with those values subsequently, the instantiation can be constructed again, or the previously constructed instantiation may be used. In the latter case, the translator must keep track of previously constructed instantiations. An instantiation can be named by the identifier that names the deferred unit, together with the values of the actual generic parameters. During instantiation, the generic formal parameters are given the values of the actual parameters in the environment of the instantiation. Then the deferred unit that is the body of the generic is translated in the defining environment. It is useful for some names in the generic to be bound in the instantiating environment; the NEEDS list fulfills this purpose.

Legality of Generic Definitions

The specification of a (formal) generic parameter includes a generic constraint. Uses of the formal generic parameter within the deferred declaration must be consistent with the constraint, and actual generic parameters supplied to instantiations must likewise satisfy the constraint. Thus generic constraints on formal generic parameters are analogous to type specifications on formal parameters to deferred units.
Actual generic parameters may be types, subtypes, routines, tasks, and manifest values, as required by Steelman (SM 12D). As an example, the following generic function performs numerical integration by Simpson's method:

The RED rules about legality of generics define the interface of the generic as fully as possible. The interface information constrains both the generic definition and its use. The use must provide actual generic parameters that satisfy the constraints, while the definition can only use the parameters in ways consistent with the constraints. For example, inside integrate, f must be used as a function from FLOAT to FLOAT; the user of integrate must provide such a function. Of course, only syntactic information can be captured in our interface description. Semantic restrictions, such as assumptions in integrate about the meaning of f, cannot be described or enforced; such a step requires a program verifier, not a translator.

8.2.2  OVERLOADING VIA GENERICS

Overloading definitions of a routine or task can be provided explicitly by writing different definitions (as discussed in Section 8.1), or implicitly by means of a generic definition. In either case, the overloaded definitions may not conflict. In the case of generic overloading, this means that for any routine or task name, and for any legal actual translation time property list associated with that name, there must only be one definition local to the definition scope with any given signature. The translation-time property list is specified through square brackets after the name; e.g.,

states that integrate has a single translation-time property. This generic definition provides a definition of integrate for any legal value of f (i.e., any function from FLOAT to FLOAT). Each such definition of integrate has signature FLOAT, FLOAT, INT. However, the definitions do not conflict with each other, since each has a different actual translation time property. The value for the actual generic property is supplied whenever integrate is used; e.g.,
      x:= integrate[sin](x, y)
A generic definition could produce routine or task definitions that conflict with other definitions, external to the generic but local to the scope containing the generic. There is no way that a generic could produce overloaded declarations that conflict with one another if we require that all generic parameters appear in the generic property list. This is, in fact, the requirement for non-overloadable deferred units such as capsules and types. However, we have weakened the rules for tasks and routines to enhance usability.
It is often the case that a generic parameter determines the type (or subtype) of a routine parameter. When this happens, the generic parameter can be derived from the actual routine parameter. For example, consider

A legal use of p is
      VAR y: INT(1..10);
      p[INT] (y);
However, such a form requires useless redundancy: the actual translation time property can only be the type of the actual parameter. No additional information is gained by requiring the user to explicitly state the actual generic property in this case, but the possibility of errors is increased.
RED permits a simpler form for the use of a task or routine when the explicit specification of the translation time property would constitute useless redundancy. Thus, in the above example we could write

The header of the deferred unit being replicated by the generic definition states exactly what the use form must be; the user need only look at this line to determine the proper use form. For example, the above generic definition states that
      p(y);
is the legal use form.
Since not all generic parameters need appear in the generic property list, for routines and tasks it is possible to write a generic definition that produces conflicting overloaded definitions. However, the translator can detect this error while processing the generic. For example,

is illegal, since it produces one definition of f for each possible type, but all these definitions have the same signature (INT). To avoid such an error, it is only necessary to ensure that each formal generic parameter is referenced either as a translation-time property or as part of the type or subtype of a parameter of the routine or task.
Translation-time property lists can be used in defining routines and tasks outside of generic definitions. This is useful when different bodies are needed depending on the return type (for example, in conversion routines). The selection rules for overloaded definitions produced by a generic definition are the same as for explicitly overloaded definitions. The conflict rules are also the same. Thus overloading is treated uniformly, whether produced by explicit or generic definitions.

8.2.3  CONSTRAINT CLAUSES

Parameters to generics can be routines, tasks, types, subtypes, and manifest expressions. In the sections below, the} different kinds of parameters are discussed.

Routines and Tasks

In accordance with Steelman (SM 12D), routines and tasks can be generic parameters. Each formal generic parameter constrained to be a routine or task must be specified with complete information about the types (or subtypes) of its formal parameters. As mentioned above, such a specification serves two purposes: it constrains the actual generic parameter that can be used to instantiate the generic, and it limits use of a routine or task parameter in the body of the generic to be consistent with its specification. Thus we guarantee that in any instantiation the invocation of the actual task or routine will be type-correct.
In specifying the actual routine or task parameter, the user supplies its name and actual translation—time properties (if any). This information must unambiguously select a routine or task definition. That definition is then used in selecting (and possibly creating, as explained above) the appropriate instantiation.

Types

RED permits types to be generic parameters. Usually type generic parameters will be used to stand for any arbitrary type, but forms are also provided to require that the type generic parameter be limited to one of a number of listed types, or to be a type from a particular category (e.g., enumeration).

The NEEDS List

When a generic definition is written based on a type generic parameter, then its body must be defined when any legal actual type is bound to that parameter. However, for the generic definition to have its intended effect, it sometimes needs to invoke routines that operate on data of that type. The NEEDS list, similar to mechanisms found in CLU and Alphard, is provided to state such a requirement.
For example, consider a sort routine that works on arrays of arbitrary component type. Such a routine can work only if the type is ordered. Such a need could be stated in RED as follows:

In the body of sort, "<" is used to determine the proper ordering of a.
The NEEDS list constrains both the body of the generic, and the use of the generic. In the body of the generic, tasks and routines listed in the NEEDS list must be used as specified (e.g., "<" must be used as a function that takes in two t's and returns a BOOL). In the use of the generic, there must be routines and tasks defined in the using environment as stated in the NEEDS list. For example, in
      VAR x: ARRAY INT(1..n) OF s;
      sort(x);
the invocation of sort is legal only if there is a function <, defined in the scope of the invocation, that takes two s's and returns a BOOL.
Other mechanisms than the NEEDS list can be used to ensure that the proper routines and tasks are made available to the generic instantiation. An obvious one is to simply add the needed routines and tasks as additional generic parameters, e.g.,

Such an approach would make generics harder to use, since the need for such routines is very common. For example, a set data abstraction is only defined for elements whose type provides an equality operation. If needed routines had to be passed explicitly, then most set operations would have such a parameter.
At first glance, it may appear that a NEEDS list (or explicit parameters) is totally unnecessary: why can't the translator figure out what is needed?
Essentially, the NEEDS list and generic formal parameters specify all the names that are to be bound in the using environment. An alternative method would be to attach the local meaning to a name that can be used in a generic definition, whenever the name is not a formal generic parameter and has a meaning in the definition scope. Any names left over would then be bound in the using environment. For example, in the function invocation
      x < y
in the sort procedure, where x and y are of type t, "<" must be defined in the using environment; it cannot possibly have a local meaning, since the actual value of t is not known locally.
A deeper analysis shows that such an approach does not work. For example, consider

Obviously, h must be defined in the using environment, since it depends on the actual value of type t. The problem is what to do about g. Suppose there is a local definition of FUNC g(INT) => INT. If in the using environment, h refers to a FUNC(t) => INT, then this local g should be used. However, if h refers to a FUNC(t) => t, then a non-local g must be used.
Such problems make things difficult for the translator. They also make it difficult for the definer of a generic to control its meaning, and for a user to know what a legal instantiation is. As a result, the NEEDS list is the better approach.
The NEEDS list affects the selection (and construction) of an instantiation; it also determines whether the instantiation may legally be used; its use is legal only if routines and tasks as stated in the NEEDS list are available in the using scope.

8.2.4  SUBTYPES

Type generic parameters are useful for the definition of generalized procedures and functions that work for actual parameters of any type (subject to the constraints imposed by the NEEDS list, if any). However, sometimes a generic definition must be specific about subtypes. For example, consider the following

a generic type definition
      TYPE t[s] (n: INT) : ARRAY INT(1..n) OF s;

a generalized search routine
      FUNC search (a: ARRAY INT OF t, x: t) => INT;

a conversion routine
      FUNC convert [t] (s: STRING[ASCII]) => t;

In case 1, the user of t must supply a subtype for s, since s is used as a subtype in defining t. However, it is desirable that the user be informed of what to supply without looking at the definition of t. In case 2, the definer of search may wish to require that the subtype of x be the same as the subtype of elements of a. In example 3, it may be necessary to let the user of convert specify the subtype of the result.
In all three cases, some means of writing a generic definition in terms of subtypes is needed.
RED satisfies these requirements by allowing subtype generic parameters. A subtype generic parameter provides a name that stands for a subtype in the body of the generic; it also constrains the user to explicitly provide a subtype (cases 1 and 3 above) or to satisfy a relationship among subtypes (case 2 above).
The subtype facility can be used to express each of the three examples above:

a generic type definition

a generalized search routine

a conversion routine from strings to complex numbers

A subtype generic parameter does not, however, affect instantiation. A generic definition with a subtype formal parameter stands for all instantiations by type values. Thus, definition (1) above stands for all types t[r], where r is some type. When such a generic is used, i.e.,
      t[u]
where u is a subtype, the type of the subtype (e.g., the type of u) is used to select (and possibly construct) the instantiation. However, the use is legal only if subtype constraints have been satisfied. For example, suppose u and v are different subtypes of the same type. If the user declares
      VAR a: ARRAY INT(1..100) OF u;
      VAR x: u;
      VAR y: v;
then
      search(a,x)
      search(a,y)
both select the same instantiation, but search(a,x) is legal while search(a,Y) is not.
Other methods of describing the need for subtype information were considered and rejected in the design of RED. Essentially there are two other choices:

let the context determine whether a type or a subtype is required

use assertions to state such information

The main objection to approach 1 is that it is necessary to examine the body of the generic (or possibly some other generic used by the body) to determine whether a type or subtype is needed; such a determination can be arbitrarily difficult for the user to make. Furthermore, this approach is not sufficient to express example 2 above, since either types or subtypes can be specified in routine headers.
The second alternative is similar to what we have. Our method is superior because it permits a more succinct statement of requirements, and because it is more obvious when a name stands for a subtype.

8.2.5  MANIFEST EXPRESSIONS

A generic parameter can stand for a manifest expression, in accordance with Steelman (SM 12D). Since the actual parameters must be manifest, optimizations can be performed on the instantiations. Therefore, such parameters are useful in writing generalized packages that permit instantiations.
For example, a library of mathematical functions can be parameterized by precision, and then instantiations created for standard precisions, and the instantiations optimized for these cases.

8.2.6  IMPLEMENTATION

The main implementation issue with generics concerns instantiation. when the need for an instantiation is discovered, the translator is processing a program that uses the generic. However, the generic must be instantiated in its definition environment, with the formal generic parameters bound to the actual values (and needed operations, if any, bound to the actual needed routines). Thus to process the instantiation, the translator must shift from the environment of the use to the definition environment of the generic. This shift might be done immediately (i.e., the translator invokes itself recursively when it needs an instantiation) or delayed until after the using program has been processed.
A reasonable way to accommodate the shift in environment is to collect information about the environment of a generic definition when it is processed. The translator must process a generic definition to determine if it is legal; basically, this legality checking determines that the formal generic parameters are used appropriately, and that the definition environment provides required definitions. During this processing, information about the generic parameters (and the NEEDS list, if any) and the definition environment would naturally be collected in some sort of symbol table. All that need be done is to save this symbol table in the program library along with the source for the generic. Then instantiation consists of filling in information about the actual generic parameters (and routines and tasks named in the NEEDS list) in the symbol table of the generic, and then translating the source of the body of the generic in the context of that symbol table. During this translation, optimizations can be done based on the values of the actual generic constraints.
Generics are defined in such a way that it is irrelevant how often an instantiation is produced: the meaning of a program is the same if an instantiation is constructed just once, and all uses refer to it (i.e., all share the code) or if each use causes the instantiation to be constructed again (no sharing of code), or anything in between. This flexibility is important because it permits the A translator to optimize the code produced to save space or time as desired. For example, inline substitution of the instantiation provides time efficiency at the expense of space, while sharing the code saves space but costs time.
To avoid constructing an instantiation more than once (if this is desired), it is necessary to remember an instantiation that has been constructed previously, and then reuse that instantiation. This can be done because every instantiation has a unique name that is known at translation time. The name consists of the identifier naming the deferred unit that is the body of the generic, the actual values of all the translation-time parameters (including implicit parameters acquired via the NEEDS list), and the signature (if any). This name can be produced from any legal use of an instantiation, and therefore can be used to store and later fetch the code of the instantiation. The instantiation might be remembered just for the translation of a single module, or it might be remembered for much longer by storing the instantiation and its name in the program library.
It is possible to have more of the work of producing an instantiation done at the time the generic definition is processed. In many cases almost all object code can be produced at this time. The output of the translator would consists of two parts: the object code, and some generic-parameter-dependent information. Then an instantiation can be produced in one of two ways:

Using the actual generic constraints and the parameter-dependent information, the translator copies the object code and modifies it appropriately to produce the code for the instantiation.

The parameter-dependent information is copied and the values of the actual generic constraints are filled in. The instantiation shares the object code with other instantiations.

The object code will refer to generic constraint information for the appropriate instantiation via a pointer in the activation record.
For example, consider the integrate routine discussed earlier. This could be preprocessed, and complete object code generated except for the invocations of the function that is its generic argument. (Separate code bodies could be produced for the four combinations of single-precision/double-precision actual parameters.) The function invocations could be handled by leaving blanks in the object code, and having the parameter-dependent information state where the blanks are (such information would be analogous to the link table in a linking loader); then instantiation would make a new copy with the blanks filled in. Alternatively, instantiation could be done by creating a new parameter-dependent part; the instantiations could all share the object code, and access parameter values indirectly. This second alternative may be the more desirable since the time penalty for the indirect invocation is slight and the space savings is large.
Thus, generics in RED provide flexibility in optimization for space or time through a variety of implementation techniques. Many of these techniques are based on existing work in the CLU implementation [ALS78] and also the Alphard implementation. For example, the CLU implementation uses a code sharing technique similar to the one sketched above.

7. Exception Handling & Multitasking

9. Advanced & Low-Level Facilities

Overview
Requirements
      Strawman
      Woodenman
      Tinman
      Ironman
      Steelman
RED Reference
RED Rationale
Types in RED
Time/Life Computer Languages
Memories
Site Index

Overview Reference ToC Rationale ToC Site Index