Intermetrics, Inc. - RED Rationale - March 1979

.	Navigate

3. Types

5. Procedures & Functions

EXPRESSIONS &
STATEMENTS

Expressions Operator Precedence Levels Effect of Parentheses Manifest Expressions
Statements Introduction Conditional Statements The REPEAT Statement Control Transfer Statements

4.   EXPRESSIONS AND STATEMENTS

4.1  EXPRESSIONS

This section discusses the general syntactic properties of expressions in RED. Semantic issues are described elsewhere in this document: operator overloading in Section 8.1, side effects in Section 5.2, disambiguation of literal and constructor forms in Section 3.3.4.

4.1.1  OPERATOR PRECEDENCE LEVELS

RED complies with the Steelman requirement (SM 4F) concerning precedence levels. Although a smaller number of precedence levels could have been achieved by adopting the Pascal convention of merging the BOOL operators into the arithmetic hierarchy (i.e., placing AND with the multiplicative operators, and OR and XOR with the additive operators), we chose to use the ALGOL 60 [Na60] approach and to place g the BOOL operators at lower precedence. In making this choice, we adopted the advice of N. Wirth, the designer of Pascal. In a paper which assessed the Pascal language, Wirth made the following statement [Wi77, p.194]:
In the interest of simplicity and efficient translatability, Pascal aimed at a reasonably small number of operator precedence levels. ALGOL 60's hierarchy of 9 levels seemed clearly too baroque ... In retrospect, however, the decision to break with a widely used tradition seems ill-advised, particularly with the growing significance of complicated Boolean expressions in connection with program verification.

4.1.2  EFFECT OF PARENTHESES

RED has adopted a simple rule concerning parenthesization:

when parentheses are present, they dictate the grouping of operators with operands, and

when parentheses are absent, consecutive operators at the same precedence level in an expression are grouped with their operands from left to right.

As an example of the first rule, the translator is not free to elaborate x+(y+z) as (x+y)+z (or as y+(x+z), etc.) unless it can guarantee semantic equivalence. Thus, in cases such as floating-point arithmetic where it may be critical for the programmer to control operator/operand association, parenthesization may be used for this purpose.
The second rule implies that an expression such as x-y+z is parsed as (x-y)+z. Although it might be argued that x-y+z is "psychologically ambiguous" and that parentheses should be required (cf. SM UG), considerations of language simplicity dictate otherwise. The Ironman requirement 4G attempted to define a set of circumstances under which parentheses are necessary ("whenever an expression has a nonassociative operator to the left of an operator of the same precedence"), but this constraint was overly restrictive. It implied, for example, that the (psychologically unambiguous) expression a/b + c/d would be illegal, since the nonassociative operator "/" is to the left of another occurrence of "/". The note in Steelman 4G corrects this Ironman problem, since it suggests a weaker restriction; viz, "requiring explicit parentheses to resolve the operator-operand association whenever a nonassociative operator appears to the left of an operator of the same precedence at the least-binding precedence level of any subexpression." However, the price in language complexity outweighs the benefits of such a rule, and we therefore adopted the simple and widely used left-to-right associativity conventions.
It is important to note that left-to-right operator/operand association within precedence levels does ng; imply left-to-right elaboration order. In particular, although x-y+z is parsed as (x-y)+z, the order in which the x, y, and z operands are elaborated is undefined.

4.2  MANIFEST EXPRESSIONS

One of the design decisions which is significant with respect to efficiency, simplicity, and usability is the definition of the class of expressions which are manifest -- i.e., whose values are guaranteed to be computable at translation time. This is important because type checking is to be performed at translation time, and the user can define type-determining properties which are specified by expressions. For example, with the declaration
      GENERIC n:INT
      TYPE vector[n]: ARRAY INT (1..n) OF FLOAT (5, -1.0E7..1.0E7);
the types vector[1], vector[2], etc., are all different. But suppose the user declares
      VAR v1: vector[exp1];
      VAR v2: vector[exp2];
Do v1 and v2 have the same type? The language must define conditions to be met by exp1 and exp2 so that they can be guaranteed to be translation time evaluable. In addition to their use in translation time property lists such as in the above example, manifest values are required for floating point precision (SM 3-1D) and are used in connection with the conditional translation facility. Specifically, if the BOOL expression in an IF statement is manifest, only the selected branch is translated (SM 6C); the analogous situation holds for CASE statements.
There is a wide range of design alternatives concerning the generality of manifest expressions. One rather extreme position would be to limit manifest values to literals. Pascal is very close to this extreme in that it requires either a literal or the name of a literal (a declared CONSTant) and limits manifest values to be from simple data types. At the other extreme, a language could place minimal restrictions on manifest computation (e.g., I/O would be excluded). Such a definition would allow translation time invocation of user-defined routines taking parameters of user-defined types.
The restrictive Pascal approach is inappropriate. One problem is that it does not adequately support conditional translation. Another difficulty is that it complicates the writing and reading of programs which use simple functions of manifest values. For example, if N were a declared manifest constant (equal to, say, 10) and it were necessary to use the value N+1 in a manifest context, the user would have to explicitly declare a new constant set equal to 11 and use it instead.
The opposite, general extreme has some advantages. It provides the user with maximum power, and it seems to be simple in that few rules are required. However, the level of generality presents severe language and implementation difficulties.
At the language level, complexities arise. User-defined functions generally deal with data whose values are not manifest. For instance, most operators on the abstract type matrix require several temporaries. What meaning .is attached to manipulations of non-manifest data? Does the meaning change if the objects are not local? Does the meaning change if the objects are defined in a separately translated capsule? How is a manifest value distinguished?
The extremely general case also poses implementation problems. The translator will need to incorporate an interpreter to determine manifest values. Depending upon the answer to the question, "How are manifests distinguished?", the translator may have to attempt compile time evaluation of arbitrary expressions, invoking user-defined functions, with the resulting possibility of a translation time infinite loop when the user has made an error. Since many of the abstract types are likely to be in separately translated capsules, the translation time invocation of separately translated routines would have to be supported. This in turn implies that any change at all in a separately translated capsule may force retranslation of every capsule which uses it.
RED has adopted a conservative compromise between the two extremes. The main point is that manifest values are substitutable for literals of predefined types. Thus literals are manifest, declared constants initialized to manifest expressions are manifest, and built-in operations applied to manifest expressions yield manifest values. Although the set of manifest values is limited, it provides the flexibility which is needed in dealing with type properties and conditional translation.
Note that many expressions might in fact be computable at translation time even though they are not manifest. (Such expressions may not be used in contexts where the language requires manifest values, since whether translation time elaboration occurs would be implementation dependent.) As an example, an optimizing translator may choose to elaborate library routines (or user-defined functions) at translation time.

4.3  STATEMENTS

4.3.1  INTRODUCTION

One of the basic objectives underlying the design of statements in the RED language has been to provide support for top down program development. When decomposing a high-level abstraction into several lower-level ones, the programmer must be able to work in the local context. This implies not only that the language allow nesting of control structures (SM 6A), but that the decomposition of a high level abstraction be permitted to include declarations. The RED language satisfies both requirements. Bodies inside compound statements may contain declarations, and a begin statement (i.e., a block) is provided so that a declaration may be locally introduced at the point at which it is needed.
When a program is developed in a top down manner, the need for a procedure is discovered only when an implementation for some abstraction is determined. Since the procedure is a low level implementation of the high level concept, the natural place for the procedure's declaration is after its invocations. This allows the reader to omit the details and allows the writer to proceed in a natural fashion.
It should be noted that RED, like Pascal, is a "statement" language as opposed to an "expression" language, such as ALGOL 68 [VWi75]. RED has assignment statements and IF statements but these statements do not return values; consequently, RED does not allow conditional expressions or nested assignments (i.e., IF statements or assignment statements used as expressions). Such constructs occasionally provide notational convenience and can sometimes make the optimizer's job easier; however, the problems they raise, summarized below, outweigh these benefits.

Statements by their nature produce side effects, and side effects in expressions are to be minimized (SM 4C). If assignment statements were permitted as expression constituents, this requirement would be severely compromised. Even if the use of such a statement was restricted so that it was valid only when it was the sole construct in the right-hand side of an assignment (not as an operand in a larger expression) it would be possible to "alter the value of a variable that can be accessed at the point of the expression", in violation of SM UC. Consider:
      x(i) := i := i+1;

Ad hoc rules would be required. For example, conditional expressions must have both THEN and ELSE clauses (both returning values of the same type), whereas conditional statements can omit the ELSE clause. As another example, it is not obvious whether the occurrence of an ambiguous enumeration literal in the branch of a conditional expression should be considered legal if the other branch can disambiguate it. (Consider the invocation
      WRITE(IF cond THEN 'A ELSE 'cent END IF)
where 'A and 'cent are elements of a non-ASCII character type.)

A language that is mostly statement-oriented but partially expression-oriented is a hybrid whose lack of uniformity would make it harder to learn. For example, it might be expected that case-expressions be supported in addition to if-expressions.

Conditional expressions are difficult to format in the output listing and complicate syntactic error recovery.

4.3.2  CONDITIONAL STATEMENTS

The CASE Statement

The CASE statement in RED has the following basic form:

Although modeled after the Pascal version, the RED CASE statement is distinctive in the following respects:

The use of the WHEN keyword rather than a label eliminates the ambiguity of Pascal's version and clearly delimits the selector for added reliability. Also, the use of keyword delimiters to separate the component parts of the compound statement is far more reliable than the single statement formalism of ALGOL 60 heritage.

The b's are bodies which can contain their own declarations, thus providing better support for top down programming. Since a body is a scope, a label declared in a body is invisible outside the body. There is thus no need for special rules disallowing GOTOs into the bodies.

While Pascal allows the e_i's to be lists of elements, RED allows them to be lists of elements and ranges. Large ranges are essentially impossible to describe in a Pascal CASE statement, while small ranges are tedious to read and write and are hard to compile efficiently.

The major departures from Pascal are that the e_i may be expressions (not necessarily manifest), and that in cases of overlap the language does not specify which alternative is elaborated. The allowance of nonmanifest expressions enables values from programmer-defined types to be used as case labels without complicating the rules for manifest computation (see Section 4.2). The decision to make a nondeterministic selection of a branch when the CASE expression matches more than one e_i (rather than raising an exception or choosing the first match) is motivated by considerations of program verifiability and consistency with the multi-way wait statement. It is important to note that the generality of the CASE statement in RED is paid for only when it is used. "Jump tables" may be used as an implementation technique for the branches whose labels are known at translation time (even when some branches of the same statement are not manifest).
The CASE statement in RED may be used to realize Dijkstra's "guarded command" control construct [Di76]:

The advantage of the RED form is that the frequently-occurring Pascal-style case is more readable and easier to optimize than the "guarded command" form.

The IF Statement

RED has a generalized form of the Pascal IF statement, achieved by replacing the single statement format with bodies, adding a closing delimiter (END IF), and allowing optional ELSEIF clauses. The use of bodies in the THEN and ELSE clauses has the same advantages described for the CASE statement but requires a. closing delimiter for the entire statement. (The choice of END IF is consistent with the RED conventions for such delimiters.) The ELSEIF clause then becomes useful for avoiding cascaded END IFs. For example,

is equivalent to:

The Steelman requires the provision of "short circuited" boolean operators (SM 6D). The possibility of meeting the requirement by building special syntax into the IF statement was rejected in favor of a more uniform treatment of the issue as part of the boolean operators themselves (cf. Section 3.4.1).

4.3.3  THE REPEAT STATEMENT

RED provides a simple set of constructs that meet the Steelman's requirements for iterative control (SM 6E): a REPEAT statement with either a test before each iteration (WHILE), or an indexed iteration (FOR). With the FOR clause the programmer may explicitly specify the sequencing in descending order over the values, when the language-defined default of ascending order is inappropriate.
One alternative considered was to allow an iteration statement with no WHILE or FOR control. Such a statement would continue to loop until an EXIT was elaborated in its body. The WHILE semantics could be realized by the placement of a conditional exit as its first statement. We rejected this alternative in the interest of reliability and readability -- it would be too easy to write an infinite loop unintentionally, and the WHILE case is frequent enough in practice to warrant a special syntax.
Another alternative was to provide added power to the REPEAT statement; e.g., to allow UNTIL semantics (test a boolean expression after each iteration, implying at least one elaboration of the body). The inclusion of such options on the REPEAT statement was rejected in the interest of simplicity. The syntax chosen for the FOR clause emphasizes the fact that the loop control variable is locally declared at that point; e.g.,
      FOR i: INT(1..n) REPEAT ... END REPEAT;
Since the loop control variable is readonly within the body of the repeat, and since it is inaccessible outside the body, it is simple for the implementation to optimize by preserving the variable in a register during loop elaboration.

4.3.4  CONTROL TRANSFER STATEMENTS

The EXIT Statement

The EXIT statement in RED is used when it is necessary to terminate elaboration of the body of a compound statement before elaboration of all its elements. For example, the following loop terminates when a particular value is first found in a table:

Though the EXIT statement is, in principle, unnecessary (its semantics could be realized with a GOTO), it serves the valuable purpose of enhancing program readability. The RED rule that each EXIT must explicitly specify the label of the compound statement to be terminated simplifies the task of program maintenance. If the language did not require this labeling (and thus if the EXIT were to terminate the most immediately enclosing compound statement), then a subsequent addition of a new compound statement surrounding the EXIT would change the intended behavior of the program.

The GOTO Statement

The GOTO statement satisfies the Steelman requirement for explicit control transfer (SM 6G). Some important features of the statement are as follows:

Labels for GOTOs are syntactically different from matching labels for EXITs. A reader will immediately notice any statement which is the target of a GOTO.

GOTOs cannot be used to transfer control out of routines or tasks.

The bodies of compound statements are scopes. A label appearing in such a body is local to that body; therefore, a GOTO cannot be used to jump into a compund statement, or between branches of conditional statements.

The RETURN Statement

RETURN statements terminate elaboration of a procedure, function, or task. For a procedure or task, the statement is simply:
      RETURN;
The form
      RETURN exp;
is used to terminate elaboration of a function.
One approach we considered did not include a RETURN statement at all; instead, an EXIT statement (such as EXIT q) would be used to terminate elaboration of the immediately enclosing routine q. With this approach, the result of a function would be returned via an explicitly declared result variable which would serve as the target of assignments during function elaboration.
We rejected this alternative for two reasons. First, it is useful (for readability) to have a clear syntactic distinction between exits from compound statements and returns from routines. Second, the use of a result variable to return values from functions suffers from a number of drawbacks, despite its advantage in making it easier for the translator to avoid an extra copy in some cases. Most significantly, use of a result variable is an inappropriate approach when a function return is specified with a type rather than a subtype. ln RED, functions are allowed to be declared this way since, in many cases, it is simpler for the programmer to figure out the subtype of a function result after entry to the function. Although this kind of declaration conflicts with SM 7D ("If a result is of a nonindirect array or record type then the number of its components must be determinable by the time of function call"), the price for the added generality is paid only when it is used, and it keeps the language rules uniform regarding type/subtype specifications on formal parameters and function results. However, when a type is specified on a function result, there is no sense in having a result variable. The only solution is to provide a RETURN statement which designates an expression whose value is to be returned. Consequently, this approach was adopted for RED.

Expressions Operator Precedence Levels Effect of Parentheses Manifest Expressions
Statements Introduction Conditional Statements The REPEAT Statement Control Transfer Statements

3. Types

5. Procedures & Functions

Overview
Requirements
      Strawman
      Woodenman
      Tinman
      Ironman
      Steelman
RED Reference
RED Rationale
Types in RED
Time/Life Computer Languages
Memories
Site Index

Overview Reference ToC Rationale ToC Site Index