4. EXPRESSIONS AND STATEMENTS
4.1 EXPRESSIONS
This section discusses the general syntactic properties of expressions in RED.
Semantic issues are described elsewhere in this document: operator overloading in
Section 8.1, side effects
in Section 5.2, disambiguation of literal and constructor
forms in Section 3.3.4.
4.1.1 OPERATOR PRECEDENCE LEVELS
RED complies with the Steelman requirement (SM 4F) concerning precedence
levels. Although a smaller number of precedence levels could have been achieved by
adopting the Pascal convention of merging the BOOL operators into the arithmetic
hierarchy (i.e., placing AND with the multiplicative operators, and OR and XOR with
the additive operators), we chose to use the ALGOL 60 [Na60] approach and to place g
the BOOL operators at lower precedence. In making this choice, we adopted the
advice of N. Wirth, the designer of Pascal. In a paper which assessed the Pascal
language, Wirth made the following statement [Wi77, p.194]:
In the interest of simplicity and efficient translatability, Pascal
aimed at a reasonably small number of operator precedence levels. ALGOL
60's hierarchy of 9 levels seemed clearly too baroque ... In retrospect,
however, the decision to break with a widely used tradition seems ill-advised,
particularly with the growing significance of complicated Boolean
expressions in connection with program verification.
4.1.2 EFFECT OF PARENTHESES
RED has adopted a simple rule concerning parenthesization:
- when parentheses are present, they dictate the grouping of operators with
operands, and
- when parentheses are absent, consecutive operators at the same precedence
level in an expression are grouped with their operands from left to right.
As an example of the first rule, the translator is not free to elaborate
x+(y+z) as (x+y)+z (or as y+(x+z), etc.) unless it can guarantee semantic
equivalence. Thus, in cases such as floating-point arithmetic where it may be
critical for the programmer to control operator/operand association,
parenthesization may be used for this purpose.
The second rule implies that an expression such as x-y+z is parsed as (x-y)+z.
Although it might be argued that x-y+z is "psychologically ambiguous" and that
parentheses should be required (cf. SM UG), considerations of language simplicity
dictate otherwise. The Ironman requirement 4G attempted to define a set of
circumstances under which parentheses are necessary ("whenever an expression has a
nonassociative operator to the left of an operator of the same precedence"), but
this constraint was overly restrictive. It implied, for example, that the
(psychologically unambiguous) expression a/b + c/d would be illegal, since the
nonassociative operator "/" is to the left of another occurrence of "/". The note
in Steelman 4G corrects this Ironman problem, since it suggests a weaker
restriction; viz, "requiring explicit parentheses to resolve the operator-operand
association whenever a nonassociative operator appears to the left of an operator of
the same precedence at the least-binding precedence level of any subexpression."
However, the price in language complexity outweighs the benefits of such a rule, and
we therefore adopted the simple and widely used left-to-right associativity
conventions.
It is important to note that left-to-right operator/operand association within
precedence levels does ng; imply left-to-right elaboration order. In particular,
although x-y+z is parsed as (x-y)+z, the order in which the x, y, and z operands are
elaborated is undefined.
4.2 MANIFEST EXPRESSIONS
One of the design decisions which is significant with respect to efficiency,
simplicity, and usability is the definition of the class of expressions which are
manifest -- i.e., whose values are guaranteed to be computable at translation time.
This is important because type checking is to be performed at translation time, and
the user can define type-determining properties which are specified by expressions.
For example, with the declaration
GENERIC n:INT
TYPE vector[n]: ARRAY INT (1..n) OF FLOAT (5, -1.0E7..1.0E7);
the types vector[1], vector[2], etc., are all different. But suppose the user
declares
VAR v1: vector[exp1];
VAR v2: vector[exp2];
Do v1 and v2 have the same type? The language must define conditions to be met by
exp1 and exp2 so that they can be guaranteed to be translation time evaluable.
In addition to their use in translation time property lists such as in the
above example, manifest values are required for floating point precision (SM 3-1D)
and are used in connection with the conditional translation facility. Specifically,
if the BOOL expression in an IF statement is manifest, only the selected branch is
translated (SM 6C); the analogous situation holds for CASE statements.
There is a wide range of design alternatives concerning the generality of
manifest expressions. One rather extreme position would be to limit manifest values
to literals. Pascal is very close to this extreme in that it requires either a
literal or the name of a literal (a declared CONSTant) and limits manifest values to
be from simple data types. At the other extreme, a language could place minimal
restrictions on manifest computation (e.g., I/O would be excluded). Such a
definition would allow translation time invocation of user-defined routines taking
parameters of user-defined types.
The restrictive Pascal approach is inappropriate. One problem is that it does
not adequately support conditional translation. Another difficulty is that it
complicates the writing and reading of programs which use simple functions of
manifest values. For example, if N were a declared manifest constant (equal to,
say, 10) and it were necessary to use the value N+1 in a manifest context, the user
would have to explicitly declare a new constant set equal to 11 and use it instead.
The opposite, general extreme has some advantages. It provides the user with
maximum power, and it seems to be simple in that few rules are required. However,
the level of generality presents severe language and implementation difficulties.
At the language level, complexities arise. User-defined functions generally
deal with data whose values are not manifest. For instance, most operators on the
abstract type matrix require several temporaries. What meaning .is attached to
manipulations of non-manifest data? Does the meaning change if the objects are not
local? Does the meaning change if the objects are defined in a separately
translated capsule? How is a manifest value distinguished?
The extremely general case also poses implementation problems. The translator
will need to incorporate an interpreter to determine manifest values. Depending
upon the answer to the question, "How are manifests distinguished?", the translator
may have to attempt compile time evaluation of arbitrary expressions, invoking
user-defined functions, with the resulting possibility of a translation time
infinite loop when the user has made an error. Since many of the abstract types are
likely to be in separately translated capsules, the translation time invocation of
separately translated routines would have to be supported. This in turn implies
that any change at all in a separately translated capsule may force retranslation of
every capsule which uses it.
RED has adopted a conservative compromise between the two extremes. The main
point is that manifest values are substitutable for literals of predefined types.
Thus literals are manifest, declared constants initialized to manifest expressions
are manifest, and built-in operations applied to manifest expressions yield manifest
values. Although the set of manifest values is limited, it provides the flexibility
which is needed in dealing with type properties and conditional translation.
Note that many expressions might in fact be computable at translation time even
though they are not manifest. (Such expressions may not be used in contexts where
the language requires manifest values, since whether translation time elaboration
occurs would be implementation dependent.) As an example, an optimizing translator
may choose to elaborate library routines (or user-defined functions) at translation
time.
4.3 STATEMENTS
4.3.1 INTRODUCTION
One of the basic objectives underlying the design of statements in the RED
language has been to provide support for top down program development. When
decomposing a high-level abstraction into several lower-level ones, the programmer
must be able to work in the local context. This implies not only that the language
allow nesting of control structures (SM 6A), but that the decomposition of a high
level abstraction be permitted to include declarations. The RED language satisfies
both requirements. Bodies inside compound statements may contain declarations, and
a begin statement (i.e., a block) is provided so that a declaration may be locally
introduced at the point at which it is needed.
When a program is developed in a top down manner, the need for a procedure is
discovered only when an implementation for some abstraction is determined. Since
the procedure is a low level implementation of the high level concept, the natural
place for the procedure's declaration is after its invocations. This allows the
reader to omit the details and allows the writer to proceed in a natural fashion.
It should be noted that RED, like Pascal, is a "statement" language as opposed
to an "expression" language, such as ALGOL 68 [VWi75]. RED has assignment
statements and IF statements but these statements do not return values;
consequently, RED does not allow conditional expressions or nested assignments
(i.e., IF statements or assignment statements used as expressions). Such constructs
occasionally provide notational convenience and can sometimes make the optimizer's
job easier; however, the problems they raise, summarized below, outweigh these
benefits.
- Statements by their nature produce side effects, and side effects in
expressions are to be minimized (SM 4C). If assignment statements were
permitted as expression constituents, this requirement would be severely
compromised. Even if the use of such a statement was restricted so that it
was valid only when it was the sole construct in the right-hand side of an
assignment (not as an operand in a larger expression) it would be possible
to "alter the value of a variable that can be accessed at the point of the
expression", in violation of SM UC. Consider:
x(i) := i := i+1;
- Ad hoc rules would be required. For example, conditional expressions must
have both THEN and ELSE clauses (both returning values of the same type),
whereas conditional statements can omit the ELSE clause. As another
example, it is not obvious whether the occurrence of an ambiguous
enumeration literal in the branch of a conditional expression should be
considered legal if the other branch can disambiguate it. (Consider the
invocation
WRITE(IF cond THEN 'A ELSE 'cent END IF)
where 'A and 'cent are elements of a non-ASCII character type.)
- A language that is mostly statement-oriented but partially
expression-oriented is a hybrid whose lack of uniformity would make it
harder to learn. For example, it might be expected that case-expressions be
supported in addition to if-expressions.
- Conditional expressions are difficult to format in the output listing and
complicate syntactic error recovery.
4.3.2 CONDITIONAL STATEMENTS
The CASE Statement
The CASE statement in RED has the following basic form:
Although modeled after the Pascal version, the RED CASE statement is
distinctive in the following respects:
- The use of the WHEN keyword rather than a label eliminates the ambiguity of
Pascal's version and clearly delimits the selector for added reliability.
Also, the use of keyword delimiters to separate the component parts of the
compound statement is far more reliable than the single statement formalism
of ALGOL 60 heritage.
- The b's are bodies which can contain their own declarations, thus providing
better support for top down programming. Since a body is a scope, a label
declared in a body is invisible outside the body. There is thus no need for
special rules disallowing GOTOs into the bodies.
- While Pascal allows the ei's to be lists of elements, RED allows them to be
lists of elements and ranges. Large ranges are essentially impossible to
describe in a Pascal CASE statement, while small ranges are tedious to read
and write and are hard to compile efficiently.
The major departures from Pascal are that the ei may be expressions (not
necessarily manifest), and that in cases of overlap the language does not specify
which alternative is elaborated. The allowance of nonmanifest expressions enables
values from programmer-defined types to be used as case labels without complicating
the rules for manifest computation (see Section 4.2). The decision to make a
nondeterministic selection of a branch when the CASE expression matches more than
one ei (rather than raising an exception or choosing the first match) is motivated
by considerations of program verifiability and consistency with the multi-way wait
statement. It is important to note that the generality of the CASE statement in RED
is paid for only when it is used. "Jump tables" may be used as an implementation
technique for the branches whose labels are known at translation time (even when
some branches of the same statement are not manifest).
The CASE statement in RED may be used to realize Dijkstra's "guarded command"
control construct [Di76]:
The advantage of the RED form is that the frequently-occurring Pascal-style case is
more readable and easier to optimize than the "guarded command" form.
The IF Statement
RED has a generalized form of the Pascal IF statement, achieved by replacing
the single statement format with bodies, adding a closing delimiter (END IF), and
allowing optional ELSEIF clauses. The use of bodies in the THEN and ELSE clauses
has the same advantages described for the CASE statement but requires a. closing
delimiter for the entire statement. (The choice of END IF is consistent with the
RED conventions for such delimiters.) The ELSEIF clause then becomes useful for
avoiding cascaded END IFs. For example,
is equivalent to:
The Steelman requires the provision of "short circuited" boolean operators (SM
6D). The possibility of meeting the requirement by building special syntax into the
IF statement was rejected in favor of a more uniform treatment of the issue as part
of the boolean operators themselves (cf. Section 3.4.1).
4.3.3 THE REPEAT STATEMENT
RED provides a simple set of constructs that meet the Steelman's requirements
for iterative control (SM 6E): a REPEAT statement with either a test before each
iteration (WHILE), or an indexed iteration (FOR). With the FOR clause the
programmer may explicitly specify the sequencing in descending order over the
values, when the language-defined default of ascending order is inappropriate.
One alternative considered was to allow an iteration statement with no WHILE or
FOR control. Such a statement would continue to loop until an EXIT was elaborated
in its body. The WHILE semantics could be realized by the placement of a
conditional exit as its first statement. We rejected this alternative in the
interest of reliability and readability -- it would be too easy to write an infinite
loop unintentionally, and the WHILE case is frequent enough in practice to warrant a
special syntax.
Another alternative was to provide added power to the REPEAT statement; e.g.,
to allow UNTIL semantics (test a boolean expression after each iteration, implying
at least one elaboration of the body). The inclusion of such options on the REPEAT
statement was rejected in the interest of simplicity.
The syntax chosen for the FOR clause emphasizes the fact that the loop control
variable is locally declared at that point; e.g.,
FOR i: INT(1..n) REPEAT ... END REPEAT;
Since the loop control variable is readonly within the body of the repeat, and since
it is inaccessible outside the body, it is simple for the implementation to optimize
by preserving the variable in a register during loop elaboration.
4.3.4 CONTROL TRANSFER STATEMENTS
The EXIT Statement
The EXIT statement in RED is used when it is necessary to terminate elaboration
of the body of a compound statement before elaboration of all its elements. For
example, the following loop terminates when a particular value is first found in a
table:
Though the EXIT statement is, in principle, unnecessary (its semantics could be
realized with a GOTO), it serves the valuable purpose of enhancing program
readability. The RED rule that each EXIT must explicitly specify the label of the
compound statement to be terminated simplifies the task of program maintenance. If
the language did not require this labeling (and thus if the EXIT were to terminate
the most immediately enclosing compound statement), then a subsequent addition of a
new compound statement surrounding the EXIT would change the intended behavior of
the program.
The GOTO Statement
The GOTO statement satisfies the Steelman requirement for explicit control
transfer (SM 6G). Some important features of the statement are as follows:
- Labels for GOTOs are syntactically different from matching labels for EXITs.
A reader will immediately notice any statement which is the target of a
GOTO.
- GOTOs cannot be used to transfer control out of routines or tasks.
- The bodies of compound statements are scopes. A label appearing in such a
body is local to that body; therefore, a GOTO cannot be used to jump into a
compund statement, or between branches of conditional statements.
The RETURN Statement
RETURN statements terminate elaboration of a procedure, function, or task. For
a procedure or task, the statement is simply:
RETURN;
The form
RETURN exp;
is used to terminate elaboration of a function.
One approach we considered did not include a RETURN statement at all; instead,
an EXIT statement (such as EXIT q) would be used to terminate elaboration of the
immediately enclosing routine q. With this approach, the result of a function would
be returned via an explicitly declared result variable which would serve as the
target of assignments during function elaboration.
We rejected this alternative for two reasons. First, it is useful (for
readability) to have a clear syntactic distinction between exits from compound
statements and returns from routines. Second, the use of a result variable to
return values from functions suffers from a number of drawbacks, despite its
advantage in making it easier for the translator to avoid an extra copy in some
cases. Most significantly, use of a result variable is an inappropriate approach
when a function return is specified with a type rather than a subtype. ln RED,
functions are allowed to be declared this way since, in many cases, it is simpler
for the programmer to figure out the subtype of a function result after entry to the
function. Although this kind of declaration conflicts with SM 7D
("If a result is
of a nonindirect array or record type then the number of its components must be
determinable by the time of function call"), the price for the added generality is
paid only when it is used, and it keeps the language rules uniform regarding
type/subtype specifications on formal parameters and function results. However,
when a type is specified on a function result, there is no sense in having a result
variable. The only solution is to provide a RETURN statement which designates an
expression whose value is to be returned. Consequently, this approach was adopted
for RED.