Avail Syntax


Basic Syntax of Avail

If you're not familiar with formal descriptions of grammars, you may want to either (A) read up on the subject or (B) skip this section.

This is the syntax of Avail. Comments are delimited by /* and */ and may be nested. In the grammar below, square brackets group terms, a question mark indicates that the preceding term is optional, an asterisk allows zero or more occurrences of the preceding term, and a plus sign indicates one or more occurrences of the preceding term. Two asterisks indicate zero or more repetitions of the previous term, using the next term as a separator. Two plus signs indicate one or more repetitions using the next term as a separator. For example, A++x is equivalent to A[xA]*, and includes A, AxA, AxAxA, etc.

    module ->
            'Module' stringLiteral
            ['Pragma' strings]?
            ['Extends' strings]?
            ['Uses' strings]?
            ['Names' strings]?
            'Body' statements
    strings -> stringLiteral ** ','
    statements -> [statement ';']*
    statement -> assignment | declaration | expression
        assignment -> variable ':=' expression
        declaration -> variableDeclaration | constantDeclaration | initializingDeclaration
            variableDeclaration -> variable ':' typeExpr
            constantDeclaration -> variable '::=' expression
            initializingDeclaration -> variable ':' typeExpr ':=' expression
        expression -> expressionList | expressionItem
            expressionList -> expressionItem ',' expressionItem ++ ','
            expressionItem -> send | block | simple
                send -> [keyword | sendArgument]+
                    sendArgument -> expression ['::' typeExpr]?
                block -> '[' [formal ++ ',' '|']? prim? label? statements ']' [':' typeExpr]?
                    formal -> variable ':' typeExpr
                    prim -> 'Primitive' integerLiteral ';'
                    label -> '$' variable ';'
                simple -> variable | reference | integerLiteral | stringLiteral
                    reference -> '&' variable
    typeExpr -> expression
    variable -> [a-zA-Z][a-zA-Z0-9]*
    stringLiteral -> '"' [character | '""']* '"'
    integerLiteral -> [0-9]+
    keyword -> [a-zA-Z][a-zA-Z0-9]*
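
The two separator operators used in the grammar, ** and ++, can be checked mechanically. The Python sketch below is not part of Avail; it translates each operator into a regular expression, assuming the term and the separator are plain literal strings. The helper names are made up for illustration.

```python
import re


def one_or_more_sep(term: str, sep: str) -> str:
    """Regex for `term ++ sep`, which expands to term [sep term]*."""
    t, s = re.escape(term), re.escape(sep)
    return f"{t}(?:{s}{t})*"


def zero_or_more_sep(term: str, sep: str) -> str:
    """Regex for `term ** sep`: the same as ++, but the empty sequence is allowed."""
    return f"(?:{one_or_more_sep(term, sep)})?"


def matches(pattern: str, text: str) -> bool:
    """True when the whole of `text` is described by `pattern`."""
    return re.fullmatch(pattern, text) is not None
```

With these helpers, A, AxA and AxAxA all match A++x, while the empty string matches only A**x.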


Statements

Because Avail's grammar has so many potential ambiguities, the compiler must use not only the syntax but also some basic semantics to disambiguate expressions. Type consistency is used to discard otherwise possible interpretations. The statement is the "boundary" in terms of how far the compiler is willing to consider multiple interpretations. If there are two or more ways of compiling a statement, each of which is type consistent, then an ambiguity error is immediately reported. That's why the semicolon character, which marks the end of every statement, may not be reused as a keyword. There is also a facility for setting up precedence relations between operations (e.g., + and *) to assist with disambiguation of expressions.
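
One way to picture this rule is as a filter over the candidate interpretations of a single statement. The following Python sketch is only an illustration, not the Avail compiler: the Interpretation record and both error classes are hypothetical names, and the handling of a statement with no type-consistent reading is an assumption, since the text only describes the ambiguous case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    """One possible reading of a statement (hypothetical model)."""
    description: str
    type_consistent: bool


class AmbiguityError(Exception):
    """Two or more readings of the statement survive type checking."""


class NoInterpretationError(Exception):
    """No reading survives type checking (assumed behaviour)."""


def choose_interpretation(candidates):
    """Return the single type-consistent reading of one statement."""
    # Type consistency discards otherwise possible interpretations.
    viable = [c for c in candidates if c.type_consistent]
    if not viable:
        raise NoInterpretationError("no type-consistent interpretation")
    if len(viable) > 1:
        names = ", ".join(c.description for c in viable)
        raise AmbiguityError(f"ambiguous statement: {names}")
    return viable[0]
```

The candidates are considered one statement at a time, which mirrors the statement acting as the boundary for disambiguation.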

Assignments

    x := 5;

An assignment changes a variable to contain a given value. This does not require the variable to already have a value. It does require that the value being assigned has a type that is compatible with the variable's type. Moreover, you have to prove it to the compiler at compile time: the static type of the right-hand side of the assignment must match (be a subtype of) the type specified in the variable's declaration.

Declarations

There are three forms of declarations for variables:

    x : integer;
    x : integer := 5;
    x ::= 5;

The most common form of declaration is called a variable declaration. The name being declared appears first, as in the other two forms. Then there is a single colon and a type expression. The type expression is an expression which is compiled and evaluated when the declaration statement is being compiled. Thus, a type expression may not refer to the variables that are in scope for normal expressions (except for module variables). Typically, the type expression is some system type like "integer", but it could be something more complex, like "(1 + 2) type".

An initializing declaration has the variable name, the colon, and the type expression that are present in a variable declaration, but then it has a colon and an equals sign followed by a normal expression. The initializing declaration is just a convenient short form of a variable declaration followed by an assignment statement.

The constant declaration has just the variable name, two colons and an equals sign, and an expression. It also acts like a declaration followed by an assignment, but it asks the compiler to make sure there are no other assignments to the variable (making it "constant"). The type of the variable is omitted; the compiler uses the static type of the expression instead. The two colons and equals sign mimic a colon, an absent type expression, a colon and equals sign, and an expression to assign, as in an initializing declaration.
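
The relationship between the three declaration forms and assignment can be modelled in a few lines of Python. This is a rough sketch under several assumptions: Python classes stand in for Avail types, issubclass stands in for the subtype test, and the checks run when the code executes, whereas Avail performs them at compile time. All names here are hypothetical.

```python
class ConstantAssignmentError(Exception):
    """A constant was assigned a second time (hypothetical name)."""


class Variable:
    _UNASSIGNED = object()  # marks a variable that has no value yet

    def __init__(self, name, declared_type, constant=False):
        self.name = name
        self.declared_type = declared_type
        self.constant = constant
        self._value = Variable._UNASSIGNED

    def assign(self, value, static_type):
        """Model of `name := expression`."""
        if self.constant and self._value is not Variable._UNASSIGNED:
            raise ConstantAssignmentError(self.name)
        # The static type of the right-hand side must be a subtype
        # of the declared type.
        if not issubclass(static_type, self.declared_type):
            raise TypeError(f"cannot assign {static_type.__name__} to {self.name}")
        self._value = value

    def value(self):
        if self._value is Variable._UNASSIGNED:
            raise ValueError(f"{self.name} has no value yet")
        return self._value


def declare_variable(name, declared_type):
    """Model of `x : integer;` (no value yet)."""
    return Variable(name, declared_type)


def declare_initialized(name, declared_type, value, static_type):
    """Model of `x : integer := 5;` as a declaration plus an assignment."""
    var = Variable(name, declared_type)
    var.assign(value, static_type)
    return var


def declare_constant(name, value, static_type):
    """Model of `x ::= 5;` where the type comes from the expression."""
    var = Variable(name, static_type, constant=True)
    var.assign(value, static_type)
    return var
```

Note that the constant form never names a type: the declared type is simply the static type of the initializing expression.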

Expressions

Expressions have several forms:

Literals

    123
    "abc"
    "Hello, my name is ""Avail""."

At this time, only non-negative decimal integers and quoted strings are supported as literals. Negative integers are actually an invocation of the unary negation operator. String literals are enclosed in double quotes. To indicate a double quote character within the string, use two consecutive double quotes.
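
The quote-doubling rule is easy to get wrong by hand, so here is a small Python sketch of how a literal's text maps to the string it denotes. The function name is hypothetical and the error handling is an assumption; only the doubling rule itself comes from the description above.

```python
def decode_string_literal(literal: str) -> str:
    """Return the string denoted by a double-quoted literal."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("a string literal must be enclosed in double quotes")
    body = literal[1:-1]
    result = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            # Inside the literal, a quote is only legal as a doubled pair.
            if i + 1 < len(body) and body[i + 1] == '"':
                result.append('"')
                i += 2
            else:
                raise ValueError("unescaped double quote inside literal")
        else:
            result.append(ch)
            i += 1
    return "".join(result)
```

Applied to the third literal above, the function yields the text Hello, my name is "Avail". with a single pair of quotes around the name.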

Variables

    x
    thatDarnedCat
    catch22

Wherever a variable name occurs in a program, its current value will be used at runtime. The exceptions to this are during the declaration of a variable, in an assignment to a variable, in a reference expression, in a block's formal argument declarations, and in a label declaration. In these cases, it's the variable itself that is being mentioned, not its value. There is also the possibility that a powerful language construct called syntactic types will be implemented some day. This would be a generalization of the above quoting mechanisms, and these mechanisms could be implemented cleanly in terms of syntactic types.

References

    &thatDarnedCat

A reference expression is a way of talking about a variable without automatically extracting its contents. Since variables are first-class objects in Avail (unlike halfway OO languages like Smalltalk :-), they can be passed around as objects. So quoting them with a reference expression is occasionally useful. The syntax is just an ampersand (&) followed by the variable name. It's kind of a parody of C's syntax. You can extract the value of a variable by using a prefixed asterisk (*), continuing the parody of C.
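
To see why passing the variable itself is occasionally useful, consider this Python sketch. It is an analogy rather than Avail semantics: the VariableCell class is a hypothetical stand-in for a first-class variable, handing the cell to a function plays the role of &x, and calling get() plays the role of the prefixed asterisk.

```python
class VariableCell:
    """A first-class variable: a named holder whose contents can change."""

    def __init__(self, name, value):
        self.name = name
        self._value = value

    def get(self):
        """Extract the current contents (the role of a prefixed *)."""
        return self._value

    def set(self, value):
        self._value = value


def increment(cell):
    """Receives the variable itself, so the caller sees the change."""
    cell.set(cell.get() + 1)


def increment_value(value):
    """Receives only the contents, so the caller's variable is untouched."""
    return value + 1
```

Calling increment on a cell changes what the caller later reads from it, while increment_value can only hand back a new value.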

Blocks

    [Print "hello world";]
    [x : integer | x + 2;] : integer
    [Print 1; Print 2; Print 3;]

As with variables and references, sometimes you want to talk about a piece of code and pass it around without executing it first. That's what blocks are for. A block is basically a sequence of statements in square brackets. It can also have a list of formal argument declarations just after the open bracket, terminated by a vertical bar (|). Also, a label may immediately precede the statements. A primitive declaration may appear after the label (or where the label would be, if absent). The close bracket may be followed by a colon and a type expression (evaluated at compile time) that indicates the type of object returned by the block.

Lists

    5,6,7
    "hello","world",2+2

A list is two or more expressions separated by commas. A list is a first class object. The list type is not a subtype of all, but instead directly inherits from void. Some of the operations in the Avail library take lists as arguments. Usually these operations are defining new syntax for creating collections (e.g., "{_}" takes a list as its argument and creates a set) or evaluating functions with an unknown number of arguments (like "_(_)").

Message Sends

    Print 5
    1+2
    Halt
    t[5]
    t[5..10]
    1 <= x <= 10
    p & q
    if conditionExpr then trueBlock else falseBlock

A message send is a polymorphic call to another method. Which method is to be invoked depends on the actual arguments to the call, which are supplied at runtime. The syntax of a message send is very general. If a message is named "foo", then the keyword "foo" can be interpreted as invoking the method named "foo" with no arguments (note that there can be only one such method with that name, as overloading is clearly impossible when there are no arguments). If a message looks like "foo_", then foo is a prefix method, and occurrences of the keyword "foo" followed by a suitable argument expression (one for whose type an overload exists) can be interpreted as a send of the message "foo_". Similarly for "_foo", but the keyword occurs after the argument. An infix operation like "_foo_" also makes sense, requiring an argument on either side of the keyword. There may be multiple keywords as well, such as "foo bar bar", or "foo_kaboom_and then do_". This last one takes three arguments, one for each underscore. A space in the method name matches any amount of whitespace at a call site. Punctuation characters are always scanned as one-character tokens, so an operation like "_+*_" is invoked with an argument, the "+" token, optional whitespace, the "*" token, and the other argument. Similarly, "_+abc*_" is an argument, the "+" token, optional whitespace, the "abc" token, optional whitespace, the "*" token, and the final argument.
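
The way a message name breaks down into keywords, punctuation tokens and argument slots can be sketched in Python. This is an illustrative tokenizer based only on the rules just described, not Avail's actual scanner; the function names are hypothetical, and alphanumeric runs are assumed to form a single keyword token as in the grammar's keyword rule.

```python
from typing import List


def message_parts(name: str) -> List[str]:
    """Split a message name into keyword tokens and "_" argument slots."""
    parts = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch.isspace():
            i += 1  # a space only separates tokens
        elif ch == "_":
            parts.append("_")  # an argument position
            i += 1
        elif ch.isalnum():
            j = i
            while j < len(name) and name[j].isalnum():
                j += 1
            parts.append(name[i:j])  # a run of letters and digits is one keyword
            i = j
        else:
            parts.append(ch)  # punctuation is always a one-character token
            i += 1
    return parts


def argument_count(name: str) -> int:
    """Number of arguments a message takes: one per underscore."""
    return message_parts(name).count("_")
```

For instance, message_parts applied to "_+abc*_" gives the five parts _, +, abc, *, _ and argument_count reports three arguments for "foo_kaboom_and then do_".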

All the messages in the preceding paragraph can be defined in the same module without conflict. Sending these messages would be another matter, as "foo foo" might mean either to invoke "foo" then invoke "_foo" on the result, or to invoke "foo" and then invoke "foo_" on the result. Don't even think about what "foo foo foo" could mean. Obviously, this scheme can lead to pathologically awkward syntaxes, but the idea isn't to keep bad programmers from sawing their leg off. The idea is to provide reasonable safeguards against common errors by programmers of good intent and reasonable skill and wisdom, while giving them enough power to get the job done well. If you can't shoot yourself in the foot with a tool, you probably can't shoot an attacking beastie with it either.
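
For the brave, the number of readings can be counted. The Python sketch below assumes that only four of the messages are in play ("foo", "foo_", "_foo" and "_foo_") and that type checking and precedence rules, which would normally discard some readings, are ignored. The function name is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parses(n: int) -> int:
    """Count the ways to read n consecutive `foo` tokens as one expression."""
    if n <= 0:
        return 0
    if n == 1:
        return 1  # only the nullary send "foo"
    prefix = count_parses(n - 1)   # "foo_": foo followed by an expression
    postfix = count_parses(n - 1)  # "_foo": an expression followed by foo
    # "_foo_": i tokens on the left, one foo keyword, the rest on the right.
    infix = sum(count_parses(i) * count_parses(n - 1 - i) for i in range(1, n - 1))
    return prefix + postfix + infix
```

Under these assumptions count_parses(2) is 2, matching the two readings of "foo foo" described above, and count_parses(3) is 5.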

Occasionally it is useful, from the body of some specialized method, to invoke a more general version of the method (somewhere above it in the hierarchy). This can be accomplished with the supercast notation. Just write two colons and a type expression after one or more of the arguments to a message send. This has the effect of using the result of these type expressions for the lookup of this one message send. Arguments that don't have the supercast notation applied to them are dispatched on normally, based on their actual runtime types. In order for the supercast notation to obey the type system, the type expression must yield a type that is a supertype of the argument expression's static type. For example, ("two"::integer) + 5 will obviously not work, and will be rejected by the compiler without so much as a chuckle.
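
The effect of a supercast on lookup can be modelled in Python. This is a simplified sketch under stated assumptions: it handles a single argument, uses Python classes with single inheritance in place of Avail types, and invents a small Number, Integer, SmallInteger hierarchy purely for illustration. It also performs the supertype check at call time, whereas Avail's compiler does it statically.

```python
class Number:
    pass


class Integer(Number):
    pass


class SmallInteger(Integer):
    pass


def lookup(implementations, lookup_type):
    """Pick the most specific implementation that applies to lookup_type."""
    applicable = [t for t in implementations if issubclass(lookup_type, t)]
    if not applicable:
        raise LookupError(f"no method for {lookup_type.__name__}")
    # With single inheritance, the longest ancestor chain is the most specific.
    best = max(applicable, key=lambda t: len(t.__mro__))
    return implementations[best]


def send(implementations, argument, static_type, supercast=None):
    """Dispatch on the runtime type, or on the supercast type if one is given."""
    if supercast is None:
        lookup_type = type(argument)
    else:
        # The supercast type must be a supertype of the static type.
        if not issubclass(static_type, supercast):
            raise TypeError("supercast must name a supertype of the static type")
        lookup_type = supercast
    return lookup(implementations, lookup_type)(argument)
```

Given one implementation for Number and one for Integer, a plain send with a SmallInteger argument reaches the Integer version, while the same send with a supercast to Number reaches the more general one. A supercast of a string to Integer is rejected, just like ("two"::integer) + 5.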

Unlike some languages, this supercast notation can be used on any message send, not just ones that have the same selector as the method doing the invoking. Use this power wisely, and don't make me take it away from you later!

See also type hierarchy and type instantiation graph.

Labels

Labels are used to build loops and other control structures, such as backtracking.

A label occurs at the start of a block (optionally), just before the first statement. It's a dollar sign ($) followed by a variable name and a semicolon. It has the effect of declaring this variable local to the block. Each time this block is invoked, this variable will be initialized with the current (immutable) continuation. If nothing is done with this continuation, there is no effect. If the Exit message is sent to it, execution will continue with the continuation that caused the block to be invoked (i.e., the caller of the block containing the label). If the Restart message is sent to the continuation, execution will continue with the continuation itself, effectively restarting it as though the block were being invoked again from its original caller (with the same arguments as the first time).

Example:

    str : string;
    Print "Enter a number";
    [
        $someLabel;
        str := read line;
        if str = "Y" | str = "N" then [Exit someLabel;];
        Print "How sad, I really meant a boolean (Y or N).  Try again.";
        Restart someLabel;
    ]();
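
For readers more at home with exceptions than continuations, the control flow of this example can be approximated in Python. This is only an analogy: exceptions stand in for the Exit and Restart messages, the sketch supports a single label rather than nested ones, and the input lines are passed in as a list instead of being read from the console. All names are hypothetical.

```python
class ExitLabel(Exception):
    """Stands in for sending Exit to the label's continuation."""


class RestartLabel(Exception):
    """Stands in for sending Restart to the label's continuation."""


def invoke_with_label(body):
    """Run body; start it again on RestartLabel, return to the caller on ExitLabel."""
    while True:
        try:
            body()
            return  # the body finished without using the label
        except RestartLabel:
            continue  # re-enter the block as though it were invoked again
        except ExitLabel:
            return  # resume with the caller of the block


def ask_yes_no(lines, output):
    """Consume lines until one is "Y" or "N"; complaints go to output."""
    remaining = iter(lines)
    answer = []

    def body():
        line = next(remaining)
        if line in ("Y", "N"):
            answer.append(line)
            raise ExitLabel()
        output.append("How sad, I really meant a boolean (Y or N).  Try again.")
        raise RestartLabel()

    invoke_with_label(body)
    return answer[0]
```

Each unacceptable line triggers one restart of the block, and the first acceptable line exits it.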
