The Limbo Programming Language
Dennis M. Ritchie
(Revised 2005 by Vita Nuova)
Limbo is a programming language intended for applications running distributed systems on small computers. It supports modular programming, strong type checking at compile- and run-time, interprocess communication over typed channels, automatic garbage collection, and simple abstract data types. It is designed for safe execution even on small machines without hardware memory protection.
In its implementation for the Inferno operating system, object programs generated by the Limbo compiler run using an interpreter for a fixed virtual machine. Inferno and its accompanying virtual machine run either stand-alone on bare hardware or as an application under conventional operating systems like Unix, Windows 2000, Linux, FreeBSD, MacOSX, and Plan 9. For most architectures, including Intel x86, ARM, PowerPC, MIPS and Sparc, Limbo object programs are transformed on-the-fly into instructions for the underlying hardware.
1. Overview and introduction
A Limbo application consists of one or more modules, each of which supplies an interface declaration and an implementation part. A module that uses another module includes its declaration part. During execution, a module dynamically attaches another module by stating the other module’s type identifier and a place from which to load the object code for its implementation.
A module declaration specifies the functions and data it will make visible, its data types, and constants. Its implementation part defines the functions and data visible at its interface and any functions associated with its data types; it may also contain definitions for functions used only internally and for data local to the module.
Here is a simple module to illustrate the flavour of the language.
1 implement Command;
2 include "sys.m";
3 include "draw.m";
4 sys: Sys;
5 Command: module
{
6 init: fn (ctxt: ref Draw->Context, argv: list of string);
7 };
8 # The canonical "Hello world" program, enhanced
9 init(ctxt: ref Draw->Context, argv: list of string)
10 {
11 sys = load Sys Sys->PATH;
12 sys->print("hello world\n");
13 for (; argv!=nil; argv = tl argv)
14 sys->print("%s ", hd argv);
15 sys->print("\n");
16 }
A quick glance at the program reveals that the syntax of Limbo is influenced by C in its expressions, statements, and some conventions (for example, look at lines 13-14), and also by Pascal and its successors (the declarations on lines 4, 6, 9). When executed in the Inferno environment, the program writes hello world somewhere, then echoes its arguments.
Let’s look at the program line-by-line. It begins (line 1) by saying that this is the implementation of module Command. Line 2 includes a file (found in a way analogous to C’s #include mechanism) named sys.m. This file defines the interface to module Sys; it says, in part,
Sys: module {
PATH: con "$Sys";
. . .
print: fn (s: string, *): int;
. . .
};
This declares Sys to be the type name for a module containing among other things a function named print; the first argument of print is a string. The * in the argument list specifies that further arguments, of unspecified type, may be given.
Line 3 includes draw.m; only one piece of information, mentioned below, is used from it. Line 4 declares the variable sys to be of type Sys; its name will be visible throughout the remainder of the file describing this module. It will be used later to refer to an instance of the Sys module. This declaration initializes it to nil; it still needs to be set to a useful value.
Lines 5-7 constitute the declaration of Command, the module being implemented. It contains only a function named init, with two arguments, a ref Draw->Context and a list of strings, and it doesn’t return any value. The ref Draw->Context argument would be used if the program did any graphics; it is a data type defined in draw.m and refers to the display. Since the program just writes text, it won’t be used. The init function isn’t special to the Limbo language, but it is conventional in the environment, like main in C.
In a module designed to be useful to other modules in an application, it would be wise to take the module declaration for Command out, put it in a separate file called command.m and use include command.m to allow this module and others to refer to it. It is called, for example, by the program loader in the Inferno system to start the execution of applications.
Line 8 is a comment; everything from the # to the end of line is ignored.
Line 9 begins the definition for the init function that was promised in the module’s declaration (line 6). The argument that is a list of strings is named argv.
Line 11 connects the program being written to the Sys module. The first token after load is the target module’s name as defined by its interface (here found in the include on line 2) The next token is the place where the code for the module can be found; it is a string that usually names a file. Conventionally, in the Inferno system, each module contains a constant declaration for the name PATH as a string that names the file where the object module can be found. Loading the file is performed dynamically during execution except for a few modules built into the execution environment. (These include Sys; this accounts for the peculiar file name $Sys as the value of PATH.)
The value of load is a reference to the named module; line 11 assigns it to the variable sys for later use. The load operator dynamically loads the code for the named module if it is not already present and instantiates a new instance of it.
Line 12 starts the work by printing a familiar message, using the facilities provided by module Sys through its handle sys; the notation sys->print(...) means to call the print function of the module referred to by sys. The interface of Sys resembles a binding to some of the mechanisms of Unix and the ISO/ANSI C library.
The loop at lines 13-14 takes the list of string argument to init and iterates over it using the hd (head) and tl (tail) operators. When executed, this module combines the traditional ‘Hello world’ and echo.
2. Lexical conventions
There are several kinds of tokens: keywords, identifiers, constants, strings, expression operators, and other separators. White space (blanks, tabs, new-lines) is ignored except that it serves to separate tokens; sometimes it is required to separate tokens. If the input has been parsed into tokens up to a particular character, the next token is taken to include the longest string of characters that could constitute a token.
The native character set of Limbo is Unicode, which is identical with the first 16-bit plane of the ISO 10646 standard. Any Unicode character may be used in comments, or in strings and character constants. The implementation assumes that source files use the UTF-8 representation, in which 16-bit Unicode characters are represented as sequences of one, two, or three bytes.
2.1. Comments
Comments begin with the # character and extend to the end of the line. Comments are ignored.
2.2. Identifiers
An identifier is a sequence of letters and digits of which the first is a letter. Letters are the Unicode characters a through z and A through Z, together with the underscore character, and all Unicode characters with encoded values greater than 160 (A0 hexadecimal, the beginning of the range corresponding to Latin-1).
Only the first 256 characters in an identifier are significant.
2.3. Keywords
The following identifiers are reserved for use as keywords, and may not be used otherwise:
adt alt array big
break byte case chan
con continue cyclic do
else exit fn for
hd if implement import
include int len list
load module nil of
or pick real ref
return self spawn string
tagof tl to type
while
The word union is not currently used by the language.
2.4. Constants
There are several kinds of constants for denoting values of the basic types.
2.4.1. Integer constants
Integer constants have type int or big. They can be represented in several ways.
Decimal integer constants consist of a sequence of decimal digits. A constant with an explicit radix consists of a decimal radix followed by R or r followed by the digits of the number. The radix is between 2 and 36 inclusive; digits above 10 in the number are expressed using letters A to Z or a to z. For example, 16r20 has value 32.
The type of a decimal or explicit-radix number is big if its value exceeds 231−1, otherwise it is int.
Character constants consist of a single Unicode character enclosed within single-quote characters ’. Inside the quotes the following escape sequences represent special characters:
\\ backslash
\’ single quote
\" double quote
\a bell (BEL)
\b backspace (BS)
\t horizontal tabulation (HT)
\n line feed (LF)
\v vertical tabulation (VT)
\f form feed (FF)
\r carriage return (CR)
\udddd Unicode character named by 4 hexadecimal digits
\0 NUL
Character constants have type int.
2.4.2. Real constants
Real constants consist of a sequence of decimal digits containing one period . and optionally followed by e or E and then by a possibly signed integer. If there is an explicit exponent, the period is not required. Real constants have type real.
2.4.3. Strings
String constants are sequences of Unicode characters contained in double quotes. They cannot extend across source lines. The same escape sequences listed above for character constants are usable within string constants. Strings have type string.
2.4.4. The nil constant
The constant nil denotes a reference to nothing. It may be used where an object of a reference type is expected; otherwise uninitialized values of reference type start off with this value, it can be assigned to reference objects, and reference types can be tested for equality with it. (The keyword has other uses as well.)
2.5. Operators and other separators
The operators are
+ - * / % & | ^
== < > <= >= != << >>
&& || <- ::
= += -= *= /= %= &= |= ^= <<= >>=
:=
~ ++ -- ! **
The other separators are
: ; ( ) { } [ ]
, . -> =>
3. Syntax notation
In this manual, Limbo syntax is described by a modified BNF in which syntactic categories are named in an italic font, and literals in typewriter font. Alternative productions are listed on separate lines, and an optional symbol is indicated with the subscript ‘‘opt.’’
4. Types and objects
Limbo has three kinds of objects. Data objects exist in the storage associated with a module; they can be manipulated by arithmetic operations, assignment, selection of component entities, and other concrete operations. Each data object has a type that determines what can be stored in it and what operations are applicable.
The second kind of object is the function. Functions are characterized by the types of the arguments they accept and the values they return, and are associated with the modules in which they are defined. Their names can be made visible in their module’s declaration, or they can be encapsulated within the adt (abstract data types) of their modules, or they can exist privately within their module.
Finally, Limbo programs are organized into modules: a named collection of constants, abstract data types, data, and functions made available by that module. A module declaration displays the members visible to other modules; the module’s implementation defines both the publicly visible members and its private parts, including the data objects it uses. A module that wishes to use the facilities of another includes its declaration in order to understand what it exports, but before using them it explicitly loads the new module.
4.1. Types
Limbo has several basic types, some built-in higher abstractions, and other ways of composing new types. In declarations and some other places, constructions naming a type are used. The syntax is:
type:
data-type
function-type
Functions will be discussed in §7 below. First, data types will be explored.
4.2. Data types
The syntax of data types is
data-type:
CbyteI
CintI
CbigI
CrealI
CstringI
tuple-type
Carray of Idata-type
Clist of Idata-type
Cchan of Idata-type
adt-type
Cref Iadt-type
Cref Ifunction-type
module-type
module-qualified-type
type-name
data-type-list:
data-type
data-type-list C,I data-type
Objects of most data types have value semantics; when they are assigned or passed to functions, the destination receives a copy of the object. Subsequent changes to the assigned object itself have no effect on the original object. The value types are byte, int, big, real, string, the tuple types, and abstract data types or adt. The rest have reference semantics. When they are assigned, the quantity actually assigned is a reference to (a pointer to) an underlying object that is not copied; thus changes or operations on the assigned value affect the original object. Reference types include lists, arrays, channels, modules, ref adt, and ref fn types.
4.2.1. Basic types
The five basic data types are denoted by byte, int, big, real, and string.
Bytes are unsigned 8-bit quantities.
Integers (int) are 32-bit signed quantities represented in two’s complement notation. Large integers (big) are 64-bit signed quantities represented in two’s complement notation.
Real numbers (real) are 64-bit quantities represented in the IEEE long floating notation.
The byte, int, big, and real types are collectively called arithmetic types.
Strings are rows of Unicode characters. They may be concatenated and extended character-by-character. When a string is indexed with a single subscript, it yields an integer with the Unicode encoding of the character; when it is indexed by a range, it yields another string.
4.2.2. Tuple type
The tuple type, denoted
tuple-type:
C( Idata-type-listC )I
is a type consisting of an ordered collection of two or more objects, each having its own data type. For each tuple type, the types of the members are fixed, but need not be identical; for example, a function might return a tuple containing an integer and a string. Each tuple type is characterized solely by the the order and identity of the types it contains. Objects of tuple type may be assigned to a list of identifiers (to pick out the components), and a parenthesized, comma-separated list of expressions denotes a tuple.
4.2.3. Array types
The array type describes a dynamically-sized row of objects, all of the same type; it is indexed starting from 0. An array type is denoted by
Carray of Idata-type
The size of an array is not part of its type; instead it is part of the value. The data-type may itself be an array, to achieve a multidimensional array.
4.2.4. List types
A list is a sequence of like-typed objects; its denotation is
Clist of Idata-type
A list is a stack-like object, optimized for a few operations: get the head (the first object), get the tail (the rest of the list), place an object at the beginning.
4.2.5. Channel types
A channel, whose type is written
Cchan of Idata-type
is a communication mechanism capable of sending and receiving objects of the specified type to another agent in the system. Channels may be used to communicate between local processes; using library procedures, they may be connected to named destinations. In either case send and receive operations may be directed to them. For example,
chan of (int, string)
is the type of a channel that transmits tuples consisting of an integer and an string. Once an instance of such a channel (say c) has been declared and initialized, the statement
c <-= (123, "Hello");
sends such a tuple across it.
4.2.6. Abstract data types
An abstract data type or adt is an object that can contain data objects of several different types and declare functions that operate on them. The syntax for declaring an adt is given later. Once an adt has been declared, the identifier associated with it becomes a data-type name.
adt-type:
identifier
module-qualified-type
There is also a ref adt type representing a reference (pointer) to an adt. It is denoted
Cref Iadt-type
where the identifier is the name of an adt type.
4.2.7. Module types
A module type name is an identifier:
module-type:
identifier
The identifier is declared as a module identifier by a module-declaration, as described in §6.5 below. An object of module type serves as a handle for the module, and is used to access its functions.
4.2.8. Module-qualified type
When an adt is declared within a module declaration, the type name of that adt is not generally visible to the rest of the program unless a specific import request is given (see §6.6, §10 below). Without such a request, when adt objects implemented by a module are declared by a client of that module, the adt type name is qualified:
module-qualified-type:
identifier C->I identifier
Here the first identifier is either the name of a module or a variable of the module type; the second is the name of a type mentioned in the module declaration.
4.2.9. Function reference types
A function reference type represents a reference to a function of a given type. It is written as
Cref Ifunction-type
Function types are discussed in §4.3 below.
4.2.10. Named types
Finally, data types may be named, using a type declaration; this is discussed in §6.4 below.
type-name:
identifier
4.3. Function types
A function type characterizes the arguments and return value of a function. The syntax is
function-type:
Cfn Ifunction-arg-ret
function-arg-ret:
C( Iformal-arg-listOC ) IraisesO
C( Iformal-arg-listOC ) : Idata-type raisesO
formal-arg-list:
formal-arg
formal-arg-listC , Iformal-arg
formal-arg:
nil-or-ID-listC : Itype
nil-or-IDC : self refO Iidentifier
nil-or-IDC : self Iidentifier
C*I
nil-or-ID-list:
nil-or-ID
nil-or-ID-list C, Inil-or-ID
nil-or-ID:
identifier
CnilI
raises:
Craises ( Inil-or-ID-listC )I
CraisesI nil-or-ID
That is, the denotation of a function type has the keyword fn followed by a comma-separated list of its arguments enclosed in parentheses, and perhaps followed by the type the function returns. Absence of a return value means that the function returns no value: it is a procedure. The names and types of arguments are specified. However, the name of an argument may be replaced by nil; in this case it is nameless. For example,
fn (nil: int, nil: int): int
fn (radius: int, angle: int): int
fn (radius, angle: int): int
all denote exactly the same type, namely a function of two integers that returns an integer. As another example,
fn (nil: string)
is the type of a function that takes a string argument and returns no value.
The self keyword has a specialized use within adt declarations. It may be used only for the first argument of a function declared within an adt; its meaning is discussed in §6.3 below.
The star character * may be given as the last argument in a function type. It declares that the function is variadic; during a call, actual arguments at its position and following are passed in a manner unspecified by the language. For example, the type of the print function of the Sys module is
fn (s: string, *): int
This means that the first argument of print is a string and that other arguments may be given when the function is called. The Limbo language itself has no way of accessing these arguments; the notation is an artifice for describing facilities built into the runtime system, such as the Sys module.
The type of a function includes user-defined exceptions that it raises, which must be listed in a corresponding raises clause.
5. Limbo programs
Limbo source programs that implement modules are stored in files, conventionally named with the suffix .b. Each such file begins with a single implement directive naming the type of the module being implemented, followed by a sequence of declarations. Other files, conventionally named with the suffix .m, contain declarations for things obtainable from other modules. These files are incorporated by an include declaration in the implementation modules that need them. At the top level, a program consists of a sequence of declarations. The syntax is
program:
Cimplement Iidentifier-listC ; Itop-declaration-sequence
top-declaration-sequence:
top-declaration
top-declaration-sequence top-declaration
top-declaration:
declaration
identifier-listC := IexpressionC ;I
identifier-listC = IexpressionC ;I
C( Iidentifier-listC ) := IexpressionC ;I
module-declaration
function-definition
adt-declaration
The implement declaration at the start identifies the type of the module that is being implemented. The rest of the program consists of a sequence of various kinds of declarations and definitions that announce the names of data objects, types, and functions, and also create and initialize them. It must include a module declaration for the module being implemented and the objects it announces, and may also include declarations for the functions, data objects, types, and constants used privately within the module as well as declarations for modules used by it.
Declarations are used both at the top level (outside of functions) and also inside functions and module declarations. Some styles of declaration are allowed only in certain of these places, but all will be discussed together.
Most implementation modules provide an implementation for one type of module. Several module types may be listed, however, in the implement declaration, when the implementation module implements them all. When the same name appears in more than one such module type, it must have the same type.
6. Declarations
Declarations take several forms:
declaration:
identifier-listC : ItypeC ;I
identifier-listC : ItypeC = IexpressionC ;I
identifier-listC : con IexpressionC ;I
Iidentifier-listC : import Iidentifier C;I
identifier-listC : typeI typeC ;I
identifier-listC : exceptionI tuple-typeO
Cinclude Istring-constantC ;I
identifier-list:
identifier
identifier-listC , Iidentifier
expression-list:
expression
expression-listC , Iexpression
6.1. Data declarations
These forms constitute the basic way to declare and initialize data:
identifier-listC : ItypeC ;I
identifier-listC : ItypeC = IexpressionC ;I
A comma-separated sequence of identifiers is followed by a colon and then the name of a type. Each identifier is declared as having that type and denotes a particular object for rest of its scope (see §11 below). If the declaration contains = and an expression, the type must be a data type, and all the objects are initialized from the value of the expression. In a declaration at the top level (outside of a function), the expression must be constant (see §8.5) or an array initialized with constant expressions; the bound of any array must be a constant expression. Lists and ref adt types may not be initialized at the top level. If an object is not explicitly initialized, then it is always set to nil if it has a reference type; if it has arithmetic type, then it is set to 0 at the top level and is undefined if it occurs within a function.
For example,
i, j: int = 1;
r, s: real = 1.0;
declares i and j as integers, r and s as real. It sets i and j to 1, and r and s to 1.0.
Another kind of declaration is a shorthand. In either of
identifierC := IexpressionC ;I
C( Iidentifier-listC ) := IexpressionC ;I
identifiers on the left are declared using the type of the expression, and are initialized with the value of the expression. In the second case, the expression must be a tuple or an adt, and the types and values attributed to the identifiers in the list are taken from the members of the tuple, or the data members of the adt respectively. For example,
x: int = 1;
and
x := 1;
are the same. Similarly,
(p, q) := (1, 2.1);
declares the identifiers on the left as int and real and initializes them to 1 and 2.1 respectively. Declarations with := can also be expressions, and are discussed again in §8.4.4 below.
6.2. Constant declarations
The con declaration
Iidentifier-listC : conI expressionC ;I
declares a name (or names) for constants. The expression must be constant (see §8.5). After the declaration, each identifier in the list may be used anywhere a constant of the appropriate type is needed; the type is taken from the type of the constant. For example, after
Seven: con 3+4;
the name Seven is exactly the same as the constant 7.
The identifier iota has a special meaning in the expression in a con declaration. It is equivalent to the integer constant 0 when evaluating the expression for the first (leftmost) identifier declared, 1 for the second, and so on numerically. For example, the declaration
M0, M1, M2, M3, M4: con (1<<iota);
declares several constants M0 through M4 with the values 1, 2, 4, 8, 16 respectively.
The identifier iota is not reserved except inside the expression of the con declaration.
6.3. adt declarations
An adt or abstract data type contains data objects and functions that operate on them. The syntax is
adt-declaration:
IidentifierC : adt { Iadt-member-listOC } ;I
adt-member-list:
adt-member
adt-member-list adt-member
adt-member:
identifier-listC : cyclicO Idata-typeC ;I
identifier-listC : con IexpressionC ;I
identifier-listC : Ifunction-typeC ;I
Cpick { Ipick-member-listC }I
After an adt-declaration, the identifier becomes the name of the type of that adt. For example, after
Point: adt {
x, y: int;
add: fn (p: Point, q: Point): Point;
eq: fn (p: Point, q: Point): int;
};
the name Point is a type name for an adt of two integers and two functions; the fragment
r, s: Point;
xcoord: int;
...
xcoord = s.x;
r = r.add(r, s);
makes sense. The first assignment selects one of the data members of s; the second calls one of the function members of r.
As this example indicates, adt members are accessed by mentioning an object with the adt type, a dot, and then the name of the member; the details will be discussed in §8.13 below. A special syntactic indulgence is available for functions declared within an adt: frequently such a function receives as an argument the same object used to access it (that is, the object before the dot). In the example just above, r was both the object being operated on and the first argument to the add function. If the first formal argument of a function declared in an adt is marked with the self keyword, then in any calls to the function, the adt object is implicitly passed to the function, and is not mentioned explicitly in the actual argument list at the call site. For example, in
Rect: adt {
min, max: Point;
contains: fn(r: self Rect, p: Point): int;
};
r1: Rect;
p1: Point;
...
if (r1.contains(p1)) ...
because the first argument of the contains function is declared with self, the subsequent call to it automatically passes r1 as its first argument. The contains function itself is defined elsewhere with this first argument explicit. (This mechanism is analogous to the this construct in C++ and other languages, but puts the special-casing at the declaration site and makes it explicit.)
If self is specified in the declaration of a function, it must also be specified in the definition as well. For example, contains would be defined
Rect.contains(r: self Rect, p: Point)
{