The Inferno Shell

Roger Peppé

rog@vitanuova.com

ABSTRACT

The Inferno shell sh is a reasonably small shell that brings together aspects of several other shells along with Inferno’s dynamically loaded modules, which it uses for much of the functionality traditionally built in to the shell. This paper focuses principally on the features that make it unusual, and presents an example ‘‘network chat’’ application written entirely in sh script.

Introduction

Shells come in many shapes and sizes. The Inferno shell sh (actually one of three shells supplied with Inferno) is an attempt to combine the strengths of a Unix-like shell, notably Tom Duff’s rc, with some of the features peculiar to Inferno. It owes its largest debt to rc, which provides almost all of the syntax and most of the semantics too; when in doubt, I copied rc’s behaviour. In fact, I borrowed as many good ideas as I could from elsewhere, inventing new concepts and syntax only when unbearably tempted. See Credits for a list of those I could remember.

This paper does not attempt to give more than a brief overview of the aspects of sh which it holds in common with Plan 9’s rc. The reader is referred to sh(1) (the definitive reference) and Tom Duff’s paper ‘‘Rc - The Plan 9 Shell’’. I have occasionally pinched examples from the latter, so the differences are easily contrasted.

Overview

Sh is, at its simplest level, a command interpreter that will be familiar to all those who have used the Bourne-shell, C shell, or any of the numerous variants thereof (e.g. bash, ksh, tcsh). All of the following commands behave as expected:

date

cat /lib/keyboard

ls -l > file.names

ls -l /dis >> file.names

wc <file

echo [a-f]*.b

ls | wc

ls; date

limbo *.b &

An rc concept that will be less familiar to users of more conventional shells is the rôle of lists in the shell. Each simple sh command, and the value of any sh environment variable, consists of a list of words. Sh lists are flat: each is a simple ordered list of words, where a word is a sequence of characters that may include white-space or characters special to the shell. The Bourne-shell and its kin have no such concept, which means that every time the value of any environment variable is used, it is split into blank-separated words. For instance, the command:

x='-l /lib/keyboard'

ls $x

would in many shells pass the two arguments ‘‘-l’’ and ‘‘/lib/keyboard’’ to the ls command. In sh, it will pass the single argument ‘‘-l /lib/keyboard’’.
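The difference can be modelled outside the shell. The following Python sketch is only an illustration of the two behaviours, not a description of how either shell is implemented; both function names are invented for the example.

```python
# Illustrative model only: both helper names are hypothetical.

def bourne_style_args(value):
    """A Bourne-like shell re-splits a variable's value on blanks at each use."""
    return value.split()

def sh_style_args(value):
    """Inferno sh keeps the value as a list of words and passes each unchanged."""
    return list(value)

x = "-l /lib/keyboard"
print(["ls"] + bourne_style_args(x))   # ls receives two arguments
print(["ls"] + sh_style_args([x]))     # ls receives one argument
```

In the second call the variable is a one-element list, so the embedded space is simply part of the word.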

The following aspects of sh’s syntax will be familiar to users of rc.

File descriptor manipulation:

echo hello, world > /dev/null >[1=2]

Environment variable values:

echo $var

Count number of elements in a variable:

echo $#var

Run a command and substitute its output:

rm `{grep -li microsoft *}

Lists:

echo (((a b) c) d)

List concatenation:

cat /appl/cmd/sh/^(std regex expr)^.b

To the above, sh adds a variant of the `{} operator: "{}, which is the same except that it does not split the input into tokens, for example:

for i in "{echo one two three} {

    echo loop

}

will only print loop once.

Sh also adds a new redirection operator <>, which opens the standard input (by default) for reading and writing.

Command blocks

Possibly sh’s most significant departure from the norm is its use of command blocks as values. In a conventional shell, a command block groups commands together into a single syntactic unit that can then be used wherever a simple command might appear. For example:

{

    echo hello

    echo goodbye

} > /dev/null

Sh allows this, but it also allows a command block to appear wherever a normal word would appear. In this case, the command block is not executed immediately, but is bundled up as if it was a single quoted word. For example:

cmd = {

    echo hello

    echo goodbye

}

will store the contents of the braced block inside the environment variable $cmd. Printing the value of $cmd gets the block back again, for example:

echo $cmd

gives

{echo hello;echo goodbye}

Note that when the shell parsed the block, it ignored everything that was not syntactically relevant to the execution of the block; for instance, the white space has been reduced to the minimum necessary, and the newline has been changed to the functionally identical semi-colon.

It is also worth pointing out that echo is an external module, implementing only the standard Command(2) interface; it has no knowledge of shell command blocks. When the shell invokes an external command, and one of the arguments is a command block, it simply passes the equivalent string. Internally, built in commands are slightly different for efficiency’s sake, as we will see, but for almost all purposes you can treat command blocks as if they were strings holding functionally equivalent shell commands.

This equivalence also applies to the execution of commands. When the shell comes to execute a simple command (a sequence of words), it examines the first word to decide what to execute. In most shells, this word can be either the file name of an external command, or the name of a command built in to the shell (e.g. exit).

Sh follows these conventional rules, but first, it examines the first character of the first word, and if it is an open brace ({) character, it treats it as a command block, parses it, and executes it according to the normal syntax rules of the shell. For the duration of this execution, it sets the environment variable $* to the list of arguments passed to the block. For example:

{echo $*} hello world

is exactly the same as

echo hello world

Execution of command blocks is the same whether the command block is just a string or has already been parsed by the shell. For example:

{echo hello}

is exactly the same as

'{echo hello}'

The only difference is that the former case has its syntax checked for correctness as soon as the shell sees the script, whereas if the latter contains a malformed command block, a syntax error will be raised only when the shell comes to execute the command.

The shell’s treatment of braces can be used to provide functionality similar to the eval command that is built in to most other shells.

cmd = 'echo hello; echo goodbye'

'{'^$cmd^'}'

In other words, simply by surrounding a string with braces and executing it, the string will be executed as if it had been typed to the shell. Note the use of the caret (^) string concatenation operator. Sh does provide ‘free carets’ in the same way as rc, so in the previous example

'{'$cmd'}'

would work exactly the same, but generally, and in particular when writing scripts, it is good style to make the carets explicit.

Assignment and scope

The assignment operator in sh, in common with most other shells, is =.

x=a b c d

assigns the four element list (a b c d) to the environment variable named x. The value can later be extracted with the $ operator, for example:

echo $x

will print

a b c d

Sh also implements a form of local variable. An execution of a braced block command creates a new scope for the duration of that block; the value of a variable assigned with := in that block will be lost when the block exits. For example:

x = hello

{x := goodbye }

echo $x

will print ‘‘hello’’. Note that the scoping rules are dynamic - variable references are interpreted relative to their containing scope at execution time. For example:

x := hello

cmd := {echo $x}

{

    x := goodbye

    $cmd

}

will print ‘‘goodbye’’, not ‘‘hello’’. For one way of avoiding this problem, see ‘‘Lexical binding’’ below.
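The scoping behaviour just described can be mimicked with a stack of dictionaries. This is a toy Python model under my own assumptions (the class and method names are hypothetical, and only := and variable lookup are modelled); it is not how sh stores its environment.

```python
# Toy model of dynamic scoping; Env and its methods are hypothetical names.

class Env:
    def __init__(self):
        self.scopes = [{}]            # outermost scope first

    def define(self, name, value):
        """Model of ':=': bind name in the innermost scope."""
        self.scopes[-1][name] = value

    def lookup(self, name):
        """Model of '$name': search from the innermost scope outwards."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return []                     # unset variables modelled as empty lists

    def run_block(self, block):
        """Run a block in a fresh scope that is discarded when it exits."""
        self.scopes.append({})
        try:
            return block(self)
        finally:
            self.scopes.pop()

env = Env()
env.define("x", ["hello"])
cmd = lambda e: e.lookup("x")         # stands in for {echo $x}

def outer(e):
    e.define("x", ["goodbye"])
    return e.run_block(cmd)           # $cmd sees the binding live at run time

print(env.run_block(outer))           # ['goodbye']
print(env.lookup("x"))                # ['hello']
```

The block held in cmd carries no record of the scope it was written in, which is why it picks up whichever x is innermost when it finally runs.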

One late, but useful, addition to the shell’s assignment syntax is tuple assignment. This partially makes up for the lack of list indexing primitives in the shell. If the left hand side of the assignment operator is a list of variable names, each element of the list on the right hand side is assigned in turn to its respective variable. The last variable mentioned gets assigned all the remaining elements. For example, after:

(a b c) := (one two three four five)

a is one, b is two, and c contains the three element list (three four five). Similarly:

(first var) = $var

knocks the first element off $var and puts it in $first.
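As a rough Python model of the rule (the helper name is hypothetical, and giving a variable an empty list when the right hand side runs out is my assumption, not something stated above):

```python
def tuple_assign(names, values):
    """Model of (n1 n2 ... nk) := values, for a non-empty list of names.

    Every name but the last takes one element; the last takes the rest.
    Names left over when values run out get an empty list (an assumption).
    """
    env = {}
    for i, name in enumerate(names[:-1]):
        env[name] = values[i:i + 1]
    env[names[-1]] = values[len(names) - 1:]
    return env

print(tuple_assign(["a", "b", "c"], ["one", "two", "three", "four", "five"]))

# (first var) = $var, with $var holding (x y z)
print(tuple_assign(["first", "var"], ["x", "y", "z"]))
```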

One important difference between sh’s variables and variables in shells under Unix-like operating systems derives from the fact that Inferno’s underlying process creation primitive is spawn, not fork. This means that, even though the shell might create a new process to accomplish an I/O redirection, variables changed by the sub-process are still visible in the parent process. This applies anywhere a new process is created that runs synchronously with respect to the rest of the shell script - i.e. there is no chance of parallel access to the environment. For example, it is possible to get access to the status value of a command executed by the `{} operator:

files=`{du -a; dustatus = $status}

if {! ~ $dustatus ''} {

    echo du failed

}

When the shell does spawn an asynchronous process (background processes and pipelines are the two occasions that it does so), the environment is copied so changes in one process do not affect another.

Loadable modules

The ability to pass command blocks as values is all very well, but it does not in itself provide the programmability that is central to the power of shell scripts and is built in to most shells: the conditional execution of commands, for instance. The Inferno shell is different; it provides no programmability within the shell itself, but instead relies on external modules to provide this. It has a built in command load that loads a new module into the shell. The module that supports standard control flow functionality and a number of other useful tidbits is called std.

load std

loads this module into the shell. Std is a Dis module that implements the Shellbuiltin interface; the shell looks in the directory /dis/sh for the module file, in this case /dis/sh/std.dis.

When a module is loaded, it is given the opportunity to define as many new commands as it wants. Perhaps slightly confusingly, these are known as ‘‘built-in’’ commands (or just ‘‘builtins’’), to distinguish them from commands executed in a separate process with no access to shell internals. Built-in commands run in the same process as the shell, and have direct access to all its internal state (environment variables, command line options, and state stored within the implementing module itself). It is possible to find out what built-in commands are currently defined with the command loaded. Before any modules have been loaded, typing

loaded

produces:

builtin builtin

exit    builtin

load    builtin

loaded  builtin

run builtin

unload  builtin

whatis  builtin

${builtin}  builtin

${loaded}   builtin

${quote}    builtin

${unquote}  builtin

These are all the commands that are built in to the shell proper; I’ll explain the ${} commands later. After loading std, executing loaded produces:

!   std

and std

apply   std

builtin builtin

exit    builtin

flag    std

fn  std

for std

getlines    std

if  std

load    builtin

loaded  builtin

or  std

pctl    std

raise   std

rescue  std

run builtin

status  std

subfn   std

unload  builtin

whatis  builtin

while   std

~   std

${builtin}  builtin

${env}  std

${hd}   std

${index}    std

${join} std

${loaded}   builtin

${parse}    std

${pid}  std

${pipe} std

${quote}    builtin

${split}    std

${tl}   std

${unquote}  builtin

The name of each command defined by a loaded module is followed by the name of the module, so you can see that in this case std has defined commands such as if and while. These commands are reminiscent of the commands built in to the syntax of other shells, but have no special syntax associated with them: they obey the normal argument gathering and execution semantics.

As an example, consider the for command.

for i in a b c d {

    echo $i

}

This command traverses the list (a b c d) executing {echo $i} with $i set to each element in turn. In rc, this might be written

for (i in a b c d) {

    echo $i

}

and in fact, in sh, this is exactly equivalent. The round brackets denote a list and, like rc, all lists are flattened before passing to an executed command. Unlike the for command in rc, the braces around the command are not optional; as with the arguments to a normal command, gathering of arguments stops at a newline. The exception to this rule is that newlines within brackets are treated as white space. This last rule also applies to round brackets, for example:

(for i in

    a

    b

    c

    d

    {echo $i}

)

does the same thing. This is very useful for commands that take multiple command block arguments, and is actually the only line continuation mechanism that sh provides (the usual backslash (\) character is not in any way special to sh).

Control structures

Inferno commands, like shell commands in Unix or Plan 9, return a status when they finish. A command’s status in Inferno is a short string describing any error that has occurred; it can be found in the environment variable $status. This is the value that commands defined by std use to determine conditional execution - if it is empty, it is true; otherwise false. Std defines, for instance, a command ~ that provides a simple pattern matching capability. Its first argument is the string to test the patterns against, and subsequent arguments give the patterns, in normal shell wildcard syntax; its status is true if there is a match.

~ sh.y '*.y'

~ std.b '*.y'

give true and false statuses respectively. A couple of pitfalls lurk here for the unwary: unlike its rc namesake, the patterns are expanded by the shell if left unquoted, so one has to be careful to quote wildcard characters, or escape them with a backslash if they are to be used literally. Like any other command, ~ receives a simple list of arguments, so it has to assume that the string tested has exactly one element; if you provide a null variable, or one with more than one element, then you will get unexpected results. If in doubt, use the $" operator to make sure of that.

Used in conjunction with the $# operator, ~ provides a way to check the number of elements in a list:

~ $#var 0

will be true if $var is empty.

This can be tested by the if command, which accepts command blocks for its arguments, executing its second argument if the status of the first is empty (true). For example:

if {~ $#var 0} {

    echo '$var has no elements'

}

Note that the start of one argument must come on the same line as the end of the previous one, otherwise it will be treated as a new command, and always executed. For example:

if {~ $#var 0}

    {echo '$var has no elements'}   # this will always be executed

The way to get around this is to use list bracketing, for example:

(if {~ $#var 0}

    {echo '$var has no elements'}

)

will have the desired effect. The if command is more general than rc’s if, in that it accepts an arbitrary number of condition/action pairs, and executes each condition in turn until one is true, whereupon it executes the associated action. If the last condition has no action, then it acts as the ‘‘else’’ clause in the if. For example:

(if {~ $#var 0} {

        echo zero elements

    }

    {~ $#var 1} {

        echo one element

    }

    {echo more than one element}

)
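The way if walks its arguments can be sketched in Python, treating each command block as a function that returns a status string. The function name is hypothetical, and returning None when no action runs is purely a device of this sketch; it says nothing about the status the real command leaves behind.

```python
def sh_if(*blocks):
    """Model of std's if. '' is a true status; anything else is false.

    Blocks are taken as condition/action pairs; a trailing unpaired
    block plays the part of the else clause.
    """
    paired = len(blocks) // 2 * 2
    for i in range(0, paired, 2):
        if blocks[i]() == "":
            return blocks[i + 1]()
    if paired < len(blocks):
        return blocks[-1]()
    return None                      # nothing ran (marker used by this sketch)

def count_is(var, n):
    """Stands in for {~ $#var n}."""
    return lambda: "" if len(var) == n else "no match"

def describe(var):
    out = []
    sh_if(count_is(var, 0), lambda: out.append("zero elements") or "",
          count_is(var, 1), lambda: out.append("one element") or "",
          lambda: out.append("more than one element") or "")
    return out[0]

print(describe([]), describe(["a"]), describe(["a", "b"]), sep="; ")
```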

Std provides various other control structures. And and or provide the equivalent of rc’s && and || operators. They each take any number of command block arguments and conditionally execute each in turn. And stops executing when a block’s status is false; or stops when a block’s status is true:

and {~ $#var 1} {~ $var '*.sbl'} {echo variable ends in .sbl}

(or {mount /dev/eia0 /n/remote} 

    {echo mount has failed with $status}

)
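Modelled in Python with the same convention (an empty status string is true), the two commands differ only in the test that ends the loop. The function names are hypothetical, and the status returned for an empty argument list is an assumption of the sketch.

```python
def sh_and(*blocks):
    """Run blocks in order; stop at the first false (non-empty) status."""
    status = ""
    for block in blocks:
        status = block()
        if status != "":
            break
    return status

def sh_or(*blocks):
    """Run blocks in order; stop at the first true (empty) status."""
    status = ""
    for block in blocks:
        status = block()
        if status == "":
            break
    return status

trace = []
def block(name, status):
    """Build a block that records that it ran and returns the given status."""
    def run():
        trace.append(name)
        return status
    return run

sh_and(block("a", ""), block("b", "failed"), block("c", ""))
sh_or(block("mount", "no such device"), block("report", ""))
print(trace)   # ['a', 'b', 'mount', 'report']
```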

An extremely easy trap to fall into is to use $* inside a block assuming that its value is the same as that outside the block. For instance:

# this will not work

if {~ $#* 2} {echo two arguments}

It will not work because $* is set locally for every block, whether it is given arguments or not. A solution is to assign $* to a variable at the start of the block:

args = $*

if {~ $#args 2} {echo two arguments}

While provides looping, executing its second argument as long as the status of the first remains true. As the status of an empty block is always true,

while {} {echo yes}

will loop forever printing ‘‘yes’’. Another looping command is getlines, which loops reading lines from its standard input, and executing its command argument, setting the environment variable $line to each line in turn. For example:

getlines {

    echo '#' $line

} < x.b

will print each line of the file x.b preceded by a # character.

Exceptions

When the shell encounters some error conditions, such as a parsing error, or a redirection failure, it prints a message to standard error and raises an exception. In an interactive shell this is caught by the interactive command loop; in a script it will cause an exit with a false status, unless handled.

Exceptions can be handled and raised with the rescue and raise commands provided by std. An exception has a short string associated with it.

raise error

will raise an exception named ‘‘error’’.

rescue error {echo an error has occurred} {

    command

}

will execute command and will, in the event that it raises an error exception, print a diagnostic message. The name of the exception given to rescue can end in an asterisk (*), which will match any exception starting with the preceding characters. The * needs quoting to avoid being expanded as a wildcard by the shell.

rescue '*' {echo caught an exception $exception} {

    command

}

will catch all exceptions raised by command, regardless of name. Within the handler block, rescue sets the environment variable $exception to the actual name of the exception caught.

Exceptions can be caught only within a single process - if an exception is not caught, then the name of the exception becomes the exit status of the process. As sh starts a new process for commands with redirected I/O, this means that

raise error

echo got here

behaves differently to:

raise error > /dev/null

echo got here

The former prints nothing, while the latter prints ‘‘got here’’.

The exceptions break and continue are recognised by std’s looping commands for, while, and getlines. A break exception causes the loop to terminate; a continue exception causes the loop to continue as before. For example:

for i in * {

    if {~ $i 'r*'} {

        echo found $i

        raise break

    }

}

will print the name of the first file beginning with ‘‘r’’ in the current directory.
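The same control flow can be pictured with ordinary Python exceptions. This is a model only: the class and function names are hypothetical, and the file list is made up for the example.

```python
class ShException(Exception):
    """Stands in for a shell exception; name is its short string."""
    def __init__(self, name):
        super().__init__(name)
        self.name = name

def sh_for(items, body):
    """Model of std's for: body(item) may raise 'break' or 'continue'."""
    for item in items:
        try:
            body(item)
        except ShException as e:
            if e.name == "break":
                break
            if e.name != "continue":
                raise                # any other exception propagates outwards

found = []
def body(name):
    if name.startswith("r"):
        found.append(name)
        raise ShException("break")

sh_for(["limbo.b", "readme", "rules", "sh.b"], body)
print(found)   # ['readme']
```

Because break is just an exception with a particular name, a loop needs no special syntax to support it; it only has to catch that name.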

Substitution builtins

In addition to normal commands, a loaded module can also define substitution builtin commands. These are different from normal commands in that they are executed as part of the argument gathering process of a command, and instead of returning an exit status, they yield a list of values to be used as arguments to a command. They can be thought of as a kind of ‘active environment variable’, whose value is created every time it is referenced. For example, the split substitution builtin defined by std splits up a single argument into strings separated by characters in its first argument:

echo ${split e 'hello there'}

will print

h llo th r

Note that, unlike the conventional shell backquote operator, the result of the $ command is not re-interpreted, for example:

for i in ${split e 'hello there'} {

    echo arg $i

}

will print

arg h

arg llo th

arg r

Substitution builtins can only be named as the initial command inside a dollar-referenced command block - they live in a different namespace from that of normal commands. For instance, loaded and ${loaded} are quite distinct: the former prints a list of all builtin names and their defining modules, whereas the latter yields a list of all the currently loaded modules.

Std provides a number of useful commands in the form of substitution builtins. ${join} is the complement of ${split}: it joins together any elements in its argument list using its first argument as the separator, for example:

echo ${join . file tar gz}

will print:

file.tar.gz

The in-built shell operator $" is exactly equivalent to ${join} with a space as its first argument.
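For readers who find it easier to see the two operations side by side, here is a small Python model. The function names are hypothetical, and discarding empty fields in the split is an assumption chosen to agree with the output shown earlier for ‘‘hello there’’.

```python
def sh_split(separators, word):
    """Split word at any character that appears in separators.

    Empty fields are dropped (an assumption of this sketch).
    """
    fields, current = [], ""
    for ch in word:
        if ch in separators:
            fields.append(current)
            current = ""
        else:
            current += ch
    fields.append(current)
    return [f for f in fields if f != ""]

def sh_join(separator, *words):
    """Join the words into one string with separator between them."""
    return separator.join(words)

print(sh_split("e", "hello there"))         # ['h', 'llo th', 'r']
print(sh_join(".", "file", "tar", "gz"))    # file.tar.gz
print(sh_join(" ", "a", "b", "c"))          # what $" would give for (a b c)
```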

List indexing is provided with ${index}, which given a numeric index and a list yields the index’th item in the list (origin 1). For example:

echo ${index 4 one two three four five}

will print

four

A pair of substitution builtins with some of the most interesting uses are defined by the shell itself: ${quote} packages its argument list into a single string in such a way that it can be later parsed by the shell and turned back into the same list. This entails quoting any items in the list that contain shell metacharacters, such as ‘;’ or ‘&’. For example:

x='a;' 'b' 'c d' ''

echo $x

echo ${quote $x}

will print

a; b c d 

'a;' b 'c d' ''

Travel in the reverse direction is possible using ${unquote}, which takes a single string, as produced by ${quote}, and produces the original list again. There are situations in sh where only a single string can be used, but it is useful to be able to pass around the values of arbitrary sh variables in this form; ${quote} and ${unquote} between them make this possible. For instance the value of a sh list can be stored in a file and later retrieved without loss. They are also useful to implement various types of behaviour involving automatically constructed shell scripts; see ‘‘Lexical binding’’, below, for an example.
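The round trip can be modelled in a few lines of Python. Two things here are assumptions of mine rather than facts from this paper: the exact set of characters that forces quoting, and the rc-style convention that a doubled quote inside a quoted word stands for one literal quote. The function names are likewise hypothetical.

```python
# Characters that force quoting in this sketch (an assumed set).
SPECIAL = set(" \t\n;&|^$`'\"{}()<>#*?[]=")

def quote(words):
    """Pack a list of words into one string that unquote can reverse."""
    out = []
    for w in words:
        if w == "" or any(c in SPECIAL for c in w):
            w = "'" + w.replace("'", "''") + "'"   # '' is a literal quote
        out.append(w)
    return " ".join(out)

def unquote(s):
    """Recover the list of words from a string produced by quote."""
    words, i, n = [], 0, len(s)
    while i < n:
        if s[i] in " \t\n":
            i += 1
            continue
        word = ""
        while i < n and s[i] not in " \t\n":
            if s[i] != "'":
                word += s[i]
                i += 1
                continue
            i += 1                                  # skip the opening quote
            while i < n:
                if s[i] == "'":
                    if i + 1 < n and s[i + 1] == "'":
                        word += "'"
                        i += 2
                        continue
                    i += 1                          # skip the closing quote
                    break
                word += s[i]
                i += 1
        words.append(word)
    return words

x = ["a;", "b", "c d", ""]
print(quote(x))                   # 'a;' b 'c d' ''
print(unquote(quote(x)) == x)     # True
```

The empty word is the case most easily lost: without the explicit pair of quotes it would vanish when the string was split back into words.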

Two more list manipulation commands provided by std are ${hd} and ${tl}, which mirror their Limbo namesakes: ${hd} returns the first element of a list, ${tl} returns all but the first element of a list. For example:

x=one two three four

echo ${hd $x}

echo ${tl $x}

will print:

one

two three four

Unlike their Limbo counterparts, they do not complain if their argument list is not long enough; they just yield a null list.

Std provides three other substitution builtins of note. ${pid} yields the process id of the current process. ${pipe} provides a somewhat more cumbersome equivalent of the >{} and <{} commands found in rc, i.e. branching pipelines. For example:

cmp ${pipe from {old}} ${pipe from {new}}

will regression-test a new version of a command. Using ${pipe} yields the name of a file in the namespace which is a pipe to its argument command.

The substitution builtin ${parse} is used to check shell syntax without actually executing a command. The command:

x=${parse '{echo hello, world}'}

will return a parsed version of the string ‘‘echo hello, world’’; if an error occurs, then a parse error exception will be raised.

Functions

Shell functions are a facility provided by the std shell module; they associate a command name with some code to execute when that command is named.

fn hello {

    echo hello, world

}

defines a new command, hello, that prints a message when executed. The command is passed arguments in the usual way, for example:

fn removems {

    for i in $* {

        if {grep -s Microsoft $i} {

            rm $i

        }

    }

}

removems *

will remove all files in the current directory that contain the string ‘‘Microsoft’’.

The status command provides a way to return an arbitrary status from a function. It takes a single argument - its exit status is the value of that argument. For instance:

fn false {

    status false

}

fn true {

    status ''

}

It is also possible to define new substitution builtins with the command subfn: the value of $result at the end of the execution of the command gives the value yielded. For example:

subfn backwards {

    for i in $* {

        result=$i $result

    }

}

echo ${backwards a b c 'd e'}

will reverse a list, producing:

d e c b a

The commands associated with shell functions are stored as normal environment variables, and so are exported to external commands in the usual way. Fn definitions are stored in environment variables starting fn-; subfn definitions use environment variables starting sfn-. It is useful to know this, as the shell core knows nothing of these functions - they look just like builtin commands defined by std; looking at the current definition of $fn-name is the only way of finding out the body of code associated with function name.

Other loadable sh modules

In addition to std, and tk, which is mentioned later, there are several loadable sh modules that extend sh’s functionality.

Expr provides a very simple stack-based calculator, giving simple arithmetic capability to the shell. For example:

load expr

echo ${expr 3 2 1 + x}

will print 9.
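To see why the answer is 9, it helps to follow the stack. The Python sketch below is a hypothetical model of a reverse Polish evaluator, not the expr module itself; apart from + and x, which appear in the example, the operator set is my assumption.

```python
def rpn(*tokens):
    """Evaluate tokens in reverse Polish order and return the final stack.

    Numbers are pushed; an operator pops two values and pushes the result.
    """
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,     # assumed operator
        "x": lambda a, b: a * b,     # 'x' rather than '*', which is a wildcard
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))
    return stack

# 3 2 1 + x : push 3, 2, 1; '+' leaves 3 3; 'x' leaves 9
print(rpn("3", "2", "1", "+", "x"))   # [9]
```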

String provides shell level access to the Limbo string library routines. For example:

load string

echo ${tolower 'Hello, WORLD'}

will print

hello, world