.FP lucidasans
.TL
C Programming in Plan 9 from Bell Labs
.AU
Pietro Gagliardi
.AB
This paper is an introduction to programming Plan 9 from Bell Labs in the C language.
Plan 9 provides not only a significantly improved version of C, but also a number of programming libraries that simplify complicated tasks.
This paper is meant to supplement the manual pages, the other documents provided by the system in
.CW /sys/doc ,
and a programmer's literature collection.
.AE
.\" started July 8, 2008
.nr XT 4 \" tab in program is four spaces
.EQ
delim @@
.EN
.de Bx \" BX, but works like B, I, BI, CW
.ie t \\&\\$3\(br\|\\$1\|\(br\l'|0\(rn'\l'|0\(ul'\\$2
.el \\&\\$3\(br\\kA\|\\$1\|\\kB\(br\v'-1v'\h'|\\nBu'\l'|\\nAu'\v'1v'\l'|\\nAu'\\$2
..
.NH
Introduction
.PP
Plan 9 from Bell Labs has always been a system above the rest: simple, portable, and feature-complete.
It isn't
.UX ;
rather, it improves on the basics of
.UX
by providing a number of features absent from most other operating systems.
One of those features is a programming environment that rivals
.UX 's.
Plan 9 is fully Unicode-conformant through its nearly universal use of the UTF-8 encoding, which was designed by two of the people who brought us Plan 9.
It keeps the C language of old, but through the work of Ken Thompson it also provides a C that makes some otherwise complicated constructs straightforward.
Backing this new C are 33 programming libraries that significantly reduce the amount of code a programmer needs to write.
And every line of this code is portable among Plan 9 installations, even across different architectures: the notion of a
.CW configure
script has been vanquished at last.
.PP
Learning to program in Plan 9 does not require complicated textbooks and four years of college study.
In fact, with just the manual pages and some of the supplementary documentation in hand, you can quickly master the core concepts.
However, a starter's guide or tutorial is sometimes needed, to get going or to clear up some uncertainty.
This paper aims to fill that need.
It is
.I not
a full reference to Plan 9's programming environment; the manual pages do that.
Keep this in mind while you read.
.PP
To start, you need to know how to use Plan 9 from Bell Labs, rc, an editor such as sam or acme, and the C programming language.
The standard guide to C is Brian Kernighan and Dennis Ritchie's
.I "The C Programming Language" ,
now in its second edition.
Read through it: you'll learn quite a lot.
.NH
Core Concepts
.PP
Here is the quintessential "hello, world" program from Kernighan and Ritchie's book, in Standard C and with a few changes for exposition.
.P1
#include <stdio.h>

int
main()
{
	printf("hello, world\en");
	return 0;
}
.P2
Now here it is as a Plan 9 programmer would write it.
.P1
#include <u.h>
#include <libc.h>

void
main()
{
	print("hello, world\en");
	exits(0);
}
.P2
Immediately, expert C programmers will ask
“Where did
.CW stdio
go?”
and shout at the top of their lungs
“You can't declare
.CW main
as returning
.CW void !”
If you're one of them, you'd better get used to it.
.PP
The include file
.CW u.h ,
stored in
.CW /$objtype/include ,
where
.CW $objtype
is an environment variable holding the current CPU type, contains CPU-specific definitions.
All other Plan 9 header files depend on it, so it must be included first.
Next comes
.CW libc.h ,
stored in
.CW /sys/include .
.CW libc.h
contains the definitions for the C library, which is linked into every Plan 9 program.
The C library consists of several parts:
.IP \(bu
All the Plan 9 system calls (save for a few that only the library uses)
.IP \(bu
A set of subroutines that make the system calls easier to use
.IP \(bu
The formatted print routines
.IP \(bu
Mathematical functions
.IP \(bu
Time functions
.IP \(bu
Functions for working with Unicode characters, or
.CW Rune s
.LP
.CW libc.h
must be included second; it is used by most, if not all, other libraries.
.PP
The
.CW print
function is one of the formatted print routines; it works just like
.CW printf
in C, with several minor differences:
.IP \(bu
The
.CW %u
format is gone; it has been replaced with the
.CW u
modifier to the other integer formats.
So instead of
.CW %-3lu ,
you say
.CW %-3uld .
.IP \(bu
The
.CW %b
format prints numbers in binary.
.IP \(bu
The
.CW %C
and
.CW %S
formats print Unicode characters, called
.CW Rune s,
and strings of
.CW Rune s,
respectively.
They are discussed later.
.IP \(bu
The
.CW ll
modifier to the integer formats prints
.CW vlong s,
which are described later.
.IP \(bu
The
.CW %r
format prints the error string, which is described next.
.IP \(bu
You can create your own formats; that is described later.
.LP
Otherwise,
.CW print
behaves the same as
.CW printf .
.PP
.CW exits
and the
.CW void
return from
.CW main
require a bit of explanation.
The traditional way of representing errors and status returns in C is with numbers: a return from
.CW main
or the argument to
.CW exit
represents a status return from a program, and
.CW errno
stores information about error returns from functions.
Traditionally, zero means no error and any other value means an error; ANSI C defines
.CW EXIT_SUCCESS
and
.CW EXIT_FAILURE
for status returns from programs.
.PP
This becomes restrictive very quickly.
ANSI C defines only three standard values for
.CW errno
(domain error, range error, and illegal multibyte sequence) and two values for status returns.
And sometimes an integer won't tell you enough.
For example, take the
.UX
.CW lseek
system call, which manipulates the file read/write position:
.P1
long lseek(int fd, long offset, int from);
.P2
If any argument is invalid (for example,
.CW from
is not 0, 1, or 2),
.CW lseek
returns -1 with
.CW errno
set to
.CW EINVAL
(specific to
.UX ).
But this doesn't tell you
.I which
argument was invalid, or how many were; it only says that something was not right.
We could add more
.CW errno
values to resolve this problem.
But what about a library that defines over 1,000 values for
.CW errno ?
Such a list quickly becomes unmanageable, every library competes for numbers in one shared space, and a bare number still cannot name the particular argument or file involved.
.PP
A better idea is to let the programmer handle any error that comes in without losing standards compliance or clarity, and generate any error without being buried in a surfeit of codes.
So the designers of Plan 9 decided to use strings instead of numbers.
Each process has an
.I "error string" ,
which is set by routines when an error occurs.
And each program returns a string to the host environment with the
.CW exits
system call.
The value given to
.CW exits
can be accessed from rc through the environment variable
.CW $status .
.PP
So with a string, how do you represent a lack of error?
Why, with a null pointer or an empty string!
Because the constant
.CW 0
turns into a null pointer, the statement
.P1
exits(0);
.P2
already does everything.
Of course you can also say
.P1
exits(nil);
.P2
or
.P1
exits("");
.P2
.CW nil ,
defined in
.CW u.h ,
is Plan 9's
.CW NULL .
.PP
So why does
.CW main
have to return
.CW void ?
You can't return a string placed in automatic storage from a function, because that storage disappears when the function returns:
.P1
char *
f(void)
{
	auto char s[] = "hello";

	return s;	/* WRONG: s vanishes when f returns */
}
.P2
If
.CW main
returned its status string, a programmer could easily build that string in a local array like this one and hand back a dangling pointer.
Calling
.CW exits ,
which never returns, while the string is still alive avoids the problem, so
.CW main
has no need to return anything.
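.PP
On Plan 9 the error string is set with
\f(CWwerrstr\fP
and printed with
\f(CW%r\fP.
As a rough sketch of the idea in portable Standard C, so that it can be tried on any system, here a single global buffer stands in for the per-process error string.
The names
\f(CWseterr\fP,
\f(CWerrcontains\fP,
and
\f(CWcheckseekargs\fP
are hypothetical and belong to no Plan 9 library.
The point is that a string can say exactly which argument was wrong, which a bare
\f(CWEINVAL\fP
cannot.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a process's error string. */
static char errbuf[128];

/* Set the error string printf-style, playing the role of werrstr. */
void
seterr(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(errbuf, sizeof errbuf, fmt, ap);
	va_end(ap);
}

/* Callers can search the error string for the text they care about. */
int
errcontains(const char *s)
{
	return strstr(errbuf, s) != NULL;
}

/*
 * Hypothetical argument check for an lseek-like call.
 * Returns 0 if the arguments are acceptable, -1 otherwise;
 * on failure the error string names the bad argument.
 */
int
checkseekargs(int fd, int from)
{
	if(fd < 0){
		seterr("bad file descriptor %d", fd);
		return -1;
	}
	if(from < 0 || from > 2){
		seterr("bad 'from' argument %d: want 0, 1, or 2", from);
		return -1;
	}
	return 0;	/* success leaves the old error string untouched */
}
```

As in this sketch, a successful call does not clear the error string, so it is only meaningful right after a call has reported failure.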
.SH
An Aside on Style
.PP
Plan 9 programs are usually written to conform to a set of style guidelines, described in the manual page
.I style (6),
for the sake of uniformity.
Here is a taste:
.P1
static int
func(int f, char *s, char *t)
{
	int i, j;

	j = f;
	acquirelock();
	for(i = 0; i < j; i++){
		process(i, &j);
		if(g(&i) == 0 ? h() : k())	/* g() affects h() and k() */
			if(strcmp(s, t) == 0)
				something();
	}
	releaselock();
	return j - i;
}
.P2
This piece of code doesn't do anything sensible by itself; it was written to show the basics of the style.
If you want to contribute to Plan 9, be sure to use this style.
You can, of course, still use your favorite style elsewhere.
.NH
Compiling Programs
.PP
.UX
compilers give you the option of compiling a program in one shot:
.P1
$ cc a.c b.c	# compile and link; creates a.out
$ a.out	# run
.P2
or in pieces:
.P1
$ cc -c a.c	# compile; creates a.o
$ cc -c b.c	# compile
$ cc a.o b.o -lS	# link; creates a.out. you can also use ld and omit -lS
$ a.out	# run
.P2
Plan 9 gives you no choice but to do the latter, with a separate linker instead of
.CW cc
for the final stage.
On top of that, there is no single C compiler and no single linker: there is one of each for each supported processor architecture.
.PP
What are the benefits of this requirement?
First, large projects can be rebuilt piece by piece, just as with
.CW make .
(Plan 9 provides an improved variant, called
.CW mk ,
that I describe later.)
Second, it removes one possible error: mixing object files for different architectures.
Third, it promotes separation of tasks: the C compiler should not be expected to link.
.PP
Using this system is easy.
All you have to know is the single character that denotes your processor.
For the Intel x86 family found in most PCs, that character is
.CW 8 .
So I do
.P1
% 8c a.c	# compile; creates a.8
% 8c b.c	# compile
% 8l a.8 b.8	# link; creates 8.out
% 8.out	# run
.P2
A complete list is in the manual page for the C compilers,
.I 2c (1).
.PP
Also note that a special feature of the C compilers lets the linker detect that
.CW libc
or another Plan 9 library must be linked into the program, without any extra flags.
I will get to that later.
.NH
Manipulating Files
.PP
In Plan 9, absolutely everything is a file: even processes
.CW /proc ), (
environment variables
.CW /env ), (
and file descriptors
.CW /fd )! (
What is a file descriptor?
A file descriptor is an integer that represents an open file.
Files are opened with the
.CW open
system call, which returns one.
The syntax of
.CW open
is
.P1
int open(char *filename, int openmode);
.P2
.CW openmode
is one of the constants
.CW OREAD ,
.CW OWRITE ,
or
.CW ORDWR ,
which say what you intend to do with the file (read, write, or both), optionally combined with the constants
.CW OTRUNC ,
.CW OCEXEC ,
and
.CW ORCLOSE
via bitwise OR
.CW | ). (
If
.CW OTRUNC
is given with
.CW OWRITE
or
.CW ORDWR ,
the file is truncated to zero length.
.CW OCEXEC
and
.CW ORCLOSE
are described later.
.CW open
returns a valid file descriptor @n@ such that @n >= 0@ on success, or -1 on failure.
.PP
It is an error to open a file that doesn't exist, so the
.CW create
system call is used to create one.
(Ken got his wish.)
.CW create
takes the form
.P1
int create(char *filename, int openmode, ulong permissions);
.P2
If the file already exists, it is truncated to zero length.
.CW openmode
is the same as for
.CW open ,
and the new file is left open in that mode; it may also include
.CW OEXCL ,
which causes
.CW create
to fail if the file already exists.
The
.CW permissions
are just as in
.UX :
a three-digit octal number containing a combination of read, write, and execute bits for the file's owner, the owner's group, and everyone else.
For example,
.CW 0644
yields
.CW rw-r--r-- ,
and
.CW 0750
yields
.CW rwxr-x--- .
The permissions may also be combined, via bitwise OR, with
.CW DMDIR ,
which creates a directory;
.CW DMAPPEND ,
which makes a file that can only be appended to (a log file, for example); and
.CW DMEXCL ,
which makes a file that only one program can have open at a time.
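.PP
The octal permission number maps onto the familiar
\f(CWrwx\fP
string one bit at a time, highest bit first.
Here is a small sketch of that mapping in Standard C; the helper
\f(CWpermstr\fP
is hypothetical, and it looks only at the low nine bits, ignoring flags such as
\f(CWDMDIR\fP.

```c
#include <string.h>

/*
 * Hypothetical helper: render the low nine permission bits
 * (owner, group, others) as an rwx string in buf, which must
 * hold at least 10 bytes.  Returns buf.
 */
char *
permstr(unsigned long perm, char *buf)
{
	static const char rwx[] = "rwx";
	int i;

	memcpy(buf, "---------", 10);	/* nine dashes plus the NUL */
	for(i = 0; i < 9; i++)
		if(perm & (0400UL >> i))	/* 0400 is owner read */
			buf[i] = rwx[i % 3];
	return buf;
}
```

With it,
\f(CWpermstr(0644, b)\fP
gives
\f(CWrw-r--r--\fP
and
\f(CWpermstr(0750, b)\fP
gives
\f(CWrwxr-x---\fP,
matching the examples above.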
.PP
The
.CW read
and
.CW write
system calls read and write arbitrary data:
.P1
long read(int fd, void *buf, long n);
long write(int fd, void *buf, long n);
.P2
They read
.CW n
bytes from
.CW fd
into
.CW buf
and write
.CW n
bytes from
.CW buf
to
.CW fd ,
respectively.
.CW read
returns the number of bytes read, while
.CW write
returns the number of bytes written; both return -1 on error.
.PP
Why would
.CW read
and
.CW write
return anything other than their argument
.CW n ?
The truth is, they don't always transfer
.CW n
bytes.
Take
.CW read
as an example.
If fewer than
.CW n
bytes remain before the end of the file,
.CW read
returns only what it found; if the end of the file has already been reached, there is nothing to read, so
.CW read
appropriately returns 0.
Likewise, a
.CW write
can fall short if the disk is full.
.PP
Instead of using the low-level
.CW write ,
you can use
.CW fprint .
.CW fprint ,
like
.CW fprintf ,
performs formatted output to an open file.
It takes the file descriptor as an extra first argument.
Note that there are no formatted input functions like
.CW scanf ;
buffered I/O via
.CW libbio ,
described later, provides the facilities for input.
.PP
The
.CW seek
system call changes where in the file reads and writes are performed.
.P1
vlong seek(int fd, vlong amount, int from);
.P2
If
.CW from
is 0, seek to
.CW amount
bytes from the start of the file; if 1, seek from the current position; if 2, seek from the end.
Note that a positive
.CW amount
moves forward (toward the end) and a negative one moves backward, regardless of
.CW from ,
so to seek five bytes before the end, you say
.P1
seek(fd, -5, 2);
.P2
.CW seek
returns the new position, measured from the start of the file regardless of
.CW from .
On error,
.CW seek
seems to succeed; only by examining the error string can you detect an error.
.CW seek
fails on directories and does nothing on pipes.
.PP
What is
.CW vlong ?
It is a
.CW typedef -ed
alias for
.CW "long long" .
The C compilers, like C99, provide the
.CW "long long"
type, which holds very large integer values, typically 64 bits.
There is also an
.CW unsigned
variant.
.CW u.h
provides the terse alias
.CW uvlong
for it.
On a 32-bit processor like the x86, 64-bit arithmetic is simulated in software, and some constructs are not supported.
For instance, you can't do
.P1
vlong v;

switch(v){
case a:
	/* ... */
}
.P2
Still, the mere fact that 64-bit values are available is promising.
.PP
Finally, the
.CW close
system call says that you are done with a file you opened or created.
It takes the form
.P1
int close(int fd);
.P2
.CW close
should fail (return -1) only if
.CW fd
is not really open, so its return value is usually ignored.
.PP
Before I move on, I need to talk about three file descriptors that every program has when it is created.
File descriptor 0 is
.I "standard input" ,
which is the keyboard by default and can be changed with rc's
.CW < ,
.CW << ,
.CW <{\fR...\fP} ,
and
.CW | .
File descriptor 1 is
.I "standard output" ,
which is the screen or the current rio window by default and can be changed with rc's
.CW > ,
.CW >> ,
and
.CW | .
So
.P1
print("hello");
.P2
is the same as
.P1
fprint(1, "hello");
.P2
File descriptor 2 is
.I "standard error" .
It lets you give the user emergency output in the case of an error, without fear of losing the error to redirected output.
Standard error can be redirected with the
.CW [2]
modifier to the output redirection operators in rc.
.NH
UTF-8 Support
.PP
Plan 9 supports Unicode via UTF-8, but you need special provisions for handling characters outside ASCII.
The special type
.CW Rune
is large enough to store a single decoded character, which can be embedded in a C program using Standard C's wide-character literal format
.CW L'\fIcharacter\fP' .
A string of
.CW Rune s
can be written the same way as a string of characters,
.CW L"\fIstring\fP" ,
and has the type “array of
.CW Rune s.”
Most
.CW Rune s
can be entered directly from the keyboard; see
.I keyboard (6)
for instructions and the file
.CW /lib/keyboard
for a complete list and their key codes.
.PP
A single
.CW Rune
or a string of
.CW Rune s
can be printed with the
.CW %C
and
.CW %S
formats, respectively.
For example,
.P1
#include <u.h>
#include <libc.h>

void
main()
{
	print("3 %C 4\en", L'≤');
	print("%S\en", L"Άρχιμήδης");	/* Archimedes */
}
.P2
The codes for capital alpha with tonos and lowercase eta with tonos (Unicode 0386 and 03AE, respectively) cannot be entered with the keyboard; they were generated with a simple program:
.P1
#include <u.h>
#include <libc.h>

void
main(int argc, char *argv[])
{
	if(argc != 2){
		fprint(2, "usage: %s hex-code\en", argv[0]);
		exits("usage");
	}
	print("%C\en", (Rune)strtol(argv[1], nil, 16));
	exits(0);
}
.P2
.CW argc ,
.CW argv ,
and
.CW strtol
act as in Standard C.
If this program is compiled as
.CW code2rune ,
you can say
.P1
% code2rune 0386
Ά
% code2rune 41
A
.P2
.PP
A
.CW Rune
can be constructed from one or more
.CW char s.
This allows input of
.CW Rune s
by reading a
.CW char
and checking whether it can begin a
.CW Rune .
This is a simple multi-step process:
.IP 1.
Read a character.
.IP 2.
If that character is less than the constant
.CW Runeself ,
cast it to a
.CW Rune
and return it.
Otherwise, store it in the first position of a buffer.
.IP 3.
Read the next character into the next buffer position.
.IP 4.
If the buffer, from the beginning to the current position, holds a full
.CW Rune ,
convert it and return that
.CW Rune .
Otherwise, return to step 3.
.PP
The function
.CW fullrune
does the test in step 4:
.P1
int fullrune(char *buf, int n);
.P2
returns a nonzero (true) value if the
.CW n
characters pointed to by
.CW buf
make up a full
.CW Rune .
The function
.CW chartorune
does the actual conversion:
.P1
int chartorune(Rune *dest, char *src);
.P2
converts the bytes pointed to by
.CW src
into the
.CW Rune
stored at
.CW *dest
and returns the number of bytes of
.CW src
used.
On error, it returns 1 and stores the constant
.CW Runeerror
in
.CW *dest .
The number of bytes never exceeds
.CW UTFmax ,
a constant giving the maximum number of bytes in a
.CW Rune .
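.PP
To make
\f(CWfullrune\fP
and
\f(CWchartorune\fP
less magical, here is a simplified sketch of the same two jobs in Standard C.
The names
\f(CWsketch_fullrune\fP,
\f(CWsketch_chartorune\fP,
\f(CWRune32\fP,
and
\f(CWERRRUNE\fP
are hypothetical;
\f(CWERRRUNE\fP
has the value 0xFFFD that
\f(CWRuneerror\fP
uses.
The sketch accepts sequences of up to four bytes, which may be more than your system's
\f(CWUTFmax\fP,
and its handling of malformed input is simplified, so it may not match the library in every corner case.

```c
typedef unsigned long Rune32;	/* hypothetical stand-in for Rune */
enum { ERRRUNE = 0xFFFD };	/* same value as Plan 9's Runeerror */

/* Length a sequence claims from its first byte; 0 if c cannot start one. */
static int
seqlen(unsigned char c)
{
	if(c < 0x80)
		return 1;
	if(c < 0xC0)
		return 0;	/* continuation byte */
	if(c < 0xE0)
		return 2;
	if(c < 0xF0)
		return 3;
	if(c < 0xF8)
		return 4;
	return 0;
}

/* Nonzero if the n bytes at buf are enough to decode one character. */
int
sketch_fullrune(const char *buf, int n)
{
	int want;

	if(n <= 0)
		return 0;
	want = seqlen((unsigned char)buf[0]);
	if(want == 0)
		return 1;	/* a bad first byte decodes (as an error) by itself */
	return n >= want;
}

/* Decode one character from s into *r; return the number of bytes used. */
int
sketch_chartorune(Rune32 *r, const char *s)
{
	static const Rune32 minval[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	const unsigned char *p = (const unsigned char *)s;
	int n, i;
	Rune32 v;

	n = seqlen(p[0]);
	if(n == 0)
		goto bad;
	if(n == 1){
		*r = p[0];
		return 1;
	}
	v = p[0] & (0x7F >> n);	/* payload bits of the first byte */
	for(i = 1; i < n; i++){
		if((p[i] & 0xC0) != 0x80)	/* not a continuation byte */
			goto bad;
		v = (v << 6) | (p[i] & 0x3F);
	}
	if(v < minval[n] || v > 0x10FFFF)	/* overlong or out of range */
		goto bad;
	*r = v;
	return n;
bad:
	*r = ERRRUNE;
	return 1;	/* consume a single byte, as chartorune does on error */
}
```

Note that for the bytes e0 51 52,
\f(CWsketch_fullrune\fP
reports a complete sequence, yet
\f(CWsketch_chartorune\fP
rejects it and consumes only one byte: the same mismatch between the two tests that trips up the reader developed below.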
.PP
With all this in mind, we can write a function that uses
.CW read
to read a single
.CW Rune
from a given file descriptor and returns the number of bytes read.
It behaves like
.CW chartorune
on error: it returns the number of bytes read, but stores
.CW Runeerror .
.P1
long
readrune(int fd, Rune *r)
{
	char buf[UTFmax];
	char c;
	long nread, n;
	int i;

	if((nread = read(fd, &c, 1)) != 1){
		*r = Runeerror;
		return nread;
	}
	if(c < Runeself){
		*r = (Rune)c;
		return nread;
	}
	buf[0] = c;
	for(i = 1;;){
		if((n = read(fd, &c, 1)) != 1){
			*r = Runeerror;
			return nread;
		}
		nread += n;
		buf[i++] = c;
		if(fullrune(buf, i)){
			chartorune(r, buf);
			return nread;
		}
	}
}
.P2
We can test this out in a program that reads
.CW Rune s
and prints them a line at a time.
.P1
#include <u.h>
#include <libc.h>

long readrune(int, Rune *);

void
main()
{
	Rune rs[100];
	int i;

	i = 0;
	while(readrune(0, &rs[i]) > 0)
		if(rs[i] == L'\en'){
			rs[i] = '\e0';
			print("%S\en", rs);
			i = 0;
		}else if(i < nelem(rs) - 1)	/* don't run off the end of rs */
			i++;
	exits(0);
}
.P2
Let's try this out:
.P1
% readrune
a
a
abc
abc
3≤4
3\(pw\(pw\(pw4
≤
\(pw\(pw\(pw
\fIctl-\fPd%
.P2
.PP
Something seems to be amiss.
Every non-ASCII character I type comes out as a mess of “I don't have that glyph” symbols (Peter Weinberger's famous headshot), one per byte.
The problem is declaring
.CW c
in
.CW readrune
as a
.CW char .
Plan 9's
.CW char
is signed, so each byte of a multi-byte sequence, all of which are 0x80 or above, becomes negative, compares less than
.CW Runeself ,
and is returned as a
.CW Rune
on its own.
If we change
.CW c
to a
.CW uchar
(a synonym for
.CW "unsigned char" ),
we get this interactive session:
.P1
% readrune
3≤4
3≤4
≤+-4556
≤+-4556
\fIctl-\fPd%
.P2
.PP
There's still a problem.
Consider
.P1
% xd -c -b bad
0000000 e0 Q R S \en
 0 e0 51 52 53 0a
0000005
% cat bad
.Bx ? QRS
% readrune < bad
.Bx ? S
%
.P2
Obviously incorrect.
The
.BX \f(CW?\fP
means “this is not a valid
.CW Rune .”
It turns out that even though
.CW fullrune
may report that the buffer contains a complete
.CW Rune ,
it does not say that the
.CW Rune
is valid.
In these situations,
.CW chartorune
may give up, reporting that it converted
.I fewer
bytes than we read!
This means we ate too much.
Fortunately, as long as we're not reading from a pipe or directory, we can fix this with
.CW seek .
Change the last
.CW if
to
.P1
	if(fullrune(buf, i)){
		n = chartorune(r, buf);
		while(i > (int)n){	/* push back bytes chartorune didn't use */
			seek(fd, -1, 1);
			i--;
			nread--;
		}
		return nread;
	}
.P2
and everything works:
.P1
% readrune < bad
.Bx ? QRS
.P2
.PP
This
.CW seek
says to move -1 bytes from the current position, that is, one byte back.
In effect, this is Standard C's
.CW ungetc ,
except that it doesn't work on pipes or directories.
There are other common uses of
.CW seek :
.P1
seek(fd, 0, 0);
.P2
seeks to the beginning of a file,
.P1
pos = seek(fd, 0, 1);
.P2
doesn't change the file position but tells you where you are, measured from the beginning, and
.P1
seek(fd, 0, 2);
.P2
goes to the end.
This is done by default when an append-only file is opened for writing.
.NH
Buffered I/O
.PP
Let us write a program,
.CW runecount ,
that counts the number of
.CW Rune s
in a file; wc's
.CW -c
option counts bytes, not
.CW Rune s.
I have omitted the definition of
.CW readrune .
.P1
#include <u.h>
#include <libc.h>

long readrune(int, Rune *);

uvlong
runecount(int fd, char *filename)
{
	uvlong n;
	Rune r;

	n = 0;
	while(readrune(fd, &r) > 0)	/* stop at end of file or on error */
		n++;
	print("%10ulld %s\en", n, filename);
	return n;
}

void
main(int argc, char *argv[])
{
	int fd, i;
	uvlong total;

	total = 0;
	if(argc == 1)
		runecount(0, "");
	else{
		for(i = 1; i < argc; i++){
			fd = open(argv[i], OREAD);
			if(fd == -1)
				fprint(2, "can't open %s: %r\en", argv[i]);
			else{
				total += runecount(fd, argv[i]);
				close(fd);
			}
		}
		if(argc > 2)
			print("%10ulld total\en", total);
	}
	exits(0);
}
.P2
.PP
The file
.CW /lib/glass
is a perfect file to test this program on; it contains translations of the phrase “I can eat glass and it doesn't hurt me.” in many languages, written with Unicode characters.
For example,
.P1
% grep '^(French|Russian|Greek):' /lib/glass
Greek: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
French: Je peux manger du verre, ça ne me fait pas de mal.
Russian: Я могу есть стекло, оно мне не вредит.
Greek: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
.P2
To compare
.CW runecount
with wc, let's try them out:
.P1
% runecount /lib/glass
6715 /lib/glass
% wc -c /lib/glass
8517 /lib/glass
.P2
So
.CW /lib/glass
has 6,715
.CW Rune s
that fill up 8,517 bytes.
.PP
It turns out that when running
.CW runecount ,
I had to wait a while before getting any output, while wc returned immediately.
The time program will tell me how long a program runs, so let's try it on
.CW runecount :
.P1
% time runecount /lib/glass
6715 /lib/glass
0.02u 1.24s 6.74r runecount /lib/glass
.P2
This says that the program took 6.74 seconds to run, with 1.24 seconds in the kernel and 0.02 seconds in user space (that is, in
.CW main ,
.CW runecount ,
and
.CW readrune ).
The rest of the time was spent waiting, most likely on the thousands of one-byte read requests, each of which has to travel to the file server and back.
.PP
wc is faster than
.CW runecount
because it buffers its input.
A
.I buffer
is an in-memory array of data objects.
When you ask for a character from a character buffer tied to a file, the buffering code first checks whether there is a character in the buffer.
If there is, that character is removed from the buffer and returned to you.
If not, the buffer is refilled by reading enough characters to fill the whole array at once, and the first of them is removed and returned.
.CW readrune ,
however, does no buffering, so every call makes at least one system call.
.PP
Fortunately, Plan 9 provides not one but two ways of buffering input and output.
The first is libstdio, which works just like Standard C's stdio.
But it doesn't support
.CW Rune s,
so we can't use it.
It also has several other restrictions that I won't go into.
.PP
The second is libbio, whose manual page is
.I bio (2).
libbio buffers input and output in much the same way as libstdio, but provides a higher level of abstraction and full
.CW Rune
support.
In fact, our
.CW readrune
function is based on libbio's equivalent function!
To use libbio in your program, just add
.P1
#include <bio.h>
.P2
This must follow the
.CW #include
of
.CW libc.h .
.PP
The next step is to make a new
.CW Biobuf ,
which is the libbio equivalent of
.CW FILE .
Note that I did not say that it was equivalent to
.CW "FILE *" .
This is because there are
.I two
ways to connect a file to a
.CW Biobuf ,
and they work differently.
The first method actually opens a file:
.P1
Biobuf *Bopen(char *filename, int openmode);
.P2
.CW openmode
is either
.CW OREAD ,
for reading, or
.CW OWRITE ,
which creates the file with mode
.CW 0666
.CW rw-rw-rw- ). (
It returns a pointer to a dynamically allocated
.CW Biobuf ,
or
.CW nil
on error.
.PP
You can also connect a
.CW Biobuf
to a file that is already open:
.P1
int Binit(Biobuf *bp, int fd, int mode);
.P2
.CW bp
is a pointer to an already allocated
.CW Biobuf ,
either declared as a variable or allocated with
.CW malloc .
(The entire
.CW malloc
family of routines is provided by the C library.)
Here
.CW mode
is the same as
.CW openmode
in
.CW Bopen ,
except that
.CW OWRITE
does not create the file.
.CW Binit
returns the constant
.CW Beof
on error.
You can use this function to wrap the standard file descriptors in libbio; this is the only way to do formatted reads from standard input, since Plan 9 doesn't provide a
.CW scanf
equivalent.
.PP
Once we have a
.CW Biobuf
open, we can use several functions to read from and write to it.
But first, a brief note on an extension in Plan 9's C: basic inheritance is supported.
The structure
.CW Biobuf
has all the properties of another structure named
.CW Biobufhdr ;
in fact,
.CW Biobuf
consists of nothing more than the elements of
.CW Biobufhdr
followed by the buffer itself.
A pointer to a
.CW Biobuf
can therefore be used wherever a pointer to a
.CW Biobufhdr
is expected.
This feature will be described in full when we talk about the
.CW lock
family of routines.
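.PP
Standard C has no such inheritance, but the usual portable idiom is to put the header structure first, because a pointer to a structure may be converted to a pointer to its first member.
The sketch below combines that idiom with the buffering strategy described earlier: a tiny buffer is refilled from a stand-in for
\f(CWread\fP
only when it runs dry.
\f(CWBufHdr\fP,
\f(CWBuf\fP,
\f(CWfakeread\fP,
and the in-memory source are all hypothetical; this is an illustration of the idea, not how libbio is implemented.

```c
#include <string.h>

enum { BUFSIZE = 8, EOFCH = -1 };	/* tiny buffer so refills are easy to see */

/* Bookkeeping only, like the role Biobufhdr plays. */
typedef struct BufHdr {
	const char *src;	/* hypothetical in-memory "file" */
	size_t srclen, srcpos;
	unsigned char *buf;	/* points at the storage that follows */
	size_t pos, len;	/* next byte to hand out; bytes in buffer */
	int nfill;	/* how many times the stand-in read ran */
} BufHdr;

/* Header first, then the storage itself, like Biobuf. */
typedef struct Buf {
	BufHdr h;
	unsigned char store[BUFSIZE];
} Buf;

void
buf_init(Buf *b, const char *src)
{
	memset(b, 0, sizeof *b);
	b->h.src = src;
	b->h.srclen = strlen(src);
	b->h.buf = b->store;
}

/* Stand-in for read(2): copy up to n bytes from the in-memory source. */
static size_t
fakeread(BufHdr *h, unsigned char *dst, size_t n)
{
	size_t left = h->srclen - h->srcpos;

	if(n > left)
		n = left;
	memcpy(dst, h->src + h->srcpos, n);
	h->srcpos += n;
	h->nfill++;
	return n;
}

/* Hand out one byte, refilling the whole buffer only when it is empty. */
int
buf_getc(BufHdr *h)
{
	if(h->pos == h->len){
		h->len = fakeread(h, h->buf, BUFSIZE);
		h->pos = 0;
		if(h->len == 0)
			return EOFCH;
	}
	return h->buf[h->pos++];
}
```

A
\f(CWBuf\fP
can be passed where a
\f(CWBufHdr *\fP
is wanted by casting
\f(CW&b\fP.
Reading the 13 bytes of
\f(CW"hello, world\en"\fP
through it calls the stand-in read three times (8 bytes, 5 bytes, then 0 at end of file), where an unbuffered reader would call it fourteen times.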
.PP
The basic input functions provided by libbio are numerous and useful:
.P1
long Bread(Biobufhdr *bp, void *buf, long n);
void *Brdline(Biobufhdr *bp, int delim);
char *Brdstr(Biobufhdr *bp, int delim, int nulldelim);
int Blinelen(Biobufhdr *bp);
int Bgetc(Biobufhdr *bp);
long Bgetrune(Biobufhdr *bp);
int Bungetc(Biobufhdr *bp);
int Bungetrune(Biobufhdr *bp);
int Bgetd(Biobufhdr *bp, double *d);
.P2
.CW Bread
behaves just like
.CW read .
.CW Brdline
returns either a full buffer or everything up to and including the given delimiter.
A more useful function is
.CW Brdstr ,
which returns a
.CW malloc -ed
string holding the next line, up to the given delimiter, or
.CW nil
on failure.
If
.CW nulldelim
is nonzero, the delimiter is replaced by a null byte and so does not appear in the returned string.
This eliminates the need for idioms like
.P1
s[strlen(s) - 1] = '\e0';
.P2
For both
.CW Brdline
and
.CW Brdstr ,
the function
.CW Blinelen
returns the length of the line just read.
.PP
.CW Bgetc
and
.CW Bgetrune
read and return the next character and
.CW Rune
from the file, respectively.
Both return
.CW Beof
at end of file, hence the
.CW long
return from
.CW Bgetrune .
A character or
.CW Rune
can be pushed back into the buffer with the corresponding
.CW unget
function.
Finally,
.CW Bgetd
reads in a
.CW double ,
returning -1 on failure or the number of bytes read on success.
.PP
The output routines are
.P1
long Bwrite(Biobufhdr *bp, void *buf, long n);
int Bputc(Biobufhdr *bp, int c);
int Bputrune(Biobufhdr *bp, long r);
int Bprint(Biobufhdr *bp, char *fmt, ...);
int Bvprint(Biobufhdr *bp, char *fmt, va_list v);
int Bflush(Biobufhdr *bp);
.P2
.CW va_list
and the Standard C macros that go with it are provided, as are
.CW vprint
and
.CW vfprint ,
the
.CW va_list
counterparts of
.CW print
and
.CW fprint .
.CW Bflush
immediately writes out the contents of the buffer; libbio also does this on its own whenever the buffer fills up.
Everything else works as expected.
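.PP
The effect of
\f(CWnulldelim\fP
is easiest to see on an in-memory string.
The sketch below imitates the behavior described for
\f(CWBrdstr\fP,
returning a
\f(CWmalloc\fP-ed
copy of the next line.
\f(CWrdstr_sketch\fP
and its cursor argument are hypothetical, and returning a final line that lacks a delimiter is this sketch's own choice, not a documented property of
\f(CWBrdstr\fP.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical: return a malloc-ed copy of the next line of the
 * string *cur, ending at delim, and advance *cur past it.  If
 * nulldelim is nonzero the delimiter is overwritten with '\0'.
 * Returns NULL at the end of the string or if malloc fails.
 */
char *
rdstr_sketch(const char **cur, int delim, int nulldelim)
{
	const char *start, *end;
	size_t n;
	char *s;

	start = *cur;
	if(*start == '\0')
		return NULL;
	end = delim != '\0' ? strchr(start, delim) : NULL;
	if(end != NULL)
		n = end - start + 1;	/* keep the delimiter for now */
	else
		n = strlen(start);	/* last line has no delimiter */
	s = malloc(n + 1);
	if(s == NULL)
		return NULL;
	memcpy(s, start, n);
	s[n] = '\0';
	if(nulldelim && end != NULL)
		s[n - 1] = '\0';	/* drop the delimiter from the result */
	*cur = start + n;
	return s;
}
```

As with
\f(CWBrdstr\fP,
the caller owns the returned string and must
\f(CWfree\fP
it.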
.PP
The
.CW Bseek
function works like
.CW seek ,
but libbio provides an alternative to
.P1
loc = Bseek(bp, 0, 1);
.P2
in
.CW Boffset ,
which takes the
.CW "Biobufhdr *"
and returns the offset as a
.CW vlong :
.P1
loc = Boffset(bp);
.P2
To close an open file, use
.P1
int Bterm(Biobufhdr *bp);
.P2
.CW Bterm
will not close files connected with
.CW Binit ;
this allows use of the standard file descriptors after a
.CW Bterm
on them.
.PP
Let's rewrite
.CW runecount
to use libbio.
Note that we no longer need
.CW readrune ,
given
.CW Bgetrune .
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>

uvlong
runecount(Biobuf *f, char *filename)
{
	uvlong n;

	n = 0;
	while(Bgetrune(f) != Beof)	/* Bgetrune returns long, so Beof is distinct */
		n++;
	print("%10ulld %s\en", n, filename);
	return n;
}

void
main(int argc, char *argv[])
{
	int i;
	uvlong total;
	Biobuf bstdin, *bfile;

	total = 0;
	if(argc == 1){
		if(Binit(&bstdin, 0, OREAD) == Beof){
			fprint(2, "can't connect stdin to bio: %r\en");
			exits("Binit");
		}
		runecount(&bstdin, "");
		Bterm(&bstdin);
	}else{
		for(i = 1; i < argc; i++){
			bfile = Bopen(argv[i], OREAD);
			if(bfile == nil)
				fprint(2, "can't open %s: %r\en", argv[i]);
			else{
				total += runecount(bfile, argv[i]);
				Bterm(bfile);
			}
		}
		if(argc > 2)
			print("%10ulld total\en", total);
	}
	exits(0);
}
.P2
and test it:
.P1
% 8c runecount.c
% 8l -o runecount runecount.8
% runecount /lib/glass
6715 /lib/glass
% time runecount /lib/glass
6715 /lib/glass
0.00u 0.01s 0.02r runecount /lib/glass
.P2
Now the program is significantly faster, and it still gives the right answer.
.PP
Given
.CW Bgetrune ,
is there still a need for
.CW readrune ?
To be honest, this really depends on taste: one might argue that with libbio we don't need the unbuffered
.CW read
and will be just fine with
.CW Bgetrune ,
while another might say that someone may want to use
.CW readrune
and that it should therefore be preserved.
I will kill
.CW readrune
in favor of
.CW Bgetrune .
I am doing this for several reasons:
.IP \(bu
Most programs use libbio and avoid the low-level system calls altogether.
.IP \(bu
A program that does use the system calls won't read a byte or a
.CW Rune
at a time; it will read an entire line or buffer.
.IP \(bu
Most code handles
.CW Rune s
implicitly, since a
.CW Rune
is just a short run of bytes; where
.CW Rune s
must be handled explicitly, the conversion routines are simple enough to apply after the input has been read.
.LP
Feel free to disagree.
.SH
An Aside on Linking
.PP
The compilation of
.CW runecount
in the previous example was shown on purpose: it showed that you did not need an explicit linker flag to link with libbio.
Of course, you could supply the libraries as arguments to the linker, in the form
.CW -l \fIext\fP,
where
.I ext
is the library name without the lib- prefix
.CW -lbio , (
for example).
.PP
But the C compilers do this for you every time you include the appropriate header file.
The C preprocessor reserves a special directive
.P1
#pragma \fItext\fP
.P2
where the
.I text
is implementation-defined.
For Plan 9's C compiler, if
.I text
is of the form
.P1
lib "\fIlibrary\fP"
.P2
then the library is automatically linked
.I "exactly once"
per program.
For example,
.P1
% grep '^#pragma[ →]lib' /sys/include/libc.h
#pragma lib "libc.a"
.P2
(A tab in the command line is represented by
.CW → .)
The file
.CW libc.a
is part of a collection of library files in
.CW /$objtype/lib .
The
.CW .a
suffix means that the library is an archive made with the ar program; see
.I ar (1).
.NH
Processes and Notes
.PP
Plan 9's process model, to the programmer, is very similar to
.UX 's.
You have
.CW fork ,
.CW exec ,
and
.CW wait ,
but they have changed quite a bit.
The underlying system call is no longer
.CW fork
but
.CW rfork ,
which is much richer and more powerful.
And the system call behind
.CW wait
is now
.CW await ,
which gives you a more precise indication of what happened and how.
.CW fork
and
.CW wait
are still there as library routines, but
.CW wait
is quite different.
And Plan 9 has no notion of the signal; instead, it uses
.I notes ,
which are strings.
.PP
The
.CW rfork
system call is simple:
.P1
int rfork(int mode);
.P2
The
.CW mode
is a bitmask of the following:
.IP \f(CWRFPROC\fP \w'\f(CWRFNOWAIT\fP'+5
Make a new process.
If not set, the other flags apply to the calling process itself, allowing it to do things otherwise impossible.
Few programs ever need to do so (ar and rio do, for example, for their own reasons).
.IP \f(CWRFNOWAIT\fP
The parent cannot use the
.CW await
system call or any related routines on the child.
.IP \f(CWRFNAMEG\fP
The child inherits a copy of the parent's name space (see below).
If neither this nor
.CW RFCNAMEG
is set, the child shares the parent's name space.
.IP \f(CWRFCNAMEG\fP
The child starts with a clean name space.
.IP \f(CWRFNOMNT\fP
Disallow the
.CW mount
system call (described later) and access to the special device directories
.CW # \fIletter\fP). (
.IP \f(CWRFENVG\fP
The child gets a copy of the parent's environment variables; this works like
.CW RFNAMEG .
.IP \f(CWRFCENVG\fP
The child starts with no environment variables.
.IP \f(CWRFNOTEG\fP
The child gets its own
.I "note group" ,
so notes sent to it and its children don't affect the parent.
.IP \f(CWRFFDG\fP
The child's file descriptors are copied rather than shared.
.IP \f(CWRFCFDG\fP
The child has no file descriptors,
.I "not even the standard ones" .
.IP \f(CWRFREND\fP
The child cannot
.CW rendezvous
with its parent or any other ancestor.
The
.CW rendezvous
system call is described below.
.IP \f(CWRFMEM\fP
Child and parent share the data and “bss” segments, that is, global and static variables.
The stack is not shared, so each process keeps its own local variables and function calls.
.LP
As you can see,
.CW rfork
is a very powerful tool for controlling how a child behaves.
(Parents may want to pray for a real-life
.CW rfork .)
But for most purposes, all you want is a child that has its own copy of the file descriptors and cannot rendezvous with the parent,
.CW RFPROC|RFFDG|RFREND ,
and that is what the routine
.CW fork
does.
Both return: .IP \(bu -1 on error; .IP \(bu the child's process ID, in the parent; .IP \(bu 0, in the child. .LP After a successful call, both processes continue execution from the point of the call. So you can say .P1 switch(pid = rfork(RFPROC | RFFDG | RFNOTEG | RFENVG | RFNOWAIT | RFREND)){ case -1: sysfatal("rfork failed: %r"); case 0: child(); exits(0); } parent(); exits(0); .P2 The .CW sysfatal routine, which has the syntax .P1 void sysfatal(char *mesg, ...); .P2 prints the formatted message on standard error and terminates with that message as the status return. If the global variable .CW argv0 is set, it is printed before the message. .CW argv0 should be set to .CW argv[0] before the program modifies .CW argv ; the command-line option macros we will see shortly do this for you. .PP Usually an .CW rfork is followed by one of the .CW exec routines, which replace the program running in a process with another. The system call is .CW exec , which is similar to .UX 's .CW execv : .P1 void *exec(char *filename, char *argv[]); .P2 replaces the current program with the one in .CW filename , passing the given vector of arguments to the new .CW main routine's .CW argv . The first argument .CW argv[0] ) ( is the program's effective name, usually the file name without its directory. The last element of the vector must be a null pointer; this is how .CW argc is determined. These routines return only on failure, in which case they set the error string; the return value itself is insignificant. Therefore, you can say .P1 exec(prog, args); sysfatal("exec of %s failed: %r", prog); .P2 .PP .CW execl is a subroutine of the form .P1 void *execl(char *filename, ...); .P2 It turns each of its optional arguments into a member of an .CW argv array until a null pointer is seen, then calls .CW exec . Beware: .P1 execl(filename, nil); /* the list must end with nil */ execl(filename); /* WRONG */ .P2 .PP What makes a file executable? The user must have both execute and read permissions enabled on the file (although the manual page for .CW exec only states that execute is required), and the file cannot be a directory.
The file is opened with the mode .CW OEXEC , which opens to read but requires execute permission, and the first two bytes are examined. If they are the characters .CW #! , the file is treated as a script to be run by the interpreter named on the rest of its first line. If the first line of file .CW f is .P1 #!/bin/rc .P2 and .CW f is called by .P1 execl("f", "a", nil); .P2 then the call to .CW execl is, in effect, .P1 execl("/bin/rc", "/bin/rc", "f", "a", nil); .P2 Otherwise, the two bytes are put back and a .CW long is read. If this does not equal the a.out magic number for the current CPU architecture (see .I a.out (6)), an error occurs. Otherwise, the program is executed. .PP The .CW await system call, which has the form .P1 int await(char *s, int n); .P2 waits for a child to terminate; children created with the .CW RFNOWAIT flag are not waited for. When a child terminates, at most .CW n bytes of a special string are stored in .CW s , and .CW await returns the number of bytes stored, or -1 if there are no children to wait for. The special string is of the form .P1 \fIprocess-ID\fP \fIuser-time\fP \fIsystem-time\fP \fIreal-time\fP '\fIstatus-return\fP' .P2 with spaces separating the fields. On successful termination the status return is empty, so it appears as .CW '' . The times are reported in milliseconds. There is .I no .CW '\e0' at the end of this string, so be sure to add one in your code, leaving room for it in the buffer: .P1 char buf[256]; int n; if((n = await(buf, sizeof buf - 1)) >= 0) buf[n] = '\e0'; .P2 .PP The .CW tokenize routine can be used to separate the individual fields: .P1 int tokenize(char *str, char **array, int max); .P2 .CW str is the string to tokenize, which is split into at most .CW max elements of the array by overwriting certain delimiters with .CW '\e0' . The function returns the number of tokens found. The splitting rules are simple: split at whitespace, except treat quoted text as a single token.
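.PP
To make the splitting rules concrete, here is a hypothetical sketch of such a splitter, called
.CW rctokenize
so that it is not confused with the library routine; it is not the source of Plan 9's
.CW tokenize .
It is written in portable standard C rather than Plan 9 C, so it needs neither
.CW u.h
nor
.CW libc.h
and can be tried on any system.
It assumes that tokens are separated by spaces, tabs, and newlines, and that inside quotes a doubled quote stands for a single quote character.
Like
.CW tokenize ,
it rewrites the string in place, dropping the quotes and ending each token with a null byte.
.P1
/*
 * rctokenize: a hypothetical portable sketch, not Plan 9's tokenize.
 * Splits str in place into at most max tokens, stores pointers to
 * them in args, and returns the number of tokens found.
 */
static int
isblankch(int c)
{
	return c == ' ' || c == 9 || c == 10;	/* space, tab, newline */
}

int
rctokenize(char *str, char **args, int max)
{
	char quote = 39;	/* ASCII single quote */
	char *r, *w;
	int n, inquote;

	r = str;
	n = 0;
	while(n < max){
		while(*r != 0 && isblankch(*r))	/* skip blanks between tokens */
			r++;
		if(*r == 0)
			break;
		args[n++] = w = r;	/* the token is rewritten in place */
		inquote = 0;
		while(*r != 0 && (inquote || !isblankch(*r))){
			if(*r != quote)
				*w++ = *r++;	/* ordinary character: keep it */
			else if(inquote && r[1] == quote){
				*w++ = quote;	/* doubled quote: keep one */
				r += 2;
			}else{
				inquote = !inquote;	/* opening or closing quote: drop it */
				r++;
			}
		}
		if(*r != 0)
			r++;	/* step past the blank that ended the token */
		*w = 0;	/* terminate; w never runs ahead of r */
	}
	return n;
}
.P2
Applied to a writable copy of the string
.CW "1234 0 10 250 ''" ,
it returns 5 and leaves the last field empty, which is how a successful child's status reads.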
The quoting rules are the same as in rc: .TS center; lfCW l rfCW. 'hello' becomes hello 'stay here' becomes stay here 'the bee''s hive' becomes the bee's hive '' becomes \fRa null string\fP '''' becomes ' .TE So the code to split into the individual fields is simple: .P1 char *fields[5], buf[256]; int n; if((n = await(buf, sizeof buf - 1)) < 0) sysfatal("await failed: %r"); buf[n] = '\e0'; if(tokenize(buf, fields, 5) != 5) sysfatal("could not parse await's message"); print("pid %s took %s milliseconds and returned %s\en", fields[0], fields[3], *fields[4] == '\e0' ? "success" : fields[4]); .P2 .PP The .CW wait subroutine does all of this for you: .P1 Waitmsg *wait(void); .P2 It waits for a child and returns a .CW malloc -ed structure of type .CW Waitmsg , or .CW nil on error: .P1 typedef struct Waitmsg Waitmsg; struct Waitmsg{ int pid; ulong time[3]; char *msg; }; .P2 The fields appear in the same order as in the string from .CW await , so .CW time[1] is the system time. Do not .CW free .CW msg separately: .CW wait allocates the .CW Waitmsg and the text of .CW msg as a single block, so freeing the .CW Waitmsg releases both. If you want to know the magic, see .CW /sys/src/libc/9sys/wait.c . If you want an example of .CW wait and .CW Waitmsg , see the source for the .CW time command at .CW /sys/src/cmd/time.c . .PP What happens if the command is interrupted (you hit the interrupt key)? An interrupt usually kills the process by sending what's called a .I note to the process and all its children in the same .I "note group" . Creating the child with the .CW RFNOTEG flag set lets it handle its own notes independently of the parent. If .CW time did this, however, it would be unable to report that the command had been interrupted. .PP There are many different types of notes.
The most common are .I interrupt , .I hangup , which is sent when you disconnect from a CPU server, .I alarm , which is associated with the .CW alarm system call, and .I "bad address" , which happens when you access invalid memory. If one of these notes is not handled, the program terminates. .PP How can you handle notes? rc lets you define functions such as .CW sigint that run when the corresponding note arrives. Behind the scenes, rc registers a .I "note handler" that, when the note is posted, runs the function and then lets rc continue; it does this with the .CW notify and .CW noted system calls. .PP Unlike .UX , which lets each signal have its own handler, Plan 9 gives a process only one note handler function, which is registered with the .CW notify system call: .P1 int notify(void (*f)(void *, char *)); .P2 The argument is a pointer to a function .CW f defined as .P1 void f(void *ureg, char *note) .P2 The .CW ureg argument points to a structure of type .CW Ureg , defined in .CW /$objtype/include/ureg.h . .CW Ureg contains the values of machine registers at the time the note was .I posted , and as such, is nonportable. Few programs ever need to use this argument. The second argument is the note string itself. If the function passed to .CW notify is a null pointer, the default handler is restored. The return value is insignificant. .PP Note handlers follow special rules. They may not use floating-point operations, nor may they call functions that do. A note handler cannot .CW return ; it must either exit, use the .CW noted system call, or call the .CW notejmp routine. .CW noted is of the form .P1 int noted(int how); .P2 .CW how is .CW NDFLT if you want the system to take the default action or .CW NCONT if you want the program to resume where it was interrupted. .CW noted does not return to the handler, so its return value is insignificant.
Also, .CW jmp_buf , .CW setjmp , and .CW longjmp are provided, but you cannot .CW longjmp from within a note handler. Instead, you use the safer .CW notejmp routine, which works like .CW longjmp except that it also takes the handler's .CW ureg argument. .PP If a note interrupts a system call and the note handler calls .CW noted(NCONT) , the system call terminates early with the error string .CW interrupted . This is a common source of bugs: a program that handles notes must be prepared for a blocking system call to fail this way. .PP To send a process a note, use the .CW postnote subroutine: .P1 int postnote(int who, int pid, char *note); .P2 If .CW who is .CW PNPROC , the note is posted only to that process. But if it is .CW PNGROUP , it is posted to every process in that process's note group, with the exception of the current process if it is in that group. This is a restriction of the operating system, not of .CW postnote itself. On failure, .CW postnote returns -1. A useful but undocumented note to post is .CW kill , which terminates the process without giving it a fighting chance. This is actually what the .CW kill command does: .P1 term% kill rc echo kill>/proc/2379/note # rc echo kill>/proc/4431/note # rc echo kill>/proc/5453/note # rc echo kill>/proc/6233/note # rc echo kill>/proc/6243/note # rc echo kill>/proc/6445/note # rc echo kill>/proc/6684/note # rc echo kill>/proc/7005/note # rc .P2 Piping that to rc will kill every rc, including the one you created in the pipe. .PP Each process has exactly one .I "alarm clock" , which is the source of the .CW alarm note. The .CW alarm system call is of the form .P1 long alarm(ulong ms); .P2 .CW ulong is a synonym for .CW "unsigned long" . If its argument is 0, the alarm clock is cleared. Otherwise, the alarm clock is set to send the note .CW alarm after the given number of milliseconds. The return value is the number of milliseconds left in the previous alarm clock. .CW alarm can be used to write a command .CW timeout which stops a process from running after a given amount of time.
.P1 #include <u.h> #include <libc.h> int pid; char *prog; void notehandler(void *, char *note) { if(strcmp(note, "alarm") == 0){ /* time is up: kill the command's note group */ if(postnote(PNGROUP, pid, "kill") < 0) sysfatal("could not time out %s: %r", prog); fprint(2, "timeout\en"); exits("timeout"); } noted(NDFLT); /* any other note: take the default action */ } int endswith(char *full, char *what) { int lf, lw; lf = strlen(full); lw = strlen(what); if(lw > lf) /* a suffix cannot be longer than the string */ return 0; return strcmp(full + lf - lw, what) == 0; } void main(int argc, char *argv[]) { long ms; Waitmsg *w; if(argc <= 2){ fprint(2, "usage: %s seconds command-line\en", argv[0]); exits("usage"); } ms = strtoul(argv[1], nil, 10) * 1000; /* sec -> ms */ prog = smprint("/bin/%s", argv[2]); switch(pid = rfork(RFPROC | RFFDG | RFENVG | RFREND | RFMEM | RFNOTEG)){ case -1: sysfatal("fork failed: %r"); case 0: exec(prog, &argv[2]); prog = smprint("./%s", argv[2]); exec(prog, &argv[2]); sysfatal("exec failed: %r"); } notify(notehandler); alarm(ms); w = wait(); if(w == nil) sysfatal("wait failed: %r"); if(w->msg[0] != '\e0'){ fprint(2, "%s failed with %s\en", prog, w->msg); free(w); free(prog); exits("failed run"); } free(w); free(prog); exits(0); } .P2 We have to provide .CW endswith since the C library doesn't provide the similar .CW strrstr (it does provide .CW strstr and other functions). .CW smprint creates, using .CW malloc , a string which contains the fully formatted text. Use this instead of a custom buffer and .CW sprint , as it avoids the risk of truncation or overflow due to an improperly sized buffer. The .CW RFMEM flag is set so that when the child falls back to .CW ./ and changes .CW prog , the parent sees the new name in its error messages. We kill with .CW PNGROUP in case the program that you run forks its own processes. .PP This example shows another feature of the Plan 9 C compilers: leaving a parameter unnamed, like the first parameter of .CW notehandler , signals that it is not used. .ig .NH Segments, Interprocess Communication, and Locks .PP The easiest form of interprocess communication in Plan 9 is the pipe.
Pipes are implemented much as in .UX , right down to the system call: .P1 int pipe(int fd[2]); .P2 creates a pipe of the form .PS File0: box "\f(CWfd[0]\fP" move right File1: box "\f(CWfd[1]\fP" arrow -> with .start at 1/2 arrow <- with .start at 1/2 .PE where, unlike in .UX , data written to either descriptor can be read from the other. However, Plan 9 has more sophisticated ways of interprocess communication. .PP A .I segment is a block of memory that can be shared. Segments can be as small as .CW int s or as large as the system permits. .CW fork retains segments, but .CW exec will only do so unless the new program is so large that it would overlap the segment. We can use segments to implement shared memory. You create a segment with the .CW segattach system call: .P1 void *segattach(int attr, char *class, void *va, ulong len); .P2 The .CW class is a string containing the type of segment. For shared memory, the string is .CW """shared""" , and for a segment for normal use, the string is .CW """memory""" . The attribute is zero or a bitmask of .CW SG_RONLY for a read-only segment and .CW SG_CEXEC , which releases the segment on an .CW exec . .CW va is the address at which to attach the segment, or .CW nil if the system should choose. Most users won't need any other value. The return value is the starting address of the segment on success, or .CW "(void *)-1" on error. Its counterpart is .P1 int segdetach(void *addr); .P2 Simply pass the return value of .CW segattach to detach the segment. ..