man(1) Manual page archive


     CONVCS(2)                                               CONVCS(2)

     NAME
          Convcs,  Btos, Stob - character set conversion suite

     SYNOPSIS
          include "convcs.m";
          convcs := load Convcs Convcs->PATH;

          Btos: module {
               init: fn(arg: string): string;
               btos: fn(s: Convcs->State, b: array of byte, nchars: int)
                         : (Convcs->State, string, int);
          };

          Stob: module {
               init: fn (arg: string): string;
               stob: fn(s: Convcs->State, str: string)
                         : (Convcs->State, array of byte);
          };

          Convcs: module {
               State: type string;
               Startstate: con "";

               init: fn(csfile: string): string;
               getbtos: fn(cs: string): (Btos, string);
               getstob: fn(cs: string): (Stob, string);
               enumcs: fn(): list of (string, string, int);
               aliases: fn(cs: string): (string, list of string);
          };

     DESCRIPTION
          The Convcs suite is a collection of modules for converting
          various standard coded character sets and character encoding
          schemes to and from the Limbo strings.

          The Convcs module provides an entry point to the suite, map-
          ping character set names and aliases to their associated
          converter implementation.

        The Convcs module
          init(csfile)
               Init should be called once to initialise the internal
               state of Convcs.  The csfile argument specifies the
               path of the converter mapping file.  If this argument
               is nil, the default mapping file /lib/convcs/charsets
               is used.

          getbtos(cs)
               Getbtos returns an initialised Btos module for

     CONVCS(2)                                               CONVCS(2)

               converting from the requested encoding.

               The return value is a tuple, holding the module refer-
               ence and an error string.  If any errors were encoun-
               tered in locating, loading or initialising the
               requested converter, the module reference will be nil
               and the string will contain an explanation.

               The character set name, cs, is normalised by mapping
               all upper-case latin1 characters to lower-case before
               comparison with character set names and aliases from
               the charsets file.

          getstob(cs)
               Getstob returns an initialised Stob module for convert-
               ing from strings to the requested encoding.  Apart from
               the different module type, the return value and seman-
               tics are the same as for getbtos().

          enumcs()
               Returns a list describing the set of available convert-
               ers.  The list is comprised of (name, desc, mode)
               tuples.

               Name is the standard name of the character set.

               Desc is a more friendly description taken directly from
               the desc attribute given in the charsets file.  If the
               attribute is not given then desc is set to name.

               Mode is a bitmap that details the converters available
               for the given character set.  Valid mode bits are
               Convcs->BTOS and Convcs->STOB.  For convenience,
               Convcs->BOTH is also defined.

          aliases(cs)
               Returns the aliases for the character set name cs. The
               return value is the tuple (desc, aliases).

               Desc is the descriptive text for the character set, as
               returned by enumcs().

               Aliases lists all the known aliases for cs. The first
               name in the list is the default character set name -
               the name that all the other aliases refer to.  If the
               aliases list is nil then there was an error in looking
               up the character set name and desc will detail the
               error.

        Using a Btos converter
          The Btos module returned by getbtos() is already initialised
          and is ready to start the conversion.  Conversions can be

     CONVCS(2)                                               CONVCS(2)

          made on a individual basis, or in a `streamed' mode.

          converter->btos(s, b, nchars)
               Converts raw byte codes of the character set encoding
               to a Limbo string.

               The argument s is a converter state as returned from
               the previous call to btos on the same input stream.
               The first call to btos on a particular input stream
               should give Convcs->Startstate (or nil) as the value
               for s. The argument b is the bytes to be converted.
               The argument nchars is the maximal length of the string
               to be returned.  If this argument is -1 then as much of
               b will be consumed as possible.  A value of 0 indicates
               to the converter that there is no more data and that
               any pending state should be flushed.

               The return value of btos is the tuple (state, str,
               nbytes) where state is the new state of the converter,
               str is the converted string, and nbytes is the number
               of bytes from b consumed by the conversion.

               The same converter module can be used for multiple con-
               version streams by maintaining a separate state vari-
               able for each stream.

        Using an Stob converter
          The Stob module returned by getstob() is already initialised
          and is ready to start the conversion.

          converter->stob(s, str)
               Converts the limbo string str to the raw byte codes of
               the character set encoding.  The argument s represents
               the converter state and is treated in the same way as
               for converter->btos() described above.  To terminate a
               conversion stob should be called with an emtpy string
               in order to flush the converter state.

               The return value of stob is the tuple (state, bytes)
               where state is the new state of the converter and bytes
               is the result of the conversion.

        Conversion errors
          When using converter->btos() to convert data to Limbo
          strings, any byte sequences that are not valid for the spe-
          cific character encoding scheme will be converted to the
          Unicode error character 16rFFFD.

          When using converter->stob() to convert Limbo strings, any
          Unicode characters that can not be mapped into the character
          set will normally be substituted by the US-ASCII code for
          `?'.  Note that this may be inappropriate for certain

     CONVCS(2)                                               CONVCS(2)

          conversions, such converters will use a suitable error char-
          acter for their particular character set and encoding
          scheme.

        Charset file format
          The file /lib/convcs/charsets provides the mapping between
          character set names and their implementation modules.  The
          file format conforms to that supported by cfg(2). The fol-
          lowing description relies on terms defined in the cfg(2)
          manual page.

          Each record name defines a character set name.  If the pri-
          mary value of the record is non-empty then the name is an
          alias, the value being the real name.  An alias record must
          point to an actual converter record, not to another alias,
          as Convcs only follows one level of aliasing.

          Each converter record consists of a set of tuples with the
          following primary attributes:

          desc A more descriptive name for the character set encoding.
               This attribute is used for the description returned by
               enumcs().

          btos The path to the Btos module implementation of this con-
               verter.

          stob The path to the Stob module implementation of this con-
               verter.

          Both the btos and stob tuples can have an optional arg
          attribute which is passed to the init() function of the con-
          verter when initialised by Convcs. If a converter record has
          neither an stob nor a btos tuple, then it is ignored.

          The following example is an extract from the standard
          Inferno charsets file:

          cp866=ibm866
          866=ibm866
          ibm866=
              desc='Russian MS-DOS CP 866'
              stob=/dis/lib/convcs/cp_stob.dis    arg=/lib/convcs/ibm866.cp
              btos=/dis/lib/convcs/cp_btos.dis    arg=/lib/convcs/ibm866.cp

          This entry defines Stob and Btos converters for the charac-
          ter set called ibm866.  The converters are actually the gen-
          eric codepage converters cp_stob and cp_btos paramaterized
          with a codepage file.  The entry also defines the aliases
          cp866 and 866 for the name ibm866.

     FILES

     CONVCS(2)                                               CONVCS(2)

          /lib/convcs/charsets
               The default mapping between character set names and
               their implementation modules.

          /lib/convcs/csname.cp
               Codepage files for use by the generic codepage convert-
               ers cp_stob and cp_btos.  Each file consists of 256
               unicode runes mapping codepage byte values to unicode
               by their index.  The runes are stored in the utf-8
               encoding.

     SOURCE
          /appl/lib/convcs/convcs.b  Implementation of the Convcs mod-
                                     ule.
          /appl/lib/convcs/cp_btos.b Generic Btos codepage converter.
          /appl/lib/convcs/cp_stob.b Generic Stob codepage converter.
     SEE ALSO
          tcs(1), cfg(2)