Strings

Version 1.5 of uLisp adds support for strings. A string can consist of an arbitrary number of ASCII characters, and the storage required is 2 bytes per character plus four bytes for the string object itself, with a further two bytes if the number of characters is odd. The following section describes how strings are implemented.

16th April 2017: This description has been updated to match uLisp 1.8.

String representation

As with all objects in uLisp, a string consists of two 2-byte cells. Strings are identified by an 8 in the left cell, and the right cell holds a pointer to the characters in the string:

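[Figure Strings1.gif: a string object, with 8 in the left cell and a pointer to the characters in the right cell.]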

The type enum is now:

enum type {ZERO=0, SYMBOL=2, NUMBER=4, STREAM=6, STRING=8, PAIR=10};

A null string just has NULL in the right cell.

In a string of one or more characters the right cell points to a linked list of objects, one object for each pair of characters. This avoids the need to have a separate storage area for strings, and allows strings to be garbage collected in the same way as other objects.

For example, creating a string with:

(defvar str "hello")

would give this structure:

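[Figure Strings2.gif: the string object points to a chain of three character cells holding "he", "ll", and "o", linked through their car pointers.]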

In a string the cells are linked together using car pointers, rather than the usual cdr pointers, so that the characters won't be affected when the top bit of the car cell is marked during garbage collection.
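As a rough check on the layout described above, the following sketch computes the storage a string of a given length needs. The helper string_storage_bytes() and its constants are hypothetical names, not part of uLisp, and the calculation assumes 4-byte objects that each hold two characters.

```c
#include <stddef.h>

/* Assumptions taken from the description above: every object is two
   2-byte cells, and each character cell holds two characters. */
enum { OBJECT_BYTES = 4, CHARS_PER_OBJECT = 2 };

/* Hypothetical helper, not part of uLisp: bytes used by a string of n
   characters, including the STRING object itself. */
size_t string_storage_bytes(size_t n) {
  size_t header = OBJECT_BYTES;
  /* Round up: an odd final character still needs a whole object. */
  size_t cells = (n + CHARS_PER_OBJECT - 1) / CHARS_PER_OBJECT;
  return header + cells * OBJECT_BYTES;
}
```

For "hello" this gives 4 + 3 x 4 = 16 bytes: one string object plus three character cells.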

Garbage collection

An additional test in markobject() handles the garbage collection of strings:

void markobject (object *obj) {
  MARK:
  if (obj == NULL) return;
  if (marked(obj)) return;

  // Read the car and type before marking, since marking alters the car cell
  object* arg = car(obj);
  unsigned int type = obj->type;
  mark(obj);

  if (type >= PAIR || type == ZERO) { // cons
    markobject(arg);
    obj = cdr(obj);
    goto MARK;
  }

  // String: follow the car links, marking each character cell
  if (type == STRING) {
    obj = cdr(obj);
    while (obj != NULL) {
      arg = car(obj);
      mark(obj);
      obj = arg;
    }
  }
}

This simply steps along the string, until it reaches a NULL pointer, marking each pair as it goes.
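The sketch below illustrates why the characters survive marking when the link is kept in car. It is a simplified model, not uLisp's code: cell_t, workspace, MARKBIT, and mark_chain() are hypothetical names, links are array indices rather than real pointers, and index 0 stands in for NULL.

```c
#include <stdint.h>

#define MARKBIT 0x8000u   /* assumed position of the mark: top bit of car */

/* Simplified 4-byte object: two 16-bit cells. */
typedef struct {
  uint16_t car;   /* link to the next character cell (0 = end of chain) */
  uint16_t cdr;   /* two packed characters, first one in the high byte */
} cell_t;

/* "hello" stored as he -> ll -> o, starting at index 1. */
static cell_t workspace[4] = {
  {0, 0},
  {2, ('h' << 8) | 'e'},
  {3, ('l' << 8) | 'l'},
  {0, ('o' << 8)}
};

/* Mark every cell in a chain that is linked through car. */
void mark_chain(uint16_t i) {
  while (i != 0) {
    uint16_t next = workspace[i].car;              /* read the link first */
    workspace[i].car = (uint16_t)(next | MARKBIT); /* mark touches car only */
    i = next;
  }
}
```

After mark_chain(1) every car cell has its top bit set, while the cdr cells still hold the original character pairs. If the characters were stored in car instead, setting the mark bit would alter the first character of each pair.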

Reading a string

The utility function readstring() reads in a string up to a specified delimiter and returns the string object:

object *readstring (char delim) {
  object *obj = myalloc();
  obj->type = STRING;
  char ch = gchar();
  object *head = NULL;
  object *tail = NULL;
  int chars = 0;
  while (ch != delim) {
    if (ch == '\\') ch = gchar(); // backslash: take the next character literally
    buildstring(ch, &chars, &head, &tail);
    ch = gchar();
  }
  obj->cdr = head;
  return obj;
}

This calls buildstring(), which allocates a new object for each pair of characters and packs the characters into the cdr cell of each object:

void buildstring (char ch, int *chars, object **head, object **tail) {
  if (*chars == 0) {
    // First character of a pair: start a new cell with it in the high byte
    *chars = ch<<8;
    object *cell = myalloc();
    if (*head == NULL) *head = cell; else (*tail)->car = cell;
    cell->car = NULL;
    cell->integer = *chars;
    *tail = cell;
  } else {
    // Second character of a pair: add it to the low byte of the current cell
    *chars = *chars | ch;
    (*tail)->integer = *chars;
    *chars = 0;
  }
}

Finally, readstring() stores the head of the linked list of character pairs in the cdr cell of the string object.

It is called by read and read-line.
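The packing scheme can be tried out on a desktop machine with a minimal sketch like the one below. It is not uLisp's code: pair_t and pack_chars() are hypothetical names, malloc() stands in for myalloc(), and the input comes from a C string rather than from gchar().

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for a uLisp character cell. */
typedef struct pair {
  struct pair *car;   /* link to the next pair, NULL at the end */
  uint16_t chars;     /* first character in the high byte, second in the low */
} pair_t;

/* Pack a C string into a chain of character pairs; returns NULL for "". */
pair_t *pack_chars(const char *s) {
  pair_t *head = NULL, *tail = NULL;
  int first = 1;                        /* is the next character first in a pair? */
  for (; *s != '\0'; s++) {
    unsigned char ch = (unsigned char)*s;
    if (first) {
      pair_t *cell = malloc(sizeof *cell);
      if (cell == NULL) abort();        /* out of memory: give up in this sketch */
      cell->car = NULL;
      cell->chars = (uint16_t)(ch << 8);
      if (head == NULL) head = cell; else tail->car = cell;
      tail = cell;
    } else {
      tail->chars |= ch;                /* fill the low byte of the current cell */
    }
    first = !first;
  }
  return head;
}
```

Packing "hello" yields three cells containing 0x6865, 0x6C6C, and 0x6F00; the low byte of the last cell stays zero because the string has an odd number of characters.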

Printing a string

The utility function printstring() handles the printing of a string object:

void printstring (object *form) {
  if (PrintReadably) pchar('"');
  form = cdr(form);
  while (form != NULL) {
    int chars = form->integer;
    char ch = chars>>8 & 0xFF;  // first character: high byte
    if (PrintReadably && ch == '"') pchar('\\');
    pchar(ch);
    ch = chars & 0xFF;          // second character: low byte, zero if absent
    if (PrintReadably && ch == '"') pchar('\\');
    if (ch) pchar(ch);
    form = car(form);
  }
  if (PrintReadably) pchar('"');
}

The global variable PrintReadably determines whether the string is printed with enclosing quotation marks and escape characters, as print does, or without them, as princ does.
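The difference between the two modes can be seen in a minimal sketch that writes to a buffer instead of calling pchar(). The names pair_t, put(), and format_string() are hypothetical, the readably argument plays the role of PrintReadably, and unlike printstring() the sketch skips a zero byte in either position.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a uLisp character cell. */
typedef struct pair {
  struct pair *car;   /* link to the next pair */
  uint16_t chars;     /* two packed characters */
} pair_t;

/* Append one character, always leaving room for the terminator. */
static void put(char *out, size_t size, size_t *n, char ch) {
  if (*n + 1 < size) out[(*n)++] = ch;
}

/* Format a chain into out; returns the number of characters written. */
size_t format_string(const pair_t *p, int readably, char *out, size_t size) {
  size_t n = 0;
  if (size == 0) return 0;
  if (readably) put(out, size, &n, '"');
  for (; p != NULL; p = p->car) {
    char two[2] = { (char)(p->chars >> 8), (char)(p->chars & 0xFF) };
    for (int i = 0; i < 2; i++) {
      if (two[i] == 0) continue;                         /* unused half of a pair */
      if (readably && two[i] == '"') put(out, size, &n, '\\');
      put(out, size, &n, two[i]);
    }
  }
  if (readably) put(out, size, &n, '"');
  out[n] = '\0';
  return n;
}
```

A string holding the three characters a, b, and a double quote is formatted as ab" with readably off, and as "ab\"" with it on.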

String functions

Some functions have been added or extended to work with strings:

  • The function subseq returns a subsequence of a string.
  • The function concatenate joins together an arbitrary number of strings.
  • The function string= returns t if its two string arguments are equal, and nil otherwise.
  • The function stringp returns t if its argument is a string.
  • The function read-line reads in a string up to a return character.
  • The function length has been extended to return the number of non-null characters in a string.
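As an illustration of how functions such as length and string= can work on the packed representation, here is a minimal sketch. It is an assumption about one possible approach, not uLisp's actual implementation; pair_t, chain_length(), and chain_equal() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a uLisp character cell. */
typedef struct pair {
  struct pair *car;   /* link to the next pair */
  uint16_t chars;     /* two packed characters */
} pair_t;

/* Count the non-null characters in a chain. */
int chain_length(const pair_t *p) {
  int n = 0;
  for (; p != NULL; p = p->car) {
    if (p->chars >> 8) n++;      /* first character of the pair */
    if (p->chars & 0xFF) n++;    /* second character, zero if absent */
  }
  return n;
}

/* Return 1 if two chains hold the same characters, 0 otherwise. */
int chain_equal(const pair_t *a, const pair_t *b) {
  while (a != NULL && b != NULL) {
    if (a->chars != b->chars) return 0;   /* compare two characters at once */
    a = a->car;
    b = b->car;
  }
  return a == NULL && b == NULL;          /* equal only if both chains end together */
}
```

Comparing whole cells checks two characters per step, which is one benefit of packing characters in pairs.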